Difference between revisions of "Supporting GUI Translation/Translations DAT Format"

(Update description for new format.)
(Update description for version 4)
 
(One intermediate revision by the same user not shown)
Line 6: Line 6:
# [[#The header|a header]]
# [[#The header|a header]]
# [[#List of Languages|a block with the list of languages]]
# [[#List of Languages|a block with the list of languages]]
# [[#List of Codepages|a block with the list of codepages]]
# [[#English messages|a block with the the english messages]]
# [[#English messages|a block with the the english messages]]
# [[#Translated messages|a block with the translated messages for language 1]]
# [[#Translated messages|a block with the translated messages for language 1]]
Line 11: Line 12:
# ...
# ...
# a block with the translated messages for language n
# a block with the translated messages for language n
#  [[#Codepage mapping|a block with the mapping for codepage 1]]
# a block with the mapping for codepage 2
# ...
# a block with the mapping for codepage m


The following description is valid for version 2 of the file format. Version 1 is very similar but does not contain the context strings in the [[#Translated messages|translated messages]].
The latest file format version is 4. The description below is for version 4 but indicate blocks added, removed, or changed from version 1, 2 and 3.
* Version 2 adds context information in the [[#Translated messages|translated messages]].
* Version 3 adds code page descriptions (for translations not using ASCII or ISO-8859-1).
* Version 4 uses UTF-8 for translations and drops the code page descriptions.


== The header ==
== The header ==
Line 18: Line 26:
!Type    !! Size  !! Order  !! Description
!Type    !! Size  !! Order  !! Description
|-
|-
|String  || 12   ||             || 'TRANSLATIONS'
|String  || 12   ||           || 'TRANSLATIONS'
|-
|Byte    || 1    ||          || Version (of the file format)
|-
|uint16  || 2    || BE        || Number of translations
|-
|-
|Byte    || 1    ||             || Version (of the file format)
|(uint16)|| (2)  || (BE)      || Only in version 3: Number of code pages
|-
|-
|uint16 || 2     || BE        || Number of translations
|uint32  || 4     || BE        || Size in bytes of block 1 (list of languages)<br>In version 3 and below the type is uint16
|-
|-
|uint16 || 2     || BE       || Size in bytes of block 1 (list of languages)
|(uint16)|| (2)   || (BE)      || Only in version 3: Size in bytes of block 2 (list of codepages)
|-
|-
|uint16 || 2     || BE        || Size in bytes of block 2 (english messages)
|uint32  || 4     || BE        || Size in bytes of block 3 (english messages)<br>In version 3 and below the type is uint16
|-
|-
|uint16 || 2     || BE        || Size in bytes of block 3 (first translation)
|uint32  || 4     || BE        || Size in bytes of block 4 (first translation)<br>In version 3 and below the type is uint16
|-
|-
|uint16 || 2     || BE        || Size in bytes of block 4 (second translation)
|uint32  || 4     || BE        || Size in bytes of block 5 (second translation)<br>In version 3 and below the type is uint16
|-
|-
|...         || ...   || ...       || ...
|...     || ...   || ...       || ...
|-
|-
|uint16 || 2     || BE        || Size in bytes of block n+2 (n<sup>th</sup> translation)
|uint32  || 4     || BE        || Size in bytes of block n+2 (n<sup>th</sup> translation)<br>In version 3 and below the type is uint16
|}
|}
In version 3 with code page mapping information, the size for the codepage mapping blocks is not written since they all are 256 * 4 bytes long.


== List of Languages ==
== List of Languages ==
Line 46: Line 60:
|uint16 || 2    || BE        || Size (in bytes) of the following string (including the terminating '\0')
|uint16 || 2    || BE        || Size (in bytes) of the following string (including the terminating '\0')
|-
|-
|String  || ??    ||            || Language and country code (e.g. 'de_DE')
|String  || ??    ||            || Language and country code (e.g. 'de_DE'). The country code is optional (e.g. 'eu').
|-
|-
|uint16 || 2    || BE        || Size (in bytes) of the language name string (including the terminating '\0')
|uint16 || 2    || BE        || Size (in bytes) of the language name string (including the terminating '\0')
|-
|-
|String  || ??    ||            || Language name (e.g. 'Deutsch')
|String  || ??    ||            || Language name (e.g. 'Deutsch'). This is the name that appears in the GUI.
|}
 
== List of Codepages ==
'''This block is only present in version 3 and was removed in version 4.'''
 
For each codepage there is the following entry:
 
{| border="1" cellpadding="2" cellspacing="0"
!Type    !! Size  !! Order  !! Description
|-
|uint16 || 2    || BE        || Size (in bytes) of the following string (including the terminating '\0')
|-
|String  || ??    ||            || Codepage name (e.g. 'iso-8859-5')
|}
|}


Line 84: Line 111:
!Type    !! Size  !! Order  !! Description
!Type    !! Size  !! Order  !! Description
|-
|-
|uint16 || 2    || BE        || Number of translated messages
|uint16 || 2    || BE        || Number of translated messages
|-
|-
|uint16 || 2     || BE        || Size (in bytes) of the charset string (including the terminating '\0')
|(uint16)||(2)    || (BE)       || Only in version 3 and below: Size (in bytes) of the charset string (including the terminating '\0')
|-
|-
|String || ??   ||            || Charset (e.g. 'iso-8859-1')
|(String)|| (??)  ||            || Only in version 3 and below: Charset (e.g. 'iso-8859-1')<br>In version 4 and above the charset is always UTF-8
|-
|-
|             ||      ||            || First translation entry (see below)
|       ||      ||            || First translation entry (see below)
|-
|-
|...         || ...    || ...        || ...
|...     || ...    || ...        || ...
|-
|-
|             ||      ||            || Last translation entry
|       ||      ||            || Last translation entry
|}
|}


Line 110: Line 137:
|-
|-
|String  || ??    ||            || Context string (if a context is defined).
|String  || ??    ||            || Context string (if a context is defined).
|}
== Codepage mapping ==
'''This block is only present in version 3 and was removed in version 4.'''
For each codepage there is a block giving the mapping from each character of the 8 bits codepage (e.g. iso-8859-5) to the equivalent unicode glyph.
{| border="1" cellpadding="2" cellspacing="0"
!Type    !! Size  !! Order  !! Description
|-
|uint32 || 4    || BE        || Mapping for a char value of 0
|-
|uint32 || 4    || BE        || Mapping for a char value of 1
|-
|...        || ...    || ...        || ...
|-
|uint32 || 4    || BE        || Mapping for a char value of 255
|}
|}

Latest revision as of 13:32, 30 August 2020


The translations.dat file is generated from the po/*.po files in the ScummVM source code repository. It contains the data needed by ScummVM to display a translated GUI.

The file is a binary file made of:

  1. a header
  2. a block with the list of languages
  3. a block with the list of codepages (version 3 only)
  4. a block with the English messages
  5. a block with the translated messages for language 1
  6. a block with the translated messages for language 2
  7. ...
  8. a block with the translated messages for language n
  9. a block with the mapping for codepage 1 (version 3 only)
  10. a block with the mapping for codepage 2 (version 3 only)
  11. ...
  12. a block with the mapping for codepage m (version 3 only)

The latest file format version is 4. The description below is for version 4, but indicates which blocks were added, removed, or changed relative to versions 1, 2 and 3.

  • Version 2 adds context information in the translated messages.
  • Version 3 adds code page descriptions (for translations not using ASCII or ISO-8859-1).
  • Version 4 uses UTF-8 for translations and drops the code page descriptions.

The header

Type      Size  Order  Description
String    12           'TRANSLATIONS'
Byte      1            Version (of the file format)
uint16    2     BE     Number of translations
(uint16)  (2)   (BE)   Only in version 3: Number of code pages
uint32    4     BE     Size in bytes of block 1 (list of languages). In version 3 and below the type is uint16.
(uint16)  (2)   (BE)   Only in version 3: Size in bytes of block 2 (list of codepages)
uint32    4     BE     Size in bytes of block 3 (English messages). In version 3 and below the type is uint16.
uint32    4     BE     Size in bytes of block 4 (first translation). In version 3 and below the type is uint16.
uint32    4     BE     Size in bytes of block 5 (second translation). In version 3 and below the type is uint16.
...       ...   ...    ...
uint32    4     BE     Size in bytes of block n+3 (nth translation). In version 3 and below the type is uint16.

In version 3, which includes code page mapping information, the sizes of the codepage mapping blocks are not written to the header, since each of them is exactly 256 * 4 = 1024 bytes long.
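
As an illustration of the header layout above, here is a minimal Python sketch that reads a version 4 header. The function name `parse_header_v4` and the example block sizes are made up for this sketch; it is not ScummVM's actual loader, and it does not handle the extra version 3 fields or the uint16 sizes of older versions.

```python
import struct

MAGIC = b"TRANSLATIONS"  # 12 bytes, no terminating '\0' in the header


def parse_header_v4(data: bytes):
    """Parse a version 4 header; return (version, block_sizes, header_size)."""
    if data[:12] != MAGIC:
        raise ValueError("not a translations.dat file")
    version = data[12]
    if version != 4:
        raise ValueError(f"this sketch only handles version 4, got {version}")
    (num_translations,) = struct.unpack_from(">H", data, 13)
    # One size for the list of languages, one for the English messages,
    # then one size per translation, all big-endian uint32.
    count = 2 + num_translations
    sizes = list(struct.unpack_from(f">{count}I", data, 15))
    header_size = 15 + 4 * count
    return version, sizes, header_size


# Synthetic header with 2 translations and made-up block sizes.
example = (MAGIC + bytes([4]) + struct.pack(">H", 2)
           + struct.pack(">4I", 30, 100, 80, 90))
version, sizes, header_size = parse_header_v4(example)
print(version, sizes, header_size)
```

Since the blocks follow the header in order, the offset of each block can be obtained by adding the preceding block sizes to `header_size`.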

List of Languages

For each translation there is the following entry:

Type      Size  Order  Description
uint16    2     BE     Size (in bytes) of the following string (including the terminating '\0')
String    ??           Language and country code (e.g. 'de_DE'). The country code is optional (e.g. 'eu').
uint16    2     BE     Size (in bytes) of the language name string (including the terminating '\0')
String    ??           Language name (e.g. 'Deutsch'). This is the name that appears in the GUI.
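
The length-prefixed strings used in this block (and in the blocks below) could be read as in the following Python sketch. The helper names `read_string`, `pack_string` and `parse_languages` are hypothetical, and decoding the strings as UTF-8 is an assumption that matches version 4 only.

```python
import struct


def read_string(data: bytes, offset: int):
    """Read a uint16 BE size, then that many bytes (including the '\\0')."""
    (size,) = struct.unpack_from(">H", data, offset)
    start = offset + 2
    raw = data[start:start + size]
    # Drop the terminating '\0'; UTF-8 decoding is an assumption (version 4).
    return raw.split(b"\0", 1)[0].decode("utf-8"), start + size


def parse_languages(block: bytes, num_translations: int):
    """Return a list of (language code, language name) pairs."""
    languages = []
    offset = 0
    for _ in range(num_translations):
        code, offset = read_string(block, offset)
        name, offset = read_string(block, offset)
        languages.append((code, name))
    return languages


def pack_string(text: str) -> bytes:
    """Helper for building test data: size prefix + string + '\\0'."""
    raw = text.encode("utf-8") + b"\0"
    return struct.pack(">H", len(raw)) + raw


# Synthetic block with two languages, one without a country code.
block = (pack_string("de_DE") + pack_string("Deutsch")
         + pack_string("eu") + pack_string("Euskara"))
languages = parse_languages(block, 2)
print(languages)
```

The number of entries is not stored in the block itself; it comes from the "Number of translations" field of the header.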

List of Codepages

This block is only present in version 3 and was removed in version 4.

For each codepage there is the following entry:

Type      Size  Order  Description
uint16    2     BE     Size (in bytes) of the following string (including the terminating '\0')
String    ??           Codepage name (e.g. 'iso-8859-5')

English messages

Type      Size  Order  Description
uint16    2     BE     Number of messages
                       First message entry (see below)
                       Second message entry
...       ...   ...    ...
                       Last message entry

Each message entry has the following format:

Type      Size  Order  Description
uint16    2     BE     Size (in bytes) of the English message string (including the terminating '\0')
String    ??           English message

The messages are sorted in alphabetical order.
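
Because the messages are sorted, a reader can look up a message index with a binary search. The Python sketch below shows this on synthetic data; the function names are hypothetical, and it assumes that "alphabetical order" means plain byte-wise ordering, which the description above does not state explicitly.

```python
import bisect
import struct


def parse_messages(block: bytes):
    """Return the English messages as a list of byte strings."""
    (count,) = struct.unpack_from(">H", block, 0)
    offset = 2
    messages = []
    for _ in range(count):
        (size,) = struct.unpack_from(">H", block, offset)
        offset += 2
        # The size includes the terminating '\0', which is dropped here.
        messages.append(block[offset:offset + size].split(b"\0", 1)[0])
        offset += size
    return messages


def find_message(messages, text: bytes) -> int:
    """Binary search; returns the index of text, or -1 if it is absent.

    Assumes the list is sorted byte-wise (an assumption of this sketch).
    """
    i = bisect.bisect_left(messages, text)
    return i if i < len(messages) and messages[i] == text else -1


def pack_string(raw: bytes) -> bytes:
    """Helper for building test data: size prefix + string + '\\0'."""
    raw += b"\0"
    return struct.pack(">H", len(raw)) + raw


# Synthetic block with three messages, already sorted.
block = struct.pack(">H", 3) + b"".join(
    pack_string(m) for m in [b"Cancel", b"OK", b"Quit"])
messages = parse_messages(block)
print(find_message(messages, b"OK"), find_message(messages, b"Save"))
```

The index returned by the lookup is the value that the translation entries refer to in their "index of the entry in the English message table" field.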

Translated messages

For each translation there is a block with the following format:

Type      Size  Order  Description
uint16    2     BE     Number of translated messages
(uint16)  (2)   (BE)   Only in version 3 and below: Size (in bytes) of the charset string (including the terminating '\0')
(String)  (??)         Only in version 3 and below: Charset (e.g. 'iso-8859-1'). In version 4 and above the charset is always UTF-8.
                       First translation entry (see below)
...       ...   ...    ...
                       Last translation entry

Each translation entry has the following format:

Type      Size  Order  Description
uint16    2     BE     Index of the entry in the English message table (index starts at 0)
uint16    2     BE     Size (in bytes) of the translated message string (including the terminating '\0')
String    ??           Translated message
uint16    2     BE     Size (in bytes) of the context string (including the terminating '\0'). Size is 0 when there is no context.
String    ??           Context string (if a context is defined).
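
A version 4 translation block could be read as in the following Python sketch. The names `parse_translation_v4`, `read_string` and `pack_string` are hypothetical, the German strings are only test data, and the sketch does not handle the charset fields of version 3 and below or the missing context strings of version 1.

```python
import struct


def read_string(data: bytes, offset: int):
    """Read a uint16 BE size, then that many bytes (including the '\\0')."""
    (size,) = struct.unpack_from(">H", data, offset)
    offset += 2
    text = data[offset:offset + size].split(b"\0", 1)[0].decode("utf-8")
    return text, offset + size


def parse_translation_v4(block: bytes):
    """Return a list of (english_index, message, context_or_None)."""
    (count,) = struct.unpack_from(">H", block, 0)
    offset = 2
    entries = []
    for _ in range(count):
        (index,) = struct.unpack_from(">H", block, offset)
        message, offset = read_string(block, offset + 2)
        # A context size of 0 means no context bytes follow at all.
        context, offset = read_string(block, offset)
        entries.append((index, message, context or None))
    return entries


def pack_string(text: str) -> bytes:
    """Helper for building test data: size prefix + string + '\\0'."""
    raw = text.encode("utf-8") + b"\0"
    return struct.pack(">H", len(raw)) + raw


NO_CONTEXT = struct.pack(">H", 0)

# Synthetic block: two entries, the second one with a context string.
block = (struct.pack(">H", 2)
         + struct.pack(">H", 0) + pack_string("Abbrechen") + NO_CONTEXT
         + struct.pack(">H", 2) + pack_string("Beenden") + pack_string("menu"))
entries = parse_translation_v4(block)
print(entries)
```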

Codepage mapping

This block is only present in version 3 and was removed in version 4.

For each codepage there is a block giving the mapping from each character of the 8-bit codepage (e.g. iso-8859-5) to the equivalent Unicode character.

Type      Size  Order  Description
uint32    4     BE     Mapping for a char value of 0
uint32    4     BE     Mapping for a char value of 1
...       ...   ...    ...
uint32    4     BE     Mapping for a char value of 255
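
To show how such a table can be used, the Python sketch below reads a 256-entry mapping block and converts an 8-bit string with it. The function names are hypothetical, the table is synthetic (an identity mapping with a single illustrative override), and treating each uint32 as a Unicode code point is an assumption of this sketch.

```python
import struct


def parse_mapping(block: bytes):
    """Return a tuple of 256 values, one per 8-bit char value."""
    if len(block) != 256 * 4:
        raise ValueError("a codepage mapping block is 256 * 4 bytes long")
    return struct.unpack(">256I", block)


def decode(raw: bytes, mapping) -> str:
    """Convert an 8-bit encoded string using the mapping table.

    Each table entry is assumed to be a Unicode code point.
    """
    return "".join(chr(mapping[byte]) for byte in raw)


# Synthetic table: identity mapping, except 0xD4 -> U+0434 (illustrative).
table = list(range(256))
table[0xD4] = 0x0434
block = struct.pack(">256I", *table)

mapping = parse_mapping(block)
text = decode(b"A\xd4", mapping)
print(text)
```

The fixed length checked in `parse_mapping` is the reason why version 3 headers do not store the size of these blocks.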