Latest revision as of 22:04, 30 November 2011

Other game data

Written by Lance Ewing (Last updated: 31 August 1997).

Format of the object file

The object file stores two pieces of information about each inventory item used in an AGI game: its starting room location and its name. It also has a byte that determines the maximum number of animated objects.

The file encryption

The first obstacle to overcome is the fact that most object files are encrypted. I say most because some of the earlier AGI games were not, in which case you can skip to the next section. Those that are encrypted use the string "Avis Durgan" (or, in the case of AGDS games, "Alex Simkin"). To decrypt the file, simply take each successive block of eleven bytes from the file and XOR each byte of the block with the corresponding character of the string "Avis Durgan". This sort of encryption is very easy to crack if you know what you are doing and is simply meant to act as a shield to discourage cheating. In some games, however, the object names are clearly visible in the saved game files even when the object file is encrypted, so it's not a very effective shield.
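The XOR step described above can be sketched in a few lines of Python. This is only an illustration: the function name `xor_with_key` is made up for this example, and it assumes the whole file is held in memory as bytes.

```python
from itertools import cycle

# 11-byte key named in the text; AGDS games would use b"Alex Simkin" instead.
KEY = b"Avis Durgan"

def xor_with_key(data: bytes, key: bytes = KEY) -> bytes:
    """XOR each byte with the key, restarting the key every len(key) bytes."""
    return bytes(b ^ k for b, k in zip(data, cycle(key)))
```

Because XOR is its own inverse, calling the same function a second time restores the original bytes, so one routine serves for both encrypting and decrypting.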

File format

    Byte  Meaning
    ----- -----------------------------------------------------------
     0-1  Offset of the start of inventory item names
      2   Maximum number of animated objects
    ----- -----------------------------------------------------------

Following the first three bytes is a section containing a three-byte entry for each inventory item, all of which conform to the following format:

    Byte  Meaning
    ----- -----------------------------------------------------------
     0-1  Offset of inventory item name i
      2   Starting room number for inventory item i (255 = carried)
    ----- -----------------------------------------------------------

where i is the entry number starting at 0. All offsets are taken from the start of the entry for inventory item 0 (not the start of the file).

Then come the textual names themselves. This is simply a list of null-terminated strings. Each offset mentioned in the above section points to the first character of a string, and the last character is the one before the 0x00.
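As a rough sketch of how the layout above could be read, the following Python function walks the header, the three-byte entries and the name strings. It assumes the data has already been decrypted, that the two-byte offsets use the LO-HI order usual in AGI, and that the number of entries can be derived by dividing the names offset by three; the function name and the returned tuple shape are invented for this example, and the Amiga layout is not handled.

```python
import struct

HEADER_SIZE = 3  # names offset (2 bytes) + maximum animated objects (1 byte)
ENTRY_SIZE = 3   # name offset (2 bytes) + starting room (1 byte)

def parse_object_file(data: bytes):
    """Return (max_animated_objects, [(name, starting_room), ...])."""
    names_offset, max_animated = struct.unpack_from("<HB", data, 0)
    base = HEADER_SIZE  # all offsets are relative to entry 0, not the file start
    # Assumption: the entries fill the space between entry 0 and the names.
    count = names_offset // ENTRY_SIZE
    items = []
    for i in range(count):
        name_off, room = struct.unpack_from("<HB", data, base + i * ENTRY_SIZE)
        start = base + name_off
        end = data.index(b"\x00", start)  # the name ends just before the 0x00
        items.append((data[start:end].decode("latin-1"), room))
    return max_animated, items
```

A starting room of 255 in the returned list marks an item that is carried from the start of the game.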

Notes for the Amiga Platform

The Amiga platform uses 4 bytes for each entry; the extra byte is padding.

Format of words.tok

The words.tok file is used to store the game's vocabulary, i.e. the dictionary of words that the interpreter understands. Each word is stored along with a word number, which is what the said test command takes as its argument values. Many words can have the same word number, which basically means that these words are synonyms for each other as far as the game is concerned.

The file itself is both packed and encrypted. Words are stored in alphabetical order, which is required for the compression method to work.

The first section

At the start of the file is a section that is always 26x2 bytes long. This section contains a two byte entry for every letter of the alphabet. It is essentially an index which gives the starting location of the words beginning with the corresponding letter.

    Byte  Meaning
    ----- -----------------------------------------------------------
     0-1  Hi and then Lo byte for 'A' offset
     ...
    50-51 Hi and then Lo byte for 'Z' offset
     52   Words section
    ----- -----------------------------------------------------------

The important thing to note from the above is that the 16 bit words are big-endian (HI-LO). The little endian (LO-HI) byte order convention used everywhere else in the AGI system is not used here. For example, 0x00 and 0x24 means 0x0024, not 0x2400. Big endian words are used later on for word numbers as well.

All offsets are taken from the beginning of the file. If no words start with a particular letter, then the offset in that field will be 0x0000.
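A minimal sketch of reading this index in Python is shown here. The function name `read_letter_index` and the choice to return a dictionary keyed by upper-case letter are assumptions made for illustration; letters whose offset is 0x0000 are left out.

```python
import string
import struct

def read_letter_index(data: bytes) -> dict:
    """Map each letter that has words to its offset from the start of the file."""
    # ">26H" reads 26 unsigned 16-bit values in big-endian (HI-LO) order.
    offsets = struct.unpack_from(">26H", data, 0)
    return {
        letter: offset
        for letter, offset in zip(string.ascii_uppercase, offsets)
        if offset != 0  # 0x0000 means no words start with this letter
    }
```

With this byte order, the pair 0x00, 0x24 from the example above decodes to 0x0024.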

The words section

Words are stored in a compressed way in which each word uses part of the previous word as a starting point for itself. For example, "forearm" and "forest" both have the prefix "fore". If "forest" comes immediately after "forearm", then the data for "forest" will specify that it starts with the first four characters of the previous word. Whether this method is used to further confuse would-be cheaters or whether it is to help in the searching process, I don't yet know, but it most certainly isn't purely for compression, since the words.tok file is usually quite small and no attempt is made to compress any of the larger files (before AGI version 3, that is).

    Byte  Meaning
    ----- -----------------------------------------------------------
      0   Number of characters to include from start of previous word
      1   Char 1 (xor 0x7F gives the ASCII code for the character)
      2   Char 2
     ...
      n   Last char
     n+1  Wordnum (two bytes, HI-LO) -- see below
    ----- -----------------------------------------------------------

If a word does not use any part of the previous word, then the prefix field is equal to zero. This will always be the case for the first word starting with a new letter. There is nothing to indicate where the words starting with one letter finish and the next set starts; in fact, the words section is just one continuous chain of words conforming to the above format. The index section mentioned earlier is not needed to read the words in, which suggests that the whole words.tok format is organised to find words quickly.
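The chain of words can be unpacked with a loop like the Python sketch below. Two things in it are assumptions rather than statements from the tables above: the end of each word is detected by the high bit (0x80) of its last stored character being set, which is how existing interpreters such as ScummVM read the file, and any trailing bytes too short to form a record are ignored. The word number is read as big-endian, as stated in the first section. The function name `decode_words` is invented for this example.

```python
import struct

INDEX_SIZE = 26 * 2  # the letter index at the start of the file
MIN_RECORD = 4       # prefix byte + at least one character + 2-byte word number

def decode_words(data: bytes):
    """Return [(word, word_number), ...] in file order."""
    words = []
    previous = ""
    pos = INDEX_SIZE
    while pos + MIN_RECORD <= len(data):
        prefix_len = data[pos]
        pos += 1
        chars = []
        while pos < len(data):
            byte = data[pos]
            pos += 1
            # XOR with 0x7F recovers the ASCII code; mask off the end marker bit.
            chars.append(chr((byte ^ 0x7F) & 0x7F))
            if byte & 0x80:  # assumed end-of-word marker on the last character
                break
        word = previous[:prefix_len] + "".join(chars)
        (number,) = struct.unpack_from(">H", data, pos)
        pos += 2
        words.append((word, number))
        previous = word
    return words
```

Note that the loop never consults the letter index, which matches the observation that the index is not needed to read the words in.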

A note about word numbers

Some word numbers have special meaning. They are listed below:

   Word# Meaning
   ----- -----------------------------------------------------------
     0   Words are ignored (e.g. the, at)
     1   Anyword
   9999  ROL (Rest Of Line) -- it does not matter what the rest of
         the input list is
   ----- -----------------------------------------------------------

Example:

<syntax type="C++">
if (said(take, anyword)) {
  print("You can't - Blackbeard has chopped both your arms off.");
}
</syntax>
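To make the role of the special word numbers concrete, here is a hypothetical Python sketch of how a said test might compare the player's input against its arguments. It is not taken from any interpreter: the function name `said_matches`, the list-based interface and the exact matching rules are assumptions that merely follow the table above (0 is dropped from the input, 1 matches any single word, 9999 accepts whatever remains).

```python
IGNORED = 0          # words such as "the" and "at"
ANYWORD = 1          # matches any single word
REST_OF_LINE = 9999  # matches whatever is left of the input

def said_matches(input_numbers, pattern):
    """Compare parsed input word numbers against the arguments of a said test."""
    words = [n for n in input_numbers if n != IGNORED]
    for i, expected in enumerate(pattern):
        if expected == REST_OF_LINE:
            return True   # the rest of the input does not matter
        if i >= len(words):
            return False  # the input ran out before the pattern did
        if expected != ANYWORD and expected != words[i]:
            return False
    # Without ROL, the input must not contain extra words.
    return len(words) == len(pattern)
```

Under these rules, an input such as "take the key" would satisfy said(take, anyword), because "the" carries word number 0 and is skipped before the comparison.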

Sample code

The following examples are available in the distribution package:

  • object.pas by Peter Kelly: displays contents of the object file
  • words.pas by Peter Kelly: displays contents of the words.tok file