SCI/Specifications/SCI virtual machine/Introduction

From ScummVM :: Wiki

Introduction

Script resources

Like any processor, the SCI virtual machine is virtually useless without code to execute. This code is provided by script resources, which constitute the logic behind any SCI game.

Before the VM can operate on script resources, they first have to be loaded onto the heap. The heap is the only memory space that the VM can work on directly (with some restrictions); all other memory spaces have to be accessed, implicitly or explicitly, through kernel calls. The heap also contains a stack, which is heavily used by SCI bytecode.

Each script resource may contain one or more blocks of the following types:

  • Type 1: Object
  • Type 2: Code
  • Type 3: Synonym word lists
  • Type 4: Said specs
  • Type 5: Strings
  • Type 6: Class
  • Type 7: Exports
  • Type 8: Relocation table
  • Type 9: Preload text (a flag, rather than a real section)
  • Type 10: Local variables

Standard SCI0 scripts (of post-0.000.396 SCI0, approximately) consist of a sequence of blocks, each made up of a four-byte header followed by the block's data:

  • [00][01]: Block type as LE 16 bit value, or 0 to terminate script resource
  • [02][03]: Block size as LE 16 bit value; includes header size
  • [04]...: Data
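
The block structure above can be walked with a few lines of Python. This is only a minimal sketch under the assumptions stated in the list (little-endian fields, size includes the header, type 0 terminates); the names iter_blocks and BLOCK_TYPES are made up for this illustration and are not taken from any interpreter.

```python
import struct

# Block type numbers as listed above; the dictionary itself is illustrative.
BLOCK_TYPES = {
    1: "object", 2: "code", 3: "synonyms", 4: "said specs", 5: "strings",
    6: "class", 7: "exports", 8: "relocation", 9: "preload text", 10: "locals",
}

def iter_blocks(script: bytes):
    """Yield (block_type, data) pairs until the terminating type 0 is read."""
    pos = 0
    while pos + 2 <= len(script):
        (block_type,) = struct.unpack_from("<H", script, pos)
        if block_type == 0:
            return  # type 0 terminates the script resource
        if pos + 4 > len(script):
            raise ValueError(f"truncated block header at offset {pos}")
        (block_size,) = struct.unpack_from("<H", script, pos + 2)
        # The size includes the four header bytes, so it can never be < 4.
        if block_size < 4 or pos + block_size > len(script):
            raise ValueError(f"bad block size {block_size} at offset {pos}")
        yield block_type, script[pos + 4 : pos + block_size]
        pos += block_size

# Hand-made sample: a locals block with two values, a preload text flag,
# and the terminator.
demo = (
    struct.pack("<HH", 10, 8) + struct.pack("<HH", 1, 2)
    + struct.pack("<HH", 9, 4)
    + struct.pack("<H", 0)
)
blocks = list(iter_blocks(demo))
```

Note that the preload text block comes out with empty data, matching its description as a flag rather than a real section.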

The code blocks contain the SCI bytecode that actually gets executed. The export block (of which there may be at most one per script) contains script-relative pointers to exported functions, which can be called by the SCI operations calle and callb. The local variables block, which stores one of the four variable types, is used to share variables among the objects and classes of one script.

But the most important script members are Objects and Classes. As in usual OOP terms, Classes are object prototypes, and Objects are instantiated Classes. However, unlike most OOP languages, SCI treats base classes very similarly to objects, so that they may actually get called by the SCI bytecode. Therefore, they also have their own space for selectors (see below). Also, each object or class knows which class it inherits from and, in the case of objects, which class it was instantiated from.

Note that all script segments are optional and 16 bit aligned; they are described in more detail below:


Object segments

Objects look like this (LE 16 bit values):

  • [00][01]: Magic number 0x1234
  • [02][03]: Local variable offset (filled in at run-time)
  • [04][05]: Offset of the function selector list, relative to its own position
  • [06][07]: Number of variable selectors (= #vs)
  • [08][09]: The 'species' selector
  • [0a][0b]: The 'superClass' selector
  • [0c][0d]: The '-info-' selector
  • [0e][0f]: The 'name' selector (object/class name)
  • [10]...: (#vs-4) more variable selectors
  • [08 + #vs*2][09 + #vs*2]: Number of function selectors (= #fs)
  • [0a + #vs*2]...: Selector IDs for the functions
  • [08 + #vs*2 + #fs*2][09 + #vs*2 + #fs*2]: Zero
  • [0a + #vs*2 + #fs*2]...: Function selector code pointers
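
As a rough illustration, the first part of this layout can be decoded as follows. The sketch reads the header, the variable selectors and the function selector IDs at the offsets listed above and deliberately stops before the code pointers; parse_object and the dictionary keys are hypothetical names chosen for this example.

```python
import struct

OBJECT_MAGIC = 0x1234

def parse_object(data: bytes) -> dict:
    """Decode an object block up to and including its function selector IDs."""
    magic, locals_offset, funcsel_offset, num_vars = struct.unpack_from("<4H", data, 0)
    if magic != OBJECT_MAGIC:
        raise ValueError(f"bad magic number {magic:#06x}")
    if num_vars < 4:
        raise ValueError("species, superClass, -info- and name must be present")
    # Variable selector values start at offset 0x08, two bytes each.
    variables = struct.unpack_from(f"<{num_vars}H", data, 0x08)
    # The function selector count follows directly after the variables.
    count_pos = 0x08 + num_vars * 2
    (num_funcs,) = struct.unpack_from("<H", data, count_pos)
    func_ids = struct.unpack_from(f"<{num_funcs}H", data, count_pos + 2)
    return {
        "locals_offset": locals_offset,
        "funcsel_offset": funcsel_offset,
        "species": variables[0],
        "superClass": variables[1],
        "info": variables[2],
        "name": variables[3],
        "extra_variables": variables[4:],
        "function_selector_ids": func_ids,
    }

# Hand-made sample with five variable selectors and two function selectors.
sample = (
    struct.pack("<4H", OBJECT_MAGIC, 0, 0, 5)
    + struct.pack("<5H", 7, 3, 0, 0x40, 99)
    + struct.pack("<H", 2)
    + struct.pack("<2H", 0x20, 0x21)
)
obj = parse_object(sample)
```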

For objects, the selectors are simply values for the selector IDs specified in their species class. The species class is referenced either by its offset (in memory) or by its class ID (in the script); the same holds for the species' superclass (the superClass selector). Info typically has one of the following values (although this does not appear to be relevant for SCI):

  • 0x0000: Normal (static) object
  • 0x0001: Clone
  • 0x8000: Class

Other values are used, but do not appear to be of relevance.[1]


Code segments

Code segments contain free-form SCI bytecode. Pointers into this code are held by objects, classes, and export entries; these entries are, in turn, referenced in the export segment.


Synonym word list segments

Inside these, synonyms for certain words may be found. A synonym is a tuple (a, b), where both a and b are word groups, and b is the replacement for a if this synonym is in use. They are stored as 16 bit LE values in sequence (first a, then b). Synonyms must be set explicitly by the kernel function SetSynonyms() (as described in Section 5.5.2.39). It is not possible to enable synonyms selectively.
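
A minimal sketch of how such a list could be read and applied, assuming the block data consists of nothing but consecutive (a, b) pairs (any terminator or padding the real format may use is not handled). The helper names parse_synonyms and apply_synonyms are invented for this example.

```python
import struct

def parse_synonyms(data: bytes) -> list:
    """Return the (a, b) word-group pairs stored as consecutive 16 bit LE values."""
    pair_count = len(data) // 4  # each pair occupies two 16 bit values
    return [struct.unpack_from("<2H", data, i * 4) for i in range(pair_count)]

def apply_synonyms(word_groups: list, synonyms: list) -> list:
    """Replace every word group a with b wherever a pair (a, b) is present."""
    table = dict(synonyms)
    return [table.get(group, group) for group in word_groups]

# Hand-made sample: group 0x10 is replaced by 0x20, group 0x11 by 0x21.
pairs = parse_synonyms(struct.pack("<4H", 0x10, 0x20, 0x11, 0x21))
replaced = apply_synonyms([0x10, 0x05, 0x11], pairs)
```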


Said spec segments

This section contains said specs (explained in Section 6.2.4), tightly grouped.

String segments

This segment contains a sequence of asciiz strings describing class and object names, debug information, and (occasionally) game text.

Class segments

Classes look similar to objects:

  • [00][01]: Magic number 0x1234
  • [02][03]: Local variable offset (filled in at run-time)
  • [04][05]: Offset of the function selector list, relative to its own position
  • [06][07]: Number of variable selectors (= #vs)
  • [08][09]: The 'species' selector
  • [0a][0b]: The 'superClass' selector
  • [0c][0d]: The '-info-' selector
  • [0e][0f]: The 'name' selector (object/class name)
  • [10]...: (#vs-4) more variable selectors
  • [08 + #vs*2][09 + #vs*2]: Selector ID of the first varselector (0)
  • [0a + #vs*2]...: Selector ID of the second etc. varselectors
  • [08 + #vs*4][09 + #vs*4]: Number of function selectors (#fs)
  • [0a + #vs*4]...: Function selector code pointers
  • [08 + #vs*4 + #fs*2][09 + #vs*4 + #fs*2]: 0
  • [0a + #vs*4 + #fs*2]...: Selector ID of the first etc. funcselectors

Simply put, they look like objects with each selector section followed by a list of selector IDs.
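
To illustrate the extra ID list, the following sketch pairs each variable selector value of a class with its selector ID, using only the offsets given above for the variable part. The function name class_variable_selectors is hypothetical.

```python
import struct

def class_variable_selectors(data: bytes) -> dict:
    """Map selector ID -> default value for the variable selectors of a class."""
    magic, _locals, _funcsel, num_vars = struct.unpack_from("<4H", data, 0)
    if magic != 0x1234:
        raise ValueError(f"bad magic number {magic:#06x}")
    # Values start at 0x08; the matching IDs follow directly after them.
    values = struct.unpack_from(f"<{num_vars}H", data, 0x08)
    ids = struct.unpack_from(f"<{num_vars}H", data, 0x08 + num_vars * 2)
    return dict(zip(ids, values))

# Hand-made sample class with only the four mandatory selectors (IDs 0..3).
sample_class = (
    struct.pack("<4H", 0x1234, 0, 0, 4)
    + struct.pack("<4H", 5, 2, 0x8000, 0x30)   # species, superClass, -info-, name
    + struct.pack("<4H", 0, 1, 2, 3)           # selector IDs of those variables
)
class_vars = class_variable_selectors(sample_class)
```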

Export segments

External symbols are contained herein, the number of which is described by the first (16 bit LE) value in the segment. All the values that follow point to addresses that the program counter will jump to when a calle operation is invoked. An exception is script 0, entry 0, which points to the first object whose 'play' method should be invoked during startup (a magical entry point like C's 'main()' function).
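
Reading the export table is straightforward; a short sketch, with parse_exports as an illustrative name:

```python
import struct

def parse_exports(data: bytes) -> list:
    """Return the script-relative addresses listed in an export block."""
    (count,) = struct.unpack_from("<H", data, 0)
    return list(struct.unpack_from(f"<{count}H", data, 2))

# Hand-made sample with three exported entries.
exports = parse_exports(struct.pack("<4H", 3, 0x10, 0x24, 0x30))
```

The export number given to calle or callb would then be an index into the returned list.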

Relocation tables

This section contains script-relative pointers pointing to pointers inside the script. These refer to script-relative addresses and need to be relocated when the script is loaded to the heap; this is done by adding the offset of the first byte of the script on the heap to each of the values referenced in this section.[2] The section itself starts with a 16 bit LE value containing the number of pointers that follow; each of the script-relative 16 bit pointers after it has the semantics described above.
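
The relocation step can be sketched as below, modelling the heap as a bytearray. The function name relocate, the parameter names and the 16 bit wrap-around are assumptions made for this example.

```python
import struct

def relocate(heap: bytearray, script_base: int, reloc_data: bytes) -> None:
    """Add script_base to every 16 bit value the relocation table points at."""
    (count,) = struct.unpack_from("<H", reloc_data, 0)
    for pointer in struct.unpack_from(f"<{count}H", reloc_data, 2):
        address = script_base + pointer  # the table entries are script-relative
        (value,) = struct.unpack_from("<H", heap, address)
        # Assumption: heap addresses are 16 bit, so the sum is masked.
        struct.pack_into("<H", heap, address, (value + script_base) & 0xFFFF)

# Hand-made sample: a script loaded at heap offset 16 whose word at
# script offset 4 holds the script-relative address 6.
heap = bytearray(32)
struct.pack_into("<H", heap, 16 + 4, 6)
relocate(heap, 16, struct.pack("<HH", 1, 4))
relocated_value = struct.unpack_from("<H", heap, 20)[0]
```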

The Preload Text flag

This is an actual script section, although it is always of size 4 (i.e. it consists only of the four-byte block header). It is only checked for presence; if script.x is loaded and contains this section, the text.x resource is also loaded implicitly along with it.[3]

Local variable segments

This section contains the script’s local variable segment, which consists of a sequence of 16 bit little-endian values.


Selectors

Selectors are very important in SCI. They can be either methods or object/class-relative variables, and this makes the interpretation of SCI operations like send a bit tricky.

Each class comes with two two-dimensional tables. The first table contains selector values and selector indices[4] for each variable selector. The second table contains selector indices and script-relative method offsets. Objects look nearly identical, but they do not contain the list of selector indices for variable selectors, since those can be looked up at the class they were instantiated from (their "species", which happens to be one of the variable selectors).

Now, whenever a selector is sent for, the engine has to determine the right action to take. FreeSCI first determines whether the selector is a variable selector, by looking for it in the list of variable selector indices of the species class of the object that the "send" was sent to (classes use their own class number[4] as their species class).[5] If it is, the selector value is either read (if no parameter was provided to the send call) or set (if one parameter was provided). If the selector is not among the variable selectors of the specified object, the object's methods are checked for this selector index. If they do not contain the selector index either, FreeSCI recurses into checking the method selectors of the object's superclasses. If it finds the selector there, it calls the heap address corresponding to the selector index.
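
The lookup order described here can be modelled in a few lines. This is a simplified sketch, not FreeSCI's code: SciObject, send and the returned tuples are invented for the example, objects hold direct references instead of heap addresses, and rejecting more than one argument for a variable selector is an assumption.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class SciObject:
    name: str
    species: Optional["SciObject"]      # None: this is a class, its own species
    super_class: Optional["SciObject"]
    var_ids: list                       # selector IDs; only filled in for classes
    var_values: list
    methods: dict = field(default_factory=dict)  # selector ID -> code address

def send(obj: SciObject, selector: int, args: tuple = ()):
    """Resolve a selector: variable read/write first, then the method chain."""
    species = obj.species or obj
    if selector in species.var_ids:
        index = species.var_ids.index(selector)
        if len(args) == 0:
            return ("read", obj.var_values[index])
        if len(args) == 1:
            obj.var_values[index] = args[0]
            return ("write", args[0])
        raise ValueError("assumed: variable selectors take at most one argument")
    current = obj
    while current is not None:          # the object first, then its superclasses
        if selector in current.methods:
            return ("call", current.methods[selector])
        current = current.super_class
    raise LookupError(f"selector {selector} not found for {obj.name}")

# Hand-made hierarchy: Base <- Actor (classes), ego is an instance of Actor.
base = SciObject("Base", None, None, [0, 1, 2, 3], [0, 0, 0x8000, 0], {42: 0x100})
actor = SciObject("Actor", None, base, [0, 1, 2, 3, 10], [1, 0, 0x8000, 0, 5], {43: 0x200})
ego = SciObject("ego", actor, actor, [], [1, 0, 0, 0, 5])
```

With this setup, send(ego, 10) reads the variable with selector ID 10, send(ego, 10, (9,)) overwrites it, and send(ego, 42) finds the method inherited from Base.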

Function invocation

SCI provides three distinct ways of invoking a function[6]:

  • Calling exported functions (calle, callb)
  • Calling selector methods (send, self, super)
  • Calling PC-relative addresses (call)

Exported functions are called by providing a script number and an exported function number (which is then looked up in the script's Type 7 block). They use the object they were called from to look up local variables and selectors for self and super.

Selector methods are called by providing an object and a selector index. The selector index gets looked up in the object's selector tables, and, if it is used for a method, this method gets invoked. The provided object is used for local references.

PC-relative calls only make sense inside scripts, since they jump to a position relative to the call opcode. The calling object is used for local references.

Variable types

SCI bytecode can address four types of variables (not counting the variable selectors). Those variable types are:

Local variables
These are the variables stored in Type 10 script blocks. They are shared between the objects and classes of each script.
Global variables
These variables are the local variables of script 0.
Temporary variables
These variables are stored on the stack. They are relative to the stack frame of the current method, so space for them must be allocated before they can be used. This is commonly done by using the link operation.
Parameters
Parameters are stored on the stack below the current stack frame, as they technically belong to the calling function. They can be modified, if necessary.[7]

Notes

  1. See SQ3’s inventory objects for an example
  2. Thanks to Francois Boyer for this information.
  3. This is ignored by FreeSCI at this moment, since all resources are present in memory all the time.
  4. Those can be used as an index into vocab.997, where the selector names are stored as strings.
  5. In practice, send looks up the heap position of the requested class in a global class table.
  6. Of course, "manual" invocation (using push and jump operations) could also be used, but there are no special provisions for it, and it does not appear to be used in the existing SCI bytecode.
  7. Obviously, SCI uses a call-by-value model for primitives and call-by-reference for objects.