SCUMM/Virtual Machine

Revision as of 14:41, 22 April 2006 by Joachimeberhard (talk | contribs) (Initial move in of http://www.scummvm.org/docs/specs/scrp.php)

The SCUMM virtual machine

The SCUMM virtual machine was given a complete rewrite between version 5 and version 6. Both virtual machines are simple byte-code machines with built-in cooperative threading; up to 25 threads can exist at once. Each thread gets 16 words of local storage. Up to 8192 words and 32768 bits of global storage are also available. The word and bit storage areas occupy separate address spaces. The exception to this rule is Zak256, where both actually use the same memory (possibly this is also true for other older SCUMM versions).

The V5 machine has hard-coded limits of 800 word variables and 2048 bit variables. All operations are done to and from a variable (global or local). The V6 machine stores the number of variables in the MAXS resource in the index and uses a 100-word global stack for performing operations.

Additional storage is supplied in Array resources. V5 requires arrays to be declared in the data file; V6 can define new ones on the fly, which means they can be used as a dynamic heap.

Pointers

Both V5 and V6 use a common scheme for encoding the address of something in memory. These encoded addresses are referred to as pointers, and they come in several forms depending on what is being pointed at.

Word variables

Bits:   15  14  13  12..0
Value:  0   0   0   address

This is a plain direct reference to the word global storage.

Bit variables

Bits:   15  14..0
Value:  1   address

Local variables

Bits:   15  14  13  12..4  3..0
Value:  0   1   ?   0      var no

Indirected word variables (V5 only)

Bits:         15  14  13  12..0
First word:   0   0   1   address
Second word:  0   0   I   offset

If bit 13 is set, then the next 16-bit word from the script is fetched (represented by the second row in the above diagram). If the I bit of this second word is not set, the effective word address becomes address+offset. If it is set, the effective word address is address+[offset], where [offset] is the value of the word variable at offset.
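
The layouts above can be turned into a small decoder. The following Python sketch is only an illustration of the bit tests described in this section; the function name, its arguments, and the order in which the bits are tested are assumptions, and the unknown "?" bit of local variable pointers is simply ignored.

```python
def decode_pointer(word, next_word=None, read_word=None):
    """Classify a 16-bit pointer using the bit layouts described above.

    Returns a (kind, address) tuple. `next_word` is the following 16-bit
    word from the script and is only needed for V5 indirected pointers.
    `read_word` is a callable returning the value of a global word
    variable; it is only called when the I bit is set.
    """
    if word & 0x8000:                     # bit 15 set: bit variable
        return ("bit", word & 0x7FFF)     # address is bits 14..0
    if word & 0x4000:                     # bit 14 set: local variable
        return ("local", word & 0x000F)   # var no is bits 3..0
    address = word & 0x1FFF               # bits 12..0
    if word & 0x2000:                     # bit 13 set: indirected (V5 only)
        if next_word is None:
            raise ValueError("indirected pointer needs a second word")
        offset = next_word & 0x1FFF
        if next_word & 0x2000:            # I bit: offset names a word variable
            offset = read_word(offset)
        return ("word", address + offset)
    return ("word", address)              # plain global word variable
```

For example, decode_pointer(0x4003) yields ("local", 3), and decode_pointer(0x2005, 0x0002) yields ("word", 7).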

Threads

Both engines have automatic cooperative threading. Instructions will be executed out of a single script until it suspends for some reason; the flow of execution will then move on to the next script. Each script has the opportunity to run once per frame, with some exceptions.

In V6, because the stack is shared among all threads, it is the script's responsibility to ensure that the stack is empty when executing an instruction that could cause the thread to be descheduled. Leaving items on the stack is highly dangerous! If state needs to be saved from one frame to the next, it should be stored in local variables.

Threads may be in one of the following states:

State     Meaning
RUNNING   the thread is currently executing
PENDED    the thread was executing, but has spawned another thread. When the child thread is descheduled, the pended thread will start executing again
DELAYED   the thread is not executing, and is waiting for a timer to expire
FROZEN    the thread is suspended and will not execute until it has been thawed again

A thread may spawn another thread at any point. When this happens, the parent thread moves into the PENDED state and stops executing. The child thread starts executing immediately. When the child thread is descheduled, the parent moves back into the RUNNING state and continues on immediately after the instruction that created the thread. PENDED threads may be nested up to 15 times.
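
The spawning rule behaves like a stack of waiting parents. The Python sketch below models just that rule. The class and method names are hypothetical, and the state a child is left in when it is descheduled is passed in by the caller, since it depends on why the child stopped.

```python
MAX_NEST = 15  # nesting limit for PENDED threads given in the text


class Thread:
    def __init__(self, name):
        self.name = name
        self.state = "RUNNING"


class Nest:
    """Tracks the chain of PENDED parents above the running thread."""

    def __init__(self, root):
        self.pended = []      # waiting parents, innermost last
        self.current = root   # the thread that is executing right now

    def spawn(self, child):
        """The current thread starts `child` and becomes PENDED."""
        if len(self.pended) >= MAX_NEST:
            raise RuntimeError("too many nested PENDED threads")
        self.current.state = "PENDED"
        self.pended.append(self.current)
        child.state = "RUNNING"
        self.current = child

    def deschedule(self, new_state="DELAYED"):
        """The current thread stops; its parent, if any, resumes."""
        self.current.state = new_state
        if self.pended:
            self.current = self.pended.pop()
            self.current.state = "RUNNING"
        else:
            self.current = None
```

A real interpreter would also record where the parent should resume in its script; that bookkeeping is left out here.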

To summarise:

A thread will run once every frame, unless it is delayed or frozen.

Any code may start a new thread at any point. (Sometimes this is done automatically.) The new thread will immediately run until it is descheduled, at which point execution continues in the parent thread.

Any non-PENDED thread may be frozen at any point. That thread will not be run again until it is unfrozen. Attempts to freeze a PENDED thread will be ignored. It is possible to mark a thread as unfreezable when it is created; however, even unfreezable threads can be frozen with the appropriate code (see the freezeScripts and freezeUnfreeze opcodes).
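
The freezing rules in the summary can be sketched as follows. This is a hypothetical Python model, not the engine's actual bookkeeping: the `force` flag stands in for "the appropriate code", and restoring the previous state on thaw is an assumption, since the text does not say which state a thawed thread returns to.

```python
class Thread:
    def __init__(self, state="RUNNING", unfreezable=False):
        self.state = state
        self.unfreezable = unfreezable
        self.saved_state = None   # state to restore on thaw (assumption)


def freeze(thread, force=False):
    """Try to freeze `thread`; return True if it is now newly frozen."""
    if thread.state == "PENDED":
        return False              # freezing a PENDED thread is ignored
    if thread.state == "FROZEN":
        return False              # already frozen, nothing to do
    if thread.unfreezable and not force:
        return False              # unfreezable threads need the forced form
    thread.saved_state = thread.state
    thread.state = "FROZEN"
    return True


def unfreeze(thread):
    """Thaw a frozen thread, restoring the state it had before."""
    if thread.state == "FROZEN":
        thread.state = thread.saved_state
        thread.saved_state = None
```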

Version 5 instruction encoding

Unfortunately, there is no set V5 instruction encoding. In the general form, instructions take a fixed number of parameters and return an optional value. The format is:

opcode [result] [parameter1] [parameter2]...

The result is a word pointer as usual. The parameters are little-endian word or byte values, depending on the opcode. Whether each parameter is a constant or a pointer is encoded in the top bits of the opcode. For example, the add instruction takes one parameter, and so the opcode looks like this:

Bits:   7   6..0
Value:  p1  $5A

If the p1 bit is set, then the parameter is evaluated as a pointer and dereferenced. Otherwise it is treated as a constant. If multiple parameters are present, the bits work down from the MSB to the LSB.

However, many instructions are exceptions to this rule. For example, instructions such as actorSet that take an auxiliary opcode will put it somewhere in the instruction stream; actorSet puts it after the first parameter, and then the other parameters vary according to the particular operation. jumpRelative puts the jump target immediately after the opcode encoded as a constant. Some opcodes (such as drawBox) take too many parameters and have a supplementary opcode byte to contain the extra parameter bits. You have been warned.
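
For instructions that do follow the general form, decoding can be sketched as below, using add as the example. This Python sketch rests on several assumptions: the result and the parameter are plain global word pointers (top three bits clear), the parameter is a word, and add stores the sum of the result variable and the parameter back into the result variable. All function names are hypothetical.

```python
ADD = 0x5A  # base opcode for add, taken from the diagram above


def param_is_pointer(opcode, index):
    """True if parameter `index` (0-based) is flagged as a pointer.

    The flags start at the MSB (bit 7) and work down towards the LSB.
    """
    return bool(opcode & (0x80 >> index))


def read_word_le(code, pos):
    """Read a little-endian 16-bit word from the script bytes."""
    return code[pos] | (code[pos + 1] << 8)


def exec_add(code, pos, word_vars):
    """Execute one add instruction at `pos`; return the next position."""
    opcode = code[pos]
    if opcode & 0x7F != ADD:
        raise ValueError("not an add instruction")
    result = read_word_le(code, pos + 1)   # where the sum is stored
    param = read_word_le(code, pos + 3)    # constant or pointer
    if param_is_pointer(opcode, 0):
        param = word_vars[param]           # p1 set: dereference
    word_vars[result] += param
    return pos + 5
```

With this sketch, the bytes 5A 05 00 07 00 add the constant 7 to word variable 5, while DA 05 00 03 00 add the value of word variable 3 instead.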

Version 6 instruction encoding

After V5, the byte-code engine was rewritten completely. The new engine is far more orthogonal and considerably easier to understand.

Most instructions use the stack for all input and output, so the opcodes in this case are single bytes with no parameters. Some instructions do take parameters, but these are always of fixed size and type and so no parameter type bits are needed.

As a result of this more efficient encoding, the V6 engine actually has more instructions than the V5 one, despite using fewer entries in the opcode map.
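
The stack-based style described in this section can be illustrated with a toy interpreter. The Python sketch below is not the real V6 opcode map: the opcode numbers and names are chosen for illustration only, and just three instructions are modelled. The 100-word stack limit is the figure given earlier on this page.

```python
STACK_LIMIT = 100  # size of the global stack in words

# Hypothetical opcode numbers, chosen for this sketch only.
OP_PUSH_WORD = 0x01  # followed by one little-endian 16-bit constant
OP_ADD = 0x14        # no parameters: pops two words, pushes their sum
OP_END = 0xFF        # stops the sketch and returns the stack


def run(code):
    """Run a tiny stack program and return the final stack contents."""
    stack = []
    pos = 0
    while True:
        op = code[pos]
        pos += 1
        if op == OP_PUSH_WORD:
            if len(stack) >= STACK_LIMIT:
                raise OverflowError("stack overflow")
            stack.append(code[pos] | (code[pos + 1] << 8))
            pos += 2                  # fixed-size parameter, no type bits
        elif op == OP_ADD:
            b = stack.pop()
            a = stack.pop()
            stack.append(a + b)
        elif op == OP_END:
            return stack
        else:
            raise ValueError("unknown opcode")
```

Note that the add instruction is a single byte here, because both of its inputs and its output live on the stack.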

Arrays

Both V5 and V6 allow Array resources to be used as a dynamic heap. V5 allows up to 32 one-dimensional arrays, also referred to as Strings; V6 allows an arbitrary number (the maximum is defined in the MAXS chunk) of two-dimensional arrays. V5 arrays can only store bytes, while V6 ones can store bytes or words.

Before an Array can be used, it must be defined. V5 does this with arrayOps/5, passing in the Array resource number (1..32) and the size. The specified Array is initialised to zero. V6 does it with the dim, dim2 or arrayOps/208 opcodes, depending on the type of array and how it is to be initialised.
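
The V5 definition step can be modelled in a few lines. This Python sketch only captures what is stated above (resource numbers 1..32, byte storage, zero initialisation); the class and method names are hypothetical and it does not reproduce the arrayOps encoding itself.

```python
MAX_V5_ARRAYS = 32  # V5 allows Array resources numbered 1..32


class V5Arrays:
    """A table of V5-style one-dimensional byte arrays."""

    def __init__(self):
        self.slots = {}   # resource number -> bytearray

    def define(self, number, size):
        """Define Array `number` with `size` bytes, all set to zero."""
        if not 1 <= number <= MAX_V5_ARRAYS:
            raise ValueError("Array resource number must be 1..32")
        self.slots[number] = bytearray(size)   # bytearray starts zeroed

    def write(self, number, index, value):
        self.slots[number][index] = value & 0xFF   # V5 arrays store bytes

    def read(self, number, index):
        return self.slots[number][index]
```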