SCI/Specifications/Sound/SCI0 Resource Format

The SCI0 Sound Resource Format

Original document by Ravi Iyengar


Sierra’s SCI0 sound resources contain the music and sound effects played during the game. With the introduction of SCI, the company took advantage of new sound hardware which allowed for far better music than the traditional PC speaker could ever create. Sierra specifically targeted two devices: the MT-32 and the AdLib. The MT-32 is a MIDI synth while the AdLib is a less expensive card based around the OPL2, a non-MIDI chip. Anyone interested in Sierra music and its history can find information at the Sierra Soundtrack Series.

Music is stored as a series of MIDI events, and the sound resource is basically just a MIDI file. The MIDI standard and device implementations are not covered here in detail, but specifications should be readily available elsewhere. SCI0 Sound resources can also contain digital samples, although an SCI remake of KQ1 is the only DOS game I know of that includes them. These files still contain MIDI data, but the wave data is appended at the end. The MIDI data is an approximation of the sound effect for hardware that can’t play digital sound.

Some people prefer the one-based numbering system for channel and program numbers. I personally prefer the zero-based system, and use it here. If you’re familiar with channels 1-16, be aware that I will call them 0-15. My intention is not to deviate from other programs but to represent more accurately the way information gets stored. The same is true for programs 0-127 as opposed to 1-128. For whatever reason, convention already holds that controls be numbered 0-127, so nothing in my treatment of them should be abnormal.

Sierra changed its sound file format in the switch to SCI1. I refer only to SCI0 sound files in this specification. Hybrid interpreters such as the one used for Quest for Glory II are also excluded. Finally, SCI games written for non-DOS systems may have different formats. This document applies to Sierra’s IBM games.

Sound Devices

A gamer’s sound hardware greatly affects how music will sound. Devices used by SCI0 can be broken into general categories:

MIDI Synths
These will generally give the best sound quality. MIDI synths are polyphonic with definable instruments through patch files and full support for MIDI controls. The General MIDI standard had not been written when Sierra began writing SCI games, and as far as I know no SCI0 game uses a GM driver or includes a GM track. This means that synths had to be individually supported.
Non-MIDI Synths
Generally not as good as MIDI synths, but also less expensive. The OPLx family of chips is still very common among home PC users thanks to the AdLib and SoundBlaster cards. Synths are polyphonic with definable instruments through patch files, but drivers must be written to interpret MIDI events and turn them into commands the hardware will recognize. Support for most sound controls gets lost in the process. Furthermore, drivers must map logical, polyphonic MIDI channels to physical, monophonic hardware channels. A control (4Bh) was introduced for this purpose and will be discussed later.
Beepers
Beepers produce very poor music and don’t support instrument definitions, but all PC users have one so supporting them covers people without special sound hardware. The most common device is the PC speaker, which is monophonic. Another is the Tandy speaker with 3 channels. Drivers must interpret MIDI events, but need only concern themselves with basic functionality. Interpreting the MIDI events is also made easier because each channel is monophonic. To play a chord on the Tandy, for example, each voice must be put in a separate MIDI channel.
Wave Devices
Wave devices play digital sound data. They could be used in conjunction with one of the above devices to add special sound effects to a game. The Amiga port of SCI uses a wave device to play music.
With such a diverse group of devices to support, Sierra put a lot of the work on the shoulders of the drivers. Functions for loading patch files, handling events, pausing, etc. are all in the drivers. The interpreter calls them as needed but does not concern itself at all with how they get implemented.
Listed here are devices supported by the SCI0 interpreter with a little information about each. There could very well be other hardware not listed here, so please send in any missing information.
Device Name          | Driver   | Patch | Poly | Flag
Roland MT-32         | mt32     | 001   | 32   | 01h
AdLib                | adl      | 003   | 9    | 04h
PC Speaker           | std      | *     | 1    | 20h
Tandy 1000 or PCJr   | jr       | *     | 3    | 10h
Tandy 1000 SL, TL    | tandy    | *     | 3    | 10h
IBM Music Feature    | imf      | 002   | 8    | +
Yamaha FB-01         | fb01     | 002   | 8    | 02h
CMS or Game Blaster  | cms      | 101   | 12   | 04h
Casio MT540 or CT460 | mt540    | 004   | 10   | 08h
Casio CSM-1          |          | 007   |      |
Roland D110,D10,D20  |          | 000   |      |
Amiga Sound          | amigasnd |       | 4    | 40h
General MIDI         |          | 004   |      | 01h

(thanks to Shane T. for providing some of this).

Blank fields are unknown, not unused.

* When asked which patch to load, the PC and Tandy speaker drivers return FFFFh, which is a signal that they do not use patches.
+ The imf driver almost certainly uses 02h for the play flag, but I haven’t confirmed this.

The driver column holds the file name of each driver without the .drv extension. The patch column specifies which patch resource each driver requests. The poly column is the maximum number of voices which can be played at once according to the driver. The flag column gives each device’s play flag. Play flags, explained in the header section, determine which channels a device will play.

File Format

Two formats are used in SCI0 games. One is an early format used in a version of KQ4 and in the 1988 Christmas card. The other is used in all other SCI0 titles.

The first two bytes of the file contain a magic number identifying the resource type. The rest of the file contains a dump of the uncompressed data. The identifier is the resource type (04h for sound) OR-ed with 80h and stored as a word. The result will be 84h 00h in extracted sound files.
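The identifier check described above can be sketched as follows. This is only an illustration: the function name `sci0_is_sound_resource` and the constant `SCI0_RES_SOUND` are made up for this example and are not taken from any Sierra or ScummVM code.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Resource type for sound, as given in the text (hypothetical constant name). */
#define SCI0_RES_SOUND 0x04

/* Returns true if the first two bytes form the sound identifier:
 * the type OR-ed with 80h, stored as a word with the low byte first,
 * which gives the byte sequence 84h 00h. */
static bool sci0_is_sound_resource(const uint8_t *data, size_t size)
{
    if (data == NULL || size < 2)
        return false;
    uint16_t id = (uint16_t)(data[0] | (data[1] << 8));
    return id == (SCI0_RES_SOUND | 0x80);
}
```

A caller would skip these two bytes before handing the remaining data to a header parser.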

The sound resource data itself is a header with channel initialization followed by a series of MIDI events. The header provides the sound driver with 2 pieces of information about each channel.

Header (kq4, 1988 xmas card)

The first byte is a digital sample flag. It is followed by 1 byte for each channel (17 bytes in total).

The upper 4 bits of that byte specify how many voices each logical MIDI channel will be playing. The lower 4 bits specify which drivers should react on that channel. Bit 0 set means the AdLib shall react. Bit 1 set means the PCjr shall react. The MT-32 will react on all channels. Bit 3 signals the control channel.

The original Sierra driver needs bit 3 set and bit 0 unset to find the control channel. The AdLib driver also needs bit 3 to be unset, otherwise it will ignore the channel even if bit 0 is set.

It is currently not known whether the digital sample flag behaves the same as in the SCI0 header.
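As a rough sketch of the bit layout described for the early header, the per-channel byte could be decoded like this. The type and function names are hypothetical; only the bit positions come from the text.

```c
#include <stdbool.h>
#include <stdint.h>

/* Decoded form of one early-format channel byte (hypothetical type). */
typedef struct {
    uint8_t voices;   /* upper 4 bits: number of voices */
    bool    adlib;    /* bit 0 */
    bool    pcjr;     /* bit 1 */
    bool    control;  /* bit 3 */
} EarlyChannelInfo;

static EarlyChannelInfo early_decode_channel(uint8_t b)
{
    EarlyChannelInfo info;
    info.voices  = (uint8_t)(b >> 4);
    info.adlib   = (b & 0x01) != 0;
    info.pcjr    = (b & 0x02) != 0;
    info.control = (b & 0x08) != 0;
    return info;
}

/* The AdLib driver plays a channel only if bit 0 is set and bit 3 is unset. */
static bool early_adlib_plays(uint8_t b)
{
    return (b & 0x09) == 0x01;
}

/* The control channel is found by bit 3 set and bit 0 unset. */
static bool early_is_control_channel(uint8_t b)
{
    return (b & 0x09) == 0x08;
}
```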

Header (all other SCI0 titles)

The first byte is a digital sample flag. It is followed by 2 bytes for each channel (33 bytes in total).

The first of those bytes specifies how many voices each logical MIDI channel will be playing. For MIDI synths, this information is not really necessary and is probably ignored. The same goes for beepers. This byte is only useful for non-MIDI synths which must know how many hardware channels each logical MIDI channel will need. This value is only an initial setting. Sound files can request changes to the mapping later with control changes. Requesting more hardware channels than are actually available can cause errors on some drivers.

The second byte describes how the user's sound hardware should treat the channel. It is the combination of bit flags which may be OR-ed together. If the appropriate bit is set for the currently selected sound device, the channel will be played. If it is not, the channel will be silent. The driver decides which bit it will use as the play flag, and the table under Sound Devices lists the flag used by each driver. Drivers ignore the first byte (used to request hardware channels) on MIDI channels they don't play.

The MT-32 always plays channel 9, the MIDI percussion channel, regardless of whether or not the channel is flagged for the device. Other MIDI devices may also do this.

A byte at the beginning of the file, before channel initialization, specifies whether the resource contains a digital sample or not. A value of 0 means that there is only MIDI data. A value of 2 means that there is a digital sample at the end of the file. In this case, only the first 15 MIDI channels have header bytes. The two header bytes for the last channel are replaced with an offset to the digital sound effect. The offset is stored in big-endian order in the resource. If present, it points to the last byte before the digital sample header. If the offset is 0, the file must be searched for the status FCh, and the digital sample header will come next. There may be two FCh bytes in a row, in which case both will come before the digital sample header. The digital sample header is discussed in more detail in the Digital samples section.

The header format:

  • 1 byte - digital sample flag (0 or 2)
  • 2 bytes - initialization for channel 0
  • 2 bytes - initialization for channel 1

. . .

  • 2 bytes - initialization for channel 15 OR offset to digital sample

The header is always 33 bytes long.
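The 33-byte layout can be read with a small routine like the one below. It is a sketch under stated assumptions: `data` is taken to point at the resource data after the two-byte identifier, and the struct and function names are invented for this example. The MT-32 special case for channel 9 is not modelled.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SCI0_HEADER_SIZE 33
#define SCI0_CHANNELS    16

/* Parsed header (hypothetical type). */
typedef struct {
    uint8_t  sample_flag;                /* 0 = MIDI only, 2 = digital sample */
    uint8_t  voices[SCI0_CHANNELS];      /* first byte of each channel pair */
    uint8_t  play_flags[SCI0_CHANNELS];  /* second byte of each channel pair */
    bool     has_sample;
    uint16_t sample_offset;              /* only meaningful if has_sample;
                                            0 means "search for FCh" */
} Sci0Header;

static bool sci0_parse_header(const uint8_t *data, size_t size, Sci0Header *out)
{
    if (data == NULL || out == NULL || size < SCI0_HEADER_SIZE)
        return false;

    out->sample_flag   = data[0];
    out->has_sample    = (data[0] == 2);
    out->sample_offset = 0;

    /* With a sample present, only channels 0-14 have header bytes. */
    int described = out->has_sample ? SCI0_CHANNELS - 1 : SCI0_CHANNELS;
    for (int ch = 0; ch < SCI0_CHANNELS; ch++) {
        if (ch < described) {
            out->voices[ch]     = data[1 + 2 * ch];
            out->play_flags[ch] = data[2 + 2 * ch];
        } else {
            out->voices[ch]     = 0;
            out->play_flags[ch] = 0;
        }
    }

    /* The last two header bytes then hold a big-endian offset. */
    if (out->has_sample)
        out->sample_offset = (uint16_t)((data[31] << 8) | data[32]);
    return true;
}

/* A channel is played if the device's play flag bit is set for it. */
static bool sci0_channel_plays(const Sci0Header *h, int ch, uint8_t device_flag)
{
    return (h->play_flags[ch] & device_flag) != 0;
}
```

For example, `sci0_channel_plays(&h, 0, 0x04)` would ask whether a device using play flag 04h should play channel 0.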


The actual music is stored in a series of events. The generic form for an event is:

<delta time> [byte - status] [byte - p1 [p2]]

Delta time is the number of ticks to wait after executing the previous event before executing this event. Ticks occur at 60 Hz. The delta time value is usually a single byte. However, longer delays can be produced by using F8h any number of times before the delta time value. Each F8h byte causes a delay of 240 ticks before continuing playback. For example, the sequence F8 F8 78 FC waits 600 ticks (240 + 240 + 120) then stops the sequence because of the FCh status. The fact that F8h waits F0h ticks makes me think that EFh is the largest technically allowable delta time.

The delta time must be present in most events. The only exception is when FCh is the status, because FCh is a real-time message. Sierra's resources seem to have always provided a delta time, though. Note also that FCh cannot be used as a delta time value - it will be interpreted as a stop sequence status.
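The delta time rules can be illustrated with the reader below. It is a minimal sketch, not a driver implementation: the function name and the convention of returning -1 when the data runs out are choices made for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* Reads one delta time starting at data[*pos] and advances *pos past it.
 * Every F8h byte adds 240 ticks. If the next byte is FCh it is a stop
 * status rather than a delta value, so it is left unconsumed and only the
 * ticks accumulated so far are returned. Returns -1 if the data ends. */
static long sci0_read_delta(const uint8_t *data, size_t size, size_t *pos)
{
    long ticks = 0;

    while (*pos < size && data[*pos] == 0xF8) {
        ticks += 240;
        (*pos)++;
    }
    if (*pos >= size)
        return -1;
    if (data[*pos] == 0xFC)
        return ticks;          /* leave FCh for the status parser */

    ticks += data[*pos];
    (*pos)++;
    return ticks;
}
```

Applied to the bytes F8 F8 78 FC, the first call yields 600 ticks and leaves the position on the FCh byte.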

The status byte is basically a command. The most significant bit is always set. This feature is important because the status byte will not always be present. A missing status byte is known as running status mode: the last status gets repeated with the new parameters. Parameters will never have their most significant bits set.

The generic form for a status byte is (in bits) 1xxxcccc. The lower nibble usually specifies a channel. The upper specifies a status.
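Splitting a status byte and handling running status might look like this. The helper names are hypothetical; the logic follows only what the text states about the most significant bit and the two nibbles.

```c
#include <stdbool.h>
#include <stdint.h>

/* A status byte always has its most significant bit set. */
static bool midi_is_status(uint8_t b)
{
    return (b & 0x80) != 0;
}

/* Upper nibble: the status (command). */
static uint8_t midi_command(uint8_t status)
{
    return status & 0xF0;
}

/* Lower nibble: usually the channel, 0-15. */
static uint8_t midi_channel(uint8_t status)
{
    return status & 0x0F;
}

/* Running status: if the next byte is not a status byte, the previous
 * status is reused and the byte is left to be read as a parameter.
 * *consumed tells the caller whether `next` was used up as a status. */
static uint8_t midi_resolve_status(uint8_t next, uint8_t *running, bool *consumed)
{
    if (midi_is_status(next)) {
        *running  = next;
        *consumed = true;
    } else {
        *consumed = false;
    }
    return *running;
}
```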

Status Reference

8x n v
Note off: Stop playing note n on channel x, releasing the key with velocity v. If a hold pedal is pressed, the note will continue to play after this status is received and end when the pedal is released.
9x n v
Note on: Play note n with velocity v on channel x. Playing a note with velocity 0 is a way of turning the note off.
Ax n p
Key pressure (after-touch): Set key pressure to p for note n on channel x. This is to modify key pressure for a note that is already playing.
Bx c s
Control: Set control c to s on channel x. This can be confusing because there isn't just one meaning. Changing the settings on different controls will, of course, have different outcomes.

Controls which handle any value are continuous controllers. They have a continuous range. Controls which are only on/off are switches. Their defined values are 00h (OFF) and 7Fh (ON).

Listed in this reference are the non-standard MIDI controls I've found in Sierra SCI0 sound files. Standard controls are not listed here. Not all drivers support all controls.

Control Reference

4Bh
Channel mapping: When a channel sets this control, it tells the driver how many notes it will be playing at once, and therefore how many hardware channels it occupies.
4Ch
Reset on PauseSound: An on/off switch where a value of zero is off and a non-zero value is on. Note that this is not the same as for standard MIDI control switches. When this control is on, calling the sound driver's PauseSound subfunction will reset the sound position to the beginning. The initial value is set to off when a sound gets loaded.
4Eh
Unknown: Experiments in setting and clearing it show that a value of 0 will cause notes to be played without regard for the velocity parameter while a value of 1 will enable velocities.
50h
Reverb: I know little about this myself. Rickard Lind reports that it exists in the MT-32 driver and supports parameter values 0-10 (possibly 0-16?).
60h
Cumulative cue: The interpreter can get cues from the sound file, which sets the Sound object's signal property. When a sound gets loaded, the initial cue is set to 127. When a CC60 occurs, the new control value is added to the current cue. If the cue were 130, for example, a CC60 5 on any channel would make the new cumulative cue equal 135.
Cx p
Program change: Set program (patch / instrument / etc.) to p for channel x. This is a simple instrument change.
Channel 15, however, includes two special cases of this status. The first relates to communication with the game interpreter. If p is less than 127 then the signal property for the game interpreter's Sound object gets set to p, triggering a non-cumulative cue.
If p is equal to 127, then the current position within the sound resource is remembered as the loop point. Normally the driver loops to the beginning of the sound when the sequence ends. If an explicit loop point is set, the sound will be replayed from the marked point instead.
The actual time of the loop point is better explained with a short diagram:

0x10 0x91 0x20 0x20 play a note on channel 1
0x05 0x91 0x20 0x00 stop the previous note
0x00 0x92 0x30 0x10 play a note on channel 2
[restart here]
0x00 0xCF 0x7F set loop point
0x00 0xC8 0x05 change to program 5 on channel 8
0x00 0xCF 0x13 set signal to 19
0x20 0xFC end of file, loop to marked location

In both situations (p < 127 and p = 127), no actual program change takes place. Channel 15 is used for control, not playing music.
Dx p
Pressure (after-touch): Set key pressure to p on channel x. This is similar to Ax but differs in its scope. Message Ax is applied on a per-note basis while message Dx is applied to an entire channel.
Ex t b
Pitch wheel: Set the pitch wheel to tb. The setting is actually a 14 bit number with the least significant 7 bits stored in b and the most significant 7 bits stored in t. The range of values is 0000h to 3FFFh. A value of 2000h means that the pitch wheel is centered. Larger values raise pitch and smaller values lower it.
F0h
Begin SysEx: Starts a system exclusive data block. The block must terminate with F7h.
F7h
End SysEx: Ends a system exclusive data block. Normal sound data resumes at this point.
FCh
Stop Sequence: This is a system real-time message which tells the sound driver to stop the current sound. The sound object's signal property gets set to FFFFh and the position moves to the loop point, which defaults to the beginning. Drivers allow this message to occur without a delta time, but I haven't seen any examples.
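The cue and loop handling described in this reference can be summarized in a small state sketch. Several points are assumptions made for the example rather than documented behaviour: the type and function names, the initial signal value of 0, the idea that a cumulative cue is written straight into the signal property, and the use of the event's own start offset as the loop point (suggested by the "[restart here]" marker in the diagram).

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical playback state relevant to cues and looping. */
typedef struct {
    uint16_t signal;          /* Sound object's signal property */
    int      cumulative_cue;  /* set to 127 when a sound is loaded */
    size_t   loop_point;      /* defaults to the beginning */
} CueState;

static void cue_init(CueState *s)
{
    s->signal         = 0;    /* assumed initial value */
    s->cumulative_cue = 127;
    s->loop_point     = 0;
}

/* Control 60h: the control value is added to the current cue. */
static void cue_on_control_60(CueState *s, uint8_t value)
{
    s->cumulative_cue += value;
    s->signal = (uint16_t)s->cumulative_cue;  /* assumed mapping to signal */
}

/* Program change. On channel 15 no instrument change happens:
 * p == 127 marks the loop point, any other p sets the signal.
 * Returns true if the event was handled as a control event. */
static bool cue_on_program_change(CueState *s, uint8_t channel, uint8_t p,
                                  size_t event_pos)
{
    if (channel != 15)
        return false;         /* ordinary program change */
    if (p == 127)
        s->loop_point = event_pos;
    else
        s->signal = p;
    return true;
}

/* Stop sequence (FCh): signal becomes FFFFh and playback moves to the
 * loop point, which is returned as the new position. */
static size_t cue_on_stop(CueState *s)
{
    s->signal = 0xFFFF;
    return s->loop_point;
}
```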

Digital samples

The digital sample header is 44 bytes long. Offset 14 in the header contains the frequency as a short integer. Offset 32 contains the sample length, also as a short integer. Other fields in the header are unknown (to me) at the time of writing, but aren't critical to playback.

The wave data comes immediately after the header, stored in unsigned 8 bit PCM format.
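Reading the two known header fields could be sketched as below. Note the assumptions: the text does not state the byte order of these fields, so little-endian is assumed here, and the sample length is taken to be a byte count. All names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SCI0_SAMPLE_HEADER_SIZE 44

/* Hypothetical view of a digital sample inside a sound resource. */
typedef struct {
    uint16_t       frequency;  /* from offset 14 */
    uint16_t       length;     /* from offset 32 */
    const uint8_t *pcm;        /* unsigned 8-bit PCM, right after the header */
} Sci0Sample;

/* ASSUMPTION: fields are little-endian; the text does not say. */
static uint16_t read_le16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static bool sci0_parse_sample(const uint8_t *hdr, size_t size, Sci0Sample *out)
{
    if (hdr == NULL || out == NULL || size < SCI0_SAMPLE_HEADER_SIZE)
        return false;
    out->frequency = read_le16(hdr + 14);
    out->length    = read_le16(hdr + 32);
    if (size - SCI0_SAMPLE_HEADER_SIZE < out->length)
        return false;          /* wave data would run past the buffer */
    out->pcm = hdr + SCI0_SAMPLE_HEADER_SIZE;
    return true;
}

/* Converts one unsigned 8-bit sample to signed, for mixers that expect it. */
static int8_t pcm_u8_to_s8(uint8_t sample)
{
    return (int8_t)(sample - 128);
}
```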

Amiga Sound (SCI0)

The SCI0 Amiga Sound driver does not use a patch resource; instead, it loads an external instrument bank called 'bank.001'. This file has the following structure (all numbers are big-endian):

  • [00]..[07] String "X0iUo123"
  • [08]..[25] Bank name
  • [26][27] Number of instruments (= #i)
  • [28].. #i instruments

An instrument has the following format:

  • [00][01] Instrument number
  • [02]..[1f] Instrument name
  • [20][21] Flags:
    • Bit 0 looping on/off
    • Bit 1 pitch changes on/off
  • [22] Transpose value in semitones (= #t)
  • [23][24] Segment 1 size in words (= #s1)
  • [25][26][27][28] Segment 2 offset in bytes
  • [29][2a] Segment 2 size in words (= #s2)
  • [2b][2c][2d][2e] Segment 3 offset in bytes
  • [2f][30] Segment 3 size in words (= #s3)
  • [31]..[3c] Velocity envelope
  • [3d].. #s1+#s2+#s3 signed 8-bit samples
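The instrument layout above maps onto a parser along these lines. This is a sketch with invented names; it assumes that a set flag bit means "on", that the transpose byte is signed, and that the segment 3 size is a signed 16-bit value (the text notes it can be negative).

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define AMIGA_INSTR_HEADER_SIZE 0x3d  /* sample data starts at [3d] */

/* Hypothetical decoded instrument header. */
typedef struct {
    uint16_t number;
    char     name[31];        /* [02]..[1f] is 30 bytes, plus a terminator */
    bool     looping;         /* flags bit 0 (assumed: set = on) */
    bool     pitch_changes;   /* flags bit 1 (assumed: set = on) */
    int8_t   transpose;       /* assumed signed */
    uint16_t seg1_words;
    uint32_t seg2_offset;
    uint16_t seg2_words;
    uint32_t seg3_offset;
    int16_t  seg3_words;      /* may be negative when segments overlap */
    uint8_t  envelope[12];    /* raw velocity envelope, [31]..[3c] */
} AmigaInstrument;

static uint16_t be16(const uint8_t *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

static uint32_t be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

static bool amiga_parse_instrument(const uint8_t *p, size_t size,
                                   AmigaInstrument *out)
{
    if (p == NULL || out == NULL || size < AMIGA_INSTR_HEADER_SIZE)
        return false;

    out->number = be16(p + 0x00);
    memcpy(out->name, p + 0x02, 30);
    out->name[30] = '\0';

    uint16_t flags     = be16(p + 0x20);
    out->looping       = (flags & 0x0001) != 0;
    out->pitch_changes = (flags & 0x0002) != 0;

    out->transpose   = (int8_t)p[0x22];
    out->seg1_words  = be16(p + 0x23);
    out->seg2_offset = be32(p + 0x25);
    out->seg2_words  = be16(p + 0x29);
    out->seg3_offset = be32(p + 0x2b);
    out->seg3_words  = (int16_t)be16(p + 0x2f);
    memcpy(out->envelope, p + 0x31, sizeof out->envelope);
    return true;
}
```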

A velocity envelope has the following format:

  • [00] Phase 1 period size in ticks (= #p1)
  • [01] Phase 2 period size in ticks (= #p2)
  • [02] Phase 3 period size in ticks (= #p3)
  • [03] Phase 4 period size in ticks (= #p4)
  • [04] Phase 1 velocity delta (= #d1), range [0-64]
  • [05] Phase 2 velocity delta (= #d2)
  • [06] Phase 3 velocity delta (= #d3)
  • [07] Phase 4 velocity delta (= #d4)
  • [08] Phase 1 target velocity (= #v1), range [0-64]
  • [09] Phase 2 target velocity (= #v2)
  • [0a] Phase 3 target velocity (= #v3)
  • [0b] Phase 4 target velocity (assumed to be 0) (= #v4)

With looping off, all samples are played. The segments and the envelope data are ignored. With looping on, Segment 1 is played first, followed by a looping of Segment 2 (Segment 3 is never played). As Segments 1 and 2 may overlap, it is possible for #s1 + #s2 to exceed the total number of samples; in that case #s3 will be negative.

Velocity envelope phases 1 and 2 are applied after note-on, and phases 3 and 4 after note-off. If #p1 is zero, no velocity envelope is applied. For other phases, a period size of 0 is interpreted as a period size of 256 ticks. If the velocity drops to 0 at any point, the note is stopped right away, even if more phases follow. Otherwise, the note is stopped after phase 4, which always has a target velocity of 0 (note that it is possible to construct velocity envelopes that never terminate).

Each envelope phase n operates as follows, where #v0 is the velocity from the note-on event (divided by two to scale it to Amiga volume levels):

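<syntax type="C">
vel = #v(n-1);
while (true) {
    set_channel_velocity(vel * #v0 / 64);
    vel -= #dn;
    if ((#dn >= 0 and vel <= #vn) or (#dn < 0 and vel >= #vn)) {
        if (#vn == 0)
            stop_note();  /* assumed: the listing breaks off after the test above */
        break;            /* assumed: target reached, so the phase ends */
    }
    wait_ticks(#pn);      /* assumed: one period passes between velocity steps */
}
</syntax>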

All instruments have a samplerate of 20000Hz. With pitch changes off, the instrument is always played at this frequency, regardless of the note. With pitch changes on, #t is first added to the note. The instrument is then played at the corresponding frequency (where note 101 equals 20000Hz).
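The note-to-frequency rule can be sketched as follows. The text only says the instrument is played "at the corresponding frequency", so equal temperament (a factor of 2^(1/12) per semitone) is an assumption of this example, as is the function name.

```c
#include <stdbool.h>

/* Returns the playback rate in Hz for an SCI0 Amiga instrument.
 * Note 101 corresponds to 20000 Hz. With pitch changes off the rate
 * is always 20000 Hz. ASSUMPTION: equal temperament between notes. */
static double amiga_sci0_note_frequency(int note, int transpose,
                                        bool pitch_changes)
{
    const double base     = 20000.0;
    const double semitone = 1.0594630943592953;  /* 2^(1/12) */

    if (!pitch_changes)
        return base;

    int steps = note + transpose - 101;  /* #t is added to the note first */
    double freq = base;
    for (; steps > 0; steps--)
        freq *= semitone;
    for (; steps < 0; steps++)
        freq /= semitone;
    return freq;
}
```

Under this assumption, note 89 (one octave below note 101) would play at roughly 10000 Hz.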

The Amiga has four audio channels. Channels 0 and 3 are panned hard left and channels 1 and 2 are panned hard right. The first MIDI channel with playmask 0x40 is mapped to channel 0, the second to channel 1, etc. The driver seems to ignore all pan and volume commands.
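The channel assignment just described could be expressed like this. The names are hypothetical, and leaving a fifth or later flagged MIDI channel unmapped is an assumption, since the text does not say what the driver does in that case.

```c
#include <stdbool.h>
#include <stdint.h>

#define AMIGA_PLAY_FLAG   0x40
#define AMIGA_HW_CHANNELS 4

/* Assigns hardware channels 0-3 to the first MIDI channels flagged 40h.
 * hw_for_midi[ch] receives the hardware channel, or -1 if unmapped.
 * Returns the number of hardware channels used. */
static int amiga_map_channels(const uint8_t play_flags[16], int hw_for_midi[16])
{
    int next = 0;
    for (int ch = 0; ch < 16; ch++) {
        if ((play_flags[ch] & AMIGA_PLAY_FLAG) && next < AMIGA_HW_CHANNELS)
            hw_for_midi[ch] = next++;
        else
            hw_for_midi[ch] = -1;  /* assumed: extra channels stay silent */
    }
    return next;
}

/* Hardware channels 0 and 3 are panned hard left, 1 and 2 hard right. */
static bool amiga_hw_is_left(int hw_channel)
{
    return hw_channel == 0 || hw_channel == 3;
}
```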

Amiga Sound (SCI1)/Macintosh Sound (SCI1/1.1)

The SCI1 Amiga sound driver adds several features: multiple samples per instrument, variable samplerates and pitch bend support. SCI1/1.1 Macintosh games use the same format in the 7.pat file.

The sound bank header contains pointers to the 128 instruments (NULL indicates instrument not used):

  • [0000]..[0003] (uint32) pointer to instrument 0


  • [01fc]..[01ff] (uint32) pointer to instrument 127

Each instrument has samples assigned to ranges of notes:

  • [00]..[07] Instrument name
  • [08][09] Unused (always 0)
  • [0a] List of note range data, terminated with 0xff 0xff

Note range data:

  • [00][01] (int16) Start note number [0-127]
  • [02][03] (int16) End note number (inclusive) [0-127]
  • [04]..[07] (uint32) pointer to sample data
  • [08][09] (int16) Transpose value
  • [0a] Attack speed [0-31]
  • [0b] Attack target velocity [0-64]
  • [0c] Decay speed [0-31]
  • [0d] Decay target velocity [0-64]
  • [0e] Release speed [0-31]
  • [0f] Unknown (always 0)
  • [10][11] (int16) Fixed note number for this instrument, or -1
  • [12][13] (int16) 0 = looping on, <>0 = looping off

The transpose value is linear. A value of 0x10 transposes a note to the next higher note, but all other transpositions are based on the difference in frequency between these two notes. As a result, a value of 0x20 will not transpose a note by two, but fall slightly short of that.

Sample data header (phase 2 is used for looping):

  • [00]..[07] Sample name
  • [08][09] (int16) 0 = unsigned samples, <>0 = signed samples
  • [0a][0b] (uint16) Start offset of phase 1
  • [0c][0d] (uint16) End offset (inclusive) of phase 1
  • [0e][0f] (uint16) Start offset of phase 2
  • [10][11] (uint16) End offset (inclusive) of phase 2
  • [12][13] (int16) Native MIDI note for this sample
  • [14]..[17] (uint32) Pointer to period table

The Amiga sound system uses period lengths to determine the sample rate. A period length is the number of CPU ticks that a single byte should take. For NTSC Amiga machines, samplerate relates to period length as follows: samplerate = 3579545 / period length.

A period table contains period lengths for an entire octave. To simplify the implementation of the transpose value, the table contains two additional semitones, one before the first semitone of the octave and one after the last. Additionally, in order to support pitch bends the table contains 3 additional entries after every semitone at 25 cent intervals (100 cents equal 1 semitone), resulting in 56 entries total. This table contains period lengths for the lowest octave. Dividing a period length by two transposes a note to the next octave.

  • [0000]..[0003] (uint32) period for cent 0
  • [0004]..[0007] (uint32) period for cent 25
  • [0008]..[000b] (uint32) period for cent 50
  • [000c]..[000f] (uint32) period for cent 75
  • [0010]..[0013] (uint32) period for cent 100 (octave starts here)


  • [00dc]..[00df] (uint32) period for cent 1375
  • [00e0]..[00e3] (uint16) samplerate in Hz used to generate the table (not used by SSCI)

MIDI notes are not used directly when doing table lookups. Every sample has a native MIDI note that needs to be taken as a base. If the sample is played back at the native MIDI note, the period length is obtained from the 8th semitone in the table (cent 800) at the 11th octave, i.e. the value at offset 0x0080 in the table, divided by two 10 times. This period length will also correspond to the samplerate at the end of the table.
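The arithmetic for the native note can be sketched as below. The example works on a table that has already been decoded into 56 `uint32_t` values, so no byte order is assumed; all names are hypothetical.

```c
#include <stdint.h>

#define AMIGA_NTSC_CLOCK     3579545u
#define PERIOD_TABLE_ENTRIES 56
#define CENTS_PER_ENTRY      25

/* samplerate = 3579545 / period length (NTSC). Returns 0 for period 0. */
static uint32_t amiga_period_to_samplerate(uint32_t period)
{
    return period ? AMIGA_NTSC_CLOCK / period : 0;
}

/* Index of the table entry for a cent position (0..1375 in steps of 25),
 * or -1 if the position is not in the table. */
static int period_index_for_cents(int cents)
{
    if (cents < 0 || cents % CENTS_PER_ENTRY != 0)
        return -1;
    int index = cents / CENTS_PER_ENTRY;
    return index < PERIOD_TABLE_ENTRIES ? index : -1;
}

/* Period for a sample played at its native MIDI note: the cent 800 entry
 * (index 32, byte offset 0x80) divided by two 10 times. */
static uint32_t native_note_period(const uint32_t table[PERIOD_TABLE_ENTRIES])
{
    return table[period_index_for_cents(800)] >> 10;
}
```

Feeding the result of `native_note_period` into `amiga_period_to_samplerate` should then give approximately the samplerate stored at the end of the table.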

General MIDI and MT-32 (SCI1)

The SCI1 MT-32 driver uses patch file 1, and the GM driver uses patch file 4. The file formats are identical, however, and the same goes (to a large extent) for the drivers. The patch files have the following structure:

  • [000]..[07f] patchMap
  • [080]..[0ff] patchKeyShift
  • [100]..[17f] patchVolumeAdjust
  • [180]..[1ff] percMap
  • [200] percVolumeAdjust
  • [201]..[280] velocityMapIndex
  • [281]..[300] velocityMap 0
  • [301]..[380] velocityMap 1
  • [381]..[400] velocityMap 2
  • [401]..[480] velocityMap 3
  • [481][482] Size (little endian) of MIDI data (= #i)
  • [483].. #i bytes of MIDI data
  • patchMap[i]: Native patch number for patch i , or -1 for unused entries.
  • patchKeyShift[i]: Key shift value for patch i. This value should be added to every non-rhythm key that is played. If the key goes out of bounds [0,127], it should be brought back into range by shifting it by a multiple of 12 semitones.
  • patchVolumeAdjust[i]: Volume adjust value for patch i. This value should be added to the volume (when setting controller 07h) before scaling it by the master volume.
  • percMap[i]: Native key for key i of the percussion channel, or -1 for unused entries.
  • percVolumeAdjust: Volume adjust value for percussion.
  • velocityMapIndex[i]: Specifies which velocityMap to use for patch i.
  • velocityMap[i]: Native velocity for velocity i.
  • MIDI data: MIDI data to initialize the device. Note that this data can contain sysex commands, after which an appropriate delay should be executed.
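Two of the lookups described above, key shifting and velocity mapping, can be sketched directly from the offsets in the structure list. The names are hypothetical, and treating the key shift as a signed byte is an assumption (suggested by the use of -1 for unused map entries).

```c
#include <stdint.h>

/* Offsets from the patch file structure listed above. */
enum {
    PATCH_MAP_OFFSET          = 0x000,
    PATCH_KEY_SHIFT_OFFSET    = 0x080,
    PATCH_VOLUME_ADJUST_OFFSET = 0x100,
    PATCH_PERC_MAP_OFFSET     = 0x180,
    PATCH_PERC_VOLUME_OFFSET  = 0x200,
    PATCH_VELOCITY_INDEX_OFFSET = 0x201,
    PATCH_VELOCITY_MAPS_OFFSET = 0x281,  /* four maps of 0x80 bytes each */
    PATCH_MIDI_SIZE_OFFSET    = 0x481,
    PATCH_MIDI_DATA_OFFSET    = 0x483
};

/* Adds the key shift and moves the result by whole octaves (12 semitones)
 * until it lies inside [0,127]. ASSUMPTION: the shift is a signed byte. */
static int patch_apply_key_shift(int key, int8_t shift)
{
    int shifted = key + shift;
    while (shifted < 0)
        shifted += 12;
    while (shifted > 127)
        shifted -= 12;
    return shifted;
}

/* Looks up the native velocity for a patch, or -1 on invalid input.
 * `patch` must point at a buffer holding at least the fixed 0x483 bytes. */
static int patch_map_velocity(const uint8_t *patch, int patch_no, int velocity)
{
    if (patch_no < 0 || patch_no > 127 || velocity < 0 || velocity > 127)
        return -1;
    uint8_t map = patch[PATCH_VELOCITY_INDEX_OFFSET + patch_no];
    if (map > 3)
        return -1;
    return patch[PATCH_VELOCITY_MAPS_OFFSET + map * 0x80 + velocity];
}

/* Size of the trailing MIDI initialization data (little-endian). */
static uint16_t patch_midi_size(const uint8_t *patch)
{
    return (uint16_t)(patch[PATCH_MIDI_SIZE_OFFSET] |
                      (patch[PATCH_MIDI_SIZE_OFFSET + 1] << 8));
}
```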