
Difference between revisions of "SCI/Specifications/SCI in action/Parser"

''Document conversion incomplete. Work in progress.''
=The Parser=


===The tree vocabulary (VOCAB.900)===
This vocabulary is used solely for building parse trees. It consists of a series of word values which end up in the data nodes on the tree. It doesn't make much sense without the original parsing code.
==The black box: The magic behind Sierra's text parser==
''Original document by Lars Skovlund. Document incomplete by the author.''
This document describes the process of parsing user input and relating it to game actions. This document does not describe the process of the user typing his command; only the "behind-the-scenes" work is described, hence the title.
The process of parsing is two-fold, mainly for speed reasons. The Parse kernel function takes the actual input string and generates a special "said" event (type 0x80) from it. This function is only called once per line. Parse can either accept or reject the input.
A rejection can only occur if Parse fails to identify a word in the sentence.
Even if Parse accepts the sentence, the sentence does not need to make sense. Syntax checks are still made, as described later.
Assuming that the parsing succeeded, the User object (which encapsulates the parser) then goes on to call the relevant event handlers. These event handlers in turn call the Said kernel function. This function is potentially called hundreds or even thousands of times, so it must execute as quickly as possible. Said simply determines from the pre-parsed input line whether or not a specific command is desired.
The Parse function must always work on an internal copy of the actual string, because the user must be able to recall his exact last input using the F3 key. The parser's first step is to convert the input line to pure lower case. This is because the vocabulary words are entered in lower case. The parser then searches the main vocabulary (VOCAB.000), hoping to find the word.
A direct lookup does not necessarily succeed. Consider, for example, the word "carefully", which does not appear in the vocabulary but is identified anyway. This is due to the so-called suffix vocabulary, which is discussed in another document.
If the word still can't be found, the interpreter copies the failing word into a buffer temporarily allocated on the heap (remember, the interpreter operates on its own local buffers). It then calls the Game::wordFail method, which prints an appropriate message. The interpreter then deallocates the buffer and exits. It does, however, still return an event; the claimed property of that event is set to TRUE to indicate that the event has already been responded to (the error message has been printed).
If the interpreter succeeds in identifying all the words, it then goes on to check the syntax of the sentence by building a parse tree. See the appropriate document.
If the syntax of the sentence is invalid, the interpreter calls Game::syntaxFail, passing the entire input line. As in the unknown-word case, the event is claimed.
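The two-stage lookup described above can be sketched as follows. This is only an illustration: the vocabulary entries, the word group numbers, and the single suffix rule are all hypothetical, and the real suffix vocabulary format is covered in its own document.

```python
# Hypothetical data: neither the vocabulary entries nor the suffix rule
# below are taken from a real VOCAB resource.
MAIN_VOCAB = {"look": 0x101, "careful": 0x202}  # word -> word group (made up)
SUFFIX_RULES = [("ly", "")]                     # (suffix to strip, replacement)


def lookup(word):
    """Return the word group for `word`, or None if it cannot be identified."""
    word = word.lower()                 # vocabulary words are lower case
    if word in MAIN_VOCAB:              # first try: direct match
        return MAIN_VOCAB[word]
    for suffix, replacement in SUFFIX_RULES:
        if word.endswith(suffix):       # second try: strip a known suffix
            stem = word[: len(word) - len(suffix)] + replacement
            if stem in MAIN_VOCAB:
                return MAIN_VOCAB[stem]
    return None                         # caller would report a word failure
```

With this made-up data, `lookup("Carefully")` falls through the direct match and succeeds via the stem "careful", while an unknown word yields `None`.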
As mentioned in the beginning of this text, the Parse function generates an event. This event, apart from its type id, does not contain any data. Rather, all pertinent data is kept in the interpreter.
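The control flow of Parse described above might be sketched like this. The `Event` and `Game` classes, the `vocab` set, and the `syntax_ok` callback are stand-ins invented for the example; only the event type 0x80, the claimed flag, and the two failure callbacks come from the text. Whether syntaxFail receives the original or the lower-cased line is an assumption here.

```python
SAID_EVENT = 0x80  # "said" event type given in the text


class Event:
    def __init__(self):
        self.type = SAID_EVENT
        self.claimed = False


class Game:
    """Stand-in for the game object; records messages instead of printing."""

    def __init__(self):
        self.messages = []

    def word_fail(self, word):          # plays the role of Game::wordFail
        self.messages.append(f"unknown word: {word}")

    def syntax_fail(self, line):        # plays the role of Game::syntaxFail
        self.messages.append(f"bad syntax: {line}")


def parse(line, game, vocab, syntax_ok):
    """Return (accepted, event). An event is returned even on failure."""
    event = Event()
    work = line.lower()                 # internal copy; `line` stays intact
    for word in work.split():
        if word not in vocab:
            game.word_fail(word)
            event.claimed = True        # error already reported
            return False, event
    if not syntax_ok(work):
        game.syntax_fail(line)          # assumption: original line is passed
        event.claimed = True
        return False, event
    return True, event
```

Note that the event carries no parse data in either case, matching the description: everything Said needs later stays inside the interpreter.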
The Said kernel function is called for each command which the game might respond to at any given time. Its only parameter is a pointer to a said information block which resides in script space. This block is described below (see the Said specs section).
The Said function first does some sanity checking on the event pointer which Parse stored earlier. It must be a said event (type property), and it must not have been handled by an earlier call to Said (claimed property).
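As a rough sketch, the sanity check amounts to two tests on the stored event. The dictionary representation of the event is an assumption made for the example; the real event is an object with type and claimed properties.

```python
SAID_EVENT = 0x80  # "said" event type given in the text


def said_precheck(event):
    """Return True if Said should go on to match against this event."""
    if event is None:                    # nothing was stored by Parse
        return False
    if event["type"] != SAID_EVENT:      # must be a said event
        return False
    return not event["claimed"]          # must not have been handled already
```

Because Said may be called a very large number of times per input line, rejecting unsuitable events this early keeps the common case cheap.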
It then word-extends the passed said block into a temporary buffer (command codes are byte-sized, remember?). This is supposedly just for convenience/speed, and not really needed.
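One way to picture the word-extension step is sketched below. The byte encoding used here is an assumption for illustration (bytes of 0xF0 and above are single-byte command codes, 0xFF terminates the block, and any other byte starts a two-byte big-endian word group); the authoritative layout is the one given in the Said specs section. Keeping command codes at their byte value in the output is likewise an arbitrary choice for this example.

```python
def word_extend(said_block):
    """Expand a said block (bytes) into a list of word-sized tokens.

    Assumed encoding (not confirmed by this document):
      >= 0xF0 : single-byte command code, 0xFF terminates the block
      <  0xF0 : high byte of a two-byte, big-endian word group
    """
    tokens = []
    i = 0
    while i < len(said_block):
        b = said_block[i]
        if b >= 0xF0:
            tokens.append(b)             # command code widened to a word
            i += 1
            if b == 0xFF:                # terminator: stop reading
                break
        else:
            # raises IndexError if the block is truncated mid-word
            tokens.append((b << 8) | said_block[i + 1])
            i += 2
    return tokens
```

After this step every token has the same width, so the matching code no longer has to distinguish one-byte from two-byte entries while scanning.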
==The Parse tree==
This and the two following sections borrow some ideas and structures from abstract language theory. Readers might want to consider related literature.
Most of the information explained here was gathered by Lars Skovlund, and, before that, Dark Minister.
After tokenizing, looking up, and finally aliasing the data found in the parsed input string, the interpreter proceeds to build a parse tree <!-- <math>T_\pi</math> --> <i>T</i><sub>&pi;</sub> from the input tokens
<br><i>I :=</i> &omega;<sub>0</sub>, &omega;<sub>1</sub>,&omega;<sub>2</sub> &hellip; &omega;<sub>n-1</sub> <!-- Mediawiki LaTeX <math>I:=~\omega_0,\omega_1,\omega_2 \ldots \omega_{n-1}</math> --><br>
where
* <i>&omega;<sub>j</sub> &isin; W</i>
* <i>&gamma;<sub>j</sub> &isin; &Gamma;</i>
* <i>&mu;<sub>j</sub> &isin; 2<sup>C</sup></i>
* <i>&omega;<sub>j</sub> = (&gamma;<sub>j</sub>, &mu;<sub>j</sub>)</i>
<!-- Math formulas
* <math>\omega_j \in W</math>
*<math>\gamma_j \in \Gamma</math>
*<math>\mu_j \in 2^C</math>
*<math>\omega_j = (\gamma_j, \mu_j)</math> -->
With <i>W</i> being the set of all words, <i>&Gamma;</i> being the set of all word groups, <i>C</i> being the set of all class masks <nowiki>{1,2,4,8,10,20,40,80,100}</nowiki> (hexadecimal), <i>&gamma;<sub>j</sub></i> being the word group <i>&omega;<sub>j</sub></i> belongs to, and <i>&mu;<sub>j</sub></i> being its class mask, as described above.
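The token definition above can be modelled as a small data structure. In this sketch, which is not taken from the interpreter, a subset of <i>C</i> is stored as the bitwise OR of its members; the `Token` class and its field names are hypothetical.

```python
from dataclasses import dataclass

# The set C of class masks from the text, written out in hexadecimal.
CLASS_MASKS = (0x001, 0x002, 0x004, 0x008, 0x010, 0x020, 0x040, 0x080, 0x100)
ALL_CLASSES = 0x1FF  # bitwise OR of every element of C


def is_valid_mask(mask):
    """True if `mask` encodes a subset of C (no bits outside ALL_CLASSES)."""
    return mask >= 0 and (mask & ~ALL_CLASSES) == 0


@dataclass(frozen=True)
class Token:
    """One input token omega_j = (gamma_j, mu_j)."""

    group: int       # gamma_j: the word group
    class_mask: int  # mu_j: subset of C, encoded as OR-ed bits

    def classes(self):
        """List the individual class masks contained in mu_j."""
        return [m for m in CLASS_MASKS if self.class_mask & m]
```

For example, a token with class mask 0x0A0 belongs to the two classes 0x020 and 0x080, so a word may carry several classes at once.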
For the following sections, we define