Open main menu

Difference between revisions of "SCI/Specifications/SCI in action/Parser"

→‎The Parse tree: Fixing lots of mistakes and errors (partially introduced during transition from LaTeX to wiki, partially older)
m (→‎Said specs: fix invalid usage of <syntax>)
(→‎The Parse tree: Fixing lots of mistakes and errors (partially introduced during transition from LaTeX to wiki, partially older))
Line 154: Line 154:
After tokenizing, looking up, and finally aliasing the data found in the parsed input string, the interpreter proceeds to build a parse tree <!-- <math>T_\pi</math> --> <i>T</i><sub>&pi;</sub> from the input tokens
After tokenizing, looking up, and finally aliasing the data found in the parsed input string, the interpreter proceeds to build a parse tree <!-- <math>T_\pi</math> --> <i>T</i><sub>&pi;</sub> from the input tokens


<br><i>I :=</i> &omega;<sub>0</sub>, &omega;<sub>1</sub>,&omega;<sub>2</sub> &hellip; &omega;<sub>n-1</sub> <!-- Mediawiki LaTeX <math>I:=~\omega_0,\omega_1,\omega_2 \ldots \omega_{n-1}</math> --><br>
:<i>I :=</i> &omega;<sub>0</sub>, &omega;<sub>1</sub>, &omega;<sub>2</sub> &hellip; &omega;<sub>n-1</sub> <!-- Mediawiki LaTeX <math>I:=~\omega_0,\omega_1,\omega_2 \ldots \omega_{n-1}</math> --><br>


where
where
Line 161: Line 161:
* <i>&gamma;<sub>j</sub> &isin; &Gamma;</i>
* <i>&gamma;<sub>j</sub> &isin; &Gamma;</i>
* <i>&mu;<sub>j</sub> &isin; 2<sup>c</sup></i>
* <i>&mu;<sub>j</sub> &isin; 2<sup>c</sup></i>
* <i>&omega;<sub>j</sub> = (&gamma;<sub>j</sub>, &mu;<sub>j</sub></i>
* <i>&omega;<sub>j</sub> = (&gamma;<sub>j</sub>, &mu;<sub>j</sub>)</i>
<!-- Math formulas
<!-- Math formulas
* <math>\omega_j \in W</math>
* <math>\omega_j \in W</math>
Line 172: Line 172:
For the following sections, we define
For the following sections, we define


* group <i>W &rarr; &Gamma;.group : (&gamma;,&mu;) (&gamma;,&mu;) &#8614; &gamma;</i>
* group: <i>W &rarr; &Gamma; : (&gamma;,&mu;) &#8614; &gamma;</i>
* classes: <i>W &rarr; C</i>.classes: (&gamma;,&mu;) &#8614; &mu;
* classes: <i>W &rarr; C : (&gamma;,&mu;) &#8614; &mu;</i>
* <i>C<sub>x</sub> = {&omega;|&omega; &isin; class(&omega;)}</i>
* <i>C<sub>x</sub> = {&omega; &isin; W | x &isin; classes(&omega;)}</i>


To do that, it uses the class masks <i>M</i> as input for a pushdown automaton (PDA) A built from a parser grammar; if <i>M</i> was accepted by <i>A</i>, the parse tree <i>T</i><sub>&pi;</sub> will be built from the matching syntax tree to represent the semantics.  
To do that, it uses the class masks <i>M</i> as input for a pushdown automaton (PDA) A built from a parser grammar; if <i>M</i> was accepted by <i>A</i>, the parse tree <i>T</i><sub>&pi;</sub> will be built from the matching syntax tree to represent the semantics.  
Line 207: Line 207:
|0x14d
|0x14d
|Terminal
|Terminal
|Match on word group: Matches if <span style="white-space:nowrap"><i>m<sub>i</sub> &#61; group(&omega;<sub>j</sub>)</i></span>
|Match on word group: Matches if <span style="white-space:nowrap"><i>m<sub>i</sub> = group(&omega;<sub>j</sub>)</i></span>
|----
|----
|0x154
|0x154
Line 215: Line 215:
|}
|}


With the notable exception of the first rule, these rules constitute  <span style="white-space:nowrap"><i>P. V := {&chi;|&exist;R &isin; P.&chi; &exist R}</i></span>; typically, <span style="white-space:nowrap"><i> V = {0&chi;12&fnof;&hellip;0&chi;13&fnof;}&sdot; s = m<sub>0</sub></i></span> of the first rule encountered; in all games observed, it was set to 0x13c. &Sigma; contains all word groups and class masks. For the sake of simplicity, we will consider rules matching composite class masks to be several rules. Here is a simplified example of what such a grammar might look like (the hexadecimal prefix '0x' is omitted for brevity):
With the notable exception of the first rule, these rules constitute  <span style="white-space:nowrap"><i>P. V := {x | &exist;R &isin; P. x &isin; R}</i></span>; typically, <span style="white-space:nowrap"><i> V = {0x12f,&hellip;,0x13f}&sdot; s = m<sub>0</sub></i></span> of the first rule encountered; in all games observed, it was set to 0x13c. &Sigma; contains all word groups and class masks. For the sake of simplicity, we will consider rules matching composite class masks to be several rules. Here is a simplified example of what such a grammar might look like (the hexadecimal prefix '0x' is omitted for brevity):


In addition to this grammar, each right-hand non-terminal <i>m<sub>i</sub></i> carries its semantic value <i>&rho;<sub>i</sub></i> , which is not relevant for constructing a syntax tree, but must be considered for the semantic tree <i>T</i><sub>&pi;</sub>. These values were omitted in the example above. As in the example above, the grammar is a context-free (type 2) grammar, almost in Chomsky Normal Form (CNF) in SCI; constructing a grammar with CNF rules from it would be trivial.<ref>FreeSCI constructs a GNF (Greibach Normal Form) representation from these rules for parsing.</ref>
In addition to this grammar, each right-hand non-terminal <i>m<sub>i</sub></i> carries its semantic value <i>&rho;<sub>i</sub></i> , which is not relevant for constructing a syntax tree, but must be considered for the semantic tree <i>T</i><sub>&pi;</sub>. These values were omitted in the example above. As in the example above, the grammar is a context-free (type 2) grammar, almost in Chomsky Normal Form (CNF) in SCI; constructing a grammar with CNF rules from it would be trivial.<ref>FreeSCI constructs a GNF (Greibach Normal Form) representation from these rules for parsing.</ref>
Line 223: Line 223:


====Parser grammar example====
====Parser grammar example====
<span style="white-space:nowrap">G = &lang;{12&fnof;&hellip;13e},{C<sub>1</sub>,C<sub>2</sub>,C<sub>4</sub>,&hellip;,C<sub>100</sub>},P,13c&rang;</span>
<span style="white-space:nowrap">G = &lang;{12f,&hellip;,13e},{C<sub>1</sub>,C<sub>2</sub>,C<sub>4</sub>,&hellip;,C<sub>100</sub>},P,13c&rang;</span>
{| <i>
{| <i>
|----
|----
Line 295: Line 295:
there are semantic tags <i>&sigma;<sub>r,1</sub>,&sigma;<sub>r,2</sub>&hellip;&sigma;<sub>r,n</sub> &isin; S</i> , as explained above. <i>T</i><sub>&pi;</sub> is now constructed from the resulting derivation and the semantic tags assiociated with each non-terminal of the rule used. The construction algorithm is explained below with <i>T</i><sub>&pi;</sub> being constructed from nodes, which have the following structure:
there are semantic tags <i>&sigma;<sub>r,1</sub>,&sigma;<sub>r,2</sub>&hellip;&sigma;<sub>r,n</sub> &isin; S</i> , as explained above. <i>T</i><sub>&pi;</sub> is now constructed from the resulting derivation and the semantic tags assiociated with each non-terminal of the rule used. The construction algorithm is explained below with <i>T</i><sub>&pi;</sub> being constructed from nodes, which have the following structure:


<i>NODE = {&#9671;} &cup; S &#215; V &#215; (NODE &cup; &Gamma;);*</i>
<i>NODE = {&#9671;} &cup; S &#215; V &#215; (NODE &cup; &Gamma;)*;</i>


Where S is the set of possible semantic values, and V is the set of non-terminals as defined in the grammar. We will also use the sequence <i>&gamma;<sub>0</sub>,&gamma<sub>1</sub>,&gamma<sub>2</sub> &hellip; &gamma<sub>k-1</sub></i> which will represent the word groups the input tokens belonged to (in the exact order they were accepted), and the sequence <i>r<sub>0</sub>,r<sub>1</sub>,r<sub>2</sub> &hellip; r<sub>l-1</sub></i> , which will be the list of rules used to create the left derivation tree as described in the previous section.
Where S is the set of possible semantic values, and V is the set of non-terminals as defined in the grammar. We will also use the sequence <i>&gamma;<sub>0</sub>,&gamma<sub>1</sub>,&gamma<sub>2</sub> &hellip; &gamma<sub>k-1</sub></i> which will represent the word groups the input tokens belonged to (in the exact order they were accepted), and the sequence <i>r<sub>0</sub>,r<sub>1</sub>,r<sub>2</sub> &hellip; r<sub>l-1</sub></i> , which will be the list of rules used to create the left derivation tree as described in the previous section.
1,079

edits