1,079
edits
m (→Said specs: fix invalid usage of <syntax>) |
(→The Parse tree: Fixing lots of mistakes and errors (partially introduced during transition from LaTeX to wiki, partially older)) |
||
Line 154: | Line 154: | ||
After tokenizing, looking up, and finally aliasing the data found in the parsed input string, the interpreter proceeds to build a parse tree <!-- <math>T_\pi</math> --> <i>T</i><sub>π</sub> from the input tokens | After tokenizing, looking up, and finally aliasing the data found in the parsed input string, the interpreter proceeds to build a parse tree <!-- <math>T_\pi</math> --> <i>T</i><sub>π</sub> from the input tokens | ||
:<i>I :=</i> ω<sub>0</sub>, ω<sub>1</sub>, ω<sub>2</sub> … ω<sub>n-1</sub> <!-- Mediawiki LaTeX <math>I:=~\omega_0,\omega_1,\omega_2 \ldots \omega_{n-1}</math> --><br> | |||
where | where | ||
Line 161: | Line 161: | ||
* <i>γ<sub>j</sub> ∈ Γ</i> | * <i>γ<sub>j</sub> ∈ Γ</i> | ||
* <i>μ<sub>j</sub> ∈ 2<sup>c</sup></i> | * <i>μ<sub>j</sub> ∈ 2<sup>c</sup></i> | ||
* <i>ω<sub>j</sub> = (γ<sub>j</sub>, μ<sub>j</sub></i> | * <i>ω<sub>j</sub> = (γ<sub>j</sub>, μ<sub>j</sub>)</i> | ||
<!-- Math formulas | <!-- Math formulas | ||
* <math>\omega_j \in W</math> | * <math>\omega_j \in W</math> | ||
Line 172: | Line 172: | ||
For the following sections, we define | For the following sections, we define | ||
* group <i>W → Γ | * group: <i>W → Γ : (γ,μ) ↦ γ</i> | ||
* classes: <i>W → C | * classes: <i>W → C : (γ,μ) ↦ μ</i> | ||
* <i>C<sub>x</sub> = {ω | * <i>C<sub>x</sub> = {ω ∈ W | x ∈ classes(ω)}</i> | ||
To do that, it uses the class masks <i>M</i> as input for a pushdown automaton (PDA) A built from a parser grammar; if <i>M</i> was accepted by <i>A</i>, the parse tree <i>T</i><sub>π</sub> will be built from the matching syntax tree to represent the semantics. | To do that, it uses the class masks <i>M</i> as input for a pushdown automaton (PDA) A built from a parser grammar; if <i>M</i> was accepted by <i>A</i>, the parse tree <i>T</i><sub>π</sub> will be built from the matching syntax tree to represent the semantics. | ||
Line 207: | Line 207: | ||
|0x14d | |0x14d | ||
|Terminal | |Terminal | ||
|Match on word group: Matches if <span style="white-space:nowrap"><i>m<sub>i</sub> | |Match on word group: Matches if <span style="white-space:nowrap"><i>m<sub>i</sub> = group(ω<sub>j</sub>)</i></span> | ||
|---- | |---- | ||
|0x154 | |0x154 | ||
Line 215: | Line 215: | ||
|} | |} | ||
With the notable exception of the first rule, these rules constitute <span style="white-space:nowrap"><i>P. V := { | With the notable exception of the first rule, these rules constitute <span style="white-space:nowrap"><i>P. V := {x | ∃R ∈ P. x ∈ R}</i></span>; typically, <span style="white-space:nowrap"><i> V = {0x12f,…,0x13f}⋅ s = m<sub>0</sub></i></span> of the first rule encountered; in all games observed, it was set to 0x13c. Σ contains all word groups and class masks. For the sake of simplicity, we will consider rules matching composite class masks to be several rules. Here is a simplified example of what such a grammar might look like (the hexadecimal prefix '0x' is omitted for brevity): | ||
In addition to this grammar, each right-hand non-terminal <i>m<sub>i</sub></i> carries its semantic value <i>ρ<sub>i</sub></i> , which is not relevant for constructing a syntax tree, but must be considered for the semantic tree <i>T</i><sub>π</sub>. These values were omitted in the example above. As in the example above, the grammar is a context-free (type 2) grammar, almost in Chomsky Normal Form (CNF) in SCI; constructing a grammar with CNF rules from it would be trivial.<ref>FreeSCI constructs a GNF (Greibach Normal Form) representation from these rules for parsing.</ref> | In addition to this grammar, each right-hand non-terminal <i>m<sub>i</sub></i> carries its semantic value <i>ρ<sub>i</sub></i> , which is not relevant for constructing a syntax tree, but must be considered for the semantic tree <i>T</i><sub>π</sub>. These values were omitted in the example above. As in the example above, the grammar is a context-free (type 2) grammar, almost in Chomsky Normal Form (CNF) in SCI; constructing a grammar with CNF rules from it would be trivial.<ref>FreeSCI constructs a GNF (Greibach Normal Form) representation from these rules for parsing.</ref> | ||
Line 223: | Line 223: | ||
====Parser grammar example==== | ====Parser grammar example==== | ||
<span style="white-space:nowrap">G = ⟨{ | <span style="white-space:nowrap">G = ⟨{12f,…,13e},{C<sub>1</sub>,C<sub>2</sub>,C<sub>4</sub>,…,C<sub>100</sub>},P,13c⟩</span> | ||
{| <i> | {| <i> | ||
|---- | |---- | ||
Line 295: | Line 295: | ||
there are semantic tags <i>σ<sub>r,1</sub>,σ<sub>r,2</sub>…σ<sub>r,n</sub> ∈ S</i> , as explained above. <i>T</i><sub>π</sub> is now constructed from the resulting derivation and the semantic tags assiociated with each non-terminal of the rule used. The construction algorithm is explained below with <i>T</i><sub>π</sub> being constructed from nodes, which have the following structure: | there are semantic tags <i>σ<sub>r,1</sub>,σ<sub>r,2</sub>…σ<sub>r,n</sub> ∈ S</i> , as explained above. <i>T</i><sub>π</sub> is now constructed from the resulting derivation and the semantic tags assiociated with each non-terminal of the rule used. The construction algorithm is explained below with <i>T</i><sub>π</sub> being constructed from nodes, which have the following structure: | ||
<i>NODE = {◇} ∪ S × V × (NODE ∪ Γ); | <i>NODE = {◇} ∪ S × V × (NODE ∪ Γ)*;</i> | ||
Where S is the set of possible semantic values, and V is the set of non-terminals as defined in the grammar. We will also use the sequence <i>γ<sub>0</sub>,&gamma<sub>1</sub>,&gamma<sub>2</sub> … &gamma<sub>k-1</sub></i> which will represent the word groups the input tokens belonged to (in the exact order they were accepted), and the sequence <i>r<sub>0</sub>,r<sub>1</sub>,r<sub>2</sub> … r<sub>l-1</sub></i> , which will be the list of rules used to create the left derivation tree as described in the previous section. | Where S is the set of possible semantic values, and V is the set of non-terminals as defined in the grammar. We will also use the sequence <i>γ<sub>0</sub>,&gamma<sub>1</sub>,&gamma<sub>2</sub> … &gamma<sub>k-1</sub></i> which will represent the word groups the input tokens belonged to (in the exact order they were accepted), and the sequence <i>r<sub>0</sub>,r<sub>1</sub>,r<sub>2</sub> … r<sub>l-1</sub></i> , which will be the list of rules used to create the left derivation tree as described in the previous section. |
edits