Difference between revisions of "SCI/Specifications/SCI in action/Parser"

Jump to navigation Jump to search
(→‎The Parse tree: Started to convert things to <math>; clarified some parts of the text)
 
(3 intermediate revisions by the same user not shown)
Line 218: Line 218:
With the notable exception of the first rule, these rules constitute  the set of production rules <math>P</math>. Furthermore we have <math>V := \{x \mid \exists R \in P :  x \in R\}</math>; typically, <math>V = \{ \mathrm{0x12f}, \ldots, \mathrm{0x13f}\}</math> and  <math>s = m_0</math> of the first rule encountered; in all games observed, it was set to 0x13c. <math>\Sigma</math> contains all word groups and class masks. For the sake of simplicity, we will consider rules matching composite class masks to be several rules. Here is a simplified example of what such a grammar might look like (the hexadecimal prefix '0x' is omitted for brevity):
With the notable exception of the first rule, these rules constitute  the set of production rules <math>P</math>. Furthermore we have <math>V := \{x \mid \exists R \in P :  x \in R\}</math>; typically, <math>V = \{ \mathrm{0x12f}, \ldots, \mathrm{0x13f}\}</math> and  <math>s = m_0</math> of the first rule encountered; in all games observed, it was set to 0x13c. <math>\Sigma</math> contains all word groups and class masks. For the sake of simplicity, we will consider rules matching composite class masks to be several rules. Here is a simplified example of what such a grammar might look like (the hexadecimal prefix '0x' is omitted for brevity):


In addition to this grammar, each right-hand non-terminal <i>m<sub>i</sub></i> carries its semantic value <i>&rho;<sub>i</sub></i> , which is not relevant for constructing a syntax tree, but must be considered for the semantic tree <i>T</i><sub>&pi;</sub>. These values were omitted in the example above. As in the example above, the grammar is a context-free (type 2) grammar, almost in Chomsky Normal Form (CNF) in SCI; constructing a grammar with CNF rules from it would be trivial.<ref>FreeSCI constructs a GNF (Greibach Normal Form) representation from these rules for parsing.</ref>
In addition to this grammar, each right-hand non-terminal <math>m_i</math> carries its semantic value <math>\rho_i</math> , which is not relevant for constructing a syntax tree, but must be considered for the semantic tree <math>T_\pi</math>. These values were omitted in the example above. As in the example above, the grammar is a context-free (type 2) grammar, almost in Chomsky Normal Form (CNF) in SCI; constructing a grammar with CNF rules from it would be trivial.<ref>FreeSCI constructs a GNF (Greibach Normal Form) representation from these rules for parsing.</ref>


Obviously, G is an ambiguous grammar. In SCI, rule precedence is implied by rule order, so the resulting left derivation tree is well-defined (in the example, it would be defined by <i>D<sub>0</sub></i>.<ref>In FreeSCI, you can use the ”parse” console command to retreive all possible left derivation trees.</ref>
Obviously, <math>G</math> is an ambiguous grammar. In SCI, rule precedence is implied by rule order, so the resulting left derivation tree is well-defined (in the Parser example further down, it would be defined by <math>D_0</math>.<ref>In FreeSCI, you can use the ”parse” console command to retrieve all possible left derivation trees.</ref>
<br>
<br>


====Parser grammar example====
====Parser grammar example====
<span style="white-space:nowrap">G = &lang; { 12f, &hellip;, 13e }, { C<sub>1</sub>, C<sub>2</sub>, C<sub>4</sub>, &hellip;, C<sub>100</sub> }, P, 13c &rang;</span>
<math>G = \langle \{ \mathrm{12f}, \ldots, \mathrm{13e} \}, \{ C_1, C_2, C_4, \ldots, C_{100} \}, P, \mathrm{13c} \rangle</math>
{| <i>
{| <i>
|----
|----
Line 290: Line 290:


===Semantics===
===Semantics===
This is important, since the parser does much more than just accept or discard input. Using the semantic tags applied to each non-terminal on the right-hand side of a rule, it constructs what I will call the semantic parse tree <i>T</i><sub>&pi;</sub> , which attempts to describe what the input means. For each non-terminal rule
This is important, since the parser does much more than just accept or discard input. Using the semantic tags applied to each non-terminal on the right-hand side of a rule, it constructs what I will call the semantic parse tree <math>T_\pi</math> , which attempts to describe what the input means. For each non-terminal rule


<i>r = &upsilon;<sub>0</sub> &#8614; &upsilon;<sub>1</sub>&upsilon;<sub>2</sub>&hellip;&upsilon;<sub>n</sub></i>
<math>r = v_0 \to v_1 v_2 \ldots v_n</math>


there are semantic tags <i>&sigma;<sub>r,1</sub>,&sigma;<sub>r,2</sub>&hellip;&sigma;<sub>r,n</sub> &isin; S</i> , as explained above. <i>T</i><sub>&pi;</sub> is now constructed from the resulting derivation and the semantic tags assiociated with each non-terminal of the rule used. The construction algorithm is explained below with <i>T</i><sub>&pi;</sub> being constructed from nodes, which have the following structure:
there are semantic tags <math>\sigma_{r,1}, \sigma_{r,2} \ldots \sigma_{r,n} \in S</math>, as explained above. <math>T_\pi</math> is now constructed from the resulting derivation and the semantic tags assiociated with each non-terminal of the rule used. The construction algorithm is explained below with <math>T_\pi</math> being constructed from nodes, which have the following recursive structure:


<i>NODE = {&#9671;} &cup; S &#215; V &#215; (NODE &cup; &Gamma;)*;</i>
<math>
\newcommand{\Node}{\textrm{NODE}}
\Node = \{\diamond\} \cup S \times V \times (\Node \cup \Gamma)*;</math>


Where S is the set of possible semantic values, and V is the set of non-terminals as defined in the grammar. We will also use the sequence <i>&gamma;<sub>0</sub>,&gamma<sub>1</sub>,&gamma<sub>2</sub> &hellip; &gamma<sub>k-1</sub></i> which will represent the word groups the input tokens belonged to (in the exact order they were accepted), and the sequence <i>r<sub>0</sub>,r<sub>1</sub>,r<sub>2</sub> &hellip; r<sub>l-1</sub></i> , which will be the list of rules used to create the left derivation tree as described in the previous section.
Where <math>S</math> is the set of possible semantic values, and <math>V</math> is the set of non-terminals as defined in the grammar. We will also use the sequence <math>\gamma_{0}, \gamma_{1}, \gamma_{2} \ldots \gamma_{k-1}</math> which will represent the word groups the input tokens belonged to (in the exact order they were accepted), and the sequence <math>r_{0}, r_{1}, r_{2} \ldots r_{l-1}</math>, which will be the list of rules used to create the left derivation tree as described in the previous section.


<pre>
<pre>
Line 332: Line 334:
====Example: Parser example====
====Example: Parser example====
Parse is called with <i>"open door"</i>.
Parse is called with <i>"open door"</i>.
* <i>"open" &isin; &lang;842,{C<sub>80</sub>}&rang;</i> (an imperative word of the word group 0x842)
* <math>\textrm{"open"} \in \langle 842, \{ C_{80} \} \rangle</math> (an imperative word of the word group 0x842)
* <i>"door" &isin; &lang;917,{C<sub>10</sub>}&rang; (a substantive of the word group 0x917)</i>
* <math>\textrm{"door"} \in \langle 917, \{ C_{10} \} \rangle</math> (a substantive of the word group 0x917)
* <i>I = &lang;842,{C<sub>80</sub>}&rang;,&lang;917,{C<sub>10</sub>}&rang;</i>
* <math>I = \langle 842, \{C_{80} \} \rangle, \langle 917, \{C_{10} \} \rangle</math>


<i>I</i> is clearly accepted by automatons based on the grammar described above, There are two possible derivations:
<math>I</math> is clearly accepted by automatons based on the grammar described above, There are two possible derivations:


{|
{|
|D<sub>0</sub> = 13c
|colspan="2" align="center"| D<sub>0</sub> = 13c
|width="50em"|&nbsp;
|colspan="2" align="center"| D<sub>1</sub> = 13c
|----
|----
|(13c &#8614; 13b134)
|(13c &#8614; 13b 134)
|&#8866; 13b 134
|&#8866; 13b 134
|
|(13c &#8614; 13b)
|&#8866; 13b
|----
|----
|(13b &#8614; 131)
|(13b &#8614; 131)
|&#8866; 131 134
|
|(13b &#8614; 131 134)
|&#8866; 131 134
|&#8866; 131 134
|----
|----
|(131 &#8614; C<sub>80</sub>
|&#8866; C<sub>80</sub> 134
|
|(131 &#8614; C<sub>80</sub>
|(131 &#8614; C<sub>80</sub>
|&#8866; C<sub>80</sub> 134
|&#8866; C<sub>80</sub> 134
Line 352: Line 365:
|(134 &#8614; C<sub>10</sub>
|(134 &#8614; C<sub>10</sub>
|&#8866; C<sub>80</sub> C<sub>10</sub>
|&#8866; C<sub>80</sub> C<sub>10</sub>
|----
|
|}
 
<br>
 
{|
|D<sub>1</sub> = 13c
|----
|(13c &#8614; 13b)
|&#8866; 13b
|----
|(13b &#8614; 131134)
|&#8866; 131 134
|----
|(131 &#8614; C<sub>80</sub>
|&#8866; C<sub>80</sub> 134
|----
|(134 &#8614; C<sub>10</sub>
|(134 &#8614; C<sub>10</sub>
|&#8866; C<sub>80</sub> C<sub>10</sub>
|&#8866; C<sub>80</sub> C<sub>10</sub>
|----
|----
|}
|}


====Example: Semantic tree example====
====Example: Semantic tree example====
Line 545: Line 541:


==Matching the trees==
==Matching the trees==
The exact algorithm used to compare <i>T</i><sub>&pi;</sub> to <i>T</i><sub>&Sigma;</sub> is not known yet. The one described here is based on the approximation used in FreeSCI, which is very similar to the original SCI one.
The exact algorithm used to compare <math>T_\pi</math> to <math>T_\Sigma</math> is not known yet. The one described here is based on the approximation used in FreeSCI, which is very similar to the original SCI one.
 
First, we need to describe a set of functions for traversing the nodes of <i>T</i><sub>&Sigma;</sub> and <i>T</i><sub>&pi;</sub> , and doing some work. They will be operating on the sets &#8469; (all non-negative integral numbers), &#120121; (Booleans), and <i>NODE</i> (which we defined earlier).
<br>
first: Node &rarr; S first((s, v, n<sub>0</sub>, n<sub>1</sub> &hellip; n<sub>i</sub>)) := s<br>
 
second : Node &rarr; V second((s, v, n<sub>0</sub>, n<sub>1</sub> &hellip; n<sub>i</sub>)) := v<br>
 
word : Node &rarr; &Gamma word((s, v, γ)) := &gamma;<br>
 
children : Node &rarr; Node∗ children((s, v, n<sub>0</sub>, n<sub>1</sub> &hellip; n<sub>i</sub>)) := { m | &forall;m.m&isin;{ n<sub>0</sub>, n<sub>1</sub> &hellip; n<sub>i</sub> } &and; m&isin;Node }<br>
 
all_children : Node &rarr; Node∗ all_children(n) := children(n) &cup; { m | &exist;l.l&isin;all_children(n).m&isin;l }<br>
 
is_word : Node &rarr; B is_word((s, v, n<sub>0</sub>, n<sub>1</sub> &hellip; n<sub>i</sub>) = tt &hArr; (i = 0) &and; n<sub>0</sub> &isin; &Gamma<br>
 
contains_word : Node &times; S &times; &Gamma &rarr; B contains_word(n, s, &gamma;) = tt &gamma; = 0xfff<ref>This is the so-called "anyword". Words with a word group of 0xfff match any other word.</ref> ∨ (&hArr; &exist;m.m&isin;all_children(n).(s = second(m)) &and; (is_word(m) &and; (word(m) = &gamma;)))<br>
 
verify_sentence_part_elements : Node &times; Node &rarr; B verify_sentence_part_elements(n<sub>p</sub>, n<sub>s</sub>) = tt &hArr; (first(n<sub>s</sub> = 152) &and; ((&forall;m.m &isin; Node.verify_sentence_part_elements(m, n<sub>s</sub>) &hArr; { w | &exist;t.t &isin; all_children(m).w = word(t)} = &empty;) &or; &exist;m &isin;<br>
children(n<sub>s</sub>).verify_sentence_part_elements(m, n<sub>s</sub>)) ) &or; ((second(n<sub>s</sub>) = 153) &and; (&exist;m.m &isin; children(n<sub>s</sub>).(&exist;o &isin; all_children(n<sub>s</sub>).(first(o) = first(n<sub>p</sub>)) &and; word(o) = word(m))) ) &or; ((second(n<sub>s</sub>) &isin; {144, 14c}) &and; (&exist;m.m &isin; children(n<sub>s</sub>).verify_sentence_part(m, n<sub>s</sub>)))<br>


verify_sentence_part : Node &times; Node &rarr; B<br>
First, we need to describe a set of functions for traversing the nodes of <math>T_\Sigma</math> and <math>T_\pi</math>, and doing some work. They will be operating on the sets <math>\mathbb{N}</math> (all non-negative integral numbers), <math>\mathbb{B}</math> (Booleans), and <math>\mathrm{Node}</math> (which we defined earlier).
verify_sentence_part(n<sub>p</sub>, n<sub>s</sub>) = tt &hArr; &forall;n.n &isin; children(n<sub>s</sub>):&exist;m.m&isin;children(n<sub>p</sub>).(first(m) = first(n)) &and; verify_sentence_part_elements(n, m)<br>


verify_sentence_part_brackets : Node &times; Node &rarr; B verify_sentence_part_brackets(n<sub>p</sub>, n<sub>s</sub>) = tt &hArr; (first(n<sub>p</sub>) = 152 &and; (&forall;m.m&isin;Node.(first(m) = first(n<sub>s</sub>)) &and; (second(m) = second(n<sub>s</sub>)). verify_sentence_part(n<sub>p</sub>, m) &hArr; { w | &exist;t.t &isin; all_children(m).w = word(t)} = &empty;)) &or; ((first(n<sub>p</sub>) &isin; {141, 142, 143}) &and; verify_sentence_part(n<sub>p</sub>, n<sub>s</sub>))<br>
<math>
\newcommand{\tuple}[1]{\langle #1 \rangle}
\newcommand{\nat}{\mathbb{N}}
\newcommand{\bool}{\mathbb{B}}
\newcommand{\Btt}{\textrm{tt}}
\newcommand{\Bff}{\textrm{ff}}
\newcommand{\Node}{\mathrm{Node}}
\begin{align*}
\textrm{first}: & \Node \to S \\
\textrm{first}: & \tuple{s, v, n_{0}, n_{1} \ldots n_{i}} \mapsto s \\
\\
\textrm{second}: & \Node \to V \\
\textrm{second}: & \tuple{s, v, n_{0}, n_{1} \ldots n_{i}} \mapsto v \\
\\
\textrm{word}: & \Node \to \Gamma \\
\textrm{word}: & \tuple{s, v, \gamma} \mapsto \gamma \\
\\
\textrm{children}: & \Node \to \Node* \\
\textrm{children}: & \tuple{s, v, n_{0}, n_{1} \ldots n_{i}} \mapsto \{ m\in \Node \mid m\in\{ n_{0}, n_{1} \ldots n_{i} \} \} \\
\\
\textrm{all-children}: & \Node \to \Node* \\
\textrm{all-children}: & n \mapsto \textrm{children}(n) \cup \{ m \mid \exists l \in\textrm{all-children}(n) . m\in l \} \\
\\
\textrm{is-word} : & \Node \to \bool \\
\textrm{is-word} : & \tuple{s, v, n_{0}, n_{1} \ldots n_{i}} \mapsto \Btt \iff (i = 0) \land n_{0} \in \Gamma \\
\textrm{verify-sentence-part-elements}: & \Node \times \Node \to \bool \\
\textrm{verify-sentence-part-elements}: & \tuple{n_{p}, n_{s}} \mapsto \Btt \iff (\textrm{first}(n_{s} = 152) \land \\
& ((\forall m \in \Node.\textrm{verify-sentence-part-elements}(m, n_{s}) \iff \\
& \{ w \mid \exists t \in \textrm{all-children}(m).w = \textrm{word}(t)\} = \emptyset) \lor \\
& \exists m \in \textrm{children}(n_{s}).\textrm{verify-sentence-part-elements}(m, n_{s})) ) \lor \\
& ((\textrm{second}(n_{s}) = 153) \land (\exists m.m \in \textrm{children}(n_{s}).(\exists o \in \textrm{all-children}(n_{s}).\\
& (\textrm{first}(o) = \textrm{first}(n_{p})) \land \textrm{word}(o) = \textrm{word}(m))) ) \lor \\
& ((\textrm{second}(n_{s}) \in \{144, 14c\}) \land
(\exists m \in \textrm{children}(n_{s}).\textrm{verify-sentence-part}(m, n_{s}))) \\
\\
\textrm{verify-sentence-part}: & \Node \times \Node \to \bool \\
\textrm{verify-sentence-part}: & \tuple{n_{p}, n_{s}} \mapsto \Btt \iff \forall n \in \textrm{children}(n_{s}):\exists m\in\textrm{children}(n_{p}). \\
& (\textrm{first}(m) = \textrm{first}(n)) \land \textrm{verify-sentence-part-elements}(n, m) \\
\\
\textrm{verify-sentence-part-brackets}: & \Node \times \Node \to \bool \\
\textrm{verify-sentence-part-brackets}: & \tuple{n_{p}, n_{s}} \mapsto \Btt \iff
(\textrm{first}(n_{p}) = 152 \land
(\forall m\in \Node. (\textrm{first}(m) = \textrm{first}(n_{s})) \land\\
& (\textrm{second}(m) = \textrm{second}(n_{s})). \textrm{verify-sentence-part}(n_{p}, m) \iff\\
& \{ w \mid \exists t \in \textrm{all-children}(m).w = \textrm{word}(t)\} = \emptyset)) \lor \\
& ((\textrm{first}(n_{p}) \in \{141, 142, 143\}) \land
    \textrm{verify-sentence-part}(n_{p}, n_{s})) \\
\end{align*}
</math>


With these functions, we can now define an algorithm for augmenting T<sub>&pi;</sub> and T<sub>&Sigma;</sub> :
With these functions, we can now define an algorithm for augmenting <math>T_\pi</math> and <math>T_\Sigma</math>:


Algorithm SCI-AUGMENT matched := tt claim_on_match := tt FOREACH n &isin; root(T<sub>&Sigma;</sub>) IF ((first(n) = 14b) &and; (second(n) = f900)) THEN claim_on_match := ff ELSE IF &not;verify_sentence_part_brackets(n, root(T<sub>&pi;</sub>)) THEN matched := ff HCAEROF
  Algorithm SCI-AUGMENT:
  matched := tt
  claim_on_match := tt
  FOREACH n &isin; root(<math>T_\Sigma</math>)
    IF ((first(n) = 14b) &and; (second(n) = f900)) THEN
      claim_on_match := ff
    ELSE IF &not;verify_sentence_part_brackets(n, root(<math>T_\pi</math>)) THEN
      matched := ff
    ENDIF
  ENDFOREACH


Augmenting succeeded if matched = tt; in this case, T<sub>&pi</sub> is one of the trees accepted by the description provided by T<sub>&Sigma</sub>. In this case, Said() will return 1. It will also claim the event previously provided to Parse(), unless claim_on_match = ff.
Augmenting succeeded if matched = tt; in this case, <math>T_\pi</math> is one of the trees accepted by the description provided by <math>T_\Sigma</math>. In this case, Said() will return 1. It will also claim the event previously provided to Parse(), unless claim_on_match = ff.


==Notes==
==Notes==
<references/>
<references/>
1,079

edits

Navigation menu