1,079
edits
(Merging of the SCI documentation. Work in progress.) |
|||
(14 intermediate revisions by 2 users not shown) | |||
Line 1: | Line 1: | ||
=The Parser= | =The Parser= | ||
Line 22: | Line 20: | ||
;0x080 : Relative pronoun | ;0x080 : Relative pronoun | ||
;0x100 : Noun | ;0x100 : Noun | ||
;0x200 : Indicative verb | ;0x200 : Indicative verb (such as "is","went" as opposed to _do_ this or that, which is imperative) | ||
;0x400 : Adverb | ;0x400 : Adverb | ||
;0x800 : Imperative verb | ;0x800 : Imperative verb | ||
Line 54: | Line 52: | ||
|} | |} | ||
<br> | <br> | ||
===The suffix vocabulary (VOCAB.901)=== | ===The suffix vocabulary (VOCAB.901)=== | ||
For the following section, a reference to a grammar book may be advisable. | For the following section, a reference to a grammar book may be advisable. | ||
Line 155: | Line 154: | ||
After tokenizing, looking up, and finally aliasing the data found in the parsed input string, the interpreter proceeds to build a parse tree <!-- <math>T_\pi</math> --> <i>T</i><sub>π</sub> from the input tokens | After tokenizing, looking up, and finally aliasing the data found in the parsed input string, the interpreter proceeds to build a parse tree <!-- <math>T_\pi</math> --> <i>T</i><sub>π</sub> from the input tokens | ||
:<math>I~:=~\omega_0,\omega_1,\omega_2 \ldots \omega_{n-1}</math> | |||
where | where | ||
* < | * <math>\omega_j \in W</math>, with <math>W</math> being the set of all words; | ||
* < | * <math>\gamma_j \in \Gamma</math>, with <math>\Gamma</math> being the set of all word groups; | ||
* < | * <math>\mu_j \in 2^C</math>, with <math>C</math> being the set of all class masks <nowiki>{0x1,0x2,0x4,0x8,0x10,0x20,0x40,0x80,0x100}</nowiki>, and <math>2^C</math> being the set of all subsets of <math>C</math>; | ||
* <math>\omega_j = (\gamma_j, \mu_j)</math>, with <math>\gamma_j</math> being the word group <math>\omega_j</math> belongs to, and <math>\mu_j</math> being its class mask. | |||
Note that elements of <math>2^C</math> (i.e. sets of class masks) can be identified with the ORed value of these class masks. So the set <nowiki>{0x2,0x4,0x80}</nowiki> can be identified with the value 0x86. | |||
For the following sections, we define two maps and a set: | |||
* <math>\operatorname{group}: W \to \Gamma : (\gamma,\mu) \mapsto \gamma</math> | |||
* <math>\operatorname{classes}: W \to 2^C : (\gamma,\mu) \mapsto \mu</math> | |||
* <math>C_x = \{ \omega \in W \mid x \in \operatorname{classes}(\omega)\}</math> | |||
To do that, it uses the class masks <math>M</math> as input for a pushdown automaton (PDA) A built from a parser grammar; if <math>M</math> was accepted by <math>A</math>, the parse tree <math>T_pi</math> will be built from the matching syntax tree to represent the semantics. | |||
The PDA is defined by a grammar <math>G = (V,\Sigma,P,s)</math> where | |||
* <math>V</math> is finite set of <i>nonterminal symbols</i>, | |||
* <math>\Sigma</math> is a finite set <i>terminal symbols</i> that is disjoint from <math>V</math>, | |||
* <math>P</math> is a finite set of <i>production rules</i>, | |||
* <math>s \in V</math> is the <i>start symbol</i>. | |||
Most of this grammar, along with its semantics, is stored in <tt>vocab.900</tt>. This resource contains a parser rule at every 20 bytes, starting with a non-terminal symbol <math>\upsilon</math> (one word) and a null-terminated list of up to five tuples <math>(\sigma_i, m_i)</math>, both of which are words. In these tuples, <math>m_i</math> is a terminal or non-terminal symbol (determined by <math>\sigma_i</math> ), and <math>\sigma_i</math> is the meaning of <math>m_i</math> as described in the following table: | |||
{| border="1" | {| border="1" style="margin: 1em auto 1em auto" | ||
|< | |<math>\sigma_i</math> | ||
|Type | |Type | ||
|Meaning | |Meaning | ||
Line 204: | Line 204: | ||
|0x146 | |0x146 | ||
|Terminal | |Terminal | ||
|Match on class mask: Matches if < | |Match on class mask: Matches if <math>m_i\in\operatorname{classes}(\omega_j)</math> | ||
|---- | |---- | ||
|0x14d | |0x14d | ||
|Terminal | |Terminal | ||
|Match on word group: Matches if < | |Match on word group: Matches if <math>m_i=\operatorname{group}(\omega_j)</math> | ||
|---- | |---- | ||
|0x154 | |0x154 | ||
Line 216: | Line 216: | ||
|} | |} | ||
With the notable exception of the first rule, these rules constitute < | With the notable exception of the first rule, these rules constitute the set of production rules <math>P</math>. Furthermore we have <math>V := \{x \mid \exists R \in P : x \in R\}</math>; typically, <math>V = \{ \mathrm{0x12f}, \ldots, \mathrm{0x13f}\}</math> and <math>s = m_0</math> of the first rule encountered; in all games observed, it was set to 0x13c. <math>\Sigma</math> contains all word groups and class masks. For the sake of simplicity, we will consider rules matching composite class masks to be several rules. Here is a simplified example of what such a grammar might look like (the hexadecimal prefix '0x' is omitted for brevity): | ||
In addition to this grammar, each right-hand non-terminal < | In addition to this grammar, each right-hand non-terminal <math>m_i</math> carries its semantic value <math>\rho_i</math> , which is not relevant for constructing a syntax tree, but must be considered for the semantic tree <math>T_\pi</math>. These values were omitted in the example above. As in the example above, the grammar is a context-free (type 2) grammar, almost in Chomsky Normal Form (CNF) in SCI; constructing a grammar with CNF rules from it would be trivial.<ref>FreeSCI constructs a GNF (Greibach Normal Form) representation from these rules for parsing.</ref> | ||
Obviously, G is an ambiguous grammar. In SCI, rule precedence is implied by rule order, so the resulting left derivation tree is well-defined (in the example, it would be defined by < | Obviously, <math>G</math> is an ambiguous grammar. In SCI, rule precedence is implied by rule order, so the resulting left derivation tree is well-defined (in the Parser example further down, it would be defined by <math>D_0</math>.<ref>In FreeSCI, you can use the ”parse” console command to retrieve all possible left derivation trees.</ref> | ||
<br> | <br> | ||
====Parser grammar example==== | ====Parser grammar example==== | ||
< | <math>G = \langle \{ \mathrm{12f}, \ldots, \mathrm{13e} \}, \{ C_1, C_2, C_4, \ldots, C_{100} \}, P, \mathrm{13c} \rangle</math> | ||
{| <i> | {| <i> | ||
|---- | |---- | ||
| | |P = { | ||
|13c | |13c | ||
|& | |→ | ||
|13b 134 | |13b 134 | ||
|---- | |---- | ||
| | | | ||
|13c | |13c | ||
|& | |→ | ||
|13b 13d 133 | |13b 13d 133 | ||
|---- | |---- | ||
| | | | ||
|13c | |13c | ||
|& | |→ | ||
|13b 13d | |13b 13d | ||
|---- | |---- | ||
| | | | ||
|13c | |13c | ||
|& | |→ | ||
| | |13b | ||
|---- | |---- | ||
| | | | ||
|13c | |13c | ||
|& | |→ | ||
|13b 13d 13b 13d | |13b 13d 13b 13d | ||
|---- | |---- | ||
| | | | ||
|13b | |13b | ||
|& | |→ | ||
|131 134 | |131 134 | ||
|---- | |---- | ||
| | | | ||
|13b | |13b | ||
|& | |→ | ||
|131 13d 13d | |131 13d 13d | ||
|---- | |---- | ||
| | | | ||
|13b | |13b | ||
|& | |→ | ||
|131 | |131 | ||
|---- | |---- | ||
| | |||
|13d | |13d | ||
|& | |→ | ||
|134 | |134 | ||
|---- | |---- | ||
| | | | ||
|131 | |131 | ||
|& | |→ | ||
|C<sub> | |C<sub>80</sub> | ||
|---- | |---- | ||
| | | | ||
|133 | |133 | ||
|& | |→ | ||
|C<sub> | |C<sub>20</sub> | ||
|---- | |---- | ||
| | | | ||
|134 | |134 | ||
|& | |→ | ||
|C<sub> | |C<sub>10</sub> } | ||
|---- | |---- | ||
|}</i> | |}</i> | ||
===Semantics=== | ===Semantics=== | ||
This is important, since the parser does much more than just accept or discard input. Using the semantic tags applied to each non-terminal on the right-hand side of a rule, it constructs what I will call the semantic parse tree < | This is important, since the parser does much more than just accept or discard input. Using the semantic tags applied to each non-terminal on the right-hand side of a rule, it constructs what I will call the semantic parse tree <math>T_\pi</math> , which attempts to describe what the input means. For each non-terminal rule | ||
< | <math>r = v_0 \to v_1 v_2 \ldots v_n</math> | ||
there are semantic tags < | there are semantic tags <math>\sigma_{r,1}, \sigma_{r,2} \ldots \sigma_{r,n} \in S</math>, as explained above. <math>T_\pi</math> is now constructed from the resulting derivation and the semantic tags assiociated with each non-terminal of the rule used. The construction algorithm is explained below with <math>T_\pi</math> being constructed from nodes, which have the following recursive structure: | ||
< | <math> | ||
\newcommand{\Node}{\textrm{NODE}} | |||
\Node = \{\diamond\} \cup S \times V \times (\Node \cup \Gamma)*;</math> | |||
Where S is the set of possible semantic values, and V is the set of non-terminals as defined in the grammar. We will also use the sequence < | Where <math>S</math> is the set of possible semantic values, and <math>V</math> is the set of non-terminals as defined in the grammar. We will also use the sequence <math>\gamma_{0}, \gamma_{1}, \gamma_{2} \ldots \gamma_{k-1}</math> which will represent the word groups the input tokens belonged to (in the exact order they were accepted), and the sequence <math>r_{0}, r_{1}, r_{2} \ldots r_{l-1}</math>, which will be the list of rules used to create the left derivation tree as described in the previous section. | ||
<pre> | |||
< | Helper function sci_said_recursive: S × V × (V ∪ Σ)* → NODE | ||
Parameters: s ∈ S, Rule r ∈ V × (V ∪ Σ): v0 → v1 v2 ... vi | |||
cnmr = cnr | |||
NODE n := s, v0 | |||
FOR j := 1 TO i | |||
IF (vj ∈ Σ) THEN | |||
n := n, γcnγ | |||
cnγ := cnγ + 1 | |||
ELSE | |||
cnoldr := cnr | |||
cnr := cnr + 1 | |||
n := n, sci_said_recursive(Σrmr,j, rcnoldr) | |||
FI | |||
ROF | |||
RETURN (n) | |||
Helper function get_children: NODE → NODE* | |||
get_children((s, v, n0, n1 ... nm)) := n0, n1 ... nm | |||
Algorithm SCI-SAID-TREE | |||
cnγ := 0; | |||
cnr := 1; | |||
ntemp := ntemp, SCI-SAID-RECURSIVE(0, r0) | |||
root(TΠ) := (141, 13f, get_children(ntemp)) | |||
</pre> | |||
</ | |||
Here is an example, based on the previous one: | Here is an example, based on the previous one: | ||
Line 333: | Line 334: | ||
====Example: Parser example==== | ====Example: Parser example==== | ||
Parse is called with <i>"open door"</i>. | Parse is called with <i>"open door"</i>. | ||
* < | * <math>\textrm{"open"} \in \langle 842, \{ C_{80} \} \rangle</math> (an imperative word of the word group 0x842) | ||
* < | * <math>\textrm{"door"} \in \langle 917, \{ C_{10} \} \rangle</math> (a substantive of the word group 0x917) | ||
* < | * <math>I = \langle 842, \{C_{80} \} \rangle, \langle 917, \{C_{10} \} \rangle</math> | ||
< | <math>I</math> is clearly accepted by automatons based on the grammar described above, There are two possible derivations: | ||
{| | {| | ||
|D<sub>0</sub> = 13c | |colspan="2" align="center"| D<sub>0</sub> = 13c | ||
|width="50em"| | |||
|colspan="2" align="center"| D<sub>1</sub> = 13c | |||
|---- | |---- | ||
|(13c ↦ | |(13c ↦ 13b 134) | ||
|⊢ 13b 134 | |⊢ 13b 134 | ||
| | |||
|(13c ↦ 13b) | |||
|⊢ 13b | |||
|---- | |---- | ||
|(13b ↦ 131) | |(13b ↦ 131) | ||
|⊢ 131 134 | |||
| | |||
|(13b ↦ 131 134) | |||
|⊢ 131 134 | |⊢ 131 134 | ||
|---- | |---- | ||
|(131 ↦ C<sub>80</sub> | |||
|⊢ C<sub>80</sub> 134 | |||
| | |||
|(131 ↦ C<sub>80</sub> | |(131 ↦ C<sub>80</sub> | ||
|⊢ C<sub>80</sub> 134 | |⊢ C<sub>80</sub> 134 | ||
Line 353: | Line 365: | ||
|(134 ↦ C<sub>10</sub> | |(134 ↦ C<sub>10</sub> | ||
|⊢ C<sub>80</sub> C<sub>10</sub> | |⊢ C<sub>80</sub> C<sub>10</sub> | ||
| | | | ||
|(134 ↦ C<sub>10</sub> | |(134 ↦ C<sub>10</sub> | ||
|⊢ C<sub>80</sub> C<sub>10</sub> | |⊢ C<sub>80</sub> C<sub>10</sub> | ||
|---- | |---- | ||
|} | |} | ||
====Example: Semantic tree example==== | ====Example: Semantic tree example==== | ||
Line 452: | Line 447: | ||
This sequence of operators and word groups is now used to build the Said tree <i>T</i><sub>Σ</sub> . I will describe the algorithm used to generate <i>T</i><sub>Σ</sub> by providing a grammar <i>G</i><sub>Σ</sub> , with <i>L(G<sub>Σ</sub>) </i>containing all valid Said specs. The semantics will be provided under each rule with a double arrow: | This sequence of operators and word groups is now used to build the Said tree <i>T</i><sub>Σ</sub> . I will describe the algorithm used to generate <i>T</i><sub>Σ</sub> by providing a grammar <i>G</i><sub>Σ</sub> , with <i>L(G<sub>Σ</sub>) </i>containing all valid Said specs. The semantics will be provided under each rule with a double arrow: | ||
< | <pre> | ||
G\Sigma = ({saidspec, optcont, leftspec, midspec, rightspec, word, cwordset, wordset, expr, cwordrefset, wordrefset, recref}, \Gamma, P, saidspec) | G\Sigma = ({saidspec, optcont, leftspec, midspec, rightspec, word, cwordset, wordset, expr, cwordrefset, wordrefset, recref}, \Gamma, P, saidspec) | ||
Line 543: | Line 538: | ||
\Rightarrow (141 144 (144 14f wordset)) | \Rightarrow (141 144 (144 14f wordset)) | ||
} | } | ||
</ | </pre> | ||
==Matching the trees== | ==Matching the trees== | ||
The exact algorithm used to compare < | The exact algorithm used to compare <math>T_\pi</math> to <math>T_\Sigma</math> is not known yet. The one described here is based on the approximation used in FreeSCI, which is very similar to the original SCI one. | ||
First, we need to describe a set of functions for traversing the nodes of <math>T_\Sigma</math> and <math>T_\pi</math>, and doing some work. They will be operating on the sets <math>\mathbb{N}</math> (all non-negative integral numbers), <math>\mathbb{B}</math> (Booleans), and <math>\mathrm{Node}</math> (which we defined earlier). | |||
<math> | |||
\newcommand{\tuple}[1]{\langle #1 \rangle} | |||
\newcommand{\nat}{\mathbb{N}} | |||
\newcommand{\bool}{\mathbb{B}} | |||
\newcommand{\Btt}{\textrm{tt}} | |||
\newcommand{\Bff}{\textrm{ff}} | |||
\newcommand{\Node}{\mathrm{Node}} | |||
\begin{align*} | |||
\textrm{first}: & \Node \to S \\ | |||
\textrm{first}: & \tuple{s, v, n_{0}, n_{1} \ldots n_{i}} \mapsto s \\ | |||
\\ | |||
\textrm{second}: & \Node \to V \\ | |||
\textrm{second}: & \tuple{s, v, n_{0}, n_{1} \ldots n_{i}} \mapsto v \\ | |||
\\ | |||
\textrm{word}: & \Node \to \Gamma \\ | |||
\textrm{word}: & \tuple{s, v, \gamma} \mapsto \gamma \\ | |||
\\ | |||
\textrm{children}: & \Node \to \Node* \\ | |||
\textrm{children}: & \tuple{s, v, n_{0}, n_{1} \ldots n_{i}} \mapsto \{ m\in \Node \mid m\in\{ n_{0}, n_{1} \ldots n_{i} \} \} \\ | |||
\\ | |||
\textrm{all-children}: & \Node \to \Node* \\ | |||
\textrm{all-children}: & n \mapsto \textrm{children}(n) \cup \{ m \mid \exists l \in\textrm{all-children}(n) . m\in l \} \\ | |||
\\ | |||
\textrm{is-word} : & \Node \to \bool \\ | |||
\textrm{is-word} : & \tuple{s, v, n_{0}, n_{1} \ldots n_{i}} \mapsto \Btt \iff (i = 0) \land n_{0} \in \Gamma \\ | |||
\textrm{verify-sentence-part-elements}: & \Node \times \Node \to \bool \\ | |||
\textrm{verify-sentence-part-elements}: & \tuple{n_{p}, n_{s}} \mapsto \Btt \iff (\textrm{first}(n_{s} = 152) \land \\ | |||
& ((\forall m \in \Node.\textrm{verify-sentence-part-elements}(m, n_{s}) \iff \\ | |||
& \{ w \mid \exists t \in \textrm{all-children}(m).w = \textrm{word}(t)\} = \emptyset) \lor \\ | |||
& \exists m \in \textrm{children}(n_{s}).\textrm{verify-sentence-part-elements}(m, n_{s})) ) \lor \\ | |||
& ((\textrm{second}(n_{s}) = 153) \land (\exists m.m \in \textrm{children}(n_{s}).(\exists o \in \textrm{all-children}(n_{s}).\\ | |||
& (\textrm{first}(o) = \textrm{first}(n_{p})) \land \textrm{word}(o) = \textrm{word}(m))) ) \lor \\ | |||
& ((\textrm{second}(n_{s}) \in \{144, 14c\}) \land | |||
(\exists m \in \textrm{children}(n_{s}).\textrm{verify-sentence-part}(m, n_{s}))) \\ | |||
\\ | |||
\textrm{verify-sentence-part}: & \Node \times \Node \to \bool \\ | |||
\textrm{verify-sentence-part}: & \tuple{n_{p}, n_{s}} \mapsto \Btt \iff \forall n \in \textrm{children}(n_{s}):\exists m\in\textrm{children}(n_{p}). \\ | |||
& (\textrm{first}(m) = \textrm{first}(n)) \land \textrm{verify-sentence-part-elements}(n, m) \\ | |||
\\ | |||
\textrm{verify-sentence-part-brackets}: & \Node \times \Node \to \bool \\ | |||
\textrm{verify-sentence-part-brackets}: & \tuple{n_{p}, n_{s}} \mapsto \Btt \iff | |||
(\textrm{first}(n_{p}) = 152 \land | |||
(\forall m\in \Node. (\textrm{first}(m) = \textrm{first}(n_{s})) \land\\ | |||
& (\textrm{second}(m) = \textrm{second}(n_{s})). \textrm{verify-sentence-part}(n_{p}, m) \iff\\ | |||
& \{ w \mid \exists t \in \textrm{all-children}(m).w = \textrm{word}(t)\} = \emptyset)) \lor \\ | |||
& ((\textrm{first}(n_{p}) \in \{141, 142, 143\}) \land | |||
\textrm{verify-sentence-part}(n_{p}, n_{s})) \\ | |||
\end{align*} | |||
</math> | |||
With these functions, we can now define an algorithm for augmenting <math>T_\pi</math> and <math>T_\Sigma</math>: | |||
Algorithm SCI-AUGMENT: | |||
matched := tt | |||
claim_on_match := tt | |||
FOREACH n ∈ root(<math>T_\Sigma</math>) | |||
IF ((first(n) = 14b) ∧ (second(n) = f900)) THEN | |||
claim_on_match := ff | |||
ELSE IF ¬verify_sentence_part_brackets(n, root(<math>T_\pi</math>)) THEN | |||
matched := ff | |||
ENDIF | |||
ENDFOREACH | |||
Augmenting succeeded if matched = tt; in this case, <math>T_\pi</math> is one of the trees accepted by the description provided by <math>T_\Sigma</math>. In this case, Said() will return 1. It will also claim the event previously provided to Parse(), unless claim_on_match = ff. | |||
==Notes== | ==Notes== | ||
<references/> | <references/> |
edits