ASDParser algorithm details - Part 1: Phrase structure representation

Copyright 2008  James A. Mason

    While an instance of ASDParser is parsing a given utterance string by performing a depth-first search through a given ASDGrammar network, it builds a phrase structure to represent the grammatical structure of the utterance.  That phrase structure records the successfully-completed paths through components of the grammar network that match subphrases of the utterance.  The nodes of the phrase structure are instances of ASDPhraseNode, which have the following instance variables:

   private String nodeWord; // the vocabulary element in the node
   private ASDGrammarNode nodeInstance;
      // the grammar node that corresponds to this phrase node
   private ASDPhraseNode nodeNext;
      // the phrase node that follows this one; null if none
   private ASDPhraseNode nodeSubphrase;
      // the first node in the subphrase represented by this node;
      // null if there is no subphrase
   private Object nodeValue;
      // the semantic value computed for the node;

    Such nodes are linked as shown in the example phrase structure diagram in Figure 1 for an utterance being parsed with the grammar for English cardinal numbers (cardinal.jpg and cardinal.grm).  The example shows the phrase structure as it would appear part way through the parsing process.

[ example phrase structure diagram ]

                                                             Figure 1

In the diagram, the nodeNext links are shown horizontally and the nodeSubphrase links are shown vertically from the boxes that represent ASDPhraseNode instances.  The nodeWord and nodeInstance values are shown inside the boxes.  (In an actual phrase structure, nodeInstance values are links to nodes in an ASDGrammar, but in the diagram they are just shown as numbers, which correspond to instance numbers in the grammar.)  To keep the diagrams simple, the values of nodeValue instance variables are not shown in the diagrams; for phrases parsed with the cardinal grammar, those values are the numerical values of the words and phrases that the nodes represent.

    The leftmost node at the top level of the phrase structure is a header node whose purpose is to facilitate re-linking of the nodes at the top level during the parsing process.  As the parse proceeds and subphrases are recognized, the first node of each completed subphrase is linked, via a nodeSubphrase link, from (i.e., below, in the diagram) a new node that represents the completed subphrase and that contains its phrase type as its nodeWord value. 

    Initially, the parse begins with the nodes corresponding to the words of the given phrase ( "two hundred and twelve thousand" ) linked horizontally to the right of the header node, as shown in Figure 2.  The initial nodeInstance values are all null and remain so until each node is matched with a corresponding node in the ASDGrammar during the parsing process.  For the same reason, the node containing "thousand" in Figure 1 has no value shown for its nodeInstance.

[ example phrase structure diagram ]
                                                           
                                                             Figure 2

    Figure 3 shows the phrase structure when a parse of the phrase has been completed.  A parse is complete when the top level of the phrase structure contains just one node after the header node, and the nodeWord value in that node is one of the expected phrase types (in this case, "CARDINAL") which have been specified as goals for the parse.  Notice that each horizontally-linked list of nodes below the top level in the completed phrase structure corresponds to a path from an initial node to a final node in the cardinal grammar.  The example shown is consistent with the semantic action and value constraints with which the nodes of the grammar has been augmented.  The cardinalnodeValue in the topmost CARDINAL node (just to the right of the header node) would be 212000, the value of the entire parsed phrase "two hundred and twelve thousand".

[ example phrase structure diagram ]

                                                             Figure 3

    At the nodes indicated by (1) and (2) in Figure 1, local ambiguities will have been detected during the parsing process up to the state shown by the Figure.  After the node [MULTIPLIER 1],  the parse could have advanced to the dummy node [$$ 2] in the grammar rather than to node [and 1].  Also, after the node [and 1] the parse could have advanced to the initial node [CARDINAL 1] in the grammar, rather than to [CARDINAL 2] as it has in the diagram.  If the parsing process subsequent to Figure 1 were to reach a dead end, it would backtrack to the most recent point of local ambiguity and try an alternative advance from that point.

    Most of the complexity of the parsing algorithm in ASDParser comes from the need to handle such ambiguities.  At each step the parser must be able to recognize local ambiguity, if it occurs.  It must have mechanisms for remembering the state of the parse at points of local ambiguity, and for backtracking to the most recent point of local ambiguity when the parser reaches a dead end in advancing its search through the syntax diagram.   For example, if the phrase being parsed were "two million and twelve thousand" rather than "two hundred and twelve thousand", the semantic action augmentation of  grammar node [MULTIPLIER 1] would reject the advance that took place from node [CARDINAL 1] to node [MULTIPLIER 1] at the top level in Figure 3, requiring the parser instead to backtrack to local ambiguity (2) in Figure 1.  There it would begin a new subphrase at the CARDINAL node, corresponding to the initial grammar node [CARDINAL 1], after node [and 1], as shown in Figure 4.  (Among other things, ASDParser must keep track of the nodes at which subphrases begin, so it can replace completed subphrase correctly at the top level when final nodes are reached in the grammar.)

[ example phrase structure diagram ]

                                                             Figure 4

Then the parse could proceed to the final phrase structure shown in Figure 5, which satisfies the semantic augmentations of nodes in the grammar and which yields the value 2012000 for the overall phrase. .

[ example phrase structure diagram ]

                                                             Figure 5


ASD Parser algorithm details - Contents

ASD notes and documentation

ASD home page


last modified 2010 Aug 21