ASDParser algorithm overview

Copyright 1995, 1999, 2008  James A. Mason
last modified 2010 Aug 21 (retargeting a link at the bottom)

Given an ASD grammar, a phrase to parse and a list of phrase types, one of which is expected to correspond to the phrase as a whole, the ASDParser performs a depth-first search through the grammar, starting with a grammar node that matches the first item (e.g., word) in the phrase, trying to reach a final node in the grammar whose end label matches one of the expected phrase types at the same time that the end of the entire phrase has been reached.   If the search is successful, the parser has found a first parse for the given phrase.  It can then be made to backtrack and try to find additional successful parsings of the given phrase. (If desired, the parsing algorithm can be modified easily to proceed in a breadth-first manner,  as will be indicated later in these notes.)

    A parse of a given phrase proceeds in a left-to-right, bottom-up manner, with the initial phrase structure consisting of a list of the words in the given phrase.  The parsing algorithm consists of a basic loop, on each iteration of which the parse is advanced, if possible, in one of four different ways:  
  1. to an initial node in the grammar which corresponds to the next item in the phrase structure, thus beginning a subphrase;
  2. to a non-initial node in the grammar which matches the next item in the phrase structure;
  3. to a dummy node in the grammar which follows the current node, and results in insertion of a dummy item into the phrase structure; or
  4. from a final node in the grammar, which results in replacement of the current subphrase by its syntactic type, at the top level of the phrase structure, with the subphrase hanging below (i.e. dependent on) the replacement item.  For example (using the grammar for English cardinal numbers), if the phrase structure before the advance is.
     CARDINAL(1) MULTIPLIER(1) and(1) DECADE(1) UNIT(2)
         |           |                   |        |
       UNIT(1)     hundred(1)         twenty(1)  one(1)
         }
       five(1)

    
and the advance is from node UNIT(2) in the grammar,  corresponding
          to the item at the end of the top level of the phrase structure, then the
          new phrase structure becomes

     CARDINAL(1) MULTIPLIER(1) and(1)     CARDINAL
         |           |                       |      
       UNIT(1)     hundred(1)         -----------------
      
         }                           
DECADE(1) UNIT(2)
       five(1)                           |        |
                                      twenty(1)  one(1)
                                    
    
At this point, the new item, CARDINAL, at the top level in the phrase structure
          does not yet have an associated instance number, and the parse continues
          from node and(1) in the grammar.

If none of these four possibilities for advancing the parse is available, the parse has reached a dead end.  Then it must backtrack to an earlier point of local ambiguity, if there have been any, and try an alternative there which hasn't been tried yet. 

    The semantic action algorithms for nodes in the grammar are applied in cases (1), (2), and (3), before the parse completes its advance to the kind of node in question.  Evaluating the semantic action augmentation may indicate that the current direction in which the parser is searching through the grammar should not be continued, and that the parser should backtrack to the most recent point of local ambiguity, if any, and try searching in a different direction from there.  Semantic action augmentations can even indicate that the entire parse should be terminated, if the grammar writer decides that is appropriate in certain cases. 

    The semantic value computations for final nodes in the grammar are applied in case (4), when a subphrase is replaced at the top level of the phrase structure by its syntactic type.  The value computed becomes the semantic value for the new top-level item that represents the completed subphrase. 

    To represent semantic values or syntactic features (e.g., singular vs. plural of nouns or verbs, or the case of nouns) during a parse, the parser provides a mechanism for the semantic action and semantic value augmentations of a grammar to set and access pairs of "features" and "values".  A feature can be any string, and its corresponding value can be any Java object..  ASDParser provides the following instance functions to set and access feature values and to access semantic values of nodes in a phrase structure.  At each stage in the parsing process, the parser maintains a set of feature-value pairs for the current subphrase being parsed at the top level of the phrase structure.  That subphrase corresponds to the current search path through a component of the grammar.
The functions get and valueOf are identical; they just provide alternative names for getting the current value of a feature.  In addition to the foregoing instance functions of ASDParser, the class ASDPhraseNode provides a method
to return the semantic value of a node in the phrase structure.  Illustrations of uses of these various instance functions are provided in  the example Numerals.java .

    The algorithm to parse a given utterance (i.e., phrase) to one of a set of expectedTypes (i.e., phrase type goals) has the following basic overall structure.  The semantic action NOADVANCE indicates that the parser should stop pursuing the current parse direction.  QUIT indicates that the entire parse should be terminated as unsuccessful.  choices is a list of (remaining) choices for advancing the parse from its current state.

    Segment the utterance into a list of words. 
  This is the initial phraseStructure.

  Begin at the front of the phraseStructure word list.

  Set the choices list to empty initially.

  loop until there is just one item in the phraseStructure list (at
       the top level),
and that item is a member of expectedTypes

     if the list of choices is empty then

        compute a list of possible choices for advancement
        from the current state of the parse,
by considering
        the successors of the current node (if any) in the grammar,

        as well as all initial nodes corresponding to the next item
        at the top level list
in the phraseStructure

     endif

     if the list of choices is empty then

        if there are points of local ambiguity
           remaining from earlier in the parse
then

           backtrack to a remaining point of local ambiguity
           in the parse,
and retrieve the list of untried
           choices for advancement from that point


        endif

        if the list of choices is (still) empty then

           terminate the parse as unsuccessful

        endif

     endif

     Remove the first possibility from the list of choices
     and save the rest (if any) in case of later backtracking.

     if the current possibility for advancing the parse
        corresponds to a
final node in the grammar then

        Evaluate the semantic value expression for the current
        grammar node,
to compute the semantic value of the
        completed (sub)phrase.


        if the semantic value computation yields QUIT then

           terminate the parse as unsuccessful

        else if the semantic value computation yields NOADVANCE then

           backtrack to the most recent point of local ambiguity
           in the
parse and retrieve the list of untried
           choices for
advancement from that point

        else

           Perform a replacement advance (type 4 above), and install
           the phrase type of the completed (sub)phrase and the
           computed semantic value in the new top-level node of the
           phraseStructure, with the (sub)phrase linked below the new
           node.


           if the replacement was at the beginning of the
              overall phrase
then

              prepare to continue the parse from the beginning
              of the new top-level word list in the phraseStructure


           else

              prepare to continue the parse from the item in the
             
top-level word list of the phraseStructure just
              before
the replacement

           endif

           Set the list of choices for advancement of the parse
           to empty.


        endif

     else

        Apply the current possibility for advancing the parse,
        according to its type
(1), (2), or (3) above,
        and evaluate the
semantic action for the node to which
        the parser has just moved in the grammar.


        if the semantic action yields QUIT then

           terminate the parse as unsuccessful

        else if the semantic action yields NOADVANCE then

           backtrack to a remaining point of local ambiguity
           in the
parse (if any) and retrieve the list of untried
           choices for
advancement from that point.  If there are
           no such remaining points of local ambiguity,
           terminate the parse as unsuccessful

        else

          
set the
list of choices for advancement of the parse
           to empty


        endif

     endif

  endloop


    This version of the algorithm finds a first successful parse, if one exists.  It does not check for other successful parses that might be found subsequently, by backtracking to points of local ambiguity recorded during the first parse.  So in ASDParser the algorithm has been modified to search for additional successful parses if desired.  This has been done by making the body of the main loop into a callable procedure by itself, so it can be called again, after backtracking, to try to find additional parses.  The algorithm for that procedure is as follows:

advance:

  if the list of choices for advancing the parse is empty then

     compute a list of possible choices for advancement
     from the current state of the parse,
by considering
     the successors of the current node (if any) in the grammar,

     as well as all initial nodes corresponding to the next item
     at the top level
in the phraseStructure list

  endif

  if the list of choices is empty then

     return NOADVANCE

  endif

  Remove the first possibility from the list of choices.  If any
  other possibilities remain in the list,
the parsing process is
  at a point of (remaining) local ambiguity;
save the remaining
  possibilities in case of later backtracking.

     [Note: The data structure that is used for recording points
      of local ambiguity determines whether the parse proceeds
      in a depth-first or a breadth-first manner.  ASDParser
      uses a stack (last-in-first-out list), which makes it search
      depth-first.  If a queue (first-in-first-out list) were used
      instead, the search would proceed breadth-first.]

  if the current possibility for advancing the parse
     corresponds to a
final node in the grammar then

     evaluate the semantic value expression for the current node to
     compute the semantic value of the completed phrase

     if the semantic value computed is NOADVANCE or QUIT then

       
return that value as the value of this procedure

     else

        perform a replacement advance (type 4 above), and install
        the phrase type of the completed phrase and the computed
        semantic value in the new top-level node of the
        phraseStructure, with the subphrase linked below the
        new node.


        if the replacement was at the beginning of the
           overall phrase
then

           prepare to continue the parse from the beginning
           of the new top level list of the phraseStructure


        else

           prepare to continue the parse from the item
           just
before the new top-level node of the
           phraseStructure

        endif

     endif

     return SUCCEED

  else

      apply the current possibility for advancing the parse,
      according to its type
(1), (2), or (3) above,
      and evaluate the
semantic action for the node to which
      the parser has just moved in the grammar.


      if the semantic action yields QUIT or NOADVANCE then

         return the result of the semantic action
         (QUIT or NOADVANCE)


      endif

      return SUCCEED

   endif

end of advance


    Then initialize and backup are defined as shown below, and a general parse procedure is defined to find either the first or the next successful parse of a given phrase:

initialize:

   Segment the utterance into a list of words. 
   This is the initial phraseStructure.

  
   Begin at the front of the word list.

   Set the list of choices to empty.

end of initialize


backup:

   Backtrack to the most recent point of local ambiguity
   (if any) in the parse,
and retrieve the list of untried
   choices for advancement from that point. 

   Return true if successful, false if unsuccessful. 

   [Note: This procedure uses the data structure from the
   advance
procedure that is used to keep track of remaining
   points of local ambiguity.  It takes the
top item when
   the data structure is
a stack.  It should take the front
   item if the data structure is a
queue.]

end of backup


parse(expectedTypes):

   loop

      result := advance()

      if result = QUIT then

         return false to indicate an unsuccessful parse

      elseif result = SUCCEED then

         if there is just one item in the list
            at the top level of the phraseStructure,
            and
that item is a member of expectedTypes then

             return true to indicate a successful parse

         endif

      elseif result = NOADVANCE then

         if not backup() then

            return false to indicate an unsuccessful parse

         endif

      endif

   endloop

end of parse


Of course, the initialize procedure is called only once, before the attempt to find a first successful parse.  Attempts to find additional successful parses are made by just calling parse(expectedTypes) again after it returns true.

Note: The parsing algorithm imposes a few restrictions on how ASD grammars should be written that are to be used with ASDParser:
ASD notes and documentation

ASD home page