Augmented Syntax Diagram (ASD) grammar file format

page last updated 2006 Sep 27

ASD grammar files are plain text files.  An ASD grammar file lists words and phrase types in the form of a lexicon, in alphabetical order, with each word or phrase type followed by a list of its instances.  Each instance is represented in a form that is convenient for the ASD software tools.  ASD grammar files can be printed and edited (carefully!) with any text editor.  However, the ASDEditor tool is recommended for editing them, to avoid creating corrupted grammar files that the ASD tools cannot read.  By convention, ASD grammar files have the file extension .grm or .asd.  Details of the format are provided below.  Earlier prototype versions of the ASDParser and ASDEditor were implemented in Smalltalk, and a still earlier version of the parser was implemented in Lisp.  The same ASD grammar file format is used by the Java, Lisp, and Smalltalk implementations.

ASD grammar files can be saved by the ASDEditor in two ways: (1) optimized for parsing and (2) unoptimized for parsing.  The ASDParser can use grammar files in either format, but it generally needs more parsing steps to parse a phrase when it uses an unoptimized grammar file.  Grammars saved in unoptimized file format are generally smaller than the same grammars saved in optimized file format.  The format of optimized grammar files is described first below.  Differences in the format of unoptimized grammar files are described after that.

Optimized grammar file format

A grammar for English cardinal numbers is shown in cardinal.jpg in the graphical form in which it is displayed by ASDEditor.  The contents of the file cardinal.grm below show how the same grammar is saved by ASDEditor as a file which can be used by ASDParser.  It provides an example of the format of an optimized ASD grammar file.  The file is organized as a lexicon, with entries for each word, phrase type, and punctuation mark in the grammar.  The entries are listed in alphabetical order beginning with the entry for $$, which represents a null or dummy node.  Each entry in the file is enclosed in parentheses, with the word or phrase type for the entry appearing immediately after the opening parenthesis.  The list of instances for the entry is also enclosed in parentheses, and each instance is itself enclosed in parentheses.  The layout produced by the ASDEditor puts the instances for an entry on separate lines, indented two spaces from the left margin, but that is for human readability only.  When an ASD grammar file is read by the ASDEditor or ASDParser, line breaks and indenting (except in quoted strings) are ignored; the structure of the file is indicated entirely by the nested parentheses.
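
Because the structure is carried entirely by the nested parentheses, a grammar file can be read with a very small recursive reader.  The following Python sketch is illustrative only and is not the loader used by ASDParser or ASDEditor.  The names tokenize and read_entries are hypothetical, and the sketch assumes that a quoted string never contains an embedded single quote.

```python
def tokenize(text):
    """Split grammar text into '(', ')', single-quoted strings, and bare atoms."""
    tokens = []
    i, n = 0, len(text)
    while i < n:
        ch = text[i]
        if ch.isspace():
            i += 1                      # line breaks and indentation are ignored
        elif ch in "()":
            tokens.append(ch)
            i += 1
        elif ch == "'":
            # Assumption: no escaped single quote inside a quoted string.
            end = text.index("'", i + 1)
            tokens.append(text[i:end + 1])
            i = end + 1
        else:
            j = i
            while j < n and not text[j].isspace() and text[j] not in "()":
                j += 1
            tokens.append(text[i:j])    # e.g. $$  ,  -  eight  CARDINAL  586
            i = j
    return tokens


def read_entries(text):
    """Return every top-level entry as nested Python lists of string tokens."""
    tokens = tokenize(text)
    pos = 0

    def read():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        if tok == "(":
            items = []
            while tokens[pos] != ")":
                items.append(read())
            pos += 1                    # consume the closing parenthesis
            return items
        return tok

    entries = []
    while pos < len(tokens):
        entries.append(read())
    return entries


SAMPLE = """
(eight (
  (1 (UNIT CARDINAL) UNIT '8' '' 10 220)
))

(- (
  (1 nil ((UNIT 2 565 143)) (UNIT) '' 535 116)
))
"""

entries = read_entries(SAMPLE)
for word, instances in entries:
    print(word, len(instances))
```

Each entry comes back as a two-element list: the word or phrase type, followed by the list of its instances.  Quoted strings keep their surrounding single quotes so that they remain distinguishable from bare atoms such as nil.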

Each instance of a word, phrase type, or punctuation mark is represented by a parenthesized list of seven items:

  1. the instance number, corresponding to the number shown after the word or phrase type in a node displayed by the ASDEditor.
  2. If the instance corresponds to an initial node in the grammar, the second item is a parenthesized list of phrase types which can begin, directly or indirectly, at that initial node.  Otherwise, if the instance does not correspond to an initial node in the grammar, the second item is nil.
  3. If the instance corresponds to a final node in the grammar, the third item is the phrase type which ends at that node.  Otherwise, if the instance does not correspond to a final node in the grammar, the third item is a parenthesized list which describes the edges which go out of the node to successor nodes in the grammar diagram.  (See below for details of the representation of edges.)
  4. If the instance corresponds to a final node in the grammar, the fourth item is a string in single quotes which represents the semantic value of the phrase which ends at the node.  That string is typically interpreted by the ASDParser as the name of a Java method which is to be invoked in an application which is using the ASDParser.  (For cardinal.grm the corresponding application is Numerals.java.)  Alternatively, the semantic value string can be a number (as for instance 1 of the word eight), or it can be a literal string surrounded by double quotes (") inside the single quotes around the item itself.  It is also possible for a user of ASDParser to define a class which implements the ASDSemantics interface and which can interpret the semantic value string in any way the user desires.  If there is no semantic value associated with the phrase that ends at the node, the semantic value string is just the empty string ('').  Otherwise, if the instance does not correspond to a final node in the grammar, the fourth item is a parenthesized list of all phrase-type labels which are found in immediate successor nodes of the current node.  That list is produced by the ASDEditor when it saves an ASD grammar to a file; it may, of course, be an empty list.  The list is used by ASDParser to optimize its parsing performance.
  5. the semantic action string, if any, for the instance.  Its purpose is somewhat similar to the semantic value string, except that it is interpreted when ASDParser enters the corresponding node in the grammar during parsing.  It is typically interpreted by the ASDParser as the name of a Java method which is to be invoked by an instance of a class (possibly the ASDParser itself) which implements the ASDSemantics interface.  Also, it is possible for a user of ASDParser to define a class which implements the ASDSemantics interface and which can interpret the semantic action string in any way the user desires.  If there is no semantic action for the grammar instance, the semantic action string is just the empty string ('').
  6. the horizontal pixel coordinate for the corresponding node in the syntax diagram for the grammar.  This is used by ASDEditor to remind it where on the display screen to position the node.
  7. the vertical pixel coordinate for the corresponding node in the syntax diagram for the grammar.

Each edge in item 3 of a non-final instance is represented by a parenthesized list containing:
  1. the word, phrase type, or punctuation mark in the successor node to which the edge connects the given node;
  2. the instance number for the node to which the edge connects the given node;
  3. the horizontal pixel coordinate for the circular "handle" in the center of the edge;
  4. the vertical pixel coordinate for the circular "handle" in the center of the edge.

Of course, only ASDEditor, not ASDParser, uses the pixel coordinate information.
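
The seven items of an instance, and the four items of an edge, can be given names once an instance has been read into a nested list.  The sketch below is a hypothetical illustration in Python and is not part of the ASD tools.  The names Edge, Instance and decode_instance are invented here, the input is assumed to be in optimized format, and a node is assumed to be final exactly when its third item is a single phrase type rather than a list of edges.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Edge:
    label: str          # word, phrase type, or punctuation mark of the successor
    instance: int       # instance number of the successor node
    handle_x: int       # pixel coordinates of the edge's "handle" (editor only)
    handle_y: int


@dataclass
class Instance:
    number: int
    begins: List[str]                      # item 2; empty when the item is nil
    ends: Optional[str]                    # item 3 for a final node
    edges: List[Edge]                      # item 3 for a non-final node
    semantic_value: Optional[str]          # item 4 for a final node
    successor_types: Optional[List[str]]   # item 4 for a non-final node
    semantic_action: str                   # item 5
    x: int                                 # items 6 and 7 (editor only)
    y: int


def unquote(token):
    """Drop the surrounding single quotes from a quoted-string token."""
    return token[1:-1]


def decode_instance(items):
    number, begins, third, fourth, action, x, y = items
    is_final = isinstance(third, str)      # assumption stated in the lead-in
    if is_final:
        edges, successor_types = [], None
    else:
        edges = [Edge(e[0], int(e[1]), int(e[2]), int(e[3])) for e in third]
        successor_types = [] if fourth == "nil" else list(fourth)
    return Instance(
        number=int(number),
        begins=[] if begins == "nil" else list(begins),
        ends=third if is_final else None,
        edges=edges,
        semantic_value=unquote(fourth) if is_final else None,
        successor_types=successor_types,
        semantic_action=unquote(action),
        x=int(x),
        y=int(y),
    )


# Instance 1 of the word eight and instance 1 of DECADE, as nested lists.
eight_1 = decode_instance(
    ["1", ["UNIT", "CARDINAL"], "UNIT", "'8'", "''", "10", "220"])
decade_1 = decode_instance(
    ["1", ["CARDINAL"],
     [["UNIT", "2", "542", "163"], ["$$", "1", "542", "185"], ["-", "1", "517", "143"]],
     ["UNIT"], "'setVNodeValue'", "447", "156"])

print(eight_1.ends, eight_1.semantic_value)
print([edge.label for edge in decade_1.edges], decade_1.successor_types)
```

Keeping the final and non-final readings of items 3 and 4 in separate fields makes the two cases explicit, at the cost of leaving some fields unused for any given instance.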

The contents of the cardinal.grm file are as follows:

($$ (
  (1 nil CARDINAL 'valueOfV' '' 586 200)
  (2 nil CARDINAL 'valueOfVTimesM' '' 342 388)
))

(, (
  (1 nil ((CARDINAL 2 401 321) (and 1 368 294)) (CARDINAL) '' 319 287)
))

(- (
  (1 nil ((UNIT 2 565 143)) (UNIT) '' 535 116)
))

(and (
  (1 nil ((CARDINAL 2 455 321)) (CARDINAL) '' 409 287)
))

(CARDINAL (
  (1 (CARDINAL) ((MULTIPLIER 1 157 349)) (MULTIPLIER) 'setVNodeValue' 57 342)
  (2 nil CARDINAL 'valueOfVTimesMPlusV2' 'cardinal_2_action' 474 342)
))

(DECADE (
  (1 (CARDINAL) ((UNIT 2 542 163) ($$ 1 542 185) (- 1 517 143)) (UNIT) 'setVNodeValue' 447 156)
))

(eight (
  (1 (UNIT CARDINAL) UNIT '8' '' 10 220)
))

(eighteen (
  (1 (CARDINAL) CARDINAL '18' '' 123 244)
))

(eighty (
  (1 (DECADE CARDINAL) DECADE '80' '' 300 179)
))

(eleven (
  (1 (CARDINAL) CARDINAL '11' '' 128 37)
))

(fifteen (
  (1 (CARDINAL) CARDINAL '15' '' 130 156)
))

(fifty (
  (1 (DECADE CARDINAL) DECADE '50' '' 299 94)
))

(five (
  (1 (UNIT CARDINAL) UNIT '5' '' 17 126)
))

(forty (
  (1 (DECADE CARDINAL) DECADE '40' '' 295 65)
))

(four (
  (1 (UNIT CARDINAL) UNIT '4' '' 15 97)
))

(fourteen (
  (1 (CARDINAL) CARDINAL '14' '' 122 127)
))

(hundred (
  (1 (MULTIPLIER) MULTIPLIER '100' '' 439 7)
))

(million (
  (1 (MULTIPLIER) MULTIPLIER '1000000' '' 440 64)
))

(MULTIPLIER (
  (1 nil ((and 1 338 321) (CARDINAL 2 371 349) ($$ 2 305 372) (, 1 293 321)) (CARDINAL) 'multiplier_1_action' 194 342)
))

(nine (
  (1 (UNIT CARDINAL) UNIT '9' '' 12 253)
))

(nineteen (
  (1 (CARDINAL) CARDINAL '19' '' 124 271)
))

(ninety (
  (1 (DECADE CARDINAL) DECADE '90' '' 300 207)
))

(one (
  (1 (UNIT CARDINAL) UNIT '1' '' 13 5)
))

(seven (
  (1 (UNIT CARDINAL) UNIT '7' '' 10 187)
))

(seventeen (
  (1 (CARDINAL) CARDINAL '17' '' 114 213)
))

(seventy (
  (1 (DECADE CARDINAL) DECADE '70' '' 299 151)
))

(six (
  (1 (UNIT CARDINAL) UNIT '6' '' 20 155)
))

(sixteen (
  (1 (CARDINAL) CARDINAL '16' '' 129 186)
))

(sixty (
  (1 (DECADE CARDINAL) DECADE '60' '' 300 122)
))

(ten (
  (1 (CARDINAL) CARDINAL '10' '' 127 7)
))

(thirteen (
  (1 (CARDINAL) CARDINAL '13' '' 121 96)
))

(thirty (
  (1 (DECADE CARDINAL) DECADE '30' '' 294 35)
))

(thousand (
  (1 (MULTIPLIER) MULTIPLIER '1000' '' 439 34)
))

(three (
  (1 (UNIT CARDINAL) UNIT '3' '' 8 65)
))

(twelve (
  (1 (CARDINAL) CARDINAL '12' '' 128 67)
))

(twenty (
  (1 (DECADE CARDINAL) DECADE '20' '' 293 6)
))

(two (
  (1 (UNIT CARDINAL) UNIT '2' '' 14 36)
))

(UNIT (
  (1 (CARDINAL) CARDINAL 'nodeValue' '' 450 221)
  (2 nil CARDINAL 'valueOfV' 'unit_2_action' 586 156)
))

(UNKNOWN (
  (1 (UNKNOWNWORD CARDINAL) UNKNOWNWORD '' '' 30 434)
))

(UNKNOWNWORD (
  (1 (CARDINAL) CARDINAL 'valueOfV' 'UNKNOWNCARDINAL_action' 252 434)
))

Unoptimized grammar file format

The unoptimized format for ASD grammar files differs from the optimized format in just two ways, involving the second and fourth items in the list of seven items that represents an instance in the grammar.

For the second item:  If the instance corresponds to an initial node in the grammar, the second item is just the letter T (as "true" is represented in the LISP language), rather than being a parenthesized list of phrase types which can begin, directly or indirectly, at that initial node.  Otherwise, if the instance does not correspond to an initial node in the grammar, the second item is nil (as "false" is represented in the LISP language).  That is, the second item simply indicates whether or not the node is an initial node, without telling what phrase types can begin at that node.

For the fourth item:  If the instance does not correspond to a final node in the grammar, then the fourth item is either T or nil, depending on whether or not any of the labels in its immediate successor nodes are phrase types.  That is, the fourth item indicates whether or not a new subphrase can begin immediately after the current node, but it does not indicate what type(s) of subphrase can begin there.  If the instance does correspond to a final node, the fourth item is the semantic value string, exactly as in the optimized format.
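
The two differences can be summarized as a small conversion from an optimized instance to its unoptimized form.  The Python sketch below is illustrative only and is not how ASDEditor saves files.  The name to_unoptimized is hypothetical, instances are assumed to have already been read into nested lists of string tokens, and a node is assumed to be final exactly when its third item is a single phrase type rather than a list of edges.

```python
def to_unoptimized(items):
    """Rewrite items 2 and 4 of an optimized instance in unoptimized form."""
    number, begins, third, fourth, action, x, y = items
    is_final = isinstance(third, str)

    # Item 2: a non-empty list of phrase types becomes T; nil (or ()) stays nil.
    second = "T" if isinstance(begins, list) and begins else "nil"

    # Item 4 changes only for non-final nodes: a non-empty list of successor
    # phrase types becomes T, while an empty list or nil becomes nil.
    if not is_final:
        fourth = "T" if isinstance(fourth, list) and fourth else "nil"

    return [number, second, third, fourth, action, x, y]


# Instances copied from the grammar listings on this page.
cardinal_1 = ["1", ["CARDINAL"], [["MULTIPLIER", "1", "157", "349"]],
              ["MULTIPLIER"], "'setVNodeValue'", "57", "342"]
eight_1 = ["1", ["UNIT", "CARDINAL"], "UNIT", "'8'", "''", "10", "220"]
term_1 = ["1", ["EXPRESSION"],
          [["+", "1", "515", "81"], ["-", "1", "542", "81"], ["$$", "2", "564", "112"]],
          [], "'expression_TERM_1'", "504", "105"]

for instance in (cardinal_1, eight_1, term_1):
    print(to_unoptimized(instance))
```

The conversion only discards information, which is why the reverse direction is not a simple rewrite: recovering the lists of phrase types requires examining the rest of the grammar.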

The cardinal grammar above, re-saved in unoptimized form, is shown below.  (This example does not include any instances for which the fourth item is nil.)

($$ (
  (1 nil CARDINAL 'valueOfV' '' 586 200)
  (2 nil CARDINAL 'valueOfVTimesM' '' 342 388)
))

(, (
  (1 nil ((CARDINAL 2 401 321) (and 1 368 294)) T '' 319 287)
))

(- (
  (1 nil ((UNIT 2 565 143)) T '' 535 116)
))

(and (
  (1 nil ((CARDINAL 2 455 321)) T '' 409 287)
))

(CARDINAL (
  (1 T ((MULTIPLIER 1 157 349)) T 'setVNodeValue' 57 342)
  (2 nil CARDINAL 'valueOfVTimesMPlusV2' 'cardinal_2_action' 474 342)
))

(DECADE (
  (1 T ((UNIT 2 542 163) ($$ 1 542 185) (- 1 517 143)) T 'setVNodeValue' 447 156)
))

(eight (
  (1 T UNIT '8' '' 10 220)
))

(eighteen (
  (1 T CARDINAL '18' '' 123 244)
))

(eighty (
  (1 T DECADE '80' '' 300 179)
))

(eleven (
  (1 T CARDINAL '11' '' 128 37)
))

(fifteen (
  (1 T CARDINAL '15' '' 130 156)
))

(fifty (
  (1 T DECADE '50' '' 299 94)
))

(five (
  (1 T UNIT '5' '' 17 126)
))

(forty (
  (1 T DECADE '40' '' 295 65)
))

(four (
  (1 T UNIT '4' '' 15 97)
))

(fourteen (
  (1 T CARDINAL '14' '' 122 127)
))

(hundred (
  (1 T MULTIPLIER '100' '' 439 7)
))

(million (
  (1 T MULTIPLIER '1000000' '' 440 64)
))

(MULTIPLIER (
  (1 nil ((and 1 338 321) (CARDINAL 2 371 349) ($$ 2 305 372) (, 1 293 321)) T 'multiplier_1_action' 194 342)
))

(nine (
  (1 T UNIT '9' '' 12 253)
))

(nineteen (
  (1 T CARDINAL '19' '' 124 271)
))

(ninety (
  (1 T DECADE '90' '' 300 207)
))

(one (
  (1 T UNIT '1' '' 13 5)
))

(seven (
  (1 T UNIT '7' '' 10 187)
))

(seventeen (
  (1 T CARDINAL '17' '' 114 213)
))

(seventy (
  (1 T DECADE '70' '' 299 151)
))

(six (
  (1 T UNIT '6' '' 20 155)
))

(sixteen (
  (1 T CARDINAL '16' '' 129 186)
))

(sixty (
  (1 T DECADE '60' '' 300 122)
))

(ten (
  (1 T CARDINAL '10' '' 127 7)
))

(thirteen (
  (1 T CARDINAL '13' '' 121 96)
))

(thirty (
  (1 T DECADE '30' '' 294 35)
))

(thousand (
  (1 T MULTIPLIER '1000' '' 439 34)
))

(three (
  (1 T UNIT '3' '' 8 65)
))

(twelve (
  (1 T CARDINAL '12' '' 128 67)
))

(twenty (
  (1 T DECADE '20' '' 293 6)
))

(two (
  (1 T UNIT '2' '' 14 36)
))

(UNIT (
  (1 T CARDINAL 'nodeValue' '' 450 221)
  (2 nil CARDINAL 'valueOfV' 'unit_2_action' 586 156)
))

(UNKNOWN (
  (1 T UNKNOWNWORD '' '' 30 434)
))

(UNKNOWNWORD (
  (1 T CARDINAL 'valueOfV' 'UNKNOWNCARDINAL_action' 252 434)
))

Another example of the difference between optimized and unoptimized grammar file formats

The grammar expression.grm for arithmetic expressions, saved in optimized file format, is as follows:

($$ (
  (1 nil TERM 'expression_$$_1_v' '' 186 453)
  (2 nil EXPRESSION 'expression_$$_2_v' '' 586 105)
  (3 nil FACTOR 'expression_$$_3_v' '' 272 181)
))

(* (
  (1 (x) x '' '' 297 392)
))

(+ (
  (1 nil ((TERM 1 503 81)) (TERM) 'expression_plus_1' 489 44)
))

(- (
  (1 nil ((TERM 1 528 81)) (TERM) 'expression_minus_1' 542 43)
  (2 (EXPRESSION FACTOR TERM) ((FACTOR 2 105 231)) (FACTOR) '' 75 224)
))

(. (
  (1 nil ((DIGITSTRING 2 242 124) ($$ 3 243 156)) (DIGITSTRING) '' 206 117)
  (2 (EXPRESSION FACTOR TERM) ((DIGITSTRING 2 242 103)) (DIGITSTRING) 'expression_DECIMALPOINT_2' 206 76)
))

(/ (
  (1 nil ((FACTOR 1 126 431)) (FACTOR) 'expression_divided_by_1' 148 395)
))

([ (
  (1 (EXPRESSION FACTOR TERM) ((EXPRESSION 2 107 287)) (EXPRESSION) '' 77 280)
))

(] (
  (1 nil FACTOR 'expression_RIGHT_BRACKET_v' '' 249 280)
))

(DIGITSTRING (
  (1 (EXPRESSION FACTOR TERM) ((. 1 181 124) ($$ 3 214 156)) () 'expression_DIGITSTRING_1' 79 117)
  (2 nil FACTOR 'expression_DIGITSTRING_2_v' '' 270 117)
))

(EXPRESSION (
  (1 nil ((RPAREN 1 273 340)) () 'expression_EXPRESSION' 173 333)
  (2 nil ((] 1 228 287)) () 'expression_EXPRESSION' 128 280)
))

(FACTOR (
  (1 (EXPRESSION TERM) ((x 1 112 431) (/ 1 147 431) ($$ 1 166 460)) (x) 'expression_FACTOR_1' 95 453)
  (2 nil FACTOR 'expression_FACTOR_2_v' '' 126 224)
))

(LPAREN (
  (1 (EXPRESSION FACTOR TERM) ((EXPRESSION 1 152 340)) (EXPRESSION) '' 80 333)
))

(NUMBER (
  (1 (EXPRESSION DIGITSTRING FACTOR TERM) DIGITSTRING 'expression_NUMBER_1_v' '' 77 35)
))

(RPAREN (
  (1 nil FACTOR 'expression_RIGHT_BRACKET_v' '' 293 333)
))

(TERM (
  (1 (EXPRESSION) ((+ 1 515 81) (- 1 542 81) ($$ 2 564 112)) () 'expression_TERM_1' 504 105)
))

(x (
  (1 nil ((FACTOR 1 93 431)) (FACTOR) 'expression_times_1' 78 395)
))

The same grammar saved in unoptimized file format is shown below.  Notice, for example, that the fourth item is nil in the first (and only) instance of the phrase type TERM, indicating that no new subphrase can begin immediately after a part of a parsed phrase that corresponds to node (TERM 1) in the grammar.  Such a nil has exactly the same meaning as an empty list () in the optimized file format.  In fact, nil and () can be used interchangeably in grammar files which are used as input to ASDEditor and ASDParser.

($$ (
  (1 nil TERM 'expression_$$_1_v' '' 186 453)
  (2 nil EXPRESSION 'expression_$$_2_v' '' 586 105)
  (3 nil FACTOR 'expression_$$_3_v' '' 272 181)
))

(* (
  (1 T x '' '' 297 392)
))

(+ (
  (1 nil ((TERM 1 503 81)) T 'expression_plus_1' 489 44)
))

(- (
  (1 nil ((TERM 1 528 81)) T 'expression_minus_1' 542 43)
  (2 T ((FACTOR 2 105 231)) T '' 75 224)
))

(. (
  (1 nil ((DIGITSTRING 2 242 124) ($$ 3 243 156)) T '' 206 117)
  (2 T ((DIGITSTRING 2 242 103)) T 'expression_DECIMALPOINT_2' 206 76)
))

(/ (
  (1 nil ((FACTOR 1 126 431)) T 'expression_divided_by_1' 148 395)
))

([ (
  (1 T ((EXPRESSION 2 107 287)) T '' 77 280)
))

(] (
  (1 nil FACTOR 'expression_RIGHT_BRACKET_v' '' 249 280)
))

(DIGITSTRING (
  (1 T ((. 1 181 124) ($$ 3 214 156)) nil 'expression_DIGITSTRING_1' 79 117)
  (2 nil FACTOR 'expression_DIGITSTRING_2_v' '' 270 117)
))

(EXPRESSION (
  (1 nil ((RPAREN 1 273 340)) nil 'expression_EXPRESSION' 173 333)
  (2 nil ((] 1 228 287)) nil 'expression_EXPRESSION' 128 280)
))

(FACTOR (
  (1 T ((x 1 112 431) (/ 1 147 431) ($$ 1 166 460)) T 'expression_FACTOR_1' 95 453)
  (2 nil FACTOR 'expression_FACTOR_2_v' '' 126 224)
))

(LPAREN (
  (1 T ((EXPRESSION 1 152 340)) T '' 80 333)
))

(NUMBER (
  (1 T DIGITSTRING 'expression_NUMBER_1_v' '' 77 35)
))

(RPAREN (
  (1 nil FACTOR 'expression_RIGHT_BRACKET_v' '' 293 333)
))

(TERM (
  (1 T ((+ 1 515 81) (- 1 542 81) ($$ 2 564 112)) nil 'expression_TERM_1' 504 105)
))

(x (
  (1 nil ((FACTOR 1 93 431)) T 'expression_times_1' 78 395)
))