Next: About this document Up: QT4XML: A Query Tool Previous: Migrating from POET OQL

Using Lex and Yacc

Currently, the prototype implementation supports only a subset of the QT4XML grammar. Lex and yacc utilities would be useful for tokenizing and parsing queries according to the grammar. Lex generates a lexical analyzer, which breaks the input stream into useful elements (tokens). Yacc (Yet Another Compiler-Compiler) takes a grammar and generates a parser for it. JavaCC [Microsystems, 1998] is a Java parser generator that can play the role of lex, yacc, or both. The input to JavaCC is a grammar file with the following general form:

    javacc_input ::= javacc_options
                     "PARSER_BEGIN" "(" <IDENTIFIER> ")"
                     java_compilation_unit
                     "PARSER_END" "(" <IDENTIFIER> ")"
                     ( production )*
                     <EOF>

The grammar file starts with an optional list of options. This is followed by a Java compilation unit enclosed between "PARSER_BEGIN(name)" and "PARSER_END(name)". The compilation unit must contain at least a class declaration with the same name as the generated parser. The name following "PARSER_BEGIN" and "PARSER_END" must be the same, and it becomes the name of the generated parser. After the compilation unit comes a list of grammar productions. BNF productions are the standard way of specifying a JavaCC grammar, but JAVACODE productions and regular expression productions can also be used. A production can also be a token manager declaration, which introduces declarations that are inserted into the generated token manager.
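For example, if the name is "MyParser" and the grammar file is named "MyParser.jj", executing the command javacc MyParser.jj generates the following files: MyParser.java (the parser), MyParserTokenManager.java (the token manager, i.e. the lexical analyzer) and MyParserConstants.java (constants shared by the parser and the token manager).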

The generated parser file contains everything in the compilation unit described above, plus the parser code generated from the input grammar. It declares one public method for each non-terminal in the grammar, and parsing with respect to a non-terminal is done by calling the corresponding method. Unlike yacc, JavaCC has no single start symbol: parsing can begin from any non-terminal.
The generated token manager has one public method, Token getNextToken() throws ParseError, which returns the next token from the input stream.
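Other files, such as Token.java and ParseError.java, are also generated. They contain boilerplate code, for example for representing tokens and reporting parse errors, which is the same for every grammar and can therefore be reused across parsers.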
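To see these points about the generated parser and token manager in running code, the following is a hand-written Java sketch with the same shape as JavaCC output: the token manager does lex's job through a single getNextToken() method, and the parser does yacc's job with one public method per non-terminal, either of which can serve as the starting point. It is not actual JavaCC output. The toy grammar (query ::= "select" IDENT "from" IDENT [ "where" condition ], condition ::= IDENT OP NUMBER), the whitespace-based tokenizing, and making ParseError an unchecked exception are all assumptions made for illustration.

```java
public class MyParser {
    // Stand-in for the generated exception class; unchecked here for brevity (assumption).
    static class ParseError extends RuntimeException {
        ParseError(String message) { super(message); }
    }

    // Mirrors the generated Token class: a token kind plus the matched text.
    static class Token {
        final String kind;   // "IDENT", "NUMBER", "OP", "EOF", or the keyword itself
        final String image;
        Token(String kind, String image) { this.kind = kind; this.image = image; }
    }

    // Mirrors the generated token manager: its one public method is getNextToken().
    // Tokens are assumed to be separated by white space to keep the sketch short.
    static class MyParserTokenManager {
        private final String[] words;
        private int pos = 0;

        MyParserTokenManager(String input) {
            String trimmed = input.trim();
            words = trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
        }

        public Token getNextToken() throws ParseError {
            if (pos == words.length) return new Token("EOF", "");
            String w = words[pos++];
            if (w.equals("select") || w.equals("from") || w.equals("where")) return new Token(w, w);
            if (w.matches("[0-9]+")) return new Token("NUMBER", w);
            if (w.matches("[<>=]")) return new Token("OP", w);
            if (w.matches("[A-Za-z][A-Za-z0-9]*")) return new Token("IDENT", w);
            throw new ParseError("Unexpected token: " + w);
        }
    }

    private final MyParserTokenManager tokenManager;
    private Token current;

    public MyParser(String input) {
        tokenManager = new MyParserTokenManager(input);
        current = tokenManager.getNextToken();
    }

    // Consume the current token if it has the expected kind; otherwise report a parse error.
    private Token consume(String kind) {
        if (!current.kind.equals(kind)) {
            throw new ParseError("Expected " + kind + " but found '" + current.image + "'");
        }
        Token t = current;
        current = tokenManager.getNextToken();
        return t;
    }

    // query ::= "select" IDENT "from" IDENT [ "where" condition ] <EOF>
    public String query() throws ParseError {
        consume("select");
        String field = consume("IDENT").image;
        consume("from");
        String source = consume("IDENT").image;
        String result = "select " + field + " from " + source;
        if (current.kind.equals("where")) {
            consume("where");
            result += " where " + condition();
        }
        consume("EOF");
        return result;
    }

    // condition ::= IDENT OP NUMBER
    public String condition() throws ParseError {
        String field = consume("IDENT").image;
        String op = consume("OP").image;
        String value = consume("NUMBER").image;
        return field + " " + op + " " + value;
    }

    public static void main(String[] args) {
        // Start from the "query" non-terminal; prints: select title from book where price < 30
        System.out.println(new MyParser("select title from book where price < 30").query());
        // Start from the "condition" non-terminal instead; prints: price < 30
        System.out.println(new MyParser("price < 30").condition());
    }
}
```

Because every non-terminal has its own public method, a caller that only needs to check a condition can call condition() directly, which is the practical consequence of JavaCC having no single start symbol.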

Once the parser has been generated, it can be used to generate code from the queries it parses.
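One common way for a parser to generate code is syntax-directed translation: each production returns the target text for the input it matched, much as actions written inside JavaCC productions can build output while parsing. The following is a minimal Java sketch of that idea, not how QT4XML itself generates code. The condition syntax, the class name ConditionTranslator, and the target (a Java boolean expression over a hypothetical object named item with JavaBean-style getters) are all assumptions for illustration.

```java
import java.util.List;

public class ConditionTranslator {
    private final List<String> tokens;
    private int pos = 0;

    // Tokens are assumed to be separated by white space.
    private ConditionTranslator(String input) {
        tokens = List.of(input.trim().split("\\s+"));
    }

    private String next() {
        if (pos >= tokens.size()) throw new IllegalArgumentException("Unexpected end of input");
        return tokens.get(pos++);
    }

    // condition ::= comparison ( "and" comparison )*
    // Each production returns the Java source text generated for the input it matched.
    private String condition() {
        StringBuilder code = new StringBuilder(comparison());
        while (pos < tokens.size() && tokens.get(pos).equals("and")) {
            pos++;
            code.append(" && ").append(comparison());
        }
        return code.toString();
    }

    // comparison ::= IDENT OP NUMBER; a field name becomes a getter call on the hypothetical "item".
    private String comparison() {
        String field = next();
        String op = next();
        String value = next();
        if (!field.matches("[A-Za-z][A-Za-z0-9]*") || !op.matches("<|>|=") || !value.matches("[0-9]+")) {
            throw new IllegalArgumentException("Malformed comparison: " + field + " " + op + " " + value);
        }
        String getter = "get" + Character.toUpperCase(field.charAt(0)) + field.substring(1) + "()";
        return "item." + getter + " " + (op.equals("=") ? "==" : op) + " " + value;
    }

    // Translate a whole condition, rejecting any tokens left over after it.
    public static String translate(String input) {
        ConditionTranslator t = new ConditionTranslator(input);
        String code = t.condition();
        if (t.pos != t.tokens.size()) {
            throw new IllegalArgumentException("Unexpected token: " + t.tokens.get(t.pos));
        }
        return code;
    }

    public static void main(String[] args) {
        // prints: item.getPrice() < 30 && item.getYear() > 1990
        System.out.println(translate("price < 30 and year > 1990"));
    }
}
```

Generating text directly in the productions keeps the sketch short; a larger translator would more likely build a syntax tree first and generate code from the tree.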



Sonali Sheth
Wed Jul 7 23:16:41 EDT 1999