Currently, the prototype implementation supports only a subset of the QT4XML grammar. It would be useful to use lex and yacc style utilities to tokenize and parse the grammar. Lex generates a lexical analyzer, which breaks up the input stream into useful elements (tokens). Yacc (which stands for Yet Another Compiler Compiler) generates a parser for a given grammar. JavaCC [Microsystems, 1998] is a Java parser generator that can be used as a lex and/or yacc utility. The input to JavaCC is a grammar file of the general form:
javacc_input ::= javacc_options "PARSER_BEGIN" "(" <IDENTIFIER> ")" java_compilation_unit "PARSER_END" "(" <IDENTIFIER> ")" ( production )* <EOF>
The grammar file starts with an optional list of
options. This is followed by a Java compilation
unit (which should contain at least a class
declaration with the same name as the generated
parser) enclosed between "PARSER_BEGIN(name)"
and "PARSER_END(name)". After this comes
a list of grammar productions. The BNF production
is the standard production for specifying a JavaCC
grammar; however, JAVACODE productions and regular
expression productions can also be used. A production
can also be a token manager declaration, which
introduces declarations that are inserted into the
generated token manager.
The name following "PARSER_BEGIN" and "PARSER_END"
must be the same, and it becomes the name of the
generated parser.
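The layout described above can be illustrated with a minimal grammar file. This is only a sketch: the parser name "MyParser" follows the example used in the text, while the option setting, the token names NUMBER and PLUS, and the non-terminal Sum are hypothetical choices made for illustration and are not part of the QT4XML grammar.

```javacc
// 1. Optional list of options (STATIC = false is an illustrative choice).
options {
  STATIC = false;
}

// 2. Java compilation unit, enclosed between PARSER_BEGIN and PARSER_END.
//    It declares a class with the same name as the generated parser.
PARSER_BEGIN(MyParser)

public class MyParser {
}

PARSER_END(MyParser)

// 3. Productions. These two are regular expression productions.
SKIP : { " " | "\t" | "\n" | "\r" }

TOKEN : {
  < NUMBER : (["0"-"9"])+ >
| < PLUS : "+" >
}

// A BNF production for the hypothetical non-terminal Sum.
void Sum() :
{}
{
  <NUMBER> ( <PLUS> <NUMBER> )* <EOF>
}
```

The empty pair of braces after the colon in the BNF production is where local Java declarations for that production would go.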
For example, if name is "MyParser" and the grammar
input file name is "MyParser.jj", then executing the
command javacc MyParser.jj generates the parser, the
token manager, and several supporting files.
The generated parser file contains everything
in the compilation unit described earlier, in
addition to the parser code generated from the
input grammar. There is a public method
declaration for each non-terminal defined in the
productions. One feature to note is that, unlike
yacc, JavaCC has no single start symbol: parsing
can begin at any non-terminal.
The generated token manager has one public method,
with the signature Token getNextToken() throws ParseError.
Other generated files support tasks such as error
checking. These are the same for all generated
parsers and can be reused.
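The shape of the generated parser and token manager described above can be mimicked by hand in a few lines of Java. The sketch below is not JavaCC output: the classes SketchTokenManager and SketchParser, the token kinds, and the non-terminals Sum and Number are all hypothetical, and the Token and ParseError classes are simplified stand-ins for the generated ones. It only shows the two properties noted in the text, namely a token manager with a single public getNextToken() method and a parser with one public method per non-terminal.

```java
// Hand-written illustration only; real JavaCC output is more elaborate.

// Simplified stand-in for the generated error class.
class ParseError extends Exception {
    ParseError(String message) { super(message); }
}

// Simplified stand-in for the generated Token class.
class Token {
    static final int EOF = 0, NUMBER = 1, PLUS = 2;
    final int kind;
    final String image;
    Token(int kind, String image) { this.kind = kind; this.image = image; }
}

// Token manager: its one public method hands out tokens one at a time.
class SketchTokenManager {
    private final String input;
    private int pos = 0;

    SketchTokenManager(String input) { this.input = input; }

    public Token getNextToken() throws ParseError {
        // Skip whitespace between tokens.
        while (pos < input.length() && Character.isWhitespace(input.charAt(pos))) pos++;
        if (pos >= input.length()) return new Token(Token.EOF, "");
        char c = input.charAt(pos);
        if (c == '+') { pos++; return new Token(Token.PLUS, "+"); }
        if (Character.isDigit(c)) {
            int start = pos;
            while (pos < input.length() && Character.isDigit(input.charAt(pos))) pos++;
            return new Token(Token.NUMBER, input.substring(start, pos));
        }
        throw new ParseError("Unexpected character '" + c + "' at offset " + pos);
    }
}

// Parser: one public method per non-terminal, so any of them can be the entry point.
class SketchParser {
    private final SketchTokenManager tokens;
    private Token current;

    SketchParser(String input) throws ParseError {
        tokens = new SketchTokenManager(input);
        current = tokens.getNextToken();
    }

    // Check that the current token has the expected kind, then advance.
    private Token consume(int kind) throws ParseError {
        if (current.kind != kind) {
            throw new ParseError("Unexpected token: '" + current.image + "'");
        }
        Token matched = current;
        current = tokens.getNextToken();
        return matched;
    }

    // Non-terminal: Sum ::= Number ( "+" Number )* <EOF>
    public int Sum() throws ParseError {
        int total = Number();
        while (current.kind == Token.PLUS) {
            consume(Token.PLUS);
            total += Number();
        }
        consume(Token.EOF);
        return total;
    }

    // Non-terminal: Number ::= <NUMBER>
    public int Number() throws ParseError {
        return Integer.parseInt(consume(Token.NUMBER).image);
    }
}
```

Because both non-terminals are public methods, a caller may start from either one, for example new SketchParser("1 + 2").Sum() or new SketchParser("7").Number(), which mirrors the absence of a single start symbol.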
Once the parser is ready, it can be used to generate code.