--------------------------------------------------------------------------------------- Thanks to Dr Spencer Rugaber's from Georgia Tech for the description --------------------------------------------------------------------------------------- BNF AND EBNF 1. A grammar, G, is a tuple (N, Sigma, P, S) where N is a set of non-terminals, Sigma is a set of terminals, P is a set of productions, and S is an element of N. In the case of BNF, these components can be defined as follows. 2. N is a set of strings. The strings begin with a left angle bracket (<), contain a sequence of alphabetic characters (ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz) and spaces ( ) beginning with an alphabetic character, and terminate with a right angle bracket (>). 3. The following BNF rules can be used to describe this set. My comments are placed within C-style comments brackets. ::= A | B | C | D | E | F | G | H | I | J | K | L | M | N | O | P | Q | R | S | T | U | V | W | X | Y | Z | a | b | c | d | e | f | g | h | i | j | k | l | m | n | o | p | q | r | s | t | u | v | w | x | y | z ::= |_| /* A space character. I am trying to approximate the graphic used in the Algol 60 report */ ::= | ::= /* The null string of symbols */ | ::= ::= < > N is a set of nonterminal names. 4. Sigma is another set of strings, this one of terminal symbols. Some of these symbols are metacharacters used by BNF and some of them are used by the target language. Backus used the following symbols in the Algol 60 report. ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz 0123456789 +-*/ quotient /* colon overstriking hyphen approximates this symbol */ ^ /* actually an up arrow */ < <= /* actually they are overstruck */ = >= /* actually they are overstruck */ /= /* actually they are overstruck */ ()[]'`. 10 /* as a subscript for base notation */ ,:; -| /* the PL/I not character */ /\ /* as a single character */ \/ /* as a single character */ implication /* a subset symbol */ equivalence /* three horizontal lines */ I am not listing space here although Algol recognizes it. It is defined above, and I am going to use it to separate tokens rather than be included in them. Backus was not explicitly concerned with lexical issues. For example, the report can be interpreted as indicating that Algol 60 keywords could be treated as single terminal symbols. In defining BNF, I will punt by making the following definition. ::= | ::= | /* Note the ambiguity here */ ::= ::= /* Note the ambiguity here */ ::= /* What ever you need */ ::= ::= | ::= | Sigma is a set of tokens. 5. P is a set of productions. A production has a left hand side and a right hand side. They are separated by a . The right hand side is either empty or contains a sequence of alternatives. An alternative is a concatenation of symbols. A symbol is either a nonterminal or a token. ::= ::= ::= | ::= | ::= ::= | ::= | P is a set of productions. 6. S is merely an element of N. 7. Note that there are several problems with the notation described above. + There is a confusion between metacharacters and terminal characters when trying to define the metacharacters of BNF. + Lists are awkward to express. + Grouping/factoring is only available by making a new definition. 8. BNF is capable of describing all context free grammars. 9. EBNF (Extended BNF) adds some features that makes it easier to express productions while still describing the same set of grammars. There are several extensions and some transformations provided by EBNF. For example, as languages have gotten more complex, the number of nonterminals found in a grammar has gone up, while the number of terminals has stayed constants. It makes sense therefore to quote the terminals and let the nonterminals stand alone. + Nonterminals now stand alone, without being surrounded by angle brackets. + Therefore, terminals must now be enclosed in apostrophes. + The left hand side is now separated from the right hand side by a right pointing arrow (->) instead of the (::=) symbol. The productions are thought of a rules where the left hand side generates or produces the right hand side. + Rules are terminated by periods, avoiding the potential for confusion when empty strings occur in the grammar. + Parentheses can now be used for grouping, allowing nested alternatives. Hence, A -> B | C X. B -> T U | W Z. can be expressed with the single rule A -> (T U | W Z) | C X. assuming B is not used elsewhere. + Optional constructs can be placed in square brackets. Hence, A -> X | X A | X B. can be expressed by the rule A -> X [ A | B ]. + The Kleene star (*), used in describing regular expressions can be used to describe lists. In particular, A* indicates zero or more occurrences of the symbol A. A -> X | X B. B -> empty | Y B. can be expressed with A -> X (Y*). assuming B is not used elsewhere. Sometimes curly brackets are used instead of *, avoiding the need for parentheses. The above example could then be expressed as the following. A -> X {Y}. + A variation of * is the use of + to indicate one or more occurrences. A -> X B. B -> Y | Y B. can be expressed with A -> X (Y+). assuming B is not used elsewhere. ---------------------------------------------------------------------------------------