LL1 Parsing @ supposed to represent nothing at all ::= "{" "}" ::=

| @ ::= ";" | @ ::= | | | ::= identifier ":=" ::= "while" "do" ::= "print" ::= number | identifier | "(" ")" ::=

::= "*" | "/" | @ ::=

::= "+" | "-" | @ ::=

::= "=" | "<" | @ ::= nullable(sequence A B C) = nullable(A) /\ nullable(B) /\ nullable(C) nullable(choice A B C) = nullable(A) \/ nullable(B) \/ nullable(C) nullable(terminal symbol) = false nullable(non-terminal) = nullable(its definition) nullable(@) = true nullable() = nullable("{") = false nullable() = nullable(something | @) = true nullable() = nullable(something | @) = true nummable() = nullable(identifier) = false nummable() = nullable("while") = false nummable() = nullable("print") = false nullable() = nullable() \/ nullable() \/ nullable() \/ nullable() = false nullable( = nullable() /\ nullable() = false nullable() = nullable(something | @) = true nullable( = nullable() /\ nullable() = false nullable() = nullable(something | @) = true nullable( = nullable() /\ nullable() = false nullable() = nullable(something | @) = true nullable() = nullable() = false + represents union, * represents intersection first(@) = { } first(terminal symbol a) = { a } first(non-terminal symbol) = first(its definition) first(A | B | C) = first(A) + first(B) + first(C) first(A B C) = first(A) if not nullable(A) first(A) + first(B) if nullable(A) /\ not nullable(B) first(A) + first(B) if nullable(A) /\ not nullable(B) first(A) + first(B) + first(C) if nullable(A) /\ nullable(B) first() = { "{" } first() = first() = { identifier, "while", "print", "{" } first() = { ";" } first() = first() + first() + first() + first() = { identifier, "while", "print", "{" } first() = { identifier } first() = { "while" } first() = { "print" } first() = { number, identifer, "(" } first() = first(value) = { number, identifer, "(" } first() = { "*", "/" } first() = first() = { number, identifer, "(" } first() = { "+", "-" } first() = first() = { number, identifer, "(" } first() = { "=", "<" } first() = first() = { number, identifer, "(" } in a rule A ::= B | C | D indicate(B) = if B not nullable then first(B) else first(B) + follow(A) indicate(C) = if B not nullable then first(C) else first(C) + follow(A) indicate(D) = if B not nullable then first(D) else first(D) + follow(A) indicate() is not just for a particular piece of grammar, it is different depending on where that piece of grammar appears, which rule it is an option for. follow sets (only for non-terminal symbols) are accumulated over the entire grammar whenever appears inside but not at the end of a sequence in a rule ::= ... | P Q R X Y Z follow() includes first(X Y Z) and if nullable(X Y Z) it also includes follow() whenever appears at the end of a rule ::= ... | P Q R follow() includes follow() What we get is follow() = { ";", "}", eof } follow() = { "}" } follow() = { "}" } follow() = { ";", "}" } follow() = { ";", "}" } follow() = { ";", "}" } follow() = { ";", "}" } follow() = { "*", "/", "+", "-", "=", "<", ")", ";", "}", "do" } follow() = { "+", "-", "=", "<", ")", ";", "}", "do" } follow() = { "+", "-", "=", "<", ")", ";", "}", "do" } follow() = { "=", "<", ")", ";", "}", "do" } follow() = { "=", "<", ")", ";", "}", "do" } follow() = { ")", ";", "}", "do" } follow() = { ")", ";", "}", "do" } follow() = { ")", ";", "}", "do" } The only non-trivial indicates are ::=

{ identifier, "while", "print", "{" } | @ { "}" } ::= ";" { ";" } | @ { "}" } ::= "*" { "*" } | "/" { "/" } | @ { "+", "-", "=", "<", ")", ";", "}", "do" } ::= "+" { "+" } | "-" { "-" } | @ { "=", "<", ")", ";", "}", "do" } ::= "=" { "=" } | "<" { "<" } | @ { ")", ";", "}", "do" } which means that all of the grammar's indicate sets are disjoint so LL1 will work. parse a sequence A ::= B C D bt = parse(B) ct = parse(C) dt = parse(D) return new node ("A", bt, ct, dt) parse a choice * represents intersection A ::= B | C | D look at next symbol s if s in indicate(B) t = parse(B) else if s in indicate(C) t = parse(C) else if s in indicate(D) t = parse(D) else syntax error s seen when one of indicate(B) + indicate(C) + indicate(D) expected return new node ("A", t) obviously the choice can not be made unless indicate(B) * indicate(C) * indicate(D) = { } parse a terminal symbol, A, for example "while" or identifier read next symbol s if s = A return new node("symbol", s) else syntax error s seen when A expected parse a non-terminal symbol,

return parse(its definition) This style of definition for arithmetic ::=

::= "-" | @ would look more natural as ::= { "-" } where curly brackets mean their content is optional which makes it easier to see a big problem. It builds the wrong tree. 8 - 4 - 2 only fits if 8 is accepted as the and 4 - 2 is the other (4 - 2 fits if 4 is the value and 2 is the other 2 fits if 2 is the and the optional part is absent) That would prodcue a subtract note with 8 on the left and (4 - 2)'s tree on the right, forcing right-to-left evaluation. What we really want is ::= { "-" } which would be written in the simpler style as ::= "-" | Which would be programmed like this: parse_expr(): S = look ahead at next symbol if S in indicate set for first option L = parse_expr() S = consume next symbol if S != "-" sytax error R = parse_value() return new node("-", L, R) else if S in indicate set for second option return parse_value() else syntax error Not only has that got an infinite recursion in the first case, but also the indicate sets for the two options will be identical. In this case, and many like it, the solution is not to say that the grammar isn't LL1 therefore we must give up on LL1 and go with something else, as many people insist. There is a very simple solution. Replace the recursion with a loop. parse_expr() L = parse_value() S = look ahead at next symbol while S = "-" consume next symbol (the -) R = parse_value() L = new node("-", L, R) return L That is, strictly speaking LALR1 (close to LR1, the real bottom up method) but it fits perfectly in with the much better structured LL1 framework.