ECOOP 2026 Pearl
A Simple Recipe
for Decent Recursive
Descent Parsers
for Decent Recursive
Descent Parsers
Luyu Cheng and Lionel Parreaux
Keep handwritten parsing direct.
Make accepted syntax visible again.
Make accepted syntax visible again.
1Token-local rules
2Recursive descent control flow
3Generated diagrams and tables
1/13
Two Routes Into Parsing
Theory-first route
1
Context-free grammars
2
Automata and algorithms
3
LL and LR families
Tool-first route
1
Parser generators
2
Parser combinators
3
Working parsers quickly
The missing middle: a quick path to an inspectable parser for a complete language.
2/13
The Missing Middle
Recursive descent remains worth learning. It gives a concrete model: consume, choose, build syntax.
Most tutorials stop at expressions. Full languages need statements, declarations, blocks, patterns, and types.
Pratt’s idea is broader than precedence. Tokens and rules can carry parsing behavior.
The recipe is a disciplined realization. Keep recursive descent as execution, with modular, extensible, visible rules.
This paper presents
A tutorial recipe for writing flexible recursive-descent parsers.
In a modern statically typed language, using Pratt parsing as the running example.
LanguageMLscript
Targeta simple language based on Caml Light
3/13
Revisiting Pratt Parsing
Binding Power
Placeholder for the revised explanation of left and right binding power.
| tokens | A | + | B | * | C | ||||
|---|---|---|---|---|---|---|---|---|---|
| binding powers | 3 | 4 | 5 | 6 | |||||
| ←→ | |||||||||
| tokens | A | + | B | + | C | ||||
|---|---|---|---|---|---|---|---|---|---|
| binding powers | 3 | 4 | 3 | 4 | |||||
| ←→ | |||||||||
4/13
Binding Power Demo
5/13
The Pratt-Shaped Loop
fun expr(bp: Int): Tree = let acc = atom() exprCont(acc, bp)fun exprCont(acc, bp) = if peek is Some(op) and op.lbp > bp then let rhs = expr(op.rbp) exprCont(apply(op, acc, rhs), bp) _ then accLoop Steps
1
Parse one atomic phrase.
2
Test whether the next token can continue it.
3
Recurse under that threshold.
6/13
Trace One Pratt Loop
Loop body
fun expr(bp: Int): Tree = let acc = atom() exprCont(acc, bp)fun exprCont(acc, bp) = if peek is Some(op) and op.lbp > bp then let op = consume() let rhs = expr(op.rbp) exprCont(apply(op, acc, rhs), bp) _ then accInput Stream
A→+→B→*→C→end
ready
State
acc
A
next
+
bp
0
Call Stack
7/13
Expression Parsing Is Not Enough
Original TDOP
print value + increment
The association problem is between neighboring argument positions.
log value base radix
A delimiter documents the role of the following argument.
clear item, item, item
Variable-length argument lists still fit the token-local view.
Whole-Language Syntax
if condition then consequence else alternative
A rule spans several keywords and named subterms.
match subject with
pattern -> branch
| pattern -> branch
Branches repeat and each branch carries its own syntax.
let rec name argument = body in rest
Declarations and expressions compose in the same parser.
Keep the control flow general and define what counts as a continuation.
8/13
From Operators to Rules
Classic Pratt Table
| token | lbp | rbp | continuation |
|---|---|---|---|
| + | 3 | 4 | left + expr(4) |
| * | 5 | 6 | left * expr(6) |
| apply | 7 | 8 | left expr(8) |
The table maps one token to one continuation action.
→
Rule Representation
Rule[A]
an array of choices
End(value)
finish the rule and return the value
Keyword(kw, rest)
consume a keyword and continue with rest
Ref(kind, process, bp, rest)
parse a referenced syntax kind, process it, then continue
let ifThenElse = keyword(_if) of term(name: "condition") of keyword(_then) of term(name: "consequent") of end keyword(_else) of term(name: "alternative") of endRules describe parseable fragments, not only symbolic operators.
9/13
One Rule, Two Views
Source rule
let ifThen = keyword(_if) of term(name: "condition") of keyword(_then) of term(name: "consequent") of endlet ifThenElse = keyword(_if) of term(name: "condition") of keyword(_then) of term(name: "consequent") of keyword(_else) of term(name: "alternative") of endlet printFormat = keyword("print") of term(name: "x") of keyword("format") of stringLiteral(name: "f") of endterm of // Tree.Application: <expr> <expr> term of process: (argument, _) => callee => Tree.Application( callee, argument :: Nil) name: "application argument" bp: Keyword.appPrecThe parser interprets the rule; the talk, docs, and users see the same rule as a diagram.
Rendered shape
10/13
Parsing Terms and Types
Generic machinery executes reified rules with parseRule and dispatches syntax kinds through parseKind.
Terms and types both follow “parse an atom, then greedily extend it.”
Keywords, applications, and ascriptions are delegated to rules.
typeExpr and typeExprCont follow the same pattern, but run over typeRule.
With Infix, the remaining loop becomes a reified continuation.
11/13
Two Functions Interpret Rules
Functions in the parser
fun parseKind(kind: Str, bp: Int): Treefun parseRule(bp: Int, rule: Rule[Tree])val afterRef: Option[[Str, Rule[Option[Tree -> A]]]]1
parseKind retrieves a named rule or handles built-in syntax kinds.
2
parseRule consumes a Rule and returns a tree.
3
afterRef drives continuations that reference the same syntax kind.
parseRule, with details omitted
fun parseRule(bp: Int, rule: Rule[Tree]) = if peek is Some(Token.Identifier(name)) and Keyword.all |> MutMap.get(name) is Some(keyword) and rule.keywordChoices |> Map.get(name) is Some(rest) then consume parseRule(bp, rest) Some(other) and rule.refChoice is Some(Ref(kind, f, ebp', rest)) and let ebp = ebp' |> Option.getOrElse(Keyword.max) ebp > bp and parseKind(kind, bp) as acc is ¬(Tree.Error | Tree.Empty) and parseRule(bp, rest) as tree is Tree.Error then tree else f(acc, tree) Some(other) and rule.endChoice is Some(value) then value Some(other) then consume Tree.Error of None, "unexpected token " + other12/13
Conclusion
Pratt parsing can be a recipe for an inspectable parser, not just an operator table.
Keep Pratt's control flow, but move syntax into first-class Rule and Choice data.
parseKind and parseRule interpret those rules under a binding-power threshold.
The same rules parse, support user extension, and render as syntax diagrams.
Caml Light mostly fits the recipe; ambiguous top-level let still exposes where manual parsing remains.
For MLscript, the goal is to reduce parser-combinator opacity and ad-hoc recursive-descent special cases.
13/13