By Nick Snyder for the GopherCon Liveblog on August 29, 2018
Presenter: Sugu Sougoumarane
Liveblogger: Nick Snyder
Sugu Sougoumarane is the co-creator of Vitess, which contains a SQL parser that is now used by many other projects. In this talk he demonstrates how to write a parser using goyacc.
Goyacc works with LALR(1) grammars: the parser looks ahead one token and decides what action to take.
General steps:
Goyacc is almost an exact translation of the original yacc, so it has inherited some of yacc's idiosyncrasies. For example, the generated parse function follows C conventions and returns only a single int: 0 for success and 1 for failure. To get a value out of the parser, you need awkward boilerplate that stores the result on the lexer:
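```
%{
package jsonparser

// yyParse only returns an int, so the result is stored on the lexer
// (lex and Result are defined elsewhere in the package).
func setResult(l yyLexer, v Result) {
	l.(*lex).result = v
}
%}

%union{
}

%start main

%%

main:
	{
		setResult(yylex, 0)
	}
```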
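To see why this boilerplate is needed, here is a minimal, self-contained sketch of the Go code around the parser. It assumes the shapes goyacc generates (a `yyParse(yylex yyLexer) int` function and a `yyLexer` interface with `Lex` and `Error`), but `yyParse` below is a hand-written stand-in, and `lex`, `Result`, and `Parse` are hypothetical names. The wrapper turns the int status back into an idiomatic `(value, error)` pair.

```go
package main

import (
	"errors"
	"fmt"
)

// Result is a hypothetical type for whatever the grammar produces.
type Result int

// yySymType and yyLexer mirror the shapes goyacc generates; they are
// declared by hand here so the sketch compiles on its own.
type yySymType struct {
	yys int
}

type yyLexer interface {
	Lex(lval *yySymType) int
	Error(s string)
}

// lex is a hypothetical lexer that also carries the parse result and error.
type lex struct {
	input  string
	result Result
	err    error
}

func (l *lex) Lex(lval *yySymType) int { return 0 } // 0 means end of input
func (l *lex) Error(s string)           { l.err = errors.New(s) }

// setResult is the boilerplate from the grammar's prologue: it smuggles the
// value out of the parser by storing it on the concrete lexer.
func setResult(l yyLexer, v Result) {
	l.(*lex).result = v
}

// yyParse stands in for the generated parser: like the real one, it can
// only report 0 (success) or 1 (failure).
func yyParse(yylex yyLexer) int {
	setResult(yylex, 0) // what the `main` rule's action does
	return 0
}

// Parse is a hypothetical public wrapper around yyParse.
func Parse(input string) (Result, error) {
	l := &lex{input: input}
	if yyParse(l) != 0 {
		if l.err == nil {
			l.err = errors.New("syntax error")
		}
		return 0, l.err
	}
	return l.result, nil
}

func main() {
	r, err := Parse("")
	fmt.Println(r, err)
}
```

Keeping `setResult` and the type assertion on `*lex` in one place means the rest of the package never has to know the result travels through the lexer.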
Use go generate to create the actual parser.go file (-o names the output file; -l omits the //line directives that point back into parser.y):
```go
//go:generate goyacc -l -o parser.go parser.y
```
A phone number has three parts: area code, first part, second part.
```
%token <ch> D

%%

phone:
	area part1 part2
|	area '-' part1 '-' part2

area:  D D D
part1: D D D
part2: D D D D
```
Capitalized names such as D signify tokens.
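To make the two alternatives concrete, and to show that one token of lookahead is enough to choose between them, here is a hand-written recursive-descent recognizer for the same phone grammar. It is only an illustration of what the grammar accepts, not what goyacc generates (goyacc builds an LALR(1) state machine); `parsePhone`, `digits`, and `phone` are hypothetical names. After the three area-code digits, peeking at the next byte decides which alternative applies.

```go
package main

import (
	"errors"
	"fmt"
)

// phone holds the three parsed parts.
type phone struct {
	area, part1, part2 string
}

// digits consumes exactly n ASCII digits starting at s[*pos].
func digits(s string, pos *int, n int) (string, error) {
	start := *pos
	for i := 0; i < n; i++ {
		if *pos >= len(s) || s[*pos] < '0' || s[*pos] > '9' {
			return "", fmt.Errorf("expected digit at offset %d", *pos)
		}
		*pos++
	}
	return s[start:*pos], nil
}

// parsePhone accepts "area part1 part2" or "area '-' part1 '-' part2".
func parsePhone(s string) (phone, error) {
	var p phone
	var err error
	pos := 0
	if p.area, err = digits(s, &pos, 3); err != nil {
		return phone{}, err
	}
	// One token of lookahead: a '-' here selects the dashed alternative.
	dashed := pos < len(s) && s[pos] == '-'
	if dashed {
		pos++
	}
	if p.part1, err = digits(s, &pos, 3); err != nil {
		return phone{}, err
	}
	if dashed {
		if pos >= len(s) || s[pos] != '-' {
			return phone{}, errors.New("expected '-' after part1")
		}
		pos++
	}
	if p.part2, err = digits(s, &pos, 4); err != nil {
		return phone{}, err
	}
	if pos != len(s) {
		return phone{}, errors.New("trailing input")
	}
	return p, nil
}

func main() {
	for _, in := range []string{"4155551234", "415-555-1234", "415-5551234"} {
		p, err := parsePhone(in)
		fmt.Println(in, p, err)
	}
}
```

The mixed form "415-5551234" is rejected, just as the grammar rejects it, because choosing the dashed alternative commits the parser to a second '-'.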
The generated parser is just a single function that runs a state machine and uses local variables.
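The values produced by tokens and rules are saved in a union data structure declared with %union; %token and %type then say which union field each symbol uses: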
```
%union{
	result Result
	part   string
	ch     byte
}

%type <result> phone
%type <part>   area part1 part2
```
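Actions are Go code (everything inside the braces) that runs when a rule matches. Dollar variables address the values of the rule's symbols: $1 through $n are the values of the symbols on the right-hand side, and $$ is the value the rule itself produces.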
```
part2: D D D D
	{
		$$ = cat($1, $2, $3, $4)
	}
```
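Roughly speaking, goyacc turns the %union into a Go struct (yySymType with the default prefix) and rewrites each dollar variable into a field access on that struct, choosing the field from the %token and %type declarations. The following is a hand-written approximation of that translation, not the exact generated code: `dollar` is a hypothetical name for the slice of right-hand-side values, and `cat` is a hypothetical helper because the talk did not show it.

```go
package main

import "fmt"

// Result is a hypothetical result type.
type Result struct {
	Area, Part1, Part2 string
}

// yySymType approximates what goyacc generates from the %union: one field
// per union member (plus bookkeeping), all present at the same time.
type yySymType struct {
	yys    int
	result Result
	part   string
	ch     byte
}

// cat is a hypothetical helper that joins digit bytes into a string.
func cat(bs ...byte) string {
	return string(bs)
}

// reducePart2 mimics the action "part2: D D D D { $$ = cat($1, $2, $3, $4) }".
// Because D is declared %token <ch>, $n becomes dollar[n].ch; because part2
// is %type <part>, $$ becomes the .part field of the new value.
func reducePart2(dollar []yySymType) yySymType {
	var val yySymType
	val.part = cat(dollar[1].ch, dollar[2].ch, dollar[3].ch, dollar[4].ch)
	return val
}

func main() {
	// Index 0 is unused so that dollar[n] lines up with $n.
	dollar := []yySymType{{}, {ch: '1'}, {ch: '2'}, {ch: '3'}, {ch: '4'}}
	fmt.Println(reducePart2(dollar).part) // prints 1234
}
```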
Two things are happening concurrently during lexing:
The lexer can also return a byte itself, converted to an int, as the token. Yacc reserves token numbers for single characters, so bytes like '-' can be returned without declaring them as tokens in the grammar.
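```go
func (l *lex) Lex(lval *yySymType) int {
	b := l.nextb()
	if unicode.IsDigit(rune(b)) {
		// D carries a value, so store it in lval before returning the token.
		lval.ch = b
		return D
	}
	// Any other byte (e.g. '-') is returned as its own token.
	return int(b)
}
```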
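A self-contained version of that lexer, runnable on its own, might look like the following. The `lex` type, its `nextb` method, and the value of `D` are assumptions: goyacc generates the real token constant (numbered above the byte range; 57346 is assumed here) and the real yySymType from the grammar. A return value of 0 signals end of input.

```go
package main

import (
	"fmt"
	"unicode"
)

// D stands in for the constant goyacc would generate for "%token <ch> D".
const D = 57346

// yySymType is a cut-down stand-in for the generated struct.
type yySymType struct {
	ch byte
}

// lex is a hypothetical lexer over an in-memory string.
type lex struct {
	input string
	pos   int
}

// nextb returns the next byte, or 0 at end of input.
func (l *lex) nextb() byte {
	if l.pos >= len(l.input) {
		return 0
	}
	b := l.input[l.pos]
	l.pos++
	return b
}

// Lex sets lval for value-carrying tokens and returns the token number.
func (l *lex) Lex(lval *yySymType) int {
	b := l.nextb()
	if unicode.IsDigit(rune(b)) {
		lval.ch = b // the value travels in lval...
		return D    // ...and the token type in the return value
	}
	// Any other byte, such as '-', is its own token; 0 means end of input.
	return int(b)
}

func main() {
	l := &lex{input: "415-555"}
	var lval yySymType
	for tok := l.Lex(&lval); tok != 0; tok = l.Lex(&lval) {
		if tok == D {
			fmt.Printf("D(%c) ", lval.ch)
		} else {
			fmt.Printf("%q ", rune(tok))
		}
	}
	fmt.Println()
}
```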
Goyacc
The lexer is not code you live in. It is code you write once and then use for a long time, so it's OK if the code is not clean.
For complicated grammars (e.g. SQL), goyacc generates a big symbol structure, with one field per %union member, that is expensive to pass around. Goyacc actually assigns (copies) this structure every time there is a state transition.
C (yacc) has a construct called a union that packs its members efficiently into the same memory, but there is no equivalent in Go... except that interfaces are a very close equivalent!
Unlike a C union, a Go interface can be type asserted to recover the concrete value. One limitation is that the result of a type assertion is an rvalue, which means you can't assign to it.
Switching Vitess to use an interface instead of a struct doubles performance, but it would be a backward-incompatible change to goyacc.