So this dude called Robert Nystrom wrote a book called
Crafting Interpreters. You should totally get it. It's amazing.
It walks you through every small piece of creating an interpreter
for a small programming language called Lox. The first ~third of
the book guides us through building a tree-walk interpreter in
Java, called jlox. I followed along with it, but instead of
Java, I wrote the code in Rust. Thus the name: jlox-rs, because I
have the creativity of a small handful of dried moths. Rust can
be compiled to WebAssembly, and after sprinkling a layer of web
magic, you can now use this site to poke at my interpreter!
Challenges
The book sets a number of challenges. I've taken on all (?) of
these, so the interpreter here has a few features not part of
"standard" Lox:
-
You get a nice, friendly error message (thanks to a thing called
an "error production") if you use a binary operator with a
missing left-hand-side value.
-
You get a nice error message if you try to divide by zero.
-
When you concatenate (with `+`) anything to a string, that thing
gets cast to a string.
-
Not depicted here: when run in REPL mode, the value of the last
evaluated expression in an input is automatically printed.
-
You get a nice, friendly error message when you try to access a
declared but uninitialized variable.
-
`break` works inside loops as you'd expect (and errors out when
used outside loops).
-
Anonymous functions exist: you can write
`var f = fun() { return 3; }; print f();`
-
You get a static analysis error if a local variable is not used
(except in the REPL).
-
Local variables are accessed in O(1) time by binding them in the
variable resolution pass: each gets an index, and at runtime the
variables are accessed in a `Vec` by index instead of by name in
a (hash)map. By the way, this made everything after it way more
complex :D
-
Here's one way this optimization made things more complex: when
running as a REPL, each new input counts as starting from line
`0`. That means variable bindings that say "this variable was
declared at `(0:10)`" need to also account for which command the
declaration was in.
-
Static methods exist. See the `advanced_classes` example.
- Getters also exist! And inheritance works with them!
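Here's a rough sketch of the idea behind the indexed local-variable lookup from the list above. It is not the actual jlox-rs code: the names `Resolver`, `Frame`, `declare`, `resolve`, and `with_slots` are made up for illustration, values are plain `f64`s, and there is only a single flat scope (no nesting, no REPL bookkeeping).

```rust
use std::collections::HashMap;

/// Hypothetical resolver: runs once before execution and maps names to slots.
struct Resolver {
    slots: HashMap<String, usize>,
    next: usize,
}

impl Resolver {
    fn new() -> Self {
        Resolver { slots: HashMap::new(), next: 0 }
    }

    /// Assign the next free slot index to a newly declared local.
    fn declare(&mut self, name: &str) -> usize {
        let index = self.next;
        self.next += 1;
        self.slots.insert(name.to_string(), index);
        index
    }

    /// The by-name hash lookup happens here, at resolution time only.
    fn resolve(&self, name: &str) -> Option<usize> {
        self.slots.get(name).copied()
    }
}

/// Hypothetical runtime frame: values live in a Vec, addressed by slot.
struct Frame {
    values: Vec<f64>,
}

impl Frame {
    fn with_slots(count: usize) -> Self {
        Frame { values: vec![0.0; count] }
    }

    fn set(&mut self, slot: usize, value: f64) {
        self.values[slot] = value;
    }

    /// O(1) access: just an index into the Vec, no hashing at runtime.
    fn get(&self, slot: usize) -> f64 {
        self.values[slot]
    }
}

fn main() {
    // Resolution pass: names become slot indices.
    let mut resolver = Resolver::new();
    let a = resolver.declare("a");
    let b = resolver.declare("b");

    // Runtime: only the indices are used.
    let mut frame = Frame::with_slots(2);
    frame.set(a, 1.5);
    frame.set(b, 2.5);

    let slot = resolver.resolve("b").expect("b was declared");
    println!("b lives in slot {} and holds {}", slot, frame.get(slot));
}
```

In a real interpreter the resolver would write the slot index into the AST node for each variable use, so the runtime never sees the name at all.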
Rust things
The differences between Java and Rust led to a few interesting
results:
-
The Java implementation uses Java to generate Java code to DRY
the AST. Naturally, I used Rust macros for this (and a few other
conveniences).
-
The Java implementation stores references to `Environment`s
(basically, runtime variable scopes) in multiple places; most
notably, closures are implemented by "just" storing another
reference to the right environment. In Rust, the least painful
way I could find was wrapping environments in `Rc<RefCell<_>>`.
-
It's not just environments! All runtime values can appear in
various positions where multiple references to them must exist
at the same time; so basically EVERY runtime value / variable
primarily exists as an `Rc<RefCell<_>>`.
-
The `Result` type in Rust is awesome. I used it to handle all
kinds of errors, and the `thiserror` library to define specific
error types. The book uses booleans to store success states in
some cases, and exceptions in others; translating those took a
little thinking (not too much).
-
While I was at it, I added the location of the error to error
messages. As the book hints, this is best done by
storing code locations as byte offsets most of the time, and
only resolving them to line/column numbers when an error is
printed to the user. (So that's what I did.)
-
There are a few enums with lots of variants. Sometimes code
needs / wants to operate on a specific variant. I didn't find a
great solution to this: mostly the data of each variant is now a
stand-alone struct, which works, but it feels a bit clumsy. It
also gets a bit tangled up with all the `Rc<RefCell<_>>`.
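To make the `Rc<RefCell<_>>` point a bit more concrete, here's a minimal sketch of a shared environment with a closure holding a second reference to it. This is an illustration only, not the jlox-rs implementation: `Environment`, `EnvRef`, `new_env`, `lookup`, and `Closure` are assumed names, variables are looked up by name for brevity, and values are plain `f64`s.

```rust
use std::cell::RefCell;
use std::collections::HashMap;
use std::rc::Rc;

/// Hypothetical environment: one scope, with an optional link to its parent.
struct Environment {
    values: HashMap<String, f64>,
    parent: Option<Rc<RefCell<Environment>>>,
}

/// Shared, mutable handle to an environment.
type EnvRef = Rc<RefCell<Environment>>;

fn new_env(parent: Option<EnvRef>) -> EnvRef {
    Rc::new(RefCell::new(Environment { values: HashMap::new(), parent }))
}

/// Walk up the scope chain until the name is found.
fn lookup(env: &EnvRef, name: &str) -> Option<f64> {
    let scope = env.borrow();
    match scope.values.get(name) {
        Some(value) => Some(*value),
        None => scope.parent.as_ref().and_then(|p| lookup(p, name)),
    }
}

/// Hypothetical closure value: keeps its defining scope alive.
struct Closure {
    captured: EnvRef,
}

fn main() {
    let globals = new_env(None);
    globals.borrow_mut().values.insert("x".to_string(), 1.0);

    // The closure "just" stores another reference to the same environment.
    let closure = Closure { captured: Rc::clone(&globals) };

    // A later assignment through one handle is visible through the other.
    globals.borrow_mut().values.insert("x".to_string(), 2.0);

    // Calling the closure would create a fresh scope whose parent is the
    // captured environment.
    let call_scope = new_env(Some(Rc::clone(&closure.captured)));
    println!("{:?}", lookup(&call_scope, "x")); // prints "Some(2.0)"
}
```

`Rc` gives the shared ownership that Java references provide for free, and `RefCell` moves the borrow checks to runtime so the shared scope can still be mutated. The price is that a mistaken overlapping `borrow_mut()` panics at runtime instead of failing to compile.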
Other things
-
I added tests for features I implemented as I went along; this
really helped out during some hairy refactors.
-
The Rust-to-WebAssembly pipeline is now fairly mature; I had
very few problems setting it up.
-
The code editor is Monaco Editor, which is essentially the same
thing that's used in VS Code. For syntax highlighting, I took
one of the example Monaco language tokenizers, and tweaked it
until it matched this Lox variant. So I guess you could say it's
custom!
-
There are a number of things in the implementation I'm not quite
happy with, but this is good enough to stop, and move on to the
next part of the book!