I’ll be joining Recurse Center for their Summer 1 batch from May 22nd to Aug 10th. RC is a self-directed, community-driven educational retreat for programmers in New York City.
I am going to use my tiny lisp interpreter called lisper as an umbrella project to explore a few fundamental ideas in computer science; mainly:
- Code generation
- Garbage Collection
Following Andrew Ng’s advice, I picked projects that I’m definitely underqualified for. I’ll explain why this matters to me.
1. Code Generation
Writing an interpreter is easy. You can cheat by making the host language do most of the heavy lifting. You write a parser to parse the string into an AST and then evaluate it by walking down and replacing expressions with values. All the primitives operations like numeric computation and string manipulation; classic data structures like arrays and maps etc can be mapped directly to the host language. You typically don’t have to worry about code generation or garbage collection because it works automatically. This feels wrong when your intention is to learn how things work.
I think it is going to be interesting dealing with nothing but a few primitive operations and a couple of registers. I’ve no idea how I’ll generate code for a curried function. I know how tail call optimizations or C calling conventions work, but only a 10,000ft view. I have to implement it before I get any better at it.
There are a whole lot of challenges here and I will have to learn a lot before I can start. What platform will I target? Should I go very low-level like native x86/LLVM or a pick a higher level abstraction like C--, BEAM or JVM? Or should I write a toy VM to start with? Maybe a Symbolics or Atari emulator for fun? Who knows?
2. Garbage Collection
I’m a big fan of statically typed pure functional programming and one of the important factors that contribute to the insane performance is the kickass GC under the hood. It is critical for any high-level language which generates a lot of intermediate objects; even more, if the values are immutable like Erlang or Clojure.
I poked into the recent GC developments in Go 1.8 and didn’t understand it well enough. I worked with the per-process GC of Erlang for a while and it looked really great. There are myriad ways to tune the JVM GC. All of this sounds so exciting and the best way to learn is to write a toy implementation.
Some further reading,
- Erlang Garbage Collection Details and Why It Matters
- Go’s march to low-latency GC
- Golang’s Real-time GC in Theory and Practice
I fell in love with Erlang’s preemptive scheduler and its approach to fault tolerance immediately after I started working with it. It is significantly easier to write systems that will keep working even when some parts of it behave badly. For example, it would be pretty terrible if Emacs crashed every time Skype went down; but somehow we consider this to be acceptable while developing software. Shitty python exceptions affect every user on your web application, not just the user that caused it.
Erlang’s processes act independently of one another. One process cannot block or starve another one, just like operating systems. This is called a preemptive scheduler. The runtime or OS can stop and start processes wherever it wants to - leading to fair utilization of resources b/w competing concurrent units.
Contrast this with node.js, where all the callbacks and handlers execute in the same context, affecting each other in myriad ways and any one of them can block and starve everyone else. This is called co-operative scheduling.
I want to implement both strategies for lisper.
There is still a lot more to do, but I don’t think I’ll have any time left. A type checker would be great to have. Support for distributed systems would be awesome.
RC is a lot about discovering new things from fellow recursers. I hope to get pulled into interesting projects by other people. Fixing unknown unknowns :)