We will define four kinds of object to begin with:
FOO, BAR, ADD-TWO.
We will normalize characters to upper-case in this project, but this
is not strictly necessary.
NILNULL in C and other
languages.
NIL, or a reference to another pair.
The types of each element may be different.
Integers, symbols and NIL are called simple data.
The term atom can refer to either a simple datum or a pair
(purists may disagree on this point).
Note that integers and symbols are immutable, so we can think of two integers with the same value as being the same object. This is particularly useful for symbols, because it allows us to test for equality by comparing pointers. Go has some limitations on pointer comparisons, so we need to wrap the name of our symbol in a structure.
Let's declare some Go types to hold our data. There are many clever ways to store LISP objects efficiently, but for this implementation we will stick to a very simple scheme [please excuse the pun].
// Atom is either an Atom or a Pair
type Atom struct {
_type AtomType
value AtomValue
}
We use _type since type is a reserved word in Go.
// AtomType is the enum for the type of value in a cell.
type AtomType int
const (
// Nil represents the empty list.
AtomType_Nil AtomType = iota
// Integer is a number.
AtomType_Integer
// Pair is a "cons" cell holding a "car" and "cdr" pointer.
AtomType_Pair
// Symbol is a string of characters, converted to upper-case.
AtomType_Symbol
)
// AtomValue is the value of an Atom.
// It can be a simple type, like an integer or symbol, or a pointer to a Pair.
type value struct {
integer int
pair *Pair
symbol *Symbol
}
// Pair is the two elements of a cell.
// "car" is the left-hand value and "cdr" is the right-hand.
type Pair struct {
car, cdr Atom
}
// Symbol implements data for a symbol.
type Symbol struct {
label []byte
}
A few helper functions will be handy:
// car returns the first item from a list.
// It will panic if p is not a Pair
func car(p Atom) Atom {
return p.value.pair.car
}
// cdr returns the remainder of a list.
// It will panic if p is not a Pair
func cdr(p Atom) Atom {
return p.value.pair.cdr
}
// nilp is a predicate function. It returns true if the atom is NIL.
func nilp(atom Atom) bool {
return atom._type == AtomType_Nil
}
The "p" in nilp stands for "predicate". Identifiers in Go
may not contain question marks. There is no need to restrict our LISP
implementation in that way, of course.
Let's create a global for the NIL variable, too.
// _nil is the NIL symbol.
// This should be immutable, so don't change it!
var _nil = Atom{_type: AtomType_Nil}
We use _nil since nil is a reserved word in Go.
Go doesn't allow immutable structures, so we can't declare this as a "const."
Know that bad things will happen if you update this variable.
Integers and (pointers to) strings can be copied around, but we need to allocate pairs on the heap.
// cons returns a new Pair created on the heap.
func cons(car, cdr Atom) Atom {
return Atom{
_type: AtomType_Pair,
value: AtomValue{
pair: &Pair{
car: car,
cdr: cdr,
},
},
}
}
cons is a function to allocate a pair on the heap and
assign its two elements.
It may look like this cons will leak memory the moment its return value is discarded.
Go is a garbage-collected language so the memory will eventually be reclaimed.
Now we can start creating LISP objects. An integer:
// make_int returns an Atom on the stack.
func make_int(x int) Atom {
return Atom{
_type: AtomType_Integer,
value: AtomValue{
integer: x,
},
}
}
And a symbol:
// make_sym returns an Atom on the stack.
// The name of the symbol is converted to uppercase.
// Memory for the symbol is allocated and owned by the Atom.
func make_sym(sym []byte) Atom {
return Atom{
_type: AtomType_Symbol,
value: AtomValue{
// upper-case copy of the symbol
symbol: &Symbol{label: bytes.ToUpper(sym)},
},
}
}
We will write a pair like this:
(a . b)
where a is the car and b is the
cdr.
By using the cdr of a pair to reference another pair, we can create a chain:
(a . (b . (c . (d . NIL))))
Notice that the cdr of the last pair is NIL. This
signifies the end of the chain, and we call this structure a
list. To avoid having to write a large number of brackets, we
will write the previous list like this:
(a b c d)
Finally, if the cdr of the last pair in a list is not
NIL, we will write this:
(p q . r)
which is equivalent to
(p . (q . r))
This is called an improper list.
Printing an atom or list is simple. (In theory - the boilerplate from handling errors makes it look complicated.)
// print_expr is a helper function to write an expression
// to stdout, ignoring errors.
func print_expr(expr Atom) {
_, _ = expr.Write(os.Stdout)
}
// Write writes the value of an Atom to the writer.
// If the atom is a pair, Write is called recursively
// to write out the entire list.
func (a Atom) Write(w io.Writer) (int, error) {
switch a._type {
case AtomType_Nil:
// atom is nil, so write "NIL"
return w.Write([]byte{'N', 'I', 'L'})
case AtomType_Integer:
// atom is an integer
return w.Write([]byte(fmt.Sprintf("%d", a.value.integer)))
case AtomType_Pair:
// atom is a list, so write it out surrounded by ( and ).
totalBytesWritten, err := w.Write([]byte{'('})
if err != nil {
return totalBytesWritten, err
}
// print the car of the list.
bytesWritten, err := car(a).Write(w)
totalBytesWritten += bytesWritten
if err != nil {
return totalBytesWritten, err
}
// write the remainder of the list
for p := cdr(a); !nilp(p); p = cdr(p) {
// write a space to separate expressions in the list.
bytesWritten, err = w.Write([]byte{' '})
totalBytesWritten += bytesWritten
if err != nil {
return totalBytesWritten, err
}
if p._type == AtomType_Pair {
// print the car of the list
bytesWritten, err = car(p).Write(w)
totalBytesWritten += bytesWritten
if err != nil {
return totalBytesWritten, err
}
} else {
// found an "improper list" (ends with a dotted pair).
// write dot then space to separate the dotted pair.
bytesWritten, err = w.Write([]byte{'.', ' '})
totalBytesWritten += bytesWritten
if err != nil {
return totalBytesWritten, err
}
// print the atom
bytesWritten, err = p.Write(w)
totalBytesWritten += bytesWritten
if err != nil {
return totalBytesWritten, err
}
// dotted pair ends a list, so quit the loop now
break
}
}
// write the closing paren
bytesWritten, err = w.Write([]byte{')'})
totalBytesWritten += bytesWritten
// and return
return totalBytesWritten, err
case AtomType_Symbol:
return w.Write(a.value.symbol.label)
}
panic(fmt.Sprintf("assert(_type != %d)", a._type))
}
By using recursion we can print arbitrarily complex data structures. (Actually that's not true: for a very deeply nested structure we will run out of stack space, and a self-referencing tree will never finish printing).
Because this is Go, we'll also implement the Stringer interface for atoms.
It's easy since we can let the Write method do the work for us.
// String implements the Stringer interface.
func (a Atom) String() string {
sb := &strings.Builder{}
if _, err := a.Write(sb); err != nil {
panic(err)
}
return sb.String()
}
See what print_expr does with various atoms:
| ID | Atom | Output |
|---|---|---|
| 1 | make_int(42)
| 42 |
| 2 | make_sym("FOO")
| FOO |
| 3 | cons(make_sym("X"), make_sym("Y"))
| (X . Y) |
| 4 | cons(make_int(1),
| (1 2 3) |
Quick note on that ID column: it matches the test case ID's in the chapter's test file.
All this is pretty trivial. We'll get on to some more interesting stuff in the next chapter.
Remember we said that we would treat identical symbols as being the same object? We can enforce that by keeping track of all the symbols created, and returning the same atom if the same sequence of characters is requested subsequently.
Languages with a set or hashtable container make this easy, but we can use the LISP data structures already implemented to store the symbols in a list:
// sym_table is a global symbol table.
// it is a list of all existing symbols.
var sym_table = _nil
// make_sym returns an Atom on the stack.
// The name of the symbol is always converted to uppercase.
// If the symbol already exists in the global symbol table, that symbol is
// returned. Otherwise, a new symbol is created on the stack, added to the
// symbol table, and returned. The new symbol allocates space for the name.
func make_sym(name []byte) Atom {
// make an upper-case copy of the name
name = bytes.ToUpper(name)
// search for any existing symbol with the same name
for p := sym_table; !nilp(p); p = cdr(p) {
if atom := car(p); bytes.Equal(name, atom.value.symbol.label) {
// found match, so return the existing symbol
return atom
}
}
// did not find a matching symbol, so create a new one
atom := Atom{
_type: AtomType_Symbol,
value: AtomValue{
symbol: &Symbol{
label: name,
},
},
}
// add it to the symbol_table
sym_table = cons(atom, sym_table)
// and return it
return atom
}
Neat, huh? It's not particularly efficient, but it will do fine for now.