Most computer languages have a variety of syntactic conventions (keywords, infix operators, brackets, operator precedence, dot notation, semicolons, etc.), but as a member of the Lisp family of languages, all of Scheme's syntax is based on lists in parenthesized prefix notation. This may seem unfamiliar, but it has the virtues of simplicity and consistency. (Some have joked that "Lisp" stands for "Lots of Irritating Silly Parentheses"; I think it stand for "Lisp Is Syntactically Pure".) Consider:
Note that the exclamation mark is not a special character in Scheme; it is just part of the name "set!". Only parentheses are special. A list such as (set! x y) with a special keyword in the first position is called a special form in Scheme; the beauty of the language is that we only need six special forms, plus three other syntactic constructions—variables, constants, and procedure calls:
Java Scheme if (x.val() > 0) {
z = f(a * x.val() + b[i]);
}(if (> (val x) 0)
(set! z (f (+ (* a (val x))
(aref b i)))))
Form | Syntax | Semantics and Example |
---|---|---|
variable reference | var | A symbol is interpreted as a variable name;
its value is the variable's
value. Example: x |
constant literal | number | A number
evaluates to itself. Examples: 12 or -3.45e+6 |
quotation | (quote exp) | Return the exp literally; do not evaluate it. Example: (quote (a b c)) ⇒ (a b c) |
conditional | (if test conseq alt) | Evaluate test; if true,
evaluate and return conseq; otherwise evaluate and return
alt. Example: (if (< 10 20) (+ 1 1) (+ 3 3)) ⇒ 2 |
assignment | (set! var exp) | Evaluate exp and assign that value to
var, which must have been previously defined (with a
define or as a parameter to an enclosing procedure).
Example: (set! x2 (* x x)) |
definition | (define var exp) | Define a new variable and give it
the value of evaluating the expression exp.
Examples: (define r 3) or (define square (lambda (x) (* x x))). |
procedure | (lambda (var...) exp) | Create a procedure
with parameter(s) named var... and the expression as the body. Example: (lambda (r) (* 3.141592653 (* r r))) |
sequencing | (begin exp...) |
Evaluate each of the expressions in left-to-right order, and
return the final value.
Example: (begin (set! x 1) (set! x (+ x 1)) (* x 2)) ⇒ 4 |
procedure call | (proc exp...) | If proc is anything other than one of the symbols if, set!, define, lambda, begin, or quote then it is treated as a procedure. It is evaluated using the same rules defined here. All the expressions are evaluated as well, and then the procedure is called with the list of expressions as arguments. Example: (square 12) ⇒ 144 |
In this table, var must be a symbol--an identifier such as x or square--and number must be an integer or floating point number, while the other italicized words can be any expression. The notation exp... means zero or more repetitions of exp.
To learn more about Scheme consult some of the fine books (by Friedman and Fellesein, Dybvig, Queinnec, Harvey and Wright or Sussman and Abelson), videos (by Abelson and Sussman), tutorials (by Dorai, PLT, or Neller), or the reference manual.
>> program = "(begin (define r 3) (* 3.141592653 (* r r)))" >>> parse(program) ['begin', ['define', 'r', 3], ['*', 3.141592653, ['*', 'r', 'r']]] >>> eval(parse(program)) 28.274333877We're using here the simplest possible internal representation, one where Scheme lists, numbers, and symbols are represented as Python lists, numbers, and strings, respectively.
def eval(x, env=global_env): "Evaluate an expression in an environment." if isa(x, Symbol): # variable reference return env.find(x)[x] elif not isa(x, list): # constant literal return x elif x[0] == 'quote': # (quote exp) (_, exp) = x return exp elif x[0] == 'if': # (if test conseq alt) (_, test, conseq, alt) = x return eval((conseq if eval(test, env) else alt), env) elif x[0] == 'set!': # (set! var exp) (_, var, exp) = x env.find(var)[var] = eval(exp, env) elif x[0] == 'define': # (define var exp) (_, var, exp) = x env[var] = eval(exp, env) elif x[0] == 'lambda': # (lambda (var*) exp) (_, vars, exp) = x return lambda *args: eval(exp, Env(vars, args, env)) elif x[0] == 'begin': # (begin exp*) for exp in x[1:]: val = eval(exp, env) return val else: # (proc exp*) exps = [eval(exp, env) for exp in x] proc = exps.pop(0) return proc(*exps) isa = isinstance Symbol = str
That's all there is to eval! ... well, except for environments.
Let's look at an example of what happens when we define and then call a Scheme procedure. First the definition.
>> program = "(define area (lambda (r) (* 3.141592653 (* r r)))" >>> parse(program) ['define', 'area', ['lambda', ['r'], ['*', 3.141592653, ['*', 'r', 'r']]]] >>> eval(parse(program)) None
In this evaluation we take the branch if x[0] == 'define', under which Python sets var to 'area' and exp to ['lambda', ['r'], ['*', 3.141592653, ['*', 'r', 'r']]]. Python then modifies the env (which at this point is the global environment), adding a new variable, 'area', and setting it to the result of evaluating exp. To evaluate exp, we follow the elif x[0] == 'lambda' in eval, branch, assigning the three variables (_, vars, exp) to the corresponding elements of the list x (and signalling an error if x is not of length 3). We then create a new procedure that, when called, will evaluate the expression ['*', 3.141592653 ['*', 'r', 'r']], in the environment formed by binding the formal parameters of the procedure (in this case there is just one parameter, r) to the arguments supplied in the procedure call, and in addition using the current environment for any variables not in the parameter list (for example, the variable *). The value of this newly-minted procedure is then assigned as the value of area in global_env.
Now what happens when we evaluate (area 3)? Since area is not one of the special form symbols, this must be a procedure call (the last else: of eval), and the whole list of expressions is evaluated, one at a time. Evaluating area yields the procedure we just created; evaluating 3 yields 3. We then (according to the last line of eval) call the newly-created procedure with the argument list [3]. this means evaluating exp, which is ['*', 3.141592653 ['*', 'r', 'r']], in the environment where r is 3 and the outer environment is the global environment, and therefore * is the multiplication procedure.
Now we're ready to explain the details of the Env class:
class Env(dict): "An environment: a dict of {'var':val} pairs, with an outer Env." def __init__(self, parms=(), args=(), outer=None): self.update(zip(parms,args)) self.outer = outer def find(self, var): "Find the innermost Env where var appears." return self if var in self else self.outer.find(var)Note that Env is a subclass of dict, which means that the ordinary dictionary operations work on it. In addition there are two methods, the constructor __init__ and find to find the right environment for a variable. The key to understanding this class (and the reason we need a class at all, rather than just using dict) is the notion of an outer environment. Consider this program:
(define make-account
(define a1 (make-account 100.00)) (a1 -20.00) |
Each rectangular box represents an environment, and the color of the box matches the color of the variables that are newly defined in the environment. In the last two lines we define a1 and call (a1 -20.00); this represents the creation of a bank account with a 100 dollar opening balance, followed by a 20 dollar withdrawal. In the process of evaluationg (a1 -20.00), we will eval the expression highlighted in yellow. There are three variables in that expression. amt can be found immediately in the innermost (green) environment. But balance is not defined there: we have to look at the green environment's outer env, the blue one. And finally, the variable + is not found in either of those; we need to do one more outer step, to the global (red) environment. This process of looking first in inner environments and then in outer ones is called lexical scoping. Env.find finds the right environment according to lexical scoping rules.
All that is left is to define the global environment. It needs to have + and all the other Scheme built-in procedures. We won't bother to implement them all, but we'll get a bunch by importing Python's math module, and then we'll explicitly add 20 popular ones:
def add_globals(env): "Add some Scheme standard procedures to an environment." import math, operator as op env.update(vars(math)) # sin, sqrt, ... env.update( {'+':op.add, '-':op.sub, '*':op.mul, '/':op.div, 'not':op.not_, '>':op.gt, '<':op.lt, '>=':op.ge, '<=':op.le, '=':op.eq, 'equal?':op.eq, 'eq?':op.is_, 'length':len, 'cons':lambda x,y:[x]+y, 'car':lambda x:x[0],'cdr':lambda x:x[1:], 'append':op.add, 'list':lambda *x:list(x), 'list?': lambda x:isa(x,list), 'null?':lambda x:x==[], 'symbol?':lambda x: isa(x, Symbol)}) return env global_env = add_globals(Env())
>>> program = "(set! twox (* x 2))" >>> tokenize(program) ['(', 'set!', 'twox', '(', '*', 'x', '2', ')', ')'] >>> parse(program) ['set!', 'twox', ['*', 'x', 2]]There are many tools for lexical analysis (such as Mike Lesk and Eric Schmidt's lex), but we'll use a very simple tool: Python's str.split. We just add spaces around each paren, and then call str.split to get a list of tokens.
Now for syntactic analysis. We have seen that Lisp syntax is very simple, but some Lisp interpreters have made the job of syntactic analysis even easier by accepting as a program any string of characters that represents a list. In other words, the string (set! 1 2) would be accepted as a syntactically valid program, and only when executed would the interpreter complain that set! requires the first argument to be a symbol, not a number. In Java or Python, the equivalent statement, 1 = 2, would be recognized as an error at compile time. On the other hand, Java and Python are not required to detect at compile time that the expression x/0 is an error, so you see it is not always strictly determined when an error should be recognized. Lispy implements parse using read, a function we will define to read any expression (number, symbol, or nested list).
read works by calling read_from on the tokens obtained by tokenize. Given a list of tokens, we start by looking at the first token; if it is a ')' that's a syntax error. If it is a '(', then we start building up a list of expressions until we hit a matching ')'. Anything else must be a symbol or number, which forms a complete expression by itself. The only remaining trick is knowing that '2' represents an integer, 2.0 represents a float, and x represents a symbol. We'll let Python make this distinction: for each non-paren token, first try to interpret it as an int, then as a float, and finally as a symbol. Following these instructions we get:
def read(s): "Read a Scheme expression from a string." return read_from(tokenize(s)) parse = read def tokenize(s): "Convert a string into a list of tokens." return s.replace('(',' ( ').replace(')',' ) ').split() def read_from(tokens): "Read an expression from a sequence of tokens." if len(tokens) == 0: raise SyntaxError('unexpected EOF while reading') token = tokens.pop(0) if '(' == token: L = [] while tokens[0] != ')': L.append(read_from(tokens)) tokens.pop(0) # pop off ')' return L elif ')' == token: raise SyntaxError('unexpected )') else: return atom(token) def atom(token): "Numbers become numbers; every other token is a symbol." try: return int(token) except ValueError: try: return float(token) except ValueError: return Symbol(token)
Finally we'll add a function, to_string, to convert an expression back into a Lisp-readable string, and a function repl, which stands for read-eval-print-loop, to form an interactive Lisp interpreter:
def to_string(exp): "Convert a Python object back into a Lisp-readable string." return '('+' '.join(map(to_string, exp))+')' if isa(exp, list) else str(exp) def repl(prompt='lis.py> '): "A prompt-read-eval-print loop." while True: val = eval(parse(raw_input(prompt))) if val is not None: print to_string(val)Here it is at work:
>>> repl() lis.py> (define area (lambda (r) (* 3.141592653 (* r r)))) lis.py> (area 3) 28.274333877 lis.py> (define fact (lambda (n) (if (<= n 1) 1 (* n (fact (- n 1)))))) lis.py> (fact 10) 3628800 lis.py> (fact 100) 9332621544394415268169923885626670049071596826438162146859296389521759999322991 5608941463976156518286253697920827223758251185210916864000000000000000000000000 lis.py> (area (fact 10)) 4.1369087198e+13 lis.py> (define first car) lis.py> (define rest cdr) lis.py> (define count (lambda (item L) (if L (+ (equal? item (first L)) (count item (rest L))) 0))) lis.py> (count 0 (list 0 1 2 3 0 0)) 3 lis.py> (count (quote the) (quote (the more the merrier the bigger the better))) 4
bash$ grep "^\s*[^#\s]" lis.py | wc 90 398 3423
From there Tony and I split paths. He reasoned that the hard part was the interpreter for expressions; he needed Lisp for that, but he knew how to write a tiny C routine for echoing the non-Lisp characters and link it in to the Lisp program. I didn't know how to do that linking, but I reasoned that writing an interpreter for this trivial language (all it had was set variable, fetch variable, and string concatenate) was easy, so I wrote an interpreter in C. So, ironically, Tony wrote a Lisp program (with one small routine in C) because he was a C programmer, and I wrote a C program because I was a Lisp programmer.
In the end, we both got our theses done (Tony, Peter).
################ Lispy: Scheme Interpreter in Python ## (c) Peter Norvig, 2010; See http://norvig.com/lispy.html ################ Symbol, Env classes from __future__ import division Symbol = str class Env(dict): "An environment: a dict of {'var':val} pairs, with an outer Env." def __init__(self, parms=(), args=(), outer=None): self.update(zip(parms,args)) self.outer = outer def find(self, var): "Find the innermost Env where var appears." return self if var in self else self.outer.find(var) def add_globals(env): "Add some Scheme standard procedures to an environment." import math, operator as op env.update(vars(math)) # sin, sqrt, ... env.update( {'+':op.add, '-':op.sub, '*':op.mul, '/':op.div, 'not':op.not_, '>':op.gt, '<':op.lt, '>=':op.ge, '<=':op.le, '=':op.eq, 'equal?':op.eq, 'eq?':op.is_, 'length':len, 'cons':lambda x,y:[x]+y, 'car':lambda x:x[0],'cdr':lambda x:x[1:], 'append':op.add, 'list':lambda *x:list(x), 'list?': lambda x:isa(x,list), 'null?':lambda x:x==[], 'symbol?':lambda x: isa(x, Symbol)}) return env global_env = add_globals(Env()) isa = isinstance ################ eval def eval(x, env=global_env): "Evaluate an expression in an environment." if isa(x, Symbol): # variable reference return env.find(x)[x] elif not isa(x, list): # constant literal return x elif x[0] == 'quote': # (quote exp) (_, exp) = x return exp elif x[0] == 'if': # (if test conseq alt) (_, test, conseq, alt) = x return eval((conseq if eval(test, env) else alt), env) elif x[0] == 'set!': # (set! var exp) (_, var, exp) = x env.find(var)[var] = eval(exp, env) elif x[0] == 'define': # (define var exp) (_, var, exp) = x env[var] = eval(exp, env) elif x[0] == 'lambda': # (lambda (var*) exp) (_, vars, exp) = x return lambda *args: eval(exp, Env(vars, args, env)) elif x[0] == 'begin': # (begin exp*) for exp in x[1:]: val = eval(exp, env) return val else: # (proc exp*) exps = [eval(exp, env) for exp in x] proc = exps.pop(0) return proc(*exps) ################ parse, read, and user interaction def read(s): "Read a Scheme expression from a string." return read_from(tokenize(s)) parse = read def tokenize(s): "Convert a string into a list of tokens." return s.replace('(',' ( ').replace(')',' ) ').split() def read_from(tokens): "Read an expression from a sequence of tokens." if len(tokens) == 0: raise SyntaxError('unexpected EOF while reading') token = tokens.pop(0) if '(' == token: L = [] while tokens[0] != ')': L.append(read_from(tokens)) tokens.pop(0) # pop off ')' return L elif ')' == token: raise SyntaxError('unexpected )') else: return atom(token) def atom(token): "Numbers become numbers; every other token is a symbol." try: return int(token) except ValueError: try: return float(token) except ValueError: return Symbol(token) def to_string(exp): "Convert a Python object back into a Lisp-readable string." return '('+' '.join(map(to_string, exp))+')' if isa(exp, list) else str(exp) def repl(prompt='lis.py> '): "A prompt-read-eval-print loop." while True: val = eval(parse(raw_input(prompt))) if val is not None: print to_string(val)