Lisp as an Alternative to Java

In the October 1999 Communications of the ACM Lutz Prechelt had an interesting article entitled Comparing Java vs. C/C++ Efficiency Issues to Interpersonal Issues which asked 38 programmers to implement versions of a program in C, C++, or Java. The conclusions showed that Java was 3 or 4 times slower than C or C++, but that the variance between programmers was larger than the variance between languages, suggesting that one might want to spend more time on training programmers rather than arguing over language choice. (Or, suggesting that you should hire the good programmers and avoid the bad ones.) The variance for Java was lower than for C or C++. (Cynics could say that Java forces you to write uniformly slow programs.) I applaud this line of work, and hope that more studies of this kind will be done.

It turns out my hopes were answered. First, Prechelt published another article that covers Tcl, Python, Perl, and Rexx. Also, Ron Garret (nee Erann Gat) did a follow-up study in which he asked programmers to write Prechelt's test program in Lisp. His results show that the resulting Lisp programs ran faster on average than C, C++ or Java programs (although the fastest Lisp program was not as fast as the fastest C program), and that Lisp programs took less development time than the other languages.

I did not participate in the study, but after I saw it, I wrote my version in Lisp. It took me about 2 hours (compared to a range of 2 to 8.5 hours for the other Lisp programmers in the study, 3 to 25 for C/C++ and 4 to 63 for Java) and I ended up with 45 non-comment non-blank lines (compared with a range of 51 to 182 for Lisp, and 107 to 614 for the other languages). (That means that some Java programmer was spending 13 lines and 84 minutes to provide the functionality of each line of my Lisp program.)

Here is my program:

;; Peter Norvig - Programming Challange from Erann Gat:
;; Given a list of words and a list of phone numbers, find all the ways that
;; each phone number can be expressed as a list of words.
;; Run: (main "word-list-file-name" "phone-number-file-name")

(defvar *dict* nil 
  "A hash table mapping a phone number (integer) to a list of words from the 
  input dictionary that produce that number.")

(defun main (&optional (dict "dict") (nums "nums") (dict-size 100))
  "Read the input file ĘDICT and load it into *dict*.  Then for each line in 
  NUMS, print all the translations of the number into a sequence of words,
  according to the rules of translation."
  (setf *dict* (load-dictionary dict dict-size))
  (with-open-file (in nums)
    (loop for num = (read-line in nil) while num do
          (print-translations num (remove-if-not #'digit-char-p num)))))

(defun print-translations (num digits &optional (start 0) (words nil))
  "Print each possible translation of NUM into a string of words.  DIGITS
  must be WORD with non-digits removed.  On recursive calls, START is the
  position in DIGITS at which to look for the next word, and WORDS is the list
  of words found for (subseq DIGITS 0 START).  So if START gets to the end of
  DIGITS, then we have a solution in WORDS.  Otherwise, for every prefix of
  DIGITS, look in the dictionary for word(s) that map to the value of the
  prefix (computed incrementally as N), and for each such word try to extend
  the solution with a recursive call.  There are two complications: (1) the
  rules say that in addition to dictionary words, you can use a single 
  digit in the output, but not two digits in a row. Also (and this seems 
  silly) you can't have a digit in a place where any word could appear.
  I handle this with the variable FOUND-WORD; if it is false after the loop, 
  and the most recent word is not a digit, try a recursive call that pushes a 
  digit. (2) The other complication is that the obvious way of mapping 
  strings to integers would map R to 2 and ER to 02, which of course is 
  the same integer as 2.  Therefore we prepend a 1 to every number, and R 
  becomes 12 and ER becomes 102."
  (if (>= start (length digits))
      (format t "~a:~{ ~a~}~%" num (reverse words))
      (let ((found-word nil)
            (n 1)) ; leading zero problem
        (loop for i from start below (length digits) do
              (setf n (+ (* 10 n) (nth-digit digits i)))
              (loop for word in (gethash n *dict*) do
                 (setf found-word t)
                 (print-translations num digits (+ 1 i) (cons word words))))
        (when (and (not found-word) (not (numberp (first words))))
           (print-translations num digits (+ start 1) 
                               (cons (nth-digit digits start) words))))))

(defun load-dictionary (file size)
  "Create a hashtable from the file of words (one per line).  Takes a hint
  for the initial hashtable size.  Each key is the phone number for a word;
  each value is a list of words with that phone number."
  (let ((table (make-hash-table :test #'eql :size size)))
    (with-open-file (in file)
      (loop for word = (read-line in nil) while word do
        (push word (gethash (word->number word) table))))

(defun word->number (word)
  "Translate a word (string) into a phone number, according to the rules."
  (let ((n 1)) ; leading zero problem
    (loop for i from 0 below (length word) 
          for ch = (char word i) do
          (when (alpha-char-p ch) (setf n (+ (* 10 n) (char->digit ch)))))

(defun nth-digit (digits i) 
  "The i-th element of a character string of digits, as an integer 0 to 9."
  (- (char-code (char digits i)) #.(char-code #\0)))

(defun char->digit (ch)
  "Convert a character to a digit according to the phone number rules."
  (ecase (char-downcase ch)
    ((#\e) 0)
    ((#\j #\n #\q) 1)
    ((#\r #\w #\x) 2)
    ((#\d #\s #\y) 3)
    ((#\f #\t) 4)
    ((#\a #\m) 5)
    ((#\c #\i #\v) 6)
    ((#\b #\k #\u) 7)
    ((#\l #\o #\p) 8)
    ((#\g #\h #\z) 9)))

Two notes: (1) At first, I thought I had beaten the other Lisp programmers (and everyone else) by finishing in 1.5 hours, but then I realized that I had cheated: the directions said to code as you would professionally, and that means I should comment my work. It took 30 minutes more to put in the comments and clean up a little. (2) I was lucky in that when I was almost done, I realized that I had a bug: I couldn't distinguish between the number 123 and 0123. I was about to change numbers to strings (that is, 123 to "123" and 0123 to "0123"), which would have taken 15 minutes or so, when I realized that I could fix the bug with a one-character change: change the initialization of n from 0 to 1, so that 123 is represented as 1123 and 0123 as 10123. With that fix, everything seemed to work fine.


This article is available in Czech, thanks to Daniela Milton.

Peter Norvig