LTD: A Lisp To Dylan Conversion Tool
The program LTD is a Common Lisp program to help convert
Common Lisp programs to Dylan. It does as much as is reasonably
possible, aided by user-settable preferences, but there are often
constructs that cannot be translated directly to Dylan. These are
left alone, but are flagged by comments so that they can be changed by
hand.
Installing LTD
The installation process is as follows:
- Move to a directory where you want the code to be.
- Obtain the file ltd.tar .
- Unpack it with tar -xf ltd.tar; it contains Lisp source code, HTML
documentation, Dylan library code, and test files (Lisp files and the corresponding translated Dylan files).
- Start Common Lisp and evaluate (load "load.lisp") and (load-ltd :compile t)
- Optionally save out an image with LTD installed. In Lispworks this would be
(lw:save-image "filename")
Using LTD
First, either start Lisp and repeat step 4 from above, or start up the
saved image from step 5 above.
The main entry point in LTD is the
function ltd-files. It operates on a list of lisp files,
translating each one to a dylan file. A simple call
is (ltd-files "~user/dir/*.lisp"). This will convert each
".lisp" file, creating a corresponding ".dylan" file. A more complex
call is (ltd-files '("a.lisp" "b.lisp") :output
"dylan-dir/*.dylan"). This converts the two lisp files "a" and
"b", placing the results in the "dylan-dir" subdirectory.
To test your installation fully, run (test-ltd) to convert
all 39 files in the test directory. As a
simpler test example, run (ltd-files "test/TEST.lisp") and see the output
in the file "test/TEST.dylan".
Conversion Options
There are a number of options that the user can set to change the way
conversion is done. Each option has a name (e.g., :tab-stop),
s default value (e.g., 2), a type (e.g., integer),
and a description (e.g., "Number of spaces to indent for each block".
Each option must take on either a value of the specified type, or the constant :?, which means that the
Option values can be changed with the functions set-option or
new-options. Here are some examples:
;; Make each new block indent 4 spaces:
(set-option :tab-stop 4)
;; Change several options at once:
(new-options :tab-stop 4 :comments '/* :end-name nil)
;; Prompt the user for the correct way to convert each NIL:
(set-option :nil-as :?)
;; Prompt the user for every decision except tab stop
(new-options :? t :tab-stop 2)
The complete list of default options is in options.lisp.
Inside LTD, the Lisp to Dylan converter
LTD works by first reading a Lisp s-expression with ltd-read,
then converting it to an internal s-expression representation which represents
a Dylan parse tree with cvt-exp, and then printing the
internal representation with dpp-exp (DPP stands for Dylan
Pretty-Print). For example, calling ltd-read on the fragment
of source file
;; Compute sign of x
(cond ((< x 0) -1)
((> x 0) 1)
(t 0))
results in the expression
#S(com :comment " Compute sign of x"
:code (cond ((< x 0) -1)
((> x 0) 1)
(t 0)))
which gets translated by cvt-exp to the internal form
#S(com :comment " Compute sign of x"
:code (IF (< X 0) -1 (:ELSIF (> X 0) 1) (:ELSE 0)))
which then gets printed by dpp-exp as something like
// Compute sign of x
if (x < 0) -1; elsif (x > 0) 1; else 0; end
or perhaps as
/* Compute sign of x */
if (x < 0)
-1;
elsif (x > 0)
1;
else
0;
end
We cover the three main functions (ltd-read, cvt-exp and dpp-exp) in turn.
LTD-READ
The function ltd-read is just like read, except that
- The file positions of each non-atomic expression within the file is retained in
a table, *file-position-table*, for use
in printing error messages.
- The empty list () is read as the symbol |()|, while nil is read as nil.
- Comments in the source input are retained by wrapping a
#S(com ...) form around the following expression.
Note that both comments delimited by semicolon and by #|
... |# are supported. However, the strategy of attaching
comments to the following expression has a problem: sometimes there is
no following expression. Comments that appear just before a right
paren or a dot in a dotted list are silently ignored.
Note that ltd-read uses the readtable
ltd-readtable. If you want to convert a program that uses
its own readmacros in its source files, then you will have to make
sure that those readmacros are defined within *ltd-readtable*.
(We should probably provide an easy way to do this.)
CVT-EXP
After a form has been read, it is converted by cvt-exp as follows:
- Constant variables are converted according to Dylan conventions;
for example, pi becomes $pi, and *global* (if it is
defined as a constant) becomes $global. Note that LTD will only know
that user-defined variables are defined as constants if the source code is
loaded before it is converted to Dylan. For this and other
reasons, we recommend loading your source code before you convert it, but good
results can also be obtained without pre-loading the code.
- Symbols that violate Dylan lexical rules are converted. For
example, ~3 is a valid symbol in Lisp; we translate that
to %3 in Dylan.
- Other atoms are left unchanged.
- Lists that start with a symbol that has been
given a conversion function via ltd-fn are
converted by that function.
- Lists that start with a symbol that has a macro definition are macroexpanded,
then converted. (This is another reason why it can be a good idea to load the code first,
to get the macro definitions. However, you may prefer to write ltd-fn
specifications for the macros, or to leave them unexpanded.)
- Lists that start with a non-symbol are converted by converting each element of the list.
The bulk of the work is in defining conversion functions
via ltd-fn.
Calls to this macro are of the form (ltd-fn Lisp
Dylan). There are three formats that this macro accepts.
Examples are:
(ltd-fn car head)
(ltd-fn ecase #'cvt-ecase)
(ltd-fn (some f . x*) `(any? ,f . ,x*))
The first example says that the Lisp function car corresponds
to the Dylan function head. This means both that
#'car should translate to head, and that (car
x) should translate to (head x'), where
x' is the result of converting x.
The second example says that to translate an expression of the form
(ecase ...), call the Lisp function cvt-ecase with the
expression as its sole argument, and use the result as the translation.
The third form is more complex. It says the following:
- Given a call such as (some 'evenp (rest numbers)), match (via destructuring-bind)
f against the first argument, 'evenp and x* against the rest, ((rest numbers)).
- Convert the arguments to Dylan, based on their name.
For example, the variable name
f is always converted as a function, so it becomes even?.
The variable x* is converted to the list ((tail numbers)).
The exact conversion rules are below.
- Execute the right hand side (i.e. `(any? ,f . ,x*) in this case)
with the variables f and x* bound.
In addition, the variable exp is bound to
the whole expression. Return the result. In this case it would be (any? even? (tail numbers)), which is printed as the Dylan expressionany?(even? tail(numbers)).
- If the result is of the form (let ...), then convert it to
(begin (let ...)). This is because Dylan lets must appear at
the top level of a body, unlike Lisp lets. (The functioncvt-body
looks for and expunges unneeded begins. It was more succint and less
error-prone to make this happen automatically
for every call to ltd-fn, rather than
relying on the programmer to get each one right.)
- If the original call to the conversion function is of the form #'some,
then something quite different happens. In this case,
The automatic conversion of variable names on the left-hand side of
ltd-fn is tricky, and takes some getting used to, but it
makes for very compact code. The rules are:
- A name ending in *, such as x* in the example above, represents a list; it is expanded with cvt-exps.
- The name keys is left unconverted, except that if :test-not fn appears in the list, it is converted to :test (complement fn).
- The names f and pred are converted via cvt-fn.
That is 'car would convert to head, not #"car" if
it were bound to a variable named f.
- The names name, ignore, and asis are not
converted.
- The name body is converted by cvt-body, which handles
declarations followed by a list of expressions.
- The names type and class are converted by cvt-type.
That is, 'list is converted to <list>, not #"list"
when it is bound to a variable named type.
- Ther names stdin and stdout convert via cvt-exp,
except that they have default values *standard-input* and *standard-output*, respectively.
Dylan Parse Tree Language
The complete internal parse tree language is as follows:
exp :=
(BEGIN exp*)
(BLOCK name exp*)
(CASE branch*)
(DEFINE-CLASS name args class-slot)
(DEFINE-CONSTANT bindings exp)
(DEFINE-GENERIC name args)
(DEFINE-FUNCTION name args exp*)
(DEFINE-METHOD name args exp*)
(DEFINE-VARIABLE bindings exp)
(DEFINE-MODULE name exp*)
(FLUID-BIND bindings exp)
(FOR for-clause* exp*)
(IF test exp*)
(LET bindings init)
(METHOD args exp*)
(LET-HANDLER exp exp exp*)
(LOCAL exp*)
(QUOTE exp)
(SELECT exp branch*)
(UNLESS test exp*)
(UNTIL test exp*)
(WHILE test exp*)
(:CLEANUP exp*) Allowed only in BLOCK
(:ELSEIF test exp*) Allowed only in IF
(:ELSE exp*) Allowed only in IF
(:EXCEPTION exp*) Allowed only in BLOCK
(:FINALLY exp*) Allowed only in FOR
(:LOCAL-METHOD name args exp*) Allowed only in :LOCAL
(:CLAUSE [export|use] name*) Allowed only in DEFINE-MODULE
(:RETURN exp*) Allowed only as first exp in METHOD
E.g. (:RETURN y x)
test :=
exp
branch :=
(:BRANCH branch-lhs exp*)
branch-lhs :=
exp
(:LIST-BARE exp*) This is like :LIST, but doesn't print parens.
for-clause :=
(:FOR-CLAUSE exp*) E.g. (:for-clause until (done? thing))
(:LIST-BARE for-clause*)
args :=
(:LIST exp) E.g. (:LIST x (|::| y <number>))
bindings :=
exp E.g. x
(:LIST exp*) E.g. (:LIST x (|::| y <number>))
class-slot :=
(:SLOT name exp*)
(:SLOT-KEYWORD name exp*)
Note that this is quite distinct from the old prefix-Dylan syntax
defined in the 1992 Dylan manual. The rationale is that we wanted a
representation that would be easy to print, with only local knowledge of each
expression. So a representation like (cond (test val) ...) is a bad
one, becuase you have to know that the (test val) list is
treated differently than other lists; this requires non-local
knowledge. The representation (case (:branch test val) ...) is a
good one, because the :branch identifies the constituent.
Note that the choice of a good local representation means that the
printing routines never have to "parse" the representation. One
pleasant result of this is that the printing routines don't have to
worry about comments. That is, they never have to say "if the
following expression is a symbol, or a comment expression
wrapped around a symbol, then ...". There is some of this kind of logic
inside the conversion routines, but none in the printing routines.
Actually, the LTD program and its author may have been inconsistent in the use of
:LIST vs :ARGS; I should go back and staighten that out. The idea is
that a :LIST expression is printed (by dpp-list) as a comma
separated list, while a :ARGS expression is printed (by
dpp-args) the same, except that there is no comma between a
keyword and a following expression. :LIST-BARE and :ARGS-BARE are
analagous, except they do not print the parentheses around the
arguments.
To get a better understanding of this, try converting some Common Lisp
forms with cvt-exp and looking at the output.
Inside DPP, the Dylan pretty-printer
The actual printing is handled by DPP (Dylan Pretty-Printer), in the
file dpp.lisp. It has the main entry
point dpp-exp. The call (dpp-exp x :stream
s) prints the internal s-expression x on the
stream s. The function dpp-exp accepts all the keyword
arguments that write does.
There should be less need to fiddle with DPP than with LTD. Most of
the customization you will want to do (such as changing the number of
spaces to indent) can be done with the printing
options. However, if you really need more control over the
output, you can make changes to the dpp entries in the file
dpp.lisp. Here are four example
entries to give you an idea of how dpp works:
(dpp 'symbol dpp-symbol)
(dpp (if) dpp-conditional)
(dpp (:else) dpp-unindented)
(dpp (begin) (format s "~@<begin~W~:>" `(:body (end) ,@(rest x))))
The first example says that all expressions of type symbol will be
handled by calling the function dpp-symbol with two
arguments: the stream to print on and the symbol to print. In place
of 'symbol, you can use any (quoted) Common Lisp type specifier.
In the second example, (if) is used to indicate a list whose
first element is if. The example says that all such lists
are printed by a function named dpp-conditional. Again, the
arguments are the stream and the expression.
The third example is very similar; its purpose is to show the
convention that elements of the Dylan parse tree that represent parts
of statements start with keyword symbols (such as :else),
whereas those that represent complete statements (such as if)
are non-keyword symbols.
The fourth example says that all lists that start with the symbol begin
will be printed by calling the function
(lambda (s x) (format s "~@<begin~W~:>" `(:body (end) ,@(rest x))))
That is, the expression is called with s bound to the stream
and x bound to the expression to print. There are two things
worth discussing in this example.
First, the expression `(:body (end) ,@(rest x)) is the
internal parse tree representation for a body that should be followed by just an
"end" (rather than, say, "end method" or "end method name").
Second, the format string "~@<begin~W~:>" makes use of
pretty-printing format directives from Chapter 27 of Common Lisp
the Language: 2nd Edition. You will need to understand this
chapter before you can make changes to DPP.
Please observe the convention that printing blocks surround a whole
statement; the dpp-body function does not start a new
printing block. Sub-parts may be put in blocks.
To Do List
LTD has been tested on about 35,000 lines (1.1MB in 39 files) of Lisp
code (in the test directory) taken from an
archive of Software
from AI books and several other sources. By "tested", I mean that
LTD runs without crashing on two platforms (Lispworks and Lucid Common
Lisp, on a Sparc 20 running SunOS 4.1.4), and that a quick look at the
generated Dylan code reveals no completely obvious errors, other than
the functions that are known to be not implemented. It takes about 3
minutes to translate the code. The resulting Lisp is 42,000 lines and
1.5MB. The "code bloat" seems to generally run around 20% to 30% for
Lisp to Dylan, but note that the Dylan code does not include the body
of macros (since we don't know how to automatically translate them),
so it is actually somewhat worse.
The major problems (and potential risks) in using LTD appear to be:
- Unimplemented Common Lisp functions. DEFMACRO, READ, LOAD, SET-MACRO-CHARACTER
and EVAL seem to be the most important ones.
- Incomplete specification of some Common Lisp functions. We know that FORMAT
and LOOP are not completely specified, and some others may be too.
- Possible failure to translate sub-pieces. For "regular" functions, all the
arguments should be translated to Dylan, and I think this has been done properly.
However, the translation of special forms and macros need to be
pieced together bit by bit, and its possible we may have spliced in a raw
Lisp portion where we should have spliced in a translated Dylan portion.
This is hard to test
because trivial expressions like x or 1 are the same in Lisp and Dylan, so something
that seems to work on a simple case may in fact be buggy.
- Run-time errors due to comments and/or the |()| symbol showing up in unexpected
places. An effort was made to find these, but some more may be lurking.
You may be interested in seeing the counts of unimplemented functions over
the 1.1MB of Lisp test files:
> (ltd-test)
...
Counts of unimplmented functions:
221 DEFMACRO
51 READ
34 SET-MACRO-CHARACTER
22 LOAD
22 EVAL
13 REQUIRE
8 SHADOW
8 READ-FROM-STRING
6 GET-INTERNAL-RUN-TIME
5 DEFSETF
5 PROVIDE
5 GET-SETF-METHOD
5 MERGE-PATHNAMES
4 FBOUNDP
4 DIRECTORY
4 PATHNAME-NAME
4 COMPILE
3 TRACE
3 RASSOC
3 MAKE-STRING-OUTPUT-STREAM
3 FIND-PACKAGE
3 GET-OUTPUT-STREAM-STRING
2 READ-PRESERVING-WHITESPACE
2 MAKE-STRING-INPUT-STREAM
2 BOUNDP
2 COPY-READTABLE
2 COPY-TREE
2 INTEGER-LENGTH
2 SET-EXCLUSIVE-OR
2 Y-OR-N-P
2 SYMBOL-PACKAGE
1 PROBE-FILE
1 USE-PACKAGE
1 PRINC-TO-STRING
1 GET-MACRO-CHARACTER
1 PROGV
1 PATHNAME-TYPE
1 MAKE-RANDOM-STATE
1 COMPLEX
1 MAKE-SYNONYM-STREAM
1 MAKE-PACKAGE
1 READ-DELIMITED-LIST
1 IMPORT
1 UNINTERN
1 SET
1 MACROEXPAND
1 TREE-EQUAL
Below is a list of enhancements that should be made; please let Peter Norvig know of additional
problems you encounter or functionality you desire.
- Add support for the over 200 unimplemented Common Lisp functions, especially the ones listed above.
These are mostly little-used functions, but when you use the function,
its no longer little-used.
- Add an interface to the LispWorks editor and/or GNU/Xemacs to make it
easier to take the warnings and edit the resulting code.
- Make clear what library and module some of the CL functions come from.
Right now, we assume you will just somehow know which libraries to include.
- There should be a more straightforward way to load a system or set of files,
remember all the macro definitions, constant declarations and read macros, and
then call ltd-files on the files.
- The lexical environment should be tracked. If you define a local function
with flet that happens to have the same name as a global Lisp function,
you should get the local function when you call it from inside the body, not
the Dylan translation of the global function.
- We should handle name clashes that result from translation. E.g., we translate
car to head, but if the Lisp program defines a function named
head, that should be translated to something else.
- We still don't handle the two/one name space problem. That is,
(let ((list '())) (list list))
is legal Lisp, but
begin let list = #(); list(list) end
which is what we would translate to, would give an error in Dylan.
- We need to deal with packages (in-package, export, import) better. Currently,
each EXPORT or DEFPACKAGE translates to a define module, but we really should gather information from everyehwere and combine them into one big definition.
- Support for loop is incomplete. Jeremy would like all except package iteration. We're close to that, but not quite. In the 1.1MB of Lisp code tested,
there was only one LOOP clause not handled: use of destructuring-bind in an IN clause
(loop for (literal-list label) in literal-lists ...).
- Support for format is incomplete. Jeremy would especially
like ADS[{^*, and would ultimately like all except BOX$GTP&<. We convert to FORMATTER output, but that is not a great solution.
Done List
The following bug reports have been fixed:
- (coerce x 'list) goes to as(<list>, x), not as(x, <list>).
- new-options had an infinite loop.
- Problem with superclasses and options in defstruct fixed.
- Multiple comments with nothing to attach to cause problems. For example:
(
;
;
)
An even trickier case is
(; comment
#+feature x)
Fixing this required some additional read-table hacking.
- Macro must-be-call was incorrectly used before defined.
- A CLtL1-style `loop' form (no atoms, repeat forever) does not get a
`block(return)' generated for it.
- The following expression was translated into something with three block statements:
(dolist (x (rest choices) nil)
(let ((result (stock-random-objects x put-in)))
(when result (return result))))
- The `ecase' form wasn't handled right -- the values are not treated as
quoted, and a default error clause is generated which is not needed because
Dylan `select' defaults to an error anyway.
- The reader distinguishes NIL and (), which gets converted to |()|,
but the code doesn't always expect a |()| where a list can appear. Similarly,
the code doesn't always check for possible comments. Added functions
null/, first/, rest/, second/, which are the same as their slash-less counterparts,
except that they handle |()| and comments. Did a global replace, so these
functions are propobably called in places where they are not really needed.
- Added &allow-other-keys to new-options.
- Fixed a bug in WHEN.
- Fixed with-slots to not introduce a local variable when the argument is atomic.
- Fixed cvt-defclass to handle comments on slots.
- Fived cvt-if-not to handle #'remove-if-not as well as (remove-if-not ...).
- Added conversion functions for symbol-function and symbol-value.
- Added ACROSS, FOR ... ON, UP/DOWNFROM, UP/DOWNTO clause to loops.
- Fixed translation of (- x) and (/ x); special case for (- ).
- Fixed doc strings in defstructs.
- Reading #2A( ... ) gives an error when there is a comment in the ...,
so I turned off comment reading within.
- Macroexpanding sometimes gave errors because of |()| and comments.
Added a to-normal-lisp function to get rid of these. Somewhat problematic.
Credits for LTD
LTD was written by Peter
Norvig as a side project in Aug/Sep 1995 and is copyright by Harlequin. Much of the DPP
pretty-printing code is adapted from the Harlequin Dylan file infixify.dylan
in the directory ~dylan/tools, written by Jonathan Bachrach and Paul
Haahr, and copyright by Harlequin Ltd (although none of that code was
copied verbatim, since it is in Dylan, not Lisp). Much of the LTD conversion code is adapted from
Harlequin file lisp-to-dylan.lisp
in the directory ~swm/dylan, written by Scott McKay and copyright by
Harlequin. Some of that code was copied verbatim. The motivation for LTD
was that swm's code worked by
doing replacements in a editor buffer, and so was not portable. Also,
I felt the Common Lisp pretty-printing system could do a better job
of indenting, in most cases.
LTD requires Common Lisp with the pretty-printer that is
documented in Chapter 27 of Common Lisp: the Language 2nd
Edition, by Guy L. Steele Jr, Digital Press, 1990, and in even
greater detail in MIT AI Lab Memo MIT/AIM-1102a, July 1989, and in the article "Using
the New Common Lisp Pretty Printer", by
Richard C. Waters, Lisp Pointers, V, 2, 1992.