Why Ocaml?
Kevin Murphy, 3 December 2002.
In Fall 2002, I started a project which involved a good mix of string processing, simple statistics and some simple data structures like hash tables and trees. A preliminary prototype in matlab was very slow, so I wanted to look for a more suitable language to implement it in. My desiderata are listed below.
The language should
Have an interpreter for rapid prototyping, ease of debugging, and maximum fun.
Have a native code (not just byte code) compiler that produces fast code that can be run stand-alone or called from the interactive environment.
Have good support for vectors, multi-dimensional arrays, strings, hash tables, etc. in the standard library.
Have a free implementation.
Work under Linux and Windows (so I can transfer code easily between my desktop and my laptop).
This left (in my mind) Lisp or ML, both of which meet the above desiderata. (Lisp is a functional programming language with imperative features; ML is a strongly, statically typed version of Lisp.) Deciding between Lisp and ML is harder...
ML vs Lisp
Popularity/ familiarity. Lisp is more widely known/used than ML (especially in AI). There is a lot of code already written in Lisp.
Type checking. ML is statically type checked (unlike Lisp), which reduces errors and improves efficiency. Although Lisp allows one to declare types to improve efficiency, this is a bit ugly and not as powerful as ML's type system. In addition, ML has type inference, which means it is not necessary to explicitly declare types. (The CMU CL compiler also does some type inference.)
Compilers. For ML, there are two free compilers: Standard ML of New Jersey (SML/NJ) and Ocaml. The Ocaml compiler is somewhat more efficient than the SML/NJ compiler (see "Do you blow SML/NJ's socks off?"). In addition, Ocaml comes with some excellent libraries and support for objects, making it preferable to SML in my opinion.
For Lisp, there are several compilers. The Allegro Lisp compiler is expensive. The GNU Clisp compiler is free and portable, but has poor floating point performance. CMU Common Lisp is free and has good floating point performance, but only has a Unix port.
Speed. According to The great computer language shootout (see also the newer Computer language shootout benchmarks), Ocaml is the second fastest language - slower than C, but faster than C++. No matter how I changed the weights reflecting the relative importance of speed, memory usage, lines of code, mathematical vs string processing, etc., Ocaml always came out in the top 3. I was skeptical, but the same results hold in the Win32 version of the shootout, implemented independently.
Syntax. I have not yet gotten used to Lisp syntax (it is said that Lisp stands for "lost in superfluous parentheses"). On the other hand, Ocaml also has a few quirks, e.g., one must remember to write +. for real addition and + for integer addition. However, this seems quite natural. More importantly, people claim Lisp's macros can be used to define fancy syntactic sugar. Ocaml also has a preprocessor, but I haven't learned how to use it yet.
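To make the type inference and operator points concrete, here is a minimal sketch. The function names (mean, sum_ints) are made up for illustration. No types are declared anywhere, yet the compiler infers them from the operators used.

```ocaml
(* No type annotations: because of +. and /., the compiler infers
   mean : float list -> float *)
let mean xs =
  let total = List.fold_left (+.) 0.0 xs in
  total /. float_of_int (List.length xs)

(* The undotted operator makes this int list -> int. *)
let sum_ints xs = List.fold_left (+) 0 xs

let () =
  Printf.printf "sum  = %d\n" (sum_ints [1; 2; 3]);
  Printf.printf "mean = %.2f\n" (mean [1.0; 2.0; 4.5])

(* Mixing the two kinds of number is rejected at compile time:
     1 + 2.0      (type error: + expects two ints)
   An explicit conversion is required instead:
     float_of_int 1 +. 2.0 *)
```

The price of the separate operators is a little extra typing; the benefit is that the compiler never has to guess whether an expression is integer or floating point.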
Speed of OCaml
The benchmarks above suggest that the Ocaml compiler generates the second fastest code of any of the currently available compilers (gcc and the Intel C compilers being first). Given that Ocaml is also a beautiful language to program in, this is pretty compelling. But maybe the benchmarks are unreliable? See, e.g., Ocaml is only fast if used imperatively, Slashdot 14 March 2005. This is possible. However, I found several other favorable reports on Ocaml's performance, e.g., this example, which implements the Sieve of Eratosthenes for computing primes in Ocaml and C. The Ocaml code is faster, even though the C code is well-written.
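For readers who have not seen it, the Sieve of Eratosthenes can be sketched in Ocaml roughly as follows. This is not the code from the linked example, just a minimal illustration written in the imperative, array-based style that such benchmarks tend to use; the name primes_up_to is my own.

```ocaml
(* Sieve of Eratosthenes: return all primes <= n in increasing order. *)
let primes_up_to n =
  if n < 2 then []
  else begin
    let is_prime = Array.make (n + 1) true in
    is_prime.(0) <- false;
    is_prime.(1) <- false;
    let i = ref 2 in
    while !i * !i <= n do
      if is_prime.(!i) then begin
        (* Cross out the multiples of i, starting at i * i. *)
        let j = ref (!i * !i) in
        while !j <= n do
          is_prime.(!j) <- false;
          j := !j + !i
        done
      end;
      incr i
    done;
    (* Scan downward so that consing yields an ascending list. *)
    let acc = ref [] in
    for k = n downto 2 do
      if is_prime.(k) then acc := k :: !acc
    done;
    !acc
  end

let () =
  List.iter (Printf.printf "%d ") (primes_up_to 30);
  print_newline ()
```

The inner loops work on a mutable array of booleans, which is the kind of code where a native code compiler has little overhead to add relative to C.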
In addition, I found this quote from Doug McClain, on a detailed comparative study of C++, IDL, Fortran, SML, Ocaml, Dylan, Erlang, Clean, Haskell, Lisp, Mathematica for scientific computing: "And most importantly, the CAML version works, and it works properly every time. I am assured, having monitored its runtime behavior that there are no memory leaks. Furthermore, the quality of code generated by the CAML compiler has been analyzed by the Intel VTune system and it show no pipeline stalls, maximum parallelism between integer and floating point units, and machine assembly code that is as good or better than can be achieved by hand coding."
So I did my own experiment. It involved a lot of simple floating point arithmetic, plus some string matching. I found the following speedups relative to interpreted Matlab 6.1: Ocaml native code compiler: 10 times faster; Ocaml bytecode compiler: 2 times faster; Matlab mcc compiler: 1.4 times faster. (The Matlab code has 670 lines of code; the equivalent Ocaml code has 989 lines.)
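A comparison of this kind can be set up with a small timing harness. The sketch below is not the code from my experiment. The workload (sum_squares) and the helper (time) are hypothetical stand-ins, chosen only to show how one might measure CPU time for a floating point loop.

```ocaml
(* Run f x and return its result together with the CPU seconds used. *)
let time f x =
  let t0 = Sys.time () in
  let result = f x in
  (result, Sys.time () -. t0)

(* Hypothetical workload: sum of squares of 0 .. n-1 in floating point. *)
let sum_squares n =
  let acc = ref 0.0 in
  for i = 0 to n - 1 do
    let x = float_of_int i in
    acc := !acc +. x *. x
  done;
  !acc

let () =
  let (r, secs) = time sum_squares 1_000_000 in
  Printf.printf "result = %g, cpu time = %.3fs\n" r secs
```

The same source file can be built with ocamlc (bytecode) and with ocamlopt (native code), so the two compilers can be compared without changing the program.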
Ocaml links
Ocaml wiki
Cocan ocaml wiki
Doug Bagley's editorial on Ocaml.
Ocaml vs Ruby
Lisp vs Ocaml vs C++. In particular, Ocaml produces much smaller binaries than lisp.
PsiLab, Ocaml package for numerical computation.
LACAML, Ocaml interface to LAPACK, BLAS, etc.
Ocaml interface to GSL, the Gnu Scientific library.
Ocaml interface to FFTW, the fastest FFT in the West (a C library for DFTs, which was generated using Ocaml).
NML, a vectorized Ocaml-like language for numerical computing. (See also the NML announcement.) No longer updated.
OcamlMex, lets you call Ocaml code from Matlab.
Image processing in ocaml
Tries and other finite-state language processing tools.
Tries, bit vectors, heaps, etc.
Agrep, regular expression matching with errors.
OcamlDot, interface to Dot graph layout package.
Humps, long list of free Ocaml software.
Making Ocaml run fast: advice from the creators of the language.
Caml Weekly news, a good archive of edited emails
Summary of discussion on operator overloading, with emphasis on matrix operations
F#, Microsoft's way of combining Ocaml with C#.
Ocaml coding style guidelines, from a caltech class
IBAL, Pfeffer's stochastic/ decision theoretic agent language, implemented in Ocaml
Ocaml manual
Ocaml homepage
Maxent (logistic regression) code
Comparison of other languages
Click here for my comparison.
Going back to Matlab...
After making prototypes of my statistical language project in Matlab and Ocaml, I found that the Ocaml version was about 10 times faster than the Matlab one. However, since January 2003, I have gone back to Matlab because I have become more involved in computer vision projects, for which Matlab is ideal (although Frank Dellaert uses Ocaml for vision), and because my collaborator uses Matlab.