The Colm Programming Language
COmputer Language Manipulation

What is Colm?

Colm is a programming language designed for the analysis and transformation of computer languages. Colm is influenced primarily by TXL. It is in the family of program transformation languages.

A transformation language has a type system based on formal languages. Rather than define classes or data structures, one defines grammars. A parser is constructed automatically from the grammar, and the parser is used for two purposes: to parse the input language, and to parse the structural patterns in the program that performs the analysis. In this setting, grammar-based parsing is critical because it guarantees that both the input and the structural patterns are parsed into trees from the same set of types, allowing comparison.
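
As an illustration of parsing input and patterns into trees of the same types, here is a minimal sketch in Python rather than Colm. The toy grammar (sums of numbers and names), the Node type, and the parse and match functions are all hypothetical and are not part of Colm. One parser builds trees for both the input text and the pattern text, so matching reduces to comparing trees node by node, with names in the pattern acting as variables.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    kind: str            # grammar type: "sum", "num" or "name"
    children: tuple = ()
    text: str = ""


def parse(src):
    """Parse `term ('+' term)*`; used for input and pattern text alike."""
    tokens = re.findall(r"\d+|[A-Za-z_]\w*|[+()]", src)
    pos = 0

    def term():
        nonlocal pos
        if pos >= len(tokens):
            raise SyntaxError("unexpected end of input")
        tok = tokens[pos]
        pos += 1
        if tok.isdigit():
            return Node("num", text=tok)
        if tok == "(":
            inner = expr()
            if pos >= len(tokens) or tokens[pos] != ")":
                raise SyntaxError("expected ')'")
            pos += 1
            return inner
        if tok in ("+", ")"):
            raise SyntaxError(f"unexpected {tok!r}")
        return Node("name", text=tok)

    def expr():
        nonlocal pos
        left = term()
        while pos < len(tokens) and tokens[pos] == "+":
            pos += 1
            left = Node("sum", (left, term()))  # left-associative
        return left

    tree = expr()
    if pos != len(tokens):
        raise SyntaxError("trailing input")
    return tree


def match(pattern, tree, bindings=None):
    """Return variable bindings if `tree` has the pattern's shape, else None."""
    bindings = {} if bindings is None else bindings
    if pattern.kind == "name":
        # A name in the pattern is a variable: bind it, or check a prior binding.
        bound = bindings.setdefault(pattern.text, tree)
        return bindings if bound == tree else None
    same_shape = (pattern.kind, pattern.text, len(pattern.children)) == (
        tree.kind, tree.text, len(tree.children))
    if not same_shape:
        return None
    for sub_pattern, sub_tree in zip(pattern.children, tree.children):
        if match(sub_pattern, sub_tree, bindings) is None:
            return None
    return bindings
```

For example, match(parse("x + 0"), parse("(1 + 2) + 0")) binds x to the tree for 1 + 2, while matching the same pattern against parse("1 + 2") returns None. The comparison is only possible because both sides came out of the same parser.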

Colm's main contribution lies in the parsing method. Colm's parsing engine is generalized, but it also allows for the construction of arbitrary global data structures that can be queried during parsing. In other generalized methods, constructing global data requires careful consideration because of the inherent concurrency in the parsing method. The task is tricky enough that it is often avoided altogether, and the problem is deferred to a post-parse disambiguation of the parse forest.

Using Colm, it is possible to get the correct parse tree on the first pass over the input, provided that some backtracking is acceptable. Colm eliminates the need to reason about concurrent updates to global data, and the need to acquire many possible parse trees only to throw away the incorrect ones.
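
The idea of querying global data during a backtracking parse can be sketched as follows. This is a toy recursive-descent parser in Python, not Colm's engine, and every name in it (Parser, attempt, undo_log, the three statement rules) is hypothetical. It assumes a C-like ambiguity where `T * x ;` is a pointer declaration if T is a known type name and a multiplication otherwise. Undoing global updates when an alternative fails is an assumption of this sketch, chosen to keep the global data consistent with the path actually taken.

```python
class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0
        self.types = set()    # global data, queried while parsing
        self.undo_log = []    # type names added, most recent last

    def attempt(self, rule):
        """Try one alternative; on failure restore position and global data."""
        saved_pos, saved_len = self.pos, len(self.undo_log)
        try:
            return rule()
        except SyntaxError:
            self.pos = saved_pos
            while len(self.undo_log) > saved_len:
                self.types.discard(self.undo_log.pop())
            return None

    def expect(self, want=None):
        """Consume the next token, optionally requiring a specific one."""
        if self.pos >= len(self.tokens):
            raise SyntaxError("unexpected end of input")
        tok = self.tokens[self.pos]
        if want is not None and tok != want:
            raise SyntaxError(f"expected {want!r}, got {tok!r}")
        self.pos += 1
        return tok

    def typedef(self):
        self.expect("typedef")
        name = self.expect()
        if name not in self.types:
            self.types.add(name)          # update global data mid-parse
            self.undo_log.append(name)    # remember it in case we backtrack
        self.expect(";")
        return ("typedef", name)

    def declaration(self):
        type_name = self.expect()
        if type_name not in self.types:   # query global data mid-parse
            raise SyntaxError(f"{type_name!r} is not a type")
        self.expect("*")
        var = self.expect()
        self.expect(";")
        return ("declare_pointer", type_name, var)

    def multiplication(self):
        left = self.expect()
        self.expect("*")
        right = self.expect()
        self.expect(";")
        return ("multiply", left, right)

    def statement(self):
        for rule in (self.typedef, self.declaration, self.multiplication):
            result = self.attempt(rule)
            if result is not None:
                return result
        raise SyntaxError(f"no statement matches at token {self.pos}")

    def program(self):
        statements = []
        while self.pos < len(self.tokens):
            statements.append(self.statement())
        return statements


def parse_program(src):
    return Parser(src.split()).program()
```

With this sketch, parse_program("typedef T ; T * x ;") yields a typedef followed by a pointer declaration, while parse_program("T * x ;") yields a multiplication. The same token sequence gets a single tree in one pass because the type table is consulted at the point of decision, and no forest of candidate parses is built.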


Feb 1, 2014: colm-0.12.0.tar.gz
   (literal tokens defined using backtick, bugfixes, much more: ChangeLog)

May 26, 2013: colm-0.11.tar.gz
   (ref and ptr require <> around the type, print semantics changed, more)

April 18, 2013: colm-0.10.tar.gz
   (self-hosted grammar, parse expressions return trees, more)

Feb 19, 2013: colm-0.9.tar.gz
   (namespace/context syntax, token consumption)

Dec 29, 2012: colm-0.8.tar.gz
   (syntax and VM changes)

Aug 4, 2012: colm-0.7.tar.gz
   (a new parse syntax consistent with the send syntax)

June 20, 2012: colm-0.6.tar.gz
   (finer control over ignore tokens)

May 19, 2012: colm-0.5.tar.gz
   (namespaces in C++ interface, shadow-tree parsing)

Jan 21, 2012: colm-0.4.tar.gz

Nov 20, 2011: colm-0.3.tar.gz

July 31, 2011: colm-0.2.tar.gz


Colm does not yet have any documentation. The best way to learn about it is by reading my Ph.D. thesis. Please keep in mind that this is not a manual. A proper manual will follow.

Thurston Ph.D. Thesis

Adrian Thurston is responsible for this. Last changed: Sep 7, 2014