A New Compiler for the Arnold Schwarzenegger Language
Since the release of version 0.4, there is now a basic ArnoldC compiler in the repository. This ArnoldC compiler is a test case for the compiler framework. If the framework can handle this language well, then it will be useful for other languages too. The full code for the compiler can be found here.
In this blog post, I will show how the compiler can be used to generate fast code for the CPU. To follow along, you can either clone the repository and run:
$ cargo install --path arnoldc
Or download the arnoldc
binary from the release page.
The ArnoldC language is based on one-liners from Arnold Schwarzenegger movies. This is what a valid "Hello, World!" program looks like:
IT'S SHOWTIME
TALK TO THE HAND "Hello, World!\n"
YOU HAVE BEEN TERMINATED
Here, IT'S SHOWTIME
means "begin main", TALK TO THE HAND
means "print", and YOU HAVE BEEN TERMINATED
means "end main".
Before we use the compiler, let's see whether the installation was successful:
$ arnoldc --help
This should print:
A compiler for the ArnoldC language
Usage: arnoldc [OPTIONS] [INPUT]
Arguments:
[INPUT] The input file (- is interpreted as stdin) [default: -]
Options:
--convert-scf-to-cf Convert structured control flow (scf) operations to cf
--convert-cf-to-llvm Convert control flow (cf) operations to LLVM
--convert-experimental-to-mlir Convert experimental operations to MLIR
--convert-func-to-llvm Convert function operations to LLVM
--convert-mlir-to-llvmir Convert MLIR to LLVM IR
--print-ir-before-all Print the IR before each pass
--convert-arnold-to-mlir Convert ArnoldC operations to MLIR
--compile Compile the code
--debug Print debug information
-h, --help Print help
-V, --version Print version
To compile ArnoldC, let's create a file called hello.arnoldc
with the hello world program:
IT'S SHOWTIME
TALK TO THE HAND "Hello, World!\n"
YOU HAVE BEEN TERMINATED
Next, let's see what the compiler generates when we run the --convert-arnold-to-mlir
pass:
$ arnoldc --convert-arnold-to-mlir hello.arnoldc
This prints:
module {
func.func @main() -> i32 {
experimental.printf("Hello, World!\0A")
%0 = arith.constant 0 : i32
return %0 : i32
}
}
What this shows is that the compiler has converted the ArnoldC code to MLIR.
It also added a 0 return value to the main
function.
This ensures that the program will return a 0 status code, which is the convention for programs that didn't crash.
Although this MLIR code looks nice (or at least more so than ArnoldC), let's get it to run. To do so, convert the MLIR code to LLVM IR by running all the required passes in order:
$ arnoldc \
--convert-arnold-to-mlir \
--convert-experimental-to-mlir \
--convert-scf-to-cf \
--convert-cf-to-llvm \
--convert-func-to-llvm \
--convert-mlir-to-llvmir \
hello.arnoldc
Think of each of these passes as a set of transformations.
For example, the --convert-arnold-to-mlir
pass transforms:
TALK TO THE HAND "Hello, World!\n"
to
experimental.printf("Hello, World!\0A")
The command with all passes applied prints the following LLVM IR:
; ModuleID = 'LLVMDialectModule'
source_filename = "LLVMDialectModule"
declare i32 @printf(ptr)
define i32 @main() {
%3 = alloca i8, i16 15, align 1
store [15 x i8] c"Hello, World!\0A\00", ptr %3, align 1
%4 = call i32 @printf(ptr %3)
ret i32 0
}
!llvm.module.flags = !{!0}
!0 = !{i32 2, !"Debug Info Version", i32 3}
Remembering these passes and in the order in which to run them is cumbersome, so it is also possible to use the compile
flag, which is a wrapper around the above command and produces the same result:
$ arnoldc --compile hello.arnoldc
To run our compiled code, we can use the LLVM interpreter via the lli
command.
lli
executes programs written in the LLVM bitcode format.
This executable is part of the LLVM project, so it can usually be installed via the package manager.
For example, on MacOS, brew install llvm
.
Enough talk, let's run the code!
$ arnoldc --compile hello.arnoldc | lli
Hello, World!
Or produce a native executable:
$ arnoldc --compile hello.arnoldc | llc -filetype=obj -o hello.o
$ clang hello.o -o hello
$ ./hello
Hello, World!
Although the compiler is still far from complete (see status for details), there is one more thing we can do. We can print a variable:
IT'S SHOWTIME
HEY CHRISTMAS TREE x
YOU SET US UP @NO PROBLEMO
TALK TO THE HAND "x: "
TALK TO THE HAND x
YOU HAVE BEEN TERMINATED
This should print:
x: 1
because HEY CHRISTMAS TREE x
is equivalent to what in Python would be x =
and @NO PROBLEMO
in ArnoldC is equivalent to the boolean True
.
Let's see what the compiler generates.
To get readable code, we do only the --convert-arnold-to-mlir
pass:
$ arnoldc --convert-arnold-to-mlir print.arnoldc
This prints:
module {
func.func @main() -> i32 {
%x = arith.constant 1 : i16
experimental.printf("x: ")
experimental.printf("%d", %x)
%0 = arith.constant 0 : i32
return %0 : i32
}
}
Which returns the expected value:
$ arnoldc --compile print.arnoldc | lli
x: 1
As a final example, let's see what happens if we write invalid code:
IT'S SHOWTIME
TALK "Hello, World!"
YOU HAVE BEEN TERMINATED
If we now run the compiler, it will fail with a clear error message:
$ arnoldc --compile invalid.arnoldc
thread 'main' panicked at arnoldc/src/main.rs:67:60:
called `Result::unwrap()` on an `Err` value:
---
0 | ITS SHOWTIME {
1 | TALK "Hello, World!\n"
^ Unknown operation: TALK
---
As is expected when building a compiler, the error message point to the exact location of the error with a description of the problem.
This concludes the walkthrough, or as Arnold would say:
YOU HAVE BEEN TERMINATED
Next Steps
To learn how to build your own compiler, see the files inside the arnoldc
directory.
It is split into three parts:
src/main.rs
defines the command line interface.src/arnold.rs
specifies how to parse the ArnoldC code (convert the text to data structures).src/arnold_to_mlir.rs
contains the--convert-arnold-to-mlir
pass, which converts the ArnoldC code to MLIR.
All other passes such as --convert-func-to-llvm
are implemented in the xrcf
crate.
If you want to contribute to the compiler framework, see contributing.
Although the compiler framework is not yet feature complete, if you want to build your own compiler, here are some modern compiler projects that could serve as inspiration:
- jax: A Python library for accelerator-oriented computing
- tvm: A end to end machine learning compiler framework for CPUs, GPUs, and accelerators.
- torch-mlir: Compiles PyTorch to MLIR.
- Flang: A LLVM-based Fortran compiler.
Or you could build a compiler for a different movie star. Or your favorite tensor processing unit. It's up to you.