The RLLVMCompile Package

RLLVMCompile on github

RLLVMCompile_0.2-0.tar.gz

The RLLVMCompile package is a functioning prototype of a customizable and extensible compiler for simple R code and specific, but common, R idioms. The compiler functions are written entirely in R and use the Rllvm package to generate the native code. The result is code that is competitive with compiled C/C++ code, but generated directly from R implementations.

How it works

The compiler is intentionally quite simple at present. It takes an R function or collection of expressions and translates the R calls to corresponding LLVM instructions. For example,

  x = exp( ((x - mu)/sigma)^2 )
is recognized as an assignment call (=) with two elements: the left-hand side and the right-hand side. We compile the right-hand side by traversing that sub-expression. We recognize it as a call to exp() with one argument. That argument is itself a call to ^ with two inputs: (x - mu)/sigma and the literal value 2. We compile the first of these expressions, eventually getting to the binary operators / and -.
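
This call structure is visible directly in R, so a quick way to see what the compiler traverses is to inspect the parsed expression with plain base R (independent of RLLVMCompile):

  # Parse the assignment above and walk its call tree, as the compiler does.
  e = parse(text = "x = exp(((x - mu)/sigma)^2)")[[1]]

  class(e)       # "=": an assignment call
  e[[2]]         # left-hand side: the symbol x
  rhs = e[[3]]   # right-hand side: exp(((x - mu)/sigma)^2)

  rhs[[1]]       # the function being called: exp
  pw = rhs[[2]]  # its single argument: the call ((x - mu)/sigma)^2
  pw[[1]]        # the ^ operator
  pw[[3]]        # the literal 2
  pw[[2]][[2]]   # (x - mu)/sigma, inside the explicit parentheses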

Performance

The compiler itself is written in R and we have made no effort yet to make it fast. The resulting code, however, is fast and competitive with C code. Results depend greatly on the particular problem and on the R or C implementations being compared. Nevertheless, we see significant performance improvements for common problems, including
  1. making interpreted R code essentially equivalent in performance to R internal functions implemented in C,
  2. a factor of 200 for a 2-dimensional random walk implemented in R,
  3. a 50% speedup over heavily vectorized versions of the same 2-D random walk,
  4. a factor of 600 for the naive, simple Fibonacci sequence implementation (sketched below),
  5. a 20x speedup when reading data from a large file by changing the nature of the computations, while still doing them all in R and compiling them, and
  6. a 4-6x speedup over the native C code in R for computing distances, by avoiding the redundant computations and memory consumption imposed by that rigid C code.
See timings.
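
For concreteness, the naive Fibonacci implementation referred to in item 4 is the textbook doubly recursive function written in plain R (the version in the package's tests may differ in detail):

  # Naive, doubly recursive Fibonacci: exponential in n when interpreted,
  # and the kind of scalar, recursive R function the compiler handles well.
  fib = function(n)
  {
      if (n < 2L)
          return(n)
      fib(n - 1L) + fib(n - 2L)
  }

  fib(20L)   # 6765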

Generating PTX Code

Recently (July 2013), I added a customized version of the compiler that compiles a variant of R code targeting GPUs. We can compile an R function as a GPU kernel. The R code can refer to CUDA terms such as threadIdx, blockIdx, blockDim and gridDim. We access the x, y and z components as if these were R lists, e.g. threadIdx$x and gridDim$x. The compiler rewrites these expressions into calls that read the corresponding registers.
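
To give a flavor of this dialect, a kernel might look like the following hypothetical function. It is illustrative only: the function and its indexing convention are mine, not taken from the package's GPU examples.

  # A hypothetical kernel written as an R function: each thread scales one
  # element of x.  blockIdx$x, blockDim$x and threadIdx$x are rewritten by
  # the compiler into reads of the corresponding PTX special registers.
  scaleKernel = function(x, n, factor)
  {
      i = blockIdx$x * blockDim$x + threadIdx$x
      if (i < n)
          x[i + 1L] = x[i + 1L] * factor
  }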

When we have created the LLVM IR code for a routine, we don't compile it to native code, but instead use LLVM's PTX backend to generate PTX code. We can then load this code via the RCUDA package and invoke it with the .gpu() function.

This is an example of how we can easily customize the compiler to handle non-standard R code in target-specific ways.

We plan to improve the compiler and also to add processing steps that analyze memory usage in order to avoid copies and to reuse memory where possible. The CodeDepends package is a useful aid here, allowing us to compute the data flow in a sequence of R expressions. We also plan to identify parallelism and map the computations to GPUs or multiple CPUs.
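
As a rough sketch of the kind of information CodeDepends provides (this assumes its getInputs() function accepts individual language objects, as in its documentation; it is not the compiler's actual analysis):

  library(CodeDepends)

  # A small sequence of R expressions whose data flow we want to know.
  exprs = parse(text = "
    y = scale(x)
    z = y * w
    ans = sum(z)
  ")

  # For each expression, getInputs() reports which variables are read and
  # which are defined; from this we can compute the data flow across the
  # sequence and decide where copies can be avoided or memory reused.
  lapply(as.list(exprs), getInputs)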

Documentation

  • Examples in tests
  • (July 16, 2013) Compiling GPU kernels
Features and Issues

This does not handle all R code by any means. For example, it cannot handle calls to eval() and will never be able to do so in general. (It might be able to handle specific cases.)

Type Specification

The compiler needs the caller to provide type information for the parameters and the return type. However, these can be determined dynamically from sample R arguments with which the code will be called. We are also developing a type inference package.
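
A minimal sketch of how that type information might be supplied, assuming the compileFunction() entry point and Rllvm's DoubleType constant (the exact argument names and details may differ; see the examples in the tests):

  library(RLLVMCompile)   # loads Rllvm as well

  # The R function to compile.
  square = function(x)
      x * x

  # Supply the return type and one type per parameter.
  fc = compileFunction(square, DoubleType, list(x = DoubleType))

The compiled routine can then be invoked from R, as the examples in the tests show.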

License

This is distributed under the BSD License.

Duncan Temple Lang <duncan@wald.ucdavis.edu>
Last modified: Thu Jul 18 17:06:11 PDT 2013