The RLLVMCompile Package
RLLVMCompile on github
RLLVMCompile_0.2-0.tar.gz
The RLLVMCompile package is a functioning prototype of a customizable
and extensible compiler for simple R code and specific, but common, R
idioms. The compiler functions are written entirely in R and use the
Rllvm package to generate the native code.
The result is code that is competitive with compiled C/C++ code,
but generated directly from R implementations.
How it works
The compiler is intentionally quite simple at present.
It takes an R function or collection of expressions and
translates the R calls to corresponding LLVM instructions.
For example,
x = exp( ((x - mu)/sigma)^2 )
is recognized as an assignment call (=) with two elements: the left-hand
side and the right-hand side.
We compile the right-hand side by traversing this sub-expression.
We recognize this as a call to exp() with one argument.
This one argument is an expression which is a call to ^ with two inputs:
(x - mu)/sigma
and the literal value 2.
We compile the first of these expressions, eventually reaching the binary
operators / and -.
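The traversal described above can be sketched in plain R. This is only an illustration of walking the expression tree, not the package's own code: the helper name describe() is hypothetical, and it records each node where a compiler would emit LLVM instructions.

```r
# Hypothetical helper (not part of RLLVMCompile): list the nodes of an
# R expression in the order a recursive compiler would visit them.
describe = function(e, depth = 0) {
  indent = strrep("  ", depth)
  if (is.call(e)) {
    # e[[1]] is the function being called; the remaining elements are its arguments.
    args = as.list(e)[-1]
    c(paste0(indent, "call ", deparse(e[[1]])),
      unlist(lapply(args, describe, depth = depth + 1)))
  } else if (is.name(e)) {
    paste0(indent, "variable ", as.character(e))
  } else {
    paste0(indent, "literal ", format(e))
  }
}

# parse() keeps the top-level `=` as an assignment call.
e = parse(text = "x = exp( ((x - mu)/sigma)^2 )")[[1]]
writeLines(describe(e))
```

Note that R also represents each pair of parentheses as a call to `(`, so a compiler walking this tree has to see through those nodes as well.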
Performance
The compiler itself is written in R, and we have made no effort
yet to make compilation fast. The code it generates, however, is fast and
competitive with C code. Results depend greatly on the particular
problem and implementation in R or C.
However, we do see significant improvements in performance
for common problems, ranging from
- making interpreted R code essentially equivalent in performance to R internal functions
implemented in C,
- a factor of 200 for a 2-dimensional random walk implemented in R,
- a 50% speedup over heavily vectorized versions of the 2-D
random walk
- a factor of 600 for the naive, simple Fibonacci sequence implementation,
- a 20x speedup over reading data from a large file by changing
the nature of the computations, but doing them all in R and compiling
- a 4-6x speedup over native C code in R for computing distances
by avoiding redundant computations and memory consumption due to
the rigid C code in R.
See timings.
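For reference, the naive Fibonacci implementation mentioned above is typically written as the doubly recursive function below. This is a sketch of the kind of code being timed, not necessarily the exact function used for the reported numbers.

```r
# Naive, doubly recursive Fibonacci: the R interpreter pays the full cost
# of a function call and boxed arithmetic on every step.
fib = function(n) {
  if (n < 2L)
    n
  else
    fib(n - 1L) + fib(n - 2L)
}

# Time the interpreted version; a compiled version would be timed the same way.
system.time(fib(20L))
```

Code like this is dominated by call overhead rather than arithmetic, which is why compiling it can give such a large factor.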
Generating PTX Code
Recently (July 2013), I added a customized version of the compiler
that compiles a variation of R code that targets GPUs. We can compile
an R function as a GPU kernel. The R code can refer to CUDA terms
such as threadIdx, blockIdx, blockDim and gridDim. We access the x,
y, z components as if these were R lists, e.g. threadIdx$x and gridDim$x.
We rewrite these expressions to calls to access the corresponding registers.
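A rewriting step of this kind can be sketched in plain R. The helper rewriteCUDA() below is hypothetical and is not the package's implementation. The target names follow LLVM's NVPTX special-register intrinsics (for example llvm.nvvm.read.ptx.sreg.tid.x for threadIdx$x); whether the compiler emits calls under exactly these names is an assumption.

```r
# Assumed mapping from CUDA variable names to NVPTX special-register names.
sregs = c(threadIdx = "tid", blockIdx = "ctaid",
          blockDim = "ntid", gridDim = "nctaid")

# Hypothetical helper: replace expressions such as threadIdx$x with a
# zero-argument call that reads the corresponding register.
rewriteCUDA = function(e) {
  if (!is.call(e))
    return(e)
  if (identical(e[[1]], as.name("$")) && is.name(e[[2]]) &&
      as.character(e[[2]]) %in% names(sregs)) {
    reg = sprintf("llvm.nvvm.read.ptx.sreg.%s.%s",
                  sregs[[as.character(e[[2]])]], as.character(e[[3]]))
    return(as.call(list(as.name(reg))))
  }
  # Otherwise rewrite the function position and every argument recursively.
  as.call(lapply(as.list(e), rewriteCUDA))
}

idx = quote(blockIdx$x * blockDim$x + threadIdx$x)
print(rewriteCUDA(idx))
```

Expressions that use `$` on any other object are left untouched, so ordinary R list access in the same function is not affected.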
When we have created the LLVM IR code for a routine, we don't
compile it to native code, but use the LLVM PTX backend to generate PTX code.
We can then load this code via the RCUDA
package and then invoke it with the .gpu()
function.
This is an example of how we can easily customize the compiler
to handle non-standard R code in target-specific ways.
We plan to improve the compiler and also add processing
steps that analyze memory usage to avoid copies and attempt to
reuse memory. The CodeDepends package is
a useful aid in this, allowing us to calculate data flow in a sequence
of R expressions. We also plan to identify parallelism and map the
computations to GPUs or multiple CPUs.
Documentation
- Examples in tests
- (July 16, 2013) Compiling GPU kernels
Features and Issues
This does not handle all R code by any means.
It handles
- arithmetic (integer and floating point)
- logical operations (&&, ||, negation)
- element-wise integer index subsetting and assignment (adjusting to 0-based counting),
- if() statements,
- for() and while() loops,
- sapply() calls,
- recursive compilation of functions that are called by other
functions,
- using and calling native routines,
- working with some R objects in the compiled code.
It cannot handle calls to eval()
and will never be able to do this generally.
(It might be able to handle specific cases.)
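As an illustration of the subset listed above, the function below uses only scalar arithmetic, a for() loop, an if() statement and element-wise integer indexing. It is a sketch of the style of code the compiler targets; the function name is made up, and whether the current prototype accepts this exact function has not been checked here.

```r
# Count the elements of x that exceed a threshold.
# Assumes n >= 1 and that x has at least n elements.
countAbove = function(x, n, threshold) {
  total = 0L
  for (i in 1:n) {
    # x[i] uses R's 1-based indexing; the compiler would adjust to 0-based.
    if (x[i] > threshold)
      total = total + 1L
  }
  total
}

countAbove(c(1, 5, 3, 7), 4L, 2)
```

Passing the length n explicitly keeps every operation scalar, so no call back into R is needed inside the loop.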
Type Specification
The compiler needs the caller to provide type information for the
parameters and the return type. However, these can be determined dynamically
from sample R arguments of the kind the code will be called with.
We are also developing a type inference package.
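Determining types from sample arguments could look roughly like the sketch below. The helpers guessType() and guessTypes() are hypothetical, and the mapping to LLVM IR type names (i32, double, i1, with a trailing * for vectors) is an assumption made for illustration; it is not the package's actual interface.

```r
# Hypothetical helper: map one sample R value to an LLVM-style type name.
guessType = function(value) {
  base = switch(typeof(value),
                integer = "i32",
                double = "double",
                logical = "i1",   # assumed; R stores logicals as integers
                stop("no mapping for R type ", typeof(value)))
  # Treat a length-one value as a scalar and anything else as a pointer.
  if (length(value) == 1L) base else paste0(base, "*")
}

# Apply the mapping to a set of named sample arguments.
guessTypes = function(...) lapply(list(...), guessType)

str(guessTypes(x = c(1.5, 2.5), n = 2L, threshold = 2))
```

A scheme like this only sees the values it is given, so a sample argument of length one is indistinguishable from a genuine scalar parameter.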
License
This is distributed under the BSD License.
Duncan Temple Lang
<duncan@wald.ucdavis.edu>
Last modified: Thu Jul 18 17:06:11 PDT 2013