An automated mechanism for invoking C routines from R

The Rffi, RGCCTranslationUnit and Rllvm packages

Duncan Temple Lang

University of California at Davis

Department of Statistics


Abstract

We describe an approach to calling arbitrary C routines from R. This is an alternative mechanism from writing wrapper routines via the .C or .Call interfaces. This approach involves dynamically describing the signature of the routine - the types of inputs and its outputs - of a routine. We then use this description without any supporting C code to invoke the existing routine. This approach can handle scalar data types as well as complex structures. We also illustrate a mechanism for automating this dynamic interface by programatically reading the existing C code and generating the signatures and data structures descriptions. We also suggest an approach to making R more efficient using compiled code.

High-level interpreted languages such as R, MATLAB and Python, typically provide a mechanism to invoke compiled, native code. One can pass values from those languages across the interface to C/C++ and FORTRAN routines. The values are marshaled to the corresponding representation in C/C++ or FORTRAN and control is passed to the native routine. The results are then marshaled back to an R representation. In R, this foreign function interface (FFI) is available via the .C() , .Call() , .FORTRAN() and .External() functions. The typical sequence of steps to interfacing to a routine is

  1. identify the routine and its signature - types of its input parameters and its return type

  2. determine whether to invoke the routine via the .C() or .Call() interface

  3. write a wrapper routine that conforms to the specific types allowed by the chosen interface (.C or .Call)

  4. compile the new wrapper routine, and link as a shared library with the compiled target routine

  5. dynamically load the DSO containing the wrapper routine

  6. invoke the routine, coercing the R values to the relevant R data types to conform with the expected types of the wrapper routine

  7. debug using a native debugger such as gdb

This is often a time consuming task and some challenging for those less familiar with writing this type of code. One is working in two languages simultaneously and good design decisions are not initially obvious.

In this paper, we present an alternative mechanism for invoking native C routines from within R. It removes the need to determine the appropriate R interface (.C or .Call) (step 2), write a wrapper routine in C, compile and link and load that compiled wrapper routine (steps 3, 4, and 5). And because the user doesn't have to write C code, there is no new native code to debug (step 7). So we have reduced many of the steps. We have also removed the need for the user to have tools such as a compiler installed.

The key to the mechanism we describe here is libffi. This is a compiled library that provides data structures and routines via the C language which enable us to invoke routines by simply describing the types of the inputs and output. libffi takes care of making the call and marshaling the raw inputs to and from the routine being called. It does this in a portable manner across different operating systems, allowing us to have a single piece of code for R that works on all systems. The Rffi package provides an interface to libffi and supporting functionality to convert between R and C data types in a general manner.

This libffi-based mechanism for interfacing to native routines and data structure is simpler in many ways than the regular .C() or .Call() interfaces. Importantly, it allows us to invoke arbitrary routines directly without having to write a wrapper routine. However, it does require understanding a new model. Furthermore, the descriptions must be available and instantiated at run-time. This potentially makes the mechanism slightly slower than the .C() or .Call() interfaces. This is insignificant

[Note]Note

Measure

and so Rffi potential offers benefits in both flexibility and simplicity

The paper is divided into four sections. First we describe the Rffi package and how we can manually create descriptions of routines and data structures in C and use these to invoke routines from R without any compilation of code (after the Rffi package is installed). In the second part of the paper, we describe how we can use a tool such as RCIndex or RGCCTranslationUnit to programmatically generate the routine and data structure descriptions by automating the reading of the C code. In the third part of the paper, we show how we can pass function pointers in C as arguments in calls from R and also how we can use R functions as implementations of (This part I might not bother to do.) We end the paper with a brief discussion about different enhancements that others might want to pursue.

Need for support routines at run-time, e.g. to convert arrays in C to R vectors when the length of the array is not known when R is converting the result in Rffi, but just sees a pointer.

The Basics of Rffi

In this section of the paper we present the basic facilities provided by the Rffi package for directly and dynamically interfacing to existing compiled routines. This approach allows us to describe the details of a routine and the layout or definition of complex data structures within R. We then use these generic descriptions to invoke specific functions corresponding to that interface or that particular definition. The essential ideas are that we create a description of a routine to specify the different parameter and return types of the routine. This then allows us to invoke that native routine from R, passing R objects as the arguments to the routine and obtaining both the returned value and any mutable inputs that might have been changed. The types of the inputs and outputs of a routine can be simple primitive C-level data types such as int, double, short, long, char * (for character strings) and so on. We can also define complex data structures (structs) by defining the collection of field names and types. The Rffi package also handles marshaling such struct types to and from R. In this section we illustrate how to invoke different native routines from R.

We start with a very simple sample routine that passes an integer value to a C routine and returns an integer. The routine intCall in the Rffi package provides an example. In order to be able to invoke such a routine, we need to describe a generic routine with this signature - inputs are a single integer and outputs are a double. A signature is the both the collection of input types - the parameters - and the return type. We create a call interface object - a CIF - to describe the class of routines with the same signature. The R function CIF() allows us to define such an interface that we can use for any C routine with this particular signature. For our routine that accepts an integer and returns a C double we create the CIF with the R code

cif = CIF(doubleType, list(sint32Type))

The first argument is the return type. If the routine has input parameters, we specify their types via a list. The list can have names to give symbolic names to the parameters, but these are not used at present. There are 15 built-in data type identifiers available as R variables in the Rffi package. These are described in table ???

Table 1. Standard data type identifiers

R variableC data type
doubleTypedouble
floatTypefloat
longdoubleTypelong double
pointerTypepointer to any type of object/value
sint16Typesigned 16-bit integer
sint32Typesigned 32-bit integer, corresponding to int
sint64Typesigned 64-bit integer
sint8Typesigned 8-bit integer
stringTypea char *, a pointer to a string. This is introduced in R to identify strings rather than generic pointers.
structTypea struct description containing information about the types of the fields
uint16Typeunsigned 16-bit integer
uint32Typeunsigned 32-bit integer
uint64Typeunsigned 64-bit integer
uint8Typeunsigned 8-bit integer
voidTypethe void type



Programmatically Obtaining & Using Interface Descriptions

Using Function Pointers

Consider numerical optimization. Suppose we have C code that takes a starting value/vector of the parameters and a function that is to be optimized and its derivative/gradient. The C routine takes care of the internal details of the optimization algorithm such as step halving, etc. As it tries to find the optimal value of the parameters, it invokes the function and gradient. We'll ignore the gradient for the moment. The C routine we want to invoke is declared as

int
optimize(double start, double (*fun)(double val, void *userData), void *userData);


We can create a CIF for this routine with

cif = CIF(doubleType, list(doubleType, pointerType, pointerType), rep(FALSE, 3))

Now we need to provide the routine that gives the value of the function to be optimized. If we have loaded the DSO with the routine that we want to pass to optimize, we can access it with getNativeSymbolInfo() , e.g.

myFunPtr = getNativeSymbolInfo("myFun")$address

This is an externalptr object that contains the memory address of the routine. It is a pointer to a C routine. We can pass this to optimize now using

opt = callCIF(cif, "optimize", 2.3, myFunPtr)

This the returns the value that optimizes this function, based on the algorithm in optimize.

R functions as pointers to C routines

What if we want to implement the function to be optimized as R code. This is certainly more convenient to our work flow as we do not have to step away from R and write the code in C, compile and debug it and then load it. We might have our likelihood function as something like

lik = function(theta) 
        sum(log(dexp(data, theta)))


or using a closure to avoid the global variable data

likGenerator = 
 function(data)
    function(theta) 
        sum(log(dexp(data, theta)))


and the

myLik = likGenerator(myData)

We want to pass this as the

userData


argument to optimize and arrange for the C routine we pass as the C routine pointer argument -

fun


- to take the candidate value and the user data (the R function) and to turn this into a call to the R function which will evaluate the likelihood. To do this, we need to create this C routine to call our function. If we were to write it by hand, it might look something like the following:

double
R_myFun(double val, void *data)
{
    SEXP call, ans;
    PROTECT( call = allocVector(LANGSXP, 2));
    SETCAR(call, (SEXP) data);
    SETCAR(CDR(call), ScalarReal(val));
    ans = Rf_eval(call, R_GlobalEnv);
    UNPROTECT(1);
    return(asReal(ans));
}


This creates a call to a function and inserts the function and the candidate value as R objects. Then it evaluates the call and converts the result to a numeric scalar and returns the value.

We can use this in our call to optimize as

fun.ptr =  getNativeSymbolInfo("R_myFun")$address
callCIF(cif, "optimize", 2.3, fun.ptr, myLik)


The optimize routine has no knowledge that R was called to evaluate the function. Instead it has merely invoked our routine R_myFun.

e

Note that the routine R_myFun was created to have the same signature as the function pointer expected by optimize. Furthermore, the body of the routine would be different and more involved if there were additional parameters of different types in the routine that in turn had to be passed to the R function. So what we would like is a mechanism to automate the create of such a wrapper routine. We can do this with a description of the signature and data types in the function pointer. But we cannot arrange to have the routine called unless we compile and load it.

We can generate the code in R from the CIF types or from the translation unit description of the function pointer. After we do this and write it to a file, we then compile and load the code and continue as if we had written the code manually, as above. So let's now focus on how we generate this routine using functionality in RGCCTranslationUnit.

See funPtrTU.R We will write a function that takes an description of a function pointer from the translation unit and generates a wrapper routine that calls an R function. The function needs the description of the function pointer, the name of the new routine to create and may require the caller to specify which parameter refers to the R function, i.e. the user data passed to the routine. The function generates code very similar to that in the R_myFun code above. It creates an R data structure to represent the call with one less arguments than the number of parameters passed to the actual routine. This is because we don't pass the user data to R since this is the R function object. Our routine then adds the corresponding R value for each of the parameters in the routine to the call. Then it invokes the call with Rf_eval. The final step is to convert the R object returned from the R function to the appropriate C representation. Within the code, there are declarations of local variables and simple R memory management.

# This is for runFunPtr in ../../src/test.c - not the optimize() in the paper.

createCallRFunctionWrapper =
function(funPtr,    # the description of the function pointer - class FunctionPointer
         funcName,  # name of the routine we create
                    # index of parameter that is the user data containing the R function SEXP
         userDataParam = findUserDataParam(funPtr),
                    # names to use for the parameters of the routine
         paramNames = names(funPtr@parmeters))
{
        # So we have an explicit representation for a FunctionPointer
        # We have the return type and the parameters. From these we can
        # define our wrapper routine
  if(length(names(funPtr@parameters)) == 0)  #?  NULL since the TU loses them - yes!
      names(funPtr@parameters) = paste("x", seq(along = funPtr@parameters), sep = "")


     # Now create the body of the routine first.
   params = funPtr@parameters
   body = c(
                  # local variables
             "SEXP call, ans, ptr;", 

                  # create the call
             sprintf("PROTECT(ptr = call = allocVector(LANGSXP, %d));", length(params)),

                  # put the function into the first element of the call
             sprintf("SETCAR(ptr, (SEXP) %s); ptr = CDR(ptr);", names(params)[ userDataParam ]),
                  # add each of the parameters, except the userDataParam
             mapply(function(id, parm)
                       sprintf("SETCAR(ptr, %s); ptr = CDR(ptr);", convertValueToR(id, parm@type)),
                    names(params)[ - userDataParam], params[ - userDataParam]),

                  # invoke the call
             "ans = Rf_eval(call, R_GlobalEnv);",
  
             "UNPROTECT(1);",
                  # conver the result back to a C value
             sprintf("return(%s);", gsub(";$", "", convertRValue("",  "ans", funPtr@returnType)))
            )

       # Now we have to get its declaration or signature to add to the top of the body
   ret = getNativeDeclaration("", funPtr@returnType, addSemiColon = FALSE)
   decl = mapply(getNativeDeclaration, names(params),
                                       lapply(params, function(x) x@type),
                  MoreArgs = list(addSemiColon = FALSE))


       # put the pieces of the code together into a character vector
  c(ret,
      sprintf("%s(%s)", funcName, paste(decl, collapse = ", ")),
      "{",
      paste("    ", body),
      "}")

}





We can now call this code for our particular example and then test it.

library(RGCCTranslationUnit)
 # read the TU file
tu = parseTU("../TU/test.c.001t.tu")
 # find only the routines in files that start with test
r = getRoutines(tu, "test")
 # get the routine of interest
f = r$runFunPtr
 # resolve all the data types referenced in the routine
f = resolveType(f, tu)

 # Verify the types interactively
sapply(f$parameters, class)
sapply(f$parameters, function(x) class(x$type))

 # Get the function pointer type parameter
funPtr = f$parameters[["fun"]]$type
class(funPtr) == "FunctionPointer"

  # Now we generate the wrapper routine, sourceing the function to do this
  # and then invoking it with the relevant arguments.
source("funPtrTU.R")
code = createCallRFunctionWrapper(funPtr, "R_myFun2", 2, paramNames = c("value", "userData"))
  # Write the generated code along with the necessary C header files
cat("#include <Rdefines.h>",
    code,
    sep = "\n", file = "foo.c")


    # Compile the foo.c into a DSO
  system(sprintf("%s/bin/R CMD SHLIB foo.c", R.home()))

    # load the resulting DSO
  dyn.load("foo.so")
    # get a reference to the newly generated routine which we
    # can pass as the function pointer argument.
  f = getNativeSymbolInfo("R_myFun2")$address

     # Define the R function that will be called each iteration
  myFun = function(val)
               val + 1

    # Now we can invoke the original C routine and pass our function pointer - R_myFun2
    # and our R function - myFun. We create the CIF first and then invoke it.
  library(Rffi)
  cif = CIF(doubleType, list(sint32Type, doubleType, pointerType, pointerType), rep(FALSE, 4))

  ans = callCIF(cif, "runFunPtr", 3, pi, f, myFun)
  print(ans)


Compiling directly with Rllvm

We want to illustrate a different approach than generating C code via RGCCTranslationUnit. Instead of assuming the machine has a compiler such as gcc and having to invoke it and get the compilation and linker flags set appropriately or call the SHLIB command in R, we want to compile and load the code directly within in R. The Rllvm package allows us to do this.

Rllvm is an interface to the low-level virtual machine software that can portably generate machine code on different computer architectures. Within R, we can create descriptions of the definitions (not just declarations) of native routines and have llvm generate the machine code to implement those within the R process. So we can generate the implementation of of R_myFun as we would write it above and then use the routine directly within R without calls to a compiler, etc. And we can do this for arbitrary function pointer signatures.

Using the C code above as a template, we can generate the Rllvm code to define the equivalent routine.

There is no doubt that this is a complex task. We are working with the instruction set of a low-level virtual machine. This makes writing C code seem terse, let alone R code. The point of this example is not to suggest this is how R users should consider writing wrapper routines for passing R functions as C function pointers. Instead, the aim is to illustrate that this can be automated using RGCCTranslationUnit and Rllvm and can be used via Rllvm. My hope is that advanced users will be able to explore additional uses of these powerful underlying technologies and the R interfaces to them.