This package provides a mechanism in R to process
C and C++ source code. This allows us to
- find variable and routine declarations,
- find definitions of data structures, classes, and enumerations,
- analyze the body of a routine and identify related routines.
The package provides R-level functionality and classes for working
with the translation unit dumps that GCC produces from source code. One uses GCC to
create a .tu file which represents the structure of the source code in
a low-level format. Currently, this information is read into R using the
Perl module GCC::TranslationUnit, which is accessed
from within R via the RSPerl
package. We have developed an almost equivalent C++-based parser
and are in the process of integrating it into the package
to remove the dependency on RSPerl. A new version of the package will
be available at the beginning of July.
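To give a feel for the low-level format mentioned above, here is a minimal sketch in Python that reads a small, hand-written sample in the style of a translation unit dump (the kind of file that GCC releases providing the `-fdump-translation-unit` option write out). It is not the package's R interface and not the Perl or C++ parser. The sample text, the names `parse_tu` and `declared_names`, and the simplified field syntax are all assumptions made for illustration; real dumps contain many more node kinds and field forms than this handles.

```python
import re

# Hand-written sample in the style of a GCC translation unit dump.
# Illustrative only: this is NOT actual compiler output.
SAMPLE_TU = """\
@1      function_decl    name: @2       type: @3       srcp: stats.c:3
                         extern
@2      identifier_node  strg: mean     lngt: 4
@3      function_type    size: @7       retn: @8
@4      function_decl    name: @5       type: @3       srcp: stats.c:9
                         extern
@5      identifier_node  strg: variance lngt: 8
@6      var_decl         name: @9       type: @8       srcp: stats.c:1
@9      identifier_node  strg: epsilon  lngt: 7
"""

# A node starts with "@<id> <kind>"; indented lines continue the previous node.
NODE_START = re.compile(r"^@(\d+)\s+(\w+)\s*(.*)$")
# Simplified field syntax: "key: value" with a single-token value.
FIELD = re.compile(r"(\w+):\s+(\S+)")


def parse_tu(text):
    """Return {node_id: {"kind": ..., field: value, ...}} for a simplified dump."""
    nodes = {}
    current = None
    for line in text.splitlines():
        match = NODE_START.match(line)
        if match:
            current = {"kind": match.group(2)}
            nodes[int(match.group(1))] = current
            rest = match.group(3)
        elif current is not None:
            rest = line  # continuation of the previous node
        else:
            continue
        for key, value in FIELD.findall(rest):
            current[key] = value
    return nodes


def declared_names(nodes, kind):
    """Follow each node's "name" reference to its identifier string."""
    names = []
    for node in nodes.values():
        if node["kind"] != kind:
            continue
        ref = node.get("name", "")
        if ref.startswith("@"):
            ident = nodes.get(int(ref[1:]), {})
            if "strg" in ident:
                names.append(ident["strg"])
    return names


nodes = parse_tu(SAMPLE_TU)
print(declared_names(nodes, "function_decl"))  # ['mean', 'variance']
print(declared_names(nodes, "var_decl"))       # ['epsilon']
```

The point of the sketch is only that the dump is a flat table of numbered nodes that refer to one another, so finding declarations amounts to filtering nodes by kind and following references.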
Given this information, we can
- automate the creation of bindings to routines and data,
- generate registration information for an R package using native code,
- use source code as data for statistical analysis of software itself.
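As a rough illustration of treating source code as data, the following Python sketch starts from a table of which routine calls which, the kind of information one could derive from routine bodies, and computes the routines related to a given one. The call table, the routine names, and the helper names `related_routines` and `fan_in` are invented for this example; nothing here reflects the package's actual functions or output.

```python
from collections import deque

# Hypothetical caller -> callees table, of the kind one might extract from
# routine bodies. All routine names are invented for this example.
CALLS = {
    "fit_model": ["check_input", "solve"],
    "solve": ["decompose", "back_substitute"],
    "check_input": [],
    "decompose": [],
    "back_substitute": [],
    "print_summary": ["format_table"],
    "format_table": [],
}


def related_routines(calls, start):
    """Return the other routines reachable from `start` via calls, sorted by name."""
    seen = {start}
    queue = deque([start])
    while queue:
        routine = queue.popleft()
        for callee in calls.get(routine, []):
            if callee not in seen:
                seen.add(callee)
                queue.append(callee)
    seen.discard(start)
    return sorted(seen)


def fan_in(calls):
    """Count how many distinct routines call each routine."""
    counts = {name: 0 for name in calls}
    for callees in calls.values():
        for callee in set(callees):
            counts[callee] = counts.get(callee, 0) + 1
    return counts


print(related_routines(CALLS, "fit_model"))
# ['back_substitute', 'check_input', 'decompose', 'solve']
print(fan_in(CALLS)["solve"])  # 1
```

Summaries such as reachability and fan-in are simple examples of quantities that could feed a statistical analysis of a code base.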
The current version works but is in the middle of a transition to a
C++ parser and the setup fits my needs rather than being a robust
installation for others. However, the basics of processing the types,
finding the definitions of routines, classes, methods, data
structures, enumerations, etc. are working. I am adding support for
generating bindings and creating registration information as useful
examples of how this material can be used. I am using this
specifically to generate bindings for R to the wxWidgets GUI toolkit in the RwxWidgets package.
- Slides from a talk
presented at the DSC 2007 in New Zealand.
- A paper based on that talk
and the package.
- Some documents about different aspects of the package.
  These are "works in progress" and some are more in the spirit
  of notes to myself.
- Overview XML, HTML, PDF
- Guided tour XML, HTML, PDF
- Diagrams of the basic workflow for
generating bindings and
of the R commands used in the steps.
- Derived classes XML, HTML, PDF
- Native Registration XML, HTML, PDF
- In/Out parameters XML, HTML, PDF
- A discussion of one strategy for creating the bindings
for C++ code, specifically the wxWidgets library.
XML, HTML, PDF
The ability to process source code has been something I and others
have been exploring for some time now.
One earlier effort is work I did last summer to add support for
R to the SWIG software, which generates bindings
from C/C++ code to other languages such as Perl, Python, and R.
Joseph Wang has since adapted this and work is ongoing to
merge the two directions.
SWIG is useful as it offers a language with which
"users" can customize how the bindings are generated,
and the same input can be used for different target languages.
There are two obvious limitations with using SWIG:
- SWIG is not a compiler and so extracts information about the
data structures and routines using a more heuristic parser.
It is extremely good, but is not exactly the same as the
compiler's view of the code.
- SWIG does not give us information about
the bodies of the routines and so doesn't allow us to analyze
the code. This is not important for those only interested
in generating bindings to R or another language.
It is an issue for those of us who want a tool for
analyzing the code itself.
- The GccTranslationUnit
module is a Python-based tool to read the tu files.
Why not SWIG?
I have thought about how to generate a reflection interface from C/C++ code for some time - several
years. It is not just about generating bindings, although that is
certainly an important target application.
It is also about doing code analysis.
I started the work on the SWIG bindings for R because
I feel that we can use this translation unit approach
to generate input to SWIG and then leverage that
mechanism to generate bindings.
- Want others to be able to modify the
binding generation code directly from within R.
- Want to be able to get "correct" results from the compiler, and
not "pretty close to correct" results from a pseudo compiler/parser.
- Want information about the bodies of the routines/methods, and not
just the signature information. This is important for code analysis.
SWIG's input language allows the user to control what she wants
to have in the bindings, but it is yet another language to
understand and we have one already at our disposal - R.
Duncan Temple Lang
Last modified: Thu May 4 15:30:14 PDT 2006