Rcompression package for in-memory compression

Last Release: 0.93-2 (06 Apr 2011)

This package is a basic interface to the zlib and bzip2 facilities for compressing and uncompressing data that are in memory rather than in files. This is useful when the data we have to work with are never in a file on our local file system but are instead given to us as part of a transaction with a remote server. For example, we might receive a gzipped text file when retrieving a URI via the RCurl package, or a compressed micro-array file from a Web service via the SSOAP package. Rather than collecting that data, writing it to disk and then reading it back into R, we can uncompress it directly in memory. This avoids unnecessary I/O and also improves "security", as our scripts do not need to access the file system. (This is currently not that important, as R is not secure in any way, but it becomes an issue as we use R more extensively in embedded situations, e.g. in databases, Web servers, spreadsheets, and other languages such as Perl and Python.)
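To make the idea concrete, here is a minimal sketch of decompressing a payload entirely in memory, with no temporary file. It is written in Python using the standard gzip and bz2 modules purely as an illustration of the technique; it does not show this package's R functions. The payload bytes and the helper name inflate are assumptions made for the example, standing in for data that would really arrive from a remote server.

```python
import bz2
import gzip

# Stand-in for bytes received from a remote server (assumption: in real use
# these would come from an HTTP or Web-service response, not be built locally).
original = b"gene,expression\nBRCA1,2.5\nTP53,1.7\n"
gz_payload = gzip.compress(original)
bz_payload = bz2.compress(original)


def inflate(payload: bytes) -> bytes:
    """Decompress gzip or bzip2 data held in memory, chosen by magic bytes."""
    if payload[:2] == b"\x1f\x8b":  # gzip magic number
        return gzip.decompress(payload)
    if payload[:3] == b"BZh":  # bzip2 magic number
        return bz2.decompress(payload)
    raise ValueError("unrecognised compression format")


# Neither call touches the file system.
text_from_gz = inflate(gz_payload).decode("utf-8")
text_from_bz = inflate(bz_payload).decode("utf-8")
```

Checking the leading bytes is one simple way to pick the right decompressor when the server does not say which format it used.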

The current interface is more complete than earlier versions. It provides access to

Versions from 0.91 onwards can update zip files directly in memory rather than using external executables and temporary files. There are also many high-level facilities and syntactic conveniences for updating and appending to a zip archive.
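As a rough illustration of what updating a zip archive in memory involves, the following sketch appends an entry to an archive that lives only in a byte buffer. It uses Python's standard zipfile and io modules, not this package's R interface, and the entry names and contents are invented for the example.

```python
import io
import zipfile

# Build a small archive in a memory buffer (stand-in for one received remotely).
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, mode="w", compression=zipfile.ZIP_DEFLATED) as archive:
    archive.writestr("readme.txt", "first entry\n")

# Re-open the same buffer in append mode and add a second entry.
# No external zip executable and no temporary file are involved.
with zipfile.ZipFile(buffer, mode="a", compression=zipfile.ZIP_DEFLATED) as archive:
    archive.writestr("data/values.csv", "x,y\n1,2\n")

# Read the archive back from memory to confirm both entries are present.
with zipfile.ZipFile(buffer) as archive:
    names = archive.namelist()
    csv_text = archive.read("data/values.csv").decode("utf-8")
```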

At present, one must have the entire data vector in memory before the call, and the tools operate on it directly. It is entirely feasible to generalize this and have the tools ask for more data as the decompression libraries need it, and to do the same with the output. In this way, the package could work with R's existing connections mechanism at the R level. Unfortunately, the connections API at the C level is not public and is not amenable to extensions implemented in R packages, i.e. outside the R source code.
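The incremental approach described here, where the decompressor is fed more input as it becomes available, can be sketched as follows. This is Python's standard zlib module, shown only to illustrate the general technique; it says nothing about how such a feature would be implemented in this package or in R's connections. The chunk size of 64 bytes and the helper name chunks are arbitrary choices for the example.

```python
import zlib


def chunks(data: bytes, size: int):
    """Yield successive slices of data, mimicking input that arrives in pieces."""
    for start in range(0, len(data), size):
        yield data[start:start + size]


original = b"0123456789" * 1000
compressed = zlib.compress(original)

# A decompression object keeps its state between calls, so the whole
# compressed input never has to be available at once.
decompressor = zlib.decompressobj()
pieces = []
for chunk in chunks(compressed, 64):
    pieces.append(decompressor.decompress(chunk))
pieces.append(decompressor.flush())  # collect any output still buffered

restored = b"".join(pieces)
```

In a real streaming setting each piece of output would be handed on as soon as it is produced rather than collected in a list.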

Installation

You will need to have libz (a.k.a. zlib) and libbz2 (a.k.a. bzip2) installed. The configuration script attempts to find these but is currently not very flexible or aggressive about locating them. I will add more facilities as people start to use this, so please send me mail rather than just hacking the code yourself. (Although sending your changes is even better!) You can find the libraries at Both are trivial to install on almost all machines.

Documentation

  • Changes across releases

  • Duncan Temple Lang <duncan@wald.ucdavis.edu>
    Last modified: Sat Feb 13 15:25:41 PST 2010