Quick Introduction to the RCSS package


Duncan Temple Lang

Introduction

This is a brief introduction to the RCSS package. The package allows us to read Cascading Style Sheet (CSS) documents within R and to treat them as structured documents. We can then list the different CSS rules, imports, etc. in R to explore the contents of the CSS document. More interestingly, we can query the CSS document for the style that would be applied to a particular XML (or HTML) node. In this way, we can check that all the nodes in an XML document have a style, find which styles are applied to which types of nodes and determine which styles we have not implemented in the CSS document.

Reading CSS files

At present there are just a few functions in the package. We start by reading a CSS document with the readCSS() function. The primary argument in a call to this function provides the CSS content. This can be either the name of a local file or the CSS content itself, i.e. an R string containing the CSS contents. If the file is local, we can give its name, e.g.

library(RCSS)
f = system.file("samples", "eg.css", package = "RCSS")
css = readCSS(f)

If the CSS material is available via a remote URL, then we can either download it to a local file (e.g. with download.file()) and use readCSS() as before, or we can get its contents with getURL() in the RCurl package. With getURL(), we end up with the CSS content as a string [1]. We can then pass this to readCSS() and specify that it is to be interpreted as the text of the CSS content and not to be confused with a file name!

library(RCurl)
txt = getURL("http://www.omegahat.org/OmegaTech.css")
css = readCSS(txt, asText = TRUE)

We can also provide the text using the I() function in R to make it of class "AsIs":

css = readCSS(I(txt))
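
The I() route relies on the fact that I() only tags the string with the class "AsIs" and leaves its contents untouched; readCSS() can presumably use that tag to tell inline content from a file name. A minimal base-R sketch of the tagging follows. The CSS string is a made-up example and nothing here calls RCSS itself.

```r
# A made-up one-rule stylesheet held in an R string (not from the package).
txt = "h1 { color: red; }"

# I() adds "AsIs" to the class vector and leaves the value alone.
wrapped = I(txt)

class(wrapped)                    # includes "AsIs"
inherits(wrapped, "AsIs")         # the kind of check a function can use to spot inline text
identical(unclass(wrapped), txt)  # the underlying string is unchanged
```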

Alternatively, the CSS content might come from parsing an XML/HTML file and taking the inlined CSS material. Let's look at an SVG file with inlined CSS:

library(XML)
f = system.file("samples", "bob.svg", package = "RCSS")
doc = xmlParse(f)
node = getNodeSet(doc, "//x:style[@type='text/css']", "x")[[1]]
css = readCSS(I(xmlValue(node)))
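
The extraction step can be tried without the sample file. The sketch below uses a tiny hypothetical SVG document held in a string (it is not the bob.svg file from the package) and spells out the SVG namespace URI explicitly rather than passing just the prefix "x". It stops at the CSS text, which is the value one would then hand to readCSS(I(...)).

```r
library(XML)

# Hypothetical SVG document with an inline stylesheet (not a file from the package).
svg = '<svg xmlns="http://www.w3.org/2000/svg">
  <style type="text/css">path.a { fill: red; }</style>
  <path class="a" d="M0 0 L10 10"/>
</svg>'

doc = xmlParse(svg, asText = TRUE)

# SVG elements live in a default namespace, so the XPath needs a prefix bound to it.
ns = c(x = "http://www.w3.org/2000/svg")
node = getNodeSet(doc, "//x:style[@type='text/css']", ns)[[1]]

# xmlValue() returns the text content of the node, i.e. the CSS itself.
cssText = xmlValue(node)
cssText
```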

Working with the CSS object

Now that we have the parsed CSS object in R, we can ask how many elements there are in the CSS file via the length() method:

length(css)

In a limited way, we can think of the CSS object as a list. We can access individual elements using the [ and [[ operators, e.g.

css[[1]]
css[1:2]

The elements these return are of the general class CSSStatement, but each will be of a class reflecting a particular type of statement. (This hierarchy reflects the underlying libcroco library.) The classes are AT_RULE, RULESET, IMPORT_RULE, MEDIA_RULE, PAGE_RULE, CHARSET_RULE and FONT_FACE_RULE. All of these are references to the C-level CSS statement objects. They are not regular R objects. However, we can extract information from them and convert them to R lists, etc.

We can ask for the names of the elements and we get back the CSS selectors as strings.

names(css)

We can convert a reference to a C-level CSS entry to an R object using the generic function asCSSObject(). For example,

css = readCSS( system.file("samples", "eg.css", package = "RCSS") )
asCSSObject(css[[3]])

The different classes for the different types of CSS objects are all sub-classes of the R class CSSElement. These classes are CSSRuleset, CSSMediaRule, CSSCharsetRule and CSSFontFaceRule. (Neither CSSPageRule nor CSSImportRule seems to appear in the list of CSS elements after the file has been read.)

When we convert a CSS element to an R object, we have access to the details of the element. Converting a RULESET, we end up with a CSSRuleset. This has two slots: declarations and selectors. The selectors indicate to which nodes this CSS element applies. The declarations slot is a list of the properties specific to this element.

library(RCSS)
css = readCSS(system.file("samples", "descendant.css", package = "RCSS"))
x = asCSSObject(css[[4]])  
x@declarations    

The elements of the declarations list are CSSDeclaration objects. Such objects have a property name, a value and a level of importance. The selectors give the "rules" that determine to which XML/HTML elements this CSS element applies.

We'll provide more information about these R-level classes in the future. For now, explore the classes.
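
One way to do that exploration is with the standard S4 reflection tools in the methods package. The sketch below uses a stand-in class, DemoRuleset, which is hypothetical and defined here only so that the code runs without the package; it has the two slots described above for CSSRuleset. With RCSS loaded, the same calls should apply to an object returned by asCSSObject().

```r
library(methods)

# Stand-in class (hypothetical) with the two slots the text describes for CSSRuleset.
setClass("DemoRuleset", representation(declarations = "list", selectors = "list"))

x = new("DemoRuleset",
        declarations = list(color = "red"),
        selectors = list("h1"))

slotNames(x)                   # which slots the object has
getSlots("DemoRuleset")        # slot names with their declared types
is(x, "DemoRuleset")           # class membership test
isVirtualClass("DemoRuleset")  # FALSE for a concrete class
x@declarations                 # slot contents, accessed as in the example above
```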

Resolving CSS elements for XML nodes

One of the motivations for this package is to be able to query a CSS file for an element that would apply to a particular XML or HTML node in a real document. We can find which nodes have no styles, or simply determine which CSS element applies and where it is located in the CSS.

We'll look at an HTML file, in fact the HTML version of this very document. We'll find the associated CSS file:

library(XML)
doc = htmlParse(system.file("doc", "guide.html", package = "RCSS"))
cssName = getNodeSet(doc, "//head/link[@rel='stylesheet' and @type='text/css']/@href")[[1]]
css = readCSS(cssName)

Now we have the CSS file and the HTML document and its nodes. Let's look at the first h2 node and find its corresponding CSS rule, if there is one.

nodes = getNodeSet(doc, "//h2")
getCSSRules(nodes[[1]], css)

Unfortunately, at present the matching is case-sensitive even though the CSS elements have an is_case_sensitive slot! This seems to be an error in libcroco, but we'll dig into it more to find a fix.

Let's look at an SVG file that is provided as an example in this package. We'll find its inlined CSS document and then find all the path nodes and see which of these are governed by an entry in the CSS document.

library(XML)
doc = xmlParse(system.file("samples", "bob.svg", package = "RCSS"))
txt = xmlValue(getNodeSet(doc, "//x:style[@type='text/css']", "x")[[1]])
css = readCSS(I(txt))
paths = getNodeSet(doc, "//x:path", "x")
rules = lapply(paths, getCSSRules, css)
table(sapply(rules, length))

So 29 of the 208 path nodes do not have a CSS entry.
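
The table gives the counts but not which nodes are unstyled. A short base-R sketch of recovering their positions follows. The rules list here is a made-up stand-in for the result of lapply(paths, getCSSRules, css), so the numbers are illustrative and are not those of bob.svg.

```r
# Stand-in for the result of lapply(paths, getCSSRules, css):
# one list element per node, holding the rules that matched it (made-up values).
rules = list(list("path.a"), list(), list("path.a", "path.b"), list())

# Number of matching rules per node.
n = lengths(rules)

# Distribution of match counts, as in the text.
table(n)

# Positions of the nodes with no matching rule.
unstyled = which(n == 0)
unstyled
```

With the real objects, paths[unstyled] would then give the path nodes that have no CSS entry.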

If we are matching numerous XML/HTML nodes, it can be worthwhile to create the selection engine object used in the matching just once. Otherwise, a separate selection engine will be created for each call to getCSSRules() and destroyed at the end of that call. This is relatively cheap, but we can avoid it by passing a selection engine object in the call to getCSSRules().

library(XML)
doc = htmlParse("http://www.omegahat.org/index.html")
allNodes = getNodeSet(doc, "//*")
eng = .Call("R_createSelectionEngine")
css = readCSS(system.file("samples", "OmegaTech.css", package = "RCSS"))
rules = lapply(allNodes, getCSSRules, css, eng)
table(sapply(rules, length))




[1] A character vector of length 1