When we load the XSL file and start the processing of the XML documentation with XSL, the template for our top-level node defines the function randomizeNodes() which it will invoke to randomize the question nodes.

Homework 1

Date: Sun May 18 11:36:28 2008

Due: Friday

Total marks: 18

Question 1 []

This has no title and you don't have to do much. But then again, you don't get many marks either! There are no solutions as the question seems vacuous and the numPoints attribute doesn't appear.

Question 2: Numerical Summaries [3]

Read the following data into R as a data frame and create a numerical summary of the variables in the data frame.

myData.txt
                     mpg cyl  disp  hp drat    wt  qsec vs am gear carb
Mazda RX4           21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
Mazda RX4 Wag       21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
Datsun 710          22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
Hornet 4 Drive      21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
Hornet Sportabout   18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2
Valiant             18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1
Duster 360          14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4
Merc 240D           24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2
Merc 230            22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2
Merc 280            19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4
Merc 280C           17.8   6 167.6 123 3.92 3.440 18.90  1  0    4    4
Merc 450SE          16.4   8 275.8 180 3.07 4.070 17.40  0  0    3    3
Merc 450SL          17.3   8 275.8 180 3.07 3.730 17.60  0  0    3    3
Merc 450SLC         15.2   8 275.8 180 3.07 3.780 18.00  0  0    3    3
Cadillac Fleetwood  10.4   8 472.0 205 2.93 5.250 17.98  0  0    3    4
Lincoln Continental 10.4   8 460.0 215 3.00 5.424 17.82  0  0    3    4
Chrysler Imperial   14.7   8 440.0 230 3.23 5.345 17.42  0  0    3    4
Fiat 128            32.4   4  78.7  66 4.08 2.200 19.47  1  1    4    1
Honda Civic         30.4   4  75.7  52 4.93 1.615 18.52  1  1    4    2
Toyota Corolla      33.9   4  71.1  65 4.22 1.835 19.90  1  1    4    1
Toyota Corona       21.5   4 120.1  97 3.70 2.465 20.01  1  0    3    1
Dodge Challenger    15.5   8 318.0 150 2.76 3.520 16.87  0  0    3    2
AMC Javelin         15.2   8 304.0 150 3.15 3.435 17.30  0  0    3    2
Camaro Z28          13.3   8 350.0 245 3.73 3.840 15.41  0  0    3    4
Pontiac Firebird    19.2   8 400.0 175 3.08 3.845 17.05  0  0    3    2
Fiat X1-9           27.3   4  79.0  66 4.08 1.935 18.90  1  1    4    1
Porsche 914-2       26.0   4 120.3  91 4.43 2.140 16.70  0  1    5    2
Lotus Europa        30.4   4  95.1 113 3.77 1.513 16.90  1  1    5    2
Ford Pantera L      15.8   8 351.0 264 4.22 3.170 14.50  0  1    5    4
Ferrari Dino        19.7   6 145.0 175 3.62 2.770 15.50  0  1    5    6
Maserati Bora       15.0   8 301.0 335 3.54 3.570 14.60  0  1    5    8
Volvo 142E          21.4   4 121.0 109 4.11 2.780 18.60  1  1    4    2

Here we use an XInclude. If we instruct the XSL processor to follow the XInclude directives, we will end up with a copy of the data in the document. If we don't, then the XSL rules for the resulting xi:include nodes will provide a link to the data. Can we do both? This is slightly tricky because it is the XML parser that does the XInclude. If we do it in the parsing stage, we don't see the xi:include node, but just what it is replaced by. So we need to tell the XML parser not to do the XInclude, and then process that node so as to a) put the link to the URL, b) process the XInclude. The XML package provides functions to do this. Firstly, ensure that we process the XML document with , e.g.
xsltApplyStyleSheet("hw.xml", "hw.xsl", solutions = 0, xinclude = FALSE)
Next, provide an XSL template for xi:include which emits the link and then calls an R function which processes the XInclude node.

Solution
data = read.table("http://www.omegahat.org/HW/myData.txt")
summary(data)

Question 3: Fibonacci Numbers [10]

Write a function to compute the first 10 Fibonacci numbers.

Solution
There are several ways to do this and some are faster than others.
fib = 
function(n)
{
  if(n == 0 || n == 1)
     return(1)

  fib(n-1) + fib(n-2)
}

Question 4 [5]

Create an informative plot for the following data:

structure(list(x = 1:10, y = c(8.58146989277466, 12.0478462004630, 
16.8235360430284, 21.7284811660330, 28.59736273056, 34.4060590670857, 
37.066471856029, 43.5312485076099, 49.7638083326812, 50.9410207991281
)), .Names = c("x", "y"), row.names = c(NA, -10L), class = "data.frame")

Solution

There are two variables and if we look at a scatter plot, we see that there is a linear relationship between them. So a scatterplot with a regression line would be reasonable.

We can reuse the XSL code from IDynDocs to generate a plot and create the necessary output. This is in IDynDocs/XSL/html.xsl and dynRfo.xsl, etc. There are minor issues about exporting those functions as they use different names within closures, etc. and so we need to sort that out first. Alternatively, we can just define a simple function ourselves. It will not be as general, e.g. dealing with references to other r:code nodes within the body of this r:plot, but for our purposes it is quite simple. We can define an XSL rule that calls an R function defined as

function(nodeset){
    node = nodeset[[1]]
    filename = tempfile()
    jpeg(filename)
    x = eval(parse(text = xmlValue(node)))
    dev.off()
    filename
}

This is called with the r:plot node as the argument which appears as a XPathNodeset and so we fetch the first element which is the r:plot node. We create a graphics device using the jpeg() and return that file name. Here we use a random file name. Using the digest, we can use a name that does not change unless the code in the r:plot node changes.

We can either call this function from XSL via r:call('funName') or we can explicitly register the function via addXSLTFunctions() and then use the name with which we registered it.

There is one remaining problem to be solved and that is how do we get the data for the r:plot code. It is local to the question node, so we need to ensure that it is defined. We'll ignore issues of scope, etc. and just load it into the global/session environment for now. In the r:plot node, we arrange to load the data from any r:data within this question. Since we are in the r:plot node within a question we need to go back to the question node and then find all the r:data nodes below it. Here is where we use axes in XPath. To find the enclosing question ancestor node, we use ancestor::question. Then we can find all the r:data nodes below this as ancestor::question//r:data. We pass the set of these to an R function that process each node and assigns the result of reading the data into R to a variable identified by the r:var attribute on that r:data node.

We can solve the issue with scope and overwriting global variables by changing the way we deal with the data and passing it to the graphicsEval() function. The idea is that we will restore the data to an R object within the graphicsEval() and then we will evaluate the code using the with() function. We use much of the same XSL code and the only difference is in the R code, as we should expect since this is really an R evaluation issue.

If we don't use this approach, we need to put the plotting code directly into the with() call ourselves:

with(d,
     { plot(x, y)
       abline(lm(y ~ x))
     })

Similarly, we need to have an r:var attribute on the r:data node.