toXML(). This is a
generic function. The class-specific versions of this are responsible
for returning a string containing the appropriate XML output.
    filter$write(read.table())
    filter$write(1:10)
    filter$write(factor(rbinom(100)))
We can access the information within a DTD locally using the parseDTD() function and the argument of the same name to xmlTreeParse(). The DTD elements returned by both are identical, so we describe the value returned by parseDTD().
Before this, we give a very brief overview of what is in a DTD and what we can expect to see in the user-level objects returned by parseDTD(). (With xmlTreeParse, both the internal and external DTD tables are returned. Each of these is as described here.)
<!ENTITY % foo "my text to be repeated">
Internal entities of this form are converted to user-level objects of class XMLEntity. Each of these has 3 fields. The name field is the identifier used to refer to the entity. The value field is the expansion of the macro. The original field is the unexpanded value: if the value contains references to other entities, these are not expanded.
For example, the entries in the DTD

    <!ENTITY % bar "for R and S">
    <!ENTITY testEnt "test entity %bar;">

produce the XMLEntity object

    $name
    [1] "testEnt"

    $value
    [1] "test entity for R and S"

    $original
    [1] "test entity %bar;"

    attr(,"class")
    [1] "XMLEntity"
The names of the entities list are the identifiers of the individual entities.
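The difference between the value and original fields can be illustrated with a small base R sketch. It does not call the XML package; the helper expand_refs and the params table are hypothetical names introduced only for this illustration, and the object is built by hand to mirror the fields shown above.

```r
# Hypothetical table of parameter entities: name -> replacement text.
params <- c(bar = "for R and S")

# Replace each %name; reference with the corresponding replacement text.
expand_refs <- function(text, defs) {
  for (nm in names(defs)) {
    text <- gsub(paste0("%", nm, ";"), defs[[nm]], text, fixed = TRUE)
  }
  text
}

# Hand-built stand-in with the same three fields as an XMLEntity object.
original <- "test entity %bar;"
entity <- structure(
  list(name = "testEnt",
       value = expand_refs(original, params),
       original = original),
  class = "XMLEntity"
)

entity$value     # expanded text
entity$original  # literal text as written in the DTD
```

Keeping both forms means the DTD can be reproduced as written while software that only needs the text can use the expanded value.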
External entities are converted to objects of class XMLExternalEntity. This has the same fields as the class XMLEntity, but the interpretation of the value field is left to the user-level software. One can use scan(), url.scan, and other functions for reading the value of the remote content.
A basic element definition has the following components:

    <!ELEMENT name content>

The name is the text used to introduce it in an XML document, as in

    <name> </name>
    <name />
The content is the most complicated aspect of an element, but it is relatively simple to understand in most cases. It indicates which combinations of elements can be nested within this element. It allows the author of the DTD to specify an ordering of the sub-elements, as well as limited control over the number of such elements one can use in any position. The three basic structures used in the content definition are

(content)+
    means that at least one is expected, but there can be any number of structures matching this content description after the first one.
(content)*
    means that zero or more are expected.
(content)?
    means zero or one.
The following example illustrates all of the basic features.

    <!ELEMENT entry3 ( (variables | (tmp, x)), (record)* , (a*, b,c,d, (e|f)) , (foo)+ ) >

Here we define an element named entry3. It has 4 basic terms that can be nested within it, and they must appear in a specific order. First, we must have a variables element or the pair tmp followed by x. There should be exactly one of these two alternatives. This is followed optionally by any number of record element instances. After this, there must be a sequence of element instances: zero or more a, then b, c, d and either of e or f. Finally, there must be one or more foo entries.
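One way to make the ordering and occurrence rules for entry3 concrete is to translate the content model by hand into a regular expression over the names of the child elements. This is only an illustrative sketch in base R, not how the parser or the XML package validates documents; the names entry3_model and is_valid_entry3 are hypothetical, and the encoding assumes element names contain no spaces.

```r
# Content model of entry3 as a regular expression over child names,
# where every name is followed by a single space.
entry3_model <- paste0(
  "^",
  "((variables )|(tmp x ))",   # variables, or tmp followed by x
  "(record )*",                # zero or more record
  "(a )*b c d ((e )|(f ))",    # a*, then b, c, d, then e or f
  "(foo )+",                   # one or more foo
  "$"
)

# TRUE if the sequence of child element names satisfies the model.
is_valid_entry3 <- function(children) {
  encoded <- paste0(children, " ", collapse = "")
  grepl(entry3_model, encoded)
}

is_valid_entry3(c("variables", "b", "c", "d", "e", "foo"))
is_valid_entry3(c("tmp", "x", "record", "a", "a", "b", "c", "d", "f", "foo", "foo"))
is_valid_entry3(c("variables", "b", "c", "d", "e"))   # no foo at the end
```

The first two calls describe child sequences that fit the declaration, while the third omits the required foo entry.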
The attributes an element supports are listed separately via an ATTLIST declaration:

    <!ATTLIST element-name attributeId type default ... >
The structure returned from parsing and converting a DTD to a user-level object is quite simple. It is a list of length 2, with one component for the entities and the other for the elements within the DTD. If the DTD object comes from a document, the entities and elements defined locally (internally) in the document are kept separate from those in the external DTD, if there is one. The result is then a list of length 2 containing the internal and external DTDs, each of which is itself a list of length 2 with the entities and elements.
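A minimal sketch of inspecting this structure follows. It assumes the XML package is installed and that parseDTD() accepts the name of a local file; the DTD text is a made-up example written to a temporary file, and the component names elements and entities are taken from the description above. The output is not shown since it may differ between versions of the package.

```r
# A made-up DTD with one entity and two element declarations.
dtd_text <- c(
  '<!ENTITY testEnt "test entity for R and S">',
  '<!ELEMENT variables (variable)*>',
  '<!ELEMENT variable (#PCDATA)>'
)
dtd_file <- tempfile(fileext = ".dtd")
writeLines(dtd_text, dtd_file)

dtd <- NULL
# Only attempt the parse when the XML package is available.
if (requireNamespace("XML", quietly = TRUE)) {
  dtd <- XML::parseDTD(dtd_file)
  print(names(dtd))            # top-level components
  print(names(dtd$elements))   # one entry per <!ELEMENT> declaration
  print(names(dtd$entities))   # one entry per <!ENTITY> declaration
}
```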
The entities element in a DTD is a named list. The names are the identifiers for the entities. Each entry in this list is an object of class XMLEntity or XMLExternalEntity. In either case, each has 3 fields: name, value and original. The name is the identifier of the entity. The value is the text used to substitute in place of the entity reference. The original field is for use when reproducing or analyzing the DTD. If the value contains references to other entities, this field reflects that and is the unexpanded or literal version of the entity definition as it appears in the DTD document.
The elements list is also a named list, with the names being those of the elements. Each entry in the list is an object of class XMLElementDef. These contain 4 fields:

name
    the name of the element.
type
    a named integer with value 1, indicating an ELEMENT_NODE. An explanatory string is used as the name for this integer enumeration value.
contents
    an object of class XMLElementContent, or one of the related content classes listed below, which has 3 fields:
    type
        the kind of content: PCData, Sequence, Element, Or, and so on.
    ocur
        the occurrence qualifier: Once, Zero or One, Mult and One or More.
    elements
        a list of XMLContent objects that describe the feasible sub-elements within this element being defined. These are usually specializations of the class XMLContent: XMLOrContent, XMLElementContent, XMLSequenceContent. These have the same structure, just different meaning and semantics.
attributes
    a named list of XMLAttributeDef objects, with the names being those of the attributes being defined for this element.
The contents field for the element entry3 defined above is described next. It is an object of class XMLSequenceContent. Hence, its type field is a named integer with value 3 and name Sequence. Since the entire content has no qualifier, the ocur field is Once.
Now we look at the sub-elements, accessible from the elements field. This is a list of length 4, one for each term in the sequence. The classes of the objects may help to explain its structure.

    sapply(d$elements$entry3$contents$elements, class)
    [1] "XMLOrContent"       "XMLElementContent"  "XMLSequenceContent"
    [4] "XMLElementContent"

Let's look at the third entry, the XMLSequenceContent object.

    r <- d$elements$entry3$contents$elements[[3]]

Again, this is a sequence. Its sub-entries are of different content classes.

    sapply(r$elements, class)
    [1] "XMLElementContent" "XMLElementContent" "XMLElementContent"
    [4] "XMLElementContent" "XMLOrContent"

The first 4 are reasonably obvious. These identify single elements and are the primitive content types.
    > r$elements[[1]]
    $type
    Element
          2

    $ocur
    Mult
       3

    $elements
    [1] "a"

    attr(,"class")
    [1] "XMLElementContent"

We see that the expected element is a and that there can be zero or more of these.
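To show how the type, ocur and elements fields fit together, the following base R sketch rebuilds the third term of entry3 by hand and turns it back into DTD content syntax. The constructors node and element and the function render_content are hypothetical helpers for this illustration, not part of the XML package. The integer codes are assumptions as well; the rendering relies only on the names of the type and ocur values.

```r
# Hand-built stand-ins for the content objects described above.
node <- function(type, elements, ocur = c(Once = 1L), class) {
  structure(list(type = type, ocur = ocur, elements = elements), class = class)
}
element <- function(name, ocur = c(Once = 1L)) {
  node(c(Element = 2L), name, ocur, "XMLElementContent")
}

# Suffix that each occurrence qualifier contributes in DTD syntax.
ocur_suffix <- c("Once" = "", "Zero or One" = "?", "Mult" = "*", "One or More" = "+")

# Recursively convert a content object back to DTD content text.
render_content <- function(x) {
  kind <- names(x$type)
  parts <- function(sep) {
    paste0("(", paste(vapply(x$elements, render_content, character(1)),
                      collapse = sep), ")")
  }
  body <- switch(kind,
    Element  = x$elements,
    Sequence = parts(", "),
    Or       = parts(" | "),
    stop("unhandled content type: ", kind)
  )
  paste0(body, ocur_suffix[[names(x$ocur)]])
}

third <- node(
  c(Sequence = 3L),
  list(element("a", c(Mult = 3L)), element("b"), element("c"), element("d"),
       node(c(Or = 4L), list(element("e"), element("f")), class = "XMLOrContent")),
  class = "XMLSequenceContent"
)
render_content(third)
```

Because all content classes share the same three fields, a single recursive function can handle every level of nesting.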
The more interesting entry is the last one. Its primitive display is given below.

    $type
    Or
     4

    $ocur
    Once
       1

    $elements
    $elements[[1]]
    $type
    Element
          2

    $ocur
    Once
       1

    $elements
    [1] "e"

    attr(,"class")
    [1] "XMLElementContent"

    $elements[[2]]
    $type
    Element
          2

    $ocur
    Once
       1

    $elements
    [1] "f"

    attr(,"class")
    [1] "XMLElementContent"


    attr(,"class")
    [1] "XMLOrContent"

We see that it is of type Or and that we expect exactly one instance of it. It is interpreted by expecting any one of the content structures described in its elements list. Each of these is a simple XMLElementContent object and so is a "primitive".
    filter$output("variable", c(unit="mpg"), value)

Then, the filter will check its current state, specifically the last open/unfinished element, and examine its content specification. If the previous command was something like

    filter$open("variables", numRecords=nrow(data))

then the filter will extract the list of possible entries for this tag.

    dtd$elements[["variables"]]$contents$elements

Then it determines whether the element variable can be added. In the case of a dataset, this is a simple lookup. The only acceptable value is a variable element.

    > d$elements$variables$contents
    $type
    Element
          2

    $ocur
    Mult
       3

    $elements
    [1] "variable"

    attr(,"class")
    [1] "XMLElementContent"
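The lookup described here might be sketched as follows in base R. The functions allowed_names and can_add are hypothetical and are not the filter's actual implementation. The content object is built by hand to mirror the display above. Only Element and Or content are handled, since Sequence content would also require tracking the position reached so far.

```r
# Hand-built stand-in for d$elements$variables$contents.
variables_contents <- structure(
  list(type = c(Element = 2L), ocur = c(Mult = 3L), elements = "variable"),
  class = "XMLElementContent"
)

# Collect the element names permitted by Element and Or specifications.
allowed_names <- function(contents) {
  switch(names(contents$type),
    Element = contents$elements,
    Or      = unlist(lapply(contents$elements, allowed_names)),
    stop("content type not handled in this sketch")
  )
}

# TRUE if an element with this tag may be nested in the open element.
can_add <- function(contents, tag) tag %in% allowed_names(contents)

can_add(variables_contents, "variable")
can_add(variables_contents, "record")
```

Under these assumptions the first call succeeds and the second fails, matching the statement that only a variable element is acceptable inside variables.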