GenXDM User's Guide


Table of Contents

Overview
Injecting a GenXDM Bridge Implementation
Dependency Injection Pattern
Factory Method Pattern
Reading an XML Document with DocumentHandler
Using Error Handlers and Resolvers
Creating an XML Document with FragmentBuilder
Create the Document
Create an Element
Close Containers
Access the Document
Examining an XML Document with Model/Cursor
Understanding Model and Cursor
Inspect the Model
Modifying an XML Document
Example: Modifying a Document
Writing to a File or Socket with DocumentHandler
Example: Writing to Disk
Implementing Schema Awareness
Example: Initializing the Schema Context
Example: Validating a Document
Navigating and Investigating Typed Documents

Overview

This book describes how GenXDM operates within the XML life cycle. Topics covered include:

  1. Injecting a GenXDM Bridge Implementation explains how to prepare GenXDM for use by injecting a bridge implementation, using ProcessingContext.

  2. Reading an XML Document describes how a simple XML document, received from file, URL, or socket, can be reviewed using DocumentHandler.

  3. Creating an XML Document describes how a simple XML document is created (or constructed) using FragmentBuilder.

  4. Examining an XML Document explains how to use Model/Cursor to examine an XML document that you have read or constructed.

  5. Modifying an XML Document explains how you can modify a document that you've received, or use a partner processor to modify a document that you created.

  6. Writing to a File or Socket describes how you can serialize a document to a file or socket.

  7. Implementing Schema Awareness explains how to use typed trees.

Injecting a GenXDM Bridge Implementation

To use GenXDM, you must supply a bridge that connects the GenXDM API to the underlying data model. The choice of bridge therefore depends upon your preferred tree model. For example, you can inject a bridge to the Document Object Model (DOM).

Once a bridge is injected, GenXDM can be used without consideration of the underlying tree model. All bridges operate the same way. The two primary methods of bridge injection, the Dependency Injection Pattern and the Factory Method Pattern, are described in the following sections.

Regardless of the pattern chosen, the end result is a ProcessingContext<N> object, or the equivalent of:

ProcessingContext<N> context = getContext(ProcessingContextFactory<N> factory);

Throughout the rest of this guide, this object is referred to as the context, whether it's a processing context for DOM, AxiOM, Cx, or some other bridge. All of that implementation-specific dependency has been either localized into the factory method, or it has been hidden through dependency injection.

Once the processing context is prepared, representing ProcessingContext<N> for a <N>ode type, you are ready to start working with XML.

Dependency Injection Pattern

The ideal way to obtain a bridge is to use the Dependency Injection pattern. When the bridge is obtained using dependency injection, the GenXDM code is not tightly coupled to the actual bridge implementation. Several well-known Java toolkits support the dependency injection pattern.

JSR 330 has standardized injection in Java SE and EE, allowing you to choose a suitable framework for dependency injection and make use of it.

Factory Method Pattern

The bridgekit module, which is part of the GenXDM code, includes an interface that can be used to loosely couple the GenXDM API to a single bridge implementation. The org.genxdm.bridgekit.ProcessingContextFactory interface is parameterized on <N>ode, so it can create a ProcessingContext for only one bridge implementation.

If a ProcessingContextFactory is implemented and used, then it is the only place in the code where there is a dependency on a particular bridge implementation. For example, when using this pattern and the DOM bridge, the class that implements ProcessingContextFactory should be the only place in all the code that contains import org.w3c.dom.Node.
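
The localization described above can be sketched as follows. This is a minimal illustration under stated assumptions: Context and ContextFactory are hypothetical stand-ins, not the real ProcessingContext or org.genxdm.bridgekit.ProcessingContextFactory interfaces, whose method signatures are not shown in this guide. In a real project DomContextFactory would live in its own source file, and that file alone would import org.w3c.dom.Node.

```java
import org.w3c.dom.Node;

public class FactoryMethodSketch {

    /** Hypothetical stand-in for ProcessingContext<N>. */
    public interface Context<N> {
        String bridgeName();
    }

    /** Hypothetical stand-in for ProcessingContextFactory<N>. */
    public interface ContextFactory<N> {
        Context<N> newContext();
    }

    /** The only class that names the DOM node type. */
    public static final class DomContextFactory implements ContextFactory<Node> {
        @Override
        public Context<Node> newContext() {
            return () -> "DOM";
        }
    }

    /** Bridge-neutral application code: it only ever sees the type parameter N. */
    public static <N> String describe(ContextFactory<N> factory) {
        Context<N> context = factory.newContext();
        return "Using bridge: " + context.bridgeName();
    }

    public static void main(String[] args) {
        System.out.println(describe(new DomContextFactory()));
    }
}
```

Swapping bridges then means supplying a different factory class; describe() and everything it calls stay unchanged.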

Reading an XML Document with DocumentHandler

GenXDM uses DocumentHandler to read XML documents. ProcessingContext<N> extends DocumentHandlerFactory<N>, so the simplest way of obtaining a tool for parsing XML is:

DocumentHandler<N> parser = context.newDocumentHandler();

DocumentHandler uses the InputSource parse method to read documents:

N parse(InputSource, URI)

Using Error Handlers and Resolvers

There are several additional methods in the DocumentHandlerFactory interface, allowing you to get and set an error handler (XMLReporter, a javax.xml.stream abstraction) or to get and set a Resolver (org.genxdm.io). There is also a method, newDocumentHandler(XMLReporter, Resolver) that allows creation of a DocumentHandler<N> with customized error handler and resolver, without changing the defaults.
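
As an illustration of a customized error handler, the sketch below implements the JDK's javax.xml.stream.XMLReporter interface and collects every reported problem in a list. The class name CollectingReporter and the message format are assumptions made for this example; the Resolver argument is not sketched because its methods are not described in this guide.

```java
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.Location;
import javax.xml.stream.XMLReporter;
import javax.xml.stream.XMLStreamException;

public class CollectingReporter implements XMLReporter {

    private final List<String> messages = new ArrayList<>();

    @Override
    public void report(String message, String errorType,
                       Object relatedInformation, Location location)
            throws XMLStreamException {
        // The location may be null; record the line number only when it is known.
        String where = (location == null) ? "?" : String.valueOf(location.getLineNumber());
        messages.add(errorType + " at line " + where + ": " + message);
    }

    public List<String> getMessages() {
        return messages;
    }

    /** Small self-check that reports one problem and returns what was collected. */
    public static List<String> demo() {
        CollectingReporter reporter = new CollectingReporter();
        try {
            reporter.report("test message", "warning", null, null);
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
        return reporter.getMessages();
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

A reporter like this could be passed as the first argument of newDocumentHandler(XMLReporter, Resolver), leaving the factory defaults untouched.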

Creating an XML Document with FragmentBuilder

In GenXDM, FragmentBuilder is used to create or construct an XML document in memory.

Note

This section assumes that you have a ProcessingContext<N>, called context.

This section demonstrates the construction of a simple XML document in memory:

<?xml version="1.0"?>
<!-- comment -->
<?pi data?>
<element xmlns="http://localhost/" attr="value">text</element>

To create this XML document in GenXDM, we use ProcessingContext, which constructs the document using the proper node type:

FragmentBuilder<N> builder = context.newFragmentBuilder();

The following sections describe how to create and review this document: Create the Document, Create an Element, Close Containers, and Access the Document.

The document contains all seven GenXDM NodeKind-s: DOCUMENT, TEXT, COMMENT, PROCESSING_INSTRUCTION, ELEMENT, NAMESPACE, and ATTRIBUTE.

Create the Document

Several tasks are part of the initial document creation:

  1. Create a constant string

  2. Create the document

  3. Add whitespace (embedded linefeeds)

  4. Create a comment

  5. Create a processing instruction

For example:

final String LF = "\n";
builder.startDocument(null, null);   // no system ID, no internal subset
builder.text(LF);
builder.comment("comment");
builder.text(LF);
builder.processingInstruction("pi", "data");
builder.startElement("http://localhost/", "element", "");
builder.namespace("", "http://localhost/");
builder.attribute("", "attr", "", "value");
builder.text("text");
builder.endElement();
builder.text(LF);
builder.endDocument();

Use startDocument() to create a document as <N>ode of NodeKind.DOCUMENT.

The parameters to the startDocument() method are a URI representing the SystemId and an internal subset. Both of these can be set to null, which is usually correct. Because this document is created in memory, it has no specific location.

There are several embedded linefeeds in this document. Each must be created as a text node.

Create an Element

Next, create an element.

Elements are like documents: both are containers. Therefore, instead of a single method, like text() that creates a complete node at once, we provide a startElement() method that opens the container.

startElement parameters are always given in the same order, and may never be null:

namespace URI: The domain for the name. To indicate the global namespace, specify an empty string: ""
local name: The name itself.
prefix: The binding for the namespace in this scope. Note that, for this document, the default prefix ("") is not in the global namespace, but is bound to the namespace http://localhost/.

Close Containers

To complete the document creation, end the element and document, closing the containers.

To complete the element, call endElement(), so the element has no further children.

A final call to text(LF) adds a new line at the end of the document. Then end the document by calling endDocument().
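
The open-and-close discipline for containers can be tried out without a GenXDM bridge by using the JDK's StAX XMLStreamWriter, which follows a similar start/end pattern. This is an analogy only: XMLStreamWriter is not part of GenXDM, its argument order differs from FragmentBuilder, and the class name ContainerSketch is an assumption made for this example.

```java
import java.io.StringWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

public class ContainerSketch {

    public static String build() {
        StringWriter out = new StringWriter();
        try {
            XMLStreamWriter w = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
            w.writeStartDocument();                        // open the document container
            w.writeComment(" comment ");                   // leaf node, created in one call
            w.writeProcessingInstruction("pi", "data");    // leaf node, created in one call
            w.writeStartElement("element");                // open the element container
            w.writeDefaultNamespace("http://localhost/");
            w.writeAttribute("attr", "value");
            w.writeCharacters("text");
            w.writeEndElement();                           // the element takes no more children
            w.writeEndDocument();                          // close the document container
            w.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(build());
    }
}
```

As with FragmentBuilder, leaf nodes are written with a single call, while containers need a matching end call before the result is complete.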

Access the Document

The document created is now stored in memory. To access the document, use:

N doc = builder.getNode();

FragmentBuilder<N> extends NodeSource<N>, which is an interface offering a tool that is capable of creating a single node or a sequence of nodes. Its two methods are:

  • N getNode();

  • List<N> getNodes();

If a sequence of nodes has been created, and getNode() is called, only the first node is returned. However, the most common usage of FragmentBuilder is to create a tree rooted at a single node, so getNode() is provided to make this use case more convenient.

Examining an XML Document with Model/Cursor

GenXDM employs Model/Cursor to examine XML documents.

Note

This section assumes that:

This section describes how a retrieved document is processed. The details of the processing depend upon the application's needs and the structure of the XML, which is presumably known in advance.

The descriptions and examples that follow refer to the document below, which is a variant of the po.xml described in the W3C XML Schema Primer:

<?xml version="1.0"?>
<purchaseOrder orderDate="1999-10-20">
   <shipTo country="US">
      <name>Alice Smith</name>
      <street>123 Maple Street</street>
      <city>Mill Valley</city>
      <state>CA</state>
      <zip>90952</zip>
   </shipTo>
   <billTo country="US">
      <name>Robert Smith</name>
      <street>8 Oak Avenue</street>
      <city>Old Town</city>
      <state>PA</state>
      <zip>95819</zip>
   </billTo>
   <creditcard type="Visa" 
expiration="2011-08">4444333322221111</creditcard>
   <items>
      <item partNum="872-AA">
         <productName>Lawnmower</productName>
         <quantity>1</quantity>
         <USPrice>148.95</USPrice>
         <comment>Confirm this is electric</comment>
      </item>
   </items>
</purchaseOrder>

Understanding Model and Cursor

GenXDM offers two tools for navigating and examining an XML document. Model and Cursor provide mostly equivalent functionality, and have similar APIs. The primary difference between the two is how state is maintained.
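
The difference in how state is maintained can be sketched with the JDK's DOM API. The class below is a hypothetical stand-in, not the GenXDM Model<N> or Cursor<N>: the static method plays the role of a stateless model call that takes the context node as an argument, and SimpleCursor plays the role of a cursor that remembers its position. The boolean return value of moveToFirstChildElement() is an assumption of this sketch.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class ModelVersusCursor {

    /** Model style: stateless, the caller passes the context node every time. */
    public static Node getFirstChildElement(Node context) {
        for (Node n = context.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE) {
                return n;
            }
        }
        return null;
    }

    /** Cursor style: stateful, the object remembers where it is. */
    public static final class SimpleCursor {
        private Node current;

        public SimpleCursor(Node start) {
            this.current = start;
        }

        /** Moves only if a child element exists, and reports whether it moved. */
        public boolean moveToFirstChildElement() {
            Node next = getFirstChildElement(current);
            if (next == null) {
                return false;
            }
            current = next;
            return true;
        }

        public String currentName() {
            return current.getNodeName();
        }
    }

    public static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse(
            "<purchaseOrder><shipTo><name>Alice Smith</name></shipTo></purchaseOrder>");

        // Model style: the position lives in the caller's variable.
        Node node = getFirstChildElement(doc);
        node = getFirstChildElement(node);
        System.out.println(node.getNodeName());

        // Cursor style: the position lives inside the cursor.
        SimpleCursor cursor = new SimpleCursor(doc);
        cursor.moveToFirstChildElement();
        cursor.moveToFirstChildElement();
        System.out.println(cursor.currentName());
    }
}
```

Both halves of main() end up at the shipTo element; only the owner of the "current position" differs.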

Structural Navigation

Structural navigation allows you to find other nodes based on structural relationships between two nodes.

The results can be surprising. In particular, most XML contains whitespace (linefeeds and tabs or spaces) so that it shows up like the po.xml document above, on multiple lines with indentation. In the XML, these spaces are represented as text nodes. So the first child node of the document node is actually a text node, not the <purchaseOrder> element. The first child element of the document node is the <purchaseOrder> element.

Most XML processing is concerned primarily with the content of elements and attributes. In the XQuery Data Model, the textual content of an element node is held in child text nodes.

The methods that can be used for structural navigation include:

N getRoot(N);

Get the document node, or the root if this is a fragment.

N getParent(N);

Get the parent of the target node, which must be an element or document node.

N getFirstChild(N);

N getLastChild(N);

Get the first or last child of the target node, which will never be a document, attribute, or namespace node, but may be an element, text, comment, or processing instruction. These methods only work if the target is a document or element node (no other node kinds have children).

N getNextSibling(N);

N getPreviousSibling(N);

Get the next or previous sibling, if the target node is an element, text, comment, or processing instruction. The returned node may be an element, text, comment, or processing instruction node.

N getFirstChildElement(N);

N getNextSiblingElement(N);

Get the first child element or the next sibling element. These are 'filtered' results similar to getFirstChild(N) and getNextSibling(N); they skip over text, comment, and processing instruction nodes.

N getFirstChildElementByName(N, String, String);

N getNextSiblingElementByName(N, String, String);

Get the first child element or the next sibling element that matches the indicated namespace (second argument) and local name (third argument). This is a further filter of the getFirstChildElement(N) and getNextSiblingElement(N) methods, which skips not only text, comment, and processing instruction nodes, but also elements that do not have the desired name.

N getAttribute(N, String, String);

Equivalent to getFirstChildElementByName, only for attributes instead of child elements. This method only works if the target is an element node and contains an attribute with the specified namespace and name.
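
The effect of whitespace text nodes, and of the 'filtered' element methods, can be observed with the JDK's DOM parser. This is a sketch by analogy, not GenXDM code: the class name and the firstChildElement helper are assumptions made for this example. Note that DOM does not report whitespace outside the root element as text nodes, so the sketch looks at the children of purchaseOrder rather than at the children of the document node.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class WhitespaceChildren {

    static final String XML =
          "<purchaseOrder>\n"
        + "   <shipTo country=\"US\">\n"
        + "      <name>Alice Smith</name>\n"
        + "   </shipTo>\n"
        + "</purchaseOrder>";

    public static Element root() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(XML)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    /** Filtered navigation: skip text, comment, and processing instruction nodes. */
    public static Node firstChildElement(Node parent) {
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE) {
                return n;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        Element po = root();
        Node first = po.getFirstChild();
        // The unfiltered first child is the indentation before <shipTo>.
        System.out.println(first.getNodeType() == Node.TEXT_NODE);
        System.out.println(first.getNodeValue().trim().isEmpty());
        // The filtered first child is the element itself.
        System.out.println(firstChildElement(po).getNodeName());
    }
}
```

The unfiltered first child is a whitespace-only text node, while the filtered helper lands on shipTo, which is why the element-oriented methods are usually the more convenient choice for indented documents.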

Axis Navigation

Note

Axis navigation is available only in the Model. Cursor cannot be used with axis navigation.

Axis navigation is a form of navigation that processes axes. XPath and XQuery define a number of "axes", which are sequences of similar nodes, starting from one point in the document and accumulating based on a pattern. Many XML documents contain chunks of repetitive structure, with varying data inside the structures. Axis navigation makes processing these chunks straightforward.

For example, consider a library element that contains a number of book elements. Each book element has a title, author, unit price, and quantity. Using axis navigation, for each book in the library you could:

  • Look up the quantity available for each title.

  • Compare the quantity desired with the quantity available.

  • Calculate the cost of an order by multiplying the quantity by the unit price.

  • Add the book subtotal to the order total.
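
The steps above can be sketched with the JDK's DOM API standing in for a GenXDM bridge. Everything specific here is an assumption made for illustration: the library document, the element names (book, title, price, quantity), the helper methods, and the rule that an order is capped at the quantity available. The childElements helper plays the role of a child axis filtered by name.

```java
import java.io.StringReader;
import java.math.BigDecimal;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class LibraryOrder {

    // Hypothetical sample data; not taken from this guide.
    static final String XML =
          "<library>"
        + "<book><title>XML Basics</title><price>10.50</price><quantity>3</quantity></book>"
        + "<book><title>Java Notes</title><price>20.00</price><quantity>1</quantity></book>"
        + "</library>";

    /** The child axis, filtered by element name. */
    static List<Element> childElements(Node parent, String name) {
        List<Element> result = new ArrayList<>();
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE && n.getNodeName().equals(name)) {
                result.add((Element) n);
            }
        }
        return result;
    }

    static String childText(Element parent, String name) {
        return childElements(parent, name).get(0).getTextContent();
    }

    /** Totals an order, given the desired quantity for each title. */
    public static BigDecimal orderTotal(Map<String, Integer> desired) {
        Element library;
        try {
            library = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(XML)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        BigDecimal total = BigDecimal.ZERO;
        for (Element book : childElements(library, "book")) {
            int available = Integer.parseInt(childText(book, "quantity"));
            int wanted = desired.getOrDefault(childText(book, "title"), 0);
            int shipped = Math.min(wanted, available);   // compare desired with available
            BigDecimal price = new BigDecimal(childText(book, "price"));
            total = total.add(price.multiply(BigDecimal.valueOf(shipped)));
        }
        return total;
    }

    public static void main(String[] args) {
        Map<String, Integer> desired = new HashMap<>();
        desired.put("XML Basics", 2);
        desired.put("Java Notes", 2);
        System.out.println(orderTotal(desired));
    }
}
```

The loop body is the same for every book, which is the point of axis navigation: the repetitive structure is handled once, and the axis supplies the nodes.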

Informational Methods

There are informational methods in both Model and Cursor that are useful for discerning information about nodes. Note that the Cursor methods do not take the context-node first argument.

getBaseURI(N)

getDocumentURI(N)

Both document URI and base URI are available when the underlying tree model supports them. These methods are useful when resolving imports and inclusions.

hasAttributes(N)

hasNamespaces(N)

hasChildren(N)

hasParent(N)

hasNextSibling(N)

hasPreviousSibling(N)

These methods allow you to ask nodes about their relationships, without calling node-accessor methods and checking for null.

isAttribute(N)

isNamespace(N)

isElement(N)

isText(N)

You can also ask a node what it is, similar to calling getNodeKind(N) and interpreting the results.

isId(N)

isIdRefs(N)

These methods tell you whether an attribute is of type ID, or an element has an attribute of type ID, or if an attribute is of type IDREF or IDREFS.

getAttributeNames(N, boolean)

getAttributeStringValue(N, String, String)

getNamespaceNames(N, boolean)

getNamespaceForPrefix(N, String)

getNamespaceBindings(N)

These methods allow you to examine attributes and namespaces without getting these nodes as nodes. Note that getNamespaceBindings() allows an underlying tree model to represent namespaces as something other than nodes.

NodeKind getNodeKind(N);

Tells you which of DOCUMENT, ELEMENT, ATTRIBUTE, NAMESPACE, TEXT, COMMENT, and PROCESSING_INSTRUCTION this node is. This is most useful to distinguish between text, comment, processing instruction, and element nodes, when navigating over children, or between element and document nodes when navigating over ancestors. You'll generally know when you're investigating attributes and namespaces.

String getNamespaceURI(N);

String getLocalName(N);

String getPrefix(N);

Get the fully qualified name of this node, if it has one: its namespace URI (element, attribute, and namespace nodes), its prefix (element and attribute nodes only), and its local name (element, attribute, namespace, and processing instruction nodes).

boolean matches(N, NodeKind, String, String);

boolean matches(N, String, String);

The first of these checks whether the NodeKind of the target node matches the supplied NodeKind. Both methods then check whether the supplied namespace and local name match the namespace and local name of the target node.

String getStringValue(N);

Returns the string value of the node. For comment, processing instruction, text, attribute, and namespace nodes, this is the 'value' of the node. For document and element nodes, it is the concatenation of the string values of all descendant text nodes.
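
The string value of an element can be illustrated with DOM's getTextContent(), which for element nodes also returns the concatenated text of the descendants. This is an analogy to getStringValue(N) rather than GenXDM code, and the class and method names are assumptions made for this example.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class StringValueSketch {

    /** Returns the text content of the first element with the given name. */
    public static String stringValue(String xml, String elementName) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return doc.getElementsByTagName(elementName).item(0).getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<shipTo>\n  <name>Alice Smith</name>\n  <zip>90952</zip>\n</shipTo>";
        // A leaf element: the value of its single text node.
        System.out.println("[" + stringValue(xml, "name") + "]");
        // A container element: all descendant text, including the indentation.
        System.out.println("[" + stringValue(xml, "shipTo") + "]");
    }
}
```

The second call shows why the string value of a container such as shipTo is rarely usable directly: the indentation between child elements is part of it.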

Several of these methods are used in the examples below.

Inspect the Model

You can determine many things about a node using the GenXDM API. For example, with the tools described in Informational Methods, an application could navigate over the provided document to print a shipping label:

Ship to:
Alice Smith
123 Maple Street
Mill Valley, CA 90952

For details, see Example: Inspection Using Model and Example: Inspection Using Cursor.

Axis navigation can also be used to inspect a document. For an example, see Example: Credit Card Processing.

Example: Inspection Using Model

This example uses Model to inspect the po.xml document provided at the top of the Examining an XML Document with Model/Cursor section and prints an address label.

The example assumes a method, implemented elsewhere, that allows us to get a PrintWriter that sends a job to the local label printer:

PrintWriter printer = getPrinter();
printer.println("Ship to:");

It's very common to move from the document node to the first (and only) child element as the first act of XML processing. In this case, you "know" that this is a purchaseOrder element. However, you could test this, if there were a chance of receiving some other document type.

  1. Get the first element:

    Model<N> model = context.getModel();

    N node = model.getFirstChildElement(document);

  2. Get the desired elements in order:

    node = model.getFirstChildElementByName(node, "", "shipTo");

    Here, assume that the child elements of the purchaseOrder may appear in any order.

    Next, assume that the elements inside an address must appear in the order given, but verify at each step that you have the expected element. Also, note that if desired you can assign a new value to the same N variable that you supply as the context.

  3. Get the string value of the text node.

    node = model.getFirstChildElement(node);
    if (model.matches(node, "", "name"))
        printer.println(model.getStringValue(node));
    

    Here you can take a shortcut. You know that the name element contains a single text node with the value that you want, so you could navigate to that text node and get its string value. However, the string value of an element node is the concatenation of the string values of its text descendants. When there is only one text node, the string value of the element is the string value of that text node, so you can ask the element directly.

  4. Format the string.

    model.getFirstChild(node)

    The string value of the shipTo element contains extra carriage returns, which are not desirable for label printing:

    Alice Smith
    123 Maple Street
    Mill Valley
    CA
    90952

    To get the string in a more appropriate format:

    node = model.getNextSiblingElement(node);
    if (model.matches(node, "", "street"))
        printer.println(model.getStringValue(node));
    node = model.getNextSiblingElement(node);

    if (model.matches(node, "", "city"))
        printer.print(model.getStringValue(node) + ", ");
    node = model.getNextSiblingElement(node);

    if (model.matches(node, "", "state"))
        printer.print(model.getStringValue(node) + " ");
    node = model.getNextSiblingElement(node);

    if (model.matches(node, "", "zip"))
        printer.println(model.getStringValue(node));
    

Example: Inspection Using Cursor

This example uses Cursor to inspect the po.xml document provided at the top of the Examining an XML Document with Model/Cursor section and prints an address label.

The primary difference between this and the Model example is that the Cursor has positional state. Instead of keeping track of state yourself, using N node, the cursor "moves to" a location.

PrintWriter printer = getPrinter();
printer.println("Ship to:");
Cursor<N> cursor = context.newCursor(document);

cursor.moveToFirstChildElement();

cursor.moveToFirstChildElementByName("", "shipTo"); 
cursor.moveToFirstChildElement(); 
if (cursor.matches("", "name"))
    printer.println(cursor.getStringValue());
cursor.moveToNextSiblingElement();
if (cursor.matches("", "street"))
    printer.println(cursor.getStringValue());
cursor.moveToNextSiblingElement();
if (cursor.matches("", "city"))
    printer.print(cursor.getStringValue() + ", "); 
cursor.moveToNextSiblingElement();
if (cursor.matches("", "state"))
    printer.print(cursor.getStringValue() + " "); 
cursor.moveToNextSiblingElement();
if (cursor.matches("", "zip"))
    printer.println(cursor.getStringValue());

Example: Credit Card Processing

This example uses methods described in Informational Methods to examine and process an order form. In order to process the example credit card, you will verify the card authorization data. Authorization returns a chargeID, which is then sent, along with the amount charged and the merchantID, to the card processor.

The steps needed are described in the sections below:

Verify the Credit Card Information

In this example, the card processor expects the data formatted as:

NAME\nADDRESS\nCITY, STATE ZIP\nCARDNUM\nEXPIRATION

To retrieve this data from the form, use a simple navigation and error-checking method, similar to that used in the previous inspection examples, this time assuming N document, N node, and Model<N> model:

node = model.getFirstChildElement(document);
node = model.getFirstChildElementByName(node, "", "billTo");

StringBuilder buffer = new StringBuilder(); 
N buyerNode = model.getFirstChildElement(node); 
if (model.matches(buyerNode, "", "name"))
    buffer.append(model.getStringValue(buyerNode) + "\n"); 
buyerNode = model.getNextSiblingElement(buyerNode);
if (model.matches(buyerNode, "", "street"))
    buffer.append(model.getStringValue(buyerNode) + "\n"); 
buyerNode = model.getNextSiblingElement(buyerNode);
if (model.matches(buyerNode, "", "city"))
    buffer.append(model.getStringValue(buyerNode) + ", "); 
buyerNode = model.getNextSiblingElement(buyerNode);
if (model.matches(buyerNode, "", "state"))
    buffer.append(model.getStringValue(buyerNode) + " "); 
buyerNode = model.getNextSiblingElement(buyerNode);
if (model.matches(buyerNode, "", "zip"))
    buffer.append(model.getStringValue(buyerNode) + "\n");

The above obtains the buyer address.

Add Data from the Credit Card Element

To retrieve information from the credit card element, first navigate there from the billTo node, then get the card number and expiration date:

node = model.getNextSiblingElementByName(node, "", "creditcard");
buffer.append(model.getStringValue(node) + "\n"); 
buffer.append(model.getAttributeStringValue(node, "", "expiration"));

Submit the Bill to the Processor

Note

Because axis navigation is not available with the Cursor, the Model must be used for this step.

Finally, to submit the bill to the cardProcessor, prepare a line-delimited string in the format:

CHARGE_ID\nMERCHANT_ID\nAMOUNT

The axis navigation methods, described in Axis Navigation, are used to prepare this data:

node = model.getFirstChildElement(document);
node = model.getFirstChildElementByName(node, "", "items");
Iterable<N> items = model.getChildElementsByName(node, "", "item");
double totalPrice = 0.0;

for (N item : items) {
    int inStock = checkInventory(model.getAttributeStringValue(item, "", "partNum"));
    int quantity = Integer.parseInt(model.getStringValue(
                      model.getFirstChildElementByName(item, "", "quantity")));
    double unitPrice = Double.parseDouble(model.getStringValue(
                          model.getFirstChildElementByName(item, "", "USPrice")));
    // Charge only for what can actually be shipped.
    double subTotal = unitPrice * Math.min(inStock, quantity);
    // subTotal may be *zero* if there's nothing in stock.
    totalPrice += subTotal;
}

cardProcessor.charge(chargeID + "\n" + merchantID + "\n" + totalPrice);
warehouse.fulfill(document);

Modifying an XML Document

GenXDM uses mutable methods to modify XML in place:

  • MutableContext

  • NodeFactory

  • MutableModel/MutableCursor

Note

This section assumes that:

This section describes how to modify the XML document received in order to add an order ID to the purchase.

Example: Modifying a Document

This example uses GenXDM to modify the po.xml document shown in Examining an XML Document with Model/Cursor, in order to add an order ID to the purchase.

Note that the po.xml file includes just one customer with a given name in a given zip code.

  1. Create a static class IdGenerator:

    class IdGenerator {
    
    public static String newOrderId() { ... } 
    
    }

  2. Obtain the MutableContext<N>, and get a MutableModel<N> from there.

    MutableContext<N> mutant = context.getMutableContext(); 
    MutableModel<N> mutantModel = mutant.getModel(); 
    NodeFactory<N> factory = mutantModel.getFactory(document);

    Note that the NodeFactory is obtained from the MutableModel, not from the MutableContext. This is necessary to avoid ownership issues over the DOM bridge: DOM confuses the document node (a container) with the factory for creating nodes, and every node has an "owner" document. To avoid the costs associated with changing ownership, obtain a NodeFactory from a MutableModel or MutableCursor, supplying the owner document node. If you are not using the DOM bridge, this is not an issue.

  3. Add the order ID to the document:

    Using Model:

    N node = mutantModel.getFirstChildElement(document);
    if (mutantModel.matches(node, "", "purchaseOrder")) {
        N orderId = factory.createAttribute("", "id", "", IdGenerator.newOrderId());
        mutantModel.insertAttribute(node, orderId); 
    }

    Using Cursor:

    MutableCursor<N> mutantCursor = mutant.newCursor(document); 
    NodeFactory<N> factory = mutantCursor.getFactory(); 
    mutantCursor.moveToFirstChildElement();
    if (mutantCursor.matches("", "purchaseOrder")) {
        N orderId = factory.createAttribute("", "id", "", IdGenerator.newOrderId());
        mutantCursor.insertAttribute(orderId);
    }

    Note that the method variation from Model to Cursor is exactly as with the immutable base interfaces: "get" becomes "moveTo", and methods do not take the context node, "N", as their first argument.
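
The ownership behavior of DOM mentioned in step 2 can be observed directly with the JDK's DOM implementation. The sketch below does not use GenXDM; the class and method names are assumptions made for this example. It shows that an attribute created by one document is rejected by an element that belongs to another, and that creating the attribute from the owning document avoids the problem.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Attr;
import org.w3c.dom.DOMException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class OwnershipSketch {

    static Document newDocumentWithRoot(String rootName) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();
            doc.appendChild(doc.createElement(rootName));
            return doc;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    /** Returns true if DOM rejects an attribute created by a different document. */
    public static boolean foreignAttributeRejected() {
        Document order = newDocumentWithRoot("purchaseOrder");
        Document other = newDocumentWithRoot("scratch");
        Attr foreign = other.createAttribute("id");   // owned by "other"
        foreign.setValue("order-1");
        try {
            order.getDocumentElement().setAttributeNode(foreign);
            return false;
        } catch (DOMException e) {
            return e.code == DOMException.WRONG_DOCUMENT_ERR;
        }
    }

    /** Creating the attribute from the owning document avoids the problem. */
    public static String addOrderId(String id) {
        Document order = newDocumentWithRoot("purchaseOrder");
        Element root = order.getDocumentElement();
        Attr attr = order.createAttribute("id");      // the document acts as the factory
        attr.setValue(id);
        root.setAttributeNode(attr);
        return root.getAttribute("id");
    }

    public static void main(String[] args) {
        System.out.println(foreignAttributeRejected());
        System.out.println(addOrderId("order-1"));
    }
}
```

Obtaining the NodeFactory from a MutableModel or MutableCursor with the owner document supplied serves the same purpose: nodes are created with the right owner from the start, so no change of ownership is needed.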

Writing to a File or Socket with DocumentHandler

To write your document to a file or a socket, GenXDM uses DocumentHandler.

Note

This section assumes that:

This section describes how a document is archived to disk.

Example: Writing to Disk

This example uses DocumentHandler to write the po.xml document to disk. To archive the document:

  • Using a Writer:

    File filename = getUniqueArchivalLocation();
    Writer writer = new FileWriter(filename);

    DocumentHandler<N> handler = context.newDocumentHandler();
    handler.write(writer, document);
    

  • Using an OutputStream:

    URL url = getTargetEndpoint();
    URLConnection connection = url.openConnection();
    OutputStream stream = connection.getOutputStream();

    DocumentHandler<N> handler = context.newDocumentHandler();
    handler.write(stream, document, "UTF-8");
    

Note

GenXDM does not currently offer pretty-printing capabilities.

Implementing Schema Awareness

Schema-aware processing, also called typed processing, uses a schema to describe the content, down to the primitive types of certain text and attribute nodes, of a category of XML instance documents. With the schema and an instance, you can use GenXDM to annotate the instance, in a process called "validation". Once an instance has been validated, both the type annotation and a typed value (called an "atom" and represented by the <A> parameter in the API, so that it may differ from bridge to bridge) may be queried from the enhanced (typed or schema-aware) Model and Cursor.

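The basic steps to implementing schema awareness are described in the following sections: Initializing the Schema Context, Validating a Document, and Navigating and Investigating Typed Documents.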

Example: Initializing the Schema Context

Applications should know in advance what schemas they are handling. Very little can be accomplished with a document that is effectively an unknown format, containing data (however strongly typed) about which the developers of the application know nothing. As a result, the normal flow in preparing for schema-aware, typed processing is to begin by registering all the schema components defined by the schemas of interest with the TypedContext.

Note

This section assumes that:

  • context is a ProcessingContext<N>.

In this example, you create an enhanced context and configure a SchemaParser to parse the schema.

The enhanced context is the TypedContext. A TypedContext is-a Schema, which also defines methods for interactively defining schema components, and for registering "bags" of schema components. The Schema has-a ComponentProvider and has-a ComponentBag:

  • ComponentProvider is a user-accessible interface providing components of a given type by name.

  • ComponentBag provides access to components by iteration over each of the primary component types. It is mostly useful for bulk handling of schema components.

To initialize the Schema:

  1. Obtain the TypedContext. To initialize the schema context, you must register the schema components with it:

    TypedContext<N, A> tcontext = context.getTypedContext();

    While it is possible to programmatically create and declare (or define) schema components, the usual method for initializing a schema context is to use a SchemaParser. The SchemaParser interface is defined in the org.genxdm.xs package. The GenXDM distribution includes a basic implementation of a schema parsing processor, in the module proc-w3cxs. This processor implements the SchemaParser interface, which includes several methods which must be called to initialize any parser processor correctly.

  2. Provide the SchemaParser with its "bootstrap" component provider. At a minimum, the bootstrap component provider must define the schema for schema, but also typically defines the XML and schema instance namespaces.

    Default GenXDM bridges build schema implementations using the utilities in bridgekit, which includes initialization of the bootstrap provider. You can therefore define this code snippet to instantiate and begin initializing the schema parser:

    SchemaParser schemaParser = new W3cXmlSchemaParser(); 
    schemaParser.setComponentProvider(tcontext.getComponentProvider());

  3. Provide the SchemaParser with a resolver. The resolver is needed to handle schema include and import directives. The resolver works with the schema catalog: the catalog can resolve a namespace and schema location, or schema location alone, to return a new URI. That URI is then passed to the catalog resolver to produce an input stream.

    From the SchemaParser interface:

    void setCatalogResolver(CatalogResolver resolver, SchemaCatalog catalog)
    
    SchemaCatalog catalog = new DefaultSchemaCatalog(new DefaultCatalog());
    schemaParser.setCatalogResolver(DefaultCatalogResolver.SINGLETON, catalog);
    

    Note that the supplied default catalog resolver is a singleton, so it does not need to be instantiated.

    DefaultSchemaCatalog takes a Catalog in its constructor; proc-w3cxs also provides a default Catalog implementation. The classic implementation of a cataloging resolver locates the necessary information at a remote URI, and then remaps those bits to locally-stored files. However, the default implementations supplied with W3cXmlSchemaParser are not particularly robust, and do not do on-the-fly remapping. For your implementation, you may need to provide alternate implementations rather than using DefaultCatalog and DefaultSchemaCatalog shown in the example above.

    Note

    There are two more optional methods on the SchemaParser interface, allowing the user to set the regular expression compiler and to set schema load options. However, the W3cXmlSchemaParser ignores options, and chooses a reasonable default for its regular expression compiler.

  4. Supply an error handler. Two default implementations are provided in the core API: SchemaExceptionCatcher and SchemaExceptionThrower.

    The catcher accumulates errors, which can then be examined after parsing (unless something fatal happens). The thrower offers a stricter approach: it throws an exception when any problem is encountered, stopping further processing. (See Checking for Errors for additional information.)

    The simpler approach, the thrower, can be used when the schemas, together with the includes and imports that the resolver retrieves for them, are known to be accessible, well-formed, and valid. In this case, use:

    SchemaExceptionHandler handler = SchemaExceptionThrower.SINGLETON;

    Note

    The parse method returns a ComponentBag. The Schema interface defines a method that registers schema components in bulk, provided that they are supplied as a component bag. The parse method also takes a schema location URI and a System ID. Both URIs may be null. If they are not null, the schema location provides the base URI (used to resolve imports and includes given as relative URIs), and the System ID is the 'canonical identifier' for this schema document.

  5. Supply the schema as an InputStream.

    Assuming you have an Iterable<URI> of schema locations and want to use the catalog resolver to retrieve the initial input stream (as well as the streams for include and import, which the parser will handle behind the scenes), initialize the context using:

    for (URI location : locations) {
        InputStream stream = DefaultCatalogResolver.SINGLETON.resolveInputStream(location);
        ComponentBag components = schemaParser.parse(location, stream, location, handler);
        tcontext.register(components);
    }
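
The remapping behavior described in step 3, which the default catalog does not provide, can be illustrated with a minimal sketch. RemappingCatalog and its methods are hypothetical names invented for this illustration; they are not the GenXDM SchemaCatalog or Catalog interfaces, and a real replacement would need to implement those interfaces instead. The sketch tries the namespace plus schema location first, then the schema location alone, and otherwise returns the location unchanged.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a remapping catalog; NOT the GenXDM SchemaCatalog interface.
final class RemappingCatalog {
    private final Map<String, URI> byNamespaceAndLocation = new HashMap<>();
    private final Map<String, URI> byLocation = new HashMap<>();

    // Remap a schema location, regardless of namespace, to a local URI.
    void mapLocation(String schemaLocation, URI local) {
        byLocation.put(schemaLocation, local);
    }

    // Remap a specific namespace + schema location pair to a local URI.
    void mapNamespaceAndLocation(String namespace, String schemaLocation, URI local) {
        byNamespaceAndLocation.put(key(namespace, schemaLocation), local);
    }

    // Try namespace + location, then location alone; fall back to the location itself.
    URI resolve(String namespace, String schemaLocation) {
        URI hit = byNamespaceAndLocation.get(key(namespace, schemaLocation));
        if (hit == null) {
            hit = byLocation.get(schemaLocation);
        }
        return hit != null ? hit : URI.create(schemaLocation);
    }

    // A null namespace is treated as the empty (global) namespace.
    private static String key(String namespace, String schemaLocation) {
        return (namespace == null ? "" : namespace) + '\n' + schemaLocation;
    }
}
```

The URI returned by resolve would then be handed to whatever component opens the input stream, which is the role the catalog resolver plays in the steps above.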
    

Once all of the schemas of interest have been registered with the TypedContext<N, A>, the context is ready for use. The next sections show how to validate while parsing, how to validate a tree that was constructed in memory or parsed without validation, and how to navigate the enhanced, schema-aware model.

Example: Validating a Document

Note

This section assumes that:

Two examples are provided here:

  • Validating in Memory. This typically returns a new tree, but may instead decorate the supplied tree.

  • Validating While Parsing. This demonstrates how the document can be read with a validator in the pipeline, so that the resultant tree of nodes is decorated with types and contains atoms.

In both of the examples, assume that tcontext has already parsed po.xsd, which defines the schema that the sample po.xml uses.

po.xsd

<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">

  <xsd:annotation>
    <xsd:documentation xml:lang="en">
     Purchase order schema for Example.com.
     Copyright 2000 Example.com. All rights reserved.
    </xsd:documentation>
  </xsd:annotation>

  <xsd:element name="purchaseOrder" type="PurchaseOrderType"/>

  <xsd:element name="comment" type="xsd:string"/>

  <xsd:complexType name="PurchaseOrderType">
    <xsd:sequence>
      <xsd:element name="shipTo" type="USAddress"/>
      <xsd:element name="billTo" type="USAddress"/>
      <xsd:element name="creditcard" type="CreditCard"/>
      <xsd:element ref="comment" minOccurs="0"/>
      <xsd:element name="items"  type="Items"/>
    </xsd:sequence>
    <xsd:attribute name="orderDate" type="xsd:date"/>
  </xsd:complexType>

  <xsd:complexType name="USAddress">
    <xsd:sequence>
      <xsd:element name="name"   type="xsd:string"/>
      <xsd:element name="street" type="xsd:string"/>
      <xsd:element name="city"   type="xsd:string"/>
      <xsd:element name="state"  type="xsd:string"/>
      <xsd:element name="zip"    type="xsd:decimal"/>
    </xsd:sequence>
    <xsd:attribute name="country" type="xsd:NMTOKEN"
                   fixed="US"/>
  </xsd:complexType>
  
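  <!-- Card number: a string of exactly 16 digits -->
  <xsd:simpleType name="CreditCardNumber">
    <xsd:restriction base="xsd:string">
      <xsd:pattern value="[0-9]{16}"/>
    </xsd:restriction>
  </xsd:simpleType>

  <!-- Attributes are added to a simple type by extension, not restriction -->
  <xsd:complexType name="CreditCard">
    <xsd:simpleContent>
      <xsd:extension base="CreditCardNumber">
        <xsd:attribute name="type" type="CCEnum"/>
        <xsd:attribute name="expiration" type="xsd:gYearMonth"/>
      </xsd:extension>
    </xsd:simpleContent>
  </xsd:complexType>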
  
  <xsd:simpleType name="CCEnum">
    <xsd:restriction base="xsd:token">
      <xsd:enumeration value="Visa"/>
      <xsd:enumeration value="MasterCard"/>
      <xsd:enumeration value="AmericanExpress"/>
      <xsd:enumeration value="Discover"/>
      <xsd:enumeration value="JoeBobsCornerPawnshop"/>
    </xsd:restriction>
  </xsd:simpleType>

  <xsd:complexType name="Items">
    <xsd:sequence>
      <xsd:element name="item" minOccurs="0" maxOccurs="unbounded">
        <xsd:complexType>
          <xsd:sequence>
            <xsd:element name="productName" type="xsd:string"/>
            <xsd:element name="quantity">
              <xsd:simpleType>
                <xsd:restriction base="xsd:positiveInteger">
                  <xsd:maxExclusive value="100"/>
                </xsd:restriction>
              </xsd:simpleType>
            </xsd:element>
            <xsd:element name="USPrice"  type="xsd:decimal"/>
            <xsd:element ref="comment"   minOccurs="0"/>
            <xsd:element name="shipDate" type="xsd:date" minOccurs="0"/>
          </xsd:sequence>
          <xsd:attribute name="partNum" type="SKU" use="required"/>
        </xsd:complexType>
      </xsd:element>
    </xsd:sequence>
  </xsd:complexType>

  <!-- Stock Keeping Unit, a code for identifying products -->
  <xsd:simpleType name="SKU">
    <xsd:restriction base="xsd:string">
      <xsd:pattern value="\d{3}-[A-Z]{2}"/>
    </xsd:restriction>
  </xsd:simpleType>

</xsd:schema>

Validating in Memory

This section describes how to validate an existing, untyped tree, using a document previously read into the schema context. The po.xml document is validated against the po.xsd shown above.

Because the mutable API does not allow typed trees, any changes made to the tree invalidate portions of the tree. For this reason, do not use a version of the po.xml that was modified in Modifying an XML Document.

The method needed is found on TypedContext<N, A>:

N validate(N source, ValidationHandler<A> validator, URI schemaNamespace)

where:

N source

is the untyped tree that you want to validate. In this example, po.xml.

Alternatively, use this method to re-validate a typed tree that has been modified in memory. Because the mutable API is untyped-only, adding or removing nodes, or making any other change to the tree, implicitly invalidates portions of the tree. Specifically, the ancestor-or-self axis is invalidated, with the context set to the modified node. Once changes have been made, this method may be called to validate the modified tree (assuming that those changes did not make the result invalid).

URI schemaNamespace

is the target namespace of the schema. In this example, there is no target namespace; this is the empty string: "" (also called the global namespace).

Using the global namespace is considered poor practice for schemas that are used in production, but it is extremely useful for schemas used in examples, because it reduces clutter and keeps the focus on the task at hand. So long as there is only one schema with no @targetNamespace attribute (thus putting its components in the global, unnamed namespace represented by the empty string), this is not ambiguous. For this example, po.xsd is the only schema that defines components in this namespace.

ValidationHandler<A> validator

is the validation handler that verifies that the values encountered are valid for the specified type. For more information, see Validator.

For example:

ValidatorFactory<N, A> factory = new ValidatorFactory<N, A>(tcontext);
ValidationHandler<A> validator = factory.newXdmContentValidator();
SchemaExceptionHandler errors = new SchemaExceptionCatcher();
validator.setSchemaExceptionHandler(errors);
N typedDocument = tcontext.validate(document, validator, URI.create(""));
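
The invalidation rule described for the source parameter (a change invalidates the ancestor-or-self axis of the modified node) can be pictured with a toy tree. This is only an illustrative sketch: ToyNode and its typed flag are hypothetical and are not part of GenXDM, which works on bridge-supplied nodes and tracks type information in its own way.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy node with a "typed" flag; NOT a GenXDM class.
final class ToyNode {
    final String name;
    final ToyNode parent;
    boolean typed = true; // pretend the whole tree was validated earlier

    ToyNode(String name, ToyNode parent) {
        this.name = name;
        this.parent = parent;
    }

    // Walk the ancestor-or-self axis from the modified node, clearing the flag.
    // Returns the names of the nodes that lost their type information.
    static List<String> invalidate(ToyNode modified) {
        List<String> touched = new ArrayList<>();
        for (ToyNode n = modified; n != null; n = n.parent) {
            n.typed = false;
            touched.add(n.name);
        }
        return touched;
    }
}
```

In this picture, modifying an item element affects item, items, and purchaseOrder, while a sibling subtree such as shipTo keeps its type information. Re-validating the tree is what restores it for the affected nodes.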

Validating While Parsing

This section describes how a document can be read using a validator in the pipeline, so that the resultant tree of nodes is decorated with types and contains atoms.

TypedContext<N, A> is a TypedDocumentHandlerFactory<N, A>. TypedDocumentHandlerFactory<N, A> defines one new method:

TypedDocumentHandler<N, A> newDocumentHandler(SAXValidator<A> validator, XMLReporter reporter, Resolver resolver)

where:

SAXValidator<A> validator

is a Validator. It is also a SAX ContentHandler (which is not the same as a GenXDM ContentHandler). For more information on Validators, see Validator.

XMLReporter reporter

is the error handler, as described in Error Handler. In the example that follows, the reporter is null.

Resolver resolver

is the resolver, as described in Example: Initializing the Schema Context.

The following example also assumes that there is a method:

InputSource getInputSource()

which returns an initialized InputSource, as discussed in Example: Initializing the Schema Context.

For example:

ValidatorFactory<N, A> factory = new ValidatorFactory<N, A>(tcontext); 
SAXValidator<A> validator = factory.newSAXContentValidator();
SchemaExceptionHandler errors = new SchemaExceptionCatcher();
validator.setSchemaExceptionHandler(errors);
TypedDocumentHandler<N, A> parser = tcontext.newDocumentHandler(validator, null, null);
N typedDocument = parser.parse(getInputSource(), null);

Checking for Errors

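Once you have run the validator, either over an in-memory tree or while parsing, the result should be a valid tree. If the validator encountered any problems during validation, the response depends on the error handler specified during initialization. The validation examples above use a SchemaExceptionCatcher, which accumulates errors so that they can be examined afterward; a SchemaExceptionThrower instead throws an exception as soon as any problem is encountered.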
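
The difference between the two error-handling styles can be sketched without any GenXDM classes. Everything in the following block is hypothetical: ProblemHandler, CatchingHandler, ThrowingHandler, and ToyValidator are stand-ins invented for illustration, and they do not reproduce the actual SchemaExceptionHandler, SchemaExceptionCatcher, or SchemaExceptionThrower APIs. The sketch only shows the accumulate-then-inspect pattern next to the fail-fast pattern.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for the two error-handling styles; NOT the GenXDM classes.
interface ProblemHandler {
    void error(String message);
}

// "Catcher" style: accumulate problems so they can be inspected after processing.
final class CatchingHandler implements ProblemHandler {
    private final List<String> problems = new ArrayList<>();

    @Override
    public void error(String message) {
        problems.add(message);
    }

    List<String> problems() {
        return problems;
    }

    boolean isClean() {
        return problems.isEmpty();
    }
}

// "Thrower" style: stop processing at the first problem.
final class ThrowingHandler implements ProblemHandler {
    @Override
    public void error(String message) {
        throw new IllegalStateException(message);
    }
}

final class ToyValidator {
    // Reports every quantity outside 1..99, mirroring the quantity facet in po.xsd.
    static void checkQuantities(int[] quantities, ProblemHandler handler) {
        for (int q : quantities) {
            if (q < 1 || q > 99) {
                handler.error("quantity out of range: " + q);
            }
        }
    }
}
```

With the catching style, the caller checks whether the handler is clean once processing finishes and reports every accumulated problem at once. With the throwing style, the caller wraps the call in a try block and sees only the first problem.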

Navigating and Investigating Typed Documents

Note

This section assumes that:

The TypedModel<N, A> adds five new methods to Model<N>, which it extends. All of these methods provide more information (or, in one case, enhanced information transmission); none of them adds new forms or methods of navigation.

  • QName getAttributeTypeName(N parent, String namespaceURI, String localName)

    Called with an element node as context, this method returns the type name of the designated attribute, if it exists and is valid.

  • Iterable<? extends A> getAttributeValue(N parent, String namespaceURI, String localName)

    Called with an element node as context, this method returns the content of such a valid attribute, as a list of atoms.

  • QName getTypeName(N node)

    Returns null for all node kinds except attribute and element, and only returns non-null if the attribute or element is valid.

  • Iterable<? extends A> getValue(N node)

    Returns values for all node kinds. Elements with simple content return results as expected; comments and processing instructions return their (string) content as xs:untypedAtomic. The value of a namespace node is its URI. Elements with complex content, and documents, return their XDM value, which is the concatenated value of all the nodes in the descendant axis.

  • void stream(N node, boolean copyNamespaces, SequenceHandler<A> handler)

    Provides a means of streaming information into an enhanced ContentHandler, called the SequenceHandler.
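
The getValue description above says that elements with complex content, and documents, return the concatenated value of the nodes in the descendant axis. A minimal sketch of that concatenation follows. SketchNode is a hypothetical toy class, not a GenXDM type, and the sketch produces a plain String, whereas getValue returns atoms.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy node; a real GenXDM node is an opaque N supplied by the bridge.
final class SketchNode {
    private final String text; // non-null only for text nodes
    private final List<SketchNode> children = new ArrayList<>();

    private SketchNode(String text) {
        this.text = text;
    }

    // Build an element (or document) node from its children.
    static SketchNode element(SketchNode... kids) {
        SketchNode e = new SketchNode(null);
        for (SketchNode k : kids) {
            e.children.add(k);
        }
        return e;
    }

    // Build a text node.
    static SketchNode textNode(String value) {
        return new SketchNode(value);
    }

    // Concatenate the content of all descendant text nodes, in document order.
    String stringValue() {
        if (text != null) {
            return text;
        }
        StringBuilder sb = new StringBuilder();
        for (SketchNode child : children) {
            sb.append(child.stringValue());
        }
        return sb.toString();
    }
}
```

Note that nothing separates the contributions of sibling elements: an address with a name child and a city child yields the two strings run together.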

The SequenceHandler (which has already been implicitly used in parsing) has several methods added, as overloads, to the ContentHandler interface:

  • void attribute(String, String, String, List<? extends A>, QName)

    Takes the namespace URI, local name, and prefix. Then, instead of a String value and an enumerated DTD attribute type, it takes a list of atoms as the value and a QName as the type.

  • void startElement(String, String, String, QName)

    Elements have types, but not values; the startElement method adds an argument of type QName that specifies the type.

  • void text(List<? extends A>)

    Simple content of elements appears in text nodes; the text method is overloaded to accept typed values. The containing element keeps track of the type name.

Example: Navigating and Investigating Typed Documents

This example shows how the methods shown above are used to navigate a typed document.

To see how this all works, take a look at the schema and the instance document, found in typedDocument, and revisit the po.xml document first described in Examining an XML Document with Model/Cursor.

Note first that both shipTo and billTo are of the same type, USAddress. USAddress contains a number of child elements with simple content, all of which are defined to be strings--except for the zip code, which is defined to be xsd:decimal. This is not at all uncommon in schema definitions, although there is no need to have decimal or even integer zip codes, since you don't add and subtract, multiply and divide zip codes.

The credit card's contents are defined as a string with a pattern: 16 digits. The element has two attributes. One is an enumeration (a restricted universe of permitted values, here the allowed credit card types), and the other is a date type, in this case gYearMonth.

According to the schema, the quantity element is a positive integer with an exclusive maximum of 100. "Positive integer" doesn't include zero, so after validation, we know that this contains a number from 1 to 99. The partNum attribute is well-done: it is a control number, with a pattern; we know that it matches the numbers in our inventory.
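
To make these constraints concrete, the following sketch re-implements three of the po.xsd facets by hand, using only the Java standard library. It is an approximation for illustration, not how the GenXDM validator works: PoFacets and its method names are hypothetical, Java's \d matches only ASCII digits whereas the XML Schema \d matches any Unicode decimal digit, and trim() stands in for schema whitespace handling.

```java
import java.math.BigInteger;
import java.util.regex.Pattern;

// Hypothetical helper that mirrors three facets from po.xsd; NOT part of GenXDM.
final class PoFacets {
    // SKU: pattern \d{3}-[A-Z]{2} (matches() anchors the whole string, as XSD patterns do)
    private static final Pattern SKU = Pattern.compile("\\d{3}-[A-Z]{2}");
    // Credit card content: pattern [0-9]{16}
    private static final Pattern CARD = Pattern.compile("[0-9]{16}");
    private static final BigInteger HUNDRED = BigInteger.valueOf(100);

    static boolean isSku(String s) {
        return SKU.matcher(s).matches();
    }

    static boolean isCardNumber(String s) {
        return CARD.matcher(s).matches();
    }

    // quantity: xsd:positiveInteger (greater than zero) with maxExclusive 100, so 1..99
    static boolean isQuantity(String lexical) {
        try {
            BigInteger value = new BigInteger(lexical.trim());
            return value.signum() > 0 && value.compareTo(HUNDRED) < 0;
        } catch (NumberFormatException e) {
            return false;
        }
    }
}
```

Once a document has been validated, none of these checks needs to be repeated in application code; that is the practical benefit of working with a typed tree.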

Here's a typed example:

TypedModel<N, A> model = tcontext.getModel();
AtomBridge<A> atoms = tcontext.getAtomBridge();

// Navigate from the document node to purchaseOrder, and then to its items element.
N node = model.getFirstChildElement(typedDocument);
node = model.getFirstChildElementByName(node, "", "items");
Iterable<N> items = model.getChildElementsByName(node, "", "item");
double totalPrice = 0.0;

for (N item : items) {
    // partNum is a validated SKU; checkInventory is assumed to be defined elsewhere.
    A partNum = model.getAttributeValue(item, "", "partNum").iterator().next();
    int inStock = checkInventory(partNum);

    // quantity is a validated positive integer, so it can be read as an int.
    A quant = model.getValue(model.getFirstChildElementByName(item, "", "quantity"))
                   .iterator().next();
    int quantity = atoms.getInt(quant);

    // USPrice is a validated xsd:decimal.
    A price = model.getValue(model.getFirstChildElementByName(item, "", "USPrice"))
                   .iterator().next();
    double unitPrice = atoms.getDouble(price);

    // Charge only for the units that are actually in stock.
    double subTotal = unitPrice * Math.min(inStock, quantity);
    totalPrice += subTotal;
}

// cardProcessor, chargeID, and merchantID are assumed to be defined elsewhere.
cardProcessor.charge(chargeID + "\n" + merchantID + "\n" + totalPrice);