Friday, August 17, 2007

XML Validation with DTD in Java

One thing you may have heard about XML is that it lets the system developer define custom tags. With a nonvalidating parser, you certainly have that ability. You can make up any tag you want and, as long as every open tag has a matching close tag and the elements nest properly instead of overlapping, a nonvalidating SAX parser will parse the document without any problems. For example, a nonvalidating SAX parser would correctly parse and fire events for the document below.
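To see this in action, here is a minimal sketch that feeds a made-up document to a nonvalidating SAX parser and records the element events it fires. The class name WellFormedDemo and the tag names (wibble, flurb, zork, quux) are invented for illustration; they are not taken from the original listing.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class WellFormedDemo {

    // Hypothetical document: the tag names are invented and have no agreed meaning.
    static final String MADE_UP_XML =
            "<wibble><flurb colour=\"plaid\"><zork>42</zork></flurb><quux/></wibble>";

    // Parses the XML with a nonvalidating SAX parser and returns the element
    // names in the order their start tags were reported.
    static List<String> elementNames(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setValidating(false); // the default, stated here for clarity
        SAXParser parser = factory.newSAXParser();

        final List<String> names = new ArrayList<>();
        parser.parse(new InputSource(new StringReader(xml)), new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes attrs) {
                names.add(qName);
            }
        });
        return names;
    }

    public static void main(String[] args) throws Exception {
        // Well-formed, so every start tag is reported even though no DTD exists.
        System.out.println(elementNames(MADE_UP_XML));

        // Overlapping tags are not well-formed; the parser rejects them.
        try {
            elementNames("<a><b></a></b>");
        } catch (org.xml.sax.SAXException e) {
            System.out.println("Rejected: " + e.getMessage());
        }
    }
}
```

The parser is happy with any tag names as long as the document is well-formed. Only the overlapping case is rejected, and that is a well-formedness failure, not a validity failure.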

A Well-Formed, Meaningless Document










Why DTD?

Two people generally can't talk to one another unless they speak a mutually understood language. Likewise, two programs can't communicate via XML unless they agree on the XML language they use. A DTD defines a set of rules for the allowable tags and attributes in an XML document, and for the order and cardinality of the tags. Programs using the DTD must still agree on what the tags mean (the semantics), but a DTD defines the words (the tags) and the grammatical rules for a particular XML dialect.
In this section, you will learn to validate an XML file against a DTD (Document Type Definition) using the DOM APIs. A DTD defines the document structure with a list of legal elements and attributes.
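As a concrete illustration of "allowable tags, order and cardinality", the sketch below validates small documents against a DTD using the DOM APIs. Everything in it is an assumption made for the example: the class name DtdRulesDemo and the library/book/title/author/isbn vocabulary are invented, and the DTD is embedded as an internal subset so the code runs without any files.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;

import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

public class DtdRulesDemo {

    // Hypothetical DTD: a <library> holds one or more <book> elements; each
    // book has a <title> followed by an optional <author>, plus a required
    // "isbn" attribute.
    static final String DTD =
            "<!DOCTYPE library [\n"
          + "  <!ELEMENT library (book+)>\n"
          + "  <!ELEMENT book (title, author?)>\n"
          + "  <!ATTLIST book isbn CDATA #REQUIRED>\n"
          + "  <!ELEMENT title (#PCDATA)>\n"
          + "  <!ELEMENT author (#PCDATA)>\n"
          + "]>\n";

    // Validates DTD + body and returns the reported problems (empty = valid).
    static List<String> validate(String body) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setValidating(true);
        DocumentBuilder builder = factory.newDocumentBuilder();

        final List<String> problems = new ArrayList<>();
        builder.setErrorHandler(new ErrorHandler() {
            public void warning(SAXParseException e) {
                problems.add("warning: " + e.getMessage());
            }
            public void error(SAXParseException e) {
                problems.add("error: " + e.getMessage()); // validity errors land here
            }
            public void fatalError(SAXParseException e) throws SAXParseException {
                throw e; // not well-formed: give up
            }
        });
        builder.parse(new InputSource(new StringReader(DTD + body)));
        return problems;
    }

    public static void main(String[] args) throws Exception {
        // Follows the rules.
        System.out.println(validate(
            "<library><book isbn=\"0-00\"><title>T</title><author>A</author></book></library>"));
        // Wrong order: author before title.
        System.out.println(validate(
            "<library><book isbn=\"0-00\"><author>A</author><title>T</title></book></library>"));
        // Missing the required isbn attribute.
        System.out.println(validate(
            "<library><book><title>T</title></book></library>"));
    }
}
```

All three documents are well-formed, but only the first one is valid. The other two break the grammar that the DTD lays down, and the parser reports that through the error() callback.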

Program Description:

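Validating an XML file against a DTD requires the XML file and its DTD. First, construct a well-formed XML file and a DTD file that declares every element allowed in it. In the program, call setValidating(true) on the DocumentBuilderFactory; a factory left at the default (false) produces parsers that check only well-formedness. Create a DocumentBuilder from the factory and register an ErrorHandler on it with setErrorHandler(), so that validity errors are reported to your code. Then call parse(), which reads the XML file, validates it along the way, and returns a Document object tree. Note that a validating parser checks the document against the DTD named in the document's own DOCTYPE declaration, so the XML file needs one (for example, one whose system identifier is fileinfo.dtd). The call setOutputProperty(OutputKeys.DOCTYPE_SYSTEM, "fileinfo.dtd") on the Transformer object does not validate anything by itself; it only sets the DOCTYPE that is written when the tree is serialized.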
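If the XML file has no DOCTYPE declaration of its own, one possible workaround is to parse it without validation, serialize it with DOCTYPE_SYSTEM set, and then parse the serialized text again with a validating parser. The following is only a sketch of that idea. The class name AttachDtdThenValidate, the helper writeTempDtd, and the DTD content (a fileinfo element containing name and size) are assumptions made for this example, since the original fileinfo.dtd listing is not available.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;

import org.w3c.dom.Document;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

public class AttachDtdThenValidate {

    // Hypothetical DTD content, invented for this sketch.
    static final String SAMPLE_DTD =
            "<!ELEMENT fileinfo (name, size)>\n"
          + "<!ELEMENT name (#PCDATA)>\n"
          + "<!ELEMENT size (#PCDATA)>\n";

    // Writes the DTD text to a temporary file and returns its absolute URI.
    static String writeTempDtd(String dtdText) throws Exception {
        Path dtd = Files.createTempFile("fileinfo", ".dtd");
        Files.write(dtd, dtdText.getBytes(StandardCharsets.UTF_8));
        dtd.toFile().deleteOnExit();
        return dtd.toUri().toString();
    }

    // Returns the validity problems found (an empty list means valid).
    static List<String> validate(String xmlWithoutDoctype, String dtdSystemId)
            throws Exception {
        // Step 1: ordinary nonvalidating parse.
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xmlWithoutDoctype)));

        // Step 2: serialize the tree; DOCTYPE_SYSTEM makes the output start
        // with a DOCTYPE declaration that points at the DTD.
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.DOCTYPE_SYSTEM, dtdSystemId);
        StringWriter out = new StringWriter();
        t.transform(new DOMSource(doc), new StreamResult(out));

        // Step 3: parse the serialized text again, this time validating.
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setValidating(true);
        DocumentBuilder builder = factory.newDocumentBuilder();
        final List<String> problems = new ArrayList<>();
        builder.setErrorHandler(new ErrorHandler() {
            public void warning(SAXParseException e) { problems.add(e.getMessage()); }
            public void error(SAXParseException e) { problems.add(e.getMessage()); }
            public void fatalError(SAXParseException e) throws SAXParseException { throw e; }
        });
        builder.parse(new InputSource(new StringReader(out.toString())));
        return problems;
    }

    public static void main(String[] args) throws Exception {
        String dtd = writeTempDtd(SAMPLE_DTD);
        System.out.println(validate(
            "<fileinfo><name>a.txt</name><size>12</size></fileinfo>", dtd));
        System.out.println(validate(
            "<fileinfo><size>12</size></fileinfo>", dtd));
    }
}
```

The sketch passes the DTD location as an absolute URI so that the second parse does not depend on the current working directory. When the XML file already carries the right DOCTYPE, steps 1 and 2 are unnecessary and a single validating parse is enough.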

DTD for the above XML:











Code (to perform XML validation):

import java.io.File;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;

import org.w3c.dom.Document;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

public class ValidateXML {
    public static void main(String[] args) {
        try {
            // Request a validating parser; the default checks well-formedness only.
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true);
            DocumentBuilder builder = factory.newDocumentBuilder();

            // Report every problem the parser finds, with its line number.
            builder.setErrorHandler(new org.xml.sax.ErrorHandler() {
                // Fatal errors: the document is not well-formed.
                public void fatalError(SAXParseException exception) throws SAXException {
                    System.out.println("Line: " + exception.getLineNumber()
                            + "\nFatal Error: " + exception.getMessage());
                }
                // Errors: the document violates the DTD.
                public void error(SAXParseException e) throws SAXParseException {
                    System.out.println("Line: " + e.getLineNumber()
                            + "\nError: " + e.getMessage());
                }
                // Warnings
                public void warning(SAXParseException err) throws SAXParseException {
                    System.out.println("Line: " + err.getLineNumber()
                            + "\nWarning: " + err.getMessage());
                }
            });

            // Validation happens here, against the DTD named in the DOCTYPE
            // declaration of fileinfo.xml. Parsing from a File (rather than a
            // bare stream) lets a relative DTD path resolve next to the XML file.
            Document xmlDocument = builder.parse(new File("fileinfo.xml"));

            // Print the parsed tree. DOCTYPE_SYSTEM only sets the DOCTYPE
            // written in this output; it does not validate anything.
            DOMSource source = new DOMSource(xmlDocument);
            StreamResult result = new StreamResult(System.out);
            TransformerFactory tf = TransformerFactory.newInstance();
            Transformer transformer = tf.newTransformer();
            transformer.setOutputProperty(OutputKeys.DOCTYPE_SYSTEM, "fileinfo.dtd");
            transformer.transform(source, result);
        } catch (Exception e) {
            System.out.println(e.getMessage());
        }
    }
}

5 comments:

Pavel Moukhataev said...

Great, unfortunately this doesn't work. Seems that the whole internet is full of wanna-be java professionals that reuse the same wrong example.

w3c said...

Nice information, I really appreciate the way you presented.

http://www.w3cvalidation.net/

vinkal vishnoi said...

The same example is given in many sites and none of them is working.

Rakesh kumar Mohanty said...

This worked fine for me.
If you get the error "white space required" for the element "xyz",
then try the following declaration:
<!ELEMENT xyz EMPTY>