You can analyze XML documents and extract their data using SQL character-string operations, such as substring, charindex, and patindex. However, it is more efficient to use Java in SQL, and to use tools written in Java, such as XML parsers.
XML parsers can:
Check that a document is well-formed and valid.
Handle character-set issues.
Generate a Java representation of a document’s parse tree.
Build or modify a document’s parse tree.
Generate a document’s text from its parse tree.
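The following is a minimal sketch of most of these tasks in one class. It assumes the JAXP parser that ships with Java SE (javax.xml.parsers and javax.xml.transform); the section does not name a specific parser, and the class and method names (ParserTasks, isWellFormed, addCity) are hypothetical. It checks well-formedness only; checking validity would additionally need DocumentBuilderFactory.setValidating(true) and a DTD, which this sketch omits.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.StringWriter;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class ParserTasks {
    // Parses raw bytes into a DOM tree. The parser reads the encoding
    // declaration and decodes the bytes itself; malformed input throws SAXException.
    static Document parse(byte[] bytes) throws SAXException {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            builder.setErrorHandler(new DefaultHandler()); // rethrow fatal errors, print nothing
            return builder.parse(new ByteArrayInputStream(bytes));
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    // Well-formedness check: the parse either succeeds or fails.
    public static boolean isWellFormed(String xml) {
        try {
            parse(xml.getBytes(StandardCharsets.UTF_8));
            return true;
        } catch (SAXException e) {
            return false;
        }
    }

    // Parses the bytes, modifies the tree by appending a <city> element,
    // and regenerates the document text from the modified tree.
    public static String addCity(byte[] bytes, String city) {
        try {
            Document doc = parse(bytes);
            Element added = doc.createElement("city");
            added.setTextContent(city);
            doc.getDocumentElement().appendChild(added);

            Transformer out = TransformerFactory.newInstance().newTransformer();
            out.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter text = new StringWriter();
            out.transform(new DOMSource(doc), new StreamResult(text));
            return text.toString();
        } catch (SAXException | TransformerException e) {
            throw new IllegalArgumentException(e);
        }
    }
}
```

For example, passing the bytes of an ISO-8859-1 document such as `<?xml version="1.0" encoding="ISO-8859-1"?><name>René</name>` to addCity returns text containing "René" correctly decoded. No character-set handling appears in the calling code; the parser does it.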
Many XML parsers are available with a free license or are in the public domain. They normally implement two standard interfaces: the Simple API for XML (SAX) and the Document Object Model (DOM).
SAX is an interface for parsing. It specifies input sources, character sets, and routines to handle external references. While parsing, a SAX parser generates events, such as the start and end of each element, so that user routines can process the document incrementally. SAX itself does not build a parse tree; a parser that also implements DOM can instead return a DOM object that is the parse tree of the document.
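To illustrate event-driven processing, here is a small sketch of a SAX handler that collects the text of every title element as the events arrive. It keeps only the text it needs and never builds a tree. The JAXP SAXParserFactory entry point is an assumption, and the element name "title" and the class SaxTitles are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class SaxTitles {
    // User routines: the parser calls these methods as it reads the document.
    static class TitleHandler extends DefaultHandler {
        final List<String> titles = new ArrayList<>();
        private StringBuilder current; // non-null only while inside <title>

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            if (qName.equals("title")) current = new StringBuilder();
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            // One text node may arrive in several characters() calls.
            if (current != null) current.append(ch, start, length);
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if (qName.equals("title")) {
                titles.add(current.toString());
                current = null;
            }
        }
    }

    public static List<String> titles(String xml) {
        TitleHandler handler = new TitleHandler();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("cannot parse document", e);
        }
        return handler.titles;
    }
}
```

Because the handler keeps only the collected strings, memory use depends on what it keeps, not on the size of the document.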
DOM is an interface for the parse tree of an XML document. It provides facilities for stepping through and assembling a parse tree.
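The two DOM facilities can be sketched as follows: a recursive walk that steps through a parsed tree using getFirstChild and getNextSibling, and a method that assembles a new tree node by node without parsing any text. This assumes the JAXP DocumentBuilderFactory as the source of DOM objects. The class DomWalk and the order/item element names are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class DomWalk {
    // Steps through the tree depth-first, recording element names in document order.
    static void collectNames(Node node, StringBuilder out) {
        if (node.getNodeType() == Node.ELEMENT_NODE) {
            if (out.length() > 0) out.append(' ');
            out.append(node.getNodeName());
        }
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            collectNames(child, out);
        }
    }

    public static String elementNames(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            StringBuilder out = new StringBuilder();
            collectNames(doc.getDocumentElement(), out);
            return out.toString();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("cannot parse document", e);
        }
    }

    // Assembles <order><item qty="...">...</item></order> directly as DOM nodes.
    public static Document buildOrder(String item, int qty) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element order = doc.createElement("order");
            Element line = doc.createElement("item");
            line.setAttribute("qty", Integer.toString(qty));
            line.appendChild(doc.createTextNode(item));
            order.appendChild(line);
            doc.appendChild(order);
            return doc;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

For example, elementNames("<a><b><c/></b><d/></a>") visits the elements in document order and returns "a b c d".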
Applications that use the SAX and DOM interfaces are portable across XML parsers.
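As a sketch of what this portability looks like in code, the counting routine below is written only against the standard org.xml.sax interfaces. The single parser-specific step is obtaining an XMLReader, which here goes through JAXP; JAXP selects the implementation at run time, for example from the javax.xml.parsers.SAXParserFactory system property. The class PortableCount is hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

public class PortableCount {
    // Uses only the standard SAX interfaces, so any conforming XMLReader works.
    public static int countElements(XMLReader reader, String xml) {
        int[] count = {0};
        reader.setContentHandler(new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                count[0]++;
            }
        });
        try {
            reader.parse(new InputSource(new StringReader(xml)));
        } catch (SAXException | IOException e) {
            throw new IllegalArgumentException("cannot parse document", e);
        }
        return count[0];
    }

    // The one parser-specific step: JAXP returns whichever implementation is configured.
    public static XMLReader defaultReader() {
        try {
            return SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        } catch (ParserConfigurationException | SAXException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Switching to a different parser then means changing the configuration, not countElements.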