Tame the Information Tangle
XML data management systems
By Paul Sholtz
New Architect
October 2002
Two major limitations hinder Web information queries. First, data contained within HTML pages must be preprocessed by custom, page-specific parsers before meaningful information can be extracted. Second, data in traditional database management systems is generally accessible on the Web only through simple, inflexible forms-based interfaces.
Encoding information in XML and exposing it on the Web will help overcome these hurdles and enable fine-tuned, database-like queries on a global scale. Of course, if all the world's data is to be encoded in XML, we'll need more efficient ways to store and manage large volumes of XML data. To address that need, a new breed of document storage and management systems has appeared, optimized specifically for publishing XML documents on the Web.
Representing Documents in XML
XML is a powerful and flexible standard, so it's important to define clearly how you want to use your XML data before you decide on a document management system. You may need to persist XML data for a number of purposes, and different storage systems may exhibit different performance characteristics depending on the structure of your XML, the frequency of document updates and retrievals, and so forth. For example, XML is commonly used for machine-to-machine data interchange and processing, where no human ever reads the documents. Keep in mind that the methods used to persist and query these types of XML documents may be very different from those used for XML documents designed for Web publishing.
The term "document-centric XML" is used to describe XML documents meant for human consumption. The document-centric approach is used to create XML encodings for books, email messages, advertisements, Web pages, XHTML documents, and many other types of semi-structured and unstructured document data.
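To make the distinction concrete, here is a minimal Python sketch that contrasts a machine-oriented record with a document-centric fragment. Both XML samples, their element names, and the two helper functions are hypothetical illustrations rather than anything drawn from a particular product. The sketch treats mixed content (text interleaved with child elements) as a rough signal that a document was written for human readers; that is a rule of thumb, not a formal test.

```python
import xml.etree.ElementTree as ET

# Hypothetical machine-to-machine record: regular structure, no prose.
DATA_CENTRIC = """<order id="1001">
  <sku>AB-42</sku>
  <quantity>3</quantity>
</order>"""

# Hypothetical document-centric fragment: prose with inline markup.
DOCUMENT_CENTRIC = (
    "<para>XML is a <emph>flexible</emph> standard for "
    "<term>semi-structured</term> data.</para>"
)


def has_mixed_content(element):
    """Return True if any element holds both real text and child elements."""
    for node in element.iter():
        children = list(node)
        if not children:
            continue  # leaf elements cannot have mixed content
        own_text = (node.text or "").strip()
        tail_text = "".join((child.tail or "").strip() for child in children)
        if own_text or tail_text:
            return True
    return False


def reading_text(element):
    """Flatten an element to the text a human reader would see."""
    return "".join(element.itertext())


order = ET.fromstring(DATA_CENTRIC)
para = ET.fromstring(DOCUMENT_CENTRIC)

print(has_mixed_content(order))  # False
print(has_mixed_content(para))   # True
print(reading_text(para))
```

A storage system tuned for the first kind of document can map elements to fields fairly directly, while the second kind depends on element order and interleaved text being preserved exactly, which is one reason the two cases may call for different persistence and query strategies.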