Consider a purchase order application. Customers submit orders, each identified by a Date and a CustomerID. Each order lists one or more items, and each item has an ItemID, ItemName, Quantity, and Units.
The data for such an order might be displayed on a screen as follows:
ORDER
Date: July 4, 1999
Customer ID: 123
Customer Name: Acme Alpha
Items:
Item ID | Item Name | Quantity |
---|---|---|
987 | Coupler | 5 |
654 | Connector | 3 dozen |
579 | Clasp | 1 |
This data indicates that the customer named “Acme Alpha”, whose Customer ID is “123”, submitted an order on 1999/07/04 for couplers, connectors, and clasps.
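The order described above can also be pictured as a small data model, independent of how it is displayed. The following Python sketch is only an illustration: the class names, the field names, and the "each" default for units are assumptions, not part of the application described here.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List


@dataclass
class Item:
    item_id: int
    item_name: str
    quantity: int
    # Assumed default: the display shows no unit for plain counts.
    units: str = "each"


@dataclass
class Order:
    order_date: date
    customer_id: int
    customer_name: str
    items: List[Item] = field(default_factory=list)


# The sample order shown on the screen above.
order = Order(
    order_date=date(1999, 7, 4),
    customer_id=123,
    customer_name="Acme Alpha",
    items=[
        Item(987, "Coupler", 5),
        Item(654, "Connector", 3, units="dozen"),
        Item(579, "Clasp", 1),
    ],
)
```

Note that the model keeps the quantity and its units apart, whereas the screen display folds them into a single string such as "3 dozen".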
The HTML text for this display of order data is as follows:
<html> <body>
<p>ORDER
<p>Date: July 4, 1999
<p>Customer ID: 123
<p>Customer Name: Acme Alpha
<p>Items:</p>
<table bgcolor=white align=left border="3" cellpadding=3>
<tr><td><B>Item ID </B></td> <td><B>Item Name </B></td> <td><B>Quantity </B></td></tr>
<tr><td>987</td> <td>Coupler</td> <td>5</td></tr>
<tr><td>654</td> <td>Connector</td> <td>3 dozen</td></tr>
<tr><td>579</td> <td>Clasp</td> <td>1</td></tr>
</table>
</body> </html>
This HTML text has certain limitations:
It contains both data and formatting specifications.
The data is the Date, the Customer ID and Customer Name, and the various Item IDs, Item Names, and Quantities.
The formatting specifications are the indications for type style (<b>....</b>), color (bgcolor=white), and layout (<table>....</table>), and also the supplementary field names, such as “Customer Name”.
The structure of HTML documents is not well suited for extracting data.
Some elements, such as tables, require strictly bracketed opening and closing tags, but other elements, such as paragraph tags (“<p>”), have optional closing tags.
Some elements, such as paragraph tags (“<p>”), are used for many sorts of data, so it is difficult to distinguish between a “123” that is a Customer ID and a “123” that is an Item ID without specialized inference from surrounding field names.
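The extraction problem can be made concrete with a short sketch. The Python below uses the standard library's `html.parser` on a cleaned-up copy of the order page. The class name `TextChunks` and the label-matching logic are illustrative assumptions. The point is that the tags contribute nothing to identifying a value, so the program must fall back on label text and cell position.

```python
from html.parser import HTMLParser

HTML_TEXT = """<html> <body>
<p>ORDER
<p>Date: July 4, 1999
<p>Customer ID: 123
<p>Customer Name: Acme Alpha
<p>Items:</p>
<table bgcolor=white align=left border="3" cellpadding=3>
<tr><td><B>Item ID </B></td> <td><B>Item Name </B></td> <td><B>Quantity </B></td></tr>
<tr><td>987</td> <td>Coupler</td> <td>5</td></tr>
<tr><td>654</td> <td>Connector</td> <td>3 dozen</td></tr>
<tr><td>579</td> <td>Clasp</td> <td>1</td></tr>
</table> </body> </html>"""


class TextChunks(HTMLParser):
    """Collect every non-empty run of text, ignoring the tags around it."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.chunks.append(text)


parser = TextChunks()
parser.feed(HTML_TEXT)
parser.close()

# The markup does not say which paragraph holds the customer ID, so the
# only handle is the human-readable label in front of the value.
customer_id = next(
    chunk.split(":", 1)[1].strip()
    for chunk in parser.chunks
    if chunk.startswith("Customer ID:")
)

# Table cells carry no label at all: "987" is known to be an Item ID only
# because of its position under the "Item ID" header cell.
start = parser.chunks.index("Item ID")
headers = parser.chunks[start:start + 3]
cells = parser.chunks[start + 3:]
rows = [dict(zip(headers, cells[i:i + 3])) for i in range(0, len(cells), 3)]
```

Both lookups break as soon as the page wording or column order changes, which is the kind of specialized inference the list above refers to.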
This merging of data and formatting, together with the lack of strict phrase structure, makes it difficult to adapt HTML documents to different presentation styles and to use them for data interchange and storage. XML is similar to HTML, but includes restrictions and extensions that address these drawbacks.
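As a rough illustration of the difference, the same order might be written with tags that name the data rather than its appearance. The element and attribute names below (`Order`, `Item`, `units`, and so on) are hypothetical, chosen only for this sketch. The Python code reads the document with the standard library's `xml.etree.ElementTree`.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: these element and attribute names are invented for
# illustration and are not defined by the text above.
ORDER_XML = """
<Order>
  <Date>1999-07-04</Date>
  <CustomerID>123</CustomerID>
  <CustomerName>Acme Alpha</CustomerName>
  <Item>
    <ItemID>987</ItemID>
    <ItemName>Coupler</ItemName>
    <Quantity>5</Quantity>
  </Item>
  <Item>
    <ItemID>654</ItemID>
    <ItemName>Connector</ItemName>
    <Quantity units="dozen">3</Quantity>
  </Item>
  <Item>
    <ItemID>579</ItemID>
    <ItemName>Clasp</ItemName>
    <Quantity>1</Quantity>
  </Item>
</Order>
"""

root = ET.fromstring(ORDER_XML.strip())

# Each value is found by the name of its element, not by a display label
# or by its position on the page.
customer_id = root.findtext("CustomerID")
item_ids = [item.findtext("ItemID") for item in root.findall("Item")]
units = root.findall("Item")[1].find("Quantity").get("units")
```

Because every element must be explicitly closed and is named for what it contains, a Customer ID of "123" and an Item ID of "123" would no longer be confused, and no formatting details are mixed in with the data.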