Until recently, there were two major types of data available in the information universe:
Unstructured data: this includes plain text, web pages, e-mails, and similar documents. The major drawback of this format is that documents lack a well-formed structure.
Structured data: this includes information that can be normalized into a set of values associated with specific fields. Records have a rigid structure to which they must conform. The major drawback is that document schemas leave no room for flexibility.

Recently, a third type of data, known as semi-structured data, has become popular. The most common format for semi-structured data today is the eXtensible Markup Language (XML). The World Wide Web Consortium (W3C) has developed standardized languages for querying XML data. Semi-structured documents address the disadvantages of both structured and unstructured documents by allowing some structure and hierarchy to be embedded within unstructured information.
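As a rough illustration of structure embedded within otherwise free-form text, the sketch below parses a small XML document and queries it with a path expression. The document contents, element names, and the helper functions `authors_with_year` and `body_text` are hypothetical and chosen only for this example. Python's standard `xml.etree.ElementTree` module supports only a limited subset of XPath, so this should be read as a minimal sketch rather than a demonstration of the full W3C query languages.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: each article has a fixed field (author),
# an optional attribute (year), and free-form prose in its body.
DOC = """
<library>
  <article year="2001">
    <author>A. Writer</author>
    <body>Free-form prose with an <term>embedded</term> tag.</body>
  </article>
  <article>
    <author>B. Writer</author>
    <body>This record omits the year attribute entirely.</body>
  </article>
</library>
""".strip()


def authors_with_year(xml_text):
    """Return the authors of articles that carry a year attribute."""
    root = ET.fromstring(xml_text)
    # "article[@year]" selects child <article> elements that have
    # a year attribute; articles without one are simply skipped.
    return [a.findtext("author") for a in root.findall("article[@year]")]


def body_text(xml_text):
    """Return the plain text of each article body, ignoring inline tags."""
    root = ET.fromstring(xml_text)
    return ["".join(a.find("body").itertext()) for a in root.findall("article")]


if __name__ == "__main__":
    print(authors_with_year(DOC))
    print(body_text(DOC))
```

The second article has no `year` attribute and the document still parses, which is the flexibility a rigid record schema lacks. The `author` element, meanwhile, can be queried by name, which is not possible in plain text.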
Last updated: May 01, 2002.