Introduction

 


Until recently, there were two major types of data available in the information universe: 

Unstructured Data

This includes plain text, web pages, e-mail, and similar material. The major drawback of this format is that documents carry no explicit structure.

Structured Data

This includes information that can be normalized into a set of values associated with specific fields; records must conform to a rigid structure. The major drawback is that document schemas leave no room for flexibility.

Recently, a new type of data known as semi-structured data has become popular.

The most common format for semi-structured data today is the eXtensible Markup Language (XML). The World Wide Web Consortium (W3C) has developed standardized languages, such as XPath, for querying XML data.

Semi-structured documents address the disadvantages of both structured and unstructured documents by allowing some structure and hierarchy to be embedded within otherwise unstructured information.
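As a rough illustration of that idea, the sketch below parses a small XML document in which fixed fields wrap free text, and one record omits a field entirely. The mailbox document, its element names, and the helper functions are all hypothetical, invented for this example; the queries use the limited XPath subset in Python's standard xml.etree.ElementTree module rather than a full XML database engine.

```python
import xml.etree.ElementTree as ET

# Hypothetical mailbox: each <email> has some fixed fields, but the second
# record omits <subject>, and <body> holds free text with embedded markup.
DOC = """
<mailbox>
  <email id="1">
    <from>alice@example.org</from>
    <subject>XML retrieval</subject>
    <body>Free text with an <term>embedded</term> markup element.</body>
  </email>
  <email id="2">
    <from>bob@example.org</from>
    <body>No subject here; the format tolerates missing fields.</body>
  </email>
</mailbox>
""".strip()

root = ET.fromstring(DOC)


def subjects(mailbox):
    """Return the subject of every e-mail, or None where the field is absent."""
    return [email.findtext("subject") for email in mailbox.findall("email")]


def senders_with_subject(mailbox):
    """Return senders of e-mails that have a <subject> child.

    The [subject] predicate selects elements that contain a child of that name.
    """
    return [email.findtext("from") for email in mailbox.findall("./email[subject]")]


def body_text(email):
    """Flatten the body to plain text, ignoring the markup embedded in it."""
    return "".join(email.find("body").itertext())
```

A structured query (which senders supplied a subject) and an unstructured one (the plain text of a body) can both be answered from the same document, which is the flexibility the paragraph above describes.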


 
Last updated: May 01, 2002.