- Number of slides: 35
Resources: Louise Lane & Kalpdrum Passi, Sanjay Madria and Mukesh Mohania, “A Model for XML Schema Integration”, and my research in Fall 2001 with Dr. Madria
XML is a markup language for documents containing structured information. A markup language is a mechanism to identify structures in a document. XML documents are self-describing, so XML provides a platform-independent means to describe data and can therefore transport data from one platform to another. XML documents can be created and used by applications.
E-commerce applications use data from different sources, and this data needs to be integrated. A mediated schema is created to represent a particular application domain, and the data sources are mapped as views over the mediated schema.
Business applications need to exchange data with one another. The data should be independent of its representation and of the platform. XML is also used when two or more organizations merge. When organizations merge, interoperability among their documents is necessary, and it can be achieved using XML integration.
XML Schema is recommended by the W3C as the standard schema language for validating documents. XML Schema has stronger expressive power than DTD for the purpose of data exchange and integration from various sources of data.
The Tukwila data integration system uses a mediated schema to integrate data from different sources. The user poses a query over the mediated schema, and the data integration system reformulates the query over the data sources and executes it. Tukwila uses a query reformulator and optimizer to query large amounts of data efficiently. The MiniCon algorithm is used to map the query from the mediated schema to the data sources. Tukwila uses an x-scan operator that can query streaming XML data.
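The reformulation step can be illustrated with a minimal sketch. This is not the MiniCon algorithm; it only shows a query written against mediated field names being rewritten into each source's own field names. The source names, field names and records are hypothetical.

```python
# Hypothetical mappings: mediated field name -> field name used by each source.
SOURCE_MAPPINGS = {
    "store_a": {"title": "name", "price": "cost"},
    "store_b": {"title": "book_title", "price": "price_usd"},
}

# Hypothetical in-memory data standing in for the remote sources.
SOURCE_DATA = {
    "store_a": [{"name": "XML Basics", "cost": 20},
                {"name": "Data Integration", "cost": 45}],
    "store_b": [{"book_title": "Schema Design", "price_usd": 30}],
}


def reformulate(fields, mapping):
    """Rewrite mediated field names into one source's field names."""
    return {field: mapping[field] for field in fields}


def query_mediated(select, where_field, max_value):
    """Answer a mediated-schema query by rewriting it for every source."""
    results = []
    for source, mapping in SOURCE_MAPPINGS.items():
        local_select = reformulate(select, mapping)
        local_where = mapping[where_field]
        for record in SOURCE_DATA[source]:
            if record[local_where] <= max_value:
                # Report results under the mediated names again.
                results.append({m: record[l] for m, l in local_select.items()})
    return results


cheap_titles = query_mediated(["title"], "price", 30)
```

The user only ever sees `title` and `price`; the per-source names stay hidden behind the mappings.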
Querying techniques such as XML-QL and XQL need the complete XML document to be downloaded before it can be queried.
Tukwila's x-scan operator matches regular path expression patterns from the query, returning results in pipelined fashion as the data streams across the network.
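The idea of returning matches while the document is still arriving can be sketched with Python's standard incremental parser. This is an illustration only, not Tukwila's x-scan: it matches a fixed tag path rather than a full regular path expression, and the sample document is hypothetical.

```python
import io
import xml.etree.ElementTree as ET


def stream_matches(source, path):
    """Yield the text of each element whose tag path from the root equals `path`."""
    target = path.split("/")
    stack = []  # tags of the currently open elements
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            stack.append(elem.tag)
        else:
            if stack == target:
                yield elem.text  # emitted before the rest of the input is read
            stack.pop()
            elem.clear()  # drop content that is no longer needed


# Hypothetical document; in practice `source` could be a network stream.
DOC = b"<catalog><book><title>A</title></book><book><title>B</title></book></catalog>"
titles = list(stream_matches(io.BytesIO(DOC), "catalog/book/title"))
```

Because `stream_matches` is a generator, a consumer can start processing the first result without waiting for the whole document.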
The automated integration of XML schemas is beneficial to both the traditional forms of view integration and database integration. An integrated schema forms the basis for a valid query language over a particular set of XML documents. Because the schemas to be integrated currently validate a set of existing XML documents, data integrity and continued document delivery are chief concerns of the integration process.
XML Schema requires the use of namespaces to uniquely identify schema structures (elements, attributes, datatypes, etc.). The name of each structure is prefaced by a namespace prefix, which identifies the namespace in which the structure is defined. A practical example of schema integration is when two companies merge.
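How a prefix identifies a namespace can be shown with a small sketch that resolves the prefix of each element's declared type to its namespace URI. The schema fragment and the `http://example.com/company` namespace are hypothetical.

```python
import io
import xml.etree.ElementTree as ET

XS_ELEMENT = "{http://www.w3.org/2001/XMLSchema}element"

# Hypothetical schema fragment using two namespaces.
XSD = b"""<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:co="http://example.com/company"
           targetNamespace="http://example.com/company">
  <xs:element name="employee" type="co:EmployeeType"/>
  <xs:element name="salary" type="xs:decimal"/>
</xs:schema>"""


def declared_elements(xsd_bytes):
    """Map each element name to the (namespace URI, local name) of its type."""
    prefixes = {}
    declared = {}
    events = ET.iterparse(io.BytesIO(xsd_bytes), events=("start-ns", "start"))
    for event, payload in events:
        if event == "start-ns":
            prefix, uri = payload  # a namespace declaration
            prefixes[prefix] = uri
        elif payload.tag == XS_ELEMENT:
            prefix, _, local = payload.get("type").partition(":")
            declared[payload.get("name")] = (prefixes[prefix], local)
    return declared


element_types = declared_elements(XSD)
```

Two schemas may both define a type called `EmployeeType`; the namespace URI is what keeps them apart during integration.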
Pre-Integration: In this phase, element, attribute and datatype definitions are extracted by parsing the actual schema document.
Comparison: In this phase, the correspondences between elements and attributes are determined either by using semantic learning or by using human interaction.
Integration: In this phase, conflicts that exist between the corresponding elements and/or attributes, such as naming conflicts, datatype conflicts and structural conflicts, are resolved.
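The three phases can be sketched end to end on two tiny schemas. The conflict-resolution rules here are assumptions made for illustration (keep the first schema's name on a naming conflict, fall back to `xs:string` on a datatype conflict); they are not taken from the paper, and the schemas and correspondences are hypothetical.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"

SCHEMA_A = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="employee_id" type="xs:integer"/>
  <xs:element name="name" type="xs:string"/>
</xs:schema>"""

SCHEMA_B = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="empNo" type="xs:string"/>
  <xs:element name="hireDate" type="xs:date"/>
</xs:schema>"""


def extract_definitions(xsd_text):
    """Pre-integration: parse a schema and collect element name -> type."""
    root = ET.fromstring(xsd_text)
    return {e.get("name"): e.get("type") for e in root.iter(XS + "element")}


def integrate(defs_a, defs_b, correspondences):
    """Integration: merge definitions given correspondences (name in A -> name in B)."""
    merged = dict(defs_a)
    matched_b = set(correspondences.values())
    for name_a, name_b in correspondences.items():
        # Naming conflict: keep schema A's name (arbitrary choice for this sketch).
        # Datatype conflict: fall back to xs:string (assumed rule).
        if defs_a[name_a] != defs_b[name_b]:
            merged[name_a] = "xs:string"
    for name_b, type_b in defs_b.items():
        if name_b not in matched_b:
            merged[name_b] = type_b  # element with no counterpart in A
    return merged


# Comparison: supplied by hand here, standing in for the user or a learner.
correspondences = {"employee_id": "empNo"}
global_defs = integrate(extract_definitions(SCHEMA_A),
                        extract_definitions(SCHEMA_B),
                        correspondences)
```

Structural conflicts are not covered by this sketch, since it treats every element as a flat name and type pair.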
The Correspondences table contains the information about the corresponding elements/attributes. An entry in the Dependencies table denotes the dependency of an element on other elements/attributes. The elements/attributes are integrated only after their dependencies are integrated.
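The rule that an element is integrated only after its dependencies amounts to a topological ordering of the Dependencies table. A minimal sketch, assuming Python 3.9+ for `graphlib` and a hypothetical table:

```python
from graphlib import TopologicalSorter

# Hypothetical Dependencies table: element -> elements/attributes it depends on.
dependencies = {
    "company": {"department"},
    "department": {"employee", "dept_name"},
    "employee": {"emp_name", "emp_id"},
}

# static_order() lists every node after all of the nodes it depends on.
integration_order = list(TopologicalSorter(dependencies).static_order())
```

The order among independent items (for example `emp_name` and `emp_id`) is not fixed; only the constraint that dependencies come first is guaranteed. A cyclic table raises `graphlib.CycleError`.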
Once the integration process is completed, the global schema in XSDM notation is used to construct the global XML schema document. The construction of the XML schema document is a straightforward process because all the data about the schema is present in the XSDM notation.
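Constructing the schema document from an already integrated model can be sketched as follows. The XSDM notation itself is not shown in these slides, so a plain dictionary of element name to type stands in for it here; that stand-in and the sample elements are assumptions.

```python
import xml.etree.ElementTree as ET

XS_URI = "http://www.w3.org/2001/XMLSchema"
ET.register_namespace("xs", XS_URI)  # serialize with the usual xs: prefix


def build_schema(global_elements):
    """Serialize a name -> type mapping as an XML Schema document string."""
    schema = ET.Element(f"{{{XS_URI}}}schema")
    for name, type_name in global_elements.items():
        ET.SubElement(schema, f"{{{XS_URI}}}element",
                      {"name": name, "type": type_name})
    return ET.tostring(schema, encoding="unicode")


# Hypothetical integrated model standing in for the XSDM notation.
xsd_text = build_schema({"employee_id": "xs:string", "hireDate": "xs:date"})
```

Since every definition is already present in the model, the construction is a single pass over it with no further decisions to make.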
This method is useful when a required global schema is not present. The global XML schema obtained is complete, minimal and understandable. Human interaction is required only to a limited extent. Even when the local schemas are large and complex, the global schema can be obtained efficiently.
User interaction is required; the task cannot be done using semantic learning alone. The method is not successful in resolving all key conflicts. Complete knowledge of the data is required to resolve them. The method has no cross-check on the user's input. The process may result in a non-minimal schema if the user does not recognize all the correspondences.
This method is successful in integrating schema documents. The method explained is implementable.