Retrieving information chunks from a repository of documents collected from heterogeneous sources
XML documents are generated from heterogeneous sources. They may share the same data under different schemas, which makes it difficult to retrieve information from them. In this paper we propose a new technique that, first, minimizes the size of the XML documents by reducing the redundancy of their structural part and generates a repository for these documents, and, second, relaxes and decomposes the XPath query in two stages: the first determines the relevant documents, and the second determines the relevant parts within those documents. The results show significantly better precision and recall than exact XPath queries.
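The two ideas in the abstract, a compact structural summary per document and a two-stage relaxed XPath evaluation, can be illustrated with a minimal sketch. This is not the authors' algorithm. It assumes that "reducing structural redundancy" can be approximated by keeping only the set of distinct root-to-node label paths of each document (so repeated records collapse into one path), and that "relaxing" a simple absolute child-axis query such as `/book/title` means treating every `/` as `//`. Stage 1 checks only the summaries to pick relevant documents; stage 2 walks those documents to extract the matching elements. All function names and the sample documents are hypothetical.

```python
import xml.etree.ElementTree as ET

def label_paths(elem, prefix=()):
    """Yield (path, element) for every element; path is the tuple of tags from the root."""
    path = prefix + (elem.tag,)
    yield path, elem
    for child in elem:
        yield from label_paths(child, path)

def summarize(root):
    """Hypothetical structure summary: the set of distinct label paths.
    Repeated sibling structures (e.g. many <book> records) collapse to one entry."""
    return {path for path, _ in label_paths(root)}

def relax(xpath):
    """Split a simple absolute child-axis XPath like '/a/b/c' into step labels.
    Each step is later matched as a descendant of the previous one ('/' -> '//')."""
    return [step for step in xpath.split("/") if step]

def is_subsequence(needles, haystack):
    """True if needles occur in haystack in the same order (gaps allowed)."""
    it = iter(haystack)
    return all(n in it for n in needles)

def matches(path, steps):
    """Relaxed match: the path ends at the last step, earlier steps appear as ancestors in order."""
    return bool(steps) and path[-1] == steps[-1] and is_subsequence(steps[:-1], path[:-1])

def build_repository(docs):
    """Parse each document once and store it with its structure summary."""
    repo = {}
    for doc_id, text in docs.items():
        root = ET.fromstring(text)
        repo[doc_id] = (root, summarize(root))
    return repo

def relevant_documents(repository, xpath):
    """Stage 1: consult only the summaries to find documents that can answer the query."""
    steps = relax(xpath)
    return [doc_id for doc_id, (_, summary) in repository.items()
            if any(matches(p, steps) for p in summary)]

def relevant_chunks(root, xpath):
    """Stage 2: inside one relevant document, return the elements matching the relaxed query."""
    steps = relax(xpath)
    return [elem for path, elem in label_paths(root) if matches(path, steps)]

def query(repository, xpath):
    """Run both stages and return the text of each matching chunk per document."""
    result = {}
    for doc_id in relevant_documents(repository, xpath):
        root, _ = repository[doc_id]
        result[doc_id] = [e.text for e in relevant_chunks(root, xpath)]
    return result

# Hypothetical documents holding similar data under different schemas.
DOCS = {
    "d1": "<library><book><title>XML Basics</title></book>"
          "<book><title>XPath</title></book></library>",
    "d2": "<catalog><section><book><info><title>Schemas</title></info></book></section></catalog>",
    "d3": "<store><cd><title>Album</title></cd></store>",
}

if __name__ == "__main__":
    repo = build_repository(DOCS)
    print(query(repo, "/book/title"))
```

Evaluated exactly, `/book/title` matches nothing here because no document has `book` as its root, whereas the relaxed query finds the titles in `d1` and `d2` and correctly skips `d3`. Checking the summaries first means stage 1 never touches the full trees; the trade-off is that this naive relaxation can also admit false positives, which is the precision/recall balance the abstract reports on.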
Copyright (c) 2015 INTERNATIONAL JOURNAL OF COMPUTERS & TECHNOLOGY
This work is licensed under a Creative Commons Attribution 4.0 International License.
Authors retain the copyright of their manuscripts, and all Open Access articles are distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided that the original work is properly cited.