A New Filtering Algorithm for Duplicate Document Based on Concept Analysis

Ahmad M. Hasnah

doi:10.3844/jcssp.2006.434.440

Research Article Open Access

Abstract

Databases and web pages currently contain a huge number of duplicate documents. It is therefore essential to have a filter that can be embedded, for instance, within an information retrieval system such as a search engine, to prevent references to redundant documents from appearing on the screen in reply to the user's query. Such a filter saves the user time and increases user satisfaction. In this study, we propose a new algorithm, based on the principle of concept analysis, that can act as a filter for duplicate documents. It can be applied to a collection of documents or to databases and reduces their storage space by eliminating redundant documents without losing knowledge. Our experiments show that this algorithm increases the precision of the information retrieval system and improves its performance.

Journal of Computer Science

Volume 2 No. 5, 2006, 434-440

DOI: https://doi.org/10.3844/jcssp.2006.434.440

Submitted On: 16 February 2006 Published On: 31 May 2006

How to Cite: Hasnah, A. M. (2006). A New Filtering Algorithm for Duplicate Document Based on Concept Analysis. Journal of Computer Science, 2(5), 434-440. https://doi.org/10.3844/jcssp.2006.434.440

Copyright: © 2006 Ahmad M. Hasnah. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Keywords

Duplicate document
concept analysis
information retrieval
information filtering