Sketching-Din Elimination of Web Page
Abstract
Problem statement: Web content mining processes large numbers of web pages in order to extract useful information or knowledge. Approach: The Web offers several types of content that can provide valuable information to users, for instance graphical data, Extensible Markup Language (XML) documents, Hyper Text Markup Language (HTML) documents and plain text. Only part of this information is useful for a given purpose; the remainder is noise. Results: In this study, we propose an approach for removing noise from a given web page in order to improve the performance of web content mining. First, the web page information is divided into blocks. Conclusion: Duplicate blocks are then removed using sketching. The performance results confirm the effectiveness of the proposed approach in classifying the main blocks.
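The abstract does not spell out the sketching scheme, so the following is only a minimal illustration of the general idea, assuming that each block is plain text, that a sketch is a MinHash signature over word shingles, and that a block is dropped when its sketch closely matches one already kept. The function names, the shingle size, the number of hashes and the threshold are all hypothetical choices, not the authors' method.

```python
import hashlib


def shingles(text, k=3):
    """Return the set of k-word shingles of a text block (lowercased)."""
    words = text.lower().split()
    if not words:
        return set()
    if len(words) < k:
        return {" ".join(words)}
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}


def sketch(text, num_hashes=32, k=3):
    """MinHash sketch: for each seeded hash, keep the minimum over all shingles.

    Returns None for an empty block. The parameters are illustrative assumptions.
    """
    sh = shingles(text, k)
    if not sh:
        return None
    signature = []
    for seed in range(num_hashes):
        signature.append(min(
            int.from_bytes(
                hashlib.blake2b(f"{seed}:{s}".encode(), digest_size=8).digest(),
                "big",
            )
            for s in sh
        ))
    return tuple(signature)


def similarity(a, b):
    """Fraction of matching sketch positions (estimates Jaccard similarity)."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def remove_duplicate_blocks(blocks, threshold=0.8):
    """Keep the first occurrence of each block; drop later near-duplicates."""
    kept, kept_sketches = [], []
    for block in blocks:
        sk = sketch(block)
        if sk is None:
            continue  # skip empty blocks
        if any(similarity(sk, other) >= threshold for other in kept_sketches):
            continue  # sketch matches a block already kept
        kept.append(block)
        kept_sketches.append(sk)
    return kept


blocks = [
    "Home About Contact Login",
    "Web content mining extracts useful information from pages",
    "home about contact login",
]
main_blocks = remove_duplicate_blocks(blocks)
```

Here the third block differs from the first only in letter case, so its sketch is identical and it is discarded, leaving the first two blocks in `main_blocks`. How the page is divided into blocks in the first place is outside the scope of this sketch.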
DOI: https://doi.org/10.3844/jcssp.2011.1888.1893
Copyright: © 2011 P. Sivakumar and R. M.S. Parvathi. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Keywords
- Web mining
- web content mining
- web cleaning
- duplicate blocks
- web page information
- graphical data
- world wide web
- Web Structural Mining (WSM)
- Web Usage Mining (WUM)