Research Article Open Access

CTSS: A Tool for Efficient Information Extraction with Soft Matching Rules for Text Mining

A. Christy and P. Thambidurai

Abstract

The abundance of digitally available information in the modern world has created a demand for structured information. Text mining, the problem of discovering useful information in unstructured text, has attracted the attention of researchers. The role of Information Extraction (IE) software is to identify relevant information in texts, extracting it from a variety of sources and aggregating it into a single view. Existing information extraction systems depend on particular corpora and have poor recall. Developing a domain-independent system while improving recall is therefore an important challenge for IE. In this research, the authors propose a domain-independent algorithm for information extraction, called SOFTRULEMINING, that extracts the aim, methodology and conclusion from technical abstracts. The algorithm is implemented by combining a trigram model with soft matching rules. A tool, CTSS, was built using SOFTRULEMINING and tested on technical abstracts from www.computer.org and www.ansinet.org; the tool was found to improve recall, and therefore precision, in comparison with other search engines.
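The abstract reports its results in terms of recall and precision. As a reminder of how these two measures are usually computed for an extraction task, the following is a minimal sketch using the standard set-based definitions. The function name `precision_recall` and the sample items are hypothetical illustrations, not part of CTSS or SOFTRULEMINING.

```python
def precision_recall(extracted, relevant):
    """Return (precision, recall) for one extraction run.

    extracted: items the system returned (hypothetical example data)
    relevant:  items a human annotator marked as correct
    """
    extracted, relevant = set(extracted), set(relevant)
    true_positives = len(extracted & relevant)
    # Precision: fraction of returned items that are correct.
    precision = true_positives / len(extracted) if extracted else 0.0
    # Recall: fraction of correct items that were returned.
    recall = true_positives / len(relevant) if relevant else 0.0
    return precision, recall


if __name__ == "__main__":
    system_output = ["aim", "methodology", "unrelated sentence"]
    gold_standard = ["aim", "methodology", "conclusion", "dataset"]
    p, r = precision_recall(system_output, gold_standard)
    print(f"precision={p:.2f} recall={r:.2f}")
```

In this toy run, two of the three returned items are correct and two of the four relevant items are found, so a system can score well on one measure while scoring poorly on the other.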

Journal of Computer Science
Volume 4 No. 5, 2008, 375-381

DOI: https://doi.org/10.3844/jcssp.2008.375.381

Submitted On: 21 August 2008 Published On: 31 May 2008

How to Cite: Christy, A. & Thambidurai, P. (2008). CTSS: A Tool for Efficient Information Extraction with Soft Matching Rules for Text Mining. Journal of Computer Science, 4(5), 375-381. https://doi.org/10.3844/jcssp.2008.375.381

Keywords

  • Parsing
  • trigram model
  • soft matching
  • information extraction
  • recall
  • precision