Evaluating the Efficiency of CPUs, GPUs and FPGAs on a Near-Duplicate Document Detection Via OpenCL

Ercan Canhasi

doi:10.3844/jcssp.2018.699.704

Research Article Open Access

Ercan Canhasi¹

¹ Gjirafa, Inc. Rr. Rexhep Mala, 28A, Kosovo

Abstract

Discovering identical or near-identical items is important in many applications, such as Web crawling, because it drastically reduces text processing costs. Simhash is a widely used technique that assigns a bit-string identity to a text such that similar texts have similar identities. In this study, a real-time solution for simhash calculation in OpenCL is presented. We also show how it can be utilized on multiple CPUs, GPUs and FPGAs. Our results indicate that, ultimately, the computation realized on the FPGA through OpenCL provides significant power advantages.
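
To make the simhash idea concrete, the following is a minimal Python sketch of the general technique, not the OpenCL implementation evaluated in the article. The tokenization (lowercased whitespace splitting), the per-token hash (the leading bytes of MD5), the unit token weights, the 64-bit width and the function names `simhash` and `hamming_distance` are all assumptions chosen for illustration.

```python
import hashlib


def simhash(text: str, bits: int = 64) -> int:
    """Return a `bits`-wide simhash fingerprint of `text` (illustrative sketch).

    Assumes `bits` is a multiple of 8 and at most 128 (the MD5 digest size).
    """
    counts = [0] * bits
    for token in text.lower().split():
        # Assumption: the per-token hash is taken from the leading MD5 bytes.
        digest = hashlib.md5(token.encode("utf-8")).digest()
        h = int.from_bytes(digest[: bits // 8], "big")
        # Each token votes +1 for every set bit and -1 for every clear bit.
        for i in range(bits):
            counts[i] += 1 if (h >> i) & 1 else -1
    # A fingerprint bit is 1 where the positive votes outnumber the negative.
    fingerprint = 0
    for i in range(bits):
        if counts[i] > 0:
            fingerprint |= 1 << i
    return fingerprint


def hamming_distance(a: int, b: int) -> int:
    """Count the bit positions in which two fingerprints differ."""
    return bin(a ^ b).count("1")
```

Two texts would then be treated as near-duplicates when `hamming_distance(simhash(a), simhash(b))` falls below some small threshold; the threshold itself is application-dependent and is not specified here. The inner per-bit loop is independent across bit positions and tokens, which is the kind of structure that lends itself to data-parallel execution.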

Journal of Computer Science

Volume 14 No. 5, 2018, 699-704

DOI: https://doi.org/10.3844/jcssp.2018.699.704

Submitted On: 8 June 2017
Published On: 28 April 2018

How to Cite: Canhasi, E. (2018). Evaluating the Efficiency of CPUs, GPUs and FPGAs on a Near-Duplicate Document Detection Via OpenCL. Journal of Computer Science, 14(5), 699-704. https://doi.org/10.3844/jcssp.2018.699.704

Copyright: © 2018 Ercan Canhasi. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Keywords

Simhash
OpenCL
CPU
GPU
FPGA
Xilinx
SDAccel