Research Article Open Access

Data Streams Curation for Better Machine Learning Functionality and Result to Serve IoT and other Applications: A Survey

Haya Salah1, Islam Al-Omari1, Jaber Alwidian2, Rashed Al-Hamadin1 and Tariq Tawalbeh1
  • 1 Princess Sumaya University for Technology (PSUT), Jordan
  • 2 INTRASOFT MIDDLE EAST, Jordan

Abstract

Data curation on data streams is effective in operating big data analytics and reducing its costs. Preparing for analytics requires curating the heterogeneous data sets available in big data clusters, and this process becomes harder when curation must be carried out on data in motion in order to arrive at actionable insights and valuable analytics in real time, including further machine learning analysis and processing. In this paper, we identify and survey the issues and challenges across different areas related to big data. We also investigate the most common techniques and methods used in implementations, including stream curation, the machine learning algorithms used in such implementations, and the feature engineering techniques that can be considered a curation pre-processing paradigm for data stream analytics. Furthermore, the paper presents the application areas where data curation plays a critical role. Finally, we map the techniques and methods related to the data curation field to emphasize its critical role in business, retail, culture, arts, health, medicine, social media, wireless sensor networks, natural language processing (NLP) and automated feature engineering (FE). In addition, we identify the issues and challenges in different areas, including IoT and media stream curation, to help scholars working in this area.

Journal of Computer Science
Volume 15 No. 10, 2019, 1572-1584

DOI: https://doi.org/10.3844/jcssp.2019.1572.1584

Submitted On: 8 May 2019 Published On: 1 November 2019

How to Cite: Salah, H., Al-Omari, I., Alwidian, J., Al-Hamadin, R. & Tawalbeh, T. (2019). Data Streams Curation for Better Machine Learning Functionality and Result to Serve IoT and other Applications: A Survey. Journal of Computer Science, 15(10), 1572-1584. https://doi.org/10.3844/jcssp.2019.1572.1584

Keywords

  • Data Curation
  • Data Streaming
  • Data Ingestion
  • Big Data
  • Machine Learning