Naïve Bayesian Based on Chi Square to Categorize Arabic Data

Fadi Thabtah, Mohammad Ali H. Eljinini, Mannam Zamzeer and Wa’el Musa Hadi

 

Philadelphia University, Jordan

AL-Isra Private University, Jordan

University of Jordan, Jordan

AL-Isra Private University, Jordan

Abstract

Text classification is a supervised technique that learns a classifier from labelled training data and then uses that classifier to assign categories to the remaining, unlabelled text automatically. This paper investigates the Naïve Bayesian algorithm combined with the Chi Square feature selection method. Our comparisons are based on the macro F1, macro recall and macro precision evaluation measures. Experimental results on different Arabic text categorization data sets provide evidence that feature selection often increases classification accuracy by removing rare terms.

Keywords: Text Categorization, Naïve Bayesian, Arabic Text Data, Chi Square.
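
The two quantities named in the abstract, the Chi Square score of a term and the macro-averaged evaluation measures, can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `chi_square` and `macro_scores` are hypothetical, the Chi Square score uses the standard 2x2 contingency-table form, and macro F1 is assumed to be the unweighted mean of per-class F1 values (some works instead take the harmonic mean of macro precision and macro recall). The Naïve Bayesian classifier itself is omitted.

```python
def chi_square(a, b, c, d):
    """Chi Square score for one term and one class from a 2x2 table.

    a: documents in the class that contain the term
    b: documents outside the class that contain the term
    c: documents in the class that lack the term
    d: documents outside the class that lack the term
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    # A zero marginal means the term or class never varies; score it 0.
    return 0.0 if denom == 0 else n * (a * d - c * b) ** 2 / denom


def macro_scores(y_true, y_pred):
    """Return (macro precision, macro recall, macro F1).

    Assumption: each measure is computed per class and then averaged
    with equal weight per class.
    """
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    precisions, recalls, f1s = [], [], []
    for label in labels:
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)
    k = len(labels)
    return sum(precisions) / k, sum(recalls) / k, sum(f1s) / k
```

In a feature selection step of this kind, terms would typically be ranked by their Chi Square score per class and only the highest-scoring terms kept as features, which tends to discard rare terms that carry little evidence about any class.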