A Hybrid Multi-Word Terms Extraction System Applied to Topic Detection

  • Rim Koulali LaRI Laboratory, Sciences college, Mohammed I University, Oujda.
  • Abdelouafi Meziane LaRI Laboratory, Sciences college, Mohammed I University, Oujda.
Keywords: Multi-word Terms Extraction, Topic Detection, C-value, LLR.

Abstract

Multi-word term extraction plays an important role in many Natural Language Processing (NLP) tasks. Despite its importance, little work has been dedicated to Arabic multi-word term extraction. This paper proposes an automatic Arabic multi-word term (MWT) extraction system based on two main filtering steps: a linguistic filter that uses a part-of-speech tagger together with morphological patterns, and a statistical filter based on two probabilistic measures, namely the Log-Likelihood Ratio (LLR) and C-value. We evaluate the performance of the resulting systems on Wattan, a topic-oriented Arabic newspaper corpus. Our system achieves a multi-word term extraction precision of 90.23%. We also study the use of MWTs as features for Arabic topic detection, and the experiments we conducted show good results.
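The two-stage pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the part-of-speech patterns in `ALLOWED_PATTERNS` are hypothetical placeholders for the paper's Arabic morphological patterns, the input is assumed to be already tagged as `(word, tag)` pairs, and the function names are illustrative. The statistical measures follow their common textbook definitions: C-value as defined by Frantzi et al. (length-weighted frequency, discounted by the average frequency of longer candidates that contain the term) and Dunning's log-likelihood ratio (G²) computed on a 2x2 bigram contingency table.

```python
import math
from collections import Counter

# Hypothetical POS patterns standing in for the paper's linguistic filter;
# the real tag set and Arabic morphological patterns are not specified here.
ALLOWED_PATTERNS = {("NOUN", "NOUN"), ("NOUN", "ADJ"), ("NOUN", "NOUN", "ADJ")}


def extract_candidates(tagged, max_len=3):
    """Linguistic filter: count n-grams (2..max_len) whose tag sequence is allowed."""
    counts = Counter()
    for n in range(2, max_len + 1):
        for i in range(len(tagged) - n + 1):
            window = tagged[i:i + n]
            if tuple(tag for _, tag in window) in ALLOWED_PATTERNS:
                counts[tuple(word for word, _ in window)] += 1
    return counts


def is_nested(short, long_):
    """True if `short` occurs as a contiguous sub-sequence of the longer `long_`."""
    m = len(short)
    return m < len(long_) and any(
        long_[i:i + m] == short for i in range(len(long_) - m + 1)
    )


def c_value(freqs):
    """C-value for each candidate; `freqs` maps a word tuple to its frequency."""
    scores = {}
    for a, f_a in freqs.items():
        containers = [b for b in freqs if is_nested(a, b)]
        weight = math.log2(len(a))  # log2 of the candidate length in words
        if containers:
            # Subtract the mean frequency of the longer candidates containing `a`.
            nested_f = sum(freqs[b] for b in containers) / len(containers)
            scores[a] = weight * (f_a - nested_f)
        else:
            scores[a] = weight * f_a
    return scores


def llr(k11, k12, k21, k22):
    """Dunning's log-likelihood ratio (G^2) for a 2x2 contingency table."""
    n = k11 + k12 + k21 + k22
    rows = (k11 + k12, k21 + k22)
    cols = (k11 + k21, k12 + k22)
    g2 = 0.0
    for k, i, j in ((k11, 0, 0), (k12, 0, 1), (k21, 1, 0), (k22, 1, 1)):
        if k > 0:  # empty cells contribute 0 (0 * log 0 is taken as 0)
            expected = rows[i] * cols[j] / n
            g2 += k * math.log(k / expected)
    return 2.0 * g2


def bigram_llr(tokens):
    """Score every adjacent word pair in `tokens` by its LLR association."""
    bigrams = Counter(zip(tokens, tokens[1:]))
    n = sum(bigrams.values())
    first, second = Counter(), Counter()
    for (a, b), c in bigrams.items():
        first[a] += c
        second[b] += c
    scores = {}
    for (a, b), k11 in bigrams.items():
        k12 = first[a] - k11  # a followed by a word other than b
        k21 = second[b] - k11  # b preceded by a word other than a
        k22 = n - k11 - k12 - k21
        scores[(a, b)] = llr(k11, k12, k21, k22)
    return scores
```

In this sketch, C-value rewards longer candidates and penalizes those that mostly appear inside longer terms, while LLR measures how strongly two adjacent words are associated beyond chance. The abstract does not state how the two scores are combined or which thresholds are used, so any cut-off applied to these rankings would be an additional assumption.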

Published
2014-10-30
How to Cite
Koulali, R., & Meziane, A. (2014). A Hybrid Multi-Word Terms Extraction System Applied to Topic Detection. INTERNATIONAL JOURNAL OF COMPUTERS & TECHNOLOGY, 13(10), 5105-5112. https://doi.org/10.24297/ijct.v13i10.2333