A Hybrid Multi-Word Terms Extraction System Applied to Topic Detection

  • Rim Koulali LaRI Laboratory, Sciences college, Mohammed I University, Oujda.
  • Abdelouafi Meziane LaRI Laboratory, Sciences college, Mohammed I University, Oujda.
Keywords: Multi-word Terms Extraction, Topic Detection, C-value, LLR.

Abstract

Multi-word term extraction plays an important role in many Natural Language Processing (NLP) tasks. Despite its importance, few works have been dedicated to Arabic multi-word term extraction. This paper proposes an automatic Arabic multi-word term (MWT) extraction system based on two major filtering steps: a linguistic filter that uses a part-of-speech tagger along with morphological patterns, and a statistical filter based on probabilistic methods, namely the Log-Likelihood Ratio (LLR) and C-value. We evaluate the performance of the implemented system on Wattan, a topic-oriented Arabic newspaper corpus. Our system achieves a precision of 90.23% in multi-word term extraction. We also study the use of MWTs as features in Arabic topic detection; the conducted experiments show good results.
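
The two statistical measures named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: it assumes the textbook C-value formula (candidate length weighted by frequency, discounted by the mean frequency of longer candidates that contain the term) and the standard log-likelihood ratio over a 2x2 bigram contingency table. The function names `contains`, `c_value`, and `llr`, and the representation of candidates as token tuples, are hypothetical choices made for illustration.

```python
import math


def contains(longer, shorter):
    """Return True if `shorter` occurs as a contiguous sub-sequence of `longer`."""
    m = len(shorter)
    return any(longer[i:i + m] == shorter for i in range(len(longer) - m + 1))


def c_value(freqs):
    """Simplified C-value for each candidate term.

    `freqs` maps a candidate (a tuple of at least two tokens) to its corpus
    frequency. A candidate nested in longer candidates has its frequency
    reduced by the mean frequency of those longer candidates.
    """
    scores = {}
    for a, f_a in freqs.items():
        # Frequencies of the longer candidates in which `a` is nested.
        nesting = [f_b for b, f_b in freqs.items()
                   if len(b) > len(a) and contains(b, a)]
        adjusted = f_a - sum(nesting) / len(nesting) if nesting else f_a
        scores[a] = math.log2(len(a)) * adjusted
    return scores


def llr(k11, k12, k21, k22):
    """Log-likelihood ratio (G^2) for a 2x2 contingency table of a word pair.

    k11: count of (w1, w2); k12: (w1, not w2);
    k21: (not w1, w2);      k22: (not w1, not w2).
    """
    n = k11 + k12 + k21 + k22
    rows = (k11 + k12, k21 + k22)
    cols = (k11 + k21, k12 + k22)
    cells = ((k11, 0, 0), (k12, 0, 1), (k21, 1, 0), (k22, 1, 1))
    total = 0.0
    for k, i, j in cells:
        if k > 0:  # empty cells contribute nothing to the sum
            total += k * math.log(k * n / (rows[i] * cols[j]))
    return 2.0 * total
```

In a hybrid pipeline of the kind described, such scores would be computed only for candidates that already passed the linguistic filter, and candidates would then be ranked or thresholded by score. How the paper combines the two measures is not stated in the abstract, so the sketch keeps them separate.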

Published
2014-10-30
How to Cite
Koulali, R., & Meziane, A. (2014). A Hybrid Multi-Word Terms Extraction System Applied to Topic Detection. INTERNATIONAL JOURNAL OF COMPUTERS & TECHNOLOGY, 13(10), 5105-5112. https://doi.org/10.24297/ijct.v13i10.2333