Topic 2:
A review of the challenges that the Arabic language poses to text preprocessing NLP tools
Description
This topic examines how the peculiarities of the Arabic script, together with other linguistic and sociolinguistic aspects of the Arabic language, can affect the accuracy of text preprocessing NLP tools (tokenizers, part-of-speech taggers, parsers). You can describe the challenges by presenting the tools mentioned above and explaining how Arabic complicates their analysis at the various linguistic levels (spelling, script, phonology, morphology, syntax, semantics, sociolinguistics). The report must be at least 4 pages long, plus an unlimited number of pages for tables, illustrations, references, and examples. You will also prepare a short PowerPoint presentation that summarizes your paper.
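As an illustration of the kind of example the report could include, the following minimal Python sketch shows two script-level issues that affect tokenizers: optional diacritics change the surface string of a word, and clitics attached to a word are not separated by whitespace. The sample word and the helper names `strip_diacritics` and `whitespace_tokenize` are assumptions chosen for illustration; they are not part of the assignment or of any particular NLP toolkit.

```python
import unicodedata
from typing import List


def strip_diacritics(text: str) -> str:
    """Drop combining marks (Unicode category Mn), which include the Arabic short-vowel signs."""
    return "".join(ch for ch in text if unicodedata.category(ch) != "Mn")


def whitespace_tokenize(text: str) -> List[str]:
    """Naive tokenizer: split on whitespace only."""
    return text.split()


# Illustrative word (an assumption, not taken from the brief): roughly
# "and they will write it", i.e. conjunction + future marker + verb + object
# pronoun, all written as one orthographic word.
diacritized = "وَسَيَكْتُبُونَهَا"
bare = strip_diacritics(diacritized)

# The same word has different surface strings with and without diacritics,
# so exact string matching treats them as different items.
print(diacritized == bare)

# Whitespace splitting returns one token, leaving the attached clitics unsegmented.
print(len(whitespace_tokenize(diacritized)))
```

A naive whitespace tokenizer therefore hands a part-of-speech tagger or parser a single unit that actually bundles several syntactic words, which is one of the morphology-level challenges the report is expected to discuss.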
In the written report, you will briefly:
(v) Introduce and summarize the basic linguistic features of Arabic.
(vi) Describe the challenges that these features introduce to the above-mentioned NLP tools.
(vii) Illustrate your findings with a couple of examples and snapshots, and cite your references in APA style.
3 Submission Guidelines
Your report should be typed and double-spaced on standard-sized A4 paper with 2.54 cm margins on all sides.
You should use a clear font that is highly readable. APA recommends using 12 pt. Times New Roman font. Include a page header at the top of every page. To create a page header, insert page numbers flush right. Then type “TITLE OF YOUR PAPER” in the header flush left using all capital letters. The page header is a shortened version of your paper’s title. Your report should include four major sections: Title Page, Abstract, Main Body, and References.