2011 Technical Reports

Cross Lingual Word Sense Disambiguation for Languages with Scarce Resources

Bahareh Sarrafzadeh, Nikolay Yakovets, Nick Cercone, Aijun An

Technical Report CSE-2011-01

York University

January 24, 2011


Word Sense Disambiguation (WSD), the task of computationally identifying the meaning of a word in context, has long been a central problem in computational linguistics. Statistical and supervised approaches require large amounts of labeled data for training. Unlike English, Persian has neither a semantically tagged corpus to support machine learning approaches nor suitable parallel corpora. However, because the number of Persian pages in Wikipedia keeps growing, Wikipedia can serve as a comparable corpus for English-Persian texts.

In this paper, we propose a cross-lingual approach to tagging word senses in Persian texts. The approach makes use of English sense disambiguators, Wikipedia articles in both English and Persian, and a newly developed lexical ontology, FarsNet. It overcomes the lack of knowledge resources and NLP tools for the Persian language. We demonstrate the effectiveness of the proposed approach by comparing it to a direct sense disambiguation approach for Persian. The evaluation results indicate performance comparable to that of the English sense tagger used.
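The abstract does not spell out the procedure, so the following is only a toy illustration of how sense tags might be carried from an English article to a comparable Persian one. Everything in it is an assumption made for illustration: the `TOY_INVENTORY` table stands in for links that a lexical ontology such as FarsNet could provide, the sense identifiers are invented, the Persian lemmas are transliterated, and `transfer_senses` is a hypothetical helper, not the authors' algorithm.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical toy inventory: Persian lemma -> list of
# (Persian sense id, linked English sense id). Illustrative only;
# a real system would read such links from a lexical ontology.
TOY_INVENTORY: Dict[str, List[Tuple[str, str]]] = {
    "shir": [
        ("shir#lion", "lion.n.01"),
        ("shir#milk", "milk.n.01"),
        ("shir#tap", "faucet.n.01"),
    ],
}


def transfer_senses(
    persian_lemmas: List[str],
    english_senses: Set[str],
    inventory: Dict[str, List[Tuple[str, str]]],
) -> Dict[str, Optional[str]]:
    """Tag each Persian lemma with the one candidate sense whose linked
    English sense was assigned somewhere in the comparable English article.
    Lemmas with zero or several matching candidates are left untagged (None)."""
    tagged: Dict[str, Optional[str]] = {}
    for lemma in persian_lemmas:
        matches = [
            fa_sense
            for fa_sense, en_sense in inventory.get(lemma, [])
            if en_sense in english_senses
        ]
        tagged[lemma] = matches[0] if len(matches) == 1 else None
    return tagged


# English senses assumed to come from an English sense tagger
# run on the comparable English article.
english_article_senses = {"milk.n.01", "cow.n.01"}
result = transfer_senses(["shir", "gav"], english_article_senses, TOY_INVENTORY)
print(result)
```

In this sketch the ambiguous lemma "shir" (lion, milk, or tap) receives the milk sense because only that candidate is linked to a sense found in the English article, while "gav", which is missing from the toy inventory, stays untagged. Leaving a lemma untagged when several candidates match is a deliberately conservative choice for the illustration.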


The documents distributed by this server have been provided by the contributing authors as a means to ensure timely dissemination of scholarly and technical work on a noncommercial basis. Copyright and all rights therein are maintained by the authors or by other copyright holders, notwithstanding that they have offered their works here electronically. It is understood that all persons copying this information will adhere to the terms and constraints invoked by each author's copyright. These works may not be reposted without the explicit permission of the copyright holder.