2005 Technical Reports

A Rapid Bayesian Adaptation of N-gram Language Models Using Cross-word Correlation

Hui Jiang, Keikichi Hirose, Nobuaki Minematsu, Koki Sasaki and Takaaki Moriya

Technical Report CS-2005-02

York University

February 4, 2005

Abstract

In this work, we study the problem of fast adaptation of n-gram language models under the maximum a posteriori (MAP) estimation framework. We propose a heuristic method that exploits cross-word correlation to accelerate the MAP adaptation of n-gram models. Based on this correlation, the occurrence of one word in the adaptation text is used to predict the n-grams that are likely to appear in the same text. The predicted occurrences are then incorporated into the MAP estimation of the n-gram model. In this way, a large n-gram model can be adapted efficiently with only a small amount of adaptation data. We conducted two experiments to evaluate the proposed fast adaptation technique: topic adaptation within a domain and cross-domain adaptation. The experimental results show that the proposed approach adapts a large n-gram model to a new task quickly and effectively, in terms of both perplexity reduction and speech recognition improvement. They also show that the proposed technique significantly outperforms conventional MAP adaptation, especially when only a very limited amount of adaptation data is available.
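
The abstract does not give the estimation formulas, so the following is only a minimal sketch of the general idea and not the authors' method. It assumes the common Dirichlet-prior form of MAP estimation, p(w | h) = (c(h, w) + tau * p_bg(w | h)) / (c(h) + tau), and a hypothetical correlation table that maps a trigger word to fractional counts for the bigrams it predicts. The names `map_adapt`, `predicted_counts`, `correlation`, the value of `tau`, and all numbers are illustrative assumptions.

```python
from collections import Counter


def map_adapt(prior, counts, tau):
    """MAP estimate of p(w | h) for one history h under a Dirichlet prior.

    prior:  word -> background probability p_bg(w | h); should sum to 1
    counts: word -> (possibly fractional) count of (h, w) in the adaptation text
    tau:    prior strength (assumed value, not taken from the report)
    """
    # Only count words covered by the prior so the result stays normalised.
    total = sum(counts.get(w, 0.0) for w in prior)
    return {w: (counts.get(w, 0.0) + tau * p) / (total + tau)
            for w, p in prior.items()}


def predicted_counts(adaptation_words, correlation, history):
    """Fractional counts for (history, w) predicted from observed trigger words.

    correlation: trigger word -> {(history, word): strength}; a hypothetical
                 stand-in for the cross-word correlation named in the abstract.
    """
    predicted = {}
    for trigger, n in Counter(adaptation_words).items():
        for (h, w), strength in correlation.get(trigger, {}).items():
            if h == history:
                predicted[w] = predicted.get(w, 0.0) + n * strength
    return predicted


# Toy data (all values are made up for illustration).
prior = {"market": 0.5, "price": 0.3, "pot": 0.2}   # p_bg(w | "stock")
correlation = {"soup": {("stock", "pot"): 1.5}}     # "soup" predicts "stock pot"
adaptation_words = ["soup", "soup"]                 # "stock" itself never occurs

observed = {}  # no bigram starting with "stock" was seen in the adaptation text
extra = predicted_counts(adaptation_words, correlation, "stock")

conventional = map_adapt(prior, observed, tau=3.0)
combined = {w: observed.get(w, 0.0) + extra.get(w, 0.0) for w in prior}
fast = map_adapt(prior, combined, tau=3.0)
```

In this toy case conventional MAP adaptation leaves the model unchanged, because the adaptation text contains no bigram with the history "stock", while the predicted counts shift probability toward "pot". How the correlation is actually estimated and weighted is described in the full report, not in this sketch.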


The documents distributed by this server have been provided by the contributing authors as a means to ensure timely dissemination of scholarly and technical work on a noncommercial basis. Copyright and all rights therein are maintained by the authors or by other copyright holders, notwithstanding that they have offered their works here electronically. It is understood that all persons copying this information will adhere to the terms and constraints invoked by each author's copyright. These works may not be reposted without the explicit permission of the copyright holder.