Automatic Expansion of Abbreviations in Chinese News Text


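The paper "Automatic Expansion of Abbreviations in Chinese News Text: A Hybrid Approach", from the Natural Language Laboratory at Heilongjiang University, distinguishes three ways in which Chinese abbreviations are formed: reduction, elimination, and generalization.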

Reduction: Selecting one or more key morphemes from each constituent word

e.g. “香港大学” (University of Hong Kong) → “港大”, taking 港 from 香港 and 大 from 大学

If n = m and each s_i corresponds to f_i (s_i ∈ f_i, i = 1 to n), i.e., each abbreviation word is taken from the matching full-form word, then S is a reduced abbreviation.

Elimination: Eliminating one or more constituent words and using the rest as the abbreviation

e.g. “清华大学” (Tsinghua University) → “清华”, dropping 大学

If n < m and s_j ∈ F for every j (1 ≤ j ≤ n), i.e., every word of S is one of the constituent words of F, then S is an eliminated abbreviation.

Generalization: Generalizing parallel or similar parts

e.g. “防火、防盗、防交通事故” (fire prevention, theft prevention, traffic-accident prevention) → “三防” (the "three preventions")

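If n < m and s_j ∉ F for some j (1 ≤ j ≤ n), i.e., S contains at least one word that does not occur in F (such as 三, "three"), then S is a generalized abbreviation.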

Here F = f1 f2 … fm is the full form (m constituent words),

and S = s1 s2 … sn is the corresponding abbreviation (n constituent words).
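The three definitions can be checked mechanically. The Python sketch below is a hypothetical helper written for these notes, not code from the paper. It assumes F and S are already segmented into word lists, and it reads "s_i ∈ f_i" as "the characters of s_i appear, in order, inside f_i". Segmenting 三防 as 三 / 防 is also an assumption.

```python
def is_char_subsequence(short, long):
    """True if the characters of `short` occur in `long` in the same order."""
    chars = iter(long)
    # `c in chars` consumes the iterator up to the first match, preserving order.
    return all(c in chars for c in short)


def classify_abbreviation(full_form, abbrev):
    """Classify S = abbrev against F = full_form (both lists of words)."""
    m, n = len(full_form), len(abbrev)
    # Reduction: n = m and each s_i is taken from the matching f_i.
    if n == m and all(is_char_subsequence(s, f) for s, f in zip(abbrev, full_form)):
        return "reduction"
    if n < m:
        # Elimination: every s_j is itself a constituent word of F.
        if all(s in full_form for s in abbrev):
            return "elimination"
        # Generalization: at least one s_j does not occur in F.
        return "generalization"
    return "unclassified"


print(classify_abbreviation(["香港", "大学"], ["港", "大"]))  # reduction
print(classify_abbreviation(["清华", "大学"], ["清华"]))      # elimination
print(classify_abbreviation(["防火", "防盗", "防交通事故"], ["三", "防"]))  # generalization
```

For n < m the elimination condition (∀) and the generalization condition (∃ a word not in F) are complements, so the sketch only needs to test one of them.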


The paper's method is a hybrid of hidden Markov models (HMMs) and linguistic knowledge.

It consists of three main steps:

(1) Expansion candidate generation.

Reduced abbreviations: candidate expansions are generated with a mapping table.

The mapping table is built on the characteristics of reduced abbreviations.

The long words (the candidate constituent words of a full form) come from a dictionary of common Chinese words.
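A rough picture of mapping-table generation is sketched below. This is an assumption-laden toy, not the paper's table. It maps every character to the dictionary words that contain it, assumes each abbreviation character comes from a different constituent word, and uses a hypothetical four-word list in place of the common-word dictionary.

```python
from collections import defaultdict
from itertools import product


def build_mapping_table(long_words):
    """Map each character (candidate key morpheme) to the long words containing it.

    Hypothetical construction: the paper's table also encodes the
    characteristics of reduced abbreviations, which this toy ignores.
    """
    table = defaultdict(list)
    for word in long_words:
        for ch in dict.fromkeys(word):  # each distinct character once, in order
            table[ch].append(word)
    return table


def reduction_candidates(abbrev, table):
    """Expand a reduced abbreviation, assuming one character per constituent word."""
    # For each abbreviation character, list the long words it could come from.
    options = [table.get(ch, []) for ch in abbrev]
    # Every combination of one long word per character is a candidate full form.
    return ["".join(words) for words in product(*options)]


# Toy stand-in for the dictionary of common Chinese words.
table = build_mapping_table(["香港", "海港", "大学", "大陆"])
print(reduction_candidates("港大", table))
# ['香港大学', '香港大陆', '海港大学', '海港大陆']
```

Even this tiny word list yields four candidates for 港大, which is why a separate disambiguation step is needed.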

Non-reduced abbreviations: expansions are generated with a dictionary of abbreviations.

In this system, a dictionary of abbreviation/full-form pairs is used, in which each non-reduced abbreviation is mapped to a set of full forms.
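The dictionary lookup for non-reduced abbreviations amounts to a one-to-many map. The sketch below uses a hypothetical toy dictionary that only contains the two examples from these notes. A real dictionary may list several full forms for one abbreviation, and those are then passed on to disambiguation.

```python
# Hypothetical toy dictionary: each non-reduced abbreviation maps to a set of full forms.
ABBREVIATION_DICT = {
    "清华": {"清华大学"},                # eliminated abbreviation
    "三防": {"防火、防盗、防交通事故"},  # generalized abbreviation
}


def non_reduction_candidates(abbrev):
    """Return all full forms listed for the abbreviation (empty set if unknown)."""
    return set(ABBREVIATION_DICT.get(abbrev, ()))


print(non_reduction_candidates("三防"))  # {'防火、防盗、防交通事故'}
```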

(2) Expansion disambiguation.

Based on HMMs, the goal of abbreviation disambiguation is to find the expansion F̂ that maximizes an HMM score over the candidate expansions in their sentence context.
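One common way to realize such an HMM score is Viterbi decoding over a lattice of candidate expansions. The sketch below is a generic illustration under stated assumptions, not the paper's exact formula. Hidden states are full-form words, observations are the surface tokens, transitions come from a bigram model P(f_t | f_{t-1}), and emissions are P(s_t | f_t). All probabilities are hypothetical toy values, and unseen events fall back to a small floor.

```python
import math


def viterbi_expand(tokens, candidates, bigram, emission, floor=1e-6):
    """Pick one expansion per token, maximizing
    sum_t [log P(f_t | f_{t-1}) + log P(s_t | f_t)].

    tokens:     observed words s_1..s_T (abbreviations or ordinary words)
    candidates: dict token -> candidate expansions (ordinary words default to themselves)
    bigram:     dict (previous full form, full form) -> probability; "<s>" = sentence start
    emission:   dict (full form, token) -> probability
    """
    def logp(p):
        return math.log(max(p, floor))

    # best maps each state at the previous position to (log score, best path to it).
    best = {"<s>": (0.0, [])}
    for tok in tokens:
        new_best = {}
        for f in candidates.get(tok, [tok]):
            emit = emission.get((f, tok), 1.0 if f == tok else floor)
            # Choose the best predecessor under the bigram transition model.
            score, path = max(
                (s + logp(bigram.get((prev, f), floor)), p)
                for prev, (s, p) in best.items()
            )
            new_best[f] = (score + logp(emit), path + [f])
        best = new_best
    return max(best.values())[1]


candidates = {"港大": ["香港大学", "海港大陆"]}
bigram = {("<s>", "香港大学"): 0.01, ("<s>", "海港大陆"): 0.001,
          ("香港大学", "招生"): 0.05}
emission = {("香港大学", "港大"): 0.5, ("海港大陆", "港大"): 0.5}
print(viterbi_expand(["港大", "招生"], candidates, bigram, emission))
# ['香港大学', '招生']
```

Both candidates emit 港大 equally well here, so the choice is driven by context: the toy bigram model prefers 香港大学 before 招生 ("enrollment").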

(3) Error correction with linguistic knowledge.

In this study, two types of linguistic knowledge are applied: the one-sense-per-discourse hypothesis and the patterns used to define abbreviations. They check whether an expansion produced by the procedure above is correct for a given abbreviation in a specific context.
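Both kinds of knowledge can be sketched as a post-processing pass over the HMM output for one document. The code below is a hypothetical illustration, not the paper's rules. The defining templates 「F（简称S）」 and 「F（以下简称S）」 are assumed examples of definition patterns. Majority voting is one assumed way to enforce a single expansion per abbreviation within a document.

```python
from collections import Counter

# Hypothetical defining patterns of the form "full form（简称 abbreviation）".
DEFINING_TEMPLATES = ("{f}（简称{s}）", "{f}（以下简称{s}）")


def defined_by_pattern(text, abbrev, candidates):
    """Return the candidate that the text explicitly defines as abbrev, if any."""
    for full in candidates:
        for template in DEFINING_TEMPLATES:
            if template.format(f=full, s=abbrev) in text:
                return full
    return None


def correct_expansions(text, predictions, candidates):
    """Relabel HMM output so each abbreviation has one expansion per document.

    predictions: list of (abbreviation, predicted expansion) in document order.
    candidates:  dict abbreviation -> candidate expansions.
    """
    votes = {}
    for abbrev, full in predictions:
        votes.setdefault(abbrev, Counter())[full] += 1
    chosen = {}
    for abbrev, counter in votes.items():
        # An explicit definition in the text overrides the statistical choice;
        # otherwise take the majority expansion (one sense per discourse).
        defined = defined_by_pattern(text, abbrev, candidates.get(abbrev, []))
        chosen[abbrev] = defined or counter.most_common(1)[0][0]
    return [(abbrev, chosen[abbrev]) for abbrev, _ in predictions]


text = "香港大学（简称港大）今年扩招。港大表示欢迎报考。"
preds = [("港大", "香港大学"), ("港大", "海港大陆"), ("港大", "海港大陆")]
cands = {"港大": ["香港大学", "海港大陆"]}
print(correct_expansions(text, preds, cands))
# [('港大', '香港大学'), ('港大', '香港大学'), ('港大', '香港大学')]
```

In this toy run the majority vote alone would pick the wrong expansion. The defining pattern in the first sentence corrects all three occurrences.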




