Automatic Expansion of Abbreviations in Chinese News Text


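The paper "Automatic Expansion of Abbreviations in Chinese News Text: A Hybrid Approach", from the Natural Language Laboratory of Heilongjiang University, describes three methods of forming abbreviations: Reduction, Elimination, and Generalization.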

Reduction: selecting one or more key morphemes from each constituent word.

e.g. “香港大学” → “港大” (港 ∈ 香港; 大 ∈ 大学)

n = m and si corresponds to fi (i = 1 to n, si ∈ fi): S is a reduced abbreviation.

Elimination: eliminating one or more constituent words and keeping the rest as the abbreviation.

e.g. “清华大学” → “清华”

n < m and ∀sj ∈ F (1 ≤ j ≤ n): S is an eliminated abbreviation.

Generalization: generalizing parallel or similar parts.

e.g. “防火、防盗、防交通事故” → “三防”

n < m and ∃sj ∉ F (1 ≤ j ≤ n): S is a generalized abbreviation.

Here, F = f1f2…fm is the full form (m constituent words);

S = s1s2…sn is the corresponding abbreviation (n constituent words).
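
The three formal conditions can be checked mechanically. Below is a minimal Python sketch, not code from the paper. The function name `classify_abbreviation` and the representation of F and S as lists of strings are assumptions made for illustration, and si ∈ fi is read here as "every character of si occurs in fi".

```python
def classify_abbreviation(full_form, abbreviation):
    """Classify an abbreviation S against its full form F.

    full_form:    list of the m constituent words f1..fm
    abbreviation: list of the n constituent units s1..sn
    (Both representations are assumptions of this sketch.)
    """
    m, n = len(full_form), len(abbreviation)

    # Reduction: n = m and each si is drawn from the matching fi.
    if n == m and all(
        all(ch in f for ch in s) for s, f in zip(abbreviation, full_form)
    ):
        return "reduction"

    # Elimination: n < m and every sj is itself a constituent word of F.
    if n < m and all(s in full_form for s in abbreviation):
        return "elimination"

    # Generalization: n < m and at least one sj is not a constituent word of F.
    if n < m and any(s not in full_form for s in abbreviation):
        return "generalization"

    return "unknown"


print(classify_abbreviation(["香港", "大学"], ["港", "大"]))              # reduction
print(classify_abbreviation(["清华", "大学"], ["清华"]))                  # elimination
print(classify_abbreviation(["防火", "防盗", "防交通事故"], ["三", "防"]))  # generalization
```

Cases that satisfy none of the three conditions (for example n > m) are reported as "unknown" rather than forced into a class.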


The method used in the paper combines HMMs with linguistic knowledge.

The three main steps are as follows:

(1) Expansion candidate generation.

Reduction: candidates are generated with a mapping table.

The mapping table is based on the characteristics of reduced abbreviations.

The long words come from a dictionary of common Chinese words.

Non-reduction: expansions are generated with a dictionary of abbreviations.

In this system, a dictionary of abbreviation/full-form pairs is applied, in which each non-reduced abbreviation is mapped to a set of full forms.
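
As a rough illustration of the two generation routes, here is a small Python sketch. It is not the paper's implementation: the tables `MORPHEME_TO_WORDS` and `ABBREVIATION_DICT`, their toy contents, and the function `generate_candidates` are all hypothetical, and the real mapping table may be organized differently.

```python
from itertools import product

# Hypothetical toy mapping table: morpheme -> long words that contain it.
MORPHEME_TO_WORDS = {
    "港": ["香港", "港口"],
    "大": ["大学", "大会"],
}

# Hypothetical toy dictionary: non-reduced abbreviation -> set of full forms.
ABBREVIATION_DICT = {
    "清华": ["清华大学"],
    "三防": ["防火、防盗、防交通事故"],
}


def generate_candidates(abbr):
    """Return candidate expansions for an abbreviation string."""
    # Non-reduction route: direct lookup in the abbreviation dictionary.
    candidates = list(ABBREVIATION_DICT.get(abbr, []))

    # Reduction route: expand each character through the mapping table and
    # combine the long words position by position.
    per_char = [MORPHEME_TO_WORDS.get(ch) for ch in abbr]
    if abbr and all(per_char):
        for combo in product(*per_char):
            expansion = "".join(combo)
            if expansion not in candidates:
                candidates.append(expansion)
    return candidates


print(generate_candidates("港大"))  # ['香港大学', '香港大会', '港口大学', '港口大会']
print(generate_candidates("清华"))  # ['清华大学']
```

The reduction route over-generates on purpose; choosing among the candidates is left to the disambiguation step.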

(2) Expansion disambiguation.

Based on HMMs, the goal of abbreviation disambiguation is to find the proper expansion F̂ that maximizes an HMM-based score.
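
The paper's exact scoring formula is not shown in this summary. Purely as a generic illustration of HMM decoding, the Python sketch below runs Viterbi search over candidate long words, scoring a path by the product of transition probabilities P(word | previous word) and emission probabilities P(unit | word). The function `viterbi_expand`, the probability tables, and all numbers are made-up assumptions, not values or definitions from the paper.

```python
import math


def viterbi_expand(units, candidates, trans, emit, start="<s>", floor=1e-6):
    """Return (log score, best word path) for a sequence of abbreviation units.

    units:      abbreviation units, e.g. ["港", "大"]
    candidates: unit -> list of candidate long words (hidden states)
    trans:      (previous word, word) -> P(word | previous word)
    emit:       (word, unit) -> P(unit | word)
    floor:      probability used for unseen events (a crude smoothing assumption)
    """
    best = {start: (0.0, [])}  # state -> (log score, path that ends in state)
    for unit in units:
        step = {}
        for word in candidates[unit]:
            log_emit = math.log(emit.get((word, unit), floor))
            # Keep only the best-scoring way of reaching `word` at this position.
            step[word] = max(
                (
                    (score + math.log(trans.get((prev, word), floor)) + log_emit,
                     path + [word])
                    for prev, (score, path) in best.items()
                ),
                key=lambda item: item[0],
            )
        best = step
    return max(best.values(), key=lambda item: item[0])


# Toy, made-up probabilities for illustration only.
CANDIDATES = {"港": ["香港", "港口"], "大": ["大学", "大会"]}
TRANS = {
    ("<s>", "香港"): 0.6, ("<s>", "港口"): 0.4,
    ("香港", "大学"): 0.7, ("香港", "大会"): 0.1,
    ("港口", "大学"): 0.05, ("港口", "大会"): 0.2,
}
EMIT = {
    ("香港", "港"): 0.9, ("港口", "港"): 0.8,
    ("大学", "大"): 0.9, ("大会", "大"): 0.7,
}

score, path = viterbi_expand(["港", "大"], CANDIDATES, TRANS, EMIT)
print("".join(path))  # 香港大学
```

Log probabilities are summed instead of multiplying raw probabilities so that longer sequences do not underflow.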

(3) Error correction with linguistic knowledge.

In this study, two types of linguistic knowledge, namely the hypothesis of one sense per discourse and the patterns for defining abbreviations, are applied to check whether an expansion yielded by the above procedure is correct for a given abbreviation in a specific context.
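
The two checks can be sketched in Python as follows. This is an illustration under stated assumptions, not the paper's procedure: the definition pattern "full form（简称 abbreviation）" is only one plausible pattern, the majority vote is one simple reading of "one sense per discourse", and letting an explicit definition override the vote is a design choice of this sketch. All function names are hypothetical.

```python
import re
from collections import Counter


def defined_expansion(text, abbr, candidates):
    """Return the candidate that the text explicitly defines abbr as, if any.

    Assumed pattern: <full form>（简称<abbr>） or <full form>（以下简称<abbr>）.
    """
    for cand in candidates:
        pattern = (
            re.escape(cand)
            + r'\s*[（(](?:以下)?简称[:：]?[“"「]?'
            + re.escape(abbr)
        )
        if re.search(pattern, text):
            return cand
    return None


def one_sense_per_discourse(decisions):
    """Map each abbreviation to its most frequent expansion in one document.

    decisions: list of (abbreviation, expansion) pairs, one per occurrence.
    """
    counts = {}
    for abbr, expansion in decisions:
        counts.setdefault(abbr, Counter())[expansion] += 1
    return {abbr: c.most_common(1)[0][0] for abbr, c in counts.items()}


def correct_expansions(text, decisions, candidates):
    """Apply the majority vote, then let explicit definitions override it."""
    final = one_sense_per_discourse(decisions)
    for abbr in final:
        explicit = defined_expansion(text, abbr, candidates.get(abbr, []))
        if explicit is not None:
            final[abbr] = explicit
    return final


text = "香港大学（简称“港大”）今日公布了招生计划。"
decisions = [("港大", "港口大学"), ("港大", "港口大学"), ("港大", "香港大学")]
candidates = {"港大": ["香港大学", "港口大学"]}
print(correct_expansions(text, decisions, candidates))  # {'港大': '香港大学'}
```

In the usage example the per-occurrence decisions favor 港口大学, but the explicit definition in the text corrects the result to 香港大学.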



