What is the fastest way to get started with natural language processing?


Hello everyone, this is the Heilongjiang University Natural Language Processing Lab (黑龙江大学自然语言处理实验室). We hope to be a good channel for you to learn about research and about natural language processing. If you have any comments or opinions, feel free to leave us a message. Questions and interaction are very welcome!

My own introduction to NLP actually came from Dr. Wu Jun's book The Beauty of Mathematics (《数学之美》), and I hope this article helps students interested in NLP get started quickly.

This article is from "Zhihu Daily Selections" (知乎每日精选).


First, I recommend The Beauty of Mathematics (《数学之美》). The book is written as popular science, vivid and engaging, and I don't think you will find it dry. I recommend it very strongly; I believe the real reason to do research is interest, not utilitarian gain.

Next is Statistical Natural Language Processing (《统计自然语言处理》). The book is quite old by now, but it is also a classic; read it or not as you like. (Editor's note: the second edition is the one to read.)

Modern natural language processing leans heavily on statistics, so I very strongly recommend Li Hang's Statistical Learning Methods (《统计学习方法》). Professor Li wrote it over seven years of his spare time, and it was reviewed by PhD students. NLP differs from machine learning: machine learning relies on rigorous mathematics and derivations to create one algorithm after another, while NLP mostly takes what the machine-learning researchers have built and uses it as a tool. So for getting started you only need a broad overview: understand the idea behind each model, without necessarily working through every derivation.
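To make the "use models as tools" point concrete, here is a minimal sketch in Python that calls an off-the-shelf tokenizer and part-of-speech tagger from NLTK without deriving anything yourself. The example sentence is made up, and the two nltk.download calls assume a fresh local setup.

# Using a pre-trained tagger as a tool, without touching the underlying math.
import nltk

nltk.download("punkt")                        # tokenizer model (one-time download)
nltk.download("averaged_perceptron_tagger")   # POS tagger model (one-time download)

sentence = "Statistical models are tools for natural language processing."
tokens = nltk.word_tokenize(sentence)         # split the sentence into words
print(nltk.pos_tag(tokens))                   # e.g. [('Statistical', 'JJ'), ('models', 'NNS'), ...]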

Then there are the Stanford open courses, which do require a certain level of English. On Coursera, I think the lecturers explain things better than many Chinese instructors do.

举例:

http://www.ark.cs.cmu.edu/LS2/in…

或者

http://www.stanford.edu/class/cs…

Before starting an engineering project, search for existing tools first rather than building everything from scratch. Before starting research, do a thorough survey of prior work as well.

Now for recommended toolkits:

For Chinese, the obvious choice is LTP (Language Technology Platform), the open-source toolkit developed by HIT-SCIR (the Research Center for Social Computing and Information Retrieval at Harbin Institute of Technology).
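As a rough illustration only: the sketch below assumes the older pyltp Python bindings and an LTP 3.x model directory, so the model path (and, in newer LTP releases, the API itself) should be treated as placeholders rather than the definitive usage.

# Chinese word segmentation with pyltp (assumed LTP 3.x models; paths are placeholders).
import os
from pyltp import Segmentor

LTP_DATA_DIR = "/path/to/ltp_data"                         # hypothetical model directory
cws_model_path = os.path.join(LTP_DATA_DIR, "cws.model")   # word segmentation model

segmentor = Segmentor()
segmentor.load(cws_model_path)                  # load the pre-trained segmentation model
words = segmentor.segment("自然语言处理怎么入门")   # returns an iterable of tokens
print("\t".join(words))
segmentor.release()                             # free the model when done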

For English (Python), a few libraries to know (a short combined example follows this list):

pattern - simpler to get started with than NLTK

chardet - character encoding detection

pyenchant - easy access to dictionaries

scikit-learn - has support for text classification

unidecode - because ASCII is much easier to deal with
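As a quick, hypothetical sketch of how a few of these fit together: chardet guesses the encoding of raw bytes, unidecode folds accented text down to ASCII, and scikit-learn trains a tiny text classifier. The sample bytes, toy documents, and labels are all made up for illustration.

import chardet
from unidecode import unidecode
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Guess the encoding of raw bytes before decoding (falls back to latin-1 if unsure).
raw = "Métro, boulot, dodo".encode("latin-1")
guess = chardet.detect(raw)                     # e.g. {'encoding': 'ISO-8859-1', 'confidence': ...}
text = raw.decode(guess["encoding"] or "latin-1")
print(unidecode(text))                          # "Metro, boulot, dodo" - plain ASCII

# A tiny text classifier: TF-IDF features plus naive Bayes, on made-up toy data.
docs = ["the parser produced a syntax tree",
        "the striker scored a late goal",
        "dependency parsing of English text",
        "the match ended in a penalty shootout"]
labels = ["nlp", "sports", "nlp", "sports"]

model = make_pipeline(TfidfVectorizer(), MultinomialNB())
model.fit(docs, labels)
print(model.predict(["unsupervised induction of syntactic structure"]))  # likely ['nlp']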

Must-read papers (taken from Quora; I will translate the explanations in parentheses a bit later):

Parsing (syntactic structure analysis; it involves a lot of linguistics and can feel a bit dry)

Klein & Manning: "Accurate Unlexicalized Parsing"

Klein & Manning: "Corpus-Based Induction of Syntactic Structure: Models of Dependency and Constituency" (a revolutionary use of unsupervised learning to build a parser)

Nivre "Deterministic Dependency Parsing of English Text" (shows that deterministic parsing actually works quite well)

McDonald et al. "Non-Projective Dependency Parsing using Spanning-Tree Algorithms" (the other main method of dependency parsing, MST parsing)

Machine Translation

Knight "A statistical MT tutorial workbook" (easy to understand, use instead of the original Brown paper)

Och "The Alignment-Template Approach to Statistical Machine Translation" (foundations of phrase-based systems)

Wu "Inversion Transduction Grammars and the Bilingual Parsing of Parallel Corpora" (arguably the first realistic method for biparsing, which is used in many systems)

Chiang "Hierarchical Phrase-Based Translation" (significantly improves accuracy by allowing for gappy phrases)

Language Modeling

Goodman "A bit of progress in language modeling" (a survey that covers just about everything related to n-gram language models, including smoothing and clustering; a toy bigram sketch follows this section)

Teh "A Bayesian interpretation of Interpolated Kneser-Ney" (shows how to get state-of-the-art accuracy in a Bayesian framework, opening the path for other applications)
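To make the n-gram idea behind these two papers concrete, here is a toy bigram language model with add-one (Laplace) smoothing, far simpler than the Kneser-Ney style methods the papers actually discuss, trained on a three-sentence corpus made up purely for illustration.

import math
from collections import Counter

# Made-up toy corpus; real language models are trained on millions of sentences.
corpus = [["<s>", "natural", "language", "processing", "</s>"],
          ["<s>", "natural", "language", "understanding", "</s>"],
          ["<s>", "statistical", "language", "modeling", "</s>"]]

unigrams = Counter(w for sent in corpus for w in sent)
bigrams = Counter((sent[i], sent[i + 1]) for sent in corpus for i in range(len(sent) - 1))
vocab = len(unigrams)

def bigram_prob(prev, word):
    # Add-one (Laplace) smoothing: unseen bigrams still get a small nonzero probability.
    return (bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab)

def sentence_logprob(words):
    # Score a sentence by summing log bigram probabilities over padded word pairs.
    padded = ["<s>"] + words + ["</s>"]
    return sum(math.log(bigram_prob(padded[i], padded[i + 1])) for i in range(len(padded) - 1))

print(sentence_logprob(["natural", "language", "modeling"]))    # closer to zero = more probable
print(sentence_logprob(["modeling", "natural", "processing"]))  # lower for the odd word order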

Machine Learning for NLP

Sutton & McCallum "An introduction to conditional random fields for relational learning" (everyone should know CRFs, and this paper is the easiest to understand; a small sequence-labeling sketch follows this section)

Knight "Bayesian Inference with Tears" (explains the general idea of Bayesian techniques quite well)

Berg-Kirkpatrick et al. "Painless Unsupervised Learning with Features" (this is from this year and thus a bit of a gamble, but it has the potential to bring the power of discriminative methods to unsupervised learning)
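For the CRF recommendation above, here is a minimal sequence-labeling sketch using the sklearn-crfsuite package (my choice of toolkit, not one named in the original answer); the two training sentences, the PER/O tag set, and the hand-written features are toy assumptions for illustration.

import sklearn_crfsuite

def word_features(sent, i):
    # A deliberately tiny feature set for word i of a sentence.
    return {
        "word.lower": sent[i].lower(),
        "word.istitle": sent[i].istitle(),
        "prev.lower": sent[i - 1].lower() if i > 0 else "<s>",
    }

def sent_features(sent):
    return [word_features(sent, i) for i in range(len(sent))]

# Toy NER-style data (made up): people are PER, everything else is O.
train_sents = [["John", "met", "Mary", "in", "Boston"],
               ["Alice", "called", "Bob"]]
train_labels = [["PER", "O", "PER", "O", "O"],
                ["PER", "O", "PER"]]

crf = sklearn_crfsuite.CRF(algorithm="lbfgs", c1=0.1, c2=0.1, max_iterations=100)
crf.fit([sent_features(s) for s in train_sents], train_labels)

test = ["Mary", "met", "Bob", "in", "Boston"]
print(crf.predict([sent_features(test)])[0])   # e.g. ['PER', 'O', 'PER', 'O', 'O']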

Information Extraction

Hearst. Automatic Acquisition of Hyponyms from Large Text Corpora. COLING 1992. (The very first paper for all the bootstrapping methods in NLP. It is a hypothetical work in the sense that it does not give experimental results, but it influenced its followers a lot.)

Collins and Singer. Unsupervised Models for Named Entity Classification. EMNLP 1999. (It applies several variants of co-training-like IE methods to the NER task and explains the motivation for doing so. Students can learn from this work how to structure a good research paper in NLP.)

Computational Semantics

Gildea and Jurafsky. Automatic Labeling of Semantic Roles. Computational Linguistics 2002. (It opened up the trend of semantic role labeling in NLP, followed by several CoNLL shared tasks dedicated to SRL. It shows how linguistics and engineering can collaborate with each other. It has a shorter version in ACL 2000.)

Pantel and Lin. Discovering Word Senses from Text. KDD 2002. (Supervised WSD was explored a lot in the early 2000s thanks to the Senseval workshops, but few systems actually benefit from WSD because manually crafted sense mappings are hard to obtain. These days we see a lot of evidence that unsupervised clustering improves NLP tasks such as NER, parsing, SRL, etc.)

