Research on Chinese Polyphone Disambiguation (Computer Science and Technology).rar

Contents: complete thesis
Thesis length: 14,869 characters
Format: Word (*.doc)


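Abstract (translated from the Chinese): Polyphone disambiguation is one of the basic problems in Chinese natural language processing. Polyphones (words with more than one pronunciation) are pervasive in Chinese and cannot be avoided in NLP; if they are not handled well, they become the bottleneck for automatic phonetic annotation. Although several automatic phonetic annotation tools have appeared in recent years, polyphone disambiguation remains poorly solved. This thesis therefore studies the disambiguation of Chinese polyphones.

　　The main work of this thesis is as follows:

　　Polyphone extraction. All polyphones recorded in the electronic edition of the 《现代汉语词典》 (Modern Chinese Dictionary) are collected and counted.

　　Corpus preparation. Sentences containing polyphones are extracted from the 2001 《人民日报》 (People's Daily) corpus, and the corpus is annotated according to the pronunciation entry (音项) of each occurrence.

　　Polyphone disambiguation. The context of each polyphone is used to resolve its pronunciation, and experiments are run on the corpus. Five models (CRF, maximum entropy, RFR_SUM, SVM, and semantic similarity) are applied to 22 polyphones, with average accuracies of 85.27%, 91.63%, 94.04%, 89.96%, and 89.16%, respectively. A voting ensemble is also used, reaching an average accuracy of 96.34%. Finally, a seed-word-based method is used to disambiguate polyphones.

　　An automatic phonetic annotation system is implemented; it can disambiguate 62 polyphones.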
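The voting ensemble mentioned above can be illustrated with a minimal sketch. The source does not say whether the vote is a plain majority or weighted, so this assumes an unweighted majority vote with ties broken in favour of the earliest model in the list. The function name `majority_vote` and the numbered-pinyin labels for 地道 are hypothetical, chosen only for illustration.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the label chosen by the most models.

    `predictions` holds one predicted pronunciation per model.
    Ties go to the label that appears first in the list
    (an assumption; the source does not specify tie-breaking).
    """
    if not predictions:
        raise ValueError("need at least one prediction")
    counts = Counter(predictions)
    best = max(counts.values())
    for label in predictions:
        if counts[label] == best:
            return label


# Hypothetical outputs of five models for one occurrence of 地道
# (di4dao4 "tunnel" vs. di4dao5 "authentic").
votes = ["di4dao5", "di4dao4", "di4dao5", "di4dao5", "di4dao4"]
print(majority_vote(votes))  # prints di4dao5
```

A weighted variant would replace the `Counter` with a sum of per-model weights, for example each model's accuracy on held-out data.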

Keywords: polyphone disambiguation, automatic phonetic annotation, CRF, maximum entropy, RFR_SUM, SVM, semantic similarity, seed word

Abstract: Chinese polyphone disambiguation is a fundamental problem in natural language processing (NLP). Polyphones are a prevalent phenomenon in Chinese and cannot be avoided in NLP, so if the problem is not resolved well, it becomes a bottleneck for automatic phonetic annotation. Although some automatic phonetic annotation software has appeared in recent years, there is still no good solution to polyphone disambiguation. This paper therefore studies Chinese polyphone disambiguation.

　　The details are as follows:

　　Polyphone extraction. All polyphones are extracted from the electronic version of the "Modern Chinese Dictionary".

　　Corpus preparation. Sentences that contain polyphones are extracted from the 2001 "People's Daily" corpus and labeled by pronunciation.

　　Polyphone disambiguation. This paper uses the context information of each polyphone for disambiguation and tests the approach on the corpus. Five models, namely CRF, Maximum Entropy, RFR_SUM, SVM, and semantic similarity, are used to disambiguate the pronunciation of 22 polyphones; their average accuracies are 85.27%, 91.63%, 94.04%, 89.96%, and 89.16%, respectively. A voting ensemble reaches 96.34%. Finally, this paper disambiguates polyphones with a seed-word-based method.

　　An automatic phonetic annotation system is built that can disambiguate 62 polyphones.
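The corpus-preparation and context-information steps above can be sketched as follows. This is only an illustration under stated assumptions: the corpus is taken to be already word-segmented (each sentence a list of tokens), and the "context information" is taken to be a fixed window of neighbouring words. The function name `context_windows`, the `window` parameter, and the sample sentences are hypothetical and not from the source.

```python
def context_windows(sentences, polyphones, window=2):
    """Yield (word, left_context, right_context) for every occurrence
    of a polyphone in a segmented corpus.

    sentences  -- iterable of token lists (assumed pre-segmented)
    polyphones -- set of polyphonic words to look for
    window     -- tokens kept on each side (illustrative choice)
    """
    for tokens in sentences:
        for i, tok in enumerate(tokens):
            if tok in polyphones:
                left = tokens[max(0, i - window):i]
                right = tokens[i + 1:i + 1 + window]
                yield tok, left, right


# Two made-up sentences containing the polyphone 地道.
corpus = [
    ["这", "家", "餐馆", "的", "菜", "很", "地道"],
    ["他们", "挖", "了", "一", "条", "地道", "通向", "村外"],
]
for word, left, right in context_windows(corpus, {"地道"}):
    print(word, left, right)
```

Each yielded tuple would then be labeled with the correct pronunciation and converted to features for a classifier such as the ones listed above.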

Key Words: polyphone disambiguation, automatic phonetic annotation, CRF, Maximum Entropy, RFR_SUM, SVM, semantic similarity, seed word

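　　An important task of natural language processing is to resolve the many ambiguities that occur in language. Studies of large corpora show that modern Chinese contains a large number of polyphones. This thesis therefore first studies the polyphones of modern Chinese; on that basis, it applies machine learning methods to pronunciation disambiguation; finally, it develops an automatic phonetic annotation system.

　　The main work of this thesis is as follows:

　　1. Statistical analysis of the polyphones in the 《现代汉语词典》 (Modern Chinese Dictionary);

　　2. Extraction and annotation of sentences containing polyphones from the 《人民日报》 (People's Daily) corpus;

　　3. Disambiguation of polyphone pronunciations using context information;

　　4. Construction of an automatic phonetic annotation system.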
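Step 1 above can be illustrated with a minimal sketch. The source does not describe the format of the electronic dictionary, so this assumes it has already been parsed into (headword, pronunciation) pairs; the function name `find_polyphones` and the sample entries are hypothetical.

```python
from collections import defaultdict


def find_polyphones(entries):
    """Group dictionary entries by headword and keep the headwords
    that have more than one distinct pronunciation.

    entries -- iterable of (headword, pinyin) pairs (assumed format)
    """
    readings = defaultdict(set)
    for word, pinyin in entries:
        readings[word].add(pinyin)
    return {w: sorted(p) for w, p in readings.items() if len(p) > 1}


# Made-up sample entries in numbered pinyin.
sample = [
    ("地道", "di4dao4"),
    ("地道", "di4dao5"),
    ("语料", "yu3liao4"),
]
polyphones = find_polyphones(sample)
print(len(polyphones), polyphones)
```

The size of the returned dictionary gives the polyphone count, and its values give the pronunciation inventory that the corpus annotation would draw on.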
