计算机应用与软件
COMPUTER APPLICATIONS AND SOFTWARE
2014
Issue 2
Pages 44-48
5 pages in total
倾向性文本%依存关系%词性%特征集
Tendentious text%Dependency relation%Part of speech%Feature set
通过大规模语料实验和分析,揭示倾向性文本与普通文本在词性特征、依存关系、依存关系中的词性特征、邻接依存关系以及邻接依存关系中的词性特征等五个方面客观存在的差异。总结出若干有意义的结论,如:名词、副词、拟声词、状中结构、副词动词序列等在有倾向性文本中占有率明显高于普通文本;地理名、专有名词、定中关系、名词名词序列等在有倾向性文本中占有率明显低于普通文本等等。这些结论可以作为使用机器学习方法进行文本倾向性判断与分析的特征集使用。
Through experiments and analysis on a large-scale corpus, we reveal objective differences between tendentious text and ordinary text in five respects: part-of-speech features, dependency relations, part-of-speech features within dependency relations, adjacent dependency relations, and part-of-speech features within adjacent dependency relations. We draw several meaningful conclusions. Nouns, adverbs, onomatopoeia, adverbial-head structures, and adverb-verb sequences, among others, account for a significantly higher proportion of tendentious text than of ordinary text, whereas geographical names, proper nouns, attribute-head relations, and noun-noun sequences, among others, account for a significantly lower proportion. These conclusions can serve as a feature set for machine-learning methods that classify and analyse the orientation of text.
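The kind of comparison the abstract describes can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the tag labels (`n`, `d`, `v`, `ns`), and the two one-sentence toy corpora are hypothetical, and the input is assumed to be already segmented and part-of-speech tagged. The sketch computes each tag's share of tokens and each adjacent tag pair's share of bigrams in two corpora, then takes the difference.

```python
from collections import Counter
from itertools import chain


def tag_shares(sentences):
    """Share of all tokens carried by each POS tag in a corpus."""
    counts = Counter(tag for _, tag in chain.from_iterable(sentences))
    total = sum(counts.values())
    return {tag: n / total for tag, n in counts.items()} if total else {}


def bigram_shares(sentences):
    """Share of all adjacent tag pairs carried by each POS-tag bigram."""
    counts = Counter()
    for sent in sentences:
        tags = [tag for _, tag in sent]
        counts.update(zip(tags, tags[1:]))  # pairs never cross sentences
    total = sum(counts.values())
    return {pair: n / total for pair, n in counts.items()} if total else {}


def share_gaps(target, baseline):
    """Share in target minus share in baseline, for every observed key."""
    keys = set(target) | set(baseline)
    return {k: target.get(k, 0.0) - baseline.get(k, 0.0) for k in keys}


# Hypothetical toy corpora: lists of sentences of (word, tag) pairs.
# Assumed tags: n = noun, d = adverb, v = verb, ns = geographical name.
tendentious = [[("service", "n"), ("really", "d"), ("disappoints", "v")]]
ordinary = [[("Beijing", "ns"), ("station", "n"), ("opens", "v")]]

pos_gap = share_gaps(tag_shares(tendentious), tag_shares(ordinary))
seq_gap = share_gaps(bigram_shares(tendentious), bigram_shares(ordinary))

# Features with the largest absolute gap are candidates for the feature set.
ranked = sorted(pos_gap.items(), key=lambda kv: abs(kv[1]), reverse=True)
```

A positive gap marks a feature that is over-represented in the tendentious corpus and a negative gap one that is under-represented. Dependency-relation labels produced by a parser could be counted the same way by passing (word, relation) pairs instead of (word, tag) pairs.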