CAJ | Academic paper

分类体系是信息组织的有效形式，传统文献分类体系难以适用分类对象的转变，实用性不足，已有的网络分类体系则缺乏科学性。构建融合实用性与科学性的互联网信息分类体系，能够有效满足用户信息需求，且是自动文本分类技术研究的基础。文章分别以中图法、新浪门户为例，研究传统文献分类法与网络信息分类法的优缺点，提出互联网信息分类体系的实用性、科学性以及均衡性设计原则，基于三个设计原则构建了互联网信息分类体系。为了验证所构建的分类体系的有效性，通过网络爬虫抓取网易门户以及腾讯网的语料作为实验数据，与复旦语料库的分类体系进行对比实验。实验结果表明，相比于复旦语料库的分类体系，文章所提出的互联网信息分类体系具有更高的实用性，且能更为全面地涵盖各种互联网信息，类目之间交叉度小，各个类目信息量接近，文本分类效果更为理想。
A classification system is an effective way to organize information. Traditional document classification systems have difficulty adapting to changes in the objects being classified and are therefore of limited practical use, while existing web classification systems lack scientific rigor. An Internet information classification system that combines practicality with scientific rigor can effectively meet users' information needs and also provides a foundation for research on automatic text classification. Taking the Chinese Library Classification and the Sina portal as respective examples, this paper examines the advantages and disadvantages of traditional document classification and web information classification, and proposes three design principles for an Internet information classification system: practicality, scientific rigor, and balance. An Internet information classification system is then constructed on the basis of these three principles. To verify its validity, a web crawler was used to collect corpora from the NetEase portal (www.163.com) and Tencent (www.qq.com) as experimental data, and a comparative experiment was carried out against the classification system of the Fudan Corpus. The experimental results show that, compared with the Fudan Corpus classification system, the proposed system is more practical, covers the range of Internet information more comprehensively, has less overlap between categories, has categories of similar information volume, and yields better text classification results.