CAJ | Academic paper

The spread of the Internet and the growth of e-commerce have changed how people obtain information and how they consume. The Web has become an important source of information for most people. At the same time, problems of information quality on the Internet have become increasingly prominent: the Web contains a large amount of outdated, erroneous, false, and one-sided information. The problem of different websites providing conflicting information about the same object is especially pronounced. How to find the correct information among these conflicting claims is an urgent question, known as the truth discovery problem. A survey of existing truth discovery methods found that none of them considers how differences in a data source's credibility across data categories affect truth discovery. We therefore pose the problem of truth discovery based on the per-category credibility of data sources. We propose two methods for detecting differences in the per-category credibility of data sources, and use a Bayesian method to iteratively compute the per-category credibility of data sources and the accuracy of attribute values. In addition, we take into account the effect of data source coverage and object difficulty on truth discovery, which further improves the accuracy of the truth discovery algorithm. Experiments on a real data set show that the proposed methods significantly improve the accuracy of truth discovery.
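The iterative idea in the abstract, alternating between estimating value accuracy and estimating per-category source credibility, can be illustrated with a minimal sketch. This is not the authors' algorithm: the log-odds voting rule, the parameters `n_false` and `init_trust`, the clamping bounds, and the toy claims are all assumptions made for illustration, and source coverage and object difficulty are left out.

```python
import math
from collections import defaultdict


def discover_truth(claims, categories, n_false=10, init_trust=0.8, iterations=20):
    """Toy truth discovery with one trust score per (source, category).

    claims     : list of (source, obj, value) triples
    categories : dict mapping obj -> category
    n_false    : assumed number of wrong values per object (illustrative)
    """
    trust = defaultdict(lambda: init_trust)  # keyed by (source, category)
    probs = {}
    for _ in range(iterations):
        # Step 1: each source votes for its value with a log-odds weight
        # derived from its current trust in the object's category.
        scores = defaultdict(lambda: defaultdict(float))
        for source, obj, value in claims:
            t = trust[(source, categories[obj])]
            scores[obj][value] += math.log(n_false * t / (1.0 - t))

        # Normalize the scores of each object into probabilities (softmax).
        probs = {}
        for obj, by_value in scores.items():
            top = max(by_value.values())
            weights = {v: math.exp(s - top) for v, s in by_value.items()}
            total = sum(weights.values())
            probs[obj] = {v: w / total for v, w in weights.items()}

        # Step 2: a source's trust in a category is the mean probability
        # of the values it claimed there, clamped away from 0 and 1.
        claimed = defaultdict(list)
        for source, obj, value in claims:
            claimed[(source, categories[obj])].append(probs[obj][value])
        for key, ps in claimed.items():
            trust[key] = min(0.99, max(0.01, sum(ps) / len(ps)))

    truths = {obj: max(by_value, key=by_value.get) for obj, by_value in probs.items()}
    return truths, dict(trust)


# Hypothetical data: source A is reliable on books, source B on movies.
categories = {"b1": "book", "b2": "book", "b3": "book",
              "m1": "movie", "m2": "movie", "m3": "movie"}
claims = [
    ("A", "b1", "x"), ("C", "b1", "x"), ("B", "b1", "y"),
    ("A", "b2", "x"), ("C", "b2", "x"), ("B", "b2", "z"),
    ("A", "b3", "x"), ("B", "b3", "y"),
    ("B", "m1", "p"), ("C", "m1", "p"), ("A", "m1", "q"),
    ("B", "m2", "p"), ("C", "m2", "p"), ("A", "m2", "r"),
    ("A", "m3", "q"), ("B", "m3", "p"),
]
truths, trust = discover_truth(claims, categories)
print(truths)
print(trust)
```

Objects `b3` and `m3` are each claimed by only A and B, with conflicting values. A single trust score per source could not separate the two sources, because each is right on one category and wrong on the other; keeping a score per (source, category) pair lets the evidence from the other objects in the same category break the tie.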