燕山大学学报
JOURNAL OF YANSHAN UNIVERSITY
2014, No. 6, pp. 523-531, 543 (10 pages)
Keywords: relational database; frequent subgraph mining; clustering; common features; credibility calculation
When cleaning algorithms cannot effectively correct inconsistent data, "informed" users may submit reviews stating the correct values. These reviews are highly valuable to the other users of the database. They help those users identify erroneous data and extract as much useful information as possible from the inconsistent database without losing information. However, only correct and credible reviews have this value, so estimating review credibility is a key problem in such applications.

Unlike the Internet, a relational database is generally open only to users inside the system. User features are therefore easier to extract, and their semantics are well defined. Because the data describe the real world, users who submit similar reviews on the same object tend to share the same background or semantic features.

Based on this observation, this paper presents an algorithm that calculates review credibility from an analysis of user features. The algorithm works in two steps:

1. It mines communities among historical reviewers according to their semantic features. This yields the common features of the users who reviewed a given object with a certain accuracy, and these common features form a user template.
2. For any new review, it computes credibility from the degree to which the reviewer matches the user template of the reviewed object. It also weighs other key factors, such as the reviewer's own credibility and the semantic relevance between the reviewer and the reviewed object.

Experimental results show that the algorithm is effective in terms of both running time and accuracy.
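The two steps of the algorithm can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the paper's method, because the abstract gives neither the mining procedure nor the scoring formula.

In place of the community mining (which the keywords associate with frequent subgraph mining and clustering), the sketch uses a simple rule. It keeps the features shared by at least a hypothetical `min_support` fraction of the users whose reviews of an object turned out to be correct.

Everything else in the sketch is also hypothetical:

* the matching degree (fraction of template features the reviewer has);
* the weighted-sum combination and its `weights`;
* all names (`HistoricalReview`, `build_templates`, `match_degree`, `review_credibility`).

The reviewer's credibility and the reviewer-object semantic relevance are assumed to be supplied as numbers in [0, 1].

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class HistoricalReview:
    """Hypothetical record of a past review."""
    reviewer: str
    obj: str       # reviewed object, e.g. "order#17.price"
    correct: bool  # assumption: correctness of past reviews is known


def build_templates(reviews, user_features, min_support=0.6):
    """Step 1 (simplified stand-in for community mining): for each object,
    keep the features shared by at least `min_support` of the users who
    reviewed it correctly."""
    correct_reviewers = {}
    for r in reviews:
        if r.correct:
            correct_reviewers.setdefault(r.obj, set()).add(r.reviewer)
    templates = {}
    for obj, users in correct_reviewers.items():
        counts = Counter(f for u in users for f in user_features.get(u, ()))
        templates[obj] = {f for f, c in counts.items()
                          if c / len(users) >= min_support}
    return templates


def match_degree(features, template):
    """Fraction of the template's common features that the reviewer has."""
    if not template:
        return 0.0
    return len(set(features) & template) / len(template)


def review_credibility(reviewer, obj, templates, user_features,
                       user_cred, relevance, weights=(0.5, 0.3, 0.2)):
    """Step 2 (hypothetical weighted sum): combine template matching,
    the reviewer's own credibility and reviewer-object relevance."""
    w_match, w_user, w_rel = weights
    m = match_degree(user_features.get(reviewer, ()), templates.get(obj, set()))
    return w_match * m + w_user * user_cred[reviewer] + w_rel * relevance


# Usage with made-up data.
features = {
    "alice": {"dept:sales", "region:north"},
    "bob": {"dept:sales", "region:south"},
    "carol": {"dept:sales", "region:north"},
    "dave": {"dept:it", "region:north"},
    "erin": {"dept:sales", "region:east"},
}
history = [
    HistoricalReview("alice", "order#17.price", True),
    HistoricalReview("bob", "order#17.price", True),
    HistoricalReview("carol", "order#17.price", True),
    HistoricalReview("dave", "order#17.price", False),
]
templates = build_templates(history, features)
score = review_credibility("erin", "order#17.price", templates, features,
                           user_cred={"erin": 0.8}, relevance=0.9)
print(round(score, 2))  # → 0.67
```

Here the template for `order#17.price` is `{"dept:sales", "region:north"}`, and `erin` matches half of it. With weights summing to 1 and all three inputs in [0, 1], the score also stays in [0, 1]. In practice the weights would presumably need tuning on reviews whose correctness is already known.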