计算机工程与设计
COMPUTER ENGINEERING AND DESIGN
2014
Issue 8
pp. 2769-2772, 2784
5 pages in total
数据发布%隐私保护%多约束%粗糙集%属性重要度
data publication%privacy preservation%multiple constraints%rough set%attribute significance
针对传统匿名算法采用相同的匿名强度实现k-划分,常导致所要发布数据的隐私保护程度与数据可用性之间失衡的问题,提出一种基于粗糙集属性重要度的多约束匿名化方法。根据准标识符属性重要度的差别,对准标识符属性维度进行自动划分,实现多约束匿名参数的设计,对具有不同维度的划分进行相应的匿名化操作。基于粗糙集理论和信息熵理论,设计了一种分类型数据可用性评估模型。从数据泛化后的信息损失、等价类对集合划分导致的信息熵改变两方面综合评估匿名化数据表的信息损失量。实验结果表明,采用该方法能够较好地实现数据的隐私保护和数据可用性之间的平衡。
Traditional anonymization algorithms apply the same anonymity strength to every attribute when performing k-partitioning, which often leaves the privacy protection and the utility of the published data out of balance. To address this problem, a multi-constraint anonymization method based on rough-set attribute significance is proposed, which takes into account the differing influence of the quasi-identifier attributes. The quasi-identifier dimensions are divided automatically according to attribute significance, which yields the multi-constraint anonymity parameters, and each resulting partition is then anonymized with the parameters that match its dimensions. In addition, a utility evaluation model for categorical data is designed on the basis of rough set theory and information entropy theory. It evaluates the information loss of the anonymized table comprehensively from two aspects: the information lost when attribute values are generalized, and the change in information entropy caused by partitioning the data set into equivalence classes. Experimental results show that the method achieves a better balance between privacy protection and data utility.
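The two quantities the abstract relies on, attribute significance and the entropy of an equivalence-class partition, can be illustrated with a minimal sketch. The abstract does not give the paper's exact definitions, so the code assumes the standard Pawlak dependency-degree form of significance and the Shannon entropy of the partition. The table `ROWS`, its column names, and all function names are hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from math import log2

# Hypothetical toy table: two quasi-identifiers and one sensitive attribute.
ROWS = [
    {"age": "20s", "zip": "A", "disease": "flu"},
    {"age": "20s", "zip": "B", "disease": "cold"},
    {"age": "30s", "zip": "A", "disease": "flu"},
    {"age": "30s", "zip": "B", "disease": "cold"},
]

def partition(rows, attrs):
    """Group row indices into equivalence classes by their values on attrs."""
    blocks = defaultdict(list)
    for i, row in enumerate(rows):
        blocks[tuple(row[a] for a in attrs)].append(i)
    return list(blocks.values())

def dependency(rows, cond_attrs, decision):
    """Assumed Pawlak dependency degree: share of rows whose equivalence
    class (under cond_attrs) carries a single decision value."""
    positive = 0
    for block in partition(rows, cond_attrs):
        if len({rows[i][decision] for i in block}) == 1:
            positive += len(block)
    return positive / len(rows)

def significance(rows, cond_attrs, decision):
    """Significance of an attribute = drop in dependency when it is removed."""
    full = dependency(rows, cond_attrs, decision)
    return {
        a: full - dependency(rows, [b for b in cond_attrs if b != a], decision)
        for a in cond_attrs
    }

def partition_entropy(rows, attrs):
    """Shannon entropy (in bits) of the partition induced by attrs."""
    n = len(rows)
    return -sum(len(b) / n * log2(len(b) / n) for b in partition(rows, attrs))

if __name__ == "__main__":
    print(significance(ROWS, ["age", "zip"], "disease"))
    print(partition_entropy(ROWS, ["age", "zip"]))
    print(partition_entropy(ROWS, ["zip"]))
```

In this toy table `zip` alone determines `disease`, so it receives significance 1.0 while `age` receives 0.0. A multi-constraint scheme in the spirit of the abstract could then generalize the less significant attribute more aggressively. Comparing the partition entropy before and after dropping `age` shows how merging equivalence classes changes the entropy term of an information-loss measure.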