软件学报
JOURNAL OF SOFTWARE
2013, Issue 5, pp. 1167-1182 (16 pages)
Keywords: uncertain data; data quality; consistent query answer; integrity constraints; data cleaning
Inconsistent data does not correctly reflect the real world, so query answers computed over it may contain errors or contradictions. Computing credible query answers over such data is therefore important, yet many existing approaches to querying inconsistent data lose information. The annotation based query answer (AQA) approach addresses this problem by using confidence annotations to distinguish consistent from inconsistent data at the attribute-value level, so that credible query answers can be computed without information loss. AQA, however, assumes that a tuple's values on the left-hand-side attributes of a dependency are credible, and it supports only functional dependencies, which limits its applicability. This paper extends AQA to settings that combine multiple kinds of constraints (functional dependencies, inclusion dependencies, and domain constraints) and in which any attribute may be uncertain. It redefines AQA's data model and query algebra, discusses the rules for computing the valid constraints implied on the result of any query-algebra operation, and proposes a cost-based heuristic algorithm to repair and annotate the initial database. Experiments show that the extended AQA performs almost the same as plain SQL for queries without joins and, after optimization, close to plain SQL for join queries, while losing no information; this gives it a considerable advantage over several related approaches.
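Attribute-level confidence annotation can be illustrated with a small sketch. This is not the paper's algorithm: the two-label scheme (`CREDIBLE`/`UNCERTAIN`), the `Cell` type, and the helpers `annotate_fd`, `annotate_ind`, `annotate_domain`, and `select_eq` are hypothetical. The FD rule follows the original AQA simplification that left-hand-side values are trusted, which the extended approach relaxes. Treating inclusion-dependency and domain violations as "mark the offending value uncertain" is likewise an assumption, and the paper's cost-based repair heuristic is not reproduced.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical two-level label set; the paper's own annotation scheme may differ.
CREDIBLE = "credible"
UNCERTAIN = "uncertain"


@dataclass
class Cell:
    """One attribute value plus its confidence annotation."""
    value: object
    label: str = CREDIBLE


def make_rows(records):
    """Wrap plain dict records so every value starts out credible."""
    return [{attr: Cell(v) for attr, v in rec.items()} for rec in records]


def annotate_fd(rows, lhs, rhs):
    """Annotate for the functional dependency lhs -> rhs.

    Assumes (as original AQA did) that lhs values are trusted: when tuples
    agree on lhs but disagree on rhs, every rhs cell in that group is marked
    uncertain. No tuple is dropped, so no information is lost.
    """
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[a].value for a in lhs)].append(row)
    for group in groups.values():
        if len({row[rhs].value for row in group}) > 1:
            for row in group:
                row[rhs].label = UNCERTAIN
    return rows


def annotate_ind(rows, attr, referenced_values):
    """Inclusion dependency R[attr] ⊆ S[...]: values absent from S become uncertain."""
    allowed = set(referenced_values)
    for row in rows:
        if row[attr].value not in allowed:
            row[attr].label = UNCERTAIN
    return rows


def annotate_domain(rows, attr, predicate):
    """Domain constraint: values failing the predicate become uncertain."""
    for row in rows:
        if not predicate(row[attr].value):
            row[attr].label = UNCERTAIN
    return rows


def select_eq(rows, attr, value):
    """Selection that keeps each matching tuple's annotations, so callers
    can tell credible answers from uncertain ones."""
    return [row for row in rows if row[attr].value == value]


if __name__ == "__main__":
    emp = make_rows([
        {"id": 1, "dept": "R&D", "city": "Beijing", "age": 34},
        {"id": 2, "dept": "R&D", "city": "Shanghai", "age": -5},
        {"id": 3, "dept": "HR", "city": "Beijing", "age": 41},
    ])
    annotate_fd(emp, ["dept"], "city")                  # dept -> city
    annotate_ind(emp, "dept", ["R&D", "HR", "Sales"])   # emp[dept] ⊆ dept[name]
    annotate_domain(emp, "age", lambda a: 0 <= a <= 120)
    for row in select_eq(emp, "city", "Beijing"):
        print(row["id"].value, row["city"].value, row["city"].label)
```

Both R&D tuples keep their city values, but those values are flagged as uncertain because they conflict under `dept -> city`. A query for Beijing therefore still returns tuple 1, labelled so the consumer knows the answer is not certain, instead of silently discarding it.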