软件学报
JOURNAL OF SOFTWARE
2014
Issue 8
pp. 1781-1793
13 pages in total
hash function; subspace; bias; locality preserving; discriminant
As data volumes keep growing, fast and accurate indexing methods have become essential for information retrieval. This paper proposes a hashing-based indexing method built on subspace learning. First, a subspace is learned from a subset of labeled data. To preserve locality among samples with the same semantic label after indexing, intra-class compactness is measured by the distances between each sample and its nearest neighbors. To strengthen the discriminative power of the codes for samples with different semantic labels, inter-class scatter is measured by the distances between the class centers. After the constraints of the objective are relaxed, the subspace is learned with a procedure similar to linear discriminant analysis, and its basis vectors serve as the projection vectors of the hash functions. The biases are then computed from the learned projections, which completes the hash functions. The method is evaluated on the MNIST and CIFAR-10 datasets in code-discriminability and locality-preservation experiments, where it compares favorably with related methods. The results indicate that the proposed method is effective at retrieving semantically similar neighbors.
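The pipeline described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' formulation, because the abstract gives no equations. Everything below is an assumption made for illustration: the function names (learn_projections, learn_biases, hash_codes), the neighborhood size k, the regularizer reg, the use of a generalized eigenproblem as the LDA-like step, and in particular the median-based bias rule, which is only one common way to derive a bias from a projection.

```python
import numpy as np


def learn_projections(X, y, n_bits, k=5, reg=1e-3):
    """LDA-like subspace learning (illustrative assumption, not the paper's exact objective).

    X: (n, d) labeled samples, y: (n,) class labels.
    Returns W of shape (d, n_bits); each column is one projection vector.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    d = X.shape[1]
    classes = np.unique(y)

    # Intra-class scatter: differences between each sample and its
    # k nearest neighbors that share the same label.
    Sw = np.zeros((d, d))
    for c in classes:
        Xc = X[y == c]
        kk = min(k, len(Xc) - 1)
        if kk < 1:
            continue
        norms = (Xc ** 2).sum(axis=1)
        sq = norms[:, None] + norms[None, :] - 2.0 * Xc @ Xc.T
        np.fill_diagonal(sq, np.inf)  # a sample is not its own neighbor
        nn = np.argsort(sq, axis=1)[:, :kk]
        diffs = (Xc[:, None, :] - Xc[nn]).reshape(-1, d)
        Sw += diffs.T @ diffs

    # Inter-class scatter: differences between all pairs of class centers.
    centers = np.stack([X[y == c].mean(axis=0) for c in classes])
    cdiffs = (centers[:, None, :] - centers[None, :, :]).reshape(-1, d)
    Sb = cdiffs.T @ cdiffs

    # Relaxed problem: maximize w^T Sb w / w^T Sw w, solved as a
    # symmetric eigenproblem after whitening with Sw = L L^T.
    Sw += reg * (np.trace(Sw) / d + 1e-12) * np.eye(d)
    L_inv = np.linalg.inv(np.linalg.cholesky(Sw))
    _, V = np.linalg.eigh(L_inv @ Sb @ L_inv.T)  # eigenvalues in ascending order
    return L_inv.T @ V[:, ::-1][:, :n_bits]


def learn_biases(X, W):
    """Assumed bias rule: negative median of each projection, so every bit splits the data evenly."""
    return -np.median(np.asarray(X, dtype=float) @ W, axis=0)


def hash_codes(X, W, b):
    """Binary codes from h(x) = [w^T x + b > 0]."""
    return (np.asarray(X, dtype=float) @ W + b > 0).astype(np.uint8)
```

In this sketch the inter-class scatter has rank at most one less than the number of classes, so projections beyond that count carry no discriminative information. How the paper handles longer codes is not stated in the abstract.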