Computer Applications and Software (计算机应用与软件)
2015
Issue 8
pp. 17-21, 63
6 pages in total
Hadoop distributed system; MapReduce parallel computing; approximate spectral clustering algorithm; sparse approximate similarity matrix; large-scale high-dimensional data
With the advent of the information age, the large-scale, high-dimensional data generated on the Internet is growing exponentially. Spectral clustering of such data runs into bottlenecks in both computation time and memory use, particularly in the eigenvector decomposition of the Laplacian matrix. Given the strengths of the Hadoop MapReduce parallel programming model for data-intensive processing, this paper designs a Hadoop MapReduce parallel approximate spectral clustering algorithm, based on a sparse approximation of the similarity Laplacian matrix obtained by t-nearest-neighbour sparsification, to address these bottlenecks. Experiments on the UCI Bag of Words dataset validate the correctness and effectiveness of the designed algorithm; the results show that the parallel design achieves the expected effect, to a certain degree, in both spectral clustering quality and performance.