CAJ | Academic paper

To address the limitations of the Latent Semantic Analysis (LSA) method on massive, high-dimensional data, this paper proposes KLSA, an improved LSA method based on the k-means algorithm. KLSA first uses k-means clustering to preprocess the feature words, reducing them to a relatively low-dimensional space, and then applies LSA. To validate the idea, the paper runs an experiment on text data from Sina Weibo. The results show that the proposed method meets the computation-speed requirements of massive high-dimensional data while preserving classification performance.
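The abstract does not spell out how the clustered feature words are fed into LSA, so the following is only a minimal sketch of the two-stage idea, not the paper's implementation. It assumes a term-document count matrix, a plain Lloyd's k-means over the term rows, a merge step that sums the rows of each term cluster, and LSA realised as a truncated SVD. The names `kmeans`, `klsa`, `n_clusters`, and `n_topics` are hypothetical, and the input data is synthetic.

```python
import numpy as np


def kmeans(X, k, n_iter=50, seed=0):
    """Plain Lloyd's k-means; returns one cluster label per row of X."""
    rng = np.random.default_rng(seed)
    # Initialise centroids from k distinct rows of X.
    centroids = X[rng.choice(X.shape[0], size=k, replace=False)].astype(float)
    labels = np.zeros(X.shape[0], dtype=int)
    for _ in range(n_iter):
        # Squared Euclidean distance from every row to every centroid.
        dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = X[labels == j]
            if len(members):  # keep the old centroid if a cluster is empty
                centroids[j] = members.mean(axis=0)
    return labels


def klsa(term_doc, n_clusters, n_topics, seed=0):
    """Cluster terms with k-means, merge each cluster, then apply LSA."""
    labels = kmeans(term_doc, n_clusters, seed=seed)
    # Merge step (an assumption): sum the rows of all terms in a cluster,
    # shrinking the matrix from (terms x docs) to (clusters x docs).
    reduced = np.zeros((n_clusters, term_doc.shape[1]))
    np.add.at(reduced, labels, term_doc)
    # LSA step: truncated SVD of the reduced matrix.
    U, s, Vt = np.linalg.svd(reduced, full_matrices=False)
    doc_vectors = (s[:n_topics, None] * Vt[:n_topics]).T  # one row per document
    return labels, reduced, doc_vectors


# Synthetic stand-in for a term-document matrix (200 terms, 40 documents).
rng = np.random.default_rng(1)
term_doc = rng.poisson(0.3, size=(200, 40)).astype(float)
labels, reduced, doc_vectors = klsa(term_doc, n_clusters=20, n_topics=5)
print(reduced.shape, doc_vectors.shape)
```

In this sketch the SVD runs on a 20 x 40 matrix instead of the original 200 x 40 one, which is where a speed-up of the kind the abstract describes would come from. `n_topics` must not exceed the smaller of `n_clusters` and the number of documents.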