CAJ | Academic paper

Virtual screening is increasingly used as a cost-effective complement to high-throughput screening. When the structure of the target macromolecule is unknown, ligand-based virtual screening is usually applied, and similarity methods play a key role in it. In this paper, hierarchical agglomerative clustering was carried out on a database of active compounds from Traditional Chinese Medicine (TCM). Many distance and similarity measures and similarity coefficients are in use in chemical information systems. Here, the widely used Daylight fingerprint was adopted to represent chemical structures and select features, and the Tanimoto coefficient computed on these fingerprints with the Chemistry Development Kit (CDK) served as the molecular similarity measure. Before clustering, domain knowledge about chemical structures was applied to preprocess the molecules in the TCM database. A similarity threshold of 0.75 was set for the hierarchical clustering. The penalty value of the Kelly method was calculated during clustering to obtain the optimal number of clusters, and this number was very close to the number of clusters produced with the 0.75 threshold. For each non-singleton cluster, multiple molecules were selected as representatives of that cluster. The shortcomings (bias) of the Tanimoto coefficient were also analyzed on the basis of the clustering results. In future work, scaffold analysis and diversity analysis can be performed on the TCM database, together with scaffold-based clustering.
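As a rough illustration of the workflow summarized above, the following Python sketch computes the Tanimoto coefficient on fingerprints and merges clusters agglomeratively until no pair reaches the 0.75 similarity threshold. It is not the authors' implementation and does not use CDK or real Daylight fingerprints: the fingerprints are hypothetical toy sets of on-bit indices, the function names `tanimoto`, `cluster_similarity` and `agglomerate` are invented for this example, and average linkage is an assumption, since the abstract does not state which linkage was used.

```python
from itertools import combinations


def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto coefficient of two fingerprints given as sets of on-bit indices."""
    union = len(a | b)
    # Two all-zero fingerprints have no defined ratio; returning 1.0 is an assumption.
    return len(a & b) / union if union else 1.0


def cluster_similarity(c1, c2, fps):
    """Average-linkage similarity (assumed): mean pairwise Tanimoto between clusters."""
    sims = [tanimoto(fps[i], fps[j]) for i in c1 for j in c2]
    return sum(sims) / len(sims)


def agglomerate(fps, threshold=0.75):
    """Merge the most similar pair of clusters until none reaches the threshold."""
    clusters = [[i] for i in range(len(fps))]  # start with one cluster per molecule
    while len(clusters) > 1:
        best_sim, best_pair = -1.0, None
        for i, j in combinations(range(len(clusters)), 2):
            sim = cluster_similarity(clusters[i], clusters[j], fps)
            if sim > best_sim:
                best_sim, best_pair = sim, (i, j)
        if best_sim < threshold:
            break  # no remaining pair is similar enough to merge
        i, j = best_pair  # i < j, so deleting index j leaves index i valid
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


# Hypothetical toy fingerprints (sets of on-bit positions), not real molecules.
fps = [
    frozenset({1, 2, 3, 4}),
    frozenset({1, 2, 3, 4, 5}),
    frozenset({1, 2, 3, 5}),
    frozenset({10, 11, 12}),
    frozenset({10, 11, 12, 13}),
    frozenset({20, 21}),
]

clusters = agglomerate(fps, threshold=0.75)
print(clusters)  # [[0, 1], [2], [3, 4], [5]]
```

In this toy run, molecule 2 stays a singleton even though its similarity to molecule 1 is 0.8, because its average similarity to the merged cluster {0, 1} is only 0.7. This shows how the linkage choice, and not just the threshold, shapes the final cluster count.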