计算机与应用化学
COMPUTERS AND APPLIED CHEMISTRY
2013
Issue 6
pp. 575-581
7 pages
Peng Tao (彭涛)%Sun Lianying (孙连英)%Zhou Jiaju (周家驹)
hierarchical clustering%traditional Chinese medicine (TCM)%molecular fingerprint%virtual screening%Ward's method%Tanimoto coefficient
Virtual screening is increasingly used as a cost-effective complement to high-throughput screening. When the structure of the target macromolecule is unknown, ligand-based virtual screening is typically used, and similarity methods play a key role in it. In this work, hierarchical agglomerative clustering was performed on a database of the active compounds of traditional Chinese medicine (the TCM database). Many distance and similarity measures and similarity coefficients are used in chemical information systems. Here, the widely used Daylight fingerprint was adopted to represent chemical structures, and the Tanimoto coefficient, computed on these fingerprints with the Chemistry Development Kit (CDK), served as the molecular similarity measure. Before clustering, the data were preprocessed using chemical-structure domain knowledge. A similarity threshold of 0.75 was used for clustering. The penalty value of the Kelly method was also calculated during clustering to estimate the optimal number of clusters; the resulting number is very close to the number of clusters obtained with the 0.75 threshold. For each cluster containing more than one compound, multiple compounds were selected as representatives. The shortcomings of the Tanimoto coefficient were also analyzed on the basis of the clustering results. Future work could include scaffold analysis and diversity analysis of the TCM database, as well as scaffold-based clustering.
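The similarity-threshold clustering described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the paper used CDK (a Java library) with Daylight fingerprints, and the keywords mention Ward's method, whereas this sketch assumes hand-made fingerprints given as sets of "on" bit positions and swaps in naive average linkage for simplicity. The names tanimoto, cluster, and toy_fingerprints are hypothetical.

```python
from itertools import combinations


def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto coefficient of two fingerprints given as sets of 'on' bits."""
    if not a and not b:
        return 1.0  # convention assumed here for two empty fingerprints
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def cluster(fingerprints: dict, threshold: float = 0.75) -> list:
    """Naive average-linkage agglomerative clustering (assumption: not Ward).

    Repeatedly merges the two most similar clusters and stops once the best
    average inter-cluster similarity drops below `threshold`.
    """
    clusters = [[name] for name in fingerprints]

    def avg_sim(c1, c2):
        sims = [tanimoto(fingerprints[x], fingerprints[y])
                for x in c1 for y in c2]
        return sum(sims) / len(sims)

    while len(clusters) > 1:
        # Find the pair of clusters with the highest average similarity.
        (i, j), best = max(
            (((i, j), avg_sim(clusters[i], clusters[j]))
             for i, j in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1],
        )
        if best < threshold:
            break
        # i < j always holds, so deleting j leaves index i valid.
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


# Toy fingerprints invented for illustration; not data from the TCM database.
toy_fingerprints = {
    "mol_a": frozenset({1, 2, 3, 4}),
    "mol_b": frozenset({1, 2, 3, 4, 5}),
    "mol_c": frozenset({10, 11, 12}),
    "mol_d": frozenset({10, 11, 12, 13}),
    "mol_e": frozenset({20, 21}),
}
result = cluster(toy_fingerprints, threshold=0.75)
print(result)
```

With the 0.75 threshold, molecules that share most of their bits end up in the same cluster, and a molecule with no similar neighbor remains a singleton. A production workflow would more likely compute fingerprints with a cheminformatics toolkit and use an optimized clustering library rather than this quadratic-memory loop.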