光谱学与光谱分析
SPECTROSCOPY AND SPECTRAL ANALYSIS
2014, Issue 10, pp. 2701-2706 (6 pages)
刘桂松%郭昊淞%潘涛%王继华%曹干
Liu Guisong%Guo Haosong%Pan Tao%Wang Jihua%Cao Gan
转基因甘蔗育种筛查%Vis-NIR光谱%SG平滑%PCA-LDA%PCA-HCA
Breed screening of transgenic sugarcane%Vis-NIR spectroscopy%SG smoothing%PCA-LDA%PCA-HCA
以Savitzky-Golay (SG)平滑筛选,主成分分析(PCA)分别结合有监督的线性判别分析(LDA)、无监督的系统聚类分析(HCA),应用于转基因甘蔗育种筛查的可见-近红外(Vis-NIR)无损检测。提出兼顾随机性、稳定性的定标、预测、检验框架;取田间种植处于伸长期甘蔗叶样品456个,具有Bt基因和Bar基因的转基因样品(阳)306个,非转基因样品(阴)150个;随机选取156个为检验集(阴性50、阳性106),余下为建模集(阴性100、阳性200,共300),建模集再随机划分为定标集(阴性50、阳性100,共150)、预测集(阴性50、阳性100,共150)共50次;扩充SG平滑点数,同时删除绝对值偏小的高阶导数模式,共264个平滑模式用于模型筛选;采用前3个主成分两两组合,再根据模型效果选出最优主成分组合;基于所有定标、预测集划分和SG平滑模式,建立SG-PCA-LDA和SG-PCA-HCA模型,根据平均预测效果优选参数,使模型具有稳定性;最后用检验集进行模型检验。经SG平滑后,PCA-LDA和PCA-HCA的建模精度、稳定性均显著改善;最优SG-PCA-LDA模型阳性、阴性样品检验识别率分别达到94.3%和96.0%;最优SG-PCA-HCA模型阳性、阴性样品检验识别率分别达到92.5%和98.0%。结果表明:Vis-NIR光谱模式识别结合SG平滑可用于转基因甘蔗叶的准确识别,提供了一种简便的转基因甘蔗育种筛查方法。
Savitzky-Golay (SG) smoothing mode screening and principal component analysis (PCA), combined separately with supervised linear discriminant analysis (LDA) and unsupervised hierarchical cluster analysis (HCA), were applied to non-destructive visible and near-infrared (Vis-NIR) detection for breed screening of transgenic sugarcane. A calibration, prediction, and validation framework that accounts for both randomness and stability was proposed. A total of 456 sugarcane leaf samples were collected from field-grown plants at the elongation stage, comprising 306 transgenic (positive) samples containing the Bt and Bar genes and 150 non-transgenic (negative) samples. A total of 156 samples (50 negative, 106 positive) were randomly selected as the validation set; the remaining 300 samples (100 negative, 200 positive) formed the modeling set, which was randomly divided 50 times into a calibration set (50 negative, 100 positive; 150 in total) and a prediction set (50 negative, 100 positive; 150 in total). The range of the number of SG smoothing points was expanded, higher-order derivative modes with small absolute values were removed, and a total of 264 smoothing modes were used for model screening. Pairwise combinations of the first three principal components were evaluated, and the optimal combination was selected according to model performance. SG-PCA-LDA and SG-PCA-HCA models were established for all calibration-prediction divisions and all SG smoothing modes, and the model parameters were selected according to the average prediction performance over all divisions so that the models would be stable. Finally, the models were validated with the validation set. With SG smoothing, the modeling accuracy and stability of both PCA-LDA and PCA-HCA improved significantly. For the optimal SG-PCA-LDA model, the validation recognition rates of positive and negative samples were 94.3% and 96.0%, respectively; for the optimal SG-PCA-HCA model, they were 92.5% and 98.0%, respectively. Conclusion: Vis-NIR spectroscopic pattern recognition combined with SG smoothing can accurately recognize transgenic sugarcane leaves and provides a convenient screening method for transgenic sugarcane breeding.
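The sample partition described in the abstract (one held-out validation set, then 50 random calibration/prediction divisions of the modeling set) can be sketched as follows. This is a minimal illustration, not the authors' code: the integer sample ids, the function names `stratified_draw` and `build_partitions`, and the fixed seed are all assumptions introduced here; only the class counts and the number of repetitions come from the abstract.

```python
import random


def stratified_draw(ids_by_class, counts, rng):
    """Randomly draw counts[label] ids from each class.

    Returns (drawn, remaining), both dicts keyed by class label.
    """
    drawn, remaining = {}, {}
    for label, ids in ids_by_class.items():
        shuffled = ids[:]              # copy so the caller's list is untouched
        rng.shuffle(shuffled)
        k = counts[label]
        drawn[label] = sorted(shuffled[:k])
        remaining[label] = sorted(shuffled[k:])
    return drawn, remaining


def build_partitions(n_neg=150, n_pos=306, n_repeats=50, seed=0):
    """Hold out a validation set once, then re-split the modeling set n_repeats times."""
    rng = random.Random(seed)          # seed is an assumption, used only for repeatability
    # Hypothetical sample ids: negatives are 0..149, positives are 150..455.
    samples = {
        "neg": list(range(n_neg)),
        "pos": list(range(n_neg, n_neg + n_pos)),
    }
    # Validation set: 50 negative + 106 positive = 156 samples, drawn once.
    validation, modeling = stratified_draw(samples, {"neg": 50, "pos": 106}, rng)
    # Modeling set (100 negative + 200 positive) is divided 50 times into
    # calibration (50 neg, 100 pos) and prediction (50 neg, 100 pos) sets.
    splits = []
    for _ in range(n_repeats):
        calibration, prediction = stratified_draw(modeling, {"neg": 50, "pos": 100}, rng)
        splits.append((calibration, prediction))
    return validation, modeling, splits


validation, modeling, splits = build_partitions()
print(len(validation["neg"]), len(validation["pos"]), len(splits))
```

In such a scheme, each candidate setting (SG smoothing mode, principal component pair) would be scored by its average prediction performance over the 50 divisions, and the validation samples would be used only once, for the final check of the selected model.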