Transactions of the Chinese Society of Agricultural Engineering (农业工程学报)
2013
Issue 4
pp. 266-271
6 pages in total
宦克为%刘小溪%郑峰%蔡小龙%于素平%石晓光
near-infrared spectroscopy%nondestructive examination%models%variable selection%Monte Carlo sampling%latent projective graph
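To achieve nondestructive determination of wheat protein, simplify the prediction model used in portable wheat protein detection devices, and improve prediction accuracy, this paper selects wavelength variables from near-infrared diffuse transmission-reflectance spectra of wheat collected over 950 to 1690 nm by combining Monte Carlo sampling (MCS) with the latent projective graph (LPG) method. Following the idea of model population analysis (MPA), MCS is used to build sample subspaces and principal component analysis (PCA) is used to obtain the LPG. Assuming that collinear spectral variables in the LPG contribute equally to modeling, a small number of wavelength variables are chosen to build sub-prediction models. The sub-models with smaller root-mean-square error of prediction (RMSEP) are retained, the frequency with which each variable appears in them is counted, and the most frequent wavelength variables are taken as influential variables (IVs). The results show that modeling with the IVs reduces the RMSEP from 0.5245 to 0.2548, so variable selection by the Monte Carlo latent projective graph method (MC-LPG) is feasible for improving model prediction accuracy.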
To realize nondestructive determination of protein content in wheat, simplify the prediction model of portable wheat protein detection devices, and improve model prediction accuracy, near-infrared diffuse transmission-reflectance spectra of wheat were measured from 950 to 1690 nm. Wavelength variables were selected by combining Monte Carlo Sampling (MCS) with the Latent Projective Graph (LPG) method. The LPG is another name for the principal component projective graph, a technique developed in Chemical Factor Analysis (CFA) for investigating the nature of hyphenated data. Principal Component Analysis (PCA) yields the latent variables (loadings) of a data matrix and the projections of the objects onto those latent variables (scores). The nature of the data matrix can then be analyzed from the loading and score plots, because the latent variables are linear combinations of the measured variables and the projections uniquely define the relations among samples in the reduced variable space spanned by the latent variables. The LPG is therefore adopted for wavelength selection in Near-Infrared (NIR) spectral analysis: the loading matrix is used to state the relationship among different samples, and the score matrix is used to select the wavelength variables. In Model Population Analysis (MPA), sub-datasets are first drawn by MCS, and a sub-model is then built for each sub-dataset. Finally, the parameters that contribute to building the sub-models are analyzed statistically across the sample space, variable space, parameter space, and model space. Accordingly, following MPA, 500 sub-datasets of samples were established by MCS. In each sub-dataset the ratio of calibration to prediction samples is 2:1, with 61 kinds of wheat used for calibration and 32 kinds of wheat used for prediction.
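The Monte Carlo sampling step described above can be illustrated with a minimal sketch. It assumes one spectrum per kind of wheat (93 in total) and plain sampling without replacement within each split; the function name `monte_carlo_splits` and the fixed seed are hypothetical choices made for illustration, not details given in the abstract.

```python
import random


def monte_carlo_splits(n_samples, n_calibration, n_splits, seed=0):
    """Draw n_splits random calibration/prediction partitions of sample indices.

    Hypothetical helper: the abstract states only the number of sub-datasets
    and the 2:1 ratio, not how the random draws were implemented.
    """
    rng = random.Random(seed)
    indices = list(range(n_samples))
    splits = []
    for _ in range(n_splits):
        # Calibration samples are drawn without replacement.
        calibration = sorted(rng.sample(indices, n_calibration))
        calibration_set = set(calibration)
        # Every sample not drawn for calibration goes to the prediction set.
        prediction = [i for i in indices if i not in calibration_set]
        splits.append((calibration, prediction))
    return splits


# 500 sub-datasets, each with 61 calibration and 32 prediction samples.
splits = monte_carlo_splits(n_samples=93, n_calibration=61, n_splits=500)
```

Each entry of `splits` is a pair of index lists that could be used to slice a spectra matrix and the reference protein values before a sub-model is fitted.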
The LPG was obtained by PCA. Assuming that collinear spectral variables in the LPG contribute equally to modeling, a small number of wavelength variables were selected to build 500 prediction sub-models, and the 458 sub-models with a root mean square error of prediction (RMSEP) below 0.55 were retained. The frequency with which each selected variable appeared in these 458 sub-models was analyzed statistically, and the 12 wavelengths with the highest frequency were taken as the influential variables (IVs): 1060, 1094, 1403, 1494, 1511, 1521, 1545, 1551, 1607, 1612, 1620, and 1630 nm. With the new model built on the IVs, the RMSEP of the prediction model decreased from 0.5245 to 0.2548 and the RPD value increased from 1.7496 to 3.3985. Variable selection by Monte Carlo Sampling combined with the Latent Projective Graph method (MC-LPG) is therefore feasible for improving the precision of the prediction model.
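The screening and frequency-counting step can be sketched as follows. The helper names (`rmsep`, `rpd`, `select_influential_variables`) and the toy sub-model list are made up for illustration and are not the study's data. RPD is computed here as the sample standard deviation of the reference values divided by the RMSEP, which is one common definition; the abstract does not state which definition the authors used.

```python
import math
import statistics
from collections import Counter


def rmsep(y_true, y_pred):
    """Root mean square error of prediction."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def rpd(y_true, y_pred):
    """Assumed definition: sample standard deviation of references / RMSEP."""
    return statistics.stdev(y_true) / rmsep(y_true, y_pred)


def select_influential_variables(sub_models, rmsep_threshold, n_variables):
    """Keep sub-models below the RMSEP threshold and rank wavelengths by frequency.

    sub_models is a list of (rmsep_value, wavelengths) pairs.
    """
    counts = Counter()
    kept = 0
    for error, wavelengths in sub_models:
        if error < rmsep_threshold:
            kept += 1
            # Count each wavelength at most once per retained sub-model.
            counts.update(set(wavelengths))
    # Highest frequency first; ties are broken by ascending wavelength.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return kept, [wavelength for wavelength, _ in ranked[:n_variables]]


# Toy input (illustrative only): four sub-models, one above the threshold.
toy_sub_models = [
    (0.30, [1060, 1403]),
    (0.40, [1060, 1511]),
    (0.60, [1094, 1403]),
    (0.50, [1403, 1060]),
]
kept, influential = select_influential_variables(toy_sub_models, 0.55, 2)
```

With the settings reported in the abstract, the same function would be called with a threshold of 0.55 and `n_variables=12` on the 500 sub-models.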