遗传学报
ACTA GENETICA SINICA
2006
Issue 12
pp. 1132-1140
9 pages
Multiple-testing procedure%microarrays%gene%power
Because microarray experiments are costly, the first question in designing one is how many replicates are needed to detect a significantly differentially expressed gene. Calculating the number of replicates (sample size) or the power required by a multiple-testing procedure therefore provides important guidance for microarray experimental design. To this end, a mixture distribution model of gene expression noise is constructed on the basis of permutation resampling. The method is suitable for all kinds of gene expression data, whether the expression noise comes from a single source or from multiple sources. By applying the mixture model together with a multiple-testing procedure, a researcher can obtain the minimum number of biological replicates required in a microarray experiment for a given power, determine the power to detect a significantly differentially expressed gene for a given sample size, or choose the best multiple-testing method for a given sample size and power. As an example, a single-distribution model of the t-statistic (that is, a model with a single source of expression noise) was fitted to an observed microarray dataset of 3 000 stroke-related genes in rat and then used to calculate the powers of four popular multiple-testing procedures to detect a gene with an expression change D. The results show that, for a small change D, the B-procedure had the lowest power among the four procedures, whereas the BH-procedure had the highest. However, all four procedures had the same power to identify the gene with the largest change. As with a conventional single test, the power of the BH-procedure to detect a small change does not vary as the number of genes increases, whereas the powers of the other three procedures decline as the number of genes increases.
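The contrast between the B- and BH-procedures can be illustrated with a minimal sketch. It assumes that "B" denotes the Bonferroni correction and "BH" the Benjamini-Hochberg step-up procedure; the abstract does not spell the names out. The function names and the ten p-values are hypothetical illustration only and are not taken from the paper's dataset.

```python
# Assumption: "B" = Bonferroni, "BH" = Benjamini-Hochberg (not stated in the abstract).

def bonferroni_reject(pvalues, alpha=0.05):
    """Reject H0 for gene i when p_i <= alpha / m (m = number of genes tested)."""
    m = len(pvalues)
    return [p <= alpha / m for p in pvalues]


def benjamini_hochberg_reject(pvalues, alpha=0.05):
    """Step-up rule: find the largest rank k with p_(k) <= k * alpha / m,
    then reject the k smallest p-values."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank * alpha / m:
            k_max = rank
    reject = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= k_max:
            reject[idx] = True
    return reject


# Hypothetical p-values for ten genes (illustration only).
pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]

n_bonferroni = sum(bonferroni_reject(pvals))
n_bh = sum(benjamini_hochberg_reject(pvals))
print(n_bonferroni, n_bh)
```

On these made-up values the Bonferroni threshold alpha/m is fixed at 0.005, while the BH threshold grows with the rank of the p-value, so BH rejects at least as many hypotheses. This is in line with the abstract's ranking of the two procedures for small changes, although the sketch does not reproduce the paper's power calculation.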