心理科学
Psychological Science
2014
Issue 3
pp. 742-747
黎光明 刘晓瑜 谭小兰 周梦培 张敏强
exam scoring, missing data, generalizability theory, two-facet crossed design (p × i × r), variance component estimation
test scores, sparse data, generalizability theory, two-facet random crossed design (p × i × r), estimating variance components
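Missing data are common in exam scoring, and making effective use of the available data for statistical analysis is a key problem. In exam scoring, the influence of items and raters on the scores cannot be ignored. Based on generalizability theory and following exam scoring rules, this study derives variance component estimation formulas for the two-facet crossed design (p × i × r) with missing data, and uses Matlab 7.0 to simulate multiple sets of missing data to verify the formulas. The results show that: (1) the derived formulas are fairly reliable, with relatively small bias in the estimated variance components; even when the missing-data rate reaches 50% or more, the formulas still estimate the variance components fairly accurately; (2) the number of items has the greatest influence on the estimation of variance components with missing data, followed by the number of raters; with 6 items and 5 raters, the estimates tend to stabilize; (3) the number of students has little influence on the estimates; whether the exam is small-scale or large-scale, the estimated variance components differ little.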
Missing data are common in psychological surveys and experiments, and test scores are no exception. In performance assessment, for example, a given group of raters rates only a given group of examinees, so the resulting data form a sparse matrix. Researchers are therefore concerned with how to make good use of the observed data. Brennan (2001) provided formulas for estimating variance components in the p × i design with sparse data. In practice, however, more than one facet usually affects the scores; in performance assessment the rater facet in particular cannot be ignored. The aim of this article is to find a way to estimate the variance components of sparse data quickly and effectively. In China, many studies have analyzed only complete data and ignored sparse data. Working directly with the sparse data has two merits. First, when facing missing data, researchers usually delete incomplete records or impute values before analysis, but applying these methods to performance assessment discards data that could be used for analysis. Second, the estimates differ depending on the imputation method used. Building on Brennan's (2001) formulas for the p × i design, this article provides formulas for estimating the variance components of the p × i × r design with sparse data. MATLAB 7.0 was used to simulate data of the kind usually encountered in examinations, and generalizability theory was then used to estimate the variance components. Two conditions were simulated: a small sample of 200 students and a large sample of 10,000 students. The p × i × r formulas were applied to the simulated sparse data to test their validity. The results showed that the formulas provided good estimates of the variance components: the estimates were close to the preset values. Accuracy was highest for the item and rater components and low for the student-by-item interaction, whose maximum bias reached 1.3. The number of items had the greatest effect on the estimation: a small increase in the number of items produced a large gain in accuracy, and the formulas gave good estimates when the number of items was moderate. The formulas worked for both small and large data sets, with only a little bias in either case. Increasing the number of items therefore improves the accuracy of the variance component estimates. If the number of items cannot be increased, increasing the number of raters also improves accuracy. The number of raters need not be large, however: the bias is already small when the number of raters reaches five.
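As a point of reference for the design described in the abstract, the following is a minimal sketch of the complete-data case only: it simulates scores from a random-effects p × i × r model and recovers the seven variance components with the standard balanced ANOVA (expected mean square) estimators. It does not reproduce the article's sparse-data formulas, which are not given in this record. The function names `simulate` and `estimate_pir`, the variance values, and the 200 × 6 × 5 layout are illustrative assumptions, and Python is used here instead of the MATLAB 7.0 mentioned in the abstract.

```python
import random


def simulate(n_p, n_i, n_r, var, seed=0):
    """Draw complete scores x[p][i][r] from a random-effects p x i x r model.

    `var` maps each effect name to its (assumed, illustrative) true variance.
    """
    rng = random.Random(seed)

    def draw(key):
        return rng.gauss(0.0, var[key] ** 0.5)

    e_p = [draw("p") for _ in range(n_p)]
    e_i = [draw("i") for _ in range(n_i)]
    e_r = [draw("r") for _ in range(n_r)]
    e_pi = [[draw("pi") for _ in range(n_i)] for _ in range(n_p)]
    e_pr = [[draw("pr") for _ in range(n_r)] for _ in range(n_p)]
    e_ir = [[draw("ir") for _ in range(n_r)] for _ in range(n_i)]
    return [[[e_p[p] + e_i[i] + e_r[r] + e_pi[p][i] + e_pr[p][r] + e_ir[i][r]
              + draw("pir")
              for r in range(n_r)] for i in range(n_i)] for p in range(n_p)]


def estimate_pir(x):
    """Balanced ANOVA variance component estimates for complete x[p][i][r].

    Requires at least two levels of every facet.
    """
    n_p, n_i, n_r = len(x), len(x[0]), len(x[0][0])
    P, I, R = range(n_p), range(n_i), range(n_r)

    # Grand mean and marginal means.
    m = sum(x[p][i][r] for p in P for i in I for r in R) / (n_p * n_i * n_r)
    m_p = [sum(x[p][i][r] for i in I for r in R) / (n_i * n_r) for p in P]
    m_i = [sum(x[p][i][r] for p in P for r in R) / (n_p * n_r) for i in I]
    m_r = [sum(x[p][i][r] for p in P for i in I) / (n_p * n_i) for r in R]
    m_pi = [[sum(x[p][i]) / n_r for i in I] for p in P]
    m_pr = [[sum(x[p][i][r] for i in I) / n_i for r in R] for p in P]
    m_ir = [[sum(x[p][i][r] for p in P) / n_p for r in R] for i in I]

    # Mean squares: sum of squares divided by degrees of freedom.
    ms = {
        "p": n_i * n_r * sum((m_p[p] - m) ** 2 for p in P) / (n_p - 1),
        "i": n_p * n_r * sum((m_i[i] - m) ** 2 for i in I) / (n_i - 1),
        "r": n_p * n_i * sum((m_r[r] - m) ** 2 for r in R) / (n_r - 1),
        "pi": n_r * sum((m_pi[p][i] - m_p[p] - m_i[i] + m) ** 2
                        for p in P for i in I) / ((n_p - 1) * (n_i - 1)),
        "pr": n_i * sum((m_pr[p][r] - m_p[p] - m_r[r] + m) ** 2
                        for p in P for r in R) / ((n_p - 1) * (n_r - 1)),
        "ir": n_p * sum((m_ir[i][r] - m_i[i] - m_r[r] + m) ** 2
                        for i in I for r in R) / ((n_i - 1) * (n_r - 1)),
        "pir": sum((x[p][i][r] - m_pi[p][i] - m_pr[p][r] - m_ir[i][r]
                    + m_p[p] + m_i[i] + m_r[r] - m) ** 2
                   for p in P for i in I for r in R)
               / ((n_p - 1) * (n_i - 1) * (n_r - 1)),
    }

    # Solve the expected-mean-square equations for the variance components.
    return {
        "p": (ms["p"] - ms["pi"] - ms["pr"] + ms["pir"]) / (n_i * n_r),
        "i": (ms["i"] - ms["pi"] - ms["ir"] + ms["pir"]) / (n_p * n_r),
        "r": (ms["r"] - ms["pr"] - ms["ir"] + ms["pir"]) / (n_p * n_i),
        "pi": (ms["pi"] - ms["pir"]) / n_r,
        "pr": (ms["pr"] - ms["pir"]) / n_i,
        "ir": (ms["ir"] - ms["pir"]) / n_p,
        "pir": ms["pir"],
    }


if __name__ == "__main__":
    # Illustrative true variances (assumed, not taken from the article).
    true_var = {"p": 1.0, "i": 0.5, "r": 0.3,
                "pi": 0.5, "pr": 0.3, "ir": 0.2, "pir": 1.0}
    scores = simulate(200, 6, 5, true_var, seed=1)
    for name, value in estimate_pir(scores).items():
        print(f"{name:>3}: true {true_var[name]:.2f}, estimated {value:.3f}")
```

With only a handful of items and raters, the item and rater components rest on very few degrees of freedom, so their estimates vary considerably from run to run; negative estimates can occur and are conventionally set to zero.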