心理学报
Acta Psychologica Sinica
2008, Issue 5, pp. 618-625
IRT, polytomously scored model, GPCM, a-stratified design, item selection strategy
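Item selection strategy is an important topic in computerized adaptive testing (CAT) research; its quality directly affects the reliability, validity, and security of a test. Much CAT research and many CAT applications are built on dichotomous (0-1) scoring models, and item selection strategies for polytomously scored CAT have rarely been reported. Although CAT research based on the GRM has been carried out in China, no reports of CAT research based on the GPCM have yet been seen. Using computer simulation programs, this paper compares four item selection strategies for CAT under the Generalized Partial Credit Model (GPCM) across a variety of conditions. The results show that when examinee ability is normally distributed, the effectiveness of an item selection strategy depends strongly on the distribution of the item step parameters: (1) when the item step parameters all follow a normal distribution, the strategy that matches ability to an item step parameter performs best; (2) when the item step parameters all follow a uniform distribution, the strategy that matches ability to the mean of the item step parameters performs best.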
The objective of computerized adaptive testing (CAT) is to construct an optimal test for each examinee. The item selection strategy (ISS) is an important part of CAT research, and its quality directly affects the reliability, validity, and security of the test.
Much CAT research and many CAT applications are based on dichotomously scored models. A polytomously scored model, however, can extract more information from examinees than a dichotomous one, so CAT based on polytomously scored models needs further study.
Both the Generalized Partial Credit Model (GPCM) and the Graded Response Model (GRM) are polytomously scored models, but they differ from each other. In the GRM, the grade difficulties of an item increase monotonically with the grade, whereas the GPCM describes the process of working through an item, which is divided into a sequence of steps. In the GPCM, each item has several step parameters, and no specific ordering is imposed on them. A later step cannot be attempted until the earlier step has been completed, yet the step parameter of a later step may be lower than that of an earlier one. Considerable research has already been conducted on CAT using the GRM; in China, however, there are few reports of research on CAT using the GPCM.
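As an illustration of the step structure described above, the following is a minimal sketch of GPCM category probabilities in the commonly used form P(X = k | θ) ∝ exp(Σ_{v=1..k} a(θ - b_v)), with the empty sum for k = 0 taken as 0. The function name `gpcm_probs` and the example values are assumptions made for this sketch, and the optional scaling constant D is omitted; the paper's exact parameterization is not given in the abstract.

```python
import numpy as np

def gpcm_probs(theta, a, b):
    """Category probabilities P(X = 0..m | theta) for a single GPCM item.

    theta: ability; a: discrimination; b: sequence of m step parameters.
    (Hypothetical helper for illustration; no scaling constant D is applied.)
    """
    b = np.asarray(b, dtype=float)
    # Cumulative sums of a * (theta - b_v); category 0 has an empty sum of 0.
    z = np.concatenate(([0.0], np.cumsum(a * (theta - b))))
    z -= z.max()  # subtract the maximum to keep the exponentials stable
    p = np.exp(z)
    return p / p.sum()

# Step parameters need not be ordered: here the second step is easier than the first.
probs = gpcm_probs(theta=0.0, a=1.0, b=[0.5, -0.5])
```

With these example values the lowest and highest categories are equally likely, because the two step terms cancel at θ = 0.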
This study compared four types of ISS for CAT under the GPCM in various circumstances, using computer simulation programs. The strategies were implemented in four item pools of 1000 items each. Each item has five step parameters, and the discrimination parameter a and the step parameters b of the four pools are distributed as follows: (1) b ~ N(0,1), ln a ~ N(0,1); (2) b ~ N(0,1), a ~ U(0.2,2.5); (3) b ~ U(-3,3), ln a ~ N(0,1); (4) b ~ U(-3,3), a ~ U(0.2,2.5). Item parameters are generated by Monte Carlo simulation. Under each type of ISS, responses to the items are generated according to the GPCM for a sample of 3000 simulatees with θ ~ N(0,1), whose trait levels are also generated by Monte Carlo simulation. During testing, each simulatee's ability is estimated from the responses obtained. The whole process is then repeated after the four item pools have been sorted by discrimination parameter to form the a-stratified design. In total, thirty-two simulated CATs (4 item pools × 4 ISS × 2 designs, with and without a-stratification) are administered, and the output is evaluated on the following measures: precision, stability of the ISS, evenness of item usage, average number of items used per person, χ², efficiency, and item overlap.
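The simulation setup in this paragraph might be sketched as follows. This is only an illustration under stated assumptions: the function names, the random seed, and the choice of four strata are hypothetical and do not come from the paper, which does not specify them in the abstract.

```python
import numpy as np

rng = np.random.default_rng(0)  # seed is an assumption, used only for reproducibility

def make_pool(n_items=1000, n_steps=5, b_dist="normal", a_dist="lognormal"):
    """Draw discrimination (a) and step (b) parameters for one item pool."""
    if b_dist == "normal":
        b = rng.normal(0.0, 1.0, size=(n_items, n_steps))    # b ~ N(0,1)
    else:
        b = rng.uniform(-3.0, 3.0, size=(n_items, n_steps))  # b ~ U(-3,3)
    if a_dist == "lognormal":
        a = np.exp(rng.normal(0.0, 1.0, size=n_items))       # ln a ~ N(0,1)
    else:
        a = rng.uniform(0.2, 2.5, size=n_items)              # a ~ U(0.2,2.5)
    return a, b

def simulate_response(theta, a_i, b_i):
    """Sample one polytomous score (0..m) for a simulatee under the GPCM."""
    z = np.concatenate(([0.0], np.cumsum(a_i * (theta - b_i))))
    p = np.exp(z - z.max())
    p /= p.sum()
    return int(rng.choice(len(p), p=p))

def a_stratify(a, n_strata=4):
    """Split item indices into strata of ascending discrimination.

    The number of strata (4) is an assumption of this sketch.
    """
    order = np.argsort(a)
    return np.array_split(order, n_strata)

a, b = make_pool(b_dist="uniform", a_dist="uniform")  # pool (4) in the text
thetas = rng.normal(0.0, 1.0, size=3000)              # theta ~ N(0,1)
strata = a_stratify(a)
score = simulate_response(thetas[0], a[0], b[0])
```

In an a-stratified CAT, items are typically drawn from the low-discrimination strata early in the test and from the high-discrimination strata later, once the ability estimate has stabilized.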
Tables 1 and 2 report the evaluation index values obtained from the CAT process under the four types of ISS, both when the item pool was not stratified and when it adopted the a-stratified design, together with composite values calculated as the weighted sum of the index values. We can draw the following conclusions from the data in the tables: all the ability estimates are highly accurate and differ little from one another. Moreover, comparing the weighted composite values shows that the distribution of the item step parameters greatly influences the choice of ISS.
When the examinees' trait levels follow a normal distribution, the performance of an ISS is closely related to the distribution of the item step parameters. (1) If the item step parameters follow a normal distribution, the ISS that matches a random step parameter to the trait level performs much better than the others. (2) If the item step parameters follow a uniform distribution, the ISS that matches the item's average step parameter to the trait level performs much better than the others.
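The two matching strategies named in these conclusions could be sketched roughly as below. This is one possible reading, not the authors' implementation: the abstract does not define the strategies in detail, so the function names, the absolute-distance criterion, and the way a random step is drawn are all assumptions of this sketch.

```python
import numpy as np

def select_by_mean_step(theta_hat, b, used):
    """Pick the unused item whose mean step parameter is closest to theta_hat.

    b: (n_items, n_steps) step parameters; used: boolean mask of administered items.
    """
    dist = np.abs(b.mean(axis=1) - theta_hat)
    dist[used] = np.inf  # never reselect an administered item
    return int(np.argmin(dist))

def select_by_random_step(theta_hat, b, used, rng):
    """Pick the unused item whose randomly drawn step parameter is closest to theta_hat."""
    cols = rng.integers(0, b.shape[1], size=b.shape[0])  # one random step per item
    picked = b[np.arange(b.shape[0]), cols]
    dist = np.abs(picked - theta_hat)
    dist[used] = np.inf
    return int(np.argmin(dist))

# Tiny hypothetical pool: three items with two step parameters each.
b_demo = np.array([[-1.0, -1.0], [0.0, 0.2], [2.0, 2.0]])
used_demo = np.zeros(3, dtype=bool)
first = select_by_mean_step(0.0, b_demo, used_demo)
used_demo[first] = True
second = select_by_mean_step(0.0, b_demo, used_demo)
```

In the demo pool the middle item is selected first because its mean step parameter (0.1) is nearest to the ability estimate of 0, and the next call falls back to the first item once the middle one is marked as used.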