心理学报
Acta Psychologica Sinica
2009
Issue 8
pp. 773-784
generalizability theory; multi-facet Rasch measurement; performance rating
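Generalizability theory (GT) and item response theory (IRT) extend classical test theory in two different directions. In subjective rating, both GT and the multi-facet Rasch measurement model (MFRM) from IRT can be used to estimate how much each source of variation contributes to the variance in ratings, to estimate the reliability of the assessment, and to suggest improvements to it. Twelve athletes took part in the men's 10-metre platform diving final at the 2008 Beijing Olympic Games; the competition had six rounds, and seven referees independently scored each athlete's performance in every round. GT and MFRM agree fairly closely that the athlete, the round, and the athlete-by-round interaction are important sources of variation in athletes' scores, while referees do not contribute significantly to score differences. MFRM additionally estimates that the degree of difficulty is an important source of variation in men's platform diving results, finds disordered step calibrations near the rating category 6.5, and produces a ranking of athletes that differs somewhat from the actual 2008 Olympic ranking. In GT, the degree of difficulty is a hidden facet whose effect could not be separated out. GT and MFRM offer suggestions for improving the measurement from two different angles: GT finds that the g coefficient can be raised by increasing the number of rounds, whereas increasing the number of referees has little effect on it. MFRM gives estimates and standard errors for the elements of each facet (such as a particular referee or athlete), and its diagnostic fit statistics also help identify anomalous scores or rating patterns.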
Generalizability Theory (GT) and Item Response Theory (IRT) extend Classical Test Theory (CTT) in different directions: GT focuses on the macro level of measurement and IRT on the micro level. Both GT and the Multi-Facet Rasch Measurement model (MFRM, one member of the IRT family) can be used to decompose the variance in performance ratings into its sources (including error) and to estimate the reliability of the ratings. The results of both can suggest ways to improve performance rating. This paper compares GT and MFRM to see how they differ in what they offer for improving the rating process, using the scores from the men's 10-metre platform diving final at the 2008 Beijing Olympic Games as the data. Twelve athletes took part in the final. Each athlete dived six times and was scored independently by seven referees on each dive, giving 12×6×7=504 data points in total. Both GT and MFRM are applied to analyze four facets of these scores (round, person, referee, and difficulty). However, difficulty is a hidden facet in GT, and its effect cannot be separated there. The results from GT and MFRM consistently suggest that the athlete, the round, and their interaction are important sources of variation in the scores, and that the referees do not contribute significantly to the variance in athletes' scores. The MFRM results also indicate that difficulty is a significant source of variation. These results point to different ways of improving scoring. For example, the g coefficient is influenced substantially by the number of rounds but not by the number of referees, so increasing the number of rounds would help improve the reliability of the ratings.
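The dependence of the g coefficient on the number of rounds and referees can be illustrated with a small decision-study sketch. It assumes a fully crossed athlete × round × referee random-effects design, which the abstract does not state explicitly, and it uses the standard relative g coefficient for that design. The function name and the variance components are hypothetical placeholders chosen only to mimic the qualitative pattern described (a large athlete-by-round component and small referee-related components); they are not the paper's estimates.

```python
def g_coefficient(var_p, var_pr, var_pj, var_prj_e, n_rounds, n_judges):
    """Relative g coefficient for an assumed crossed p x r x j design.

    var_p      : athlete (person) variance component
    var_pr     : athlete-by-round interaction component
    var_pj     : athlete-by-referee interaction component
    var_prj_e  : three-way interaction confounded with residual error
    """
    # Relative error variance: each interaction component is divided by
    # the number of conditions it is averaged over.
    rel_error = (
        var_pr / n_rounds
        + var_pj / n_judges
        + var_prj_e / (n_rounds * n_judges)
    )
    return var_p / (var_p + rel_error)


# Placeholder variance components (illustrative only, NOT from the paper).
components = dict(var_p=0.30, var_pr=0.60, var_pj=0.01, var_prj_e=0.15)

# Actual design in the abstract: 6 rounds, 7 referees.
baseline = g_coefficient(**components, n_rounds=6, n_judges=7)
# Two alternative designs: double the rounds, or double the referees.
more_rounds = g_coefficient(**components, n_rounds=12, n_judges=7)
more_judges = g_coefficient(**components, n_rounds=6, n_judges=14)

print(f"6 rounds, 7 referees:  {baseline:.3f}")
print(f"12 rounds, 7 referees: {more_rounds:.3f}")
print(f"6 rounds, 14 referees: {more_judges:.3f}")
```

With components of this shape, doubling the rounds shrinks the dominant athlete-by-round error term, while doubling the referees only shrinks terms that were already small, which is the kind of pattern the abstract reports.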
MFRM provides a measure for each individual element within each facet, a standard error for each element, and diagnostic fit statistics for detecting aberrant responses. The MFRM analysis shows that the step calibrations of the rating scale are disordered around the category 6.5. MFRM also yields a ranking of the athletes that differs from the one awarded at the 2008 Beijing Olympic Games. In sum, GT and MFRM are fully consistent in identifying the sources of variation, but each method has its own advantages. GT is more helpful for the design of the measurement, while MFRM is more helpful for measuring individual elements within each facet and for detecting aberrant responses. Moreover, MFRM separates the effects of round, referee, and difficulty more successfully and produces a more precise estimate of the athletes' ranking than the method used at the 2008 Beijing Olympic Games.
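For readers unfamiliar with MFRM, the general form of a many-facet Rasch rating scale model with the four facets named above can be written as follows. This is the generic textbook formulation, given here as a sketch; the abstract does not state the exact parameterisation the authors used, and the symbols are assumptions introduced for illustration.

```latex
% P_{nrjdk}: probability that athlete n, in round r, receives category k
%            from referee j on a dive of difficulty d
% B_n: ability of athlete n          R_r: effect of round r
% C_j: severity of referee j         D_d: effect of difficulty d
% F_k: step calibration between categories k-1 and k
\ln\!\left(\frac{P_{nrjdk}}{P_{nrjd(k-1)}}\right)
  = B_n - R_r - C_j - D_d - F_k
```

In this notation, the step calibrations $F_k$ are expected to increase with $k$; the disordering reported around category 6.5 corresponds to neighbouring $F_k$ values that do not follow that order.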