CAJ | Academic paper

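Embedding anchor items to link different test forms is a common equating design. Because of exposure and other factors, however, the functioning of anchor items can change from one administration to another. This study used the MH test and logistic regression to examine the quality of the anchor items used in equating a large-scale examination in China. The results showed that 22 anchor items exhibited parameter drift, and that both the difficulty and the discrimination parameters of anchor items may drift; most of these items failed the model-fit test when used a second time; and leaving the drifted anchor items in place leads to considerable systematic equating error. Testing anchor items for parameter drift should therefore be a necessary step in equating.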
In large-scale examinations, common items, or anchors, are frequently embedded in different test forms for equating. The Non-Equivalent Anchor Test (NEAT) design requires that anchors not only be representative in content but also function equivalently across test forms. Because of irrelevant factors, some anchors' parameters may change substantially across administrations. Goldstein (1983) named this phenomenon "item parameter drift" (IPD). Drifted anchors may cause systematic error in equating (Huiqin, Rogers, & Vukmirovic, 2008), but until now few studies have addressed this issue in China. In the present paper, several approaches to detecting drifted items and minimizing their effect on equating were outlined first. Then two popular methods for detecting Differential Item Functioning (DIF), the Mantel-Haenszel (MH) test and logistic regression, were used to examine the anchors in equating two test forms from a large-scale examination in China. The MH test was run in DIFAS 4.0 and the logistic regression in R. To control Type I error, ETS's classification criteria and pseudo R-squared values were also considered when the MH test and logistic regression were performed. After the drifted anchors were deleted, data from the two test forms were fitted and equated under the Three-Parameter Logistic Model (3PLM). Item parameters under the 3PLM were estimated with BILOG-MG. Factor analysis suggested that the 3PLM could be used to fit the two test forms. The results showed: (1) Twenty-two anchors were flagged for parameter drift. Both the difficulty and the discrimination parameters of anchors could drift across test forms. (2) Twenty-one of the drifted anchors fitted the 3PLM well in the old test form, but sixteen of them misfitted in the new test form. (3) Equating results with the Mean/Square method before and after the deletion of drifted anchors were very different.
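As a rough illustration of the MH screening step described above, the following Python sketch computes the Mantel-Haenszel common odds ratio for one anchor across matched score levels and converts it to the ETS delta scale (MH D-DIF = -2.35 ln α). It is not the DIFAS 4.0 procedure used in the study. Treating the old administration as the reference group and the new administration as the focal group is an assumption made here for the drift setting, and the function names and counts are hypothetical. The A/B/C labels below use only the size thresholds (1.0 and 1.5); the full ETS rules also involve significance tests, which are omitted.

```python
import math

def mh_odds_ratio(strata):
    """Mantel-Haenszel common odds ratio for a single item.

    strata: one (a, b, c, d) tuple per matched total-score level, where
      a = reference group correct,  b = reference group incorrect,
      c = focal group correct,      d = focal group incorrect.
    Assumption: reference = old administration, focal = new administration.
    """
    num = 0.0
    den = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        if n == 0:
            continue  # skip empty score levels
        num += a * d / n
        den += b * c / n
    # Raises ZeroDivisionError if no stratum has both b > 0 and c > 0.
    return num / den

def mh_d_dif(alpha):
    """Convert the MH odds ratio to the ETS delta metric."""
    return -2.35 * math.log(alpha)

def ets_category_by_size(d_dif):
    """Simplified A/B/C label based on |MH D-DIF| only (no significance test)."""
    size = abs(d_dif)
    if size < 1.0:
        return "A"  # negligible
    if size < 1.5:
        return "B"  # moderate
    return "C"      # large

# Hypothetical counts for two anchors at two matched score levels.
stable_anchor = [(10, 10, 10, 10), (5, 15, 5, 15)]
drifted_anchor = [(30, 10, 20, 20), (20, 20, 10, 30)]

for name, strata in [("stable", stable_anchor), ("drifted", drifted_anchor)]:
    alpha = mh_odds_ratio(strata)
    d = mh_d_dif(alpha)
    print(name, round(alpha, 3), round(d, 3), ets_category_by_size(d))
```

A negative MH D-DIF here means the item became harder for the focal (new) group after matching on total score; a positive value means it became easier, which is what item exposure would be expected to produce.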
Therefore, the inclusion of drifted anchors in equating might cause systematic error. Examining anchors for parameter drift should be treated as a necessary step in equating different test forms. The same approach could also be used in longitudinal psychological studies to maintain comparability over time. The limitations of the present paper and suggestions for further research on anchor parameter drift were given at the end of the paper. For example, detection methods could be compared, and the social and cultural causes of anchors' parameter drift could be examined in the future. The robustness of different equating methods also needs to be investigated when both the difficulty and the discrimination parameters of anchors drift.
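To make the sensitivity of anchor-based linking to drifted items concrete, the sketch below implements the 3PL response function and the mean/sigma linking transformation, a standard technique in which anchor difficulties on two scales determine a slope A and intercept B. This is a swapped-in illustration: the abstract names a "Mean/Square method", which may or may not be the same procedure, and the scaling constant D = 1.7, the function names, and all parameter values are assumptions rather than values from the study.

```python
import math
from statistics import mean, pstdev

def p_3pl(theta, a, b, c, D=1.7):
    """3PL probability of a correct response (D = 1.7 is an assumed scaling constant)."""
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))

def mean_sigma(b_old, b_new):
    """Mean/sigma linking constants (A, B) such that b_old is approximately A * b_new + B.

    b_old, b_new: difficulty estimates of the same anchors on the old and new scales.
    """
    A = pstdev(b_old) / pstdev(b_new)
    B = mean(b_old) - A * mean(b_new)
    return A, B

def rescale_item(a, b, c, A, B):
    """Place a new-form item on the old-form scale; the guessing parameter is unchanged."""
    return a / A, A * b + B, c

# Hypothetical anchor difficulties: the first three are stable (b_old = 2 * b_new + 0.5),
# the fourth has drifted between administrations.
b_new_all = [-1.0, 0.0, 1.0, 1.5]
b_old_all = [-1.5, 0.5, 2.5, 1.0]

A_all, B_all = mean_sigma(b_old_all, b_new_all)                # drifted anchor kept
A_clean, B_clean = mean_sigma(b_old_all[:3], b_new_all[:3])    # drifted anchor deleted

print("with drifted anchor:   ", round(A_all, 3), round(B_all, 3))
print("without drifted anchor:", round(A_clean, 3), round(B_clean, 3))
```

In this toy setup a single drifted anchor changes both linking constants, so every item on the new form would be rescaled differently. That is the mechanism by which drifted anchors can introduce systematic equating error.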