Formulation of Grade Response Multilevel DRIFT Model

Journal of Psychological Science (心理科学), 2018, Vol. 41, Issue 1: 196-203.

Gu Shiwei (顾士伟)¹, Zeng Pingfei (曾平飞)¹, Sun Xiaojian (孙小坚)², Kang Chunhua (康春花)¹

1. College of Teacher Education, Zhejiang Normal University
2. Beijing Normal University

Received: 2016-12-22  Revised: 2017-09-18  Published: 2018-01-20  Online: 2018-01-20
Corresponding author: Kang Chunhua (康春花)

Abstract

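Abstract (translated from the Chinese): For tests whose scoring takes a long time, the effect of time on rating accuracy cannot be ignored, so research on rater drift has attracted considerable attention. Building on the grade response multilevel facets model proposed by Kang, Sun and Zeng (2016), this study constructs a grade response multilevel rater drift model that can be used to detect rater drift, and verifies the model's performance through simulation studies. The results show that the model estimates item and ability parameters accurately, and that, compared with the fixed effect model, the rater random effect model detects rater drift more effectively and is both more valid and more stable.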

Keywords (translated from the Chinese): rater drift, grade response multilevel rater drift model, fixed effect, random effect

Abstract: Recently, both national and international educational assessment programs have highlighted the usefulness of constructed-response (CR) items; accordingly, rater effects have also drawn attention. The grade response multilevel facets model (GR-MLFM), proposed by Kang, Sun and Zeng (2016), was developed to detect rater effects, and simulations demonstrated that GR-MLFM not only estimates item and person parameters precisely but also detects rater effects efficiently. GR-MLFM integrates the advantages of the many-facet Rasch model, the multilevel random coefficient model, and the graded response model. Like other many-facet models, however, GR-MLFM treats the rater effect as static: it yields only the overall rater effect across time stages, while the specific rater effect at each time stage remains unobserved. In fact, when the rating task takes place over a period of several hours or several days, concern may arise about the comparability of ratings both between and within raters over time (Wolfe, Moulder, & Myford, 2010). This phenomenon is called DRIFT, which stands for differential rater functioning over time. Myford and Wolfe (2009) developed the separate model (SM), which is based on the many-facet Rasch model and includes time as one of its facets, to estimate the specific rater effect at each time stage. With this model, one can obtain each rater's severity at each time stage and the trend in that severity over time. Nevertheless, SM cannot identify the factors that affect rater drift. Many other studies have also aimed to detect drift using generalizability theory or other item response models. However, these methods and models can only detect rater drift; they cannot examine the factors behind it.
To detect rater drift and simultaneously identify the factors that affect it, the authors construct a model for this situation, named the grade response multilevel DRIFT model (GR-MLDM). The model extends GR-MLFM and inherits the ideas of SM, and therefore combines the advantages of both. Two simulation studies, using the rater fixed effect model and the rater random effect model respectively, were conducted to evaluate the soundness and feasibility of GR-MLDM. The fixed effect model has four types of parameters (discrimination, difficulty, ability, and rater severity over time) and no interaction between raters and time, which means that the overall severity of raters remains the same across time. The random effect model allows interactions between raters and time, so dynamic "rater drift" can be detected. The results show that: (1) GR-MLDM, under both the fixed effect model and the random effect model, estimates item and person parameters precisely. (2) Compared with the fixed effect model, the random effect model detects rater drift more precisely; furthermore, because it accounts for the interaction between raters and time stages, the random effect model appears more suitable for large-scale assessment. In further work, the authors will apply GR-MLDM to real data to verify its applicability. In addition, predictors can be added to the random effect model to form a full model, which can evaluate the factors that affect rater drift in practice.
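
The contrast between a static rater effect and a rater-by-time interaction can be illustrated with a small simulation. The following is only a rough sketch, not the GR-MLDM specification from the paper: it assumes a graded-response-style cumulative logit in which a rater's severity shifts every category threshold, and all function names, parameter values, and the additive severity term are hypothetical choices made for illustration.

```python
import math
import random


def category_probs(theta, a, thresholds, severity):
    """Category probabilities under an ASSUMED graded-response-style form:
    P(X >= k) = logistic(a * (theta - b_k - severity)).
    `thresholds` must be strictly increasing."""
    cum = [1.0]  # P(X >= 0) is always 1
    for b in thresholds:
        cum.append(1.0 / (1.0 + math.exp(-a * (theta - b - severity))))
    cum.append(0.0)  # no category above the highest one
    return [cum[k] - cum[k + 1] for k in range(len(cum) - 1)]


def expected_score(theta, a, thresholds, severity):
    """Expected rating for one examinee under the assumed model."""
    probs = category_probs(theta, a, thresholds, severity)
    return sum(k * p for k, p in enumerate(probs))


def simulate_ratings(n_persons, a, thresholds, rater_severity, interaction, seed=0):
    """Simulate (person, rater, stage, score) tuples.

    interaction[r][t] is a hypothetical rater-by-time shift added to rater r's
    overall severity at time stage t; all zeros means no drift."""
    rng = random.Random(seed)
    ratings = []
    for person in range(n_persons):
        theta = rng.gauss(0.0, 1.0)  # examinee ability
        for rater, base in enumerate(rater_severity):
            for stage, shift in enumerate(interaction[rater]):
                probs = category_probs(theta, a, thresholds, base + shift)
                score = rng.choices(range(len(probs)), weights=probs)[0]
                ratings.append((person, rater, stage, score))
    return ratings


if __name__ == "__main__":
    thresholds = [-1.0, 0.0, 1.0]
    rater_severity = [-0.5, 0.5]
    static = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]      # no rater-by-time interaction
    drifting = [[0.0, 0.3, 0.6], [0.0, -0.2, -0.4]]  # severity changes over stages
    for label, inter in (("static", static), ("drifting", drifting)):
        data = simulate_ratings(500, 1.2, thresholds, rater_severity, inter)
        for rater in range(2):
            means = []
            for stage in range(3):
                scores = [s for _, r, t, s in data if r == rater and t == stage]
                means.append(round(sum(scores) / len(scores), 2))
            print(label, "rater", rater, "mean score by stage:", means)
```

In this sketch, a rater whose severity increases across stages should tend to award lower mean scores at later stages, whereas in the static condition the stage means differ only by sampling noise. Estimating such effects from data, as the paper does, requires a fitted multilevel model and is beyond this illustration.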

Key words: differential rater functioning over time, grade response multilevel DRIFT model, fixed effect, random effect

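Gu Shiwei, Zeng Pingfei, Sun Xiaojian, Kang Chunhua. Formulation of Grade Response Multilevel DRIFT Model[J]. Journal of Psychological Science, 2018, 41(1): 196-203.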