Open-ended items play an important role in evaluating students' skills, such as analysis, synthesis, and problem solving. When open-ended items are scored, rater effects can arise because standard answers are lacking and because different raters do not share a consistent understanding of the rating rules; as a consequence, the scoring results are affected by rater effects. How to estimate person, item, and rater parameters precisely is therefore an important issue. Some researchers formulated a GRM-based multilevel facets model, called the graded response multilevel facets model (GR-MLFM), to estimate person ability and handle rater effects when tasks are processed successively. They used two simulation studies to examine parameter recovery for the unconditional GR-MLFM (no predictors added to the model). The results showed that the model recovered all parameters well, indicating that the GR-MLFM is useful and reasonable; the results also showed that the random-effects model was more suitable than the fixed-effects model. The purpose of the current study was to examine whether the GR-MLFM remains reasonable when both person and rater predictors are added to the model, the so-called full GR-MLFM. One simulation study and one empirical study were conducted to evaluate the feasibility of the GR-MLFM.
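As a schematic illustration (the notation here is our own and may differ from the original parameterization of the GR-MLFM), the model can be sketched as a graded response model with a rater severity facet at level 1, and regressions of person ability on gender and of rater severity on the four rater predictors at level 2:

```latex
% Level 1: graded response facets model (illustrative notation)
% Y_{pij}: score of person p on item i from rater j; k: score category
\operatorname{logit} P(Y_{pij} \ge k) = \alpha_i \left( \theta_p - \beta_{ik} - \lambda_j \right)

% Level 2: person ability regressed on gender,
% rater severity regressed on four rater predictors
\theta_p = \gamma_0 + \gamma_1\,\mathrm{Gender}_p + \varepsilon_p
\lambda_j = \delta_0 + \delta_1\,\mathrm{Resp}_j + \delta_2\,\mathrm{Emot}_j
          + \delta_3\,\mathrm{Conf}_j + \delta_4\,\mathrm{Exp}_j + u_j
```

Here \(\theta_p\) is person ability, \(\beta_{ik}\) the category threshold, and \(\lambda_j\) the severity of rater \(j\); the predictor names are shorthand for the responsibility, emotional stability, confidence, and rating-experience variables described in the empirical study.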
For the simulation study, a two-level model was formulated: level 1 was an IRT model, and at level 2 student gender and four rater predictors were considered. R was used to generate the person response matrix, and OpenBUGS, which is based on the MCMC algorithm, was then used to estimate the model parameters. Bias, root mean square error (RMSE), and percentage bias (PB) were used to evaluate parameter recovery. The results indicated that all parameter estimates were close to the true values: the absolute differences between estimates and true values were less than .05 for all parameters. Meanwhile, the RMSEs of these estimates were small, ranging from .040 to .132. Furthermore, although seven PB values were larger than 5, most of them can be attributed to small denominators, so almost all parameters showed acceptable recovery. Based on these results, the model fit the data precisely and stably, and it is promising to apply the model to detect rater effects.
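The three recovery criteria are standard and easy to state exactly. The following minimal sketch (the replication estimates and the true value are hypothetical, not taken from the study) computes bias, RMSE, and PB for one parameter across simulation replications; it also makes concrete why a small true value (the denominator) can inflate PB even when absolute bias is tiny:

```python
import numpy as np

def recovery_stats(estimates, true_value):
    """Bias, RMSE, and percentage bias (PB) of a parameter's
    estimates across simulation replications."""
    estimates = np.asarray(estimates, dtype=float)
    bias = estimates.mean() - true_value
    rmse = np.sqrt(np.mean((estimates - true_value) ** 2))
    pb = 100.0 * bias / true_value  # small |true_value| inflates PB
    return bias, rmse, pb

# Hypothetical estimates of one parameter over 5 replications, true value 0.50
est = [0.48, 0.52, 0.55, 0.47, 0.51]
bias, rmse, pb = recovery_stats(est, 0.50)

# The same bias against a near-zero true value yields a much larger PB
_, _, pb_small_denom = recovery_stats([x - 0.45 for x in est], 0.05)
```

With these numbers the bias is 0.006 and PB is 1.2, while the shifted example with true value 0.05 has the same bias but a PB ten times larger, illustrating the small-denominator caveat mentioned above.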
For the empirical study, four open-ended items were used to measure students' mathematical problem-solving skills, and 20 raters were recruited to rate the responses of 80 persons who answered these items. Student gender and four rater predictors (responsibility, emotional stability, confidence, and rating experience) were added to the level-2 model to investigate rater effects. The results showed that most of the 20 raters exhibited no substantial rater effect (severity/leniency); only rater 9 displayed significant severity. Furthermore, two rater predictors had significant effects on rater effects: responsibility had a positive effect on severity, and confidence had a positive effect on leniency, while rating experience and emotional stability had no significant effect on the rating results.