Journal of Psychological Science ›› 2024, Vol. 47 ›› Issue (2): 474-484.DOI: 10.16719/j.cnki.1671-6981.20240226

• Psychological statistics, Psychometrics & Methods •

Evaluation of Item-Level Fit in Cognitive Diagnosis Model

Gao Xuliang, Wang Fang, Xia Linpo, Hou Minmin   

  1. School of Psychology, Guizhou Normal University, Guiyang, 550025; Mental Health Education and Counseling Center, Guizhou Normal University, Guiyang, 550025
  • Online: 2024-03-20  Published: 2024-02-29

Research on Item Fit Evaluation in Cognitive Diagnosis Models*

Gao Xuliang**, Wang Fang, Xia Linpo, Hou Minmin   

  1. School of Psychology, Guizhou Normal University; Mental Health Education and Counseling Center, Guizhou Normal University, Guiyang, 550025
  • Corresponding author: **Gao Xuliang, E-mail: gaoxl9817@foxmail.com
  • Funding:
    *This research was supported by the Guizhou Provincial Science and Technology Program (Qiankehe Jichu-ZK[2021] General 123), the Guizhou Provincial University Humanities and Social Sciences Research Project (2020QN018), and the 2019 Doctoral Research Start-up Project of Guizhou Normal University (GZNUD[2019] No. 27)

Abstract: The goal of a cognitive diagnosis model (CDM) is to classify examinees into latent classes with different attribute patterns, providing diagnostic information about whether a student has mastered a set of skills or attributes. Compared with unidimensional item response theory (IRT) models, CDMs provide a more fine-grained assessment of students' strengths and weaknesses. Although CDMs were originally developed in the field of educational assessment, they are now also used to measure other constructs, such as psychological disorders and context-based abilities. As with any model-based analysis, a key step in applying a CDM is to check model-data fit, that is, the consistency between model predictions and observed data. Only when the model fits the data can the estimated parameters be interpreted reliably. Item fit evaluates how well each item is fit by the model, which helps identify aberrant items; deleting or revising such items can improve the model-data fit of the entire test.
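To make the item-fit idea concrete, one common form of the χ2 item-fit statistic in the CDM literature compares observed and model-implied correct rates within latent classes. The sketch below is a minimal, hypothetical implementation (the function name, the MAP class assignment, and the clipping constant are our assumptions, not the paper's exact estimators):

```python
import numpy as np

def chi2_item_fit(X_j, post, p_model_j):
    """Sketch of a chi-square item-fit statistic for one item in a CDM.

    X_j       : (N,) array of 0/1 responses to item j
    post      : (N, C) posterior probabilities over the C latent classes
    p_model_j : (C,) model-implied P(X_j = 1) for each latent class
    """
    cls = post.argmax(axis=1)          # assign each examinee to the MAP class
    chi2 = 0.0
    for c in range(post.shape[1]):
        mask = cls == c
        n_c = mask.sum()
        if n_c == 0:                   # skip empty latent classes
            continue
        o = X_j[mask].mean()           # observed correct rate in class c
        e = p_model_j[c]               # expected correct rate in class c
        e = np.clip(e, 1e-6, 1 - 1e-6) # guard against division by zero
        chi2 += n_c * (o - e) ** 2 / (e * (1 - e))
    return chi2
```

Large values of the statistic flag items whose observed response behavior departs from the fitted model within one or more latent classes.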
At present, several item fit statistics commonly used in IRT have been extended to CDMs. However, no study has systematically compared the performance of these item fit indices in the CDM framework. In this study, we compared the performance of χ2, G2, S-χ2, z(r), z(l), and Stone-Q1 in CDMs. A simulation study investigated the Type I error rate and power of these item fit statistics. The manipulated factors included sample size (N = 500, 1000), generating model (DINA, DINO, and ACDM), fitting model (DINA, DINO, and ACDM), test length (30 and 60 items), test quality (high and low), and significance level (.01 and .05). The simulated tests measured five attributes. For high-quality and low-quality tests, the guessing and slip parameters of the three generating models were drawn from uniform distributions U(.05, .15) and U(.15, .25), respectively.
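The data-generating side of such a simulation can be sketched under the DINA model, whose item response function is P(X = 1) = (1 − s)^η · g^(1 − η), where η = 1 if and only if the examinee masters every attribute the item requires. The snippet below is a minimal illustration of the high-quality condition only; the random Q-matrix, attribute-pattern distribution, and seed are our assumptions, not the paper's actual design:

```python
import numpy as np

rng = np.random.default_rng(7)

# Dimensions mirroring one cell of the design: N = 500 examinees,
# a 30-item test, K = 5 attributes.
N, J, K = 500, 30, 5

# Q-matrix: which attributes each item requires (randomly generated here).
Q = rng.integers(0, 2, size=(J, K))
Q[Q.sum(axis=1) == 0, rng.integers(0, K)] = 1  # every item requires >= 1 attribute

# Attribute patterns: each examinee masters each attribute independently.
alpha = rng.integers(0, 2, size=(N, K))

# High-quality test: guessing and slip parameters drawn from U(.05, .15).
g = rng.uniform(0.05, 0.15, size=J)
s = rng.uniform(0.05, 0.15, size=J)

# DINA ideal response: eta = 1 iff all required attributes are mastered.
eta = (alpha @ Q.T) == Q.sum(axis=1)           # N x J boolean

# P(X = 1) = (1 - s)^eta * g^(1 - eta), then sample 0/1 responses.
p = np.where(eta, 1 - s, g)
X = (rng.random((N, J)) < p).astype(int)
```

Fitting a (possibly different) model to X and applying the item fit statistics across many replications yields the Type I error rates (generating model = fitting model) and power (generating model ≠ fitting model) studied here.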
The simulation results showed that, in terms of Type I error, z(r) and z(l) performed best under all conditions. In terms of power, when the generating model was ACDM, z(r) and z(l) had the highest average power under all conditions. When the generating model was DINA or DINO, χ2 and G2 had higher power in low-quality tests, while z(r) had the highest power in high-quality tests. In short, considering both Type I error and power, z(r) and z(l) performed best when the data followed the ACDM; when the data followed the DINA or DINO model, χ2 and G2 performed best in low-quality tests, whereas z(r) performed best in high-quality tests.
This study only examined conditions with five attributes, whereas operational tests may measure more; future research should therefore examine the influence of the number of attributes. Lastly, person fit assessment is also an important step in cognitive diagnostic testing, as it can help identify aberrant responses of individual students; more studies on person fit in CDMs are needed.

Key words: CDM, item fit, Type I error rate, power

Abstract: A key step in the valid application of a cognitive diagnosis model (CDM) is to check whether the model fits the test items. Although previous research has applied item fit methods from IRT to CDMs, the performance of these methods in CDMs still lacks systematic comparison. Through a simulation study, we compared the Type I error rates and statistical power of χ2, G2, S-χ2, z(r), z(l), and Stone-Q1. The results showed that, considering both Type I error and power, z(r) and z(l) performed best when ACDM was the generating model; when the generating model was DINA or DINO, z(r) performed best in high-quality tests, whereas χ2 and G2 performed better in low-quality tests. Finally, an empirical data analysis further examined the practical performance of these item fit methods.

Key words: cognitive diagnosis model, item fit, Type I error rate, power