心理科学 ›› 2023, Vol. 46 ›› Issue (2): 461-469.

• Statistics, Measurement, and Methods •

多值评分CD-CAT的选题策略研究

高旭亮, 王芳, 赵鹏娟

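  1. School of Psychology, Guizhou Normal University; Mental Health Education and Counseling Center, Guizhou Normal University, Guiyang, 550025
  • Received: 2020-09-29 Revised: 2022-04-23 Published: 2023-03-20 Online: 2023-03-20
  • Corresponding author: Gao Xuliang (高旭亮)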

Research on item selection algorithm of polytomous scored cognitive diagnosis adaptive test

Gao Xuliang, Wang Fang, Zhao Pengjuan   

  1. School of Psychology, Mental Health Education and Counseling Center, Guizhou Normal University, Guiyang, 550025
  • Received: 2020-09-29 Revised: 2022-04-23 Online: 2023-03-20 Published: 2023-03-20

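Summary: Cognitive Diagnosis Computerized Adaptive Testing (CD-CAT) offers a new perspective for the development of psychological and educational assessment. Existing CD-CAT research is based mainly on dichotomously scored models, yet polytomously scored data are common in applied fields such as educational assessment and psychometrics. An efficient item selection method is a core element of a successful CD-CAT program. This study proposes two new item selection methods for polytomous CD-CAT (PCD-CAT): Expected Posterior Variance (EPV) and Maximum Expected Distance (MED). Simulation experiments under the general polytomous diagnosis model (GPDM) compared the performance of EPV and MED in PCD-CAT. The results show that, compared with traditional item selection methods, EPV and MED achieve higher test accuracy and test efficiency.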

Keywords: cognitive diagnosis, computerized adaptive testing, polytomous scoring, GPDM model

Abstract: Cognitive diagnostic computerized adaptive testing (CD-CAT) provides a new way to obtain information about students’ mastery or nonmastery of a set of skills in a knowledge domain. In recent years, CD-CAT has received increasing attention in educational and psychological assessment. An effective item selection algorithm is key to the success of a CD-CAT system. To date, various CD-CAT item selection methods have been proposed for dichotomous cognitive diagnosis models (CDMs), but few studies have addressed item selection for polytomous CD-CAT (PCD-CAT), even though polytomously scored data are common in educational assessment, psychological evaluation, and many other disciplines. In this article, the authors explored CD-CAT item selection based on the general polytomous diagnosis model (GPDM) and proposed two PCD-CAT item selection algorithms: expected posterior variance (EPV) and maximum expected distance (MED). The proposed algorithms were evaluated in two simulation studies and compared with the KL, PWKL, and HKL algorithms in fixed-length and variable-length PCD-CAT. In the simulations, the item bank contained 350 items, the maximum score of each item was fixed at 4, and the number of attributes was fixed at K = 7. Study 1 manipulated three factors: test length (5, 10, 15, and 20 items), item bank quality (high vs. low), and item selection algorithm (KL, PWKL, HKL, EPV, and MED). EPV and MED consistently yielded the highest attribute pattern recovery in all simulation conditions: their pattern correct classification rate (PMR) was significantly higher than that of the KL, PWKL, and HKL methods.
EPV and MED had similar PMRs under most experimental conditions, but when the test was short (for example, 5 items), the PMR of EPV was higher than that of MED regardless of item bank quality. Under all conditions, the KL method had the lowest PMR, while the difference in PMR between PWKL and HKL was almost negligible. Study 2 investigated the performance of the two proposed algorithms in variable-length PCD-CAT, in which the test was terminated once the maximum posterior probability of the attribute vector reached a prespecified value. Three factors were manipulated in Study 2: the prespecified termination value (0.6, 0.7, 0.8, and 0.9), item bank quality (high vs. low), and the five item selection algorithms. The results showed that the average test lengths of EPV and MED were similar and significantly shorter than those of KL, PWKL, and HKL. Although these results are encouraging, several directions deserve further study: (a) how to combine confirmatory and exploratory CDMs in diagnostic evaluation to better analyze data; (b) item calibration techniques for cognitive diagnostic computerized adaptive testing; and (c) automated computer scoring of polytomously scored items.
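
The variable-length termination rule and the PMR criterion described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`update_posterior`, `should_stop`, `pattern_correct_rate`) are hypothetical, and the likelihood values are assumed to come from whatever polytomous model is in use (the GPDM in the paper, which is not reproduced here).

```python
import numpy as np

def update_posterior(posterior, likelihood):
    """Bayes update over candidate attribute patterns.

    likelihood[c] is the probability of the observed item score given
    pattern c (assumed to be supplied by the diagnostic model).
    """
    unnormalized = np.asarray(posterior) * np.asarray(likelihood)
    return unnormalized / unnormalized.sum()

def should_stop(posterior, threshold):
    """Variable-length rule: stop once the largest posterior probability
    of any attribute pattern reaches the prespecified value."""
    return bool(np.max(posterior) >= threshold)

def pattern_correct_rate(true_patterns, estimated_patterns):
    """Proportion of examinees whose entire attribute vector is
    classified correctly (the PMR criterion)."""
    true_patterns = np.asarray(true_patterns)
    estimated_patterns = np.asarray(estimated_patterns)
    return float(np.mean(np.all(true_patterns == estimated_patterns, axis=1)))

# Toy usage with 4 candidate patterns (far fewer than the 2**7 in the study).
posterior = update_posterior(np.full(4, 0.25), [0.8, 0.1, 0.05, 0.05])
stop_now = should_stop(posterior, threshold=0.7)
```

In a full simulation, the update would be repeated after each administered item until `should_stop` returns true or the item bank is exhausted; a stricter threshold (for example 0.9) would generally require more items.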

Key words: cognitive diagnosis, computerized adaptive testing, polytomously-scored items, GPDM
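
For readers unfamiliar with the baseline methods named in the abstract, the following sketch shows one common form of the posterior-weighted Kullback-Leibler (PWKL) index extended to score categories. It is an illustration under stated assumptions, not the formula used in the paper: the category probability tables are hypothetical inputs, all probabilities are assumed strictly positive, and the names `pwkl_index` and `select_item` are invented for this example.

```python
import numpy as np

def pwkl_index(category_probs, posterior, current_estimate):
    """Posterior-weighted KL index for one candidate item.

    category_probs: array of shape (n_patterns, n_categories); row c holds
        P(score = x | pattern c). All entries are assumed to be > 0.
    posterior: array of shape (n_patterns,), current posterior over patterns.
    current_estimate: row index of the current estimated attribute pattern.
    """
    category_probs = np.asarray(category_probs, dtype=float)
    p_hat = category_probs[current_estimate]
    # KL divergence between the score distribution under the current
    # estimate and the distribution under every candidate pattern.
    kl = (p_hat * np.log(p_hat / category_probs)).sum(axis=1)
    return float(np.dot(posterior, kl))

def select_item(bank_probs, posterior, administered):
    """Pick the unused item with the largest PWKL index."""
    current_estimate = int(np.argmax(posterior))
    best_item, best_value = None, -np.inf
    for j, probs in enumerate(bank_probs):
        if j in administered:
            continue
        value = pwkl_index(probs, posterior, current_estimate)
        if value > best_value:
            best_item, best_value = j, value
    return best_item

# Toy bank: item 0 cannot separate the two patterns, item 1 can.
flat = np.array([[0.2, 0.3, 0.5], [0.2, 0.3, 0.5]])
sharp = np.array([[0.7, 0.2, 0.1], [0.1, 0.2, 0.7]])
chosen = select_item([flat, sharp], np.array([0.5, 0.5]), administered=set())
```

An item whose score distribution is identical across patterns receives an index of zero, so the selection favors items that discriminate between the current estimate and the patterns that still carry posterior weight.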