Journal of Psychological Science (心理科学) ›› 2021, Vol. ›› Issue (5): 1241-1248.

• Statistics, Measurement and Methods •

The Use of Response Time in Item Selection of Computerized Adaptive Testing

郭治辰,汪大勋,蔡艳,涂冬波   

  1. Jiangxi Normal University
  • Received: 2020-04-10  Revised: 2020-12-16  Online: 2021-09-20  Published: 2021-09-20
  • Corresponding author: Tu Dong-Bo

Abstract: Computer-based tests can record examinees' item response times (Response Time, RT). As an important source of auxiliary information, RT is of considerable value for test development and administration, especially in the field of Computerized Adaptive Testing (CAT). This paper briefly reviews and comments on the applications of RT to item selection in CAT and analyzes the feasibility of these techniques in practice. Finally, it discusses the problems that remain in applying RT to CAT item selection and directions for further research.

Abstract: Computer-based testing allows examinees' response times (RTs) to be recorded accurately. As an important source of auxiliary information, RTs have considerable potential value for test development and administration, especially in the field of Computerized Adaptive Testing (CAT). With the collection of RTs, the CAT assessment process can be further improved in terms of precision, fairness, and cost. Item selection is widely regarded as the key step of CAT, reflecting its "adaptive" character. Traditional CAT item selection algorithms do not consider RT information, which is unfavorable for test management and may lead to biased assessment results. This paper briefly reviews the application of RTs to item selection in CAT and analyzes the feasibility of these techniques in practice, so that readers gain a clear and concrete understanding of the potential value of RTs in CAT. Since item selection in CAT is based on the examinee's ability estimate (except for the selection of the initial items), an improvement in ability estimation can also be regarded as an indirect improvement of item selection. Accordingly, this paper divides the relevant methods into two categories: (1) indirect improvement of item selection through RTs (ability estimation) and (2) direct improvement of item selection through RTs (item selection methods). Most tests mix speed and power components, and RTs carry information about item characteristics as well as examinee ability. Over the past decades, many models for response times and response accuracy (RA) have been proposed (e.g., Thissen, 1983; Wang & Hanson, 2005; van der Linden, 2007), making it possible to use RTs to improve the accuracy of ability estimation in CAT and thereby to further improve item selection (van der Linden, 2008). In general, examinees at the same ability level may need different amounts of time to complete an item (van der Linden, Scrams, & Schnipke, 1999), and an examinee's response times may also differ across items because some items are more time-consuming than others (Bergstrom et al., 1994; Veldkamp, 2016). As a result, different examinees need different amounts of time to complete a test. However, most standardized tests impose a fixed time limit for practical administration purposes; when examinees are pressured by the time limit, they may increase their response speed at the expense of accuracy (Entink, Kuhn, Hornke, & Fox, 2009), which leads to biased ability estimation. It is therefore necessary to remove the influence of the speed factor from tests whose main goal is to evaluate ability, which is also more consistent with the unidimensionality assumption of IRT. Conventional item selection methods do not take this into account, and RT information should be introduced into the item selection process to address this problem (van der Linden, Scrams, & Schnipke, 1999; Fan et al., 2012). With the development of measurement theory and technology, researchers hope to obtain richer diagnostic information about examinees from a test, rather than simply placing them on an abstract scale, and the application of RTs is a good starting point.
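
As background for the models cited above, the following is a brief sketch (not taken from the paper itself) of the lognormal response-time model of van der Linden (2007), in which $\tau_j$ denotes the speed of examinee $j$, and $\alpha_i$ and $\beta_i$ denote the time-discrimination and time-intensity parameters of item $i$:

f(t_{ij} \mid \tau_j) = \frac{\alpha_i}{t_{ij}\sqrt{2\pi}} \exp\left\{-\frac{1}{2}\left[\alpha_i\left(\ln t_{ij} - (\beta_i - \tau_j)\right)\right]^2\right\}, \qquad \mathrm{E}(T_{ij}) = \exp\left(\beta_i - \tau_j + \frac{1}{2\alpha_i^{2}}\right)

A representative "direct" use of RTs in item selection, in the spirit of Fan et al. (2012), is to choose the next item $k$ that maximizes Fisher information per expected time unit, $I_k(\hat{\theta}_j) / \mathrm{E}(T_{jk} \mid \hat{\tau}_j)$, rather than Fisher information alone.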

Key words: computerized adaptive testing, item response time, ability estimation, item exposure, item selection method, test time