Journal of Psychological Science ›› 2023, Vol. 46 ›› Issue (4): 960-970.DOI: 10.16719/j.cnki.1671-6981.202304025

• Psychological statistics, Psychometrics & Methods •

Influence Factors of Cross-Test-Cycles Linking: A Modified Single Group Design

Chen Ping, Li Xiao, Ren He, Xin Tao   

  1. Collaborative Innovation Center of Assessment for Basic Education Quality, Beijing Normal University, Beijing, 100875
  • Online: 2023-07-20 Published: 2023-08-14

Factors Influencing Cross-Year Equating under a Modified Single-Group Design*

Chen Ping, Li Xiao**, Ren He, Xin Tao   

  1. Collaborative Innovation Center of Assessment for Basic Education Quality, Beijing Normal University, Beijing, 100875
  • Corresponding author: ** Li Xiao, E-mail: lixiao19871117.student@sina.com
  • Funding:
    *This research was supported by the Chinese Testing International research fund project "A Study of Cross-Year Equating Based on Large-Scale Assessment Programs in China" (CTI2018B02) and the basic education quality monitoring research fund of the Collaborative Innovation Center of Assessment for Basic Education Quality (2019-01-082-BZK01, 2019-01-082-BZK02)

Abstract: Cross-test-cycles linking (CTCL) makes the test scores of successive test cycles longitudinally comparable and thus allows the developmental trend of examinees' ability to be characterized. The linking design, an important part of CTCL, must be secure and efficient if the CTCL scheme is to be scientifically sound. International large-scale assessments (ILSAs) such as PISA, TIMSS, and PIRLS all employ the non-equivalent groups anchor test (NEAT) design for CTCL. However, the NEAT design carries item-exposure risks, making it unsuitable for large-scale assessments in China that require a high level of test security.
To this end, this study proposed a new CTCL design (a modified single-group design) suited to the conditions in China. The new design collects linking data by having a sample of anchor examinees answer a set of anchor items: a linking sample is randomly selected from the examinees who took the new form and administered an anchor test composed of items drawn from the old form. Under the new design, the equating method, the size of the linking sample, the length and item format of the anchor test, and the heterogeneity of examinees' ability distributions across test cycles all affect the equating precision of CTCL. Before applying the design in practice, this study therefore examined the influence of these five factors on the equating precision of CTCL.
To achieve this, a series of simulation studies was conducted by manipulating the five factors. Specifically, four equating methods (fixed-parameter calibration [FPC], separate calibration & scale transformation [SC&ST], FPC&ST, and concurrent calibration & ST [CC&ST]), three linking-sample sizes (1500, 8000, and 18000), two anchor-test lengths (20 and 30), two anchor-test item formats (a mixed test consisting of multiple-choice and constructed-response items vs. multiple-choice items only), and two mean differences between the examinee ability distributions of the new and old forms (0.01 and 0.25) were considered.
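The fully crossed design described above can be enumerated as a quick sanity check. The following sketch uses illustrative factor labels, not code from the study:

```python
from itertools import product

# The five manipulated factors of the simulation design (labels are illustrative)
methods = ["FPC", "SC&ST", "FPC&ST", "CC&ST"]   # equating methods
sample_sizes = [1500, 8000, 18000]              # linking-sample sizes
anchor_lengths = [20, 30]                       # anchor-test lengths (items)
item_formats = ["mixed", "MC-only"]             # anchor-test item formats
mean_diffs = [0.01, 0.25]                       # ability-distribution mean differences

# Fully crossing the factors yields 4 * 3 * 2 * 2 * 2 = 96 simulation conditions
conditions = list(product(methods, sample_sizes, anchor_lengths,
                          item_formats, mean_diffs))
print(len(conditions))
```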
The results showed that: (1) FPC&ST and CC&ST outperformed the other equating methods in that they yielded smaller equating errors and provided accurate and stable equating results even when the linking sample was relatively small (i.e., 1500); (2) both the length and the item format of the anchor test affected equating precision, but the direction and magnitude of the effect varied with the equating method; (3) the larger the difference between the examinee ability distributions, the lower the equating precision; and (4) increasing the length of the mixed-format anchor test and the size of the linking sample could compensate for the equating error caused by a large difference in the examinee ability distributions.
Findings suggested that the FPC&ST and CC&ST methods are preferred under the modified single-group design. For these two methods, the longer the anchor test, the smaller the RMSEs and the better the performance; it is recommended, however, that the anchor test be at least 50 percent as long as the old form. Moreover, using a mixed-format anchor test may improve the performance of the two methods. Further research could conduct empirical studies, apply other IRT models, and consider multiple linking scenarios.
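As background to the scale-transformation (ST) step in the recommended methods, one classical way to place new-form item parameters onto the old-form scale is mean/sigma linking. The abstract does not specify which ST procedure the study used, so the sketch below is illustrative, with toy difficulty values:

```python
import numpy as np

def mean_sigma_transform(b_new, b_old):
    """Mean/sigma linking constants from anchor-item difficulty estimates.

    b_new, b_old: difficulty estimates of the same anchor items on the
    new-form and old-form scales. Returns (A, B) such that a new-form
    difficulty b maps to the old-form scale as A * b + B.
    """
    A = np.std(b_old, ddof=1) / np.std(b_new, ddof=1)
    B = np.mean(b_old) - A * np.mean(b_new)
    return A, B

# Toy anchor difficulties (illustrative, not from the study): the old-form
# values are an exact affine transform of the new-form values, so mean/sigma
# linking should recover A = 1.1 and B = 0.25.
b_new = np.array([-1.2, -0.4, 0.1, 0.8, 1.5])
b_old = 1.1 * b_new + 0.25
A, B = mean_sigma_transform(b_new, b_old)
print(A, B)
```

In practice A and B are estimated from calibrated anchor-item parameters, and competing ST procedures (e.g., mean/mean, Haebara, Stocking-Lord) differ in how they weight the anchor items.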

Key words: large-scale assessments, linking plan, equating design, IRT equating method

Abstract (Chinese): To meet the high security requirements of assessment programs in China, this study proposes a new cross-year equating design that combines anchor examinees with anchor items, and uses simulation studies based on empirical data to investigate how the equating method, the number of anchor examinees, the anchor-test assembly format, and the difference in examinee ability across test cycles affect equating precision. The results show that all of these factors affect equating precision, with the equating method having the most prominent effect. Recommendations: (1) when anchor examinees are few, use equating methods that involve scale transformation; (2) the anchor-test assembly format should match the computational characteristics of the equating method; (3) when the ability difference across cycles is large, increase the number of anchor examinees or adjust the anchor-test assembly scheme as appropriate.

Key words (Chinese): large-scale assessment programs, cross-year equating plan, equating design, item response theory (IRT) equating methods