Psychological Science (心理科学), 2016, Vol. 39, Issue 3: 720-726.

• Statistics, Measurement and Methods •

The LP Method and Its Comparison with Three Commonly Used DIF Detection Methods

Yu Yue (余跃)¹, Du Wenjiu (杜文久)¹, Zhou Juan (周娟)², Qin Juxiang (秦菊香)¹

  1. School of Mathematics and Statistics, Southwest University
  2. Chengdu Shishi Shuangnan Experimental School
  • Received: 2015-07-20  Revised: 2015-11-29  Online: 2016-05-20  Published: 2015-06-20
  • Corresponding author: Du Wenjiu (杜文久)

A New Method: LP and Its Comparison with Three Commonly Used DIF Detection Procedures

  • Received: 2015-07-20  Revised: 2015-11-29  Online: 2016-05-20  Published: 2015-06-20

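Abstract (translated from Chinese): Based on item response theory, this study proposes a new DIF detection method with high power and a low Type I error rate, the LP method (Likelihood Procedure), and introduces it using DIF testing of items under the 2PLM as an example. The effectiveness of the LP method is examined by comparing it with three commonly used DIF detection methods: the MH method, Lord's chi-square test, and Raju's area measures. The study also investigates how sample size, test length, the difference in ability distribution between the focal and reference groups, and DIF magnitude may affect the effectiveness of the LP method. The simulation study yields the following conclusions: (1) the LP method is more sensitive and more robust than the MH method and Lord's chi-square method; (2) the LP method is more reasonable than Raju's area measures; (3) the power of the LP method increases as the examinee sample size or the DIF magnitude increases; (4) when the reference and focal groups do not differ in ability, the power of the LP method under every condition is higher than when the two groups differ in ability; (5) the LP method has good power for both uniform and non-uniform DIF, and its power is higher for uniform DIF than for non-uniform DIF. The LP method can be readily extended to multidimensional and polytomously scored items.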
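For reference, the Mantel-Haenszel (MH) procedure that LP is compared against can be sketched as below. This is the textbook MH common odds ratio and continuity-corrected MH chi-square, not the authors' code; it assumes examinees are matched on an observed total score and that group labels are the hypothetical strings `"R"` (reference) and `"F"` (focal).

```python
import math
from collections import defaultdict


def mantel_haenszel(item, total, group):
    """Return (alpha_MH, chi2_MH) for one studied 0/1 item.

    item  : 0/1 responses to the studied item
    total : matching score for each examinee (e.g. total test score)
    group : "R" for the reference group, "F" for the focal group
    """
    # Per score stratum: [A, B, C, D] =
    # [ref correct, ref incorrect, focal correct, focal incorrect]
    tables = defaultdict(lambda: [0, 0, 0, 0])
    for x, s, g in zip(item, total, group):
        if g == "R":
            tables[s][0 if x == 1 else 1] += 1
        else:
            tables[s][2 if x == 1 else 3] += 1

    num = den = sum_a = sum_e = sum_v = 0.0
    for a, b, c, d in tables.values():
        n = a + b + c + d
        if n < 2:
            continue  # variance of A is undefined for a stratum of size < 2
        n_r, n_f = a + b, c + d  # group sizes in this stratum
        m1, m0 = a + c, b + d    # numbers correct / incorrect in this stratum
        num += a * d / n
        den += b * c / n
        sum_a += a
        sum_e += n_r * m1 / n                              # E(A_k)
        sum_v += n_r * n_f * m1 * m0 / (n * n * (n - 1))   # Var(A_k)

    alpha = num / den if den > 0 else float("nan")
    diff = max(0.0, abs(sum_a - sum_e) - 0.5)  # continuity correction
    chi2 = diff ** 2 / sum_v if sum_v > 0 else 0.0
    return alpha, chi2


def mh_d_dif(alpha):
    """ETS delta-scale effect size: MH D-DIF = -2.35 * ln(alpha_MH)."""
    return -2.35 * math.log(alpha)
```

The chi-square is compared with the chi-square distribution with 1 degree of freedom (3.84 at the 0.05 level); an `alpha` near 1 indicates no uniform DIF. Because MH conditions on observed scores and pools a single odds ratio, it is mainly sensitive to uniform DIF.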

Key words: differential item functioning, item response theory, LP (Likelihood Procedure), MH method (Mantel-Haenszel procedure), Lord's chi-square test, Raju's area measures

Abstract: With the development of psychometrics and the wide application of psychological and educational tests, test fairness has drawn growing attention from educators and psychologists, and differential item functioning (DIF) has become the subject of more in-depth study. DIF detection is now routinely employed in item analysis, and many detection methods have been developed, such as the Mantel-Haenszel (MH) procedure, the standardization procedure (STND), the Simultaneous Item Bias Test (SIBTEST), the likelihood ratio (LR) test, Lord's chi-square, Raju's area measures, and the MIMIC method. Most of these methods, however, suffer from either low power or a high Type I error rate, so a more effective DIF detection method is needed. This paper proposes LP (Likelihood Procedure), an IRT-based method for detecting DIF, and illustrates it with item-level detection under the two-parameter logistic model (2PLM). The performance of LP was compared with that of the MH method, Lord's chi-square, and Raju's area measures. DIF size, test length, sample size, and the difference between the ability distributions of the focal and reference groups were also manipulated. DIF size had three levels (0.3, 0.5, and 0.8); test length had two levels (40 and 100 items); and sample size had three levels (500, 1000, and 2000 examinees). Two ability-distribution conditions were used: in one, both the focal and the reference group follow the standard normal distribution; in the other, the reference group follows the standard normal distribution while the focal group follows a normal distribution with mean -1 and standard deviation 1. In this simulation study, data were generated under the two-parameter logistic model.
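A minimal sketch of this kind of data generation is shown below. It is not the authors' simulation code: the scaling constant `D = 1.7`, the item-parameter distributions (a ~ U(0.5, 2), b ~ N(0, 1)), reading the sample sizes as per-group counts, and the rule that non-uniform DIF adds the DIF size to the focal group's discrimination are all assumptions, as are the function names `p_2pl` and `simulate`.

```python
import math
import random

D = 1.7  # scaling constant; ASSUMED, the abstract does not state it


def p_2pl(theta, a, b):
    """2PL probability of a correct response: 1 / (1 + exp(-D a (theta - b)))."""
    return 1.0 / (1.0 + math.exp(-D * a * (theta - b)))


def simulate(n_items=40, n_per_group=500, focal_mean=-1.0, dif_items=None, seed=1):
    """Return (responses, groups) for a reference and a focal group.

    dif_items maps item index -> (kind, size). ASSUMPTION: "uniform" adds
    `size` to the focal group's b; "nonuniform" adds `size` to its a.
    """
    rng = random.Random(seed)
    # Hypothetical generating distributions for the item parameters.
    a = [rng.uniform(0.5, 2.0) for _ in range(n_items)]
    b = [rng.gauss(0.0, 1.0) for _ in range(n_items)]
    dif_items = dif_items or {}

    responses, groups = [], []
    # Reference group ~ N(0, 1); focal group ~ N(focal_mean, 1).
    for g, mean in (("R", 0.0), ("F", focal_mean)):
        for _ in range(n_per_group):
            theta = rng.gauss(mean, 1.0)
            row = []
            for j in range(n_items):
                aj, bj = a[j], b[j]
                if g == "F" and j in dif_items:
                    kind, size = dif_items[j]
                    if kind == "uniform":
                        bj += size  # shift difficulty only
                    else:
                        aj += size  # change discrimination only
                row.append(1 if rng.random() < p_2pl(theta, aj, bj) else 0)
            responses.append(row)
            groups.append(g)
    return responses, groups


# Example: 40 items, ability difference of one SD, one uniform and one
# non-uniform DIF item of size 0.5.
responses, groups = simulate(
    n_items=40, n_per_group=500, focal_mean=-1.0,
    dif_items={0: ("uniform", 0.5), 1: ("nonuniform", 0.5)},
)
```

Setting `focal_mean=0.0` gives the equal-ability condition; the full design crosses DIF size, test length, sample size, and ability distribution and repeats each cell over many replications.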
For each DIF item, the difficulty parameter (uniform DIF) or the discrimination parameter (non-uniform DIF) was set greater in the focal group than in the reference group. Each test contains six DIF items in total across the uniform and non-uniform DIF conditions, covering the three true DIF values. The simulation results are as follows. (1) LP has high power and a low, stable Type I error rate. (2) Overall, the power of LP is higher than that of Lord's chi-square method and far higher than that of the MH method; the Type I error rate of LP is lower than that of Lord's chi-square method, and when the test length is 100, the Type I error rate of the MH method falls far outside the acceptable range. (3) LP does not exceed Raju's area measures in power, but the Type I error rate of the latter is above 0.1 under every condition, far outside the acceptable range. In summary, LP has the following advantages: (1) LP is more sensitive and more stable than MH. (2) LP is a more reasonable choice for DIF detection than Raju's area measures. (3) The power of LP increases with the examinee sample size or the true DIF value. (4) The power of LP is lower when the focal and reference groups differ in ability than when their abilities are the same. (5) LP has high power for both uniform and non-uniform DIF, with higher power for the former. Finally, LP applies not only to the two-parameter logistic model but also to the one- and three-parameter logistic models, and it can easily be extended to multidimensional and polytomously scored items.
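To make the uniform versus non-uniform distinction concrete, the sketch below computes the standard closed-form signed and unsigned areas between two 2PL item characteristic curves, which is what Raju's area measures are built on. It is an illustration, not the authors' implementation: it assumes `D = 1.7` and that both groups' item parameters are already on a common scale, and the function names are hypothetical.

```python
import math

D = 1.7  # scaling constant; ASSUMED, must match the one used for estimation


def softplus(z):
    """Numerically stable log(1 + exp(z))."""
    return z + math.log1p(math.exp(-z)) if z > 0 else math.log1p(math.exp(z))


def raju_signed_area(b_ref, b_foc):
    """Integral of P_ref(theta) - P_foc(theta) over theta for two 2PL ICCs.

    For the 2PL this equals b_foc - b_ref whatever the discriminations are,
    so it misses non-uniform DIF whose areas cancel on either side of the crossing.
    """
    return b_foc - b_ref


def raju_unsigned_area(a_ref, b_ref, a_foc, b_foc):
    """Integral of |P_ref(theta) - P_foc(theta)| for two 2PL ICCs (closed form)."""
    if math.isclose(a_ref, a_foc):
        return abs(b_foc - b_ref)  # parallel curves never cross
    z = D * a_ref * a_foc * (b_foc - b_ref) / (a_foc - a_ref)
    return abs(2.0 * (a_foc - a_ref) / (D * a_ref * a_foc) * softplus(z)
               - (b_foc - b_ref))


# Uniform DIF (b shifted by 0.5): both areas equal 0.5.
print(raju_signed_area(0.0, 0.5), raju_unsigned_area(1.0, 0.0, 1.0, 0.5))
# Non-uniform DIF (a differs, same b): signed area is 0, unsigned area is not.
print(raju_signed_area(0.0, 0.0), raju_unsigned_area(1.0, 0.0, 2.0, 0.0))
```

The second pair shows why an unsigned measure is needed for non-uniform DIF. In practice the areas are computed from estimated parameters, and turning them into a significance test requires their sampling variances, which this sketch does not attempt.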

Key words: Differential Item Functioning (DIF), Item Response Theory (IRT), LP (Likelihood Procedure), Mantel-Haenszel (MH) method, Lord's chi-square, Raju's area measures