Psychological Science ›› 2015, Vol. ›› Issue (2): 446-451.

Comparison of Methods addressing MNAR Missing Data When Fitting a Latent Growth Model: Selection Model and ML

Nan Chen, Hongyun Liu

  • Received: 2014-04-01  Revised: 2014-12-03  Online: 2015-03-20  Published: 2015-03-20
  • Contact: Nan Chen

Handling Missing Not at Random Data in Growth Models: The Selection Model and the Maximum Likelihood Method

Nan Chen, Hongyun Liu

  1. Beijing Normal University
  • Corresponding author: Nan Chen

Abstract: Longitudinal data analysis is widely used in psychological research. Because longitudinal studies are time-consuming and involve many repeated observations, missing data are common and often follow a missing not at random (MNAR) mechanism. Methods for handling missing data have a long history; however, because neither the MNAR mechanism itself nor the assumptions of the different MNAR models can be tested, it remains difficult for practitioners to select an appropriate method for MNAR missing data. An inappropriate method violates model assumptions and may bias parameter estimates and even lead to misleading conclusions. The present study investigates the effects of method selection when fitting a latent growth curve model to longitudinal data. Using Monte Carlo simulation, two approaches resting on different assumptions were compared in handling MNAR missingness in a five-wave longitudinal dataset: the Diggle-Kenward selection model, which assumes an MNAR mechanism, and the maximum likelihood (ML) method, which assumes a missing at random (MAR) mechanism. Three factors were manipulated simultaneously: (i) sample size (100, 300, 500, 1000), (ii) percentage of MNAR missing data (5%, 10%, 20%, 40%), and (iii) percentage of MAR missing data (0%, 10%, 20%). This yielded 4×4×3=48 conditions, with 500 replications per condition. The performance of the two approaches in estimating the growth parameters (the means and variances of the intercept and slope, i.e., μi, μs, σi² and σs²) was evaluated on two criteria: root mean square error (RMSE) and the coverage rate of the 95% confidence intervals (CIs). The estimated standard errors (SEs) were also examined. Results indicated that: (i) the Diggle-Kenward selection model yielded more precise estimates, especially when the percentage of MNAR missing data was high.
(ii) The level of MNAR missingness was the major factor affecting the precision of parameter estimation. At low levels of MNAR missingness (≤10%), parameter estimates from the Diggle-Kenward selection model differed little from those of the ML approach, but at higher levels the Diggle-Kenward selection model performed clearly better. (iii) The precision of the parameter estimates improved as sample size increased, regardless of the method applied. There were significant interactions between MNAR missingness level and sample size in the precision of the μi and μs estimates. Moreover, the proportion of MAR missingness made little difference to parameter estimation. (iv) The ML approach produced lower SE estimates than the Diggle-Kenward selection model, and the discrepancy widened as the percentage of MNAR missing data increased. (v) When fitting a growth curve model, the means of the latent variables (μi and μs) were affected much more by MNAR missingness than their variances (σi² and σs²). In conclusion, when a longitudinal dataset contains MNAR missing data, it is necessary to apply an approach that takes the MNAR mechanism into account. Only with a small sample size and a sufficiently low percentage of MNAR missing data is the ML approach under the MAR assumption an acceptable alternative. Suggestions for selecting a method for handling missing data are also provided.
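
The simulation design and the two evaluation criteria named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code: the function names are hypothetical, and the coverage check assumes a Wald-type interval (estimate ± 1.96 × SE), which the abstract does not specify. Only the factor levels and the replication count are taken from the abstract.

```python
import itertools
import math

# Factor levels as stated in the abstract.
SAMPLE_SIZES = (100, 300, 500, 1000)
MNAR_RATES = (0.05, 0.10, 0.20, 0.40)
MAR_RATES = (0.00, 0.10, 0.20)
N_REPLICATES = 500  # replications per condition


def design_grid():
    """Return every (sample size, MNAR rate, MAR rate) combination."""
    return list(itertools.product(SAMPLE_SIZES, MNAR_RATES, MAR_RATES))


def rmse(estimates, true_value):
    """Root mean square error of replicate estimates around the true value."""
    squared_errors = [(est - true_value) ** 2 for est in estimates]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def coverage_rate(estimates, standard_errors, true_value, z=1.96):
    """Share of replicates whose interval contains the true value.

    Assumption: a Wald-type interval, estimate +/- z * SE, with z = 1.96
    for a nominal 95% level.
    """
    hits = sum(
        1
        for est, se in zip(estimates, standard_errors)
        if est - z * se <= true_value <= est + z * se
    )
    return hits / len(estimates)
```

Here `len(design_grid())` gives the 48 conditions reported in the abstract. In a full study, `rmse` and `coverage_rate` would be applied to the 500 replicate estimates of each growth parameter within each condition, once per method.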

Key words: longitudinal study, latent growth curve model (LGM), missing not at random (MNAR) mechanism, Diggle-Kenward selection model, maximum likelihood (ML) approach

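Abstract (Chinese): For latent growth models with missing not at random (MNAR) data, this study examined the relative merits of two missing-data methods that rest on different assumptions: the maximum likelihood (ML) method and the Diggle-Kenward selection model. A Monte Carlo simulation compared the two methods on the precision of the growth parameter estimates and on the estimation of their standard errors, taking into account the effects of sample size, the proportion of MNAR missingness, and the proportion of missing at random (MAR) missingness. The results showed that the Diggle-Kenward selection model, whose assumptions were met, generally yielded more precise parameter estimates than the ML method. The ML method underestimated the standard errors to some degree, and its confidence interval coverage rates were clearly lower than those of the Diggle-Kenward selection model.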

Key words (Chinese): longitudinal study, latent growth model, missing not at random mechanism, Diggle-Kenward selection model, maximum likelihood method