Clustering analysis for cognitive diagnostic assessment is an important approach for classifying examinees into categories that match their attribute profiles, which reflect whether each attribute has been mastered. These methods are nonparametric techniques that do not require parameter estimation, and they are less restrictive and often computationally more efficient than parametric techniques such as cognitive diagnostic models. Better yet, many nonparametric classification algorithms can be easily implemented in most statistical software packages, such as R or MATLAB.
K-means is the most classical of the clustering analysis methods and is widely used in practice. K-means clustering for cognitive diagnostic assessment requires only the Q-matrix, which describes the relationship between attributes and items. Previous studies have shown that the K-means algorithm classifies examinees fairly well compared with cognitive diagnostic models. Meanwhile, the spectral clustering algorithm (SCA), a powerful clustering algorithm, has been broadly applied in many fields, including image segmentation, neural information processing, biology, and large-scale assessment in psychology. The SCA is easy to implement and often outperforms traditional clustering algorithms such as K-means. In this article, we introduce the SCA for classifying examinees into attribute-homogeneous groups based on their item responses. Because starting values have a large effect on the classification performance of both the SCA and the K-means algorithm, we adopted Ward's and random starting values for the SCA, and best, Ward's, and random starting values for the K-means algorithm. In total, five methods were considered in this article: SCA-Ward's, SCA-R, K-means-best, K-means-Ward's, and K-means-R.
Simulation studies were conducted to compare the classification performance of the SCA and the K-means algorithm using two indices, the agreement between partitions and the within-cluster homogeneity, under four factors: the attribute hierarchical structure (linear, convergent, divergent, or independent), the number of examinees (100 or 500), the number of attributes (4 or 5), and the slippage level (5%, 10%, or 15%). Thus, there were 48 (= 4 × 2 × 2 × 3) experimental conditions in total. Thirty data sets were simulated and analyzed under each condition to reduce random error. The simulation results showed that: (1) the SCA classified examinees better than the K-means algorithm across all conditions, and it was especially more robust as the conditions became more severe; (2) classification was best under the linear structure, followed by the convergent and divergent structures, and was poorest under the independent structure; (3) classification accuracy declined as the number of attributes and the slippage level increased; (4) for the SCA, classification accuracy increased with the number of examinees, whereas the reverse held for the K-means algorithm, whose accuracy decreased. Finally, some issues with the SCA and directions for future research are discussed.
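The agreement between partitions can be quantified with, for example, the adjusted Rand index (ARI); the article does not say which agreement index it uses, so the ARI here is only an illustrative choice. A self-contained sketch in Python:

```python
import numpy as np
from math import comb

def adjusted_rand_index(a, b):
    """Adjusted Rand index between two partitions given as label vectors.
    1.0 means identical partitions (up to relabeling); ~0 means chance."""
    a, b = np.asarray(a), np.asarray(b)
    # Build the contingency table of co-occurring labels.
    _, ia = np.unique(a, return_inverse=True)
    _, ib = np.unique(b, return_inverse=True)
    n = np.zeros((ia.max() + 1, ib.max() + 1), dtype=int)
    for x, y in zip(ia, ib):
        n[x, y] += 1
    # Pair counts within cells, rows, and columns.
    sum_cells = sum(comb(v, 2) for v in n.ravel())
    sum_rows = sum(comb(v, 2) for v in n.sum(1))
    sum_cols = sum(comb(v, 2) for v in n.sum(0))
    total = comb(len(a), 2)
    expected = sum_rows * sum_cols / total   # chance-level agreement
    max_index = (sum_rows + sum_cols) / 2
    return (sum_cells - expected) / (max_index - expected)

# Identical partitions up to relabeling score 1.0.
print(adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0]))  # -> 1.0
```

Either partition (e.g., estimated attribute-profile groups versus the true simulated groups) can be passed as a label vector; the index is symmetric in its arguments.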
In conclusion, the SCA shows much better classification performance than the K-means algorithm. Practitioners should consider using the SCA to classify examinees into attribute-homogeneous groups in real-world settings to obtain accurate attribute profiles.