Abstract: Cluster analysis for cognitive diagnostic assessment is an important approach for classifying examinees into categories that match their attribute profiles, which reflect mastery or nonmastery of each attribute. These methods are nonparametric techniques that do not require parameter estimation; they are less restrictive and often computationally more efficient than parametric techniques such as cognitive diagnostic models. In addition, many nonparametric classification algorithms can be easily implemented in most statistical software packages, such as R or MATLAB.
K-means is the most classical of the clustering algorithms and is widely applied in practice. K-means cluster analysis for cognitive diagnostic assessment requires only the Q-matrix, which describes the relationship between attributes and items. A previous study has shown that the K-means algorithm classifies examinees fairly well in cognitive diagnostic assessment compared with cognitive diagnostic models. Meanwhile, the spectral clustering algorithm (SCA), a powerful clustering algorithm, has been broadly applied in many fields, including image segmentation, neural information processing, biology, and large-scale assessment in psychology. The SCA is easy to implement and often outperforms traditional clustering algorithms such as K-means. In this article, we introduce the SCA for classifying examinees into attribute-homogeneous groups based on their responses. Because starting values strongly affect the classification performance of both the SCA and the K-means algorithm, we adopted Ward’s and random starting values for the SCA, and best, Ward’s, and random starting values for the K-means algorithm. In total, five methods were considered in this article: SCA-Ward’s, SCA-R, K-means-best, K-means-Ward’s, and K-means-R.
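To make the Q-matrix-based K-means idea concrete, the following is a minimal sketch rather than the procedure used in this article. It assumes the attribute sum-score statistic W = YQ (the number of correct responses on the items measuring each attribute) as the clustering input, and it uses random starting values only. The toy Q-matrix, the response data, and the function names `sum_scores` and `kmeans` are all hypothetical.

```python
import random


def sum_scores(responses, q_matrix):
    """Return W = Y Q: per examinee, the count of correct answers
    on the items that measure each attribute (assumed input statistic)."""
    n_items = len(q_matrix)
    n_attr = len(q_matrix[0])
    return [
        [sum(y[j] * q_matrix[j][k] for j in range(n_items)) for k in range(n_attr)]
        for y in responses
    ]


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, n_iter=100, seed=0):
    """Plain K-means with random starting values drawn from the data."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest center.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centers[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: move each non-empty cluster's center to its mean.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical toy data: 4 items, 2 attributes, 6 examinees.
Q = [[1, 0], [1, 0], [0, 1], [0, 1]]
Y = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [1, 1, 1, 1],
    [0, 0, 0, 0],
]
W = sum_scores(Y, Q)
labels = kmeans(W, k=4)  # 2 attributes give at most 2**2 = 4 profiles
print(W)
print(labels)
```

The cluster labels are arbitrary, so they still have to be matched to attribute profiles afterward. Different seeds can give different partitions, which illustrates why the choice of starting values matters.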
Simulation studies were conducted to compare the classification performance of the SCA and the K-means algorithm using two indices, agreement between partitions and within-cluster homogeneity, under four factors: the attribute hierarchical structure (linear, convergent, divergent, or independent), the number of examinees (100 or 500), the number of attributes (4 or 5), and the slippage level (5%, 10%, or 15%). Thus, there were 96 (=4×2×3×4) experimental conditions in total. Thirty data sets were simulated and analyzed under each experimental condition to reduce random error. The simulation results showed that: (1) the SCA classified examinees better than the K-means algorithm across conditions, and in particular, the SCA was more robust as the conditions became more severe; (2) classification was most accurate under the linear structure, followed by the convergent and divergent structures, and least accurate under the independent structure; (3) as the number of attributes and the slippage level increased, classification accuracy declined; (4) as the number of examinees increased, classification accuracy increased for the SCA, whereas the reverse was observed for the K-means algorithm, whose classification accuracy decreased. Finally, some issues concerning the SCA and directions for future research are discussed.
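Because cluster labels are arbitrary, agreement between an estimated partition and the true partition has to be measured with a label-invariant index. The abstract does not state which index was used; the sketch below assumes the adjusted Rand index as one common choice, and the function name `adjusted_rand_index` and the example labels are hypothetical.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(true_labels, pred_labels):
    """Adjusted Rand index between two partitions of the same examinees.

    This is an assumed choice of agreement index, not necessarily
    the one used in the article.
    """
    n = len(true_labels)
    if n < 2:
        return 1.0  # fewer than two examinees: partitions trivially agree
    # Contingency table of (true cluster, predicted cluster) counts.
    cells = Counter(zip(true_labels, pred_labels))
    rows = Counter(true_labels)
    cols = Counter(pred_labels)
    # Count pairs of examinees placed together in each table margin or cell.
    sum_cells = sum(comb(c, 2) for c in cells.values())
    sum_rows = sum(comb(c, 2) for c in rows.values())
    sum_cols = sum(comb(c, 2) for c in cols.values())
    expected = sum_rows * sum_cols / comb(n, 2)  # agreement expected by chance
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g. a single cluster in both partitions
    return (sum_cells - expected) / (max_index - expected)


# Same partition with swapped labels: perfect agreement.
same = adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0])
# Partitions that cut across each other: agreement below chance.
crossed = adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1])
print(same, crossed)
```

A value of 1 indicates identical partitions up to relabeling, and values near 0 indicate agreement no better than chance.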
In conclusion, the SCA showed considerably better classification performance than the K-means algorithm. Practitioners should consider using the SCA to classify examinees into attribute-homogeneous groups in real-world applications to obtain accurate attribute profiles.