
Abstract:
Cluster analysis for cognitive diagnostic assessment is an important approach for classifying examinees into categories that match their attribute profiles, which indicate mastery or nonmastery of each attribute. These methods are nonparametric techniques: they do not require parameter estimation, and they are less restrictive and often computationally more efficient than parametric techniques such as cognitive diagnostic models. In addition, many nonparametric classification algorithms are easy to implement in common statistical software such as R or MATLAB.
K-means is the most classical clustering algorithm and is widely applied in practice. K-means cluster analysis for cognitive diagnostic assessment requires only the Q-matrix, which describes the relationship between attributes and items. A previous study has shown that the K-means algorithm classifies examinees fairly well in cognitive diagnostic assessment compared with cognitive diagnostic models. Meanwhile, the spectral clustering algorithm (SCA), a powerful clustering algorithm, has been broadly applied in many fields, including image segmentation, neural information processing, biology, and large-scale assessment in psychology. The SCA is easy to operate and often outperforms traditional clustering algorithms such as K-means. In this article, we introduce the SCA for classifying examinees into attribute-homogeneous groups based on their responses. However, the starting values have a large effect on the classification performance of both the SCA and the K-means algorithm. We therefore adopted Ward's and random starting values for the SCA, and best, Ward's, and random starting values for the K-means algorithm. In total, five methods were considered in this article: SCA-Ward's, SCA-R, K-means-best, K-means-Ward's, and K-means-R.
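To make the role of the Q-matrix concrete, the following is a minimal sketch in Python, not the implementation used in this study. It assumes that the clustering input is the matrix of attribute sum-scores W = YQ (responses multiplied by the Q-matrix), which the abstract does not state explicitly. The function names, the toy Q-matrix, and the toy responses are all hypothetical. Starting centers are supplied by the caller, because the starting values strongly affect the result.

```python
def attribute_sum_scores(responses, q_matrix):
    """Assumed input statistic: W[i][k] = sum_j responses[i][j] * q_matrix[j][k]."""
    n_attr = len(q_matrix[0])
    return [
        [sum(y * q_row[k] for y, q_row in zip(resp, q_matrix)) for k in range(n_attr)]
        for resp in responses
    ]


def kmeans(points, centers, n_iter=100):
    """Plain Lloyd iterations from caller-supplied starting centers."""
    centers = [[float(v) for v in c] for c in centers]
    labels = []
    for _ in range(n_iter):
        # Assignment step: nearest center by squared Euclidean distance.
        new_labels = [
            min(
                range(len(centers)),
                key=lambda c: sum((p - m) ** 2 for p, m in zip(pt, centers[c])),
            )
            for pt in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for c in range(len(centers)):
            members = [pt for pt, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster becomes empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical toy data: 4 items measuring 2 attributes, 4 examinees.
Q = [[1, 0], [1, 0], [0, 1], [0, 1]]
Y = [
    [1, 1, 0, 0],  # pattern consistent with mastering attribute 1 only
    [1, 0, 0, 0],  # same profile with one slip
    [0, 0, 1, 1],  # pattern consistent with mastering attribute 2 only
    [0, 1, 1, 1],  # same profile with one lucky guess
]
W = attribute_sum_scores(Y, Q)
labels = kmeans(W, centers=[W[0], W[2]])
```

In this toy case the two noisy response patterns are grouped with the clean patterns they resemble. A spectral variant would run the same K-means step on the leading eigenvectors of a similarity matrix built from W, rather than on W itself.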
Simulation studies were conducted to compare the classification performance of the SCA and the K-means algorithm using two indices, the agreement between partitions and the within-cluster homogeneity, under four factors: the attribute hierarchical structure (linear, convergent, divergent, or independent), the number of examinees (100 or 500), the number of attributes (4 or 5), and the slippage level (5%, 10%, or 15%). Thus, there were 96 (=4×2×3×4) experimental conditions in total. Thirty data sets were simulated and analyzed under each experimental condition to reduce random error. The simulation results showed that: (1) the SCA always classified examinees better than the K-means algorithm across conditions, and it was notably more robust as the conditions became more severe; (2) classification was best under the linear structure, followed by the convergent and divergent structures, and poorest under the independent structure; (3) classification accuracy declined as the number of attributes and the slippage level increased; (4) classification accuracy increased with the number of examinees for the SCA, whereas the reverse held for the K-means algorithm, whose accuracy decreased. Finally, some issues concerning the SCA and directions for future research are discussed.
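The abstract does not name the index used for agreement between partitions. As an illustration only, the sketch below assumes the adjusted Rand index (ARI), a common choice for comparing an estimated partition with the true one. The function name is hypothetical, and cluster labels need not match between the two partitions.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index between two partitions of the same examinees."""
    n = len(labels_a)
    if n < 2:
        return 1.0  # no pairs to compare; treat as perfect agreement
    # Pairs placed together in both partitions (contingency-table cells).
    sum_cells = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    # Pairs placed together within each partition separately.
    sum_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = sum_a * sum_b / comb(n, 2)  # agreement expected by chance
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:
        return 1.0  # both partitions are trivial and identical
    return (sum_cells - expected) / (max_index - expected)


# Hypothetical example: same grouping under different label names.
true_groups = [0, 0, 1, 1]
estimated_groups = [1, 1, 0, 0]
ari_same = adjusted_rand_index(true_groups, estimated_groups)
```

An ARI of 1 indicates identical partitions, values near 0 indicate chance-level agreement, and negative values indicate agreement below chance.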
In conclusion, the SCA classified examinees considerably better than the K-means algorithm in these simulations. Practitioners should consider using the SCA to classify examinees into attribute-homogeneous groups in practice in order to obtain accurate attribute profiles.