Student Profiling on Academic Performance Using Cluster Analysis

Copyright © 2012 Osman N. Darcan and Bertan Y. Badur. This is an open access article distributed under the Creative Commons Attribution License unported 3.0, which permits unrestricted use, distribution, and reproduction in any medium, provided that original work is properly cited. Contact author: Osman N. Darcan. E-mail: darcan@boun.edu.tr


Introduction
Management Information Systems (MIS) combines the disciplines of management and computer science to manage information (Laudon and Laudon, 2009). As an emerging interdisciplinary field, MIS demands both technical and managerial skills from its graduates. The curriculum of the MIS department at Boğaziçi University is designed to deliver a balanced set of management and computer courses in order to prepare students for developing and maintaining business information systems. Courses offered in the MIS curriculum cover a wide range of topics, including management and organization, economics, marketing, accounting and finance, computer programming, system design concepts, database management, data communication, and operations research. In the first two years, students take basic management and computing courses. Specialized courses are offered in the last two years to provide students with a strong foundation in information management.
In Turkey, students have to take a nationwide entrance exam to study at a university. The main objective of this exam is to measure the candidate's basic knowledge of social and technical high school courses. Composite scores calculated from these measurements are used to select candidates. As a direct consequence, students from general high schools and vocational high schools (mainly from computer and management departments), with widely varying backgrounds, are admitted to the MIS department. Students with different backgrounds have to pursue the same diversified set of courses, covering programming, managerial, and quantitative subjects as well as analysis and design.
The aim of this study is to investigate the profiles of students in the MIS department by performing cluster analysis on various dimensions of academic ability, based on their official grade data for the required courses. Characteristics of the students in each cluster are examined to gain insight into how attributes such as educational background and high school type are distributed over each segment. In particular, how the distribution of high school types varies among the segments is of interest for shaping the strategic decisions of our department.
The outline of this paper is as follows. In Section 2, basic data mining functionalities are introduced and related work in educational data mining is summarized. The methodology of this study is presented in Section 3, followed by the description of the data in Section 4. Section 5 discusses the results in detail. Finally, the last section summarizes our work and presents how the results of the analysis are used in the department in question.

Educational Data Mining
Data mining is the process of analyzing data from different perspectives and summarizing it into useful information. Data mining functionalities are classified into two broad categories: descriptive and predictive (Han et al., 2011). This study can be categorized as educational data mining, an emerging discipline concerned with developing methods for exploring the unique types of data that come from educational contexts. A survey of the application of data mining techniques to various educational systems is given in Romero and Ventura (2007). These techniques include data visualization, clustering, classification, and association analysis, applied to educational systems such as traditional education, distance education, and learning content management systems.
In another work on educational data mining, Romero et al. (2008) discuss the application of various data mining techniques to data collected from the activities of students who use the Moodle e-learning course management system.
To the best of our knowledge, the most closely related research is Dzemyda (2005), where a method for analyzing curricula via the statistical analysis of examination results is proposed.
The method is based on visualizing a set of 25 academic subjects characterized by their correlation matrix, which is obtained from multidimensional examination-result data. The correlation matrix is analyzed to test the relation between the aptitudes of students and the marks earned in the related subjects.

Methodology
This study aims at clustering undergraduate students in the MIS department of Boğaziçi University based on course grade data. After forming student clusters, a profile analysis was carried out to examine how other student characteristics vary across the student segments. These characteristics are qualitative variables that are not included in the cluster analysis (Sharma, 1995).
As can be seen in Table 1, there are 31 required courses in our current MIS undergraduate curriculum, so a dimension reduction strategy is needed to obtain independent factors. Courses requiring similar abilities from students are expected to fall under the same factors. One possible approach is to identify these dimensions subjectively using domain knowledge; this can be accomplished by assigning a weight to each ability dimension for each course. These weights can be obtained from instructors or students by designing appropriate questionnaires and combining their opinions accordingly. The second approach, followed in this study, is factor analysis, a multivariate statistical method whose primary purpose is to define the structure of the data. It can be used to examine the underlying patterns for a large number of variables and to determine whether these patterns can be condensed or summarized in a smaller set of factors or components.
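The profile analysis step amounts to a cross tabulation of cluster membership against a qualitative attribute. A minimal sketch in plain Python follows; the cluster labels, the school-type values, and the function names are hypothetical and serve only as illustration, not as the procedure actually run in the study.

```python
from collections import Counter

def cross_tabulate(cluster_labels, school_types):
    """Count students for every (cluster, school type) combination."""
    return Counter(zip(cluster_labels, school_types))

def row_percentages(table):
    """Express each cell as a percentage of its cluster's total."""
    totals = Counter()
    for (cluster, _), n in table.items():
        totals[cluster] += n
    return {key: 100.0 * n / totals[key[0]] for key, n in table.items()}

# Hypothetical toy data: one entry per student.
clusters = [1, 1, 3, 3, 3, 5]
schools = ["vocational", "general", "general", "general", "vocational", "general"]

table = cross_tabulate(clusters, schools)
shares = row_percentages(table)
```

Row percentages make segments of different sizes comparable, which is the comparison of interest when asking how high school types are distributed over the clusters.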
The correlations between the original variables and the factors are called factor loadings. Once the initial solution (a set of independent factors and their loadings) is obtained, a rotation method can be applied to facilitate interpretation of the solution.
The rotation is expected to alter how the explained variance is distributed among the factors. Factor analysis can be carried out with different techniques, such as principal component factoring, principal axis factoring, and maximum likelihood (Basilevsky, 1994; Hair et al., 2009; Sharma, 1995).
In this study, factor analysis using principal component factoring is applied to obtain the underlying factors representing the ability dimensions of student grades.
The varimax rotation method is used to obtain the rotated factor loadings.
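To make the rotation step concrete, the sketch below applies raw varimax through Kaiser's successive planar rotations of factor pairs. It is an illustration under stated assumptions, not the SPSS procedure used in the study: the unrotated loading matrix is invented, no Kaiser normalization is applied, and the function names are hypothetical.

```python
import math

def varimax_criterion(loadings):
    """Sum over factors of the variance of the squared loadings."""
    p = len(loadings)
    total = 0.0
    for j in range(len(loadings[0])):
        squared = [row[j] ** 2 for row in loadings]
        m = sum(squared) / p
        total += sum((s - m) ** 2 for s in squared) / p
    return total

def varimax(loadings, max_sweeps=50, tol=1e-10):
    """Rotate each pair of factors in turn until the rotation angles vanish."""
    L = [list(row) for row in loadings]
    p, k = len(L), len(L[0])
    for _ in range(max_sweeps):
        largest_angle = 0.0
        for a in range(k - 1):
            for b in range(a + 1, k):
                u = [row[a] ** 2 - row[b] ** 2 for row in L]
                v = [2.0 * row[a] * row[b] for row in L]
                A, B = sum(u), sum(v)
                C = sum(ui * ui - vi * vi for ui, vi in zip(u, v))
                D = 2.0 * sum(ui * vi for ui, vi in zip(u, v))
                # Kaiser's closed-form angle for the pair (a, b).
                num = D - 2.0 * A * B / p
                den = C - (A * A - B * B) / p
                phi = 0.25 * math.atan2(num, den)
                largest_angle = max(largest_angle, abs(phi))
                c, s = math.cos(phi), math.sin(phi)
                for row in L:
                    x, y = row[a], row[b]
                    row[a], row[b] = c * x + s * y, -s * x + c * y
        if largest_angle < tol:
            break
    return L

# Hypothetical unrotated loadings: 4 courses on 2 factors.
unrotated = [[0.7, 0.5], [0.8, 0.4], [0.6, -0.6], [0.7, -0.5]]
rotated = varimax(unrotated)
```

Because the rotation is orthogonal, each course's communality (the sum of its squared loadings) is unchanged; only the split of that variance between the factors moves, which is what makes the rotated loadings easier to interpret.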
For each factor, summed scales are computed by taking the arithmetic average of the courses loading highly on that factor (Hair et al., 2009). Since all variables are numerical with a ratio scale, cluster analysis is performed with the k-means algorithm (Han et al., 2011; Mirkin, 2005). After forming the clusters, the qualitative characteristics of students in each segment are examined by cross tabulations.
Only the grades of the required courses offered by the department are used in this study. The elective course grades are omitted due to their heterogeneous nature. In addition, since each instructor may have a different grading policy, and even the same instructor's grading patterns may change over time, the course grades are standardized for each year by mapping the course average grade to 2.0 and the standard deviation to 1.0. Hence, for each specific year, the success of a particular student is measured in units of standard deviation above or below the average course grade of that year. Course averages are mapped to 2.0, instead of using the well-known z-scores, for ease of interpretation.
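The two steps above, summed scales followed by k-means, can be sketched in plain Python as follows. This is a minimal illustration and not the SPSS implementation used in the study: the course codes, the toy points, the random choice of initial centers, and the function names are all assumptions.

```python
import random
from statistics import mean

def summed_scale(scores, courses):
    """Average a student's scores over the courses loading highly on one factor."""
    available = [scores[c] for c in courses if c in scores]
    return mean(available) if available else None

def k_means(points, k, n_iter=100, seed=0):
    """Lloyd's k-means with squared Euclidean distance."""
    rng = random.Random(seed)
    # Assumption: initial centers are k distinct data points chosen at random.
    centers = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(n_iter):
        new_labels = [
            min(range(k),
                key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centers[j])))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the centers are too
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster empties
                centers[j] = [mean(col) for col in zip(*members)]
    return labels, centers

# Hypothetical student with standardized scores in three courses.
scale = summed_scale({"C1": 3.0, "C2": 2.0, "C3": 1.0}, ["C1", "C2"])

# Toy summed scales on two factors for six students.
points = [(3.0, 3.2), (3.1, 2.9), (2.9, 3.0), (1.0, 1.1), (0.9, 0.8), (1.2, 1.0)]
labels, centers = k_means(points, k=2)
```

In the study each student would be a point in the space of the factor-based summed scales, and the resulting cluster centers are read in units of standard deviation around the course mean of 2.0.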
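The per-year rescaling can be sketched as below. The record layout, the course code, and the function name are hypothetical, and the population standard deviation is assumed because the text does not state which estimator is used.

```python
from collections import defaultdict
from statistics import mean, pstdev

def standardize_by_course_year(records, target_mean=2.0, target_sd=1.0):
    """Rescale grades so each (course, year) group has the target mean and SD.

    records: iterable of (student, course, year, score) tuples.
    Returns a dict keyed by (student, course, year).
    """
    records = list(records)
    groups = defaultdict(list)
    for student, course, year, score in records:
        groups[(course, year)].append(score)

    scaled = {}
    for student, course, year, score in records:
        scores = groups[(course, year)]
        mu = mean(scores)
        sd = pstdev(scores)  # assumption: population SD
        # If all grades in a group are identical, everyone sits at the mean.
        z = 0.0 if sd == 0 else (score - mu) / sd
        scaled[(student, course, year)] = target_mean + target_sd * z
    return scaled

# Hypothetical records for one course offered in two different years.
records = [
    ("s1", "MIS131", 2005, 4.0),
    ("s2", "MIS131", 2005, 2.0),
    ("s3", "MIS131", 2005, 0.0),
    ("s4", "MIS131", 2006, 3.0),
    ("s5", "MIS131", 2006, 1.0),
]
scaled = standardize_by_course_year(records)
```

A rescaled score of 3.0 then reads directly as one standard deviation above that year's course average, which is the interpretation the shifted mean is meant to support.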

Description of Data
The data set was obtained from the Registration Office of the University. It contains records with the student number, course code, semester, letter grade, and status, as well as records with the student's personal information, such as gender, high school name, and high school type. The sample period ranges from the fall 2000 semester to the fall 2007 semester. There are 467 MIS students in the data set. The letter grades, ranging from AA (excellent) to F (fail), are mapped to a ratio-scale numerical variable with scores ranging from 4 (for AA) to 0 (for F). When a student has taken the same course more than once, the average of all earned grades is used as the final score. The data is converted into a tabular format in which rows represent students and columns represent courses, so each cell contains the score of a particular student for a specific course. In this format, the data contains many missing values, since new students do not yet have any junior or senior course grades. No missing-value handling method is applied: factor analysis is based on the correlations among variables, and replacing missing scores with means may bias the estimates.
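The data preparation described above can be sketched in plain Python. Only the endpoints AA = 4 and F = 0 come from the text; the intermediate letters and their values, the record layout, and the function names are assumptions. The correlation helper uses pairwise deletion as one way to avoid imputing missing scores; the text does not state how the correlations were actually computed over incomplete rows.

```python
from collections import defaultdict
from statistics import mean

# Only AA -> 4 and F -> 0 are stated in the text; the intermediate
# letters and their values are assumptions for illustration.
GRADE_POINTS = {"AA": 4.0, "BA": 3.5, "BB": 3.0, "CB": 2.5,
                "CC": 2.0, "DC": 1.5, "DD": 1.0, "F": 0.0}

def build_score_table(records):
    """Return {student: {course: score}}, averaging repeated attempts."""
    attempts = defaultdict(list)
    for student, course, letter in records:
        attempts[(student, course)].append(GRADE_POINTS[letter])
    table = defaultdict(dict)
    for (student, course), points in attempts.items():
        table[student][course] = mean(points)
    return dict(table)

def pairwise_correlation(table, course_a, course_b):
    """Pearson correlation over students who have scores in both courses."""
    pairs = [(row[course_a], row[course_b])
             for row in table.values()
             if course_a in row and course_b in row]
    if len(pairs) < 2:
        return None
    xs, ys = zip(*pairs)
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return None  # correlation is undefined for a constant column
    return sxy / (sxx * syy) ** 0.5

# Hypothetical records: (student, course, letter grade).
records = [
    ("s1", "C1", "AA"), ("s1", "C2", "BA"),
    ("s2", "C1", "F"), ("s2", "C1", "CC"), ("s2", "C2", "DD"),
    ("s3", "C1", "BB"),
    ("s4", "C1", "CC"), ("s4", "C2", "CC"),
]
table = build_score_table(records)
r = pairwise_correlation(table, "C1", "C2")
```

Here student s2 repeated course C1, so the two attempts are averaged, and student s3 has no score for C2, so that student is simply left out of the C1 and C2 correlation rather than being assigned a mean.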

Results
Both factor and cluster analyses were performed using SPSS version 16.0 (SPSS, 2009). The results of the factor analysis are shown in Table 2.
Examination of the cluster centers in Table 4 reveals the following. Cluster 3 represents the most successful students: in all four dimensions, their grades are approximately one standard deviation above the mean. The students in Cluster 5 form the second most successful group, with grades that are in general 0.4 standard deviation above the average. Cluster 4 consists of the average students. Unsuccessful students are grouped in Clusters 1 and 6, whose grades are below the average in all abilities. Students in these two clusters have similar programming and quantitative abilities (F1, F2) but differ in the system and managerial dimensions (F3, F4). Compared to Cluster 1, the managerial abilities of students in Cluster 6 are higher by 0.7 standard deviation, whereas their system thinking abilities are lower by 0.4 standard deviation. There are only 7 students in Cluster 2, who can be treated as outliers; these students performed very poorly in all abilities.

Conclusions
In this study, we have explored different student segments by performing cluster analysis on various dimensions of academic ability for the MIS department of Boğaziçi University. Based on these segments, the profiles of students, including categorical variables such as educational background and high school type, are determined. These profiles are used in two ways: (1) to investigate how high school type, which varies among the segments, affects education; and (2) to distribute students among various elective courses and projects, as well as to revise the educational strategies of the department related to the curriculum.
Since students have to take a nationwide entrance exam to enter a university, the department has no control over the selection process of the undergraduate students. Therefore, the evaluation defined in this paper cannot be applied to the selection of students. However, the results of this study can be used in the following areas to improve the quality of MIS education:
• The programming courses are offered in the first two years of the MIS curriculum. The assignment of students to different sections of the programming courses, as well as the curriculum design for these sections, can be carried out by considering students' backgrounds.
• The projects assigned to students in courses in the last two years of the program require different skills (programming, managerial, quantitative, or system); hence, the composition of project groups can be determined based on the results of this study.
• The results can be used to offer different types of elective courses according to the backgrounds of the current students in a particular semester, as well as to design elective tracks.
• The academic advisors of the students can consider the findings to guide students in selecting appropriate complementary or departmental elective courses.