Confirmatory Factor Analysis of Center for Epidemiological Studies Depression 10-item Scale on Chinese International Students in Korea

Depression has become increasingly prevalent in Chinese international students in South Korea. For this population, therefore, accurate assessment of mood disorders, particularly depression, is critically important. The 10-item Center for Epidemiological Studies Depression Scale (CES-D 10) is commonly used to measure depression in both clinical and non-clinical populations. Thus, this study examined the CES-D 10’s factor structure and psychometric properties in Chinese international students.


INTRODUCTION
Major depressive disorder (MDD), also known as major depression, is widely recognized as one of the most common mental disorders worldwide because of its relatively high prevalence in the general population. MDD often follows a chronic course and is associated with functional impairments and disabilities [1]. Given its prevalence and associated impairment, it is imperative to carry out accurate and sustainable assessments of depressive symptoms. This paper focuses on Chinese international students who travel to South Korea (hereafter Korea) in pursuit of higher education opportunities.
With a significant increase in the enrollment of Chinese international students in Korea over the past 10 years, mental health concerns for this population have become an increasingly important area of study. Chinese students represent the largest proportion of international students in Korea, followed by those from Vietnam and Mongolia. According to the National Institute for International Education [2], the number of international students enrolled in Korean higher education institutions increased by 75% to 160,165 students during the 2015-2019 academic years, while Chinese student enrollment increased by 47% to 71,067, representing 44.4% of international students [3]. International students, including Chinese nationals, inevitably experience a process of acculturation that can pose unwanted or unexpected challenges as they immerse themselves in their host country [4]. Acculturation is the broad process of cultural and psychological change resulting from contact between two or more cultures [5]. Besides adjusting to a new social environment and educational system, international students encounter unique sources of stress such as language barriers, financial difficulties, homesickness, culture shock, deficient academic relationships with instructors, and unsatisfactory peer associations [6-8]. Such experiences evoke acculturative stress in international students (i.e., strain stemming from life changes occurring during the acculturation process) [9]. In fact, previous studies of Chinese international students reveal their vulnerability to high levels of acculturative stress and to related problems such as anxiety and depression [10,11]. Other scholars have noted that physical environmental stressors, including housing and transportation issues, also elicit stress in international students [12].
Available evidence suggests that the depression level of Chinese students in Korea is higher than average. For example, studies that administered the 20-item Center for Epidemiological Studies Depression Scale (CES-D) to samples of Chinese international students in Korea reported mean depressive symptom scores of 16.19 and 24.33, both of which exceed the scale's threshold score of 16 [13-15].
The increasing number of Chinese international students in Korea gives rise to greater demand for valid and reliable psychiatric research instruments. Depression is a common psychological problem for international students, but no studies have yet tested depression measures on Chinese international students in Korea. The Beck Depression Inventory (BDI) [16] and the Patient Health Questionnaire (PHQ) [17] are the most commonly used self-report measures of depression. However, the BDI has been found to produce false positives at high rates, and some of the somatic symptoms it assesses may not be related to depression [18]. The PHQ is deemed preferable for use in clinical settings [19]. The Center for Epidemiological Studies Depression Scale (CES-D) represents a promising depression screening instrument. This measure is commonly employed to detect the presence and severity of depressive symptoms in both clinical and non-clinical settings [14]. It has been recommended that the original 20-item CES-D be shortened to enhance its clinical utility and ease of scoring and to increase the likelihood that participants answer all items [20]. Several short forms of the 20-item CES-D have been developed to minimize respondent burden. A short form can be administered more quickly in primary care and school settings, and both clinical and non-clinical participants are more likely to answer all of the questions. The Iowa form developed by Kohout et al. [20] and the 10-item form (CES-D 10) developed by Andresen et al. [21] are the most commonly used abbreviated forms. The Andresen version exhibits robust psychometric properties in both clinical [1] and non-clinical samples of varied populations, including university students [22], transnational mothers [23], older adults [24], and adolescents [25], as indicated by its adequate internal reliability and construct validity.
Despite wide psychometric support for the CES-D 10, conflicting results of confirmatory factor analysis studies have raised questions about the dimensionality of this instrument and provoked an ongoing debate concerning its underlying structure. For instance, Baron et al. [26] and Gonzalez et al. [27] have suggested that a one-factor model is most apt for the data, whereas other researchers have consistently reported that a two-factor model, comprising a 2-item positive affect factor and an 8-item negative affect factor, offers the best fit [23-25]. Yet other scholars have found a three-factor structure to fit best [28]. These inconsistent findings might be attributed to the use of different statistical techniques, for example, Exploratory Factor Analysis (EFA) vs. Confirmatory Factor Analysis (CFA), and to sample heterogeneity pertaining to (1) individuals with different cultural backgrounds, (2) differences in study sample age ranges, and (3) participant characteristics (e.g., a clinical sample compared to a non-clinical sample). CFA can clarify the issues in such situations because it allows competing structural models to be evaluated and factor loadings, variances, and covariances/correlations to be restricted to deliver a more parsimonious model [24].
Nezu et al. [29] evaluated the CES-D's clinical utility as "limited" and appraised its research applicability as "high" because the CES-D in all its versions (20, 11, and 10 items) evidenced high sensitivity to changes in depressive symptoms over time. A disadvantage of the CES-D concerns its potential difficulty for the elderly or individuals with cognitive impairments, who may find the response format confusing [20]. Also, its low specificity across the different cut-off scores for major depression has led to recommendations that it is more suitable as a screening tool than as a diagnostic tool [30].
Several studies in Korea have performed EFA and CFA on the Korean versions of the CES-D 20, CES-D 11, and CES-D 10 to examine the factor structure of this measure. The following results have been reported consistently: (1) the CES-D demonstrated high internal consistency [23,31,32]; (2) EFA revealed a three-factor structure [32]; and (3) CFA-based comparisons of single-factor, two-factor, and three-factor models showed that the single-factor and two-factor models offered a perfect fit to the data [23,32]. Studies conducted by Heo et al. [31], Lee [23], and Park et al. [32] have confirmed the psychometric properties of the Korean versions of the CES-D, but at least three limitations currently prevent its deployment in Korean universities and research projects. First, Lee's [23] investigation included transnational mothers known as "geese," whereas Park et al.'s [32] study encompassed parents of individuals with cerebral palsy; the international student sample of our study may differ qualitatively in many respects from samples that comprise parents. Second, the mean age of Lee's [23] parent-based sample was 45.9 years (age range 35-62 years); although typical of questionnaire validation studies, this span was much wider than the age range of the international students who participated in our study, who were quite young, with a mean age of 20.9 years. Third, Heo et al. [31] used the CES-D 20 to assess depressive symptoms in adolescents aged between 13 and 16; their sample comprised middle school students in grades 1-3, and the longer version of the measure was shown to be an effective tool for screening depression in adolescents [31]. Taken together, these limitations indicate that previous studies do not sufficiently validate the applicability of the CES-D to Chinese international students studying in Korea.
Therefore, a more theoretically sound version of the questionnaire administered to a more representative Chinese student sample is necessary and could provide more reliable data. In fact, extant research has demonstrated that the CES-D 10 is applicable both to patients in clinical settings and to healthy adults in the wider community [1,22-28].
Moreover, universal school-based screening of young people would allow the identification of students who are currently facing or are at risk of experiencing internalizing problems associated with depression and would thus facilitate prevention [33]. We thus believe that it is necessary to examine the factor structure of the CES-D 10 as a first step toward evaluating its potential usefulness as a universal screening tool for Chinese international students at university. Such an examination would also inform the best use of this screening tool. Although the CES-D 10's factor structure and reliability have been validated in various settings, its psychometric properties have not been evaluated with Chinese students studying in Korea despite that population's rapid growth. Given the high prevalence of depression among Chinese university students, an easily accessible and highly reliable screening instrument for depressive symptoms is increasingly needed. Therefore, this study's purposes were (1) to examine the CES-D 10's factor structure by using CFA and a range of fit indices to test competing models suggested by the literature and determine which provides the most parsimonious fit to the data; and (2) to examine the scale's reliability (i.e., internal consistency, mean inter-item correlation, and test-retest reliability).

METHODS

Participants
Participants were 250 (39 males, 211 females) Chinese international students enrolled at a four-year university in Korea's central region. Their courses of study included beauty design management, education, media design and video, and social and child welfare. Participants' ages ranged from 18 to 26 (M = 20.9, SD = 1.88), and 84% were female. For male students, the mean age was 20.5 years (SD = 1.43), and for female students, 20.9 (SD = 1.95).

Procedures
The study was conducted during the 2019-2020 academic year. An institutional review board reviewed and approved the study protocol (Protocol Code: 1041549-200407-SB-90).
After receiving ethical approval, the Principal Investigator (PI) arranged with academic instructors from four disciplines for students to complete questionnaires during scheduled class time. Students from each discipline were invited to participate and completed either a paper-and-pencil or an online version of the survey. The paper-and-pencil survey was distributed during in-person classes on campus in the 2019 academic year. During the last 20 minutes of class sessions, the PI administered the survey by first informing students about its content and purpose, assuring them of confidentiality, and providing instructions, and then collecting the consent forms and completed surveys. In the 2020 academic year, the study's co-author administered the survey online through Google Forms by posting an invitation thread in discussion forums and sending a message to all potential participants from the four disciplines. To increase survey participation, potential respondents were sent an email reminder 2 weeks after the initial survey distribution, and data were collected 2 weeks later. Respondents were informed that participation was voluntary and that return of the completed questionnaire was considered informed consent. The anonymous online survey version took approximately 10-15 minutes to complete. The CES-D 10 was administered in Chinese, the preferred language of the participating students.

Measure
The Center for Epidemiological Studies Depression 10 Scale (CES-D 10) [21] is a brief self-report screening scale designed to measure depressive symptoms during the past week. Participants indicate the degree to which they have experienced each of ten symptoms, using a four-point Likert scale ranging from 0 "rarely or none of the time" to 3 "all of the time." A total score is calculated by summing all items after reversing scores on the two positively worded items. Because each of the ten items is scored from 0 to 3, total scores range from 0 to 30, with a score of 10 or higher as the cutoff for clinically significant depressive symptoms.
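The scoring rule described above can be illustrated with a minimal sketch. The function and constant names are hypothetical, and the sketch assumes that responses are supplied in item order (items 1 to 10) and that items 5 and 8 are the two positively worded items, as specified in the Statistical Analysis section.

```python
# Minimal scoring sketch; names are illustrative, not from the original instrument.
POSITIVE_ITEMS = (5, 8)   # positively worded items, reverse-scored
CUTOFF = 10               # cutoff for clinically significant symptoms


def score_cesd10(responses):
    """Return the CES-D 10 total for ten responses (each 0-3), ordered item 1..10."""
    if len(responses) != 10:
        raise ValueError("CES-D 10 requires exactly ten item responses")
    if any(r not in (0, 1, 2, 3) for r in responses):
        raise ValueError("each response must be an integer from 0 to 3")
    total = 0
    for item_number, response in enumerate(responses, start=1):
        # Reverse the two positively worded items so higher always means more depressed.
        total += (3 - response) if item_number in POSITIVE_ITEMS else response
    return total


def screens_positive(total, cutoff=CUTOFF):
    """True when the total reaches the cutoff."""
    return total >= cutoff


example = [1, 2, 0, 1, 3, 1, 0, 2, 1, 1]   # made-up responses
example_total = score_cesd10(example)
print(example_total, screens_positive(example_total))  # 8 False
```

With this rule the lowest possible total is 0 and the highest is 30.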
The Chinese version of the CES-D 10 used in this study was adopted from the translation of the China Health and Retirement Longitudinal Study (CHARLS), a population-based survey conducted by the National School of Development at Peking University. The Chinese version of the CES-D 10 has been validated and widely used for screening depression in adult Chinese populations [34,35]. In addition, to ensure translation accuracy, a bilingual teaching assistant confirmed the instrument's translational and conceptual equivalence.

Statistical Analysis
IBM SPSS Statistics for Windows, Version 23.0 (IBM Corp., Armonk, NY, USA) and AMOS 20.0 (IBM Corp.) [36] were used for statistical analyses. Although there are no generalized guidelines on sample size for CFA, Boomsma claimed that a minimum of 100-200 is required [37]. Moreover, Muthén and Muthén suggested that a reasonable sample for a simple CFA model is about 150 [38]. Thus, this study's sample of 250 should be considered adequate, especially in view of the relative simplicity of the investigated models (e.g., 20 free parameters for the one-factor solution with 10 items). The assumption of normality was assessed at both the univariate and multivariate levels. Skewness and kurtosis values for all study variables were well within acceptable ranges (i.e., ±3.00). Multivariate normality was checked using Mardia's test, which indicated that the data violated the assumption of multivariate normality (Mardia's normalized coefficient = 105.21). Given this violation, CFA was performed on the covariance matrices using the robust maximum likelihood estimation procedure to test the latent structure of the competing models of the CES-D 10. Three competing models were tested, and the resulting fit indices were compared to determine which model best fit the data. The first model posited all ten CES-D 10 items loading onto a single factor. The second was a two-factor model in which the two positively worded items (5 and 8) loaded onto a "positive affect" (PA) factor, while the other eight items loaded onto a "depressive affect/somatic retardation" (DA/SR) factor. The third model loaded the 10 items onto three specific factors: DA (items 1, 3, 6, and 9), SR (items 2, 4, 7, and 10), and PA (items 5 and 8).
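The parameter count mentioned above can be reproduced with a short sketch. It assumes a standard specification that the paper does not spell out: each item loads on exactly one factor, factor variances are fixed to 1 for identification, residuals are uncorrelated, and all factors are allowed to covary. The function name is hypothetical.

```python
def cfa_free_parameters_and_df(n_items, n_factors):
    """Count free parameters and degrees of freedom for a simple-structure CFA.

    Assumes factor variances fixed to 1, no cross-loadings, and no
    correlated residuals (an illustrative specification).
    """
    # Unique variances and covariances among the observed items.
    sample_moments = n_items * (n_items + 1) // 2
    loadings = n_items                  # one loading per item
    residual_variances = n_items        # one residual variance per item
    factor_covariances = n_factors * (n_factors - 1) // 2
    free_parameters = loadings + residual_variances + factor_covariances
    return free_parameters, sample_moments - free_parameters


print(cfa_free_parameters_and_df(10, 1))  # (20, 35)
```

Under these assumptions the one-factor model has 20 free parameters and 35 degrees of freedom. Models that free additional parameters, such as correlated residuals, have correspondingly fewer degrees of freedom than this count gives.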
The three models' fit was assessed by reference to a number of indices, including chi-square (χ²) and its ratio to the degrees of freedom (χ²/df); comparative fit index (CFI); goodness-of-fit index (GFI); root mean square error of approximation (RMSEA) and its 90% confidence interval (90% CI); and standardized root mean square residual (SRMR). A χ²/df value of less than 5 indicates reasonable fit [39]. CFI and GFI values equal to or greater than 0.90 generally indicate models with acceptable fit [40]. RMSEA values of less than 0.05 suggest excellent fit, whereas values from 0.05 to 0.08 indicate reasonable fit [41,42]. SRMR values below 0.08 indicate good model fit, with smaller values indicating better fit [40]. The Akaike Information Criterion (AIC) was also used to compare the fit of the models; lower AIC values reflect a better fit. In addition, chi-square difference tests were used to determine whether the models differed significantly. After examining goodness-of-fit for the competing CES-D 10 models, the best-fitting model was used for subsequent analyses. Cronbach's alpha and the Pearson correlation coefficient were used to assess internal consistency and test-retest reliability, respectively. Test-retest reliability was assessed at a 4-week interval with a subsample of 20 students.

RESULTS
Table 1 provides goodness-of-fit indices for the three models tested, and Table 2 presents the standardized factor loadings for the CES-D 10. As Table 1 shows, the one-factor model yielded the worst fit of the three models (χ² = 249.0, df = 35; χ²/df = 7.1; CFI = 0.77; GFI = 0.85; RMSEA = 0.157 (90% CI = 0.139-0.175); SRMR = 0.111). All fit indices were inadequate: CFI and GFI fell below the cutoff of 0.90, the RMSEA was above the suggested cutoff of 0.08, and the SRMR was also above the recommended cutoff of 0.08.
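The fit criteria described above can be sketched as a small checker and applied to the reported one-factor results. This is an illustrative sketch only: the function names are hypothetical, the RMSEA point estimate uses the common single-group formula with N − 1 in the denominator, and the AIC is assumed to take the form χ² + 2q (q = number of free parameters), which is consistent with the values reported but is not stated in the text.

```python
import math


def chi2_df_ratio(chi2, df):
    return chi2 / df


def rmsea(chi2, df, n):
    """RMSEA point estimate, assuming the single-group formula with N - 1."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))


def aic(chi2, n_free_parameters):
    """AIC in the chi-square + 2q form (an assumption about the software's definition)."""
    return chi2 + 2 * n_free_parameters


def evaluate_fit(chi2, df, n, cfi, gfi, srmr):
    """Compare each index with the cutoffs cited in the text."""
    return {
        "chi2/df < 5": chi2_df_ratio(chi2, df) < 5,
        "CFI >= 0.90": cfi >= 0.90,
        "GFI >= 0.90": gfi >= 0.90,
        "RMSEA <= 0.08": rmsea(chi2, df, n) <= 0.08,
        "SRMR < 0.08": srmr < 0.08,
    }


# Reported one-factor results (N = 250).
one_factor = evaluate_fit(chi2=249.0, df=35, n=250, cfi=0.77, gfi=0.85, srmr=0.111)
print(round(rmsea(249.0, 35, 250), 3))  # 0.157
print(one_factor)                       # every criterion is False
```

For the one-factor model the sketch reproduces the reported RMSEA of 0.157 and flags every criterion as unmet.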
In contrast, the two-factor model fit the data better than the one-factor model, as evidenced by a decreased chi-square value and improved CFI, GFI, RMSEA, and SRMR; thus, all fit indices could qualify as satisfactory (χ² = 99.0, df = 27; χ²/df = 3.7; CFI = 0.92; GFI = 0.97; RMSEA = 0.079 (90% CI = 0.072-0.084); SRMR = 0.071). Finally, the three-factor model fit the data poorly because the overall fit indices did not approach an acceptable level (χ² = 136.3, df = 32; χ²/df = 4.3; CFI = 0.89; GFI = 0.92; RMSEA = 0.114 (90% CI = 0.095-0.134); SRMR = 0.075). More precisely, although the χ²/df, GFI, and SRMR values were acceptable, the CFI and RMSEA failed to reach the suggested cutoff values, indicating poorer fit overall. The AIC statistics further confirm the superior fit of the two-factor model, whose AIC of 155.01 is lower than those of the one-factor (AIC = 289.01) and three-factor models (AIC = 182.03). Additionally, the chi-square difference test revealed that the two-factor model provided a better fit than the one-factor (Δχ²(8) = 150.0, p < 0.001) and three-factor models (Δχ²(6) = 32.1, p < 0.001). Therefore, the best results for all goodness-of-fit indices were obtained with the two-factor model (Fig. 1). However, despite support for the two-factor model from the goodness-of-fit indices, the PA items' factor loadings were disproportionate: item 5 had a low loading, whereas item 8 had a very high loading (0.29 vs. 0.84). The remaining items' loadings ranged from 0.53 to 0.76 (Table 2). Moreover, the correlation between the DA/SR and PA factors was low (0.28).
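The chi-square difference test reported above can be illustrated with a standard-library sketch, applied here to the one-factor versus two-factor comparison. It assumes ordinary (unscaled) chi-square values; when a robust, scaled chi-square is used, a scaled difference test is normally required instead, and the paper does not state which was applied. The function names are hypothetical, and the tail probability is computed from the closed-form series for integer degrees of freedom.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df >= 1."""
    if df < 1 or x < 0:
        raise ValueError("df must be >= 1 and x must be non-negative")
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{j < df/2} (x/2)^j / j!
        half = x / 2.0
        term, total = 1.0, 1.0
        for j in range(1, df // 2):
            term *= half / j
            total += term
        return math.exp(-half) * total
    # Odd df: two-sided normal tail plus a finite series in sqrt(x).
    root = math.sqrt(x)
    normal_tail = math.erfc(root / math.sqrt(2.0))
    density = math.exp(-x / 2.0) / math.sqrt(2.0 * math.pi)
    term, total = root, 0.0
    for r in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * r + 1)
    return normal_tail + 2.0 * density * total


def chi2_difference(chi2_restricted, df_restricted, chi2_general, df_general):
    """Return (delta chi-square, delta df, p) for two nested models."""
    delta = chi2_restricted - chi2_general
    delta_df = df_restricted - df_general
    return delta, delta_df, chi2_sf(delta, delta_df)


delta, delta_df, p = chi2_difference(249.0, 35, 99.0, 27)
print(delta, delta_df, p < 0.001)  # 150.0 8 True
```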

Reliability
Internal consistency measured using Cronbach's alpha was 0.80, 0.86, and 0.65 for the total CES-D 10, the DA/SR factor, and the PA factor, respectively. For the overall CES-D 10, item-wise deletion showed that Cronbach's alpha did not change substantially with any item's exclusion. Based on the criterion of a corrected item-total correlation above 0.30 as acceptable, all items except 5 and 8 were in a suitable range (0.30-0.65), whereas the item-total correlations for items 5 and 8 failed to meet the criterion (0.15 and 0.24, respectively). Test-retest correlations over a 4-week interval were r = 0.82 (p < 0.01) for total CES-D 10 scores, r = 0.74 (p < 0.01) for the DA/SR factor, and r = 0.70 (p < 0.01) for the PA factor, indicating adequate stability. Table 3 displays means and standard deviations for the CES-D 10 items; the overall mean score was 10.78 (SD = 4.93). Based on a cutoff score of ≥ 10, the level of depressive symptoms in the entire sample can be considered slightly elevated. Mean scores were also computed for the two factors: M = 7.38 (SD = 4.75) for DA/SR and M = 3.39 (SD = 1.84) for PA.
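For readers who want to reproduce this kind of internal consistency estimate, the following minimal sketch computes Cronbach's alpha from raw item scores using the usual variance-based formula. The function name and the small data set are hypothetical; the study data are not reproduced here, and the sketch assumes that positively worded items have already been reverse-scored.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondents, each a list of item scores."""
    n_items = len(rows[0])
    if n_items < 2 or len(rows) < 2:
        raise ValueError("need at least two items and two respondents")
    # Sample variance of each item across respondents.
    item_variances = [variance([row[i] for row in rows]) for i in range(n_items)]
    # Sample variance of the summed scale score.
    total_variance = variance([sum(row) for row in rows])
    return (n_items / (n_items - 1)) * (1 - sum(item_variances) / total_variance)


# Made-up responses from five respondents to three items (0-3 scale).
toy_data = [
    [0, 1, 0],
    [1, 1, 2],
    [2, 2, 1],
    [3, 2, 3],
    [1, 0, 1],
]
print(round(cronbach_alpha(toy_data), 2))
```

Alpha equals 1 when all items are perfectly consistent with one another and 0 when the items are uncorrelated.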

DISCUSSION
To the best of our knowledge, the present study represents the first evaluation of the CES-D 10's psychometric properties among Chinese international students in Korea. The study found that the two-factor model, which divided items into DA/SR and PA constructs, fit the data better than both the one- and three-factor solutions. This is consistent with previous studies in which the two-factor model provided the best fit among the models tested [23-25]. While the internal consistencies of the total CES-D 10 and the DA/SR factor were similarly high, the PA factor's internal consistency was relatively low, a result congruent with previous studies [24,25]. This might be because the two items in the PA factor may not be closely related to each other or to DA/SR [25] and because scales with few items tend to have lower reliability, as the number of items strongly affects alpha. The test-retest reliability of total CES-D 10 scores, as well as of the DA/SR and PA factor scores, was adequate.
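The dependence of alpha on the number of items can be made concrete with a short sketch. It uses the standardized form of alpha, k·r̄ / (1 + (k − 1)·r̄), where k is the number of items and r̄ the mean inter-item correlation; this form assumes equal item variances and is an illustrative simplification, not the computation used in the study. The function names are hypothetical.

```python
def standardized_alpha(n_items, mean_r):
    """Standardized alpha for n_items with mean inter-item correlation mean_r."""
    return n_items * mean_r / (1 + (n_items - 1) * mean_r)


def implied_mean_r(n_items, alpha):
    """Mean inter-item correlation implied by a given standardized alpha."""
    return alpha / (n_items - alpha * (n_items - 1))


# Two-item PA factor with the reported alpha of 0.65.
r_pa = implied_mean_r(2, 0.65)
print(round(r_pa, 2))                         # 0.48
# The same mean correlation spread over eight items.
print(round(standardized_alpha(8, r_pa), 2))  # 0.88
```

Under these assumptions, a mean inter-item correlation that yields an alpha of 0.65 for two items would yield an alpha of about 0.88 for eight items, which shows how strongly scale length alone can lower alpha for a two-item factor.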
Despite support for the two-factor model, its fit should be interpreted with caution. The correlation between the DA/SR and PA factors was very low, and its magnitude was similar to those in other studies that supported a two-factor model [23-25]. As suggested by Schroevers et al. [43], a possible reason is that the two PA items may measure other constructs in addition to depressive symptomatology, thereby resulting in a weak correlation. Specifically, these two items are positively worded ("I was happy" and "I felt hopeful about the future"); thus, higher raw scores on them indicate less depression. Bradley et al. [25] proposed that a better fit could be achieved if the CES-D scale was used with additional items or if both PA items were eliminated. However, excluding items may not guarantee increased reliability and may also render the newly validated scale incomparable to other published versions. The 10-item CES-D is frequently used in general populations across different countries despite potential weaknesses that may emerge from the retention of every item. It has demonstrated sound psychometric properties, and maintaining the same version is meaningful for comparative purposes.
Another explanation is the moderation of emotions in East Asian cultures such as Korea and China, whose people often value modesty and self-effacement as cultural virtues [44,45]. Iwata et al. [46] suggested that the two PA items could not distinguish non-depressed from depressed Japanese women because their culture embraces emotional moderation, so they might suppress the expression of PA. A Korean sample also evidenced such moderated self-reporting; respondents were less likely to endorse PA items, which led to high total CES-D scores [47].
In view of these possible reporting patterns among Asians, caution should be exercised when the CES-D 10 is applied to Asian samples, including ours, considering that prevalence rates might not represent all individuals' true psychological health [24]. Perhaps the easiest way to address self-reporting bias and conform more closely to Asian cultures is to modify positively worded items into negatively worded items. For example, "hopeful" and "happy" items should be modified to "not hopeful" and "not happy." In this way, the items might be more theoretically sound and more culturally relevant for Asian research samples [24].
The two PA items' disproportionate factor loadings were consistent with previous studies [23-25]. As suggested by Bradley et al. [25], a possible reason is that item 8 ("happy") differs from item 5 ("hopeful"). Although happiness and hopefulness both have PA qualities that can occur as states, they are nevertheless distinct, and these items should therefore not be grouped together. Bradley et al. [25] also suggested that this PA factor might need additional items or that the items need modification to produce a better-fitting model.
This study is not without limitations. First, our sample size was relatively small and did not allow us to test more complex, higher-order models; our attempt to test a bifactor model led to an inadmissible solution. Future research employing a larger, more diverse sample would provide more support for the CES-D 10's factor structure and allow investigators to test such models. Second, this study recruited only undergraduate students, so its findings might not generalize to other age groups or populations. Third, generalizability might also be limited by the sample's large proportion of female students. Because the CES-D 10's structure may differ for males and females, models should be assessed by gender. For example, the two-factor structure has been validated in both female and male elderly Chinese samples, and its structure varied across those groups [24]. We did not attempt this procedure because of the relatively small number of male students. Finally, no convergent measure was included. Future research should therefore assess the convergent validity of the CES-D 10 by examining its associations with measures of similar constructs such as the PHQ-9 and BDI.

CONCLUSION
Despite these limitations, our findings can potentially inform international student advisors, health care professionals, and researchers in the field of depression. Understanding how the CES-D 10 functions among Chinese international students, a minority group in Korea, and knowing that it is a valid measure for them might increase the accuracy of its administration and interpretation. In turn, proper use of the CES-D 10 might improve health care professionals' understanding of depressive symptoms among Chinese international students studying in Korea. In conclusion, our findings suggest that the two-factor CES-D 10 model has satisfactory psychometric properties and can be used for assessing depressive symptoms in Chinese university students.