All published articles of this journal are available on ScienceDirect.
Factor Structure and Validity of the Korean Version of the Patient Health Questionnaire-9 Among Early Childhood Teachers
Abstract
Background:
Depression is prevalent among teachers, particularly those in early childhood education. Accurate assessment of their depressive symptoms is therefore important in both research and practice, and the nine-item Patient Health Questionnaire (PHQ-9) has shown considerable promise for depression screening and diagnosis. Although the PHQ-9 has been widely employed in both clinical and nonclinical settings, its validity among early childhood teachers in Korea has not been established, and its dimensionality remains controversial. This study therefore aimed to provide data on the factorial structure and psychometric properties of the Korean version of the PHQ-9 and to investigate associations between the PHQ-9 and a corresponding psychiatric instrument, the Beck Depression Inventory-II (BDI-II).
Methods:
For this study, 252 early childhood teachers completed both the PHQ-9’s Korean version and the BDI-II. Confirmatory factor analysis was used to compare goodness-of-fit for four distinct factor models suggested by extant literature. Convergent validity was assessed by examining correlations between the PHQ-9 and the BDI-II.
Results:
A two-factor model with three items labeled “somatic” and six labeled “affective” provided the best fit. The scale’s convergent validity was supported by significant correlations with theoretically related measures, and its internal consistency was adequate.
Conclusion:
Overall, the results suggest that the Korean version of the PHQ-9 is best conceptualized as a multidimensional measure of depression and confirm the PHQ-9 as a reliable assessment of depression among Korean early childhood teachers.
1. INTRODUCTION
Depression is one of the most common mental health problems worldwide. Projections indicate that, by 2030, depression will be the leading cause of disease burden globally [1]. In South Korea (Korea), major depressive disorder constitutes a critical public health concern. Various Korean populations across diverse occupations suffer from depression [2-4], and, compared with other professions, teaching is identified as high-stress [5]. In particular, teachers in early childhood education are prone to stress that results in poor mental health because teaching young children is physically and emotionally demanding [6]. These teachers’ most common sources of stress include high perceived workload, time constraints, problems with child behavior, relationships with coworkers, and insufficient resources [7, 8]. Early childhood education teachers also face additional stressors such as frequent dealings with parents and other non-teaching duties [9, 10]. In Korea specifically, the public remains unaware of these teachers’ professionalism, so social recognition of their status is low, as are their salaries [11]. Such stressors make them more vulnerable to mental disorders, negatively affecting not only the teachers themselves but also instructional quality and children’s social and emotional development [12]. Moreover, a Korean study of 1634 early childhood education teachers found that 40.6% reported mental illness such as depression [13]. Given the growing prevalence of depressive symptoms among these professional educators, accurate assessment is of great importance.
One promising depression-screening tool is the Patient Health Questionnaire-9 (PHQ-9) [14], a nine-item self-report measure based on diagnostic criteria of major depressive episodes from the Diagnostic and Statistical Manual of Mental Disorders [15]. The PHQ-9’s validity and reliability have been extensively evaluated with various populations, including psychiatric patients [16], primary care patients [14], patients with spinal cord injury [17, 18], substance abusers [19], elderly individuals [20], university students [21, 22], and adolescents [23]. As for its underlying factor structure, however, previous studies have provided mixed results. While some studies found evidence for a one-factor model reflecting the depression construct’s unidimensionality [14, 16, 19, 21-23], others supported slightly different two-factor models [17, 18, 24]. The two-factor models have in common that one factor is represented by somatic symptoms and the other by affective symptoms.
Although the PHQ-9’s psychometric properties have been examined among Korean populations, including psychiatric patients with migraine and university students [22, 25], a number of issues need further exploration. First, no consensus has been reached on the PHQ-9’s factor structure. Second, no studies have examined its factor structure with early childhood education teachers, for whom valid measurement of this construct is especially important. Third, psychometric findings from psychiatric patients with severe migraine and from university students cannot be assumed to transfer to teachers who nurture and instruct young children. Thus, this study’s purposes were (1) to identify the best-fitting factor structure of the Korean version of the PHQ-9 by using Confirmatory Factor Analysis (CFA) to test four models suggested by the literature and (2) to assess the PHQ-9’s convergent validity with a measure reflecting a similar construct. The PHQ-9 and the BDI-II are commonly used to diagnose depressive disorders and are considered reliable for measuring the severity of depressive symptoms. However, no study has examined the convergent validity of these two measures among Korean early childhood education teachers.
2. METHODS
2.1. Participants and Procedures
A pilot study was conducted with four teachers from two childcare centers who agreed to participate. Pilot participants were recruited through the principal investigator’s personal contacts. The researcher first visited the childcare centers in person to explain the study’s purpose and how to respond to the questionnaire’s rating scales. Responses were collected using an online survey tool provided by Google. Items were added to the existing questionnaire asking about the time required to respond, whether any questions were difficult to understand or ambiguous in meaning, and whether using the system was inconvenient. The pilot showed that the questions were not difficult to understand and that completion took approximately 20 minutes. In addition, because pilot participants wanted to know how much of the online survey remained, the system was redesigned to display survey progress.
After the pilot study, participants were recruited through online postings on the Korean national early childhood teachers’ community website. Only registered teachers whose accounts have been verified can access the website. Postings described the study’s nature and purpose and directed those interested to an online-survey link to complete the questionnaire. Participants were informed that data would be treated confidentially and anonymously. They were also informed that the return of the completed questionnaire was considered as informed consent.
The sample consisted of 252 early childhood teachers (243 females, 9 males), with a mean age of 33.5 years (SD = 10.3, range = 21–59). On average, participants had 6.32 years of teaching experience (SD = 6.02, range 1 month–35 years) and worked at various types of childcare centers: onsite (49.6%), private (21.0%), public (11.9%), corporate (6.7%), domestic (5.6%), and others (5.2%).
2.2. Measures
A self-report questionnaire measuring depression severity, the Patient Health Questionnaire-9 (PHQ-9), consists of nine items scored on a 4-point Likert scale from 0 to 3, resulting in a total score of 0 to 27, with higher scores reflecting more severe depressive symptoms. This study used the PHQ-9’s Korean version, which has been verified and validated in Korean [26].
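The scoring rule described above can be sketched in a few lines of Python. The function names are hypothetical, and the somatic/affective split assumes that items 3, 4, and 5 (sleep, fatigue, appetite) form the three-item somatic factor; the article does not list item assignments, so that mapping is an illustrative assumption only.

```python
from typing import Dict, Sequence

# ASSUMPTION (not stated in the article): the three-item somatic factor
# consists of items 3, 4, and 5 (1-based item numbers).
SOMATIC_ITEMS = (3, 4, 5)


def phq9_total(responses: Sequence[int]) -> int:
    """Sum nine item responses (each scored 0-3) into a total of 0-27."""
    if len(responses) != 9:
        raise ValueError("PHQ-9 requires exactly nine item responses")
    if any(r not in (0, 1, 2, 3) for r in responses):
        raise ValueError("each item must be scored 0, 1, 2, or 3")
    return sum(responses)


def phq9_subscales(responses: Sequence[int]) -> Dict[str, int]:
    """Return somatic, affective, and total scores under the assumed split."""
    total = phq9_total(responses)
    somatic = sum(responses[i - 1] for i in SOMATIC_ITEMS)
    return {"somatic": somatic, "affective": total - somatic, "total": total}


if __name__ == "__main__":
    # Made-up responses for one hypothetical respondent.
    print(phq9_subscales([1, 1, 2, 2, 2, 0, 0, 0, 0]))
```

Higher totals indicate more severe depressive symptoms; the subscale scores are only meaningful if the assumed item assignment matches the model actually fitted.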
The Beck Depression Inventory-II (BDI-II) is a 21-item self-report scale that measures symptoms of depression. Each item is rated on a 4-point Likert scale (0–3), yielding a total summed score of 0 to 63, with higher scores implying greater depressive severity [27]. The BDI-II’s Korean version has been well validated in Korean adults and adolescents [28, 29]. In the current study, the BDI-II’s Cronbach’s alpha coefficient was 0.80.
2.3. Statistical Analysis
Statistical analysis was performed with IBM SPSS Statistics for Windows (version 23) and SPSS AMOS (version 20) (IBM Corp., Armonk, NY, USA). CFA with maximum likelihood estimation was used to assess the PHQ-9’s different latent structures. Four plausible alternative PHQ-9 models drawn from the literature were tested. Specifically, Model 1 tested the original one-factor model hypothesized by Kroenke et al. [14], which posits that all items load on a single factor with uncorrelated error variances. Model 2a tested Richardson and Richards’ [18] two-factor model, in which the two latent variables are a somatic factor (5 items) and an affective factor (4 items). Model 2b tested Krause et al.’s [17] two-factor model, in which three items load on the somatic factor and six items on the affective factor. Finally, Model 2c, reported by Krause et al. [24], also proposed two factors, namely somatic (6 items) and affective (3 items).
Each model’s appropriateness was assessed using several fit indices, including the chi-square (χ²) and its ratio to the number of degrees of freedom (χ²/df); comparative fit index (CFI); goodness-of-fit index (GFI); root mean square error of approximation (RMSEA) and its 90% confidence interval (90% CI); and the standardized root mean square residual (SRMR). A model was considered to have good fit if χ²/df < 3 [30], CFI ≥ 0.90 [31], GFI > 0.95 [32], RMSEA ≤ 0.06 [33], and SRMR < 0.08 [32, 33]. Chi-square difference tests were used to determine whether the models differed significantly. The PHQ-9 and BDI-II’s convergent validity was assessed using Pearson’s correlation, and the selected model’s internal consistency was calculated via Cronbach’s alpha (Fig. 1).

Table 1. Goodness-of-fit indices for the four CFA models of the PHQ-9.
Model | k | χ² | df | χ²/df | CFI | GFI | RMSEA (90% CI) | SRMR |
Model 1 | 9 | 88.0* | 28 | 3.1 | 0.88 | 0.95 | 0.095 (0.073-0.117) | 0.065 |
Model 2a | 9 | 80.9* | 27 | 3.0 | 0.90 | 0.95 | 0.092 (0.069-0.115) | 0.063 |
Model 2b | 9 | 44.3* | 26 | 1.7 | 0.97 | 0.98 | 0.053 (0.024-0.079) | 0.044 |
Model 2c | 9 | 87.8* | 27 | 3.3 | 0.88 | 0.95 | 0.097 (0.075-0.120) | 0.065 |
k = number of items. *p < 0.01.
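As a rough illustration of how the stated cut-offs apply to the reported indices, the sketch below encodes the five criteria from Section 2.3 and evaluates each model using the values in Table 1. The class and function names are hypothetical, and the χ²/df ratio is recomputed from the rounded χ² and df values, so borderline cases may differ slightly from what the original software reported.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class FitIndices:
    chi2: float
    df: int
    cfi: float
    gfi: float
    rmsea: float
    srmr: float


def check_fit(fit: FitIndices) -> Dict[str, bool]:
    """Apply the good-fit criteria listed in Section 2.3 to one model."""
    return {
        "chi2/df < 3": fit.chi2 / fit.df < 3,
        "CFI >= 0.90": fit.cfi >= 0.90,
        "GFI > 0.95": fit.gfi > 0.95,
        "RMSEA <= 0.06": fit.rmsea <= 0.06,
        "SRMR < 0.08": fit.srmr < 0.08,
    }


# Values transcribed from Table 1 (rounded as published).
TABLE_1 = {
    "Model 1": FitIndices(88.0, 28, 0.88, 0.95, 0.095, 0.065),
    "Model 2a": FitIndices(80.9, 27, 0.90, 0.95, 0.092, 0.063),
    "Model 2b": FitIndices(44.3, 26, 0.97, 0.98, 0.053, 0.044),
    "Model 2c": FitIndices(87.8, 27, 0.88, 0.95, 0.097, 0.065),
}

if __name__ == "__main__":
    for name, fit in TABLE_1.items():
        failed = [rule for rule, ok in check_fit(fit).items() if not ok]
        status = "meets all criteria" if not failed else "fails: " + ", ".join(failed)
        print(f"{name}: {status}")
```

Because the published values are rounded to two or three decimals, an index sitting exactly on a threshold (for example, GFI = 0.95 against a strict GFI > 0.95 rule) should be read with caution.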
3. RESULTS
3.1. Descriptive Statistics
The mean PHQ-9 score was 14.20 (SD = 6.4). Because this mean exceeds cut-off scores of 8 to 11, participants could, on average, be classified as cases. Items with the highest mean scores were 3, 4, and 5; of these, item 4 (“Fatigue”) had the highest score.
3.2. Factor Structure
The CFA models’ goodness-of-fit indices are presented in Table 1. The one-factor model (Model 1) provided inadequate data fit, with CFI and RMSEA values outside recommended cut-offs; GFI and SRMR values only marginally indicated good fit (χ² = 88.0, df = 28; χ²/df = 3.1; CFI = 0.88; GFI = 0.95; RMSEA = 0.095 (90% CI = 0.073-0.117); SRMR = 0.065). Model 2a fit the data slightly better than the one-factor model, as evidenced by a decreased chi-square value and improved CFI, RMSEA, and SRMR values (χ² = 80.9, df = 27; χ²/df = 3.0; CFI = 0.90; GFI = 0.95; RMSEA = 0.092 (90% CI = 0.069-0.115); SRMR = 0.063). In contrast, Model 2b provided highly acceptable fit to the data, with CFI, GFI, RMSEA, and SRMR statistics all in good-to-excellent ranges (χ² = 44.3, df = 26; χ²/df = 1.7; CFI = 0.97; GFI = 0.98; RMSEA = 0.053 (90% CI = 0.024-0.079); SRMR = 0.044). Last, Model 2c yielded almost identical fit indices to Model 1 but provided the worst data fit of all models (χ² = 87.8, df = 27; χ²/df = 3.3; CFI = 0.88; GFI = 0.95; RMSEA = 0.097 (90% CI = 0.075-0.120); SRMR = 0.065).
Table 2. Standardized factor loadings of the PHQ-9 items.
PHQ-9 item | Factor loading |
1. Anhedonia | 0.63 |
2. Depressed mood | 0.65 |
3. Sleep difficulties | 0.69 |
4. Fatigue | 0.68 |
5. Appetite changes | 0.65 |
6. Feeling of worthlessness | 0.58 |
7. Concentration difficulties | 0.66 |
8. Psychomotor agitation/retardation | 0.60 |
9. Thoughts of death | 0.43 |
Notably, according to chi-square difference tests, Model 2b showed a significant reduction in χ² relative to Model 1 (Δχ²(2) = 43.7, p < 0.001), Model 2a (Δχ²(1) = 36.6, p < 0.001), and Model 2c (Δχ²(1) = 43.5, p < 0.001). Therefore, Model 2b was deemed the best fit to the data. Each item loaded significantly on its predicted factor, and all standardized factor loadings exceeded 0.40 (Table 2). The correlation between the two factors was 0.51.
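The arithmetic behind the chi-square difference tests can be reproduced with the standard library alone. The sketch below is illustrative rather than the authors' procedure: `chi2_sf` and `chi2_difference` are hypothetical helper names, the upper-tail probability uses the closed-form series for integer degrees of freedom, and the test is applied in the usual way to the χ² and df values reported in Table 1.

```python
import math
from typing import Tuple


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X > x) for a chi-square with integer df."""
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    y = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-y) * sum_{k=0}^{df/2 - 1} y^k / k!
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= y / k
            total += term
        return math.exp(-y) * total
    # Odd df: erfc(sqrt(y)) + exp(-y) * sum_{j=1}^{(df-1)/2} y^(j-1/2) / Gamma(j+1/2)
    total = math.erfc(math.sqrt(y))
    for j in range(1, (df - 1) // 2 + 1):
        total += math.exp(-y) * y ** (j - 0.5) / math.gamma(j + 0.5)
    return total


def chi2_difference(chi2_a: float, df_a: int,
                    chi2_b: float, df_b: int) -> Tuple[float, int, float]:
    """Compare model A (more df) with model B; return (delta chi2, delta df, p)."""
    d_chi2, d_df = chi2_a - chi2_b, df_a - df_b
    if d_df < 1:
        raise ValueError("model A must have more degrees of freedom than model B")
    return d_chi2, d_df, chi2_sf(d_chi2, d_df)


if __name__ == "__main__":
    model_2b = (44.3, 26)
    for name, (chi2, df) in {"Model 1": (88.0, 28),
                             "Model 2a": (80.9, 27),
                             "Model 2c": (87.8, 27)}.items():
        d_chi2, d_df, p = chi2_difference(chi2, df, *model_2b)
        print(f"{name} vs Model 2b: delta chi2({d_df}) = {d_chi2:.1f}, p = {p:.2e}")
```

All three comparisons give p-values far below 0.001, in line with the reported results.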
3.3. Internal Consistency
Cronbach’s alphas for PHQ-9 somatic, affective, and overall scores were 0.72, 0.76, and 0.81, respectively.
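For readers unfamiliar with the statistic, Cronbach's alpha for k items is k/(k − 1) × (1 − Σ item variances / variance of the summed score). The following minimal sketch computes it from a respondents-by-items matrix; the function name and the toy responses are made up for illustration and are not the study's data.

```python
from statistics import variance
from typing import Sequence


def cronbach_alpha(rows: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha for a matrix with one row per respondent, one column per item."""
    if len(rows) < 2:
        raise ValueError("need at least two respondents")
    k = len(rows[0])
    if k < 2 or any(len(r) != k for r in rows):
        raise ValueError("need at least two items and equal-length rows")
    item_variances = sum(variance(col) for col in zip(*rows))
    total_variance = variance([sum(r) for r in rows])
    if total_variance == 0:
        raise ValueError("summed scores have zero variance")
    return k / (k - 1) * (1 - item_variances / total_variance)


if __name__ == "__main__":
    # Hypothetical responses (0-3) of five respondents to three items.
    toy = [[0, 1, 0], [1, 1, 2], [2, 2, 1], [3, 2, 3], [1, 0, 1]]
    print(round(cronbach_alpha(toy), 3))
```

Applied separately to the somatic items, the affective items, and all nine items, this formula is what yields subscale and overall coefficients of the kind reported above.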
3.4. Convergent Validity
Relationships among the two PHQ-9 factors, the total PHQ-9 score, and the BDI-II score were examined with Pearson’s correlation. A moderate correlation was found between total PHQ-9 and BDI-II scores (r = 0.75). The somatic and affective factors both correlated strongly with the PHQ-9 total score (r = 0.82 and r = 0.91, respectively). Moreover, the somatic and affective factors showed similar patterns of correlation with the BDI-II, both moderate (r = 0.62 and r = 0.68, respectively). All correlation p values were less than 0.01 (Table 3).
Table 3. Pearson correlations among PHQ-9 total, PHQ-9 subscale, and BDI-II scores.
Measure | 1 | 2 | 3 | 4 |
1. Total PHQ-9 | - | | | |
2. PHQ-9 Somatic | 0.82* | - | | |
3. PHQ-9 Affective | 0.91* | 0.51* | - | |
4. BDI-II | 0.75* | 0.62* | 0.68* | - |
*p < 0.01.
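The coefficients in Table 3 are ordinary Pearson correlations between pairs of score vectors. A minimal sketch of that computation follows; `pearson_r` is a hypothetical helper, and the paired scores are invented solely to show the call, not taken from the study.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson product-moment correlation between two equal-length samples."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    # Hypothetical PHQ-9 totals and BDI-II totals for six respondents.
    phq9 = [4, 9, 12, 15, 20, 23]
    bdi2 = [6, 14, 13, 25, 30, 41]
    print(round(pearson_r(phq9, bdi2), 2))
```

Convergent validity is supported when scores on the two instruments rise and fall together, that is, when this coefficient is positive and reasonably large.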
4. DISCUSSION
The PHQ-9 has been widely used with various populations in Korea but not with early childhood education teachers. In addition, previous research has reported inconsistent findings on the PHQ-9’s underlying factor structure. This study’s purposes were to evaluate the factor structure and convergent validity of the Korean version of the PHQ-9 among Korean early childhood education teachers. Using CFA to determine which model had the greatest empirical support, the study tested four models from the literature. The results revealed that the two-factor model with three somatic items and six affective items, as identified in patients with spinal cord injury [18, 24], provided the best data fit. Although early childhood teachers might differ qualitatively from patients with spinal cord injuries, the distinct subscales of depression found in both populations may have emerged because somatic symptoms are the norm rather than the exception in each [34, 35]. In fact, examination of the somatic and affective scores showed that the current study’s participants scored higher on average on the somatic subscale.
Although the PHQ-9 was originally developed as a unidimensional structure [14], findings here suggest that the PHQ-9 should be used as a multidimensional depression instrument with early childhood teachers in Korea. Consistent with theory and prior research that has increasingly suggested depression as a multidimensional construct [36, 37], current findings confirm the PHQ-9’s multidimensionality.
As for psychometric properties, the PHQ-9’s Korean version exhibited adequate reliability. The internal consistency values of the scores on the two subscales and the total scores were adequate (α>0.70). In terms of convergent validity, the correlations of the scores on the two subscales and the total scores on the PHQ-9 and BDI-II were satisfactory overall and in line with other studies [38, 39]. This correlation was slightly higher than the Pearson’s correlation coefficient of 0.70 reported by Lim et al. [40] when they compared the same instruments in populations of general Korean adults. Hence, the PHQ-9 scores evidenced convergent validity by their degree of correlation with the measure of depression in the BDI-II. Specifically, the moderate correlation between the PHQ-9 and the BDI-II suggests that the PHQ-9’s constructs constitute a valid depression measure for this population. These findings are compatible with those of previous studies that also found a correlation between PHQ-9 scores and measures of depression. Similar to other reports, the current analyses showed that a two-factor model representing somatic and affective factors provided the best fit to the BDI-II data [19, 41, 42].
Findings should be interpreted in light of several limitations. First, the sample size was relatively small, and most of the participants were female teachers. The primary reason for this gender imbalance is the low number of men in the early childhood education workforce, as childcare has historically been viewed as women’s work. The widespread public belief that women are more nurturing and caring than men has served as a barrier discouraging men from considering a career in early childhood education [43]. Additionally, epidemiological studies have consistently reported that men and women experience depression in different ways due to such factors as biological (gender) differences, gender-based roles, and identity [44, 45]. Therefore, the models should be assessed by gender, as the structures of the PHQ-9 may differ between males and females. For example, one study reported significant inequalities between genders on the PHQ-9 in a sample of Norwegian adolescents [46]. Although women are considerably overrepresented in the teaching profession, especially in early childhood education settings, future evaluations of the PHQ-9 would benefit from larger samples with more male respondents. Next, early childhood teachers are at particularly high risk of poor mental health, and, indeed, study participants’ higher than average scores indicated high levels of depression. The PHQ-9’s structure might differ between samples from the general population and various occupational groups, so these results are not generalizable to the general population or to other professions. As mentioned earlier, teaching, in particular early childhood education, is considered a relatively stressful profession with high turnover rates; moreover, early childhood education teachers are a specific occupational group at high risk of mental health disorders. 
Future studies could compare the PHQ-9’s structure between nonclinical and clinical samples of early childhood teachers to provide a more concrete understanding of their depression symptomatology and better understanding of how the PHQ-9 can be applied within various populations.
Despite these limitations, the current findings have implications for clinical practice. The availability of this brief and valid measure is critical for helping healthcare professionals identify early childhood teachers at high risk of depression. It is also important that clinicians use screening tools that are acceptable and appropriate for the specific population. Clarifying this instrument’s proper use is a fundamental contribution to the prevention, early detection, and treatment of depression.
CONCLUSION
In conclusion, findings from this study indicate that the Korean version of the PHQ-9 is a robust measure of depression among early childhood education teachers. Previous studies have affirmed somatic and affective constructs, and no items have been problematic. A two-factor model, specifically Krause et al.’s two-factor model, provided the best fit to the data, and convergent validity and internal consistency were satisfactory overall. The Korean version of the PHQ-9 can serve as a valid and reliable screening tool to detect depression in early childhood education teachers.
ETHICS APPROVAL AND CONSENT TO PARTICIPATE
The Institutional Review Board at Woosong University in Korea (Protocol Code: 1041549-201006-SB-103) provided the study’s ethical approval.
HUMAN AND ANIMAL RIGHTS
No animals were used in this research. All human research procedures were in accordance with the ethical standards of the committee responsible for human experimentation (institutional and national) and with the Helsinki Declaration of 1975, as revised in 2013.
CONSENT FOR PUBLICATION
Not applicable.
AVAILABILITY OF DATA AND MATERIALS
The data will be shared by the corresponding author (B.L) upon reasonable request.
FUNDING
This research was supported by Woosong University (Grant No. WLB 1928-2332-01).
CONFLICT OF INTEREST
The authors declare no conflict of interest, financial or otherwise.
ACKNOWLEDGEMENTS
Declared none.