More on the Dimensionality of the GHQ-12: Competitive Confirmatory Models*

The General Health Questionnaire (GHQ) was designed to measure minor psychiatric morbidity by assessing normal ‘healthy’ functioning and the appearance of new, distressing symptoms. Among its versions, the 12-item version is one of the most widely used. The validity and reliability of the GHQ-12 have been extensively tested in samples from different populations. For the Spanish version, studies have reached different conclusions, supporting one-, two-, and three-factor structures. This research aims to present additional evidence on the factorial validity of the Spanish version of the GHQ-12, using competitive confirmatory models. Three samples of workers (N = 525, 414, and 540) were used to test a set of substantive models previously found in the Spanish and international literature. Results showed that multidimensional models had moderate to substantial inter-factor correlations (ranging from 0.29 to 0.76), but not so high as to jeopardize their discriminant validity. The best-fitting models were the original solution by Graetz (1991) and the exploratory three-factor solution offered by Rocha et al. (2011), both multidimensional three-factor solutions with correlated factors. The conclusion is that a multidimensional three-factor structure underlies the items in the GHQ-12.


Keywords
Distress; GHQ-12; competitive models; Spanish population; factorial validity.

Goldberg (1972) designed the General Health Questionnaire (GHQ) to measure minor psychiatric morbidity by assessing normal 'healthy' functioning and the appearance of new, distressing symptoms (Baksheev, Robinson, Cosgrave, Baker & Yung, 2011). The GHQ is a self-administered screening questionnaire developed in several versions: the original 60-item version (Goldberg, 1972), and shorter versions of 30, 28, and 12 items (Goldberg & Williams, 1988). It is considered "the most widely used instrument for detecting non-psychotic psychiatric cases" (Molina et al., 2006, p. 478). Among all versions, the 12-item and the 30-item versions have been the most used in community samples (Molina et al., 2006), and the GHQ-12 has probably become the most popular form of the scale because of its relatively good psychometric properties and its brevity (Goldberg & Williams, 1988).
Consequently, several models that tested the GHQ-12 factor structure while adding these wording effects started to appear in the literature, and some have found support for a unidimensional structure with wording effects associated with negatively worded items (Smith et al., 2010; Smith et al., 2013; Solís-Cámara, Meda-Lara, Moreno-Jiménez & Juárez-Rodríguez, 2016).
Concerning the Spanish version of the scale, there is also a certain amount of accumulated evidence on its psychometric properties, and specifically on its factor structure. Some of this evidence comes from exploratory factor analyses. López-Castedo and Fernández (2005), for example, studied a non-probabilistic sample of 1930 Spanish adolescents and found support for a two-factor solution of anxiety and social dysfunction that jointly explained 46.8% of the variance. Sánchez-López and Dresch (2008), in turn, found a multidimensional structure of three correlated factors in another non-probabilistic sample of the general Spanish population. The factors were named successful coping, which included all positively worded items, self-esteem (items 6, 9, 10, and 11), and stress (items 2, 5, and 9). The factors were moderately inter-correlated, with the largest value between factors two and three. Finally, Rocha, Pérez, Rodríguez-Sanz, Borrell and Obiols (2011) studied the psychometric properties of the GHQ-12, including its factorial validity, in a representative Spanish sample (29476 participants). They used exploratory factor analyses forced to one-, two-, and three-factor solutions that explained 67%, 82%, and 91% of the variance, respectively. They concluded that the unidimensional solution adequately represented the observed scores and, consequently, gave normative values to be used in the Spanish population.
More convincing results are obtained when confirmatory factor analyses (CFA) are employed, as has been the case in some studies (González-Romá, Lloret & Espejo, 1993). González-Romá et al. (1993) studied two factor structures (one factor, and two factors of anxiety and depression) in two non-probabilistic samples of 167 and 112 workers and found poor model fit for both structures, although slightly better for the multidimensional one. Padrón, Galán, Durbán, Gandarillas and Rodríguez-Artalejo (2012) tested four confirmatory models: a unidimensional model, a two-factor (positive and negative items) structure, the three-factor model found by Graetz (1991), and the three-factor solution explored in their own data. Their results showed that the three-factor structure was the best-fitting model, but with high inter-factor correlations (ranging between 0.72 and 0.84). Finally, Aguado et al. (2012) were the first to model method effects in the Spanish version of the GHQ-12. They concluded that a one-factor model including method effects among negatively worded items better represented the data from their sample of postpartum women.
Taking all this into account, this research aims to present additional evidence on the factorial validity of the GHQ-12 in its Spanish version. A set of substantive models previously found in the Spanish and international literature, including method effects associated with negatively worded items, will be tested systematically in three independent samples of workers.

Method
Samples and procedure. Three independent samples of Spanish workers were used for this study. The first sample (A) comprised 525 workers, who were recruited through systematic sampling of all the workers who underwent their annual medical tests at the Valencian Health and Safety Executive (Province of Valencia, Spain). Participants were randomly selected from all the workers attending this health check during a one-year period. Of these, 85.4% were men. Participants' average age was 37 years (SD = 10.84), with a minimum of 16 and a maximum of 64 years. The second sample (B) was composed of 414 young employees at the beginning of their working careers. They worked in companies located in 11 Spanish provinces. Of these participants, 57.9% were male. Age ranged from 17 to 34 years, with an average of 22.6 years (SD = 3.85). The third sample (C) was composed of 540 public servants working for two Autonomous Regions in Spain (Valencian Community and Andalucía). Of these, 53.81% were male. Age ranged from 22 to 56 years, with an average of 35 years (SD = 6.41).
Instruments. The General Health Questionnaire (GHQ-12) developed by Goldberg and Williams (1988) was used, with the item content of the Spanish version by Lobo and Muñoz (1996). This questionnaire consists of 12 items, 6 of which are positive statements and the remaining 6 negative statements. Items in the three analyzed samples were answered using a 4-point Likert scale from 0 (better than usual) to 3 (much less than usual). Example items are "Lost much sleep over worry" or "Felt you could not overcome your difficulties".
Statistical analyses. Confirmatory factor analyses were used to test the a priori structures of the GHQ-12. These CFA were estimated with EQS 6.1 (Bentler, 2000-2018). Maximum likelihood with robust Satorra-Bentler corrections on polychoric correlation matrices was the estimation method of choice, given that multivariate normality was not tenable (Mardia multivariate coefficients were 25.6, 22.6, and 95.4 for samples A, B, and C, respectively) and the response format was ordinal (Finney & DiStefano, 2013). Goodness-of-fit for each model was assessed using indexes based on different approaches (Kline, 2015): 1) the χ² statistic; 2) the CFI (Comparative Fit Index); and 3) the RMSEA (Root Mean Square Error of Approximation) and its 90% confidence interval. Robust versions of all tests and fit indices were used. The χ² goodness-of-fit statistic tests the difference between the observed covariance matrix and the one predicted by the specified model. A χ² value with a probability greater than 0.05 indicates good fit; however, this statistic has several limitations and very restrictive assumptions (dependence on sample size, multivariate normality, correct model specification). Therefore, other indices less affected by sample size and model complexity (Kline, 2015) were also used. Values higher than 0.90 for the CFI or lower than 0.08 for the RMSEA are considered a reasonable fit (Kline, 2015), although values higher than 0.95 for the CFI or lower than 0.05 for the RMSEA are more desirable and considered an excellent fit (Caycho-Rodríguez et al., 2018). It has also been suggested that the combination of a CFI above 0.90 with an RMSEA below 0.06 may indicate an extremely good fit (Caycho-Rodríguez et al., 2018). All models found in the literature with a reasonable model fit to represent the underlying structure of the GHQ-12 were specified and tested.
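As an illustration of how the fit criteria above operate, the following minimal sketch computes the usual textbook point estimates of the RMSEA and CFI from a model χ² and applies the cutoffs quoted in the text. It is not the EQS robust computation used in the study: the function names are hypothetical, the Satorra-Bentler scaling is not reproduced, and some programs use N rather than N - 1 in the RMSEA denominator.

```python
import math


def rmsea(chi2: float, df: int, n: int) -> float:
    """Textbook RMSEA point estimate (assumes the N - 1 convention)."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))


def cfi(chi2_model: float, df_model: int, chi2_base: float, df_base: int) -> float:
    """Comparative Fit Index relative to the baseline (independence) model."""
    d_model = max(chi2_model - df_model, 0.0)
    d_base = max(chi2_base - df_base, d_model)
    return 1.0 if d_base == 0 else 1.0 - d_model / d_base


def classify_cfi(value: float) -> str:
    """Cutoffs quoted in the text: > 0.95 excellent, > 0.90 reasonable."""
    if value > 0.95:
        return "excellent"
    return "reasonable" if value > 0.90 else "poor"


def classify_rmsea(value: float) -> str:
    """Cutoffs quoted in the text: < 0.05 excellent, < 0.08 reasonable."""
    if value < 0.05:
        return "excellent"
    return "reasonable" if value < 0.08 else "poor"


if __name__ == "__main__":
    # Hypothetical chi-square values, NOT results from the study.
    r = rmsea(chi2=120.0, df=51, n=525)
    c = cfi(chi2_model=120.0, df_model=51, chi2_base=1500.0, df_base=66)
    print(f"RMSEA = {r:.3f} ({classify_rmsea(r)}), CFI = {c:.3f} ({classify_cfi(c)})")
```

The two indices are classified separately because the text states the cutoffs for each index on its own; a joint rule would need an additional convention.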
Ten completely a priori or strictly confirmatory models were tested:
a) Model 1 was a one-factor model (minor psychiatric morbidity), as found by Banks et al. (1980) or defended in the Spanish version by Rocha et al. (2011). It also served as a baseline (most parsimonious) model.
b) Model 2 contained two correlated factors grouping the positively worded items (1, 3, 4, 7, 8, and 12) and the negatively worded items (2, 5, 6, 9, 10, and 11), a structure based on Andrich and Van Schoubroeck (1989).
c) Model 3 specified two correlated factors of anxiety/depression (items 1, 2, 3, 6, 7, 10, and 11) and social performance (items 3, 4, 5, 8, 9, and 12), based on Schmitz et al. (1999).
d) Model 4 was composed of two correlated factors: dysphoria (items 2, 5, 6, 9, 10, 11, and 12) and social dysfunction (items 1, 3, 4, 7, 8, and 12), based on the best-fitting model of Politi et al. (1994).
e) Model 5 specified three substantive correlated dimensions of cope (items 1, 3, 4, and 8), stress (items 2, 5, and 7), and depression (items 6, 9, 10, 11, and 12), and was based on a content analysis made by Martin (1999).
f) Model 6 presented three correlated factors: dysphoria (items 2, 5, 6, and 9), social dysfunction (items 1, 3, 4, 7, 8, and 12), and loss of confidence (items 10 and 11), that is, the factor solution supported in Graetz (1991).
g) Model 7 included three correlated factors, very similar to those in Graetz (1991), but found in the three-factor exploratory solution by Rocha et al. (2011): dysphoria (items 2, 5, and 9), social dysfunction (items 1, 3, 4, 7, 8, and 12), and loss of confidence (items 6, 9, 10, and 11). It included a cross-loading, as item 9 was considered an indicator of both dysphoria and social dysfunction. Therefore, two additional models were specified to remove the cross-loading: Model 7a, with the same structure as Model 7 but with item 9 loading only on dysphoria, and Model 7b, with the Model 7 structure but with item 9 loading only on social dysfunction.
h) In addition to the substantive models, Model 8 considered a method factor related to the negatively worded items, together with the trait dimension of minor psychiatric morbidity. This model found support, among others, in the Spanish version of the scale by Aguado et al. (2012).
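The item-to-factor assignments listed above can be written down as plain data, which makes cross-loadings easy to detect and lets the same specification be exported to CFA software. The sketch below is only an illustration: it encodes Models 2, 5, 6, and 7 exactly as their item lists are given above, the variable prefix `ghq` and the function names are hypothetical, and the lavaan-style `=~` syntax is an assumption for readability (the study itself used EQS).

```python
from collections import Counter

# Item lists copied from the model descriptions above (Models 2, 5, 6, 7).
MODELS = {
    "model_2": {
        "positive": [1, 3, 4, 7, 8, 12],
        "negative": [2, 5, 6, 9, 10, 11],
    },
    "model_5": {
        "cope": [1, 3, 4, 8],
        "stress": [2, 5, 7],
        "depression": [6, 9, 10, 11, 12],
    },
    "model_6": {
        "dysphoria": [2, 5, 6, 9],
        "social_dysfunction": [1, 3, 4, 7, 8, 12],
        "loss_of_confidence": [10, 11],
    },
    "model_7": {
        "dysphoria": [2, 5, 9],
        "social_dysfunction": [1, 3, 4, 7, 8, 12],
        "loss_of_confidence": [6, 9, 10, 11],
    },
}


def cross_loaded_items(factors: dict) -> list:
    """Return the items that are assigned to more than one factor."""
    counts = Counter(item for items in factors.values() for item in items)
    return sorted(item for item, count in counts.items() if count > 1)


def to_syntax(factors: dict, prefix: str = "ghq") -> str:
    """Render a specification as lavaan-style measurement syntax (one line per factor)."""
    return "\n".join(
        f"{name} =~ " + " + ".join(f"{prefix}{item}" for item in items)
        for name, items in factors.items()
    )


if __name__ == "__main__":
    for name, factors in MODELS.items():
        print(name, "cross-loaded items:", cross_loaded_items(factors))
    print(to_syntax(MODELS["model_6"]))
```

Keeping the specifications as data rather than as hand-written syntax makes it straightforward to derive variants such as Models 7a and 7b by removing one item from one factor.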

Results
Goodness-of-fit indexes for the ten models tested are shown in Table 1. A first look at this table shows that the one-factor solution (Model 1) originally proposed by Goldberg for the GHQ-12 was never the best-fitting model and, generally speaking, was the worst. In short, a one-factor solution did not adequately represent the variance-covariance matrix among the items in the GHQ-12. Secondly, all models that posited two trait factors (Models 2, 3, and 4) also represented the observed data inadequately. In no case did their fit indexes reach the stricter cutoff criteria, and most of the time they were well below the more "relaxed" cutoff criteria. Model 5 is the first three-factor model, and again its fit could not be considered adequate. The same happened with Model 8, which included method effects associated with negatively worded items and did not show a relevant improvement in fit compared to other models.

Table 1
Goodness-of-fit indexes for the three samples in the 10 a priori models
Note. df = degrees of freedom; *p < 0.001.

Table 1 shows that both Graetz's and Rocha's three-factor models had the best fit across all indexes and samples considered. The goodness-of-fit indexes were very similar for Models 6 and 7 and the derived Models 7a and 7b. This was not surprising, since both models only varied in one indicator loading. Goodness-of-fit was slightly better for Graetz's solution in samples B and C, while it was better for Rocha's model in sample A. Table 2 shows the standardized factor loadings for Model 6 estimated in the three samples. All factor loadings were statistically significant and substantial. Finally, Table 3 shows correlations among the factors for the multidimensional models. A clear result is that multidimensional models do have moderate to large inter-factor correlations, but not so high as to jeopardize their discriminant validity.

Table 2
Standardized loadings (Samples A, B and C)

Discussion and conclusions
According to the available evidence, the GHQ developed by Goldberg (1972), and specifically its shortest version, the GHQ-12, is one of the most widely used and studied indicators of minor psychiatric disorders. It has been used as a screening tool that can be easily administered in adult and adolescent populations alike (Pena & Caine, 2006), and it is meant to be valid and reliable. Nevertheless, there is no broad consensus on its structure, that is, on the number of dimensions and the items belonging to each. This holds true across the different languages into which it has been translated. The solution that has received the most support in the literature is the multidimensional three-factor structure found by Graetz (1991), but the one-factor solution with method effects associated with negatively worded items has also had some support (Smith et al., 2013).
Regarding the Spanish version, the accumulated information on its factorial validity is not that large, but, as seen in the Introduction, it has also been subject to controversy. Two recent contributions to the existing literature on the structure of the GHQ-12 are those of Rocha et al. (2011) and Aguado et al. (2012). On the one hand, Rocha et al. (2011) studied an extensive representative sample of the Spanish population (approximately 25000 people) and, using EFA, found support for one-factor, two-factor, and three-factor structures, but decided to retain the simplest one-factor structure. They derived population norms for this single dimension to be used in the Spanish population. On the other hand, Aguado et al. (2012) tested several competing models via CFA. Among them, they included the one-factor solution, a one-factor solution with method effects associated with the negatively worded items, and Graetz's (1991) multidimensional model. They found that the best-fitting models were the one-factor model with method effects and Graetz's (1991) three-factor solution. Nevertheless, fit indexes for several of the proposed models were extremely close to those of the best-fitting models.
Taking both contributions together, the overall conclusion is that the GHQ-12 structure is far from being well established. The results of Rocha et al. (2011), although coming from an impressive sample, were exploratory, and it remains to be confirmed whether the one-factor model would fit better than the three-factor exploratory solution that explained 91% of the variance (vs. 67% in the one-factor solution). As for the more recent contribution by Aguado et al. (2012), they used CFA to test the main models proposed in the literature. However, the fit of the different solutions was extremely close, especially for the three-factor and the method-effects solutions, casting some doubt on the dimensionality of the GHQ-12. Moreover, their sample was a particular one, postpartum women, which may lead to sample-specific results. Therefore, these two latest contributions to the factorial validity of the GHQ-12, although extremely important, should be complemented with further analyses of new samples from the Spanish population.
The current results offer evidence from three samples of workers: industrial workers of all ages, young workers entering the labor market, and civil servants. Results are consistent among the three samples, as the best-fitting models were the original solution by Graetz (1991) and the exploratory three-factor solution offered by Rocha et al. (2011), both multidimensional three-factor solutions with correlated factors. It should be borne in mind that both structures are almost equal, as the only difference is in one indicator (item 6). Accordingly, both structures labeled the factors as dysphoria, social dysfunction, and loss of confidence. Additionally, these factors showed discriminant validity, given that correlations ranged from a minimum of 0.29 to a maximum of 0.76, depending on the sample analyzed, and in no case did the confidence interval include 1. Although the three samples showed different values for the correlations among factors, we believe there may be some characteristics of the type of worker surveyed that make the relationships among the factors of the questionnaire vary across samples. In our opinion, if the scale indeed measures three factors with enough discriminant validity, a question should be clearly stated: Are the normative values offered in Rocha et al. (2011) useful in the Spanish population?
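The discriminant validity criterion referred to above, whether the confidence interval of an inter-factor correlation reaches 1, can be sketched as follows. This is a minimal illustration using a normal-approximation interval; the standard errors are hypothetical, since the source reports only the range of the correlations (0.29 to 0.76), and the function name is an assumption.

```python
def ci_excludes_unity(r: float, se: float, z: float = 1.96) -> tuple:
    """Return (excludes_one, (lower, upper)) for a normal-approximation CI of r."""
    lower = r - z * se
    upper = r + z * se
    return upper < 1.0, (lower, upper)


if __name__ == "__main__":
    # 0.76 is the largest correlation reported in the text;
    # the standard error of 0.05 is a HYPOTHETICAL value for illustration.
    ok, (low, high) = ci_excludes_unity(r=0.76, se=0.05)
    print(f"95% CI = ({low:.3f}, {high:.3f}); excludes 1: {ok}")
```

Under this criterion two factors are treated as empirically distinct when the upper bound of the interval stays below 1; an interval that reaches 1 would suggest the two factors cannot be told apart.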
A potential limitation of the study is that it only offers evidence on the factorial validity of the scale. Implications of the different factor structures proposed in terms of criterion-related validity would be of interest.