Adaptation and preliminary validation of the Montgomery-Åsberg Depression Rating Scale (MADRS) using the Structured Interview Guide (SIGMA) for European Portuguese

The Montgomery-Åsberg Depression Rating Scale (MADRS) is considered one of the gold-standard measures to assess depression severity. To standardise the MADRS administration, a structured interview guide was developed (SIGMA). This study aims to translate and validate the SIGMA for European Portuguese. Twenty patients (80% women) were interviewed by ten dyads of raters (trained clinical psychologists and psychiatrists) using the European Portuguese version of the MADRS and its structured interview, the SIGMA. There was no significant difference in the total MADRS score between raters (interviewers and observers). The intraclass correlation for the total score between raters using the SIGMA was excellent (r = .98; p < .001). All items had good to excellent item-level intraclass correlations, and the internal consistency by rater role was good. The European Portuguese version of the SIGMA showed good preliminary psychometric properties (reliability and internal consistency). Our results suggest that the SIGMA is a useful and robust interview guide for assessing the ten depression symptoms in the MADRS, regardless of the rater's clinical background.
In their studies, Montgomery and Åsberg (Åsberg et al., 1978) noticed differences in clinical effectiveness between antidepressant drugs; however, the available assessment instruments could not accurately measure the clinical changes that occurred over time across various psychiatric symptoms. Therefore, aiming to distinguish the treatment effects of several antidepressants in clinical trials, the authors developed the Comprehensive Psychopathological Rating Scale (CPRS; Åsberg et al., 1978).
To select the items and build the MADRS, the authors used the CPRS to collect data about the clinical status of patients diagnosed with depression who participated in several clinical trials of antidepressant drugs. In the inclusion process, the authors considered only patients with a primary diagnosis of depression (Feighner et al., 1972), although variability in patient characteristics was desired. The sample consisted of endogenous and reactive, psychotic and non-psychotic, bipolar and unipolar out-patients and in-patients, ranging from 18 to 69 years old. Ultimately, the authors selected ten items, which aligned with the fundamental symptoms of depression.
Several studies have compared the MADRS to other similar instruments. For instance, Heo et al. (2007) showed significant and high correlations between the MADRS and the Hamilton Rating Scale for Depression (HAM-D) in a sample of elderly patients. Similarly, Fernandes et al. (2019) showed moderate to high correlations between the MADRS and the HAM-D in a sample of adult out-patients diagnosed with major depressive disorder. Svanborg and Åsberg (2001) showed high correlations between the MADRS and the Beck Depression Inventory (BDI) in a sample of 37 in-patients diagnosed with anxiety and mood disorders. In the studies by Kroenke et al. (2001) and Englbrecht et al. (2017), the MADRS was also highly correlated with both the BDI-II and the Patient Health Questionnaire (PHQ-9) in a sample of patients with rheumatoid arthritis. Bjelland et al. (2002) showed a high and significant correlation between the MADRS and the Bipolar Depression Rating Scale (BDRS) in a sample of patients diagnosed with bipolar disorder. Khosravani et al. (2022) and Kjaergaard et al. (2014) showed that the MADRS was moderately correlated with the BDI-II and the HADS-D. Overall, these results show that the MADRS has moderate to high correlations with other instruments assessing depression, across samples and diagnoses, indicating high convergent validity.
In comparison to the HAM-D, the MADRS takes less time to administer, which is an advantage to consider in clinical trials (Iannuzzo et al., 2006). The internal structure of the MADRS also differs from that of the HAM-D in several respects. For the HAM-D, although the authors aimed for a multidimensional scale, previous studies have shown inconsistent results, varying from two to eight factors (Addington et al., 1996; Akdemir et al., 2001; O'Brien & Glaudin, 1988; Steinmeyer & Möller, 1992). Similarly, the MADRS factorial studies were not consistent, with the number of factors varying from one to four (Galinowski & Lehert, 1995; Rocca et al., 2002; Serretti et al., 1999). Nevertheless, the MADRS shows greater consistency in its factorial structure across studies than the HAM-D (Iannuzzo et al., 2006).
Moreover, the MADRS was designed to provide a sensitive and accurate estimate of change over time (Åsberg et al., 1978). Indeed, the study by Mulder et al. (2003) showed that the MADRS was more sensitive to changes in depressive symptoms than the HAM-D across six weeks of treatment. Another important feature of the MADRS is that it can be applied by psychiatrists and non-psychiatrists and has adequate psychometric properties. Namely, Davidson et al. (1986) reported values between .57 and .76 for inter-observer correlation and .76 for the total score. Furthermore, for seven of the ten items, the correlation between the item and the score of the remaining items was significant; the exceptions were "Reduced Sleep" (.26), "Reduced Appetite" (.12), and "Suicidal Thoughts" (.29).
When the MADRS was first published, there were no recommended questions for clinicians to use when collecting and rating the required information per item. However, adopting a structured interview guide is known to increase reliability on similar rating scales (Williams & Kobak, 2008), helping to ensure that the same information is gathered across patients and clinicians and consequently increasing inter-rater reliability (Williams, 1988). Furthermore, a structured interview provides novice raters with unambiguous instructions, aiding their training by providing them with a range of questions derived from expert interviewers (Williams & Kobak, 2008). To address this and to increase inter-rater reliability, Williams and Kobak (2008) developed the Structured Interview Guide for the MADRS (SIGMA). Although the MADRS is currently one of the gold-standard measures used in mental health research across the globe (Hengartner et al., 2020), its structured interview had not yet been translated and validated for European Portuguese. This study aims to fill this gap and complement the available MADRS with a translated and validated version of its structured interview for European Portuguese-speaking professionals.

Participants
Twenty patients (16 women) between 38 and 73 years old (mean ± SD: 54.26 ± 8.94 years) were recruited from two distinct healthcare units (in the centre and south of Portugal) to be interviewed. Across patients, depression severity was heterogeneous at the time of the interview, with total MADRS scores ranging between 3 and 40 (mean ± SD: 22.25 ± 9.07) and half of the participants scoring over 23. The socio-demographic and clinical features of the sample are depicted in Table 1. In one of the health units, the two raters were both psychologists with distinct professional experience and worked together throughout the study. In the other health unit, the raters were senior psychiatrists (n = 3) and psychiatry residents (n = 3), and the pairs were organized according to the clinical schedule of the team. The raters' professional and socio-demographic profiles are described in Table 2.

Procedure
Participants gave their informed consent before participating. The ethics committees of the authors' affiliated research institution and of the health units where the participants were recruited were consulted and approved the study. We invited eight clinicians (six women) to participate in the study and to test the inter-rater reliability of the SIGMA by conducting the MADRS interview using the SIGMA. The clinicians were organized in dyads, and each patient was interviewed once, by a single dyad.
In each session, one rater conducted the interview and the other observed it. Both raters scored the same patient's interview independently. The raters were randomly assigned to the dyads and the interview roles. No prior training was offered to the interviewers, and the only instruction given was to follow the guide as closely as possible, using the instructions, the hints, and the questions of the translated guide. None of the raters had previous experience with the MADRS.

The MADRS and the SIGMA translation
Three clinical psychologists with experience in depression assessment and fluent in English translated the MADRS items and the SIGMA questions into European Portuguese. We achieved the final European Portuguese version by combining the three independent translations and resolving all conflicting expressions through discussion. This version was then back-translated into English. The raters (psychologists and psychiatrists) assessed the final version and indicated no further suggestions or changes.

Statistical analysis
We performed the analyses in SPSS (version 25.0; IBM Corp., 2017). We used descriptive and frequency statistics to describe the samples of patients and raters, and estimated the intraclass correlation for the total score between raters. Values were interpreted with the scale suggested by Blacker and Endicott (2000) (> .80 = excellent, between .70 and .80 = good, between .50 and .70 = fair, < .50 = poor). We estimated the difference in the mean MADRS total scores between raters using an independent-samples t-test. For internal consistency reliability estimations, we used Cronbach's alpha. Finally, we also examined the inter-rater reliability at the item level, and the inter-item and item-total correlations. Correlation coefficients were interpreted according to the criteria suggested by Pestana and Gageiro (2005). That is, correlation coefficients less than .20 reflect a very low association between variables; between .21 and .39 a low association; between .40 and .69 a moderate association; between .70 and .89 a high association; and higher than .90 a very high association. For all analyses, we assumed a minimum significance level of .05.
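For readers who wish to reproduce this kind of analysis outside SPSS, the following is a minimal Python sketch of the two reliability statistics named above. It is not the authors' procedure: the text does not state which ICC model was used, so the two-way random-effects, absolute-agreement, single-rater form, ICC(2,1), is an assumption, as are the function names, the handling of the boundary values in the Blacker and Endicott (2000) scale, and the small data set, which is hypothetical.

```python
from statistics import mean, variance


def cronbach_alpha(scores):
    """Cronbach's alpha; rows are patients, columns are items."""
    k = len(scores[0])
    item_vars = [variance(col) for col in zip(*scores)]  # sample variance per item
    total_var = variance([sum(row) for row in scores])   # variance of total scores
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def icc_2_1(ratings):
    """ICC(2,1), assumed model: two-way random effects, absolute agreement,
    single rater. Rows are patients, columns are raters."""
    n, k = len(ratings), len(ratings[0])
    grand = mean([x for row in ratings for x in row])
    row_means = [mean(row) for row in ratings]
    col_means = [mean(col) for col in zip(*ratings)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


def interpret_icc(value):
    """Labels from the scale quoted in the text; boundary handling is assumed."""
    if value > 0.80:
        return "excellent"
    if value >= 0.70:
        return "good"
    if value >= 0.50:
        return "fair"
    return "poor"


# Hypothetical total scores: five patients, each rated by (interviewer, observer).
totals = [[12, 13], [25, 24], [31, 33], [8, 8], [19, 21]]
icc = icc_2_1(totals)
print(f"ICC(2,1) = {icc:.2f} ({interpret_icc(icc)})")

# Hypothetical item scores: four patients by three items.
items = [[2, 3, 2], [4, 4, 5], [1, 2, 1], [3, 3, 4]]
print(f"Cronbach's alpha = {cronbach_alpha(items):.2f}")
```

Choosing an absolute-agreement model means that a systematic offset between interviewers and observers would lower the coefficient, which matches the aim of checking that both roles arrive at the same score; a consistency model would ignore such an offset.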

RESULTS
We found no significant differences between the mean MADRS scores obtained by the raters who conducted the interview (M = 21.8, SD = 9.31) and the raters who observed the interview (M = 22.7, SD = 9.03), t(38) = -0.31, p = .758. The intraclass correlation (ICC) between raters for the total score was excellent (r = .98, p < .001, 95% CI [.94, .99]). The internal consistency by rater role was also good, both for raters who were interviewers (α = .80) and for raters who were observers (α = .74). At the item level, the ICC estimates were good to excellent, with all items showing excellent inter-rater reliability (r > .80, p < .01) except item 4 ("Reduced Sleep"), which showed a lower ICC value (r = .70, p < .01). Regarding inter-item and item-total correlations, the correlation between item 5 and the total score was very high, that for item 4 was moderate, and the remaining items presented high correlations. Detailed intraclass correlations, Cronbach's alphas, and inter-item and item-total correlations can be found in Table 3.
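As an arithmetic check on the comparison of rater roles, the t statistic can be recomputed from the reported means and standard deviations alone. The sketch below assumes a pooled-variance (Student) independent-samples t-test with 20 ratings per rater role, which is consistent with the 38 degrees of freedom reported; the function name is illustrative and this is not the SPSS output itself.

```python
import math


def pooled_t(m1, sd1, n1, m2, sd2, n2):
    """Student's t for two independent groups, from summary statistics."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (m1 - m2) / se, df


# Interviewers: M = 21.8, SD = 9.31; observers: M = 22.7, SD = 9.03; n = 20 each.
t, df = pooled_t(21.8, 9.31, 20, 22.7, 9.03, 20)
print(f"t({df}) = {t:.2f}")  # prints "t(38) = -0.31"
```

The recomputed value agrees with the reported t(38) = -0.31; obtaining the p-value would additionally require the t distribution, which is omitted here to keep the sketch dependency-free.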

DISCUSSION
To the best of our knowledge, this is the first translation, adaptation, and validation of the MADRS' structured interview guide for European Portuguese. Ten dyads assessed the MADRS independently (one interviewer and one observer), and a total of 20 patients from the south and centre of Portugal were included in the study. Although our sample was small, internal consistency showed positive preliminary results.
The internal consistency in our study was within the range reported in earlier studies of adult populations, despite the differences in study populations, ages, methods, and settings (Fernandes et al., 2019; Lobo et al., 2002; Ntini et al., 2020; Takahashi et al., 2004). In contrast with other studies where assessments depended on the rater background (psychiatrists, psychologists, students, and psychiatric nurses; Schmidtke et al., 1988), the values we found for inter-rater reliability were good regardless of the experience level or clinical background of the rater (psychology or medicine). This indicates the robustness, uniqueness and usefulness of the measure. These results suggest that the SIGMA might contribute to standardising the way practitioners collect information regarding depression severity. The expected increase in inter-rater reliability of the MADRS with the addition of the SIGMA is in line with what was found when a structured interview was added to the administration of the Hamilton Depression and Anxiety Scales (Bruss et al., 1994; Williams, 1988).
Despite the overall high inter-item and item-total correlations, "Reduced sleep" showed only moderate correlation coefficients. This result is in line with a previous MADRS validation where "Reduced sleep", "Reduced Appetite" and "Suicidal thoughts" correlated poorly with the total score (Davidson et al., 1986). This may be explained by a simplified view of depression, a complex and diverse clinical condition that frequently presents with other comorbidities. Therefore, looking at the diagnosis of depression as a discrete phenomenon, according to defined and restrictive inclusion or exclusion criteria that have to be present simultaneously for a pre-defined duration, overlooks the phenomenological experience of symptoms and their relationships (for an in-depth discussion, see the Research Domain Criteria, RDoC).
Our results should be interpreted considering some limitations. We did not include another measure of depression, so it was impossible to test convergent validity. Concerning the sampling method, we used a convenience sample, which might reduce the generalizability of the results. On the other hand, we collected data in two clinical settings, which might contribute to the greater representativeness of the collected data. Another limitation of the current study was the absence of a test-retest reliability analysis. Future studies should recruit larger and more representative samples and test the convergent validity and the temporal stability of the results.

CONCLUSIONS
This preliminary study provides evidence for the validity and reliability of the European Portuguese version of the SIGMA for assessing depression severity in Portuguese patients. The study's findings suggest that the SIGMA is a useful and robust interview guide for assessing the ten depression symptoms present in the MADRS, regardless of the rater's clinical background and professional experience. Additionally, it provides European Portuguese-speaking clinicians and researchers with a useful tool to guide the administration of the MADRS, offering a better standardized assessment. The study results could additionally inform the development of standardized assessments for depression in other languages and cultures, allowing for cross-cultural studies and helping to ensure consistency and comparability across different settings.

Table 1
Frequencies for the demographic characteristics of the patients (N = 20)

Table 2
Frequencies for the demographic characteristics of the raters (N = 8)

PSYCHOLOGICA VOLUME 66 • 2023

Table 3
Intraclass correlations, Cronbach's alphas, inter-item and item-total correlations between raters on individual items of the MADRS