The World Health Organization (WHO) defines psychiatric disorders as a combination of abnormal thoughts, perceptions, emotions, behavior, and interpersonal relationships1. These conditions represent a significant cause of long-term disability and mortality2. The broad spectrum of these disorders includes major depressive disorder, obsessive-compulsive disorder (OCD), generalized anxiety disorder, and bipolar disorder3. Although treatment options exist for each of these mental disorders, an accurate diagnosis is critical to providing adequate evidence-based treatment4.
Regarding diagnostic assessment, several guidelines, such as those from the National Institute for Health & Clinical Excellence, recommend the use of validated assessment measures relevant to the disorder being assessed5, to provide additional information for the clinician6. Several instruments exist for a variety of mental disorders7, developed to screen, diagnose, and assess symptom severity or response to treatment8,9. However, before being considered adequate, an instrument must offer accurate, valid, and interpretable data for the population to be assessed10. Importantly, the quality of the information about a specific individual depends on the psychometric properties of the instrument used11. To reduce bias in the testing process, from application to interpretation of the results, psychological measures should be standardized8. This was the main reason for the creation of the Standards for Educational and Psychological Testing, as a basis for evaluating tests, testing practices, and the impact of test use12. Equally important, most instruments were developed in English-speaking countries13, making cultural and linguistic adaptation necessary prior to use in a new country, culture, and/or language, in order to reach equivalence between the original (source) and the newly adapted (target) versions of the questionnaire14.
When an established instrument is not available in a specific language or culture, researchers face a choice between two main strategies: developing a new, context-specific instrument or performing a cross-cultural adaptation of an existing, well-validated measure15. While the development of a novel instrument can ensure maximum cultural specificity, it is an extremely resource- and time-intensive process that may take years16. In contrast, the adaptation of an established 'gold-standard' instrument offers distinct advantages. This approach is often more efficient and, critically, it allows for the cross-cultural comparison of findings from different populations, which is a primary goal of adapting measures rather than creating new ones17.
The International Test Commission has developed guidelines for cross-cultural translation and adaptation of psychological instruments17. Translation can be considered the first stage of the adaptation process18, and can be conducted using one or both of the two most popular methods of test translation: (a) translation and back-translation, or (b) two independent translations that are compared by a third person19. Beyond an accurate translation, cultural adaptation requires maximizing semantic, idiomatic, experiential, and conceptual equivalence between the original measure and the versions derived from it14,20. Finally, the psychometric properties of a translated instrument should be evaluated and compared with those of the original measure in the primary language20. Specifically, it is important to assess reliability and validity8,9,21, assuring, respectively, that the instrument yields consistent measurements and that it measures the intended construct22.
Reliability refers to the reproducibility of a test result when obtained at different times, in different settings, or by different interviewers, and encompasses coherence, stability, equivalence, and homogeneity23,24,25. It can be evaluated through several methods, including test-retest, alternate-forms, and split-half reliability, as well as internal consistency8,22,26, to determine whether the measures are sufficiently consistent and free from measurement error8. Although an instrument that is not reliable cannot be valid, a reliable instrument can nonetheless be invalid10. Validity is considered according to three categories27,28, namely content validity, construct validity, and criterion validity. Content validity concerns the extent to which a test adequately samples the dimension it is intended to measure22, while construct validity, including convergent and discriminant validity (sometimes referred to as divergent validity29), represents the degree to which the variance of the measure is linked with the variance of the underlying construct30,31. Criterion validity is based on relationships between test scores and an external criterion9 and should be assessed using another measure of the same construct, ideally a widely accepted measure that is considered the gold standard8,28. This category of validity is especially important for understanding whether a measure can be used to make predictions and/or decisions about patients25, which is the case in establishing a diagnosis.
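As a rough illustration of internal consistency, one of the reliability methods mentioned above, Cronbach's alpha is a commonly used coefficient that can be computed from item-level responses. The following is a minimal sketch in Python and is not taken from the protocol described in this paper; the function name `cronbach_alpha` and the small response matrix are hypothetical and serve only to show the arithmetic.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents, each a list of k item scores.

    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total scores)),
    using sample variances (n - 1 denominator) throughout.
    """
    n_items = len(responses[0])
    if n_items < 2 or len(responses) < 2:
        raise ValueError("need at least two items and two respondents")
    if any(len(row) != n_items for row in responses):
        raise ValueError("every respondent must answer every item")

    # Transpose so that each tuple holds all respondents' scores for one item.
    items = list(zip(*responses))
    item_variance_sum = sum(variance(item) for item in items)
    total_variance = variance([sum(row) for row in responses])
    return n_items / (n_items - 1) * (1 - item_variance_sum / total_variance)


# Hypothetical data: 4 respondents x 3 items (illustrative values only).
toy_responses = [
    [2, 3, 3],
    [4, 4, 5],
    [1, 2, 2],
    [3, 3, 4],
]
print(f"Cronbach's alpha: {cronbach_alpha(toy_responses):.3f}")
```

In practice, established statistical software would typically be used, and the acceptable range for the coefficient should be taken from the psychometric literature rather than from this sketch.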
Numerous guidelines for the cross-cultural adaptation of psychometric instruments have been published to aid researchers in this complex process17,32. However, systematic reviews of this literature have highlighted a lack of a single, unified consensus on the best methodology to follow33. Furthermore, many existing guides, while valuable, may focus more on the initial linguistic translation than on the equally critical subsequent psychometric validation required to ensure an instrument is ethically sound for clinical use19. This creates a need for a detailed, replicable protocol that integrates both the adaptation and a comprehensive validation phase into a single, step-by-step framework.
Standardized research practices focusing on the validation of psychometric measures are thus essential. The method described in this paper will provide researchers and clinicians with a detailed protocol to perform cultural adaptation of a psychometric measure and, specifically, to assess criterion validity for the diagnosis of a psychiatric disorder. To help readers assess its applicability and to ensure replicability, the protocol includes key practical details, such as sample size considerations, the rationale for multi-session administration timings, and a discussion of known limitations. For that purpose, we will use, as an example, the validation study of the European Portuguese Yale-Brown Obsessive-Compulsive Scale-Second Edition (PY-BOCS-II)34, in which a similar protocol was used to clarify the factor structure and criterion validity of the PY-BOCS-II for the diagnosis of OCD in adults. Therefore, this protocol can also be used for future validation studies of Y-BOCS-II in other contexts or languages.
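To make the notion of criterion validity for diagnosis more concrete, one common way to summarize how well scale scores agree with a reference ("gold-standard") diagnosis is to compute sensitivity and specificity at a candidate cutoff. The sketch below is an assumption-laden illustration in Python, not the analysis used in the PY-BOCS-II study; the function name `sensitivity_specificity`, the scores, the diagnoses, and the cutoff of 15 are all hypothetical.

```python
def sensitivity_specificity(scores, has_diagnosis, cutoff):
    """Sensitivity and specificity of the rule 'score >= cutoff' against a
    reference diagnosis (True = disorder present according to the gold standard)."""
    tp = fn = tn = fp = 0
    for score, diagnosed in zip(scores, has_diagnosis):
        screened_positive = score >= cutoff
        if diagnosed and screened_positive:
            tp += 1
        elif diagnosed:
            fn += 1
        elif screened_positive:
            fp += 1
        else:
            tn += 1
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one diagnosed and one non-diagnosed case")
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical scale totals and reference diagnoses (illustrative values only).
toy_scores = [5, 12, 20, 25, 8, 30, 16, 10]
toy_diagnosis = [False, False, True, True, False, True, False, True]

sens, spec = sensitivity_specificity(toy_scores, toy_diagnosis, cutoff=15)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```

Repeating this calculation over a range of cutoffs gives the points of a receiver operating characteristic (ROC) curve, which is one way a suitable cutoff could be explored.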