Research Article
Sílvia Almeida1,2, Pedro Castro-Rodrigues1,3,4, J. Bernardo Barahona-Corrêa1,3, Telmo Mourinho Baptista5, Jaime Grácio1,3, Albino J. Oliveira-Maia1,3
1Champalimaud Research and Clinical Centre, Champalimaud Foundation; 2Graduate Programme in Clinical and Health Psychology, Faculdade de Psicologia da Universidade de Lisboa; 3NOVA Medical School, Faculdade de Ciências Médicas, NMS, FCM, Universidade NOVA de Lisboa; 4Centro Hospitalar Psiquiátrico de Lisboa; 5Faculdade de Psicologia da Universidade de Lisboa
Erratum Notice
Important: There has been an erratum issued for this article.
This paper aims to provide a detailed protocol for performing the cultural adaptation and psychometric validation of a structured interview to assess the severity of symptoms of a specific psychiatric disorder. Empirically supported procedures, beginning with the selection of the measure and detailing the experimental procedures and statistical analysis, are presented.
Psychiatric disorders are a significant cause of long-term disability and mortality. Although treatment is available, diagnostic accuracy is critical to provide adequate evidence-based treatment and to develop novel therapies. To inform the diagnostic process during clinical interviews, the use of validated assessment measures, including self-report questionnaires and structured interviews, is highly recommended. However, such instruments must have excellent psychometric properties, particularly regarding reliability and validity, to ensure accurate and interpretable data for each individual. Furthermore, applying an instrument in a new country, context, or language requires a formal cultural adaptation. This process is mandatory to ensure that the findings from the adapted version are equivalent to those of the original questionnaire.
Here, we describe a detailed protocol for cultural adaptation and comprehensive psychometric validation of a psychometric instrument. Specifically, we outline the steps for selecting the measure, conducting the experimental procedures, and performing statistical analyses required to establish the instrument's psychometric properties, including reliability, construct validity, and criterion validity for diagnosis of a psychiatric disorder. Our primary purpose is to present a transparent, standardized method for culturally adapting and validating psychometric instruments. Such a procedure helps minimize confounding factors and undesired variability in future applications and research. We expect that this protocol, including a range of empirically supported methods, will be useful in research settings for the cultural adaptation of psychometric instruments for psychiatric assessment.
The World Health Organization (WHO) defines psychiatric disorders as a combination of abnormal thoughts, perceptions, emotions, behavior, and interpersonal relationships1. These conditions represent a significant cause of long-term disability and mortality2. The broad spectrum of these disorders includes major depressive disorder, obsessive-compulsive disorder (OCD), generalized anxiety disorder, and bipolar disorder3. Although there are treatment options for each of these mental disorders, the accuracy of the diagnosis is critical to provide adequate evidence-based treatment4.
Regarding diagnostic assessment, several guidelines, such as those from the National Institute for Health & Clinical Excellence, recommend the use of validated assessment measures relevant to the disorder being assessed5, to provide additional information for the clinician6. Several instruments exist for a variety of mental disorders7, developed to screen, diagnose, and assess symptom severity or response to treatment8,9. However, before being considered adequate, an instrument must offer accurate, valid, and interpretable data for the population to be assessed10. Importantly, the quality of the information about a specific individual depends on the psychometric properties of the instrument used11. To reduce bias in the testing process, from application to interpretation of the results, psychological measures should be standardized8. This was the main reason for the creation of the Standards for Educational and Psychological Testing, as a basis for evaluating tests, testing practices, and the impact of test use12. Equally important, most instruments were developed in English-speaking countries13, making cultural and linguistic adaptation necessary prior to use in a new country, culture, and/or language, to reach equivalence between the original (source) and the newly adapted (target) versions of the questionnaire14.
When an established instrument is not available in a specific language or culture, researchers face a choice between two main strategies: developing a new, context-specific instrument or performing a cross-cultural adaptation of an existing, well-validated measure15. While the development of a novel instrument can ensure maximum cultural specificity, it is an extremely resource- and time-intensive process that may take years16. In contrast, the adaptation of an established 'gold-standard' instrument offers distinct advantages. This approach is often more efficient and, critically, it allows for the cross-cultural comparison of findings from different populations, which is a primary goal of adapting measures rather than creating new ones17.
The International Test Commission has developed guidelines for cross-cultural translation and adaptation of psychological instruments17. Translation can be considered the first stage of the adaptation process18, and can be conducted using one or both of the two most popular methods of test translation: (a) translation and back-translation, or (b) two independent translations that are compared by a third person19. In addition to an exact translation, the cultural adaptation process requires an adaptation step that maximizes semantic, idiomatic, experiential, and conceptual equivalence between the original measure and those developed from it14,20. Finally, the psychometric properties of a translated instrument should be evaluated in order to compare them with those of the original measure in the primary language20. Specifically, it is important to assess reliability and validity8,9,21, ensuring, respectively, that the instrument yields a consistent measurement and that it measures the intended construct22.
Reliability refers to the reproducibility of a test result when obtained at different times, in different settings, or by different interviewers, regarding coherence, stability, equivalence, and homogeneity23,24,25. It can be evaluated through several methods, including test-retest, alternate-forms, and split-half reliability, as well as internal consistency8,22,26, which determine whether the measures are sufficiently consistent and free from measurement error8. Although an instrument that is not reliable cannot be valid, a reliable instrument is not necessarily valid10. Validity is considered according to three categories27,28, namely content validity, construct validity, and criterion validity. Content validity concerns the extent to which a test adequately samples the dimension it is intended to measure22. Construct validity, including convergent and discriminant validity (sometimes referred to as divergent validity29), represents the degree to which the variance of the measure is linked with the variance of the underlying construct30,31. Criterion validity is based on relationships between test scores and an external criterion9 and should be assessed using another measure of the same construct, ideally a widely accepted measure that is considered the gold standard8,28. This category of validity is especially important to understand whether a measure can be used to make predictions and/or decisions about patients25, which is the case in establishing a diagnosis.
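Internal consistency, one of the reliability methods listed above, is most often summarized with Cronbach's α = k/(k − 1) · (1 − Σσ²ᵢ / σ²ₓ). Here k is the number of items, σ²ᵢ is the variance of item i, and σ²ₓ is the variance of the total score. The minimal Python sketch below computes α from item-level scores. It is an illustration only, not the authors' procedure (the study used SPSS and JASP). The function name and the item-by-respondent input layout are assumptions made for this example.

```python
from statistics import pvariance
from typing import Sequence


def cronbach_alpha(items: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha from item-level scores.

    Hypothetical layout: one inner sequence per item, where position j in
    every inner sequence is respondent j.
    """
    k = len(items)
    if k < 2:
        raise ValueError("alpha needs at least two items")
    n = len(items[0])
    if n < 2 or any(len(item) != n for item in items):
        raise ValueError("every item needs the same number (>= 2) of respondents")

    # Total score of each respondent, summed across items.
    totals = [sum(item[j] for item in items) for j in range(n)]
    total_var = pvariance(totals)
    if total_var == 0:
        raise ValueError("total scores have zero variance; alpha is undefined")

    # Population variances throughout: the n/(n - 1) correction cancels in the ratio.
    sum_item_var = sum(pvariance(item) for item in items)
    return (k / (k - 1)) * (1 - sum_item_var / total_var)
```

For a toy input of two items and four respondents, `cronbach_alpha([[1, 2, 3, 4], [2, 1, 4, 3]])` returns 0.75. These values are illustrative, not study data. Population variances are used for both the items and the total, so the choice of denominator does not affect the result.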
Numerous guidelines for the cross-cultural adaptation of psychometric instruments have been published to aid researchers in this complex process17,32. However, systematic reviews of this literature have highlighted a lack of a single, unified consensus on the best methodology to follow33. Furthermore, many existing guides, while valuable, may focus more on the initial linguistic translation than on the equally critical subsequent psychometric validation required to ensure an instrument is ethically sound for clinical use19. This creates a need for a detailed, replicable protocol that integrates both the adaptation and a comprehensive validation phase into a single, step-by-step framework.
Standardized research practices focusing on the validation of psychometric measures are thus essential. The method described in this paper will provide researchers and clinicians with a detailed protocol to perform cultural adaptation of a psychometric measure and, specifically, to assess criterion validity for the diagnosis of a psychiatric disorder. To help readers assess its applicability and to ensure replicability, the protocol includes key practical details, such as sample size considerations, the rationale for multi-session administration timings, and a discussion of known limitations. For that purpose, we will use, as an example, the validation study of the European Portuguese Yale-Brown Obsessive-Compulsive Scale-Second Edition (PY-BOCS-II)34, in which a similar protocol was used to clarify the factor structure and criterion validity of the PY-BOCS-II for the diagnosis of OCD in adults. Therefore, this protocol can also be used for future validation studies of Y-BOCS-II in other contexts or languages.
The procedures described here were developed to collect the data described by Castro-Rodrigues et al.34. The protocol was prepared in accordance with the Declaration of Helsinki, and participants were informed of the possibility of withdrawing from the study at any time. It was reviewed and approved by the Ethics Committees of the Champalimaud Foundation (approval granted on October 22, 2014) and Centro Hospitalar Psiquiátrico de Lisboa (approval granted on November 14, 2014). Use of this protocol for other projects or in other locations should be performed only after approval by local Ethics Committees and/or other competent authorities at that location. Specific examples regarding the Portuguese adaptation of the Y-BOCS-II34 are given to illustrate some of the steps, and specific instructions for the validation of the Y-BOCS-II for other languages/contexts are provided.
1. Selection of the scale of interest
2. Selection of other measures for assessment of psychometric properties of the scale
3. Translation and cultural adaptation of the primary instrument
4. Selection and recruitment of participants
5. Preparation and application of the test battery
6. Statistical analysis
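Step 4 (selection and recruitment of participants) involves sample-size considerations, and the Table of Materials lists G*Power for a priori power analysis. The sketch below gives a rough cross-check under one assumption: that the analysis being powered is a two-sided test of a Pearson correlation, such as a convergent validity correlation. Under that assumption, the classic Fisher z approximation gives n = ((z₁₋α/₂ + z₁₋β) / C)² + 3, where C = ½ ln((1 + r)/(1 − r)). This is an approximation, and G*Power's exact routines may differ by a participant or so. The expected r must be justified from prior literature; the code does not supply it. The function name is hypothetical.

```python
import math
from statistics import NormalDist


def n_for_correlation(r: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate N to detect an expected correlation r (two-sided test of rho = 0).

    Fisher z approximation; r is an assumed effect size from prior studies.
    """
    if not 0 < abs(r) < 1:
        raise ValueError("r must satisfy 0 < |r| < 1")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_beta = NormalDist().inv_cdf(power)           # quantile for the desired power
    c = math.atanh(abs(r))                         # Fisher z = 0.5 * ln((1 + r) / (1 - r))
    return math.ceil(((z_alpha + z_beta) / c) ** 2 + 3)
```

For example, `n_for_correlation(0.3)` returns 85 with α = 0.05 and 80% power. Larger expected correlations or lower power targets reduce the required N.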
Despite its gold-standard status and comprehensive structure, the criterion validity of Y-BOCS-II35 for diagnostic purposes had not been robustly established in the literature at the time the study was conducted. Therefore, this protocol aimed to address this general gap while performing the necessary cultural adaptation for Portugal. In this section, representative data from the validation study of the PY-BOCS-II34 are presented, with permission from the authors. The results are presented in two parts. First, we summarize the qualitative findings from the pilot testing phase, which informed the final version of the instrument. Second, we present the quantitative results regarding its criterion validity.
The cultural adaptation process included a pilot test with patients, followed by cognitive debriefing interviews to assess the clarity and comprehensibility of the adapted instrument34. While most participants found the instructions clear, the qualitative feedback highlighted several key areas for refinement. According to patients, issues included the instrument's length, discomfort with certain examples (reported by one participant), and the difficulty in quantifying the average daily time spent on symptoms due to their episodic nature. The interviewers involved in this process also provided critical feedback, noting that some questions did not flow smoothly, and identified practical formatting issues such as a lack of space for notes and the need for headers to be repeated on each page. This feedback led to adjustments, including minor wording and formatting changes, to produce the final version of the instrument used for the large-scale validation.
For the criterion validity analysis, we recruited a small sample of patients with a diagnosis of either OCD (n = 20) or a mood or anxiety disorder (n = 18). The PY-BOCS-II was administered by a researcher blinded to the diagnostic status and the results of other psychometric tests, to avoid criterion contamination. Receiver Operating Characteristic (ROC) curves were created to assess criterion validity, using the SCID-OCD as the gold standard for the discrimination between participants with OCD and those with other diagnoses. Figure 1 shows the ROC curve for the discrimination between patients with OCD and those with another mood or anxiety disorder, assessed in a blinded fashion. An area under the curve (AUC) of 0.93 (95% confidence interval [CI]: 0.84-1.00) was obtained. Further analysis of the ROC curve values demonstrated that a total PY-BOCS-II score of 13 points, when used as a cut-off for diagnosis, correctly identified OCD with a sensitivity of 90% and specificity of 94%. Given the modest sample size and case mix, these accuracy estimates should be replicated in larger cohorts to confirm generalizability.
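A confidence interval for an AUC can be approximated from summary values alone using the Hanley and McNeil (1982) standard error. The sketch below uses that textbook method, chosen here for illustration. The published interval may have been obtained with a different estimator in the statistical software used, so the two need not match exactly. The function name is hypothetical.

```python
import math
from statistics import NormalDist


def hanley_mcneil_ci(auc: float, n_pos: int, n_neg: int, level: float = 0.95):
    """Wald-type CI for an AUC using the Hanley & McNeil (1982) standard error.

    n_pos: cases with the target diagnosis; n_neg: comparison cases.
    """
    q1 = auc / (2 - auc)           # P(two random positives both outrank a random negative)
    q2 = 2 * auc ** 2 / (1 + auc)  # P(a random positive outranks two random negatives)
    var = (auc * (1 - auc)
           + (n_pos - 1) * (q1 - auc ** 2)
           + (n_neg - 1) * (q2 - auc ** 2)) / (n_pos * n_neg)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * math.sqrt(var)
    # The AUC is a probability, so the interval is clipped to [0, 1].
    return max(0.0, auc - half_width), min(1.0, auc + half_width)
```

With the reported AUC of 0.93 and group sizes of 20 and 18, `hanley_mcneil_ci(0.93, 20, 18)` gives a lower bound of about 0.845. The upper bound is capped at 1.0.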

Figure 1: Receiver Operating Characteristic curve for the diagnostic accuracy of the PY-BOCS-II in identifying OCD. This analysis includes data from patients who underwent blinded assessment, comprising a group with OCD (n = 20) and a group with mood and anxiety disorders (n = 18). The plot displays sensitivity (true positive rate) versus 1-specificity (false positive rate) across all possible cut-off scores of the PY-BOCS-II. The SCID-OCD was used as the gold-standard diagnostic tool. Abbreviations: AUC, Area under the curve; OCD, Obsessive-compulsive disorder; PY-BOCS-II, Portuguese Yale-Brown Obsessive-Compulsive Scale-II; ROC, Receiver operating characteristic. This figure was modified from Castro-Rodrigues et al.34.
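The ROC analysis in Figure 1 evaluates every possible cut-off. The sketch below shows the underlying computation from individual total scores in three parts:

- Sensitivity and specificity are computed at each observed threshold, where a score at or above the threshold counts as a positive result.
- The AUC is computed as the Mann-Whitney probability that a randomly chosen gold-standard case scores higher than a randomly chosen comparison patient, with ties counted as one half.
- The cut-off is selected by maximizing Youden's J = sensitivity + specificity − 1. This is one common rule, assumed here; the source does not state how the 13-point cut-off was selected.

The function names are hypothetical. Applying the sketch would require the individual PY-BOCS-II totals, which are not reported here.

```python
from typing import List, Sequence, Tuple


def roc_table(pos: Sequence[float], neg: Sequence[float]) -> List[Tuple[float, float, float]]:
    """(threshold, sensitivity, specificity) for every observed score.

    A participant screens positive when score >= threshold.
    pos: scores of gold-standard cases; neg: scores of the comparison group.
    """
    rows = []
    for t in sorted(set(pos) | set(neg)):
        sens = sum(s >= t for s in pos) / len(pos)
        spec = sum(s < t for s in neg) / len(neg)
        rows.append((t, sens, spec))
    return rows


def auc_mann_whitney(pos: Sequence[float], neg: Sequence[float]) -> float:
    """AUC as P(random case outscores random comparison), ties counted as 0.5."""
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def youden_cutoff(pos: Sequence[float], neg: Sequence[float]) -> Tuple[float, float, float]:
    """Row of roc_table maximizing Youden's J (the lowest threshold wins ties)."""
    return max(roc_table(pos, neg), key=lambda row: row[1] + row[2] - 1)
```

To draw the curve, plot sensitivity against 1 − specificity for each row of `roc_table`. The Mann-Whitney formulation gives the same value as the trapezoidal area under that empirical curve.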
Here, we describe a detailed protocol for the cultural adaptation and comprehensive psychometric validation of a psychiatric diagnostic instrument. The protocol begins with the selection of the measure and then details the necessary experimental procedures and statistical analyses. The primary purpose of the protocol is to present a clear and standardized step-by-step procedure to adapt and validate a psychological measure, namely the Y-BOCS-II34, thereby minimizing confounding factors and undesired variability in clinical and research use. The methods focus on cultural adaptation and psychometric analysis, including criterion validity, both of which are essential when using an instrument in a new country or context with diagnostic intent13,17,19.
The Y-BOCS-II34, a clinician-administered interview for adults, allows for a detailed assessment of OCD. The instrument comprises two main parts: a 67-item Symptom Checklist to identify and classify current and past obsessions, compulsions, and avoidance behaviors, and a 10-item Severity Scale to rate the severity of those symptoms. Despite its gold-standard status and comprehensive structure, its criterion validity for diagnostic purposes had not been robustly established in the literature at the time the study was conducted. Therefore, the protocol aimed to address this broader gap while performing the necessary cultural adaptation for Portugal.
A critical step of the protocol is the rigorous cultural adaptation procedure, performed in line with existing guidelines and evidence-based standards. Equally important is maintaining rater blinding to diagnostic status when applying the psychometric measure, as knowledge of diagnosis can bias outcomes and compromise estimates of diagnostic accuracy53,54. This is particularly relevant for structured interviews, as in our example34. While compliance with these steps is essential, some features may vary by study (e.g., sample size, item distribution, measurement context, and the attainability of the construct)55. In addition, qualitative pilot testing using semi-structured cognitive debriefing with a standardized guide and team review of recurring themes provides actionable evidence for refinement (see Protocol steps 3.2.2-3.2.4). A practical example emerged during the adaptation of item 44: replacing "spouse" with "family member" ensured the instrument captured culturally appropriate reassurance-seeking targets in Portuguese contexts34.
Beyond translation quality and blinding, comprehensive validation requires principled quantitative assessment of reliability (internal consistency and test-retest stability), construct validity (e.g., factor structure), and criterion validity against an external gold standard15, following established principles and international guidance, such as COSMIN59. For example, for construct validity of the PY-BOCS-II34, convergent validity was examined against the Coimbra Obsessive-Compulsive Inventory (COI; Inventário Obsessivo de Coimbra)56, a Portuguese self-report measure with "frequency" and "emotional distress" subscales. While general guidelines for cross-cultural adaptation offer a useful foundation32, challenges such as overly literal translations and limited stakeholder involvement can compromise the final instrument's validity57. In the absence of a single consensus methodology33, the present protocol provides a transparent, step-by-step framework. Its advantages include mandatory qualitative pilot testing with the target population and blinded-rater assessment to minimize criterion contamination53. It also provides explicit guidance for comprehensive psychometric validation, to ensure that the instrument is ethically sound for clinical use19. By distinguishing adaptation from validation33, the protocol is designed to yield a psychometrically sound instrument.
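Convergent validity, as in the comparison of the PY-BOCS-II with the COI, is typically quantified as a correlation between total scores. The source does not specify the coefficient used. The sketch below assumes Spearman's ρ, a common choice for rating-scale totals. It is implemented with average ranks for ties so that only the standard library is needed. The function names are hypothetical.

```python
import math
from statistics import mean
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the last sorted position holding the same value as position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for pos in range(i, j + 1):
            ranks[order[pos]] = shared_rank
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson product-moment correlation of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: the Pearson correlation computed on average ranks."""
    return pearson(average_ranks(x), average_ranks(y))
```

Because ρ depends only on rank order, any monotonic relationship yields ρ = 1. For example, `spearman([1, 2, 3, 4], [1, 4, 9, 16])` returns 1.0. Pearson's r on the same data would be below 1.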
The methodology has been applied successfully in the study by Castro-Rodrigues et al.34 to assess criterion validity of the PY-BOCS-II clinician-administered interview for diagnosis of OCD. However, the framework is applicable to other formats (e.g., self-report scales and screening questionnaires). For these measures, qualitative pilot testing is paramount to ensure items are unambiguously understood in the absence of a clinician58. Indeed, we have used variations of these methods for other measures and objectives: the Power of Food Scale60 and Yale Food Addiction Scale61 (reliability and construct validity), and instruments in oncology settings62. Lemos et al.63 adapted the Perceived Ability to Cope with Trauma Scale, and Almeida et al.64 adapted the Family Resilience Questionnaire-Short Form (FaRE-SF-P). Both incorporated McDonald's ω alongside Cronbach's α to provide robust internal consistency estimates, especially when tau-equivalence is not met65. This consistent methodological approach supports the efficient development of a comprehensive psychometric framework within a given population.
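For a single-factor (congeneric) model, McDonald's ω = (Σλᵢ)² / ((Σλᵢ)² + Σθᵢ). Here λᵢ are the factor loadings and θᵢ are the residual (error) variances, for example from a confirmatory factor analysis such as one run in JASP. Unlike α, ω does not require equal loadings (tau-equivalence). The sketch below assumes a standardized solution when no error variances are supplied, setting θᵢ = 1 − λᵢ². The function name is hypothetical.

```python
from typing import Optional, Sequence


def mcdonald_omega(loadings: Sequence[float],
                   error_variances: Optional[Sequence[float]] = None) -> float:
    """McDonald's omega for a one-factor model with uncorrelated errors."""
    if error_variances is None:
        # Standardized solution assumed: every item has unit variance.
        error_variances = [1 - lam ** 2 for lam in loadings]
    if len(error_variances) != len(loadings):
        raise ValueError("need one error variance per loading")
    common = sum(loadings) ** 2  # total-score variance due to the common factor
    return common / (common + sum(error_variances))
```

With equal standardized loadings, the model is tau-equivalent and ω coincides with standardized α. For example, four loadings of 0.7 give ω = 7.84/9.88 ≈ 0.794, which equals standardized α for an inter-item correlation of 0.49. When loadings differ, α generally falls below ω. This is why the two are reported together.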
Our criterion-validity protocol has also been effective across clinical contexts. For the Hypomania Checklist-32 (HCL-32)66, a similar validation protocol was used, with a simplified adaptation process because the measure was already available in Brazilian Portuguese and only required adaptation to European Portuguese67. The design for that project emphasized screening for bipolar spectrum disorders rather than diagnostic confirmation. More recently, Almeida et al.68 evaluated the criterion validity of the BDI-II for measuring depression severity in patients with cancer, highlighting how overlapping somatic symptoms can affect diagnostic accuracy.
These applications illustrate the protocol's adaptability across measure types (structured interviews, self-reports), constructs (psychiatric symptoms, appetite-related constructs, mood symptoms, coping, family resilience), and intended uses (diagnosis, screening, severity, psychological resources). With appropriate adjustments to address construct- or population-specific challenges, the protocol can be extended to large-scale screening in primary care and to diverse psychiatric diagnoses. It is also relevant to vulnerable populations where developmental, cognitive, or social factors can influence validity, and to digital health, where mobile-based assessments and digital therapeutics require culturally sensitive validation.
Implementers of this protocol may encounter practical challenges. During translation, if consensus is difficult, involving a senior independent mediator is recommended69. Slow recruitment at a single site can be mitigated through multicentre collaboration70. For culturally specific content, conceptual adaptation should be prioritized over literal translation, followed by rigorous pilot testing14,58. Ethical safeguards are also critical: evaluating capacity to consent, using a legally authorized representative when appropriate, and employing standardized tools (e.g., MacCAT) can support informed participation among individuals with moderate to severe symptoms47,48.
Limitations include the protocol's resource intensity (time, funding, bilingual experts, trained raters), which may challenge feasibility in low-resource settings. Criterion validation further depends on the availability of a well-established gold standard in the target culture. When such a benchmark is lacking, consensus diagnosis by independent experts is a viable, though demanding, alternative.
In conclusion, this protocol combines multiple empirically supported methods to culturally adapt and validate psychometric instruments for diagnostic use in psychiatry. Its successful applications across diverse instruments show its utility in generating psychometrically sound tools for clinical and research settings across cultural and linguistic contexts.
Albino J. Oliveira-Maia was investigator or national coordinator for Portugal of trials for depression, sponsored by Compass Pathways (EudraCT number 2017-003288-36) and Janssen-Cilag (EudraCT numbers 2019-002992-33, 2022-000439-22, 2022-000430-42); is recipient of a grant from Schuhfried for norming and validation of cognitive tests; has received payment, honoraria, consultancy fees or support for attending meetings and participating in advisory boards from MSD Portugal, Neurolite AG, Janssen-Cilag, the European Monitoring Centre for Drugs and Drug Addiction, Bioprojet Pharma and NaturalX Health Ventures; is Vice President of the Portuguese Society for Psychiatry and Mental Health; is head of the Psychiatry Working Group for the National Board of Medical Examination at the Portuguese Medical Association and Portuguese Ministry of Health; is President of the Ethics Committee for the Portuguese Institute for Addictive Behaviors and Dependence; and is President of the Scientific Council of the Portuguese Obsessive Compulsive Disorder Foundation. None of the aforementioned agencies had a role in the preparation, review, or approval of the manuscript or in the decision to submit the manuscript for publication.
This work received funding from the European Union’s Horizon research and innovation programme (PsyPal; grant agreement no. 101137378).
| Name | Company | Catalog number / reference | Comments |
|---|---|---|---|
| Software | | | |
| G*Power | Faul, F., et al. | www.gpower.hhu.de / Faul et al. (2007) | Software for a priori statistical power analysis to determine sample size. |
| IBM SPSS Statistics for Windows, Version 25.0 | International Business Machines (IBM) | IBM SPSS Statistics Corp. Released 2017 | Statistical software used to perform the statistical analyses. |
| JASP Statistical software | JASP Team | Version 0.95 | Open-source software used for confirmatory factor analysis. |
| Microsoft Excel | Microsoft | Office 365 Personal | Useful for creating a provisional database of potential participants. |
| Microsoft Word | Microsoft | Office 365 Personal | Used to write the informed consent form and other study documents. |
| Random.org | https://www.random.org | Not applicable | Used for the randomization process. |
| Interview & Psychometric Instruments | | | |
| Beck Depression Inventory-II (BDI-II) | Pearson | Beck et al. (1996), Manual for the BDI-II | Self-report instrument for discriminant validity (depression). |
| Mini-International Neuropsychiatric Interview (MINI) | Sheehan, D.V., et al. | Sheehan et al. (1998), J Clin Psychiatry | Brief interview to assess comorbid disorders and exclusion criteria. |
| State-Trait Anxiety Inventory - Form Y (STAI-Y) | Mind Garden | Spielberger, C.D. (1983), Manual for the STAI | Self-report instrument for discriminant validity (anxiety). |
| Structured Clinical Interview for DSM-IV, OCD Subscale (SCID-OCD) | American Psychiatric Press | First et al. (2002), SCID-I/P | Gold-standard interview for the criterion diagnosis of OCD. |
| Yale-Brown Obsessive-Compulsive Scale - Second Edition (Y-BOCS-II) | Goodman, W.K., et al. | Storch et al. (2010), Psychol Assess | The primary instrument for the adaptation and validation protocol. |
| Study Documents & Other Materials | | | |
| Informed Consent Form | Developed for the study | Not available | Document outlining study procedures, signed by all participants. |
| Paper, Printer, Pencil | N/A | N/A | For printing and completing physical copies of the assessments. |
| Semi-structured Cognitive Debriefing Interview Guide | N/A | Available from authors upon request | Standardized guide used during the pilot test to collect qualitative feedback on the adapted instrument. |
| Telephone | N/A | N/A | To contact, screen, and schedule participants. |