Reliability in Psychology Experiments

JoVE Science Education
Experimental Psychology

Overview

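Source: Laboratories of Gary Lewandowski, Dave Strohmetz, and Natalie Ciarocco, Monmouth University

To study something scientifically, a researcher must decide how to quantify it. However, psychological constructs can be difficult to measure and quantify. This video examines reliability in the context of a content analysis.

A recent study in the journal Pediatrics found that 4-year-old children who watched a fast-paced cartoon performed worse on cognitive tasks, such as following the rules of a game, listening to an adult's instructions, and delaying gratification, than children who watched a slower-paced cartoon.¹ Beyond a cartoon's pace, its content may also have detrimental effects on young viewers.

This video uses a simple two-group design to illustrate issues of reliability by examining whether the cartoon SpongeBob SquarePants contains more inappropriate content than the cartoon Caillou.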

Procedure

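1. Define the key variables. Create an operational definition (i.e., a clear description of exactly what the researcher means by a concept) of inappropriate content. Refer to the definitions created by the organization TV Parental Guidelines and approved by the Federal Communications Commission. Inappropriate content includes crude or rude behavior (e.g., toilet humor), depictions of verbal or physical aggression (e.g., name-calling, hitting, etc.), bad …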

Results

The results indicate that the raters had a high level of agreement or consistency in their ratings within each cartoon episode, which indicates high inter-rater reliability (Figure 1). There is also reliability or consistency in SpongeBob SquarePants episodes having more inappropriate content than Caillou. The results also revealed individual biases amongst raters. For example, Rater 3 reported more inappropriate content in SpongeBob than the other 2 raters, and Rater 1 reported less in Caillou than other raters.
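The source does not say which statistic, if any, was used to quantify agreement. As a minimal sketch, one common option is the mean pairwise Pearson correlation between raters, paired with each rater's average deviation from the group mean to expose individual biases. The counts below are hypothetical placeholders, not the study's data, and the function names are illustrative.

```python
from itertools import combinations
from math import sqrt

# Hypothetical counts of inappropriate content (NOT the study's data).
# Clip order: SpongeBob ep. 1, SpongeBob ep. 2, Caillou ep. 1, Caillou ep. 2.
counts = {
    "rater_1": [12, 14, 1, 2],
    "rater_2": [13, 15, 3, 3],
    "rater_3": [16, 19, 3, 4],
}


def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def mean_pairwise_correlation(ratings):
    """Average Pearson correlation over every pair of raters."""
    pairs = list(combinations(ratings.values(), 2))
    return sum(pearson(x, y) for x, y in pairs) / len(pairs)


def rater_bias(ratings):
    """Each rater's average deviation from the clip-wise mean across raters."""
    n_clips = len(next(iter(ratings.values())))
    clip_means = [
        sum(r[i] for r in ratings.values()) / len(ratings) for i in range(n_clips)
    ]
    return {
        name: sum(r[i] - clip_means[i] for i in range(n_clips)) / n_clips
        for name, r in ratings.items()
    }


print(round(mean_pairwise_correlation(counts), 3))
print(rater_bias(counts))
```

A correlation captures consistency in how raters rank the clips, not absolute agreement, so a rater who counts systematically more than the others (a positive bias here) can still correlate highly with them. That is why the bias values are worth reporting alongside it.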

Figure 1. Instances of inappropriate content by rater and cartoon for episodes 1 (top) and 2 (bottom).

Applications and Summary

Researchers have increasingly turned their attention toward analyzing television’s content, especially as it relates to children. As noted in the overview, a recent study in the journal Pediatrics linked the fast pace of the SpongeBob SquarePants cartoon to relatively poor cognitive performance in the children who watched it.

Since the results of our experiment appear reliable, future research could examine whether the relative amount of inappropriate content in SpongeBob is also (or alternatively) responsible for children’s lower cognitive performance after watching.

One of the most important applications of reliability is in the use of survey instruments. Researchers must be sure that participants will consistently answer each of the items in a particular scale. That is, in a 5-item measure of life satisfaction, participants should answer items 1 and 2 in a somewhat similar fashion to how they answer questions 3, 4, and 5. In addition, researchers want to make sure that their measurements in an experiment are consistent over time. So if a researcher is using pupil dilation to indicate interest in a stimulus, the researcher must be sure that pupil dilation is a consistent indicator of interest.
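Consistency among the items of a scale is often summarized with Cronbach's alpha, although the source does not name a specific statistic. The sketch below assumes a 5-item measure scored 1 to 5 and uses hypothetical responses. The function name and the data are illustrative only.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of rows (one per participant, k item scores each).

    alpha = k / (k - 1) * (1 - sum of item variances / variance of total scores)
    """
    k = len(responses[0])
    items = list(zip(*responses))  # transpose rows into per-item columns
    sum_item_variances = sum(variance(column) for column in items)
    total_variance = variance([sum(row) for row in responses])
    return k / (k - 1) * (1 - sum_item_variances / total_variance)


# Hypothetical responses to a 5-item life satisfaction measure (NOT real data).
responses = [
    [4, 5, 4, 4, 5],
    [2, 2, 3, 2, 2],
    [5, 4, 5, 5, 4],
    [3, 3, 3, 2, 3],
    [1, 2, 1, 2, 1],
]

print(round(cronbach_alpha(responses), 3))
```

Values near 1 suggest that participants answer the items in a similar fashion. The same function applied to scores from two testing sessions would not measure stability over time, which is usually checked with a test-retest correlation instead.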

References

1. Lillard, A. S., & Peterson, J. The Immediate Impact of Different Types of Television on Young Children's Executive Function. Pediatrics. 128(4):644-9. doi: 10.1542/peds.2010-1919 (2011).

Transcript

Scientific research uses precise methods to collect data, yet variability in obtaining measurements often exists.

Reliability can be assessed for any experimental measurement, and today, we’ll have a look at measurements of inappropriate behaviors in cartoons.

When viewers agree on the amount of inappropriate material within the same show—across multiple episodes—their judgments are considered highly reliable. In this case, assessments can extend across different shows because of the consistency between observers, which is referred to as inter-rater reliability.

This video demonstrates how to design and perform, as well as how to analyze and interpret, an experiment examining whether one cartoon has more inappropriate content than another.

To examine reliability and inter-rater reliability, a within-subjects design is used in this experiment. Participants are asked to watch two episodes of two different cartoons—SpongeBob SquarePants and Caillou.

Within this context of cartoon watching, the dependent variable is the number of inappropriate behaviors participants observe. These include: any crude and rude behaviors, bad language, verbal and physical aggression, and references to drugs and sexual content.

If reliability exists in the scoring of inappropriate content of a specific cartoon, participants will consistently rate that cartoon across different episodes.

Moreover, if multiple participants are in agreement with the number of inappropriate instances they count, inter-rater reliability exists.

Thus, establishing inter-rater reliability allows researchers to use the same participants to more powerfully compare data between multiple conditions.

To conduct the study, prepare four clips: two different episodes from two different cartoons, SpongeBob SquarePants and Caillou.

To allow participants to systematically identify instances of inappropriate behavior, create a coding sheet with categories, concrete examples, and space to count each occurrence.
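A coding sheet of this kind could be represented in software roughly as follows. This is a sketch under assumptions: the category labels paraphrase the behaviors listed in this transcript, the examples are limited to those the source mentions, and the `CodingSheet` class and its methods are hypothetical names.

```python
from dataclasses import dataclass, field

# Category -> concrete example (left blank where the source gives none).
CATEGORIES = {
    "crude or rude behavior": "e.g., toilet humor",
    "bad language": "",
    "verbal or physical aggression": "e.g., name-calling, hitting",
    "drug references": "",
    "sexual content": "",
}


@dataclass
class CodingSheet:
    """One sheet per rater, cartoon, and episode, with a tally per category."""

    rater: str
    cartoon: str
    episode: int
    counts: dict = field(default_factory=lambda: {c: 0 for c in CATEGORIES})

    def record(self, category):
        """Add one occurrence to a category; reject categories not on the sheet."""
        if category not in self.counts:
            raise KeyError(f"Unknown category: {category}")
        self.counts[category] += 1

    def total(self):
        """Total number of inappropriate behaviors tallied on this sheet."""
        return sum(self.counts.values())


sheet = CodingSheet(rater="rater_1", cartoon="SpongeBob SquarePants", episode=1)
sheet.record("bad language")
sheet.record("verbal or physical aggression")
print(sheet.total())
```

Rejecting unknown categories keeps every rater working from the same fixed scheme, which is what makes their counts comparable.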

With the participant sitting in front of the screen, hand them four coding sheets. Instruct the participant to separately watch two episodes of SpongeBob SquarePants.

As the participant watches each episode, instruct them to identify every occurrence of inappropriate behavior.

Using the same coding scheme, instruct the participant to watch and rate two episodes of Caillou.

To analyze the reliability of participants’ ratings of cartoon content, compare the coding sheets across participants and across the different episodes of each cartoon. Sum all of the responses on a master sheet.

Graph the total number of inappropriate behaviors for each rater across episodes and cartoons.
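The summing and graphing steps might be sketched as below. The tallies are hypothetical placeholders, not the study's data. The function names are illustrative, and a plain-text bar chart stands in for whatever plotting tool is actually used.

```python
from collections import defaultdict

# Hypothetical tallies (NOT the study's data):
# (rater, cartoon, episode, category, count)
records = [
    ("rater_1", "SpongeBob SquarePants", 1, "bad language", 3),
    ("rater_1", "SpongeBob SquarePants", 1, "verbal or physical aggression", 5),
    ("rater_2", "SpongeBob SquarePants", 1, "verbal or physical aggression", 7),
    ("rater_1", "Caillou", 1, "crude or rude behavior", 1),
    ("rater_2", "Caillou", 1, "crude or rude behavior", 2),
]


def build_master_sheet(records):
    """Sum counts over categories for each (cartoon, episode, rater)."""
    totals = defaultdict(int)
    for rater, cartoon, episode, _category, count in records:
        totals[(cartoon, episode, rater)] += count
    return dict(totals)


def text_bar_chart(master):
    """Render one line per (cartoon, episode, rater), with one '#' per instance."""
    lines = []
    for (cartoon, episode, rater), total in sorted(master.items()):
        lines.append(f"{cartoon} ep.{episode} {rater}: {'#' * total} ({total})")
    return "\n".join(lines)


master = build_master_sheet(records)
print(text_bar_chart(master))
```

Grouping the bars by cartoon and episode puts the raters side by side, so both the difference between cartoons and any disagreement among raters are visible at a glance.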

Note that high reliability was observed in the scoring of the two different cartoons, as SpongeBob is consistently scored higher than Caillou.

However, stronger inter-rater reliability was found in the scoring of inappropriate content in Caillou compared to SpongeBob. Reduced inter-rater reliability was more obvious in the scoring of Episode 2 of SpongeBob.

Now that you are familiar with reliability in the context of content analysis, you can apply this approach to other areas of research.

Many psychological experiments gather information by utilizing cognitive assessments and surveys, in which each participant's responses to the related items of a scale must be consistent with one another.

Reliability in neurophysiological measures, such as EEG or eye tracking, is essential to conducting repeatable experiments. This reliability allows researchers to make associations between brain function and disease states across multiple subjects.

Additionally, researchers must ensure certain measurements in an experiment are consistent over time. For example, weight must be measured reliably to compare data before and after exercise routines.

You’ve just watched JoVE’s introduction to determining reliability in psychological experiments. Now you should have a good understanding of how to quantify a psychological construct such as inappropriate behavior, design an experiment, and finally how to evaluate reliability from the results.

Thanks for watching!