JoVE Science Education
Experimental Psychology


Reliability in Psychology Experiments



Scientific research uses precise methods to collect data, yet variability in obtaining measurements often exists.

Reliability can be assessed for any experimental measurement, and today, we’ll have a look at measurements of inappropriate behaviors in cartoons.

When viewers agree on the amount of inappropriate material within the same show across multiple episodes, their judgments are considered highly reliable. This consistency between observers, referred to as inter-rater reliability, allows assessments to be extended across different shows.

This video demonstrates how to design and perform, as well as how to analyze and interpret, an experiment examining whether one cartoon has more inappropriate content than another.

To examine reliability and inter-rater reliability, this experiment uses a within-subjects design. Participants are asked to watch two episodes each of two different cartoons, SpongeBob SquarePants and Caillou.

Within this context of cartoon watching, the dependent variable is the number of inappropriate behaviors participants observe. These include: any crude and rude behaviors, bad language, verbal and physical aggression, and references to drugs and sexual content.

If reliability exists in the scoring of inappropriate content of a specific cartoon, participants will consistently rate that cartoon across different episodes.

Moreover, if multiple participants agree on the number of inappropriate instances they count, inter-rater reliability exists.

Thus, establishing inter-rater reliability allows researchers to use the same participants to more powerfully compare data between multiple conditions.

To conduct the study, prepare four clips: two different episodes from two different cartoons, SpongeBob SquarePants and Caillou.

To allow participants to systematically identify instances of inappropriate behavior, create a coding sheet with categories, concrete examples, and space to count each occurrence.
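As a rough illustration, a coding sheet of this kind could be represented in Python as follows. This is a minimal sketch, not the sheet used in the video: the category identifiers are assumptions derived from the behaviors listed earlier, and the `CodingSheet` class and its method names are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass, field

# Categories follow the behaviors named in the video; the identifier
# spellings are illustrative assumptions.
CATEGORIES = (
    "crude_or_rude_behavior",
    "bad_language",
    "verbal_aggression",
    "physical_aggression",
    "drug_reference",
    "sexual_content",
)


@dataclass
class CodingSheet:
    """One sheet per rater, cartoon, and episode (hypothetical structure)."""
    rater: str
    cartoon: str
    episode: int
    counts: Counter = field(default_factory=Counter)

    def record(self, category: str) -> None:
        """Tally one observed occurrence of the given category."""
        if category not in CATEGORIES:
            raise ValueError(f"unknown category: {category}")
        self.counts[category] += 1

    def total(self) -> int:
        """Total number of inappropriate behaviors tallied on this sheet."""
        return sum(self.counts.values())


# Example usage with made-up observations.
sheet = CodingSheet(rater="R1", cartoon="SpongeBob SquarePants", episode=1)
sheet.record("bad_language")
sheet.record("physical_aggression")
sheet.record("physical_aggression")
print(sheet.total())
```

Restricting entries to a fixed category list is one way to keep every rater working from the same scheme, so their counts can be compared directly.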

With the participant sitting in front of the screen, hand them four coding sheets. Instruct the participant to separately watch two episodes of SpongeBob SquarePants.

As the participant watches each episode, instruct them to identify every occurrence of inappropriate behavior.

Using the same coding scheme, instruct the participant to watch and rate two episodes of Caillou.

To analyze the reliability of participants’ ratings of cartoon content, compare the coding sheets across participants and across the different episodes of each cartoon. Sum all of the responses on a master sheet.
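The summing step might be sketched as below. The tallies are invented purely for illustration and are not the data from this study; the rater labels, the dictionary layout, and the function name `build_master_sheet` are likewise assumptions.

```python
# Hypothetical tallies invented for illustration; these are NOT the study's data.
# Key: (rater, cartoon, episode). Value: count per coding-sheet category.
sheets = {
    ("R1", "SpongeBob SquarePants", 1): {"bad_language": 2, "physical_aggression": 5},
    ("R1", "SpongeBob SquarePants", 2): {"bad_language": 1, "physical_aggression": 4},
    ("R1", "Caillou", 1): {"verbal_aggression": 1},
    ("R1", "Caillou", 2): {},
    ("R2", "SpongeBob SquarePants", 1): {"bad_language": 3, "physical_aggression": 5},
    ("R2", "SpongeBob SquarePants", 2): {"crude_or_rude_behavior": 2, "physical_aggression": 6},
    ("R2", "Caillou", 1): {"verbal_aggression": 1},
    ("R2", "Caillou", 2): {"verbal_aggression": 1},
}


def build_master_sheet(sheets):
    """Return {(cartoon, episode): {rater: total count}}."""
    master = {}
    for (rater, cartoon, episode), counts in sheets.items():
        # Collapse the per-category tallies into one total per sheet.
        master.setdefault((cartoon, episode), {})[rater] = sum(counts.values())
    return master


master = build_master_sheet(sheets)
for (cartoon, episode), totals in sorted(master.items()):
    print(f"{cartoon} episode {episode}: {totals}")
```

Grouping the totals by cartoon and episode puts every rater's count for the same clip side by side, which is the comparison the reliability analysis needs.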

Graph the total number of inappropriate behaviors for each rater across episodes and cartoons.

Note that high reliability was observed in the scoring of the two different cartoons, as SpongeBob is consistently scored higher than Caillou.

However, inter-rater reliability was stronger in the scoring of inappropriate content in Caillou than in SpongeBob. The reduced inter-rater reliability was most apparent in the scoring of Episode 2 of SpongeBob.
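The video does not name a specific statistic for these comparisons, so the following is only one possible sketch. It assumes two simple descriptive measures: the range of totals across raters for a clip (a smaller range suggests stronger inter-rater agreement), and the gap between a rater's totals for the two episodes of a cartoon (a smaller gap suggests more consistent scoring). The numbers are invented to mimic the pattern described above and are not the study's results.

```python
from statistics import mean

# Hypothetical master sheet: {(cartoon, episode): {rater: total}}.
# Values are made up for illustration only.
master = {
    ("SpongeBob", 1): {"R1": 12, "R2": 14, "R3": 13},
    ("SpongeBob", 2): {"R1": 10, "R2": 18, "R3": 13},
    ("Caillou", 1): {"R1": 2, "R2": 2, "R3": 3},
    ("Caillou", 2): {"R1": 1, "R2": 2, "R3": 2},
}


def rater_spread(totals_by_rater):
    """Range of totals across raters; smaller means closer agreement."""
    values = list(totals_by_rater.values())
    return max(values) - min(values)


def episode_gap(master, cartoon, rater):
    """Absolute difference between one rater's totals for episodes 1 and 2."""
    return abs(master[(cartoon, 1)][rater] - master[(cartoon, 2)][rater])


for (cartoon, episode), totals in master.items():
    avg = round(mean(totals.values()), 2)
    print(f"{cartoon} ep {episode}: mean={avg}, spread={rater_spread(totals)}")
```

A formal analysis would more likely use an established agreement statistic, such as an intraclass correlation, but the range is enough to show the idea of comparing raters on the same clip.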

Now that you are familiar with reliability in the context of content analysis, you can apply this approach to other areas of research. 

Many psychological experiments gather information using cognitive assessments and surveys, in which the reliability of each item must be consistent across participants.

Reliability in neurophysiological measures, such as EEG or eye tracking, is essential to conducting repeatable experiments. This reliability allows researchers to make associations between brain function and disease states across multiple subjects.

Additionally, researchers must ensure that certain measurements in an experiment are consistent over time. For example, weight must be measured reliably so that data taken before and after an exercise routine can be compared.

You’ve just watched JoVE’s introduction to determining reliability in psychological experiments. Now you should have a good understanding of how to quantify a psychological construct such as inappropriate behavior, design an experiment, and finally how to evaluate reliability from the results.

Thanks for watching! 
