A Methodology for Capturing Joint Visual Attention Using Mobile Eye-Trackers
JoVE Journal
Behavior
Author Produced
12:39 min
January 18, 2020
Transcript

Automatically generated

Hi, my name is Bertrand Schneider and I’m an assistant professor at the Harvard Graduate School of Education. In this video I’m going to show you how we can use mobile eye-trackers to capture a central construct in social science, joint visual attention. Joint visual attention has been extensively studied by psychologists and has been found to be closely correlated with the quality of the interactions between group members.

It turns out that when people build common ground and create a shared understanding of a task, they tend to frequently look at the same place at the same time. Traditionally, researchers have studied joint visual attention qualitatively by manually coding videos. I’m going to show you how we can use mobile eye-trackers to get a quantitative measure of this construct in co-located settings.

In this video we will be using the Tobii Pro Glasses 2. These glasses are wearable eye-trackers that can capture eye movements in real-world environments. In addition to specialized cameras on the frame to track eye movements, the device is also equipped with a full-HD scene camera and a microphone, so gaze behavior can be visualized within the context of the wearer’s visual field.

These glasses capture gaze 50 times per second, and a live video feed from the glasses can be streamed to a computer either wirelessly or through an ethernet cable. The glasses do have one limitation, though: they cannot be worn over regular eyeglasses. The procedure to set up the eye-tracker is relatively straightforward.

First, the participants will be asked to put on the eye-tracking glasses as they would a normal pair of glasses. Based on the participants’ distinct facial features, nose pieces with different heights may need to be used to preserve data quality. After turning on the eye-tracker, the participants should clip the data collection unit to their person to allow for unrestricted body movement.

The Tobii Pro Glasses Controller software should be opened, and the participants should be instructed to look at the center of the calibration marker provided by Tobii while the calibration function of the software is enabled. Once calibration is complete, recording can be started from within the software. After the recording session is complete, terminate the recording from the Tobii software before instructing the participant to remove the eye-tracking glasses and the data collection unit.

And then, turn off the unit. Data can be extracted with another piece of software, Tobii Pro Lab, by removing the SD card from the data collection unit and inserting it into the computer. Tobii Pro Lab is able to import all of the recording sessions stored on the SD card at the same time.

Files can then be processed within Tobii Pro Lab to generate videos and different visualizations, or exported as tab-separated values (TSV) files for further analysis. Here you can see the raw eye-tracking data from a study we recently conducted, where two participants were learning to program a robot. On each side you can see the video stream generated by each eye-tracker with the location of the participant’s gaze.
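
Readers who want to process these TSV exports programmatically can load them with standard data analysis tools. The sketch below, in Python, is one way to do this; the file name and column names ("Recording timestamp", "Gaze point X", "Gaze point Y") are assumptions that should be checked against your own export settings.

# Minimal sketch (assumed file and column names): load a Tobii Pro Lab TSV export.
import pandas as pd

gaze = pd.read_csv("participant1.tsv", sep="\t")
gaze = gaze[["Recording timestamp", "Gaze point X", "Gaze point Y"]].dropna()
print(gaze.head())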

As you can see, it is impossible to tell if they’re looking at the same place at the same time, because the point of view of each participant is different. Additionally, data recording may start at different times. This means that the data needs to be synchronized temporally and spatially.

I’m going to show you how to address these two issues in this video. First, I’m going to describe a procedure to synchronize the data temporally. For the first participant you have a certain number of video frames.

Some of them are before or after the actual experimental task, like the first frame, where the experimenter is calibrating the eye-tracker. Similarly, for the other participant you have the same kind of data.

It is not shown here but each frame of the video is also associated with an x and y coordinate that represents the gaze of each participant. To synchronize the data, we briefly show a fiducial marker on the computer screen before and after the experimental task. By using a computer vision algorithm, we can detect when this marker is presented to each participant, which allows us to trim and align the data.
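
As a rough illustration of this step, the sketch below uses OpenCV’s ArUco module (from opencv-contrib-python, 4.7+ API) to find the first frame of each scene video in which a given marker appears; the marker dictionary, marker id, and file names are assumptions for illustration only.

# Minimal sketch: find the first scene-video frame in which a fiducial
# (ArUco) marker becomes visible, so both recordings can be trimmed to a
# common starting point. Dictionary, marker id, and file names are assumed.
import cv2

def first_marker_frame(video_path, marker_id=0):
    detector = cv2.aruco.ArucoDetector(
        cv2.aruco.getPredefinedDictionary(cv2.aruco.DICT_4X4_50))
    cap = cv2.VideoCapture(video_path)
    frame_idx = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            return None  # marker never detected
        corners, ids, _ = detector.detectMarkers(frame)
        if ids is not None and marker_id in ids.flatten():
            return frame_idx
        frame_idx += 1

offset1 = first_marker_frame("participant1_scene.mp4")
offset2 = first_marker_frame("participant2_scene.mp4")
# Trim each gaze stream so that these frames become a shared time zero.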

So, this is one way of dealing with data synchronization issues. In the next part, we are going to look at the second issue: how to synchronize the data spatially. As mentioned earlier, the data is coming from each eye-tracker in the form of a video feed associated with the location of each participant’s gaze, here in blue and green.

While the x and y coordinates might be the same for both participants, it doesn’t mean that they’re looking at the same place, because they’re looking at the screen from two different perspectives. One way of solving this issue is to build an image of the experimental setup that will serve as a reference, onto which we’re going to remap the location of each participant’s gaze. This allows us to detect, for each frame of the eye-tracking video, whether the participants are looking at the same place at the same time.

But how do we remap these coordinates onto the image on the left? We are going to use the same computer vision algorithm that allowed us to synchronize the data earlier. By applying it to each frame of the video recordings, we can now detect the location of the fiducial markers from the perspective of the participants.

This allows us to connect them to the same markers on the reference image on the left. Knowing the coordinates of this shared set of points, we can infer the location of each person’s gaze using a mathematical operation known as a homography. By applying this procedure to each frame, we can generate a video to make sure that the homography worked.
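
As a sketch of this remapping step (the coordinate values below are placeholders, not data from the study), matching marker corners between one scene-camera frame and the reference image lets us estimate a homography with OpenCV and project a gaze point through it.

# Minimal sketch: remap a gaze point from a participant's scene-camera
# frame into the reference image via a homography estimated from matched
# fiducial-marker corners. All coordinate values are placeholders.
import cv2
import numpy as np

scene_pts = np.array([[120, 80], [400, 95], [410, 300], [130, 310]], dtype=np.float32)
ref_pts = np.array([[50, 50], [550, 50], [550, 350], [50, 350]], dtype=np.float32)

H, _ = cv2.findHomography(scene_pts, ref_pts)

gaze_scene = np.array([[[260.0, 190.0]]], dtype=np.float32)  # (x, y) in the scene frame
gaze_ref = cv2.perspectiveTransform(gaze_scene, H)[0, 0]
print(gaze_ref)  # gaze location in reference-image coordinates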

On the right side you can see the video recording of each participant with the location of their gaze in blue and green. The same fiducial markers are connected with a white line between the image on the left and the participant’s point of view on the right hand side. The remapped gazes are shown on the left and they turn red when there is some joint visual attention.

Generating this video is an important step toward making sure that the data is clean and that the homography was performed correctly. Additionally, there are two other visualizations that can be produced to sanity check the data. The first visualization is a heat map.

For each participant we can plot each gaze point on the image of the experimental setup. This ensures that the homography worked correctly and allows us to categorize these gaze points into different areas of interest. Here, for example, we can see that most of the time was spent looking at the computer screen and very little time was spent looking at the cheat sheets.
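
One way to produce such a heat map, sketched below with placeholder data, is to bin the remapped gaze points into a two-dimensional histogram and overlay it on the reference image; the image path and the random gaze arrays are illustrative only.

# Minimal sketch: overlay a gaze heat map on the reference image.
# "reference.png" and the random points are placeholders; in practice
# xs and ys would be one participant's remapped gaze coordinates.
import matplotlib.pyplot as plt
import numpy as np

ref_img = plt.imread("reference.png")
h, w = ref_img.shape[:2]
xs = np.random.uniform(0, w, 500)
ys = np.random.uniform(0, h, 500)

heat, _, _ = np.histogram2d(ys, xs, bins=80, range=[[0, h], [0, w]])
plt.imshow(ref_img)
plt.imshow(heat, cmap="hot", alpha=0.5, extent=[0, w, h, 0])
plt.axis("off")
plt.savefig("heatmap_participant1.png", dpi=200)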

The second visualization is called a cross-recurrence graph. Cross-recurrence graphs allow us to visualize eye-tracking data for a pair of participants. Time for the first participant is displayed on the x-axis, and time for the second participant is displayed on the y-axis.

Black squares mean that both participants are looking at the same place, white squares represent missing data, and gray squares represent moments when participants are looking at different locations. Black squares along the diagonal mean that they’re looking at the same place at the same time. Black squares off the diagonal mean that participants are looking at the same place but at different times.

On the left, you can see a dyad with high levels of joint visual attention. In the middle, a dyad with low levels of joint visual attention. On the right side, a group with a lot of missing data.
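
A cross-recurrence matrix of this kind can be computed directly from the two aligned, remapped gaze streams. The sketch below codes each cell as same place, different place, or missing, using a 100-pixel threshold as an example; the synthetic input arrays are placeholders.

# Minimal sketch: cross-recurrence matrix for two aligned gaze streams.
# Cells are coded 2 = same place (black), 1 = different place (gray),
# 0 = missing data (white). Input arrays here are synthetic placeholders.
import numpy as np
import matplotlib.pyplot as plt

def cross_recurrence(gaze_a, gaze_b, threshold=100.0):
    # gaze_a, gaze_b: arrays of shape (T, 2), NaN where data is missing
    dist = np.linalg.norm(gaze_a[:, None, :] - gaze_b[None, :, :], axis=-1)
    rec = np.where(dist <= threshold, 2, 1)
    rec[np.isnan(dist)] = 0
    return rec

rng = np.random.default_rng(0)
gaze_p1 = rng.uniform(0, 800, (300, 2))
gaze_p2 = gaze_p1 + rng.normal(0, 80, (300, 2))  # loosely coupled partner

rec = cross_recurrence(gaze_p1, gaze_p2)
plt.imshow(rec.T, cmap="gray_r", origin="lower")
plt.xlabel("Participant 1 time (frames)")
plt.ylabel("Participant 2 time (frames)")
plt.savefig("cross_recurrence.png", dpi=200)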

By performing these sanity checks, you can make sure that you have correctly synchronized and remapped your data onto a common image of the experimental setup. These steps are critical and need to be performed before any analysis takes place. Lastly, there are two parameters that need to be chosen before computing a measure of joint visual attention.

The first parameter is the time window within which two participants can look at the same location. Previous work by Richardson and Dale established that it can take up to two seconds for participants to disengage from what they’re doing to pay attention to what their partner is doing. Thus, we determined that there is joint visual attention if two participants have looked at the same place within a plus or minus two-second window.

The second parameter is the minimum distance between two gaze points for them to qualify as joint visual attention. This distance is context-dependent and needs to be defined by researchers depending on the task administered and their research questions. For some tasks, the distance might be short.

Here, for example, we used 100 pixels. For other tasks this distance might be larger. Next, I’m going to present some results found using this methodology.
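
Before turning to those results, here is a minimal Python sketch of how such a joint visual attention measure could be computed from two temporally aligned, remapped gaze streams, assuming the 50 Hz sampling rate of the glasses and the example parameters above; the function and variable names are illustrative.

# Minimal sketch: fraction of frames showing joint visual attention,
# using a +/- 2 second window and a 100-pixel distance threshold.
# gaze_a and gaze_b are (T, 2) arrays of remapped gaze, NaN when missing.
import numpy as np

def joint_attention_ratio(gaze_a, gaze_b, fps=50, window_s=2.0, max_dist=100.0):
    w = int(window_s * fps)
    jva = np.zeros(len(gaze_a), dtype=bool)
    for t in range(len(gaze_a)):
        lo, hi = max(0, t - w), min(len(gaze_b), t + w + 1)
        dists = np.linalg.norm(gaze_b[lo:hi] - gaze_a[t], axis=1)
        if dists.size and not np.all(np.isnan(dists)):
            jva[t] = np.nanmin(dists) <= max_dist
    return jva.mean()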

After you get an estimate of the amount of joint visual attention in the group, you can correlate this measure with other variables of interest. For example, in this work we correlated this measure with a rating scheme developed in the learning sciences that captures a group’s quality of collaboration. For each group we assigned a score on the nine dimensions presented here.

For example, how well people sustained mutual understanding or how easily they reached a consensus. These scores had to achieve acceptable inter-rater reliability with a second coder. Finally, we can also aggregate those scores into one general metric that approximates collaboration quality for each group.

A result that has been found in our work as well as in other studies is that joint visual attention is significantly correlated with collaboration quality, as measured by the rating scheme presented earlier. Groups that are rated highly using this coding scheme tend to have more joint visual attention than groups that received low scores. This shows that productive interactions are oftentimes associated with more joint visual attention.

On the next slide I’m going to show you another result that builds upon this finding. So, one advantage of having fine-grained eye-tracking data is that we can extract other measures of joint visual attention. For example, we can compute who initiated and who responded to an offer of joint visual attention.

In particular, on the x-axis of this graph, a score of zero means an equal distribution of these behaviors and a score of one means that one person was always responding or initiating moments of joint visual attention. In this study we found an inverse correlation with learning gains, shown on the y-axis, as measured by pre- and post-tests. Groups where the same person was consistently initiating or responding to moments of joint visual attention were less likely to learn, and groups where this responsibility was equally shared were more likely to score higher on the post-test when controlling for scores on the pre-test.
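
One way such an imbalance score could be computed is sketched below, assuming each joint attention episode has already been labeled with its initiator; this is only an illustration, not necessarily the exact definition used in the paper.

# Illustrative sketch (not necessarily the paper's exact definition):
# given joint attention episodes labeled with who initiated them
# ("A" or "B"), return 0 for a perfectly equal split and 1 when one
# person always initiated.
def initiation_imbalance(initiators):
    n_a = sum(1 for who in initiators if who == "A")
    n_b = len(initiators) - n_a
    total = n_a + n_b
    return abs(n_a - n_b) / total if total else 0.0

print(initiation_imbalance(["A", "B", "A", "A", "B", "A"]))  # 0.33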

In this video I’ve presented a methodology that helps researchers synchronize mobile eye-tracking data both temporally and spatially. The findings suggest that dual eye-tracking data can provide indicators of collaboration by computing measures of joint visual attention. Additionally, I’ve presented results showing that we can go beyond simple measures of joint attention, for example by looking at who initiated or responded to an episode of joint visual attention.

We found that this measure was related to other outcome measures, such as learning gains. Computing this kind of measure would not be possible without eye-tracking data. In summary, we found that the methodology presented in this video can help researchers gain new insights into collaborative processes.

Thank you so much for watching this video and feel free to refer to the paper for more information.

Summary

Automatically generated

Using multimodal sensors is a promising way to understand the role of social interactions in educational settings. This paper describes a methodology for capturing joint visual attention from colocated dyads using mobile eye-trackers.
