Here, we provide a method for the reliable detection of squint at high temporal resolution using DeepLabCut. We optimized training parameters, and we provide an evaluation of this method's strengths and weaknesses (Figure 1).
After training our models, we verified that they correctly estimated the top and bottom points of the eyelid (Figure 2), which serve as the coordinate points for the Euclidean distance measure. Euclidean distance is defined here as the average of the distances between the paired top and bottom points of the eye. Our model detected instances of both non-squint (Figure 2A) and squint (Figure 2B). The blue dots indicate the points used to determine the Euclidean distance for each frame. The green, yellow, orange, and purple dots help the model correctly estimate the Euclidean distance and lower the likelihood value when the head is in a suboptimal position (i.e., they account for head movement and changes in position across sessions). We then validated the accuracy of the model using several different methods.
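As an illustration of the distance measure described above, the following is a minimal sketch and not the analysis code used in this study. It assumes that each top eyelid point is paired with one bottom eyelid point; the function name `eyelid_distance` and the example coordinates are hypothetical.

```python
import math


def eyelid_distance(top_pts, bottom_pts):
    """Average Euclidean distance over paired top/bottom eyelid points.

    top_pts and bottom_pts are equal-length lists of (x, y) pixel
    coordinates; pair i is (top_pts[i], bottom_pts[i]). The pairing
    itself is an assumption made for this sketch.
    """
    if len(top_pts) != len(bottom_pts) or not top_pts:
        raise ValueError("need the same non-zero number of top and bottom points")
    dists = [math.dist(t, b) for t, b in zip(top_pts, bottom_pts)]
    return sum(dists) / len(dists)


# Hypothetical coordinates (pixels) for one frame: two top/bottom pairs.
frame_top = [(100.0, 50.0), (120.0, 48.0)]
frame_bottom = [(100.0, 130.0), (120.0, 132.0)]
distance = eyelid_distance(frame_top, frame_bottom)
```

Applied to every frame of a video, a function like this would yield one distance value per frame, which is the continuous variable used in the rest of the analysis.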
To determine the ideal number of labeled frames for training, we trained and tested four models with different numbers of sample frames (Figure 3). We first compared the root mean square error (RMSE) values between the test and training data to assess how well the models predicted test data that they had not been trained on. This comparison showed that the variability between the manually labeled points and the model-labeled points leveled off after 300 frames. This trend tracked the reported average likelihood values, which also appeared to level off after 300 labeled frames. These likelihood values indicate how confident the model is that a given point was labeled correctly based on the training data, and we used them to filter out points with a likelihood below 0.92. We averaged the likelihood values for the points that contribute to the Euclidean distance metric to examine how well the models performed relative to one another. While there was no significant difference between 300 and 400 frames, we used 400 frames because the average likelihood exceeded 0.95, which is close to our threshold for manual filtering and aligns with the threshold used in similar pose-estimation models16.
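The likelihood filter and the error measure described above can be sketched as follows. This is a hedged illustration only: the 0.92 cutoff comes from the text, but the function names, the `(x, y, likelihood)` tuple layout, and the RMSE definition (root mean square of point-to-point pixel distances) are assumptions, and DeepLabCut's own evaluation output may be computed differently.

```python
import math

LIKELIHOOD_CUTOFF = 0.92  # cutoff stated in the text


def filter_by_likelihood(points, cutoff=LIKELIHOOD_CUTOFF):
    """Keep (x, y, likelihood) tuples whose likelihood is at least the cutoff."""
    return [p for p in points if p[2] >= cutoff]


def rmse(predicted, labeled):
    """Root mean square of pixel distances between paired (x, y) points.

    This is one common definition, assumed here for illustration.
    """
    if len(predicted) != len(labeled) or not predicted:
        raise ValueError("need the same non-zero number of points")
    squared = [math.dist(p, m) ** 2 for p, m in zip(predicted, labeled)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical model output for three frames of one tracked point.
tracked = [(10.0, 20.0, 0.99), (11.0, 21.0, 0.50), (12.0, 22.0, 0.92)]
kept = filter_by_likelihood(tracked)

# Hypothetical model-labeled versus manually labeled coordinates.
error = rmse([(3.0, 4.0), (0.0, 0.0)], [(0.0, 0.0), (0.0, 0.0)])
```

In this sketch, points exactly at the cutoff are retained, matching the description of filtering points that were less than 0.92.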
We also validated the accuracy of the model with a confusion matrix comparing manually annotated frames to DeepLabCut (DLC)-labeled frames. Two blinded individuals manually annotated 300 frames of the same eye in eight videos. Using the manually scored data as the ground truth, we constructed a confusion matrix of true and false positives and negatives (Figure 4). For DLC, a frame was scored as positive for squint when the Euclidean distance was less than 75 pixels (i.e., the animal squints) and as negative when it was greater than 75 pixels (i.e., the animal does not squint). We found a positive predictive value of 96.96%, which is the percentage of DLC-predicted squint frames that were also manually annotated as squint, and a negative predictive value of 99.66%, which is the percentage of DLC-predicted non-squint frames that were also manually annotated as non-squint. We also found a true positive rate of 98.1% and a true negative rate of 99.46%, which are the percentages of manually annotated squint and non-squint frames, respectively, that the model labeled correctly. Our Matthews correlation coefficient (MCC), which summarizes the correlation between observed and predicted values, was 93.8%.
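The metrics reported above follow from the four confusion matrix counts. The sketch below shows the standard formulas under the 75-pixel threshold stated in the text; the function names and the example values are hypothetical and are not data from this study.

```python
import math

SQUINT_THRESHOLD_PX = 75.0  # threshold stated in the text


def classify_squint(distance_px, threshold=SQUINT_THRESHOLD_PX):
    """Return True (squint) when the eyelid distance is below the threshold."""
    return distance_px < threshold


def confusion_counts(predicted, actual):
    """Count (TP, FP, FN, TN), with `actual` (manual scores) as ground truth."""
    tp = fp = fn = tn = 0
    for p, a in zip(predicted, actual):
        if p and a:
            tp += 1
        elif p and not a:
            fp += 1
        elif not p and a:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def summary_metrics(tp, fp, fn, tn):
    """Standard definitions; assumes no row or column of the matrix is empty."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "PPV": tp / (tp + fp),
        "NPV": tn / (tn + fn),
        "TPR": tp / (tp + fn),
        "TNR": tn / (tn + fp),
        "MCC": (tp * tn - fp * fn) / denom,
    }


# Hypothetical per-frame distances (pixels) and manual squint scores.
distances = [60.0, 70.0, 80.0, 90.0, 74.0, 76.0]
manual = [True, True, False, False, False, True]
predicted = [classify_squint(d) for d in distances]
counts = confusion_counts(predicted, manual)
metrics = summary_metrics(*counts)
```

Note that MCC is conventionally a coefficient between -1 and 1; multiplying it by 100 gives the percentage form used in the text.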
Once we were confident that our model reliably tracks squint, we compared this DLC method against a previously published squint tracking method using a preclinical migraine dataset14. We will refer to this other method as the "area squint model (ASM)" because it was developed using open eye area as the continuous variable measuring squint14. The area squint model utilizes trained facial detection software combined with a custom MATLAB script to analyze the mean pixel area of the eye while excluding frames with a tracking error rate of >15%14. One major limitation is that the "ASM" is not open source and, therefore, not widely accessible. DLC allows for increased optimization and adaptability without requiring a significant purchase of software and hardware.
We used a data set of 10 female and 10 male CD1 mice. Experimentally, all animals were acclimated in gentle restraints for 30 min over a total of 3 days prior to the start of recordings. Each animal was recorded for 5 min of baseline and then 5 min for treatment recordings. During treatment sessions, animals were treated with either PBS (vehicle) or 0.1 mg/kg CGRP (treatment) intraperitoneally to induce a migraine-like state. Data were collected in a well-lit room using cameras equipped with infrared light to illuminate the face, ensuring accurate landmark detection. The infrared camera included a Kowa LM35JC 2/3" 35 mm F1.6 manual iris C-mount lens with a focal distance of 254 mm and an appropriately adjusted aperture. After we collected the data, we utilized the ASM and DLC to analyze the data. Since manual scoring has been conventionally utilized in the field to quantify facial grimace, with squint being one component of the facial grimace14, we also compared our data to manually scored data.
Based on previous findings that peripheral injection of CGRP induces a squint response in mice, we expected to observe significant differences in squint response between vehicle and CGRP treatment6,14. We compared the ASM, manual, and DLC methods and found that our model robustly detected a squint phenotype, as did the manual and ASM methods (Figure 5). Notably, the ASM was previously used to assess CGRP-induced pain and squint. In that study, Rea et al. compared the squint response following CGRP to the squint response following formalin injection of the hind paw as a "more traditional" pain induction assay14. Moreover, CGRP-induced touch hypersensitivity in mice is well documented using von Frey testing3,17. Consistent with the field, we normalized the average squint during the treatment session to a 5 min pretreatment baseline for each animal and compared PBS (n = 10) versus CGRP-treated (n = 11) animals. Statistical analyses of the PBS versus CGRP-treated groups are as follows. CGRP-treated animals exhibited decreased mean pixel area using the area squint method of tracking (p = 0.012, Figure 5A) and decreased Euclidean distance both when manually scored (p = 0.0007, Figure 5B) and when using our DLC model (p = 0.007, Figure 5C). When we compared each method over time in a single representative animal, the same pattern was observed (Figure 5). This animal showed a very clear squint phenotype in response to CGRP treatment but not to PBS. All methods detected these differences, but the data were most clearly represented in our DLC model (Figure 5). Precise and accurate metrics are especially important when data must be analyzed at finer resolutions where averaging is not indicative of the complete behavioral readout (e.g., brain activity).
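The baseline normalization and group comparison described above can be sketched as follows. This is an illustrative sketch under stated assumptions, not the statistical pipeline used in this study: the text does not specify the type of t-test, so a pooled-variance (Student's) two-sample t statistic is assumed here, and all function names and values are hypothetical. Computing a p-value would additionally require the t-distribution, which is omitted to keep the example within the standard library.

```python
import math
import statistics


def normalize_to_baseline(treatment_values, baseline_values):
    """Ratio of the mean treatment value to the mean baseline value for one animal."""
    return statistics.fmean(treatment_values) / statistics.fmean(baseline_values)


def pooled_t(group_a, group_b):
    """Pooled-variance two-sample t statistic and its degrees of freedom.

    The choice of a Student's t-test is an assumption made for this sketch.
    """
    na, nb = len(group_a), len(group_b)
    df = na + nb - 2
    pooled_var = (
        (na - 1) * statistics.variance(group_a)
        + (nb - 1) * statistics.variance(group_b)
    ) / df
    se = math.sqrt(pooled_var * (1.0 / na + 1.0 / nb))
    t = (statistics.fmean(group_a) - statistics.fmean(group_b)) / se
    return t, df


# Hypothetical per-frame distances (pixels) for one animal.
ratio = normalize_to_baseline([60.0, 70.0], [100.0, 100.0])

# Hypothetical normalized values for two small groups of animals.
vehicle = [1.0, 1.1, 0.9]
treated = [0.7, 0.8, 0.6]
t_stat, dof = pooled_t(vehicle, treated)
```

With this formulation, the degrees of freedom equal the total number of animals minus two, so a reported statistic of the form t(18) corresponds to 20 animals across the two groups.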
The DLC method of detecting squint in mice allows us to collect data at a millisecond timescale and time-lock them to measures of brain activity (e.g., local field potentials), which also occur on a millisecond timescale. We can then use this technique to build a more robust profile of a brain state indicative of spontaneous pain in the context of migraine and other complex brain disorders.

Figure 1: Overview of the procedure for generating a trained network with DLC. General schematic of the process by which eye features of an animal are tracked and then analyzed using machine learning. Abbreviation: DLC = DeepLabCut.

Figure 2: Example of automated squint tracking in a representative CD1 mouse. (A) Example of a frame showing DLC tracking squint (colored dots) on the outline of the eye during the treatment day when the mouse is not squinting. (B) Example of a frame showing automated detection of squint on the treatment day, using our DLC model. Euclidean distance was measured as the average distance between B and C, the sets of blue dots on the top and bottom of the eye. The other points (green, yellow, orange, purple) are framing landmarks used both to help the model estimate the Euclidean distance points and to filter out suboptimal head positioning after data collection. Abbreviation: DLC = DeepLabCut.

Figure 3: Justification for the number of frames used to train the model. (A) Root mean square error analysis indicates the average distance between predicted and observed values for the test and training data sets. The training data set represents the frames sampled when training the model, and the test data set represents the non-training frames used to validate how well the model could identify similar but different images. We used five sets of training and test data and found that RMSE values leveled off around 300 frames for the test group. (B) The likelihood that a given point is correctly labeled (mean + SEM). This showed that 400 manually labeled frames were ideal because the raw data sets averaged above 0.95 likelihood while having an RMSE score closest to that of the training data. This meant the model was able to closely approximate the points it had been trained on while also reporting on most of the frames with a high likelihood. Abbreviation: RMSE = root mean square error.

Figure 4: Confusion matrix for DLC squint measurements. We sampled 300 s from eight videos (five CGRP and three PBS) and compared those points to a manually labeled binary yes or no score for squint. We quantified predicted values as those identified by DLC and actual values as those scored manually by a human, and we used this comparison to determine how often DLC correctly identified squint relative to the manual score. Abbreviations: DLC = DeepLabCut; CGRP = calcitonin-gene-related peptide; PBS = phosphate-buffered saline; TP = true positives; FP = false positives; FN = false negatives; TN = true negatives; PPV = positive predictive value; NPV = negative predictive value; TPR = true positive rate; TNR = true negative rate; MCC = Matthews correlation coefficient.

Figure 5: Squint phenotype across three different models for detecting squint. The top two rows show the same representative animal in each condition (PBS or CGRP) across the three models for detecting squint. The bottom row reflects averages across all animals. (A) There was a decrease in mean pixel area (mean overall pixel area/baseline) in CGRP-treated versus PBS-treated mice (t(18) = 2.805, p = 0.012) after processing all data using the previously published and validated area squint model14. (B) There was a similar response in manually scored data (t(18) = 4.064, p = 0.0007). (C) CGRP-treated mice showed a lower average eyelid-to-eyelid distance (treatment Euclidean distance/pretreatment baseline Euclidean distance) than PBS-treated mice (t(18) = 3.040, p = 0.007) when utilizing DLC to process all data. N = 20 (10 females, 10 males). Error bars indicate mean ± SEM.