Research Article
Erratum Notice
An erratum has been issued for this article.
This protocol introduces vmTracking, a method that enables high-accuracy pose tracking in videos of multiple freely moving, markerless animals, even under crowded conditions where animals are densely grouped. This approach provides reliable data for analyzing social interactions that occur in semi-natural environments where animals can move freely.
In social behavior research using rodents, there is a growing demand for evaluating more natural interactive behaviors under freely moving conditions. Accurate pose tracking of multiple animals is essential for this purpose. However, current markerless multi-animal pose tracking tools face a significant challenge: tracking accuracy tends to decline under conditions of occlusion and crowding. This problem becomes especially pronounced when the animals are visually indistinguishable from one another. To overcome this issue, we developed virtual marker tracking (vmTracking), a method that improves the accuracy of multi-animal pose tracking under such challenging conditions by maintaining individual identity across frames using virtual markers. vmTracking can also be applied to existing markerless multi-animal video data by incorporating additional processing steps that add individual identity labels into standard tracking workflows. Here, we describe both the method for assigning virtual markers and the protocol for tracking animals in the resulting labeled videos. High-accuracy multi-animal tracking enabled by vmTracking provides a reliable foundation for subsequent quantitative analyses of social interactions under semi-natural conditions.
Research on rodent social behavior has increasingly emphasized interactions in more naturalistic social contexts1. Traditional approaches, such as the three-chamber test, have been widely used to evaluate sociability and social preference2, but these paradigms capture only simplified aspects of social interaction3. To study richer interactions, deep learning-based tools such as multi-animal DeepLabCut (maDLC)4 and Social LEAP Estimates Animal Poses (SLEAP)5 have become indispensable for markerless pose tracking of multiple animals under freely moving conditions. However, in crowded environments where animals are in close proximity, these tools are prone to tracking errors such as prediction failures and identity (ID) switches, and achieving reliable data under such conditions often requires extensive manual correction.
Accurate pose tracking of multiple freely moving animals opens the door to investigating more complex social dynamics. For instance, detailed tracking data allow the evaluation of leader-follower relationships, approach-avoidance behaviors during movement, and other nuanced patterns of interaction that cannot be reliably assessed with conventional behavioral assays. Such tracking thus extends multi-animal analyses beyond relatively simple descriptive measures of behavior.
One approach to improving tracking accuracy is to clearly differentiate individual animals. For example, Bordes et al.6 demonstrated simultaneous tracking of a white CD1 mouse and a black C57BL/6N mouse using single-animal DeepLabCut (saDLC)7, suggesting that accurate individual tracking is feasible when animals are visually distinguishable. However, the use of physical markers for identification may alter natural behaviors, and even minimally invasive methods, such as implanting radio-frequency identification tags8, raise concerns about their potential effects on behavior.
Here, we present a multi-animal pose tracking method that uses virtual markers (vmTracking)9 for non-invasive individual identification. Accurate evaluation of social behavior in its social context requires highly precise multi-animal pose tracking data. vmTracking was developed to meet this need by providing a reliable protocol, rather than new software or an algorithm, for obtaining such high-quality tracking data using existing tools such as DLC; social behavioral analysis based on these data is therefore beyond the scope of this protocol. vmTracking involves two main steps: adding virtual markers to markerless videos and tracking the resulting virtual marker videos. This method enables high-accuracy pose tracking without the need for physical markers, even in conditions involving three or more visually indistinguishable animals of the same strain. By allowing reliable tracking under semi-naturalistic conditions, vmTracking provides an effective tool for advancing research on complex social interactions, with potential applications across behavioral science, psychology, and neuroscience.
All experimental procedures were approved by the Doshisha University Animal Care and Use Committee (Approval No. A23068). C57BL/6J mice (see Table of Materials) were housed in groups of 2-3 per cage under controlled conditions of 24-26 °C, a 12 h light/dark cycle, with ad libitum access to food and water. All behavioral recordings were collected during the light phase. The overall workflow of the vmTracking process, along with snapshots from an example, is shown in Figure 1.
NOTE: vmTracking is not a new software or algorithm but a protocol that uses existing DLC functions to maximize multi-animal tracking accuracy. Consequently, a substantial portion of the following steps describes specific graphical user interface (GUI) operations within DLC to ensure reproducibility.
1. Preparation of markerless multi-animal video data
NOTE: Experimental parameters can be adjusted according to the specific study.
2. Creation of a virtual marker video
3. Perform pose tracking of the virtual marker video using saDLC (Figure 4)
NOTE: All procedures are performed using default DLC GUI settings unless otherwise specified.
Tracking accuracy was evaluated by comparison with a manually generated ground truth (GT). Twenty 5- or 10-s scenes were extracted and classified as either crowded (CR) scenes, which included overlaps such as mice crossing paths (12 scenes; total of 1,950 frames), or non-crowded (nCR) scenes without overlaps (8 scenes; total of 1,350 frames). Tracking predictions within 10 pixels of the GT position were counted as matches (Match); predictions ≥10 pixels away were counted as false positives (FP); and missing predictions were counted as false negatives (FN). In addition, if the predicted identity differed from that in the immediately preceding frame, it was counted as an ID switch.
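The matching rule above can be expressed compactly. The following is a minimal sketch (the helper name and data layout are hypothetical, not part of the published protocol) of how Match, FP, and FN counts could be tallied per frame under the 10-pixel criterion:

```python
import math

MATCH_RADIUS_PX = 10  # threshold used in the evaluation

def score_frame(predictions, ground_truth):
    """Compare predicted keypoints with ground truth for one frame.

    predictions / ground_truth: dict mapping keypoint name -> (x, y), or None
    for a missing prediction. Returns (match, false_positive, false_negative).
    """
    match = fp = fn = 0
    for key, gt_xy in ground_truth.items():
        pred_xy = predictions.get(key)
        if pred_xy is None:
            fn += 1  # missing prediction -> false negative
        elif math.dist(pred_xy, gt_xy) < MATCH_RADIUS_PX:
            match += 1  # within 10 px of ground truth -> match
        else:
            fp += 1  # predicted, but >= 10 px away -> false positive
    return match, fp, fn
```

Per-scene totals would then be accumulated over all frames of each CR or nCR scene; ID switches are counted separately by comparing identities across consecutive frames.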
Compared with maDLC (Figure 6), vmTracking showed a significant improvement in Match in both CR and nCR scenes (Figure 6A, CR: p < 0.001; Figure 6E, nCR: p < 0.01). With vmTracking, no FNs occurred in either scene type, and FN counts were significantly reduced relative to maDLC (Figure 6B, CR: p < 0.001; Figure 6F, nCR: p < 0.05). FP (Figure 6C,G) and ID switch (Figure 6D,H) counts showed no significant changes in either scene type. A side-by-side comparison of tracking results obtained using vmTracking and conventional maDLC highlights the enhanced accuracy and stability of vmTracking under crowded conditions (Supplementary Video 1).
We also examined the relationship between the number of annotation frames and both Match and FN for CR scenes in maDLC and vmTracking (Figure 6I). In maDLC, the Match plateaued at approximately 85% with around 400 annotation frames, with little further improvement as the number increased. Increasing the annotation frames for training did not reduce FNs. In contrast, vmTracking showed a steady increase in Match with more annotation frames, reaching approximately 95% at around 1,000 frames.
Virtual markers were classified into six categories based on their assignment patterns, and the proportion of each category was used as an index of virtual marker accuracy to examine its relationship with tracking Match (Figure 6J). Scenes with a higher proportion of correctly assigned virtual markers tended to show higher Match. However, some scenes in which the proportion of correct markers was low, either because of many incorrect assignments or many missing markers, still exhibited high Match (e.g., CR scenes 3 and 11).
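As a concrete illustration, the six-way classification of a marker pair could be sketched as follows (a hypothetical helper; the category names paraphrase those used in the analysis):

```python
def classify_marker_pair(status_a, status_b):
    """Classify a pair of virtual markers by each marker's assignment status.

    status_a, status_b: one of "correct", "incorrect", "missing".
    Returns one of the six pair categories; order within the pair is irrelevant.
    """
    # Sorting makes the mapping independent of argument order
    # ("correct" < "incorrect" < "missing" alphabetically).
    key = tuple(sorted([status_a, status_b]))
    categories = {
        ("correct", "correct"): "both correct",
        ("correct", "missing"): "one correct, one missing",
        ("correct", "incorrect"): "one correct, one incorrect",
        ("incorrect", "incorrect"): "both incorrect",
        ("incorrect", "missing"): "one incorrect, one missing",
        ("missing", "missing"): "both missing",
    }
    return categories[key]
```

The per-scene proportions of these categories would then form the stacked bars in Figure 6J.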

Figure 1: Overview of the vmTracking workflow. (A) Schematic diagram of the vmTracking workflow. First, perform maDLC on the markerless video (I). Based on the resulting maDLC tracking data, correct the output so that each individual maintains a consistent ID throughout the video, and create a labeled video that displays only a selected subset of keypoints (II). Using the labels in this video as virtual markers, track the virtual marker video with saDLC (III). (B) Two examples of vmTracking are shown as snapshots from each stage of the process. The far-left image shows the full view, and the images to the right are magnified views of the areas enclosed by yellow boxes. The orange bars in the lower right corners of the images indicate the scales: 5 cm for the "Markerless (Overview)" image and 1 cm for all others.

Figure 2: Markerless multi-animal tracking workflow using maDLC. This figure is related to Protocol step 2.1. (A) Panels are adapted from Figure 1B and illustrate Process I (markerless multi-animal tracking) shown in Figure 1A. For clarity, each animal is outlined with a dashed line. The orange bar in the lower right corner of each image indicates a 1 cm scale bar. (B-D) Snapshots of frames used for labeling during maDLC execution. The orange bar in the lower right corner of each image indicates a 5 cm scale bar. (B) Example frame before labels are applied. (C) Same frame as in B, after labeling. (D) Example of another labeled frame. The identity of individuals between frames C and D is unknown, but individual identity does not need to be considered when labeling. As shown in the snapshots, it is sufficient to correctly label the posture of each animal.

Figure 3: Workflow for creating a virtual marker video based on maDLC results. This figure is related to Protocol step 2.2. (A) Panels are adapted from Figure 1B and illustrate Process II (virtual marker creation) shown in Figure 1A. For clarity, each animal is outlined with a dashed line. White arrows indicate the keypoints used as virtual markers (the 2nd and 4th keypoints for each ID). In this example, virtual markers are output in grayscale: purple, green, and red maDLC labels appear as black, gray, and white, respectively. In Example 1, maDLC results were output directly as virtual markers without correction. In Example 2, the purple and red IDs were swapped and corrected before output, due to an ID switch in consecutive frames. A green point in the maDLC result that should have been corrected was overlooked and output without modification. The orange bar in the lower right corner of each image indicates a 1 cm scale bar. (B) Python-based GUI for editing h5 coordinate data files to retain only specific keypoints and replace all others with NaN11. (Ba) Initial menu window on program launch. Select the file to edit via the Select h5 file button. (Bb) Window after selecting the h5 file. Check the keypoints to replace with NaN (displayed as individual name-body part name), and specify either start-end times (with frame rate) or frame numbers for NaN replacement. Clicking Process Data saves the edited file in the same folder and creates a backup of the original. (C) Consecutive frames from a scene in which an ID switch occurred. (D) Example of correcting all tracking points in C. (E) Snapshot showing the result of processing C with the GUI in B, retaining only the 2nd and 4th keypoints. (F) Example of correcting the result in E. Compared with correcting all keypoints (C → D), limiting correction to two points (C → E → F) reduces the number of required edits. The orange bar in the lower right corner of each image in C-F indicates a 5 cm scale bar.
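The NaN-replacement step performed by the GUI in Figure 3B can also be done programmatically. The sketch below assumes a DLC-style coordinate DataFrame (MultiIndex columns with scorer/individuals/bodyparts/coords levels, as produced by pd.read_hdf on a maDLC h5 file); the function name and arguments are illustrative, not part of the protocol:

```python
import numpy as np
import pandas as pd

def blank_unused_keypoints(df, keep, start_frame, end_frame):
    """Keep only the keypoints chosen as virtual markers; blank the rest.

    df: DLC-style DataFrame (rows = frames; MultiIndex columns with levels
        scorer / individuals / bodyparts / coords).
    keep: set of (individual, bodypart) tuples to retain as virtual markers.
    start_frame, end_frame: inclusive frame range in which to blank the rest.
    """
    out = df.copy()
    for col in out.columns:  # col = (scorer, individual, bodypart, coord)
        _, individual, bodypart, _ = col
        if (individual, bodypart) not in keep:
            out.loc[start_frame:end_frame, col] = np.nan  # label-based, inclusive
    return out
```

The edited DataFrame would then be written back with to_hdf before rendering the labeled (virtual marker) video.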

Figure 4: Workflow for tracking the virtual marker video. This figure is related to Protocol step 3. (A) Panels are adapted from Figure 1B and illustrate Process III (tracking of the virtual marker video) shown in Figure 1A. Each individual is enclosed in a dotted outline for clarity of the tracking target. In this example, individuals with black, gray, and white virtual markers are tracked as purple, green, and red labels, respectively. In Example 1, the gray virtual marker is absent, yet vmTracking maintains correct identity tracking. In Example 2, the gray marker appears on two individuals, but tracking remains correct. The orange bar in the lower right corner of each image indicates a 1 cm scale bar. (B-D) Snapshots from frames used for labeling during vmTracking. In this step, saDLC is used, with each keypoint defined to include both individual and body part information. Labeling specifies which body part of which individual is represented. The orange bar in the lower right corner of each image indicates a 5 cm scale bar. (B) Snapshot of a frame before labeling. (C) Snapshot of the frame in B after labeling. (D) Snapshot of a different frame from C after labeling. Even in different frames (e.g., C and D), labeling maintains individual identity using the virtual marker as a cue, as indicated by consistent label colors.

Figure 5: Workflow for merging tracking results with a markerless video to produce a tracking video without virtual markers. This is an optional step related to the NOTE in Step 3.8.2. (A) When tracking results are applied to the virtual marker video, the resulting tracking video retains the virtual markers (vmTrackingvm+). Yellow arrows indicate the virtual markers visible in the output. Applying the tracking results to the original markerless video instead produces a tracking video without virtual markers (vmTrackingvm-), suitable for presentation or other purposes. The orange bar in the lower right corner of each image indicates a 1 cm scale bar. (B,C) File management in the "videos" folder of the project directory when applying vmTracking results to a markerless video. The orange box schematically illustrates the folder contents. (B) The "videos" folder after generating a vmTracking tracking video by outputting the tracking results onto the virtual marker video (schematically shown as '4. Labels on virtual marker video'). (C) Preparing the "videos" folder for outputting vmTracking results onto the markerless video. Move or rename the '1. Virtual marker video' and '4. Labels on virtual marker video' files from panel B so they are excluded from the folder. The issue is not the files themselves, but that their original filenames interfere with processing. Then, place the '5. Markerless video' in the folder, renaming it to match the original virtual marker video filename (e.g., VMvideo.MP4). In this state, re-running Create videos produces a tracking video with the results applied to the markerless video ('6. Labels on markerless video'). This figure has been modified from Azechi and Takahashi, 2025, PLOS Biology (CC BY 4.0)9.

Figure 6: Evaluation of tracking accuracy with vmTracking. (A-H) Comparison between maDLC and vmTracking for the proportions of Matches (A), false negatives (B), false positives (C), and ID switches (D) in 12 crowded (CR) scenes, and Matches (E), false negatives (F), false positives (G), and ID switches (H) in 8 non-crowded (nCR) scenes. Plots represent the measured values for each scene, and bars represent the mean values calculated from them. Statistical comparisons were conducted using the Wilcoxon signed-rank test. (I) Relationship between the number of annotated frames in CR scenes and the proportions of Matches and false negatives. The green box on the right shows a magnified view of the area enclosed by the green dotted lines in the left panel. Plots represent the mean values. (J) Relationship between virtual marker accuracy and tracking Matches in CR and nCR scenes. Virtual marker accuracy was categorized for each pair of markers into six types: both points correctly assigned to the correct individual, one correct and one missing, one correct and one incorrect, both incorrect, one incorrect and one missing, and both missing. For each scene, the proportions of these categories are shown as stacked bars, with the corresponding tracking Match for that scene indicated above each bar. Scenes are arranged in ascending order of tracking Match from left to right, separately for CR and nCR scenes. CR scenes: n = 12; nCR scenes: n = 8. ***p < 0.001, **p < 0.01, *p < 0.05. This figure has been modified from Azechi and Takahashi (2025)9.
Supplementary Figure 1: Snapshot of the main window in multi-animal DeepLabCut, showing the location of each operation tab. The snapshots illustrate, with arrows, the tabs used in each step of the vmTracking procedure. Images were captured from the DeepLabCut GUI (version 2.2.3).
Supplementary Figure 2: Snapshot of the main window in single-animal DeepLabCut, showing the location of each operation tab. The snapshots illustrate, with arrows, the tabs used in each step of the vmTracking procedure. Images were captured from the DeepLabCut GUI (version 2.2.3).
Supplementary Video 1: Comparison of multi-animal DeepLabCut and vmTracking in tracking three mice. This comparative video shows the performance of multi-animal DeepLabCut (maDLC; left) and vmTracking (right) during the tracking of three mice. The vmTracking video was generated by replacing the virtual marker video with the original markerless video, following the procedure described in the NOTE of Step 3.8.2, to present the tracking results without virtual markers. For each tracking method, individual identities are indicated using color labels of the same color family to facilitate comparison. Throughout the video, vmTracking shows fewer instances where body parts belonging to different individuals are detected, compared with maDLC. Notably, around 15-17 s, maDLC misidentifies the green-labeled and red-labeled individuals, resulting in a switch of their assigned identities.
Although there is a growing demand to study naturalistic forms of social interaction in rodents1, obtaining reliable identity and pose information for multiple freely moving animals, particularly during close interactions or occlusions, remains a technical challenge. vmTracking overcomes this limitation, enabling accurate tracking under such conditions. It reliably achieves high tracking accuracy in diverse experimental conditions, including crowded environments9 where conventional multi-animal trackers4,5 often fail. This robustness stems from combining virtual marker creation, based on multi-animal tracking, with saDLC tracking, enabling precise identity and pose estimation. Importantly, the method maintained high performance even when virtual marker accuracy was suboptimal. This tolerance for imperfect virtual markers ensures that vmTracking remains practical for reliably tracking multiple freely moving individuals, thereby advancing studies of rodent social interactions under semi-natural conditions.
A critical step in vmTracking is correcting the multi-animal tracking results during virtual marker creation. In this process, overlooking identity switches or incorrect labels can cause saDLC to maintain these misidentifications throughout subsequent tracking. If such mistakes are later discovered, they can be corrected by revising the affected segments, regenerating the virtual marker video, and re-running saDLC tracking. In some cases, virtual markers may be placed correctly, yet saDLC may still misidentify individuals when markers on different animals come into close proximity during crowded interactions. Such errors can be mitigated by repositioning the affected virtual markers during the creation step to maximize inter-marker distances, thereby reducing the likelihood of confusion in the subsequent tracking phase.
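Revising an overlooked ID switch amounts to swapping two individuals' coordinate columns over the affected frame range before regenerating the virtual marker video. A hedged sketch, assuming the same DLC-style MultiIndex DataFrame layout as above (the function is illustrative, not part of the published protocol):

```python
import pandas as pd

def swap_identities(df, id_a, id_b, start_frame, end_frame):
    """Swap all coordinates of two individuals over an inclusive frame range.

    Assumes both individuals share the same set of bodypart/coord columns.
    """
    out = df.copy()
    for scorer, individual, bodypart, coord in df.columns:
        if individual != id_a:
            continue  # handle each (bodypart, coord) pair once, from id_a's side
        col_a = (scorer, id_a, bodypart, coord)
        col_b = (scorer, id_b, bodypart, coord)
        tmp = out.loc[start_frame:end_frame, col_a].copy()
        out.loc[start_frame:end_frame, col_a] = out.loc[start_frame:end_frame, col_b].to_numpy()
        out.loc[start_frame:end_frame, col_b] = tmp.to_numpy()
    return out
```

Writing the corrected DataFrame back to the h5 file and re-rendering the labeled video would then yield an updated virtual marker video for re-running saDLC tracking.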
Although our protocol uses DLC4,7 for both steps, it can also be implemented with SLEAP5,12. In our experience, SLEAP in single-animal mode is generally less stable than saDLC, with performance more dependent on the video9. For the virtual marker creation step, however, SLEAP or idtracker.ai13 can be equally effective, and the optimal choice may vary with the video. Therefore, if the initial maDLC tracking shows frequent ID switches or severe keypoint loss on visual inspection, a mixed approach is possible-for example, creating virtual markers with SLEAP or idtracker.ai and then tracking the resulting video with saDLC.
Because vmTracking involves two tracking steps, some may view it as cumbersome and question why physical markers are not simply attached from the outset, eliminating the need for virtual marker creation. However, virtual markers offer clear advantages: they can be applied post hoc to existing markerless videos, avoiding any behavioral effects or ethical concerns associated with attaching devices to animals. They can be placed anywhere, never fall off, and are unaffected by posture or close interactions, allowing them to be positioned and adjusted to suit the specific video. This flexibility enables more reliable individual identification and, consequently, higher tracking accuracy than physical markers. As a result, vmTracking enables high-accuracy tracking in studies of freely moving social interactions as well as in the re-analysis of archival video data, where physical markers cannot be applied. The high-accuracy tracking data thereby obtained are broadly useful across fields that rely on behavioral experiments, including behavioral science, psychology, and neuroscience.
vmTracking has several limitations. First, because individual identification relies on differences in marker color, the number of trackable individuals is limited by the finite range of distinguishable colors. To date, we have successfully tracked a group of ten fish9; however, increasing this number will require expanding the variety of virtual markers (for example, by altering marker shapes or exploring skeleton structures as additional identifiers), although these approaches require further validation. Second, vmTracking necessarily involves an extra step of adding virtual markers to each markerless video, which can be labor-intensive for long recordings or datasets with many individuals. In addition, while this method can, in principle, be applied to various species and recording conditions, the practicality of achieving high-precision tracking may decrease when experimental parameters cannot be optimized. For example, in recordings of wild or freely moving animals under field-like conditions, where lighting, camera resolution, or recording angles are difficult to control, the manual correction required during virtual marker assignment tends to increase, potentially limiting the ease of obtaining high-quality tracking data. In the future, streamlining and automating the virtual marker assignment process will further enhance the method's applicability and usability.
The authors have nothing to disclose.
This work was supported by the Japan Society for the Promotion of Science (JSPS) (JP24K15711 and JP21H04247 to HA, and JP23H00502 and JP21H05296 to ST) and Core Research for Evolutional Science and Technology (CREST) under the Japan Science and Technology Agency (JST) (JPMJCR23P2 to ST).
| Name | Company | Catalog Number / URL | Comments |
|---|---|---|---|
| Acrylic pipe (clear, thickness 5 mm, inner diameter 31 cm) | Sugawarakougei Co., Ltd. | https://www.sugawarakougei.jp/ | Purchased from Hazaiya (an online acrylic materials retailer) |
| Acrylic plate (white, 3 mm thickness, 31 cm diameter) | Sugawarakougei Co., Ltd. | https://www.sugawarakougei.jp/ | Purchased from Hazaiya (an online acrylic materials retailer) |
| C57BL/6J mouse | Shimizu Laboratory Supplies, Co.LTD. | N/A | |
| Camera | Basler | acA3088-57uc | |
| DeepLabCut 2.2.3 | Mathis laboratory at Swiss Federal Institute of Technology in Lausanne | https://www.mackenziemathislab.org/deeplabcut | |
| Pylon Camera Software Suite (Pylon Viewer) | Basler | N/A | |