Research Article
Erratum Notice
An erratum has been issued for this article.
This protocol introduces vmTracking, a method that enables high-accuracy pose tracking in videos of multiple freely moving, markerless animals, even under crowded conditions where animals are densely grouped. This approach provides reliable data for analyzing social interactions that occur in semi-natural environments where animals can move freely.
In social behavior research using rodents, there is a growing demand for evaluating more natural interactive behaviors under freely moving conditions. Accurate pose tracking of multiple animals is essential for this purpose. However, current markerless multi-animal pose tracking tools face a significant challenge: tracking accuracy tends to decline under conditions of occlusion and crowding. This problem becomes especially pronounced when the animals are visually indistinguishable from one another. To overcome this issue, we developed virtual marker tracking (vmTracking), a method that improves the accuracy of multi-animal pose tracking under such challenging conditions by maintaining individual identity across frames using virtual markers. vmTracking can also be applied to existing markerless multi-animal video data by incorporating additional processing steps that add individual identity labels into standard tracking workflows. Here, we describe both the method for assigning virtual markers and the protocol for tracking animals in the resulting labeled videos. High-accuracy multi-animal tracking enabled by vmTracking provides a reliable foundation for subsequent quantitative analyses of social interactions under semi-natural conditions.
Research on rodent social behavior has increasingly emphasized interactions in more naturalistic social contexts1. Traditional approaches, such as the three-chamber test, have been widely used to evaluate sociability and social preference2, but these paradigms capture only simplified aspects of social interaction3. To study richer interactions, deep learning-based tools such as multi-animal DeepLabCut (maDLC)4 and Social LEAP Estimates Animal Poses (SLEAP)5 have become indispensable for markerless pose tracking of multiple animals under freely moving conditions. However, in crowded environments where animals are in close proximity, these tools are prone to tracking errors such as prediction failures and identity (ID) switches, and achieving reliable data under such conditions often requires extensive manual correction.
Accurate pose tracking of multiple freely moving animals opens the door to investigating more complex social dynamics. For instance, detailed tracking data allow the evaluation of leader-follower relationships, approach-avoidance behaviors during movement, and other nuanced patterns of interaction that cannot be reliably assessed with conventional behavioral assays. Such tracking thus extends multi-animal analyses beyond relatively simple descriptive measures of behavior.
One approach to improving tracking accuracy is to clearly differentiate individual animals. For example, Bordes et al.6 demonstrated simultaneous tracking of a white CD1 mouse and a black C57BL/6N mouse using single-animal DeepLabCut (saDLC)7, suggesting that accurate individual tracking is feasible when animals are visually distinguishable. However, the use of physical markers for identification may alter natural behaviors, and even minimally invasive methods, such as implanting radio-frequency identification tags8, raise concerns about their potential effects on behavior.
Here, we present a multi-animal pose tracking method that uses virtual markers (vmTracking)9 for non-invasive individual identification. Accurate evaluation of social behavior in its social context requires highly precise multi-animal pose tracking data. vmTracking was developed to meet this need by providing a reliable protocol, rather than new software or an algorithm, for obtaining such high-quality tracking data using existing tools such as DLC; social behavioral analysis based on these data is therefore beyond the scope of this protocol. vmTracking involves two main steps: adding virtual markers to markerless videos and tracking the resulting virtual marker videos. This method enables high-accuracy pose tracking without the need for physical markers, even in conditions involving three or more visually indistinguishable animals of the same strain. By allowing reliable tracking under semi-naturalistic conditions, vmTracking provides an effective tool for advancing research on complex social interactions, with potential applications across behavioral science, psychology, and neuroscience.
All experimental procedures were approved by the Doshisha University Animal Care and Use Committee (Approval No. A23068). C57BL/6J mice (see Table of Materials) were housed in groups of 2-3 per cage under controlled conditions of 24-26 °C, a 12 h light/dark cycle, with ad libitum access to food and water. All behavioral recordings were collected during the light phase. The overall workflow of the vmTracking process, along with snapshots from an example, is shown in Figure 1.
NOTE: vmTracking is not a new software or algorithm but a protocol that uses existing DLC functions to maximize multi-animal tracking accuracy. Consequently, a substantial portion of the following steps describes specific graphical user interface (GUI) operations within DLC to ensure reproducibility.
1. Preparation of markerless multi-animal video data
NOTE: Experimental parameters can be adjusted according to the specific study.
2. Creation of a virtual marker video
3. Perform pose tracking of the virtual marker video using saDLC (Figure 4)
NOTE: All procedures are performed using default DLC GUI settings unless otherwise specified.
Tracking accuracy was evaluated by comparison with a manually generated ground truth (GT). Twenty 5- or 10-s scenes were extracted and classified as either crowded (CR) scenes, which included overlaps such as mice crossing paths (12 scenes; total of 1,950 frames), or non-crowded (nCR) scenes without overlaps (8 scenes; total of 1,350 frames). Tracking predictions within 10 pixels of the GT position were counted as matches (Match); predictions ≥10 pixels away were counted as false positives (FP); and missing predictions were counted as false negatives (FN). In addition, if the predicted identity differed from that in the immediately preceding frame, it was counted as an ID switch.
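The matching rule above can be expressed compactly. The following is a minimal sketch (the helper name and data layout are hypothetical, not part of the published protocol) of how Match, FP, and FN counts could be tallied per frame under the 10-pixel criterion:

```python
import math

MATCH_RADIUS_PX = 10  # threshold used in the evaluation

def score_frame(predictions, ground_truth):
    """Compare predicted keypoints with ground truth for one frame.

    predictions / ground_truth: dict mapping keypoint name -> (x, y), or None
    for a missing prediction. Returns (match, false_positive, false_negative).
    """
    match = fp = fn = 0
    for key, gt_xy in ground_truth.items():
        pred_xy = predictions.get(key)
        if pred_xy is None:
            fn += 1  # missing prediction -> false negative
        elif math.dist(pred_xy, gt_xy) < MATCH_RADIUS_PX:
            match += 1  # within 10 px of ground truth -> match
        else:
            fp += 1  # predicted, but >= 10 px away -> false positive
    return match, fp, fn
```

Per-scene totals would then be accumulated over all frames of each CR or nCR scene; ID switches are counted separately by comparing identities across consecutive frames.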
Compared with maDLC (Figure 6), vmTracking showed a significant improvement in Match in both CR and nCR scenes (Figure 6A, CR: p < 0.001; Figure 6E, nCR: p < 0.01). With vmTracking, no FNs occurred in either scene type, and FN counts were significantly reduced relative to maDLC (Figure 6B, CR: p < 0.001; Figure 6F, nCR: p < 0.05). FP (Figure 6C,G) and ID switch (Figure 6D,H) counts showed no significant changes in either scene type. A side-by-side comparison of tracking results obtained using vmTracking and conventional maDLC highlights the enhanced accuracy and stability of vmTracking under crowded conditions (Supplementary Video 1).
We also examined the relationship between the number of annotation frames and both Match and FN for CR scenes in maDLC and vmTracking (Figure 6I). In maDLC, the Match plateaued at approximately 85% with around 400 annotation frames, with little further improvement as the number increased. Increasing the annotation frames for training did not reduce FNs. In contrast, vmTracking showed a steady increase in Match with more annotation frames, reaching approximately 95% at around 1,000 frames.
Virtual markers were classified into six categories based on their assignment patterns, and the proportion of each category was used as an index of virtual marker accuracy to examine its relationship with tracking Match (Figure 6J). Scenes with a higher proportion of correctly assigned virtual markers tended to show higher Match. However, some scenes in which the proportion of correct markers was low, either because of many incorrect assignments or many missing markers, still exhibited high Match (e.g., CR scenes 3 and 11).
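As a concrete illustration, the six-way classification of a marker pair could be sketched as follows (a hypothetical helper; the category names paraphrase those used in the analysis):

```python
def classify_marker_pair(status_a, status_b):
    """Classify a pair of virtual markers by each marker's assignment status.

    status_a, status_b: one of "correct", "incorrect", "missing".
    Returns one of the six pair categories; order within the pair is irrelevant.
    """
    # Sorting makes the mapping independent of argument order
    # ("correct" < "incorrect" < "missing" alphabetically).
    key = tuple(sorted([status_a, status_b]))
    categories = {
        ("correct", "correct"): "both correct",
        ("correct", "missing"): "one correct, one missing",
        ("correct", "incorrect"): "one correct, one incorrect",
        ("incorrect", "incorrect"): "both incorrect",
        ("incorrect", "missing"): "one incorrect, one missing",
        ("missing", "missing"): "both missing",
    }
    return categories[key]
```

The per-scene proportions of these categories would then form the stacked bars in Figure 6J.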

Figure 1: Overview of the vmTracking workflow. (A) Schematic diagram of the vmTracking workflow. First, perform maDLC on the markerless video (I). Based on the resulting maDLC tracking data, correct the output so that each individual maintains a consistent ID throughout the video, and create a labeled video that displays only a selected subset of keypoints (II). Using the labels in this video as virtual markers, track the virtual marker video with saDLC (III). (B) Two examples of vmTracking are shown as snapshots from each stage of the process. The far-left image shows the full view, and the images to the right are magnified views of the areas enclosed by yellow boxes. The orange bars in the lower right corners of the images indicate the scales: 5 cm for the "Markerless (Overview)" image and 1 cm for all others.

Figure 2: Markerless multi-animal tracking workflow using maDLC. This figure is related to Protocol step 2.1. (A) Panels are adapted from Figure 1B and illustrate Process I (markerless multi-animal tracking) shown in Figure 1A. For clarity, each animal is outlined with a dashed line. The orange bar in the lower right corner of each image indicates a 1 cm scale bar. (B-D) Snapshots of frames used for labeling during maDLC execution. The orange bar in the lower right corner of each image indicates a 5 cm scale bar. (B) Example frame before labels are applied. (C) Same frame as in B, after labeling. (D) Example of another labeled frame. The identity of individuals between frames C and D is unknown, but individual identity does not need to be considered when labeling. As shown in the snapshots, it is sufficient to correctly label the posture of each animal.

Figure 3: Workflow for creating a virtual marker video based on maDLC results. This figure is related to Protocol step 2.2. (A) Panels are adapted from Figure 1B and illustrate Process II (virtual marker creation) shown in Figure 1A. For clarity, each animal is outlined with a dashed line. White arrows indicate the keypoints used as virtual markers (the 2nd and 4th keypoints for each ID). In this example, virtual markers are output in grayscale: purple, green, and red maDLC labels appear as black, gray, and white, respectively. In Example 1, maDLC results were output directly as virtual markers without correction. In Example 2, the purple and red IDs were swapped and corrected before output, due to an ID switch in consecutive frames. A green point in the maDLC result that should have been corrected was overlooked and output without modification. The orange bar in the lower right corner of each image indicates a 1 cm scale bar. (B) Python-based GUI for editing h5 coordinate data files to retain only specific keypoints and replace all others with NaN11. (Ba) Initial menu window on program launch. Select the file to edit via the Select h5 file button. (Bb) Window after selecting the h5 file. Check the keypoints to replace with NaN (displayed as individual name-body part name), and specify either start-end times (with frame rate) or frame numbers for NaN replacement. Clicking Process Data saves the edited file in the same folder and creates a backup of the original. (C) Consecutive frames from a scene in which an ID switch occurred. (D) Example of correcting all tracking points in C. (E) Snapshot showing the result of processing C with the GUI in B, retaining only the 2nd and 4th keypoints. (F) Example of correcting the result in E. Compared with correcting all keypoints (C → D), limiting correction to two points (C → E → F) reduces the number of required edits. The orange bar in the lower right corner of each image in C-F indicates a 5 cm scale bar.
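The NaN-replacement step performed by the GUI in Figure 3B can also be done programmatically. The sketch below assumes a DLC-style coordinate DataFrame (MultiIndex columns with scorer/individuals/bodyparts/coords levels, as produced by pd.read_hdf on a maDLC h5 file); the function name and arguments are illustrative, not part of the protocol:

```python
import numpy as np
import pandas as pd

def blank_unused_keypoints(df, keep, start_frame, end_frame):
    """Keep only the keypoints chosen as virtual markers; blank the rest.

    df: DLC-style DataFrame (rows = frames; MultiIndex columns with levels
        scorer / individuals / bodyparts / coords).
    keep: set of (individual, bodypart) tuples to retain as virtual markers.
    start_frame, end_frame: inclusive frame range in which to blank the rest.
    """
    out = df.copy()
    for col in out.columns:  # col = (scorer, individual, bodypart, coord)
        _, individual, bodypart, _ = col
        if (individual, bodypart) not in keep:
            out.loc[start_frame:end_frame, col] = np.nan  # label-based, inclusive
    return out
```

The edited DataFrame would then be written back with to_hdf before rendering the labeled (virtual marker) video.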

Figure 4: Workflow for tracking the virtual marker video. This figure is related to Protocol step 3. (A) Panels are adapted from Figure 1B and illustrate Process III (tracking of the virtual marker video) shown in Figure 1A. Each individual is enclosed in a dotted outline for clarity of the tracking target. In this example, individuals with black, gray, and white virtual markers are tracked as purple, green, and red labels, respectively. In Example 1, the gray virtual marker is absent, yet vmTracking maintains correct identity tracking. In Example 2, the gray marker appears on two individuals, but tracking remains correct. The orange bar in the lower right corner of each image indicates a 1 cm scale bar. (B-D) Snapshots from frames used for labeling during vmTracking. In this step, saDLC is used, with each keypoint defined to include both individual and body part information. Labeling specifies which body part of which individual is represented. The orange bar in the lower right corner of each image indicates a 5 cm scale bar. (B) Snapshot of a frame before labeling. (C) Snapshot of the frame in B after labeling. (D) Snapshot of a different frame from C after labeling. Even in different frames (e.g., C and D), labeling maintains individual identity using the virtual marker as a cue, as indicated by consistent label colors.

Figure 5: Workflow for merging tracking results with a markerless video to produce a tracking video without virtual markers. This is an optional step related to the NOTE in Step 3.8.2. (A) When tracking results are applied to the virtual marker video, the resulting tracking video retains the virtual markers (vmTrackingvm+). Yellow arrows indicate the virtual markers visible in the output. Applying the tracking results to the original markerless video instead produces a tracking video without virtual markers (vmTrackingvm-), suitable for presentation or other purposes. The orange bar in the lower right corner of each image indicates a 1 cm scale bar. (B,C) File management in the "videos" folder of the project directory when applying vmTracking results to a markerless video. The orange box schematically illustrates the folder contents. (B) The "videos" folder after generating a vmTracking tracking video by outputting the tracking results onto the virtual marker video (schematically shown as '4. Labels on virtual marker video'). (C) Preparing the "videos" folder for outputting vmTracking results onto the markerless video. Move or rename the '1. Virtual marker video' and '4. Labels on virtual marker video' files from panel B so they are excluded from the folder. The issue is not the files themselves, but that their original filenames interfere with processing. Then, place the '5. Markerless video' in the folder, renaming it to match the original virtual marker video filename (e.g., VMvideo.MP4). In this state, re-running Create videos produces a tracking video with the results applied to the markerless video ('6. Labels on markerless video'). This figure has been modified from Azechi and Takahashi, 2025, PLOS Biology (CC BY 4.0)9.

Figure 6: Evaluation of tracking accuracy with vmTracking. (A-H) Comparison between maDLC and vmTracking for the proportions of Matches (A), false negatives (B), false positives (C), and ID switches (D) in 12 crowded (CR) scenes, and Matches (E), false negatives (F), false positives (G), and ID switches (H) in 8 non-crowded (nCR) scenes. Plots represent the measured values for each scene, and bars represent the mean values calculated from them. Statistical comparisons were conducted using the Wilcoxon signed-rank test. (I) Relationship between the number of annotated frames in CR scenes and the proportions of Matches and false negatives. The green box on the right shows a magnified view of the area enclosed by the green dotted lines in the left panel. Plots represent the mean values. (J) Relationship between virtual marker accuracy and tracking Matches in CR and nCR scenes. Virtual marker accuracy was categorized for each pair of markers into six types: both points correctly assigned to the correct individual, one correct and one missing, one correct and one incorrect, both incorrect, one incorrect and one missing, and both missing. For each scene, the proportions of these categories are shown as stacked bars, with the corresponding tracking Match for that scene indicated above each bar. Scenes are arranged in ascending order of tracking Match from left to right, separately for CR and nCR scenes. CR scenes: n = 12; nCR scenes: n = 8. ***p < 0.001, **p < 0.01, *p < 0.05. This figure has been modified from Azechi and Takahashi (2025)9.
Supplementary Figure 1: Snapshot of the main window in multi-animal DeepLabCut, showing the location of each operation tab. The snapshots illustrate, with arrows, the tabs used in each step of the vmTracking procedure. Images were captured from the DeepLabCut GUI (version 2.2.3).
Supplementary Figure 2: Snapshot of the main window in single-animal DeepLabCut, showing the location of each operation tab. The snapshots illustrate, with arrows, the tabs used in each step of the vmTracking procedure. Images were captured from the DeepLabCut GUI (version 2.2.3).
Supplementary Video 1: Comparison of multi-animal DeepLabCut and vmTracking in tracking three mice. This comparative video shows the performance of multi-animal DeepLabCut (maDLC; left) and vmTracking (right) during the tracking of three mice. The vmTracking video was generated by replacing the virtual marker video with the original markerless video, following the procedure described in the NOTE of Step 3.8.2, to present the tracking results without virtual markers. For each tracking method, individual identities are indicated using color labels of the same color family to facilitate comparison. Throughout the video, vmTracking shows fewer instances where body parts belonging to different individuals are detected, compared with maDLC. Notably, around 15-17 s, maDLC misidentifies the green-labeled and red-labeled individuals, resulting in a switch of their assigned identities.
Although there is a growing demand to study naturalistic forms of social interaction in rodents1, obtaining reliable identity and pose information for multiple freely moving animals, particularly during close interactions or occlusions, remains a technical challenge. vmTracking overcomes this limitation, enabling accurate tracking under such conditions. It reliably achieves high tracking accuracy in diverse experimental conditions, including crowded environments9 where conventional multi-animal trackers4,5 often fail. This robustness stems from combining virtual marker creation, based on multi-animal tracking, with saDLC tracking, enabling precise identity and pose estimation. Importantly, the method maintained high performance even when virtual marker accuracy was suboptimal. This tolerance for imperfect virtual markers ensures that vmTracking remains practical for reliably tracking multiple freely moving individuals, thereby advancing studies of rodent social interactions under semi-natural conditions.
A critical step in vmTracking is correcting the multi-animal tracking results during virtual marker creation. In this process, overlooking identity switches or incorrect labels can cause saDLC to maintain these misidentifications throughout subsequent tracking. If such mistakes are later discovered, they can be corrected by revising the affected segments, regenerating the virtual marker video, and re-running saDLC tracking. In some cases, virtual markers may be placed correctly, yet saDLC may still misidentify individuals when markers on different animals come into close proximity during crowded interactions. Such errors can be mitigated by repositioning the affected virtual markers during the creation step to maximize inter-marker distances, thereby reducing the likelihood of confusion in the subsequent tracking phase.
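Revising an overlooked ID switch amounts to swapping two individuals' coordinate columns over the affected frame range before regenerating the virtual marker video. A hedged sketch, assuming the same DLC-style MultiIndex DataFrame layout as above (the function is illustrative, not part of the published protocol):

```python
import pandas as pd

def swap_identities(df, id_a, id_b, start_frame, end_frame):
    """Swap all coordinates of two individuals over an inclusive frame range.

    Assumes both individuals share the same set of bodypart/coord columns.
    """
    out = df.copy()
    for scorer, individual, bodypart, coord in df.columns:
        if individual != id_a:
            continue  # handle each (bodypart, coord) pair once, from id_a's side
        col_a = (scorer, id_a, bodypart, coord)
        col_b = (scorer, id_b, bodypart, coord)
        tmp = out.loc[start_frame:end_frame, col_a].copy()
        out.loc[start_frame:end_frame, col_a] = out.loc[start_frame:end_frame, col_b].to_numpy()
        out.loc[start_frame:end_frame, col_b] = tmp.to_numpy()
    return out
```

Writing the corrected DataFrame back to the h5 file and re-rendering the labeled video would then yield an updated virtual marker video for re-running saDLC tracking.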
Although our protocol uses DLC4,7 for both steps, it can also be implemented with SLEAP5,12. In our experience, SLEAP in single-animal mode is generally less stable than saDLC, with performance more dependent on the video9. For the virtual marker creation step, however, SLEAP or idtracker.ai13 can be equally effective, and the optimal choice may vary with the video. Therefore, if the initial maDLC tracking shows frequent ID switches or severe keypoint loss on visual inspection, a mixed approach is possible-for example, creating virtual markers with SLEAP or idtracker.ai and then tracking the resulting video with saDLC.
Because vmTracking involves two tracking steps, some may view it as cumbersome and question why physical markers are not simply attached from the outset, eliminating the need for virtual marker creation. However, virtual markers offer clear advantages: they can be applied post hoc to existing markerless videos, avoiding any behavioral effects or ethical concerns associated with attaching devices to animals. They can be placed anywhere, never fall off, and are unaffected by posture or close interactions, allowing them to be positioned and adjusted to suit the specific video. This flexibility enables more reliable individual identification and, consequently, higher tracking accuracy than physical markers. As a result, vmTracking enables high-accuracy tracking in studies of freely moving social interactions as well as in the re-analysis of archival video data, where physical markers cannot be applied. The high-accuracy tracking data thereby obtained are broadly useful across fields that rely on behavioral experiments, including behavioral science, psychology, and neuroscience.
vmTracking has several limitations. First, because individual identification relies on differences in marker color, the number of trackable individuals is limited by the finite range of distinguishable colors. To date, we have successfully tracked a group of ten fish9; however, increasing this number will require expanding the variety of virtual markers (for example, by altering marker shapes or exploring skeleton structures as additional identifiers), although these approaches require further validation. Second, vmTracking necessarily involves an extra step of adding virtual markers to each markerless video, which can be labor-intensive for long recordings or datasets with many individuals. In addition, while this method can, in principle, be applied to various species and recording conditions, the practicality of achieving high-precision tracking may decrease when experimental parameters cannot be optimized. For example, in recordings of wild or freely moving animals under field-like conditions, where lighting, camera resolution, or recording angles are difficult to control, the manual correction required during virtual marker assignment tends to increase, potentially limiting the ease of obtaining high-quality tracking data. In the future, streamlining and automating the virtual marker assignment process will further enhance the method's applicability and usability.
The authors have nothing to disclose.
This work was supported by the Japan Society for the Promotion of Science (JSPS) (JP24K15711 and JP21H04247 to HA, and JP23H00502 and JP21H05296 to ST) and Core Research for Evolutional Science and Technology (CREST) under the Japan Science and Technology Agency (JST) (JPMJCR23P2 to ST).
| Name | Company | Catalog Number / URL | Comments |
|---|---|---|---|
| Acrylic pipe (clear, thickness 5 mm, inner diameter 31 cm) | Sugawarakougei Co., Ltd. | https://www.sugawarakougei.jp/ | Purchased from Hazaiya (an online acrylic materials retailer) |
| Acrylic plate (white, 3 mm thickness, 31 cm diameter) | Sugawarakougei Co., Ltd. | https://www.sugawarakougei.jp/ | Purchased from Hazaiya (an online acrylic materials retailer) |
| C57BL/6J mouse | Shimizu Laboratory Supplies, Co.LTD. | N/A | |
| Camera | Basler | acA3088-57uc | |
| DeepLabCut 2.2.3 | Mathis laboratory at Swiss Federal Institute of Technology in Lausanne | https://www.mackenziemathislab.org/deeplabcut | |
| Pylon Camera Software Suite (Pylon Viewer) | Basler | N/A | |