Automated Joint Space Detection Improves Bone Segmentation Accuracy

H. Mark Kenney; Daniel Lichau; Rémi Blanc; Lindsay Schnur; Richard  D. Bell; Christopher  T. Ritchlin; Edward  M. Schwarz; Hani  A. Awad; Ronald  W. Wood

doi:10.3791/69252

Research Article

November 28th, 2025

Summary

The development of an automated joint space detection workflow enabled high-throughput segmentation of distinct murine hindpaw bones with >98% accuracy in wild-type animals. Flexible application to forepaws and paws with inflammatory-erosive arthritis was achieved, but with reduced performance that warrants further optimization in future studies using publicly available data.

Abstract

Quantitative description of complex anatomical structures remains challenging due to the expertise necessary for manual segmentation, labor, and interobserver variability. To overcome this, automated detection of specific landmarks can be accomplished by digital image analysis techniques, including deep learning (DL) models. To this end, we undertook supervised automated analysis of micro-computed tomography (micro-CT) datasets of murine hindpaws and forepaws. Advancing beyond previously published semi-automated (SA) marker-based watershed algorithms, we added structure enhancement, tensor voting, and output dilation to identify joint spaces. Segmentation was enhanced by utilizing a DL joint space prediction model (3D U-Net architecture, ResNet-18 backbone) using wild-type (WT) hindpaw labels as ground truth. Prediction was extended to hindpaws and forepaws from WT and tumor necrosis factor transgenic (TNF-Tg) mice with inflammatory-erosive arthritis, including both sexes across ages. Segmentation accuracy in WT hindpaws improved from approximately 79% with the SA approach to >98% with the DL methodology. Accuracy declined with increased disease severity and age in TNF-Tg mice. Subsequent testing in forepaws also displayed progressive reduction in accuracy with increasing arthritic severity. Overall, this supervised automated model outperforms recent SA approaches in healthy joints to enhance the investigation of complex bone anatomy. Although flexible application to novel and disease-modified datasets shows reduced performance, utilization may nonetheless catalyze structure-specific segmentation model development.

Introduction

High-quality image analysis not only enhances research endeavors but also has the potential to aid clinical radiologists as they seek to detect and quantify pathological changes, a task paramount to patient care. Image analysis is a detailed sequence of procedures, including feature extraction, translating an otherwise generic image into meaningful labels, and deriving quantitative metrics¹. Much of this process is driven by prior knowledge, such as thresholding structures based on density or color, and then applying downstream image processing algorithms (i.e., dilation, erosion, smoothing, separation) to achieve the desired segmentation. Once optimized, the segmented images can provide inputs for supervised machine learning², including deep learning (DL) that encodes and decodes complex features using neural networks,²^,³ resulting in improved image segmentation accuracy and throughput.

In fact, the implementation of various 3D convolutional neural networks (CNNs) has provided critical advancements in automated bone segmentation algorithms for skeletal image analysis, where some models outperform human segmentation⁴. While 3D segmentation CNNs function through distinct architectures (e.g., AlexNet, ResNet, UNet), their outputs are fundamentally the same: an image mask that separates bone-positive volume from background. Such deep learning models in musculoskeletal image analysis have been under rapid development, and the field has quickly moved from solving simple 2D fracture detection problems⁵ to complex multi-joint problems⁶ with the ability to deal with artifactual noise or anomalous features in datasets. For example, Woo et al. noticed that structural anomalies (e.g., bone marrow lesions, bone cysts) on MRI were degrading their segmentation predictions for the articular cartilage in knee joints. Thus, they developed an anomaly-aware segmentation model to first identify unrelated anomalous structures, which dramatically improved the segmentation of bone and cartilage⁷. He et al. trained 14 separate models, each on an ROI of key hand joints, to estimate skeletal age and then integrated their outputs to improve predictions from hand X-rays, rather than using the entire hand structure⁸. Similarly, including context in models, like regional segmentation features and global anatomical relationships, has been shown to improve predictions. Using multi-region CNNs to provide context on expected segmentation features also improves skeletal maturity assessment in hand X-rays by restraining the classification problem to anatomically appropriate locations⁹. Additional advances include SVTNet with CNN-based bone segmentation, followed by further processing by vision transformer models to capture global information regarding the spatial relationship between segmented regions of interest to estimate quantitative outcomes such as bone age¹⁰.

As demonstrated in these segmentation approaches, arthritis research imaging focuses on articulating surfaces between two or more bones, where methods for bone separation are critical for successful evaluation of differential pathologic processes in complex joints. Advances in image processing algorithms have demonstrated remarkable utility in increasing the analytical throughput of closely adjacent carpal or tarsal bones¹¹^,¹². However, limited adoption due to inaccuracies requiring user intervention and difficulties in translation to distinct structures highlights the need for the implementation of optimized workflows. These multi-step processes can benefit tremendously from discrete tools for image enhancement (i.e., bone edge detection¹³). Beyond strictly morphologic operations, other studies have implemented registration-based techniques leveraging the typical reliability and consistency in anatomy for structure identification¹⁴^,¹⁵^,¹⁶^,¹⁷. The alternative of manual inputs to generate ground truth labels is expensive and tedious, but can be similarly successful, where utilization may be essential in complicated, closely interlocking bones with less discrete boundaries (i.e., skull¹⁸). Similarly, alternative imaging approaches with multi-color/hue variability even within a discrete structure, such as magnetic resonance imaging (MRI¹⁹^,²⁰^,²¹) or tissue histology²², also exhibit complexity that may benefit from initial manual segmentation to guide automated processes. Together, these methods can offer additional benefits by fueling further automation, where outcomes serve as training datasets to implement DL approaches. The benefits of segmentation automation are numerous, but particularly these methods allow for detailed spatially-relevant quantitative metrics, including regional/bone-specific erosive volume changes²³^,²⁴^,²⁵^,²⁶ as well as identification of areas with high susceptibility to damage²⁷.

Here, we build upon established semi-automated (SA) murine hindpaw segmentation methods¹² with improvements in image processing algorithms in combination with ground truth bone segmentations²⁸ to train DL models for joint space detection. This novel analytical strategy demonstrated significantly improved accuracy in individual bone segmentation of hindpaws, which reduced manual efforts for corrections of segmentation errors¹² to expedite processing of downstream quantitative metrics. We also demonstrate the potential for implementation of these technical advances in novel structures, including forepaws and hindpaws with severe erosive arthritis. As manual segmentation is time-intensive and requires high levels of expertise⁷, similar strategies to sequentially utilize semi-automated and automated segmentation to produce improved input filters can lower the barrier to developing high-quality CNNs for specific applications. The associated methods developed in Amira software (Supplementary File 1) and the relevant datasets are provided publicly to support adoption and collaboration in further research endeavors²⁸^,²⁹.

Protocol

All animal experiments were performed in accordance with IACUC protocols approved by the University Committee for Animal Resources at the University of Rochester Medical Center.

Animal models
The mice were housed in an AAALAC-accredited vivarium. A total of 19 mice were used for the described experiments, including 4 wildtype (WT) male, 4 WT female, 4 TNF transgenic (TNF-Tg) male, and 7 TNF-Tg female mice, with longitudinal monthly assessments. TNF-Tg mice (3647 line, C57BL/6 genetic background³⁰) were initially obtained from Dr. George Kollias with continued maintenance at the University of Rochester. The TNF-Tg mice were bred as heterozygotes, with WT mice serving as littermate controls. Genotyping for the TNF transgene was performed using the following primer sequences: TNF-Tg Forward: 5'-TAC-CCC-CTC-CTT-CAG-ACA-CC-3'; TNF-Tg Reverse: 5'-GCC-CTT-CAT-AAT-ATC-CCC-CA-3'.

TNF-Tg mice develop chronic, progressive, and spontaneous inflammatory-erosive arthritis³¹ with more rapid onset of articular and extra-articular manifestations in female mice, leading to early mortality at approximately 5-6 months³². Thus, additional mice were allocated to the TNF-Tg female cohort. Indicated in previous descriptions of this study cohort²³, n=2 TNF-Tg females died prior to completion of the study, n=1 before 4 months, and n=1 before 5 months. As the TNF-Tg mice exhibit well-established asymmetric disease progression in hindpaws³³^,³⁴, individual limbs were considered the unit of measure (2 forepaws and 2 hindpaws per animal).

Micro-CT image collection
Micro-CT datasets were collected as previously described¹²^,²³. Briefly, the mice were placed in a Delrin plastic and clear acrylic tube with 1%-3% isoflurane anesthesia for imaging with a micro-CT using the following parameters: 55 kV, 145 µA, 300 ms integration time, 2048 x 2048 pixels, 1000 projections over 180°, resolution 17.5 µm isotropic voxels. For safety when using isoflurane, proper personal protective equipment (nitrile gloves, lab coat or gown, safety glasses) was utilized, and the isoflurane vaporizer was maintained within a fume hood using a charcoal filter to capture gaseous waste with well-sealed tubing for continuity in the delivery system. The vaporized isoflurane was utilized for anesthesia induction in a sealed chamber, then anesthesia was maintained by continuous isoflurane flow into a murine nose cone in the micro-CT machine. Both the hindpaws and forepaws were taped together for stabilization during imaging sessions. Each dataset was collected in approximately 30-45 min (60-90 min total) with hindpaw and forepaw data derived from the same animals and timepoints. The mice were evaluated at monthly intervals starting at 2 months of age until 5 months (females, TNF-Tg with early mortality³²) or 8 months (males). Portions of the hindpaw data utilized for this study were previously published (WT:¹²^,³⁵; WT and TNF-Tg:²³) and are publicly available²⁸. The forepaw data were made publicly available for the purposes of this study²⁹.

Joint space segmentation algorithm with deep learning facilitation
A high-throughput SA image processing algorithm to segment the individual bones of the complex hindpaw in mice (30-31 bones) was previously developed¹², which provided a framework to investigate individual biomarkers of inflammatory-erosive arthritis progression using Amira software²³. This baseline SA segmentation method utilized a marker-based watershed algorithm³⁶ for separation at bone boundaries. The marker-based watershed algorithm separates different objects in an image by treating pixel values as local topography based on user-defined markers. These bone-specific markers were generated in an SA manner through a variety of image processing steps, including black top-hat (BTH), in an effort to highlight local regions of large density changes, such as bone edges and articulations. Together, this approach created an eroded version of each individual bone, which was then expanded to the bone borders through the application of a binary boundary mask. While the watershed method improved upon previous utilization of manual contouring, the accuracy (bones segmented correctly / total bones) was approximately 80% per dataset due to low contrast noise leading to bridged joint regions (over-connecting bones; 2+ bones as one material) or edge misidentification (over-splitting bones; 1 bone as 2+ materials). Thus, the SA marker generation for the watershed approach required consistent and frequent manual correction procedures¹² to develop a resource of gold-standard labels²³^,²⁸.
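
To make the role of the black top-hat (BTH) operation concrete, the following is a minimal one-dimensional sketch in plain Python, not the Amira implementation. BTH is computed as a morphological closing minus the original signal, so narrow dark gaps flanked by bright bone stand out. The intensity profile, the window radius, and the function names are hypothetical; the threshold of 750 is borrowed from the recipe's thresholding of the BTH result.

```python
def dilate(signal, radius):
    """Grayscale dilation: maximum over a window of +/- radius samples."""
    n = len(signal)
    return [max(signal[max(0, i - radius):min(n, i + radius + 1)]) for i in range(n)]


def erode(signal, radius):
    """Grayscale erosion: minimum over a window of +/- radius samples."""
    n = len(signal)
    return [min(signal[max(0, i - radius):min(n, i + radius + 1)]) for i in range(n)]


def black_top_hat(signal, radius):
    """Closing (dilation then erosion) minus the original signal."""
    closed = erode(dilate(signal, radius), radius)
    return [c - s for c, s in zip(closed, signal)]


# Hypothetical profile along a line crossing two bones:
# bone (5000), a 2-voxel joint gap (1000), bone (5000), then background (0).
profile = [5000] * 5 + [1000] * 2 + [5000] * 5 + [0] * 6

bth = black_top_hat(profile, radius=2)
joint_candidates = [value >= 750 for value in bth]  # threshold assumed from the recipe
joint_indices = [i for i, flag in enumerate(joint_candidates) if flag]
print(joint_indices)
```

In this toy profile only the narrow gap between the two bright plateaus responds, while the broad background region does not, because the closing cannot fill a dark region wider than its window. This is why the closing size has to be matched to the expected joint space width.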

The development of the DL model for bone joint prediction was based on the architecture of a 3D U-Net with a ResNet-18 backbone. The training loss function was Dice, the validation metric was intersection over union (IoU), the gradient descent used Adam optimization with an initial learning rate of 0.0001, and weights were initialized randomly. The model was trained using 20 WT datasets (40 hindpaws) with equal sex and age (from 2-6 months) distribution, each split into 6 subvolumes (3 per hindpaw) of 200 x 200 x 200 voxels with a 25% randomized validation set of subvolumes (30 validation, 120 total) to avoid overfitting. These 3D tiles were positioned evenly on the tarsals, distal phalanges, and background regions. The training patch size was set at 96 x 96 x 96 voxels (Supplementary Figure 1). The model was trained over 500 epochs, taking approximately 6 h. Ground truth joint regions were obtained by an automatic recipe from ground truth labels, which expanded label interfaces with a 3D dilation size of 5 for both thickness and extent.
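
For readers unfamiliar with the two quantities named above, the sketch below shows how a Dice loss and an intersection over union (IoU) score can be computed for a binary joint-space mask. It is an illustration in plain Python under simplifying assumptions (flattened voxel lists, no smoothing term, hypothetical example values) and does not reproduce the training code used by the software.

```python
def dice_coefficient(pred, truth):
    """Dice = 2 * |P intersect T| / (|P| + |T|); pred may hold probabilities in [0, 1]."""
    intersection = sum(p * t for p, t in zip(pred, truth))
    total = sum(pred) + sum(truth)
    return 1.0 if total == 0 else 2.0 * intersection / total


def dice_loss(pred, truth):
    """Loss minimized during training: 1 - Dice."""
    return 1.0 - dice_coefficient(pred, truth)


def iou(pred, truth):
    """Intersection over union for binary masks."""
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    union = sum(1 for p, t in zip(pred, truth) if p or t)
    return 1.0 if union == 0 else intersection / union


# Hypothetical flattened masks (1 = joint space voxel, 0 = everything else).
truth = [0, 1, 1, 1, 0, 0]
pred = [0, 1, 1, 0, 1, 0]

print(dice_loss(pred, truth), iou(pred, truth))
```

For binary masks the two measures are monotonically related (IoU = Dice / (2 - Dice)), so optimizing one and validating on the other is a consistent pairing.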

In conjunction with the joint space DL prediction, several image processing steps were implemented to augment joint space identification and bone segmentation. These strategies included the utilization of the BTH method combined with structure enhancement³⁷, membrane enhancement, and tensor voting³⁸ to reinforce joint space continuity by limiting membrane gaps. Together, these approaches fortify the joint spaces for bone separation to limit the segmentation leakage between adjacent bones that then generates over-connection errors as the watershed algorithm propagates across multiple bones. The final result separates and segments original micro-CT datasets into bone-specific labels.

A detailed step-by-step protocol of the DL facilitated segmentation method is provided below.

Step 1: Open Amira software (requires personal or institutional license).
Step 2: Open Python tab, select Create New Python Environment with environment name: deep-learning-environment-2022_2. Check Install Deep Learning Packages.
Step 3: Restart software, open Python tab, select user environment deep-learning-environment-2022_2.
Step 4: Open data - DICOM stacks can be loaded by selecting all individual DICOM files or opening .am files that contain embedded DICOM stacks, if available.
Step 5: Apply Deep Learning Prediction module to the imported data object with the following details:
Data: Imported data object
Architecture: .json file (Supplementary File 2)
Weights: .hdf5 file (Supplementary File 3)
Tiling: manual - optimization possible by reducing tiling pixel size and increasing tiling overlap as able, depending on computational hardware. The module will fail if processing requirements are insufficient. Ensure in Edit, Preferences, Large data that memory allocation for the software is maximized as much as possible.
Tiling width, height, depth: 352 pixels
Tiling overlap: 0 pixels
Step 6: Apply Image Recipe Player, right-click in the Project Area without targeting a specific data object. Input/assess the following:
Open recipe: .hxisp file (Supplementary File 4)
Data: Imported data object
Input joints: Result of Step 5 (Deep Learning Prediction)
Step3 intensity range: 2500 - 20000
Step 7: Evaluate the resulting processed data object that contains the final segmentation - adjust Colormap to Labels256 to display all of the resulting individual segmentations (default is only 8 colors). For 2D: Apply Ortho Slice and for 3D: Apply Volume Rendering.
Details on the embedded recipe (BTH+DL+SEF+MEF_D2.hxisp) applied in the Image Recipe Player step are provided below (Supplementary File 5). Note that in Image Recipe Designer, individual steps can be visualized and exported as needed to evaluate steps for optimization in particular datasets. In the protocol, particular steps are highlighted that will need adjustment for any unique application, as these are dependent on imaging outputs (e.g., density) and/or size of objects (e.g., bones):
Step 1: Apply Median Filter with Data: Imported data object; Interpretation: 3D; Neighborhood: 26; Iterations: 3; Type: Iterative.
Step 2: Apply Thresholding - this step will require optimization depending on datasets and the particular threshold that targets the object of interest, in this case, the bone with Data: Result of Step 1 (Median Filter); Intensity range: 2500 - 20000.
Step 3: Apply Closing - this step will require optimization depending on size of joint spaces between bones with Data: Result of Step 1 (Median Filter); Type: Cube; Interpretation: 3D; Neighborhood: 26; Pixel size: 3.
Step 4: Apply Image Arithmetic with Input A: Result of Step 3 (Closing); Input B: Result of Step 1 (Median Filter); Result channels: like input A; Expression: A-B.
Step 5: Apply Thresholding with Data: Result of Step 4 (Image Arithmetic); Intensity range: 750 - 20000.
Step 6: Apply Image Arithmetic with Input A: Result of Step 2 (Thresholding); Input B: Result of Step 5 (Thresholding); Result channels: like input A; Expression: A-(B>0).
Step 7: Apply Structure Enhancement Filter with Input image: Imported data object; Interpretation: 3D; Tensor type: Hessian; Standard deviation min/max: 1 - 3 pixels; Standard deviation step: 1 pixel; Contrast: Dark; Structure type: Plane.
Step 8: Apply Auto Thresholding with Input image: Result of Step 7 (Structure Enhancement Filter); Type: Auto Threshold High; Interpretation: 3D; Mode: Min-max; Criterion: Factorisation.
Step 9: Apply Membrane Enhancement Filter with Data: Imported data object; Output selection: Planeness Tensor Voting; Tensor voting scale: 3 pixels; Densification scale: 3 pixels; Type: Ridge; Contrast: Dark; Scale: 1 pixel.
Step 10: Apply Auto Thresholding with Input image: Result of Step 9 (Membrane Enhancement Filter); Type: Auto Threshold High; Interpretation: 3D; Mode: Min-max; Criterion: Factorisation.
Step 11: Apply Dilation - this step will require optimization depending on the size of joint spaces - with Input image: Result of Step 10 (Auto Thresholding); Type: Ball; Interpretation: 3D; Size: 1 pixel; Precision: Faster.
Step 12: Apply Image Arithmetic with Input A: Result of Step 11 (Dilation); Input B: Result of Step 8 (Auto Thresholding); Input C: Result of Deep Learning Prediction; Result channels: like input A; Expression: A||B||C.
Step 13: Apply Remove Small Spots with Input image: Result of Step 12 (Image Arithmetic); Interpretation: 3D; Size: 500 pixels.
Step 14: Apply Image Arithmetic with Input A: Result of Step 13 (Remove Small Spots); Input B: Result of Step 6 (Image Arithmetic); Result channels: like input A; Expression: !A*B.
Step 15: Apply Remove Small Spots with Input image: Result of Step 14 (Image Arithmetic); Interpretation: 3D; Size: 500 pixels.
Step 16: Apply Labeling with Input image: Result of Step 15 (Remove Small Spots); Interpretation: 3D; Neighborhood: 26.
Step 17: Apply Convert Image Type with Data: Result of Step 1 (Median Filter); Output type: 16-bit unsigned; Normalization mode: Scaling; Scaling: Scale 3, Offset 2000.
Step 18: Apply Marker Based Watershed Inside Mask with Data: Result of Step 17 (Convert Image Type); Markers: Result of Step 16 (Labeling); Binary Mask: Result of Step 6 (Image Arithmetic); Split type: Low Intensity.
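
The mask logic of recipe Steps 12 to 16 can be illustrated with a small two-dimensional toy example in plain Python. This is a sketch under stated assumptions, not the recipe itself: the grids, function names, and the use of 8-connectivity (as a 2D analogue of the 26-neighborhood) are hypothetical, and the small-spot removal and final watershed expansion are omitted. It shows why combining several joint space detections matters: a single remaining bridge voxel is enough to merge two bones into one marker.

```python
from collections import deque


def logical_or(*masks):
    """Voxel-wise OR of binary masks (recipe expression A||B||C)."""
    rows, cols = len(masks[0]), len(masks[0][0])
    return [[int(any(m[r][c] for m in masks)) for c in range(cols)] for r in range(rows)]


def remove_joints(bone, joints):
    """Keep bone voxels that are not joint voxels (recipe expression !A*B)."""
    return [[int(b and not j) for b, j in zip(bone_row, joint_row)]
            for bone_row, joint_row in zip(bone, joints)]


def label(mask):
    """Connected-component labeling with 8-connectivity; returns (labels, count)."""
    rows, cols = len(mask), len(mask[0])
    labels = [[0] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] and not labels[r][c]:
                count += 1
                labels[r][c] = count
                queue = deque([(r, c)])
                while queue:
                    y, x = queue.popleft()
                    for dy in (-1, 0, 1):
                        for dx in (-1, 0, 1):
                            ny, nx = y + dy, x + dx
                            if (0 <= ny < rows and 0 <= nx < cols
                                    and mask[ny][nx] and not labels[ny][nx]):
                                labels[ny][nx] = count
                                queue.append((ny, nx))
    return labels, count


# Hypothetical thresholded bone mask in which two bones appear bridged.
bone = [[1] * 9 for _ in range(3)]
# Filter-based joint detection finds only part of the joint (column 4, rows 0-1).
filter_joints = [[0, 0, 0, 0, 1, 0, 0, 0, 0],
                 [0, 0, 0, 0, 1, 0, 0, 0, 0],
                 [0] * 9]
# The DL prediction (hypothetical) covers the remaining gap (column 4, row 2).
dl_joints = [[0] * 9, [0] * 9, [0, 0, 0, 0, 1, 0, 0, 0, 0]]

_, markers_without_dl = label(remove_joints(bone, filter_joints))
_, markers_with_dl = label(remove_joints(bone, logical_or(filter_joints, dl_joints)))
print(markers_without_dl, markers_with_dl)
```

With the filter-based detection alone, the bridge at the bottom row keeps both bones in a single connected marker (an over-connection error); adding the second detection closes the gap and yields two markers, one per bone.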

Segmentation method testing and quantification
The segmentation method was tested through the generation of a recipe that incorporated the DL joint prediction with the downstream image processing recipe on an intensity range of 2500 - 20000 Hounsfield units. Recipe generation allowed for batch processing (Apply a Recipe on a Batch of Files) of the original micro-CT datasets (.am file format as image stack saved after import of initial .dcm files into Amira). Computer hardware included 16 cores from an Intel Xeon Gold 5218 central processing unit (CPU) at 2.30 GHz, 128 GB of double data rate fourth generation (DDR4) error correction code (ECC) random-access memory (RAM) at 2666 megatransfers (MT)/s, and 24 GB of virtual graphics processing unit (vGPU/VRAM) on a 64-bit operating system running Windows 10 (operating system build: 19044.4780). Each hindpaw dataset (2 hindpaws) was segmented in approximately 32.7 ± 8.42 min (mean ± standard deviation) without user intervention. This is compared to the prior SA model, where segmentation time was dependent on user experience, with novice users at 40.5 ± 9.06 min per dataset and experienced users at 19.3 ± 5.34 min per dataset (WT datasets only, including correction of segmentation errors)¹². Both DL and SA methods demonstrate remarkable improvements from prior gold standard manual contouring for segmentation at 190.6 ± 30.4 min per dataset by an experienced user (performed using conventional Scanco analysis)¹². Forepaw datasets (2 forepaws) were segmented in approximately 53.4 ± 23.6 min without user intervention, where increased segmentation time can be attributed to additional structures within the original imaging datasets (i.e., spine and ribs), which are not present in the more distally isolated hindpaws and inflate the segmentation time in the absence of preceding volume editing steps.

Quantification of accuracy was performed by visual inspection (HMK) to identify correct segmentation or error type based on expected bone anatomy (Hindpaw:¹²^,³⁹; Forepaw:⁴⁰). Accuracy was calculated as a percentage by:

$$\text{Accuracy (\%)} = \frac{TP + TN}{TP + TN + FP + FN} \times 100$$

where true positives were correctly segmented bones, true negatives equal 0 (there are no circumstances in which bones should be missing, and background was not relevant for quantification), false positives were over-split bones, and false negatives were over-connected bones. Accuracy was determined to be an appropriate quantitative metric given the single-class problem (i.e., identifying joint spaces), and true negatives (i.e., background) were non-contributory in accuracy calculations, thus reducing the risk of overestimating performance. The automated segmentation method does not include bone naming; the bone names are later associated with segmented materials manually by the user.
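
As a worked illustration of this accuracy definition, the short Python sketch below computes the percentage from per-bone counts. The function name and the example counts are hypothetical, and it assumes that each over-split or over-connected bone is counted once.

```python
def segmentation_accuracy(correct, over_split, over_connected, true_negatives=0):
    """Percent accuracy with TP = correct, FP = over-split, FN = over-connected bones."""
    tp, fp, fn, tn = correct, over_split, over_connected, true_negatives
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("no bones were evaluated")
    return 100.0 * (tp + tn) / total


# Hypothetical hindpaw: 30 bones, 28 correct, 1 over-split, 1 over-connected.
example = segmentation_accuracy(correct=28, over_split=1, over_connected=1)
print(round(example, 1))
```

Because true negatives are fixed at zero, the measure reduces to the fraction of bones segmented correctly, which matches the "bones segmented correctly / total bones" description used for the SA method.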

Evaluation of the hindpaws involved in this study has confirmed the fixed fusion in the tarsals of the navicular and lateral cuneiform (NAVLAT) in C57BL/6 mice⁴¹, and additionally determined that the adjacent intermediate cuneiform (INT) may also be variably fused with the NAVLAT structure (NAVLATINT)¹²^,³⁹. Similar variable fusion was appreciated in the carpal region of forepaws, where the trapezoid (ZOID; lesser multangular) and centrale (CENT) bones may present as either a single fused structure (CENTZOID) or as separate individual bones. Additional carpal bones investigated for segmentation accuracy included the trapezium (ZIUM; greater multangular), capitate (CAP), hamate (HAM), triquetrum (TRI; triangular), pisiform (PIS), scaphoid (navicular)/lunate (SCAPHATE; fixed fusion), and falciformis (FALC). The forepaw metacarpals (MET-F; 1-5), proximal phalanges (PP-F; 1-5), distal phalanges (DP-F; 2-5), and sesamoids (S-F; 1-10) are numbered lateral to medial, as opposed to hindpaw counterparts (metatarsals (MET-H), PP-H, DP-H, and S-H), which are numbered medial to lateral¹². Along with NAVLATINT, additional tarsal bones were evaluated for hindpaws, as previously described¹²^,²³, including the calcaneus (CALC), cuboid (CUB), medial cuneiform (MED), talus (TAL), and tibiale (TIB). Note that the overall cohort accuracy quantifications vary slightly depending on whether accuracy is averaged per dataset (where the number of bones varies due to anatomical fusions) or calculated from the total number of individual bones analyzed.

Statistical analysis
Statistical analysis, including 3-way or 2-way mixed-effects analysis with interaction effects or Sidak's multiple comparisons and Fisher's exact test, was performed as appropriate in GraphPad Prism (v10.2.0; San Diego, CA, USA). Males (2-8 months) and females (2-5 months) were analyzed separately, given the distinct timeframes of evaluation based on early TNF-Tg female mortality³². The sample sizes of the WT hindpaws used for training/validation and methodological testing are provided in Supplementary Table 1, along with sample size details for tested WT and TNF-Tg hindpaws and forepaws in Supplementary Table 2, Supplementary Table 3, and Supplementary Table 4. As certain timepoints for WT hindpaw testing incorporated accuracy evaluation for <3 hindpaws, interaction effects were reported without post-hoc multiple comparisons in analyses that included WT hindpaws. Hindpaws were omitted from analysis, in whole or in part, if there were imaging errors with incomplete capture of the paw, considerable motion artifact rendering the scans uninterpretable, and/or if the animal died prior to a scheduled imaging session, as all data were collected in vivo.

Results

Implementation of automated joint space identification improves bone segmentation accuracy
Given the heterogeneity of bone shape and architecture in complex structures such as the murine hindpaw, we build upon our systematic image processing algorithm¹² through DL training predictions (blue) coupled with image processing steps for robust identification of inter-bone joint spaces in micro-CT datasets (Figure 1A-B; process described below and shown in Supplementary Figure 1). The identification of spaces between bones enabled precise bone separation and segmentation of individual hindpaw bones (separate colors; Figure 1C). For the DL component, the training and validation datasets (WT) consisted of equal ages (2-6 months of age, n=8 hindpaws per age) and sex (n=20 hindpaws per sex). The remainder of the WT hindpaws (n=44, from 2-8 months of age, excluding 6 months as all were used for training and validation) served as the test datasets to quantify the accuracy of bone segmentation (Figure 1D). There were 2 WT male hindpaws at 2 months and 2 WT female hindpaws at 3 months that were omitted due to imaging error (Supplementary Table 1).

Along with implementation in WT hindpaws, we also tested the automated segmentation approach on hindpaws from TNF-Tg mice (n=56 male hindpaws, n=48 female hindpaws) with spontaneous inflammatory-erosive arthritis. There were 4 TNF-Tg female hindpaws at both 4 months and 5 months that were omitted due to imaging error or premature death prior to endpoint at 5 months (Supplementary Table 2). The novel segmentation algorithm automatically detected the joint spaces (blue, left) for individual bone separation (colors, right) across both sexes and genotypes (Figure 2A-D). For segmentation accuracy of individual bones shown in Supplementary Table 5 and Supplementary Table 6, WT outperformed TNF-Tg datasets for both males (WT 98.4% versus TNF-Tg 93.1%, p<0.0001) and females (WT 98.7% versus TNF-Tg 92.1%, p<0.0001). The source of error was demonstrated visually as incomplete closure of joint spaces (arrows in white dashed box), thus inadvertently over-connecting two distinct bones into a single segmentation (Figure 2C-D). These over-connected errors demonstrated in TNF-Tg hindpaws may represent sequelae of chronic damage leading to joint fusion, where the space between bones no longer exists. In fact, the difference in accuracy between WT and TNF-Tg datasets becomes more pronounced across time as the arthritic severity increases (Figure 2E-F), especially in the tarsal bones (Figure 2G-H, yellow = increased accuracy, green = decreased accuracy) that typically serve as reliable biomarkers for the progression of bone erosion²³. However, compared to our prior SA segmentation approach, there was a remarkable improvement in dataset accuracy overall (Figure 2E-F; WT male: SA 79.39% ± 5.73% versus DL 98.16% ± 1.47%, p<0.0001; WT female: SA 79.16% ± 4.84% versus DL 99.19% ± 1.63%, p<0.0001), demonstrating the robust methodologic advancements both in automaticity and reliability. 
Thus, our novel strategic model for hindpaw bone segmentation using DL-facilitated joint space identification provides significantly increased segmentation accuracy in WT datasets (>98%) compared to prior SA methods (~79%), but with slightly reduced performance when applied to hindpaws with inflammatory-erosive arthritis (92%-93%).

Flexible application of the segmentation method to forepaws highlights pronounced joint destruction and bone fusions in TNF-Tg mice with a rapid reduction in segmentation accuracy
We further extended the application of the novel segmentation method to murine forepaws (n=55 WT male forepaws, n=29 WT female forepaws, n=54 TNF-Tg male forepaws, and n=50 TNF-Tg female forepaws) with unique bone size and anatomy. There was 1 forepaw at 4 months from the WT male, 1 forepaw at 4 months and 2 forepaws at 5 months for the WT female, 2 forepaws at 3 months for the TNF-Tg male, and 2 forepaws at 4 months and 4 forepaws at 5 months for the TNF-Tg female, which were omitted due to imaging error or premature death prior to the endpoint. In addition, there was a partial imaging error for 1 forepaw at 3 months for the WT female with the omission of DP-F3, PP-F3, DP-F4, and PP-F4 (Supplementary Table 3 and Supplementary Table 4). For orientation, we provide a model WT forepaw with each individual bone separated by color and bone-specific nomenclature indicated from different viewpoints (Figure 3). Prior investigation in TNF-Tg mice has primarily focused on the hindpaw, while here we demonstrate the architecture of murine forepaws in both WT and TNF-Tg mice. We particularly highlight the carpals (yellow dashed circle) and sesamoids (blue dashed circle) that exhibit visually profound erosive disease, especially in TNF-Tg females (Figure 4A-D). As such, comparison of hindpaw and forepaw segmentation accuracy showed marked reduction in forepaws (paw type effect p<0.0001) primarily driven by the steep decline of bone integrity with increased age and disease severity in TNF-Tg datasets (Figure 4E-F; paw x genotype effect p=0.0083; male forepaws: WT 87.29% ± 2.07% versus TNF-Tg 72.65% ± 11.70%, p<0.0001). Similar to hindpaws, the decline in TNF-Tg segmentation accuracy with aging and disease severity is more pronounced in carpals, along with the sesamoids (Figure 4G-H, Supplementary Table 7, and Supplementary Table 8). 
This regional bone pathology may be driven by enhanced erosive activity at the adjacent articulation of the MET-F and PP-F (metacarpophalangeal joint). Evaluation of error type revealed that TNF-Tg forepaws tend to exhibit a higher proportion of completely eroded bones compared to hindpaws (Supplementary Figure 2, red as missing). While certainly representative of progressive arthritic severity, the absence of bones in TNF-Tg forepaws could also highlight a limitation in image resolution. The severe erosions in TNF-Tg forepaws are further demonstrated by representative images across time that highlight the carpal region (white arrows) and the progressive complete dislocation of the paw from the forearm (yellow arrows) most notable in TNF-Tg females (Supplementary Figure 3). Thus, flexible application of the automated bone segmentation method to unique structures of the forepaw showed remarkable performance in WT datasets (~87%) with similar reduction in accuracy in TNF-Tg forepaws with inflammatory-erosive arthritis (67%-72%).

Data availability:
As described in the micro-CT image collection section, the hindpaw data was previously published¹²^,²³^,³⁵ and is publicly available at https://doi.org/10.5281/zenodo.11191782²⁸. The data for accuracy quantification in the SA segmentation method for WT¹² and TNF-Tg²³ datasets was repurposed for direct comparison with the novel DL model described here. No specific data from the additional prior study was repurposed³⁵, but the same hindpaw datasets that are publicly available²⁸ were also utilized. Additional details on licensing and repurposing of data are provided below. For the purposes of the described study, the corresponding forepaw data has also been made publicly available at the Zenodo repository (https://doi.org/10.5281/zenodo.14865639)²⁹.

The accuracy data for the SA segmentation method WT datasets¹² was repurposed in Figure 2. Reuse of this material is protected by the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License https://creativecommons.org/licenses/by-nc-nd/4.0/legalcode. As authors of the referenced work, we retain the right to prepare other derivative works via Author’s Rights from Elsevier https://beta.elsevier.com/about/policies-and-standards/copyright. The datapoints have been revisualized for comparison with accuracy over time with TNF-Tg counterparts and directly compared to the novel DL method described here.

The accuracy data for the SA segmentation method WT and TNF-Tg datasets²³ was repurposed for Figure 2, and the WT and TNF-Tg hindpaw datasets were further evaluated for volumetric measurements previously²³. Reuse of the material is protected by the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. The datapoints have been revisualized for assessment of accuracy over time and directly compared to the novel DL method described here.

The same publicly available WT and TNF-Tg hindpaw datasets²⁸ were further utilized for bone volumetric measurements previously for novel comparisons with wheel running cohorts³⁵. Reuse of the material is protected by the Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The same publicly available datasets²⁸ have been utilized in the current work, but without any specific utilization or modification of previously published datapoints.

Figure 1: Automated joint space detection by strategic image processing and deep learning predictions for bone segmentation. Murine micro-CT datasets with visualization from (A) the dorsal (top) and plantar (bottom) surfaces were processed for (B) subsequent automated identification of joint spaces (blue) using a DL model (described in Supplementary Figure 1) developed from gold-standard bone segmentations¹²^,²³. (C) Final successful bone separation (bone-specific colors) was accomplished through an additional combination of image processing steps, including a black top-hat¹², structure enhancement³⁷, and membrane enhancement with tensor voting³⁸ for robust joint space identification to label individual bones. (D) Training and validation (n=40 hindpaws) of the DL component were performed with WT murine hindpaws of equal age (from 2-6 months, n=8 hindpaws each timepoint) and sex (n=20 hindpaws male/female) distribution, with a randomized 25% of subvolumes used for validation (3 subvolumes per hindpaw, total 120 subvolumes). The remaining WT hindpaws (n=44) were evaluated as test cases for further analysis. The combination of the DL model and image processing algorithms was evaluated using previously published and publicly available datasets²³^,²⁸. Please click here to view a larger version of this figure.
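The training and validation allocation described in panel (D) can be illustrated with a short sketch. It assumes that subvolumes are identified by a hypothetical (hindpaw, subvolume) index pair and that the random 25% is drawn at the subvolume level with an arbitrary seed; the function name and the seed are illustrative and are not taken from the Amira training module.

```python
import random

def split_subvolumes(n_hindpaws=40, per_hindpaw=3, val_fraction=0.25, seed=0):
    """Randomly split subvolume identifiers into training and validation sets."""
    # Hypothetical identifiers: (hindpaw index, subvolume index within that hindpaw)
    ids = [(paw, sub) for paw in range(n_hindpaws) for sub in range(per_hindpaw)]
    rng = random.Random(seed)  # fixed seed (an assumption) so the split is repeatable
    rng.shuffle(ids)
    n_val = round(len(ids) * val_fraction)
    return ids[n_val:], ids[:n_val]

train, val = split_subvolumes()
print(len(train), len(val))
```

With 40 hindpaws and 3 subvolumes each, this yields 120 subvolumes, of which 30 are held out for validation and 90 are used for training.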

Figure 2: Implementation of automated joint space identification with deep learning facilitation improves bone segmentation accuracy. (A-B) Following the development of the automated joint space detection, we applied the DL model (left: blue joint spaces; right: bone-specific segmentation colors) to the remaining test cases for WT males and females. (C-D) We also assessed performance on age-matched cohorts (males: 2-8 months; females: 2-5 months) of TNF-Tg mice with progressive inflammatory-erosive arthritis associated with early onset mortality in females³². Inset images demonstrate high-magnification segmentation errors (dashed boxes) where disconnections in predicted joint spaces (white arrows) lead to a leak in bone separation, resulting in over-connected bone segmentation errors. (E-F) Note that the 6-month male timepoint was omitted as all WT datasets were utilized for training and validation, so they were not included in the DL testing cohort. Compared to our previous SA segmentation algorithms¹²^,²³, the segmentation accuracy (bones correctly segmented / total bones) was remarkably improved for both WT and TNF-Tg datasets with the DL approach, regardless of sex (average accuracy lines: solid black = DL WT, dashed black = DL TNF-Tg, solid grey = SA WT, dashed grey = SA TNF-Tg). However, the accuracy of TNF-Tg segmentations notably declined with time and associated progressive joint damage compared to WT, although it continued to outperform the SA method. (G-H) Heatmaps of accuracy specified to bone compartments (T = tarsals, MT = metatarsals, PP = proximal phalanges, DP = distal phalanges, S = sesamoids) demonstrate the increased error rate in TNF-Tg mice as predominantly localized to the tarsal region (light (yellow) = high (100%), dark (purple) = low (20%) accuracy). As mentioned, inset images (C-D) highlight the source of error with disconnected joint spaces (arrows, left image) leading to over-connected bones (colors, right image).
In fact, the errors were predominantly over-connected (2+ bones segmented as 1 material; noted in Supplementary Figure 2), which may represent the pathologic process of joint fusions with increasing arthritic severity. Statistics: 3-way mixed-effects analysis (SA versus DL; method x genotype x time; E-F), 2-way mixed-effects analysis (WT vs TNF; genotype x time; E-H); ****p<0.0001, **p<0.01, *p<0.05 (interaction effects); data presented as mean ± standard deviation. Sample sizes: n=34 hindpaws WT male (n=2 at 2-months, n=4 at 3-months, n=6 at 4-5-months, n=0 at 6-months [all data used for training and validation], n=8 at 7-8-months), n=10 hindpaws WT female (n=4 at 2-months, n=2 at 3-5-months), n=56 hindpaws TNF-Tg male (n=8 at 2-8-months), and n=48 hindpaws TNF-Tg female (n=14 at 2-3-months and n=10 at 4-5-months). Data used in this figure has been modified from prior studies¹²^,²³. Please click here to view a larger version of this figure.

Figure 3: Flexible application of joint space deep learning segmentation to other complex structures highlights murine forepaw bone anatomy. Next, we evaluated the potential for the joint space segmentation DL model to automatically separate bones in additional complex structures beyond the hindpaw. The segmentation method was implemented in the corresponding forepaw micro-CT datasets visualized from the (A) dorsal, (B) plantar, (C) lateral, and (D) medial surfaces with colors representing individual segmented bones. We identified the potential for accurate segmentation of the forepaw bones, including distinct carpals, metacarpals (#, MET-F), proximal phalanges (^, PP-F), distal phalanges (~, DP-F), sesamoids (dashed circles, S-F), and claws (*) with bone-specific labeling corresponding with known forepaw anatomy⁴⁰. Please click here to view a larger version of this figure.

Figure 4: TNF-Tg mice exhibit pronounced forepaw joint destruction and bone fusions with a rapid reduction in segmentation accuracy. (A-B) Given the complexity and small architecture of murine forepaws highlighted by dorsal (left) and plantar (right) visualization of micro-CT images from WT male and female mice, (C-D) the anatomy and associated arthritis in TNF-Tg mice has not been previously assessed. Application of our novel joint space DL approach provided an initial opportunity to evaluate these complex structures by reducing the analytical challenges with achievement of >85% accuracy of WT forepaws, although with deficient accuracy compared to hindpaws (average accuracy lines: solid blue = WT hindpaw, dashed blue = TNF-Tg hindpaw, solid red = WT forepaw, dashed red = TNF-Tg forepaw). (E-F) In addition, TNF-Tg forepaws showed a rapid and dramatic decline in segmentation accuracy due to errors localized to the carpals (yellow dotted circles in A-D) and sesamoids (blue dotted circles in A-D) over time. (G-H) The regional reductions in segmentation accuracy are shown by heatmaps (light (yellow) = high (100%), dark (purple) = low (20%) accuracy) of bone compartments (C = carpals, MC = metacarpals, PP = proximal phalanges, DP = distal phalanges, S = sesamoids). Note that the 6-month male timepoint was omitted in (E) as all WT hindpaw datasets were utilized for training and validation, so they were not included in the DL testing cohort. Statistics: 3-way mixed-effects analysis (hindpaw vs forepaw, WT vs TNF; paw type x genotype x time, interaction effects reported; E-F), 2-way mixed-effects analysis with Sidak's multiple comparisons (WT vs TNF; genotype x time; G-H); ****p<0.0001, **p<0.01, *p<0.05; data presented as mean ± standard deviation.
Sample sizes: n=55 forepaws WT male (n=8 at 2-3- and 5-8-months, n=7 at 4-months), n=29 forepaws WT female (n=8 at 2-3-months, n=7 at 4-months, n=6 at 5-months), n=54 forepaws TNF-Tg male (n=8 at 2- and 4-8-months, n=6 at 3-months), and n=50 forepaws TNF-Tg female (n=14 at 2-3-months, n=12 at 4-months, and n=10 at 5-months). DL hindpaw data (E-F) reproduced from Figure 2E-F for additional comparison with DL forepaw data. Please click here to view a larger version of this figure.

Supplementary Figure 1: Development and training of the joint detection deep learning model. (A) Ground truth joint regions were obtained from initial ground truth bone segmentations by an automatic recipe using Amira, which combines label expansion, extraction of label interfaces, masking, and dilation. (B) For each of the 20 training micro-CT datasets (40 hindpaws), 6 subvolumes of 200 x 200 voxels were manually extracted from tarsals, distal phalanges, and background regions, evenly split between left and right paws (3 patches per hindpaw). The resulting 120 subvolumes were then used as input for a 3D segmentation Amira training module along with the corresponding labeled joint regions as the ground truth target. A randomized subset of 25% of the patches was used for validation to control model overfitting during training. Please click here to download this File.
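The four named recipe steps in panel (A) can be pictured with a toy two-dimensional sketch. This is not the Amira recipe: the 4-connected neighborhood, the number of expansion steps, the tie-breaking rule, and the interpretation of "masking" as keeping only pixels that were background in the original bone labels are all assumptions made for illustration, and every function name is hypothetical.

```python
def neighbors(r, c, rows, cols):
    """Yield the in-bounds 4-connected neighbors of pixel (r, c)."""
    for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        rr, cc = r + dr, c + dc
        if 0 <= rr < rows and 0 <= cc < cols:
            yield rr, cc

def expand_labels(labels, steps):
    """Label expansion: grow each bone label into background, one pixel per step."""
    rows, cols = len(labels), len(labels[0])
    grid = [row[:] for row in labels]
    for _ in range(steps):
        grown = [row[:] for row in grid]
        for r in range(rows):
            for c in range(cols):
                if grid[r][c] == 0:
                    near = [grid[rr][cc] for rr, cc in neighbors(r, c, rows, cols)
                            if grid[rr][cc] != 0]
                    if near:
                        grown[r][c] = min(near)  # assumed deterministic tie-break
        grid = grown
    return grid

def interfaces(grid):
    """Interface extraction: mark pixels that touch a different nonzero label."""
    rows, cols = len(grid), len(grid[0])
    return [[int(grid[r][c] != 0 and any(grid[rr][cc] not in (0, grid[r][c])
                                         for rr, cc in neighbors(r, c, rows, cols)))
             for c in range(cols)] for r in range(rows)]

def dilate(mask):
    """Dilation: grow a binary mask by one pixel."""
    rows, cols = len(mask), len(mask[0])
    return [[int(mask[r][c] or any(mask[rr][cc]
                                   for rr, cc in neighbors(r, c, rows, cols)))
             for c in range(cols)] for r in range(rows)]

def joint_regions(labels, steps=2):
    """Chain expansion, interface extraction, masking, and dilation."""
    edge = interfaces(expand_labels(labels, steps))
    # Masking (assumed meaning): keep interface pixels that were not bone originally
    masked = [[int(e and labels[r][c] == 0) for c, e in enumerate(row)]
              for r, row in enumerate(edge)]
    return dilate(masked)

# Two toy "bones" (labels 1 and 2) separated by a two-pixel gap
print(joint_regions([[1, 1, 0, 0, 2, 2]]))
```

In this toy row, the gap between the two labels is recovered as the joint region and then widened by one pixel on each side, which mirrors the idea of deriving joint-space targets from existing bone labels rather than annotating joints by hand.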

Supplementary Figure 2: Distinct distribution of error types between hindpaws and forepaws. Similar to the SA segmentation algorithm previously developed¹²^,²³, the joint space DL model produced the greatest proportion of errors by over-connecting bones (green, 2+ bones segmented as 1 material), most notable in the (A-D) hindpaws or (E-F) WT forepaws. As noted in Figure 2, over-connected errors will occur if there is a gap in the detected joint space that may occur for various reasons, including greater bone proximity than image resolution, motion artifact blurring the joint space, or bone remodeling in the context of arthritis, leading to joint fusions. (G-H) Interestingly, TNF-Tg forepaws exhibit a remarkably increased proportion of missing bones (red), meaning the bone was completely absent from the segmentation. These errors are likely attributed to a combination of severe erosions and deficiencies in image resolution, given the relatively decreased size of forepaw bones, especially carpals and sesamoids, as the predominant source of error (Figure 4), compared to those of hindpaws. Additional types of errors include over-split (blue, 1 bone segmented as 2+ materials) or both over-connected and over-split (orange). Pie charts represent proportions of total errors attributed to specific error subtypes. Please click here to download this File.
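The error categories defined above can also be expressed as a simple overlap rule between a reference labeling and a predicted labeling. The sketch below is one possible formalization and is not the procedure used to score the datasets in this study: it assumes voxel-wise integer labels with 0 as background, treats any overlap as meaningful (a real implementation would likely need a minimum-overlap threshold to ignore stray voxels), and uses hypothetical function names and toy data.

```python
from collections import defaultdict

def classify_bones(truth, pred):
    """Assign each reference bone one category based on label overlap.

    truth, pred: equal-length sequences of integer labels per voxel (0 = background).
    """
    pred_per_truth = defaultdict(set)  # reference bone -> predicted materials on it
    truth_per_pred = defaultdict(set)  # predicted material -> reference bones it covers
    bones = set()
    for t, p in zip(truth, pred):
        if t == 0:
            continue
        bones.add(t)
        if p != 0:
            pred_per_truth[t].add(p)
            truth_per_pred[p].add(t)

    categories = {}
    for bone in sorted(bones):
        preds = pred_per_truth.get(bone, set())
        over_split = len(preds) > 1  # 1 bone segmented as 2+ materials
        over_connected = any(len(truth_per_pred[p]) > 1 for p in preds)  # 2+ bones, 1 material
        if not preds:
            categories[bone] = "missing"
        elif over_split and over_connected:
            categories[bone] = "over-connected and over-split"
        elif over_connected:
            categories[bone] = "over-connected"
        elif over_split:
            categories[bone] = "over-split"
        else:
            categories[bone] = "correct"
    return categories

def accuracy(categories):
    """Bones correctly segmented divided by total bones."""
    return sum(c == "correct" for c in categories.values()) / len(categories)

# Toy data: bone 1 correct, bones 2 and 3 fused, bone 4 split, bone 5 absent
truth = [1, 1, 2, 2, 3, 3, 4, 4, 5, 5]
pred = [7, 7, 8, 8, 8, 8, 9, 10, 0, 0]
result = classify_bones(truth, pred)
print(result, accuracy(result))
```

In the toy example, one of five bones is segmented correctly, so the accuracy under this definition is 20%.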

Supplementary Figure 3: Evaluation of progressive TNF-Tg forepaw arthritis with severe bone erosions and joint dislocations. To visualize the structural changes in forepaws over time, we provided representative images of the dorsal surface from (A) WT male, (B) TNF-Tg male, (C) WT female, and (D) TNF-Tg female forepaws over time from 2-5 months (left to right) to particularly highlight the carpal region (white arrows). Note the severe bone erosions and remodeling that occur by approximately 4 months in females and 5 months in males. These time periods predate the typical onset of severe bone erosions in hindpaws at approximately 5 months in females and 7-8 months in males²³. (E) A side view of TNF-Tg female forepaws is also shown to demonstrate the progressive dislocation of the entire paw from the forearm (yellow arrows) associated with the joint destruction. Please click here to download this File.

Supplementary Table 1: Sample sizes of WT hindpaws for DL training, validation, and methodological testing. Sample sizes in the number of hindpaws are provided across age (months 2-8) and organized by datasets used for DL training/validation, total methodologic testing, or those omitted either due to imaging error, severe motion artifact, or death prior to the scheduled micro-CT scan. Black cells from months 6-8 for females indicate the planned termination of scans after 5 months due to early mortality of TNF-Tg experimental counterparts. Please click here to download this File.

Supplementary Table 2: Sample sizes of TNF-Tg hindpaws for methodological testing. Sample sizes in the number of hindpaws are provided across age (months 2-8) and organized by datasets used for total methodologic testing or those omitted due to imaging error, severe motion artifact, and/or death prior to the scheduled micro-CT scan. Black cells from months 6-8 for females indicate the planned termination of scans after 5 months due to early mortality of TNF-Tg female mice. Please click here to download this File.

Supplementary Table 3: Sample sizes of WT forepaws for methodological testing. Sample sizes in the number of forepaws are provided across age (months 2-8) and organized by datasets used for total methodological testing or those omitted due to imaging error, severe motion artifact, and/or death prior to the scheduled micro-CT scan. Black cells from months 6-8 for females indicate the planned termination of scans after 5 months due to early mortality of TNF-Tg experimental counterparts. *At 3 months for WT females, n=1 forepaw had omitted DP-F3, PP-F3, DP-F4, and PP-F4 due to imaging error, although the remainder of the forepaw was evaluated. Please click here to download this File.

Supplementary Table 4: Sample sizes of TNF-Tg forepaws for methodological testing. Sample sizes in the number of forepaws are provided across age (months 2-8) and organized by datasets used for total methodological testing or those omitted due to imaging error, severe motion artifact, and/or death prior to the scheduled micro-CT scan. Black cells from months 6-8 for females indicate the planned termination of scans after 5 months due to early mortality of TNF-Tg female mice. Please click here to download this File.

Supplementary Table 5: Individual bone accuracy of male hindpaws. To identify the particular bones that reduce the segmentation accuracy in TNF-Tg versus WT hindpaws, details are provided on the number of bones segmented correctly, incorrectly, and the percent correct relative to the total bones evaluated in male mice. Within the tarsal region where the primary deficits occur (Figure 2), the calcaneus (CALC), intermediate cuneiform (unfused, INT), and navicular/lateral cuneiform (unfused) demonstrated the most prominent decrease in accuracy for TNF-Tg hindpaws. Statistics: Fisher's exact test; *p<0.05, **p<0.01, ***p<0.001, ****p<0.0001. Please click here to download this File.

Supplementary Table 6: Individual bone accuracy of female hindpaws. To identify the particular bones that reduce the segmentation accuracy in TNF-Tg versus WT hindpaws, details are provided on the number of bones segmented correctly, incorrectly, and the percent correct relative to the total bones evaluated in female mice. Given the utilization of datasets for DL training and validation, along with the decreased timeframe to 5 months for comparison with TNF-Tg mice that exhibit early mortality³², the total number of allocated DL testing hindpaws for WT females limits the capacity for individual bone comparisons to explain the overall decreased accuracy in TNF-Tg datasets. Statistics: Fisher's exact test; ****p<0.0001. Please click here to download this File.

Supplementary Table 7: Individual bone accuracy of male forepaws. To identify the particular bones that reduce the segmentation accuracy in TNF-Tg vs WT forepaws, details are provided on the number of bones segmented correctly, incorrectly, and the percent correct relative to the total bones evaluated in male mice. Within the carpal and sesamoid regions where the primary deficits occur (Figure 4), the capitate (CAP), triquetrum (TRI), centrale (unfused, CENT), scaphoid/lunate (SCAPHATE), trapezoid (ZOID), and sesamoids 2-10 demonstrated the most prominent decrease in accuracy for TNF-Tg forepaws. Of note, the accuracy of sesamoids 1 and 2 is deficient for both the WT and TNF-Tg datasets. Interestingly, metacarpal 1 actually showed improvements in segmentation accuracy in TNF-Tg mice, potentially due to close articulations with adjacent bones leading to over-connected errors that are mitigated with arthritic erosions. Statistics: Fisher's exact test; *p<0.05, **p<0.01, ***p<0.001, ****p<0.0001. Please click here to download this File.

Supplementary Table 8: Individual bone accuracy of female forepaws. To identify the particular bones that reduce the segmentation accuracy in TNF-Tg vs WT forepaws, details are provided on the number of bones segmented correctly, incorrectly, and the percent correct relative to the total bones evaluated in female mice. Within the carpal and sesamoid regions where the primary deficits occur (Figure 4), the capitate (CAP), hamate (HAM), triquetrum (TRI), and sesamoids 1-10 demonstrated the most prominent decrease in accuracy for TNF-Tg forepaws. Of note, the accuracy of sesamoids 1 and 2 is deficient for both the WT and TNF-Tg datasets. Statistics: Fisher's exact test; *p<0.05, ***p<0.001, ****p<0.0001. Please click here to download this File.
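For readers who wish to reproduce the style of per-bone comparison reported in Supplementary Tables 5-8, the sketch below computes a two-sided Fisher's exact test p-value for a 2 x 2 table of correct and incorrect segmentations in two groups. It uses one common two-sided definition (summing the probabilities of all tables that are no more likely than the observed table) and may differ in detail from the statistical software used in this study. The counts in the example are hypothetical and are not taken from the supplementary tables.

```python
from math import comb

def fisher_exact_two_sided(correct_a, incorrect_a, correct_b, incorrect_b):
    """Two-sided Fisher's exact test p-value for a 2x2 table of bone counts."""
    row_a = correct_a + incorrect_a          # bones evaluated in group A
    col_correct = correct_a + correct_b      # correct bones across both groups
    total = row_a + correct_b + incorrect_b  # all bones evaluated

    def weight(x):
        # Unnormalized hypergeometric probability of x correct bones in group A
        return comb(col_correct, x) * comb(total - col_correct, row_a - x)

    lowest = max(0, row_a - (total - col_correct))
    highest = min(row_a, col_correct)
    observed = weight(correct_a)
    # Sum every table that is as likely as, or less likely than, the observed one
    extreme = sum(weight(x) for x in range(lowest, highest + 1)
                  if weight(x) <= observed)
    return extreme / comb(total, row_a)

# Hypothetical counts for one bone: group A 8 correct / 2 incorrect,
# group B 1 correct / 5 incorrect
p_value = fisher_exact_two_sided(8, 2, 1, 5)
print(round(p_value, 4))
```

Integer weights are compared before normalization so that ties with the observed table are handled exactly rather than through floating-point tolerance.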

Supplementary File 1: Joint segmentation recipe for deep learning model training. Series of embedded steps to extract segmented joint spaces from gold-standard pre-segmented micro-CT hindpaws that were used to train the DL model for joint space identification. Please click here to download this File.

Supplementary File 2: Bone segmentation recipe using image processing with deep learning facilitation. Series of embedded steps to transform original micro-CT data into segmentations of individual bones using image processing steps combined with the output of DL joint space identification to guide bone separation. Please click here to download this File.

Supplementary File 3: Deep learning prediction weights. File used as input for weights during deep learning prediction of joint space segmentation. Please click here to download this File.

Supplementary File 4: Deep learning prediction architecture. File used as input for architecture during deep learning prediction of joint space segmentation. Please click here to download this File.

Supplementary File 5: Deep learning Python script. Python script used for deep learning prediction of joint space segmentation. Please click here to download this File.

Discussion

To the end of fully automated analyses of bone volumes in mice, we generated further improvements in the segmentation of micro-CT data in complex structures, particularly murine hindpaws. The strategy was to target the articulating joint spaces to create boundaries for bone separation, where focusing on the negative space between bones enabled flexible implementation in alternative structures like the forepaw, since the approach was not particular to the shape and anatomy of the distinct hindpaw bones. Although the segmentation accuracy decreased when performed on forepaws, WT datasets still demonstrated bone accuracy of >85%. Well-described corrective processes¹² could be applied to create pre-annotated forepaw model datasets for DL training, dramatically lowering barriers for creation of structure-specific algorithms. This novel approach also allowed for application to TNF-Tg paws with severe and progressive inflammatory-erosive arthritis. In TNF-Tg paws, the decline in segmentation accuracy was striking over time, corresponding to the progressive increase in bone erosions and eventual pathologic bone-bone fusions from remodeling eroded surfaces with increased age. Thus, the remarkably successful application of an automated and highly accurate segmentation model in WT structures has the potential to guide future applications in disease models or other complex joints. Further investigation will focus on optimizing segmentation of arthritic joints that may quantify pathologic effects of bone erosions and fusions to identify disease biomarkers, as previously described²³.

Despite the successful utilization of micro-CT imaging to monitor erosion of small bones in pre-clinical arthritis models¹²^,²³^,²⁵^,³⁵^,⁴², there has been limited application of CT modalities in clinical evaluation. In particular for rheumatoid arthritis, scoring systems are primarily implemented for MRI⁴³, ultrasound⁴⁴^,⁴⁵, and/or conventional X-ray⁴⁶ to generate semi-quantitative and user-dependent measures of disease severity, often in conjunction with clinical metrics⁴⁷. As CT is considered the gold standard reference for evaluation of bone integrity⁴⁸^,⁴⁹, further optimization of clinically translatable analytical approaches promises to provide tremendous benefit for reliable and longitudinal quantitative assessment of bone volumes, both to inform measures of disease severity and to evaluate treatment response. Although imaging modalities such as MRI provide a greater breadth of information, including regions of inflammation, bone marrow changes, and soft tissue pathology, novel CT imaging approaches with multi-energy inputs⁵⁰ provide promise for extending CT utilization beyond the bone architecture. Despite these proposed benefits, we also acknowledge the immense challenges in clinical translation from developed pre-clinical analytical tools, considering the application to low-resolution clinical CT images and implementation in distinct human anatomy. Similar to our recent identification of bone-specific biomarkers in pre-clinical arthritis models²³, a detailed clinical effort investigating purely quantitative metrics of bone erosion would be a major advancement in disease monitoring.

While our current work provides a foundation for clinical implementation, given the potential for flexible application to novel structures by targeting joint spaces, a primary limitation is the reliance on Amira, a well-documented, research-oriented pre-clinical software package that is not intended for clinical diagnosis. However, the underlying algorithms and strategic design can be readily implemented in alternative software environments through the detailed methodology provided. Regardless of the research software used, incorporation into clinical use (rather than investigation) requires translational efforts that meet the regulatory requirements for introduction to clinical practice. For the application of the novel segmentation strategy, it is also important to consider the potential limitations in differential image resolution, where we have previously described that image resolution (i.e., voxel/structure size) is a key determinant of segmentation accuracy using solely image processing algorithms¹². In fact, this is possibly associated with the slight reduction in segmentation accuracy of forepaws, where the decreased size of the forepaw structures would inherently produce relatively reduced image quality compared to hindpaws. It is also important to acknowledge the discrepancy in the age range of training (2-6 months) and testing (including 7-8 months) datasets, which may impact application and accuracy with age-related changes, including continued bone growth or further onset of joint pathology. Our findings support maintained accuracy for WT hindpaws in the DL analysis beyond 6 months of age (Figure 2E), suggesting that the decline in segmentation performance in TNF-Tg counterparts is more likely related to inflammatory-erosive progression than to age itself. However, further studies with aged and elderly wild-type mice are needed to ensure consistent accuracy independent of the particular age range of the DL training cohort.
Lastly, expanding the described methods beyond a single-class bone separation approach to a more robust multi-class analytical tool that includes predicted bone names based on structural architecture or coordinate location in fixed anatomy (i.e., akin to an atlas tree) will certainly provide essential improvements and likely enhance method adoption.

In conclusion, we have designed a novel image processing and DL-facilitated micro-CT segmentation strategy to isolate individual bones within complex structures. This innovation demonstrates a remarkable improvement in both automaticity and segmentation accuracy compared to our recently created SA workflow¹², which served here as the foundation for the production of numerous gold-standard segmentations to train DL models and optimize the current improvements. Although translation of the segmentation methods to forepaws and paws with inflammatory-erosive arthritis showed reduced performance, implementation of this DL segmentation approach could reduce the manual efforts necessary to generate completely annotated datasets to allow for pathology- or structure-specific DL training models. Utilization of this DL method in future studies could allow for optimization of bone segmentation across different species and disease models in pre-clinical research to allow for detailed downstream quantitative analysis. We further urge the incorporation of such strategies into clinical research, as it promises eventual benefits for patient care.

Disclosures

Daniel Lichau and Rémi Blanc are employees of ThermoFisher Scientific involved in the development and maintenance of the Amira software used to produce the methods described in this manuscript. All other authors have nothing to disclose.

Acknowledgements

Funding Sources: F30AG076326 (HMK), T32GM007356 (HMK), R01AR069000 (CTR), R01AR056702 (EMS), and P30AR069655 (LS, EMS, and HAA). HMK was a trainee in the Medical Scientist Training Program funded by NIH T32GM007356. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institute of General Medical Sciences or NIH. We would like to thank the faculty and staff of the Histology, Biochemistry, and Molecular Imaging core, the Biomechanics, Biomaterials, and Multimodal Tissue Imaging core, and the Center for Musculoskeletal Research at the University of Rochester Medical Center for their contributions to this work.

Materials

List of materials used in this article
Name	Company	Catalog Number	Comments
Computing system	Details provided in Protocol section	Details provided in Protocol section
Image visualization software	ThermoFisher Scientific	v2022.2 or later	Amira
Isoflurane	VetOne	13985-528-60	Fluriso, 1-3% for anesthesia
Mice	University of Rochester Medical Center	N/A	C57BL/6, TNF-transgenic
Micro-CT	Scanco Medical	N/A	VivaCT 40
Statistical software	GraphPad Software, Inc	v10.2.0 or later	GraphPad Prism
Tape	N/A	N/A	To secure animal paws for imaging
Tubing	N/A	N/A	Delrin plastic and clear acrylic for animal stabilization

References

Guidelines for assessment of bone microstructure in rodents using micro-computed tomography. J Bone Miner Res. 25 (7), 1468-1486 (2010).">Bouxsein, M., et al. Guidelines for assessment of bone microstructure in rodents using micro-computed tomography. J Bone Miner Res. 25 (7), 1468-1486 (2010).
Computational pathology for musculoskeletal conditions using machine learning: advances, trends, and challenges. Arthritis Res Ther. 24 (1), 68(2022).">Konnaris, M. A., et al. Computational pathology for musculoskeletal conditions using machine learning: advances, trends, and challenges. Arthritis Res Ther. 24 (1), 68(2022).
Redefining Radiology: A Review of Artificial Intelligence Integration in Medical Imaging. Diagnostics (Basel). 13 (17), 2760(2023).">Najjar, R. Redefining Radiology: A Review of Artificial Intelligence Integration in Medical Imaging. Diagnostics (Basel). 13 (17), 2760(2023).
Review of deep learning: concepts, CNN architectures, challenges, applications, future directions. J Big Data. 8 (1), 53(2021).">Alzubaidi, L., et al. Review of deep learning: concepts, CNN architectures, challenges, applications, future directions. J Big Data. 8 (1), 53(2021).
Convolutional neural network for detecting rib fractures on chest radiographs: a feasibility study. BMC Med Imaging. 23 (1), 18(2023).">Wu, J., et al. Convolutional neural network for detecting rib fractures on chest radiographs: a feasibility study. BMC Med Imaging. 23 (1), 18(2023).
Fully automated pelvic bone segmentation in multiparameteric MRI using a 3D convolutional neural network. Insights Imaging. 12 (1), 93(2021).">Liu, X., et al. Fully automated pelvic bone segmentation in multiparameteric MRI using a 3D convolutional neural network. Insights Imaging. 12 (1), 93(2021).
Automated anomaly-aware 3D segmentation of bones and cartilages in knee MR images from the Osteoarthritis Initiative. Med Image Anal. 93, 103089(2024).">Woo, B., et al. Automated anomaly-aware 3D segmentation of bones and cartilages in knee MR images from the Osteoarthritis Initiative. Med Image Anal. 93, 103089(2024).
An improved AlexNet model for automated skeletal maturity assessment using hand X-ray images. Future Generat Comp Syst. 121, 106-113 (2021).">He, M., Zhao, X., Lu, Y., Hu, Y. An improved AlexNet model for automated skeletal maturity assessment using hand X-ray images. Future Generat Comp Syst. 121, 106-113 (2021).
SMANet: multi-region ensemble of convolutional neural network model for skeletal maturity assessment. Quant Imaging Med Surg. 12 (7), 3556-3568 (2022).">Zhang, Y., et al. SMANet: multi-region ensemble of convolutional neural network model for skeletal maturity assessment. Quant Imaging Med Surg. 12 (7), 3556-3568 (2022).
SVTNet: Automatic bone age assessment network based on TW3 method and vision transformer. Int J Imag Syst Technol. 34 (2), e22990(2024).">Wu, J., Mi, Q., Zhang, Y., Wu, T. SVTNet: Automatic bone age assessment network based on TW3 method and vision transformer. Int J Imag Syst Technol. 34 (2), e22990(2024).
Sebastian, T. B., Tek, H., Crisco, J. J., Kimia, B. B. Segmentation of carpal bones from CT images using skeletally coupled deformable models. Med Image Anal. 7 (1), 21-45 (2003).
Kenney, H., et al. A High-Throughput Semi-Automated Bone Segmentation Workflow for Murine Hindpaw Micro-CT Datasets. Bone Rep. 16, 101167 (2022).
Besler, B. A., et al. Bone and joint enhancement filtering: Application to proximal femur segmentation from uncalibrated computed tomography datasets. Med Image Anal. 67, 101887 (2021).
Baiker, M., et al. Atlas-based whole-body segmentation of mice from low-contrast Micro-CT data. Med Image Anal. 14 (6), 723-737 (2010).
Li, X., Yankeelov, T. E., Peterson, T. E., Gore, J. C., Dawant, B. M. Automatic nonrigid registration of whole body CT mice images. Med Phys. 35 (4), 1507-1520 (2008).
Khmelinskii, A., et al. Articulated whole-body atlases for small animal image analysis: construction and applications. Mol Imaging Biol. 13 (5), 898-910 (2011).
Liu, H., Durongbhan, P., Davey, C. E., Stok, K. S. Image Registration in Longitudinal Bone Assessment Using Computed Tomography. Curr Osteoporos Rep. 21 (4), 372-385 (2023).
Wang, J., et al. Fully automated segmentation in temporal bone CT with neural network: a preliminary assessment study. BMC Med Imaging. 21 (1), 166 (2021).
Ambellan, F., Tack, A., Ehlke, M., Zachow, S. Automated segmentation of knee bone and cartilage combining statistical shape knowledge and convolutional neural networks: Data from the Osteoarthritis Initiative. Med Image Anal. 52, 109-118 (2019).
Ramos, J. S., et al. Fast and accurate 3-D spine MRI segmentation using FastCleverSeg. Magn Reson Imaging. 109, 134-146 (2024).
Kushwaha, A., et al. Improved Repeatability of Mouse Tibia Volume Segmentation in Murine Myelofibrosis Model Using Deep Learning. Tomography. 9 (2), 589-602 (2023).
Bell, R. D., et al. Automated multi-scale computational pathotyping (AMSCP) of inflamed synovial tissue. Nat Commun. 15 (1), 7503 (2024).
Kenney, H. M., et al. High-throughput micro-CT analysis identifies sex-dependent biomarkers of erosive arthritis in TNF-Tg mice and differential response to anti-TNF therapy. PLoS One. 19 (7), e0305623 (2024).
Brown, J. M., et al. Detection and characterisation of bone destruction in murine rheumatoid arthritis using statistical shape models. Med Image Anal. 40, 30-43 (2017).
Cambre, I., et al. Mechanical strain determines the site-specific localization of inflammation and tissue damage in arthritis. Nat Commun. 9 (1), 4613 (2018).
Mahdi, H., et al. Open-source pipeline for automatic segmentation and microstructural analysis of murine knee subchondral bone. Bone. , 167 (2023).
Saillard, E., et al. Finite element models with automatic computed tomography bone segmentation for failure load computation. Sci Rep. 14 (1), 16576 (2024).
Kenney, H., et al. Micro-CT of hind paw. Zenodo. , (2024).
Kenney, H., et al. Micro-CT of hind paw. Zenodo. , (2025).
Keffer, J., et al. Transgenic mice expressing human tumour necrosis factor: a predictive genetic model of arthritis. EMBO J. 10 (13), 4025-4031 (1991).
Li, P., Schwarz, E. The TNF-alpha transgenic mouse model of inflammatory arthritis. Springer Semin Immunopathol. 25 (1), 19-33 (2003).
Bell, R., et al. Selective sexual dimorphisms in musculoskeletal and cardiopulmonary pathologic manifestations and mortality incidence in the tumor necrosis factor-transgenic mouse model of rheumatoid arthritis. Arthritis Rheumatol. 71 (9), 1512-1523 (2019).
Li, J., et al. CD23+/CD21hi B cell translocation and ipsilateral lymph node collapse is associated with asymmetric arthritic flare in TNF-Tg mice. Arthritis Res Ther. 13 (4), R138 (2011).
Kenney, H., et al. Persistent popliteal lymphatic muscle cell coverage defects despite amelioration of arthritis and recovery of popliteal lymphatic vessel function in TNF-Tg mice following anti-TNF therapy. Sci Rep. 12 (1), 12751 (2022).
Kenney, H., et al. Implementation of automated behavior metrics to evaluate voluntary wheel running effects on inflammatory-erosive arthritis and interstitial lung disease in TNF-Tg mice. Arthritis Res Ther. 25 (1), 17 (2023).
Beucher, S., Meyer, F. Mathematical Morphology in Image Processing. , CRC Press. (1992).
Frangi, A. F., Niessen, W. J., Vincken, K. L., Viergever, M. A. Medical Image Computing and Computer-Assisted Intervention - MICCAI'98. Wells, W. M., Colchester, A., Scott, D. , Springer. Berlin Heidelberg. 130-137 (1998).
Martinez-Sanchez, A., Garcia, I., Asano, S., Lucic, V., Fernandez, J. J. Robust membrane detection based on tensor voting for electron tomography. J Struct Biol. 186 (1), 49-61 (2014).
Bab, I., Hajbi-Yonissi, C., Gabet, Y., Müller, R. Micro-tomographic atlas of the mouse skeleton. , Springer Science + Business Media. (2007).
Bab, I., Hajbi-Yonissi, C., Gabet, Y., Müller, R. Micro-Tomographic Atlas of the Mouse Skeleton. , Springer. (2007).
Richbourg, H., Martin, M., Schachner, E., McNulty, M. Anatomical Variation of the Tarsus in Common Inbred Mouse Strains. Anat Rec (Hoboken). 300 (3), 450-459 (2017).
Proulx, S., et al. Longitudinal assessment of synovial, lymph node, and bone volumes in inflammatory arthritis in mice by in vivo magnetic resonance imaging and microfocal computed tomography. Arthritis Rheumatol. 56 (12), 4024-4037 (2007).
Dakkak, Y., Matthijssen, X., van der Heijde, D., Reijnierse, M., van der Helm-van Mil, A. Reliability of Magnetic Resonance Imaging (MRI)-scoring of the Metatarsophalangeal-joints of the Foot According to the Rheumatoid Arthritis-MRI Score (RAMRIS). J Rheumatol. 47 (8), 1165-1173 (2020).
Dimanti, A., et al. Ultrasound detection of subclinical synovitis in rheumatoid arthritis patients in clinical remission: a new reduced-joint assessment in 3 target joints. Clin Exp Rheumatol. 36 (6), 984-989 (2018).
De Miguel, E., et al. A reduced 12-joint ultrasound examination predicts lack of X-ray progression better than clinical remission criteria in patients with rheumatoid arthritis. Rheumatol Int. 37 (8), 1347-1356 (2017).
Ornbjerg, L., Ostergaard, M. Assessment of structural damage progression in established rheumatoid arthritis by conventional radiography, computed tomography, and magnetic resonance imaging. Best Pract Res Clin Rheumatol. 33 (5), 101481 (2019).
England, B., et al. Update of the American College of Rheumatology Recommended Rheumatoid Arthritis Disease Activity Measures. Arthritis Care Res (Hoboken). 71 (12), 1540-1555 (2019).
Dohn, U., et al. Are bone erosions detected by magnetic resonance imaging and ultrasonography true erosions? A comparison with computed tomography in rheumatoid arthritis metacarpophalangeal joints. Arthritis Res Ther. 8 (4), R110 (2006).
Dohn, U., et al. Detection of bone erosions in rheumatoid arthritis wrist joints with magnetic resonance imaging, computed tomography and radiography. Arthritis Res Ther. 10 (1), R25 (2008).
Jans, L., et al. Dual-energy CT: a new imaging modality for bone marrow oedema in rheumatoid arthritis. Ann Rheum Dis. 77 (6), 958-960 (2018).
