This protocol describes the training of a machine learning algorithm that combines imaging parameters derived from magnetic resonance imaging (MRI) and positron emission tomography/computed tomography (PET/CT) in a rat model of breast cancer bone metastases to detect early metastatic disease and predict subsequent progression to macrometastases.
Machine learning (ML) algorithms permit the integration of different features into a model to perform classification or regression tasks with an accuracy that exceeds that of its individual constituents. This protocol describes the development of an ML algorithm to predict the growth of breast cancer bone macrometastases in a rat model before any abnormalities are observable with standard imaging methods. Such an algorithm can facilitate the detection of early metastatic disease (i.e., micrometastasis) that is regularly missed during staging examinations.
The applied metastasis model is site-specific, meaning that the rats develop metastases exclusively in their right hind leg. The model’s tumor-take rate is 60%–80%, with macrometastases becoming visible in magnetic resonance imaging (MRI) and positron emission tomography/computed tomography (PET/CT) in a subset of animals 30 days after induction, whereas a second subset of animals exhibits no tumor growth.
Starting from imaging examinations acquired at an earlier time point, this protocol describes the extraction of features indicating tissue vascularization (detected by MRI) and glucose metabolism (detected by PET/CT), followed by the determination of the most relevant features for the prediction of macrometastatic disease. These features are then fed into a model-averaged neural network (avNNet) to classify the animals into one of two groups: one that will develop metastases and one that will not develop any tumors. The protocol also describes the calculation of standard diagnostic parameters, such as overall accuracy, sensitivity, specificity, negative/positive predictive values, and likelihood ratios, as well as the development of a receiver operating characteristic (ROC) curve. An advantage of the proposed protocol is its flexibility, as it can be easily adapted to train a plethora of different ML algorithms with adjustable combinations of an unlimited number of features. Moreover, it can be used to analyze different problems in oncology, infection, and inflammation.
The purpose of this protocol is to integrate several functional imaging parameters from MRI and PET/CT into a model-averaged neural network (avNNet) ML algorithm. This algorithm predicts the growth of macrometastases in a rat model of breast cancer bone metastases at an early timepoint, when macroscopic changes within the bone are not yet visible.
Prior to the growth of macrometastases, a bone marrow invasion of disseminated tumor cells occurs, commonly referred to as micrometastatic disease1,2. This initial invasion can be considered an early step in metastatic disease, but is typically missed during conventional staging examinations3,4. Although the currently available imaging modalities cannot detect bone marrow microinvasion when used alone, a combination of imaging parameters yielding information on vascularization and metabolic activity has been shown to perform better5. This complementary benefit is achieved by combining different imaging parameters into an avNNet, which is an ML algorithm. Such an avNNet allows for the reliable prediction of bone macrometastases formation before any visible metastases are present. Therefore, integrating imaging biomarkers into an avNNet could serve as a surrogate parameter for bone marrow microinvasion and early metastatic disease.
To develop the protocol, a previously described model of breast cancer bone metastases in nude rats was used6,7,8. The advantage of this model is its site-specificity, meaning that the animals develop bony metastases exclusively in their right hind leg. However, the tumor-take rate of this approach is 60%–80%, so a considerable number of the animals do not develop any metastases during the study. Using imaging modalities such as MRI and PET/CT, the presence of metastases is detectable from day 30 postinjection (PI). At earlier time points (e.g., day 10 PI), imaging does not distinguish between animals that will develop metastatic disease and those that will not (Figure 1).
An avNNet trained on functional imaging parameters acquired on day 10 PI, as described in the following protocol, reliably predicts or excludes the growth of macrometastases within the following ~3 weeks. Neural networks combine artificial nodes within different layers. In this protocol, the functional imaging parameters for bone marrow blood supply and metabolic activity represent the bottom layer, while the prediction of malignancy represents the top layer. An additional intermediate layer contains hidden nodes that are connected to both the top and the bottom layers. The strength of the connections between the nodes is updated during the training of the network to perform the respective classification task with high accuracy9. The accuracy of such a neural network can be further increased by averaging the outputs of several models, resulting in an avNNet10.
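The model-averaging principle behind an avNNet can be illustrated in a few lines of code. The following sketch (Python for illustration only; the protocol itself uses R with the caret package, and the probability values below are invented, not study data) averages the predicted probabilities of several networks before thresholding:

```python
# Model averaging as in an avNNet: several networks are trained with
# different random initializations, and their predicted probabilities
# are averaged before the final classification.
def average_predictions(model_probs, threshold=0.5):
    """model_probs: one list of P(metastasis) per model, aligned by animal."""
    n_models = len(model_probs)
    n_animals = len(model_probs[0])
    avg = [sum(m[i] for m in model_probs) / n_models for i in range(n_animals)]
    labels = [1 if p >= threshold else 0 for p in avg]  # 1 = metastasis
    return avg, labels

# Three hypothetical networks scoring four animals:
probs = [
    [0.9, 0.4, 0.2, 0.6],
    [0.8, 0.7, 0.1, 0.5],
    [0.7, 0.6, 0.3, 0.7],
]
avg, labels = average_predictions(probs)  # labels -> [1, 1, 0, 1]
```

Averaging smooths out the variability introduced by random weight initialization, which is why an avNNet is typically more stable than a single network.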
All care and experimental procedures were performed in accordance with national and regional legislation on animal protection, and all animal procedures were approved by the State Government of Franconia, Germany (reference number 55.2 DMS-2532-2-228).
1. Induction of breast cancer bone metastases in the right hind leg of nude rats
NOTE: A detailed description of the induction of breast cancer bone metastases in nude rats has been published elsewhere6,8. The most relevant steps are presented below.
2. Magnetic resonance imaging (MRI)
NOTE: For a detailed description of MRI procedures, please see Bäuerle et al.11.
3. Positron emission tomography/computed tomography (PET/CT)
NOTE: For a detailed description of the PET procedures, please see Cheng et al.12.
4. Alternative imaging strategies
5. MRI analysis
6. PET/CT analysis
7. Determining the tumor-take rate
8. Feature selection
9. ML analysis
10. Training an avNNet ML algorithm
11. Analyzing the ML algorithm’s results
12. Comparing the final model's Receiver Operating Characteristic (ROC) curve with the ROC curves of its constituent parameters
The rats recovered quickly from the surgery and injection of the MDA-MB-231 breast cancer cells and were then subjected to MR and PET/CT imaging on days 10 and 30 PI (Figure 1). A representative DCE analysis of a rat’s right proximal tibia is presented in Figure 2A. The DCE raw measurements were saved by selecting the “Export” button and choosing “DCEraw.txt” as the file name.
Subsequent calculations of the dynamic parameters, AUC, PE, and washout were performed in RStudio with the respective script. The output of the DCE measurements had to be saved as “DCEraw.txt” within the “Downloads” folder, so that the script could be run directly without additional configurations to provide a data table, as depicted in Figure 2B. These data were copied to the provided spreadsheet (Figure 2C). Similarly, the DCE parameters for muscular tissue were determined and transferred to the spreadsheet (Figure 2D,E). These values were normalized by dividing the bone measurements by the muscle measurements; this was performed automatically within the spreadsheet. From the PET/CT, the calculated SUVmax values were subsequently transferred into the table (Figure 2F).
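The dynamic parameters and the bone-to-muscle normalization can be sketched as follows (Python for illustration; the exact definitions used by the provided DCE-Script.R may differ in detail, and the signal values are invented):

```python
# Sketch of the dynamic contrast-enhancement (DCE) parameters: AUC (area
# under the signal-time curve, trapezoidal rule), PE (peak enhancement
# over baseline), and washout (signal change from peak to last time point).
def dce_parameters(times, signal):
    baseline = signal[0]
    auc = sum((signal[i] + signal[i + 1]) / 2 * (times[i + 1] - times[i])
              for i in range(len(times) - 1))
    peak = max(signal)
    pe = peak - baseline
    washout = signal[-1] - peak
    return auc, pe, washout

def normalize(bone_value, muscle_value):
    """Bone-to-muscle normalization, as performed in the spreadsheet."""
    return bone_value / muscle_value

# Invented example curve (arbitrary units):
auc, pe, washout = dce_parameters([0, 1, 2, 3, 4], [100, 150, 180, 170, 160])
# auc = 630.0, pe = 80, washout = -20
ratio = normalize(auc, 315.0)  # ratio = 2.0
```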
On day 30 PI, all animals were evaluated to determine whether or not they developed metastases, and the table was completed by coding positive tumor burden as “1” and healthy animals as “0” within the rightmost column of the spreadsheet (Figure 2C).
The spreadsheet was imported into the open-source data visualization, machine learning, and data mining toolkit, and feature ranking revealed the SUVmax, PE, and AUC as the top three features for prediction of metastatic disease (Figure 3). These parameters reflect metabolic activity (SUVmax) and tissue vascularization (PE and AUC).
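Information-gain ranking, as performed by the toolkit, can be approximated in a few lines. The sketch below is a simplification (a single median split stands in for the toolkit's internal discretization, and the data are invented); it shows how a feature that separates the two outcome groups receives a higher score than a poorly separating one:

```python
import math

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    n = len(labels)
    return -sum((labels.count(c) / n) * math.log2(labels.count(c) / n)
                for c in set(labels))

def information_gain(feature, labels):
    """Entropy reduction after a median split of a continuous feature."""
    cut = sorted(feature)[len(feature) // 2]
    left = [y for x, y in zip(feature, labels) if x < cut]
    right = [y for x, y in zip(feature, labels) if x >= cut]
    if not left or not right:
        return 0.0
    n = len(labels)
    conditional = (len(left) / n * entropy(left)
                   + len(right) / n * entropy(right))
    return entropy(labels) - conditional

outcome = [0, 0, 0, 1, 1, 1]  # 1 = metastasis on day 30 (invented)
ig_good = information_gain([1, 2, 3, 10, 11, 12], outcome)  # separates well
ig_poor = information_gain([1, 10, 2, 11, 3, 12], outcome)  # separates poorly
```

A feature that perfectly separates the groups scores the maximum gain of 1 bit here, while an interleaved feature scores close to zero; this is the ordering principle behind the ranking in Figure 3.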
Running the TrainModel.R Script automatically imported the spreadsheet and calculated an avNNet. The optimal hyperparameter combination was determined (Figure 4A) and the final model was then calculated using the optimal hyperparameter combination (Figure 4B). Subsequently, a set of standard diagnostic parameters was calculated (Figure 4C) and an ROC curve of the model was plotted (Figure 4D).
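The standard diagnostic parameters reported in Figure 4C are all derived from the 2 x 2 confusion matrix of the cross-validated predictions. The sketch below uses hypothetical counts, chosen only so that the overall accuracy matches the reported 85.7% for 28 animals; the actual counts are those shown in Figure 4C:

```python
def diagnostics(tp, fp, fn, tn):
    """Standard diagnostic parameters from a 2x2 confusion matrix."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": tp / (tp + fp),            # positive predictive value
        "npv": tn / (tn + fn),            # negative predictive value
        "lr_positive": sensitivity / (1 - specificity),
        "lr_negative": (1 - sensitivity) / specificity,
    }

# Hypothetical counts for 28 animals (24 correctly classified):
result = diagnostics(tp=16, fp=2, fn=2, tn=8)
```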
A representative positive result is shown in Figure 4B–D. A comparison of the model’s ROC curve with the ROC curves of its three constituents (i.e., AUC, PE, and SUVmax) revealed that the model performed significantly better than all three of its constituents (p = 0.01 for AUC, p = 0.003 for PE, and p = 0.007 for SUVmax). The combination of the three selected parameters into an avNNet was more sensitive, thus allowing prediction of macroscopic disease with an overall accuracy of 85.7% (95% confidence interval = 67.3%–96.0%). These results were obtained from an analysis of 28 samples. The confidence intervals can be further narrowed by increasing the number of animals.
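The reported 95% confidence interval is consistent with an exact (Clopper-Pearson) binomial interval for 24 correct classifications out of 28, which caret's confusionMatrix computes via binom.test. It can be reproduced as follows (Python with scipy assumed available here, purely for illustration):

```python
from scipy.stats import beta

def clopper_pearson(k, n, alpha=0.05):
    """Exact binomial confidence interval for k successes in n trials."""
    lower = beta.ppf(alpha / 2, k, n - k + 1) if k > 0 else 0.0
    upper = beta.ppf(1 - alpha / 2, k + 1, n - k) if k < n else 1.0
    return lower, upper

lo, hi = clopper_pearson(24, 28)  # approximately (0.673, 0.960)
```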
A representative negative result could be obtained as follows. The accuracy measures were highly sensitive to the specific type of machine learning algorithm and to the data preprocessing steps. Neural networks, in particular, tended to perform better when the input data were normalized; this was achieved with the “BoxCox” function in section 10 of the protocol (lines 22 and 36 of the provided TrainModel.R script). Refraining from normalization and using a different algorithm (e.g., a non-averaged neural network, obtained by changing the method to “nnet” in lines 21 and 35 of the provided TrainModel.R script) resulted in an area under the ROC curve of 0.594 (Supplementary Figure 1). Such a model failed to significantly outperform its three constituents (all p > 0.15).
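The Box-Cox transform referenced above maps skewed, strictly positive data toward a more symmetric distribution before training. Below is a minimal sketch of the transform itself (the exponent lambda is fixed by hand here, whereas caret estimates it by maximum likelihood during preprocessing):

```python
import math

def box_cox(values, lam):
    """Box-Cox transform: (x**lam - 1)/lam, or log(x) when lam == 0."""
    if lam == 0:
        return [math.log(v) for v in values]
    return [(v ** lam - 1) / lam for v in values]

# A strongly right-skewed series is compressed, while its order is preserved:
skewed = [1, 2, 4, 8, 64]
transformed = box_cox(skewed, 0)  # log transform
```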
Because the script was optimized for RStudio 3.4.1 and the caret package version 6.0-84, using different software versions might yield different results. Running the provided scripts with the software versions used in this manuscript will give similar results. However, if readers aim to modify the script, add additional variables, change document folders or file names, or modify the machine learning algorithms in greater detail, respective adjustments of the code will be necessary. For these cases, the manual of the caret-package offers in-depth explanations20.
Figure 1: Representative MR and PET/CT images. MR and PET/CT images of the right hind leg of an animal that did not develop metastases over the course of the study (two leftmost columns, with images from day 10 and day 30 PI), and an animal that developed metastases between day 10 and day 30 PI (two rightmost columns, metastases marked with arrows). Note the high signal intensity of metastases in the T2w images (upper row), the contrast enhancement depicted by the increased area under the curve (AUC; second row), and the increased maximum standardized uptake value in the PET/CT (SUVmax; third row). Note that there are no visible differences in the images acquired on day 10 PI (first and third column) between the animal with metastases on day 30 PI and the animal that developed no bone metastases. This figure was modified from Ellmann et al.5.
Figure 2: Assessment of the imaging features and compilation into a spreadsheet. (A) The dynamic contrast enhancement of the proximal tibia’s bone marrow was analyzed with a freeware DICOM viewer15 using a DCE plugin16. The respective measurements were saved, and (B) further analyzed with the provided DCE-Script.R file in RStudio17. (C) The output was copied into a spreadsheet (see supplementary material for a template). (D) Likewise, the DCE measurement was performed for adjacent muscular tissue, (E) analyzed using RStudio, and then copied to the spreadsheet. Data were normalized by dividing the results of the bone measurements by the results of the muscle measurements (C; salmon-shaded cells). (F) The PET/CT measurements were performed with the vendor’s software. The maximum standardized uptake value was calculated by dividing the respective measurement by the injected activity and multiplying it by the animal’s body weight. The result was subsequently copied into the spreadsheet.
Figure 3: Feature ranking. Ranking of the acquired imaging features was performed within an open-source data visualization, machine learning, and data mining toolkit18 by importing the spreadsheet via the “File”-subroutine, and analyzing it via the “Rank”-subroutine.
Figure 4: Representative RStudio output. The machine learning algorithm was developed using RStudio17 with the provided TrainModel.R-Script file. (A) A grid search among different hyperparameter combinations for the model-averaged neural network revealed a size of three neurons and a decay of 0.0005 as an optimum. (B) Using this hyperparameter combination, a full network was trained and cross-validated, reaching an overall accuracy of 85.7%. (C) Standard parameters of diagnostic accuracy, including sensitivity, specificity, positive and negative predictive values, and likelihood ratios, were calculated from a confusion matrix. (D) A representative ROC plot of the cross-validated model revealed an area under the curve (AUC) of 0.917.
Supplementary Figure 1: Negative result. Changing the ML algorithm to a neural network without averaging and refraining from normalization of the input parameters led to a drop in the area under the ROC curve from 0.917 (Figure 4D) to 0.594.
Supplementary Figure 2: Alternative feature ranking. In contrast to the standard method depicted in Figure 3, a random variable was also introduced (“Random”; highlighted in blue), with its importance included in the ranking. This approach confirmed the applied selection of the variables SUVmax, PE, and AUC.
ML algorithms are powerful tools used to integrate several predictive features into a combined model and obtain an accuracy that exceeds that of its separate constituents when used alone. Nonetheless, the actual result depends on several critical steps. First, the ML algorithm used is a crucial factor, because different ML algorithms yield different results. The algorithm used in this protocol is an avNNet, but other promising algorithms include Extreme Gradient Boosting21 or Random Forests. The caret package20 for RStudio provides a plethora of different algorithms (currently >175), and the proposed protocol is highly flexible in terms of switching from one algorithm to another by simply changing single lines of code (e.g., changing method = “avNNet” to method = “rf”) and adapting the TuneGrid settings to the respective ML algorithm. For details, see the caret GitHub repository22. An excellent overview of different algorithms and their performance with respect to different classification problems was published by Fernández-Delgado et al.23 and could serve as a starting point for other experiments.
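The same one-line swap of algorithms can be illustrated outside of caret. The following sketch (scikit-learn in Python as an analogue, not the protocol's actual code; the data are synthetic) exchanges only the estimator object while the surrounding cross-validation pipeline stays unchanged:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
y = np.array([0] * 14 + [1] * 14)   # 28 animals, two outcome groups
X = rng.normal(size=(28, 3))        # 3 imaging features (synthetic)
X[:, 0] += 2 * y                    # make the groups separable

models = {
    "neural network": MLPClassifier(hidden_layer_sizes=(3,),
                                    max_iter=2000, random_state=0),
    "random forest": RandomForestClassifier(n_estimators=100, random_state=0),
}
# Only the estimator changes; the evaluation code is identical:
results = {name: cross_val_score(m, X, y, cv=5).mean()
           for name, m in models.items()}
```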
Another crucial factor is the choice of relevant features to include in an ML algorithm. This protocol proposes the filter method “Information Gain”19 to rank the available features in descending order and use the most relevant ones to train the avNNet. Filter methods rely only on general characteristics of the data, such as a feature's correlation with the variable to be predicted, and select features independently of the classifier used24,25. Such methods are computationally efficient and robust to overfitting. However, the cutoff that separates relevant from irrelevant features is defined by the user, making it somewhat arbitrary. For the proposed protocol, the features with the top 75% information gain were used, corresponding to SUVmax, PE, and AUC. This selection, however, can be systematically strengthened by including a random variable that has no relationship to the target, calculating its information gain, and then comparing it to the information gain of the real features (Supplementary Figure 2). This slightly more sophisticated approach additionally confirmed the choice of the three abovementioned features as the most relevant. Nonetheless, several different filter methods exist, along with other approaches that select features with respect to a particular classifier algorithm, such as feature extraction and wrapper methods. Different feature selection approaches may yield different results.
To ensure generalizability of the ML algorithm and further prevent overfitting, the proposed protocol applies leave-one-out cross-validation (LOOCV). The best approach, however, would be to randomly remove a subset from the entire data set and treat it as a testing set. The ML algorithm is then trained on the remainder of the data (i.e., the training set) to subsequently predict the outcome of the testing set. However, this approach requires a sufficiently large data set. For smaller sample sizes, application of LOOCV is common because it provides an almost unbiased estimate of a model's true generalization ability26. In LOOCV, the first data point is removed from the data set, and the avNNet is trained with the retained data. Then, the outcome of the formerly withheld data point is predicted and saved. This process is repeated for all data points, so that finally each outcome is predicted with data that were not used for training the algorithm. Other validation approaches include k-fold cross-validation (most commonly 10-fold) and can be easily applied by changing the respective trainControl parameter within the code to method="CV".
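The LOOCV loop described above can be sketched as follows (Python for illustration; a simple one-nearest-neighbor rule on a single invented feature stands in for the avNNet):

```python
def loocv_accuracy(feature, labels):
    """Leave-one-out cross-validation: each sample is held out once,
    the classifier is fit on the rest, and the held-out outcome is
    predicted (here by the nearest training neighbor)."""
    correct = 0
    for i, (x_test, y_test) in enumerate(zip(feature, labels)):
        train = [(x, y) for j, (x, y) in enumerate(zip(feature, labels))
                 if j != i]
        _, prediction = min(train, key=lambda t: abs(t[0] - x_test))
        correct += prediction == y_test
    return correct / len(labels)

# Invented, well-separated feature values for six animals:
acc = loocv_accuracy([1.0, 1.2, 0.9, 5.0, 5.3, 4.8], [0, 0, 0, 1, 1, 1])
```

Because every prediction is made on a sample the model never saw during training, the resulting accuracy is a nearly unbiased estimate of generalization performance.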
From a quantitative point of view, medical images are typically evaluated in a very basic way, largely relying on measurements of the size and shape of potentially suspicious lesions or areas27,28. However, an advantage of the Digital Imaging and Communications in Medicine (DICOM) standard is that it allows the extraction of many additional features, referred to as radiomics. The term “radiomics” was initially defined as the high-throughput extraction of large quantities of image features29 but was subsequently extended to include the conversion of images to higher dimensional data30. However, the higher dimensional data are mainly used to identify correlations rather than causes30. The features described in this protocol fall between classic radiological features, such as size and shape, and radiomics, as they resemble generally accepted parameters of vascularization and metabolic activity. This offers a potential causal relation to the microinvasion of disseminated tumor cells. If desired by the user, an extraction of radiomic features can be performed with different software packages31.
The protocol provided is not restricted to a finite number of features. Thus, it can be used with large radiomics data sets. However, the abovementioned issue of feature selection becomes increasingly important with growing data sets. The presented protocol can also be transferred to different study settings, e.g., from the fields of oncology, infection, or inflammation32.
The authors have nothing to disclose.
This work was supported by the German Research Foundation (DFG, Collaborative Research Centre CRC 1181, subproject Z02; Priority Programme μBone, projects BA 4027/10-1 and BO 3811), including additional support for the scanning devices (INST 410/77-1 FUGG and INST 410/93-1 FUGG), and by the Emerging Fields Initiative (EFI) “Big Thera” of the Friedrich-Alexander-University Erlangen-Nürnberg.
Name | Company | Catalog Number | Comments
Binocular Operating Microscope | Leica | NA | 
ClinScan MR System | Bruker | NA | |
DICOM Viewer | Horos | NA | www.horosproject.org |
Excel: Spreadsheet | Microsoft | NA | |
FCS | Sigma | F2442-500ML | |
Gadovist | Bayer-Schering | NA | |
Inveon PET/CT | Siemens | NA | |
Inveon Research Workplace Software | Siemens Healthcare GmbH | NA | |
IVIS Spectrum | PerkinElmer | NA | |
MDA-MB-231 human breast cancer cells | American Type Culture Collection | NA | 
Open-source data visualization, machine learning and data mining toolkit. | Orange3, University of Ljubljana | NA | https://orange.biolab.si/ |
RPMI-1640 | Invitrogen/ThermoFisher | 11875093 | |
Trypsin | Sigma | 9002-07-7 | |
Vevo 3100 | VisualSonics | NA |