This test-retest study evaluated leg blood flow measured by the Doppler ultrasound technique during single-leg knee-extensor exercise. The within-day, between-day, and inter-rater reliability of the method was investigated. The approach demonstrated high within-day and acceptable between-day reliability. However, the inter-rater reliability was unacceptably low during rest and at low workloads.
Doppler ultrasound has revolutionized the assessment of organ blood flow and is widely used in research and clinical settings. While Doppler ultrasound-based assessment of contracting leg muscle blood flow is common in human studies, the reliability of this method requires further investigation. Therefore, this study aimed to investigate the within-day test-retest, between-day test-retest, and inter-rater reliability of Doppler ultrasound for assessing leg blood flow during rest and graded single-leg knee-extensions (0 W, 6 W, 12 W, and 18 W), with the ultrasound probe removed between measurements. The study included thirty healthy subjects (age: 33 ± 9.3 years; male/female: 14/16) who visited the laboratory on two experimental days separated by 10 days. The study did not control for major confounders such as nutritional state, time of day, or hormonal status. Across exercise intensities, the results demonstrated high within-day reliability, with a coefficient of variation (CV) ranging from 4.0% to 4.3%; acceptable between-day reliability, with a CV ranging from 10.1% to 20.2%; and inter-rater reliability with a CV ranging from 17.9% to 26.8%. Therefore, in a real-life clinical scenario where controlling various environmental factors is challenging, Doppler ultrasound can be used to determine leg blood flow during submaximal single-leg knee-extensor exercise with high within-day reliability and acceptable between-day reliability when performed by the same sonographer.
Doppler ultrasound, introduced in the 1980s, has been extensively used to determine contracting muscle blood flow, particularly in the single-leg knee-extensor model, allowing measurement of blood flow in the common femoral artery (CFA) during small muscle mass activation1,2,3,4,5,6. Doppler ultrasound-based blood flow technology has provided valuable insights into vascular regulation in various populations, including healthy adults7,8, individuals with diabetes9, hypertension10, COPD11,12, and heart failure13,14.
One advantage of Doppler ultrasound is that it is non-invasive, unlike other blood flow determination methods such as thermodilution, and it can be combined with arterial and venous catheterization if necessary3,4,6,15. It also enables beat-to-beat measurement of blood flow velocity, allowing rapid changes to be detected16. However, Doppler ultrasound-based blood flow measurements have limitations, including difficulty obtaining stable recordings during excessive limb movement at near-maximal exercise intensities and the requirement for ultrasound access to the targeted blood vessel, which precludes evaluations during ergometer cycling15. Hence, the single-leg knee-extensor model is well-suited for evaluating leg blood flow (LBF) with Doppler ultrasound during dynamic exercise at submaximal intensities17, because it minimizes the influence of exercise-related cardiac and pulmonary limitations and facilitates comparisons between healthy subjects and patients with cardiopulmonary diseases11.
Despite being widely used, the between-day reliability of the single-leg knee-extensor model using Doppler ultrasound has not been investigated on a larger scale in recent decades, with prior studies involving small populations (n = 2)3,18,19,20.
This study aimed to investigate (1) the within-day test-retest reliability, (2) the between-day test-retest reliability, and (3) the inter-rater reliability of Doppler ultrasound for LBF evaluation during single-leg knee-extensor exercise at 0 W, 6 W, 12 W, and 18 W. The measurements were conducted in a clinically realistic scenario where the probe was removed between measurements. It is important to note that several intrinsic and extrinsic environmental factors known to influence LBF were not controlled during the measurements, which could introduce variability and affect reliability. Considering advancements in Doppler ultrasound technology and blood flow analysis software, we hypothesized that, even in an uncontrolled setting, acceptable within- and between-day reliability of LBF measurements could be achieved at all intensities when performed by the same sonographer.
The study was evaluated by the Regional Ethical Committee of the Capital Region of Denmark (file no. H-21054272), which determined that this was a quality study. In accordance with Danish legislation, the study was therefore approved locally by the internal Research and Quality Improvement Board at the Department of Clinical Physiology and Nuclear Medicine, Rigshospitalet (file no. KF-509-22). The study was performed according to the guidelines of the Declaration of Helsinki. All subjects provided oral and written informed consent prior to enrolment. Men and women aged ≥18 years were included in the study. Individuals with peripheral arterial disease, heart failure, neurological or musculoskeletal disease hindering knee-extensor exercise (KEE), or symptoms of disease within 2 weeks prior to the study were excluded.
1. Setup of the participant
2. Setup of the ultrasound apparatus
3. Doppler ultrasound scan
4. Quantitation of blood flow
Participants
From May 2022 to October 2022, thirty healthy men and women were recruited to participate in the study. None of the participants had a history of cardiovascular, metabolic, or neurological disease. They were not instructed to change their usual habits regarding caffeine, alcohol, nicotine, vigorous exercise, or any other factors that could potentially affect vascular function.
Experimental procedures
Participants reported to the laboratory on two experimental days 10 days apart. For each participant, the experiments were performed at the same time of day, but the time of day differed between participants. Furthermore, the experiments were performed in the same room with limited light exposure, controlled temperature, no music, and limited conversation. On experimental days 1 and 2, the measurements were performed by the same sonographer (S1).
The participants were placed in the single-leg knee-extensor model described in the protocol and Supplementary Figure 1. The single-leg knee-extensor chair was constructed by a former professor at our research center, Professor Bengt Saltin, and is also referred to as the 'Saltin Chair' (see Table of Materials).
Initially, blood flow in the common femoral artery (CFA) of the dominant leg was measured in the seated resting condition, with the leg secured to the pedal. Participants then commenced exercise, and blood flow was measured at the following workloads: 0 W, 6 W, 12 W, and 18 W. Each workload lasted 4 min and was performed continuously. Two blood flow measurements were obtained at each workload, at 2.5 min and 3.5 min, when a steady state was assumed to have been reached21. To assess within-day reliability, the probe was lifted away from the artery for 10 s after the first measurement and then repositioned for the second measurement, as shown in Figure 1. The end-systolic diameter of the CFA, measured at rest, was used to calculate flow throughout the experiment.
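To make the flow calculation concrete, the following is a minimal sketch of the conventional conversion from Doppler velocity to volume flow. It assumes that LBF is obtained as the time-averaged mean blood velocity (cm/s) multiplied by the cross-sectional area of the CFA computed from the single resting diameter (cm); the analysis software's exact settings are not reproduced, and the function name `leg_blood_flow_l_per_min` and the velocities in the example are hypothetical.

```python
import math


def leg_blood_flow_l_per_min(mean_velocity_cm_s: float, diameter_cm: float) -> float:
    """Volume flow (L/min) from mean blood velocity (cm/s) and vessel diameter (cm).

    Assumes a circular vessel cross-section: flow = velocity * pi * (d / 2) ** 2.
    """
    area_cm2 = math.pi * (diameter_cm / 2.0) ** 2  # cross-sectional area of the CFA
    flow_ml_per_s = mean_velocity_cm_s * area_cm2  # 1 cm^3 equals 1 mL
    return flow_ml_per_s * 60.0 / 1000.0           # mL/s -> L/min


# Hypothetical example: one resting diameter is reused at every workload,
# mirroring the single-diameter approach described in the text.
resting_diameter_cm = 0.94  # mean day-1 CFA diameter from Table 1
for velocity in (8.0, 40.0, 58.0):  # illustrative mean velocities, not study data
    flow = leg_blood_flow_l_per_min(velocity, resting_diameter_cm)
    print(f"{velocity:5.1f} cm/s -> {flow:.2f} L/min")
```

Because the diameter enters squared, any error in the single resting diameter propagates into every workload's flow estimate, which is one reason the text later flags possible exercise-induced changes in CFA diameter.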
On the third experimental day, the variation between the two sonographers was investigated using the same exercise protocol described above. Six participants provided informed consent for a third visit. Two skilled sonographers, experienced in measuring blood flow in clinical settings, performed measurements within 1 min of each other at the same workload, as depicted in Figure 1. Skilled sonographers were defined as having completed a minimum of 20 h of scanning volunteers in the single-leg knee-extensor model, including supervision for error correction. Both sonographers demonstrated comparable within-day reliability. During the exercise, the two sonographers measured blood flow in a randomized order, while being blinded to each other's measurements. To avoid audio and visual feedback, the sonographers were not present in the room simultaneously. The first sonographer completed the first measurement after 150 s at a given workload. After completing the trace, the first sonographer reset the ultrasound apparatus to default settings and left the room. The participant maintained the same pace and load, and then the second sonographer entered the room to obtain a new trace. Both sonographers performed blood flow measurements for the four workloads, as in experimental days 1 and 2. Prior to scanning at each workload, a coin flip determined the randomized order for the sonographers, ensuring that the 'winner' started the measurement. On experimental day 3, each sonographer obtained only one blood flow measurement during each exercise session.
Statistics
All statistical analyses were conducted in R using RStudio (version 1.4.1717; see Table of Materials). A two-tailed p < 0.05 was considered statistically significant. The data are presented as mean (standard deviation, SD) or mean [95% confidence interval, lower limit (LL), upper limit (UL)]. Paired t-tests were used to assess within-day and between-day differences in LBF. The p-values were Bonferroni-corrected, with a threshold of 0.005 for statistical significance.
Reliability reflects the amount of random error introduced by variability in the measured variable22. Absolute reliability was assessed using Bland-Altman plots and expressed as limits of agreement (LOA) and the smallest real difference (SRD), which estimate the range within which the difference between two measurements is expected to fall in 95% of cases23,24. One-way analysis of variance (ANOVA) was used to determine the within-participant standard deviation (SDw), and SRD was calculated using the following formula24:

$$\mathrm{SRD} = 1.96 \times \sqrt{2} \times SD_w$$
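The absolute-reliability statistics described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' R analysis: it assumes the conventional definition SRD = 1.96 × √2 × SDw, with SDw taken as the square root of the within-participant mean square of a one-way ANOVA, and the function names (`bland_altman`, `within_subject_sd`, `smallest_real_difference`) and data are hypothetical.

```python
import math
import statistics


def bland_altman(first, second):
    """Bias and 95% limits of agreement for paired measurements (second - first)."""
    diffs = [b - a for a, b in zip(first, second)]
    bias = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample SD of the paired differences
    return bias, bias - 1.96 * sd_diff, bias + 1.96 * sd_diff


def within_subject_sd(trials):
    """SDw: square root of the within-participant mean square from a one-way ANOVA.

    `trials` holds one list of repeated measurements per participant.
    """
    ss_within = sum(sum((x - statistics.mean(t)) ** 2 for x in t) for t in trials)
    df_within = sum(len(t) - 1 for t in trials)
    return math.sqrt(ss_within / df_within)


def smallest_real_difference(sd_w):
    """SRD = 1.96 * sqrt(2) * SDw (assumed conventional definition)."""
    return 1.96 * math.sqrt(2) * sd_w


# Hypothetical LBF values (L/min) for five participants on two days; not study data.
day1 = [1.60, 1.75, 2.10, 1.40, 1.90]
day2 = [1.70, 1.65, 2.00, 1.55, 1.85]

bias, lower, upper = bland_altman(day1, day2)
sd_w = within_subject_sd([[a, b] for a, b in zip(day1, day2)])
srd = smallest_real_difference(sd_w)
print(f"bias {bias:+.3f} L/min, LOA {lower:.3f} to {upper:.3f} L/min")
print(f"SDw {sd_w:.3f} L/min, SRD {srd:.3f} L/min")
```

Both LOA and SRD are expressed in the units of the measurement (L/min), so a change in an individual's LBF smaller than the SRD cannot be distinguished from measurement error.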
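To compare the method with other LBF measurement techniques, the coefficient of variation (CV) was calculated as a relative measure of reliability. CV expresses the within-participant measurement error relative to the mean25:

$$\mathrm{CV} = \frac{SD_w}{\bar{x}} \times 100\%$$

where $\bar{x}$ is the mean of all repeated measurements.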
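As a relative measure, the CV can be sketched as SDw divided by the grand mean, expressed in percent. This is an assumption about the exact definition (the authors' mixed-model implementation may differ in detail), and `within_subject_cv` and the data are hypothetical. Because the result is scale-free, it can be compared across methods with different units, which is the stated reason for reporting it.

```python
import math
import statistics


def within_subject_cv(trials):
    """CV (%) = SDw / grand mean * 100.

    SDw is the square root of the within-participant mean square (one-way ANOVA);
    `trials` holds one list of repeated measurements per participant.
    """
    ss_within = sum(sum((x - statistics.mean(t)) ** 2 for x in t) for t in trials)
    df_within = sum(len(t) - 1 for t in trials)
    sd_w = math.sqrt(ss_within / df_within)
    grand_mean = statistics.mean([x for t in trials for x in t])
    return 100.0 * sd_w / grand_mean


# Hypothetical paired LBF values (L/min); not study data.
pairs = [[1.60, 1.70], [1.75, 1.65], [2.10, 2.00], [1.40, 1.55], [1.90, 1.85]]
print(f"CV = {within_subject_cv(pairs):.1f} %")  # -> CV = 4.2 %
```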
Based on the distribution of mean estimates and residual variance from a linear mixed model, the distribution of CV was simulated to obtain 95% confidence intervals for CV26. There is no official consensus on quality levels for CV values, as they depend on the methodology and study type. However, CV is generally considered low if <10%, acceptable if 10%-20%, and unacceptable if above 25%25,27.
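The simulation-based confidence interval can be illustrated with a simple parametric sketch. This is not the authors' code: it assumes that the mean estimate is approximately normal with its standard error and that the residual variance follows a scaled chi-square distribution with the residual degrees of freedom. The function names `simulate_cv_ci` and `classify_cv` and all inputs are hypothetical; the classification only encodes the thresholds quoted above.

```python
import math
import random


def simulate_cv_ci(mean_hat, se_mean, resid_var, df_resid, n_sim=10_000, seed=1):
    """Point CV (%) and a percentile 95% CI obtained by simulating its two components."""
    rng = random.Random(seed)
    cvs = []
    for _ in range(n_sim):
        mean_draw = rng.gauss(mean_hat, se_mean)  # assumes the mean stays well above zero
        # Chi-square(df) draw via gamma(shape=df/2, scale=2), rescaled to the variance
        var_draw = resid_var * rng.gammavariate(df_resid / 2.0, 2.0) / df_resid
        cvs.append(100.0 * math.sqrt(var_draw) / mean_draw)
    cvs.sort()
    lower = cvs[int(0.025 * n_sim)]
    upper = cvs[int(0.975 * n_sim) - 1]
    return 100.0 * math.sqrt(resid_var) / mean_hat, lower, upper


def classify_cv(cv_percent):
    """Label a CV using the thresholds quoted in the text (20-25% is not covered)."""
    if cv_percent < 10:
        return "low"
    if cv_percent <= 20:
        return "acceptable"
    if cv_percent > 25:
        return "not acceptable"
    return "not classified (20-25%)"


# Hypothetical inputs, not study estimates.
point, lo, hi = simulate_cv_ci(mean_hat=1.75, se_mean=0.08, resid_var=0.0055, df_resid=29)
print(f"CV {point:.1f} % (95% CI {lo:.1f} to {hi:.1f} %): {classify_cv(point)}")
```

Fixing the random seed makes the interval reproducible; increasing `n_sim` reduces Monte Carlo noise in the percentile limits.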
In this study, sonographer 1 and sonographer 2 were the only raters of interest, and multiple measurements were performed; these features determined the choice of ICC model. The intraclass correlation coefficient (ICC) was calculated using a two-way mixed-effects model with absolute agreement based on multiple measurements, ICC(3,k). In this notation, the first number refers to the model (1, 2, or 3), and the second number or letter refers to the type, indicating whether the ICC is based on a single rater/measurement (1) or on the mean of raters/measurements (k)28,29.
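An average-measures, absolute-agreement ICC can be sketched from two-way ANOVA mean squares. Because the text specifies absolute agreement, this sketch uses the commonly tabulated formula ICC = (MSR − MSE) / (MSR + (MSC − MSE) / n), where MSR, MSC, and MSE are the row (participant), column (rater), and error mean squares and n is the number of participants; whether it matches the authors' R output exactly is an assumption, and the function name is hypothetical.

```python
import statistics


def icc_absolute_agreement_average(data):
    """Average-measures, absolute-agreement ICC from a two-way ANOVA.

    `data` has one row per participant and one column per rater or measurement;
    at least two rows and two columns are required.
    """
    n = len(data)     # participants (rows)
    k = len(data[0])  # raters or repeated measurements (columns)
    grand = statistics.mean([x for row in data for x in row])
    row_means = [statistics.mean(row) for row in data]
    col_means = [statistics.mean([row[j] for row in data]) for j in range(k)]

    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in data for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (ms_rows + (ms_cols - ms_error) / n)


# Hypothetical example: rater 2 reads every participant exactly 1 L/min higher.
identical = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
offset = [[1.0, 2.0], [2.0, 3.0], [3.0, 4.0]]
print(icc_absolute_agreement_average(identical))  # 1.0
print(icc_absolute_agreement_average(offset))     # 0.8
```

The offset example shows why the agreement form matters: the two raters rank participants identically, yet the systematic difference between them lowers the absolute-agreement ICC below 1.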
Both absolute and relative reliability are commonly used to assess the reliability of a measurement. Repeatability refers to the consistency of obtaining the same results when the measurement is repeated under identical conditions. Reproducibility, on the other hand, refers to the ability to obtain consistent results when the measurement is performed under varying or changing conditions. These terms are useful for understanding and evaluating the reliability of a measurement method22.
All participants successfully completed the study and tolerated the experimental design. A total of 30 healthy subjects (age: 33 ± 9.3 years; male/female: 14/16) were included, with a mean weight of 74.5 kg (SD: 13) and a mean height of 174 cm (SD: 9.3).
Absolute values and internal consistency
There were no statistically significant differences in the absolute LBF values between within-day or between-day measurements (Table 1). LBF increased progressively across the incremental workloads (Figure 2), ranging from 0.36 (SD: 0.20) L/min at rest to 2.44 (SD: 0.56) L/min during exercise at 18 W, demonstrating a linear increase with workload progression.
Bland-Altman plots illustrating LBF measurements are presented for within-day reliability in Figure 3, between-day reliability in Figure 4, and inter-rater reliability in Figure 5. Within-day data showed no outliers, while a few outliers were observed in the between-day measurements, and several outliers were observed during the inter-rater measurements.
Test-retest reliability
Values for smallest real difference (SRD), the coefficient of variation (CV), and intraclass correlation coefficient (ICC) are provided for within-day in Table 2, between-day in Table 3, and for inter-rater in Table 4.
The within-day SRD values ranged from 0.28 [95% CI: 0.22, 0.38] L/min during 0 W to 0.39 [95% CI: 0.32, 0.50] L/min during 18 W. The SRD values were higher in the between-day measurements ranging from 0.66 [95% CI: 0.41, 1.32] L/min at 0 W to 0.71 [95% CI: 0.53, 1.01] L/min during 18 W. The SRD was even higher in the inter-rater measurements ranging from 0.23 [95% CI: 0.12, 0.70] L/min at rest to 1.55 [95% CI: 1.02, 2.82] L/min during exercise at 18 W.
The within-day CV values ranged from 4.0 [95% CI: 3.0, 5.1] % during 18 W to 4.3 [95% CI: 3.3, 5.5] % during 6 W. The CV was higher in the between-day measurements, ranging from 20.2 [95% CI: 14.7, 27.2] % at rest to 10.1 [95% CI: 7.5, 13.1] % during 6 W. Even higher values were obtained in the inter-rater measurements, with CVs ranging from 26.8 [95% CI: 11, 51] % at rest to 17.9 [95% CI: 8.5, 29.2] % during 6 W.
The within-day ICC values exceeded 0.90 at all workloads (0.96 to 0.98), and the between-day ICC values were ≥0.90 in all conditions except 12 W (0.82). Conversely, the inter-rater measurements yielded ICC values as low as 0.41 [95% CI: 0.1, 0.84] at 18 W.
Figure 1: Study design overview. A total of 30 healthy participants performed a single-leg knee-extensor protocol with incremental workloads ranging from 0 to 18 W. The protocol was repeated after 10 days. A subgroup of 6 participants volunteered for the inter-rater reliability study on day 3.
Figure 2: Leg blood flow response to single-leg knee-extensor exercise. The mean values for day 1 and day 2 are represented by black and grey dots, respectively, with whiskers indicating the standard deviation. One measurement was obtained at rest, and two measurements were obtained at each workload (0, 6, 12, and 18 W).
Figure 3: Within-day test-retest reliability of leg blood flow during single-leg knee-extension depicted by Bland-Altman plots. The plots were created from within-day measurements on both days (n = 60). One plot is shown for each incremental workload: 0 W (A), 6 W (B), 12 W (C), and 18 W (D).
Figure 4: Between-day test-retest reliability of leg blood flow during single-leg knee-extension depicted by Bland-Altman plots. The plots were created from between-day measurements (n = 30). One plot is shown for each condition: rest (A), 0 W (B), 6 W (C), 12 W (D), and 18 W (E).
Figure 5: Inter-rater test-retest reliability of leg blood flow during single-leg knee-extension depicted by Bland-Altman plots. The plots were created from inter-rater measurements (n = 6). One plot is shown for each condition: rest (A), 0 W (B), 6 W (C), 12 W (D), and 18 W (E).
Condition (N = 30) | Day 1, LBF 1 | Day 1, LBF 2 | Day 1, within-day p-value | Day 2, LBF 1 | Day 2, LBF 2 | Day 2, within-day p-value | Between-day mean difference | Between-day p-value | Day 1, CFA diameter (cm) | Day 2, CFA diameter (cm) |
Rest (L/min) | 0.36 (0.20) | NA | NA | 0.37 (0.14) | NA | NA | 0.006 (0.11) | 0.76 | 0.94 (0.12) | 0.96 (0.14) |
0 W (L/min) | 1.68 (0.40) | 1.69 (0.47) | 0.60 | 1.58 (0.34) | 1.63 (0.40) | 0.03 | 0.13 (0.30) | 0.37 | | |
6 W (L/min) | 1.77 (0.45) | 1.75 (0.46) | 0.53 | 1.74 (0.40) | 1.72 (0.39) | 0.25 | 0.02 (0.26) | 0.37 | | |
12 W (L/min) | 1.99 (0.50) | 1.99 (0.45) | 0.8 | 1.95 (0.37) | 1.97 (0.38) | 0.42 | 0.07 (0.32) | 0.4 | | |
18 W (L/min) | 2.43 (0.55) | 2.51 (0.53) | 0.10 | 2.34 (0.44) | 2.38 (0.45) | 0.12 | 0.12 (0.33) | 0.06 | | |
Table 1: Leg blood flow. This table displays the absolute blood flow values and common femoral artery diameters obtained on day 1 and day 2 during the first and second blood flow measurements. Data are presented as mean (standard deviation). Paired t-tests were used to assess within-day and between-day differences. Abbreviations: W = watt, CFA = common femoral artery, LBF = leg blood flow, NA = not applicable. After Bonferroni correction, p < 0.005 was considered statistically significant.
Workload | SRD (L/min) | CV (%) | ICC |
0 W | 0.28 (0.21 to 0.38) | 4.2 (3.1 to 5.3) | 0.98 (0.96 to 0.99) |
6 W | 0.31 (0.26 to 0.38) | 4.3 (3.3 to 5.5) | 0.97 (0.95 to 0.99) |
12 W | 0.31 (0.24 to 0.50) | 4.1 (3.1 to 5.2) | 0.96 (0.93 to 0.97) |
18 W | 0.39 (0.32 to 0.50) | 4.0 (3.0 to 5.1) | 0.96 (0.94 to 0.98) |
Table 2: Within-day reliability measurements. The table presents the estimates with 95% confidence intervals (lower limit to upper limit) for within-day reliability. Abbreviations: W = watt, SRD = smallest real difference, CV = coefficient of variation, ICC = intraclass correlation coefficient.
Condition | SRD (L/min) | CV (%) | ICC |
Rest | 0.21 (0.16 to 0.32) | 20.2 (14.7 to 27.2) | 0.92 (0.82 to 0.96) |
0 W | 0.66 (0.41 to 1.32) | 13.7 (10.3 to 17.6) | 0.93 (0.86 to 0.97) |
6 W | 0.52 (0.38 to 0.79) | 10.1 (7.5 to 13.1) | 0.91 (0.82 to 0.96) |
12 W | 0.66 (0.50 to 0.94) | 11.5 (8.6 to 14.7) | 0.82 (0.62 to 0.91) |
18 W | 0.71 (0.53 to 1.01) | 10.2 (7.6 to 13.1) | 0.90 (0.79 to 0.95) |
Table 3: Between-day reliability measurements. The table presents the estimates with 95% confidence intervals (lower limit to upper limit) for between-day reliability. Abbreviations: W = watt, SRD = smallest real difference, CV = coefficient of variation, ICC = intraclass correlation coefficient.
Condition | SRD (L/min) | CV (%) | ICC |
Rest | 0.23 (0.12 to 0.70) | 26.8 (11 to 51) | 0.85 (0.1 to 0.98) |
0 W | 0.96 (0.75 to 1.31) | 20 (9.2 to 33.3) | 0.74 (0.1 to 0.96) |
6 W | 0.88 (0.59 to 1.55) | 17.9 (8.5 to 29.2) | 0.6 (0.2 to 0.94) |
12 W | 1.09 (0.59 to 1.55) | 18.7 (8.8 to 30.6) | 0.5 (0.2 to 0.93) |
18 W | 1.55 (1.01 to 2.82) | 18.4 (8.6 to 30.1) | 0.41 (0.1 to 0.84) |
Table 4: Inter-rater reliability measurements. The table presents the estimates with 95% confidence intervals (lower limit to upper limit) for inter-rater reliability. Abbreviations: W = watt, SRD = smallest real difference, CV = coefficient of variation, ICC = intraclass correlation coefficient.
Supplementary Figure 1: Single-leg knee-extensor model. This image depicts a participant during the trial while using the single-leg knee-extensor model. Prior consent was obtained from both the participant and the sonographer for the usage of this image. Text boxes are used to highlight all the materials mentioned in the protocol.
Supplementary Figure 2: Ultrasound apparatus. This image showcases the buttons utilized for conducting a Doppler ultrasound examination. All the buttons described in the protocol are highlighted for easy reference.
Supplementary Figure 3: Ultrasound apparatus in Pulse wave mode. The image demonstrates the buttons employed for conducting a Doppler ultrasound examination in Pulse wave mode. All the buttons mentioned in the protocol section are highlighted for clarity.
Supplementary Figure 4: Doppler ultrasound signal. This image displays a blood velocity trace utilized for calculating leg blood flow. All the relevant metrics and buttons described in the protocol section are highlighted for easy identification and reference.
This study assessed the reliability of Doppler ultrasound methodology for evaluating leg blood flow (LBF) during submaximal single-leg knee-extensor exercise in healthy participants. The results indicated high within-day reliability and acceptable between-day reliability, while inter-rater reliability was found to be unacceptable at rest and at 0 W.
Although probe removal between measurements appeared to have little impact, the difference in reliability between within-day and between-day measurements could be attributed to uncontrolled environmental factors. The scan site, sonographer, and experimental setup remained consistent throughout the study. However, participants were not instructed to abstain from caffeine, nicotine, alcohol, or strenuous exercise, all of which are known to affect limb blood flow30,31,32,33. Additionally, factors known to affect muscle blood flow, such as diet, fluid intake, and high caloric intake (especially fatty meals), were not controlled for34,35. The study also did not record information about participants' sleep before the examination, which has been shown to affect vascular function36. Furthermore, medication use and its potential influence on blood flow regulation were not recorded or controlled for37,38,39,40. Therefore, the reported reliability estimates likely represent a worst-case scenario, and the method can be expected to be equally or more reliable in healthy individuals when these subject-related factors are controlled. This aligns with the purpose of the study, as controlling for potential confounders is not always feasible in experimental or clinical settings. Despite these limitations, the results demonstrated high within-day and acceptable between-day reliability. Moreover, given the lower inter-rater reliability, ensuring that LBF is assessed by the same sonographer appears to be particularly important.
The findings of this study are consistent with those of other studies that evaluated the reliability of Doppler ultrasound in different experimental setups, including single-leg passive leg movement (PLM) in both men and women. These studies reported the highest reliability during peak LBF, suggesting that the method is more reliable during exercise than at rest27,41. The present study demonstrated slightly higher reliability than these previous studies, which could be attributed to the data being obtained during exercise, when LBF was higher. Furthermore, the reliability of the method was comparable to that reported in a recent study using a different setup, in which two-legged stepping exercise was performed to measure leg blood flow21. The within-day reliability in this study was also higher than that reported in an earlier study from 1997, potentially owing to advances in ultrasound technology and software.
The study revealed that the reliability between experimental days was lower at rest but improved as exercise intensity increased, highlighting the importance of detailed baseline measurements. In this study, resting LBF was assessed in the seated position with the foot tied to the pedal, and it is worth considering whether baseline measurements in the supine position would have been more reliable. Additionally, no standard protocol for the duration of rest was implemented, making the baseline measurement more susceptible to environmental factors, including the participants' physical activity level prior to the experiment, compared to the high-flow states during exercise.
It is important to note that this study was conducted in healthy participants, and the reliability measures may not apply to individuals with disease. Doppler ultrasound relies heavily on the skills of the sonographer, and the reliability observed here cannot be extrapolated to untrained sonographers. Evaluating the within-day reliability of both sonographers is crucial to account for potential differences in skill level that could lead to falsely low inter-rater reliability estimates. Both sonographers exhibited a comparable degree of within-day variability, indicating consistent performance throughout the assessment period.
Furthermore, the study focused on single-leg knee extensions, and the results may not be applicable to Doppler ultrasound of the forearm, as blood flow regulation may differ between limbs42,43. The existing literature on vessel diameter changes during dynamic exercise presents conflicting data. Additionally, during seated rest, only one diameter measurement was obtained for the common femoral artery (CFA), which was then used to calculate flow following the methodology described in previous studies4,44. It should be noted that some evidence suggests an increase in CFA diameter during incremental single-leg knee-extensor exercise in young, healthy women45.
Future studies should investigate whether considering potential changes in CFA diameter during exercise would impact reliability. Furthermore, it is important to acknowledge that no exhaustion test was conducted prior to the protocol in this study. Therefore, the results are based on absolute workloads, and the low-to-submaximal intensities were derived from previous studies involving healthy young volunteers3,4,6,44. The assumption that steady-state is achieved after 2.5 min at the intensities used in this study is reasonable and consistent with previous findings6. However, it is essential to note that this may not hold true at higher intensities. Regardless, it should be emphasized that the reliability measures obtained in the present study cannot be generalized or extrapolated to maximal effort situations.
In summary, Doppler ultrasound-based measurements of leg blood flow during submaximal single-leg knee-extensor exercise in healthy humans demonstrated high within-day and acceptable between-day reliability when performed by the same sonographer. This reliability was observed even when intrinsic and extrinsic environmental factors were not controlled for, except for place, time, and room temperature.
The authors have nothing to disclose.
The Centre for Physical Activity Research (CFAS) is supported by TrygFonden (grants ID 101390 and ID 20045). JPH was supported by grants from Helsefonden and Rigshospitalet. During this work, RMGB was supported by a postdoctoral grant from Rigshospitalet.
EKO GEL | EKKOMED A/S | DK-7500 Holstebro | |
RStudio, version 1.4.1717 | R Project for Statistical Computing | ||
Saltin Chair | This was built from an ergometer bike and a car seat owned by Professor Bengt Saltin. The steel construction was custom-made by a specialist. | ||
Ultrasound apparatus equipped with a linear probe (9 MHz, LOGIQ E9) | GE Healthcare | Unknown | GE Healthcare, Milwaukee, WI, USA |
Ultrasound gel |