Automatic prediction model for online diaphragm motion tracking based on optical surface monitoring by machine learning
Original Article

Zhenhui Dai1^, Qiang He1, Lin Zhu1, Bailin Zhang1, Huaizhi Jin1, Geng Yang1, Chunya Cai1, Xiang Tan1, Wanwei Jian1, Yao Chen2, Hua Zhang3, Jian Wu2, Xuetao Wang1

1Department of Radiation Therapy, The Second Affiliated Hospital of Guangzhou University of Chinese Medicine, Guangzhou, China; 2Institute of Biopharmaceutical and Health Engineering, Tsinghua Shenzhen International Graduate School, Tsinghua University, Shenzhen, China; 3School of Biomedical Engineering, Southern Medical University, Guangzhou, China

Contributions: (I) Conception and design: Z Dai, J Wu; (II) Administrative support: X Wang; (III) Provision of study materials or patients: H Jin, H Zhang, W Jian; (IV) Collection and assembly of data: L Zhu, B Zhang, X Tan; (V) Data analysis and interpretation: Q He, G Yang, C Cai, Y Chen; (VI) Manuscript writing: All authors; (VII) Final approval of manuscript: All authors.

^ORCID: 0000-0002-8711-6502.

Correspondence to: Xuetao Wang. Department of Radiation Therapy, The Second Affiliated Hospital of Guangzhou University of Chinese Medicine, Guangzhou Higher Education Mega Center, Panyu District, Guangzhou 510006, China. Email: wangxuetao0625@126.com; Jian Wu. Institute of Biopharmaceutical and Health Engineering, Tsinghua Shenzhen International Graduate School, Tsinghua University, Tsinghua Park, Xili University Town, Nanshan District, Shenzhen 518057, China. Email: wuj@sz.tsinghua.edu.cn.

Background: The aim of this study was to establish a correlation model between external surface motion and internal diaphragm apex movement using machine learning and to realize online automatic prediction of the diaphragm motion trajectory based on optical surface monitoring.

Methods: The optical body surface parameters and kilovoltage (kV) X-ray fluoroscopic images of 7 liver tumor patients were captured synchronously for 50 seconds. The location of the diaphragm apex was manually delineated by a radiation oncologist and automatically detected with a convolutional network model in fluoroscopic images. The correlation model between the body surface parameters and the diaphragm apex of each patient was developed through linear regression (LR) based on synchronous datasets before radiotherapy. Model 1 (M1) was trained with data from the first 30 seconds of the datasets and tested with data from the following 20 seconds of the datasets in the first fraction to evaluate the intra-fractional prediction accuracy. Model 2 (M2) was trained with data from the first 30 seconds of the datasets in the next fraction. The motion trajectory of the diaphragm apex during the following 20 seconds in the next fraction was predicted with M1 and M2, respectively, to evaluate the inter-fractional prediction accuracy. The prediction errors of the 2 models were compared to analyze whether the correlation model needed to be re-established.

Results: The average mean absolute error (MAE) and root mean square error (RMSE) using M1 trained with automatic detection location for the first fraction were 3.12±0.80 and 3.82±0.98 mm in the superior-inferior (SI) direction and 1.38±0.24 and 1.74±0.32 mm in the anterior-posterior (AP) direction, respectively. The average MAE and RMSE of M1 versus M2 in the AP direction were 2.63±0.71 versus 1.28±0.48 mm and 3.26±0.90 versus 1.61±0.60 mm, respectively. The average MAE and RMSE of M1 versus M2 in the SI direction were 5.84±1.22 versus 3.37±0.43 mm and 7.22±1.45 versus 4.07±0.54 mm, respectively. The prediction accuracy of M2 was significantly higher than that of M1.

Conclusions: This study shows that it is feasible to use optical body surface information to automatically predict the diaphragm motion trajectory. At the same time, it is necessary to establish a new correlation model for the current fraction before each treatment.

Keywords: Diaphragm apex motion; optical surface; fluoroscopic image; machine learning; correlation model


Submitted Mar 14, 2022. Accepted for publication Jul 22, 2022. Published online Jul 28, 2022.

doi: 10.21037/qims-22-242


Introduction

Stereotactic body radiation therapy (SBRT) has become a favorable treatment approach for liver tumor patients, having a high tumor control rate and low probability of complications (1). However, a liver tumor may move out of the radiation field due to respiratory motion during radiotherapy. It is necessary to develop online tumor-tracking technology for precise radiotherapy (2). Real-time tumor tracking is mainly achieved by implanting metal fiducials for CyberKnife (Accuray Inc., Sunnyvale, CA, USA) or radio-frequency electromagnetic fiducials for Calypso (Calypso Medical Technologies, Seattle, WA, USA) (3,4). The tracking margins for SBRT with invasive markers are generally set to 3 mm in the superior-inferior (SI) direction, 2 mm in the right-left (RL) direction, and 2 mm in the anterior-posterior (AP) direction (5). Although implanted fiducial markers can provide the accurate location of a tumor, they also increase the risk of liver hemorrhage. Markerless tracking is potentially an alternative approach during liver tumor SBRT (6). Fluoroscopic imaging in the linear accelerator is a conventional tool for position verification before radiotherapy. Some studies have reported using fluoroscopic imaging for direct tumor tracking with template matching applied as the localization algorithm (7-10). However, the problem with tracking tumor motion in fluoroscopic images is that it is difficult to identify tumor targets because of the poor image contrast. The upper edge of the diaphragm is sharp in contrast to the neighboring tissue and, therefore, can be more reliably identified than the tumor in the abdominal region (11). Studies have reported that tumor motion near the diaphragm has a high correlation with diaphragm motion, suggesting that the diaphragm could be used as an internal surrogate for predicting tumor motion without the need for fiducial implants (12,13). 
This approach could be clinically useful for managing the motion of liver tumors located near the diaphragm. Several groups have proposed methods for direct diaphragm tracking using kilovoltage (kV) X-ray images (14,15). Hirai et al. (16) proposed a method for tracking the internal target based on fluoroscopic images using deep learning. The drawback of deep learning is that it requires a large number of samples, which are lacking for this particular application. An alternative method is to train a convolutional neural network (CNN) on a large public dataset and perform transfer learning by fine-tuning the pre-trained model for target tracking in fluoroscopic images (17). Some studies have reported tumor motion estimation by external surface surrogate location using an optical monitoring system (18-22). However, it is difficult to construct a reliable correlation model between external surrogates and an internal tumor, and the model is sensitive to changes in respiratory patterns (23). Bertholet et al. (24) developed an automatic tumor-motion monitoring approach by combining an external one-dimensional (1D) optical marker and kV X-ray imaging. The correlation model could track the internal tumor online using implanted fiducials and an external optical signal with a 2.31 mm root-mean-square error (RMSE). The problem with this method is that the tumor is invasively located by implanting metal fiducials. In addition, the body surface marker can only monitor 1D respiratory waveforms, yet both the body surface and the tumor move in three dimensions. Vedam et al. (25) formulated a linear model for 5 lung tumor patients to predict the diaphragm motion from 1D respiration signals using a reflective marker on the abdominal wall. However, 1D respiratory signals in the vertical direction may not adequately reflect 3D diaphragm motion, as respiration is an anisotropic movement in three dimensions. Glide-Hurst et al. (26) investigated the feasibility of 3D surface imaging as an external surrogate of diaphragm motion through synchronization with kV fluoroscopic imaging. However, that study did not predict internal diaphragm movement based on surface imaging. Fayad et al. (27) studied the correlation between abdominal areas of the external surface and the internal diaphragm and tumor, reporting correlation coefficients of 0.80±0.18. This approach used 4D computed tomography (CT) to construct a motion model and did not track the internal target in real time based on the external surface. Seregni et al. (28) proposed 2 internal/external correlation models for infra-red optical tracking. The geometric and dosimetric accuracies were increased with this tumor-tracking approach. In summary, previous studies have been limited by invasive procedures or by off-line modes of motion tracking.

The goal of this study was to realize the non-invasive, real-time tracking of tumor motion near the diaphragm by predicting the motion of the diaphragm based on optical body surface information. We developed a tracking model combining a discriminative correlation filter (DCF) with a fully convolutional network (FCN) to automatically detect the diaphragm apex in fluoroscopic images. A linear regression (LR) method was used to construct an internal/external correlation model based on the synchronized diaphragm apex position and pre-treatment optical information of the body surface. The diaphragm motion trajectory was then predicted in real time from the optical information of the body surface, without X-ray fluoroscopy, during subsequent radiotherapy. We present the following article in accordance with the TRIPOD reporting checklist (available at https://qims.amegroups.com/article/view/10.21037/qims-22-242/rc).


Methods

Patient datasets

The motion datasets were acquired from 7 liver cancer patients who underwent external beam radiotherapy with 25 fractions (2 Gy/fraction) from January 2021 to October 2021 at The Second Affiliated Hospital of Guangzhou University of Chinese Medicine. The characteristics of the patients are shown in Table S1. The kV fluoroscopic imaging and optical surface monitoring system (OSMS) of a Varian Edge linear accelerator (Varian Medical Systems, Palo Alto, CA, USA) were used to acquire the datasets used in this study, as shown in Figure 1. The inclusion criteria were as follows: (I) patients with liver cancer and (II) patients with fluoroscopic imaging and optical surface monitoring conducted over 50 seconds. Patients were excluded for the following reasons: (I) inability to cooperate with fluoroscopic or optical surface monitoring and (II) the breathing amplitude of the body surface was less than 3 mm. A total of 10 patients with liver tumors underwent fluoroscopic imaging synchronized with optical body surface monitoring. We excluded 3 patients with a surface motion amplitude of less than 3 mm from the datasets, with 7 patients included in the final dataset. The flow diagram of patient selection is shown in Figure S1. The On-Board Imager (Varian Medical Systems) equipped in the accelerator worked in the radiography mode (100 kV, 80 mA), acquiring kV fluoroscopic images for each patient at a rate of 15 frames per second for 50 seconds. The pixel sizes of the obtained images were 0.388×0.388 mm with a matrix size of 1,024×768 pixels. The X-ray fluoroscopic imaging was performed in the RL direction. The fluoroscopic images reflected the motion information of the diaphragm apex in the SI direction and the AP direction. The direction signs are shown in Figure 1. The body surface motion in real-time was monitored with an OSMS which consisted of 3 cameras. The cameras capture the reflected light to reconstruct 3D distance maps of the complete patient body surface. 
AlignRT (Vision RT Ltd., London, UK) was used to acquire body surface information, including 8 variables, which are listed in Table 1. The monitoring region of interest (ROI) covered the diaphragm area on the chest and abdominal body surface with obvious and regular fluctuation. The body surface region below the xiphoid process and between the left and right costal arches was selected as the ROI. In this study, the reference surface was captured at the end of the expiratory phase. The information, including the 8D body surface parameters, was saved into the “RealTimeDeltas” file of the OSMS. The fluoroscopic images and the body surface motion parameters were acquired synchronously to build the internal-external correlation model. The OSMS data and fluoroscopic images of 4 of the 7 patients were acquired synchronously on different days to verify the inter-fractional accuracy of the prediction model. The synchronization was achieved by extracting the corresponding frame of the fluoroscopic image closest to the acquisition time of each sampling point of the optical body surface monitoring system.
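
The nearest-frame synchronization described above can be illustrated with a minimal sketch. It assumes that both data streams carry timestamps in seconds on a common clock and that the frame timestamps are sorted; the function name `nearest_frame_indices` is hypothetical and does not reflect the format of the "RealTimeDeltas" file or the software actually used in this study.

```python
from bisect import bisect_left


def nearest_frame_indices(surface_times, frame_times):
    """For each surface sample time, return the index of the closest frame time.

    Assumes frame_times is sorted in ascending order (illustrative helper).
    Ties are resolved in favor of the earlier frame.
    """
    indices = []
    last = len(frame_times) - 1
    for t in surface_times:
        j = bisect_left(frame_times, t)  # first frame at or after t
        if j == 0:
            indices.append(0)
        elif j > last:
            indices.append(last)
        else:
            before, after = frame_times[j - 1], frame_times[j]
            indices.append(j - 1 if t - before <= after - t else j)
    return indices


# Hypothetical timestamps in seconds
frames = [0.0, 1.0, 2.0, 3.0]
samples = [0.4, 1.6, 2.5, 9.0, -1.0]
matched = nearest_frame_indices(samples, frames)
```

Each optical sample is paired with exactly one fluoroscopic frame, so the paired dataset has the length of the optical stream.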

Figure 1 The schematic diagram of a Varian Edge linear accelerator with kV fluoroscopic imaging and OSMS. AP, anterior-posterior; RL, right-left; SI, superior-inferior; kV, kilovolt; OSMS, optical surface monitoring system.

Table 1

Parameters of the optical information of the body surface

Parameters | Implication
Translation (mm) | The overall distance (distance between the current surface and the reference surface)
D.VRT (mm) | The distance in the AP direction, to move the current surface back to the reference surface
D.LNG (mm) | The distance in the SI direction, to move the current surface back to the reference surface
D.LAT (mm) | The distance in the RL direction, to move the current surface back to the reference surface
D.Rtn (deg) | Rotation angle in the AP direction, to move the current surface back to the reference surface
D.Roll (deg) | Rotation angle in the SI direction, to move the current surface back to the reference surface
D.Pitch (deg) | Rotation angle in the RL direction, to move the current surface back to the reference surface
D.Amp (mm) | Distance measured perpendicular to the average direction of the reference surface

D.VRT, vertical distance; AP, anterior-posterior; D.LNG, longitudinal distance; SI, superior-inferior; D.LAT, lateral distance; RL, right-left; D.Rtn, degree of rotation; D.Amp, amplitude distance.

In this study, the diaphragm motion was represented by its apex. The diaphragm apex was detected from the 2D fluoroscopic images to acquire the location in the SI and AP directions. As Schwarz et al. (29) stated, the motion of the diaphragm apex in the RL direction was only 0.8–1.6 mm. Therefore, movement in the RL direction was not considered in this study. An experienced radiation oncologist was asked to manually delineate the apex of the right diaphragm in all fluoroscopic images from the datasets of 7 patients to provide the ground truth for the diaphragm apex location. The 2D coordinates of these points were used to describe the motion information of the right diaphragm. The study was approved by the Human Research Ethics Committee of The Second Affiliated Hospital of Guangzhou University of Chinese Medicine (No. BE2021-141-01) and informed consent was provided by all participants. The study was conducted in accordance with the Declaration of Helsinki (as revised in 2013).

Automatic detection of the diaphragm apex

In order to construct the internal/external correlation model online, we proposed a deep learning framework to detect the diaphragm apex automatically in fluoroscopic images without prior knowledge or additional learning time. We adopted the network structure proposed by Shen et al. (30), with an input image size of 125×125×3 pixels, mean square error (MSE) as the loss function, and a learning rate of 0.001. Our algorithm was implemented in PyTorch (https://pytorch.org/). All experiments were carried out on a computer with an Intel Core i9-9900X CPU at 3.6 GHz and 2 NVIDIA RTX 2080 Ti 11 GB GPUs.

In this work, a CNN was first trained on the ImageNet datasets (http://image-net.org) of the ImageNet Large-Scale Visual Recognition Challenge (ILSVRC; 2017). The pre-trained CNN was then applied to the X-ray fluoroscopic images through an image-enhancement operation. We used fluoroscopic images of 7 patients for testing, and approximately 320–340 fluoroscopic images were acquired for each patient. We performed power-law transform and grayscale inversion for the fluoroscopic images and fed the enhanced image directly into the network. Figure 2 contains 2 parts: Figure 2A,2B. The flow chart of the automatic detection of the diaphragm apex is shown in Figure 2A. Figure 2B shows the flow chart of the construction of the internal/external correlation model. We reformulated the DCF as a differentiable neural network layer (30,31) and connected it with the FCN to develop an end-to-end network. We took the first frame of the video stream as the template patch, z, where the tracking target in the diaphragm boundary was manually selected. The template patch, z, was inputted to the feature extraction function ϕ(·) to get the feature map ϕ(z), then ϕ(z) was inputted to the DCF layer, as described by:

$$\epsilon = \lVert w \ast \phi(z) - y \rVert^{2} \tag{1}$$

Figure 2 The flowchart of the construction of the correlation model, (A) the framework of automatic diaphragm apex detection in the fluoroscopic images and (B) the internal/external correlation model. LRN, local response normalization; DCF, discriminative correlation filter; OSMS, optical surface monitoring system; kV, kilovolt; LR, linear regression; M1, correlation model 1; M2, correlation model 2.

For the template patch z, the DCF layer finds the parameters of the desired filter w by minimizing ϵ, where ϵ is a ridge regression objective, y denotes the label of the template patch z, and * denotes circular correlation. The label y is generated from a Gaussian distribution whose bandwidth is a tunable hyperparameter; we used 0.50. Solving this ridge regression problem yields the filter parameters. When tracking the target in subsequent frames, the search patch x is passed through the same feature extraction function ϕ(·), and the resulting ϕ(x) is inputted to the DCF layer. For the search patch x, the DCF layer applies w to ϕ(x) to obtain a response map of x, as described in Eq. [2]:

$$\mathrm{Response}_{x} = w \ast \phi(x) \tag{2}$$

Finally, we obtained the response map, $\mathrm{Response}_{x}$, whose peak gives the location of the tracking target. We used the FCN as the feature extraction function ϕ(·), which places no restriction on the size of the input image. However, we found that tracking was more stable with an image size of 125×125×3, in which the 3 channels contain identical copies of the single-channel grayscale image, so we applied the 125×125×3 pixel size uniformly in our experiments. In this way, a tracker model was developed which could detect the diaphragm apex automatically on pre-treatment kV X-ray fluoroscopic images of the liver tumor patients once the point of interest on the diaphragm boundary had been manually selected in the first frame. The automatic tracking with the proposed algorithm was compared with the manual delineation by the radiation oncologist to evaluate the accuracy of this model.
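
To make the template-then-search procedure concrete, the following is a minimal single-channel sketch of a correlation filter solved in closed form in the Fourier domain (in the style of MOSSE-type filters). It is not the FCN-DCF network used in this study: raw pixel values stand in for the learned features ϕ(·), and the regularization constant `lam`, the interpretation of the bandwidth `sigma` in pixels, and all function names are assumptions made for illustration.

```python
import numpy as np


def gaussian_label(shape, sigma=0.5):
    """Gaussian-shaped label y, peaked at the patch center (sigma in pixels, assumed)."""
    h, w = shape
    ys, xs = np.mgrid[0:h, 0:w]
    cy, cx = h // 2, w // 2
    return np.exp(-((ys - cy) ** 2 + (xs - cx) ** 2) / (2.0 * sigma ** 2))


def train_dcf(feat_z, y, lam=1e-3):
    """Closed-form ridge-regression filter in the Fourier domain.

    feat_z: 2D template features; y: 2D label; lam: assumed regularizer.
    """
    Z = np.fft.fft2(feat_z)
    Y = np.fft.fft2(y)
    return (Y * np.conj(Z)) / (Z * np.conj(Z) + lam)


def respond(w_hat, feat_x):
    """Response map obtained by applying the filter to the search features."""
    return np.real(np.fft.ifft2(w_hat * np.fft.fft2(feat_x)))


def peak_offset(response):
    """Displacement (rows, cols) of the response peak from the patch center."""
    h, w = response.shape
    py, px = np.unravel_index(np.argmax(response), response.shape)
    return int(py) - h // 2, int(px) - w // 2


# Synthetic check: a circularly shifted copy of the template should move the peak.
rng = np.random.default_rng(0)
template = rng.standard_normal((32, 32))
w_hat = train_dcf(template, gaussian_label(template.shape))
search = np.roll(template, shift=(3, -2), axis=(0, 1))
offset = peak_offset(respond(w_hat, search))
```

In this toy setting the peak displacement equals the shift applied to the template, which is the property the tracker relies on when locating the diaphragm apex in each new frame.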

Internal and external correlation models

The correlation model was trained by offline learning with pre-treatment data and used in online tumor tracking during the subsequent treatment delivery, as shown in Figure 2B. The gray triangle region in Figure 2B represents the monitoring ROI of OSMS. For the training stage, the acquired fluoroscopic images were fed into the trained diaphragm apex detection model to obtain the motion of the diaphragm apex in the SI and AP directions. Then, the 8 extracted body surface motion parameters and the diaphragm apex movement information of each patient were fed into the LR model for establishing patient-specific correlation models. In the testing stage, the 8 extracted body surface motion parameters were fed into the trained correlation models to acquire the diaphragm apex motion in the SI and AP directions. The formula of the LR model was defined as follows:

$$y_t = \beta_0 + \beta_1 x_{1,t} + \beta_2 x_{2,t} + \cdots + \beta_8 x_{8,t} + \varepsilon_t \tag{3}$$

where $y_t$ is the predicted variable, which denotes the location of the diaphragm apex; $x_{1,t}, \ldots, x_{8,t}$ are the predictor variables, which denote the parameters Translation, D.VRT, D.LNG, D.LAT, D.Rtn, D.Roll, D.Pitch, and D.Amp, respectively; $\beta_0$ denotes the intercept of the regression line; $\beta_1, \ldots, \beta_8$ measure the degree of influence of each predictor variable on the predicted variable while all other predictor variables are held constant; and $\varepsilon_t$ is the residual error term. These coefficients were optimized with the least squares method.
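
As an illustration of the least-squares fit described above, the following minimal sketch estimates the intercept and 8 coefficients from synchronized samples. The function names are hypothetical, the data are synthetic, and the sketch is not the implementation used in this study; it only shows one standard way to solve the same regression problem.

```python
import numpy as np


def fit_linear_model(X, y):
    """Least-squares fit of y = b0 + X @ b.

    X: (n, 8) body surface parameters; y: (n,) or (n, 2) apex positions.
    Returns beta with the intercept in the first row.
    """
    A = np.column_stack([np.ones(len(X)), X])  # prepend intercept column
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return beta


def predict(beta, X):
    """Predict apex positions from surface parameters with fitted coefficients."""
    A = np.column_stack([np.ones(len(X)), X])
    return A @ beta


# Synthetic, noise-free example: the fit should recover the known coefficients.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 8))
beta_true = np.arange(9, dtype=float)
y_demo = beta_true[0] + X_demo @ beta_true[1:]
beta_hat = fit_linear_model(X_demo, y_demo)
```

Passing a two-column `y` (for example, SI and AP positions) fits both directions at once, which is equivalent to fitting one model per direction.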

Experiment setup

In this study, 2 correlation models (M1, M2) were constructed with LR based on the data of a single patient in the first treatment fraction and the next fraction, respectively, as shown in Figure 2B. The datasets of each patient contain fluoroscopic images and optical surface information collected synchronously for 50 seconds. The correlation model was built for each patient as an individual model. M1 was trained with data from the first 30 seconds of the datasets and tested with data from the following 20 seconds of the datasets in the first treatment fraction to evaluate the intra-fractional prediction accuracy. We constructed 3 variants of M1 (M1a, M1b, M1c) based on automatic detection of the diaphragm apex and the 8D body surface parameters, manual detection of the diaphragm apex and the 8D body surface parameters, and manual detection of the diaphragm apex and the 1D body surface parameter (D.VRT), respectively. The prediction accuracy of M1a was compared with that of M1b to verify whether the automatically detected location could be used for subsequent model construction. The prediction accuracy of M1b was compared with that of M1c to evaluate the advantage of the OSMS compared with the 1D optical marker.

M2 was constructed with data from the first 30 seconds of the datasets in the next treatment fraction of 4 patients. The motion trajectory of the diaphragm apex during the following 20 seconds in the next fraction was predicted with M1a and M2, respectively, to evaluate the inter-fractional prediction accuracy. Only 4 of the 7 patients were tested, in order to avoid additional radiation exposure. Both models were trained based on the internal diaphragm apex position acquired by automatic detection and the 8D body surface parameters. The prediction errors of the 2 models were compared to analyze whether the correlation model needed to be re-established for inter-fractional motion management.
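
The 30-second training window and 20-second testing window can be expressed as a simple time-based split. The sketch below assumes timestamps in seconds and uses a hypothetical helper name; it is intended only to clarify the protocol, not to reproduce the original analysis code.

```python
def split_by_time(times, train_seconds=30.0):
    """Split sample indices into a training window and a following test window.

    times: acquisition times in seconds, in ascending order (assumed).
    Samples earlier than train_seconds after the first sample go to training.
    """
    t0 = times[0]
    train_idx = [i for i, t in enumerate(times) if t - t0 < train_seconds]
    test_idx = [i for i, t in enumerate(times) if t - t0 >= train_seconds]
    return train_idx, test_idx


# Hypothetical 50-second acquisition sampled every 0.1 seconds
times = [i / 10 for i in range(500)]
train_idx, test_idx = split_by_time(times)
```

Under this protocol, M1 would be fitted on the training indices of the first fraction and M2 on those of the next fraction, with both evaluated on the test indices of the next fraction.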

Evaluation metrics

To evaluate the difference between the prediction and manually identified diaphragm apex position, the mean absolute error (MAE), RMSE, and Euclidean distance were calculated between the prediction location and the actual position of the diaphragm apex. The Euclidean distance is defined as the straight-line distance, d, between the prediction location and the ground truth. The formulas for these metrics were defined as follows:

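$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n} \lvert p_i - t_i \rvert \tag{4}$$

$$\mathrm{RMSE} = \sqrt{\frac{\sum_{i=1}^{n} (p_i - t_i)^{2}}{n}} \tag{5}$$

$$d(P,T) = \sqrt{(P_1 - T_1)^{2} + (P_2 - T_2)^{2}} \tag{6}$$

where $p_i$ is the i-th prediction value, $t_i$ is the corresponding true value, n is the total number of values, P = (P1, P2) is the prediction location, and T = (T1, T2) is the true location.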
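
The three metrics can be computed directly from their definitions, as in the following minimal sketch. The function names and the sample values are illustrative only.

```python
import math


def mae(pred, true):
    """Mean absolute error between paired predictions and true values."""
    return sum(abs(p - t) for p, t in zip(pred, true)) / len(pred)


def rmse(pred, true):
    """Root mean square error between paired predictions and true values."""
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, true)) / len(pred))


def euclidean(P, T):
    """Straight-line distance between a predicted and a true 2D location."""
    return math.hypot(P[0] - T[0], P[1] - T[1])


# Illustrative values (mm)
pred = [1.0, 2.0, 3.0]
true = [1.0, 4.0, 0.0]
```

Because the RMSE squares each error before averaging, it is never smaller than the MAE for the same data and is more sensitive to occasional large deviations.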

Since the signals of external body surface motion and internal diaphragm motion were recorded synchronously, the 2 signals could be analyzed on the same time scale. The relative diaphragm location instead of absolute location was used in this study.

Statistical analysis

MATLAB (version 2018b; MathWorks Inc., Natick, MA, USA) was used for the statistical analysis. T-tests were performed to compare M1a with M1b and M1b with M1c, and to test the statistical significance of the difference between M1a and M2. All P values were 2-sided, and P<0.05 was considered significant.
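
As a worked illustration of such a comparison, the sketch below computes a paired t statistic for the AP-direction MAE values of M1a and M2 reported in Table 5. The text does not state whether a paired or an unpaired test was used, so the paired form is an assumption, and the statistic is compared against the standard two-sided 5% critical value for 3 degrees of freedom rather than computing an exact P value.

```python
import math
import statistics


def paired_t_statistic(a, b):
    """t statistic for the mean of paired differences a - b."""
    diffs = [x - y for x, y in zip(a, b)]
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample SD, divisor n - 1
    return mean_diff / (sd_diff / math.sqrt(len(diffs)))


# AP-direction MAE (mm) for patients 4 to 7, taken from Table 5
m1a_ap_mae = [3.51, 2.95, 2.49, 1.58]
m2_ap_mae = [1.87, 0.64, 1.60, 1.02]

t_stat = paired_t_statistic(m1a_ap_mae, m2_ap_mae)
T_CRIT_DF3 = 3.182  # two-sided 5% critical value, Student's t, 3 degrees of freedom
significant = abs(t_stat) > T_CRIT_DF3
```

Under this assumption the statistic is about 3.45, which exceeds the critical value and is therefore consistent with the reported P<0.05 for this metric.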


Results

Automatic detection of the diaphragm apex

An example of 1 of the 7 patients with automatic versus manual detection of the diaphragm apex motion trajectory in the SI and AP directions is illustrated in Figure 3. The red lines denote the automatic detection position, while the black lines denote the manually marked diaphragm apex position. The results of automatic tracking were consistent with those of manual detection. The errors between the automatic detection with FCN and manual detection for 7 patients are listed in Table 2. The average MAE and RMSE of automatic detection were 0.69±0.21 and 0.97±0.40 mm in the AP direction and 1.02±0.28 and 1.28±0.37 mm in the SI direction, respectively. The mean Euclidean distance between the automatic detection location and the ground truth location for each patient ranged from 1.05 to 1.89 mm.

Figure 3 Automatic detection and manual detection of the diaphragm apex motion trajectories, (A) AP direction, (B) SI direction. AP, anterior-posterior; SI, superior-inferior.

Table 2

The error between the automatic detection location and the manual detection location of the diaphragm apex

Patient no. | AP MAE (mm) | AP RMSE (mm) | SI MAE (mm) | SI RMSE (mm) | Euclidean distance (mm)
1 | 0.87 | 1.74 | 1.42 | 1.83 | 1.89±1.56
2 | 0.58 | 0.77 | 1.13 | 1.29 | 1.38±0.59
3 | 0.80 | 1.07 | 0.69 | 0.86 | 1.20±0.65
4 | 1.02 | 1.27 | 1.14 | 1.49 | 1.72±0.94
5 | 0.51 | 0.69 | 0.82 | 0.99 | 1.05±0.61
6 | 0.69 | 0.79 | 0.66 | 0.84 | 1.05±0.48
7 | 0.36 | 0.46 | 1.31 | 1.69 | 1.41±1.04
Average | 0.69±0.21 | 0.97±0.40 | 1.02±0.28 | 1.28±0.37 |
95% CI | 0.54–0.85 | 0.73–1.33 | 0.81–1.23 | 1.02–1.57 |

The data in the “Average” row and the “Euclidean distance” column are presented as mean ± standard deviation. AP, anterior-posterior; MAE, mean absolute error; RMSE, root mean square error; SI, superior-inferior; CI, confidence interval.
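
The "Average" row can be reproduced from the per-patient values, as in the short sketch below for the MAE columns of Table 2. The reported standard deviations match the population form (divisor n) rather than the sample form (divisor n − 1); this is an inference from the tabulated numbers, not something stated in the text.

```python
import statistics


def mean_sd(values):
    """Mean and population standard deviation (divisor n) of a list of values."""
    return statistics.mean(values), statistics.pstdev(values)


# Per-patient MAE (mm) from Table 2
ap_mae = [0.87, 0.58, 0.80, 1.02, 0.51, 0.69, 0.36]
si_mae = [1.42, 1.13, 0.69, 1.14, 0.82, 0.66, 1.31]

ap_mean, ap_sd = mean_sd(ap_mae)
si_mean, si_sd = mean_sd(si_mae)
```

Rounded to two decimals, these give 0.69±0.21 mm (AP) and 1.02±0.28 mm (SI), in agreement with the table.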

Internal and external correlation models

The regression coefficients of the LR model of 7 patients are summarized in Table 3. The regression coefficients indicate the strength of the relationship between the 8 body surface motion parameters and the manually detected diaphragm apex location. The values of β2 and β6 were relatively large in magnitude in more than 5 patients, which suggests that D.VRT and D.Roll were more strongly correlated with internal motion than the other parameters. The visualization of the regression coefficients is shown in Figure S2.

Table 3

Regression coefficients of model M1b

Direction | Patient no. | β0 | β1 | β2 | β3 | β4 | β5 | β6 | β7 | β8
AP | 1 | 228.60 | −29.59 | −96.74 | −13.98 | 24.26 | −69.26 | 29.26 | 14.98 | 58.14
AP | 2 | 360.85 | −26.88 | −44.93 | 19.55 | 15.27 | −4.88 | −8.16 | 15.07 | 27.99
AP | 3 | 296.49 | −2.52 | 48.29 | −6.37 | 14.08 | 31.26 | 26.59 | 8.90 | −40.50
AP | 4 | 379.60 | −26.49 | −37.04 | 5.70 | −7.47 | −9.82 | 35.23 | 50.28 | 24.68
AP | 5 | 392.48 | 0.14 | 15.57 | −1.62 | −10.44 | −5.55 | 14.22 | −11.68 | 2.62
AP | 6 | 293.64 | −3.69 | 4.98 | −4.75 | −14.67 | 28.59 | −8.47 | 7.94 | 3.71
AP | 7 | 214.97 | 3.10 | 14.03 | 8.68 | −2.05 | 5.42 | −19.41 | −0.72 | −1.74
SI | 1 | 551.83 | 23.44 | 114.98 | 3.29 | −3.53 | 41.61 | −112.88 | −62.15 | −51.26
SI | 2 | 683.33 | −36.52 | 38.85 | 38.73 | −0.07 | 81.68 | −163.62 | 0.53 | 20.75
SI | 3 | 464.51 | −5.98 | 126.97 | −17.07 | 38.82 | 88.68 | 54.16 | 18.02 | −105.66
SI | 4 | 629.99 | −53.83 | −93.41 | −5.65 | −42.51 | 49.65 | 129.95 | 101.08 | 82.92
SI | 5 | 647.28 | −1.83 | 27.62 | 0.14 | −12.02 | −13.23 | 14.55 | −24.33 | 2.34
SI | 6 | 676.54 | −6.56 | 34.89 | −32.09 | −40.17 | 89.50 | 27.38 | 20.99 | −19.57
SI | 7 | 609.84 | 12.02 | 40.88 | 19.85 | −2.29 | 7.44 | −62.14 | −9.45 | −6.04

AP, anterior-posterior; SI, superior-inferior.

Evaluation of the intra-fractional prediction accuracy

The actual and prediction trajectories of the diaphragm apex of patient 1 in the first fraction are shown in Figure 4. Figure 4A,4B present the trajectories of the diaphragm apex in the AP and SI directions, respectively. The red line denotes the prediction results of M1b based on the internal apex by manual detection, and the black line denotes the actual motion trajectories of the diaphragm apex. It can be seen that the prediction trajectories match well with the actual ones, especially in the SI direction. The movement of the diaphragm apex in the SI direction had a certain regularity, and the prediction accuracy in the SI direction was higher than that in the AP direction. Some variations were observed between the prediction results and the ground truth in the AP direction. Most of the large errors appeared near the peak and trough positions of the trajectory in the AP direction. This may be due to the larger baseline shift of the diaphragm moving in the AP direction.

Figure 4 The actual and prediction diaphragm apex trajectories obtained by M1b, (A) AP direction, (B) SI direction. AP, anterior-posterior; SI, superior-inferior.

The prediction errors with M1a and M1b in the first fraction for 7 patients are summarized in Table 4. The prediction accuracy of M1b was comparable with that of M1a. For M1a, the average MAE and RMSE were 3.12±0.80 and 3.82±0.98 mm in the SI direction and 1.38±0.24 and 1.74±0.32 mm in the AP direction, respectively. For M1b, the corresponding values were 3.09±0.80 and 3.75±1.01 mm in the SI direction and 1.34±0.24 and 1.69±0.29 mm in the AP direction. For M1c, they were 3.78±0.42 and 4.73±0.47 mm in the SI direction and 2.22±0.71 and 2.75±0.90 mm in the AP direction, as shown in Figure 5.

Table 4

Intra-fractional errors (prediction-actual position) with M1a and M1b

Patient no. | M1a AP MAE (mm) | M1a AP RMSE (mm) | M1a SI MAE (mm) | M1a SI RMSE (mm) | M1b AP MAE (mm) | M1b AP RMSE (mm) | M1b SI MAE (mm) | M1b SI RMSE (mm)
1 | 1.49 | 1.98 | 2.16 | 2.67 | 1.36 | 1.90 | 2.13 | 2.71
2 | 0.85 | 1.04 | 2.18 | 2.76 | 0.80 | 1.04 | 2.13 | 2.63
3 | 1.62 | 2.03 | 2.44 | 3.03 | 1.63 | 1.97 | 2.47 | 3.08
4 | 1.32 | 1.65 | 3.84 | 4.69 | 1.27 | 1.56 | 3.69 | 4.48
5 | 1.58 | 1.98 | 4.35 | 5.44 | 1.52 | 1.82 | 4.35 | 5.48
6 | 1.38 | 1.84 | 3.56 | 4.36 | 1.41 | 1.86 | 3.62 | 4.52
7 | 1.40 | 1.68 | 3.28 | 3.77 | 1.36 | 1.69 | 3.21 | 3.33
Average | 1.38±0.24 | 1.74±0.32 | 3.12±0.80 | 3.82±0.98 | 1.34±0.24 | 1.69±0.29 | 3.09±0.80 | 3.75±1.01
95% CI | 1.13–1.50 | 1.40–1.91 | 2.53–3.71 | 3.15–4.60 | 1.09–1.47 | 1.37–1.85 | 2.51–3.70 | 3.10–4.61

The data in the “Average” row are presented as mean ± standard deviation. AP, anterior-posterior; MAE, mean absolute error; RMSE, root mean square error; SI, superior-inferior; CI, confidence interval.

Figure 5 The comparison of prediction errors, (A) comparison between M1a and M1b, (B) comparison between M1b and M1c. AP, anterior-posterior; SI, superior-inferior; MAE, mean absolute error; RMSE, root mean square error.

Figure 5 compares the prediction errors of M1a and M1b (Figure 5A) and of M1b and M1c (Figure 5B). The differences between M1b and M1c were statistically significant (all P<0.05), with M1b performing better than M1c. There was no significant difference between M1a and M1b (all P>0.05).

Evaluation of the inter-fractional prediction accuracy

Figure 6 shows the actual and prediction diaphragm apex trajectories obtained by M1a and M2 of patient 7 in the next fraction. The black lines denote the ground truth of the motion trajectory. The blue dotted line represents the prediction results for M1a. The red solid line represents the prediction results for M2. It can be seen that the prediction accuracy of M2 constructed with the data of the current fraction was higher than that of M1a constructed with the first fractional data. Most of the significant differences between the prediction results and ground truth occurred near the peaks and troughs of the motion trajectories, with the position error larger than 4 mm.

Figure 6 The actual and prediction diaphragm apex trajectories obtained by M1a and M2, (A) AP direction, (B) SI direction. AP, anterior-posterior; SI, superior-inferior.

The prediction errors with M1a and M2 for the next fraction in the AP and SI directions are summarized in Table 5. The average MAE and RMSE with M1a versus M2 in the AP direction for 4 patients were 2.63±0.71 versus 1.28±0.48 mm and 3.26±0.90 versus 1.61±0.60 mm, respectively. The average MAE and RMSE with M1a versus M2 in the SI direction for 4 patients were 5.84±1.22 versus 3.37±0.43 mm and 7.22±1.45 versus 4.07±0.54 mm, respectively. The prediction errors of M2 were significantly lower than those of M1a (all P<0.05), as shown in Figure 7.

Table 5

Inter-fractional errors (prediction-actual position)

| Patient no. | M1a AP MAE (mm) | M1a AP RMSE (mm) | M1a SI MAE (mm) | M1a SI RMSE (mm) | M2 AP MAE (mm) | M2 AP RMSE (mm) | M2 SI MAE (mm) | M2 SI RMSE (mm) |
|---|---|---|---|---|---|---|---|---|
| 4 | 3.51 | 4.52 | 5.37 | 7.04 | 1.87 | 2.33 | 3.46 | 4.15 |
| 5 | 2.95 | 3.45 | 6.75 | 8.17 | 0.64 | 0.80 | 3.86 | 4.76 |
| 6 | 2.49 | 3.04 | 7.16 | 8.73 | 1.60 | 2.02 | 3.48 | 4.13 |
| 7 | 1.58 | 2.01 | 4.07 | 4.95 | 1.02 | 1.28 | 2.67 | 3.25 |
| Average | 2.63±0.71 | 3.26±0.90 | 5.84±1.22 | 7.22±1.45 | 1.28±0.48 | 1.61±0.60 | 3.37±0.43 | 4.07±0.54 |
| 95% CI | 1.81–3.23 | 2.27–3.99 | 4.40–6.85 | 5.47–8.31 | 0.74–1.73 | 1.04–2.18 | 2.87–3.67 | 3.47–4.45 |

The data in the “Average” row are presented as mean ± standard deviation. AP, anterior-posterior; MAE, mean absolute error; RMSE, root mean square error; SI, superior-inferior; CI, confidence interval.
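As a reading aid, the "Average" row of Table 5 can be recomputed from the per-patient values. The sketch below is not part of the original analysis; the dictionary layout and the helper name `summarize` are illustrative choices. It uses the population form of the standard deviation (`statistics.pstdev`), which is an observation from the tabulated numbers rather than something stated in the text.

```python
from statistics import mean, pstdev

# MAE/RMSE values (mm) transcribed from Table 5, patients 4-7.
table5 = {
    ("M1a", "AP", "MAE"): [3.51, 2.95, 2.49, 1.58],
    ("M1a", "AP", "RMSE"): [4.52, 3.45, 3.04, 2.01],
    ("M1a", "SI", "MAE"): [5.37, 6.75, 7.16, 4.07],
    ("M1a", "SI", "RMSE"): [7.04, 8.17, 8.73, 4.95],
    ("M2", "AP", "MAE"): [1.87, 0.64, 1.60, 1.02],
    ("M2", "AP", "RMSE"): [2.33, 0.80, 2.02, 1.28],
    ("M2", "SI", "MAE"): [3.46, 3.86, 3.48, 2.67],
    ("M2", "SI", "RMSE"): [4.15, 4.76, 4.13, 3.25],
}


def summarize(values):
    """Return (mean, population standard deviation), rounded to 2 decimals."""
    return round(mean(values), 2), round(pstdev(values), 2)


summary = {key: summarize(vals) for key, vals in table5.items()}

for (model, direction, metric), (avg, sd) in summary.items():
    print(f"{model} {direction} {metric}: {avg:.2f}±{sd:.2f} mm")
```

The printed values should agree with the "Average" row up to rounding of the last digit; with only 4 patients, the sample form of the standard deviation would give noticeably larger ± values.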

Figure 7 The differences of prediction errors between M1a and M2. AP, anterior-posterior; SI, superior-inferior; MAE, mean absolute error; RMSE, root mean square error.

Discussion

Tumor tracking is one of the most important requirements in precise radiotherapy for liver tumors. However, non-invasive, real-time tracking of liver tumors is difficult to achieve directly. Previous studies have demonstrated that the diaphragm can be used as a surrogate to track liver tumors near the diaphragm (12,13). In this work, we proposed an automatic, online diaphragm-motion prediction framework based on machine learning applied to optical body surface monitoring. The method avoids invasive procedures and can be used for real-time tracking of tumors near the diaphragm in liver tumor radiotherapy. The diaphragm apex position in fluoroscopic images can be automatically detected with an FCN combined with a DCF (FCN-DCF). The correlation model, built offline from pre-treatment data, could be reliably used online to predict diaphragm motion from body surface information during subsequent intra-fraction treatment. Omitting fluoroscopy during tumor tracking can significantly reduce the additional radiation dose to patients.

In our study, the DCF was used as a differentiable neural network layer and connected with the FCN to develop the tracker model. The FCN-DCF performs template matching: it searches the search patch for the content most similar to the template patch. In the framework, the FCN-DCF network was trained on ImageNet datasets to obtain a feature extraction model with better generalization. The model could detect the diaphragm apex automatically on X-ray fluoroscopic images once the point of interest had been manually selected in the first frame. The average MAE of automatic detection was 0.69±0.21 mm in the AP direction and 1.02±0.28 mm in the SI direction. The detection accuracy of our model was consistent with that reported by Keatley et al. (32). Target features learned under noisy conditions may slightly decrease the accuracy, since background samples may be incorrectly identified as the diaphragm apex.

Our results showed that the diaphragm motion trajectory predicted from the optical body surface was consistent with the manually delineated diaphragm motion trajectory. Most of the large errors occurred near the peaks and troughs of the motion trajectory, whereas the error in the regular breathing interval in the middle of the trajectory was small, as shown by the red line in Figure 6. The higher prediction accuracy during regular breathing indicated that irregular respiration at the peaks and troughs of different cycles led to the large prediction errors. The accuracy of the correlation model may also be affected by changes in breathing pattern within the acquisition time of a single patient’s training data. The motion amplitude of the diaphragm apex was about 30 mm in the SI direction and about 10 mm in the AP direction. The average MAE and RMSE of M1b were 3.09±0.80 and 3.75±1.01 mm in the SI direction and 1.34±0.24 and 1.69±0.29 mm in the AP direction, respectively. The relative accuracy improvement for the SI direction was much greater than that for the AP direction. The movement in the SI direction was more closely related to the body surface than that in the AP direction. Larger training datasets gave more robust predictions, but the prolonged fluoroscopy time they require increases the exposure dose for patients.
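For readers less familiar with the two error metrics quoted above, the following minimal sketch shows how MAE and RMSE are computed from a predicted and an actual trajectory. The function names and the four sample positions are hypothetical and are not patient data.

```python
import math


def mae(predicted, actual):
    """Mean absolute error between two equal-length position sequences (mm)."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


def rmse(predicted, actual):
    """Root mean square error between two equal-length position sequences (mm)."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("sequences must be non-empty and of equal length")
    squared = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return math.sqrt(squared / len(actual))


# Hypothetical SI positions of the diaphragm apex (mm), for illustration only.
actual_si = [0.0, 10.0, 20.0, 30.0]
predicted_si = [1.0, 8.0, 23.0, 30.0]

print(f"MAE  = {mae(predicted_si, actual_si):.2f} mm")
print(f"RMSE = {rmse(predicted_si, actual_si):.2f} mm")
```

Because RMSE squares each error before averaging, it is never smaller than MAE and weights occasional large deviations, such as those near the peaks and troughs of the trajectory, more heavily.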

Compared to the previous study utilizing 1D body surface signals (24), the correlation model in the current study used a more comprehensive 3D body surface movement with 8 parameters to accurately predict the movement of the diaphragm and did not require imaging to update the model during the monitoring process. The prediction errors of M1b, based on 8D body surface parameters, were significantly smaller than those of M1c, based on the single 1D body surface parameter D.VRT, especially in the AP direction. The 8 surface parameters reflect the actual movement of the body surface by capturing the rigid transformation of a reference surface. More comprehensive information about the movement of the body surface was not available because the optical system could not track each point on the body surface. Real-time acquisition of the movement of the patient’s complete external surface during radiation therapy would reduce prediction variability and related errors. This can be achieved by selecting the ROI on the body surface that is most relevant to the diaphragm during motion synchronization and modeling. Among the body surface parameters, D.VRT and D.Roll correlated more strongly with internal motion than the other parameters, possibly because of the greater motion amplitude of the body surface in the VRT and Roll directions. Generally, a higher frame rate ensures timely capture of body surface changes over time, and training a prediction model on frequently updated input data may yield higher prediction accuracy. It is recommended to use data with a sampling interval of less than 1 second to build a prediction model (33). The frame rate of the OSMS was 3–6 frames/s, corresponding to a sampling interval of 0.16–0.33 seconds, and the frame rate of the fluoroscopy in this study was 15 frames/s.
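The benefit of using several surface parameters instead of one can be illustrated with a small linear regression sketch. This is not the study's implementation: the data are synthetic and noise-free, two hypothetical parameters (`vrt`, `roll`) stand in for the 8 surface parameters, and the 5 frames/s rate, the 4-second breathing period, and the coefficients are assumptions chosen for illustration. Only the 30-second training and 20-second test split follows the text.

```python
import math


def fit_linear(X, y):
    """Ordinary least squares with intercept, solved via the normal equations.

    X is a list of feature rows, y a list of targets.
    Returns [intercept, w1, w2, ...].
    """
    rows = [[1.0] + list(r) for r in X]
    n = len(rows[0])
    # Augmented normal-equation matrix [A^T A | A^T y].
    M = [[sum(r[i] * r[j] for r in rows) for j in range(n)]
         + [sum(r[i] * t for r, t in zip(rows, y))] for i in range(n)]
    # Gaussian elimination with partial pivoting.
    for c in range(n):
        p = max(range(c, n), key=lambda k: abs(M[k][c]))
        M[c], M[p] = M[p], M[c]
        for k in range(c + 1, n):
            f = M[k][c] / M[c][c]
            for j in range(c, n + 1):
                M[k][j] -= f * M[c][j]
    # Back substitution.
    w = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * w[j] for j in range(i + 1, n))
        w[i] = (M[i][n] - tail) / M[i][i]
    return w


def predict(w, X):
    return [w[0] + sum(wi * xi for wi, xi in zip(w[1:], r)) for r in X]


def mae(pred, actual):
    return sum(abs(p - a) for p, a in zip(pred, actual)) / len(actual)


# Synthetic 50 s recording at an assumed 5 frames/s (not patient data).
fps = 5
t = [i / fps for i in range(50 * fps)]
vrt = [math.sin(2 * math.pi * s / 4.0) for s in t]          # stand-in for D.VRT
roll = [0.3 * math.cos(2 * math.pi * s / 4.0) for s in t]   # stand-in for D.Roll
si = [2.0 + 12.0 * v + 10.0 * r for v, r in zip(vrt, roll)]  # internal SI (mm)

split = 30 * fps  # train on the first 30 s, test on the following 20 s
X_multi = [[v, r] for v, r in zip(vrt, roll)]
X_single = [[v] for v in vrt]

w_multi = fit_linear(X_multi[:split], si[:split])
w_single = fit_linear(X_single[:split], si[:split])

mae_multi = mae(predict(w_multi, X_multi[split:]), si[split:])
mae_single = mae(predict(w_single, X_single[split:]), si[split:])
print(f"multi-parameter MAE:  {mae_multi:.3f} mm")
print(f"single-parameter MAE: {mae_single:.3f} mm")
```

In this toy setting the single-parameter model cannot represent the phase-shifted component carried by the second parameter, so its test error stays well above that of the multi-parameter model. Real surface signals are noisy and partly correlated, so the gap in practice would depend on the patient and on the parameters chosen.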

The prediction accuracy of M1b was comparable with that of M1a. This result indicated that the diaphragm apex position detected automatically in fluoroscopic images can be used to construct the internal/external correlation model in real time. To test the reliability and robustness of the correlation model, 2 correlation models (M1a and M2) were constructed based on the data of the first treatment fraction and the next fraction, respectively. The results showed that M1a could predict the trend of diaphragm motion in the next fraction, but its accuracy was significantly lower than that of M2, which was constructed from the data of the current fraction. The prediction results demonstrated that the inter-fractional correlation model was reproducible, indicating that the selected ROI of the body surface could be used as a surrogate for internal diaphragm motion. However, baseline drift of the diaphragm apex or changes in the correlation between internal and external motion across treatment fractions may reduce the prediction accuracy. The average MAE of the inter-fraction prediction with M1a exceeded 5 mm in the SI direction, which could affect clinical application. To obtain higher prediction accuracy, it is necessary to update the internal/external correlation model before each treatment. The prediction accuracy of the correlation model was verified online by comparing the diaphragm apex position predicted from the body surface with the position automatically detected in fluoroscopic images taken before treatment. Once the location of the diaphragm apex was obtained with the automatic detection model, the total training and prediction time of the correlation model was less than 1 ms, which has the potential to meet the clinical requirements of online tracking. The correlation model could be used to guide tumor tracking in real time during subsequent radiotherapy if the prediction accuracy meets the requirements set by the radiation oncologist of the local institution.

The conventional margin between the clinical target volume (CTV) and planning target volume (PTV) was 10 mm because of tumor motion during radiotherapy (25). There is no uniform standard for the accuracy of tumor motion tracking; whether the prediction accuracy meets the requirements is judged by the radiation oncologist based on the protection of organs at risk. In the clinical setting, the tracking error could be added as a margin to the target volume. In our study, the average MAE and RMSE of intra-fraction motion prediction in the SI direction were 3.09 and 3.75 mm, respectively, which are comparable with previous studies (18,24). The error of all other components, including planning, residual setup, and machine quality assurance, was about 3 mm (34). With a tracking error of about 3 mm added to these other components, the CTV-PTV margin could be set to 6 mm. The side effects of radiotherapy would be greatly reduced if the margin were reduced from over 10 mm to 6 mm. The advantage of our method is that it not only ensures tracking accuracy but also realizes non-invasive, online tracking during radiotherapy. Most of the large errors occurred at sudden changes in the transition between exhalation and inhalation. In actual treatment, the gating window of the OSMS holds the beam when the error exceeds the default threshold, minimizing the impact of large errors; if the error is below the threshold, the beam remains on. It is therefore feasible to predict diaphragm motion from the body surface in clinical application.
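The margin estimate and the beam-hold behaviour described above can be sketched in a few lines. This is an illustration only: the function name `beam_enabled`, the 4 mm threshold, and the sample positions are hypothetical, and reading the 6 mm margin as a simple linear sum of the two error components is an assumption about the arithmetic, not a statement of the vendor's gating logic or of a formal margin recipe.

```python
def beam_enabled(predicted_mm, reference_mm, threshold_mm):
    """Return True while the predicted deviation stays within the gating threshold."""
    return abs(predicted_mm - reference_mm) <= threshold_mm


# Hypothetical predicted SI deviations (mm) from a reference position of 0 mm.
predicted_si = [0.5, 2.0, 4.5, 6.2, 3.1, -1.0]
threshold_mm = 4.0  # illustrative threshold, not a value from the study

beam_states = [beam_enabled(p, 0.0, threshold_mm) for p in predicted_si]
duty_cycle = sum(beam_states) / len(beam_states)
print("beam on per frame:", beam_states)
print(f"duty cycle: {duty_cycle:.2f}")

# Margin arithmetic, assuming the two components are added linearly.
tracking_error_mm = 3.09  # average intra-fraction MAE in the SI direction (from the text)
other_errors_mm = 3.0     # planning, residual setup, machine QA (from the text)
margin_mm = tracking_error_mm + other_errors_mm
print(f"estimated CTV-PTV margin: about {margin_mm:.0f} mm")
```

A linear sum is the more conservative choice; combining independent components in quadrature would give a smaller figure, so the recipe used in a given clinic should be confirmed locally.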

There were some limitations to this study: (I) the original point of the diaphragm apex had to be marked manually in the first frame before the apex could be identified and tracked automatically in the other fluoroscopic images; (II) the correlation model was constructed quickly from a minimal amount of input data in order to reduce the exposure dose, although larger training datasets give more robust predictions; and (III) the breathing pattern during model training could be quite different from the breathing during the subsequent treatment. Hence, the correlation model needs to be able to predict irregular breathing accurately. In the future, a method should be developed to identify the point of interest automatically without any manual assistance. More complex and higher-performance correlation models also need to be developed to predict internal motion efficiently and robustly.


Conclusions

Our work presents the first attempt to establish a correlation model with offline learning and online prediction of the diaphragm motion trajectories based on the 3D body surface during radiotherapy. The prediction model is a non-invasive tool to quantify the motion of the diaphragm based on optical surface information. It is necessary to update the correlation model for the current fraction before each treatment.


Acknowledgments

Part of this manuscript was presented as a poster at the 2021 ESTRO annual meeting.

Funding: This work was supported by the Guangzhou Science and Technology Plan (No. 202102010264), Guangdong Provincial Hospital of Chinese Medicine Science and Technology Project (No. ZY2022YL07), Natural Science Foundation of China (No. 61871208), Knowledge Innovation Program of Basic Research Projects of Shenzhen (No. JCY20200109142805928), and the National Key R&D Program of China (No. 2019YFC0119500).


Footnote

Reporting Checklist: The authors have completed the TRIPOD reporting checklist. Available at https://qims.amegroups.com/article/view/10.21037/qims-22-242/rc

Conflicts of Interest: All authors have completed the ICMJE uniform disclosure form (available at https://qims.amegroups.com/article/view/10.21037/qims-22-242/coif). The authors have no conflicts of interest to declare.

Ethical Statement: The authors are accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved. The study was conducted in accordance with the Declaration of Helsinki (as revised in 2013). The study was approved by the Human Research Ethics Committee of The Second Affiliated Hospital of Guangzhou University of Chinese Medicine (No. BE2021-141-01) and informed consent was taken from all the patients.

Open Access Statement: This is an Open Access article distributed in accordance with the Creative Commons Attribution-NonCommercial-NoDerivs 4.0 International License (CC BY-NC-ND 4.0), which permits the non-commercial replication and distribution of the article with the strict proviso that no changes or edits are made and the original work is properly cited (including links to both the formal publication through the relevant DOI and the license). See: https://creativecommons.org/licenses/by-nc-nd/4.0/.


References

  1. Winter JD, Wong R, Swaminath A, Chow T. Accuracy of Robotic Radiosurgical Liver Treatment Throughout the Respiratory Cycle. Int J Radiat Oncol Biol Phys 2015;93:916-24.
  2. Nishioka T, Nishioka S, Kawahara M, Tanaka S, Shirato H, Nishi K, Hiromura T. Synchronous monitoring of external/internal respiratory motion: validity of respiration-gated radiotherapy for liver tumors. Jpn J Radiol 2009;27:285-9.
  3. Torshabi AE, Pella A, Riboldi M, Baroni G. Targeting accuracy in real-time tumor tracking via external surrogates: a comparative study. Technol Cancer Res Treat 2010;9:551-62.
  4. Dai Z, Zhang H, Xie Y, Zhu L, Zhang B, Cai C, Li F, Yang G, Jin H, Wang X. Validation of Geometric and Dosimetric Accuracy of Edge Accelerator Gating With Electromagnetic Tracking: A Phantom Study. IEEE Access 2019;7:127693-702.
  5. Liu M, Cygler JE, Vandervoort E. Geometrical tracking accuracy and appropriate PTV margins for robotic radiosurgery of liver lesions by SBRT. Acta Oncol 2019;58:906-15.
  6. Xie Y, Xing L, Gu J, Liu W. Tissue feature-based intra-fractional motion tracking for stereoscopic x-ray image guided radiotherapy. Phys Med Biol 2013;58:3615-30.
  7. Zhou D, Quan H, Yan D, Chen S, Qin A, Stanhope C, Lachaine M, Liang J. A feasibility study of intrafractional tumor motion estimation based on 4D-CBCT using diaphragm as surrogate. J Appl Clin Med Phys 2018;19:525-31.
  8. Mann P, Witte M, Mercea P, Nill S, Lang C, Karger CP. Feasibility of markerless fluoroscopic real-time tumor detection for adaptive radiotherapy: development and end-to-end testing. Phys Med Biol 2020;65:115002.
  9. Cui Y, Dy JG, Sharp GC, Alexander B, Jiang SB. Multiple template-based fluoroscopic tracking of lung tumor mass without implanted fiducial markers. Phys Med Biol 2007;52:6229-42.
  10. Nguyen K, Haytmyradov M, Mostafavi H, Patel R, Surucu M, Block A, Harkenrider MM, Roeske JC. Evaluation of Radiomics to Predict the Accuracy of Markerless Motion Tracking of Lung Tumors: A Preliminary Study. Front Oncol 2018;8:292.
  11. Wei J, Chao M. A constrained linear regression optimization algorithm for diaphragm motion tracking with cone beam CT projections. Phys Med 2018;46:7-15.
  12. Yang J, Cai J, Wang H, Chang Z, Czito BG, Bashir MR, Palta M, Yin FF. Is diaphragm motion a good surrogate for liver tumor motion? Int J Radiat Oncol Biol Phys 2014;90:952-8.
  13. Cerviño LI, Jiang Y, Sandhu A, Jiang SB. Tumor motion prediction with the diaphragm as a surrogate: a feasibility study. Phys Med Biol 2010;55:N221-9.
  14. Hindley N, Keall P, Booth J, Shieh CC. Real-time direct diaphragm tracking using kV imaging on a standard linear accelerator. Med Phys 2019;46:4481-9.
  15. Hirai R, Sakata Y, Tanizawa A, Mori S. Regression model-based real-time markerless tumor tracking with fluoroscopic images for hepatocellular carcinoma. Phys Med 2020;70:196-205.
  16. Hirai R, Sakata Y, Tanizawa A, Mori S. Real-time tumor tracking using fluoroscopic imaging with deep neural network analysis. Phys Med 2019;59:22-9.
  17. Mylonas A, Keall PJ, Booth JT, Shieh CC, Eade T, Poulsen PR, Nguyen DT. A deep learning framework for automatic detection of arbitrarily shaped fiducial markers in intrafraction fluoroscopic images. Med Phys 2019;46:2286-97.
  18. Baroni G, Riboldi M, Spadea MF, Tagaste B, Garibaldi C, Orecchia R, Pedotti A. Integration of Enhanced Optical Tracking Techniques and Imaging in IGRT. J Radiat Res 2007;48 Suppl A:A61-74.
  19. Nankali S, Torshabi AE, Miandoab PS, Baghizadeh A. Optimum location of external markers using feature selection algorithms for real-time tumor tracking in external-beam radiotherapy: a virtual phantom study. J Appl Clin Med Phys 2016;17:221-33.
  20. Gierga DP, Brewer J, Sharp GC, Betke M, Willett CG, Chen GT. The correlation between internal and external markers for abdominal tumors: implications for respiratory gating. Int J Radiat Oncol Biol Phys 2005;61:1551-8.
  21. Seregni M, Cerveri P, Riboldi M, Pella A, Baroni G. Robustness of external/internal correlation models for real-time tumor tracking to breathing motion variations. Phys Med Biol 2012;57:7053-74.
  22. Cerviño LI, Chao AK, Sandhu A, Jiang SB. The diaphragm as an anatomic surrogate for lung tumor motion. Phys Med Biol 2009;54:3529-41.
  23. Dong B, Graves YJ, Jia X, Jiang SB. Optimal surface marker locations for tumor motion estimation in lung cancer radiotherapy. Phys Med Biol 2012;57:8201-15.
  24. Bertholet J, Toftegaard J, Hansen R, Worm ES, Wan H, Parikh PJ, Weber B, Høyer M, Poulsen PR. Automatic online and real-time tumour motion monitoring during stereotactic liver treatments on a conventional linac by combined optical and sparse monoscopic imaging with kilovoltage x-rays (COSMIK). Phys Med Biol 2018;63:055012.
  25. Vedam SS, Kini VR, Keall PJ, Ramakrishnan V, Mostafavi H, Mohan R. Quantifying the predictability of diaphragm motion during respiration with a noninvasive external marker. Med Phys 2003;30:505-13.
  26. Glide-Hurst CK, Ionascu D, Berbeco R, Yan D. Coupling surface cameras with on-board fluoroscopy: a feasibility study. Med Phys 2011;38:2937-47.
  27. Fayad H, Pan T, Clement JF, Visvikis D. Technical note: Correlation of respiratory motion between external patient surface and internal anatomical landmarks. Med Phys 2011;38:3157-64.
  28. Seregni M, Kaderka R, Fattori G, Riboldi M, Pella A, Constantinescu A, Saito N, Durante M, Cerveri P, Bert C, Baroni G. Tumor tracking based on correlation models in scanned ion beam therapy: an experimental study. Phys Med Biol 2013;58:4659-78.
  29. Schwarz M, Teske H, Stoll M, Bendl R. Improving accuracy of markerless tracking of lung tumours in fluoroscopic video by incorporating diaphragm motion. J Phys: Conf Ser 2014;489:012082.
  30. Shen C, He J, Huang Y, Wu J. Discriminative Correlation Filter Network for Robust Landmark Tracking in Ultrasound Guided Intervention. In: Medical Image Computing and Computer Assisted Intervention–MICCAI 2019. Cham: Springer, 2019.
  31. Wang Q, Gao J, Xing J, Zhang M, Hu W. DCFNet: Discriminant Correlation Filters Network for Visual Tracking. ArXiv 2017;abs/1704.04057.
  32. Keatley EL, Mageras GS. Computer Automated Quantification of Respiratory Motion in a Fluoroscopic Movie. In: Schlegel W, Bortfeld T. editors. The Use of Computers in Radiation Therapy. Berlin: Springer Berlin Heidelberg, 2000.
  33. Mukumoto N, Nakamura M, Akimoto M, Miyabe Y, Yokota K, Matsuo Y, Mizowaki T, Hiraoka M. Impact of sampling interval in training data acquisition on intrafractional predictive accuracy of indirect dynamic tumor-tracking radiotherapy. Med Phys 2017;44:3899-908.
  34. Copeland A, Barron A, Fontenot J. Analytical setup margin for spinal stereotactic body radiotherapy based on measured errors. Radiat Oncol 2021;16:234.
Cite this article as: Dai Z, He Q, Zhu L, Zhang B, Jin H, Yang G, Cai C, Tan X, Jian W, Chen Y, Zhang H, Wu J, Wang X. Automatic prediction model for online diaphragm motion tracking based on optical surface monitoring by machine learning. Quant Imaging Med Surg 2023;13(4):2065-2080. doi: 10.21037/qims-22-242
