Radiomics and machine learning analysis of liver magnetic resonance imaging for prediction and early detection of tumor response in colorectal liver metastases
Abstract
Purpose
The aim of this study was to demonstrate the effectiveness of a machine learning-based radiomics model for distinguishing tumor response and overall survival in patients with unresectable colorectal liver metastases (CRLM) treated with targeted biological therapy.
Methods
We prospectively recruited 17 patients with unresectable liver metastases of colorectal cancer who had received targeted biological therapy as first-line treatment. All patients underwent liver magnetic resonance imaging (MRI) three times up until 8 weeks after chemotherapy. We evaluated the diagnostic performance of a machine learning-based radiomics model for assessing tumor response on liver MRI, compared with the Response Evaluation Criteria in Solid Tumors guidelines. Overall survival was evaluated using Kaplan-Meier analysis, and hazard ratios were estimated with Cox proportional hazards models in univariate and multivariate analyses.
Results
Performance measurement of the trained model through metrics showed the accuracy of the machine learning model to be 76.5%, and the area under the receiver operating characteristic curve was 0.857 (95% confidence interval [CI], 0.605–0.976; P<0.001). For the patients classified as non-progressing or progressing by the radiomics model, the median overall survival was 17.5 months (95% CI, 12.8–22.2), and 14.8 months (95% CI, 14.2–15.4), respectively (P=0.431, log-rank test).
Conclusion
Machine learning-based radiomics models have the potential to predict tumor response in patients with unresectable CRLM treated with biologic therapy.
INTRODUCTION
Colorectal cancer (CRC) is the third most common cancer and the second most common cause of cancer death worldwide for both sexes [1]. In 2020, about 1.9 million new CRC cases were diagnosed and 935,000 CRC deaths occurred. The global incidence of CRC was 783,000 cases in 1999 and increased to 1.8 million cases from 2002 to 2007, increasing by 3.2% per year [1]. Metastasis is a major problem after curative treatment and an important cause of CRC-related death [2]. The liver is the most common site of distant metastasis in CRC [3]. Approximately 50% of CRC patients develop hepatic metastases over the course of the disease [4]. Curative surgical resection and chemotherapy are the standard treatment options for patients with colorectal liver metastasis (CRLM) [5]. However, only a small number of patients are suitable for curative surgery. Because of factors such as tumor location and size, unresectable disease, the presence of extrahepatic disease, or comorbidities, surgery is applicable in only 10%–20% of cases, and the 5-year survival rate is approximately 30% [6,7].
Because CRLM is heterogeneous, patients may respond differently to chemotherapy [8]. The Response Evaluation Criteria in Solid Tumors (RECIST 1.1) are mainly used to assess response to chemotherapy, primarily by measuring and classifying changes in the longest-axis tumor diameter [9]. However, because tumors can have irregular or complex shapes, these size-based criteria may not be representative of the entire tumor volume. When CRLM is treated with targeted drugs and systemic chemotherapy, tumor response is mostly reflected by changes in internal composition, including the amount of residual cancer cells, the degree of necrosis and fibrosis, and cystic degeneration [10]. Moreover, the correlation between RECIST and pathological response is known to be limited [11]. Accurate prediction and differentiation of CRC liver metastases is important for planning appropriate treatment and improving patient outcomes. Nevertheless, predictive accuracy and efficiency remain unsatisfactory, especially considering that functional imaging is not routinely used in clinical settings.
Radiomics is an emerging field of diagnostic interest in oncology [12]. Routine patient images are transformed into mineable quantitative data that can be used to assess tumor phenotypes for improving diagnosis, determining prognosis, and predicting treatment response. Radiomics can predict TNM stage, histologic grade, treatment response, and survival in various cancers [13]. Machine learning has garnered attention and shown significant results in biomedical imaging. The main architecture of machine learning methods used to process imaging data is the convolutional neural network (CNN). Machine learning using CNNs excels specifically in visual classification tasks and performs well, with high sensitivity and specificity [14]. This method enables abstraction from a large volume of heterogeneous, high-dimensional raw data through multiple hidden layers, identifying and amplifying vital features, which are then further refined for visual tasks. To our knowledge, very few studies have applied machine learning to magnetic resonance images to predict the response of CRLM to chemotherapy.
Therefore, we hypothesized that radiomics applied to serial liver magnetic resonance imaging (MRI) might be predictive of early tumor response and could be a prognostic biomarker of patient survival. In addition, we adopted an artificial intelligence algorithm using radiomics data to differentiate radiologic tumor response and predict survival of CRLM treated with biological therapy.
METHODS
Patients
The study was approved by the Institutional Review Board of the Gil Medical Center (IRB No. GAIRB2015-34) and was conducted in accordance with the Declaration of Helsinki guidelines. All participants provided written informed consent. Between November 2015 and April 2018, we prospectively enrolled 17 patients with unresectable CRLM who were scheduled to undergo chemotherapy. Patients considered to have unresectable liver metastases had residual liver volumes that were too small for the extent of resection required to remove all metastases completely. This was defined as a resection leaving less than 30% residual liver, or less than 40% in patients who received intensive chemotherapy before surgery. Exclusion criteria were as follows: (1) double primary cancer; (2) inability to undergo MRI because of the presence of pacemakers, defibrillators, neurostimulators, prohibited medical implants, or foreign bodies (e.g., bullets, shrapnel, and metal slivers). All included patients received at least two cycles of standard chemotherapy (FOLFIRI; folinic acid, fluorouracil, and irinotecan) with bevacizumab or cetuximab. Progression-free survival (PFS) was defined as the length of time during and after treatment in which the cancer did not grow or spread further. Overall survival (OS) was defined as the length of time from either the date of diagnosis or the start of treatment during which patients remained alive.
Methods including doses of contrast agent and MRI protocol
All patients received a bolus injection of 0.025 mmol gadoxetic acid (Primovist; Bayer Schering Pharma) per kilogram of body weight via an antecubital vein at 1 mL/sec using a power injector, followed by a 20-mL saline chaser. All MRI examinations were performed using a 3.0 T MRI scanner (Verio, Siemens Healthcare) with an 8-channel phased-array body coil. The routine liver MR sequence consisted of pre-contrast heavily T2-weighted images, diffusion-weighted images using two b-values (0 and 1,000 s/mm²), pre-contrast T1-weighted images (T1WI), and dynamic contrast-enhanced T1WI. After intravenous injection of contrast media, arterial, portal venous, transitional, and hepatobiliary phases were obtained. Hepatobiliary phase images were obtained 20 minutes after the injection of gadoxetic acid. The MRI parameters were as follows: repetition time/echo time, 4.5 ms/1.99 ms; slice thickness, 3–5 mm; field of view, 380×297 mm²; matrix, 384×240. Pre-treatment liver MRI (baseline MRI) was conducted within 1 month of targeted therapy, and the first post-treatment liver MRI was conducted 3–4 weeks after targeted therapy. The second post-treatment liver MRI was conducted 8 weeks after targeted therapy.
Image analysis
Images were interpreted in consensus by two radiologists (S.J.A., who had 10 years and S.Y., who had 4 years of abdominal imaging experience) blinded to the demographics and image reports of the patients.
In colon cancer, the tumor length was measured as the longest diameter across the transverse, coronal, and sagittal computed tomography (CT) images. In rectal cancer, the tumor length was measured as the largest tumor extent along the long axis of the colorectum on rectal MRI. T stages were classified as follows: Tx: primary tumor cannot be assessed; T0: no visible primary tumor; T1: tumor extends to involve the submucosa; T2: tumor extends to involve the muscularis propria; T3: tumor extends beyond the muscularis propria to involve mesorectal fat; and T4: tumor infiltrates/invades the peritoneum (T4a) or other pelvic organs and structures (T4b). T1 stage was described as an intraluminal extension without intestinal wall thickening, T2 stage as an asymmetrical wall thickening with clear adjacent pericolonic or mesorectal fat tissue, and T3 stage as a smooth or nodular extension of a discrete mass through the intestinal wall into pericolonic or mesorectal tissues. In nodal staging of colon cancer on CT, a size criterion of 5 mm maximum short-axis nodal diameter was used to differentiate benign nodes from metastatic ones. N1 was evaluated as one to three lymph nodes with a short axis larger than 5 mm or three or more abnormally clustered normal-sized lymph nodes; N2 was evaluated as four or more lymph nodes with a short axis larger than 5 mm. In rectal cancer, mesorectal, superior rectal, and inferior mesenteric nodes (superior to the take-off of the left colic artery from the inferior mesenteric artery), as well as internal iliac and obturator lymph nodes, were considered locoregional lymph nodes.
For the initial nodal staging of regional mesorectal, superior rectal, and inferior mesenteric lymph nodes, the Dutch criteria were adopted (i.e., criteria published by the European Society of Gastrointestinal and Abdominal Radiology and the Society of Abdominal Radiology's Colorectal and Anal Cancer Disease Focus Panel). These criteria include the short-axis dimension and morphologic characteristics, namely irregular borders, heterogeneous signal intensity, and round shape [15]. If the short-axis length of a regional lymph node is greater than 9 mm, the node is considered suspicious regardless of its morphology. When the short axis is between 5 and 9 mm, two morphological criteria are required. If the short axis is <5 mm, three criteria are required. For regional lateral pelvic lymph nodes, such as internal iliac and obturator lymph nodes, a size >7 mm in the short axis is required. The axial diameter of liver metastases was measured using baseline liver MRI. For each patient, two target lesions were selected for analysis [9]. The target lesions were the largest, most reproducible, and most dominant lesions treated during chemotherapy.
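The size and morphology rule for regional lymph nodes described above can be written as a small decision function. This is a minimal sketch of the thresholds as stated in the text, not the authors' software; the function and argument names are hypothetical, and a node measuring exactly 9 mm is assumed to fall in the 5 to 9 mm band.

```python
def is_suspicious_regional_node(short_axis_mm, irregular_border,
                                heterogeneous_signal, round_shape):
    """Size-plus-morphology rule for mesorectal, superior rectal, and
    inferior mesenteric nodes, using the thresholds quoted in the text."""
    # Count how many of the three morphologic criteria are present.
    n_morphologic = sum([irregular_border, heterogeneous_signal, round_shape])
    if short_axis_mm > 9:
        return True                    # suspicious regardless of morphology
    if short_axis_mm >= 5:
        return n_morphologic >= 2      # 5-9 mm: two criteria required
    return n_morphologic >= 3          # <5 mm: all three criteria required


def is_suspicious_lateral_node(short_axis_mm):
    """Lateral pelvic nodes (internal iliac, obturator): size criterion only."""
    return short_axis_mm > 7


# Example: a 7 mm node with an irregular border and heterogeneous signal.
print(is_suspicious_regional_node(7, True, True, False))
```

Keeping the two node groups in separate functions mirrors the text, where lateral pelvic nodes are judged on size alone.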
Response assessment
Treatment response was classified as a complete response (CR), partial response (PR), progressive disease (PD), or stable disease (SD) according to both RECIST (version 1.1) and volumetric criteria. Treatment response was calculated as the mean of two times the tumor diameter according to the RECIST classification: (1) CR: disappearance of all target lesions; (2) PR: at least 30% reduction in the sum of the target lesion diameters; (3) SD: absence of PR or PD; and (4) PD: at least 20% increase in the sum of the target lesion diameters or the appearance of new lesions [9]. In the present study, patients with CR, PR, or SD were defined as the non-progressing group, and patients with PD were defined as the progressing group. Two radiologists (S.J.C. and S.Y.), blinded to the clinical information, independently performed the grading. The reference standard for calculating the area under the receiver operating characteristic curve (AUROC) of the radiomics model and of MRI-based RECIST was the RECIST assessment on routine follow-up CT performed 12 weeks after targeted therapy.
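The four response categories and the two study groups defined above can be illustrated with a short classifier. This is a sketch under stated assumptions rather than the study's grading procedure: the function names are hypothetical, PR is measured against the baseline sum, and PD is measured against the smallest previous sum (the nadir), which defaults to the baseline. RECIST 1.1 additionally requires an absolute increase of at least 5 mm for PD; the text does not state this, so it is left out here.

```python
def classify_recist(baseline_sum_mm, current_sum_mm,
                    new_lesions=False, nadir_sum_mm=None):
    """Return 'CR', 'PR', 'SD', or 'PD' from sums of target lesion diameters."""
    if baseline_sum_mm <= 0:
        raise ValueError("baseline sum of diameters must be positive")
    if nadir_sum_mm is None:
        nadir_sum_mm = baseline_sum_mm  # assumption: no smaller earlier sum
    if new_lesions:
        return "PD"                     # any new lesion means progression
    if current_sum_mm == 0:
        return "CR"                     # all target lesions have disappeared
    # PD: at least 20% increase relative to the smallest previous sum.
    if nadir_sum_mm > 0 and (current_sum_mm - nadir_sum_mm) / nadir_sum_mm >= 0.20:
        return "PD"
    # PR: at least 30% reduction relative to baseline.
    if (baseline_sum_mm - current_sum_mm) / baseline_sum_mm >= 0.30:
        return "PR"
    return "SD"                         # neither PR nor PD


def is_progressing(response):
    """Study grouping: PD is 'progressing'; CR, PR, and SD are 'non-progressing'."""
    return response == "PD"


# Example: two target lesions shrinking from 60 + 40 mm to 35 + 25 mm.
print(classify_recist(100, 60))
```

Checking PD before PR matters when a tumor regrows after an initial response, because the sum can be both well below baseline and more than 20% above its nadir.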
Radiomics and machine learning
Radiomics features were extracted from T2-weighted and diffusion-weighted images (Fig. 1). To extract image features from the MRI data, the liver tumor was segmented and a binary mask was created by specifying the lesion area of the image as a region of interest (Fig. 2). Using the pyradiomics library, the original image data were supplied as input, and the generated binary mask was used to extract the radiomics features for the lesion area on the original image. Radiomics feature extraction quantifies the image characteristics of a masked area, such as a tumor or tissue, as quantitative variables such as shape features, first-order features, and second-order features. It is mainly used in the analysis of medical image data to evaluate the characteristics of a specified area and to predict prognosis.
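To make the idea of mask-based extraction concrete, the sketch below computes a few first-order features over the pixels selected by a binary mask. It is a plain-Python illustration of the principle, not the pyradiomics implementation; the function name, the toy 3×3 image, and the bin width of 25 are illustrative assumptions.

```python
import math
from collections import Counter


def first_order_features(image, mask, bin_width=25):
    """Compute simple first-order statistics inside a binary mask.

    image, mask: nested lists of the same shape; mask is 1 inside the lesion.
    """
    # Keep only the intensities that fall inside the region of interest.
    voxels = [v for img_row, mask_row in zip(image, mask)
              for v, m in zip(img_row, mask_row) if m]
    if not voxels:
        raise ValueError("mask selects no voxels")
    n = len(voxels)
    mean = sum(voxels) / n
    variance = sum((v - mean) ** 2 for v in voxels) / n
    energy = sum(v * v for v in voxels)
    # Discretize intensities into fixed-width bins, then take Shannon entropy.
    counts = Counter(int(v // bin_width) for v in voxels)
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
    return {"mean": mean, "variance": variance,
            "energy": energy, "entropy": entropy}


# Toy image with a bright 2x2 "lesion" in the lower-right corner.
image = [[10, 20, 30],
         [40, 100, 120],
         [50, 110, 130]]
mask = [[0, 0, 0],
        [0, 1, 1],
        [0, 1, 1]]
features = first_order_features(image, mask)
print(features)
```

Shape and second-order (texture) features follow the same pattern: the mask decides which voxels contribute, and each feature reduces them to a single number.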
The extracted radiomics variables consisted of 107 features, including shape (14 features), first-order (18 features), and second-order features, including the Gray-Level Co-occurrence Matrix and Gray-Level Dependence Matrix (75 features). To ensure that the model used in this study reflected the changes in MR imaging data over time, we used the Long Short-Term Memory (LSTM) structure. The LSTM is a type of recurrent neural network that can be trained on data converted to a sequence dataset. After separate LSTM models were trained on the T2-weighted and diffusion-weighted images, the outputs of the two models were combined into one ensemble model, and the final prediction was produced through two dense layers.
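A recurrent model such as an LSTM expects each patient as an ordered sequence of feature vectors, one per examination. The sketch below shows one plausible way to arrange per-visit radiomics features into that shape (patients × timepoints × features). The text does not describe its preprocessing, so the function name, the timepoint labels, and the dictionary layout are assumptions made for illustration.

```python
def build_sequences(features_by_patient,
                    timepoints=("baseline", "post1", "post2")):
    """Arrange per-visit feature vectors into time-ordered sequences.

    features_by_patient: {patient_id: {timepoint_label: [feature values]}}
    Returns (patient_ids, sequences), where sequences[i][t] is the feature
    vector of patient i at timepoint t.
    """
    patient_ids = sorted(features_by_patient)
    sequences = []
    for pid in patient_ids:
        visits = features_by_patient[pid]
        missing = [t for t in timepoints if t not in visits]
        if missing:
            raise ValueError(f"patient {pid} lacks timepoints: {missing}")
        # Fix the order so every sequence runs baseline -> post1 -> post2.
        sequences.append([list(visits[t]) for t in timepoints])
    return patient_ids, sequences


# Toy data: two patients, three MRI examinations, two features each.
toy = {
    "P02": {"baseline": [0.9, 1.2], "post1": [0.8, 1.1], "post2": [0.7, 1.0]},
    "P01": {"baseline": [1.5, 0.4], "post1": [1.6, 0.5], "post2": [1.9, 0.6]},
}
ids, seqs = build_sequences(toy)
print(ids, len(seqs), len(seqs[0]), len(seqs[0][0]))
```

In a setup like the one described, one such array would be built per imaging sequence (T2-weighted and diffusion-weighted) and fed to its own LSTM branch.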
Statistical analysis
Categorical data are presented as percentages, frequencies, and differences in proportion, and were compared using the chi-square or the Fisher exact test. Continuous data with significantly skewed distributions are presented as medians and were compared using the Mann-Whitney U-test. Mean values of continuous variables with normal distributions were compared using unpaired Student t-tests. Cumulative survival analysis was performed using the Kaplan-Meier method, and the differences in survival between the groups were assessed using the log-rank test. Potential prognostic factors of survival were evaluated using the Cox proportional hazard model. Univariate analyses were performed to identify significant predictors of survival. Characteristics determined to be statistically significant (P<0.05) in the univariate analysis were used as input variables for multiple logistic regression analysis. Univariate and multiple logistic regression analyses were performed to determine the potential predictive factors of PD. We evaluated the diagnostic performance of the radiomics and machine learning model for tumor response. The performance of diagnostic criteria was assessed by the standard AUROC, sensitivity, specificity, and accuracy. Statistical analyses were performed using IBM SPSS Statistics 23 (IBM Corp.). A P-value <0.05 was considered statistically significant.
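The Kaplan-Meier method mentioned above estimates survival as a running product over the observed event times. The sketch below is a minimal product-limit estimator in plain Python for illustration only; the study itself used SPSS, and the follow-up times and event indicators here are made-up toy values, not patient data.

```python
def kaplan_meier(times, events):
    """Product-limit estimate of the survival function.

    times: follow-up in months; events: 1 = death observed, 0 = censored.
    Returns a list of (time, survival_probability) at each event time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all subjects sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed  # deaths and censored subjects leave the risk set
    return curve


def median_survival(curve):
    """First time at which estimated survival drops to 0.5 or below."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None  # median not reached


# Toy follow-up data (months); 0 marks a censored observation.
curve = kaplan_meier([5, 8, 8, 12, 15, 20], [1, 1, 0, 1, 0, 1])
print(curve, median_survival(curve))
```

Censored patients contribute to the risk set up to their last follow-up but never trigger a drop in the curve, which is why the median can differ from a simple median of follow-up times.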
RESULTS
The demographic characteristics of the patients with CRLM are shown in Table 1. The mean age was 62.9±9.6 years (mean±standard deviation). Of the 17 patients, four (24%) had right colon cancer, eight (47%) had left colon cancer, and five (29%) had rectal cancer. The mean length of colorectal tumors was 47.9±8.0 mm. The tumor length was <5 cm in nine patients (52.9%) and ≥5 cm in eight patients (47.1%). Five patients (29.4%) had clinical T4 stage and 12 patients (70.6%) had clinical T3 stage. Regional lymph node metastases were reported in 14 patients, and three patients did not have lymph node metastases. Nine patients (52.9%) had ≥10 liver metastases and eight patients (47.1%) had <10 liver metastases. In seven patients (41.2%), liver metastases were <4 cm in size, and 10 patients (58.8%) had liver metastases of ≥4 cm. Liver metastases in both lobes of the liver were reported in 15 patients (88.2%), and 10 patients (58.8%) had less than 10 liver metastases. Two of the patients had single-lobe metastases but were classified as having unresectable liver metastases because of the small expected liver volume after hepatic surgery. Portal vein invasion was reported in three patients (17.6%), and 14 patients (82.4%) did not have portal vein invasion. One treatment cycle of FOLFIRI plus bevacizumab or cetuximab lasted 2 weeks (14 days). The first follow-up CT scan is usually performed 3 months after targeted therapy. FOLFIRI plus bevacizumab was administered to 14 patients, and FOLFIRI plus cetuximab was administered to three patients. The median PFS and OS of all patients were 12.6 months and 17.5 months, respectively (Table 1). Tumor markers were classified into two groups based on 15 ng/mL of carcinoembryonic antigen and 37 U/mL of carbohydrate antigen 19-9 [16].
A total of 107 radiomics variables were extracted from both baseline and follow-up MRI, and five variables were ultimately selected through feature selection calibration (Table 2). We built the radiomics model from these selected features. The radiomics model showed a sensitivity, specificity, accuracy, and AUROC of 100%, 71.4%, 76.5%, and 0.857 (95% confidence interval [CI], 0.605–0.976; P<0.001), respectively (Fig. 3). For patients classified as non-progressing or progressing by RECIST, the median OS was 17.5 months (95% CI, 11.5–23.4) and 14.6 months, respectively (P=0.132, log-rank test). For patients classified as non-progressing or progressing by the radiomics model, the median OS was 17.5 months (95% CI, 12.8–22.2) and 14.8 months (95% CI, 14.2–15.4), respectively (P=0.431, log-rank test).
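The reported sensitivity, specificity, and accuracy follow directly from a 2×2 confusion matrix. The counts used below are not given in the text; they are inferred as a split of the 17 patients that reproduces the stated percentages (3 true positives, 0 false negatives, 10 true negatives, 4 false positives) and should be read as an assumption, not as reported data. The function name is hypothetical.

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Sensitivity, specificity, and accuracy from confusion-matrix counts."""
    total = tp + fn + tn + fp
    return {
        "sensitivity": tp / (tp + fn),   # progressing patients correctly flagged
        "specificity": tn / (tn + fp),   # non-progressing patients correctly cleared
        "accuracy": (tp + tn) / total,   # all correct calls over all patients
    }


# Inferred counts (assumption): consistent with 100%, 71.4%, and 76.5% in n = 17.
metrics = diagnostic_metrics(tp=3, fn=0, tn=10, fp=4)
for name, value in metrics.items():
    print(f"{name}: {value * 100:.1f}%")
```

With only three progressing patients under this reading, a single misclassified case would move sensitivity from 100% to 67%, which is consistent with the wide confidence interval reported for the AUROC.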
Logistic regression analysis conducted to determine whether machine learning-based radiomics was an independent predictor of PD did not show any significant independent predictor of PD (Table 3). Univariate analyses based on a Cox proportional hazard model conducted to identify significant predictors of OS did not show any significant factors that affected OS (Table 4).
DISCUSSION
Our study demonstrated that radiomics and machine learning models have the potential to predict early tumor response in patients with unresectable CRLM treated with targeted therapy. Machine learning-based radiomics applied to MRI data predicted outcomes with better accuracy after 8 weeks. Furthermore, our study combined imaging data from diffusion-weighted and T2-weighted images, which do not require contrast media, so the approach can be used without compromising kidney function (unlike the commonly used contrast-enhanced CT).
Accurate and early detection of response to treatment of liver metastases is important for optimal intervention planning [17]. Early, reliable prognostic information may help physicians develop appropriate treatment plans for individual patients and allow timely attempts at alternative therapies for treatment-resistant tumors. Chemotherapy is the main treatment for patients with unresectable liver metastases. With the advent of modern chemotherapy regimens, response rates to first-line chemotherapy using FOLFOX/FOLFIRI and biologic agents, such as vascular endothelial growth factor inhibitors or epidermal growth factor receptor inhibitors, are up to 60%–70%, and median survival is up to 34 months in patients with metastases [18]. However, PFS still averages about 10 months, and the response rate for second-line chemotherapy is approximately 30% [19]. With modern chemotherapy, a subset of patients (approximately 15%–40%) with unresectable disease can be converted to resectable disease, and the long-term outcome in these patients is comparable to that of patients originally diagnosed with resectable disease (i.e., a 5-year survival of 30%–40%) [19,20]. Patients on chemotherapy whose disease remains unresectable, because of either an inadequate response or disease progression, have a poor prognosis. In addition to the improved efficacy of systemic chemotherapy, factors such as portal vein embolization, two-stage hepatectomy, ablation techniques, expanded criteria for resection, and improved surgical and parenchymal transection techniques have contributed to an increase in secondary resection rates. Further, with the development of various chemotherapy and surgical modalities, there is an increasing need for sophisticated quantitative analysis methods that go beyond the limitations of traditional tumor length-based RECIST criteria to obtain early and objective data for response assessment.
Radiomics involves extracting numerous high-dimensional or semi-quantitative features from images. This is achieved through image acquisition, targeted tumor segmentation, feature generation, and database development. Images used for radiomics analysis are collected at various hospitals or data centers. Therefore, these images are typically obtained using different parameters and protocols and reconstructed using different types of software. These varying methods of data collection can have unexpected effects on radiomics models. Targeted tumor segmentation is important because subsequent feature data are generated from the segmented volumes. This is difficult because the boundaries of many tumors are unclear in clinical practice. Feature generation means extracting semantic features, such as dimension, necrosis, margin, or position, or non-semantic features, including the shape, histogram, or texture of the tumor [21]. Radiomics represents the correlation between these features and the diagnosis or prognosis of cancer [22]. Radiomic parameters offer notable advantages over qualitative imaging assessments because qualitative imaging can be limited by what clinicians or radiologists can resolve by manual inspection [23].
Some studies have shown that radiomics features are effective in predicting chemotherapy response in malignant tumors, demonstrating clinical utility in response prediction [24,25]. However, previously used hard-coded texture features were not specifically designed for targeted clinical issues, which limited their predictive validity. With the development of deep learning techniques, neural networks are more commonly used in radiomics studies and have achieved expert-level performance in rectal cancer and liver diseases [26]. Deep learning-based quantitative features can supplement conventional radiomic features with previously unrevealed imaging features to improve predictive power. Additionally, deep learning-based radiomics is not time-consuming [27].
With the advances in machine learning technology, neural networks are more commonly used in radiomics research and have been shown to perform well in rectal cancer and liver diseases [28]. Quantitative features obtained via machine learning can complement traditional radiologic features with previously unrevealed imaging features to improve predictive power. In addition, machine learning-based radiomics can avoid precise manual segmentation of tumors, which is time-consuming, and can thereby greatly increase the utility of radiomics in clinical settings. However, to our knowledge, evidence on the beneficial use of machine learning-based MRI analysis to predict the response of CRLM is lacking.
Zhu et al. [10] reported that MRI-based machine learning-assisted model had better accuracy (0.875 vs. 0.578) and AUC (0.849 vs. 0.615) than RECIST in distinguishing treatment response to preoperative chemotherapy in patients with CRLM. That study showed that the machine learning model better distinguished survival outcomes after hepatectomy compared with RECIST criteria. Wei et al. [27] reported that the deep learning-based model provided better performance than the traditional classifier-based radiomics model and had a significantly higher AUROC (training: 0.903 vs. 0.745; validation: 0.820 vs. 0.598) when compared to contrast-enhanced CT. Previous studies have reported that apparent diffusion coefficient in diffusion-weighted MRI is relevant to response to chemotherapy [29,30]. However, those studies primarily proved the correlation between imaging characteristics and chemotherapy response but lacked quantitative and accurate prediction. Our study results have implications for refining response assessment strategies in patients with unresectable CRLM. The sensitivity of the radiomics model in identifying subtle changes in treatment response highlights its potential as a complementary tool in clinical decision-making.
Our study has several limitations. First, our study had a small number of patients. A larger sample size is needed to assess the clinical validity of our findings. Although the deep learning model yielded satisfactory results in assessing tumor response, a larger sample size and data from multiple centers should be used to optimize the robustness and reproducibility of the model. Second, our analysis was a single-center study, and prospective external validation of the machine learning model using radiomics data was not performed. Further validation studies and prospective trials are warranted to establish the generalizability and reliability of the radiomics model in diverse clinical settings. Third, we enrolled CRLM patients treated with different targeted agents (bevacizumab and cetuximab). However, both bevacizumab and cetuximab are now routinely used molecular targeted agents in metastatic CRC. Thus, this study could be indicative of real-world clinical practice.
In conclusion, machine learning-based radiomics models have the potential to predict early tumor response in patients with unresectable CRLM treated with biologic therapy. Our findings support further research into personalized treatment strategies for CRLM patients and encourage further exploration of advanced imaging methodologies in precision oncology to improve early tumor response assessment.
Notes
This study was financially supported by Bayer. Except for that, no potential conflict of interest relevant to this article was reported.
FUNDING
This study was financially supported by Bayer, but the authors had complete control of the data and information submitted for publication at all times.