
Combination of a Big Data Analytics Resource System With an Artificial Intelligence Algorithm to Identify Clinically Actionable Radiation Dose Thresholds for Dysphagia in Head and Neck Patients

Open Access | Published: January 12, 2020 | DOI: https://doi.org/10.1016/j.adro.2019.12.007

      Abstract

      Purpose

      We combined clinical practice changes, standardizations, and technology to automate aggregation, integration, and harmonization of comprehensive patient data from the multiple source systems used in clinical practice into a big data analytics resource system (BDARS). We then developed novel artificial intelligence algorithms, coupled with the BDARS, to identify structure dose volume histogram (DVH) metrics associated with dysphagia.

      Methods and Materials

      From the BDARS harmonized data of ≥22,000 patients, we identified 132 patients recently treated for head and neck cancer whose dysphagia scores worsened from baseline, and stratified them by maximum grade ≥2. We developed a method that used both physical and biologically corrected (α/β = 2.5) DVH curves to test DVH metrics based on both absolute and percentage volumes. Combining a statistical categorization algorithm with machine learning (SCA-ML) provided more extensive detailing of response threshold evidence than either approach alone. A sensitivity-guided, minimum-input machine learning (ML) model was iteratively constructed to identify the key structure DVH metric thresholds.

      Results

      Seven swallowing structures producing 738 candidate DVH metrics were ranked for association with dysphagia using SCA-ML scoring. Structures included the superior pharyngeal constrictor (SPC), inferior pharyngeal constrictor (IPC), larynx, and esophagus. Bilateral parotid and submandibular gland (SG) structures were categorized by relative mean dose (eg, SG_high, SG_low) as a dose-centric, rather than tumor-centric, analog of contralateral and ipsilateral designations. Structure DVH metrics with high SCA-ML scores included the following: SPC: D20% (equivalent dose [EQD2] Gy) ≥47.7; SPC: D25% (Gy) ≥50.4; parotid_low: D60% (Gy) ≥13.2; and SG_high: D35% (Gy) ≥61.7. Larynx: D25% (Gy) ≥21.2 and SG_low: D45% (Gy) ≥28.2 had high SCA-ML scores but were segmented on fewer than 90% of plans. A model based on SPC: D20% (EQD2 Gy) alone had sensitivity and area under the curve of 0.88 ± 0.13 and 0.74 ± 0.17, respectively.

      Conclusions

      This study provides a practical demonstration of combining big data with artificial intelligence to increase the volume of evidence in clinical learning paradigms.

      Introduction

      Dysphagia is a significant acute and late toxicity for patients undergoing radiation therapy for head and neck cancers, increasing the probability of aspiration pneumonia after treatment; modern multi-institutional trials have demonstrated long-term dysphagia rates of 10% to 20%.
      • Gillison M.L.
      • Trotti A.M.
      • Harris J.
      • et al.
      Radiotherapy plus cetuximab or cisplatin in human papillomavirus-positive oropharyngeal cancer (NRG Oncology RTOG 1016): A randomised, multicentre, non-inferiority trial.
      Organ sparing of the superior constrictor muscles has been demonstrated as an advantageous use of intensity modulated radiation therapy early in application of that technology.
      • Feng F.Y.
      • Kim H.M.
      • Lyden T.H.
      • et al.
      Intensity modulated radiotherapy of head and neck cancer aiming to reduce dysphagia: Early dose–effect relationships for the swallowing structures.
      • Eisbruch A.
      • Kim H.M.
      • Feng F.Y.
      • et al.
      Chemo-IMRT of oropharyngeal cancer aiming to reduce dysphagia: Swallowing organs late complication probabilities and dosimetric correlates.
      • Chera B.S.
      • Fried D.
      • Price A.
      • et al.
      Dosimetric predictors of patient-reported xerostomia and dysphagia with deintensified chemoradiation therapy for HPV-associated oropharyngeal squamous cell carcinoma.
      • Caudell J.J.
      • Schaner P.E.
      • Desmond R.A.
      • et al.
      Dosimetric factors associated with long-term dysphagia after definitive radiotherapy for squamous cell carcinoma of the head and neck.
      • Mazzola R.
      • Ricchetti F.
      • Fiorentino A.
      • et al.
      Dose-volume-related dysphagia after constrictor muscles definition in head and neck cancer intensity modulated radiation treatment.
      Owing to the extensive manual effort required, most single-institution studies tend to be modest in size and examine a limited set of manually selected dose volume histogram (DVH) metrics.
      Reliance on manual aggregation also reduces the likelihood of follow-up studies once findings are implemented and treatment planning approaches are modified accordingly. In addition, the manual effort required to collect DVH metrics constrains the range of metrics examined, potentially biasing the selection of metrics for testing.
      Recently, we have constructed a big data analytics resource system (BDARS) that automates aggregation, integration, and harmonization of key data elements and relationships for all treated patients in a standardized framework.
      • Mayo C.S.
      • Kessler M.L.
      • Eisbruch A.
      • et al.
      The big data effort in radiation oncology: Data mining or data farming?
      • Mayo C.S.
      • Phillips M.
      • McNutt T.
      • et al.
      Treatment data and technical process challenges for practical big data efforts in radiation oncology medical physics.
      Aggregated elements include DVHs for all treated plans and for the cumulative as-treated plan sum of each course, in both physical (Gy) and bio-corrected (equivalent dose in 2-Gy fractions [EQD2] Gy with α/β = 2.5, 5, 10) doses.
      • Mayo C.S.
      • Phillips M.
      • McNutt T.
      • et al.
      Treatment data and technical process challenges for practical big data efforts in radiation oncology medical physics.
      • Mayo C.S.
      • Yao J.
      • Eisbruch A.
      • et al.
      Incorporating big data into treatment plan evaluation -development of statistical DVH metrics and visualization dashboards.
      Common Terminology Criteria for Adverse Events toxicity grades were entered in our electronic health record (Epic, Verona, WI) using standardized smart list objects we developed to enable accurate, automated extraction from encounter notes with aggregation into our BDARS.
      • Mayo C.S.
      • Matuszak M.M.
      • Schipper M.J.
      • et al.
      Big data in designing clinical trials: Opportunities and challenges. Frontiers in Oncology.
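      The bio-corrected doses described above rest on the linear-quadratic conversion to equivalent dose in 2-Gy fractions, EQD2 = D(d + α/β)/(2 + α/β), where d is the dose per fraction. The Python sketch below applies that conversion bin by bin to the dose axis of a DVH. It assumes each bin received its dose in n equal fractions, which is a simplification for plan sums built from several phases; the function names are hypothetical, and this is not the BDARS implementation.

```python
def eqd2(total_dose_gy: float, n_fractions: int, alpha_beta_gy: float = 2.5) -> float:
    """Equivalent dose in 2-Gy fractions under the linear-quadratic model.

    EQD2 = D * (d + alpha/beta) / (2 + alpha/beta), with d = D / n the dose
    per fraction. Assumes the dose D was delivered in n equal fractions.
    """
    if n_fractions <= 0:
        raise ValueError("n_fractions must be positive")
    dose_per_fraction = total_dose_gy / n_fractions
    return total_dose_gy * (dose_per_fraction + alpha_beta_gy) / (2.0 + alpha_beta_gy)


def eqd2_dose_axis(dose_bins_gy, n_fractions, alpha_beta_gy=2.5):
    """Convert the dose axis of a DVH bin by bin; the volume axis is unchanged."""
    return [eqd2(d, n_fractions, alpha_beta_gy) for d in dose_bins_gy]


if __name__ == "__main__":
    print(eqd2(70.0, 35))            # 2 Gy per fraction leaves the dose unchanged: 70.0
    print(round(eqd2(50.0, 35), 2))  # ~1.43 Gy per fraction lowers EQD2 to 43.65
```

Because 2-Gy fractions leave dose unchanged regardless of α/β, the conversion matters most for low-dose regions of a DVH, where the dose per fraction is well below 2 Gy.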
      Our objective in this study was to develop an automatable, systematic approach that enabled consideration of both physical and biologically corrected doses to both percentage and absolute volumes of organs at risk, detailing levels of evidence for each candidate metric. We developed a novel algorithmic approach that combined a statistical categorization algorithm (SCA) with a machine learning (ML) algorithm to identify the DVH metrics with the strongest associations for each structure. From these, a multistructure predictive ML model extending the SCA was then iteratively constructed to identify a minimal set of predictive cofactors. In this approach, the end product is not the model. Instead, the end product is a minimal set of clinically actionable DVH metric inputs and thresholds, identified through use of the model, with the strongest levels of evidence for association with worsening dysphagia.

      Methods and Materials

      Patients

      Records were examined for 439 patients treated for head and neck cancer from January 2014 to September 2018 using either intensity modulated radiation therapy or volumetric arc therapy treatment plans designed on a commercial system (Eclipse; Varian Medical Systems, Palo Alto, CA). Toxicity records and DVH curves were used in the analysis for patients whose Common Terminology Criteria for Adverse Events dysphagia toxicity scores increased from the baseline recorded during the first week of radiation therapy. Patients were stratified for toxicity by maximum grade ≥2. Table 1 summarizes characteristics of the 132 patients identified in this cohort. Three percent of patients were enrolled on clinical trials. Overall rates of toxicity that worsened from baseline were 17.8% for grade ≥2 and 5.5% for grade ≥3.
      Table 1. Characteristics of patients demonstrating worsening dysphagia (132 of 439 patients)
      Sex
       Male: 35
       Female: 97
      Age, median [25% quantile, 75% quantile]: 62 [53, 67]
      Count of patients by diagnosis site
       Pharynx: 63
       Oral cavity: 22
       Larynx: 22
       Nasopharynx: 8
       Other: 17
      Follow-up days, median [25% quantile, 75% quantile]: 152 [52, 270]
      Count of patients with dysphagia details
       Max dysphagia = 1: 54
       Max dysphagia = 2: 54
       Max dysphagia = 3: 24
       Max-Min dysphagia = 1: 63
       Max-Min dysphagia = 2: 50
       Max-Min dysphagia = 3: 19

      Contouring

      Structures were contoured consistently by a small number of physicians using agreed-upon guidelines that have been in place for several years at our institution. The cervical esophagus was contoured as a tubular structure beginning at the bottom of the inferior constrictor and extending to the thoracic inlet. The larynx was contoured from the inferior border of the hyoid to the inferior border of the cricoid, including the anterior commissure and arytenoids. Superior constrictors were contoured from the pterygoid plates to the inferior border of the hyoid, and inferior constrictors from the inferior border of the hyoid to the esophageal inlet (cervical esophagus).

      Statistical categorization algorithm and machine learning for algorithmic evidence-based identification of DVH metric predictors

      We applied an approach combining a statistical categorization algorithm and machine learning (SCA-ML) to rank DVH metrics by combined levels of evidence for their ability to predict, among patients whose dysphagia scores increased from the start of treatment, those who reached a maximum grade ≥2. Nine swallowing structures were examined (Table 2). DVH metrics were written using standardized TG-263 nomenclature.
      • Mayo C.S.
      • Moran J.M.
      • Bosch W.
      • et al.
      American Association of Physicists in Medicine task group 263: Standardizing nomenclatures in radiation oncology.
      For each structure, 4 as-treated plan sum DVH curves were used, covering physical and bio-corrected dose with respect to both absolute and percentage volume. Curves were rendered as sets of DVH metrics: Dx% (Gy), Dxcc (Gy), Dx% (α/β = 2.5) (EQD2 Gy), and Dxcc (α/β = 2.5) (EQD2 Gy). Percentage volumes examined were
      $$x \in \{100,\ 99.5,\ 99 \text{ to } 96 \text{ by } 1,\ 95 \text{ to } 5 \text{ by } 5,\ 4 \text{ to } 1 \text{ by } 1,\ 0.5,\ 0.0\}.$$
      For absolute volume, $x \in \{v_{q1} \text{ to } 0.5 \text{ by } 0.5\}$ (cc), where $v_{q1}$ is the lower 1% quantile of volumes for the structure in the sample.
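      The volume grid and the extraction of Dx values from a DVH curve can be sketched as follows. This Python sketch assumes a cumulative DVH tabulated as ascending dose bins with the volume (percent or cc) receiving at least each dose, and uses linear interpolation between bins. The helper names, and starting the absolute grid at the largest multiple of 0.5 cc not exceeding v_q1, are illustrative assumptions rather than details from the study.

```python
from typing import List, Sequence


def percent_volume_grid() -> List[float]:
    """Percentage volumes x used for Dx% metrics, per the list in the text."""
    grid = [100.0, 99.5]
    grid += [float(v) for v in range(99, 95, -1)]  # 99 to 96 by 1
    grid += [float(v) for v in range(95, 0, -5)]   # 95 to 5 by 5
    grid += [float(v) for v in range(4, 0, -1)]    # 4 to 1 by 1
    grid += [0.5, 0.0]
    return grid


def absolute_volume_grid(v_q1_cc: float, step_cc: float = 0.5) -> List[float]:
    """Absolute volumes (cc) from v_q1 down to 0.5 cc in 0.5 cc steps.

    Assumption: if v_q1 is not a multiple of the step, the grid starts at the
    largest multiple of the step that does not exceed v_q1.
    """
    n_steps = int(v_q1_cc // step_cc)
    return [step_cc * k for k in range(n_steps, 0, -1)]


def dose_at_volume(doses: Sequence[float], cum_volume: Sequence[float], x: float) -> float:
    """Dx from a cumulative DVH: the dose received by at least volume x.

    doses: ascending dose bins (Gy or EQD2 Gy); cum_volume: non-increasing
    volume (% or cc) receiving at least each dose. Linear interpolation
    between bins is a simplifying assumption.
    """
    if x > cum_volume[0]:
        return doses[0]
    for i in range(1, len(doses)):
        v_prev, v_next = cum_volume[i - 1], cum_volume[i]
        if v_next < x:  # the curve crosses x between bins i - 1 and i
            frac = (v_prev - x) / (v_prev - v_next)
            return doses[i - 1] + frac * (doses[i] - doses[i - 1])
    return doses[-1]  # x at or below the smallest tabulated volume (eg, D0%)


if __name__ == "__main__":
    doses = [0.0, 10.0, 20.0, 30.0, 40.0]
    cum_pct = [100.0, 100.0, 80.0, 40.0, 0.0]
    print(len(percent_volume_grid()))            # 31 volume levels
    print(dose_at_volume(doses, cum_pct, 20.0))  # D20% = 35.0
    print(absolute_volume_grid(2.3))             # [2.0, 1.5, 1.0, 0.5]
```

The same interpolation serves Dxcc metrics when the volume axis is in cc, and applying it to an EQD2-converted dose axis gives the bio-corrected variants.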
      Table 2. Summary statistics from the statistical screening metrics set and combined statistical categorization algorithm and machine learning (SCA-ML) scores for the top physical and bio-corrected dose metrics for each swallowing structure examined
      Structure | DVH metric | TV | N | AUC | PPV | NPV | SN | SP | RR | PETR | SCA-ML
      SPC | D25% (Gy) | 50.4 | 129 | 0.68 | 0.69 | 0.76 | 0.92 | 0.37 | 2.9 | 0.55 | 4.1
      SPC | D20% (EQD2 Gy) (✓) | 47.7 | 129 | 0.68 | 0.70 | 0.90 | 0.97 | 0.35 | 7.0 | 0.57 | 4.1
      Parotid_low | D60% (Gy) | 13.2 | 123 | 0.66 | 0.72 | 0.55 | 0.69 | 0.58 | 1.6 | 0.47 | 2.4
      Parotid_low | D80% (EQD2 Gy) (✓) | 6.0 | 123 | 0.65 | 0.75 | 0.52 | 0.6 | 0.69 | 1.6 | 0.44 | 2.9
      SG_high | D35% (Gy) (✓) | 61.7 | 124 | 0.68 | 0.74 | 0.58 | 0.66 | 0.67 | 1.7 | 0.47 | 2.60
      SG_high | D30% (EQD2 Gy) | 57.8 | 124 | 0.68 | 0.73 | 0.58 | 0.69 | 0.63 | 1.7 | 0.48 | 1.80
      Oral_cavity | D95% (Gy) (✓) | 15.3 | 129 | 0.68 | 0.78 | 0.53 | 0.55 | 0.77 | 1.7 | 0.45 | 2.5
      Oral_cavity | D96% (EQD2 Gy) | 9.8 | 129 | 0.67 | 0.78 | 0.53 | 0.55 | 0.77 | 1.7 | 0.45 | 2.1
      Parotid_high | D28.5cc (Gy) (✓) | 13.9 | 129 | 0.66 | 0.80 | 0.68 | 0.78 | 0.70 | 2.5 | 0.52 | 2.4
      Parotid_high | D28.5cc (EQD2 Gy) | 8.9 | 129 | 0.66 | 0.8 | 0.68 | 0.78 | 0.70 | 2.5 | 0.52 | 2.4
      Esophagus | D2cc (Gy) (✓) | 22.6 | 124 | 0.61 | 0.69 | 0.59 | 0.82 | 0.42 | 1.7 | 0.45 | 1.5
      Esophagus | D3cc (EQD2 Gy) | 24.3 | 121 | 0.58 | 0.79 | 0.45 | 0.36 | 0.85 | 1.4 | 0.25 | 1.5
      IPC | D90% (Gy) (✓) | 12.8 | 124 | 0.66 | 0.73 | 0.59 | 0.73 | 0.59 | 1.8 | 0.48 | 1.4
      IPC | D95% (EQD2 Gy) | 7.5 | 124 | 0.66 | 0.72 | 0.63 | 0.80 | 0.53 | 2.0 | 0.50 | 1.2
      Larynx | D25% (Gy) (☒) | 21.2 | 110 | 0.60 | 0.67 | 0.88 | 0.97 | 0.31 | 5.4 | 0.49 | 4.5
      Larynx | D25% (EQD2 Gy) | 15 | 110 | 0.59 | 0.66 | 0.81 | 0.95 | 0.29 | 3.5 | 0.46 | 3.7
      SG_low | D45% (Gy) (☒) | 28.2 | 95 | 0.71 | 0.73 | 0.85 | 0.95 | 0.46 | 4.9 | 0.60 | 5.4
      SG_low | D35% (EQD2 Gy) | 23.5 | 95 | 0.69 | 0.70 | 0.93 | 0.98 | 0.35 | 9.9 | 0.58 | 4.2
      Columns correspond to the threshold value (TV), number of plans with the structure drawn (N), area under the curve (AUC) from the receiver operator characteristic analysis, positive predictive value (PPV), negative predictive value (NPV), sensitivity (SN), specificity (SP), and risk ratio (RR), determined using TV to construct a 2 × 2 contingency table, followed by the PETR and SCA-ML scores. Structures not contoured on at least 90% of treatment plans are marked (☒). For each structure, the dose volume histogram (DVH) metric with the higher SCA-ML score is checked (✓).
      Abbreviations: IPC = inferior pharyngeal constrictor; PETR = positive evidence of a threshold response; SG = submandibular gland; SPC = superior pharyngeal constrictor.
      For each DVH metric, we calculated a statistical screening metrics set (SSMS) to identify an optimal threshold and detail the statistical evidence for its predictive value. All calculations were carried out using R (R Foundation for Statistical Computing, Vienna, Austria).

      R Core Team. R: A language and environment for statistical computing. R Foundation for Statistical Computing. Vienna, Austria. Available at: https://www.R-project.org/. Accessed January 16, 2020.

      • Liaw A.
      • Wiener M.
      Classification and regression by random forest.
      • Robin X.
      • Turck N.
      • Hainard A.
      • et al.
      pROC: An open-source package for R and S+ to analyze and compare ROC curves.
      • Kuhn M.
      • Wing J.
      • Weston S.
      • et al.
      Caret: Classification and regression training.
      For each SSMS, a receiver operator characteristic curve was constructed, and the area under the curve (AUC) was calculated for each set of toxicity and DVH metric dose records. A DVH metric threshold value was determined with the Youden index and used to construct a 2 × 2 contingency table. The 95% confidence interval for the AUC and values for sensitivity (SN), specificity (SP), positive predictive value (PPV), and negative predictive value (NPV) were calculated. The Fisher exact test was used to calculate the P value of the 2 × 2 contingency table. Relative risk and odds ratio were calculated. Standard and scaled counts of true positives (TP), false positives, true negatives, and false negatives were calculated, using the square root of the number of samples (N) as the scaling factor (eg, sTP = TP/√N). A single-tailed Kolmogorov-Smirnov (ks) test was used to determine the P value that the distribution of doses for those without toxicities was stochastically less than the distribution of doses for those with toxicities. A single-tailed Welch t test was used to determine the P value that the mean of the distribution of values without toxicities is less than that with toxicities. The 15% and 25% quantiles of the distribution of doses with toxicities and the 75% and 85% quantiles of the distribution of doses without toxicities were used to demarcate dose-response regions.
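      A dependency-free Python sketch of part of the SSMS for one structure-DVH metric follows: AUC in its Mann-Whitney form, a Youden-index threshold, and the 2 × 2 table statistics, including the risk ratio PPV/(1 − NPV) and the scaled count sTP = TP/√N. It tests observed dose values as candidate thresholds, which may place thresholds slightly differently from an ROC package such as pROC, and it omits the P values (Fisher exact, one-sided ks, Welch t) and the AUC confidence interval. Function names are hypothetical.

```python
import math
from typing import Dict, Sequence, Tuple


def auc_mann_whitney(doses: Sequence[float], toxic: Sequence[bool]) -> float:
    """ROC AUC as P(toxic-case dose > non-toxic-case dose), ties counted as 1/2."""
    pos = [d for d, t in zip(doses, toxic) if t]
    neg = [d for d, t in zip(doses, toxic) if not t]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def contingency(doses: Sequence[float], toxic: Sequence[bool],
                threshold: float) -> Tuple[int, int, int, int]:
    """2 x 2 table (TP, FP, TN, FN), predicting toxicity when dose >= threshold."""
    tp = fp = tn = fn = 0
    for d, t in zip(doses, toxic):
        predicted = d >= threshold
        if predicted and t:
            tp += 1
        elif predicted:
            fp += 1
        elif t:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def youden_threshold(doses: Sequence[float], toxic: Sequence[bool]) -> float:
    """Observed dose maximizing Youden's J = SN + SP - 1 (lowest dose wins ties)."""
    best_threshold, best_j = doses[0], -math.inf
    for threshold in sorted(set(doses)):
        tp, fp, tn, fn = contingency(doses, toxic, threshold)
        j = tp / (tp + fn) + tn / (tn + fp) - 1.0
        if j > best_j:
            best_threshold, best_j = threshold, j
    return best_threshold


def screening_metrics(doses: Sequence[float], toxic: Sequence[bool]) -> Dict[str, float]:
    """Part of the SSMS for one structure-DVH metric; both outcome classes must occur."""
    threshold = youden_threshold(doses, toxic)
    tp, fp, tn, fn = contingency(doses, toxic, threshold)
    risk_at_or_above = tp / (tp + fp)                     # PPV
    risk_below = fn / (fn + tn) if fn + tn else math.nan  # 1 - NPV
    return {
        "TV": threshold,
        "AUC": auc_mann_whitney(doses, toxic),
        "SN": tp / (tp + fn),
        "SP": tn / (tn + fp),
        "PPV": risk_at_or_above,
        "NPV": 1.0 - risk_below,
        # Ratios become inf when a cell is zero (no continuity correction).
        "RR": risk_at_or_above / risk_below if risk_below else math.inf,
        "OR": (tp * tn) / (fp * fn) if fp * fn else math.inf,
        "sTP": tp / math.sqrt(len(doses)),  # TP scaled by sqrt(N)
    }


if __name__ == "__main__":
    doses = [10, 20, 30, 40, 50, 60, 70, 80]
    toxic = [False, False, True, False, False, True, True, True]
    print(screening_metrics(doses, toxic))
```

As a check on the ratio definition, Table 2's rounded values for SPC D20% (EQD2 Gy) give 0.70/(1 − 0.90) = 7.0, matching the ratio tabulated for that row.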
      Using the SSMS for each structure-DVH metric, we introduced a ranking metric combining elements for positive evidence of a threshold response (PETR). PETR was based on the AUC, with weighting factors between 0 and 1 derived from sTP, ks, PPV, and SN:
      $$\mathrm{PETR} = \mathrm{AUC} \times LF_{sTP}(sTP, sTP_0, k_{sTP}) \times LF_{ks}(ks, ks_0, k_{ks}) \times \frac{PPV + SN}{2} \qquad (1)$$


      We noted that AUC can be high when TP is small, and small values could be due to random events. To screen for the possibility of high AUC due to "noisy" data, we used a logistic function (LFsTP) with coefficients selected so that LFsTP = 0.5 at sTP = 0.5 and LFsTP ≈ 1.0 for sTP > 1:
      $$LF_{sTP}(sTP, sTP_0, k_{sTP}) = \frac{1}{1 + e^{-k_{sTP}\,(sTP - sTP_0)}} \qquad (2)$$
      with $sTP_0 = 0.5$ and $k_{sTP} = 6/0.5$.
      We noted that AUC can be high even when the distribution of DVH metric values associated with toxicity is not separated from, and higher than, the distribution of values without toxicity (ie, when the single-sided ks P value is large). To screen out distributions not demonstrating a transition to increased likelihood of toxicity with increasing dose (ie, a response threshold), we used ks in a logistic function (LFks) with coefficients selected so that LFks = 0.5 at ks = 0.1 and LFks ≈ 1 for ks < 0.01:
      $$LF_{ks}(ks, ks_0, k_{ks}) = \frac{1}{1 + e^{k_{ks}\,(ks - ks_0)}} \qquad (3)$$
      with $ks_0 = 0.1$ and $k_{ks} = 6/0.09$.
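      Equations (1) to (3) translate directly into code. In this minimal Python sketch, ks is taken to be the one-sided Kolmogorov-Smirnov P value, the exponent signs are chosen so that LF_sTP rises with sTP and LF_ks falls with the P value (matching the stated midpoints and limits), and the default constants are those given above. In the example, AUC, PPV, and SN are the Table 2 values for SPC D20% (EQD2 Gy), while the sTP and ks inputs are hypothetical.

```python
import math


def lf_stp(stp: float, stp0: float = 0.5, k_stp: float = 6 / 0.5) -> float:
    """Eq. (2): rises with sTP; 0.5 at sTP = 0.5 and ~1 once sTP exceeds 1."""
    return 1.0 / (1.0 + math.exp(-k_stp * (stp - stp0)))


def lf_ks(ks_p: float, ks0: float = 0.1, k_ks: float = 6 / 0.09) -> float:
    """Eq. (3): falls with the one-sided KS P value; 0.5 at 0.1, ~1 below 0.01."""
    return 1.0 / (1.0 + math.exp(k_ks * (ks_p - ks0)))


def petr(auc: float, stp: float, ks_p: float, ppv: float, sn: float) -> float:
    """Eq. (1): AUC down-weighted for small sTP and weak KS separation."""
    return auc * lf_stp(stp) * lf_ks(ks_p) * (ppv + sn) / 2.0


if __name__ == "__main__":
    # AUC, PPV, SN from Table 2 (SPC D20% EQD2 Gy); sTP and ks_p are hypothetical.
    print(round(petr(auc=0.68, stp=5.0, ks_p=0.001, ppv=0.70, sn=0.97), 2))  # 0.57
```

With both weights near 1, PETR reduces to AUC × (PPV + SN)/2 = 0.68 × 0.835 ≈ 0.57, the PETR listed for that row.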
      Next, like PETR, a machine learning model was used to rank each structure-DVH metric. Machine learning models are nondeterministic, vary substantially in the ranking metric (MLRM) used to score the relative importance of input values, and frequently differ in which input variables they select as most relevant for predicting outcomes.
      • Jackson W.C.
      • Hawkins P.G.
      • et al.
      Submandibular gland sparing when irradiating neck level IB in the treatment of oral squamous cell carcinoma.
      For this study, random forest was selected, with the percent increase in mean squared error used to rank the relative relevance of input variables (ie, MLRM = percent increase in mean squared error).
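      The percent increase in mean squared error is a permutation importance: shuffling an informative input degrades predictions, while shuffling an unused input changes nothing. The randomForest package computes its version on out-of-bag samples tree by tree and, by default, scales it by the standard deviation of the differences. The Python sketch below is a simplified analog for a single fitted predictor on one data set, with hypothetical names, and is not the package's computation.

```python
import random
from typing import Callable, List, Sequence


def mse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    return sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)


def pct_increase_mse(predict: Callable[[List[float]], float],
                     X: List[List[float]], y: Sequence[float],
                     n_repeats: int = 20, seed: int = 0) -> List[float]:
    """Mean % rise in MSE when each input column is shuffled in turn.

    Assumes `predict` is an already fitted model with nonzero baseline error.
    """
    rng = random.Random(seed)
    base = mse(y, [predict(row) for row in X])
    scores = []
    for j in range(len(X[0])):
        total = 0.0
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between input j and the outcome
            X_perm = [row[:j] + [value] + row[j + 1:] for row, value in zip(X, column)]
            total += 100.0 * (mse(y, [predict(row) for row in X_perm]) - base) / base
        scores.append(total / n_repeats)
    return scores


if __name__ == "__main__":
    # Toy data: the "model" uses only column 0, so column 1 should score 0.
    X = [[float(i), float(i % 3)] for i in range(10)]
    y = [float(i) for i in range(10)]
    y[0] = 0.5
    print(pct_increase_mse(lambda row: row[0], X, y))
```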
      The product of PETR and MLRM was used for relative ranking of structure-DVH metrics for predictive ability, based on combined evidence from machine learning and more conventional statistical methods.
      $$\text{SCA-ML} = \text{PETR} \times \text{MLRM} \qquad (4)$$


      Peak SCA-ML was used to cull the large number of candidate DVH metrics, selecting one physical and one bio-corrected DVH metric for each structure. These were categorized as primary and secondary according to their relative SCA-ML score. Absolute volume statistics (Dxcc [Gy], Dxcc [EQD2 Gy]) were dropped from consideration if x was greater than the 5% quantile of the structure volumes.
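      The culling step can be sketched in Python with a hypothetical Candidate record per structure-DVH metric. The sketch keeps the peak-SCA-ML physical metric and the peak-SCA-ML bio-corrected metric for each structure, labels them primary and secondary by score, and drops Dxcc candidates whose x exceeds the structure's 5% volume quantile. The field names are assumptions, and the scores in the test data are illustrative, not study results.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional, Tuple


@dataclass
class Candidate:
    structure: str   # eg, "SPC"
    metric: str      # eg, "D20%" or "D2cc"
    bio: bool        # True for EQD2 Gy, False for physical Gy
    absolute: bool   # True for Dxcc metrics
    x: float         # volume level (% or cc)
    petr: float
    mlrm: float

    @property
    def sca_ml(self) -> float:
        return self.petr * self.mlrm  # Eq. (4)


def cull(candidates: Iterable[Candidate],
         volume_q05_cc: Dict[str, float]) -> Dict[str, Tuple[Candidate, Optional[Candidate]]]:
    """Return {structure: (primary, secondary)} ordered by SCA-ML.

    Absolute-volume metrics are skipped when x exceeds the structure's
    5% volume quantile.
    """
    best: Dict[Tuple[str, bool], Candidate] = {}
    for c in candidates:
        if c.absolute and c.x > volume_q05_cc[c.structure]:
            continue
        key = (c.structure, c.bio)  # one physical and one bio-corrected slot
        if key not in best or c.sca_ml > best[key].sca_ml:
            best[key] = c
    result: Dict[str, Tuple[Candidate, Optional[Candidate]]] = {}
    for structure in sorted({s for s, _ in best}):
        picks = sorted((c for (s, _), c in best.items() if s == structure),
                       key=lambda c: c.sca_ml, reverse=True)
        result[structure] = (picks[0], picks[1] if len(picks) > 1 else None)
    return result
```

Keying the search on (structure, dose type) guarantees that a bio-corrected metric cannot crowd out every physical metric, which keeps a physical alternative available for the clinical-practicality check described below.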

      Minimum input set for multistructure predictive model

      The minimal set of SCA-ML-based metrics needed to predict dysphagia within the data set was identified through iterative construction of a machine learning model. Structures that were not drawn on at least 90% of the plans were excluded. For each remaining structure in the culled data set, the physical or biological dose metric with the larger SCA-ML was selected for the modeling data set (MDS). Plans with incomplete sets of structure-DVH metrics were excluded. At each iteration, 10-fold cross validation was used to calculate the average and standard deviation of the SP, SN, PPV, and negative predictive value (NPV) across the folds.
      A baseline model was first constructed using the full MDS as inputs. Iterative construction of a minimal input model then began by constructing single-input models for each element of the MDS. The element with the largest sensitivity was selected as the first input element. Elements were then added to the model incrementally, in order of sensitivity. Model iterations were stopped when the average SN was no longer significantly (P < .05) different from the baseline value according to a Student t test.
      In routine clinical practice, physical doses are more readily available in commercial treatment planning systems than bio-corrected doses. Therefore, if the resulting model contained bio-corrected dose metrics, the process was repeated using the physical dose metric identified in the culled data set, and the sensitivity of the initial iterative model was compared with that of the physical dose model.
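      The stopping rule above can be sketched in Python as a forward construction guided by cross-validated sensitivity. Assumptions: fold_sensitivity is a hypothetical callback returning the 10 per-fold sensitivities of a model (eg, a random forest) trained on the given inputs; elements are added in order of their single-input sensitivity, which is one reading of the text; and the Student t test uses a hard-coded two-sided critical value for 18 degrees of freedom (two sets of 10 folds) instead of computing a P value, which scipy.stats.ttest_ind could do.

```python
import math
import statistics
from typing import Callable, List, Sequence

# Two-sided critical value of Student's t for alpha = 0.05 and 18 degrees of
# freedom (two sets of 10 cross-validation folds: 10 + 10 - 2).
T_CRIT_DF18 = 2.101


def student_t(a: Sequence[float], b: Sequence[float]) -> float:
    """Pooled-variance (Student) two-sample t statistic."""
    na, nb = len(a), len(b)
    diff = statistics.mean(a) - statistics.mean(b)
    pooled = ((na - 1) * statistics.variance(a)
              + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    if pooled == 0.0:
        return 0.0 if diff == 0.0 else math.copysign(math.inf, diff)
    return diff / math.sqrt(pooled * (1.0 / na + 1.0 / nb))


def minimal_inputs(elements: Sequence[str],
                   fold_sensitivity: Callable[[List[str]], List[float]],
                   t_crit: float = T_CRIT_DF18) -> List[str]:
    """Add inputs in order of single-input SN until SN matches the baseline."""
    baseline = fold_sensitivity(list(elements))  # model using the full MDS
    single = {e: statistics.mean(fold_sensitivity([e])) for e in elements}
    ranked = sorted(elements, key=lambda e: single[e], reverse=True)
    chosen: List[str] = []
    for element in ranked:
        chosen.append(element)
        if abs(student_t(fold_sensitivity(list(chosen)), baseline)) < t_crit:
            break  # SN no longer significantly different from the baseline
    return chosen
```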

      Results

      Of the 439 patients examined, 132 (30%) had dysphagia that worsened from the beginning of treatment. Of those with worsening dysphagia, 78 (59% of those with worsening dysphagia; 17.8% of all patients examined) had a maximum grade ≥2. The median (25% quantile, 75% quantile) number of days from the beginning of treatment to the highest recorded toxicity of grade ≥2 was 37 (22, 80) days. Figure 1 illustrates the time to maximum dysphagia score.
      Figure 1For patients demonstrating dysphagia scores that worsened from start of treatment, the median time to the first maximum toxicity record was 37 days. Median time to the last occurrence of the maximum score was 48 days.
      Seven swallowing structures were evaluated: esophagus, larynx, superior pharyngeal constrictor (SPC), inferior pharyngeal constrictor (IPC), oral cavity, parotids, and submandibular glands (SG). Parotids and submandibular glands were subcategorized according to their relative mean doses (parotid_high, parotid_low, SG_high, SG_low).
      In the analysis, 738 structure-DVH metrics were calculated and ranked by SCA-ML for evidence of predicting dysphagia. The top 18 are presented in Table 2, which lists the primary (checked ✓) and secondary structure-DVH metrics identified with the SCA-ML. In order of decreasing SCA-ML, the top 3 primary structure-DVH metrics identified in the MDS were SPC D20% (EQD2 Gy) ≥47.7, parotid_low D80% (EQD2 Gy) ≥6.0, and SG_high D35% (Gy) ≥61.7. The top secondary structure-DVH metric was SPC D25% (Gy) ≥50.4.
      Both SG_low D45% (Gy) ≥28.2 and larynx D25% (Gy) ≥21.2 had high SCA-ML scores, but neither was contoured on at least 90% of the treatment plans. Reasons include involvement in the target volume (eg, cancer of the larynx), laryngectomy, or removal as part of neck dissection.
      Figure 2 illustrates statistical DVH curves for the physical and bio-corrected doses to the SPC, and for physical doses to SG_high, SG_low, and larynx.
      • Mayo C.S.
      • Yao J.
      • Eisbruch A.
      • et al.
      Incorporating big data into treatment plan evaluation -development of statistical DVH metrics and visualization dashboards.
      Curves are color coded for patient subsets with and without worsening dysphagia scores. Statistical DVHs show the median Dx% (Gy or EQD2 Gy) values (dotted line) layered with a shaded area encompassing the central 70% of Dx% values to highlight where subsets separate.
      Figure 2 (a) Plots of statistical dose volume histogram (DVH) curves. Superior pharyngeal constrictor (SPC) bio-corrected DVH curves are shown for patients with (red) and without (blue) worsening dysphagia. To clarify visualization and provide more quantitative detail, statistical DVH curves show the median (dashed line) and the central 70% range of DVH curves for (b) SPC Dx% (EQD2 Gy), (c) SPC Dx% (Gy), (d) the submandibular gland receiving the higher relative mean dose (SG_high) Dx [Gy], (e) the submandibular gland receiving the lower relative mean dose (SG_low) Dx [Gy], and (f) larynx Dx [Gy]. SG_low and larynx were not included in the multistructure model because they were not contoured on at least 90% of plans (☒). The DVH metric and threshold with the highest combined statistical categorization algorithm and machine learning (SCA-ML) score is shown for each (black dot).
      Figure 3 illustrates application of the method for physical and bio-corrected doses to the SPC and for physical doses to SG_high, SG_low, and larynx. In Fig 3b, SPC Dx% (EQD2 Gy) AUCs did not vary greatly with volume, nor did they highlight specific narrow regions with evidence for response thresholds (ks). Fractional volumes of 15% to 35% formed the region with the strongest evidence based on PETR scores. Note the low predictive strength near Median (Gy). Also note that although AUC was elevated near Max (Gy) (ie, D0% [Gy]), SCA-ML scoring indicated low combined evidence for a dose-response threshold.
      Figure 3 (a) Illustration of combined statistical categorization algorithm and machine learning (SCA-ML) plots for determining the dose volume histogram (DVH) metric demonstrating strong evidence of a dose-response threshold for (b) superior pharyngeal constrictor (SPC) Dx% (EQD2 Gy), (c) SPC Dx% (Gy), (d) the submandibular gland receiving the higher relative mean dose (SG_high) Dx (Gy), (e) the submandibular gland receiving the lower relative mean dose (SG_low) Dx (Gy), and (f) larynx Dx (Gy). Area under the curve (AUC) values are plotted for each metric, with color coding and symbol size differentiating P values for the Kolmogorov-Smirnov test. Positive evidence of threshold response (PETR) and SCA-ML scores are scaled by the highest relative value to select the metric. The threshold dose determined for each metric is plotted (dashed line). Peak SCA-ML values and thresholds are circled on the graph.
      Figure 4 shows the toxicities along with the SCA-ML-identified thresholds. A logistic regression of the data was used to characterize the overall probability of toxicity for each structure independently of the others. Comparing distributions for physical and bio-corrected SPC doses, D20% (EQD2 Gy) and D25% (Gy) graphically demonstrated dose-response thresholds with similar SCA-ML (4.092 vs 4.067) and PETR (5.4 vs 4.3) scores.
      Figure 4 (a) Univariate plots of worsening dysphagia versus the ranking metrics selected using combined statistical categorization algorithm and machine learning (SCA-ML) for (b) superior pharyngeal constrictor (SPC) Dx% (EQD2 Gy), (c) SPC Dx% (Gy), (d) the submandibular gland receiving the higher relative mean dose (SG_high) Dx (Gy), (e) the submandibular gland receiving the lower relative mean dose (SG_low) Dx (Gy), and (f) larynx Dx (Gy). The threshold corresponding to peak SCA-ML is plotted (dashed line) to highlight its association with the distribution. A small amount of random jitter was added to the binary outcome to reduce point overlap that would mask the density of points. A logistic regression is plotted to characterize the probability of toxicity.
      SG_low and the larynx had high scores but were excluded from the multistructure model because they had been contoured on only 95 and 110 of the 132 treatment plans, respectively (Table 2). In the multistructure iterative model construction, there were 108 complete data sets in the MDS for the 5 candidate structures (SPC, IPC, esophagus, SG_high, parotid_high, and parotid_low) that had been contoured on at least 90% of treatment plans. The baseline sensitivity of the model constructed using the 5 primary structure-DVH metrics was 0.79 ± 0.21. Only one structure-DVH metric input, SPC D20% (EQD2 Gy), was needed in the iterative model to achieve sensitivity comparable to the baseline. Although SPC D20% (EQD2 Gy) ≥47.7 had a higher relative risk than D25% (Gy) ≥50.4 (20.7 vs 7.1) in the SSMS, the overall sensitivity (0.78 ± 0.18 vs 0.76 ± 0.26) and AUC (0.70 ± 0.16 vs 0.70 ± 0.15) of the iteratively constructed, cross-validated random forest models were comparable.

      Discussion

      Combining the big data analytics resource system with artificial intelligence enabled systematic investigation of a much larger range of structure-DVH metrics than prior studies have examined, using historical evidence to identify a minimal set of clinically actionable metrics and thresholds. This provides a means to incrementally improve the set of constraints used.
      Although AUC is useful, we did not find it sufficient as a sole metric for identifying dose-response thresholds. To add levels of evidence, we introduced PETR as an algorithmic method for layering information from conventional statistical measures with well-understood interpretations (ks, sensitivity, positive predictive value) onto AUC. We further extended the approach through SCA-ML, layering on "importance" metrics used by machine learning algorithms such as random forest. This layered approach shows where the evidence from different types of measures agrees.
      The purpose of using ML in the method was not to generate a specific model for predicting toxicity. Instead, the approach combined evidence from statistical categorization, ML, and iterative construction of a parsimonious model to winnow a large number of candidate inputs down to a minimal set of DVH metric inputs and thresholds with the strongest clinical evidence that increasing dose contributes to increasing toxicity. This method provides a means to follow observational data accumulated in the BDARS to identify inputs that are also clinically actionable. By objectively comparing both physical and biologically corrected doses with absolute and percentage volume cut points, it avoids a priori judgment of which is most relevant. In this case, 738 candidate model metrics were winnowed down to the one with the strongest combined levels of evidence that was also actionable in a routine clinical setting.
      Without the advantage of a BDARS, prior studies have used substantially smaller sets of patients and of metrics tested for predicting various dysphagia-related endpoints. In a 2007 study of 36 patients that examined a total of 15 physical-dose-based DVH metrics for 3 swallowing structures, Feng et al found that total pharyngeal constrictor (PC) mean (Gy) >60, V65 Gy (%) >65, and supraglottic larynx mean V50 Gy (%) >50 values had strong correlations with videofluoroscopy-based aspiration.
      • Feng F.Y.
      • Kim H.M.
      • Lyden T.H.
      • et al.
      Intensity modulated radiotherapy of head and neck cancer aiming to reduce dysphagia: Early dose–effect relationships for the swallowing structures.
      They found that only PC mean (Gy) was correlated with both patient- and provider-rated worsening of swallowing solids.
      In a 2010 study of 83 evaluable patients, Caudell et al examined 16 physical dose DVH metrics for 2 swallowing structures.
      • Caudell J.J.
      • Schaner P.E.
      • Desmond R.A.
      • et al.
      Dosimetric factors associated with long-term dysphagia after definitive radiotherapy for squamous cell carcinoma of the head and neck.
      They reported that glottic and supraglottic larynx (GSL) V55 Gy (%) <32 and IPC V60 Gy (%) <11.8 were significant for stricture and risk of aspiration, with odds ratios of 1.03 and 1.02, respectively. Larynx mean (Gy) ≥41 and V60 Gy (%) >24, in addition to IPC V60 Gy (%) >12, were significant for percutaneous endoscopic gastrostomy tube dependence and aspiration. SPC V65 Gy (%) ≥33 and IPC V65 Gy (%) ≥75 were associated with pharyngoesophageal stricture that required dilation. Median time to diagnosis of stricture was 7 months. No aspiration was noted for larynx mean (Gy) ≤40.6.
      In a 2011 study of 73 patients, Eisbruch et al found that esophagus mean (Gy) ≥48 was significant for strictures. For increased videofluoroscopy-based aspiration scores, PC mean (Gy) >56 and GSL mean (Gy) >39 were associated with a 25% toxicity incidence. They examined 5 physical-dose DVH metrics for 6 structures: SPC, IPC, middle pharyngeal constrictor, PC, GSL, and esophagus.
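Thresholds tied to a stated incidence, such as a 25% toxicity rate, correspond to reading a dose-response curve at a chosen risk level. As an illustration only, assuming a logistic model P(D) = 1/(1 + exp(-(b0 + b1·D))) fitted to a mean-dose metric, the dose at which the predicted incidence reaches p is D_p = (logit(p) - b0)/b1. The coefficients below are hypothetical and are not taken from Eisbruch et al or any other cited study.

```python
import math


def logistic_risk(dose_gy: float, b0: float, b1: float) -> float:
    """Predicted toxicity probability from a logistic dose-response model."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * dose_gy)))


def dose_at_risk(p: float, b0: float, b1: float) -> float:
    """Invert the logistic model: the dose at which predicted risk equals p."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must be strictly between 0 and 1")
    if b1 == 0.0:
        raise ValueError("slope b1 must be non-zero")
    logit_p = math.log(p / (1.0 - p))
    return (logit_p - b0) / b1


# Hypothetical coefficients, for illustration only.
B0, B1 = -8.0, 0.12
d25 = dose_at_risk(0.25, B0, B1)
print(round(d25, 1), round(logistic_risk(d25, B0, B1), 3))  # 57.5 0.25
```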
      In a 2017 study, Chera et al reported on 9 of 45 patients with worsening dysphagia scores at 6 months. Limiting their analysis to fractional volumes receiving physical doses, they found that SPC V55 Gy (%) ≥78 and V60 Gy (%) ≥40 were associated with a 20% risk of toxicity. They also reported on 6 patients evaluated at 12 months. They found no dose associations with the esophagus, IPC, or middle constrictor muscles.
      In a 2018 study, Kamal et al reported that 30 of 97 patients had moderate to severe radiation-induced dysphagia at 3 to 6 months after radiotherapy, defined as Dynamic Imaging Grade of Swallowing Toxicity (DIGEST) ≥2. They identified geniohyoid muscle V61 Gy (%) ≥18.6 as the strongest predictor. SPC V55 Gy (%) ≥97.5 and supraglottic area V23 Gy (%) ≥92.5 were also identified as predictive.
      Our findings that SPC D20% (EQD2 Gy) ≥47.7 and D25% (Gy) ≥50.4 are strongly associated with dysphagia are more specific than, but consistent with, the results of Chera et al and Caudell et al.
      The finding that SG_high D35% ≥61.7 was predictive may be a surrogate for sensitivity of the proximal musculature. That interpretation is consistent with the finding of Kamal et al for the geniohyoid muscle. Sparing at least one salivary structure conferred a benefit in reducing the odds of worsening dysphagia. The higher observed sensitivity of SG_low D45% ≥28.2 compared with parotid_low D65% (Gy) ≥13.2 (0.95 vs 0.65) at a minimum signals the importance of routinely contouring these structures and monitoring their doses, which is consistent with the results of Jackson et al.
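The high/low labels used above (SG_high, SG_low, parotid_low) are dose-centric rather than laterality-centric: of a bilateral pair, the gland with the higher mean dose is labeled high and the other low. A minimal sketch of that relabeling follows. The TG-263-style structure names, the dictionary layout, skipping pairs with a missing side, and the tie-breaking rule are all assumptions made for illustration.

```python
from typing import Dict

# Assumed TG-263-style names for the bilateral pairs (illustrative).
PAIRS = {
    "parotid": ("Parotid_L", "Parotid_R"),
    "SG": ("Glnd_Submand_L", "Glnd_Submand_R"),
}


def relabel_by_mean_dose(mean_dose_gy: Dict[str, float]) -> Dict[str, str]:
    """Map laterality-based structure names to dose-based high/low labels.

    For each bilateral pair, the side with the higher mean dose becomes
    '<pair>_high' and the other '<pair>_low'. Pairs with a missing side
    (e.g., resected or not contoured) are skipped. Ties label the left side
    as high, an arbitrary choice.
    """
    labels: Dict[str, str] = {}
    for pair, (left, right) in PAIRS.items():
        if left not in mean_dose_gy or right not in mean_dose_gy:
            continue
        if mean_dose_gy[left] >= mean_dose_gy[right]:
            high, low = left, right
        else:
            high, low = right, left
        labels[high] = f"{pair}_high"
        labels[low] = f"{pair}_low"
    return labels


plan = {"Parotid_L": 31.0, "Parotid_R": 18.5,
        "Glnd_Submand_L": 45.2, "Glnd_Submand_R": 66.0}
print(relabel_by_mean_dose(plan))
```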
      The studies of Feng et al, Eisbruch et al, and Caudell et al focused on mean dose to the larynx or GSL and identified differing thresholds. Drawing on these early results, the historic plans examined in this data set had used larynx mean (Gy) ≤50 as a high-priority constraint. The finding that larynx D25% (Gy) ≥21.2 had a high sensitivity (SN = 0.97) suggests that controlling dose to small volumes may confer additional advantage.
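A sensitivity such as SN = 0.97 describes a single cut point treated as a binary test: among patients whose dysphagia worsened, the fraction whose plans met the threshold. The sketch below computes sensitivity and specificity for a rule of the form "metric ≥ threshold". The patient values are made up, and this is only the per-threshold bookkeeping, not the SCA-ML scoring itself.

```python
from typing import Sequence, Tuple


def threshold_performance(metric: Sequence[float], toxicity: Sequence[bool],
                          threshold: float) -> Tuple[float, float]:
    """Sensitivity and specificity of the rule 'metric >= threshold'.

    Sensitivity = TP / (TP + FN) among patients with toxicity;
    specificity = TN / (TN + FP) among patients without toxicity.
    """
    if len(metric) != len(toxicity):
        raise ValueError("metric and toxicity must have the same length")
    tp = fn = tn = fp = 0
    for value, toxic in zip(metric, toxicity):
        flagged = value >= threshold
        if toxic and flagged:
            tp += 1
        elif toxic:
            fn += 1
        elif flagged:
            fp += 1
        else:
            tn += 1
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one patient in each outcome group")
    return tp / (tp + fn), tn / (tn + fp)


# Made-up larynx D25% (Gy) values and worsening-dysphagia flags.
d25 = [35.0, 22.0, 18.0, 40.1, 25.3, 12.0, 30.0, 15.5]
tox = [True, True, False, True, False, False, True, False]
print(threshold_performance(d25, tox, 21.2))  # (1.0, 0.75)
```

A high sensitivity with modest specificity, as in this toy example, is what one would expect from a metric that most toxic patients exceed but that many non-toxic patients also exceed.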
      The esophagus was noteworthy in that an absolute rather than a percentage volume metric, D2cc (Gy) ≥22.6, was identified as the strongest predictor. One interpretation is that the small volume of the esophagus proximal to the larynx could act as a surrogate measure for larynx dose. Additional inspection of the relative location of these subvolumes would be needed to confirm that interpretation.
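The difference between percentage-volume (Dx%) and absolute-volume (Dxcc) metrics is easiest to see when both are computed from per-voxel doses. In this sketch, assuming a structure represented as a list of voxel doses with a uniform voxel volume, Dx% is the minimum dose received by the hottest x% of the structure, Dxcc is the minimum dose to the hottest x cm³, and Vx Gy (%) is the percentage of the structure receiving at least x Gy. Treatment planning systems typically interpolate a cumulative DVH, so their values can differ slightly from this voxel-counting version; the function names are hypothetical.

```python
import math
from typing import Sequence


def _hottest_voxel_dose(doses_gy: Sequence[float], n_voxels: int) -> float:
    """Dose of the n-th hottest voxel (clamped to the structure size)."""
    if not doses_gy:
        raise ValueError("structure has no voxels")
    ranked = sorted(doses_gy, reverse=True)
    return ranked[min(max(1, n_voxels), len(ranked)) - 1]


def d_percent(doses_gy: Sequence[float], percent: float) -> float:
    """Minimum dose (Gy) to the hottest `percent` % of the structure volume."""
    return _hottest_voxel_dose(doses_gy, math.ceil(len(doses_gy) * percent / 100.0))


def d_cc(doses_gy: Sequence[float], volume_cc: float, voxel_cc: float) -> float:
    """Minimum dose (Gy) to the hottest `volume_cc` cm^3 of the structure."""
    return _hottest_voxel_dose(doses_gy, math.ceil(volume_cc / voxel_cc))


def v_gy_percent(doses_gy: Sequence[float], dose_gy: float) -> float:
    """Percentage of the structure volume receiving at least `dose_gy`."""
    if not doses_gy:
        raise ValueError("structure has no voxels")
    return 100.0 * sum(d >= dose_gy for d in doses_gy) / len(doses_gy)


# Toy structure: 10 voxels of 0.5 cm^3 each (5 cm^3 total).
doses = [10.0, 20.0, 30.0, 40.0, 50.0, 60.0, 70.0, 80.0, 90.0, 100.0]
print(d_percent(doses, 20))      # 90.0
print(d_cc(doses, 2.0, 0.5))     # 70.0
print(v_gy_percent(doses, 65.0)) # 40.0
```

Because D2cc fixes the volume rather than the fraction, it samples the same absolute amount of tissue regardless of how long the esophagus contour is, which is one way such a metric can behave differently from Dx%.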
      Historic plans had been created using IPC mean (Gy) <20 as a high-priority constraint. The finding for IPC D90% (Gy) ≥12.8 reinforced use of the historic constraint to reduce doses to the IPC. This highlights an important point in modeling dose responses: results should be viewed in the context of the intrinsic biases introduced by the dose constraints used in creating treatment plans. In this instance, not finding the median dose (D50% [Gy]) more significant than D90% (Gy) could mean that the metric had already been sufficiently constrained by the default mean (Gy) <20 constraint, and that the significance of D90% (Gy) signals potential to augment, not replace, this default metric.
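The point about planning constraints biasing dose-response estimates is a form of range restriction: if nearly every plan already holds a metric below a constraint, little variation remains for that metric to explain outcomes, and its apparent association shrinks. The simulation below is a hypothetical illustration with synthetic data (uniform doses, a latent response that rises linearly with dose plus Gaussian noise, and a 20 Gy cutoff standing in for a constraint); it is not derived from the study's data.

```python
import math
import random
from typing import List, Tuple


def pearson_r(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient of two equal-length samples."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def simulate_range_restriction(n: int = 20000, cutoff_gy: float = 20.0,
                               seed: int = 1) -> Tuple[float, float]:
    """Correlation of a dose metric with a latent response, before and after
    keeping only plans with metric < cutoff_gy (a stand-in for a constraint)."""
    rng = random.Random(seed)
    dose = [rng.uniform(0.0, 60.0) for _ in range(n)]
    # Synthetic latent response: increases with dose, plus patient-level noise.
    response = [d + rng.gauss(0.0, 15.0) for d in dose]
    full_r = pearson_r(dose, response)
    kept = [(d, r) for d, r in zip(dose, response) if d < cutoff_gy]
    restricted_r = pearson_r([d for d, _ in kept], [r for _, r in kept])
    return full_r, restricted_r


full_r, restricted_r = simulate_range_restriction()
print(f"all plans r = {full_r:.2f}; constrained plans r = {restricted_r:.2f}")
```

The underlying dose-response is identical in both subsets; only the spread of the metric differs, which is why a weak association for a tightly constrained metric does not by itself show that the metric is unimportant.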
      The ability to use historic data gathered from routine practice, by combining the BDARS with AI, underscores the importance of consistent contouring within and among clinics. For example, we noted substantial differences in the sensitivity of SPC versus IPC metrics for predicting worsening dysphagia, which highlights the importance of contouring these structures separately. Other clinics may contour only a generalized PC structure as part of their practice guidelines; those clinics would miss the opportunity to detect such differences in predicting toxicities or to use that information to reduce toxicities. Similarly, high SCA-ML scores for the parotid and submandibular gland structures underscore the value of consistently contouring both (if unresected) as part of routine treatment planning.
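Contouring consistency can be monitored directly from a BDARS-style repository by counting, for each standardized structure name, the fraction of plans on which it was segmented; the Abstract notes, for instance, that larynx and SG_low were segmented on less than 90% of plans. A minimal sketch, assuming each plan is represented as a set of standardized structure names (the data layout and the short structure names are hypothetical; the 90% cutoff mirrors the Abstract):

```python
from collections import Counter
from typing import Dict, Iterable, Set


def contour_coverage(plans: Iterable[Set[str]]) -> Dict[str, float]:
    """Fraction of plans on which each structure name was contoured."""
    counts: Counter = Counter()
    n_plans = 0
    for structures in plans:
        n_plans += 1
        counts.update(set(structures))  # count each structure once per plan
    if n_plans == 0:
        return {}
    return {name: count / n_plans for name, count in counts.items()}


def under_contoured(coverage: Dict[str, float], minimum: float = 0.90) -> Set[str]:
    """Structures contoured on fewer than `minimum` of plans."""
    return {name for name, frac in coverage.items() if frac < minimum}


# Hypothetical plans, each listing the structures that were contoured.
plans = [
    {"SPC", "IPC", "Larynx", "Esophagus"},
    {"SPC", "IPC", "Esophagus"},
    {"SPC", "IPC", "Larynx", "Esophagus"},
    {"SPC", "Larynx", "Esophagus"},
]
coverage = contour_coverage(plans)
print(sorted(under_contoured(coverage)))  # ['IPC', 'Larynx']
```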
      The potential for using observational clinical data coupled with AI to improve hypothesis generation in the design of randomized controlled trials has been discussed previously (Mayo et al).
      The method described here illustrates one potential application. The results provide strong levels of evidence for selecting specific DVH metrics and associations that could be tested in a subsequent multi-institutional trial. Evidence that larynx and SG_low DVH metrics may play a second-order role to SC in predicting dysphagia underscores the need for consistent contouring of these structures to detail their interactions in such a trial. Observation of the natural history of toxicity occurrence (Fig 1) could provide more specific guidance for selecting measurement time intervals.

      Conclusions

      By combining a big data analytics resource system with an AI algorithm, we were able to examine evidence for response thresholds in a much larger set of patients and DVH metrics than conventional approaches allow. By calculating both physical and biologically corrected doses and both percentage and absolute volume DVH metrics, the approach was better able to follow the data and minimize metric selection bias. This provides a means that can be automated to enable iterative learning from historic treatments, informing decision frameworks for future patients with clinically interpretable metrics.

      References

        • Gillison M.L.
        • Trotti A.M.
        • Harris J.
        • et al.
        Radiotherapy plus cetuximab or cisplatin in human papillomavirus-positive oropharyngeal cancer (NRG Oncology RTOG 1016): A randomised, multicentre, non-inferiority trial.
        Lancet. 2019; 393: 40-50
        • Feng F.Y.
        • Kim H.M.
        • Lyden T.H.
        • et al.
        Intensity modulated radiotherapy of head and neck cancer aiming to reduce dysphagia: Early dose–effect relationships for the swallowing structures.
        Int J Radiat Oncol Biol Phys. 2007; 68: 1289-1298
        • Eisbruch A.
        • Kim H.M.
        • Feng F.Y.
        • et al.
        Chemo-IMRT of oropharyngeal cancer aiming to reduce dysphagia: Swallowing organs late complication probabilities and dosimetric correlates.
        Int J Radiat Oncol Biol Phys. 2011; 81: e93-e99
        • Chera B.S.
        • Fried D.
        • Price A.
        • et al.
        Dosimetric predictors of patient-reported xerostomia and dysphagia with deintensified chemoradiation therapy for HPV-associated oropharyngeal squamous cell carcinoma.
        Int J Radiat Oncol Biol Phys. 2017; 98: 1022-1027
        • Caudell J.J.
        • Schaner P.E.
        • Desmond R.A.
        • et al.
        Dosimetric factors associated with long-term dysphagia after definitive radiotherapy for squamous cell carcinoma of the head and neck.
        Int J Radiat Oncol Biol Phys. 2010; 76: 403-409
        • Mazzola R.
        • Ricchetti F.
        • Fiorentino A.
        • et al.
        Dose-volume-related dysphagia after constrictor muscles definition in head and neck cancer intensity modulated radiation treatment.
        Br J Radiol. 2014; 87: 20140543
        • Mayo C.S.
        • Kessler M.L.
        • Eisbruch A.
        • et al.
        The big data effort in radiation oncology: Data mining or data farming?
        Adv Radiat Oncol. 2016; 1: 260-271
        • Mayo C.S.
        • Phillips M.
        • McNutt T.
        • et al.
        Treatment data and technical process challenges for practical big data efforts in radiation oncology medical physics.
        Med Phys. 2018; 45: e793-e810
        • Mayo C.S.
        • Yao J.
        • Eisbruch A.
        • et al.
        Incorporating big data into treatment plan evaluation: Development of statistical DVH metrics and visualization dashboards.
        Adv Radiat Oncol. 2017; 2: 503-514
        • Mayo C.S.
        • Matuszak M.M.
        • Schipper M.J.
        • et al.
        Big data in designing clinical trials: Opportunities and challenges.
        Front Oncol. 2017; 7: 187
        • Mayo C.S.
        • Moran J.M.
        • Bosch W.
        • et al.
        American Association of Physicists in Medicine task group 263: Standardizing nomenclatures in radiation oncology.
        Int J Radiat Oncol Biol Phys. 2018; 100: 1057-1066
        • R Core Team
        R: A language and environment for statistical computing.
        R Foundation for Statistical Computing, Vienna, Austria
        https://www.R-project.org/
        Date accessed: January 16, 2020
        • Liaw A.
        • Wiener M.
        Classification and regression by randomForest.
        R News. 2002; 2: 18-22
        https://CRAN.R-project.org/doc/Rnews/
        Date accessed: January 16, 2020
        • Robin X.
        • Turck N.
        • Hainard A.
        • et al.
        pROC: An open-source package for R and S+ to analyze and compare ROC curves.
        BMC Bioinformatics. 2011; 12: 77
        • Kuhn M.
        • Wing J.
        • Weston S.
        • et al.
        Caret: Classification and regression training.
        R package version 6.0-80. 2018
        https://CRAN.R-project.org/package=caret
        Date accessed: January 16, 2020
        • Jackson W.C.
        • Hawkins P.G.
        • et al.
        Submandibular gland sparing when irradiating neck level IB in the treatment of oral squamous cell carcinoma.
        Med Dosim. 2019; 44: 144-149
        • Kamal M.
        • Mohamed A.S.R.
        • Volpe S.
        • et al.
        Radiotherapy dose-volume parameters predict videofluoroscopy-detected dysphagia per DIGEST after IMRT for oropharyngeal cancer: Results of a prospective registry.
        Radiother Oncol. 2018; 128: 442-451