“Arm yourself with the comprehensive but nuanced science behind hepatocellular carcinoma (HCC),” Riad Salem (Northwestern Memorial Hospital, Chicago, USA) instructed the audience at the Global Embolization Oncology Symposium Technologies (GEST; 9–12 May, New York, USA) during his Honorary Lecture, which concluded with the warning: “Do not let yourself be bullied by the tyranny of the randomised controlled trial”.
Arguing that “so-called evidence-based medicine is really not [that] 90% of the time”, Salem challenged the notion that randomised controlled trials with a primary endpoint of overall survival should be the gold standard in interventional radiology. He began, however, by conceding that “the randomised design is ideal: it is the best. It directly compares [the control] to the treatment, it minimises the bias and gives you causality where you can say ‘this treatment led to this endpoint’. You can power a randomised controlled trial with a large enough sample size that you minimise type 1 and type 2 error, and of course it is the most influential data”. Although all of this is possible in theory, he went on to recount how, in practice, it is often far from the case.
Speaking generally, Salem opined: “In interventional radiology (IR) and other areas, these are very expensive and complicated studies to do, with sample sizes that are sometimes impractical, if not impossible. They take years to complete and, as a result, what is their relevance [when completed]?”
There is a precedent for using lower level data to inform procedural decisions
Salem argued that there are precedents for not relying on randomised controlled data to guide procedural choice. He told GEST delegates that he performed many chemoembolization procedures in the late 1990s. However, it was only in 2002 that Josep Llovet (University of Barcelona, Barcelona, Spain) and colleagues published in The Lancet findings from a randomised controlled trial showing that chemoembolization improved survival compared with conservative treatment. Writing at the time, Llovet et al said: “There is no standard treatment for unresectable HCC. Arterial embolization is widely used, but evidence of survival benefits is lacking.” Salem commented: “Before 2002, we were all performing chemoembolization based on phase II data”. He also emphasised the limitations of the Llovet trial itself: “You have a 112 patient clinical trial with three arms; notice each arm is about 35 to 40 patients, so it is very small. This study here, that we all quote [to support the use of chemoembolization as the standard of care in unresectable HCC patients], basically was a single-centre study. There is a low sample size, there is no active control, so there is relatively limited power; all the things that we currently need in a randomised controlled trial are not necessarily present in this clinical trial.” In fact, the trial was stopped when the ninth sequential inspection showed that chemoembolization had survival benefits compared with conservative treatment.
Examining the evidence presented in the various guidelines that help inform physicians’ practice, Salem reported a distinct lack of level 1 data. Of the National Comprehensive Cancer Network (NCCN) guidelines, Salem said: “95% of the recommendations are based on non-level 1 evidence”. Continuing through other frameworks, he stated that in the American Association for the Study of Liver Diseases (AASLD) guidelines, two of the 21 recommendations are based on level 1 data. He stressed that “everything else is lower level evidence”. The European Association for the Study of the Liver (EASL) guidelines tell a similar story: three of their 36 recommendations hinge on level 1 evidence.
Salem summarised: “We have guidelines that tell us what we should and should not do, yet most of them are not based on the very data that we are supposed to be generating, so there is some confusion. […] As interventional radiologists, we have to recognise that many standards of care are not supported by randomised controlled trials, particularly with overall survival as the endpoint.”
He then enumerated several procedures performed by interventional radiologists for which level 1 evidence does not exist: bland transarterial chemoembolization (TACE), drug-eluting bead (DEB) TACE, or Yttrium-90 (Y-90) radioembolization to treat HCCs or neuroendocrine tumours; ablation in colon cancers and HCCs; resection for colon cancer or HCC metastases. “Liver transplantation,” he added, “a goal that many of us seek to achieve for our patients, a gold standard, curative option: what is that based on? A phase II, single-arm study published in 1996 and based on 48 patients. It is the standard of care, I understand that, but we have to make sure that we compare our therapies and their respective levels of evidence with what other standards are based on.”
Not all randomised controlled trials change clinical practice
“What we want are randomised controlled trials that provide clinically meaningful results. I think that is what we believe in interventional radiology,” Salem stated. However, he went on to highlight how several randomised controlled trials were far from practice-changing.
Using the example of a phase III pancreatic cancer study published in the Journal of Clinical Oncology in 2007 by Malcolm Moore (Princess Margaret Hospital, Toronto, Canada) et al, Salem explained how a randomised controlled trial can fulfil all the methodological criteria and still produce results that are not illuminating. Moore and colleagues set out to see if they could improve survival in patients with unresectable, locally advanced or metastatic pancreatic cancer by adding erlotinib to gemcitabine. Comparing the erlotinib and gemcitabine group with the gemcitabine-only cohort, the investigators found an overall survival benefit of 9.9 days in the combined group. “More than 500 patients, level 1 evidence, and a survival benefit of 9.9 days,” Salem reiterated. “Is erlotinib and gemcitabine the gold standard? I am not a medical oncologist, I do not know, but this trial satisfies everything the purists really want when it comes to generating level 1 evidence. I do not know if it actually changes standard of care though.” A recent publication in Oncology Reviews by Amrallah Mohammad (Zagazig University, Zagazig, Egypt) concluded that “only [a] small set of patients get small benefit” from the combined use of erlotinib and gemcitabine.
Randomised controlled trials “do not necessarily reflect real-world outcomes”
Randomised controlled trials are often designed by statisticians and physicians to fit what is logistically possible, and therefore do not necessarily reflect real-world outcomes, Salem said. He used several examples to illustrate this point. Firstly, he explained that he suspects there is “a little bit of doctoring” going on when choosing the sample size for a randomised controlled trial: “Interestingly, if you look at many of these trials that we reported, sample size is 300, 400, 600; it is almost always an even number”. The sample size is the result of back and forth between the principal investigator and the statistician, and is, in theory, calculated from the initial hypothesis and the DELTA (difference elicitation in trials), that is, the size of the difference between arms that the trial is designed to detect. Salem explained that, in practice, determining the sample size is often a case of calculating how many patients a physician believes they may realistically be able to enrol in a trial, and then “reverse engineering” a hypothesis and DELTA from that number (though he conceded he is “a little bit cynical”).
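To make the two directions of that calculation concrete, the following is a minimal sketch, not the method of any trial discussed here. It assumes a binary endpoint (for example, response rate), a two-sided test at a significance level of 0.05, 80% power, and the textbook normal-approximation formula for comparing two proportions. The function names and the example rates are hypothetical.

```python
from math import ceil
from statistics import NormalDist


def n_per_arm(p_control, p_treatment, alpha=0.05, power=0.80):
    """Forward direction: patients per arm needed to detect the stated
    difference between two proportions (normal approximation)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)   # controls type 1 error
    z_beta = z.inv_cdf(power)            # controls type 2 error
    variance = p_control * (1 - p_control) + p_treatment * (1 - p_treatment)
    delta = p_treatment - p_control
    return ceil((z_alpha + z_beta) ** 2 * variance / delta ** 2)


def detectable_delta(n, p_control, alpha=0.05, power=0.80):
    """Reverse direction: given a feasible n per arm, find by bisection the
    smallest improvement over p_control that the trial could claim to detect."""
    lo, hi = 0.0, 1.0 - p_control
    for _ in range(60):
        mid = (lo + hi) / 2
        if n_per_arm(p_control, p_control + mid, alpha, power) > n:
            lo = mid   # difference too small for this n
        else:
            hi = mid   # difference detectable; try a smaller one
    return hi


# Hypothetical rates: 30% in the control arm, 45% hoped for with treatment.
print(n_per_arm(0.30, 0.45))
# Hypothetical feasibility limit: 200 patients per arm.
print(round(detectable_delta(200, 0.30), 3))
```

The first function starts from the hypothesis and returns a sample size; the second starts from the number of patients an investigator expects to enrol and returns the difference that would have to be hypothesised to justify it, which is the "reverse engineering" Salem described.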
In addition to this alleged data engineering, Salem pointed to narrow inclusion criteria as a further problem. Excluding so many patients slows recruitment, he said, which means that by the time enough patients have been enrolled and followed up, and the data have been analysed, the findings are no longer timely. According to Salem, there is “about a 90% non-enrolment rate”; he extrapolated from this that “the findings of a randomised study are [therefore] not applicable to 90% of the patients that you see, so it is a very strict patient population you are making conclusions about”.
Recruitment can also be an issue when enrolling for large, multicentre trials, Salem explained, due to the biases and beliefs individual institutions hold. In the mid-2000s, Salem and colleagues were designing a multicentre trial to compare Y-90 with radiofrequency ablation (RFA) and TACE RFA. However, what was initially intended as a six-centre clinical trial devolved into a single-centre trial of 45 patients. The investigators could not enrol enough patients in each of the three arms (Y-90, TACE and TACE RFA) because of physician and institutional preference for certain procedures. Nevertheless, Salem used the data derived from the beginnings of this trial to redefine the institutional standard of care at Northwestern Memorial Hospital to Y-90.
Using another example from his own work in liver cancer, Salem explained that the STOP-HCC trial, of which he is the global principal investigator, further demonstrates the difficulties of recruitment in a randomised controlled trial. The initial vision was to have STOP-HCC as an open-label, prospective, multicentre, randomised, phase III trial evaluating Y-90 radioembolization in the treatment of patients with unresectable HCC. “In 2010, when we started the trial, all we had was sorafenib, and we wanted to see if this would add to the standard of care”, Salem said at GEST. “This started as a 400 patient clinical trial; it is now eight years later, the technique has improved, there are seven other drugs approved, and our sample size is now 600. While I hope it will have an effect on standards of care, I am not sure what the impact of this trial will be, whether it is positive or negative.”
Overall survival “is not a good outcome for randomised trials with effective subsequent therapies”
Advising the interventional radiologists in the audience on how he believes they should devise trials to further their field, Salem cautioned against using overall survival as a primary endpoint when the patient will receive multiple interventions. Quoting statisticians writing in the Journal of Clinical Oncology, Salem said that “survival is not a good outcome for randomised trials with effective subsequent therapies”. He elaborated: “We cannot attribute overall survival to the initial treatment, say TACE or radiofrequency ablation (RFA), if later on the patient had other interventions. After a patient progresses, they may be given regorafenib, lenvatinib, nivolumab, cabozantinib, ramucirumab. This gets very complicated.”
Indeed, the Barcelona Clinic Liver Cancer (BCLC) staging system (stages A/B) is predominantly predicated on level 2 data without data on overall survival. “It is too complicated to do overall survival studies, and there is a prohibitive statistical barrier”, Salem explained. “We must use imaging surrogates for BCLC A/B studies, such as time to progression and progression-free survival. Overall survival is possible in end-stage care, but really for everything else, we have to use imaging.”
He emphasised this point in his concluding remarks, reiterating: “I think in our quest to improve overall survival, we have let the statisticians rule a little too much, and we need to push back a little bit, and get a bit more clinical relevance in our clinical trials. For two therapies that might provide the same overall survival, we need to find surrogates that matter and reject the trope that overall survival is the supreme endpoint”.
Designing trials to push practice in interventional radiology
To combat these shortcomings, Salem made two proposals to the GEST audience for how interventional radiologists should go about designing trials in future.
Firstly, he suggested physicians conduct an initial, exploratory analysis. This would be an 80-patient, randomised, phase II study. “Forty versus 40, this bead versus that bead, this probe versus that probe”, Salem described. “Whatever it is you are investigating, you can compare it to a different option and make sure there is a difference you are identifying between the two.” If the investigators are aiming to change the standard of care, Salem suggested using this same model, but for 100 versus 100 patients: “If you cannot determine a difference in 100 patients versus 100 patients, there probably is not much there that is clinically relevant in the IR space”, he said. “Then you do not have to get a patient population of 1,000 and have a negative clinical trial.”
Salem’s second proposal to the GEST attendees was that investigators largely stop using overall survival as an endpoint, and use a clinically meaningful alternative instead. “I personally think response is a very important one”, he suggested. He explained: “Imagine you have three treatments. One gives you stable disease, one gives you minor response, and one gives you major response. All three of those treatments have the same time to progression and the same progression-free survival. But if you are a believer in the lethal tumour model in the liver, then the person with the best response rate is the person furthest away from the lethal tumour load line. So, everything else being the same, the best response becomes a relevant and important concept.”
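As a rough illustration of what trials of this size can and cannot show, the sketch below estimates the statistical power of a 40 versus 40 and a 100 versus 100 comparison. It is a sketch under stated assumptions, not a calculation Salem presented: the endpoint is assumed to be binary (such as response rate), the test is a two-sided comparison of proportions at a significance level of 0.05 using a normal approximation, and the 30% and 50% rates are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def power_two_proportions(n_per_arm, p_a, p_b, alpha=0.05):
    """Approximate power of a two-sided test comparing two proportions
    with n_per_arm patients in each arm (normal approximation; the
    negligible contribution of the opposite tail is ignored)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    standard_error = sqrt((p_a * (1 - p_a) + p_b * (1 - p_b)) / n_per_arm)
    return z.cdf(abs(p_a - p_b) / standard_error - z_alpha)


# Hypothetical response rates: 30% for one device, 50% for the other.
for n in (40, 100):
    print(n, "per arm:", round(power_two_proportions(n, 0.30, 0.50), 2))
```

Under these assumptions, the smaller design is suited to spotting only large differences, while the larger one has a better chance of confirming a difference of that size; smaller differences would need more patients than either design provides.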
He concluded: “I think we need to limit the amount of weight we give randomised controlled trials, and be careful not to prematurely and quickly discount other types of rigorously developed evidence. I think physicians rely on randomised controlled trials when supportive of their position, but interestingly ignore them when they do not; that is clear, we see that all the time.
“Do not accept in isolation the fact that we have limited level 1 randomised controlled trial data [in IR]. Take everything in totality—phase II, big data, randomised controlled trials—process, triangulate, and carry this knowledge to tumour board. You are certainly not alone in relying too heavily on phase II data, and please, whatever you are confronted with, do not let yourself be bullied by the tyranny of the randomised controlled trial.”