Jennifer Bradford and Matthew Metherell consider the success and potential of artificial intelligence in clinical research and the impact of machine learning on clinical trials

In recent years, the increase in accessible, large-scale computing power and storage has ushered in a new dawn for artificial intelligence (AI), with hardware finally catching up with long-established algorithms to bring machine learning (ML) into practical use. In this article, we discuss some of the key applications of ML that have shown success in clinical research. Moreover, we consider how machine learning is impacting clinical trials, utilising decades of structured clinical trial data alongside real-world data (RWD) and other valuable data sources to support clinical trial design, execution and analysis. Combining computational skills and drug development experience, data science teams can support the pharma and biotech industry to generate business value through the application of machine learning.

Successful ML requires extensive quality data

For ML algorithms to be successful they require large, quality data sets for their application. An ML algorithm utilises both a training and test set of data; the training set is example data used to fit the ML model, while the test set is previously unseen data used to evaluate its performance.
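As a minimal sketch of this split, using the widely used scikit-learn library and synthetic placeholder data (not real clinical measurements):

```python
# Illustrative train/test workflow: fit on one portion of the data,
# evaluate on data the model has never seen.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic placeholder data standing in for clinical measurements.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# Hold out 25% of the records as a test set, unseen during fitting.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

model = RandomForestClassifier(random_state=0)
model.fit(X_train, y_train)  # fit on the training set only

# Evaluate performance on the previously unseen test set.
score = accuracy_score(y_test, model.predict(X_test))
```

The key discipline the split enforces is that performance is always reported on data the model did not see during training, guarding against overfitting.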

There are numerous initiatives across healthcare and clinical research aimed at facilitating access to and sharing of data. For example, in the US, the HIPAA law allows healthcare companies to use any relevant data without needing to reapply for each use case, provided it is for the purpose of improving patient care; the data of patients on Medicare is also available. In the UK, NHSX was established in 2019 to oversee the digital transformation of the NHS, with one aim being to bring together NHS data in a meaningful and useable way.

As the amount of potential data sources to support clinical research continues to grow, it is important that data used for ML is factually sound, meaningful and facilitates decisions in the real world. Poor quality data may bias results, cause limited engagement in the results through lack of trust and can lead to incorrect decision-making which could critically affect people’s health and safety. A Contract Research Organisation (CRO) will have experience working with clinicians and scientists to formulate specific questions and identify appropriate data sets to address them, as well as expertise in processing, integrating and analysing diverse data sets to maximise the value of data.

Leveraging ML in clinical trials

Clinical trials can take years to complete and cost millions of pounds. Thus, any efficiency improvements could offer massive savings in time and money. One idea is to run preclinical data through an ML platform that allows earlier identification of the demographics most likely to respond to a drug, and of the biomarkers that show the most promise for predicting patient response, thereby refining both the compound and the trial design.

ML-based predictive analytics are also being used in recruitment and retention activities, as well as for patient engagement. For recruitment, identifying the right candidates at a faster rate can accelerate R&D timelines. The ultimate success of a trial hinges on keeping patients engaged. This becomes increasingly important as healthcare providers expand the use of health IT, including apps and wearables, to manage patient health.

To counteract the excessive duration and cost of drug development, the US Food and Drug Administration sponsored the enactment of the 21st Century Cures Act, which was signed into law in December 2016. The law is designed to help accelerate medical product development and bring innovations and modernisation to clinical trial designs and clinical outcome assessments to speed up the development and review of medical products. The FDA, although encouraging an increase in observational studies utilising real-world evidence, is in the process of developing standards and methodologies for their use. This would allow ML analysis of larger data sets, eg the interpretation of data from wearables, electronic health records or medical claims databases.

One application of this is in the development of synthetic control arms, which model comparators using previously collected data from sources such as electronic healthcare records, claims data, fitness trackers, disease registries and historical clinical trials. This approach has the potential to decrease trial size and duration thereby reducing costs, and can also incentivise patient participation.
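A simplified sketch of how a synthetic control arm might be assembled: historical patient records are filtered to the trial's eligibility criteria and their outcomes summarised as an external comparator. The column names and criteria here (age, biomarker, outcome) are hypothetical placeholders, not a real trial's data.

```python
import pandas as pd

# Hypothetical historical patient records (eg from registries or past trials).
historical = pd.DataFrame({
    "age":       [54, 61, 47, 70, 58, 66],
    "biomarker": [1.2, 0.8, 2.1, 0.9, 1.5, 1.1],
    "outcome":   [0, 1, 0, 1, 1, 0],  # eg event observed within follow-up
})

# Apply the trial's (hypothetical) inclusion criteria to the historical data.
eligible = historical[historical["age"].between(50, 68) &
                      (historical["biomarker"] < 2.0)]

# Event rate in the synthetic control arm, to set against the treated arm.
control_event_rate = eligible["outcome"].mean()
```

In practice, far more sophisticated methods (eg propensity score matching) are used to balance the synthetic arm against the treated population, but the principle of reusing previously collected data as a comparator is the same.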

Real-world data may also be used alongside multi-omic mapping of patients receiving an investigational product to create a ‘digital twin’: an in silico representation of a specific individual reflecting his or her physiological and molecular status as well as lifestyle over time. Digital twins could be used to understand, for example, what would have happened to the individual if he or she had received placebo or standard of care. This approach has already shown utility in healthcare: a ‘digital twin’ of a patient’s heart, for example, can be created from an individual’s medical data to model its unique characteristics. Clinicians can then use the model to test different treatment options for a given patient by comparing possible outcomes without any real risk to the patient.

Case in point

An example application of ML in clinical trials is the utilisation of historical data to guide clinical study design. The study team wanted to understand whether there were any biomarkers or features of the clinical data – such as demographics, vital signs and laboratory measurements – from previous studies that were predictive of a specific clinical event.

Following extraction and processing of the relevant clinical data, statistical and visualisation approaches enabled the team to look in detail at the data, such as data consistency, missing data and outliers, to ensure a full understanding of the data prior to ML. Random forest and gradient boosting methods were applied through a cross-validation approach, ensuring the methods were tested using previously unseen data. The predictive power, precision and recall of the different methods were analysed. Variable importance described how the different features in the data contributed towards the predictor, ensuring that the output was not just a ‘black-box’ predictor but provided insights into which variables mattered. It was then possible to look at the most interesting variables in more detail using visualisation techniques.
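The modelling step described above can be sketched with scikit-learn: random forest and gradient boosting compared under cross-validation, then variable importance inspected so the result is not a black box. The data here are synthetic placeholders, not the study's actual clinical variables.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for demographics, vital signs and lab measurements.
X, y = make_classification(n_samples=400, n_features=8, n_informative=3,
                           random_state=1)

models = {
    "random_forest": RandomForestClassifier(random_state=1),
    "gradient_boosting": GradientBoostingClassifier(random_state=1),
}

# 5-fold cross-validation: each fold is scored on data the model
# did not see during fitting.
scores = {name: cross_val_score(m, X, y, cv=5, scoring="precision").mean()
          for name, m in models.items()}

# Variable importance: which features drive the prediction?
rf = models["random_forest"].fit(X, y)
top_feature = int(np.argmax(rf.feature_importances_))
```

Recall and other metrics can be compared the same way by changing the `scoring` argument, and the ranked importances point the team to the variables worth examining visually.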

The results of this ML approach were used as a source of evidence for the clinical team to support its decision-making during the design of the next study, providing efficiencies and cost savings.

Pharma R&D: a potential treasure trove of data

An established pharma company may have a vast R&D database containing years of data from clinical trials, lab experiments and more. This data contains potential insights, at the patient level, waiting to be uncovered by different approaches, for example, by NLP, which can sift through previous research documents for findings that are relevant to current research or by ML approaches across data pooled from many clinical studies.
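As a hedged sketch of the NLP idea, historical documents can be ranked by TF-IDF similarity to a current research question; the documents and query below are invented placeholders, and real pipelines would work over full document archives with richer language models.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical snippets from an internal R&D document archive.
documents = [
    "Phase II study of biomarker response in oncology patients",
    "Stability testing of tablet formulation under humidity stress",
    "Retrospective analysis of cardiovascular adverse events",
]
query = "biomarker response in cancer trial"

# Represent each document as a TF-IDF vector, ignoring English stop words.
vectoriser = TfidfVectorizer(stop_words="english")
doc_matrix = vectoriser.fit_transform(documents)
query_vec = vectoriser.transform([query])

# Rank documents by cosine similarity to the query; 'best' is the index
# of the most relevant historical document.
similarities = cosine_similarity(query_vec, doc_matrix)[0]
best = int(similarities.argmax())
```

The same retrieval pattern scales from keyword matching to embedding-based search, letting researchers surface past findings relevant to a current question.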

Utilising this rich data source alongside other valuable real-world data is not without its challenges. An experienced team can pool together studies and integrate other data types, harmonising different versions of data dictionaries and standards. In addition, they should be proficient in applying ML across clinical and real-world data: identifying patterns to inform future trials and research, and generating business value by working alongside physicians and scientists.

The future of ML in clinical research

It is hard to overstate the potential for ML in any industry, particularly for clinical research. For many years, health IT was dominated by traditional healthcare companies, but with the race for the top spot in machine learning very much underway, the industry has opened up to the likes of Google, Apple, Facebook and Amazon, which have brought unprecedented levels of investment and innovation across all aspects of healthcare.
For instance, Google is applying AI capabilities in the areas of disease detection, data interoperability and health insurance: one example is the application of its DeepMind Health to differentiate between healthy and cancerous tissue to improve radiation treatment.

Former FDA Commissioner Dr Scott Gottlieb recently noted that new streams of real-world data gathered directly from electronic health records and other data sources, paired with advances in ML, will be crucial for creating the next generation of clinical trials. He also stressed the importance of modernising the clinical trial process to take advantage of this data as well as Internet-of-Things devices, claims, lab tests, wearable devices and even social media.

“Digital technologies are one of the most promising tools we have for making healthcare more efficient and more patient-focused,” Gottlieb said. “This isn’t an indictment of the randomised controlled trial. Far from it. It’s a recognition that new approaches and new technologies can help expand the sources of evidence that we can use to make more reliable treatment decisions. And it’s a recognition that this evidence base can continue to build and improve throughout the therapeutic life of an FDA-approved drug or medical device.”

While these new streams of real-world data can radically improve the efficiency of clinical trials, it is crucial that data science teams supporting clinical trials be highly skilled and adept at delivering high-quality advice and results, not only for standard clinical trials, observational data and studies, but also to help businesses understand what is possible in this new frontier.

Jennifer Bradford, PhD, is head of Data Science, and Matthew Metherell is a senior programmer, at the CRO PHASTAR