Drug safety has been, and will remain, a critical concern for pharmaceutical companies, regulators, health authorities and patients. Although most pharmaceutical stakeholders understand that any medication or treatment encompasses some amount of risk, the goal is always to develop a comprehensive understanding of that risk in order to appropriately mitigate it.

Consequently, an essential question for drug developers becomes: how can we effectively predict risk? This simple question is the focus of many research initiatives and is being addressed at several levels – drug target, molecule, patient and population. More recently, substantial advances in artificial intelligence (AI) and machine learning (ML) have brought the pharmaceutical industry a surge of models intended to enable better risk prediction.

For drug developers, one big challenge lies in optimising these AI- and ML-based models by feeding them the right data, which is no simple task. The difficulty stems from the proliferation of health data across the industry in recent years: healthcare organisations have seen health data grow by 878% since 2016, according to Dell EMC.

This surge of health data has made it nearly impossible for humans to properly analyse data before, during and after clinical trials without leveraging technology. To help manage it, more pharmaceutical companies are turning to natural language processing (NLP) technology, which mines unstructured text documents and converts their content into structured information that can be analysed by a computer.
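
To make the idea of turning free text into structured records concrete, here is a deliberately simplified sketch in Python. It is not how any commercial NLP product works: it only looks for a drug term and an adverse-event term in the same sentence, whereas production systems rely on curated ontologies and linguistic analysis. The drug names, term lists and function name are invented for illustration.

```python
import re

# Hypothetical mini-vocabularies, invented for this sketch.
# A real system would draw on curated ontologies instead.
DRUG_TERMS = ["examplamab", "samplinib"]
EVENT_TERMS = ["neutropenia", "hepatotoxicity"]


def extract_drug_event_pairs(text):
    """Return one record per drug/event pair that co-occurs in a sentence."""
    records = []
    # Naive sentence split: break after '.', '!' or '?' followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    for sentence in sentences:
        lowered = sentence.lower()
        drugs = [d for d in DRUG_TERMS
                 if re.search(rf"\b{re.escape(d)}\b", lowered)]
        events = [e for e in EVENT_TERMS
                  if re.search(rf"\b{re.escape(e)}\b", lowered)]
        for drug in drugs:
            for event in events:
                # Keep the source sentence so a reviewer can check the evidence.
                records.append(
                    {"drug": drug, "event": event, "evidence": sentence}
                )
    return records


sample = (
    "Severe neutropenia was reported in patients receiving Examplamab. "
    "Samplinib was well tolerated. "
    "Hepatotoxicity led to discontinuation of samplinib in two cases."
)

for row in extract_drug_event_pairs(sample):
    print(row["drug"], "|", row["event"], "|", row["evidence"])
```

Each record keeps the sentence it came from, so the structured output can always be traced back to the underlying text.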

When used to scour scientific and clinical literature and other sources for deeper information on drugs, targets and diseases, for example, NLP can help pharmaceutical companies develop risk models capable of predicting adverse events with a high level of precision, which in turn helps them speed drug development and reduce costs.

NLP feeds predictive adverse event models
The US Food and Drug Administration (FDA) is working to develop models that leverage post-market safety data to predict adverse events for new drugs coming to market. In two papers published in 2020, researchers detail how AI and ML tools such as NLP, combined with ensemble models and classification algorithms, contribute to these models. Both papers build on a previous pilot study of six drugs, which demonstrated that pharmacological target adverse-event profiles, based on marketed drugs, can be used to predict unlabelled adverse events for a new drug at the time of approval.

The first paper, published in BMC Bioinformatics in April, advances the pilot’s research by adopting additional features in its ML model, such as structural similarity, target similarity, and time on market, as well as profiles of adverse events from FDA drug labels and clinical literature.

These features were used to train a Naïve Bayes classifier with 10,000 bootstrapping steps. This approach predicted 53 serious adverse events with high positive predictive values where well-characterised target-event relationships existed. However, adverse events that may be idiosyncratic or related to secondary target effects were not as well predicted.
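
For readers unfamiliar with the technique, the following Python sketch shows one common way a Naïve Bayes classifier can be combined with bootstrapping: the training set is resampled with replacement many times, a model is fitted to each resample, and the predicted probabilities are averaged. It does not reproduce the paper's method, features or data. The three binary features, the toy training rows and all function names are assumptions made purely for illustration.

```python
import math
import random


def train_bernoulli_nb(rows, labels):
    """Fit a Bernoulli Naive Bayes model with Laplace (add-one) smoothing."""
    n_features = len(rows[0])
    model = {}
    for cls in (0, 1):
        cls_rows = [r for r, y in zip(rows, labels) if y == cls]
        prior = (len(cls_rows) + 1) / (len(rows) + 2)
        probs = [
            (sum(r[j] for r in cls_rows) + 1) / (len(cls_rows) + 2)
            for j in range(n_features)
        ]
        model[cls] = (prior, probs)
    return model


def predict_proba(model, row):
    """Return the model's estimate of P(label = 1 | row)."""
    log_scores = {}
    for cls, (prior, probs) in model.items():
        score = math.log(prior)
        for x, p in zip(row, probs):
            score += math.log(p if x else 1 - p)
        log_scores[cls] = score
    top = max(log_scores.values())  # subtract the max for numerical stability
    weights = {c: math.exp(s - top) for c, s in log_scores.items()}
    return weights[1] / (weights[0] + weights[1])


def bootstrap_predict(rows, labels, new_row, steps=1000, seed=0):
    """Average the prediction over models fitted to bootstrap resamples."""
    rng = random.Random(seed)
    n = len(rows)
    total = 0.0
    for _ in range(steps):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        model = train_bernoulli_nb(
            [rows[i] for i in idx], [labels[i] for i in idx]
        )
        total += predict_proba(model, new_row)
    return total / steps


# Invented toy data. Assumed features per drug/event pair:
# [shares target, structurally similar, event on a similar drug's label]
rows = [[1, 1, 1], [1, 0, 1], [1, 1, 0], [0, 1, 1],
        [0, 0, 0], [0, 0, 1], [0, 1, 0], [1, 0, 0]]
labels = [1, 1, 1, 1, 0, 0, 0, 0]  # 1 = event later confirmed (invented)

high = bootstrap_predict(rows, labels, [1, 1, 1])
low = bootstrap_predict(rows, labels, [0, 0, 0])
print(f"all signals present: {high:.2f}, no signals present: {low:.2f}")
```

Averaging over resamples gives a steadier estimate than a single model fitted to a small dataset, which is one reason bootstrapping is often paired with simple classifiers.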

The second paper, published in Clinical Pharmacology and Therapeutics in October, represents another worthwhile addition to this body of research. In this study, researchers tapped data from three key sources – adverse event reports, peer-reviewed literature, and FDA drug labels – to extract features for target-adverse event profiles. Then, these features were fed into an ensemble machine learning model that used the data to link drugs to drug targets, enabling a new level of risk prediction for new drugs targeting the same protein.

Commercial examples of adverse-event predictive models
In recent years, many global pharmaceutical companies have begun to use NLP to more efficiently uncover insights that enable better risk-prediction models. For example, researchers on AstraZeneca’s clinical biomedical informatics team sought evidence to understand the landscape of drug candidates associated with the risk of neutropenia, a condition characterised by an unusually low number of white blood cells called neutrophils.

The challenge the researchers faced, however, was that data on drugs reported to cause neutropenia in humans was generally buried in scientific literature and other textual sources. To overcome this barrier, AstraZeneca used NLP to mine clinical literature, such as scientific abstracts and curated clinical trial reports, to extract relevant data that was ultimately used to feed predictive models. The analysis helped the company build models that let researchers better assess the potential toxicity of a drug candidate and make a more informed decision about whether to continue investigating the drug by moving from animal studies to human trials.

Separately, Pfizer safety researchers needed to mine clinical literature for data on target-safety links related to a broad range of diseases and determined that a manual review would be too laborious. Leveraging NLP, Pfizer performed a literature review that produced a “toxico-matrix”. This delivered a picture of the target-safety landscape in the form of an easy-to-use tabular view that categorised the data by target and featured links to underlying evidence. The result was a significant increase in the quality of NLP-driven safety analysis compared with standard keyword searches. This more systematic approach to risk prediction provided Pfizer researchers with a single comprehensive overview of the best targets for a therapeutic approach, from a safety perspective.
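
A tabular view of this kind can be pictured as a simple pivot of extracted findings, with one row per target, one column per safety finding and each cell holding references to the supporting evidence. The Python sketch below illustrates that general idea only. It is not Pfizer's implementation, and the target names, findings, reference IDs and function names are all hypothetical.

```python
from collections import defaultdict

# Hypothetical extracted findings: (target, safety finding, evidence reference).
findings = [
    ("TARGET_A", "neutropenia", "ref-001"),
    ("TARGET_A", "neutropenia", "ref-002"),
    ("TARGET_A", "hepatotoxicity", "ref-003"),
    ("TARGET_B", "hepatotoxicity", "ref-004"),
]


def build_matrix(findings):
    """Group evidence references by target, then by safety finding."""
    matrix = defaultdict(lambda: defaultdict(list))
    for target, event, ref in findings:
        matrix[target][event].append(ref)
    return matrix


def render(matrix):
    """Render the matrix as tab-separated text, one row per target."""
    events = sorted({e for row in matrix.values() for e in row})
    lines = ["\t".join(["target"] + events)]
    for target in sorted(matrix):
        # Show '-' where no evidence links a target to a finding.
        cells = [",".join(matrix[target].get(e, [])) or "-" for e in events]
        lines.append("\t".join([target] + cells))
    return "\n".join(lines)


print(render(build_matrix(findings)))
```

Because every cell carries its evidence references, a reviewer can move from the overview straight to the source documents behind any target-safety link.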

Because ensuring safety is such an indispensable part of drug development, pharmaceutical companies will always seek the latest innovations that can help them remove risk from the process. Today, those leading technologies include artificial intelligence, machine learning and natural language processing. As these tools improve over time, so will the safety of drug development.

Jane Z Reed is director, Life Sciences, at Linguamatics, an IQVIA company