What is "artificial intelligence" in "drug discovery"?
A critique of a recent review on AI in drug discovery, and some guidelines
There’s that old joke about an opinion poll that asked people in different countries, “What do you think about the shortage of food in the rest of the world?” Every country’s citizens answered based on their own political and social worldview, interpreting even common words like “food” and “think” differently.
A recent review on the uses of AI in drug discovery from J. Med. Chem. reminded me of this joke and illustrated a bigger point about why people who talk about “AI in drug discovery” often end up talking past each other and why it’s difficult to measure progress in the field. The review lays out several examples of applications across both computational and experimental drug discovery where AI was ostensibly used. But if you take a look at these applications, each uses its own definition of AI, and some don’t use AI at all. Let’s charitably say that the authors’ conclusion that “Drug discovery and development is the most advanced area of AI” is at best premature.
The purported applications of AI the review points to can be roughly divided into four categories: the well-known use of AI for optimizing protein and mRNA sequences and structures; the equally well-known use of AI, and deep learning in particular, for analyzing images of cells in phenotypic and high-content screening (HCS); the use of AI for optimizing reaction conditions in high-throughput experimentation (HTE); and the use of AI for crystal-structure prediction, with applications in formulation.
The problem is that putting all these studies in an expansive bucket called “AI” might well lead onlookers to think that the paradigm is impacting all these areas equally and substantially. But each area yields different payoffs, and in most cases the methods are so varied as to render the AI label dubious. For instance, there’s no doubt that protein folding has been greatly impacted by what we can legitimately call AI, although its relevance to the overall process of drug development is slim. The review’s example of polymorph crystal structure prediction for nirmatrelvir doesn’t use AI at all but standard physics-based techniques (free energy calculations, molecular dynamics, etc.). Another lead discovery example, while a legitimate use of AI, lacks novelty. The mRNA sequence optimization paper uses linear parsing, an optimization protocol that goes back to the 1960s and can also hardly be called AI. HCS and reaction optimization legitimately use some form of AI, deep learning in particular, although again, their impact on the overall drug development process is small.
Taken together, these examples do not make the case that drug development is the most advanced application of AI; in fact, one might argue that it is one of the least advanced, not in terms of frequency of use but in terms of impact. That of course presents an exciting opportunity, but then it’s important to frame it that way.
None of this is to say that the techniques described in the review are useless; like other drug discovery techniques, they have their place as niche tools. I have productively used generative AI myself, but only when I used other techniques to parse the haystack of results - many chemically unrealistic - coming out of it. That is consistent with the general ethos of drug discovery as a Swiss army knife rather than a single gleaming dagger, where ideas and techniques work best when they synergistically piggyback on each other; in that scenario AI can be a powerful technique in the right context. In fact, what’s true for AI has been true for every other technique in drug discovery: for the right target, the right stage of the process, the right problem, a technique can be very useful; who can deny the value of structure-based design and modeling for the right targets?
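To give a sense of what that parsing can look like, here is a minimal sketch in Python of the kind of sanity filter one might run over a generative model’s raw output before taking any molecule seriously. It assumes RDKit is available; the SMILES strings and property cutoffs are hypothetical placeholders, not recommendations.

```python
# Minimal sketch: triaging generative-model output with crude sanity checks.
# Assumes RDKit is installed; the SMILES and cutoffs are hypothetical.
from rdkit import Chem
from rdkit.Chem import Descriptors

# Pretend these came out of a generative model; some are unrealistic or invalid.
generated_smiles = [
    "CCOC(=O)c1ccccc1N",   # plausible small molecule
    "C1CC1C(=O)N(C)C",     # plausible, but very small
    "C(=O)(=O)(=O)N",      # chemically unrealistic (fails valence checks)
    "not_a_smiles",        # outright garbage
]

def passes_sanity_checks(smiles: str) -> bool:
    """Reject strings RDKit cannot parse or sanitize, then apply crude property gates."""
    mol = Chem.MolFromSmiles(smiles)  # returns None on parse/sanitization failure
    if mol is None:
        return False
    mw = Descriptors.MolWt(mol)
    logp = Descriptors.MolLogP(mol)
    # Hypothetical lead-like gates; a real project would layer on much richer
    # filters (structural alerts, synthesizability scores, med chem review).
    return 150 <= mw <= 500 and -1 <= logp <= 5

survivors = [s for s in generated_smiles if passes_sanity_checks(s)]
print(f"{len(survivors)}/{len(generated_smiles)} molecules survive the crude filter")
```

The point is not the specific cutoffs but the workflow: the generative model proposes, and other, often much simpler techniques dispose.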
But calling everything from linear regression to physics-based optimization “AI” does the field a disservice: it makes it harder to tease apart what’s legitimate, what’s working and what’s not, and it increases the temptation to blindly apply AI to every problem in drug discovery. It also misleads outsiders and investors who are already skeptical of new technology. We already live in a hype-fueled AI cycle, so it’s particularly important for all of us - the scientific community, investors and publication gatekeepers - to separate the wheat from the chaff.
So how should we report and evaluate claims of using “AI in drug discovery”? Some commonsense guidelines come to mind.
1. Explicitly say what the “AI” in the claim means. There’s clearly a qualitative difference between a simple linear regression model and a multi-layer deep learning model, or between a simple knowledge graph and a transformer-based LLM. If you are going to claim that AI was used in your drug discovery project, make it clear as day what the word means. Press releases breathlessly announcing the effort bear a particular responsibility to describe these details accurately; “AI discovered a drug” is about as accurate as saying “A crampon climbed Mount Everest”.
2. Make it very clear what exactly your AI did. Did it discover leads quickly? Did it improve their properties? If so, what properties? Did it discover diverse scaffolds? Since every technique in drug discovery fulfills a certain goal, it is key to describe what that successful but modest goal was. Saying that a drug was “discovered using AI” raises more questions than it answers about exactly what stage and what problem the AI was addressing.
3. Just like it takes a village to raise a child, it takes a village to discover and develop a drug. A big problem - perhaps the cardinal problem - in evaluating the uses of AI in drug development is that AI is always combined with multiple other techniques, including good old med chem intuition. This makes it difficult, if not impossible, to tease apart the exact roles the individual techniques played, and it makes the “drug discovered using AI” pitch a dubious one. That’s where point #2 above can help: if you accurately describe what the technique did, people can appreciate its utility without needing to swallow any grand claims.
4. How does AI compare to other, simpler techniques in solving your problem? This point is admittedly a difficult one to address, since controlled, A/B-testing-style experiments are almost never practical in a fast-paced drug discovery project. Nevertheless, practitioners of AI need to face the sobering fact that their new tools are up against almost 40 years of computer-aided drug design tools, none of which are magic fixes and all of which are effective within a restricted domain of applicability. At least some validation studies need to tell us why their techniques are better than any number of existing methods, including basic structure-property and SAR models. What are the advantages, and what are the practical metrics by which these advantages can be assessed? (A minimal sketch of such a baseline comparison follows this list.)
5. A few years ago, J. Med. Chem. asked authors to submit computational papers only if there was experimental follow-up. The same should be done for AI papers. Especially in an age when we are drowning in a surfeit of theoretical techniques and papers, it’s the studies that follow up predictions with solid experimental data that are going to stand out. And these aren’t exactly common; a recent review found only eight papers out of fifty-five with experimental validation.
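To make point #4 concrete, here is a minimal sketch in Python, using scikit-learn, of the kind of baseline comparison a validation study could report: cross-validated error of the headline model next to that of a simple regularized linear model on the same descriptors. The data and the stand-in “AI” model are entirely hypothetical.

```python
# Minimal sketch: benchmarking a claimed "AI" model against a simple baseline.
# Random placeholder data stand in for molecular descriptors and measured
# activities; swap in real features and the actual model under evaluation.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.ensemble import RandomForestRegressor  # stand-in for the "AI" model
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 20))                                 # hypothetical descriptors
y = X[:, 0] - 0.5 * X[:, 1] + rng.normal(scale=0.5, size=200)  # hypothetical activity

models = [
    ("ridge baseline", Ridge(alpha=1.0)),  # simple structure-property/SAR-style model
    ("claimed AI model", RandomForestRegressor(n_estimators=200, random_state=0)),
]

for name, model in models:
    scores = cross_val_score(model, X, y, cv=5, scoring="neg_root_mean_squared_error")
    print(f"{name}: RMSE = {-scores.mean():.2f} +/- {scores.std():.2f}")
```

If the gap between the two rows is within the noise, the “AI” label is doing no real work in that study; if it is large and reproducible, that is exactly the kind of practical metric worth reporting.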
The review does make a good point about incorporating some of these techniques, like generative adversarial networks (GANs), into the early education of both drug discovery students and early-stage drug discovery scientists. That would be a good thing, but what would be even better would be to teach students and professionals to critically evaluate these techniques, to place them in their proper context, and to lay out clearly where they do and do not work. Because if we keep saying everything is AI and keep saying it’s revolutionizing drug discovery, we risk getting to a point where people will start believing nothing is AI. And that would be worse than not using it.