The future of machine learning for small-molecule drug discovery will be driven by data

Durant, Guy; Boyles, Fergus; Birchall, Kristian; Deane, Charlotte M.

doi:10.1038/s43588-024-00699-0

Perspective
Published: 15 October 2024

Nature Computational Science (2024)

Abstract

Many studies have prophesied that the integration of machine learning techniques into small-molecule therapeutics development will help to deliver a true leap forward in drug discovery. However, increasingly advanced algorithms and novel architectures have not always yielded substantial improvements in results. In this Perspective, we propose that a greater focus on the data for training and benchmarking these models is more likely to drive future improvement, and explore avenues for future research and strategies to address these data challenges.

Fig. 1: Benchmark performance with respect to publication date.

Data availability

Source data for Fig. 1 is available with this paper.

Acknowledgements

This work was supported by funding from the Engineering and Physical Sciences Research Council (EPSRC) (grant number EP/S024093/1).

Author information

Authors and Affiliations

Department of Statistics, University of Oxford, Oxford, UK
Guy Durant, Fergus Boyles & Charlotte M. Deane
LifeArc, Stevenage, UK
Kristian Birchall

Contributions

G.D., F.B. and C.M.D. conceived the overall structure of the paper. G.D. wrote the paper. F.B., C.M.D. and K.B. reviewed and edited the paper.

Corresponding author

Correspondence to Charlotte M. Deane.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Computational Science thanks Diwakar Shukla and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Primary Handling Editor: Kaitlin McCardle, in collaboration with the Nature Computational Science team.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Source data

Source Data Fig. 1

Collated papers and ML models for CASF-2016, USPTO-50k and MoleculeNet HIV benchmarks.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Durant, G., Boyles, F., Birchall, K. et al. The future of machine learning for small-molecule drug discovery will be driven by data. Nat Comput Sci (2024). https://doi.org/10.1038/s43588-024-00699-0

Received: 01 March 2024
Accepted: 03 September 2024

Subjects

Access through your institution

Buy or subscribe

Fig. 1: Benchmark performance with respect to publication date.

Makurvet, F. D. Biologics vs. small molecules: drug costs and patient access. Med. Drug Discov. 9, 100075 (2021).
Article Google Scholar
Midlam, C. Status of Biologic Drugs in Modern Therapeutics-Targeted Therapies vs. Small Molecule Drugs 31–46 (Wiley, 2020).
Liu, Z. et al. An overview of PROTACs: a promising drug discovery paradigm. Mol. Biomed. 3, 46 (2022).
Article Google Scholar
Dong, G., Ding, Y., He, S. & Sheng, C. Molecular glues for targeted protein degradation: from serendipity to rational discovery. J. Med. Chem. 64, 10606–10620 (2021).
Article Google Scholar
Scannell, J. W., Blanckley, A., Boldon, H. & Warrington, B. Diagnosing the decline in pharmaceutical R&D efficiency. Nat. Rev. Drug Discov. 11, 191–200 (2012).
Article Google Scholar
Taylor, D. The pharmaceutical industry and the future of drug development. Pharm. Environ. https://doi.org/10.1039/9781782622345-00001 (2015).
Article Google Scholar
Wouters, O. J., McKee, M. & Luyten, J. Estimated research and development investment needed to bring a new medicine to market, 2009–2018. JAMA 323, 844–853 (2020).
Article Google Scholar
Blanco-Gonzalez, A. et al. The role of AI in drug discovery: challenges, opportunities, and strategies. Pharmaceuticals 16, 891 (2023).
Article Google Scholar
Ramesh, A. et al. Zero-shot text-to-image generation. In International Conference on Machine Learning 8821–8831 (PMLR, 2021).
Croitoru, F.-A., Hondru, V., Ionescu, R. T. & Shah, M. Diffusion models in vision: a survey. IEEE Trans. Pattern Anal. Mach. Intell. 45, 10850–10869 (2023).
Article Google Scholar
Bubeck, S. et al. Sparks of artificial general intelligence: early experiments with GPT4. Preprint at https://arxiv.org/abs/2303.12712 (2023).
Gozalo-Brizuela, R. & Garrido-Merchán, E. C. ChatGPT is not all you need. A State of the Art Review of large generative AI models. GRACE 1, 1 (2023).
Google Scholar
Jumper, J. et al. Highly accurate protein structure prediction with AlphaFold. Nature 596, 583–589 (2021).
Article Google Scholar
Bertoline, L. M., Lima, A. N., Krieger, J. E. & Teixeira, S. K. Before and after AlphaFold2: an overview of protein structure prediction. Front. Bioinform. 3, 1120370 (2023).
Article Google Scholar
Lipinski, C. F., Maltarollo, V. G., Oliveira, P. R., Da Silva, A. B. & Honorio, K. M. Advances and perspectives in applying deep learning for drug design and discovery. Front. Robot. AI 6, 108 (2019).
Article Google Scholar
Reymond, J.-L. The chemical space project. Acc. Chem. Res. 48, 722–730 (2015).
Article Google Scholar
Meyers, J., Fabian, B. & Brown, N. De novo molecular design and generative models. Drug Discov. Today 26, 2707–2715 (2021).
Article Google Scholar
Jiang, Y. et al. Artificial intelligence for retrosynthesis prediction. Engineering https://doi.org/10.1016/j.eng.2022.04.021 (2022).
Article Google Scholar
Sánchez-Cruz, N. Deep graph learning in molecular docking: advances and opportunities. Artif. Intell. Life Sci. 3, 100062 (2023).
Mitchell, J. B. O. Machine learning methods in chemoinformatics. Wiley Interdiscip. Rev. Comput. Mol. Sci. 4, 468–481 (2014).
McNutt, A. T. et al. GNINA 1.0: molecular docking with deep learning. J. Cheminform. 13, 43 (2021).
Zhu, H., Yang, J. & Huang, N. Assessment of the generalization abilities of machine-learning scoring functions for structure-based virtual screening. J. Chem. Inf. Model. 62, 5485–5502 (2022).
Wallach, I. & Heifets, A. Most ligand-based classification benchmarks reward memorization rather than generalization. J. Chem. Inf. Model. 58, 916–932 (2018).
Buttenschoen, M., Morris, G. M. & Deane, C. M. PoseBusters: AI-based docking methods fail to generate physically valid poses or generalise to novel sequences. Chem. Sci. 15, 3130–3139 (2024).
Mokaya, M. et al. Testing the limits of SMILES-based de novo molecular generation with curriculum and deep reinforcement learning. Nat. Mach. Intell. 5, 386–394 (2023).
Tran-Nguyen, V.-K., Jacquemard, C. & Rognan, D. LIT-PCBA: an unbiased data set for machine learning and virtual screening. J. Chem. Inf. Model. 60, 4263–4273 (2020).
Torren-Peraire, P. et al. Models matter: the impact of single-step retrosynthesis on synthesis planning. Digit. Discov. 3, 558–572 (2024).
Ivanenkov, Y. et al. The hitchhiker’s guide to deep learning driven generative chemistry. ACS Med. Chem. Lett. 14, 901–915 (2023).
Handa, K., Thomas, M. C., Kageyama, M., Iijima, T. & Bender, A. On the difficulty of validating molecular generative models realistically: a case study on public and proprietary data. J. Cheminform. 15, 112 (2023).
Harris, C. et al. PoseCheck: generative models for 3D structure-based drug design produce unrealistic poses. In NeurIPS 2023 Generative AI and Biology (GenBio) Workshop (2023).
Neves, B. J. et al. QSAR-based virtual screening: advances and applications in drug discovery. Front. Pharmacol. 9, 1275 (2018).
Yan, X. et al. Chemical structure similarity search for ligand-based virtual screening: methods and computational resources. Curr. Drug Targets 17, 1580–1585 (2016).
Abramson, J. et al. Accurate structure prediction of biomolecular interactions with AlphaFold 3. Nature 630, 493–500 (2024).
Pereira, J. et al. High-accuracy protein structure prediction in CASP14. Proteins 89, 1687–1699 (2021).
Su, M. et al. Comparative assessment of scoring functions: the CASF-2016 update. J. Chem. Inf. Model. 59, 895–913 (2019).
Lowe, D. M. Extraction of Chemical Structures and Reactions from the Literature. PhD thesis, Univ. Cambridge (2012).
Wu, Z. et al. MoleculeNet: a benchmark for molecular machine learning. Chem. Sci. 9, 513–530 (2018).
Mysinger, M. M., Carchia, M., Irwin, J. J. & Shoichet, B. K. Directory of useful decoys, enhanced (DUD-E): better ligands and decoys for better benchmarking. J. Med. Chem. 55, 6582–6594 (2012).
Francoeur, P. G. et al. Three-dimensional convolutional neural networks and a crossdocked data set for structure-based drug design. J. Chem. Inf. Model. 60, 4200–4215 (2020).
Vaswani, A. et al. Attention is all you need. In Proc. 31st International Conference on Neural Information Processing Systems 6000–6010 (ACM, 2017).
Kipf, T. N. & Welling, M. Semi-supervised classification with graph convolutional networks. In International Conference on Learning Representations (ICLR) (2017).
Merchant, A. et al. Scaling deep learning for materials discovery. Nature 624, 80–85 (2023).
Stokes, J. M. et al. A deep learning approach to antibiotic discovery. Cell 180, 688–702.e13 (2020).
Wong, F. et al. Discovery of a structural class of antibiotics with explainable deep learning. Nature 626, 177–185 (2023).
Jiang, D. et al. Could graph neural networks learn better molecular representation for drug discovery? A comparison study of descriptor-based and graph-based models. J. Cheminform. 13, 12 (2021).
Korolev, V., Mitrofanov, A., Korotcov, A. & Tkachenko, V. Graph convolutional neural networks as ‘general-purpose’ property predictors: the universality and limits of applicability. J. Chem. Inf. Model. 60, 22–28 (2020).
Geiger, M. & Smidt, T. e3nn: Euclidean neural networks. Preprint at https://arxiv.org/abs/2207.09453 (2022).
Satorras, V. G., Hoogeboom, E. & Welling, M. E(n) equivariant graph neural networks. PMLR 139, 9323–9332 (2021).
Scantlebury, J. et al. A small step toward generalizability: training a machine learning scoring function for structure-based virtual screening. J. Chem. Inf. Model. 63, 2960–2974 (2023).
Corso, G. et al. DiffDock: diffusion steps, twists, and turns for molecular docking. In International Conference on Learning Representations (2023).
Igashov, I. et al. Equivariant 3D-conditional diffusion model for molecular linker design. Nat. Mach. Intell. 6, 417–427 (2024).
Jing, B., Corso, G., Chang, J., Barzilay, R. & Jaakkola, T. Torsional diffusion for molecular conformer generation. In Proc. 36th International Conference on Neural Information Processing Systems article no. 1760, 24240–24253 (ACM, 2022).
Schneuing, A. et al. Structure-based drug design with equivariant diffusion models. Preprint at https://arxiv.org/abs/2210.13695v2 (2022).
Ho, J., Jain, A. & Abbeel, P. Denoising diffusion probabilistic models. Adv. Neural Inf. Process. Syst. 33, 6840–6851 (2020).
Reed, J., Alterio, B., Coblenz, H., O’Lear, T. & Metz, T. AI image-generation as a teaching strategy in nursing education. J. Interact. Learn. Res. 34, 369–399 (2023).
Yildirim, E. In Art and Architecture: Theory, Practice and Experience 97 (2022).
Azuaje, G. et al. Exploring the use of AI text-to-image generation to downregulate negative emotions in an expressive writing application. R. Soc. Open Sci. 10, 220238 (2023).
Fishman, N., Klarner, L., Mathieu, E., Hutchinson, M. & De Bortoli, V. Metropolis sampling for constrained diffusion models. In Proc. 37th International Conference on Neural Information Processing Systems article no. 2721, 62296–6233 (ACM, 2024).
Song, Y., Dhariwal, P., Chen, M. & Sutskever, I. Consistency models. In International Conference on Machine Learning 32211–32252 (PMLR, 2023).
Lipman, Y., Chen, R. T., Ben-Hamu, H., Nickel, M. & Le, M. Flow matching for generative modeling. In The Eleventh International Conference on Learning Representations (2022).
Sun, C., Shrivastava, A., Singh, S. & Gupta, A. Revisiting unreasonable effectiveness of data in deep learning era. In Proc. IEEE International Conference on Computer Vision 843–852 (IEEE, 2017).
Betker, J. et al. Improving image generation with better captions. Open AI https://cdn.openai.com/papers/dall-e-3.pdf (2023).
Liu, Z. et al. PDB-wide collection of binding data: current status of the PDBbind database. Bioinformatics 31, 405–412 (2014).
Rose, P. W. et al. The RCSB protein data bank: integrative view of protein, gene and 3D structural information. Nucleic Acids Res. 45, D271–D281 (2016).
Zdrazil, B. et al. The ChEMBL database in 2023: a drug discovery platform spanning multiple bioactivity data types and time periods. Nucleic Acids Res. 52, D1180–D1192 (2024).
Ramesh, A., Dhariwal, P., Nichol, A., Chu, C. & Chen, M. Hierarchical text-conditional image generation with CLIP latents. Preprint at https://arxiv.org/abs/2204.06125 (2022).
Touvron, H. et al. Llama 2: open foundation and fine-tuned chat models. Preprint at https://arxiv.org/abs/2307.09288 (2023).
Baek, M. et al. Accurate prediction of protein structures and interactions using a three-track neural network. Science 373, 871–876 (2021).
Lin, Z. et al. Evolutionary-scale prediction of atomic-level protein structure with a language model. Science 379, 1123–1130 (2023).
Mitchell, A. L. et al. MGnify: the microbiome analysis resource in 2020. Nucleic Acids Res. 48, D570–D578 (2019).
Suzek, B. E., Huang, H., McGarvey, P., Mazumder, R. & Wu, C. H. UniRef: comprehensive and non-redundant UniProt reference clusters. Bioinformatics 23, 1282–1288 (2007).
Tang, J. et al. Making sense of large-scale kinase inhibitor bioactivity data sets: a comparative and integrative analysis. J. Chem. Inf. Model. 54, 735–743 (2014).
Huang, R. et al. Tox21Challenge to build predictive models of nuclear receptor and stress response pathways as mediated by exposure to environmental chemicals and drugs. Front. Environ. Sci. 3, 85 (2016).
Voitsitskyi, T. et al. Augmenting a training dataset of the generative diffusion model for molecular docking with artificial binding pockets. RSC Adv. 14, 1341–1353 (2024).
Volkov, M. et al. On the frustration to predict binding affinities from protein–ligand structures with deep neural networks. J. Med. Chem. 65, 7946–7958 (2022).
Blundell, T. L. & Patel, S. High-throughput X-ray crystallography for drug discovery. Curr. Opin. Pharmacol. 4, 490–496 (2004).
Polizzi, N. F. & DeGrado, W. F. A defined structural unit enables de novo design of small-molecule-binding proteins. Science 369, 1227–1233 (2020).
Stark, H., Jing, B., Barzilay, R. & Jaakkola, T. Harmonic prior self-conditioned flow matching for multi-ligand docking and binding site design. In NeurIPS 2023 AI for Science Workshop (2023).
Corso, G., Deng, A., Polizzi, N., Barzilay, R. & Jaakkola, T. The discovery of binding modes requires rethinking docking generalization. In NeurIPS 2023 Generative AI and Biology (GenBio) Workshop (2023).
Liu, L. et al. Pre-training on large-scale generated docking conformations with HelixDock to unlock the potential of protein–ligand structure prediction models. Preprint at https://arxiv.org/abs/2310.13913 (2023).
McFee, M. & Kim, P. M. GDockScore: a graph-based protein–protein docking scoring function. Bioinform. Adv. 3, vbad072 (2023).
Réau, M., Langenfeld, F., Zagury, J.-F., Lagarde, N. & Montes, M. Decoys selection in benchmarking datasets: overview and perspectives. Front. Pharmacol. 9, 11 (2018).
Strieth-Kalthoff, F. et al. Machine learning for chemical reactivity: the importance of failed experiments. Angew. Chem. Int. Ed. 61, 29 (2022).
Mlinarić, A., Horvat, M. & Šupak Smolčić, V. Dealing with the positive publication bias: why you should really publish your negative results. Biochem. Med. 27, 447–452 (2017).
McCloskey, K. et al. Machine learning on DNA-encoded libraries: a new paradigm for hit finding. J. Med. Chem. 63, 8857–8866 (2020).
Maloney, M. P. et al. Negative data in data sets for machine learning training. Org. Lett. 25, 2945–2947 (2023).
McEwen, L. & Mustafa, F. WorldFAIR chemistry: making IUPAC assets FAIR. Chem. Int. 45, 14–17 (2023).
Steinbeck, C. et al. NFDI4Chem—towards a national research data infrastructure for chemistry in Germany. Res. Ideas Outcomes 6, e55852 (2020).
Segler, M. H., Preuss, M. & Waller, M. P. Planning chemical syntheses with deep neural networks and symbolic AI. Nature 555, 604–610 (2018).
Ball, P. Computer gleans chemical insight from lab notebook failures. Nature https://doi.org/10.1038/nature.2016.19866 (2016).
Swain, M. C. & Cole, J. M. ChemDataExtractor: a toolkit for automated extraction of chemical information from the scientific literature. J. Chem. Inf. Model. 56, 1894–1904 (2016).
Rajan, K., Brinkhaus, H. O., Agea, M. I., Zielesny, A. & Steinbeck, C. DECIMER.ai: an open platform for automated optical chemical structure identification, segmentation and recognition in scientific publications. Nat. Commun. 14, 5045 (2023).
Blecher, L., Cucurull, G., Scialom, T. & Stojnic, R. Nougat: neural optical understanding for academic documents. Preprint at https://arxiv.org/abs/2308.13418 (2023).
Chodera, J., Lee, A. A., London, N. & von Delft, F. Crowdsourcing drug discovery for pandemics. Nat. Chem. 12, 581 (2020).
The COVID Moonshot Consortium. COVID Moonshot: open science discovery of SARS-CoV-2 main protease inhibitors by combining crowdsourcing, high-throughput experiments, computational simulations, and machine learning. Preprint at bioRxiv https://doi.org/10.1101/2020.10.29.339317 (2020).
Boby, M. L. et al. Open science discovery of potent noncovalent SARS-CoV-2 main protease inhibitors. Science 382, eabo7201 (2023).
Hanser, T. Federated learning for molecular discovery. Curr. Opin. Struct. Biol. 79, 102545 (2023).
Hanser, T. et al. Using privacy-preserving federated learning to enable pre-competitive cross-industry knowledge sharing and improve QSAR models. In Society of Toxicology (SOT) Annual Meeting (2022).
Wang, R., Chaudhari, P. & Davatzikos, C. Bias in machine learning models can be significantly mitigated by careful training: evidence from neuroimaging studies. Proc. Natl Acad. Sci. USA 120, e2211613120 (2023).
Van Giffen, B., Herhausen, D. & Fahse, T. Overcoming the pitfalls and perils of algorithms: a classification of machine learning biases and mitigation methods. J. Bus. Res. 144, 93–106 (2022).
Leavy, S. Gender bias in artificial intelligence: the need for diversity and gender theory in machine learning. In Proc. 1st International Workshop on Gender Equality in Software Engineering 14–16 (2018).
Lee, N. T. Detecting racial bias in algorithms and machine learning. J. Inf. Commun. Ethics Soc. 16, 252–260 (2018).
Subramanian, G., Ramsundar, B., Pande, V. & Denny, R. A. Computational modeling of β-secretase 1 (BACE-1) inhibitors using ligand based approaches. J. Chem. Inf. Model. 56, 1936–1949 (2016).
Martins, I. F., Teixeira, A. L., Pinheiro, L. & Falcao, A. O. A Bayesian approach to in silico blood–brain barrier penetration modeling. J. Chem. Inf. Model. 52, 1686–1697 (2012).
Delaney, J. S. ESOL: estimating aqueous solubility directly from molecular structure. J. Chem. Inf. Comput. Sci. 44, 1000–1005 (2004).
Xie, Y., Xu, Z., Ma, J. & Mei, Q. How much space has been explored? Measuring the chemical space covered by databases and machine-generated molecules. In The Eleventh International Conference on Learning Representations (2022).
Thakkar, A. et al. Unbiasing retrosynthesis language models with disconnection prompts. ACS Cent. Sci. 9, 1488–1498 (2023).
Cleves, A. E. & Jain, A. N. Effects of inductive bias on computational evaluations of ligand-based modeling and on drug discovery. J. Comput. Aided Mol. Des. 22, 147–159 (2008).
Chen, L. et al. Hidden bias in the DUD-E dataset leads to misleading performance of deep learning in structure-based virtual screening. PLoS ONE 14, e0220113 (2019).
Sieg, J., Flachsenberg, F. & Rarey, M. In need of bias control: evaluating chemical data for machine learning in structure-based virtual screening. J. Chem. Inf. Model. 59, 947–961 (2019).
Jacobsson, M. & Karlén, A. Ligand bias of scoring functions in structure-based virtual screening. J. Chem. Inf. Model. 46, 1334–1343 (2006).
Chaput, L., Martinez-Sanz, J., Saettel, N. & Mouawad, L. Benchmark of four popular virtual screening programs: construction of the active/decoy dataset remains a major determinant of measured performance. J. Cheminform. 8, 56 (2016).
Jiang, D. et al. InteractionGraphNet: a novel and efficient deep graph representation learning framework for accurate protein–ligand interaction predictions. J. Med. Chem. 64, 18209–18232 (2021).
Shen, C. et al. A generalized protein–ligand scoring framework with balanced scoring, docking, ranking and screening powers. Chem. Sci. 14, 8129–8146 (2023).
Farahani, A., Voghoei, S., Rasheed, K. & Arabnia, H. R. A brief review of domain adaptation. In Advances in Data Science and Information Engineering: Proc. ICDATA 2020 and IKE 2020 877–894 (2021).
Han, X., Baldwin, T. & Cohn, T. Towards equal opportunity fairness through adversarial learning. Preprint at https://arxiv.org/abs/2203.06317 (2022).
Shao, S., Ziser, Y. & Cohen, S. B. Gold doesn’t always glitter: spectral removal of linear and nonlinear guarded attribute information. In The 17th Conference of the European Chapter of the Association for Computational Linguistics 1611–1622 (Association for Computational Linguistics, 2023).
Klarner, L. et al. Drug discovery under covariate shift with domain-informed prior distributions over functions. In Proc. 40th International Conference on Machine Learning article no. 706, 17176–17197 (ACM, 2023).
Kramer, C., Beck, B., Kriegl, J. M. & Clark, T. A composite model for hERG blockade. ChemMedChem 3, 254–265 (2008).
Kausar, S. & Falcao, A. O. An automated framework for QSAR model building. J. Cheminform. https://jcheminf.biomedcentral.com/articles/10.1186/s13321-017-0256-5 (2018).
Simeon, S. & Jongkon, N. Construction of quantitative structure activity relationship (QSAR) models to predict potency of structurally diversed Janus kinase 2 inhibitors. Molecules 24, 4393 (2019).
Kalliokoski, T., Kramer, C., Vulpetti, A. & Gedeck, P. Comparability of mixed IC₅₀ data—a statistical analysis. PLoS ONE 8, e61007 (2013).
Kramer, C., Kalliokoski, T., Gedeck, P. & Vulpetti, A. The experimental uncertainty of heterogeneous public K_i data. J. Med. Chem. 55, 5165–5173 (2012).
Landrum, G. A. & Riniker, S. Combining IC₅₀ or K_i values from different sources is a source of significant noise. J. Chem. Inf. Model. 64, 1560–1567 (2024).
Hernández-Garrido, C. A. & Sánchez-Cruz, N. Experimental uncertainty in training data for protein–ligand binding affinity prediction models. Artif. Intell. Life Sci. 4, 100087 (2023).
Speck-Planche, A. & Kleandrova, V. V. Multi-condition QSAR model for the virtual design of chemicals with dual pan-antiviral and anti-cytokine storm profiles. ACS Omega 7, 32119–32130 (2022).
Baell, J. B. & Nissink, J. W. M. Seven year itch: pan-assay interference compounds (PAINS) in 2017 – utility and limitations. ACS Chem. Biol. 13, 36–44 (2018).
Brenk, R. et al. Lessons learnt from assembling screening libraries for drug discovery for neglected diseases. ChemMedChem 3, 435–444 (2008).
Jadhav, A. et al. Quantitative analyses of aggregation, autofluorescence, and reactivity artifacts in a screen for inhibitors of a thiol protease. J. Med. Chem. 53, 37–51 (2010).
Walters, P. We need better benchmarks for machine learning in drug discovery. Practical Cheminformatics Blog https://practicalcheminformatics.blogspot.com/2023/08/we-need-better-benchmarks-for-machine.html (2023).
Klarner, L., Reutlinger, M., Schindler, T., Deane, C. & Morris, G. Bias in the benchmark: systematic experimental errors in bioactivity databases confound multi-task and meta-learning algorithms. In ICML 2022 2nd AI for Science Workshop (2022).
Wigh, D. S., Arrowsmith, J., Pomberger, A., Felton, K. C. & Lapkin, A. A. Orderly: data sets and benchmarks for chemical reaction data. J. Chem. Inf. Model. 64, 3790–3798 (2024).
Durant, G., Boyles, F., Birchall, K., Marsden, B. & Deane, C. Robustly interrogating machine learning based scoring functions: what are they learning? Preprint at bioRxiv https://doi.org/10.1101/2023.10.30.564251 (2023).
Li, S. et al. Structure-aware interactive graph neural networks for the prediction of protein–ligand binding affinity. In KDD21: Proc. 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining https://doi.org/10.1145/3447548.3467311 (ACM, 2021).
Wójcikowski, M., Kukiełka, M., Stepniewska-Dziubinska, M. M. & Siedlecki, P. Development of a protein–ligand extended connectivity (PLEC) fingerprint and its application for binding affinity predictions. Bioinformatics 35, 1334–1341 (2019).
Wang, Z. et al. OnionNet-2: a convolutional neural network model for predicting protein–ligand binding affinity based on residue-atom contacting shells. Front. Chem. 9, 913 (2021).
Browne, C. B. et al. A survey of Monte Carlo tree search methods. IEEE Trans. Comput. Intell. AI Games 4, 1–43 (2012).
Huang, K. et al. Therapeutics data commons: machine learning datasets and tasks for drug discovery and development. Preprint at https://arxiv.org/abs/2102.09548v2 (2021).
Gan, J. L. et al. Benchmarking ensemble docking methods in D3R Grand Challenge 4. J. Comput. Aided Mol. Des. 36, 87–99 (2022).
Ackloo, S. et al. CACHE (critical assessment of computational hit-finding experiments): a public–private partnership benchmarking initiative to enable the development of computational methods for hit-finding. Nat. Rev. Chem. 6, 287–295 (2022).
