The ability of computers to produce new compound structures using ‘generative chemistry’ algorithms is a hot topic in drug discovery. The field has recently been reinvigorated by new machine learning (ML) algorithms that learn what a drug-like molecule ‘looks like’ and subsequently generate large numbers of chemically meaningful structures.
A key advantage of ML algorithms is that they can explore the chemical space around a lead or series and generate vastly more compounds than an individual expert, or even a team of experts. These algorithms can also be combined with predictive models of target activities and other compound properties to identify high-quality chemical suggestions. ML methods can learn from far more data than any human expert, which allows them to identify complex patterns and guide future compound optimisation. The resulting generative chemistry systems can also be applied objectively, challenging human biases and enabling a more rigorous exploration of optimisation strategies.1
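The idea of coupling a generator with predictive models can be illustrated with a minimal Python sketch of a generate-then-filter step. It is not the method of any specific system described here. It assumes that each predictive model is simply a function mapping a SMILES string to a score. The function name `select_candidates`, the per-property threshold scheme and the mean-score ranking are hypothetical illustrative choices, and the toy lookup tables stand in for trained activity and property models.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical: a "predictive model" is any function from a SMILES string to a score.
Predictor = Callable[[str], float]


def select_candidates(
    smiles: List[str],
    predictors: Dict[str, Predictor],
    min_scores: Dict[str, float],
    top_k: int,
) -> List[Tuple[str, float]]:
    """Score generated structures, drop any that fail a threshold,
    and return the top_k ranked by the mean of their predicted scores."""
    kept = []
    for s in smiles:
        # Apply every predictive model to the candidate structure.
        scores = {name: model(s) for name, model in predictors.items()}
        # Reject the candidate if any property falls below its threshold
        # (properties without a threshold are never used to reject).
        if all(scores[name] >= min_scores.get(name, float("-inf")) for name in scores):
            # Simple aggregate chosen for illustration: the mean predicted score.
            kept.append((s, sum(scores.values()) / len(scores)))
    # Rank survivors so the highest mean predicted score comes first.
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept[:top_k]


# Hypothetical toy "models": fixed lookup tables standing in for trained predictors.
TOY_ACTIVITY = {"CCO": 0.2, "c1ccccc1O": 0.7, "CC(=O)Nc1ccc(O)cc1": 0.9}
TOY_SOLUBILITY = {"CCO": 0.9, "c1ccccc1O": 0.6, "CC(=O)Nc1ccc(O)cc1": 0.5}

if __name__ == "__main__":
    generated = list(TOY_ACTIVITY)  # pretend these came from a generative model
    shortlist = select_candidates(
        generated,
        predictors={"activity": TOY_ACTIVITY.get, "solubility": TOY_SOLUBILITY.get},
        min_scores={"activity": 0.5},
        top_k=2,
    )
    print(shortlist)
```

Here the low-activity structure is discarded and the remaining two are ranked by their mean predicted score. In a real project, the thresholds and the way scores are combined (a plain mean here) would be project-specific decisions.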
Unfortunately, generative chemistry methods often remain difficult to apply routinely. They typically require a specialist computational expert to configure a complex array of parameters before they produce meaningful results. Furthermore, the output is frequently a long list of compounds, many of which are irrelevant. This creates a need for extensive post-hoc analysis and triage to find the proverbial ‘needle in a haystack’: a compound worthy of synthesis and experimental investigation. Notable exceptions include the BRADSHAW2 and Kernel3 systems, developed by GSK and Eli Lilly respectively, which take a much more proactive approach to suggesting ideas to discovery teams.
Defining Generative Chemistry Methods
A recent paper by Goldman et al.4 describes a scale of Automated Chemical Design (ACD) Levels for classifying generative chemistry methods, analogous to the levels used to classify autonomous vehicles.5 A method's ACD Level depends on whether compound ideas are generated by a human or a machine, whether a human or a machine selects the ideas for consideration, and whether the system performs a single iteration or multiple iterations of optimisation (see Table 1). The author of this paper notes that, in their opinion, no ACD Level 5 system has yet been demonstrated in practice.
Although the ACD Level approach provides a convenient way to classify generative chemistry methods, it draws an artificial distinction between machines and humans performing idea generation and selection when, in practice, these responsibilities need not be divided this way. In a concept known as Augmented Intelligence, human experts and computer algorithms work collaboratively to achieve better outcomes than either could achieve alone.6
Introducing Augmented Intelligence – Dynamic Learning to Generate Better Compounds
Augmented Intelligence combines the strengths of human experts and ML algorithms so that each compensates for the other's weaknesses. Human experts understand strategic objectives and context that raw data alone does not capture. With this knowledge, a human expert can use an ML algorithm to guide a rigorous, objective tactical analysis of the strategic options. The algorithm, in turn, supplements human expertise: it can systematically explore far more options, learn from more data, and challenge natural human biases to encourage innovation and thinking ‘outside the box’. The value of this approach has already been demonstrated in a wide range of disciplines, including medicine,7 document search,8 and customer service.9