Neo4j’s Alex Jarasch explains how knowledge graphs can be a valuable asset for the pharmaceutical industry to extract insights from complex data.
An interesting possibility is emerging for the life sciences industry – can we use a new approach to organising complex data to gain a more holistic view of the underlying problems we want to address? That approach is the knowledge graph, which the AI research group The Alan Turing Institute defines as capturing information about entities of interest in a given domain or task (like people, places or events) and finding the connections between them.
Tackling the Hard Bio Problems
A knowledge graph is a type of data structure that represents information as a set of entities and the relationships between them. It organises and links data so that it is easier to understand and access. The main feature of knowledge graphs is that they allow for complex connections between different data sources, which can reveal new insights and relationships that would be difficult or impossible to uncover using traditional SQL databases.
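To make the idea of entities and relationships concrete, here is a minimal Python sketch. It is an illustration only, not how a graph database works internally: the entity names (GeneA, ProteinX, DrugD, DiseaseY, TrialT), the relationship labels and the helper functions are all hypothetical and do not come from any real dataset or product.

```python
from collections import defaultdict, deque

# Hypothetical (subject, relationship, object) facts, invented for illustration.
TRIPLES = [
    ("GeneA", "ENCODES", "ProteinX"),
    ("DrugD", "TARGETS", "ProteinX"),
    ("ProteinX", "ASSOCIATED_WITH", "DiseaseY"),
    ("TrialT", "EVALUATES", "DrugD"),
]


def build_graph(triples):
    """Index every triple in both directions so links can be followed either way."""
    graph = defaultdict(list)
    for subject, relation, obj in triples:
        graph[subject].append((relation, obj))
        graph[obj].append((f"inverse:{relation}", subject))
    return graph


def find_path(graph, start, goal):
    """Breadth-first search; returns the shortest chain of entities, or None."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for _relation, neighbour in graph[path[-1]]:
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(path + [neighbour])
    return None


graph = build_graph(TRIPLES)
# Ask how a clinical trial connects to a disease through intermediate entities.
print(find_path(graph, "TrialT", "DiseaseY"))
```

In this toy example the trial and the disease are never linked directly; the connection only appears by following the drug and the protein in between. A relational database would typically need a chain of joins to answer the same question.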
Knowledge graphs are multidimensional and can represent data in a variety of forms, including text, images, and structured data. This allows them to store and link together different types of information, such as scientific research, clinical trial data, and market data. The ability to make connections between different data sources is what makes knowledge graphs so powerful for the life sciences industry.
The first real-world use of graph technology to enter the public consciousness was the Panama Papers investigation, where a network of journalists used graphs to link together millions of documents and uncover illicit financial activities. That may seem far removed from life sciences, but the two have something in common – both involve finding useful patterns in what looks like “noise” to the average person.
In biological science, for example, understanding the complex interrelationships between factors such as genes, environment, diet, and behaviour is essential for gaining a deeper understanding of diseases and developing new treatments, but it is notoriously hard. Knowledge graphs could play a crucial role here by enabling analysis of these interrelationships and correlations on a large scale.
Modern native graph databases are particularly well-suited for this type of analysis. They are optimised for handling large amounts of interconnected data and so can store and link together billions of connections. That makes it possible to analyse and make sense of massive amounts of data – important in fields such as medicine, where the amount of data being generated is growing rapidly, and traditional methods of data analysis are no longer sufficient.