Health Europa talks with Open Targets, who discuss how the platform works to systematically identify and prioritise drug targets for the development of safe and effective medicines.
Open Targets is an innovative, large-scale, multi-year, public-private partnership that uses human genetics and genomics data for systematic drug target identification and prioritisation. Generating and interpreting the data required for drug discovery demands a diverse set of skills, backgrounds, evidence types and technologies, which do not exist today in any single entity. Open Targets brings together expertise from six complementary institutions to systematically identify and prioritise targets from which safe and effective medicines can be developed.
Speaking to Health Europa, Dr Rolf Apweiler, Joint Director of EMBL-EBI (European Bioinformatics Institute, part of the European Molecular Biology Laboratory) and Open Targets’ Director, discusses how the platform works and what he hopes it will achieve.
Many drugs that enter clinical trials fail and never make it to the market. How does the Open Targets platform work to overcome this challenge?
It is hard to objectively select targets with a high chance of clinical success because the data required to predict efficacy and safety are complex, dispersed and incomplete. Open Targets is a public-private initiative to transform drug discovery by enabling the systematic identification and prioritisation of targets.
We have identified two connected, over-arching scientific requirements to achieve our vision of systematically identifying and prioritising targets. The first is to integrate information relevant to targets and diseases, and to make it as useful as possible to drug discovery scientists. The Open Targets Platform (see: http://targetvalidation.org) has been developed to marshal the resources of existing databases around a common infrastructure to specifically facilitate early target decisions.
The second requirement is to increase the wealth of data that provides causal links between targets and diseases. Here, we have developed high throughput experimental projects that generate target-centred data in human, physiologically relevant systems such as organoids and induced pluripotent stem cells.
We focus on three therapeutic areas of interest to our partners:
- Immunology; and
We envisage these two arms of Open Targets as a virtuous cycle of data generation, data integration and hypothesis generation.
Specifically, the platform enables scientists to search for a disease or a target and, in return, receive the associated targets or diseases with a clear view of the evidence that supports each association. We accomplish this by integrating the data, which allows us to build an association score for each target and disease pair. On top of this, we have invested in a great deal of user experience design work to create easy-to-understand visualisations of the data.
Once scientists have identified interesting targets, they can further explore the evidence that associates them with a disease through our platform, or follow links through to our data providers for more detail. Our aim is that, by enabling scientists to readily access and assess relevant data when they decide which targets to pursue for new, safe and effective medicines, targets with an improved chance of clinical success will be selected for drug discovery programmes.
What issues do you face in mining the data fed into the platform? Are there any significant challenges in compiling and comparing multiple data types?
The platform is designed to bring together different sources of data which provide evidence for an association between a target and a disease. One of the key obstacles we had to overcome was harmonising the description of disease in each data source so that the evidence could sit alongside the other evidence. We do this through an ontology (the Experimental Factor Ontology, EFO, see: https://www.ebi.ac.uk/efo) which brings together several disease ontologies, and also provides cross referencing between terms. This enables us to integrate data and allows users to navigate between similar diseases.
However, there is always a problem mapping new data to the ontology. This is a major barrier to data integration.
The data types we include in the platform were chosen because the users we spoke to recommended them for target identification and prioritisation. Within each data type – for example, rare disease genetics or a genome-wide association study (GWAS) – we have a principled scoring system that ranks evidence relatively according to the known parameters of the experiment.
However, comparing across data types is more difficult because it requires an assessment of how useful a particular data type is for target identification. Pragmatically, we have used a heuristic approach, assigning weightings to our scores that reflect our view of the relative value of curated evidence, or evidence with a strong statistical basis, versus evidence that may be more circumstantial.
Eventually, we hope that we will be able to use machine learning approaches to set these weightings. One problem is the relative sparsity of positive data for successful drug discovery programmes; another is that most of that data represents historical paradigms for drug discovery, which may not be the same as future approaches.
Finally, we also use an approach to aggregate our scores to give an overall score (the harmonic sum) that rewards initial replication of a piece of evidence but doesn’t excessively inflate the significance of repeated observations.
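The interview does not spell out the formula, so the following is only a minimal sketch of how a harmonic-sum aggregation with per-data-type weightings could look. It assumes the common form in which scores are sorted in descending order and the i-th score is divided by i squared. The function names, the data type labels and the weight values are hypothetical and chosen purely for illustration.

```python
# Illustrative sketch only: the weights and data type names below are
# hypothetical, not the values used by the Open Targets Platform.
HYPOTHETICAL_WEIGHTS = {
    "genetic_association": 1.0,  # assumed: strong statistical basis
    "curated_drug": 1.0,         # assumed: curated evidence
    "text_mining": 0.2,          # assumed: more circumstantial evidence
}


def harmonic_sum(scores):
    """Sum the scores in descending order, dividing the i-th score by i**2.

    The strongest piece of evidence counts in full, the second counts for
    a quarter, the third for a ninth, and so on, so early replication is
    rewarded while many repeated observations add little.
    """
    ordered = sorted(scores, reverse=True)
    return sum(s / (i ** 2) for i, s in enumerate(ordered, start=1))


def association_score(evidence, weights=HYPOTHETICAL_WEIGHTS):
    """Combine (data_type, score) pairs into one overall score.

    Each score is scaled by the weight of its data type (1.0 if the data
    type is not listed) before the harmonic sum is taken.
    """
    weighted = [weights.get(data_type, 1.0) * score
                for data_type, score in evidence]
    return harmonic_sum(weighted)


if __name__ == "__main__":
    one_hit = harmonic_sum([1.0])
    two_hits = harmonic_sum([1.0, 1.0])
    ten_hits = harmonic_sum([1.0] * 10)
    print(one_hit, two_hits, ten_hits)
```

Under this assumed form, a second identical observation raises the total from 1.0 to 1.25, while ten identical observations still total less than 1.65, which matches the stated aim of rewarding initial replication without inflating repeated observations.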
What can the community do to assist the platform and make sure that results are being exploited for applications they perhaps weren’t even intended for? And how important is it to make research output widely accessible?
Users can assist the platform by feeding back to us their experiences, ideas and any issues that they encounter either with the features or the data (email:email@example.com).
We can use this input to help shape the future direction of our development and correct any data issues found by working with our data providers. Beyond target identification and prioritisation, we have had feedback that the platform has been successfully used to support workflows for on-target and off-target safety reviews as well as drug repurposing.
Making our research outputs openly available is part of our core philosophy, so we consider this to be incredibly important. We believe that the identification of a promising new target is pre-competitive and should be shared, with the subsequent steps moving into the competitive arena. The use cases mentioned show the benefit of this: the innovation that users have shown with our platform and data has given our team new ideas about how we can develop the platform.
Additionally, within the community, we would ask any group that is generating pre-competitive data relevant to target discovery to ensure that it is submitted to, and made available through, a public, sustainable database of record, so that we and others can use it to build the knowledge required to bring safe and effective medicines to patients through improved target selection.
How would you like to see the European community better provide the platform with biological data? How could this be achieved?
The keys are open data and open science. We need biological data and its descriptive metadata, together with the information resulting from analysis of that data, to flow into the appropriate existing databases. Open Targets can then find, access and integrate relevant data into the Open Targets Platform. The push by major European funders for open science and the FAIR (Findable, Accessible, Interoperable, and Reusable) principles is a major step forward.
How difficult is it to ensure that data protection and privacy regulations are adhered to? Are there any instances of conflicting requirements – such as data having to be held for a specific duration while patients have an increasing right to have their data ‘forgotten’? How does the platform work to ensure that these issues are resolved?
This is an important issue. Open Targets works with our data providers to incorporate data that is fully publicly available, and we’ve updated our own privacy and data policies in light of the recent GDPR legislation. We don’t currently host patient information, but in general patient information has the potential to be very important in our field. It is extremely important that patients are fully involved in decisions about how their data can be used and made available, and that the connection between making their data available and the benefit it might bring to research and drug discovery is communicated clearly.
There are existing databases that provide controlled access to sensitive genome information, for instance the European Genome-phenome Archive (EGA), which is partly hosted by one of our partners, EMBL-EBI. We submit identifiable genomics data from our projects to these controlled access resources.
For the Open Targets Platform, we are interested in the evidence that a target is associated with a disease, so one approach we have adopted that doesn’t require identifiable data is to use summary-level data. For instance, a GWAS result summarises the association between a disease and the underlying genomes as association statistics for each SNP, without providing the individual genotype results. We are working with the GWAS Catalog at EMBL-EBI to provide access to these summary statistics.
Similarly, the occurrence of deleterious mutations in a single gene in a rare disease can be summarised as frequencies of mutation types without providing the individual genotypes. In this way, we can get access to the meaning of the observation without compromising privacy.
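To illustrate the idea of summary-level data, here is a minimal sketch that reduces individual-level records to frequencies per mutation type. The function name, the record layout and the example values are hypothetical; real resources apply their own formats and additional safeguards that are not shown here.

```python
from collections import Counter


def summarise_mutation_types(records):
    """Reduce (individual_id, mutation_type) records to type frequencies.

    Only the aggregated frequencies are returned; the individual
    identifiers are dropped and never appear in the output.
    """
    counts = Counter(mutation_type for _individual_id, mutation_type in records)
    total = sum(counts.values())
    return {mutation_type: count / total
            for mutation_type, count in counts.items()}


if __name__ == "__main__":
    # Made-up example records, for illustration only.
    example_records = [
        ("individual_1", "missense"),
        ("individual_2", "missense"),
        ("individual_3", "frameshift"),
        ("individual_4", "missense"),
    ]
    print(summarise_mutation_types(example_records))
```

The output keeps the meaning of the observation (which mutation types occur, and how often) while the per-individual rows stay behind in the controlled-access source.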
How widespread is discarded research data in Europe, and how would you like to see the community help to see the information compiled centrally and made available for novel applications?
It is difficult to know how widespread discarded research data is as it isn’t recorded! Currently, most of the data in our platform is ‘positive’ data, or in other words evidence that there is an association between a target and a disease. Our model can accommodate ‘negative’ data, but the availability of this is limited and it has to be carefully defined.
One immensely helpful piece of evidence in this space, which we would like to see made centrally available, would be better recording of the reasons for terminating drug discovery programmes. Terminations based on in vivo or in vitro results are often never published, and when trials are stopped in the clinic the rationale is often not published or is hard to discover.
Congratulations on securing Celgene. How does this strengthen the platform, and what will you be looking for with future partnerships?
All our partners bring a unique take on the science of target identification and prioritisation with them. Adding Celgene as a new voice to the consortium will strengthen our collective ideas and give us an opportunity to learn and help inform our platform and project development. As with our other partners, Celgene have a strong interest in genetics, which is an area of our informatics output that we are working with our partners to enhance in the very near future.
As our name suggests, Open Targets is open to collaboration with organisations with similar philosophies and scientific interests to our partners as we believe that the opportunities unfolding in target selection and validation are too big and moving too rapidly for any single organisation to successfully pursue them alone.
New data sources, methods to analyse and score our data, experimental projects and target identification, prioritisation and validation would be interesting topics to discuss.
This article will appear in issue 7 of Health Europa Quarterly, which will be published in November 2018.