How data science is driving innovations in medical biotechnology

In this article, Ramya Sriram, digital content manager at freelance platform for scientists, Kolabtree, explains how data science is driving innovations in medical biotechnology.

It is fair to say that the human body contains a lot of data. Our DNA is made up of about three billion base pairs, and the DNA in a single cell would be about two metres long if stretched out. Laid end to end, all the DNA in the human body would stretch to twice the diameter of the Solar System. That amounts to a lot of data.

Biotechnology, the use of living organisms or biological systems and their derivatives to make products, is propelled forward by data, information, and statistics. In 2014, according to Science magazine, bioinformatics became a discipline in its own right rather than a tool in a biologist or biotechnologist’s armoury. Business intelligence, data analytics, and technological advances are crucial to developing new technologies and treatments, and to overcoming current challenges. By making sense of big data, from genomics or from sensors, we can identify potential drug targets, improve processes, bring new drugs to market, and reduce errors in clinical trials.


Think of big data in the context of biotechnology, and your first thought probably relates to genome sequencing. The Human Genome Project, which ran from 1990 to 2003, was a pioneering effort that gave us access to three billion bases of data, opening the door to information on mutations, genes and more. We now live in a world where genome data is at our fingertips: a genome can be sequenced in a few hours and for under £1,000. Think carefully about how much data that is. How are we going to make the best use of it?
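For a rough sense of scale, the sketch below estimates how much storage three billion bases need. It assumes each base (A, C, G or T) is packed into 2 bits with no quality scores or metadata, and the constant and function names are illustrative only. Raw sequencer output, which carries quality scores and repeated reads of the same regions, is typically far larger.

```python
# Back-of-the-envelope storage estimate for genome data.
BASES_PER_GENOME = 3_000_000_000  # figure quoted in the article
BITS_PER_BASE = 2                 # assumption: A, C, G, T packed into 2 bits each


def genome_size_bytes(bases=BASES_PER_GENOME, bits_per_base=BITS_PER_BASE):
    """Bytes needed to store one genome under the packing assumption above."""
    return bases * bits_per_base // 8


def cohort_size_terabytes(n_genomes):
    """Storage for n_genomes genomes, in terabytes (1 TB = 10**12 bytes)."""
    return n_genomes * genome_size_bytes() / 1e12


one_genome_mb = genome_size_bytes() / 1e6        # megabytes for a single genome
thousand_genomes_tb = cohort_size_terabytes(1000)  # terabytes for 1,000 genomes
```

Under these assumptions a single genome fits in 750 MB, and a thousand genomes in three quarters of a terabyte, before any of the annotation or analysis output is counted.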

The data now available lets researchers draw insights in fields ranging from medicine to crime scene investigation. To work with it effectively, data scientists use frameworks and tools to store, track, receive, analyse, and interpret their data. Tools are now being built to automatically annotate specific genes, and software companies like DNAnexus, Knome, and NextBio have sprung up to tackle genome interpretation. Interestingly, NextBio has even worked with Intel to improve Hadoop for genomic big data analysis. The pharmaceutical and healthcare industries can use this insight to improve diagnostics, aid drug discovery, or develop personalised medicine strategies.

Drug discovery and development

Bringing a new pharmaceutical product to market is a long, arduous process with many bottlenecks. Trials regularly fail to meet their objectives, for example in terms of enrolment, which can add further delay and therefore increase the costs of an already expensive process. From finding a drug candidate to recruiting patients for a clinical trial, there are numerous data points, experiments, and risk/benefit analyses to conduct, making the pharmaceutical industry a logical fit for big data analytics.

We can now use automated software to screen millions of compounds to identify drug candidates for a clinical trial. Pharmaceutical professionals can let artificial intelligence (AI) do the hard work of sifting through a huge library of potential drugs, assessing which candidates are likely to meet the trial’s specific criteria.

Biotechnology company Numerate, for example, builds predictive models to help with small molecule drug design, making predictions on toxicity, metabolism, absorption, distribution and more. AI can also be used to come up with new combinations of compounds. Pharmaceutical companies can therefore screen drug candidates and pick the most likely ones to take to clinical trial.
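The idea of screening a compound library can be illustrated with a much simpler, rule-based filter. The sketch below applies Lipinski's rule of five, a widely used heuristic for oral drug-likeness, to a small made-up library. It is not Numerate's method and not an AI model; the `Compound` class, the `screen` function and every property value are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Compound:
    """Hypothetical record of pre-computed molecular properties."""
    name: str
    mol_weight: float  # molecular weight in daltons
    log_p: float       # octanol-water partition coefficient
    h_donors: int      # hydrogen-bond donors
    h_acceptors: int   # hydrogen-bond acceptors


def rule_of_five_violations(c: Compound) -> int:
    """Count how many of Lipinski's four thresholds the compound exceeds."""
    return sum([
        c.mol_weight > 500,
        c.log_p > 5,
        c.h_donors > 5,
        c.h_acceptors > 10,
    ])


def screen(library, max_violations=1):
    """Return names of compounds with at most max_violations violations."""
    return [c.name for c in library
            if rule_of_five_violations(c) <= max_violations]


# Illustrative values only, not measurements of real molecules.
LIBRARY = [
    Compound("candidate_A", 320.4, 2.1, 2, 5),  # no violations
    Compound("candidate_B", 612.7, 5.8, 3, 9),  # too heavy and too lipophilic
    Compound("candidate_C", 540.0, 4.2, 1, 8),  # one violation (weight)
]

shortlist = screen(LIBRARY)
```

Here `shortlist` keeps `candidate_A` and `candidate_C` and drops `candidate_B`. A predictive model would replace the fixed thresholds with learned estimates of properties such as toxicity or absorption, but the filter-and-rank shape of the pipeline stays the same.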

Big data in biotechnology is not only about genomics; the data may also be collected by sensors. Wearable, ingestible, or implantable sensors can provide a continuous data stream for clinical trials. This data can reduce the gap between measurements taken at appointments, mitigate human error, help identify reasons for dropout, and may allow patients to go about their normal lives more easily.
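To make the "gap between measurements" concrete, here is a minimal sketch that compares the longest interval without a reading under two schedules. The fortnightly clinic visits, the hourly wearable readings and the `longest_gap` helper are all assumptions chosen for illustration, not figures from a real trial.

```python
from datetime import datetime, timedelta


def longest_gap(timestamps):
    """Longest interval between consecutive readings (zero if fewer than two)."""
    ordered = sorted(timestamps)
    gaps = [later - earlier for earlier, later in zip(ordered, ordered[1:])]
    return max(gaps, default=timedelta(0))


start = datetime(2024, 1, 1, 9, 0)

# Assumed schedule: a clinic appointment every 14 days over a 28-day trial.
clinic_visits = [start + timedelta(days=14 * i) for i in range(3)]

# Assumed schedule: one wearable reading per hour over the same 28 days.
wearable_readings = [start + timedelta(hours=i) for i in range(28 * 24 + 1)]

clinic_gap = longest_gap(clinic_visits)        # 14 days between observations
wearable_gap = longest_gap(wearable_readings)  # 1 hour between observations
```

The same helper could be run on real device timestamps to spot stretches where a participant stopped wearing the sensor, which is one possible early signal of dropout.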

Any improvement in the drug discovery or clinical trial process can save millions of dollars in development costs and speed up the time it takes to bring a potentially life-saving drug to market.

Healthcare and disease management

One big data challenge facing the healthcare sector is the storage and management of electronic medical records. In fact, the US Government is investing $19bn (~€15.5bn) in boosting the uptake of electronic records. With patient information stored in this way, the industry has a pool of data to work with to help improve diagnoses and treatments.

Hospitals can monitor and evaluate a patient’s progress, and the information could feed into something much bigger. For example, Genentech has produced a database of patients who have previously been treated for cancer, to help inform treatment for newly diagnosed patients. In New York, the Partnership to Advance Clinical Electronic Research is working on a system to enable investigators and sponsors to use electronic patient records to help find patients for clinical trials. Oracle has also unveiled cloud-based applications that support the sharing of anonymised patient data between the health system and pharmaceutical companies.
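Finding trial candidates in electronic records is, at its simplest, a filtering problem. The sketch below shows one way it might look; it does not represent the New York partnership's system or any Oracle product. The record fields, the `find_trial_candidates` function and the eligibility criteria are hypothetical, and the records themselves are invented.

```python
def find_trial_candidates(records, diagnosis, min_age, max_age):
    """Return IDs of patients who match the criteria and agreed to be contacted."""
    return [
        r["patient_id"]
        for r in records
        if r["diagnosis"] == diagnosis
        and min_age <= r["age"] <= max_age
        and r["consented_to_contact"]
    ]


# Invented, pseudonymised records for illustration only.
RECORDS = [
    {"patient_id": "P001", "age": 54, "diagnosis": "type 2 diabetes",
     "consented_to_contact": True},
    {"patient_id": "P002", "age": 71, "diagnosis": "type 2 diabetes",
     "consented_to_contact": True},   # outside the age range
    {"patient_id": "P003", "age": 47, "diagnosis": "asthma",
     "consented_to_contact": True},   # different diagnosis
    {"patient_id": "P004", "age": 60, "diagnosis": "type 2 diabetes",
     "consented_to_contact": False},  # has not consented
]

candidates = find_trial_candidates(RECORDS, "type 2 diabetes", 40, 65)
```

Only `P001` satisfies all three conditions. Real eligibility criteria are far richer (laboratory values, medications, comorbidities), but the consent check illustrates why governance has to sit inside the query rather than after it.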

By using wearable, implantable, or ingestible devices, disease progression could be monitored continuously and in real time. Data on diet, environmental factors, sleeping habits, and more could be tied to genomic information to alert an individual to the risk of a specific disease. Treatment plans could be more carefully tailored and costs to the healthcare system reduced.

The keys to the future

Data scientists hold the keys to the future of data analytics for medical biotechnology. For innovation to take place, the industry needs trained data scientists and biotechnicians with skills in languages including Python, R, C++, and SQL, among others. They also require an underlying knowledge of data collection, storage, algorithms, validation, and visualisation to generate meaning from biological data. The knowledge and skills needed are held primarily by data scientists with advanced degrees, a pool of professionals that is expensive to hire from and can be difficult to access.

While it would not be worthwhile to lay out three billion bases of DNA, there is a lot to be gained from data analytics in medical biotechnology.

Ramya Sriram
Guest author
Digital content manager
