Understanding Drug Discovery and Drug Discovery Using Machine Learning.

Adith - The Data Guy
8 min read · Jun 10, 2021


Machine learning is being put into practice in many industries, including transportation, advertising, healthcare, and pharmaceuticals, to name a few. According to Accenture’s data, AI applications are expected to generate 150 billion dollars in savings for the United States by 2026. Machine learning has already improved the diagnostic process as a whole.

That said, there is another healthcare segment that depends heavily on data: drug discovery. Before looking at machine learning in drug discovery, let’s walk through how drugs were discovered in the past and how the process works today.

So, without any further ado let’s get started!!!

Drug Discovery in the past

In 1928, Alexander Fleming, a pathologist, accidentally left a petri dish uncovered beside a window before leaving for a vacation. When he returned, to his surprise, the discarded dish led him to the discovery of penicillin, the world’s first antibiotic, which excited the whole pharmaceutical industry.

This is just one of many examples of how a simple mistake turned into a breakthrough.

More recently, researchers in New Zealand discovered that a vaccine used against a meningitis epidemic in the early 2000s also lowers the risk of gonorrhea.

Drug Discovery in the present

On average, bringing a drug from research to the market costs 2.6 billion dollars and takes more than 10 years. Yes, you read that right: more than 10 years. This partly explains the lack of treatment for more than 90% of diseases.

The process is so lengthy and costly because of its complexity.

We all want life-saving drugs to be in the hands of patients quicker and cheaper. But have you ever seen what the drug discovery pipeline looks like? Ok, let me break that down for you.

Phases of Drug Discovery:

There are 7 phases in drug discovery:

1. Target Identification

The first step is not about the drug; rather, it is about understanding the targets responsible for the disease. These targets include DNA mutations, misfolded proteins, and other disease biomarkers. This was not the case for penicillin, whose discovery was purely unexpected. Identifying the target takes at least 2 years.

2. Lead Discovery

This is the process of screening thousands of compounds designed to interfere with the targets. The objective is to significantly narrow down the range of possible compounds. This process takes around 1–2 years to complete.

3. Medicinal Chemistry

This phase involves testing the compounds to analyze their interaction with the disease targets. Further analyses of how the compounds interact may also be conducted. This process takes around 1–2 years to complete.

4. In-vitro Studies

As proof of concept, compounds that make it to this phase are tested in a cell system. This is the phase where petri dish studies come into the picture. In-vitro studies examine how effective the compound is at interfering with the target. The results of in-vitro studies do not necessarily reflect the results of animal or clinical studies. This process takes around 1–2 years to complete.

5. In-Vivo Studies

After success in in-vitro studies, the compound is tested in animal models such as rats. Results from animal studies are more representative than those from in-vitro studies. However, in-vivo studies are also more expensive. This process takes around 1–2 years to complete.

6. Clinical trials

If the results from the studies above are promising, the compound proceeds to clinical trials. There are three phases of clinical trials, each with a different objective. The final goal of a clinical trial is to validate the efficacy and safety of the compound in a human setting. Clinical trials are subject to many regulatory requirements, and proving the efficacy of a drug is not an easy task. In particular, the long-term side effects of a compound usually remain unknown.

In the first phase of a clinical trial, 20–80 volunteers are tested. In the second phase, 100–300 volunteers are tested, and in the third phase, 1,000–3,000. The average cost of a phase 3 clinical trial is estimated to be 19 million dollars.

7. Food and Drug Administration [FDA] approval and Commercialization

Once all the testing is completed, the compound is submitted to the FDA to be reviewed for approval. Once approved, the drug can finally be made commercially available in the markets for patients to improve lives and treat diseases.

But what if there was a way to accelerate the process of early phase drug discovery (except the clinical trials and commercialization) from 6 years to 6 months to 6 days? I don’t think this is sci-fi at all!
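The 6-year figure can be read off the phase durations quoted in the list above. Here is a small sketch of that arithmetic. The durations are the ones given in this article; the 2-year upper bound for target identification is my assumption (the article only says "at least 2 years"), and the names `PRECLINICAL_PHASES` and `total_years` are made up for this example.

```python
# Phase durations in years (min, max) as quoted in the phase list.
# Target identification is given as "at least 2 years"; the upper bound
# of 2 used here is an assumption made only to keep the sum finite.
PRECLINICAL_PHASES = {
    "Target Identification": (2, 2),
    "Lead Discovery": (1, 2),
    "Medicinal Chemistry": (1, 2),
    "In-vitro Studies": (1, 2),
    "In-vivo Studies": (1, 2),
}


def total_years(phases):
    """Return (minimum, maximum) total duration across all phases."""
    low = sum(lo for lo, _ in phases.values())
    high = sum(hi for _, hi in phases.values())
    return low, high


low, high = total_years(PRECLINICAL_PHASES)
print(f"Early-phase discovery: {low} to {high} years")
```

So even in the best case, the five phases before clinical trials add up to about 6 years, and that is the part machine learning hopes to shorten.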

Basic Algorithms Related to ML in Drug Discovery

Okay, now think of drug discovery as trying to fit a key in a lock. The lock is the target to which we hope the key, the drug molecule, will bind to produce a reaction. The algorithm tests thousands and thousands of keys to find the one that fits best. Humans and the algorithm do the same thing; the only difference is that the algorithm does it in seconds, not months.
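To make the lock-and-key picture concrete, here is a toy sketch of screening a handful of "keys" against one "lock". Everything in it is hypothetical: the shape profiles are made-up numbers, and `fit_score` is a stand-in for the far richer scoring that real virtual screening tools use. It only shows the idea of scoring every candidate and keeping the best ones.

```python
from typing import Dict, List, Tuple

# Toy "lock": the shape profile a key must match (hypothetical numbers).
TARGET_POCKET: Tuple[int, ...] = (3, 1, 4, 1, 5)

# Toy "keys": candidate molecules with made-up shape profiles.
CANDIDATES: Dict[str, Tuple[int, ...]] = {
    "mol_A": (3, 1, 4, 2, 5),
    "mol_B": (0, 0, 0, 0, 0),
    "mol_C": (3, 1, 4, 1, 5),
    "mol_D": (5, 1, 4, 1, 3),
}


def fit_score(pocket: Tuple[int, ...], key: Tuple[int, ...]) -> int:
    """Higher is better: 0 is a perfect fit, negative values count the mismatch."""
    return -sum(abs(p - k) for p, k in zip(pocket, key))


def screen(
    pocket: Tuple[int, ...],
    library: Dict[str, Tuple[int, ...]],
    top_n: int = 2,
) -> List[Tuple[str, int]]:
    """Score every candidate and return the top_n best-fitting ones."""
    scored = [(name, fit_score(pocket, key)) for name, key in library.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)[:top_n]


print(screen(TARGET_POCKET, CANDIDATES))
# → [('mol_C', 0), ('mol_A', -1)]
```

With four candidates this is instant; the point is that the same loop scales to thousands or millions of candidates without anyone having to test them by hand.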

There are two significant aspects to finding a drug: data mining and deep learning. Data mining uses the available research data to narrow down and identify the target to which we want our drug to bind. Once the target has been identified, deep learning techniques such as convolutional neural networks are used to build accurate binding profiles. These networks then select the best molecule while weighing an enormous number of parameters, and a new drug candidate is found.
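To give a feel for what a convolutional network does with a molecule, here is a minimal, untrained sketch in plain Python. It is an illustration under heavy assumptions and not how any production system works: the molecule is a short SMILES-like string, the alphabet is tiny, there is a single hand-set filter instead of learned weights, and the names `one_hot`, `conv1d`, and `predict_binding` are made up for this example. Real models are trained on large datasets and often work on 3D structures.

```python
import math
from typing import List

ALPHABET = "CNO=()"  # tiny, hypothetical character set for SMILES-like strings


def one_hot(smiles: str) -> List[List[float]]:
    """Encode each character as a one-hot vector over ALPHABET."""
    return [[1.0 if ch == a else 0.0 for a in ALPHABET] for ch in smiles]


def conv1d(x: List[List[float]], kernel: List[List[float]]) -> List[float]:
    """Slide one filter over the sequence (valid positions only)."""
    width = len(kernel)
    out = []
    for start in range(len(x) - width + 1):
        total = 0.0
        for offset in range(width):
            row, weights = x[start + offset], kernel[offset]
            total += sum(xi * wi for xi, wi in zip(row, weights))
        out.append(total)
    return out


def predict_binding(smiles: str, kernel: List[List[float]], bias: float = -1.5) -> float:
    """Conv -> ReLU -> global max pool -> sigmoid, giving a score in (0, 1)."""
    activations = [max(0.0, v) for v in conv1d(one_hot(smiles), kernel)]
    pooled = max(activations, default=0.0)
    return 1.0 / (1.0 + math.exp(-(pooled + bias)))


# Hand-set, untrained filter that fires on the two-character motif "C=".
KERNEL = [
    [1.0, 0.0, 0.0, 0.0, 0.0, 0.0],  # first position responds to "C"
    [0.0, 0.0, 0.0, 1.0, 0.0, 0.0],  # second position responds to "="
]

print(predict_binding("CC=O", KERNEL))  # contains "C=", so the score is above 0.5
print(predict_binding("CCO", KERNEL))   # no "C=", so the score is below 0.5
```

In a real network there are many such filters, stacked in layers, and their weights are learned from known binding data rather than written by hand.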

Current innovations and Impact of ML in drug discovery

  • Insilico Medicine was able to design, synthesize, and verify new drug candidates in just 46 days. Its system examines previous research and patents for molecules that work against the drug target. This is similar to what a human chemist might do to seek new therapies, only much faster.
  • AlphaFold, an AI system from Google’s DeepMind, was able to predict the 3D structure of proteins with unusual speed and accuracy, outperforming some of the world’s best biologists and researchers in this field.

Many companies are now using machine learning and artificial intelligence. Among them is Pfizer, which uses IBM Watson to search for new immuno-oncology drugs. Watson has beaten humans at tasks such as the quiz show Jeopardy, where it defeated former champions. Apart from this, it has successfully diagnosed a woman with leukemia. Many pharma corporations have now teamed up with Watson.

Creating a new drug generates a lot of data. Machine learning offers a way to process that data and produce outcomes that help drug discovery: data collected over many decades can be processed in a short period. Machine learning also helps us make better decisions, informed by predictions as well as experiments.

According to McKinsey, using machine learning to its full potential would help the healthcare industry generate 300 billion dollars in revenue each year. Experts predict that machine learning will eventually help with predictive modeling of biological processes, which will help us develop successful medication in less time.

By now, I guess you have a brief overview of drug discovery using machine learning, and I hope it excites you. To make it even more exciting, let’s look at four companies that have made a real-life impact on drug discovery.

Drug Discovery Examples in Real-Life

AstraZeneca

AstraZeneca harnesses data and technology to discover new medicines. They are embedding Data Science and Artificial Intelligence to enable themselves to deliver life-changing medicines.

AstraZeneca, in collaboration with Oxford University, developed a vaccine against Covid-19, the disease caused by the novel coronavirus. The vaccine has passed all three phases of clinical trials. According to their report, the phase III trials had 11,635 participants, of whom 131 were symptomatic. The Medicines and Healthcare products Regulatory Agency (MHRA) in the UK authorized Covid-19 Vaccine AstraZeneca on 30 December 2020. The company aims to supply millions of doses in the first quarter as part of an agreement with the government to supply up to 100 million doses in total.


Antidote is a company that mines clinical trials. Based in the UK and the US, it focuses on matching patients with researchers in clinical trials. Its platform lets patients find the most suitable clinical trials, helps researchers bring their latest study information to millions of patients, and connects them with members of the medical community. The company was launched under the name TrialReach in 2010 and rebranded as Antidote in 2016.


Atomwise is a well-known company in drug discovery. It aims to reduce the cost of developing medicines by using supercomputers to make predictions in advance from a database of molecular structures. AtomNet, its deep convolutional neural network, screens more than 100 million compounds each day.

In 2015, Atomwise launched a search for safe, existing medicines that could be repurposed to treat the Ebola virus. The company’s AI predicted two drugs that may significantly reduce Ebola infectivity. This analysis, which would typically take months or years, was completed within a single day. Imagine how efficient drug discovery would be if analyses like this became part of day-to-day practice.


Turbine is a company that concentrates purely on cancer drugs. A team of AI developers, medical professionals, and bioinformaticians spent 6 years researching and building an artificial intelligence solution to design personalized treatments for any cancer patient faster than any traditional healthcare service. The technology models cell biology at the molecular level, which lets it identify the best drug to target a specific tumor. It also identifies complex biomarkers and designs combination therapies by performing millions of experiments each day.

What makes Turbine unique is its molecular model of cancer biology, guided by an AI that identifies the biomarkers signaling sensitivity to treatment. The technology is already being used in collaborations with Bayer, the University of Cambridge, and Hungarian research groups to find new cancer cures. This in turn shortens the time to market and helps save the lives of patients suffering from incurable forms of this deadly disease.



I hope this article gave you a better understanding of what drug discovery is, how drugs were discovered in the past, how they are discovered today, the 7 phases of drug discovery, the algorithms used in drug discovery, the impact of machine learning on drug discovery, and real-life drug discovery examples.


Originally published at https://fittechie.in.


