A global leader in information and analytics for researchers and healthcare professionals.
- Estafet Team
- November 24, 2022
The Story
Our client, a global leader in the provision of information and analytics for scientific researchers and healthcare professionals, has launched a new cloud-based data platform designed to help life science companies overcome the challenges of modern R&D. The aim is to enrich and harmonise proprietary and external data and deliver it in an AI-ready environment. The solution supports customers in researching the scientific literature, drawing on its rich public data, and predicting previously unknown information. The anticipated benefit is that researchers can build far more accurate predictive models across a range of pre- and post-market activities, including drug efficacy studies, risk-benefit analyses, and pharmacovigilance.
Challenges
- ACCELERATE DEVELOPMENT
- SPEED UP SEMANTIC CONTENT HANDLING
- ENRICH SCIENTIFIC RESEARCH & ANALYTICS CAPABILITIES

The Challenges
Building a platform that meaningfully links huge amounts of structured and unstructured data (such as patents and journal articles) was a massive undertaking. Multiple teams had been involved in the initiative without much progress.
In addition to the usual challenges of building cloud-scale big data solutions with full automation of the software lifecycle, the project uses semantics based on the Resource Description Framework (RDF), which poses scalability issues and makes big data solutions even harder to implement.


How we collaborated
Estafet augmented the client’s team with expertise in solution architecture, development, machine learning, DevOps, and test automation. Our teams quickly understood both the business domain and the product roadmap, and broke the roadmap items down into actionable steps to deliver the product vision.
The unstructured data was processed with NLP techniques and stored as structured data in a graph. This simplified further processing and analysis through graph analytics algorithms, such as Link Prediction, Shortest Path, and Frequent Pattern, applied across the linked data. Machine learning models enabling drug-to-drug interaction and drug repurposing analysis were built in the AI Workbench, and REST APIs were developed to give third-party applications access.
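The case study does not describe how these graph algorithms work internally. As a rough illustration of what link prediction over linked data means, the Python sketch below scores every unlinked pair of nodes in a small graph by how many neighbours they share (the common-neighbours heuristic). The graph, node names, and function names are hypothetical and are not the client's data or implementation.

```python
from itertools import combinations

# Hypothetical toy graph: (entity, entity) links, e.g. as NLP might extract
# them from documents. Node names are illustrative placeholders only.
EDGES = [
    ("drug_A", "protein_X"),
    ("drug_A", "protein_Y"),
    ("drug_B", "protein_X"),
    ("drug_C", "protein_X"),
    ("drug_C", "protein_Y"),
    ("drug_B", "disease_1"),
]


def build_adjacency(edges):
    """Build an undirected adjacency map from (source, target) pairs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def common_neighbour_scores(adj):
    """Score each unlinked node pair by its number of shared neighbours.

    Pairs with no shared neighbours are omitted; a higher score suggests
    a missing link is more likely.
    """
    scores = {}
    for u, v in combinations(sorted(adj), 2):
        if v in adj[u]:
            continue  # already linked, nothing to predict
        shared = len(adj[u] & adj[v])
        if shared:
            scores[(u, v)] = shared
    return scores


if __name__ == "__main__":
    adjacency = build_adjacency(EDGES)
    ranked = sorted(common_neighbour_scores(adjacency).items(),
                    key=lambda kv: -kv[1])
    for pair, score in ranked:
        print(pair, score)
```

Here drug_A and drug_C share two targets, so that pair receives the highest score. Common neighbours is simply the easiest link-prediction heuristic to show; a production platform would more likely use richer scores or learned models.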
Deliverables
- NLP HANDLING OF UNSTRUCTURED DATA
- CLOUD-BASED DATA INGESTION PLATFORM FOR STRUCTURED DATA
- AI WORKBENCH ENVIRONMENT FOR DATA SCIENTISTS
The Success
The solution enabled individual patents to be ingested automatically from multiple patent offices. Furthermore, by moving to automated assimilation of unstructured data, the client increased the value of its expert-curated chemistry database. The new innovations in data handling made chemical structure diagrams machine-readable and searchable. This enabled seamless searching for drugs that have already been patented and opened up opportunities to repurpose existing drugs. Since developing an entirely new drug can cost around $1bn, this is potentially massive for pharmaceutical companies.
Outcomes
- EARLY VALUE TO CLIENTS DELIVERED
- HIGHLY EFFICIENT, HIGH-QUALITY TEAM AUGMENTATION
- REGULAR & RELIABLE RELEASES BUILD USER CONFIDENCE