Evidence Hub Curator App - Real World Evidence Platform

Work
CML Insights | 2023 - 2024 | Machine Learning Engineering Lead

Key Highlights

  • Principal-level ownership of an LLM-powered real-world evidence (RWE) extraction platform
  • Designed agent-based architecture for automated evidence extraction from research papers
  • Led system design, schema definition, and end-to-end feature delivery
  • Built scalable retrieval-augmented generation (RAG) pipelines over scientific PDFs
  • Advised and mentored engineers on LLM, RAG, and production AI best practices

Overview

LLM-driven platform that automates the discovery and extraction of real-world evidence (RWE) from academic research papers. Designed to help organizations quickly identify actionable interventions, target populations, and measurable outcomes without manual literature review.

The system transforms unstructured research PDFs into structured, decision-ready evidence at scale.

Architecture

Cloud-native, agent-based ML system:

  • Ingestion: Automated paper discovery and download via academic search providers
  • Knowledge layer: PDF ingestion into vector stores for semantic retrieval
  • Reasoning layer: LLM-backed agents for evidence extraction and validation
  • Output: Structured RWE mapped to standardized schemas for downstream use
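As a rough illustration of how these four layers could fit together, here is a minimal sketch. It is not the platform's code: the names `Chunk`, `ingest`, `retrieve`, and `run_pipeline` are hypothetical, a bag-of-words cosine similarity stands in for a real embedding model and vector store, and the reasoning layer is passed in as a plain callable instead of an LLM-backed agent.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass
from typing import Callable


@dataclass
class Chunk:
    paper_id: str
    text: str


def tokenize(text: str) -> Counter:
    """Toy stand-in for an embedding: lowercase bag-of-words counts."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def ingest(papers: dict[str, str]) -> list[Chunk]:
    """Ingestion + knowledge layer: split each paper into paragraph chunks."""
    return [
        Chunk(paper_id, para.strip())
        for paper_id, text in papers.items()
        for para in text.split("\n\n")
        if para.strip()
    ]


def retrieve(chunks: list[Chunk], query: str, k: int = 2) -> list[Chunk]:
    """Semantic retrieval (approximated here): top-k chunks by similarity."""
    q = tokenize(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, tokenize(c.text)), reverse=True)
    return ranked[:k]


def run_pipeline(
    papers: dict[str, str],
    query: str,
    reasoner: Callable[[str, list[Chunk]], str],  # assumption: wraps an LLM agent
    k: int = 2,
) -> dict:
    """Ingestion -> retrieval -> reasoning -> structured output."""
    top = retrieve(ingest(papers), query, k)
    return {
        "query": query,
        "evidence": reasoner(query, top),
        "sources": [c.paper_id for c in top],
    }
```

In a real deployment the `reasoner` callable would presumably wrap the LLM-backed agents, and `retrieve` would query a vector store instead of rescoring every chunk in memory.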

Key Technical Contributions

Evidence Extraction Engine

Designed and built an automated evidence extraction service converting research PDFs into structured RWE. Extracts research objectives, study design, treatments, populations, outcomes, effect sizes, and statistical measures with schema-level consistency.
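One way to picture "schema-level consistency" is a typed record that every extraction must fit. The sketch below is an assumption for illustration only: the class names `EffectEstimate` and `EvidenceRecord`, the field names, and the validation rules are hypothetical and are not the platform's actual schema. The fields mirror the categories listed above.

```python
from dataclasses import asdict, dataclass, field
from typing import Optional


@dataclass
class EffectEstimate:
    """One reported outcome with its (optional) statistical detail."""
    outcome: str
    effect_size: Optional[float] = None
    measure: Optional[str] = None   # e.g. "Cohen's d", "odds ratio"
    p_value: Optional[float] = None

    def __post_init__(self) -> None:
        # Reject statistically impossible values at construction time.
        if self.p_value is not None and not 0.0 <= self.p_value <= 1.0:
            raise ValueError(f"p_value out of range: {self.p_value}")


@dataclass
class EvidenceRecord:
    """Hypothetical structured RWE record for a single paper."""
    paper_id: str
    research_objective: str
    study_design: str
    treatments: list[str] = field(default_factory=list)
    populations: list[str] = field(default_factory=list)
    outcomes: list[EffectEstimate] = field(default_factory=list)

    def __post_init__(self) -> None:
        if not self.research_objective.strip():
            raise ValueError("research_objective must not be empty")


record = EvidenceRecord(
    paper_id="example-001",  # made-up identifier
    research_objective="Assess whether small-group tutoring improves reading.",
    study_design="randomized controlled trial",
    treatments=["small-group tutoring"],
    populations=["third-grade students"],
    outcomes=[EffectEstimate("reading score", 0.3, "Cohen's d", 0.02)],
)
row = asdict(record)  # nested dataclasses become plain dicts, ready for storage
```

Validating at construction time means a malformed extraction fails loudly before it reaches the datastore, which is one plausible way to keep records consistent across papers.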

Agentic Research Assistant

Led development of a multi-agent Research Assistant system (V2, with V3 in progress). Implemented:

  • Extractor Agent to reason over retrieved paper sections
  • Formatter Agent to normalize outputs into strict schemas

Enabled reliable, repeatable extraction using retrieval-augmented generation (RAG) over vectorized documents.
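The extractor/formatter split can be sketched as two small stages. Everything here is hypothetical: `LLM` is just a callable from prompt text to completion text (not a specific vendor API), `REQUIRED_KEYS` is an invented schema, and the formatter is written as deterministic Python instead of an LLM agent so that the sketch stays runnable.

```python
import json
from typing import Callable

LLM = Callable[[str], str]  # assumption: prompt in, completion text out

REQUIRED_KEYS = ("study_design", "treatments", "populations", "outcomes")


def extractor_agent(llm: LLM, question: str, sections: list[str]) -> str:
    """Ask the model to answer from the retrieved sections only."""
    context = "\n---\n".join(sections)
    prompt = (
        "Using only the excerpts below, report the evidence as a JSON object "
        f"with keys {', '.join(REQUIRED_KEYS)}.\n"
        f"Question: {question}\nExcerpts:\n{context}"
    )
    return llm(prompt)


def formatter_agent(raw: str) -> dict:
    """Normalize raw extractor output into a strict, fixed-shape record."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    unknown = set(data) - set(REQUIRED_KEYS)
    if unknown:
        raise ValueError(f"unexpected keys: {sorted(unknown)}")

    record: dict = {}
    for key in REQUIRED_KEYS:
        value = data.get(key)
        if key == "study_design":
            record[key] = str(value).strip() if value else "unknown"
        elif value is None:
            record[key] = []                      # missing list field
        elif isinstance(value, str):
            record[key] = [value.strip()]         # scalar promoted to list
        else:
            record[key] = [str(v).strip() for v in value]
    return record


def fake_llm(prompt: str) -> str:
    """Canned response standing in for a real model call."""
    return '{"study_design": "RCT", "treatments": "tutoring", "populations": ["grade 3 students"]}'


raw = extractor_agent(fake_llm, "What was tested?", ["Excerpt A", "Excerpt B"])
record = formatter_agent(raw)
```

Separating the two stages lets the extractor focus on reading the paper while the formatter guarantees that every record has the same keys and value types, whatever shape the model's raw answer takes.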

Platform & ML Systems Leadership

Owned architectural decisions across LLM integration, retrieval strategy, and schema design. Guided the team on prompt design, agent orchestration, evaluation strategies, and production hardening of LLM-powered services.

Technologies

  • Python: Core services, orchestration, and data processing
  • LLMs: OpenAI Assistants for agent-based reasoning
  • RAG: Vector stores with semantic retrieval over PDFs
  • Datastores: PostgreSQL for structured evidence storage
  • Kubernetes: Scalable deployment of extraction services

Impact

Reduced weeks of manual literature review to minutes by automating real-world evidence extraction. Enabled organizations across domains (education, healthcare, business) to act on high-quality research evidence with speed, consistency, and confidence.