Evidence Hub Curator App - Real World Evidence Platform

Technologies

Kubeflow, PostgreSQL, Kubernetes, Terraform, GCP, Kustomize, Vault

Key Highlights

▸Principal-level ownership of an LLM-powered real-world evidence (RWE) extraction platform
▸Designed agent-based architecture for automated evidence extraction from research papers
▸Led system design, schema definition, and end-to-end feature delivery
▸Built scalable retrieval-augmented generation (RAG) pipelines over scientific PDFs
▸Advised and mentored engineers on LLM, RAG, and production AI best practices

Overview

LLM-driven platform that automates the discovery and extraction of real-world evidence (RWE) from academic research papers. Designed to help organizations quickly identify actionable interventions, target populations, and measurable outcomes without manual literature review.

The system transforms unstructured research PDFs into structured, decision-ready evidence at scale.

Architecture

Cloud-native, agent-based ML system:

Ingestion: Automated paper discovery and download via academic search providers
Knowledge layer: PDF ingestion into vector stores for semantic retrieval
Reasoning layer: LLM-backed agents for evidence extraction and validation
Output: Structured RWE mapped to standardized schemas for downstream use
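
As a rough illustration of the knowledge layer described above, the sketch below chunks extracted paper text and retrieves the most relevant chunks for a query. It is not the platform's implementation: the bag-of-words `embed` function stands in for a real embedding model, and `InMemoryVectorStore`, `chunk_text`, and `build_store` are hypothetical names introduced only for this example.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy embedding: bag-of-words counts (a stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class InMemoryVectorStore:
    """Hypothetical minimal vector store: keeps (chunk, vector) pairs in a list."""

    def __init__(self):
        self._items = []

    def add(self, chunk):
        self._items.append((chunk, embed(chunk)))

    def search(self, query, k=2):
        """Return the k chunks most similar to the query."""
        q = embed(query)
        ranked = sorted(self._items, key=lambda item: cosine(q, item[1]), reverse=True)
        return [chunk for chunk, _ in ranked[:k]]


def chunk_text(text, max_words=40):
    """Split extracted PDF text into fixed-size word windows."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def build_store(paper_text, max_words=40):
    """Chunk one paper's text and index every chunk."""
    store = InMemoryVectorStore()
    for chunk in chunk_text(paper_text, max_words):
        store.add(chunk)
    return store
```

A production system would replace `embed` with a learned embedding model and the list scan with an indexed vector database; the chunk, index, and search shape stays the same.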

Key Technical Contributions

Evidence Extraction Engine

Designed and built an automated evidence extraction service converting research PDFs into structured RWE. Extracts research objectives, study design, treatments, populations, outcomes, effect sizes, and statistical measures with schema-level consistency.
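
To make "schema-level consistency" concrete, here is a minimal sketch of what a structured evidence record could look like. The field names follow the list above, but the actual production schema is not shown in this write-up, so `EvidenceRecord`, `EffectSize`, and the specific validation rules are assumptions made for illustration.

```python
from dataclasses import dataclass, field, asdict
from typing import List, Optional


@dataclass
class EffectSize:
    """One reported effect for one outcome (hypothetical shape)."""
    outcome: str
    measure: str  # e.g. "Cohen's d" or "odds ratio"
    value: float
    ci_low: Optional[float] = None
    ci_high: Optional[float] = None
    p_value: Optional[float] = None


@dataclass
class EvidenceRecord:
    """Structured evidence extracted from a single paper (hypothetical shape)."""
    research_objective: str
    study_design: str
    treatments: List[str] = field(default_factory=list)
    populations: List[str] = field(default_factory=list)
    outcomes: List[str] = field(default_factory=list)
    effect_sizes: List[EffectSize] = field(default_factory=list)

    def validate(self):
        """Return a list of problems; an empty list means the record is consistent."""
        problems = []
        if not self.research_objective.strip():
            problems.append("research_objective is empty")
        if not self.study_design.strip():
            problems.append("study_design is empty")
        for es in self.effect_sizes:
            if es.outcome not in self.outcomes:
                problems.append(f"effect size refers to unknown outcome: {es.outcome}")
            if es.p_value is not None and not 0.0 <= es.p_value <= 1.0:
                problems.append(f"p_value out of range for {es.outcome}: {es.p_value}")
            if es.ci_low is not None and es.ci_high is not None and es.ci_low > es.ci_high:
                problems.append(f"confidence interval inverted for {es.outcome}")
        return problems
```

Calling `asdict(record)` on a validated record yields a plain dictionary that can be serialized to JSON or written to a relational store.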

Agentic Research Assistant

Led development of a multi-agent Research Assistant system (V2, with V3 in progress). Implemented:

Extractor Agent to reason over retrieved paper sections
Formatter Agent to normalize outputs into strict schemas

Enabled reliable, repeatable extraction using retrieval-augmented generation (RAG) over vectorized documents.
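
The Extractor-to-Formatter handoff can be sketched roughly as follows. This is an illustrative outline under stated assumptions, not the production agents: the model is passed in as a plain `llm` callable, `fake_llm` returns an invented canned response so the example runs offline, and the required keys are hypothetical.

```python
import json
from typing import Callable, Dict, List

# Hypothetical strict output shape for this sketch.
REQUIRED_KEYS = ("study_design", "treatments", "populations", "outcomes")
LIST_KEYS = ("treatments", "populations", "outcomes")


def extractor_agent(sections: List[str], llm: Callable[[str], str]) -> str:
    """Build a prompt from retrieved sections and return the raw model text."""
    context = "\n\n".join(f"[section {i + 1}]\n{s}" for i, s in enumerate(sections))
    prompt = (
        "Using only the sections below, report the study design, treatments, "
        "populations, and outcomes as a JSON object.\n\n" + context
    )
    return llm(prompt)


def formatter_agent(raw: str) -> Dict[str, object]:
    """Normalize raw extractor output into the strict shape, or raise ValueError."""
    start, end = raw.find("{"), raw.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object found in extractor output")
    data = json.loads(raw[start:end + 1])  # JSONDecodeError is a ValueError
    missing = [k for k in REQUIRED_KEYS if k not in data]
    if missing:
        raise ValueError(f"missing keys: {missing}")
    record: Dict[str, object] = {"study_design": str(data["study_design"]).strip()}
    for key in LIST_KEYS:
        value = data[key]
        items = value if isinstance(value, list) else [value]  # coerce scalars to lists
        record[key] = [str(item).strip() for item in items if str(item).strip()]
    return record  # keys outside the schema are dropped


def run_pipeline(sections: List[str], llm: Callable[[str], str]) -> Dict[str, object]:
    """Extractor output feeds directly into the Formatter."""
    return formatter_agent(extractor_agent(sections, llm))


def fake_llm(prompt: str) -> str:
    """Invented canned response standing in for a real model call."""
    return (
        'Here is the result: {"study_design": "randomized controlled trial", '
        '"treatments": "peer tutoring", "populations": ["grade 3 students "], '
        '"outcomes": ["reading fluency"], "notes": "ignored"}'
    )
```

Keeping the Formatter as a separate, deterministic step means malformed model output fails loudly with a `ValueError` instead of reaching downstream storage.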

Platform & ML Systems Leadership

Owned architectural decisions across LLM integration, retrieval strategy, and schema design. Guided the team on prompt design, agent orchestration, evaluation strategies, and production hardening of LLM-powered services.

Technologies

Python: Core services, orchestration, and data processing
LLMs: OpenAI Assistants for agent-based reasoning
RAG: Vector stores with semantic retrieval over PDFs
Datastores: PostgreSQL for structured evidence storage
Kubernetes: Scalable deployment of extraction services

Impact

Reduced weeks of manual literature review to minutes by automating real-world evidence extraction. Enabled organizations across domains (education, healthcare, business) to act on high-quality research evidence with speed, consistency, and confidence.