AI/ML Engineer • RAG & LLM Systems

I make AI systems practical, performant, and production-ready.

Currently at Radical Squares building RAG pipelines, LLM orchestration, and real-time inference infrastructure.

Selected Work

Projects

400 ms latency

Brckt

Production RAG system for real-time tennis analytics.

Built 5-stage pipeline with query routing, hybrid retrieval, and streaming generation
Reduced LLM latency from 3s to 400ms (7.5x faster) using chunked streaming
Integrated via FastAPI endpoints.

FastAPI, Qdrant, LLM Streaming

Agents

CodePilot

Multi-agent AI system for autonomous code generation.

Four specialized agents (Planner, Coder, Reviewer, Tester) coordinate on complex coding tasks with sandboxed E2B execution, reducing iteration time by 60% vs manual development cycles
LangGraph provides state management for reliable agent coordination, cutting the coordination failure rate from 30% to under 5%.

Claude 4.5, LangGraph, E2B

<100 ms latency

ML-Monitor

Production MLOps platform for real-time fraud detection.

Sub-100ms inference (5x faster than industry standard 500ms) with automated retraining on drift
Handles 10K+ predictions/sec with Grafana observability
Why it matters for 2026: Same MLOps patterns power LLM guardrails—fraud detection principles transfer directly to prompt injection detection and output validation for GenAI systems.

FastAPI, XGBoost, MLflow

60% cost saved

Cascade

Intelligent LLM router with semantic caching.

DistilBERT classifier achieves 97% routing accuracy at 50ms—routes 70% of queries to GPT-3.5, 30% to GPT-4 based on complexity
Result: 60% cost reduction vs 100% GPT-4 baseline
Hybrid cache strategy (exact-match Redis + semantic Qdrant); threshold tuning cut false-positive cache hits from 15% to under 2%.
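
The exact-match plus semantic lookup can be sketched as follows. This is a minimal illustration, not the deployed code: a dict stands in for Redis, a plain list stands in for Qdrant, and `toy_embed`, `TwoTierCache`, the vocabulary, and the 0.9 threshold are all hypothetical. Raising the threshold trades fewer false-positive hits for a lower hit rate.

```python
import math

# Hypothetical vocabulary for a toy bag-of-words embedding (illustration only).
VOCAB = ["refund", "order", "cancel", "weather", "today"]


def toy_embed(text):
    """Map text to word counts over VOCAB; a stand-in for a real embedding model."""
    words = text.lower().split()
    return [float(words.count(w)) for w in VOCAB]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class TwoTierCache:
    """Exact-match lookup first, then nearest-neighbour lookup above a threshold."""

    def __init__(self, embed, threshold=0.9):
        self.embed = embed          # assumed signature: str -> list[float]
        self.threshold = threshold  # assumed value; tune against labelled query pairs
        self.exact = {}             # in-memory stand-in for the exact-match store
        self.semantic = []          # in-memory stand-in for the vector store

    def put(self, query, answer):
        self.exact[query] = answer
        self.semantic.append((self.embed(query), answer))

    def get(self, query):
        # Tier 1: exact string match.
        if query in self.exact:
            return self.exact[query], "exact"
        if not self.semantic:
            return None, "miss"
        # Tier 2: best cosine match, accepted only at or above the threshold.
        vec = self.embed(query)
        score, answer = max(
            ((cosine(vec, v), a) for v, a in self.semantic),
            key=lambda pair: pair[0],
        )
        if score >= self.threshold:
            return answer, "semantic"
        return None, "miss"


cache = TwoTierCache(toy_embed, threshold=0.9)
cache.put("cancel order refund", "cached answer")
print(cache.get("cancel order refund"))  # exact-tier hit
print(cache.get("refund cancel order"))  # semantic-tier hit (same words, new order)
print(cache.get("weather today"))        # miss
```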

DistilBERT, Qdrant, Redis

94% relevance

VerbaQuery

Industrial RAG with hybrid retrieval (BM25 + dense embeddings) and cross-encoder re-ranking for enterprise document search.

Hybrid approach: 94% relevance vs 78% dense-only baseline (+16 percentage points)
Architecture choice: BM25 catches exact keywords, SBERT handles semantic matches—cross-encoder re-ranking on top-50 adds final 4% boost
Evaluated on 2K query-document pairs with manual relevance judgments
(Private enterprise deployment)
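
One common way to merge a BM25 ranking with a dense ranking is reciprocal rank fusion (RRF). The sketch below assumes RRF purely for illustration; the project description does not say which fusion method was used. The function name, document IDs, and `k=60` are hypothetical. In a pipeline like the one described, the fused list would then be cut to the top 50 and passed to the cross-encoder.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into a single ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. k=60 is a commonly used default, assumed here.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by document ID for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical rankings from the two retrievers.
bm25_ranking = ["d1", "d2", "d3"]    # keyword matches
dense_ranking = ["d3", "d1", "d4"]   # semantic matches

fused = reciprocal_rank_fusion([bm25_ranking, dense_ranking])
print(fused)
```

Documents that appear in both lists (here `d1` and `d3`) rise above documents found by only one retriever, which is the behaviour the hybrid approach relies on.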

LangChain, Qdrant, Cross-Encoder

About

Background

MS in Computer Science from Indiana University with a 3.9 GPA. Focused on making machine learning work in production.

My work spans RAG systems, LLM pipelines, and MLOps infrastructure. I care deeply about building AI that's reliable, fast, and actually useful.

Experience

GenAI Engineer

Jan 2026 — Present

Radical Squares

Developing production GenAI applications integrating OpenAI GPT-4o APIs with LangChain. Built full-stack AI platform with React/FastAPI, implemented RAG pipeline with vector embeddings, and deployed microservices with Docker/Redis achieving 94.6% API cost reduction.

GPT-4o, RAG Pipeline, 94.6% cost reduction

AI/ML Engineer

Dec 2024 — Present

Brckt (Peristyle Labs)

Built real-time GenAI application using Llama 3.3-70B LLM with streaming responses via Server-Sent Events (SSE). Developed scalable backend API with FastAPI and async processing, containerized with Docker and deployed with Caddy reverse proxy.
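
Streaming over Server-Sent Events comes down to framing each chunk as a `data:` line followed by a blank line. The sketch below shows only that framing, under stated assumptions: `sse_event` and `stream_tokens` are hypothetical helper names, and the closing `[DONE]` event is an illustrative convention rather than part of the SSE format or of this project's API.

```python
def sse_event(data, event=None):
    """Frame a payload as one Server-Sent Event.

    Multi-line payloads become one `data:` line per line, and the event
    is terminated by a blank line, as the SSE wire format requires.
    """
    lines = []
    if event is not None:
        lines.append(f"event: {event}")
    for line in data.split("\n"):
        lines.append(f"data: {line}")
    return "\n".join(lines) + "\n\n"


def stream_tokens(tokens):
    """Yield one SSE frame per generated token, then a final 'done' event."""
    for token in tokens:
        yield sse_event(token)
    # Assumed end-of-stream marker so the client knows when to stop reading.
    yield sse_event("[DONE]", event="done")


# Hypothetical token stream standing in for LLM output.
for frame in stream_tokens(["Hello", " world"]):
    print(repr(frame))
```

A generator like `stream_tokens` can be handed to a streaming HTTP response with the `text/event-stream` media type, so the client sees the first token as soon as it is produced instead of waiting for the full completion.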

Llama 3.3-70B, SSE Streaming, Docker/Caddy

GenAI Developer

Jun 2025 — Dec 2025

Riverside Global

Architected an enterprise RAG system using the GPT-4 API with LangChain orchestration. Implemented a 5-stage pipeline with hybrid retrieval (BM25 + semantic) and built vector search with ChromaDB/FAISS over 10,000+ documents, achieving 94% retrieval accuracy.

94% accuracy, ChromaDB/FAISS, GPT-4 + LangChain

Tech Stack

Python

PyTorch

TensorFlow

LangChain

Hugging Face

OpenAI

Claude

Ollama

Contact

Let's build something

Currently open to AI/ML engineering opportunities. If you're building something interesting, I'd love to hear about it.

GitHub LinkedIn
