AI/ML Engineer • RAG & LLM Systems

Building production AI systems that actually ship.

I design and build reliable AI infrastructure—RAG pipelines, LLM orchestration, and real-time inference systems.

Selected Work

Projects

Diagram: 5-stage RAG pipeline (Query → Route → Reform → Qdrant → Check → Stream Gen) on FastAPI + Qdrant
400 ms latency

Brckt

Production RAG system for real-time tennis analytics.

FastAPI • Qdrant • LLM Streaming
Diagram: Query → Orchestrator → agents (Planner, Coder, Reviewer, Executor) → E2B Sandbox
4 agents

CodePilot

Multi-agent AI system for autonomous code generation.

Claude 4.5 • LangGraph • E2B
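The CodePilot diagram names an orchestrator coordinating a Planner, Coder, Reviewer, and Executor, with execution in an E2B sandbox. The sketch below is one plausible control loop for that layout, written in plain Python rather than LangGraph's API. The `Review` type, the retry-on-rejection loop, and `max_rounds` are hypothetical, and `executor` is any callable standing in for sandboxed execution.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Review:
    """Hypothetical reviewer verdict: approve the code or send feedback back."""
    approved: bool
    feedback: str = ""


def orchestrate(
    query: str,
    planner: Callable[[str], str],
    coder: Callable[[str, str], str],
    reviewer: Callable[[str], Review],
    executor: Callable[[str], str],
    max_rounds: int = 3,
) -> str:
    """Plan once, loop coder -> reviewer until approved, then hand the code to the executor."""
    plan = planner(query)
    feedback = ""
    for _ in range(max_rounds):
        code = coder(plan, feedback)
        review = reviewer(code)
        if review.approved:
            # Stand-in for running the approved code in a sandbox (E2B in CodePilot).
            return executor(code)
        # Feed the reviewer's notes into the next coding attempt.
        feedback = review.feedback
    raise RuntimeError(f"no approved code after {max_rounds} rounds")
```

Capping the loop with `max_rounds` keeps a disagreeing coder/reviewer pair from spinning forever; the caller decides what to do when no draft is approved.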
Diagram: Real-time fraud detection (Data Stream → FastAPI → XGBoost Model → Risk Score; MLflow)
<100 ms latency

ML-Monitor

Production MLOps platform for real-time fraud detection.

FastAPI • XGBoost • MLflow
Diagram: Query → DistilBERT Classifier, with Qdrant, Router, and Redis on the cache HIT/MISS paths
60% cost savings

Cascade

Intelligent LLM router with semantic caching.

DistilBERT • Qdrant • Redis
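A semantic cache answers a new query from a stored response when the query is close enough in embedding space to one seen before; only misses reach an LLM. The minimal sketch below shows that HIT/MISS logic, assuming an in-memory list in place of Qdrant and Redis, a hypothetical `embed` function in place of a real embedding model, a pluggable `route` callable in place of the DistilBERT-based router, and an assumed cosine-similarity cutoff of 0.9.

```python
from __future__ import annotations

import math
from dataclasses import dataclass, field
from typing import Callable, Optional

Vector = list[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


@dataclass
class SemanticCache:
    embed: Callable[[str], Vector]  # hypothetical embedding function
    threshold: float = 0.9          # assumed similarity cutoff for a HIT
    entries: list[tuple[Vector, str]] = field(default_factory=list)

    def lookup(self, query: str) -> Optional[str]:
        """Return the cached answer of the most similar stored query, if similar enough."""
        q = self.embed(query)
        best_score, best_answer = 0.0, None
        for vec, ans in self.entries:
            score = cosine(q, vec)
            if score > best_score:
                best_score, best_answer = score, ans
        return best_answer if best_score >= self.threshold else None

    def store(self, query: str, answer: str) -> None:
        self.entries.append((self.embed(query), answer))


def answer(query: str, cache: SemanticCache, route: Callable[[str], str]) -> tuple[str, str]:
    """Return (response, "HIT" or "MISS"); on a miss, call the routed model and cache its reply."""
    cached = cache.lookup(query)
    if cached is not None:
        return cached, "HIT"
    response = route(query)
    cache.store(query, response)
    return response, "MISS"
```

Cost savings come from every HIT skipping a model call; the threshold trades hit rate against the risk of returning an answer to a question that only looks similar.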
Diagram: Hybrid RAG pipeline (Query → hybrid retrieval (BM25 + Dense) → Cross-Encoder → LLM, with Qdrant)
94% relevance

VerbaQuery

Industrial RAG with hybrid retrieval (BM25 + dense embeddings) and cross-encoder re-ranking for enterprise document search.

LangChain • Qdrant • Cross-Encoder
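The card does not say how the BM25 and dense result lists are combined, so the sketch below swaps in reciprocal rank fusion (RRF), a common choice, followed by re-ranking of the fused shortlist. The `rerank_scores` callable is an assumption standing in for a cross-encoder; in a real deployment it would typically wrap something like sentence-transformers' `CrossEncoder.predict` over (query, passage) pairs. The defaults `k=60`, `candidates=20`, and `top_k=5` are illustrative.

```python
from collections import defaultdict
from typing import Callable, Sequence


def reciprocal_rank_fusion(rankings: Sequence[Sequence[str]], k: int = 60) -> list[str]:
    """Fuse ranked lists of doc IDs: score(d) = sum over lists of 1 / (k + rank of d)."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc ID so the output is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))


def hybrid_retrieve(
    query: str,
    bm25_ranking: Sequence[str],
    dense_ranking: Sequence[str],
    rerank_scores: Callable[[str, list[str]], list[float]],
    candidates: int = 20,
    top_k: int = 5,
) -> list[str]:
    """Fuse BM25 and dense results, then re-rank the shortlist with a cross-encoder-style scorer."""
    shortlist = reciprocal_rank_fusion([bm25_ranking, dense_ranking])[:candidates]
    scores = rerank_scores(query, shortlist)
    # Stable sort: equal re-rank scores keep their fused order.
    order = sorted(range(len(shortlist)), key=lambda i: -scores[i])
    return [shortlist[i] for i in order[:top_k]]
```

RRF only needs ranks, so BM25 scores and embedding similarities never have to be put on a common scale; the more expensive cross-encoder then runs on a short candidate list instead of the whole corpus.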

About

Background

MS in Computer Science from Indiana University (3.9 GPA). I focus on making machine learning work in production.

My work spans RAG systems, LLM pipelines, and MLOps infrastructure. I care deeply about building AI that's reliable, fast, and actually useful.

Experience

AI/ML Engineer

2024 — Present
Brckt

Building real-time sports analytics with streaming LLM inference as part of a 4-person ML team. Reduced latency from 3 s to 400 ms using chunked streaming and model quantization. Led the optimization sprint, collaborated with backend engineers on API design, and mentored a junior engineer on streaming patterns.

400 ms latency • Streaming LLMs • Real-time analytics
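Chunked streaming sends output to the client in small pieces as the model produces it, so the first text arrives long before generation finishes. The minimal sketch below groups a token stream into chunks and flushes each one as soon as it fills; the character-based `max_chars` cutoff is an assumption, and the token source is any iterable. In FastAPI, a generator like this can be passed to `StreamingResponse`.

```python
from typing import Iterable, Iterator


def chunk_stream(tokens: Iterable[str], max_chars: int = 32) -> Iterator[str]:
    """Group streamed tokens into small chunks and yield each one as soon as it fills up."""
    buffer: list[str] = []
    size = 0
    for token in tokens:
        buffer.append(token)
        size += len(token)
        if size >= max_chars:
            # Flush immediately so the client sees text before generation finishes.
            yield "".join(buffer)
            buffer, size = [], 0
    if buffer:
        # Emit whatever is left once the model stops producing tokens.
        yield "".join(buffer)
```

Smaller chunks lower time-to-first-text at the cost of more network writes; the cutoff is a tuning knob rather than a fixed rule.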

AI Engineer

2024
Riverside Global

Architected production RAG systems for enterprise document search. Improved retrieval relevance from 67% (BM25 baseline) to 94% using hybrid search and cross-encoder re-ranking. Coordinated with Product, DevOps, and enterprise clients. Presented technical architecture to C-suite stakeholders.

94% relevance • Hybrid RAG • Enterprise scale

Tech Stack

Python
PyTorch
TensorFlow
LangChain
Hugging Face
OpenAI
Claude
Ollama

Contact

Let's build something

Currently open to AI/ML engineering opportunities. If you're building something interesting, I'd love to hear about it.

ayushkumarmalik10@gmail.com

What I'm Looking For

High-impact AI roles

Teams shipping real ML products to production—not just prototypes.

Hard problems

RAG at scale, LLM orchestration, real-time inference systems.

Strong engineering culture

Code reviews, testing, and ownership over ML systems end-to-end.

© 2026 Ayush