AI & ML / MLOps
LLM integration, RAG pipelines, prompt engineering, MLOps infrastructure, and building AI features that work in production.
RAG in Production: What the Demo Does Not Show You
Retrieval-augmented generation is easy to demo and hard to run reliably. The real work is in chunking strategy, picking the right embedding model, choosing a vector store, reranking results, and measuring whether any of it works.
Running ML Pipelines on Kubernetes from Training to Serving
A machine learning model is only useful if you can retrain it, version it, and serve it reliably. This post covers building that whole pipeline with Kubeflow, Argo Workflows, and MLflow on Kubernetes, right through to serving with KServe.
LLM Integration Patterns That Actually Work in Production
Adding an LLM to a product is straightforward. Making it fast, cheap, and reliable is not. This post covers prompt caching, streaming, structured output, tool calling, fallback chains, and keeping costs from getting out of hand.
Building a Conversational AI Backend That Scales
A chat interface is just the front end. Behind it you need session management, context window handling, memory that works across turns, safety checks, and streaming delivery that does not keep users waiting. This post goes through all of it.