4 articles

AI & ML / MLOps

LLM integration, RAG pipelines, prompt engineering, MLOps infrastructure, and building AI features that work in production.

RAG · LLM · Vector DB · Embeddings · LangChain

RAG in Production: What the Demo Does Not Show You

Retrieval-augmented generation is easy to demo and hard to run reliably. The real work is in chunking strategy, picking the right embedding model, choosing a vector store, reranking results, and measuring whether any of it works.

May 5, 2025 · 13 min read

MLOps · Kubernetes · Kubeflow · MLflow · KServe

Running ML Pipelines on Kubernetes from Training to Serving

A machine learning model is only useful if you can retrain it, version it, and serve it reliably. This post covers building the full pipeline on Kubernetes with Kubeflow, Argo Workflows, and MLflow, right through to serving with KServe.

Apr 18, 2025 · 14 min read

LLM · OpenAI · Prompt Engineering · Tool Calling · Streaming

LLM Integration Patterns That Actually Work in Production

Adding an LLM to a product is straightforward. Making it fast, cheap, and reliable is not. This post covers prompt caching, streaming, structured output, tool calling, fallback chains, and keeping costs from getting out of hand.

Mar 25, 2025 · 10 min read

Conversational AI · WebSockets · RAG · Memory · Multimodal

Building a Conversational AI Backend That Scales

A chat interface is just the front end. Behind it you need session management, context window handling, memory that persists across turns, safety checks, and streaming delivery that does not keep users waiting. This post goes through all of it.

Feb 20, 2025 · 12 min read