Posts tagged "production"

Kitaru durable runtime around a Claude Agent SDK invocation

Don't make Claude do the same work twice

Claude Agent SDK runs the agent loop. Kitaru adds the durable runtime around a completed invocation — checkpointed results, artifacts, replay boundaries, and waits.

Alex Strick van Linschoten

Jun 1, 2026 · 8 mins

Kitaru

Your LangGraph agent works. Now make the workflow durable.

LangGraph keeps graph state, threads, and interrupts. Kitaru adds the durable workflow around the graph call — replay boundaries, durable waits, and inspectable runs.

Alex Strick van Linschoten

May 29, 2026 · 9 mins

Kitaru

OpenAI Agents are great. Production still needs a runtime.

The OpenAI Agents SDK stays the harness; Kitaru adds the runtime around it — durable workflow waits, replay boundaries, and inspectable execution history.

Alex Strick van Linschoten

May 27, 2026 · 10 mins

Kitaru

The Anatomy of a Production Coding Agent

A production coding agent isn't a prompt and a while loop. It's eight stages, each with different failure modes, costs, and human touchpoints. Here's the full pattern.

Hamza Tahir

Mar 15, 2026

LLMs

RLMs in Production: What Happens After the Notebook

Alex Strick van Linschoten

Feb 20, 2026 · 7 mins

LLMOps

The Agent Deployment Gap: Why Your LLM Loop Isn't Production-Ready (And What to Do About It)

Comprehensive analysis of why simple AI agent prototypes fail in production deployment, revealing the hidden complexities teams face when scaling from demos to enterprise-ready systems.

Alex Strick van Linschoten

Jul 28, 2025 · 9 mins

LLMOps

Here are the Top 7 LlamaIndex Alternatives to Build AI Production Agents

Discover the top 7 LlamaIndex alternatives to build AI production agents with ease.

Hamza Tahir

Jun 29, 2025 · 14 mins

MLOps

Understanding the AI Act: February 2025 Updates and Implications

The EU AI Act, now partially in effect as of February 2025, introduces comprehensive regulations for artificial intelligence systems with significant implications for global AI development. This landmark legislation categorizes AI systems based on risk levels - from prohibited applications to high-risk and limited-risk systems - establishing strict requirements for transparency, accountability, and compliance. The Act imposes substantial penalties for violations, up to €35 million or 7% of global turnover, and provides a clear timeline for implementation through 2027. Organizations must take immediate action to audit their AI systems, implement robust governance infrastructure, and enhance development practices to ensure compliance, with tools like ZenML offering technical solutions for meeting these regulatory requirements.

Alex Strick van Linschoten

Feb 18, 2025 · 6 mins

LLMOps

LLMOps in Production: 457 Case Studies of What Actually Works

A comprehensive overview of lessons learned from the world's largest database of LLMOps case studies (457 entries as of January 2025), examining how companies implement and deploy LLMs in production. Through nine thematic blog posts covering everything from RAG implementations to security concerns, this article synthesizes key patterns and anti-patterns in production GenAI deployments, offering practical insights for technical teams building LLM-powered applications.

Alex Strick van Linschoten

Jan 20, 2025 · 45 minutes

LLMOps

Production LLM Security: Real-world Strategies from Industry Leaders 🔐

Learn how leading companies like Dropbox, NVIDIA, and Slack tackle LLM security in production. This comprehensive guide covers practical strategies for preventing prompt injection, securing RAG systems, and implementing multi-layered defenses, based on real-world case studies from the LLMOps database. Discover battle-tested approaches to input validation, data privacy, and monitoring for building secure AI applications.

Alex Strick van Linschoten

Jan 15, 2025 · 8 mins

LLMOps

Optimizing LLM Performance and Cost: Squeezing Every Drop of Value

This comprehensive guide explores strategies for optimizing Large Language Model (LLM) deployments in production environments, focusing on maximizing performance while minimizing costs. Drawing from real-world examples and the LLMOps database, it examines three key areas: model selection and optimization techniques like knowledge distillation and quantization, inference optimization through caching and hardware acceleration, and cost optimization strategies including prompt engineering and self-hosting decisions. The article provides practical insights for technical professionals looking to balance the power of LLMs with operational efficiency.

Alex Strick van Linschoten

Jan 13, 2025 · 7 mins

LLMOps

Building Advanced Search, Retrieval, and Recommendation Systems with LLMs

Discover how embeddings power modern search and recommendation systems with LLMs, using case studies from the LLMOps Database. From RAG systems to personalized recommendations, learn key strategies and best practices for building intelligent applications that truly understand user intent and deliver relevant results.

Alex Strick van Linschoten

Dec 6, 2024 · 8 mins

LLMOps

Demystifying LLMOps: A Practical Database of Real-World Generative AI Implementations

The LLMOps Database offers a curated collection of 300+ real-world generative AI implementations, providing technical teams with practical insights into successful LLM deployments. This searchable resource includes detailed case studies, architectural decisions, and AI-generated summaries of technical presentations to help bridge the gap between demos and production systems.

Alex Strick van Linschoten

Dec 2, 2024 · 4 mins

mlstacks

Introducing mlstacks: a refreshed way to deploy MLOps infrastructure

We released an updated way to deploy MLOps infrastructure, building on the success of the `mlops-stack` repo and its stack recipes. All the new goodies are available via the `mlstacks` Python package.

Alex Strick van Linschoten

Sep 1, 2023 · 3 Mins Read
