Open Source

Get started.

ZenML runs your ML and data pipelines. Kitaru keeps agents alive across crashes, waits, and retries. Plenty of teams run both.

Build your first pipeline

Install ZenML

Get ZenML up and running in minutes. You just need to install it.

pip install 'zenml[local]'

Track inputs and outputs

Wire two steps into a training pipeline — ZenML tracks every input and output as a versioned artifact:

from sklearn.base import ClassifierMixin
from sklearn.datasets import load_iris
from sklearn.svm import SVC
from zenml import step, pipeline


@step
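def load_data() -> tuple[list, list]:
    # load_iris returns NumPy arrays; convert them so they match the list annotations.
    X, y = load_iris(return_X_y=True)
    return X.tolist(), y.tolist()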


@step
def train_model(X: list, y: list) -> ClassifierMixin:
    # The returned model is versioned + tracked as an artifact automatically.
    return SVC().fit(X, y)


@pipeline
def training_pipeline():
    X, y = load_data()
    train_model(X, y)


if __name__ == "__main__":
    training_pipeline()
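To make "every input and output as a versioned artifact" concrete, here is a minimal standard-library sketch of the idea. It is not ZenML's implementation: `ArtifactStore`, `tracked_step`, and the artifact naming scheme are hypothetical names invented for illustration.

```python
import functools
import hashlib
import pickle
from collections import defaultdict


class ArtifactStore:
    """Toy in-memory store (hypothetical): each save creates a new version."""

    def __init__(self):
        # artifact name -> list of (content digest, pickled payload)
        self._versions = defaultdict(list)

    def save(self, name, value):
        payload = pickle.dumps(value)
        digest = hashlib.sha256(payload).hexdigest()[:12]
        self._versions[name].append((digest, payload))
        return len(self._versions[name])  # 1-based version number

    def load(self, name, version=None):
        entries = self._versions[name]
        _, payload = entries[-1 if version is None else version - 1]
        return pickle.loads(payload)


def tracked_step(store):
    """Decorator that records a step's positional inputs and its output."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args):
            for i, arg in enumerate(args):
                store.save(f"{fn.__name__}.input_{i}", arg)
            result = fn(*args)
            store.save(f"{fn.__name__}.output", result)
            return result
        return wrapper
    return decorator


store = ArtifactStore()


@tracked_step(store)
def double(values):
    return [v * 2 for v in values]


double([1, 2, 3])   # produces version 1 of each artifact
double([10, 20])    # produces version 2 of each artifact

latest = store.load("double.output")
first = store.load("double.output", version=1)
```

Because each call appends a version rather than overwriting, an earlier output can still be loaded after the step has run again.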

Run your pipeline locally

Save the script as run.py and run it locally. The pipeline executes, artifacts are versioned, and the run shows up in your dashboard.

python run.py

ZenML Architecture

Built on a Robust Client-Server Architecture

ZenML is a metadata layer on top of your existing infrastructure, so all data and compute stay on your side.

ZenML system architecture diagram showing connections between five main components: ZenML Client (Development Environment), ZenML Server, Database, MLOps Infrastructure (Cloud, Kubernetes, on-prem), and MLOps Tools (Experiment tracker, model deployer)

Pause for a human, survive a crash

Install Kitaru

Get Kitaru up and running in minutes. You just need to install it.

pip install kitaru

Add a human-in-the-loop gate

Wrap an agent you already have, checkpoint the expensive calls, and wait for a human before anything ships:

from kitaru import flow, wait
from kitaru.adapters.pydantic_ai import KitaruAgent
from pydantic_ai import Agent

# KitaruAgent checkpoints every model + tool call for you.
agent = KitaruAgent(Agent("openai:gpt-5.4", system_prompt="You draft customer replies."))

@flow
def support_flow(ticket: str) -> str:
    reply = agent.run_sync(f"Draft a reply to: {ticket}").output
    approved = wait(schema=bool, question=f"Send this?\n\n{reply}")
    return reply if approved else "escalated to a human"

if __name__ == "__main__":
    print(support_flow.run("my invoice is wrong").wait())

Run it, walk away, resume it

Run it. It pauses at the approval gate and releases compute. Answer hours later and it resumes from where it stopped — no idle container, no lost work.

python flow.py
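The pause-and-resume behaviour can be pictured with a small standard-library sketch. It is not how Kitaru works internally: the JSON state file, `WaitingForInput`, `run_support_flow`, and `answer` are all hypothetical names chosen for illustration. The idea is that the process exits at the gate instead of idling, and a later run picks up from the saved state.

```python
import json
import tempfile
from pathlib import Path


class WaitingForInput(Exception):
    """Raised to end the process when the flow needs a human answer."""


def run_support_flow(ticket, state_path):
    state = json.loads(state_path.read_text()) if state_path.exists() else {}

    # Expensive step: computed once, then read back from saved state.
    if "reply" not in state:
        state["reply"] = f"Sorry about that: {ticket}"
        state_path.write_text(json.dumps(state))

    # Approval gate: with no recorded answer, stop instead of holding compute.
    if "approved" not in state:
        raise WaitingForInput(state["reply"])

    return state["reply"] if state["approved"] else "escalated to a human"


def answer(state_path, approved):
    """Record the human decision so the next run can continue."""
    state = json.loads(state_path.read_text())
    state["approved"] = approved
    state_path.write_text(json.dumps(state))


state_file = Path(tempfile.mkdtemp()) / "flow_state.json"

try:
    run_support_flow("my invoice is wrong", state_file)
    paused = False
except WaitingForInput:
    paused = True  # first run stops at the gate

answer(state_file, approved=True)  # could happen hours later
result = run_support_flow("my invoice is wrong", state_file)
```

The second call skips the drafting step because the reply is already in the state file, and goes straight to the decision.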

Execution Architecture

The brain and the hand should not have to share a process.

The runner owns durable control flow — checkpoint order, state, retry, replay, resume, and wait. Execution targets do the work: inline, in an isolated job, inside a sandbox, or through an external tool. Checkpoints are the contract between them.
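One way to picture checkpoints as the contract between runner and execution target is an ordered journal. This is a sketch under stated assumptions, not the Kitaru runtime: `Runner`, its `checkpoint` method, and the lambda "targets" are hypothetical. The runner only records and replays results, and whatever callable it is handed does the work.

```python
class Runner:
    """Owns control-flow state: an ordered journal of checkpoint results."""

    def __init__(self, journal=None):
        self.journal = list(journal or [])  # (name, result) pairs, in order
        self._cursor = 0
        self.executed = []  # checkpoints actually executed during this run

    def checkpoint(self, name, target, *args):
        if self._cursor < len(self.journal):
            # Replay: reuse the recorded result instead of calling the target.
            recorded_name, result = self.journal[self._cursor]
            if recorded_name != name:
                raise RuntimeError(
                    f"replay mismatch: expected {recorded_name}, got {name}"
                )
        else:
            # First execution: hand the work to the target and record it.
            result = target(*args)
            self.journal.append((name, result))
            self.executed.append(name)
        self._cursor += 1
        return result


def coding_flow(runner, issue, crash_before_patch=False):
    plan = runner.checkpoint("plan", lambda text: f"plan for {text}", issue)
    if crash_before_patch:
        raise RuntimeError("simulated crash")
    return runner.checkpoint("patch", lambda p: f"patch from {p}", plan)


first = Runner()
try:
    coding_flow(first, "issue-1", crash_before_patch=True)
except RuntimeError:
    pass  # the journal already holds the completed "plan" checkpoint

# A fresh runner resumes from the journal and executes only what is missing.
second = Runner(journal=first.journal)
patch = coding_flow(second, "issue-1")
```

Since the runner never inspects how a target produced its result, the same journal would work whether the target ran inline or somewhere else.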

Inner Harness (model loop): OpenAI Agents SDK, Anthropic Agent SDK, PydanticAI, LangGraph, Raw Python

Kitaru SDK (outer runtime primitives): @flow, @checkpoint, wait(), llm(), log(), save(), load(), configure()

Kitaru Runtime (durable brain): Checkpoint order, Replay, Resume, Wait, Artifacts & state, Versions

Execution Targets (hands that do work): Inline, Isolated job, Sandbox, External / MCP tool, Custom backend

Your Infrastructure (self-hosted substrate): Kubernetes, AWS / GCP / Azure, S3 / GCS, SQL Database
agent.py
import kitaru
from kitaru import flow, checkpoint

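# analyze_issue, write_code, and merge are assumed to be defined elsewhere,
# for example as @checkpoint-decorated functions.
@flow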
def coding_agent(issue: str) -> str:
    plan = analyze_issue(issue)
    patch = write_code(plan)

    # Pauses. Resumes when input arrives.
    approved = kitaru.wait(
        bool, question="Merge this PR?"
    )
    if approved:
        merge(patch)
    return patch

Ready for the next level?

Run ML pipelines or agent flows on a managed control plane. RBAC, audit logs, and dedicated support.