continuum

Unified execution runtime for LLM and ML programs

One runtime for inference, retrieval, and pipeline orchestration.


Demo


$ continuum run pipeline.ct

> loading program...

> compiling execution graph...

> step 1/3 embed(corpus)

> step 2/3 retrieve(query, k=5)

> step 3/3 generate(prompt + context)

✓ pipeline completed in 1.2s

Problem

LLM pipelines stitch together Python scripts, inference servers, vector stores, and orchestration layers, each with its own runtime and process boundary. The glue code is fragile, and every hop across a process boundary adds overhead.

What it is

Continuum is a C++ runtime that executes LLM and ML programs as compiled execution graphs. You describe a pipeline once; the runtime schedules and dispatches kernels with no interpreter in the loop.

embed, retrieve, generate — as first-class operations
static graph compilation catches errors before execution
single binary, no orchestration daemons
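To make the three operations concrete, here is a toy sketch of the data flow from embed to retrieve to generate. It is plain Python, not Continuum code: the function bodies, the bag-of-words "embedding", and the sample corpus are all assumptions made for illustration, and a real runtime would dispatch model kernels instead.

```python
import math
from collections import Counter

def embed(text):
    """Toy embedding: a bag-of-words count vector (stand-in for a real model)."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(query_vec, index, k=5):
    """Return the k documents most similar to the query vector."""
    ranked = sorted(index, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [doc for doc, _ in ranked[:k]]

def generate(prompt, context):
    """Stand-in for an LLM call: only assembles the final prompt text."""
    return prompt + "\n\nContext:\n" + "\n".join(context)

# Hypothetical sample data.
corpus = [
    "graphs are compiled ahead of time",
    "kernels are dispatched by the scheduler",
    "bananas are yellow",
]
index = [(doc, embed(doc)) for doc in corpus]   # step 1: embed(corpus)
query = "how are kernels dispatched"
context = retrieve(embed(query), index, k=2)    # step 2: retrieve(query, k)
answer = generate(query, context)               # step 3: generate(prompt + context)
print(answer)
```

Each step consumes only the previous step's output, which is what lets a pipeline like this be described once as a graph rather than wired together by hand.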

Features

Unified Runtime

LLM inference and ML ops in one execution environment.

Pipeline Composition

Chain embedding, retrieval, and generation as typed steps.

Native Performance

C++ core — no interpreter overhead on the critical path.

Typed Programs

Static execution graphs catch shape errors before execution.
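The idea of catching shape errors ahead of time can be sketched in a few lines. This is an illustrative model only: the Step type, the check_pipeline helper, and the shapes below are hypothetical names and values, not Continuum's type system.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Step:
    name: str
    in_shape: tuple   # shape the step expects as input, e.g. (384,)
    out_shape: tuple  # shape the step produces

class ShapeError(Exception):
    pass

def check_pipeline(steps):
    """Compare each step's output shape with the next step's input shape.

    Nothing is executed here; the check only inspects declared shapes.
    """
    for prev, nxt in zip(steps, steps[1:]):
        if prev.out_shape != nxt.in_shape:
            raise ShapeError(
                f"{prev.name} produces {prev.out_shape}, "
                f"but {nxt.name} expects {nxt.in_shape}"
            )
    return True

# Hypothetical pipelines: the second one wires a 384-dim embedding
# into a retriever that expects 768-dim vectors.
good = [Step("embed", (512,), (384,)), Step("retrieve", (384,), (5, 384))]
bad = [Step("embed", (512,), (384,)), Step("retrieve", (768,), (5, 768))]

check_pipeline(good)
try:
    check_pipeline(bad)
    caught = None
except ShapeError as err:
    caught = str(err)
print(caught)
```

The mismatch is reported from the declarations alone, so no model has to be loaded or run to find it.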

Execution model

Program (.ct) → Graph Compiler → Scheduler → Kernel Dispatch → Result
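The stages above can be sketched as a small model: a program description is validated and turned into a dependency graph, the graph is ordered topologically, and each step is dispatched to a kernel. Everything here is an assumption for illustration, including the dictionary program format, the kernel table, and the function names; it does not reflect the .ct format or Continuum's internals.

```python
from graphlib import TopologicalSorter

# Hypothetical program: each step names its kernel and the steps it depends on.
program = {
    "embed":    {"kernel": "embed",    "deps": []},
    "retrieve": {"kernel": "retrieve", "deps": ["embed"]},
    "generate": {"kernel": "generate", "deps": ["retrieve"]},
}

# Toy kernel table: each kernel receives the list of its dependencies' results.
kernels = {
    "embed":    lambda inputs: "vectors",
    "retrieve": lambda inputs: f"top-k({inputs[0]})",
    "generate": lambda inputs: f"answer({inputs[0]})",
}

def compile_graph(program):
    """Graph compiler: validate references, then map each step to its predecessors."""
    for name, step in program.items():
        if step["kernel"] not in kernels:
            raise ValueError(f"{name}: unknown kernel {step['kernel']!r}")
        for dep in step["deps"]:
            if dep not in program:
                raise ValueError(f"{name}: unknown dependency {dep!r}")
    return {name: set(step["deps"]) for name, step in program.items()}

def schedule(graph):
    """Scheduler: order steps so dependencies run first (raises CycleError on cycles)."""
    return list(TopologicalSorter(graph).static_order())

def run(program):
    """Kernel dispatch: execute each scheduled step and collect its result."""
    graph = compile_graph(program)
    results = {}
    for name in schedule(graph):
        step = program[name]
        inputs = [results[dep] for dep in step["deps"]]
        results[name] = kernels[step["kernel"]](inputs)
    return results

print(run(program)["generate"])
```

Because validation and ordering happen before the first kernel runs, a bad reference or a cyclic dependency fails at compile time rather than partway through execution.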

Example program

continuum run pipeline.ct --input query.txt

Why I Built This

Every LLM pipeline I built ended up as a mess of subprocess calls and HTTP clients. The model was fast; the glue was slow.

Continuum compiles the whole pipeline into a single execution graph: one runtime, no process boundaries between steps.

Repository

View on GitHub