# vs Jupyter, Marimo, Pluto
Strata is closest in spirit to the new generation of reactive notebooks (Marimo, Pluto.jl): it shares the "your DAG comes from your variable references" idea. Where Strata steps further is in turning every cell output into a content-addressed artifact and treating remote compute and AI calls as first-class cell behaviors rather than escape hatches.
## Capability matrix
| Capability | Strata | Marimo | Pluto.jl | Jupyter |
|---|---|---|---|---|
| File format | Per-cell `.py` files + `notebook.toml` manifest | Single `.py` per notebook | Single `.jl` per notebook | JSON `.ipynb` |
| Git-friendly diffs | Per-cell, no embedded outputs or execution counts | Single-file but text | Single-file but text | Outputs + base64 images + execution counts embedded in the same file |
| Automatic DAG from variable references | Yes | Yes | Yes | No |
| Persistent cell-output cache | Automatic, content-addressed per cell, survives restarts | Opt-in via `mo.persistent_cache` / `mo.cache` decorators | None; Pluto guarantees program state is described by the visible code, with no hidden cache between sessions | None |
| Distributed / remote execution | `# @worker gpu-fly` annotation dispatches a single cell to a registered worker | Via external orchestration (e.g. SkyPilot recipe); no per-cell remote annotation | Single-process | Single-process per kernel |
| First-class AI/LLM cells | Prompt cells participate in the DAG and cache by template + inputs + model config | "AI-native" refers to editor-level code authoring; LLM calls happen inside Python cells | No | No |
| Built-in SQL cells | Yes (named connections, schema discovery, snapshot-aware caching) | Yes (built-in SQL engine) | Community library | Community extensions |
| Loop / iteration cells | Yes (`# @loop max_iter=N carry=var`), checkpointed per iteration | No | No | No |
| Variant cells (tabbed alternatives sharing a DAG slot) | Yes | No | No | No |
| Per-notebook Python environment | uv-managed, per notebook | uv-managed, per notebook | Julia project / `Project.toml` | Manual (venv / conda / kernel spec) |
| Headless / CI runner | `strata run` (executes the cascade in topological order) | Notebooks runnable as `python file.py` | `Pluto.run` for a script | `nbconvert --execute` |
## Where Strata is distinctive
Caching is automatic, not opt-in. Marimo offers persistent caching
through the `mo.persistent_cache` context manager, where the user explicitly
delimits a block of code they want cached. In Strata, every cell's output
is content-addressed by default: the provenance hash of source + upstream
artifact hashes + environment lockfile decides cache identity, and a cache
hit is the path of zero work. Re-running a notebook nobody's touched costs
milliseconds.
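A minimal sketch of what such a provenance hash could look like. The function name and inputs are illustrative, not Strata's internal API; the point is only that any change to source, upstream artifacts, or the environment produces a new cache key:

```python
import hashlib

def provenance_hash(cell_source, upstream_hashes, lockfile_text):
    """Illustrative cache key (not Strata's actual implementation): any change
    to the cell's source, its upstream artifacts, or the environment lockfile
    yields a different key, so a hit means genuinely identical work."""
    h = hashlib.sha256()
    h.update(cell_source.encode())
    for artifact_hash in sorted(upstream_hashes):  # order-independent over inputs
        h.update(artifact_hash.encode())
    h.update(lockfile_text.encode())
    return h.hexdigest()

# Same source and inputs, but a bumped dependency: the key changes, so the
# cached artifact is correctly treated as stale.
key_same_env = provenance_hash("model = train(data)", ["abc123"], "numpy==2.0.0")
key_new_env = provenance_hash("model = train(data)", ["abc123"], "numpy==2.1.0")
```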
Remote compute is a one-line annotation. Marimo can be run on a remote
host (SkyPilot integration, SSH port-forwarding), but the granularity is
the whole notebook process. Strata's `# @worker gpu-fly` annotation routes
a single cell (fit one classifier on a GPU, fingerprint one file
on a high-memory box) without rewriting the rest of the pipeline.
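Since cells are plain `.py` files, an annotated cell might look like the sketch below. The filename and the function body are hypothetical; the annotation comment is the only Strata-specific part, and per the source it decides where the cell runs, not what it computes:

```python
# fit_classifier.py: one hypothetical Strata cell file
# @worker gpu-fly   # dispatch only this cell to the registered "gpu-fly" worker

def fit(features, labels):
    # Stand-in for an expensive GPU fit; real training code would go here.
    # The annotation above, not the code, picks the host.
    return {"n_samples": len(features), "classes": sorted(set(labels))}

model = fit([[0.1], [0.9], [0.5]], ["a", "b", "a"])
```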
AI calls are first-class DAG nodes. Marimo's "AI-native" label refers
to the editor's code-authoring assistance, not to LLM responses
participating in the dependency graph. Strata's prompt cells render a
`{{ var }}` template against upstream artifacts, send the result to an
OpenAI-compatible API, validate against an optional JSON Schema, and store
the response as a cached artifact with the same caching guarantees as a Python
cell. Mixing prompt and Python cells in one DAG is the point.
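The render-then-cache step can be sketched in a few lines. This is an illustration of the idea, not Strata's code: the template syntax follows the `{{ var }}` form from the source, and the cache key covers template + inputs + model config as the capability matrix states; all function names here are made up:

```python
import hashlib
import json
import re

def render(template, artifacts):
    """Substitute {{ var }} placeholders with upstream artifact values."""
    return re.sub(r"\{\{\s*(\w+)\s*\}\}",
                  lambda m: str(artifacts[m.group(1)]), template)

def prompt_cache_key(template, artifacts, model_config):
    """Cache identity over template + inputs + model config: a changed
    temperature or model name invalidates the cached response."""
    payload = json.dumps(
        {"template": template, "inputs": artifacts, "model": model_config},
        sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()

prompt = render("Summarize the anomalies: {{ findings }}",
                {"findings": "3 outliers"})
key = prompt_cache_key("Summarize the anomalies: {{ findings }}",
                       {"findings": "3 outliers"},
                       {"model": "gpt-4o", "temperature": 0})
```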
Variant cells are unique to Strata. Three alternative training
implementations can share the same DAG slot; switching the active variant
is a one-line edit in `notebook.toml`, and downstream cells re-cascade
against the new producer. The other tools require duplicating cells (and
the downstream cells that read them) per variant.
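A manifest entry for such a slot might look like the fragment below. The field names are a guess at a plausible shape, not Strata's documented schema; what is grounded in the source is that variants share one DAG slot and the active one is a single line:

```toml
# Hypothetical notebook.toml fragment; field names are illustrative.
[cells.train]
kind = "variant"
variants = ["train_xgboost.py", "train_lightgbm.py", "train_baseline.py"]
active = "train_lightgbm.py"   # flip this one line; downstream cells re-cascade
```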
Notebook commits show the work, not the runtime. Strata stores cells
as one `.py` file per cell, `notebook.toml` as the manifest, and all
runtime state (display outputs, console snapshots, the artifact store) in
a gitignored `.strata/` directory. `notebook.toml`'s `updated_at` only
bumps on structural edits (adding/removing cells, changing workers),
so re-running a cell never touches the tracked tree. Jupyter `.ipynb`
files JSON-encode source, outputs (base64 images and all), and execution
counts in the same blob; Marimo and Pluto avoid the JSON issue with one
text file per notebook but still keep all cells together.
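Concretely, a tracked tree might look like this. The `cells/` directory name is an assumption; the per-cell files, the manifest, and the gitignored `.strata/` directory come from the source:

```
my-notebook/
├── notebook.toml        # manifest; tracked, bumps only on structural edits
├── cells/               # directory name illustrative
│   ├── load_data.py     # one file per cell; tracked
│   └── fit_model.py
└── .strata/             # outputs, console snapshots, artifact store; gitignored
```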
## Where other notebooks are stronger
- Interactive UI widgets. Marimo has `mo.ui.slider`, `mo.ui.dropdown`, etc.: reactive widgets the user can drag/click to update a parameter, which then propagates through the DAG. Strata doesn't have a widget layer; you change a value by editing source.
- Ecosystem maturity. Jupyter's ecosystem of extensions, kernels (R, Julia, Scala, Bash, etc.), and integrations is unmatched. Strata is Python-only with an AI provider abstraction.
- Reactive evaluation at the keystroke level. Pluto and Marimo immediately re-run dependent cells on edit. Strata is reactive about staleness (the DAG updates and downstream cells flip to stale on every source change), but execution is explicit: you press Run.
- Hosted offerings. Google Colab, Deepnote, Hex, and Databricks Notebooks all bundle a hosted runtime; Strata is self-hosted (see the section below on where these fit).
## Where the hosted offerings fit
Most managed notebook services are JupyterLab in a hosted wrapper. Their
files are .ipynb, their kernels are IPython, and they differentiate on
compute provisioning (GPUs, identity, billing) rather than on the
notebook runtime itself:
| Offering | Runtime | File format |
|---|---|---|
| Google Colab | Jupyter | .ipynb |
| Kaggle Notebooks | Jupyter | .ipynb |
| AWS SageMaker Studio | JupyterLab | .ipynb |
| Azure ML Notebooks | Jupyter / JupyterLab | .ipynb |
| Databricks Notebooks | Custom UI on IPython kernel | .ipynb (default), .dbc legacy |
None of them have automatic content-addressed caching, per-cell remote dispatch, or first-class AI cells, because the underlying Jupyter runtime doesn't.
The smaller "we-rejected-Jupyter" cohort (Marimo, Observable, Deepnote,
Hex) explicitly stepped away from .ipynb to redesign the runtime:
reactive execution, real-time collaboration, multi-language cells, app
deployment. That cohort is Strata's natural competitive set; the
JupyterLab-wrapper hosted offerings are an orthogonal category whose
moat is compute provisioning, not notebook-engine innovation.
## When to pick Strata
Strata is the right fit when your notebook is:
- Expensive to recompute: model training, embeddings, large scans, long LLM chains. The automatic cache pays for itself the first time you reload.
- Heterogeneous in compute: some cells want a GPU, some want a warehouse, some are pure CPU. The `# @worker` annotation routes each cell to where it should run.
- Iterative and branching: variant cells let you keep three model candidates in one notebook without forking.
- Version-controlled with others: plain text, no JSON-in-git pain, no execution-count churn on every re-run.
- AI-heavy: prompt cells make LLM responses cacheable like any other artifact, with schema-constrained output and retry-on-validation.
For light interactive exploration where the work is a few seconds per cell, you're not really paying for what Strata gives you; Jupyter and Marimo are fine. The value lands when your work is too expensive to re-run on every refresh.