# vs Jupyter, Marimo, Pluto
Strata is closest in spirit to the new generation of reactive notebooks (Marimo, Pluto.jl): it shares the "your DAG comes from your variable references" idea. Where Strata steps further is in turning every cell output into a content-addressed artifact and treating remote compute and AI calls as first-class cell behaviors rather than escape hatches.
## Capability matrix
| Capability | Strata | Marimo | Pluto.jl | Jupyter |
|---|---|---|---|---|
| File format | Per-cell `.py` files + `notebook.toml` manifest | Single `.py` per notebook | Single `.jl` per notebook | JSON `.ipynb` |
| Git-friendly diffs | Per-cell, no embedded outputs or execution counts | Single-file but text | Single-file but text | Outputs + base64 images + execution counts embedded in the same file |
| Automatic DAG from variable references | Yes | Yes | Yes | No |
| Persistent cell-output cache | Automatic, content-addressed per cell, survives restarts | Opt-in via `mo.persistent_cache` / `mo.cache` decorators | None; Pluto guarantees program state is described by the visible code, with no hidden cache between sessions | None |
| Distributed / remote execution | `# @worker gpu-fly` annotation dispatches a single cell to a registered worker | Via external orchestration (e.g. SkyPilot recipe); no per-cell remote annotation | Single-process | Single-process per kernel |
| First-class AI/LLM cells | Prompt cells participate in the DAG and cache by template + inputs + model config | "AI-native" refers to editor-level code authoring; LLM calls happen inside Python cells | No | No |
| Built-in SQL cells | Yes (named connections, schema discovery, snapshot-aware caching) | Yes (built-in SQL engine) | Community library | Community extensions |
| Loop / iteration cells | Yes (`# @loop max_iter=N carry=var`), checkpointed per iteration | No | No | No |
| Variant cells (tabbed alternatives sharing a DAG slot) | Yes | No | No | No |
| Per-notebook Python environment | uv-managed, per notebook | uv-managed, per notebook | Julia project / `Project.toml` | Manual (venv / conda / kernel spec) |
| Headless / CI runner | `strata run` (executes the cascade in topological order) | Notebooks runnable as `python file.py` | `Pluto.run` for a script | `nbconvert --execute` |
## Where Strata is distinctive
Caching is automatic, not opt-in. Marimo offers persistent caching
through the `mo.persistent_cache` context manager, where the user explicitly
delimits a block of code they want cached. In Strata, every cell's output
is content-addressed by default: the provenance hash of source + upstream
artifact hashes + environment lockfile decides cache identity, and a cache
hit is the path of zero work. Re-running a notebook nobody's touched costs
milliseconds.
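A minimal sketch of what such a provenance hash could look like. The function name and inputs are illustrative, not Strata's internal API; the point is only that any change to source, upstream artifacts, or the environment produces a new cache key:

```python
import hashlib

def provenance_hash(cell_source, upstream_hashes, lockfile_text):
    """Illustrative cache key (not Strata's actual implementation): any change
    to the cell's source, its upstream artifacts, or the environment lockfile
    yields a different key, so a hit means genuinely identical work."""
    h = hashlib.sha256()
    h.update(cell_source.encode())
    for artifact_hash in sorted(upstream_hashes):  # order-independent over inputs
        h.update(artifact_hash.encode())
    h.update(lockfile_text.encode())
    return h.hexdigest()

# Same source and inputs, but a bumped dependency: the key changes, so the
# cached artifact is correctly treated as stale.
key_same_env = provenance_hash("model = train(data)", ["abc123"], "numpy==2.0.0")
key_new_env = provenance_hash("model = train(data)", ["abc123"], "numpy==2.1.0")
```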
Remote compute is a one-line annotation. Marimo can be run on a remote
host (SkyPilot integration, SSH port-forwarding), but the granularity is
the whole notebook process. Strata's `# @worker gpu-fly` annotation routes
a single cell (fit one classifier on a GPU, fingerprint one file
on a high-memory box) without rewriting the rest of the pipeline.
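Since cells are plain `.py` files, an annotated cell might look like the sketch below. The filename and the function body are hypothetical; the annotation comment is the only Strata-specific part, and per the source it decides where the cell runs, not what it computes:

```python
# fit_classifier.py: one hypothetical Strata cell file
# @worker gpu-fly   # dispatch only this cell to the registered "gpu-fly" worker

def fit(features, labels):
    # Stand-in for an expensive GPU fit; real training code would go here.
    # The annotation above, not the code, picks the host.
    return {"n_samples": len(features), "classes": sorted(set(labels))}

model = fit([[0.1], [0.9], [0.5]], ["a", "b", "a"])
```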
AI calls are first-class DAG nodes. Marimo's "AI-native" label refers
to the editor's code-authoring assistance, not to LLM responses
participating in the dependency graph. Strata's prompt cells render a
`{{ var }}` template against upstream artifacts, send the result to an
OpenAI-compatible API, validate against an optional JSON Schema, and store
the response as a cached artifact with the same caching guarantees as a Python
cell. Mixing prompt and Python cells in one DAG is the point.
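The render-then-cache step can be sketched in a few lines. This is an illustration of the idea, not Strata's code: the template syntax follows the `{{ var }}` form from the source, and the cache key covers template + inputs + model config as the capability matrix states; all function names here are made up:

```python
import hashlib
import json
import re

def render(template, artifacts):
    """Substitute {{ var }} placeholders with upstream artifact values."""
    return re.sub(r"\{\{\s*(\w+)\s*\}\}",
                  lambda m: str(artifacts[m.group(1)]), template)

def prompt_cache_key(template, artifacts, model_config):
    """Cache identity over template + inputs + model config: a changed
    temperature or model name invalidates the cached response."""
    payload = json.dumps(
        {"template": template, "inputs": artifacts, "model": model_config},
        sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()

prompt = render("Summarize the anomalies: {{ findings }}",
                {"findings": "3 outliers"})
key = prompt_cache_key("Summarize the anomalies: {{ findings }}",
                       {"findings": "3 outliers"},
                       {"model": "gpt-4o", "temperature": 0})
```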
Variant cells are unique to Strata. Three alternative training
implementations can share the same DAG slot; switching the active variant
is a one-line edit in `notebook.toml`, and downstream cells re-cascade
against the new producer. The other tools require duplicating cells (and
the downstream cells that read them) per variant.
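A manifest entry for such a slot might look like the fragment below. The field names are a guess at a plausible shape, not Strata's documented schema; what is grounded in the source is that variants share one DAG slot and the active one is a single line:

```toml
# Hypothetical notebook.toml fragment; field names are illustrative.
[cells.train]
kind = "variant"
variants = ["train_xgboost.py", "train_lightgbm.py", "train_baseline.py"]
active = "train_lightgbm.py"   # flip this one line; downstream cells re-cascade
```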
Notebook commits show the work, not the runtime. Strata stores cells
as one `.py` file per cell, `notebook.toml` as the manifest, and all
runtime state (display outputs, console snapshots, the artifact store) in
a gitignored `.strata/` directory. `notebook.toml`'s `updated_at` only
bumps on structural edits (adding/removing cells, changing workers),
so re-running a cell never touches the tracked tree. Jupyter `.ipynb`
files JSON-encode source, outputs (base64 images and all), and execution
counts in the same blob; Marimo and Pluto avoid the JSON issue with one
text file per notebook but still keep all cells together.
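Concretely, a tracked tree might look like this. The `cells/` directory name is an assumption; the per-cell files, the manifest, and the gitignored `.strata/` directory come from the source:

```
my-notebook/
├── notebook.toml        # manifest; tracked, bumps only on structural edits
├── cells/               # directory name illustrative
│   ├── load_data.py     # one file per cell; tracked
│   └── fit_model.py
└── .strata/             # outputs, console snapshots, artifact store; gitignored
```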
## Where other notebooks are stronger
- Interactive UI widgets. Marimo has `mo.ui.slider`, `mo.ui.dropdown`, etc.: reactive widgets the user can drag/click to update a parameter, which then propagates through the DAG. Strata doesn't have a widget layer; you change a value by editing source.
- Ecosystem maturity. Jupyter's ecosystem of extensions, kernels (R, Julia, Scala, Bash, etc.), and integrations is unmatched. Strata is Python-only with an AI provider abstraction.
- Reactive evaluation at the keystroke level. Pluto and Marimo immediately re-run dependent cells on edit. Strata is reactive about staleness (the DAG updates and downstream cells flip to stale on every source change), but execution is explicit: you press Run.
- Hosted offerings. Google Colab, Deepnote, Hex, and Databricks Notebooks all bundle a hosted runtime; Strata is self-hosted (see the section below on where these fit).
## Where the hosted offerings fit
Most managed notebook services are JupyterLab in a hosted wrapper. Their
files are .ipynb, their kernels are IPython, and they differentiate on
compute provisioning (GPUs, identity, billing) rather than on the
notebook runtime itself:
| Offering | Runtime | File format |
|---|---|---|
| Google Colab | Jupyter | .ipynb |
| Kaggle Notebooks | Jupyter | .ipynb |
| AWS SageMaker Studio | JupyterLab | .ipynb |
| Azure ML Notebooks | Jupyter / JupyterLab | .ipynb |
| Databricks Notebooks | Custom UI on IPython kernel | .ipynb (default), .dbc legacy |
None of them have automatic content-addressed caching, per-cell remote dispatch, or first-class AI cells, because the underlying Jupyter runtime doesn't.
The smaller "we-rejected-Jupyter" cohort (Marimo, Observable, Deepnote,
Hex) explicitly stepped away from .ipynb to redesign the runtime:
reactive execution, real-time collaboration, multi-language cells, app
deployment. That cohort is Strata's natural competitive set; the
JupyterLab-wrapper hosted offerings are an orthogonal category whose
moat is compute provisioning, not notebook-engine innovation.
## When to pick Strata
Strata is the right fit when your notebook is:
- Expensive to recompute: model training, embeddings, large scans, long LLM chains. The automatic cache pays for itself the first time you reload.
- Heterogeneous in compute: some cells want a GPU, some want a warehouse, some are pure CPU. The `# @worker` annotation routes each cell to where it should run.
- Iterative and branching: variant cells let you keep three model candidates in one notebook without forking.
- Version-controlled with others: plain text, no JSON-in-git pain, no execution-count churn on every re-run.
- AI-heavy: prompt cells make LLM responses cacheable like any other artifact, with schema-constrained output and retry-on-validation.
For light interactive exploration where the work is a few seconds per cell, you're not really paying for what Strata gives you; Jupyter and Marimo are fine. The value lands when your work is too expensive to re-run on every refresh.