
S3 Mount — reading a public bucket from a notebook

This example demonstrates Strata's mount feature: a notebook cell can read a remote filesystem path as if it were local, with no S3-specific code inside the cell.

The example mounts a single file from NOAA's GSOD (Global Surface Summary of Day) dataset — a public, anonymous-readable S3 bucket — and runs a small pandas analysis over it.

What it shows

  • Declaring a mount in notebook.toml under [[mounts]].
  • Passing fsspec options through the mount to control access (here, anon = true for anonymous reads from a public bucket).
  • Accessing the mounted path as a pathlib.Path inside a cell.
  • Zero AWS credentials required — anon = true tells fsspec to skip the credential chain.

Mount declaration

[[mounts]]
name = "jfk_weather"
uri = "s3://noaa-gsod-pds/2024/72503014732.csv"
mode = "ro"
options = { anon = true }

Inside a cell, jfk_weather is a pathlib.Path; the CSV is materialized locally by fsspec on first read.

Cells

Cell      What it does
load      Reads the weather CSV into a DataFrame, parses dates, and keeps the columns we need.
summary   Groups by month and computes the mean temperature, the maximum and minimum temperatures, and the total precipitation.

Running

From the project root:

uv run strata-server --host 127.0.0.1 --port 8765

Then open examples/s3_mount from the Strata home page.

Swapping in a private bucket

Drop the options = { anon = true } line and configure AWS credentials the normal way (aws configure, AWS_PROFILE, IAM role, etc.). fsspec will pick them up automatically.
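
Before switching to a private bucket, it can help to check which credential sources are visible to the process that runs the notebook. The sketch below is a rough pre-flight check, not part of Strata or fsspec: visible_aws_credential_sources is a hypothetical helper that only looks at the standard AWS environment variables and the default shared credentials file. It cannot detect an IAM role attached to the machine, and it ignores overrides such as AWS_SHARED_CREDENTIALS_FILE.

```python
import os
from pathlib import Path


def visible_aws_credential_sources(env=None, home=None):
    """Hypothetical helper: list credential sources this process can plausibly see."""
    env = os.environ if env is None else env
    home = Path.home() if home is None else Path(home)

    sources = []
    # Static keys exported in the environment.
    if env.get("AWS_ACCESS_KEY_ID") and env.get("AWS_SECRET_ACCESS_KEY"):
        sources.append("environment access keys")
    # A named profile selected through AWS_PROFILE.
    if env.get("AWS_PROFILE"):
        sources.append(f"named profile {env['AWS_PROFILE']!r}")
    # The file written by `aws configure` in its default location.
    if (home / ".aws" / "credentials").is_file():
        sources.append("shared credentials file")
    return sources


if __name__ == "__main__":
    found = visible_aws_credential_sources()
    print(found if found else "no local credential source detected")
```

An empty result does not prove that access will fail, since role-based credentials are resolved elsewhere; it only indicates that nothing is configured locally.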

Load JFK weather from S3

kind python

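# @name Load JFK weather from S3
import pandas as pd

# jfk_weather is the pathlib.Path provided by the mount declared in notebook.toml.
weather = pd.read_csv(jfk_weather / "72503014732.csv")
# Parse DATE so the summary cell can group by calendar month.
weather["DATE"] = pd.to_datetime(weather["DATE"])
# Keep only the columns used downstream.
weather = weather[["DATE", "NAME", "TEMP", "MAX", "MIN", "PRCP"]]
weather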

Monthly weather summary

kind python

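# @name Monthly weather summary
# Uses the `weather` DataFrame produced by the load cell.
monthly = (
    # Label each row with its calendar month, e.g. "2024-01".
    weather.assign(month=weather["DATE"].dt.to_period("M").astype(str))
    .groupby("month", as_index=False)
    # One output column per named aggregation.
    .agg(
        avg_temp=("TEMP", "mean"),
        max_temp=("MAX", "max"),
        min_temp=("MIN", "min"),
        total_precip=("PRCP", "sum"),
    )
    .round(2)
)
monthly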
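
One caveat for the aggregation above: GSOD files encode missing readings as sentinel values rather than blanks (9999.9 for TEMP, MAX and MIN, and 99.99 for PRCP, according to the GSOD format notes; verify against the dataset's own readme). A single missing day would therefore show up as a monthly maximum of 9999.9. The sketch below is one possible way to mask those values before aggregating. mask_gsod_missing is a hypothetical helper, and the rows are synthetic stand-ins for the real CSV, not actual JFK measurements.

```python
import pandas as pd

# Sentinel values assumed from the GSOD format notes; check the dataset readme.
GSOD_MISSING = {"TEMP": 9999.9, "MAX": 9999.9, "MIN": 9999.9, "PRCP": 99.99}


def mask_gsod_missing(frame):
    """Hypothetical helper: return a copy with GSOD sentinel values set to NaN."""
    cleaned = frame.copy()
    for column, sentinel in GSOD_MISSING.items():
        if column in cleaned.columns:
            # Compare with a small tolerance instead of exact float equality.
            is_missing = (cleaned[column] - sentinel).abs() < 1e-6
            cleaned[column] = cleaned[column].mask(is_missing)
    return cleaned


# Synthetic rows for illustration only.
sample = pd.DataFrame(
    {
        "DATE": pd.to_datetime(["2024-01-01", "2024-01-02", "2024-02-01"]),
        "TEMP": [30.0, 34.0, 40.0],
        "MAX": [35.0, 9999.9, 45.0],
        "MIN": [25.0, 28.0, 33.0],
        "PRCP": [0.10, 99.99, 0.00],
    }
)

cleaned = mask_gsod_missing(sample)
monthly_clean = (
    cleaned.assign(month=cleaned["DATE"].dt.to_period("M").astype(str))
    .groupby("month", as_index=False)
    .agg(max_temp=("MAX", "max"), total_precip=("PRCP", "sum"))
)
print(monthly_clean)
```

In the notebook itself, the same helper could be applied to weather at the end of the load cell, so the summary cell stays unchanged. The pandas aggregations skip NaN by default, which is why masking is enough.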