
Build a Pipeline from YAML

Building a Spatial Pack by hand — converting files, reprojecting coordinates, generating tiles, writing manifests — is tedious and error-prone. Pipelines automate the entire workflow. Define your sources, processing stages, and output metadata in a single YAML file, and the spatialpack CLI executes it reproducibly.

This recipe walks through writing a Pipeline YAML from scratch and building a complete Spatial Pack.

  • spatialpack CLI installed (pip install -e ".[full]")
  • Source data files (Shapefiles, GeoJSON, GeoPackage, or GeoTIFF)

Set up a directory for your pack with a data folder for the source files. You choose the output directory later, at build time.

Terminal window
mkdir -p my-pack/data
  • my-pack/
    • data/
      • sites.geojson (your source data)
    • pipeline.yaml (you will create this next)

What just happened: You created a clean workspace. Source data goes in data/, and the Pipeline YAML lives at the project root. The build output will go in a separate directory that you specify at build time.

Create pipeline.yaml and start with the pipeline and pack sections. These describe what the pack is and which area it covers.

pipeline.yaml
pipeline:
  name: my-sites-pack
  version: "1.0"
pack:
  id: "my-org:demo:sites:v1"
  version: "1.0.0"
  title: "Development Sites Pack"
  theme: site-selection
  geography: demo
  license: CC-BY-4.0
  region:
    bbox: [115.83, -31.99, 115.90, -31.93]
    crs: "EPSG:4326"

What just happened: The pack section sets the identity and spatial extent of your Spatial Pack. The bbox defines the bounding box (west, south, east, north in WGS84 coordinates), and crs sets the storage coordinate system. The id follows a colon-delimited naming convention: org:region:theme:version.
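The two conventions above (a four-part colon-delimited id and a west, south, east, north bbox) are easy to get wrong by hand. The following is a minimal sketch of a pre-flight check in Python. The helper names parse_pack_id and check_bbox are hypothetical and are not part of the spatialpack CLI, and the bbox check assumes the region does not cross the antimeridian.

```python
def parse_pack_id(pack_id: str) -> dict:
    """Split an id such as 'my-org:demo:sites:v1' into its four parts."""
    parts = pack_id.split(":")
    if len(parts) != 4 or not all(parts):
        raise ValueError(f"expected org:region:theme:version, got {pack_id!r}")
    return dict(zip(("org", "region", "theme", "version"), parts))


def check_bbox(bbox: list) -> None:
    """Raise ValueError unless bbox is [west, south, east, north] in degrees."""
    if len(bbox) != 4:
        raise ValueError("bbox needs exactly four numbers")
    west, south, east, north = bbox
    # Assumption: the region does not cross the antimeridian, so west < east.
    if not (-180 <= west < east <= 180):
        raise ValueError("west must be less than east, within [-180, 180]")
    if not (-90 <= south < north <= 90):
        raise ValueError("south must be less than north, within [-90, 90]")


print(parse_pack_id("my-org:demo:sites:v1"))
check_bbox([115.83, -31.99, 115.90, -31.93])  # passes silently
```

A swapped pair such as [115.90, -31.99, 115.83, -31.93] raises ValueError, which is the most common mistake when copying coordinates from a map tool.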

Add a sources section that names each input dataset with its file path and license.

pipeline.yaml
sources:
  sites:
    path: "data/sites.geojson"
    license: CC-BY-4.0
    format: geojson

What just happened: You declared a named source called sites. Stages reference this name (not the file path) when they need input data. The license metadata ensures governance tracking — every layer in the final pack traces back to a licensed source.

Stages are the core of a Pipeline. Each stage has an action (what to do), an input (where to read from), an output (where to write), and optional options.

Add three stages: convert the source data to GeoParquet, compute area metrics, and generate vector tiles for map rendering.

pipeline.yaml
stages:
  - name: convert_sites
    action: convert.shp
    input: sites
    output: "layers/sites.parquet"
    options:
      crs: EPSG:4326
      bbox: ${pack.region.bbox}
    layer:
      id: sites
      title: "Development Sites"
      type: vector
  - name: compute_metrics
    action: metrics.compute
    input: convert_sites
    depends_on: [convert_sites]
    output: "layers/sites.parquet"
  - name: tile_sites
    action: tiles.pmtiles
    input: convert_sites
    depends_on: [convert_sites]
    output: "layers/sites.pmtiles"
    options:
      min_zoom: 4
      max_zoom: 14

What just happened: You defined a three-stage pipeline. The convert_sites stage reads the GeoJSON source and writes GeoParquet. The compute_metrics stage adds area and perimeter columns. The tile_sites stage generates PMTiles for web map rendering.

Notice the ${pack.region.bbox} variable in the bbox option. The executor resolves this from the pack.region.bbox value before running the stage, so the same Pipeline works for different regions by changing only the pack metadata.
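To make the substitution concrete, here is a minimal Python sketch of how a ${...} reference could be resolved against the parsed YAML. It illustrates the idea only and is not the executor's actual code: the function names lookup and resolve are hypothetical, and the rule that a value consisting of a single variable keeps its original type (so the bbox stays a list) is an assumption.

```python
import re

_VAR = re.compile(r"\$\{([^}]+)\}")


def lookup(config: dict, dotted: str):
    """Follow a dotted path such as 'pack.region.bbox' through nested dicts."""
    value = config
    for key in dotted.split("."):
        if not isinstance(value, dict) or key not in value:
            raise KeyError(f"unresolved variable: ${{{dotted}}}")
        value = value[key]
    return value


def resolve(value, config: dict):
    """Recursively replace ${...} references inside dicts, lists, and strings."""
    if isinstance(value, dict):
        return {k: resolve(v, config) for k, v in value.items()}
    if isinstance(value, list):
        return [resolve(v, config) for v in value]
    if isinstance(value, str):
        whole = _VAR.fullmatch(value)
        if whole:
            # Assumption: a lone variable keeps its type (the bbox stays a list).
            return lookup(config, whole.group(1))
        return _VAR.sub(lambda m: str(lookup(config, m.group(1))), value)
    return value


config = {"pack": {"region": {"bbox": [115.83, -31.99, 115.90, -31.93]}}}
options = {"crs": "EPSG:4326", "bbox": "${pack.region.bbox}"}
print(resolve(options, config))
```

In this sketch a misspelled path raises KeyError instead of passing the literal string through to the stage, which matches the idea that unresolved variables are configuration errors.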

Here is how the stages connect:

graph LR
  Source["GeoJSON\nSource"] --> Convert["convert.shp"]
  Convert --> Metrics["metrics.compute"]
  Convert --> Tiles["tiles.pmtiles"]
  Metrics --> Pack["Spatial Pack"]
  Tiles --> Pack

The convert stage feeds two downstream stages: metrics and tiles. Both run after the conversion completes, and their outputs are bundled into the final Spatial Pack alongside the generated manifest.
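Dependency order of this kind is a topological sort of the depends_on graph. The sketch below uses Python's standard-library graphlib to derive a valid order from stage dictionaries shaped like the YAML above. It shows the general technique and does not claim to be how the spatialpack executor is implemented; the function name execution_order is hypothetical.

```python
from graphlib import TopologicalSorter


def execution_order(stages: list) -> list:
    """Return stage names so that every stage follows its dependencies."""
    graph = {stage["name"]: stage.get("depends_on", []) for stage in stages}
    # static_order() raises graphlib.CycleError if the dependencies form a cycle.
    return list(TopologicalSorter(graph).static_order())


stages = [
    {"name": "convert_sites"},
    {"name": "compute_metrics", "depends_on": ["convert_sites"]},
    {"name": "tile_sites", "depends_on": ["convert_sites"]},
]
order = execution_order(stages)
print(order)  # convert_sites is always first
```

Because compute_metrics and tile_sites depend only on convert_sites, either may come second; a topological sort guarantees only that convert_sites precedes both.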

Run spatialpack pack build with the Pipeline YAML and an output directory.

Terminal window
spatialpack pack build pipeline.yaml -o ./output

Expected output:

Building pack from pipeline.yaml...
[1/3] convert_sites: GeoJSON -> GeoParquet (0.3s)
[2/3] compute_metrics: area, perimeter added (0.1s)
[3/3] tile_sites: GeoParquet -> PMTiles (1.2s)
Pack built successfully!
Output: ./output/
Manifest: ./output/spatialpack.json
Layers: 2 (1 vector + 1 tileset)

What just happened: The executor processed all three stages in dependency order. It resolved variables, ran each action handler, and generated a spatialpack.json manifest from the accumulated layer metadata. The output directory now contains a complete Spatial Pack.

Check the built pack structure and verify the manifest.

Terminal window
spatialpack pack info ./output
  • output/
    • spatialpack.json
    • layers/
      • sites.parquet
      • sites.pmtiles

What just happened: The pack directory contains the manifest and two layer files. The manifest records every layer’s ID, title, type, and file path. This manifest is the entry point for any consumer loading the pack.

Run validation to confirm the pack meets the schema requirements.

Terminal window
spatialpack validate ./output/

What just happened: The validator checked the manifest against the Spatial Pack schema, confirmed that all declared layer files exist, and verified their integrity. A passing validation means the pack is ready to publish to a CDN.

During development, validate your Pipeline without executing stages.

Terminal window
spatialpack pack build pipeline.yaml --dry-run

What just happened: The CLI parsed the YAML, resolved all variables, and validated the stage configuration — but did not run any action handlers. This catches configuration errors (missing sources, invalid action names, unresolved variables) before committing to a full build.
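The kinds of checks described here can be sketched in a few lines of Python. This is an illustration of the idea, not the CLI's real validator: the function check_pipeline and its exact rules are assumptions, and it works on the parsed YAML as a plain dictionary.

```python
def check_pipeline(config: dict) -> list:
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    sources = set(config.get("sources", {}))
    stages = config.get("stages", [])
    names = [stage.get("name") for stage in stages]

    seen = set()
    for stage in stages:
        name = stage.get("name", "<unnamed>")
        if name in seen:
            problems.append(f"duplicate stage name: {name}")
        seen.add(name)
        if "action" not in stage:
            problems.append(f"{name}: missing action")
        # Assumption: an input must name a declared source or another stage.
        if stage.get("input") not in sources | set(names):
            problems.append(f"{name}: unknown input {stage.get('input')!r}")
        for dep in stage.get("depends_on", []):
            if dep not in names:
                problems.append(f"{name}: depends on unknown stage {dep!r}")
    return problems


config = {
    "sources": {"sites": {"path": "data/sites.geojson"}},
    "stages": [
        {"name": "convert_sites", "action": "convert.shp", "input": "sites"},
        {"name": "tile_sites", "action": "tiles.pmtiles",
         "input": "convert_site", "depends_on": ["convert_sites"]},
    ],
}
for problem in check_pipeline(config):
    print(problem)
```

The example config contains a deliberate typo (convert_site), so the check reports an unknown input for tile_sites without running any stage.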

The wa-solar-feasibility-v1.yaml Pipeline in the Spatial.Properties repository demonstrates a production Pipeline with 14 stages across raster processing, vector conversion, and tile generation. It processes DEM data, bushfire zones, road networks, energy infrastructure, and cadastre boundaries into a single Spatial Pack for solar site selection.

Key patterns from the real-world Pipeline:

  • Multiple raster sources mosaic into a VRT before clipping and deriving slope/aspect
  • Variable interpolation (${pack.region.bbox}) clips every layer to the same region
  • Intermediate stages (no layer block) produce files that feed downstream stages but do not appear in the final manifest
  • Layer metadata in each stage’s layer block flows into the manifest automatically
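The last two patterns can be pictured as a filter over the stage list: stages with a layer block contribute a manifest entry, and stages without one are skipped. The Python sketch below shows that idea under stated assumptions. The function collect_layers, the path key, and the stage names mosaic_dem and derive_slope are hypothetical; the real manifest schema and the real Pipeline's stage names may differ.

```python
def collect_layers(stages: list) -> list:
    """Build manifest layer entries from stages that declare a layer block."""
    layers = []
    for stage in stages:
        layer = stage.get("layer")
        if layer is None:
            continue  # intermediate stage: output feeds later stages only
        # Assumption: the manifest entry is the layer block plus the output path.
        layers.append({**layer, "path": stage["output"]})
    return layers


stages = [
    {"name": "mosaic_dem", "output": "work/dem.vrt"},  # no layer block
    {"name": "derive_slope", "output": "layers/slope.tif",
     "layer": {"id": "slope", "title": "Slope", "type": "raster"}},
]
print(collect_layers(stages))
```

Only derive_slope produces a manifest entry here; the mosaic output stays an internal build artifact.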

See Pipeline Architecture for a detailed explanation of execution order, variable resolution, and error handling.