Pipeline Architecture

Building a Spatial Pack by hand is tedious. You need to download source data, convert between formats, reproject coordinates, clip to a bounding box, generate vector tiles, and validate the result. Doing this reliably and repeatably demands automation.

Pipelines are the build system for Spatial Packs. A single YAML file declares the data sources, processing stages, and output formats. The spatialpack CLI reads the Pipeline, resolves variables, and executes each stage in order — producing a complete, ready-to-publish pack.

A Pipeline file has three required top-level sections — pipeline (name and version), pack (output pack metadata), and stages (the processing steps) — plus an optional sources section for named data inputs.

pipelines/wa-solar-feasibility-v1.yaml

```yaml
pipeline:
  name: wa-solar-feasibility-pack
  version: "1.0"
pack:
  id: "spatial.properties:wa:solar-feasibility:v1"
  version: "2025.01.31"
  theme: solar-feasibility
  geography: wa
  license: CC-BY-4.0
  region:
    bbox: [115.65, -32.15, 116.15, -31.65]
    crs: "EPSG:4326"
  index:
    h3_resolution: 7
sources:
  cadastre:
    path: data/cadastre.gpkg
    license: CC-BY-4.0
stages:
  - name: convert-cadastre
    action: convert.gpkg
    input: cadastre
    output: cadastre.parquet
    options:
      bbox: ${pack.region.bbox}
      layer: cadastre_boundaries
    layer:
      id: cadastre
      title: Cadastre Boundaries
      type: vector
  - name: tile-cadastre
    action: tiles.pmtiles
    input: convert-cadastre
    output: cadastre.pmtiles
    options:
      min_zoom: 0
      max_zoom: 14
```

Notice the ${pack.region.bbox} syntax in the bbox option. The executor resolves these variables from the pack section before running stages, so a single Pipeline can be re-targeted to a different region by changing the bounding box.
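For example, retargeting the pipeline above to a different metro area only requires editing the pack.region block — the stages are untouched. The coordinates below are illustrative, not from the source document:

```yaml
# Hypothetical re-target: only the region changes; every stage that
# references ${pack.region.bbox} picks up the new extent automatically.
pack:
  region:
    bbox: [150.85, -34.10, 151.35, -33.60]   # a different metro area, for illustration
    crs: "EPSG:4326"
```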

The optional sources section declares named data inputs with their file paths and license metadata. The index section configures H3 spatial indexing — a hexagonal grid system developed at Uber that enables efficient location-based lookups. Setting h3_resolution: 7 creates cells of approximately 5.16 km², suitable for metro-scale analysis.
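Average cell area shrinks by roughly a factor of seven per resolution step. The figures below are approximate, taken from the published H3 cell-area tables, and the alternative settings are illustrative:

```yaml
# Approximate average hexagon areas by H3 resolution (illustrative):
index:
  h3_resolution: 7   # ~5.16 km2 — metro-scale analysis (used above)
# h3_resolution: 6   # ~36.1 km2 — regional aggregation
# h3_resolution: 8   # ~0.74 km2 — neighbourhood-level detail
```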

Each stage has an action field that tells the executor what to do. The platform ships with built-in actions for the most common geospatial operations:

| Action | Purpose |
| --- | --- |
| convert.shp | Convert Shapefile to GeoParquet with CRS normalization |
| convert.gpkg | Convert GeoPackage to GeoParquet with optional bbox clipping |
| raster.cog | Produce a Cloud Optimized GeoTIFF from raw raster data |
| raster.vrt | Mosaic multiple rasters into a virtual dataset |
| raster.slope | Derive slope from a digital elevation model |
| tiles.pmtiles | Generate PMTiles vector tile archive from GeoParquet |
| extract.zip | Extract files from a ZIP archive |
| metrics.compute | Compute area, perimeter, and other metrics on vector layers |
| hash.file | Compute BLAKE3 integrity hash for an output file |
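Actions compose by chaining stage outputs. The sketch below is hypothetical — the stage names, the zoning source, and the output filenames are invented for illustration — but it shows how an extract, a conversion, and an integrity hash might link together:

```yaml
stages:
  - name: extract-zoning
    action: extract.zip       # unpack a zipped Shapefile
    input: zoning             # hypothetical named source pointing at a ZIP
    output: zoning.shp
  - name: convert-zoning
    action: convert.shp       # CRS normalization applied automatically
    input: extract-zoning     # consumes the previous stage's output
    output: zoning.parquet
  - name: hash-zoning
    action: hash.file         # BLAKE3 integrity hash of the converted file
    input: convert-zoning
    output: zoning.parquet.b3
```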

Conversion stages (convert.*) automatically apply CRS normalization, geometry validation, and metric computation when an analysis_crs is available. H3 spatial indexing is applied to every GeoParquet output at the resolution defined in the index section.

The executor processes stages in declaration order. Each stage reads its inputs (either a named source or the output of a previous stage), runs the action handler, and writes the result to the output directory. When all stages complete, the executor generates a spatialpack.json manifest from the accumulated layer metadata.
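The stage loop can be sketched in a few lines of Python. This is a toy model, not the real spatialpack internals — run_action and the stage dicts are illustrative stand-ins:

```python
# Minimal sketch of the executor's stage loop. run_action stands in for
# the real action handlers (convert.gpkg, tiles.pmtiles, ...).

def run_action(action: str, src: str) -> str:
    # Pretend to process: record the lineage of the artifact.
    return f"{src} -> {action}"

def execute(sources: dict, stages: list) -> dict:
    outputs = {}              # stage name -> produced artifact
    for stage in stages:      # declaration order
        # An input is either a named source or a previous stage's output.
        src = sources.get(stage["input"], outputs.get(stage["input"]))
        outputs[stage["name"]] = run_action(stage["action"], src)
    return outputs

sources = {"cadastre": "data/cadastre.gpkg"}
stages = [
    {"name": "convert-cadastre", "action": "convert.gpkg", "input": "cadastre"},
    {"name": "tile-cadastre", "action": "tiles.pmtiles", "input": "convert-cadastre"},
]
outputs = execute(sources, stages)
```

The second stage finds its input under the first stage's name rather than in sources, which is exactly how chained stages resolve in the real executor.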

If a stage fails, what happens next depends on the error-handling mode, described below.

```mermaid
graph LR
  YAML["Pipeline YAML"] --> Parser["Parser"]
  Parser --> Vars["Variable\nInterpolation"]
  Vars --> Exec["Executor"]
  Exec --> S1["Download"]
  S1 --> S2["Convert"]
  S2 --> S3["Index + Tile"]
  S3 --> Pack["Spatial Pack"]
```

Variable interpolation happens once, before execution begins. The parser walks the YAML tree, replaces ${pack.region.bbox} and similar expressions with their resolved values, and hands the fully concrete pipeline to the executor.
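A toy version of that interpolation pass makes the mechanics concrete. The real parser works on the full YAML tree; this sketch (all names are illustrative) walks nested dicts and lists and resolves dotted ${...} references against the document root:

```python
import re

# Matches expressions like ${pack.region.bbox}
VAR = re.compile(r"\$\{([a-z_.]+)\}")

def lookup(root, dotted):
    # Resolve a dotted path such as "pack.region.bbox" against the root.
    node = root
    for key in dotted.split("."):
        node = node[key]
    return node

def interpolate(node, root):
    if isinstance(node, dict):
        return {k: interpolate(v, root) for k, v in node.items()}
    if isinstance(node, list):
        return [interpolate(v, root) for v in node]
    if isinstance(node, str):
        m = VAR.fullmatch(node)
        if m:  # whole-string reference: substitute the value as-is (lists stay lists)
            return lookup(root, m.group(1))
        # Embedded references are stringified in place.
        return VAR.sub(lambda m: str(lookup(root, m.group(1))), node)
    return node

doc = {
    "pack": {"region": {"bbox": [115.65, -32.15, 116.15, -31.65]}},
    "stages": [{"options": {"bbox": "${pack.region.bbox}"}}],
}
resolved = interpolate(doc, doc)
```

After the pass, the stage's bbox option holds the actual list from pack.region.bbox, and the executor never sees a ${...} placeholder.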

Pipelines are designed to be deterministic. Given the same YAML file and the same source data, the executor produces identical output every time. There are no hidden timestamps, random seeds, or environment-dependent behavior.

This property makes Pipelines suitable for CI/CD. You can commit your Pipeline YAML to version control, run it in a build pipeline, and trust that the resulting Spatial Pack matches what you tested locally. If something breaks, you diff the YAML — not a chain of manual steps.
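Determinism is what makes the hash.file action useful as a verification step: identical bytes always produce an identical digest, so two builds of the same pack can be compared by hash alone. BLAKE3 needs a third-party package in Python, so this sketch uses the standard library's blake2b purely as a stand-in:

```python
import hashlib

# Determinism sketch: the same bytes always yield the same digest.
# The real hash.file action uses BLAKE3; hashlib.blake2b stands in here.

def digest(data: bytes) -> str:
    return hashlib.blake2b(data, digest_size=32).hexdigest()

# Two "builds" over identical content produce identical digests.
build_a = digest(b"cadastre.parquet contents")
build_b = digest(b"cadastre.parquet contents")
```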

Stages can fail for many reasons: missing source data, invalid geometries, unsupported coordinate systems. The executor provides two modes for handling failures.

In strict mode (the default), the executor stops at the first failed stage and reports the error. This is appropriate for production builds where partial output is unacceptable.

In continue-on-error mode (--continue-on-error), the executor logs the failure, skips the broken stage, and continues with the remaining stages. The final manifest omits any layers from failed stages. This mode is useful during development when you want to test stages independently.
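The two modes can be modelled in a few lines. Everything here is illustrative — the stage names, the failing handler, and the execute signature are invented to show the behaviour, not copied from spatialpack:

```python
# Sketch of strict vs continue-on-error failure handling.

def execute(stages, continue_on_error=False):
    manifest, errors = [], []
    for stage in stages:
        try:
            stage["action"]()                  # run the handler
            manifest.append(stage["name"])     # layer reaches the manifest
        except Exception as exc:
            if not continue_on_error:
                raise                          # strict mode: stop at first failure
            errors.append((stage["name"], str(exc)))  # log, skip, keep going
    return manifest, errors

def broken():
    raise ValueError("invalid geometry")

stages = [
    {"name": "convert-cadastre", "action": lambda: None},
    {"name": "convert-zoning", "action": broken},
    {"name": "tile-cadastre", "action": lambda: None},
]

# Continue-on-error: the broken stage is skipped, its layer never
# reaches the manifest, and later stages still run.
manifest, errors = execute(stages, continue_on_error=True)
```

In strict mode the same input raises at convert-zoning and no manifest is produced at all, which is the behaviour you want for production builds.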

Ready to write your own Pipeline? Follow the guide or explore the reference.