Pipeline Architecture

Building a Spatial Pack by hand is tedious. You need to download source data, convert between formats, reproject coordinates, clip to a bounding box, generate vector tiles, and validate the result. Doing this reliably and repeatably demands automation.

Pipelines are the build system for Spatial Packs. A single YAML file declares the data sources, processing stages, and output formats. The spatialpack CLI reads the Pipeline, resolves variables, and executes each stage in order — producing a complete, ready-to-publish pack.

A Pipeline file has three required top-level sections — pipeline (name and version), pack (output pack metadata), and stages (the processing steps) — plus an optional sources section for named data inputs.

pipelines/wa-solar-feasibility-v1.yaml

```yaml
pipeline:
  name: wa-solar-feasibility-pack
  version: "1.0"
pack:
  id: "spatial.properties:wa:solar-feasibility:v1"
  version: "2025.01.31"
  theme: solar-feasibility
  geography: wa
  license: CC-BY-4.0
  region:
    bbox: [115.65, -32.15, 116.15, -31.65]
    crs: "EPSG:4326"
  index:
    h3_resolution: 7
sources:
  cadastre:
    path: data/cadastre.gpkg
    license: CC-BY-4.0
stages:
  - name: convert-cadastre
    action: convert.gpkg
    input: cadastre
    output: cadastre.parquet
    options:
      bbox: ${pack.region.bbox}
      layer: cadastre_boundaries
    layer:
      id: cadastre
      title: Cadastre Boundaries
      type: vector
  - name: tile-cadastre
    action: tiles.pmtiles
    input: convert-cadastre
    output: cadastre.pmtiles
    options:
      min_zoom: 0
      max_zoom: 14
```

Notice the ${pack.region.bbox} syntax in the bbox option. The executor resolves these variables from the pack section before running stages, so a single Pipeline can be re-targeted to a different region by changing the bounding box.
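For example, retargeting the pipeline above to a different metro area only requires editing the pack.region block — the stages are untouched. The coordinates below are illustrative, not from the source document:

```yaml
# Hypothetical re-target: only the region changes; every stage that
# references ${pack.region.bbox} picks up the new extent automatically.
pack:
  region:
    bbox: [150.85, -34.10, 151.35, -33.60]   # a different metro area, for illustration
    crs: "EPSG:4326"
```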

The optional sources section declares named data inputs with their file paths and license metadata. The index section configures H3 spatial indexing — a hexagonal grid system developed at Uber that enables efficient location-based lookups. Setting h3_resolution: 7 creates cells of approximately 5.16 km², suitable for metro-scale analysis.
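Average cell area shrinks by roughly a factor of seven per resolution step. The figures below are approximate, taken from the published H3 cell-area tables, and the alternative settings are illustrative:

```yaml
# Approximate average hexagon areas by H3 resolution (illustrative):
index:
  h3_resolution: 7   # ~5.16 km2 — metro-scale analysis (used above)
# h3_resolution: 6   # ~36.1 km2 — regional aggregation
# h3_resolution: 8   # ~0.74 km2 — neighbourhood-level detail
```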

Each stage has an action field that tells the executor what to do. The platform ships with built-in actions for the most common geospatial operations:

| Action | Purpose |
| --- | --- |
| convert.shp | Convert Shapefile to GeoParquet with CRS normalization |
| convert.gpkg | Convert GeoPackage to GeoParquet with optional bbox clipping |
| raster.cog | Produce a Cloud Optimized GeoTIFF from raw raster data |
| raster.vrt | Mosaic multiple rasters into a virtual dataset |
| raster.slope | Derive slope from a digital elevation model |
| tiles.pmtiles | Generate PMTiles vector tile archive from GeoParquet |
| extract.zip | Extract files from a ZIP archive |
| metrics.compute | Compute area, perimeter, and other metrics on vector layers |
| hash.file | Compute BLAKE3 integrity hash for an output file |
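Actions compose by chaining stage outputs. The sketch below is hypothetical — the stage names, the zoning source, and the output filenames are invented for illustration — but it shows how an extract, a conversion, and an integrity hash might link together:

```yaml
stages:
  - name: extract-zoning
    action: extract.zip       # unpack a zipped Shapefile
    input: zoning             # hypothetical named source pointing at a ZIP
    output: zoning.shp
  - name: convert-zoning
    action: convert.shp       # CRS normalization applied automatically
    input: extract-zoning     # consumes the previous stage's output
    output: zoning.parquet
  - name: hash-zoning
    action: hash.file         # BLAKE3 integrity hash of the converted file
    input: convert-zoning
    output: zoning.parquet.b3
```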

Conversion stages (convert.*) automatically apply CRS normalization, geometry validation, and metric computation when an analysis_crs is available. H3 spatial indexing is applied to every GeoParquet output at the resolution defined in the index section.

The executor processes stages in declaration order. Each stage reads its inputs (either a named source or the output of a previous stage), runs the action handler, and writes the result to the output directory. When all stages complete, the executor generates a spatialpack.json manifest from the accumulated layer metadata.
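The stage loop can be sketched in a few lines of Python. This is a toy model, not the real spatialpack internals — run_action and the stage dicts are illustrative stand-ins:

```python
# Minimal sketch of the executor's stage loop. run_action stands in for
# the real action handlers (convert.gpkg, tiles.pmtiles, ...).

def run_action(action: str, src: str) -> str:
    # Pretend to process: record the lineage of the artifact.
    return f"{src} -> {action}"

def execute(sources: dict, stages: list) -> dict:
    outputs = {}              # stage name -> produced artifact
    for stage in stages:      # declaration order
        # An input is either a named source or a previous stage's output.
        src = sources.get(stage["input"], outputs.get(stage["input"]))
        outputs[stage["name"]] = run_action(stage["action"], src)
    return outputs

sources = {"cadastre": "data/cadastre.gpkg"}
stages = [
    {"name": "convert-cadastre", "action": "convert.gpkg", "input": "cadastre"},
    {"name": "tile-cadastre", "action": "tiles.pmtiles", "input": "convert-cadastre"},
]
outputs = execute(sources, stages)
```

The second stage finds its input under the first stage's name rather than in sources, which is exactly how chained stages resolve in the real executor.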

If a stage fails, what happens next depends on the error-handling mode, described below.

```mermaid
graph LR
  YAML["Pipeline YAML"] --> Parser["Parser"]
  Parser --> Vars["Variable\nInterpolation"]
  Vars --> Exec["Executor"]
  Exec --> S1["Download"]
  S1 --> S2["Convert"]
  S2 --> S3["Index + Tile"]
  S3 --> Pack["Spatial Pack"]
```

Variable interpolation happens once, before execution begins. The parser walks the YAML tree, replaces ${pack.region.bbox} and similar expressions with their resolved values, and hands the fully concrete pipeline to the executor.
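A toy version of that interpolation pass makes the mechanics concrete. The real parser works on the full YAML tree; this sketch (all names are illustrative) walks nested dicts and lists and resolves dotted ${...} references against the document root:

```python
import re

# Matches expressions like ${pack.region.bbox}
VAR = re.compile(r"\$\{([a-z_.]+)\}")

def lookup(root, dotted):
    # Resolve a dotted path such as "pack.region.bbox" against the root.
    node = root
    for key in dotted.split("."):
        node = node[key]
    return node

def interpolate(node, root):
    if isinstance(node, dict):
        return {k: interpolate(v, root) for k, v in node.items()}
    if isinstance(node, list):
        return [interpolate(v, root) for v in node]
    if isinstance(node, str):
        m = VAR.fullmatch(node)
        if m:  # whole-string reference: substitute the value as-is (lists stay lists)
            return lookup(root, m.group(1))
        # Embedded references are stringified in place.
        return VAR.sub(lambda m: str(lookup(root, m.group(1))), node)
    return node

doc = {
    "pack": {"region": {"bbox": [115.65, -32.15, 116.15, -31.65]}},
    "stages": [{"options": {"bbox": "${pack.region.bbox}"}}],
}
resolved = interpolate(doc, doc)
```

After the pass, the stage's bbox option holds the actual list from pack.region.bbox, and the executor never sees a ${...} placeholder.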

Pipelines are designed to be deterministic. Given the same YAML file and the same source data, the executor produces identical output every time. There are no hidden timestamps, random seeds, or environment-dependent behavior.

This property makes Pipelines suitable for CI/CD. You can commit your Pipeline YAML to version control, run it in a build pipeline, and trust that the resulting Spatial Pack matches what you tested locally. If something breaks, you diff the YAML — not a chain of manual steps.
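Determinism is what makes the hash.file action useful as a verification step: identical bytes always produce an identical digest, so two builds of the same pack can be compared by hash alone. BLAKE3 needs a third-party package in Python, so this sketch uses the standard library's blake2b purely as a stand-in:

```python
import hashlib

# Determinism sketch: the same bytes always yield the same digest.
# The real hash.file action uses BLAKE3; hashlib.blake2b stands in here.

def digest(data: bytes) -> str:
    return hashlib.blake2b(data, digest_size=32).hexdigest()

# Two "builds" over identical content produce identical digests.
build_a = digest(b"cadastre.parquet contents")
build_b = digest(b"cadastre.parquet contents")
```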

Stages can fail for many reasons: missing source data, invalid geometries, unsupported coordinate systems. The executor provides two modes for handling failures.

In strict mode (the default), the executor stops at the first failed stage and reports the error. This is appropriate for production builds where partial output is unacceptable.

In continue-on-error mode (--continue-on-error), the executor logs the failure, skips the broken stage, and continues with the remaining stages. The final manifest omits any layers from failed stages. This mode is useful during development when you want to test stages independently.
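The two modes can be modelled in a few lines. Everything here is illustrative — the stage names, the failing handler, and the execute signature are invented to show the behaviour, not copied from spatialpack:

```python
# Sketch of strict vs continue-on-error failure handling.

def execute(stages, continue_on_error=False):
    manifest, errors = [], []
    for stage in stages:
        try:
            stage["action"]()                  # run the handler
            manifest.append(stage["name"])     # layer reaches the manifest
        except Exception as exc:
            if not continue_on_error:
                raise                          # strict mode: stop at first failure
            errors.append((stage["name"], str(exc)))  # log, skip, keep going
    return manifest, errors

def broken():
    raise ValueError("invalid geometry")

stages = [
    {"name": "convert-cadastre", "action": lambda: None},
    {"name": "convert-zoning", "action": broken},
    {"name": "tile-cadastre", "action": lambda: None},
]

# Continue-on-error: the broken stage is skipped, its layer never
# reaches the manifest, and later stages still run.
manifest, errors = execute(stages, continue_on_error=True)
```

In strict mode the same input raises at convert-zoning and no manifest is produced at all, which is the behaviour you want for production builds.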

Ready to write your own Pipeline? Follow the guide or explore the reference.