Pipeline Architecture
Building a Spatial Pack by hand is tedious. You need to download source data, convert between formats, reproject coordinates, clip to a bounding box, generate vector tiles, and validate the result. Doing this reliably and repeatably demands automation.
Pipelines are the build system for Spatial Packs. A single YAML file declares the data sources, processing stages, and output formats. The spatialpack CLI reads the Pipeline, resolves variables, and executes each stage in order — producing a complete, ready-to-publish pack.
Pipeline YAML format
A Pipeline file has three top-level sections: `pipeline` (name and version), `pack` (output pack metadata), and `stages` (the processing steps).
```yaml
pipeline:
  name: wa-solar-feasibility-pack
  version: "1.0"

pack:
  id: "spatial.properties:wa:solar-feasibility:v1"
  version: "2025.01.31"
  theme: solar-feasibility
  geography: wa
  license: CC-BY-4.0
  region:
    bbox: [115.65, -32.15, 116.15, -31.65]
    crs: "EPSG:4326"

index:
  h3_resolution: 7

sources:
  cadastre:
    path: data/cadastre.gpkg
    license: CC-BY-4.0

stages:
  - name: convert-cadastre
    action: convert.gpkg
    input: cadastre
    output: cadastre.parquet
    options:
      bbox: ${pack.region.bbox}
      layer: cadastre_boundaries
    layer:
      id: cadastre
      title: Cadastre Boundaries
      type: vector

  - name: tile-cadastre
    action: tiles.pmtiles
    input: convert-cadastre
    output: cadastre.pmtiles
    options:
      min_zoom: 0
      max_zoom: 14
```

```sh
# Build a Spatial Pack from a Pipeline
spatialpack pack build pipelines/wa-solar-feasibility-v1.yaml \
  -o ./output/wa-solar-pack

# Validate without executing (dry run)
spatialpack pack build pipelines/wa-solar-feasibility-v1.yaml \
  --dry-run
```

Notice the `${pack.region.bbox}` syntax in the `bbox` option. The executor resolves these variables from the `pack` section before running stages, so a single Pipeline can be re-targeted to a different region by changing the bounding box.
The optional `sources` section declares named data inputs with their file paths and license metadata. The `index` section configures H3 spatial indexing, a hexagonal grid system developed by Uber that enables efficient location-based lookups. Setting `h3_resolution: 7` creates cells of approximately 5.16 km², suitable for metro-scale analysis.
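To give a feel for how resolution relates to cell size, here is a rough sizing helper. This is a back-of-the-envelope sketch, not part of the platform: the constant and the divide-by-seven rule approximate the published H3 area tables (real cell areas vary slightly with latitude and pentagon distortion), and both function names are hypothetical.

```python
# Approximate H3 sizing helper (hypothetical; real areas come from the
# h3 library's published tables and vary slightly per cell).
AVG_RES0_AREA_KM2 = 4_357_449.42  # average H3 resolution-0 hexagon area

def approx_cell_area_km2(resolution: int) -> float:
    """Each resolution step subdivides a cell into ~7 children,
    so average area shrinks by roughly 7x per level."""
    return AVG_RES0_AREA_KM2 / 7 ** resolution

def pick_resolution(target_area_km2: float) -> int:
    """Choose the H3 resolution (0-15) whose average cell area
    is closest to the target area."""
    return min(range(16),
               key=lambda r: abs(approx_cell_area_km2(r) - target_area_km2))
```

For a metro-scale target of around 5 km² per cell, this heuristic lands on resolution 7, matching the value used in the Pipeline above.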
Stage types
Each stage has an `action` field that tells the executor what to do. The platform ships with built-in actions for the most common geospatial operations:
| Action | Purpose |
|---|---|
| `convert.shp` | Convert Shapefile to GeoParquet with CRS normalization |
| `convert.gpkg` | Convert GeoPackage to GeoParquet with optional bbox clipping |
| `raster.cog` | Produce a Cloud Optimized GeoTIFF from raw raster data |
| `raster.vrt` | Mosaic multiple rasters into a virtual dataset |
| `raster.slope` | Derive slope from a digital elevation model |
| `tiles.pmtiles` | Generate a PMTiles vector tile archive from GeoParquet |
| `extract.zip` | Extract files from a ZIP archive |
| `metrics.compute` | Compute area, perimeter, and other metrics on vector layers |
| `hash.file` | Compute a BLAKE3 integrity hash for an output file |
Conversion stages (`convert.*`) automatically apply CRS normalization, geometry validation, and metric computation when an `analysis_crs` is available. H3 spatial indexing is applied to every GeoParquet output at the resolution defined in the `index` section.
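The `action` strings in the table suggest a registry-based dispatch: the executor looks up a handler by name and invokes it with the stage definition. The spatialpack internals are not public here, so the sketch below only illustrates that pattern; the registry, decorator, and handler bodies are all hypothetical.

```python
from typing import Callable, Dict

# Hypothetical action registry: maps an `action` string from a stage
# to the handler function that implements it.
ACTIONS: Dict[str, Callable[[dict], str]] = {}

def register(name: str):
    """Decorator that records a handler under its action name."""
    def wrap(fn: Callable[[dict], str]):
        ACTIONS[name] = fn
        return fn
    return wrap

@register("convert.gpkg")
def convert_gpkg(stage: dict) -> str:
    # Placeholder: a real handler would read the GeoPackage, clip to
    # stage["options"]["bbox"], and write GeoParquet to stage["output"].
    return stage["output"]

def run_stage(stage: dict) -> str:
    """Dispatch one stage to its registered handler."""
    try:
        handler = ACTIONS[stage["action"]]
    except KeyError:
        raise ValueError(f"unknown action: {stage['action']}")
    return handler(stage)
```

Registering handlers by name keeps the YAML vocabulary decoupled from the code: adding a new action means adding one decorated function, not touching the executor loop.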
Execution model
The executor processes stages in declaration order. Each stage reads its inputs (either a named source or the output of a previous stage), runs the action handler, and writes the result to the output directory. When all stages complete, the executor generates a `spatialpack.json` manifest from the accumulated layer metadata.
If a stage fails and you passed `--continue-on-error`, the executor skips it and continues with the remaining stages. The final manifest includes only the layers that succeeded.
```mermaid
graph LR
  YAML["Pipeline YAML"] --> Parser["Parser"]
  Parser --> Vars["Variable\nInterpolation"]
  Vars --> Exec["Executor"]
  Exec --> S1["Download"]
  S1 --> S2["Convert"]
  S2 --> S3["Index + Tile"]
  S3 --> Pack["Spatial Pack"]
```
Variable interpolation happens once, before execution begins. The parser walks the YAML tree, replaces `${pack.region.bbox}` and similar expressions with their resolved values, and hands the fully concrete pipeline to the executor.
Deterministic builds
Pipelines are designed to be deterministic. Given the same YAML file and the same source data, the executor produces identical output every time. There are no hidden timestamps, random seeds, or environment-dependent behavior.
This property makes Pipelines suitable for CI/CD. You can commit your Pipeline YAML to version control, run it in a build pipeline, and trust that the resulting Spatial Pack matches what you tested locally. If something breaks, you diff the YAML — not a chain of manual steps.
Error handling
Stages can fail for many reasons: missing source data, invalid geometries, unsupported coordinate systems. The executor provides two modes for handling failures.
In strict mode (the default), the executor stops at the first failed stage and reports the error. This is appropriate for production builds where partial output is unacceptable.
In continue-on-error mode (`--continue-on-error`), the executor logs the failure, skips the broken stage, and continues with the remaining stages. The final manifest omits any layers from failed stages. This mode is useful during development when you want to test stages independently.
Next steps
Ready to write your own Pipeline? Follow the guide or explore the reference.
- Write your first Pipeline — step-by-step walkthrough
- CLI build command reference — all options for `spatialpack pack build`