Build a Pipeline from YAML
Building a Spatial Pack by hand — converting files, reprojecting coordinates, generating tiles, writing manifests — is tedious and error-prone. Pipelines automate the entire workflow. Define your sources, processing stages, and output metadata in a single YAML file, and the spatialpack CLI executes it reproducibly.
This recipe walks through writing a Pipeline YAML from scratch and building a complete Spatial Pack.
Prerequisites
- spatialpack CLI installed (pip install -e ".[full]")
- Source data files (Shapefiles, GeoJSON, GeoPackage, or GeoTIFF)
1. Create the project structure
Set up a directory for your pack with separate folders for source data and output.

    mkdir -p my-pack/data

The resulting layout:

    my-pack/
      data/
        sites.geojson   (your source data)
      pipeline.yaml     (you will create this next)
What just happened: You created a clean workspace. Source data goes in data/, and the Pipeline YAML lives at the project root. The build output will go in a separate directory that you specify at build time.
2. Define pack metadata
Create pipeline.yaml and start with the pipeline and pack sections. These describe what the pack is and where it covers.

    pipeline:
      name: my-sites-pack
      version: "1.0"

    pack:
      id: "my-org:demo:sites:v1"
      version: "1.0.0"
      title: "Development Sites Pack"
      theme: site-selection
      geography: demo
      license: CC-BY-4.0
      region:
        bbox: [115.83, -31.99, 115.90, -31.93]
        crs: "EPSG:4326"

What just happened: The pack section sets the identity and spatial extent of your Spatial Pack. The bbox defines the bounding box (west, south, east, north in WGS84 coordinates), and crs sets the storage coordinate system. The id follows a colon-delimited naming convention: org:region:theme:version.
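As a quick sanity check on these values, the Python sketch below splits a pack id into its four colon-delimited parts and verifies that a bbox is ordered west, south, east, north within WGS84 limits. The helper names parse_pack_id and check_bbox are hypothetical and not part of the spatialpack CLI, and the sketch assumes the bbox does not cross the antimeridian.

```python
from typing import NamedTuple, Sequence


class PackId(NamedTuple):
    org: str
    region: str
    theme: str
    version: str


def parse_pack_id(pack_id: str) -> PackId:
    """Split an id of the form org:region:theme:version (hypothetical helper)."""
    parts = pack_id.split(":")
    if len(parts) != 4 or not all(parts):
        raise ValueError(f"expected org:region:theme:version, got {pack_id!r}")
    return PackId(*parts)


def check_bbox(bbox: Sequence[float]) -> None:
    """Raise ValueError unless bbox is [west, south, east, north] in WGS84 degrees."""
    if len(bbox) != 4:
        raise ValueError("bbox needs exactly four numbers")
    west, south, east, north = bbox
    if not (-180 <= west < east <= 180):
        raise ValueError("west must be less than east, both within [-180, 180]")
    if not (-90 <= south < north <= 90):
        raise ValueError("south must be less than north, both within [-90, 90]")


pack_id = parse_pack_id("my-org:demo:sites:v1")
check_bbox([115.83, -31.99, 115.90, -31.93])  # passes silently
print(pack_id.org, pack_id.version)
```

A common mistake this catches is writing the bbox in latitude-first order, which fails the range check for most locations.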
3. Declare data sources
Add a sources section that names each input dataset with its file path and license.

    sources:
      sites:
        path: "data/sites.geojson"
        license: CC-BY-4.0
        format: geojson

What just happened: You declared a named source called sites. Stages reference this name (not the file path) when they need input data. The license metadata ensures governance tracking: every layer in the final pack traces back to a licensed source.
4. Add processing stages
Stages are the core of a Pipeline. Each stage has an action (what to do), an input (where to read from), an output (where to write), and optional options.
Add three stages: convert the source data to GeoParquet, compute area metrics, and generate vector tiles for map rendering.
    stages:
      - name: convert_sites
        action: convert.shp
        input: sites
        output: "layers/sites.parquet"
        options:
          crs: EPSG:4326
          bbox: ${pack.region.bbox}
        layer:
          id: sites
          title: "Development Sites"
          type: vector

      - name: compute_metrics
        action: metrics.compute
        input: convert_sites
        depends_on: [convert_sites]
        output: "layers/sites.parquet"

      - name: tile_sites
        action: tiles.pmtiles
        input: convert_sites
        depends_on: [convert_sites]
        output: "layers/sites.pmtiles"
        options:
          min_zoom: 4
          max_zoom: 14

What just happened: You defined a three-stage pipeline. The convert_sites stage reads the GeoJSON source and writes GeoParquet. The compute_metrics stage adds area and perimeter columns. The tile_sites stage generates PMTiles for web map rendering.
Notice the ${pack.region.bbox} variable in the bbox option. The executor resolves this from the pack.region.bbox value before running the stage, so the same Pipeline works for different regions by changing only the pack metadata.
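To make the substitution concrete, here is a minimal Python sketch of dotted-path variable resolution over a parsed Pipeline. It is not the executor's actual implementation; the function names lookup and resolve, and the rule that a value consisting of a single variable keeps its native type (so the bbox stays a list), are assumptions for illustration.

```python
import re
from typing import Any

# Matches ${dotted.path} placeholders.
_VAR = re.compile(r"\$\{([A-Za-z0-9_.]+)\}")


def lookup(config: dict, dotted: str) -> Any:
    """Walk a dotted path such as pack.region.bbox through nested dicts."""
    node: Any = config
    for key in dotted.split("."):
        if not isinstance(node, dict) or key not in node:
            raise KeyError(f"unresolved variable: ${{{dotted}}}")
        node = node[key]
    return node


def resolve(value: Any, config: dict) -> Any:
    """Recursively replace placeholders in strings, lists, and dicts."""
    if isinstance(value, dict):
        return {k: resolve(v, config) for k, v in value.items()}
    if isinstance(value, list):
        return [resolve(v, config) for v in value]
    if isinstance(value, str):
        whole = _VAR.fullmatch(value)
        if whole:
            # The entire string is one variable: return the referenced value as-is.
            return lookup(config, whole.group(1))
        # Otherwise substitute each placeholder with its string form.
        return _VAR.sub(lambda m: str(lookup(config, m.group(1))), value)
    return value


config = {"pack": {"geography": "demo",
                   "region": {"bbox": [115.83, -31.99, 115.90, -31.93]}}}
options = {"crs": "EPSG:4326", "bbox": "${pack.region.bbox}"}
resolved = resolve(options, config)
print(resolved["bbox"])
```

Changing only config["pack"]["region"]["bbox"] changes what every stage receives, which is the reuse property described above.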
5. Understand the pipeline flow
Here is how the stages connect:

    graph LR
      Source["GeoJSON\nSource"] --> Convert["convert.shp"]
      Convert --> Metrics["metrics.compute"]
      Convert --> Tiles["tiles.pmtiles"]
      Metrics --> Pack["Spatial Pack"]
      Tiles --> Pack
The convert stage feeds two downstream stages: metrics and tiles. Both run after the conversion completes, and their outputs are bundled into the final Spatial Pack alongside the generated manifest.
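One way to derive a valid run order from depends_on is a topological sort. The sketch below uses Python's standard graphlib module on the three stages from this recipe; it illustrates the idea only and is not how the spatialpack executor is known to schedule stages. The execution_order function is a hypothetical name.

```python
from graphlib import TopologicalSorter

# Stage names and dependencies copied from the pipeline above.
stages = [
    {"name": "convert_sites"},
    {"name": "compute_metrics", "depends_on": ["convert_sites"]},
    {"name": "tile_sites", "depends_on": ["convert_sites"]},
]


def execution_order(stages: list[dict]) -> list[str]:
    """Return stage names so that every stage follows its dependencies."""
    # Map each stage to the set of stages that must finish before it.
    graph = {s["name"]: set(s.get("depends_on", [])) for s in stages}
    # static_order() raises graphlib.CycleError if the dependencies loop.
    return list(TopologicalSorter(graph).static_order())


order = execution_order(stages)
print(order)  # convert_sites comes first; the other two follow in either order
```

Because compute_metrics and tile_sites depend only on convert_sites, a scheduler is free to run them in either order once the conversion has finished.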
6. Build the pack
Run spatialpack pack build with the Pipeline YAML and an output directory.

    spatialpack pack build pipeline.yaml -o ./output

Expected output:

    Building pack from pipeline.yaml...
    [1/3] convert_sites: GeoJSON -> GeoParquet (0.3s)
    [2/3] compute_metrics: area, perimeter added (0.1s)
    [3/3] tile_sites: GeoParquet -> PMTiles (1.2s)

    Pack built successfully!
    Output: ./output/
    Manifest: ./output/spatialpack.json
    Layers: 2 (1 vector + 1 tileset)

What just happened: The executor processed all three stages in dependency order. It resolved variables, ran each action handler, and generated a spatialpack.json manifest from the accumulated layer metadata. The output directory now contains a complete Spatial Pack.
7. Inspect the output
Check the built pack structure and verify the manifest.

    spatialpack pack info ./output

The output directory should look like this:

    output/
      spatialpack.json
      layers/
        sites.parquet
        sites.pmtiles
What just happened: The pack directory contains the manifest and two layer files. The manifest records every layer’s ID, title, type, and file path. This manifest is the entry point for any consumer loading the pack.
8. Validate before publishing
Run validation to confirm the pack meets the schema requirements.

    spatialpack validate ./output/

What just happened: The validator checked the manifest against the Spatial Pack schema, confirmed all declared layer files exist, and verified integrity. A passing validation means the pack is ready to publish to a CDN.
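The file-existence part of that check can be approximated in a few lines of Python. This is only a sketch: the manifest shape used here (a top-level layers list whose entries carry a path key) is an assumption made for illustration, not the documented Spatial Pack schema, and missing_layer_files is a hypothetical helper. It does not cover schema or integrity checks.

```python
import json
import tempfile
from pathlib import Path


def missing_layer_files(pack_dir: str) -> list[str]:
    """Return declared layer paths that are absent from the pack directory.

    Assumes a manifest shaped like
    {"layers": [{"id": "...", "path": "layers/sites.parquet"}, ...]}.
    """
    root = Path(pack_dir)
    manifest = json.loads((root / "spatialpack.json").read_text(encoding="utf-8"))
    return [
        layer["path"]
        for layer in manifest.get("layers", [])
        if not (root / layer["path"]).is_file()
    ]


# Demo on a throwaway directory: one declared file exists, one does not.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "layers").mkdir()
    (root / "layers" / "sites.parquet").write_bytes(b"")
    manifest = {"layers": [
        {"id": "sites", "path": "layers/sites.parquet"},
        {"id": "sites-tiles", "path": "layers/sites.pmtiles"},
    ]}
    (root / "spatialpack.json").write_text(json.dumps(manifest), encoding="utf-8")
    demo_missing = missing_layer_files(tmp)

print(demo_missing)  # ['layers/sites.pmtiles']
```

For a real pack, prefer spatialpack validate, which checks the manifest against the actual schema.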
9. Use dry run for debugging
During development, validate your Pipeline without executing stages.

    spatialpack pack build pipeline.yaml --dry-run

What just happened: The CLI parsed the YAML, resolved all variables, and validated the stage configuration, but did not run any action handlers. This catches configuration errors (missing sources, invalid action names, unresolved variables) before committing to a full build.
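The three error classes named above can be illustrated with a small static check over a parsed Pipeline. The sketch below is a hypothetical stand-in, not the CLI's dry-run logic: check_pipeline, the known_actions set, and the rule that an input must name a source or an earlier stage are all assumptions.

```python
import re

_VAR = re.compile(r"\$\{([A-Za-z0-9_.]+)\}")


def _has_path(config, dotted):
    """True if a dotted path like pack.region.bbox exists in nested dicts."""
    node = config
    for key in dotted.split("."):
        if not isinstance(node, dict) or key not in node:
            return False
        node = node[key]
    return True


def _strings(value):
    """Yield every string found in a nested dict/list structure."""
    if isinstance(value, str):
        yield value
    elif isinstance(value, dict):
        for v in value.values():
            yield from _strings(v)
    elif isinstance(value, list):
        for v in value:
            yield from _strings(v)


def check_pipeline(config, known_actions):
    """Collect configuration errors without running any stage."""
    errors = []
    sources = config.get("sources", {})
    seen = set()  # names of stages declared so far
    for stage in config.get("stages", []):
        name = stage.get("name", "<unnamed>")
        action = stage.get("action")
        if action not in known_actions:
            errors.append(f"{name}: unknown action {action!r}")
        ref = stage.get("input")
        if ref not in sources and ref not in seen:
            errors.append(f"{name}: input {ref!r} is not a source or earlier stage")
        for text in _strings(stage.get("options", {})):
            for var in _VAR.findall(text):
                if not _has_path(config, var):
                    errors.append(f"{name}: unresolved variable ${{{var}}}")
        seen.add(name)
    return errors


# Action names taken from this recipe; the full list is not assumed here.
ACTIONS = {"convert.shp", "metrics.compute", "tiles.pmtiles"}
bad = {
    "pack": {"region": {}},
    "sources": {},
    "stages": [{"name": "convert_sites", "action": "convert.typo", "input": "sites",
                "options": {"bbox": "${pack.region.bbox}"}}],
}
problems = check_pipeline(bad, ACTIONS)
print(len(problems))  # 3
```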
Real-world example
The wa-solar-feasibility-v1.yaml Pipeline in the Spatial.Properties repository demonstrates a production Pipeline with 14 stages across raster processing, vector conversion, and tile generation. It processes DEM data, bushfire zones, road networks, energy infrastructure, and cadastre boundaries into a single Spatial Pack for solar site selection.
Key patterns from the real-world Pipeline:
- Multiple raster sources are mosaicked into a VRT before clipping and deriving slope/aspect
- Variable interpolation (${pack.region.bbox}) clips every layer to the same region
- Intermediate stages (no layer block) produce files that feed downstream stages but do not appear in the final manifest
- Layer metadata in each stage’s layer block flows into the manifest automatically
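The last two patterns amount to a simple filter: only stages that carry a layer block contribute a manifest entry. The Python sketch below illustrates that rule under assumed field names (in particular, attaching the stage output as path); it is not the executor's actual manifest builder, and collect_layers is a hypothetical name.

```python
def collect_layers(stages):
    """Build manifest layer entries from stages that declare a layer block."""
    layers = []
    for stage in stages:
        layer = stage.get("layer")
        if layer is None:
            # Intermediate stage: its output feeds other stages only.
            continue
        # Copy the declared metadata and record where the file was written.
        layers.append({**layer, "path": stage["output"]})
    return layers


stages = [
    {"name": "mosaic_dem", "output": "work/dem.vrt"},  # hypothetical intermediate stage
    {"name": "convert_sites", "output": "layers/sites.parquet",
     "layer": {"id": "sites", "title": "Development Sites", "type": "vector"}},
]
manifest_layers = collect_layers(stages)
print([entry["id"] for entry in manifest_layers])  # ['sites']
```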
See Pipeline Architecture for a detailed explanation of execution order, variable resolution, and error handling.
Next steps
- Pipeline Architecture: How the executor processes stages, resolves variables, and handles errors
- Publish a Pack to CDN: Upload your built pack for web-based consumption
- CLI pack reference: Full options for spatialpack pack build