
Pipeline YAML Reference

A Pipeline YAML file defines a reproducible data processing workflow that builds a Spatial Pack. The spatialpack CLI reads the file, resolves variables, executes each stage in order, and produces a complete pack with a spatialpack.json manifest. For the conceptual overview, see Pipeline Architecture.

A Pipeline YAML has six top-level sections. Only pipeline, pack, and stages are required.

pipeline.yaml
pipeline:
  name: my-pack-pipeline
  version: "1.0"
pack:
  id: "my-org:region:theme:v1"
  version: "1.0.0"
  title: "My Spatial Pack"
  theme: site-selection
  geography: wa
  license: CC-BY-4.0
  region:
    bbox: [115.65, -32.15, 116.15, -31.65]
    crs: "EPSG:4326"
index:
  h3_resolution: 7
sources:
  cadastre:
    path: "./data/cadastre.gpkg"
    license: CC-BY-4.0
    format: gpkg
variables:
  output_crs: "EPSG:4326"
stages:
  - name: convert_cadastre
    action: convert.gpkg
    input: cadastre
    output: "layers/cadastre.parquet"
    options:
      crs: ${variables.output_crs}
      bbox: ${pack.region.bbox}
    layer:
      id: cadastre
      title: "Cadastre Boundaries"
      type: vector
      description: "Property lot boundaries."

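The pipeline section names and versions the pipeline itself.

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| pipeline.name | string | yes | Human-readable pipeline name |
| pipeline.version | string | yes | Pipeline version (semver recommended) |
| pipeline.description | string | no | Multi-line description of the pipeline |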

The pack section defines the output Spatial Pack metadata. These values are written into the spatialpack.json manifest.

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| pack.id | string | yes | Pack identifier in org:geography:theme:version format |
| pack.version | string | yes | Pack data version |
| pack.title | string | no | Human-readable title |
| pack.description | string | no | Pack description |
| pack.theme | string | yes | Thematic category (e.g., solar-feasibility, terrain) |
| pack.geography | string | yes | Geographic region code (e.g., wa, nsw) |
| pack.license | string | yes | SPDX license identifier (e.g., CC-BY-4.0) |
| pack.authority | string | no | Publishing authority (defaults to spatial.properties) |
| pack.region.bbox | array | yes | Bounding box as [minx, miny, maxx, maxy] in the CRS specified by pack.region.crs |
| pack.region.crs | string | yes | Coordinate reference system for the bounding box (e.g., EPSG:4326) |
| pack.region.analysis_crs | string | no | Projected CRS for area/distance calculations. Auto-detected from bbox if not specified. |
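
The two structured fields in this table, pack.id and pack.region.bbox, are easy to check before a build. The following is a minimal Python sketch, not part of the spatialpack CLI: parse_pack_id, PackId, and check_bbox are hypothetical names, and the min < max rule assumes the box does not cross the antimeridian.

```python
from typing import NamedTuple


class PackId(NamedTuple):
    org: str
    geography: str
    theme: str
    version: str


def parse_pack_id(pack_id: str) -> PackId:
    """Split an org:geography:theme:version identifier into its four parts."""
    parts = pack_id.split(":")
    if len(parts) != 4 or not all(parts):
        raise ValueError(f"expected org:geography:theme:version, got {pack_id!r}")
    return PackId(*parts)


def check_bbox(bbox: list) -> None:
    """Reject boxes that are not [minx, miny, maxx, maxy] with min < max."""
    if len(bbox) != 4:
        raise ValueError("bbox must have exactly four numbers")
    minx, miny, maxx, maxy = bbox
    # Assumption: boxes never wrap across the antimeridian.
    if minx >= maxx or miny >= maxy:
        raise ValueError(f"bbox min must be less than max: {bbox}")


pack_id = parse_pack_id("spatial.properties:wa:solar-feasibility:v1")
check_bbox([115.65, -32.15, 116.15, -31.65])
print(pack_id.geography, pack_id.theme)
```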

The optional index section controls H3 spatial indexing applied to all GeoParquet outputs.

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| index.h3_resolution | integer | 7 | H3 hexagonal grid resolution (0 to 15). Resolution 7 produces cells of approximately 5.16 km². |

Individual layers can override the resolution with an h3_resolution field in their layer metadata.
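
The override rule can be read as a simple fallback chain: layer value first, then index.h3_resolution, then the documented default of 7. The sketch below illustrates that reading in Python; effective_h3_resolution is a hypothetical helper, not a function exposed by spatialpack.

```python
DEFAULT_H3_RESOLUTION = 7  # documented default for index.h3_resolution


def effective_h3_resolution(pipeline: dict, stage: dict) -> int:
    """Return the per-layer override if set, else the pipeline-wide value."""
    layer = stage.get("layer") or {}
    index = pipeline.get("index") or {}
    resolution = layer.get(
        "h3_resolution", index.get("h3_resolution", DEFAULT_H3_RESOLUTION)
    )
    if not isinstance(resolution, int) or not 0 <= resolution <= 15:
        raise ValueError(f"h3_resolution must be an integer from 0 to 15, got {resolution!r}")
    return resolution


pipeline = {"index": {"h3_resolution": 7}}
default_stage = {"name": "convert_cadastre", "layer": {"id": "cadastre"}}
override_stage = {"name": "convert_sites", "layer": {"id": "sites", "h3_resolution": 9}}

print(effective_h3_resolution(pipeline, default_stage))
print(effective_h3_resolution(pipeline, override_stage))
```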

The sources section declares named data inputs. Each source has a key (used to reference it in stages) and metadata fields.

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| path | string | yes | File path relative to the pipeline YAML location |
| license | string | yes | SPDX license identifier for this source |
| attribution | string | no | Data provider attribution |
| format | string | no | File format hint (e.g., shp, gpkg, tif, geojson) |

sources:
  dem_z50:
    path: "./data/zone50.tif"
    license: CC-BY-4.0
    attribution: "Geoscience Australia"
    format: tif
  cadastre:
    path: "./data/cadastre.gpkg"
    license: CC-BY-4.0
    format: gpkg

The optional variables section defines custom key-value pairs for use in stage options via interpolation.

variables:
  target_crs: "EPSG:4326"
  compression: zstd

Reference them in stages with ${variables.target_crs}.

The parser resolves ${...} expressions before execution. If the entire value is a single expression that resolves to a non-string type (like a list), the resolved type is preserved.

| Pattern | Resolves to | Example value |
| --- | --- | --- |
| ${pack.region.bbox} | Pack bounding box array | [115.65, -32.15, 116.15, -31.65] |
| ${pack.region.crs} | Pack CRS string | EPSG:4326 |
| ${sources.<name>.path} | Source file path | ./data/cadastre.gpkg |
| ${stages.<name>.output} | Previous stage output path | staging/mosaic.vrt |
| ${variables.<name>} | Custom variable value | User-defined |

Expressions navigate the YAML tree using dot notation. The parser walks the pipeline dictionary, following each path segment. If a segment cannot be resolved, the build fails with a descriptive error.

# Using bbox interpolation -- resolves to the array from pack.region.bbox
options:
  bbox: ${pack.region.bbox}

# Using source path -- resolves to the string from sources.cadastre.path
input: ${sources.cadastre.path}

# Using a variable -- resolves to the custom value
options:
  crs: ${variables.target_crs}
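
The resolution rules described above (dot-notation lookup, type preservation for whole-value expressions, and a hard failure on unknown segments) can be sketched in Python. This is an illustration under stated assumptions, not the spatialpack parser: lookup and resolve are hypothetical names, and addressing entries of a list such as stages by their name field is an assumption about how ${stages.<name>.output} is handled.

```python
import re
from typing import Any

EXPR = re.compile(r"\$\{([^}]+)\}")


def lookup(root: dict, path: str) -> Any:
    """Walk the pipeline dictionary one dot-separated segment at a time."""
    node: Any = root
    for segment in path.split("."):
        if isinstance(node, dict) and segment in node:
            node = node[segment]
        elif isinstance(node, list):
            # Assumption: list entries (such as stages) are addressed by "name".
            named = [i for i in node if isinstance(i, dict) and i.get("name") == segment]
            if not named:
                raise KeyError(f"cannot resolve '{path}': no entry named {segment!r}")
            node = named[0]
        else:
            raise KeyError(f"cannot resolve '{path}': no segment {segment!r}")
    return node


def resolve(value: Any, root: dict) -> Any:
    """Resolve ${...} expressions in strings, recursing into dicts and lists."""
    if isinstance(value, dict):
        return {k: resolve(v, root) for k, v in value.items()}
    if isinstance(value, list):
        return [resolve(v, root) for v in value]
    if not isinstance(value, str):
        return value
    whole = EXPR.fullmatch(value.strip())
    if whole:
        # The entire value is one expression: keep the resolved type (e.g. a list).
        return lookup(root, whole.group(1))
    # Expression embedded in a longer string: substitute its text form.
    return EXPR.sub(lambda m: str(lookup(root, m.group(1))), value)


pipeline = {
    "pack": {"region": {"bbox": [115.65, -32.15, 116.15, -31.65], "crs": "EPSG:4326"}},
    "variables": {"target_crs": "EPSG:4326"},
    "stages": [{"name": "mosaic_dem", "output": "staging/mosaic.vrt"}],
}
options = {"bbox": "${pack.region.bbox}", "crs": "${variables.target_crs}"}
print(resolve(options, pipeline))
```

Note that the bbox option comes back as a list, not as the string form of a list, which is the behaviour the type-preservation rule describes.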

Each entry in the stages array defines a processing step. Stages run in declaration order, except that a stage listing depends_on waits until every stage it names has completed.

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| name | string | yes | Unique stage identifier |
| action | string | yes | Action handler to execute (see stage types below) |
| input | string | yes* | Single input reference (source name or previous stage name) |
| inputs | array | yes* | Multiple input references (for actions like raster.vrt that accept multiple files) |
| depends_on | array | no | Explicit stage dependencies. The executor waits for these stages to complete first. |
| output | string | yes | Output file path relative to the pack output directory |
| options | object | no | Action-specific options (see each stage type) |
| layer | object | no | Layer metadata for the spatialpack.json manifest |

*Either input or inputs is required, not both.
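
The field rules in this table, together with depends_on ordering, can be sketched as follows. The executor's actual scheduler is not documented here, so this is one plausible reading in Python: check_stage and execution_order are hypothetical names, and picking the earliest-declared ready stage is an assumption.

```python
def check_stage(stage: dict) -> None:
    """Enforce the required fields and the input/inputs rule."""
    for field in ("name", "action", "output"):
        if field not in stage:
            raise ValueError(f"stage is missing required field {field!r}")
    if ("input" in stage) == ("inputs" in stage):
        raise ValueError(f"stage {stage['name']!r} needs exactly one of input or inputs")


def execution_order(stages: list) -> list:
    """Keep declaration order, but defer a stage until its depends_on are done."""
    names = [s["name"] for s in stages]
    if len(set(names)) != len(names):
        raise ValueError("stage names must be unique")
    pending = list(stages)
    done = []
    while pending:
        ready = [s for s in pending if all(d in done for d in s.get("depends_on", []))]
        if not ready:
            raise ValueError("unresolvable or circular depends_on")
        # Run the earliest-declared ready stage next.
        done.append(ready[0]["name"])
        pending.remove(ready[0])
    return done


stages = [
    {"name": "compute_slope", "action": "raster.slope", "input": "clip_dem",
     "depends_on": ["clip_dem"], "output": "rasters/terrain_slope.tif"},
    {"name": "mosaic_dem", "action": "raster.vrt", "inputs": ["dem_z50", "dem_z51"],
     "output": "staging/dem_mosaic.vrt"},
    {"name": "clip_dem", "action": "raster.cog", "input": "mosaic_dem",
     "depends_on": ["mosaic_dem"], "output": "rasters/terrain_dem.tif"},
]
for stage in stages:
    check_stage(stage)
print(execution_order(stages))
```

Here compute_slope is declared first on purpose: it is deferred until clip_dem, and transitively mosaic_dem, have finished.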

When a stage has a layer field, the executor includes it in the manifest as a pack layer.

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| layer.id | string | yes | Layer identifier (must be unique within the pack) |
| layer.title | string | yes | Human-readable layer title |
| layer.type | string | yes | Layer type: vector or raster |
| layer.description | string | no | Layer description |
| layer.h3_resolution | integer | no | Per-layer H3 resolution override |

Stages without a layer field produce intermediate files (e.g., VRT mosaics, staging outputs) that are not included in the final manifest.
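
The split between manifest layers and intermediate outputs can be sketched as a filter over the stage list. This Python illustration is not the executor's code: manifest_layers is a hypothetical name, and the path key added to each entry is an assumption, since the spatialpack.json schema is not shown in this reference.

```python
def manifest_layers(stages: list) -> list:
    """Collect layer metadata from stages that declare a layer field."""
    layers = []
    seen = set()
    for stage in stages:
        layer = stage.get("layer")
        if layer is None:
            continue  # intermediate stage: its output is not listed in the manifest
        for field in ("id", "title", "type"):
            if field not in layer:
                raise ValueError(f"layer in stage {stage['name']!r} is missing {field!r}")
        if layer["type"] not in ("vector", "raster"):
            raise ValueError(f"layer type must be vector or raster, got {layer['type']!r}")
        if layer["id"] in seen:
            raise ValueError(f"duplicate layer id {layer['id']!r}")
        seen.add(layer["id"])
        # "path" is a hypothetical key used here only to tie the layer to its file.
        layers.append({**layer, "path": stage["output"]})
    return layers


stages = [
    {"name": "mosaic_dem", "output": "staging/dem_mosaic.vrt"},
    {"name": "clip_dem", "output": "rasters/terrain_dem.tif",
     "layer": {"id": "terrain_dem", "title": "Terrain DEM", "type": "raster"}},
]
print([layer["id"] for layer in manifest_layers(stages)])
```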

The executor ships with 13 built-in action handlers. Each section documents the action name, purpose, available options, and a minimal YAML example.


raster.vrt

Create a VRT (Virtual Raster) mosaic from multiple raster files. A VRT is a lightweight XML file that references the original rasters without copying data.

Options:

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| resampling | string | bilinear | Resampling method: nearest, bilinear, cubic, lanczos, average |
| resolution | float | auto | Target resolution in CRS units. Uses highest resolution input if not specified. |

Example:

- name: mosaic_dem
  action: raster.vrt
  inputs: [dem_z50, dem_z51]
  output: "staging/dem_mosaic.vrt"
  options:
    resampling: bilinear

raster.cog

Convert a raster to Cloud Optimized GeoTIFF with internal tiling and overviews. Supports bounding box clipping and CRS transformation.

Options:

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| bbox | array | none | Bounding box for clipping: [minx, miny, maxx, maxy] |
| resolution | float | auto | Target resolution in CRS units |
| compression | string | deflate | Compression codec: deflate, lzw, zstd |
| blocksize | integer | 512 | Internal tile size in pixels |
| crs | string | EPSG:4326 | Target CRS |
| nodata | float | none | NoData value to set |

Example:

- name: clip_dem
  action: raster.cog
  input: mosaic_dem
  depends_on: [mosaic_dem]
  output: "rasters/terrain_dem.tif"
  options:
    bbox: ${pack.region.bbox}
    compression: zstd
    blocksize: 512
  layer:
    id: terrain_dem
    title: "Terrain DEM"
    type: raster
    description: "Digital Elevation Model clipped to region."

raster.slope

Compute slope from a Digital Elevation Model. Outputs slope as a Cloud Optimized GeoTIFF.

Options:

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| units | string | percent | Output units: percent or degrees |
| compression | string | deflate | Compression codec: deflate, lzw, zstd |

Example:

- name: compute_slope
  action: raster.slope
  input: clip_dem
  depends_on: [clip_dem]
  output: "rasters/terrain_slope.tif"
  options:
    units: percent
    compression: zstd
  layer:
    id: terrain_slope
    title: "Terrain Slope"
    type: raster
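
The two units settings describe the same gradient on different scales. Using the conventional definitions (percent is 100 times rise over run, degrees is the arctangent of rise over run), the relationship can be checked in a few lines of Python. That raster.slope follows exactly these definitions is an assumption; the helper names are illustrative only.

```python
import math


def percent_to_degrees(percent: float) -> float:
    """Convert a slope in percent (100 * rise / run) to degrees."""
    return math.degrees(math.atan(percent / 100.0))


def degrees_to_percent(degrees: float) -> float:
    """Convert a slope in degrees to percent."""
    return 100.0 * math.tan(math.radians(degrees))


for pct in (5, 20, 100):
    print(f"{pct}% = {percent_to_degrees(pct):.2f} degrees")
```

A 100 percent slope is 45 degrees, so percent values are not capped at 100 and grow without bound as the terrain approaches vertical.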

raster.aspect

Compute aspect from a Digital Elevation Model. Outputs aspect in degrees from north (0-360) as a Cloud Optimized GeoTIFF.

Options:

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| compression | string | deflate | Compression codec: deflate, lzw, zstd |

Example:

- name: compute_aspect
  action: raster.aspect
  input: clip_dem
  depends_on: [clip_dem]
  output: "rasters/terrain_aspect.tif"
  options:
    compression: zstd
  layer:
    id: terrain_aspect
    title: "Terrain Aspect"
    type: raster

raster.aspect output is degrees-from-north (0–360). Supports the same compression option as raster.slope.


raster.hillshade

Compute hillshade from a Digital Elevation Model. Creates a shaded relief visualization as a Cloud Optimized GeoTIFF.

Options:

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| azimuth | float | 315.0 | Light source compass direction in degrees (315 = northwest) |
| altitude | float | 45.0 | Light source altitude above horizon in degrees |
| z_factor | float | 1.0 | Vertical exaggeration factor |
| compression | string | deflate | Compression codec: deflate, lzw, zstd |

Example:

- name: compute_hillshade
  action: raster.hillshade
  input: clip_dem
  depends_on: [clip_dem]
  output: "rasters/terrain_hillshade.tif"
  options:
    azimuth: 315
    altitude: 45
    z_factor: 1.0
    compression: zstd
  layer:
    id: terrain_hillshade
    title: "Terrain Hillshade"
    type: raster

convert.shp

Convert a Shapefile, GeoJSON, or other OGR-supported vector format to GeoParquet. Applies CRS normalization, geometry validation, metric computation, and H3 spatial indexing.

Options:

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| crs | string | EPSG:4326 | Target CRS for output |
| compression | string | snappy | Parquet compression: snappy, zstd, gzip |
| validate_geometry | boolean | true | Validate and fix geometries on read |
| bbox | array | none | Bounding box for spatial clipping: [minx, miny, maxx, maxy] |
| enrichment | string | none | Explicit path to enrichment YAML (auto-discovered by convention if not specified) |

Example:

- name: convert_bushfire
  action: convert.shp
  input: bushfire
  output: "layers/bushfire_prone.parquet"
  options:
    crs: EPSG:4326
    compression: zstd
    validate_geometry: true
    bbox: ${pack.region.bbox}
  layer:
    id: bushfire_prone
    title: "Bushfire Prone Areas"
    type: vector
    description: "Designated bushfire prone areas."

convert.geojson

Convert a GeoJSON file to GeoParquet. Applies CRS normalization, geometry validation, metric computation, and H3 spatial indexing.

convert.geojson supports the same options as convert.shp (bbox, crs, compression, validate_geometry).

Options:

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| crs | string | EPSG:4326 | Target CRS for output |
| compression | string | snappy | Parquet compression: snappy, zstd, gzip |
| validate_geometry | boolean | true | Validate and fix geometries on read |
| bbox | array | none | Bounding box for spatial clipping: [minx, miny, maxx, maxy] |
| enrichment | string | none | Explicit path to enrichment YAML (auto-discovered by convention if not specified) |

Example:

- name: convert_sites
  action: convert.geojson
  input: sites
  output: "layers/sites.parquet"
  options:
    crs: EPSG:4326
    compression: zstd
    validate_geometry: true
    bbox: ${pack.region.bbox}
  layer:
    id: sites
    title: "Development Sites"
    type: vector
    description: "Candidate development site polygons."

convert.gpkg

Convert a GeoPackage layer to GeoParquet. Supports multi-layer GeoPackage files with explicit layer selection. Applies CRS normalization, bbox clipping, metric computation, and H3 indexing.

Options:

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| crs | string | EPSG:4326 | Target CRS for output |
| compression | string | snappy | Parquet compression: snappy, zstd, gzip |
| bbox | array | none | Bounding box for spatial clipping: [minx, miny, maxx, maxy] |
| layer | string | none | Layer name within the GeoPackage (required for multi-layer files) |
| row_group_size | integer | 65536 | Parquet row group size |
| enrichment | string | none | Explicit path to enrichment YAML |

Example:

- name: convert_asgs_sa2
  action: convert.gpkg
  input: asgs_sa2
  output: "layers/asgs_sa2.parquet"
  options:
    layer: SA2_2021_AUST_GDA2020
    crs: EPSG:4326
    compression: zstd
    bbox: ${pack.region.bbox}
  layer:
    id: asgs_sa2
    title: "Statistical Areas Level 2"
    type: vector

tiles.pmtiles

Generate a PMTiles vector tile archive from a GeoParquet file. Uses tippecanoe for tile generation.

Options:

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| min_zoom | integer | 0 | Minimum zoom level |
| max_zoom | integer | 14 | Maximum zoom level |

Example:

- name: tiles_cadastre
  action: tiles.pmtiles
  input: convert_cadastre
  depends_on: [convert_cadastre]
  output: "layers/cadastre.pmtiles"
  options:
    min_zoom: 4
    max_zoom: 14

extract.zip

Extract files from a ZIP archive into the output directory.

Options:

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| pattern | string | none | Glob pattern to filter extracted files (e.g., *.shp). Extracts all files if not specified. |

Example:

- name: extract_source
  action: extract.zip
  input: zipped_data
  output: "staging/extracted/"
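
The effect of the pattern option can be illustrated with Python's standard zipfile and fnmatch modules. This is a sketch of the filtering idea only, not the handler's implementation: extract_matching is a hypothetical name, and matching the glob against each member's base name (not its full path inside the archive) is an assumption.

```python
import fnmatch
import tempfile
import zipfile
from pathlib import Path
from typing import Optional


def extract_matching(archive: str, out_dir: str, pattern: Optional[str] = None) -> list:
    """Extract members whose file name matches pattern; all files if pattern is None."""
    extracted = []
    with zipfile.ZipFile(archive) as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue
            # Assumption: the glob is matched against the base name only.
            if pattern is None or fnmatch.fnmatch(Path(info.filename).name, pattern):
                zf.extract(info, out_dir)
                extracted.append(info.filename)
    return extracted


# Build a small throwaway archive so the example is self-contained.
with tempfile.TemporaryDirectory() as tmp:
    archive = Path(tmp) / "source.zip"
    with zipfile.ZipFile(archive, "w") as zf:
        zf.writestr("data/bushfire.shp", b"")
        zf.writestr("data/bushfire.dbf", b"")
        zf.writestr("README.txt", b"")
    print(extract_matching(str(archive), str(Path(tmp) / "extracted"), "*.shp"))
```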

hash.file

Compute a BLAKE3 cryptographic hash for one or more input files. The hash is recorded in the stage result and can be written into the manifest’s integrity section.

Options:

No action-specific options. The hash is computed from the file content.

Example:

- name: hash_cadastre
  action: hash.file
  input: convert_cadastre
  depends_on: [convert_cadastre]
  output: "layers/cadastre.parquet.hash"
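
Hashing "from the file content" usually means streaming the file through the hash function in fixed-size chunks so that large rasters never need to fit in memory. The sketch below shows that pattern in Python. BLAKE3 is not in the Python standard library, so hashlib.blake2b is substituted purely as a stand-in; the digests it produces are not BLAKE3 digests and will not match what hash.file records. hash_file is a hypothetical name.

```python
import hashlib
import tempfile
from pathlib import Path


def hash_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Stream a file through a hash in 1 MiB chunks and return the hex digest."""
    digest = hashlib.blake2b()  # stand-in only: swap in a BLAKE3 implementation
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "cadastre.parquet"
    sample.write_bytes(b"example bytes standing in for a GeoParquet file")
    print(hash_file(str(sample)))
```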

copy

Copy a file from the input path to the output path. Preserves file metadata (timestamps, permissions).

Options:

No action-specific options.

Example:

- name: copy_readme
  action: copy
  input: readme_source
  output: "README.md"

metrics.compute

Compute area and perimeter metrics on an existing GeoParquet file. This action is optional: the convert.shp and convert.gpkg actions compute metrics automatically when an analysis_crs is available. Use it when you need to add metrics to a pre-existing GeoParquet file that was not produced by a convert stage.

Options:

No action-specific options. The analysis CRS is resolved automatically from pack.region.analysis_crs or auto-detected from the bounding box.

Example:

- name: add_metrics
  action: metrics.compute
  input: existing_layer
  depends_on: [existing_layer]
  output: "layers/layer_with_metrics.parquet"

The WA Solar Feasibility Pipeline is a real-world example with 7 sources, 15 stages, variable interpolation, and stage dependencies. Here is an annotated excerpt showing the key patterns:

pipelines/wa-solar-feasibility-v1.yaml (excerpt)
pipeline:
  name: wa-solar-feasibility-pack
  version: "1.0"
pack:
  id: "spatial.properties:wa:solar-feasibility:v1"
  version: "2025.01.31"
  title: "WA Solar Feasibility Pack"
  theme: solar-feasibility
  geography: wa
  license: CC-BY-4.0
  region:
    # Perth Metro bounding box
    bbox: [115.65, -32.15, 116.15, -31.65]
    crs: "EPSG:4326"
# H3 resolution 7 = ~5.16 km² hexagonal cells
index:
  h3_resolution: 7
# Multiple source declarations with license metadata
sources:
  dem_z50:
    path: "./spatial-data/waz50/waz50.tif"
    license: CC-BY-4.0
    attribution: "Geoscience Australia SRTM-derived 1 Second DEM"
    format: tif
  bushfire:
    path: "./spatial-data/bushfire.shp"
    license: CC-BY-4.0
    attribution: "Department of Fire and Emergency Services WA"
    format: shp
stages:
  # Raster chain: VRT -> COG -> derivatives
  - name: mosaic_dem
    action: raster.vrt
    inputs: [dem_z50, dem_z51]
    output: "staging/wa_dem_mosaic.vrt"
    options:
      resampling: bilinear
  - name: clip_dem
    action: raster.cog
    input: mosaic_dem
    depends_on: [mosaic_dem]
    output: "rasters/terrain_dem.tif"
    options:
      bbox: ${pack.region.bbox}  # Interpolated from pack section
      compression: zstd
    layer:
      id: terrain_dem
      title: "Terrain DEM"
      type: raster
  - name: compute_slope
    action: raster.slope
    input: clip_dem
    depends_on: [clip_dem]  # Explicit dependency
    output: "rasters/terrain_slope.tif"
    options:
      units: percent
      compression: zstd
    layer:
      id: terrain_slope
      title: "Terrain Slope"
      type: raster
  # Vector conversion with bbox clipping
  - name: convert_bushfire
    action: convert.shp
    input: bushfire
    output: "layers/bushfire_prone.parquet"
    options:
      crs: EPSG:4326
      bbox: ${pack.region.bbox}  # Same bbox, reused via interpolation
    layer:
      id: bushfire_prone
      title: "Bushfire Prone Areas"
      type: vector
  # PMTiles from converted layer
  - name: tiles_bushfire
    action: tiles.pmtiles
    input: convert_bushfire
    depends_on: [convert_bushfire]
    output: "layers/bushfire_prone.pmtiles"
    options:
      min_zoom: 4
      max_zoom: 10

Patterns demonstrated:

  • Multi-source pipelines: Multiple data sources with different formats (raster and vector) in a single pipeline.
  • Stage chaining: depends_on ensures stages execute in the correct order. The clip_dem stage waits for mosaic_dem to complete.
  • Variable interpolation: ${pack.region.bbox} is used in multiple stages, ensuring consistent clipping across all layers.
  • Intermediate stages: The VRT mosaic stage has no layer field, so it produces a staging file but no manifest entry.
  • Layer metadata: Stages with a layer field contribute layers to the final spatialpack.json manifest.

Build a pack from a pipeline file with the spatialpack CLI:

# Build the pack
spatialpack pack build pipeline.yaml -o ./output/
# Dry run (validate without executing)
spatialpack pack build pipeline.yaml --dry-run
# Continue past failures during development
spatialpack pack build pipeline.yaml -o ./output/ --continue-on-error

See the CLI pack reference for all build options.