Pipeline YAML Reference
A Pipeline YAML file defines a reproducible data processing workflow that builds a Spatial Pack. The spatialpack CLI reads the file, resolves variables, executes each stage in order, and produces a complete pack with a spatialpack.json manifest. For the conceptual overview, see Pipeline Architecture.
File structure
A Pipeline YAML has six top-level sections. Only pipeline, pack, and stages are required.

    pipeline:
      name: my-pack-pipeline
      version: "1.0"

    pack:
      id: "my-org:region:theme:v1"
      version: "1.0.0"
      title: "My Spatial Pack"
      theme: site-selection
      geography: wa
      license: CC-BY-4.0
      region:
        bbox: [115.65, -32.15, 116.15, -31.65]
        crs: "EPSG:4326"

    index:
      h3_resolution: 7

    sources:
      cadastre:
        path: "./data/cadastre.gpkg"
        license: CC-BY-4.0
        format: gpkg

    variables:
      output_crs: "EPSG:4326"

    stages:
      - name: convert_cadastre
        action: convert.gpkg
        input: cadastre
        output: "layers/cadastre.parquet"
        options:
          crs: ${variables.output_crs}
          bbox: ${pack.region.bbox}
        layer:
          id: cadastre
          title: "Cadastre Boundaries"
          type: vector
          description: "Property lot boundaries."

Pipeline metadata
| Field | Type | Required | Description |
|---|---|---|---|
| pipeline.name | string | yes | Human-readable pipeline name |
| pipeline.version | string | yes | Pipeline version (semver recommended) |
| pipeline.description | string | no | Multi-line description of the pipeline |
Pack definition
The pack section defines the output Spatial Pack metadata. These values are written into the spatialpack.json manifest.
| Field | Type | Required | Description |
|---|---|---|---|
| pack.id | string | yes | Pack identifier in org:geography:theme:version format |
| pack.version | string | yes | Pack data version |
| pack.title | string | no | Human-readable title |
| pack.description | string | no | Pack description |
| pack.theme | string | yes | Thematic category (e.g., solar-feasibility, terrain) |
| pack.geography | string | yes | Geographic region code (e.g., wa, nsw) |
| pack.license | string | yes | SPDX license identifier (e.g., CC-BY-4.0) |
| pack.authority | string | no | Publishing authority (defaults to spatial.properties) |
| pack.region.bbox | array | yes | Bounding box as [minx, miny, maxx, maxy] in the CRS specified by pack.region.crs |
| pack.region.crs | string | yes | Coordinate reference system for the bounding box (e.g., EPSG:4326) |
| pack.region.analysis_crs | string | no | Projected CRS for area/distance calculations. Auto-detected from bbox if not specified. |
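When auto-detection is not wanted, the projected CRS can be set explicitly. The fragment below is a minimal sketch for the Perth bounding box used throughout this page. The choice of EPSG:32750 (WGS 84 / UTM zone 50S) is an assumption made for illustration, not a value this reference prescribes, and the other required pack fields are omitted.

```yaml
pack:
  # id, version, theme, geography, and license omitted for brevity
  region:
    bbox: [115.65, -32.15, 116.15, -31.65]   # [minx, miny, maxx, maxy]
    crs: "EPSG:4326"                          # CRS of the bbox coordinates
    analysis_crs: "EPSG:32750"                # projected CRS in metres (illustrative choice)
```

Pinning analysis_crs this way keeps area and perimeter values stable if the bounding box is later changed.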
Index configuration
The optional index section controls H3 spatial indexing applied to all GeoParquet outputs.
| Field | Type | Default | Description |
|---|---|---|---|
| index.h3_resolution | integer | 7 | H3 hexagonal grid resolution (0-15). Resolution 7 produces cells of approximately 5.16 km². |
Individual layers can override the resolution with an h3_resolution field in their layer metadata.
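As an illustration of the override, the fragment below sketches a pack-wide resolution of 7 with one layer indexed at resolution 9. The stage name, source key, and output path are hypothetical; only index.h3_resolution and layer.h3_resolution are taken from the tables in this reference.

```yaml
index:
  h3_resolution: 7            # pack-wide default

stages:
  - name: convert_sites       # hypothetical stage name
    action: convert.geojson
    input: sites              # hypothetical source key declared under sources
    output: "layers/sites.parquet"
    layer:
      id: sites
      title: "Development Sites"
      type: vector
      h3_resolution: 9        # finer cells for this layer only
```

Higher H3 resolutions produce smaller cells, so an override like this suits layers made up of small features.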
Sources
The sources section declares named data inputs. Each source has a key (used to reference it in stages) and metadata fields.
| Field | Type | Required | Description |
|---|---|---|---|
| path | string | yes | File path relative to the pipeline YAML location |
| license | string | yes | SPDX license identifier for this source |
| attribution | string | no | Data provider attribution |
| format | string | no | File format hint (e.g., shp, gpkg, tif, geojson) |

    sources:
      dem_z50:
        path: "./data/zone50.tif"
        license: CC-BY-4.0
        attribution: "Geoscience Australia"
        format: tif

      cadastre:
        path: "./data/cadastre.gpkg"
        license: CC-BY-4.0
        format: gpkg

Variables
The optional variables section defines custom key-value pairs for use in stage options via interpolation.

    variables:
      target_crs: "EPSG:4326"
      compression: zstd

Reference them in stages with ${variables.target_crs}.
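For illustration, the two variables could be shared by several stages, so that changing the CRS or the codec means editing a single line. This is a sketch; the stage and source names are hypothetical.

```yaml
variables:
  target_crs: "EPSG:4326"
  compression: zstd

stages:
  - name: convert_cadastre            # hypothetical stage
    action: convert.gpkg
    input: cadastre
    output: "layers/cadastre.parquet"
    options:
      crs: ${variables.target_crs}
      compression: ${variables.compression}

  - name: convert_bushfire            # hypothetical stage
    action: convert.shp
    input: bushfire
    output: "layers/bushfire_prone.parquet"
    options:
      crs: ${variables.target_crs}
      compression: ${variables.compression}
```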
Variable interpolation
The parser resolves ${...} expressions before execution. If the entire value is a single expression that resolves to a non-string type (like a list), the resolved type is preserved.
| Pattern | Resolves to | Example value |
|---|---|---|
| ${pack.region.bbox} | Pack bounding box array | [115.65, -32.15, 116.15, -31.65] |
| ${pack.region.crs} | Pack CRS string | EPSG:4326 |
| ${sources.<name>.path} | Source file path | ./data/cadastre.gpkg |
| ${stages.<name>.output} | Previous stage output path | staging/mosaic.vrt |
| ${variables.<name>} | Custom variable value | User-defined |
Expressions navigate the YAML tree using dot notation. The parser walks the pipeline dictionary, following each path segment. If a segment cannot be resolved, the build fails with a descriptive error.

    # Using bbox interpolation -- resolves to the array from pack.region.bbox
    options:
      bbox: ${pack.region.bbox}

    # Using source path -- resolves to the string from sources.cadastre.path
    input: ${sources.cadastre.path}

    # Using a variable -- resolves to the custom value
    options:
      crs: ${variables.target_crs}

Stage structure
Each entry in the stages array defines a processing step. Stages execute in declaration order, subject to dependency-based scheduling: a stage does not start until every stage listed in its depends_on has completed.
| Field | Type | Required | Description |
|---|---|---|---|
| name | string | yes | Unique stage identifier |
| action | string | yes | Action handler to execute (see stage types below) |
| input | string | yes* | Single input reference (source name or previous stage name) |
| inputs | array | yes* | Multiple input references (for actions like raster.vrt that accept multiple files) |
| depends_on | array | no | Explicit stage dependencies. The executor waits for these stages to complete first. |
| output | string | yes | Output file path relative to the pack output directory |
| options | object | no | Action-specific options (see each stage type) |
| layer | object | no | Layer metadata for the spatialpack.json manifest |
*Exactly one of input or inputs is required; do not specify both.
Layer metadata
When a stage has a layer field, the executor includes it in the manifest as a pack layer.
| Field | Type | Required | Description |
|---|---|---|---|
| layer.id | string | yes | Layer identifier (must be unique within the pack) |
| layer.title | string | yes | Human-readable layer title |
| layer.type | string | yes | Layer type: vector or raster |
| layer.description | string | no | Layer description |
| layer.h3_resolution | integer | no | Per-layer H3 resolution override |
Stages without a layer field produce intermediate files (e.g., VRT mosaics, staging outputs) that are not included in the final manifest.
Stage types
The executor ships with 13 built-in action handlers. Each section below documents the action name, purpose, available options, and a minimal YAML example.
raster.vrt
Create a VRT (Virtual Raster) mosaic from multiple raster files. A VRT is a lightweight XML file that references the original rasters without copying data.
Options:
| Option | Type | Default | Description |
|---|---|---|---|
| resampling | string | bilinear | Resampling method: nearest, bilinear, cubic, lanczos, average |
| resolution | float | auto | Target resolution in CRS units. Uses highest resolution input if not specified. |
Example:

    - name: mosaic_dem
      action: raster.vrt
      inputs: [dem_z50, dem_z51]
      output: "staging/dem_mosaic.vrt"
      options:
        resampling: bilinear

raster.cog
Convert a raster to Cloud Optimized GeoTIFF with internal tiling and overviews. Supports bounding box clipping and CRS transformation.
Options:
| Option | Type | Default | Description |
|---|---|---|---|
| bbox | array | none | Bounding box for clipping: [minx, miny, maxx, maxy] |
| resolution | float | auto | Target resolution in CRS units |
| compression | string | deflate | Compression codec: deflate, lzw, zstd |
| blocksize | integer | 512 | Internal tile size in pixels |
| crs | string | EPSG:4326 | Target CRS |
| nodata | float | none | NoData value to set |
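These options can be combined. The sketch below reprojects a raster to a projected CRS, sets a target resolution in that CRS's units, and declares a NoData value. The stage name, input, output path, and the specific values (EPSG:32750, 30, -9999, 256) are illustrative assumptions. This reference does not state which CRS a bbox is interpreted in when crs differs from the pack CRS, so bbox is left out here.

```yaml
- name: reproject_dem              # hypothetical stage name
  action: raster.cog
  input: mosaic_dem                # hypothetical upstream stage
  depends_on: [mosaic_dem]
  output: "rasters/terrain_dem_utm.tif"
  options:
    crs: "EPSG:32750"              # WGS 84 / UTM zone 50S, units are metres
    resolution: 30                 # in CRS units, so 30 m per pixel here
    nodata: -9999                  # illustrative NoData value
    compression: lzw
    blocksize: 256
```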
Example:

    - name: clip_dem
      action: raster.cog
      input: mosaic_dem
      depends_on: [mosaic_dem]
      output: "rasters/terrain_dem.tif"
      options:
        bbox: ${pack.region.bbox}
        compression: zstd
        blocksize: 512
      layer:
        id: terrain_dem
        title: "Terrain DEM"
        type: raster
        description: "Digital Elevation Model clipped to region."

raster.slope
Compute slope from a Digital Elevation Model. Outputs slope as a Cloud Optimized GeoTIFF.
Options:
| Option | Type | Default | Description |
|---|---|---|---|
| units | string | percent | Output units: percent or degrees |
| compression | string | deflate | Compression codec: deflate, lzw, zstd |
Example:

    - name: compute_slope
      action: raster.slope
      input: clip_dem
      depends_on: [clip_dem]
      output: "rasters/terrain_slope.tif"
      options:
        units: percent
        compression: zstd
      layer:
        id: terrain_slope
        title: "Terrain Slope"
        type: raster

raster.aspect
Compute aspect from a Digital Elevation Model. Outputs aspect in degrees from north (0-360) as a Cloud Optimized GeoTIFF.
Options:
| Option | Type | Default | Description |
|---|---|---|---|
| compression | string | deflate | Compression codec: deflate, lzw, zstd |
Example:

    - name: compute_aspect
      action: raster.aspect
      input: clip_dem
      depends_on: [clip_dem]
      output: "rasters/terrain_aspect.tif"
      options:
        compression: zstd
      layer:
        id: terrain_aspect
        title: "Terrain Aspect"
        type: raster

raster.aspect output is degrees-from-north (0–360). Supports the same compression option as raster.slope.
raster.hillshade
Compute hillshade from a Digital Elevation Model. Creates a shaded relief visualization as a Cloud Optimized GeoTIFF.
Options:
| Option | Type | Default | Description |
|---|---|---|---|
| azimuth | float | 315.0 | Light source compass direction in degrees (315 = northwest) |
| altitude | float | 45.0 | Light source altitude above horizon in degrees |
| z_factor | float | 1.0 | Vertical exaggeration factor |
| compression | string | deflate | Compression codec: deflate, lzw, zstd |
Example:

    - name: compute_hillshade
      action: raster.hillshade
      input: clip_dem
      depends_on: [clip_dem]
      output: "rasters/terrain_hillshade.tif"
      options:
        azimuth: 315
        altitude: 45
        z_factor: 1.0
        compression: zstd
      layer:
        id: terrain_hillshade
        title: "Terrain Hillshade"
        type: raster

convert.shp
Convert a Shapefile, GeoJSON, or other OGR-supported vector format to GeoParquet. Applies CRS normalization, geometry validation, metric computation, and H3 spatial indexing.
Options:
| Option | Type | Default | Description |
|---|---|---|---|
| crs | string | EPSG:4326 | Target CRS for output |
| compression | string | snappy | Parquet compression: snappy, zstd, gzip |
| validate_geometry | boolean | true | Validate and fix geometries on read |
| bbox | array | none | Bounding box for spatial clipping: [minx, miny, maxx, maxy] |
| enrichment | string | none | Explicit path to enrichment YAML (auto-discovered by convention if not specified) |
Example:

    - name: convert_bushfire
      action: convert.shp
      input: bushfire
      output: "layers/bushfire_prone.parquet"
      options:
        crs: EPSG:4326
        compression: zstd
        validate_geometry: true
        bbox: ${pack.region.bbox}
      layer:
        id: bushfire_prone
        title: "Bushfire Prone Areas"
        type: vector
        description: "Designated bushfire prone areas."

convert.geojson
Convert a GeoJSON file to GeoParquet. Applies CRS normalization, geometry validation, metric computation, and H3 spatial indexing.
convert.geojson supports the same options as convert.shp (crs, compression, validate_geometry, bbox, enrichment).
Options:
| Option | Type | Default | Description |
|---|---|---|---|
| crs | string | EPSG:4326 | Target CRS for output |
| compression | string | snappy | Parquet compression: snappy, zstd, gzip |
| validate_geometry | boolean | true | Validate and fix geometries on read |
| bbox | array | none | Bounding box for spatial clipping: [minx, miny, maxx, maxy] |
| enrichment | string | none | Explicit path to enrichment YAML (auto-discovered by convention if not specified) |
Example:

    - name: convert_sites
      action: convert.geojson
      input: sites
      output: "layers/sites.parquet"
      options:
        crs: EPSG:4326
        compression: zstd
        validate_geometry: true
        bbox: ${pack.region.bbox}
      layer:
        id: sites
        title: "Development Sites"
        type: vector
        description: "Candidate development site polygons."

convert.gpkg
Convert a GeoPackage layer to GeoParquet. Supports multi-layer GeoPackage files with explicit layer selection. Applies CRS normalization, bbox clipping, metric computation, and H3 indexing.
Options:
| Option | Type | Default | Description |
|---|---|---|---|
| crs | string | EPSG:4326 | Target CRS for output |
| compression | string | snappy | Parquet compression: snappy, zstd, gzip |
| bbox | array | none | Bounding box for spatial clipping: [minx, miny, maxx, maxy] |
| layer | string | none | Layer name within the GeoPackage (required for multi-layer files) |
| row_group_size | integer | 65536 | Parquet row group size |
| enrichment | string | none | Explicit path to enrichment YAML |
Example:

    - name: convert_asgs_sa2
      action: convert.gpkg
      input: asgs_sa2
      output: "layers/asgs_sa2.parquet"
      options:
        layer: SA2_2021_AUST_GDA2020
        crs: EPSG:4326
        compression: zstd
        bbox: ${pack.region.bbox}
      layer:
        id: asgs_sa2
        title: "Statistical Areas Level 2"
        type: vector

tiles.pmtiles
Generate a PMTiles vector tile archive from a GeoParquet file. Uses tippecanoe for tile generation.
Options:
| Option | Type | Default | Description |
|---|---|---|---|
| min_zoom | integer | 0 | Minimum zoom level |
| max_zoom | integer | 14 | Maximum zoom level |
Example:

    - name: tiles_cadastre
      action: tiles.pmtiles
      input: convert_cadastre
      depends_on: [convert_cadastre]
      output: "layers/cadastre.pmtiles"
      options:
        min_zoom: 4
        max_zoom: 14

extract.zip
Extract files from a ZIP archive into the output directory.
Options:
| Option | Type | Default | Description |
|---|---|---|---|
| pattern | string | none | Glob pattern to filter extracted files (e.g., *.shp). Extracts all files if not specified. |
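A sketch of the pattern option in use, assuming a source key named zipped_data; the stage name is hypothetical. In YAML the value should be quoted, because a plain scalar beginning with * is parsed as an alias.

```yaml
- name: extract_shapefiles       # hypothetical stage name
  action: extract.zip
  input: zipped_data             # assumed source key for the ZIP archive
  output: "staging/extracted/"
  options:
    pattern: "*.shp"             # quoted: an unquoted leading * starts a YAML alias
```

A Shapefile also depends on sidecar files such as .shx and .dbf, so a pattern that matches only *.shp may be too narrow if a later stage needs to read the extracted data.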
Example:

    - name: extract_source
      action: extract.zip
      input: zipped_data
      output: "staging/extracted/"

hash.file
Compute a BLAKE3 cryptographic hash for one or more input files. The hash is recorded in the stage result and can be written into the manifest’s integrity section.
Options:
No action-specific options. The hash is computed from the file content.
Example:

    - name: hash_cadastre
      action: hash.file
      input: convert_cadastre
      depends_on: [convert_cadastre]
      output: "layers/cadastre.parquet.hash"

copy
Copy a file from the input path to the output path. Preserves file metadata (timestamps, permissions).
Options:
No action-specific options.
Example:

    - name: copy_readme
      action: copy
      input: readme_source
      output: "README.md"

metrics.compute
Compute area and perimeter metrics on an existing GeoParquet file. This action is optional: the convert.shp and convert.gpkg actions compute metrics automatically when an analysis_crs is available. Use it to add metrics to a pre-existing GeoParquet file that was not produced by a convert stage.
Options:
No action-specific options. The analysis CRS is resolved automatically from pack.region.analysis_crs or auto-detected from the bounding box.
Example:

    - name: add_metrics
      action: metrics.compute
      input: existing_layer
      depends_on: [existing_layer]
      output: "layers/layer_with_metrics.parquet"

Complete example
The WA Solar Feasibility Pipeline is a real-world example with 7 sources, 15 stages, variable interpolation, and stage dependencies. Here is an annotated excerpt showing the key patterns:

    pipeline:
      name: wa-solar-feasibility-pack
      version: "1.0"

    pack:
      id: "spatial.properties:wa:solar-feasibility:v1"
      version: "2025.01.31"
      title: "WA Solar Feasibility Pack"
      theme: solar-feasibility
      geography: wa
      license: CC-BY-4.0
      region:
        # Perth Metro bounding box
        bbox: [115.65, -32.15, 116.15, -31.65]
        crs: "EPSG:4326"

    # H3 resolution 7 = ~5.16 km² hexagonal cells
    index:
      h3_resolution: 7

    # Multiple source declarations with license metadata
    sources:
      dem_z50:
        path: "./spatial-data/waz50/waz50.tif"
        license: CC-BY-4.0
        attribution: "Geoscience Australia SRTM-derived 1 Second DEM"
        format: tif

      bushfire:
        path: "./spatial-data/bushfire.shp"
        license: CC-BY-4.0
        attribution: "Department of Fire and Emergency Services WA"
        format: shp

    stages:
      # Raster chain: VRT -> COG -> derivatives
      - name: mosaic_dem
        action: raster.vrt
        inputs: [dem_z50, dem_z51]
        output: "staging/wa_dem_mosaic.vrt"
        options:
          resampling: bilinear

      - name: clip_dem
        action: raster.cog
        input: mosaic_dem
        depends_on: [mosaic_dem]
        output: "rasters/terrain_dem.tif"
        options:
          bbox: ${pack.region.bbox}  # Interpolated from pack section
          compression: zstd
        layer:
          id: terrain_dem
          title: "Terrain DEM"
          type: raster

      - name: compute_slope
        action: raster.slope
        input: clip_dem
        depends_on: [clip_dem]  # Explicit dependency
        output: "rasters/terrain_slope.tif"
        options:
          units: percent
          compression: zstd
        layer:
          id: terrain_slope
          title: "Terrain Slope"
          type: raster

      # Vector conversion with bbox clipping
      - name: convert_bushfire
        action: convert.shp
        input: bushfire
        output: "layers/bushfire_prone.parquet"
        options:
          crs: EPSG:4326
          bbox: ${pack.region.bbox}  # Same bbox, reused via interpolation
        layer:
          id: bushfire_prone
          title: "Bushfire Prone Areas"
          type: vector

      # PMTiles from converted layer
      - name: tiles_bushfire
        action: tiles.pmtiles
        input: convert_bushfire
        depends_on: [convert_bushfire]
        output: "layers/bushfire_prone.pmtiles"
        options:
          min_zoom: 4
          max_zoom: 10

Patterns demonstrated:
- Multi-source pipelines: Multiple data sources with different formats (raster and vector) in a single pipeline.
- Stage chaining: depends_on ensures stages execute in the correct order. The clip_dem stage waits for mosaic_dem to complete.
- Variable interpolation: ${pack.region.bbox} is used in multiple stages, ensuring consistent clipping across all layers.
- Intermediate stages: The VRT mosaic stage has no layer field, so it produces a staging file but no manifest entry.
- Layer metadata: Stages with a layer field contribute layers to the final spatialpack.json manifest.
Building a pipeline

    # Build the pack
    spatialpack pack build pipeline.yaml -o ./output/

    # Dry run (validate without executing)
    spatialpack pack build pipeline.yaml --dry-run

    # Continue past failures during development
    spatialpack pack build pipeline.yaml -o ./output/ --continue-on-error

See the CLI pack reference for all build options.