Data Formats

Traditional GIS relies on dozens of file formats — Shapefiles, GeoJSON, GeoTIFF, MBTiles, KML, and more. Each has trade-offs in compression, query speed, portability, and tooling support. Spatial.Properties standardizes on three modern formats that balance performance, portability, and query efficiency.

GeoParquet

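GeoParquet is Apache Parquet plus standardized geometry columns and geospatial metadata; geometries are encoded as WKB or, from GeoParquet 1.1, in GeoArrow-based native encodings. It stores tabular data in a columnar layout, which means queries that touch a few columns skip reading the rest. Compression (Zstandard by default) shrinks files 5 to 10x compared to Shapefiles, and columnar scans can run 10 to 100x faster, depending on how many columns and row groups a query touches.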
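To make the column-skipping idea concrete, here is a toy sketch in plain Python. It is not how Parquet is implemented (real files add row groups, encodings, and compression); the sample records and the helper names `to_columns` and `select_large_lots` are made up for illustration.

```python
from typing import Any

# Row-oriented layout: one record per feature. The geometry values are placeholders.
rows: list[dict[str, Any]] = [
    {"lot_number": "101", "area_m2": 6200.0, "zoning": "R1", "geometry": b"<wkb-1>"},
    {"lot_number": "102", "area_m2": 480.0, "zoning": "R2", "geometry": b"<wkb-2>"},
    {"lot_number": "103", "area_m2": 9100.0, "zoning": "R1", "geometry": b"<wkb-3>"},
]


def to_columns(records: list[dict[str, Any]]) -> dict[str, list[Any]]:
    """Pivot records into a column-oriented layout: one list per column."""
    return {name: [record[name] for record in records] for name in records[0]}


def select_large_lots(columns: dict[str, list[Any]], min_area: float):
    """Answer the query by reading only the two columns it needs."""
    touched = ("lot_number", "area_m2")
    result = [
        (lot, area)
        for lot, area in zip(columns["lot_number"], columns["area_m2"])
        if area > min_area
    ]
    values_read = sum(len(columns[name]) for name in touched)
    return result, values_read


columns = to_columns(rows)
large_lots, values_read = select_large_lots(columns, 5000)
total_values = sum(len(column) for column in columns.values())

print(large_lots)
print(f"{values_read} of {total_values} values read")
```

The query never touches the `zoning` or `geometry` columns, and geometry is usually the largest column in a vector layer, so skipping it is where most of the savings come from.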

Every vector layer in a Spatial Pack is stored as GeoParquet. The file is readable by DuckDB, pandas, GeoPandas, QGIS, and other tools that read Apache Parquet.

# Convert a Shapefile to GeoParquet
spatialpack convert shp ./cadastre.shp -o ./layers/cadastre.parquet

import geopandas as gpd

gdf = gpd.read_parquet("layers/cadastre.parquet")
print(gdf.columns)   # attribute columns + geometry
print(len(gdf))       # feature count

-- Query a GeoParquet file directly
SELECT lot_number, area_m2
FROM read_parquet('layers/cadastre.parquet')
WHERE area_m2 > 5000
LIMIT 10;

Why not Shapefiles or GeoJSON?

A Shapefile is really a bundle of files: three mandatory ones (.shp, .shx, .dbf), usually a .prj, and often more. Its attribute table caps field names at 10 characters. GeoJSON is human-readable but uncompressed, making it impractical for datasets above a few thousand features. GeoParquet avoids both problems: it is a single file, supports long column names, and compresses efficiently.
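The 10-character limit is easy to underestimate. The sketch below shows how descriptive column names can collide once truncated. It assumes naive truncation; actual Shapefile writers resolve clashes in their own ways, typically by renaming. The column names and the helper `truncated_collisions` are hypothetical.

```python
from collections import defaultdict

DBF_FIELD_NAME_LIMIT = 10  # .dbf attribute tables hold at most 10 characters per field name


def truncated_collisions(column_names: list[str]) -> dict[str, list[str]]:
    """Group column names that become identical after naive truncation."""
    groups: dict[str, list[str]] = defaultdict(list)
    for name in column_names:
        groups[name[:DBF_FIELD_NAME_LIMIT]].append(name)
    # Keep only the truncated names shared by more than one original column.
    return {short: names for short, names in groups.items() if len(names) > 1}


column_names = ["lot_number", "valuation_land", "valuation_improvements", "area_m2"]
collisions = truncated_collisions(column_names)
print(collisions)
```

Both `valuation_` columns collapse to the same 10-character prefix, so a round trip through a Shapefile loses the distinction unless the writer renames them.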

PMTiles

PMTiles is a single-file archive of vector tiles. A traditional tile server slices data into millions of small files organized by zoom level, row, and column. PMTiles packs all those tiles into one file and serves them with HTTP range requests — no tile server required.
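A quick calculation shows why a directory-per-tile layout gets unwieldy. In the standard web-map tiling scheme, zoom level z is a 2^z by 2^z grid, so the tile count grows fourfold per level. The zoom range used here (4 to 14) is only an example, and a real dataset fills just the tiles that intersect its extent.

```python
def tiles_at_zoom(zoom: int) -> int:
    """Zoom level z of the web-map tile pyramid is a 2^z by 2^z grid."""
    if zoom < 0:
        raise ValueError("zoom must be non-negative")
    return 4 ** zoom


def tiles_in_pyramid(min_zoom: int, max_zoom: int) -> int:
    """Upper bound on tiles for a layer with global coverage between two zooms."""
    return sum(tiles_at_zoom(z) for z in range(min_zoom, max_zoom + 1))


print(tiles_at_zoom(14))        # 268435456
print(tiles_in_pyramid(4, 14))  # 357913856
```

Even a regional layer that touches a small fraction of these slots can produce a very large number of tiny files, which is the overhead a single-archive format avoids.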

Every Spatial Pack that includes a map visualization layer ships a .pmtiles file. Host it on any static file host that supports HTTP range requests (S3, CloudFront, GitHub Pages) and load it in MapLibre GL or Leaflet through the PMTiles client library.

# Pipeline stage that produces PMTiles
- name: tiles_cadastre
  action: tiles.pmtiles
  input: cadastre
  output: "layers/cadastre.pmtiles"
  options:
    min_zoom: 4
    max_zoom: 14

Why not MBTiles?

MBTiles uses SQLite internally. It works well on desktop but requires a tile server for web delivery. PMTiles removes that dependency entirely — a browser can fetch individual tiles via HTTP range requests against a static file.
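The mechanism can be sketched without any network code. The example below fakes an archive in memory and a static server that honors a single byte range. It is a simplification under stated assumptions: the `directory` keyed by (z, x, y) is hypothetical, whereas a real PMTiles archive has a binary header and directories keyed by tile ID.

```python
def range_header(offset: int, length: int) -> dict[str, str]:
    """Build the HTTP Range header for `length` bytes starting at `offset`."""
    if offset < 0 or length <= 0:
        raise ValueError("offset must be >= 0 and length must be > 0")
    # HTTP byte ranges are inclusive on both ends.
    return {"Range": f"bytes={offset}-{offset + length - 1}"}


def serve_range(archive: bytes, header: dict[str, str]) -> bytes:
    """Stand-in for a static file server answering a single-range request."""
    spec = header["Range"].removeprefix("bytes=")
    first, last = (int(part) for part in spec.split("-"))
    return archive[first:last + 1]


# Fake archive: three "tiles" stored back to back in one blob.
tile_blobs = [b"tile-A", b"tile-BB", b"tile-CCC"]
tile_keys = [(0, 0, 0), (1, 0, 0), (1, 1, 0)]  # (z, x, y), illustrative only
archive = b"".join(tile_blobs)

# Hypothetical directory: tile key -> (offset, length) inside the archive.
directory: dict[tuple[int, int, int], tuple[int, int]] = {}
offset = 0
for key, blob in zip(tile_keys, tile_blobs):
    directory[key] = (offset, len(blob))
    offset += len(blob)

header = range_header(*directory[(1, 0, 0)])
tile = serve_range(archive, header)
print(header, tile)
```

The client needs only the directory and a host that honors `Range`; nothing on the server side has to understand tiles.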

H3 Indexing

H3 is Uber’s hexagonal hierarchical spatial index. It divides the globe into hexagonal cells at 16 resolution levels. Spatial.Properties uses H3 to tag every feature in a GeoParquet file with the hexagonal cells it occupies.

Two columns are added during indexing:

h3_cell (string) — The H3 cell containing the feature’s centroid. Used for row-group clustering so that nearby features are stored together on disk.
h3_cells (list of strings) — All H3 cells covering the feature’s geometry. Used for spatial intersection queries without computing geometry overlaps.
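The difference between the two columns shows up with a feature large enough to span several cells. The following sketch uses made-up rows and placeholder cell names (`cell_a`, `cell_b`, `cell_c`) rather than real H3 identifiers; the helper names are hypothetical.

```python
features = [
    {"lot_number": "101", "h3_cell": "cell_a", "h3_cells": ["cell_a"]},
    # A large lot: its centroid falls in cell_b, but its geometry spans three cells.
    {"lot_number": "102", "h3_cell": "cell_b", "h3_cells": ["cell_a", "cell_b", "cell_c"]},
    {"lot_number": "103", "h3_cell": "cell_c", "h3_cells": ["cell_c"]},
]


def centroid_in(rows: list[dict], cell: str) -> list[str]:
    """Lots whose centroid falls in the cell (equality on h3_cell)."""
    return [row["lot_number"] for row in rows if row["h3_cell"] == cell]


def covering(rows: list[dict], cell: str) -> list[str]:
    """Lots whose geometry covers any part of the cell (membership in h3_cells)."""
    return [row["lot_number"] for row in rows if cell in row["h3_cells"]]


centroid_hits = centroid_in(features, "cell_a")
coverage_hits = covering(features, "cell_a")
print(centroid_hits)  # ['101']
print(coverage_hits)  # ['101', '102']
```

Lot 102 is missed by the centroid lookup and found by the coverage lookup, which is why intersection-style queries should go through `h3_cells`.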

The default resolution is 7, which produces cells with an average area of approximately 5.16 km². The manifest records the resolution in each layer’s index.h3_res field.
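The 5.16 figure for resolution 7 is an average cell area in km², and it can be reproduced from first principles. H3 has 122 base cells and splits each cell into roughly seven children per level, giving 2 + 120 × 7^r cells at resolution r. Dividing Earth's surface area by that count gives the mean area. The surface-area constant is an approximation and the function names are illustrative.

```python
EARTH_SURFACE_KM2 = 510_065_622  # approximate surface area of Earth in km^2


def h3_cell_count(resolution: int) -> int:
    """Total number of H3 cells at a resolution (0 to 15)."""
    if not 0 <= resolution <= 15:
        raise ValueError("H3 resolutions run from 0 to 15")
    return 2 + 120 * 7 ** resolution


def h3_mean_cell_area_km2(resolution: int) -> float:
    """Average cell area; individual cells deviate somewhat from the mean."""
    return EARTH_SURFACE_KM2 / h3_cell_count(resolution)


for res in (6, 7, 8):
    print(res, h3_cell_count(res), round(h3_mean_cell_area_km2(res), 3))
```

Each step up in resolution shrinks the average cell by about a factor of seven, which helps when judging whether 7 is too coarse or too fine for a given layer.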


# Add H3 index to a GeoParquet file
spatialpack h3 index ./layers/cadastre.parquet --resolution 7

-- Find features in a specific H3 cell
SELECT * FROM read_parquet('layers/cadastre.parquet')
WHERE h3_cell = '87283472bffffff';

-- Find features overlapping a cell (coverage query)
SELECT * FROM read_parquet('layers/cadastre.parquet')
WHERE list_contains(h3_cells, '87283472bffffff');

Why not R-tree or S2?

R-tree indexes are usually built in memory or stored in a tool-specific form, so they do not travel with the file from one tool to another. S2 (Google’s spherical index) is supported but less widely adopted in the Python/DuckDB ecosystem. H3’s hexagonal grid produces cells of roughly uniform area at a given resolution, and sorting rows by cell maps cleanly onto Parquet row groups, enabling spatial pruning at read time without loading the full file.
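Row-group pruning can be sketched in a few lines. Parquet keeps min/max statistics per row group, so when rows are sorted by cell a reader can skip every group whose range cannot contain the target. The model below is a toy under stated assumptions: `RowGroup`, `build_row_groups`, and `lookup` are illustrative names, and the cell values are placeholders rather than real H3 identifiers.

```python
from dataclasses import dataclass


@dataclass
class RowGroup:
    min_cell: str  # smallest h3_cell value in the group
    max_cell: str  # largest h3_cell value in the group
    rows: list


def build_row_groups(rows: list[dict], group_size: int) -> list[RowGroup]:
    """Sort rows by cell, then cut them into fixed-size groups with min/max stats."""
    ordered = sorted(rows, key=lambda row: row["h3_cell"])
    groups = []
    for start in range(0, len(ordered), group_size):
        chunk = ordered[start:start + group_size]
        groups.append(RowGroup(chunk[0]["h3_cell"], chunk[-1]["h3_cell"], chunk))
    return groups


def lookup(groups: list[RowGroup], cell: str):
    """Scan only the groups whose [min, max] range could contain the cell."""
    matches, groups_scanned = [], 0
    for group in groups:
        if group.min_cell <= cell <= group.max_cell:
            groups_scanned += 1
            matches.extend(row for row in group.rows if row["h3_cell"] == cell)
    return matches, groups_scanned


cells = ["cell_d", "cell_a", "cell_c", "cell_b", "cell_a", "cell_d"]
rows = [{"lot_number": str(i), "h3_cell": cell} for i, cell in enumerate(cells)]
groups = build_row_groups(rows, group_size=2)
matches, groups_scanned = lookup(groups, "cell_c")
print(matches, f"scanned {groups_scanned} of {len(groups)} row groups")
```

Without the sort, every group's range would tend to span most cells and nothing could be skipped, which is why clustering by `h3_cell` matters as much as the index column itself.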

How formats relate in a Spatial Pack

flowchart LR
    A[Source Data] --> B[Pipeline]
    B --> C[GeoParquet]
    B --> D[PMTiles]
    C --> E[H3 Indexing]
    E --> F[Indexed GeoParquet]
    F --> G[Spatial Pack]
    D --> G

The Pipeline converts source data into GeoParquet and PMTiles. An optional H3 indexing stage adds spatial index columns to the GeoParquet file. Both outputs are bundled into the final Spatial Pack alongside the spatialpack.json manifest.

Format comparison

Property	GeoParquet	PMTiles	H3 Index
Purpose	Tabular + geometry storage	Map tile delivery	Spatial lookup
Compression	Zstandard (columnar)	Gzip (per tile)	N/A (column in Parquet)
Read by	DuckDB, pandas, QGIS	MapLibre GL, Leaflet	DuckDB, pandas
Server required	No	No (HTTP range requests)	No
File extension	`.parquet`	`.pmtiles`	`.parquet` (same file)

Next steps

Build your first pack — Follow the Getting Started guide to create a Spatial Pack with all three formats.
Explore the schema — See the Spatial Pack manifest schema for the full list of layer fields including parquet, pmtiles, and index.h3_res.