Data Formats

Traditional GIS relies on dozens of file formats — Shapefiles, GeoJSON, GeoTIFF, MBTiles, KML, and more. Each has trade-offs in compression, query speed, portability, and tooling support. Spatial.Properties standardizes on three modern formats that balance performance, portability, and query efficiency.

GeoParquet is Apache Parquet with a GeoArrow geometry column. It stores tabular data in a columnar layout, so a query that touches only a few columns skips reading the rest. Compression (Zstandard by default) typically shrinks files 5 to 10x compared to Shapefiles, and columnar scans can run 10 to 100x faster.
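The column-skipping behaviour can be illustrated with a toy model using only the Python standard library. This is a minimal sketch, not the Parquet format itself: each column is stored as its own compressed block, `zlib` stands in for Zstandard, and the table contents and the helper names `write_columnar` and `read_column` are invented for illustration.

```python
import json
import zlib

# Toy table; the column names and values are invented for illustration.
rows = [
    {"parcel_id": i, "owner": f"owner-{i % 50}", "area_m2": 400 + i % 300}
    for i in range(10_000)
]


def write_columnar(rows):
    """Store each column as its own compressed block, keyed by column name."""
    columns = {name: [row[name] for row in rows] for name in rows[0]}
    return {
        name: zlib.compress(json.dumps(values).encode("utf-8"))
        for name, values in columns.items()
    }


def read_column(blocks, name):
    """Decompress and decode only the requested column block."""
    return json.loads(zlib.decompress(blocks[name]))


blocks = write_columnar(rows)
areas = read_column(blocks, "area_m2")  # the other blocks are never touched

bytes_read = len(blocks["area_m2"])
bytes_total = sum(len(block) for block in blocks.values())
print(f"read {bytes_read} of {bytes_total} compressed bytes")
print(f"mean area: {sum(areas) / len(areas):.1f} m2")
```

A row-oriented file would have to decode every record to answer the same question, which is the gap a columnar layout closes.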

Every vector layer in a Spatial Pack is stored as GeoParquet. The file is readable by DuckDB, pandas, GeoPandas, QGIS, and any tool that supports Apache Arrow.

# Convert a Shapefile to GeoParquet
spatialpack convert shp ./cadastre.shp -o ./layers/cadastre.parquet

Shapefiles split data across at least three mandatory files (.shp, .shx, .dbf), usually accompanied by a .prj sidecar for the projection, and cap field names at 10 characters. GeoJSON is human-readable but uncompressed, making it impractical for datasets above a few thousand features. GeoParquet avoids both problems: it is a single file, supports long column names, and compresses efficiently.

PMTiles is a single-file archive of vector tiles. A traditional tile server slices data into millions of small files organized by zoom level, row, and column. PMTiles packs all those tiles into one file and serves them with HTTP range requests — no tile server required.
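To make the range-request idea concrete, here is a minimal sketch using only the Python standard library. It does not parse a real PMTiles archive: the archive is an in-memory stand-in, the offset and length play the role of a directory entry that a real reader would first look up in the archive's directory, and the URL and the helper names `range_header` and `serve_range` are hypothetical.

```python
import urllib.request


def range_header(offset: int, length: int) -> str:
    """Build an HTTP Range header value; the end byte is inclusive."""
    if offset < 0 or length <= 0:
        raise ValueError("offset must be >= 0 and length must be > 0")
    return f"bytes={offset}-{offset + length - 1}"


def serve_range(archive: bytes, header_value: str) -> bytes:
    """Emulate what a static file server returns for a single byte range."""
    start, end = header_value.split("=", 1)[1].split("-")
    return archive[int(start): int(end) + 1]


# Stand-in archive: three fake "tiles" packed back to back in one blob.
tiles = [b"tile-z4", b"tile-z5-larger", b"tile-z6"]
archive = b"".join(tiles)

# Hypothetical directory entry for the second tile: (offset, length).
offset, length = len(tiles[0]), len(tiles[1])

header_value = range_header(offset, length)
tile_bytes = serve_range(archive, header_value)

# The same header attached to a real request (hypothetical URL, not sent here).
request = urllib.request.Request(
    "https://example.com/layers/cadastre.pmtiles",
    headers={"Range": header_value},
)
print(request.get_header("Range"), len(tile_bytes))
```

Because the server only has to honour byte ranges on one file, any static host that supports the `Range` header can stand in for a tile server.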

Every Spatial Pack that includes a map visualization layer ships a .pmtiles file. Host it on any static file server (S3, CloudFront, GitHub Pages) and load it directly in MapLibre GL or Leaflet.

pipeline.yaml
# Pipeline stage that produces PMTiles
- name: tiles_cadastre
  action: tiles.pmtiles
  input: cadastre
  output: "layers/cadastre.pmtiles"
  options:
    min_zoom: 4
    max_zoom: 14

MBTiles uses SQLite internally. It works well on desktop but requires a tile server for web delivery. PMTiles removes that dependency entirely — a browser can fetch individual tiles via HTTP range requests against a static file.

H3 is Uber’s hexagonal hierarchical spatial index. It divides the globe into hexagonal cells at 16 resolution levels. Spatial.Properties uses H3 to tag every feature in a GeoParquet file with the hexagonal cells it occupies.

Two columns are added during indexing:

  • h3_cell (string) — The H3 cell containing the feature’s centroid. Used for row-group clustering so that nearby features are stored together on disk.
  • h3_cells (list of strings) — All H3 cells covering the feature’s geometry. Used for spatial intersection queries without computing geometry overlaps.
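The way `h3_cells` supports intersection queries can be sketched as a plain set overlap. The example below is an illustration under stated assumptions: the feature records and the cell IDs are placeholders (real H3 indexes are 64-bit values usually written as hex strings), and `candidates` is a hypothetical helper, not part of the `spatialpack` tooling.

```python
# Placeholder records; only the column names h3_cell and h3_cells
# come from the text above.
features = [
    {"id": "lot-1", "h3_cell": "c01", "h3_cells": ["c01"]},
    {"id": "lot-2", "h3_cell": "c02", "h3_cells": ["c02", "c03"]},
    {"id": "road-9", "h3_cell": "c03", "h3_cells": ["c01", "c02", "c03", "c04"]},
    {"id": "lot-7", "h3_cell": "c09", "h3_cells": ["c09"]},
]


def candidates(features, query_cells):
    """Return IDs of features that share at least one cell with the query."""
    query = set(query_cells)
    return [f["id"] for f in features if query.intersection(f["h3_cells"])]


# Cells covering a hypothetical area of interest.
hits = candidates(features, ["c03", "c04"])
print(hits)

# Centroid lookup uses the single-valued column instead.
in_cell = [f["id"] for f in features if f["h3_cell"] == "c03"]
print(in_cell)
```

Sharing a cell means two geometries are close, not that they necessarily overlap, so a query that needs exact results would still run a geometry test on the much smaller candidate set.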

The default resolution is 7, which produces cells with an average area of approximately 5.16 km². The manifest records the resolution in each layer’s index.h3_res field.
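The cell size at a given resolution can be checked with a little arithmetic. H3 has 122 base cells and subdivides by a factor of 7 per level, giving 2 + 120 × 7^r cells at resolution r. The sketch below divides the surface of a sphere by that count; the radius constant is an assumption chosen to approximate the Earth, and the function names are illustrative.

```python
import math

EARTH_RADIUS_KM = 6371.0072  # assumed mean (authalic) sphere radius


def h3_cell_count(resolution: int) -> int:
    """Number of H3 cells covering the globe at a resolution from 0 to 15."""
    if not 0 <= resolution <= 15:
        raise ValueError("H3 resolutions range from 0 to 15")
    return 2 + 120 * 7 ** resolution


def average_cell_area_km2(resolution: int) -> float:
    """Average cell area: sphere surface divided by the cell count."""
    sphere_area = 4 * math.pi * EARTH_RADIUS_KM ** 2
    return sphere_area / h3_cell_count(resolution)


for res in (6, 7, 8):
    print(res, h3_cell_count(res), round(average_cell_area_km2(res), 3))
```

Each step up in resolution shrinks the average cell area by roughly a factor of 7, which is a useful rule of thumb when choosing a value for `--resolution`.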

# Add H3 index to a GeoParquet file
spatialpack h3 index ./layers/cadastre.parquet --resolution 7

R-tree indexes are typically built in memory by each tool rather than stored with the data, so they do not carry over from one tool to another. S2 (Google’s spherical index) is supported but less widely adopted in the Python/DuckDB ecosystem. H3’s hexagonal grid produces cells of roughly uniform area and maps cleanly to Parquet row groups, enabling spatial pruning at read time without loading the full file.
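Row-group pruning can be sketched as a range check against per-group minimum and maximum values, which Parquet records as column statistics. The example below is a simplified model, assuming the file is sorted by `h3_cell`; the statistics, the placeholder cell IDs, and the `groups_to_read` helper are all hypothetical.

```python
# Hypothetical row-group statistics for a file clustered by h3_cell.
row_groups = [
    {"index": 0, "min_cell": "c01", "max_cell": "c14", "rows": 50_000},
    {"index": 1, "min_cell": "c15", "max_cell": "c31", "rows": 50_000},
    {"index": 2, "min_cell": "c32", "max_cell": "c47", "rows": 50_000},
    {"index": 3, "min_cell": "c48", "max_cell": "c60", "rows": 42_000},
]


def groups_to_read(row_groups, query_cells):
    """Keep only row groups whose [min, max] range could contain a query cell."""
    return [
        group["index"]
        for group in row_groups
        if any(group["min_cell"] <= cell <= group["max_cell"] for cell in query_cells)
    ]


selected = groups_to_read(row_groups, ["c30", "c33"])
skipped_rows = sum(g["rows"] for g in row_groups if g["index"] not in selected)
print(f"read row groups {selected}, skipped {skipped_rows} rows")
```

The tighter the clustering, the narrower each group's range, and the more row groups a reader can skip without opening them.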

flowchart LR
    A[Source Data] --> B[Pipeline]
    B --> C[GeoParquet]
    B --> D[PMTiles]
    C --> E[H3 Indexing]
    E --> F[Indexed GeoParquet]
    F --> G[Spatial Pack]
    D --> G

The Pipeline converts source data into GeoParquet and PMTiles. An optional H3 indexing stage adds spatial index columns to the GeoParquet file. Both outputs are bundled into the final Spatial Pack alongside the spatialpack.json manifest.
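As a rough illustration of how the bundled outputs relate to the manifest, the sketch below reads a made-up manifest and summarizes each layer. Only the field names `parquet`, `pmtiles`, and `index.h3_res` come from this page; the surrounding layout (`layers`, `name`) and the `summarize_layers` helper are assumptions, so consult the manifest schema for the real structure.

```python
import json

# Hypothetical manifest content; the real spatialpack.json may differ.
manifest_text = """
{
  "layers": [
    {
      "name": "cadastre",
      "parquet": "layers/cadastre.parquet",
      "pmtiles": "layers/cadastre.pmtiles",
      "index": {"h3_res": 7}
    }
  ]
}
"""


def summarize_layers(manifest):
    """List each layer with its tile availability and H3 resolution, if any."""
    summary = []
    for layer in manifest["layers"]:
        summary.append(
            {
                "name": layer["name"],
                "has_tiles": "pmtiles" in layer,
                "h3_res": layer.get("index", {}).get("h3_res"),
            }
        )
    return summary


summary = summarize_layers(json.loads(manifest_text))
print(summary)
```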

| Property | GeoParquet | PMTiles | H3 Index |
| --- | --- | --- | --- |
| Purpose | Tabular + geometry storage | Map tile delivery | Spatial lookup |
| Compression | Zstandard (columnar) | Gzip (per tile) | N/A (column in Parquet) |
| Query engine | DuckDB, pandas, QGIS | MapLibre GL, Leaflet | DuckDB, pandas |
| Server required | No | No (HTTP range requests) | No |
| File extension | .parquet | .pmtiles | .parquet (same file) |

  • Build your first pack — Follow the Getting Started guide to create a Spatial Pack with all three formats.
  • Explore the schema — See the Spatial Pack manifest schema for the full list of layer fields including parquet, pmtiles, and index.h3_res.