Data Formats
Traditional GIS relies on dozens of file formats — Shapefiles, GeoJSON, GeoTIFF, MBTiles, KML, and more. Each has trade-offs in compression, query speed, portability, and tooling support. Spatial.Properties standardizes on three modern formats that balance performance, portability, and query efficiency.
GeoParquet
GeoParquet is Apache Parquet with a standardized geometry column, encoded as WKB or (from GeoParquet 1.1) GeoArrow, plus metadata describing it. It stores tabular data in a columnar layout, so a query that touches only a few columns skips reading the rest. Compression (Zstandard by default) shrinks files 5 to 10x relative to Shapefiles, and scans that read only a few columns can run 10 to 100x faster.
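To make the column-skipping idea concrete, here is a toy sketch in plain Python. It does not use the Parquet format itself; the sample values and the `large_lots` helper are made up purely to show the access pattern, in which a filter on area never touches the geometry column.

```python
# Toy illustration only: real Parquet adds row groups, encodings, and compression.

# Row-oriented layout: each record keeps all of its fields together.
rows = [
    {"lot_number": "A1", "area_m2": 6200.0, "geometry": b"<geometry 1>"},
    {"lot_number": "B7", "area_m2": 480.0, "geometry": b"<geometry 2>"},
    {"lot_number": "C3", "area_m2": 9100.0, "geometry": b"<geometry 3>"},
]

# Column-oriented layout: each field is stored as its own contiguous array.
columns = {name: [row[name] for row in rows] for name in rows[0]}


def large_lots(cols, min_area):
    """Return lot numbers above min_area, reading only two of the three columns."""
    return [
        lot
        for lot, area in zip(cols["lot_number"], cols["area_m2"])
        if area > min_area
    ]


result = large_lots(columns, 5000)
print(result)  # ['A1', 'C3']
```

Because geometry is usually the largest column by far, leaving it unread is where most of the saving comes from.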
Every vector layer in a Spatial Pack is stored as GeoParquet. The file is readable by DuckDB, pandas, GeoPandas, QGIS, and any tool that supports Apache Arrow.
```bash
# Convert a Shapefile to GeoParquet
spatialpack convert shp ./cadastre.shp -o ./layers/cadastre.parquet
```

```python
import geopandas as gpd

gdf = gpd.read_parquet("layers/cadastre.parquet")
print(gdf.columns)  # attribute columns + geometry
print(len(gdf))     # feature count
```

```sql
-- Query a GeoParquet file directly
SELECT lot_number, area_m2
FROM read_parquet('layers/cadastre.parquet')
WHERE area_m2 > 5000
LIMIT 10;
```

Why not Shapefiles or GeoJSON?
Shapefiles split each dataset across several files (.shp, .shx, .dbf, and usually .prj) and cap field names at 10 characters. GeoJSON is human-readable but verbose and uncompressed, which makes it slow to transfer and parse once datasets grow beyond a few thousand features. GeoParquet avoids both problems: it is a single file, supports long column names, and compresses efficiently.
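The 10-character limit matters most when descriptive column names share a prefix. The sketch below is a rough illustration with hypothetical attribute names; it only shows which names would clash after naive truncation, and how a particular tool resolves the clash (renaming, numbering, or failing) varies.

```python
from collections import defaultdict

SHAPEFILE_FIELD_LIMIT = 10  # field names in the .dbf sidecar are capped at 10 characters


def truncation_collisions(names, limit=SHAPEFILE_FIELD_LIMIT):
    """Group column names that become identical once cut to `limit` characters."""
    groups = defaultdict(list)
    for name in names:
        groups[name[:limit]].append(name)
    return {short: full for short, full in groups.items() if len(full) > 1}


# Hypothetical attribute names for a cadastre layer.
columns = ["lot_number", "owner_name_first", "owner_name_last", "area_m2"]
collisions = truncation_collisions(columns)
print(collisions)
# {'owner_name': ['owner_name_first', 'owner_name_last']}
```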
PMTiles
PMTiles is a single-file archive of vector tiles. A traditional tile set is stored as millions of small files organized by zoom level, column, and row. PMTiles packs all those tiles into one file and serves them through HTTP range requests, so no tile server is required.
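The "millions of small files" figure follows from the tiling scheme: in the common XYZ scheme, zoom level z covers the world with 2^z by 2^z tiles. The short calculation below assumes full-world coverage over an illustrative zoom range of 4 to 14; a regional dataset would produce far fewer tiles.

```python
def tiles_at_zoom(z):
    """Number of tiles covering the whole world at zoom z in an XYZ tiling scheme."""
    return (2 ** z) * (2 ** z)


def tiles_in_range(min_zoom, max_zoom):
    """Total tiles for full-world coverage across an inclusive zoom range."""
    return sum(tiles_at_zoom(z) for z in range(min_zoom, max_zoom + 1))


total = tiles_in_range(4, 14)
print(f"{total:,}")  # 357,913,856
```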
Every Spatial Pack that includes a map visualization layer ships a .pmtiles file. Host it on any static file host that supports HTTP range requests (S3, CloudFront, GitHub Pages) and load it in MapLibre GL or Leaflet through a PMTiles client library.
```yaml
# Pipeline stage that produces PMTiles
- name: tiles_cadastre
  action: tiles.pmtiles
  input: cadastre
  output: "layers/cadastre.pmtiles"
  options:
    min_zoom: 4
    max_zoom: 14
```

Why not MBTiles?
MBTiles uses SQLite internally. It works well on desktop but requires a tile server for web delivery. PMTiles removes that dependency entirely: a browser can fetch individual tiles via HTTP range requests against a static file.
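The range-request mechanism can be sketched with a simplified in-memory archive. This is not the actual PMTiles binary layout: the directory structure, the helper names, and the tile contents are all assumptions chosen to show how a client turns a tile lookup into a byte range.

```python
def build_archive(tiles):
    """Pack tile blobs into one byte string plus a (z, x, y) -> (offset, length) directory."""
    blob = bytearray()
    directory = {}
    for key, data in tiles.items():
        directory[key] = (len(blob), len(data))
        blob.extend(data)
    return bytes(blob), directory


def range_header(offset, length):
    """HTTP Range header value for `length` bytes starting at `offset` (end is inclusive)."""
    return f"bytes={offset}-{offset + length - 1}"


def serve_range(archive, header_value):
    """Stand-in for a static file server answering a single-range request."""
    start, end = header_value.split("=", 1)[1].split("-")
    return archive[int(start): int(end) + 1]


# Placeholder tile payloads keyed by (zoom, column, row).
tiles = {(4, 3, 5): b"tile-a", (4, 3, 6): b"tile-bb", (5, 7, 11): b"tile-ccc"}
archive, directory = build_archive(tiles)

offset, length = directory[(4, 3, 6)]
header = range_header(offset, length)
tile = serve_range(archive, header)
print(header, tile)  # bytes=6-12 b'tile-bb'
```

In this sketch the directory is held separately for clarity; a real client would also need to obtain the directory from the archive before it can look up tiles.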
H3 Indexing
H3 is Uber’s hexagonal hierarchical spatial index. It divides the globe into hexagonal cells at 16 resolution levels (0 to 15). Spatial.Properties uses H3 to tag every feature in a GeoParquet file with the hexagonal cells it occupies.
Two columns are added during indexing:
- h3_cell (string): The H3 cell containing the feature’s centroid. Used for row-group clustering so that nearby features are stored together on disk.
- h3_cells (list of strings): All H3 cells covering the feature’s geometry. Used for spatial intersection queries without computing geometry overlaps.
The default resolution is 7, which produces cells with an average area of approximately 5.16 km². The manifest records the resolution in each layer’s index.h3_res field.
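The link between resolution and cell size can be checked with a little arithmetic. H3 has 2 + 120 × 7^res cells at a given resolution, so dividing Earth's surface area by that count gives the average cell area. The sketch below assumes a spherical Earth with the radius shown; the function names are illustrative and this is not a call into the H3 library.

```python
import math

EARTH_RADIUS_KM = 6371.0071809  # authalic sphere radius; an assumption of this sketch


def h3_cell_count(res):
    """Number of H3 cells at a resolution: 2 + 120 * 7**res."""
    return 2 + 120 * 7 ** res


def avg_cell_area_km2(res):
    """Average cell area: Earth's surface area divided evenly among all cells."""
    earth_area = 4 * math.pi * EARTH_RADIUS_KM ** 2
    return earth_area / h3_cell_count(res)


for res in (6, 7, 8):
    print(res, round(avg_cell_area_km2(res), 3))
```

For resolution 7 this gives roughly 5.16 km², and each step up in resolution shrinks the average cell by a factor of about seven.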
```bash
# Add H3 index to a GeoParquet file
spatialpack h3 index ./layers/cadastre.parquet --resolution 7
```

```sql
-- Find features in a specific H3 cell
SELECT * FROM read_parquet('layers/cadastre.parquet')
WHERE h3_cell = '87283472fffffff';

-- Find features overlapping a cell (coverage query)
SELECT * FROM read_parquet('layers/cadastre.parquet')
WHERE list_contains(h3_cells, '87283472fffffff');
```

Why not R-tree or S2?
R-tree indexes are typically built in memory or stored in tool-specific formats, so they do not carry over between tools. S2 (Google’s spherical index) is supported but less widely adopted in the Python/DuckDB ecosystem. H3’s hexagonal grid produces near-uniform cell areas and maps cleanly to Parquet row groups, enabling spatial pruning at read time without loading the full file.
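Row-group pruning can be illustrated with a toy model: sort features by h3_cell, split them into fixed-size groups, and keep each group's minimum and maximum cell value, which is roughly what Parquet column statistics provide. The cell IDs below are placeholders rather than valid H3 indexes, and the helper names are hypothetical.

```python
def build_row_groups(features, group_size):
    """Sort features by h3_cell and split them into row groups with min/max stats."""
    ordered = sorted(features, key=lambda f: f["h3_cell"])
    groups = []
    for i in range(0, len(ordered), group_size):
        chunk = ordered[i:i + group_size]
        groups.append({
            "min": chunk[0]["h3_cell"],
            "max": chunk[-1]["h3_cell"],
            "rows": chunk,
        })
    return groups


def query_cell(groups, cell):
    """Return matching rows and the number of row groups actually scanned."""
    scanned = 0
    hits = []
    for group in groups:
        # Skip any group whose min/max range cannot contain the target cell.
        if group["min"] <= cell <= group["max"]:
            scanned += 1
            hits.extend(r for r in group["rows"] if r["h3_cell"] == cell)
    return hits, scanned


# Twelve features spread over six placeholder cells.
features = [{"id": i, "h3_cell": f"cell-{i % 6:02d}"} for i in range(12)]
groups = build_row_groups(features, group_size=4)

hits, scanned = query_cell(groups, "cell-03")
print([r["id"] for r in hits], f"scanned {scanned} of {len(groups)} row groups")
# [3, 9] scanned 1 of 3 row groups
```

Without the sort, matching rows would be scattered across every group and the min/max ranges would overlap, so nothing could be skipped.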
How formats relate in a Spatial Pack
```mermaid
flowchart LR
    A[Source Data] --> B[Pipeline]
    B --> C[GeoParquet]
    B --> D[PMTiles]
    C --> E[H3 Indexing]
    E --> F[Indexed GeoParquet]
    F --> G[Spatial Pack]
    D --> G
```
The Pipeline converts source data into GeoParquet and PMTiles. An optional H3 indexing stage adds spatial index columns to the GeoParquet file. Both outputs are bundled into the final Spatial Pack alongside the spatialpack.json manifest.
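As a rough picture of how the outputs come together, the sketch below builds a manifest-like structure in Python. Only the layer fields parquet, pmtiles, and index.h3_res are named on this page; the top-level keys (`name`, `layers`) and the layer `id` are assumptions for illustration and may not match the real spatialpack.json schema.

```python
import json

# Hypothetical manifest layout: only `parquet`, `pmtiles`, and `index.h3_res`
# are documented layer fields; every other key here is an assumption.
manifest = {
    "name": "cadastre-pack",
    "layers": [
        {
            "id": "cadastre",
            "parquet": "layers/cadastre.parquet",  # indexed GeoParquet output
            "pmtiles": "layers/cadastre.pmtiles",  # tile archive output
            "index": {"h3_res": 7},                # resolution used by the H3 stage
        }
    ],
}

manifest_json = json.dumps(manifest, indent=2)
print(manifest_json)
```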
Format comparison
| Property | GeoParquet | PMTiles | H3 Index |
|---|---|---|---|
| Purpose | Tabular + geometry storage | Map tile delivery | Spatial lookup |
| Compression | Zstandard (columnar) | Gzip (per tile) | N/A (column in Parquet) |
| Query engine | DuckDB, pandas, QGIS | MapLibre GL, Leaflet | DuckDB, pandas |
| Server required | No | No (HTTP range requests) | No |
| File extension | .parquet | .pmtiles | .parquet (same file) |
Next steps
- Build your first pack: follow the Getting Started guide to create a Spatial Pack with all three formats.
- Explore the schema: see the Spatial Pack manifest schema for the full list of layer fields, including parquet, pmtiles, and index.h3_res.