Convert Shapefiles to GeoParquet

Shapefiles are the most common vector format in GIS, but they have significant limitations: each dataset is spread across multiple sidecar files, field names are capped at 10 characters, and there is no built-in compression. GeoParquet addresses all three in a single columnar file that is typically 5-10x smaller and 10-100x faster to query.

This recipe walks through converting a Shapefile (or GeoJSON file) into GeoParquet using the spatialpack CLI.

Prerequisites

spatialpack CLI installed (from a checkout of the repository: pip install -e ".[full]")
A Shapefile (.shp with its sidecar files) or a GeoJSON file

Steps

1. Verify your input data

Confirm that your source file and its sidecars exist. Shapefiles store coordinate reference system (CRS) information in a .prj sidecar file, so check that one is present.

ls ./data/cadastre.*

Expected output:

./data/cadastre.dbf
./data/cadastre.prj
./data/cadastre.shp
./data/cadastre.shx

What just happened: You confirmed that all four Shapefile components are present: the main .shp file plus its .dbf, .prj, and .shx sidecars. If the .prj file is missing, the converter cannot determine the source CRS and may fall back to a default.
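
The same check can be scripted before a conversion runs. The following is a minimal Python sketch, assuming the usual .shp/.shx/.dbf/.prj layout with lowercase extensions; the helper name missing_sidecars is hypothetical and is not part of spatialpack.

```python
from pathlib import Path

# Components every Shapefile needs, and the CRS sidecar that is
# technically optional but required for correct reprojection.
REQUIRED = (".shp", ".shx", ".dbf")
RECOMMENDED = (".prj",)


def missing_sidecars(shp_path):
    """Return (missing_required, missing_recommended) extensions.

    Hypothetical helper: looks for files that share the .shp file's
    name but carry a different extension.
    """
    src = Path(shp_path)
    missing_required = [
        ext for ext in REQUIRED if not src.with_suffix(ext).exists()
    ]
    missing_recommended = [
        ext for ext in RECOMMENDED if not src.with_suffix(ext).exists()
    ]
    return missing_required, missing_recommended
```

A result of ([], [".prj"]) for ./data/cadastre.shp would mean the geometry and attributes are readable but the source CRS is not recorded.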

2. Run the conversion

Convert the Shapefile to GeoParquet with the default settings. The CLI reprojects to WGS84 (EPSG:4326) by default and validates geometries.

spatialpack convert ./data/cadastre.shp

Expected output:

Converting ./data/cadastre.shp to GeoParquet...
CRS: EPSG:28350 -> EPSG:4326 (reprojected)
Geometries validated: 12,847 features, 0 repaired
Output: ./data/cadastre.parquet (4.2 MB)

What just happened: The CLI read the Shapefile, reprojected all geometries from the source CRS to WGS84, validated each geometry (repairing any invalid ones), and wrote a compressed GeoParquet file. The output is a single .parquet file — no sidecar files needed.

3. Specify a custom output path

Use the -o flag to control where the output file lands. This is useful when building a Spatial Pack with a specific directory structure.

spatialpack convert ./data/cadastre.shp -o ./layers/cadastre.parquet

What just happened: The same conversion ran, but the output was written to ./layers/cadastre.parquet instead of the default location next to the input file.

4. Convert GeoJSON instead of Shapefile

The convert command also accepts GeoJSON files. The workflow is identical.

spatialpack convert ./data/roads.geojson -o ./layers/roads.parquet

What just happened: GeoJSON input is handled the same way as Shapefile input. The CLI auto-detects the format and applies the same CRS normalization and geometry validation.

5. Keep the original CRS

If your downstream tools expect a projected CRS (not WGS84), use the --crs flag to set the target coordinate system. Passing the source CRS as the target keeps the original coordinates.

spatialpack convert ./data/cadastre.shp --crs EPSG:28350 -o ./layers/cadastre_mga50.parquet

What just happened: The converter skipped reprojection because the source and target CRS matched. The output file retains the original MGA Zone 50 coordinates.
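
When the conversion is driven from a script, the flags shown so far can be assembled programmatically. The sketch below builds the argument list using only the -o and --crs flags from this recipe; the helper name build_convert_command is hypothetical.

```python
import shlex


def build_convert_command(src, output=None, crs=None):
    """Build the argv list for `spatialpack convert` (illustrative helper)."""
    cmd = ["spatialpack", "convert", str(src)]
    if crs is not None:
        cmd += ["--crs", crs]      # target CRS, e.g. "EPSG:28350"
    if output is not None:
        cmd += ["-o", str(output)]  # explicit output path
    return cmd


cmd = build_convert_command(
    "./data/cadastre.shp",
    output="./layers/cadastre_mga50.parquet",
    crs="EPSG:28350",
)
# Show the command as it would be typed in a shell.
print(shlex.join(cmd))
```

Passing the list to subprocess.run(cmd, check=True) would run the conversion, assuming spatialpack is on the PATH.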

6. Batch convert an entire directory

Convert all Shapefiles in a directory at once with the --batch flag.

spatialpack convert ./shapefiles/ --batch

Expected output:

Batch converting 5 files matching *.shp in ./shapefiles/
  [1/5] cadastre.shp -> cadastre.parquet (4.2 MB)
  [2/5] roads.shp -> roads.parquet (18.7 MB)
  [3/5] zoning.shp -> zoning.parquet (2.1 MB)
  [4/5] hydrology.shp -> hydrology.parquet (6.3 MB)
  [5/5] vegetation.shp -> vegetation.parquet (11.4 MB)
Batch complete: 5 files converted

What just happened: The CLI found all .shp files in the directory (using the default --pattern *.shp) and converted each one to GeoParquet. Each output file was written alongside its source file.
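
To preview which files a batch run would touch, the pattern matching can be approximated in Python. This is a sketch, assuming a non-recursive match and outputs written next to each source as described above; the helper name plan_batch is hypothetical, and the alphabetical ordering is an assumption that may not match the CLI.

```python
from pathlib import Path


def plan_batch(directory, pattern="*.shp"):
    """List (source, output) path pairs for files matching pattern.

    Illustrative only: matches files directly inside `directory`
    and swaps the extension for .parquet.
    """
    sources = sorted(Path(directory).glob(pattern))
    return [(src, src.with_suffix(".parquet")) for src in sources]
```

Printing the pairs from plan_batch("./shapefiles/") before running the real command is a cheap way to catch a wrong directory or pattern.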

7. Verify the output

Read the GeoParquet file with DuckDB or Python to confirm the conversion succeeded.

DuckDB

-- ST_GeometryType requires DuckDB's spatial extension.
INSTALL spatial;
LOAD spatial;

SELECT count(*) AS features, ST_GeometryType(geometry) AS geom_type
FROM read_parquet('./layers/cadastre.parquet')
GROUP BY geom_type;

Python

import geopandas as gpd

gdf = gpd.read_parquet("./layers/cadastre.parquet")
print(f"Features: {len(gdf)}")
print(f"CRS: {gdf.crs}")
print(f"Columns: {list(gdf.columns)}")

What just happened: You confirmed the GeoParquet file is readable, contains the expected number of features, and has the correct CRS. GeoParquet imposes no 10-character limit on field names, so GeoJSON sources keep their full-length names. Names that were already truncated in a Shapefile source stay truncated unless you rename them.

Next steps

Build a Pipeline from YAML — Use your converted GeoParquet files as inputs to a multi-stage Pipeline
Query Pack Data with DuckDB — Run SQL queries against your GeoParquet layers
CLI convert reference — Full options table for spatialpack convert
Data Formats — Why GeoParquet, PMTiles, and H3 indexing