
Convert Shapefiles to GeoParquet

Shapefiles are among the most common vector formats in GIS, but they have significant limitations: multiple sidecar files, a 10-character cap on field names, and no built-in compression. GeoParquet solves all three problems in a single, columnar file format that is often 5-10x smaller and 10-100x faster to query.
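To see what the 10-character cap does in practice, the standalone Python sketch below (not part of spatialpack) cuts field names at 10 characters, as the dBASE .dbf format requires, and reports which names become indistinguishable. Real Shapefile writers rename colliding fields in their own tool-specific ways, so the naive cut here is only an illustration, and the field names are hypothetical.

```python
from collections import defaultdict

DBF_FIELD_NAME_LIMIT = 10  # dBASE (.dbf) field names hold at most 10 characters


def truncation_collisions(field_names):
    """Group field names by their naive 10-character truncation.

    Returns only the truncated names shared by more than one original name,
    i.e. the columns a plain cut-off would make indistinguishable.
    """
    groups = defaultdict(list)
    for name in field_names:
        groups[name[:DBF_FIELD_NAME_LIMIT]].append(name)
    return {short: full for short, full in groups.items() if len(full) > 1}


# Hypothetical attribute names for a cadastre layer.
fields = ["parcel_id", "population_2020", "population_2021", "land_use_category"]
print(truncation_collisions(fields))
# {'population': ['population_2020', 'population_2021']}
```

Both population columns collapse to the same 10-character name, which is exactly the information a Shapefile cannot keep and GeoParquet can.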

This recipe walks through converting a Shapefile (or GeoJSON file) into GeoParquet using the spatialpack CLI.

Prerequisites:
  • The spatialpack CLI, installed from the repository root with pip install -e ".[full]"
  • A Shapefile (.shp with its sidecar files) or a GeoJSON file

Confirm your source file and its sidecar files exist. Shapefiles store their coordinate reference system (CRS) in a .prj sidecar file, so check that one is present.

ls ./data/cadastre.*

Expected output:

./data/cadastre.dbf
./data/cadastre.prj
./data/cadastre.shp
./data/cadastre.shx

What just happened: You confirmed all four Shapefile component files are present. The .shp, .shx, and .dbf files are mandatory. The .prj file is technically optional, but without it the converter will not know the source CRS and may fall back to a default.
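The same check can be scripted. The Python sketch below is a hypothetical helper, not part of spatialpack: it reports which mandatory components are missing and pulls the CRS name out of the .prj file, which holds a WKT string such as PROJCS["GDA_1994_MGA_Zone_50",...]. It assumes lowercase extensions, as in the listing above.

```python
from pathlib import Path

REQUIRED_PARTS = (".shp", ".shx", ".dbf")  # a Shapefile is unusable without these


def check_shapefile(shp_path):
    """Return (missing_parts, crs_name) for a Shapefile.

    crs_name is the first quoted name in the .prj WKT, or None when the
    .prj file is absent or contains no quoted name.
    """
    shp = Path(shp_path)
    missing = [ext for ext in REQUIRED_PARTS if not shp.with_suffix(ext).exists()]
    crs_name = None
    prj = shp.with_suffix(".prj")
    if prj.exists():
        wkt = prj.read_text(encoding="utf-8", errors="replace")
        parts = wkt.split('"')
        if len(parts) >= 3:  # at least one complete quoted string
            crs_name = parts[1]
    return missing, crs_name


if __name__ == "__main__":
    missing, crs_name = check_shapefile("./data/cadastre.shp")
    print("missing:", missing or "none")
    print("CRS from .prj:", crs_name or "unknown (no .prj)")
```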

Convert the Shapefile to GeoParquet with the default settings. The CLI reprojects to WGS84 (EPSG:4326) by default and validates geometries.

spatialpack convert ./data/cadastre.shp

Expected output:

Converting ./data/cadastre.shp to GeoParquet...
CRS: EPSG:28350 -> EPSG:4326 (reprojected)
Geometries validated: 12,847 features, 0 repaired
Output: ./data/cadastre.parquet (4.2 MB)

What just happened: The CLI read the Shapefile, reprojected all geometries from the source CRS to WGS84, validated each geometry (repairing any invalid ones), and wrote a compressed GeoParquet file. The output is a single .parquet file — no sidecar files needed.
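If you call the converter from a script or CI job, you may want to check the summary programmatically. The Python sketch below runs the command shown above through subprocess and parses the CRS and validation lines. It assumes the summary is printed to standard output in exactly the format of the expected output above; that format is not a documented interface and may change between versions.

```python
import re
import subprocess

# Patterns for the summary lines shown in the expected output above.
CRS_RE = re.compile(r"CRS: (\S+) -> (\S+)")
VALIDATED_RE = re.compile(r"Geometries validated: ([\d,]+) features, ([\d,]+) repaired")


def parse_summary(text):
    """Extract source/target CRS and feature/repair counts from convert output."""
    summary = {}
    crs = CRS_RE.search(text)
    if crs:
        summary["source_crs"], summary["target_crs"] = crs.groups()
    validated = VALIDATED_RE.search(text)
    if validated:
        # Strip thousands separators such as "12,847" before converting.
        features, repaired = (int(g.replace(",", "")) for g in validated.groups())
        summary["features"], summary["repaired"] = features, repaired
    return summary


def convert_and_check(path):
    """Run `spatialpack convert PATH` and return the parsed summary."""
    result = subprocess.run(
        ["spatialpack", "convert", path],
        capture_output=True, text=True, check=True,
    )
    return parse_summary(result.stdout)


sample = """CRS: EPSG:28350 -> EPSG:4326 (reprojected)
Geometries validated: 12,847 features, 0 repaired"""
print(parse_summary(sample))
# {'source_crs': 'EPSG:28350', 'target_crs': 'EPSG:4326', 'features': 12847, 'repaired': 0}
```

A CI job could then fail when summary["repaired"] is non-zero, prompting a look at the source data before publishing.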

Use the -o flag to control where the output file lands. This is useful when building a Spatial Pack with a specific directory structure.

spatialpack convert ./data/cadastre.shp -o ./layers/cadastre.parquet

What just happened: The same conversion ran, but the output was written to ./layers/cadastre.parquet instead of the default location next to the input file.
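When you assemble several layers, it can help to generate the commands rather than type them. The Python sketch below builds the argument list for each conversion, using -o to place outputs under a layers directory, and also shows the default output path next to the input. The ./layers layout is only the example used above, not a required Spatial Pack structure, and the sketch creates the directory itself because this recipe does not say whether the CLI does.

```python
import subprocess
from pathlib import Path


def default_output(src):
    """Where `spatialpack convert` writes without -o: beside the input, .parquet suffix."""
    return Path(src).with_suffix(".parquet")


def convert_command(src, out_dir=None):
    """Build the argv for one conversion, adding -o only when out_dir is given."""
    cmd = ["spatialpack", "convert", str(src)]
    if out_dir is not None:
        cmd += ["-o", str(Path(out_dir) / (Path(src).stem + ".parquet"))]
    return cmd


def convert_into(sources, out_dir):
    """Convert each source into out_dir, creating the directory first."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    for src in sources:
        subprocess.run(convert_command(src, out_dir), check=True)


print(default_output("./data/cadastre.shp"))
print(convert_command("./data/cadastre.shp", "./layers"))
```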

The convert command also accepts GeoJSON files. The workflow is identical.

spatialpack convert ./data/roads.geojson -o ./layers/roads.parquet

What just happened: GeoJSON input is handled the same way as Shapefile input. The CLI auto-detects the format and applies the same CRS normalization and geometry validation.
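Before converting a GeoJSON file, a quick look at its contents can catch surprises. Under RFC 7946, GeoJSON coordinates are WGS84 longitude/latitude, matching the CLI's default target, and the old "crs" member no longer exists; files written to the older 2008 specification may still declare a different CRS there. The stdlib-only Python sketch below is a hypothetical helper, not part of spatialpack. It assumes the file is a FeatureCollection small enough to load into memory, counts features by geometry type, and reports any legacy CRS declaration.

```python
import json
from collections import Counter
from pathlib import Path


def inspect_geojson(path):
    """Summarise a GeoJSON FeatureCollection before converting it."""
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    features = data.get("features", [])
    # Features with a null geometry are counted under "None".
    types = Counter((f.get("geometry") or {}).get("type", "None") for f in features)
    # RFC 7946 files have no "crs" member; 2008-spec files may name one.
    legacy_crs = ((data.get("crs") or {}).get("properties") or {}).get("name")
    return {
        "features": len(features),
        "geometry_types": dict(types),
        "legacy_crs": legacy_crs,
    }


# Example: print(inspect_geojson("./data/roads.geojson"))
```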

If your downstream tools expect a projected CRS (not WGS84), use the --crs flag to set the target coordinate system.

spatialpack convert ./data/cadastre.shp --crs EPSG:28350 -o ./layers/cadastre_mga50.parquet

What just happened: The source data is already in EPSG:28350 (MGA Zone 50, as the CRS line from the default conversion showed), so the converter skipped reprojection. The output file retains the original MGA Zone 50 coordinates.
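If you are not sure which projected CRS to pass to --crs, MGA zones for Australian data follow the standard 6-degree UTM zoning, and GDA94 / MGA zone N has the EPSG code 28300 + N, so EPSG:28350 is zone 50 (which covers Perth). The Python helper below is a hypothetical convenience, not part of spatialpack. It covers GDA94 only (GDA2020 MGA zones use a different range of EPSG codes) and accepts only the mainland zones 49 to 56.

```python
import math


def mga_zone(longitude):
    """UTM/MGA zone number for a longitude in degrees (6-degree zones from 180°W)."""
    return math.floor((longitude + 180) / 6) + 1


def gda94_mga_epsg(longitude):
    """EPSG code string for the GDA94 / MGA zone containing this longitude."""
    zone = mga_zone(longitude)
    # Mainland Australia spans zones 49 to 56; reject anything else in this sketch.
    if not 49 <= zone <= 56:
        raise ValueError(f"zone {zone} is outside mainland Australian MGA zones 49-56")
    return f"EPSG:{28300 + zone}"


print(gda94_mga_epsg(115.86))  # Perth
# EPSG:28350
```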

Convert all Shapefiles in a directory at once with the --batch flag.

spatialpack convert ./shapefiles/ --batch

Expected output:

Batch converting 5 files matching *.shp in ./shapefiles/
[1/5] cadastre.shp -> cadastre.parquet (4.2 MB)
[2/5] roads.shp -> roads.parquet (18.7 MB)
[3/5] zoning.shp -> zoning.parquet (2.1 MB)
[4/5] hydrology.shp -> hydrology.parquet (6.3 MB)
[5/5] vegetation.shp -> vegetation.parquet (11.4 MB)
Batch complete: 5 files converted

What just happened: The CLI found all .shp files in the directory (using the default --pattern *.shp) and converted each one to GeoParquet. Each output file was written alongside its source file.
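To preview which files a batch run will touch, and whether any outputs already exist beside their sources, the Python sketch below mirrors the documented default pattern *.shp. It is a hypothetical dry-run helper, not part of spatialpack; matching only the top level of the directory, and case-sensitive matching on most Linux filesystems, are assumptions about how the CLI selects files.

```python
from pathlib import Path


def plan_batch(directory, pattern="*.shp"):
    """Return (source, output) pairs for a batch run, outputs beside their sources."""
    sources = sorted(Path(directory).glob(pattern))  # top-level matches only
    return [(src, src.with_suffix(".parquet")) for src in sources]


if __name__ == "__main__":
    for src, out in plan_batch("./shapefiles"):
        status = "exists" if out.exists() else "new"
        print(f"{src.name} -> {out.name} [{status}]")
```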

Read the GeoParquet file with DuckDB or Python to confirm the conversion succeeded.

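INSTALL spatial;  -- one-time install of DuckDB's spatial extension
LOAD spatial;     -- provides ST_GeometryType and the other ST_* functions
SELECT count(*) AS features, ST_GeometryType(geometry) AS geom_type
FROM read_parquet('./layers/cadastre.parquet')
GROUP BY geom_type;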

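What just happened: You confirmed the GeoParquet file is readable and contains the expected number of features and geometry type. This query does not check the CRS; GeoParquet records it in the file's "geo" metadata. GeoParquet itself has no 10-character limit on column names, but field names that were already truncated in the Shapefile's .dbf stay truncated after conversion, so rename them if you need the full names. Properties from GeoJSON sources keep their full-length names.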
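To check the CRS as well, you can read the "geo" key that the GeoParquet specification stores in the file's key/value metadata. In the Python sketch below, crs_from_geo_metadata uses only the standard library; read_geo_metadata assumes pyarrow is installed, which this recipe does not state. Per the specification, a missing crs key means OGC:CRS84 (WGS84 longitude/latitude) and an explicit null means the CRS is unknown, so a file converted with the default settings should report a WGS84 CRS.

```python
import json


def crs_from_geo_metadata(geo_json):
    """Return (primary_column, crs_label) from a GeoParquet 'geo' metadata string."""
    geo = json.loads(geo_json)
    column = geo["primary_column"]
    column_meta = geo["columns"][column]
    if "crs" not in column_meta:
        return column, "OGC:CRS84"  # the spec's default when the key is absent
    crs = column_meta["crs"]
    if crs is None:
        return column, None  # an explicit null means the CRS is unknown
    ident = crs.get("id") or {}  # PROJJSON identifier, e.g. {"authority": "EPSG", "code": 4326}
    if "authority" in ident and "code" in ident:
        return column, f"{ident['authority']}:{ident['code']}"
    return column, crs.get("name")  # fall back to the PROJJSON name


def read_geo_metadata(path):
    """Read the raw 'geo' metadata string from a Parquet file (requires pyarrow)."""
    import pyarrow.parquet as pq  # imported here so the parser above works without pyarrow

    metadata = pq.read_schema(path).metadata or {}
    if b"geo" not in metadata:
        raise ValueError(f"{path} has no 'geo' metadata, so it is not GeoParquet")
    return metadata[b"geo"].decode("utf-8")


# Example (needs pyarrow):
# print(crs_from_geo_metadata(read_geo_metadata("./layers/cadastre.parquet")))
```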