Convert Shapefiles to GeoParquet
Shapefiles are the most common vector format in GIS, but they have significant limitations: multiple sidecar files, 10-character field name caps, and no built-in compression. GeoParquet addresses all three in a single columnar file that is typically 5-10x smaller and 10-100x faster to query.
This recipe walks through converting a Shapefile (or GeoJSON file) into GeoParquet using the spatialpack CLI.
Prerequisites
- spatialpack CLI installed (pip install -e ".[full]")
- A Shapefile (.shp with its sidecar files) or a GeoJSON file
1. Verify your input data
Confirm your source file exists and check its coordinate reference system (CRS). Shapefiles store CRS information in a .prj sidecar file.

    ls ./data/cadastre.*

Expected output:

    ./data/cadastre.dbf
    ./data/cadastre.prj
    ./data/cadastre.shp
    ./data/cadastre.shx

What just happened: You confirmed that the .shp file and its three sidecar files (.dbf, .prj, .shx) are present. If the .prj file is missing, the converter will not know the source CRS and may fall back to a default.
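The same check can be scripted. The sketch below is a minimal standard-library example, not part of spatialpack; missing_components and crs_name_from_prj are hypothetical helpers. It reports which of the four expected files are missing and pulls the CRS name out of the .prj file, assuming the .prj holds WKT text whose first quoted string is the CRS name.

```python
import re
from pathlib import Path

# Files this recipe expects; the .prj is the one that carries the CRS.
EXPECTED_SUFFIXES = (".shp", ".shx", ".dbf", ".prj")


def missing_components(shp_path):
    """Return the expected suffixes that have no file next to shp_path."""
    base = Path(shp_path)
    return [s for s in EXPECTED_SUFFIXES if not base.with_suffix(s).exists()]


def crs_name_from_prj(shp_path):
    """Return the first quoted name in the .prj WKT, or None if unavailable.

    Assumption: the first quoted string in the WKT is the CRS name.
    """
    prj = Path(shp_path).with_suffix(".prj")
    if not prj.exists():
        return None
    match = re.search(r'"([^"]+)"', prj.read_text())
    return match.group(1) if match else None


if __name__ == "__main__":
    shp = "./data/cadastre.shp"
    print("Missing:", missing_components(shp) or "none")
    print("CRS name:", crs_name_from_prj(shp))
```

An empty "Missing" list together with a recognizable CRS name suggests the input is ready for conversion. The name is only a label; it does not replace checking the EPSG code that the converter reports.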
2. Run the conversion
Convert the Shapefile to GeoParquet with the default settings. The CLI reprojects to WGS84 (EPSG:4326) by default and validates geometries.

    spatialpack convert ./data/cadastre.shp

Expected output:

    Converting ./data/cadastre.shp to GeoParquet...
    CRS: EPSG:28350 -> EPSG:4326 (reprojected)
    Geometries validated: 12,847 features, 0 repaired
    Output: ./data/cadastre.parquet (4.2 MB)

What just happened: The CLI read the Shapefile, reprojected all geometries from the source CRS to WGS84, validated each geometry (repairing any invalid ones), and wrote a compressed GeoParquet file. The output is a single .parquet file with no sidecar files.
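If you run the conversion from a script, you may want to flag runs where geometries were repaired or the feature count looks wrong. The sketch below parses the "Geometries validated" line, assuming the CLI prints it exactly as in the expected output above. That format is taken from this recipe, not from a documented contract, and parse_validation_summary is a hypothetical helper.

```python
import re

# Assumed format, copied from this recipe's expected output:
# "Geometries validated: 12,847 features, 0 repaired"
_SUMMARY = re.compile(
    r"Geometries validated:\s*([\d,]+) features,\s*([\d,]+) repaired"
)


def parse_validation_summary(cli_output):
    """Return (feature_count, repaired_count), or None if the line is absent."""
    match = _SUMMARY.search(cli_output)
    if match is None:
        return None
    # Strip thousands separators before converting to int.
    features, repaired = (int(g.replace(",", "")) for g in match.groups())
    return features, repaired


if __name__ == "__main__":
    sample = (
        "Converting ./data/cadastre.shp to GeoParquet...\n"
        "CRS: EPSG:28350 -> EPSG:4326 (reprojected)\n"
        "Geometries validated: 12,847 features, 0 repaired\n"
        "Output: ./data/cadastre.parquet (4.2 MB)\n"
    )
    print(parse_validation_summary(sample))  # (12847, 0)
```

Returning None when the line is missing lets the caller decide whether an unrecognized output format should be treated as a failure.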
3. Specify a custom output path
Use the -o flag to control where the output file lands. This is useful when building a Spatial Pack with a specific directory structure.

    spatialpack convert ./data/cadastre.shp -o ./layers/cadastre.parquet

What just happened: The same conversion ran, but the output was written to ./layers/cadastre.parquet instead of the default location next to the input file.
4. Convert GeoJSON instead of Shapefile
The convert command also accepts GeoJSON files. The workflow is identical.

    spatialpack convert ./data/roads.geojson -o ./layers/roads.parquet

What just happened: GeoJSON input is handled the same way as Shapefile input. The CLI auto-detects the format and applies the same CRS normalization and geometry validation.
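Before converting, it can help to record how many features the GeoJSON holds so you can compare that number against the GeoParquet output later. This is a minimal standard-library sketch, assuming the file is a single FeatureCollection small enough to load into memory; count_geojson_features is a hypothetical helper, not a spatialpack function.

```python
import json


def count_geojson_features(path):
    """Count the features in a GeoJSON FeatureCollection file."""
    with open(path, encoding="utf-8") as f:
        doc = json.load(f)
    # Assumption: the input is one FeatureCollection, not a bare Feature
    # or newline-delimited GeoJSON.
    if not isinstance(doc, dict) or doc.get("type") != "FeatureCollection":
        raise ValueError("expected a GeoJSON FeatureCollection")
    return len(doc.get("features", []))
```

For example, count_geojson_features("./data/roads.geojson") should match the feature count reported after conversion. GeoJSON as defined by RFC 7946 uses WGS84 longitude and latitude, so for spec-compliant input the CRS normalization step should have nothing to reproject.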
5. Keep the original CRS
If your downstream tools expect a projected CRS (not WGS84), use the --crs flag to set the target coordinate system.

    spatialpack convert ./data/cadastre.shp --crs EPSG:28350 -o ./layers/cadastre_mga50.parquet

What just happened: The converter skipped reprojection because the source and target CRS matched. The output file retains the original MGA Zone 50 coordinates.
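When the same conversion is repeated across many layers, wrapping the CLI in a small Python helper keeps the flags consistent. The sketch below builds the argument list using only the options shown in this recipe (-o and --crs). build_convert_command and run_convert are hypothetical helpers, and the sketch assumes the CLI exits with a non-zero status on failure.

```python
import subprocess


def build_convert_command(src, output=None, crs=None):
    """Build the argv list for `spatialpack convert` from this recipe's flags."""
    cmd = ["spatialpack", "convert", str(src)]
    if crs is not None:
        cmd += ["--crs", crs]
    if output is not None:
        cmd += ["-o", str(output)]
    return cmd


def run_convert(src, output=None, crs=None):
    """Run the conversion and return the completed process.

    Assumption: spatialpack returns a non-zero exit code on failure, in
    which case check=True raises subprocess.CalledProcessError.
    """
    return subprocess.run(
        build_convert_command(src, output, crs),
        check=True,
        capture_output=True,
        text=True,
    )


if __name__ == "__main__":
    # Print the command only; nothing is executed here.
    print(build_convert_command(
        "./data/cadastre.shp",
        output="./layers/cadastre_mga50.parquet",
        crs="EPSG:28350",
    ))
```

Passing the arguments as a list rather than a shell string avoids quoting problems with paths that contain spaces.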
6. Batch convert an entire directory
Convert all Shapefiles in a directory at once with the --batch flag.

    spatialpack convert ./shapefiles/ --batch

Expected output:

    Batch converting 5 files matching *.shp in ./shapefiles/
      [1/5] cadastre.shp -> cadastre.parquet (4.2 MB)
      [2/5] roads.shp -> roads.parquet (18.7 MB)
      [3/5] zoning.shp -> zoning.parquet (2.1 MB)
      [4/5] hydrology.shp -> hydrology.parquet (6.3 MB)
      [5/5] vegetation.shp -> vegetation.parquet (11.4 MB)
    Batch complete: 5 files converted

What just happened: The CLI found all .shp files in the directory (using the default --pattern *.shp) and converted each one to GeoParquet. Each output file was written alongside its source file.
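To preview what a batch run would touch before starting it, you can list the matching files yourself. The sketch below mirrors the behavior described above: match *.shp in one directory and place each .parquet alongside its source. It is an approximation for planning, not the CLI's own matching logic, and plan_batch is a hypothetical helper. Whether the CLI recurses into subdirectories is not stated in this recipe, so the sketch does not.

```python
from pathlib import Path


def plan_batch(directory, pattern="*.shp"):
    """Return sorted (source, output) path pairs for files matching pattern."""
    pairs = []
    # Non-recursive glob: only files directly inside `directory`.
    for src in sorted(Path(directory).glob(pattern)):
        pairs.append((src, src.with_suffix(".parquet")))
    return pairs


if __name__ == "__main__":
    plan = plan_batch("./shapefiles/")
    for i, (src, out) in enumerate(plan, start=1):
        print(f"[{i}/{len(plan)}] {src.name} -> {out.name}")
```

Comparing this list with the batch output is a simple way to spot files that were skipped or that matched unexpectedly.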
7. Verify the output
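Read the GeoParquet file with DuckDB or Python to confirm the conversion succeeded. The DuckDB query needs the spatial extension, which provides ST_GeometryType. Older DuckDB releases may return the geometry column as WKB bytes, in which case wrap it as ST_GeomFromWKB(geometry).

    -- Load the spatial extension for the ST_* functions
    INSTALL spatial;
    LOAD spatial;
    SELECT count(*) AS features, ST_GeometryType(geometry) AS geom_type
    FROM read_parquet('./layers/cadastre.parquet')
    GROUP BY geom_type;

The equivalent check in Python with GeoPandas:

    import geopandas as gpd

    gdf = gpd.read_parquet("./layers/cadastre.parquet")
    print(f"Features: {len(gdf)}")
    print(f"CRS: {gdf.crs}")
    print(f"Columns: {list(gdf.columns)}")

What just happened: You confirmed the GeoParquet file is readable, contains the expected number of features, and has the correct CRS. GeoParquet has no 10-character limit on field names, although names that were already truncated in a source Shapefile stay truncated unless you rename them.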
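For a quick sanity check that needs neither DuckDB nor GeoPandas, you can confirm the file carries the Parquet magic bytes: an unencrypted Parquet file begins and ends with the four bytes PAR1. This only shows that the file is a plausibly complete Parquet container, not that its geometry metadata is valid. looks_like_parquet is a hypothetical helper.

```python
import os

PARQUET_MAGIC = b"PAR1"


def looks_like_parquet(path):
    """Return True if the file starts and ends with the Parquet magic bytes."""
    # Smallest possible layout: header magic, 4-byte footer length, footer magic.
    if os.path.getsize(path) < 12:
        return False
    with open(path, "rb") as f:
        head = f.read(4)
        f.seek(-4, os.SEEK_END)
        tail = f.read(4)
    return head == PARQUET_MAGIC and tail == PARQUET_MAGIC
```

A False result for ./layers/cadastre.parquet usually points to an interrupted write or a file that is not Parquet at all, which is worth ruling out before debugging CRS or geometry issues.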
Next steps
- Build a Pipeline from YAML: Use your converted GeoParquet files as inputs to a multi-stage Pipeline
- Query Pack Data with DuckDB: Run SQL queries against your GeoParquet layers
- CLI convert reference: Full options table for spatialpack convert
- Data Formats: Why GeoParquet, PMTiles, and H3 indexing