
@repro-code commented Jan 13, 2026

Summary

  • Add test/links.txt listing URLs for 127 small parquet files (<2 MB each) from geoarrow.org/data.html
  • Update runtests.jl to download the GeoArrow test files into data/geoarrow/
  • Add a testset that verifies GeoParquet.read() can parse every file, checks that a :geometry column exists, and runs GI.testgeometry on the first geometry
  • Change GeoParquet.read to warn instead of erroring when GeoParquet metadata is missing, returning the DataFrame without parsing geometries
  • Add using WellKnownGeometry to the tests so that WKB geometries are handled correctly
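
A minimal sketch of the download step, assuming links.txt holds one URL per line and each file is saved under data/geoarrow/ using the last path segment of its URL as the file name. The helper names geoarrow_targets and fetch_geoarrow are hypothetical and not taken from the actual runtests.jl; the only real library call is Downloads.download from Julia's standard library.

```julia
using Downloads  # Julia stdlib; provides Downloads.download(url, path)

# Hypothetical helper: turn the lines of links.txt into (url, local_path)
# pairs under `dir`. Assumes one URL per line; blank lines and lines
# starting with '#' are skipped (an assumption about the file format).
function geoarrow_targets(lines, dir::AbstractString)
    targets = Tuple{String,String}[]
    for line in lines
        url = String(strip(line))
        (isempty(url) || startswith(url, "#")) && continue
        # Use the last path segment of the URL as the local file name.
        push!(targets, (url, joinpath(dir, basename(url))))
    end
    return targets
end

# Hypothetical helper: download every listed file not already cached in `dir`.
function fetch_geoarrow(linkfile::AbstractString, dir::AbstractString)
    mkpath(dir)
    for (url, dest) in geoarrow_targets(eachline(linkfile), dir)
        isfile(dest) || Downloads.download(url, dest)
    end
    return nothing
end
```

Keeping the URL-to-path mapping separate from the network call lets it be checked offline, and the isfile guard means repeated test runs reuse files that are already downloaded.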

Test files included

Files cover various geometry types and encodings:

  • Natural Earth (12 files): cities, countries, countries-geography, countries-bounds
  • Quadrangles (3 files): 100k quadrangles
  • CRS Examples (11 files): Vermont with different CRS encodings
  • Geometry Examples (98 files): Point, LineString, Polygon, MultiPoint, MultiLineString, MultiPolygon, Geometry, GeometryCollection with Z/M/ZM variants
  • NS Water (3 files): water-point data (~1.7-1.9MB)

Each dataset has _geo (GeoArrow encoding) and _native (native Arrow encoding) variants where applicable.

Changes to GeoParquet.read behavior

Files without valid GeoParquet metadata now emit a warning and return the DataFrame as-is, rather than throwing an error. This allows reading plain parquet files that happen to have geometry columns but no geo metadata.
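
The fallback can be sketched roughly as follows, assuming the reader has the Parquet key-value metadata available as a dictionary and checks for the "geo" key defined by the GeoParquet specification. read_with_fallback and its decode argument are hypothetical stand-ins, not GeoParquet.jl internals.

```julia
# Hypothetical sketch of the fallback path; not GeoParquet.jl's actual code.
# `kv` stands in for the Parquet file's key-value metadata and `decode`
# for the real geometry decoding step.
function read_with_fallback(table, kv::AbstractDict, decode)
    if !haskey(kv, "geo")
        # Previously this case threw; now it warns and returns the table untouched.
        @warn "No GeoParquet metadata (\"geo\" key) found; returning table without geometry parsing"
        return table
    end
    return decode(table, kv["geo"])
end
```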

Test plan

  • Verify all 127 files download successfully
  • Check which files fail to parse and categorize the errors
  • Document any parsing issues for future fixes
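
One way to categorize parse failures is to try each file and group the failing paths by exception type. This is a hypothetical helper: in the test suite the reader would presumably be GeoParquet.read (e.g. categorize_failures(GeoParquet.read, files)), but it is passed in as an argument so the sketch runs without the package.

```julia
# Hypothetical helper: call `reader` on every path and group the paths that
# throw by the name of the exception type (e.g. "ArgumentError").
function categorize_failures(reader, paths)
    failures = Dict{String,Vector{String}}()
    for path in paths
        try
            reader(path)
        catch err
            push!(get!(failures, string(nameof(typeof(err))), String[]), path)
        end
    end
    return failures
end
```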

🤖 Generated with Claude Code
