
Conversation

@thinkingfish
Member

Claude did most of it. My main question is whether there is a way to make the type/array conversions less verbose than the existing method; if not, the general code quality seems fine.

@mihirn
Collaborator

mihirn commented Dec 8, 2025

I think the cleaner way of doing the type conversions (at least in one's own code; I suspect the library does something similar internally) is to use arrow_json::writer (https://docs.rs/arrow-json/57.1.0/arrow_json/writer/struct.Writer.html) on the RecordBatch one is iterating over.

I think one can filter the columns either during loading, using a ProjectionMask (https://arrow.apache.org/rust/parquet/arrow/struct.ProjectionMask.html), or after loading, using filter_record_batch (https://arrow.apache.org/rust/arrow_select/filter/fn.filter_record_batch.html), but I haven't tried either of them myself.

If we don't want to experiment with either of those, I don't know a cleaner way, and this is fine (though I suspect it will balk at larger files, since it keeps all the data in memory).
