Refactoring node and edge attribute key API#244
Conversation
- Add AttrSchema dataclass with defensive copying - Create process_attr_key_args() helper to eliminate duplication - Refactor dtype inference and SQLAlchemy mapping to use dictionaries - Add overloaded method signatures for convenience and schema modes - Replace _node/edge_attr_keys lists with _attr_schemas dicts - Add comprehensive test suite (16 tests passing) Breaking change: dtype parameter now required. Next phase will update all calling sites throughout the codebase.
…add_edge_attr_key This is a breaking change that requires dtype (polars.DataType) to be explicitly specified when adding attribute keys to graphs. Core changes: - Add AttrSchema dataclass to store key, dtype, and default_value together - Add utility functions: infer_default_value_from_dtype, polars_dtype_to_sqlalchemy_type, validate_default_value_dtype_compatibility, and process_attr_key_args - Update BaseGraph abstract methods with overloaded signatures supporting both direct parameters and AttrSchema objects - Replace _node_attr_keys/_edge_attr_keys lists with _node_attr_schemas dicts in RustWorkXGraph and SQLGraph - Remove _boolean_columns tracking in SQLGraph (now uses AttrSchema) - Fix _polars_schema_override to use == instead of isinstance for pl.Boolean check Backend implementations: - RustWorkXGraph: Use process_attr_key_args helper, store AttrSchema objects - SQLGraph: Use process_attr_key_args helper, pass AttrSchema to _add_new_column - GraphView: Delegate to root graph with all parameters preserved - Add dtype inference fallback for complex objects that polars can't parse Updated all callers: - Graph methods: from_other(), match() - Node operators: RandomNodes, RegionPropsNodes, GenericNodesOperator - Edge operators: DistanceEdges, GenericEdgesOperator - Solvers: ILPSolver, NearestNeighborsSolver - IO: numpy_array, ctc
Update all test files to use the new required dtype parameter for add_node_attr_key and add_edge_attr_key methods. Key test fixes: - Add dtype inference logic in test_add_node_attr_key based on value type - Fix bbox attribute calls to use pl.Array(pl.Int64, size) instead of passing numpy arrays as dtype - Add missing polars imports where needed - Fix argument order from (key, False, dtype=pl.Boolean) to (key, pl.Boolean, default_value=False)
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## main #244 +/- ##
==========================================
- Coverage 88.35% 88.15% -0.21%
==========================================
Files 56 56
Lines 4104 4203 +99
Branches 714 733 +19
==========================================
+ Hits 3626 3705 +79
- Misses 291 311 +20
Partials 187 187 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
|
Hi @TeunHuijben and @yfukai, I would appreciate it if you both could review this PR, at least the public API, since it involves breaking changes. |
| rx_graph = self.rx_graph | ||
| for node_id in rx_graph.node_indices(): | ||
| rx_graph[node_id][key] = default_value | ||
| rx_graph[node_id][schema.key] = schema.default_value |
There was a problem hiding this comment.
Do we really need to add the default_value?
This could be solved when returning the attrs value.
There was a problem hiding this comment.
Yeah I think returning the attrs value for unset attribute would be better.
| # Add to all existing edges | ||
| for edge_attr in self.rx_graph.edges(): | ||
| edge_attr[key] = default_value | ||
| edge_attr[schema.key] = schema.default_value |
|
Haven't personally tested the code, but the API looks good! |
|
Hi @yfukai , this should be ready for review. |
|
Hi sorry @TeunHuijben I'm still reviewing it, I'll finish that within this weekend! |
|
Hi @yfukai, no worries, take your time! Was just wondering whether your thumbs-up meant approval or whether you were reviewing. Thank you! 😊 |
There was a problem hiding this comment.
This looks much nicer than what we currently have, thanks @JoOkuma!
I think this PR is good to merge, but
- It would be even better if the dtype does not change with a round-trip with
from_other. For this purpose, maybe we can store theAttrSchemas into a read-only metadata. For RustworkXGraph, maybe we can skip setting the default value to the rx_graph and return the default value for unset attr to save the initialization time.→ I found this addressed in the next PR, thanks!
| rx_graph = self.rx_graph | ||
| for node_id in rx_graph.node_indices(): | ||
| rx_graph[node_id][key] = default_value | ||
| rx_graph[node_id][schema.key] = schema.default_value |
There was a problem hiding this comment.
Yeah I think returning the attrs value for unset attribute would be better.
|
Awesome! Thanks @TeunHuijben and @yfukai @yfukai, I added checking if the schema matches when calling Can you also take a quick look at #246 if you've time? |
Closes #193 and #243