-
Notifications
You must be signed in to change notification settings - Fork 41
Description
Current State
MAP types are currently supported in data parsing (see R/presto.field.R lines 107-129 for MAP type handling), but there are significant usability gaps:
- No convenient unnesting function: Users must manually write SQL to unnest MAP columns into key-value pairs
- Limited MAP manipulation: While MAP types can be parsed, there's no high-level R function to work with them
- No examples or documentation: Lack of clear examples showing how to work with MAP types in RPresto
Current MAP Support:
- MAP types are parsed correctly in
R/presto.field.R(lines 107-129) - MAP data is organized in
organize_map_type_data()(lines 353-383) - MAP keys and values are extracted and stored as lists in R
What's Missing:
map_unnest()function to convert MAP columns to key-value pairs- Integration with dplyr workflows
- Documentation and examples
Detailed Improvements
1. Create map_unnest() Helper Function
Purpose: Convert MAP columns into key-value pairs, similar to how presto_unnest() works for arrays.
Function Signature:
map_unnest(data, cols, ..., key_to = NULL, value_to = NULL)Parameters:
data: Atbl_prestoobjectcols: Column(s) to unnest (can be multiple MAP columns)key_to: Name for the key column (default:"{col}_key")value_to: Name for the value column (default:"{col}_value")
Implementation Steps:
-
Create new file
R/map_unnest.R:- Follow the pattern from
R/presto_unnest.R - Create
map_unnest()generic andmap_unnest.tbl_presto()method - Create
lazy_map_unnest_queryclass similar tolazy_unnest_query
- Follow the pattern from
-
SQL Generation:
- Use
MAP_ENTRIES()function (from Feature 2) to convert MAP to ARRAY(ROW(key, value)) - Generate SQL:
CROSS JOIN UNNEST(MAP_ENTRIES(map_col)) AS t(key, value) - Handle multiple MAP columns with multiple CROSS JOIN clauses
- Use
-
SQL Example:
-- Single MAP column: SELECT * FROM table_name CROSS JOIN UNNEST(MAP_ENTRIES(map_col)) AS t(key, value) -- Multiple MAP columns: SELECT * FROM table_name CROSS JOIN UNNEST(MAP_ENTRIES(map_col1)) AS t1(key1, value1), UNNEST(MAP_ENTRIES(map_col2)) AS t2(key2, value2)
-
Implementation Pattern (similar to
presto_unnest()):map_unnest <- function(data, cols, ..., key_to = NULL, value_to = NULL) { UseMethod("map_unnest") } map_unnest.tbl_presto <- function(data, cols, ..., key_to = NULL, value_to = NULL) { # Column selection (similar to presto_unnest lines 71-96) # Store in lazy_map_unnest_query structure # Return modified tbl_presto } # SQL rendering: sql_render.map_unnest_query <- function(query, con, ...) { # Generate CROSS JOIN UNNEST(MAP_ENTRIES(...)) SQL }
-
Handle Column Naming:
- If
key_toorvalue_tonot provided, use defaults:"{col}_key"and"{col}_value" - For multiple columns, generate unique names:
"{col1}_key","{col1}_value","{col2}_key", etc. - Support
names_sepparameter (similar to Feature 1) for consistency
- If
Example Usage:
# Single MAP column
tbl(con, "user_preferences") %>%
map_unnest(preferences) %>%
collect()
# Result: columns include original columns + preferences_key, preferences_value
# Multiple MAP columns
tbl(con, "data") %>%
map_unnest(c(map1, map2), key_to = c("k1", "k2"), value_to = c("v1", "v2")) %>%
collect()
# With dplyr operations
tbl(con, "events") %>%
map_unnest(metadata) %>%
filter(value > 100) %>%
group_by(key) %>%
summarize(count = n())Edge Cases to Handle:
- NULL maps (should produce no rows)
- Empty maps (should produce no rows)
- Maps with NULL keys or values
- Multiple MAP columns with different key/value types
- Very large maps
2. Enhance MAP Operations Integration
Dependencies: Requires Feature 2 MAP function translations to be complete.
Integration Points:
- Ensure
MAP_ENTRIES()works correctly (from Feature 2) - Test
map_unnest()with all MAP manipulation functions:map_keys(),map_values()for extracting keys/valuesmap_filter()for filtering before unnestingmap_concat()for combining maps before unnesting
Example Workflows:
# Filter map before unnesting
tbl(con, "data") %>%
mutate(filtered_map = map_filter(original_map, (k, v) -> v > 10)) %>%
map_unnest(filtered_map) %>%
collect()
# Extract keys only
tbl(con, "data") %>%
mutate(keys = map_keys(my_map)) %>%
presto_unnest(keys) %>%
collect()3. Add MAP Examples and Documentation
Documentation to Add:
-
Function Documentation (
R/map_unnest.R):- Add
@description,@param,@return,@examplessections - Include examples showing common use cases
- Document edge cases and limitations
- Add
-
Vignette or Examples:
- Create examples in existing vignettes or new
vignettes/map-types.Rmd - Show common MAP operations:
- Unnesting MAP columns
- Filtering maps
- Combining maps
- Working with nested MAP types (MAP within MAP)
- Include real-world use cases
- Create examples in existing vignettes or new
-
README Updates:
- Add MAP type section to README
- Show quick examples of
map_unnest()
Example Vignette Content:
# Working with MAP Types in RPresto
## Basic MAP Unnesting
# Create a table with MAP column
DBI::dbExecute(con, "
CREATE TABLE user_scores AS
SELECT
user_id,
MAP(ARRAY['math', 'science', 'english'],
ARRAY[95, 87, 92]) AS scores
FROM (VALUES (1), (2)) AS t(user_id)
")
# Unnest the MAP
tbl(con, "user_scores") %>%
map_unnest(scores, key_to = "subject", value_to = "score") %>%
collect()
## Filtering Maps Before Unnesting
tbl(con, "user_scores") %>%
mutate(high_scores = map_filter(scores, (k, v) -> v > 90)) %>%
map_unnest(high_scores) %>%
collect()
## Working with Multiple MAP Columns
tbl(con, "data") %>%
map_unnest(c(metadata, tags)) %>%
collect()Files to Create/Modify
New Files
R/map_unnest.R:- Main implementation file
- Should follow structure similar to
R/presto_unnest.R - Include:
map_unnest()genericmap_unnest.tbl_presto()methodlazy_map_unnest_queryclassop_vars.lazy_map_unnest_query()methodop_grps.lazy_map_unnest_query()methodop_sort.lazy_map_unnest_query()methodop_frame.lazy_map_unnest_query()methodsql_build.lazy_map_unnest_query()methodsql_render.map_unnest_query()method
Modified Files
R/dbplyr-sql.R:- Ensure
MAP_ENTRIEStranslation exists (from Feature 2) - Verify all MAP functions work correctly
- Ensure
Test Files
tests/testthat/test-map_unnest.R:- Comprehensive test suite for
map_unnest() - Test single and multiple MAP columns
- Test with dplyr operations (filter, group_by, summarize, etc.)
- Test edge cases (NULL maps, empty maps, etc.)
- Test integration with MAP manipulation functions
- Comprehensive test suite for
Documentation Files
vignettes/map-types.Rmd(new):- Comprehensive guide to working with MAP types
- Examples and use cases
man/map_unnest.Rd:- Function documentation (auto-generated from roxygen)
Implementation Checklist
Phase 1: Core Implementation
- Create
R/map_unnest.Rfile structure - Implement
map_unnest()generic - Implement
map_unnest.tbl_presto()method - Create
lazy_map_unnest_queryclass - Implement
op_vars.lazy_map_unnest_query() - Implement
op_grps.lazy_map_unnest_query() - Implement
op_sort.lazy_map_unnest_query() - Implement
op_frame.lazy_map_unnest_query() - Implement
sql_build.lazy_map_unnest_query() - Implement
sql_render.map_unnest_query()
Phase 2: Column Handling
- Support single MAP column
- Support multiple MAP columns
- Implement
key_toparameter - Implement
value_toparameter - Handle default column naming
- Support
names_sepparameter (for consistency with Feature 1)
Phase 3: SQL Generation
- Generate correct SQL for single MAP column
- Generate correct SQL for multiple MAP columns
- Handle subqueries correctly (wrap in parentheses when needed)
- Handle CTEs correctly
- Ensure proper column quoting
Phase 4: Integration
- Test with
filter() - Test with
group_by()andsummarize() - Test with
arrange() - Test with window functions
- Test with CTEs
- Test with MAP manipulation functions (map_keys, map_values, etc.)
Phase 5: Edge Cases
- Handle NULL maps
- Handle empty maps
- Handle maps with NULL keys
- Handle maps with NULL values
- Handle very large maps
- Handle nested MAP types (MAP within MAP)
Phase 6: Testing
- Write unit tests for basic functionality
- Write integration tests with live connections
- Write tests for edge cases
- Write tests for dplyr integration
- Ensure all tests pass
Phase 7: Documentation
- Add roxygen documentation to
map_unnest() - Add examples to function documentation
- Create
vignettes/map-types.Rmd(or add to existing vignette) - Update README with MAP examples
- Ensure documentation builds correctly
Testing Requirements
Unit Tests
-
Basic Functionality:
- Single MAP column unnesting
- Multiple MAP column unnesting
- Custom
key_toandvalue_tonames - Default column naming
-
SQL Generation:
- Verify SQL output is correct
- Test with subqueries
- Test with CTEs
- Test column quoting
-
Edge Cases:
- NULL maps
- Empty maps
- Maps with NULL keys/values
- Multiple MAP columns with different types
Integration Tests
-
dplyr Integration:
map_unnest()+filter()map_unnest()+group_by()+summarize()map_unnest()+arrange()map_unnest()+ window functionsmap_unnest()+ CTEs
-
MAP Function Integration:
map_filter()+map_unnest()map_keys()+presto_unnest()map_values()+presto_unnest()map_concat()+map_unnest()
-
Real Data Tests:
- Test with actual Presto tables containing MAP columns
- Test with various MAP key/value types
- Test performance with large maps
Test Pattern
test_that("map_unnest works with single MAP column", {
conn <- setup_live_connection()
# Create test table
DBI::dbExecute(conn, "
CREATE TABLE test_map AS
SELECT MAP(ARRAY['a', 'b'], ARRAY[1, 2]) AS my_map
")
on.exit(DBI::dbRemoveTable(conn, "test_map"), add = TRUE)
# Test unnesting
result <- tbl(conn, "test_map") %>%
map_unnest(my_map) %>%
collect()
expect_equal(nrow(result), 2L)
expect_equal(names(result), c("my_map", "my_map_key", "my_map_value"))
expect_equal(sort(result$my_map_key), c("a", "b"))
})Acceptance Criteria
- ✅
map_unnest()function exists and works for single MAP columns - ✅
map_unnest()supports multiple MAP columns - ✅
key_toandvalue_toparameters work correctly - ✅ Default column naming works as expected
- ✅ SQL generation is correct for all cases
- ✅ Works with all dplyr operations (filter, group_by, summarize, etc.)
- ✅ Works with CTEs and subqueries
- ✅ Handles edge cases (NULL, empty maps) correctly
- ✅ Integration with MAP manipulation functions works
- ✅ Comprehensive test suite passes
- ✅ Documentation is complete with examples
- ✅ Vignette or examples demonstrate common use cases
Dependencies
- Feature 2: Must have
MAP_ENTRIEStranslation implemented - Feature 2: All MAP function translations should be complete
- Can be developed in parallel but requires Feature 2 to be complete for full functionality
Related
Part of RPresto Improvement Plan - Feature 3