Add inner and left joins with a single join condition#19
Add inner and left joins with a single join condition#19
Conversation
| if (s.contains(prefix)) { | ||
| return s.replace(prefix, ""); | ||
| } | ||
| if (s != null && prefix != null) if (s.contains(prefix)) { |
| } | ||
| else | ||
| throw new IllegalArgumentException("FILE: " + fileName + " FOR RDFREADER DOES NOT EXIST"); | ||
| if (fileName != null) if ((new File(fileName)).exists()) { |
There was a problem hiding this comment.
I don't like the consecutive if on the same line, but I assume this is automatically done by some plugin
There was a problem hiding this comment.
Great work on the test cases!
| } | ||
| } | ||
|
|
||
| public List<Map<String, String>> renameDataframeColumn(List<Map<String, String>> dataFrame, String oldColumn, String newColumn) { |
There was a problem hiding this comment.
I would try to use the Java Stream API. Check the last message here: https://chat.openai.com/share/9e8482b2-3c1b-4ddc-b6f4-f131ce39f2f0
| return leftJoin(leftTable, rightTable, key, key); | ||
| } | ||
|
|
||
| public List<Map<String, String>> leftJoin(List<Map<String, String>> leftTable, List<Map<String, String>> rightTable, String leftKey, String rightKey) { |
There was a problem hiding this comment.
I have to check the code in details but I would use the Java Stream API, check https://chat.openai.com/share/9e8482b2-3c1b-4ddc-b6f4-f131ce39f2f0
| return innerJoin(leftTable, rightTable, key, key); | ||
| } | ||
|
|
||
| public List<Map<String, String>> innerJoin(List<Map<String, String>> leftTable, List<Map<String, String>> rightTable, String leftKey, String rightKey) { |
There was a problem hiding this comment.
Is it possible to refactor and reuse portion of the code between the two joins?
There was a problem hiding this comment.
Implement a default behaviour if no condition is provided? i.e., cross-join btw the dataframes
There was a problem hiding this comment.
I would add a separate crossJoin method. If the user knows that he wants to perform a crossJoin he would have to do so explicitly. Additionally this would make the template easier to reason about. Someone reading the template knows that i am doing a crossJoin and does not have to know that it is the behaviour of an innerJoin when no condition is specified. From what i gathered it is non-standard to do a cross join when no condition is given to a leftJoin. WDYT?
There was a problem hiding this comment.
I agree on having a separate method to do it explicitly but I would also add a method .join(table1,table2,key,key) that behaves like JOIN in SQL, i.e., that (if I got it correctly):
- performs by default an inner join if the conditions are provided
- performs by default a cross-join if the conditions are not provided (null/empty keys)
Implemented functionality in a VTL template to facilitate left and inner joins between two dataframes. Users can now utilize
$functions.leftJoin($df1, $df2, "key of df1", "key of df2")or$functions.innerJoin(...)within the template.For convenience, both functions support a "shortcut" when the joining column names are identical in both dataframes. This shortcut can be invoked as
$function.leftJoin($df1, $df2, "key").In cases where the resulting dataframe from the join operation contains multiple columns with the same name, the system now generates a runtime exception. This exception notifies the user regarding which columns need renaming in the input dataframes and instructs them to use the
renameDataframeColumnfunction.The renameDataframeColumn function is accessible from the VTL through
$functions.renameDataframeColumn($df, "oldColumn", "newColumn").One thing that is not addressed in this PR is the possibility of specifying multiple join conditions to be evaluated as a chain of ANDs. It should however be relatively straightforward to implement at a later date should we need it.