New feature: Get top n columns #55
michaelkonstantinou wants to merge 4 commits into delftdata:master from
Hello @Mikhail-Konstantinou, first of all, thank you for your contribution. It's great to see outside contributions being made. Overall the code looks good! I have a couple of comments for you to take into consideration:
EDIT: After a second look I dropped some of my comments, so I've adjusted the post.
Hello, and thanks for your input. I believe the final changes address both of the issues/suggestions you mentioned.
PS. I checked the conflicting files that GitHub complains about, and they are not related to this function. I believe you can merge easily by selecting the lines of code you think are correct.
Resolves #52
As stated in issue #52, it would be useful to be able to get the top n similar columns when analyzing the data. Since the issue is still open, I decided to add this feature myself, as I could use it during my data preprocessing.
Solution
This pull request adds two new methods to the metrics.py file.
I am not quite sure what exactly the OP wanted or what the team would prefer, but at least a boilerplate is established, and it can easily be modified if more information should be added (e.g. adding the float value next to each result).
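For readers who want a rough idea of what "top n" selection looks like, here is a minimal sketch. It is not the code from this pull request: the function names `top_n_matches` and `top_n_matches_with_scores` are hypothetical, and it assumes that matcher output is a dictionary mapping a pair of (table, column) identifiers to a float similarity score.

```python
from typing import Dict, List, Tuple

# Assumed shape of the matcher output (an illustration only):
# ((table_a, column_a), (table_b, column_b)) -> similarity score
ColumnRef = Tuple[str, str]
ColumnPair = Tuple[ColumnRef, ColumnRef]
Matches = Dict[ColumnPair, float]


def top_n_matches_with_scores(matches: Matches, n: int) -> List[Tuple[ColumnPair, float]]:
    """Return the n highest-scoring column pairs with their scores, best first."""
    if n < 0:
        raise ValueError("n must be non-negative")
    # sorted() is stable, so pairs with equal scores keep their original order.
    ranked = sorted(matches.items(), key=lambda item: item[1], reverse=True)
    return ranked[:n]


def top_n_matches(matches: Matches, n: int) -> List[ColumnPair]:
    """Return only the n highest-scoring column pairs, without the scores."""
    return [pair for pair, _score in top_n_matches_with_scores(matches, n)]


if __name__ == "__main__":
    # Hypothetical scores, made up for this example.
    example: Matches = {
        (("table_1", "name"), ("table_2", "full_name")): 0.9,
        (("table_1", "id"), ("table_2", "key")): 0.4,
        (("table_1", "city"), ("table_2", "town")): 0.7,
    }
    for pair in top_n_matches(example, 2):
        print(pair)
```

Splitting the work into a scored and an unscored variant mirrors the suggestion above that the float value could be added next to each result later without changing the ranking logic.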
Additional changes
Added a new example to demonstrate the new feature. It uses a different algorithm, though, because COMA compares the names as well, which might not be very informative in this case.
Notes
I hope this is useful. Let me know if you prefer any changes or any additional functionality.