Description
- What's wrong?
When importing CSV and XLSX files, the File widget sometimes gets confused about the data types of columns and marks columns that contain only numerical values as categorical. One then has to set these to numeric manually, one column at a time.
- How can we reproduce the problem?
Add a File widget to the canvas, use it to read a data file, and check the column data type assignments.
I attach a sample data file that triggers the problem. As can be seen in the attached screenshot, several of the columns are interpreted as categorical even though the data in them are clearly numeric. This may be due to the data being sparse, but funnily enough, entirely empty columns are correctly parsed as numeric. I think that if all present data items in a column are numeric, the column should be interpreted as numeric; it should be quite rare to have floating-point values in a categorical variable. It's also clear that all values in a column are read, as they are shown in the table, so proper parsing should not require extra reading passes.
It would also be quite convenient if one could select multiple columns at once and re-encode them as the appropriate type, as this would be faster than having to go through each column individually. (This might of course require extensive surgery in the user interface.)
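To make the proposed rule concrete, here is a rough sketch in plain Python. It is not Orange's actual type-guessing code: the function name `guess_column_type` and the set of missing-value tokens are made up for illustration. The idea is simply that a column counts as numeric when every non-missing cell parses as a float, and an entirely empty column stays numeric, matching the behaviour described above.

```python
# Hypothetical sketch of the proposed rule; not taken from Orange's source.
# Assumption: cells arrive as strings, and these tokens mean "missing".
MISSING_TOKENS = {"", "?", "NA", "NaN", "nan"}


def guess_column_type(values):
    """Return "numeric" if all present values parse as floats, else "categorical"."""
    present = [v.strip() for v in values if v.strip() not in MISSING_TOKENS]
    if not present:
        # Entirely empty columns are already treated as numeric, per the report.
        return "numeric"
    for v in present:
        try:
            float(v)
        except ValueError:
            # A single non-numeric value is enough to call the column categorical.
            return "categorical"
    return "numeric"


sparse_column = ["1.5", "", "", "2", ""]
print(guess_column_type(sparse_column))
```

This uses a single pass over values that have already been read, so no extra reading of the file would be needed. A column of integer codes such as 0/1 would also come out as numeric under this rule, which may or may not be what one wants.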
- What's your environment?
- Operating system: macOS 11.1
- Orange version: 3.27.1
- How you installed Orange: Disk image at https://orangedatamining.com/download/#macos
