Description
I think that we need an AbstractCategoricalDiskArray to better support categorical variables stored on disk. This would for example make it easier to use the nc_enum types in netCDF files. Related to JuliaGeo/NCDatasets.jl#143
A concrete example of a categorical variable stored on disk is in this netCDF cloud mask (requires free sign up to download). It contains a variable named "cloud_state" which is an nc_enum. The raw values are stored as Int8, the name of the enum is "cloud_mask", and the following mapping can be read from the variable:
Dict{Int8, String} with 10 entries:
0 => "Not processed (no or corrupt data)"
4 => "Dust contaminated"
5 => "Dust filled (opaque)"
6 => "Ash contaminated"
2 => "Cloud contaminated (partial or semitransparent cloud)"
7 => "Ash filled (opaque)"
9 => "Undefined"
8 => "Snow or ice contaminated"
3 => "Cloud filled (opaque cloud filled)"
1 => "Cloud free (no cloud, snow or ice)"
Implementing an AbstractCategoricalDiskArray would make it easier to support these types of variables in NCDatasets.jl. My initial idea is to make it compatible with CategoricalArrays.jl, or even to provide it as an extension.
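To make the idea concrete, here is a minimal sketch in plain Julia (Base only, no NCDatasets.jl or CategoricalArrays.jl calls) of what decoding such a variable amounts to. The `raw` array is made-up sample data, not values from the file, and only a subset of the `cloud_state` mapping is reproduced.

```julia
# Raw values as they might be stored on disk (made-up sample, not real data).
raw = Int8[1 3; 2 0]

# Raw value => label, a subset of the `cloud_state` mapping listed above.
mapping = Dict{Int8,String}(
    0 => "Not processed (no or corrupt data)",
    1 => "Cloud free (no cloud, snow or ice)",
    2 => "Cloud contaminated (partial or semitransparent cloud)",
    3 => "Cloud filled (opaque cloud filled)",
)

# Decode every raw value into its label; the result has the same shape as `raw`.
labels = map(v -> mapping[v], raw)
```

A categorical disk array would do this kind of lookup lazily, only for the block of values that is actually read.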
Idea for the interface
AbstractCategoricalDiskArray{T, N, R} <: AbstractDiskArray{CategoricalArrays.CategoricalValue{T,R},N}
## methods to implement for abstract type
Base.collect(a::AbstractCategoricalDiskArray{T, N, R})::CategoricalArrays.CategoricalArray{T,N,R}
Base.getindex(a::AbstractCategoricalDiskArray{T, N1, R}, inds...)::CategoricalArrays.CategoricalArray{T,N2,R}
Base.setindex!(a::AbstractCategoricalDiskArray, v::CategoricalArrays.CategoricalArray, i...)
## methods to implement for concrete type
getvaluearray(a::AbstractCategoricalDiskArray{T, N, R})::AbstractDiskArray{R,N}
getmapping(a::AbstractCategoricalDiskArray{T, N, R})::AbstractDict{R,T}
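As a rough illustration of how generic code could be built on the two accessors, here is a sketch that uses an in-memory stand-in instead of a real disk array. The type `InMemoryCategorical` and the helper `readlabels` are hypothetical names invented for this example, and the sketch deliberately avoids DiskArrays.jl and CategoricalArrays.jl so that it runs with Base alone; it returns plain labels rather than a `CategoricalArray`.

```julia
# Hypothetical in-memory stand-in for a categorical variable stored on disk.
struct InMemoryCategorical{T,R<:Integer,N}
    codes::Array{R,N}     # raw values, e.g. Int8
    mapping::Dict{R,T}    # raw value => label
end

# The two accessors proposed for concrete types.
getvaluearray(a::InMemoryCategorical) = a.codes
getmapping(a::InMemoryCategorical) = a.mapping

# Generic decoding that relies only on the two accessors: read the requested
# block of raw values, then translate each value into its label.
function readlabels(a, inds...)
    block = getvaluearray(a)[inds...]
    m = getmapping(a)
    return block isa AbstractArray ? map(c -> m[c], block) : m[block]
end

a = InMemoryCategorical(Int8[0 1; 1 0], Dict{Int8,String}(0 => "no", 1 => "yes"))
firstrow = readlabels(a, 1, :)   # labels for the first row
single = readlabels(a, 2, 1)     # label of a single element
```

With this split, a concrete type only has to say where the raw values and the mapping live, and the abstract type can supply the indexing methods.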
Note that using CategoricalArrays comes with the following limitations: {T <: Union{AbstractChar, AbstractString, Number}, R <: Integer}. CategoricalArrays also has some restrictions regarding the mapping, so some conversion might be needed.
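One conversion that may be needed, assuming (this is not stated above) that reference codes have to be 1-based positions in an ordered list of levels, is remapping the raw enum values. A sketch in plain Julia, with hypothetical variable names, made-up `raw` data, and a subset of the `cloud_state` mapping:

```julia
# Subset of the `cloud_state` mapping; the raw values are neither 1-based nor contiguous.
mapping = Dict{Int8,String}(
    0 => "Not processed (no or corrupt data)",
    4 => "Dust contaminated",
    9 => "Undefined",
)

# Fix an order for the labels by sorting the raw values.
rawvalues = sort!(collect(keys(mapping)))
labels = [mapping[v] for v in rawvalues]

# Raw value => 1-based position of its label in `labels`.
refof = Dict(v => i for (i, v) in enumerate(rawvalues))

# Convert raw values (made-up sample) into positions.
raw = Int8[9, 0, 4, 4]
refs = [refof[v] for v in raw]
```

Indexing `labels[refs]` then gives the same labels as looking each raw value up in `mapping` directly.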