Allow provider-specific data types#529
Conversation
Co-authored-by: Antanas Vaitkus <antanas.vaitkus90@gmail.com>
…a types off from basic data types.
Co-authored-by: Antanas Vaitkus <antanas.vaitkus90@gmail.com>
…with "namespace-specific".
|
I have a general question: how to understand, if a prefix denotes a provider-specific datatype, not a general-purpose common datatype. In other words, compare |
|
apparently, by underscore as the first symbol, correct? |
Co-authored-by: Antanas Vaitkus <antanas.vaitkus90@gmail.com>
Co-authored-by: Antanas Vaitkus <antanas.vaitkus90@gmail.com>
Provider-specific data type names are only going to appear in |
Co-authored-by: Antanas Vaitkus <antanas.vaitkus90@gmail.com>
|
@merkys should we merge? |
|
Ok, I'll take a deeper look at this over the next couple of days. |
Likewise, as I want to use this pretty much straight away! |
ml-evs
left a comment
There was a problem hiding this comment.
Thanks for writing this up @merkys!
I only have one general comment; should we encourage the custom type to be described in the description of the fields that use it? Otherwise I don't see where this information would go (even if its only human-readable anyway). I've added two suggestions below along these lines.
rartino
left a comment
There was a problem hiding this comment.
Great work with this!
The one blocking issue I see is that the PR misses a corresponding edit under "Property Definition keys from JSON Schema" -> "type", where in the current edition of the PR (see below) should say that when x-optimade-type is set to a string that indicates a provider-specific datatype, "type" MUST be "string". (I see @ml-evs is "fixing" this on line 2171, but whether or not that is added, the change should also be done on in that part of the specification).
(I couldn't add this as an "edit" comment, it is too far away...)
I would be ok with merging this for string custom datatypes only. However, I think it is worthwhile to pause for one minute here to consider what I wrote in a reply to one of @ml-evs suggestions: can we lightly edit the PR to allow custom data types to be represented using any of the basic, list or dict types?
I foresee that basically the first thing that will happen once we implement this is someone coming along and asking "oh, but in our community we have this data thing that is always represented as an NxM matrix, and for this data the whole community obviously want the ">" operator to mean Z." ...
|
Thanks for reviews, @ml-evs and @rartino! Data type description in both human-readable text and Property Definitions is noncontroversial, thus I will address it later on. Now on the extension beyond the string-based data types. Historically, I drafted this PR having SMILES in mind, thus strings were enough for me. Moreover, we already have strings with internal semantics (timestamps). Now I do not see why we cannot extend this to all basic+list+dict data types, but yes, this needs some more thought. Some issues coming to my mind as I am writing this:
|
|
Just a note as I prepare for the meeting this afternoon: lets discuss the string-based vs. general custom data types at the meeting and decide whether it is fine with just string support for now. @merkys Can you please look at @ml-evs edit suggestion above and act on it (either do it, reply, or close) |
|
During an online discussion on 2025-04-03 it has been decided to allow provider-specific data types to be based on any OPTIMADE data type, not just string. Thus the PR has to be adjusted to accommodate that. |
I have updated the PR to reflect this decision. |
Co-authored-by: Matthew Evans <git@ml-evs.science>
This PR attempts to generalize #436 by allowing to define provider-specific data types. The following conditions for such data types apply:
The latter condition might be too strict for some uses, but since casting is unclear at the moment, let us leave such extensions for future.
Merging this PR will allow to reassign #392 and #436 (SMILES) to
_cheminfo_namespace where it can become a property with special set of rules for comparison.