Expanding functionality for computing V #461
…lculation. Since this requires knowing the number of folds, added n_folds as RDM descriptor in build_rdm if crossnobis is used.
…tion index is saved instead of the integer list
I fixed two small bugs that made the tests fail on this pull request. Now things appear to run. In terms of the connection to the other methods:
# Conflicts:
#	src/rsatoolbox/rdm/compare.py
Indeed they end up being almost identical, if I'm not wrong: for crossnobis you divide the noise term by M-1, for euclidean/mahalanobis you divide by M (where M is the number of folds). I'm surprised it ended up being so simple, but simulations seem to check out. I've tweaked the code to account for this. If this seems too good to be true, please let me know and I'll re-check the math.
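The difference in the noise term described above can be sketched in a few lines of NumPy. This is a minimal sketch, not the code in compare.py: the helper names (pairwise_contrasts, noise_term) are hypothetical, sigma_k is assumed to be a full (n_cond, n_cond) matrix, and the leading factor 2/M is an assumed constant shared by both distance types. Only the M - 1 versus M divisor comes from the comment.

```python
import numpy as np


def pairwise_contrasts(n_cond):
    """Hypothetical helper: one row e_i - e_j per condition pair (i < j)."""
    pairs = [(i, j) for i in range(n_cond) for j in range(i + 1, n_cond)]
    c_mat = np.zeros((len(pairs), n_cond))
    for row, (i, j) in enumerate(pairs):
        c_mat[row, i] = 1.0
        c_mat[row, j] = -1.0
    return c_mat


def noise_term(sigma_k, n_folds, method="crossnobis"):
    """Signal-independent part of V under an assumed form.

    Xi = C sigma_k C^T is the noise covariance of the pattern differences.
    The assumed term is 2 * (Xi * Xi) / (M * m), with m = M - 1 for
    crossnobis and m = M for euclidean/mahalanobis.
    """
    c_mat = pairwise_contrasts(sigma_k.shape[0])
    xi = c_mat @ sigma_k @ c_mat.T
    if method == "crossnobis":
        divisor = n_folds - 1
    elif method in ("euclidean", "mahalanobis"):
        divisor = n_folds
    else:
        raise ValueError(f"unsupported method: {method}")
    # Elementwise square of Xi, scaled by the assumed constant.
    return 2.0 * (xi * xi) / (n_folds * divisor)
```

With everything else held fixed, the two variants differ only by the factor M / (M - 1).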
Great suggestions, and sounds good. I've tweaked the functions in compare.py accordingly. There are a couple of remaining wrinkles: 1) comparing RDMs with the negative Riemannian distance still takes sigma_k as an argument but doesn't use it to compute V, so this is one place where the new control flow breaks down (any thoughts on how to handle this?); and 2) for whitened correlation/cosine, there is functionality where you can provide a 1-D vector for sigma_k (which I believe yields a diagonal matrix with the specified variances). Under the new setup, where the user-facing argument is v rather than sigma_k, it seems this would have to be removed or moved elsewhere. One other question: do we want to keep get_v in compare.py, or should we move it to noise.py (or even expose it in an __init__.py somewhere so it can be called without the user having to go too deep into the hierarchy)? Since it's now a user-facing function, we might want to make it easy to find.
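One way to keep the 1-D shortcut without tying it to compare is to normalize sigma_k at the start of get_v (or a helper it calls). The sketch below uses a hypothetical helper, as_sigma_matrix, and assumes, as the comment suggests, that a 1-D vector means per-condition variances on the diagonal.

```python
import numpy as np


def as_sigma_matrix(sigma_k, n_cond):
    """Hypothetical helper: normalize a sigma_k argument to (n_cond, n_cond).

    None -> identity; 1-D vector of per-condition variances -> diagonal
    matrix; 2-D array -> returned after a shape check.
    """
    if sigma_k is None:
        return np.eye(n_cond)
    sigma_k = np.asarray(sigma_k, dtype=float)
    if sigma_k.ndim == 1:
        if sigma_k.shape[0] != n_cond:
            raise ValueError("1-D sigma_k must have one variance per condition")
        return np.diag(sigma_k)
    if sigma_k.shape != (n_cond, n_cond):
        raise ValueError("2-D sigma_k must have shape (n_cond, n_cond)")
    return sigma_k
```

Keeping this normalization inside get_v would preserve the 1-D shortcut for users while letting compare accept only a finished V.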
This pull request expands the functionality for computing V, the variance-covariance matrix of dissimilarity estimates. The motivation is that the toolbox currently cannot incorporate estimates of the true distances between patterns into the calculation of V; it can only assume these distances are zero. That is often a practical assumption, but 1) it would be nice to at least give users the option, and 2) in the case of Framed RSA, incorporating the distances is an essential step.
This raises the question of how best to incorporate this into the toolbox. Relevant considerations are as follows:
Heiko has very nicely derived the proper formula for V in the case of the crossnobis distance. This formula is the same as previous formulas for the signal-independent term, but differs in the signal-dependent term. It seems like it would be best to have a single function for computing V, with the option of either including or not including the true distances. I coded up a version where an RDM can be optionally provided for this purpose, along with an optional binary mask (np.array) if the user wishes to incorporate some distances into the estimation of V, but not others (in the case of Framed RSA, this is because there are "frozen" patterns that have no noise and that are of special interest).
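A minimal sketch of the interface described above (an optional vector of true distances plus an optional mask) might look as follows. Much of it is assumed for illustration: get_v_sketch is a hypothetical name, the constants 4/M and 2/(M(M - 1)) are an assumed form for the signal-dependent and signal-independent terms rather than Heiko's derivation, and masked-out distances are simply treated as zero, matching the current default (note that this also changes the off-diagonal entries of Delta that involve those distances). The one step that does not depend on these assumptions is recovering Delta = C G C^T from the distances alone via Delta = -0.5 C D C^T, which holds because each row of the pair-contrast matrix C sums to zero.

```python
import numpy as np


def pairwise_contrasts(n_cond):
    """Rows are e_i - e_j for every pair i < j, in upper-triangle order."""
    i_idx, j_idx = np.triu_indices(n_cond, k=1)
    rows = np.arange(len(i_idx))
    c_mat = np.zeros((len(i_idx), n_cond))
    c_mat[rows, i_idx] = 1.0
    c_mat[rows, j_idx] = -1.0
    return c_mat


def get_v_sketch(sigma_k, n_folds, dissimilarities=None, mask=None):
    """Hypothetical crossnobis V with an optional signal-dependent term.

    sigma_k: (n_cond, n_cond) noise covariance across conditions.
    dissimilarities: optional vector of true squared distances in
        upper-triangle order; if None, all distances are assumed zero.
    mask: optional boolean vector; entries where mask is False are zeroed.
    """
    n_cond = sigma_k.shape[0]
    c_mat = pairwise_contrasts(n_cond)
    xi = c_mat @ sigma_k @ c_mat.T
    # Signal-independent term (assumed constant 2 / (M (M - 1))).
    v = 2.0 * (xi * xi) / (n_folds * (n_folds - 1))
    if dissimilarities is not None:
        d_vec = np.asarray(dissimilarities, dtype=float)
        if mask is not None:
            d_vec = np.where(mask, d_vec, 0.0)
        # Rebuild the full symmetric distance matrix D.
        d_mat = np.zeros((n_cond, n_cond))
        d_mat[np.triu_indices(n_cond, k=1)] = d_vec
        d_mat = d_mat + d_mat.T
        # Delta = C G C^T recovered from distances: -0.5 * C D C^T.
        delta = -0.5 * c_mat @ d_mat @ c_mat.T
        # Signal-dependent term (assumed constant 4 / M).
        v = v + 4.0 * (delta * xi) / n_folds
    return v
```

Passing no distances, all-zero distances, or an all-False mask returns the same V, so the current behaviour is the default.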
For completeness, it seems like the function ought to be able to handle euclidean/mahalanobis distance as well, not just crossnobis. It's not obvious to me whether this falls immediately out of the formula for V with crossnobis distance. If it is also not obvious to others then I will sit down and try to work through the math.
In a separate (and in principle independent) pull request, I've implemented a function for computing sigma_k from a dataset, which doesn't currently exist in the toolbox.
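For reference, one plausible way to estimate sigma_k from fold-wise pattern estimates is sketched below. It is a hypothetical helper (estimate_sigma_k), not the function in the separate pull request: it assumes the data are arranged as (n_folds, n_cond, n_channels), treats each fold's deviation from the cross-fold mean as noise, and averages the resulting condition-by-condition covariance over channels. Whether sigma_k should describe the noise of a single fold or of the fold-averaged pattern is a convention this sketch does not settle.

```python
import numpy as np


def estimate_sigma_k(patterns):
    """Hypothetical estimate of sigma_k from fold-wise pattern estimates.

    patterns: array of shape (n_folds, n_cond, n_channels).
    Returns the (n_cond, n_cond) covariance of the per-fold estimation
    noise, averaged over channels.
    """
    patterns = np.asarray(patterns, dtype=float)
    n_folds, _, n_channels = patterns.shape
    if n_folds < 2:
        raise ValueError("need at least two folds to estimate noise")
    # Deviation of each fold's estimate from the cross-fold mean.
    resid = patterns - patterns.mean(axis=0, keepdims=True)
    # Sum outer products over folds and channels -> (n_cond, n_cond).
    sigma_k = np.einsum("fcp,fdp->cd", resid, resid)
    # Unbiased across folds, averaged over channels.
    return sigma_k / ((n_folds - 1) * n_channels)
```

Because the fold mean is subtracted first, any signal that is identical across folds drops out of the estimate.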
We will have to figure out how this new functionality ought to plug into other parts of the toolbox. In particular: currently the compare function takes an optional sigma_k argument when whitened RDM comparators are used, and this sigma_k is used to compute V under the hood (invisibly to the user) and whiten the RDMs being compared. This raises the question: if we want to give new options for computing V, how should this be made available to the user?

One option would be to add new optional arguments to the compare function (e.g., for whether or not to incorporate the true distances when estimating V, and an optional mask for specifying which distances). However, this has the potential downside of adding argument bloat to a core function in the toolbox that's only relevant when using whitened RDM comparators. I wonder about the viability of an alternative option: instead of making sigma_k an argument in compare, we instead make V an optional argument. This allows the additional arguments to be contained in a separate get_v function (instead of adding bloat to compare), while still enabling the same default behavior (i.e., the default option for V would be equivalent to using sigma_k=identity and not incorporating the true distances). I'm open to other options; these are just some initial thoughts.
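To make the proposed control flow concrete, here is a sketch of a whitened cosine comparator that takes V directly instead of sigma_k. The names whitened_cosine and default_v are hypothetical, and the comparator uses the standard V-whitened cosine form, which is not necessarily how compare.py implements it. The default V corresponds to sigma_k = identity with all true distances assumed zero, up to a scale factor; because any overall scale of V cancels in the cosine, the default does not need the fold-count constants.

```python
import numpy as np


def default_v(n_cond):
    """V for sigma_k = identity and zero true distances, up to a scale."""
    i_idx, j_idx = np.triu_indices(n_cond, k=1)
    n_pairs = len(i_idx)
    c_mat = np.zeros((n_pairs, n_cond))
    c_mat[np.arange(n_pairs), i_idx] = 1.0
    c_mat[np.arange(n_pairs), j_idx] = -1.0
    xi = c_mat @ c_mat.T
    return xi * xi


def whitened_cosine(vec1, vec2, v=None):
    """Cosine of two dissimilarity vectors in the V^{-1} inner product.

    Hypothetical signature: `v` replaces `sigma_k` as the optional argument.
    """
    vec1 = np.asarray(vec1, dtype=float)
    vec2 = np.asarray(vec2, dtype=float)
    if v is None:
        # Recover n_cond from n_pairs = n_cond * (n_cond - 1) / 2.
        n_cond = int(round((1 + np.sqrt(1 + 8 * vec1.size)) / 2))
        v = default_v(n_cond)
    # Apply V^{-1} via linear solves instead of forming the inverse.
    v_inv_vec1 = np.linalg.solve(v, vec1)
    v_inv_vec2 = np.linalg.solve(v, vec2)
    num = vec1 @ v_inv_vec2
    den = np.sqrt((vec1 @ v_inv_vec1) * (vec2 @ v_inv_vec2))
    return num / den
```

Under this split, a get_v function would own everything else (sigma_k, n_folds, an optional RDM and mask), and compare would only ever see the resulting matrix.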