
Expanding functionality for computing V #461

Open
johnmarktaylor91 wants to merge 10 commits into main from jmt-new_get_v

Conversation

@johnmarktaylor91
Collaborator

This pull request is for expanding the functionality for computing V, the variance-covariance matrix of dissimilarity estimates. The reason for this is that the current toolbox does not allow for incorporating the estimates of the true distances between patterns into the calculation of V, but rather only allows for assuming these distances to be zero when estimating V. This is often a practical assumption, but 1) it might be nice to at least give users this option, and 2) in the case of Framed RSA, it ends up being an essential step.

This raises the question of how best to incorporate this into the toolbox. Relevant considerations are as follows:

  1. Heiko has very nicely derived the proper formula for V in the case of the crossnobis distance. This formula is the same as previous formulas for the signal-independent term, but differs in the signal-dependent term. It seems like it would be best to have a single function for computing V, with the option of either including or not including the true distances. I coded up a version where an RDM can be optionally provided for this purpose, along with an optional binary mask (np.array) if the user wishes to incorporate some distances into the estimation of V, but not others (in the case of Framed RSA, this is because there are "frozen" patterns that have no noise and that are of special interest).

  2. For completeness, it seems like the function ought to be able to handle euclidean/mahalanobis distance as well, not just crossnobis. It's not obvious to me whether this falls immediately out of the formula for V with crossnobis distance. If it is also not obvious to others then I will sit down and try to work through the math.

  3. In a separate (and in principle independent) pull request, I've implemented a function for computing sigma_k from a dataset, which doesn't currently exist in the toolbox.

  4. We will have to figure out how this new functionality ought to plug into other parts of the toolbox. In particular: currently the compare function takes an optional sigma_k argument when whitened RDM comparators are used, and this sigma_k is used to compute V under the hood (invisibly to the user) and whiten the RDMs being compared. This raises the question: if we want to give new options for computing V, how should this be made available to the user?

One option would be to add new optional arguments to the compare function (e.g., for whether or not to incorporate the true distances when estimating V, and an optional mask for specifying which distances). However, this has the potential downside of adding argument bloat to a core function in the toolbox that's only relevant when using whitened RDM comparators. I wonder about the viability of an alternative option: instead of making sigma_k an argument in compare, we instead make V an optional argument. This allows for the additional arguments to be contained in a separate get_v function (instead of adding bloat to compare), while still enabling the same default behavior (i.e., the default option for V would be equivalent to using sigma_k=identity, and not incorporating the true distances). I am open to other options but those are just some initial thoughts.
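To make that alternative concrete, here is a minimal sketch of the proposed plumbing. All names and signatures here are illustrative, not the actual rsatoolbox API, and the V-building formula is a placeholder: the point is only that the extra options (true RDM, mask) live in `get_v`, while `compare` takes an optional `v` whose default reproduces the sigma_k = identity behaviour:

```python
import numpy as np

def get_v(n_dists, true_rdm=None, mask=None):
    """Illustrative stand-in for a user-facing get_v.

    Returns the fast identity default when no true distances are
    supplied; otherwise adds a placeholder signal-dependent term
    built from the (optionally masked) true RDM vector. The real
    crossnobis/euclidean formulas would replace these placeholders.
    """
    v = np.eye(n_dists)  # placeholder signal-independent term
    if true_rdm is not None:
        d = np.asarray(true_rdm, dtype=float)
        if mask is not None:
            d = d * mask  # keep only the distances the user selected
        v = v + np.outer(d, d)  # placeholder signal-dependent term
    return v

def compare(rdm1, rdm2, v=None):
    """Whitened cosine with V as the optional argument instead of sigma_k."""
    rdm1 = np.asarray(rdm1, dtype=float)
    rdm2 = np.asarray(rdm2, dtype=float)
    if v is None:
        v = np.eye(len(rdm1))  # default: equivalent to sigma_k = identity
    prec = np.linalg.inv(v)
    return (rdm1 @ prec @ rdm2) / np.sqrt(
        (rdm1 @ prec @ rdm1) * (rdm2 @ prec @ rdm2))
```

With `v=None` this reduces to a plain cosine comparison, so the default behaviour (and the fast identity path) is preserved without any new arguments on `compare` itself.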

JohnMark Taylor and others added 7 commits October 25, 2025 09:10
…lculation. Since this requires knowing the number of folds, added n_folds as RDM descriptor in build_rdm if crossnobis is used.
…tion index is saved instead of the integer list
@HeikoSchuett
Contributor

I fixed two small bugs that made the tests fail on this pull request. Now things appear to run.
I am sorry about the automatic style fixes I added. This makes the pull request a bit messy to look at now.

In terms of the connection to the other methods:

  • for Euclidean and Mahalanobis distances the variances are quite similar (and easier to derive). I am not sure where we have a good written-out version of this equation, but we should have one, as in Jörn's paper.
  • using V directly as an input is OK from my point of view. What is important to preserve is the default behaviour, and that this default runs faster. If we assume independent measurements of the patterns (sigma_k = identity), the equivalence to CKA gives a much faster solution, which we should keep using! Perhaps this is also something worth checking for your framed RSA stuff.

@johnmarktaylor91
Collaborator Author

> for Euclidean and Mahalanobis distances the variances are quite similar (and easier to derive). I am not sure where we have a good written-out version of this equation, but we should have one, as in Jörn's paper.

Indeed, they end up being almost identical, if I'm not wrong: for crossnobis you divide the noise term by M-1, while for euclidean/mahalanobis you divide by M. I'm surprised it ended up being so simple, but the simulations seem to check out. I've tweaked the code to account for this. If this seems too good to be true, please let me know and I'll re-check the math.
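Reading M as the number of crossvalidation folds (per the n_folds descriptor added earlier in this PR), the difference described above reduces to a one-line branch. A hedged sketch, with the function name and arguments purely illustrative:

```python
def noise_divisor(method, n_folds):
    """Divisor for the noise term of V, per the observation above:
    M - 1 for crossnobis, M for euclidean/mahalanobis.
    Illustrative helper, not part of the rsatoolbox API."""
    if method == 'crossnobis':
        return n_folds - 1
    elif method in ('euclidean', 'mahalanobis'):
        return n_folds
    raise ValueError(f'unknown method: {method}')
```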

> using V directly as an input is OK from my point of view. What is important to preserve is the default behaviour, and that this default runs faster. If we assume independent measurements of the patterns (sigma_k = identity), the equivalence to CKA gives a much faster solution, which we should keep using! Perhaps this is also something worth checking for your framed RSA stuff.

Great suggestions, and sounds good. I've tweaked the functions in compare.py accordingly. There are a couple of remaining wrinkles: 1) comparing RDMs with negative Riemannian distance still takes sigma_k as an argument but doesn't use it to compute V, so this is one place where the new control flow breaks down (any thoughts on how to handle it?), and 2) for computing whitened correlation/cosine there is some functionality where you provide a 1-D vector for sigma_k (which I believe yields a diagonal matrix with the specified variances). Under the new setup, where the user-facing argument is v rather than sigma_k, it seems this would have to be removed or relocated elsewhere.
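For the second wrinkle, if I understand the 1-D behaviour correctly it is just promoting a vector of per-pattern variances to a diagonal matrix, so the shim is small wherever it ends up living. A sketch under that assumption (the helper name is made up for illustration):

```python
import numpy as np

def sigma_k_as_matrix(sigma_k):
    """Accept either a full pattern covariance matrix or a 1-D vector
    of per-pattern variances; a 1-D input becomes a diagonal matrix.
    Illustrative shim, assuming this is what the current 1-D handling
    in compare.py does under the hood."""
    sigma_k = np.asarray(sigma_k, dtype=float)
    if sigma_k.ndim == 1:
        return np.diag(sigma_k)
    return sigma_k
```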

One other question: do we want to keep get_v in compare.py, or should we move it to noise.py (or even add it to init somewhere so it can be called without the user having to go too deep down the hierarchy)? As it's now a user-facing function we might want to make it easy to find.
