Thanks for making this wonderful package.
I'm trying to compute mutual information in high dimensions, but the case I am interested in is exceptionally simple, so there may be a faster method than using the built-in function.
Specifically, I have a function $f(x_{1},\dots,x_{n})$ where $n$ is large, and I would like to estimate the mutual information between the random variable $F = f(X_{1},\dots,X_{n})$ and the independent and identically distributed (iid) random variables $X_{1},\dots,X_{n}$, i.e. $I(F;\, X_{1},\dots,X_{n})$, given a large number of samples. Since all of the variables except $F$ are iid, I'm hoping the calculation simplifies dramatically.
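For concreteness, here is a minimal sketch of what I am doing now. The function `f`, the sizes `n` and `N`, and the `k=3` choice are just stand-ins for my actual problem, and I am assuming I have the `entropy_estimators.mi` interface right (samples as rows); please correct me if not:

```python
import numpy as np
from npeet import entropy_estimators as ee

rng = np.random.default_rng(0)

n = 20        # number of iid inputs (much larger in my real problem)
N = 10_000    # number of samples (also much larger in practice)

# Stand-in for my actual function f(x_1, ..., x_n)
def f(x):
    return np.tanh(x.sum(axis=1))

X = rng.normal(size=(N, n))   # rows are iid samples of (X_1, ..., X_n)
F = f(X).reshape(-1, 1)       # corresponding samples of F

# Direct estimate of I(F; X_1, ..., X_n) -- the call I would like to speed up
print(ee.mi(F, X, k=3))
```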
The documentation has a comment which is rather suggestive but I confess I don't really understand what is being said:
"On the other hand, in high-dimensions, KDTree’s are not very good, and you might be reduced to the speed of brute force $N^2$ search for nearest neighbors. If $N$ is too big, I recommend using multiple, random, small subsets of points ($N′ << N$) to estimate entropies, then average over them."
If I have $N$ samples of the form $\{F, X_{1},\dots, X_{n}\}_{i}$, where $i$ runs from 1 to $N$, is this just saying that one should take a random subset of $M < N$ of these samples, compute the mutual information on it, do this multiple times, and then average the results? Or is it saying to somehow construct the mutual information by averaging over lower-dimensional samples $\{F, X_{1},\dots, X_{m}\}_{i}$, where $m < n$? If the former, why is this advantageous compared to running the built-in method on all of the samples? If the latter, how exactly does this work?
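To make sure I'm reading the first interpretation correctly, here is a sketch of what I imagine it means. The helper `subsampled_mi` and the values of `M` and `n_repeats` are placeholders of my own, not anything from the package, and `F`, `X` are the arrays from the snippet above:

```python
import numpy as np
from npeet import entropy_estimators as ee

def subsampled_mi(F, X, M=2000, n_repeats=20, k=3, seed=0):
    """First interpretation: estimate the MI on several random subsets
    of M < N samples and average the estimates."""
    rng = np.random.default_rng(seed)
    N = X.shape[0]
    estimates = []
    for _ in range(n_repeats):
        idx = rng.choice(N, size=M, replace=False)   # random subset of rows
        estimates.append(ee.mi(F[idx], X[idx], k=k))
    return float(np.mean(estimates))

# print(subsampled_mi(F, X))   # F, X as in the snippet above
```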
However, this question about the documentation may not be relevant. Perhaps there is a more direct way of answering my primary question.
Thanks again for the wonderful code!