ADJUSTMENT by Vannghia69 · Pull Request #10 · schlegelp/tanglegram

Vannghia69 · 2021-06-23T23:03:10Z

Updated:

Redefine the "entanglement" function
Rewrite the "refine" function
Change the name of the new untangle method from "permutations" to "ShUnTan"


schlegelp

Thanks for this PR, there is some good stuff in there. Unfortunately, you are also making changes that I'm not on board with (see my comments): they have a big impact on how the library works and what it can be used for. I'm happy to discuss options on how to proceed, though.

schlegelp · 2021-06-24T07:35:10Z

tanglegram/tangle.py

-                                method=sort, **sort_kwargs)
-
-    fig = pylab.figure(figsize=(8, 8))
+def draw_tanglegram(linkage_1, linkage_2, labels1, labels2, color_by_diff=True, dend_kwargs={}):


Any reason why you dropped the entire docstring and a couple parameters?

Also, am I right that you want to change the workflow such that people produce the linkage themselves (i.e. no more DataFrames), untangle it and then pass it to the plotting function?

Hi. Sorry for dropping parameters in a way that changes how the library is used. I deleted them because I did not use them, but you are right: those options should remain available to users.

Yes. The workflow I had in mind is that users produce the linkages first, then apply the untangle methods, and finally call the "draw" function to get the desired tanglegram layout.

schlegelp · 2021-06-24T07:43:59Z

tanglegram/tangle.py

-def untangle_random_search(link1, link2, labels1, labels2, R=100, L=1.5):
+def untangle_random_search(link1, link2, labels1, labels2, R=1000, L=1.0):
    """Untangle dendrogram using a simple random search.
-


I'm not really on board with you removing the empty lines in the docstrings.

schlegelp · 2021-06-24T07:45:22Z

tanglegram/tangle.py

-                            often one will want to use 0, 1, 1.5 or 2:
-                            ``sum(abs(x-y)^L)``.
-
+def entanglement(link1, link2):


Seems like L is still accepted (and other functions use it as parameter) but entanglement now ignores it and just does squared distance?

This is my mistake. L should be selectable as 0, 1, 1.5 or 2. Because I always use L = 2, I hard-coded it. I am fixing it.
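For reference, the removed docstring describes the penalty as ``sum(abs(x-y)^L)``. Below is a minimal sketch of how the exponent changes the result, assuming `x` and `y` hold the leaf positions of matched labels in the two dendrograms. The name `position_penalty` is hypothetical and not part of tanglegram, and any normalisation the library applies is left out.

```python
def position_penalty(x, y, L=1.5):
    """Unnormalised sum of |x_i - y_i| ** L over paired leaf positions."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    return sum(abs(a - b) ** L for a, b in zip(x, y))


# Hypothetical positions of the same four labels in the left and right tree.
left_pos = [0, 1, 2, 3]
right_pos = [3, 1, 2, 0]

# Larger L penalises long crossings more heavily than short ones.
for L in (1, 1.5, 2):
    print(L, position_penalty(left_pos, right_pos, L=L))
```

Hard-coding the squared distance corresponds to calling this with `L=2` regardless of what the caller passed, which is the behaviour questioned above.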

schlegelp · 2021-06-24T07:47:34Z

tanglegram/tangle.py

    exist_in_both = list(set(lindex1) & set(lindex2))
+    ix = np.arange(max(len(lindex1), len(lindex2)))

    if not exist_in_both:


If you are not using labels but just indices, then there is no point in checking if they exist in both. Or am I missing something?

The "leaves_list" function returns the list of leaves' indices. So we only work with indices. In doing so we have to assume that the relationship between indices and labels is one-to-one and identical in both trees.

That's fair but I want/need to cater for scenarios where that's not the case.

Totally agree. We should use labels instead of indices.

schlegelp · 2021-06-24T07:48:57Z

tanglegram/tangle.py

-                        index=labelsB)
+    # Mapping the "number" (1 til tree size) in the left tree with the right tree
+    matching_leaf_vector = np.zeros(max(len(lindex1), len(lindex2)))
+    for i in lindex2:


I haven't tested it properly but this for loop can't possibly be faster than the previous array-based solution. Could you elaborate a bit on what the advantage of doing it this way is?

The previous array-based solution is actually not correct. Here we want to number the leaves of the left tree (1 to tree size), look up those numbers in the right tree, and then calculate the difference between the numbers in the two trees, rather than the difference between indices. Is that the matching you do with the dict? Honestly, I do not understand that part.

The old calculation gives results that differ from those obtained in R.

I disagree re the existing solution being incorrect - it might not yield exactly the same results as in R but certainly does the same job. Using {label: index} dicts is necessary for scenarios where labels in both dendrograms don't match up 1:1.

I agree it is necessary to use such a dictionary. But I am not sure it matches the numbers (from 1 to tree size) in the left tree with the right tree. What I wanted can be illustrated with the following example:
Left tree: A D E C F
Right tree: C D A F E
Giving objects in the left tree numbers from 1 to tree size yields: 1, 2, 3, 4, 5
Matching these numbers with the right tree: 4, 2, 1, 5, 3

That's pretty much what the existing function does with the dict.

Then that's my bad for not realizing it.
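The matching from the example above (left tree A D E C F, right tree C D A F E) can be reproduced with a {label: number} dict. This is a sketch of the idea discussed in this thread, not the library's actual code, and the variable names are illustrative only.

```python
left = ["A", "D", "E", "C", "F"]
right = ["C", "D", "A", "F", "E"]

# Number the leaves of the left tree from 1 to tree size.
left_number = {label: i for i, label in enumerate(left, start=1)}

# Read those numbers off in the order the labels appear in the right tree.
matched = [left_number[label] for label in right]
print(matched)  # [4, 2, 1, 5, 3]

# Differences against the right tree's own numbering 1..n.
diffs = [abs(m - i) for i, m in enumerate(matched, start=1)]
print(diffs)  # [3, 0, 2, 1, 2]
```

Because the lookup goes through labels rather than raw indices, the same approach still works when the two trees list their labels in different input orders.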

schlegelp · 2021-06-24T07:53:04Z

tanglegram/tangle.py



-def untangle(link1, link2, labels1, labels2, method='random', L=1.5, **kwargs):
+def untangle(link1, link2, labels1, labels2, method='random', L=2.0, **kwargs):


Looks like labels are still accepted but essentially ignored in favour of just using the linkage. This implies that the labels in each linkage always match perfectly (i.e. index left 1 = index 1 right and so on). This may work for toy examples but will not be true for most real world examples. This change is unfortunately a deal breaker.

The assumption is that the sets of objects (labels) in the two dendrograms are in one-to-one correspondence. This does occur in practice, for example when different hierarchical clustering algorithms are applied to the same dataset.
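To make the disagreement concrete, here is a small sketch of one way to compare leaf positions when the two label sets do not match 1:1: restrict the comparison to labels present in both trees. The helper `shared_positions` is hypothetical, it assumes labels are unique within each tree, and it is not a claim about how tanglegram handles this case.

```python
def shared_positions(left, right):
    """Return (label, left position, right position) for labels in both trees."""
    left_pos = {label: i for i, label in enumerate(left)}
    right_pos = {label: i for i, label in enumerate(right)}
    # Keep the left tree's order; skip labels missing from the right tree.
    shared = [label for label in left if label in right_pos]
    return [(label, left_pos[label], right_pos[label]) for label in shared]


# "B" only exists on the left, "X" only on the right.
left = ["A", "B", "C", "D"]
right = ["D", "X", "A", "C"]

for label, i, j in shared_positions(left, right):
    print(label, i, j)
```

An index-only approach has no way to express that "B" and "X" are unmatched, which is why labels need to stay part of the calculation.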

Vannghia69 · 2021-06-24T21:16:15Z

tanglegram/tangle.py

@@ -720,7 +620,6 @@ def shuffle_dendogram(link, copy=True):

 def leaf_order(link, labels=None, as_dict=True):


I still find that the entanglement behaves strangely. I think I know the reason: the problem comes from the leaf_order function.

leafs_ix = sclust.hierarchy.leaves_list(link) returns the indices of the objects in the order they appear in the dendrogram (these indices refer to positions in "labels").

    if as_dict:
        if not isinstance(labels, type(None)):
            return dict(zip(labels, leafs_ix))

matches the labels in "labels" with the indices in sclust.hierarchy.leaves_list(link).

However, the order of objects in "labels" differs from their order along the dendrogram's x-axis, so the matching is wrong.

I can give you an example via email.
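The concern raised here can be illustrated without SciPy. In the sketch below, `leafs_ix` is a hand-written stand-in for the output of `leaves_list`, which lists observation indices from left to right along the dendrogram. It assumes `labels` is ordered by observation index and that the dict is meant to map each label to its position on the x-axis; neither assumption is confirmed in this thread.

```python
labels = ["a", "b", "c", "d"]   # labels[i] belongs to observation i
leafs_ix = [2, 0, 3, 1]         # stand-in for leaves_list(link)

# Order in which the labels would appear along the x-axis.
plotted_order = [labels[ix] for ix in leafs_ix]
print(plotted_order)  # ['c', 'a', 'd', 'b']

# Pairing as in the quoted snippet: labels[k] is paired with leafs_ix[k].
zipped = dict(zip(labels, leafs_ix))
print(zipped)  # {'a': 2, 'b': 0, 'c': 3, 'd': 1}

# Label -> x-axis position, looked up through the observation index.
by_position = {labels[ix]: pos for pos, ix in enumerate(leafs_ix)}
print(by_position)  # {'c': 0, 'a': 1, 'd': 2, 'b': 3}
```

The two dicts agree only when the leaf permutation is its own inverse, which may be why small toy examples do not always expose the difference.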

ADJUSTMENT

356af5a


schlegelp requested changes Jun 24, 2021
