Skip to content

[BUG] UpdateMaxCounts can contaminate the sorted state #7

@Validark

Description

@Validark

The UpdateMaxCounts function updates the termFrequencyCountChildMax field, which is also the field by which the Node.Children Lists are sorted. Since the sort call comes before the UpdateMaxCounts call in the following code, each layer is not actually guaranteed to be in sorted order.

else
{
curr.Children.Add((term, new Node(termFrequencyCount)));
//sort children descending by termFrequencyCountChildMax to start lookup with most promising branch
curr.Children.Sort((x, y) => y.Item2.termFrequencyCountChildMax.CompareTo(x.Item2.termFrequencyCountChildMax));
}
termCount++;
UpdateMaxCounts(nodeList, termFrequencyCount);
}
catch (Exception e) { Console.WriteLine("exception: " + term + " " + e.Message); }
}

And there is no sort call after this UpdateMaxCounts:

if ((common == term.Length) && (common == key.Length))
{
if (node.termFrequencyCount == 0) termCount++;
node.termFrequencyCount += termFrequencyCount;
UpdateMaxCounts(nodeList, node.termFrequencyCount);
}

In practice, this can subtly worsen performance.

Also, because UpdateMaxCounts updates termFrequencyCountChildMax in potentially all ancestor nodes, those layers would need to be sorted as well if every layer is supposed to maintain sorted order.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions