-
Notifications
You must be signed in to change notification settings - Fork 54
Description
from_tree = json.build_tree(["de", 1])
to_tree = json.build_tree([2])
print(from_tree.diff(to_tree).edited_cost())
for edit in from_tree.get_all_edits(to_tree):
print(edit)output:
4
Insert(to_insert=IntegerNode(2), insert_into=ListNode((StringNode('de'), IntegerNode(1))))
Remove(StringNode('de'), remove_from=ListNode((StringNode('de'), IntegerNode(1))))
Remove(IntegerNode(1), remove_from=ListNode((StringNode('de'), IntegerNode(1))))
But the minimal cost should be 3.
output:
3
Remove(StringNode('de'), remove_from=ListNode((StringNode('de'), IntegerNode(1))))
Match(match_from=IntegerNode(1), match_to=IntegerNode(2), cost=1)
I found the problem is in EditDistance._best_match:
dcost = (self.costs[row - 1][col - 1], self.path_costs[row - 1][col - 1])
lcost = (self.costs[row][col - 1], self.path_costs[row][col - 1])
ucost = (self.costs[row - 1][col], self.path_costs[row - 1][col])
diag_is_best = dcost <= lcost and dcost <= ucost
if diag_is_best:
make_distinct(self.edit_matrix[row][col], self.edit_matrix[row][0], self.edit_matrix[0][col])
if diag_is_best and \
self.edit_matrix[row][col].bounds() < self.edit_matrix[row][0].bounds() and \
self.edit_matrix[row][col].bounds() < self.edit_matrix[0][col].bounds(): # condition1
brow, bcol, edit = row - 1, col - 1, self.edit_matrix[row][col]
elif ucost <= dcost: # condition2
brow, bcol, edit = row - 1, col, self.edit_matrix[row][0]
else: # condition3
brow, bcol, edit = row, col - 1, self.edit_matrix[0][col]
self.path_costs[row][col] = self.path_costs[brow][bcol] + 1
self.costs[row][col] = self.costs[brow][bcol] + edit.bounds().upper_bound# In the above example, when we call _best_match(1, 1).
# value information:
# self.edit_matrix[row][col].bounds() = (2,2)
# self.edit_matrix[0][col].bounds() = (2,2)
# ucost = (2, 1)
# dcost = (0, 0)
# lcost = (1, 1)brow = row, bcol = col-1 (condition3 success) because self.edit_matrix[row][col].bounds() == self.edit_matrix[0][col].bounds() (condition1 fail) and ucost > dcost (condition2 fail).
But actually self.costs[row][col-1] + self.edit_matrix[0][col].bounds().upper_bound > self.costs[row-1][col-1] + self.edit_matrix[row][col].bounds().upper_bound. The optimal self.costs[row][col] should be self.costs[row-1][col-1] + self.edit_matrix[row][col].bounds().upper_bound.
Here is a fix (success in the above example, may need further testing):
make_distinct(self.edit_matrix[row][col], self.edit_matrix[row][0], self.edit_matrix[0][col])
dcost = (self.costs[row - 1][col - 1] + self.edit_matrix[row][col].bounds().upper_bound, self.path_costs[row - 1][col - 1])
lcost = (self.costs[row][col - 1] + self.edit_matrix[0][col].bounds().upper_bound, self.path_costs[row][col - 1])
ucost = (self.costs[row - 1][col] + self.edit_matrix[row][0].bounds().upper_bound, self.path_costs[row - 1][col])
if dcost <= lcost and dcost <= ucost:
brow, bcol, edit = row - 1, col - 1, self.edit_matrix[row][col]
elif ucost <= dcost and ucost <= lcost:
brow, bcol, edit = row - 1, col, self.edit_matrix[row][0]
else:
brow, bcol, edit = row, col - 1, self.edit_matrix[0][col]
self.path_costs[row][col] = self.path_costs[brow][bcol] + 1
self.costs[row][col] = self.costs[brow][bcol] + edit.bounds().upper_boundIn summary, we should use the final real cost to guide transition. Like the canonical implementation of the Levenshtein distance metric:
dist[row][col] = min(dist[row - 1][col] + 1,
dist[row][col - 1] + 1,
dist[row - 1][col - 1] + cost)If this is a bug, I would be happy to provide a PR.