Skip to content

Comments

Fix leak in range sync components_by_range_requests#7767

Closed
jimmygchen wants to merge 4 commits intosigp:unstablefrom
jimmygchen:fix-sync-leak
Closed

Fix leak in range sync components_by_range_requests#7767
jimmygchen wants to merge 4 commits intosigp:unstablefrom
jimmygchen:fix-sync-leak

Conversation

@jimmygchen
Copy link
Member

@jimmygchen jimmygchen commented Jul 21, 2025

Issue Addressed

When a data column range sync fails with coupling error returned below, if the coupling error does not contain a column and peer (i.e. column_and_peer == None)

fn responses_with_custody_columns(
blocks: Vec<Arc<SignedBeaconBlock<E>>>,
data_columns: DataColumnSidecarList<E>,
column_to_peer: HashMap<u64, PeerId>,
expects_custody_columns: &[ColumnIndex],
spec: &ChainSpec,
) -> Result<Vec<RpcBlock<E>>, CouplingError> {

The entire batch gets retried, without the previous context.components_by_range_requests entry being removed, this result in unbounded memory usage increase as shown in the metric:

image

Note: this PR is not tested and I'm not sure if the fix is good - but just wanted to illustrate the issue.

@jimmygchen jimmygchen requested a review from pawanjay176 July 21, 2025 07:53
});
}
// FIXME: hack to remove range request before a retry
network.remove_range_request_by_id(request_id);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that this hack didn't fix it

image

The metric i was looking at is sync_active_network_requests, type == components_by_range, recorded here:

(
"components_by_range",
self.components_by_range_requests.len(),
),

This change what caused entries not to be removed i believe, its used by both range and backfill:

if blocks_result.is_ok() {
// remove the entry only if it coupled successfully with
// no errors
entry.remove();
}
// If the request is finished, dequeue everything
Some(blocks_result.map_err(RpcResponseError::BlockComponentCouplingError))

@jimmygchen
Copy link
Member Author

This will be fixed properly by @pawanjay176 in #7762.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant