Comments

Network Topics Minhash#43

Merged
GalRogozinski merged 57 commits into boole from net-topology
Jan 23, 2026

Conversation

@MatheusFranco99
Contributor

Network topology SIP proposing a change to the network structure, namely how validators are assigned to topics, to reduce the number of non-committee messages an operator needs to handle.

@MatheusFranco99 MatheusFranco99 self-assigned this Mar 8, 2024
Contributor

@GalRogozinski GalRogozinski left a comment


Need to talk about the algorithm tomorrow

alan-ssvlabs previously approved these changes Mar 20, 2025
Contributor

@alan-ssvlabs alan-ssvlabs left a comment


LGTM!

@MatheusFranco99
Contributor Author

It seems that simply increasing the number of topics by 4x (to 512) also gives us some benefits.
The average and median cases are even better than with the greedy approach, though the scalability factor only goes up to 3x.

[Charts: total cryptography cost and total message rate]

CC @liorrutenberg @GalRogozinski @moshe-blox @alan-ssvlabs @y0sher

@AgeManning

Hey all, thought I'd throw my thoughts on this also.

I like the cost function; it seems like a good thing to optimize. As has been discussed (but just to make it a bit more explicit), we can drive the cost function to 0 with an infinite number of topics, i.e., each committee gets its own topic. The trade-off, as has been mentioned, is network stability. What I think is worth spelling out, from what I've seen on real networks, is why this instability arises and how bad it can be. Imo it comes from a few main sources:

  • Discovery - If there are only 3 nodes in the world (for example) that are on a topic we need to join, discovering them can be difficult depending on how our DHT is structured and filled (might not be a big issue with SSV because we're segregating the DHT for just SSV nodes)
  • NAT/Connection issues - Home users often don't know how to set up NAT rules correctly, so their nodes don't accept incoming connections. Even if we discover them, we can't connect to them; instead, they need to discover and connect to us. With only a few such nodes in the network, reaching them is harder.
  • Client connection limits - If a client caps its peer count at, say, 100, we can't connect to it because it's "full". Clients should reserve priority slots for peers in their committee, but that has some security implications.

Anyway, I'd imagine simulations would show that a larger number of subnets gives better performance trade-offs, but we want to be wary of the above, which are problems that simulations typically won't show. Erring on the side of caution would imply not pushing the number of subnets too high.

The other thing that drew my attention was each client managing state. This is a distributed consensus problem. If our nodes disagree on the state, we're going to be subscribed to different subnets. It wasn't obvious to me what happens in cases of forks. There is a time window, but we could fork beyond that window. Maybe we can rollback the state. If there is an error in our state, will it be easy to diagnose? Like will each client know it needs to re-sync or rollback, or will we just be on the wrong subnets and fail silently? Can we leverage the Ethereum state, and just have a mapping on a smart contract which resolves all of these problems?

These are just some thoughts I had looking over this proposal. I think it's a good idea to be smarter about subnet allocation.

@dknopik
Collaborator

dknopik commented Apr 2, 2025

The other thing that drew my attention was each client managing state. This is a distributed consensus problem. If our nodes disagree on the state, we're going to be subscribed to different subnets. It wasn't obvious to me what happens in cases of forks. There is a time window, but we could fork beyond that window. Maybe we can rollback the state. If there is an error in our state, will it be easy to diagnose? Like will each client know it needs to re-sync or rollback, or will we just be on the wrong subnets and fail silently? Can we leverage the Ethereum state, and just have a mapping on a smart contract which resolves all of these problems?

I can imagine a hash stored in the smart contract could work. Every event emitted changes the hash, and by checking against the hash in the Ethereum state, a node could ensure that it has the correct SSV state. If that check fails, the node could try to resync from the latest finalized state, and if that fails, fully resync. While I feel like this could allow us to safeguard against weird re-org related issues, changing sync to only consider finalized blocks (instead of a delay of 8 blocks) is preferable in my opinion. While this slows all Validator and Operator management actions (and even prevents them in times of long non-finality), it prevents the whole SSV network from failing if Ethereum is unstable.
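
A minimal sketch of the rolling-hash idea (the folding scheme, names, and genesis value here are all hypothetical, not the actual contract design):

```python
import hashlib

GENESIS_STATE_HASH = b"\x00" * 32  # hypothetical starting value


def fold_event(state_hash: bytes, event_data: bytes) -> bytes:
    # Each emitted contract event updates the rolling hash (hypothetical scheme).
    return hashlib.sha256(state_hash + event_data).digest()


def local_state_hash(events: list[bytes]) -> bytes:
    # Replay all events the node has synced to reproduce the hash.
    h = GENESIS_STATE_HASH
    for ev in events:
        h = fold_event(h, ev)
    return h


def is_in_sync(local_events: list[bytes], onchain_hash: bytes) -> bool:
    # Compare against the hash read from the contract; on mismatch, the node
    # would resync from the latest finalized state (or fully resync).
    return local_state_hash(local_events) == onchain_hash
```

A node that has missed an event would compute a different hash and detect the divergence immediately, rather than silently sitting on the wrong subnets.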

@MatheusFranco99
Contributor Author

MatheusFranco99 commented Apr 16, 2025

TODOs:

  • Greedy without sorting
  • Degradation of greedy with updates vs. always re-initializing

Conclusion on possible ways forward:

  • greedy with re-initialization every epoch:

    • downsides:
      • double traffic in worst case when joining new topics
      • discovery cost + connections
  • greedy with re-initialization every X epochs (e.g. 100)

  • greedy with update events and degradation

  • MaxReach
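
As a rough illustration of the greedy family above, a sketch that assigns each committee to the topic with the least incremental cost. The cost model here is hypothetical (it simply weighs how many operators already on a topic are outside the incoming committee), not the SIP's actual cost function:

```python
# Hypothetical greedy topic assignment: place each committee on the topic that
# adds the least "non-committee" exposure for its operators. Illustrative only.

def greedy_assign(committees, num_topics):
    # committees: list of operator-id sets; returns {committee index: topic}
    topic_ops = [set() for _ in range(num_topics)]  # operators already on each topic
    assignment = {}
    for idx, ops in enumerate(committees):
        # Incremental cost: foreign operators on the topic times committee size.
        def cost(t):
            return len(topic_ops[t] - ops) * len(ops)
        best = min(range(num_topics), key=cost)
        assignment[idx] = best
        topic_ops[best] |= ops
    return assignment
```

Re-initialization every epoch corresponds to rerunning this from scratch; the update-events variant would instead patch `assignment` incrementally and accept some degradation.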

@MatheusFranco99
Contributor Author

Hey @AgeManning and @dknopik
Thanks for your useful insights!

Indeed, a large number of topics may be problematic for the discovery layer; 128 is already big enough to cause such problems.
As you predicted, our analysis of 512 topics showed much better performance, but I agree we should stick to 128 due to these issues.

Regarding syncing on the state:

  • I think it's preferable to go with a solution that doesn't require changes to the contract.
  • I like the finalized events suggestion as it gives time for nodes to sync and avoid these re-org issues. The downside of it is the delay in the validator start-up.
  • That said, it would be good to have a solution that allows a validator to join quickly, especially since re-orgs are rare as far as I know. Also, if there's a re-org and the operators for the new validator are out of sync, they will probably only miss duties until they sync (which would also happen if we waited for finalized events). What do you guys think?

@diegomrsantos
Collaborator

Goals (what we're trying to achieve)

  1. No missed duties - Validators must not miss attestations/proposals during transition
  2. No consensus failures - QBFT must reach consensus throughout
  3. No message loss - In-flight messages must be delivered and processed
  4. Clean transition - All nodes eventually on new fork, no lingering state

How Ethereum handles it

From the consensus specs:

  1. Topics include ForkDigestValue - Changes automatically with each fork (4 bytes derived from fork version + genesis root)
  2. Pre-fork subscription - "a node must select and subscribe to subnets of the future fork versioning at least EPOCHS_PER_SUBNET_SUBSCRIPTION epochs in advance"
  3. No cross-fork rebroadcast - "messages SHOULD NOT be re-broadcast from one fork to the other"
  4. Grace for lagging messages - "post-fork, nodes must not be scored negatively for lagging pre-fork messages"
  5. Delayed unsubscription - "Approximately two epochs after the fork, nodes should unsubscribe from deprecated pre-fork topics"
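
For reference, the ForkDigestValue in point 1 can be sketched like this. This is simplified: the consensus specs compute the SSZ hash_tree_root of a ForkData container, and a plain SHA-256 over the concatenation stands in for it here:

```python
import hashlib


def compute_fork_digest(current_fork_version: bytes, genesis_validators_root: bytes) -> bytes:
    # Simplified stand-in for the spec's hash_tree_root(ForkData(...))[:4].
    assert len(current_fork_version) == 4 and len(genesis_validators_root) == 32
    return hashlib.sha256(current_fork_version + genesis_validators_root).digest()[:4]


def topic_name(fork_digest: bytes, name: str) -> str:
    # Topic names embed the digest, so they change automatically at each fork.
    return f"/eth2/{fork_digest.hex()}/{name}/ssz_snappy"
```

Because the digest is baked into the topic string, old- and new-fork meshes are disjoint by construction, which is what makes rules 3-5 enforceable.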

What's different in SSV

| Aspect | Ethereum | SSV |
| --- | --- | --- |
| Validator independence | Each validator is independent | Committee of 4+ operators must coordinate |
| Impact of slow upgrade | Only that validator misses duties | Entire committee could fail consensus |
| Topic structure | `/eth2/{fork_digest}/...` | `/ssv/{network}/{fork_name}/...` |
| What changes | Message formats, validation rules | + Subnet topology (MinHash) + Proposer selection + Duty types |

The key SSV challenge

In Ethereum, if one validator is slow to upgrade, only that validator is affected. In SSV, if one operator in a 4-operator committee is slow, the entire committee could fail to reach consensus because:

  • They might disagree on proposer (different leader election)
  • They might disagree on duty type
  • Messages might go to different subnets

This is why SSV needs a more robust transition strategy than Ethereum's "upgrade your client before the fork" approach.

What I think we're trying to solve

The core problem: How do we ensure all operators in every committee can participate in consensus across the fork boundary, given that some operators may be slightly ahead or behind?

@diegomrsantos
Collaborator

Suggested Improvements to achieve the goals (proposed wording, RFC-style)

1. Slot-derived validation is the source of truth

Rule: Nodes MUST derive the fork from message.slot and validate using that fork's rules (topic mapping, domain, duty type). The received topic is only a consistency check.

Why: Topic is transport-level and sender-controlled; slot is protocol-level and deterministic. If two operators are slightly ahead/behind, slot-based validation ensures they still apply the same rules and avoid split-brain consensus.
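
A sketch of rule 1, with a hypothetical fork schedule and topic derivation (fork names, slot numbers, and the topic format are illustrative, not the SIP's actual values):

```python
# Hypothetical fork schedule: (fork name, activation slot), ascending.
FORK_SCHEDULE = [("alan", 0), ("boole", 1_000_000)]


def fork_at_slot(slot: int) -> str:
    # The fork is fully determined by the message's slot.
    active = FORK_SCHEDULE[0][0]
    for name, activation_slot in FORK_SCHEDULE:
        if slot >= activation_slot:
            active = name
    return active


def validate_message(msg_slot: int, received_topic: str) -> bool:
    fork = fork_at_slot(msg_slot)
    expected_topic = f"/ssv/mainnet/{fork}/1"  # illustrative topic derivation
    # The topic comparison is only the consistency check; the slot-derived
    # fork decides which validation rules (domain, duty type, ...) apply.
    return received_topic == expected_topic
```

Two operators that disagree on wall-clock fork activation still agree on `fork_at_slot(msg.slot)`, so they apply identical rules to the same message.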

2. Explicit handling of early messages on new topics

Rule: During PRIOR_WINDOW, nodes MAY receive messages on new topics (from early-forked peers or skewed clocks). Those messages MUST be validated using slot-derived fork rules and, if valid, MUST be processed normally.

Why: This removes ambiguity ("accept but don't process"), keeps the new mesh healthy, and prevents starving new topics before activation.

3. Publisher rule is slot-based and single-topic

Rule: Each message MUST be published to exactly one topic derived from its slot-implied fork (never to both old and new).

Why: Prevents duplicate propagation, avoids cross-fork rebroadcast, and makes sender/receiver behavior deterministic.

4. SUBSEQUENT_WINDOW aligned with message validity window

Rule: SUBSEQUENT_WINDOW SHOULD be at least the maximum message TTL enforced by slot-time validation (≈ 1 epoch given current rules).

Why: Otherwise we may unsubscribe from old topics while some last-slot pre-fork messages are still valid, causing avoidable duty loss.

If kept at 1 slot: The SIP SHOULD explicitly state that valid late pre-fork messages may be dropped.

5. Unambiguous timeline at the fork boundary

Rule: "At the fork" MUST be defined as slot 0 of ForkEpoch. Unsubscription from old/new happens only after SUBSEQUENT_WINDOW has elapsed.

Why: Removes ambiguity about first-slot behavior, prevents operators from diverging in handling during the activation slot, and improves duty continuity.

6. Keep fork transition encapsulated

Rule: Subscription timing + topic derivation belong in a dedicated fork module; other components only consume "fork from slot" decisions.

Why: Avoids scattering fork-transition logic across validators/QBFT, reduces implementation risk, and makes the transition consistent across the codebase.

@diegomrsantos
Collaborator

What do you think about a clean separation:

  • SUBSEQUENT_WINDOW = transport policy (how long we stay subscribed to old topics so late messages can still arrive)
  • Slot‑based validation = correctness policy (what we do with any message that arrives)

But the key nuance is: validation only applies to messages you actually receive. If we unsubscribe too early, valid pre‑fork messages on old topics won’t arrive at all, so validation can’t save them.

So the intended model should be:

  1. Subscription timing (SUBSEQUENT_WINDOW):
  • Keep old topic subscriptions for SUBSEQUENT_WINDOW after the fork, so late pre-fork messages still have a path to reach us.
  2. Validation (slot-based):
  • For any received message, derive the fork from message.slot, compute the expected topic, and apply that fork's rules.
  • No special cases for prior/unsubscription windows.

That gives you a crisp division of responsibility:

  • Transport window → controlled by SUBSEQUENT_WINDOW
  • Validation/processing → controlled by message.slot

If we want this in RFC wording, something like:

During SUBSEQUENT_WINDOW, nodes MUST remain subscribed to old topics. For all received messages, nodes MUST derive the fork from message.slot and validate against that fork’s rules; the received topic is only a consistency check. After SUBSEQUENT_WINDOW, nodes MAY drop old topics, and any remaining messages on old topics MAY be dropped even if they would otherwise be slot‑valid.

If the goal is “no message loss,” then SUBSEQUENT_WINDOW must cover the valid‑late window (≈ one epoch today). If we keep it at 1 slot, we should explicitly accept that some still‑valid pre‑fork messages can be dropped, even though slot‑based validation would have accepted them.
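
The transport/validation split could look like this (constants, the fork slot, and the topic-set names are hypothetical):

```python
SLOTS_PER_EPOCH = 32
FORK_SLOT = 1_000_000                 # hypothetical fork activation slot
PRIOR_WINDOW = SLOTS_PER_EPOCH        # join new topics this many slots early
SUBSEQUENT_WINDOW = SLOTS_PER_EPOCH   # keep old topics this many slots after


def subscribed_topic_sets(current_slot: int) -> set:
    # Transport policy: which topic sets we are subscribed to right now.
    subs = set()
    if current_slot < FORK_SLOT + SUBSEQUENT_WINDOW:
        subs.add("old")
    if current_slot >= FORK_SLOT - PRIOR_WINDOW:
        subs.add("new")
    return subs


def fork_for_message(msg_slot: int) -> str:
    # Correctness policy: fork (and thus validation rules) from the slot alone.
    return "new" if msg_slot >= FORK_SLOT else "old"
```

Note that `subscribed_topic_sets` never consults a message, and `fork_for_message` never consults the subscription state: the two policies are independent, which is exactly the crisp division described above.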

@GalRogozinski
Contributor

@nkryuchkov @diegomrsantos @iurii-ssv
I believe I addressed all comments.
Please resolve and approve the PR.

DM me if something is still not clear or unresolved.


@nkryuchkov nkryuchkov left a comment


great work!

@diegomrsantos
Collaborator

Clarify window values + justify the trade-off

Two points still feel underspecified:

1. Constants table is ambiguous

Right now the table says PRIOR_WINDOW = 1 and SUBSEQUENT_WINDOW = 1 without units.

Please make units explicit and state whether these are RECOMMENDED defaults or fixed values.

Suggested text:

  • PRIOR_WINDOW = 1 epoch (RECOMMENDED)
  • SUBSEQUENT_WINDOW = 1 slot (RECOMMENDED) (or 1 epoch; see below)

2. Trade-off vs existing message TTL isn't explained

Current validation accepts late messages for committee/aggregator roles up to slots_per_epoch + LATE_SLOT_ALLOWANCE (~34 slots).

If SUBSEQUENT_WINDOW = 1 slot, we intentionally drop a portion of still-valid pre-fork messages — potentially missing duties.

At the same time, the resource overhead of extending SUBSEQUENT_WINDOW is mostly extra mesh connections + inbound gossip, not duplicated publishing: messages are still published to only one topic (old or new), so we're not doubling outbound traffic.

Ask: please either

  • justify why losing valid pre-fork messages is acceptable vs. the current 34-slot validity window, or
  • align SUBSEQUENT_WINDOW with that validity window (≈ 1 epoch) so we don't truncate it.
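
The ~34-slot figure follows directly from the constants (assuming slots_per_epoch = 32 and LATE_SLOT_ALLOWANCE = 2, consistent with the text above):

```python
SLOTS_PER_EPOCH = 32
LATE_SLOT_ALLOWANCE = 2  # assumption consistent with the ~34-slot figure

MAX_VALID_AGE_SLOTS = SLOTS_PER_EPOCH + LATE_SLOT_ALLOWANCE  # 34 slots


def still_valid(msg_slot: int, current_slot: int) -> bool:
    # A late committee/aggregator message is accepted up to 34 slots after its slot.
    return 0 <= current_slot - msg_slot <= MAX_VALID_AGE_SLOTS

# If SUBSEQUENT_WINDOW = 1 slot, a pre-fork message older than 1 slot but
# younger than 34 slots arrives on an already-dropped topic and is lost,
# even though still_valid() would have accepted it.
```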

@diegomrsantos
Collaborator

Resource impact of keeping old topics for +1 epoch — numbers checked

Source fan-out stats:
https://github.com/ssvlabs/SIPs/pull/43#issuecomment-3639541723

Assumptions (from current Go/Rust configs):
D=8, heartbeat=700ms (1.43 Hz), msgID=12 bytes (ssv_message_id).
Old/new topic names are disjoint. Publishing is single-topic ("one slot → one topic").

| Percentile | Extra topics | Extra mesh slots (topics×D) | Mesh ops/sec (≈ slots×1.43) | IHAVE IDs/sec (upper bound) | IHAVE KB/sec | Mem @128–256B/slot |
| --- | --- | --- | --- | --- | --- | --- |
| Avg | 2.31 | 18.5 | 26 | 845 | 9.9 | 2–5 KB |
| p95 | 5 | 40 | 57 | 1,829 | 21.4 | 5–10 KB |
| p99 | 35 | 280 | 400 | 12,800 | 150 | 36–72 KB |
| Max | 69 | 552 | 789 | 25,234 | 296 | 70–141 KB |

Interpretation

  • These are per-operator, 1-epoch only.
  • "Mesh slots" are per-topic mesh entries, not necessarily new TCP connections (peer overlap reduces connections).
  • IHAVE figures are upper bounds; real rates are often lower.
  • Data-plane bandwidth does not double if we enforce one-topic publishing; the extra load is inbound old-topic gossip for ~1 epoch.
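
The derived columns follow mechanically from the stated assumptions (D=8, 1.43 Hz heartbeat, 12-byte message IDs); a quick check on the p95 row:

```python
D = 8                 # gossipsub mesh degree (from current configs)
HEARTBEAT_HZ = 1.43   # 700 ms heartbeat
MSG_ID_BYTES = 12     # ssv_message_id size


def derived(extra_topics: float, ihave_ids_per_sec: float):
    mesh_slots = extra_topics * D
    mesh_ops_per_sec = mesh_slots * HEARTBEAT_HZ
    ihave_kb_per_sec = ihave_ids_per_sec * MSG_ID_BYTES / 1024
    return mesh_slots, mesh_ops_per_sec, ihave_kb_per_sec


# p95 row: 5 extra topics, 1,829 IHAVE IDs/sec
slots, ops, kb = derived(5, 1829)
# slots = 40, ops ≈ 57, kb ≈ 21.4, matching the table
```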

@GalRogozinski
Contributor

@diegomrsantos

  1. I changed both windows to use slots, so the recommended values can easily be adjusted in the SIP after experimentation, and I adopted your window value.

  2. You are correct that we could account for greater asynchrony. But since such asynchrony has never been observed on the SSV network, we wanted to take the more practical approach of saving resources; extra resource consumption is more likely to cause degraded performance. Since this is a short period of time and you believe the nodes will be mostly unaffected, we shouldn't fret too much.

GalRogozinski and others added 2 commits January 23, 2026 15:46
Co-authored-by: diegomrsantos <diegomrsantos@gmail.com>
Co-authored-by: diegomrsantos <diegomrsantos@gmail.com>
GalRogozinski and others added 2 commits January 23, 2026 16:25
Co-authored-by: diegomrsantos <diegomrsantos@gmail.com>
Co-authored-by: diegomrsantos <diegomrsantos@gmail.com>
Collaborator

@diegomrsantos diegomrsantos left a comment


Amazing work, thanks everyone for the discussion and iterations here!

@GalRogozinski GalRogozinski merged commit a311856 into boole Jan 23, 2026
@GalRogozinski GalRogozinski deleted the net-topology branch January 23, 2026 16:46