feat: replace SharedDomainType with SharedForkLifecycle for fork-aware peer selection#824
diegomrsantos wants to merge 10 commits into sigp:unstable from
Conversation
Add a new lifecycle module to the fork crate that models fork transition states as a single enum with three variants:

- Normal: operating on a single fork (pre-fork or post-grace-period)
- WarmUp: preparing for an upcoming fork (dual-subscribing to new topics)
- GracePeriod: fork activated but keeping old subscriptions for late messages

SharedForkLifecycle wraps this in Arc<RwLock<T>> for cross-component sharing, following the same pattern as the SharedDomainType it will replace. Each variant carries domain_type, making invalid states unrepresentable.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
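The enum-plus-shared-handle shape described above can be sketched roughly as follows. This is a minimal illustration, not the crate's actual code: `Fork` and `DomainType` are hypothetical stand-ins for the real types, and the method names on the wrapper are assumptions.

```rust
use std::sync::{Arc, RwLock};

// Hypothetical stand-ins; the real Fork and DomainType live in the fork crate.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Fork { Alan, Boole }

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct DomainType(pub u32);

// Each variant carries domain_type, so "a phase without a domain" is unrepresentable.
#[derive(Clone, Debug, PartialEq, Eq)]
pub enum ForkLifecycle {
    Normal { current: Fork, domain_type: DomainType },
    WarmUp { current: Fork, upcoming: Fork, domain_type: DomainType },
    GracePeriod { current: Fork, previous: Fork, domain_type: DomainType },
}

impl ForkLifecycle {
    // Every variant has a domain_type, so this read is total.
    pub fn domain_type(&self) -> DomainType {
        match self {
            ForkLifecycle::Normal { domain_type, .. }
            | ForkLifecycle::WarmUp { domain_type, .. }
            | ForkLifecycle::GracePeriod { domain_type, .. } => *domain_type,
        }
    }
}

// Shared handle: one writer (ForkMonitor), many readers.
#[derive(Clone)]
pub struct SharedForkLifecycle(Arc<RwLock<ForkLifecycle>>);

impl SharedForkLifecycle {
    pub fn new(initial: ForkLifecycle) -> Self {
        Self(Arc::new(RwLock::new(initial)))
    }
    pub fn get(&self) -> ForkLifecycle {
        self.0.read().expect("lock not poisoned").clone()
    }
    pub fn set(&self, next: ForkLifecycle) {
        *self.0.write().expect("lock not poisoned") = next;
    }
}
```

Cloning the handle clones only the `Arc`, so every component shares one underlying value.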
ForkMonitor now accepts a SharedForkLifecycle and updates it right
before broadcasting each ForkPhase event:
- Preparing → WarmUp { current, upcoming, domain_type }
- Activated → GracePeriod { current, previous, domain_type }
- GracePeriodEnded → Normal { current, domain_type }
This makes ForkMonitor the single writer of fork lifecycle state.
Components that previously needed to react to events just to cache
derived values can now read the shared state directly.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
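The event-to-state mapping in the commit message above can be expressed as a single pure function, sketched here with hypothetical simplified types (the crate's real `ForkPhase` and `ForkLifecycle` carry richer data):

```rust
// Hypothetical simplified types; the real ForkPhase/ForkLifecycle differ.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Fork { Alan, Boole }

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct DomainType(pub u32);

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum ForkPhase {
    Preparing { current: Fork, upcoming: Fork, domain_type: DomainType },
    Activated { current: Fork, previous: Fork, domain_type: DomainType },
    GracePeriodEnded { current: Fork, domain_type: DomainType },
}

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum ForkLifecycle {
    Normal { current: Fork, domain_type: DomainType },
    WarmUp { current: Fork, upcoming: Fork, domain_type: DomainType },
    GracePeriod { current: Fork, previous: Fork, domain_type: DomainType },
}

// The event-to-state mapping the monitor applies before broadcasting.
pub fn lifecycle_for(phase: ForkPhase) -> ForkLifecycle {
    match phase {
        ForkPhase::Preparing { current, upcoming, domain_type } => {
            ForkLifecycle::WarmUp { current, upcoming, domain_type }
        }
        ForkPhase::Activated { current, previous, domain_type } => {
            ForkLifecycle::GracePeriod { current, previous, domain_type }
        }
        ForkPhase::GracePeriodEnded { current, domain_type } => {
            ForkLifecycle::Normal { current, domain_type }
        }
    }
}
```

Because the monitor writes the shared state and then broadcasts the same phase, event consumers and state readers can never observe contradictory values.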
Replace the SharedDomainType abstraction with SharedForkLifecycle across all network components, making peer selection fork-aware.

Key changes:

- Remove SharedDomainType from the network crate entirely
- Discovery and Handshake read domain_type via fork_lifecycle
- Network::on_fork_phase simplified to only update the ENR (domain type updates are now handled by ForkMonitor via SharedForkLifecycle)
- ConnectionManager uses the fork lifecycle to filter peer subnets: in the Normal state only the current fork's bitmap is returned; during WarmUp/GracePeriod both forks are aggregated
- Client creates SharedForkLifecycle and passes it to both ForkMonitor and Network

This prevents peers subscribed only to a defunct fork from appearing useful for subnet coverage after the grace period ends.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@claude review this PR

I'll analyze this and get back to you.
```rust
    ///
    /// Both forks' contexts are relevant — peers subscribed to either
    /// the current or upcoming fork are useful.
    WarmUp {
```
What about collapsing the WarmUp and GracePeriod variants into something like:

```rust
Transitioning {
    current,
    domain_type,
}
```

Just an idea, since consumers don't seem to care whether they're in WarmUp or GracePeriod, only that current and domain_type are correct, and the upcoming/previous fields seem unused.
```rust
) -> Option<Bitfield<Fixed<U128>>> {
    let fork_map = self.observed_peer_subnets.get(peer)?;
    Some(Self::aggregate_fork_bitmaps(fork_map))
    match lifecycle {
```
When fork_map is empty (newly connected peer, no gossipsub yet), Normal returns None (triggering the ENR fallback), but WarmUp/GracePeriod returns Some(all-zeros) (skipping the ENR fallback). Adding an early `if fork_map.is_empty() { return None }` would make the behavior consistent.
```rust
/// Fork lifecycle state. Updated only by ForkMonitor.
#[derive(Clone, Debug, PartialEq, Eq)]
pub enum ForkLifecycle {
```
Thinking of ways we could simplify this further, @dknopik. One way would be to convert this enum to a flat struct:

```rust
pub struct ForkState {
    pub domain_type: DomainType,
    pub current_fork: Fork,
    pub in_transition: bool,
}
```

since network components seem to care only about the current domain type and whether the network is in a fork transition period (warm-up/grace period) versus normal operation.

Another idea would be to extend ForkSchedule with a current field:

```rust
pub struct ForkSchedule {
    configs: BTreeMap<Fork, ForkConfig>,
    network_name: String,
    // Runtime: updated by ForkMonitor, read by network components
    current: RwLock<ActiveFork>,
}

struct ActiveFork {
    fork: Fork,
    in_transition: bool,
}
```

The benefit of this is killing the need for a new ForkState module; the con is that it mixes config (the current ForkSchedule struct) with runtime state (the new current field).

Or perhaps you wanted to take a look and come up with something yourself. If so, feel free to let me know if you want me to try out whatever you're thinking.
Good call on simplification. I'd keep runtime state separate from ForkSchedule (it's config-only today), so I don't think adding a mutable current field there is the right direction.

I'm fine simplifying the consumer API with an is_transition() helper, but I'd keep ForkLifecycle as an enum for now so we retain the WarmUp vs GracePeriod semantics if/when we need them.
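The `is_transition()` helper mentioned above could look roughly like this. A minimal sketch: the variant fields are elided here for brevity, and this is not the crate's actual code.

```rust
// Sketch with fields elided; the real variants carry current/upcoming/previous
// and domain_type.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum ForkLifecycle { Normal, WarmUp, GracePeriod }

impl ForkLifecycle {
    /// True during the two dual-subscription phases (WarmUp and GracePeriod).
    /// Consumers that only care about "transition vs normal" can use this,
    /// while the enum keeps the finer-grained semantics available.
    pub fn is_transition(&self) -> bool {
        !matches!(self, ForkLifecycle::Normal)
    }
}
```

This keeps the consumer API as simple as the flat-struct proposal without losing the phase distinction.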
Claude finished @diegomrsantos's task in 1m 44s

**Code Review: PR #824, fork-aware peer selection**

**Summary**

This PR successfully addresses the core issue of fork-aware peer selection after grace periods end. The implementation is well-structured with comprehensive tests. However, there are architectural opportunities for simplification and a critical behavioral inconsistency that should be addressed.

**Critical Issues**

**1. Inconsistent ENR Fallback Behavior (connection.rs:319)**

Issue: Shane-moore correctly identified that when `fork_map` is empty, the lifecycle states behave inconsistently.

Location:

```rust
fn get_peer_subnets_for_lifecycle(...) -> Option<Bitfield<Fixed<U128>>> {
    let fork_map = self.observed_peer_subnets.get(peer)?;
    match lifecycle {
        ForkLifecycle::Normal { current, .. } => fork_map.get(current).cloned(),
        ForkLifecycle::WarmUp { .. } | ForkLifecycle::GracePeriod { .. } => {
            Some(Self::aggregate_fork_bitmaps(fork_map)) // ← returns Some(zeros) if fork_map is empty
        }
    }
}
```

Impact: During fork transitions, newly connected peers that haven't gossiped yet appear to offer zero subnets instead of falling back to their ENR claims.

Fix: Add an early return before the match statement:

```rust
fn get_peer_subnets_for_lifecycle(...) -> Option<Bitfield<Fixed<U128>>> {
    let fork_map = self.observed_peer_subnets.get(peer)?;
    if fork_map.is_empty() {
        return None; // Consistent ENR fallback across all states
    }
    match lifecycle {
        // ... rest of the logic
    }
}
```

**Architectural Suggestions**

**2. Enum Complexity vs. Benefit (lifecycle.rs:16)**

Context: Shane-moore suggested collapsing the transition variants.

Current design:

```rust
pub enum ForkLifecycle {
    Normal { current: Fork, domain_type: DomainType },
    WarmUp { current: Fork, upcoming: Fork, domain_type: DomainType },
    GracePeriod { current: Fork, previous: Fork, domain_type: DomainType },
}
```

Observation: …

Options: …

Recommendation: Consider option 1. The explicit phase distinction provides minimal value if the fields aren't used and consumers treat both phases identically. The simpler enum would be easier to understand and maintain.

**3. Alternative: Extend ForkSchedule (review comments)**

Shane-moore also suggested embedding runtime state in the schedule:

```rust
pub struct ForkSchedule {
    configs: BTreeMap<Fork, ForkConfig>,
    network_name: String,
    current: RwLock<ActiveFork>, // Runtime state
}
```

Pros: Eliminates the separate …

Recommendation: The current separate …

**Code Quality**

Strengths: …

Areas for improvement: …

**Testing**

Strong points: …

Suggestion: Add an integration test demonstrating the core bug fix: that after the grace period ends, peers only on the old fork are correctly excluded from subnet coverage calculations. The existing tests verify the mechanism but not the end-to-end behavior.

**Security & Performance**

No issues found.

**Comparison with Existing Review Comments**

Shane-moore's feedback aligns with my findings: …

**Summary of Recommendations**

Must fix: …

Should consider: …

Nice to have: …

**Conclusion**

The PR successfully solves the fork-aware peer selection problem with a well-tested implementation. The critical ENR fallback inconsistency should be fixed before merge. The architectural simplification suggestions are worth discussing but not blocking.
I think we can simplify this; I am just not sure how exactly. Here are a few possibilities: …

Right now, this is the rough order of my preference. Happy to discuss all of these approaches!
Thanks for the feedback, I'll take a look at it next week. Did you consider that this also adds fork-aware peer selection and isn't just a refactor? The idea was to have only one source of truth for fork-dependent data.
@dknopik @diegomrsantos, I made a POC of how I interpret option 1 against this PR, just to see what it could look like: diegomrsantos/anchor@feat/shared-fork-lifecycle...shane-moore:anchor:option1-simplification

It definitely kills a lot of code. The idea is that … and we keep the fork-aware peer selection logic from this PR via having …

Not saying what I'm suggesting is necessarily the right call, just showing what it could look like.
They serve two different goals. ForkPhase is for components that need to act on fork events. SharedForkLifecycle is for components that only need the current value. One nuance: fewer lines often help clarity, so it's not less code vs clarity, it's clarity first.
Agreed with clarity first! I think it's also nice to think about these types of decisions from the perspective of "what's easier for someone new to this part of the codebase to debug?". I would think that if you were trying to debug something related to this, you'd just trace through the call paths and find … in …

If we find we need this shared state across more components in the future, we can always introduce something like … Just my thoughts!
I'm still struggling to see where the additional complexity is coming from here. We already need a background task (ForkMonitor) to track the fork schedule and broadcast ForkPhase transitions. That's true regardless of whether we propagate state via events or via a shared handle. In this PR, the monitor simply becomes the single writer of a single value (SharedForkLifecycle), and everyone else reads it when they need the current domain type / lifecycle. This is the same pattern we already had with SharedDomainType, just with a better invariant model (ForkLifecycle makes invalid states unrepresentable).

What this change buys us is removing duplicated "cache derived value on event" logic from multiple components and avoiding desync between "what phase are we in?" and "what domain type are we using?"

If the concern is specifically …, can we call that out directly? From what I can see, reads are cheap (and we already hoist them outside loops where it matters), and the writer is centralized.

Happy to simplify further, but I'd like to keep the decision rule explicit: if there's a concrete code path where reading SharedForkLifecycle is harder to reason about or debug than propagating via Network::on_fork_phase, point me at it and we'll address it.
**Issue Addressed**

Follows up on PR #820 (fork-aware `observed_peer_subnets`). While #820 correctly stores per-fork bitmaps, all query methods still aggregate across forks with a blind `union()`. After the Boole grace period ends, peers only subscribed to Alan topics appear useful for Boole subnets — leading to incorrect peer selection.

**Proposed Changes**

- Add a `ForkLifecycle` enum (`Normal`, `WarmUp`, `GracePeriod`) and `SharedForkLifecycle` shared state in the fork crate, replacing `SharedDomainType`
- `ForkMonitor` updates the shared lifecycle state before broadcasting each `ForkPhase` event
- `ConnectionManager` queries now consult the lifecycle: in the `Normal` state only the current fork's bitmap is considered; during `WarmUp`/`GracePeriod` both forks are aggregated
- Components that used `SharedDomainType` (Discovery, Handshake) now read `SharedForkLifecycle::domain_type()` instead
- `Network::on_fork_phase` simplified — no longer sets the domain type (the monitor handles it), only updates the ENR

**Additional Info**

- `SharedForkLifecycle` follows the same `Arc<RwLock<T>>` single-writer pattern as the `SharedDomainType` it replaces
- ENR updates remain in `Network::on_fork_phase` because they require explicit discv5 mutation
- `count_observed_peers_for_subnets` hoists the lifecycle read outside the per-peer loop to avoid repeated lock acquisition
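The lifecycle-gated counting with the hoisted lock read can be sketched as below. This is a simplified illustration, not the crate's code: `Lifecycle`, the `u64` peer ids, and `Vec<bool>` bitmaps are hypothetical stand-ins for the real lifecycle enum, `PeerId`, and SSZ bitfields.

```rust
use std::collections::HashMap;
use std::sync::{Arc, RwLock};

// Hypothetical simplified types; the real code uses PeerId and SSZ bitfields.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub enum Fork { Alan, Boole }

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Lifecycle { Normal(Fork), Transition }

type Bitmap = Vec<bool>;

pub fn count_peers_for_subnet(
    lifecycle: &Arc<RwLock<Lifecycle>>,
    peers: &HashMap<u64, HashMap<Fork, Bitmap>>,
    subnet: usize,
) -> usize {
    // Hoisted read: one lock acquisition for the whole loop,
    // instead of one per peer.
    let state = *lifecycle.read().expect("lock not poisoned");
    peers
        .values()
        .filter(|fork_map| {
            let covered = |bm: &Bitmap| bm.get(subnet).copied().unwrap_or(false);
            match state {
                // Normal: only the current fork's bitmap counts.
                Lifecycle::Normal(current) => fork_map.get(&current).map_or(false, covered),
                // WarmUp/GracePeriod: a subscription on either fork counts.
                Lifecycle::Transition => fork_map.values().any(covered),
            }
        })
        .count()
}
```

Under `Normal(Boole)`, a peer subscribed only on Alan is excluded, which is exactly the defunct-fork case this PR fixes; during a transition the same peer still counts.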