
feat: replace SharedDomainType with SharedForkLifecycle for fork-aware peer selection#824

Open
diegomrsantos wants to merge 10 commits into sigp:unstable from diegomrsantos:feat/shared-fork-lifecycle

Conversation

@diegomrsantos
Member

Issue Addressed

Follows up on PR #820 (fork-aware observed_peer_subnets). While #820 correctly stores per-fork bitmaps, all query methods still aggregate across forks with a blind union(). After the Boole grace period ends, peers only subscribed to Alan topics appear useful for Boole subnets — leading to incorrect peer selection.

Proposed Changes

  • Introduce ForkLifecycle enum (Normal, WarmUp, GracePeriod) and SharedForkLifecycle shared state in the fork crate, replacing SharedDomainType
  • ForkMonitor updates the shared lifecycle state before broadcasting each ForkPhase event
  • Peer selection queries in ConnectionManager now consult the lifecycle: in Normal state only the current fork's bitmap is considered; during WarmUp/GracePeriod both forks are aggregated
  • All components that previously read SharedDomainType (Discovery, Handshake) now read SharedForkLifecycle::domain_type() instead
  • Network::on_fork_phase simplified — no longer sets domain type (monitor handles it), only updates ENR
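The shape described above can be sketched as follows (variant fields taken from the review excerpts later in this thread; a std-only stand-in, not the merged code):

```rust
use std::sync::{Arc, RwLock};

// Stand-ins for the crate's real types; illustrative only.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct DomainType(pub [u8; 4]);

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Fork {
    Alan,
    Boole,
}

/// Fork lifecycle state. Updated only by ForkMonitor.
#[derive(Clone, Debug, PartialEq, Eq)]
pub enum ForkLifecycle {
    /// Single-fork operation (pre-fork or post-grace-period).
    Normal { current: Fork, domain_type: DomainType },
    /// Upcoming fork announced; dual-subscribed to both topic sets.
    WarmUp { current: Fork, upcoming: Fork, domain_type: DomainType },
    /// Fork activated; old subscriptions kept for late messages.
    GracePeriod { current: Fork, previous: Fork, domain_type: DomainType },
}

/// Shared handle: ForkMonitor is the single writer, everyone else reads.
#[derive(Clone)]
pub struct SharedForkLifecycle(Arc<RwLock<ForkLifecycle>>);

impl SharedForkLifecycle {
    pub fn new(initial: ForkLifecycle) -> Self {
        Self(Arc::new(RwLock::new(initial)))
    }

    pub fn get(&self) -> ForkLifecycle {
        self.0.read().unwrap().clone()
    }

    pub fn set(&self, next: ForkLifecycle) {
        *self.0.write().unwrap() = next;
    }

    /// Every variant carries a domain type, so there is no state in
    /// which the domain type is missing or stale relative to the phase.
    pub fn domain_type(&self) -> DomainType {
        match *self.0.read().unwrap() {
            ForkLifecycle::Normal { domain_type, .. }
            | ForkLifecycle::WarmUp { domain_type, .. }
            | ForkLifecycle::GracePeriod { domain_type, .. } => domain_type,
        }
    }
}
```

Because the domain type travels inside each variant, readers cannot observe a phase without its matching domain type, which is the "invalid states unrepresentable" property mentioned below.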

Additional Info

diegomrsantos and others added 4 commits February 11, 2026 13:13
Add a new lifecycle module to the fork crate that models fork transition
states as a single enum with three variants:

- Normal: operating on a single fork (pre-fork or post-grace-period)
- WarmUp: preparing for an upcoming fork (dual-subscribing to new topics)
- GracePeriod: fork activated but keeping old subscriptions for late messages

SharedForkLifecycle wraps this in Arc<RwLock<T>> for cross-component
sharing, following the same pattern as the SharedDomainType it will
replace. Each variant carries domain_type, making invalid states
unrepresentable.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
ForkMonitor now accepts a SharedForkLifecycle and updates it right
before broadcasting each ForkPhase event:

- Preparing  → WarmUp { current, upcoming, domain_type }
- Activated  → GracePeriod { current, previous, domain_type }
- GracePeriodEnded → Normal { current, domain_type }

This makes ForkMonitor the single writer of fork lifecycle state.
Components that previously needed to react to events just to cache
derived values can now read the shared state directly.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
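The phase-to-state mapping in this commit message can be sketched as a pure function (type names are stand-ins mirroring the message above, not the crate's actual definitions):

```rust
// Stand-ins mirroring the commit message above; illustrative only.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct DomainType(pub [u8; 4]);

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Fork {
    Alan,
    Boole,
}

#[derive(Clone, Debug, PartialEq, Eq)]
pub enum ForkLifecycle {
    Normal { current: Fork, domain_type: DomainType },
    WarmUp { current: Fork, upcoming: Fork, domain_type: DomainType },
    GracePeriod { current: Fork, previous: Fork, domain_type: DomainType },
}

/// Hypothetical shape of the broadcast event.
pub enum ForkPhase {
    Preparing { current: Fork, upcoming: Fork },
    Activated { current: Fork, previous: Fork },
    GracePeriodEnded { current: Fork },
}

/// State ForkMonitor writes to SharedForkLifecycle right before
/// broadcasting the corresponding ForkPhase event.
pub fn lifecycle_for(phase: &ForkPhase, domain_type: DomainType) -> ForkLifecycle {
    match *phase {
        ForkPhase::Preparing { current, upcoming } => {
            ForkLifecycle::WarmUp { current, upcoming, domain_type }
        }
        ForkPhase::Activated { current, previous } => {
            ForkLifecycle::GracePeriod { current, previous, domain_type }
        }
        ForkPhase::GracePeriodEnded { current } => {
            ForkLifecycle::Normal { current, domain_type }
        }
    }
}
```

Writing the shared state before the broadcast means any component woken by the event already observes the new lifecycle when it reads the handle.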
Replace the SharedDomainType abstraction with SharedForkLifecycle
across all network components, making peer selection fork-aware.

Key changes:
- Remove SharedDomainType from network crate entirely
- Discovery and Handshake read domain_type via fork_lifecycle
- Network::on_fork_phase simplified to only update ENR (domain type
  updates are now handled by ForkMonitor via SharedForkLifecycle)
- ConnectionManager uses fork lifecycle to filter peer subnets:
  in Normal state only the current fork's bitmap is returned;
  during WarmUp/GracePeriod both forks are aggregated
- Client creates SharedForkLifecycle and passes to both ForkMonitor
  and Network

This prevents peers subscribed only to a defunct fork from appearing
useful for subnet coverage after the grace period ends.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
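The ConnectionManager filtering rule described above can be sketched like this (a plain u128 stands in for Bitfield&lt;Fixed&lt;U128&gt;&gt;, and the early return for an empty map reflects the ENR-fallback concern raised in review; hypothetical shapes throughout):

```rust
use std::collections::BTreeMap;

// u128 stands in for Bitfield<Fixed<U128>>; illustrative only.
type SubnetBitmap = u128;

#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
pub enum Fork {
    Alan,
    Boole,
}

pub enum ForkLifecycle {
    Normal { current: Fork },
    WarmUp { current: Fork, upcoming: Fork },
    GracePeriod { current: Fork, previous: Fork },
}

/// Union of every per-fork bitmap; only meaningful during a transition,
/// when peers subscribed to either fork are useful.
fn aggregate_fork_bitmaps(fork_map: &BTreeMap<Fork, SubnetBitmap>) -> SubnetBitmap {
    fork_map.values().fold(0, |acc, bits| acc | bits)
}

/// None means "no gossipsub data yet; fall back to the peer's ENR claims".
pub fn subnets_for_lifecycle(
    fork_map: &BTreeMap<Fork, SubnetBitmap>,
    lifecycle: &ForkLifecycle,
) -> Option<SubnetBitmap> {
    if fork_map.is_empty() {
        // Consistent ENR fallback across all lifecycle states.
        return None;
    }
    match lifecycle {
        ForkLifecycle::Normal { current } => fork_map.get(current).copied(),
        ForkLifecycle::WarmUp { .. } | ForkLifecycle::GracePeriod { .. } => {
            Some(aggregate_fork_bitmaps(fork_map))
        }
    }
}
```

In Normal state a peer tracked only under the defunct fork yields None (or a bitmap lacking the current fork), so it no longer counts toward subnet coverage after the grace period ends.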
@diegomrsantos diegomrsantos force-pushed the feat/shared-fork-lifecycle branch from 47555e3 to 50378e6 Compare February 11, 2026 16:14
@diegomrsantos diegomrsantos marked this pull request as draft February 11, 2026 16:14
@diegomrsantos diegomrsantos marked this pull request as ready for review February 11, 2026 16:14
@diegomrsantos diegomrsantos self-assigned this Feb 11, 2026
@diegomrsantos diegomrsantos marked this pull request as draft February 11, 2026 16:21
@diegomrsantos diegomrsantos marked this pull request as ready for review February 11, 2026 16:21
@diegomrsantos
Member Author

@claude review this PR

@diegomrsantos diegomrsantos marked this pull request as draft February 11, 2026 19:15
@diegomrsantos diegomrsantos marked this pull request as ready for review February 11, 2026 19:15
@diegomrsantos diegomrsantos marked this pull request as draft February 11, 2026 19:46
@diegomrsantos diegomrsantos marked this pull request as ready for review February 11, 2026 19:46
///
/// Both forks' contexts are relevant — peers subscribed to either
/// the current or upcoming fork are useful.
WarmUp {
@shane-moore
Member

What about collapsing WarmUp and GracePeriod variants into something like

Transitioning {
   current,
   domain_type
}

just an idea since consumers don't seem to care whether they're in WarmUp or GracePeriod, just that current and domain_type are correct. and upcoming/previous fields seem unused

) -> Option<Bitfield<Fixed<U128>>> {
let fork_map = self.observed_peer_subnets.get(peer)?;
Some(Self::aggregate_fork_bitmaps(fork_map))
match lifecycle {
@shane-moore
Member

when fork_map is empty (newly connected peer, no gossipsub yet), Normal returns None (triggering ENR fallback) but WarmUp/GracePeriod returns Some(all-zeros) (skipping ENR fallback). Adding an early if fork_map.is_empty() { return None } would make the behavior consistent


/// Fork lifecycle state. Updated only by ForkMonitor.
#[derive(Clone, Debug, PartialEq, Eq)]
pub enum ForkLifecycle {
@shane-moore
Member

thinking of ways we could simplify this further @dknopik. I think one way would be to convert this enum to a flat struct:

pub struct ForkState {
    pub domain_type: DomainType,
    pub current_fork: Fork,
    pub in_transition: bool,
}

since network components seem to just care about what is the current domain type and whether the network is in a fork transition period (warm up/grace period) vs if normal operation

another idea would just be to extend ForkSchedule with a current field:

  pub struct ForkSchedule {
      configs: BTreeMap<Fork, ForkConfig>,
      network_name: String,
      // Runtime: updated by ForkMonitor, read by network components
      current: RwLock<ActiveFork>,
  }

  struct ActiveFork {
      fork: Fork,
      in_transition: bool,
  }

benefit of this is killing need for new ForkState module. con is that it mixes config (current ForkSchedule struct) with runtime characteristics (new current field)

or perhaps you wanted to take a look and come up with something. if so, feel free to lmk if you want me to try out whatever you're thinking

@diegomrsantos
Member Author
Feb 23, 2026

Good call on simplification. I’d keep runtime state separate from ForkSchedule (it’s config-only today), so I don’t think adding mutable current there is the right direction.
I’m fine simplifying the consumer API with an is_transition() helper, but I’d keep ForkLifecycle as an enum for now so we retain WarmUp vs GracePeriod semantics if/when we need them

@claude-code-actions-sigp

claude-code-actions-sigp bot commented Feb 12, 2026

Claude finished @diegomrsantos's task in 1m 44s


Code Review Complete

Reviewing PR #824: fork-aware peer selection with SharedForkLifecycle

Todo List

  • Read and understand the new ForkLifecycle and SharedForkLifecycle implementation
  • Review ForkMonitor changes for lifecycle state management
  • Analyze peer selection logic changes in ConnectionManager
  • Review network component updates (Discovery, Handshake, Network)
  • Check for proper error handling and thread safety
  • Evaluate test coverage
  • Consider existing review feedback from shane-moore
  • Provide comprehensive review feedback

Summary

This PR successfully addresses the core issue of fork-aware peer selection after grace periods end. The implementation is well-structured with comprehensive tests. However, there are architectural opportunities for simplification and a critical behavioral inconsistency that should be addressed.

Critical Issues

1. Inconsistent ENR Fallback Behavior (connection.rs:319)

Issue: Shane-moore correctly identified that when fork_map is empty (newly connected peer), behavior differs based on lifecycle state:

  • Normal state: returns None → triggers ENR fallback
  • WarmUp/GracePeriod: returns Some(all-zeros bitfield) → skips ENR fallback

Location: anchor/network/src/peer_manager/connection.rs:313-324

fn get_peer_subnets_for_lifecycle(...) -> Option<Bitfield<Fixed<U128>>> {
    let fork_map = self.observed_peer_subnets.get(peer)?;
    match lifecycle {
        ForkLifecycle::Normal { current, .. } => fork_map.get(current).cloned(),
        ForkLifecycle::WarmUp { .. } | ForkLifecycle::GracePeriod { .. } => {
            Some(Self::aggregate_fork_bitmaps(fork_map))  // ← returns Some(zeros) if fork_map is empty
        }
    }
}

Impact: During fork transitions, newly connected peers that haven't gossiped yet appear to offer zero subnets instead of falling back to their ENR claims.

Fix: Add early return before the match statement:

fn get_peer_subnets_for_lifecycle(...) -> Option<Bitfield<Fixed<U128>>> {
    let fork_map = self.observed_peer_subnets.get(peer)?;
    if fork_map.is_empty() {
        return None;  // Consistent ENR fallback across all states
    }
    match lifecycle {
        // ... rest of the logic
    }
}

Architectural Suggestions

2. Enum Complexity vs. Benefit (lifecycle.rs:16)

Context: Shane-moore suggested collapsing WarmUp and GracePeriod into a single Transitioning variant since consumers only care about:

  1. Current fork identity
  2. Whether to aggregate bitmaps (transition) or not (normal)

Current Design:

pub enum ForkLifecycle {
    Normal { current: Fork, domain_type: DomainType },
    WarmUp { current: Fork, upcoming: Fork, domain_type: DomainType },
    GracePeriod { current: Fork, previous: Fork, domain_type: DomainType },
}

Observation: The upcoming/previous fields in WarmUp/GracePeriod are never accessed outside ForkMonitor. All consumers pattern-match to distinguish transition states from normal, then treat both transition states identically.

Evidence from code:

  • connection.rs:319-324: WarmUp | GracePeriod => { aggregate() } (treats both identically)
  • Monitor is the only component that sets/reads upcoming/previous

Options:

  1. Simplify enum (shane-moore's suggestion):

    pub enum ForkLifecycle {
        Normal { current: Fork, domain_type: DomainType },
        Transitioning { current: Fork, domain_type: DomainType },
    }

    Pros: Simpler API, fewer match arms, clearer intent
    Cons: Loses explicit phase distinction in type system

  2. Keep current design (status quo):
    Pros: Explicit phases self-document the fork lifecycle
    Cons: Added complexity, unused fields

Recommendation: Consider option 1. The explicit phase distinction provides minimal value if the fields aren't used and consumers treat both phases identically. The simpler enum would be easier to understand and maintain.

3. Alternative: Extend ForkSchedule (review comments)

Shane-moore also suggested embedding runtime state in ForkSchedule:

pub struct ForkSchedule {
    configs: BTreeMap<Fork, ForkConfig>,
    network_name: String,
    current: RwLock<ActiveFork>,  // Runtime state
}

Pros: Eliminates separate SharedForkLifecycle module
Cons: Mixes configuration (immutable) with runtime state (mutable)

Recommendation: The current separate SharedForkLifecycle approach maintains better separation of concerns. Configuration and runtime state have different lifecycles and mutation patterns.

Code Quality

Strengths

  1. Excellent Test Coverage:

    • lifecycle.rs: Basic lifecycle operations well-tested
    • monitor.rs: Comprehensive state machine tests covering all transitions
    • connection.rs:552-997: 32 tests covering fork-aware behavior, edge cases, and the multi-fork subscription bug
  2. Clear Documentation:

    • Doc comments explain the "why" effectively
    • Per-fork tracking motivation is well-documented
  3. Thread Safety:

    • SharedForkLifecycle correctly uses Arc<RwLock<T>>
    • Single-writer pattern (ForkMonitor only) is enforced by convention and documented
  4. Performance Optimization:

    • connection.rs:223: Hoists lifecycle read outside peer loop to avoid repeated lock acquisition (good catch!)

Areas for Improvement

  1. Missing Error Handling:

    // monitor.rs:508
    let Some(domain_type) = fork_schedule.domain_type(current_fork) else {
        tracing::error!(...);
        return;  // Silent failure - lifecycle not updated!
    };

    Issue: If the fork schedule is misconfigured, lifecycle updates fail silently. This could lead to incorrect peer selection.

    Suggestion: Consider panic or more explicit error propagation for config errors vs. runtime errors.

  2. Clone vs. Reference (lifecycle.rs:85-86):

    pub fn get(&self) -> ForkLifecycle {
        self.0.read().clone()  // Clones the enum
    }

    Trade-off: Cloning the enum (small, 3 variants) is acceptable, but for bulk operations, prefer passing &ForkLifecycle or pre-reading once (as done in count_observed_peers_for_subnets).

  3. Discovery Domain Type Filtering (discovery.rs:455-462):
    The domain type predicate correctly reads from SharedForkLifecycle:

    let domain_type_predicate = move |enr: &Enr| {
        if let Some(Ok(domain_type)) = enr.get_decodable::<[u8; 4]>("domaintype") {
            shared_lifecycle.domain_type().0 == domain_type
        } else {
            trace!(?enr, "Rejecting ENR with missing domaintype");
            false
        }
    };

    This is correct - discovery automatically adapts to fork transitions.

Testing

Strong Points:

  • All three lifecycle states tested
  • Fork activation sequence tested (monitor.rs:935-1010)
  • Multi-fork bitmap aggregation thoroughly tested
  • Edge cases covered (duplicate subscribes, partial unsubscribes, disconnected peers)
  • Handshake fork lifecycle update propagation tested (handshake/mod.rs:542-684)

Suggestion: Add integration test demonstrating the core bug fix - that after grace period ends, peers only on the old fork are correctly excluded from subnet coverage calculations. The existing tests verify the mechanism but not the end-to-end behavior.

Security & Performance

No Issues Found:

  • No obvious security vulnerabilities
  • No performance regressions - the lifecycle read optimization is a positive change
  • Thread-safe shared state correctly implemented

Comparison with Existing Review Comments

Shane-moore's feedback aligns with my findings:

  1. ✅ ENR fallback inconsistency (issue #1 above) - valid and should be fixed
  2. ✅ Enum simplification suggestion (issue #2 above) - worth considering but not critical
  3. The "ForkSchedule extension" suggestion is interesting but I recommend keeping the current separation

Summary of Recommendations

Must Fix

  • Fix ENR fallback inconsistency for empty fork_map during transitions (add early return in get_peer_subnets_for_lifecycle)

Should Consider

  • Simplify ForkLifecycle enum by collapsing WarmUp/GracePeriod into Transitioning
  • Improve error handling for fork schedule misconfigurations (don't fail silently)
  • Add integration test for core bug fix scenario

Nice to Have

  • Add inline comments explaining why certain lifecycle phases are treated identically in peer selection

Conclusion

The PR successfully solves the fork-aware peer selection problem with a well-tested implementation. The critical ENR fallback inconsistency should be fixed before merge. The architectural simplification suggestions are worth discussing but not blocking.


@dknopik
Member

dknopik commented Feb 20, 2026

I think we can simplify this - I am just not sure how exactly. Here are a few possibilities:

  1. Roll back #814 (fix: share domain type across network components to fix handshake after fork activation). Considering the original motivation was a bugfix for keeping the current domain type updated, this has grown quite complex - hundreds of lines and a lot of code to test. Instead, I think the original approach is way simpler and at this point less error prone.
  2. Kind of this approach, but less complex: I think having a RwLock for the current ForkConfig is simple enough. Any parts of the code interested in the lifecycle can use the existing broadcast channel.
  3. Finally, avoid the shared state, but instead pass around the ForkSchedule to parts of the code. Or maybe a new wrapper that brings along a SlotClock for easier querying of the current fork.

Right now, this is the rough order of my preference. Happy to discuss all of these approaches!
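Option 2 above could look roughly like this (hypothetical names; the real ForkConfig and broadcast channel live in the fork crate):

```rust
use std::sync::{Arc, RwLock};

// Hypothetical stand-in for the crate's ForkConfig.
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct ForkConfig {
    pub domain_type: [u8; 4],
}

/// Shared current config: ForkMonitor writes it on transitions and still
/// broadcasts ForkPhase. Components interested in lifecycle *events* keep
/// using the existing channel; components that only need the current
/// *value* just read this handle.
#[derive(Clone)]
pub struct SharedForkConfig(Arc<RwLock<ForkConfig>>);

impl SharedForkConfig {
    pub fn new(initial: ForkConfig) -> Self {
        Self(Arc::new(RwLock::new(initial)))
    }

    pub fn current(&self) -> ForkConfig {
        self.0.read().unwrap().clone()
    }

    pub fn set(&self, next: ForkConfig) {
        *self.0.write().unwrap() = next;
    }
}
```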

@diegomrsantos
Member Author

Thanks for the feedback, I'll take a look at it next week. Did you consider this is also adding a fork-aware peer selection and isn't just a refactor? The idea was to have only one source of truth for fork-dependent data.

@shane-moore
Member

@dknopik @diegomrsantos, made a POC of how I interpret option 1 against this PR just to see what it could look like: diegomrsantos/anchor@feat/shared-fork-lifecycle...shane-moore:anchor:option1-simplification

def kills a lot of code

idea is that ForkPhase broadcast is source of truth for fork transitions. so we replace SharedForkLifecycle and have Network::on_fork_phase dispatch directly to the connection manager / discovery / handshake

and we keep the fork aware peer selection logic from this PR via having ConnectionManager track current_fork and in_transition

Not saying what i'm suggesting is necessarily the right call, just showing what it could look like

@diegomrsantos
Member Author

diegomrsantos commented Feb 21, 2026

idea is that ForkPhase broadcast is source of truth for fork transitions. so we replace SharedForkLifecycle and have Network::on_fork_phase dispatch directly to the connection manager / discovery / handshake

They serve two different goals. ForkPhase is for components that need to act on fork events. SharedForkLifecycle is for components that only need the current value.
Less code is good, but not at the expense of clarity. The priority is code that is easy to understand and reason about; cleverness that obscures behavior is a net negative.

One nuance: fewer lines often help clarity, so it’s not less code vs clarity, it’s clarity first.

@shane-moore
Member

agreed with clarity first! i think it's also nice to think about these types of decisions from the perspective of "what's easier for someone new to this part of the codebase to debug?". i would think if you were trying to debug something related to this, you'd just trace through the call paths and find in Network::on_fork_phase where domain type propagates to the three components and go from there. With SharedForkLifecycle, it has the benefit of self-contained state, so components don't need to know who updates them. But it might be more daunting to debug since you're tracing through Arc<RwLock<>> shared state that gets mutated from the monitor task, vs just following fn calls in Network::on_fork_phase

If we find we need this shared state across more components in the future, we can always introduce something like SharedForkLifecycle then since it'd for sure be more justified with more consumers

Just my thoughts!

@diegomrsantos
Member Author

diegomrsantos commented Feb 23, 2026

I’m still struggling to see where the additional complexity is coming from here.

We already need a background task (ForkMonitor) to track fork schedule and broadcast ForkPhase transitions. That’s true regardless of whether we propagate state via events or via a shared handle. In this PR, the monitor simply becomes the single writer of a single value (SharedForkLifecycle), and everyone else reads it when they need the current domain type / lifecycle. This is the same pattern we already had with SharedDomainType, just with a better invariant model (ForkLifecycle makes invalid states unrepresentable). 

What this change buys us is removing duplicated “cache derived value on event” logic from multiple components and avoiding desync between “what phase are we in?” vs “what domain type are we using?” 

If the concern is specifically:
• traceability (following explicit call paths from Network::on_fork_phase), or
• locks (hot-path reads),

can we call that out directly? From what I can see, reads are cheap (and we already hoist outside loops where it matters), and the writer is centralized. 

Happy to simplify further, but I’d like to keep the decision rule explicit:
• single source of truth,
• minimal duplicated derived state,
• easiest to debug “current fork / domain type” at any point in time.

If there’s a concrete code path where reading SharedForkLifecycle is harder to reason about/debug than propagating via Network::on_fork_phase, point me at it and we’ll address it.
