blockchain, netsync: implement a complete headers-first download during ibd#2428
kcalvinalvin wants to merge 17 commits into btcsuite:master
Conversation
We add a chainview of bestHeaders so that we'll be able to keep track of headers separately from the bestChain. This is needed as we're now getting headers for new block announcements instead of invs.
Since we may now have blockNodes with just the block header stored without the data, we add a new status to account for this.
maybeAcceptHeader performs checks to accept block headers into the header chain. This function allows for a true headers-first download where we accept only block headers for downloading new blocks.
cc: @gijswijs for review

cc: @mohamedawnallah for review
Roasbeef left a comment:
Great set of changes! I found the commit structure very easy to follow as well.
Have performed an initial review pass, but will start to run some active tests on one of my nodes to get a better feel for the changes.
```go
// HaveHeader returns whether the header data is stored in the database.
func (status blockStatus) HaveHeader() bool {
```
Isn't this just the inverse of HaveData?
Then you can't communicate not having the header for a given block hash, so I think having this is better.
If you used !HaveData() as the check, you'd have no way to ask whether we actually have the block header for a given block hash at all.
```go
// If we're in the initial block download mode, check if we have
// peers that we can download headers from.
_, bestHeaderHeight := sm.chain.BestHeader()
higherHeaderPeers := sm.fetchHigherPeers(bestHeaderHeight)
```
Even though many years have passed at this point, we should retain the logic that we removed to filter out non-segwit peers once segwit is active.
In sm.fetchHigherPeers(), we filter out peers that aren't sync candidates. Peers can only be sync candidates if they're segwit-activated; this check is done in isSyncCandidate().
ProcessBlockHeader performs chain selection and both context-free and contextual validation for the given block header. The function allows a headers-first download of blocks even without checkpoints.
On flushes to the database, we check that blockNodes holding only a downloaded block header are not flushed to disk unless the block data is stored as well, for backwards compatibility. Older btcd clients rely on the presence of a blockNode to infer that the block data is also present. Since we now store blockNodes for just the block headers, that's no longer true, so we don't flush blockNodes that have no accompanying block data. This results in downloading and verifying the headers again if the node restarts, but since the header data is small and the verification is quick, it's not a big downside.
In block processing and block downloading, HaveBlock is used to check if the block data already exists. It used to be enough to check for the existence of the blockNode, but now we also need to check whether the data exists, as the blockNode may be present for just the block header.
The added exported methods on BlockChain provide access to the block header tip, such as fetching block hashes and heights from the headers chain.
checkHeadersList takes in a block hash and returns whether it's a checkpointed block along with the correct behavior flags for the verification of the block.
fetchHigherPeers is a convenience function to get peers that are sync candidates and are at a higher advertised height than the passed-in height.
isInIBDMode returns whether the SyncManager needs to download blocks and sync to the latest chain tip. It determines whether it's in IBD mode by checking if the blockchain thinks we're current and whether we have peers at higher advertised blocks.
fetchHeaders picks a random peer at a higher advertised block and requests headers from them.
Instead of the old headerList based header processing, we make use of the new ProcessBlockHeader function.
Since we now utilize ProcessBlockHeader, we change fetchHeaderBlocks to be based on the block index instead of the headerList in SyncManager.
Since we now utilize ProcessBlockHeaders, we change the startSync function to utilize the block index for downloading blocks/headers instead of using the headerList in SyncManager.
handleBlockMsg is updated to no longer be based on the headerList and sm.nextCheckpoint when doing operations related to checkpoints and headers.
Since we no longer utilize the headerList for doing headers-first download and checkpoint tracking, we remove related code.
ibdMode is a more fitting name than headersFirstMode since all blocks are downloaded headers-first during the initial block download.
Addressed all the review comments by roasbeef.
Tested in the wild, and does what it says on the tin:

```
2025-12-17 18:32:10.885 [INF] SYNC: Downloading headers for blocks 136001 to 2401568 from peer PEER_A:18333
2025-12-17 18:32:10.885 [INF] SYNC: Lost peer PEER_B:18333 (outbound)
2025-12-17 18:32:16.309 [INF] SYNC: New valid peer PEER_C:18333 (outbound) (/Satoshi:27.1.0/)
2025-12-17 18:32:40.885 [INF] SYNC: Downloading headers for blocks 136001 to 4807558 from peer PEER_D:18333
2025-12-17 18:32:40.885 [INF] SYNC: Lost peer PEER_A:18333 (outbound)
2025-12-17 18:32:41.044 [INF] SYNC: New valid peer PEER_E:18333 (outbound) (/Satoshi:28.1.0/)
2025-12-17 18:32:41.135 [INF] SYNC: New valid peer PEER_F:18333 (outbound) (/Satoshi:27.0.0/)
2025-12-17 18:32:46.438 [INF] CHAN: Verified checkpoint at height 200000/block 0000000000287bffd321963ef05feab753ebe274e1d78b2fd4e2bfe9ad3aa6f2
2025-12-17 18:32:51.245 [INF] SYNC: New valid peer PEER_G:18333 (outbound) (/Satoshi:28.1.0/)
2025-12-17 18:32:55.171 [INF] CHAN: Verified checkpoint at height 300001/block 0000000000004829474748f3d1bc8fcf893c88be255e6d7f571c548aff57abf4
2025-12-17 18:33:03.103 [INF] CHAN: Verified checkpoint at height 400002/block 0000000005e2c73b8ecb82ae2dbc2e8274614ebad7172b53528aba7501f5a089
2025-12-17 18:33:10.885 [INF] SYNC: Downloading headers for blocks 490001 to 4807558 from peer PEER_G:18333
2025-12-17 18:33:10.885 [INF] SYNC: Lost peer PEER_D:18333 (outbound)
2025-12-17 18:33:13.135 [INF] CHAN: Verified checkpoint at height 500011/block 00000000000929f63977fbac92ff570a9bd9e7715401ee96f2848f7b07750b02
2025-12-17 18:33:25.990 [INF] CHAN: Verified checkpoint at height 600002/block 000000000001f471389afd6ee94dcace5ccc44adc18e8bff402443f034b07240
2025-12-17 18:33:38.331 [INF] CHAN: Verified checkpoint at height 700000/block 000000000000406178b12a4dea3b27e13b3c4fe4510994fd667d7c1e6a3f4dc1
2025-12-17 18:33:40.884 [INF] SYNC: Downloading headers for blocks 718001 to 4807558 from peer PEER_E:18333
2025-12-17 18:33:40.888 [INF] SYNC: Lost peer PEER_G:18333 (outbound)
2025-12-17 18:33:46.254 [INF] SYNC: New valid peer PEER_H:18333 (outbound) (/Satoshi:25.1.0/)
2025-12-17 18:33:50.353 [INF] CHAN: Verified checkpoint at height 800010/block 000000000017ed35296433190b6829db01e657d80631d43f5983fa403bfdb4c1
2025-12-17 18:34:01.433 [INF] CHAN: Verified checkpoint at height 900000/block 0000000000356f8d8924556e765b7a94aaebc6b5c8685dcfa2b1ee8b41acd89b
2025-12-17 18:34:10.884 [INF] SYNC: Downloading headers for blocks 986001 to 4807558 from peer PEER_E:18333
2025-12-17 18:34:10.884 [INF] SYNC: Lost peer PEER_E:18333 (outbound)
2025-12-17 18:34:10.884 [INF] SYNC: Downloading headers for blocks 986001 to 4807558 from peer PEER_H:18333
2025-12-17 18:34:11.269 [INF] SYNC: New valid peer PEER_I:18333 (outbound) (/Satoshi:25.1.0/)
2025-12-17 18:34:11.270 [INF] SYNC: New valid peer PEER_J:18333 (outbound) (/Satoshi:0.20.1/)
2025-12-17 18:34:13.213 [INF] CHAN: Verified checkpoint at height 1000007/block 00000000001ccb893d8a1f25b70ad173ce955e5f50124261bbbc50379a612ddf
2025-12-17 18:34:16.312 [INF] SYNC: New valid peer PEER_K:18333 (outbound) (/Satoshi:27.0.0/)
2025-12-17 18:34:25.081 [INF] CHAN: Verified checkpoint at height 1100007/block 00000000000abc7b2cd18768ab3dee20857326a818d1946ed6796f42d66dd1e8
```

One thing I noticed though is that peers may disconnect us along the way, so we rotate through a few peers to fetch all the headers. This is testnet3, so there are quite a lot of headers (over 4 million). Are we hitting something like a default per-peer upload limit? IIUC this is a stepping stone for multi-peer header download, so perhaps that'll be sorted out once we fetch batches of headers from many peers in parallel.
```go
	return headers
}

func TestProcessBlockHeader(t *testing.T) {
```
```go
// If we don't already know this is a valid header, return false and
// BFNone.
if !sm.chain.IsValidHeader(blockHash) {
	return false, blockchain.BFNone
```
Do we want to be explicit? Otherwise we can just return the declared variables.
```go
filename := filepath.Join("testdata/", testFile)

file, err := os.Open(filename)
if err != nil {
```
```go
for scanner.Scan() {
	line := scanner.Text()
	b, err := hex.DecodeString(line)
	if err != nil {
```
nit: use the require package here, and in the other places where t.Fatal is used.
```go
	t.Fatal(err)
}

if !isMainChain {
```
```go
		header.BlockHash())
	}
}
if err != nil {
```
```go
	require.Equal(t, test.behaviorFlags, gotFlags)
	}
}
```

```go
	return 0
}

func TestIsInIBDMode(t *testing.T) {
```
```go
// Orphan headers are not allowed and this function should never be called
// with the genesis block.
prevHash := &header.PrevBlock
prevNode := b.index.LookupNode(prevHash)
```
Seems like a duplicate prevNode lookup; we just did it above in the if node != nil block. Maybe we should consolidate it?
```go
_, height := sm.chain.BestHeader()
higherPeers := sm.fetchHigherPeers(height)
var bestPeer *peerpkg.Peer
switch {
```
Do we expect other cases in the future? Otherwise we can just use if.
```go
if node == nil {
	node = newBlockNode(header, prevNode)
	node.status = statusHeaderStored
	b.index.AddNode(node)
```
The if node == nil guard here correctly avoids overwriting an existing node. But maybeAcceptBlock at line 67 has no such guard — it unconditionally creates a fresh *blockNode and calls AddNode, overwriting the index entry. If this header-only node already exists in bestHeader's chainView, the old pointer becomes orphaned while the chainView still holds it.
After that, bestHeader.Contains(b.index.LookupNode(&hash)) returns false because LookupNode returns the new pointer while bestHeader still has the old one. This breaks IsValidHeader, HeaderHashByHeight, and checkHeadersList in netsync.
Could maybeAcceptBlock check if a node already exists and upgrade its status to statusDataStored rather than creating a new one? Similar to the if node == nil pattern here.
```go
str := fmt.Sprintf(
	"previous block %s is known to be invalid", prevHash)
return false, ruleError(ErrInvalidAncestorBlock, str)
}
```
Small gap here -- when the header is already known (node != nil), the code checks prevNode.KnownInvalid() but not node.status.KnownInvalid() itself. The godoc at line 109-111 promises we return an error for known-invalid headers, so we should probably add that check before the bestHeader.Contains early return.
```go
node, hasBlock := bi.index[*hash]
bi.RUnlock()
return hasBlock
return hasBlock && node.status.HaveData()
```
Small concurrency note -- HaveBlock now reads node.status after releasing RLock at line 395. The old code only returned a captured bool. Since SetStatusFlags/UnsetStatusFlags mutate status under the write lock, this is technically a data race under Go's memory model, even though byte reads are usually atomic in practice. Moving the HaveData() check inside the lock would keep it clean.
```go
// up a btcd node with an older version, it would result in
// an unrecoverable error as older versions would consider a
// blockNode being present as having the block data as well.
if node.status.HaveHeader() &&
```
flushToDB opens a DB write transaction whenever dirty is non-empty, but during header processing every dirty node is header-only and gets skipped here. That means each ProcessBlockHeader call ends up doing a write txn that writes nothing. During long header sync that's a lot of no-op transactions. Maybe we could skip the flush entirely in maybeAcceptBlockHeader, or have flushToDB bail early if all dirty nodes would be skipped?
```go
}

// Set the lastNode in the blockindex as the bestheader.
b.bestHeader.SetTip(lastNode)
```
initChainState sets bestHeader to lastNode here, which is the last entry in the block index bucket's key order (height+hash). If side-chain blocks exist at a height above the best chain tip, this picks the wrong branch. A few lines below, the best block chain tip is resolved from state.hash at line 1270 — should bestHeader be set to the same tip node? Since header-only nodes aren't persisted, after restart the header and block views should be identical.
```go
peer.Disconnect()
return
}
```
The old headersFirstMode guard got removed but there's no replacement check for unsolicited headers. In non-IBD with syncPeer == nil, a peer sending valid unsolicited headers can drive handleHeadersMsg into fetchHeaderBlocks (line 971), which dereferences sm.syncPeer at lines 899/907/920 — that's a nil pointer panic. Might need either restoring a request/response guard or adding a nil check on syncPeer before entering fetchHeaderBlocks.
```go
sm.headerList.Remove(sm.headerList.Front())
log.Infof("Received %v block headers: Fetching blocks",
	sm.headerList.Len())
sm.progressLogger.SetLastLogTime(time.Now())
```
fetchHeaders at line 294 sends getheaders but never refreshes lastProgressTime, and handleHeadersMsg doesn't update it on successful header processing either. With ~870k mainnet headers and a 3-minute stall timeout, the stall handler at line 470 will fire mid-header-download and disconnect a healthy sync peer. This would hit almost immediately on mainnet IBD. We probably need a lastProgressTime refresh in the header processing path, similar to how block processing refreshes it.
```go
return
bestHash, bestHeight := sm.chain.BestHeader()
if sm.ibdMode {
	if bestHeight < sm.syncPeer.LastBlock() {
```
sm.syncPeer.LastBlock() here has no nil guard. syncPeer can already be nil when we enter handleHeadersMsg (e.g., if a peer disconnect was processed first), so this is a reachable panic. A quick nil check before this call would make it safe.
```go
	return sm, tearDown
}

func TestCheckHeadersList(t *testing.T) {
```
The new tests cover the helper predicates well, but the risky control flow in handleHeadersMsg, fetchHeaderBlocks, startSync, and stall handling isn't exercised. Those are where the trickiest behavior changes live — would be great to have at least a basic end-to-end test that drives headers through to block download, so regressions in the sync state machine get caught.
```go
// Check that the tip is correct.
lastHeader := headers[len(headers)-1]
lastHeaderHash := lastHeader.BlockHash()
tipNode := chain.bestHeader.Tip()
```
Small thing — tipNode is captured once here and reused in assertions at lines 134 and 179. If bestHeader moved unexpectedly, these checks would still pass against the stale pointer. Re-fetching chain.bestHeader.Tip() for each assertion would make the "tip didn't move" checks more robust.
Change Description
Right now the headers-first download is only based off of the checkpoints and is thus limited to the last checkpoint.
The newly implemented headers-first download will always download headers-first and will validate them to see if the headers connect and have proper proof of work.
Then the block download will be based off of the verified headers. This now eliminates any potential downloading of txs or orphan blocks during ibd. It also makes future parallel block download much better as a parallel block download can only happen for blocks we already have headers for.
It's not yet in the code, but this also allows the node to receive block headers instead of invs during block propagation. I'll do this in a follow-up later.