feat(rpc): add gettxoutsetinfo endpoint
#595
base: dev
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,3 @@ | ||
| # `gettxoutsetinfo` | ||
|
|
||
| See [Zaino's Unspent Hash set](./gettxoutsetinfo/canonical_utxo_set_snapshot_hash.md) for more information on how the UTXO set hash is computed. |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,169 @@ | ||
| Title: ZAINO-UTXOSET-01 Canonical UTXO Set Snapshot Hash (v1) | ||
| Owners: dorianvp <dorianvp@zingolabs.org> | ||
| Za Wil <zancas@zingolabs.org> | ||
| Status: Draft | ||
| Category: Lightclients | ||
| Created: 2025-10-16 | ||
| License: MIT | ||
|
|
||
| ## Terminology | ||
|
|
||
| - The key words **MUST**, **MUST NOT**, **SHOULD**, and **MAY** are to be interpreted as described in BCP 14 [^BCP14] when, and only when, they appear in all capitals. | ||
| - Integers are encoded **little-endian** unless otherwise stated. | ||
| - “CompactSize” refers to the variable-length integer format [specified by Bitcoin](https://en.bitcoin.it/wiki/Protocol_documentation#Variable_length_integer) and [implemented for Zcash](https://docs.rs/zcash_encoding/0.3.0/zcash_encoding/struct.CompactSize.html). | ||
| - `BLAKE3` denotes the 32-byte output of the BLAKE3 hash function. | ||
| - This specification defines **version 1** (“V1”) of the ZAINO UTXO snapshot. | ||
| - **network**: | ||
| a blockchain instance identified by its genesis block and consensus parameters. | ||
|
|
||
| ## Abstract | ||
|
|
||
| This document specifies a deterministic, versioned procedure to compute a 32-byte hash of a node’s UTXO set at a specified best block. The snapshot uses a canonical ordering and serialization and is hashed under a domain tag. | ||
|
|
||
| Among other uses, the snapshot hash can be used to: | ||
|
|
||
| - Verify that two nodes at the same best block have the same UTXO set across implementations and versions. | ||
| - Pin failing test fixtures to a snapshot hash to reproduce issues. | ||
| - Log periodic hashes to show continuity of state over time. | ||
|
|
||
| The hash is _not_ input to consensus validation. | ||
|
|
||
| ## Motivation | ||
|
|
||
| Different nodes (e.g., `zcashd`, Zebra, indexers) may expose distinct internals or storage layouts. Operators often need a cheap way to verify “we’re looking at the same unspent set” without transporting the entire set. A canonical, versioned snapshot hash solves this. | ||
|
|
||
| ## Domain Separation | ||
|
|
||
| Implementations **MUST** domain-separate the hash with the ASCII header: | ||
|
|
||
| ``` | ||
| "ZAINO-UTXOSET-V1\0" | ||
| ``` | ||
|
|
||
| Any change to the encoding rules or semantics **MUST** bump the domain string (e.g., `…-V2\0`) and is out of scope of this document. | ||
|
|
||
| ## Inputs | ||
|
|
||
| To compute the snapshot hash, the implementation needs: | ||
|
Why include anything other than the UTXOs as inputs in the snapshot hash? Shouldn't we already know that we're looking at the same UTXO set if the best block hashes match? |
||
|
|
||
| - `network`: ASCII string identifying the chain. Recommended values: `"mainnet"`, `"testnet"`, `"regtest"`. | ||
| - `best_height`: the best chain height at the time of the snapshot (unsigned 32-bit). | ||
dorianvp marked this conversation as resolved.
|
||
| - `best_block`: the 32-byte block hash of the best chain tip, in the node’s _canonical internal byte order_. | ||
|
Member
Can we have a reference for the canonical internal byte order? |
||
| - `UTXO set`: a finite map from outpoints `(txid, vout)` to outputs `(value_zat, scriptPubKey)`, where: | ||
|
|
||
| - `txid` is a 32-byte transaction hash (internal byte order). | ||
|
Member
Same as above? |
||
| - `vout` is a 32-bit output index (0-based). | ||
|
Member
Author
Leaving a note here: If we serialize per unspent as
|
||
| - `value_zat` is a non-negative amount in zatoshis, range-checked to the node’s monetary bounds (e.g., `0 ≤ value_zat ≤ MAX_MONEY`). | ||
| - `scriptPubKey` is a byte string. | ||
|
|
||
| Implementations **MUST** reject negative values or out-of-range amounts prior to hashing. | ||
|
|
||
| ## Canonical Ordering | ||
zancas marked this conversation as resolved.
|
||
|
|
||
| The snapshot **MUST** be ordered as follows, independent of the node’s in-memory layout: | ||
|
|
||
| 1. Sort by `txid` ascending, comparing the raw 32-byte values as unsigned bytes. | ||
|
Contributor
This seems like a bad serialization, because it requires recomputation over the entire UTXO set whenever a new block is received. The UTXO set can be very large; it would be much better to choose a snapshot protocol where snapshot hashes can incrementally build on the snapshot hash prior to the addition of a new block.
Contributor
If the UTXO set were stored in a B-tree data structure that internally kept Merkle hashes at the nodes, then it might be okay to use the Merkle root of that data structure for the snapshot identifier. It would need to be the case that the fanout of the B-tree and the insertion semantics were well-specified to ensure that everyone uses the same hashing approach. One possibility that would allow for this to work as-specified would be to use a separate B-tree (implementing a set, rather than a map) for producing the hashes; since the txid commits to the effects of each transaction, one could build the snapshot identifier alongside the actual data, but building that identifier in parallel would have a risk of data inconsistencies with the primary store. In general, I feel like the UTXO set would be best represented as a persistent data structure with good amortized append costs.
|
||
| 2. For equal `txid`s, sort by `vout` ascending (unsigned 32-bit). | ||
|
|
||
| This ordering **MUST** be used for serialization. | ||
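The ordering rule can be sketched in Python as follows. This is only an illustration: the tuple layout `(txid, vout, value_zat, script_pub_key)` and the name `canonical_order` are assumptions made for the example, not part of the specification. Python compares `bytes` objects lexicographically by unsigned byte value, which matches rule 1.

```python
def canonical_order(utxos):
    """Sort entries by txid (raw bytes, unsigned comparison), then by vout.

    Each entry is assumed to be a tuple (txid, vout, value_zat, script_pub_key),
    where txid is a 32-byte `bytes` object and vout is a non-negative int.
    """
    return sorted(utxos, key=lambda entry: (entry[0], entry[1]))
```

An in-memory sort is used here for brevity; a node that already stores outpoints in this order could iterate directly instead.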
|
|
||
| ## Serialization | ||
|
|
||
| The byte stream fed to the hash is the concatenation of a **header** and **entries**: | ||
|
|
||
| ### Header | ||
|
|
||
| - ASCII bytes: `"ZAINO-UTXOSET-V1\0"` | ||
|
Member
Why is this
Member
Author
It only acts as a terminator/delimiter. It is not strictly necessary... |
||
| - `network` as ASCII bytes, followed by a single NUL byte `0x00`. | ||
| - `best_height` as `u32` little-endian. | ||
| - `best_block` as 32 raw bytes. | ||
| - `count_txouts` as `u64` little-endian, where `count_txouts` is the total number of serialized entries below. | ||
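As a rough illustration of the header layout listed above, the following Python sketch concatenates the five fields in order. The function name `serialize_header` and the length check on `best_block` are assumptions made for the example.

```python
import struct

DOMAIN_TAG = b"ZAINO-UTXOSET-V1\x00"  # 16 ASCII bytes plus the trailing NUL


def serialize_header(network: str, best_height: int,
                     best_block: bytes, count_txouts: int) -> bytes:
    """Build the header bytes: tag, network + NUL, height, tip hash, count."""
    if len(best_block) != 32:
        raise ValueError("best_block must be exactly 32 bytes")
    return (
        DOMAIN_TAG
        + network.encode("ascii") + b"\x00"   # NUL-terminated network name
        + struct.pack("<I", best_height)      # u32 little-endian
        + best_block                          # 32 raw bytes
        + struct.pack("<Q", count_txouts)     # u64 little-endian
    )
```

For `network = "mainnet"` the header is 17 + 8 + 4 + 32 + 8 = 69 bytes long.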
|
|
||
| ### Entries (one per outpoint in canonical order) | ||
|
|
||
| For each `(txid, vout, value_zat, scriptPubKey)`: | ||
|
|
||
| - `txid` as 32 raw bytes. | ||
| - `vout` as `u32` little-endian. | ||
| - `value_zat` as `u64` little-endian. | ||
| - `script_len` as CompactSize (Bitcoin/Zcash varint) of `scriptPubKey.len()`. | ||
| - `scriptPubKey` raw bytes. | ||
|
|
||
| **Note:** No per-transaction terminators or grouping markers are used. Instead, the format commits to _outputs_, not _transactions_. | ||
|
|
||
| ### CompactSize ([reference](https://en.bitcoin.it/wiki/Protocol_documentation#Variable_length_integer)) | ||
|
|
||
| - If `n < 0xFD`: a single byte `n`. | ||
| - Else if `n ≤ 0xFFFF`: `0xFD` followed by `n` as `u16` little-endian. | ||
| - Else if `n ≤ 0xFFFF_FFFF`: `0xFE` followed by `n` as `u32` little-endian. | ||
| - Else: `0xFF` followed by `n` as `u64` little-endian. | ||
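The four cases above translate directly into code. The sketch below is a minimal Python encoder; the name `compact_size` and the explicit range check are choices made for this example.

```python
import struct


def compact_size(n: int) -> bytes:
    """Encode n using the four CompactSize cases listed above."""
    if n < 0 or n > 0xFFFF_FFFF_FFFF_FFFF:
        raise ValueError("CompactSize is only defined for 0 <= n < 2**64")
    if n < 0xFD:
        return struct.pack("<B", n)             # single byte
    if n <= 0xFFFF:
        return b"\xfd" + struct.pack("<H", n)   # 0xFD + u16 little-endian
    if n <= 0xFFFF_FFFF:
        return b"\xfe" + struct.pack("<I", n)   # 0xFE + u32 little-endian
    return b"\xff" + struct.pack("<Q", n)       # 0xFF + u64 little-endian
```

For example, a 252-byte script gets the one-byte prefix `fc`, while a 253-byte script gets the three-byte prefix `fd fd 00`.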
|
|
||
| ## Hash Function | ||
|
|
||
| - The implementation **MUST** stream the bytes above into a BLAKE3 hasher. | ||
| - The 32-byte output of the hasher is the **snapshot hash**. | ||
|
|
||
| ## Pseudocode | ||
|
|
||
| ```text | ||
| function UtxoSnapshotHashV1(network, best_height, best_block, utxos): | ||
| H ← blake3::Hasher() | ||
|
|
||
| // Header | ||
| H.update("ZAINO-UTXOSET-V1\0") | ||
| H.update(network) | ||
| H.update("\0") | ||
| H.update(le_u32(best_height)) | ||
| H.update(best_block) // 32 raw bytes, node’s canonical order | ||
| count ← number_of_outputs(utxos) | ||
| H.update(le_u64(count)) | ||
|
|
||
| // Entries in canonical order | ||
| for (txid, vout, value, script) in sort_by_txid_then_vout(utxos): | ||
| assert 0 ≤ value ≤ MAX_MONEY | ||
| H.update(txid) // 32 raw bytes | ||
| H.update(le_u32(vout)) | ||
| H.update(le_u64(value)) // zatoshis | ||
| H.update(CompactSize(script.len)) | ||
| H.update(script) | ||
|
|
||
| return H.finalize() // 32-byte BLAKE3 digest | ||
| ``` | ||
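The pseudocode above could be realized in Python roughly as follows. This is a sketch under stated assumptions, not a reference implementation: the hasher is passed in as a `new_hasher` factory (a hypothetical parameter) so that the block depends only on the standard library, entries are assumed to be `(txid, vout, value_zat, script_pub_key)` tuples, and `MAX_MONEY` is assumed to be the Zcash bound of 21,000,000 ZEC expressed in zatoshis.

```python
import struct

DOMAIN_TAG = b"ZAINO-UTXOSET-V1\x00"
MAX_MONEY = 21_000_000 * 100_000_000  # assumed Zcash bound, in zatoshis


def compact_size(n: int) -> bytes:
    """CompactSize encoding of a non-negative length below 2**64."""
    if n < 0xFD:
        return struct.pack("<B", n)
    if n <= 0xFFFF:
        return b"\xfd" + struct.pack("<H", n)
    if n <= 0xFFFF_FFFF:
        return b"\xfe" + struct.pack("<I", n)
    return b"\xff" + struct.pack("<Q", n)


def utxo_snapshot_hash_v1(network, best_height, best_block, utxos, new_hasher):
    """Hash the header and the canonically ordered entries.

    `new_hasher` is any zero-argument callable returning an object with
    update(bytes) and digest() methods. A V1 snapshot hash requires BLAKE3.
    """
    entries = sorted(utxos, key=lambda e: (e[0], e[1]))  # txid, then vout

    # Validate everything first so that no hash is produced on bad input.
    if len(best_block) != 32:
        raise ValueError("best_block must be 32 bytes")
    for txid, vout, value, script in entries:
        if len(txid) != 32:
            raise ValueError("txid must be 32 bytes")
        if not 0 <= value <= MAX_MONEY:
            raise ValueError("value_zat out of range")

    h = new_hasher()

    # Header
    h.update(DOMAIN_TAG)
    h.update(network.encode("ascii") + b"\x00")
    h.update(struct.pack("<I", best_height))
    h.update(best_block)
    h.update(struct.pack("<Q", len(entries)))

    # Entries in canonical order
    for txid, vout, value, script in entries:
        h.update(txid)
        h.update(struct.pack("<I", vout))
        h.update(struct.pack("<Q", value))
        h.update(compact_size(len(script)))
        h.update(script)

    return h.digest()
```

With the third-party `blake3` Python package installed, passing `blake3.blake3` as `new_hasher` should yield the 32-byte digest described in this document. Any other hasher (for example `hashlib.sha256`) only exercises the serialization logic and does not produce a valid V1 snapshot hash.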
|
|
||
| ## Error Handling | ||
|
|
||
| - If any `value_zat` is negative or exceeds `MAX_MONEY`, the snapshot procedure **MUST** fail and **MUST NOT** produce a hash. | ||
| - If the UTXO set changes during iteration (non-atomic read), the implementation **SHOULD** retry using a stable view (e.g., read lock or height-pinned snapshot). | ||
|
|
||
| ## Security and Interop Considerations | ||
|
|
||
| - This hash is **not a consensus commitment** and **MUST NOT** be used to validate blocks or transactions. | ||
| - The domain string prevents cross-protocol collisions. | ||
| - Including `network`, `best_height`, and `best_block` prevents accidental equality across different networks, heights, or chain tips. | ||
dorianvp marked this conversation as resolved.
|
||
| - Because the order is fully specified, two independent implementations reading the _same_ set will produce the _same_ hash. | ||
|
|
||
| ## Rationale | ||
|
|
||
| - **BLAKE3** is chosen for speed and strong modern security. SHA-256 would also work but is slower on large sets. The domain string ensures local uniqueness regardless of the hash function family. | ||
| - Committing to _outputs_ rather than _transactions_ simplifies implementations that don’t have transaction-grouped storage. | ||
| - CompactSize matches existing Bitcoin/Zcash encoding and avoids ambiguity. | ||
|
|
||
| ## Versioning | ||
|
|
||
| - Any breaking change to the byte stream, input semantics, or ordering **MUST** bump the domain tag to `ZAINO-UTXOSET-V2\0` (or higher). | ||
| - Implementations **SHOULD** publish the version alongside the hash in logs and APIs. | ||
|
|
||
| ## Test Guidance | ||
|
|
||
| Implementations **SHOULD** include tests covering: | ||
|
|
||
| 1. **Determinism:** Shuffle input, and the hash remains constant. | ||
| 2. **Sensitivity:** Flip one bit in `value_zat` or `scriptPubKey`, and the hash changes. | ||
| 3. **Metadata:** Change `network` or `best_block`, and the hash changes. | ||
| 4. **Empty Set:** With `count_txouts = 0`, the hash is well-defined. | ||
| 5. **Large Scripts:** Script lengths on either side of each CompactSize boundary (252 and 253, 2^16 − 1 and 2^16, 2^32 − 1 and 2^32). | ||
| 6. **Ordering:** Two entries with the same `txid` but different `vout` are ordered by `vout`. | ||
|
|
||
| ## References | ||
|
|
||
| [^BCP14]: [Information on BCP 14 — "RFC 2119"](https://www.rfc-editor.org/info/bcp14) | ||
I see that zcashd had this method returning a `hash_serialized` already, but why is it not enough to check that the block hashes match?