Changes from 38 commits
44 commits
ee1beae
chore(`gettxoutsetinfo`): start writing common types
dorianvp Oct 8, 2025
070b1cd
chore(`gettxoutsetinfo`): initial impl of `get_txout_set`
dorianvp Oct 8, 2025
1a4a340
docs(`gettxoutsetinfo`): draft utxo set spec
dorianvp Oct 9, 2025
729436b
Merge branch 'dev' into feat/rpc-gettxoutsetinfo
dorianvp Oct 10, 2025
b4bfdbb
chore(`gettxoutsetinfo`): initial impl of `ZAINO-UHS-01`
dorianvp Oct 11, 2025
6948367
chore(`gettxoutsetinfo`): add doc comments
dorianvp Oct 12, 2025
d71e1b9
chore(`gettxoutsetinfo`): use `utxoset_hash_v1` in `StateServiceSubsc…
dorianvp Oct 12, 2025
4acaddb
Merge branch 'dev' into feat/rpc-gettxoutsetinfo
dorianvp Oct 12, 2025
438f546
chore(`gettxoutsetinfo`): add doc comments
dorianvp Oct 13, 2025
03c9e75
test(`gettxoutsetinfo`): add `fetch_service` test
dorianvp Oct 13, 2025
485af98
test(`gettxoutsetinfo`): add top-level comment
dorianvp Oct 13, 2025
6b7bc67
test(`gettxoutsetinfo`): add top-level comment
dorianvp Oct 13, 2025
6ac601a
test(`gettxoutsetinfo`): add `state_service_get_txout_set_info`
dorianvp Oct 13, 2025
ace5bc5
chore(`gettxoutsetinfo`): run clippy
dorianvp Oct 13, 2025
b816cc1
chore(`gettxoutsetinfo`): enable endpoint & add tests
dorianvp Oct 13, 2025
4b1bf7c
test(`gettxoutsetinfo`): add `uhs` tests
dorianvp Oct 14, 2025
e24895d
chore(`gettxoutsetinfo`): comments
dorianvp Oct 14, 2025
ea1dbfd
chore(`gettxoutsetinfo`): typed network enum
dorianvp Oct 14, 2025
b1f40ca
chore(`gettxoutsetinfo`): re-export `zaino_common::Network`
dorianvp Oct 14, 2025
f0b0f0e
chore(`gettxoutsetinfo`): address todos
dorianvp Oct 14, 2025
ab8b70d
chore(`gettxoutsetinfo`): add `byte_order_tests`
dorianvp Oct 14, 2025
5c92768
chore(`gettxoutsetinfo`): add `utxo_serialized_size`
dorianvp Oct 14, 2025
eebeab3
chore(`gettxoutsetinfo`): fix last todo
dorianvp Oct 14, 2025
380d2b4
add references for CompactSize
zancas Oct 15, 2025
df38588
make reference explicit
zancas Oct 15, 2025
6d86ef3
chore(`gettxoutsetinfo`): small spec corrections
dorianvp Oct 16, 2025
c4d8622
add header and terminology, propose refinement to abstract
zancas Oct 17, 2025
6dbda1b
ZI-ng-P: 0
zancas Oct 17, 2025
37377d9
start moving sections to more closely align with https://github.com/z…
zancas Oct 17, 2025
758b19d
reference in references, BCP 14
zancas Oct 17, 2025
2c42ba9
fix space in footnote
zancas Oct 17, 2025
217f917
fix footnote
zancas Oct 17, 2025
6d04353
futz with terminology
zancas Oct 17, 2025
28c5618
bold consensus network
zancas Oct 17, 2025
335cd6f
Merge branch 'dev' into feat/rpc-gettxoutsetinfo
dorianvp Oct 17, 2025
2036641
make rpc name part of Title
zancas Oct 17, 2025
d875938
Merge remote-tracking branch 'zingolabs/feat/rpc-gettxoutsetinfo' int…
zancas Oct 17, 2025
abc2d6a
docs(`gettxoutsetinfo`): use `network` instead of `consensus network`
dorianvp Oct 17, 2025
a311fc9
docs(`gettxoutsetinfo`): replace `network` with `genesis_block_hash`
dorianvp Oct 18, 2025
d2d43c1
chore(`gettxoutsetinfo`): remove `BlockHash`, update spec & impl
dorianvp Oct 19, 2025
0c152f1
Merge branch 'dev' into feat/rpc-gettxoutsetinfo
dorianvp Oct 19, 2025
0954755
Merge branch 'dev' into feat/rpc-gettxoutsetinfo
dorianvp Oct 20, 2025
1f5415e
Merge branch 'dev' into feat/rpc-gettxoutsetinfo
dorianvp Nov 3, 2025
b1e68d0
Merge branch 'dev' into feat/rpc-gettxoutsetinfo
zancas Nov 4, 2025
15 changes: 15 additions & 0 deletions Cargo.lock

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

3 changes: 3 additions & 0 deletions docs/json_rpc/gettxoutsetinfo.md
@@ -0,0 +1,3 @@
# `gettxoutsetinfo`

See [Zaino's Unspent Hash set](./gettxoutsetinfo/canonical_utxo_set_snapshot_hash.md) for more information on how the UTXO set hash is computed.
169 changes: 169 additions & 0 deletions docs/json_rpc/gettxoutsetinfo/canonical_utxo_set_snapshot_hash.md
@@ -0,0 +1,169 @@
Title: ZAINO-UTXOSET-01 Canonical UTXO Set Snapshot Hash (v1)
Owners: dorianvp <dorianvp@zingolabs.org>
Za Wil <zancas@zingolabs.org>
Status: Draft
Category: Lightclients
Created: 2025-10-16
License: MIT

## Terminology

- The key words **MUST**, **MUST NOT**, **SHOULD**, and **MAY** are to be interpreted as described in BCP 14 [^BCP14] when, and only when, they appear in all capitals.
- Integers are encoded **little-endian** unless otherwise stated.
- “CompactSize” refers to the variable-length integer format [specified by Bitcoin](https://en.bitcoin.it/wiki/Protocol_documentation#Variable_length_integer) and [implemented for Zcash](https://docs.rs/zcash_encoding/0.3.0/zcash_encoding/struct.CompactSize.html).
- `BLAKE3` denotes the 32-byte output of the BLAKE3 hash function.
- This specification defines **version 1** (“V1”) of the ZAINO UTXO snapshot.
- **network**:
a blockchain instance identified by its genesis block and consensus parameters.

## Abstract

This document specifies a deterministic, versioned procedure to compute a 32-byte hash of a node’s UTXO set at a specified best block. The snapshot uses a canonical ordering and serialization and is hashed under a domain tag.

Among other uses, the snapshot hash can be used to:

- Verify that two nodes at the same best block have the same UTXO set across implementations and versions.
- Pin failing test fixtures to a snapshot hash to reproduce issues.
- Log periodic hashes to show continuity of state over time.

The hash is _not_ an input to consensus validation.

## Motivation

Different nodes (e.g., `zcashd`, Zebra, indexers) may expose distinct internals or storage layouts. Operators often need a cheap way to verify “we’re looking at the same unspent set” without transporting the entire set. A canonical, versioned snapshot hash solves this.

Review comment:

I see that `zcashd` already had this method returning a `hash_serialized`, but why is it not enough to check that the block hashes match?


## Domain Separation

Implementations **MUST** domain-separate the hash with the ASCII header:

```
"ZAINO-UTXOSET-V1\0"
```

Any change to the encoding rules or semantics **MUST** bump the domain string (e.g., `…-V2\0`) and is out of scope of this document.

## Inputs

To compute the snapshot hash, the implementation needs:

Review comment:

Why include anything other than the UTXOs as inputs in the snapshot hash? Shouldn't we already know that we're looking at the same UTXO set if the best block hashes match?


- `network`: ASCII string identifying the chain. Recommended values: `"mainnet"`, `"testnet"`, `"regtest"`.
- `best_height`: the best chain height at the time of the snapshot (unsigned 32-bit).
- `best_block`: the 32-byte block hash of the best chain tip, in the node’s _canonical internal byte order_.

Review comment (Member):

Can we have a reference for the canonical internal byte order?

- `UTXO set`: a finite map from outpoints `(txid, vout)` to outputs `(value_zat, scriptPubKey)`, where:

- `txid` is a 32-byte transaction hash (internal byte order).

Review comment (Member):

Same as above?

- `vout` is a 32-bit output index (0-based).

Review comment (dorianvp, Member Author, Oct 18, 2025):

Leaving a note here:

If we serialize each unspent output as `txid || value || script` and a transaction contains two outputs with identical `(value, script)`, then two different UTXO sets that differ only in which index is unspent will serialize to the same bytes (and hash).

Review comment (Contributor):

`vout` is a misleading name for this; I would call it `output_index`, because `vout` is generally used to refer to the vector of outputs of a transaction. I was confused by this name before I got to this line.

- `value_zat` is a non-negative amount in zatoshis, range-checked to the node’s monetary bounds (e.g., `0 ≤ value_zat ≤ MAX_MONEY`).
- `scriptPubKey` is a byte string.

Implementations **MUST** reject negative values or out-of-range amounts prior to hashing.

## Canonical Ordering

The snapshot **MUST** be ordered as follows, independent of the node’s in-memory layout:

1. Sort by `txid` ascending, comparing the raw 32-byte values as unsigned bytes.

Review comment (Contributor):

This seems like a bad serialization, because it requires recomputation over the entire UTXO set whenever a new block is received. The UTXO set can be very large; it would be much better to choose a snapshot protocol where snapshot hashes can incrementally build on the snapshot hash prior to the addition of a new block.

Review comment (Contributor):

If the UTXO set were stored in a B-tree data structure that internally kept Merkle hashes at the nodes, then it might be okay to use the Merkle root of that data structure for the snapshot identifier. It would need to be the case that the fanout of the B-tree and the insertion semantics were well-specified to ensure that everyone uses the same hashing approach.

One possibility that would allow for this to work as-specified would be to use a separate B-tree (implementing a set, rather than a map) for producing the hashes; since the txid commits to the effects of each transaction, one could build the snapshot identifier alongside the actual data, but building that identifier in parallel would have a risk of data inconsistencies with the primary store.

In general, I feel like the UTXO set would be best represented as a persistent data structure with good amortized append costs.

Review comment (arya2, Oct 28, 2025):

> The UTXO set can be very large; it would be much better to choose a snapshot protocol where snapshot hashes can incrementally build on the snapshot hash prior to the addition of a new block.

`sparse-merkle-tree` could be useful here. Zaino could:

- Implement `Value` for a struct representing the transaction output data to which `hash_serialized` is committing,
- Implement `StoreReadOps`/`StoreWriteOps` for on-disk storage of the tree,
- Update the tree and a cache with the other fields in `TxOutSetInfo` when Zaino is indexing blocks, and
- Return the cached `TxOutSetInfo` from the RPC method.

2. For equal `txid`s, sort by `vout` ascending (unsigned 32-bit).

This ordering **MUST** be used for serialization.
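
The ordering rule can be sketched in Rust as follows. This is a minimal illustration, not the PR's implementation: the `UtxoEntry` struct and the `sort_canonical` function are hypothetical names introduced here.

```rust
/// One unspent output, keyed by its outpoint `(txid, vout)`.
/// Hypothetical type, introduced only for this sketch.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct UtxoEntry {
    pub txid: [u8; 32],
    pub vout: u32,
    pub value_zat: u64,
    pub script_pub_key: Vec<u8>,
}

/// Sorts entries into the canonical order: `txid` ascending as raw
/// unsigned bytes, then `vout` ascending.
pub fn sort_canonical(entries: &mut [UtxoEntry]) {
    // `[u8; 32]` compares lexicographically, byte by byte, as unsigned
    // values, which matches the comparison the ordering rule asks for.
    // Outpoints are unique, so an unstable sort is sufficient.
    entries.sort_unstable_by(|a, b| a.txid.cmp(&b.txid).then(a.vout.cmp(&b.vout)));
}
```

Because the comparison is on raw bytes, the result depends on the byte order in which `txid` is stored; both sides of a comparison need to use the same (internal) order.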

## Serialization

The byte stream fed to the hash is the concatenation of a **header** and **entries**:

### Header

- ASCII bytes: `"ZAINO-UTXOSET-V1\0"`

Review comment (Member):

Why is this `\0` present?

Review comment (Member Author):

It only acts as a terminator/delimiter. It is not strictly necessary...

- `network` as ASCII bytes, followed by a single NUL byte `0x00`.
- `best_height` as `u32` little-endian.
- `best_block` as 32 raw bytes.
- `count_txouts` as `u64` little-endian, where `count_txouts` is the total number of serialized entries below.
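
As an illustration of the header layout, the sketch below assembles the header bytes into a buffer. The function name `serialize_header` and the choice to return a `Vec<u8>` are assumptions made for this example; a real implementation would more likely stream the same bytes straight into the hasher.

```rust
/// Domain tag for version 1, including the trailing NUL byte.
const DOMAIN_TAG: &[u8] = b"ZAINO-UTXOSET-V1\0";

/// Builds the header bytes in the order listed above: domain tag,
/// NUL-terminated network name, `best_height` (u32 LE), `best_block`
/// (32 raw bytes), `count_txouts` (u64 LE).
/// Hypothetical helper, introduced only for this sketch.
pub fn serialize_header(
    network: &str,
    best_height: u32,
    best_block: &[u8; 32],
    count_txouts: u64,
) -> Vec<u8> {
    let mut out = Vec::with_capacity(DOMAIN_TAG.len() + network.len() + 1 + 4 + 32 + 8);
    out.extend_from_slice(DOMAIN_TAG);
    out.extend_from_slice(network.as_bytes());
    out.push(0x00); // terminator after the network name
    out.extend_from_slice(&best_height.to_le_bytes());
    out.extend_from_slice(best_block);
    out.extend_from_slice(&count_txouts.to_le_bytes());
    out
}
```

For `network = "regtest"` the header is 69 bytes long: 17 for the domain tag, 8 for the NUL-terminated name, 4 for the height, 32 for the block hash, and 8 for the count.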

### Entries (one per outpoint in canonical order)

For each `(txid, vout, value_zat, scriptPubKey)`:

- `txid` as 32 raw bytes.
- `vout` as `u32` little-endian.
- `value_zat` as `u64` little-endian.
- `script_len` as CompactSize (Bitcoin/Zcash varint) of `scriptPubKey.len()`.
- `scriptPubKey` raw bytes.

**Note:** No per-transaction terminators or grouping markers are used. Instead, the format commits to _outputs_, not _transactions_.
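
The per-entry layout can be sketched like this. The names `write_entry` and `write_compact_size` are hypothetical and exist only for this example. Note that `vout` is part of every entry, so two outputs of the same transaction with identical value and script still serialize to different bytes.

```rust
/// Appends one entry to `out`: txid (32 raw bytes), vout (u32 LE),
/// value (u64 LE), CompactSize script length, then the raw script bytes.
/// Hypothetical helper, introduced only for this sketch.
pub fn write_entry(
    out: &mut Vec<u8>,
    txid: &[u8; 32],
    vout: u32,
    value_zat: u64,
    script: &[u8],
) {
    out.extend_from_slice(txid);
    out.extend_from_slice(&vout.to_le_bytes());
    out.extend_from_slice(&value_zat.to_le_bytes());
    write_compact_size(out, script.len() as u64);
    out.extend_from_slice(script);
}

/// Minimal CompactSize writer covering all four length classes.
fn write_compact_size(out: &mut Vec<u8>, n: u64) {
    match n {
        0..=0xFC => out.push(n as u8),
        0xFD..=0xFFFF => {
            out.push(0xFD);
            out.extend_from_slice(&(n as u16).to_le_bytes());
        }
        0x1_0000..=0xFFFF_FFFF => {
            out.push(0xFE);
            out.extend_from_slice(&(n as u32).to_le_bytes());
        }
        _ => {
            out.push(0xFF);
            out.extend_from_slice(&n.to_le_bytes());
        }
    }
}
```

An entry with a 3-byte script occupies 48 bytes: 32 + 4 + 8 + 1 + 3.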

### CompactSize ([reference](https://en.bitcoin.it/wiki/Protocol_documentation#Variable_length_integer))

- If `n < 0xFD`: a single byte `n`.
- Else if `n ≤ 0xFFFF`: `0xFD` followed by `n` as `u16` little-endian.
- Else if `n ≤ 0xFFFF_FFFF`: `0xFE` followed by `n` as `u32` little-endian.
- Else: `0xFF` followed by `n` as `u64` little-endian.
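
The four cases translate directly into code. The following is a minimal sketch with a hypothetical function name (`compact_size`); the `zcash_encoding` crate referenced in the Terminology section provides its own implementation, which this sketch does not call.

```rust
/// Encodes `n` as a Bitcoin/Zcash CompactSize, following the four cases above.
/// Hypothetical helper, introduced only for this sketch.
pub fn compact_size(n: u64) -> Vec<u8> {
    let mut out = Vec::with_capacity(9);
    if n < 0xFD {
        // Single byte holding the value itself.
        out.push(n as u8);
    } else if n <= 0xFFFF {
        out.push(0xFD);
        out.extend_from_slice(&(n as u16).to_le_bytes());
    } else if n <= 0xFFFF_FFFF {
        out.push(0xFE);
        out.extend_from_slice(&(n as u32).to_le_bytes());
    } else {
        out.push(0xFF);
        out.extend_from_slice(&n.to_le_bytes());
    }
    out
}
```

At the first boundary, 252 encodes as the single byte `FC`, while 253 encodes as the three bytes `FD FD 00`.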

## Hash Function

- The implementation **MUST** stream the bytes above into a BLAKE3 hasher.
- The 32-byte output of the hasher is the **snapshot hash**.

## Pseudocode

```text
function UtxoSnapshotHashV1(network, best_height, best_block, utxos):
    H ← blake3::Hasher()

    // Header
    H.update("ZAINO-UTXOSET-V1\0")
    H.update(network)
    H.update("\0")
    H.update(le_u32(best_height))
    H.update(best_block) // 32 raw bytes, node’s canonical order
    count ← number_of_outputs(utxos)
    H.update(le_u64(count))

    // Entries in canonical order
    for (txid, vout, value, script) in sort_by_txid_then_vout(utxos):
        assert 0 ≤ value ≤ MAX_MONEY
        H.update(txid) // 32 raw bytes
        H.update(le_u32(vout))
        H.update(le_u64(value)) // zatoshis
        H.update(CompactSize(script.len))
        H.update(script)

    return H.finalize() // 32-byte BLAKE3 digest
```

## Error Handling

- If any `value_zat` is negative or exceeds `MAX_MONEY`, the snapshot procedure **MUST** fail and **MUST NOT** produce a hash.
- If the UTXO set changes during iteration (non-atomic read), the implementation **SHOULD** retry using a stable view (e.g., read lock or height-pinned snapshot).
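
One way to satisfy the first rule is to validate every amount before any bytes reach the hasher, so that a failure cannot yield a partial digest. The sketch below assumes amounts arrive as signed `i64` values and that `MAX_MONEY` is Zcash's cap of 21 million ZEC expressed in zatoshis; `SnapshotError` and `check_values` are hypothetical names.

```rust
/// Assumed monetary cap in zatoshis (21 million ZEC at 10^8 zatoshis each).
pub const MAX_MONEY: i64 = 21_000_000 * 100_000_000;

/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq, Eq)]
pub enum SnapshotError {
    /// The amount at `index` is negative or exceeds `MAX_MONEY`.
    ValueOutOfRange { index: usize, value: i64 },
}

/// Range-checks every amount and converts it to `u64`. Returns the first
/// violation as an error, in which case no hash should be produced.
pub fn check_values(values: &[i64]) -> Result<Vec<u64>, SnapshotError> {
    values
        .iter()
        .enumerate()
        .map(|(index, &value)| {
            if (0..=MAX_MONEY).contains(&value) {
                Ok(value as u64)
            } else {
                Err(SnapshotError::ValueOutOfRange { index, value })
            }
        })
        .collect()
}
```

Collecting into a `Result` stops at the first out-of-range amount, which keeps the "fail without producing a hash" behavior easy to reason about.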

## Security and Interop Considerations

- This hash is **not a consensus commitment** and **MUST NOT** be used to validate blocks or transactions.
- The domain string prevents cross-protocol collisions.
- Including `network`, `best_height`, and `best_block` prevents accidental equality across different networks, heights, or chain tips.
- Because the order is fully specified, two independent implementations reading the _same_ set will produce the _same_ hash.

## Rationale

- **BLAKE3** is chosen for speed and strong modern security. SHA-256 would also work but is slower on large sets. The domain string ensures local uniqueness regardless of the hash function family.
- Committing to _outputs_ rather than _transactions_ simplifies implementations that don’t have transaction-grouped storage.
- CompactSize matches existing Bitcoin/Zcash encoding and avoids ambiguity.

## Versioning

- Any breaking change to the byte stream, input semantics, or ordering **MUST** bump the domain tag to `ZAINO-UTXOSET-V2\0` (or higher).
- Implementations **SHOULD** publish the version alongside the hash in logs and APIs.

## Test Guidance

Implementations **SHOULD** include tests covering:

1. **Determinism:** Shuffle input, and the hash remains constant.
2. **Sensitivity:** Flip one bit in `value_zat` or `scriptPubKey`, and the hash changes.
3. **Metadata:** Change `network` or `best_block`, and the hash changes.
4. **Empty Set:** With `count_txouts = 0`, the hash is well-defined.
5. **Large Scripts:** Scripts with CompactSize boundaries (252, 253, 2^16, 2^32).
6. **Ordering:** Two entries with the same `txid` but different `vout` are ordered by `vout`.

## References

[^BCP14]: [Information on BCP 14 — "RFC 2119"](https://www.rfc-editor.org/info/bcp14)
56 changes: 55 additions & 1 deletion integration-tests/tests/fetch_service.rs
@@ -1,4 +1,6 @@
-//! These tests compare the output of `FetchService` with the output of `JsonRpcConnector`.
+//! These tests compare the output of `FetchService` with the output of [`JsonRpSeeConnector`].
+//!
+//! Note that they both rely on the [`JsonRpSeeConnector`] to get the data.

use futures::StreamExt as _;
use zaino_common::network::ActivationHeights;
@@ -502,6 +504,53 @@ async fn fetch_service_get_address_tx_ids(validator: &ValidatorKind) {
    test_manager.close().await;
}

async fn fetch_service_get_txout_set_info() {
    let (mut test_manager, _fetch_service, fetch_service_subscriber) =
        create_test_manager_and_fetch_service(&ValidatorKind::Zcashd, None, true, true, true, true)
            .await;

    let mut clients = test_manager
        .clients
        .take()
        .expect("Clients are not initialized");
    clients.faucet.sync_and_await().await.unwrap();

    let recipient_ua = clients.get_recipient_address("unified").await;
    let _tx = zaino_testutils::from_inputs::quick_send(
        &mut clients.faucet,
        vec![(&recipient_ua, 250_000, None)],
    )
    .await
    .unwrap();

    test_manager.local_net.generate_blocks(1).await.unwrap();
    tokio::time::sleep(tokio::time::Duration::from_secs(1)).await;

    let txout_set_info = fetch_service_subscriber.get_txout_set_info().await.unwrap();

    let jsonrpc_client = JsonRpSeeConnector::new_with_basic_auth(
        test_node_and_return_url(
            test_manager.zebrad_rpc_listen_address,
            false,
            None,
            Some("xxxxxx".to_string()),
            Some("xxxxxx".to_string()),
        )
        .await
        .unwrap(),
        "xxxxxx".to_string(),
        "xxxxxx".to_string(),
    )
    .unwrap();
    let json_rpc_txout_set_info = jsonrpc_client.get_txout_set_info().await.unwrap();
    dbg!(&json_rpc_txout_set_info);
    dbg!(&txout_set_info);

    assert_eq!(txout_set_info, json_rpc_txout_set_info);

    test_manager.close().await;
}

async fn fetch_service_get_address_utxos(validator: &ValidatorKind) {
    let (mut test_manager, _fetch_service, fetch_service_subscriber) =
        create_test_manager_and_fetch_service(validator, None, true, true, true, true).await;
@@ -1528,6 +1577,11 @@ mod zcashd {
        fetch_service_get_address_tx_ids(&ValidatorKind::Zcashd).await;
    }

    #[tokio::test(flavor = "multi_thread", worker_threads = 2)]
    pub(crate) async fn txout_set_info() {
        fetch_service_get_txout_set_info().await;
    }

    #[tokio::test(flavor = "multi_thread", worker_threads = 2)]
    pub(crate) async fn address_utxos() {
        fetch_service_get_address_utxos(&ValidatorKind::Zcashd).await;
46 changes: 45 additions & 1 deletion integration-tests/tests/json_server.rs
@@ -1,4 +1,4 @@
-//! Tests that compare the output of both `zcashd` and `zainod` through `FetchService`.
+//! Tests that compare the output of both `zcashd` and `zainod` through [`FetchService`].

use zaino_common::network::ActivationHeights;
use zaino_common::{DatabaseConfig, ServiceConfig, StorageConfig};
@@ -726,6 +726,50 @@ mod zcashd {
        test_manager.close().await;
    }

    #[tokio::test(flavor = "multi_thread", worker_threads = 2)]
    async fn get_txout_set_info() {
        let (
            mut test_manager,
            _zcashd_service,
            zcashd_subscriber,
            _zaino_service,
            zaino_subscriber,
        ) = create_test_manager_and_fetch_services(false, true).await;

        let mut clients = test_manager
            .clients
            .take()
            .expect("Clients are not initialized");

        test_manager.local_net.generate_blocks(1).await.unwrap();
        tokio::time::sleep(tokio::time::Duration::from_secs(1)).await;

        clients.faucet.sync_and_await().await.unwrap();

        let recipient_ua = &clients.get_recipient_address("unified").await;
        let recipient_taddr = &clients.get_recipient_address("transparent").await;
        from_inputs::quick_send(&mut clients.faucet, vec![(recipient_taddr, 250_000, None)])
            .await
            .unwrap();
        from_inputs::quick_send(&mut clients.faucet, vec![(recipient_ua, 250_000, None)])
            .await
            .unwrap();

        tokio::time::sleep(tokio::time::Duration::from_secs(1)).await;

        let zcashd_subscriber_txout_set_info =
            zcashd_subscriber.get_txout_set_info().await.unwrap();
        let zaino_subscriber_txout_set_info =
            zaino_subscriber.get_txout_set_info().await.unwrap();

        assert_eq!(
            zcashd_subscriber_txout_set_info,
            zaino_subscriber_txout_set_info
        );

        test_manager.close().await;
    }

    #[tokio::test(flavor = "multi_thread", worker_threads = 2)]
    async fn get_peer_info() {
        let (