feat(rpc): add gettxoutsetinfo endpoint
#595
base: dev
Conversation
876bba8 to 1a4a340
This PR is mostly ready. I only left a couple of review comments to upgrade some types, but functionally it is ready for review.
I see several equivalence tests; how about some non-equivalence tests proving sensitivity to variance in txout sets?
…o feat/rpc-gettxoutsetinfo
> - `UTXO set`: a finite multimap keyed by outpoints `(txid, vout)` to outputs `(value_zat, scriptPubKey)`, where:
>   - `txid` is a 32-byte transaction hash (internal byte order).
>   - `vout` is a 32-bit output index (0-based).
Leaving a note here:
If we serialize per unspent as txid || value || script and a transaction contains two outputs with identical (value, script), then two different UTXO sets that differ only by which index is unspent will serialize to the same bytes (and hash).
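A toy sketch of the collision described above, with hypothetical serializer names (not Zaino's actual code): omitting the output index from the per-UTXO serialization makes entries for `vout = 0` and `vout = 1` byte-identical, while including the index disambiguates them.

```rust
// Hypothetical serializers illustrating the note above.
fn serialize_without_vout(txid: &[u8; 32], value: u64, script: &[u8]) -> Vec<u8> {
    // txid || value || script — the output index is lost.
    let mut out = Vec::new();
    out.extend_from_slice(txid);
    out.extend_from_slice(&value.to_le_bytes());
    out.extend_from_slice(script);
    out
}

fn serialize_with_vout(txid: &[u8; 32], vout: u32, value: u64, script: &[u8]) -> Vec<u8> {
    // txid || vout || value || script — distinct indices serialize distinctly.
    let mut out = Vec::new();
    out.extend_from_slice(txid);
    out.extend_from_slice(&vout.to_le_bytes());
    out.extend_from_slice(&value.to_le_bytes());
    out.extend_from_slice(script);
    out
}
```

With two identical `(value, script)` outputs in one transaction, a set where only output 0 is unspent and a set where only output 1 is unspent serialize identically under the first scheme, but differently under the second.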
`vout` is a misleading name for this; I would call it `output_index`, because `vout` is generally used to refer to the vector of outputs of a transaction. I was confused by this name before I got to this line.
> The snapshot **MUST** be ordered as follows, independent of the node’s in-memory layout:
>
> 1. Sort by `txid` ascending, comparing the raw 32-byte values as unsigned bytes.
This seems like a bad serialization, because it requires recomputation over the entire UTXO set whenever a new block is received. The UTXO set can be very large; it would be much better to choose a snapshot protocol where snapshot hashes can incrementally build on the snapshot hash prior to the addition of a new block.
If the UTXO set were stored in a B-tree data structure that internally kept Merkle hashes at the nodes, then it might be okay to use the Merkle root of that data structure for the snapshot identifier. It would need to be the case that the fanout of the B-tree and the insertion semantics were well-specified to ensure that everyone uses the same hashing approach.
One possibility that would allow for this to work as-specified would be to use a separate B-tree (implementing a set, rather than a map) for producing the hashes; since the txid commits to the effects of each transaction, one could build the snapshot identifier alongside the actual data, but building that identifier in parallel would have a risk of data inconsistencies with the primary store.
In general, I feel like the UTXO set would be best represented as a persistent data structure with good amortized append costs.
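As a sketch of the shape "incremental" could take, here is a minimal, deliberately non-cryptographic stand-in (std `DefaultHasher` plus wrapping addition as a toy multiset accumulator; all names are hypothetical, and a real design would use one of the Merkle approaches discussed above): each UTXO add or spend updates the digest in O(1), with no rescan of the full set when a block arrives.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Toy incremental set commitment. NOT cryptographically secure —
/// it only illustrates the O(1)-per-update API shape.
#[derive(Default)]
struct IncrementalSnapshot {
    digest: u64,
}

impl IncrementalSnapshot {
    /// Hash one UTXO entry, here modeled as (txid bytes, output index).
    fn element_hash(entry: &(Vec<u8>, u32)) -> u64 {
        let mut h = DefaultHasher::new();
        entry.hash(&mut h);
        h.finish()
    }

    /// A block adding a UTXO updates the digest without rescanning the set.
    fn add(&mut self, entry: &(Vec<u8>, u32)) {
        self.digest = self.digest.wrapping_add(Self::element_hash(entry));
    }

    /// Spending a UTXO reverses its contribution.
    fn spend(&mut self, entry: &(Vec<u8>, u32)) {
        self.digest = self.digest.wrapping_sub(Self::element_hash(entry));
    }
}
```

Because the accumulator is commutative, the digest is independent of insertion order, so no canonical re-sorting pass is needed per block.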
> The UTXO set can be very large; it would be much better to choose a snapshot protocol where snapshot hashes can incrementally build on the snapshot hash prior to the addition of a new block.

sparse-merkle-tree could be useful here. Zaino could:

- Implement `Value` for a struct representing the transaction output data to which `hash_serialized` is committing,
- Implement `StoreReadOps`/`StoreWriteOps` for on-disk storage of the tree,
- Update the tree and a cache with the other fields in `TxOutSetInfo` when Zaino is indexing blocks, and
- Return the cached `TxOutSetInfo` from the RPC method.
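A minimal sketch of the caching half of this suggestion, with hypothetical type and field names loosely modeled on `gettxoutsetinfo` output (the sparse-merkle-tree / `hash_serialized` part is omitted): the scalar fields are updated once per indexed block, so the RPC returns the cached value instead of scanning the whole UTXO set.

```rust
/// Hypothetical cached aggregate, maintained while indexing blocks.
#[derive(Default, Clone)]
struct CachedTxOutSetInfo {
    height: u32,           // height the cache is valid at
    txouts: u64,           // number of unspent outputs
    total_amount_zat: u64, // total value of unspent outputs, in zatoshis
}

impl CachedTxOutSetInfo {
    /// `created` and `spent` are (output count, total value) pairs
    /// for the outputs a block adds and consumes, respectively.
    fn apply_block(&mut self, height: u32, created: (u64, u64), spent: (u64, u64)) {
        self.height = height;
        self.txouts = self.txouts + created.0 - spent.0;
        self.total_amount_zat = self.total_amount_zat + created.1 - spent.1;
    }
}
```

The same per-block hook that updates these counters would also apply the block's created/spent outputs to the Merkle tree backing `hash_serialized`.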
arya2 left a comment:
There are some performance and concurrency issues with this design. It would be better to update a cached `TxOutSetInfo` as Zaino indexes blocks, and to use a Merkle tree to update the `hash_serialized` field.
```rust
let mut state = state.clone();
// …
let zebra_state::ReadResponse::Tip { 0: tip } = state
```
Nitpick: Why destructure it this way rather than as a tuple struct?
```diff
- let zebra_state::ReadResponse::Tip { 0: tip } = state
+ let zebra_state::ReadResponse::Tip(tip) = state
```
```rust
    }
}

/// Fetches all UTXOs from the state service, and returns them in a map.
```
Loading every UTXO into memory shouldn't be necessary and could require more memory than is available on some systems.
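One way to avoid materializing the map, sketched with a hypothetical entry type (in Zaino this would be an iterator over state reads): fold the aggregates over a UTXO stream, so memory use is constant in the size of the set.

```rust
/// Aggregate (count, total value) over a stream of UTXO entries
/// of the hypothetical shape (txid bytes, output index, value in zatoshis),
/// without collecting them into a map.
fn summarize<I: Iterator<Item = (Vec<u8>, u32, u64)>>(utxos: I) -> (u64, u64) {
    utxos.fold((0u64, 0u64), |(count, total), (_txid, _vout, value)| {
        (count + 1, total + value)
    })
}
```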
> ## Motivation
>
> Different nodes (e.g., `zcashd`, Zebra, indexers) may expose distinct internals or storage layouts. Operators often need a cheap way to verify “we’re looking at the same unspent set” without transporting the entire set. A canonical, versioned snapshot hash solves this.
I see that zcashd already had this method returning a `hash_serialized`, but why is it not enough to check that the block hashes match?
> ## Inputs
>
> To compute the snapshot hash, the implementation needs:
Why include anything other than the UTXOs as inputs in the snapshot hash? Shouldn't we already know that we're looking at the same UTXO set if the best block hashes match?
```rust
let blk = state
    .call(ReadRequest::Block(HashOrHeight::Height(Height(h))))
```
It could be faster to read just the unspent outputs by iterating over `TransactionLocation { height: 0, index: 0 }..={ height: target_height, index: OutputIndex::MAX }` in the `utxo_by_out_loc` column family, and then read any remaining blocks to add any UTXOs that were in blocks committed during the iteration.
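A toy version of the suggested range scan, using a std `BTreeMap` keyed by a `(height, output_index)` tuple as a stand-in for the `utxo_by_out_loc` column family (the real column family and key types are Zebra internals):

```rust
use std::collections::BTreeMap;

/// Return the values of all outputs created at or below `target_height`.
/// A bounded range read touches only the relevant keys, in key order,
/// instead of loading and filtering the whole store.
fn utxos_up_to(db: &BTreeMap<(u32, u32), u64>, target_height: u32) -> Vec<u64> {
    db.range((0, 0)..=(target_height, u32::MAX))
        .map(|(_loc, value)| *value)
        .collect()
}
```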
```rust
};

for tx in block.transactions.clone() {
    let txid = tx.hash();
```
It may be faster to look up the transaction hashes in the `tx_loc_by_hash` column family.
```rust
async fn get_txout_set_info(&self) -> Result<GetTxOutSetInfo, Self::Error> {
    let txouts = Self::get_txout_set(&self.read_state_service).await.unwrap();

    let best_block_hash = self.get_best_blockhash().await.unwrap().hash();
```
`get_txout_set()` may not (and on Mainnet, generally won't) read transactions from all blocks up to the best block hash being read here. It would be better to return the tip hash from `get_txout_set()`.
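One way to remove the race, sketched with hypothetical types: have the scan return the tip it actually covered alongside its results, so the RPC handler never pairs the set with a best-block hash read at a different moment.

```rust
/// Hypothetical return type: the scan reports the tip it covered
/// together with its results, instead of the caller fetching the best
/// block hash in a separate call (which can advance between the reads).
struct TxOutSetSnapshot {
    tip_hash: [u8; 32],
    txouts: u64,
}

fn scan_txout_set(tip_hash: [u8; 32], utxo_count: u64) -> TxOutSetSnapshot {
    // A real implementation would iterate the state as of `tip_hash`;
    // here the inputs are just bundled to show the API shape.
    TxOutSetSnapshot { tip_hash, txouts: utxo_count }
}
```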
Hey folks, thanks for reviewing this! I'll be addressing these comments soon.
@dorianvp this RPC involves specification work, which is scheduled for the next milestone. I am requesting to move this RPC to that milestone.
On top of #592.

Motivation

#210. This PR also specifies the algorithm used to generate the UTXO set snapshot hash.

Solution

Implement `gettxoutsetinfo`.

PR Checklist