
Conversation

@antonleviathan
Contributor

Motivation

I introduced the use of the StageX build toolchain, and made the zainod build deterministic.

It's worth noting that the runtime inherits a secure config from here:
https://codeberg.org/stagex/stagex/src/commit/4a9058f6cc16dc81f762543d768176762b600dd7/packages/core/filesystem/Containerfile

Tests

I built the image twice using utils/build.sh and compared the resulting hashes.
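
Roughly, the check looks like this (a sketch; the image tag is illustrative, not from the PR, and a genuine second run needs a cleared build cache or a second machine):

./utils/build.sh
docker inspect --format '{{ .Id }}' zainod:latest > hash1.txt
./utils/build.sh   # rebuild from scratch (cache cleared, or on another machine)
docker inspect --format '{{ .Id }}' zainod:latest > hash2.txt
diff hash1.txt hash2.txt   # empty diff = identical image IDs, i.e. deterministic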

Follow-up Work

We need to figure out how to update the Makefile.toml according to team preferences to use the updated image for creating zainod images and for producing binaries as needed.

I'm also happy to help write documentation as desired.

PR Checklist

  • The PR name is suitable for the release notes.
  • The solution is tested.
  • The documentation is up to date.

@antonleviathan
Contributor Author

antonleviathan commented Nov 2, 2025

To test, you will need Docker v26.0.0+ and containerd, then run ./utils/build.sh from the repo root.
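
For illustration, a quick preflight (generic Docker commands, not taken from the PR):

docker --version                  # expect v26.0.0 or newer
docker info | grep -i containerd  # confirms the daemon is wired to containerd
./utils/build.sh                  # run from the repo root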

@antonleviathan antonleviathan force-pushed the bootstrapped_deterministic_build branch from a247da8 to 5dbe7a5 Compare November 2, 2025 20:57
@ala-mode ala-mode self-requested a review November 3, 2025 00:10
Contributor

@ala-mode ala-mode left a comment

I reviewed this by following the instructions on an underpowered VPS, and everything seemed to work as expected!

Though I did check basic help outputs and version commands, I did not put the resulting binary through its full paces. Similarly, I looked over the changes to the Dockerfile, but I would like more experienced eyes on it.

This modifies our default Dockerfile in place. I'm not sure if it makes sense to do so, but an idea I had was to have this be an option instead of StageX being incorporated into the only Dockerfile available; probably with something like Dockerfile.classic and Dockerfile.stagex. Perhaps this would be useful as a transitional gesture. I suspect the proposed change will break existing CI setups because it uses cargo build instead of cargo install, for example.

So I approve, but to merge we probably should test more thoroughly and also look at the StageX tooling in greater detail (I only gave it a look-over) - perhaps others from ZL (@zancas @nachog00 ?) can come and try this out.

A few points to @antonleviathan about this PR at this moment:

Documentation is still needed.

I'd also like to see build.sh renamed to something more clearly StageX-oriented, because it is not a 'core' Zaino utility. Similar thought for the build directory (and maybe the Dockerfile itself).

One other detail I noticed was that the resulting binary was dated Jan 1 1970; if possible, it might be nice to have the resulting binary's timestamp set to when it was created.

These all seem like small points, and it's probable someone from ZL could make these changes themselves either as a PR into your branch or more directly.

@antonleviathan
Contributor Author

This modifies our default Dockerfile in place. I'm not sure if it makes sense to do so, but an idea I had was to have this be an option instead of StageX being incorporated into the only Dockerfile available; probably with something like Dockerfile.classic and Dockerfile.stagex

StageX is a drop-in replacement for distributions like Alpine and Debian, providing the same software but built more securely. If you decide to have multiple different images, that's fine, but it's redundant.

@antonleviathan
Contributor Author

Perhaps this would be useful as a transitional gesture. I suspect the proposed change will break existing CI setups because it uses cargo build instead of cargo install, for example.

cargo install is bad for determinism; by default it re-resolves dependency versions instead of honoring the committed Cargo.lock (unless --locked is passed). That's why cargo build is used.
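
For illustration (standard cargo flags, not specific to this PR; the crate name is a placeholder):

cargo build --release --locked   # builds exactly what the committed Cargo.lock pins
cargo install zainod             # re-resolves dependency versions at install time
                                 # unless --locked is passed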

@antonleviathan
Contributor Author

antonleviathan commented Nov 3, 2025

I'd also like to see build.sh renamed to something more clearly StageX-oriented, because it is not a 'core' Zaino utility. Similar thought for the build directory (and maybe the Dockerfile itself).

Building zaino with the StageX setup always gives better security guarantees, so ideally it should be used wherever possible. Again, StageX is a more secure alternative to Linux distributions like Debian, Alpine, Nix, Guix etc. To expand on this a bit: with other distros, you have no visibility into what compiler was used or how the binary was built. All you usually get is a cryptographic signature on the hash after it's compiled, adding some assurance that it was not tampered with between the point of delivery and its destination. With StageX you can prove exactly how the entire toolchain was built, down to the compiler, which is itself bootstrapped. On top of that, Rust is also bootstrapped.

The end result is a drastic reduction in attack surface for Trusting Trust style attacks and other build- or runtime-environment compromises. When you use this setup to reproduce on multiple machines, you get a much higher guarantee that the thing you are delivering to your end users, and using internally, has not been maliciously manipulated somewhere along the way.

Ideally, you have a team of several people who do the reproduction and sign the resulting hashes (this is what Bitcoin Core does for each release) using well-managed PGP keys (ideally never exposed to memory and only stored on smart cards), and who make those hashes available with each release. Keyoxide is a nice way to build trust in PGP keys, by the way.
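
A sketch of that flow (file names illustrative; the gpg and sha256sum invocations are the standard ones):

sha256sum zainod > zainod.sha256          # each reproducer hashes their own build
gpg --armor --detach-sign zainod.sha256   # ...and signs the hash with their key
# a user verifies every published signature before trusting the release:
gpg --verify zainod.sha256.asc zainod.sha256
sha256sum --check zainod.sha256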

The ultimate way to verify StageX is to read the part of the tree that's relevant to you, for example the bootstrapping stages for the compilers and Rust, and then build it yourself and check that the hashes match what's published here.

I know it's a lot, but I'm happy to provide support around this if there is interest. Just upgrading to the toolchain and making the software deterministic is already a good start, though.

@antonleviathan
Contributor Author

antonleviathan commented Nov 3, 2025

One other detail I noticed was that the resulting binary was dated Jan 1 1970; if possible, it might be nice to have the resulting binary's timestamp set to when it was created.

The date always has to be the same or the software will not be reproducible. You can set it to a different value per release, but that's state you have to manage, so it's best not to touch it. The value there is relatively low versus the security gains. Of course, if you decide to, you can update the build date for each release, or as desired.
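
If a per-release date is ever wanted, the usual mechanism is the reproducible-builds.org SOURCE_DATE_EPOCH convention, which BuildKit honors as a build argument; a sketch (the pinned value would have to be published with each release so others can reproduce):

export SOURCE_DATE_EPOCH=$(git log -1 --pretty=%ct)   # e.g. commit time of the release tag
docker buildx build --build-arg SOURCE_DATE_EPOCH="$SOURCE_DATE_EPOCH" .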

@antonleviathan
Contributor Author

antonleviathan commented Nov 3, 2025

Let me know what you would like to see in terms of documentation! Happy to write up whatever you feel is missing, but the idea is that there should be a make build command, used both to generate the image and to export the binary (it calls utils/build.sh).
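
As a sketch, the target could wrap something like this (image name and binary path are assumptions, not from this PR):

./utils/build.sh                           # build the deterministic image
container=$(docker create zainod:latest)   # then export the binary from it
docker cp "$container":/usr/bin/zainod ./zainod
docker rm "$container"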

@ala-mode
Contributor

ala-mode commented Nov 3, 2025

Great comments. We'll be discussing them in ZingoLabs. Thank you very much @antonleviathan !

@zancas
Member

zancas commented Nov 4, 2025

not been malicious manipulated somewhere along the way.

maliciously

@fluidvanadium fluidvanadium self-assigned this Dec 10, 2025
@ala-mode
Contributor

ala-mode commented Jan 12, 2026

REPORT

TLDR:

StageX is an advanced, container-native distribution specifically made to provide tooling for bootstrapping and reproducibility. It aims to eliminate 'single points of failure' along the software supply chain, using its own multi-party trust for all released packages. I think highly of this project, but it is still relatively novel, with some rough edges.

In my view, at this time this PR should be modified to create a second, clearly labeled Dockerfile for StageX, along with its novel scripts, and merged in with our existing code.

Using the StageX project would be a significant improvement on our current setup, BUT currently StageX cannot build zainod. It can only build Rust up to version 1.88, and Zaino has a core dependency on Zebra, which requires a higher version for its 3.1.0 release. StageX is rapidly working toward an updated release, but even if this lands soon, I believe we should see stability in the project, with successful builds over a period of minimally a few months, before using it as our only build pipeline.

Detailed information and further suggestions below.


Overview

For StageX background/overview:
https://forum.zcashcommunity.com/t/bootstrapped-and-deterministic-builds-a-la-stagex/53040#p-236714-project-summary-9

I learned some things by reading the whitepaper in draft form, and by working with StageX builds directly. I'm happy to try to answer questions, but @antonleviathan has been active here and Distrust has some active chat channels as well. The idea is bootstrapping up from a very minimal assembler with Stage0, establishing a path to a C compiler capable of compiling GCC, followed by subsequent stages: building up modern cross compilers and other packages in a controlled way, step by step, with OCI-compatible, reproducibly built containers. Many details are touched on below.

This PR does not renovate Zaino testing, nor change test_environment/Dockerfile: testing runs on GH, and standards for local testing are divergent from the repo-root Dockerfile's container builds.

Benefits

What would Zaino get from using StageX:

  • Zainod build can be deterministic.1
  • Eliminating single points of failure along supply chain.2
  • Streamlined dependencies.3
  • Minimal runtime image.4
  • Security-minded runtime container configuration.
  • ZingoLabs can be part of the cutting edge, along with Zallet.5

Changes

BUILD Containers
Specifically: FROM rust:${RUST_VERSION}-bookworm AS builder -> several StageX images: FROM stagex/pallet-rust, stagex/user-protobuf, stagex/user-abseil-cpp.6,7

RUNTIME container
FROM debian:bookworm-slim AS runtime (297.8 MB) -> stagex/core-user-runtime (COPY /rootfs/ / # buildkit)

A number of bash scripts are also in the PR.

Concerns

  • Additional Docker tooling dependencies. Adoption of various Docker tooling could create friction, both in actually installing the needed components and in understanding/using them.8
  • In addition to the above, locally I had to rely on the moby/buildkit image to run the stack.11
  • No public StageX audit, yet.
  • Working with a special project like StageX takes us away from 'mainstream' build procedures.12
  • Zaino tests and the CI pipeline give no indication of whether our repo-root Dockerfile works or not.

Recommendations

  • Update the Zaino repo so both our current root-level Dockerfile and the StageX Dockerfiles are incorporated.13
  • Evaluate the StageX build scripts for extra logic (for example, the matching in [utils/compat.sh](https://github.com/zingolabs/zaino/pull/641/files#diff-6f6238b041895fc89d8b49281dc571ace26f2c0fd05b8e1d5edf399f4ae98199)).
  • Check the state of the art on stable Rust buildability; this is a quickly developing area.
  • Phase out the old method once StageX demonstrates build stability for a sufficient period of time.
  • Update the current Dockerfile (set to ARG RUST_VERSION=1.86.0). This is different from Zaino's [.env.testing-artifacts](https://github.com/zingolabs/zaino/blob/dev/.env.testing-artifacts), which is set to Rust 1.92.0.
  • Consider a single source of truth, or closely related images, across our production build Dockerfile and test-infrastructure Dockerfile.14
  • Create some test coverage (see if the build reproduces, perhaps), or minimally check whether the root Dockerfile functions or builds.
  • Take this opportunity! I believe a lot of thought, skill and patient work went into this project, so if it is possible for ZingoLabs to take this on, I think it is worthy of full consideration.

Stop the presses!

News Flash! A sighting of (draft - working?!) Rust 1.91 has emerged in a StageX Dockerfile on another blockchain project!

This is 0-day news! Maybe we could get this working in short order.

Footnotes

  1. via StageX images bootstrapped on Docker.

  2. StageX maintains images which are signed by Distrust core devs, with 2-of-2 agreement required to add or update StageX images. Distrust has painstakingly created build pipelines around minimalism and security. The earliest build stage, Stage0, billed as "the ultimate lowest level of bootstrap that is useful for systems without firmware, operating systems nor any other provided software functionality", is the most low-level and is said to be easily auditable by a sufficiently advanced engineer (or, as they put it, it must be understood by 70% of programmers). But there have not yet been public audits of StageX as a whole, and I am not sure of the audit status of Stage0 individually.

  3. Beyond the minimal bootstrapping, 'pallets' are loaded; like a "meta" package (compare Debian's build-essential or Arch Linux's base-devel), they bundle language-specific options, and there is no dependency tracking. This setup can be leveraged for a massive reduction of attack surface.

  4. Also, this is a massive reduction of attack surface.

  5. This has operational and optical benefits.

  6. The Bookworm image was released September 6, 2025 (but Bookworm was originally released 2023-06-10, EOL mid-2026); the StageX image was released Oct 3, 2025. StageX has 735 repositories, all updated about 3 months ago.

  7. Here is the breaking difference: Bookworm as a Rust container can currently build Rust 1.92, while the specific StageX image cannot build above 1.88. Here is the branch working to close this issue. https://hub.docker.com/_/rust -> https://hub.docker.com/r/stagex/pallet-rust
    Dockerfile:15:FROM stagex/pallet-rust@sha256:9c38bf1066dd9ad1b6a6b584974dd798c2bf798985bf82e58024fbe0515592ca AS pallet-rust

  8. In the end I installed relatively few tools: containerd, runc (a low-level OCI-compliant runtime that manages namespaces, cgroups, and mounting filesystems to isolate processes; see https://github.com/containerd/containerd/blob/main/docs/RUNC.md), and buildx capabilities to enable OCI exports.9 On my system I had to install these, but I understand that they can be present by default. Confirm with docker info | grep Runtime, which should output Runtimes: io.containerd.runc.v2 runc.

  9. Install containerd from source (requires Go, protoc, and Btrfs). On Arch, there is a package in the extra repository. On Debian, containerd is also available via a DEB package distributed by Docker, which includes runc. [I had to manually edit /etc/apt/sources.list.d/docker.sources to set the correct distribution codename.] I also had to build a non-default driver: /usr/bin/docker buildx create --name dc-driver --driver docker-container --driver-opt network=host --buildkitd-flags '--allow-insecure-entitlement network.host' --use --bootstrap. This pulls the image moby/buildkit:buildx-stable-1, retains a local image, and runs a new container based on it. buildx_buildkit_dc-driver0 is a "builder": a dedicated, isolated BuildKit environment inside a Docker container that supports OCI exports, with a unix socket endpoint (and a BuildKit version). Finally, I found I needed to export BUILDX_BUILDER=dc-driver. To confirm: docker buildx ls; I used docker buildx use dc-driver. Possibly related: I'm not sure how this runtime situation would map to other OCI-compliant systems, like podman, but it may be a concern. Additionally, there are CNI plugins (Container Network Interface: components written in Go to configure networking for containers, responsible for setting up network interfaces, assigning IP addresses, managing routes, and providing connectivity between containers and external networks). These are reference and example networking plugins maintained by the CNI team {CNI (Container Network Interface), a Cloud Native Computing Foundation project}. These Linux networking plugins might provide some utility, but they were somewhat concerning to me.10

  10. Wow, you really like footnotes! Recursion is not a no-go for you. Their last release fixed CVE-2025-67499, a bug in the nftables handling that allowed containers to emulate opening a host port, forwarding that traffic to the container (intercepting traffic intended for another container, for example), and their last commit message (the only one since) said "Somehow we missed this case ... Oops." As a project I was not familiar with, and only having GitHub (and GitHub-provided shasums for integrity), this seems like a weaker link in the security of the toolchain, so I'm happy I didn't end up using them - yet.

  11. which had access to unix:///var/run/docker.sock = essentially root-level privileges.
    These concerns could be addressed with rootless Docker [or podman, which IIUC has non-root operation as a default], and the moby/buildkit project does have some level of rootless support as well.

  12. StageX is deployed in production, but is still relatively new, with a small user base and a small (and select) developer community. Being on the cutting edge will bring demands on time and attention. If the new stack has an issue, it could cause urgent issues for our project. Fewer eyes on scripts may mean less responsiveness, or less support if something is off. The maintenance burden might be significant, causing updates or patches to be infrequent or delayed. But this could be seen as a benefit, particularly around supply chain issues (I'd rather have 2 skilled devs who need to agree than 179 devs across 68 projects who all could sign off on something critical). Also on the upside would be the ability to help advance this tool by offering good bug reports etc.

  13. perhaps named Dockerfile-stagex, and the utils/ scripts could become *-stagex.sh

  14. for example, test infra could include only the additional tools needed for testing - or at least be closely related (especially in base environment and behavior) to ensure consistency.

@ala-mode
Contributor

One other thing I forgot to put down here - perhaps it is possible to leverage some kind of auditing via Least Authority, who have some kind of arrangement to spend time on ZEC-related projects in an ongoing way.

The assertion from StageX is that Stage0 could be audited by someone competent in a handful of hours. Because there is no public audit currently, this sounds like a ripe possibility. Maybe they could even go further up the stages too, time permitting.

@fluidvanadium fluidvanadium removed their assignment Jan 16, 2026
@ala-mode
Contributor

ala-mode commented Jan 19, 2026

To enable the needed containerd snapshotting (to work with compat.sh), I also had to modify /etc/docker/daemon.json, adding:

{
  "features": {
    "containerd-snapshotter": true
  }
}

see this document for details.
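
After editing daemon.json, restart the daemon and confirm the snapshotter is active (expected output per Docker's containerd image store documentation):

sudo systemctl restart docker
docker info --format '{{ .DriverStatus }}'
# expected: [[driver-type io.containerd.snapshotter.v1]]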

@zancas zancas marked this pull request as draft January 19, 2026 18:17