[CI] Add chaotic devnet workflow #4095

Open
kaimast wants to merge 3 commits into staging from ci/updated_network_delay_test

kaimast (Contributor) commented Feb 4, 2026

Resolves #4062.

This PR adds two new tests, .ci/test_reset_minority.sh and .ci/test_reset_majority.sh. Both scripts set up a devnet and then periodically reset a subset of the nodes (stop each node and remove its ledger). The test succeeds if the nodes keep advancing until they reach a given height.
reset_minority resets exactly f = (N-1)/3 nodes, where N is the total number of validators, and reset_majority resets N-f nodes.
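
As a quick check of the arithmetic above, the following is a minimal sketch of how the two reset counts could be computed. The function names `minority_count` and `majority_count` are hypothetical and do not come from the PR, and integer division is an assumption for values of N where N-1 is not a multiple of 3.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for illustration; these names are not taken from the PR.

# Nodes reset by the minority test: f = (N-1)/3.
# Bash arithmetic truncates, which is an assumption for N where
# N-1 is not a multiple of 3.
minority_count() {
  local n=$1
  echo $(( (n - 1) / 3 ))
}

# Nodes reset by the majority test: N - f.
majority_count() {
  local n=$1
  echo $(( n - $(minority_count "$n") ))
}

n=7
echo "N=$n minority=$(minority_count "$n") majority=$(majority_count "$n")"
# prints: N=7 minority=2 majority=5
```

For N=7 this gives f=2 for reset_minority and N-f=5 for reset_majority.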

Both scripts run in the new chaotic-devnet-test job, which also adds network delays and message drops through script/chaotic-network-runner.sh.
The job is run with f=2 and N=7, which is a good middle ground between "big enough" to reproduce issues that may appear in production and "not too big" to be run on a single CI machine.

kaimast force-pushed the ci/updated_network_delay_test branch 3 times, most recently from f2643e2 to e3f1178, February 4, 2026 06:33
Base automatically changed from ci/upgrade-network-staging to staging February 4, 2026 12:22
kaimast force-pushed the ci/updated_network_delay_test branch 8 times, most recently from 8f02401 to 59f00e3, February 4, 2026 22:45
kaimast force-pushed the ci/updated_network_delay_test branch from 59f00e3 to 89b2e0b, February 4, 2026 22:46
kaimast marked this pull request as ready for review February 4, 2026 23:27
kaimast requested a review from vicsn February 5, 2026 02:37
kaimast mentioned this pull request Feb 5, 2026
vicsn (Collaborator) commented Feb 5, 2026

> The job is run with f=2 and N=7, which is a good middle ground between "big enough" to reproduce issues that may appear in production and "not too big" to be run on a single CI machine.

Great! How reliably do these specific CI jobs trigger the halts observed in this thread? #4047 (comment)

Development

Successfully merging this pull request may close these issues.

[Feature] Use network delay script in CI

3 participants