Bug Report
Description
cdk-node is unable to sequence after an L1 failure.
To Reproduce
https://github.com/0xPolygon/kurtosis-cdk/blob/l1_chaos/docs/mitm.md#l1-missbehaving
https://github.com/0xPolygon/kurtosis-cdk/blob/l1_chaos/scripts/mitm/test_l1_failures.sh
Expected behavior
Recover after L1 resumes normal operation.
Environment (please complete the following information):
cdk-node RC4
Additional context
I'm testing the stack's reliability against L1 failures and issues. I tried many scenarios, and most of them end in the same situation: cdk-node (RC4 tested) is unable to sequence after an L1 failure until it is restarted (while showing no errors), or is even unable to recover in some cases. Since I'm using Kurtosis/Docker, it could be that what is really required is not a restart but removing the cache file (which is gone after the Docker container is restarted).
Scenarios tested that show this behavior (each tested for 1 minute with a 25% failure ratio; normal L1 operation resumed after the test):
- L1 returning HTTP error codes (401, 403, 404, 405, 429, 500, 502, 503, 504)
- L1 returning no content at all (empty response)
- L1 returning an empty JSON and/or empty "result" field
- L1 returning arbitrary HTML or JSON (with the right Content-Type, which seems to be ignored)
- L1 returning JSON content with a corrupted byte (one random byte changed)
- L1 HTTP connection being closed before the response is sent
- Wrong L1 endpoint set, for instance, setting the L2 URL instead of the L1 URL
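
For anyone who wants to reason about these scenarios without running the full Kurtosis setup, here is a minimal stand-alone sketch of the fault injection described above. It is not the mitm script from the repository linked under "To Reproduce"; the names `inject_fault`, `FAILURE_RATIO` and the scenario labels are hypothetical and only model the response-level cases. The closed-connection and wrong-endpoint scenarios happen at the transport or configuration level and are not covered here.

```python
import json
import random

# Assumption: mirrors the 25% failure ratio used in the report.
FAILURE_RATIO = 0.25
# HTTP error codes listed in the first scenario.
ERROR_CODES = [401, 403, 404, 405, 429, 500, 502, 503, 504]


def inject_fault(status, body, rng, scenario):
    """Return (status, body), altered with probability FAILURE_RATIO.

    `status` is the upstream HTTP status, `body` the raw response bytes,
    `rng` a random.Random instance, `scenario` one of the hypothetical
    labels handled below.
    """
    if rng.random() >= FAILURE_RATIO:
        return status, body  # pass the L1 response through untouched

    if scenario == "http_error":
        return rng.choice(ERROR_CODES), b""
    if scenario == "empty_response":
        return status, b""
    if scenario == "empty_json":
        return status, b"{}"
    if scenario == "empty_result":
        doc = json.loads(body)
        doc["result"] = ""
        return status, json.dumps(doc).encode()
    if scenario == "html":
        return status, b"<html><body>not a JSON-RPC reply</body></html>"
    if scenario == "corrupt_byte":
        if not body:
            return status, body
        mutated = bytearray(body)
        index = rng.randrange(len(mutated))
        mutated[index] ^= 0xFF  # XOR with 0xFF always changes the byte
        return status, bytes(mutated)
    raise ValueError(f"unknown scenario: {scenario}")


if __name__ == "__main__":
    rng = random.Random()
    reply = json.dumps({"jsonrpc": "2.0", "id": 1, "result": "0x1"}).encode()
    for name in ("http_error", "empty_result", "corrupt_byte"):
        print(name, inject_fault(200, reply, rng, name))
```

Passing the `random.Random` instance in explicitly makes it possible to seed a run and replay the exact sequence of faults that left cdk-node stuck.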
To add something positive, everything works fine when the L1 response includes additional (unexpected) JSON fields. 💯
Fully automated testing can be done locally with Kurtosis by executing a single script.