feat: Add retry option for fetching rulesets #38
Open
stephenreid wants to merge 4 commits into statsig-io:main
Author
@lfoster-statsig I explored the SDK a bit. The default for ruleset download retries is currently 0; allowing retries can help when initialization fails.
Author
Intention implemented by d54626e#diff-af580512ac44255daab58807d7ca8efd50dce3d9fbfe7d4c96b428dd38e19d76 v2.8.4
Pull Request: Default Network Timeout and Ruleset Retry Logic
Summary
This PR introduces a default `network_timeout` and a configurable `ruleset_id_list_retry_limit` to the Ruby SDK to improve reliability during transient network instability.
- Sets a default for `network_timeout`. It previously defaulted to `nil`, which could lead to hung requests in certain environments depending on the underlying HTTP client's behavior.
- Adds `ruleset_id_list_retry_limit` (defaulting to 3) to ensure that fetching configuration specs and ID lists is resilient to intermittent failures.
- Orders the options in `StatsigOptions` and the corresponding documentation table alphabetically for better maintainability.
Resilience to Cloudflare 520 Errors
Cloudflare 520 ("Web Server Returned an Unknown Error") is a catch-all for unexpected responses from the origin. These are often transient and can occur during brief periods of high latency or socket hangs.
- By setting a default `network_timeout`, we ensure that if a Cloudflare edge or the origin hangs (common in 520 scenarios), the SDK proactively closes the connection rather than waiting indefinitely.
Stability via Backoff and Jitter
The SDK utilizes an exponential backoff strategy with added jitter for these retries:
- Exponential backoff: the delay grows after each failed attempt (`backoff * @backoff_multiplier`). This helps prevent "thundering herd" issues where a recovering service is immediately overwhelmed by a flood of simultaneous retries from all SDK instances.
- Jitter: by randomizing the delay (in `Network#request`), we ensure that multiple distributed Ruby processes don't synchronize their retry attempts. This spreads the load over time, giving the network layer and Statsig's infrastructure a better window to stabilize and process requests successfully.
Test Plan
- Verified that `StatsigOptions` defaults `network_timeout` to 30 and `ruleset_id_list_retry_limit` to 3.
- Added `test/test_network_timeout.rb` to confirm default timeout behavior.
- Added `test_ruleset_id_list_retries` to `test/test_network.rb` to verify that the SDK correctly retries failed config fetches up to the specified limit.
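
For reviewers who want a feel for the retry behavior being tested, here is a minimal, self-contained sketch of retrying up to a limit with exponential backoff and jitter. It is not the SDK's `Network#request` code: the `fetch_with_retries` helper, its parameter names, the multiplier of 2, and the jitter formula are all assumptions made for illustration.

```ruby
# Hypothetical sketch; none of these names are the Statsig SDK's actual API.

# Runs the block, retrying on failure up to `retry_limit` times.
# Between attempts it waits `backoff` seconds plus random jitter, then
# multiplies `backoff` by `backoff_multiplier` for the next round.
# `sleeper` and `rng` are injectable so tests never really sleep.
def fetch_with_retries(retry_limit: 3, backoff: 1.0, backoff_multiplier: 2,
                       sleeper: ->(seconds) { sleep(seconds) },
                       rng: Random.new)
  attempt = 0
  begin
    attempt += 1
    yield(attempt)
  rescue StandardError
    # One initial try plus `retry_limit` retries have been used: give up.
    raise if attempt > retry_limit

    # Jitter (assumed formula): add a random fraction of the current backoff
    # so that separate processes do not retry in lockstep.
    sleeper.call(backoff + rng.rand * backoff)
    backoff *= backoff_multiplier
    retry
  end
end

# Usage: a fake fetch that fails twice, then succeeds. Delays are recorded
# instead of slept so the example runs instantly.
recorded_delays = []
result = fetch_with_retries(sleeper: ->(s) { recorded_delays << s }) do |attempt|
  raise IOError, "simulated transient failure" if attempt < 3

  "specs fetched on attempt #{attempt}"
end
puts result
puts "waited #{recorded_delays.length} times before succeeding"
```

Injecting the sleeper keeps a test along the lines of `test_ruleset_id_list_retries` fast: it can assert on the number of attempts and on the recorded delays without waiting in real time.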