NAS-139113 / 26.0.0-BETA.1 / Optimize Docker Stats Collection #18045
Open: essinghigh wants to merge 14 commits into truenas:master from essinghigh:docker-stats-sparse
+134
−39
…spamming get_netrc_auth
Introduce TypedDict definitions (ResourceStats, BlkioStats, NetworkStats) and apply type annotations; extract Block IO and network stats parsing into dedicated helper functions (_parse_blkio and _parse_networks) to simplify get_container_stats; move the project label check in get_container_stats ahead of the expensive container.stats() API calls; refactor the aggregation loop in list_resources_stats_by_project for readability
Neither is scanning the home directory for auth configs on every request. Switched to a cached singleton client.
The threads are spending more time fighting over the lock than actually getting stats, so we might as well just go back to sequential querying. For container.stats(), this is very fast now that we aren't recreating the client constantly
This has crept back up over time; we just need to ensure we are avoiding connection pool contention
Had to reopen this as I merged my personal & work GitHub accounts, which closed out my open issues/PRs.
Follow-on from #17906
Persistent Connection Pool
Updated get_docker_client to reuse a single Docker client instance. This eliminates the massive overhead of establishing new socket connections for each execution cycle.
Removed Unnecessary Disk I/O
Explicitly disabled trust_env and defined the local Docker socket base URL. This prevents requests from performing filesystem lookups for .netrc and checking environment variables on every single API call.
Sparse Container Listing
Switched container listing to use sparse=True. With this set to False, as it was originally, listing performed a full inspect API call for each container. We only list containers to identify them, and a sparse listing returns all the data needed for that.
Race Condition Handling
Removed the blocking retry loop. If a container dies between listing and stats collection, we can just silently skip it.
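For readers who have not opened the diff, the four changes above can be sketched roughly as follows. This is not the code from the PR: the socket path, the collect_stats helper, and its gone parameter are assumptions made for illustration. Only get_docker_client, trust_env, sparse=True and container.stats() come from the description.

```python
from __future__ import annotations

from functools import lru_cache
from typing import Any

# Assumed default socket location; the PR only says "the local Docker socket".
DOCKER_SOCKET_URL = "unix:///var/run/docker.sock"


@lru_cache(maxsize=1)
def get_docker_client() -> Any:
    """Build the Docker client once and return the same instance afterwards."""
    import docker  # imported lazily so this module loads without the SDK

    client = docker.DockerClient(base_url=DOCKER_SOCKET_URL)
    # The low-level APIClient is a requests.Session subclass; with
    # trust_env=False, requests no longer consults .netrc or the environment.
    client.api.trust_env = False
    return client


def collect_stats(
    client: Any, gone: tuple[type[BaseException], ...]
) -> dict[str, dict]:
    """Hypothetical helper: one stats snapshot per running container."""
    results: dict[str, dict] = {}
    # sparse=True skips the per-container inspect call during listing.
    for container in client.containers.list(all=False, sparse=True):
        try:
            results[container.id] = container.stats(stream=False)
        except gone:
            # The container died between listing and stats collection: skip it.
            continue
    return results
```

With the Docker SDK installed, a call might look like collect_stats(get_docker_client(), (docker.errors.NotFound,)). Passing the exception types in as a parameter is only there to keep the sketch importable without the SDK.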
Type Safety & Readability
I've rewritten the stats collection module to use TypedDict and strict type hints, which improves readability significantly compared to the previous nested dictionary approach.
I have tested this on the latest MASTER build without issues. I have also been running these changes on a production machine on 25.10.0 without any problems.
Before: (screenshot)

After: (screenshot)

I'm also still convinced that there's a bug in the WebUI and/or middlewared that causes the AppStatsEventSource to never terminate. However, I haven't been able to find anything to confirm that.
There are some areas I think could be improved still: