-
Notifications
You must be signed in to change notification settings - Fork 67
[inventory] Add unhealthy zpools from each sled #9615
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Open
karencfv
wants to merge
27
commits into
oxidecomputer:main
Choose a base branch
from
karencfv:zpool-health-check
base: main
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Open
Changes from 8 commits
Commits
Show all changes
27 commits
Select commit
Hold shift + click to select a range
ed4d1ee
pseudo-code
karencfv b30e767
add some thoughts
karencfv 27204f4
Merge branch 'main' into zpool-health-check
karencfv b038f6e
[inventory] Add unhealthy zpools from each sled
karencfv 25db966
add some tests
karencfv cc6d397
Add unhealthy zpool functionality
karencfv 219918c
add zpool health info to API
karencfv 407a068
fix cmd order
karencfv c817b9c
address comments
karencfv d01216d
get serde/schemars working with Option<Result<T,E>>
karencfv b1fecc8
clean up
karencfv 8eb5001
make time_of_status required
karencfv 7cd8f4b
add json file and fmt
karencfv c5baf76
merge main
karencfv e6944dc
generate openapi doc
karencfv 0458330
sigh...
karencfv 4c7cad3
untagged so it works on none
karencfv 2771225
linter
karencfv b8157ea
report zpool health state as well
karencfv a2e3e90
add generated openapi spec
karencfv c734654
Remove manual testing code
karencfv 030b11f
Make smf_services_in_maintenance optional too
karencfv bc8eb04
merge main
karencfv fe84464
fixes after merge
karencfv 3332389
remove manual testing code
karencfv 6113b94
merge main
karencfv 090b714
generate openapi doc
karencfv File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Oops, something went wrong.
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I went back and forth between just having a list of unhealthy zpools, or associating each zpool with it's state. In the end I went with listing the zpools only, but I'm not convinced. We'll be including the information of the health checks in the support bundle, and it'd be useful for them to be able to see what state each zpool is in. Thoughts? @davepacheco @jgallagher
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Associating each zpool with its state sounds good to me; having an explicit entry for "this zpool was healthy" seems safer than inferring "any zpool that isn't explicitly listed as unhealthy must have been healthy".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmmm, I was thinking of only including the "unhealthy" zpools with their associated statuses in this list. Similarly with the
svcs_in_maintenanceI only added the services in maintenance with their associated zones. If I were to include the "healthy" zpools then it wouldn't really be consistent with the services in maintenance no?My take on the health checks is to only report on things that are in an unhealthy state. Thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I get this, but for some reason it doesn't bother me? I think because (a) there are many fewer zpools, and (b) a zpool definitely can "just not be there" (e.g., if the drive is physically gone or not accessible or whatever), whereas an SMF service can't really, so it seems safer to assume "no report about the SMF service" implies "service is healthy".
Maybe worth discussing live. I just remembered we already do report some zpool information in inventory, and this has some nontrivial differences that we might want to address?
zpool ...commands on demand when an/inventoryrequest comes in; this seems much worse than this branch's "run periodically in the background, and/inventorygets to clone the contents of a watch channel" pattern.zpool ...command we're invoking returns it.I'm wondering if it would make sense to reconcile these differences and then remove the top-level inventory
zpoolsentry, and update any consumers of it to consume the health information instead?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not that familiar with what consumers use the current zpool information. But I wonder if it makes sense to add more information to the health checks than necessary for their purpose. In the case of services in maintenance we stripped a whole bunch of information out just to report a list of services with their zones.
That said, this sounds like an interesting Idea.
What happens with zpools that would be in this state?
Yeah, it feels messy to be calling zpool in different places for the same inventory collection
But yeah, better to chat about this live. I'll be at the next update watercooler!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Recap from the chat at the update watercooler:
We definitely want just a single source of truth for zpool information. It makes sense to just add new fields with status information and a time stamp to the existing zpool list. Additionally, we may want to use the new background task to retrieve all of the information instead of calling
zpool liston demand.