Skip to content

Comments

Add retries for software installs#39827

Open
cdcme wants to merge 10 commits intomainfrom
feat-34068-software-install-retries
Open

Add retries for software installs#39827
cdcme wants to merge 10 commits intomainfrom
feat-34068-software-install-retries

Conversation

@cdcme
Copy link
Member

@cdcme cdcme commented Feb 13, 2026

Fixes #34068

Summary

Adds automatic retries (up to 3 attempts) for failed software installs from host details, self-service, and setup experience across all installer types (except VPP apps).

Details

Non-policy software installs previously had no retry mechanism. A single failure required manual re-triggering. This extends the existing policy-automation retry pattern to all install paths.

Key changes:

  • New WithRetries flag on HostSoftwareInstallOptions enables attempt tracking at activation time
  • On failure, the service layer re-queues the install up to MaxSoftwareInstallAttempts (3) times
  • Manual re-installs reset old attempt counters and cancel pending retries for a fresh sequence
  • Setup experience status updates are deferred during intermediate retry attempts

Expected Behavior

  • Failed installs from host details, self-service, and setup experience retry up to 3 times automatically
  • VPP apps do not retry
  • Manual re-triggers clear old retry state and start fresh
  • Policy automation retries are unchanged
  • Activity feed entries are created for each attempt

@codecov
Copy link

codecov bot commented Feb 13, 2026

Codecov Report

❌ Patch coverage is 73.45133% with 30 lines in your changes missing coverage. Please review.
✅ Project coverage is 66.32%. Comparing base (6f52646) to head (40bfec5).
⚠️ Report is 228 commits behind head on main.

Files with missing lines Patch % Lines
server/service/orbit.go 66.66% 9 Missing and 5 partials ⚠️
server/datastore/mysql/software_installers.go 79.41% 4 Missing and 3 partials ⚠️
ee/server/service/software_installers.go 16.66% 4 Missing and 1 partial ⚠️
server/service/apple_mdm.go 60.00% 2 Missing and 2 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main   #39827      +/-   ##
==========================================
+ Coverage   66.26%   66.32%   +0.06%     
==========================================
  Files        2438     2447       +9     
  Lines      195266   196546    +1280     
  Branches     8643     8643              
==========================================
+ Hits       129386   130366     +980     
- Misses      54167    54362     +195     
- Partials    11713    11818     +105     
Flag Coverage Δ
backend 68.12% <73.45%> (+0.06%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@cdcme cdcme marked this pull request as ready for review February 13, 2026 23:12
@cdcme cdcme requested a review from a team as a code owner February 13, 2026 23:12
@mna
Copy link
Member

mna commented Feb 16, 2026

Arrrrrrghhhh I had a somewhat completed review and lost all comments when I switched back to conversation and back again to files changed...

@mna
Copy link
Member

mna commented Feb 16, 2026

Oh no wait, I can go back to it, it just started a new review but the pending one is still available... New UX I think?

Copy link
Member

@mna mna left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I took a first look without reviewing the tests, left some comments. My main concern is with the unified queue, to ensure we don't introduce scenarios where the queue gets stuck.

I'll take a full look (including tests) on my next review.

@cdcme cdcme marked this pull request as draft February 17, 2026 14:12
@cdcme cdcme marked this pull request as ready for review February 18, 2026 15:08
@cdcme
Copy link
Member Author

cdcme commented Feb 18, 2026

@mna I think I've addressed your feedback either with changes or by adding comments. Please lmk if anything else jumps out at you, and ty!

@cdcme cdcme requested a review from mna February 18, 2026 15:08
@mna
Copy link
Member

mna commented Feb 18, 2026

@cdcme ty for the ping, I'll take another look this PM!

Copy link
Member

@mna mna left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, just a minor thing and the reuse of the cancel activity logic. Let me know if you'd prefer to address this in a subsequent PR, if so I'll approve this one.


_, err := svc.ds.InsertSoftwareInstallRequest(ctx, host.ID, installer.InstallerID, fleet.HostSoftwareInstallOptions{
SelfService: false,
WithRetries: true,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at other usages here, it looks like we manage retries three different ways:

  1. This (manually initiated installs)
  2. Setup experience software retries (already existed prior)
  3. Policy failure software install retries (already existed prior)

The latter two don't use the retry mechanism implemented here because they perform retries at a higher level of abstraction, right?

I'm pretty sure I saw the setup experience retries ticket come across a bit ago but don't recall when policy failure based retries happened. Can you link me the ticket for that one?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question! Yes, broadly speaking, I think that's right. Are you thinking of PR #38018?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, that's what I'm thinking of for policies. Setup experience was done earlier.

@iansltx
Copy link
Member

iansltx commented Feb 20, 2026

Deferring back to @mna for the last review pass since he's starting at 90% context here rather than 20%, so this'll get merged quicker (on Monday) that way.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Host details and Self-service: Add retries to software installs

3 participants