[Nexthop] Use split sw/hw agents in Distro #936

Open
travisb-nexthop wants to merge 9 commits into facebook:main from nexthop-ai:split_swhw_agent
@travisb-nexthop
Contributor

Pre-submission checklist

  • I've ran the linters locally and fixed lint errors related to the files I modified in this PR. You can install the linters by running pip install -r requirements-dev.txt && pre-commit install
  • pre-commit run

Summary

The mono-mode wedge_agent is going to be deprecated 'soon' and it makes more sense to start Distro Image off using the multi-mode split fboss_sw_agent and multiple fboss_hw_agents, one per ASIC.

This change implements that. Switching to fboss_sw_agent is straightforward.

Enabling a variable number of fboss_hw_agents is slightly trickier because we need some way to determine the correct number of HW agents to start. We need this to be statically configured so a misbehaving ASIC doesn't go undetected.

This is solved by adding a num_hw_agents file to the configurations extracted by fboss_init. As its name implies, it holds the number of hardware agents to start on the given platform. The fboss_init.sh script reads this file and enables that number of hardware agents.
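
The read-and-enable step described above could look roughly like the following. This is a minimal sketch, not the code from this PR: the function name `hw_agent_units`, the `MAX_HW_AGENTS` variable, and the file location in the usage comment are assumptions made for illustration. Only the `num_hw_agents` file name, the `fboss_hw_agent@N` unit naming, and the limit of 4 come from the description.

```shell
#!/bin/sh
# Upper bound on HW agents; the description says up to 4 are supported.
MAX_HW_AGENTS=4

# Print one unit name per HW agent for the count stored in file $1.
# Returns non-zero if the file is unreadable or the value is not an
# integer between 0 and MAX_HW_AGENTS.
hw_agent_units() {
    num_file=$1
    [ -r "$num_file" ] || { echo "cannot read $num_file" >&2; return 1; }

    # Strip all whitespace (including the trailing newline) from the file.
    count=$(tr -d '[:space:]' < "$num_file")

    # Reject empty values and anything containing a non-digit.
    case $count in
        ''|*[!0-9]*) echo "invalid count: '$count'" >&2; return 1 ;;
    esac

    [ "$count" -le "$MAX_HW_AGENTS" ] || {
        echo "count $count exceeds $MAX_HW_AGENTS" >&2
        return 1
    }

    i=0
    while [ "$i" -lt "$count" ]; do
        echo "fboss_hw_agent@${i}.service"
        i=$((i + 1))
    done
}

# Possible usage in an init script (the file path is hypothetical):
#   for unit in $(hw_agent_units /etc/coop/num_hw_agents); do
#       systemctl enable "$unit"
#   done
```

Failing hard on an unreadable or malformed file, rather than defaulting to some count, matches the stated goal that the number of agents is statically configured.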

To make the systemd unit dependencies work correctly, all the possible hardware agents are grouped under an fboss_hw_agents.target target. Here up to 4 HW agents are supported, but that is easily extended.
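
One common way to group template instances under a target is sketched below. The unit contents are assumptions, not the files from this PR: the binary path is hypothetical, the agent's command-line arguments are omitted, and the sketch does not reproduce however the PR caps the count at 4. It relies only on standard systemd behaviour: enabling an instance of a template whose `[Install]` section names a target adds that instance to the target's wants, and `PartOf=` propagates stop and restart from the target to the instance.

```shell
#!/bin/sh
# Write an illustrative target and template unit into directory $1.
write_hw_agent_units() {
    unit_dir=$1
    mkdir -p "$unit_dir" || return 1

    # Target that groups every enabled HW agent instance.
    cat > "$unit_dir/fboss_hw_agents.target" <<'EOF'
[Unit]
Description=All FBOSS hardware agents (illustrative)

[Install]
WantedBy=multi-user.target
EOF

    # Template unit: %i expands to the instance name, e.g. 0 for
    # fboss_hw_agent@0.service.
    cat > "$unit_dir/fboss_hw_agent@.service" <<'EOF'
[Unit]
Description=FBOSS hardware agent %i (illustrative)
PartOf=fboss_hw_agents.target

[Service]
# Hypothetical path; the real arguments are not shown here.
ExecStart=/usr/local/bin/fboss_hw_agent

[Install]
WantedBy=fboss_hw_agents.target
EOF
}
```

With units shaped like this, `systemctl enable fboss_hw_agent@0.service` pulls instance 0 in whenever the target starts, while instances that were never enabled stay inactive.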

Test Plan

Load the image on a Minipack3. Then verify that:

  1. fboss_sw_agent is running: ps -A|grep fboss_sw
  2. fboss_hw_agent@0 is enabled: systemctl status fboss_hw_agent@0.service
  3. fboss_hw_agent@1 is disabled: systemctl status fboss_hw_agent@1.service
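
The three checks above can also be scripted. The sketch below is an assumption about how one might automate them, not part of the PR. The function names are made up, `pgrep -x` stands in for the `ps | grep` pipeline, and `systemctl is-enabled --quiet` replaces reading the `systemctl status` output by eye.

```shell
#!/bin/sh
# Succeed only if a process named exactly fboss_sw_agent is running.
check_sw_agent() {
    pgrep -x fboss_sw_agent > /dev/null
}

# Succeed only if HW agent instances 0..($1 - 1) are enabled and
# instance $1 (the first one past the expected count) is not.
check_hw_agents() {
    expected=$1
    i=0
    while [ "$i" -lt "$expected" ]; do
        systemctl is-enabled --quiet "fboss_hw_agent@${i}.service" || {
            echo "fboss_hw_agent@${i} is not enabled" >&2
            return 1
        }
        i=$((i + 1))
    done
    if systemctl is-enabled --quiet "fboss_hw_agent@${expected}.service"; then
        echo "fboss_hw_agent@${expected} is unexpectedly enabled" >&2
        return 1
    fi
    return 0
}

# On the Minipack3 described above, one HW agent is expected:
#   check_sw_agent && check_hw_agents 1
```

Note that this checks the enabled state only; it does not detect an agent that is enabled but crashes on start.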

Unfortunately the HW agent crashes on start due to a SAI initialization error.

marif-nexthop and others added 8 commits January 24, 2026 02:00
[Nexthop] Upgrade FBOSS Distro to support systemd services

**Pre-submission checklist**
- [X] I've ran the linters locally and fixed lint errors related to the
files I modified in this PR. You can install the linters by running `pip
install -r requirements-dev.txt && pre-commit install`
- [X] `pre-commit run`

package_manager, by default, detects the platform it is running on then
uses a compiled-in config file to install the platform BSP RPM using
dnf.

This requires that we have an RPM repo with those RPMs. In Distro Image
we cannot presume an infrastructure RPM server with the correct BSP and
dependent RPMs. Instead, create a device-local one for that purpose.

The repo metadata is created at boot instead of just install time to
help the development workflows where a new RPM is uploaded to the
device, then the device is restarted for testing.
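
A minimal sketch of the boot-time step follows, under stated assumptions. The repository directory and the `local_rpm_repo` repo id are taken from the test steps and `dnf` output in this commit message. The function name, the `.repo` file location in the usage comment, and the `gpgcheck=0` setting are assumptions, not the contents of the PR.

```shell
#!/bin/sh
# Directory holding the locally uploaded RPMs (from the test steps).
REPO_DIR=/usr/local/share/local_rpm_repo

# Write a dnf repository definition pointing at REPO_DIR to file $1.
write_local_repo_file() {
    cat > "$1" <<EOF
[local_rpm_repo]
name=Device-local RPM repository
baseurl=file://$REPO_DIR
enabled=1
gpgcheck=0
EOF
}

# Possible boot-time sequence (requires root; the .repo path is an
# assumption):
#   createrepo "$REPO_DIR"
#   write_local_repo_file /etc/yum.repos.d/local_rpm_repo.repo
```

Regenerating the metadata on every boot means an RPM copied into the directory becomes visible to `dnf` after a restart without a separate install step.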

Build and load the image on a DUT. Manually copy the BSP RPM into
`/usr/local/share/local_rpm_repo`, run `createrepo
/usr/local/share/local_rpm_repo`, then `dnf search` for that RPM:
```
dnf search nexthop
Repository local_rpm_repo is listed more than once in the configuration
Last metadata expiration check: 0:00:15 ago on Sat 17 Jan 2026 12:44:46 AM UTC.
======================= Name & Summary Matched: nexthop ========================
nexthop_bsp_kmods-6.11.1-1.fboss.el9.x86_64-1.el9-1.0.0.x86_64 : Nexthop BSP Kernel Modules
```
At boot, start a oneshot systemd service that launches the fboss_init.sh script to perform FBOSS Distro initialization and setup, e.g. copying the right config files to /etc/coop.

- Added a systemd service that launches the fboss_init.sh script
  - The init script currently performs the following steps
    - Copy /etc/coop configuration files based on the dmidecode output
    - Generate the fruid.json file
- Added an option to add default configs (qsfp_service and wedge_agent) for different platforms
- Added default configs for the montblanc platform as an example

The mono-mode `wedge_agent` is going to be deprecated 'soon' and it
makes more sense to start Distro Image off using the multi-mode split
`fboss_sw_agent` and multiple `fboss_hw_agent`s, one per ASIC.

This change implements that. Switching to `fboss_sw_agent` is
straightforward.

Enabling a variable number of `fboss_hw_agent`s is slightly trickier
because we need some way to determine the correct number of HW agents to
start. We need this to be statically configured so a misbehaving ASIC
doesn't go undetected.

The way that is solved here is by adding a `num_hw_agents` file to the
configurations extracted by fboss_init. As its name implies, it is the
number of hardware agents to start on the given platform. The
`fboss_init.sh` script will read this file and enable that number of
hardware agents.

To make the systemd unit dependencies work correctly, all the possible
hardware agents are grouped under an `fboss_hw_agents.target` target.
Here up to 4 HW agents are supported, but that is easily extended.

Testing

Load the image on a Minipack3. Then verify that:

1. `fboss_sw_agent` is running: `ps -A|grep fboss_sw`
2. `fboss_hw_agent@0` is enabled: `systemctl status
fboss_hw_agent@0.service`
3. `fboss_hw_agent@1` is disabled: `systemctl status
fboss_hw_agent@1.service`

Unfortunately the HW agent crashes on start due to a SAI
initialization error.
@meta-cla meta-cla bot added the CLA Signed label Feb 14, 2026
@travisb-nexthop
Contributor Author

This builds upon #888

@travisb-nexthop travisb-nexthop marked this pull request as ready for review February 17, 2026 18:26