[Nexthop] Use split sw/hw agents in Distro #936
Open
travisb-nexthop wants to merge 9 commits into facebook:main
**Pre-submission checklist**
- [X] I've run the linters locally and fixed lint errors related to the files I modified in this PR. You can install the linters by running `pip install -r requirements-dev.txt && pre-commit install`
- [X] `pre-commit run`

By default, package_manager detects the platform it is running on, then uses a compiled-in config file to install the platform BSP RPM using dnf. This requires an RPM repo that contains those RPMs. In Distro Image we cannot presume an infrastructure RPM server with the correct BSP and dependent RPMs, so this change creates a device-local repo for that purpose.

The repo metadata is created at boot rather than only at install time. This helps development workflows where a new RPM is uploaded to the device and the device is then restarted for testing.

Build and load the image on a DUT. Manually copy the BSP RPM into `/usr/local/share/local_rpm_repo`, run `createrepo /usr/local/share/local_rpm_repo`, then `dnf search` for that RPM:

```
dnf search nexthop
Repository local_rpm_repo is listed more than once in the configuration
Last metadata expiration check: 0:00:15 ago on Sat 17 Jan 2026 12:44:46 AM UTC.
======================= Name & Summary Matched: nexthop ========================
nexthop_bsp_kmods-6.11.1-1.fboss.el9.x86_64-1.el9-1.0.0.x86_64 : Nexthop BSP Kernel Modules
```
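As a rough illustration of the device-local repo described above, the sketch below prints a dnf repo stanza pointing at a local directory. The repo id and directory are taken from the description; the helper name `local_repo_stanza`, the `gpgcheck=0` setting, and the target `.repo` path in the comments are assumptions for this example, not necessarily what the change itself does.

```shell
# Hypothetical helper: print a dnf .repo stanza for a device-local repo.
# The stanza layout and gpgcheck=0 are assumptions of this sketch.
local_repo_stanza() {
    repo_dir="$1"   # absolute path of the directory holding the RPMs
    cat <<EOF
[local_rpm_repo]
name=Device-local RPM repo
baseurl=file://${repo_dir}
enabled=1
gpgcheck=0
EOF
}

# On a device this might be used as follows (not executed here; the
# .repo file location is an assumption):
#   local_repo_stanza /usr/local/share/local_rpm_repo \
#       > /etc/yum.repos.d/local_rpm_repo.repo
#   createrepo /usr/local/share/local_rpm_repo
```

Regenerating the metadata with `createrepo` at every boot is what lets a freshly copied RPM show up in `dnf search` after a restart.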
At boot, start a oneshot systemd service that runs the `fboss_init.sh` script to perform FBOSS distro initialization and setup, e.g. copying the right config files to /etc/coop.
- Added a systemd service that launches the script `fboss_init.sh`
- The init script performs the following steps for now:
  - Copy /etc/coop configuration files based on the dmidecode output
  - Generate the fruid.json file
- Added an option to add default configs (qsfp_service and wedge_agent) for different platforms
- Added default configs for the montblanc platform as an example
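The config-copy step listed above could be sketched roughly as follows. This is only an illustration: the function name `install_platform_configs`, the `<src_root>/<platform>/` directory layout, the lowercase normalization, and the source path in the usage comment are all assumptions, not the actual contents of `fboss_init.sh`.

```shell
# Hypothetical sketch: copy per-platform default configs into a destination.
# Assumed layout (not from the PR): <src_root>/<platform>/ holds that
# platform's files, with the platform name in lowercase.
install_platform_configs() {
    platform="$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')"
    src_root="$2"
    dest="$3"

    if [ ! -d "${src_root}/${platform}" ]; then
        echo "no default configs for platform '${platform}'" >&2
        return 1
    fi
    mkdir -p "$dest"
    # The trailing /. copies the directory contents, including dotfiles.
    cp -R "${src_root}/${platform}/." "$dest/"
}

# On a device the platform name might come from dmidecode, for example
# (the source directory here is a made-up path):
#   install_platform_configs "$(dmidecode -s system-product-name)" \
#       /usr/share/fboss/default_configs /etc/coop
```

Failing loudly on an unknown platform, rather than copying nothing, keeps a missing config set from going unnoticed at boot.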
…ame cleanup improvement
The mono-mode `wedge_agent` is going to be deprecated 'soon', so it makes more sense to start Distro Image off using the multi-mode split `fboss_sw_agent` and multiple `fboss_hw_agent`s, one per ASIC. This change implements that.

Switching to `fboss_sw_agent` is straightforward.

Enabling a variable number of `fboss_hw_agent`s is slightly trickier because we need some way to determine the correct number of HW agents to start. We need this to be statically configured so a misbehaving ASIC doesn't go undetected.

This is solved here by adding a `num_hw_agents` file to the configurations extracted by fboss_init. As its name implies, it holds the number of hardware agents to start on the given platform. The `fboss_init.sh` script reads this file and enables that number of hardware agents.

To make the systemd unit dependencies work correctly, all the possible hardware agents are grouped under an `fboss_hw_agents.target` target. Up to 4 HW agents are supported here, but that is easily extended.

**Testing**

Load the image on a Minipack3. Then verify that:
1. `fboss_sw_agent` is running: `ps -A|grep fboss_sw`
2. `fboss_hw_agent@0` is enabled: `systemctl status fboss_hw_agent@0.service`
3. `fboss_hw_agent@1` is disabled: `systemctl status fboss_hw_agent@1.service`

Unfortunately the HW agent crashes on start due to a SAI initialization error.
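For illustration, the `num_hw_agents` handling described above might look roughly like the sketch below. It reads the count from a file, validates it against the limit of 4 mentioned in the description, and prints the unit names to enable. The function name `hw_agent_units`, the validation rules, and the file path in the usage comment are assumptions; the real `fboss_init.sh` may differ.

```shell
# Hypothetical sketch: turn the contents of a num_hw_agents file into the
# list of fboss_hw_agent@N units to enable. MAX_HW_AGENTS=4 mirrors the
# limit stated in the description; everything else is an assumption.
MAX_HW_AGENTS=4

hw_agent_units() {
    count_file="$1"
    # Read the first line and strip all whitespace.
    count="$(head -n 1 "$count_file" | tr -d '[:space:]')"

    # Reject anything that is not a plain non-negative integer.
    case "$count" in
        ''|*[!0-9]*)
            echo "invalid num_hw_agents: '${count}'" >&2
            return 1
            ;;
    esac
    if [ "$count" -gt "$MAX_HW_AGENTS" ]; then
        echo "num_hw_agents ${count} exceeds ${MAX_HW_AGENTS}" >&2
        return 1
    fi

    i=0
    while [ "$i" -lt "$count" ]; do
        echo "fboss_hw_agent@${i}.service"
        i=$((i + 1))
    done
}

# On a device the result might be fed to systemctl, for example
# (the file path is an assumption):
#   for unit in $(hw_agent_units /etc/coop/num_hw_agents); do
#       systemctl enable "$unit"
#   done
```

Because the count comes from a static file rather than from probing the hardware, an ASIC that fails to enumerate still has a HW agent enabled for it, and the resulting failure is visible.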
This builds upon #888