Add a script to check for breaking C ABI changes#1749
Add a script to check for breaking C ABI changes#1749benfred wants to merge 9 commits intorapidsai:mainfrom
Conversation
This adds a script that can check for changes that would break our C-ABI. t works by comparing two sets of header files: the `c/include` header files for the published ABI, as well as the `c/include` header files inside a new pull request. Each set of header files is parsed using libclang, and the differences between the old and new header files are examined for changes that would cause a breaking ABI change. Currently the following checks are made and flagged by this tool: * Functions that have been removed from the C-ABI * Functions that have extra parameters added * Functions that have parameters removed * Functions that have the type of any parameter changed * Structs that have been removed * Structs that have have members removed * Structs that have the types of members changed * Enum values that have been removed * Enum values that have their definition changed
|
Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually. Contributors can view more details about this message here. |
|
fantastic! This is exactly what I hoped we could do. I think we can achieve the reference state by:
Overall, just decouple the source code scanning from the result comparison, and we should have a lot of options. |
|
thanks @msarahan ! I've decoupled extracting the ABI from using it to flag abi errors in the last commit - running We will still need to figure out where to store the baseline ABI files for comparison (fwiw, the generated abi file is about 5KB right now, so its actually smaller than the script to generate it) - I'm ok with any of the options you mentioned (github release artifacts or s3) - github release artifacts would be slightly easier for me, but only because I've done that before for py-spy (like storing the wheels for each release here) and am unsure of the S3 workflow/permissions from GHA I don't know if we need to generate the abi file for the current set of files being checked, imo it might be easier to just use the header files directly in the PR |
|
I'm inclined to give this to the ops team to figure out and advise us. I have some vague recollection that we are trying to gradually get away from downloads.rapids.ai, but I could be wrong. The nice thing about using github for this is that there's no additional auth setup required - the job token would be enough. That's a pretty small thing, though. @rapidsai/ops any opinion here? |
|
Thx for the ping @msarahan I read through the conversation above and based on what I am seeing I would recommend keeping things in the github domain (vs storing things externally) especially given the size:
As for storing headers to compare against, my recommendation would be to keep this in git in some fashion, e.g.
I don't think trying to store something remotely in a static way and overwriting it every PR would be a good idea given the fact multiple PRs/pipelines can be running at a given time, but maybe I don't fully understand the full scope of this. Let me know if there is anything else I might be missing as this is 1 of 100 other random things coming our team's way currently as we get closer to the release. |
|
Web content caching may also be an annoying problem if we do overwriting. I will hack up a github action to do things with Ben's code. I'll submit it as a PR to this PR's branch - so please watch for it, Ben. You are the only one who will get the notification, I think. I will post here, too. |
This commit adds automated C ABI compatibility checking to the CI pipeline, building on the check_c_abi.py script from benfred's PR rapidsai#1749. Changes: - Add check-c-abi.yaml workflow to run on PRs modifying C headers - Add store-c-abi-baseline.yaml workflow to store ABI baselines on releases - Integrate ABI check into pr.yaml as a required check - Add comprehensive developer documentation in ci/C_ABI_CHECKING.md - Add implementation summary in GITHUB_ACTIONS_ABI_CHECK_SUMMARY.md The ABI checker automatically detects breaking changes to the C API: - Function signature changes (parameters, return types) - Struct member changes (additions, removals, type changes) - Enum value changes Benefits: - Catch breaking changes before merge, not after release - Automated checking on every PR touching C headers - Clear error messages and guidance for developers - Fast execution (~2-5 minutes) - Stores baselines on releases for future comparisons See GITHUB_ACTIONS_ABI_CHECK_SUMMARY.md for full details.
- Add check-c-abi.yaml workflow for PRs - Add store-c-abi-baseline.yaml workflow for releases - Integrate ABI check into pr.yaml pipeline Co-authored-by: Cursor <cursoragent@cursor.com>
- Extract and store main baseline on push to main (never expires) - PRs download and compare against main baseline - Release workflow archives the main baseline with version tag - Eliminates duplicate ABI extraction work Co-authored-by: Cursor <cursoragent@cursor.com>
Enables bootstrapping the initial c-abi-baseline-main artifact Co-authored-by: Cursor <cursoragent@cursor.com>
- Store baselines with commit SHA for precise comparisons - Cascade: try merge-base → latest main → extract fresh - Add concurrency control to prevent race conditions - Report which baseline source was used for transparency Co-authored-by: Cursor <cursoragent@cursor.com>
|
PR is up at benfred#1 |
Add GitHub Actions for C ABI checking
KyleFromNVIDIA
left a comment
There was a problem hiding this comment.
Hey, just wanted to let you know I'm getting to this now. There's a lot going on here, so it's going to take a while, but I do have some initial comments.
ci/check_c_abi.py
Outdated
| location: Optional[SymbolLocation] = None | ||
|
|
||
|
|
||
| def analyze_c_abi(old_abi, new_abi): |
There was a problem hiding this comment.
Please split this function up into struct, enum, and function handling. Perhaps further break it up into individual checks as well.
Please also add unit tests of some kind.
There was a problem hiding this comment.
This will quickly expand into something that doesn't fit in this one-off folder. It might be wise to plan long term to split this off into a standalone project. Of course, that means it would probably have to undergo osrb review separately from rapids/cuvs.
I would be inclined to move it to its own folder in this project for now, and create the full Python project scaffolding for it. That whole folder could then be moved to a separate repo when we're ready.
| update-baseline: | ||
| if: github.event_name == 'workflow_dispatch' || (github.event_name == 'push' && github.ref == 'refs/heads/main') |
There was a problem hiding this comment.
This job and check-pr are different enough, and have different enough conditions, that I think they should be in separate files.
| run: | | ||
| sudo apt-get update | ||
| sudo apt-get install -y \ | ||
| libclang-dev \ |
There was a problem hiding this comment.
Is libclang-dev needed for the Python API? When you pip install libclang below, does it build against these headers?
| have been removed or had function parameters removed, parameters added or the type of any | ||
| parameter changed. Note: adding new functions to the new abi is allowed | ||
| """ | ||
| errors = [] |
There was a problem hiding this comment.
Let's use generators instead of returning arrays:
| errors = [] |
| errors.append( | ||
| AbiError( | ||
| "Function has been removed", | ||
| symbol=old_function.name, | ||
| location=old_function.location, | ||
| ) | ||
| ) |
There was a problem hiding this comment.
| errors.append( | |
| AbiError( | |
| "Function has been removed", | |
| symbol=old_function.name, | |
| location=old_function.location, | |
| ) | |
| ) | |
| yield AbiError( | |
| "Function has been removed", | |
| symbol=old_function.name, | |
| location=old_function.location, | |
| ) |
| errors.append( | ||
| AbiError( | ||
| f"Function has return type changed from '{old_function.return_type}' to '{new_function.return_type}'", | ||
| symbol=new_function.name, | ||
| location=new_function.location, | ||
| ) | ||
| ) |
There was a problem hiding this comment.
| errors.append( | |
| AbiError( | |
| f"Function has return type changed from '{old_function.return_type}' to '{new_function.return_type}'", | |
| symbol=new_function.name, | |
| location=new_function.location, | |
| ) | |
| ) | |
| yield AbiError( | |
| f"Function has return type changed from '{old_function.return_type}' to '{new_function.return_type}'", | |
| symbol=new_function.name, | |
| location=new_function.location, | |
| ) |
| errors.append( | ||
| AbiError( | ||
| f"Function has a new parameter '{new_type} {new_name}'", | ||
| symbol=new_function.name, | ||
| location=new_function.location, | ||
| ) | ||
| ) |
There was a problem hiding this comment.
| errors.append( | |
| AbiError( | |
| f"Function has a new parameter '{new_type} {new_name}'", | |
| symbol=new_function.name, | |
| location=new_function.location, | |
| ) | |
| ) | |
| yield AbiError( | |
| f"Function has a new parameter '{new_type} {new_name}'", | |
| symbol=new_function.name, | |
| location=new_function.location, | |
| ) |
| errors.append( | ||
| AbiError( | ||
| f"Function has a deleted parameter '{old_type} {old_name}'", | ||
| symbol=old_function.name, | ||
| location=old_function.location, | ||
| ) | ||
| ) |
There was a problem hiding this comment.
| errors.append( | |
| AbiError( | |
| f"Function has a deleted parameter '{old_type} {old_name}'", | |
| symbol=old_function.name, | |
| location=old_function.location, | |
| ) | |
| ) | |
| yield AbiError( | |
| f"Function has a deleted parameter '{old_type} {old_name}'", | |
| symbol=old_function.name, | |
| location=old_function.location, | |
| ) |
| errors = [] | ||
| errors.extend(_analyze_function_abi(old_abi, new_abi)) | ||
| errors.extend(_analyze_struct_abi(old_abi, new_abi)) | ||
| errors.extend(_analyze_enum_abi(old_abi, new_abi)) | ||
| return errors |
There was a problem hiding this comment.
| errors = [] | |
| errors.extend(_analyze_function_abi(old_abi, new_abi)) | |
| errors.extend(_analyze_struct_abi(old_abi, new_abi)) | |
| errors.extend(_analyze_enum_abi(old_abi, new_abi)) | |
| return errors | |
| yield from _analyze_function_abi(old_abi, new_abi) | |
| yield from _analyze_struct_abi(old_abi, new_abi) | |
| yield from _analyze_enum_abi(old_abi, new_abi) |
| ) | ||
| ) | ||
|
|
||
| return errors |
There was a problem hiding this comment.
| return errors |
| location=new_struct.location, | ||
| ) | ||
| ) | ||
| return errors |
There was a problem hiding this comment.
| return errors |
| location=new_function.location, | ||
| ) | ||
| ) | ||
| return errors |
There was a problem hiding this comment.
| return errors |
| location: Optional[SymbolLocation] = None | ||
|
|
||
|
|
||
| def _analyze_function_abi(old_abi, new_abi): |
There was a problem hiding this comment.
Please add type annotations to all function arguments and return values.
| new_abi = Abi.from_include_path( | ||
| header_path, args.include_file, extra_clang_args | ||
| ) | ||
| errors = analyze_c_abi(old_abi, new_abi) |
There was a problem hiding this comment.
| errors = analyze_c_abi(old_abi, new_abi) | |
| errors = list(analyze_c_abi(old_abi, new_abi)) |
This adds a script that can check for changes that would break our C-ABI. It works by comparing two sets of header files: the
c/includeheader files for the published ABI, as well as thec/includeheader files inside a new pull request. Each set of header files is parsed using libclang, and the differences between the old and new header files are examined for changes that would cause a breaking ABI change.Currently the following checks are made and flagged by this tool:
This is just currently a POC of using libclang to flag breaking c-abi changes - and is meant as an alternative to tools that require debug symbols for detecting abi breaks from the compiled library, and isn't ready to merge just yet.
To fully get this integrated, we will need to store the headers for the stable c-abi to use as a baseline somewhere, and then download the stable c-abi headers on each PR and use them to run in a GHA workflow to flag breaking changes.
When run on a branch that has a breaking ABI change (#1496 which has a number of breaking c-abi changes), the output is:
Also, this script runs in < 100ms on my workstation, and we could potentially add this as a pre-commit hook
closes #1739