feat(gateway/data_plane): support gateway bound to multiple data_planes#2350
feat(gateway/data_plane): support gateway bound to multiple data_planes#2350wklken wants to merge 3 commits intoTencentBlueKing:masterfrom
Conversation
There was a problem hiding this comment.
Pull request overview
This pull request implements support for gateways to be bound to multiple data planes, enabling better multi-environment and multi-cluster support for the API Gateway system.
Changes:
- Introduces new
data_planeapp with DataPlane and GatewayDataPlaneBinding models for managing data plane configurations - Updates release and publishing logic to distribute gateway configurations to multiple data planes simultaneously
- Adds management commands to initialize default data plane and bind existing gateways
Reviewed changes
Copilot reviewed 28 out of 33 changed files in this pull request and generated 15 comments.
Show a summary per file
| File | Description |
|---|---|
| src/dashboard/apigateway/apigateway/apps/data_plane/* | New app implementing data plane models, managers, admin, and management commands |
| src/dashboard/apigateway/apigateway/core/models.py | Added data_plane foreign key to ReleaseHistory model |
| src/dashboard/apigateway/apigateway/utils/crypto.py | Extracted BkCrypto class for reuse |
| src/dashboard/apigateway/apigateway/utils/etcd.py | Added new_etcd_client() for per-data-plane ETCD configurations |
| src/dashboard/apigateway/apigateway/biz/gateway/saver.py | Updated to bind new gateways to data planes |
| src/dashboard/apigateway/apigateway/biz/gateway/releaser.py | Updated to release to all active data planes |
| src/dashboard/apigateway/apigateway/controller/tasks/* | Updated task signatures to accept data_plane_id parameter |
| src/dashboard/apigateway/apigateway/controller/publisher/publish.py | Updated to iterate over data planes for publishing |
| src/dashboard/apigateway/apigateway/controller/distributor/etcd.py | Updated to use per-data-plane ETCD clients |
| src/dashboard/apigateway/apigateway/apis/*/serializers.py | Added data_planes field to gateway sync APIs |
| src/dashboard/bin/on_migrate | Added commands to initialize default data plane |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
src/dashboard/apigateway/apigateway/controller/tasks/syncing.py
Outdated
Show resolved
Hide resolved
src/dashboard/apigateway/apigateway/controller/publisher/publish.py
Outdated
Show resolved
Hide resolved
...ateway/apigateway/apps/data_plane/management/commands/bind_gateways_to_default_data_plane.py
Show resolved
Hide resolved
src/dashboard/apigateway/apigateway/apis/web/release/serializers.py
Outdated
Show resolved
Hide resolved
|
|
||
| def get_queryset(self): | ||
| return ReleaseHistory.objects.filter(gateway=self.request.gateway) | ||
| return ReleaseHistory.objects.filter(gateway=self.request.gateway).select_related("data_plane") |
There was a problem hiding this comment.
here we don't need the data_plane?
...shboard/apigateway/apigateway/apps/data_plane/management/commands/init_default_data_plane.py
Show resolved
Hide resolved
| first_history = history | ||
| if deploy_history: | ||
| deploy_history.publish_id = history.id | ||
| deploy_history.save() |
There was a problem hiding this comment.
TODO: the programmable gateway should be only bound to one data_plane?
here only save the deploy_history at the first_history
|
|
||
| # record audit log only for the first data plane to avoid duplicates | ||
| if history == first_history: | ||
| Auditor.record_release_op_success( |
There was a problem hiding this comment.
TODO: only one audit record?
| ) | ||
|
|
||
| def release(self): | ||
| def release(self) -> ReleaseHistory: |
There was a problem hiding this comment.
FIXME: the function should return multiple ReleaseHistory?
src/dashboard/apigateway/apigateway/controller/tasks/syncing.py
Outdated
Show resolved
Hide resolved
Code reviewNo critical issues found (confidence threshold: 80+). Checked for bugs and CLAUDE.md compliance. Review Summary:
A detailed review report with all findings has been saved locally to Highest-confidence findings (75):
These issues may warrant attention before moving out of draft status, but they didn't reach the certainty threshold (80+) for automated reporting. 🤖 Generated with Claude Code - If this code review was useful, please react with 👍. Otherwise, react with 👎. |
Code Review - Detailed FindingsComprehensive review of PR #2350 analyzed 32 changed files (1100 additions, 138 deletions). Critical Issues (Confidence: 75)1. Missing Error Handling in Celery TasksIssue: Celery tasks call Files: Recommendation: Use 2. Missing Atomic Transaction for Gateway-Data Plane BindingIssue: File: Recommendation: Wrap gateway creation and data plane binding in 3. No Rollback Mechanism for Partial Data Plane ReleasesIssue: The release() method creates ReleaseHistory and triggers releases for each data plane in a loop without transaction wrapping. If release fails for data_plane 2 of 3, the first data plane will have successfully released but there's no rollback, leaving the system in an inconsistent state. Recommendation: Implement a rollback mechanism or compensating transactions for failed releases. 4. Breaking API Change in GatewayResourceDistributorIssue: The constructor signature changed from File: Recommendation: Consider making data_plane optional with a default value for backward compatibility. 5. Missing Test Coverage for Multi-Data-Plane ScenariosIssue: The PR introduces multi-data-plane support but doesn't appear to include tests for edge cases like:
Recommendation: Add comprehensive test coverage for multi-data-plane scenarios. Moderate Issues (Confidence: 50-70)6. Audit Log Only Recorded for First Data Plane (Confidence: 50)Only the first release is audited while others are invisible to audit logs, breaking audit trail completeness. 7. Bare Exception Handling in Crypto Operations (Confidence: 50)The etcd_configs property catches all exceptions and returns an empty dict, which could silently hide critical errors. 8. Missing Error History for Validation Failures (Confidence: 70)If validation fails and there are no active data planes, no ReleaseHistory is created for error tracking. 9. Incomplete Error Logging for Pre-Release Validation (Confidence: 60)If validation fails and there are no data planes, error handling skips creating release history. 10. N+1 Query in Data Plane Binding Loop (Confidence: 50)Individual queries for each data_plane_name instead of bulk fetch. SummaryTotal Issues: 28 analyzed
Full detailed report with all 28 issues available in local file 🤖 Generated with Claude Code |
|
Important
Issues by Priority
Could this lead to silent failures where users think a release succeeded but nothing was actually published? Would it be better to fail explicitly or at least return a failure status that surfaces to the API response?
When decryption fails, an empty dict is returned. Could this cause downstream ETCD connection failures that are hard to debug? Would it be clearer to either raise an exception or return a sentinel value that explicitly indicates a decryption failure?
This issues one query per data_plane_id. Would it make sense to batch this into a single query: DataPlane.objects.filter(id__in=self._data_plane_ids)?
Could this be extracted to a helper function to reduce duplication and ensure consistent behavior?
The logic assumes data_planes is non-empty (checked earlier), but the redundant check at line 94-96 suggests uncertainty. Is this defensive coding, or does it indicate a possible race condition where data_planes could become empty between the check and the loop? |
Description
this PR is processed by cursor(write the code), claude code(review), and me(write the details, review)
the task:
Checklist