Feat: add traction tenant mode for secure webhook registration#926
Conversation
Signed-off-by: Yuki I <omoge.real@gmail.com>
Signed-off-by: Yuki I <omoge.real@gmail.com>
Signed-off-by: Yuki I <omoge.real@gmail.com>
Pull Request Test Coverage Report for Build 20345995063Details
💛 - Coveralls |
Signed-off-by: Yuki I <omoge.real@gmail.com>
Signed-off-by: Yuki I <omoge.real@gmail.com>
Signed-off-by: Yuki I <omoge.real@gmail.com>
Signed-off-by: Yuki I <omoge.real@gmail.com>
There was a problem hiding this comment.
Pull request overview
This PR introduces comprehensive support for integrating with Traction and other secured multi-tenant ACA-Py deployments where privileged Admin API endpoints are restricted. The implementation adds a new "traction" tenancy mode alongside the existing "single" and "multi" modes, with unified configuration variables and fallback authentication strategies.
Key changes include:
- Introduction of
AGENT_TENANT_MODE="traction"with dedicatedTractionTenantAcapyconfiguration class - Unified
ACAPY_TENANT_WALLET_*configuration variables with backward compatibility for legacyMT_ACAPY_WALLET_*variables - Refactored webhook registration supporting both Admin API and direct Tenant API modes with automatic fallback
Reviewed changes
Copilot reviewed 13 out of 13 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
| oidc-controller/api/tests/test_webhook_registration.py | Adds comprehensive test coverage for new webhook registration fallback logic, traction mode, and multi-tenant token injection |
| oidc-controller/api/main.py | Integrates traction mode logic into startup sequence with conditional token fetcher instantiation and webhook registration strategies |
| oidc-controller/api/core/webhook_utils.py | Implements fallback mechanism from Admin API to Tenant API with retry logic and comprehensive error handling |
| oidc-controller/api/core/config.py | Adds unified tenant configuration variables with automatic fallback to legacy variables and validation for multi-tenant/traction modes |
| oidc-controller/api/core/acapy/tests/test_config.py | Expands test suite to verify TractionTenantAcapy functionality, unified variable handling, and error conditions |
| oidc-controller/api/core/acapy/tests/test_client.py | Adds test coverage for traction tenancy mode client initialization |
| oidc-controller/api/core/acapy/config.py | Implements TractionTenantAcapy class with tenant-specific authentication and token fetching logic |
| oidc-controller/api/core/acapy/client.py | Extends AcapyClient initialization to support traction tenancy mode |
| docs/MigrationGuide.md | Documents configuration unification, backward compatibility, and new traction mode requirements |
| docs/ConfigurationGuide.md | Provides comprehensive documentation for traction mode setup, variable mapping, and deployment examples |
| docker/manage | Exports unified tenant configuration variables alongside legacy variables for backward compatibility |
| docker/docker-compose.yaml | Passes unified and legacy tenant configuration variables to controller service |
| docker/.env.example | Provides complete example configuration with traction mode setup and variable documentation |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
loneil
left a comment
There was a problem hiding this comment.
This is really great to get going @MonolithicMonk . I tried this PR out locally with my vcauth using the hosted Traction Sandbox and it was working for me. Great to be able to watch connection and verification go through on the Tenant UI as testing.
I just have a concern about token expiry to look at I think.
| @@ -15,12 +15,15 @@ def get_headers() -> dict[str, str]: ... | |||
|
|
|||
|
|
|||
| class MultiTenantAcapy: | |||
There was a problem hiding this comment.
It occurs to me that, while not an issue with Traction, could this multi tenant token fetch be a problem on standard ACA-Py if a VCAuth instance is running with multiple processes (VCAuth can start up locally like that to try it out). A reason we use the multitenant provider plugin is that we weren't able to get more than 1 token at once. So if I start up 3 pods in a scalable deployment, would 2 of the pods on a regular ACA-Py possibly have trouble as the issuance of the other tokens invalidate the existing ones?
There was a problem hiding this comment.
ACA-Py wallet tokens are typically stateless JWTs, so issuing a new one in one pod shouldn't invalidate valid tokens held by other pods. However, to be safe and efficient, the new caching logic I implemented (1-hour TTL) minimizes the frequency of these fetch calls across all instances and significantly reduces load on the Agent.
There was a problem hiding this comment.
It occurs to me that, while not an issue with Traction, could this multi tenant token fetch be a problem on standard ACA-Py if a VCAuth instance is running with multiple processes (VCAuth can start up locally like that to try it out). A reason we use the multitenant provider plugin is that we weren't able to get more than 1 token at once. So if I start up 3 pods in a scalable deployment, would 2 of the pods on a regular ACA-Py possibly have trouble as the issuance of the other tokens invalidate the existing ones?
Explain this more. Very interesting...
(p/s, ignore my prev reply for now)
There was a problem hiding this comment.
Are you asking if the issuance of a multitenant token invalidates all preceding token issued to the tenant in a non traction multitenant deployment of aca-py?
There was a problem hiding this comment.
Yeah exactly. That's my understanding for ACA-Py Multitenancy and one of the the reasons we made Traction in the first place (and extracted the solution to the Multitenant Provider plugin).
It still states that in the docs here:
https://github.com/openwallet-foundation/acapy/blob/main/docs/features/Multitenancy.md#method-2-get-tenant-token
but it's been a while since I've used that feature other than with Traction
There was a problem hiding this comment.
For the caching solution, if it's in-memory rather than a persistence layer (the mongo Database the controller uses, or some other shared) then multiple pods or serverless processes or would still be fetching individual tokens and invalidating each other. 3 pods all starting up at the same time, or auto-scaling, etc would have 3 different tokens in this model, and 2 of those pods would have tokens that are invalidated.
Not too much a concern for me as we use a single tenant aca-py (or would use Traction), but something that unless I'm mistaken would be a blocker for using VCAuth in a MT ACA-Py deployment that isn't a single process. Though of course using the Multitenancy Provider plugin could alleviate that.
I've posted for clarification just to make sure this does still apply: https://discord.com/channels/1022962884864643214/1286299858994462842/1448556475692748801
|
Thanks you two for your feedbacks. I am a little scatter brained this past day so I haven't fully digested the feedback properly. I goal is to have resolved end of day and provide any additional feedback |
Signed-off-by: Yuki I <omoge.real@gmail.com>
Signed-off-by: Yuki I <omoge.real@gmail.com>
Signed-off-by: Yuki I <omoge.real@gmail.com>
Signed-off-by: Yuki I <omoge.real@gmail.com>
|
I'm not too sure about the class level ttl cache. Please provide feedback if needed. |
loneil
left a comment
There was a problem hiding this comment.
I think keeping the token in-memory and refreshing the way it does can be a way to go here.
On a different project I work on I'm a consumer of Traction issuing credentials to businesses and it's using GCP Cloud Run processes, so a similar need. We get a token when needed on cold start, store in the processes memory like here, and each process has a token.
I think as long as there's no odd edge cases with the time this can work as it is.
Could be a more scalable solution to persist the token in the DB, and then still use this caching to avoid churn of the token, but that could be a future todo if considerations for very high scaling are needed I think.
The notes about non-traction multitenant mode with the invalidating tokens are still something to consider though, but keeping in mind this PR is really for the Traction addition, so the MT-mode problem is already existing in there (unless I've got that very wrong...). It might just have to be a consideration that for now that mode can't parallelize? Could add some documentation at least if more investigation is needed.
Will see if @esune thinks we should proceed with this
| # Class-level cache | ||
| _token: str | None = None | ||
| _token_expiry: float = 0.0 | ||
| TOKEN_TTL: int = 3600 |
There was a problem hiding this comment.
While it defaults to 1 day, I could see Traction setups using tokens that live for less than 1 hour for security. Might be good to at least add a TODO to make this configurable in vcauth (or implement if simple)
There was a problem hiding this comment.
+1 to this, better to keep it conigurable
I would leave it as-is (in-memory) for now and revisit on the future as we are introducing quite a bit of extra complexity between this and the redis state sharing changes. |
esune
left a comment
There was a problem hiding this comment.
Approving change, if @MonolithicMonk can do the quick update for the configurable TTL in the next day or so I can re-approve and merge then.
|
Hey all, I've been swamped with family members for the past week and haven't had time to respond. I will respond by today. Thanks for feedback and review. |
Signed-off-by: Yuki I <omoge.real@gmail.com>
Signed-off-by: Yuki I <omoge.real@gmail.com>
Reference: Closes #923
Summary:
This PR introduces full support for integrating with Traction (or other secured multi-tenant ACA-Py deployments) where the privileged Admin API endpoints are restricted.
AGENT_TENANT_MODE="traction"(mapped toACAPY_TENANCY).TractionTenantAcapyconfiguration class to handle authentication viaTRACTION_TENANT_IDandTRACTION_TENANT_API_KEY(preferred), with a fallback to Wallet ID/Key authentication.register_tenant_webhookto support a "direct" mode. When in Traction mode, the controller bypasses the restricted Admin API (PUT /multitenancy/wallet/{id}) and uses the authenticated Tenant API (PUT /tenant/wallet) directly.docker-composeandmanagescripts to pass throughTRACTION_variables.ConfigurationGuide.mdandMigrationGuide.mdwith new variables and startup requirements.Impact:
Enables VC-AuthN to be deployed in hardened environments where the ACA-Py Admin API is blocked for security, preventing the "webhook registration hanging" issue described in #923.