Conversation
DanielMSchmidt
left a comment
There was a problem hiding this comment.
Great work, I think the basis looks really good, there are a couple of "edge" topics that are still open, but it looks like the right course to me.
| }, | ||
| }, | ||
|
|
||
| "trigger on_failure set to 'fail' fails the resource": { |
There was a problem hiding this comment.
I think the test cases could be more extensive:
- we should assert that the resource change has been made (so the right RPC call on the provider has been made)
- we should check that
on_failure = continuecontinues with the other actions in theactionslist and in otheraction_triggerblocks. - we should make sure the behavior with multiple action triggers where the on_failure behavior is mixed is consistent with the expectations
There was a problem hiding this comment.
Hey thanks for this pointers.
They should have been address in this commit: 2a46bea.
Just for the clarification:
we should assert that the resource change has been made (so the right RPC call on the provider has been made)
Do you mean we should assert action invocations or we do something else to verify changes?
| ) tfdiags.Diagnostics { | ||
| switch aii.ActionTrigger.TriggerOnFailure() { | ||
| case configs.ActionTriggerOnFailureContinue: | ||
| if currentDiags.HasErrors() { |
There was a problem hiding this comment.
Instead of printing the errors we want to continue with we probably want to wrap them in warning level diagnostics so that consumers that work directly with the returned diagnostics (e.g. the stacks runtime) can appropriately handle these diagnostics as well.
Also we should only do this for errors coming from the provider complete event, if a hook sends a diagnostic it is unrelated to the on_failure behavior and we should still pass them through.
| ActionTriggerBlockIndex: at.actionTriggerBlockIndex, | ||
| ActionsListIndex: at.actionListIndex, | ||
| ActionTriggerEvent: triggeringEvent, | ||
| ActionTriggerOnFailure: at.onFailure, |
There was a problem hiding this comment.
This is now part of plans.LifecycleActionTrigger, so it needs to also be handled in the JSON representation:
Also I think we need to handle this in plans.ActionInvocationInstanceSrc and in the serialization to and from the planfile:
terraform/internal/plans/planfile/tfplan.go
Line 1357 in a051ac6
mildwonkey
left a comment
There was a problem hiding this comment.
👋🏻 hi, sorry! This isn't an actual review - I just wanted to make sure we're not actually merging any changes to actions before the PRD gets approved and RFCs are written and approved. We are ready for prototypes and RFCs, not mergable PRs 😁
|
Just throwing some context in here for thought as the RFC are finalized. These may not end up being the final requirements, but are important to consider when trying to move provisioners to this new model. In comparison to provisioners:
The current implementation can't fulfill either of those two points, because the separate apply nodes are not in the dependency chain, and are not evaluated until after the resource has been already recorded as complete. |
| var wrappedErrorDiags tfdiags.Diagnostics | ||
| wrappedErrorDiags = wrappedErrorDiags.Append(&hcl.Diagnostic{ | ||
| Severity: hcl.DiagWarning, | ||
| Summary: "Actions contained errors but we're wrapping them " + |
There was a problem hiding this comment.
This sounds kind of like an implementation detail to me, and not something the user needs to be concerned with. The fact that we are wrapping the error in a warning is how we are choosing to implement on_failure = continue, and doesn't impact the user. It's also dropping the real summary, and replacing it with something not relevant to the actual diagnostic.
I might also be inclined to use our tfdiags diagnostic override, since this isn't necessarily an hcl diagnostic.
Introduce
on_failureattribute to theaction_triggerblock.This attribute will allow consumers to define the behavior if an action fails.
As of now we default to failing run if an error has been returned as part of action invocation.
We'll keep that the default behavior but introduce two options:
failis the default behavior - we're just making our choice explicit.continueis the new behavior we're adding - it will instruct terraform to ignore errors from the action invocation, log that the error has occurred and continue with the run execution.The
on_failureattribute takes inspiration from the namesake attribute in provisioners.Example configuration
Testing
In order to manually test this feature one should use a provider that allows for controlled erroring of actions. For that purpose I've used the modified
tfcoremock[GH] provider which will always fail on the action invocation.Fixes #
Target Release
1.15.x
Rollback Plan
Changes to Security Controls
Are there any changes to security controls (access controls, encryption, logging) in this pull request? If so, explain.
CHANGELOG entry
Add
on_failureattribute to theaction_triggerblock to allow defining different behavior on action failure. For now we support the defaultfailkeyword as well as the newcontinuekeyword which instructs terraform to ignore action failure and continue with the run.RFC