Matejrisek/actions/on failure by matejrisek · Pull Request #37945 · hashicorp/terraform

matejrisek · 2025-11-27T13:34:55Z

Introduce an on_failure attribute to the action_trigger block.
This attribute lets consumers define the behavior when an action fails.
Currently, the run fails if an action invocation returns an error.
We'll keep that as the default behavior but introduce two options:

on_failure: fail
on_failure: continue

fail is the default behavior; we're just making our choice explicit.
continue is the new behavior we're adding: it instructs Terraform to ignore errors from the action invocation, log that the error occurred, and continue executing the run.

The on_failure attribute takes inspiration from the namesake attribute in provisioners.
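To make the two keywords and the default concrete, here is a minimal Go sketch of how the value could be modeled and parsed. The names OnFailure and ParseOnFailure are hypothetical; the diff further down only shows configs.ActionTriggerOnFailureContinue. The sketch assumes the bare HCL keyword (as in on_failure = continue) has already been extracted as a string, with an empty string standing for an omitted attribute. Later commits in this PR add a taint option, which the sketch leaves out.

```go
package main

import "fmt"

// OnFailure models the on_failure keyword of an action_trigger block.
// Hypothetical name; only configs.ActionTriggerOnFailureContinue appears in the PR diff.
type OnFailure int

const (
	// OnFailureFail is the zero value, so an omitted attribute keeps the
	// existing behavior of failing the run.
	OnFailureFail OnFailure = iota
	// OnFailureContinue logs the action error and carries on with the run.
	OnFailureContinue
)

// String returns the configuration keyword for the value.
func (o OnFailure) String() string {
	switch o {
	case OnFailureFail:
		return "fail"
	case OnFailureContinue:
		return "continue"
	default:
		return fmt.Sprintf("OnFailure(%d)", int(o))
	}
}

// ParseOnFailure maps a keyword to an OnFailure value. An empty string
// stands for "attribute not set" and yields the default, fail.
func ParseOnFailure(keyword string) (OnFailure, error) {
	switch keyword {
	case "", "fail":
		return OnFailureFail, nil
	case "continue":
		return OnFailureContinue, nil
	default:
		return OnFailureFail, fmt.Errorf("invalid on_failure value %q: expected fail or continue", keyword)
	}
}

func main() {
	for _, kw := range []string{"", "fail", "continue", "retry"} {
		v, err := ParseOnFailure(kw)
		fmt.Printf("%q -> %v (error: %v)\n", kw, v, err)
	}
}
```

Making fail the zero value means any code path that never sets the field falls back to the existing behavior.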

Example configuration

resource "tfcoremock_simple_resource" "test_resource" {
  id     = "test-resource"
  string = "This is a message"

  lifecycle {
    action_trigger {
      events = [before_create]
      actions = [
        action.tfcoremock_simple_resource.example
      ]
      on_failure = continue
    }
  }
}

action "tfcoremock_simple_resource" "example" {
  config {
    string = "Hello from action"
  }
}

Testing

To test this feature manually, use a provider whose actions can be made to fail in a controlled way. For that purpose I used a modified tfcoremock [GH] provider that always fails on action invocation.

Fixes #

Target Release

1.15.x

Rollback Plan

If a change needs to be reverted, we will roll out an update to the code within 7 days.

Changes to Security Controls

Are there any changes to security controls (access controls, encryption, logging) in this pull request? If so, explain.

CHANGELOG entry

Add an on_failure attribute to the action_trigger block to allow defining different behavior on action failure. For now we support the default fail keyword as well as the new continue keyword, which instructs Terraform to ignore the action failure and continue with the run.

This change is user-facing and I added a changelog entry.
This change is not user-facing.

RFC

DanielMSchmidt

Great work! I think the basis looks really good. There are a couple of "edge" topics that are still open, but it looks like the right course to me.

DanielMSchmidt · 2025-11-28T13:40:06Z

internal/terraform/context_apply_action_test.go

 			},
 		},
+
+		"trigger on_failure set to 'fail' fails the resource": {


I think the test cases could be more extensive:

we should assert that the resource change has been made (so the right RPC call on the provider has been made)

we should check that on_failure = continue continues with the other actions in the actions list and in other action_trigger blocks.

we should make sure that the behavior with multiple action triggers using mixed on_failure settings matches expectations
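The mixed-trigger expectations in the last point could be pinned down with a small model like the one below. It is a sketch under an assumption this thread does not settle: an error under on_failure = fail stops every remaining action, including those in later action_trigger blocks, while on_failure = continue records a warning and moves on. Trigger, Result, and applyTriggers are hypothetical names, and the failing map stands in for a provider that errors on chosen actions. Recording Invoked also gives a place to assert that the provider call was actually made.

```go
package main

import (
	"errors"
	"fmt"
)

// Trigger is a hypothetical, simplified action_trigger block: an ordered
// list of action names plus its on_failure setting.
type Trigger struct {
	Actions  []string
	Continue bool // true for on_failure = continue, false for fail
}

// Result records what happened during a simulated apply.
type Result struct {
	Invoked  []string // actions whose invocation was attempted, in order
	Warnings []string // failures downgraded by on_failure = continue
	Err      error    // set when an action under on_failure = fail errored
}

// applyTriggers walks the triggers in configuration order. The failing set
// simulates a provider that returns an error for the named actions.
func applyTriggers(triggers []Trigger, failing map[string]bool) Result {
	var r Result
	for ti, t := range triggers {
		for _, a := range t.Actions {
			r.Invoked = append(r.Invoked, a)
			if !failing[a] {
				continue
			}
			msg := fmt.Sprintf("action_trigger %d: action %s failed", ti, a)
			if !t.Continue {
				// Assumed semantics: fail stops all remaining actions.
				r.Err = errors.New(msg)
				return r
			}
			r.Warnings = append(r.Warnings, msg)
		}
	}
	return r
}

func main() {
	triggers := []Trigger{
		{Actions: []string{"a1", "a2"}, Continue: true},
		{Actions: []string{"b1", "b2"}, Continue: false},
		{Actions: []string{"c1"}, Continue: true},
	}
	r := applyTriggers(triggers, map[string]bool{"a1": true, "b1": true})
	fmt.Println("invoked:", r.Invoked)
	fmt.Println("warnings:", r.Warnings)
	fmt.Println("error:", r.Err)
}
```

With a1 and b1 failing, a1 is downgraded to a warning, a2 still runs, and b1 stops the run before b2 and c1 are reached.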

Hey, thanks for these pointers.

They should be addressed in this commit: 2a46bea.

Just to clarify:

we should assert that the resource change has been made (so the right RPC call on the provider has been made)

Do you mean we should assert the action invocations, or should we verify the changes in some other way?

DanielMSchmidt · 2025-11-28T13:43:33Z

internal/terraform/node_action_trigger_instance_apply.go

+) tfdiags.Diagnostics {
+	switch aii.ActionTrigger.TriggerOnFailure() {
+	case configs.ActionTriggerOnFailureContinue:
+		if currentDiags.HasErrors() {


Instead of printing the errors we want to continue with, we probably want to wrap them in warning-level diagnostics so that consumers that work directly with the returned diagnostics (e.g. the stacks runtime) can handle them appropriately as well.

Also, we should only do this for errors coming from the provider's complete event. If a hook sends a diagnostic, it is unrelated to the on_failure behavior and we should still pass it through.
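A minimal sketch of the suggested split, using simplified, hypothetical types rather than Terraform's tfdiags package. The Source field is an assumption made for illustration: it marks whether a diagnostic came from the provider's complete event or from a hook, so that only the former is downgraded to a warning under on_failure = continue.

```go
package main

import "fmt"

type Severity int

const (
	SeverityError Severity = iota
	SeverityWarning
)

// Source records where a diagnostic came from. Hypothetical: real
// diagnostics do not necessarily carry this information.
type Source int

const (
	SourceProviderComplete Source = iota // provider's completion event for the action
	SourceHook                           // a hook, unrelated to on_failure
)

type Diagnostic struct {
	Severity Severity
	Source   Source
	Summary  string
}

// filterForContinue applies the review suggestion for on_failure = continue:
// provider completion errors become warnings (so callers such as the stacks
// runtime still receive them), while hook diagnostics pass through untouched.
// The input slice is not modified.
func filterForContinue(diags []Diagnostic) []Diagnostic {
	out := make([]Diagnostic, len(diags))
	for i, d := range diags {
		if d.Source == SourceProviderComplete && d.Severity == SeverityError {
			d.Severity = SeverityWarning
		}
		out[i] = d
	}
	return out
}

// hasErrors reports whether any diagnostic still has error severity.
func hasErrors(diags []Diagnostic) bool {
	for _, d := range diags {
		if d.Severity == SeverityError {
			return true
		}
	}
	return false
}

func main() {
	diags := []Diagnostic{
		{Severity: SeverityError, Source: SourceProviderComplete, Summary: "action failed"},
		{Severity: SeverityError, Source: SourceHook, Summary: "hook failed"},
	}
	out := filterForContinue(diags)
	fmt.Println("provider error downgraded:", out[0].Severity == SeverityWarning)
	fmt.Println("still has errors:", hasErrors(out))
}
```

Because the hook error keeps its error severity, hasErrors still reports true, which is the pass-through behavior asked for above.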

DanielMSchmidt · 2025-11-28T13:49:28Z

internal/terraform/node_action_trigger_instance_plan.go

 		ActionTriggerBlockIndex: at.actionTriggerBlockIndex,
 		ActionsListIndex:        at.actionListIndex,
 		ActionTriggerEvent:      triggeringEvent,
+		ActionTriggerOnFailure:  at.onFailure,


This is now part of plans.LifecycleActionTrigger, so it also needs to be handled in the JSON representation:

terraform/internal/command/jsonplan/action_invocations.go

Line 130 in a051ac6

case *plans.LifecycleActionTrigger:

Also I think we need to handle this in plans.ActionInvocationInstanceSrc and in the serialization to and from the planfile:

terraform/internal/plans/planfile/tfplan.go

Line 1357 in a051ac6

ret.ActionTrigger = &plans.LifecycleActionTrigger{
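One way to keep the value intact through the JSON representation is to give it a stable string form. The Go sketch below implements encoding.TextMarshaler and encoding.TextUnmarshaler so that encoding/json writes "fail" or "continue" and rejects unknown keywords when reading. The struct and its JSON field names are hypothetical and do not reflect jsonplan's actual schema; the planfile itself uses a protobuf encoding, which this sketch does not cover.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// OnFailure mirrors a hypothetical enum for the on_failure keyword.
type OnFailure int

const (
	OnFailureFail OnFailure = iota
	OnFailureContinue
)

// MarshalText makes encoding/json write the keyword instead of an integer,
// so the output stays stable even if the enum's numeric values change.
func (o OnFailure) MarshalText() ([]byte, error) {
	switch o {
	case OnFailureFail:
		return []byte("fail"), nil
	case OnFailureContinue:
		return []byte("continue"), nil
	}
	return nil, fmt.Errorf("unknown on_failure value %d", int(o))
}

// UnmarshalText parses the keyword back, rejecting unknown values.
func (o *OnFailure) UnmarshalText(b []byte) error {
	switch string(b) {
	case "fail":
		*o = OnFailureFail
	case "continue":
		*o = OnFailureContinue
	default:
		return fmt.Errorf("unknown on_failure value %q", b)
	}
	return nil
}

// lifecycleActionTrigger is a hypothetical JSON shape, not jsonplan's schema.
type lifecycleActionTrigger struct {
	ActionTriggerEvent string    `json:"action_trigger_event"`
	OnFailure          OnFailure `json:"on_failure"`
}

func encodeTrigger(t lifecycleActionTrigger) (string, error) {
	b, err := json.Marshal(t)
	return string(b), err
}

func decodeTrigger(s string) (lifecycleActionTrigger, error) {
	var t lifecycleActionTrigger
	err := json.Unmarshal([]byte(s), &t)
	return t, err
}

func main() {
	s, err := encodeTrigger(lifecycleActionTrigger{ActionTriggerEvent: "before_create", OnFailure: OnFailureContinue})
	if err != nil {
		panic(err)
	}
	fmt.Println(s)
	back, err := decodeTrigger(s)
	fmt.Println(back.OnFailure == OnFailureContinue, err)
}
```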


mildwonkey

👋🏻 hi, sorry! This isn't an actual review; I just wanted to make sure we're not actually merging any changes to actions before the PRD gets approved and RFCs are written and approved. We are ready for prototypes and RFCs, not mergeable PRs 😁

jbardin · 2025-12-01T15:18:32Z

Just throwing some context in here for thought as the RFCs are finalized. These may not end up being the final requirements, but they are important to consider when trying to move provisioners to this new model.

In comparison to provisioners:

on_failure = fail marks the resource as tainted if there's an error. The provisioner is considered part of the resource, and failure indicates that the status of the resource is unknown.
on_failure = fail implies that dependency processing will halt, and nothing that depends on the resource will be applied.

The current implementation can't fulfill either of those points, because the separate apply nodes are not in the dependency chain and are not evaluated until after the resource has already been recorded as complete.
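The provisioner semantics listed above can be illustrated with a toy dependency walk. This is a simplified model, not Terraform's graph walker: resources are assumed to arrive already sorted with dependencies first, and ActionFails simulates an action with on_failure = fail returning an error. The failing resource is tainted and everything depending on it, directly or transitively, is skipped. Getting this result requires the action to run before the resource's dependents are released, which separate apply nodes outside the dependency chain cannot guarantee.

```go
package main

import "fmt"

// Resource is a hypothetical node in a tiny dependency graph.
type Resource struct {
	Name      string
	DependsOn []string
	// ActionFails simulates an action with on_failure = fail returning an error.
	ActionFails bool
}

// Outcome summarizes the simulated apply.
type Outcome struct {
	Applied []string // resources whose change was applied (possibly tainted)
	Tainted []string // applied, but an action failed, so their status is unknown
	Skipped []string // not applied because something they depend on failed
}

// applyInOrder assumes dependencies come before dependents in the slice.
// A failed action taints its resource and blocks everything that depends
// on it, directly or transitively.
func applyInOrder(resources []Resource) Outcome {
	var out Outcome
	blocked := make(map[string]bool)
	for _, r := range resources {
		skip := false
		for _, dep := range r.DependsOn {
			if blocked[dep] {
				skip = true
				break
			}
		}
		if skip {
			blocked[r.Name] = true // propagate to transitive dependents
			out.Skipped = append(out.Skipped, r.Name)
			continue
		}
		out.Applied = append(out.Applied, r.Name)
		if r.ActionFails {
			blocked[r.Name] = true
			out.Tainted = append(out.Tainted, r.Name)
		}
	}
	return out
}

func main() {
	out := applyInOrder([]Resource{
		{Name: "db", ActionFails: true},
		{Name: "cache"},
		{Name: "app", DependsOn: []string{"db"}},
		{Name: "lb", DependsOn: []string{"app"}},
	})
	fmt.Println("applied:", out.Applied)
	fmt.Println("tainted:", out.Tainted)
	fmt.Println("skipped:", out.Skipped)
}
```

Here db is applied but tainted, app and lb are skipped, and the unrelated cache is unaffected.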

jbardin · 2026-01-08T14:40:05Z

internal/terraform/node_action_trigger_instance_apply.go

+			var wrappedErrorDiags tfdiags.Diagnostics
+			wrappedErrorDiags = wrappedErrorDiags.Append(&hcl.Diagnostic{
+				Severity: hcl.DiagWarning,
+				Summary: "Actions contained errors but we're wrapping them " +


This sounds kind of like an implementation detail to me, and not something the user needs to be concerned with. The fact that we are wrapping the error in a warning is how we are choosing to implement on_failure = continue, and doesn't impact the user. It's also dropping the real summary, and replacing it with something not relevant to the actual diagnostic.

I might also be inclined to use our tfdiags diagnostic override, since this isn't necessarily an hcl diagnostic.
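The override idea can be sketched with struct embedding: the wrapper forwards Summary and Detail from the original diagnostic and replaces only the severity, so the user still sees the provider's real message. The Diagnostic interface below is a hypothetical, trimmed-down stand-in and not the actual tfdiags API.

```go
package main

import "fmt"

type Severity int

const (
	SeverityError Severity = iota
	SeverityWarning
)

// Diagnostic is a hypothetical, trimmed-down interface in the spirit of
// tfdiags.Diagnostic; it is not the real interface.
type Diagnostic interface {
	Severity() Severity
	Summary() string
	Detail() string
}

// simpleDiag is a plain diagnostic, standing in for one returned by a provider.
type simpleDiag struct {
	sev             Severity
	summary, detail string
}

func (d simpleDiag) Severity() Severity { return d.sev }
func (d simpleDiag) Summary() string    { return d.summary }
func (d simpleDiag) Detail() string     { return d.detail }

// severityOverride embeds the original diagnostic, so Summary and Detail are
// promoted unchanged; its own Severity method shadows the embedded one.
type severityOverride struct {
	Diagnostic
	severity Severity
}

func (o severityOverride) Severity() Severity { return o.severity }

// downgrade reports d as a warning without rewriting its text.
func downgrade(d Diagnostic) Diagnostic {
	return severityOverride{Diagnostic: d, severity: SeverityWarning}
}

func main() {
	orig := simpleDiag{sev: SeverityError, summary: "Action failed", detail: "provider returned an error"}
	w := downgrade(orig)
	fmt.Println(w.Severity() == SeverityWarning, w.Summary(), "/", w.Detail())
}
```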


matejrisek added 2 commits November 26, 2025 15:57

Add on_failure attribute to the action trigger config.

3c2aaa5

Add on_failure attribute to the action trigger config.

8aa68aa

matejrisek requested a review from a team as a code owner November 27, 2025 13:34

matejrisek requested review from DanielMSchmidt and mildwonkey November 27, 2025 13:35

matejrisek added 4 commits November 28, 2025 11:21

Populate CHANGELOG.md entry

aa4edf1

Correct way to populate CHANGELOG

c3f0dae

Merge branch 'main' into matejrisek/actions/on_failure

a426a07

Add autogenerated stringer file

054daae

DanielMSchmidt reviewed Nov 28, 2025


matejrisek added 3 commits December 1, 2025 13:00

Merge branch 'main' into matejrisek/actions/on_failure

c6efbc0

Address the first set of PR comments.

2a46bea

Harden tests.

Fix test.

047af32

mildwonkey requested changes Dec 1, 2025


matejrisek added 3 commits December 2, 2025 16:20

Make 'on_fail = continue' return wrapped errors.

db41829

Merge branch 'main' into matejrisek/actions/on_failure

d04ce1b

Merge branch 'main' into matejrisek/actions/on_failure

3402c5f

jbardin reviewed Jan 8, 2026


matejrisek added 4 commits January 15, 2026 10:30

Merge remote-tracking branch 'origin/main' into matejrisek/actions/on_failure

42dee18

Add taint as an option for on_failure

7068aa3

Add tainting.

a780c9a

Merge branch 'main' into matejrisek/actions/on_failure

7fe146f

matejrisek wants to merge 16 commits into main from matejrisek/actions/on_failure


4 participants
