Cloudflare Workflows Ships Saga Rollbacks as Step Metadata, Not a Catch Block

blog.cloudflare.com · @fierce_condor · Developer Tools

Each step.do() call can now register a rollback handler. When the workflow fails, the handlers of completed steps run in reverse start order, replacing manual try-catch unwind blocks.

cloudflare · cloudflare workflows · saga pattern · rollbacks · durable execution

You no longer have to write a sprawling try-catch block to undo half a workflow's side effects when one step fails. Cloudflare Workflows now lets each step.do() carry its own rollback handler. On failure, it undoes only the steps that completed, in the reverse of the order they started.
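
For contrast, this is roughly the hand-written unwind the feature replaces. It is plain TypeScript rather than Workflows code. reserveInventory, chargeCard, shipOrder and their compensations are hypothetical stand-ins that only append to a log. The point is how much ordering bookkeeping the catch block carries by hand.

```typescript
// Hypothetical side effects; each appends to `log` so the unwind order is visible.
const log: string[] = [];

async function reserveInventory(): Promise<string> {
  log.push("reserve");
  return "res-1";
}
async function releaseInventory(id: string): Promise<void> {
  log.push(`release ${id}`);
}
async function chargeCard(): Promise<string> {
  log.push("charge");
  return "ch-1";
}
async function refundCharge(id: string): Promise<void> {
  log.push(`refund ${id}`);
}
async function shipOrder(): Promise<void> {
  throw new Error("carrier unavailable");
}

// Manual unwind: every new step adds a variable to track and another branch
// to the catch block, which must be kept in reverse order by hand.
async function placeOrder(): Promise<void> {
  let reservation: string | undefined;
  let charge: string | undefined;
  try {
    reservation = await reserveInventory();
    charge = await chargeCard();
    await shipOrder();
  } catch (err) {
    if (charge !== undefined) await refundCharge(charge);
    if (reservation !== undefined) await releaseInventory(reservation);
    throw err;
  }
}
```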

Why rollbacks live on step.do() and not in a fluent chain

A fluent API like step.do("charge-card", chargeCard).rollback(refundCharge) reads well but clashes with Workers RPC promise pipelining. step.do() returns a promise. Under pipelining, a method called on a promise is treated as a call on the value it will resolve to, so .rollback() would look like a method on the step's output rather than a step-level option. A chained call also means the engine lacks the full step definition until .rollback() runs, which delays when the step can start.

A builder API (step.saga(...).do(() => ...).rollback(() => ...).run()) adds ceremony and a new primitive, making step.do() feel deprecated. Instead, Cloudflare chose a { rollback, rollbackConfig } options object as the third argument to step.do(). The forward action and its compensation stay in one call, the step starts immediately, and the API extends what developers already know.

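```typescript
// The compensation is declared in the same step.do() call as the forward action.
await step.do("debit-bank-a", () => bankA.debit(from, amount), {
  rollback: async ({ output }) => bankA.credit(from, amount, output.id),
});
```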
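
To make the mechanics concrete, here is a minimal in-memory sketch of a step.do() that accepts a rollback option. MockStep, runWorkflow and the balance variables are hypothetical. This is not the Workflows engine or its types. It models only the behavior described here: a step registers its compensation once it completes, and on failure the registered handlers run sequentially in reverse start order, each receiving its step's output.

```typescript
// Hypothetical mock of a step runner; not the Cloudflare Workflows engine or types.
type RollbackFn<T> = (ctx: { output: T }) => Promise<void>;

interface StepOptions<T> {
  rollback?: RollbackFn<T>;
}

interface RegisteredRollback {
  name: string;
  startIndex: number;
  run: () => Promise<void>;
}

class MockStep {
  private nextStart = 0;
  readonly registered: RegisteredRollback[] = [];

  async do<T>(name: string, fn: () => Promise<T>, opts: StepOptions<T> = {}): Promise<T> {
    const startIndex = this.nextStart++; // start order is fixed before the body runs
    const output = await fn(); // a throw here means nothing is registered
    const rollback = opts.rollback;
    if (rollback) {
      this.registered.push({ name, startIndex, run: () => rollback({ output }) });
    }
    return output;
  }

  // Run compensations one at a time, latest-started step first.
  async rollbackAll(): Promise<string[]> {
    const plan = [...this.registered].sort((a, b) => b.startIndex - a.startIndex);
    for (const r of plan) await r.run();
    return plan.map((r) => r.name);
  }
}

async function runWorkflow(
  body: (step: MockStep) => Promise<void>,
): Promise<{ status: "complete" | "rolled-back"; undone: string[] }> {
  const step = new MockStep();
  try {
    await body(step);
    return { status: "complete", undone: [] };
  } catch {
    return { status: "rolled-back", undone: await step.rollbackAll() };
  }
}

// Usage: the second step fails, so the completed debit is compensated.
let aliceBalance = 100;
const credits: string[] = [];

const demo = runWorkflow(async (step) => {
  await step.do(
    "debit-bank-a",
    async () => {
      aliceBalance -= 30;
      return { id: "tx-1" };
    },
    {
      rollback: async ({ output }) => {
        aliceBalance += 30;
        credits.push(output.id);
      },
    },
  );
  await step.do("credit-bank-b", async (): Promise<string> => {
    throw new Error("bank B unavailable");
  });
});
```

Because the failing credit-bank-b step never completed, it registers nothing. Only the debit is reversed, and the balance returns to 100.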

How rollback survives engine restart and parallel ambiguity

When a workflow fails, the engine consults its durable step history: which steps started, which finished, which registered a rollback, and the order in which they started. It then invokes each rollback stub in reverse step-start order, not completion order. That matters for parallel steps inside Promise.all(): completion order is unpredictable, but start order is a stable key.
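
The difference between start order and completion order shows up clearly in a small timer-based sketch. doStep, the history entries and the delays are assumptions made for illustration. The three steps start as a, b, c and finish as c, a, b. The rollback plan still comes out as c, b, a, because it sorts on the sequence number assigned before each step awaits anything.

```typescript
// Hypothetical history entry: start order is recorded before the step awaits,
// finish order after its body resolves.
interface HistoryEntry {
  name: string;
  startSeq: number;
  finishSeq: number;
  undo: () => void;
}

const stepHistory: HistoryEntry[] = [];
const undone: string[] = [];
let startCounter = 0;
let finishCounter = 0;

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

async function doStep(name: string, delayMs: number): Promise<void> {
  const startSeq = startCounter++; // assigned synchronously, in call order
  await sleep(delayMs); // simulated work with variable latency
  stepHistory.push({ name, startSeq, finishSeq: finishCounter++, undo: () => { undone.push(name); } });
}

async function workflow(): Promise<void> {
  // Started as a, b, c; with these delays they finish as c, a, b.
  await Promise.all([doStep("a", 20), doStep("b", 30), doStep("c", 10)]);
  throw new Error("next step failed");
}

async function runAndCompensate(): Promise<{ completion: string[]; rollback: string[] }> {
  try {
    await workflow();
  } catch {
    // Compensate in reverse start order, regardless of when each step finished.
    const plan = [...stepHistory].sort((x, y) => y.startSeq - x.startSeq);
    for (const entry of plan) entry.undo();
  }
  const completion = [...stepHistory].sort((x, y) => x.finishSeq - y.finishSeq).map((h) => h.name);
  return { completion, rollback: [...undone] };
}

const result = runAndCompensate();
```

Sorting by reverse completion order instead would have undone b first, even though c was the last step to start.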

If the engine restarts mid-rollback, the durable history persists but the in-memory rollback stubs are gone. Workflows rebuilds them through replay: it re-runs the workflow code without re-executing the forward step bodies. When replay reaches a step.do() that already completed, the engine returns the persisted output instead of running the body and re-registers the rollback stub from the options passed to that call. Forward side effects are never duplicated.
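
Replay-based re-registration can be sketched with a Map standing in for the durable history. ReplayStep and transfer are hypothetical names. The sketch simplifies the restart to happen before any rollback has run and does not track which rollbacks already finished. On the second pass the debit body is skipped, yet its rollback stub is registered again from the options passed to the replayed call.

```typescript
// A Map stands in for the engine's durable step history: step name -> persisted output.
type DurableHistory = Map<string, unknown>;
type Rollback<T> = (ctx: { output: T }) => Promise<void>;

// Hypothetical replaying step runner (a sketch, not the Workflows engine).
class ReplayStep {
  readonly stubs: Array<{ name: string; run: () => Promise<void> }> = [];
  private readonly durable: DurableHistory;

  constructor(durable: DurableHistory) {
    this.durable = durable;
  }

  async do<T>(name: string, fn: () => Promise<T>, opts: { rollback?: Rollback<T> } = {}): Promise<T> {
    const replayed = this.durable.has(name);
    // On replay the persisted output is returned and the forward body is skipped.
    const output = replayed ? (this.durable.get(name) as T) : await fn();
    if (!replayed) this.durable.set(name, output);
    // In both cases the rollback from this call's options is registered in memory.
    const rollback = opts.rollback;
    if (rollback) this.stubs.push({ name, run: () => rollback({ output }) });
    return output;
  }
}

let debitCalls = 0;
const credited: string[] = [];

// Hypothetical workflow body: the debit completes, then a later step fails.
async function transfer(step: ReplayStep): Promise<void> {
  await step.do(
    "debit-bank-a",
    async () => {
      debitCalls++;
      return { id: "tx-1" };
    },
    { rollback: async ({ output }) => { credited.push(output.id); } },
  );
  throw new Error("credit-bank-b failed");
}

async function simulateRestart(): Promise<void> {
  const durable: DurableHistory = new Map();
  // First execution fails; its in-memory stubs are lost when the engine "restarts".
  await transfer(new ReplayStep(durable)).catch(() => undefined);
  // Replay against the surviving history to rebuild the stubs, then compensate.
  const replay = new ReplayStep(durable);
  await transfer(replay).catch(() => undefined);
  // Registration order equals start order in this sequential example.
  for (const stub of [...replay.stubs].reverse()) await stub.run();
}

const done = simulateRestart();
```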

Rollback handlers get the same operational guarantees as forward steps: per-handler retries, timeouts, lifecycle events, and logs. If a handler exhausts its retries, Workflows records the failure and stops, leaving the workflow in the Errored state instead of silently swallowing the error.
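
A rough sketch of the stop-on-exhaustion behavior follows. runRollbacks, the maxAttempts parameter and the "rolled-back" label are assumptions for illustration, not the rollbackConfig API, and timeouts are not modeled. The stubs are assumed to already be in reverse start order. Once one handler runs out of attempts, the remaining handlers are skipped and the result is Errored.

```typescript
interface RollbackStub {
  name: string;
  run: () => Promise<void>;
}

// "rolled-back" is only a label for this sketch; Errored is the state described above.
type Outcome = "rolled-back" | "Errored";

// Hypothetical runner: stubs are assumed to be in reverse start order already.
async function runRollbacks(stubs: RollbackStub[], maxAttempts: number, log: string[]): Promise<Outcome> {
  for (const stub of stubs) {
    let succeeded = false;
    for (let attempt = 1; attempt <= maxAttempts && !succeeded; attempt++) {
      try {
        await stub.run();
        succeeded = true;
        log.push(`${stub.name}: ok on attempt ${attempt}`);
      } catch (err) {
        const message = err instanceof Error ? err.message : String(err);
        log.push(`${stub.name}: attempt ${attempt} failed (${message})`);
      }
    }
    if (!succeeded) {
      // Retries exhausted: record the failure and stop; later stubs never run.
      log.push(`${stub.name}: retries exhausted`);
      return "Errored";
    }
  }
  return "rolled-back";
}

// Usage: one flaky handler recovers on retry, the next never does.
let refundAttempts = 0;
const events: string[] = [];
const stubs: RollbackStub[] = [
  {
    name: "refund-charge",
    run: async () => {
      refundAttempts++;
      if (refundAttempts < 2) throw new Error("timeout");
    },
  },
  { name: "release-inventory", run: async () => { throw new Error("inventory API down"); } },
  { name: "void-invoice", run: async () => { events.push("void-invoice ran"); } },
];

const finalState = runRollbacks(stubs, 3, events);
```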

What sagas unlock next

The initial release supports sequential rollback execution, per-step retry and timeout configuration, and lifecycle events. Cloudflare plans to add parallel rollback execution, rollback for waitForEvent, and support for Python Workflows. The hard part of sagas is not designing compensation logic; it is knowing what already happened when things go wrong. Attaching that answer directly to each step.do() makes the workflow definition the single source of truth for both the forward and reverse paths.


Source: How we built saga rollbacks for Cloudflare Workflows
Domain: blog.cloudflare.com
