Webhook Replay: What It Really Means and Why You Need It

By Aylon · 6 min read

Replay is the feature your customers want when their endpoint was broken and now isn't. Here's what it actually requires.

Replay is what you reach for when retries weren't enough. The receiver was down for 48 hours, a weekend's worth of events is stuck in a dead-letter queue (DLQ), and the customer wants it all delivered now that their endpoint is back. Replay takes more than "send the events again", and that difference matters whether you're building it or buying it.

What replay is

Replay is the ability to re-deliver events that were previously emitted, against the same destination configuration, with the same payload the receiver would have seen the first time. It exists because real-world receivers go down, credentials expire, and bugs ship. Without replay, the customer is on their own: they have to rebuild whatever state your events would have set up.

The three replay scopes you typically need:

  • Single delivery: re-run one failed attempt. Useful when a transient bug hit one specific event.
  • Time window: re-run every delivery to a destination between time A and time B. Useful after an outage.
  • Whole DLQ: re-run everything currently in the destination's dead-letter queue. Useful for recovery after a long-running misconfiguration.

What makes replay hard

The naive version is "look up the old payload, POST it again." That works for read-only events that just notify. It breaks for anything else, in four ways.

Idempotency. Most webhook receivers aren't idempotent by default. Delivering an "order placed" event a second time can charge the customer twice, send two confirmation emails, or insert a duplicate row into their database. Replay needs to carry the original idempotencyKey (or an equivalent dedup token) so the receiver can recognize the replay and skip it if it has already processed that event.
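On the receiving side, honoring that key can be as small as a "seen keys" check before any side effect runs. The sketch below is a minimal illustration, not a production design: it assumes the key arrives in a hypothetical `idempotency_key` field and keeps seen keys in memory, where a real receiver would use a durable store (for example, a table with a unique constraint on the key).

```python
from dataclasses import dataclass, field


@dataclass
class IdempotentReceiver:
    # Hypothetical in-memory store; production code would persist keys durably.
    seen_keys: set = field(default_factory=set)
    orders_created: list = field(default_factory=list)

    def handle(self, event: dict) -> str:
        key = event["idempotency_key"]  # assumed field name
        if key in self.seen_keys:
            # Already applied once: acknowledge (2xx) but skip side effects.
            return "duplicate"
        # The side effect runs only the first time this key is seen.
        self.orders_created.append(event["order_id"])
        # Record the key only after the side effect succeeded, so a crash
        # mid-processing lets a later retry or replay try again.
        self.seen_keys.add(key)
        return "processed"


receiver = IdempotentReceiver()
original = {"idempotency_key": "evt_123", "order_id": "ord_9"}
print(receiver.handle(original))        # processed
print(receiver.handle(dict(original)))  # duplicate: the replay carries the same key
print(receiver.orders_created)          # ['ord_9']
```

The replay is acknowledged rather than rejected, so the sender marks it delivered instead of retrying it forever.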

Ordering. Some downstream systems care about order. If a subscription was cancelled and then renewed, replaying those two events in the wrong order leaves it marked cancelled when it should be active. Replay needs to either preserve ordering (per partition key, per customer, or per resource) or surface enough information, such as the original sequence number or timestamp, for the receiver to sort it out.
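One way to preserve per-partition ordering is to group the replay set by partition key, sort each group by its original sequence, and deliver each group serially. This is a sketch under assumptions: the `partition_key` and `sequence` fields are hypothetical, and `sequence` is assumed to be a per-partition counter assigned at ingest.

```python
from collections import defaultdict
from typing import Callable, Dict, List


def plan_ordered_replay(events: List[dict]) -> Dict[str, List[dict]]:
    """Group events by partition key, then sort each group by original sequence."""
    partitions: Dict[str, List[dict]] = defaultdict(list)
    for ev in events:
        partitions[ev["partition_key"]].append(ev)  # assumed field name
    for evs in partitions.values():
        evs.sort(key=lambda ev: ev["sequence"])  # assumed per-partition counter
    return dict(partitions)


def replay_partition(events: List[dict], deliver: Callable[[dict], bool]) -> List[int]:
    """Deliver one partition strictly in order and stop at the first failure,
    so a later event never lands ahead of an earlier one that failed."""
    delivered = []
    for ev in events:
        if not deliver(ev):
            break
        delivered.append(ev["sequence"])
    return delivered


events = [
    {"partition_key": "sub_1", "sequence": 2, "type": "subscription.renewed"},
    {"partition_key": "sub_2", "sequence": 1, "type": "subscription.created"},
    {"partition_key": "sub_1", "sequence": 1, "type": "subscription.cancelled"},
]
plan = plan_ordered_replay(events)
print([ev["type"] for ev in plan["sub_1"]])
# ['subscription.cancelled', 'subscription.renewed']
```

Different partitions are independent, so they can be replayed in parallel while each one stays strictly ordered.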

Transforms. If you apply any payload transformation between ingest and delivery (field mapping, filtering, format conversion), the payload sent on replay needs to match the one sent on the original delivery. If you've upgraded your transform pipeline since then, your customer may receive a different shape than they got the first time. Pin the transform version per delivery attempt.
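Pinning works only if old transform versions stay available. The sketch below assumes a hypothetical registry keyed by version string, with two made-up versions (`v1` and `v2`) that differ in how they shape an amount field; the version recorded on the original attempt selects the transform unless the customer opts into the current one.

```python
from typing import Callable, Dict

# Hypothetical transform registry. Old versions are never deleted, so any
# historical delivery can be reproduced exactly.
TRANSFORMS: Dict[str, Callable[[dict], dict]] = {
    "v1": lambda e: {"id": e["id"], "amount": e["amount_cents"]},
    "v2": lambda e: {"id": e["id"],
                     "amount": {"value": e["amount_cents"], "currency": e["currency"]}},
}
CURRENT_TRANSFORM = "v2"


def replay_payload(raw_event: dict, pinned_version: str, use_current: bool = False) -> dict:
    """Rebuild the payload with the version recorded on the original attempt,
    unless the customer explicitly opts into the current transform."""
    version = CURRENT_TRANSFORM if use_current else pinned_version
    return TRANSFORMS[version](raw_event)


raw = {"id": "evt_1", "amount_cents": 4200, "currency": "usd"}
print(replay_payload(raw, pinned_version="v1"))
# {'id': 'evt_1', 'amount': 4200}
print(replay_payload(raw, pinned_version="v1", use_current=True))
# {'id': 'evt_1', 'amount': {'value': 4200, 'currency': 'usd'}}
```

The alternative is to store the post-transform payload itself; that avoids keeping old transform code around, at the cost of more storage per delivery.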

Signing. Webhook signatures are typically computed over the payload and a timestamp, and receivers typically reject any timestamp outside a short tolerance window. On replay, you need a fresh signature over a current timestamp; otherwise the receiver's verification rejects the delivery as a possible replay attack. A signature stored from the original delivery is useless.
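To see why a stored signature fails, here is a minimal sketch of one common scheme: HMAC-SHA256 over `"<timestamp>.<body>"`, verified with a timestamp tolerance. The exact message format, the 300-second tolerance, and the secret are assumptions for illustration; real providers differ in header names and details.

```python
import hashlib
import hmac
import time
from typing import Optional

# Assumed receiver tolerance; five minutes is a common choice, not a standard.
TOLERANCE_SECONDS = 300


def sign(secret: bytes, body: bytes, timestamp: int) -> str:
    """HMAC-SHA256 over "<timestamp>.<body>" (assumed signing scheme)."""
    message = str(timestamp).encode() + b"." + body
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def verify(secret: bytes, body: bytes, timestamp: int, signature: str,
           now: Optional[int] = None) -> bool:
    """Receiver-side check: reject stale timestamps, then compare signatures."""
    now = int(time.time()) if now is None else now
    if abs(now - timestamp) > TOLERANCE_SECONDS:
        return False  # too old or too far ahead: treated as a replay attack
    return hmac.compare_digest(sign(secret, body, timestamp), signature)


secret = b"whsec_example"             # hypothetical shared secret
body = b'{"id":"evt_1"}'
original_ts = 1_700_000_000
stored_sig = sign(secret, body, original_ts)
replay_ts = original_ts + 48 * 3600   # delivered again two days later

print(verify(secret, body, original_ts, stored_sig, now=replay_ts))  # False
print(verify(secret, body, replay_ts, sign(secret, body, replay_ts), now=replay_ts))  # True
```

The body is byte-for-byte identical in both calls; only the timestamp and the signature computed over it change.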

What good replay looks like

The customer should be able to:

  1. Pick a scope: single delivery, time window, or DLQ.
  2. Optionally filter by event type, status, or error class.
  3. Preview what will be replayed: count of events, time range, sample payloads.
  4. Trigger the replay and track its progress.
  5. See the results: how many succeeded, how many failed, and links to each new attempt.
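Steps 1 to 3 map naturally onto a request object and a dry-run selection. In the sketch below, the `Delivery` and `ReplayRequest` shapes, the field names, and the status and error-class values are all hypothetical; sample delivery IDs stand in for sample payloads, and the time window is treated as half-open (start inclusive, end exclusive) as one possible design choice.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass
class Delivery:
    id: str
    event_type: str
    status: str                  # e.g. "failed" or "succeeded" (assumed values)
    error_class: Optional[str]   # e.g. "timeout", "http_500" (assumed values)
    created_at: datetime
    in_dlq: bool


@dataclass
class ReplayRequest:
    scope: str                          # "single", "window", or "dlq"
    delivery_id: Optional[str] = None   # used by "single"
    start: Optional[datetime] = None    # used by "window" (inclusive)
    end: Optional[datetime] = None      # used by "window" (exclusive)
    event_types: Optional[List[str]] = None
    statuses: Optional[List[str]] = None
    error_classes: Optional[List[str]] = None


def select(deliveries: List[Delivery], req: ReplayRequest) -> List[Delivery]:
    # Step 1: scope.
    if req.scope == "single":
        picked = [d for d in deliveries if d.id == req.delivery_id]
    elif req.scope == "window":
        picked = [d for d in deliveries if req.start <= d.created_at < req.end]
    elif req.scope == "dlq":
        picked = [d for d in deliveries if d.in_dlq]
    else:
        raise ValueError(f"unknown scope: {req.scope}")
    # Step 2: optional filters; None means "don't filter on this field".
    if req.event_types is not None:
        picked = [d for d in picked if d.event_type in req.event_types]
    if req.statuses is not None:
        picked = [d for d in picked if d.status in req.statuses]
    if req.error_classes is not None:
        picked = [d for d in picked if d.error_class in req.error_classes]
    return picked


def preview(deliveries: List[Delivery], req: ReplayRequest, sample_size: int = 3) -> dict:
    # Step 3: show what would be replayed before anything is sent.
    picked = select(deliveries, req)
    times = [d.created_at for d in picked]
    return {
        "count": len(picked),
        "time_range": (min(times), max(times)) if times else None,
        "sample_ids": [d.id for d in picked[:sample_size]],
    }


deliveries = [
    Delivery("d1", "order.placed", "failed", "timeout", datetime(2024, 6, 1, 10), True),
    Delivery("d2", "order.placed", "succeeded", None, datetime(2024, 6, 1, 11), False),
    Delivery("d3", "refund.created", "failed", "http_500", datetime(2024, 6, 2, 9), True),
]
window = ReplayRequest(scope="window", start=datetime(2024, 6, 1),
                       end=datetime(2024, 6, 3), statuses=["failed"])
print(preview(deliveries, window)["sample_ids"])  # ['d1', 'd3']
```

Because the preview and the real replay share the same `select` function, what the customer previews is exactly what gets sent.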

And the platform should:

  1. Re-deliver with the original customerExternalId, idempotencyKey, and routing decisions.
  2. Apply the same transform version that was applied to the original delivery (or the current one, if the customer explicitly opts to use the new transform).
  3. Generate fresh signatures and timestamps so receiver-side verification still passes.
  4. Log replay deliveries as a distinct event class so they don't get confused with original deliveries in the audit trail.
  5. Apply the same retry policy to replay deliveries as to originals.
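Most of these platform requirements come down to how a replay attempt is constructed from the original. The sketch below uses a hypothetical `Attempt` record: it copies identity, routing, and retry policy from the original, marks the new attempt as a distinct `replay` kind linked back to what it re-delivers, and keeps the pinned transform version unless the customer opts into the current one (the `CURRENT_TRANSFORM` value is an assumption).

```python
import uuid
from dataclasses import dataclass, replace
from typing import Optional

CURRENT_TRANSFORM = "v2"  # hypothetical current transform version


@dataclass(frozen=True)
class Attempt:
    attempt_id: str
    kind: str                   # "original" or "replay", kept distinct in the audit log
    replay_of: Optional[str]    # attempt being re-delivered, if this is a replay
    customer_external_id: str
    idempotency_key: str
    destination_id: str         # routing decision recorded on the original
    transform_version: str
    retry_policy: str           # same policy as the original delivery


def make_replay_attempt(original: Attempt, use_current_transform: bool = False) -> Attempt:
    """Copy identity and routing from the original; change only what must change.
    Signature and timestamp are deliberately not stored here: they are
    generated fresh at send time."""
    return replace(
        original,
        attempt_id=f"att_{uuid.uuid4().hex[:12]}",
        kind="replay",
        replay_of=original.attempt_id,
        transform_version=(CURRENT_TRANSFORM if use_current_transform
                           else original.transform_version),
    )


orig = Attempt("att_1", "original", None, "cust_42", "evt_123", "dest_7", "v1", "default")
replay_attempt = make_replay_attempt(orig)
print(replay_attempt.kind, replay_attempt.replay_of, replay_attempt.idempotency_key)
# replay att_1 evt_123
```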

What replay isn't

Replay isn't a substitute for retries. Retries cover transient failures inside a normal delivery window. Replay covers manual recovery after retries are exhausted, or after configuration changes. The two work together: retries are automatic and cover a short window; replay is operator-triggered and can cover any window.
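A quick way to see where retries end and replay begins is to add up a retry schedule. The sketch assumes a hypothetical exponential-backoff policy (30-second base, doubling, eight retries, no jitter) and simplifies by assuming the outage starts at the first failed attempt.

```python
from datetime import timedelta
from typing import List


def retry_schedule(base_seconds: int = 30, factor: int = 2, max_retries: int = 8) -> List[timedelta]:
    """Delays between automatic retries: exponential backoff, no jitter."""
    return [timedelta(seconds=base_seconds * factor ** i) for i in range(max_retries)]


def retry_window(schedule: List[timedelta]) -> timedelta:
    """Total time automatic retries keep trying after the first attempt fails."""
    return sum(schedule, timedelta())


def needs_replay(outage: timedelta, schedule: List[timedelta]) -> bool:
    """If the outage outlasts the retry window, every retry fails, the event
    is dead-lettered, and only an operator-triggered replay will deliver it."""
    return outage > retry_window(schedule)


schedule = retry_schedule()
print(retry_window(schedule))                          # 2:07:30
print(needs_replay(timedelta(hours=48), schedule))     # True
print(needs_replay(timedelta(minutes=10), schedule))   # False
```

Under this policy, the 48-hour outage from the opening example outlasts the roughly two-hour retry window many times over; no reasonable retry policy stretches that far.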

Replay also isn't a substitute for idempotency on the receiver side. Even with perfect replay tooling, a receiver that processes every event eagerly without dedup will double-process replays. Replay can only do as much as the receiver's idempotency model allows.

When you need replay

You need replay if any of the following will happen to your customers more than once:

  • Their endpoint goes down for longer than your retry window.
  • Their credentials expire and need rotation.
  • They deploy a bug that 500s on a class of events.
  • They onboard mid-month and want backfill of events from before they signed up.
  • They reconnect a previously deleted destination and want history.

For B2B SaaS, all five happen. Replay is non-optional for production-grade outbound delivery.

What it costs to build

Replay is the feature most teams underestimate. The core delivery engine is the easy part. Replay requires storing the original payload after transformation, tracking the transform version per delivery attempt, building a UI for customers to select a scope and trigger a replay, recording replay deliveries separately in the audit log, and handling partial failures gracefully. Plan on three to four engineering weeks for a usable v1, then expect a few more weeks of polish based on what customers actually do with it.

Or use a platform that ships it as part of the event delivery layer. Pushrail's replay works across all 18 destination types, with idempotency, signing, and transform-version handling built in.

Next up in this series: dead-letter queues: what they should hold, how to inspect them, and what happens when they fill up.
