How to A/B Test Your Dunning Sequence

Most SaaS companies set up their dunning sequence once and never touch it again. That's a problem, because your first version is almost certainly not your best version. Small changes to subject lines, timing, and tone can move recovery rates by 5-15 percentage points. On a base of $10,000/month in failed payments, that's $500-$1,500/month in additional recovered revenue, compounding every month.

This guide covers how to A/B test your dunning sequence effectively, even with small volumes.

Why dunning is different from marketing A/B testing

If you've run A/B tests on landing pages or marketing emails, some of the same principles apply. But dunning has unique characteristics that change the approach.

The audience is different. These aren't prospects you're trying to convert. They're existing customers who already chose your product. They want to keep using it. The barrier is a failed payment, not a lack of interest.

The stakes are higher per test. Each failed payment represents real recurring revenue. A bad test variant doesn't just reduce clicks on a landing page. It loses a customer. You need to be more careful with how you split traffic and how quickly you detect losers.

Volume is lower. Unless you're processing thousands of subscriptions, you might only see 50-200 failed payments per month. That limits how many tests you can run simultaneously and how quickly you reach statistical significance.

The funnel is compressed. In marketing, the path from impression to purchase has many steps. In dunning, the path from email open to payment recovery has two or three. That means individual touchpoints have outsized impact.

What to test (in priority order)

Not all test variables are equal. Here's what to test first, ranked by typical impact on recovery rate.

1. Subject lines

Subject lines determine whether your email gets opened. And an unopened email recovers exactly zero payments.

Test variations like:

Specific vs vague: "Your Visa ending in 4242 was declined" vs "Problem with your payment"
Name personalization vs no name: "[First name], your payment failed" vs "Your payment failed"
Urgency framing vs neutral: "Action needed: your subscription is at risk" vs "Your recent payment didn't go through"
Question vs statement: "Did you mean to cancel?" vs "Your payment was declined"

Typical lift from subject line optimization: 5-15% improvement in open rates, which translates to 2-5% improvement in recovery rates.

2. Timing of the first email

How soon after the failure you send the first email has a surprisingly large effect. Test:

Immediate (within 5 minutes) vs 1 hour vs 4 hours
Morning send time vs afternoon vs "whenever it fails"

The data across the industry consistently shows that faster is better. Emails sent within the first hour recover more than emails sent the next morning. But "within 5 minutes" vs "within 1 hour" varies by audience. B2C customers respond better to very fast notifications. B2B customers (who may need to route a card update through their finance team) show less sensitivity to the first hour.

3. Tone and copy

The overall tone of your dunning emails affects both open rates and action rates. Test:

Casual and friendly vs professional and neutral
Empathetic ("it happens to everyone") vs matter-of-fact ("your card was declined")
Help-oriented ("we're here to help") vs action-oriented ("update your card now")

One consistent finding: emails that feel like they're from a person (signed by a name, conversational tone) outperform emails that feel corporate (signed by "The Billing Team," formal language). The typical recovery lift is 3-8%.

4. Number of emails in the sequence

More isn't always better. Test:

3-email sequence vs 5-email sequence
With a final "suspension" email vs without

The sweet spot for most SaaS companies is 4-5 emails over a 10-14 day period. Fewer than 3 and you miss customers who simply didn't see the first one. More than 6 and opt-out rates climb, which hurts recovery on future failures.

5. SMS inclusion

If you have phone numbers, test:

Email-only sequence vs email + SMS
SMS on Day 3 vs SMS on Day 5
One SMS vs two SMS messages

Adding SMS typically lifts recovery by 15-25% over email alone. But the timing and frequency matter. One well-timed SMS (around Day 3-5) often performs as well as two, without the additional compliance burden.

6. CTA copy and design

The call-to-action button in your email is the final conversion point. Test:

"Update my card" vs "Fix my payment" vs "Keep my subscription"
Button color (contrast matters more than specific color)
Button placement (above the fold vs after explanation)

CTA tests typically produce smaller lifts (1-3%) but they're easy to run and the gains compound.

How to split test with small volumes

If you have 50-100 failed payments per month, you can't run 5 simultaneous tests. You need a disciplined approach.

Run one test at a time

Pick the highest-impact variable (start with subject lines on your first email). Run a single A/B test. Reach significance. Lock in the winner. Move to the next variable.

Use a simple 50/50 split

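Assign each failed payment to variant A or variant B. The simplest method: use the last digit of the customer ID. Even digits get variant A, odd digits get variant B. As long as IDs are assigned sequentially or randomly, this gives a consistent, effectively random split without needing a testing framework. The catch is that the same customers land in variant A on every test, so change the rule (for example, use the second-to-last digit) when you start a new test.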
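
If your customer IDs aren't numeric, or you want each test to reshuffle customers, a hash-based version of the same idea is one option. This is a minimal sketch, not a prescribed method: the function name, the test name, and the "cus_" style IDs are made up for illustration.

```python
import hashlib


def assign_variant(customer_id: str, test_name: str) -> str:
    """Return "A" or "B" for a customer, stable across calls.

    Hashing the test name together with the customer ID keeps the split
    consistent within one test while reshuffling customers between tests.
    """
    digest = hashlib.sha256(f"{test_name}:{customer_id}".encode("utf-8")).digest()
    # Use the parity of the first byte of the digest as the coin flip.
    return "A" if digest[0] % 2 == 0 else "B"


if __name__ == "__main__":
    # Hypothetical customer IDs, for illustration only.
    for cid in ["cus_1001", "cus_1002", "cus_1003"]:
        print(cid, assign_variant(cid, "email1_subject_line"))
```

Because the assignment depends only on the ID and the test name, you can recompute it at analysis time instead of storing it.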

Set a minimum sample size

For dunning, where your baseline recovery rate might be 40-60%, the sample size you need depends heavily on the lift you want to detect. At 95% confidence and 80% power, the standard two-proportion calculation gives roughly 400 failed payments per variant to detect a 10 percentage point lift, and roughly 1,500 per variant to detect a 5 point lift.

If you process 100 failed payments per month, that's 8 months for a single test at a 50/50 split, even in the 10-point case. That's too slow for most teams.

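The practical approach: relax your requirements. Accept 80-90% confidence instead of 95%. Look for larger effects (a 7-10 percentage point lift). At the lenient end of those ranges, this brings your minimum down to roughly 100-150 per variant, or 2-3 months per test. The trade-off is a higher chance of shipping a false winner, and smaller real effects will go undetected.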
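
Sample size figures are very sensitive to the baseline, the lift, and the confidence level, so it's worth running the calculation with your own inputs. The sketch below uses the common normal-approximation formula for comparing two proportions. The function names and the example inputs are illustrative assumptions, and other formulas will give somewhat different numbers.

```python
import math
from statistics import NormalDist


def sample_size_per_variant(baseline, lift, confidence=0.95, power=0.80, two_sided=True):
    """Approximate failed payments needed per variant (two-proportion z-test).

    baseline: control recovery rate, e.g. 0.50
    lift: absolute lift to detect, e.g. 0.05 for 5 percentage points
    """
    alpha = 1.0 - confidence
    tail = alpha / 2 if two_sided else alpha
    z_alpha = NormalDist().inv_cdf(1.0 - tail)
    z_beta = NormalDist().inv_cdf(power)
    p1, p2 = baseline, baseline + lift
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / lift ** 2)


def months_to_finish(n_per_variant, failures_per_month):
    """Months needed to fill both variants at a 50/50 split."""
    return 2 * n_per_variant / failures_per_month


if __name__ == "__main__":
    # Example inputs only; substitute your own baseline and volume.
    for lift in (0.05, 0.07, 0.10):
        n = sample_size_per_variant(baseline=0.50, lift=lift)
        print(f"lift={lift:.2f}  n per variant={n}  months={months_to_finish(n, 100):.1f}")
```

Halving the lift you want to detect roughly quadruples the sample you need, which is why low-volume teams have to aim for large effects.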

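Alternatively, use a sequential testing approach. Decide in advance when you'll check results (say, every 50 failed payments) and stop early only if one variant is clearly winning or losing at a stricter threshold than you'd use for a single final check. Checking repeatedly without that adjustment inflates your false-positive rate, which is the same mistake as ending a test too early. Done this way, sequential testing gives up some statistical power but is more practical for low-volume scenarios.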
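
One conservative way to make early looks safer is to fix the number of looks up front and split your error budget evenly across them (a Bonferroni correction). The sketch below pairs that with a pooled two-proportion z-test. The function names, the 10% overall alpha, and the counts are assumptions for illustration. Dedicated group-sequential methods are less conservative.

```python
import math
from statistics import NormalDist


def z_test_p_value(recovered_a, total_a, recovered_b, total_b):
    """Two-sided p-value for a difference in recovery rate (pooled z-test)."""
    pooled = (recovered_a + recovered_b) / (total_a + total_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / total_a + 1 / total_b))
    if se == 0:
        return 1.0  # no variation at all, so no evidence of a difference
    z = (recovered_a / total_a - recovered_b / total_b) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


def should_stop(recovered_a, total_a, recovered_b, total_b, planned_looks, alpha=0.10):
    """Stop early only if the p-value clears the stricter per-look threshold."""
    per_look_alpha = alpha / planned_looks  # Bonferroni split across looks
    return z_test_p_value(recovered_a, total_a, recovered_b, total_b) < per_look_alpha


if __name__ == "__main__":
    # Hypothetical interim counts after 50 failed payments per variant.
    print(should_stop(30, 50, 21, 50, planned_looks=4))
```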

Don't test during anomalies

If you have a surge of failures due to a Stripe outage or a card network issue, pause your test or exclude those events. Anomalous data will skew your results.
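
Excluding those events is easiest if you record incident windows and filter on the failure timestamp before analysis. A minimal sketch, assuming each event is a dict with a "failed_at" datetime. The field name, the function names, and the incident window are hypothetical.

```python
from datetime import datetime

# Hypothetical incident windows (start, end), e.g. a processor outage.
INCIDENT_WINDOWS = [
    (datetime(2024, 3, 12, 14, 0), datetime(2024, 3, 12, 18, 30)),
]


def in_incident(failed_at, windows=INCIDENT_WINDOWS):
    """True if the failure timestamp falls inside any incident window."""
    return any(start <= failed_at <= end for start, end in windows)


def clean_events(events, windows=INCIDENT_WINDOWS):
    """Drop failed-payment events that occurred during a known incident."""
    return [e for e in events if not in_incident(e["failed_at"], windows)]


if __name__ == "__main__":
    events = [
        {"customer_id": "cus_1", "failed_at": datetime(2024, 3, 12, 15, 0)},
        {"customer_id": "cus_2", "failed_at": datetime(2024, 3, 13, 9, 0)},
    ]
    print([e["customer_id"] for e in clean_events(events)])
```

Apply the same filter to both variants so the exclusion doesn't bias the comparison.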

Metrics that matter

When evaluating test results, focus on the right metrics.

Recovery rate (primary)

The percentage of failed payments in each variant that were eventually recovered. This is the only metric that directly translates to revenue. A subject line with a higher open rate but a lower recovery rate is a loser.
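
When comparing recovery rates, it helps to look at the range of plausible lifts rather than the point estimate alone. The sketch below computes the difference in recovery rate with a simple Wald confidence interval. The function name and the counts are illustrative, and the interval is only an approximation at small sample sizes.

```python
import math
from statistics import NormalDist


def recovery_lift(recovered_a, total_a, recovered_b, total_b, confidence=0.95):
    """Return (lift, low, high): B's recovery rate minus A's, with a Wald interval."""
    rate_a = recovered_a / total_a
    rate_b = recovered_b / total_b
    lift = rate_b - rate_a
    se = math.sqrt(rate_a * (1 - rate_a) / total_a + rate_b * (1 - rate_b) / total_b)
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return lift, lift - z * se, lift + z * se


if __name__ == "__main__":
    # Hypothetical counts: 150 failed payments per variant.
    lift, low, high = recovery_lift(75, 150, 87, 150)
    print(f"lift={lift:+.3f}  interval=({low:+.3f}, {high:+.3f})")
```

If the interval includes zero, the test hasn't yet separated the variants at that confidence level.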

Revenue recovered per failure (secondary)

If your customer base has a wide range of plan prices, recovery rate alone can be misleading. Recovering 10 customers on a $29 plan ($290) is less valuable than recovering 5 customers on a $99 plan ($495). Weight by revenue when the distribution is skewed.
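
Both metrics can be computed from the same records. A minimal sketch, assuming each failed payment is a dict with an "amount" and a "recovered" flag. Those field names and the sample data are hypothetical.

```python
def variant_summary(failures):
    """Summarize one variant.

    failures: list of dicts with 'amount' (float) and 'recovered' (bool).
    """
    total = len(failures)
    recovered = [f for f in failures if f["recovered"]]
    return {
        "recovery_rate": len(recovered) / total,
        "revenue_per_failure": sum(f["amount"] for f in recovered) / total,
    }


def make_failures(recovered_29, recovered_99, per_plan=10):
    """Build sample data: per_plan failures on each of a $29 and a $99 plan."""
    rows = [{"amount": 29.0, "recovered": i < recovered_29} for i in range(per_plan)]
    rows += [{"amount": 99.0, "recovered": i < recovered_99} for i in range(per_plan)]
    return rows


if __name__ == "__main__":
    print("A", variant_summary(make_failures(recovered_29=8, recovered_99=2)))
    print("B", variant_summary(make_failures(recovered_29=4, recovered_99=5)))
```

In this made-up data, variant A has the higher recovery rate while variant B recovers more revenue per failure, which is the situation where weighting by revenue changes the decision.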

Open rate (diagnostic)

Open rate tells you whether your subject line is working. It's useful for diagnosing why a variant won or lost, but it's not the goal. A subject line that gets opened but doesn't drive action (because the email body is weak) has a high open rate and a low recovery rate.

Click-through rate (diagnostic)

CTR tells you whether the email content and CTA are compelling enough to drive action. Again, diagnostic. The customer still needs to complete the payment update for it to count as a recovery.

Time to recovery (secondary)

How quickly does each variant recover payments? If variant A recovers 45% of payments by Day 3 and variant B recovers 45% by Day 7, variant A is better. Faster recovery means fewer days of "free" service during the grace period and less risk of the customer forgetting entirely.
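
A cumulative recovery curve makes this comparison concrete: for each day, what share of the variant's failed payments has been recovered so far? A minimal sketch, with an assumed 14-day horizon and illustrative names and data.

```python
def cumulative_recovery(days_to_recover, total_failures, horizon=14):
    """Share of failed payments recovered by the end of each day.

    days_to_recover: days from failure to recovery, one entry per recovered payment.
    total_failures: all failed payments in the variant, recovered or not.
    """
    return {
        day: sum(1 for d in days_to_recover if d <= day) / total_failures
        for day in range(horizon + 1)
    }


if __name__ == "__main__":
    # Hypothetical variant: 10 failed payments, 5 eventually recovered.
    curve = cumulative_recovery([0, 1, 1, 3, 7], total_failures=10)
    for day in (0, 3, 7, 14):
        print(f"day {day}: {curve[day]:.0%}")
```

Two variants that end at the same recovery rate can have very different curves, and the one that rises earlier is the one to keep.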

Opt-out rate (guard rail)

Monitor opt-out rates for both variants. If a more aggressive subject line lifts recovery by 3% but increases opt-outs by 5%, it's likely a net negative over time. Customers who opt out of dunning emails can never be recovered through email again.

Common wins from testing

Based on patterns across thousands of SaaS dunning sequences, here are changes that consistently produce lifts.

Personalizing the subject line with the customer's first name. Lift: 5-10% on open rates. This is table stakes. If you're not doing it, start here.

Sending the first email within 1 hour instead of the next day. Lift: 10-20% on recovery from the first email. The payment failure is fresh. The customer might still be at their computer.

Adding a single SMS on Day 3-5. Lift: 15-25% on overall recovery rate. SMS cuts through inbox noise. Many customers who ignored two emails will act on one text.

Using "your [Product] subscription is about to be canceled" instead of "payment issue." Lift: 5-8% on open rates for later-stage emails. Loss aversion is a strong motivator.

Including the specific card details ("your Visa ending in 4242"). Lift: 3-5% on recovery rate. Specificity signals that this is a real, actionable problem, not a generic notification.

Shortening the email body. Lift: 2-5% on click-through rates. Dunning emails don't need three paragraphs. Problem, consequence, CTA. That's it.

Mistakes to avoid

Optimizing for open rate instead of recovery rate. Clickbait subject lines can boost opens but hurt trust and action. "YOUR ACCOUNT WILL BE DELETED" might get opened, but it also gets marked as spam.

Testing too many variables at once. If you change the subject line, the email body, and the send time simultaneously, you won't know what caused the difference. One variable per test.

Ending tests too early. After 20 data points, variant A might be winning 60/40. After 100, it might be 52/48 and not statistically significant. Be patient. Set your sample size in advance and commit to it.

Ignoring weekday vs weekend effects. If you start a test on Monday and end it on Wednesday, your results are skewed toward weekday behavior. Run tests for full weeks to capture the full cycle.

Not accounting for plan value. A variant that recovers more $9/month customers but fewer $99/month customers might look like a winner by recovery rate but be a loser by revenue. Segment your analysis.

Forgetting to test the full sequence. Your dunning sequence is a system. Optimizing Email #1 in isolation is fine, but eventually you need to test sequence-level changes (e.g., 4 emails vs 5 emails, or adding SMS vs not). These require more volume but produce the largest overall lifts.

A practical testing roadmap

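Here's a 6-month roadmap for a SaaS with ~100 failed payments per month. At this volume each test collects only 50-100 failed payments per variant, so treat the results as directional and extend any test whose outcome is too close to call.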

Month 1-2: Subject line test on Email #1. This is your highest-impact, lowest-effort test. Compare your current subject line against one personalized alternative.

Month 3: Send timing test. Test sending Email #1 within 15 minutes of failure vs your current timing.

Month 4-5: SMS inclusion test. Split customers into email-only and email+SMS groups. One SMS on Day 5.

Month 6: Sequence length test. Test your current sequence against a version with one fewer or one more email.

After 6 months, you'll have data-driven answers to the four biggest questions in dunning optimization: what subject line works, when to send, whether SMS is worth it, and how many touches are optimal. That foundation will inform every future test.

When to stop testing

You'll never reach a "perfect" dunning sequence. Customer behavior changes, your product changes, even card networks change their decline behavior. Plan to re-test your top performers every 6-12 months.

But there's a point of diminishing returns. Once your recovery rate is above 65-70% and you've optimized the major variables, the incremental gains from testing become small. At that point, shift your focus to other levers: improving your payment update flow, reducing hard declines through card updater services, or expanding to new channels.

The goal of testing isn't perfection. It's closing the gap between what you're recovering now and what you could be recovering. For most teams, that gap is 10-20 percentage points of recovery rate, worth thousands of dollars per month. A/B testing is how you close it.