HTTP Resilience in .NET (Part 1): Retries, Timeouts & Circuit Breakers

Newsletter edition: Resilience series, 1 of 3

Every HTTP call is a tiny bet that the network, DNS, and the remote server all behave. They won't. This series shows how to lose those bets gracefully, without writing a single for (int i = 0; i < 3; i++) retry loop.

Transient failures (a dropped connection, a 503 during a deploy, a node that's briefly overloaded) are not exceptions in distributed systems. They're the weather. Resilience is your app's ability to recover from them and keep working.

In .NET, you get this almost for free with the Microsoft.Extensions.Http.Resilience package, built on top of Polly.

⚠️ Heads up: Microsoft.Extensions.Http.Polly is deprecated. Use Microsoft.Extensions.Http.Resilience instead.


The 30-Second Setup

Install the package:

dotnet add package Microsoft.Extensions.Http.Resilience

Then chain one method onto your HTTP client registration:

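using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.Hosting;

var builder = Host.CreateApplicationBuilder(args);

// Register a typed client and attach the standard resilience pipeline to it.
builder.Services
	.AddHttpClient<ExampleClient>(client =>
		client.BaseAddress = new("https://api.example.com"))
	.AddStandardResilienceHandler();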

That's it. Your HttpClient now retries transient failures, enforces timeouts, and trips a circuit breaker when a dependency is clearly down.
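
The registration above refers to an ExampleClient type that is never shown. A minimal sketch of what such a typed client might look like follows; the GetStatusAsync method and the "status" path are made up for illustration and are not part of any real API.

```csharp
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical typed client; only the class name comes from the article.
public sealed class ExampleClient
{
	private readonly HttpClient _client;

	// The injected HttpClient already has the resilience handler in its pipeline,
	// so calls made through it get retries, timeouts, and the circuit breaker.
	public ExampleClient(HttpClient client) => _client = client;

	// "status" is an assumed relative path, resolved against BaseAddress.
	public Task<string> GetStatusAsync(CancellationToken cancellationToken = default) =>
		_client.GetStringAsync("status", cancellationToken);
}
```

Nothing in the client itself knows about resilience; that lives entirely in the handler chain configured at registration time.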

🧠 Rule of thumb: add one resilience handler per client. Don't stack handlers. If you need more control, use AddResilienceHandler (covered in Part 2).


What You Just Got for Free

AddStandardResilienceHandler chains five strategies, from the outermost to the innermost layer. I've added a "What it protects you from" column so the why is obvious at a glance:

| Order | Strategy | Default | What it protects you from |
| ----- | -------- | ------- | ------------------------- |
| 1 | Rate limiter | Queue: 0, Permit: 1_000 | Your own app flooding a dependency |
| 2 | Total timeout | 30s | A single logical call hanging forever |
| 3 | Retry | 3 retries, exponential backoff + jitter, 2s delay | Brief transient blips |
| 4 | Circuit breaker | 10% failure ratio, 100 min throughput, 30s sampling, 5s break | Hammering a dependency that's already down |
| 5 | Attempt timeout | 10s | One slow attempt eating the whole budget |

Summary: All transient HTTP faults, with sane production defaults.

The order matters: the total timeout caps the entire operation (including all retries), while the attempt timeout caps each individual try. The circuit breaker sits inside the retry so it can short-circuit fast when things are clearly broken.


Pattern 1: Retries

A retry simply re-sends a request that failed transiently. The trick is doing it politely:

  • Exponential backoff: wait longer after each failure (2s, 4s, 8s…) instead of pounding instantly.
  • Jitter: add randomness so 10,000 clients don't all retry at the exact same millisecond (the "thundering herd").
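
To make the arithmetic concrete, here is a small sketch of how the delays grow and how jitter spreads them out. BackoffSketch and its methods are hypothetical helpers, and the "pick a random point between zero and the delay" jitter is a simple stand-in chosen for readability; it is not claimed to be the formula Polly uses internally.

```csharp
using System;

var baseDelay = TimeSpan.FromSeconds(2);
var random = new Random();

for (var attempt = 0; attempt < 3; attempt++)
{
	var delay = BackoffSketch.ExponentialDelay(baseDelay, attempt);
	var jittered = BackoffSketch.WithJitter(delay, random);
	Console.WriteLine($"retry {attempt + 1}: plain {delay.TotalSeconds}s, jittered {jittered.TotalSeconds:F2}s");
}

// Hypothetical helper for illustration only.
public static class BackoffSketch
{
	// Delay before retry number `attempt` (0-based): baseDelay * 2^attempt.
	public static TimeSpan ExponentialDelay(TimeSpan baseDelay, int attempt) =>
		TimeSpan.FromSeconds(baseDelay.TotalSeconds * Math.Pow(2, attempt));

	// Simple jitter: a random delay between zero and the computed delay.
	public static TimeSpan WithJitter(TimeSpan delay, Random random) =>
		TimeSpan.FromSeconds(delay.TotalSeconds * random.NextDouble());
}
```

With a 2-second base, the plain delays come out as 2s, 4s, and 8s; the jittered values differ on every run, which is exactly the point.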

The standard handler enables both by default. If you want to tune it:

builder.Services
	.AddHttpClient<ExampleClient>()
	.AddStandardResilienceHandler(options =>
	{
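		options.Retry.MaxRetryAttempts = 5;                        // default is 3
		options.Retry.BackoffType = DelayBackoffType.Exponential;  // already the default; enum lives in the Polly namespace
		options.Retry.UseJitter = true;                            // already the default; shown for clarity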
	});

💡 More retries ≠ more reliable. Past ~5 attempts you're usually just adding latency to a request that was never going to succeed.


Pattern 2: Timeouts

There are two timeouts, and conflating them is a classic mistake:

| Timeout | Scope | Default | Mental model |
| ------- | ----- | ------- | ------------ |
| Attempt timeout | One single try | 10s | "This one call took too long; give up on it." |
| Total timeout | The whole operation incl. retries | 30s | "The user has waited long enough; stop everything." |

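Without a total timeout, the defaults add up fast: four slow attempts (the original call plus three retries, up to 10s each) plus the backoff waits between them would run well past a 30-second budget unnoticed. With it, the operation is bounded no matter how the retries play out.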
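
If the defaults don't match your latency budget, both timeouts can be set on the standard handler's options. The sketch below assumes a tighter budget of 5 seconds per attempt and 20 seconds overall; those numbers are illustrative, not recommendations, and the ExampleClient class is a placeholder so the snippet compiles on its own.

```csharp
using System;
using System.Net.Http;
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.Hosting;

var builder = Host.CreateApplicationBuilder(args);

builder.Services
	.AddHttpClient<ExampleClient>()
	.AddStandardResilienceHandler(options =>
	{
		// Per-attempt cap: give up on any single try after 5 seconds.
		options.AttemptTimeout.Timeout = TimeSpan.FromSeconds(5);

		// Whole-operation cap: every attempt plus the waits between them
		// must finish within 20 seconds.
		options.TotalRequestTimeout.Timeout = TimeSpan.FromSeconds(20);
	});

// Placeholder typed client so the sketch is self-contained.
public sealed class ExampleClient
{
	public ExampleClient(HttpClient client) { }
}
```

Keep the total timeout comfortably larger than the attempt timeout; otherwise the first slow attempt uses up the whole budget and the retries never get a chance to run.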

⚠️ Polly throws TimeoutRejectedException, not the standard TimeoutException. If you write custom ShouldHandle logic, handle the right one.
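
As a rough illustration of that warning, here is what a custom retry predicate might look like. This is a sketch under assumptions: it handles a deliberately narrow set of outcomes (Polly timeouts, HttpRequestException, and 503 responses) rather than reproducing the handler's defaults, and ExampleClient is a placeholder type.

```csharp
using System;
using System.Net;
using System.Net.Http;
using System.Threading.Tasks;
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.Hosting;
using Polly.Timeout;

var builder = Host.CreateApplicationBuilder(args);

builder.Services
	.AddHttpClient<ExampleClient>()
	.AddStandardResilienceHandler(options =>
	{
		options.Retry.ShouldHandle = retryArgs => new ValueTask<bool>(
			// Polly's timeout strategies surface as TimeoutRejectedException,
			// so a check for System.TimeoutException would never match here.
			retryArgs.Outcome.Exception is TimeoutRejectedException or HttpRequestException
			|| retryArgs.Outcome.Result is { StatusCode: HttpStatusCode.ServiceUnavailable });
	});

// Placeholder typed client so the sketch is self-contained.
public sealed class ExampleClient
{
	public ExampleClient(HttpClient client) { }
}
```

Note that assigning ShouldHandle replaces the default predicate, so anything not listed (such as 408 or 429 in this narrow example) stops being retried.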


Pattern 3: Circuit Breakers

A circuit breaker stops you from repeatedly calling a dependency that is clearly failing. It mirrors the electrical version:

| State | Meaning | Behavior |
| ----- | ------- | -------- |
| Closed | Healthy | Requests flow normally |
| Open | Too many failures | Requests fail fast; no call is even attempted |
| Half-Open | Probation | A trial request is allowed to test recovery |

Summary: Gives a struggling dependency room to recover instead of piling on.

With the defaults, the breaker trips when at least 10% of requests fail within a 30-second sampling window (provided at least 100 requests flowed through it), then stays open for 5 seconds before testing recovery.

builder.Services
	.AddHttpClient<ExampleClient>()
	.AddStandardResilienceHandler(options =>
	{
		options.CircuitBreaker.FailureRatio = 0.2;            // trip at 20%
		options.CircuitBreaker.SamplingDuration = TimeSpan.FromSeconds(30);
		options.CircuitBreaker.MinimumThroughput = 50;
		options.CircuitBreaker.BreakDuration = TimeSpan.FromSeconds(5);
	});

🧠 The minimum-throughput guard is what stops a single failure during a quiet period from tripping the breaker. You need real signal before you cut the line.
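
To see how the three states and the minimum-throughput guard interact, here is a deliberately simplified model. CircuitBreakerSketch is a hypothetical teaching aid, not Polly's implementation: it counts calls since the last reset instead of using a sliding sampling window, and it doesn't limit the half-open state to a single trial call. Time is passed in explicitly so the behavior is easy to follow.

```csharp
using System;

var breaker = new CircuitBreakerSketch(failureRatio: 0.5, minimumThroughput: 4, breakDuration: TimeSpan.FromSeconds(5));
var start = DateTimeOffset.UtcNow;

for (var i = 0; i < 4; i++) breaker.Record(success: false, now: start);
Console.WriteLine(breaker.State);                              // Open
Console.WriteLine(breaker.AllowRequest(start.AddSeconds(1)));  // False
Console.WriteLine(breaker.AllowRequest(start.AddSeconds(5)));  // True (now HalfOpen)
breaker.Record(success: true, now: start.AddSeconds(5));
Console.WriteLine(breaker.State);                              // Closed

public enum CircuitState { Closed, Open, HalfOpen }

// Simplified, hypothetical model of the state table above.
public sealed class CircuitBreakerSketch
{
	private readonly double _failureRatio;
	private readonly int _minimumThroughput;
	private readonly TimeSpan _breakDuration;
	private int _total;
	private int _failures;
	private DateTimeOffset _openedAt;

	public CircuitState State { get; private set; } = CircuitState.Closed;

	public CircuitBreakerSketch(double failureRatio, int minimumThroughput, TimeSpan breakDuration)
	{
		_failureRatio = failureRatio;
		_minimumThroughput = minimumThroughput;
		_breakDuration = breakDuration;
	}

	// Returns true if a call may be attempted at time `now`.
	public bool AllowRequest(DateTimeOffset now)
	{
		if (State == CircuitState.Open && now - _openedAt >= _breakDuration)
		{
			State = CircuitState.HalfOpen; // break is over: allow trial traffic
		}
		return State != CircuitState.Open;
	}

	// Records the outcome of a call that was allowed through.
	public void Record(bool success, DateTimeOffset now)
	{
		if (State == CircuitState.HalfOpen)
		{
			if (success) Reset(); else Trip(now);
			return;
		}

		_total++;
		if (!success) _failures++;

		// Trip only with enough traffic AND a high enough failure ratio.
		if (_total >= _minimumThroughput && (double)_failures / _total >= _failureRatio)
		{
			Trip(now);
		}
	}

	private void Trip(DateTimeOffset now)
	{
		State = CircuitState.Open;
		_openedAt = now;
	}

	private void Reset()
	{
		State = CircuitState.Closed;
		_total = 0;
		_failures = 0;
	}
}
```

In the demo, the first three failures leave the breaker closed because the throughput guard (4 calls) hasn't been met; the fourth one trips it.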


What Counts as a Transient Failure?

Both retry and circuit breaker react to the same set of signals out of the box:

| Signal | Example | Retry/break? |
| ------ | ------- | ------------ |
| HTTP 5xx | 500, 502, 503 | ✅ |
| HTTP 408 | Request Timeout | ✅ |
| HTTP 429 | Too Many Requests | ✅ |
| HttpRequestException | Connection refused / reset | ✅ |
| TimeoutRejectedException | Polly attempt timeout fired | ✅ |
| HTTP 400 / 401 / 404 | Bad request, unauthorized, not found | ❌ (your bug, not a blip) |

That last row is the important one: don't retry your own logic errors. A 404 won't fix itself on attempt #3.
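
The status-code rows of that table boil down to a small predicate. The helper below is a hypothetical restatement of the table for illustration, not the library's own code, and it covers only status codes (the exception rows are handled separately by the strategies).

```csharp
using System.Net;

// Hypothetical helper mirroring the status-code rows of the table above.
public static class TransientSketch
{
	public static bool IsTransient(HttpStatusCode statusCode) =>
		(int)statusCode >= 500                             // any 5xx
		|| statusCode == HttpStatusCode.RequestTimeout     // 408
		|| statusCode == HttpStatusCode.TooManyRequests;   // 429
}
```

Under this rule, TransientSketch.IsTransient(HttpStatusCode.ServiceUnavailable) is true, while a 404 or a 400 is false and goes straight back to the caller.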


The One Gotcha: Retrying POST

By default the standard handler retries all HTTP methods. For a GET, that's harmless. For a POST that inserts a record, a retry can create duplicate data.

Disable retries for unsafe methods:

builder.Services
	.AddHttpClient<ExampleClient>()
	.AddStandardResilienceHandler(options =>
	{
		// Skip POST, PATCH, PUT, DELETE, CONNECT
		options.Retry.DisableForUnsafeHttpMethods();
	});

…or be explicit:

options.Retry.DisableFor(HttpMethod.Post, HttpMethod.Delete);

💡 The real fix for retried writes is idempotency keys, but disabling retries on unsafe methods is the safe default until you have them.
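
For a sense of what an idempotency key looks like on the wire, here is a rough sketch. It assumes the server recognizes an "Idempotency-Key" request header and deduplicates on it; that header name is a common convention rather than something every API supports, and IdempotentRequests, CreatePost, and the "orders" path are hypothetical.

```csharp
using System;
using System.Net.Http;
using System.Net.Http.Json;

// Hypothetical helper: builds a POST whose key stays the same across retries.
public static class IdempotentRequests
{
	public static HttpRequestMessage CreatePost<T>(string relativeUrl, T payload, Guid idempotencyKey)
	{
		var request = new HttpRequestMessage(HttpMethod.Post, relativeUrl)
		{
			Content = JsonContent.Create(payload)
		};

		// Assumed header name; check what your API actually expects.
		request.Headers.Add("Idempotency-Key", idempotencyKey.ToString());
		return request;
	}
}
```

The key is generated once per logical operation, for example Guid.NewGuid() when the user clicks "submit", and passed into CreatePost("orders", order, key). Because it is attached before the request is sent, a retried attempt should carry the same key, letting a cooperating server recognize the duplicate.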


Key Takeaways

  1. One line, AddStandardResilienceHandler(), gives you retries, timeouts, and a circuit breaker with production-grade defaults.
  2. Two timeouts exist: per-attempt (10s) and per-operation (30s). Respect both.
  3. Backoff + jitter prevent thundering herds; circuit breakers prevent hammering a dead dependency.
  4. Never retry blindly on POST/PUT/DELETE; guard against duplicate writes.
  5. Only 5xx, 408, 429, connection-level errors, and attempt timeouts are treated as transient. Your 4xx logic bugs are left alone, as they should be.

Next up: Part 2, Hedging & Custom Pipelines. What if retrying sequentially is too slow? It covers racing requests in parallel and building your own strategy stack.


Got questions? Reach out on LinkedIn.

Want more .NET deep dives? Follow along: Parts 2 and 3 drop next.