Error Handling Template

Build resilient workflows that gracefully handle errors, retry failed operations, and notify the right people. Prevent cascading failures and ensure critical operations complete successfully.

Time to build: 25 minutes
Difficulty: Intermediate
Perfect for: Production workflows, critical operations, payment processing, data pipelines


What It Does

Attempt operation → If fails, retry with backoff → If still fails, notify + fallback

Ensures workflows handle errors gracefully without losing data or leaving operations incomplete.


Step-by-Step

1. Create Workflow

  • New workflow → Name it "Resilient Payment Processor"
  • Webhook trigger for payment requests

2. Add Validation Layer

  • Drag Condition Block
  • Connect Webhook to Condition
Condition:
<webhook.body.amount> > 0
AND <webhook.body.customer_id> is not empty
AND <webhook.body.payment_method> is not empty

If True: Proceed to processing
If False: Return validation error
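
The same check, expressed as a minimal Python sketch for readers who want to see the logic the Condition Block evaluates (field names follow the webhook payload above; the helper name is illustrative):

def validate_payment(body: dict) -> list[str]:
    # Collect every validation failure so the caller can report them all at once.
    errors = []
    if not body.get("customer_id"):
        errors.append("customer_id is required")
    if not body.get("payment_method"):
        errors.append("payment_method is required")
    try:
        if float(body.get("amount", 0)) <= 0:
            errors.append("amount must be greater than 0")
    except (TypeError, ValueError):
        errors.append("amount must be a number")
    return errors  # an empty list means the request is valid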

3. Handle Validation Error (False Path)

  • Drag Response Block on False path
{
  "status": "error",
  "error_code": "VALIDATION_FAILED",
  "message": "Invalid payment request",
  "details": {
    "amount": "<webhook.body.amount>",
    "customer_id": "<webhook.body.customer_id>",
    "payment_method": "<webhook.body.payment_method>"
  },
  "timestamp": "<webhook.timestamp>"
}

4. Add Try-Catch Pattern (True Path)

  • Drag HTTP Request Block for payment API
Method: POST
URL: https://api.stripe.com/v1/charges

Headers:
Authorization: Bearer <env.STRIPE_SECRET_KEY>

Body:
{
  "amount": "<webhook.body.amount>",
  "currency": "usd",
  "customer": "<webhook.body.customer_id>",
  "payment_method": "<webhook.body.payment_method>"
}

Timeout: 30 seconds
Retry: 3 times
Retry delay: 2, 4, 8 seconds (exponential backoff)

5. Check for Success

  • Drag Condition Block
  • Connect HTTP Request to Condition
Condition: <http_request.status> == 200

If True: Payment succeeded
If False: Payment failed, handle error

6. Handle Success (True Path)

  • Drag Database Query Block
INSERT INTO payments (
  customer_id,
  amount,
  status,
  stripe_charge_id,
  created_at
) VALUES (
  '<webhook.body.customer_id>',
  <webhook.body.amount>,
  'completed',
  '<http_request.body.id>',
  NOW()
)

7. Send Success Notification

  • Drag Email Tool
To: <webhook.body.customer_email>
Subject: Payment Successful

Your payment of $<webhook.body.amount> has been processed successfully.

Transaction ID: <http_request.body.id>
Date: <webhook.timestamp>

8. Handle Failure (False Path)

  • Drag Condition Block to check error type
Condition: <http_request.error_code> in ["card_declined", "insufficient_funds"]

If True: User error, notify customer
If False: System error, retry or escalate

9. Handle User Errors (True Path)

  • Drag Email Tool
To: <webhook.body.customer_email>
Subject: Payment Failed

Your payment of $<webhook.body.amount> could not be processed.

Reason: <http_request.error_message>
Action: Please update your payment method and try again.

10. Handle System Errors (False Path)

  • Drag Database Query Block to log error
INSERT INTO failed_payments (
  customer_id,
  amount,
  error_code,
  error_message,
  retry_count,
  created_at
) VALUES (
  '<webhook.body.customer_id>',
  <webhook.body.amount>,
  '<http_request.error_code>',
  '<http_request.error_message>',
  <http_request.retry_count>,
  NOW()
)

11. Implement Fallback Payment Method

  • Drag HTTP Request Block for backup processor
Method: POST
URL: https://api.paypal.com/v1/payments

Headers:
Authorization: Bearer <env.PAYPAL_API_KEY>

Body:
{
  "amount": "<webhook.body.amount>",
  "customer": "<webhook.body.customer_id>"
}

12. Alert Engineering Team

  • Drag Slack Tool for critical errors
Channel: #payment-alerts
Priority: urgent
Message:
"🚨 Payment Processing Failure

Customer: <webhook.body.customer_id>
Amount: $<webhook.body.amount>
Primary processor: Failed (<http_request.error_code>)
Fallback processor: <http_fallback.status>
Retry count: <http_request.retry_count>

Action required: Investigate payment processor issues
View details: https://dashboard.yourapp.com/payments/failed/<database.id>"

13. Add Circuit Breaker

  • Drag Database Query Block to check recent failures
SELECT COUNT(*) as failure_count
FROM failed_payments
WHERE created_at > NOW() - INTERVAL '5 minutes'

14. Trip Circuit if Too Many Failures

  • Drag Condition Block
Condition: <database.failure_count> > 10

If True: Circuit open - stop processing, alert team
If False: Circuit closed - continue normal operation

15. Send Critical Alert (Circuit Tripped)

  • Drag PagerDuty Tool (or send a critical Slack alert)
Severity: critical
Message:
"🔥 CIRCUIT BREAKER TRIPPED - Payment System Down

Failure count: <database.failure_count> in last 5 minutes
Action: Emergency response required
All payment processing PAUSED"

Error Handling Patterns

Retry with Exponential Backoff

Attempt 1: Immediate
Attempt 2: Wait 2 seconds
Attempt 3: Wait 4 seconds
Attempt 4: Wait 8 seconds
Attempt 5: Give up, alert team
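
A minimal Python sketch of this schedule (the operation callable stands in for the HTTP Request Block; names are illustrative, not part of the template):

import time

def retry_with_backoff(operation, max_attempts=4, base_delay=2.0):
    # Attempt 1 runs immediately; attempts 2-4 wait 2, 4, and 8 seconds.
    # After that, re-raise so the caller can alert the team.
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except Exception:
            if attempt == max_attempts:
                raise  # retries exhausted: give up and surface the error
            time.sleep(base_delay * 2 ** (attempt - 1))  # 2s, 4s, 8s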

Circuit Breaker

Monitor: Error rate over time
Threshold: > 50% errors in 5 minutes
Action: Stop processing, prevent cascade
Recovery: Auto-reset after 1 minute
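
Sketched in Python with the window, threshold, and reset values from above (the class and method names are illustrative):

import time

class CircuitBreaker:
    def __init__(self, window_seconds=300, error_threshold=0.5, reset_seconds=60):
        self.window = window_seconds
        self.threshold = error_threshold
        self.reset_seconds = reset_seconds
        self.results = []       # (timestamp, succeeded) for recent operations
        self.opened_at = None   # set when the circuit trips

    def record(self, succeeded: bool):
        # Keep only results inside the monitoring window, then check the error rate.
        now = time.time()
        self.results.append((now, succeeded))
        self.results = [(t, ok) for t, ok in self.results if now - t < self.window]
        errors = sum(1 for _, ok in self.results if not ok)
        if self.results and errors / len(self.results) > self.threshold:
            self.opened_at = now  # trip the circuit

    def allow_request(self) -> bool:
        if self.opened_at is None:
            return True
        if time.time() - self.opened_at >= self.reset_seconds:
            self.opened_at = None  # cooldown elapsed: close the circuit again
            return True
        return False  # circuit open: fail fast instead of calling the processor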

Fallback Chain

Try: Primary service
Fallback 1: Secondary service
Fallback 2: Cache response
Fallback 3: Default value
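
In code, the chain is just an ordered list of sources tried in turn. A minimal sketch (primary, secondary, and cache stand in for the services above):

def fetch_with_fallbacks(request, primary, secondary, cache, default):
    # Return the first result that succeeds; fall through to a safe default.
    for source in (primary, secondary, cache):
        try:
            return source(request)
        except Exception:
            continue  # this source failed, try the next one
    return default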

Dead Letter Queue

Failed operations → Special queue
Process later when system recovered
Manual review if auto-retry fails
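
A minimal in-memory sketch (a real deployment would back the queue with a durable store such as a database table or SQS; function names are illustrative):

import time

def process_or_dead_letter(message, handler, dead_letter_queue):
    # Try to handle the message; if it fails, park it instead of losing it.
    try:
        handler(message)
    except Exception as exc:
        dead_letter_queue.append({
            "payload": message,
            "error": str(exc),
            "failed_at": time.time(),
        })

def drain_dead_letters(dead_letter_queue, handler):
    # Re-run parked messages once the system recovers; anything that still
    # fails stays in the queue for manual review.
    still_failing = []
    for entry in dead_letter_queue:
        try:
            handler(entry["payload"])
        except Exception:
            still_failing.append(entry)
    dead_letter_queue[:] = still_failing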

Real-World Examples

API Integration

  • Call external API
  • Retry on timeout or 5xx errors
  • Fall back to cached data
  • Alert if API down > 5 minutes

Email Delivery

  • Try primary email service
  • Fall back to secondary service
  • Queue for later if both fail
  • Alert if queue > 1000 emails

Database Operations

  • Try primary database
  • Fall back to read replica
  • Return cached data if all fail
  • Alert if primary down > 1 minute

File Upload

  • Upload to primary S3 bucket
  • Fall back to secondary region
  • Retry failed chunks
  • Alert if storage quota exceeded

Advanced Error Handling

Idempotency

  • Generate unique request ID
  • Check if operation already completed
  • Skip duplicate operations
  • Prevents double-charging customers
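
A minimal sketch of the duplicate check (in production the processed map would be a database table with a unique key on request_id; charge_fn stands in for the payment call):

processed = {}  # request_id -> result of the original charge

def charge_once(request_id: str, charge_fn, amount: float):
    # If this request ID was already handled, return the earlier result
    # instead of charging the customer a second time.
    if request_id in processed:
        return processed[request_id]
    result = charge_fn(amount)
    processed[request_id] = result
    return result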

Partial Failure Handling

  • Some items succeed, some fail
  • Return partial success response
  • Queue failed items for retry
  • Don't roll back successful operations
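
A minimal sketch of the pattern (handler and retry_queue are placeholders for the processing block and retry queue):

def process_batch(items, handler, retry_queue):
    # Process what we can, queue what we can't, and report both.
    succeeded, failed = [], []
    for item in items:
        try:
            handler(item)
            succeeded.append(item)
        except Exception as exc:
            failed.append({"item": item, "error": str(exc)})
            retry_queue.append(item)  # retry later; don't undo the successes
    return {
        "status": "partial" if failed else "ok",
        "succeeded": len(succeeded),
        "failed": len(failed),
    }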

Graceful Degradation

  • Critical feature fails → Continue with reduced functionality
  • Example: Recommendation engine down → Show popular items
  • User experience maintained

Error Budgets

  • Track error rate over time
  • Alert if exceeds budget (e.g., 0.1%)
  • Pause new deploys if over budget
  • Focus on reliability
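
The budget check itself is a simple ratio. A sketch using the 0.1% figure above:

def error_budget_exceeded(error_count: int, total_count: int, budget: float = 0.001) -> bool:
    # A 0.1% budget allows at most 1 failed operation per 1,000.
    if total_count == 0:
        return False
    return error_count / total_count > budget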

Monitoring and Alerting

Error Rate Monitoring

Track: Errors per minute
Alert: > 10 errors/minute
Action: Page on-call engineer

Latency Monitoring

Track: p95 latency
Alert: > 5 seconds
Action: Investigate slow operations

Success Rate Monitoring

Track: % successful operations
Alert: < 99%
Action: Review recent failures
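
Taken together, the three checks reduce to a small evaluation function. A sketch using the thresholds above (how the metrics are collected is up to your monitoring stack):

def check_alerts(errors_per_minute: float, p95_latency_seconds: float, success_rate: float) -> list[str]:
    alerts = []
    if errors_per_minute > 10:
        alerts.append("Error rate above 10/minute - page on-call engineer")
    if p95_latency_seconds > 5:
        alerts.append("p95 latency above 5 seconds - investigate slow operations")
    if success_rate < 0.99:
        alerts.append("Success rate below 99% - review recent failures")
    return alerts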

Enhancements

Add Retry Queue

  • Failed operations go to queue
  • Process queue on schedule
  • Exponential backoff for retries

Add Error Categorization

  • Classify errors (user, system, external)
  • Different handling per category
  • Better debugging and metrics

Add Automatic Recovery

  • Detect when service recovers
  • Auto-retry queued operations
  • Close circuit breaker

Add Error Aggregation

  • Group similar errors
  • Single alert for multiple failures
  • Reduce alert fatigue

Cost

Error handling adds minimal cost:

  • Retry logic: Same operation cost × retry count
  • Monitoring: ~$0.01 per 1,000 operations
  • Alerts: ~$0.001 per alert

Value of error handling:

  • Prevents lost revenue
  • Better user experience
  • Faster incident response
  • Reduced manual intervention

Example savings:

  • Payment system: 99.9% → 99.99% uptime
  • Saves $10,000/month in lost payments
  • Cost of error handling: ~$50/month
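
For context on the uptime figures: 99.9% uptime allows roughly 43 minutes of downtime per 30-day month (43,200 minutes × 0.1%), while 99.99% allows about 4.3 minutes.
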
Next Step

Combine with State Management to track error history and recovery patterns!