Error Handling Template

Build resilient workflows that gracefully handle errors, retry failed operations, and notify the right people. Prevent cascading failures and ensure critical operations complete successfully.

Time to build: 25 minutes
Difficulty: Intermediate
Perfect for: Production workflows, critical operations, payment processing, data pipelines


What It Does

Attempt operation → If fails, retry with backoff → If still fails, notify + fallback

Ensures workflows handle errors gracefully without losing data or leaving operations incomplete.


Step-by-Step

1. Create Workflow

  • New workflow → Name it "Resilient Payment Processor"
  • Webhook trigger for payment requests

2. Add Validation Layer

  • Drag Condition Block
  • Connect Webhook to Condition
Condition:
<webhook.body.amount> > 0
AND <webhook.body.customer_id> is not empty
AND <webhook.body.payment_method> is not empty

If True: Proceed to processing
If False: Return validation error
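
The same check, expressed as a minimal Python sketch for readers who want to see the logic the Condition Block evaluates (field names follow the webhook payload above; the helper name is illustrative):

def validate_payment(body: dict) -> list[str]:
    # Collect every validation failure so the caller can report them all at once.
    errors = []
    if not body.get("customer_id"):
        errors.append("customer_id is required")
    if not body.get("payment_method"):
        errors.append("payment_method is required")
    try:
        if float(body.get("amount", 0)) <= 0:
            errors.append("amount must be greater than 0")
    except (TypeError, ValueError):
        errors.append("amount must be a number")
    return errors  # an empty list means the request is valid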

3. Handle Validation Error (False Path)

  • Drag Response Block on False path
{
  "status": "error",
  "error_code": "VALIDATION_FAILED",
  "message": "Invalid payment request",
  "details": {
    "amount": "<webhook.body.amount>",
    "customer_id": "<webhook.body.customer_id>",
    "payment_method": "<webhook.body.payment_method>"
  },
  "timestamp": "<webhook.timestamp>"
}

4. Add Try-Catch Pattern (True Path)

  • Drag HTTP Request Block for payment API
Method: POST
URL: https://api.stripe.com/v1/charges

Headers:
Authorization: Bearer <env.STRIPE_SECRET_KEY>

Body:
{
  "amount": "<webhook.body.amount>",
  "currency": "usd",
  "customer": "<webhook.body.customer_id>",
  "payment_method": "<webhook.body.payment_method>"
}

Timeout: 30 seconds
Retry: 3 times
Retry delay: 2, 4, 8 seconds (exponential backoff)

5. Check for Success

  • Drag Condition Block
  • Connect HTTP Request to Condition
Condition: <http_request.status> == 200

If True: Payment succeeded
If False: Payment failed, handle error

6. Handle Success (True Path)

  • Drag Database Query Block
INSERT INTO payments (
  customer_id,
  amount,
  status,
  stripe_charge_id,
  created_at
) VALUES (
  '<webhook.body.customer_id>',
  <webhook.body.amount>,
  'completed',
  '<http_request.body.id>',
  NOW()
)

7. Send Success Notification

  • Drag Email Tool
To: <webhook.body.customer_email>
Subject: Payment Successful

Your payment of $<webhook.body.amount> has been processed successfully.

Transaction ID: <http_request.body.id>
Date: <webhook.timestamp>

8. Handle Failure (False Path)

  • Drag Condition Block to check error type
Condition: <http_request.error_code> in ["card_declined", "insufficient_funds"]

If True: User error, notify customer
If False: System error, retry or escalate

9. Handle User Errors (True Path)

  • Drag Email Tool
To: <webhook.body.customer_email>
Subject: Payment Failed

Your payment of $<webhook.body.amount> could not be processed.

Reason: <http_request.error_message>
Action: Please update your payment method and try again.

10. Handle System Errors (False Path)

  • Drag Database Query Block to log error
INSERT INTO failed_payments (
  customer_id,
  amount,
  error_code,
  error_message,
  retry_count,
  created_at
) VALUES (
  '<webhook.body.customer_id>',
  <webhook.body.amount>,
  '<http_request.error_code>',
  '<http_request.error_message>',
  <http_request.retry_count>,
  NOW()
)

11. Implement Fallback Payment Method

  • Drag HTTP Request Block for backup processor
Method: POST
URL: https://api.paypal.com/v1/payments

Headers:
Authorization: Bearer <env.PAYPAL_API_KEY>

Body:
{
  "amount": "<webhook.body.amount>",
  "customer": "<webhook.body.customer_id>"
}

12. Alert Engineering Team

  • Drag Slack Tool for critical errors
Channel: #payment-alerts
Priority: urgent
Message:
"🚨 Payment Processing Failure

Customer: <webhook.body.customer_id>
Amount: $<webhook.body.amount>
Primary processor: Failed (<http_request.error_code>)
Fallback processor: <http_fallback.status>
Retry count: <http_request.retry_count>

Action required: Investigate payment processor issues
View details: https://dashboard.yourapp.com/payments/failed/<database.id>"

13. Add Circuit Breaker

  • Drag Database Query Block to check recent failures
SELECT COUNT(*) as failure_count
FROM failed_payments
WHERE created_at > NOW() - INTERVAL '5 minutes'

14. Trip Circuit if Too Many Failures

  • Drag Condition Block
Condition: <database.failure_count> > 10

If True: Circuit open - stop processing, alert team
If False: Circuit closed - continue normal operation

15. Send Critical Alert (Circuit Tripped)

  • Drag PagerDuty Tool (or send a critical Slack alert)
Severity: critical
Message:
"🔥 CIRCUIT BREAKER TRIPPED - Payment System Down

Failure count: <database.failure_count> in last 5 minutes
Action: Emergency response required
All payment processing PAUSED"

Error Handling Patterns

Retry with Exponential Backoff

Attempt 1: Immediate
Attempt 2: Wait 2 seconds
Attempt 3: Wait 4 seconds
Attempt 4: Wait 8 seconds
Attempt 5: Give up, alert team
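
A minimal Python sketch of this schedule (the operation callable stands in for the HTTP Request Block; names are illustrative, not part of the template):

import time

def retry_with_backoff(operation, max_attempts=4, base_delay=2.0):
    # Attempt 1 runs immediately; attempts 2-4 wait 2, 4, and 8 seconds.
    # After that, re-raise so the caller can alert the team.
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except Exception:
            if attempt == max_attempts:
                raise  # retries exhausted: give up and surface the error
            time.sleep(base_delay * 2 ** (attempt - 1))  # 2s, 4s, 8s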

Circuit Breaker

Monitor: Error rate over time
Threshold: > 50% errors in 5 minutes
Action: Stop processing, prevent cascade
Recovery: Auto-reset after 1 minute
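
Sketched in Python with the window, threshold, and reset values from above (the class and method names are illustrative):

import time

class CircuitBreaker:
    def __init__(self, window_seconds=300, error_threshold=0.5, reset_seconds=60):
        self.window = window_seconds
        self.threshold = error_threshold
        self.reset_seconds = reset_seconds
        self.results = []       # (timestamp, succeeded) for recent operations
        self.opened_at = None   # set when the circuit trips

    def record(self, succeeded: bool):
        # Keep only results inside the monitoring window, then check the error rate.
        now = time.time()
        self.results.append((now, succeeded))
        self.results = [(t, ok) for t, ok in self.results if now - t < self.window]
        errors = sum(1 for _, ok in self.results if not ok)
        if self.results and errors / len(self.results) > self.threshold:
            self.opened_at = now  # trip the circuit

    def allow_request(self) -> bool:
        if self.opened_at is None:
            return True
        if time.time() - self.opened_at >= self.reset_seconds:
            self.opened_at = None  # cooldown elapsed: close the circuit again
            return True
        return False  # circuit open: fail fast instead of calling the processor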

Fallback Chain

Try: Primary service
Fallback 1: Secondary service
Fallback 2: Cache response
Fallback 3: Default value
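
In code, the chain is just an ordered list of sources tried in turn. A minimal sketch (primary, secondary, and cache stand in for the services above):

def fetch_with_fallbacks(request, primary, secondary, cache, default):
    # Return the first result that succeeds; fall through to a safe default.
    for source in (primary, secondary, cache):
        try:
            return source(request)
        except Exception:
            continue  # this source failed, try the next one
    return default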

Dead Letter Queue

Failed operations → Special queue
Process later when system recovered
Manual review if auto-retry fails
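
A minimal in-memory sketch (a real deployment would back the queue with a durable store such as a database table or SQS; function names are illustrative):

import time

def process_or_dead_letter(message, handler, dead_letter_queue):
    # Try to handle the message; if it fails, park it instead of losing it.
    try:
        handler(message)
    except Exception as exc:
        dead_letter_queue.append({
            "payload": message,
            "error": str(exc),
            "failed_at": time.time(),
        })

def drain_dead_letters(dead_letter_queue, handler):
    # Re-run parked messages once the system recovers; anything that still
    # fails stays in the queue for manual review.
    still_failing = []
    for entry in dead_letter_queue:
        try:
            handler(entry["payload"])
        except Exception:
            still_failing.append(entry)
    dead_letter_queue[:] = still_failing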

Real-World Examples

API Integration

  • Call external API
  • Retry on timeout or 5xx errors
  • Fall back to cached data
  • Alert if API down > 5 minutes

Email Delivery

  • Try primary email service
  • Fall back to secondary service
  • Queue for later if both fail
  • Alert if queue > 1000 emails

Database Operations

  • Try primary database
  • Fall back to read replica
  • Return cached data if all fail
  • Alert if primary down > 1 minute

File Upload

  • Upload to primary S3 bucket
  • Fall back to secondary region
  • Retry failed chunks
  • Alert if storage quota exceeded

Advanced Error Handling

Idempotency

  • Generate unique request ID
  • Check if operation already completed
  • Skip duplicate operations
  • Prevents double-charging customers
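
A minimal sketch of the duplicate check (in production the processed map would be a database table with a unique key on request_id; charge_fn stands in for the payment call):

processed = {}  # request_id -> result of the original charge

def charge_once(request_id: str, charge_fn, amount: float):
    # If this request ID was already handled, return the earlier result
    # instead of charging the customer a second time.
    if request_id in processed:
        return processed[request_id]
    result = charge_fn(amount)
    processed[request_id] = result
    return result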

Partial Failure Handling

  • Some items succeed, some fail
  • Return partial success response
  • Queue failed items for retry
  • Don't roll back successful operations
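
A minimal sketch of the pattern (handler and retry_queue are placeholders for the processing block and retry queue):

def process_batch(items, handler, retry_queue):
    # Process what we can, queue what we can't, and report both.
    succeeded, failed = [], []
    for item in items:
        try:
            handler(item)
            succeeded.append(item)
        except Exception as exc:
            failed.append({"item": item, "error": str(exc)})
            retry_queue.append(item)  # retry later; don't undo the successes
    return {
        "status": "partial" if failed else "ok",
        "succeeded": len(succeeded),
        "failed": len(failed),
    }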

Graceful Degradation

  • Critical feature fails → Continue with reduced functionality
  • Example: Recommendation engine down → Show popular items
  • User experience maintained

Error Budgets

  • Track error rate over time
  • Alert if exceeds budget (e.g., 0.1%)
  • Pause new deploys if over budget
  • Focus on reliability
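
The budget check itself is a simple ratio. A sketch using the 0.1% figure above:

def error_budget_exceeded(error_count: int, total_count: int, budget: float = 0.001) -> bool:
    # A 0.1% budget allows at most 1 failed operation per 1,000.
    if total_count == 0:
        return False
    return error_count / total_count > budget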

Monitoring and Alerting

Error Rate Monitoring

Track: Errors per minute
Alert: > 10 errors/minute
Action: Page on-call engineer

Latency Monitoring

Track: p95 latency
Alert: > 5 seconds
Action: Investigate slow operations

Success Rate Monitoring

Track: % successful operations
Alert: < 99%
Action: Review recent failures
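
Taken together, the three checks reduce to a small evaluation function. A sketch using the thresholds above (how the metrics are collected is up to your monitoring stack):

def check_alerts(errors_per_minute: float, p95_latency_seconds: float, success_rate: float) -> list[str]:
    alerts = []
    if errors_per_minute > 10:
        alerts.append("Error rate above 10/minute - page on-call engineer")
    if p95_latency_seconds > 5:
        alerts.append("p95 latency above 5 seconds - investigate slow operations")
    if success_rate < 0.99:
        alerts.append("Success rate below 99% - review recent failures")
    return alerts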

Enhancements

Add Retry Queue

  • Failed operations go to queue
  • Process queue on schedule
  • Exponential backoff for retries

Add Error Categorization

  • Classify errors (user, system, external)
  • Different handling per category
  • Better debugging and metrics

Add Automatic Recovery

  • Detect when service recovers
  • Auto-retry queued operations
  • Close circuit breaker

Add Error Aggregation

  • Group similar errors
  • Single alert for multiple failures
  • Reduce alert fatigue

Cost

Error handling adds minimal cost:

  • Retry logic: Same operation cost × retry count
  • Monitoring: ~$0.01 per 1,000 operations
  • Alerts: ~$0.001 per alert

Value of error handling:

  • Prevents lost revenue
  • Better user experience
  • Faster incident response
  • Reduced manual intervention

Example savings:

  • Payment system: 99.9% → 99.99% uptime
  • Saves $10,000/month in lost payments
  • Cost of error handling: ~$50/month
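
For context on the uptime figures: 99.9% uptime allows roughly 43 minutes of downtime per 30-day month (43,200 minutes × 0.1%), while 99.99% allows about 4.3 minutes.
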
Next Step

Combine with State Management to track error history and recovery patterns!