Error Handling Template
Build resilient workflows that gracefully handle errors, retry failed operations, and notify the right people. Prevent cascading failures and ensure critical operations complete successfully.
Time to build: 25 minutes
Difficulty: Intermediate
Perfect for: Production workflows, critical operations, payment processing, data pipelines
What It Does
Attempt operation → If it fails, retry with backoff → If it still fails, notify + fall back
Ensures workflows handle errors gracefully without losing data or leaving operations incomplete.
Step-by-Step
1. Create Workflow
- New workflow → Name it "Resilient Payment Processor"
- Webhook trigger for payment requests
2. Add Validation Layer
- Drag Condition Block
- Connect Webhook to Condition
Condition:
<webhook.body.amount> > 0
AND <webhook.body.customer_id> is not empty
AND <webhook.body.payment_method> is not empty
If True: Proceed to processing
If False: Return validation error
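If you prefer to see the check as code, here is a minimal sketch of the same validation, assuming the webhook body arrives as a parsed dict (the field names mirror the condition above):

def validate_payment_request(body: dict) -> list[str]:
    """Return a list of problems; an empty list means the request is valid."""
    errors = []
    amount = body.get("amount")
    if not isinstance(amount, (int, float)) or amount <= 0:
        errors.append("amount must be a positive number")
    if not body.get("customer_id"):
        errors.append("customer_id is required")
    if not body.get("payment_method"):
        errors.append("payment_method is required")
    return errors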
3. Handle Validation Error (False Path)
- Drag Response Block on False path
{
"status": "error",
"error_code": "VALIDATION_FAILED",
"message": "Invalid payment request",
"details": {
"amount": "<webhook.body.amount>",
"customer_id": "<webhook.body.customer_id>",
"payment_method": "<webhook.body.payment_method>"
},
"timestamp": "<webhook.timestamp>"
}
4. Add Try-Catch Pattern (True Path)
- Drag HTTP Request Block for payment API
Method: POST
URL: https://api.stripe.com/v1/charges
Headers:
Authorization: Bearer <env.STRIPE_SECRET_KEY>
Body:
{
"amount": "<webhook.body.amount>",
"currency": "usd",
"customer": "<webhook.body.customer_id>",
"payment_method": "<webhook.body.payment_method>"
}
Timeout: 30 seconds
Retry: 3 times
Retry delay: 2, 4, 8 seconds (exponential backoff)
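Outside the builder, the same retry policy looks like this minimal Python sketch. The URL and payload are placeholders from the template config above, and the `requests` usage is one way to express it, not the only one:

import time
import requests

def charge_with_retry(payload: dict, api_key: str, max_retries: int = 3):
    """POST the charge, retrying timeouts and 5xx responses with 2s/4s/8s backoff."""
    url = "https://api.stripe.com/v1/charges"   # endpoint from the template; adjust for your processor
    headers = {"Authorization": f"Bearer {api_key}"}
    last_error = None
    for attempt in range(max_retries + 1):       # 1 try + 3 retries = 4 attempts
        try:
            resp = requests.post(url, headers=headers, data=payload, timeout=30)
            if resp.status_code < 500:           # success, or a client error retries won't fix
                return resp
            last_error = RuntimeError(f"server error {resp.status_code}")
        except requests.Timeout as exc:
            last_error = exc                     # timeouts are retried like 5xx errors
        if attempt < max_retries:
            time.sleep(2 ** (attempt + 1))       # exponential backoff: 2, 4, 8 seconds
    raise last_error                             # still failing: let the caller notify/fall back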
5. Check for Success
- Drag Condition Block
- Connect HTTP Request to Condition
Condition: <http_request.status> == 200
If True: Payment succeeded
If False: Payment failed, handle error
6. Handle Success (True Path)
- Drag Database Query Block
INSERT INTO payments (
customer_id,
amount,
status,
stripe_charge_id,
created_at
) VALUES (
'<webhook.body.customer_id>',
<webhook.body.amount>,
'completed',
'<http_request.body.id>',
NOW()
)
7. Send Success Notification
- Drag Email Tool
To: <webhook.body.customer_email>
Subject: Payment Successful
Your payment of $<webhook.body.amount> has been processed successfully.
Transaction ID: <http_request.body.id>
Date: <webhook.timestamp>
8. Handle Failure (False Path)
- Drag Condition Block to check error type
Condition: <http_request.error_code> in ["card_declined", "insufficient_funds"]
If True: User error, notify customer
If False: System error, retry or escalate
9. Handle User Errors (True Path)
- Drag Email Tool
To: <webhook.body.customer_email>
Subject: Payment Failed
Your payment of $<webhook.body.amount> could not be processed.
Reason: <http_request.error_message>
Action: Please update your payment method and try again.
10. Handle System Errors (False Path)
- Drag Database Query Block to log error
INSERT INTO failed_payments (
customer_id,
amount,
error_code,
error_message,
retry_count,
created_at
) VALUES (
'<webhook.body.customer_id>',
<webhook.body.amount>,
'<http_request.error_code>',
'<http_request.error_message>',
<http_request.retry_count>,
NOW()
)
11. Implement Fallback Payment Method
- Drag HTTP Request Block for backup processor
Method: POST
URL: https://api.paypal.com/v1/payments
Headers:
Authorization: Bearer <env.PAYPAL_API_KEY>
Body:
{
"amount": "<webhook.body.amount>",
"customer": "<webhook.body.customer_id>"
}
12. Alert Engineering Team
- Drag Slack Tool for critical errors
Channel: #payment-alerts
Priority: urgent
Message:
"🚨 Payment Processing Failure
Customer: <webhook.body.customer_id>
Amount: $<webhook.body.amount>
Primary processor: Failed (<http_request.error_code>)
Fallback processor: <http_fallback.status>
Retry count: <http_request.retry_count>
Action required: Investigate payment processor issues
View details: https://dashboard.yourapp.com/payments/failed/<database.id>"
13. Add Circuit Breaker
- Drag Database Query Block to check recent failures
SELECT COUNT(*) as failure_count
FROM failed_payments
WHERE created_at > NOW() - INTERVAL '5 minutes'
14. Trip Circuit if Too Many Failures
- Drag Condition Block
Condition: <database.failure_count> > 10
If True: Circuit open - stop processing, alert team
If False: Circuit closed - continue normal operation
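As code, the same check is one query plus a threshold comparison. This sketch assumes a hypothetical `query_one` helper that runs the SQL from step 13 and returns a row dict, and a `pause_processing` hook you define:

FAILURE_THRESHOLD = 10

def circuit_is_open(query_one) -> bool:
    """Trip the breaker when recent failures exceed the threshold."""
    row = query_one(
        "SELECT COUNT(*) AS failure_count "
        "FROM failed_payments "
        "WHERE created_at > NOW() - INTERVAL '5 minutes'"
    )
    return row["failure_count"] > FAILURE_THRESHOLD

# Usage: check before each payment attempt.
# if circuit_is_open(query_one):
#     pause_processing()   # stop taking payments and alert the team (step 15)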
15. Send Critical Alert (Circuit Tripped)
- Drag PagerDuty Tool, or send a critical Slack alert
Severity: critical
Message:
"🔥 CIRCUIT BREAKER TRIPPED - Payment System Down
Failure count: <database.failure_count> in last 5 minutes
Action: Emergency response required
All payment processing PAUSED"
Error Handling Patterns
Retry with Exponential Backoff
Attempt 1: Immediate
Attempt 2: Wait 2 seconds
Attempt 3: Wait 4 seconds
Attempt 4: Wait 8 seconds
After attempt 4 fails: Give up, alert team
Circuit Breaker
Monitor: Error rate over time
Threshold: > 50% errors in 5 minutes
Action: Stop processing, prevent cascade
Recovery: Auto-reset after 1 minute
Fallback Chain
Try: Primary service
Fallback 1: Secondary service
Fallback 2: Cache response
Fallback 3: Default value
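A minimal sketch of the chain, assuming each source is a callable you supply and the default is a safe static value:

def fetch_with_fallbacks(primary, secondary, cache_lookup, default):
    """Try each source in order; the first one that succeeds wins."""
    for source in (primary, secondary, cache_lookup):
        try:
            return source()
        except Exception:
            continue              # log the failure here in a real workflow
    return default                # last resort: a safe static value

For example, fetch_with_fallbacks(call_stripe, call_paypal, read_cached_result, default={"status": "queued"}) mirrors the primary/fallback processors from steps 4 and 11.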
Dead Letter Queue
Failed operations → Special queue
Process later, once the system has recovered
Manual review if auto-retry fails
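A dead letter queue can be as simple as a table you insert into on final failure and drain on a schedule. A sketch, assuming a hypothetical `db` handle with psycopg-style placeholders:

import json
from datetime import datetime, timezone

def send_to_dead_letter_queue(db, operation: str, payload: dict, error: str) -> None:
    """Park a failed operation for later replay or manual review."""
    db.execute(
        "INSERT INTO dead_letter_queue (operation, payload, error, created_at) "
        "VALUES (%s, %s, %s, %s)",
        (operation, json.dumps(payload), error, datetime.now(timezone.utc)),
    )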
Real-World Examples
API Integration
- Call external API
- Retry on timeout or 5xx errors
- Fall back to cached data
- Alert if API down > 5 minutes
Email Delivery
- Try primary email service
- Fall back to secondary service
- Queue for later if both fail
- Alert if queue > 1000 emails
Database Operations
- Try primary database
- Fall back to read replica
- Return cached data if all fail
- Alert if primary down > 1 minute
File Upload
- Upload to primary S3 bucket
- Fall back to secondary region
- Retry failed chunks
- Alert if storage quota exceeded
Advanced Error Handling
Idempotency
- Generate unique request ID
- Check if operation already completed
- Skip duplicate operations
- Prevents double-charging customers
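A sketch of the idempotency check, assuming a hypothetical key-value `store` (Redis, a database table, etc.) and a key derived from the request itself:

import hashlib
import json

def idempotency_key(body: dict) -> str:
    """Derive a stable key from the request so retries map to the same operation."""
    canonical = json.dumps(body, sort_keys=True)
    return hashlib.sha256(canonical.encode()).hexdigest()

def charge_once(store, body: dict, do_charge):
    """Skip the charge if this exact request was already processed."""
    key = idempotency_key(body)
    cached = store.get(key)
    if cached is not None:
        return cached             # duplicate request: return the original result
    result = do_charge(body)
    store.set(key, result)        # record the outcome before acknowledging
    return result

In production you would want an atomic set-if-absent (for example, Redis SETNX) so two concurrent retries cannot both pass the check.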
Partial Failure Handling
- Some items succeed, some fail
- Return partial success response
- Queue failed items for retry
- Don't roll back successful operations (see the sketch below)
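A sketch of a batch handler that keeps successes and queues failures, with `process_item` and `retry_queue` as placeholders you supply:

def process_batch(items, process_item, retry_queue):
    """Process every item; report successes and failures separately."""
    succeeded, failed = [], []
    for item in items:
        try:
            succeeded.append(process_item(item))
        except Exception as exc:
            failed.append({"item": item, "error": str(exc)})
            retry_queue.append(item)   # retry later; never roll back the successes
    return {
        "status": "partial" if failed else "ok",
        "succeeded": len(succeeded),
        "failed": len(failed),
    }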
Graceful Degradation
- A non-critical feature fails → Continue with reduced functionality
- Example: Recommendation engine down → Show popular items
- User experience maintained
Error Budgets
- Track error rate over time
- Alert if exceeds budget (e.g., 0.1%)
- Pause new deploys if over budget
- Focus on reliability
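The budget math itself is simple. A sketch, where the counts would come from your metrics store:

ERROR_BUDGET = 0.001  # 0.1% of operations may fail

def budget_exhausted(error_count: int, total_count: int) -> bool:
    """True when the observed error rate exceeds the budget."""
    if total_count == 0:
        return False
    return (error_count / total_count) > ERROR_BUDGET

# Example: 15 errors out of 10,000 operations is a 0.15% rate, over the 0.1% budget.
assert budget_exhausted(15, 10_000) is True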
Monitoring and Alerting
Error Rate Monitoring
Track: Errors per minute
Alert: > 10 errors/minute
Action: Page on-call engineer
Latency Monitoring
Track: p95 latency
Alert: > 5 seconds
Action: Investigate slow operations
Success Rate Monitoring
Track: % successful operations
Alert: < 99%
Action: Review recent failures
Enhancements
Add Retry Queue
- Failed operations go to queue
- Process queue on schedule
- Exponential backoff for retries
Add Error Categorization
- Classify errors (user, system, external)
- Different handling per category
- Better debugging and metrics
Add Automatic Recovery
- Detect when service recovers
- Auto-retry queued operations
- Close circuit breaker
Add Error Aggregation
- Group similar errors
- Single alert for multiple failures
- Reduce alert fatigue
Cost
Error handling adds minimal cost:
- Retry logic: Same operation cost × retry count
- Monitoring: ~$0.01 per 1,000 operations
- Alerts: ~$0.001 per alert
Value of error handling:
- Prevents lost revenue
- Better user experience
- Faster incident response
- Reduced manual intervention
Example savings:
- Payment system: 99.9% → 99.99% uptime
- Saves $10,000/month in lost payments
- Cost of error handling: ~$50/month
Next Step
Combine with State Management to track error history and recovery patterns!