Monitoring your website or service is critical for maintaining uptime and catching issues before they impact users. But setting up monitoring shouldn’t require complex infrastructure or expensive third-party tools.
With ETLR, you can build production-grade monitoring workflows using simple YAML files. Schedule health checks, track response times, filter for errors, and send alerts to your team—all with infrastructure you can version, test, and deploy like code.
Why ETLR for Monitoring?
Traditional monitoring solutions often require:
- Complex setup and configuration UIs
- Vendor lock-in with proprietary alerting
- Limited customisation options
- High costs at scale
ETLR gives you:
- Infrastructure as Code - Version monitoring workflows in Git alongside your application
- Flexible Alerting - Send alerts to Slack, Discord, email, or any webhook
- Custom Logic - Filter, transform, and enrich monitoring data however you need
- Simple Deployment - One command to deploy or update monitoring rules
Let’s explore real-world monitoring examples you can deploy today.
Example 1: Basic HTTP Health Check
The simplest monitoring workflow: ping your service every minute and alert on failures.
workflow:
name: "website_healthcheck"
description: "Monitor website uptime and response time"
input:
type: cron
cron: "*/1 * * * *" # Every minute
steps:
- type: http_call
url: https://yourapp.com/health
method: GET
timeout: 10
include_status: true
output_to: health_check
- type: filter
groups:
- conditions:
- field: health_check.status
op: ne
value: 200
- type: slack_webhook
webhook_url: "${env:SLACK_WEBHOOK_URL}"
text_template: "🚨 Health check failed!\nURL: https://yourapp.com/health\nStatus: ${health_check.status}\nDuration: ${health_check.duration_ms}ms"
How it works:
- Cron trigger - Runs every minute automatically
- HTTP call - Makes a GET request to your health endpoint with a 10-second timeout
- Filter - Only continues if status code is NOT 200 (drops successful checks)
- Slack alert - Sends a formatted message with status code and response time
Key parameters:
- `include_status: true` - Adds the status code to the response
- `output_to` - Stores the response at `health_check` in state
- `filter` with the `ne` (not equal) operator - Drops events when the status is 200 (a sketch of the stored state follows this list)
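For reference, here is a rough sketch of what the `health_check` entry in state might look like after the `http_call` step. Only `status` and `duration_ms` are referenced by the templates above; the other field names are assumptions rather than documented ETLR output.

```yaml
# Hypothetical shape of the state entry written by output_to: health_check.
# status and duration_ms are used by the templates above; any other fields
# (such as body) are assumptions, not documented ETLR output.
health_check:
  status: 503                    # HTTP status code (from include_status: true)
  duration_ms: 1240              # Round-trip time of the request
  body: "Service Unavailable"    # Response body, if the endpoint returns one
```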
Example 2: Multi-Endpoint Monitoring
Monitor multiple services and track which ones are down:
workflow:
name: "multi_service_monitor"
description: "Monitor multiple endpoints and report failures"
input:
type: cron
cron: "*/5 * * * *" # Every 5 minutes
steps:
# Check API
- type: http_call
url: https://api.yourapp.com/health
method: GET
timeout: 10
include_status: true
drop_on_failure: false
output_to: api_health
# Check Web App
- type: http_call
url: https://app.yourapp.com
method: GET
timeout: 10
include_status: true
drop_on_failure: false
output_to: web_health
# Check Database API
- type: http_call
url: https://db.yourapp.com/status
method: GET
timeout: 10
include_status: true
drop_on_failure: false
output_to: db_health
# Add timestamp
- type: add_timestamp
format: ISO-8601
field: checked_at
# Send comprehensive status report
- type: slack_webhook
webhook_url: "${env:SLACK_WEBHOOK_URL}"
text_template: |
📊 Service Status Report
Time: ${checked_at}
API: ${api_health.status} (${api_health.duration_ms}ms)
Web: ${web_health.status} (${web_health.duration_ms}ms)
DB: ${db_health.status} (${db_health.duration_ms}ms)
Key features:
- `drop_on_failure: false` - Continues the workflow even if a request fails
- Multiple `http_call` steps with different `output_to` values
- `add_timestamp` - Adds the current time for reporting
- `text_template` with a multiline string - Creates a formatted report (a variant filter is sketched after this list)
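As written, this workflow posts a status report every five minutes even when everything is healthy. If you only want the report when something is down, one option is to insert a filter before the `slack_webhook` step. The sketch below assumes that conditions inside a group are AND-ed and that separate groups are OR-ed; check the filter step documentation before relying on that behavior.

```yaml
# Optional: place before the slack_webhook step to report only when at
# least one service is unhealthy. Assumes AND within a group, OR across
# groups -- verify against the filter step docs for your ETLR version.
- type: filter
  groups:
    - conditions:
        - field: api_health.status
          op: ne
          value: 200
    - conditions:
        - field: web_health.status
          op: ne
          value: 200
    - conditions:
        - field: db_health.status
          op: ne
          value: 200
```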
Example 3: Response Time Monitoring with Thresholds
Alert when response times exceed acceptable thresholds:
workflow:
name: "response_time_monitor"
description: "Alert on slow response times"
input:
type: cron
cron: "*/2 * * * *" # Every 2 minutes
steps:
- type: http_call
url: https://api.yourapp.com/users
method: GET
headers:
Authorization: "Bearer ${env:API_TOKEN}"
timeout: 30
include_status: true
output_to: api_response
# Filter for slow responses (> 2000ms)
- type: filter
groups:
- conditions:
- field: api_response.duration_ms
op: gt
value: 2000
# Send detailed alert
- type: slack_webhook
webhook_url: "${env:SLACK_WEBHOOK_URL}"
text_template: |
⚠️ SLOW API RESPONSE DETECTED
Endpoint: /users
Response Time: ${api_response.duration_ms}ms
Status: ${api_response.status}
Threshold: 2000ms
Action Required: Investigate performance degradation
Key concepts:
- `gt` operator - Greater-than comparison in the filter step
- Response-time threshold filtering
- Custom alert messages with context (a combined-threshold variant is sketched below)
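A slow 5xx response is usually better handled by a failure alert than a latency alert. The variant below only flags latency when the request itself succeeded, using only the documented `gt` and `lt` operators; it assumes conditions within a single filter group are AND-ed.

```yaml
# Variant: only alert on latency when the request succeeded (status < 400),
# leaving outright failures to a separate health-check workflow.
# Assumes conditions within one group are AND-ed together.
- type: filter
  groups:
    - conditions:
        - field: api_response.duration_ms
          op: gt
          value: 2000
        - field: api_response.status
          op: lt
          value: 400
```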
Example 4: API Error Rate Monitoring
Track API errors and alert when error rates spike:
workflow:
name: "api_error_monitor"
description: "Monitor API for 4xx and 5xx errors"
input:
type: cron
cron: "*/1 * * * *"
steps:
- type: http_call
url: https://api.yourapp.com/analytics/errors
method: GET
headers:
X-API-Key: "${env:API_KEY}"
timeout: 15
output_to: error_stats
# Filter for error rate > 5%
- type: filter
groups:
- conditions:
- field: error_stats.body.error_rate
op: gt
value: 5.0
- type: add_timestamp
field: alert_time
- type: slack_webhook
webhook_url: "${env:SLACK_WEBHOOK_URL}"
text_template: |
🔴 HIGH ERROR RATE ALERT
Error Rate: ${error_stats.body.error_rate}%
Total Requests: ${error_stats.body.total_requests}
Failed Requests: ${error_stats.body.failed_requests}
Time: ${alert_time}
Dashboard: https://dashboard.yourapp.com/errors
Advanced features:
- Accessing nested response data with state paths (the assumed payload shape is sketched after this list)
- Percentage-based thresholds
- Including dashboard links in alerts
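The state paths above assume the analytics endpoint returns a flat JSON body along these lines (shown as YAML for readability); the endpoint and its field names are placeholders for whatever your own error-reporting API exposes.

```yaml
# Hypothetical body returned by /analytics/errors. The workflow stores it
# under error_stats.body, so error_rate is addressed as
# error_stats.body.error_rate in the filter condition and templates above.
error_rate: 7.2
total_requests: 14250
failed_requests: 1026
```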
Example 5: Certificate Expiry Monitoring
Monitor SSL certificate expiry dates:
workflow:
name: "ssl_certificate_monitor"
description: "Alert on expiring SSL certificates"
input:
type: cron
cron: "0 9 * * *" # Daily at 9 AM UTC
steps:
- type: http_call
url: https://api.sslchecker.com/check
method: POST
headers:
Authorization: "Bearer ${env:SSL_CHECKER_TOKEN}"
body:
domains:
- yourapp.com
- api.yourapp.com
- app.yourapp.com
output_to: cert_status
# Filter for certificates expiring in < 30 days
- type: filter
groups:
- conditions:
- field: cert_status.body.days_until_expiry
op: lt
value: 30
- type: slack_webhook
webhook_url: "${env:SLACK_WEBHOOK_URL}"
text_template: |
⚠️ SSL CERTIFICATE EXPIRING SOON
Domain: ${cert_status.body.domain}
Expires: ${cert_status.body.expiry_date}
Days Remaining: ${cert_status.body.days_until_expiry}
Action Required: Renew certificate before expiry
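The filter and template reference `cert_status.body.*` fields, which assumes the checker responds with a flat, per-domain payload roughly like the sketch below. The service URL and schema in this example are placeholders, so adapt the field paths to the API you actually use.

```yaml
# Hypothetical per-domain response body from the certificate checker.
# The workflow above assumes one domain per event, so that
# cert_status.body.days_until_expiry resolves to a single number.
domain: api.yourapp.com
expiry_date: "2026-02-14"
days_until_expiry: 21
```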
Example 6: Advanced Multi-Channel Alerting
Send alerts to multiple channels based on severity:
workflow:
name: "advanced_monitoring"
description: "Multi-service monitoring with severity-based routing"
input:
type: cron
cron: "*/3 * * * *"
steps:
# Check critical endpoint
- type: http_call
url: https://api.yourapp.com/critical-service
method: GET
timeout: 10
include_status: true
output_to: critical_check
# Add metadata
- type: add_timestamp
field: check_time
- type: add_uuid
field: incident_id
# Critical failure path
- type: filter
groups:
- conditions:
- field: critical_check.status
op: gte
value: 500
# Alert via Slack
- type: slack_webhook
webhook_url: "${env:SLACK_CRITICAL_WEBHOOK}"
text_template: |
🚨 CRITICAL SERVICE FAILURE
Incident ID: ${incident_id}
Service: critical-service
Status: ${critical_check.status}
Response Time: ${critical_check.duration_ms}ms
Time: ${check_time}
@channel - Immediate attention required!
# Also send email for critical issues
- type: resend_email
api_key: "${env:RESEND_API_KEY}"
from_email: "[email protected]"
to: "[email protected]"
subject_template: "🚨 Critical Service Failure - ${incident_id}"
html_template: |
<h2>Critical Service Failure Detected</h2>
<p><strong>Incident ID:</strong> ${incident_id}</p>
<p><strong>Service:</strong> critical-service</p>
<p><strong>Status Code:</strong> ${critical_check.status}</p>
<p><strong>Response Time:</strong> ${critical_check.duration_ms}ms</p>
<p><strong>Timestamp:</strong> ${check_time}</p>
<hr>
<p>Please investigate immediately.</p>
Enterprise features:
- `add_uuid` for incident tracking
- Severity-based filtering with the `filter` step
- Multi-channel alerting (Slack + email)
- Structured incident data in state (a warning-tier variant is sketched below)
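The workflow above only covers the 5xx path. Since the pipeline is linear, one simple way to add a lower-severity tier is a copy of this workflow whose filter and alert steps are swapped for the pair below. This assumes conditions within a group are AND-ed; `SLACK_WARNINGS_WEBHOOK` is a placeholder environment variable.

```yaml
# Replacement filter + alert steps for a warning-tier copy of the workflow
# above (4xx responses). Assumes conditions within a group are AND-ed;
# SLACK_WARNINGS_WEBHOOK is a placeholder environment variable.
- type: filter
  groups:
    - conditions:
        - field: critical_check.status
          op: gte
          value: 400
        - field: critical_check.status
          op: lt
          value: 500
- type: slack_webhook
  webhook_url: "${env:SLACK_WARNINGS_WEBHOOK}"
  text_template: |
    ⚠️ Client errors detected on critical-service
    Status: ${critical_check.status}
    Time: ${check_time}
```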
Example 7: Uptime Tracking with Logging
Log all health check results for uptime analysis:
workflow:
name: "uptime_tracker"
description: "Track uptime and log all checks"
input:
type: cron
cron: "*/1 * * * *"
steps:
- type: http_call
url: https://yourapp.com/health
method: GET
timeout: 10
include_status: true
drop_on_failure: false
output_to: health
- type: add_timestamp
format: ISO-8601
field: timestamp
# Log to external service
- type: http_call
url: https://logging.yourapp.com/healthchecks
method: POST
headers:
Authorization: "Bearer ${env:LOGGING_TOKEN}"
body:
timestamp: "${timestamp}"
status: "${health.status}"
duration_ms: "${health.duration_ms}"
success: "${health.status == 200}"
output_to: log_result
# Alert only on failure
- type: filter
groups:
- conditions:
- field: health.status
op: ne
value: 200
- type: slack_webhook
webhook_url: "${env:SLACK_WEBHOOK_URL}"
text_template: "🚨 Service down! Status: ${health.status}"
Key patterns:
- Logging all checks regardless of outcome
- Separating alert logic from logging
- Structured data collection for analysis (a rendered log record is sketched below)
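Once the template variables are rendered, the record posted to the logging endpoint would look roughly like this. Whether `${health.status == 200}` evaluates to a real boolean (and whether the numeric fields stay numeric inside quoted templates) depends on ETLR's templating rules, so treat the types here as assumptions.

```yaml
# Hypothetical rendered log record for one healthy check; the field types
# are assumptions (quoted templates may render these values as strings).
timestamp: "2024-05-06T14:32:00Z"
status: 200
duration_ms: 183
success: true
```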
Best Practices
1. Use Environment Variables for Secrets
Never hardcode tokens or webhook URLs. Use environment variables instead:
# ✅ Good
webhook_url: "${env:SLACK_WEBHOOK_URL}"
api_key: "${env:API_KEY}"
# ❌ Bad
webhook_url: "https://hooks.slack.com/services/T00/B00/XXX"
2. Set Appropriate Timeouts
Strike a balance between catching slow responses and avoiding false alarms:
# Fast API endpoint
timeout: 5
# Slower backend service
timeout: 30
# Long-running operation
timeout: 120
3. Use drop_on_failure: false for Comprehensive Monitoring
When monitoring multiple services, continue the workflow even if one fails:
- type: http_call
url: https://service1.com/health
drop_on_failure: false
output_to: service1
- type: http_call
url: https://service2.com/health
drop_on_failure: false
output_to: service2
4. Add Context to Alerts
Include actionable information in alert messages:
text_template: |
🚨 Alert: ${service_name} Down
Status: ${response.status}
Response Time: ${response.duration_ms}ms
Endpoint: ${endpoint_url}
Time: ${timestamp}
Dashboard: https://dashboard.yourapp.com
Runbook: https://docs.yourapp.com/runbooks/service-down
5. Test Before Deploying
Use start_now: true on cron inputs to test immediately:
input:
type: cron
cron: "*/5 * * * *"
start_now: true # Runs once immediately on deployment
Deployment
Deploy any of these workflows with the ETLR CLI:
# Deploy from YAML file
etlr deploy monitoring.yaml
# Deploy from workflow.yaml in current directory
etlr deploy
# Deploy with environment variables
etlr deploy monitoring.yaml -e SLACK_WEBHOOK_URL=https://hooks.slack.com/...
The CLI will automatically:
- Create or update the workflow
- Start it running
- Display the webhook URL (for `http_webhook` inputs)
Monitor your workflows in the ETLR dashboard where you can view:
- Execution history and logs
- Response times and success rates
- State data from each run
- Version history and rollback options
Conclusion
With ETLR, you can build comprehensive monitoring solutions using simple YAML workflows. From basic health checks to advanced multi-service monitoring with custom alerting logic, everything is code that you can version, test, and deploy with confidence.
The examples in this post are production-ready and based on actual ETLR step documentation. Start with a simple health check, then expand to monitor multiple services, track response times, and route alerts based on severity.
Your monitoring infrastructure should be as reliable and maintainable as the services you’re monitoring. With ETLR, it finally can be.
Next Steps
Ready to build your first monitoring workflow? Here are some resources to help you get started:
Core Documentation
- ETLR CLI - Install the CLI and learn deployment commands
- Workflows Guide - Understand how workflows are structured
- State Management - Learn how data flows between steps
Key Steps for Monitoring
- http_call - Make HTTP requests to check endpoints
- filter - Apply conditional logic to your checks
- slack_webhook - Send alerts to Slack
- add_timestamp - Add timestamps to your monitoring data
Input Types
- Cron Schedules - Run checks on a schedule
- HTTP Webhooks - Trigger workflows via HTTP
More Resources
- Browse All Steps - Explore 20+ available integrations
- Environment Variables - Securely manage API keys and secrets