Monitoring API
The Monitoring API provides endpoints for real-time system monitoring, performance metrics, and health checks.
Status: Roadmap
This API is on the development roadmap. The endpoints documented below represent the planned interface design.
Base URL
http://localhost:8080/api/v1/monitoring
Authentication
Requests use the standard botserver authentication mechanism. Access to each endpoint category depends on the caller's role, as listed under Required Permissions.
Endpoints
System Health
| Method | Endpoint | Description |
|--------|----------|-------------|
| GET | /api/v1/monitoring/health | Get overall system health |
| GET | /api/v1/monitoring/health/live | Kubernetes liveness probe |
| GET | /api/v1/monitoring/health/ready | Kubernetes readiness probe |
Performance Metrics
| Method | Endpoint | Description |
|--------|----------|-------------|
| GET | /api/v1/monitoring/metrics | Get all metrics (Prometheus format) |
| GET | /api/v1/monitoring/metrics/summary | Get metrics summary |
| GET | /api/v1/monitoring/metrics/{metric_name} | Get specific metric |
Service Status
| Method | Endpoint | Description |
|--------|----------|-------------|
| GET | /api/v1/monitoring/services | List the status of all services |
| GET | /api/v1/monitoring/services/{service_id} | Get the status of a specific service |
| POST | /api/v1/monitoring/services/{service_id}/restart | Restart a service |
Resource Usage
| Method | Endpoint | Description |
|--------|----------|-------------|
| GET | /api/v1/monitoring/resources | Get resource usage overview |
| GET | /api/v1/monitoring/resources/cpu | Get CPU usage |
| GET | /api/v1/monitoring/resources/memory | Get memory usage |
| GET | /api/v1/monitoring/resources/disk | Get disk usage |
| GET | /api/v1/monitoring/resources/network | Get network statistics |
Alert Configuration
| Method | Endpoint | Description |
|--------|----------|-------------|
| GET | /api/v1/monitoring/alerts | List all alerts |
| POST | /api/v1/monitoring/alerts | Create a new alert rule |
| GET | /api/v1/monitoring/alerts/{alert_id} | Get alert details |
| PUT | /api/v1/monitoring/alerts/{alert_id} | Update alert rule |
| DELETE | /api/v1/monitoring/alerts/{alert_id} | Delete alert rule |
| GET | /api/v1/monitoring/alerts/active | Get currently firing alerts |
Log Stream
| Method | Endpoint | Description |
|--------|----------|-------------|
| GET | /api/v1/monitoring/logs | Get recent logs |
| GET | /api/v1/monitoring/logs/stream | Stream logs via WebSocket |
| GET | /api/v1/monitoring/logs/search | Search logs with query |
Request Examples
Check System Health
health = GET "/api/v1/monitoring/health"
TALK "System Status: " + health.status
TALK "Uptime: " + health.uptime
FOR EACH component IN health.components
  TALK component.name + ": " + component.status
NEXT
Get Performance Metrics
metrics = GET "/api/v1/monitoring/metrics/summary"
TALK "Request Rate: " + metrics.requests_per_second + "/s"
TALK "Average Latency: " + metrics.avg_latency_ms + "ms"
TALK "Error Rate: " + metrics.error_rate + "%"
Check Resource Usage
resources = GET "/api/v1/monitoring/resources"
TALK "CPU: " + resources.cpu.usage_percent + "%"
TALK "Memory: " + resources.memory.used_mb + "/" + resources.memory.total_mb + " MB"
TALK "Disk: " + resources.disk.used_gb + "/" + resources.disk.total_gb + " GB"
Create Alert Rule
alert = NEW OBJECT
alert.name = "High CPU Alert"
alert.metric = "cpu_usage_percent"
alert.condition = ">"
alert.threshold = 80
alert.duration = "5m"
alert.severity = "warning"
alert.notify = ["ops@example.com"]
result = POST "/api/v1/monitoring/alerts", alert
TALK "Alert created: " + result.alert_id
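Because the alert endpoints are still on the roadmap, the accepted field values are not yet specified. The Python sketch below shows one way a client might sanity-check a rule before posting it. The function name `validate_alert_rule`, the set of comparison operators, and the duration pattern are all assumptions for illustration; only the field names, the `>` operator, the `5m` duration, and the severity names come from this document.

```python
import re

# Severity names come from the Alert Severity Levels table.
ALLOWED_SEVERITIES = {"info", "warning", "error", "critical"}
# Assumption: only ">" appears in the example; the other operators are guesses.
ALLOWED_CONDITIONS = {">", ">=", "<", "<=", "==", "!="}
# Assumption: a positive integer followed by s, m, h, or d (the example uses "5m").
DURATION_PATTERN = re.compile(r"^[1-9]\d*[smhd]$")
REQUIRED_FIELDS = ("name", "metric", "condition", "threshold", "duration", "severity")


def validate_alert_rule(rule):
    """Return a list of problems; an empty list means the rule looks well formed."""
    problems = [f"missing field: {field}" for field in REQUIRED_FIELDS if field not in rule]
    if "condition" in rule and rule["condition"] not in ALLOWED_CONDITIONS:
        problems.append(f"unknown condition: {rule['condition']}")
    if "threshold" in rule and not isinstance(rule["threshold"], (int, float)):
        problems.append("threshold must be a number")
    if "duration" in rule and not DURATION_PATTERN.match(str(rule["duration"])):
        problems.append(f"malformed duration: {rule['duration']}")
    if "severity" in rule and rule["severity"] not in ALLOWED_SEVERITIES:
        problems.append(f"unknown severity: {rule['severity']}")
    return problems


# The same rule as in the example above.
rule = {
    "name": "High CPU Alert",
    "metric": "cpu_usage_percent",
    "condition": ">",
    "threshold": 80,
    "duration": "5m",
    "severity": "warning",
    "notify": ["ops@example.com"],
}
print(validate_alert_rule(rule))  # []
```

Checking on the client side only catches obvious mistakes early; the server remains the authority and would presumably answer a malformed rule with 400.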
Get Active Alerts
alerts = GET "/api/v1/monitoring/alerts/active"
IF alerts.count > 0 THEN
  TALK "Active alerts: " + alerts.count
  FOR EACH alert IN alerts.items
    TALK alert.severity + ": " + alert.message
  NEXT
ELSE
  TALK "No active alerts"
END IF
Search Logs
params = NEW OBJECT
params.query = "error"
params.level = "error"
params.start_time = "2025-01-01T00:00:00Z"
params.limit = 100
logs = GET "/api/v1/monitoring/logs/search?" + ENCODE_PARAMS(params)
FOR EACH log IN logs.entries
  TALK log.timestamp + " [" + log.level + "] " + log.message
NEXT
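The search parameters travel in the query string, so values such as timestamps need percent-encoding. The Python sketch below builds the equivalent URL with the standard library. The helper name `build_log_search_url` is hypothetical, and whether `ENCODE_PARAMS` produces exactly the same encoding is an assumption.

```python
from urllib.parse import urlencode

# Base URL as given at the top of this document.
BASE_URL = "http://localhost:8080/api/v1/monitoring"


def build_log_search_url(query, level=None, start_time=None, limit=None):
    """Build a log search URL, omitting parameters that were not supplied."""
    params = {"query": query, "level": level, "start_time": start_time, "limit": limit}
    supplied = {key: value for key, value in params.items() if value is not None}
    # urlencode percent-encodes reserved characters such as ":" in timestamps.
    return f"{BASE_URL}/logs/search?{urlencode(supplied)}"


url = build_log_search_url(
    "error", level="error", start_time="2025-01-01T00:00:00Z", limit=100
)
print(url)
# http://localhost:8080/api/v1/monitoring/logs/search?query=error&level=error&start_time=2025-01-01T00%3A00%3A00Z&limit=100
```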
Health Response Format
{
  "status": "healthy",
  "uptime": "5d 12h 30m",
  "version": "6.1.0",
  "components": [
    {"name": "database", "status": "healthy", "latency_ms": 2},
    {"name": "cache", "status": "healthy", "latency_ms": 1},
    {"name": "storage", "status": "healthy", "latency_ms": 5},
    {"name": "llm", "status": "healthy", "latency_ms": 150}
  ]
}
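As a rough illustration of consuming this payload, the Python sketch below parses the sample response and lists components that are either not healthy or slower than a latency budget. The helper name `degraded_components` and the 100 ms default budget are assumptions for illustration, not part of the planned API.

```python
import json

# The sample health response from this section.
SAMPLE_HEALTH = """
{
  "status": "healthy",
  "uptime": "5d 12h 30m",
  "version": "6.1.0",
  "components": [
    {"name": "database", "status": "healthy", "latency_ms": 2},
    {"name": "cache", "status": "healthy", "latency_ms": 1},
    {"name": "storage", "status": "healthy", "latency_ms": 5},
    {"name": "llm", "status": "healthy", "latency_ms": 150}
  ]
}
"""


def degraded_components(health, max_latency_ms=100):
    """Return names of components that are unhealthy or exceed max_latency_ms."""
    flagged = []
    for component in health.get("components", []):
        unhealthy = component.get("status") != "healthy"
        slow = component.get("latency_ms", 0) > max_latency_ms
        if unhealthy or slow:
            flagged.append(component["name"])
    return flagged


health = json.loads(SAMPLE_HEALTH)
print(degraded_components(health))  # ['llm']
```

With the sample data, only `llm` is flagged, because its 150 ms latency exceeds the assumed 100 ms budget even though its status is healthy.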
Metrics Format
Metrics are exposed in Prometheus format:
# HELP http_requests_total Total HTTP requests
# TYPE http_requests_total counter
http_requests_total{method="GET",status="200"} 12345
# HELP http_request_duration_seconds HTTP request latency
# TYPE http_request_duration_seconds histogram
http_request_duration_seconds_bucket{le="0.1"} 8000
http_request_duration_seconds_bucket{le="0.5"} 11000
http_request_duration_seconds_bucket{le="1"} 12000
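To show how a client might read this output, here is a minimal Python sketch that parses the sample above into (name, labels, value) tuples. It is deliberately simplified: it assumes no timestamps and no escaped quotes inside label values, and the name `parse_samples` is hypothetical. A production client would more likely rely on an existing Prometheus parsing library.

```python
import re

# The sample exposition text from this section.
SAMPLE_METRICS = '''\
# HELP http_requests_total Total HTTP requests
# TYPE http_requests_total counter
http_requests_total{method="GET",status="200"} 12345
# HELP http_request_duration_seconds HTTP request latency
# TYPE http_request_duration_seconds histogram
http_request_duration_seconds_bucket{le="0.1"} 8000
http_request_duration_seconds_bucket{le="0.5"} 11000
http_request_duration_seconds_bucket{le="1"} 12000
'''

# metric_name, optional {labels}, then a single value.
SAMPLE_LINE = re.compile(r'^([A-Za-z_:][A-Za-z0-9_:]*)(?:\{(.*)\})?\s+(\S+)$')
LABEL = re.compile(r'([A-Za-z_][A-Za-z0-9_]*)="([^"]*)"')


def parse_samples(text):
    """Parse simple exposition lines, skipping comments and unrecognized lines."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank line or HELP/TYPE comment
        match = SAMPLE_LINE.match(line)
        if match is None:
            continue  # outside the simplified grammar handled here
        name, raw_labels, value = match.groups()
        labels = dict(LABEL.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))
    return samples


samples = parse_samples(SAMPLE_METRICS)
# Histogram buckets are cumulative: each count includes all smaller buckets.
buckets = {labels["le"]: value for name, labels, value in samples if name.endswith("_bucket")}
print(buckets)  # {'0.1': 8000.0, '0.5': 11000.0, '1': 12000.0}
```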
Alert Severity Levels
| Level | Description |
|-------|-------------|
| info | Informational, no action required |
| warning | Attention needed, not critical |
| error | Error condition, requires attention |
| critical | Critical, immediate action required |
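A client that wants to page someone only for serious alerts needs an ordering over these levels. The Python sketch below assumes the table lists them from least to most urgent, which the descriptions suggest but the document does not state outright. The function name `at_least` and the sample alerts are hypothetical.

```python
# Assumed ordering, least to most urgent, following the table above.
SEVERITY_ORDER = ["info", "warning", "error", "critical"]


def at_least(alerts, minimum):
    """Return the alerts whose severity is at or above the given minimum level."""
    floor = SEVERITY_ORDER.index(minimum)
    return [a for a in alerts if SEVERITY_ORDER.index(a["severity"]) >= floor]


# Hypothetical alerts using the severity and message fields from the examples.
sample_alerts = [
    {"severity": "info", "message": "Nightly backup finished"},
    {"severity": "warning", "message": "CPU above 80% for 5m"},
    {"severity": "critical", "message": "Database unreachable"},
]

for alert in at_least(sample_alerts, "warning"):
    print(alert["severity"] + ": " + alert["message"])
# warning: CPU above 80% for 5m
# critical: Database unreachable
```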
Response Codes
| Code | Description |
|------|-------------|
| 200 | Success |
| 201 | Alert created |
| 400 | Bad Request (invalid parameters) |
| 401 | Unauthorized |
| 403 | Forbidden (insufficient permissions) |
| 404 | Resource not found |
| 500 | Internal Server Error |
| 503 | Service Unavailable |
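One way a client could act on these codes is to group them into a few coarse outcomes. The Python sketch below does that. The outcome names and the choice to treat 500 and 503 as retryable are assumptions based on common client conventions, not behavior this document specifies.

```python
def classify_status(code):
    """Map an HTTP status code from the table above to a coarse client action."""
    if code in (200, 201):
        return "ok"
    if code == 400:
        return "fix_request"        # invalid parameters: retrying unchanged will not help
    if code in (401, 403):
        return "check_credentials"  # missing or insufficient permissions
    if code == 404:
        return "not_found"
    if code in (500, 503):
        return "retry_later"        # assumption: server-side failures may be transient
    return "unexpected"             # any code not listed in the table


for code in (201, 403, 503):
    print(code, classify_status(code))
# 201 ok
# 403 check_credentials
# 503 retry_later
```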
Required Permissions
| Endpoint Category | Required Role |
|-------------------|---------------|
| Health Checks | Public (no auth for basic health) |
| Metrics | monitor or admin |
| Service Status | monitor or admin |
| Resource Usage | monitor or admin |
| Alert Configuration | admin |
| Logs | admin or log_viewer |
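The table above can be read as a lookup from endpoint category to the set of roles that may call it. The Python sketch below encodes it that way. The category keys and the function name `is_allowed` are hypothetical, and treating every health endpoint as public is a simplification, since the table only promises unauthenticated access to basic health.

```python
# Roles per endpoint category, transcribed from the table above.
# None marks a category that needs no authentication (assumed for all health checks).
REQUIRED_ROLES = {
    "health": None,
    "metrics": {"monitor", "admin"},
    "services": {"monitor", "admin"},
    "resources": {"monitor", "admin"},
    "alerts": {"admin"},
    "logs": {"admin", "log_viewer"},
}


def is_allowed(category, roles):
    """Return True if a caller holding the given roles may use the category."""
    required = REQUIRED_ROLES[category]
    if required is None:
        return True
    # Any single matching role is sufficient.
    return bool(required & set(roles))


print(is_allowed("metrics", ["monitor"]))   # True
print(is_allowed("alerts", ["monitor"]))    # False
print(is_allowed("logs", ["log_viewer"]))   # True
```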