Monitoring & APM Integration
The Monitoring integration connects your ITSM Automation platform to observability platforms — Prometheus, Grafana, Datadog, New Relic, Dynatrace, Zabbix, Nagios, and PRTG. Alerts from these platforms auto-create incidents with AI-enriched context, correlated with CMDB topology for blast-radius analysis. Every alert is matched against Configuration Items in your CMDB so on-call teams see exactly which services and dependencies are affected.
The alert model
The alert model represents a monitoring alert ingested from an external observability platform. Each alert is automatically correlated with CMDB Configuration Items, enriched with AI context, and optionally linked to an auto-created ITSM ticket.
Properties
- `id` (string): Unique identifier for the alert.
- `source` (string): The monitoring platform that generated the alert. One of `prometheus`, `grafana`, `datadog`, `newrelic`, `dynatrace`, `zabbix`, `nagios`, or `prtg`.
- `alert_name` (string): The alert rule name as defined in the source platform.
- `severity` (string): Alert severity level. One of `critical`, `warning`, or `info`.
- `status` (string): Current alert status. One of `firing`, `resolved`, `acknowledged`, or `silenced`.
- `target` (string): The affected resource — a hostname, service name, or URL.
- `ci_id` (string): Auto-matched CMDB Configuration Item ID, based on the alert's target and labels.
- `metric_name` (string): The metric that triggered the alert (e.g., `cpu_usage_percent`, `http_request_duration_seconds`).
- `metric_value` (number): The value of the metric at the time the alert fired.
- `threshold` (number): The threshold value that was breached to trigger the alert.
- `description` (string): Human-readable description of the alert condition.
- `labels` (object): Key-value labels from the source platform (e.g., `{"env": "production", "cluster": "us-east-1"}`).
- `annotations` (object): Annotations from the source platform containing runbook URLs, dashboards, or additional context.
- `ticket_id` (string): The ID of the auto-created ITSM ticket, if the alert triggered incident creation.
- `fired_at` (timestamp): Timestamp of when the alert first fired.
- `resolved_at` (timestamp): Timestamp of when the alert was resolved. Null if still active.
- `org_id` (string): The tenant organization ID that owns this alert.
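For readers wiring this model into client code, the properties above can be sketched as a Python dataclass. This is an illustrative shape, not an official SDK type: the field names follow the property list, while `MonitoringAlert` and the `is_active` helper are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class MonitoringAlert:
    """Sketch of the alert model; field names follow the property list above."""
    id: str
    source: str        # prometheus, grafana, datadog, newrelic, dynatrace, zabbix, nagios, prtg
    alert_name: str
    severity: str      # critical | warning | info
    status: str        # firing | resolved | acknowledged | silenced
    target: str
    ci_id: Optional[str] = None
    metric_name: Optional[str] = None
    metric_value: Optional[float] = None
    threshold: Optional[float] = None
    description: str = ""
    labels: dict = field(default_factory=dict)
    annotations: dict = field(default_factory=dict)
    ticket_id: Optional[str] = None
    fired_at: int = 0              # Unix timestamp
    resolved_at: Optional[int] = None
    org_id: str = ""

    @property
    def is_active(self) -> bool:
        # An alert counts as active until resolved_at is set.
        return self.resolved_at is None
```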
List alerts
This endpoint returns a paginated list of monitoring alerts across all connected sources. You can filter by source platform, severity, status, and target resource. Results are ordered by `fired_at` descending (most recent first).
Optional attributes
- `source` (string): Filter by monitoring source: `prometheus`, `grafana`, `datadog`, `newrelic`, `dynatrace`, `zabbix`, `nagios`, or `prtg`.
- `severity` (string): Filter by severity: `critical`, `warning`, or `info`.
- `status` (string): Filter by alert status: `firing`, `resolved`, `acknowledged`, or `silenced`.
- `target` (string): Filter by affected resource hostname or service name.
- `limit` (integer): Limit the number of alerts returned. Default is `20`, maximum is `100`.
Request
curl -G http://localhost:3000/v1/monitoring/alerts \
-H "Authorization: Bearer {token}" \
-d severity=critical \
-d status=firing \
-d limit=10
Response
{
"has_more": true,
"data": [
{
"id": "alrt_9xKpQ2mR4wVbN3",
"source": "prometheus",
"alert_name": "HighCpuUsage",
"severity": "critical",
"status": "firing",
"target": "api-server-prod-03.us-east-1",
"ci_id": "ci_7nLsW8kF9xYt",
"metric_name": "cpu_usage_percent",
"metric_value": 97.3,
"threshold": 90.0,
"description": "CPU usage on api-server-prod-03 has exceeded 90% for more than 5 minutes",
"labels": {
"env": "production",
"cluster": "us-east-1",
"service": "api-gateway",
"team": "platform"
},
"annotations": {
"runbook": "https://wiki.internal/runbooks/high-cpu",
"dashboard": "https://grafana.internal/d/cpu-overview"
},
"ticket_id": "tkt_4kPmN8rT2wXq",
"fired_at": 1708732800,
"resolved_at": null,
"org_id": "org_5jHtK4pR8uWn"
},
{
"id": "alrt_3nLsW8kF9xYtP7",
"source": "datadog",
"alert_name": "HighErrorRate",
"severity": "critical",
"status": "firing",
"target": "checkout-service",
"ci_id": "ci_8mWqR3nT5vXp",
"metric_name": "http_5xx_rate",
"metric_value": 12.8,
"threshold": 5.0,
"description": "5xx error rate for checkout-service has exceeded 5% for 3 minutes",
"labels": {
"env": "production",
"service": "checkout",
"region": "eu-west-1"
},
"annotations": {
"runbook": "https://wiki.internal/runbooks/checkout-errors"
},
"ticket_id": "tkt_6jHtK4pR8uWn",
"fired_at": 1708732500,
"resolved_at": null,
"org_id": "org_5jHtK4pR8uWn"
}
]
}
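Before issuing the request, the filters can be validated client-side so bad values fail fast instead of round-tripping to the API. The helper below is a hypothetical sketch (not part of any SDK); the allowed enum values and the limit bounds come from the attribute list above.

```python
VALID_SOURCES = {"prometheus", "grafana", "datadog", "newrelic",
                 "dynatrace", "zabbix", "nagios", "prtg"}
VALID_SEVERITIES = {"critical", "warning", "info"}
VALID_STATUSES = {"firing", "resolved", "acknowledged", "silenced"}

def build_alert_query(source=None, severity=None, status=None,
                      target=None, limit=20):
    """Validate list-alert filters and return the query-string params."""
    if source is not None and source not in VALID_SOURCES:
        raise ValueError(f"unknown source: {source}")
    if severity is not None and severity not in VALID_SEVERITIES:
        raise ValueError(f"unknown severity: {severity}")
    if status is not None and status not in VALID_STATUSES:
        raise ValueError(f"unknown status: {status}")
    if not 1 <= limit <= 100:  # documented default 20, maximum 100
        raise ValueError("limit must be between 1 and 100")
    params = {"source": source, "severity": severity,
              "status": status, "target": target, "limit": limit}
    # Drop unused filters so only explicit parameters reach the API.
    return {k: v for k, v in params.items() if v is not None}
```

The returned dict can be passed as query parameters to `GET /v1/monitoring/alerts` with any HTTP client.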
Get alert details
This endpoint retrieves the full details of a single alert, including its CMDB correlation data. The response includes the matched Configuration Item, its upstream and downstream dependencies, and a blast-radius summary showing which services may be affected.
Request
curl http://localhost:3000/v1/monitoring/alerts/alrt_9xKpQ2mR4wVbN3 \
-H "Authorization: Bearer {token}"
Response
{
"id": "alrt_9xKpQ2mR4wVbN3",
"source": "prometheus",
"alert_name": "HighCpuUsage",
"severity": "critical",
"status": "firing",
"target": "api-server-prod-03.us-east-1",
"ci_id": "ci_7nLsW8kF9xYt",
"metric_name": "cpu_usage_percent",
"metric_value": 97.3,
"threshold": 90.0,
"description": "CPU usage on api-server-prod-03 has exceeded 90% for more than 5 minutes",
"labels": {
"env": "production",
"cluster": "us-east-1",
"service": "api-gateway",
"team": "platform"
},
"annotations": {
"runbook": "https://wiki.internal/runbooks/high-cpu",
"dashboard": "https://grafana.internal/d/cpu-overview"
},
"ticket_id": "tkt_4kPmN8rT2wXq",
"fired_at": 1708732800,
"resolved_at": null,
"org_id": "org_5jHtK4pR8uWn",
"cmdb_correlation": {
"ci_name": "api-server-prod-03",
"ci_type": "server",
"environment": "production",
"owner_team": "platform-engineering",
"upstream_dependencies": ["load-balancer-prod", "dns-prod"],
"downstream_dependencies": ["checkout-service", "user-service", "inventory-service"],
"blast_radius": 3
}
}
Acknowledge an alert
This endpoint marks an alert as acknowledged. Acknowledging an alert signals that an engineer is aware of the issue and is actively investigating. Acknowledged alerts stop triggering repeat notifications but remain visible in dashboards.
Optional attributes
- `comment` (string): A note explaining who is investigating and what the initial assessment is.
Request
curl -X POST http://localhost:3000/v1/monitoring/alerts/alrt_9xKpQ2mR4wVbN3/acknowledge \
-H "Authorization: Bearer {token}" \
-H "Content-Type: application/json" \
-d '{
"comment": "Investigating high CPU — likely caused by the 14:00 deployment. Rolling back."
}'
Response
{
"id": "alrt_9xKpQ2mR4wVbN3",
"status": "acknowledged",
"acknowledged_by": "usr_2kNpL7mS9wYr",
"acknowledged_at": 1708733100,
"comment": "Investigating high CPU — likely caused by the 14:00 deployment. Rolling back."
}
Silence an alert
This endpoint silences an alert for a specified duration. Silenced alerts are suppressed from notifications and dashboards but continue to be recorded. Use this for planned maintenance windows or known conditions that are being addressed.
Required attributes
- `duration_minutes` (integer): How long to silence the alert, in minutes. Maximum is `10080` (7 days).
Optional attributes
- `reason` (string): The reason for silencing this alert (e.g., scheduled maintenance, known issue).
Request
curl -X POST http://localhost:3000/v1/monitoring/alerts/alrt_9xKpQ2mR4wVbN3/silence \
-H "Authorization: Bearer {token}" \
-H "Content-Type: application/json" \
-d '{
"duration_minutes": 120,
"reason": "Scheduled maintenance window — server reboot and patch application"
}'
Response
{
"id": "alrt_9xKpQ2mR4wVbN3",
"status": "silenced",
"silenced_by": "usr_2kNpL7mS9wYr",
"silenced_at": 1708733200,
"silenced_until": 1708740400,
"reason": "Scheduled maintenance window — server reboot and patch application"
}
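The `silenced_until` value in the response is simply the silence start time plus the requested duration. A minimal sketch, with the 7-day cap applied client-side (`silence_window` is a hypothetical helper name):

```python
MAX_SILENCE_MINUTES = 10080  # documented maximum: 7 days

def silence_window(silenced_at: int, duration_minutes: int) -> int:
    """Return the silenced_until Unix timestamp for a silence request."""
    if not 1 <= duration_minutes <= MAX_SILENCE_MINUTES:
        raise ValueError("duration_minutes must be between 1 and 10080")
    return silenced_at + duration_minutes * 60

# Matches the example response: 1708733200 + 120 * 60 == 1708740400
```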
Ingest alert webhook
This endpoint ingests alert webhooks from monitoring platforms. It supports the native webhook formats of Prometheus Alertmanager, Grafana, Datadog, New Relic, Dynatrace, Zabbix, Nagios, and PRTG. The platform auto-detects the source format, normalizes the alert into the internal model, matches against CMDB CIs, and optionally creates an ITSM ticket based on configured rules.
Authentication for webhooks uses HMAC-SHA256 signatures sent in the `X-Webhook-Signature` header, with the signing secret configured per monitoring source.
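Signature verification on the receiving side can be sketched with Python's standard `hmac` module. This assumes the header carries `sha256=<hex digest>` of an HMAC-SHA256 over the raw request body, as shown in the example below; confirm the exact signed payload against your source configuration.

```python
import hashlib
import hmac

def verify_webhook_signature(secret: str, body: bytes, header: str) -> bool:
    """Verify an X-Webhook-Signature header of the form 'sha256=<hex digest>'."""
    try:
        scheme, received = header.split("=", 1)
    except ValueError:
        return False          # malformed header
    if scheme != "sha256":
        return False          # unsupported scheme
    expected = hmac.new(secret.encode(), body, hashlib.sha256).hexdigest()
    # Constant-time comparison avoids leaking the digest via timing.
    return hmac.compare_digest(expected, received)
```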
Required attributes
- `body` (object): The raw webhook payload from the monitoring platform. Format varies by source — the platform auto-detects and normalizes.
Request
curl -X POST http://localhost:3000/v1/monitoring/webhook \
-H "X-Webhook-Signature: sha256=a1b2c3d4e5f6..." \
-H "Content-Type: application/json" \
-d '{
"receiver": "itsm-automation",
"status": "firing",
"alerts": [
{
"status": "firing",
"labels": {
"alertname": "HighMemoryUsage",
"severity": "warning",
"instance": "db-primary-01:9090",
"job": "postgres"
},
"annotations": {
"summary": "Memory usage above 85% on db-primary-01",
"runbook_url": "https://wiki.internal/runbooks/high-memory"
},
"startsAt": "2026-02-24T14:00:00.000Z",
"endsAt": "0001-01-01T00:00:00Z"
}
],
"groupLabels": { "alertname": "HighMemoryUsage" },
"commonLabels": { "severity": "warning" }
}'
Response
{
"ingested": 1,
"alerts": [
{
"id": "alrt_5pRtH5jN2cEmK8",
"source": "prometheus",
"alert_name": "HighMemoryUsage",
"severity": "warning",
"status": "firing",
"target": "db-primary-01",
"ci_id": "ci_3kPmN8rT2wXq",
"metric_name": "memory_usage_percent",
"metric_value": 87.2,
"threshold": 85.0,
"ticket_id": null,
"fired_at": 1708783200
}
]
}
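The normalization step can be illustrated for the Alertmanager case. This is a sketch, not the platform's actual mapper: it assumes the alert name, severity, and target are taken from the `alertname`, `severity`, and `instance` labels, with the scrape port stripped from the instance, which matches the request and response above. The real normalizer also handles the other seven source formats.

```python
def normalize_alertmanager_alert(raw: dict) -> dict:
    """Map one Prometheus Alertmanager alert into the internal alert shape."""
    labels = raw.get("labels", {})
    instance = labels.get("instance", "")
    return {
        "source": "prometheus",
        "alert_name": labels.get("alertname", "unknown"),
        "severity": labels.get("severity", "info"),
        "status": raw.get("status", "firing"),
        # Strip the scrape port: "db-primary-01:9090" -> "db-primary-01".
        "target": instance.rsplit(":", 1)[0] if ":" in instance else instance,
        "labels": labels,
        "annotations": raw.get("annotations", {}),
    }
```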
List monitoring sources
This endpoint returns all configured monitoring sources with their connection health and sync status. Each source represents a connected observability platform that sends alerts into the ITSM platform.
Request
curl http://localhost:3000/v1/monitoring/sources \
-H "Authorization: Bearer {token}"
Response
{
"data": [
{
"id": "msrc_7xKpQ2mR4wVb",
"type": "prometheus",
"name": "Prometheus Production",
"status": "connected",
"endpoint": "https://prometheus.internal:9090",
"auth_type": "bearer_token",
"webhook_url": "https://itsm.company.com/v1/monitoring/webhook",
"health": {
"status": "healthy",
"last_alert_received_at": 1708732800,
"alerts_ingested_24h": 47,
"latency_ms": 12
},
"created_at": 1706140800
},
{
"id": "msrc_3nLsW8kF9xYt",
"type": "datadog",
"name": "Datadog EU",
"status": "connected",
"endpoint": "https://api.datadoghq.eu",
"auth_type": "api_key",
"webhook_url": "https://itsm.company.com/v1/monitoring/webhook",
"health": {
"status": "healthy",
"last_alert_received_at": 1708732500,
"alerts_ingested_24h": 23,
"latency_ms": 89
},
"created_at": 1706227200
},
{
"id": "msrc_9pRtH5jN2cEm",
"type": "grafana",
"name": "Grafana Cloud",
"status": "connected",
"endpoint": "https://grafana.company.com",
"auth_type": "api_key",
"webhook_url": "https://itsm.company.com/v1/monitoring/webhook",
"health": {
"status": "degraded",
"last_alert_received_at": 1708730000,
"alerts_ingested_24h": 8,
"latency_ms": 342
},
"created_at": 1706313600
}
]
}
Configure a monitoring source
This endpoint configures a new monitoring source. Once configured, the platform generates a unique webhook URL for the source and validates the connection. Alert-to-ticket rules can be configured to automatically create incidents when specific severity levels or alert names are received.
Required attributes
- `type` (string): The monitoring platform type. One of `prometheus`, `grafana`, `datadog`, `newrelic`, `dynatrace`, `zabbix`, `nagios`, or `prtg`.
- `name` (string): Human-readable name for this monitoring source.
- `auth` (object): Authentication credentials. Structure varies by platform type.
Optional attributes
- `endpoint` (string): The monitoring platform API endpoint URL. Required for pull-based sources.
- `auto_create_ticket` (object): Rules for automatic ticket creation. Contains `enabled` (boolean), `severity_threshold` (string), and `alert_names` (array of strings).
Request
curl -X POST http://localhost:3000/v1/monitoring/sources \
-H "Authorization: Bearer {token}" \
-H "Content-Type: application/json" \
-d '{
"type": "prometheus",
"name": "Prometheus Production",
"auth": {
"type": "bearer_token",
"token": "prom-token-abc123"
},
"endpoint": "https://prometheus.internal:9090",
"auto_create_ticket": {
"enabled": true,
"severity_threshold": "critical",
"alert_names": ["HighCpuUsage", "HighMemoryUsage", "DiskSpaceLow"]
}
}'
Response
{
"id": "msrc_4kPmN8rT2wXq",
"type": "prometheus",
"name": "Prometheus Production",
"status": "connected",
"endpoint": "https://prometheus.internal:9090",
"auth_type": "bearer_token",
"webhook_url": "https://itsm.company.com/v1/monitoring/webhook?source=msrc_4kPmN8rT2wXq",
"webhook_secret": "whsec_a1b2c3d4e5f6g7h8",
"auto_create_ticket": {
"enabled": true,
"severity_threshold": "critical",
"alert_names": ["HighCpuUsage", "HighMemoryUsage", "DiskSpaceLow"]
},
"health": {
"status": "healthy",
"last_alert_received_at": null,
"alerts_ingested_24h": 0,
"latency_ms": 45
},
"created_at": 1708733400
}
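The docs do not spell out how `severity_threshold` and `alert_names` combine, so the sketch below assumes both conditions must hold: the alert meets the threshold, and, when `alert_names` is non-empty, its name is on the list. Treat that conjunction, and the helper name, as assumptions.

```python
SEVERITY_RANK = {"info": 0, "warning": 1, "critical": 2}

def should_create_ticket(alert: dict, rules: dict) -> bool:
    """Decide whether an incoming alert should open an ITSM ticket.

    One plausible reading of the auto_create_ticket rules: severity must
    meet the threshold, and a non-empty alert_names list acts as a filter.
    """
    if not rules.get("enabled", False):
        return False
    threshold = rules.get("severity_threshold", "critical")
    if SEVERITY_RANK[alert["severity"]] < SEVERITY_RANK[threshold]:
        return False
    names = rules.get("alert_names") or []
    return not names or alert["alert_name"] in names
```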
Get monitoring dashboard
This endpoint returns aggregated monitoring dashboard data for your organization. It includes alert counts grouped by severity and status, mean time to resolution (MTTR), top affected Configuration Items, and a timeline of alert activity over the past 24 hours.
Optional attributes
- `period` (string): Time period for aggregation. One of `1h`, `6h`, `24h`, `7d`, or `30d`. Defaults to `24h`.
- `source` (string): Filter dashboard data to a specific monitoring source.
Request
curl -G http://localhost:3000/v1/monitoring/dashboard \
-H "Authorization: Bearer {token}" \
-d period=24h
Response
{
"period": "24h",
"summary": {
"total_alerts": 78,
"firing": 12,
"acknowledged": 5,
"silenced": 3,
"resolved": 58
},
"by_severity": {
"critical": 8,
"warning": 31,
"info": 39
},
"mttr_minutes": {
"critical": 14.2,
"warning": 42.7,
"info": 128.5,
"overall": 61.8
},
"top_affected_cis": [
{ "ci_id": "ci_7nLsW8kF9xYt", "ci_name": "api-server-prod-03", "alert_count": 7 },
{ "ci_id": "ci_8mWqR3nT5vXp", "ci_name": "checkout-service", "alert_count": 5 },
{ "ci_id": "ci_3kPmN8rT2wXq", "ci_name": "db-primary-01", "alert_count": 4 },
{ "ci_id": "ci_6jHtK4pR8uWn", "ci_name": "redis-cluster-prod", "alert_count": 3 },
{ "ci_id": "ci_2kNpL7mS9wYr", "ci_name": "kafka-broker-02", "alert_count": 3 }
],
"tickets_created": 6,
"timeline": [
{ "hour": "2026-02-24T00:00:00Z", "fired": 2, "resolved": 3 },
{ "hour": "2026-02-24T01:00:00Z", "fired": 1, "resolved": 2 },
{ "hour": "2026-02-24T02:00:00Z", "fired": 0, "resolved": 1 }
]
}
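The `mttr_minutes` figures can be reproduced from raw alert records. This sketch assumes MTTR is the mean of `resolved_at - fired_at` over alerts that actually resolved within the period; the platform's exact aggregation (e.g., per-severity bucketing, rounding) may differ.

```python
def mttr_minutes(alerts: list[dict]) -> float:
    """Mean time to resolution across resolved alerts, in minutes."""
    durations = [
        (a["resolved_at"] - a["fired_at"]) / 60
        for a in alerts
        if a.get("resolved_at") is not None  # skip still-firing alerts
    ]
    if not durations:
        return 0.0
    return round(sum(durations) / len(durations), 1)
```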
Get correlated alerts
This endpoint returns correlated alerts and the affected CMDB topology for a given alert. The correlation engine groups alerts that share common infrastructure — for example, a failing database will correlate with application-level errors on services that depend on it. This provides blast-radius visibility and helps identify root cause faster.
Request
curl http://localhost:3000/v1/monitoring/correlation/alrt_9xKpQ2mR4wVbN3 \
-H "Authorization: Bearer {token}"
Response
{
"alert_id": "alrt_9xKpQ2mR4wVbN3",
"correlated_alerts": [
{
"id": "alrt_3nLsW8kF9xYtP7",
"alert_name": "HighErrorRate",
"source": "datadog",
"target": "checkout-service",
"severity": "critical",
"correlation_reason": "downstream_dependency",
"fired_at": 1708732500
},
{
"id": "alrt_8mWqR3nT5vXpL2",
"alert_name": "HighLatency",
"source": "prometheus",
"target": "user-service",
"severity": "warning",
"correlation_reason": "downstream_dependency",
"fired_at": 1708732600
},
{
"id": "alrt_6jHtK4pR8uWnM4",
"alert_name": "ConnectionPoolExhausted",
"source": "grafana",
"target": "api-server-prod-03",
"severity": "warning",
"correlation_reason": "same_host",
"fired_at": 1708732750
}
],
"topology": {
"root_ci": {
"id": "ci_7nLsW8kF9xYt",
"name": "api-server-prod-03",
"type": "server",
"status": "critical"
},
"affected_cis": [
{ "id": "ci_8mWqR3nT5vXp", "name": "checkout-service", "type": "application", "status": "critical" },
{ "id": "ci_2kNpL7mS9wYr", "name": "user-service", "type": "application", "status": "warning" },
{ "id": "ci_4kPmN8rT2wXq", "name": "inventory-service", "type": "application", "status": "warning" }
],
"blast_radius": 4,
"estimated_users_affected": 12400
}
}
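The blast-radius figure can be derived from the CMDB dependency graph with a breadth-first walk. A minimal sketch, assuming `downstream` maps a CI id to the CIs that depend on it; whether the root CI itself is counted toward `blast_radius` is an assumption, so treat the counting convention as illustrative.

```python
from collections import deque

def blast_radius(root: str, downstream: dict[str, list[str]]) -> set[str]:
    """Collect every CI reachable from root via downstream dependencies."""
    seen = {root}
    queue = deque([root])
    while queue:
        ci = queue.popleft()
        for dep in downstream.get(ci, []):
            if dep not in seen:       # guard against dependency cycles
                seen.add(dep)
                queue.append(dep)
    return seen
```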