Monitoring & APM Integration
The Monitoring integration connects your ITSM Automation platform to observability platforms — Prometheus, Grafana, Datadog, New Relic, Dynatrace, Zabbix, Nagios, and PRTG. Alerts from these platforms auto-create incidents with AI-enriched context, correlated with CMDB topology for blast-radius analysis. Every alert is matched against Configuration Items in your CMDB so on-call teams see exactly which services and dependencies are affected.
The alert model
The alert model represents a monitoring alert ingested from an external observability platform. Each alert is automatically correlated with CMDB Configuration Items, enriched with AI context, and optionally linked to an auto-created ITSM ticket.
Properties
- `id` (string): Unique identifier for the alert.
- `source` (string): The monitoring platform that generated the alert. One of `prometheus`, `grafana`, `datadog`, `newrelic`, `dynatrace`, `zabbix`, `nagios`, or `prtg`.
- `alert_name` (string): The alert rule name as defined in the source platform.
- `severity` (string): Alert severity level. One of `critical`, `warning`, or `info`.
- `status` (string): Current alert status. One of `firing`, `resolved`, `acknowledged`, or `silenced`.
- `target` (string): The affected resource — a hostname, service name, or URL.
- `ci_id` (string): Auto-matched CMDB Configuration Item ID, based on the alert's target and labels.
- `metric_name` (string): The metric that triggered the alert (e.g., `cpu_usage_percent`, `http_request_duration_seconds`).
- `metric_value` (number): The value of the metric at the time the alert fired.
- `threshold` (number): The threshold value that was breached to trigger the alert.
- `description` (string): Human-readable description of the alert condition.
- `labels` (object): Key-value labels from the source platform (e.g., `{"env": "production", "cluster": "us-east-1"}`).
- `annotations` (object): Annotations from the source platform containing runbook URLs, dashboards, or additional context.
- `ticket_id` (string): The ID of the auto-created ITSM ticket, if the alert triggered incident creation.
- `fired_at` (timestamp): Timestamp of when the alert first fired.
- `resolved_at` (timestamp): Timestamp of when the alert was resolved. Null if still active.
- `org_id` (string): The tenant organization ID that owns this alert.
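For readers wiring this model into client code, the properties above can be sketched as a Python dataclass. This is an illustrative shape, not an official SDK type: the field names follow the property list, while `MonitoringAlert` and the `is_active` helper are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class MonitoringAlert:
    """Sketch of the alert model; field names follow the property list above."""
    id: str
    source: str        # prometheus, grafana, datadog, newrelic, dynatrace, zabbix, nagios, prtg
    alert_name: str
    severity: str      # critical | warning | info
    status: str        # firing | resolved | acknowledged | silenced
    target: str
    ci_id: Optional[str] = None
    metric_name: Optional[str] = None
    metric_value: Optional[float] = None
    threshold: Optional[float] = None
    description: str = ""
    labels: dict = field(default_factory=dict)
    annotations: dict = field(default_factory=dict)
    ticket_id: Optional[str] = None
    fired_at: int = 0              # Unix timestamp
    resolved_at: Optional[int] = None
    org_id: str = ""

    @property
    def is_active(self) -> bool:
        # An alert counts as active until resolved_at is set.
        return self.resolved_at is None
```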
List alerts
This endpoint returns a paginated list of monitoring alerts across all connected sources. You can filter by source platform, severity, status, and target resource. Results are ordered by `fired_at` descending (most recent first).
Optional attributes
- `source` (string): Filter by monitoring source: `prometheus`, `grafana`, `datadog`, `newrelic`, `dynatrace`, `zabbix`, `nagios`, or `prtg`.
- `severity` (string): Filter by severity: `critical`, `warning`, or `info`.
- `status` (string): Filter by alert status: `firing`, `resolved`, `acknowledged`, or `silenced`.
- `target` (string): Filter by affected resource hostname or service name.
- `limit` (integer): Limit the number of alerts returned. Default is `20`, maximum is `100`.
Request
curl -G http://localhost:3000/v1/monitoring/alerts \
-H "Authorization: Bearer {token}" \
-d severity=critical \
-d status=firing \
-d limit=10
Response
{
"has_more": true,
"data": [
{
"id": "alrt_9xKpQ2mR4wVbN3",
"source": "prometheus",
"alert_name": "HighCpuUsage",
"severity": "critical",
"status": "firing",
"target": "api-server-prod-03.us-east-1",
"ci_id": "ci_7nLsW8kF9xYt",
"metric_name": "cpu_usage_percent",
"metric_value": 97.3,
"threshold": 90.0,
"description": "CPU usage on api-server-prod-03 has exceeded 90% for more than 5 minutes",
"labels": {
"env": "production",
"cluster": "us-east-1",
"service": "api-gateway",
"team": "platform"
},
"annotations": {
"runbook": "https://wiki.internal/runbooks/high-cpu",
"dashboard": "https://grafana.internal/d/cpu-overview"
},
"ticket_id": "tkt_4kPmN8rT2wXq",
"fired_at": 1708732800,
"resolved_at": null,
"org_id": "org_5jHtK4pR8uWn"
},
{
"id": "alrt_3nLsW8kF9xYtP7",
"source": "datadog",
"alert_name": "HighErrorRate",
"severity": "critical",
"status": "firing",
"target": "checkout-service",
"ci_id": "ci_8mWqR3nT5vXp",
"metric_name": "http_5xx_rate",
"metric_value": 12.8,
"threshold": 5.0,
"description": "5xx error rate for checkout-service has exceeded 5% for 3 minutes",
"labels": {
"env": "production",
"service": "checkout",
"region": "eu-west-1"
},
"annotations": {
"runbook": "https://wiki.internal/runbooks/checkout-errors"
},
"ticket_id": "tkt_6jHtK4pR8uWn",
"fired_at": 1708732500,
"resolved_at": null,
"org_id": "org_5jHtK4pR8uWn"
}
]
}
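Before issuing the request, the filters can be validated client-side so bad values fail fast instead of round-tripping to the API. The helper below is a hypothetical sketch (not part of any SDK); the allowed enum values and the limit bounds come from the attribute list above.

```python
VALID_SOURCES = {"prometheus", "grafana", "datadog", "newrelic",
                 "dynatrace", "zabbix", "nagios", "prtg"}
VALID_SEVERITIES = {"critical", "warning", "info"}
VALID_STATUSES = {"firing", "resolved", "acknowledged", "silenced"}

def build_alert_query(source=None, severity=None, status=None,
                      target=None, limit=20):
    """Validate list-alert filters and return the query-string params."""
    if source is not None and source not in VALID_SOURCES:
        raise ValueError(f"unknown source: {source}")
    if severity is not None and severity not in VALID_SEVERITIES:
        raise ValueError(f"unknown severity: {severity}")
    if status is not None and status not in VALID_STATUSES:
        raise ValueError(f"unknown status: {status}")
    if not 1 <= limit <= 100:  # documented default 20, maximum 100
        raise ValueError("limit must be between 1 and 100")
    params = {"source": source, "severity": severity,
              "status": status, "target": target, "limit": limit}
    # Drop unused filters so only explicit parameters reach the API.
    return {k: v for k, v in params.items() if v is not None}
```

The returned dict can be passed as query parameters to `GET /v1/monitoring/alerts` with any HTTP client.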
Get alert details
This endpoint retrieves the full details of a single alert, including its CMDB correlation data. The response includes the matched Configuration Item, its upstream and downstream dependencies, and a blast-radius summary showing which services may be affected.
Request
curl http://localhost:3000/v1/monitoring/alerts/alrt_9xKpQ2mR4wVbN3 \
-H "Authorization: Bearer {token}"
Response
{
"id": "alrt_9xKpQ2mR4wVbN3",
"source": "prometheus",
"alert_name": "HighCpuUsage",
"severity": "critical",
"status": "firing",
"target": "api-server-prod-03.us-east-1",
"ci_id": "ci_7nLsW8kF9xYt",
"metric_name": "cpu_usage_percent",
"metric_value": 97.3,
"threshold": 90.0,
"description": "CPU usage on api-server-prod-03 has exceeded 90% for more than 5 minutes",
"labels": {
"env": "production",
"cluster": "us-east-1",
"service": "api-gateway",
"team": "platform"
},
"annotations": {
"runbook": "https://wiki.internal/runbooks/high-cpu",
"dashboard": "https://grafana.internal/d/cpu-overview"
},
"ticket_id": "tkt_4kPmN8rT2wXq",
"fired_at": 1708732800,
"resolved_at": null,
"org_id": "org_5jHtK4pR8uWn",
"cmdb_correlation": {
"ci_name": "api-server-prod-03",
"ci_type": "server",
"environment": "production",
"owner_team": "platform-engineering",
"upstream_dependencies": ["load-balancer-prod", "dns-prod"],
"downstream_dependencies": ["checkout-service", "user-service", "inventory-service"],
"blast_radius": 3
}
}
Acknowledge an alert
This endpoint marks an alert as acknowledged. Acknowledging an alert signals that an engineer is aware of the issue and is actively investigating. Acknowledged alerts stop triggering repeat notifications but remain visible in dashboards.
Optional attributes
- `comment` (string): A note explaining who is investigating and what the initial assessment is.
Request
curl -X POST http://localhost:3000/v1/monitoring/alerts/alrt_9xKpQ2mR4wVbN3/acknowledge \
-H "Authorization: Bearer {token}" \
-H "Content-Type: application/json" \
-d '{
"comment": "Investigating high CPU — likely caused by the 14:00 deployment. Rolling back."
}'
Response
{
"id": "alrt_9xKpQ2mR4wVbN3",
"status": "acknowledged",
"acknowledged_by": "usr_2kNpL7mS9wYr",
"acknowledged_at": 1708733100,
"comment": "Investigating high CPU — likely caused by the 14:00 deployment. Rolling back."
}
Silence an alert
This endpoint silences an alert for a specified duration. Silenced alerts are suppressed from notifications and dashboards but continue to be recorded. Use this for planned maintenance windows or known conditions that are being addressed.
Required attributes
- `duration_minutes` (integer): How long to silence the alert, in minutes. Maximum is `10080` (7 days).
Optional attributes
- `reason` (string): The reason for silencing this alert (e.g., scheduled maintenance, known issue).
Request
curl -X POST http://localhost:3000/v1/monitoring/alerts/alrt_9xKpQ2mR4wVbN3/silence \
-H "Authorization: Bearer {token}" \
-H "Content-Type: application/json" \
-d '{
"duration_minutes": 120,
"reason": "Scheduled maintenance window — server reboot and patch application"
}'
Response
{
"id": "alrt_9xKpQ2mR4wVbN3",
"status": "silenced",
"silenced_by": "usr_2kNpL7mS9wYr",
"silenced_at": 1708733200,
"silenced_until": 1708740400,
"reason": "Scheduled maintenance window — server reboot and patch application"
}
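The `silenced_until` value in the response is simply the silence start time plus the requested duration. A minimal sketch, with the 7-day cap applied client-side (`silence_window` is a hypothetical helper name):

```python
MAX_SILENCE_MINUTES = 10080  # documented maximum: 7 days

def silence_window(silenced_at: int, duration_minutes: int) -> int:
    """Return the silenced_until Unix timestamp for a silence request."""
    if not 1 <= duration_minutes <= MAX_SILENCE_MINUTES:
        raise ValueError("duration_minutes must be between 1 and 10080")
    return silenced_at + duration_minutes * 60

# Matches the example response: 1708733200 + 120 * 60 == 1708740400
```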
Ingest alert webhook
This endpoint ingests alert webhooks from monitoring platforms. It supports the native webhook formats of Prometheus Alertmanager, Grafana, Datadog, New Relic, Dynatrace, Zabbix, Nagios, and PRTG. The platform auto-detects the source format, normalizes the alert into the internal model, matches against CMDB CIs, and optionally creates an ITSM ticket based on configured rules.
Authentication for webhooks uses HMAC-SHA256 signatures sent in the `X-Webhook-Signature` header, with the signing secret configured per monitoring source.
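Signature verification on the receiving side can be sketched with Python's standard `hmac` module. This assumes the header carries `sha256=<hex digest>` of an HMAC-SHA256 over the raw request body, as shown in the example below; confirm the exact signed payload against your source configuration.

```python
import hashlib
import hmac

def verify_webhook_signature(secret: str, body: bytes, header: str) -> bool:
    """Verify an X-Webhook-Signature header of the form 'sha256=<hex digest>'."""
    try:
        scheme, received = header.split("=", 1)
    except ValueError:
        return False          # malformed header
    if scheme != "sha256":
        return False          # unsupported scheme
    expected = hmac.new(secret.encode(), body, hashlib.sha256).hexdigest()
    # Constant-time comparison avoids leaking the digest via timing.
    return hmac.compare_digest(expected, received)
```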
Required attributes
- `body` (object): The raw webhook payload from the monitoring platform. Format varies by source — the platform auto-detects and normalizes.
Request
curl -X POST http://localhost:3000/v1/monitoring/webhook \
-H "X-Webhook-Signature: sha256=a1b2c3d4e5f6..." \
-H "Content-Type: application/json" \
-d '{
"receiver": "itsm-automation",
"status": "firing",
"alerts": [
{
"status": "firing",
"labels": {
"alertname": "HighMemoryUsage",
"severity": "warning",
"instance": "db-primary-01:9090",
"job": "postgres"
},
"annotations": {
"summary": "Memory usage above 85% on db-primary-01",
"runbook_url": "https://wiki.internal/runbooks/high-memory"
},
"startsAt": "2026-02-24T14:00:00.000Z",
"endsAt": "0001-01-01T00:00:00Z"
}
],
"groupLabels": { "alertname": "HighMemoryUsage" },
"commonLabels": { "severity": "warning" }
}'
Response
{
"ingested": 1,
"alerts": [
{
"id": "alrt_5pRtH5jN2cEmK8",
"source": "prometheus",
"alert_name": "HighMemoryUsage",
"severity": "warning",
"status": "firing",
"target": "db-primary-01",
"ci_id": "ci_3kPmN8rT2wXq",
"metric_name": "memory_usage_percent",
"metric_value": 87.2,
"threshold": 85.0,
"ticket_id": null,
"fired_at": 1708783200
}
]
}
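The normalization step can be illustrated for the Alertmanager case. This is a sketch, not the platform's actual mapper: it assumes the alert name, severity, and target are taken from the `alertname`, `severity`, and `instance` labels, with the scrape port stripped from the instance, which matches the request and response above. The real normalizer also handles the other seven source formats.

```python
def normalize_alertmanager_alert(raw: dict) -> dict:
    """Map one Prometheus Alertmanager alert into the internal alert shape."""
    labels = raw.get("labels", {})
    instance = labels.get("instance", "")
    return {
        "source": "prometheus",
        "alert_name": labels.get("alertname", "unknown"),
        "severity": labels.get("severity", "info"),
        "status": raw.get("status", "firing"),
        # Strip the scrape port: "db-primary-01:9090" -> "db-primary-01".
        "target": instance.rsplit(":", 1)[0] if ":" in instance else instance,
        "labels": labels,
        "annotations": raw.get("annotations", {}),
    }
```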
List monitoring sources
This endpoint returns all configured monitoring sources with their connection health and sync status. Each source represents a connected observability platform that sends alerts into the ITSM platform.
Request
curl http://localhost:3000/v1/monitoring/sources \
-H "Authorization: Bearer {token}"
Response
{
"data": [
{
"id": "msrc_7xKpQ2mR4wVb",
"type": "prometheus",
"name": "Prometheus Production",
"status": "connected",
"endpoint": "https://prometheus.internal:9090",
"auth_type": "bearer_token",
"webhook_url": "https://itsm.company.com/v1/monitoring/webhook",
"health": {
"status": "healthy",
"last_alert_received_at": 1708732800,
"alerts_ingested_24h": 47,
"latency_ms": 12
},
"created_at": 1706140800
},
{
"id": "msrc_3nLsW8kF9xYt",
"type": "datadog",
"name": "Datadog EU",
"status": "connected",
"endpoint": "https://api.datadoghq.eu",
"auth_type": "api_key",
"webhook_url": "https://itsm.company.com/v1/monitoring/webhook",
"health": {
"status": "healthy",
"last_alert_received_at": 1708732500,
"alerts_ingested_24h": 23,
"latency_ms": 89
},
"created_at": 1706227200
},
{
"id": "msrc_9pRtH5jN2cEm",
"type": "grafana",
"name": "Grafana Cloud",
"status": "connected",
"endpoint": "https://grafana.company.com",
"auth_type": "api_key",
"webhook_url": "https://itsm.company.com/v1/monitoring/webhook",
"health": {
"status": "degraded",
"last_alert_received_at": 1708730000,
"alerts_ingested_24h": 8,
"latency_ms": 342
},
"created_at": 1706313600
}
]
}
Configure a monitoring source
This endpoint configures a new monitoring source. Once configured, the platform generates a unique webhook URL for the source and validates the connection. Alert-to-ticket rules can be configured to automatically create incidents when specific severity levels or alert names are received.
Required attributes
- `type` (string): The monitoring platform type. One of `prometheus`, `grafana`, `datadog`, `newrelic`, `dynatrace`, `zabbix`, `nagios`, or `prtg`.
- `name` (string): Human-readable name for this monitoring source.
- `auth` (object): Authentication credentials. Structure varies by platform type.
Optional attributes
- `endpoint` (string): The monitoring platform API endpoint URL. Required for pull-based sources.
- `auto_create_ticket` (object): Rules for automatic ticket creation. Contains `enabled` (boolean), `severity_threshold` (string), and `alert_names` (array of strings).
Request
curl -X POST http://localhost:3000/v1/monitoring/sources \
-H "Authorization: Bearer {token}" \
-H "Content-Type: application/json" \
-d '{
"type": "prometheus",
"name": "Prometheus Production",
"auth": {
"type": "bearer_token",
"token": "prom-token-abc123"
},
"endpoint": "https://prometheus.internal:9090",
"auto_create_ticket": {
"enabled": true,
"severity_threshold": "critical",
"alert_names": ["HighCpuUsage", "HighMemoryUsage", "DiskSpaceLow"]
}
}'
Response
{
"id": "msrc_4kPmN8rT2wXq",
"type": "prometheus",
"name": "Prometheus Production",
"status": "connected",
"endpoint": "https://prometheus.internal:9090",
"auth_type": "bearer_token",
"webhook_url": "https://itsm.company.com/v1/monitoring/webhook?source=msrc_4kPmN8rT2wXq",
"webhook_secret": "whsec_a1b2c3d4e5f6g7h8",
"auto_create_ticket": {
"enabled": true,
"severity_threshold": "critical",
"alert_names": ["HighCpuUsage", "HighMemoryUsage", "DiskSpaceLow"]
},
"health": {
"status": "healthy",
"last_alert_received_at": null,
"alerts_ingested_24h": 0,
"latency_ms": 45
},
"created_at": 1708733400
}
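The docs do not spell out how `severity_threshold` and `alert_names` combine, so the sketch below assumes both conditions must hold: the alert meets the threshold, and, when `alert_names` is non-empty, its name is on the list. Treat that conjunction, and the helper name, as assumptions.

```python
SEVERITY_RANK = {"info": 0, "warning": 1, "critical": 2}

def should_create_ticket(alert: dict, rules: dict) -> bool:
    """Decide whether an incoming alert should open an ITSM ticket.

    One plausible reading of the auto_create_ticket rules: severity must
    meet the threshold, and a non-empty alert_names list acts as a filter.
    """
    if not rules.get("enabled", False):
        return False
    threshold = rules.get("severity_threshold", "critical")
    if SEVERITY_RANK[alert["severity"]] < SEVERITY_RANK[threshold]:
        return False
    names = rules.get("alert_names") or []
    return not names or alert["alert_name"] in names
```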
Get monitoring dashboard
This endpoint returns aggregated monitoring dashboard data for your organization. It includes alert counts grouped by severity and status, mean time to resolution (MTTR), top affected Configuration Items, and a timeline of alert activity over the past 24 hours.
Optional attributes
- `period` (string): Time period for aggregation. One of `1h`, `6h`, `24h`, `7d`, or `30d`. Defaults to `24h`.
- `source` (string): Filter dashboard data to a specific monitoring source.
Request
curl -G http://localhost:3000/v1/monitoring/dashboard \
-H "Authorization: Bearer {token}" \
-d period=24h
Response
{
"period": "24h",
"summary": {
"total_alerts": 78,
"firing": 12,
"acknowledged": 5,
"silenced": 3,
"resolved": 58
},
"by_severity": {
"critical": 8,
"warning": 31,
"info": 39
},
"mttr_minutes": {
"critical": 14.2,
"warning": 42.7,
"info": 128.5,
"overall": 61.8
},
"top_affected_cis": [
{ "ci_id": "ci_7nLsW8kF9xYt", "ci_name": "api-server-prod-03", "alert_count": 7 },
{ "ci_id": "ci_8mWqR3nT5vXp", "ci_name": "checkout-service", "alert_count": 5 },
{ "ci_id": "ci_3kPmN8rT2wXq", "ci_name": "db-primary-01", "alert_count": 4 },
{ "ci_id": "ci_6jHtK4pR8uWn", "ci_name": "redis-cluster-prod", "alert_count": 3 },
{ "ci_id": "ci_2kNpL7mS9wYr", "ci_name": "kafka-broker-02", "alert_count": 3 }
],
"tickets_created": 6,
"timeline": [
{ "hour": "2026-02-24T00:00:00Z", "fired": 2, "resolved": 3 },
{ "hour": "2026-02-24T01:00:00Z", "fired": 1, "resolved": 2 },
{ "hour": "2026-02-24T02:00:00Z", "fired": 0, "resolved": 1 }
]
}
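The `mttr_minutes` figures can be reproduced from raw alert records. This sketch assumes MTTR is the mean of `resolved_at - fired_at` over alerts that actually resolved within the period; the platform's exact aggregation (e.g., per-severity bucketing, rounding) may differ.

```python
def mttr_minutes(alerts: list[dict]) -> float:
    """Mean time to resolution across resolved alerts, in minutes."""
    durations = [
        (a["resolved_at"] - a["fired_at"]) / 60
        for a in alerts
        if a.get("resolved_at") is not None  # skip still-firing alerts
    ]
    if not durations:
        return 0.0
    return round(sum(durations) / len(durations), 1)
```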
Get correlated alerts
This endpoint returns correlated alerts and the affected CMDB topology for a given alert. The correlation engine groups alerts that share common infrastructure — for example, a failing database will correlate with application-level errors on services that depend on it. This provides blast-radius visibility and helps identify root cause faster.
Request
curl http://localhost:3000/v1/monitoring/correlation/alrt_9xKpQ2mR4wVbN3 \
-H "Authorization: Bearer {token}"
Response
{
"alert_id": "alrt_9xKpQ2mR4wVbN3",
"correlated_alerts": [
{
"id": "alrt_3nLsW8kF9xYtP7",
"alert_name": "HighErrorRate",
"source": "datadog",
"target": "checkout-service",
"severity": "critical",
"correlation_reason": "downstream_dependency",
"fired_at": 1708732500
},
{
"id": "alrt_8mWqR3nT5vXpL2",
"alert_name": "HighLatency",
"source": "prometheus",
"target": "user-service",
"severity": "warning",
"correlation_reason": "downstream_dependency",
"fired_at": 1708732600
},
{
"id": "alrt_6jHtK4pR8uWnM4",
"alert_name": "ConnectionPoolExhausted",
"source": "grafana",
"target": "api-server-prod-03",
"severity": "warning",
"correlation_reason": "same_host",
"fired_at": 1708732750
}
],
"topology": {
"root_ci": {
"id": "ci_7nLsW8kF9xYt",
"name": "api-server-prod-03",
"type": "server",
"status": "critical"
},
"affected_cis": [
{ "id": "ci_8mWqR3nT5vXp", "name": "checkout-service", "type": "application", "status": "critical" },
{ "id": "ci_2kNpL7mS9wYr", "name": "user-service", "type": "application", "status": "warning" },
{ "id": "ci_4kPmN8rT2wXq", "name": "inventory-service", "type": "application", "status": "warning" }
],
"blast_radius": 4,
"estimated_users_affected": 12400
}
}
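The blast-radius figure can be derived from the CMDB dependency graph with a breadth-first walk. A minimal sketch, assuming `downstream` maps a CI id to the CIs that depend on it; whether the root CI itself is counted toward `blast_radius` is an assumption, so treat the counting convention as illustrative.

```python
from collections import deque

def blast_radius(root: str, downstream: dict[str, list[str]]) -> set[str]:
    """Collect every CI reachable from root via downstream dependencies."""
    seen = {root}
    queue = deque([root])
    while queue:
        ci = queue.popleft()
        for dep in downstream.get(ci, []):
            if dep not in seen:       # guard against dependency cycles
                seen.add(dep)
                queue.append(dep)
    return seen
```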