Monitoring & APM Integration

The Monitoring integration connects your ITSM Automation platform to observability platforms — Prometheus, Grafana, Datadog, New Relic, Dynatrace, Zabbix, Nagios, and PRTG. Alerts from these platforms auto-create incidents with AI-enriched context, correlated with CMDB topology for blast-radius analysis. Every alert is matched against Configuration Items in your CMDB so on-call teams see exactly which services and dependencies are affected.

The alert model

The alert model represents a monitoring alert ingested from an external observability platform. Each alert is automatically correlated with CMDB Configuration Items, enriched with AI context, and optionally linked to an auto-created ITSM ticket.

Properties

  • id (string): Unique identifier for the alert.

  • source (string): The monitoring platform that generated the alert. One of prometheus, grafana, datadog, newrelic, dynatrace, zabbix, nagios, or prtg.

  • alert_name (string): The alert rule name as defined in the source platform.

  • severity (string): Alert severity level. One of critical, warning, or info.

  • status (string): Current alert status. One of firing, resolved, acknowledged, or silenced.

  • target (string): The affected resource — a hostname, service name, or URL.

  • ci_id (string): Auto-matched CMDB Configuration Item ID based on target and labels.

  • metric_name (string): The metric that triggered the alert (e.g., cpu_usage_percent, http_request_duration_seconds).

  • metric_value (number): The current value of the metric at the time the alert fired.

  • threshold (number): The threshold value that was breached to trigger the alert.

  • description (string): Human-readable description of the alert condition.

  • labels (object): Key-value labels from the source platform (e.g., {"env": "production", "cluster": "us-east-1"}).

  • annotations (object): Annotations from the source platform containing runbook URLs, dashboards, or additional context.

  • ticket_id (string): The ID of the auto-created ITSM ticket, if the alert triggered incident creation.

  • fired_at (timestamp): Timestamp of when the alert first fired.

  • resolved_at (timestamp): Timestamp of when the alert was resolved. Null if still active.

  • org_id (string): The tenant organization ID that owns this alert.
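For client code, the alert shape above can be written down as a typed dictionary. This is an illustrative sketch, not part of the API surface — the Alert name and the is_active helper are ours; is_active follows the resolved_at semantics described above (null while still active).

```python
from typing import Optional, TypedDict

class Alert(TypedDict):
    """Typed-dict sketch of the alert model (illustrative)."""
    id: str
    source: str            # prometheus | grafana | datadog | newrelic | dynatrace | zabbix | nagios | prtg
    alert_name: str
    severity: str          # critical | warning | info
    status: str            # firing | resolved | acknowledged | silenced
    target: str
    ci_id: Optional[str]
    metric_name: str
    metric_value: float
    threshold: float
    description: str
    labels: dict[str, str]
    annotations: dict[str, str]
    ticket_id: Optional[str]
    fired_at: int               # Unix timestamp
    resolved_at: Optional[int]  # None while the alert is still active
    org_id: str

def is_active(alert: Alert) -> bool:
    # Per the resolved_at description: null (None) means still active.
    return alert["resolved_at"] is None
```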


GET /v1/monitoring/alerts

List alerts

This endpoint returns a paginated list of monitoring alerts across all connected sources. You can filter by source platform, severity, status, and target resource. Results are ordered by fired_at descending (most recent first).

Optional attributes

  • source (string): Filter by monitoring source: prometheus, grafana, datadog, newrelic, dynatrace, zabbix, nagios, or prtg.

  • severity (string): Filter by severity: critical, warning, or info.

  • status (string): Filter by alert status: firing, resolved, acknowledged, or silenced.

  • target (string): Filter by affected resource hostname or service name.

  • limit (integer): Limit the number of alerts returned. Default is 20, maximum is 100.

Request

GET
/v1/monitoring/alerts
curl -G http://localhost:3000/v1/monitoring/alerts \
  -H "Authorization: Bearer {token}" \
  -d severity=critical \
  -d status=firing \
  -d limit=10

Response

{
  "has_more": true,
  "data": [
    {
      "id": "alrt_9xKpQ2mR4wVbN3",
      "source": "prometheus",
      "alert_name": "HighCpuUsage",
      "severity": "critical",
      "status": "firing",
      "target": "api-server-prod-03.us-east-1",
      "ci_id": "ci_7nLsW8kF9xYt",
      "metric_name": "cpu_usage_percent",
      "metric_value": 97.3,
      "threshold": 90.0,
      "description": "CPU usage on api-server-prod-03 has exceeded 90% for more than 5 minutes",
      "labels": {
        "env": "production",
        "cluster": "us-east-1",
        "service": "api-gateway",
        "team": "platform"
      },
      "annotations": {
        "runbook": "https://wiki.internal/runbooks/high-cpu",
        "dashboard": "https://grafana.internal/d/cpu-overview"
      },
      "ticket_id": "tkt_4kPmN8rT2wXq",
      "fired_at": 1708732800,
      "resolved_at": null,
      "org_id": "org_5jHtK4pR8uWn"
    },
    {
      "id": "alrt_3nLsW8kF9xYtP7",
      "source": "datadog",
      "alert_name": "HighErrorRate",
      "severity": "critical",
      "status": "firing",
      "target": "checkout-service",
      "ci_id": "ci_8mWqR3nT5vXp",
      "metric_name": "http_5xx_rate",
      "metric_value": 12.8,
      "threshold": 5.0,
      "description": "5xx error rate for checkout-service has exceeded 5% for 3 minutes",
      "labels": {
        "env": "production",
        "service": "checkout",
        "region": "eu-west-1"
      },
      "annotations": {
        "runbook": "https://wiki.internal/runbooks/checkout-errors"
      },
      "ticket_id": "tkt_6jHtK4pR8uWn",
      "fired_at": 1708732500,
      "resolved_at": null,
      "org_id": "org_5jHtK4pR8uWn"
    }
  ]
}
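A client that wants every matching alert has to keep requesting pages while has_more is true. The sketch below is a hedged illustration: the cursor parameter name, starting_after, is an assumption — the cursor scheme is not specified above, so confirm it for your deployment. The page-fetching function is injected so the loop stays transport-agnostic.

```python
def iter_alerts(fetch_page, limit=20, **filters):
    """Yield every alert across pages, following has_more.

    fetch_page(params) must return one decoded response page: a dict
    with "data" (list of alerts) and "has_more" (bool). The
    starting_after cursor parameter is an assumption, not documented
    above.
    """
    params = dict(filters, limit=limit)
    while True:
        page = fetch_page(params)
        yield from page["data"]
        if not page.get("has_more") or not page["data"]:
            break
        # Resume after the last alert on this page (assumed cursor).
        params["starting_after"] = page["data"][-1]["id"]
```

With the requests library, fetch_page could be as small as `lambda params: session.get(base + "/v1/monitoring/alerts", params=params).json()`.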

GET /v1/monitoring/alerts/:id

Get alert details

This endpoint retrieves the full details of a single alert, including its CMDB correlation data. The response includes the matched Configuration Item, its upstream and downstream dependencies, and a blast-radius summary showing which services may be affected.

Request

GET
/v1/monitoring/alerts/alrt_9xKpQ2mR4wVbN3
curl http://localhost:3000/v1/monitoring/alerts/alrt_9xKpQ2mR4wVbN3 \
  -H "Authorization: Bearer {token}"

Response

{
  "id": "alrt_9xKpQ2mR4wVbN3",
  "source": "prometheus",
  "alert_name": "HighCpuUsage",
  "severity": "critical",
  "status": "firing",
  "target": "api-server-prod-03.us-east-1",
  "ci_id": "ci_7nLsW8kF9xYt",
  "metric_name": "cpu_usage_percent",
  "metric_value": 97.3,
  "threshold": 90.0,
  "description": "CPU usage on api-server-prod-03 has exceeded 90% for more than 5 minutes",
  "labels": {
    "env": "production",
    "cluster": "us-east-1",
    "service": "api-gateway",
    "team": "platform"
  },
  "annotations": {
    "runbook": "https://wiki.internal/runbooks/high-cpu",
    "dashboard": "https://grafana.internal/d/cpu-overview"
  },
  "ticket_id": "tkt_4kPmN8rT2wXq",
  "fired_at": 1708732800,
  "resolved_at": null,
  "org_id": "org_5jHtK4pR8uWn",
  "cmdb_correlation": {
    "ci_name": "api-server-prod-03",
    "ci_type": "server",
    "environment": "production",
    "owner_team": "platform-engineering",
    "upstream_dependencies": ["load-balancer-prod", "dns-prod"],
    "downstream_dependencies": ["checkout-service", "user-service", "inventory-service"],
    "blast_radius": 3
  }
}

POST /v1/monitoring/alerts/:id/acknowledge

Acknowledge an alert

This endpoint marks an alert as acknowledged. Acknowledging an alert signals that an engineer is aware of the issue and is actively investigating. Acknowledged alerts stop triggering repeat notifications but remain visible in dashboards.

Optional attributes

  • comment (string): A note explaining who is investigating and what the initial assessment is.

Request

POST
/v1/monitoring/alerts/alrt_9xKpQ2mR4wVbN3/acknowledge
curl -X POST http://localhost:3000/v1/monitoring/alerts/alrt_9xKpQ2mR4wVbN3/acknowledge \
  -H "Authorization: Bearer {token}" \
  -H "Content-Type: application/json" \
  -d '{
    "comment": "Investigating high CPU — likely caused by the 14:00 deployment. Rolling back."
  }'

Response

{
  "id": "alrt_9xKpQ2mR4wVbN3",
  "status": "acknowledged",
  "acknowledged_by": "usr_2kNpL7mS9wYr",
  "acknowledged_at": 1708733100,
  "comment": "Investigating high CPU — likely caused by the 14:00 deployment. Rolling back."
}

POST /v1/monitoring/alerts/:id/silence

Silence an alert

This endpoint silences an alert for a specified duration. Silenced alerts are suppressed from notifications and dashboards but continue to be recorded. Use this for planned maintenance windows or known conditions that are being addressed.

Required attributes

  • duration_minutes (integer): How long to silence the alert, in minutes. Maximum is 10080 (7 days).

Optional attributes

  • reason (string): The reason for silencing this alert (e.g., scheduled maintenance, known issue).
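In the response, silenced_until is simply silenced_at plus the requested duration converted to seconds. A one-line sketch of that arithmetic (the function name is ours, not part of the API):

```python
def silenced_until(silenced_at: int, duration_minutes: int) -> int:
    # silenced_until = silenced_at + duration, both as Unix timestamps
    return silenced_at + duration_minutes * 60
```

Using the values from the example below: silencing at 1708733200 for 120 minutes yields 1708740400.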

Request

POST
/v1/monitoring/alerts/alrt_9xKpQ2mR4wVbN3/silence
curl -X POST http://localhost:3000/v1/monitoring/alerts/alrt_9xKpQ2mR4wVbN3/silence \
  -H "Authorization: Bearer {token}" \
  -H "Content-Type: application/json" \
  -d '{
    "duration_minutes": 120,
    "reason": "Scheduled maintenance window — server reboot and patch application"
  }'

Response

{
  "id": "alrt_9xKpQ2mR4wVbN3",
  "status": "silenced",
  "silenced_by": "usr_2kNpL7mS9wYr",
  "silenced_at": 1708733200,
  "silenced_until": 1708740400,
  "reason": "Scheduled maintenance window — server reboot and patch application"
}

POST /v1/monitoring/webhook

Ingest alert webhook

This endpoint ingests alert webhooks from monitoring platforms. It supports the native webhook formats of Prometheus Alertmanager, Grafana, Datadog, New Relic, Dynatrace, Zabbix, Nagios, and PRTG. The platform auto-detects the source format, normalizes the alert into the internal model, matches against CMDB CIs, and optionally creates an ITSM ticket based on configured rules.

Authentication for webhooks uses HMAC-SHA256 signatures via the X-Webhook-Signature header, configured per monitoring source.
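On the receiving side (or when testing a source locally), such a signature can be verified with Python's standard hmac module. This is a minimal sketch assuming the header carries sha256=<hexdigest> computed over the raw request body with the per-source webhook secret, as the request example below suggests; confirm the exact format for your source configuration.

```python
import hashlib
import hmac

def verify_webhook_signature(secret: str, body: bytes, header_value: str) -> bool:
    """Check an X-Webhook-Signature header of the form sha256=<hexdigest>.

    Assumes HMAC-SHA256 over the raw request body, keyed with the
    webhook secret. Uses a constant-time comparison to avoid leaking
    the digest through timing.
    """
    expected = "sha256=" + hmac.new(secret.encode(), body, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, header_value)
```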

Required attributes

  • body (object): The raw webhook payload from the monitoring platform. Format varies by source — the platform auto-detects and normalizes.

Request

POST
/v1/monitoring/webhook
curl -X POST http://localhost:3000/v1/monitoring/webhook \
  -H "X-Webhook-Signature: sha256=a1b2c3d4e5f6..." \
  -H "Content-Type: application/json" \
  -d '{
    "receiver": "itsm-automation",
    "status": "firing",
    "alerts": [
      {
        "status": "firing",
        "labels": {
          "alertname": "HighMemoryUsage",
          "severity": "warning",
          "instance": "db-primary-01:9090",
          "job": "postgres"
        },
        "annotations": {
          "summary": "Memory usage above 85% on db-primary-01",
          "runbook_url": "https://wiki.internal/runbooks/high-memory"
        },
        "startsAt": "2024-02-24T14:00:00.000Z",
        "endsAt": "0001-01-01T00:00:00Z"
      }
    ],
    "groupLabels": { "alertname": "HighMemoryUsage" },
    "commonLabels": { "severity": "warning" }
  }'

Response

{
  "ingested": 1,
  "alerts": [
    {
      "id": "alrt_5pRtH5jN2cEmK8",
      "source": "prometheus",
      "alert_name": "HighMemoryUsage",
      "severity": "warning",
      "status": "firing",
      "target": "db-primary-01",
      "ci_id": "ci_3kPmN8rT2wXq",
      "metric_name": "memory_usage_percent",
      "metric_value": 87.2,
      "threshold": 85.0,
      "ticket_id": null,
      "fired_at": 1708783200
    }
  ]
}

GET /v1/monitoring/sources

List monitoring sources

This endpoint returns all configured monitoring sources with their connection health and sync status. Each source represents a connected observability platform that sends alerts into the ITSM platform.

Request

GET
/v1/monitoring/sources
curl http://localhost:3000/v1/monitoring/sources \
  -H "Authorization: Bearer {token}"

Response

{
  "data": [
    {
      "id": "msrc_7xKpQ2mR4wVb",
      "type": "prometheus",
      "name": "Prometheus Production",
      "status": "connected",
      "endpoint": "https://prometheus.internal:9090",
      "auth_type": "bearer_token",
      "webhook_url": "https://itsm.company.com/v1/monitoring/webhook",
      "health": {
        "status": "healthy",
        "last_alert_received_at": 1708732800,
        "alerts_ingested_24h": 47,
        "latency_ms": 12
      },
      "created_at": 1706140800
    },
    {
      "id": "msrc_3nLsW8kF9xYt",
      "type": "datadog",
      "name": "Datadog EU",
      "status": "connected",
      "endpoint": "https://api.datadoghq.eu",
      "auth_type": "api_key",
      "webhook_url": "https://itsm.company.com/v1/monitoring/webhook",
      "health": {
        "status": "healthy",
        "last_alert_received_at": 1708732500,
        "alerts_ingested_24h": 23,
        "latency_ms": 89
      },
      "created_at": 1706227200
    },
    {
      "id": "msrc_9pRtH5jN2cEm",
      "type": "grafana",
      "name": "Grafana Cloud",
      "status": "connected",
      "endpoint": "https://grafana.company.com",
      "auth_type": "api_key",
      "webhook_url": "https://itsm.company.com/v1/monitoring/webhook",
      "health": {
        "status": "degraded",
        "last_alert_received_at": 1708730000,
        "alerts_ingested_24h": 8,
        "latency_ms": 342
      },
      "created_at": 1706313600
    }
  ]
}

POST /v1/monitoring/sources

Configure a monitoring source

This endpoint configures a new monitoring source. Once configured, the platform generates a unique webhook URL for the source and validates the connection. Alert-to-ticket rules can be configured to automatically create incidents when specific severity levels or alert names are received.

Required attributes

  • type (string): The monitoring platform type. One of prometheus, grafana, datadog, newrelic, dynatrace, zabbix, nagios, or prtg.

  • name (string): Human-readable name for this monitoring source.

  • auth (object): Authentication credentials. Structure varies by platform type.

Optional attributes

  • endpoint (string): The monitoring platform API endpoint URL. Required for pull-based sources.

  • auto_create_ticket (object): Rules for automatic ticket creation. Contains enabled (boolean), severity_threshold (string), and alert_names (array of strings).

Request

POST
/v1/monitoring/sources
curl -X POST http://localhost:3000/v1/monitoring/sources \
  -H "Authorization: Bearer {token}" \
  -H "Content-Type: application/json" \
  -d '{
    "type": "prometheus",
    "name": "Prometheus Production",
    "auth": {
      "type": "bearer_token",
      "token": "prom-token-abc123"
    },
    "endpoint": "https://prometheus.internal:9090",
    "auto_create_ticket": {
      "enabled": true,
      "severity_threshold": "critical",
      "alert_names": ["HighCpuUsage", "HighMemoryUsage", "DiskSpaceLow"]
    }
  }'

Response

{
  "id": "msrc_4kPmN8rT2wXq",
  "type": "prometheus",
  "name": "Prometheus Production",
  "status": "connected",
  "endpoint": "https://prometheus.internal:9090",
  "auth_type": "bearer_token",
  "webhook_url": "https://itsm.company.com/v1/monitoring/webhook?source=msrc_4kPmN8rT2wXq",
  "webhook_secret": "whsec_a1b2c3d4e5f6g7h8",
  "auto_create_ticket": {
    "enabled": true,
    "severity_threshold": "critical",
    "alert_names": ["HighCpuUsage", "HighMemoryUsage", "DiskSpaceLow"]
  },
  "health": {
    "status": "healthy",
    "last_alert_received_at": null,
    "alerts_ingested_24h": 0,
    "latency_ms": 45
  },
  "created_at": 1708733400
}

GET /v1/monitoring/dashboard

Get monitoring dashboard

This endpoint returns aggregated monitoring dashboard data for your organization. It includes alert counts grouped by severity and status, mean time to resolution (MTTR), top affected Configuration Items, and a timeline of alert activity over the past 24 hours.
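The MTTR figures can be reproduced client-side from raw alerts: for each resolved alert, take resolved_at minus fired_at and average. A minimal sketch — whether the platform aggregates in exactly this way (simple mean over resolved alerts in the period) is an assumption:

```python
from statistics import mean

def mttr_minutes(alerts):
    """Mean time to resolution in minutes over resolved alerts.

    Unresolved alerts (resolved_at is None) are excluded, matching the
    alert model's resolved_at semantics.
    """
    durations = [(a["resolved_at"] - a["fired_at"]) / 60
                 for a in alerts
                 if a.get("resolved_at") is not None]
    return round(mean(durations), 1) if durations else None
```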

Optional attributes

  • period (string): Time period for aggregation. One of 1h, 6h, 24h, 7d, or 30d. Defaults to 24h.

  • source (string): Filter dashboard data to a specific monitoring source.

Request

GET
/v1/monitoring/dashboard
curl -G http://localhost:3000/v1/monitoring/dashboard \
  -H "Authorization: Bearer {token}" \
  -d period=24h

Response

{
  "period": "24h",
  "summary": {
    "total_alerts": 78,
    "firing": 12,
    "acknowledged": 5,
    "silenced": 3,
    "resolved": 58
  },
  "by_severity": {
    "critical": 8,
    "warning": 31,
    "info": 39
  },
  "mttr_minutes": {
    "critical": 14.2,
    "warning": 42.7,
    "info": 128.5,
    "overall": 61.8
  },
  "top_affected_cis": [
    { "ci_id": "ci_7nLsW8kF9xYt", "ci_name": "api-server-prod-03", "alert_count": 7 },
    { "ci_id": "ci_8mWqR3nT5vXp", "ci_name": "checkout-service", "alert_count": 5 },
    { "ci_id": "ci_3kPmN8rT2wXq", "ci_name": "db-primary-01", "alert_count": 4 },
    { "ci_id": "ci_6jHtK4pR8uWn", "ci_name": "redis-cluster-prod", "alert_count": 3 },
    { "ci_id": "ci_2kNpL7mS9wYr", "ci_name": "kafka-broker-02", "alert_count": 3 }
  ],
  "tickets_created": 6,
  "timeline": [
    { "hour": "2024-02-24T00:00:00Z", "fired": 2, "resolved": 3 },
    { "hour": "2024-02-24T01:00:00Z", "fired": 1, "resolved": 2 },
    { "hour": "2024-02-24T02:00:00Z", "fired": 0, "resolved": 1 }
  ]
}

GET /v1/monitoring/correlation/:alertId

Get correlated alerts

This endpoint returns correlated alerts and the affected CMDB topology for a given alert. The correlation engine groups alerts that share common infrastructure — for example, a failing database will correlate with application-level errors on services that depend on it. This provides blast-radius visibility and helps identify root cause faster.
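One simple client-side use of this payload is grouping the correlated alerts by correlation_reason, separating same-host noise from genuine downstream impact. A sketch (the helper is illustrative, not part of the API):

```python
from collections import defaultdict

def group_by_reason(correlated_alerts):
    """Map each correlation_reason to the list of affected targets."""
    groups = defaultdict(list)
    for alert in correlated_alerts:
        groups[alert["correlation_reason"]].append(alert["target"])
    return dict(groups)
```

Applied to the example response below, this yields two downstream_dependency targets (checkout-service, user-service) and one same_host target (api-server-prod-03).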

Request

GET
/v1/monitoring/correlation/alrt_9xKpQ2mR4wVbN3
curl http://localhost:3000/v1/monitoring/correlation/alrt_9xKpQ2mR4wVbN3 \
  -H "Authorization: Bearer {token}"

Response

{
  "alert_id": "alrt_9xKpQ2mR4wVbN3",
  "correlated_alerts": [
    {
      "id": "alrt_3nLsW8kF9xYtP7",
      "alert_name": "HighErrorRate",
      "source": "datadog",
      "target": "checkout-service",
      "severity": "critical",
      "correlation_reason": "downstream_dependency",
      "fired_at": 1708732500
    },
    {
      "id": "alrt_8mWqR3nT5vXpL2",
      "alert_name": "HighLatency",
      "source": "prometheus",
      "target": "user-service",
      "severity": "warning",
      "correlation_reason": "downstream_dependency",
      "fired_at": 1708732600
    },
    {
      "id": "alrt_6jHtK4pR8uWnM4",
      "alert_name": "ConnectionPoolExhausted",
      "source": "grafana",
      "target": "api-server-prod-03",
      "severity": "warning",
      "correlation_reason": "same_host",
      "fired_at": 1708732750
    }
  ],
  "topology": {
    "root_ci": {
      "id": "ci_7nLsW8kF9xYt",
      "name": "api-server-prod-03",
      "type": "server",
      "status": "critical"
    },
    "affected_cis": [
      { "id": "ci_8mWqR3nT5vXp", "name": "checkout-service", "type": "application", "status": "critical" },
      { "id": "ci_2kNpL7mS9wYr", "name": "user-service", "type": "application", "status": "warning" },
      { "id": "ci_4kPmN8rT2wXq", "name": "inventory-service", "type": "application", "status": "warning" }
    ],
    "blast_radius": 4,
    "estimated_users_affected": 12400
  }
}
