Architecture

The ITSM Automation platform is a multi-tenant, AI-driven IT Service Management system built for enterprise-scale operations. This document describes the full system architecture spanning 130+ NestJS modules, 430+ database tables, and a multi-provider AI gateway -- all deployed on Kubernetes with HPE AI Essentials.


Tech stack

The platform is composed of purpose-built layers, each selected for enterprise reliability and horizontal scalability. The backend alone contains 92 controllers exposing 500+ endpoints, while the frontend delivers 200+ routes through 90+ React components.

Backend

  • Runtime (NestJS + TypeScript)
    130+ modules, 92 controllers, 500+ REST endpoints with full OpenAPI documentation. Built on NestJS for dependency injection, modular architecture, and decorator-driven route handling.

  • ORM (Prisma)
    Type-safe database access with generated client, migrations, and introspection against PostgreSQL 16. Schema covers 430+ tables and 91 enums.

  • Validation (class-validator + class-transformer)
    DTO-level input validation on every endpoint with automatic transformation and whitelist enforcement.

Frontend

  • Framework (Next.js 15 + React 19)
    200+ routes with server-side rendering, React Server Components, and streaming. App Router architecture with nested layouts.

  • Styling (TailwindCSS)
    90+ components built with utility-first CSS, design tokens, and dark mode support across all tenant themes.

Infrastructure

  • Database (PostgreSQL 16 + pgvector)
    430+ tables, 91 enums, row-level security for tenant isolation. UUIDv7 primary keys for time-sortable identifiers.

  • Cache and Queue (Redis + BullMQ)
    Three Redis instances (cache, queue, locks). 12 BullMQ queues for background processing. Redlock for distributed locking.

  • Workflows (Temporal.io)
    5 task queues for durable workflow orchestration: ticket lifecycle, SLA enforcement, approval chains, scheduled reports, and bulk operations.

  • AI (multi-provider LLM gateway)
    OpenAI, vLLM (HPE 120B GPT), and Ollama with automatic fallback. Qdrant vector database for RAG and semantic search.

  • Deployment (Kubernetes + Helm + Istio)
    HPE AI Essentials for on-premises GPU inference. Prometheus, Grafana, and Jaeger for observability.

Quick verification

# Count NestJS modules
find backend/src -name "*.module.ts" | wc -l
# => 130+

# Count controllers
find backend/src -name "*.controller.ts" | wc -l
# => 92

Service topology

The platform runs as a set of independently deployable services, each with dedicated health checks and monitoring endpoints. All inter-service communication uses either gRPC (Temporal workers) or HTTP with circuit breakers.

Core services

  • API Server (port 3000)
    NestJS application serving all REST endpoints. The /health endpoint reports service status, database connectivity, Redis availability, and Temporal connection state.

  • Frontend (port 3001)
    Next.js 15 application with SSR. Serves the tenant-aware dashboard, ticket management UI, knowledge base portal, and admin console.

  • Worker (port 9464)
    BullMQ processor and Temporal activity worker. Exposes Prometheus metrics at /metrics for queue depth, processing latency, and failure rates across all 12 queues.

Data services

  • PostgreSQL (port 5432)
    Primary datastore with 430+ tables, the pgvector extension for embedding storage, and row-level security policies for multi-tenant isolation.

  • Redis (port 6379)
    Three Redis instances for application caching, BullMQ job queues, and Redlock distributed locking. Separate instances ensure lock contention does not affect queue throughput.

  • Qdrant (port 6333)
    Vector database for semantic search, RAG document retrieval, and similar-ticket lookup. Stores embeddings generated by the AI gateway.

  • Temporal (port 7233)
    Workflow orchestration engine managing 5 task queues for long-running processes: ticket lifecycle, SLA escalation, approval chains, scheduled reports, and bulk operations.

Health checks

curl -sf http://localhost:3000/health | jq
# {
#   "status": "healthy",
#   "database": "connected",
#   "redis": "connected",
#   "temporal": "connected",
#   "uptime": 86400
# }
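The /health payload shown above can be assembled from individual dependency probes. A minimal sketch (the checker inputs and field names are illustrative, mirroring the sample response): overall status is healthy only when every dependency is up.

```typescript
// Sketch: aggregate dependency checks into the /health response shape above.
// Probe results are passed in as booleans; real probes ping each service.
type DependencyStatus = "connected" | "disconnected";

interface Health {
  status: "healthy" | "degraded";
  database: DependencyStatus;
  redis: DependencyStatus;
  temporal: DependencyStatus;
  uptime: number;
}

function buildHealth(
  checks: { database: boolean; redis: boolean; temporal: boolean },
  uptimeSec: number,
): Health {
  const s = (ok: boolean): DependencyStatus => (ok ? "connected" : "disconnected");
  const allUp = checks.database && checks.redis && checks.temporal;
  return {
    status: allUp ? "healthy" : "degraded",
    database: s(checks.database),
    redis: s(checks.redis),
    temporal: s(checks.temporal),
    uptime: uptimeSec,
  };
}
```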

Observability services

  • Prometheus (port 9090)
    Metrics collection and alerting. Scrapes all service endpoints at 15-second intervals with 30-day retention.

  • Grafana (port 3005)
    Dashboards for SLA compliance, ticket throughput, AI model latency, queue health, and per-tenant usage. Pre-built dashboards ship with the Helm chart.

  • Jaeger (port 16686)
    Distributed tracing across API, worker, Temporal, and AI gateway calls. Every ticket triage request generates a full trace from ingestion through AI classification to database persistence.

Service map

{
  "services": {
    "api":        { "port": 3000, "health": "/health" },
    "frontend":   { "port": 3001, "health": "/" },
    "worker":     { "port": 9464, "health": "/metrics" },
    "temporal":   { "port": 7233 },
    "postgresql": { "port": 5432 },
    "redis":      { "port": 6379 },
    "qdrant":     { "port": 6333 },
    "prometheus": { "port": 9090 },
    "grafana":    { "port": 3005 },
    "jaeger":     { "port": 16686 }
  }
}

Architecture layers

The system follows a strict five-layer architecture. Each layer has a well-defined responsibility boundary and communicates only with its immediate neighbors.

Layer 1: Presentation

The presentation layer is stateless by design. All session state lives in JWT tokens and server-side caches.

  • Dashboard (Next.js App Router)
    Real-time ticket overview, SLA timers, team workload visualization, and AI-suggested actions. 45+ routes.

  • Tenant Portal (multi-theme SSR)
    Organization-branded self-service portal for ticket submission, knowledge base, and resolution tracking.

  • Admin Console (role-gated routes)
    System configuration, user management, SLA policy editor, workflow designer, and AI model tuning. 30+ routes.

Frontend structure

{
  "routes": {
    "dashboard": "45+ pages",
    "admin": "30+ pages",
    "portal": "25+ pages"
  },
  "stores": 12,
  "components": "90+"
}

Layer 2: API Gateway

Single entry point for all client requests with authentication, authorization, rate limiting, and validation.

  • Authentication (JWT RS256)
    Stateless JWTs with 15-minute access tokens and 7-day refresh tokens. Per-device tracking and revocation via Redis.

  • Authorization (RBAC + ABAC)
    5 role levels: end_user, agent, team_lead, admin, org_owner. Permissions evaluated per request via NestJS guards.

  • Rate Limiting (Redis token bucket)
    Per-tenant, per-endpoint rate limiting. Configurable burst and sustained rates per subscription tier.

  • Validation (class-validator DTOs)
    Every endpoint enforces input validation through DTO decorators with structured error responses.

Rate limit headers

curl -si http://localhost:3000/api/v1/tickets \
  -H "Authorization: Bearer {jwt_token}" \
  | grep -i "x-ratelimit"
# X-RateLimit-Limit: 1000
# X-RateLimit-Remaining: 997
# X-RateLimit-Reset: 1708819200
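The per-tenant, per-endpoint token bucket described above can be sketched in-memory. The production limiter lives in Redis; the bucket parameters and key shape below are illustrative, not the actual tier values.

```typescript
// Minimal in-memory token bucket: capacity = burst size, refillPerSec =
// sustained rate. One bucket per (tenant, endpoint) pair, keyed the same
// way the Redis keys would be.
class TokenBucket {
  private tokens: number;
  private lastRefill: number;

  constructor(
    private readonly capacity: number,
    private readonly refillPerSec: number,
    now: number = Date.now(),
  ) {
    this.tokens = capacity;
    this.lastRefill = now;
  }

  tryConsume(now: number = Date.now()): boolean {
    // Refill proportionally to elapsed time, capped at capacity.
    const elapsedSec = (now - this.lastRefill) / 1000;
    this.tokens = Math.min(this.capacity, this.tokens + elapsedSec * this.refillPerSec);
    this.lastRefill = now;
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return true;
    }
    return false;
  }
}

const buckets = new Map<string, TokenBucket>();

function allow(orgId: string, endpoint: string, now = Date.now()): boolean {
  const key = `${orgId}:${endpoint}`;
  let bucket = buckets.get(key);
  if (!bucket) {
    bucket = new TokenBucket(5, 1, now); // example: burst 5, 1 req/s sustained
    buckets.set(key, bucket);
  }
  return bucket.tryConsume(now);
}
```

Because buckets are keyed by tenant and endpoint, exhausting one tenant's burst leaves every other tenant's quota untouched.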

Layer 3: AI Orchestration

Provider-agnostic AI gateway routing inference to OpenAI, vLLM, or Ollama with automatic fallback. Every call is traced, logged, and validated through guardrails.

  • Intent Classification (LLM inference)
    Classifies tickets into service categories using the tenant service catalogue. Supports multiple providers with automatic fallback.

  • Priority Prediction (LLM inference)
    Predicts P1-P5 priority based on description, affected service, historical patterns, and SLA impact. Confidence scores attached.

  • Entity Extraction (LLM inference)
    Extracts structured entities: affected systems, error codes, user identifiers, and Polish PII (PESEL, NIP, IBAN).

  • RAG Search (Qdrant + LLM)
    Retrieval-augmented generation over the knowledge base and resolution history, ranked by semantic similarity.

AI triage request

curl -sf http://localhost:3000/api/v1/triage \
  -X POST \
  -H "Authorization: Bearer {jwt_token}" \
  -H "Content-Type: application/json" \
  -d '{
    "title": "Cannot access production database",
    "description": "Connection timeout on pg-prod-01 since 14:30 UTC. API pods returning 503.",
    "reporter_email": "oncall@acme.com"
  }' | jq
# {
#   "ticket_id": "TKT-20260224-0042",
#   "classification": {
#     "category": "Infrastructure",
#     "subcategory": "Database",
#     "confidence": 0.94
#   },
#   "priority": { "level": "P1", "confidence": 0.97 },
#   "entities": {
#     "affected_system": "pg-prod-01",
#     "error_type": "connection_timeout"
#   },
#   "model_used": "hpe-gpt-120b",
#   "trace_id": "abc123def456"
# }

Layer 4: Business Logic

Core ITSM domain: ticket lifecycle, SLA enforcement, approval workflows, and escalation policies. Long-running processes orchestrated through Temporal.

  • Ticket Lifecycle (Temporal workflow)
    Full state machine from creation through triage, assignment, resolution, and closure. Each transition is audited.

  • SLA Engine (BullMQ + Temporal)
    Monitors response and resolution time targets per priority and tenant SLA policy. Auto-escalation on breach.

  • Approval Workflows (Temporal workflow)
    Multi-level approval chains for change requests. Sequential, parallel, and quorum-based patterns with timeout escalation.

  • Notification Engine (BullMQ)
    6 channels: email (SMTP), SMS (Twilio), push (FCM), Slack, Teams, and webhooks. Dedicated queues with retry.
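The ticket lifecycle workflow is, at its core, a transition table. A sketch of that state machine follows; the state names other than OPEN (the schema default) are illustrative, since this page does not enumerate the full TicketStatus enum.

```typescript
// Hypothetical transition table for the ticket lifecycle: creation -> triage ->
// assignment -> resolution -> closure, with reassignment and reopen edges.
type TicketStatus = "OPEN" | "TRIAGED" | "ASSIGNED" | "RESOLVED" | "CLOSED";

const transitions: Record<TicketStatus, TicketStatus[]> = {
  OPEN:     ["TRIAGED"],
  TRIAGED:  ["ASSIGNED"],
  ASSIGNED: ["RESOLVED", "TRIAGED"], // reassignment sends it back to triage
  RESOLVED: ["CLOSED", "ASSIGNED"],  // reopening before closure
  CLOSED:   [],                      // terminal state
};

function transition(from: TicketStatus, to: TicketStatus): TicketStatus {
  if (!transitions[from].includes(to)) {
    throw new Error(`illegal transition ${from} -> ${to}`);
  }
  // In the real workflow each successful transition also emits an audit record.
  return to;
}
```

Encoding the table as data keeps every legal edge in one place, which is what makes "each transition is audited" enforceable.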

Queue and workflow configuration

{
  "queues": [
    { "name": "ticket-triage", "concurrency": 5 },
    { "name": "sla-monitor", "concurrency": 3 },
    { "name": "notification-email", "concurrency": 10 },
    { "name": "notification-slack", "concurrency": 5 },
    { "name": "notification-teams", "concurrency": 5 },
    { "name": "approval-processor", "concurrency": 3 },
    { "name": "report-generator", "concurrency": 2 },
    { "name": "bulk-operations", "concurrency": 2 },
    { "name": "knowledge-indexer", "concurrency": 3 },
    { "name": "audit-logger", "concurrency": 5 },
    { "name": "webhook-dispatch", "concurrency": 5 },
    { "name": "ai-embedding", "concurrency": 4 }
  ]
}

Layer 5: Data Layer

Persistent storage, caching, and vector search. Each technology serves a distinct purpose with no overlap.

  • PostgreSQL 16 (primary datastore)
    430+ tables with pgvector. All transactional data. Row-level security enforces tenant isolation at the database level.

  • Redis (cache + queue + locks)
    Three instances: caching (6379), BullMQ queues (6380), and Redlock distributed locking (6381).

  • Qdrant (vector database)
    Embedding storage for semantic search. Collections partitioned by tenant. Configurable dimensions and distance metrics.

  • Temporal (workflow state)
    Persists workflow execution state, activity results, and timer schedules. Durable execution with exactly-once workflow semantics (activities are retried at-least-once).

Database details

-- Table count
SELECT count(*)
FROM information_schema.tables
WHERE table_schema = 'public';
-- => 430+

-- Enum count
SELECT count(*)
FROM pg_type
WHERE typtype = 'e';
-- => 91

-- Row-level security policies
SELECT count(*)
FROM pg_policies;
-- => 200+

AI orchestration

The AI orchestration layer is provider-agnostic, routing requests through a unified gateway that supports multiple LLM backends. Every inference call is tracked with the model provider, token usage, latency, and confidence score. If the primary provider is unavailable, the gateway automatically falls back to the next configured provider.

Provider configuration

  • OpenAI (primary provider)
    GPT-4 class models for high-accuracy classification, entity extraction, and knowledge synthesis. Default provider for enterprise-tier tenants.

  • vLLM (self-hosted inference)
    Runs HPE 120B GPT on-premises via HPE AI Essentials. Provides data-sovereign inference for tenants whose compliance requirements prohibit external API calls.

  • Ollama (development / fallback)
    Local model inference for development environments and a last-resort fallback. Supports Llama, Mistral, and other open-weight models.

  • Qdrant (vector search)
    Embedding storage and retrieval for RAG pipelines. Collections are partitioned by tenant with configurable embedding dimensions and distance metrics.

Guardrails

Every AI response passes through validation before reaching the caller. The guardrail pipeline checks for hallucinated entity references, toxic content, confidence thresholds, and schema compliance. Responses that fail validation are rejected and the request is retried with an adjusted prompt or routed to a human agent.
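The pipeline above can be sketched as a single predicate over the model output. The field names mirror the triage response shown earlier; the hallucination check here is deliberately naive (verbatim substring match), a stand-in for the real, fuzzier comparison.

```typescript
// Sketch of the guardrail checks: confidence threshold, schema compliance,
// and a naive hallucination check that rejects extracted entities which do
// not appear in the source ticket text.
interface TriageResult {
  category: string;
  confidence: number;
  entities: Record<string, string>;
}

const MIN_CONFIDENCE = 0.6; // matches guardrails.min_confidence in the config

function passesGuardrails(result: TriageResult, sourceText: string): boolean {
  if (result.confidence < MIN_CONFIDENCE) return false; // below threshold
  if (!result.category) return false;                   // schema compliance
  // Every extracted entity value must occur in the ticket text.
  const haystack = sourceText.toLowerCase();
  return Object.values(result.entities).every((v) =>
    haystack.includes(v.toLowerCase()),
  );
}
```

A `false` result triggers the retry-with-adjusted-prompt path or routes the ticket to a human agent, as described above.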

AI provider configuration

ai:
  gateway:
    default_provider: openai
    fallback_order:
      - openai
      - vllm
      - ollama
    timeout_ms: 30000
    retry:
      max_attempts: 3
      backoff_ms: 1000

  providers:
    openai:
      base_url: https://api.openai.com/v1
      model: gpt-4o
      max_tokens: 4096

    vllm:
      base_url: http://vllm.internal:8000/v1
      model: hpe-gpt-120b
      max_tokens: 4096

    ollama:
      base_url: http://ollama.internal:11434/v1
      model: llama3.1:70b
      max_tokens: 4096

  guardrails:
    min_confidence: 0.6
    max_retries_on_low_confidence: 2
    content_filter: enabled
    schema_validation: strict
    hallucination_check: enabled
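How the gateway consumes `fallback_order` and `retry.max_attempts` can be sketched as a nested loop: retry the current provider up to the attempt limit, then fall back to the next one. Calls are synchronous here for brevity; the real gateway is async and sleeps `backoff_ms` between attempts.

```typescript
// Sketch of provider fallback with bounded per-provider retries.
// The provider client is injected as a plain function so it can be stubbed.
type Provider = "openai" | "vllm" | "ollama";

const FALLBACK_ORDER: Provider[] = ["openai", "vllm", "ollama"]; // from config
const MAX_ATTEMPTS = 3;                                          // retry.max_attempts

function complete(
  prompt: string,
  call: (provider: Provider, prompt: string) => string,
): { text: string; provider: Provider } {
  let lastError: unknown;
  for (const provider of FALLBACK_ORDER) {
    for (let attempt = 1; attempt <= MAX_ATTEMPTS; attempt++) {
      try {
        return { text: call(provider, prompt), provider };
      } catch (err) {
        lastError = err; // real gateway waits backoff_ms * attempt here
      }
    }
  }
  throw new Error(`all providers failed: ${String(lastError)}`);
}
```

The returned `provider` is what surfaces as `model_used` provenance in responses like the triage example above.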

Data layer

Database schema highlights

The PostgreSQL schema is organized into logical domains, each owning a set of tables, enums, and relationships. The schema uses UUIDv7 primary keys for globally unique, time-sortable identifiers.
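The time-sortable property of UUIDv7 comes from its layout: the first 48 bits are a millisecond Unix timestamp, so lexicographic order on the hex form follows insertion order. An illustrative generator (not the one the platform uses):

```typescript
// Illustrative UUIDv7 generator: 48-bit ms timestamp, version nibble 7,
// RFC 4122 variant bits, remaining bits random.
function uuidv7(nowMs: number = Date.now()): string {
  const ts = nowMs.toString(16).padStart(12, "0"); // 48-bit timestamp, 12 hex chars
  const rand = (n: number) =>
    Array.from({ length: n }, () => Math.floor(Math.random() * 16).toString(16)).join("");
  const variant = (8 + Math.floor(Math.random() * 4)).toString(16); // 8..b
  return `${ts.slice(0, 8)}-${ts.slice(8)}-7${rand(3)}-${variant}${rand(3)}-${rand(12)}`;
}
```

Two IDs generated a millisecond apart compare in timestamp order as plain strings, which is why UUIDv7 primary keys index and paginate better than random UUIDv4.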

  • Ticketing domain (~80 tables)
    Tickets, comments, attachments, tags, custom fields, watchers, linked tickets, time tracking, and audit history.

  • Identity domain (~40 tables)
    Users, organizations, teams, roles, permissions, sessions, API keys, and OAuth connections.

  • SLA domain (~25 tables)
    SLA policies, targets, breach records, escalation rules, business hours, and holiday calendars.

  • Knowledge domain (~30 tables)
    Articles, categories, versions, feedback ratings, view analytics, and embedding references.

  • Workflow domain (~20 tables)
    Approval chains, approval steps, workflow templates, trigger conditions, and execution logs.

  • Configuration domain (~35 tables)
    Tenant settings, feature flags, notification templates, integration configs, and webhook registrations.

  • Audit domain (~15 tables)
    Change logs, login history, API access logs, data export records, and compliance snapshots.

Prisma schema examples

model Ticket {
  // UUIDv7 keys: uuid_generate_v7() assumes an extension such as pg_uuidv7 is
  // installed, since PostgreSQL 16 has no native UUIDv7 function and
  // gen_random_uuid() would yield UUIDv4 (not time-sortable).
  id             String         @id @default(dbgenerated("uuid_generate_v7()"))
  orgId          String         @map("org_id")
  number         Int
  title          String
  description    String
  status         TicketStatus   @default(OPEN)
  priority       Priority       @default(P3)
  category       String?
  subcategory    String?
  assigneeId     String?        @map("assignee_id")
  teamId         String?        @map("team_id")
  slaBreached    Boolean        @default(false) @map("sla_breached")
  aiConfidence   Float?         @map("ai_confidence")
  modelUsed      String?        @map("model_used")
  traceId        String?        @map("trace_id")
  createdAt      DateTime       @default(now()) @map("created_at")
  updatedAt      DateTime       @updatedAt @map("updated_at")
  resolvedAt     DateTime?      @map("resolved_at")

  organization   Organization   @relation(fields: [orgId], references: [id])
  assignee       User?          @relation(fields: [assigneeId], references: [id])
  team           Team?          @relation(fields: [teamId], references: [id])
  comments       Comment[]
  attachments    Attachment[]
  tags           TicketTag[]
  slaRecords     SlaRecord[]

  @@unique([orgId, number])
  @@index([orgId, status])
  @@index([orgId, priority])
  @@index([assigneeId])
  @@map("tickets")
}

Multi-tenancy

Full multi-tenant isolation at every layer. No tenant can access another tenant's data through any API surface.

  • Database (row-level security)
    PostgreSQL RLS policies filter every query by org_id. Policies apply even to raw SQL issued through the application role; only superusers or the table owner can bypass them, and FORCE ROW LEVEL SECURITY closes the owner exception.

  • Cache (key-prefix isolation)
    All Redis keys are prefixed with the organization identifier. Cache invalidation is scoped per tenant.

  • Vector Search (filtered collections)
    Qdrant queries include a mandatory org_id filter. Embeddings are never returned across tenant boundaries.

  • API (JWT org_id claim)
    Every request carries an org_id claim in the JWT. The gateway validates the claim against the requested resource.
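The cache layer's key-prefix isolation is simple to sketch: namespace every key by org_id, and implement tenant-scoped invalidation as a prefix scan. Backed by a Map here; the real implementation targets Redis, where invalidation would use SCAN with a MATCH pattern.

```typescript
// Sketch of tenant key-prefix isolation for the cache layer.
const cache = new Map<string, string>();

const tenantKey = (orgId: string, key: string) => `org:${orgId}:${key}`;

function cacheSet(orgId: string, key: string, value: string): void {
  cache.set(tenantKey(orgId, key), value);
}

function cacheGet(orgId: string, key: string): string | undefined {
  return cache.get(tenantKey(orgId, key));
}

// Invalidate everything for one tenant without touching other tenants.
function invalidateTenant(orgId: string): number {
  const prefix = `org:${orgId}:`;
  let removed = 0;
  for (const k of [...cache.keys()]) {
    if (k.startsWith(prefix)) {
      cache.delete(k);
      removed++;
    }
  }
  return removed;
}
```

Two tenants can cache the same logical key without collision, and flushing one tenant leaves the other's entries intact.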

Subscription tiers

  • free
    5 agents, 100 tickets/month, community support, basic triage (Ollama).

  • starter
    25 agents, 1,000 tickets/month, email support, AI triage with standard models.

  • professional
    100 agents, 10,000 tickets/month, priority support, full AI suite, custom workflows.

  • enterprise
    Unlimited agents and tickets, dedicated support, on-prem AI inference (vLLM/HPE), custom integrations, SSO/SAML, audit logs.

  • unlimited
    Everything in enterprise plus dedicated infrastructure, custom model fine-tuning, 24/7 premium support, and SLA guarantees with financial penalties.

Multi-tenancy enforcement

-- Enable RLS on tickets table (FORCE applies the policy to the table owner too)
ALTER TABLE tickets ENABLE ROW LEVEL SECURITY;
ALTER TABLE tickets FORCE ROW LEVEL SECURITY;

-- Tenant isolation policy
CREATE POLICY tenant_isolation ON tickets
  USING (org_id = current_setting('app.current_org_id')::uuid);

-- Verify isolation (the setting must be a valid UUID because of the ::uuid cast)
SET app.current_org_id = '00000000-0000-0000-0000-000000000001';
SELECT count(*) FROM tickets;
-- => Only returns tickets for that organization

Deployment

The platform deploys on Kubernetes with Helm charts, Istio service mesh, and HPE AI Essentials for on-premises AI inference.

  • Kubernetes (orchestration)
    All services run as deployments with HPA. The API scales from 3 to 20 replicas based on CPU and request latency.

  • Helm (package management)
    A single chart packages all services, with values files for dev, staging, and production environments.

  • Istio (service mesh)
    Mutual TLS, traffic routing for canary deployments, and circuit breakers for external dependencies.

  • HPE AI Essentials (AI infrastructure)
    On-premises GPU cluster running vLLM with the HPE 120B GPT model and tensor parallelism.

Deployment commands

# Deploy to production
helm upgrade --install itsm ./helm/itsm \
  --namespace itsm-production \
  --values helm/values.production.yaml \
  --set image.tag=v2.4.0 \
  --wait --timeout 600s

# Verify rollout
kubectl -n itsm-production rollout status \
  deployment/itsm-api
# => deployment "itsm-api" successfully rolled out

What's next?

Now that you understand the architecture, here are some recommended next steps to explore the platform in detail:

  • Authentication -- Learn how JWT auth and API keys work
  • Quickstart -- Set up your first API client and make requests
  • Errors -- Understand error codes and troubleshooting
  • Webhooks -- Integrate real-time events into your systems
