Architecture
The ITSM Automation platform is a multi-tenant, AI-driven IT Service Management system built for enterprise-scale operations. This document describes the full system architecture: 130+ NestJS modules, 430+ database tables, and a multi-provider AI gateway, all deployed on Kubernetes with HPE AI Essentials.
This architecture reference reflects the production topology. All services listed here are persistent, distributed, and survive restarts. There are no in-memory substitutes or mock implementations in the production path.
Tech stack
The platform is composed of purpose-built layers, each selected for enterprise reliability and horizontal scalability. The backend alone contains 92 controllers exposing 500+ endpoints, while the frontend delivers 200+ routes through 90+ React components.
Backend
- Runtime (NestJS + TypeScript): 130+ modules, 92 controllers, 500+ REST endpoints with full OpenAPI documentation. Built on NestJS for dependency injection, modular architecture, and decorator-driven route handling.
- ORM (Prisma): Type-safe database access with generated client, migrations, and introspection against PostgreSQL 16. Schema covers 430+ tables and 91 enums.
- Validation (class-validator + class-transformer): DTO-level input validation on every endpoint with automatic transformation and whitelist enforcement.
Frontend
- Framework (Next.js 15 + React 19): 200+ routes with server-side rendering, React Server Components, and streaming. App Router architecture with nested layouts.
- Styling (TailwindCSS): 90+ components built with utility-first CSS, design tokens, and dark mode support across all tenant themes.
Infrastructure
- Database (PostgreSQL 16 + pgvector): 430+ tables, 91 enums, row-level security for tenant isolation. UUIDv7 primary keys for time-sortable identifiers.
- Cache and Queue (Redis + BullMQ): Three Redis instances (cache, queue, locks). 12 BullMQ queues for background processing. Redlock for distributed locking.
- Workflows (Temporal.io): 5 task queues for durable workflow orchestration: ticket lifecycle, SLA enforcement, approval chains, scheduled reports, and bulk operations.
- AI (Multi-provider LLM Gateway): OpenAI, vLLM (HPE 120B GPT), and Ollama with automatic fallback. Qdrant vector database for RAG and semantic search.
- Deployment (Kubernetes + Helm + Istio): HPE AI Essentials for on-premises GPU inference. Prometheus, Grafana, and Jaeger for observability.
Quick verification
# Count NestJS modules
find backend/src -name "*.module.ts" | wc -l
# => 130+
# Count controllers
find backend/src -name "*.controller.ts" | wc -l
# => 92
Service topology
The platform runs as a set of independently deployable services, each with dedicated health checks and monitoring endpoints. All inter-service communication uses either gRPC (Temporal workers) or HTTP with circuit breakers.
Core services
- API Server (port 3000): NestJS application serving all REST endpoints. Health check at /health returns service status, database connectivity, Redis availability, and Temporal connection state.
- Frontend (port 3001): Next.js 15 application with SSR. Serves the tenant-aware dashboard, ticket management UI, knowledge base portal, and admin console.
- Worker (port 9464): BullMQ processor and Temporal activity worker. Exposes Prometheus metrics at /metrics for queue depth, processing latency, and failure rates across all 12 queues.
Data services
- PostgreSQL (port 5432): Primary datastore with 430+ tables, pgvector extension for embedding storage, and row-level security policies for multi-tenant isolation.
- Redis (ports 6379, 6380, 6381): Three Redis instances for application caching (6379), BullMQ job queues (6380), and Redlock distributed locking (6381). Separate instances ensure lock contention does not affect queue throughput.
- Qdrant (port 6333): Vector database for semantic search, RAG document retrieval, and similar-ticket lookup. Stores embeddings generated by the AI gateway.
- Temporal (port 7233): Workflow orchestration engine managing 5 task queues for long-running processes: ticket lifecycle, SLA escalation, approval chains, scheduled reports, and bulk operations.
Health checks
curl -sf http://localhost:3000/health | jq
# {
#   "status": "healthy",
#   "database": "connected",
#   "redis": "connected",
#   "temporal": "connected",
#   "uptime": 86400
# }
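A minimal sketch of how the fields in this response could be assembled, assuming each dependency probe reports a boolean. The `HealthReport` type, the `buildHealthReport` function, and the "degraded" status label are hypothetical; the example response above only shows the "healthy" case.

```typescript
// Hypothetical shape mirroring the /health response shown above.
type DependencyState = "connected" | "disconnected";

interface HealthReport {
  status: "healthy" | "degraded";
  database: DependencyState;
  redis: DependencyState;
  temporal: DependencyState;
  uptime: number; // whole seconds since process start
}

// Probe results are passed in as booleans so the function stays pure and testable.
function buildHealthReport(
  probes: { database: boolean; redis: boolean; temporal: boolean },
  uptimeSeconds: number,
): HealthReport {
  const toState = (ok: boolean): DependencyState => (ok ? "connected" : "disconnected");
  const allUp = probes.database && probes.redis && probes.temporal;
  return {
    status: allUp ? "healthy" : "degraded", // "degraded" is an assumed label
    database: toState(probes.database),
    redis: toState(probes.redis),
    temporal: toState(probes.temporal),
    uptime: Math.floor(uptimeSeconds),
  };
}
```

Keeping the probes outside the function means the same logic can back both the HTTP endpoint and a Kubernetes readiness check.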
Observability services
- Prometheus (port 9090): Metrics collection and alerting. Scrapes all service endpoints on 15-second intervals with retention configured for 30 days.
- Grafana (port 3005): Dashboards for SLA compliance, ticket throughput, AI model latency, queue health, and per-tenant usage. Pre-built dashboards ship with the Helm chart.
- Jaeger (port 16686): Distributed tracing across API, worker, Temporal, and AI gateway calls. Every ticket triage request generates a full trace from ingestion through AI classification to database persistence.
Service map
{
  "services": {
    "api": { "port": 3000, "health": "/health" },
    "frontend": { "port": 3001, "health": "/" },
    "worker": { "port": 9464, "health": "/metrics" },
    "temporal": { "port": 7233 },
    "postgresql": { "port": 5432 },
    "redis": { "port": 6379 },
    "qdrant": { "port": 6333 },
    "prometheus": { "port": 9090 },
    "grafana": { "port": 3005 },
    "jaeger": { "port": 16686 }
  }
}
Architecture layers
The system follows a strict five-layer architecture. Each layer has a well-defined responsibility boundary and communicates only with its immediate neighbors.
Layer 1: Presentation
The presentation layer is stateless by design. All session state lives in JWT tokens and server-side caches.
- Dashboard (Next.js App Router): Real-time ticket overview, SLA timers, team workload visualization, and AI-suggested actions. 45+ routes.
- Tenant Portal (Multi-theme SSR): Organization-branded self-service portal for ticket submission, knowledge base, and resolution tracking.
- Admin Console (Role-gated routes): System configuration, user management, SLA policy editor, workflow designer, and AI model tuning. 30+ routes.
Frontend structure
{
  "routes": {
    "dashboard": "45+ pages",
    "admin": "30+ pages",
    "portal": "25+ pages"
  },
  "stores": 12,
  "components": "90+"
}
Layer 2: API Gateway
Single entry point for all client requests with authentication, authorization, rate limiting, and validation.
- Authentication (JWT RS256): Stateless JWT with 15-minute access tokens and 7-day refresh tokens. Per-device tracking and revocation via Redis.
- Authorization (RBAC + ABAC): 5 role levels: end_user, agent, team_lead, admin, org_owner. Permissions evaluated per-request via NestJS guards.
- Rate Limiting (Redis token bucket): Per-tenant, per-endpoint rate limiting. Configurable burst and sustained rates per subscription tier.
- Validation (class-validator DTOs): Every endpoint enforces input validation through DTO decorators with structured error responses.
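The role check could look roughly like the sketch below. The role names come from the list above, but their ordering from least to most privileged is an assumption, and `hasAtLeastRole` and `canAccess` are hypothetical helpers rather than the platform's actual NestJS guards.

```typescript
// Role names come from the list above; the ordering is an assumption.
const ROLE_LEVELS = ["end_user", "agent", "team_lead", "admin", "org_owner"] as const;
type Role = (typeof ROLE_LEVELS)[number];

// Returns true when `actual` ranks at or above `required` in the assumed hierarchy.
function hasAtLeastRole(actual: Role, required: Role): boolean {
  return ROLE_LEVELS.indexOf(actual) >= ROLE_LEVELS.indexOf(required);
}

// Hypothetical guard logic: role rank (RBAC) combined with a same-tenant
// attribute check (a simple ABAC-style condition).
function canAccess(
  user: { role: Role; orgId: string },
  resource: { orgId: string; minRole: Role },
): boolean {
  return user.orgId === resource.orgId && hasAtLeastRole(user.role, resource.minRole);
}
```

In a NestJS guard, a function like `canAccess` would typically be called from `canActivate` with the user taken from the verified JWT.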
Rate limit headers
curl -si http://localhost:3000/api/v1/tickets \
  -H "Authorization: Bearer {jwt_token}" \
  | grep -i "x-ratelimit"
# X-RateLimit-Limit: 1000
# X-RateLimit-Remaining: 997
# X-RateLimit-Reset: 1708819200
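To illustrate the token bucket behind these headers, here is an in-memory sketch with an injected clock. The platform keeps this state in Redis, which the sketch leaves out; the `TokenBucket` class and the `bucketKey` layout are hypothetical.

```typescript
// In-memory token bucket. `capacity` is the burst size, `refillPerSecond` the sustained rate.
class TokenBucket {
  private readonly capacity: number;
  private readonly refillPerSecond: number;
  private tokens: number;
  private lastRefillMs: number;

  constructor(capacity: number, refillPerSecond: number, nowMs: number) {
    this.capacity = capacity;
    this.refillPerSecond = refillPerSecond;
    this.tokens = capacity; // a new bucket starts full
    this.lastRefillMs = nowMs;
  }

  // Refills based on elapsed time, then tries to take one token.
  tryConsume(nowMs: number): { allowed: boolean; remaining: number } {
    const elapsedSec = Math.max(0, nowMs - this.lastRefillMs) / 1000;
    this.tokens = Math.min(this.capacity, this.tokens + elapsedSec * this.refillPerSecond);
    this.lastRefillMs = nowMs;
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return { allowed: true, remaining: Math.floor(this.tokens) };
    }
    return { allowed: false, remaining: 0 };
  }
}

// Hypothetical key scheme: one bucket per tenant and endpoint.
function bucketKey(orgId: string, endpoint: string): string {
  return `ratelimit:${orgId}:${endpoint}`;
}
```

The `remaining` value corresponds to what a gateway would report in `X-RateLimit-Remaining`. A Redis-backed version would need to do the refill and the decrement atomically, for example in a Lua script.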
Layer 3: AI Orchestration
Provider-agnostic AI gateway routing inference to OpenAI, vLLM, or Ollama with automatic fallback. Every call is traced, logged, and validated through guardrails.
- Intent Classification (LLM inference): Classifies tickets into service categories using the tenant service catalogue. Supports multiple providers with automatic fallback.
- Priority Prediction (LLM inference): Predicts P1-P5 priority based on description, affected service, historical patterns, and SLA impact. Confidence scores attached.
- Entity Extraction (LLM inference): Extracts structured entities: affected systems, error codes, user identifiers, Polish PII (PESEL, NIP, IBAN).
- RAG Search (Qdrant + LLM): Retrieval-augmented generation over knowledge base and resolution history. Semantic similarity ranking.
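Entities extracted by an LLM can be cross-checked with deterministic rules. The sketch below validates a PESEL candidate with the standard PESEL checksum (weights 1, 3, 7, 9 repeated over the first ten digits). Running such a check after extraction is an assumption, not something the platform is documented to do, and `filterPeselEntities` is a hypothetical helper.

```typescript
// PESEL checksum weights for the first ten digits.
const PESEL_WEIGHTS = [1, 3, 7, 9, 1, 3, 7, 9, 1, 3];

// Returns true when the candidate has 11 digits and its control digit matches.
function isValidPesel(candidate: string): boolean {
  if (!/^\d{11}$/.test(candidate)) return false;
  const digits = candidate.split("").map(Number);
  const sum = PESEL_WEIGHTS.reduce((acc, weight, i) => acc + weight * digits[i], 0);
  const control = (10 - (sum % 10)) % 10;
  return control === digits[10];
}

// Hypothetical post-processing step: keep only candidates that pass the checksum.
function filterPeselEntities(extracted: string[]): string[] {
  return extracted.filter(isValidPesel);
}
```

A check like this catches digit strings that merely look like a PESEL, which is one way to reduce hallucinated or mis-extracted identifiers before they are stored or masked.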
AI triage request
curl -sf http://localhost:3000/api/v1/triage \
  -X POST \
  -H "Authorization: Bearer {jwt_token}" \
  -H "Content-Type: application/json" \
  -d '{
    "title": "Cannot access production database",
    "description": "Connection timeout on pg-prod-01 since 14:30 UTC. API pods returning 503.",
    "reporter_email": "oncall@acme.com"
  }' | jq
# {
#   "ticket_id": "TKT-20260224-0042",
#   "classification": {
#     "category": "Infrastructure",
#     "subcategory": "Database",
#     "confidence": 0.94
#   },
#   "priority": { "level": "P1", "confidence": 0.97 },
#   "entities": {
#     "affected_system": "pg-prod-01",
#     "error_type": "connection_timeout"
#   },
#   "model_used": "hpe-gpt-120b",
#   "trace_id": "abc123def456"
# }
Layer 4: Business Logic
Core ITSM domain: ticket lifecycle, SLA enforcement, approval workflows, and escalation policies. Long-running processes orchestrated through Temporal.
- Ticket Lifecycle (Temporal workflow): Full state machine from creation through triage, assignment, resolution, and closure. Each transition audited.
- SLA Engine (BullMQ + Temporal): Monitors response and resolution time targets per priority and tenant SLA policy. Auto-escalation on breach.
- Approval Workflows (Temporal workflow): Multi-level approval chains for change requests. Sequential, parallel, and quorum-based patterns with timeout escalation.
- Notification Engine (BullMQ): 6 channels: email (SMTP), SMS (Twilio), push (FCM), Slack, Teams, webhooks. Dedicated queues with retry.
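As a rough illustration of the SLA Engine's breach check, the sketch below computes a resolution deadline from a ticket's priority. The target minutes are invented for the example (real values come from each tenant's SLA policy), business hours and holiday calendars are ignored, and `evaluateSla` is a hypothetical function.

```typescript
type Priority = "P1" | "P2" | "P3" | "P4" | "P5";

// Hypothetical resolution targets in minutes, for illustration only.
const RESOLUTION_TARGET_MINUTES: Record<Priority, number> = {
  P1: 60,
  P2: 240,
  P3: 480,
  P4: 1440,
  P5: 2880,
};

interface SlaStatus {
  deadline: Date;
  breached: boolean;
  minutesRemaining: number; // negative once the deadline has passed
}

// Uses plain wall-clock time; a real engine would skip non-business hours.
function evaluateSla(priority: Priority, createdAt: Date, now: Date): SlaStatus {
  const targetMs = RESOLUTION_TARGET_MINUTES[priority] * 60000;
  const deadline = new Date(createdAt.getTime() + targetMs);
  const minutesRemaining = Math.floor((deadline.getTime() - now.getTime()) / 60000);
  return { deadline, breached: now.getTime() > deadline.getTime(), minutesRemaining };
}
```

An escalation job could call this on a schedule and trigger the auto-escalation path when `breached` flips to true or when `minutesRemaining` drops below a warning threshold.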
Queue and workflow configuration
{
  "queues": [
    { "name": "ticket-triage", "concurrency": 5 },
    { "name": "sla-monitor", "concurrency": 3 },
    { "name": "notification-email", "concurrency": 10 },
    { "name": "notification-slack", "concurrency": 5 },
    { "name": "notification-teams", "concurrency": 5 },
    { "name": "approval-processor", "concurrency": 3 },
    { "name": "report-generator", "concurrency": 2 },
    { "name": "bulk-operations", "concurrency": 2 },
    { "name": "knowledge-indexer", "concurrency": 3 },
    { "name": "audit-logger", "concurrency": 5 },
    { "name": "webhook-dispatch", "concurrency": 5 },
    { "name": "ai-embedding", "concurrency": 4 }
  ]
}
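A configuration like this can be sanity-checked at startup. The sketch below is one way to do that; `QueueConfig`, `validateQueues`, and `totalConcurrency` are hypothetical names and not part of BullMQ.

```typescript
interface QueueConfig {
  name: string;
  concurrency: number;
}

// Returns a list of problems; an empty list means the configuration is consistent.
function validateQueues(queues: QueueConfig[], expectedCount: number): string[] {
  const problems: string[] = [];
  const seen = new Set<string>();
  for (const queue of queues) {
    if (seen.has(queue.name)) problems.push(`duplicate queue name: ${queue.name}`);
    seen.add(queue.name);
    if (!Number.isInteger(queue.concurrency) || queue.concurrency < 1) {
      problems.push(`invalid concurrency for ${queue.name}: ${queue.concurrency}`);
    }
  }
  if (queues.length !== expectedCount) {
    problems.push(`expected ${expectedCount} queues, found ${queues.length}`);
  }
  return problems;
}

// Upper bound on jobs one worker process handles in parallel across all queues.
function totalConcurrency(queues: QueueConfig[]): number {
  return queues.reduce((sum, queue) => sum + queue.concurrency, 0);
}
```

For the twelve queues listed above, `validateQueues(queues, 12)` returns an empty list and `totalConcurrency(queues)` returns 52, which is a useful number when sizing worker memory and Redis connections.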
Layer 5: Data Layer
Persistent storage, caching, and vector search. Each technology serves a distinct purpose with no overlap.
- PostgreSQL 16 (Primary datastore): 430+ tables with pgvector. All transactional data. Row-level security enforces tenant isolation at the database level.
- Redis (Cache + Queue + Locks): Three instances: caching (6379), BullMQ queues (6380), Redlock distributed locking (6381).
- Qdrant (Vector database): Embedding storage for semantic search. Collections partitioned by tenant. Configurable dimensions and distance metrics.
- Temporal (Workflow state): Persists workflow execution state, activity results, and timer schedules. Workflow progress is durable and effectively exactly-once; activities are retried at-least-once by default, so they should be idempotent.
Database details
-- Table count (base tables only; information_schema.tables also lists views)
SELECT count(*)
FROM information_schema.tables
WHERE table_schema = 'public'
  AND table_type = 'BASE TABLE';
-- => 430+
-- Enum count
SELECT count(*)
FROM pg_type
WHERE typtype = 'e';
-- => 91
-- Row-level security policies
SELECT count(*)
FROM pg_policies;
-- => 200+
AI orchestration
The AI orchestration layer is provider-agnostic, routing requests through a unified gateway that supports multiple LLM backends. Every inference call is tracked with the model provider, token usage, latency, and confidence score. If the primary provider is unavailable, the gateway automatically falls back to the next configured provider.
Provider configuration
- OpenAI (Primary provider): GPT-4 class models for high-accuracy classification, entity extraction, and knowledge synthesis. Used as the default provider for enterprise-tier tenants.
- vLLM (Self-hosted inference): Runs HPE 120B GPT on-premises via HPE AI Essentials. Provides data-sovereign inference for tenants with compliance requirements that prohibit external API calls.
- Ollama (Development / fallback): Local model inference for development environments and as a last-resort fallback. Supports Llama, Mistral, and other open-weight models.
- Qdrant (Vector search): Embedding storage and retrieval for RAG pipelines. Collections are partitioned by tenant with configurable embedding dimensions and distance metrics.
Guardrails
Every AI response passes through validation before reaching the caller. The guardrail pipeline checks for hallucinated entity references, toxic content, confidence thresholds, and schema compliance. Responses that fail validation are rejected and the request is retried with an adjusted prompt or routed to a human agent.
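The accept, retry, or escalate decision described here could be sketched as follows. The defaults mirror the `min_confidence` and `max_retries_on_low_confidence` values in the configuration; the `knownSystems` inventory, the verdict type, and `applyGuardrails` itself are hypothetical, and the toxic-content and schema checks are left out.

```typescript
interface TriageResult {
  category: string;
  confidence: number;
  entities: Record<string, string>;
}

type GuardrailVerdict =
  | { action: "accept" }
  | { action: "retry"; reason: string }
  | { action: "route_to_human"; reason: string };

// `knownSystems` stands in for whatever inventory is used to spot hallucinated entities.
function applyGuardrails(
  result: TriageResult,
  attempt: number, // 0 for the first inference call
  knownSystems: Set<string>,
  minConfidence = 0.6,
  maxRetries = 2,
): GuardrailVerdict {
  const system = result.entities["affected_system"];
  let reason: string | null = null;
  if (result.confidence < minConfidence) {
    reason = "confidence below threshold";
  } else if (system !== undefined && !knownSystems.has(system)) {
    reason = "unknown affected_system";
  }
  if (reason === null) return { action: "accept" };
  // Retry while attempts remain, then hand the ticket to a human agent.
  return attempt < maxRetries
    ? { action: "retry", reason }
    : { action: "route_to_human", reason };
}
```

Returning a verdict instead of throwing keeps the retry policy in the caller, which can adjust the prompt before the next attempt.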
AI provider configuration
ai:
  gateway:
    default_provider: openai
    fallback_order:
      - openai
      - vllm
      - ollama
    timeout_ms: 30000
    retry:
      max_attempts: 3
      backoff_ms: 1000
  providers:
    openai:
      base_url: https://api.openai.com/v1
      model: gpt-4o
      max_tokens: 4096
    vllm:
      base_url: http://vllm.internal:8000/v1
      model: hpe-gpt-120b
      max_tokens: 4096
    ollama:
      base_url: http://ollama.internal:11434/v1
      model: llama3.1:70b
      max_tokens: 4096
  guardrails:
    min_confidence: 0.6
    max_retries_on_low_confidence: 2
    content_filter: enabled
    schema_validation: strict
    hallucination_check: enabled
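The `fallback_order` setting implies a loop like the one below: try each provider in turn and return the first success. This is a sketch under assumptions. `ProviderClient` and `completeWithFallback` are hypothetical names, and the timeout and per-provider retry settings from the configuration are not implemented here.

```typescript
interface ProviderClient {
  name: string;
  complete(prompt: string): Promise<string>;
}

interface GatewayResult {
  provider: string; // which provider produced the answer
  text: string;
  errors: string[]; // failures recorded before a provider succeeded
}

// Tries providers in the configured order and returns the first successful response.
async function completeWithFallback(
  providers: ProviderClient[],
  prompt: string,
): Promise<GatewayResult> {
  const errors: string[] = [];
  for (const provider of providers) {
    try {
      const text = await provider.complete(prompt);
      return { provider: provider.name, text, errors };
    } catch (err) {
      const message = err instanceof Error ? err.message : String(err);
      errors.push(`${provider.name}: ${message}`);
    }
  }
  throw new Error(`all providers failed: ${errors.join("; ")}`);
}
```

Recording which provider answered matches the `model_used` field in the triage response, and the collected errors are the kind of detail that would be attached to the request trace.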
Data layer
Database schema highlights
The PostgreSQL schema is organized into logical domains, each owning a set of tables, enums, and relationships. The schema uses UUIDv7 primary keys for globally unique, time-sortable identifiers.
- Ticketing domain (~80 tables): Tickets, comments, attachments, tags, custom fields, watchers, linked tickets, time tracking, and audit history.
- Identity domain (~40 tables): Users, organizations, teams, roles, permissions, sessions, API keys, and OAuth connections.
- SLA domain (~25 tables): SLA policies, targets, breach records, escalation rules, business hours, and holiday calendars.
- Knowledge domain (~30 tables): Articles, categories, versions, feedback ratings, view analytics, and embedding references.
- Workflow domain (~20 tables): Approval chains, approval steps, workflow templates, trigger conditions, and execution logs.
- Configuration domain (~35 tables): Tenant settings, feature flags, notification templates, integration configs, and webhook registrations.
- Audit domain (~15 tables): Change logs, login history, API access logs, data export records, and compliance snapshots.
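To show why UUIDv7 keys are time-sortable, the sketch below builds a UUIDv7-shaped string by hand: the first 48 bits are the Unix timestamp in milliseconds, followed by the version nibble 7, the variant bits, and random data. This is an illustration only and not how the platform generates identifiers; `Math.random` is not cryptographically secure, and both function names are hypothetical.

```typescript
// Returns `count` random hex characters. Illustration only: a real generator
// should draw from a cryptographically secure source.
function randomHex(count: number): string {
  let out = "";
  for (let i = 0; i < count; i++) {
    out += Math.floor(Math.random() * 16).toString(16);
  }
  return out;
}

// Builds a UUIDv7-shaped string: 48-bit millisecond timestamp, version 7, variant 10xx.
function uuidV7(timestampMs: number = Date.now()): string {
  const ts = Math.floor(timestampMs).toString(16).padStart(12, "0");
  const variantNibble = (8 + Math.floor(Math.random() * 4)).toString(16); // 8, 9, a, or b
  return [
    ts.slice(0, 8),               // high 32 bits of the timestamp
    ts.slice(8, 12),              // low 16 bits of the timestamp
    "7" + randomHex(3),           // version nibble plus random bits
    variantNibble + randomHex(3), // variant bits plus random bits
    randomHex(12),
  ].join("-");
}
```

Because the timestamp leads, identifiers created in different milliseconds sort in creation order as plain strings, which keeps B-tree index inserts mostly append-only. Ordering within the same millisecond is not guaranteed by this sketch.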
Prisma schema examples
model Ticket {
  // Note: gen_random_uuid() produces UUIDv4 values. For the UUIDv7 keys
  // described above, Prisma 5.18+ also accepts @default(uuid(7)).
  id           String       @id @default(dbgenerated("gen_random_uuid()"))
  orgId        String       @map("org_id")
  number       Int
  title        String
  description  String
  status       TicketStatus @default(OPEN)
  priority     Priority     @default(P3)
  category     String?
  subcategory  String?
  assigneeId   String?      @map("assignee_id")
  teamId       String?      @map("team_id")
  slaBreached  Boolean      @default(false) @map("sla_breached")
  aiConfidence Float?       @map("ai_confidence")
  modelUsed    String?      @map("model_used")
  traceId      String?      @map("trace_id")
  createdAt    DateTime     @default(now()) @map("created_at")
  updatedAt    DateTime     @updatedAt @map("updated_at")
  resolvedAt   DateTime?    @map("resolved_at")

  organization Organization @relation(fields: [orgId], references: [id])
  assignee     User?        @relation(fields: [assigneeId], references: [id])
  team         Team?        @relation(fields: [teamId], references: [id])
  comments     Comment[]
  attachments  Attachment[]
  tags         TicketTag[]
  slaRecords   SlaRecord[]

  @@unique([orgId, number])
  @@index([orgId, status])
  @@index([orgId, priority])
  @@index([assigneeId])
  @@map("tickets")
}
Multi-tenancy
Full multi-tenant isolation at every layer. No tenant can access another tenant's data through any API surface.
- Database (Row-level security): PostgreSQL RLS policies filter every query by org_id, including raw SQL issued under the application role. Superusers and roles with BYPASSRLS are exempt, and table owners are exempt unless FORCE ROW LEVEL SECURITY is set, so the application must connect with an unprivileged, non-owner role.
- Cache (Key-prefix isolation): All Redis keys are prefixed with the organization identifier. Cache invalidation is scoped per tenant.
- Vector Search (Filtered collections): Qdrant queries include a mandatory org_id filter. Embeddings are never returned across tenant boundaries.
- API (JWT org_id claim): Every request carries an org_id claim in the JWT. The gateway validates the claim against the requested resource.
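The cache and API rules above can be sketched as small helpers. The `org:` key layout and all three function names are assumptions made for illustration; the source only states that keys carry an organization prefix and that the gateway compares the org_id claim with the resource.

```typescript
// Hypothetical key layout: every cache key starts with the owning organization's identifier.
function tenantCacheKey(orgId: string, key: string): string {
  if (orgId.length === 0 || orgId.indexOf(":") !== -1) {
    throw new Error("orgId must be non-empty and must not contain ':'");
  }
  return `org:${orgId}:${key}`;
}

// Match pattern for invalidating one tenant's entries without touching other tenants.
function tenantInvalidationPattern(orgId: string): string {
  return tenantCacheKey(orgId, "*");
}

// Gateway-style check: the org_id claim from the JWT must match the resource's owner.
function assertSameTenant(claimOrgId: string, resourceOrgId: string): void {
  if (claimOrgId !== resourceOrgId) {
    throw new Error("cross-tenant access denied");
  }
}
```

Rejecting a `:` inside the organization identifier prevents one tenant's identifier from being crafted to collide with another tenant's key namespace.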
Subscription tiers
- free: 5 agents, 100 tickets/month, community support, basic triage (Ollama).
- starter: 25 agents, 1,000 tickets/month, email support, AI triage with standard models.
- professional: 100 agents, 10,000 tickets/month, priority support, full AI suite, custom workflows.
- enterprise: Unlimited agents and tickets, dedicated support, on-prem AI inference (vLLM/HPE), custom integrations, SSO/SAML, audit logs.
- unlimited: Everything in enterprise plus dedicated infrastructure, custom model fine-tuning, 24/7 premium support, and SLA guarantees with financial penalties.
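The agent and ticket quotas above translate naturally into a typed lookup table. The numbers below are taken from the tier descriptions; the `TIER_LIMITS` constant and the two check functions are hypothetical and say nothing about how the platform actually enforces quotas.

```typescript
type Tier = "free" | "starter" | "professional" | "enterprise" | "unlimited";

interface TierLimits {
  maxAgents: number;       // Infinity means no cap
  ticketsPerMonth: number; // Infinity means no cap
}

// Limits taken from the tier descriptions above.
const TIER_LIMITS: Record<Tier, TierLimits> = {
  free: { maxAgents: 5, ticketsPerMonth: 100 },
  starter: { maxAgents: 25, ticketsPerMonth: 1000 },
  professional: { maxAgents: 100, ticketsPerMonth: 10000 },
  enterprise: { maxAgents: Infinity, ticketsPerMonth: Infinity },
  unlimited: { maxAgents: Infinity, ticketsPerMonth: Infinity },
};

// True when the tenant may create one more ticket this month.
function canCreateTicket(tier: Tier, ticketsThisMonth: number): boolean {
  return ticketsThisMonth < TIER_LIMITS[tier].ticketsPerMonth;
}

// True when the tenant may add one more agent.
function canAddAgent(tier: Tier, currentAgents: number): boolean {
  return currentAgents < TIER_LIMITS[tier].maxAgents;
}
```

Using `Record<Tier, TierLimits>` makes the compiler flag any tier that is added to the type without a matching limits entry.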
Multi-tenancy enforcement
-- Enable RLS on tickets table
ALTER TABLE tickets ENABLE ROW LEVEL SECURITY;
-- Tenant isolation policy
CREATE POLICY tenant_isolation ON tickets
  USING (org_id = current_setting('app.current_org_id')::uuid);
-- Verify isolation (the value must be a valid UUID because the policy casts it;
-- the identifier below is an example)
SET app.current_org_id = '018f3c2a-7b1e-7c3d-9a4b-2f6e8d0c1a5b';
SELECT count(*) FROM tickets;
-- => Only returns tickets for that organization
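On the application side, the setting has to be applied for each request before any tenant query runs. The sketch below builds that statement using PostgreSQL's `set_config(name, value, is_local)` function, with `is_local = true` so the value lasts only until the end of the current transaction. The `tenantScopeQuery` helper and the `$1` placeholder style (as used by node-postgres) are assumptions; no database driver is called here.

```typescript
const UUID_PATTERN = /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;

interface ParameterizedQuery {
  text: string;
  values: string[];
}

// Builds the statement that scopes the current transaction to one tenant.
// Validating the UUID up front avoids a cast error inside the RLS policy.
function tenantScopeQuery(orgId: string): ParameterizedQuery {
  if (!UUID_PATTERN.test(orgId)) {
    throw new Error(`org id is not a UUID: ${orgId}`);
  }
  return {
    text: "SELECT set_config('app.current_org_id', $1, true)",
    values: [orgId],
  };
}
```

Scoping the setting to the transaction matters with pooled connections: a session-level `SET` would stay on the connection and could leak one tenant's identifier into the next request that reuses it.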
Deployment
The platform deploys on Kubernetes with Helm charts, Istio service mesh, and HPE AI Essentials for on-premises AI inference.
- Kubernetes (Orchestration): All services as deployments with HPA. API scales 3-20 replicas based on CPU and request latency.
- Helm (Package management): Single chart packages all services. Values files for dev, staging, and production environments.
- Istio (Service mesh): Mutual TLS, traffic routing for canary deployments, and circuit breakers for external dependencies.
- HPE AI Essentials (AI infrastructure): On-premises GPU cluster running vLLM with HPE 120B GPT model and tensor parallelism.
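The 3-20 replica range for the API follows the standard Horizontal Pod Autoscaler rule: desired replicas equal the current count scaled by the ratio of the observed metric to its target, rounded up and clamped to the configured bounds. The sketch below reproduces that arithmetic. The metric values in the usage note are invented, and the HPA's tolerance band and stabilization window are not modeled.

```typescript
// Kubernetes HPA rule: desired = ceil(current * currentMetric / targetMetric), then clamped.
function desiredReplicas(
  currentReplicas: number,
  currentMetric: number, // e.g. average CPU utilization in percent
  targetMetric: number,  // e.g. target CPU utilization in percent
  minReplicas = 3,
  maxReplicas = 20,
): number {
  if (targetMetric <= 0) throw new Error("targetMetric must be positive");
  const raw = Math.ceil(currentReplicas * (currentMetric / targetMetric));
  return Math.min(maxReplicas, Math.max(minReplicas, raw));
}
```

For example, 4 replicas at 90% CPU against a 60% target scale to 6, while 15 replicas at 120% would ask for 30 and are capped at the maximum of 20.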
Deployment commands
# Deploy to production
helm upgrade --install itsm ./helm/itsm \
  --namespace itsm-production \
  --values helm/values.production.yaml \
  --set image.tag=v2.4.0 \
  --wait --timeout 600s
# Verify rollout
kubectl -n itsm-production rollout status \
  deployment/itsm-api
# => deployment "itsm-api" successfully rolled out
What's next?
Now that you understand the architecture, here are some recommended next steps to explore the platform in detail:
- Authentication -- Learn how JWT auth and API keys work
- Quickstart -- Set up your first API client and make requests
- Errors -- Understand error codes and troubleshooting
- Webhooks -- Integrate real-time events into your systems