System Architecture Overview

This document provides a high‑level map of the SmartSRE platform: core services, data flows, and where guardrails, approvals, and observability are applied.

Key Components

  • FastAPI Backend (src/api/main.py): The central nervous system, serving both public and internal routes.
  • Domain Agents (src/agents/*): Specialized AI agents, orchestrated in-process by the AgentManager.
  • Change Set Executor (src/services/change_set_executor.py): The "hands" of the system that enforce guardrails and apply changes.
  • Risk Engine (src/services/risk_engine.py): The safety valve that scores each Change Set and determines whether it requires human approval.
  • Real-time Service (src/services/realtime_service.py): The WebSocket and Event Bus layer that pushes live updates to the UI.

High-Level Architecture

Data & Control Flow

  1. Trigger: A user request or monitoring alert hits the API.
  2. Planning: The AgentManager spawns a Domain Agent (e.g., Cloud Run Scaler).
  3. Reasoning: The agent combines LLM reasoning with tool calls to generate a Change Set.
  4. Safety Check: The RiskEngine scores the Change Set; high-risk Change Sets pause until a human approves them.
  5. Execution: Once approved (or immediately, if the risk score does not require approval), the ChangeSetExecutor enforces its guardrails and applies the changes to GCP.
  6. Feedback: All steps emit events via the RealtimeService to the UI.
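The steps above can be condensed into a single control path for one Change Set. The sketch below is a simplified, synchronous illustration, not the real RiskEngine or ChangeSetExecutor interfaces: `ChangeSet`, `score_risk`, `run_change_set`, the 0 to 100 integer risk scale, `APPROVAL_THRESHOLD`, and `HARD_MAX_INSTANCES` are all hypothetical. Human approval is modeled as a callback instead of a real pause, the GCP call is a stub so the example runs offline, and every step calls an `emit` function to mirror step 6.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any, Callable

# Hypothetical policy values on a 0-100 integer risk scale.
APPROVAL_THRESHOLD = 70   # scores at or above this pause for human approval
HARD_MAX_INSTANCES = 500  # executor guardrail: never apply beyond this


@dataclass
class ChangeSet:
    """Hypothetical Change Set produced by a Domain Agent (step 3)."""
    target: str   # e.g. a Cloud Run service name
    action: str   # e.g. "scale"
    params: dict[str, int] = field(default_factory=dict)


def score_risk(cs: ChangeSet) -> int:
    """Toy stand-in for the RiskEngine (step 4): large scale-ups are riskier."""
    score = 20
    if cs.params.get("max_instances", 0) > 100:
        score += 50
    return score


def run_change_set(
    cs: ChangeSet,
    approve: Callable[[ChangeSet], bool],
    apply_to_gcp: Callable[[ChangeSet], None],
    emit: Callable[[str, dict[str, Any]], None],
) -> str:
    """Drive one Change Set through steps 4-6 and return its final status."""
    emit("change_set.proposed", {"target": cs.target, "action": cs.action})

    risk = score_risk(cs)
    emit("risk.scored", {"score": risk})
    if risk >= APPROVAL_THRESHOLD:
        emit("approval.requested", {"score": risk})
        # In the real system this step waits on a human; here it is a callback.
        if not approve(cs):
            emit("change_set.rejected", {"target": cs.target})
            return "rejected"

    # Step 5: the executor re-checks hard guardrails even after approval.
    if cs.params.get("max_instances", 0) > HARD_MAX_INSTANCES:
        emit("guardrail.blocked", {"limit": HARD_MAX_INSTANCES})
        return "blocked"

    apply_to_gcp(cs)
    emit("change_set.applied", {"target": cs.target})
    return "applied"


if __name__ == "__main__":
    events: list[str] = []

    def record(name: str, payload: dict[str, Any]) -> None:
        events.append(name)  # stand-in for publishing via the RealtimeService

    def fake_apply(cs: ChangeSet) -> None:
        pass  # stub: no real GCP call is made

    small = ChangeSet("checkout-api", "scale", {"max_instances": 10})
    print(run_change_set(small, lambda cs: True, fake_apply, record))  # applied
    print(events)
```

Keeping the guardrail check inside the execution path, separate from the risk score, reflects the split described above: the risk engine decides whether a human must look, while the executor still refuses changes outside hard limits even after approval.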