System Architecture Overview

This document provides a high‑level map of the SmartSRE platform: core services, data flows, and where guardrails, approvals, and observability are applied.

Key Components

FastAPI Backend (src/api/main.py): The central nervous system serving public and internal routes.
Domain Agents (src/agents/*): Specialized AI agents orchestrated locally via the in-process AgentManager.
Change Set Executor (src/services/change_set_executor.py): The "hands" of the system that enforce guardrails and apply changes.
Risk Engine (src/services/risk_engine.py): The safety valve determining if an action requires human approval.
Real-time Service (src/services/realtime_service.py): WebSocket and Event Bus layer for live updates.
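
To make the Event Bus role of the Real-time Service more concrete, here is a minimal sketch of an in-process publish/subscribe bus. It is not taken from src/services/realtime_service.py: the `EventBus` class, the topic name, and the event payload are all hypothetical. It assumes one queue per subscriber, for example one per WebSocket connection.

```python
import asyncio
from collections import defaultdict
from typing import Any, DefaultDict, Dict, List


class EventBus:
    """Hypothetical in-process pub/sub bus keyed by topic (illustration only)."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[asyncio.Queue]] = defaultdict(list)

    def subscribe(self, topic: str) -> asyncio.Queue:
        """Register a subscriber and return the queue it should read from."""
        queue: asyncio.Queue = asyncio.Queue()
        self._subscribers[topic].append(queue)
        return queue

    def publish(self, topic: str, event: Dict[str, Any]) -> int:
        """Fan the event out to every subscriber; return how many received it."""
        queues = self._subscribers.get(topic, [])
        for queue in queues:
            queue.put_nowait(event)
        return len(queues)


async def demo() -> List[Dict[str, Any]]:
    bus = EventBus()
    # Two hypothetical UI clients listening on the same topic.
    ui_a = bus.subscribe("change_set")
    ui_b = bus.subscribe("change_set")
    bus.publish("change_set", {"step": "scored", "risk": 0.2})
    return [await ui_a.get(), await ui_b.get()]
```

In a FastAPI application, each WebSocket handler could hold one such queue and forward events to its client as they arrive. The real service may be structured differently.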

High-Level Architecture

Data & Control Flow

Trigger: A user request or monitoring alert hits the API.
Planning: The AgentManager spawns a Domain Agent (e.g., Cloud Run Scaler).
Reasoning: The agent combines LLM reasoning with tool calls to generate a Change Set.
Safety Check: The RiskEngine scores the Change Set. High-risk Change Sets pause until a human approves them.
Execution: Once the Change Set is approved, or if no approval is required, the ChangeSetExecutor applies the changes to GCP.
Feedback: All steps emit events via the RealtimeService to the UI.
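
The steps above can be sketched as a small pipeline. This is an illustration under stated assumptions, not the platform's code: the `Change` and `ChangeSet` types, the `APPROVAL_THRESHOLD` value, the "riskiest change wins" scoring rule, and the event strings are all hypothetical. The call that applies changes to GCP is replaced by an emitted event.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, List


class Status(Enum):
    PENDING_APPROVAL = "pending_approval"
    APPLIED = "applied"


@dataclass
class Change:
    resource: str
    action: str
    risk: float  # hypothetical per-change risk weight in [0, 1]


@dataclass
class ChangeSet:
    changes: List[Change]
    approved: bool = False


# Hypothetical threshold; the real RiskEngine policy is not described here.
APPROVAL_THRESHOLD = 0.7


def score(change_set: ChangeSet) -> float:
    """Score a Change Set by its riskiest change (an assumed policy)."""
    return max((c.risk for c in change_set.changes), default=0.0)


def run_flow(change_set: ChangeSet, emit: Callable[[str], None]) -> Status:
    """Walk a Change Set through the safety check and execution steps."""
    emit("planned")
    risk = score(change_set)
    emit(f"scored:{risk:.2f}")
    if risk >= APPROVAL_THRESHOLD and not change_set.approved:
        # High-risk and not yet approved: stop before touching anything.
        emit("awaiting_approval")
        return Status.PENDING_APPROVAL
    for change in change_set.changes:
        # Stand-in for the executor applying the change to GCP.
        emit(f"applied:{change.resource}:{change.action}")
    return Status.APPLIED
```

Calling `run_flow` with a low-risk Change Set returns `Status.APPLIED` immediately. A high-risk one returns `Status.PENDING_APPROVAL` until `approved` is set and the flow is run again. Passing `emit` as a callback mirrors the Feedback step, where every stage reports progress to the UI.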

Related Documentation

Onboarding & Discovery: How users connect and scan their environments.
Agent Execution: Deep dive into the Planner/Executor loop.
Runtime Flow: Sequence diagrams for specific API interactions.
Risk Guardrails: How safety policies are defined and enforced.
