System Architecture Overview
This document provides a high‑level map of the SmartSRE platform: core services, data flows, and where guardrails, approvals, and observability are applied.
Key Components
- FastAPI Backend (src/api/main.py): The central nervous system serving public and internal routes.
- Domain Agents (src/agents/*): Specialized AI agents orchestrated locally via the in-process AgentManager.
- Change Set Executor (src/services/change_set_executor.py): The "hands" of the system that enforce guardrails and apply changes.
- Risk Engine (src/services/risk_engine.py): The safety valve determining if an action requires human approval.
- Real-time Service (src/services/realtime_service.py): WebSocket and Event Bus layer for live updates.
High-Level Architecture
Data & Control Flow
- Trigger: A user request or monitoring alert hits the API.
- Planning: The AgentManager spawns a Domain Agent (e.g., Cloud Run Scaler).
- Reasoning: The Agent uses LLM reasoning + Tools to generate a Change Set.
- Safety Check: The RiskEngine scores the Change Set. High-risk changes pause for Approval.
- Execution: Once approved, the ChangeSetExecutor applies the changes to GCP.
- Feedback: All steps emit events via the RealtimeService to the UI.
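
The flow above can be sketched as a small risk-gated pipeline. This is an illustrative sketch only, not the platform's actual code: the ChangeSet fields, the score_risk rule, the APPROVAL_THRESHOLD value, and the handle_trigger function are all hypothetical names chosen for the example, and the real RiskEngine, ChangeSetExecutor, and RealtimeService interfaces may look quite different.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable


class Status(Enum):
    PENDING_APPROVAL = "pending_approval"
    APPLIED = "applied"


@dataclass
class ChangeSet:
    """Hypothetical shape of a Change Set produced by a Domain Agent."""
    target: str
    action: str
    params: dict = field(default_factory=dict)


# Assumed threshold for illustration; the real policy is not described here.
APPROVAL_THRESHOLD = 0.7


def score_risk(change_set: ChangeSet) -> float:
    """Toy stand-in for the Risk Engine: destructive actions score high."""
    return 0.9 if change_set.action in {"delete", "scale_to_zero"} else 0.2


def handle_trigger(
    change_set: ChangeSet,
    emit: Callable[[str, str], None],
    approved: bool = False,
) -> Status:
    """Run one Change Set through safety check, approval gate, and execution.

    Every step calls emit(kind, detail), standing in for the event stream
    that the Real-time Service would forward to the UI.
    """
    emit("planned", change_set.target)

    risk = score_risk(change_set)
    emit("risk_scored", f"{risk:.1f}")

    # High-risk changes pause here until a human approves them.
    if risk >= APPROVAL_THRESHOLD and not approved:
        emit("approval_required", change_set.target)
        return Status.PENDING_APPROVAL

    # Stand-in for the Change Set Executor applying the change.
    emit("applied", change_set.target)
    return Status.APPLIED
```

A caller could pass a list-appending function as emit to inspect the event sequence, and call handle_trigger again with approved=True once a reviewer signs off. Keeping the approval gate in front of the apply step means a high-risk Change Set cannot reach execution without an explicit decision.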
Related Documentation
- Onboarding & Discovery: How users connect and scan their environments.
- Agent Execution: Deep dive into the Planner/Executor loop.
- Runtime Flow: Sequence diagrams for specific API interactions.
- Risk Guardrails: How safety policies are defined and enforced.