Risk Guardrails & Approvals
Overview
- Tenant risk configuration governs caps, approval requirements, and global settings.
- The Executor enforces guardrails before applying changes; the Approval Manager gates high-risk actions.
- Safety Profiles: Tenants select from predefined profiles (Conservative, Balanced, Aggressive) or configure Custom settings.
Configuration model
- Defaults in src/core/config.py:DEFAULT_RISK_CONFIG.
- Tenant overlays via ConfigurationService.get_tenant_risk_config().
- Risk Engine: AdvancedRiskEngine calculates risk scores dynamically based on:
  - Profile Rules: Explicitly allowed/forbidden operations.
  - Inherent Risk: Metadata from the operation registry (Low/Medium/High/Critical).
  - Context: Cost impact, historical failure rates, and time of day.
- No Magic Inference: Previous "Industry Template" inference (guessing risk based on org name) has been removed in favor of explicit Profile selection.
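
The defaults-plus-overlay model above can be pictured as a recursive dictionary merge. The following is a minimal sketch, not the actual behavior of ConfigurationService.get_tenant_risk_config(): the helper name merge_risk_config, the nested cloud_run key, and every default value shown are assumptions made for illustration.

```python
from copy import deepcopy
from typing import Any, Mapping

# Hypothetical stand-in for src/core/config.py:DEFAULT_RISK_CONFIG.
# The real keys and values may differ.
DEFAULT_RISK_CONFIG: dict[str, Any] = {
    "profile": "balanced",
    "auto_approve_rollbacks": False,
    "max_changes_per_run": 5,
    "cloud_run": {"max_memory_gi": 8, "max_cpu_m": 4000},
}


def merge_risk_config(
    defaults: Mapping[str, Any], overlay: Mapping[str, Any]
) -> dict[str, Any]:
    """Return a copy of defaults with the tenant overlay applied.

    Nested mappings are merged key by key; any other value in the
    overlay replaces the default outright. Neither input is mutated.
    """
    merged = deepcopy(dict(defaults))
    for key, value in overlay.items():
        if isinstance(value, Mapping) and isinstance(merged.get(key), dict):
            merged[key] = merge_risk_config(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


# A tenant that tightens the profile and one Cloud Run cap (hypothetical).
tenant_overlay = {"profile": "conservative", "cloud_run": {"max_memory_gi": 4}}
effective = merge_risk_config(DEFAULT_RISK_CONFIG, tenant_overlay)
```

Merging nested sections key by key lets a tenant override a single cap without restating the rest of that section.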
Safety Profiles
| Profile | Description | Key Behaviors |
|---|---|---|
| Conservative | Maximum safety; manual approvals for all stateful changes. | Blocks sensitive ops (e.g., grant_invoker); requires approval for DB/BQ changes. |
| Balanced | Middle ground; approvals for database changes, autonomous tuning for stateless services. | Allows standard scaling; gates destructive ops. |
| Aggressive | Full autopilot; executes playbooks automatically. | Minimal gates; relies on rollback safety nets. |
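
One way to read the table is that each profile combines a set of forbidden operations with an approval threshold on inherent risk. The sketch below illustrates that reading only. The SafetyProfile class, the decide function, the thresholds, and the operation names other than grant_invoker are all hypothetical and are not taken from AdvancedRiskEngine.

```python
from dataclasses import dataclass
from enum import IntEnum


class Risk(IntEnum):
    """Inherent risk levels, mirroring the operation registry's labels."""
    LOW = 1
    MEDIUM = 2
    HIGH = 3
    CRITICAL = 4


@dataclass(frozen=True)
class SafetyProfile:
    name: str
    forbidden_ops: frozenset[str] = frozenset()
    # Operations at or above this inherent risk need manual approval.
    approval_at: Risk = Risk.HIGH


# Hypothetical thresholds chosen to match the table's descriptions.
PROFILES = {
    "conservative": SafetyProfile(
        "conservative", frozenset({"grant_invoker"}), Risk.MEDIUM
    ),
    "balanced": SafetyProfile("balanced", frozenset(), Risk.HIGH),
    "aggressive": SafetyProfile("aggressive", frozenset(), Risk.CRITICAL),
}


def decide(profile: SafetyProfile, operation: str, inherent: Risk) -> str:
    """Return 'block', 'require_approval', or 'auto_apply'."""
    if operation in profile.forbidden_ops:
        return "block"  # explicit profile rules win over risk levels
    if inherent >= profile.approval_at:
        return "require_approval"
    return "auto_apply"
```

Checking the explicit rules first means a forbidden operation is blocked even when its inherent risk is low. This sketch ignores the context factors (cost impact, failure history, time of day).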
Rollback Safety
- Default: Rollbacks are not auto-approved (auto_approve_rollbacks: False).
- Intelligent Assessment: The Risk Engine analyzes rollback checkpoints:
  - Safe: Compute scaling (e.g., Cloud Run memory) is Low Risk.
  - Destructive: Data ops (e.g., BigQuery table revert, schema changes) or massive scale events are High Risk and require approval.
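
The rollback assessment above could look roughly like the sketch below. It assumes one possible interaction between the two rules: high-risk rollbacks always require approval, while low-risk rollbacks skip approval only when auto_approve_rollbacks is enabled. The RollbackCheckpoint type, the operation names, and the scale threshold are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical operation names treated as destructive data operations.
DATA_OPS = {"bigquery_table_revert", "schema_change"}
# Hypothetical cutoff for a "massive" scale event (ratio of sizes).
MASSIVE_SCALE_FACTOR = 4.0


@dataclass(frozen=True)
class RollbackCheckpoint:
    operation: str
    scale_factor: float = 1.0  # how far the rollback moves capacity


def assess_rollback(
    checkpoint: RollbackCheckpoint, auto_approve_rollbacks: bool = False
) -> tuple[str, bool]:
    """Return (risk_level, needs_approval) for a rollback checkpoint."""
    destructive = (
        checkpoint.operation in DATA_OPS
        or checkpoint.scale_factor >= MASSIVE_SCALE_FACTOR
    )
    risk = "high" if destructive else "low"
    # Assumed policy: destructive rollbacks are never auto-approved.
    needs_approval = destructive or not auto_approve_rollbacks
    return risk, needs_approval
```

With the default setting, even a low-risk Cloud Run memory rollback still waits for approval under this reading; only the tenant flag lifts that gate.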
Approval flow
Guardrail examples
- Blast radius: max_changes_per_run, allowed_services.
- Cloud Run caps: max_memory_gi, max_cpu_m.
- GKE caps: min_nodes, max_nodes, max_hpa_max.
- BigQuery caps: min_slots, max_slots.
- Cloud SQL: allow_ha for HA management.
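
As an illustration of how such caps might be checked before apply, here is a minimal sketch. It is not the Executor's actual logic: the check_guardrails function, the shape of a change record, the per-service nesting of caps, and all values are assumptions. Only the cap names come from the list above.

```python
from typing import Any

# Hypothetical tenant guardrails using the cap names listed above.
guardrails: dict[str, Any] = {
    "max_changes_per_run": 3,
    "allowed_services": ["cloud_run", "gke"],
    "cloud_run": {"max_memory_gi": 8, "max_cpu_m": 4000},
}


def check_guardrails(
    changes: list[dict[str, Any]], config: dict[str, Any]
) -> list[str]:
    """Return human-readable violations; an empty list means the run may proceed."""
    violations: list[str] = []
    # Blast radius: cap the number of changes in a single run.
    if len(changes) > config["max_changes_per_run"]:
        violations.append(
            f"too many changes: {len(changes)} > {config['max_changes_per_run']}"
        )
    for change in changes:
        service = change["service"]
        if service not in config["allowed_services"]:
            violations.append(f"service not allowed: {service}")
            continue
        # Compare each requested parameter against its max_<name> cap, if any.
        caps = config.get(service, {})
        for key, value in change.get("params", {}).items():
            cap = caps.get(f"max_{key}")
            if cap is not None and value > cap:
                violations.append(
                    f"{service}.{key}={value} exceeds max_{key}={cap}"
                )
    return violations


proposed = [
    {"service": "cloud_run", "params": {"memory_gi": 16, "cpu_m": 2000}},
    {"service": "cloud_sql", "params": {}},
]
violations = check_guardrails(proposed, guardrails)
```

Collecting every violation instead of stopping at the first gives the approver the full picture in one pass. Minimum caps such as min_nodes and min_slots would need a matching lower-bound check, which this sketch omits.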