Risk Guardrails

Risk guardrails protect your infrastructure by enforcing cost and impact limits on all SmartSRE operations.

The 4-Tier Risk Model

SmartSRE classifies every remediation operation into one of four tiers:

Tier	Behavior	Example Operations
Tier 1: Auto-Execute	Executes immediately without approval	Memory increase within limits, min instance adjustment
Tier 2: Execute + Notify	Executes and sends notification	Max instance scaling, CPU increase
Tier 3: Require Approval	Pauses for human approval	High-cost changes, service restart
Tier 4: Manual Only	Creates ticket, never auto-executes	Resource deletion, IAM changes, cross-region moves
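
The tier behaviors above can be read as a simple dispatch table. The following is a minimal sketch, not SmartSRE's implementation: the `Tier` enum, the `actions_for` function, and the action names are hypothetical and only mirror the Behavior column.

```python
from enum import Enum


class Tier(Enum):
    """Hypothetical names mirroring the four tiers in the table."""
    AUTO_EXECUTE = 1
    EXECUTE_AND_NOTIFY = 2
    REQUIRE_APPROVAL = 3
    MANUAL_ONLY = 4


# Assumed action names; the real product may label these differently.
_ACTIONS = {
    Tier.AUTO_EXECUTE: ["execute"],
    Tier.EXECUTE_AND_NOTIFY: ["execute", "notify"],
    Tier.REQUIRE_APPROVAL: ["request_approval"],
    Tier.MANUAL_ONLY: ["create_ticket"],
}


def actions_for(tier: Tier) -> list[str]:
    """Return the ordered actions a tier implies, per the Behavior column."""
    return list(_ACTIONS[tier])


def auto_executes(tier: Tier) -> bool:
    """Only Tier 1 and Tier 2 run without waiting for a human."""
    return "execute" in _ACTIONS[tier]
```

In this reading, Tier 3 and Tier 4 never contain an `execute` action on their own; execution only follows a separate human decision.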

Risk Evaluation Flow

Configuring Guardrails

Global Settings

Navigate to Settings → Risk Policy to configure tenant-wide limits:

Setting	Description	Default
`max_cost_impact_auto_percent`	Max cost increase (% of budget) for auto-execute	5%
`max_cost_impact_approval_percent`	Max cost increase (% of budget) allowed even with approval	25%
`max_impact_score_auto`	Max impact score (0-100) for auto-execute	50
`require_approval_for_high_risk`	Always require approval for high-risk operations	true
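
One plausible way these four settings could combine is sketched below. The setting names and defaults come from the table; the `RiskPolicy` class, the `decide_path` function, the `high_risk` flag, the outcome labels, and the order of the checks are assumptions for illustration, since the exact evaluation order is not documented here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RiskPolicy:
    # Field names and defaults taken from the Global Settings table.
    max_cost_impact_auto_percent: float = 5.0
    max_cost_impact_approval_percent: float = 25.0
    max_impact_score_auto: int = 50
    require_approval_for_high_risk: bool = True


def decide_path(policy: RiskPolicy, cost_impact_percent: float,
                impact_score: int, high_risk: bool = False) -> str:
    """Return 'auto', 'approval', or 'blocked' (assumed outcome labels)."""
    # Above the approval ceiling, not even an approver can allow the change.
    if cost_impact_percent > policy.max_cost_impact_approval_percent:
        return "blocked"
    # High-risk operations always go to a human when the flag is set.
    if high_risk and policy.require_approval_for_high_risk:
        return "approval"
    # Both auto limits must hold for the change to run unattended.
    if (cost_impact_percent <= policy.max_cost_impact_auto_percent
            and impact_score <= policy.max_impact_score_auto):
        return "auto"
    return "approval"
```

With the defaults, a change costing 3% of budget with an impact score of 15 would take the `auto` path, while the same change at 10% of budget would need approval.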

Per-Service Overrides

Override specific limits for individual services:

{
  "service_overrides": {
    "cloudrun": {
      "max_cost_impact_auto_percent": 10,
      "max_memory_gi": 8,
      "max_cpu_m": 4000
    },
    "bigquery": {
      "min_slots": 100,
      "max_slots": 1000
    }
  }
}
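
A per-service override can be thought of as a shallow merge on top of the global settings. The sketch below assumes that model; the `effective_policy` function is hypothetical, and whether SmartSRE merges exactly this way is an assumption.

```python
# Global defaults from the Global Settings table.
GLOBAL_SETTINGS = {
    "max_cost_impact_auto_percent": 5,
    "max_cost_impact_approval_percent": 25,
    "max_impact_score_auto": 50,
    "require_approval_for_high_risk": True,
}

# Same shape as the "service_overrides" object in the example above.
SERVICE_OVERRIDES = {
    "cloudrun": {"max_cost_impact_auto_percent": 10,
                 "max_memory_gi": 8, "max_cpu_m": 4000},
    "bigquery": {"min_slots": 100, "max_slots": 1000},
}


def effective_policy(service: str) -> dict:
    """Global settings with the service's overrides applied on top."""
    merged = dict(GLOBAL_SETTINGS)                     # copy, do not mutate globals
    merged.update(SERVICE_OVERRIDES.get(service, {}))  # override wins per key
    return merged
```

Under this model, `cloudrun` would auto-execute up to a 10% cost impact, while `bigquery` keeps the global 5% limit and only gains its slot bounds.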

Cost Baselines

Project Budget (Recommended)

If your GCP project has a configured monthly_budget_usd, guardrails use this as the baseline:

percent_impact = estimated_cost_change / monthly_budget_usd × 100

Virtual Baseline (Fallback)

When no budget is configured, SmartSRE uses a virtual baseline (default: $100/month):

Allows small changes while catching large step-function increases
Shown in approval UI as "virtual baseline" vs "budget"
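
As a worked example of the formula and the fallback, the sketch below computes the percent impact against either baseline. The `percent_impact` function and its return shape are hypothetical; the formula, the $100/month default, and the "budget" and "virtual baseline" labels come from the text above. Treating a zero budget as "not configured" is an assumption.

```python
from typing import Optional, Tuple

VIRTUAL_BASELINE_USD = 100.0  # documented default virtual baseline per month


def percent_impact(estimated_cost_change_usd: float,
                   monthly_budget_usd: Optional[float] = None) -> Tuple[float, str]:
    """Return (percent impact, baseline label)."""
    if monthly_budget_usd:  # None or 0 falls back to the virtual baseline
        baseline, label = monthly_budget_usd, "budget"
    else:
        baseline, label = VIRTUAL_BASELINE_USD, "virtual baseline"
    return estimated_cost_change_usd / baseline * 100, label


# A $50/month increase against a $2,000 budget is a 2.5% impact.
print(percent_impact(50, 2000))
# The same $50 with no budget is 50% of the $100 virtual baseline.
print(percent_impact(50))
```

This shows why the virtual baseline catches step-function increases: without a budget, a $50/month change is far above the default 5% auto-execute limit, while a $3/month change stays under it.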

Circuit Breakers

Automatic safety stops that halt automation when anomalies are detected:

Circuit Breaker	Trigger	Effect
Failure Rate	> 10% of operations failing	Pause all auto-execute
Cost Overrun	Monthly costs exceed budget by 25%	Block cost-increasing changes
Error Spike	Error rate > 5% post-change	Trigger automatic rollback
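
The three triggers can be expressed as simple threshold checks. This is a sketch under stated assumptions: the `Metrics` fields, the `tripped_breakers` function, and the breaker identifiers are hypothetical, and "exceed budget by 25%" is read here as costs above 1.25 times the budget.

```python
from dataclasses import dataclass


@dataclass
class Metrics:
    # Hypothetical inputs; rates are fractions between 0 and 1.
    operations_total: int
    operations_failed: int
    monthly_cost_usd: float
    monthly_budget_usd: float
    post_change_error_rate: float


def tripped_breakers(m: Metrics) -> list[str]:
    """Return the breakers whose documented trigger condition holds."""
    tripped = []
    # Failure Rate: more than 10% of operations failing.
    if m.operations_total and m.operations_failed / m.operations_total > 0.10:
        tripped.append("failure_rate")   # effect: pause all auto-execute
    # Cost Overrun: monthly costs exceed budget by 25%.
    if m.monthly_cost_usd > m.monthly_budget_usd * 1.25:
        tripped.append("cost_overrun")   # effect: block cost-increasing changes
    # Error Spike: error rate above 5% after a change.
    if m.post_change_error_rate > 0.05:
        tripped.append("error_spike")    # effect: trigger automatic rollback
    return tripped
```

Each breaker is evaluated independently, so more than one can be active at a time.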

Impact Scoring

Each operation has an impact score from 0 to 100, computed from four weighted factors:

Factor	Weight	Examples
Blast Radius	40%	Single resource vs entire service
Reversibility	30%	Easy rollback vs permanent delete
Service Criticality	20%	Production vs development
Time Sensitivity	10%	Business hours vs maintenance window
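
Read as a weighted average, the table suggests a calculation like the one below. The weights come from the table; the `impact_score` function, the factor keys, and the assumption that each factor is itself rated from 0 to 100 are illustrative only, and the sketch does not claim to reproduce SmartSRE's actual scores.

```python
# Weights in percent, from the Impact Scoring table.
WEIGHTS = {
    "blast_radius": 40,
    "reversibility": 30,
    "service_criticality": 20,
    "time_sensitivity": 10,
}


def impact_score(factors: dict) -> int:
    """Weighted average of per-factor ratings (each assumed to be 0-100)."""
    missing = WEIGHTS.keys() - factors.keys()
    if missing:
        raise ValueError(f"missing factors: {sorted(missing)}")
    for name in WEIGHTS:
        if not 0 <= factors[name] <= 100:
            raise ValueError(f"{name} must be between 0 and 100")
    total = sum(WEIGHTS[name] * factors[name] for name in WEIGHTS)
    return round(total / 100)


# Hypothetical ratings: 0.40*20 + 0.30*10 + 0.20*50 + 0.10*0 = 21
print(impact_score({"blast_radius": 20, "reversibility": 10,
                    "service_criticality": 50, "time_sensitivity": 0}))
```

Because blast radius and reversibility together carry 70% of the weight, a hard-to-reverse operation on a whole service scores high even outside production.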

Example Scores

Operation	Impact Score
Scale Cloud Run memory up	15
Set min instances to 1	25
Restart Cloud Run service	45
Delete GCS bucket	90

Approval Integration

When guardrails require approval:

Approval Request Created — Contains ChangeSet, cost estimate, risk assessment
Notification Sent — Via configured channels (webhook, email)
Timer Starts — Default 30-minute expiration
Decision Made — Approved, Rejected, or Expired
Execution or Rollback — Based on decision
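
The lifecycle above behaves like a small state machine with a timeout. The sketch below models only the timer and the decision; the `ApprovalRequest` class, its fields, and the lowercase status strings are hypothetical, while the 30-minute default comes from the list above.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional

DEFAULT_EXPIRY = timedelta(minutes=30)  # documented default expiration


@dataclass
class ApprovalRequest:
    created_at: datetime
    expires_after: timedelta = DEFAULT_EXPIRY
    decision: Optional[str] = None  # "approved" or "rejected" once decided

    def status(self, now: datetime) -> str:
        """Return 'pending', 'approved', 'rejected', or 'expired'."""
        if self.decision is not None:
            return self.decision
        if now >= self.created_at + self.expires_after:
            return "expired"
        return "pending"

    def decide(self, decision: str, now: datetime) -> None:
        """Record a decision; only allowed while the request is pending."""
        if decision not in ("approved", "rejected"):
            raise ValueError("decision must be 'approved' or 'rejected'")
        if self.status(now) != "pending":
            raise RuntimeError(f"request is already {self.status(now)}")
        self.decision = decision


created = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
request = ApprovalRequest(created_at=created)
request.decide("approved", now=created + timedelta(minutes=5))
```

In this sketch an expired request can no longer be approved, so a late click cannot trigger execution.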

See Approvals Guide for details.

Currency Handling

Mixed Currency Scenarios

When a project's cost currency differs from the guardrail baseline currency (USD):

Auto-execute is disabled for cost-impacting changes
Approval is forced with CURRENCY_MISMATCH reason
UI shows both project currency and baseline currency
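
The mismatch rule can be sketched as a gate that runs before any cost comparison. The `currency_gate` function and the shape of its result are hypothetical; the USD baseline and the `CURRENCY_MISMATCH` reason come from the list above.

```python
BASELINE_CURRENCY = "USD"  # guardrail baseline currency per this guide


def currency_gate(project_currency: str, cost_impacting: bool) -> dict:
    """Decide whether a currency mismatch forces the approval path."""
    mismatch = project_currency.upper() != BASELINE_CURRENCY
    if cost_impacting and mismatch:
        # Auto-execute is disabled and approval is forced with a reason code.
        return {"auto_execute_blocked": True,
                "approval_reason": "CURRENCY_MISMATCH"}
    # No mismatch effect; other guardrails still apply to the change.
    return {"auto_execute_blocked": False, "approval_reason": None}
```

Changes with no cost impact pass through this gate even when the currencies differ, which matches the wording "for cost-impacting changes".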

Best Practices

Start Conservative

Begin with low thresholds and increase as you gain confidence:

{
  "max_cost_impact_auto_percent": 2,
  "max_impact_score_auto": 30,
  "require_approval_for_high_risk": true
}

Use Per-Service Overrides

Match limits to each service's risk: a production Cloud Run service may need stricter limits than a development BigQuery environment.

Monitor Circuit Breakers

Enable alerts for circuit breaker activations to catch systemic issues.

Viewing Risk Assessments

Every scan and execution includes a risk assessment visible in:

Run Details → Risk Tab — Full breakdown of risk factors
Approval Requests — Summary of why approval is required
Audit Trail — Historical risk decisions

Next Steps

Approvals — Handle approval workflows
Scope Management — Restrict allowed operations per scope
Cost Control — Budget configuration
