Key Concepts

This guide explains the core terminology and concepts used throughout SmartSRE.

Findings

A Finding represents an issue or optimization opportunity detected during a scan.

{
  "id": "finding-abc123",
  "service": "cloudrun",
  "resource_id": "projects/my-project/locations/us-central1/services/api",
  "issue_type": "high_memory_usage",
  "severity": "medium",
  "details": {
    "current_memory": "2Gi",
    "avg_utilization": "15%",
    "recommended_memory": "512Mi"
  }
}
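
To make the details block concrete, the following is a minimal sketch of how a client might compare current_memory with recommended_memory. The helper to_mib and the assumption that quantities only use the Mi and Gi suffixes are illustrative and not part of SmartSRE.

```python
# Illustrative only: to_mib is a hypothetical helper, and only the
# "Mi" and "Gi" suffixes seen in the example finding are handled.
UNITS_MIB = {"Mi": 1, "Gi": 1024}

def to_mib(quantity: str) -> int:
    """Convert a quantity such as '2Gi' or '512Mi' to mebibytes."""
    for suffix, factor in UNITS_MIB.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * factor
    raise ValueError(f"unsupported memory quantity: {quantity!r}")

details = {
    "current_memory": "2Gi",
    "avg_utilization": "15%",
    "recommended_memory": "512Mi",
}

current = to_mib(details["current_memory"])
recommended = to_mib(details["recommended_memory"])
reduction_pct = 100 * (current - recommended) / current
print(f"{current} MiB -> {recommended} MiB ({reduction_pct:.0f}% reduction)")
```

For the finding above, this works out to a reduction from 2048 MiB to 512 MiB.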

Severity Levels

| Severity | Description | Examples |
|----------|-------------|----------|
| Critical | Immediate action required: security breach or service down | Public bucket with sensitive data, expired SSL certificate |
| High | Significant impact: performance degradation or cost overrun | Memory OOM crashes, runaway query costs |
| Medium | Should address soon: suboptimal configuration | Over-provisioned resources, missing lifecycle policies |
| Low | Nice to fix: minor optimization | Unused but cheap resources |
| Info | Informational only: no action needed | Successful configurations, compliance confirmations |
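
When triaging scan results, it can help to order findings by these levels. The sketch below assumes findings arrive as dictionaries with a lowercase severity field, as in the example Finding; the numeric ranks are an assumption made for sorting, not values defined by SmartSRE.

```python
# Severity names come from the levels listed above; the numeric ranks
# are an assumption used only to define a sort order.
SEVERITY_RANK = {"critical": 0, "high": 1, "medium": 2, "low": 3, "info": 4}

def sort_by_severity(findings):
    """Return findings ordered from most to least severe."""
    return sorted(findings, key=lambda f: SEVERITY_RANK[f["severity"].lower()])

# Hypothetical findings, reduced to the fields the sort needs.
findings = [
    {"id": "finding-1", "severity": "low"},
    {"id": "finding-2", "severity": "critical"},
    {"id": "finding-abc123", "severity": "medium"},
]

ordered = sort_by_severity(findings)
print([f["id"] for f in ordered])
```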

ChangeSets

A ChangeSet is a collection of atomic operations proposed to remediate one or more findings.

{
  "service": "cloudrun",
  "intent": "Reduce memory allocation to match actual usage",
  "steps": [
    {
      "op": "scale_memory",
      "resource_ref": {
        "project_id": "my-project",
        "region": "us-central1",
        "service_name": "api"
      },
      "params": {
        "memory": "512Mi"
      },
      "impact_score": 25,
      "estimated_cost_usd": -15.00
    }
  ]
}

ChangeStep Properties

| Property | Description |
|----------|-------------|
| op | Canonical operation name (e.g., scale_memory, set_lifecycle_rule) |
| resource_ref | Target resource identifier |
| params | Operation-specific parameters |
| impact_score | Score from 0 to 100 indicating potential disruption (higher = riskier) |
| estimated_cost_usd | Expected monthly cost change in USD (negative = savings) |
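
As a rough illustration of how these properties might be consumed, the sketch below models a step as a dataclass and summarizes a ChangeSet. The ChangeStep class, the summarize function, and the choice to take the maximum impact_score across steps are assumptions for illustration; this document does not specify how SmartSRE aggregates steps. The second step and its numbers are made up.

```python
from dataclasses import dataclass

@dataclass
class ChangeStep:
    """Hypothetical client-side model of one step in a ChangeSet."""
    op: str
    resource_ref: dict
    params: dict
    impact_score: int          # 0-100, higher means more disruptive
    estimated_cost_usd: float  # monthly change, negative means savings

def summarize(steps):
    """Assumed aggregation: worst-case impact and net monthly cost change."""
    return {
        "max_impact_score": max(s.impact_score for s in steps),
        "net_cost_usd": sum(s.estimated_cost_usd for s in steps),
    }

ref = {"project_id": "my-project", "region": "us-central1", "service_name": "api"}
steps = [
    ChangeStep("scale_memory", ref, {"memory": "512Mi"}, 25, -15.00),
    # Made-up second step to show aggregation across several operations.
    ChangeStep("set_min_instances", ref, {"min_instances": 1}, 40, 8.00),
]

summary = summarize(steps)
print(summary)
```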

Scopes

A Scope defines which resources SmartSRE will scan and what operations are permitted.

Use Cases

  • Limit by project: Only scan prod-project, not dev-project
  • Limit by service: Only scan Cloud Run services, not BigQuery
  • Limit by region: Only scan us-central1 resources
  • Limit by operation: Allow scale_memory but not delete_service
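
The first three limits above amount to a membership check per dimension. The following sketch assumes a scope is a dictionary whose project_id, service, and region keys each hold a list of permitted values, with a missing key meaning "no restriction". These field names and the in_scope helper are hypothetical and do not reflect the actual SmartSRE scope schema.

```python
def in_scope(resource: dict, scope: dict) -> bool:
    """True when the resource passes every dimension the scope restricts."""
    for key in ("project_id", "service", "region"):
        allowed = scope.get(key)  # None means this dimension is unrestricted
        if allowed is not None and resource[key] not in allowed:
            return False
    return True

# Hypothetical scope: only Cloud Run services in prod-project, any region.
scope = {"project_id": ["prod-project"], "service": ["cloudrun"]}

prod_api = {"project_id": "prod-project", "service": "cloudrun", "region": "us-central1"}
dev_api = {"project_id": "dev-project", "service": "cloudrun", "region": "us-central1"}
prod_bq = {"project_id": "prod-project", "service": "bigquery", "region": "us-central1"}

print(in_scope(prod_api, scope), in_scope(dev_api, scope), in_scope(prod_bq, scope))
```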

Scope Policies

Scopes can include an allowed_ops policy that restricts which operations SmartSRE can execute:

{
  "policy": {
    "allowed_ops": ["scale_memory", "scale_cpu", "set_min_instances"],
    "risk_profile": "guarded"
  }
}

If an operation is not in allowed_ops, SmartSRE will block execution even if the finding is valid.
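
The blocking rule can be pictured as a simple set-membership check over the steps of a ChangeSet. This is a minimal sketch of that idea, not SmartSRE's enforcement code; the blocked_ops helper is a hypothetical name, and the second step is included only to show a rejected operation.

```python
def blocked_ops(changeset: dict, policy: dict) -> list:
    """Return the ops in the ChangeSet that the scope policy does not allow."""
    allowed = set(policy["allowed_ops"])
    return [step["op"] for step in changeset["steps"] if step["op"] not in allowed]

policy = {
    "allowed_ops": ["scale_memory", "scale_cpu", "set_min_instances"],
    "risk_profile": "guarded",
}

changeset = {
    "service": "cloudrun",
    "steps": [
        {"op": "scale_memory", "params": {"memory": "512Mi"}},
        {"op": "delete_service", "params": {}},  # not in allowed_ops
    ],
}

blocked = blocked_ops(changeset, policy)
if blocked:
    print("Execution blocked; disallowed ops:", blocked)
```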

Risk Profiles

A Risk Profile determines how aggressively SmartSRE acts on findings.

| Profile | Behavior |
|---------|----------|
| Conservative | All changes require approval; even low-impact changes are flagged |
| Balanced | Default; follows standard risk/cost guardrails |
| Aggressive | Lower thresholds for auto-approval; suited for non-production |

Approvals

When a ChangeSet exceeds risk thresholds, SmartSRE creates an Approval Request.
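
One way to picture the threshold check is a per-profile limit on impact_score. The numbers below are placeholders chosen for illustration; this document does not state SmartSRE's actual thresholds, and the needs_approval helper is hypothetical. The only behavior taken from this guide is that the Conservative profile requires approval for every change.

```python
# Placeholder limits, NOT SmartSRE's real thresholds. None means
# "never auto-approve", matching the Conservative profile.
AUTO_APPROVE_MAX_IMPACT = {"conservative": None, "balanced": 50, "aggressive": 75}

def needs_approval(profile: str, impact_scores: list) -> bool:
    """True when the riskiest step exceeds the profile's assumed limit."""
    limit = AUTO_APPROVE_MAX_IMPACT[profile]
    if limit is None:
        return True
    return max(impact_scores) > limit

print(needs_approval("conservative", [25]))
print(needs_approval("balanced", [25]))
print(needs_approval("balanced", [60]))
```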

Approval States

Approval requests can be delivered via:

  • Webhook — POST to a configured URL with HMAC signature
  • Email — Notification to designated approvers
  • In-App — Visible on the Approvals page
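
A receiver of the webhook would typically verify the HMAC signature before trusting the payload. The sketch below uses Python's standard hmac module and assumes HMAC-SHA256 with a hex-encoded signature computed over the raw request body. The algorithm, the encoding, the way the signature is transmitted, and the approval_id field in the sample body are all assumptions; check the SmartSRE webhook reference for the actual scheme.

```python
import hashlib
import hmac

def sign(secret: bytes, body: bytes) -> str:
    """Compute a hex HMAC-SHA256 signature over the raw body (assumed scheme)."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()

def verify(secret: bytes, body: bytes, received_signature: str) -> bool:
    """Compare in constant time to avoid leaking information via timing."""
    expected = sign(secret, body)
    return hmac.compare_digest(expected, received_signature)

# Hypothetical secret and payload, for demonstration only.
secret = b"example-shared-secret"
body = b'{"approval_id": "example"}'
signature = sign(secret, body)

print(verify(secret, body, signature))          # genuine payload
print(verify(secret, body + b" ", signature))   # tampered payload
```

Verifying against the raw bytes, rather than a re-serialized JSON object, avoids false mismatches caused by whitespace or key-order differences.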

Checkpoints & Rollbacks

Before executing a ChangeSet, SmartSRE creates a Checkpoint capturing the pre-change state.

{
  "checkpoint_id": "cp-789xyz",
  "execution_id": "exec-456",
  "service": "cloudrun",
  "before_state": {
    "memory": "2Gi",
    "cpu": "2",
    "min_instances": 0
  },
  "change_steps_applied": ["scale_memory"],
  "ttl_hours": 72
}

If issues arise post-execution, the Rollback operation uses this checkpoint to restore the original state.
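
Conceptually, a rollback reads the fields touched by the applied steps back out of before_state. The sketch below illustrates that idea under several assumptions: the OP_TO_FIELD mapping from operation names to state fields is hypothetical, ttl_hours is read as a retention window after which the checkpoint can no longer be used, and the created_at timestamp is invented because the example checkpoint does not include one.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical mapping from operation name to the state field it changes.
OP_TO_FIELD = {
    "scale_memory": "memory",
    "scale_cpu": "cpu",
    "set_min_instances": "min_instances",
}

def rollback_params(checkpoint: dict) -> dict:
    """Collect the pre-change values for every field the applied steps touched."""
    before = checkpoint["before_state"]
    fields = [OP_TO_FIELD[op] for op in checkpoint["change_steps_applied"]]
    return {name: before[name] for name in fields}

def is_expired(created_at: datetime, ttl_hours: int, now: datetime) -> bool:
    """Assumed meaning of ttl_hours: the checkpoint is unusable after this window."""
    return now >= created_at + timedelta(hours=ttl_hours)

checkpoint = {
    "checkpoint_id": "cp-789xyz",
    "before_state": {"memory": "2Gi", "cpu": "2", "min_instances": 0},
    "change_steps_applied": ["scale_memory"],
    "ttl_hours": 72,
}

created_at = datetime(2024, 1, 1, tzinfo=timezone.utc)  # invented timestamp
now = created_at + timedelta(hours=24)

if not is_expired(created_at, checkpoint["ttl_hours"], now):
    print("Restore:", rollback_params(checkpoint))
```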

Tenants

A Tenant represents an organization using SmartSRE. Each tenant has:

  • Isolated data (projects, scopes, findings, audit logs)
  • Separate billing and subscription
  • Independent RBAC configuration

Users can belong to multiple tenants and switch between them.