Key Concepts
This guide explains the core terminology and concepts used throughout SmartSRE.
Findings
A Finding represents an issue or optimization opportunity detected during a scan.
{
  "id": "finding-abc123",
  "service": "cloudrun",
  "resource_id": "projects/my-project/locations/us-central1/services/api",
  "issue_type": "high_memory_usage",
  "severity": "medium",
  "details": {
    "current_memory": "2Gi",
    "avg_utilization": "15%",
    "recommended_memory": "512Mi"
  }
}
Severity Levels
| Severity | Description | Examples |
|---|---|---|
| Critical | Immediate action required—security breach or service down | Public bucket with sensitive data, expired SSL certificate |
| High | Significant impact—performance degradation or cost overrun | Memory OOM crashes, runaway query costs |
| Medium | Should address soon—suboptimal configuration | Over-provisioned resources, missing lifecycle policies |
| Low | Nice to fix—minor optimization | Unused but cheap resources |
| Info | Informational only—no action needed | Successful configurations, compliance confirmations |
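The severity ordering in the table can be used to triage scan output. The following is a minimal sketch, assuming findings are available as plain dictionaries shaped like the Finding example above. `SEVERITY_RANK` and `sort_findings` are hypothetical names for illustration, not part of SmartSRE.

```python
# Hypothetical helper for illustration; not a SmartSRE API.
# Lower rank means more urgent, following the severity table.
SEVERITY_RANK = {"critical": 0, "high": 1, "medium": 2, "low": 3, "info": 4}


def sort_findings(findings):
    """Return findings ordered from most to least severe."""
    return sorted(findings, key=lambda f: SEVERITY_RANK[f["severity"].lower()])


findings = [
    {"id": "finding-1", "severity": "low"},
    {"id": "finding-abc123", "severity": "medium"},
    {"id": "finding-2", "severity": "critical"},
]
ordered_ids = [f["id"] for f in sort_findings(findings)]
```

Because `sorted` is stable, findings that share a severity keep their original relative order.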
ChangeSets
A ChangeSet is a collection of atomic operations proposed to remediate one or more findings.
{
  "service": "cloudrun",
  "intent": "Reduce memory allocation to match actual usage",
  "steps": [
    {
      "op": "scale_memory",
      "resource_ref": {
        "project_id": "my-project",
        "region": "us-central1",
        "service_name": "api"
      },
      "params": {
        "memory": "512Mi"
      },
      "impact_score": 25,
      "estimated_cost_usd": -15.00
    }
  ]
}
ChangeStep Properties
| Property | Description |
|---|---|
| op | Canonical operation name (e.g., scale_memory, set_lifecycle_rule) |
| resource_ref | Target resource identifier |
| params | Operation-specific parameters |
| impact_score | 0-100 score indicating potential disruption (higher = more risky) |
| estimated_cost_usd | Expected monthly cost change (negative = savings) |
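To make the properties above concrete, here is a small sketch that models a step and summarizes a ChangeSet. It assumes only the five fields listed in the table. The `ChangeStep` dataclass and the `summarize` function are illustrative names, not SmartSRE's own types.

```python
from dataclasses import dataclass


@dataclass
class ChangeStep:
    """Illustrative model of one step; field names follow the table above."""
    op: str
    resource_ref: dict
    params: dict
    impact_score: int
    estimated_cost_usd: float

    def __post_init__(self):
        # The table describes impact_score as a 0-100 score.
        if not 0 <= self.impact_score <= 100:
            raise ValueError("impact_score must be between 0 and 100")


def summarize(steps):
    """Return (highest impact score, net monthly cost change in USD)."""
    max_impact = max((s.impact_score for s in steps), default=0)
    net_cost = sum(s.estimated_cost_usd for s in steps)
    return max_impact, net_cost


step = ChangeStep(
    op="scale_memory",
    resource_ref={"project_id": "my-project", "region": "us-central1", "service_name": "api"},
    params={"memory": "512Mi"},
    impact_score=25,
    estimated_cost_usd=-15.00,
)
max_impact, net_cost = summarize([step])
```

A negative `net_cost` means the ChangeSet is expected to save money overall.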
Scopes
A Scope defines which resources SmartSRE will scan and what operations are permitted.
Use Cases
- Limit by project: Only scan prod-project, not dev-project
- Limit by service: Only scan Cloud Run services, not BigQuery
- Limit by region: Only scan us-central1 resources
- Limit by operation: Allow scale_memory but not delete_service
Scope Policies
Scopes can include an allowed_ops policy that restricts which operations SmartSRE can execute:
{
  "policy": {
    "allowed_ops": ["scale_memory", "scale_cpu", "set_min_instances"],
    "risk_profile": "guarded"
  }
}
If an operation is not in allowed_ops, SmartSRE will block execution even if the finding is valid.
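The check described above can be sketched as a simple allowlist lookup. This assumes the policy and ChangeSet are dictionaries shaped like the JSON examples in this guide. `blocked_ops` is a hypothetical helper, and how SmartSRE actually enforces the policy internally is not documented here.

```python
def blocked_ops(changeset, policy):
    """Return the ops in the ChangeSet that the policy does not allow.

    Hypothetical helper: an empty result means every step is permitted.
    """
    allowed = set(policy["allowed_ops"])
    return [step["op"] for step in changeset["steps"] if step["op"] not in allowed]


policy = {
    "allowed_ops": ["scale_memory", "scale_cpu", "set_min_instances"],
    "risk_profile": "guarded",
}
changeset = {
    "service": "cloudrun",
    "steps": [
        {"op": "scale_memory", "params": {"memory": "512Mi"}},
        {"op": "delete_service", "params": {}},
    ],
}
rejected = blocked_ops(changeset, policy)
```

Here `delete_service` is rejected because it does not appear in `allowed_ops`, while `scale_memory` passes.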
Risk Profiles
A Risk Profile determines how aggressively SmartSRE acts on findings.
| Profile | Behavior |
|---|---|
| Conservative | All changes require approval; even low-impact changes are flagged |
| Balanced | Default; follows standard risk/cost guardrails |
| Aggressive | Lower thresholds for auto-approval; suited for non-production |
Approvals
When a ChangeSet exceeds risk thresholds, SmartSRE creates an Approval Request.
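As a rough illustration of this gate, the sketch below treats the risk threshold as a single ceiling on `impact_score`. This is an assumption: the guide does not state how thresholds are computed, and actual guardrails may also weigh cost or other signals. `needs_approval` and the threshold values used are made up for the example.

```python
def needs_approval(changeset, max_auto_impact):
    """Return True if any step's impact_score exceeds the given ceiling.

    Hypothetical rule for illustration; max_auto_impact is not a documented
    SmartSRE setting.
    """
    return any(step["impact_score"] > max_auto_impact for step in changeset["steps"])


changeset = {"steps": [{"op": "scale_memory", "impact_score": 25}]}

auto_ok = not needs_approval(changeset, max_auto_impact=50)   # under the ceiling
must_review = needs_approval(changeset, max_auto_impact=10)   # over the ceiling
```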
Approval States
Approvals can be delivered via:
- Webhook — POST to a configured URL with HMAC signature
- Email — Notification to designated approvers
- In-App — Visible on the Approvals page
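For the webhook channel, the receiver should verify the HMAC signature before trusting the payload. The sketch below assumes HMAC-SHA256 computed over the raw request body and sent as a hex string. The hash algorithm, the encoding, and how the signature is transmitted are all assumptions here, so check the webhook configuration for the actual scheme.

```python
import hashlib
import hmac


def verify_signature(secret: bytes, body: bytes, signature_hex: str) -> bool:
    """Check a hex HMAC-SHA256 signature against the raw request body.

    Assumed scheme for illustration; SmartSRE's actual algorithm and
    encoding may differ.
    """
    expected = hmac.new(secret, body, hashlib.sha256).hexdigest()
    # compare_digest runs in constant time to avoid leaking timing information.
    return hmac.compare_digest(expected, signature_hex)


secret = b"example-shared-secret"          # placeholder, not a real secret
body = b'{"approval_id": "example"}'       # placeholder payload
good_signature = hmac.new(secret, body, hashlib.sha256).hexdigest()

is_valid = verify_signature(secret, body, good_signature)
is_tampered_valid = verify_signature(secret, body + b" ", good_signature)
```

Verification must use the exact bytes received, since re-serializing the JSON can change whitespace or key order and invalidate the signature.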
Checkpoints & Rollbacks
Before executing a ChangeSet, SmartSRE creates a Checkpoint capturing the pre-change state.
{
  "checkpoint_id": "cp-789xyz",
  "execution_id": "exec-456",
  "service": "cloudrun",
  "before_state": {
    "memory": "2Gi",
    "cpu": "2",
    "min_instances": 0
  },
  "change_steps_applied": ["scale_memory"],
  "ttl_hours": 72
}
If issues arise post-execution, the Rollback operation uses this checkpoint to restore the original state.
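The sketch below shows one way to reason about a rollback from a checkpoint like the one above. It rests on two assumptions not stated in this guide: that `ttl_hours` counts from the checkpoint's creation time (the `created_at` argument is hypothetical, since the example has no timestamp field), and that a rollback only needs to restore the fields that differ from `before_state`.

```python
from datetime import datetime, timedelta, timezone


def checkpoint_expired(created_at, ttl_hours, now=None):
    """Return True once ttl_hours have passed since created_at (assumed semantics)."""
    now = now or datetime.now(timezone.utc)
    return now >= created_at + timedelta(hours=ttl_hours)


def rollback_diff(before_state, current_state):
    """Return the before_state values that differ from the current state."""
    return {k: v for k, v in before_state.items() if current_state.get(k) != v}


before_state = {"memory": "2Gi", "cpu": "2", "min_instances": 0}
current_state = {"memory": "512Mi", "cpu": "2", "min_instances": 0}
to_restore = rollback_diff(before_state, current_state)

created = datetime(2024, 1, 1, tzinfo=timezone.utc)
still_valid = not checkpoint_expired(created, 72, now=created + timedelta(hours=71))
expired = checkpoint_expired(created, 72, now=created + timedelta(hours=72))
```

In this example only `memory` changed, so `to_restore` contains just that field with its original value.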
Tenants
A Tenant represents an organization using SmartSRE. Each tenant has:
- Isolated data (projects, scopes, findings, audit logs)
- Separate billing and subscription
- Independent RBAC configuration
Users can belong to multiple tenants and switch between them.
Next Steps
- Running Scans — Execute and configure scans
- Scope Management — Create and manage scopes
- Risk Guardrails — Configure safety thresholds