Introduction to SmartSRE

SmartSRE is an Intelligent Remediation Service that automatically detects and fixes common issues in your Google Cloud Platform (GCP) environment using AI-powered agents.

What SmartSRE Does

Capability   Description
Scan         Discovers resources across 8 GCP services and identifies optimization opportunities, performance issues, and security gaps
Plan         AI agents analyze findings and generate safe, reversible remediation plans with cost/impact estimates
Approve      Risk-based guardrails ensure high-impact changes require human approval before execution
Execute      Applies approved changes to your GCP resources with full audit trails
Rollback     Checkpoint-based rollbacks enable safe recovery if issues arise post-execution
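
The Execute and Rollback capabilities can be illustrated with a minimal sketch. This is not SmartSRE's implementation: the function name `execute_with_checkpoint`, the dictionary used to model a resource, and the `health_check` callback are all hypothetical, chosen only to show the general idea of snapshotting state before a change and restoring it if the change fails or leaves the resource unhealthy.

```python
import copy
from typing import Any, Callable, Dict

Resource = Dict[str, Any]  # hypothetical in-memory stand-in for a GCP resource config


def execute_with_checkpoint(
    resource: Resource,
    apply_change: Callable[[Resource], None],
    health_check: Callable[[Resource], bool],
) -> bool:
    """Apply a change; restore the pre-change checkpoint if it fails."""
    # Take the checkpoint before touching anything.
    checkpoint = copy.deepcopy(resource)
    try:
        apply_change(resource)
        healthy = health_check(resource)
    except Exception:
        # Treat any error during apply or verification as a failed change.
        healthy = False
    if not healthy:
        # Roll back: replace the current state with the checkpoint.
        resource.clear()
        resource.update(checkpoint)
    return healthy


service = {"min_instances": 0}

# A change whose post-check fails is rolled back.
rolled_back = execute_with_checkpoint(
    service, lambda r: r.update(min_instances=5), lambda r: False
)

# A change whose post-check passes is kept.
kept = execute_with_checkpoint(
    service, lambda r: r.update(min_instances=2), lambda r: True
)
```

In this sketch the checkpoint is a deep copy held in memory; a real system would presumably persist checkpoints so that recovery remains possible after the executing process exits.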

Supported GCP Services

SmartSRE provides deep integration with:

  • BigQuery — Slot optimization, query cost analysis, table lifecycle management
  • Cloud Run — Auto-scaling, memory/CPU right-sizing, cold start mitigation
  • Cloud SQL — Connection pooling, HA configuration, storage management
  • Compute Engine (GCE) — Disk cleanup, snapshot management, instance scheduling
  • Cloud Storage (GCS) — Lifecycle policies, public access controls, archive transitions
  • Google Kubernetes Engine (GKE) — Node scaling, HPA tuning, resource quotas
  • Pub/Sub — Backlog monitoring, dead letter policies
  • Secret Manager — Rotation schedules, version management

Core Principles

Truth in Automation

SmartSRE follows a strict "Truth in Automation" policy:

  • Only capabilities that are actually implemented are presented to users
  • Features in development are clearly labeled "Coming Soon"
  • Visual mockups never show non-existent integrations

Human-in-the-Loop by Default

For safety, SmartSRE requires human approval for all changes by default:

  • Free, Team, and Pro tiers always require approval before execution
  • Enterprise tenants may enable "Zero-Touch" policies for low-risk changes
  • All changes create rollback checkpoints for safe recovery
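
The approval rules above can be expressed as a small decision function. The sketch below is illustrative only: the `Tier` and `Risk` enums, the `zero_touch_enabled` flag, and the function name are assumptions made for this example, not part of a documented SmartSRE API.

```python
from enum import Enum


class Tier(Enum):  # hypothetical representation of the pricing tiers
    FREE = "free"
    TEAM = "team"
    PRO = "pro"
    ENTERPRISE = "enterprise"


class Risk(Enum):  # hypothetical risk levels
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"


def requires_approval(tier: Tier, risk: Risk, zero_touch_enabled: bool = False) -> bool:
    """Return True when a human must approve the change before execution."""
    # Only Enterprise tenants can opt in to Zero-Touch, and only for low-risk changes.
    if tier is Tier.ENTERPRISE and zero_touch_enabled and risk is Risk.LOW:
        return False
    # Every other combination keeps the human-in-the-loop default.
    return True
```

Note that the function defaults to requiring approval: the Zero-Touch path is the single explicit exception, so any tier or risk level not covered by it still routes to a human.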

Fail-Closed Security

When in doubt, SmartSRE blocks rather than guesses:

  • Ambiguous permissions result in explicit denial
  • Unknown operations require approval
  • Risk guardrails enforce cost and impact limits
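
A fail-closed check along these lines might look like the following sketch. Everything here is hypothetical: the `Change` fields, the operation names in `KNOWN_OPERATIONS`, and the single cost limit are assumptions for illustration. In particular, routing an over-limit change to approval (rather than denying it) is an assumption based on the statement that high-impact changes require human approval.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Decision(Enum):
    PASS = "pass"  # passes the guardrails; tier approval policy still applies
    REQUIRE_APPROVAL = "require_approval"
    DENY = "deny"


# Hypothetical operation names, used only for this example.
KNOWN_OPERATIONS = frozenset({"gcs.set_lifecycle_policy", "gce.delete_unattached_disk"})


@dataclass(frozen=True)
class Change:
    operation: str
    permission_granted: Optional[bool]  # None means the check was inconclusive
    estimated_monthly_cost_usd: float


def evaluate(change: Change, cost_limit_usd: float) -> Decision:
    """Evaluate a change, choosing the more restrictive outcome when unsure."""
    # Ambiguous (None) or missing permission is an explicit denial.
    if change.permission_granted is not True:
        return Decision.DENY
    # Operations outside the known set are never auto-passed.
    if change.operation not in KNOWN_OPERATIONS:
        return Decision.REQUIRE_APPROVAL
    # Assumed guardrail: changes over the cost limit go to a human.
    if change.estimated_monthly_cost_usd > cost_limit_usd:
        return Decision.REQUIRE_APPROVAL
    return Decision.PASS
```

The permission test is written as `is not True` rather than a plain truthiness check so that an inconclusive result (`None`) is handled the same way as an explicit refusal.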

Next Steps