Managing infrastructure across AWS, GCP, and Azure means working with three separate billing models, three separate cost dashboards, and three separate ways of attributing spend to the teams and services that generated it. Without a consistent tagging strategy and automated governance across all three, the monthly bill becomes a number with no internal structure. You know you spent $40,000. You have very little idea which environment, which team, or which feature generated which portion of it.
The Flexera State of the Cloud 2025 report found that 27% of cloud spend is wasted, and this figure has been consistent for five consecutive years despite cloud cost optimisation being the stated number one priority every year. The waste is not going down because knowing you have a problem is not the same as being able to see it at the resource level in real time.
The scale of this is worth putting in context. OpenAI spent $8.7 billion on inference alone in the first three quarters of 2025, more than double its entire 2024 spend. Midjourney migrated its inference fleet from Nvidia A100 and H100 GPUs to Google TPU v6e in Q2 2025 and cut monthly inference costs from $2.1M to under $700K, a reduction of about two-thirds, with an eleven-day payback period on the migration work. These are not startup numbers, but the underlying pattern scales down to any team running cloud infrastructure without active cost governance: unmonitored workloads consuming resources at a rate disconnected from the product.
Why the bill looks different now than it did three years ago
A startup's cloud spend used to be mostly compute and storage. Predictable, scaling slowly, easy to forecast. Now it is managed services, LLM API calls, GPU workloads, preview environments that spin up automatically per pull request, data egress from multi-region setups, and Kubernetes clusters running at generous resource limits because nobody wanted to deal with OOMKilled pods during a demo.
GPU instances specifically are a cost category most teams are not budgeting for correctly. An H100 instance sitting idle overnight is roughly $24 per GPU in wasted spend per night. A fine-tuning job that finishes at 3am but does not terminate until someone checks in the morning burns real money with no output. A CloudZero study found that 78% of organisations detect cloud cost anomalies late, with only 22% catching them quickly. By the time the monthly bill arrives, the damage is done. A Harness study from February 2025 found enterprises take an average of 31 days to identify idle or orphaned resources. For a startup burning $50,000 per month on cloud, that is a significant amount of runway to lose before the first alert fires.
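The arithmetic behind the idle-GPU figure is easy to reproduce. The sketch below assumes an on-demand rate of $3.00 per GPU-hour and eight idle hours per night, which is one way to arrive at roughly $24 per GPU per night; both numbers are illustrative assumptions, not quoted prices, and `idle_gpu_cost` is a hypothetical helper.

```python
def idle_gpu_cost(hourly_rate_usd: float, idle_hours_per_night: float,
                  nights: int, gpus: int = 1) -> float:
    """Spend on GPUs that are running but doing no useful work."""
    return hourly_rate_usd * idle_hours_per_night * nights * gpus


# Assumed figures: $3.00 per GPU-hour, 8 idle hours per night.
one_night = idle_gpu_cost(3.00, 8, nights=1)

# The same idle pattern left undetected for 31 nights on an 8-GPU node.
one_month_node = idle_gpu_cost(3.00, 8, nights=31, gpus=8)

print(f"One GPU, one night: ${one_night:,.2f}")
print(f"8-GPU node, 31 nights: ${one_month_node:,.2f}")
```

Plugging in your own hourly rate and detection delay turns an abstract "31 days to notice" into a concrete number per node.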
Tagging: the foundation without which nothing else works
Every cloud resource needs four tags at minimum: environment, team, project, and owner. Without them, cost attribution is impossible and automated governance has nothing to act on. With them, you can ask which feature is costing $8,000 per month, which team is responsible for the non-production environments running over the weekend, and which project generated the egress spike in the last billing cycle.
The problem on multi-cloud setups is consistency. Seventeen variations of env, Env, environment, and Environment across AWS, GCP, and Azure make cost reports unreadable and automation unreliable. The fix is a strict tag policy enforced at the provider level: AWS Service Control Policies to block untagged resource creation, Azure Policy assignments, and GCP Organisation constraints. Once the policy exists in code, it applies on every deployment without manual review.
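Before a strict policy can be switched on, existing resources usually need their tag keys cleaned up. The following is a minimal sketch of that normalisation step, assuming tags have already been exported from each provider as plain dictionaries. The alias table and function names are hypothetical and would need to reflect the variants found in your own accounts.

```python
REQUIRED_TAGS = ("environment", "team", "project", "owner")

# Hypothetical alias table: extend it with the variants in your own accounts.
KEY_ALIASES = {"env": "environment", "proj": "project"}


def normalise_tags(tags: dict[str, str]) -> dict[str, str]:
    """Lower-case tag keys and map known aliases to their canonical name."""
    normalised = {}
    for key, value in tags.items():
        canonical = key.strip().lower()
        canonical = KEY_ALIASES.get(canonical, canonical)
        normalised[canonical] = value
    return normalised


def missing_tags(tags: dict[str, str]) -> list[str]:
    """Return the required tags that are absent or empty after normalisation."""
    normalised = normalise_tags(tags)
    return [name for name in REQUIRED_TAGS if not normalised.get(name)]


print(missing_tags({"Env": "prod", "TEAM": "platform"}))
```

Lower-case keys are a deliberate choice here: GCP label keys only accept lower-case characters, so a lower-case canonical form is the one spelling that works on all three providers.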
On the AWS side, a GitHub Actions workflow can run Cloud Custodian as part of the deployment pipeline. The policy below flags running EC2 instances that are missing any of the required tags and schedules them for termination:

policies:
  - name: ec2-tag-compliance
    resource: aws.ec2
    description: |
      Flag EC2 instances missing required tags
      and schedule them for termination in 4 days.
    filters:
      - State.Name: running
      # Filters are ANDed by default; "or" matches an instance
      # that is missing any one of the required tags.
      - or:
          - "tag:environment": absent
          - "tag:team": absent
          - "tag:project": absent
          - "tag:owner": absent
    actions:
      - type: mark-for-op
        op: terminate
        days: 4
      - type: notify
        to:
          - slack://your-channel
        violation_desc: "Instance is missing required tags"
        # notify needs a transport that c7n-mailer reads from;
        # the queue URL below is a placeholder.
        transport:
          type: sqs
          queue: https://sqs.us-east-1.amazonaws.com/123456789012/your-mailer-queue

Cloud Custodian is a CNCF incubating project that supports AWS, Azure, and GCP with a consistent YAML policy DSL. Policies run as Lambda functions triggered by CloudWatch Events, as scheduled cron jobs, or directly in the CI pipeline against Terraform plans before provisioning. Running it in the pipeline means non-compliant resources never reach the cloud rather than being cleaned up after the fact.
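To illustrate what a pre-provisioning check looks like, here is a minimal sketch that inspects a Terraform plan exported with `terraform show -json` and reports resources that would be created or updated without the required tags. It is not Cloud Custodian itself, only a stand-in for the idea. The function names are hypothetical, and it only looks at the `tags` attribute, so tags injected through provider-level defaults or values unknown until apply are not covered.

```python
import json

REQUIRED_TAGS = {"environment", "team", "project", "owner"}


def load_plan(path: str) -> dict:
    """Load a plan written by: terraform show -json plan.out > plan.json"""
    with open(path) as handle:
        return json.load(handle)


def untagged_resources(plan: dict) -> dict[str, list[str]]:
    """Map resource address -> missing tags for resources being created or updated."""
    problems = {}
    for rc in plan.get("resource_changes", []):
        change = rc.get("change", {})
        actions = change.get("actions", [])
        if "create" not in actions and "update" not in actions:
            continue
        after = change.get("after") or {}
        if "tags" not in after:
            continue  # this resource type exposes no tags attribute
        tags = after.get("tags") or {}
        missing = sorted(REQUIRED_TAGS - set(tags))
        if missing:
            problems[rc["address"]] = missing
    return problems
```

In a pipeline, a step would call `untagged_resources(load_plan("plan.json"))` and fail the job when the result is non-empty, so the apply step never runs for a non-compliant plan.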
TTLs on non-production environments
Preview environments, staging environments, and developer sandboxes are the most consistent source of avoidable waste. They get created for a specific purpose, that purpose ends, and the environment continues running because nobody owns the decommission step. A Cloud Custodian policy with an off-hours schedule reduces non-production instance costs by approximately 50% with no engineering effort beyond the initial setup:
policies:
  - name: non-prod-off-hours
    resource: aws.ec2
    mode:
      type: periodic
      schedule: "cron(0 20 * * ? *)"  # Stop at 20:00 UTC daily
    filters:
      - State.Name: running
      # A single "in" filter: two separate tag filters would be ANDed
      # and could never both match.
      - type: value
        key: "tag:environment"
        op: in
        value: [dev, staging]
    actions:
      - type: stop
  - name: non-prod-on-hours
    resource: aws.ec2
    mode:
      type: periodic
      # EventBridge cron needs "?" for day-of-month when day-of-week is set
      schedule: "cron(0 8 ? * MON-FRI *)"  # Start at 08:00 UTC on weekdays
    filters:
      - State.Name: stopped
      - type: value
        key: "tag:environment"
        op: in
        value: [dev, staging]
    actions:
      - type: start

For environments that should not survive beyond a fixed window regardless of business hours, adding a TTL tag at creation time and a Cloud Custodian policy that terminates resources past their TTL closes the loop. A GitHub Actions step that writes a ttl tag with an expiry date as part of the environment creation workflow means the policy has everything it needs to enforce cleanup automatically.
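The TTL logic itself is small. The sketch below shows one way to write and evaluate a `ttl` tag, assuming the value is an ISO 8601 timestamp or date and that a timestamp without an offset means UTC. The helper names and the resource dictionary shape are hypothetical; a real cleanup job would read tags from the provider's API.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def ttl_value(hours: int, now: Optional[datetime] = None) -> str:
    """Build the ttl tag value to attach when an environment is created."""
    now = now or datetime.now(timezone.utc)
    return (now + timedelta(hours=hours)).isoformat()


def is_expired(ttl_tag: str, now: Optional[datetime] = None) -> bool:
    """True when the timestamp stored in the ttl tag is in the past."""
    expiry = datetime.fromisoformat(ttl_tag)
    if expiry.tzinfo is None:
        expiry = expiry.replace(tzinfo=timezone.utc)  # assume UTC when no offset is given
    now = now or datetime.now(timezone.utc)
    return now >= expiry


def expired_resources(resources: list[dict], now: Optional[datetime] = None) -> list[str]:
    """Return ids of resources whose ttl has passed; resources without a ttl are skipped."""
    return [
        r["id"] for r in resources
        if "ttl" in r.get("tags", {}) and is_expired(r["tags"]["ttl"], now)
    ]
```

Skipping resources without a `ttl` tag is a conservative choice: the tag-compliance policy handles untagged resources, while this check only ever acts on environments that explicitly opted in to an expiry.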
Utilisation review: the data that is already there
Overprovisioned instances are one of the most recoverable sources of cloud waste because the data required to identify them already exists in every cloud provider's native tooling. On AWS, Cost Explorer's right-sizing recommendations and Trusted Advisor both flag instances running below 20% average CPU utilisation. On GCP, the Recommender API surfaces idle VM recommendations. On Azure, Azure Advisor provides equivalent right-sizing recommendations.
Pulling this data programmatically and acting on it as part of a regular review cycle rather than waiting for a finance conversation is straightforward with the AWS CLI:
# Pull right-sizing recommendations from Cost Explorer
aws ce get-rightsizing-recommendation \
  --service AmazonEC2 \
  --configuration '{"RecommendationTarget": "SAME_INSTANCE_FAMILY", "BenefitsConsidered": true}' \
  --query 'RightsizingRecommendations[?RightsizingType==`TERMINATE`].[CurrentInstance.ResourceId, CurrentInstance.MonthlyCost]' \
  --output table

Right-sizing 100 overprovisioned instances typically saves $75,000 per month according to FinOps benchmarks. Most teams have access to this data in their cloud console and have never run the query.
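For a recurring review it helps to reduce the recommendations to a single savings figure. The sketch below totals the estimated monthly savings from a Cost Explorer right-sizing response that has been saved as JSON (for example by using `--output json` and dropping the `--query` filter). The field names reflect the response shape as I understand it and should be checked against your own output; `summarise_rightsizing` is a hypothetical helper.

```python
def summarise_rightsizing(response: dict) -> dict[str, float]:
    """Total estimated monthly savings per recommendation type."""
    totals = {"TERMINATE": 0.0, "MODIFY": 0.0}
    for rec in response.get("RightsizingRecommendations", []):
        kind = rec.get("RightsizingType")
        if kind == "TERMINATE":
            detail = rec.get("TerminateRecommendationDetail", {})
            # Amounts arrive as strings, so convert before summing
            totals[kind] += float(detail.get("EstimatedMonthlySavings", 0))
        elif kind == "MODIFY":
            targets = rec.get("ModifyRecommendationDetail", {}).get("TargetInstances", [])
            # When several target instance types are offered, count the best one
            savings = [float(t.get("EstimatedMonthlySavings", 0)) for t in targets]
            totals[kind] += max(savings, default=0.0)
    return totals
```

Tracking these two totals from one review cycle to the next shows whether recommendations are being acted on or simply accumulating.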
LLM token costs: the new variable nobody is tracking
If the product makes calls to hosted LLM APIs, inference costs are now a variable that can change dramatically with product usage patterns. One new feature that chains multiple model calls together can multiply inference spend before it surfaces in a budget review. The teams managing this well are tracking token consumption per feature using structured logging, setting per-environment spend limits via the API provider's rate limiting, and surfacing cost metrics alongside latency and error rate in the same dashboards.
The practical implementation is adding token usage to the observability pipeline. Most LLM provider SDKs return token counts in the response object. Logging these as structured fields and aggregating them per service, feature, and environment in Grafana alongside the infrastructure cost metrics gives a complete picture of where spend is going across both compute and inference:
import json
import logging

import anthropic

# The root logger defaults to WARNING, so INFO records need enabling
logging.basicConfig(level=logging.INFO)

# Reads the API key from the ANTHROPIC_API_KEY environment variable
client = anthropic.Anthropic()


def call_llm(prompt: str, feature_name: str) -> str:
    response = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=1000,
        messages=[{"role": "user", "content": prompt}],
    )
    # Structured log for cost attribution
    logging.info(json.dumps({
        "feature": feature_name,
        "input_tokens": response.usage.input_tokens,
        "output_tokens": response.usage.output_tokens,
        "total_tokens": response.usage.input_tokens + response.usage.output_tokens,
        "model": response.model,
    }))
    return response.content[0].text

With this in the logging pipeline, a Grafana dashboard can show token consumption per feature over time, making it possible to catch a workflow that is consuming tokens at an unexpected rate before it appears as a billing surprise.
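To turn token counts into money, the logged records can be aggregated per feature. The following sketch assumes each log line is a JSON object with `feature`, `input_tokens`, and `output_tokens` fields, and uses hypothetical prices per million tokens. Real rates vary by model and provider, so `PRICE_PER_MTOK` is a placeholder to replace.

```python
import json
from collections import defaultdict

# Hypothetical prices in USD per million tokens; replace with your provider's rates.
PRICE_PER_MTOK = {"input": 3.00, "output": 15.00}


def cost_per_feature(log_lines: list[str]) -> dict[str, float]:
    """Sum the estimated spend per feature from structured log lines."""
    totals = defaultdict(float)
    for line in log_lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip log lines that are not JSON
        if not isinstance(record, dict) or "feature" not in record:
            continue  # skip JSON that is not one of our usage records
        cost = (
            record.get("input_tokens", 0) * PRICE_PER_MTOK["input"]
            + record.get("output_tokens", 0) * PRICE_PER_MTOK["output"]
        ) / 1_000_000
        totals[record["feature"]] += cost
    return dict(totals)
```

Input and output tokens are priced separately because providers typically charge more for output, which means a feature that generates long responses can cost far more than its raw token total suggests.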
Budget alerts that actually work
Native budget alerts in AWS, GCP, and Azure notify after a threshold is crossed, which means the spend has already happened by the time the alert fires. The more useful configuration is anomaly detection alerts that fire when spend deviates from the expected pattern for that day of the week and time of day, not just when a fixed threshold is exceeded.
AWS Cost Anomaly Detection is free and takes five minutes to configure. It uses machine learning to establish a baseline for each service, linked account, or cost category, and alerts when spend deviates from that baseline by a configurable threshold. Setting a minimum anomaly amount of $100 and a threshold of 20% above expected spend for each major service catches the GPU instance left running and the ephemeral environment that was never cleaned up before they compound across a billing cycle.
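The effect of combining the two thresholds can be seen in a few lines. This is a simplified illustration of the alerting rule only, not AWS's detection model: it assumes the expected baseline is already known and requires spend to exceed it by both the minimum dollar amount and the minimum percentage. `is_anomalous` is a hypothetical helper.

```python
def is_anomalous(actual: float, expected: float,
                 min_amount: float = 100.0, min_pct: float = 20.0) -> bool:
    """Flag spend that exceeds the baseline by both an absolute and a relative margin."""
    excess = actual - expected
    if excess < min_amount:
        return False  # too small in dollar terms to be worth an alert
    if expected <= 0:
        return True  # material spend where none was expected
    return excess / expected * 100 >= min_pct


print(is_anomalous(650, 500))    # $150 over, 30% above baseline
print(is_anomalous(560, 500))    # only $60 over
print(is_anomalous(5500, 5000))  # $500 over, but only 10% above baseline
```

The dollar floor keeps small services from paging anyone over a few dollars of noise, while the percentage threshold stops large services from alerting on routine variation.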
From what I see working with teams at Frigga Cloud Labs, the shift from reactive to proactive cloud cost management is not primarily a tooling problem. It is a visibility problem. The spend data exists. The tagging that would make it attributable does not. The TTL policies that would clean up non-production environments have not been written. The utilisation recommendations that would right-size the compute have not been acted on. The tooling to do all of this is either free from the cloud provider or open source. The work is configuring it once and building it into the deployment pipeline so it stays enforced as the system grows.
Author note
Ayesha Siddiqua & Mohan Gopi
Mohan is an Associate DevOps Engineer at Frigga Cloud Labs. He manages infrastructure across AWS, GCP, and Azure, deploys through GitHub Actions, and focuses on proactive resilience: building the feedback loops that keep systems stable, efficient, and improvable over time. This blog comes from his work on cost governance across multi-cloud environments, specifically the gap between a cloud bill that arrives as a single number and the attribution layer that makes it actionable.
I work with founding teams and CTOs through Frigga Cloud Labs, a DevOps consultancy built specifically for growing startups, and the technical perspective in this blog is Mohan's hands-on experience working inside these systems.