This is not a blog post you read once and move on from. It is a working reference. The stack below covers every layer of the cloud-native foundation a growing startup should be building toward in 2026, with tool choices grounded in what engineering teams are actually running in production. Each layer has a primary recommendation and an honest alternative. The primary is what most teams should start with. The alternative exists because budget, team expertise, and compliance constraints are real.
The layers are ordered by urgency, not by alphabetical convenience. Compute before observability. Secrets before security tooling. CI/CD before you have ten engineers. The sequence matters as much as the choices.
Layer one. Compute.
Primary: AWS ECS / Fargate
Nothing else in the stack works without something to run your code on. AWS ECS with Fargate removes the need to manage a Kubernetes control plane at an early stage. You define a task, Fargate provisions the compute, and services scale with load once auto scaling is configured. You pay for what runs, nothing more. The native integrations with IAM, CloudWatch, and ALB mean the surrounding AWS ecosystem clicks together without custom wiring. For a team that is already on AWS, this is the path of least resistance to production-grade container infrastructure.
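To make "you define a task" concrete, here is a minimal sketch of what a Fargate task definition might look like as data. The dictionary follows the general shape of the ECS RegisterTaskDefinition input, but the function name, the default port, and the image name in the usage line are illustrative assumptions, and a real definition usually also needs an execution role and log configuration.

```python
# Fargate only accepts certain CPU/memory pairings. This table covers the
# common sizes (CPU units -> allowed memory range in MiB); it is not exhaustive.
FARGATE_MEMORY_MIB = {
    256: (512, 2048),
    512: (1024, 4096),
    1024: (2048, 8192),
    2048: (4096, 16384),
    4096: (8192, 30720),
}


def fargate_task_definition(family, image, cpu=256, memory=512, port=8080):
    """Build a dict shaped like the input to ECS RegisterTaskDefinition.

    `fargate_task_definition` is a hypothetical helper, not an AWS API.
    """
    if cpu not in FARGATE_MEMORY_MIB:
        raise ValueError(f"unsupported Fargate CPU value: {cpu}")
    low, high = FARGATE_MEMORY_MIB[cpu]
    # Memory moves in 1024 MiB steps, except the 512 MiB floor at 256 CPU units.
    on_step = memory % 1024 == 0 or memory == 512
    if not (low <= memory <= high and on_step):
        raise ValueError(f"memory {memory} MiB is not valid for cpu {cpu}")
    return {
        "family": family,
        "requiresCompatibilities": ["FARGATE"],
        "networkMode": "awsvpc",  # the network mode Fargate tasks use
        "cpu": str(cpu),          # task-level sizes are passed as strings
        "memory": str(memory),
        "containerDefinitions": [
            {
                "name": family,
                "image": image,
                "essential": True,
                "portMappings": [{"containerPort": port, "protocol": "tcp"}],
            }
        ],
    }


# Hypothetical service and image names, for illustration only.
task = fargate_task_definition("web", "example/web:1.0")
```

Validating the CPU and memory pairing locally is a design choice: it surfaces a sizing mistake in code review rather than at deploy time.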
Alternative: Google Cloud Run
Google Cloud Run goes further toward zero infrastructure management. Push a container image and Google handles provisioning, scaling, and routing. Billing is usage-based, tied to requests and the compute consumed while serving them, which can be genuinely cheaper for products with unpredictable or bursty traffic patterns. If your team is not yet committed to a cloud provider and your workload is stateless, Cloud Run's model is worth taking seriously before defaulting to AWS.
Layer two. Networking.
Primary: AWS ALB
AWS Application Load Balancer handles Layer 7 routing, HTTPS termination, and health checks, and integrates with AWS WAF for rule-based filtering, all in a managed service that requires almost no operational attention once configured. For startups running on AWS, it integrates directly with ECS services and scales automatically with traffic. It is not glamorous infrastructure. It is infrastructure that works and stays out of the way.
Alternative: Cloudflare
Cloudflare covers CDN, DDoS protection, DNS, and a reverse proxy layer in one product, with a free tier that is genuinely useful. For teams that want to abstract networking away from their cloud provider entirely, or that need global CDN performance from the first user, Cloudflare earns its place at the front of the stack. Many teams run both: Cloudflare in front for DNS and DDoS protection, ALB behind it for routing to services.
Layer three. CI/CD.
Primary: GitHub Actions
If your code is on GitHub, GitHub Actions is the right starting point. Workflows live in the repository, trigger on any GitHub event, and require no separate service to configure or maintain. The marketplace has over 10,000 community-built actions covering deployments to AWS, security scanning, Slack notifications, and most other standard pipeline needs. GitHub cut hosted runner prices by up to 39% starting in January 2026, which removes the main cost argument for choosing a dedicated CI platform at an early stage. The free tier gives private repositories 2,000 minutes per month. The Team plan at $4 per user per month raises that to 3,000.
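A quick way to check whether those included minutes are enough is to estimate monthly usage. The sketch below uses the 2,000 and 3,000 minute figures quoted above. The function names, the 22 working days, and the example workload are assumptions, and it treats one minute of runtime as one included minute, which may not hold for every runner type.

```python
# Included minutes per month for private repositories, as quoted in the text.
INCLUDED_MINUTES = {"free": 2000, "team": 3000}


def monthly_ci_minutes(runs_per_day, minutes_per_run, working_days=22):
    """Estimate pipeline minutes consumed in a month (hypothetical helper)."""
    return runs_per_day * minutes_per_run * working_days


def fits_plan(plan, runs_per_day, minutes_per_run, working_days=22):
    """Return (fits, estimated_minutes) for the given plan name."""
    used = monthly_ci_minutes(runs_per_day, minutes_per_run, working_days)
    return used <= INCLUDED_MINUTES[plan], used


# Example workload (assumed): 15 pipeline runs a day at 8 minutes each.
print(fits_plan("free", 15, 8))  # (False, 2640)
print(fits_plan("team", 15, 8))  # (True, 2640)
```

In this assumed workload a small team would exceed the free allowance but stay inside the Team plan, which is the kind of check worth running before considering a different CI platform.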
Alternative: CircleCI
CircleCI is the right move when build speed has become a measurable bottleneck. Its Docker Layer Caching is more mature than the equivalent caching in GitHub Actions. Resource classes are more granular. Parallelism is easier to configure for large test suites. If pipelines are regularly running past 20 to 30 minutes and engineers are losing meaningful time waiting on builds, the performance difference is real and worth paying for. Do not make the switch until that problem actually exists.
Layer four. Infrastructure as code.
Primary: Terraform / OpenTofu
Every resource provisioned manually is a resource that lives outside version control. Terraform has been the standard for cloud infrastructure provisioning for years, and the workflow is familiar to most DevOps engineers. OpenTofu is the open-source fork that emerged after HashiCorp changed Terraform's license to the Business Source License in 2023. For startups using Terraform internally, the license change does not materially affect usage. For organizations with legal or open-source policy concerns, OpenTofu is the safe path. Either way: write every resource as code, commit it to Git, and review it like product code. Starting this habit on the first service costs almost nothing. Retrofitting it onto the fifth costs weeks.
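Reviewing infrastructure like product code works because the tooling can show what a change will do before it is applied. The toy sketch below illustrates that idea by diffing a desired state against the current one. It is not how Terraform or OpenTofu is implemented, and the `plan` function and the resource names are hypothetical.

```python
def plan(desired, actual):
    """Compare two {resource_name: config} mappings and list the changes.

    A toy stand-in for the "plan" step of an infrastructure-as-code tool.
    """
    actions = []
    for name in sorted(desired.keys() | actual.keys()):
        if name not in actual:
            actions.append(("create", name))
        elif name not in desired:
            actions.append(("delete", name))
        elif desired[name] != actual[name]:
            actions.append(("update", name))
    return actions


# Hypothetical resources: what is committed in Git versus what is running.
desired = {"web_service": {"cpu": 512}, "uploads_bucket": {"versioning": True}}
actual = {"web_service": {"cpu": 256}, "legacy_queue": {}}

for action, name in plan(desired, actual):
    print(action, name)
```

A resource created by hand in a console never appears in the desired state, so a diff like this either flags it for deletion or misses it entirely. That is the practical cost of provisioning outside version control.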
Alternative: Pulumi
Pulumi lets engineers write infrastructure in TypeScript, Python, Go, or Java rather than HCL. For teams where the engineering culture is strongly software-first and HCL feels like a foreign language, Pulumi reduces the friction of getting engineers to care about infrastructure code. The tradeoff is a smaller community and fewer readily available examples compared to Terraform.
Layer five. Secrets management.
Primary: Doppler
The moment a second person needs access to production credentials, you need a secrets manager. Doppler centralizes all environment variables and secrets across environments, injects them at runtime via CLI or SDK, and keeps them out of your codebase entirely. The free tier covers three users. Setup takes an afternoon. The cost of not doing this is a committed API key in version history that you discover six months later, usually because something breaks.
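On the application side, runtime injection usually means reading secrets from environment variables and failing fast when one is missing. The sketch below assumes that pattern. The secret names, the `load_secrets` helper, and the error class are hypothetical, and nothing in it is specific to Doppler's SDK.

```python
import os

# Hypothetical secret names; replace with whatever the service actually needs.
REQUIRED_SECRETS = ("DATABASE_URL", "STRIPE_API_KEY")


class MissingSecretError(RuntimeError):
    """Raised at startup when a required secret was not injected."""


def load_secrets(env=None):
    """Read required secrets from the environment, or fail with a clear error."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_SECRETS if not env.get(name)]
    if missing:
        raise MissingSecretError(
            "missing required secrets: " + ", ".join(missing)
        )
    return {name: env[name] for name in REQUIRED_SECRETS}
```

With the Doppler CLI, a process started through `doppler run -- python app.py` receives the project's secrets as environment variables, so code like this has no values to commit. Failing at startup rather than on first use keeps a misconfigured environment from serving traffic.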
Alternative: Infisical
Infisical is the open-source alternative for teams that need full control over where secrets live, including self-hosted deployments. It covers the same core use cases as Doppler and integrates with GitHub Actions, Kubernetes, and most major cloud providers. For teams with compliance requirements that prohibit secrets touching third-party infrastructure, Infisical is often the only acceptable answer.
Layer six. Observability.
Primary: Grafana Cloud
Grafana Cloud's permanent free tier includes 10,000 active metric series, 50 GB of logs, and 50 GB of traces per month. It unifies metrics via Prometheus-compatible collection, logs via Loki, and traces via Tempo in a single interface. The instrumentation standard underneath all of it is OpenTelemetry, now the de facto vendor-neutral standard across the CNCF ecosystem. Instrument your services with OpenTelemetry SDKs once and you can point the exporter at any backend: Grafana Cloud today, Datadog later, self-hosted Grafana if that ever makes sense. Do not build observability around a proprietary agent that you cannot migrate away from.
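The "instrument once, repoint the exporter" claim rests on OpenTelemetry's standard exporter settings, which SDKs can read from environment variables. The sketch below builds that configuration for two backends. The variable names come from the OpenTelemetry specification, while the endpoints, backend labels, service name, and the `otlp_env` helper are placeholders invented for illustration.

```python
# Placeholder endpoints: real values come from your provider or collector.
BACKENDS = {
    "managed": "https://otlp.example.invalid/otlp",
    "self-hosted": "http://otel-collector.internal:4318",
}


def otlp_env(service_name, backend, headers=None):
    """Return OTLP exporter settings as environment variables (hypothetical helper)."""
    env = {
        "OTEL_SERVICE_NAME": service_name,
        "OTEL_EXPORTER_OTLP_ENDPOINT": BACKENDS[backend],
        "OTEL_EXPORTER_OTLP_PROTOCOL": "http/protobuf",
    }
    if headers:
        # The header variable is a comma-separated list of key=value pairs.
        env["OTEL_EXPORTER_OTLP_HEADERS"] = ",".join(
            f"{key}={value}" for key, value in headers.items()
        )
    return env


today = otlp_env("checkout", "managed", {"Authorization": "Basic <token>"})
later = otlp_env("checkout", "self-hosted")

# Only exporter settings differ between backends; application code is untouched.
changed = sorted(k for k in today if today.get(k) != later.get(k))
print(changed)
```

Keeping the backend choice in configuration rather than in code is what makes a later migration a deployment change instead of a re-instrumentation project.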
Alternative: Datadog
Datadog is the mature commercial choice when a unified platform with enterprise support, out-of-the-box dashboards, and AI-driven anomaly detection justifies the cost. Most teams land on Datadog after Series A, once observability needs across infrastructure, APM, log management, and security outgrow Grafana Cloud's free tier and the team lacks the bandwidth to manage a self-hosted stack. Go in knowing that costs scale quickly in containerized environments and set budget alerts from day one.
Layer seven. Alerting and on-call.
Primary: PagerDuty
Alerting without an on-call process is noise. An on-call process without alerting is hope. Both need to exist at the same time. PagerDuty handles on-call schedules, escalation policies, and alert routing in a way that makes it possible to run a rotation across a small team without anyone falling through the cracks. The free tier covers basic use for small teams. "Who is on call this week?" should have a one-word answer at any team size.
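The logic behind a simple weekly rotation is small enough to sketch. The example below is a simplified model, not PagerDuty's API, and it ignores overrides, time zones, and escalation. The engineer names, start date, and `on_call` function are hypothetical.

```python
from datetime import date


def on_call(engineers, rotation_start, today):
    """Return who is on call for the week containing `today`.

    Shifts last seven days and hand off on the weekday of `rotation_start`.
    """
    if not engineers:
        raise ValueError("rotation needs at least one engineer")
    if today < rotation_start:
        raise ValueError("today is before the rotation starts")
    weeks_elapsed = (today - rotation_start).days // 7
    return engineers[weeks_elapsed % len(engineers)]


# Hypothetical three-person rotation starting on Monday 5 January 2026.
team = ["ana", "ben", "chi"]
start = date(2026, 1, 5)

print(on_call(team, start, date(2026, 1, 14)))  # ben
```

A managed tool adds what this sketch leaves out: swaps, escalation when the primary does not acknowledge, and an audit trail of who was paged.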
Alternative: Opsgenie
Opsgenie covers the same core functionality and is often cheaper at small scale, particularly for teams already using Atlassian products like Jira and Confluence, where the integration simplifies incident tracking. Note that Atlassian stopped offering Opsgenie to new accounts in June 2025, so it is only an option for teams that already have one. New teams should evaluate PagerDuty or alternatives like incident.io for a more modern incident management experience.
Layer eight. Security.
Primary: Snyk
Snyk runs in your CI pipeline on every pull request and flags dependency vulnerabilities, container image issues, and SAST findings before they reach production. It integrates with GitHub Actions in under an hour and requires no security team to operate. The free tier covers open-source vulnerability scanning for small teams. For a startup building toward SOC 2 or enterprise sales where customers send security questionnaires, having automated scanning as part of every deploy is something you want to be able to demonstrate early.
Alternative: Trivy
Trivy is the open-source container and code scanner from Aqua Security. It has no licensing cost, runs anywhere, and integrates with GitHub Actions through a published action. It covers OS packages, application dependencies, IaC misconfigurations, and exposed secrets in a single tool. For teams where the Snyk free tier is not sufficient and paid licensing is not yet justified, Trivy covers the most critical scanning use cases without a budget line.
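Whichever scanner runs in the pipeline, the useful part is the gate: the build fails when findings reach a chosen severity. The sketch below filters a scan report shaped like Trivy's JSON output (a `Results` list whose entries carry `Vulnerabilities`). That shape is an assumption to verify against the version you run, and the sample report, the identifiers in it, and the `blocking_findings` helper are made up for illustration.

```python
import json

SEVERITY_RANK = {"UNKNOWN": 0, "LOW": 1, "MEDIUM": 2, "HIGH": 3, "CRITICAL": 4}


def blocking_findings(report, threshold="HIGH"):
    """List (target, id, severity) for findings at or above the threshold."""
    floor = SEVERITY_RANK[threshold]
    findings = []
    for result in report.get("Results") or []:
        # Targets with no findings may omit the key or set it to null.
        for vuln in result.get("Vulnerabilities") or []:
            severity = vuln.get("Severity", "UNKNOWN")
            if SEVERITY_RANK.get(severity, 0) >= floor:
                findings.append(
                    (result.get("Target"), vuln.get("VulnerabilityID"), severity)
                )
    return findings


# Made-up report for illustration; the identifiers are not real advisories.
SAMPLE = json.loads("""
{"Results": [
  {"Target": "app/requirements.txt", "Vulnerabilities": [
    {"VulnerabilityID": "EXAMPLE-0001", "Severity": "LOW"},
    {"VulnerabilityID": "EXAMPLE-0002", "Severity": "CRITICAL"}]},
  {"Target": "base-image", "Vulnerabilities": null}
]}
""")

blockers = blocking_findings(SAMPLE)
exit_code = 1 if blockers else 0  # a CI step would exit with this value
```

Trivy's own `--severity` and `--exit-code` flags can enforce a similar gate without custom code. A script like this is mainly useful when the policy needs exceptions or a custom report.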
The goal of infrastructure is to disappear into the background so the team can build the product. A stack that requires daily attention to keep running has failed at its job. Every tool on this list was chosen because it can be set up once, maintained with minimal ongoing effort, and grown into without a dedicated platform team. That is the standard a startup infrastructure stack should meet in 2026. You do not need all eight layers from day one. You need them in the right order, at the right moment, before the absence of each one becomes a problem that is harder to fix than it would have been to set up.

