There is a moment that happens at a predictable point in a startup's growth. An engineer leaves the company. They were the person who set up most of the AWS infrastructure eighteen months ago. Nobody asked them to document it, because everyone assumed it would be cleaned up and formalised later. Later never arrived. Now there is a VPC configured in a way nobody fully understands, security group rules that look like they were added for a reason that no longer exists, IAM roles with permissions that are probably broader than they need to be, and a support bill for a service nobody is certain is still in use.
From where I sit at Frigga Cloud Labs, this is one of the most common conversations we have with growing startups. Not because the teams are careless, but because infrastructure built through console clicks accumulates invisibly. Nobody makes a single bad decision. Everyone makes individually reasonable decisions under time pressure, and the aggregate is an infrastructure that exists only in one person's memory and in the AWS console. When that person leaves, or when a compliance questionnaire arrives, or when a security audit starts asking questions, the conversation gets expensive.
What infrastructure as code actually is, and why it matters beyond the technical argument
The console click problem is a documentation problem
Infrastructure as Code means defining every cloud resource (servers, databases, networking rules, IAM policies, load balancers) in code that lives in version control alongside your application code. When someone needs to know why a security group allows traffic on a specific port, the answer is in a commit message and a pull request review, not in a support ticket to a former employee. When someone needs to spin up an identical staging environment, the command runs in minutes from the same code. When something breaks, the history of every infrastructure change is auditable without asking anyone.
The business case for this is not about automation elegance. It is about what security researchers consistently identify as one of the most common entry points for attackers in cloud environments: shadow infrastructure and unmanaged resources. Resources created through console clicks, without governance, without tagging, without version control, accumulate without oversight. They do not appear in audit reports because nobody knows they exist. They carry permissions set at creation that nobody has reviewed since. And they do not get decommissioned because there is no record of why they were created.
The "we will clean it up later" assumption is the most expensive decision most startups never make
Every team I have watched make this assumption has eventually paid to unwind it. The cost of retrofitting IaC onto infrastructure that was built manually is not just the engineering hours. It is the risk surface that exists in the meantime. Every console-created resource is a resource without a review process. Every IAM role created manually is a role whose permissions have never been challenged in a pull request. Every security group created for a specific purpose and never revisited is a potential exposure that exists in no documentation and sits on no one's remediation list.
Research across engineering teams finds that manual console changes, race conditions from concurrent applies, and state file mismanagement are responsible for roughly 23% of production infrastructure incidents in teams without proper infrastructure management. The incidents themselves are less interesting than what they reveal: unmanaged infrastructure does not stay static. It drifts, accumulates, and eventually surprises the team that built it.
The three tools worth knowing, and which to start with
Terraform: the most widely used, the most familiar to hire for
Terraform is the incumbent. It holds over 32% of the IaC market in 2026, with over 4,800 providers covering AWS, GCP, Azure, and most major SaaS platforms. The workflow is declarative: you describe what the infrastructure should look like, Terraform compares that to the current state, and shows you a plan before making any changes. The language, HCL, is learnable in days and has more community resources, modules, and answered questions than any other IaC option.
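The describe, compare, plan loop is easier to picture with a toy version. The sketch below is not how Terraform is implemented; it is a minimal illustration in plain Python, and the resource names (web_sg, app_db, old_bucket) and attributes are hypothetical. It takes a desired definition and a recorded current state, and reports what would be created, updated, or deleted before anything is changed.

```python
from typing import Any

# A resource set maps a resource name to its attributes.
Resources = dict[str, dict[str, Any]]


def plan(desired: Resources, current: Resources) -> dict[str, list[str]]:
    """Compare the desired definitions with the recorded state.

    Nothing is changed here: the result only lists the actions that
    would be needed to make `current` match `desired`.
    """
    return {
        "create": sorted(desired.keys() - current.keys()),
        "update": sorted(
            name
            for name in desired.keys() & current.keys()
            if desired[name] != current[name]
        ),
        "delete": sorted(current.keys() - desired.keys()),
    }


# Hypothetical example: what the code says should exist ...
desired = {
    "web_sg": {"ingress_ports": [443]},
    "app_db": {"engine": "postgres", "backup_retention_days": 7},
}
# ... versus what is recorded as existing today.
current = {
    "web_sg": {"ingress_ports": [443, 22]},
    "old_bucket": {"versioning": False},
}

proposed = plan(desired, current)
print(proposed)
```

The point of the design is the separation: the plan is computed and shown first, and a human (or a pull request review) decides whether to apply it.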
In 2023, HashiCorp changed Terraform's licence from the open-source Mozilla Public License 2.0 to the Business Source License (BSL) 1.1, which restricts certain commercial uses. For most startups using Terraform internally, this change does not materially affect usage. For companies building products or managed services that use Terraform as a component, the licence deserves a legal review. If there is any doubt, OpenTofu resolves it.
OpenTofu: the open-source alternative with identical syntax
OpenTofu is the community-governed fork of Terraform, hosted by the Linux Foundation and licensed under the Mozilla Public License 2.0. It has seen 300% annual growth, reaching 9.8 million downloads, and joined the CNCF as a sandbox project in April 2025. For a team starting fresh with no existing Terraform investment, the choice between Terraform and OpenTofu is primarily one of governance preference. The core syntax is the same. The provider ecosystem is compatible. For most teams, switching from Terraform to OpenTofu is a one-line change in the CI pipeline: the tofu binary replaces the terraform binary.
OpenTofu is the right starting point for teams where open-source licence certainty matters, either for legal reasons or because the organisation has a policy preference for community-governed tooling. The feature sets are diverging slightly as both projects evolve independently, but for the use cases relevant to a startup, they are functionally interchangeable.
Pulumi: the choice for teams that want infrastructure in real programming languages
Pulumi takes a different philosophical approach. Instead of HCL, infrastructure is written in a general-purpose language such as TypeScript, Python, Go, C#, or Java. This means engineers can use the same testing frameworks, the same package managers, the same code review conventions, and the same conditional logic they already use for application code. For teams with complex infrastructure requirements, a large number of environments with different configurations, or a strong preference for applying software engineering practices to infrastructure, Pulumi produces more maintainable code at scale.
The trade-off is a smaller community, fewer readily available modules compared to the Terraform registry, and a meaningful rewrite effort if a team ever wants to move away from it. Hiring is also a consideration: "Terraform experience" generates three times more candidates than "Pulumi" on job platforms in 2026. For a startup starting from scratch with a software-engineering-first culture, Pulumi is worth serious consideration. For a startup that wants to hire broadly and leverage a mature ecosystem quickly, Terraform or OpenTofu is the safer path.
The security and compliance debt that accumulates when IaC is deferred
Every unmanaged resource is a question with no answer
The compliance conversation is the one that tends to force the issue for startups, often at an inconvenient moment. An enterprise customer sends a security questionnaire. A SOC 2 audit begins. An investor asks for an infrastructure review as part of due diligence. In each of these scenarios, the team needs to be able to answer: what resources are running in your environment, what permissions do they have, and how do you know those permissions are still appropriate? Console-managed infrastructure cannot answer these questions reliably, because there is no audit trail and no review process.
Infrastructure defined in code answers these questions automatically. Every change is a commit. Every resource has a documented reason for existing. IAM policies are reviewed in pull requests before they are applied. Security group rules are version-controlled. Nothing is created without a record of who created it, when, and why. Continuous compliance in cloud environments requires programmatic governance: point-in-time audits and manual evidence collection cannot keep up with continuously changing infrastructure. IaC is the foundation that makes continuous compliance tractable.
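Once IAM policies live in a repository as JSON, part of that pull request review can be automated. The sketch below is one assumed way to do it, not a complete policy linter: it reads a standard IAM policy document and flags Allow statements that use a wildcard action or a wildcard resource. The function names and the example policy are hypothetical.

```python
import json


def as_list(value):
    """IAM allows a single value or a list; normalise to a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def broad_statements(policy: dict) -> list[str]:
    """Return a finding for every Allow statement that uses a wildcard."""
    findings = []
    for index, statement in enumerate(as_list(policy.get("Statement"))):
        if statement.get("Effect") != "Allow":
            continue
        actions = as_list(statement.get("Action"))
        resources = as_list(statement.get("Resource"))
        # "*" grants every action; "service:*" grants every action in a service.
        if any(a == "*" or a.endswith(":*") for a in actions):
            findings.append(f"statement {index}: wildcard action")
        if "*" in resources:
            findings.append(f"statement {index}: wildcard resource")
    return findings


# Hypothetical policy: the first statement is scoped, the second is not.
policy = json.loads("""
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "s3:GetObject",
      "Resource": "arn:aws:s3:::example-bucket/*"
    },
    {
      "Effect": "Allow",
      "Action": ["s3:*"],
      "Resource": "*"
    }
  ]
}
""")

findings = broad_statements(policy)
for finding in findings:
    print(finding)
```

A check like this can run in CI and fail the build when it returns findings, so that a broad permission has to be justified in the review rather than discovered in an audit.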
Configuration drift is invisible until it is expensive
Configuration drift is what happens when the actual state of the infrastructure diverges from what anyone believes it to be. A security group gets a rule added manually to unblock a developer during an incident. The rule never gets removed. A database gets its backup retention period changed through the console during a compliance review. That change never gets reflected in the IaC. An IAM role gets expanded permissions to solve an urgent problem. The expansion never gets reviewed or reduced.
Over months, the accumulation of these small drifts creates an infrastructure that no longer matches the version in anyone's head or in any documentation. The team believes their infrastructure looks one way. The AWS console shows something different. When an incident or an audit reveals the gap, the work of reconciling it is significant, and the exposure that existed in the meantime was real.
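Drift detection is, at its core, a comparison between what the code declares and what is actually running. Terraform and OpenTofu do this against the real cloud APIs when they build a plan; the sketch below only illustrates the idea with plain dictionaries. The resource names and attributes are hypothetical and mirror the examples above: a security group rule added during an incident, a backup retention period changed in the console, and a role that exists only in the live environment.

```python
def detect_drift(declared: dict, live: dict) -> list[str]:
    """List every difference between declared and live resources."""
    drift = []
    for name, attrs in declared.items():
        actual = live.get(name)
        if actual is None:
            drift.append(f"{name}: missing from live environment")
            continue
        # Compare attribute by attribute so the report says what changed.
        for key in sorted(attrs.keys() | actual.keys()):
            if attrs.get(key) != actual.get(key):
                drift.append(
                    f"{name}.{key}: declared {attrs.get(key)!r}, "
                    f"live {actual.get(key)!r}"
                )
    # Resources that exist but were never defined in code.
    for name in sorted(live.keys() - declared.keys()):
        drift.append(f"{name}: exists live but is not declared in code")
    return drift


declared = {
    "web_sg": {"ingress_ports": [443]},
    "app_db": {"backup_retention_days": 7},
}
live = {
    "web_sg": {"ingress_ports": [22, 443]},   # rule added during an incident
    "app_db": {"backup_retention_days": 35},  # changed in the console
    "debug_role": {"policy": "AdministratorAccess"},  # never declared
}

report = detect_drift(declared, live)
for line in report:
    print(line)
```

Run on a schedule, a report like this turns drift from something an audit discovers into something the team sees the day after it happens.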
Infrastructure as Code does not eliminate mistakes. It makes mistakes visible, reversible, and reviewable before they reach production. That is worth more than the hours saved on provisioning.
Starting from where you are, not from scratch
The first service is easier than the retrofit
The most practical thing a team can do today, regardless of how much infrastructure already exists in the console, is to commit to IaC for everything that gets created from this point forward. Not a retroactive project to import everything at once. Not a migration sprint that competes with the product roadmap. Just a decision that the next service, the next database, the next IAM role, gets defined in code before it gets created in the cloud.
The habit is the important thing. The discipline of writing the Terraform or OpenTofu definition before clicking the console button is the behaviour change that prevents the problem from compounding further. The existing console infrastructure can be imported incrementally, prioritised by security sensitivity, as bandwidth allows. Starting with the IAM roles and security groups, the resources with the highest compliance and security relevance, is a reasonable first priority for the import work.
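One way to make "prioritised by security sensitivity" concrete is to keep a simple inventory of console-created resources and sort it before planning the import work. The sketch below assumes a hand-maintained inventory; the resource types, the priority numbers, and the resource names are all hypothetical.

```python
# Lower number = import sooner. IAM roles and security groups come first;
# anything not listed falls to the back of the queue.
IMPORT_PRIORITY = {"iam_role": 0, "security_group": 1}
DEFAULT_PRIORITY = 2


def import_order(inventory: list[dict]) -> list[str]:
    """Return resource names ordered by security sensitivity, then by name."""
    ranked = sorted(
        inventory,
        key=lambda r: (IMPORT_PRIORITY.get(r["type"], DEFAULT_PRIORITY), r["name"]),
    )
    return [r["name"] for r in ranked]


inventory = [
    {"type": "s3_bucket", "name": "assets"},
    {"type": "security_group", "name": "web-sg"},
    {"type": "iam_role", "name": "deploy-role"},
    {"type": "rds_instance", "name": "app-db"},
    {"type": "iam_role", "name": "ci-role"},
]

order = import_order(inventory)
print(order)
```

The ordering matters less than having one: a written queue lets the import happen a few resources at a time without anyone having to remember what is left.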
The one mistake to avoid at the start
The most consistent failure mode observed in teams adopting IaC for the first time is storing the state file locally rather than remotely. The state file is Terraform's or OpenTofu's record of which real resources it manages and what their attributes were the last time it checked. If it lives on one engineer's laptop, the team cannot collaborate safely, concurrent changes create conflicts, and a lost laptop means a lost record of the infrastructure. Remote state with locking is a prerequisite for any team use. In AWS environments that has traditionally meant an S3 bucket for the state and a DynamoDB table for the lock; recent Terraform and OpenTofu releases can also keep the lock in S3 itself. It takes fifteen minutes to set up and prevents a category of problem that consistently affects teams who skip it.
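What the lock buys you is easiest to see in miniature. The sketch below is an in-memory stand-in, not the real backend: LockTable, StateLockedError, and the state key are hypothetical names, and a real lock relies on an atomic conditional write in the remote store rather than a Python dictionary. The behaviour it illustrates is the same, though: while one apply holds the lock, a second apply is refused instead of overwriting the state.

```python
class StateLockedError(RuntimeError):
    """Raised when someone else already holds the lock on a state file."""


class LockTable:
    """Toy lock store: at most one owner per state key at a time."""

    def __init__(self) -> None:
        self._holders: dict[str, str] = {}

    def acquire(self, state_key: str, owner: str) -> None:
        holder = self._holders.get(state_key)
        if holder is not None:
            raise StateLockedError(f"{state_key} is locked by {holder}")
        self._holders[state_key] = owner

    def release(self, state_key: str, owner: str) -> None:
        # Only the current holder may release the lock.
        if self._holders.get(state_key) == owner:
            del self._holders[state_key]


locks = LockTable()
events = []

locks.acquire("prod/terraform.tfstate", "alice")
events.append("alice acquired")

try:
    locks.acquire("prod/terraform.tfstate", "bob")
    events.append("bob acquired")
except StateLockedError as error:
    events.append(f"bob refused: {error}")

locks.release("prod/terraform.tfstate", "alice")
locks.acquire("prod/terraform.tfstate", "bob")
events.append("bob acquired after release")

for event in events:
    print(event)
```

With local state there is no shared place for this check to happen, which is why two engineers applying at the same time can each believe they were the only one making changes.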
The case for Infrastructure as Code is not primarily about efficiency. Teams that provision infrastructure through code are not dramatically faster than teams that use the console, at the individual provisioning level. What they have is something more valuable: a record of every decision, a process for reviewing those decisions before they take effect, and a foundation that makes compliance, security audits, and team transitions orders of magnitude less painful. The teams that I have watched build this habit from the beginning never look back and wish they had waited. The ones that defer it always underestimate the cost of the catch-up.

