Every few months I end up in a conversation with an engineering leader who is either paying for something they are not using, not paying for something that would save them real time, or running a tool that made sense at ten engineers but has become a liability at forty. This list comes from those conversations. It is not exhaustive, and it is deliberately opinionated.
One framing I keep coming back to: the right tool at the wrong stage is still the wrong decision. So for each of these, I will say not just what it does, but when it stops making sense to stay on the free tier, and what has actually changed in the market around it recently.
Terraform (and the conversation you now have to have about OpenTofu)
Terraform is still the default answer when someone asks how to manage infrastructure as code. Nearly every provider supports it. The community is enormous. The documentation is deep. It does the job.
But in August 2023, HashiCorp switched Terraform's license from the Mozilla Public License to the Business Source License, and that decision still matters to you in 2026. The BSL is not open source. If you are building a SaaS product and Terraform is part of your infrastructure automation layer, your legal team needs to look at whether your use case sits inside or outside the new license terms. For most startups using Terraform internally to provision their own cloud resources, nothing changes in practice. But the uncertainty is real, and HashiCorp's own FAQ acknowledges the ambiguity.
The community's response was OpenTofu, a fork now stewarded by the Linux Foundation with backing from Gruntwork, Harness, Spacelift, and others. It uses the same HCL syntax, supports the same provider ecosystem, and is a drop-in replacement for most teams. OpenTofu 1.11.0 is in active development as of late 2025. If open-source licensing matters to your organization, or you are building something where vendor lock-in is a real risk, it is worth evaluating seriously.
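Because OpenTofu reads the same HCL and the same state format, evaluating it is usually a matter of re-initializing an existing project and checking for a clean plan. A minimal sketch (installation method and plan file name are illustrative; always test this against a non-production workspace first):

```shell
# Settle current state with Terraform before switching.
terraform plan -out=pre-migration.tfplan

# Install OpenTofu (Homebrew shown; other package managers work too).
brew install opentofu

# Re-initialize with tofu: same HCL, same providers, same state format.
tofu init

# An empty plan here means the drop-in swap worked.
tofu plan
```

If `tofu plan` shows no unexpected diffs, the two tools agree on your infrastructure and the rest of the migration is mostly CI pipeline renaming.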
GitHub Actions
If your team is already on GitHub, the argument for running a separate CI system has become very thin. GitHub Actions is native to the workflow, triggers on any repository event, and has a marketplace that covers most standard use cases without writing much yourself.
The part that catches teams off guard is how quickly the free minutes disappear once Docker is involved. Private repositories get 2,000 minutes per month on the free tier. A single Docker image build on a mid-sized application can run 15 to 20 minutes. A team of ten pushing daily across a handful of services will blow through that allowance in under two weeks. The moment that happens, teams either start throttling builds or move to self-hosted runners, which are free but require someone to manage the underlying compute.
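Before moving to self-hosted runners, it is worth exhausting layer caching, which often cuts those 15-to-20-minute Docker builds down substantially. A sketch of a build job using the GitHub Actions cache backend for BuildKit (workflow name, trigger, and image tag are illustrative):

```yaml
# Illustrative workflow: reuse Docker layers across runs so unchanged
# layers stop consuming billable minutes.
name: build
on: [push]
jobs:
  docker:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: docker/setup-buildx-action@v3
      - uses: docker/build-push-action@v6
        with:
          push: false
          tags: app:${{ github.sha }}
          # Pull cached layers from, and write them back to, the
          # GitHub Actions cache.
          cache-from: type=gha
          cache-to: type=gha,mode=max
```

The `mode=max` setting caches intermediate layers as well as the final image, which trades cache storage for faster rebuilds.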
GitHub also introduced larger runner options in 2024, including GPU-enabled runners for AI workloads. If your team is doing model fine-tuning or inference testing in CI, that is now a native option rather than a workaround.
ArgoCD
ArgoCD is what GitOps looks like in practice. It watches a Git repository, compares it to what is running in your Kubernetes cluster, and corrects any divergence automatically. The result is that your cluster's state always reflects what is in Git, deployments become commits, and rollbacks become reverts.
Built by Intuit and open-sourced, it now runs in production at Google, Tesla, Goldman Sachs, and CERN. The CNCF 2025 survey placed it in nearly 60% of Kubernetes clusters globally, with a Net Promoter Score of 79, which is high for infrastructure tooling. That number reflects something real: engineers who use it tend to want to keep using it.
What I keep coming back to with ArgoCD is the visibility. You can open a browser and see every application's sync status, resource health, and what has drifted from the intended state. For a team that is still building confidence in declarative infrastructure, that visibility replaces a lot of anxiety. The recommendation I give: run it in manual sync mode for the first few weeks. Understand the mental model before you turn on automated self-healing. Once that discipline is there, the tool does most of the work.
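In practice, manual sync mode just means leaving `syncPolicy.automated` out of the Application manifest. A sketch of what that looks like (repository URL, paths, and names are placeholders):

```yaml
# Illustrative Argo CD Application in manual sync mode.
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: my-service
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/example/infra.git
    targetRevision: main
    path: apps/my-service
  destination:
    server: https://kubernetes.default.svc
    namespace: my-service
  # No syncPolicy.automated block: Argo CD will report drift in the UI
  # but wait for a human to click Sync. Once the team trusts the model,
  # enable self-healing:
  # syncPolicy:
  #   automated:
  #     prune: true
  #     selfHeal: true
```

Flipping from manual to automated later is a one-commit change, which is itself a nice demonstration of the GitOps model.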
Datadog
Datadog is genuinely good at what it does. Metrics, logs, traces, dashboards, APM, all in one place with hundreds of integrations and a UI that other observability tools are still trying to match. When engineering teams adopt it, they tend to like it. The problem surfaces on the invoice.
"Datadog bill shock" has become common enough to have its own Reddit threads and its own line item in FinOps discussions. A mid-sized SaaS team generating 100GB of logs per day is looking at roughly $107,000 per year in Datadog costs, and that figure excludes APM, RUM, synthetic monitoring, and overages. The structural issue is that Datadog charges across multiple dimensions simultaneously: per host, per GB ingested, per GB indexed, per million spans, per session. None of those line items looks unreasonable in isolation. Together, they compound aggressively as your infrastructure grows.
The trap most teams fall into is enabling everything on day one, instrumented at full fidelity, with no log exclusion filters and no sampling on traces. The bill for month two arrives and the conversation about cost containment has to happen reactively instead of proactively. If you adopt Datadog, decide on day one what you are and are not indexing, and set ingestion caps at the pipeline level before anything reaches Datadog's meters.
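Some of that filtering can happen at the agent itself, before anything reaches Datadog's ingestion meter. A sketch using the Agent's log processing rules (the pattern is a placeholder; tune it to your own noise):

```yaml
# Illustrative fragment of datadog.yaml: drop known-noisy lines at the
# agent so they are never ingested, and never billed.
logs_config:
  processing_rules:
    - type: exclude_at_match
      name: drop_health_checks
      pattern: "GET /healthz"
```

Agent-side exclusion is blunter than Datadog's in-platform exclusion filters, but it is the only layer that reduces ingestion volume rather than just index volume, and ingestion is one of the meters that compounds.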
Doppler
Secrets management is one of those problems that every startup has and very few handle well. API keys in environment variables. Database credentials in Slack messages. Different values in dev, staging, and production with no single source of truth. Doppler exists to solve all of that without requiring you to stand up and maintain your own secrets infrastructure, which is what HashiCorp Vault requires and why most small teams never actually do it properly.
The developer experience is the real selling point: a CLI that injects secrets as environment variables at runtime with a single command, a dashboard that shows every environment and every service in one place, and native integrations with GitHub Actions, Kubernetes, Vercel, and most CI systems. Secrets sync automatically across environments when they change, which eliminates the manual copy-paste that causes the category of incidents nobody ever wants to write a postmortem about.
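The day-to-day workflow is short enough to show in full. A sketch (project, config, and secret names are placeholders):

```shell
# Bind the current directory to a Doppler project and environment.
doppler setup --project my-api --config dev

# Run the app with secrets injected as environment variables; nothing
# is written to disk.
doppler run -- node server.js

# Update a secret once; every synced environment and integration
# picks it up.
doppler secrets set STRIPE_KEY=placeholder_value
```

The `doppler run` wrapper is the key piece: the application reads ordinary environment variables and never needs Doppler-specific code.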
Two things worth knowing that most comparisons skip. First, Doppler recently reduced the capacity of its free tier, making it less useful for teams that were previously using it as a long-term free solution. Second, Doppler is cloud-only with no self-hosted option, which matters if you have data sovereignty requirements or operate in a regulated environment. For teams where that is a constraint, Infisical is an open-source alternative with a self-hosted deployment path.
Sentry
The question Sentry answers is one that every engineering team eventually needs answered: something broke in production, so what exactly broke, where in the code, for which users, and what was happening in the application right before the failure? Without a tool like Sentry, answering that involves grepping logs, guesswork, and hoping someone reproduced the error locally. With it, you get the stack trace, the user context, the sequence of actions that led to the failure, and usually a direct line to the relevant file and function.
Sentry offers a startup program with $50,000 in credits for qualifying early-stage companies, which is worth checking before you pay anything. The free Developer plan covers 5,000 errors per month, 50 session replays, and one user, which is enough for a very early product but not for a team shipping to real users regularly. The Team plan at $26 per month covers 50,000 errors and unlimited users, which is where most growing startups sit for a long time.
One operational detail that teams learn the hard way: Sentry's pricing is per event, not per unique error. If the same error fires in a loop because a background job is retrying a failed API call, every occurrence counts against your monthly quota. A bad deployment can consume an entire month's allowance in under an hour if error rates spike and nobody has set a quota notification. Configure that notification before you need it.
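Sentry's SDKs accept a `before_send` callback that can drop events before they leave the process, which is a reasonable place to enforce a per-error budget. A minimal sketch in Python; the window, cap, and fingerprinting logic here are illustrative, not Sentry's own grouping algorithm:

```python
import time
from collections import defaultdict

# Client-side throttle for Sentry's before_send hook. Returning None
# drops the event, so a retrying background job cannot burn the
# monthly quota. Window and cap values are illustrative.
WINDOW_SECONDS = 60
MAX_PER_WINDOW = 10
_seen = defaultdict(list)  # fingerprint -> timestamps of recent sends

def before_send(event, hint):
    # Crude fingerprint: logger name plus first exception type.
    values = event.get("exception", {}).get("values", [{}])
    key = (event.get("logger"), values[0].get("type"))
    now = time.time()
    recent = [t for t in _seen[key] if now - t < WINDOW_SECONDS]
    if len(recent) >= MAX_PER_WINDOW:
        _seen[key] = recent
        return None          # over budget for this error type: drop
    recent.append(now)
    _seen[key] = recent
    return event             # under budget: forward to Sentry

# Wired up at init time, e.g.:
# sentry_sdk.init(dsn=..., before_send=before_send)
```

This complements, rather than replaces, the quota notification: the notification tells you a spike happened, the throttle caps what the spike can cost.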
Grafana
Grafana Cloud has quietly become the most credible answer to the question "what if we want what Datadog provides but cannot justify what Datadog costs." The free tier is a permanent free tier, not a trial: 10,000 active metric series, 50GB of logs, 50GB of traces, and 500 active users per month at no cost. That covers a meaningful portion of early-stage infrastructure with no time limit and no credit card required.
The trade-off versus Datadog is real and worth being honest about. Grafana's integrations are fewer, the out-of-the-box experience requires more configuration, and the UI has historically required more expertise to navigate. But the underlying stack, Prometheus for metrics, Loki for logs, Tempo for traces, is genuinely battle-tested and used at scale by organizations that have no interest in paying Datadog prices. OpenTelemetry compatibility across all three means the instrumentation you build today is not locked to any vendor.
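That portability is concrete: if applications emit OTLP to a local OpenTelemetry Collector, switching backends is an exporter endpoint change, not a re-instrumentation project. A sketch of a Collector config (the endpoint is a placeholder; each Grafana Cloud stack exposes its own OTLP gateway, and authentication is omitted here):

```yaml
# Illustrative OpenTelemetry Collector config: apps send OTLP to the
# collector; the collector forwards all three signals to one backend.
receivers:
  otlp:
    protocols:
      grpc:
      http:
exporters:
  otlphttp:
    endpoint: https://otlp-gateway.example.grafana.net/otlp
service:
  pipelines:
    metrics: {receivers: [otlp], exporters: [otlphttp]}
    logs:    {receivers: [otlp], exporters: [otlphttp]}
    traces:  {receivers: [otlp], exporters: [otlphttp]}
```

Pointing that one `endpoint` at a different vendor's OTLP intake is the whole migration story, which is exactly the lock-in insurance the paragraph above is describing.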
The pattern I see work well: teams start on Grafana Cloud's free tier while they are small, stay there for longer than they expect, and either continue on Grafana Pro when they exceed the free limits or use what they learned to evaluate whether Datadog's additional polish is worth the premium at their current scale. Starting with Grafana does not close the door to Datadog later. Starting with Datadog and trying to move away from it is considerably harder.
The most common mistake I see is not picking the wrong tool. It is picking the right tool at the wrong time, without understanding what it costs when you grow into it. The second most common mistake is waiting too long because the free tier is comfortable, and then having to retrofit good practices under pressure. Neither is catastrophic. Both are avoidable with a bit of deliberate thinking early on.

