Running infrastructure across AWS, GCP, and Azure with GitHub Actions as the deployment layer means tooling decisions compound fast. Every paid tool you add is a per-seat or per-host cost that scales with the team. The five tools in this post cover observability, metrics, instrumentation, GitOps, and security scanning. None of them cost anything. All of them are running in production at serious engineering organisations globally, and each one directly replaces a category where the paid alternative starts at hundreds or thousands of dollars per month.
1. Prometheus
What it replaces: Datadog infrastructure monitoring, AWS CloudWatch at scale, New Relic infrastructure
Prometheus is the default metrics collection and alerting system for cloud-native infrastructure. It uses a pull model: a scraper running in your cluster fetches metrics from configured targets at defined intervals and stores them as time-series data. The query language, PromQL, handles everything from simple threshold queries to complex rate calculations across multiple dimensions.
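To make the pull model concrete, here is a minimal prometheus.yml sketch — the job names, hostnames, and ports are placeholders, not part of any specific install:

```yaml
global:
  scrape_interval: 30s        # how often targets are pulled
  evaluation_interval: 30s    # how often recording/alerting rules run

scrape_configs:
  # Prometheus scrapes its own /metrics endpoint
  - job_name: prometheus
    static_configs:
      - targets: ['localhost:9090']

  # A hypothetical application exposing /metrics
  - job_name: my-service
    metrics_path: /metrics
    static_configs:
      - targets: ['my-service:8080']
```

From the stored time series, a PromQL query such as rate(http_requests_total[5m]) then turns raw counters into per-second request rates, sliceable by any label dimension.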
The paid alternative most teams reach for first is Datadog infrastructure monitoring, which starts at $15 per host per month. At 20 hosts, that is $300 per month for metrics collection alone, before APM, logs, or any other features. Prometheus has no licence cost. The trade-off is operational overhead: you manage storage, retention, and high availability yourself. For teams hitting scale limits, Grafana Mimir provides horizontally scalable long-term storage on top of Prometheus using object storage backends.
Getting Prometheus running in a Kubernetes cluster with the Prometheus Operator:
helm repo add prometheus-community \
https://prometheus-community.github.io/helm-charts
helm repo update
helm install prometheus prometheus-community/kube-prometheus-stack \
--namespace monitoring \
--create-namespace \
--set prometheus.prometheusSpec.retention=15d \
--set prometheus.prometheusSpec.storageSpec.volumeClaimTemplate.spec.resources.requests.storage=50Gi

The kube-prometheus-stack chart installs Prometheus, Alertmanager, and a set of default recording rules and alerts for Kubernetes cluster health in a single command. It also installs Grafana, which covers tool two.
2. Grafana
What it replaces: Datadog dashboards, New Relic, Dynatrace visualisation layer
Grafana is the visualisation layer for the open-source observability stack. It connects to Prometheus for metrics, Loki for logs, and Tempo for traces, providing a unified interface to query and correlate all three signal types. The dashboard ecosystem is one of the largest in the industry: thousands of pre-built dashboards exist for Kubernetes, PostgreSQL, NGINX, Redis, and most common infrastructure components, importable directly from Grafana's dashboard library by ID.
The commercial observability platforms (Datadog, New Relic, Dynatrace) bundle visualisation with their proprietary backends and charge accordingly. Datadog's full-stack observability starts at roughly $27 per host per month and scales with data volume. Grafana open source is free. Grafana Cloud's permanent free tier covers 10,000 active metric series, 50GB of logs, and 50GB of traces per month, which is enough for a meaningful early-stage production workload with no operational overhead.
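Data sources can also be wired up declaratively instead of through the UI, using Grafana's file-based provisioning. A sketch, assuming a kube-prometheus-stack install in the monitoring namespace (the file path and Prometheus service URL are assumptions for that setup):

```yaml
# /etc/grafana/provisioning/datasources/prometheus.yaml
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: http://prometheus-operated.monitoring.svc:9090
    isDefault: true
```

Provisioned data sources survive container restarts and live in Git alongside the rest of the configuration, which fits the GitOps workflow covered in tool four.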
Importing a Kubernetes cluster dashboard immediately after installation:
# Import dashboard ID 15661 (Kubernetes cluster monitoring)
# via Grafana UI: Dashboards → Import → Enter ID → Load
# Or via API:
curl -X POST \
http://admin:admin@localhost:3000/api/dashboards/import \
-H 'Content-Type: application/json' \
-d '{
"dashboard": {"id": null},
"folderId": 0,
"inputs": [{
"name": "DS_PROMETHEUS",
"type": "datasource",
"pluginId": "prometheus",
"value": "Prometheus"
}],
"overwrite": false
}'Grafana's alerting integrates directly with Prometheus Alertmanager and supports routing to Slack, PagerDuty, webhooks, and email. The same alert rules that fire in Prometheus can be visualised and managed from the Grafana interface without maintaining two separate systems.
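A minimal Alertmanager routing sketch for the Slack and PagerDuty integrations mentioned above — the webhook URL, channel, and routing key are placeholders you would pull from your own integrations:

```yaml
# alertmanager.yml
route:
  receiver: slack-default
  group_by: ['alertname', 'namespace']
  routes:
    # Page a human only for critical severity; everything else goes to Slack
    - matchers:
        - severity = "critical"
      receiver: pagerduty-oncall

receivers:
  - name: slack-default
    slack_configs:
      - api_url: https://hooks.slack.com/services/T000/B000/XXXX
        channel: '#alerts'
  - name: pagerduty-oncall
    pagerduty_configs:
      - routing_key: <pagerduty-integration-key>
```

The grouping keys keep a cascading failure from producing one notification per pod; the severity matcher keeps pages reserved for alerts worth waking someone up for.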
3. OpenTelemetry
What it replaces: Datadog APM agent, New Relic APM agent, Dynatrace OneAgent
OpenTelemetry is the CNCF standard for application instrumentation. It provides vendor-neutral SDKs for 12+ languages, auto-instrumentation for common frameworks, and a Collector that receives, processes, and routes telemetry to any backend. The core value proposition: instrument once, send anywhere. Switch from Grafana to Datadog or from Jaeger to Honeycomb without touching application code.
The paid alternative is a proprietary APM agent. Datadog APM starts at $31 per host per month. New Relic APM is usage-based from $0.35 per GB ingested. Both lock your instrumentation to their platform. OTel costs nothing and leaves every backend option open. By 2026 it is the second most active CNCF project after Kubernetes, supported natively by every major observability vendor.
Auto-instrumentation for a Node.js service, zero code changes required:
npm install \
@opentelemetry/sdk-node \
@opentelemetry/auto-instrumentations-node \
@opentelemetry/exporter-trace-otlp-grpc
// instrumentation.js - load before application code
const { NodeSDK } = require('@opentelemetry/sdk-node');
const { getNodeAutoInstrumentations } = require('@opentelemetry/auto-instrumentations-node');
const { OTLPTraceExporter } = require('@opentelemetry/exporter-trace-otlp-grpc');
const sdk = new NodeSDK({
  traceExporter: new OTLPTraceExporter({
    url: 'http://otel-collector:4317',
  }),
  instrumentations: [getNodeAutoInstrumentations()],
});
sdk.start();

Run with node --require ./instrumentation.js app.js. The auto-instrumentation library automatically generates spans for HTTP servers, Express routes, database clients, outbound HTTP calls, and most common Node.js frameworks. The Collector receives on port 4317 and routes to Grafana Tempo for traces, Prometheus for metrics, and Loki for logs, all configured in a single YAML file without touching the application again.
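That single Collector YAML could look roughly like this, assuming the contrib Collector distribution and an in-cluster Tempo reachable at tempo:4317 (both endpoints are assumptions for this sketch):

```yaml
# otel-collector-config.yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317

processors:
  batch: {}   # batch telemetry before export to reduce backend load

exporters:
  otlp/tempo:
    endpoint: tempo:4317
    tls:
      insecure: true
  prometheus:
    endpoint: 0.0.0.0:8889   # exposed for Prometheus to scrape

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlp/tempo]
    metrics:
      receivers: [otlp]
      processors: [batch]
      exporters: [prometheus]
```

Swapping backends later means editing the exporters section, not redeploying instrumented services — which is the "instrument once, send anywhere" proposition in practice.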
4. ArgoCD
What it replaces: AWS CodeDeploy, Harness CD, Spinnaker managed deployments
ArgoCD is the most widely adopted GitOps controller for Kubernetes, running in nearly 60% of Kubernetes clusters globally according to the 2025 CNCF End User Survey, with 97% of respondents running it in production. It watches a Git repository and continuously reconciles the cluster state to match the desired state defined in Git. Manual kubectl apply changes that diverge from Git get corrected automatically.
The commercial alternatives: Harness CD starts at around $100 per developer per month. AWS CodeDeploy adds per-deployment costs at scale. ArgoCD is free. The web UI shows sync status, resource health, dependency trees, and visual diffs between desired and live state across every application and cluster in a single interface.
Installing ArgoCD and connecting the first application:
kubectl create namespace argocd
kubectl apply -n argocd \
-f https://raw.githubusercontent.com/argoproj/argo-cd/stable/manifests/install.yaml
kubectl apply -f - <<EOF
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: my-service
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/your-org/your-repo.git
    targetRevision: HEAD
    path: k8s/overlays/production
  destination:
    server: https://kubernetes.default.svc
    namespace: production
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
EOF

selfHeal: true enables continuous drift correction. prune: true removes resources from the cluster when they are removed from Git. With both enabled, the cluster becomes a pure reflection of the repository: every change goes through a commit, every rollback is a revert, and the full deployment history is the Git log.
5. Trivy
What it replaces: Snyk Container, Aqua Security, Prisma Cloud container scanning
Trivy is an open-source security scanner maintained by Aqua Security, released under the Apache 2.0 license with no paid tiers, usage limits, or feature restrictions. It scans container images, filesystems, Git repositories, Kubernetes clusters, Terraform, CloudFormation, and Kubernetes manifests from a single binary. It detects OS package vulnerabilities, language-specific dependency vulnerabilities, infrastructure misconfigurations, exposed secrets, and licence compliance issues.
Snyk's advanced features require paid plans starting at several hundred dollars per month for teams. Trivy provides container scanning, dependency scanning, IaC checks, and SBOM generation at zero cost. The trade-off is no automated fix PRs and no centralised dashboard without additional tooling. For most teams, CI pipeline integration is sufficient: scan on every build, fail the build on critical vulnerabilities, output results as SARIF for GitHub Security tab integration.
Trivy in a GitHub Actions workflow:
name: Security Scan
on: [push, pull_request]
jobs:
  trivy-scan:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout
        uses: actions/checkout@v4
      - name: Build image
        run: docker build -t my-service:${{ github.sha }} .
      - name: Scan container image
        uses: aquasecurity/trivy-action@master
        with:
          image-ref: my-service:${{ github.sha }}
          format: sarif
          output: trivy-results.sarif
          severity: CRITICAL,HIGH
          exit-code: 1
      - name: Upload to GitHub Security tab
        uses: github/codeql-action/upload-sarif@v3
        if: always()
        with:
          sarif_file: trivy-results.sarif
      - name: Scan IaC configs
        uses: aquasecurity/trivy-action@master
        with:
          scan-type: config
          scan-ref: ./k8s/
          format: table
          exit-code: 1
          severity: CRITICAL,HIGH

The exit-code: 1 on critical and high findings means a container image with unpatched vulnerabilities cannot be merged to main. The if: always() on the upload step ensures the SARIF results still reach the Security tab even when the scan fails the job. The IaC scan step catches Kubernetes manifest misconfigurations in the same pipeline. The Trivy Operator also runs as an in-cluster scanner for continuous monitoring of deployed workloads, not just images at build time.
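The same policy the workflow enforces can be kept in a trivy.yaml config file for local runs, so a plain trivy image my-service:latest applies it automatically. The keys mirror the CLI flags; treat the exact set as an assumption to check against your Trivy version:

```yaml
# trivy.yaml - picked up from the working directory
severity:
  - CRITICAL
  - HIGH
exit-code: 1
scan:
  scanners:
    - vuln
    - secret
    - misconfig
```

Keeping the file in the repository means developers and CI fail on the same findings, instead of vulnerabilities surfacing for the first time in a pull request.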
The total licensing cost for this stack is zero. Prometheus and Grafana for metrics and visualisation. OpenTelemetry for instrumentation with no vendor lock-in. ArgoCD for GitOps. Trivy for security scanning in the pipeline. Paid alternatives for these five categories alone can run to several thousand dollars per month at small team scale. The open-source versions are not stripped-down alternatives: they are what engineering teams at Google, Netflix, Goldman Sachs, and CERN actually run. The operational cost is real: you manage the infrastructure yourself. But for teams that already manage infrastructure as a matter of course, that cost is already accounted for.
Author note
Ayesha Siddiqua & Manjunaathaa
Manjunaathaa is an Associate DevOps Engineer at Frigga Cloud Labs. He works across AWS, GCP, and Azure daily with GitHub Actions as the deployment backbone, and every one of the five tools in this blog is something he actually runs, not something he evaluated once.
This blog came from a straightforward question he kept encountering: why are teams paying for things that are available free, production-grade, and running at Google and Netflix? His focus is Proactive Resilience, and keeping the tooling cost low while keeping the visibility high is a direct part of that.
From where I sit working with early-stage CTOs, the tooling budget conversation comes up early and often, and the answer is almost always sitting in open source waiting to be used.
Let's connect on LinkedIn → Ayesha | Manjunaathaa
