How to optimize Datadog costs


Datadog is excellent, and it gets expensive faster than almost any other line on an engineering budget. The pricing is multi-dimensional (hosts, custom metrics, indexed logs, APM spans, synthetics, and more), so cost creeps in from several directions at once. A bill that looked fine last quarter can double without anyone making a deliberate decision to spend more on Datadog.

The fix is the same as with Kubernetes costs: understand which dimension is driving the bill, then attack it directly. Here is where the money usually hides.

1. Custom metrics: watch cardinality

Custom metrics are billed per unique combination of metric name and tag values — the cardinality. One innocent-looking tag with high uniqueness (a user ID, request ID, container ID, or full URL) can multiply a single metric into hundreds of thousands of billable timeseries.

  • Audit your top custom metrics by volume; the worst offenders are almost always a high-cardinality tag.
  • Drop or aggregate away unbounded tags (IDs, raw paths, ephemeral container names).
  • Use Metrics without Limits to keep only the tag combinations you actually query.

This is the most common cause of a runaway Datadog bill, and usually the fastest to fix.
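A small Python sketch shows why a single tag matters so much. It counts billable timeseries the way described above: one timeseries per unique combination of metric name and tag values. The point format, the `count_timeseries` helper and its `keep_tags` allowlist are hypothetical. The allowlist only stands in for the idea behind Metrics without Limits and is not its API.

```python
from typing import Iterable, Mapping, Optional


def count_timeseries(
    points: Iterable[tuple[str, Mapping[str, str]]],
    keep_tags: Optional[set[str]] = None,
) -> int:
    """Count unique (metric name, tag values) combinations.

    Hypothetical helper: each point is (metric_name, {tag_key: tag_value}).
    If keep_tags is given, only those tag keys survive (an allowlist);
    every other tag is aggregated away before counting.
    """
    seen = set()
    for name, tags in points:
        kept = {k: v for k, v in tags.items() if keep_tags is None or k in keep_tags}
        # frozenset makes the tag set hashable and order-independent
        seen.add((name, frozenset(kept.items())))
    return len(seen)


# Illustrative data: 3 endpoints x 1,000 users, one point each.
points = [
    ("checkout.latency", {"endpoint": f"/api/{e}", "user_id": str(u)})
    for e in ("cart", "pay", "confirm")
    for u in range(1000)
]

print(count_timeseries(points))                          # 3000
print(count_timeseries(points, keep_tags={"endpoint"}))  # 3
```

Dropping `user_id` collapses 3,000 timeseries into 3. The per-endpoint view survives, and that is usually the only view anyone queries.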

2. Logs: ingest cheap, index selectively

Datadog separates ingestion from indexing, and indexing is the expensive part. You do not need every log searchable in real time.

  • Use index filters and exclusion filters so that only the logs you actually query get indexed.
  • Send the rest to archive (cheap object storage) and rehydrate on demand when you investigate.
  • Sample or drop high-volume, low-value logs (health checks, debug chatter) at the source, so you pay neither ingestion nor indexing for them.
  • Set realistic retention — 15 days indexed is plenty for most logs; keep the long tail in the archive.
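The index/archive/sample split can be sketched as a routing decision. In Datadog this logic lives in agent processing rules, index filters and exclusion filters, not in your own code. The sketch below only illustrates the decision. The log fields, the `/healthz` path, the routing rules and the 1% sample rate are all hypothetical.

```python
import hashlib
from enum import Enum


class Route(Enum):
    DROP = "drop"        # never ingested: no ingestion or indexing cost
    ARCHIVE = "archive"  # ingested and archived, but not indexed
    INDEX = "index"      # ingested, archived and searchable


# Hypothetical rule parameters; tune to your own traffic.
HEALTH_CHECK_PATH = "/healthz"
HEALTH_CHECK_SAMPLE_RATE = 0.01  # keep roughly 1% of health-check logs


def keep_sample(log_id: str, rate: float) -> bool:
    """Deterministic sampling: the same log_id always gets the same answer."""
    digest = hashlib.sha256(log_id.encode()).digest()
    # Top 53 bits -> exact float in [0, 1)
    bucket = (int.from_bytes(digest[:8], "big") >> 11) / 2**53
    return bucket < rate


def route(log: dict) -> Route:
    """Decide how much to pay for one log record (hypothetical fields)."""
    status = log.get("status", "info")
    if status in ("error", "critical"):
        return Route.INDEX  # errors are what people actually search
    if log.get("path") == HEALTH_CHECK_PATH:
        if keep_sample(log["id"], HEALTH_CHECK_SAMPLE_RATE):
            return Route.ARCHIVE
        return Route.DROP
    if status == "debug":
        return Route.ARCHIVE  # rehydrate on demand if an investigation needs it
    return Route.INDEX  # everything else stays searchable


print(route({"id": "a1", "status": "error", "path": "/pay"}))  # Route.INDEX
print(route({"id": "a2", "status": "debug", "path": "/pay"}))  # Route.ARCHIVE
```

Hashing the log ID, rather than calling a random number generator, keeps the sampling decision reproducible.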

3. Pay for the hosts and traces you mean to

  • Host count is billed at the high-water mark of concurrent hosts. Autoscaling that spins up large numbers of short-lived nodes can inflate this — and it compounds with the Kubernetes over-provisioning covered in the K8s cost guide.
  • APM is billed on instrumented hosts and ingested spans. Trace the services that matter and sample the high-throughput ones — you rarely need 100% of traces to find problems.
  • Turn off agent integrations and features you are not using; each can carry its own metric and span volume.
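Two of these mechanics are easy to model. Datadog's billing documentation describes the host high-water mark as the maximum of the lower 99 percent of hourly host counts. A burst therefore only reaches the bill if it lasts longer than roughly 1% of the month's hours. The floor rounding used below is an assumption. Trace sampling is shown as generic deterministic head-based sampling keyed on the trace ID, so every span of a trace gets the same decision. Both `billable_hosts` and `keep_trace` are hypothetical helpers, not Datadog's code.

```python
import math


def billable_hosts(hourly_counts: list[int], excluded_fraction: float = 0.01) -> int:
    """Monthly host high-water mark, ignoring the busiest hours.

    Assumption: the top floor(len * excluded_fraction) hourly samples are
    discarded before taking the maximum; Datadog's exact rounding may differ.
    """
    ordered = sorted(hourly_counts)
    n_excluded = math.floor(len(ordered) * excluded_fraction)
    return ordered[len(ordered) - 1 - n_excluded]


def keep_trace(trace_id: int, sample_rate: float) -> bool:
    """Head-based sampling keyed on the trace ID (hypothetical helper).

    Every service that sees the same trace ID makes the same decision,
    so a kept trace is kept whole rather than with random gaps.
    """
    # Multiply by an odd 64-bit constant so sequential IDs spread evenly.
    mixed = (trace_id * 0x9E3779B97F4A7C15) % 2**64
    return mixed < sample_rate * 2**64


steady = [50] * 720                     # 30 days of hourly counts at 50 hosts
short_burst = steady[:715] + [400] * 5  # 5-hour autoscaling spike
long_burst = steady[:696] + [400] * 24  # day-long spike

print(billable_hosts(short_burst))  # 50
print(billable_hosts(long_burst))   # 400
```

Under the floor assumption, a 720-hour month forgives up to 7 hours of a large autoscaling burst. A day-long burst sets the bill.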

4. Kill the silent extras

  • Synthetics (especially browser tests on tight intervals) add up — widen intervals where minute-level resolution is not needed.
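  • Profiling, RUM, CI Visibility and DBM are each a separate SKU; make sure every one you pay for is actually used.
  • Find custom metrics that no dashboard or monitor queries and stop sending them; the cost comes from submitting a metric, not from reading it, so deleting the dashboard alone saves nothing.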
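Synthetic tests are billed on the number of test runs, so cost scales linearly with run frequency and with the number of locations. The arithmetic below assumes a 30-day month and ignores retries and CI-triggered runs. The `monthly_runs` helper is hypothetical.

```python
MINUTES_PER_30_DAY_MONTH = 30 * 24 * 60  # 43,200


def monthly_runs(interval_minutes: int, locations: int) -> int:
    """Scheduled runs of one synthetic test over a 30-day month.

    Simplified: ignores retries, CI-triggered runs and month length.
    """
    return MINUTES_PER_30_DAY_MONTH // interval_minutes * locations


print(monthly_runs(1, 3))   # 129600
print(monthly_runs(15, 3))  # 8640
```

Moving a three-location browser test from every minute to every 15 minutes cuts its runs by a factor of 15.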

5. Make cost visible so it stays down

Optimization that isn’t monitored drifts straight back up.

  • Use Datadog’s usage and cost dashboards (and usage attribution by team/tag) to see who is driving spend.
  • Alert on month-over-month growth in custom metrics, indexed logs and host count.
  • Make a tag-hygiene and sampling review part of the regular cadence, not a one-off panic when the invoice lands.
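The month-over-month check is simple to run against exported usage figures. This sketch performs only the comparison and leaves out fetching the data. The dimension names, the illustrative numbers and the 20% threshold are hypothetical.

```python
def flag_growth(
    previous: dict[str, float],
    current: dict[str, float],
    threshold: float = 0.2,
) -> list[str]:
    """Return usage dimensions whose month-over-month growth exceeds threshold.

    Dimensions with no usage last month are flagged as soon as they appear.
    """
    flagged = []
    for dim, now in sorted(current.items()):
        before = previous.get(dim, 0.0)
        if before == 0:
            if now > 0:
                flagged.append(dim)
        elif (now - before) / before > threshold:
            flagged.append(dim)
    return flagged


# Illustrative numbers only.
previous = {"custom_metrics": 40_000, "indexed_logs_millions": 120, "hosts": 80}
current = {"custom_metrics": 61_000, "indexed_logs_millions": 126, "hosts": 82}

print(flag_growth(previous, current))  # ['custom_metrics']
```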

The pattern

Across every dimension the move is the same: default to cheap (ingest, archive, sample), pay for depth only where you actually query it. Most teams can cut their observability bill substantially without losing any visibility they were really using.

Observability and infrastructure costs tend to bloat together — the same over-provisioned, chatty, high-cardinality systems run up both. If your Datadog or cloud bill has outgrown the value you get from it, get in touch and I’ll help you find the waste and a plan to bring it down.
