# Datadog vs New Relic vs Grafana without the regret

Source: https://www.techinterview.org/post/3233475444/datadog-vs-new-relic-vs-grafana/
Updated: 2026-07-01 · techinterview.org

Three teams running nearly identical workloads can end up paying $6,000, $20,000, and $90,000 a year for monitoring, and the gap usually has little to do with how big their infrastructure is. It comes down to which tool they picked and, more than anything, how that tool decides to bill them. Observability is one of the few line items in an engineering budget that can quietly triple while your traffic stays flat, because the pricing is tied to data you generate almost by accident.

So before comparing dashboards and feature checklists, look at the meter. The meter is where the regret comes from.

## How each one decides what to charge you

Datadog charges along many axes at once. You pay per host for infrastructure monitoring, then again per host for APM, then per gigabyte to ingest logs, then again to index those logs so you can actually search them, then separately for custom metrics, synthetics, real user monitoring, and so on. Each product is reasonably priced on its own. The bill is the sum of a dozen of them, and the custom-metrics and indexed-log lines are the ones that surprise people, because they scale with cardinality and volume rather than with anything you consciously provisioned.
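To see how the compounding works, here is a toy model of a multi-axis monthly bill. Every rate in `RATES` is a hypothetical placeholder, not Datadog's published price, and `Usage` and `monthly_bill` are illustrative names. The point is the shape of the bill: each line is modest on its own, and the lines driven by volume and cardinality can move while the host count stays put.

```python
from dataclasses import dataclass

# Hypothetical placeholder rates in USD per month. These are NOT Datadog's
# published prices; replace them with figures from a real quote.
RATES = {
    "infra_per_host": 10.0,
    "apm_per_host": 20.0,
    "log_ingest_per_gb": 0.10,
    "log_index_per_million_events": 2.00,
    "custom_metrics_per_100": 5.00,
}


@dataclass
class Usage:
    hosts: int                      # hosts running the infrastructure agent
    apm_hosts: int                  # hosts with APM turned on
    log_gb: float                   # gigabytes of logs ingested per month
    indexed_events_millions: float  # millions of log events kept searchable
    custom_metrics: int             # distinct custom metric series


def monthly_bill(u: Usage, rates: dict = RATES) -> dict:
    """Return each billed line item plus the total, in USD per month."""
    lines = {
        "infrastructure": u.hosts * rates["infra_per_host"],
        "apm": u.apm_hosts * rates["apm_per_host"],
        "log_ingest": u.log_gb * rates["log_ingest_per_gb"],
        "log_indexing": u.indexed_events_millions * rates["log_index_per_million_events"],
        "custom_metrics": u.custom_metrics / 100 * rates["custom_metrics_per_100"],
    }
    lines["total"] = sum(lines.values())  # computed before "total" is added
    return lines


# Same 50 hosts both months; only custom-metric cardinality changes.
before = monthly_bill(Usage(50, 50, 2_000, 300, 20_000))
after = monthly_bill(Usage(50, 50, 2_000, 300, 200_000))
print(before["total"], after["total"])
```

Under these placeholder rates, one label that multiplies custom series tenfold takes the total from 3,300 to 12,300 a month, and the per-host lines don't move at all.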

New Relic took the opposite bet years ago and mostly stuck with it. You pay per gigabyte of data ingested (with a free monthly allowance), plus a per-user fee that depends on edition. There is no per-host APM tax. For a shop with a lot of short-lived containers and a small team of engineers who look at the data, that math is friendlier, because users grow slowly even when your host count is bouncing around with autoscaling.

Grafana is two different products wearing one name. Grafana Cloud is a hosted, usage-based service priced per metric series, per gigabyte of logs and traces, and per active user, with a genuinely usable free tier. The open-source LGTM stack (Loki for logs, Grafana for dashboards, Tempo for traces, Mimir for metrics) is free to run forever under the AGPLv3 license. That word "free" is doing a lot of hiding, and I will come back to it.

Exact numbers move every year and vary by contract, so treat any figure you read, including the ones below, as a starting point to confirm on the current pricing pages and with a real quote for your own volume.

|   | Datadog | New Relic | Grafana |
| --- | --- | --- | --- |
| Core billing axis | Per host, per product, plus per-GB logs and custom metrics | Per GB ingested plus per user | Per series / GB / user (Cloud) or infra + labor (self-hosted) |
| Best fit when | You want everything in one polished place and can afford it | Hosts churn a lot, head count is small | You have SRE muscle, or high volume and a tight budget |
| Where it bites | Custom metrics, indexed logs, multi-product compounding | High-ingest, large-team scenarios | Operational toil on the open-source stack |
| Lock-in feel | High: dashboards, monitors, and muscle memory live there | Medium: query language and agents | Low: built on Prometheus and OpenTelemetry conventions |
| Free tier | Thin (limited hosts, short retention) | Generous ingest allowance, one free user | Real and useful on Cloud; unlimited self-hosted |

## Where Datadog earns the premium

People love to dunk on Datadog's invoice, and the bills can be genuinely absurd at scale. But there is a reason finance keeps approving it. The product is cohesive in a way the others struggle to match. A trace links to the host metrics, which link to the log lines, which link to the deploy that introduced the regression, and you click through all of it without leaving the page or stitching together correlation IDs by hand. For an on-call engineer at 3 a.m., that continuity is worth real money.

The breadth is the other half. Network performance monitoring, security signals, RUM, session replay, database query insights, CI visibility. You can fold a half-dozen vendors into one pane, and the integrations almost always work on the first try. If your org is large enough that engineering time costs far more than the tooling, Datadog often wins on total cost even though it loses badly on the per-unit sticker.

The trap is that the same breadth runs up the bill. Every team turns on one more product, every service emits a few more custom metrics, and nobody owns the aggregate. The fix is boring governance: cap custom metrics, sample high-volume traces, and route debug logs to cheap storage instead of indexing them. Teams that do this keep Datadog affordable. Teams that don't end up writing a postmortem about the invoice.
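Two of those guardrails are easy to sketch in a vendor-neutral way. The snippet below assumes you control the decision points yourself. It is not Datadog's agent or tracer configuration, which has its own sampling and exclusion settings. `keep_trace` does deterministic head sampling keyed on the trace ID, similar in spirit to OpenTelemetry's `TraceIdRatioBased` sampler, so every service makes the same keep-or-drop call for a given trace. `log_destination` sends only actionable levels to the expensive indexed tier. Both function names and the level set are hypothetical.

```python
import hashlib

# Hypothetical policy: only these levels are worth paying to index.
INDEXED_LEVELS = {"WARN", "WARNING", "ERROR", "CRITICAL"}


def keep_trace(trace_id: str, sample_rate: float, is_error: bool = False) -> bool:
    """Deterministic head sampling keyed on the trace ID.

    Every service that hashes the same trace_id reaches the same decision,
    so a kept trace stays complete across service boundaries.
    """
    if is_error:
        return True  # always keep failures; they are the traces on-call needs
    digest = hashlib.sha256(trace_id.encode("utf-8")).digest()
    # Take 53 bits so the result is an exact float in [0, 1).
    bucket = (int.from_bytes(digest[:8], "big") >> 11) / 2**53
    return bucket < sample_rate


def log_destination(level: str) -> str:
    """Index only actionable levels; send the rest to cheap archive storage."""
    return "index" if level.upper() in INDEXED_LEVELS else "archive"


if __name__ == "__main__":
    ids = [f"trace-{i}" for i in range(10_000)]
    kept = sum(keep_trace(t, sample_rate=0.10) for t in ids)
    print(f"kept {kept} of {len(ids)} traces")
    print(log_destination("debug"), log_destination("error"))
```

Hashing the trace ID instead of rolling a random number per service is the design choice that matters here: random per-service decisions would leave you with traces missing half their spans.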

## The case for New Relic's per-user bet

New Relic reads best for a team that is small relative to its infrastructure. Picture a startup running a few hundred containers across a handful of services, with five engineers who ever open a monitoring tool. Under per-host pricing, the container churn punishes you. Under New Relic's per-user model, you pay for those five people plus whatever data you push past the free allowance, and the bill stays legible as you scale.
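Here is the startup arithmetic as a rough sketch. Every price constant below is a hypothetical placeholder rather than either vendor's list price, and `FREE_GB` merely stands in for "the free monthly allowance". Check the current pricing page for the real figure. The comparison treats per-host billing as a simple count of billed hosts, however a given vendor actually measures them.

```python
# Hypothetical placeholder prices in USD per month, NOT either vendor's list
# prices. FREE_GB stands in for the free monthly ingest allowance.
PER_HOST = 50.0   # assumed combined infra + APM cost per billed host
PER_USER = 100.0  # assumed full-platform user fee
PER_GB = 0.30     # assumed price per GB ingested beyond the allowance
FREE_GB = 100.0   # assumed free ingest allowance per month


def per_host_cost(billed_hosts: int) -> float:
    """Per-host model: cost tracks how many hosts the vendor bills you for."""
    return billed_hosts * PER_HOST


def per_user_cost(users: int, ingest_gb: float) -> float:
    """Per-user model: seats plus any ingest past the free allowance."""
    billable_gb = max(0.0, ingest_gb - FREE_GB)
    return users * PER_USER + billable_gb * PER_GB


# Five engineers, 400 GB/month of telemetry, and autoscaling that pushes the
# billed host count from 40 to 80.
for hosts in (40, 80):
    print(hosts, per_host_cost(hosts), per_user_cost(users=5, ingest_gb=400))
```

Doubling hosts doubles the per-host line and leaves the per-user line untouched. The per-user line only moves when headcount or ingest moves, which is exactly the tension in the next paragraph.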

The platform is strongest where it started, in application performance. Transaction traces, error analytics, and the deployment markers that tie a latency spike to a specific release are mature and pleasant to use. The friction shows up in two places. Ingest costs can climb fast if your apps are chatty, so you end up tuning log and metric volume to control spend, which is real work. And the full-platform user fee is steep enough that you think twice before handing access to a sixth or tenth person, which quietly works against the culture of everyone-can-see-the-graphs that good observability depends on.

## Grafana, and the cost that never shows up on an invoice

Self-hosted LGTM is the most misunderstood option here. The license is free, the architecture is sound, and the cost model is fundamentally different: Loki indexes only labels rather than full log text, so you store the bulk of your logs in cheap object storage like S3 instead of paying per gigabyte to index them. Mimir gives you Prometheus-compatible metrics with no per-custom-metric charge, so the cardinality that would wreck a Datadog bill becomes a question of how much compute you provision. At high volume, the savings against a commercial tool can be an order of magnitude.
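The label-only indexing idea is easier to see in miniature. This is a toy in-memory illustration under loose assumptions, not Loki's storage format or API. `LabelIndexedStore` is a hypothetical name. The only index is a map from `(label, value)` pairs to streams, and the log text itself sits in plain per-stream chunks that are scanned at query time, roughly what a LogQL query like `{app="checkout"} |= "timeout"` does.

```python
from collections import defaultdict


class LabelIndexedStore:
    """Toy Loki-style store: only label sets are indexed.

    Log text is appended to per-stream chunks (the part Loki would keep in
    object storage) and is only scanned at query time.
    """

    def __init__(self) -> None:
        self.index = defaultdict(set)    # (label, value) -> set of streams
        self.chunks = defaultdict(list)  # stream -> raw log lines, unindexed

    def push(self, labels: dict, line: str) -> None:
        stream = tuple(sorted(labels.items()))  # the label set names the stream
        for pair in stream:
            self.index[pair].add(stream)
        self.chunks[stream].append(line)

    def query(self, selector: dict, contains: str = "") -> list:
        """Pick streams via the small label index, then brute-force their text."""
        candidates = None
        for pair in selector.items():
            matched = self.index.get(pair, set())
            candidates = matched if candidates is None else candidates & matched
        if not candidates:
            return []
        # Only the selected streams' raw text is scanned; no full-text index exists.
        return [line for s in sorted(candidates) for line in self.chunks[s] if contains in line]


store = LabelIndexedStore()
store.push({"app": "checkout", "env": "prod"}, "GET /cart 200 12ms")
store.push({"app": "checkout", "env": "prod"}, "POST /pay 504 upstream timeout")
store.push({"app": "search", "env": "prod"}, "query timeout after 2s")
print(store.query({"app": "checkout"}, "timeout"))
```

The index grows with the number of distinct label pairs, not with the volume of log text. That is why bulk log data can live in cheap storage, and also why badly chosen labels hurt Loki just as they hurt a metrics backend.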

The catch is that someone has to run it. Mimir, Loki, and Tempo are distributed systems with their own scaling cliffs, their own storage tuning, and their own 3 a.m. failure modes. Loki in particular has a reputation for being fiddly to operate well at scale. An SRE who keeps that stack healthy is a six-figure salary, and the compute and storage are not nothing either. So the honest framing is a trade: you swap a predictable vendor invoice for a payroll line and an operational burden. That trade pays off for organizations with real platform teams and serious data volume. For a five-person startup, paying people to babysit observability infrastructure is almost always the wrong use of the people.
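The trade can be framed as a back-of-the-envelope break-even check. The inputs below, one full-time SRE at a 200k loaded cost and 60k a year of compute and object storage, are hypothetical, and the function names are illustrative. The model also holds infrastructure cost flat, which understates it as data volume grows.

```python
def self_hosted_annual_cost(sre_fte: float, loaded_cost_per_fte: float,
                            infra_per_year: float) -> float:
    """People plus compute and storage: the costs that never appear on an invoice."""
    return sre_fte * loaded_cost_per_fte + infra_per_year


def self_hosting_savings(vendor_bill_per_year: float, sre_fte: float,
                         loaded_cost_per_fte: float, infra_per_year: float) -> float:
    """Positive means self-hosting is cheaper by that amount; negative means the vendor is."""
    return vendor_bill_per_year - self_hosted_annual_cost(
        sre_fte, loaded_cost_per_fte, infra_per_year)


# Hypothetical inputs: a modest vendor bill versus a large one.
for bill in (90_000, 600_000):
    print(bill, self_hosting_savings(bill, 1.0, 200_000, 60_000))
```

With these assumptions, a 90k vendor bill is 170k cheaper than running the stack yourself, while a 600k bill flips the answer by 340k in self-hosting's favor. That is the same conclusion the paragraph reaches: volume and an existing platform team decide it.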

Grafana Cloud is the middle path, and it is underrated. You get the same OSS-flavored stack and query languages, hosted by the people who write it, on usage-based pricing that tends to land below the commercial heavyweights for comparable signal volume. You keep the low lock-in, since dashboards and alerts are built on Prometheus and OpenTelemetry conventions you could carry elsewhere, without staffing a team to run Mimir yourself.

## The cardinality trap nobody warns you about

Whatever you pick, the thing most likely to blow your budget is high-cardinality metrics. Tag a metric with a user ID, a request ID, or a raw URL path, and you have just created millions of distinct time series out of one innocent-looking line of instrumentation. On per-metric pricing this lands as a five-figure surprise. On a self-hosted backend it lands as Mimir falling over. The defense is the same everywhere: keep label values bounded, push high-cardinality detail into traces and logs where it belongs, and put an alert on your own metric growth the same way you alert on error rates.
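The arithmetic behind the explosion is multiplication. The worst-case number of series for one metric name is the product of the distinct values each label can take. The sketch below shows that bound, a hypothetical `route_label` helper that collapses raw paths into bounded templates (it only recognizes all-digit and UUID segments, an assumption you would extend for your own URLs), and a naive growth check with an assumed 1.5x threshold. On a self-hosted Prometheus, the active-series count such a check would watch is exposed as `prometheus_tsdb_head_series`.

```python
import math
import re

# Matches an all-digit segment or a canonical UUID segment in a URL path.
_ID_SEGMENT = re.compile(
    r"/(?:\d+|[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12})(?=/|$)"
)


def worst_case_series(label_cardinalities: dict) -> int:
    """Upper bound on series for one metric: product of distinct values per label."""
    return math.prod(label_cardinalities.values())


def route_label(raw_path: str) -> str:
    """Collapse a raw path into a bounded route template for use as a label."""
    path = raw_path.split("?", 1)[0]  # query strings are unbounded too
    return _ID_SEGMENT.sub("/:id", path)


def series_growth_alert(previous: int, current: int, max_ratio: float = 1.5) -> bool:
    """Fire when the active-series count grew by more than max_ratio in one window."""
    return current > previous * max_ratio


bounded = {"method": 5, "status": 10, "route": 40}
print(worst_case_series(bounded))                          # 2000
print(worst_case_series({**bounded, "user_id": 100_000}))  # 200000000
print(route_label("/users/12345/orders/98765?expand=items"))  # /users/:id/orders/:id
```

Adding one `user_id` label turns 2,000 possible series into 200 million, which is the "one innocent-looking line of instrumentation" problem in numbers.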

## Picking by what you actually have

If you are a small team drowning in containers and you want predictable spend without running infrastructure, New Relic's per-user model is the easiest one to live with, and the free tier lets you start tonight. If engineering time is your scarcest resource and the budget can absorb it, Datadog buys you a coherence the others can't quite match, as long as you put guardrails on custom metrics and log indexing from day one. If you have a platform team and the volume to justify it, self-hosted LGTM is the cheapest serious option by a wide margin, with Grafana Cloud as the version that gives you most of the savings and the low lock-in without the staffing.

The worst outcome isn't picking the pricier tool. It's picking any of them, letting every team flip on every feature, and discovering the meter only when finance forwards you the renewal.