What Is a Web Analytics API? A Practical Guide
Most people think of web analytics as a dashboard you log into. But behind every dashboard — whether you built it yourself or bought it — there’s an API doing the actual work. Understanding how that API works is what separates teams that pull data reactively from teams that build systems that run on their own.
A web analytics API is the programmatic interface that lets you request, send, and manipulate analytics data without touching a UI. You write code; the API returns data. That’s the whole idea. But the details — auth methods, endpoint structures, rate limits, data freshness — vary enough across platforms that it’s worth understanding the fundamentals before you commit to an integration.
This guide covers what a web analytics API actually is, how REST and GraphQL architectures differ in practice, what kinds of endpoints you’ll encounter, how authentication works, and where the real limitations show up. I’ll use Plausible, Matomo, and Umami as concrete examples throughout — they’re open source, well-documented, and representative of how modern analytics APIs are built.

What a Web Analytics API Is — and What It’s Not
An API (Application Programming Interface) is a defined contract between your code and a remote service. You make an HTTP request to a specific URL with specific parameters; the service returns structured data — almost always JSON in modern analytics APIs.
What it’s not is the tracking pixel or script that collects data in the first place. The collection layer and the query layer are separate. Your analytics platform collects events — pageviews, clicks, custom events — and stores them. The API gives you access to that stored data (or lets you pipe events in programmatically). This article focuses on the query/reporting side of that equation.
If you’re wondering how data gets into the analytics pipeline server-side, that’s a separate topic covered in the complete guide to server-side tracking without cookies.
Why Teams Use Web Analytics APIs
There are four distinct use cases, and they call for different approaches.
Custom dashboards
You want analytics data displayed somewhere that isn’t the default platform UI — inside your product, on a TV in the office, in a client portal. The API lets you pull the specific metrics you need and render them however you want. This is the most common starting point.
Automated reporting
Scheduled reports that run without anyone clicking “export” are one of the highest-leverage things a small analytics team can build. You query the API on a schedule, format the output, and send it to Slack, email, or a spreadsheet. For a deeper look at building these pipelines end to end, the automated reporting workflows guide covers the full architecture.
Data pipelines and warehousing
Moving analytics data into a central warehouse — BigQuery, Postgres, Snowflake, whatever you use — so it can be joined with CRM data, revenue data, or product events. The API is your extraction layer. You pull, transform, and load on a schedule.
Event capture
Some analytics APIs aren’t just for reading data — they also accept incoming events. If you’re tracking server-side actions (purchases, form submissions, background jobs) that never touch a browser, you send them directly to the API. Matomo’s tracking API and PostHog’s capture endpoint both work this way.
REST vs GraphQL — What Analytics APIs Actually Use
Most analytics APIs are REST APIs. GraphQL exists and solves real problems, but it hasn’t displaced REST in the analytics space. Here’s how they compare in practice.
| Characteristic | REST | GraphQL |
|---|---|---|
| Endpoint structure | Multiple URLs, one per resource (/stats, /events, /pageviews) | Single endpoint (/graphql), query in request body |
| Data shaping | Server defines response shape; you filter by parameters | Client specifies exactly which fields to return |
| Over/under-fetching | Common — response may include fields you don’t need | Avoided by design — get exactly what you ask for |
| Caching | HTTP-native (GET requests cache easily) | More complex — POST requests don’t cache by default |
| Learning curve | Low — HTTP verbs + URL patterns | Higher — requires learning query syntax |
| Analytics examples | Plausible, Matomo, Umami, GA4 Data API | PostHog (HogQL), some custom implementations |
For most analytics integrations, REST is simpler and sufficient. GraphQL-style querying mainly wins when you need highly specific field subsets from complex nested data. PostHog’s HogQL endpoint is a useful reference point here: strictly speaking it’s a SQL dialect sent in a request body rather than GraphQL, but it gives you the same kind of fine-grained control over what comes back.
The underlying HTTP protocol is the same either way — see MDN’s HTTP overview for a solid grounding in how these requests travel across the wire.

Typical Endpoint Types You’ll Encounter
Analytics APIs tend to cluster around a few core endpoint patterns regardless of which platform you’re using.
Aggregate stats / query endpoints
The most common type. You specify a site, a date range, and a set of metrics; you get back aggregated numbers. Plausible’s v2 API is a clean example — a single POST /api/v2/query endpoint handles all reporting queries:
```http
POST https://plausible.io/api/v2/query
Authorization: Bearer your-api-key-here
Content-Type: application/json

{
  "site_id": "yoursite.com",
  "metrics": ["visitors", "pageviews", "bounce_rate"],
  "date_range": "30d",
  "dimensions": ["visit:source"]
}
```
The response comes back as JSON with a results array. Clean, predictable, easy to parse. Matomo takes a different approach — a single unified endpoint with method parameters: index.php?module=API&method=VisitsSummary.get&format=JSON. Same idea, different URL structure.
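Here’s a minimal Python sketch of that same Plausible request using only the standard library. It builds the request without sending it, then flattens the results array into one dict per row. I’m assuming each result entry carries metrics and dimensions lists in the same order you requested them, which is how I read the v2 docs; check against a real response before relying on it. The function names are my own.

```python
from __future__ import annotations

import json
import urllib.request

# Endpoint from the example above; a self-hosted Plausible uses your own host.
PLAUSIBLE_QUERY_URL = "https://plausible.io/api/v2/query"


def build_plausible_query(api_key: str, site_id: str, metrics: list[str],
                          date_range: str | list[str] = "30d",
                          dimensions: list[str] | None = None) -> urllib.request.Request:
    """Build (but don't send) a POST request for the v2 query endpoint."""
    body: dict = {"site_id": site_id, "metrics": metrics, "date_range": date_range}
    if dimensions:
        body["dimensions"] = dimensions
    return urllib.request.Request(
        PLAUSIBLE_QUERY_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"Authorization": f"Bearer {api_key}", "Content-Type": "application/json"},
        method="POST",
    )


def rows_from_response(payload: dict, metrics: list[str],
                       dimensions: list[str] | None = None) -> list[dict]:
    """Turn each entry in `results` into a dict keyed by dimension and metric names."""
    rows = []
    for result in payload.get("results", []):
        row = dict(zip(dimensions or [], result.get("dimensions", [])))
        row.update(zip(metrics, result.get("metrics", [])))
        rows.append(row)
    return rows
```

To send it, pass the request to urllib.request.urlopen and decode the response body with json.load.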
Pageview and session endpoints
Some platforms expose raw or lightly aggregated pageview/session data for a time window. Umami’s self-hosted API offers GET /api/websites/{websiteId}/stats for summary metrics and separate endpoints for pageview time-series data. These are useful when you need trend data at daily or hourly granularity rather than a single aggregated number.
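A small sketch of building that Umami stats URL. I’m assuming the startAt and endAt query parameters take Unix epoch milliseconds, as Umami’s API docs describe; verify against the version you run. Naive datetimes are treated as UTC here, which is a choice, not an Umami rule. The host and website ID are placeholders, and you still need to attach the auth header for your deployment.

```python
from datetime import datetime, timezone
from urllib.parse import urlencode


def to_epoch_ms(dt: datetime) -> int:
    """Convert a datetime to Unix epoch milliseconds, treating naive values as UTC."""
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=timezone.utc)
    return int(dt.timestamp() * 1000)


def umami_stats_url(base_url: str, website_id: str, start: datetime, end: datetime) -> str:
    """Build the summary-stats URL for one website and time window."""
    query = urlencode({"startAt": to_epoch_ms(start), "endAt": to_epoch_ms(end)})
    return f"{base_url.rstrip('/')}/api/websites/{website_id}/stats?{query}"
```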
Event endpoints
Two flavors: event query (retrieve events that match criteria) and event capture (submit new events). If you’re building first-party data tracking and want to capture server-side events without a browser script, the capture endpoint is what you’re looking for.
Real-time endpoints
Most “real-time” analytics APIs aren’t truly streaming — they’re polled. You hit a /realtime or /live endpoint every 30 or 60 seconds and display the latest snapshot. True streaming (WebSockets, Server-Sent Events, webhooks) exists in some event capture flows but is rare on the query side. Build your real-time dashboards around polling unless the platform specifically documents streaming support.
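A polling loop is simple enough to sketch generically. Here fetch and render are hypothetical callables you supply (fetch would hit your platform’s realtime endpoint, render would update the screen), and sleep is injectable so the loop can be tested without actually waiting.

```python
import time
from typing import Any, Callable, Optional


def poll(fetch: Callable[[], Any], render: Callable[[Any], None],
         interval_s: float = 30.0, max_polls: Optional[int] = None,
         sleep: Callable[[float], None] = time.sleep) -> int:
    """Fetch a snapshot, render it, wait, repeat. Returns how many polls ran."""
    polls = 0
    while max_polls is None or polls < max_polls:
        render(fetch())  # each call replaces the previous snapshot; nothing streams
        polls += 1
        if max_polls is None or polls < max_polls:
            sleep(interval_s)  # skip the pointless final sleep when a limit is set
    return polls
```

In production you’d wrap fetch in the same rate-limit handling as any other API call, since a 30-second loop runs 120 times an hour.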
Authentication: How It Works and What to Watch Out For
Auth is where most analytics API integrations first go wrong. The pattern you use depends on the platform and who owns the data.
API keys (most common for analytics)
You generate a key in the platform settings and send it with every request. The delivery mechanism varies:
- Bearer token (HTTP header): Authorization: Bearer <key>, used by Plausible and PostHog. Preferred: the key never appears in URLs or server logs.
- Custom header: Umami Cloud uses x-umami-api-key: <key>.
- Query parameter: Matomo uses token_auth=<token> in the URL. This works, but it means your token appears in web server access logs. To avoid this, send Matomo requests as POST and pass the token in the request body.
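The three delivery styles map to a few lines each. This sketch only builds headers and request bodies; the header names come from the list above, and the Matomo fields (module, method, format, token_auth) follow its reporting API. The helper names are hypothetical.

```python
from __future__ import annotations

from urllib.parse import urlencode


def bearer_headers(key: str) -> dict[str, str]:
    """Header auth in the Plausible/PostHog style."""
    return {"Authorization": f"Bearer {key}"}


def umami_cloud_headers(key: str) -> dict[str, str]:
    """Custom-header auth in the Umami Cloud style."""
    return {"x-umami-api-key": key}


def matomo_post_body(method: str, token: str, **params: str) -> bytes:
    """Form-encode a Matomo API call so token_auth travels in the body, not the URL."""
    fields = {"module": "API", "method": method, "format": "JSON", **params, "token_auth": token}
    return urlencode(fields).encode("ascii")
```

POST the Matomo body to your instance’s index.php (for example, a hypothetical https://matomo.example.com/index.php). When you pass data without a Content-Type, urllib sends it as application/x-www-form-urlencoded, which is what a form-encoded body needs.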
The rule that applies everywhere: keep API keys server-side. Never put an analytics API key in client-side JavaScript — it will be extracted and abused. Proxy the request through your own backend if you need to serve analytics data to a browser.
OAuth 2.0
Used when you’re querying data on behalf of a user who owns the analytics account — common with Google’s ecosystem (the GA4 Data API requires OAuth or a service account). OAuth is more complex to implement: you handle authorization flows, token exchange, token refresh. For server-to-server integrations where you control both sides, API keys are almost always simpler. Use OAuth when the platform requires it or when you’re building a multi-tenant app that accesses analytics on behalf of multiple users.
Token from login
Umami’s self-hosted version uses a different model: POST /api/auth/login with username and password returns a bearer token valid for a session. This is practical for internal tooling but adds the overhead of managing session token expiration and re-authentication.
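For the login-token model, a small cache keeps you from logging in on every request. This is a sketch under assumptions: login is a function you write that POSTs credentials to /api/auth/login and returns the token from the JSON response, and max_age_s is a lifetime you pick, since I’m not asserting a specific expiry for Umami tokens. Call invalidate() when a request comes back 401.

```python
import time
from typing import Callable, Optional


class TokenCache:
    """Cache a session token and call login() again once it is older than max_age_s."""

    def __init__(self, login: Callable[[], str], max_age_s: float,
                 clock: Callable[[], float] = time.monotonic):
        self._login = login
        self._max_age_s = max_age_s
        self._clock = clock
        self._token: Optional[str] = None
        self._issued_at = 0.0

    def get(self) -> str:
        now = self._clock()
        if self._token is None or now - self._issued_at >= self._max_age_s:
            self._token = self._login()  # re-authenticate
            self._issued_at = now
        return self._token

    def invalidate(self) -> None:
        """Force a fresh login on the next get(), e.g. after a 401."""
        self._token = None
```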
Rate Limits, Sampling, and Other Real-World Constraints
The documentation makes analytics APIs look clean. Production use is messier. Here are the constraints you’ll actually hit.
Rate limits
Every API has them. Plausible’s default is 600 requests per hour. When you exceed the limit, you get a 429 Too Many Requests response. The right handling pattern is exponential backoff with jitter — wait, then retry, then wait longer. Build this into any polling loop from the start; retrofitting it after you’ve hit rate limits in production is painful.
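Here’s one way to wrap a request in exponential backoff with full jitter. It assumes your request function raises the hypothetical RateLimited exception when it sees a 429; the base delay and cap are illustrative values, not anything a platform documents.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimited(Exception):
    """Raise this from your request function when the API returns HTTP 429."""


def with_backoff(request: Callable[[], T], max_retries: int = 5, base_s: float = 1.0,
                 cap_s: float = 60.0, sleep: Callable[[float], None] = time.sleep,
                 rng: Callable[[], float] = random.random) -> T:
    """Retry `request` on RateLimited using exponential backoff with full jitter."""
    for attempt in range(max_retries + 1):
        try:
            return request()
        except RateLimited:
            if attempt == max_retries:
                raise  # out of retries: let the caller see the 429
            ceiling = min(cap_s, base_s * 2 ** attempt)  # 1s, 2s, 4s, ... capped
            sleep(rng() * ceiling)  # full jitter: anywhere in [0, ceiling)
    raise AssertionError("unreachable")
```

Full jitter (a random delay between zero and the exponential ceiling) spreads retries out, so several workers that hit the limit together don’t all retry at the same instant.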
Pagination is the other dimension of rate management. APIs that return large datasets paginate their responses. Always check for pagination tokens or next links in responses and handle multi-page results.
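For limit/offset pagination, a generator keeps the calling code unaware of page boundaries. fetch_page is a hypothetical callable; with Plausible’s v2 API it would set a pagination object with limit and offset in the request body, which is my reading of its docs. The loop stops when a page comes back shorter than the limit. Cursor-based APIs need a variant that follows the next token instead.

```python
from typing import Callable, Iterator, List


def paginate(fetch_page: Callable[[int, int], List[dict]], limit: int = 1000) -> Iterator[dict]:
    """Yield rows from a limit/offset API until a short (or empty) page signals the end."""
    offset = 0
    while True:
        page = fetch_page(limit, offset)
        yield from page
        if len(page) < limit:
            return
        offset += limit
```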
Data sampling
High-traffic properties on some platforms return sampled data in aggregate reports — meaning the API returns a statistically representative subset rather than processing every event. Responses usually include a flag indicating whether sampling was applied. Check for it and surface it in your UI if the accuracy of the number matters to whoever’s reading it.
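How the flag looks varies by platform. As one example, GA4’s Data API reports sampling through a samplingMetadatas list in the response metadata, with samplesReadCount and samplingSpaceSize per date range; treat those field names as something to confirm against Google’s reference before you depend on them. This sketch returns the smallest sampled share, or None when the report wasn’t sampled.

```python
from typing import Optional


def sampling_ratio(report: dict) -> Optional[float]:
    """Smallest share of events read across date ranges, or None if unsampled."""
    entries = (report.get("metadata") or {}).get("samplingMetadatas") or []
    ratios = []
    for entry in entries:
        space = int(entry.get("samplingSpaceSize", 0))  # int64 values may arrive as strings
        if space:
            ratios.append(int(entry.get("samplesReadCount", 0)) / space)
    return min(ratios) if ratios else None
```

A non-None result is your cue to label the number in the UI, for example "based on 25% of events".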
Open-source self-hosted tools like Matomo and Plausible give you more control here — on your own infrastructure, you can query against unsampled data.
Data freshness and latency
Analytics data is rarely real-time even when the API is called “real-time.” Typical latencies: event capture to reporting availability runs from a few seconds to several hours depending on the platform and the report type. Day-old data showing up in “yesterday” reports is common. Build your integrations with this in mind — don’t alarm on a zero-value API response when the more likely explanation is data hasn’t been processed yet.
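A guard like this keeps alerts from firing on data that simply hasn’t landed yet. The processing_lag_days default of 1 is an assumption; set it from what you actually observe on your platform.

```python
from datetime import date


def should_alert_on_zero(report_day: date, value: float, today: date,
                         processing_lag_days: int = 1) -> bool:
    """Flag a zero only once the day is older than the platform's processing lag."""
    if value != 0:
        return False
    return (today - report_day).days > processing_lag_days
```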
Response validation
API responses have schema shifts — fields get renamed, new optional fields appear, nulls show up where you expected a number. Write defensive parsers. Validate the shape of what comes back, handle nulls explicitly, and log unexpected response structures. The analytics data validation patterns you use on your tracking layer apply equally here.
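A defensive parser for a results-array response might look like this. It assumes the Plausible v2 style shape (results entries with a metrics list) and logs, rather than raises, on anything unexpected; whether to skip or fail on a malformed row is a judgment call for your pipeline.

```python
import logging
from typing import Dict, List, Optional

log = logging.getLogger("analytics.api")


def parse_metric_rows(payload: object, metrics: List[str]) -> List[Dict[str, Optional[float]]]:
    """Check the response shape, then coerce each metric to float, keeping nulls as None."""
    if not isinstance(payload, dict) or not isinstance(payload.get("results"), list):
        log.warning("unexpected response shape: %.200r", payload)
        return []
    rows = []
    for result in payload["results"]:
        values = result.get("metrics") if isinstance(result, dict) else None
        if not isinstance(values, list) or len(values) != len(metrics):
            log.warning("skipping malformed row: %.200r", result)
            continue
        row: Dict[str, Optional[float]] = {}
        for name, value in zip(metrics, values):
            if isinstance(value, (int, float)) and not isinstance(value, bool):
                row[name] = float(value)
            else:
                if value is not None:
                    log.warning("non-numeric value for %s: %.200r", name, value)
                row[name] = None  # explicit None, never a silent 0
        rows.append(row)
    return rows
```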
Real-World Use Cases
Here’s what this looks like in practice, in a few common scenarios I’ve seen across dozens of implementations.
Client reporting portal
An agency runs Matomo self-hosted for client sites. They build a simple Next.js dashboard that queries VisitsSummary.get and Referrers.getAll on a per-client basis and renders the numbers in the client’s brand colors. No login to Matomo required; the client sees their numbers in a clean UI. The API key is stored server-side; the Next.js API route proxies the Matomo query. Total build time: one weekend.
Weekly Slack digest
A SaaS team uses Plausible for their marketing site. A Python script runs every Monday at 8am via cron: it hits POST /api/v2/query with the previous week’s date range, formats the top-5 pages by visitors and the week-over-week change, and posts to a Slack webhook. The whole thing is 80 lines of Python. No one has to remember to check the dashboard — the digest comes to them.
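The date math and message formatting are the fiddly parts of a digest like that, so here’s a sketch of just those. "Previous week" is taken as Monday to Sunday; its ISO dates can go into a Plausible date_range as a two-element list. The {"text": ...} body is the basic Slack incoming-webhook format. The function names are hypothetical, and fetching the numbers is left out.

```python
from __future__ import annotations

import json
from datetime import date, timedelta


def previous_week(today: date) -> tuple[date, date]:
    """Monday-to-Sunday range of the week before the one containing `today`."""
    this_monday = today - timedelta(days=today.weekday())
    start = this_monday - timedelta(days=7)
    return start, start + timedelta(days=6)


def pct_change(current: float, previous: float) -> float | None:
    """Week-over-week change in percent; None when there is no baseline."""
    return None if previous == 0 else (current - previous) / previous * 100


def slack_payload(top_pages: list[tuple[str, int]], visitors: int, prev_visitors: int) -> str:
    """JSON body for a Slack incoming webhook: visitor total, WoW change, top 5 pages."""
    change = pct_change(visitors, prev_visitors)
    change_text = "n/a" if change is None else f"{change:+.1f}%"
    lines = [f"Visitors last week: {visitors:,} ({change_text} WoW)", "Top pages:"]
    lines += [f"{rank}. {page} ({count:,})"
              for rank, (page, count) in enumerate(top_pages[:5], start=1)]
    return json.dumps({"text": "\n".join(lines)})
```

POST the returned string to your webhook URL with a Content-Type of application/json.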
Data warehouse sync
A product team wants to join their analytics data with subscription revenue to calculate revenue-per-acquisition by channel. They run a nightly job that pulls 7 days of Plausible data (with 3-day overlap for late-arriving data), upserts it into a Postgres table, and the BI tool joins it against the Stripe export. The overlap window handles the latency issue — data from two days ago sometimes gets updated as late-arriving events are processed.
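The upsert is what makes an overlap window safe: re-pulling days you already loaded updates those rows instead of duplicating them. This sketch uses SQLite from the standard library so it runs anywhere (ON CONFLICT ... DO UPDATE needs SQLite 3.24 or newer); Postgres accepts the same ON CONFLICT (...) DO UPDATE SET ... = EXCLUDED... form. The table and column names are hypothetical.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS daily_source_stats (
    day TEXT NOT NULL,
    source TEXT NOT NULL,
    visitors INTEGER NOT NULL,
    PRIMARY KEY (day, source)
)
"""

# A row for an existing (day, source) overwrites the old value instead of adding a duplicate.
UPSERT = """
INSERT INTO daily_source_stats (day, source, visitors)
VALUES (?, ?, ?)
ON CONFLICT (day, source) DO UPDATE SET visitors = excluded.visitors
"""


def upsert_rows(conn: sqlite3.Connection, rows) -> None:
    """Load (day, source, visitors) tuples; rerunning an overlapping window is idempotent."""
    with conn:  # commits on success, rolls back on error
        conn.execute(SCHEMA)
        conn.executemany(UPSERT, rows)
```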
Server-side event capture
An e-commerce site sends purchase events server-side via the Umami API after payment confirmation, bypassing ad blockers and ensuring every conversion is captured regardless of browser cookie state. This pairs well with a privacy-first data layer that separates collection from storage.
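A sketch of that server-side request, assuming Umami’s /api/send endpoint and its type/payload body. The field names reflect my reading of Umami’s docs, so confirm them for your version. Umami also expects a realistic User-Agent header, which is why it’s a required argument here. The host, website ID, and event fields are placeholders.

```python
import json
import urllib.request


def umami_event_request(base_url: str, website_id: str, hostname: str, url: str,
                        name: str, data: dict, user_agent: str) -> urllib.request.Request:
    """Build (but don't send) a server-side custom event for Umami's /api/send."""
    body = {
        "type": "event",
        "payload": {
            "website": website_id,  # website UUID from Umami's settings
            "hostname": hostname,
            "url": url,
            "name": name,           # custom event name, e.g. "purchase"
            "data": data,           # event properties
        },
    }
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/api/send",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "User-Agent": user_agent},
        method="POST",
    )
```

Send it only after the payment provider confirms the charge, so a failed payment never shows up as a conversion.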
Choosing the Right API for Your Stack
The honest answer: start with what you’re already using. If you’re on Plausible, the Plausible Stats API is clean and well-documented. If you’re on Matomo, the Matomo Reporting API covers essentially every metric the platform tracks. If you’re on Umami, the self-hosted API requires a login token; the cloud version uses a dedicated key.
If you’re not yet committed to a platform and the API matters to your use case — and it should, because sooner or later you’ll want to pull data programmatically — here’s what to evaluate:
- Can you self-host? Self-hosted means no per-event costs, no sampling on large datasets, and full data ownership. Plausible, Matomo, and Umami are all open source.
- What’s the rate limit? 600 req/hour (Plausible default) is plenty for most reporting use cases. High-frequency polling for real-time dashboards needs a platform that either has a higher limit or a true streaming option.
- What does the response schema look like? A POST body with explicit metrics and dimensions (Plausible v2) is easier to work with than method and filter parameters packed into a query string (Matomo). Both work; one is more readable.
- Is auth server-side only? Good APIs make it impossible or obviously wrong to use the key client-side. Check the docs before you assume.
GA4’s Data API is available if you’re locked into Google’s ecosystem — but it requires OAuth or a service account, has its own sampling quirks, and doesn’t move you toward data independence. If you’re building something new, the open-source options give you more control at every layer.
What to Build First
If you’ve never integrated an analytics API before, don’t start with a data warehouse pipeline. Start with a simple automated report.
Pick your platform. Generate an API key. Write a script that makes one API call, parses the response, and prints a formatted summary to the console. Run it. Then make it email or Slack the output on a schedule. That’s a working analytics API integration, and it delivers value immediately.
From there, the path to custom dashboards, data pipelines, and multi-platform aggregation is incremental. Each step adds one layer of complexity. The fundamentals — auth headers, JSON parsing, rate limit handling, response validation — are the same across all of them.
In my experience, teams that build even a single working API integration change how they think about analytics data. It stops being something you check in a dashboard and becomes something you can compose, transform, and route wherever it’s useful. That shift in mental model is worth more than any specific feature the API exposes.
Written by Alicia Bennett
Lead Web Analyst based in Toronto with 12+ years in digital analytics. Specializing in privacy-first tracking, open-source tools, and making data meaningful.