SEO

How to Build an SEO Reporting API Workflow

Alicia Bennett
· · 10 min read
seo reporting api

Most SEO teams are still copying data out of dashboards by hand. Search Console in one tab, your analytics platform in another, a spreadsheet somewhere in the middle — and a report that takes two hours every Monday morning. There’s a better way: build a SEO reporting API workflow that pulls all the data automatically, combines it into a single view, and delivers it wherever your stakeholders actually look.

In this guide I’ll show you exactly how to do that — from authenticating with the Google Search Console API and a privacy-first analytics API like Plausible or Matomo, to scheduling the pulls, joining the datasets, and pushing results to a dashboard or spreadsheet. I’ll include real request examples throughout. If you’ve already built a general automation pipeline, this is the SEO-specific layer that sits on top — think of it as a companion to our guide on automated reporting workflows.

What You Need Before You Start

Before writing a single line of code, collect three things:

  • A Google Search Console property with verified ownership (or a service account with delegated access)
  • An analytics platform that exposes a reporting API — Plausible, Matomo, or another privacy-first tool
  • A destination for your report — Google Sheets, a Postgres table, a BI tool like Metabase, or a simple HTML file sent by email

I’ve seen teams skip the destination question until the end and regret it. Know where the data is going before you design the schema. The shape of your output determines which fields you actually need to request.

Step 1: Authenticate with the Google Search Console API

The Google Search Console API uses OAuth 2.0. For an automated (unattended) workflow the right approach is a service account — you generate a JSON key file, grant the service account access to your GSC property, and your script authenticates without any human in the loop.

In the Google Cloud Console: create a service account under your project, download the JSON credentials, then go to Search Console → Settings → Users and permissions and add the service account email as a restricted user.

Here’s a minimal Python authentication block using the official client library:

from google.oauth2 import service_account
from googleapiclient.discovery import build

SCOPES = ['https://www.googleapis.com/auth/webmasters.readonly']

credentials = service_account.Credentials.from_service_account_file(
    '/path/to/service-account.json',
    scopes=SCOPES
)
gsc = build('searchconsole', 'v1', credentials=credentials)

Keep the JSON key file out of version control. Store it as an environment variable or a secrets manager entry. Never let it appear in client-side code — service account credentials are effectively permanent until rotated.

Step 2: Pull Search Performance Data

The core endpoint is searchanalytics.query. You POST a JSON body specifying your property, date range, dimensions, and row limit. Here’s a request that fetches the last 28 days of clicks, impressions, position, and CTR broken down by page and query:

response = gsc.searchanalytics().query(
    siteUrl='https://yoursite.com/',
    body={
        'startDate': '2026-05-16',
        'endDate':   '2026-06-12',
        'dimensions': ['page', 'query'],
        'rowLimit': 25000,
        'dataState': 'final'
    }
).execute()

rows = response.get('rows', [])
# Each row: { keys: [page, query], clicks, impressions, ctr, position }
for row in rows:
    page, query = row['keys']
    print(page, query, row['clicks'], row['position'])

A few things worth knowing: the API lags 2-3 days behind real time, so set your end date accordingly. The default row limit is 1,000; 25,000 is the maximum per request. For larger properties you’ll need to paginate by incrementing startRow. Use 'dataState': 'final' to avoid getting preliminary numbers that will shift later.

SEO Reporting API Data Flow

Step 3: Pull Traffic Data from Your Analytics API

Search Console tells you about impressions and clicks from Google. It doesn’t tell you what happens after the click. For that you need your analytics platform.

Plausible Stats API

The Plausible Stats API uses a Bearer token. Generate a key from your Plausible account settings, then POST to /api/v2/query:

curl -X POST https://plausible.io/api/v2/query \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "site_id": "yoursite.com",
    "metrics": ["visitors", "pageviews", "bounce_rate", "visit_duration"],
    "date_range": ["2026-05-16", "2026-06-12"],
    "dimensions": ["event:page"],
    "filters": [["is", "visit:source", ["Google"]]]
  }'

The filter on visit:source scopes results to organic search traffic, so when you join this to your GSC data you’re comparing apples to apples. Plausible enforces a rate limit of 600 requests per hour — more than enough for a scheduled daily job.

Matomo Reporting API

Matomo uses a token_auth parameter. For self-hosted instances, every request hits a single endpoint: index.php. POST rather than GET — with GET, the token appears in server logs.

curl -X POST https://your-matomo-instance.com/index.php \
  --data-urlencode "module=API" \
  --data-urlencode "method=Actions.getPageUrls" \
  --data-urlencode "idSite=1" \
  --data-urlencode "period=range" \
  --data-urlencode "date=2026-05-16,2026-06-12" \
  --data-urlencode "segment=referrerType==search" \
  --data-urlencode "format=JSON" \
  --data-urlencode "token_auth=YOUR_TOKEN_AUTH"

The segment=referrerType==search filter isolates organic search sessions, mirroring what you’d get from Plausible’s source filter. Matomo’s API is exceptionally flexible — almost every report in the UI has a corresponding API method.

Step 4: Join the Datasets

You now have two tables: GSC (page + query → clicks, impressions, position) and analytics (page → sessions, bounce rate). The join key is the page URL, but watch out for trailing slash mismatches and protocol differences (http vs https). Normalize before joining.

In Python with pandas:

import pandas as pd

# gsc_df columns: page, query, clicks, impressions, ctr, position
# analytics_df columns: page, visitors, bounce_rate

# Normalize URLs
gsc_df['page'] = gsc_df['page'].str.rstrip('/').str.lower()
analytics_df['page'] = analytics_df['page'].str.rstrip('/').str.lower()

# Aggregate GSC to page level (drop query dimension)
gsc_pages = gsc_df.groupby('page').agg(
    clicks=('clicks', 'sum'),
    impressions=('impressions', 'sum'),
    avg_position=('position', 'mean')
).reset_index()

# Join
combined = gsc_pages.merge(analytics_df, on='page', how='left')

The left join keeps all GSC pages even when analytics has no matching record — useful for flagging pages that get clicks but generate no measurable traffic (possible tracking gaps). This kind of data reconciliation is covered in more depth in our article on semantic SEO analytics and user intent, which talks about interpreting engagement signals alongside search data.

One column worth computing immediately: click-to-session ratio. If GSC reports 500 clicks to a page but analytics records 100 sessions, something is wrong — broken tracking, a redirect loop, or a bot filter discrepancy. Catching this in the join step saves you from presenting bad numbers to stakeholders.

Joining GSC and Analytics Data

Step 5: Handle Rate Limits and Errors Gracefully

Automated workflows break silently more often than they fail loudly. Rate limits are the most common cause. Both the GSC API and analytics APIs will return a 429 Too Many Requests response when you exceed your quota. The correct response is exponential backoff — wait, then retry, with an increasing delay each time.

Here’s a simple retry wrapper:

import time

def api_call_with_backoff(fn, max_retries=5):
    for attempt in range(max_retries):
        try:
            return fn()
        except Exception as e:
            if '429' in str(e) or 'rate' in str(e).lower():
                wait = 2 ** attempt  # 1, 2, 4, 8, 16 seconds
                print(f"Rate limited. Waiting {wait}s...")
                time.sleep(wait)
            else:
                raise
    raise RuntimeError("Max retries exceeded")

Beyond rate limits: always validate the shape of the response before processing it. Check that expected keys exist, handle empty rows arrays (a valid response for a date range with no data), and log any unexpected status codes to a file rather than letting them crash silently. I’ve seen dozens of SEO reporting pipelines that looked fine in testing and fell apart on the first Monday morning run because nobody handled the edge cases.

If your pages have slow load times or poor Core Web Vitals scores, that affects rankings directly — and it’s worth pulling that data into the same report. Our Core Web Vitals measurement guide covers how to pull CWV data programmatically via the PageSpeed Insights API, which you can fold into this same workflow.

Step 6: Schedule the Workflow with Cron

A report that runs once is a script. A report that runs every day without you touching it is a workflow. On Linux servers, cron is the standard scheduler. On cloud platforms, you have equivalents like GitHub Actions scheduled workflows, AWS EventBridge, or Cloud Scheduler.

A cron entry that runs your SEO report script every day at 6 AM:

# crontab -e
0 6 * * * /usr/bin/python3 /opt/seo-report/run.py >> /var/log/seo-report.log 2>&1

A few scheduling rules I always follow:

  • Run data pulls in off-peak hours — not just to avoid rate limits, but because APIs like GSC are sometimes slower during business hours
  • Log stdout and stderr to a file (>> logfile 2>&1) so you can debug failures after the fact
  • Send yourself a brief email or Slack message on failure — silence is not a success signal
  • Use a lock file or process check to prevent overlapping runs if the previous job is still running

For a more robust setup, GitHub Actions scheduled workflows give you logging, retry handling, and notifications out of the box — without managing a server.

Step 7: Push Results to a Dashboard or Sheet

Data sitting in a database helps nobody. Get it in front of stakeholders.

Google Sheets is the lowest-friction destination. Use the gspread Python library with a service account (the same one you used for GSC) to write your combined dataframe directly to a named sheet:

import gspread
from google.oauth2 import service_account

creds = service_account.Credentials.from_service_account_file(
    '/path/to/service-account.json',
    scopes=[
        'https://www.googleapis.com/auth/spreadsheets',
        'https://www.googleapis.com/auth/drive'
    ]
)
gc = gspread.authorize(creds)
sheet = gc.open("SEO Weekly Report").worksheet("Data")

# Write header + rows
sheet.clear()
sheet.append_row(combined.columns.tolist())
sheet.append_rows(combined.values.tolist())

For teams already using a BI tool like Metabase, Grafana, or Redash, writing to a Postgres table is often cleaner — stakeholders query the live table, and you just keep it refreshed. Either way, include a report_date column so it’s always clear when the data was last updated.

Key Endpoints at a Glance

Source Endpoint Auth Method Key Parameters
Google Search Console POST /v1/sites/{site}/searchAnalytics/query OAuth 2.0 / Service Account startDate, endDate, dimensions, rowLimit
Plausible POST /api/v2/query Bearer token (header) site_id, metrics, date_range, dimensions, filters
Matomo POST /index.php token_auth parameter module=API, method, idSite, period, segment
PageSpeed Insights GET /pagespeedonline/v5/runPagespeed API key (query param) url, strategy, key
Google Sheets Sheets API v4 OAuth 2.0 / Service Account spreadsheetId, range, valueInputOption

Putting It All Together

A complete SEO reporting API workflow looks like this in practice: a cron job fires at 6 AM, authenticates with a service account, pulls 28 days of GSC data for your top 5,000 pages, pulls matching traffic and engagement data from Plausible or Matomo, normalizes and joins the two datasets, computes derived metrics (click-to-session ratio, average position per page, traffic trend), and writes the result to a Google Sheet that your team has bookmarked. Total runtime: under three minutes for most sites.

The setup takes a few hours the first time. After that it runs itself, and your Monday morning frees up. Start with a single data source — GSC alone is already more useful than most manual SEO reports — then layer in analytics data once that first pull is stable. Incremental is better than a big-bang build that breaks before anyone sees it.

If you’re building this for the first time, the hardest part isn’t the code — it’s agreeing on which metrics matter. That conversation is worth having before you write the first query.

Written by Alicia Bennett

Lead Web Analyst based in Toronto with 12+ years in digital analytics. Specializing in privacy-first tracking, open-source tools, and making data meaningful.

More about Alicia →

Related Articles