Building an SEO Intelligence Platform

Overview

  • What I built: A custom SEO intelligence platform. MCP server with DuckDB backend, Next.js frontend, and API integrations across six major data sources.
  • Development timeline: 8 weeks from first commit to production use (February to April 2026).
  • Platform scale: 109 automated tools, 22 structured reporting prompts, 159 custom UI components, 370 unit tests, 69 end-to-end tests.
  • Business impact: Reduced enterprise audit deliverable time from 40 to 60 hours down to 10 hours, a 75 to 83 percent reduction without sacrificing analytical depth or recommendation volume.
  • My role: Sole architect, project manager, and QA lead. Planned the system architecture, wrote all project plans and user stories, directed an AI coding agent (Claude Code) to execute the build, and tested and verified every output.

The problem

I joined an enterprise SEO consultancy where the pace of client delivery was fundamentally different from what I was accustomed to. In previous roles, a comprehensive technical SEO audit with strategic recommendations typically required 40 to 60 hours of work: exporting data from Screaming Frog, Google Search Console, GA4, Ahrefs, and SEMrush individually, cross-referencing URLs across spreadsheets to identify patterns, manually calculating page-level scores and template-level rollups, and writing everything into a client-ready deliverable.

At this consultancy, I had roughly one-quarter of that time: 10 to 15 hours per deliverable, while managing multiple enterprise client accounts simultaneously, each with 10,000 to 500,000 URLs. The expectation was faster turnaround. The standard industry response to that constraint is to sacrifice depth: run a shallower crawl, skip the cross-referencing, and deliver a shorter report with fewer recommendations.

I chose a different approach. Instead of cutting the analysis, I built a system that could perform the data unification and pattern detection automatically, freeing my time for the work that actually requires human judgment: interpreting patterns, prioritizing recommendations, and translating technical findings into business-impact language for client stakeholders.

The solution

Architecture

I designed a platform built on three layers.

Data layer: A DuckDB analytical database that ingests and normalizes data from six sources: Screaming Frog crawl exports, Google Search Console, GA4, Ahrefs (via its API), SEMrush, and XML sitemaps plus robots.txt. URL normalization happens at import time, so every data source speaks the same language. Bulk SQL ingestion handles scale efficiently: 111,000 rows in approximately one second, versus two or more minutes with row-by-row processing.

Intelligence layer: 109 MCP (Model Context Protocol) tools that perform specific analytical functions. These range from foundational operations (import data, create clients, list crawls) to sophisticated cross-source analysis:

  • A 7-CTE page scoring engine that calculates composite SEO value per URL using log-normalized metrics across current performance and growth potential.
  • Traffic-weighted issue prioritization that blends Screaming Frog technical issues with analytics data so a missing H1 on a page with 50,000 sessions ranks higher than the same issue on a zero-traffic page.
  • Template classification and directory-level scoring rollups that reveal which page types are underperforming systematically.
  • URL coverage analysis that cross-references what the crawler found against what GSC, GA4, and backlink data say exists, exposing the gap between what you have audited and what actually matters.
  • Keyword cannibalization detection, content gap analysis, and quick-win identification.
  • A full Ahrefs backlink audit SOP executable in a single command.
  • AI citation share-of-voice tracking and GEO readiness scoring.
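The traffic-weighted prioritization described above can be sketched in a few lines. The scoring formula, weights, and sample data here are assumptions for illustration, not the platform's actual blend; the shape of the idea is that technical severity is multiplied by log-scaled traffic so high-traffic pages rise without one mega-page drowning out everything else.

```python
import math
from dataclasses import dataclass

@dataclass
class Issue:
    url: str
    name: str
    severity: float  # 0-1 technical severity from the crawler
    sessions: int    # GA4 sessions joined on the normalized URL

def priority(issue: Issue) -> float:
    # Blend technical severity with log-scaled traffic (illustrative
    # weighting, not the platform's production formula).
    return issue.severity * (1 + math.log10(1 + issue.sessions))

issues = [
    Issue("https://example.com/pricing",   "missing H1", 0.6, 50_000),
    Issue("https://example.com/old-promo", "missing H1", 0.6, 0),
]
ranked = sorted(issues, key=priority, reverse=True)
```

The same missing-H1 issue now scores very differently on the 50,000-session page than on the zero-traffic page, which is exactly the reordering the text describes.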

Presentation layer: A Next.js frontend with 26 routes, interactive dashboards, filterable data tables, chart visualizations, an AI chat interface with 22 structured reporting prompts, and a deliverable export system. The frontend was rebuilt from scratch in version 2 across 14 sprints (258 story points).

How I built it

This project is itself a demonstration of the AI-augmented workflow I advocate for. I served as the architect, project manager, and QA lead. I identified the problem, designed the data model and tool architecture, wrote detailed project plans and user stories for each feature, and directed Claude Code (an AI coding agent) to execute the implementation. Every tool, component, and test was verified through my own testing and user acceptance process.

The deliverable: what 10 hours produces

For a B2B SaaS client preparing for a site migration, I produced a strategic recommendations document covering six analytical pillars across approximately 2,800 indexed pages and 787,000 annual organic clicks.

Data-driven page prioritization. The system cross-referenced GSC impressions, click-through rates, ranking positions, GA4 engagement rates, and AI embedding cluster centrality to produce tiered priority lists. For example, it identified 15 pages with the highest impression volume that sat in striking distance (positions 4 to 15) with CTR below 2 percent.
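A striking-distance filter like the one described can be sketched directly from GSC-style rows. The sample rows and function name are hypothetical; the thresholds (positions 4 to 15, CTR below 2 percent) come from the text.

```python
# Hypothetical Google Search Console rows: (url, impressions, clicks, avg_position).
gsc_rows = [
    ("/pricing",      120_000, 1_800, 6.2),
    ("/blog/guide",    80_000, 3_400, 3.1),
    ("/integrations",  95_000, 1_100, 9.8),
]

def striking_distance(rows, min_pos=4.0, max_pos=15.0, max_ctr=0.02):
    """Pages ranking just off the top spots with a weak CTR."""
    hits = []
    for url, impressions, clicks, position in rows:
        ctr = clicks / impressions if impressions else 0.0
        if min_pos <= position <= max_pos and ctr < max_ctr:
            hits.append((url, impressions, ctr, position))
    # Surface the largest impression volumes first.
    return sorted(hits, key=lambda row: row[1], reverse=True)

opportunities = striking_distance(gsc_rows)
```

Ranking the survivors by impressions is what turns a raw metric dump into a tiered priority list.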

AI embedding analysis. Using embeddings from the crawl data, the system identified 37 natural content clusters, 111 near-duplicate page pairs (95 percent or higher cosine similarity), and 277 cannibalization risks where semantically identical pages competed against each other.
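The near-duplicate detection reduces to pairwise cosine similarity over page embeddings. The toy three-dimensional vectors and URLs below are stand-ins for real embedding vectors; the 0.95 threshold is the one stated in the text.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

# Toy vectors standing in for real page embeddings.
pages = {
    "/pricing":     [0.90, 0.10, 0.00],
    "/pricing-uk":  [0.88, 0.12, 0.01],
    "/blog/hiring": [0.05, 0.20, 0.95],
}

urls = list(pages)
near_duplicates = [
    (urls[i], urls[j])
    for i in range(len(urls))
    for j in range(i + 1, len(urls))
    if cosine(pages[urls[i]], pages[urls[j]]) >= 0.95  # threshold from the audit
]
```

Pairs above the threshold are candidate cannibalization risks: semantically near-identical pages that may compete for the same queries.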

Schema markup roadmap, internal linking strategy, LLM optimization, and a phased priority action matrix rounded out the deliverable. The split was approximately 60 percent system-generated analysis and 40 percent strategic decisions, editorial judgment, and client-specific recommendations.

This deliverable took 10 hours. Without the platform, the same depth of analysis would have required 40 to 60 hours.

What the platform revealed

Coverage gaps are invisible without cross-referencing. On one client engagement, the standard Screaming Frog crawl captured 140 pages. The platform’s coverage analysis tool revealed that 87 percent of all user sessions were happening on pages the crawler never reached. Without the cross-reference, the audit would have covered 0.2 percent of the pages that actually matter.
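The coverage cross-reference is, at its core, a set difference between what the crawler saw and the union of URLs every other source knows about. The URL sets and session counts below are invented for illustration; the real analysis runs over the normalized tables in DuckDB.

```python
# Hypothetical URL sets from three sources, already normalized.
crawled      = {"/", "/pricing", "/about"}
gsc_urls     = {"/", "/pricing", "/docs/api", "/docs/auth"}
ga4_sessions = {"/": 1_000, "/pricing": 3_000, "/docs/api": 40_000, "/docs/auth": 12_000}

# Every URL any source knows about, minus what the crawler reached.
known     = crawled | gsc_urls | set(ga4_sessions)
uncrawled = known - crawled

# Share of real user sessions happening on pages the crawl never saw.
total_sessions  = sum(ga4_sessions.values())
missed_sessions = sum(s for url, s in ga4_sessions.items() if url in uncrawled)
missed_share    = missed_sessions / total_sessions
```

A high missed share is the signal that the crawl scope, not the analysis, is the weak link in the audit.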

Template-level patterns are invisible at the page level. The platform’s template classification revealed systemic issues. For instance, 91 percent of all content quality issues on one site were caused by a single page template. One template fix resolved 350 issues simultaneously.
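A template rollup can be sketched as a classify-then-count pass. The path-prefix classifier below is a deliberately crude stand-in; the platform's actual template classification is richer than first-path-segment matching.

```python
from collections import Counter
from urllib.parse import urlsplit

def template_of(url: str) -> str:
    # Hypothetical classifier: key each URL off its first path segment.
    segments = urlsplit(url).path.strip("/").split("/")
    return segments[0] or "homepage"

# Page-level issues look scattered until they are rolled up by template.
issues = [
    ("https://example.com/products/a", "thin content"),
    ("https://example.com/products/b", "thin content"),
    ("https://example.com/products/c", "thin content"),
    ("https://example.com/blog/x",     "missing meta description"),
]
by_template = Counter(template_of(url) for url, _ in issues)
worst_template, issue_count = by_template.most_common(1)[0]
```

Once issues concentrate under one template, a single template fix resolves every page-level instance at once.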

Traffic weighting changes every priority list. A missing H1 tag appears on hundreds of pages in any large crawl. The platform’s traffic-weighted issue scoring surfaced the 15 missing-H1 pages that collectively accounted for the majority of the site’s organic traffic.

Key takeaways

1. The constraint was a design opportunity, not a limitation. Faster turnarounds did not require shallower analysis. They required separating the work that machines do well from the work that requires human judgment.

2. AI-directed development is a legitimate engineering methodology. I did not write the code line by line. I architected the system, wrote detailed project plans and user stories, directed an AI coding agent to implement them, and QA’d every output. The skill is in the planning, the problem decomposition, and the quality judgment, not in typing syntax.

3. Cross-source analysis is where the real insights live. No single SEO tool tells the full story. The most valuable outputs are the ones that cross-reference sources.

4. The platform is the portfolio piece. For an SEO professional, building a 109-tool intelligence platform that integrates six data sources, includes AI citation tracking, and produces comprehensive audit deliverables in a quarter to a sixth of the time is the most concrete possible demonstration of technical depth and the ability to leverage AI for real business outcomes.

Built with: Python, FastMCP 2.x, DuckDB, Next.js 16, React 19, TanStack Query v5, Ahrefs API v3, Claude Code, Playwright, Vitest.