Case Study
20 min read · Updated May 5, 2026
An AI‑Native Portfolio
Built brick by brick, just like my FYP
A recursive artifact: a case study about building the site that hosts it, with Claude Code ↗ as build partner. Multiple production incidents, several architecture bets, and a deliberate division of labor between human direction and agent execution. What follows is the case study version, not the changelog. Twenty minutes now, hours saved later. Probably not for the recruiter, ironically (sorry). For the next person pair-programming with an agent.
01 · The Brief
Three jobs, seven days.
A recruiter-facing portfolio surface that did three things at once, shipped in a hard timebox while interviewing several days a week. The constraint was time, not scope.
Job 01
Resume in 30 seconds
Tech PM hiring is a scanning game. The resume is the primary artifact, and nothing should compete with it for attention above the fold on the landing page.
Job 02
Show, don't tell
A live, well-built site signals shipping ability more than any deck. What you're viewing is the proof.
Job 03
Re-introduce my creative side
A centralized home for my playlists, reviews, and other creative work. Some may come to my site just for this, but it operates in concert with my professional work.
This site has a wide scope, and more is coming soon. I deliberately scoped the MVP down to five pages—Landing, Resume, About, Contact, Music—with everything else I'm working on gated behind a no-public-placeholders rule. Day 4 was my deadline for a shippable MVP; Days 5 through 7 were for polish and stretch goals.
How I used Claude
02 · The Workflow
claude code · build partner
Where I drove, where Claude drove.
Most of the code is the agent's. Most of the decisions are mine. The interesting part—and the part recruiters keep asking about—is what that division of labor actually looks like in practice.
The human in the loop
Where I drove
Information architecture and brand voice. Decision-quality reviews when the agent picked a fork I'd have picked differently. Editorial taste on the user-facing product. Constraint discipline (e.g. no placeholders, platform links never show handles, etc.).
The agent
Where Claude drove
Implementation: React, Tailwind, the typography primitives, the layout system, the Spotify client, the Calendly embed, the token-pipeline build script. Defensive engineering once instructed. Inline comments dense enough that a non-technical reader can navigate the code.
The blind spot
Where neither of us was great
Visual design creation. Iconography, layout proportions, the feel of a UI choice—anywhere taste matters more than logic. We went through four rounds of phone-icon variants, none landing, before the call was to pick a stock Heroicons handset and move on. The agent can't develop taste; a PM can't always articulate it. Even when provided design references, the agent struggled to create a visually pleasing icon.
“Plan in conversation, build in code.”
Most sessions started with me writing or referencing the plan (PLAN.md), the agent and I aligning on what to build, then the agent shipping a working iteration in 15 to 45 minutes. I'd review, redirect, and redline. The total build time was a fraction of what solo-coding would have taken; the cognitive load was almost entirely on direction and judgment, which is the part of the work that I specialize in (and that adds value to AI-native product teams).
How I used Claude
The pattern is durable because of context, not magic. CLAUDE.md at the repo root carries the working scenario, brand voice, and stakeholders. Project memory persists decisions and voice rules across sessions. Five custom sub-agents bring lens-specific critique without re-priming context:
growth-product-strategist—activation, engagement, and lifecycle-loop critique on product decisions
design-reviewer—typography rhythm, spacing, color, and sub-brand cohesion against rendered surfaces
code-reviewer—type safety, naming, dead code, async correctness, and maintainability footguns
a11y-reviewer—WCAG 2.2 AA across semantic HTML, keyboard navigation, contrast, and reduced-motion
seo-expert—technical SEO, structured data, and AI-search readiness (llms.txt, JSON-LD, OG)
All five live in ~/.claude/agents/ rather than inside the project—global and portable across whatever I build next. The PM cost of standing them up is paid once; the lens travels. They form the foundation of an AI "second brain" for all my work.
03 · The Architecture
claude · implementation
A few bets that distinguish this from generic Next.js.
The site looks deceptively simple: a recruiter cluster, a few sub-brand pages, a case-study cluster. The decisions that hold it together are quieter, and they're the part worth flagging for anyone reading the source.
Information architecture
Recruiter-first, exploration-second
Landing elevates exactly one CTA—“See my resume →”. Below that, a conditional sub-brand matrix lets cultural-curious visitors explore (today, just Music; the rest of the matrix unlocks the day each sub-brand ships, per the no-placeholders rule). Two audiences served, each with the surface area they actually need, without diluting the primary call to action.
Information architecture
No public placeholders, ever
Routes don't ship until they're real. Nav only shows live pages. The landing matrix renders only tiles for sub-brand pages that actually exist. There are no “coming soon” pages, no skeleton routes, no nav entries pointing at unbuilt work. The discipline keeps the site small, the chrome honest, and the surface area genuinely shippable on any given day.
Design system
One codebase, many sub-brands, zero JS flips
Each sub-brand (Music, eventually Film, TV, and some others) gets its own primary color and display font via data-attribute-scoped CSS variables: <div data-subbrand="music"> flips --font-primary, --font-secondary, and the entire --primary-* ramp via cascade. SSR rendered, no flash, no JS. The Card primitive can carry a sub-brand accent independently of its surrounding page.
Design system
Tokens-as-source-of-truth, in code
Started in Figma with Tokens Studio. Three days in, I deprecated the Figma file and made the JSON in tokens/ canonical. A small parser (scripts/build-tokens.mjs) walks the multi-tier files (Brand → Alias → Mapped → Responsive) and emits CSS custom properties grouped by tier into globals.css. When “design system” means “spreadsheet of values,” the spreadsheet should live where the values are read.
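The pipeline's core move can be sketched in a few lines. This is an illustrative TypeScript version of the idea, not the actual scripts/build-tokens.mjs (which also handles the tier grouping): recursively walk the token JSON and emit one CSS custom property per leaf.

```typescript
// Hypothetical sketch of the token pipeline: flatten a nested token tree
// into CSS custom properties. Names and shapes here are illustrative.

type TokenTree = { [key: string]: string | TokenTree };

// Flatten { color: { primary: { "500": "#7c3aed" } } } into
// [["--color-primary-500", "#7c3aed"]].
function flattenTokens(tree: TokenTree, prefix = "-"): [string, string][] {
  const out: [string, string][] = [];
  for (const [key, value] of Object.entries(tree)) {
    const name = `${prefix}-${key}`;
    if (typeof value === "string") out.push([name, value]);
    else out.push(...flattenTokens(value, name));
  }
  return out;
}

// Wrap the flattened pairs in a :root block, ready to append to globals.css.
function toCss(tree: TokenTree): string {
  const lines = flattenTokens(tree).map(([n, v]) => `  ${n}: ${v};`);
  return `:root {\n${lines.join("\n")}\n}`;
}
```

Because the emitter is a pure function over the JSON, "update a token" and "regenerate the stylesheet" are the same operation, which is the whole argument for making the JSON canonical.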
How I used Claude
04 · Integrating Spotify
claude · defensive engineering
From play to pause and back again.
The /music page was supposed to be the easiest of the sub-brand pages. The brief was straightforward: pull the public playlists from my Spotify account, render them in a grid, link to detail pages. Three plot points later, it took three times longer than estimated.
Plot point 01
Half the API I planned to use no longer exists
Spotify quietly deprecated a swath of Web API endpoints in November 2024—including the user-playlists endpoint that historically mapped 1:1 to “public playlists on this profile.” Apps registered after that cutoff get 403, regardless of auth method. We refactored from Client Credentials to Authorization Code with refresh-token persistence. Cost: ~1 hour.
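For orientation, the refresh-token grant the refactor landed on looks roughly like this. The endpoint and grant type follow Spotify's documented Authorization Code flow; the helper name is mine, and the real client in this repo also handles persistence and error paths.

```typescript
// Sketch of the refresh-token exchange (Authorization Code flow). The
// request additionally needs an Authorization header of
// "Basic " + base64(client_id + ":" + client_secret).
const TOKEN_URL = "https://accounts.spotify.com/api/token";

// Build the form-encoded POST body for a refresh-token grant. Kept pure so
// the shape is verifiable without touching the network.
function buildRefreshBody(refreshToken: string): URLSearchParams {
  return new URLSearchParams({
    grant_type: "refresh_token",
    refresh_token: refreshToken,
  });
}

// Usage (network call, not run here):
// const res = await fetch(TOKEN_URL, {
//   method: "POST",
//   headers: { Authorization: basicAuth, "Content-Type": "application/x-www-form-urlencoded" },
//   body: buildRefreshBody(storedRefreshToken),
// });
```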
Plot point 02
The API itself reshuffled
Endpoints renamed (/tracks → /items), response shapes flattened, field names changed (item.track became item.item because Spotify unified tracks and podcast episodes under one model). The docs hadn't fully caught up. Cost: another hour of empirical shape-discovery via logged responses.
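The practical fix for shape drift was a small normalizer at the boundary. The sketch below is illustrative; treat the exact field shapes as observed behavior at build time, not a documented contract.

```typescript
// Hypothetical normalizer for the shape drift described above: older
// responses exposed the track at item.track, newer ones at item.item.
interface PlaylistEntry {
  track?: { name: string } | null;
  item?: { name: string } | null;
}

// Return the entry's name regardless of which shape the API sent,
// preferring the newer field when both exist.
function trackName(entry: PlaylistEntry): string | null {
  const t = entry.item ?? entry.track;
  return t ? t.name : null;
}
```

Concentrating the drift in one function means the rest of the page code never has to know which vintage of response it's holding.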
Plot point 03
The rate-limit incident
The fetch logic, in a moment of “this works for now” optimism, used Promise.all to fetch track lists for 57 playlists in parallel while Claude and I were working out how to curate that list for the site. Multiple back-to-back bursts during dev triggered the rate limiter, then an escalated cool-down. At one point Spotify's Retry-After returned 77,368 seconds—about 21 hours.
Retry-After value
77,368s
Spotify's escalated cool-down after multiple bursts. About 21 hours.
Concurrent fetches
3
In-flight cap after the fix. With retry-on-429 honoring Retry-After and a graceful fallback page when the API is unreachable.
The cure was straightforward: concurrency limiter (3 in-flight max), retry-on-429 honoring Retry-After, and a graceful fallback page when Spotify is unreachable. The lesson is the part that travels: rate limits are a property of production systems, and “I'll add throttling later” is a sentence that buys you a 21-hour penalty box. Should have throttled from the first request.
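The cure can be sketched in a few lines. This is an illustrative TypeScript version, not the repo's actual client: a worker-pool mapper capped at a fixed number of in-flight requests, plus a retry wrapper that sleeps for the server-stated Retry-After on 429.

```typescript
// Illustrative sketch of the fix: an in-flight cap plus retry-on-429 that
// honors Retry-After. Names and shapes are mine, not the production client's.

async function mapLimited<T, R>(
  items: T[],
  limit: number,
  fn: (item: T) => Promise<R>,
): Promise<R[]> {
  const results: R[] = new Array(items.length);
  let next = 0;
  // Each worker pulls the next index until the queue drains, so at most
  // `limit` calls are ever in flight.
  async function worker(): Promise<void> {
    while (next < items.length) {
      const i = next++;
      results[i] = await fn(items[i]);
    }
  }
  await Promise.all(
    Array.from({ length: Math.min(limit, items.length) }, () => worker()),
  );
  return results;
}

// Minimal response shape for the sketch; a real client would read the
// Retry-After header off the fetch Response.
interface LimitedResponse {
  status: number;
  retryAfterSec?: number;
}

async function fetchWithRetry(
  doFetch: () => Promise<LimitedResponse>,
  maxRetries = 2,
): Promise<LimitedResponse> {
  for (let attempt = 0; ; attempt++) {
    const res = await doFetch();
    if (res.status !== 429 || attempt >= maxRetries) return res;
    const waitMs = (res.retryAfterSec ?? 1) * 1000;
    await new Promise((resolve) => setTimeout(resolve, waitMs));
  }
}
```

Capping at three turns 57 parallel bursts into a steady trickle; honoring Retry-After means a 429 costs seconds instead of escalating into a multi-hour cool-down.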
There's a quieter PM lesson in here too. The original ROI estimate on the Music page assumed the API would behave the way the docs described. That assumption broke twice (deprecation, reshape) and the rate limit was the third hit. Multi-day vendor incidents on a one-week MVP are exactly the kind of thing that should slip into the buffer, not push out the deadline. Music slid one day; the rest of the week absorbed it.
A subtler discovery after the fact: Spotify rate-limits per endpoint family, not per app. A clear /me bucket doesn't mean a clear /me/playlists bucket. I built a small probe (npm run spotify:health, also exposed at /api/spotify/health in dev) that hits both and returns the time-to-clear. When /music misbehaves now, the diagnostic is one command away—instead of a 21-hour mystery box.
How I used Claude
Promise.all and, after the incident, the throttled client with retry logic. Both were correct against the spec I gave at the moment I gave it. The constraint “rate-limit every third-party integration on day one” now lives in project memory so the next integration doesn't earn its lesson the same way.
06 · Three Resumes
claude · triple-output craft
Browse, tailor, grab.
One person, three reading contexts. The /resume page is built for browsing—a recruiter following a link from LinkedIn or an emailed intro. The .docx template is what I download and tailor per application, export to PDF, then submit through the company's portal—ATS-shaped, single column, no fancy typography, edited to the specific posting. And the standard PDF is a static fallback for anyone who just wants a file in their inbox without asking—no tailoring, perpetually current as the source evolves. Three jobs, three editorial decisions, one underlying record.
Web resume
Browse
Lives at /resume. Rendered in Instrument Serif and DM Sans, with company links in their actual brand colors, a sticky TOC for desktop, and a case-study card grid. Education and experience are presented in full (of course, still editorialized). The audience is browsing, not scanning. Source: app/resume/resume-data.tsx—a .tsx file because some bullets embed inline JSX links.
.docx template
Tailor
My working file. Download → tailor for the application → export to PDF → submit. Single column, no tables, no fancy typography. Some education is dropped, some roles are trimmed. Same person, scanning context, tighter editorial. Generated by scripts/build-resume-docx.mjs (using the docx package), pinned at /resume/malcolm-xavier-resume-template.docx.
Standard PDF
Grab
The friction-free fallback. A generic, non-tailored rendering of the .docx template, pinned at /resume/malcolm-xavier-resume.pdf. Built by scripts/build-resume-pdf.mjs, which chains off the docx build and converts—so the file stays in lockstep with the template every time the source changes. For the recruiter who just wants a quick file to share with a hiring team.
Three artifacts, two pipelines. The web resume sources from app/resume/resume-data.tsx (JSX-aware, since some bullets embed inline links). The .docx and the PDF both come from a single Node script chain—build-resume-docx.mjs makes the Word file; build-resume-pdf.mjs runs that and converts. So the .docx and PDF stay in lockstep automatically; the web version is the parallel maintenance burden. Bridging the JSX and the Node side would mean a TS compiler step plus a React-element walker for an artifact regenerated maybe once a month. I accepted the parallel maintenance, with a comment at the top of each file pointing at the other, since each artifact genuinely matches a distinct use case.
“Brand-craft is mostly the details no one's supposed to consciously notice.”
A handful of details worth flagging for anyone building something similar—most of them about respecting the actual reading mechanics of a Word doc that may be parsed by ATS, opened in Google Docs, exported to PDF, and printed before it gets read:
- keepNext: true on every paragraph in an entry except the last. Prevents Word from breaking a single role across pages—the rule any recruiter would tell you about a resume but no template enforces by default.
- titlePage: true to suppress the page header on page 1. Page 1 carries the full hero (name, headline, contact, summary) in the body. Pages 2+ get a minimal header so a printed-and-stapled resume still identifies itself if the pages get separated.
- Hyperlinks preserved through the .docx → Doc → PDF pipeline. Every company name, every contact item, every case-study title carries a real href. ATS that strip formatting still get the URL as text; humans clicking through the PDF in Gmail get a working link.
- Friendly link labels. “LinkedIn · GitHub · Personal Website” instead of bare URLs in the contact strip. Easier to scan, less visual noise, the underlying hyperlink still resolves.
- Company URLs match each company's actual canonical hostname. People Inc., Muck Rack, GitHub, and Calendly canonicalize to apex (no www.); LinkedIn, User Interviews, Fullstack Academy, Fractured Atlas, Artist Growth, and NEFA canonicalize to www. Matching each site's canonical avoids a 301 redirect hop when the link is clicked. Approximately zero recruiter value, and a non-zero number of senior engineers will notice.
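The keepNext rule reduces to a small pure helper. The sketch below is illustrative: the option name mirrors the docx package's paragraph options, but the real wiring lives in scripts/build-resume-docx.mjs.

```typescript
// Hypothetical helper for the keepNext rule: every paragraph in a resume
// entry keeps with the next one except the last, so Word never splits a
// single role across a page break.
interface ParagraphOpts {
  text: string;
  keepNext: boolean;
}

function entryParagraphs(lines: string[]): ParagraphOpts[] {
  return lines.map((text, i) => ({
    text,
    // Only the final paragraph of an entry is allowed to break to the
    // next page.
    keepNext: i < lines.length - 1,
  }));
}
```

Encoding the rule in the generator, rather than hand-setting it per paragraph, is what makes it survive every regeneration of the template.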
How I used Claude
.docx is the path of least resistance the moment an application asks for a working file. The standard PDF is the polite drop-in: no friction, a standard resume on demand. If data ever suggests the web resume isn't getting reviewed, the file artifacts become the lead surface and the web version becomes the deep-cut companion. Same source material, three distributions; whichever format a recruiter prefers is one click away.
07 · QA
claude · multi-agent orchestration
Three reviewers, one punch list, one humbling number.
Pre-launch QA at solo-PM scale needs three lanes that don't overlap—one for accessibility, one for design cohesion, one for code maintainability—and a way to merge them into a single punch list. Of the five sub-agents I built, three sit on the QA loop: a11y-reviewer, design-reviewer, and code-reviewer. Growth and SEO sit outside it; they operate on different surfaces and a different cadence. The three on the loop run as a single orchestrated command.
a11y-reviewer
WCAG 2.2 AA
Reviews semantic HTML, keyboard navigability, focus states, contrast in light + dark, alt text, ARIA usage, prefers-reduced-motion, form labeling, and target-size minimums. The original lane on the harness; the other two were built around it.
design-reviewer
Sensibility + cohesion
Reviews typography rhythm, spacing, color usage, sub-brand cohesion, iconography consistency, hierarchy, motion, responsive treatment, and editorial voice in UI copy. Operates against rendered surfaces when a dev server is available; falls back to code-and-token review when not.
code-reviewer
Efficiency + structure
Reviews type safety, separation of concerns, naming, dead code and duplication, performance footguns, error handling at boundaries, React and Next.js patterns, async correctness, and bundle hygiene. Reads context around the diff, not just the touched lines.
Each role spec lives at ~/.claude/agents/<name>.md: checklist, output format, severity definitions, and an explicit what NOT to do section so the agents don't drift into each other's lanes. All three were standardized on a shared severity vocabulary—Critical, High, Medium, Low, plus a Couldn't-verify bucket—so a downstream orchestrator can merge their reports cleanly.
How I used Claude
I built a custom command, /full-review, that ties them together. It establishes scope (current diff, or whatever the user named in the previous message), spawns all three reviewers in parallel, then synthesizes their reports into a single structured punch list. Two sections come first: Conflicts (where reviewers disagree on the same element—the user adjudicates), and Aligned (where two or more reviewers independently flag the same issue—high-confidence calls). After that, the standard severity buckets, exhaustive—no findings cap, to explicitly surface everything reviewable rather than a curated top-N list.
The first run, against the entire site in light and dark modes, came back with ninety-nine findings. One of them—a token-chain bug that silently invalidates the recruiter cluster's text colors—was independently flagged by both the design reviewer and the a11y reviewer at Critical severity. That's the kind of cross-confirmed signal a single reviewer would have either missed entirely or underweighted. Aligned items are the highest-leverage fixes; conflicts are the ones the user actually has to think about.
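The cross-confirmation step can be sketched as a pure merge. This is an illustrative TypeScript version of the Aligned detection only; the real /full-review command also handles Conflicts, severity buckets, and the dashboard, and the field names here are mine.

```typescript
// Hypothetical sketch of the Aligned pass: group findings by the element
// they target and keep only elements flagged by two or more distinct
// reviewers.
interface Finding {
  reviewer: string;
  element: string; // e.g. a file:line or token name
  severity: "Critical" | "High" | "Medium" | "Low";
  note: string;
}

function alignedFindings(findings: Finding[]): Map<string, Finding[]> {
  const byElement = new Map<string, Finding[]>();
  for (const f of findings) {
    const bucket = byElement.get(f.element) ?? [];
    bucket.push(f);
    byElement.set(f.element, bucket);
  }
  const aligned = new Map<string, Finding[]>();
  for (const [element, fs] of byElement) {
    // Two findings from the same reviewer don't count as alignment.
    if (new Set(fs.map((f) => f.reviewer)).size >= 2) aligned.set(element, fs);
  }
  return aligned;
}
```

The token-chain bug is exactly what this surfaces: two reviewers, one element, independent flags.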
“Three overlapping reviews would be mush. Three lanes with explicit handoffs is signal.”
Ninety-nine findings is a lot to triage from a markdown report. So the synthesis output also renders as a self-contained interactive HTML dashboard at _private/_reviews/full-review-<date>.html, with each finding as a card carrying severity, reviewer tags, file:line citation, description, and fix recommendation. Status dropdowns (Open / Done / Won't do) and severity dropdowns are wired to localStorage, so working through the list survives reloads. Filters on Status, Severity, and Reviewer cut the list to whatever slice you want to work in.
The point isn't the dashboard. The point is that the review becomes load-bearing data—something to triage, manage, and check off—instead of a markdown file that gets read once and buried. Per the recursion lesson above, written notes don't drive behavior. A working surface does.
A day after the first run, with the audit-closeout commits landed, I ran /full-review a second time on the closeout work itself. It surfaced 35 substantive new findings—six items the first pass had missed, twelve tradeoff costs of choices I'd made knowingly, and seventeen regressions in code we'd just written. Forty-nine percent of the new findings were brand-new bugs introduced while fixing other bugs.
The missed six were the more humbling number. Five of the six were variations on a single failure mode: we'd audited the trigger of a fix, not the family the bug belonged to. The original --text-action contrast fix landed for the components we'd noticed but missed PaginationButton (which used --primary-default directly) and --border-focus (which chained through --primary-700)—sibling bugs in the same family. A 60-second class-audit would have caught them.
Regression rate
49%
Of 35 new findings on the audit-closeout commits, 17 were regressions in code we'd just written.
Sibling-bug discoveries
5 of 6
Of the pre-existing items the first pass missed, 5 were variations on a single failure family the original fix didn't sweep for.
Both lessons now sit in AGENTS.md at the repo root, where any agent working in the codebase reads them before writing the next fix: catch regressions during the change, not in the next audit (three failure-mode checks before declaring done—error paths for new scripts, breakpoint pass for UI, consumer grep for shared logic), and when fixing a class of bug, audit the whole class (name the abstraction, grep every consumer, verify each under the same conditions). Per the recursion lesson in Button Mashing above: written documentation isn't operational discipline. The repo-level rule is the discipline; the markdown is the documentation. Both, not one.
A new instance landed mid-redline on this very article. One visible JSX-whitespace bug in Section 8 (Three Resumes) surfaced eleven siblings—ten on this page, one on the Basecamp study—once the trigger was audited rather than patched. The rule paid for itself in a single sweep.
How I used Claude
Running /full-review in the same session that birthed it meant bootstrapping the new reviewers through the generic general-purpose agent, which read their role files at runtime—same persona, just not native yet. After a session restart, they load directly. Worth knowing if you're authoring agents and trying them in the same conversation.
08 · What's Live
What shipped, what got cut, what's next.
As of writing, the MVP is live, gated behind a Basic Auth proxy while the site is being battle tested by close friends and colleagues. Two audit cycles deep, with the regressions and class-audit lessons captured as repo-level rules so the next agent doesn't earn the same ones.
Landing
One CTA above the fold (“See my resume →”), with a conditional sub-brand matrix below for cultural-curious visitors.
Resume
Web-native recruiter resume with editorial typography, sticky TOC, company links in their actual brand colors, and a case-study card grid (this case study included). Plus a .docx template for ATS submissions, generated from the same editorial decisions.
About
Senior PM positioning up top, followed by a more personal bio and creative socials.
Contact
Inline Calendly widget for the recruiter booking flow, plus mailto and LinkedIn fallbacks.
Music
37 Spotify playlists, sorted by most-recent track added_at as a proxy for last-edited (Spotify doesn't expose modified_at), with manual-pin override and an /api/spotify/health probe for per-bucket rate-limit diagnostics.
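The ordering logic is small enough to sketch. This is an illustrative version, not the site's actual code: the added_at field mirrors the API, while the pin mechanism is this site's own.

```typescript
// Hypothetical sketch of the Music-page ordering: pinned playlists first,
// then newest track added_at first (Spotify exposes no modified_at, so the
// newest added_at stands in for last-edited).
interface Playlist {
  name: string;
  pinned?: boolean;
  trackAddedAts: string[]; // ISO timestamps, one per track
}

function lastActivity(p: Playlist): number {
  return Math.max(0, ...p.trackAddedAts.map((t) => Date.parse(t)));
}

function orderPlaylists(playlists: Playlist[]): Playlist[] {
  return [...playlists].sort((a, b) => {
    if (!!a.pinned !== !!b.pinned) return a.pinned ? -1 : 1; // pins first
    return lastActivity(b) - lastActivity(a); // then newest activity first
  });
}
```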
Case studies
Two articles in the cluster: the Basecamp Coffee turnaround and this meta study. Shared chrome, shared primitives, shared scroll-progress bar.
Test coverage
Vitest suite covering the brittle paths the audit flagged—rate-limit hardening, mosaic helpers, snapshot fallback. 21 tests today; the commitment is that brittle correctness paths ship with tests, not as a follow-up.
Review tooling
The three-reviewer harness above, wired up as a single /full-review command, generating an interactive dashboard at _private/_reviews/ that triages, filters, and tracks each finding to closure.
PM judgment is mostly about what not to ship. The biggest scope cut from the MVP: forthcoming sub-brand pages. No content for them yet—per the no-placeholders rule from Section 3, they appear in nav and on landing the day they ship, not before. Stay tuned...
What I'd do differently next time:
Lesson 01
Rate-limit on day one, not later
Every third-party integration, throttled from the first request. The original Promise.all across 57 playlists (everything on my profile, not just the ones coming to this site) bought a 21-hour Spotify penalty box; rate-limiting on day one would have bought zero.
Lesson 02
Cache fixtures locally for dev
API responses on disk during dev would have prevented the rate-limit cascade entirely and made iteration faster. Production code shouldn't pay the cost of dev iteration loops.
Lesson 03
Hardcode the visual baseline
Theme-aware UI primitives shouldn't depend on var() references that can fail silently. Hex literals via [data-theme="dark"] selectors are more brittle to read but more bulletproof to render.
Lesson 04
Validate against the deployed render
Computed-style or screenshot before declaring a visual fix done. Not “the code looks right.” The two aren't the same thing, as the button bug spent four rounds proving.
Lesson 05
Audit the family, not the trigger
Bugs rarely live alone. A 60-second class-audit before declaring the fix done catches sibling bugs before the next reviewer does, and avoids the 49% regression rate I paid for skipping it.
Lesson 06
Cheap test before high-cost action
Anything touching global state needs a 90-second falsification before the destructive commit. Same gate I'd apply to a junior PM proposing a vendor escalation.
Lesson 07
Documentation isn't discipline
Written postmortems are documentation. Operational discipline is the ongoing thing. Product teams need to invest in both, not one.
How I used Claude
This case study is a deep dive into how this site was built. My first time using Claude to build a site is documented in another case study: Basecamp Coffee →.
If this resonated, two next steps: Review my resume → or Book a 30-min product chat ↗.