
Case Study

10 min read · Updated May 5, 2026

Claude x Growth PM

Basecamp Rewards: A Turnaround

This page is two things at once. First and foremost, it documents my first pass at using Claude Code as a building and thinking partner while developing a site. Second, it's a case study of a loyalty-program turnaround, using the Basecamp Coffee scenario as a practice space to demonstrate my Growth PM skills alongside what I learned about working with Claude Code. The quiz is the prototype that came out of it. What follows is the story of how it got there, starting with the divergence that set everything in motion.


01 · The Signal

claude · file-tree exploration

Brand healthy. Program on fire.

The first data point that told me what was actually broken: Brand NPS 67, Program NPS 12. Same customer base, same roasters, yet a 55-point gap between how people felt about Basecamp and how they felt about the rewards program.

The divergence frames both the problem and its urgency. If brand NPS had dropped alongside program NPS, this might have been a brand crisis. But it didn't.

Customers still loved the coffee, the shops, the baristas. They specifically disliked the rewards program. An isolated program-design failure was dragging down a healthy business.

For context: Basecamp Rewards is a standard points-per-purchase program built around three tiers—Trailblazer → Explorer → Summit. The mechanics are conventional and, for accounting reasons, locked. The program isn't broken in the plumbing; it's broken in what customers feel when they use it.

Brand NPS: 67 · Basecamp Coffee overall. Stable across the decline window.

Program NPS: 12 · Basecamp Rewards specifically. Down 65% from 34 six months ago.

How I used Claude

First move: I pointed Claude at the inherited-chaos/ directory and asked it to triage. Member data CSV, advisor emails, failed-campaign post-mortems, customer feedback by month, competitor research: hundreds of pages from the outgoing manager. Claude built a mental map of the corpus; I asked it to highlight what was load-bearing. The NPS divergence surfaced from organized/feedback-synthesis.md almost immediately. Claude didn't pick the insight; it found the data points that led me to it.

02 · The Data

claude · data synthesis

Signups masking indifference.

Six months of metric movement. Three patterns mattered.

Pattern 01

Hollow growth

Signups +123%, MAU -60%. A leaky bucket that, left unaddressed, forces a churn-and-burn acquisition strategy. Management pointed to signups as good news; in reality, the signup number was overshadowing the data that mattered.

Pattern 02

Unit-econ inversion

LTV collapsed from $47 to $29 while cost-per-member climbed from $8.50 to $14.50. ROI went from 1.8 to 0.5. At the current slope, the program was one to two months from net-negative.

Pattern 03

Cadence rot

Average days between visits: 4.2 → 10.4. Cadence is the leading indicator; retention and LTV are lagging. Members who used to come in twice a week were coming in every other week. They weren't leaving angrily, but they weren't coming back excited either.

Signups: 1,850 (6mo ago) → 4,120 (now) · target 15,000

MAU: 1,200 → 480 · target 8,000

30-day retention: 45% → 27% · target 60%

Days between visits: 4.2 → 10.4

Cost per member: $8.50 → $14.50 · target < $11

LTV: $47 → $29

ROI: 1.8 → 0.5

Program NPS: 34 → 12 · target 25+

How I used Claude

I loaded the member-data CSV into context and asked Claude to extract what had actually moved. It surfaced the table above. I then asked it to separate vanity metrics from load-bearing metrics: what would a skeptical CFO say if I brought them the signup number alone? That reframe made the hollow-growth diagnosis unavoidable. I should note that the data shows this problem irrespective of the rewards program; the hypothesis is simply that the rewards program is a lever that can be pulled to start fixing it.

03 · The Triangulation

claude · multi-agent review

Smoke signal to house on fire.

When a system is failing, one bad signal is the smell of smoke. Four independent signals start to feel like the coffee is burning.

Signal 01

Member behavior

Cadence rotting, tier stagnation (89% stuck at Trailblazer, the program's first tier), 23-second app sessions. Members weren't rage-quitting; they were drifting away, indifferent. That's worse than anger. It's forgetting.

Signal 02

Customer voice

“Fine but forgettable” showed up in customer feedback for three consecutive months. And: “Your coffee has personality. Your rewards program doesn't.”

Signal 03

Competitor-by-feature

Every failed Basecamp campaign had a competitor who won at the same mechanic. Double Points Weekend vs. Starbucks Star Dash. Social Share vs. Dutch Bros sticker culture. Pattern: winners led with identity, then layered mechanics.

Signal 04

Market whitespace

Starbucks owns convenient. Dutch Bros owns fun. Peet's owns quality. Roast & Co. owns craft. Nobody in the PNW market owned coffee identity.

Dutch Bros remembers my name. Starbucks at least has games.
customer feedback, March

How I used Claude

I had Claude spin up three custom sub-agents: an Exec (ROI lens), a Product Designer (identity/emotion lens), and a Barista Lead (ground-truth lens). I fed them the synthesis and hypothesis and asked for pressure-testing. They converged from different angles: the Exec said “numbers or kill it,” the Designer said “identity vacuum, not loyalty problem,” and the Barista said “customers ask ‘what should I try?’ ten times a day and we can't tell them.” Then I built a fourth agent of my own: a Growth Product Strategist, tuned for funnel, activation, and retention-loop product thinking.

The three generalists told me whether the synthesis held up from their angles; the growth agent probed for the levers that shaped the hypothesis, which is the lens that matters most in my day job. I also stored it globally as a reusable agent, so I can reach for it on future projects. Output from the first three agents is in reviews/synthesis-feedback.md. In the real world, these agents would be tuned to actual stakeholders and their behaviors; automated, cross-lens analysis would let me move from stakeholder opportunity validation to solution recommendations more efficiently.
Basecamp doesn't have a loyalty problem. It has an identity vacuum.
Product Designer sub-agent

04 · The Bet

claude · thinking partner

Personality over points.

The mechanics couldn't change — the points structure is legally locked (accounting reasons). That removed a tempting distraction. The question now was: what would make members want to come back, independent of the mechanics?

Hypothesis: if we give users a coffee-identity artifact—an archetype they claim, a matched drink, something shareable—cadence recovers, retention follows, program NPS stops bleeding.

That's testable. Quality experiment design means defining what success looks like before you run the test, which means defining failure too. This is where growth and data strategy meet.

People light up when we ask about their coffee preferences. The app doesn't capture any of that.
Barista Lead sub-agent

How I used Claude

The move from “convergent evidence” to “specific hypothesis” is where Claude earns its keep as a thinking partner. I iterated on what the evidence actually implied — not “people want personality in general” but specifically a claimable archetype with a matched drink. Decisions and open research questions went into project memory so I wouldn't re-litigate them tomorrow.

05 · The Experiment

claude · artifact drafting

Succeed or sunset.

The pilot:

  • 2 stores, selected for representative member profiles
  • 60 days of operation
  • $15K capped spend, including tech + comms
  • 3-of-4 success gate—miss 2+ and we sunset the program

A pilot without clear success (and failure) conditions is a project, not an experiment. The explicit success criteria are what make it a test:

MAU: ≥ 800 (from current 480)

30-day retention: ≥ 38% (from 27%)

Cost per member: < $11 (from $14.50)

Program NPS: > 25 (from 12)
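The 3-of-4 gate is simple enough to express in code. Below is a minimal TypeScript sketch, assuming illustrative names (`GateCheck`, `evaluateGate`) and placeholder "actual" values rather than anything from the real pilot:

```typescript
// Hypothetical sketch of the pilot's 3-of-4 success gate.
// Thresholds come from the memo above; type and function names are mine.
type GateCheck = {
  name: string;
  actual: number;
  target: number;
  higherIsBetter: boolean; // cost per member is lower-is-better
};

function evaluateGate(checks: GateCheck[]): { passed: number; sunset: boolean } {
  const passed = checks.filter(c =>
    c.higherIsBetter ? c.actual >= c.target : c.actual < c.target
  ).length;
  // Miss 2+ of the criteria and the program sunsets.
  return { passed, sunset: checks.length - passed >= 2 };
}

// Pilot targets from the memo; the "actual" values are placeholders.
const endOfPilot: GateCheck[] = [
  { name: "MAU", actual: 860, target: 800, higherIsBetter: true },
  { name: "30-day retention", actual: 0.41, target: 0.38, higherIsBetter: true },
  { name: "Cost per member", actual: 12.1, target: 11, higherIsBetter: false },
  { name: "Program NPS", actual: 28, target: 25, higherIsBetter: true },
];

console.log(evaluateGate(endOfPilot)); // → { passed: 3, sunset: false }
```

Encoding the gate this way also makes the failure condition unambiguous: the program survives a single miss, never two.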

How I used Claude

Claude drafted the pilot memo and the success-gate framing, stress-tested against the Exec sub-agent's “show me a number” critique, and tightened the success conditions. The research debt—what we don't know but would need to validate post-pilot—lives in a memory file tagged “known unknowns” so it can't get lost.

06 · The Artifact

claude code · build partner

The prototype is live.

Take the quiz ↗. Sixty seconds. It maps you to one of 16 archetypes, recommends a drink from the actual Basecamp menu, and mints a one-time discount code. The recommender is a pure function—facet state in (from the user's answers), drink and archetype out. No AI call at runtime. The intelligence is in the facet system.

Facet Mapping · 7 dimensions · 16 drinks · 16 archetypes

  • Style: espresso-based · brewed · cold-brewed · tea-based
  • Temperature: hot · iced
  • Strength: light · medium · bold · extra-bold
  • Milk: black · whole · 2% · oat · almond · soy
  • Sweetness: none · touch · sweet · indulgent
  • Flavor: fruity · floral · chocolate · nutty · caramel · spicy · earthy
  • Roast: light · medium · dark

Size is deliberately excluded—it's a volume choice, not a taste signal. Six quiz questions capture these seven dimensions; the recommender scores every drink in the menu against the resulting facet state.
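To make the mechanics concrete, here is a minimal sketch of the facet-scoring idea. This is not the actual lib/recommender.ts: the types, the flat one-point-per-facet weighting, and the abridged demo menu are assumptions for illustration.

```typescript
// Sketch only: the real recommender scores all seven facets; this
// abridged version filters by style family, then scores three facets.
type FacetState = {
  style: "espresso-based" | "brewed" | "cold-brewed" | "tea-based";
  temperature: "hot" | "iced";
  strength: "light" | "medium" | "bold" | "extra-bold";
  flavor: string;
};

type Drink = {
  name: string;
  archetype: string;
  style: FacetState["style"];
  temperatures: FacetState["temperature"][];
  strengths: FacetState["strength"][];
  flavors: string[];
};

// Pure function: facet state in, drink (and its archetype) out.
function recommend(state: FacetState, menu: Drink[]): Drink {
  const candidates = menu.filter(d => d.style === state.style); // style family first
  const score = (d: Drink) =>
    (d.temperatures.includes(state.temperature) ? 1 : 0) +
    (d.strengths.includes(state.strength) ? 1 : 0) +
    (d.flavors.includes(state.flavor) ? 1 : 0);
  return candidates.reduce((best, d) => (score(d) > score(best) ? d : best));
}

// Demo menu abridged from the table below; facet profiles simplified.
const demoMenu: Drink[] = [
  { name: "Cold Brew", archetype: "The Slow Burn", style: "cold-brewed",
    temperatures: ["iced"], strengths: ["bold"], flavors: ["chocolate", "nutty", "caramel"] },
  { name: "Cold Brew Tonic", archetype: "Sun Season", style: "cold-brewed",
    temperatures: ["iced"], strengths: ["light", "medium"], flavors: ["fruity", "floral"] },
];

const pick = recommend(
  { style: "cold-brewed", temperature: "iced", strength: "bold", flavor: "chocolate" },
  demoMenu
);
console.log(pick.name, pick.archetype); // → Cold Brew The Slow Burn
```

Because the function is deterministic, the same facet state always returns the same drink, which is what makes the coverage audit described later possible.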

Drinks, Archetypes, Facets, and Coverage

16 drinks · 16 archetypes · 7,168 quiz paths

Espresso-based · 8

  • Espresso · The Purist · 2.1% · bold – extra-bold · black · hot · chocolate, nutty, caramel, earthy, fruity
  • Macchiato · The Margin Note · 0.3% · bold · milk optional · hot · chocolate, nutty, caramel, earthy
  • Cortado · The Even Keel · 2.8% · medium – bold · dairy required · hot · chocolate, nutty, caramel
  • Cappuccino · The Old Soul · 1.0% · medium – bold · dairy required · hot · chocolate, nutty, caramel, spicy
  • Latte · The Steady · 7.9% · light – medium · milk optional · hot · iced · chocolate, nutty, caramel, floral, spicy
  • Mocha · Warm Throws · 5.2% · medium – bold · dairy required · hot · iced · chocolate
  • Americano · The Long Walk · 5.4% · medium – bold · milk optional · hot · iced · fruity, chocolate, nutty, caramel, earthy
  • Affogato · Sweet Ceremony · 0.3% · bold · dairy required · hot · chocolate, caramel

Brewed · 3

  • Drip Coffee · The Regular · 21.5% · light – bold · milk optional · hot · iced · chocolate, nutty, caramel, earthy
  • Pour-Over · The Ritualist · 1.4% · light – medium · black · hot · fruity, floral, chocolate, caramel
  • Red Eye · Second Wind · 2.1% · extra-bold · milk optional · hot · earthy, chocolate

Cold-brewed · 3

  • Cold Brew · The Slow Burn · 11.0% · bold · milk optional · iced · chocolate, nutty, caramel
  • Nitro Cold Brew · The Velvet · 9.5% · medium – bold · milk optional · iced · chocolate, nutty, caramel
  • Cold Brew Tonic · Sun Season · 4.5% · light – medium · black · iced · fruity, floral

Tea-based · 2

  • Matcha Latte · Still Water · 14.7% · light – medium · dairy required · hot · iced · floral, earthy
  • Chai Latte · Spice Keeper · 10.3% · medium – bold · dairy required · hot · iced · spicy

Each drink's facet profile is the thing the recommender scores against. A user's facet state from the quiz returns the closest match on strength, milk, temperature, and flavor—filtered first by style family. Every drink maps 1-to-1 to an archetype (shown in italic next to each drink), so the recommender returns an identity and an order at the same time. The percentage shows each drink's share of the 7,168 distinct quiz answer paths—a coverage audit, not a forecast (real users don't pick answers uniformly at random).

Deliberately not in the MVP:

  • Auth/user accounts
    • This would theoretically be integrated into the Basecamp app/site, but it is not MVP scope. The session code is minted once per pageview to prevent gaming within a single visit; cross-session gaming is the gap auth would close.
  • POS/order-flow integration
    • The discount code copy CTA is an MVP shortcut. Ideally, the button would initiate the order flow.
  • Analytics instrumentation
    • Hooks documented, not wired.

What you see is the prototype that answers “could a 60-second identity artifact drive a meaningful visit moment?” A production version would wire the rest.

A handful of design/growth/data/architecture decisions from the build are worth pulling up—short annotations on what we tried, what we chose, and what we gave up. Each one is present-tense enough to ship and future-minded enough to survive the production rewrite.

Architecture

Pure recommender at runtime

Could have wired a model call per result—flexible, but expensive on latency, reliability, and cost. Chose a pure function in lib/recommender.ts: facet state in, drink and archetype out. Deterministic, zero-latency, testable. A coverage audit script verifies every one of the 16 drinks stays reachable after any scoring change.
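The coverage-audit idea can be sketched in a few lines. This is an assumed shape, not the repo's actual script: enumerate every distinct answer path, run the pure recommender on each, and list any drink that has become unreachable.

```typescript
// Sketch of a coverage audit. `recommendFromPath` stands in for the
// real pure recommender; the question/option model is assumed.
type AnswerPath = number[]; // one chosen option index per question

// Cartesian product of option indices, one entry per question.
function allPaths(optionCounts: number[]): AnswerPath[] {
  return optionCounts.reduce<AnswerPath[]>(
    (paths, n) => paths.flatMap(p => Array.from({ length: n }, (_, i) => [...p, i])),
    [[]]
  );
}

// Returns the drinks no answer path can reach; a non-empty result
// means a scoring change silently orphaned part of the menu.
function auditCoverage(
  optionCounts: number[],
  recommendFromPath: (path: AnswerPath) => string,
  menu: string[]
): string[] {
  const reached = new Set(allPaths(optionCounts).map(recommendFromPath));
  return menu.filter(drink => !reached.has(drink));
}

// Toy example: two 2-option questions, a recommender that ignores Q2.
const unreachable = auditCoverage(
  [2, 2],
  path => (path[0] === 0 ? "Drip Coffee" : "Latte"),
  ["Drip Coffee", "Latte", "Mocha"]
);
console.log(unreachable); // → [ "Mocha" ]
```

With 7,168 paths total, exhaustive enumeration is cheap enough to run on every scoring change.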

Architecture · Data

Facet mapping, not personality buckets

First instinct: map users directly to personality archetypes. The valid recommendation set is the actual menu, though—16 real SKUs. Rewrote to decompose drinks into facets (temperature, strength, milk, sweetness, flavor, roast, style) and score. By construction, every archetype points at a drink a customer can order. Size is deliberately excluded.

Design · Data

Oblique prompts, not direct facet questions

Early quiz asked “how strong do you like your coffee?” Users picked the category they thought they should, not the one they meant. Rewrote every prompt into sensory/ritual framing—striking a match, drawing a bath. Answer text never names the facet it probes. Lost some directness; gained honest taste signal.

Architecture · UX

Committed vs. draft state

First cut updated the drink card live as the editor mutated. This was too flickery—every toggle felt like the rug moving. Split state: committedState drives the displayed recommendation, draftState is what the editor mutates. The card re-computes only when the user hits “Find my new ritual.” Iteration, not chaos.

Data · Growth

Session code as measurement

Initial draft re-minted the discount code on every preference update. That broke the instrument. Locked the code on first quiz completion, preserved across edits AND re-takes. Measurement can now cover not just conversion but alignment between the recommendation (as encoded in the discount code) and the drink actually purchased. We can also compare the share of users who received a given recommendation against that drink's coverage share, which helps contextualize the results.
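The lock-once behavior is tiny but load-bearing for measurement. A hedged sketch follows; the code format and names are assumptions, not the shipped implementation.

```typescript
// Mint the discount code at most once per pageview, encoding the
// recommended drink so a POS record can later be joined back to what
// the quiz suggested. Code format is illustrative.
let sessionCode: string | null = null;

function mintCode(drinkId: string): string {
  if (sessionCode === null) {
    const suffix = Math.random().toString(36).slice(2, 8);
    sessionCode = `BC-${drinkId.toUpperCase()}-${suffix}`;
  }
  return sessionCode; // preserved across profile edits and re-takes
}

const first = mintCode("latte");
const second = mintCode("cold-brew"); // a later edit changes the drink...
console.log(first === second);        // → true: the instrument holds
```

Because the code never changes after first mint, the drink it encodes is the quiz's original recommendation, which is exactly what the purchase-alignment analysis needs.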

Growth · UX

Editable profile, not one-shot

Course requirements ended at the quiz and results. I added the editable taste profile on my own initiative. A user who hits a result they don't like can tweak it instead of re-taking the quiz. This sets up the pattern for a persistent profile in production; in theory, a separate user profile would also house the taste-profile utility. That gives users a consistent surface to return to, and gives the business a reliable core data stream.

How I used Claude

Next.js 16 app, built with Claude Code over a weekend. GitHub → Vercel on every push to main. Claude wrote most of the code; I wrote the spec, steered the architecture, reviewed diffs, and caught the “almost right” decisions that matter (there were countless of them; AI is still fallible). That balance, where the human sets direction, Claude executes, and the human reviews, is what I think of as the modern PM (Builder) workflow.

07 · How This Was Built

The Claude Code workflow.

The features I leaned on, concretely:

Project memory

A MEMORY.md index plus topic-scoped files (feedback rules, project state, references). Decisions and voice preferences persist across sessions. When I come back tomorrow, Claude already knows how I work.

Custom sub-agents

Three advisors (Exec/Product Designer/Barista Lead) plus a Growth Product Strategist. Invoke any for a lens-specific review without re-priming context.

Plan mode

For anything non-trivial. This very page was planned, reviewed, revised before a line of code was written. Plan mode forces alignment before commitment.

CLAUDE.md at the repo root

Durable working context — scenario, brand voice, stakeholders, what's been done, what's next. Claude reads it on every turn.

Skills and hooks

Recurring stuff: deploy, status, quick iterations. Less context switching, less re-explaining.

The human in the loop

Claude wrote the code and drafted the prose. I set direction, wrote the spec, caught the “almost right” calls, and owned the decisions. None of this works without that part.

These aren't novel tools on their own. The workflow is: treat Claude like a smart junior. Give it durable context, lens specialization, and structured decision points. Then iterate fast, review everything, own the direction.

Recursive detail: this case study was planned in plan mode, revised after my first draft missed the meta-frame entirely, written in ~500 lines of TSX with Claude doing the heavy typing, and shipped via a push to main. A full business day of work to build a functioning prototype and detailed case study.

For a deeper dive into how this case study ended up on this site, check out Building this site →.

If this resonated, two next steps: Review my resume → or Book a 30-min product chat ↗.