Prompt Chain: Automated Verbatim Coding Pipeline

For Research Analysts · Claude

Tools: Claude Pro | Time to build: 1-2 hours | Difficulty: Intermediate-Advanced | Prerequisites: Comfortable using Claude for verbatim coding; see the Level 3 guide, "Open-Ended Verbatim Coding with Claude"


What This Builds

A multi-step prompt chain that takes your raw open-ended survey responses and produces: (1) batch-coded verbatims with consistency checks, (2) frequency counts by code, (3) a thematic synthesis narrative, and (4) the top 5 quotes per theme — all in one systematic workflow. Instead of four separate manual tasks spread over a day, this runs as a connected sequence in under 2 hours for a 500-response dataset.

Prerequisites

  • Claude Pro: the extended context handles larger batches
  • Your verbatim data exported from Qualtrics as a numbered list or CSV
  • A finalized codebook (codes with definitions and examples)
  • Comfortable with the basic Claude verbatim coding workflow (Level 3)

The Concept

A prompt chain is like an assembly line where each step produces output that feeds the next step. Instead of doing all four tasks manually in separate sessions, you run them as a sequence, with each step building on the last. The key insight: every downstream step (frequencies, synthesis, quotes) can only be as accurate as the first step (coding), so any coding error carries through the rest of the chain. Fix quality at the source, not at the end.

Think of it as building a research analysis pipeline in four stages that you run end-to-end every time you have a new dataset.


Build It Step by Step

Part 1: Prepare your data and codebook

  1. Export your open-ended responses from Qualtrics: Data & Analysis → Export & Import → Export Data → choose CSV
  2. Open the CSV. Copy the column with your open-ended responses into a plain text file, numbered 1 through N
  3. Clean the data: remove any rows with blank responses, remove PII if required by your data policy
  4. Finalize your codebook with this format for each code:
    • Code name
    • Definition (1-2 sentences)
    • Positive examples (2-3 verbatim examples that should get this code)
    • Edge cases (1-2 examples that might seem like this code but shouldn't be)
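If you would rather script the numbering and blank-row cleanup than do it by hand, a minimal sketch follows. It assumes the open-ended question lives in a column named `Q5` (a hypothetical name) and that any extra header rows in the export have already been removed. `number_responses` is an illustrative helper, not part of any tool. It does not remove PII.

```python
import csv
import io


def number_responses(csv_text: str, column: str) -> list[str]:
    """Return non-blank responses from one CSV column as '1. text' lines."""
    reader = csv.DictReader(io.StringIO(csv_text))
    # Collapse stray whitespace so blank-looking rows are really empty.
    cleaned = [" ".join((row.get(column) or "").split()) for row in reader]
    kept = [text for text in cleaned if text]
    # Number 1 through N after blanks are dropped.
    return [f"{i}. {text}" for i, text in enumerate(kept, start=1)]


# Hypothetical sample data; the column name "Q5" is an assumption.
sample = (
    "ResponseId,Q5\n"
    "R_1,Too expensive for what it does\n"
    "R_2,\n"
    "R_3,  Support never called back \n"
)
numbered = number_responses(sample, "Q5")
print("\n".join(numbered))
```

Because numbering happens after blank rows are dropped, the numbers will not match the original row positions. Keep the cleaned file as your reference copy for verifying quotes later.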

Part 2: Run Step 1 — Batch Coding

Open Claude Pro and start a new conversation. Paste the full prompt chain setup:

Copy and paste this
I'm going to run a 4-step verbatim analysis pipeline. Please confirm you understand each step before we start.

STEP 1 (now): Code all open-ended responses using the codebook below.
STEP 2 (after step 1): Calculate frequency counts for each code.
STEP 3 (after step 2): Write a thematic synthesis narrative.
STEP 4 (after step 3): Pull the top 5 most representative quotes for each theme.

CODEBOOK:
[Code 1 Name]: [definition] | Examples: [example 1], [example 2] | Not this code: [counterexample]
[Code 2 Name]: [definition] | Examples: [example 1], [example 2] | Not this code: [counterexample]
[repeat for all codes]

CODING RULES:
- Apply 1-3 codes per response
- Flag with [REVIEW] if genuinely ambiguous
- Return as a table: # | Response (first 10 words) | Codes | Notes

Ready to start? I'll paste the first batch now.

Paste responses in batches of 50. After each batch, check 10 responses against your manual judgment. Correct any systematic errors before continuing.
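The batching and the 10-response spot check can be prepared ahead of time. The sketch below is one possible way to do it. The helper names and the placeholder responses are assumptions for illustration, and the fixed seed only makes the sample reproducible.

```python
import random


def make_batches(items: list[str], size: int = 50) -> list[list[str]]:
    """Split items into consecutive batches of at most `size`."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def spot_check_sample(batch: list[str], k: int = 10, seed: int = 0) -> list[str]:
    """Pick k items at random (or the whole batch if smaller), kept in original order."""
    rng = random.Random(seed)
    picked = sorted(rng.sample(range(len(batch)), min(k, len(batch))))
    return [batch[i] for i in picked]


# Placeholder data standing in for your numbered responses.
responses = [f"{n}. response text {n}" for n in range(1, 421)]
batches = make_batches(responses)
print(len(batches), len(batches[-1]))  # 9 20

to_review = spot_check_sample(batches[0])
print(len(to_review))  # 10
```

Sampling at random, rather than checking the first 10 rows, makes it less likely that a systematic error late in a batch goes unnoticed.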

Part 3: Run Step 2 — Frequency Counts

After all batches are coded, paste this prompt:

Copy and paste this
STEP 2: Now calculate the frequency of each code across all [N] coded responses. For each code, give me:
- Total count
- Percentage of all responses
- Any notable patterns (e.g., this code appears much more often in combination with another code)

List codes from highest to lowest frequency.

What you should see: A frequency table with counts and percentages, plus any patterns Claude noticed across the coded data.
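Counts produced in a long conversation are worth checking independently. As a rough sketch, you can paste the Step 1 tables into a script and tally them yourself. This assumes the rows follow the "# | Response | Codes | Notes" layout, that multiple codes are comma-separated, and that response text contains no "|" characters. `tally_codes` and the sample rows are illustrative only.

```python
from collections import Counter


def tally_codes(table_text: str) -> tuple[Counter, int]:
    """Count codes from '# | Response | Codes | Notes' rows.

    Returns (code counts, number of responses seen).
    """
    counts: Counter = Counter()
    n = 0
    for line in table_text.splitlines():
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        # Skip headers, separators, and rows that do not start with a response number.
        if len(cells) < 3 or not cells[0].isdigit():
            continue
        n += 1
        # Assumption: codes are comma-separated; count each code once per response.
        counts.update({c.strip() for c in cells[2].split(",") if c.strip()})
    return counts, n


# Hypothetical coded rows for illustration.
coded = """# | Response | Codes | Notes
1 | Too expensive for what it does | Price/Value |
2 | Support never called back so I left | Customer Service, Competitor Switch | [REVIEW]
3 | Found a cheaper option elsewhere | Price/Value, Competitor Switch |
"""
counts, n = tally_codes(coded)
for code, count in counts.most_common():
    print(f"{code}: {count} ({count / n:.0%} of {n} responses)")
```

Since a response can carry up to three codes, the percentages can sum to more than 100%. If your tally disagrees with Claude's table, trust the tally and ask Claude to list the response numbers for the code in question.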

Part 4: Run Step 3 — Thematic Synthesis

Copy and paste this
STEP 3: Based on the coded data and frequency counts, write a thematic synthesis. 

Organize around the [3-4] most important themes (codes or clusters of related codes). For each theme:
- Bold headline stating the key insight (not just "Theme: Price")
- 2-paragraph narrative explaining what respondents are actually saying and why it matters for [the client's business question]
- Note any interesting subgroups (e.g., "this theme is disproportionately mentioned by detractors")

Research question we're answering: [paste the research question]

What you should see: A structured synthesis with insight-forward theme headlines and substantive narratives, ready to shape the report's qualitative analysis section.

Part 5: Run Step 4 — Pull Representative Quotes

Copy and paste this
STEP 4: For each of the [3-4] themes you identified, pull the 5 most representative verbatim quotes. 

Criteria for selection: specific (not vague), authentic-sounding, representative of the theme (not an extreme case), and suitable for a client presentation.

Return as: Theme Name | Quote | Response # (so I can verify)

What you should see: A curated set of 15–20 quotes, organized by theme, ready to use as pull quotes in your report and deck.
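The Response # column exists so you can verify each quote, and that check is easy to automate. The sketch below flags any quote that does not appear word for word in the response it cites. It is a minimal example under stated assumptions: `verify_quotes`, the theme labels, and the sample data are hypothetical, and matching ignores only case, surrounding quote marks, and whitespace.

```python
def verify_quotes(
    quotes: list[tuple[str, str, int]],
    responses: dict[int, str],
) -> list[tuple[str, int]]:
    """Return (theme, response #) for each quote not found verbatim in its cited response."""

    def norm(text: str) -> str:
        # Ignore case, surrounding quote marks, and whitespace differences.
        return " ".join(text.lower().strip(' "\u201c\u201d').split())

    problems = []
    for theme, quote, number in quotes:
        source = responses.get(number, "")
        if not source or norm(quote) not in norm(source):
            problems.append((theme, number))
    return problems


# Hypothetical original responses, keyed by response number.
responses = {
    1: "Too expensive for what it does",
    2: "Support never called back so I left",
}
# Hypothetical Step 4 output: (theme, quote, response #).
quotes = [
    ("Price", "too expensive for what it does", 1),
    ("Service", "Support ignored me for weeks", 2),  # paraphrased, not verbatim
    ("Service", "Never got a reply", 99),  # cites a response that does not exist
]
flagged = verify_quotes(quotes, responses)
print(flagged)  # [('Service', 2), ('Service', 99)]
```

Run the check against your original numbered responses, not the Step 1 table, because the table holds only the first 10 words of each response.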


Real Example: Churn Driver Analysis

Setup: You have 420 responses to "What is the main reason you stopped using our service?" Your codebook has 8 codes.

Input: 420 numbered responses pasted in 9 batches of about 47.

Pipeline run:

  • Step 1: 9 batches coded in 45 minutes. [REVIEW] flags on 22 edge cases.
  • Step 2: Frequency table shows "Price/Value" (34%), "Customer Service" (28%), "Competitor Switch" (19%), "Product Gap" (12%), Other codes (7%)
  • Step 3: Synthesis identifies 3 themes — the price sensitivity narrative, the service failure cluster, and the silent switcher (competitor) pattern — all framed around the client's question about retention strategy
  • Step 4: 15 quotes extracted, including 5 vivid price objections, 5 service failure stories, and 5 competitor mentions that name specific brands

Time saved: 7 hours of manual coding + synthesis → 90 minutes of systematic pipeline execution


What to Do When It Breaks

  • Claude loses track of the coding scheme partway through → Paste the codebook again at the start of a new batch with: "Reminder of coding scheme before this batch: [codebook]"
  • Frequency counts don't add up → Ask Claude to show its work: "Show me the tally for Code X — list all response numbers you assigned it"
  • Synthesis narrative is too generic → Add more context to the Step 3 prompt: "The client is trying to decide whether to improve pricing or service first — frame the synthesis around that decision"
  • Quotes selected are too extreme or unusual → Ask for replacements: "Quote #3 for Theme A is too extreme to use in a client deck. Replace it with a more representative example."

Variations

  • Simpler version: Just run Steps 1 and 4 (code + extract quotes) — skip the formal frequency counts and synthesis if you prefer to write that manually
  • Extended version: Add a Step 5 — "Compare these themes to the coded data from our last wave [paste prior wave codes] and identify what's changed"
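For the extended version, it helps to compute the wave-over-wave shift in code frequencies yourself before asking Claude to interpret it. The sketch below is illustrative only: `wave_shift` is a hypothetical helper, the current-wave figures are taken from the churn example in this guide, and the prior-wave figures are invented purely for demonstration.

```python
def wave_shift(prior: dict[str, float], current: dict[str, float]) -> list[tuple[str, float]]:
    """Percentage-point change per code (current minus prior), largest absolute shift first."""
    codes = set(prior) | set(current)
    # A code missing from one wave is treated as 0% in that wave.
    shifts = [(code, current.get(code, 0.0) - prior.get(code, 0.0)) for code in codes]
    return sorted(shifts, key=lambda item: (-abs(item[1]), item[0]))


current_wave = {"Price/Value": 34, "Customer Service": 28, "Competitor Switch": 19, "Product Gap": 12}
# Hypothetical prior-wave percentages, for illustration only.
prior_wave = {"Price/Value": 29, "Customer Service": 31, "Competitor Switch": 19, "Product Gap": 14}

for code, change in wave_shift(prior_wave, current_wave):
    print(f"{code}: {change:+.0f} pts")
```

Shifts are only comparable when both waves were coded with the same codebook. If the codebook changed between waves, recode the prior wave first.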

What to Do Next

  • This week: Run this pipeline on your next open-ended question with 100+ responses and time the difference
  • This month: Standardize the prompt chain for your most common question types (churn drivers, satisfaction reasons, brand perceptions)
  • Advanced: Combine with a Claude Project for the client engagement so the codebook and prior wave context are pre-loaded, eliminating setup time entirely

Advanced guide for Research Analyst professionals. Claude Pro recommended for datasets over 200 responses.