How I'm teaching an AI my writing voice

Teaching an AI your voice usually means copying how your writing looks, with little attention to the reasoning behind your choices. Here's the method I built to capture that reasoning and continuously recalibrate.

Field notes from the experiment behind this post: Calibrating a writing voice with AI as the mirror →

Most ways of getting an AI to write in your voice come down to the same move: feed it a pile of your old work and hope it imitates the surface. It picks up your sentence lengths, your favourite words, and the shape of your paragraphs reasonably well.

But is voice calibration about how I put the words together? I think it's more about WHY I make the choices I make. When I cut a line from an AI draft and I can say why I cut it, that "why" is the durable thing: a value, a rule about how I want to come across, or a line I won't cross. The wording is just where it surfaced. Imitating the wording keeps none of it.

So... as one does, I started building something different that matches the way I think about the problem: a way to teach an AI the reasons behind my writing instead of the look of it. Here's how the method works. It drives an experiment I'm running on my own voice, one of a handful of experiments I'm publishing as I go.

Revisions surface reasoning better than final copy

The strongest signal comes from granular edits and revisions to a draft. That's where the reasoning lives, and it's usually reasoning I'd never have written down on its own, because I didn't know it was a rule until I caught myself applying it.

So the first job is to capture those moments well, and two things matter.

Save the whole sequence of edits, not just the end points. Probably aim for more granular than you think you'd need.
Make sure you're comparing the published version vs the very first draft. (This has got me a few times when I've done a calibration for sending out my newsletter -- I use AI to scaffold the structure, then do all the drafting locally, and then make even MORE changes when I'm in the newsletter interface... you know... "just a few more tweaks" kind of thing)

The cheapest way to do both of these is one I already had on hand: git commits. When I commit a draft as I iterate, the diff holds the what and the commit message holds the why, both captured for free and in order. If you already version your writing, you're sitting on a calibration log you haven't been reading.

The architecture

I'll stop paraphrasing. Here's what runs, in order:

Ellis, my Chief of Staff agent, fires the skill, /voice-calibration, after I publish a piece.
The spine opens a run and calls the playbook.
The playbook works through the job spec's steps.
It consults the watch protocol as it goes.
Promoted patterns land in the calibration file (BASE-CALIBRATION.md).

Below is the real instruction text each one runs on, with only the file paths cleaned up.

The skill

Start to finish

Identify the publish event (artifact, source draft path, publish commit, published version)
Load the as-of-publish source via git show <commit>:<path> (never current on-disk per the source-version-disambiguation principle)
Diff + categorize change types per the watch protocol methods
Cross-reference correction moments against the calibration file
Surface candidate patterns one verdict at a time
Capture verdicts (promote / defer / dismiss) to the right destination

That second step is the one I keep relearning. It reads the version that existed at publish time, pulled straight from the commit, never the current file, which has usually changed since the piece went out.

The job spec

The change types

Categorize change types per the watch protocol methods: word choice (substitutions), structural (sentence/paragraph reordering, length changes), framing (recognition-vs-reframe, audience positioning, no-blame), register (corporate vs. plain, vocabulary level), length (cuts, expansions).

The three signal classes

Final-converged signal (strongest). Claude-baseline (earliest snapshot or Claude's first authored state) → published. Traditional draft-vs-published diff; the endpoint diff is the most-settled voice signal.

Along-the-way signal (instructive). Each consecutive snapshot pair across the sequence is its own learning moment. The arc reveals voice patterns the endpoint diff alone misses — what Derek nudged you off of, what survived multiple revision passes, what flipped late.

Override-of-own-correction signal (high-value flag). When Derek's mid-process correction at step N is itself reversed at step N+M (Derek changes Claude's prose one way mid-flow, then later reverses to a different framing). These reveal where Derek's stated preferences diverge from his settled preferences — first-correction reflexes versus final-considered position.

That last one is an important signal... sometimes I walk back a previous edit, or rewrite completely after reconsideration. This catches the gap between what I initially thought I wanted to say and what I actually ended up saying.

Checking each correction

For each correction moment, ask: does this match an existing principle (confirmation), extend an existing principle (refinement), contradict an existing principle (revisit needed), or reveal something new (candidate)?

Capturing a verdict

Surface candidates one at a time. Present each candidate pattern with: what changed, the canonical incident, the proposed principle text, and the suggested home.

promote → write the new principle text to the calibration file in the suggested section; update the changelog entry at the file's end.
defer → append the candidate to today's dated proposals file (creating the file if it doesn't exist); the user reviews these in batch later.
dismiss → append to the experiment log with the date, candidate text, and the user's reason for dismissing. Future runs see it and don't re-propose the same pattern.

The synthesis gate

Synthesis gate — when a candidate contradicts an active principle already in the calibration file, surface both views (the existing principle + the new evidence) and request explicit verdict on which holds. Never auto-merge contradictions.

Contradictions are where a voice actually lives, so resolving them stays my call, not the script's.

The watch protocol

Every signal here comes from one place: me correcting Claude. That's a narrow lens. If Claude leans on a particular phrase and I cut it every time, "I avoid that phrase" looks like a rule of my voice, when it might just mean Claude overuses it and I keep telling it not to. So I don't lock a pattern in until it shows up somewhere besides my edits to Claude: in my own past writing, or something I have another model generate.

Triangulation thresholds

Single method: Interesting signal, needs more observation

Two methods: Likely pattern, watch for third confirmation

Three+ methods: Locked principle, high confidence

This prevents overfitting to "how Derek corrects AI" vs. capturing "Derek's actual voice."

Technique by context

Voice-Mode / Driving Sessions: Comparative selection (A/B), Anti-voice identification, Constraint exercises

Desktop Deep Work Sessions: AI draft → Derek reacts, Deconstruction, Live annotation

Async / Lower Bandwidth Sessions: Derek free-writes, Cross-context pattern matching, Historical evolution analysis

That watch protocol checks in on BASE-CALIBRATION.md with every run to see if we're hitting thresholds for inclusion or if we need refinements.

And you can too!

Nothing about this method means it only works for me. If you write, and you've started letting an AI draft for you, the moments where you "take the pen back" have some real value. Let them do some of the work.