Feather · open lab · notebook 001

Experiments

field notes from the lab — unpolished, on purpose

The lab notebook — small builds and probes at the intersection of AI and accessibility, plus a few methods experiments beyond it. Finished, failed, and mid-run. One featured up top; the rest in order, February 2026 on. Logged as I go, not cleaned up after.

Featured

EXP-ACC-001 · May 23 – Jun 11, 2026 · 990 trials

Is AI-generated UI accessible by default?

The plan
3 accessibility guidance options, each on or off (2×2×2 = 8 prompt conditions) × 6 models × 15 runs = 720 trials, plus 3 control arms
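A quick arithmetic check on that design, as a sketch. The three option names are placeholders (this entry does not name them), and the assumption that each control arm also runs on all 6 models, 15 times each, is mine. Under that assumption the total matches the 990 trials in the header.

```python
from itertools import product

# Placeholder names: the three guidance options are not named in this entry.
OPTIONS = ("option_a", "option_b", "option_c")
MODELS = 6
RUNS = 15
CONTROL_ARMS = 3

# Every on/off combination of the three options: 2 x 2 x 2 = 8 conditions.
conditions = [
    dict(zip(OPTIONS, flags))
    for flags in product((False, True), repeat=len(OPTIONS))
]

factorial_trials = len(conditions) * MODELS * RUNS

# Assumption: each control arm is also run on every model, 15 times.
control_trials = CONTROL_ARMS * MODELS * RUNS

total_trials = factorial_trials + control_trials
print(len(conditions), factorial_trials, control_trials, total_trials)
# prints: 8 720 270 990
```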

Result
With a bare prompt, 61% of generated modals had proper dialog markup (role="dialog" + aria-modal). Citing the ARIA APG pattern raised that to 99%.
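For a sense of what "proper dialog markup" means mechanically, here is a minimal sketch of such a check using only Python's standard library. It is not the instrument used in the study, which is not shown in this entry. Requiring `aria-modal="true"` on the same element as `role="dialog"` is my reading of the criterion, and a native `<dialog>` element is deliberately not counted here.

```python
from html.parser import HTMLParser


class DialogMarkupCheck(HTMLParser):
    """Records whether any element has role="dialog" and aria-modal="true"."""

    def __init__(self):
        super().__init__()
        self.found = False

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value is None for bare attributes.
        attr = {name: (value or "").strip().lower() for name, value in attrs}
        if attr.get("role") == "dialog" and attr.get("aria-modal") == "true":
            self.found = True


def has_dialog_markup(html: str) -> bool:
    """Return True if the HTML contains an element marked up as a modal dialog."""
    checker = DialogMarkupCheck()
    checker.feed(html)
    checker.close()
    return checker.found


# Two hypothetical modals: one bare, one with the markup the result refers to.
bare = '<div class="modal"><h2>Delete file?</h2></div>'
marked = (
    '<div role="dialog" aria-modal="true" aria-labelledby="t">'
    '<h2 id="t">Delete file?</h2></div>'
)
print(has_dialog_markup(bare), has_dialog_markup(marked))
# prints: False True
```

A check like this only sees static attributes. It says nothing about focus handling or keyboard behavior, which need the code to actually run.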

Notes
The question pretty much everyone wants answered, right? And yes, you guessed it: it's not that simple.

Read the field notes →

Chronology

EXP-DSC-001 · Feb 19 – 20, 2026

AI-assisted design system component identification

The plan
5 production sites · detect the design system · inventory components across pages · score complexity

Result
identified 15 and 32 component families on the two sites with a design system; none on those without

Notes
next steps: move from identification to prioritization

EXP-TRE-001 · Feb 27 – Apr 6, 2026

Designing the day for flow

The plan
investigate effectiveness of AI co-planning my schedule to optimize for flow and anticipate/design around momentum breakers

Result
worked well enough that it's been in my planning ever since

Notes
not a magical solution; improved but still iterating & identifying momentum breakers

EXP-TRE-002 · Apr 24, 2026 → ongoing

Calibrating a writing voice with AI as the mirror

The plan
AI drafts → I get frustrated with the quality and take over → compare to the published version → recalibrate

Result
outgrew the experiment — now a standing tool I run on real drafts

Notes
created a skill that incorporates stop-slop, inclusive language, and no-new-invented-hyphenated-terminology

EXP-TRE-003 · May 14 – 16, 2026

Does restructuring an AI agent's instructions improve its output?

The plan
same agent, two versions — original prose vs. a restructured rewrite — same task, 7 trials each

Result
looked promising on one trial; across 7 the advantage vanished — killed it

Notes
A clean negative is still an answer; investigating other methods

EXP-ACC-002 · May 23, 2026 → ongoing

↳ scales up the accessible-by-default study

Does an accessibility reference change the quality of code generated?

The plan
the rigorous scale-up — fully composed screens, judged by running the code (not just what automated tools catch) · 4 guidance conditions × 7 models × scenarios — designed

Status
designed · pre-registering · instrument in build

Notes
automated tools only catch so much; this measures what they miss

EXP-ACC-003 · May 24, 2026

Can computer vision reliably name the components on a screen?

The plan
a vision detector + Claude vision on real screenshots — name components + read intent

Result
promising on the pilot; not yet scaled

EXP-TRE-004 · May 27, 2026 → ongoing

What's the right adversarial architecture to improve an outcome?

The plan
4 different Chief of Staff framings × real decisions + multi-turn adversarial rounds with external model check

Status
live in production but currently on probation; under review

Result
a single critique pass produced only review; real pushback emerged only across multiple back-and-forth turns

EXP-ACC-004 · May 28 – 29, 2026

Four automated tools vs. one deliberately broken dashboard

The plan
4 automated testing tools × 8 components with positive and negative controls

Result
0 / 8 — none of the four automated tools caught a keyboard failure

Notes
results consistent regardless of page and component complexity; needs more investigation

EXP-ACC-005 · May 30, 2026 → ongoing

↳ grew out of the automated-tools study

Can a model catch what automated tools can't?

The plan
8 components × 2 models × 3 runs — designed

Status
instrument in build

Notes
first instrument fed false facts → discarded, rebuilding clean

EXP-TRE-005 · May 30 – Jun 5, 2026 · methods

Does semantic search beat plain file search? (3 steps)

The plan
three escalating runs — (1) head-to-head, does it win? (2) does it surface what plain search misses? (3) does better recall mean better answers? — vector vs. plain file search; pilot then a 15-run study
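Step 2 ("does it surface what plain search misses?") comes down to a set comparison. A minimal sketch follows, assuming each search returns a list of file paths and that a hand-labeled set of relevant files exists per query. The function names and the toy paths are made up for illustration and are not data from the study.

```python
def recall(retrieved, relevant):
    """Fraction of the relevant files that a search returned."""
    relevant = set(relevant)
    if not relevant:
        return 0.0
    return len(set(retrieved) & relevant) / len(relevant)


def only_found_by(a, b, relevant):
    """Relevant files that search `a` returned and search `b` missed."""
    return (set(a) - set(b)) & set(relevant)


# Toy example with invented paths, not results from the experiment.
relevant = {"notes/a.md", "notes/b.md", "notes/c.md"}
plain_hits = ["notes/a.md", "notes/x.md"]
semantic_hits = ["notes/a.md", "notes/b.md", "notes/y.md"]

print(only_found_by(semantic_hits, plain_hits, relevant))
# prints: {'notes/b.md'}
print(recall(semantic_hits, relevant) > recall(plain_hits, relevant))
# prints: True
```

Step 3 is a separate question: higher recall here says nothing yet about whether the final answers improve, so answer quality needs its own scoring.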

Result
Mixed: 3 / 6 queries were roughly equal on accuracy; semantic search was more cost-effective and faster

Notes
most important takeaway: design agents/skills to fit the task profile

EXP-AGT-001 · Jun 3 – 4, 2026 · 43 entries

AI Engineer Melbourne 2026

The plan
send my agent to the conference; explore connections between sessions and my work, document big thinking questions for later

Result
My agent Ellis created 200+ connections between the conference sessions and my work and other fields. Truly interesting and mind-extending.

Notes
need to catalog and share the many connections for exploration

Read the field notes →

↓ more gets logged here as I run it
