
Feather · open lab · notebook 001

Experiments

field notes from the lab — unpolished, on purpose

The lab notebook — small builds and probes at the intersection of AI and accessibility, plus a few methods experiments beyond it. Finished, failed, and mid-run. One featured up top; the rest in order, February 2026 on. Logged as I go, not cleaned up after.

Featured

  • EXP-ACC-001 · May 23 – Jun 11, 2026 · 990 trials

    Is AI-generated UI accessible by default?

    The plan
    3 accessibility guidance options on/off (2×2×2 = 8 prompt conditions) × 6 models × 15 runs = 720 trials, plus 3 control arms
    Result
    With a bare prompt, 61% of generated modals had proper dialog markup (role="dialog" + aria-modal). Citing the ARIA APG pattern raised that to 99%.
    Notes
    The question pretty much everyone wants an answer to, right? It's not that simple: the result depends heavily on what the prompt includes.

    Read the field notes →
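Two details in the entry above can be made concrete: the trial arithmetic in the plan, and the markup criterion in the result. What follows is a minimal sketch, not the study's actual harness or scoring code. The option names are placeholders, since the entry does not name the three guidance options. The control-arm count assumes each of the 3 control arms also ran on all 6 models for 15 runs, which is one way to reach the 990 trials in the header. The `has_dialog_markup` helper is hypothetical and checks only the two attributes the result names.

```python
from html.parser import HTMLParser
from itertools import product

# --- Trial arithmetic from the plan ---
# Placeholder names: the entry does not say what the three options are.
GUIDANCE_OPTIONS = ("option_a", "option_b", "option_c")
N_MODELS = 6
RUNS_PER_CELL = 15
N_CONTROL_ARMS = 3

# Every on/off combination of the three options: 2 x 2 x 2 = 8 conditions.
conditions = [
    dict(zip(GUIDANCE_OPTIONS, flags))
    for flags in product((False, True), repeat=len(GUIDANCE_OPTIONS))
]

factorial_trials = len(conditions) * N_MODELS * RUNS_PER_CELL

# ASSUMPTION: each control arm also ran on every model for 15 runs.
control_trials = N_CONTROL_ARMS * N_MODELS * RUNS_PER_CELL
total_trials = factorial_trials + control_trials


# --- Markup criterion from the result ---
class DialogMarkupCheck(HTMLParser):
    """Sets `found` when an element has role="dialog" and aria-modal="true"."""

    def __init__(self):
        super().__init__()
        self.found = False

    def handle_starttag(self, tag, attrs):
        # Attribute names arrive lowercased; valueless attributes are None.
        attr_map = {name: (value or "").strip().lower() for name, value in attrs}
        if attr_map.get("role") == "dialog" and attr_map.get("aria-modal") == "true":
            self.found = True


def has_dialog_markup(html: str) -> bool:
    """Hypothetical helper: True if the HTML contains the dialog markup."""
    checker = DialogMarkupCheck()
    checker.feed(html)
    checker.close()
    return checker.found


if __name__ == "__main__":
    print(len(conditions), factorial_trials, control_trials, total_trials)
    print(has_dialog_markup('<div role="dialog" aria-modal="true">Hi</div>'))
    print(has_dialog_markup('<div class="modal">Hi</div>'))
```

This check is deliberately narrow. A native `<dialog>` element opened with `showModal()` would not pass it, and the entry does not say whether the study counted those.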

Chronology

  • EXP-DSC-001 · Feb 19 – 20, 2026

    AI-assisted design system component identification

    The plan
    5 production sites · detect the design system · inventory components across pages · score complexity
    Result
    identified 15 and 32 component families on the two sites with a design system; none on those without
    Notes
    next steps — move from identification to prioritization
  • EXP-TRE-001 · Feb 27 – Apr 6, 2026

    Designing the day for flow

    The plan
    investigate how effective AI co-planning of my schedule is at optimizing for flow and at anticipating and designing around momentum breakers
    Result
    worked well enough that it's been in my planning ever since
    Notes
    not a magical solution; improved but still iterating & identifying momentum breakers
  • EXP-TRE-002 · Apr 24, 2026 → ongoing

    Calibrating a writing voice with AI as the mirror

    The plan
    AI drafts → I get frustrated with the quality and take over → compare to the published version → recalibrate
    Result
    outgrew the experiment — now a standing tool I run on real drafts
    Notes
    created a skill that incorporates stop-slop, inclusive language, and no-new-invented-hyphenated-terminology
  • EXP-TRE-003 · May 14 – 16, 2026

    Does restructuring an AI agent's instructions improve its output?

    The plan
    same agent, two versions — original prose vs. a restructured rewrite — same task, 7 trials each
    Result
    looked promising on one trial; across 7 the advantage vanished — killed it
    Notes
    A clean negative is still an answer; investigating other methods
  • EXP-ACC-002 · May 23, 2026 → ongoing

    ↳ scales up the accessible-by-default study

    Does an accessibility reference change the quality of code generated?

    The plan
    the rigorous scale-up — fully composed screens, judged by running the code (not just what automated tools catch) · 4 guidance conditions × 7 models × scenarios — designed
    Status
    designed · pre-registering · instrument in build
    Notes
    automated tools only catch so much; this measures what they miss
  • EXP-ACC-003 · May 24, 2026

    Can computer vision reliably name the components on a screen?

    The plan
    a vision detector + Claude vision on real screenshots — name components + read intent
    Result
    promising on the pilot; not yet scaled
  • EXP-TRE-004 · May 27, 2026 → ongoing

    What's the right adversarial architecture to improve an outcome?

    The plan
    4 different Chief of Staff framings × real decisions + multi-turn adversarial rounds with external model check
    Status
    live in production but currently on probation; under review
    Result
    a single critique pass only produced review — real pushback emerged only across multiple back-and-forth turns
  • EXP-ACC-004 · May 28 – 29, 2026

    Four automated tools vs. one deliberately broken dashboard

    The plan
    4 automated testing tools × 8 components with positive and negative controls
    Result
    0 / 8 — none of the four automated tools caught a keyboard failure
    Notes
    results consistent regardless of page and component complexity; needs more investigation
  • EXP-ACC-005 · May 30, 2026 → ongoing

    ↳ grew out of the automated-tools study

    Can a model catch what automated tools can't?

    The plan
    8 components × 2 models × 3 runs — designed
    Status
    instrument in build
    Notes
    first instrument fed false facts → discarded, rebuilding clean
  • EXP-TRE-005 · May 30 – Jun 5, 2026 · methods

    Does semantic search beat plain file search? (3 steps)

    The plan
    three escalating runs — (1) head-to-head, does it win? (2) does it surface what plain search misses? (3) does better recall mean better answers? — vector vs. plain file search; pilot then a 15-run study
    Result
    Mixed: 3 / 6 queries were roughly equal on accuracy; semantic search was more cost-effective and faster
    Notes
    most important takeaway: design agents/skills to fit the task profile
  • EXP-AGT-001 · Jun 3 – 4, 2026 · 43 entries

    AI Engineer Melbourne 2026

    The plan
    send my agent to the conference; explore connections between sessions and my work, document big thinking questions for later
    Result
    My agent Ellis created 200+ connections between the conference sessions and my work and other fields. Truly interesting and mind-extending.
    Notes
    need to catalog and share the many connections for exploration

    Read the field notes →

↓ more gets logged here as I run it