AI Builders are how you turn an LLM from a capability into a product.
I've been designing AI products for a few years now, and the same pattern keeps showing up. The model usually isn't the bottleneck. The model is calibrated, the evals are green, the latency is fine. What separates products that get adopted from products that don't is the builder surface itself, and whether the actual user (a contact-center operator, a clinician, an underwriter) can compose, test, and trust the thing in their workflow.
That's the work I'm most interested in. Turning AI primitives like tool calls, model confidence, fallback chains, and the seam between agentic and deterministic behavior into surfaces real operators can adopt. Recent work spans regulated digital health, enterprise insurance underwriting, and a Cisco enterprise marketplace serving 1M+ monthly admins.
Designing AI-supported assessments and results experiences for an early-stage women's health platform - translating dense clinical data into clarity for patients and reviewable insight for clinicians.
Two surfaces tuned to opposite needs from a single model output
The hard part
Two users with opposite needs from the same model output. A patient wants clarity and a confident next step. A clinician wants enough nuance and uncertainty exposed to actually trust the model and act on it. The same data has to do both jobs.
Approach
The fix turned out to be one underlying inference, two completely different surfaces. The patient surface hides the calibration metadata and routes them confidently toward a single next action. The clinician surface exposes signal weight, flagged inputs, and easy override paths. Underneath both, we wired in a posture matrix (confidence on one axis, stakes on the other) that decides at runtime how aggressive the UI should be. Same model output, four different behaviors depending on what the user is being asked to do next.
The same inference renders four different ways depending on confidence and stakes.
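A minimal sketch of what that posture matrix could look like. All names here (`Posture`, `resolvePosture`, the 0.8 threshold) are illustrative assumptions, not the product's actual implementation:

```typescript
// Hypothetical posture matrix: confidence x stakes -> how aggressive the UI is.
type Confidence = "high" | "low";
type Stakes = "high" | "low";
type Posture = "assert" | "hedge" | "review" | "defer";

// One cell per (confidence, stakes) pair: four behaviors from one inference.
const POSTURE_MATRIX: Record<Confidence, Record<Stakes, Posture>> = {
  high: { low: "assert", high: "hedge" },  // confident, but soften as stakes rise
  low:  { low: "review", high: "defer" },  // uncertain + high stakes -> hand to a human
};

function resolvePosture(confidenceScore: number, stakes: Stakes): Posture {
  // Threshold is an assumption; a real system would calibrate it per use case.
  const confidence: Confidence = confidenceScore >= 0.8 ? "high" : "low";
  return POSTURE_MATRIX[confidence][stakes];
}
```

The point of making the matrix a plain data structure is that the runtime decision stays inspectable: a clinician-facing surface can render the cell it landed in, while the patient surface only consumes the resulting posture.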
Why it matters here
The shape of this problem keeps showing up in AI work. Technical and non-technical users consuming the same model output through different surfaces, wanting different things from it. Same shape as an AI builder where developers and business operators look at the same primitives (model output, tool calls, evals) and want them surfaced totally differently.
2019 - 2024 · Enterprise B2B · Marketplace at scale · 1M+ users
Cisco App Marketplace
Designed the Cisco App Marketplace alongside a multi-team partnership - navigation, IA, and core discovery flows used by IT teams and partners across the Cisco ecosystem.
Decisions came from data, not opinion
The hard part
At this scale, the seams between flows are where the experience is either great or invisible. Configuration was deep, content was dense, admins were time-poor, and the best decisions about what to ship came from instrumentation rather than opinion.
Approach
We started by aligning on a single event taxonomy across the funnel (browse, configure, purchase, activate) so design, PM, and growth could read the same data and move from the same playbook. From there, every design hypothesis was tied to a specific cohort metric, and we ran multi-arm tests against dropoff and time-to-activate with statistical guardrails. The other half of the job was design system governance across multiple product squads - Figma variables, component contracts, review rituals. Quiet, foundational work that doesn't make the case-study cover, but it's what keeps an experience coherent at this scale.
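As a sketch of what a shared event taxonomy makes possible, assuming a simplified event shape (the stage names come from the funnel above; everything else is hypothetical):

```typescript
// Illustrative shared taxonomy: one event shape that design, PM, and growth read.
type FunnelStage = "browse" | "configure" | "purchase" | "activate";

interface FunnelEvent {
  userId: string;
  stage: FunnelStage;
  action: "attempt" | "complete" | "abandon";
  cohort: string;  // test arm, so every hypothesis maps to a cohort metric
  ts: number;      // epoch ms, which is what makes time-to-activate computable
}

// Time-to-activate for one user: first browse attempt -> first activate completion.
function timeToActivate(events: FunnelEvent[], userId: string): number | null {
  const mine = events.filter(e => e.userId === userId).sort((a, b) => a.ts - b.ts);
  const start = mine.find(e => e.stage === "browse" && e.action === "attempt");
  const end = mine.find(e => e.stage === "activate" && e.action === "complete");
  return start && end ? end.ts - start.ts : null;
}
```

Because every event carries a cohort, the same query answers both the design question ("did the new configure flow help?") and the growth question ("for which arm?").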
Why it matters here
At this scale, the right answer is almost always "ship and measure" rather than "redesign and pray." Craft has to live alongside constraint, and the most useful skill is knowing when to defer your taste to what the data is telling you.
Optimized onboarding and activation for a men's mental health app, using behavioral analytics to find the moments where small design choices made the biggest difference for someone deciding to keep going.
Same data captured. Different posture. Drop-off moves.
The hard part
Mental health onboarding is a balancing act. The user is showing up in a vulnerable moment, and the product needs enough information to actually help them. Every additional question is a small ask of someone who's already taken a meaningful step by being there. Every question we don't ask is one less piece of context to personalize what happens next.
Approach
First move was instrumentation. We tagged every step with attempt, completion, and abandonment events, and switched the primary metric from completion rate to time-to-first-value. Completion rate flatters you; time-to-first-value doesn't. The big iteration was progressive disclosure paired with a copy rewrite. Less clinical-assessment language, more conversational check-in. Same data captured, different posture. We tested each variant against the cohort baseline before rolling out.
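The difference between the two metrics is easiest to see in code. A rough sketch, with a hypothetical event shape and a made-up "first value" step:

```typescript
// Hypothetical onboarding events tagged with attempt/complete/abandon.
interface StepEvent {
  userId: string;
  step: string;
  action: "attempt" | "complete" | "abandon";
  ts: number; // epoch ms
}

// Completion rate: share of users who finished the final step. Flattering,
// because it says nothing about how long the finish took.
function completionRate(events: StepEvent[], finalStep: string): number {
  const started = new Set(events.filter(e => e.action === "attempt").map(e => e.userId));
  const done = new Set(
    events.filter(e => e.step === finalStep && e.action === "complete").map(e => e.userId)
  );
  return started.size === 0 ? 0 : done.size / started.size;
}

// Time-to-first-value: first attempt anywhere -> completing the step where the
// user first gets something back (e.g. a personalized check-in).
function timeToFirstValue(events: StepEvent[], userId: string, valueStep: string): number | null {
  const mine = events.filter(e => e.userId === userId).sort((a, b) => a.ts - b.ts);
  const start = mine.find(e => e.action === "attempt");
  const value = mine.find(e => e.step === valueStep && e.action === "complete");
  return start && value ? value.ts - start.ts : null;
}
```

A variant can hold completion rate flat while cutting time-to-first-value in half; only the second metric catches that, which is why it became the primary one.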
Why it matters here
Adoption in fragile moments is emotional, not just functional. The same dynamic shows up everywhere people are asked to use a new tool while already stretched. A contact-center agent mid-conversation. A clinician mid-shift. A user navigating their own anxiety to take the next step. Designing well for those moments is what makes a tool actually used, not just shipped.
Redesigned underwriting and onboarding for an enterprise insurance system spanning underwriters, brokers, and end customers - three roles with overlapping data and very different stakes.
One state, three role-aware surfaces, no lowest-common-denominator UI
The hard part
Multi-role enterprise platforms work best when each role gets an experience tailored to how they actually think about the work. Force one experience on everyone and you end up with the lowest common denominator. Design for each role over a shared foundation, and every user feels like the product was built for them.
Approach
The shift was treating the underlying schema as a single source of truth, and the role surfaces as projections over it. RBAC-driven view filters. Role-specific event handlers. A shared component library that absorbed the role variance into props instead of forking screens. The real unlock was co-designing the API contract with engineering early, so the design system and the data layer evolved together. When design and data are aligned from the start, the user gets an experience that feels native to their role.
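A toy sketch of the projection idea. The field names and role mapping are invented for illustration, not the real schema:

```typescript
// One record as the single source of truth; role surfaces are projections over it.
type Role = "underwriter" | "broker" | "customer";

interface Submission {
  id: string;
  premium: number;
  riskScore: number;   // underwriter-only signal
  commission: number;  // broker-only signal
  status: "draft" | "submitted" | "bound";
}

// RBAC-driven projection: which fields each role's surface may render.
const ROLE_FIELDS: Record<Role, (keyof Submission)[]> = {
  underwriter: ["id", "premium", "riskScore", "status"],
  broker:      ["id", "premium", "commission", "status"],
  customer:    ["id", "premium", "status"],
};

function projectForRole(record: Submission, role: Role): Partial<Submission> {
  const view: Partial<Submission> = {};
  for (const field of ROLE_FIELDS[role]) {
    (view as any)[field] = record[field];
  }
  return view;
}
```

The role variance lives in one table instead of three forked screens, which is the same move the shared component library made with props.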
Outcome
Onboarding completion increased by 18%.
Why it matters here
The job here was designing for three roles with overlapping data and very different mental models. Same shape as a builder where developers and business operators ask different things of the same primitives. Doing this well means each user feels the product was made for them, while sharing the same foundation underneath.
AI is part of how I design now, not just what I design.
The loop runs in days, not weeks - and most of it touches code.
I prototype with code, not just in Figma.
Claude Code and Cursor are part of my daily workflow. When an idea needs to move faster than Figma allows - or when the interaction depends on async UI, model latency, streaming output, retry, or fallback states - I build a throwaway React + Tailwind prototype with AI in the loop. Output is disposable; the point is to feel the interaction before committing to a polished mock.
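The states such a prototype has to let you feel are roughly these. A sketch, with assumed state names and an assumed retry budget:

```typescript
// The async states a throwaway streaming-UI prototype should exercise.
type StreamState =
  | { kind: "idle" }
  | { kind: "waiting" }                    // request sent, nothing back yet
  | { kind: "streaming"; tokens: string[] }
  | { kind: "retrying"; attempt: number }
  | { kind: "fallback"; reason: string };  // deterministic answer when the model fails

type StreamEvent = { type: "send" } | { type: "token"; token: string } | { type: "fail" };

const MAX_RETRIES = 2; // assumed budget; a real product would tune this

// Pure transition function: easy to drive from a fake-latency harness.
function next(state: StreamState, event: StreamEvent): StreamState {
  switch (event.type) {
    case "send":
      return { kind: "waiting" };
    case "token": {
      const prior = state.kind === "streaming" ? state.tokens : [];
      return { kind: "streaming", tokens: [...prior, event.token] };
    }
    case "fail": {
      const attempt = state.kind === "retrying" ? state.attempt + 1 : 1;
      return attempt > MAX_RETRIES
        ? { kind: "fallback", reason: "retries exhausted" }
        : { kind: "retrying", attempt };
    }
    default:
      return state;
  }
}
```

Keeping the transitions pure means the prototype can replay a network hiccup on demand, which is exactly the part a static Figma frame can't show.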
I treat AI primitives as design material.
System prompt design, context window budgets, tool calls, model confidence and calibration, eval suites, the line between agentic and deterministic behavior, MCP-style server boundaries - these are first-class design variables. The decision of when to expose model uncertainty versus when to hide it is a UX call, not an engineering one.
I pair with engineers in the build, not just the handoff.
I read React and Tailwind well enough to leave PR comments at the line level, propose component contract tweaks, and spot early when the design and the data shape need to evolve together. The shorter the loop between design intent and shipped behavior, the better the product.
I write to think.
Most of the design work happens before any pixels - in problem framing, in mapping ambiguity, in deciding what metric is actually being optimized and what tradeoffs are acceptable. Documents are part of how I design - they let the team align on direction early, when changes are still cheap.
Notes
Things I've been thinking about while shipping AI products.
April 2026 · Builder UX · AI design · Agentic vs deterministic
What an AI builder needs that a regular tool doesn't
On configurability, evaluation, and the line between agentic and deterministic.
Most product design assumes a single user with a single goal. AI builders ask us to widen that assumption.
The person composing the agent is rarely the person it serves. A business operator is wiring up tools for a contact-center agent who will run that agent hundreds of times a day. A developer is shipping a primitive that a non-technical operator will compose into something the developer never imagined. The builder and the user are two different people, and the design has to hold both.
A few things I've come to believe about this:
Configurability is not a feature, it's a constraint. Every option you expose is a decision the user has to make. The art is in choosing which complexity to push down to defaults and which to surface.
Evaluation has to be visible. When operators can see why the agent is making the calls it's making, they trust it. When they trust it, they ship it - and they keep refining it.
The line between agentic and deterministic is a UX call, not just an engineering one. Sometimes the user wants the agent to figure it out. Sometimes they want the rule to fire exactly the same way every time. The interface should make that choice deliberate, and reversible.
The right unit of design isn't the screen, it's the loop. Build, test, observe, iterate. The builder is only as good as the iteration cycle it enables.
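The "deliberate and reversible" point above could be sketched like this. `BuilderStep` and `toggleMode` are hypothetical names, not any real builder's API:

```typescript
// Making agentic vs deterministic an explicit, per-step, reversible setting.
type StepMode = "agentic" | "deterministic";

interface BuilderStep {
  id: string;
  mode: StepMode;
  rule: string | null; // the exact rule that fires in deterministic mode
}

// Reversible: toggling keeps the rule around, so flipping back loses nothing.
function toggleMode(step: BuilderStep): BuilderStep {
  return { ...step, mode: step.mode === "agentic" ? "deterministic" : "agentic" };
}
```

The design choice is that the mode is data the operator owns, not a default buried in the runtime.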
January 2026 · AI tools · Prototyping · Process
Designing with AI, not just designing AI
On Claude Code, Cursor, Figma Make, and the collapse of the design-to-code gap.
For most of my career, the design loop ended at handoff. You'd export the Figma frames, write a spec, and watch the build slowly drift from the intent. The further the spec traveled, the worse the drift.
That's the part of design AI tools changed for me - not the part most people talk about.
Claude Code and Cursor are part of my daily workflow now. Not for production code - for prototypes that test the behavior of an interaction, not just its appearance. Hover states are easy to fake in Figma. Async UI, model latency, retry behavior, what the experience feels like when the network hiccups - those are much easier to design once you've actually felt them. So I build a quick version and play with it.
The output is throwaway. The point isn't a perfect prototype, it's that I've felt the thing before committing to a polished mock. I show up to engineering with "here's the shape I want, here's where it broke when I tried" - a much shorter conversation than handoff comments.
Figma Make, AI inside Figma's prototyping, Cursor's agentic mode - they're all converging on the same thing. The designer who can move from idea to prototype to real interaction without crossing a discipline boundary has a different velocity than the designer who can't. I'd rather be that designer than not.
February 2026 · AI design · Agentic vs deterministic
When AI should be confident, hedge, or defer
AI primitives are design material, not engineering details handed back at the end.
The most interesting design decisions in AI products aren't visual. They're about the posture of the model.
Same model output, same data, very different surfaces - when should the AI sound confident? When should it hedge? When should it step back and hand to a human? Those are not engineering questions.
On a recent clinical assessment product, the patient and clinician needed opposite things from the same model output. The patient needed clarity and a confident next step. The clinician needed enough nuance and uncertainty exposed to actually trust the model and act on it. The model didn't change. The way we exposed it did.
I think a lot about this in the context of agentic versus deterministic systems. Agentic UI tells the user the system has choices and is making them. Deterministic UI tells the user the system is following a rule. Both can be right. But the user's mental model - and their willingness to trust, override, or defer - is determined by which posture you choose.
That's why I treat latency, confidence, evaluation, and fallback paths as first-class design variables. Designing the model's voice is design work. Skipping it just means the engineer makes those decisions instead, and they almost always make them invisibly.
March 2026 · Research · Craft
Customer interviews are the unfair advantage
AI has made ideas cheap. Watching real people work has not.
A few months into using AI heavily in my own design process, I noticed something uncomfortable. I had more design ideas, faster, than I'd ever had before. And almost none of them were better.
The bottleneck moved. It used to be "what should we build?" and AI helps with that. Now it's "what's actually happening in the workflow we're trying to change?" That isn't getting cheaper.
Sitting with an underwriter as they review a submission. Watching a clinician click through three tabs to find one piece of context. Asking a contact-center agent what they were doing the moment a flow broke. None of that scales. None of it can be replaced with a benchmark.
The interviews I find most useful aren't the polished, scripted ones. They're workflow walkthroughs - sit beside the user, watch them work, ask "what just happened there?" when something looks weird. Ten minutes of that is worth an hour of structured Q&A.
The AI products I've seen succeed in production are the ones designed for the actual workflow, not the idea of it. The model is part of the answer; the product around it is the rest. For a senior designer in 2026, fluency with users is the real edge. AI tools are commoditizing the inside-out part of the job. The outside-in part - what the actual person is actually trying to do - is where the value is concentrating.
April 2026 · Startups · Role design · Hiring
Senior product designers are quietly doing PM work
A note for startup founders thinking about hiring.
The "product designer" title at startups is a partial description.
The work that actually moves the product - problem framing, prioritization, deciding what not to ship, talking to customers, defining the success metric - slides toward design when there's no PM in the room. And in most startups under fifty people, there isn't.
Designers fill the gap because design forces specificity. You can't draw a screen without first deciding what it does, who's doing it, and why. The artifact creates the conversation. PMs make the same decisions through docs; designers make them through pixels. Neither is wrong, but in resource-constrained teams the pixel is faster.
If you're hiring "Senior Product Designer" at a startup, what you're often actually hiring is a partial PM with a craft toolkit. The good ones know it. The great ones lean into it - they'll write the PRD if there isn't one, define the metric if there isn't one, run the customer interviews if there isn't a researcher.
This isn't scope creep. It's the actual job. And it's why senior designers from startups are an underrated hire - the operating instincts they build aren't legible on most resumes.