build note · 2026-05-03
Where do agents actually live?
Agent installability is the quiet product problem underneath the agent era: where the agent runs, how a user gets it there, and what it feels like once it has a home.
Most agent conversations still begin with capability. Can the model reason through the task? Can it use tools? Can it recover when a test fails? Those questions matter, but they have started to hide the more physical problem: where does the agent actually live after it becomes useful?
I do not mean that metaphorically. A useful agent needs a surface, a launch point, local context, credentials, files, permissions, a memory of what it was doing, and a path back into the user's normal workflow. Without those, the agent is a demo in a tab. With them, it starts to become software.
App stores became inevitable only after the phone made the surface for apps obvious. Before that, software distribution was mostly a maze of downloads, installers, license keys, folder paths, and little bits of folklore. The iPhone did not make applications interesting by itself. It made their place obvious. A user knew where an app went, how it opened, how it updated, how it was removed, and what it meant for an app to belong to the device.
Agents are near that same threshold. We have spent years improving the thing that thinks. The next layer is the thing that installs.
The install problem is a product problem.
Installability sounds like infrastructure until it breaks. Then it becomes the whole product. A tool that cannot be found again is not installed. A skill that works in one client but disappears in another is not installed. A server that requires a hand-edited config file but gives no feedback when it fails is not installed. It is merely present in a way the user cannot trust.
The hard part is not just writing files to disk. It is creating a loop where the user can answer four questions without thinking: where did this go, how do I invoke it, what can it touch, and how do I undo it. Every successful install surface answers those questions quickly. Every failed one leaves the user spelunking through settings.
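One way to make those four questions concrete is to treat an install as a record that answers each of them. A minimal sketch in Python; the `InstallReceipt` type, its field names, and the example paths are my own illustration, not any real Codex interface:

```python
from dataclasses import dataclass, field

@dataclass
class InstallReceipt:
    """Hypothetical record answering the four install questions."""
    name: str
    installed_to: str                                     # where did this go
    invoke_with: str                                      # how do I invoke it
    permissions: list[str] = field(default_factory=list)  # what can it touch
    uninstall_cmd: str = ""                               # how do I undo it

    def summary(self) -> str:
        return (f"{self.name}: lives at {self.installed_to}, "
                f"run via '{self.invoke_with}', "
                f"touches {', '.join(self.permissions) or 'nothing'}, "
                f"remove with '{self.uninstall_cmd}'")

receipt = InstallReceipt(
    name="hatch-pet",
    installed_to="~/.codex/skills/hatch-pet",   # illustrative path
    invoke_with="/hatch-pet",
    permissions=["filesystem:skills", "image-generation"],
    uninstall_cmd="rm -r ~/.codex/skills/hatch-pet",
)
print(receipt.summary())
```

The point of the shape is that every field maps to one of the four questions, so a surface that renders a receipt has answered all of them at once.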
Codex makes this especially interesting because it is not one surface. OpenAI describes the Codex app as a desktop command center with threads, worktrees, automations, Git support, terminal actions, an in-app browser, image generation, skills, plugins, and app-to-IDE sync. The CLI is a terminal TUI with slash commands, model controls, image inputs, web search, MCP, reviews, cloud tasks, and scripting. The IDE extension brings Codex into the editor with context from open files and selections. Codex web delegates tasks to cloud environments.
That is the four-surface problem: app, CLI, IDE, and web. The user should not have to care which surface owns an install. They should care that the thing they installed follows them to the next place they work.
Codex app — invokes and previews.
Codex CLI — invokes and previews.
IDE extension — invokes and previews.
Codex web — invokes and previews.
config.toml — MCP servers and tools.
$CODEX_HOME — local home for skills, packages, and install receipts.
MCP is the shared doorway.
This is why MCP matters. OpenAI's Codex docs frame Model Context Protocol as the way to connect models to tools and context, and note that Codex supports MCP servers in both the CLI and the IDE extension. The same docs say MCP configuration lives in config.toml and that the CLI and IDE extension share that configuration. That shared file is not glamorous, but it is a product boundary. It is the difference between installing a tool into one window and installing a capability into the user's Codex environment.
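In concrete terms, registering an MCP server is a few lines in that shared file. The `[mcp_servers.<name>]` table with `command`, `args`, and `env` keys follows the documented Codex config shape; the `pets` server name, the package it launches, and the environment variable are placeholders of my own:

```toml
# ~/.codex/config.toml — shared by the Codex CLI and IDE extension
[mcp_servers.pets]
command = "npx"
args = ["-y", "pet-mcp-server"]   # hypothetical server package
env = { "PETS_DIR" = "~/.codex/pets" }
```

Because the CLI and the IDE extension read the same file, adding this table once installs the capability into the environment rather than into a single window.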
The best install systems make scope visible. MCP is promising because it can make the boundary explicit: here is the server, here are the tools, here is the auth model, here is where the configuration lives. A marketplace can build on that only if it respects the boundary. It should not pretend that every surface is the same. It should make the handoff between surfaces legible.
The practical version is boring in a good way. A user finds a thing in the app. They copy or approve one install step. The MCP server is registered. The CLI can see it. The IDE can see it. The app can render the result. There is a status check. There is an uninstall path. The install is a receipt, not a scavenger hunt.
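That boring loop can be sketched as receipt-driven install, status, and uninstall steps. Everything here is hypothetical — the `receipts.json` ledger, the file layout, and the helper names illustrate the shape, not a real Codex interface:

```python
import json
from pathlib import Path

RECEIPTS = Path("receipts.json")  # hypothetical shared install ledger

def load() -> dict:
    return json.loads(RECEIPTS.read_text()) if RECEIPTS.exists() else {}

def install(name: str, surface: str, invoke: str) -> None:
    """Register the artifact and write a receipt every surface can read."""
    receipts = load()
    receipts[name] = {"surface": surface, "invoke": invoke, "status": "installed"}
    RECEIPTS.write_text(json.dumps(receipts, indent=2))

def status(name: str) -> str:
    """The status check: is the thing actually there?"""
    return load().get(name, {}).get("status", "not installed")

def uninstall(name: str) -> None:
    """The undo path: remove the entry, leave no folklore behind."""
    receipts = load()
    receipts.pop(name, None)
    RECEIPTS.write_text(json.dumps(receipts, indent=2))

install("pet-server", surface="app", invoke="/pets")
print(status("pet-server"))   # installed
uninstall("pet-server")
print(status("pet-server"))   # not installed
```

The design choice worth noticing is that every operation goes through the same ledger, so "is it installed?" has one answer no matter which surface asks.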
Pets are a serious case study.
Codex Pets look tiny enough to dismiss, which is why they are useful as a case study. A pet is visual, local, personal, and persistent. It makes the install surface visible in a way a hidden tool rarely does. If the pet can be generated, packaged, selected, summoned, and carried across the places Codex runs, then the same shape can apply to more serious agent artifacts.
OpenAI's Codex app settings page is the important reference point here because it places pets inside the app surface rather than treating them as detached assets. The official curated hatch-pet skill is the other half. It defines a Codex-compatible pet workflow: animated spritesheet creation, validation, preview media, and packaging. The details matter because they turn a pet from a picture into an installable object.
That is the part worth studying. The pet has a file shape. It has QA. It has constraints. It has a package. It has a surface where the user can see whether the install worked. Most agent tools should envy that clarity.
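A "file shape with QA" can be as simple as a required layout plus a check that reports what is missing. This is a sketch of the idea, not the real hatch-pet spec — the file names and the `validate_pet_package` helper are invented for illustration:

```python
from pathlib import Path

# Hypothetical package layout — illustrating a pet having "a file shape",
# not the actual hatch-pet skill's requirements.
REQUIRED = ["manifest.toml", "spritesheet.png", "preview.gif"]

def validate_pet_package(pkg: Path) -> list[str]:
    """Return a list of QA failures; an empty list means the package passes."""
    return [f"missing {name}" for name in REQUIRED if not (pkg / name).exists()]

pkg = Path("ember-fox")
pkg.mkdir(exist_ok=True)
for name in REQUIRED:
    (pkg / name).touch()
print(validate_pet_package(pkg))  # []
```

The value is less in the check itself than in the failure messages: an install surface can show them to the user instead of failing silently.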
The Codex app is winning this particular install-surface argument because it gives the artifact a place to appear. The terminal is excellent for command and control. The IDE is excellent for code context. The web surface is excellent for background delegation. The app is where a persistent companion can become part of the environment instead of another thing to remember.
Marketplaces need install semantics.
If agents are going to have marketplaces, the marketplace cannot only be a catalog. A catalog answers what exists. An install system answers what happens next. The second question is harder and more valuable.
The marketplace layer needs to know which surface the user is in, which runtime can accept the artifact, what permissions are required, whether the user owns the thing, whether it can be previewed before purchase, and whether it can be removed cleanly. It also needs to avoid the worst habit of software stores: treating acquisition as the finish line. In agent software, acquisition is where the dangerous part begins.
This changes what a creator tool has to produce. A creator is not merely uploading an asset. They are publishing something that needs a manifest, a compatibility target, a preview, a policy boundary, and a support story. Even a playful object benefits from this discipline. Especially a playful object, because play often travels farther than a dry developer utility.
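What the creator publishes, then, looks less like an asset and more like a manifest. Here is one hypothetical shape covering the pieces above — every table, field, and command in it is my own invention, not a real Codex or Hatchery format:

```toml
# hypothetical pet.manifest.toml — illustrative only
[artifact]
name = "ember-fox"
kind = "pet"
version = "1.0.0"

[compatibility]
surfaces = ["app"]          # which runtime can accept the artifact
codex_min = "0.40"          # illustrative version floor

[preview]
media = "preview/ember-fox.gif"

[permissions]
filesystem = ["$CODEX_HOME/pets/ember-fox"]
network = []

[support]
uninstall = "codex pets remove ember-fox"   # hypothetical command
contact = "creator@example.com"
```

Each table maps to one of the marketplace's obligations: compatibility answers which surface, permissions answers the policy boundary, preview answers try-before-buy, and support answers clean removal.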
The Hatchery is my attempt to build one small version of that layer around Codex pets. It is not the whole answer. The public build is honest about that: the marketplace is live, the manifest API is live, and the install clients use that same contract. The point is not to wrap a half-finished loop in shinier copy. The point is to make the install boundary visible and then earn it.
The agent has to belong somewhere.
I keep coming back to that word: belong. A useful agent should belong to a project, a device, a workspace, a team, or a user's local operating rhythm. It should have a place in the UI and a place in the filesystem. It should have capabilities that can be inspected. It should leave behind artifacts the user can understand.
Capability got agents to the point where people care. Installability is what will decide whether they become everyday software. The winning agent products will not only answer better. They will live better. They will be easier to place, easier to invoke, easier to trust, and easier to remove.
That work is not cosmetic. It is the bridge between a model that can do something once and a product a person can use every day.