Relay: an AI teammate with a graph-based brain
Relay was an AI teammate. You gave it a one-line task and it owned it: it read the code, then reached out to the right people by reading its team graph. Here is how it worked, shown through a week we simulated with it running a team's coordination.
What it was
Most "AI teammates" are a chat box with a nice name. You ask, they answer. Relay was the other thing. You delegated, and it drove the work.
The goal was one line:
Multiplayer, seamlessly cross-platform, behaves like a real human teammate. A senior engineer who drives work forward by asking the right people the right questions at the right time, and keeps everyone who's affected in the loop.
To see if it actually did that, we ran a full work week. A 6-person startup. One human played all six people and sent realistic messages. Relay ran live on its real architecture, on Slack and Linear, exploring the codebase as it went. Nothing was scripted. The quotes below are from that run.
It read the code, then proposed. It did not ask.
Monday morning. The founder dropped one line on a Linear issue:
Nithin (Linear BG-42):
Webhook delivery is failing for ~3% of requests during peak
traffic. Customers are noticing. Fix this.
A normal bot would ask a clarifying question. Relay traced the code first. It followed the request path through the handler, the database lookups, and the queue. Then it came back with a diagnosis, not a question:
Relay (on BG-42):
I traced the webhook path in code. This looks like ingress
saturation under peak load, not a delivery-only problem. The
handler is still doing synchronous Postgres lookups before the
Redis enqueue, so under load it returns non-200 before the event
ever reaches the worker queue.
Already done:
- moved BG-42 to In Progress, marked it urgent
- posted the code-grounded status on the issue
- pinged Marcus with the proposed mitigation path
- pinged Dana so support has a customer-facing read on impact
It diagnosed the bug, started the issue, and pulled in the engineer who owned the code and the support lead who owned the customer. From one sentence. The founder's read on it:
I assigned it and got a code trace back in 2 minutes. It already pinged Marcus and support.
This is the whole idea. It did the first hour of the work before it said anything.
One message in, the right people out
Later that morning, the support lead hit Relay with a crisis:
Dana (Slack DM):
Acme Corp CEO just emailed, webhooks broken for 3 days,
threatening to churn. $2k/mo account. Two support tickets went
unanswered. What do I tell them and who should know?
Relay wrote Dana customer-safe copy to send, plus a short list of what not to promise yet. Then it did the part I cared about. One question from one person fanned out to the right people, each with the context their role needed.
It did not ask Dana who to notify. It already knew, because it had a graph of who owned what. It also connected the complaint to the webhook bug it had been tracing since 9am. Dana's reaction:
I asked one question and got customer email copy, escalation routing, and three people already notified. I've never had a tool do that.
It owned the whole situation, not one message
The best part only showed up across the week. Relay did not treat each message as a fresh request. It connected them.
Monday, the frontend engineer was tired of a recurring bug:
Li (Slack DM):
The Linear integration setup is broken again. Users click Connect,
do the OAuth flow, come back to a blank page. Third time this month.
I don't have time to debug this right now, can you figure out why it
keeps happening?
Relay traced the OAuth flow and found the root cause: the callback was frontend-owned and fragile, and an env var drifted between environments, so the bug kept coming back. Then it kept pulling the thread:
- Tuesday. Customer success reported a 40% onboarding drop-off. Relay connected it to Li's OAuth bug, and told Li and the PM to treat it as a conversion blocker, not a UI bug.
- Thursday. Sales closed a new customer starting the next week. Relay flagged that they would hit the same OAuth flow, gave Li a deadline, and asked CS to start the workspace setup.
- Friday. The founder asked "what happened this week?" Relay returned a board-level summary that tied the webhook bug to the Acme escalation, and the OAuth bug to the new customer and the 40% drop-off.
The frontend engineer, who had only asked one question on Monday:
I said "I don't have time to debug this" and got the root cause, why it recurs, and a fix plan. Then two days later it connected my bug to Sam's 40% drop-off metric, I didn't even know about that. It saw the whole board, across people I had not even talked to.
Over the week it ran 4 workstreams at once, across 8 surfaces, and sent about 25 outbound messages to coordinate. Nobody asked it to. That is what I mean by owning a situation.
Cracking Slack DMs
Here is the unglamorous problem that breaks most agents: a Slack DM is ambiguous.
Sometimes a DM is an open conversation. "What's the status on webhooks?" Sometimes a DM is a reply about one specific task. "PR is up, pool bumped to 50." An agent that treats every DM the same either loses the thread or files everything into one giant blob. Relay had to tell them apart, in real time, across many people and many tasks at once.
It solved this with surfaces and two-tier routing. Every place a message can happen gets a canonical key:
slack:dm:U_nithin # an open conversation with a person
slack:thread:C_eng:1700000000.000100 # a thread, can belong to one task
linear:issue:BG-42 # a Linear issue
A top-level DM stays an open conversation. The brain reads it with judgment, the way you read a Slack ping. But a thread can be wired to a task. When Relay reached out about a task, it did so in a thread and told the person where to reply:
Relay: "@marcus re: the webhook retry backoff.
Reply in this thread so I can track it."
Now every reply in that thread routes to that task with zero ambiguity. For a surface it has never seen, the brain decides which task it belongs to once, then records the mapping. After that it is a plain lookup.
The brain pays for judgment once per conversation. After that, routing is free. That is why one Relay could juggle six people and four tasks without crossing wires.
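To make the two tiers concrete, here is a minimal Go sketch, not Relay's actual code. The `Router` type, its `decide` callback (standing in for the brain's judgment), and the rule that any key starting with `slack:dm:` is an open conversation are all assumptions for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// Router is a hypothetical stand-in for the engine's surface-to-task map.
// decide plays the role of the brain: it returns a task ID, or "" for none.
type Router struct {
	bySurface map[string]string
	decide    func(surface, text string) string
}

func NewRouter(decide func(surface, text string) string) *Router {
	return &Router{bySurface: make(map[string]string), decide: decide}
}

// Route returns the task for a message and whether the brain was consulted.
func (r *Router) Route(surface, text string) (task string, askedBrain bool) {
	// Assumption: a top-level DM is an open conversation, so it is judged
	// on every message and never pinned to one task.
	if strings.HasPrefix(surface, "slack:dm:") {
		return r.decide(surface, text), true
	}
	// Tier 1: a surface seen before is a plain lookup.
	if known, ok := r.bySurface[surface]; ok {
		return known, false
	}
	// Tier 2: the brain decides once, and the mapping is recorded.
	task = r.decide(surface, text)
	if task != "" {
		r.bySurface[surface] = task
	}
	return task, true
}

func main() {
	r := NewRouter(func(surface, text string) string { return "BG-42" })
	thread := "slack:thread:C_eng:1700000000.000100"
	for _, msg := range []string{"PR is up", "pool bumped to 50"} {
		task, asked := r.Route(thread, msg)
		fmt.Printf("%q -> %s (asked brain: %v)\n", msg, task, asked)
	}
}
```

In this sketch only the first message in the thread reaches `decide`; the second is answered from the map.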
A few-line kernel, oriented within a graph-based brain
Relay did not run on a giant system prompt full of rules. It ran on a tiny kernel. A short identity, and one paragraph telling it how to read an event:
You receive events as JSON from different people across platforms.
Each event has: who, name, role, platform, surface, intent, text.
Unknown users show a raw platform ID. Use platform tools to find
out who they are.
That is most of the contract. Everything else, the brain fetched for itself.
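As a rough illustration of that contract, the event could be decoded in Go along these lines. Only the seven field names come from the kernel text. The JSON keys, the all-string types, the sample values, and the `IsUnknownUser` helper are assumptions, not Relay's real schema.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Event mirrors the seven fields the kernel names. The JSON keys and the
// string types are assumptions; the source only lists the field names.
type Event struct {
	Who      string `json:"who"`      // assumed: platform ID of the sender
	Name     string `json:"name"`     // assumed: display name once resolved
	Role     string `json:"role"`
	Platform string `json:"platform"` // e.g. "slack" or "linear"
	Surface  string `json:"surface"`  // canonical surface key
	Intent   string `json:"intent"`
	Text     string `json:"text"`
}

// ParseEvent decodes one event payload.
func ParseEvent(raw []byte) (Event, error) {
	var ev Event
	err := json.Unmarshal(raw, &ev)
	return ev, err
}

// IsUnknownUser is a hypothetical check for the "raw platform ID" case:
// it assumes an unresolved user has no name, or a name equal to the ID.
func IsUnknownUser(ev Event) bool {
	return ev.Name == "" || ev.Name == ev.Who
}

func main() {
	// Sample payload; the intent value is invented for illustration.
	raw := []byte(`{"who":"U_nithin","name":"Nithin","role":"founder",
		"platform":"slack","surface":"slack:dm:U_nithin",
		"intent":"question","text":"What's the status on webhooks?"}`)
	ev, err := ParseEvent(raw)
	if err != nil {
		panic(err)
	}
	fmt.Println(ev.Surface, IsUnknownUser(ev))
}
```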
Picture the context window as a desk. The engine puts one document on it: the task file, or the person who just spoke. The brain then walks into the library and pulls only what this event needs. The library is a graph of plain files, linked like a wiki. Every link carries the reason to follow it:
# inside the webhook task file
## People
- [[marcus]](backend engineer, owns the fix. reach out for status.)
- [[dana]](support lead, tracking the Acme escalation.)
## Related Work Items
- [[email-notifications]](blocked on this fix landing first.)
The brain reads the reason and decides if the hop is worth it for this event. One or two reads, not the whole graph. A big prompt degrades as the context fills up. A small kernel plus a graph it can navigate does not. The brain stays oriented by reading, the same way a new hire does.
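The link format above is simple enough to pull apart with one regular expression. The sketch below is an assumption about how such links could be extracted, with a hypothetical `Link` type and `ParseLinks` function. In Relay the model read these files directly, so this only shows what a hop and its reason look like as data.

```go
package main

import (
	"fmt"
	"regexp"
)

// Link is one hop in the graph: where it goes and why to follow it.
type Link struct {
	Target string
	Reason string
}

// linkRe matches the [[target]](reason) form shown in the task file.
var linkRe = regexp.MustCompile(`\[\[([^\]]+)\]\]\(([^)]*)\)`)

// ParseLinks returns every link in a file, in document order.
func ParseLinks(doc string) []Link {
	var links []Link
	for _, m := range linkRe.FindAllStringSubmatch(doc, -1) {
		links = append(links, Link{Target: m[1], Reason: m[2]})
	}
	return links
}

func main() {
	doc := `## People
- [[marcus]](backend engineer, owns the fix. reach out for status.)
- [[dana]](support lead, tracking the Acme escalation.)
## Related Work Items
- [[email-notifications]](blocked on this fix landing first.)`
	for _, l := range ParseLinks(doc) {
		fmt.Printf("%s: %s\n", l.Target, l.Reason)
	}
}
```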
The team graph is the directory
The "right people out" behavior came from one place: person files. Each person had a file with their role, what they own, and what they get to decide.
# team/marcus.md
## Decision Authority
- Infrastructure changes: decides
- API contracts: recommends, dana approves
- Deployment schedule: recommends, nithin approves
When Relay needed the right person, it did not call a "find expert" tool. It read person files and decided. The graph was the directory. And it kept the directory current: when it learned that someone owned an area, it wrote that back, so the next task started smarter.
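For a sense of how little structure a person file needs, here is a hedged Go sketch that reads the Decision Authority section into a map. Relay's brain read these files as text and decided for itself, so the `Authority` type, `ParseAuthority`, and `FinalSay` are hypothetical helpers, and the "recommends, X approves" grammar is inferred from the three example lines only.

```go
package main

import (
	"fmt"
	"strings"
)

// Authority records what a person may do in one area.
type Authority struct {
	Action   string // "decides" or "recommends" in the example file
	Approver string // empty when no approver is named
}

// ParseAuthority reads "- area: action[, name approves]" bullets from the
// "## Decision Authority" section of a person file.
func ParseAuthority(doc string) map[string]Authority {
	out := make(map[string]Authority)
	inSection := false
	for _, line := range strings.Split(doc, "\n") {
		line = strings.TrimSpace(line)
		if strings.HasPrefix(line, "## ") {
			inSection = line == "## Decision Authority"
			continue
		}
		if !inSection || !strings.HasPrefix(line, "- ") {
			continue
		}
		area, rest, ok := strings.Cut(strings.TrimPrefix(line, "- "), ":")
		if !ok {
			continue
		}
		action, approval, _ := strings.Cut(strings.TrimSpace(rest), ",")
		a := Authority{Action: strings.TrimSpace(action)}
		if f := strings.Fields(approval); len(f) == 2 && f[1] == "approves" {
			a.Approver = f[0]
		}
		out[strings.TrimSpace(area)] = a
	}
	return out
}

// FinalSay returns who signs off on an area listed in person's file.
func FinalSay(person string, a Authority) string {
	if a.Approver != "" {
		return a.Approver
	}
	return person
}

func main() {
	doc := `# team/marcus.md
## Decision Authority
- Infrastructure changes: decides
- API contracts: recommends, dana approves
- Deployment schedule: recommends, nithin approves`
	auth := ParseAuthority(doc)
	fmt.Println(FinalSay("marcus", auth["API contracts"]))
}
```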
Under the hood: a dumb engine, a smart brain
The approach above needed a strict split. The engine does no thinking. The brain does all of it.
The engine is plain Go. It takes a webhook and runs a fixed pipeline. The brain is one LLM invocation per event, run as a function-calling loop, with tools to read and write the graph and to act on each platform.
The exact parts are code: no double processing, no two events writing one file at once, no infinite loops. The fuzzy parts are the model: is this about the bug, does this person need to know. Each side does what it is good at.
And there was no database for any of it. Every task, person, and decision was a plain file. The two maps the engine needs to route, surface to task and platform ID to person, were derived from those files and held in memory, rebuilt after every loop. Git committed every change, and a separate append-only log recorded every decision as one JSON line. One source of truth, no drift.
Lesson: if the model is your memory layer, give it a memory it can read and write. Plain files beat a schema it has to round-trip through.
Where it was heading: a coding agent
In the simulation, Relay explored code read-only. It could trace a path and diagnose a bug, but a human still wrote the fix.
The next step was to let it write the fix too. The design ran Pi, a coding agent, as another teammate. Relay would spawn it in a git worktree to change code and open a PR, with the PR becoming just another surface on the task. The point was not generic code review. By the time a PR landed, Relay already knew why it existed, so it could review against intent:
Generic agent: "Consider adding error handling to this function."
Relay: "Marcus said the retry count should be 3 (Slack, March 20),
but this hardcodes 5. Check with him before merging."
That part stayed mostly design. The coordination layer is what we actually proved.
Was it worth building?
The architecture held up. The product did not.
It was close, not done. The coordination worked across the simulated week, but it still needed polish before real production use. The loop back to the delegator was inconsistent, a couple of new bugs got investigated but never tracked, and plenty of edges were still rough. We shut the project down before that work happened.
We shut it down after a few months. Two reasons pushed us there, and neither was the architecture.
Trust. An AI teammate that messages your engineers and routes your escalations has to be trusted by the team. We were an unknown team asking for a lot of that on day one. Adoption stalled there, before the architecture got to matter.
The platform under us moved. While we built the orchestration layer, the model providers built agent orchestration into their own products. An independent layer like ours got commoditized.
What I would keep: the tiny kernel and the graph it navigates itself. Letting the model read its way to context, instead of pre-loading everything, is the part I would reach for again on the next agent I build.
What I would change: build less of it. We designed sleep cycles, knowledge tiers, and graph governance for a scale we never reached. The lean version was the whole idea. A small kernel, a graph, and a brain that reads and writes it. The rest was building for a team we did not have yet.