how we taught a slack bot to run code (safely-ish)

February 25, 2026

this blog post was a collaboration between two people: devarsh built gorkie itself, and i led the sandbox/runtime work, with devarsh helping throughout on testing, ideas, and sanity checks.

intro

gorkie started as a helpful ai slack assistant: answer questions, help with tasks, and be useful in threads. then users started asking for things that needed real execution, not just text.

examples:

  • "run this script"
  • "convert this file"
  • "download this thing"
  • "inspect this repo"
  • "do the annoying terminal thing for me"

at some point, you either say "sorry, i can't" or you give it a sandbox. we chose sandboxes.

what stayed constant

from phase 1 onward, a few things stayed important regardless of provider or agent:

  • syncing attachments into the sandbox
  • showing tool status (so users can see progress, not just "thinking…")
  • streaming updates back to slack instead of waiting for one giant final blob
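one way to sketch the streaming piece: accumulate model deltas and only edit the slack message every few chunks instead of once per token. this is an illustrative sketch, not gorkie's actual code (the names and batch policy are hypothetical, and in practice you would also throttle by time to respect rate limits):

```typescript
// Hypothetical sketch: coalesce streaming deltas into periodic message edits
// instead of one chat.update per token. Names are illustrative, not gorkie's.
type PostUpdate = (text: string) => void;

class StreamCoalescer {
  private buffer = "";
  private sinceLastPush = 0;

  constructor(
    private postUpdate: PostUpdate, // e.g. a wrapper around slack chat.update
    private batchSize = 20,         // push an edit every N deltas
  ) {}

  push(delta: string): void {
    this.buffer += delta;
    this.sinceLastPush += 1;
    if (this.sinceLastPush >= this.batchSize) {
      this.flush();
    }
  }

  // force a final edit so the message always ends with the full text
  flush(): void {
    if (this.sinceLastPush > 0) {
      this.postUpdate(this.buffer);
      this.sinceLastPush = 0;
    }
  }
}
```

the key property is that the last flush always carries the full accumulated text, so a dropped intermediate edit never loses content.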

we also use slack's ai implementation for the chat surface and interaction model, which lets us render interactive ui directly in slack.

phase 1: vercel sandboxes

we were already using vercel's ai sdk, so vercel sandboxes were the obvious first step. they had a generous free tier and were easy to try.

we gave gorkie a sandbox tool that spawned a ToolLoopAgent with tools to work inside the sandbox.

the flow looked like this:

  1. model decides it needs execution
  2. it calls a sandbox tool
  3. the sandbox runs a subagent and completes the task
  4. results go back to gorkie, then to slack
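the loop above can be sketched as plain sequencing. everything here is a stand-in for illustration, not gorkie's real internals:

```typescript
// Illustrative sketch of the phase-1 flow. All helpers are hypothetical
// stand-ins; the real version is async and runs a ToolLoopAgent subagent.
type Step = (log: string[]) => string;

function runSandboxTask(
  needsExecution: boolean,
  steps: { getSandbox: Step; runSubagent: Step },
): { log: string[]; result: string } {
  const log: string[] = [];
  if (!needsExecution) {
    return { log, result: "answered in chat" }; // step 1: no tool call needed
  }
  steps.getSandbox(log);                        // step 2: the sandbox tool fires
  const result = steps.runSubagent(log);        // step 3: subagent does the work
  return { log, result };                       // step 4: back to gorkie, then slack
}
```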

in code, this looked like get-or-create with snapshot restore plus redis TTLs for thread persistence:

// source: https://github.com/imdevarsh/gorkie-slack/blob/feat/sandbox/server/lib/ai/tools/execute-code/sandbox.ts
// reuse a live sandbox for this thread if one is still running
const live = await reconnect(ctxId);
if (live) {
  return live;
}

// otherwise try restoring the last snapshot, falling back to a fresh sandbox
const restored = await restoreFromSnapshot(ctxId);
const instance =
  restored ??
  (await Sandbox.create({
    runtime: config.runtime,
    timeout: config.timeoutMs,
  }));

// remember which sandbox this thread owns, with a TTL for cleanup
await redis.set(redisKeys.sandbox(ctxId), instance.sandboxId);
await redis.expire(redisKeys.sandbox(ctxId), config.sandboxTtlSeconds);

this made per-thread persistence possible before we had a full DB session model.

on shutdown, we also snapshot and save the snapshot id with a TTL:

// https://github.com/imdevarsh/gorkie-slack/blob/feat/sandbox/server/lib/ai/tools/execute-code/sandbox.ts
const snap = await instance.snapshot().catch((error: unknown) => {
  logger.warn({ sandboxId, error, ctxId }, 'Snapshot failed');
  return null;
});

if (snap) {
  await redis.set(redisKeys.snapshot(ctxId), snap.snapshotId);
  await redis.expire(redisKeys.snapshot(ctxId), config.snapshotTtlSeconds);
}

this let us resume work even after the running sandbox died.
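restoreFromSnapshot itself isn't shown above; here is a minimal sketch of what a helper like it might do, with `createFromSnapshot` as a hypothetical stand-in for the real provider call:

```typescript
// Hypothetical sketch of restoreFromSnapshot: look up the saved snapshot id
// in redis and rebuild a sandbox from it. `createFromSnapshot` is a stand-in;
// the actual provider API differs.
type RedisLike = { get(key: string): Promise<string | null> };
type SandboxLike = { sandboxId: string };

async function restoreFromSnapshot(
  ctxId: string,
  redis: RedisLike,
  createFromSnapshot: (snapshotId: string) => Promise<SandboxLike>,
): Promise<SandboxLike | null> {
  const snapshotId = await redis.get(`snapshot:${ctxId}`);
  if (!snapshotId) {
    return null; // no snapshot saved for this thread; caller creates fresh
  }
  return createFromSnapshot(snapshotId);
}
```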

it worked, but it felt janky for anything beyond short runs.

what sucked

  • not as advanced as e2b/daytona-style sandboxes
  • limited lifecycle controls
  • snapshot support was not where we needed it
  • random bugs and weirdness
  • easy-to-hit limits (bandwidth, storage, etc.)

phase 1 proved sandboxes made gorkie much more useful, but it wasn't the foundation we wanted.

phase 2: e2b

next we moved to e2b. e2b is built for ai-agent execution, and it felt like a more serious provider.

we kept the same architecture (tool loop agent), only swapping the sandbox layer. overall, this phase worked fine for users.

at the data layer, we made thread ownership explicit with a thread-to-sandbox session table:

// source: https://github.com/imdevarsh/gorkie-slack/blob/main/server/db/schema.ts
export const sandboxSessions = pgTable('sandbox_sessions', {
  threadId: text('thread_id').primaryKey(),
  sandboxId: text('sandbox_id').notNull(),
  status: text('status').notNull().default('creating'),
  pausedAt: timestamp('paused_at', { withTimezone: true }),
  resumedAt: timestamp('resumed_at', { withTimezone: true }),
  destroyedAt: timestamp('destroyed_at', { withTimezone: true }),
});

this made thread -> sandbox ownership durable across restarts. cleanup, though, still lived on our side via janitor jobs for expired sessions, and that is exactly where the operational burden started to grow.
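a janitor pass over that table boils down to a filter plus a teardown call. a sketch, where `lastActiveAt` and the ttl policy are hypothetical, not the actual schema:

```typescript
// Hypothetical sketch of the janitor filter we ran on top of e2b: find
// sessions idle past their TTL so they can be torn down. Field names only
// loosely mirror the real schema.
type Session = {
  threadId: string;
  lastActiveAt: number;        // epoch millis of last use (hypothetical field)
  destroyedAt: number | null;  // already cleaned up if non-null
};

function findExpired(sessions: Session[], now: number, ttlMs: number): Session[] {
  return sessions.filter(
    (s) => s.destroyedAt === null && now - s.lastActiveAt > ttlMs,
  );
}
```

the actual job would then destroy each expired sandbox and mark the row; the point is that this scheduling and bookkeeping was all ours to run.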

but then reality hit

as usage grew, we realized the provider was not the biggest issue. the bigger issue was our tool loop architecture.

it handled short tasks, but it struggled with long-term runtime needs:

  • compaction
  • skills
  • mcp plumbing
  • coordination and orchestration
  • stateful session handling
  • retries and failure recovery without spaghetti

in practice, we were slowly building our own runtime by accident.

we also hit provider-level friction with e2b:

  • we had to keep extending timeouts
  • persistence existed, but it was beta and sometimes buggy
  • the model felt closer to "create sandbox, kill sandbox" than true persistence

what we wanted was persistence-first from day one.

phase 3: daytona

then we switched to daytona. daytona is built around lifecycle control and persistence, which matched our model much better.

daytona fit because:

  • great dx
  • persistence-first design
  • cleaner lifecycle controls
  • a natural mapping to "this thread owns a runtime"

state model

we persist sandbox state in our database so each thread can reattach to its runtime. the core model is:

  1. threadId -> sandboxId + sessionId
  2. state transitions (active, paused, resuming, error, destroyed)
  3. automatic reattach on the next message in the same thread
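the reattach step reduces to mapping the stored status onto a lifecycle action. a sketch with illustrative action names (the statuses follow the state list above):

```typescript
// Sketch of the reattach decision on a new message in a thread. Action names
// are illustrative, not gorkie's actual identifiers.
type Status = "active" | "paused" | "resuming" | "error" | "destroyed";
type Action = "attach" | "resume" | "wait" | "recreate";

function nextAction(status: Status | undefined): Action {
  switch (status) {
    case "active":
      return "attach"; // runtime is live; just reconnect to it
    case "paused":
      return "resume"; // a stopped sandbox can be woken up
    case "resuming":
      return "wait"; // another message already triggered a resume
    case "error":
    case "destroyed":
    case undefined:
      return "recreate"; // no usable runtime; start fresh
  }
}
```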

daytona lifecycle settings made cleanup simple:

  • auto-stop after 5 minutes of inactivity
  • auto-archive 2 hours after stop
  • auto-delete after 2 days

no custom janitor needed.

with daytona, lifecycle and session mapping became much simpler in code:

// source: https://github.com/imdevarsh/gorkie-slack/blob/feat/daytona-pi/server/lib/sandbox/session.ts
const sandbox = await daytona.create({
  autoStopInterval: config.timeouts.stopMinutes,
  autoArchiveInterval: config.timeouts.archiveMinutes,
  autoDeleteInterval: config.timeouts.deleteMinutes,
  snapshot: SANDBOX_SNAPSHOT,
});

await upsert({
  threadId,
  sandboxId: sandbox.id,
  sessionId: session.id,
  status: 'active',
});

cleanup shifted from our cron logic to platform lifecycle settings.

once daytona was in place, the next question was: what coding agent should run inside the sandbox?

phase 4: sandbox agent + opencode

we tried sandbox agent by rivet.dev, with opencode behind it.

sandbox agent runs inside the sandbox, exposes an http interface, and communicates over acp. to keep sandbox-agent stable, we had to boot, health-check, and restart it when needed. that made the server-in-sandbox shape survivable, but it was still extra moving parts.
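the babysitting can be sketched as a bounded supervisor, where `check` and `restart` are stand-ins for the real health probe and boot command:

```typescript
// Hypothetical sketch of the supervision we did for sandbox-agent: probe the
// in-sandbox http server and restart it a bounded number of times before
// giving up. `check` and `restart` are stand-ins for real probes/boots.
async function superviseOnce(
  check: () => Promise<boolean>,
  restart: () => Promise<void>,
  maxRestarts: number,
): Promise<boolean> {
  for (let attempt = 0; attempt <= maxRestarts; attempt++) {
    if (await check()) {
      return true; // agent is healthy
    }
    if (attempt < maxRestarts) {
      await restart(); // kick it and probe again
    }
  }
  return false; // still down after exhausting the restart budget
}
```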

why it was better

it offloaded orchestration to something purpose-built.

why we still moved on

it added too many failure points:

  • an http server inside the sandbox
  • more moving parts
  • more "if this dies, everything dies"

opencode also felt over-engineered for our use case and used more tokens than we wanted.

phase 5: pi with sandbox agent

next we tried pi (a coding agent) with sandbox agent.

pi gave us what we cared about:

  • skills
  • mcps
  • easy extensibility

this phase was better, but the core shape was still "http server inside sandbox," and we still saw reliability issues. crashes happen. we wanted boring infra.

phase 6: pi rpc

the cleanest setup was rpc with pi.

at this point, our two integration paths were acp/http-server mode or rpc, and rpc won for simplicity.

rpc is basically:

  • send json
  • get json back

no extra daemon, no internal http listener to babysit, just a process.

what we built

we built a small rpc client inspired by pi's own client.

flow:

  1. gorkie decides execution is needed
  2. we spin up or attach to the thread's daytona sandbox
  3. we start pi in rpc mode
  4. rpc client sends prompt + context
  5. pi runs commands/edits
  6. rpc returns structured output
  7. we stream progress/results back to slack

rpc path details:

  • launch pi --mode rpc inside a PTY
  • disable PTY echo so JSON is not mirrored back
  • use newline-delimited JSON messages
  • resume with pi --mode rpc --session <id>

the rpc boot path was just PTY startup, stty -echo, and optional session resume:

// source: https://github.com/imdevarsh/gorkie-slack/blob/feat/refactor-pi/server/lib/sandbox/rpc.ts
const piCmd = sessionId
  ? `pi --mode rpc --session ${sessionId}`
  : 'pi --mode rpc';

await pty.sendInput(`stty -echo; exec ${piCmd}\n`);
await client.waitUntilReady();

this removed the internal http server and cut down the failure points. events came back as newline-delimited json over the pty stream, which kept the transport simple and robust.
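the receiving side of that is a tiny newline-delimited json decoder: buffer raw pty chunks until a full line exists, then parse each complete line. a sketch, not gorkie's actual client:

```typescript
// Minimal NDJSON decoder of the kind the rpc client needs: pty output
// arrives in arbitrary chunks, so buffer until a newline completes a
// message, then JSON.parse each full line. A sketch, not the real client.
class NdjsonDecoder {
  private buffer = "";

  // Feed a raw chunk; returns every complete JSON message it contained.
  push(chunk: string): unknown[] {
    this.buffer += chunk;
    const lines = this.buffer.split("\n");
    this.buffer = lines.pop() ?? ""; // keep the trailing partial line
    return lines
      .filter((l) => l.trim().length > 0)
      .map((l) => JSON.parse(l));
  }
}
```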

this was simple, fast, and reliable.

tools with pi

pi also made tool integration much cleaner.

before (sandbox agent + opencode), tool wiring looked like:

  • define custom mcp server
  • run that MCP server in the sandbox
  • connect the MCP server to the coding agent (opencode)
  • intercept the tool call
  • route it

with pi, we can:

  • define a custom tool
  • plug it in directly
  • add status and metadata
  • extend without inventing more plumbing

for tools, we registered handlers directly (for example showFile for Slack uploads) with validation built in:

// source: https://github.com/imdevarsh/gorkie-slack/blob/feat/refactor-pi/server/lib/sandbox/config/extensions/tools.ts
pi.registerTool({
  name: 'showFile',
  label: 'showFile',
  description:
    'Signal the host to upload a sandbox file to Slack once it is ready.',
  parameters: showFileParams,
  execute: (_toolCallId, params) => {
    const { path, title } = params as Static<typeof showFileParams>;
    if (!nodePath.isAbsolute(path)) {
      throw new Error('showFile.path must be absolute');
    }
    return Promise.resolve({
      content: [{ type: 'text' as const, text: `Queued upload for ${path}` }],
      details: { path, title: title ?? null },
    });
  },
});

this made tools first-class instead of pushing them through an MCP server.

current architecture

demo video

lessons learned

the stuff we wish we knew earlier:

  • sandboxes are easy; orchestration is the real boss fight
  • "just build it" becomes "we accidentally built a runtime" very quickly
  • persistence-first is a different world from spin-up/spin-down
  • fewer moving parts wins almost every time
  • running an http server inside your sandbox sounds cool until it isn't

closing

sandboxes upgraded gorkie from a helpful chatbot to something that can actually do work.

moving to daytona + pi rpc made the system:

  • simpler
  • cheaper (tokens + complexity)
  • more reliable
  • easier to extend

now we can spend time on features instead of babysitting infra.

vercel also launched a new Chat SDK that supports multiple runtimes. we may rewrite gorkie with the Chat SDK in the future.

code

the codebase is available on github:

Gorkie Slack (feat/refactor-pi)

credits

this blog post's style is inspired by xyzeva. this is probably the first blog post i've ever written 😭.

Written by

Anirudh & Devarsh
