
Agent Server

What is PocketHook Agent Server?

The agent server turns PocketHook into a full AI assistant. Instead of writing response logic yourself, you connect an LLM (Claude, GPT, Gemini, etc.) that processes messages, calls tools, and returns structured PocketHook responses — including Shortcut triggers.

The server runs on your own machine. Your data stays with you.

This is a starting point. The server ships with a core set of tools and is designed to be extended by you. Add your own integrations — email, calendars, documents, APIs — and make it yours.

Features

  • Multi-provider LLM — Anthropic, OpenAI, GitHub Copilot, Google, Mistral, Groq, xAI, OpenRouter, Ollama (local), LM Studio (local)
  • OAuth authentication — GitHub Copilot and OpenAI Codex via device code / browser flow
  • Agent tools — Shell commands, file read/write, directory listing, web search, web scraping, dev server management
  • Framework / user split — Framework files (skills/, custom-tools/, config/) stay read-only. Your customizations live under data/user/ (skills, custom tools, instructions, typed prefs). Framework updates land cleanly without overwriting your work
  • Typed user prefs — Store values like your preferred maps app or tunnel domain in data/user/prefs.json. Reference them in skills as {{prefs.key}} and the server substitutes them on load
  • Programming tasks in one call — The run_code_job meta-tool creates a prompt-type background job (run by your configured LLM) and sends the user the ack in a single step, replacing the error-prone “respond + create-job” pattern
  • Typed protocol tools — Six dedicated respond_* tools (respond_text, respond_image, respond_buttons, respond_shortcut, respond_html, respond_sequence), plus typed job tools (create_once_job, create_cron_job) and typed workspace tools (create_project, list_projects, delete_project). Schemas reject malformed URLs, button syntax, and type/schedule combinations before they reach the device
  • Typed writers for customization — create_user_skill and create_custom_tool build the user-layer markdown with correct frontmatter, so the loader always parses them and the agent never hand-writes these files
  • Background jobs — One-time or recurring tasks with cron expressions or simple intervals
  • Dynamic skills — Define shortcuts and behavior rules as .md files. Only a compact index is loaded into the prompt; full content is fetched on demand via the load_skill tool
  • Self-managing skills — The agent can create, edit, and delete skill definitions (writes always land in the user layer)
  • Semantic memory — Vector-based search with embeddings (Ollama, LM Studio, or OpenAI). Memories are auto-classified into wing/room/hall/status dimensions by the LLM
  • Knowledge graph — Temporal triple store for durable facts with auto-invalidation. Multi-value relationships coexist; single-value facts auto-replace
  • PARA method with project-end cascade — Every memory is tagged with a status (Project, Area, Resource, Archive). When a project ends, a single complete_project call archives its vectors, invalidates every planning triple tied to its slug, and records the completion — one call instead of three
  • Hybrid recall — Combines FTS5 keyword search with vector semantic search using reciprocal rank fusion
  • Long-term memory — SQLite + FTS5 full-text search as fallback when semantic memory is disabled
  • Dev server management with tunnel contract — Start, stop, and list dev servers. When tunnel: true is requested, the server enforces it pre-flight and post-spawn — an unreachable localhost server is never left running silently
  • Automatic URL sanitization — If the agent leaves a localhost URL in a response, the respond_* tools rewrite it to the matching tunnel URL so your phone always gets a reachable link
  • Custom tools — The agent can install CLI tools and register them as new capabilities
  • Versioning — Automatic git versioning for workspace files; config backups for skills and permissions
  • Web dashboard — Live overview of background jobs, customizable per user. /dashboard and /api/jobs are unauthenticated by design — restrict access at the network layer (Tailscale ACL, firewall, reverse proxy with basic auth) or set DASHBOARD=false if you don’t need it
  • HTTPS tunneling — Built-in support for Tailscale, ngrok, and Cloudflare Tunnel
  • System service — Install as a persistent service on macOS, Linux, or Windows
  • Rate limiting — Per-token request limits with configurable thresholds

Requirements

Quick Start

git clone https://github.com/pockethook-app/pockethook-agent-server.git
cd pockethook-agent-server
bun install

# Interactive setup — choose provider, model, auth token, port
bun run setup

# Start server + HTTPS tunnel
bun run dev:tunnel

The setup wizard will guide you through choosing an LLM provider, configuring authentication, and setting up tool permissions.

Once running, copy the displayed URLs into PocketHook Settings:

| PocketHook Setting | URL |
|---|---|
| Server URL | https://your-host |
| Health Check URL | https://your-host/health |
| Polling URL | https://your-host/jobs |

How It Works

  1. You send a message in PocketHook
  2. The server forwards it to your chosen LLM with conversation history, recalled memories, and available tools
  3. The LLM processes the message — it can run shell commands, read/write files, search the web, schedule background jobs, remember facts, or start dev servers
  4. The response is returned in PocketHook format (msg + shortcut + data + url)
  5. PocketHook displays the message and executes any Shortcuts on your device
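
As a rough illustration of step 4, the response could be modelled in TypeScript as below. Only the four field names (msg, shortcut, data, url) come from the text above. The field types, the choice of which fields are optional, and the `isPocketHookResponse` helper are assumptions made for this sketch, not the server's actual schema.

```typescript
// Hypothetical shape: only the field names msg, shortcut, data and url are documented.
interface PocketHookResponse {
  msg: string;                     // text shown in the chat (assumed required)
  shortcut?: string;               // name of the Shortcut to run, if any
  data?: Record<string, unknown>;  // payload handed to the Shortcut
  url?: string;                    // optional link
}

// Hypothetical runtime check, e.g. for validating what the LLM produced.
function isPocketHookResponse(value: unknown): value is PocketHookResponse {
  if (typeof value !== "object" || value === null) return false;
  const r = value as Record<string, unknown>;
  return (
    typeof r["msg"] === "string" &&
    (r["shortcut"] === undefined || typeof r["shortcut"] === "string") &&
    (r["data"] === undefined || (typeof r["data"] === "object" && r["data"] !== null)) &&
    (r["url"] === undefined || typeof r["url"] === "string")
  );
}

const candidate: unknown = {
  msg: "Note created.",
  shortcut: "newNote",
  data: { title: "Meeting", content: "Agenda for Monday" },
};
console.log(isPocketHookResponse(candidate));
```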

Supported LLM Providers

| Provider | Auth | Default Model |
|---|---|---|
| Anthropic | API key | claude-sonnet-4-20250514 |
| OpenAI | API key | gpt-4.1-mini |
| OpenAI Codex | OAuth | gpt-5.1-codex-mini |
| GitHub Copilot | OAuth | claude-sonnet-4 |
| Google (Gemini) | API key | gemini-2.5-flash |
| Mistral | API key | mistral-medium-latest |
| Groq | API key | llama-3.3-70b-versatile |
| xAI (Grok) | API key | grok-3-mini-fast |
| OpenRouter | API key | anthropic/claude-sonnet-4 |
| Ollama (local) | None | llama3.2 |
| LM Studio (local) | None | qwen3.5-4b-mlx |

Switch providers anytime with bun run switch. Ollama and LM Studio run entirely on your machine — no API key needed, no data leaves your network.

Memory

The memory system has three layers, each serving a different purpose.

The semantic memory design combines ideas from MemPalace (a memory palace architecture that organizes memories into wings, halls, and rooms) and Tiago Forte’s PARA method (Projects, Areas, Resources, Archive) for knowledge lifecycle management.

Conversation memory

SQLite with FTS5 full-text search. All messages are stored with timestamps and session IDs.

  • Short-term — Last MAX_HISTORY messages kept in memory per session
  • Long-term — All messages persisted in SQLite, searchable via FTS5 keyword matching
  • Recall per turn — When semantic memory is on, MAX_RECALL controls how many relevant memories are injected into the prompt each turn
  • Sessions expire after SESSION_TTL_MINUTES, but long-term memory persists forever

Tune these interactively with bun run memory.

Semantic memory

Requires VECTOR_MEMORY=true and an embedding provider (Ollama, LM Studio, or OpenAI).

Each memory is embedded as a vector and auto-classified by the LLM into four dimensions:

  • Wing — The entity: user, person:john, project:blog, place:london
  • Room — The type: facts, preferences, events, decisions, requests
  • Hall — The topic: personal, tech, health, travel, food, work
  • Status — PARA classification: project, area, resource, archive

When you ask a question, entity extraction focuses the vector search on the most relevant wings. Results are merged with FTS5 keyword results using reciprocal rank fusion — so you get the best of both keyword and semantic matching.
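
To make the merge step concrete, here is a minimal sketch of reciprocal rank fusion over two ranked lists of memory IDs. The constant k = 60 is a commonly used default and an assumption here, as are the function name and the sample IDs; the server's actual parameters are not stated in this page.

```typescript
// Reciprocal rank fusion: score(id) = sum over lists of 1 / (k + rank),
// where rank starts at 1 for the best hit in each list.
function reciprocalRankFusion(lists: string[][], k = 60): string[] {
  const scores = new Map<string, number>();
  for (const list of lists) {
    list.forEach((id, index) => {
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + index + 1));
    });
  }
  // Highest fused score first.
  return [...scores.entries()]
    .sort((a, b) => b[1] - a[1])
    .map(([id]) => id);
}

const keywordHits = ["m1", "m2", "m3"]; // hypothetical FTS5 results, best first
const vectorHits = ["m3", "m1", "m4"];  // hypothetical vector results, best first
const fused = reciprocalRankFusion([keywordHits, vectorHits]);
console.log(fused); // [ "m1", "m3", "m2", "m4" ]
```

Items that appear in both lists (m1, m3) outrank items found by only one search, which is the behavior the paragraph above describes.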

Knowledge graph

A temporal triple store for structured, durable facts:

  • Triples: (subject, predicate, object) with valid_from / valid_until timestamps
  • Single-value predicates (lives_in, partner) auto-invalidate the old value on update
  • Multi-value predicates (child, friend, hobby) coexist without invalidation
  • Knowledge graph facts are injected alongside recalled memories in every conversation

When you tell the agent “I moved to Berlin”, it invalidates the old lives_in triple and creates a new one — automatically.
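
The invalidation rule can be sketched with a small in-memory store. This is an illustration only: the `TripleStore` class, its method names, and the hard-coded set of single-value predicates are assumptions, and the real server persists its triples rather than keeping them in an array.

```typescript
interface Triple {
  subject: string;
  predicate: string;
  object: string;
  valid_from: string;         // ISO timestamp
  valid_until: string | null; // null while the fact is still current
}

// Assumed configuration: predicates that hold one value at a time.
const SINGLE_VALUE = new Set(["lives_in", "partner"]);

class TripleStore {
  private triples: Triple[] = [];

  add(subject: string, predicate: string, object: string, now: string): void {
    if (SINGLE_VALUE.has(predicate)) {
      // Close the currently valid triple(s) before recording the new value.
      for (const t of this.triples) {
        if (t.subject === subject && t.predicate === predicate && t.valid_until === null) {
          t.valid_until = now;
        }
      }
    }
    this.triples.push({ subject, predicate, object, valid_from: now, valid_until: null });
  }

  // Objects of all triples that are still valid for this subject and predicate.
  current(subject: string, predicate: string): string[] {
    return this.triples
      .filter((t) => t.subject === subject && t.predicate === predicate && t.valid_until === null)
      .map((t) => t.object);
  }
}

const kg = new TripleStore();
kg.add("user", "lives_in", "Madrid", "2025-01-01T00:00:00Z");
kg.add("user", "hobby", "chess", "2025-01-01T00:00:00Z");
kg.add("user", "hobby", "cycling", "2025-06-01T00:00:00Z");
kg.add("user", "lives_in", "Berlin", "2026-04-15T00:00:00Z");
console.log(kg.current("user", "lives_in")); // [ "Berlin" ]
console.log(kg.current("user", "hobby"));    // [ "chess", "cycling" ]
```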

PARA lifecycle

Every memory is tagged with a PARA status:

  • Project — Active, time-bound work
  • Area — Ongoing responsibilities
  • Resource — Reference material (lists, recommendations, how-tos)
  • Archive — Completed or cancelled projects

When a project completes, the agent uses semantic similarity to archive only that project’s memories while preserving reference material for future use.

Project-end cascade

Say “I’m cancelling my trip to Barcelona” and a single tool call handles everything:

  1. Archives the project’s vectors (events, decisions, requests tied to Barcelona).
  2. Invalidates every active knowledge-graph triple whose predicate matches the project slug (scheduled_visit_barcelona, planning_visit_barcelona, confirmed_visit_barcelona).
  3. Records the completion as a new triple: (user, "cancelled_visit_barcelona", "2026-04-15").

Matching is boundary-aware — a different project called revisit_barcelona stays untouched. The agent no longer has to orchestrate three separate calls in the right order, so smaller models get it right too.
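
One way to picture boundary-aware matching is a check that the slug is delimited by underscores or by the ends of the predicate. The sketch below assumes the slug is visit_barcelona and that "_" is the token separator; both are inferred from the example predicates above, and `matchesProjectSlug` is a hypothetical name.

```typescript
// Escape characters that have a special meaning inside a regular expression.
function escapeRegExp(text: string): string {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// True when the slug appears as a whole underscore-delimited run in the predicate.
function matchesProjectSlug(predicate: string, slug: string): boolean {
  const pattern = new RegExp(`(^|_)${escapeRegExp(slug)}(_|$)`);
  return pattern.test(predicate);
}

const predicates = [
  "scheduled_visit_barcelona",
  "planning_visit_barcelona",
  "confirmed_visit_barcelona",
  "planning_revisit_barcelona", // belongs to a different project
];
const toInvalidate = predicates.filter((p) => matchesProjectSlug(p, "visit_barcelona"));
console.log(toInvalidate);
```

A plain substring test would also catch planning_revisit_barcelona, which is the mistake the boundary check avoids.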

If VECTOR_MEMORY is disabled or the embedding provider is unreachable, the system falls back to FTS5-only with no errors.

Skills

Skills are .md files in skills/ that define iOS Shortcuts the agent can trigger and/or behavior rules. They use dynamic loading: only a compact index (title, description, shortcut list) is injected into the system prompt. The agent loads full content on demand via the load_skill tool, keeping token usage low as you add more skills.

Each skill file uses YAML frontmatter:

---
title: Notes
description: Create notes on the user's device with a title and body
shortcuts: [newNote]
target: mac
sync_app: Notes
---

### New Note

Shortcut name: `newNote`

Creates a new note on the user's device.

Data fields:
- title (string, required): Note title
- content (string, required): Note body

Frontmatter fields

| Field | Required | Description |
|---|---|---|
| title | Yes | Human-readable name |
| description | Yes | One sentence used in the skills index shown to the agent |
| shortcuts | Yes | Array of shortcut names defined in the file. Use [] for behavior-only skills |
| target | No | Where shortcuts execute: device (default, sent to iOS) or mac (run on the server) |
| sync_app | No | App to nudge in the background after server-side execution to trigger iCloud sync (e.g. Notes, Calendar, Reminders). Omit or use none to skip |

Skills can also be behavior rules without shortcuts (e.g., “how to plan a family trip”). Use shortcuts: [] for these.
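
As a sketch of how a compact index entry (title, description, shortcut list) might be derived from a skill file, the snippet below reads the frontmatter with a deliberately small parser. It handles only flat `key: value` lines and inline `[a, b]` arrays and is not a full YAML parser; the `SkillIndexEntry` type and `parseSkillIndexEntry` function are hypothetical and do not describe the server's loader.

```typescript
interface SkillIndexEntry {
  title: string;
  description: string;
  shortcuts: string[];
  target: "device" | "mac";
}

function parseSkillIndexEntry(markdown: string): SkillIndexEntry | null {
  // Frontmatter is the block between the first two "---" lines.
  const match = /^---\r?\n([\s\S]*?)\r?\n---/.exec(markdown);
  if (!match) return null;

  const fields = new Map<string, string>();
  for (const line of match[1].split(/\r?\n/)) {
    const sep = line.indexOf(":");
    if (sep > 0) fields.set(line.slice(0, sep).trim(), line.slice(sep + 1).trim());
  }

  const title = fields.get("title");
  const description = fields.get("description");
  const rawShortcuts = fields.get("shortcuts");
  // title, description and shortcuts are the required fields.
  if (!title || !description || rawShortcuts === undefined) return null;

  const shortcuts = rawShortcuts
    .replace(/^\[|\]$/g, "")
    .split(",")
    .map((name) => name.trim())
    .filter(Boolean);

  // device is the documented default when target is omitted.
  const target = fields.get("target") === "mac" ? "mac" : "device";
  return { title, description, shortcuts, target };
}

const notesSkill = [
  "---",
  "title: Notes",
  "description: Create notes on the user's device with a title and body",
  "shortcuts: [newNote]",
  "target: mac",
  "sync_app: Notes",
  "---",
  "",
  "### New Note",
].join("\n");
console.log(parseSkillIndexEntry(notesSkill));
```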

The agent can create and manage skills when asked — ask it to “create a skill for controlling my lights” and it will write the .md file for you. New and edited skills always land in your user layer (data/user/skills/), so framework updates never overwrite them. See the Customizing your agent section below.

Executing shortcuts on the Mac server

When a skill has target: mac, shortcuts run silently on the Mac server via the shortcuts run CLI instead of being sent to the iOS device. This is ideal for actions that create iCloud-synced content — notes, reminders, calendar events — because the result syncs to all your devices automatically without needing the PocketHook app to do anything.

How it works:

  1. The agent decides a shortcut should run (e.g. “create a note with today’s meeting notes”)
  2. The server invokes shortcuts run "shortcutName" with the data passed as JSON on stdin, using the same wrapper format PocketHook iOS uses
  3. If sync_app is set, the server briefly opens that app in the background (open -gj -a Notes) to force iCloud sync, then closes it after 5 seconds
  4. The user receives a confirmation message in the chat; the shortcut itself is not sent to the device

Requirements:

  • The server must be running on macOS — shortcuts run is macOS-only. On other platforms, the server logs a warning and falls back to device execution
  • The shortcut must be installed in Shortcuts.app on the server Mac
  • The shortcut should expect a Dictionary as input (PocketHook wraps data in { context, timestamp, app, data })

When to use target: mac:

  • iCloud-synced actions (Notes, Reminders, Calendar) — the result reaches every device anyway
  • Long-running processing you want to keep off the iOS device
  • Any shortcut that doesn’t need to interact with the iPhone’s UI

When to keep target: device (default):

  • Shortcuts that need iPhone-only features (camera, precise location, local app automations)
  • Shortcuts that prompt the user for interactive input
  • Shortcuts that use App Intents from iOS-only apps

Background Jobs

Ask the agent to schedule tasks and it will handle the rest:

  • “Check the weather every morning at 8am and create a note”
  • “Run this script every hour”
  • “Remind me to check my email in 30 minutes”

Jobs support cron expressions (0 8 * * *) and simple intervals (30m, 1h, 2d). Results are delivered to PocketHook when it polls the /jobs endpoint.
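
For illustration, the simple interval syntax (30m, 1h, 2d) could be converted to milliseconds as follows. The function name and the decision to return null for anything that is not a simple interval (such as a cron expression) are assumptions of this sketch.

```typescript
// Milliseconds per supported unit: minutes, hours, days.
const UNIT_MS: Record<string, number> = {
  m: 60_000,
  h: 3_600_000,
  d: 86_400_000,
};

// Returns the interval in milliseconds, or null if the text is not a simple interval.
function parseInterval(text: string): number | null {
  const match = /^(\d+)([mhd])$/.exec(text.trim());
  if (!match) return null;
  const amount = Number(match[1]);
  return amount > 0 ? amount * UNIT_MS[match[2]] : null;
}

console.log(parseInterval("30m"));       // 1800000
console.log(parseInterval("1h"));        // 3600000
console.log(parseInterval("2d"));        // 172800000
console.log(parseInterval("0 8 * * *")); // null (a cron expression, handled elsewhere)
```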

Two execution types:

  • Shell — Runs a bash command, captures output. Can trigger a Shortcut on completion
  • Prompt — Processed by the AI agent with full tool access, stores the complete PocketHook response

Dev Servers

When the agent creates a web project in the workspace (Hugo, Astro, Next.js, Flask, Go, etc.), it proactively offers to serve it:

  • Preview — Starts a local dev server on an auto-assigned port for quick viewing
  • Public — Starts the server and exposes it via HTTPS tunnel so it’s accessible from anywhere

The agent manages the lifecycle: start, stop, and list running servers. All servers are cleaned up when the main server stops.

Tunnel contract

When the agent starts a server with tunnel exposure requested, the runtime enforces it: if no tunnel tool (Tailscale, ngrok, cloudflared) is installed, the server refuses to start. If tunnel setup fails after spawn, the orphan process is stopped and the agent is told explicitly — so it can fall back to preview mode or ask you to install a tunnel. The returned URL is always the tunnel URL when tunneling is on, with a note that the local URL is host-only.

As a safety net, every respond_* tool post-processes its message: any localhost or 127.0.0.1 URL that sneaks into a reply gets rewritten to the matching tunnel URL automatically when a managed server has one. When it can’t rewrite, you get a warning in the logs instead of a broken link on your phone.
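
The rewrite step might look roughly like this. The sketch assumes a hypothetical registry that maps each managed server's local port to its tunnel URL, and it only handles URLs that carry an explicit port; how the server actually tracks tunnels is not described here.

```typescript
// Hypothetical registry: local port of a managed dev server -> its public tunnel URL.
type TunnelMap = Map<number, string>;

interface SanitizeResult {
  text: string;         // message with reachable URLs
  unresolved: string[]; // localhost URLs that had no matching tunnel
}

function sanitizeUrls(message: string, tunnels: TunnelMap): SanitizeResult {
  const unresolved: string[] = [];
  // Matches only the origin part, so any path after the port is preserved.
  const pattern = /https?:\/\/(?:localhost|127\.0\.0\.1):(\d+)/g;
  const text = message.replace(pattern, (url: string, port: string) => {
    const tunnel = tunnels.get(Number(port));
    if (tunnel === undefined) {
      unresolved.push(url); // caller can log a warning for these
      return url;
    }
    return tunnel;
  });
  return { text, unresolved };
}

const tunnels: TunnelMap = new Map([[5173, "https://demo.example.ts.net"]]);
const result = sanitizeUrls(
  "Preview: http://localhost:5173/about and http://127.0.0.1:9999/",
  tunnels,
);
console.log(result.text);
console.log(result.unresolved);
```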

Dashboard

The built-in web dashboard at /dashboard shows a live overview of background jobs.

Unauthenticated by design. Both /dashboard and /api/jobs are open GET endpoints — anyone who can reach the host can list jobs. Restrict access at the network layer (Tailscale ACL, firewall, reverse proxy with basic auth) or set DASHBOARD=false if you don’t need it. The PocketHook iOS app doesn’t use these endpoints.

It’s fully customizable:

  • Quick edit — Place a dashboard.html in workspace/dashboard/ for simple customizations
  • Full project — Create a framework project (Svelte, React, Vue, etc.) in workspace/dashboard/ with build output to dist/

Ask the agent to customize your dashboard and it will handle the rest — each user gets a unique, personalized dashboard.

Custom Tools

The agent can install CLI tools and register them as new capabilities — extending itself without modifying the server code.

For example, say “install Playwright and use it to take screenshots”. The agent will:

  1. Install the dependency
  2. Create a tool definition (a simple .md file)
  3. Use the new tool in future conversations

Custom tools are hot-reloaded — no restart needed. Delete the .md file to remove a tool.

Versioning

All user data is versioned automatically:

  • Workspace files — Tracked with a local git repo inside workspace/. Every write creates an auto-commit. Ask the agent to “undo the last change” or use git revert HEAD manually
  • Config files — config/agent-instructions.md, config/personality.md, skills/, and permissions.json are backed up before each modification. Up to 20 versions per file

Git is optional — if not installed, workspace changes are unversioned. Config backups always work.

Customizing your agent

The agent server ships with a minimal framework base and expects you to layer your own customization on top. The runtime keeps the two apart so framework updates never clobber your work.

Framework vs user

pockethook-agent-server/
├── skills/                      # framework-shipped skills (read-only)
├── custom-tools/                # reserved for framework-shipped tools (read-only)
├── config/
│   ├── agent-instructions.md    # framework agent instructions (read-only)
│   └── personality.md           # framework personality (read-only)
└── data/user/                   # YOUR customization lives here (git-ignored)
    ├── skills/                  # your own skills (override base on filename)
    ├── custom-tools/            # your installed custom tools
    ├── instructions.md          # your additions to agent instructions
    └── prefs.json               # typed values referenced as {{prefs.key}}

User customization is written via dedicated typed tools (create_user_skill, create_custom_tool) so the resulting files always match the loader’s format. The write tool also rejects any path under skills/, custom-tools/, or config/ and redirects the agent to data/user/* — so even direct file edits end up in the user layer.
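
A write guard of this kind could be sketched as below. The three protected directories come from the text above; the `checkWritePath` function, its return shape, and the exact redirect targets (in particular sending config/ edits to data/user/instructions.md) are assumptions for illustration.

```typescript
const PROTECTED_DIRS = ["skills", "custom-tools", "config"];

// Resolve ".", ".." and backslashes so "./skills/../skills/x.md" cannot slip through.
function normalizeSegments(relativePath: string): string[] {
  const out: string[] = [];
  for (const segment of relativePath.replace(/\\/g, "/").split("/")) {
    if (segment === "" || segment === ".") continue;
    if (segment === "..") out.pop();
    else out.push(segment);
  }
  return out;
}

interface WriteCheck {
  allowed: boolean;
  suggestion?: string; // user-layer path to use instead (assumed mapping)
}

function checkWritePath(relativePath: string): WriteCheck {
  const [top, ...rest] = normalizeSegments(relativePath);
  if (top === undefined || !PROTECTED_DIRS.includes(top)) return { allowed: true };
  if (top === "config") return { allowed: false, suggestion: "data/user/instructions.md" };
  return { allowed: false, suggestion: ["data/user", top, ...rest].join("/") };
}

console.log(checkWritePath("skills/notes.md"));
console.log(checkWritePath("data/user/skills/notes.md"));
```

Because data/user/skills/ starts with data/ rather than skills/, user-layer paths pass the check unchanged.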

Note about the base custom-tools/ directory. Today it only holds a template (_example.md) that the loader ignores — every tool the agent installs for you goes to data/user/custom-tools/. The directory is reserved so future framework releases can ship optional built-in tools without clobbering your installs. When that happens, your user-layer files still win on tool-name collision, so there’s nothing to migrate.

Four ways to customize

| What you want to change | Where it goes | Example |
|---|---|---|
| A shortcut or behavior skill | data/user/skills/<name>.md | “Create a skill to log my workouts” |
| A CLI tool wrapped as an agent capability | data/user/custom-tools/<name>.md | “Install ffmpeg and let me use it for conversions” |
| A global rule (“always reply in English”, “never use tables”) | data/user/instructions.md | “From now on, always summarize articles in 3 bullets” |
| A typed default value referenced by skills | data/user/prefs.json | “My default route origin is Madrid” (stored as {"routeOrigin": "Madrid"}) |

You never have to write these files by hand. Just tell the agent what you want and it picks the right layer automatically.

Typed preferences with {{prefs.*}}

Say you write a route-planner skill that needs to know your default starting point. Instead of hardcoding “Madrid” into the skill, reference the pref:

- **Starting point**: {{prefs.routeOrigin}}, unless the user specifies a different origin.

And store the value in data/user/prefs.json:

{
  "routeOrigin": "Madrid, Spain",
  "preferredMapsApp": "apple",
  "tunnel": { "domain": "my-host.ts.net" }
}

The server substitutes placeholders when the skill is loaded. Nested keys ({{prefs.tunnel.domain}}) work too. Unknown keys are left untouched so typos stay visible.
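
The substitution rules described above (nested keys resolve, unknown keys stay visible) can be sketched like this. The function names and the decision to substitute only string, number, and boolean values are assumptions; the server's own implementation may differ.

```typescript
type Prefs = { [key: string]: unknown };

// Walk a dotted key such as "tunnel.domain" through nested objects.
function lookupPref(prefs: Prefs, dottedKey: string): unknown {
  let current: unknown = prefs;
  for (const part of dottedKey.split(".")) {
    if (typeof current !== "object" || current === null || !(part in current)) {
      return undefined;
    }
    current = (current as Record<string, unknown>)[part];
  }
  return current;
}

// Replace {{prefs.key}} placeholders; leave unknown keys untouched.
function substitutePrefs(skillText: string, prefs: Prefs): string {
  return skillText.replace(
    /\{\{prefs\.([A-Za-z0-9_.]+)\}\}/g,
    (placeholder: string, key: string) => {
      const value = lookupPref(prefs, key);
      const isScalar =
        typeof value === "string" || typeof value === "number" || typeof value === "boolean";
      return isScalar ? String(value) : placeholder;
    },
  );
}

const prefs: Prefs = {
  routeOrigin: "Madrid, Spain",
  preferredMapsApp: "apple",
  tunnel: { domain: "my-host.ts.net" },
};
const rendered = substitutePrefs(
  "Start at {{prefs.routeOrigin}}, host {{prefs.tunnel.domain}}, app {{prefs.mapsApp}}",
  prefs,
);
console.log(rendered);
// Start at Madrid, Spain, host my-host.ts.net, app {{prefs.mapsApp}}
```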

Editing the framework base directly

If you’re self-hosting and want to tweak the framework itself, you can edit config/agent-instructions.md, config/personality.md, skills/, or custom-tools/ directly with a file editor; the server doesn’t stop you. The agent, however, won’t write to those paths from a conversation, and framework updates will overwrite your edits. Prefer the user layer for anything you want to keep.

Extending the Server

  • Custom tools — Ask the agent to install CLI tools; they land in data/user/custom-tools/ automatically
  • Add skills — Ask the agent to create a skill; the file goes in data/user/skills/
  • Change behavior — Ask the agent to apply a global rule; it appends to data/user/instructions.md
  • Configure permissions — Run bun run permissions to control which tools the agent can use
  • Add built-in tools — Implement new tool functions in src/tools.ts for deeper integrations (requires forking the server)

Configuration

All settings are stored in .env (created by bun run setup). Key options:

| Variable | Default | Description |
|---|---|---|
| AUTH_TOKEN | (required) | Shared secret with PocketHook |
| LLM_API_KEY | (required) | LLM provider API key |
| LLM_PROVIDER | anthropic | Provider name |
| LLM_MODEL | claude-sonnet-4-20250514 | Model ID |
| LLM_REASONING | off | Reasoning effort: off, minimal, low, medium, high, xhigh. Higher levels add hidden thinking tokens (slower + more expensive). Ignored by models that don’t support it |
| PORT | 3000 | Server port |
| AGENT_NAME | PocketHook Assistant | Agent display name |
| MAX_HISTORY | 50 | Messages in short-term memory |
| MAX_RECALL | 5 | Memories returned per turn by semantic recall (only when VECTOR_MEMORY=true) |
| SESSION_TTL_MINUTES | 60 | Session expiration |
| VECTOR_MEMORY | false | Enable semantic memory (requires an embedding provider) |
| EMBEDDING_PROVIDER | ollama | Embedding provider: ollama, lm-studio, or openai |
| EMBEDDING_MODEL | nomic-embed-text | Embedding model name |
| EMBEDDING_URL | (auto) | Embedding API URL |
| EMBEDDING_API_KEY | | API key for OpenAI embeddings |
| LOG_LEVEL | info | Log level: debug, info, warn, error |
| RATE_LIMIT_MAX | 30 | Max requests per window |
| DASHBOARD | true | Enable web dashboard (/dashboard route) |
| INSTANCE_NAME | (project dir basename, with pockethook- stripped) | Suffix used for the system service label, log directory, and process matching. Set explicitly when running multiple checkouts on the same machine |

See the full configuration reference in the GitHub repository.
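
To show how the defaults in the table behave, here is a hedged sketch that reads a handful of the variables from an environment-like object. The `ServerConfig` type, the helper names, and the validation rules are hypothetical; the variable names and default values are taken from the table.

```typescript
type Env = Record<string, string | undefined>;

interface ServerConfig {
  authToken: string;
  port: number;
  maxHistory: number;
  maxRecall: number;
  sessionTtlMinutes: number;
  vectorMemory: boolean;
  dashboard: boolean;
}

function readInt(env: Env, name: string, fallback: number): number {
  const raw = env[name];
  if (raw === undefined || raw.trim() === "") return fallback;
  const value = Number(raw);
  if (!Number.isInteger(value)) throw new Error(`${name} must be an integer, got "${raw}"`);
  return value;
}

function readBool(env: Env, name: string, fallback: boolean): boolean {
  const raw = env[name];
  if (raw === undefined || raw.trim() === "") return fallback;
  return raw.trim().toLowerCase() === "true";
}

function loadConfig(env: Env): ServerConfig {
  const authToken = env["AUTH_TOKEN"];
  if (!authToken) throw new Error("AUTH_TOKEN is required");
  return {
    authToken,
    port: readInt(env, "PORT", 3000),
    maxHistory: readInt(env, "MAX_HISTORY", 50),
    maxRecall: readInt(env, "MAX_RECALL", 5),
    sessionTtlMinutes: readInt(env, "SESSION_TTL_MINUTES", 60),
    vectorMemory: readBool(env, "VECTOR_MEMORY", false),
    dashboard: readBool(env, "DASHBOARD", true),
  };
}

// In a running server the argument would typically be the process environment.
const config = loadConfig({ AUTH_TOKEN: "change-me", PORT: "8080", DASHBOARD: "false" });
console.log(config.port, config.maxHistory, config.dashboard); // 8080 50 false
```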

Running as a Service

Install as a persistent service that starts automatically:

bun run service install

| Platform | Backend | Service location |
|---|---|---|
| macOS | launchd | ~/Library/LaunchAgents/com.pockethook.${INSTANCE_NAME}.plist |
| Linux | systemd (user) | ~/.config/systemd/user/pockethook-${INSTANCE_NAME}.service |
| Windows | NSSM | PocketHook-${PascalCase(INSTANCE_NAME)} in Windows Service Manager |

INSTANCE_NAME defaults to the project directory basename with the pockethook- prefix stripped (e.g., a checkout in pockethook-agent-server/ becomes agent-server). Set it explicitly to run several checkouts on the same machine without collisions — each instance keeps its own data/ and logs.
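
The naming rules can be sketched as follows. The function names are hypothetical, and the exact PascalCase conversion (splitting on non-alphanumeric characters) is an assumption; only the prefix stripping and the three name patterns come from the text and table above.

```typescript
const PREFIX = "pockethook-";

// Default instance name: directory basename with the pockethook- prefix removed.
function defaultInstanceName(projectDir: string): string {
  const base = projectDir.replace(/[\\/]+$/, "").split(/[\\/]/).pop() ?? "";
  return base.startsWith(PREFIX) ? base.slice(PREFIX.length) : base;
}

// Assumed conversion: "agent-server" -> "AgentServer".
function pascalCase(name: string): string {
  return name
    .split(/[^A-Za-z0-9]+/)
    .filter(Boolean)
    .map((word) => word.charAt(0).toUpperCase() + word.slice(1))
    .join("");
}

function serviceNames(instance: string): { macos: string; linux: string; windows: string } {
  return {
    macos: `com.pockethook.${instance}.plist`,
    linux: `pockethook-${instance}.service`,
    windows: `PocketHook-${pascalCase(instance)}`,
  };
}

const instance = defaultInstanceName("/home/me/pockethook-agent-server/");
console.log(instance);               // agent-server
console.log(serviceNames(instance)); // macos, linux and windows service names
```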

Manage with bun run service status, restart, stop, or uninstall.

Security

  • HTTPS required — PocketHook enforces HTTPS for all URLs
  • Bearer token auth — Shared secret between app and server
  • Rate limiting — Per-token limits prevent abuse
  • Sandboxed tools — Shell commands and file access restricted by permissions
  • Blocked patterns — Dangerous commands (sudo, rm -rf /) blocked by default
  • Working directory boundary — Agent can’t escape its designated directory
  • Sensitive files protected — .env, .git, *.key, *.pem blocked from agent access
  • Automatic versioning — All workspace changes are git-tracked for easy rollback