Agent Server

What is PocketHook Agent Server?

The agent server turns PocketHook into a full AI assistant. Instead of writing response logic yourself, you connect an LLM (Claude, GPT, Gemini, etc.) that processes messages, calls tools, and returns structured PocketHook responses — including Shortcut triggers.

The server runs on your own machine. Your data stays with you.

This is a starting point. The server ships with a core set of tools and is designed to be extended by you. Add your own integrations — email, calendars, documents, APIs — and make it yours.

Features

Multi-provider LLM — Anthropic, OpenAI, GitHub Copilot, Google, Mistral, Groq, xAI, OpenRouter, Ollama (local), LM Studio (local)
OAuth authentication — GitHub Copilot and OpenAI Codex via device code / browser flow
Agent tools — Shell commands, file read/write, directory listing, web search, web scraping, dev server management
Framework / user split — Framework files (skills/, custom-tools/, config/) stay read-only. Your customizations live under data/user/ (skills, custom tools, instructions, typed prefs). Framework updates land cleanly without overwriting your work
Typed user prefs — Store values like your preferred maps app or tunnel domain in data/user/prefs.json. Reference them in skills as {{prefs.key}} and the server substitutes them on load
Programming tasks in one call — The run_code_job meta-tool creates a prompt-type background job (run by your configured LLM) and sends the user the ack in a single step, replacing the error-prone “respond + create-job” pattern
Typed protocol tools — Six dedicated respond_* tools (respond_text, respond_image, respond_buttons, respond_shortcut, respond_html, respond_sequence), plus typed job tools (create_once_job, create_cron_job) and typed workspace tools (create_project, list_projects, delete_project). Schemas reject malformed URLs, button syntax, and type/schedule combinations before they reach the device
Typed writers for customization — create_user_skill and create_custom_tool build the user-layer markdown with correct frontmatter, so the loader always parses them and the agent never hand-writes these files
Background jobs — One-time or recurring tasks with cron expressions or simple intervals
Dynamic skills — Define shortcuts and behavior rules as .md files. Only a compact index is loaded into the prompt; full content is fetched on demand via the load_skill tool
Self-managing skills — The agent can create, edit, and delete skill definitions (writes always land in the user layer)
Semantic memory — Vector-based search with embeddings (Ollama, LM Studio, or OpenAI). Memories are auto-classified into wing/room/hall/status dimensions by the LLM
Knowledge graph — Temporal triple store for durable facts with auto-invalidation. Multi-value relationships coexist; single-value facts auto-replace
PARA method with project-end cascade — Every memory is tagged with a status (Project, Area, Resource, Archive). When a project ends, a single complete_project call archives its vectors, invalidates every planning triple tied to its slug, and records the completion — one call instead of three
Hybrid recall — Combines FTS5 keyword search with vector semantic search using reciprocal rank fusion
Long-term memory — SQLite + FTS5 full-text search as fallback when semantic memory is disabled
Dev server management with tunnel contract — Start, stop, and list dev servers. When tunnel: true is requested, the server enforces it pre-flight and post-spawn — an unreachable localhost server is never left running silently
Automatic URL sanitization — If the agent leaves a localhost URL in a response, the respond_* tools rewrite it to the matching tunnel URL so your phone always gets a reachable link
Custom tools — The agent can install CLI tools and register them as new capabilities
Versioning — Automatic git versioning for workspace files; config backups for skills and permissions
Web dashboard — Live overview of background jobs, customizable per user. /dashboard and /api/jobs are unauthenticated by design — restrict access at the network layer (Tailscale ACL, firewall, reverse proxy with basic auth) or set DASHBOARD=false if you don’t need it
HTTPS tunneling — Built-in support for Tailscale, ngrok, and Cloudflare Tunnel
System service — Install as a persistent service on macOS, Linux, or Windows
Rate limiting — Per-token request limits with configurable thresholds

Requirements

Bun runtime
An API key or OAuth credentials for your LLM provider
(Optional) Tailscale, ngrok, or cloudflared for HTTPS tunneling

Quick Start

```shell
git clone https://github.com/pockethook-app/pockethook-agent-server.git
cd pockethook-agent-server
bun install

# Interactive setup: choose provider, model, auth token, port
bun run setup

# Start server + HTTPS tunnel
bun run dev:tunnel
```

The setup wizard will guide you through choosing an LLM provider, configuring authentication, and setting up tool permissions.

Once running, copy the displayed URLs into PocketHook Settings:

PocketHook Setting	URL
Server URL	`https://your-host`
Health Check URL	`https://your-host/health`
Polling URL	`https://your-host/jobs`

How It Works

You send a message in PocketHook
The server forwards it to your chosen LLM with conversation history, recalled memories, and available tools
The LLM processes the message — it can run shell commands, read/write files, search the web, schedule background jobs, remember facts, or start dev servers
The response is returned in PocketHook format (msg + shortcut + data + url)
PocketHook displays the message and executes any Shortcuts on your device
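
As a rough illustration of the response format mentioned in the steps above, the sketch below models the four fields (`msg`, `shortcut`, `data`, `url`) as a TypeScript type. Only the field names come from this page; the types, the optionality, and the `buildResponse` helper are assumptions made for the example, not the server's actual API.

```typescript
// Hypothetical shape of a PocketHook response. Only the four field names
// (msg, shortcut, data, url) come from the text; the types are assumptions.
interface PocketHookResponse {
  msg: string;
  shortcut?: string;
  data?: Record<string, unknown>;
  url?: string;
}

// Build a response, leaving out any optional field that was not provided.
function buildResponse(
  msg: string,
  extras: Omit<PocketHookResponse, "msg"> = {},
): PocketHookResponse {
  const response: PocketHookResponse = { msg };
  if (extras.shortcut !== undefined) response.shortcut = extras.shortcut;
  if (extras.data !== undefined) response.data = extras.data;
  if (extras.url !== undefined) response.url = extras.url;
  return response;
}

// Example: a reply that also triggers the newNote shortcut.
const reply = buildResponse("Note created.", {
  shortcut: "newNote",
  data: { title: "Groceries", content: "Milk, eggs" },
});
console.log(JSON.stringify(reply));
```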

Supported LLM Providers

Provider	Auth	Default Model
Anthropic	API key	`claude-sonnet-4-20250514`
OpenAI	API key	`gpt-4.1-mini`
OpenAI Codex	OAuth	`gpt-5.1-codex-mini`
GitHub Copilot	OAuth	`claude-sonnet-4`
Google (Gemini)	API key	`gemini-2.5-flash`
Mistral	API key	`mistral-medium-latest`
Groq	API key	`llama-3.3-70b-versatile`
xAI (Grok)	API key	`grok-3-mini-fast`
OpenRouter	API key	`anthropic/claude-sonnet-4`
Ollama (local)	None	`llama3.2`
LM Studio (local)	None	`qwen3.5-4b-mlx`

Switch providers anytime with bun run switch. Ollama and LM Studio run entirely on your machine — no API key needed, no data leaves your network.

Memory

The memory system has three layers, each serving a different purpose: conversation memory, semantic memory, and a knowledge graph.

The semantic memory design combines ideas from MemPalace (a memory palace architecture that organizes memories into wings, halls, and rooms) and Tiago Forte’s PARA method (Projects, Areas, Resources, Archive) for knowledge lifecycle management.

Conversation memory

SQLite with FTS5 full-text search. All messages are stored with timestamps and session IDs.

Short-term — Last MAX_HISTORY messages kept in memory per session
Long-term — All messages persisted in SQLite, searchable via FTS5 keyword matching
Recall per turn — When semantic memory is on, MAX_RECALL controls how many relevant memories are injected into the prompt each turn
Sessions expire after SESSION_TTL_MINUTES, but long-term memory persists forever

Tune these interactively with bun run memory.
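
To make the interplay of `MAX_HISTORY` and `SESSION_TTL_MINUTES` concrete, here is a minimal in-memory sketch of the short-term layer. The `SessionStore` class and its method names are hypothetical, and the SQLite long-term layer is left out; the real server may organize this differently.

```typescript
interface ChatMessage {
  role: "user" | "assistant";
  text: string;
  at: number; // epoch milliseconds
}

interface Session {
  messages: ChatMessage[];
  lastSeen: number;
}

// Hypothetical short-term store: keeps the newest maxHistory messages per
// session and starts a fresh session once the TTL has elapsed.
class SessionStore {
  private readonly sessions = new Map<string, Session>();
  private readonly maxHistory: number;
  private readonly ttlMs: number;

  constructor(maxHistory: number, ttlMinutes: number) {
    this.maxHistory = maxHistory;
    this.ttlMs = ttlMinutes * 60 * 1000;
  }

  append(sessionId: string, message: ChatMessage): void {
    let session = this.sessions.get(sessionId);
    // Start over if there is no session yet or the previous one expired.
    if (session === undefined || message.at - session.lastSeen > this.ttlMs) {
      session = { messages: [], lastSeen: message.at };
      this.sessions.set(sessionId, session);
    }
    session.messages.push(message);
    session.lastSeen = message.at;
    // Drop the oldest entries beyond maxHistory.
    const overflow = session.messages.length - this.maxHistory;
    if (overflow > 0) session.messages.splice(0, overflow);
  }

  history(sessionId: string): ChatMessage[] {
    return this.sessions.get(sessionId)?.messages ?? [];
  }
}

// Defaults from the configuration table: MAX_HISTORY=50, SESSION_TTL_MINUTES=60.
const store = new SessionStore(50, 60);
store.append("demo", { role: "user", text: "Hello", at: Date.now() });
console.log(store.history("demo").length);
```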

Semantic memory

Requires VECTOR_MEMORY=true and an embedding provider (Ollama, LM Studio, or OpenAI).

Each memory is embedded as a vector and auto-classified by the LLM into four dimensions:

Wing — The entity: user, person:john, project:blog, place:london
Room — The type: facts, preferences, events, decisions, requests
Hall — The topic: personal, tech, health, travel, food, work
Status — PARA classification: project, area, resource, archive

When you ask a question, entity extraction focuses the vector search on the most relevant wings. Results are merged with FTS5 keyword results using reciprocal rank fusion — so you get the best of both keyword and semantic matching.
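
Reciprocal rank fusion itself is a small, well-known formula: each item scores the sum of `1 / (k + rank)` over every ranked list it appears in. The sketch below shows the general technique with the commonly used constant `k = 60`; the function name, the memory IDs, and the choice of `k` are assumptions, since this page does not say which values the server uses.

```typescript
// Merge several ranked lists of IDs with reciprocal rank fusion.
// Each list contributes 1 / (k + rank) per item, with rank starting at 1.
function reciprocalRankFusion(
  rankings: string[][],
  k = 60,
): Array<{ id: string; score: number }> {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((id, index) => {
      const rank = index + 1;
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank));
    });
  }
  return Array.from(scores.entries())
    .map(([id, score]) => ({ id, score }))
    .sort((a, b) => b.score - a.score);
}

// Hypothetical result lists: one from FTS5 keyword search, one from vectors.
const keywordHits = ["m3", "m1", "m7"];
const vectorHits = ["m1", "m4", "m3"];
const fused = reciprocalRankFusion([keywordHits, vectorHits]);
console.log(fused.map((entry) => entry.id).join(", "));
```

Items that appear in both lists (here `m1` and `m3`) rise above items found by only one search, which is the point of fusing the two rankings.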

Knowledge graph

A temporal triple store for structured, durable facts:

Triples: (subject, predicate, object) with valid_from / valid_until timestamps
Single-value predicates (lives_in, partner) auto-invalidate the old value on update
Multi-value predicates (child, friend, hobby) coexist without invalidation
Knowledge graph facts are injected alongside recalled memories in every conversation

When you tell the agent “I moved to Berlin”, it invalidates the old lives_in triple and creates a new one — automatically.
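
The difference between single-value and multi-value predicates can be sketched with a small in-memory triple store. This is an illustration of the behavior described above, not the server's implementation: the `TripleStore` class, its method names, and the string timestamps are assumptions.

```typescript
interface Triple {
  subject: string;
  predicate: string;
  object: string;
  valid_from: string;
  valid_until: string | null; // null means the fact is still active
}

// Hypothetical temporal triple store. Single-value predicates close the
// previous active triple on update; all other predicates simply accumulate.
class TripleStore {
  private readonly triples: Triple[] = [];
  private readonly singleValue: Set<string>;

  constructor(singleValuePredicates: string[]) {
    this.singleValue = new Set(singleValuePredicates);
  }

  add(subject: string, predicate: string, object: string, at: string): void {
    if (this.singleValue.has(predicate)) {
      for (const triple of this.triples) {
        const sameFact = triple.subject === subject && triple.predicate === predicate;
        if (sameFact && triple.valid_until === null) triple.valid_until = at;
      }
    }
    this.triples.push({ subject, predicate, object, valid_from: at, valid_until: null });
  }

  // Objects of all currently active triples for a subject and predicate.
  active(subject: string, predicate: string): string[] {
    return this.triples
      .filter((t) => t.subject === subject && t.predicate === predicate && t.valid_until === null)
      .map((t) => t.object);
  }
}

const graph = new TripleStore(["lives_in", "partner"]);
graph.add("user", "lives_in", "Madrid", "2025-01-01");
graph.add("user", "lives_in", "Berlin", "2026-04-15"); // closes the Madrid triple
graph.add("user", "hobby", "chess", "2025-01-01");
graph.add("user", "hobby", "climbing", "2025-06-01"); // coexists with chess
console.log(graph.active("user", "lives_in"), graph.active("user", "hobby"));
```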

PARA lifecycle

Every memory is tagged with a PARA status:

Project — Active, time-bound work
Area — Ongoing responsibilities
Resource — Reference material (lists, recommendations, how-tos)
Archive — Completed or cancelled projects

When a project completes, the agent uses semantic similarity to archive only that project’s memories while preserving reference material for future use.

Project-end cascade

Say “I’m cancelling my trip to Barcelona” and a single tool call handles everything:

Archives the project’s vectors (events, decisions, requests tied to Barcelona).
Invalidates every active knowledge-graph triple whose predicate matches the project slug (scheduled_visit_barcelona, planning_visit_barcelona, confirmed_visit_barcelona).
Records the completion as a new triple: (user, "cancelled_visit_barcelona", "2026-04-15").

Matching is boundary-aware — a different project called revisit_barcelona stays untouched. The agent no longer has to orchestrate three separate calls in the right order, so smaller models get it right too.
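
One plausible way to implement boundary-aware matching is to require the slug to start and end on an underscore or a string edge. The sketch below assumes a slug of `visit_barcelona` and underscore-separated predicates; the exact rule the server applies is not specified here, so treat `predicateMatchesSlug` as a hypothetical helper.

```typescript
// Returns true when the slug appears in the predicate as a whole
// underscore-delimited unit, so "visit_barcelona" does not match inside
// "revisit_barcelona". The delimiter rule is an assumption.
function predicateMatchesSlug(predicate: string, slug: string): boolean {
  // Escape regex metacharacters in case a slug contains any.
  const escaped = slug.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  return new RegExp(`(^|_)${escaped}($|_)`).test(predicate);
}

const slug = "visit_barcelona";
const predicates = [
  "scheduled_visit_barcelona",
  "planning_visit_barcelona",
  "planning_revisit_barcelona",
];
for (const predicate of predicates) {
  console.log(predicate, predicateMatchesSlug(predicate, slug));
}
```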

If VECTOR_MEMORY is disabled or the embedding provider is unreachable, recall falls back to FTS5 keyword search only, without raising errors.

Skills

Skills are .md files in skills/ that define iOS Shortcuts the agent can trigger and/or behavior rules. They use dynamic loading: only a compact index (title, description, shortcut list) is injected into the system prompt. The agent loads full content on demand via the load_skill tool, keeping token usage low as you add more skills.

Each skill file uses YAML frontmatter:

```markdown
---
title: Notes
description: Create notes on the user's device with a title and body
shortcuts: [newNote]
target: mac
sync_app: Notes
---

### New Note

Shortcut name: `newNote`

Creates a new note on the user's device.

Data fields:
- title (string, required): Note title
- content (string, required): Note body
```

Frontmatter fields

Field	Required	Description
`title`	Yes	Human-readable name
`description`	Yes	One sentence used in the skills index shown to the agent
`shortcuts`	Yes	Array of shortcut names defined in the file. Use `[]` for behavior-only skills
`target`	No	Where shortcuts execute: `device` (default, sent to iOS) or `mac` (run on the server)
`sync_app`	No	App to nudge in the background after server-side execution to trigger iCloud sync (e.g. `Notes`, `Calendar`, `Reminders`). Omit or use `none` to skip
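
To show how the frontmatter fields in the table fit together, here is a small sketch that renders a skill file from typed metadata. It is loosely in the spirit of a typed writer such as create_user_skill, but the `SkillMeta` type and `renderSkill` function are hypothetical, and the sketch does no YAML quoting, so values containing characters like `:` would need extra care.

```typescript
// Field names follow the frontmatter table; the type itself is an assumption.
interface SkillMeta {
  title: string;
  description: string;
  shortcuts: string[]; // use [] for behavior-only skills
  target?: "device" | "mac";
  sync_app?: string;
}

// Render frontmatter plus body. Optional fields are omitted when unset.
function renderSkill(meta: SkillMeta, body: string): string {
  const lines = [
    "---",
    `title: ${meta.title}`,
    `description: ${meta.description}`,
    `shortcuts: [${meta.shortcuts.join(", ")}]`,
  ];
  if (meta.target !== undefined) lines.push(`target: ${meta.target}`);
  if (meta.sync_app !== undefined) lines.push(`sync_app: ${meta.sync_app}`);
  lines.push("---", "", body.trim(), "");
  return lines.join("\n");
}

const skillFile = renderSkill(
  {
    title: "Notes",
    description: "Create notes on the user's device with a title and body",
    shortcuts: ["newNote"],
    target: "mac",
    sync_app: "Notes",
  },
  "### New Note\n\nShortcut name: `newNote`",
);
console.log(skillFile);
```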

Skills can also be behavior rules without shortcuts (e.g., “how to plan a family trip”). Use shortcuts: [] for these.

The agent can create and manage skills when asked — ask it to “create a skill for controlling my lights” and it will write the .md file for you. New and edited skills always land in your user layer (data/user/skills/), so framework updates never overwrite them. See the Customizing your agent section below.

Executing shortcuts on the Mac server

When a skill has target: mac, shortcuts run silently on the Mac server via the shortcuts run CLI instead of being sent to the iOS device. This is ideal for actions that create iCloud-synced content — notes, reminders, calendar events — because the result syncs to all your devices automatically without needing the PocketHook app to do anything.

How it works:

The agent decides a shortcut should run (e.g. “create a note with today’s meeting notes”)
The server invokes shortcuts run "shortcutName" with the data passed as JSON on stdin, using the same wrapper format PocketHook iOS uses
If sync_app is set, the server briefly opens that app in the background (open -gj -a Notes) to force iCloud sync, then closes it after 5 seconds
The user receives a confirmation message in the chat; the shortcut itself is not sent to the device

Requirements:

The server must be running on macOS — shortcuts run is macOS-only. On other platforms, the server logs a warning and falls back to device execution
The shortcut must be installed in Shortcuts.app on the server Mac
The shortcut should expect a Dictionary as input (PocketHook wraps data in { context, timestamp, app, data })
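
The wrapper Dictionary mentioned in the last requirement can be pictured as follows. The four key names (`context`, `timestamp`, `app`, `data`) come from this page; the value formats (an ISO 8601 timestamp, the strings `"agent"` and `"PocketHook"`) and the `wrapShortcutData` helper are assumptions for illustration only.

```typescript
// Key names come from the text above; the value types are assumptions.
interface ShortcutInput {
  context: string;
  timestamp: string;
  app: string;
  data: Record<string, unknown>;
}

// Wrap shortcut data in the Dictionary a shortcut receives as input.
function wrapShortcutData(
  data: Record<string, unknown>,
  context = "agent",
  now: Date = new Date(),
): ShortcutInput {
  return { context, timestamp: now.toISOString(), app: "PocketHook", data };
}

// The serialized payload is what would be passed to the shortcut on stdin.
const payload = JSON.stringify(
  wrapShortcutData({ title: "Meeting notes", content: "Discussed the roadmap" }),
);
console.log(payload);
```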

When to use target: mac:

iCloud-synced actions (Notes, Reminders, Calendar) — the result reaches every device anyway
Long-running processing you want to keep off the iOS device
Any shortcut that doesn’t need to interact with the iPhone’s UI

When to keep target: device (default):

Shortcuts that need iPhone-only features (camera, precise location, local app automations)
Shortcuts that prompt the user for interactive input
Shortcuts that use App Intents from iOS-only apps

Background Jobs

Ask the agent to schedule tasks and it will handle the rest:

“Check the weather every morning at 8am and create a note”
“Run this script every hour”
“Remind me to check my email in 30 minutes”

Jobs support cron expressions (0 8 * * *) and simple intervals (30m, 1h, 2d). Results are delivered to PocketHook when it polls the /jobs endpoint.

Two execution types:

Shell — Runs a bash command, captures output. Can trigger a Shortcut on completion
Prompt — Processed by the AI agent with full tool access, stores the complete PocketHook response
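
The simple interval syntax (`30m`, `1h`, `2d`) is easy to turn into a delay in milliseconds. The parser below is a minimal sketch that accepts only those three units; the `parseInterval` name and the decision to return `null` for anything else (including cron expressions, which need a separate parser) are assumptions.

```typescript
// Milliseconds per supported unit: minutes, hours, days.
const UNIT_MS: Record<string, number> = {
  m: 60 * 1000,
  h: 60 * 60 * 1000,
  d: 24 * 60 * 60 * 1000,
};

// Parse "30m", "1h", "2d" into milliseconds; return null if the spec
// is not a simple interval.
function parseInterval(spec: string): number | null {
  const match = /^(\d+)([mhd])$/.exec(spec.trim());
  if (match === null) return null;
  const [, digits, unit] = match;
  if (digits === undefined || unit === undefined) return null;
  const unitMs = UNIT_MS[unit];
  const amount = Number(digits);
  if (unitMs === undefined || amount <= 0) return null;
  return amount * unitMs;
}

for (const spec of ["30m", "1h", "2d", "0 8 * * *"]) {
  console.log(spec, parseInterval(spec));
}
```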

Dev Servers

When the agent creates a web project in the workspace (Hugo, Astro, Next.js, Flask, Go, etc.), it proactively offers to serve it:

Preview — Starts a local dev server on an auto-assigned port for quick viewing
Public — Starts the server and exposes it via HTTPS tunnel so it’s accessible from anywhere

The agent manages the lifecycle: start, stop, and list running servers. All servers are cleaned up when the main server stops.

Tunnel contract

When the agent starts a server with tunnel exposure requested, the runtime enforces it: if no tunnel tool (Tailscale, ngrok, cloudflared) is installed, the server refuses to start. If tunnel setup fails after spawn, the orphan process is stopped and the agent is told explicitly — so it can fall back to preview mode or ask you to install a tunnel. The returned URL is always the tunnel URL when tunneling is on, with a note that the local URL is host-only.

As a safety net, every respond_* tool post-processes its message: any localhost or 127.0.0.1 URL that sneaks into a reply gets rewritten to the matching tunnel URL automatically when a managed server has one. When it can’t rewrite, you get a warning in the logs instead of a broken link on your phone.
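
The safety net can be sketched as a single regex replacement that maps each local origin to the tunnel URL registered for its port. The function name, the port-to-tunnel `Map`, and the returned `unresolved` list (which a caller could log as warnings) are assumptions; the real respond_* tools may match URLs differently.

```typescript
// Rewrite http(s)://localhost:PORT and http(s)://127.0.0.1:PORT origins to
// the tunnel URL for that port. Origins without a known tunnel are left
// as-is and reported back so the caller can log a warning.
function rewriteLocalUrls(
  message: string,
  tunnelsByPort: Map<number, string>,
): { text: string; unresolved: string[] } {
  const unresolved: string[] = [];
  const pattern = /https?:\/\/(?:localhost|127\.0\.0\.1):(\d+)/g;
  const text = message.replace(pattern, (origin: string, port: string) => {
    const tunnel = tunnelsByPort.get(Number(port));
    if (tunnel === undefined) {
      unresolved.push(origin);
      return origin;
    }
    return tunnel;
  });
  return { text, unresolved };
}

// Hypothetical managed server on port 4321 with a Tailscale tunnel.
const tunnels = new Map<number, string>([[4321, "https://my-host.ts.net"]]);
const result = rewriteLocalUrls(
  "Preview: http://localhost:4321/blog and http://127.0.0.1:9999/x",
  tunnels,
);
console.log(result.text, result.unresolved);
```

Only the origin is replaced, so paths and query strings after the port carry over to the tunnel URL unchanged.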

Dashboard

The built-in web dashboard at /dashboard shows a live overview of background jobs.

Unauthenticated by design. Both /dashboard and /api/jobs are open GET endpoints — anyone who can reach the host can list jobs. Restrict access at the network layer (Tailscale ACL, firewall, reverse proxy with basic auth) or set DASHBOARD=false if you don’t need it. The PocketHook iOS app doesn’t use these endpoints.

It’s fully customizable:

Quick edit — Place a dashboard.html in workspace/dashboard/ for simple customizations
Full project — Create a framework project (Svelte, React, Vue, etc.) in workspace/dashboard/ with build output to dist/

Ask the agent to customize your dashboard and it will handle the rest — each user gets a unique, personalized dashboard.

Custom Tools

The agent can install CLI tools and register them as new capabilities — extending itself without modifying the server code.

For example, say “install Playwright and use it to take screenshots”. The agent will:

Install the dependency
Create a tool definition (a simple .md file)
Use the new tool in future conversations

Custom tools are hot-reloaded — no restart needed. Delete the .md file to remove a tool.

Versioning

Workspace files and config files are versioned automatically:

Workspace files — Tracked with a local git repo inside workspace/. Every write creates an auto-commit. Ask the agent to “undo the last change” or use git revert HEAD manually
Config files — config/agent-instructions.md, config/personality.md, skills/, and permissions.json are backed up before each modification. Up to 20 versions per file

Git is optional — if not installed, workspace changes are unversioned. Config backups always work.

Customizing your agent

The agent server ships with a minimal framework base and expects you to layer your own customization on top. The runtime keeps the two apart so framework updates never clobber your work.

Framework vs user

```text
pockethook-agent-server/
├── skills/                      # framework-shipped skills (read-only)
├── custom-tools/                # reserved for framework-shipped tools (read-only)
├── config/
│   ├── agent-instructions.md    # framework agent instructions (read-only)
│   └── personality.md           # framework personality (read-only)
└── data/user/                   # YOUR customization lives here (git-ignored)
    ├── skills/                  # your own skills (override base on filename)
    ├── custom-tools/            # your installed custom tools
    ├── instructions.md          # your additions to agent instructions
    └── prefs.json               # typed values referenced as {{prefs.key}}
```

User customization is written via dedicated typed tools (create_user_skill, create_custom_tool) so the resulting files always match the loader’s format. The write tool also rejects any path under skills/, custom-tools/, or config/ and redirects the agent to data/user/* — so even direct file edits end up in the user layer.
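
A write guard of this kind can be approximated by normalizing the requested path and checking its first segment against the framework directories. The sketch below is an assumption about how such a check could look; `checkWritePath`, its return shape, and the hint text are hypothetical, and paths are treated as relative to the project root.

```typescript
// Framework directories that the agent's write tool should not touch.
const FRAMEWORK_DIRS = ["skills", "custom-tools", "config"];

// Resolve "." and ".." segments so "./data/../config/x" cannot slip through.
function normalizeSegments(relativePath: string): string[] {
  const out: string[] = [];
  for (const segment of relativePath.split(/[\\/]+/)) {
    if (segment === "" || segment === ".") continue;
    if (segment === "..") out.pop();
    else out.push(segment);
  }
  return out;
}

// Reject writes into framework directories and point at the user layer.
function checkWritePath(relativePath: string): { allowed: boolean; hint?: string } {
  const [first] = normalizeSegments(relativePath);
  if (first !== undefined && FRAMEWORK_DIRS.includes(first)) {
    return { allowed: false, hint: `Write under data/user/ instead of ${first}/` };
  }
  return { allowed: true };
}

console.log(checkWritePath("skills/notes.md"));
console.log(checkWritePath("data/user/skills/notes.md"));
```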

Note about the base custom-tools/ directory. Today it only holds a template (_example.md) that the loader ignores — every tool the agent installs for you goes to data/user/custom-tools/. The directory is reserved so future framework releases can ship optional built-in tools without clobbering your installs. When that happens, your user-layer files still win on tool-name collision, so there’s nothing to migrate.

Four ways to customize

What you want to change	Where it goes	Example
A shortcut or behavior skill	`data/user/skills/<name>.md`	“Create a skill to log my workouts”
A CLI tool wrapped as an agent capability	`data/user/custom-tools/<name>.md`	“Install ffmpeg and let me use it for conversions”
A global rule (“always reply in English”, “never use tables”)	`data/user/instructions.md`	“From now on, always summarize articles in 3 bullets”
A typed default value referenced by skills	`data/user/prefs.json`	“My default route origin is Madrid” → `{"routeOrigin": "Madrid"}`

You never have to write these files by hand. Just tell the agent what you want and it picks the right layer automatically.

Typed preferences with `{{prefs.*}}`

Say you write a route-planner skill that needs to know your default starting point. Instead of hardcoding “Madrid” into the skill, reference the pref:

- **Starting point**: {{prefs.routeOrigin}}, unless the user specifies a different origin.

And store the value in data/user/prefs.json:

```json
{
  "routeOrigin": "Madrid, Spain",
  "preferredMapsApp": "apple",
  "tunnel": { "domain": "my-host.ts.net" }
}
```

The server substitutes placeholders when the skill is loaded. Nested keys ({{prefs.tunnel.domain}}) work too. Unknown keys are left untouched so typos stay visible.
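
The substitution rules described here (nested keys resolve, unknown keys stay untouched) fit in a few lines. This is a sketch of the behavior, not the server's loader: the `substitutePrefs` name and the decision to substitute only string, number, and boolean values are assumptions.

```typescript
// Replace {{prefs.a.b}} placeholders with values from a prefs object.
// Placeholders that do not resolve to a primitive value are left as-is,
// so typos remain visible in the loaded skill.
function substitutePrefs(text: string, prefs: Record<string, unknown>): string {
  return text.replace(
    /\{\{prefs\.([A-Za-z0-9_.]+)\}\}/g,
    (placeholder: string, path: string) => {
      let current: unknown = prefs;
      for (const key of path.split(".")) {
        if (typeof current !== "object" || current === null) return placeholder;
        current = (current as Record<string, unknown>)[key];
      }
      const isPrimitive =
        typeof current === "string" ||
        typeof current === "number" ||
        typeof current === "boolean";
      return isPrimitive ? String(current) : placeholder;
    },
  );
}

const prefs = {
  routeOrigin: "Madrid, Spain",
  preferredMapsApp: "apple",
  tunnel: { domain: "my-host.ts.net" },
};
console.log(
  substitutePrefs(
    "Start at {{prefs.routeOrigin}}, host {{prefs.tunnel.domain}}, typo {{prefs.routOrigin}}",
    prefs,
  ),
);
```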

Editing the framework base directly

If you’re self-hosting and want to tweak the framework itself, you can edit config/agent-instructions.md, config/personality.md, skills/, or custom-tools/ directly in a file editor; the server doesn’t stop you. The agent, however, won’t write to those paths from a conversation, and framework updates will overwrite your edits. Prefer the user layer for anything you want to keep.

Extending the Server

Custom tools — Ask the agent to install CLI tools; they land in data/user/custom-tools/ automatically
Add skills — Ask the agent to create a skill; the file goes in data/user/skills/
Change behavior — Ask the agent to apply a global rule; it appends to data/user/instructions.md
Configure permissions — Run bun run permissions to control which tools the agent can use
Add built-in tools — Implement new tool functions in src/tools.ts for deeper integrations (requires forking the server)

Configuration

All settings are stored in .env (created by bun run setup). Key options:

Variable	Default	Description
`AUTH_TOKEN`	(required)	Shared secret with PocketHook
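`LLM_API_KEY`	(required for API-key providers)	LLM provider API key. Not needed for OAuth providers (GitHub Copilot, OpenAI Codex) or local providers (Ollama, LM Studio)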
`LLM_PROVIDER`	`anthropic`	Provider name
`LLM_MODEL`	`claude-sonnet-4-20250514`	Model ID
`LLM_REASONING`	`off`	Reasoning effort: `off`, `minimal`, `low`, `medium`, `high`, `xhigh`. Higher levels add hidden thinking tokens (slower + more expensive). Ignored by models that don’t support it
`PORT`	`3000`	Server port
`AGENT_NAME`	`PocketHook Assistant`	Agent display name
`MAX_HISTORY`	`50`	Messages in short-term memory
`MAX_RECALL`	`5`	Memories returned per turn by semantic recall (only when `VECTOR_MEMORY=true`)
`SESSION_TTL_MINUTES`	`60`	Session expiration
`VECTOR_MEMORY`	`false`	Enable semantic memory (requires an embedding provider)
`EMBEDDING_PROVIDER`	`ollama`	Embedding provider: `ollama`, `lm-studio`, or `openai`
`EMBEDDING_MODEL`	`nomic-embed-text`	Embedding model name
`EMBEDDING_URL`	(auto)	Embedding API URL
`EMBEDDING_API_KEY`	—	API key for OpenAI embeddings
`LOG_LEVEL`	`info`	Log level: debug, info, warn, error
`RATE_LIMIT_MAX`	`30`	Max requests per window
`DASHBOARD`	`true`	Enable web dashboard (`/dashboard` route)
`INSTANCE_NAME`	(project dir basename, with `pockethook-` stripped)	Suffix used for the system service label, log directory, and process matching. Set explicitly when running multiple checkouts on the same machine

See the full configuration reference in the GitHub repository.

Running as a Service

Install as a persistent service that starts automatically:

bun run service install

Platform	Backend	Service location
macOS	launchd	`~/Library/LaunchAgents/com.pockethook.${INSTANCE_NAME}.plist`
Linux	systemd (user)	`~/.config/systemd/user/pockethook-${INSTANCE_NAME}.service`
Windows	NSSM	`PocketHook-${PascalCase(INSTANCE_NAME)}` in Windows Service Manager

INSTANCE_NAME defaults to the project directory basename with the pockethook- prefix stripped (e.g., a checkout in pockethook-agent-server/ becomes agent-server). Set it explicitly to run several checkouts on the same machine without collisions — each instance keeps its own data/ and logs.
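
The naming rules from the table and the paragraph above can be written out as a short sketch. The function names are hypothetical, and the exact PascalCase conversion (splitting on non-alphanumeric characters and capitalizing each part) is an assumption about what `PascalCase(INSTANCE_NAME)` means.

```typescript
// Default instance name: directory basename without the "pockethook-" prefix.
function defaultInstanceName(projectDir: string): string {
  const segments = projectDir.split(/[\\/]+/).filter((s) => s.length > 0);
  const base = segments[segments.length - 1] ?? "";
  const prefix = "pockethook-";
  return base.startsWith(prefix) ? base.slice(prefix.length) : base;
}

// "agent-server" -> "AgentServer" (assumed conversion rule).
function toPascalCase(name: string): string {
  return name
    .split(/[^A-Za-z0-9]+/)
    .filter((part) => part.length > 0)
    .map((part) => part.charAt(0).toUpperCase() + part.slice(1))
    .join("");
}

// Per-platform service identifiers, following the table above.
function serviceNames(instanceName: string): { macos: string; linux: string; windows: string } {
  return {
    macos: `com.pockethook.${instanceName}.plist`,
    linux: `pockethook-${instanceName}.service`,
    windows: `PocketHook-${toPascalCase(instanceName)}`,
  };
}

const instance = defaultInstanceName("/home/me/pockethook-agent-server");
console.log(instance, serviceNames(instance));
```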

Manage with bun run service status, restart, stop, or uninstall.

Security

HTTPS required — PocketHook enforces HTTPS for all URLs
Bearer token auth — Shared secret between app and server
Rate limiting — Per-token limits prevent abuse
Sandboxed tools — Shell commands and file access restricted by permissions
Blocked patterns — Dangerous commands (sudo, rm -rf /) blocked by default
Working directory boundary — Agent can’t escape its designated directory
Sensitive files protected — .env, .git, *.key, *.pem blocked from agent access
Automatic versioning — All workspace changes are git-tracked for easy rollback
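
As an illustration of per-token rate limiting, here is a minimal fixed-window limiter. The default of 30 matches `RATE_LIMIT_MAX` in the configuration table, but the window length, the fixed-window strategy, and the class itself are assumptions; this page does not state which algorithm the server uses.

```typescript
// Hypothetical fixed-window limiter: at most `max` requests per token
// within each window of `windowMs` milliseconds.
class FixedWindowRateLimiter {
  private readonly counters = new Map<string, { windowStart: number; count: number }>();
  private readonly max: number;
  private readonly windowMs: number;

  constructor(max: number, windowMs: number) {
    this.max = max;
    this.windowMs = windowMs;
  }

  // Returns true if the request is allowed and counts it.
  allow(token: string, now: number): boolean {
    const entry = this.counters.get(token);
    if (entry === undefined || now - entry.windowStart >= this.windowMs) {
      // First request for this token, or the previous window has ended.
      this.counters.set(token, { windowStart: now, count: 1 });
      return true;
    }
    if (entry.count >= this.max) return false;
    entry.count += 1;
    return true;
  }
}

// 30 requests per token per minute (the one-minute window is an assumption).
const limiter = new FixedWindowRateLimiter(30, 60 * 1000);
console.log(limiter.allow("token-a", Date.now()));
```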
