Tools — overview

A tool is a function the AI can call to do something on the visitor’s behalf — scroll a page, click a button, search your content, capture a lead, end the call. Every tool has a name, a JSON schema for its parameters, and a server-side handler.

Spelo’s widget ships with 17 built-in tools organized into 4 groups. The AI picks which one to call based on what the visitor said.

Page perception: read_page · read_viewport · read_section · see.snapshot (how the AI knows what's on the screen).

Actions: navigate · scroll · click · fill · confirm (the AI moves around and fills forms by voice).

Knowledge & lifecycle: search_knowledge_base · submit_lead · end_call · set_flow_state (search, capture, transition).

How tool-calling works end-to-end

1. Visitor speaks
   "Show me the pricing page"
        │
        ▼
2. OpenAI Realtime API (over WebRTC, no Spelo middleman)
   transcribes audio + decides which tool to call
        │
        ▼
3. Tool call returns to the browser as a data-channel event:
   { name: "navigate", arguments: { url: "/pricing" } }
        │
        ▼
4. Widget runs the local handler in spelo.js
   → navigate("/pricing") triggers in-page nav, returns "Navigated to /pricing"
        │
        ▼
5. Tool result is sent back to OpenAI
   model speaks: "Done — you're on the pricing page now."

On the default openai-direct transport, audio never touches Spelo servers. Most tool handlers run in the visitor’s browser; only search_knowledge_base, read_section, and submit_lead run on Spelo’s API, because they need server credentials.
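The browser/server split can be pictured as a small routing table. This is a minimal sketch, not Spelo's actual code: `SERVER_BACKED_TOOLS` and `handlerLocation` are hypothetical names, and only the three server-backed tool names are taken from the text above.

```typescript
// Hypothetical sketch: classify where a tool's handler runs.
// Only the three tool names below come from the documentation; the rest is assumed.
type HandlerLocation = "browser" | "server";

const SERVER_BACKED_TOOLS: readonly string[] = [
  "search_knowledge_base",
  "read_section",
  "submit_lead",
];

// Any tool not listed as server-backed is treated as a local browser handler.
function handlerLocation(toolName: string): HandlerLocation {
  return SERVER_BACKED_TOOLS.indexOf(toolName) !== -1 ? "server" : "browser";
}
```

For example, `handlerLocation("navigate")` yields `"browser"`, while `handlerLocation("submit_lead")` yields `"server"`.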

Two protocol generations: v2 (see/act) and legacy

Spelo’s tool surface evolved in two phases. Both are live and the LLM picks between them per task.

v2 — namespaced `see.` / `act.` (preferred)

The v2 protocol is snapshot-driven: the AI first asks for a structured grid of every interactive/textual element on the page (see.snapshot), gets back stable ids, then acts on those ids (act.click, act.fill, act.scroll_to).

Why this is better:

Stable element addressing — no fuzzy text matching, no “I see two ‘Submit’ buttons” ambiguity
Survives DOM rewrites mid-conversation
Snapshot tells the AI exactly what’s visible vs. below the fold
Icon-only buttons (where click_element by text fails) become reliably targetable

The flow: see.snapshot → AI picks an id from the result → act.click({ id: "sp-12" }).
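To illustrate why id-based addressing removes ambiguity, here is a small sketch that compares a text lookup with an id lookup over a snapshot. The `SnapshotElement` shape and both helper functions are assumptions made for illustration; the real see.snapshot payload is not shown in this document.

```typescript
// Hypothetical snapshot entry; field names are assumed, not Spelo's real payload.
interface SnapshotElement {
  id: string;          // stable id, e.g. "sp-12"
  role: string;        // e.g. "button", "link", "input"
  text: string;        // visible label; empty for icon-only buttons
  inViewport: boolean; // visible now vs. below the fold
}

// Legacy-style lookup: match by visible text. It returns every match so the
// caller can see when the text is ambiguous (more than one) or missing (none).
function findByText(elements: SnapshotElement[], text: string): SnapshotElement[] {
  const wanted = text.trim().toLowerCase();
  return elements.filter((el) => el.text.trim().toLowerCase() === wanted);
}

// v2-style lookup: an id resolves to at most one element.
function findById(elements: SnapshotElement[], id: string): SnapshotElement | undefined {
  for (const el of elements) {
    if (el.id === id) return el;
  }
  return undefined;
}

const exampleSnapshot: SnapshotElement[] = [
  { id: "sp-11", role: "button", text: "Submit", inViewport: true },
  { id: "sp-12", role: "button", text: "Submit", inViewport: false },
  { id: "sp-13", role: "button", text: "", inViewport: true }, // icon-only
];
```

With this data, `findByText(exampleSnapshot, "Submit")` returns two candidates, whereas `findById(exampleSnapshot, "sp-12")` returns exactly one element. The icon-only button has no text to match on, but it is still reachable as `"sp-13"`.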

Legacy flat tools (still wired as fallback)

The legacy tools (navigate, scroll_to, scroll_by, click_element, fill_field) address elements by visible text or label. They still work, and the LLM uses them when:

No snapshot has been taken yet for the current page
The target is unambiguously named (a single “Submit” button on a small form)
The legacy tool is more efficient (e.g. scroll_by 50% is pure viewport math and needs no snapshot)

You don’t have to configure anything — the AI picks the right path automatically per the instructions in its system prompt.
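The scroll_by 50% case mentioned above needs no element lookup at all. A sketch of the arithmetic, assuming the percentage is relative to the viewport height and that the result is clamped to the page bounds (both are assumptions; `scrollTargetPx` is a hypothetical helper, not a Spelo function):

```typescript
// Hypothetical helper: turn a scroll_by percentage into a clamped scroll position.
// Assumes "percent" is a fraction of the viewport height; positive scrolls down.
function scrollTargetPx(
  currentY: number,
  viewportHeight: number,
  pageHeight: number,
  percent: number,
): number {
  const delta = (viewportHeight * percent) / 100;
  // The furthest the page can scroll is its height minus one viewport.
  const maxY = Math.max(0, pageHeight - viewportHeight);
  return Math.min(maxY, Math.max(0, currentY + delta));
}
```

On an 800 px viewport at the top of a 3000 px page, a 50% scroll gives `scrollTargetPx(0, 800, 3000, 50)`, which is 400. Near the bottom the result is capped at 2200, and a negative percentage never goes above the top of the page.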

The wire format

Every tool follows the OpenAI Realtime function-calling schema:

{
  "type": "function",
  "name": "navigate",
  "description": "Navigate to a different page on the website. Use the href paths from the NAVIGATION section of the page context...",
  "parameters": {
    "type": "object",
    "properties": {
      "url": { "type": "string", "description": "..." },
      "external": { "type": "boolean", "description": "..." }
    },
    "required": ["url"]
  }
}
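The schema above maps naturally onto a TypeScript type. The sketch below covers only the fields visible in the example, plus a small argument check. `FunctionTool`, `ToolParameter`, and `checkToolArgs` are hypothetical names, and the elided description strings are left elided.

```typescript
// Types limited to the fields that appear in the example schema above.
interface ToolParameter {
  type: "string" | "boolean" | "number";
  description?: string;
}

interface FunctionTool {
  type: "function";
  name: string;
  description: string;
  parameters: {
    type: "object";
    properties: Record<string, ToolParameter>;
    required?: string[];
  };
}

// Hypothetical validator: returns a list of problems, empty when the args look fine.
function checkToolArgs(tool: FunctionTool, args: Record<string, unknown>): string[] {
  const problems: string[] = [];
  for (const key of tool.parameters.required || []) {
    if (!(key in args)) problems.push("missing required parameter: " + key);
  }
  for (const key of Object.keys(args)) {
    const spec = tool.parameters.properties[key];
    if (!spec) {
      problems.push("unknown parameter: " + key);
    } else if (typeof args[key] !== spec.type) {
      problems.push("parameter " + key + " should be " + spec.type);
    }
  }
  return problems;
}

const navigateTool: FunctionTool = {
  type: "function",
  name: "navigate",
  description: "Navigate to a different page on the website. Use the href paths from the NAVIGATION section of the page context...",
  parameters: {
    type: "object",
    properties: {
      url: { type: "string", description: "..." },
      external: { type: "boolean", description: "..." },
    },
    required: ["url"],
  },
};
```

Checking arguments before running a handler is a design choice rather than something the schema enforces by itself: model-generated arguments usually match the schema, but a handler that validates them can return a clear error string instead of failing silently.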

When the AI invokes a tool, the data channel delivers:

{
  "type": "response.function_call_arguments.done",
  "call_id": "call_xyz",
  "name": "navigate",
  "arguments": "{ \"url\": \"/pricing\" }"
}

The widget runs the handler and replies:

{
  "type": "conversation.item.create",
  "item": {
    "type": "function_call_output",
    "call_id": "call_xyz",
    "output": "Navigated to /pricing"
  }
}

The model then continues its turn — usually with a short spoken confirmation.
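The round trip above can be sketched as a small dispatcher that parses the JSON-encoded arguments, runs a handler, and builds the reply events. This is an illustration under assumptions, not the widget's code: `buildToolReply`, `runTool`, and the handler map are hypothetical, and handlers are synchronous here for brevity (server-backed ones would be asynchronous). The trailing `response.create` reflects how the Realtime API generally expects the client to request the next model turn; whether the widget sends it in exactly this form is an assumption.

```typescript
// Event shapes limited to the fields shown in the examples above.
interface FunctionCallDone {
  type: "response.function_call_arguments.done";
  call_id: string;
  name: string;
  arguments: string; // JSON-encoded string, not an object
}

interface FunctionCallOutputEvent {
  type: "conversation.item.create";
  item: { type: "function_call_output"; call_id: string; output: string };
}

interface ResponseCreateEvent {
  type: "response.create";
}

type SyncToolHandler = (args: Record<string, unknown>) => string;

// Runs the handler; failures become output text so the model can respond to them.
function runTool(callEvent: FunctionCallDone, handlers: Record<string, SyncToolHandler>): string {
  try {
    const handler = handlers[callEvent.name];
    if (!handler) throw new Error("Unknown tool: " + callEvent.name);
    return handler(JSON.parse(callEvent.arguments) as Record<string, unknown>);
  } catch (err) {
    return "Error: " + (err instanceof Error ? err.message : String(err));
  }
}

// Builds the client events to send back over the data channel, in order.
function buildToolReply(
  callEvent: FunctionCallDone,
  handlers: Record<string, SyncToolHandler>,
): [FunctionCallOutputEvent, ResponseCreateEvent] {
  return [
    {
      type: "conversation.item.create",
      item: {
        type: "function_call_output",
        call_id: callEvent.call_id,
        output: runTool(callEvent, handlers),
      },
    },
    { type: "response.create" },
  ];
}
```

Returning errors as tool output, rather than throwing, keeps the conversation going: the model receives the error string and can tell the visitor what went wrong.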

Where tools live in the codebase

| Concern | File |
| --- | --- |
| LLM-visible tool schemas + browser handlers | `packages/spelo-system/src/tools.ts` |
| Server-side voice-relay tool handlers | `apps/voice-relay/src/tools.ts` |
| Server endpoints called by browser handlers | `apps/api/src/routes/{search,read-section,leads}.ts` |

When customers ask “what can Spelo do on my site?”, the answer is whatever tools are in voiceTools[] in packages/spelo-system/src/tools.ts at bundle build time.

Tools by transport

Spelo has two voice transports, and the LLM-visible toolset differs slightly between them:

| Transport | When used | search_knowledge_base | search_database | submit_lead |
| --- | --- | --- | --- | --- |
| openai-direct (browser ↔ OpenAI WebRTC) | Most customer sites, by default | Yes (RAG over crawled content) | Hidden (DFY tier) | Yes |
| voice-relay (browser ↔ our managed voice infrastructure ↔ Gemini/OpenAI) | Sites configured for Gemini, or free-tier overflow | Yes | Yes (full DB adapter) | Yes |

If your site uses connected database adapters and you want the AI to query them directly, see Plans and limits for the DFY tier requirements.
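One way to picture the per-transport difference is a filter applied to the tool list before it is sent to the model. The sketch below is an assumption about the mechanism, not Spelo's implementation: `HIDDEN_BY_TRANSPORT` and `visibleTools` are hypothetical names, and plan tier is deliberately ignored.

```typescript
type Transport = "openai-direct" | "voice-relay";

// Hypothetical table mirroring the transport comparison above:
// search_database is hidden on openai-direct and exposed on voice-relay.
// Plan tier (DFY) is not modelled here.
const HIDDEN_BY_TRANSPORT: Record<Transport, string[]> = {
  "openai-direct": ["search_database"],
  "voice-relay": [],
};

// Returns the tool names the model would see on the given transport.
function visibleTools(allToolNames: string[], transport: Transport): string[] {
  const hidden = HIDDEN_BY_TRANSPORT[transport];
  return allToolNames.filter((toolName) => hidden.indexOf(toolName) === -1);
}
```

Given `["search_knowledge_base", "search_database", "submit_lead"]`, the openai-direct transport would keep the first and last entries, and voice-relay would keep all three.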

How the AI decides which tool to call

The AI’s choice is driven by the tool descriptions in the schema (the description field shown in the wire-format example above), together with the routing instructions in its system prompt. The descriptions are sent to the model along with that prompt. The AI doesn’t know how a tool is implemented; it picks based on what the description says the tool does.

This is why Spelo’s tool descriptions are precise and behavioural. For example:

scroll_to — Scroll to a specific NAMED section, heading, or element on the page. Use when the user names a destination (“scroll to the pricing section”, “go to the contact form”, “show me the FAQ”). For vague directional moves like “scroll down a little” use scroll_by instead.

The “use when … but use X for Y” phrasing steers the model toward the right tool when two tools could plausibly apply.

What customers should and shouldn’t worry about

You don’t need to:

Define tools yourself (they’re built into the bundle)
Wire up handlers (the widget does this)
Pick which tool the AI uses (the AI does this from descriptions)

You can influence tool behaviour via:

Personality + custom instructions — change tone and add domain-specific guidance
Restricted topics — hard-block topics the AI shouldn’t answer
Enabled / disabled pages — keep the orb off /checkout etc.
Connect a database — unlock the search_database capability (DFY tier)
Webhooks — be notified when the AI captures a lead

Other tool categories

Page perception · Actions · Knowledge & lifecycle