Adapter failures and failover

A connected data adapter is a live dependency. Real databases go down — replicas lag, connection pools exhaust, cloud providers have incidents. This page covers what happens in a Spelo voice call when that occurs and what you can do about it.

What the visitor experiences when the adapter is down

The agent does not crash. It degrades gracefully to DOM-only mode: answers still come from the current page’s content (read_page) and the crawled knowledge base (search_knowledge_base), while live database queries fail soft.

A typical visible failure mode:

Visitor: "Show me 2-bedroom apartments in West Hollywood."
Agent:   "I'm having trouble pulling live listings right now, but
          our latest published list is on our /apartments page —
          I can take you there now."

The agent knows the tool failed (it sees the error in the tool-result envelope), explains it in human terms, and offers a fallback. No technical jargon, no crash, no dropped call.

The failure detection path

Visitor speaks
     │
     ▼
AI calls search_database (or /v1/<siteId>/query)
     │
     ▼
Spelo API → your adapter's search() function
     │
     ▼
Adapter throws / times out / returns malformed
     │
     ▼
Spelo API returns:
   { "success": true,
     "data": { "success": false, "error": "Connection refused" } }
     │
     ▼
Browser widget forwards to model
     │
     ▼
Model's system prompt says: "If a tool returns success:false,
   apologize briefly + offer the DOM/KB fallback + don't keep retrying."

The wrapper-level success: true means “the API call itself succeeded” — the inner data.success: false means “the adapter said no.” This distinction lets your monitoring pick out adapter outages from API outages.
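
To make the two-level envelope concrete, here is a minimal sketch of how a monitoring script might tell the two outage types apart. The function name and the three labels are illustrative, not part of the Spelo API; only the `success` and `data.success` fields come from the response shown above.

```python
from typing import Any, Mapping


def classify_response(body: Mapping[str, Any]) -> str:
    """Classify a query response as 'ok', 'adapter_failure', or 'api_failure'."""
    # Outer flag: did the Spelo API call itself succeed?
    if not body.get("success", False):
        return "api_failure"
    data = body.get("data")
    # Inner flag: did the adapter behind the API succeed?
    if isinstance(data, Mapping) and data.get("success") is False:
        return "adapter_failure"
    return "ok"


adapter_down = {
    "success": True,
    "data": {"success": False, "error": "Connection refused"},
}
print(classify_response(adapter_down))  # adapter_failure
```

Counting `adapter_failure` and `api_failure` as separate metrics keeps an outage in your own database from showing up as a Spelo API incident on your dashboards.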

Retry behavior

At the adapter layer:

Adapter	Default retries	Backoff	Timeout per attempt
Postgres, MySQL, MSSQL	1 retry	200 ms fixed	5 s
MongoDB	2 retries	100 ms, 400 ms	4 s
Shopify (Storefront API)	1 retry on 5xx only	500 ms	6 s
Airtable	2 retries on 429 (honors `Retry-After`)	per header	5 s
Google Sheets	1 retry	1 s	8 s
Firebase	1 retry	500 ms	4 s
REST adapter	1 retry on 5xx + network errors	500 ms	depends on config `timeout_ms`
Webhook adapter	0 retries by default (your endpoint controls this)	—	depends on config

After retries exhaust, the API returns data.success: false with the last error.
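
The retry-then-fail-soft behavior can be sketched as follows. This is an illustration of the pattern, not Spelo's actual adapter code: `call_with_retries` and its parameters are hypothetical names, the per-attempt timeout is left out, and a real adapter would retry only on the error classes listed in the table.

```python
import time
from typing import Any, Callable, Dict, Sequence


def call_with_retries(
    attempt: Callable[[], Any],
    backoffs_s: Sequence[float],
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Run attempt once, then once more per backoff entry; never raise."""
    last_error = "unknown error"
    for i in range(len(backoffs_s) + 1):
        try:
            return {"success": True, "data": attempt()}
        except Exception as exc:  # sketch: retries on any error
            last_error = str(exc)
            if i < len(backoffs_s):
                # Wait before the next attempt (e.g. 0.1 s, then 0.4 s).
                sleep(backoffs_s[i])
    # Retries exhausted: report the last error in the inner envelope.
    return {"success": False, "error": last_error}
```

Under these assumptions, `call_with_retries(run_query, backoffs_s=[0.1, 0.4])` mirrors the MongoDB row (2 retries at 100 ms and 400 ms), and `backoffs_s=[0.2]` mirrors the SQL row, where `run_query` stands for whatever function performs the query.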

At the model layer:

The system prompt instructs the AI not to call search_database again in the same turn after a failure. This avoids tying up the call with retries while the visitor waits in silence. The AI moves on to fallback paths immediately.

Monitoring — the kb-health endpoint

Spelo exposes a health endpoint per site that summarizes adapter status:

curl https://api.spelo.ai/v1/sites/:id/kb-health \
  -H "Authorization: Bearer vk_live_..."

Returns:

{
  "success": true,
  "data": {
    "site_id": "ab1c2d3e",
    "provider": "openai",
    "adapter": {
      "type": "postgres",
      "status": "healthy",
      "last_check": "2026-05-15T07:30:00Z",
      "last_success": "2026-05-15T07:30:00Z",
      "consecutive_failures": 0
    },
    "knowledge_base": {
      "total_pages": 24,
      "total_chunks": 412,
      "last_crawl": "2026-05-14T22:00:00Z"
    },
    "voice_relay_deployed": true,
    "gemini_runtime_broken": false
  }
}

Hook this into your monitoring (Datadog, Grafana, BetterStack, internal). Alert on adapter.status != "healthy" or consecutive_failures > 3.

status values:

Value	Meaning
`healthy`	Last ping succeeded within the last 5 minutes
`degraded`	1-2 consecutive failures, retries pending
`unhealthy`	3+ consecutive failures, adapter is currently failing
`unknown`	Never been pinged (just configured)

The health check runs every 5 minutes per site automatically.
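
If your monitoring tool cannot poll the endpoint directly, a small script can apply the alert rule above. The sketch below assumes the response shape shown earlier; `fetch_kb_health` and `alert_reason` are hypothetical helper names, and the failure threshold is a parameter so you can tune it.

```python
import json
import urllib.request
from typing import Any, Mapping, Optional


def fetch_kb_health(site_id: str, api_key: str) -> Mapping[str, Any]:
    """GET the kb-health document for one site."""
    req = urllib.request.Request(
        f"https://api.spelo.ai/v1/sites/{site_id}/kb-health",
        headers={"Authorization": f"Bearer {api_key}"},
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)


def alert_reason(health: Mapping[str, Any], failure_threshold: int = 3) -> Optional[str]:
    """Return a short alert reason, or None if the adapter looks fine."""
    adapter = health.get("data", {}).get("adapter", {})
    status = adapter.get("status", "unknown")
    failures = adapter.get("consecutive_failures", 0)
    if failures > failure_threshold:
        return f"{failures} consecutive adapter failures"
    if status != "healthy":
        return f"adapter status is {status}"
    return None
```

Since the built-in check runs every 5 minutes, polling more often than that is unlikely to surface new information.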

Webhook events for failures

Subscribe to the adapter.failed event to get notified the moment an adapter starts failing:

curl -X POST https://api.spelo.ai/v1/webhooks \
  -H "Authorization: Bearer vk_live_..." \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://yourapp.com/spelo-adapter-alert",
    "events": ["adapter.failed", "adapter.recovered"]
  }'

Payload:

{
  "event": "adapter.failed",
  "site_id": "ab1c2d3e",
  "adapter_type": "postgres",
  "error_code": "connection_refused",
  "error_message": "ECONNREFUSED 10.0.1.50:5432",
  "first_failure_at": "2026-05-15T07:45:00Z",
  "consecutive_failures": 3
}

A matching adapter.recovered fires when the next health check succeeds.
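
On the receiving side, the two events pair up naturally as "open incident" and "close incident". The sketch below shows that bookkeeping as a pure function you could call from any web framework's request handler. It assumes the adapter.recovered payload carries the same `site_id` field as adapter.failed, which this page does not show; the function name and return labels are illustrative.

```python
from typing import Any, Dict, Mapping


def apply_adapter_event(
    open_incidents: Dict[str, Dict[str, Any]],
    payload: Mapping[str, Any],
) -> str:
    """Update the open-incident map from one webhook payload; return the action taken."""
    site_id = payload["site_id"]
    event = payload.get("event")
    if event == "adapter.failed":
        if site_id in open_incidents:
            return "ignored"  # already tracking this outage
        open_incidents[site_id] = {
            "adapter_type": payload.get("adapter_type"),
            "error_code": payload.get("error_code"),
            "since": payload.get("first_failure_at"),
        }
        return "opened"
    if event == "adapter.recovered":
        # Close the incident if one is open (assumed payload field: site_id).
        return "closed" if open_incidents.pop(site_id, None) else "ignored"
    return "ignored"
```

Keying incidents by site keeps duplicate adapter.failed deliveries from paging you twice for the same outage.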

Recovery patterns

Pattern A — restart the adapter pool

For Postgres / MySQL / MongoDB, the Spelo API may still hold stale connections from the outage after the database itself has recovered. Force a fresh pool:

curl -X POST https://api.spelo.ai/v1/sites/:id/adapter/reset-pool \
  -H "Authorization: Bearer vk_live_..."

Returns immediately. The next adapter call will negotiate a fresh connection.

Pattern B — re-validate credentials

If your DB credentials rotated and the adapter is failing with auth errors:

curl -X PATCH https://api.spelo.ai/v1/sites/:id \
  -H "Authorization: Bearer vk_live_..." \
  -H "Content-Type: application/json" \
  -d "{ \"adapter\": { \"config\": { \"password\": \"${NEW_PASSWORD}\" } } }"

Then verify with POST /v1/sites/:id/adapter/test.

Pattern C — temporarily disable the adapter

If your DB is going to be down for an extended planned maintenance window, disable the adapter so the AI doesn’t try at all (cleaner UX than letting every call fail soft):

curl -X PATCH https://api.spelo.ai/v1/sites/:id \
  -H "Authorization: Bearer vk_live_..." \
  -H "Content-Type: application/json" \
  -d '{ "adapter": { "enabled": false } }'

The AI falls back to DOM-only + knowledge-base mode for the duration. Re-enable when maintenance is done.
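
If maintenance is scripted, wrapping it so the adapter is always re-enabled avoids leaving a site in fallback mode by accident. The sketch below assumes that re-enabling uses the same PATCH with `"enabled": true`, which this page implies but does not show; `adapter_disabled` and `patch_site` are hypothetical names, with `patch_site` standing in for whatever function sends the PATCH body to /v1/sites/:id.

```python
from contextlib import contextmanager
from typing import Any, Callable, Dict, Iterator

PatchSite = Callable[[Dict[str, Any]], None]


@contextmanager
def adapter_disabled(patch_site: PatchSite) -> Iterator[None]:
    """Disable the adapter for the duration of the with-block, then re-enable it."""
    patch_site({"adapter": {"enabled": False}})
    try:
        yield
    finally:
        # Re-enable even if the maintenance step raised (assumed request body).
        patch_site({"adapter": {"enabled": True}})
```

Usage would look like `with adapter_disabled(send_patch): run_migration()`, where `send_patch` and `run_migration` are your own functions.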

What customers should NOT do

Common failure modes by adapter

Adapter	Most common cause	Fix
Postgres / MySQL	Connection pool exhausted	Increase `max_connections` on DB, or reduce concurrent voice sessions
Postgres / MySQL	SSL cert expired	Renew, update `ssl.ca` in config
MongoDB	Replica set primary changed	Auto-recovers via driver; may see 1-2 failures during election
Shopify	Storefront API rate-limited (60 req/min per token)	Use multiple tokens, or upgrade to Plus
Airtable	Per-base rate limit (5 req/sec)	Same as Shopify — multiple PATs, or aggregate queries
Google Sheets	Token expired	OAuth refresh — see OAuth → refresh UX
REST	Endpoint changed shape	Re-check `collections.*.searchable_fields` matches
Webhook	Your endpoint down	The whole point of using a webhook is you control it — fix your endpoint