Adapter failures and failover
A connected data adapter is a live dependency. Real databases go down — replicas lag, connection pools exhaust, cloud providers have incidents. This page covers what happens in a Spelo voice call when that occurs and what you can do about it.
What the visitor experiences when the adapter is down
The agent does not crash. It gracefully degrades to DOM-only mode: answers come from the current page’s content (read_page) and the crawled knowledge base (search_knowledge_base), but specific database queries fail soft.
A typical visible failure mode:
Visitor: "Show me 2-bedroom apartments in West Hollywood."

Agent: "I'm having trouble pulling live listings right now, but our latest published list is on our /apartments page. I can take you there now."

The agent knows the tool failed (it sees the error in the tool-result envelope), explains it in human terms, and offers a fallback. No technical jargon, no crash, no dropped call.
The failure detection path
```text
Visitor speaks
      │
      ▼
AI calls search_database (or /v1/<siteId>/query)
      │
      ▼
Spelo API → your adapter's search() function
      │
      ▼
Adapter throws / times out / returns malformed
      │
      ▼
Spelo API returns:
  { "success": true, "data": { "success": false, "error": "Connection refused" } }
      │
      ▼
Browser widget forwards to model
      │
      ▼
Model's system prompt says:
  "If a tool returns success:false, apologize briefly + offer the DOM/KB fallback + don't keep retrying."
```

The wrapper-level success: true means “the API call itself succeeded”; the inner data.success: false means “the adapter said no.” This distinction lets your monitoring tell adapter outages apart from API outages.
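The two-level envelope can be classified in a few lines on the monitoring side. This is a minimal sketch, not Spelo's own code: the function name `classify_response` and the three labels are made up for illustration, and it assumes an API-level failure shows up as an outer `success` that is false or missing, which this page does not spell out.

```python
from typing import Any


def classify_response(envelope: dict[str, Any]) -> str:
    """Return 'ok', 'adapter_failure', or 'api_failure' for a query response.

    Hypothetical helper: the labels and the API-failure shape are assumptions.
    """
    # Outer flag: did the Spelo API call itself succeed?
    if not envelope.get("success", False):
        return "api_failure"

    # Inner flag: did the adapter answer the query?
    data = envelope.get("data")
    if isinstance(data, dict) and data.get("success") is False:
        return "adapter_failure"

    return "ok"
```

Counting `adapter_failure` and `api_failure` separately is what lets a dashboard distinguish "our database is down" from "the Spelo API is down".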
Retry behavior
At the adapter layer:
| Adapter | Default retries | Backoff | Timeout per attempt |
|---|---|---|---|
| Postgres, MySQL, MSSQL | 1 retry | 200 ms fixed | 5 s |
| MongoDB | 2 retries | 100 ms, 400 ms | 4 s |
| Shopify (Storefront API) | 1 retry on 5xx only | 500 ms | 6 s |
| Airtable | 2 retries on 429 (honors Retry-After) | per header | 5 s |
| Google Sheets | 1 retry | 1 s | 8 s |
| Firebase | 1 retry | 500 ms | 4 s |
| REST adapter | 1 retry on 5xx + network errors | 500 ms | depends on config timeout_ms |
| Webhook adapter | 0 retries by default (your endpoint controls this) | — | depends on config |
Once retries are exhausted, the API returns data.success: false with the last error.
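To make the table concrete, here is a rough sketch of a retry loop with a fixed backoff schedule, using the MongoDB row (2 retries, 100 ms then 400 ms) as the example. It is an illustration of the general pattern, not Spelo's implementation: `call_with_retries` and the `result` key are hypothetical, and the per-attempt timeout is assumed to be enforced inside `attempt` itself.

```python
import time
from typing import Any, Callable, Sequence


def call_with_retries(
    attempt: Callable[[], Any],
    backoffs_s: Sequence[float],
    sleep: Callable[[float], None] = time.sleep,
) -> dict[str, Any]:
    """Run `attempt` once, then retry once per entry in `backoffs_s`.

    Returns a dict shaped like the inner envelope: success plus either a
    result (hypothetical key) or the last error message.
    """
    last_error: Exception | None = None
    for i in range(len(backoffs_s) + 1):
        try:
            return {"success": True, "result": attempt()}
        except Exception as exc:  # any adapter error counts as a failed attempt
            last_error = exc
            if i < len(backoffs_s):
                sleep(backoffs_s[i])  # wait before the next retry
    return {"success": False, "error": str(last_error)}


# MongoDB row from the table: 2 retries, waiting 100 ms and then 400 ms.
MONGODB_BACKOFFS_S = (0.1, 0.4)
```

Passing `sleep` as a parameter keeps the loop testable without real delays.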
At the model layer:
The system prompt instructs the AI not to call search_database again in the same turn after a failure. This avoids tying up the call with retries while the visitor waits in silence. The AI moves on to fallback paths immediately.
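Spelo applies this rule through the system prompt. If you run your own tool-calling loop and want the same behavior enforced in code, a per-turn guard along these lines could work. Everything here (`TurnToolGuard` and its method names) is hypothetical and only illustrates the "no second call in the same turn" idea.

```python
class TurnToolGuard:
    """Tracks which tools have failed during the current conversational turn."""

    def __init__(self) -> None:
        self._failed: set[str] = set()

    def start_turn(self) -> None:
        # A new visitor utterance begins: earlier failures no longer block calls.
        self._failed.clear()

    def record_result(self, tool: str, result: dict) -> None:
        # Mirrors the inner envelope: success false means the tool failed.
        if result.get("success") is False:
            self._failed.add(tool)

    def may_call(self, tool: str) -> bool:
        # Block a repeat call to a tool that already failed this turn.
        return tool not in self._failed
```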
Monitoring — the kb-health endpoint
Spelo exposes a health endpoint per site that summarizes adapter status:
```bash
curl https://api.spelo.ai/v1/sites/:id/kb-health \
  -H "Authorization: Bearer vk_live_..."
```

Returns:
```json
{
  "success": true,
  "data": {
    "site_id": "ab1c2d3e",
    "provider": "openai",
    "adapter": {
      "type": "postgres",
      "status": "healthy",
      "last_check": "2026-05-15T07:30:00Z",
      "last_success": "2026-05-15T07:30:00Z",
      "consecutive_failures": 0
    },
    "knowledge_base": {
      "total_pages": 24,
      "total_chunks": 412,
      "last_crawl": "2026-05-14T22:00:00Z"
    },
    "voice_relay_deployed": true,
    "gemini_runtime_broken": false
  }
}
```

Hook this into your monitoring (Datadog, Grafana, BetterStack, internal). Alert on adapter.status != "healthy" or consecutive_failures > 3.
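If your monitoring tool accepts custom checks, the alert rule above might be scripted roughly as follows. This is a sketch under assumptions: the function names are made up, the URL and header are copied from the curl example, and a response without an adapter block is treated as "unknown" (and therefore alerts).

```python
import json
import urllib.request


def fetch_kb_health(site_id: str, api_key: str, timeout_s: float = 10.0) -> dict:
    """GET the kb-health document for one site (URL taken from the curl example)."""
    request = urllib.request.Request(
        f"https://api.spelo.ai/v1/sites/{site_id}/kb-health",
        headers={"Authorization": f"Bearer {api_key}"},
    )
    with urllib.request.urlopen(request, timeout=timeout_s) as response:
        return json.load(response)


def should_alert(health: dict) -> bool:
    """Apply the rule: status != "healthy" or consecutive_failures > 3."""
    adapter = health.get("data", {}).get("adapter", {})
    status = adapter.get("status", "unknown")  # assumption: missing means unknown
    failures = adapter.get("consecutive_failures", 0)
    return status != "healthy" or failures > 3
```

Note that under this rule a freshly configured site, whose status is still unknown, also raises an alert until its first health check completes. You may want to exclude that status during onboarding.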
status values:
| Value | Meaning |
|---|---|
| healthy | Last ping succeeded within the last 5 minutes |
| degraded | 1-2 consecutive failures, retries pending |
| unhealthy | 3+ consecutive failures, adapter is currently failing |
| unknown | Never been pinged (just configured) |
The health check runs every 5 minutes per site automatically.
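Read as a decision rule, the status table maps the failure counter to a value roughly like this. The sketch is an interpretation of the table, not Spelo's code: `adapter_status` and its parameters are hypothetical, and the 5-minute freshness window for "healthy" is deliberately not modeled.

```python
def adapter_status(ever_pinged: bool, consecutive_failures: int) -> str:
    """Map health-check history to a status value from the table."""
    if not ever_pinged:
        return "unknown"    # just configured, no health check has run yet
    if consecutive_failures >= 3:
        return "unhealthy"  # 3+ consecutive failures
    if consecutive_failures >= 1:
        return "degraded"   # 1-2 consecutive failures
    return "healthy"        # last ping succeeded
```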
Webhook events for failures
Subscribe to the adapter.failed event to get notified the moment an adapter starts failing:
```bash
curl -X POST https://api.spelo.ai/v1/webhooks \
  -H "Authorization: Bearer vk_live_..." \
  -d '{
    "url": "https://yourapp.com/spelo-adapter-alert",
    "events": ["adapter.failed", "adapter.recovered"]
  }'
```

Payload:
```json
{
  "event": "adapter.failed",
  "site_id": "ab1c2d3e",
  "adapter_type": "postgres",
  "error_code": "connection_refused",
  "error_message": "ECONNREFUSED 10.0.1.50:5432",
  "first_failure_at": "2026-05-15T07:45:00Z",
  "consecutive_failures": 3
}
```

A matching adapter.recovered fires when the next health check succeeds.
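On the receiving side, the two events can be folded into a small "open outages" map keyed by site. The sketch below is one possible shape, with assumptions called out: `apply_event` is a hypothetical name, and the adapter.recovered payload is assumed to carry at least `event` and `site_id`, since only the adapter.failed payload is documented here.

```python
def apply_event(outages: dict, payload: dict) -> dict:
    """Return a new outage map after applying one webhook payload.

    `outages` maps site_id -> details of the currently open outage.
    """
    event = payload.get("event")
    site_id = payload.get("site_id")
    updated = dict(outages)  # copy so the caller's map is not mutated

    if event == "adapter.failed":
        updated[site_id] = {
            "adapter_type": payload.get("adapter_type"),
            "error_code": payload.get("error_code"),
            "first_failure_at": payload.get("first_failure_at"),
        }
    elif event == "adapter.recovered":
        # Assumption: the recovery payload identifies the site the same way.
        updated.pop(site_id, None)

    return updated
```

An empty map then means no adapter is currently failing, which is a convenient condition to page on or to clear an incident.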
Recovery patterns
Pattern A — restart the adapter pool
For Postgres / MySQL / MongoDB, after the database itself recovers, the Spelo API holds stale connections from the prior outage. Force a fresh pool:
```bash
curl -X POST https://api.spelo.ai/v1/sites/:id/adapter/reset-pool \
  -H "Authorization: Bearer vk_live_..."
```

Returns immediately. The next adapter call will negotiate a fresh connection.
Pattern B — re-validate credentials
If your DB credentials rotated and the adapter is failing with auth errors:
```bash
curl -X PATCH https://api.spelo.ai/v1/sites/:id \
  -H "Authorization: Bearer vk_live_..." \
  -d '{ "adapter": { "config": { "password": "${NEW_PASSWORD}" } } }'
```

Then verify with POST /v1/sites/:id/adapter/test.
Pattern C — temporarily disable the adapter
If your DB is going to be down for an extended planned maintenance window, disable the adapter so the AI doesn’t try at all (cleaner UX than letting every call fail soft):
```bash
curl -X PATCH https://api.spelo.ai/v1/sites/:id \
  -H "Authorization: Bearer vk_live_..." \
  -d '{ "adapter": { "enabled": false } }'
```

The AI falls back to DOM-only + knowledge-base mode for the duration. Re-enable when maintenance is done.
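If maintenance windows are scripted, the same PATCH can be built from a deploy job. A minimal sketch, assuming the endpoint and body from the curl example: `build_adapter_toggle` is a hypothetical helper, and the explicit JSON Content-Type header is an assumption that the curl example does not show.

```python
import json
import urllib.request


def build_adapter_toggle(site_id: str, api_key: str, enabled: bool) -> urllib.request.Request:
    """Build (but do not send) the PATCH that enables or disables the adapter."""
    body = json.dumps({"adapter": {"enabled": enabled}}).encode("utf-8")
    return urllib.request.Request(
        f"https://api.spelo.ai/v1/sites/{site_id}",
        data=body,
        method="PATCH",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",  # assumption, not in the curl example
        },
    )
```

Send the request with `urllib.request.urlopen(request)` at the start of the window with `enabled=False`, and again with `enabled=True` once the database is back.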
What customers should NOT do
Common failure modes by adapter
| Adapter | Most common cause | Fix |
|---|---|---|
| Postgres / MySQL | Connection pool exhausted | Increase max_connections on DB, or reduce concurrent voice sessions |
| Postgres / MySQL | SSL cert expired | Renew, update ssl.ca in config |
| MongoDB | Replica set primary changed | Auto-recovers via driver; may see 1-2 failures during election |
| Shopify | Storefront API rate-limited (60 req/min per token) | Use multiple tokens, or upgrade to Plus |
| Airtable | Per-base rate limit (5 req/sec) | Same as Shopify — multiple PATs, or aggregate queries |
| Google Sheets | Token expired | OAuth refresh — see OAuth → refresh UX |
| REST | Endpoint changed shape | Re-check collections.*.searchable_fields matches |
| Webhook | Your endpoint down | The whole point of using a webhook is you control it — fix your endpoint |
See also
- Database connection errors: initial setup / config errors
- Slow responses: performance issues, not outages
- Empty results: adapter is healthy but the AI can’t find the answer
- OAuth refresh UX: token-expiry recovery
- Sites API: adapter config endpoints
- Webhooks: subscribe to adapter.failed