Guide · 2026
How to add voice to your website
There are four realistic ways to make your website talk in 2026. Three of them are overkill for most sites, and one has a free tier. Here is the honest comparison — so you pick the right tool instead of the loudest one.
What “add voice” actually means
The phrase is ambiguous. It can mean three different things:
- Voice output (read-aloud): a button that narrates your page content, usually for accessibility.
- Voice input (dictation): a mic icon in a form that transcribes what the user says into text.
- Voice conversation (AI agent): a two-way spoken exchange — visitor talks, AI answers, page updates in real time. This is what Spelo does.
Most businesses want the third but implement the first, then wonder why engagement did not change. Read-aloud is table stakes. Conversation is what converts.
The four approaches, compared
| Approach | Cost | Voice quality | Interactivity | Setup | Best for |
|---|---|---|---|---|---|
| Browser text-to-speech | Free | Robotic | None — read-only | ~10 lines of JavaScript | Accessibility read-aloud only |
| Chatbot + TTS add-on | $20–$100/mo | OK | Text in, audio out | Two tools to maintain | Teams already using a chatbot |
| Hosted voice widget (e.g., Spelo) | $0–$499/mo | Natural | Full-duplex voice | One script tag | Any site that wants a real conversation |
| Custom voice agent | $5,000+ build, $500+/mo run | Natural | Whatever you build | Weeks of engineering | Enterprises with unique requirements |
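To compare the cost column on equal terms, the monthly ranges can be turned into a rough first-year figure. The sketch below copies its numbers from the table and treats every "+" figure as a lower bound. The `approaches` array and the `firstYearCost` helper are illustrative names, not part of any product.

```javascript
// Figures copied from the comparison table above.
// A "+" in the table means "at least", so the upper bound is left open (null).
const approaches = [
  { name: "Browser text-to-speech", buildUsd: 0, monthlyMinUsd: 0, monthlyMaxUsd: 0 },
  { name: "Chatbot + TTS add-on", buildUsd: 0, monthlyMinUsd: 20, monthlyMaxUsd: 100 },
  { name: "Hosted voice widget", buildUsd: 0, monthlyMinUsd: 0, monthlyMaxUsd: 499 },
  { name: "Custom voice agent", buildUsd: 5000, monthlyMinUsd: 500, monthlyMaxUsd: null },
];

// First-year cost = one-off build cost + monthly cost * number of months.
function firstYearCost({ buildUsd, monthlyMinUsd, monthlyMaxUsd }, months = 12) {
  return {
    minUsd: buildUsd + monthlyMinUsd * months,
    maxUsd: monthlyMaxUsd === null ? null : buildUsd + monthlyMaxUsd * months,
  };
}

for (const approach of approaches) {
  const { minUsd, maxUsd } = firstYearCost(approach);
  const upper = maxUsd === null ? "and up" : `to $${maxUsd}`;
  console.log(`${approach.name}: $${minUsd} ${upper}`);
}
```

On these numbers, the custom build starts at $11,000 for the first year, while the top widget plan tops out at $5,988.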
Option 1 — browser text-to-speech
Every modern browser ships with the SpeechSynthesis interface of the Web Speech API. Ten lines of JavaScript and you can make any page read itself aloud. It is free, it can work offline when the browser uses locally installed voices, and it sounds exactly like a 2010-era GPS. Use it if your only goal is a read-aloud accessibility button.
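A read-aloud button along these lines might look like the sketch below. It uses the standard `speechSynthesis.speak()` and `SpeechSynthesisUtterance` calls. The `#read-aloud` and `main` selectors, the `en-US` default, and the helper names are assumptions for illustration, so adjust them to your own markup.

```javascript
// Minimal read-aloud sketch using the Web Speech API.
// The selectors "#read-aloud" and "main" are hypothetical; change them to match your page.

// Pure helper: collapse runs of whitespace and trim the ends.
function readAloudText(raw) {
  return raw.replace(/\s+/g, " ").trim();
}

// Speak the given text. Returns false when the API is not available
// (for example outside a browser), true once the utterance is queued.
function speak(text, lang = "en-US") {
  if (typeof window === "undefined" || !("speechSynthesis" in window)) {
    return false;
  }
  window.speechSynthesis.cancel(); // stop anything already being read
  const utterance = new SpeechSynthesisUtterance(readAloudText(text));
  utterance.lang = lang;
  window.speechSynthesis.speak(utterance);
  return true;
}

// Wire up the button only when running in a browser and both elements exist.
if (typeof document !== "undefined") {
  const button = document.querySelector("#read-aloud");
  const content = document.querySelector("main");
  if (button && content) {
    button.addEventListener("click", () => speak(content.innerText));
  }
}
```

The voice you hear depends on what the visitor's browser and operating system provide, which is why quality varies so much between devices.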
Option 2 — chatbot plus a TTS add-on
Take an existing text chatbot (Intercom, Drift, Tidio) and layer a TTS service on top so the bot can speak its replies. You get a voice that sounds better than browser TTS and a familiar chat UI. You also get two systems to pay for, two consoles to manage, and a conversation that still feels like typing in disguise — the visitor has to wait for the bot to write before it speaks.
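The wait described above comes from the two steps running one after the other. The sketch below makes that visible. `getReply` and `synthesize` are hypothetical stand-ins for your chatbot's API and your TTS service, not real vendor calls, and the clock is injectable so the timing can be checked with fakes.

```javascript
// Sequential "chatbot, then TTS" flow.
// getReply(userText)  -> Promise<string>  (hypothetical chatbot call)
// synthesize(text)    -> Promise<audio>   (hypothetical TTS call)
async function speakReply(userText, getReply, synthesize, now = Date.now) {
  const start = now();

  // Step 1: the bot has to finish writing its full reply.
  const replyText = await getReply(userText);
  const textReadyMs = now() - start;

  // Step 2: only now can the TTS service start turning that text into audio.
  const audio = await synthesize(replyText);
  const audioReadyMs = now() - start;

  return { replyText, audio, textReadyMs, audioReadyMs };
}
```

Because the second call cannot start until the first finishes, `audioReadyMs` is always the sum of both delays. That is the "typing in disguise" effect.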
Option 3 — a hosted voice widget
Paste one script tag and get full-duplex voice: the visitor can interrupt the agent, and the agent can scroll the page, set filters, and answer in context. This is what Spelo does. There is a free tier for small sites and usage-based paid plans for growing businesses.
The install is literally this:
<script src="https://spelo.ai/widget.js"
data-site-id="YOUR_SITE_ID"
async></script>
That is the whole integration. The widget reads your page content on demand, with no scheduled indexing and no database sync jobs. When you update a product or publish a blog post, the voice agent knows about it instantly.
Option 4 — custom voice agent
Build the whole thing yourself: a speech-to-text model, a large language model, a text-to-speech model, and realtime orchestration so the three hand off to each other and handle interruptions naturally. Expect six to twelve weeks of engineering, a monthly cloud bill of $500 or more (often four figures), and a latency budget of under 300 ms round-trip to feel human. It is worth it for enterprises with unique requirements and overkill for a small business.
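The 300 ms target is easier to reason about as a budget split across the pipeline stages. The sketch below adds up per-stage latencies and checks them against that target. The stage names and the example numbers are illustrative assumptions, not measurements from any real system.

```javascript
// Round-trip target taken from the text above.
const BUDGET_MS = 300;

// Sum per-stage latencies (in milliseconds) and compare with the budget.
function checkLatencyBudget(stages, budgetMs = BUDGET_MS) {
  const totalMs = Object.values(stages).reduce((sum, ms) => sum + ms, 0);
  return {
    totalMs,
    budgetMs,
    withinBudget: totalMs < budgetMs, // "under 300 ms" means strictly less
    overByMs: Math.max(0, totalMs - budgetMs),
  };
}

// Illustrative numbers only: swap in your own measurements.
const example = checkLatencyBudget({
  network: 60,
  speechToText: 90,
  languageModel: 120,
  textToSpeech: 80,
});
console.log(example); // totalMs: 350, withinBudget: false, overByMs: 50
```

With these made-up figures the pipeline is 50 ms over budget, so at least one stage would have to get faster or start streaming its output before the previous stage finishes.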
Which one should you pick?
- Accessibility only: browser TTS. Free, no vendor.
- Small business site, blog, or landing page: hosted voice widget. The Spelo free tier covers most small sites.
- E-commerce or lead-gen site: hosted voice widget with a database connection. Spelo Starter or Pro.
- Enterprise with unusual rules (healthcare, fintech): custom build or Spelo Enterprise with dedicated infrastructure.