
Accessible AI Chatbots: WCAG-Compliant Voice Bots

Tags: AI chatbot, Voice, WCAG, BFSG, ARIA

In 2026, many companies are adding AI chatbots and voice assistants to their websites. What is intended as a pure efficiency measure in customer service has a legal dimension that is frequently overlooked: a chatbot or voice bot on a consumer website is a digital service and therefore falls under the German Accessibility Strengthening Act (BFSG), which has applied since 28 June 2025 (BFSG, 2025). This is exactly where the most expensive gaps arise. In Germany, around 13 million people (Aktion Mensch, 2024) rely on accessible content due to disability or age. A bot whose dialog a screen reader does not read out, that cannot be operated by keyboard, or whose audio output has no text alternative systematically excludes this group. This article shows the typical pitfalls and how a conversational UI becomes compliant with semantic HTML, ARIA live regions and focus management.

[Figure: Accessible AI chatbot, from dialog to screen reader output. A support assistant chat history (role=log, aria-live=polite) with a message field in which focus stays after sending, next to what the screen reader announces (polite: waits for a pause, does not interrupt). Four pillars of conformance: 1. Keyboard (Tab, Enter, Escape; fully operable without a mouse), 2. Live region (role=log, polite; new replies are read out), 3. Focus (no focus loss; stays in the dialog after sending), 4. Text path (alternative to audio; voice bot also delivered as text).]

Why AI Chatbots Fall Under the BFSG

The BFSG obliges providers of certain products and services to make their digital offerings accessible. This includes electronic commerce, that is websites and apps through which consumers can conclude contracts for services or goods (BFSG, Section 1). A chatbot that takes orders, recommends products, books appointments or handles support is part of this offering and therefore not an optional gimmick but a conformance-relevant component of the service. Supervisory authorities can impose fines of up to 100,000 euros (BFSG, Section 37) for violations and, in extreme cases, prohibit the provision of the service.

The technical benchmark for this accessibility is the harmonized European standard EN 301 549, which at its core refers to the Web Content Accessibility Guidelines (WCAG) 2.1 at conformance level AA (EN 301 549). Anyone operating a chatbot must therefore apply the relevant WCAG success criteria to the chat widget and its dynamic content as well. This is more demanding than for static pages, because a bot continuously generates new content, shifts focus and often works with audio or synthetic speech.

The urgency also follows from the current state of the web. The WebAIM Million Report found detectable WCAG errors on 95.9 percent of the home pages examined, with an average of 56.8 errors per page (WebAIM Million, 2024). A retrofitted chatbot usually lands on exactly these pages and increases the error load further, because dynamic components are particularly error-prone. Our article on accessibility audit cost and ROI gives an overview of the BFSG requirements and the cost side.

In concrete terms this affects several user groups that are easily overlooked in day-to-day operations. Blind and visually impaired people operate the site with a screen reader and rely on every bot reply being present as text in the DOM. People with motor impairments navigate exclusively by keyboard or via alternative input devices that use the keyboard path. Deaf and hard-of-hearing people cannot hear a speech-only output. And older people, who often combine several of these limitations, form a growing target group in e-commerce. An accessible chatbot is therefore not only a legal obligation but also opens up a substantial share of customers who would otherwise be lost at the very first point of contact.

Purchased bot widgets do not remove the obligation

Many companies embed a ready-made chatbot widget and assume the provider has solved accessibility. However, the company itself remains responsible for the conformance of the service it offers. Whether an embedded widget is actually usable by keyboard and screen reader can only be determined by testing the specific integration on your own site, not by a vendor's marketing promise.

The Typical Pitfalls in Conversational UI

Chatbots and voice bots rarely fail on a single exotic detail, but on a handful of recurring patterns. The following four pitfalls appear in almost every project in which a conversational UI was retrofitted without accessibility being considered from the start.

Screen reader does not read the dialog

New bot replies appear visually but are not announced because no live region is set up. Screen reader users do not know a reply has arrived and would have to search for it manually.

No complete keyboard operation

Quick-reply buttons, closing the widget or sending only respond to a mouse click. Anyone relying on the keyboard cannot proceed in the dialog or cannot leave the window.

Focus jumps uncontrolled

After sending, the focus lands at the top of the page or disappears entirely. The user has to laboriously navigate back into the dialog, which makes the conversation unusable in practice.

Audio without a text path

Voice bots output replies as synthetic speech without the same content being available as text. Deaf and hard-of-hearing users, as well as anyone in a quiet environment, are left out.

The common cause of these pitfalls is that conversational UI is dynamic. Unlike a static page, the content changes continuously after loading, the focus moves, and modalities like audio are added that do not appear in classic web forms. Anyone who does not actively describe this dynamism for assistive technologies builds an interface that works smoothly for sighted mouse users and is invisible to everyone else.

Announce the Bot Dialog as a Live Region

The core of an accessible chatbot is the live region. It instructs the screen reader to automatically read out new content in a specific DOM area without focus having to move there. For a chat history, the role role=log is the most fitting choice: it is designed precisely for sequentially added information such as log entries or chat messages and has an implicit aria-live value of polite (W3C WAI, ARIA23). Polite means the screen reader does not interrupt the ongoing output but reads the new message only after a speech pause. The W3C explicitly lists the chat history as an example of this technique (W3C WAI, ARIA23).

A common confusion concerns the difference between polite and assertive. An assertive live region interrupts the ongoing output immediately. For a chatbot this is usually the wrong choice, because every bot reply would pull the user out of their current activity. The Nielsen Norman Group (2023) notes that frequent assertive interruptions are quickly perceived as stressful. Assertive is reserved for errors and session timeouts; the normal flow of conversation belongs in a polite region.
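The split between polite and assertive can be made explicit in code. The following is a minimal sketch, not a definitive implementation: the message kinds (`error`, `session-timeout`) and the element ids `chat-log` and `chat-alert` are assumptions chosen for illustration, and both containers are assumed to exist in the DOM from page load.

```javascript
// Hypothetical message kinds that justify interrupting the user.
const ASSERTIVE_KINDS = new Set(['error', 'session-timeout']);

// Returns the aria-live politeness a message of the given kind should use.
function politenessFor(kind) {
  return ASSERTIVE_KINDS.has(kind) ? 'assertive' : 'polite';
}

// Picks the id of the live region a message should be appended to.
// Assumed markup (illustrative ids):
//   <div id="chat-log" role="log"></div>      implicitly polite
//   <div id="chat-alert" role="alert"></div>  implicitly assertive
function regionIdFor(kind) {
  return politenessFor(kind) === 'assertive' ? 'chat-alert' : 'chat-log';
}
```

With this split, ordinary bot replies never end up in the interrupting region by accident, because a message has to be classified explicitly as an error or timeout to get there.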

The live region must exist in the DOM before the first message

The most common live region error in chatbots: the history container is created only when the widget opens and marked as a live region at the same time, or worse, anew for each message. Then the first reply is not announced, because the screen reader was not yet observing the region when it came into being. The correct approach: the empty history container exists in the DOM with role=log from the start, and only then are messages added as child elements.
chat-log.html
<!-- History exists from the start with role=log -->
<div id="chat-log" role="log" aria-live="polite"
     aria-label="Conversation history">
  <!-- Messages are appended as child elements -->
</div>

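<script>
  // Current time as a short, locale-formatted string
  function timeNow() {
    return new Date().toLocaleTimeString([], { hour: 'numeric', minute: '2-digit' });
  }

  function addBotMessage(text) {
    const item = document.createElement('div');
    // Make speaker and time audible for the screen reader
    const prefix = document.createElement('span');
    prefix.className = 'vh';
    prefix.textContent = 'Assistant, ' + timeNow() + ': ';
    // Append the reply as plain text so bot output cannot inject markup
    item.append(prefix, text);
    document.getElementById('chat-log').appendChild(item);
  }
  // .vh is visually hidden but readable for screen readers
</script>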

Each individual message should make clear to the screen reader who it is from and when it was sent. Sighted users infer this from position, color and speech bubble. Screen reader users need this information as text, for example through a visually hidden prefix such as Assistant or You together with the time. This allows the history to be navigated coherently later, and it stays clear which message is from the bot and which from the user. We explore how live regions and status messages work in principle in our article on accessible error and status messages.

Keyboard Operation and Focus Management

A chatbot must be fully operable by keyboard. This includes opening and closing the widget, reaching every quick-reply button, entering and sending a message, and leaving the dialog. WCAG success criterion 2.1.1 Keyboard requires that all functionality is available from a keyboard (W3C WCAG 2.2). In practice this often fails on buttons implemented as non-focusable div or span elements, or on a tab order that does not match the visual order.

Focus management is at least as important. When a user sends a message, focus must not be lost. The most common error with dynamically loaded replies is that focus jumps to the top of the page after sending, because the input field is briefly removed from the DOM and re-rendered. For screen reader and keyboard users the conversation is then practically unusable, because they have to navigate back into the dialog after every message. Instead, focus should stay in the input field or be set in a controlled way to the appropriate element.

  • The widget opens and closes by keyboard, and focus moves into the dialog on opening.
  • Escape closes the chat window and returns focus to the triggering element.
  • All quick-reply buttons are native buttons, focusable and triggerable with Enter and Space.
  • After sending, focus stays in the input field or is set deliberately and is not lost.
  • The tab order follows the visual order of the dialog.
  • A visible focus indicator shows at all times which element is currently active.
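The focus-related points of this checklist can be sketched as a small controller. This is a sketch under stated assumptions: `trigger`, `panel` and `input` are hypothetical names for the button that opens the chat, the widget container and the message field, and the function name `createChatFocusController` is made up for this example.

```javascript
// Minimal focus handling for a chat widget.
// trigger: button that opens the chat, panel: widget container,
// input: message field. All three are assumed to be DOM elements.
function createChatFocusController({ trigger, panel, input }) {
  let open = false;

  function openChat() {
    open = true;
    panel.hidden = false;
    input.focus(); // focus moves into the dialog on opening
  }

  function closeChat() {
    open = false;
    panel.hidden = true;
    trigger.focus(); // focus returns to the triggering element
  }

  // Intended for a keydown listener on the panel.
  function handleKeydown(event) {
    if (open && event.key === 'Escape') {
      closeChat();
    }
  }

  // Call after a message was submitted: clear the field, keep focus in it.
  function afterSend() {
    input.value = '';
    input.focus();
  }

  return { openChat, closeChat, handleKeydown, afterSend, isOpen: () => open };
}
```

In a page, the controller would typically be wired up with `trigger.addEventListener('click', controller.openChat)` and `panel.addEventListener('keydown', controller.handleKeydown)`. The important design choice is that the input field is never removed and re-rendered on sending, so there is no moment in which focus can fall back to the top of the page.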

If the chatbot opens as a modal dialog over the page, the rules for modals additionally apply: focus should be kept within the dialog (focus trap) so that the tab key does not accidentally jump to the page behind it. We describe the details in our article on accessible modals and overlays and in our article on focus management in single-page apps.
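A focus trap of this kind can be reduced to a small index calculation. The sketch below is illustrative only: the function names are invented for this example, and `focusables` is assumed to be an array of the dialog's focusable elements in DOM order.

```javascript
// Returns the index Tab (or Shift+Tab) should move to among `count`
// focusable elements, wrapping at both ends so focus stays in the dialog.
function nextTrappedIndex(currentIndex, count, shiftKey) {
  if (count === 0) return -1;
  // Focus is currently outside the list: enter at the first or last element.
  if (currentIndex < 0) return shiftKey ? count - 1 : 0;
  const step = shiftKey ? -1 : 1;
  return (currentIndex + step + count) % count;
}

// Keydown handler sketch: `active` is the currently focused element.
function trapTab(event, focusables, active) {
  if (event.key !== 'Tab' || focusables.length === 0) return;
  event.preventDefault();
  const current = focusables.indexOf(active);
  const next = nextTrappedIndex(current, focusables.length, event.shiftKey);
  focusables[next].focus();
}
```

Where the widget can be built on a native dialog element opened with `showModal()`, the browser already keeps focus inside the dialog, and a hand-written trap like this is only needed for custom overlays.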

Voice Bots: the Text Path to Every Audio Output

Voice assistants and bots with synthetic speech output present an additional hurdle. If the only output is spoken audio, deaf and hard-of-hearing people are excluded, as is anyone who cannot use sound in a loud or quiet environment. The WCAG require an equivalent text alternative for any content conveyed via audio: success criterion 1.2.1 calls for an alternative for audio-only content (W3C WCAG 2.2). For a voice bot this specifically means that every spoken reply must also appear as readable text in the history.

The same applies in reverse for input: a bot that relies exclusively on voice input excludes people with a speech impairment and anyone in a noisy environment. WCAG success criterion 2.1.1 (Keyboard) requires that all functionality be operable through a keyboard interface (W3C WCAG 2.2), and the functional performance statements of EN 301 549 call for a mode of operation that does not require speech. An accessible voice UI therefore also offers text input as an equivalent path. The principle is: speech is an additional modality, not the only one.
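One way to keep both input paths equivalent is to route them through the same function, so that nothing downstream can depend on the modality. This is a minimal sketch with invented names (`createInputRouter`, `fromTextField`, `fromSpeech`); how the speech transcript is obtained is left open.

```javascript
// Routes typed text and speech transcripts through one shared path.
// onMessage receives { text, source } for every accepted user turn.
function createInputRouter(onMessage) {
  function submit(rawText, source) {
    const text = rawText.trim();
    if (text === '') return false; // ignore empty input in both modalities
    onMessage({ text, source });
    return true;
  }

  return {
    fromTextField: (value) => submit(value, 'text'),
    fromSpeech: (transcript) => submit(transcript, 'voice'),
  };
}
```

Because the bot logic only ever sees `onMessage`, a feature that works by voice automatically works by keyboard as well.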

Modality | Barrier without an alternative | Accessible implementation
Bot audio output | Deaf and hard-of-hearing users do not hear the reply | Every reply also appears as text in the history (WCAG 1.2.1)
Voice input | Users with a speech impairment or in a loud environment cannot reply | Text input field available as an equivalent path
Automatic playback | Sounds overlay the screen reader and cause confusion | Audio starts only on user action, with an option to stop (WCAG 1.4.2)
Time limit for reply | Users with motor impairments cannot complete the input in time | No time pressure or extendable limit (WCAG 2.2.1)

Synthetic speech does not replace a screen reader

A common misconception: if the bot speaks anyway, it must automatically be usable for blind people. That is not the case. Screen reader users control their own speech output, speed and navigation. An unsolicited audio output from the bot overlays the screen reader and disrupts more than it helps. The correct approach is text in the DOM that the user's own screen reader reads out, complemented by an optional, user-controlled audio output.
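The rule "text always, audio only on request" can be sketched as follows. The callbacks `appendText` and `speak` and the flag `audioRequested` are hypothetical names for this example: `appendText` is assumed to add the reply to the role=log history, and `speak` to start playback.

```javascript
// Delivers a bot reply: the text path is unconditional,
// audio is played only if the user explicitly asked for it.
function deliverReply(text, { appendText, speak, audioRequested }) {
  appendText(text); // always present as text in the DOM
  if (audioRequested) {
    speak(text); // optional, user-controlled audio output
    return { text: true, audio: true };
  }
  return { text: true, audio: false };
}
```

In a browser, `speak` could for example wrap `window.speechSynthesis.speak(new SpeechSynthesisUtterance(text))`, with `window.speechSynthesis.cancel()` behind a visible stop button. Whether the Web Speech API or a server-side voice is used is an implementation detail; the point of the sketch is that no code path produces audio without the same content as text.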

Building an Accessible Conversational UI Step by Step

Taken together, the individual requirements form a clear build guide. In accessible web development we typically proceed in this order for conversational UI, because the early steps form the foundation for the later ones.

In implementation, one principle proves its worth: the conversational UI should not impair the rest of the page. A chat widget that seizes focus on load, plays sounds automatically or sets up an assertive live region disrupts assistive technologies across the whole page, not just in the dialog. The bot therefore starts unobtrusively, announces itself via a clearly named trigger and only takes focus once the user actively opens the widget. This keeps the bot an additional offering that complements the page rather than undermining its operability. This principle of gentle enhancement can be planned in from the start and is considerably more effort to correct afterwards.

  1. Semantic foundation: the chat window is a clearly named region, the input field is a real label-linked form field, all buttons are native buttons with an accessible name.
  2. Set up the live region: the history container exists in the DOM from the start with role=log and aria-live=polite, empty, before the first message arrives.
  3. Attribute messages: every message carries a screen-reader-readable indication of speaker and time, which sighted users infer from position and color.
  4. Control focus: focus moves into the dialog on opening, stays in the input field on sending, and Escape closes the widget and returns focus.
  5. Secure the text path: every audio output has an equivalent text version, every voice input an equivalent text input.
  6. Test with real screen readers: the entire dialog is traversed by keyboard and with at least two screen readers before it goes live.

Artificial intelligence can promote accessibility, for example through automatic captioning or translation into easy-to-read language. For this to succeed, inclusion must be considered from the start, not added afterwards.

paraphrased from Aktion Mensch, AI and Accessibility

The last step is the decisive one. Automated testing tools detect some of the errors, such as missing labels or invalid ARIA values, but they can check only a portion of the WCAG success criteria in the first place (W3C/WAI). Whether a live region announces the right thing at the right time, whether focus lands sensibly after sending and whether the text path is equivalent in substance shows up only in manual testing with real screen readers. This review is part of our WCAG audit.
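To illustrate the kind of error an automated check can catch, here is a minimal sketch of a smoke test for the history container. The function name `checkChatLog` and the wording of the messages are assumptions for this example; it covers only static attributes and says nothing about what a screen reader actually announces.

```javascript
// Returns a list of problems found on the chat history container.
// logEl is the history element, or null if it is not in the DOM.
function checkChatLog(logEl) {
  if (!logEl) {
    return ['history container is missing from the DOM'];
  }
  const problems = [];
  if (logEl.getAttribute('role') !== 'log') {
    problems.push('history container lacks role="log"');
  }
  if (logEl.getAttribute('aria-live') === 'assertive') {
    problems.push('history container is assertive; use polite for normal replies');
  }
  const named = logEl.getAttribute('aria-label') || logEl.getAttribute('aria-labelledby');
  if (!named) {
    problems.push('history container has no accessible name');
  }
  return problems;
}
```

A call such as `checkChatLog(document.getElementById('chat-log'))` right after page load, before the widget is opened, also verifies that the container exists before the first message arrives.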

This article is based on data from: German Accessibility Strengthening Act BFSG (2025), EN 301 549, W3C WCAG 2.2 (2023), W3C WAI ARIA23 technique role=log, WebAIM Million Report (2024), Aktion Mensch on AI and accessibility and on digital accessibility (2024), Nielsen Norman Group (2023), moin.ai chatbot wiki on the BFSG (2025).