Auto-generated Captions

What are auto-generated captions?

Auto-generated captions use technology to convert spoken words in videos or audio files into text displayed on the screen. This makes content accessible to more people, including people who are deaf or hard of hearing and people in noisy places.

Auto-generated captions are also called auto-transcribing, auto-captioning, speech-to-text, or Automatic Speech Recognition (ASR). They are a cost-effective way to create captions without hiring a human captioner, but they aren't perfect.

How do they work?

ASR software guesses what someone is saying based on voice patterns. However, it can make mistakes because it does not always understand the context. For example, it might confuse words and symbols that sound alike, such as "2," "II," "to," "too," and "two." Natural Language Understanding (NLU) engines help by looking at surrounding words to figure out the correct meaning.

Are they accurate?

The Federal Communications Commission (FCC) requires TV captions to be 99% accurate. A 2025 study by 3Play Media found that ASR engines have:

  • Word accuracy, measured by Word Error Rate (WER): 80-93%
  • Formatting accuracy, measured by Formatting Error Rate (FER): 72-84%

Compared with previous 3Play Media studies, these results show a major improvement in ASR technology. However, no ASR engine reaches 99% accuracy.
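
To make these figures concrete: word accuracy is commonly reported as 100% minus the Word Error Rate, where WER counts the word substitutions, deletions, and insertions needed to turn the ASR output into a human reference transcript, divided by the number of words in the reference. The following is a minimal Python sketch under stated assumptions, not the method used in the 3Play Media study: the function names are illustrative, words are split on whitespace, and case is ignored. It does not measure formatting errors such as punctuation or capitalization.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    # Assumption: whitespace tokenization and lowercasing are good enough here.
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # Word-level edit distance, computed one row at a time.
    # prev[j] holds the distance between the reference words seen so far
    # and the first j words of the hypothesis.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # reference word missing from ASR output
                curr[j - 1] + 1,     # extra word in ASR output
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[len(hyp)] / len(ref)


def word_accuracy(reference: str, hypothesis: str) -> float:
    """Word accuracy expressed as 1 - WER (can be negative for very poor output)."""
    return 1.0 - word_error_rate(reference, hypothesis)


# Made-up example: the ASR output confuses "to"/"too" and "two"/"to".
human = "send the report to the two managers"
asr = "send the report too the to managers"
print(f"WER: {word_error_rate(human, asr):.1%}")
print(f"Accuracy: {word_accuracy(human, asr):.1%}")
```

In this made-up example, two of the seven reference words are wrong, so the WER is 2/7. By the same arithmetic, a 1,000-word transcript at 99% accuracy contains about 10 word errors, while one at 80-93% accuracy contains roughly 70 to 200.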

Factors affecting accuracy

  • Audio Quality: Background noise, connection type, and equipment quality
  • Content*: Complex jargon, uncommon spellings, and proper names
  • Duration: Longer content tends to have more inaccuracies

* Note: You can train some ASR engines on spellings if you provide scripts or a list of special words and proper names.

Should you use auto-generated captions?

For pre-recorded content, start with auto-generated captions but check them for accuracy. They can misinterpret words, context, and sentence structure, and may be out of sync with the audio.

For live web conferences or online meetings, auto-generated captions are a reasonable minimum. Ensure your platform supports them as well as manual captioning. If someone requests accommodations under the Americans with Disabilities Act (ADA), you should hire a human Communication Access Realtime Translation (CART) services provider.

Considerations

  • Test ASR on your platform to see if it handles your terminology well.
  • For performative content with censored words, use a human CART services provider.
  • If identifying speakers or translating jargon is difficult, use a human CART services provider.
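
One possible way to run the terminology test from the first consideration above: caption a short sample recording on your platform, export the transcript, and check which of your special terms and proper names never appear in it. The Python sketch below is illustrative only. The function name, the sample transcript, and the term list are all made up, and it assumes you can export the captions as plain text.

```python
import re


def missing_terms(transcript: str, terms: list[str]) -> list[str]:
    """Return the terms that never appear in the transcript as whole words or phrases."""
    # Normalize both sides: lowercase, keep only letters, digits, and apostrophes.
    def normalize(text: str) -> str:
        return " ".join(re.findall(r"[a-z0-9']+", text.lower()))

    haystack = normalize(transcript)
    missing = []
    for term in terms:
        needle = normalize(term)
        # Lookarounds stop "art" from matching inside "cart".
        pattern = rf"(?<![a-z0-9']){re.escape(needle)}(?![a-z0-9'])"
        if not re.search(pattern, haystack):
            missing.append(term)
    return missing


# Hypothetical sample: a transcript exported from a test recording.
sample_transcript = (
    "Today we review the cart services and the ASR engine's word error rate."
)
special_terms = ["CART", "word error rate", "3Play Media"]
print(missing_terms(sample_transcript, special_terms))
```

A term on the missing list may have been misheard or simply not spoken in the sample, so compare the result against what was actually said. Terms that are consistently misrecognized are candidates for the word list mentioned in the note on content, or a sign that a human CART services provider is the better choice.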