Auto-generated Captions
What are auto-generated captions?
Auto-generated captions use speech recognition software to convert the spoken words in video or audio files into text displayed on the screen. This makes content accessible to more people, including people who are hard of hearing and people in noisy places.
Auto-generated captions are also called auto-transcription, auto-captioning, speech-to-text, or Automatic Speech Recognition (ASR). They are a cost-effective way to create captions without hiring a captioner, but they aren't perfect.
How do they work?
ASR software predicts what someone is saying based on the patterns in their voice. However, it can make mistakes because it does not always understand the context. For example, it might confuse "2" with "II," "to," "too," or "two." Natural Language Understanding (NLU) engines help by looking at the surrounding words to determine the correct meaning.
Are they accurate?
The Federal Communications Commission (FCC) requires TV captions to be 99% accurate. A 2025 study by 3Play Media found that ASR engines have:
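- Word Error Rate (WER): 80-93% accuracy (roughly 7-20% of words are wrong)
- Formatting Error Rate (FER): 72-84% accuracy (roughly 16-28% formatting errors, such as punctuation and capitalization mistakes)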
Compared with 3Play Media's previous studies, these results show a major improvement in ASR technology. However, no ASR engine reached the 99% accuracy required for TV captions.
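WER is usually calculated by counting the fewest word substitutions, deletions, and insertions needed to turn the ASR transcript into a correct reference transcript, then dividing that count by the number of words in the reference. Word accuracy is then roughly 1 minus WER. The Python sketch below shows this standard calculation for illustration only: the function names are hypothetical, and 3Play Media's exact method (for example, how it normalizes punctuation or treats "2" versus "two") may differ.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Hypothetical helper: (substitutions + deletions + insertions) / reference word count."""
    # Simple normalization assumed here: lowercase and split on whitespace.
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # dist[i][j] = fewest edits to turn the first i reference words
    # into the first j hypothesis (ASR) words.
    dist = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dist[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        dist[0][j] = j  # insert all j hypothesis words

    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,         # deletion: ASR dropped a word
                dist[i][j - 1] + 1,         # insertion: ASR added a word
                dist[i - 1][j - 1] + cost,  # substitution (or exact match)
            )
    return dist[len(ref)][len(hyp)] / len(ref)


def word_accuracy(reference: str, hypothesis: str) -> float:
    """Hypothetical helper: accuracy as 1 - WER, floored at 0 (WER can exceed 100%)."""
    return max(0.0, 1.0 - word_error_rate(reference, hypothesis))


reference = "I want to buy two tickets too"   # what the speaker actually said
asr_output = "I want too buy 2 tickets to"    # what the ASR engine produced
print(f"WER: {word_error_rate(reference, asr_output):.0%}")     # prints WER: 43%
print(f"Accuracy: {word_accuracy(reference, asr_output):.0%}")  # prints Accuracy: 57%
```

In this example, the ASR output mixes up "to," "too," and "two," giving 3 errors in 7 words: a WER of about 43%, or about 57% word accuracy. The same arithmetic shows why the 99% standard matters: at 93% word accuracy, a 1,500-word transcript would still contain about 105 wrong words, compared with about 15 at 99%.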
Factors affecting accuracy
- Audio Quality: Background noise, connection type, and equipment quality
- Content*: Complex jargon, uncommon spellings, and proper names
- Duration: Longer content tends to have more inaccuracies
* Note: You can train some ASR engines to use the correct spellings by providing scripts or a list of specialized terms and proper names.
Should you use auto-generated captions?
For pre-recorded content, start with auto-generated captions but check them for accuracy. They can misinterpret words, context, and sentence structure, and may be out of sync with the audio.
For live web conferences or online meetings, auto-generated captions are a reasonable minimum. Make sure your platform supports both auto-generated captions and manual captioning. If someone requests accommodations under the Americans with Disabilities Act (ADA), hire a human Communication Access Realtime Translation (CART) services provider.
Considerations
- Test ASR on your platform to see if it handles your terminology well.
- For performances or other content that includes censored words, use a human CART services provider.
- If speakers are hard to identify or the content includes difficult jargon, use a human CART services provider.
What solutions offer auto-generated caption features for live broadcasts and online meetings?
Not all of the following are vetted for use at Texas A&M University-Corpus Christi. However, you may see them used at various conferences or in webinars.
What solutions can auto-generate captions for pre-recorded audio and video content?
Because pre-recorded content can be reviewed before it is shared, its captions must be more accurate than live captions. Auto-generated captions can give you a starting point, but you must always check the quality and accuracy of the captions.
Shorter audio and video content is easier to auto-caption and correct. You can also request closed captioning services through Information Technology, so you do not have to correct auto-generated content.