Android's Expressive Captions Aim to Give You a Better Idea of What's Happening Onscreen

The feature uses AI to convey details like intensity of speech and background sounds in videos and livestreams.

Abrar Al-Heeti Technology Reporter
Expressive Captions in use during a football game, shown depicting some words in all caps.

Expressive Captions will put some words in all-caps to convey excitement.

Google

Google on Thursday debuted a new feature that makes captions more true to life. Called Expressive Captions, it not only relays what someone is saying in a video or livestream but can also convey how someone is saying it. 

For instance, if someone excitedly wishes you a "HAPPY BIRTHDAY!" the captions will appear in all-caps. You'll also see descriptions of ambient sounds like applause or music to get a fuller picture of the environment. Other expressions such as sighing, groaning or gasping will also be relayed via Expressive Captions. 

The new feature is part of Live Caption, which automatically generates real-time captions across media like videos, phone calls and audio messages. Live Caption is built into Android's operating system and works across your phone's apps, which means Expressive Captions can work with most things you watch, like social media livestreams and video messages. And because captions are generated on-device, they're also available when you're in airplane mode or don't have an internet connection.

Examples of how Expressive Captions can convey sighs, excited speech or cheers and applause

Expressive Captions offer a fuller picture of what's happening in a video.

Google/Jeffrey Hazelwood/CNET

Captions have traditionally been used by people who are deaf or hard of hearing to follow TV content. But in recent years, the use of captions has expanded across demographics as people opt to watch videos without sound on the subway, for instance, or seek to better understand what's being said in a movie or TV show. In fact, 70% of Gen Z users regularly watch TV with subtitles, according to online language tutor site Preply. But oftentimes, livestreams, social content and videos from friends and family don't include pre-loaded captions.

Android and Google DeepMind teams came together to build Expressive Captions, which uses multiple AI models to create stylized captions that can label a wider range of sounds. The goal is to emulate how dynamic listening to audio can be. 

"It's just one way we're building for the real lived experiences of people with disabilities and using AI to build for everyone," Angana Ghosh, director of product management on Android, said in a blog post. 

Expressive Captions is available starting Thursday in the US in English on any Android device running Android 14 and above that supports Live Caption. It's just one of several updates Google is announcing for Android and Pixel devices. 

Google is also adding updates to its Lookout app, which can help blind and low-vision users identify objects and get more information about their surroundings. Lookout now adds Arabic to the dozens of languages it already supports and will tap Gemini AI models to power image descriptions and its Q&A mode that lets people ask follow-up questions about an image. It also includes auto-language detection and more natural-sounding voices on the app.

Lookout describes a turquoise lake surrounded by green capped mountains with people boating on the water.

Gemini will now help power the Lookout tool's image description feature.

Google/Jeffrey Hazelwood/CNET

The company is also adding more Gemini extensions to Android for apps like Utilities, Spotify, Messaging and Calling, making them easier to access through Google's virtual assistant.

Those with a Pixel device get additional new features, like a Gemini Saved Info feature that lets you ask Gemini to remember your interests and preferences, so it can surface more helpful and relevant responses. There's also an update that lets you save content to Pixel Screenshots with a quick tap when using Circle to Search, making it easier to find later. You can also add credit cards or tickets you've screenshotted to your wallet, and Pixel Screenshots will automatically categorize your screenshots to keep things more organized.

And finally, Simple View on Pixel makes seeing and navigating controls, apps and widgets easier by increasing your phone's font size and touch sensitivity. It also shows a simplified home screen layout with a preselected set of essential apps and increases the app grid to a four-by-four display. Simple View is available on Pixel 6 phones and newer.

The addition of Expressive Captions is likely to make watching content without audio more engaging, both for users who are deaf or hard of hearing and for anyone who chooses not to watch something with the volume up.
