Subtitling Your Life


A little over thirty years ago, when he was in his mid-forties, my friend David Howorth lost all hearing in his left ear, a calamity known as single-sided deafness. “It happened literally overnight,” he said. “My doctor told me, ‘We really don’t understand why.’ ” At the time, he was working as a litigator in the Portland, Oregon, office of a large law firm. (He and his family had moved there from New York after one of his daughters pricked a finger on a discarded syringe while climbing on rocks in Prospect Park.) His hearing loss had no impact on his job—“In a courtroom, you can get along fine with one ear”—but other parts of his life were upended. The brain pinpoints sound sources in part by analyzing minute differences between left-ear and right-ear arrival times, the same process that helps bats and owls find prey they can’t see. Now that Howorth had just one working ear, he didn’t know where to look when someone called his name on a busy sidewalk. In groups, he would pretend to follow what others were saying, nodding occasionally. “Even when I knew the topic, I was reluctant to join in for fear of being somewhat off point, or, worse, saying the same thing that someone else had just said,” he recalled. At dinner parties, his wife, Martha, always tried to sit on his left, so that he wouldn’t have to explain to a stranger why he had failed to respond.

Martha died in 2016. Perhaps because she was no longer there to act as an intermediary, he noticed that his good ear wasn’t very good anymore, and he was fitted, for the first time, for hearing aids. The type he got was designed specifically for people with his condition, and included a unit for each ear. The one in his dead ear had a microphone but no speaker; it wirelessly transmitted sounds from that side to the unit in his functioning ear. “I went to a bar with my brothers, and was amazed,” he said. “One of them was talking to me across the table, and I could hear him.” The amazement didn’t last. Multi-speaker conversations were still confusing, and he was no better at locating sound sources, since everything seemed to be coming from the same place.

One morning in 2023, Howorth put on his hearing aids and realized, with a shock, that his right ear had stopped working, too. He travelled to one of the world’s leading facilities for hearing disorders, the Shea Clinic, in Memphis. Doctors repeatedly injected the steroid dexamethasone directly into his middle ear, through his eardrum. Steroids are the standard treatment for sudden deafness, but they sometimes have no effect. For Howorth, they did nothing.

Last year, after he had given up all hope that the hearing in his right ear would return, he received a cochlear implant on the other side. A professor of otolaryngology at Harvard Medical School once described cochlear implants to me as “undeniably the finest biological prosthesis that we have today, for anybody, in terms of restoration of function.” Research on implants began in the nineteen-fifties, and the technology has improved steadily since then. Contrary to popular belief, though, they don’t magically turn normal hearing back on. The implants bypass the almost unimaginably complex sensory structures of the cochlea with relatively simple electrodes. Many recipients become adept at interpreting the electrodes’ signals as intelligible sounds, especially if the implantation is done in infancy, but others struggle.

Howorth now has new hearing aids, and he can adjust them and his implant together, using his phone, but even when the devices are working optimally he can’t understand much. “When I pee, it sounds like a roomful of people making conversation,” he told me. “In fact, it sounds more like that than a roomful of people making conversation does.” Nothing helps with music. Rush Limbaugh, who had bilateral cochlear implants, once said that they made violins in movie scores sound like “fingernails on a chalkboard.” Howorth told me, “I’m not sure that’s the analogy I would use, but it does get across the unpleasantness of the sound. You do want to say, ‘Make it stop!’ ”

Nevertheless, Howorth says that, in many situations, he actually does better now than he did when he had one fully functioning ear. The reason is that he has begun using a free voice-to-text app on his phone, Google Live Transcribe & Notification. When someone speaks to him, he can read what they’re saying on the screen and respond as if he’d heard it. He belongs to a weekly lunch group with half a dozen men in their seventies and eighties, and when they get together he puts his phone in the center of the table and has no trouble joining in. Live Transcribe makes mistakes—“One of the guys, a retired history professor, said something that it transcribed as ‘I have a dick,’ ” Howorth told me—but it’s remarkably accurate, and it punctuates and capitalizes better than many English majors I know. It can also vibrate or flash if it detects smoke alarms, police sirens, crying babies, beeping appliances, running faucets, or other potentially worrisome sound emitters, and it works, with varying degrees of accuracy, in eighty languages. Howorth remarried a few years ago; his current wife, whose name is Sally, never knew him when he had two good ears. He used Live Transcribe at a party they attended together, and she told him afterward that it was the first time she’d been with him in a social setting in which he didn’t seem “aloof and unengaged.”

A researcher I interviewed in 2018 told me, “There is no better time in all of human history to be a person with hearing loss.” Nearly every expert I spoke with back then agreed. They cited over-the-counter hearing devices, improvements in conventional hearing aids and cochlear implants, and drugs and gene therapies in development. Those advances have continued, but, for Howorth and many others with hearing problems, the breakthrough has been acquiring the ability to subtitle life. “It’s transcription that has made the difference,” Howorth told me. The main contributor has been the tech industry’s staggering investment in artificial intelligence. Live Transcribe draws on Google’s vast collection of speech and text samples, which the company acquires by—well, who knows how Google acquires anything?

Back in the days when software came on disks, I bought a voice-to-text program called Dragon NaturallySpeaking. I had read about it in some computer magazine and thought it would be fun to fool around with, but I had to train it to understand my voice, using a headset that came with the disk, and even once I’d done that it was so error-prone that correcting a transcript took longer than typing the entire text would have taken. Now there are many options (among them the modern iteration of Dragon). The dictation feature in Microsoft Word works so well that a writer I know barely uses his keyboard anymore. Howorth and I sometimes play bridge online with two friends. The four of us chat on Zoom as we play, and if I didn’t know that he couldn’t hear I would never guess. Zoom’s captioning utility shows him everything the rest of us say, identified by name, and he responds, by speaking, without a noticeable lag. The app even ignores “um”—a feature that I had trouble explaining to Howorth, because Zoom left it out of my explanation, too.

For people who couldn’t hear, silent movies were an accessible form of public entertainment, since dialogue that couldn’t be deduced from the action appeared on printed title cards. Talkies—movies with synchronized sound, introduced in the late nineteen-twenties—were a setback. Subtitles are easy to add to film, but, for the most part, they were used only when actors and audiences spoke different languages. In 1958, Congress created Captioned Films for the Deaf, a program that was meant to be analogous to Talking Books for the Blind. Subtitles for television came later. The first captioned TV broadcast was an episode of “The French Chef,” starring Julia Child, which the Boston public station WGBH aired, as an experiment, in 1971. Other successful tests followed, and in 1979 the government funded the National Captioning Institute (N.C.I.), with the goal of producing more text. The first live network-TV broadcast that included real-time captioning was the 1982 Academy Awards show, on ABC. Most of the text that night was copied from the script; ad libs and award announcements were added, by stenographers, as they occurred.

Many of N.C.I.’s first captioners were moonlighting court reporters. They used stenotype machines, devices on which skilled users can produce accurate transcripts faster than typists can type. By the early two-thousands, the demand for captioning was outstripping the supply of trained stenographers, and N.C.I. began experimenting with Automatic Speech Recognition. The software couldn’t convert television dialogue directly; captioners had to train it to recognize their own voices, as I did with Dragon. Once they’d done that, they worked like simultaneous translators, by listening to what was said onscreen and immediately repeating it into a microphone connected to a computer. They were known within the organization as “voice writers.”

Meredith Patterson, who is now N.C.I.’s president, began working there in 2003 and was one of the first voice writers. “The software was great with vocabulary that you would expect to be difficult,” she said. “But it struggled with little words, which we don’t articulate well—like ‘in’ versus ‘and.’ ” Patterson and her colleagues had to insert all punctuation verbally, sometimes by using shortcuts—instead of “question mark,” they said “poof”—and they created verbal tags to differentiate among words like “two,” “to,” and “too.” Good short-term memory was a job requirement; if a TV business commentator rattled off stock names and prices, voice writers had to be able to repeat the information immediately without losing track of what came next. When hiring, Patterson said, “we used a screening process that was similar to what they use for air-traffic controllers.”

N.C.I. still employs voice writers, and even stenographers, but most captioning nowadays is automated. The transition began in earnest a little over four years ago, prompted by COVID-19, which pushed huge amounts of human interaction onto screens and raised the demand for captioning. (N.C.I. provides its service not just to TV networks but also to websites, educational institutions, corporations, and many other clients.) Meanwhile, rapid improvements in A.I. increased transcription accuracy.

In December, I spent an evening with Cristi Alberino and Ari Shell, both of whom are in their fifties and severely hearing impaired. We met at Alberino’s house, in West Hartford, Connecticut. They are members of the board of an organization called Hear Here Hartford, and Alberino is a board member of the American School for the Deaf (A.S.D.), whose campus isn’t far from her house. They both wear powerful hearing aids, and are adept at reading lips. Alberino began to lose her hearing when she was in graduate school. Shell said that he’s not certain when he lost his, but that when he was eight or nine he would sometimes go downstairs while his parents were sleeping and watch TV with the sound muted. “My dad came down once, and said, ‘Why don’t you turn up the volume?’ ” he told me. “I said I didn’t need to, because I knew exactly what they were saying.”

Alberino said that the pandemic had posed many challenges for her and other people with hearing loss, since masks muffled voices and made lipreading impossible. (Transparent masks exist, but weren’t widely available.) Nevertheless, she said, the pandemic was hugely beneficial for her. She works as a consultant in Connecticut’s Department of Education, and spends much of every workday on the phone or in meetings. “Ten years ago, we moved from a building with separate little offices into a giant room with floor-to-ceiling windows,” she said. “It’s two hundred and fifty people on an open floor, and they pipe in white noise. It’s an acoustic nightmare.”

The pandemic forced her to work from home, and her life changed. “Now I’m in a room by myself,” she continued. “There’s no noise except for me. No one cares how loud I am. And everything is captioned.” Work meetings moved onto Microsoft Teams, a videoconferencing app, which she called “the single greatest thing ever invented.” Teams includes a captioning utility, which works the way Live Transcribe and Zoom do. She can read anything her co-workers say, and respond, by speaking, without a lag. Before captioning, she had to concentrate so hard on what people were saying that she often had difficulty responding thoughtfully, and when she got home in the evening she was exhausted. She said, “After the lockdown, I went to H.R. and asked, ‘Can I please stay home?’ Because I don’t ever want to say ‘What?’ again.”

When I have Zoom conversations with my mother, who is about to turn ninety-six, I usually see just the top of her head and the smoke alarm on her ceiling, because she doesn’t like to aim her laptop’s camera at her face. I can’t see her eyes or her expression—a drawback when we talk. Using transcription utilities can pose a similar challenge, because a person reading your words on a phone can’t also look you in the eye. Howorth told me that he had used Live Transcribe during a meeting with a pair of financial advisers, but hadn’t been able to tell which adviser was speaking and so had to keep looking up to see whose lips were moving. (His cochlear implant didn’t help, since it makes all voices sound the same to him.)

One solution was devised by Madhav Lavakare, a senior at Yale. He was born in India, lived in the United States briefly, and attended school in Delhi. “One of my classmates had hearing loss,” he told me recently. “He had hearing aids, but he said they didn’t help him understand conversations—they just amplified noise.” Voice-to-text software didn’t help, either. “Because he didn’t have access to tone of voice, he needed to be able to read lips and see facial expressions and hand gestures—things that he couldn’t do while looking at his phone.”

Lavakare concluded that the ideal solution would be eyeglasses that displayed real-time speech transcription but didn’t block the wearer’s field of vision; his friend agreed. Lavakare had always been a tinkerer. When he was six, he built a solar-powered oven out of aluminum foil because his mother wouldn’t let him use the oven in their kitchen, and when he was nine he built “an annoying burglar alarm that was hard to disarm” in order to keep his parents out of his room. As he considered his friend’s hearing problem, he realized that he didn’t know enough about optics to build the glasses they had discussed, so he took apart his family’s movie projector and studied the way it worked.

He built a crude prototype, which he continued to refine when he got to Yale. Then he took two years off to work on the device exclusively, often with volunteer help, including from other students. He’s now twenty-three, and, to the relief of his parents, back in college. Not long ago, I met him for lunch at a pizza place in New Haven. He had brought a demo, which, from across the table, looked like a regular pair of eyeglasses. I’m nearsighted, so he told me to wear his glasses over my own. (If I were a customer, I could add snap-in prescription inserts.) Immediately, our conversation appeared as legible lines of translucent green text, which seemed to be floating in the space between us. “Holy shit,” I said (duly transcribed). He showed me that I could turn off the transcription by tapping twice on the glasses’ right stem, and turn it back on by doing the same again. He added speaker identification by changing a setting on his phone. The restaurant was small and noisy, but the glasses ignored two women talking loudly at a table to my left.
