Talk to Me: How Voice Computing Will Transform the Way We Live, Work, and Think
by James Vlahos, Houghton Mifflin Harcourt, 2019
Steve Jobs could be relentless when he wanted something. In early 2010, he wanted a small startup in San Jose, Calif. Its CEO, Dag Kittlaus, and his cofounders had just raised a second round of funding and didn’t want to sell. Jobs called Kittlaus for 37 days straight, until he wrangled and wheedled a deal to buy the two-year-old venture for Apple at a price reportedly between US$150 million and $200 million. The company was Siri Inc.
Wired contributor James Vlahos tells the story of how Siri took up permanent residence in the iPhone in his new book, Talk to Me. It’s the first nontechnical book on voice computing that I’ve seen and a must-read if you have any interest in the topic.
Vlahos spends the first third of Talk to Me describing the platform war currently raging in voice computing. That section details the race among the big players, including Amazon, Google, and Apple, to embed AI-driven voices in as many different devices as possible, as they seek to dominate the emerging ecosystem. The fact that Amazon now has more than 10,000 employees working on Alexa gives a good sense of the scale of that race.
But voice computing is more than a platform play. It is likely to have ramifications and applications for every company, especially if Vlahos’s contention that “the advent of voice computing is a watershed moment in human history” turns out to be right.
“Voice is becoming the universal remote to reality, a means to control any and every piece of technology,” he writes. “Voice allows us to command an army of digital helpers — administrative assistants, concierges, housekeepers, butlers, advisors, babysitters, librarians, and entertainers.” Voice will disrupt the business models of powerful companies — and create new opportunities for upstarts — in part because it will put AI directly in the control of consumers, Vlahos argues. “And voice introduces the world to relationships long prophesied by science fiction — ones in which personified AIs become our helpers, watchdogs, oracles, and friends.”
This fanciful future is barely evident in the present relationship I have with the cylindrical smart speaker sitting on my desk. The ring around the top of the speaker is usually glowing red. That’s because it’s on mute, which, I’m assured (but not completely reassured), precludes its maker’s employees from listening to me at will. But I do get a glimmer of how valuable that relationship could become when I unmute the device and ask it for the weather forecast, or order up a particularly tasty Grateful Dead jam. Voice is the simplest, most natural interface with technology yet invented.
“With voice,” Vlahos explains, “computers are finally doing it our way. They are learning our preferred way of communication: through language. Voice, optimally realized, has the potential to be so easy to use that it hardly feels like an interface at all. We know how to speak because we’ve been doing it for all of our lives.”
The key words here are “optimally realized.” It is abundantly clear that voice technology is still far from that point. Vlahos explains why in the remaining two-thirds of Talk to Me, which is devoted to explaining the technology and to exploring the challenges and decisions that lie ahead.
Voice computing is enabled by a mashup of technologies. “The sound waves emanating from your mouth must be converted into words, a process known as automated speech recognition,” writes Vlahos. “Determining what you were trying to communicate with those words is called natural-language understanding. Formulating a suitable reply is natural-language generation. And finally, speech synthesis allows voice-computing devices to audibly reply.”
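To make the four stages concrete, here is a toy sketch in Python. It is our own illustration, not code from the book, and every function in it is a hypothetical stand-in: “speech recognition” simply decodes bytes that are assumed to already contain text, “understanding” is crude keyword matching, and “synthesis” just re-encodes the reply. Real systems replace each stand-in with a trained model, but the hand-offs between stages follow the sequence Vlahos describes.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class VoicePipeline:
    # Each field is one stage; the callables here are hypothetical placeholders.
    recognize: Callable[[bytes], str]   # speech recognition: audio -> words
    understand: Callable[[str], dict]   # language understanding: words -> intent
    generate: Callable[[dict], str]     # language generation: intent -> reply text
    synthesize: Callable[[str], bytes]  # speech synthesis: reply text -> audio

    def respond(self, audio: bytes) -> bytes:
        # Run the stages strictly in order, each feeding the next.
        words = self.recognize(audio)
        intent = self.understand(words)
        reply = self.generate(intent)
        return self.synthesize(reply)


def toy_understand(words: str) -> dict:
    # Keyword matching stands in for a real intent classifier.
    text = words.lower()
    if "weather" in text:
        return {"intent": "get_weather"}
    if "play" in text:
        # Keep the user's original capitalization for the requested music.
        start = text.index("play") + len("play")
        return {"intent": "play_music", "query": words[start:].strip()}
    return {"intent": "unknown"}


def toy_generate(intent: dict) -> str:
    # Fixed templates stand in for a learned response generator.
    if intent["intent"] == "get_weather":
        return "Here is the forecast."
    if intent["intent"] == "play_music":
        return f"Playing {intent['query']}."
    return "Sorry, I didn't catch that."


pipeline = VoicePipeline(
    recognize=lambda audio: audio.decode("utf-8"),  # pretend the audio is already text
    understand=toy_understand,
    generate=toy_generate,
    synthesize=lambda reply: reply.encode("utf-8"),  # pretend text is audio
)

if __name__ == "__main__":
    print(pipeline.respond(b"Play some Grateful Dead").decode("utf-8"))
```

Separating the stages this way mirrors why progress can come piecemeal: a better recognizer or a more natural synthesizer can be swapped in without touching the other three.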
The reason we’re seeing an explosion in voice computing now is that deep learning has enabled researchers to overcome a host of challenges in each of these technologies. For instance, instead of having a person author every line a computer speaks, recently developed generative methods for training neural networks enable computers to come up with responses on their own.
But many challenges remain, as anyone who tries to have an actual conversation with Siri or Alexa well knows. Vlahos devotes a chapter to the ongoing efforts to meet the conversational challenge, but as yet, no one has won the Alexa Prize Socialbot Grand Challenge reward of $1 million for building a “socialbot that can converse coherently and engagingly with humans on popular topics for 20 minutes.”
Talk to Me does a terrific job of explaining why voice computing is such a fascinating and promising technology. It also explains why, as Vlahos writes, “across the landscape, companies ranging from Facebook to 1-800-Flowers are eying it and asking: How will the voice revolution affect us? Is this an opportunity or a threat? Voice creates new ways to sell things, advertise, and monetize people’s attention. To interact with consumers for marketing or customer service. To collect data and profit from it. To make bookings and provide services from matchmaking to therapy.”
Of course, figuring out all these new ways — and introducing them successfully — is your job.