Ed Yong | 4 J
When we take turns speaking, we chime in after a culturally universal short gap.
One of the greatest human skills becomes evident during conversations. It’s there, not in what we say but in what we don’t. It’s there in the pauses, the silences, the gaps between the end of my words and the start of yours.
When we talk we take turns, where the “right” to speak flips back and forth between partners. This conversational pitter-patter is so familiar and seemingly unremarkable that we rarely remark on it. But consider the timing: On average, each turn lasts for around 2 seconds, and the typical gap between them is just 200 milliseconds—barely enough time to utter a syllable. That figure is nigh-universal. It exists across cultures, with only slight variations. It’s even there in sign-language conversations.
“It’s the minimum human response time to anything,” says Stephen Levinson from the Max Planck Institute for Psycholinguistics. It’s the time that runners take to respond to a starting pistol—and that’s just a simple signal. If you gave them a two-way choice—say, run on green but stay on red—they’d take longer to pick the right response. Conversations have a far greater number of possible responses, which ought to saddle us with lengthy gaps between turns. Those don’t exist because we build our responses during our partner’s turn. We listen to their words while simultaneously crafting our own, so that when our opportunity comes, we seize it as quickly as it’s physically possible to.
While Stivers recorded Americans, her colleagues did the same around the world, for speakers of Italian, Dutch, Danish, Japanese, Korean, Lao, ≠Akhoe Hai//om (from Namibia), Yélî-Dnye (from Papua New Guinea), and Tzeltal (a Mayan language from Mexico). Despite the vastly different grammars of these ten tongues, and the equally vast cultural variations between their speakers, the researchers found more similarities than differences.
The typical gap was 200 milliseconds long, rising to 470 for the Danish speakers and falling to just 7 for the Japanese. So, yes, there’s some variation, but it’s pretty minuscule, especially when compared to cultural stereotypes. There are plenty of anecdotal reports of minute-long pauses in Scandinavian chat, and virtually simultaneous speech among New York Jews and Antiguan villagers. But Stivers and her colleagues saw none of that.
Instead, they uncovered what Levinson describes as a “basic metabolism of human social life”—a universal tendency to minimize the silence between turns, without overlaps. (Overlaps only happened in 17 percent of turns, typically lasted for just 100 milliseconds, and were mostly slight misfires where one speaker unexpectedly drew out their last syllable.)
Pessimists among us might view this as the ultimate indictment of conversation, a sign that we’re spending most of our “listening” time actually prepping what we are going to say. (As Chuck Palahniuk once wrote, “The only reason why we ask other people how their weekend was is so we can tell them about our own weekend.”) But really, this work shows that even the most chronic interrupter is really listening. “Everything points to what astute observers we are of every word choice, every phonetic change,” says Stivers.
The researchers also want to understand how turn-taking develops throughout our lives. So far, studies have shown that even six-month-old infants respond to their parents very quickly, albeit with more overlaps. At nine months, when they start to grasp that they’re actually communicating with another mind, they slow down. After that, it takes a surprisingly long time to get back to adult speeds. Stivers has found that even 8-year-olds, who have been speaking for many years, are still a few hundred milliseconds slower than adults. “That’s a puzzle that I don’t have an answer to,” she says.