
Why humans can't use natural language processing to speak with the animals

We speak around 6,500 languages, and they're all easier to translate than what comes out of a finch.

C Flanigan via Getty Images

We’ve been wondering what goes on inside the minds of animals since antiquity. Doctor Dolittle’s talent was far from novel when the character debuted in 1920; Greco-Roman literature is lousy with speaking animals, writers in Zhanguo-era China routinely ascribed language to certain animal species, and talking beasts are prevalent in Indian, Egyptian, Hebrew and Native American storytelling traditions.

Even today, popular Western culture toys with the idea of talking animals, though often through a lens of technology-empowered speech rather than supernatural force. The dolphins from both SeaQuest DSV and Johnny Mnemonic communicated with their bipedal contemporaries through advanced translation devices, as did Dug the dog from Up.

We’ve already got machine-learning systems and natural language processors that can translate human speech into any number of existing languages, and adapting that process to convert animal calls into human-interpretable signals doesn’t seem that big of a stretch. However, it turns out we’ve got more work to do before we can converse with nature.

What is language?

“All living things communicate,” an interdisciplinary team of researchers argued in 2018’s On understanding the nature and evolution of social cognition: a need for the study of communication. “Communication involves an action or characteristic of one individual that influences the behavior, behavioral tendency or physiology of at least one other individual in a fashion typically adaptive to both.”

From microbes, fungi and plants on up the evolutionary ladder, science has yet to find an organism that exists in such extreme isolation as to not have a natural means of communicating with the world around it. But we should be clear that “communication” and “language” are two very different things.

“No other natural communication system is like human language,” argues the Linguistic Society of America. Language allows us to express our inner thoughts and convey information, as well as request or even demand it. “Unlike any other animal communication system, it contains an expression for negation — what is not the case … Animal communication systems, in contrast, typically have at most a few dozen distinct calls, and they are used only to communicate immediate issues such as food, danger, threat, or reconciliation.”

That’s not to say that pets don’t understand us. “We know that dogs and cats can respond accurately to a wide range of human words when they have prior experience with those words and relevant outcomes,” Dr. Monique Udell, Director of the Human-Animal Interaction Laboratory at Oregon State University, told Engadget. “In many cases these associations are learned through basic conditioning,” Dr. Udell said — like when we yell “dinner” just before setting out bowls of food.

Whether or not our dogs and cats actually understand what “dinner” means outside of the immediate Pavlovian response remains to be seen. “We know that at least some dogs have been able to learn to respond to over 1,000 human words (labels for objects) with high levels of accuracy,” Dr. Udell said. “Dogs currently hold the record among non-human animal species for being able to match spoken human words to objects or actions reliably,” but it’s “difficult to know for sure to what extent dogs understand the intent behind our words or actions.”

Dr. Udell continued: “This is because when we measure a dog or cat’s understanding of a stimulus, like a word, we typically do so based on their behavior.” You can teach a dog to sit with both English and German commands, but “if a dog responds the same way to the word ‘sit’ in English and in German, it is likely the simplest explanation — with the fewest assumptions — is that they have learned that when they sit in the presence of either word then there is a pleasant consequence.”

Tea Stražičić for Engadget/Silica Magazine

Hush, the computers are speaking

Natural language processing (NLP) is the branch of AI that enables computers and algorithmic models to interpret text and speech, including the speaker’s intent, the same way we meatsacks do. It combines computational linguistics, which models the syntax, grammar and structure of a language, with machine-learning models, which “automatically extract, classify, and label elements of text and voice data and then assign a statistical likelihood to each possible meaning of those elements,” according to IBM. NLP underpins the functionality of every digital assistant on the market. Basically, any time you’re speaking at a “smart” device, NLP is translating your words into machine-understandable signals and vice versa.
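To make that less abstract, here is a minimal sketch of the kind of translation pipeline such systems expose, using the open-source Hugging Face transformers library; the specific model name is an assumption for illustration, and any pretrained translation model would do.

```python
# A minimal sketch of an off-the-shelf NLP translation pipeline.
# Assumes the Hugging Face "transformers" library; "t5-small" is just an
# illustrative pretrained model that supports English-to-German translation.
from transformers import pipeline

translator = pipeline("translation_en_to_de", model="t5-small")

result = translator("The chickadee call is one of the most complicated vocal systems we know of.")
print(result[0]["translation_text"])
```

The hard part, of course, is that pipelines like this exist only because the model was trained on enormous amounts of paired human-language text, which is exactly what we lack for animal calls.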

The field of NLP research has undergone a significant evolution in recent years, as its core systems have migrated from older Recurrent and Convolutional Neural Networks toward Google’s Transformer architecture, which greatly increases training efficiency.

Dr. Noah D. Goodman, Associate Professor of Psychology and Computer Science, and Linguistics at Stanford University, told Engadget that, with RNNs, “you'll have to go time-step by time-step or like word by word through the data and then do the same thing backward.” In contrast, with a transformer, “you basically take the whole string of words and push them through the network at the same time.”

“It really matters to make that training more efficient,” Dr. Goodman continued. “Transformers, they're cool … but by far the biggest thing is that they make it possible to train efficiently and therefore train much bigger models on much more data.”
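The difference Dr. Goodman describes can be seen in a toy PyTorch sketch (an illustration, not any of the production systems discussed here): a recurrent cell must walk through the sequence one step at a time, while a transformer encoder pushes the whole string of words through at once. Dimensions here are arbitrary.

```python
# Toy contrast between recurrent (step-by-step) and transformer (whole-sequence)
# processing. Sizes are arbitrary; this is illustrative only.
import torch
import torch.nn as nn

seq_len, d_model = 8, 32
tokens = torch.randn(seq_len, d_model)      # stand-in for 8 embedded words

# Recurrent approach: each step depends on the previous hidden state,
# so the loop cannot be parallelized across time steps.
rnn_cell = nn.RNNCell(input_size=d_model, hidden_size=d_model)
hidden = torch.zeros(1, d_model)
for t in range(seq_len):
    hidden = rnn_cell(tokens[t].unsqueeze(0), hidden)

# Transformer approach: the entire sequence goes through the network together.
encoder_layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=4, batch_first=True)
encoder = nn.TransformerEncoder(encoder_layer, num_layers=2)
encoded = encoder(tokens.unsqueeze(0))      # one parallel pass over all 8 positions

print(hidden.shape, encoded.shape)          # (1, 32) and (1, 8, 32)
```

That parallelism is what lets researchers train much bigger models on much more data, which is the property Dr. Goodman flags as the real win.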

Talkin’ jive ain’t just for turkeys

While many species’ communication systems have been studied in recent years — most notably cetaceans like whales and dolphins, but also the southern pied babbler, for its song’s potentially syntactic qualities, and vervet monkeys’ communal predator warning system — none has matched the sheer complexity of the call of the avian family Paridae: the chickadees, tits and titmice.

Dr. Jeffrey Lucas, professor in the Biological Sciences department at Purdue University, told Engadget that the Paridae call “is one of the most complicated vocal systems that we know of. At the end of the day, what the [field’s voluminous number of research] papers are showing is that it's god-awfully complicated, and the problem with the papers is that they grossly under-interpret how complicated [the calls] actually are.”

These parids often live in socially complex, heterospecific flocks, mixed groupings that include multiple songbird and woodpecker species. The complexity of the birds’ social system is correlated with an increased diversity in communications systems, Dr. Lucas said. “Part of the reason why that correlation exists is because, if you have a complex social system that's multi-dimensional, then you have to convey a variety of different kinds of information across different contexts. In the bird world, they have to defend their territory, talk about food, integrate into the social system [and resolve] mating issues.”

The chickadee call consists of at least six distinct note types set in an open-ended vocal structure, which is both monumentally rare among non-human communication systems and the source of the call’s complexity. An open-ended vocal system means that “increased recording of chick-a-dee calls will continually reveal calls with distinct note-type compositions,” explained the 2012 study, Linking social complexity and vocal complexity: a parid perspective. “This open-ended nature is one of the main features the chick-a-dee call shares with human language, and one of the main differences between the chick-a-dee call and the finite song repertoires of most songbird species.”

Dolphin translation by Tea Stražičić
Tea Stražičić for Engadget/Silica Magazine

Dolphins have no need for kings

Training language models isn’t simply a matter of shoving in large amounts of data. When training a model to translate an unknown language into one you speak, you need at least a rudimentary understanding of how the two languages correlate with one another so that the translated text retains the proper intent of the speaker.

“The strongest kind of data that we could have is what's called a parallel corpus,” Dr. Goodman explained, which is basically having a Rosetta Stone for the two tongues. In that case, you’d simply have to map between specific words, symbols and phonemes in each language — figure out what means “river” or “one bushel of wheat” in each and build out from there.
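As a deliberately tiny illustration of why that aligned data matters (the sentence pairs below are invented, and real systems use far more sophisticated statistical or neural aligners), even naive co-occurrence counts over a parallel corpus start to hint at which words map onto which:

```python
# Invented toy parallel corpus: aligned English/German sentence pairs.
from collections import Counter, defaultdict

parallel_corpus = [
    ("the river is wide", "der fluss ist breit"),
    ("the river is cold", "der fluss ist kalt"),
    ("the bread is cold", "das brot ist kalt"),
]

# Count how often each English word appears alongside each German word.
cooccur = defaultdict(Counter)
for english, german in parallel_corpus:
    for en_word in english.split():
        for de_word in german.split():
            cooccur[en_word][de_word] += 1

print(cooccur["river"].most_common(3))
# Raw counts still confuse "fluss" with function words like "der" and "ist";
# real aligners iterate on exactly this kind of signal to pull them apart.
```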

Without that perfect translation artifact, so long as you have large corpuses of data for both languages, “it's still possible to learn a translation between the languages, but it hinges pretty crucially on the idea that the kind of latent conceptual structure [is shared],” Dr. Goodman continued. That assumes both cultures’ definitions of “one bushel of wheat” are generally equivalent.

Goodman points to the word pairs ‘man and woman’ and ‘king and queen’ in English. “The structure, or geometry, of that relationship we expect [in] English; if we were translating into Hungarian, we would also expect those four concepts to stand in a similar relationship,” Dr. Goodman said. “Then effectively the way we'll learn a translation now is by learning to translate in a way that preserves the structure of that conceptual space as much as possible.”
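That “conceptual geometry” is easiest to see in word-embedding space. The sketch below uses made-up toy vectors purely for illustration; real systems learn these vectors from large corpora with methods like word2vec or transformer embeddings.

```python
# Toy illustration of the man/woman, king/queen "parallelogram" structure.
# The vectors are invented; real embeddings are learned from large corpora.
import numpy as np

emb = {
    "man":   np.array([1.0, 0.0, 0.2]),
    "woman": np.array([1.0, 1.0, 0.2]),
    "king":  np.array([2.0, 0.0, 0.9]),
    "queen": np.array([2.0, 1.0, 0.9]),
}

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# If the male/female relationship is encoded consistently, the offset from
# "man" to "woman" should carry "king" most of the way to "queen".
analogy = emb["king"] - emb["man"] + emb["woman"]
print(cosine(analogy, emb["queen"]))   # ~1.0 for these toy vectors

# Unsupervised translation methods look for a mapping between two languages'
# embedding spaces that preserves exactly this kind of structure.
```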

Having a large corpus of data to work with in this situation also enables unsupervised learning techniques to be used to “extract the latent conceptual space,” Dr. Goodman said, though that method is more resource intensive and less efficient. However, if all you have is a large corpus in only one of the languages, you’re generally out of luck.

“For most human languages we assume the [quartet concepts] are kind of, sort of similar, like, maybe they don't have ‘king and queen’ but they definitely have ‘man and woman,’” Dr. Goodman continued. ”But I think for animal communication, we can't assume that dolphins have a concept of ‘king and queen’ or whether they have ‘men and women.’ I don't know, maybe, maybe not.”

And without even that rudimentary conceptual alignment to work from, discerning the context and intent of an animal’s call — much less deciphering the syntax, grammar and semantics of the underlying communication system — becomes much more difficult. “You're in a much weaker position,” Dr. Goodman said. “If you have the utterances in the world context that they're uttered in, then you might be able to get somewhere.”

Basically, if you can obtain multimodal data that provides context for the recorded animal call — the environmental conditions, time of day or year, the presence of prey or predator species, and so on — you can “ground” the language data in the physical environment. From there, you can “assume that English grounds into the physical environment in the same way as this weird new language grounds into the physical environment, and use that as a kind of bridge between the languages.”
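In practice, that kind of grounding starts with a data record tying each recording to its context. The sketch below is a hypothetical schema, with field names invented for illustration rather than drawn from any real dataset.

```python
# Hypothetical record pairing a recorded call with the context needed to
# "ground" it in the physical environment. Field names are illustrative only.
from dataclasses import dataclass, field
from datetime import datetime

@dataclass
class GroundedCall:
    audio_file: str                       # path to the recorded vocalization
    species: str
    timestamp: datetime
    location: tuple[float, float]         # latitude, longitude
    temperature_c: float
    predators_present: list[str] = field(default_factory=list)
    food_source_nearby: bool = False
    flock_size: int | None = None

call = GroundedCall(
    audio_file="recordings/chickadee_0417.wav",
    species="Poecile atricapillus",
    timestamp=datetime(2023, 4, 17, 6, 42),
    location=(40.43, -86.92),
    temperature_c=3.5,
    predators_present=["Cooper's hawk"],
    flock_size=7,
)
```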

Unfortunately, the challenge of translating bird calls into English (or any other human language) falls squarely into the most difficult of these scenarios. That means we’ll need more data, and a lot of different types of data, as we continue to build our basic understanding of the structures of these calls from the ground up. Some of those efforts are already underway.

The Dolphin Communication Project, for example, employs a combination “mobile video/acoustic system” to capture both the utterances of wild dolphins and their relative position in physical space at that time to give researchers added context to the calls. Biologging tags — animal-borne sensors affixed to hide, hair, or horn that track the locations and conditions of their hosts — continue to shrink in size while growing in both capacity and capability, which should help researchers gather even more data about these communities.

What if birds are just constantly screaming about the heat?

Even if we won’t be able to immediately chat with our furred and feathered neighbors, gaining a better understanding of how they at least talk to each other could prove valuable to conservation efforts. Dr. Lucas points to a recent study he participated in, which found that environmental changes induced by climate change can radically alter how different bird species interact in mixed flocks. “What we showed was that if you look across the disturbance gradients, then everything changes,” Dr. Lucas said. “What they do with space changes, how they interact with other birds changes. Their vocal systems change.”

“The social interactions for birds in winter are extraordinarily important because you know, 10 gram bird — if it doesn't eat in a day, it's dead,” Dr. Lucas continued. “So information about their environment is extraordinarily important. And what those mixed species flocks do is to provide some of that information.”

However, that network quickly breaks down as the habitat degrades, and in order to survive “they have to really go through fairly extreme changes in behavior and social systems and vocal systems … but that impacts fertility rates, and their ability to feed their kids and that sort of thing.”

Better understanding their calls will help us better understand their levels of stress, which can serve both modern conservation efforts and agricultural ends. “The idea is that we can get an idea about the level of stress in [farm animals], then use that as an index of what's happening in the barn and whether we can maybe even mitigate that using vocalizations,” Dr. Lucas said. “AI probably is going to help us do this.”
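What might that AI assist look like? A hypothetical sketch, assuming commonly used open-source audio and machine-learning tools (librosa and scikit-learn) and invented file paths and labels, could pair acoustic features from barn recordings with expert-provided stress labels:

```python
# Hypothetical stress-index sketch: extract acoustic features from recordings
# and train an ordinary classifier on expert-provided labels.
# File paths and labels are invented for illustration.
import librosa
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def mfcc_features(path: str) -> np.ndarray:
    """Summarize a recording as the mean of its MFCC frames."""
    audio, sr = librosa.load(path, sr=None)
    mfcc = librosa.feature.mfcc(y=audio, sr=sr, n_mfcc=20)
    return mfcc.mean(axis=1)

# Hypothetical labeled recordings: 1 = stressed vocalization, 0 = calm.
training_files = ["barn/clip_001.wav", "barn/clip_002.wav", "barn/clip_003.wav"]
labels = [1, 0, 1]

X = np.stack([mfcc_features(f) for f in training_files])
clf = RandomForestClassifier(n_estimators=100).fit(X, labels)

# The class probability for a new recording becomes a crude stress index.
print(clf.predict_proba(mfcc_features("barn/clip_new.wav").reshape(1, -1)))
```

None of this tells us what the animals are saying; it only has to track how stressed they sound.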

“Scientific sources indicate that noise in farm animal environments is a detrimental factor to animal health,” Jan Brouček of the Research Institute for Animal Production Nitra observed in 2014. “Especially longer lasting sounds can affect the health of animals. Noise directly affects reproductive physiology or energy consumption.” That continuous drone is thought to also indirectly impact other behaviors, including habitat use, courtship, mating, reproduction and the care of offspring.

Conversely, a 2021 study, The effect of music on livestock: cattle, poultry and pigs, showed that playing music helps to calm livestock and reduce stress during times of intensive production. We can measure that reduction in stress based on what sorts of happy sounds those animals make. Like listening to music in another language, we can get with the vibe, even if we can't understand the lyrics.
