INTRODUCTION
Human language is unparalleled in its diversity, complexity, and informational potential. Through the production and combination of linguistic units, we can communicate about the past and the future, the real and the imagined. The acquisition of this unique communication system is highly dependent on linguistic input. One pivotal input type, infant-directed communication, is a crucial source of language learning and a key predictor of acquisition [e.g., (1–3)]. Infant-directed communication has been documented in many cultures and languages (4, 5) and involves modifications to spoken language (6, 7), sign language (8), as well as gestures (9) used when directly addressing a child or infant (hereafter infant). In the vocal domain, infant-directed communication is typically characterized by a number of acoustic (5, 10) and structural (11–13) features and has been demonstrated to attract the infants’ attention more than adult-directed speech (14, 15). These features have been shown to support language acquisition, both in comprehension (3, 16) and production (2, 17). A considerable body of research has further indicated that infant-directed communication plays a role in the transmission of cultural knowledge, a process commonly referred to as natural pedagogy (18). However, there is also substantial cross-cultural variation in infants’ exposure to infant-directed communication, with no obvious effect on language learning (19–24). As a result, recent studies have also begun to emphasize the potential augmentative role of infant-surrounding communication in language acquisition (21, 25–27).
Despite the central role that infant-directed communication plays in language acquisition and cultural transmission, its evolutionary origins remain largely unknown (28). The few studies investigating the topic in our closest-living relatives, the nonhuman great apes, have suggested minimal (29) or no (30) infant-directed vocal communication, although targeted and systematic empirical studies of great ape individuals in their natural habitat are currently lacking. To reconstruct the evolutionary emergence of infant-directed vocal communication, we investigated the extent to which vocal communication is directed at human infants from different cultures (Chintang, Qaqet, Shipibo-Konibo, and Tuatschin) and infants from at least one species from each genus of nonhuman great apes [Bornean orangutans (Pongo pygmaeus wurmbii), western gorillas (Gorilla gorilla), chimpanzees (Pan troglodytes schweinfurthii), and bonobos (Pan paniscus)] using comparable methods. On the basis of earlier work (28), we expected high levels of infant-directed vocal communication in humans and low levels in nonhuman great apes. On the other hand, we expected human and most nonhuman great ape infants to be exposed to a similar amount of surrounding communication (28).
To investigate differences in vocal input across species, we first compared the absolute amount of infant-directed and surrounding vocal communication (see Table 1 for definitions) between great ape species. In a second step, given underlying differences in the volubility of each species (see fig. S1), we compared the proportion of infant-directed and surrounding vocal communication in relation to the general vocal activity of each species. In all models, we also accounted for the infant’s age to control for age-related changes in infant-directed or surrounding communication. Our findings suggest that humans produce infant-directed vocal communication at levels that drastically exceed those of any other great ape species, whereas the input from the surrounding environment across Pan species (chimpanzees and bonobos) and humans is broadly equivalent.
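The two comparison steps described above (absolute amounts first, then proportions relative to overall vocal activity) can be illustrated with a minimal sketch. It is not the analysis used in this study: the record layout, the field names, the per-hour normalization, and the example numbers are all hypothetical assumptions made for illustration, and the sketch omits the statistical models and the infant-age covariate entirely.

```python
from dataclasses import dataclass


@dataclass
class Recording:
    """One observation session (hypothetical layout, not the study's data format)."""
    species: str
    hours_observed: float
    directed: int      # vocalizations directed at the infant
    surrounding: int   # vocalizations near the infant but not directed at it
    total: int         # all vocalizations coded in the session (assumed available)


def hourly_rate(count: int, hours: float) -> float:
    """Absolute amount, expressed per observation hour (an assumed normalization)."""
    if hours <= 0:
        raise ValueError("hours must be positive")
    return count / hours


def share(count: int, total: int) -> float:
    """Proportion of overall vocal activity; defined as 0.0 for a silent session."""
    return count / total if total > 0 else 0.0


def summarize(rec: Recording) -> dict:
    """Step 1: absolute rates. Step 2: shares of the session's total vocal activity."""
    return {
        "species": rec.species,
        "directed_per_hour": hourly_rate(rec.directed, rec.hours_observed),
        "surrounding_per_hour": hourly_rate(rec.surrounding, rec.hours_observed),
        "directed_share": share(rec.directed, rec.total),
        "surrounding_share": share(rec.surrounding, rec.total),
    }


if __name__ == "__main__":
    # Toy numbers only; they do not come from the study.
    example = Recording("example_species", hours_observed=2.0,
                        directed=10, surrounding=30, total=40)
    print(summarize(example))
```

Expressing each category as a share of total vocal activity is what allows a highly voluble species to be compared with a quieter one, which is the motivation given above for the second step.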
DISCUSSION
By comparing the vocal input received by infants of all great ape species, we demonstrated notable differences in the amount of directed communication between humans and nonhuman great apes. Humans engaged in infant-directed communication at levels orders of magnitude higher than those of any other great ape species. In contrast, we found fewer marked differences between species in terms of surrounding vocal input, with most nonhuman great apes displaying proportions similar to humans. Critically, our analyses showed that the vocal activity across species (i.e., how voluble a species is) was not sufficient to explain the differences between humans and all nonhuman great apes.
A key implication of our data is that there must have been a massive expansion in the amount of infant-directed communication within the hominin lineage. What might explain this difference between humans and other great apes? Insights into the drivers of this expansion of infant-directed communication could be gleaned from our understanding of its function. One dominant hypothesis for the function of infant-directed vocal communication in humans is that these vocal interactions with children play a key role in scaffolding the transmission and learning of language. Our data provide compelling comparative support for this since nonhuman great ape vocal systems are generally considered to be far more fixed and genetically determined than humans’ (32), and, accordingly, we see much lower levels of infant-directed vocal input. However, humans not only direct vocalizations at infants but also adopt an idiosyncratic vocal register when doing so (e.g., repeating words and using higher pitch) [e.g., (6, 33)]. Given the very low levels of infant-directed vocal communication in nonhuman great apes, it simply was not possible to also examine vocalizations for equivalent structural features known to characterize human infant–directed speech. To shed further light on any potential acoustic variation, in addition to better understanding the precise function of the rare infant-directed vocalizations in nonhuman great apes, behavioral data (e.g., the contexts in which these vocalizations are produced) compiled over longer study periods are critical. Previous research indicates that infant-directed gestures in great apes are, like infant-directed communication in humans, characterized by enhanced repetition (34, 35) and might even be more frequent in nonhuman great apes in contrast to the low rates of infant-directed vocal communication (29, 36, 37). The gestural modality might therefore represent an additional fruitful avenue for future work investigating the evolutionary origins of infant-directed communication in humans.
Our focus here has been on the occurrence of infant-directed communication in our closest-living great ape relatives. Parallel research over the past 20 years has, however, also identified analogous or convergent cases in species more distantly related to humans. While informative, these cases of infant-directed vocal communication seem to serve qualitatively different functions than infant-directed vocal communication in humans. Functions range from infant retrieval [e.g., domestic cats, Felis catus: (38)] and mother recognition [e.g., Mexican free-tailed bats, Tadarida brasiliensis: (39)] to fine-tuning vocal production using vocal accommodation [e.g., orcas, Orcinus orca: (40)]. In marmoset monkeys (Callithrix jacchus), vocal input from caregivers has been shown to bootstrap infant vocal development. Specifically, contingent parental vocal feedback (within turn-taking events), but not the overall amount of surrounding parental vocalizations, had a positive effect on the development of adult-like vocalizations in immatures (41–43). Possible cases where the features of infant-directed vocal communication have parallels to human infant–directed vocal communication are found in greater sac-winged bats (Saccopteryx bilineata), in which infant-directed vocal communication differs in pitch and timbre from adult-directed vocal communication (44). In addition, bottlenose dolphins (Tursiops truncatus) have been shown to modulate the acoustic features of their signature whistles when their infant is present (45). Critically, both dolphins and greater sac-winged bats, in addition to humans, are considered vocal learners (46), highlighting a potential link between vocal learning and the presence of infant-directed vocal communication. Future studies investigating the presence and function of infant-directed vocal communication in vocal learning and nonvocal learning animals are required to support the generality of this relationship.
To better understand the overall vocal input infants are exposed to, we also captured the surrounding vocal communication of humans and other great apes. Our data indicate that infant-surrounding vocal communication is the major source of input in all nonhuman great ape species we tested. Our results also showed that orangutans received less surrounding vocal input than all the other great apes, including humans, a finding that can be explained by the fairly solitary nature of Bornean orangutans (47). In addition, we found a further, albeit much smaller, difference whereby bonobo infants received a marginally higher proportion of surrounding vocal input compared to humans (see Fig. 3). This difference can likely be explained by the greater amount of infant-directed communication in humans. An emerging picture from our data is that learning during vocal development in great apes must be nearly exclusively based on the surrounding (as opposed to directed) vocal input.
In some human cultures, infant-surrounding vocal input is also more prevalent than infant-directed vocal input (21–23), suggesting that surrounding vocal communication could also play a more important role for language acquisition than previously assumed. Such a conclusion is supported by growing evidence from more experimentally driven studies, demonstrating that children are not only able to learn language from surrounding interactions not explicitly directed toward them (25, 48, 49), but that the precise nature of the surrounding input can provide differential learning opportunities. For example, a recent study has indicated that, across cultures, surrounding speech from children captures the infants’ attention more effectively compared to surrounding speech from adults, suggesting that surrounding speech from other children might provide more learnable input compared to more complex adult speech (50). Following from this, a promising direction for further research would be to investigate the precise nature of infant-surrounding input for nonhuman great apes in greater detail, specifically focusing on the callers’ identities, age classes, relationship to the infant, and nature of the vocal input (whether, for example, it consists of calls or call combinations) and the potential influence this has on the infant’s vocal output.
In conclusion, our findings suggest that the tendency to direct vocalizations at infants, a key feature of human communication, has been massively expanded in the human lineage. These data provide support for the hypothesis that infant-directed vocal communication played a critical role in the emergence of human language through scaffolding the learning and acquisition of such a complex communication system. The presence of broadly equivalent levels of surrounding vocal communication in Pan (chimpanzees and bonobos) and humans suggests that early hominins probably relied on surrounding vocal communication for any learned component of their vocal system until infant-directed vocal communication became more prominent.
Acknowledgments
We thank the research staff of all field sites for invaluable help with data collection. We thank the Institut Congolais pour la Conservation de la Nature and the Ministry of Scientific Research and Technology in the DRC for permission to work in the Kokolopori Bonobo Reserve and the Bonobo Conservation Initiative and Vie Sauvage for support. For support and permission to collect data on chimpanzees at the Budongo Conservation Field Station, we thank UWA, UNCST, and RZSS. We thank the government and the Ministre de la Recherche Scientifique et de l’Innovation Technologique of the CAR and the WWF CAR for permission and support to collect data on gorillas in the Dzanga-Sangha Protected Areas. In addition, for permission and support to collect data on orangutans at Tuanan, we thank RISTEK, BKSDA, KLHK, and BOSF. We are very grateful to all the families who participated in the child language data collection. We thank G. You for help with data preparation, N. Lahiff for blind coding data, E. Ringen for statistical advice, and C. Schuppli and L. Fornof for helpful discussions. We also thank A. Russell, N. Lahiff, and M. Townsend for helpful comments on earlier drafts of this manuscript.
Funding: This research was funded by the NCCR Evolving Language, Swiss NSF agreement Nr.51NF40_180888 (F.W., C.F., J.S., K.Z., C.P.v.S., S.S., and S.W.T.), the SNSF grant PP00P3_198912 (S.W.T.), the SNSF grant 310030_185324 (L.N. and K.Z.), the Leverhulme Trust Research Leadership Award F/00 268/AP (M.L. and K.Z.), and the Transversal Action of Muséum National d’Histoire Naturelle 2020-2021 and 2021-2022 (S.M. and L.N.).
Author contributions: Conceptualization: F.W., C.F., J.S., K.Z., C.P.v.S., S.S., S.W.T., and E.P.W. Data curation: F.W., C.F., J.S., L.N., M.L., and M.A.v.N. Formal analysis: F.W., C.F., J.S., and E.P.W. Funding acquisition: K.Z., C.P.v.S., S.S., S.W.T., S.M., J.S., C.F., F.W., and M.L. Investigation: F.W., C.F., J.S., L.N., and M.L. Methodology: F.W., C.F., J.S., K.Z., M.L., C.P.v.S., S.S., S.W.T., E.P.W., M.S., and S.M. Project administration: F.W., C.F., J.S., L.N., M.L., M.S., M.A.v.N., S.M., B.H., K.Z., C.P.v.S., S.S., and S.W.T. Software: F.W., C.F., J.S., L.N., and E.P.W. Resources: M.S., S.M., K.Z., C.P.v.S., S.S., and S.W.T. Supervision: K.Z., C.P.v.S., S.S., S.W.T., M.S., M.L., M.A.v.N., and S.M. Validation: F.W., C.F., J.S., E.P.W., K.Z., S.W.T., and S.M. Visualization: J.S., C.F., and F.W. Writing—original draft: F.W., C.F., J.S., S.W.T., S.S., and C.P.v.S. Writing—review and editing: F.W., C.F., J.S., L.N., M.L., M.A.v.N., M.S., S.M., B.H., E.P.W., K.Z., C.P.v.S., S.S., and S.W.T.
Competing interests: The authors declare that they have no competing interests.
Data and materials availability: All data needed to evaluate the conclusions in the paper are present in the paper and/or the Supplementary Materials. Data and code used in the analyses are available in the Zenodo repository: https://doi.org/10.5281/zenodo.15261663.