The molecular epidemiology of multiple zoonotic origins of SARS-CoV-2
Pandemic epicenter
As 2019 turned into 2020, a coronavirus spilled over from wild animals into people, sparking what has become one of the best-documented pandemics to afflict humans. However, the origins of the pandemic in December 2019 remain controversial. Worobey et al. assembled a variety of evidence from the City of Wuhan, China, where the first human infections were reported. These data confirm that most of the earliest human cases centered around the Huanan Seafood Wholesale Market. Within the market, the data statistically localized the earliest human cases to one section where vendors of live wild animals congregated and where virus-positive environmental samples were concentrated. In a related report, Pekar et al. found that genomic diversity before February 2020 comprised two distinct viral lineages, A and B, which were the result of at least two separate cross-species transmission events into humans (see the Perspective by Jiang and Wang). The precise events surrounding virus spillover will always be clouded, but all of the circumstantial evidence so far points to more than one zoonotic event occurring at the Huanan market in Wuhan, China, likely during November–December 2019. —CA
Abstract
Understanding the circumstances that lead to pandemics is important for their prevention. We analyzed the genomic diversity of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) early in the coronavirus disease 2019 (COVID-19) pandemic. We show that SARS-CoV-2 genomic diversity before February 2020 likely comprised only two distinct viral lineages, denoted “A” and “B.” Phylodynamic rooting methods, coupled with epidemic simulations, reveal that these lineages were most probably the result of at least two separate cross-species transmission events into humans. The first zoonotic transmission likely involved lineage B viruses around 18 November 2019 (23 October to 8 December), and the separate introduction of lineage A likely occurred within weeks of this event. These findings indicate that it is unlikely that SARS-CoV-2 circulated widely in humans before November 2019 and define the narrow window between when SARS-CoV-2 first jumped into humans and when the first cases of COVID-19 were reported. As with other coronaviruses, SARS-CoV-2 emergence likely resulted from multiple zoonotic events.
Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is responsible for the coronavirus disease 2019 (COVID-19) pandemic that caused more than 5 million confirmed deaths in the 2 years after its detection at the Huanan Seafood Wholesale Market (hereafter the “Huanan market”) in December 2019 in Wuhan, China (1–3). As the original outbreak spread to other countries, the diversity of SARS-CoV-2 quickly increased and led to the emergence of multiple variants of concern, but the beginning of the pandemic was marked by two major lineages denoted “A” and “B” (4).
Lineage B has been the most common throughout the pandemic and includes all 11 sequenced genomes from humans directly associated with the Huanan market, including the earliest sampled genome, Wuhan/IPBCAMS-WH-01/2019, and the reference genome, Wuhan/Hu-1/2019 (hereafter “Hu-1”) (5), sampled on 24 and 26 December 2019, respectively. The earliest lineage A viruses, Wuhan/IME-WH01/2019 and Wuhan/WH04/2020, were sampled on 30 December 2019 and 5 January 2020, respectively (6). Lineage A differs from lineage B by two nucleotide substitutions, C8782T and T28144C, which are also found in related coronaviruses from Rhinolophus bats (4), the presumed host reservoir (7). Lineage B viruses have a “C/T” pattern at these key sites (C8782 and T28144), whereas lineage A viruses have a “T/C” pattern (C8782T and T28144C). The earliest lineage A genomes from humans lack a direct epidemiological connection to the Huanan market but were sampled from individuals who lived or had recently stayed close to the market (8). It has been hypothesized that lineages A and B emerged separately (9), but “C/C” and “T/T” genomes intermediate to lineages A and B present a challenge to that hypothesis because their existence suggests within-human evolution of one lineage toward the other by way of a transitional form.
Questions about these lineages remain: If lineage B viruses are more distantly related to sarbecoviruses from Rhinolophus bats, then (i) why were lineage B viruses detected earlier than lineage A viruses, and (ii) why did lineage B predominate early in the pandemic?
Answering these questions requires determining the ancestral haplotype, the genomic sequence characteristics of the most recent common ancestor (MRCA) at the root of the SARS-CoV-2 phylogeny. In this study, we combined genomic and epidemiological data from early in the COVID-19 pandemic with phylodynamic models and epidemic simulations. We eliminated many of the haplotypes previously suggested as the MRCA of SARS-CoV-2 and show that the pandemic most likely began with at least two separate zoonotic transmissions starting in November 2019.
Results
Erroneous assignment of haplotypes intermediate to lineages A and B
There are 787 near-full-length genomes available from lineages A and B sampled by 14 February 2020 (data S1 and S2). However, there are also 20 genomes of intermediate haplotypes from this period that contain either T28144C or C8782T but not both mutations: C/C or T/T, respectively.
We identified numerous instances of C/C and T/T genomes sharing rare mutations with lineage A or lineage B viruses, often sequenced in the same laboratory, indicating that these intermediate genomes are likely artifacts of contamination or bioinformatic errors (10), similar to findings from our analysis of the emergence of SARS-CoV-2 in North America (fig. S1 and supplementary text) (11). We confirmed that a C/C genome from South Korea sharing three such mutations had low sequencing depth at position 28144 (≤10×), a T/T genome sampled in Singapore had low coverage at both 8782 and 28144 (≤10×), and three T/T genomes sampled in Wuhan had low sequencing depth and indeterminate nucleotide assignment at position 8782 (table S1). Further, the authors of 11 C/C genomes sampled in Wuhan and Sichuan confirmed that low sequencing depth at position 8782 led to the erroneous assignment of intermediate haplotypes.
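Because the lineage labels hinge on just two positions, a poorly covered site is enough to manufacture an apparent intermediate. A minimal sketch, assuming per-genome base calls and read depths are available as dictionaries keyed by 1-based Hu-1 coordinates (a hypothetical input format), of assigning the A, B, and intermediate labels while treating shallow calls as indeterminate; the depth cutoff is an illustrative assumption, not a published threshold:

```python
from typing import Dict

KEY_SITES = (8782, 28144)  # 1-based Hu-1 coordinates of the A/B marker sites

# Bases at (8782, 28144) for each haplotype label used in the text.
HAPLOTYPES = {
    ("C", "T"): "B",
    ("T", "C"): "A",
    ("C", "C"): "C/C",
    ("T", "T"): "T/T",
}


def classify_haplotype(
    bases: Dict[int, str], depth: Dict[int, int], min_depth: int = 11
) -> str:
    """Return 'A', 'B', 'C/C', 'T/T', 'other', or 'indeterminate'.

    A site is trusted only with at least `min_depth` reads; the default of 11
    rejects the <=10x coverage described for the flagged genomes. This cutoff
    is a hypothetical choice for illustration.
    """
    calls = []
    for site in KEY_SITES:
        base = bases.get(site, "N").upper()
        if depth.get(site, 0) < min_depth or base not in {"A", "C", "G", "T"}:
            return "indeterminate"
        calls.append(base)
    return HAPLOTYPES.get(tuple(calls), "other")


# A genome that looks C/C only because position 8782 is poorly covered.
print(classify_haplotype({8782: "C", 28144: "C"}, {8782: 6, 28144: 250}))  # indeterminate
```

In practice the depth would come from the read alignments behind each consensus genome; the point of the sketch is only that a C/C or T/T call should not be accepted when either marker site is shallow.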
C/C and T/T genomes continue to be observed throughout the pandemic as a result of convergent evolution, including T/T in the Diamond Princess cruise ship outbreak and subsequent COVID-19 waves in New York City and San Diego (figs. S2 to S5 and supplementary text). Instances of convergent evolution are identifiable because SARS-CoV-2 phylogenies exist in “near-perfect” tree space, in which topology can be inferred with high accuracy (12). These findings cast doubt on the claim that transitional C/C or T/T haplotypes between lineages A and B circulated in humans, reopening the door to the hypothesis that lineages A and B represent separate introductions.
Progenitor genome reconstruction
To better understand SARS-CoV-2 mutational patterns, we reconstructed the genome of a hypothetical progenitor of SARS-CoV-2. Using maximum likelihood ancestral state reconstruction across 15 nonrecombinant regions of SARS-CoV-2 and closely related sarbecovirus genomes sampled from bats and pangolins (13), we inferred the genome of this recombinant common ancestor (recCA) (figs. S6 and S7 and supplementary text). The recCA differed from Hu-1 by just 381 substitutions, including C8782T and T28144C. It is more informative than an outgroup sarbecovirus because it accounts for the closest relative across all recombinant segments (figs. S8 to S14 and supplementary text) (14) and, as an internal node on the phylogeny, is more genetically similar to SARS-CoV-2 than any extant sarbecovirus.
Reversions across the early pandemic phylogeny
The ubiquity of SARS-CoV-2 reversions (mutations from Hu-1 toward the recCA) indicates that genetic similarity to related viruses is a poor proxy for the ancestral haplotype. We observe 23 distinct reversions and 631 distinct substitutions (excluding reversions) across the SARS-CoV-2 phylogeny from the COVID-19 pandemic up to 14 February 2020 (Fig. 1). Substitutions were overrepresented at the 381 sites separating the recCA from Hu-1 (23 of 381, 6.04%), compared with substitutions at all other sites (631 of 29,134, 2.17%).
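The enrichment percentages can be recomputed directly. The denominator 29,134 is consistent with the 29,903-nt Hu-1 genome minus the 388 masked sites and the 381 recCA-differing sites. As a rough illustration only (not a test reported in this analysis), the sketch also computes a one-sided hypergeometric tail probability under the simplifying assumption that all unmasked sites are exchangeable, that is, that the 654 substituted sites fall uniformly at random among the 29,515 unmasked sites:

```python
from math import comb

GENOME_LENGTH = 29_903      # Wuhan/Hu-1 reference length (nt)
MASKED_SITES = 388          # sites masked before analysis
RECCA_SITES = 381           # sites where the recCA differs from Hu-1
REVERSIONS = 23             # substitutions at recCA-differing sites
OTHER_SUBSTITUTIONS = 631   # substitutions at all other sites

other_sites = GENOME_LENGTH - MASKED_SITES - RECCA_SITES  # 29,134
rate_recca = REVERSIONS / RECCA_SITES
rate_other = OTHER_SUBSTITUTIONS / other_sites


def hypergeom_upper_tail(k: int, population: int, successes: int, draws: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(population, successes, draws), computed exactly."""
    total = comb(population, draws)
    upper = min(successes, draws)
    hits = sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(k, upper + 1)
    )
    return hits / total  # exact big-integer ratio, rounded once to float


# Null model (an assumption): substituted sites are a uniform random subset of
# unmasked sites; how often would 381 chosen sites contain >= 23 of them?
p_value = hypergeom_upper_tail(
    k=REVERSIONS,
    population=RECCA_SITES + other_sites,
    successes=REVERSIONS + OTHER_SUBSTITUTIONS,
    draws=RECCA_SITES,
)
print(f"{rate_recca:.2%} vs {rate_other:.2%}; enrichment {rate_recca / rate_other:.2f}x")
# 6.04% vs 2.17%; enrichment 2.79x
print(f"one-sided hypergeometric tail: {p_value:.1e}")
```

The exchangeability assumption ignores site-specific mutation rates (including the C-to-T bias), so the tail probability should be read as a heuristic, not as a formal test.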
Fig. 1. Maximum likelihood phylogeny of the early SARS-CoV-2 pandemic, showing nucleotide reversions and putative candidates for the ancestral haplotype at the MRCA.
Putative ancestral haplotypes are identified with colored shapes. Reversions from the Hu-1 reference genome to the recCA are colored. Blue indicates C-to-T reversions, and black indicates all other reversions. The tree is rooted on Hu-1 to show reversion dynamics to the recCA.
Most reversions were C-to-T mutations (19 of 23, 82.6%), matching the mutational bias of SARS-CoV-2 (15–17). Genomes with C-to-T reversions can be found within lineage A, including C18060T (lineage A.1; for example, WA1) and C29095T (for example, 20SF012), as well as C24023T, C25000T, C4276T, and C22747T in mid-to-late January and February 2020. Hence, triple-revertant genomes, such as WA1 and 20SF012, are neither unique nor rare. We also identified a lineage A genome (Malaysia/MKAK-CL-2020-6430/2020), sampled on 4 February 2020 from a Malaysian citizen traveling from Wuhan, whose only four mutations from Hu-1 are all reversions (lineage A.1+T6025C) (Fig. 1). Therefore, a highly revertant haplotype cannot automatically be assumed to represent the MRCA of SARS-CoV-2, especially when these reversions are most often the result of C-to-T mutations. We continue to observe these reversion patterns throughout the pandemic, including in the emergence of World Health Organization (WHO)–named variants (figs. S15 and S16).
Inferring the MRCA of SARS-CoV-2
To infer the ancestral SARS-CoV-2 haplotype, we developed a nonreversible, random-effects substitution process model in a Bayesian phylodynamic framework that simultaneously reconstructs the underlying coalescent processes and the sequence of the MRCA of the SARS-CoV-2 phylogeny. The random-effects substitution model captures the C-to-T transition and G-to-T transversion biases (fig. S17 and supplementary text). Using this model, referred to as the unconstrained rooting (fig. S18A), we inferred the ancestral haplotype of the 787 lineage A and B genomes sampled by 14 February 2020.
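To make the idea of a nonreversible, random-effects substitution process concrete, the sketch below perturbs a baseline HKY-style rate matrix with log-scale random effects on individual directed rates, so that, for example, C→T can exceed T→C. It is a conceptual illustration under stated assumptions, not the BEAST implementation: equal base frequencies, κ, and the random-effect values are all hypothetical, and the usual normalization to a unit mean rate is omitted.

```python
import math
from typing import Dict, Tuple

NUCLEOTIDES = "ACGT"
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def random_effects_q(
    kappa: float, effects: Dict[Tuple[str, str], float]
) -> Dict[str, Dict[str, float]]:
    """Build a 4x4 rate matrix with Q[i][j] = base_ij * exp(effect_ij).

    base_ij is kappa for transitions and 1 for transversions (equal base
    frequencies assumed). Directed effects make Q nonreversible. Each
    diagonal entry is the negative sum of its row's off-diagonal rates.
    """
    q = {i: {j: 0.0 for j in NUCLEOTIDES} for i in NUCLEOTIDES}
    for i in NUCLEOTIDES:
        for j in NUCLEOTIDES:
            if i != j:
                base = kappa if (i, j) in TRANSITIONS else 1.0
                q[i][j] = base * math.exp(effects.get((i, j), 0.0))
        q[i][i] = -sum(q[i][j] for j in NUCLEOTIDES if j != i)
    return q


# Hypothetical random effects encoding C->T transition and G->T transversion biases.
Q = random_effects_q(kappa=2.0, effects={("C", "T"): 1.5, ("G", "T"): 1.0})
print(round(Q["C"]["T"], 3), Q["T"]["C"])  # C->T is now faster than T->C
```

In a Bayesian fit, the random effects would be estimated (typically under a shrinkage prior) rather than fixed, letting the data decide which directed rates depart from the baseline model.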
Our unconstrained rooting strongly favors a lineage B or C/C ancestral haplotype and shows that a lineage A ancestral haplotype is inconsistent with the molecular clock [Bayes factor (BF) = 48.1] (Table 1). Lineage B exhibits more divergence from the root of the tree than would be expected if lineage A were the ancestral virus in humans (figs. S19 and S20). The T/T ancestral haplotype was also disfavored (BF > 10), likely because of the C-to-T transition bias (fig. S17). We acknowledge that the timing of the earliest sampled lineage B genomes associated with the Huanan market could bias rooting inference toward lineage B haplotypes; however, lineage A was still disfavored after excluding all market-associated genomes (BF = 11.0).
| Haplotype | Mutations from Hu-1 reference | Representative genome | Unconstrained rooting (%) | No market (%) | recCA rooting (%) |
|---|---|---|---|---|---|
| B (C/T) | N/A | Hu-1 | 80.85† | 62.96† | 8.18* |
| A (T/C) | C8782T+T28144C | WH04 | 1.68** | 5.73** | 77.28† |
| C/C | T28144C | N/A | 10.32* | 23.02 | 10.49* |
| T/T | C8782T | N/A | 0.92** | 1.68** | 3.71** |
| A+C29095T (T/C) | C8782T+T28144C+C29095T | 20SF012 | <0.01*** | <0.01*** | 0.20** |
| A.1 (T/C) | C8782T+T28144C+C18060T | WA1 | <0.01*** | <0.01*** | 0.04*** |
Table 1. Posterior probabilities of inferred ancestral haplotype at the MRCA of SARS-CoV-2.
Positions 8782 and 28144 are indicated in parentheses. Representative genome: a sampled genome whose sequence matches the haplotype. The last three columns give posterior probabilities from the phylodynamic analyses. “No market” excludes 15 market-associated genomes (13 lineage B genomes associated with the Huanan market plus one lineage A and one lineage B genome not associated with the Huanan market). *BF > 3.2; **BF > 10; ***BF > 100. BFs quantify support for rejecting each haplotype relative to the reference haplotype (†).
†Haplotype with greatest posterior probability; reference for BF.
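If prior probabilities are assumed equal across candidate haplotypes, the prior odds cancel, and the Bayes factor against a haplotype reduces to the ratio of the reference (†) haplotype's posterior probability to that haplotype's posterior probability. A minimal sketch under that assumption, which recovers the BFs quoted in the text (48.1, 11.0, and 9.4) from the Table 1 values and applies the table's thresholds:

```python
# Posterior probabilities (%) from Table 1 for the three analyses.
POSTERIORS = {
    "unconstrained": {"B": 80.85, "A": 1.68, "C/C": 10.32, "T/T": 0.92},
    "no_market": {"B": 62.96, "A": 5.73, "C/C": 23.02, "T/T": 1.68},
    "recCA": {"B": 8.18, "A": 77.28, "C/C": 10.49, "T/T": 3.71},
}


def bayes_factor_against(posteriors: dict, haplotype: str) -> float:
    """BF for rejecting `haplotype` relative to the most probable haplotype.

    Assumes equal prior probabilities, so the BF equals the posterior ratio.
    """
    best = max(posteriors.values())
    return best / posteriors[haplotype]


def significance_mark(bf: float) -> str:
    """Table 1 thresholds: * BF > 3.2, ** BF > 10, *** BF > 100."""
    if bf > 100:
        return "***"
    if bf > 10:
        return "**"
    if bf > 3.2:
        return "*"
    return ""


for analysis, target in [("unconstrained", "A"), ("no_market", "A"), ("recCA", "B")]:
    bf = bayes_factor_against(POSTERIORS[analysis], target)
    print(analysis, target, round(bf, 1), significance_mark(bf))
```

The table entries are rounded, so BFs recomputed this way can differ from those computed from the full posterior sample, particularly for very small probabilities.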
Even though sequence similarity to closely related sarbecoviruses alone is insufficient to determine the SARS-CoV-2 ancestral haplotype, this similarity can inform phylodynamic inference. Rather than rely on outgroup rooting (fig. S18B) (18), we developed a rooting method that assigns the recCA as the progenitor of the inferred SARS-CoV-2 MRCA (fig. S18C). As opposed to the unconstrained rooting, the recCA root favored a lineage A haplotype over lineage B (BF = 9.4), although support for C/C was unchanged (Table 1). Our results were insensitive to the method of breakpoint identification in the recCA (supplementary text).
The A.1 and A+C29095T proposed ancestral haplotypes were strongly rejected by all the phylodynamic analyses, even when rooting with recCA or bat sarbecovirus outgroups, which include both C18060T and C29095T (Table 1 and data S3). Hence, WA1-like and 20SF012-like haplotypes cannot plausibly represent the MRCA of SARS-CoV-2 as previously suggested (19–21); the similarity of these genomes to the recCA is due to C-to-T reversions. Haplotypes not reported in Table 1 were similarly rejected (data S3).
We inferred the time of MRCA (tMRCA) for SARS-CoV-2 to be 11 December 2019 [95% highest posterior density (HPD) interval, 25 November to 12 December] by using unconstrained rooting. It has been suggested that a phylogenetic root in lineage A would produce an older tMRCA than would a lineage B rooting (21). To test this, we developed an approach to assign a haplotype as the SARS-CoV-2 MRCA (A, B, C/C, A.1, or A+C29095T) and inferred the tMRCA (fig. S18D). The tMRCAs inferred with the recCA rooting and with each fixed ancestral haplotype were consistent with this estimate (table S2 and supplementary text).
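Dates in this section are summarized as medians with 95% HPD intervals. For a unimodal posterior sample, one common way to compute an HPD interval is to take the shortest window that covers the required fraction of the sorted draws. A minimal sketch, with posterior draws expressed as hypothetical day offsets (here, days after 1 November 2019); note that a posterior skewed toward earlier dates can have an HPD whose upper bound sits close to the median:

```python
import math
from statistics import median
from typing import Sequence, Tuple


def hpd_interval(samples: Sequence[float], mass: float = 0.95) -> Tuple[float, float]:
    """Shortest interval containing `mass` of the samples (assumes a unimodal posterior)."""
    if not 0 < mass <= 1:
        raise ValueError("mass must be in (0, 1]")
    ordered = sorted(samples)
    if not ordered:
        raise ValueError("need at least one sample")
    n = len(ordered)
    k = max(1, math.ceil(mass * n))  # number of draws the interval must cover
    widths = [ordered[i + k - 1] - ordered[i] for i in range(n - k + 1)]
    start = widths.index(min(widths))  # first of the narrowest windows
    return ordered[start], ordered[start + k - 1]


# Hypothetical tMRCA draws (days after 1 Nov 2019), skewed toward earlier dates.
draws = [24, 30, 33, 35, 36, 38, 39, 39, 40, 40, 40, 41, 41, 41, 41, 41, 41, 41, 42, 42]
print(median(draws), hpd_interval(draws))
```

For multimodal posteriors, a single shortest window can misrepresent the distribution, and a density-based HPD region is more appropriate.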
We infer only three plausible ancestral haplotypes: lineage A, lineage B, and C/C. However, without information from related sarbecoviruses (such as the recCA), the molecular clock at the outset of the COVID-19 pandemic cannot be reconciled with a lineage A ancestor, which leads us to question the assumption that lineages A and B both resulted from a single introduction.
Separate introductions of lineages A and B
We next sought to determine whether a single introduction from one of the plausible ancestral haplotypes (lineage A, lineage B, or C/C) is consistent with the SARS-CoV-2 phylogeny. We simulated SARS-CoV-2–like epidemics (22, 23) with a doubling time of 3.47 days [95% highest density interval (HDI) across simulations, 1.35 to 5.44] (24–26) to account for the rapid spread of SARS-CoV-2 before it was identified as the etiological agent of COVID-19 (figs. S21 and S22, tables S3 and S4, and supplementary text). We then simulated coalescent processes and viral genome evolution across these epidemics to determine how frequently we recapitulated the observed SARS-CoV-2 phylogeny.
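For orientation, the doubling time used in the simulations can be expressed as an exponential growth rate under the simplifying assumption of pure exponential growth ($r$ and $T_d$ are notation introduced for this example). Because $r$ decreases monotonically with $T_d$, the reported median maps directly and the interval bounds swap places; the resulting range has the same coverage but is not necessarily the HDI of $r$.

```latex
r = \frac{\ln 2}{T_d}, \qquad
r_{\text{median}} = \frac{0.6931}{3.47\ \text{d}} \approx 0.200\ \text{d}^{-1}, \qquad
r \in \left[\frac{0.6931}{5.44\ \text{d}},\ \frac{0.6931}{1.35\ \text{d}}\right] \approx [0.127,\ 0.513]\ \text{d}^{-1}
```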
Lineages A and B comprise 35.2 and 64.8% of the early SARS-CoV-2 genomes, respectively, and each lineage is characterized by a large polytomy (many sampled lineages descending from a single node on the phylogenetic tree), with the base of lineages A and B being the two largest polytomies observed in the early pandemic (Fig. 1). Furthermore, large polytomies are characteristic of SARS-CoV-2 introductions into geographical regions at the start of the pandemic (for example, fig. S23) (11, 27–29) and would similarly be expected to occur after a successful introduction of SARS-CoV-2 into humans. Congruently, the most common topology in our simulations is a large basal polytomy (with ≥100 descendant lineages), which is present in 47.5% of simulated epidemics (Fig. 2A).
Fig. 2. Probability of phylogenetic structures arising from a single introduction of SARS-CoV-2 in epidemic simulations.
(A) A large polytomy of at least 100 descendant lineages, which is consistent with the base of both lineages A and B. (B) Topology matching a C/C ancestral haplotype: two clades, each one mutation from the ancestor, both with polytomies of at least 100 descendant lineages. (C) Topology matching either a lineage A or lineage B ancestral haplotype: a basal polytomy with at least 100 descendant lineages, including a large clade separated by two mutations, also possessing a polytomy of at least 100 descendant lineages. Basal taxa have short branch lengths for clarity. The probability of each phylogenetic structure after a single introduction is reported in the respective boxes.
By contrast, a topology corresponding to a single introduction of an ancestral C/C haplotype (two clades, each comprising ≥30% of the taxa, possessing a large polytomy at the base, and separated from the MRCA by one mutation; Fig. 2B) was observed in 0.0% of our simulations. Further, a topology corresponding to a single introduction of an ancestral lineage A or lineage B haplotype (a large basal polytomy and a large clade comprising between 30 and 70% of taxa, two mutations from the root with no intermediate genomes) was observed in only 3.1% of our simulations (Fig. 2C and supplementary text).
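Polytomy sizes of the kind used to classify simulated trees can be read off a tree once internal branches carrying no mutations are collapsed. A simplified sketch, assuming a hypothetical tree encoding in which each node maps to its children and the number of mutations on each branch; it checks only the polytomy-size and two-mutation conditions of the lineage A/B single-introduction topology and omits the 30 to 70% clade-size criterion:

```python
from typing import Dict, List, Tuple

# Hypothetical encoding: node -> list of (child, mutations on the branch to child).
Tree = Dict[str, List[Tuple[str, int]]]


def collapsed_children(tree: Tree, node: str) -> List[Tuple[str, int]]:
    """Children of `node` after collapsing internal branches with zero mutations."""
    children = []
    for child, mutations in tree.get(node, []):
        if mutations == 0 and tree.get(child):
            # A zero-mutation internal node is indistinguishable from its parent.
            children.extend(collapsed_children(tree, child))
        else:
            children.append((child, mutations))
    return children


def polytomy_size(tree: Tree, node: str) -> int:
    """Number of lineages descending directly from `node` after collapsing."""
    return len(collapsed_children(tree, node))


def has_two_step_topology(tree: Tree, root: str, min_size: int) -> bool:
    """Large basal polytomy plus a large-polytomy clade two mutations from the root."""
    if polytomy_size(tree, root) < min_size:
        return False
    return any(
        mutations == 2 and polytomy_size(tree, child) >= min_size
        for child, mutations in collapsed_children(tree, root)
    )


# Toy tree: the root polytomy gathers t1..t3, the tips of zero-mutation node n1,
# and clade c, which sits two mutations away and has its own polytomy.
toy: Tree = {
    "root": [("t1", 0), ("t2", 1), ("t3", 0), ("n1", 0), ("c", 2)],
    "n1": [("t4", 1), ("t5", 0)],
    "c": [("t6", 0), ("t7", 1), ("t8", 0), ("t9", 1), ("t10", 0)],
}
print(polytomy_size(toy, "root"), has_two_step_topology(toy, "root", min_size=5))  # 6 True
```

With simulated trees of realistic size, the same check would use the ≥100-lineage threshold described for Fig. 2 and add the clade-size condition.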
Our epidemic simulations do not support a single introduction of SARS-CoV-2 giving rise to the observed phylogeny. We therefore quantified the relative support for two introductions resulting in the empirical topology. By synthesizing posterior probabilities of inferred ancestral haplotypes, frequencies of topologies in epidemic simulations, and the expected relationships between these haplotypes and topologies, we inferred substantial support favoring separate introductions of lineages A and B (BF = 4.3 and BF = 4.2 by using the recCA and unconstrained rooting, respectively) [supplementary materials (SM), materials and methods]. This support is robust across shorter and longer doubling times, varying ascertainment rates, and minimum polytomy size (tables S4 and S5).
If lineages A and B arose from separate introductions, then the MRCA of SARS-CoV-2 was not in humans, and it is the tMRCAs of lineages A and B that are germane to the origins of SARS-CoV-2 (not the timing of their shared ancestor). Rooting with the recCA, we inferred the median tMRCA of lineage B to be 15 December (95% HPD, 5 December to 23 December) and the median tMRCA of lineage A to be 20 December (95% HPD, 5 December to 29 December) (Fig. 3A). The tMRCA of lineage B consistently predates the tMRCA of lineage A (Fig. 3B). These results are robust to using unconstrained rooting, fixing the ancestral haplotype, and excluding market-associated genomes (Fig. 3, A and B; table S2; and supplementary text).
Fig. 3. Comparison of the tMRCA and primary case dates for lineage A and lineage B in late 2019 across rooting strategies.
Each row represents a different rooting constraint in phylodynamic analysis, with lineage B, C/C, and lineage A representing a fixed ancestral haplotype. (A) The tMRCA for lineages A and B. (B) The number of weeks the tMRCA of lineage A occurs after the tMRCA of lineage B. (C) The timing of the primary case for lineages A and B. (D) The number of weeks the time of the primary case of lineage A occurs after the time of the primary case of lineage B. Long dashed lines indicate the median, and shading indicates the 95% HPD for each distribution. Short dashed lines indicate 0 weeks difference between lineages A and B. Posterior probability that lineage A originated after lineage B is reported in the gray box in each graph in (B) and (D).
Timing the introductions of lineages A and B
The primary case, the first human infected with a virus in an outbreak, could precede the tMRCA if basal lineages went extinct during cryptic transmission (23, 30, 31). The index case, the first identified case, is rarely also the primary case (32, 33). We next used an extension of our previously published framework that combines epidemic simulations and phylodynamic tMRCA inference (SM materials and methods) (23, 30, 31) to infer the timing of the lineage B and lineage A primary cases, accounting for both the index case symptom onset date and earliest documented COVID-19 hospitalization date.
The earliest unambiguous case of COVID-19, with symptom onset on 10 December and hospitalization on 16 December, was a seafood vendor at the Huanan market. Unfortunately, no published genome is available for this case (8). Nonetheless, we can reasonably assume that this individual had a lineage B virus (supplementary text) because an environmental sample (EPI_ISL_408512) from the stall this vendor operated was lineage B. The earliest lineage A genome (IME-WH01) is from a familial cluster for which the earliest symptom onset is 15 December and earliest hospitalization is 25 December (34). Accounting for these dates and using the recCA rooting, we inferred the infection date of the lineage B primary case to be 18 November (95% HPD, 23 October to 8 December) and the infection date of the primary case of lineage A to be 25 November (95% HPD, 29 October to 14 December). The lineage B primary case predated that of lineage A in 64.6% of the posterior sample, by a median of 7 days (Fig. 3D and table S6).
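Statements such as "the lineage B primary case predated that of lineage A in 64.6% of the posterior sample" summarize paired posterior draws: each draw gives a date for both lineages, and the comparison is made within draws. A minimal sketch of that summary, computing the share of draws in which lineage A comes later and the median A minus B difference, using made-up paired draws (days after 1 October 2019) in place of the real posterior:

```python
from statistics import median
from typing import Sequence, Tuple


def compare_paired_dates(
    b_days: Sequence[float], a_days: Sequence[float]
) -> Tuple[float, float]:
    """Return (share of draws with A strictly after B, median of A - B).

    Draws must be paired, that is, entry i of both sequences comes from the
    same posterior sample. Ties count as "not after".
    """
    if len(b_days) != len(a_days) or not b_days:
        raise ValueError("need equal-length, non-empty paired samples")
    diffs = [a - b for a, b in zip(a_days, b_days)]
    prob_a_later = sum(d > 0 for d in diffs) / len(diffs)
    return prob_a_later, median(diffs)


# Hypothetical paired draws (days after 1 Oct 2019); not the study's posterior.
lineage_b = [48, 40, 55, 47, 60, 35, 50]
lineage_a = [55, 52, 50, 60, 58, 49, 57]
prob, med = compare_paired_dates(lineage_b, lineage_a)
print(f"A after B in {prob:.1%} of draws; median difference {med} days")
```

Pairing matters: comparing the two marginal distributions independently would discard the correlation between the lineage A and lineage B dates within each draw.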
Our lineage A and B primary case inference is robust to the rooting strategy; to fixing the plausible ancestral haplotype to lineage A, lineage B, or C/C; to different index case dates; to accounting for hospitalization dates only; and to varying growth and ascertainment rates (tables S7 to S10 and supplementary text). Therefore, our results indicate that lineage B was introduced into humans no earlier than late October and likely in mid-November 2019, and that lineage A was introduced within days to weeks of this event.
We then inferred the number of ascertained infections and hospitalizations arising from these separate introductions. We found that an earlier introduction of lineage B led to a faster rise in lineage B–associated infections, dominating the simulated epidemics (Fig. 4) and recapitulating the predominance of lineage B observed in China in early 2020 (35). Similarly, simulated lineage B hospitalizations are more common than those from lineage A through January 2020 (fig. S24). We observed these patterns regardless of rooting strategy (unconstrained or recCA), ancestral haplotype (B, A, or C/C) (Fig. 4 and tables S11 and S12), and doubling time (figs. S25 to S28).
Fig. 4. Dynamics of simulated SARS-CoV-2 epidemics resulting from separate introductions of lineages A and B in late 2019.
Each row represents a different rooting constraint in phylodynamic analysis, with lineage B, C/C, and lineage A representing a fixed ancestral haplotype. (A) Estimated number of infections. The header of each column indicates whether the number of infections is caused by lineage A, lineage B, or the two lineages combined. Darker and lighter shading indicates the 50 and 95% HPD, respectively. (B) The log ratio of lineage B to lineage A infections on 15 December 2019. The posterior probability of having more lineage B infections than lineage A infections is reported in the gray box in each graph.
Minimal cryptic circulation of SARS-CoV-2
We do not see evidence for substantial cryptic circulation before December 2019 (Fig. 4), even if we assume a single introduction (fig. S29 and supplementary text). Our simulated epidemics have a median of three (95% HPD, 1 to 18) cumulative infections at the tMRCA, with 99% of simulated epidemics resulting in at most 33 infections (table S13 and supplementary text). Further, it is unlikely that there were any COVID-19–related hospitalizations before December (36) because the simulated epidemics show a median of zero (95% HPD, 0 to 2) hospitalizations by 1 December 2019. These results are consistent with the absence of any SARS-CoV-2–positive samples among tens of thousands of serology samples from healthy blood donors from September to December 2019 (37) and thousands of specimens obtained from influenza-like illness patients at Wuhan hospitals from October to December 2019 (34). Therefore, the prevalence of SARS-CoV-2 in Wuhan before December 2019 was likely extremely low. Even when we simulated epidemics with a longer doubling time, resulting in an earlier timing of the primary cases (tables S8 and S10), there were still few infections before December 2019 (table S13).
Additional introductions
The extinction rate of our simulated epidemics (simulations that did not produce self-sustaining transmission chains) indicates that there were likely multiple failed introductions of SARS-CoV-2. Similar to our previous findings (23), 77.8% of simulated epidemics went extinct. These failed introductions produced a mean of 2.06 infections and 0.10 hospitalizations; hence, failed introductions could easily go unnoticed. If we treat each SARS-CoV-2 introduction, failed or successful, as a Bernoulli trial and simulate introductions until we see two successful introductions, we estimate that eight (95% HPD, 2 to 23) introductions led to the establishment of both lineages A and B in humans.
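Treating each introduction as an independent Bernoulli trial that succeeds with probability p = 1 − 0.778 = 0.222, the number of introductions T needed to obtain two successes follows a negative binomial distribution, and its quantiles can be computed exactly instead of by simulation. A minimal sketch under that assumption; it reports equal-tailed quantiles, which need not coincide with an HPD interval in general:

```python
def prob_more_than(t: int, p: float) -> float:
    """P(T > t): fewer than two successes among the first t trials."""
    q = 1.0 - p
    return q**t + t * p * q ** (t - 1)


def trials_quantile(level: float, p: float) -> int:
    """Smallest t with P(T <= t) >= level, where T is the trial of the 2nd success."""
    t = 2  # at least two trials are needed for two successes
    while 1.0 - prob_more_than(t, p) < level:
        t += 1
    return t


P_SUCCESS = 1.0 - 0.778  # share of simulated introductions that did not go extinct
print(
    trials_quantile(0.025, P_SUCCESS),
    trials_quantile(0.5, P_SUCCESS),
    trials_quantile(0.975, P_SUCCESS),
)  # 2 8 23
```

Under these assumptions the exact median (8) and the 2.5 to 97.5% range (2 to 23) match the simulated estimate, and the expected number of introductions is 2/p ≈ 9.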
Limitations
Our analysis of the putative intermediate haplotypes suggests that lineage assignment errors between lineages A and B remain, particularly for genomes sampled in January and February of 2020, which could influence the precision of the phylogenetic topology and tMRCA inference. We lack direct evidence of a virus closely related to SARS-CoV-2 in nonhuman mammals at the Huanan market or its supply chain. The genome sequence of a virus directly ancestral to SARS-CoV-2 would provide more precision regarding the timing of the introductions of SARS-CoV-2 into humans and the epidemiological dynamics before its discovery. Although we simulated epidemics across a range of plausible epidemiological dynamics, our models describe a period before COVID-19 cases were ascertained and SARS-CoV-2 genomes were sequenced, so they cannot be empirically validated against data from that period.
Discussion
The genomic diversity of SARS-CoV-2 during the early pandemic presents a paradox. Lineage A viruses are two mutations closer to bat coronaviruses than lineage B viruses are, suggesting that the ancestor of SARS-CoV-2 belonged to this lineage. However, lineage B viruses predominated early in the pandemic, particularly at the Huanan market, suggesting that this lineage began spreading in humans earlier. Further complicating this matter is the molecular clock of SARS-CoV-2 in humans, which rejects a single-introduction origin of the pandemic from a lineage A virus. We resolved this paradox by showing that early SARS-CoV-2 genomic diversity and epidemiology are best explained by at least two separate zoonotic transmissions, in which lineage A and B progenitor viruses were both circulating in nonhuman mammals before their introduction into humans (figs. S30 and S31).
The most probable explanation for the introduction of SARS-CoV-2 into humans involves zoonotic jumps from as-yet-undetermined, intermediate host animals at the Huanan market (34, 38, 39). Through late 2019, the Huanan market sold animals that are known to be susceptible to SARS-CoV-2 infection and capable of intraspecies transmission (40–42). The presence of potential animal reservoirs, coupled with the timing of the lineage B primary case and the geographic clustering of early cases around the Huanan market (39), support the hypothesis that SARS-CoV-2 lineage B jumped into humans at the Huanan market in mid-November 2019.
In a related study (39), we show that the two earliest lineage A cases were located closer to the Huanan market than expected from the locations of other COVID-19 cases in Wuhan in early 2020, despite having no known association with the market. This geographic proximity is consistent with a separate and subsequent origin of lineage A at the Huanan market in late November 2019. The presence of lineage A virus at the Huanan market was confirmed by Gao et al. (43) from a sample taken from discarded gloves.
The high extinction rate of SARS-CoV-2 transmission chains, observed in both our simulations and real-world data (44), indicates that the two zoonotic events that established lineages A and B may have been accompanied by additional, cryptic introductions. However, such introductions could easily be missed, particularly if their subsequent transmission chains quickly went extinct or the introduced viruses had a lineage A or B haplotype. Failed introductions of intermediate haplotypes are also possible. Critically, we have no evidence of subsequent zoonotic introductions in late December leading up to the closure of the Huanan market on 1 January 2020. By then, the susceptible host animals that had been documented at the market during the previous months were no longer found in the Huanan market (34).
Other coronavirus epidemics and outbreaks in humans—including SARS-CoV-1, Middle East respiratory syndrome coronavirus (MERS-CoV), and most recently, porcine deltacoronavirus in Haiti—have been the result of repeated introductions from animal hosts (45–47). These repeated introductions were easily identifiable because human viruses in these outbreaks were more closely related to viruses sampled in the animal reservoirs than to other human viruses. However, the genomic diversity within the putative SARS-CoV-2 animal reservoir at the Huanan market was likely shallower than that seen in SARS-CoV-1 and MERS-CoV reservoirs (45, 46, 48). Hence, even though lineages A and B had nearly identical haplotypes, their MRCA likely existed in an animal reservoir. The ability to disentangle repeated introductions of SARS-CoV-2 from a shallow genetic reservoir has previously been shown in the early SARS-CoV-2 epidemic in Washington state, where two viruses, separated by two mutations, were independently introduced from, and shared an MRCA in, China (figs. S23 and S30 and supplementary text) (11).
Successful transmission of both lineage A and B viruses after independent zoonotic events indicates that evolutionary adaptation within humans was not needed for SARS-CoV-2 to spread (49). We now know that SARS-CoV-2 can readily spread after reverse-zoonosis to Syrian hamsters (Mesocricetus auratus), American mink (Neovison vison), and white-tailed deer (Odocoileus virginianus), indicating its host generalist capacity (50–55). Furthermore, once an animal virus acquires the capacity for human infection and transmission, the only remaining barrier to spillover is contact between humans and the pathogen. Thereafter, a single zoonotic transmission event indicates that the conditions necessary for spillovers have been met, which portends additional jumps. For example, there were at least two zoonotic jumps of SARS-CoV-2 into humans from pet hamsters in Hong Kong (55) and dozens from minks to humans on Dutch fur farms (52, 53).
We show that it is highly unlikely that SARS-CoV-2 circulated widely in humans earlier than November 2019 and that there was limited cryptic spread, with at most dozens of SARS-CoV-2 infections in the weeks leading up to the inferred tMRCA, but likely far fewer. By late December, when SARS-CoV-2 was identified as the etiological agent of COVID-19 (8), the virus had likely been introduced into humans multiple times as a result of persistent contact with a viral reservoir.
Materials and methods summary
Materials and methods described in full detail can be found in the supplementary materials.
Sequence data
We queried the GISAID database (56), GenBank, and National Genomics Data Center of the China National Center for Bioinformatics (CNCB) for complete high-coverage SARS-CoV-2 genomes collected by 14 February 2020, resulting in a dataset of 787 taxa belonging to lineages A and B and 20 taxa with C/C or T/T haplotypes. Genomes were aligned by using MAFFT v7.453 (57) to the SARS-CoV-2 reference genome (Wuhan/Hu-1/2019), and 388 sites were masked at the 5′ and 3′ ends and at sites based on De Maio et al. (58). All genome accessions are available in data S1 and S2.
Progenitor genome reconstruction and reversion analysis
We reconstructed the progenitor of SARS-CoV-2, the recCA. We (i) inferred a maximum likelihood tree of 31 sarbecovirus genomes (SARS-CoV-2 and 30 closely related sarbecoviruses sampled from bats and pangolins) for each of 15 predefined nonrecombinant regions (13) with IQ-TREE v2.0.7 (59), (ii) inferred the sequence of the ancestor of SARS-CoV-2 in each tree with TreeTime v0.8.1 (60), and (iii) concatenated the resulting sequences. We next inferred a maximum likelihood tree of the 787 SARS-CoV-2 taxa with IQ-TREE and performed ancestral state reconstruction with TreeTime to identify substitutions across the SARS-CoV-2 phylogeny that were reversions from the Wuhan-Hu-1 state to the recCA state.
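The per-site logic of what counts as a reversion can be illustrated with a short sketch. Note the simplification: the analysis described above places substitutions on branches of the phylogeny via ancestral state reconstruction, whereas this hypothetical helper (`classify_differences`) only compares one aligned genome directly against Wuhan-Hu-1 and the recCA. A difference from Wuhan-Hu-1 that matches the recCA is treated as a reversion to the inferred ancestral state; any other difference is treated as a non-reverting change.

```python
def classify_differences(ref: str, rec_ca: str, query: str):
    """Compare an aligned query genome with Wuhan-Hu-1 (`ref`) and the recCA.

    Returns (reversions, other_changes) as lists of 1-based positions.
    Hypothetical helper for illustration only.
    """
    if not (len(ref) == len(rec_ca) == len(query)):
        raise ValueError("sequences must be aligned to the same length")
    reversions, other_changes = [], []
    for pos, (r, a, q) in enumerate(zip(ref, rec_ca, query), start=1):
        if q == "N" or q == r:
            # Masked/ambiguous site, or identical to the reference: not a change.
            continue
        if q == a:
            # Query differs from Wuhan-Hu-1 but carries the recCA (ancestral) base.
            reversions.append(pos)
        else:
            other_changes.append(pos)
    return reversions, other_changes
```

With toy sequences `ref="ACGTACGT"`, `rec_ca="ACTTACGC"`, and `query="ACTTACAC"`, positions 3 and 8 are reversions (the query matches the recCA there) and position 7 is a non-reverting change.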
Phylodynamic inference and epidemic simulations
We performed phylodynamic inference using BEAST v1.10.5 (61) with the 787-taxon dataset to infer the ancestral haplotype and the tMRCA of SARS-CoV-2 (and the tMRCAs of lineages A and B), using a nonreversible random-effects substitution model and exploring unconstrained rooting, recCA-rooting, fixing the ancestral haplotype as a root, and outgroup rooting. SARS-CoV-2–like epidemics were simulated with FAVITES-COVID-Lite v0.0.1 (22, 62) using a scale-free network of 5 million individuals and a customized extension of the SAPHIRE model (63), producing coalescent trees on which we simulated mutations. We calculated the Bayes factor (BF) comparing the support for two introductions of SARS-CoV-2 with that for one introduction by considering the posterior probabilities of the four most likely ancestral haplotypes from the phylodynamic inference (lineage A, lineage B, C/C, and T/T), the frequencies of the phylogenetic structures associated with introductions of these haplotypes in the epidemic simulations, and equal prior probabilities for each ancestral haplotype and the number of introductions.
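The general shape of the BF arithmetic can be sketched as follows. This is a simplified illustration under stated assumptions, not the exact formula used in the paper (which is given in the supplementary materials): it assumes that, for each hypothesis (one or two introductions), the probability of the observed phylogenetic structure is averaged over the candidate ancestral haplotypes, weighted by their phylodynamic probabilities. With equal prior probability on one versus two introductions, the BF is then the ratio of these two averaged likelihoods. The function names and all numbers in the usage example are hypothetical placeholders, not values from the analysis.

```python
def averaged_likelihood(hap_weight: dict, structure_freq: dict) -> float:
    """Probability of the observed structure, averaged over ancestral haplotypes.

    hap_weight: haplotype -> weight (e.g. phylodynamic probability of that root).
    structure_freq: haplotype -> frequency, in simulations started from that
        haplotype, of the phylogenetic structure seen in the real data.
    Weights are normalized so they sum to 1.
    """
    total = sum(hap_weight.values())
    return sum(w / total * structure_freq[h] for h, w in hap_weight.items())


def bayes_factor(hap_weight: dict, freq_two: dict, freq_one: dict) -> float:
    """BF for two introductions versus one, assuming equal priors on 1 vs 2."""
    return averaged_likelihood(hap_weight, freq_two) / averaged_likelihood(hap_weight, freq_one)


# Hypothetical placeholder inputs for the four candidate ancestral haplotypes.
weights = {"A": 0.5, "B": 0.5, "C/C": 0.0, "T/T": 0.0}
freq_two_intro = {"A": 0.2, "B": 0.4, "C/C": 0.1, "T/T": 0.1}
freq_one_intro = {"A": 0.01, "B": 0.02, "C/C": 0.01, "T/T": 0.01}
bf = bayes_factor(weights, freq_two_intro, freq_one_intro)
```

With these placeholder inputs the BF is about 20; normalizing the weights does not change the BF because the normalizing constant cancels in the ratio.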
We connected the phylodynamic inference and epidemic simulations by means of a rejection sampling–based approach (23), accounting for the tMRCAs of lineages A and B and the earliest documented COVID-19 illness onset and hospitalization dates. We then inferred the timing of the introductions of lineages A and B and the infections and hospitalizations for each lineage. The proportion of epidemic simulations that went extinct (no onward transmission by the end of the simulation) was used to approximate the number of SARS-CoV-2 introductions needed to result in two introductions with sustained onward transmission.
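The extinction-based approximation can be made concrete with a short calculation. Assuming each introduction independently establishes sustained onward transmission with probability 1 − p, where p is the fraction of simulated epidemics that go extinct, the number of introductions needed to obtain k established ones follows a negative binomial distribution with mean k / (1 − p). The sketch below computes this expectation and checks it by Monte Carlo; the helper names and the extinction probability of 0.75 in the example are hypothetical and are not the paper's estimate.

```python
import random


def expected_introductions(p_extinct: float, successes: int = 2) -> float:
    """Mean number of introductions needed for `successes` of them to persist.

    Assumes independent introductions, each persisting with probability
    1 - p_extinct (negative binomial mean k / (1 - p)).
    """
    if not 0.0 <= p_extinct < 1.0:
        raise ValueError("p_extinct must be in [0, 1)")
    return successes / (1.0 - p_extinct)


def simulate_introductions(p_extinct: float, successes: int = 2,
                           trials: int = 100_000, seed: int = 1) -> float:
    """Monte Carlo estimate of the same quantity, for checking the formula."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        introductions = persisted = 0
        while persisted < successes:
            introductions += 1
            if rng.random() >= p_extinct:  # this introduction persists
                persisted += 1
        total += introductions
    return total / trials
```

For example, `expected_introductions(0.75)` returns 8.0: if three of every four introductions die out, about eight are expected before two take hold.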
Acknowledgments
We gratefully acknowledge the authors from the originating laboratories and the submitting laboratories, who generated and shared through GISAID the viral genomic sequences and metadata on which this research is based (data S1) (56). We are greatly appreciative toward L. Chen, D. Liu, and Y. Yan for providing insight into the putative intermediate genomes and clarification regarding the relative sequencing depth at positions 8782 and 28144, M. Eloit and S. Temmam for sharing their sarbecovirus dataset and recombination analysis results, and M. Kuehnert for general feedback. Figure S30 was created with Biorender.com.
Funding: This project has been funded in whole or in part with federal funds from the National Institute of Allergy and Infectious Diseases, National Institutes of Health (NIH), Department of Health and Human Services, under contract 75N93021C00015 (M.W.). J.E.P. acknowledges support from NIH (T15LM011271). N.M. acknowledges support from the National Science Foundation (NSF) (NSF-2028040). J.I.L. acknowledges support from NIH (5T32AI007244-38). J.O.W. acknowledges support from NIH (R01AI135992 and R01AI136056). R.F.G. is supported by NIH (R01AI132223, R01AI132244, U19AI142790, U54CA260581, U54HG007480, and OT2HL158260), the Coalition for Epidemic Preparedness Innovation, the Wellcome Trust Foundation, Gilead Sciences, and the European and Developing Countries Clinical Trials Partnership Programme. M.A.S. and A.R. acknowledge the support from the Wellcome Trust (Collaborators Award 206298/Z/17/Z–ARTIC network), the European Research Council (grant agreement 725422–ReservoirDOCS), and NIH (R01AI153044). K.G.A. is supported by NIH (U19AI135995, U01AI151812, and UL1TR002550). E.C.H. is funded by an Australian Research Council Laureate Fellowship (FL170100022). J.L., H.P., and M.-S.P. acknowledge support from the National Research Foundation of Korea, funded by the Ministry of Science and Information and Communication Technologies, Republic of Korea (NRF-2017M3A9E4061995 and NRF-2019R1A2C2084206). T.I.V. acknowledges support from the Branco Weiss Fellowship. We thank AMD for the donation of critical hardware and support resources from its HPC Fund that made this work possible. This work was supported (in part) by the Epidemiology and Laboratory Capacity (ELC) for Infectious Diseases Cooperative Agreement [grant ELC DETECT (6NU50CK000517-01-07)] funded by the Centers for Disease Control and Prevention (CDC). Its contents are solely the responsibility of the authors and do not necessarily represent the official views of CDC or the Department of Health and Human Services.
Author contributions: Conceptualization: J.E.P., M.A.S., K.G.A., M.W., and J.O.W. Methodology: J.E.P., A.M., N.M., M.A.S., K.G.A., M.W., and J.O.W. Software: J.E.P., A.M., N.M., K.G., and M.A.S. Validation: J.E.P., A.M., K.I., K.G., and M.A.S. Formal analysis: J.E.P., A.M., E.P., K.I., J.L.H., K.G., and J.O.W. Investigation: J.E.P., A.M., E.P., K.I., J.L.H., K.G., and J.O.W. Resources: M.A.S., K.G.A., and J.O.W. Data curation: J.E.P., E.P., K.G., M.Z., J.C.W., S.H., J.L., H.P., M.-S.P., K.C.Z.Y., R.T.P.L., M.N.M.I., Y.M.N., and J.O.W. Writing – original draft preparation: J.E.P., M.W., and J.O.W. Writing – review and editing: All authors. Visualization: J.E.P., J.L.H., K.G., and L.M.M.S. Supervision: M.A.S., K.G.A., M.W., and J.O.W. Project administration: M.A.S., K.G.A., M.W., and J.O.W. Funding acquisition: M.A.S., K.G.A., M.W., and J.O.W.
Competing interests: J.O.W. has received funding from the CDC (ongoing) through contracts or agreements to his institution unrelated to this research. M.A.S. receives contracts and grants from the US Food and Drug Administration, the US Department of Veterans Affairs, and Janssen Research and Development unrelated to this research. R.F.G. is cofounder of Zalgen Labs, a biotechnology company developing countermeasures to emerging viruses. M.W., E.C.H., A.R., M.A.S., J.O.W., and K.G.A. have received consulting fees and/or provided compensated expert testimony on SARS-CoV-2 and the COVID-19 pandemic.
Data and materials availability: Genome accessions are available in data S1 and S2, and raw data for two genomes were deposited to NCBI SRA (PRJNA806767 and PRJNA802993). Code is available on Zenodo (64). The following data are available on Zenodo (65): recCA sequence, BEAST phylogenetic inference output, and simulation and rejection sampling output for the primary analysis.
License information: This work is licensed under a Creative Commons Attribution 4.0 International (CC BY 4.0) license, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. To view a copy of this license, visit https://creativecommons.org/licenses/by/4.0/. This license does not apply to figures/photos/artwork or other content included in the article that is credited to a third party; obtain authorization from the rights holder before using such material.
Supplementary Materials
Other Supplementary Material for this manuscript includes the following:
MDAR Reproducibility Checklist
Data S1 to S3
Erratum 12 October 2023:
The original Supplementary Material versions remain available.
In the print version of Pekar et al., “The molecular epidemiology of multiple zoonotic origins of SARS-CoV-2,” Fig. 4 is printed incorrectly. The correct version can be viewed at https://www.science.org/doi/epdf/10.1126/science.abp8337.
We apologize to readers and the authors for this error.
During the processing of the previous correction for this paper, data files were inadvertently not uploaded to the new version. This affected data S1 and S2, which have now been uploaded.
References and Notes
1
E. Dong, H. Du, L. Gardner, An interactive web-based dashboard to track COVID-19 in real time. Lancet Infect. Dis. 20, 533–534 (2020).
2
L.-L. Ren, Y.-M. Wang, Z.-Q. Wu, Z.-C. Xiang, L. Guo, T. Xu, Y.-Z. Jiang, Y. Xiong, Y.-J. Li, X.-W. Li, H. Li, G.-H. Fan, X.-Y. Gu, Y. Xiao, H. Gao, J.-Y. Xu, F. Yang, X.-M. Wang, C. Wu, L. Chen, Y.-W. Liu, B. Liu, J. Yang, X.-R. Wang, J. Dong, L. Li, C.-L. Huang, J.-P. Zhao, Y. Hu, Z.-S. Cheng, L.-L. Liu, Z.-H. Qian, C. Qin, Q. Jin, B. Cao, J.-W. Wang, Identification of a novel coronavirus causing severe pneumonia in human: A descriptive study. Chin. Med. J. 133, 1015–1024 (2020).
3
H. Ritchie, E. Mathieu, L. Rodés-Guirao, C. Appel, C. Giattino, E. Ortiz-Ospina, J. Hasell, B. Macdonald, S. Beltekian, X. Roser, Coronavirus Pandemic (COVID-19). Our World in Data (2022); https://ourworldindata.org/covid-deaths.
4
A. Rambaut, E. C. Holmes, Á. O’Toole, V. Hill, J. T. McCrone, C. Ruis, L. du Plessis, O. G. Pybus, A dynamic nomenclature proposal for SARS-CoV-2 lineages to assist genomic epidemiology. Nat. Microbiol. 5, 1403–1407 (2020).
5
F. Wu, S. Zhao, B. Yu, Y.-M. Chen, W. Wang, Z.-G. Song, Y. Hu, Z.-W. Tao, J.-H. Tian, Y.-Y. Pei, M.-L. Yuan, Y.-L. Zhang, F.-H. Dai, Y. Liu, Q.-M. Wang, J.-J. Zheng, L. Xu, E. C. Holmes, Y.-Z. Zhang, A new coronavirus associated with human respiratory disease in China. Nature 579, 265–269 (2020).
6
R. Lu, X. Zhao, J. Li, P. Niu, B. Yang, H. Wu, W. Wang, H. Song, B. Huang, N. Zhu, Y. Bi, X. Ma, F. Zhan, L. Wang, T. Hu, H. Zhou, Z. Hu, W. Zhou, L. Zhao, J. Chen, Y. Meng, J. Wang, Y. Lin, J. Yuan, Z. Xie, J. Ma, W. J. Liu, D. Wang, W. Xu, E. C. Holmes, G. F. Gao, G. Wu, W. Chen, W. Shi, W. Tan, Genomic characterisation and epidemiology of 2019 novel coronavirus: Implications for virus origins and receptor binding. Lancet 395, 565–574 (2020).
7
S. Lytras, J. Hughes, D. Martin, P. Swanepoel, A. de Klerk, R. Lourens, S. L. Kosakovsky Pond, W. Xia, X. Jiang, D. L. Robertson, Exploring the natural origins of SARS-CoV-2 in the light of recombination. Genome Biol. Evol. 14, evac018 (2022).
8
M. Worobey, Dissecting the early COVID-19 cases in Wuhan. Science 374, 1202–1204 (2021).
9
R. F. Garry, Early appearance of two distinct genomic lineages of SARS-CoV-2 in different Wuhan wildlife markets suggests SARS-CoV-2 has a natural origin. Virological (2021); https://virological.org/t/early-appearance-of-two-distinct-genomic-lineages-of-sars-cov-2-in-different-wuhan-wildlife-markets-suggests-sars-cov-2-has-a-natural-origin/691.
10
N. De Maio, C. Walker, R. Borges, L. Weilguny, G. Slodkowicz, N. Goldman, Issues with SARS-CoV-2 sequencing data. Virological (2020); https://virological.org/t/issues-with-sars-cov-2-sequencing-data/473.
11
M. Worobey, J. Pekar, B. B. Larsen, M. I. Nelson, V. Hill, J. B. Joy, A. Rambaut, M. A. Suchard, J. O. Wertheim, P. Lemey, The emergence of SARS-CoV-2 in Europe and North America. Science 370, 564–570 (2020).
12
J. O. Wertheim, M. Steel, M. J. Sanderson, Accuracy in near-perfect virus phylogenies. Syst. Biol. 71, 426–438 (2022).
13
S. Temmam, K. Vongphayloth, E. Baquero, S. Munier, M. Bonomi, B. Regnault, B. Douangboubpha, Y. Karami, D. Chrétien, D. Sanamxay, V. Xayaphet, P. Paphaphanh, V. Lacoste, S. Somlor, K. Lakeomany, N. Phommavanh, P. Pérot, O. Dehan, F. Amara, F. Donati, T. Bigot, M. Nilges, F. A. Rey, S. van der Werf, P. T. Brey, M. Eloit, Bat coronaviruses related to SARS-CoV-2 and infectious for human cells. Nature 604, 330–336 (2022).
14
J. B. Pease, M. W. Hahn, More accurate phylogenies inferred from low-recombination regions in the presence of incomplete lineage sorting. Evolution 67, 2376–2384 (2013).
15
J. Ratcliff, P. Simmonds, Potential APOBEC-mediated RNA editing of the genomes of SARS-CoV-2 and other coronaviruses and its impact on their longer term evolution. Virology 556, 62–72 (2021).
16
P. Simmonds, Rampant C→U hypermutation in the genomes of SARS-CoV-2 and other coronaviruses: Causes and consequences for their short- and long-term evolutionary trajectories. MSphere 5, e00408-20 (2020).
17
P. Simmonds, M. A. Ansari, Extensive C->U transition biases in the genomes of a wide range of mammalian RNA viruses; potential associations with transcriptional mutations, damage- or host-mediated editing of viral RNA. PLOS Pathog. 17, e1009596 (2021).
18
P. Forster, L. Forster, C. Renfrew, M. Forster, Phylogenetic network analysis of SARS-CoV-2 genomes. Proc. Natl. Acad. Sci. U.S.A. 117, 9241–9243 (2020).
19
J. D. Bloom, Recovery of deleted deep sequencing data sheds more light on the early Wuhan SARS-CoV-2 epidemic. Mol. Biol. Evol. 38, 5211–5224 (2021).
20
M. A. Caraballo-Ortiz, S. Miura, M. Sanderford, T. Dolker, Q. Tao, S. Weaver, S. L. K. Pond, S. Kumar, TopHap: Rapid inference of key phylogenetic structures from common haplotypes in large genome collections with limited diversity. Bioinformatics 38, 2719–2726 (2022).
21
S. Kumar, Q. Tao, S. Weaver, M. Sanderford, M. A. Caraballo-Ortiz, S. Sharma, S. L. K. Pond, S. Miura, An evolutionary portrait of the progenitor SARS-CoV-2 and its dominant offshoots in COVID-19 pandemic. Mol. Biol. Evol. 38, 3046–3059 (2021).
22
N. Moshiri, M. Ragonnet-Cronin, J. O. Wertheim, S. Mirarab, FAVITES: Simultaneous simulation of transmission networks, phylogenetic trees and sequences. Bioinformatics 35, 1852–1861 (2019).
23
J. Pekar, M. Worobey, N. Moshiri, K. Scheffler, J. O. Wertheim, Timing the SARS-CoV-2 index case in Hubei province. Science 372, 412–417 (2021).
24
S. Hsiang, D. Allen, S. Annan-Phan, K. Bell, I. Bolliger, T. Chong, H. Druckenmiller, L. Y. Huang, A. Hultgren, E. Krasovich, P. Lau, J. Lee, E. Rolf, J. Tseng, T. Wu, The effect of large-scale anti-contagion policies on the COVID-19 pandemic. Nature 584, 262–267 (2020).
25
A. L. Bertozzi, E. Franco, G. Mohler, M. B. Short, D. Sledge, The challenges of modeling and forecasting the spread of COVID-19. Proc. Natl. Acad. Sci. U.S.A. 117, 16732–16738 (2020).
26
S. Sanche, Y. T. Lin, C. Xu, E. Romero-Severson, N. Hengartner, R. Ke, High contagiousness and rapid spread of severe acute respiratory syndrome coronavirus 2. Emerg. Infect. Dis. 26, 1470–1477 (2020).
27
T. Bedford, A. L. Greninger, P. Roychoudhury, L. M. Starita, M. Famulare, M.-L. Huang, A. Nalla, G. Pepper, A. Reinhardt, H. Xie, L. Shrestha, T. N. Nguyen, A. Adler, E. Brandstetter, S. Cho, D. Giroux, P. D. Han, K. Fay, C. D. Frazar, M. Ilcisin, K. Lacombe, J. Lee, A. Kiavand, M. Richardson, T. R. Sibley, M. Truong, C. R. Wolf, D. A. Nickerson, M. J. Rieder, J. A. Englund, J. Hadfield, E. B. Hodcroft, J. Huddleston, L. H. Moncla, N. F. Müller, R. A. Neher, X. Deng, W. Gu, S. Federman, C. Chiu, J. S. Duchin, R. Gautom, G. Melly, B. Hiatt, P. Dykema, S. Lindquist, K. Queen, Y. Tao, A. Uehara, S. Tong, D. MacCannell, G. L. Armstrong, G. S. Baird, H. Y. Chu, J. Shendure, K. R. Jerome, H. Y. Chu, M. Boeckh, J. A. Englund, M. Famulare, B. R. Lutz, D. A. Nickerson, M. J. Rieder, L. M. Starita, M. Thompson, J. Shendure, T. Bedford, A. Adler, E. Brandstetter, S. Cho, C. D. Frazar, D. Giroux, P. D. Han, J. Hadfield, S. Huang, M. L. Jackson, A. Kiavand, L. E. Kimball, K. Lacombe, J. Logue, V. Lyon, K. L. Newman, M. Richardson, T. R. Sibley, M. L. Zigman Suchsland, M. Truong, C. R. Wolf, Seattle Flu Study Investigators, Cryptic transmission of SARS-CoV-2 in Washington state. Science 370, 571–575 (2020).
28
M. Zeller, K. Gangavarapu, C. Anderson, A. R. Smither, J. A. Vanchiere, R. Rose, D. J. Snyder, G. Dudas, A. Watts, N. L. Matteson, R. Robles-Sikisaka, M. Marshall, A. K. Feehan, G. Sabino-Santos Jr., A. R. Bell-Kareem, L. D. Hughes, M. Alkuzweny, P. Snarski, J. Garcia-Diaz, R. S. Scott, L. I. Melnik, R. Klitting, M. McGraw, P. Belda-Ferre, P. DeHoff, S. Sathe, C. Marotz, N. D. Grubaugh, D. J. Nolan, A. C. Drouin, K. J. Genemaras, K. Chao, S. Topol, E. Spencer, L. Nicholson, S. Aigner, G. W. Yeo, L. Farnaes, C. A. Hobbs, L. C. Laurent, R. Knight, E. B. Hodcroft, K. Khan, D. N. Fusco, V. S. Cooper, P. Lemey, L. Gardner, S. L. Lamers, J. P. Kamil, R. F. Garry, M. A. Suchard, K. G. Andersen, Emergence of an early SARS-CoV-2 epidemic in the United States. Cell 184, 4939–4952.e15 (2021).
29
C. Alteri, V. Cento, A. Piralla, V. Costabile, M. Tallarita, L. Colagrossi, S. Renica, F. Giardina, F. Novazzi, S. Gaiarsa, E. Matarazzo, M. Antonello, C. Vismara, R. Fumagalli, O. M. Epis, M. Puoti, C. F. Perno, F. Baldanti, Genomic epidemiology of SARS-CoV-2 reveals multiple lineages and early spread of SARS-CoV-2 infections in Lombardy, Italy. Nat. Commun. 12, 434 (2021).
30
L. du Plessis, O. Pybus, Further musings on the tMRCA. Virological (2020); https://virological.org/t/further-musings-on-the-tmrca/340.
31
J. Giesecke, Primary and index cases. Lancet 384, 2024 (2014).
32
Centers for Disease Control and Prevention (CDC), Prevalence of IgG antibody to SARS-associated coronavirus in animal traders—Guangdong Province, China, 2003. MMWR Morb. Mortal. Wkly. Rep. 52, 986–987 (2003).
33
A. Marí Saéz, S. Weiss, K. Nowak, V. Lapeyre, F. Zimmermann, A. Düx, H. S. Kühl, M. Kaba, S. Regnaut, K. Merkel, A. Sachse, U. Thiesen, L. Villányi, C. Boesch, P. W. Dabrowski, A. Radonić, A. Nitsche, S. A. J. Leendertz, S. Petterson, S. Becker, V. Krähling, E. Couacy-Hymann, C. Akoua-Koffi, N. Weber, L. Schaade, J. Fahr, M. Borchert, J. F. Gogarten, S. Calvignac-Spencer, F. H. Leendertz, Investigating the zoonotic origin of the West African Ebola epidemic. EMBO Mol. Med. 7, 17–23 (2015).
34
WHO Headquarters, WHO-convened global study of origins of SARS-CoV-2: China Part (2021); https://www.who.int/publications/i/item/who-convened-global-study-of-origins-of-sars-cov-2-china-part.
35
X. Zhang, Y. Tan, Y. Ling, G. Lu, F. Liu, Z. Yi, X. Jia, M. Wu, B. Shi, S. Xu, J. Chen, W. Wang, B. Chen, L. Jiang, S. Yu, J. Lu, J. Wang, M. Xu, Z. Yuan, Q. Zhang, X. Zhang, G. Zhao, S. Wang, S. Chen, H. Lu, Viral and host factors related to the clinical outcome of COVID-19. Nature 583, 437–440 (2020).
36
E. O. Nsoesie, B. Rader, Y. L. Barnoon, L. Goodwin, J. Brownstein, Analysis of hospital traffic and search engine data in Wuhan China indicates early disease activity in the Fall of 2019. Dig. Acc. Scholar. Harv. 2, 019 (2020).
37
L. Chang, L. Zhao, Y. Xiao, T. Xu, L. Chen, Y. Cai, X. Dong, C. Wang, X. Xiao, L. Ren, L. Wang, Serosurvey for SARS-CoV-2 among blood donors in Wuhan, China from September to December 2019. Protein Cell 10.1093/procel/pwac013 (2022).
38
E. C. Holmes, S. A. Goldstein, A. L. Rasmussen, D. L. Robertson, A. Crits-Christoph, J. O. Wertheim, S. J. Anthony, W. S. Barclay, M. F. Boni, P. C. Doherty, J. Farrar, J. L. Geoghegan, X. Jiang, J. L. Leibowitz, S. J. D. Neil, T. Skern, S. R. Weiss, M. Worobey, K. G. Andersen, R. F. Garry, A. Rambaut, The origins of SARS-CoV-2: A critical review. Cell 184, 4848–4856 (2021).
39
M. Worobey, J. I. Levy, L. M. Malpica Serrano, A. Crits-Christoph, J. E. Pekar, S. A. Goldstein, A. L. Rasmussen, M. U. G. Kraemer, C. Newman, M. P. G. Koopmans, M. A. Suchard, J. O. Wertheim, P. Lemey, D. L. Robertson, R. F. Garry, E. C. Holmes, A. Rambaut, K. G. Andersen, The Huanan Seafood Wholesale Market in Wuhan was the early epicenter of the COVID-19 pandemic. Science 377, 951–959 (2022).
40
X. Xiao, C. Newman, C. D. Buesching, D. W. Macdonald, Z.-M. Zhou, Animal sales from Wuhan wet markets immediately prior to the COVID-19 pandemic. Sci. Rep. 11, 11898 (2021).
41
C. M. Freuling, A. Breithaupt, T. Müller, J. Sehl, A. Balkema-Buschmann, M. Rissmann, A. Klein, C. Wylezich, D. Höper, K. Wernike, A. Aebischer, D. Hoffmann, V. Friedrichs, A. Dorhoi, M. H. Groschup, M. Beer, T. C. Mettenleiter, Susceptibility of raccoon dogs for experimental SARS-CoV-2 infection. Emerg. Infect. Dis. 26, 2982–2985 (2020).
42
S. M. Porter, A. E. Hartwig, H. Bielefeldt-Ohmann, A. M. Bosco-Lauth, J. Root, Susceptibility of wild canids to severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). bioRxiv 478082 [Preprint] (2022); https://doi.org/10.1101/2022.01.27.478082.
43
G. Gao, W. Liu, P. Liu, W. Lei, Z. Jia, X. He, L.-L. Liu, W. Shi, Y. Tan, S. Zou, X. Zhao, G. Wong, J. Wang, F. Wang, G. Wang, K. Qin, R. Gao, J. Zhang, M. Li, W. Xiao, Y. Guo, Z. Xu, Y. Zhao, J. Song, J. Zhang, W. Zhen, W. Zhou, B. Ye, J. Song, M. Yang, W. Zhou, Y. Bi, K. Cai, D. Wang, W. Tan, J. Han, W. Xu, G. Wu, Surveillance of SARS-CoV-2 in the environment and animal samples of the Huanan Seafood Market. Research Square [Preprint] (2022); https://doi.org/10.21203/rs.3.rs-1370392/v1.
44
L. du Plessis, J. T. McCrone, A. E. Zarebski, V. Hill, C. Ruis, B. Gutierrez, J. Raghwani, J. Ashworth, R. Colquhoun, T. R. Connor, N. R. Faria, B. Jackson, N. J. Loman, Á. O’Toole, S. M. Nicholls, K. V. Parag, E. Scher, T. I. Vasylyeva, E. M. Volz, A. Watts, I. I. Bogoch, K. Khan, D. M. Aanensen, M. U. G. Kraemer, A. Rambaut, O. G. Pybus; COVID-19 Genomics UK (COG-UK) Consortium, Establishment and lineage dynamics of the SARS-CoV-2 epidemic in the UK. Science 371, 708–712 (2021).
45
Chinese SARS Molecular Epidemiology Consortium, Molecular evolution of the SARS coronavirus during the course of the SARS epidemic in China. Science 303, 1666–1669 (2004).
46
G. Dudas, L. M. Carvalho, A. Rambaut, T. Bedford, MERS-CoV spillover at the camel-human interface. eLife 7, e31257 (2018).
47
J. A. Lednicky, M. S. Tagliamonte, S. K. White, M. A. Elbadry, M. M. Alam, C. J. Stephenson, T. S. Bonny, J. C. Loeb, T. Telisma, S. Chavannes, D. A. Ostrov, C. Mavian, V. M. Beau De Rochars, M. Salemi, J. G. Morris Jr., Independent infections of porcine deltacoronavirus among Haitian children. Nature 600, 133–137 (2021).
48
B. Kan, M. Wang, H. Jing, H. Xu, X. Jiang, M. Yan, W. Liang, H. Zheng, K. Wan, Q. Liu, B. Cui, Y. Xu, E. Zhang, H. Wang, J. Ye, G. Li, M. Li, Z. Cui, X. Qi, K. Chen, L. Du, K. Gao, Y.-T. Zhao, X.-Z. Zou, Y.-J. Feng, Y.-F. Gao, R. Hai, D. Yu, Y. Guan, J. Xu, Molecular evolution analysis and geographic investigation of severe acute respiratory syndrome coronavirus-like virus in palm civets at an animal market and on farms. J. Virol. 79, 11892–11900 (2005).
49
K. G. Andersen, A. Rambaut, W. I. Lipkin, E. C. Holmes, R. F. Garry, The proximal origin of SARS-CoV-2. Nat. Med. 26, 450–452 (2020).
50
V. L. Hale, P. M. Dennis, D. S. McBride, J. M. Nolting, C. Madden, D. Huey, M. Ehrlich, J. Grieser, J. Winston, D. Lombardi, S. Gibson, L. Saif, M. L. Killian, K. Lantz, R. M. Tell, M. Torchetti, S. Robbe-Austerman, M. I. Nelson, S. A. Faith, A. S. Bowman, SARS-CoV-2 infection in free-ranging white-tailed deer. Nature 602, 481–486 (2022).
51
J. C. Chandler, S. N. Bevins, J. W. Ellis, T. J. Linder, R. M. Tell, M. Jenkins-Moore, J. J. Root, J. B. Lenoch, S. Robbe-Austerman, T. J. DeLiberto, T. Gidlewski, M. Kim Torchetti, S. A. Shriner, SARS-CoV-2 exposure in wild white-tailed deer (Odocoileus virginianus). Proc. Natl. Acad. Sci. U.S.A. 118, e2114828118 (2021).
52
L. Lu, R. S. Sikkema, F. C. Velkers, D. F. Nieuwenhuijse, E. A. J. Fischer, P. A. Meijer, N. Bouwmeester-Vincken, A. Rietveld, M. C. A. Wegdam-Blans, P. Tolsma, M. Koppelman, L. A. M. Smit, R. W. Hakze-van der Honing, W. H. M. van der Poel, A. N. van der Spek, M. A. H. Spierenburg, R. J. Molenaar, J. Rond, M. Augustijn, M. Woolhouse, J. A. Stegeman, S. Lycett, B. B. Oude Munnink, M. P. G. Koopmans, Adaptation, spread and transmission of SARS-CoV-2 in farmed minks and associated humans in the Netherlands. Nat. Commun. 12, 6802 (2021).
53
B. B. Oude Munnink, R. S. Sikkema, D. F. Nieuwenhuijse, R. J. Molenaar, E. Munger, R. Molenkamp, A. van der Spek, P. Tolsma, A. Rietveld, M. Brouwer, N. Bouwmeester-Vincken, F. Harders, R. Hakze-van der Honing, M. C. A. Wegdam-Blans, R. J. Bouwstra, C. GeurtsvanKessel, A. A. van der Eijk, F. C. Velkers, L. A. M. Smit, A. Stegeman, W. H. M. van der Poel, M. P. G. Koopmans, Transmission of SARS-CoV-2 on mink farms between humans and mink and back to humans. Science 371, 172–177 (2021).
54
S. V. Kuchipudi, M. Surendran-Nair, R. M. Ruden, M. Yon, R. H. Nissly, K. J. Vandegrift, R. K. Nelli, L. Li, B. M. Jayarao, C. D. Maranas, N. Levine, K. Willgert, A. J. K. Conlan, R. J. Olsen, J. J. Davis, J. M. Musser, P. J. Hudson, V. Kapur, Multiple spillovers from humans and onward transmission of SARS-CoV-2 in white-tailed deer. Proc. Natl. Acad. Sci. U.S.A. 119, e2121644119 (2022).
55
H.-L. Yen, T. H. C. Sit, C. J. Brackman, S. S. Y. Chuk, S. M. S. Cheng, H. Gu, L. D. J. Chang, P. Krishnan, D. Y. M. Ng, G. Y. Z. Liu, M. M. Y. Hui, S. Y. Ho, K. W. S. Tam, P. Y. T. Law, W. Su, S. F. Sia, K.-T. Choy, S. S. Y. Cheuk, S. P. N. Lau, A. W. Y. Tang, J. C. T. Koo, L. Yung, G. Leung, J. S. M. Peiris, L. L. M. Poon, Transmission of SARS-CoV-2 delta variant (AY.127) from pet hamsters to humans, leading to onward human-to-human transmission: A case study. Lancet 399, 1070–1078 (2022).
56
Y. Shu, J. McCauley, GISAID: Global initiative on sharing all influenza data - from vision to reality. Euro Surveill. 22, 30494 (2017).
57
K. Katoh, D. M. Standley, MAFFT multiple sequence alignment software version 7: Improvements in performance and usability. Mol. Biol. Evol. 30, 772–780 (2013).
58
N. De Maio, C. Walker, R. Borges, L. Weilguny, G. Slodkowicz, N. Goldman, Masking strategies for SARS-CoV-2 alignments. Virological (2020); https://virological.org/t/masking-strategies-for-sars-cov-2-alignments/480.
59
B. Q. Minh, H. A. Schmidt, O. Chernomor, D. Schrempf, M. D. Woodhams, A. von Haeseler, R. Lanfear, IQ-TREE 2: New models and efficient methods for phylogenetic inference in the genomic era. Mol. Biol. Evol. 37, 1530–1534 (2020).
60
P. Sagulenko, V. Puller, R. A. Neher, TreeTime: Maximum-likelihood phylodynamic analysis. Virus Evol. 4, vex042 (2018).
61
M. A. Suchard, P. Lemey, G. Baele, D. L. Ayres, A. J. Drummond, A. Rambaut, Bayesian phylogenetic and phylodynamic data integration using BEAST 1.10. Virus Evol. 4, vey016 (2018).
62
N. Moshiri, FAVITES-COVID-Lite: A simplified (and much faster) simulation pipeline specifically for COVID-19 contact + transmission + phylogeny + sequence simulation. Github (2022); https://github.com/niemasd/FAVITES-COVID-Lite.
63
X. Hao, S. Cheng, D. Wu, T. Wu, X. Lin, C. Wang, Reconstruction of the full transmission dynamics of COVID-19 in Wuhan. Nature 584, 420–424 (2020).
64
J. E. Pekar et al., Code for: The molecular epidemiology of multiple zoonotic origins of SARS-CoV-2. Zenodo (2022); 10.5281/zenodo.6585475.
65
J. E. Pekar et al., Data for: The molecular epidemiology of multiple zoonotic origins of SARS-CoV-2. Zenodo (2022); 10.5281/zenodo.6887186.
66
J. Hadfield, C. Megill, S. M. Bell, J. Huddleston, B. Potter, C. Callender, P. Sagulenko, T. Bedford, R. A. Neher, Nextstrain: Real-time tracking of pathogen evolution. Bioinformatics 34, 4121–4123 (2018).
67
A. Rambaut, figtree. Github (2018); https://github.com/rambaut/figtree/releases.
68
H. Li, Minimap2: Pairwise alignment for nucleotide sequences. Bioinformatics 34, 3094–3100 (2018).
69
H. Li, B. Handsaker, A. Wysoker, T. Fennell, J. Ruan, N. Homer, G. Marth, G. Abecasis, R. Durbin; 1000 Genome Project Data Processing Subgroup, The Sequence Alignment/Map format and SAMtools. Bioinformatics 25, 2078–2079 (2009).
70
N. D. Grubaugh, K. Gangavarapu, J. Quick, N. L. Matteson, J. G. De Jesus, B. J. Main, A. L. Tan, L. M. Paul, D. E. Brackney, S. Grewal, N. Gurfield, K. K. A. Van Rompay, S. Isern, S. F. Michael, L. L. Coffey, N. J. Loman, K. G. Andersen, An amplicon-based sequencing framework for accurately measuring intrahost virus diversity using PrimalSeq and iVar. Genome Biol. 20, 8 (2019).
71
B. C. Jackson, gofasta. Github (2022); https://github.com/virus-evolution/gofasta.
72
G. Dudas, baltic: baltic - backronymed adaptable lightweight tree import code for molecular phylogeny manipulation, analysis and visualisation. Github (2021); https://github.com/evogytis/baltic.
73
S. L. Kosakovsky Pond, D. Posada, M. B. Gravenor, C. H. Woelk, S. D. W. Frost, GARD: A genetic algorithm for recombination detection. Bioinformatics 22, 3096–3098 (2006).
74
D. P. Martin, B. Murrell, M. Golden, A. Khoosal, B. Muhire, RDP4: Detection and analysis of recombination patterns in virus genomes. Virus Evol. 1, vev003 (2015).
75
H. M. Lam, O. Ratmann, M. F. Boni, Improved algorithmic complexity for the 3SEQ recombination detection algorithm. Mol. Biol. Evol. 35, 247–251 (2018).
76
M. F. Boni, P. Lemey, X. Jiang, T. T.-Y. Lam, B. W. Perry, T. A. Castoe, A. Rambaut, D. L. Robertson, Evolutionary origins of the SARS-CoV-2 sarbecovirus lineage responsible for the COVID-19 pandemic. Nat. Microbiol. 5, 1408–1417 (2020).
77
A. Rambaut, T. T. Lam, L. Max Carvalho, O. G. Pybus, Exploring the temporal structure of heterochronous sequences using TempEst (formerly Path-O-Gen). Virus Evol. 2, vew007 (2016).
78
A. Rambaut, A. J. Drummond, D. Xie, G. Baele, M. A. Suchard, Posterior summarization in Bayesian phylogenetics using Tracer 1.7. Syst. Biol. 67, 901–904 (2018).
79
F. Li, Y.-Y. Li, M.-J. Liu, L.-Q. Fang, N. E. Dean, G. W. K. Wong, X.-B. Yang, I. Longini, M. E. Halloran, H.-J. Wang, P.-L. Liu, Y.-H. Pang, Y.-Q. Yan, S. Liu, W. Xia, X.-X. Lu, Q. Liu, Y. Yang, S.-Q. Xu, Household transmission of SARS-CoV-2 and risk factors for susceptibility and infectivity in Wuhan: A retrospective observational study. Lancet Infect. Dis. 21, 617–628 (2021).
80
S. Funk, EpiNow2: Estimate realtime case counts and time-varying epidemiological parameters. Github (2020); https://github.com/epiforecasts/EpiNow2.
81
N. Moshiri, NiemaGraphGen: A memory-efficient global-scale contact network simulation toolkit. GIGAbyte 10.46471/gigabyte.37 (2022).
82
A. L. Barabasi, R. Albert, Emergence of scaling in random networks. Science 286, 509–512 (1999).
83
S. Eubank, H. Guclu, V. S. Kumar, M. V. Marathe, A. Srinivasan, Z. Toroczkai, N. Wang, Modelling disease outbreaks in realistic urban social networks. Nature 429, 180–184 (2004).
84
J. Mossong, N. Hens, M. Jit, P. Beutels, K. Auranen, R. Mikolajczyk, M. Massari, S. Salmaso, G. S. Tomba, J. Wallinga, J. Heijne, M. Sadkowska-Todys, M. Rosinska, W. J. Edmunds, Social contacts and mixing patterns relevant to the spread of infectious diseases. PLOS Med. 5, e74 (2008).
85
F. D. Sahneh, A. Vajdi, H. Shakeri, F. Fan, C. Scoglio, GEMFsim: A stochastic simulator for the generalized epidemic modeling framework. J. Comput. Sci. 22, 36–44 (2017).
86
X. Yang, Y. Yu, J. Xu, H. Shu, J. Xia, H. Liu, Y. Wu, L. Zhang, Z. Yu, M. Fang, T. Yu, Y. Wang, S. Pan, X. Zou, S. Yuan, Y. Shang, Clinical course and outcomes of critically ill patients with SARS-CoV-2 pneumonia in Wuhan, China: A single-centered, retrospective, observational study. Lancet Respir. Med. 8, 475–481 (2020).
87
F. Zhou, T. Yu, R. Du, G. Fan, Y. Liu, Z. Liu, J. Xiang, Y. Wang, B. Song, X. Gu, L. Guan, Y. Wei, H. Li, X. Wu, J. Xu, S. Tu, Y. Zhang, H. Chen, B. Cao, Clinical course and risk factors for mortality of adult inpatients with COVID-19 in Wuhan, China: A retrospective cohort study. Lancet 395, 1054–1062 (2020).
88
J. Yang, X. Chen, X. Deng, Z. Chen, H. Gong, H. Yan, Q. Wu, H. Shi, S. Lai, M. Ajelli, C. Viboud, P. H. Yu, Disease burden and clinical severity of the first pandemic wave of COVID-19 in Wuhan, China. Nat. Commun. 11, 5411 (2020).
89
N. Moshiri, TreeSwift: A massively scalable Python tree package. SoftwareX 11, 100436 (2020).
90
J. Ma, First Chinese coronavirus cases may have been infected in October 2019, says new research. South China Morning Post (2021); https://www.scmp.com/news/china/science/article/3126499/first-chinese-covid-19-cases-may-have-been-infected-october-2019.
91
K. Andersen, Clock and TMRCA based on 27 genomes. Virological (2020); https://virological.org/t/clock-and-tmrca-based-on-27-genomes/347/6.
92
L. Pipes, H. Wang, J. P. Huelsenbeck, R. Nielsen, Assessing uncertainty in the rooting of the SARS-CoV-2 phylogeny. Mol. Biol. Evol. 38, 1537–1543 (2021).
93
T. Murata, A. Sakurai, M. Suzuki, S. Komoto, T. Ide, T. Ishihara, Y. Doi, Shedding of viable virus in asymptomatic SARS-CoV-2 carriers. MSphere 6, e00019-21 (2021).
94
T. Sekizuka, K. Itokawa, T. Kageyama, S. Saito, I. Takayama, H. Asanuma, N. Nao, R. Tanaka, M. Hashino, T. Takahashi, H. Kamiya, T. Yamagishi, K. Kakimoto, M. Suzuki, H. Hasegawa, T. Wakita, M. Kuroda, Haplotype networks of SARS-CoV-2 infections in the Diamond Princess cruise ship outbreak. Proc. Natl. Acad. Sci. U.S.A. 117, 20198–20201 (2020).
95
Y. Turakhia, B. Thornlow, A. S. Hinrichs, N. De Maio, L. Gozashti, R. Lanfear, D. Haussler, R. Corbett-Detig, Ultrafast sample placement on existing tRees (UShER) enables real-time phylogenetics for the SARS-CoV-2 pandemic. Nat. Genet. 53, 809–816 (2021).
96
P. Zhou, X.-L. Yang, X.-G. Wang, B. Hu, L. Zhang, W. Zhang, H.-R. Si, Y. Zhu, B. Li, C.-L. Huang, H.-D. Chen, J. Chen, Y. Luo, H. Guo, R.-D. Jiang, M.-Q. Liu, Y. Chen, X.-R. Shen, X. Wang, X.-S. Zheng, K. Zhao, Q.-J. Chen, F. Deng, L.-L. Liu, B. Yan, F.-X. Zhan, Y.-Y. Wang, G.-F. Xiao, Z.-L. Shi, A pneumonia outbreak associated with a new coronavirus of probable bat origin. Nature 579, 270–273 (2020).
97
M. Ghafari, L. du Plessis, J. Raghwani, S. Bhatt, B. Xu, O. G. Pybus, A. Katzourakis, Purifying selection determines the short-term time dependency of evolutionary rates in SARS-CoV-2 and pH1N1 influenza. Mol. Biol. Evol. 39, msac009 (2022).
98
S. Duchêne, E. C. Holmes, S. Y. W. Ho, Analyses of evolutionary dynamics in viruses are hindered by a time-dependent bias in rate estimates. Proc. Biol. Sci. 281, 20140732 (2014).
99
J. Dushoff, S. W. Park, Speed and strength of an epidemic intervention. Proc. Biol. Sci. 288, 20201556 (2021).
100
J. T. Wu, K. Leung, M. Bushman, N. Kishore, R. Niehus, P. M. de Salazar, B. J. Cowling, M. Lipsitch, G. M. Leung, Estimating clinical severity of COVID-19 from the transmission dynamics in Wuhan, China. Nat. Med. 26, 506–510 (2020).
101
C. Huang, Y. Wang, X. Li, L. Ren, J. Zhao, Y. Hu, L. Zhang, G. Fan, J. Xu, X. Gu, Z. Cheng, T. Yu, J. Xia, Y. Wei, W. Wu, X. Xie, W. Yin, H. Li, M. Liu, Y. Xiao, H. Gao, L. Guo, J. Xie, G. Wang, R. Jiang, Z. Gao, Q. Jin, J. Wang, B. Cao, Clinical features of patients infected with 2019 novel coronavirus in Wuhan, China. Lancet 395, 497–506 (2020).
102
R. Ke, E. Romero-Severson, S. Sanche, N. Hengartner, Estimating the reproductive number R0 of SARS-CoV-2 in the United States and eight European countries and implications for vaccination. J. Theor. Biol. 517, 110621 (2021).
103
L. Pellis, F. Scarabel, H. B. Stage, C. E. Overton, L. H. K. Chappell, E. Fearon, E. Bennett, K. A. Lythgoe, T. A. House, I. Hall; University of Manchester COVID-19 Modelling Group, Challenges in control of COVID-19: Short doubling time and long delay to effect of interventions. Philos. Trans. R. Soc. Lond. B Biol. Sci. 376, 20200264 (2021).
104
Q. Li, X. Guan, P. Wu, X. Wang, L. Zhou, Y. Tong, R. Ren, K. S. M. Leung, E. H. Y. Lau, J. Y. Wong, X. Xing, N. Xiang, Y. Wu, C. Li, Q. Chen, D. Li, T. Liu, J. Zhao, M. Liu, W. Tu, C. Chen, L. Jin, R. Yang, Q. Wang, S. Zhou, R. Wang, H. Liu, Y. Luo, Y. Liu, G. Shao, H. Li, Z. Tao, Y. Yang, Z. Deng, B. Liu, Z. Ma, Y. Zhang, G. Shi, T. T. Y. Lam, J. T. Wu, G. F. Gao, B. J. Cowling, B. Yang, G. M. Leung, Z. Feng, Early transmission dynamics in Wuhan, China, of novel coronavirus-infected pneumonia. N. Engl. J. Med. 382, 1199–1207 (2020).
105
M. Chinazzi, J. T. Davis, M. Ajelli, C. Gioannini, M. Litvinova, S. Merler, A. Pastore Y Piontti, K. Mu, L. Rossi, K. Sun, C. Viboud, X. Xiong, H. Yu, M. E. Halloran, I. M. Longini Jr., A. Vespignani, The effect of travel restrictions on the spread of the 2019 novel coronavirus (COVID-19) outbreak. Science 368, 395–400 (2020).
106
R. Li, S. Pei, B. Chen, Y. Song, T. Zhang, W. Yang, J. Shaman, Substantial undocumented infection facilitates the rapid dissemination of novel coronavirus (SARS-CoV-2). Science 368, 489–493 (2020).
107
N. Moshiri, CoaTran: Coalescent tree simulation along a transmission network. bioRxiv [Preprint] (2020).
108
K. M. Braun, G. K. Moreno, C. Wagner, M. A. Accola, W. M. Rehrauer, D. A. Baker, K. Koelle, D. H. O’Connor, T. Bedford, T. C. Friedrich, L. H. Moncla, Acute SARS-CoV-2 infections harbor limited within-host diversity and transmit via tight transmission bottlenecks. PLOS Pathog. 17, e1009849 (2021).
109
J. Ma, Coronavirus: China’s first confirmed Covid-19 case traced back to November 17. South China Morning Post (2020); https://www.scmp.com/news/china/society/article/3074991/coronavirus-chinas-first-confirmed-covid-19-case-traced-back.
Information & Authors
Information
Published In
Science
Volume 377 | Issue 6609
26 August 2022
Copyright
Copyright © 2022 The Authors, some rights reserved; exclusive licensee American Association for the Advancement of Science. No claim to original U.S. Government Works. Distributed under a Creative Commons Attribution License 4.0 (CC BY).
This is an open-access article distributed under the terms of the Creative Commons Attribution license, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Submission history
Received: 3 March 2022
Accepted: 18 July 2022
Published in print: 26 August 2022
Acknowledgments
We gratefully acknowledge the authors from the originating laboratories and the submitting laboratories, who generated and shared through GISAID the viral genomic sequences and metadata on which this research is based (data S1) (57). We are grateful to L. Chen, D. Liu, and Y. Yan for providing insight into the putative intermediate genomes and for clarifying the relative sequencing depth at positions 8782 and 28144; to M. Eloit and S. Temmam for sharing their sarbecovirus dataset and recombination analysis results; and to M. Kuehnert for general feedback. Figure S30 was created with Biorender.com.
Funding: This project has been funded in whole or in part with federal funds from the National Institute of Allergy and Infectious Diseases, National Institutes of Health (NIH), Department of Health and Human Services, under contract 75N93021C00015 (M.W.). J.E.P. acknowledges support from NIH (T15LM011271). N.M. acknowledges support from the National Science Foundation (NSF) (NSF-2028040). J.I.L. acknowledges support from NIH (5T32AI007244-38). J.O.W. acknowledges support from NIH (R01AI135992 and R01AI136056). R.F.G. is supported by NIH (R01AI132223, R01AI132244, U19AI142790, U54CA260581, U54HG007480, and OT2HL158260), the Coalition for Epidemic Preparedness Innovation, the Wellcome Trust Foundation, Gilead Sciences, and the European and Developing Countries Clinical Trials Partnership Programme. M.A.S. and A.R. acknowledge the support from the Wellcome Trust (Collaborators Award 206298/Z/17/Z–ARTIC network), the European Research Council (grant agreement 725422–ReservoirDOCS), and NIH (R01AI153044). K.G.A. is supported by NIH (U19AI135995, U01AI151812, and UL1TR002550). E.C.H. is funded by an Australian Research Council Laureate Fellowship (FL170100022). J.L., H.P., and M.-S.P. acknowledge support from the National Research Foundation of Korea, funded by the Ministry of Science and Information and Communication Technologies, Republic of Korea (NRF-2017M3A9E4061995 and NRF-2019R1A2C2084206). T.I.V. acknowledges support from the Branco Weiss Fellowship. We thank AMD for the donation of critical hardware and support resources from its HPC Fund that made this work possible. This work was supported (in part) by the Epidemiology and Laboratory Capacity (ELC) for Infectious Diseases Cooperative Agreement [grant ELC DETECT (6NU50CK000517-01-07)] funded by the Centers for Disease Control and Prevention (CDC). Its contents are solely the responsibility of the authors and do not necessarily represent the official views of CDC or the Department of Health and Human Services.
Author contributions: Conceptualization: J.E.P., M.A.S., K.G.A., M.W., and J.O.W. Methodology: J.E.P., A.M., N.M., M.A.S., K.G.A., M.W., and J.O.W. Software: J.E.P., A.M., N.M., K.G., and M.A.S. Validation: J.E.P., A.M., K.I., K.G., and M.A.S. Formal analysis: J.E.P., A.M., E.P., K.I., J.L.H., K.G., and J.O.W. Investigation: J.E.P., A.M., E.P., K.I., J.L.H., K.G., and J.O.W. Resources: M.A.S., K.G.A., and J.O.W. Data curation: J.E.P., E.P., K.G., M.Z., J.C.W., S.H., J.L., H.P., M.-S.P., K.C.Z.Y., R.T.P.L., M.N.M.I., Y.M.N., and J.O.W. Writing – original draft preparation: J.E.P., M.W., and J.O.W. Writing – review and editing: All authors. Visualization: J.E.P., J.L.H., K.G., and L.M.M.S. Supervision: M.A.S., K.G.A., M.W., and J.O.W. Project administration: M.A.S., K.G.A., M.W., and J.O.W. Funding acquisition: M.A.S., K.G.A., M.W., and J.O.W.
Competing interests: J.O.W. has received funding from the CDC (ongoing) through contracts or agreements to his institution unrelated to this research. M.A.S. receives contracts and grants from the US Food and Drug Administration, the US Department of Veterans Affairs, and Janssen Research and Development unrelated to this research. R.F.G. is cofounder of Zalgen Labs, a biotechnology company developing countermeasures to emerging viruses. M.W., E.C.H., A.R., M.A.S., J.O.W., and K.G.A. have received consulting fees and/or provided compensated expert testimony on SARS-CoV-2 and the COVID-19 pandemic.
Data and materials availability: Genome accessions are available in data S1 and S2, and raw data for two genomes were deposited to NCBI SRA (PRJNA806767 and PRJNA802993). Code is available on Zenodo (64). The following data are available on Zenodo (65): recCA sequence, BEAST phylogenetic inference output, and simulation and rejection sampling output for the primary analysis.
License information: This work is licensed under a Creative Commons Attribution 4.0 International (CC BY 4.0) license, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. To view a copy of this license, visit https://creativecommons.org/licenses/by/4.0/. This license does not apply to figures/photos/artwork or other content included in the article that is credited to a third party; obtain authorization from the rights holder before using such material.
Funding Information
National Science Foundation: NSF-2028040
National Institutes of Health: 75N93021C00015
National Institutes of Health: T15LM011271
National Institutes of Health: 5T32AI007244-38
National Institutes of Health: R01AI135992
National Institutes of Health: R01AI136056
National Institutes of Health: R01AI132223
National Institutes of Health: R01AI132244
Australian Research Council Laureate Fellowship: FL170100022
Centers for Disease Control and Prevention: 6NU50CK000517-01-07
National Institutes of Health: U19AI142790
National Institutes of Health: U54CA260581
National Institutes of Health: U54HG007480
National Institutes of Health: OT2HL158260
National Institutes of Health: U19AI135995
National Institutes of Health: U01AI151812
National Institutes of Health: UL1TR002550
National Research Foundation of Korea: NRF-2017M3A9E4061995
National Research Foundation of Korea: NRF-2019R1A2C2084206
Branco Weiss Fellowship
Wellcome: 206298/Z/17/Z
European Research Council: 725422
National Institutes of Health: R01AI153044
Cite as: Jonathan E. Pekar et al.
Figures
Fig. 1. Maximum likelihood phylogeny of the early SARS-CoV-2 pandemic, showing nucleotide reversions and putative candidates for the ancestral haplotype at the MRCA.
Putative ancestral haplotypes are identified with colored shapes. Reversions from the Hu-1 reference genome to the recCA are colored. Blue indicates C-to-T reversions, and black indicates all other reversions. The tree is rooted on Hu-1 to show reversion dynamics to the recCA.
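The reversion coloring in Fig. 1 comes down to a per-site comparison of three aligned sequences. The Python sketch below is a simplified illustration and not the authors' pipeline. It assumes gap-free alignments of equal length and uses a hypothetical helper, `classify_reversions`, which flags sites where Hu-1 and the recCA differ and a sample carries the recCA base. It labels such a site "C-to-T" when Hu-1 has C and the recCA has T, and "other" otherwise.

```python
from __future__ import annotations


def classify_reversions(hu1: str, rec_ca: str, sample: str) -> dict[int, str]:
    """Hypothetical helper: map 1-based positions to a reversion label.

    A site counts as a reversion (on a tree rooted on Hu-1) when Hu-1 and the
    recCA differ there and the sample carries the recCA base. Assumes the three
    sequences are already aligned, gap-free, and of equal length.
    """
    if not (len(hu1) == len(rec_ca) == len(sample)):
        raise ValueError("sequences must be aligned to the same length")
    labels: dict[int, str] = {}
    for pos, (ref, anc, obs) in enumerate(zip(hu1, rec_ca, sample), start=1):
        if ref != anc and obs == anc:
            # C in Hu-1 mutating back to T in the recCA: colored blue in Fig. 1.
            labels[pos] = "C-to-T" if (ref, anc) == ("C", "T") else "other"
    return labels
```

With toy sequences, `classify_reversions("ACGTCA", "ATGTCG", "ATGTCG")` returns `{2: "C-to-T", 6: "other"}`. Ambiguous calls such as N never match the recCA base, so this sketch silently ignores them.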
Fig. 2. Probability of phylogenetic structures arising from a single introduction of SARS-CoV-2 in epidemic simulations.
(A) A large polytomy of at least 100 descendant lineages, which is consistent with the base of both lineages A and B. (B) Topology matching a C/C ancestral haplotype: two clades, each one mutation from the ancestor, both with polytomies of at least 100 descendant lineages. (C) Topology matching either a lineage A or lineage B ancestral haplotype: a basal polytomy with at least 100 descendant lineages, including a large clade separated by two mutations, also possessing a polytomy of at least 100 descendant lineages. Basal taxa are drawn with short branch lengths for clarity. The probability of each phylogenetic structure after a single introduction is reported in the respective boxes.
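Each probability in Fig. 2 is a frequency: the share of single-introduction simulations whose tree shows the given structure. The Python sketch below illustrates that bookkeeping under simplifying assumptions. Trees are hypothetical `Node` objects whose incoming branches carry mutation counts, identical genomes are assumed to be collapsed already (so a polytomy is simply a node with many direct children), and the panel criteria are translated literally from the caption. It does not reproduce the simulation or tree-processing code used in the study.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable, Iterable


@dataclass
class Node:
    """Hypothetical tree node: mutations on the incoming branch plus children."""
    mutations: int = 0
    children: list[Node] = field(default_factory=list)


def is_large_polytomy(node: Node, k: int = 100) -> bool:
    """Panel A: the node has at least k direct descendant lineages."""
    return len(node.children) >= k


def matches_cc_topology(root: Node, k: int = 100) -> bool:
    """Panel B: exactly two clades, each one mutation from the ancestor,
    each with a polytomy of at least k descendant lineages."""
    return len(root.children) == 2 and all(
        child.mutations == 1 and is_large_polytomy(child, k)
        for child in root.children
    )


def matches_a_or_b_topology(root: Node, k: int = 100) -> bool:
    """Panel C: a basal polytomy of at least k lineages that includes a clade
    two mutations away which itself has a polytomy of at least k lineages."""
    return is_large_polytomy(root, k) and any(
        child.mutations == 2 and is_large_polytomy(child, k)
        for child in root.children
    )


def estimate_probability(trees: Iterable[Node],
                         predicate: Callable[[Node], bool]) -> float:
    """Fraction of simulated trees for which `predicate` holds."""
    trees = list(trees)
    if not trees:
        raise ValueError("need at least one simulated tree")
    return sum(predicate(t) for t in trees) / len(trees)
```

Calling `estimate_probability` on a collection of simulated trees, with one of the panel predicates, would give an estimate of the kind reported in the corresponding box. The threshold `k` is exposed so that the 100-lineage cutoff from the caption can be varied.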
Fig. 3. Comparison of the tMRCA and primary case dates for lineage A and lineage B in late 2019 across rooting strategies.
Each row represents a different rooting constraint in phylodynamic analysis, with lineage B, C/C, and lineage A representing a fixed ancestral haplotype. (A) The tMRCA for lineages A and B. (B) The number of weeks the tMRCA of lineage A occurs after the tMRCA of lineage B. (C) The timing of the primary case for lineages A and B. (D) The number of weeks the time of the primary case of lineage A occurs after the time of the primary case of lineage B. Long dashed lines indicate the median, and shading indicates the 95% HPD for each distribution. Short dashed lines indicate 0 weeks difference between lineages A and B. Posterior probability that lineage A originated after lineage B is reported in the gray box in each graph in (B) and (D).
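The quantities in Fig. 3 summarize paired posterior samples. The Python below is a sketch that assumes each posterior draw yields one date for lineage A and one for lineage B, both expressed in days on a common scale. For each draw it computes the offset in weeks. It then reports the median offset, a 95% HPD taken as the shortest interval containing at least 95% of the draws, and the posterior probability that lineage A's date falls after lineage B's. The function names are hypothetical, and the study's own HPD routine may differ in detail.

```python
from __future__ import annotations

import math
import statistics
from typing import Sequence


def hpd_interval(samples: Sequence[float], mass: float = 0.95) -> tuple[float, float]:
    """Shortest interval that contains at least `mass` of the samples."""
    s = sorted(samples)
    n = len(s)
    if n == 0:
        raise ValueError("need at least one sample")
    m = min(n, max(1, math.ceil(mass * n)))  # samples the interval must cover
    start = min(range(n - m + 1), key=lambda i: s[i + m - 1] - s[i])
    return s[start], s[start + m - 1]


def summarize_offset(dates_a: Sequence[float], dates_b: Sequence[float]) -> dict:
    """Posterior summary of how many weeks lineage A's date follows lineage B's.

    dates_a[i] and dates_b[i] are assumed to come from the same posterior draw.
    """
    if len(dates_a) != len(dates_b) or not dates_a:
        raise ValueError("need equally long, non-empty sample lists")
    weeks = [(a - b) / 7.0 for a, b in zip(dates_a, dates_b)]
    return {
        "median_weeks": statistics.median(weeks),
        "hpd95_weeks": hpd_interval(weeks, 0.95),
        "p_a_after_b": sum(w > 0 for w in weeks) / len(weeks),
    }
```

Passing tMRCA dates corresponds to panel B. Passing primary-case dates instead corresponds to panel D.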
Fig. 4. Dynamics of simulated SARS-CoV-2 epidemics resulting from separate introductions of lineages A and B in late 2019.
Each row represents a different rooting constraint in phylodynamic analysis, with lineage B, C/C, and lineage A representing a fixed ancestral haplotype. (A) Estimated number of infections. The header of each column indicates whether the number of infections is caused by lineage A, lineage B, or the two lineages combined. Darker and lighter shading indicate the 50% and 95% HPD, respectively. (B) The log ratio of lineage B to lineage A infections on 15 December 2019. The posterior probability of having more lineage B than lineage A infections is reported in the gray box in each graph.
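Fig. 4B compares the two lineages within each simulated epidemic. Below is a minimal Python sketch that assumes paired per-simulation infection counts for 15 December 2019. Two of its choices are assumptions rather than details taken from the source. It uses the natural log, because the caption does not state the base (the sign of the ratio does not depend on it). It also leaves simulations in which either lineage has zero infections out of the log ratio, while still counting them when tallying how often lineage B outnumbers lineage A.

```python
import math
from typing import List, Sequence, Tuple


def lineage_ratio_summary(infections_a: Sequence[int],
                          infections_b: Sequence[int]) -> Tuple[List[float], float]:
    """Return (log ratios of B to A, fraction of simulations with more B than A)."""
    if len(infections_a) != len(infections_b) or not infections_a:
        raise ValueError("need equally long, non-empty count lists")
    pairs = list(zip(infections_a, infections_b))
    # Compared on raw counts, so runs where one lineage has no infections still count.
    p_b_more = sum(b > a for a, b in pairs) / len(pairs)
    # The log ratio is undefined when either count is zero; such runs are skipped.
    log_ratios = [math.log(b / a) for a, b in pairs if a > 0 and b > 0]
    return log_ratios, p_b_more
```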
Tables
Table 1. Posterior probabilities of inferred ancestral haplotype at the MRCA of SARS-CoV-2.
References
1
E. Dong, H. Du, L. Gardner, An interactive web-based dashboard to track COVID-19 in real time. Lancet Infect. Dis. 20, 533–534 (2020).
2
L.-L. Ren, Y.-M. Wang, Z.-Q. Wu, Z.-C. Xiang, L. Guo, T. Xu, Y.-Z. Jiang, Y. Xiong, Y.-J. Li, X.-W. Li, H. Li, G.-H. Fan, X.-Y. Gu, Y. Xiao, H. Gao, J.-Y. Xu, F. Yang, X.-M. Wang, C. Wu, L. Chen, Y.-W. Liu, B. Liu, J. Yang, X.-R. Wang, J. Dong, L. Li, C.-L. Huang, J.-P. Zhao, Y. Hu, Z.-S. Cheng, L.-L. Liu, Z.-H. Qian, C. Qin, Q. Jin, B. Cao, J.-W. Wang, Identification of a novel coronavirus causing severe pneumonia in human: A descriptive study. Chin. Med. J. 133, 1015–1024 (2020).
3
H. Ritchie, E. Mathieu, L. Rodés-Guirao, C. Appel, C. Giattino, E. Ortiz-Ospina, J. Hasell, B. Macdonald, S. Beltekian, X. Roser, Coronavirus Pandemic (COVID-19). Our World in Data (2022); https://ourworldindata.org/covid-deaths.
4
A. Rambaut, E. C. Holmes, Á. O’Toole, V. Hill, J. T. McCrone, C. Ruis, L. du Plessis, O. G. Pybus, A dynamic nomenclature proposal for SARS-CoV-2 lineages to assist genomic epidemiology. Nat. Microbiol. 5, 1403–1407 (2020).
5
F. Wu, S. Zhao, B. Yu, Y.-M. Chen, W. Wang, Z.-G. Song, Y. Hu, Z.-W. Tao, J.-H. Tian, Y.-Y. Pei, M.-L. Yuan, Y.-L. Zhang, F.-H. Dai, Y. Liu, Q.-M. Wang, J.-J. Zheng, L. Xu, E. C. Holmes, Y.-Z. Zhang, A new coronavirus associated with human respiratory disease in China. Nature 579, 265–269 (2020).
6
R. Lu, X. Zhao, J. Li, P. Niu, B. Yang, H. Wu, W. Wang, H. Song, B. Huang, N. Zhu, Y. Bi, X. Ma, F. Zhan, L. Wang, T. Hu, H. Zhou, Z. Hu, W. Zhou, L. Zhao, J. Chen, Y. Meng, J. Wang, Y. Lin, J. Yuan, Z. Xie, J. Ma, W. J. Liu, D. Wang, W. Xu, E. C. Holmes, G. F. Gao, G. Wu, W. Chen, W. Shi, W. Tan, Genomic characterisation and epidemiology of 2019 novel coronavirus: Implications for virus origins and receptor binding. Lancet 395, 565–574 (2020).
7
S. Lytras, J. Hughes, D. Martin, P. Swanepoel, A. de Klerk, R. Lourens, S. L. Kosakovsky Pond, W. Xia, X. Jiang, D. L. Robertson, Exploring the natural origins of SARS-CoV-2 in the light of recombination. Genome Biol. Evol. 14, evac018 (2022).
8
M. Worobey, Dissecting the early COVID-19 cases in Wuhan. Science 374, 1202–1204 (2021).
9
R. F. Garry, Early appearance of two distinct genomic lineages of SARS-CoV-2 in different Wuhan wildlife markets suggests SARS-CoV-2 has a natural origin. Virological (2021); https://virological.org/t/early-appearance-of-two-distinct-genomic-lineages-of-sars-cov-2-in-different-wuhan-wildlife-markets-suggests-sars-cov-2-has-a-natural-origin/691.
10
N. De Maio, C. Walker, R. Borges, L. Weilguny, G. Slodkowicz, N. Goldman, Issues with SARS-CoV-2 sequencing data. Virological (2020); https://virological.org/t/issues-with-sars-cov-2-sequencing-data/473.
11
M. Worobey, J. Pekar, B. B. Larsen, M. I. Nelson, V. Hill, J. B. Joy, A. Rambaut, M. A. Suchard, J. O. Wertheim, P. Lemey, The emergence of SARS-CoV-2 in Europe and North America. Science 370, 564–570 (2020).
12
J. O. Wertheim, M. Steel, M. J. Sanderson, Accuracy in Near-Perfect Virus Phylogenies. Syst. Biol. 71, 426–438 (2022).
13
S. Temmam, K. Vongphayloth, E. Baquero, S. Munier, M. Bonomi, B. Regnault, B. Douangboubpha, Y. Karami, D. Chrétien, D. Sanamxay, V. Xayaphet, P. Paphaphanh, V. Lacoste, S. Somlor, K. Lakeomany, N. Phommavanh, P. Pérot, O. Dehan, F. Amara, F. Donati, T. Bigot, M. Nilges, F. A. Rey, S. van der Werf, P. T. Brey, M. Eloit, Bat coronaviruses related to SARS-CoV-2 and infectious for human cells. Nature 604, 330–336 (2022).
14
J. B. Pease, M. W. Hahn, More accurate phylogenies inferred from low-recombination regions in the presence of incomplete lineage sorting. Evolution 67, 2376–2384 (2013).
15
J. Ratcliff, P. Simmonds, Potential APOBEC-mediated RNA editing of the genomes of SARS-CoV-2 and other coronaviruses and its impact on their longer term evolution. Virology 556, 62–72 (2021).
16
P. Simmonds, Rampant C→U hypermutation in the genomes of SARS-CoV-2 and other coronaviruses: Causes and consequences for their short- and long-term evolutionary trajectories. MSphere 5, e00408-20 (2020).
17
P. Simmonds, M. A. Ansari, Extensive C->U transition biases in the genomes of a wide range of mammalian RNA viruses; potential associations with transcriptional mutations, damage- or host-mediated editing of viral RNA. PLOS Pathog. 17, e1009596 (2021).
18
P. Forster, L. Forster, C. Renfrew, M. Forster, Phylogenetic network analysis of SARS-CoV-2 genomes. Proc. Natl. Acad. Sci. U.S.A. 117, 9241–9243 (2020).
19
J. D. Bloom, Recovery of deleted deep sequencing data sheds more light on the early Wuhan SARS-CoV-2 epidemic. Mol. Biol. Evol. 38, 5211–5224 (2021).
20
M. A. Caraballo-Ortiz, S. Miura, M. Sanderford, T. Dolker, Q. Tao, S. Weaver, S. L. K. Pond, S. Kumar, TopHap: Rapid inference of key phylogenetic structures from common haplotypes in large genome collections with limited diversity. Bioinformatics 38, 2719–2726 (2022).
21
S. Kumar, Q. Tao, S. Weaver, M. Sanderford, M. A. Caraballo-Ortiz, S. Sharma, S. L. K. Pond, S. Miura, An evolutionary portrait of the progenitor SARS-CoV-2 and its dominant offshoots in COVID-19 pandemic. Mol. Biol. Evol. 38, 3046–3059 (2021).
22
N. Moshiri, M. Ragonnet-Cronin, J. O. Wertheim, S. Mirarab, FAVITES: Simultaneous simulation of transmission networks, phylogenetic trees and sequences. Bioinformatics 35, 1852–1861 (2019).
23
J. Pekar, M. Worobey, N. Moshiri, K. Scheffler, J. O. Wertheim, Timing the SARS-CoV-2 index case in Hubei province. Science 372, 412–417 (2021).
24
S. Hsiang, D. Allen, S. Annan-Phan, K. Bell, I. Bolliger, T. Chong, H. Druckenmiller, L. Y. Huang, A. Hultgren, E. Krasovich, P. Lau, J. Lee, E. Rolf, J. Tseng, T. Wu, The effect of large-scale anti-contagion policies on the COVID-19 pandemic. Nature 584, 262–267 (2020).
25
A. L. Bertozzi, E. Franco, G. Mohler, M. B. Short, D. Sledge, The challenges of modeling and forecasting the spread of COVID-19. Proc. Natl. Acad. Sci. U.S.A. 117, 16732–16738 (2020).
26
S. Sanche, Y. T. Lin, C. Xu, E. Romero-Severson, N. Hengartner, R. Ke, High contagiousness and rapid spread of severe acute respiratory syndrome coronavirus 2. Emerg. Infect. Dis. 26, 1470–1477 (2020).
27
T. Bedford, A. L. Greninger, P. Roychoudhury, L. M. Starita, M. Famulare, M.-L. Huang, A. Nalla, G. Pepper, A. Reinhardt, H. Xie, L. Shrestha, T. N. Nguyen, A. Adler, E. Brandstetter, S. Cho, D. Giroux, P. D. Han, K. Fay, C. D. Frazar, M. Ilcisin, K. Lacombe, J. Lee, A. Kiavand, M. Richardson, T. R. Sibley, M. Truong, C. R. Wolf, D. A. Nickerson, M. J. Rieder, J. A. Englund, J. Hadfield, E. B. Hodcroft, J. Huddleston, L. H. Moncla, N. F. Müller, R. A. Neher, X. Deng, W. Gu, S. Federman, C. Chiu, J. S. Duchin, R. Gautom, G. Melly, B. Hiatt, P. Dykema, S. Lindquist, K. Queen, Y. Tao, A. Uehara, S. Tong, D. MacCannell, G. L. Armstrong, G. S. Baird, H. Y. Chu, J. Shendure, K. R. Jerome, H. Y. Chu, M. Boeckh, J. A. Englund, M. Famulare, B. R. Lutz, D. A. Nickerson, M. J. Rieder, L. M. Starita, M. Thompson, J. Shendure, T. Bedford, A. Adler, E. Brandstetter, S. Cho, C. D. Frazar, D. Giroux, P. D. Han, J. Hadfield, S. Huang, M. L. Jackson, A. Kiavand, L. E. Kimball, K. Lacombe, J. Logue, V. Lyon, K. L. Newman, M. Richardson, T. R. Sibley, M. L. Zigman Suchsland, M. Truong, C. R. Wolf, Seattle Flu Study Investigators, Cryptic transmission of SARS-CoV-2 in Washington state. Science 370, 571–575 (2020).
28
M. Zeller, K. Gangavarapu, C. Anderson, A. R. Smither, J. A. Vanchiere, R. Rose, D. J. Snyder, G. Dudas, A. Watts, N. L. Matteson, R. Robles-Sikisaka, M. Marshall, A. K. Feehan, G. Sabino-Santos Jr., A. R. Bell-Kareem, L. D. Hughes, M. Alkuzweny, P. Snarski, J. Garcia-Diaz, R. S. Scott, L. I. Melnik, R. Klitting, M. McGraw, P. Belda-Ferre, P. DeHoff, S. Sathe, C. Marotz, N. D. Grubaugh, D. J. Nolan, A. C. Drouin, K. J. Genemaras, K. Chao, S. Topol, E. Spencer, L. Nicholson, S. Aigner, G. W. Yeo, L. Farnaes, C. A. Hobbs, L. C. Laurent, R. Knight, E. B. Hodcroft, K. Khan, D. N. Fusco, V. S. Cooper, P. Lemey, L. Gardner, S. L. Lamers, J. P. Kamil, R. F. Garry, M. A. Suchard, K. G. Andersen, Emergence of an early SARS-CoV-2 epidemic in the United States. Cell 184, 4939–4952.e15 (2021).
29
C. Alteri, V. Cento, A. Piralla, V. Costabile, M. Tallarita, L. Colagrossi, S. Renica, F. Giardina, F. Novazzi, S. Gaiarsa, E. Matarazzo, M. Antonello, C. Vismara, R. Fumagalli, O. M. Epis, M. Puoti, C. F. Perno, F. Baldanti, Genomic epidemiology of SARS-CoV-2 reveals multiple lineages and early spread of SARS-CoV-2 infections in Lombardy, Italy. Nat. Commun. 12, 434 (2021).
30
L. du Plessis, O. Pybus, Further musings on the tMRCA. Virological (2020); https://virological.org/t/further-musings-on-the-tmrca/340.
31
J. Giesecke, Primary and index cases. Lancet 384, 2024 (2014).
32
Centers for Disease Control and Prevention (CDC), Prevalence of IgG antibody to SARS-associated coronavirus in animal traders—Guangdong Province, China, 2003. MMWR Morb. Mortal. Wkly. Rep. 52, 986–987 (2003).
33
A. Marí Saéz, S. Weiss, K. Nowak, V. Lapeyre, F. Zimmermann, A. Düx, H. S. Kühl, M. Kaba, S. Regnaut, K. Merkel, A. Sachse, U. Thiesen, L. Villányi, C. Boesch, P. W. Dabrowski, A. Radonić, A. Nitsche, S. A. J. Leendertz, S. Petterson, S. Becker, V. Krähling, E. Couacy-Hymann, C. Akoua-Koffi, N. Weber, L. Schaade, J. Fahr, M. Borchert, J. F. Gogarten, S. Calvignac-Spencer, F. H. Leendertz, Investigating the zoonotic origin of the West African Ebola epidemic. EMBO Mol. Med. 7, 17–23 (2015).
34
WHO Headquarters, WHO-convened global study of origins of SARS-CoV-2: China Part (2021); https://www.who.int/publications/i/item/who-convened-global-study-of-origins-of-sars-cov-2-china-part.
35
X. Zhang, Y. Tan, Y. Ling, G. Lu, F. Liu, Z. Yi, X. Jia, M. Wu, B. Shi, S. Xu, J. Chen, W. Wang, B. Chen, L. Jiang, S. Yu, J. Lu, J. Wang, M. Xu, Z. Yuan, Q. Zhang, X. Zhang, G. Zhao, S. Wang, S. Chen, H. Lu, Viral and host factors related to the clinical outcome of COVID-19. Nature 583, 437–440 (2020).
36
E. O. Nsoesie, B. Rader, Y. L. Barnoon, L. Goodwin, J. Brownstein, Analysis of hospital traffic and search engine data in Wuhan China indicates early disease activity in the Fall of 2019. Dig. Acc. Scholar. Harv. 2, 019 (2020).
37
L. Chang, L. Zhao, Y. Xiao, T. Xu, L. Chen, Y. Cai, X. Dong, C. Wang, X. Xiao, L. Ren, L. Wang, Serosurvey for SARS-CoV-2 among blood donors in Wuhan, China from September to December 2019. Protein Cell 10.1093/procel/pwac013 (2022).
38
E. C. Holmes, S. A. Goldstein, A. L. Rasmussen, D. L. Robertson, A. Crits-Christoph, J. O. Wertheim, S. J. Anthony, W. S. Barclay, M. F. Boni, P. C. Doherty, J. Farrar, J. L. Geoghegan, X. Jiang, J. L. Leibowitz, S. J. D. Neil, T. Skern, S. R. Weiss, M. Worobey, K. G. Andersen, R. F. Garry, A. Rambaut, The origins of SARS-CoV-2: A critical review. Cell 184, 4848–4856 (2021).
39
M. Worobey, J. I. Levy, L. M. Malpica Serrano, A. Crits-Christoph, J. E. Pekar, S. A. Goldstein, A. L. Rasmussen, M. U. G. Kraemer, C. Newman, M. P. G. Koopmans, M. A. Suchard, J. O. Wertheim, P. Lemey, D. L. Robertson, R. F. Garry, E. C. Holmes, A. Rambaut, K. G. Andersen, The Huanan Seafood Wholesale Market in Wuhan was the early epicenter of the COVID-19 pandemic. Science 377, 951–959 (2022).
40
X. Xiao, C. Newman, C. D. Buesching, D. W. Macdonald, Z.-M. Zhou, Animal sales from Wuhan wet markets immediately prior to the COVID-19 pandemic. Sci. Rep. 11, 11898 (2021).
41
C. M. Freuling, A. Breithaupt, T. Müller, J. Sehl, A. Balkema-Buschmann, M. Rissmann, A. Klein, C. Wylezich, D. Höper, K. Wernike, A. Aebischer, D. Hoffmann, V. Friedrichs, A. Dorhoi, M. H. Groschup, M. Beer, T. C. Mettenleiter, Susceptibility of raccoon dogs for experimental SARS-CoV-2 infection. Emerg. Infect. Dis. 26, 2982–2985 (2020).
42
S. M. Porter, A. E. Hartwig, H. Bielefeldt-Ohmann, A. M. Bosco-Lauth, J. Root, Susceptibility of wild canids to severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). bioRxiv 478082 [Preprint] (2022); https://doi.org/10.1101/2022.01.27.478082.
43
G. Gao, W. Liu, P. Liu, W. Lei, Z. Jia, X. He, L.-L. Liu, W. Shi, Y. Tan, S. Zou, X. Zhao, G. Wong, J. Wang, F. Wang, G. Wang, K. Qin, R. Gao, J. Zhang, M. Li, W. Xiao, Y. Guo, Z. Xu, Y. Zhao, J. Song, J. Zhang, W. Zhen, W. Zhou, B. Ye, J. Song, M. Yang, W. Zhou, Y. Bi, K. Cai, D. Wang, W. Tan, J. Han, W. Xu, G. Wu, Surveillance of SARS-CoV-2 in the environment and animal samples of the Huanan Seafood Market. Research Square [Preprint] (2022); https://doi.org/10.21203/rs.3.rs-1370392/v1.
44
L. du Plessis, J. T. McCrone, A. E. Zarebski, V. Hill, C. Ruis, B. Gutierrez, J. Raghwani, J. Ashworth, R. Colquhoun, T. R. Connor, N. R. Faria, B. Jackson, N. J. Loman, Á. O’Toole, S. M. Nicholls, K. V. Parag, E. Scher, T. I. Vasylyeva, E. M. Volz, A. Watts, I. I. Bogoch, K. Khan, D. M. Aanensen, M. U. G. Kraemer, A. Rambaut, O. G. Pybus; COVID-19 Genomics UK (COG-UK) Consortium, Establishment and lineage dynamics of the SARS-CoV-2 epidemic in the UK. Science 371, 708–712 (2021).
45
Chinese SARS Molecular Epidemiology Consortium, Molecular evolution of the SARS coronavirus during the course of the SARS epidemic in China. Science 303, 1666–1669 (2004).
46
G. Dudas, L. M. Carvalho, A. Rambaut, T. Bedford, MERS-CoV spillover at the camel-human interface. eLife 7, e31257 (2018).
47
J. A. Lednicky, M. S. Tagliamonte, S. K. White, M. A. Elbadry, M. M. Alam, C. J. Stephenson, T. S. Bonny, J. C. Loeb, T. Telisma, S. Chavannes, D. A. Ostrov, C. Mavian, V. M. Beau De Rochars, M. Salemi, J. G. Morris Jr., Independent infections of porcine deltacoronavirus among Haitian children. Nature 600, 133–137 (2021).
48
B. Kan, M. Wang, H. Jing, H. Xu, X. Jiang, M. Yan, W. Liang, H. Zheng, K. Wan, Q. Liu, B. Cui, Y. Xu, E. Zhang, H. Wang, J. Ye, G. Li, M. Li, Z. Cui, X. Qi, K. Chen, L. Du, K. Gao, Y.-T. Zhao, X.-Z. Zou, Y.-J. Feng, Y.-F. Gao, R. Hai, D. Yu, Y. Guan, J. Xu, Molecular evolution analysis and geographic investigation of severe acute respiratory syndrome coronavirus-like virus in palm civets at an animal market and on farms. J. Virol. 79, 11892–11900 (2005).
49
K. G. Andersen, A. Rambaut, W. I. Lipkin, E. C. Holmes, R. F. Garry, The proximal origin of SARS-CoV-2. Nat. Med. 26, 450–452 (2020).
50
V. L. Hale, P. M. Dennis, D. S. McBride, J. M. Nolting, C. Madden, D. Huey, M. Ehrlich, J. Grieser, J. Winston, D. Lombardi, S. Gibson, L. Saif, M. L. Killian, K. Lantz, R. M. Tell, M. Torchetti, S. Robbe-Austerman, M. I. Nelson, S. A. Faith, A. S. Bowman, SARS-CoV-2 infection in free-ranging white-tailed deer. Nature 602, 481–486 (2022).
51
J. C. Chandler, S. N. Bevins, J. W. Ellis, T. J. Linder, R. M. Tell, M. Jenkins-Moore, J. J. Root, J. B. Lenoch, S. Robbe-Austerman, T. J. DeLiberto, T. Gidlewski, M. Kim Torchetti, S. A. Shriner, SARS-CoV-2 exposure in wild white-tailed deer (Odocoileus virginianus). Proc. Natl. Acad. Sci. U.S.A. 118, e2114828118 (2021).
52
L. Lu, R. S. Sikkema, F. C. Velkers, D. F. Nieuwenhuijse, E. A. J. Fischer, P. A. Meijer, N. Bouwmeester-Vincken, A. Rietveld, M. C. A. Wegdam-Blans, P. Tolsma, M. Koppelman, L. A. M. Smit, R. W. Hakze-van der Honing, W. H. M. van der Poel, A. N. van der Spek, M. A. H. Spierenburg, R. J. Molenaar, J. Rond, M. Augustijn, M. Woolhouse, J. A. Stegeman, S. Lycett, B. B. Oude Munnink, M. P. G. Koopmans, Adaptation, spread and transmission of SARS-CoV-2 in farmed minks and associated humans in the Netherlands. Nat. Commun. 12, 6802 (2021).
53
B. B. Oude Munnink, R. S. Sikkema, D. F. Nieuwenhuijse, R. J. Molenaar, E. Munger, R. Molenkamp, A. van der Spek, P. Tolsma, A. Rietveld, M. Brouwer, N. Bouwmeester-Vincken, F. Harders, R. Hakze-van der Honing, M. C. A. Wegdam-Blans, R. J. Bouwstra, C. GeurtsvanKessel, A. A. van der Eijk, F. C. Velkers, L. A. M. Smit, A. Stegeman, W. H. M. van der Poel, M. P. G. Koopmans, Transmission of SARS-CoV-2 on mink farms between humans and mink and back to humans. Science 371, 172–177 (2021).
54
S. V. Kuchipudi, M. Surendran-Nair, R. M. Ruden, M. Yon, R. H. Nissly, K. J. Vandegrift, R. K. Nelli, L. Li, B. M. Jayarao, C. D. Maranas, N. Levine, K. Willgert, A. J. K. Conlan, R. J. Olsen, J. J. Davis, J. M. Musser, P. J. Hudson, V. Kapur, Multiple spillovers from humans and onward transmission of SARS-CoV-2 in white-tailed deer. Proc. Natl. Acad. Sci. U.S.A. 119, e2121644119 (2022).
55
H.-L. Yen, T. H. C. Sit, C. J. Brackman, S. S. Y. Chuk, S. M. S. Cheng, H. Gu, L. D. J. Chang, P. Krishnan, D. Y. M. Ng, G. Y. Z. Liu, M. M. Y. Hui, S. Y. Ho, K. W. S. Tam, P. Y. T. Law, W. Su, S. F. Sia, K.-T. Choy, S. S. Y. Cheuk, S. P. N. Lau, A. W. Y. Tang, J. C. T. Koo, L. Yung, G. Leung, J. S. M. Peiris, L. L. M. Poon, Transmission of SARS-CoV-2 delta variant (AY.127) from pet hamsters to humans, leading to onward human-to-human transmission: A case study. Lancet 399, 1070–1078 (2022).
56
Y. Shu, J. McCauley, GISAID: Global initiative on sharing all influenza data - from vision to reality. Euro Surveill. 22, 30494 (2017).
57
K. Katoh, D. M. Standley, MAFFT multiple sequence alignment software version 7: Improvements in performance and usability. Mol. Biol. Evol. 30, 772–780 (2013).
58
N. De Maio, C. Walker, R. Borges, L. Weilguny, G. Slodkowicz, N. Goldman, Masking strategies for SARS-CoV-2 alignments. Virological (2020); https://virological.org/t/masking-strategies-for-sars-cov-2-alignments/480.
59
B. Q. Minh, H. A. Schmidt, O. Chernomor, D. Schrempf, M. D. Woodhams, A. von Haeseler, R. Lanfear, IQ-TREE 2: New Models and Efficient Methods for Phylogenetic Inference in the Genomic Era. Mol. Biol. Evol. 37, 1530–1534 (2020).
60
P. Sagulenko, V. Puller, R. A. Neher, TreeTime: Maximum-likelihood phylodynamic analysis. Virus Evol. 4, vex042 (2018).
61
M. A. Suchard, P. Lemey, G. Baele, D. L. Ayres, A. J. Drummond, A. Rambaut, Bayesian phylogenetic and phylodynamic data integration using BEAST 1.10. Virus Evol. 4, vey016 (2018).
62
N. Moshiri, FAVITES-COVID-Lite: A simplified (and much faster) simulation pipeline specifically for COVID-19 contact + transmission + phylogeny + sequence simulation. Github (2022); https://github.com/niemasd/FAVITES-COVID-Lite.
63
X. Hao, S. Cheng, D. Wu, T. Wu, X. Lin, C. Wang, Reconstruction of the full transmission dynamics of COVID-19 in Wuhan. Nature 584, 420–424 (2020).
64
J. E. Pekar et al., Code for: The molecular epidemiology of multiple zoonotic origins of SARS-CoV-2. Zenodo (2022); https://doi.org/10.5281/zenodo.6585475.
65
J. E. Pekar et al., Data for: The molecular epidemiology of multiple zoonotic origins of SARS-CoV-2. Zenodo (2022); https://doi.org/10.5281/zenodo.6887186.
66
J. Hadfield, C. Megill, S. M. Bell, J. Huddleston, B. Potter, C. Callender, P. Sagulenko, T. Bedford, R. A. Neher, Nextstrain: Real-time tracking of pathogen evolution. Bioinformatics 34, 4121–4123 (2018).
68
H. Li, Minimap2: Pairwise alignment for nucleotide sequences. Bioinformatics 34, 3094–3100 (2018).
69
H. Li, B. Handsaker, A. Wysoker, T. Fennell, J. Ruan, N. Homer, G. Marth, G. Abecasis, R. Durbin; 1000 Genome Project Data Processing Subgroup, The Sequence Alignment/Map format and SAMtools. Bioinformatics 25, 2078–2079 (2009).
70
N. D. Grubaugh, K. Gangavarapu, J. Quick, N. L. Matteson, J. G. De Jesus, B. J. Main, A. L. Tan, L. M. Paul, D. E. Brackney, S. Grewal, N. Gurfield, K. K. A. Van Rompay, S. Isern, S. F. Michael, L. L. Coffey, N. J. Loman, K. G. Andersen, An amplicon-based sequencing framework for accurately measuring intrahost virus diversity using PrimalSeq and iVar. Genome Biol. 20, 8 (2019).
72
G. Dudas, baltic: baltic - backronymed adaptable lightweight tree import code for molecular phylogeny manipulation, analysis and visualisation. Github (2021); https://github.com/evogytis/baltic.
73
S. L. Kosakovsky Pond, D. Posada, M. B. Gravenor, C. H. Woelk, S. D. W. Frost, GARD: A genetic algorithm for recombination detection. Bioinformatics 22, 3096–3098 (2006).
74
D. P. Martin, B. Murrell, M. Golden, A. Khoosal, B. Muhire, RDP4: Detection and analysis of recombination patterns in virus genomes. Virus Evol. 1, vev003 (2015).
75
H. M. Lam, O. Ratmann, M. F. Boni, Improved algorithmic complexity for the 3SEQ recombination detection algorithm. Mol. Biol. Evol. 35, 247–251 (2018).
76
M. F. Boni, P. Lemey, X. Jiang, T. T.-Y. Lam, B. W. Perry, T. A. Castoe, A. Rambaut, D. L. Robertson, Evolutionary origins of the SARS-CoV-2 sarbecovirus lineage responsible for the COVID-19 pandemic. Nat. Microbiol. 5, 1408–1417 (2020).
77
A. Rambaut, T. T. Lam, L. Max Carvalho, O. G. Pybus, Exploring the temporal structure of heterochronous sequences using TempEst (formerly Path-O-Gen). Virus Evol. 2, vew007 (2016).
78
A. Rambaut, A. J. Drummond, D. Xie, G. Baele, M. A. Suchard, Posterior summarization in Bayesian phylogenetics using Tracer 1.7. Syst. Biol. 67, 901–904 (2018).
79
F. Li, Y.-Y. Li, M.-J. Liu, L.-Q. Fang, N. E. Dean, G. W. K. Wong, X.-B. Yang, I. Longini, M. E. Halloran, H.-J. Wang, P.-L. Liu, Y.-H. Pang, Y.-Q. Yan, S. Liu, W. Xia, X.-X. Lu, Q. Liu, Y. Yang, S.-Q. Xu, Household transmission of SARS-CoV-2 and risk factors for susceptibility and infectivity in Wuhan: A retrospective observational study. Lancet Infect. Dis. 21, 617–628 (2021).
80
S. Funk, EpiNow2: Estimate realtime case counts and time-varying epidemiological parameters. Github (2020); https://github.com/epiforecasts/EpiNow2.
81
N. Moshiri, NiemaGraphGen: A memory-efficient global-scale contact network simulation toolkit. GIGAbyte 10.46471/gigabyte.37 (2022).
82
A. L. Barabasi, R. Albert, Emergence of scaling in random networks. Science 286, 509–512 (1999).
83
S. Eubank, H. Guclu, V. S. Kumar, M. V. Marathe, A. Srinivasan, Z. Toroczkai, N. Wang, Modelling disease outbreaks in realistic urban social networks. Nature 429, 180–184 (2004).
84
J. Mossong, N. Hens, M. Jit, P. Beutels, K. Auranen, R. Mikolajczyk, M. Massari, S. Salmaso, G. S. Tomba, J. Wallinga, J. Heijne, M. Sadkowska-Todys, M. Rosinska, W. J. Edmunds, Social contacts and mixing patterns relevant to the spread of infectious diseases. PLOS Med. 5, e74 (2008).
85
F. D. Sahneh, A. Vajdi, H. Shakeri, F. Fan, C. Scoglio, GEMFsim: A stochastic simulator for the generalized epidemic modeling framework. J. Comput. Sci. 22, 36–44 (2017).
86
X. Yang, Y. Yu, J. Xu, H. Shu, J. Xia, H. Liu, Y. Wu, L. Zhang, Z. Yu, M. Fang, T. Yu, Y. Wang, S. Pan, X. Zou, S. Yuan, Y. Shang, Clinical course and outcomes of critically ill patients with SARS-CoV-2 pneumonia in Wuhan, China: A single-centered, retrospective, observational study. Lancet Respir. Med. 8, 475–481 (2020).
87
F. Zhou, T. Yu, R. Du, G. Fan, Y. Liu, Z. Liu, J. Xiang, Y. Wang, B. Song, X. Gu, L. Guan, Y. Wei, H. Li, X. Wu, J. Xu, S. Tu, Y. Zhang, H. Chen, B. Cao, Clinical course and risk factors for mortality of adult inpatients with COVID-19 in Wuhan, China: A retrospective cohort study. Lancet 395, 1054–1062 (2020).
88
J. Yang, X. Chen, X. Deng, Z. Chen, H. Gong, H. Yan, Q. Wu, H. Shi, S. Lai, M. Ajelli, C. Viboud, P. H. Yu, Disease burden and clinical severity of the first pandemic wave of COVID-19 in Wuhan, China. Nat. Commun. 11, 5411 (2020).
89
N. Moshiri, TreeSwift: A massively scalable Python tree package. SoftwareX 11, 100436 (2020).
90
J. Ma, First Chinese coronavirus cases may have been infected in October 2019, says new research. South China Morning Post (2021); https://www.scmp.com/news/china/science/article/3126499/first-chinese-covid-19-cases-may-have-been-infected-october-2019.
91
K. Andersen, Clock and TMRCA based on 27 genomes. Virological (2020); https://virological.org/t/clock-and-tmrca-based-on-27-genomes/347/6.
92
L. Pipes, H. Wang, J. P. Huelsenbeck, R. Nielsen, Assessing uncertainty in the rooting of the SARS-CoV-2 phylogeny. Mol. Biol. Evol. 38, 1537–1543 (2021).
93
T. Murata, A. Sakurai, M. Suzuki, S. Komoto, T. Ide, T. Ishihara, Y. Doi, Shedding of viable virus in asymptomatic SARS-CoV-2 carriers. mSphere 6, e00019-21 (2021).
94
T. Sekizuka, K. Itokawa, T. Kageyama, S. Saito, I. Takayama, H. Asanuma, N. Nao, R. Tanaka, M. Hashino, T. Takahashi, H. Kamiya, T. Yamagishi, K. Kakimoto, M. Suzuki, H. Hasegawa, T. Wakita, M. Kuroda, Haplotype networks of SARS-CoV-2 infections in the Diamond Princess cruise ship outbreak. Proc. Natl. Acad. Sci. U.S.A. 117, 20198–20201 (2020).
95
Y. Turakhia, B. Thornlow, A. S. Hinrichs, N. De Maio, L. Gozashti, R. Lanfear, D. Haussler, R. Corbett-Detig, Ultrafast sample placement on existing tRees (UShER) enables real-time phylogenetics for the SARS-CoV-2 pandemic. Nat. Genet. 53, 809–816 (2021).
96
P. Zhou, X.-L. Yang, X.-G. Wang, B. Hu, L. Zhang, W. Zhang, H.-R. Si, Y. Zhu, B. Li, C.-L. Huang, H.-D. Chen, J. Chen, Y. Luo, H. Guo, R.-D. Jiang, M.-Q. Liu, Y. Chen, X.-R. Shen, X. Wang, X.-S. Zheng, K. Zhao, Q.-J. Chen, F. Deng, L.-L. Liu, B. Yan, F.-X. Zhan, Y.-Y. Wang, G.-F. Xiao, Z.-L. Shi, A pneumonia outbreak associated with a new coronavirus of probable bat origin. Nature 579, 270–273 (2020).
97
M. Ghafari, L. du Plessis, J. Raghwani, S. Bhatt, B. Xu, O. G. Pybus, A. Katzourakis, Purifying selection determines the short-term time dependency of evolutionary rates in SARS-CoV-2 and pH1N1 influenza. Mol. Biol. Evol. 39, msac009 (2022).
98
S. Duchêne, E. C. Holmes, S. Y. W. Ho, Analyses of evolutionary dynamics in viruses are hindered by a time-dependent bias in rate estimates. Proc. Biol. Sci. 281, 20140732 (2014).
99
J. Dushoff, S. W. Park, Speed and strength of an epidemic intervention. Proc. Biol. Sci. 288, 20201556 (2021).
100
J. T. Wu, K. Leung, M. Bushman, N. Kishore, R. Niehus, P. M. de Salazar, B. J. Cowling, M. Lipsitch, G. M. Leung, Estimating clinical severity of COVID-19 from the transmission dynamics in Wuhan, China. Nat. Med. 26, 506–510 (2020).
101
C. Huang, Y. Wang, X. Li, L. Ren, J. Zhao, Y. Hu, L. Zhang, G. Fan, J. Xu, X. Gu, Z. Cheng, T. Yu, J. Xia, Y. Wei, W. Wu, X. Xie, W. Yin, H. Li, M. Liu, Y. Xiao, H. Gao, L. Guo, J. Xie, G. Wang, R. Jiang, Z. Gao, Q. Jin, J. Wang, B. Cao, Clinical features of patients infected with 2019 novel coronavirus in Wuhan, China. Lancet 395, 497–506 (2020).
102
R. Ke, E. Romero-Severson, S. Sanche, N. Hengartner, Estimating the reproductive number R0 of SARS-CoV-2 in the United States and eight European countries and implications for vaccination. J. Theor. Biol. 517, 110621 (2021).
103
L. Pellis, F. Scarabel, H. B. Stage, C. E. Overton, L. H. K. Chappell, E. Fearon, E. Bennett, K. A. Lythgoe, T. A. House, I. Hall; University of Manchester COVID-19 Modelling Group, Challenges in control of COVID-19: Short doubling time and long delay to effect of interventions. Philos. Trans. R. Soc. Lond. B Biol. Sci. 376, 20200264 (2021).
104
Q. Li, X. Guan, P. Wu, X. Wang, L. Zhou, Y. Tong, R. Ren, K. S. M. Leung, E. H. Y. Lau, J. Y. Wong, X. Xing, N. Xiang, Y. Wu, C. Li, Q. Chen, D. Li, T. Liu, J. Zhao, M. Liu, W. Tu, C. Chen, L. Jin, R. Yang, Q. Wang, S. Zhou, R. Wang, H. Liu, Y. Luo, Y. Liu, G. Shao, H. Li, Z. Tao, Y. Yang, Z. Deng, B. Liu, Z. Ma, Y. Zhang, G. Shi, T. T. Y. Lam, J. T. Wu, G. F. Gao, B. J. Cowling, B. Yang, G. M. Leung, Z. Feng, Early transmission dynamics in Wuhan, China, of novel coronavirus-infected pneumonia. N. Engl. J. Med. 382, 1199–1207 (2020).
105
M. Chinazzi, J. T. Davis, M. Ajelli, C. Gioannini, M. Litvinova, S. Merler, A. Pastore Y Piontti, K. Mu, L. Rossi, K. Sun, C. Viboud, X. Xiong, H. Yu, M. E. Halloran, I. M. Longini Jr., A. Vespignani, The effect of travel restrictions on the spread of the 2019 novel coronavirus (COVID-19) outbreak. Science 368, 395–400 (2020).
106
R. Li, S. Pei, B. Chen, Y. Song, T. Zhang, W. Yang, J. Shaman, Substantial undocumented infection facilitates the rapid dissemination of novel coronavirus (SARS-CoV-2). Science 368, 489–493 (2020).
107
N. Moshiri, CoaTran: Coalescent tree simulation along a transmission network. bioRxiv [Preprint] (2020).
108
K. M. Braun, G. K. Moreno, C. Wagner, M. A. Accola, W. M. Rehrauer, D. A. Baker, K. Koelle, D. H. O’Connor, T. Bedford, T. C. Friedrich, L. H. Moncla, Acute SARS-CoV-2 infections harbor limited within-host diversity and transmit via tight transmission bottlenecks. PLOS Pathog. 17, e1009849 (2021).
109
J. Ma, Coronavirus: China’s first confirmed Covid-19 case traced back to November 17. South China Morning Post (2020); https://www.scmp.com/news/china/society/article/3074991/coronavirus-chinas-first-confirmed-covid-19-case-traced-back.
Assessing Probabilities for the Two Leading SARS-CoV-2 Origin Narratives
Abstract: Here we briefly summarize what are currently the two leading narratives on the origins of SARS-CoV-2 and critically assess arguments that have arisen in the most recent literature on this subject. In particular, we address the relevant probabilities for the recent postulate of two separate zoonotic spillovers of two different lineages of the virus, and we point out which additional specific probabilities are needed to support this postulate. The recent study presents the likelihoods Prob(2 lineages | single spillover) and Prob(2 lineages | double spillover). However, what is needed to establish the credibility of the two-spillover hypothesis is the ratio of the posterior probabilities: Prob(single spillover | 2 lineages) / Prob(double spillover | 2 lineages).
Text
Two distinct narratives have arisen in the scientific literature concerning the origins of SARS-CoV-2. Summarizing the relevant arguments may help to assess the plausibility of some of the underlying assumptions in each version.
One narrative argues in favor of initial zoonotic spillover at the Huanan seafood market in Wuhan, associated with the cluster of cases in that area occurring in mid-December 2019. Connected with this narrative is the recent postulate of two separate zoonotic spillovers of two different lineages of the virus.
A second narrative dates the initial case to November or even late October 2019 and attributes it to an undetermined source of infection, or possibly to mutation of an existing, milder version of the virus. It does not require two separate spillovers. While this second narrative does not preclude an origin in the Huanan market, it relegates the mid-December outbreak to the status of a so-called superspreader event rather than evidence of the origin.
One well-publicized line of evidence supporting the first narrative is based on spatial analyses [1]. Detailed studies of the spatial distribution of December cases have confirmed the initial suspicion of the Huanan market as a potential epicenter, although they do not themselves indicate whether the initial cases were infected inside or outside the market. Additionally, a spatial analysis of environmental samples taken from the market in January and February 2020 suggests an association with locations where live animals had been kept. Critics of that latter study note that the environmental sampling was non-random and that there is a strong possibility that these samples are of human origin, since their sequences match those found in patients [2].
Another line of evidence, which appears to support the second narrative, is based on the timeline of the evolution of the virus. In particular, two distinct virus lineages, lineage A and lineage B, were present in the earliest confirmed cases, so time must have elapsed either while one evolved into the other (involving two substitutions) or while both evolved from a common ancestor that subsequently went extinct. Simulations of viral evolution in the human population based on plausible estimates of the mutation rate have suggested times of the most recent common ancestor in mid- to late November [3, 4, 5]. Another simulation study [6] goes further in attempting to distinguish the index case from the most recent common ancestor (owing to the extinction of branch lineages) and indicates an index case between mid-October and mid-November. If one allows for evolutionary adaptation of the virus within the human population, then the initial transmission could of course have occurred earlier [5]. Clearly, the earlier the date of the index case, the greater the subsequent dispersal of cases, and thus the lower the information content of the spatial distribution of cases observed some three or four weeks later, in the second half of December. This would weaken the evidence for the Huanan market zoonotic spillover narrative.
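The timeline argument rests on simple arithmetic: at a given clock rate, a fixed number of substitutions takes, on average, a predictable amount of time. The sketch below shows that arithmetic under stated assumptions. The clock rate of 1 × 10⁻³ substitutions per site per year and the genome length of 29,903 nucleotides (the length of the Wuhan-Hu-1 reference) are illustrative inputs chosen here, not values taken from the studies cited above, and the function name is hypothetical. It treats substitutions along a single lineage as a Poisson process with a constant rate and ignores rate variation, selection, and sampling.

```python
DAYS_PER_YEAR = 365.25


def expected_days_for_substitutions(k, rate_per_site_per_year, genome_length):
    """Expected waiting time, in days, for k substitutions along one lineage.

    Assumes substitutions arrive as a Poisson process at a constant
    per-genome rate (rate_per_site_per_year * genome_length). The waiting
    time for k events is then gamma-distributed with mean k / rate.
    """
    per_genome_per_day = rate_per_site_per_year * genome_length / DAYS_PER_YEAR
    return k / per_genome_per_day


if __name__ == "__main__":
    # Illustrative (assumed) inputs: ~1e-3 subs/site/year, 29,903-nt genome.
    days = expected_days_for_substitutions(2, 1e-3, 29_903)
    print(f"Expected time for 2 substitutions: {days:.1f} days")
```

With these assumed inputs, two substitutions take about 24 days on average. Because the waiting time is gamma-distributed with shape 2, its spread around that mean is wide (coefficient of variation of about 0.7), which is one reason such dates carry broad uncertainty.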
But this second line of reasoning, which relies on the evolutionary timeline to push the index case further back in time, changes if some of the viral evolution occurred within an animal population rather than a human population. Accordingly, a recent study [7] has proposed the hypothesis of two separate zoonotic spillovers at the Huanan market, one of lineage A and one of lineage B, occurring within a relatively short period before the surge of cases in mid-December. In that study, support for the two-spillover hypothesis comes from simulations of viral evolution in the human population starting from a single spillover. These show a low probability of the observed phylogenetic structure, i.e., the emergence of two, and only two, primary lineages (namely A and B) in the human population, and a much higher probability that more than two lineages would have emerged. This low probability of just two lineages arises essentially from the large number of alternative substitutions that could have occurred, besides those leading to the two lineages, when starting from a hypothetical common ancestor. A two-spillover argument can shorten the evolutionary timeline for at least one of the two lineages and thus puts the Huanan market origin narrative back on the table.
However, the probability of the observed phylogenetic outcome under the two-spillover scenario, which favors the Huanan market origin, is conditional on events that must have occurred in the animal population (a single species or possibly multiple species in a chain of zoonotic events), and these events may themselves have low probabilities. There are two possibilities, each of which may be unlikely. One possibility is that only the two lineages emerged in the animal population (one evolving from the other, or both from a common ancestor) while no other lineage emerged. This may well be a low-probability outcome, just as simulations showed it to be in a human population, even if the evolutionary clock runs at a different rate in the animal population. The other possibility is that multiple lineages emerged in the animal population but only two spilled over. The probability of this depends critically on the (unknown) rate of individual spillovers: high enough for two lineages to spill over, but not so high that more than two did.
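The last point, that the spillover rate must be high enough for two lineages to cross over but not so high that more did, can be made concrete with a toy model. The sketch below assumes, purely for illustration, that the number of spillover events in the relevant window follows a Poisson distribution with mean `rate`, and that each event introduces a distinct lineage; neither assumption comes from the studies discussed here, and the function names are hypothetical.

```python
import math


def poisson_pmf(k, rate):
    """P(N = k) for N ~ Poisson(rate)."""
    return rate**k * math.exp(-rate) / math.factorial(k)


def prob_more_than(k, rate):
    """P(N > k) for N ~ Poisson(rate)."""
    return 1.0 - sum(poisson_pmf(j, rate) for j in range(k + 1))


if __name__ == "__main__":
    # Hypothetical expected numbers of spillovers in the window.
    for rate in (0.5, 1.0, 2.0, 5.0):
        print(
            f"rate={rate:3.1f}  "
            f"P(exactly 2)={poisson_pmf(2, rate):.3f}  "
            f"P(more than 2)={prob_more_than(2, rate):.3f}"
        )
```

In this toy model, P(exactly 2) is largest when `rate` = 2, where it equals 2e⁻² ≈ 0.27; even at that rate, more than two spillovers (≈ 0.32) is slightly more likely, and at higher rates it dominates.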
To put these points another way, the study [7] presents the probability of two lineages occurring given a single spillover and also the probability of two lineages occurring given a double spillover. The first is very small and the second, as might be expected, is much larger. However, this does not in itself establish the credibility of the two-spillover hypothesis. For a more conclusive outcome, what is needed is the ratio of the posterior probability of a single spillover given the existence of two lineages to the posterior probability of a double spillover given the existence of two lineages.
Unfortunately, some of the underlying probabilities needed for this calculation depend on processes that would be difficult to quantify: the rate of spillovers from an as yet undetermined source, and the rate of emergence of distinct lineages in an as yet unknown animal population.
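The distinction between the likelihoods reported in [7] and the posterior ratio asked for here is Bayes' theorem in odds form: posterior odds = (likelihood ratio) × (prior odds), where the shared term P(2 lineages) cancels. The sketch below makes that explicit. The numbers in the usage example are placeholders chosen for illustration, not values reported in [7] or elsewhere, and the function name is hypothetical.

```python
def posterior_odds(lik_single, lik_double, prior_single, prior_double):
    """Return P(single | data) / P(double | data).

    By Bayes' theorem, P(H | data) is proportional to P(data | H) * P(H);
    the normalizing constant P(data) is the same for both hypotheses and
    cancels in the ratio, leaving: likelihood ratio * prior odds.
    """
    likelihood_ratio = lik_single / lik_double  # Bayes factor, single vs. double
    prior_odds = prior_single / prior_double
    return likelihood_ratio * prior_odds


if __name__ == "__main__":
    # Placeholder likelihoods: P(2 lineages | single) and P(2 lineages | double).
    lik_single, lik_double = 0.01, 0.2
    # The same likelihoods combined with different (hypothetical) prior odds.
    for prior_single, prior_double in ((1, 1), (50, 1)):
        odds = posterior_odds(lik_single, lik_double, prior_single, prior_double)
        print(f"prior odds {prior_single}:{prior_double} -> posterior odds {odds:.2f}")
```

With equal prior odds, the placeholder likelihoods favor a double spillover 20 to 1 (posterior odds 0.05); if the prior odds favored a single spillover 50 to 1, the same likelihoods would give posterior odds of 2.5 in its favor. The likelihoods alone therefore cannot settle the question without the priors, which here depend on the hard-to-quantify spillover and lineage-emergence rates.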
Thus, without further information, the relative probabilities of (a) the one-spillover narrative, in which the virus arose from an undetermined source of infection before December, and (b) the two-spillover narrative, which is compatible with zoonotic spillover at the Huanan market in mid-December, remain unresolved.
Readers can assess the strength of the evidence for these two narratives for themselves, although at present it may appear inconclusive.
References
[1] M. Worobey et al., The Huanan seafood wholesale market in Wuhan was the early epicenter of the COVID-19 pandemic. Science 10.1126/science.abp8715 (2022).
[2] G. Gao et al., Surveillance of SARS-CoV-2 in the environment and animal samples of the Huanan seafood market. Preprint, doi:10.21203/rs.3.rs-1370392/v1 (2022).
[3] X. Zhang et al., Viral and host factors related to the clinical outcome of COVID-19. Nature 583, 437–440 (2020).
[4] S. Duchene et al., Temporal signal and the phylodynamic threshold of SARS-CoV-2. Virus Evol. 6, veaa061 (2020).
[5] K. G. Andersen et al., The proximal origin of SARS-CoV-2. Nat. Med. 26, 450–452 (2020).
[6] J. E. Pekar et al., Timing the SARS-CoV-2 index case in Hubei province. Science 372, 412–417 (2021).
[7] J. E. Pekar et al., The molecular epidemiology of multiple zoonotic origins of SARS-CoV-2. Science 10.1126/science.abp8337 (2022).
Time for a comparative analysis of all hypotheses: a response to this article and its twin article (10.1126/science.abp8337)
This article and its twin article by Worobey et al. (1) on the origin of COVID-19 at the Huanan Seafood Wholesale Market (HSWM) in Wuhan are of high quality but nevertheless raise several questions.
Although very good, this article is a simulation based on selected sequences and on a virtual, computer-generated root. Furthermore, the SARS-CoV-2 genome is saturated, with a low transition/transversion ratio (2), and a large number of mutations are due to host-driven RNA editing and defense mechanisms (3). Given these biases, it is not possible to calculate a reliable MRCA (most recent common ancestor) or reliable phylogenies. The authors also concluded that lineages A and B infected humans independently from different animals. SARS-CoV-2 evolves by generating variants that are more transmissible than their parental lineage (4). In agreement with the authors, lineages A and B should simply be seen as more transmissible variants of a progenitor virus, which they named recCA and which was previously referred to as ProCoV2 (5) or SARS-CoV-2-AD (6). A dominant lineage generates a series of variants, one of which eventually takes over to become the new predominant lineage; this is consistent with the large basal polytomy described in this article as the most common topology. This has been seen with all SARS-CoV-2 variants, including lineages A and B. By this logic, the progenitor of lineages A and B was a less transmissible virus, and thus less likely to trigger disease. This progenitor was itself a variant of an even less transmissible virus that will remain unknown. This chain of viral evolution operates both before and after a disease has been characterized.
The short period of viral circulation in humans before the emergence of COVID-19 described in the article is based on the MRCA of lineages A and B, which extends back to October, confirming a previous article (7). However, this calculation only reaches back to the progenitor of lineages A and B. Omicron is the result of about one year of SARS-CoV-2 evolution in humans, but the same calculation applied to the variants BA.5.3.1 and BA.2.75 would only reach back to their progenitor, Omicron (B.1.1.529 or BA.1), and not to Wuhan/Hu-1/2019, the first SARS-CoV-2 lineage to have been characterized. The same applies to lineages A and B: any MRCA analysis will only reach their immediate progenitor, not ancestors already present in humans. Signatures of positive selection in humans were found in the SARS-CoV-2 genome (8), indicating that the virus was indeed circulating in humans. Cryptic circulation of viruses in humans for a long time, even years, before an outbreak has been demonstrated for Ebola (9) and was highlighted by WHO to explain the recent reemergence of monkeypox. It cannot be ruled out for SARS-CoV-2/COVID-19. Pekar et al. themselves concluded in 2021, based on the same sequences, that SARS-CoV-2 had been circulating in humans since at least October 2019 and that HSWM was not involved in the emergence of the virus (7).
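The argument that an MRCA analysis reaches back only to the sampled lineages' most recent common ancestor, and not to older ancestors, is a property of rooted trees. The sketch below finds the MRCA of two tips in a tree stored as a child-to-parent mapping. The genealogy is a deliberately simplified, hypothetical one built from the lineage names mentioned above (intermediate Pango lineages are omitted), and the helper names are illustrative; it shows topology only and does no dating.

```python
def ancestors(parents, node):
    """Return the path from node up to the root, node first."""
    path = [node]
    while node in parents:
        node = parents[node]
        path.append(node)
    return path


def mrca(parents, a, b):
    """Most recent common ancestor of a and b in a rooted tree given as a
    child -> parent mapping; returns None if they share no ancestor."""
    ancestors_of_a = set(ancestors(parents, a))
    for node in ancestors(parents, b):  # walk upward from b
        if node in ancestors_of_a:
            return node
    return None


# Hypothetical, simplified genealogy (child -> parent); many lineages omitted.
PARENTS = {
    "B.1.1.529": "Wuhan-Hu-1",
    "BA.2": "B.1.1.529",
    "BA.2.75": "BA.2",
    "BA.5": "B.1.1.529",
    "BA.5.3.1": "BA.5",
}

if __name__ == "__main__":
    print(mrca(PARENTS, "BA.5.3.1", "BA.2.75"))
```

The query stops at B.1.1.529; nothing in it depends on how long the branch above that node is. This is the sense in which the letter argues that dating the MRCA of lineages A and B need not date the virus's first presence in humans.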
The article by Worobey et al. (1) is a very good geographic and epidemiological analysis, but it provides evidence neither for HSWM as the origin of the human outbreak nor for an animal spillover. As the article's title rightly states, HSWM was the early epicenter of COVID-19. Indeed, Wuhan is where the disease was characterized, and HSWM is where the official index case was located. However, this is merely the place where the symptoms were recognized, not the place where the initial human infection, the primary case, took place. Worobey et al. (1) reasoned that because more than half of the positive cases were located in the southwestern section of the market, where animals were sold, an animal kept in this area must have been the origin of the human infection. They even identified a specific stall, stall 29, which they considered to be the origin of the primary human infection. However, a systematic sampling analysis in and around HSWM found SARS-CoV-2 contamination in many places, mostly in the southwestern section, but not in stall 29 (10). All of these samples were collected after the disease had been identified, and therefore long after the first COVID-19 patients had been recognized. This contamination might have occurred after the early human circulation of SARS-CoV-2, so no conclusion can be drawn. It is like investigating a crime scene after a crowd has been allowed to walk through it. One must also consider how these markets work. Small farmers and hunters usually do not sell their animals at these markets themselves, because it is not economically viable. Instead, they sell their goods to middlemen, a practice common in Asia. Each middleman buys animals from different hunters and farmers and brings them to warehouses where they are stored, often under deplorable sanitary conditions, until they are taken to the market. This practice is highly favorable to infection of both humans and animals.
This is how monkeypox virus is thought to have infected monkeys in 1958. Animals sold in markets are caged, do not move around, and have no contact with people. Conversely, middlemen and vendors interact extensively with people: they move around, shop, eat, and socialize with others. This is consistent with the scattered presence of SARS-CoV-2 in and around HSWM reported by Gao et al. (10).
Clearly, at some point a primary human case was infected by an animal, but we will never know where or when. The theory of an animal spillover of a virus already adapted to humans that immediately triggered human disease is not supported by the available information: adaptation to humans, signatures of positive selection, the scattered distribution of contamination in and around HSWM, and the fact that the infected animal species has never been identified. SARS-CoV-2 most likely evolved in humans before generating the highly transmissible variants A and B. A middleman most likely introduced the virus into HSWM, which acted only as an amplifier. All potential mechanisms of disease emergence should be analyzed comparatively.
References and Notes