Not Just Bats: Researchers Say Numerous Mammals Could Host Unknown Coronaviruses
(Inside Science) -- Most of the coronaviruses that humans encounter cause only mild infections. But the three most recent novel strains -- SARS-CoV-1, MERS-CoV and SARS-CoV-2, which causes COVID-19 -- are unusually virulent, with relatively high fatality rates. They also have a shared origin story: They all developed in other mammalian species.
Cross-species transmission is one of the most common ways scientists become aware of new viruses, but it is also incredibly complicated to model and predict. Adding to the complexity is the process of viral recombination: when two different viruses infect the same cell and trade genetic information to create entirely new viruses.
Variants of a virus arise when minor mutations in a single virus’s genome produce small changes in its properties. Recombinant viruses require two different viruses to infect the same cell, and the resultant virus can be vastly different from its parent viruses -- even different enough to jump from one species to another. Coronaviruses are one of the few families of viruses that are capable of recombination.
In a paper published in the journal Nature Communications this week, computational biologist Maya Wardeh and virologist Marcus Blagrove, both at the University of Liverpool, describe a machine learning model they developed to predict which mammals might be hosts to new recombinant viruses.
The model predicted 11 times more coronavirus-mammal associations than have previously been observed, and over 40 times more mammal species that are likely hosts for recombination of coronaviruses. This suggests, the authors say, that the potential for coronavirus generation in mammals may be significantly underestimated.
“There could be more coronavirus to come,” said Blagrove. “It’s important to be able to predict where it’s coming from, so that we can focus our resources.”
“Once they’re in humans,” added Wardeh, “it’s way too late.”
A viral algorithm
Wardeh and Blagrove’s model takes into account the genome sequences of known coronaviruses, known mammal-coronavirus interactions and all available data on potential mammalian hosts -- including their genomes, where they live geographically, what type of food they eat and how closely related they are to other mammals. It then uses machine learning to create a network of similarities: similar coronaviruses are likely to infect similar hosts, and similar hosts can harbor similar viruses. But if a species isn’t susceptible to many viruses, it isn’t likely to be a recombination host. And if two host species live on different continents, they’re unlikely to swap pathogens.
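The similarity reasoning described above can be illustrated with a toy sketch. This is not the researchers' model, whose details are not given here. It is a deliberately simple nearest-neighbor stand-in, and every host, virus, region, similarity score and threshold in it is invented for illustration. A host is predicted to carry a virus if a sufficiently similar host living in the same region is already known to carry it, and a host counts as a potential recombination host once it is linked to at least two viruses.

```python
# Toy illustration only: all hosts, viruses, regions, and numbers are invented.

# Known host-virus associations (the "observed" network).
known = {
    "host_A": {"virus_1", "virus_2"},
    "host_B": {"virus_1"},
    "host_C": {"virus_3"},
}

# Hypothetical host-host similarity in [0, 1], standing in for a blend of
# genomic, dietary, and evolutionary relatedness.
host_similarity = {
    frozenset({"host_A", "host_B"}): 0.9,
    frozenset({"host_A", "host_C"}): 0.2,
    frozenset({"host_B", "host_C"}): 0.3,
}

# Where each host lives; hosts with no shared region cannot swap pathogens here.
regions = {
    "host_A": {"region_X"},
    "host_B": {"region_X"},
    "host_C": {"region_Y"},
}


def similarity(h1, h2):
    """Similarity between two hosts, forced to zero if their ranges never overlap."""
    if not regions[h1] & regions[h2]:
        return 0.0
    return host_similarity[frozenset({h1, h2})]


def score(host, virus):
    """Highest similarity between `host` and any other host known to carry `virus`."""
    return max(
        (similarity(host, other)
         for other, viruses in known.items()
         if other != host and virus in viruses),
        default=0.0,
    )


def predicted_viruses(host, threshold=0.5):
    """Known viruses plus any virus whose score reaches the (arbitrary) threshold."""
    all_viruses = set().union(*known.values())
    return known[host] | {v for v in all_viruses if score(host, v) >= threshold}


def recombination_candidates(threshold=0.5):
    """Hosts linked to two or more viruses, the minimum needed for recombination."""
    return sorted(h for h in known if len(predicted_viruses(h, threshold)) >= 2)


print(recombination_candidates())  # prints ['host_A', 'host_B']
```

In this toy data only host_A is observed with two viruses, but the similarity step adds host_B as a candidate because it closely resembles host_A and shares its region. The isolated host_C gains no new links.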
“I love complicated models,” said Wardeh. “No one side of the story is enough to enable accurate prediction of which coronaviruses might infect which species, and only by putting all these factors together can we tell the complete story.”
A major obstacle for the model is how little is currently known about how certain viruses infect certain hosts. “We have very limited and biased data for a machine learning algorithm to learn from,” said Nicole Wheeler, a data scientist at the Centre for Genomic Pathogen Surveillance. “The mechanisms that determine whether a virus can infect a host and whether different viruses are likely to mix are extremely complex and influenced by factors we don’t have large amounts of data on.”
Still, she said, the work represents a potentially transformative application of machine learning to disease surveillance, and “may help shine a light on other mammal coronavirus hosts we should pay attention to.”
Recent studies have already confirmed a number of the model’s predictions: specifically, that the alpaca, domestic goat and raccoon dog are all susceptible to SARS-CoV-2.
The potential recombination hosts “cross the mammalian tree,” Blagrove said. “We’re not just missing a bunch of bats.”
The model also suggests the dromedary camel and the African green monkey could possibly host the recombination of SARS-CoV-2 with another coronavirus. Most notably, the model identifies 102 potential hosts for the alarming recombination of SARS-CoV-2 with MERS-CoV, which is far more deadly.
Virologist Arinjay Banerjee at the University of Saskatchewan said the “timely” study “highlights the importance of undiscovered coronavirus diversity and the underestimated number of mammalian hosts.” But he was not necessarily surprised by the scale of the predictions. “Animals, including humans, coexist with a large number of viruses,” he said. “Data from field studies will be important to demonstrate the accuracy of the predictions.”
Wardeh and Blagrove are planning surveillance studies of several predicted recombination hosts, to validate and fine-tune their model with observational data.
“This is not going to be the last pandemic,” said Wardeh. “We have to prepare for the next one, and maybe even prevent it.”