Brainstorming Applications for Eco/Evol Sequence Models
Trevor Bedford — 2025-08-18
The following list of application areas derives largely from my specific domain experience in pathogens / surveillance, but I've tried to extend beyond just this area. In this brainstorm I've purposely set aside the business case and focused instead on impactful / engaging areas. I've split directions into largely observational inferences vs systems that permit direct interventions.
Model capabilities
Here, I'm assuming that the models produced by Pegasus would be able to:
- Project how genomes will change forward in time if they continue to follow current selective pressures or if they are steered by new pressures (change in temperature; change in host immune landscape)
- Project genomes backwards in time to reconstruct common ancestors
- Project how assemblages of genomes will change if they continue to follow current trends
- Project stable assemblages of genomes given a specified abiotic environment
Monitoring and forecasting
Evolutionary forecasting for strain selection. This is what we've already been doing in the Bedford Lab with fairly simple models to forecast the evolution of seasonal influenza and SARS-CoV-2. We currently participate in the World Health Organization meetings to pick influenza vaccine strains. There are smarter / fancier ways to approach this and better models could offer significant improvements to vaccine strain selection.
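To make "fairly simple models" concrete, here is a minimal sketch of one common baseline: fit a straight line to the log-odds of a variant's frequency over time and extrapolate. This is an illustration only and not necessarily what the Bedford Lab runs; the function names and the frequency values are hypothetical.

```python
import math

def fit_logit_growth(days, freqs):
    """Least-squares fit of logit(frequency) = intercept + slope * day."""
    logits = [math.log(f / (1.0 - f)) for f in freqs]
    n = len(days)
    mean_x = sum(days) / n
    mean_y = sum(logits) / n
    sxx = sum((x - mean_x) ** 2 for x in days)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(days, logits))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

def project_frequency(intercept, slope, day):
    """Map the fitted log-odds at `day` back to a frequency in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-(intercept + slope * day)))

# Hypothetical weekly frequencies of one variant (made-up numbers).
days = [0, 7, 14, 21]
freqs = [0.05, 0.10, 0.19, 0.33]

intercept, slope = fit_logit_growth(days, freqs)
forecast = project_frequency(intercept, slope, 49)  # four weeks past the last sample
print(f"growth rate per day (log-odds): {slope:.3f}")
print(f"projected frequency at day 49: {forecast:.2f}")
```

A sequence model would in principle replace the single growth-rate parameter with something that depends on the genome itself, but the output (a projected frequency per candidate strain) would feed into strain selection in the same way.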
Phylodynamics++. Again, following what's been historically successful, we should be able to reconstruct with better accuracy / higher granularity how pathogens spread through the world via their genome sequences. This allows situational awareness and perhaps near-term forecasting of unfolding outbreaks / epidemics / pandemics.
AMR forecasting and stewardship. Antimicrobial resistance (AMR) continues to spread. Tracking and forecasting AMR across bacterial pathogens and across the globe would be useful for policy guidance. This also offers the possibility of feedback, where cycling of antibiotics can slow the evolution of resistance. But this is a complex environment with bystander effects, where dosing for one pathogen affects others, etc. Advanced models would offer the potential to guide rational policy decisions from the hospital level to the national level.
Zoonotic spillover risk. Looking at historical patterns of host switching (into humans but also across animals) could potentially predict which viruses are at highest risk of spilling over into the human population. This goal has been hyped by "virus hunters" / things like the PREDICT project without very much to show for it, but I still think there is real potential here.
Wastewater surveillance. This is effectively "bioweather". We want a system of weather stations (à la Lungfish) that produce reliable and timely metagenomic data. On top of this we can have monitoring of what's circulating, nowcasts and forecasts of spread, and early identification of novel pathogens.
Environmental metagenomics. This is very similar to wastewater surveillance, but extended to more general environments. The questions become larger ecological questions rather than questions about pathogen load in humans and agriculture.
Genomic dark matter. A subset of wastewater surveillance / environmental metagenomics, where currently we can taxonomically identify only a small fraction of reads. Models could help build a fuller understanding of the bacteria and viruses in the environment.
Predicting algal blooms. Metagenomic sequencing of ocean water combined with environmental covariates could provide early warning of imminent harmful algal blooms.
Within-host cancer evolution. Track tumor evolution via cell-free DNA in the blood. This is similar to the general field of phylodynamics and has promise for tracking within-host cancer evolution. Prediction of resistance to therapies is of obvious utility.
Tree of Life. Imagine Nextstrain, but for all of life. We target the entire evolutionary structure (tree-like in large part, but not completely and especially not for many prokaryotes) that relates all sampled genomes. Historical evolution can be described as paths through a high-dimensional embedding space. The utility here is placing sampled organisms into evolutionary context. Sequencing the genome of a new organism would give its immediate relationships to other organisms. Again, my analogy is to viral genomics, where being able to immediately place newly sequenced SARS-CoV-2 into context is hugely useful to virologists. I could imagine this manifesting more broadly.
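As a rough sketch of what "placing a newly sequenced organism into context" could mean in an embedding space, the snippet below ranks reference genomes by cosine similarity to a query embedding. Everything here is assumed for illustration: the three-dimensional vectors, the taxon names, and the `place_genome` helper are hypothetical, and a real system would use embeddings produced by the model rather than hand-written numbers.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def place_genome(query, reference, k=2):
    """Return the names of the k reference embeddings most similar to `query`."""
    ranked = sorted(
        reference,
        key=lambda name: cosine_similarity(query, reference[name]),
        reverse=True,
    )
    return ranked[:k]

# Hypothetical reference embeddings (made-up values, 3 dimensions for readability).
reference = {
    "taxon_A": [0.9, 0.1, 0.0],
    "taxon_B": [0.8, 0.3, 0.1],
    "taxon_C": [0.0, 0.2, 0.9],
}
new_genome = [0.85, 0.2, 0.05]  # hypothetical embedding of a newly sequenced organism

neighbors = place_genome(new_genome, reference, k=2)
print("closest sampled relatives:", neighbors)
```

Nearest-neighbor lookup is only the simplest version of placement; it ignores the tree or network structure, which a fuller approach would need to respect.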
Steering and synthesis
Microbiome health. With longitudinal measurements, individual microbiomes could be steered towards healthy ecologies through (1) diet, (2) probiotic additions and (3) specific subtractions via phages. There's progress here in terms of therapeutics for specific diseases, but data- and AI-informed steering should really be possible.
Phage cocktails for therapy. Phage therapy is an alternative to traditional antibiotics. My understanding is that the upside (and downside) is much higher specificity where particular phages will work on particular bacteria. Build "microbiome control systems" using metagenomic models to predict how targeted phage cocktails could steer diseased microbiomes back to healthy states.
Immunogen design. There's been lots of interest in the HIV vaccine space in designing immunization strategies that guide within-host antibody evolution towards potent broadly neutralizing antibodies. I'm broadly skeptical of HIV vaccine research, but I'm more hopeful about applying these same ideas of steering B cell evolution and T cell populations to other pathogens.
Synthetic consortia for bioremediation. Steer / construct microbial communities to create robust assemblages that can degrade pollutants. Communities are more stable than monocultures.
Assisted evolution for conservation biology. We have a host of taxa that are threatened with extinction due in part to a mismatch between their historical ecological niche and their current environment. The footprint of selection should be visible in these taxa, pushing them towards a fitter constitution. There are examples like the Florida panther, where outcrossing with Texas pumas led to increased survival and fitness. Whether the intervention is specific outcrossing or specific CRISPR edits, evolutionary models could inform exactly what is most likely to assist recovery. This may be particularly appropriate to combat coral bleaching.
Steering for climate-resilient agriculture. Similar to the above, but for agriculture rather than conservation biology. My understanding is that current crops are not expected to keep up with climate stress. Guide evolution towards fitter crops based on knowledge of climate scenarios.
Resurrecting ancient genomes. With proper models we could predict a close proxy of the genome of the common ancestor of an extant group of organisms. Say the common ancestor of all modern birds. This would be the closest thing possible to actual Jurassic Park. Advancements from entities like Colossal are making it possible to grow these organisms.
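For a sense of the classical baseline that such models would need to beat, here is a minimal sketch of ancestral state reconstruction at the root of a tree using Fitch parsimony, applied site by site. This is a textbook method swapped in for illustration, not the learned model the paragraph has in mind; the four-taxon tree, taxon names, and sequences are hypothetical.

```python
def fitch_sets(tree, leaf_states):
    """Bottom-up Fitch pass: return the candidate state set at the root of `tree`.

    `tree` is either a leaf name (str) or a pair (left_subtree, right_subtree).
    """
    if isinstance(tree, str):
        return {leaf_states[tree]}
    left, right = tree
    left_set = fitch_sets(left, leaf_states)
    right_set = fitch_sets(right, leaf_states)
    shared = left_set & right_set
    # Keep the intersection if the children agree on any state, else take the union.
    return shared if shared else left_set | right_set

def ancestral_candidates(tree, alignment):
    """Apply Fitch independently to every column of an aligned set of sequences."""
    length = len(next(iter(alignment.values())))
    result = []
    for i in range(length):
        site = {name: seq[i] for name, seq in alignment.items()}
        result.append(fitch_sets(tree, site))
    return result

# Hypothetical rooted tree and aligned sequences (made-up data).
tree = (("bird_A", "bird_B"), ("bird_C", "bird_D"))
alignment = {
    "bird_A": "ACGT",
    "bird_B": "ACGA",
    "bird_C": "ACTT",
    "bird_D": "GCTT",
}

root = ancestral_candidates(tree, alignment)
for position, states in enumerate(root):
    print(position, sorted(states))
```

Sites where the root set contains more than one state are exactly where parsimony cannot decide, and where a model with a richer notion of evolutionary process might add the most.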
Synthetic biology more broadly. Many existing companies in this area are working towards designable proteins and biological systems (e.g. fully engineered CRISPR-Cas systems). It's possible (but not at all certain) that more directly modeling the evolutionary process could improve steerable sequence generation, compared to treating training sequences as fundamentally atomic examples, as current genome language and protein language models do.