Remit and Structure of Pegasus
Trevor Bedford — 2026-02-16
Overview
Predictive models for the evolution of life
This document describes the remit and structure of a nascent biological AI lab. We have a working project title of "Pegasus", but this will almost certainly be replaced with something else. The entity will be structured like Starfish Neuroscience: a for-profit company in which employees receive equity and whose mission focuses on foundational research rather than on immediately developing a product.
Research focus and objectives
The research focus will be on foundational AI as applied to biological sequences and related data, with a particular focus on modeling evolutionary and ecological processes. Currently there is a swath of companies like EvolutionaryScale that are focused on modeling protein sequences for protein engineering. I believe that sequence modeling approaches have huge potential to illuminate evolution ("how does a genome evolve?") and ecology ("what does an assemblage of genomes look like and how does it vary through time and space?"), but this is currently a highly under-subscribed area. Evo 2 from the Arc Institute is state-of-the-art, but it focuses on functional annotation rather than evolution. If DeepMind / OpenAI have the audacious goal of "solve intelligence" and Colossal Biosciences has the audacious goal of "solve extinction", our audacious goal would be "solve evolution".
My initial project ideas focus on:
- Evolutionary Diffusion: Use variational autoencoders and conditional flow matching to treat evolution as vectors through latent space. This gives a whole-genome mutation / selection process from which we can learn the tree of life. Aimed at core evolutionary understanding.
- Metagenomic Sparse Autoencoders: Harvest activations from transformer-based models like Evo 2 to train a token-level sparse autoencoder (SAE) on metagenomic reads. Train the SAE for embedding reconstruction alongside taxonomic (via NCBI taxonomy) and functional (via Pfam domains) heads. This gives a window into metagenomic dark matter.
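To make the SAE objective above a little more concrete, here is a minimal sketch of the loss for a single token embedding, written in plain Python so it runs without dependencies. Everything specific in it is an assumption for illustration rather than a description of the planned implementation: the parameter names, the toy dimensions, the ReLU encoder with an L1 sparsity penalty, the treatment of the taxonomic and Pfam targets as single-label classification, and the loss weights `l1_coeff` and `head_coeff`. In practice the embedding `x` would be an activation harvested from a model such as Evo 2, and the gradient steps would be handled by an autodiff framework.

```python
import math
import random


def matvec(W, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def cross_entropy(logits, label):
    """Softmax cross-entropy for one example, computed stably."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_z - logits[label]


def init_params(d_model, n_latent, n_taxa, n_pfam, seed=0, scale=0.1):
    """Random toy parameters; all sizes here are illustrative assumptions."""
    rng = random.Random(seed)

    def mat(rows, cols):
        return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]

    return {
        "W_enc": mat(n_latent, d_model),   # embedding -> sparse latent
        "b_enc": [0.0] * n_latent,
        "W_dec": mat(d_model, n_latent),   # sparse latent -> embedding
        "b_dec": [0.0] * d_model,
        "W_taxon": mat(n_taxa, n_latent),  # hypothetical taxonomic head
        "W_pfam": mat(n_pfam, n_latent),   # hypothetical functional head
    }


def sae_loss(x, params, taxon_label, pfam_label, l1_coeff=1e-3, head_coeff=0.1):
    """Return (total loss, per-term breakdown) for one token embedding x."""
    pre = matvec(params["W_enc"], x)
    # ReLU keeps latents non-negative, so their sum equals their L1 norm.
    z = [max(0.0, a + b) for a, b in zip(pre, params["b_enc"])]
    x_hat = [a + b for a, b in zip(matvec(params["W_dec"], z), params["b_dec"])]
    parts = {
        "recon": sum((a - b) ** 2 for a, b in zip(x, x_hat)),
        "sparsity": sum(z),
        "taxon": cross_entropy(matvec(params["W_taxon"], z), taxon_label),
        "pfam": cross_entropy(matvec(params["W_pfam"], z), pfam_label),
    }
    total = (
        parts["recon"]
        + l1_coeff * parts["sparsity"]
        + head_coeff * (parts["taxon"] + parts["pfam"])
    )
    return total, parts


if __name__ == "__main__":
    params = init_params(d_model=4, n_latent=8, n_taxa=3, n_pfam=5)
    x = [0.5, -1.0, 0.25, 2.0]  # stand-in for a harvested token activation
    total, parts = sae_loss(x, params, taxon_label=1, pfam_label=4)
    print(total, parts)
```

The two heads read from the sparse latent `z` rather than from the raw embedding, so the auxiliary losses push individual latents towards taxonomically and functionally meaningful directions. Whether that is the right coupling, and how the terms should be weighted, are open design choices.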
Obvious application areas for foundational eco/evol models would be pathogen evolution (vaccine escape, AMR), environmental metagenomics (wastewater surveillance, ocean ecology) and ecosystem design (steering microbiomes, synthetic consortia for bioremediation). Very broadly, I imagine applications that range from evolutionary forecasting, to outbreak tracking, to AMR stewardship, to wastewater surveillance, to steering microbiome health, to phage therapy, to assisted evolution for conservation biology, to a broad understanding of the tree of life.
Organizational structure
Academic institutions encourage fiefdoms led by faculty members in which we all need to find our own niche that competes well for grant dollars. There is collaboration when it's mutually beneficial, but largely each lab needs to look out for itself. These labs are mainly staffed by graduate student and postdoc trainees, the most talented of whom will go on to fledge their own groups. There's no real capacity for a dream team of faculty-level researchers to work directly alongside one another.
I love the idea of an actually flat structure that would recruit faculty-level scientists and AI experts. We could hack together on projects rather than inventing projects that are right-sized for graduate students to execute as first authors. Papers could have the org name as the byline with author contributions at the end (like a Valve game).
Along these lines, I've seen my own productivity increase hugely since incorporating Claude Code into my programming workflow, and I imagine it will increasingly be the case that the best scientific output includes substantial LLM support. A strategy in which "faculty" are directly working on the science should facilitate this.
The plan is to basically copy Starfish, i.e., incorporate as a for-profit company where individuals receive an equity stake. I believe this would work well for an "AI lab" entity of 10-15 people who are focused on the same problem domain, where everyone has a north star to cohere research efforts. I'm tempted to give everyone at the company the title of "fellow", "scientist" or "investigator", but maybe this doesn't work in practice.
Financial support is planned for a long-term runway of ~$10M per year to enable a lab-sized entity of self-directed individual contributors, with the hiring goal of faculty-level members who don't need and don't want to be told what to do. Currently involved are Gabe Newell for broad guidance, organizational structure and financing, Rebekah Englishbee for CEO responsibilities, and myself on the scientific front. Gabe and Rebekah already have in-depth experience at Starfish in fostering this sort of research / company structure.
Rayan Chikhi and Zehui Li joined in January as contractors (through Starfish) and we've been working to model evolutionary trajectories via discrete diffusion.
I'm still employed by the Howard Hughes Medical Institute and Pegasus doesn't yet exist, but everything's getting squared away for me to transition employment and launch the company in May.
My own rationale
I've been incredibly fortunate to have had such a supportive academic home at the Fred Hutch and such talented and generous lab members, but there are a few things pushing me in this direction.
Funding: This is a unique opportunity with long-term support to grow a research lab outside of traditional academic strictures of grant and publication pressures.
Scientific domain: I believe that a substantial aspect of success in science is the meta-cognition of finding the right field to pursue at the right time. I was extremely lucky in my timing with phylodynamics / genomic epidemiology, where I was able to help drive the second wave of the field as large-scale genomic data became routine and turnaround times became fast enough to impact outbreak response. This timing was perfectly aligned to impact the COVID-19 pandemic response. However, post-pandemic there's been a general lull in the advancement of the field (sadly in part due to general societal retrenchment away from infectious diseases), and it feels like there is less of a novel vision to participate in.
I see amazing potential and energy behind recent applications of large-scale AI models to biological sequence data. I believe that a fresh start in a company structure should facilitate this domain transition.
Organizational structure: Watching people become better versions of themselves over the course of their tenure in the lab has been hugely rewarding. However, the academy has required a seemingly unavoidable first author / last author structure, in which I'm trying to support trainees learning the entire arc of producing a paper. This feels like stacking up dominoes rather than working together towards a major shared goal. I spend most of my time advising / managing rather than getting to hack on problems with peers.
I'd like a structure that allows me to do my most creative and meaningful work and I'm hopeful that this flat company of peers will facilitate this endeavor.