Pegasus Docs

Metagenomic Interpretability

Harvest activations from transformer-based genome language models such as Evo 2 and use them to train a token-level sparse autoencoder (SAE) on metagenomic reads. The SAE is trained for embedding reconstruction alongside two auxiliary heads: a taxonomic head (labels from NCBI Taxonomy) and a functional head (labels from Pfam domains). The resulting features give a window into metagenomic "dark matter".
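As a rough illustration of that training objective, here is a minimal NumPy sketch of one forward pass and a combined loss. Everything concrete in it is an assumption rather than the project's actual design: the layer sizes, the loss weights (`l1`, `w_tax`, `w_fun`), attaching both heads to the SAE latents, treating taxonomy as one label per token, and treating Pfam as a multi-label target. Random arrays stand in for harvested activations.

```python
import numpy as np

# Hypothetical sizes: activation width, SAE dictionary size, label vocabularies.
d_model, n_feat, n_taxa, n_pfam = 16, 64, 5, 7


def init_params(rng, scale=0.1):
    """Randomly initialised SAE weights plus two linear heads on the latents."""
    return {
        "W_enc": rng.normal(0, scale, (d_model, n_feat)),
        "b_enc": np.zeros(n_feat),
        "W_dec": rng.normal(0, scale, (n_feat, d_model)),
        "b_dec": np.zeros(d_model),
        "W_tax": rng.normal(0, scale, (n_feat, n_taxa)),
        "W_fun": rng.normal(0, scale, (n_feat, n_pfam)),
    }


def sae_loss(p, x, taxon_ids, pfam_multi_hot, l1=1e-3, w_tax=1.0, w_fun=1.0):
    """Reconstruction + L1 sparsity + taxonomic and functional head losses."""
    # Encode: ReLU keeps latents non-negative so the L1 term can push them to zero.
    z = np.maximum(0.0, (x - p["b_dec"]) @ p["W_enc"] + p["b_enc"])
    # Decode back to the model's activation space.
    x_hat = z @ p["W_dec"] + p["b_dec"]
    recon = np.mean(np.sum((x - x_hat) ** 2, axis=1))
    sparsity = np.mean(np.sum(np.abs(z), axis=1))

    # Taxonomic head: softmax cross-entropy (assumes one taxon label per token).
    logits = z @ p["W_tax"]
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    tax = -np.mean(log_probs[np.arange(len(x)), taxon_ids])

    # Functional head: numerically stable sigmoid BCE (assumes multi-label Pfam).
    f = z @ p["W_fun"]
    fun = np.mean(np.maximum(f, 0) - f * pfam_multi_hot + np.log1p(np.exp(-np.abs(f))))

    parts = {"recon": recon, "sparsity": sparsity, "tax": tax, "fun": fun}
    total = recon + l1 * sparsity + w_tax * tax + w_fun * fun
    return total, parts, z


rng = np.random.default_rng(0)
params = init_params(rng)
x = rng.normal(size=(32, d_model))                # stand-in for harvested activations
taxon_ids = rng.integers(0, n_taxa, size=32)      # stand-in NCBI-derived class ids
pfam = rng.integers(0, 2, size=(32, n_pfam)).astype(float)  # stand-in Pfam multi-hot
total, parts, z = sae_loss(params, x, taxon_ids, pfam)
print({k: round(float(v), 4) for k, v in parts.items()})
```

In a real run the gradients would come from an autodiff framework. The sketch only shows how the reconstruction, sparsity, and head losses could be combined into one scalar.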

See Initial Directions for the full project scoping discussion.

Goals

Fine-tune or pre-train a genome language model on viral/metagenomic data
Harvest activations from Lungfish wastewater data
Train a sparse autoencoder on token embeddings with taxonomic and functional heads
Identify and characterize novel biological features in metagenomic data
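
One possible way to approach the last goal is to ask, for each SAE feature, how many of the tokens it fires on carry any known annotation. The sketch below is a hypothetical heuristic, not the project's method: the function names, the `min_active` and `max_fraction` thresholds, and the per-token boolean `annotated` mask (for example, "token overlaps a Pfam hit") are all assumptions made for illustration.

```python
import numpy as np


def annotated_fraction(z, annotated, min_active=10, threshold=0.0):
    """Per-feature share of active tokens that carry a known annotation.

    z: (n_tokens, n_features) non-negative SAE activations.
    annotated: (n_tokens,) bool, True where the token has an annotation.
    Features active on fewer than `min_active` tokens get NaN.
    """
    active = z > threshold
    n_active = active.sum(axis=0)
    n_annot = (active & annotated[:, None]).sum(axis=0)
    frac = np.full(z.shape[1], np.nan)
    ok = n_active >= min_active
    frac[ok] = n_annot[ok] / n_active[ok]
    return frac, n_active


def candidate_dark_features(frac, max_fraction=0.1):
    """Indices of features that fire almost only on unannotated tokens."""
    filled = np.where(np.isnan(frac), np.inf, frac)  # skip under-sampled features
    return np.flatnonzero(filled <= max_fraction)


# Toy data: 40 tokens, the first 20 annotated; three features.
annotated = np.arange(40) < 20
z = np.zeros((40, 3))
z[:20, 0] = 1.0   # feature 0 fires only on annotated tokens
z[20:, 1] = 1.0   # feature 1 fires only on unannotated tokens
z[:5, 2] = 1.0    # feature 2 fires on too few tokens to judge

frac, n_active = annotated_fraction(z, annotated)
candidates = candidate_dark_features(frac)
print(frac, candidates)
```

Features flagged this way would only be starting points for manual characterization, since missing annotations can also reflect database gaps or low-quality reads.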

Repos