Why Lattice Proteins for Synthetic Evaluation Data

Trevor Bedford — 2026-05-15

Decision

Use the MJ lattice protein model (Miyazawa & Jernigan 1985; Bloom, Wilke, Arnold & Adami 2004) as the source of synthetic evaluation data for the evolutionary diffusion model.

The lattice model has a unique property that no more realistic approach can match: exact, deterministic, efficiently computable fitness for every possible sequence. Branch-and-bound enumeration gives the true partition function and ensemble-averaged binding energy, not an approximation, not a sample, not another model's prediction. This makes it possible to construct evaluation benchmarks with unambiguous ground truth.
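As a rough illustration of what "exact" means here, the sketch below enumerates every self-avoiding conformation of a very short chain on a 2D square lattice and computes Boltzmann-weighted averages directly from the full sum. It is a toy under stated assumptions, not the project's implementation: it uses plain exhaustive enumeration rather than branch-and-bound, a hypothetical two-letter contact energy table rather than the Miyazawa-Jernigan matrix, and folding energy rather than binding energy. All function names are illustrative.

```python
import math

# Hypothetical two-letter contact energies (NOT the Miyazawa-Jernigan matrix).
CONTACT_ENERGY = {("H", "H"): -1.0, ("H", "P"): 0.0, ("P", "H"): 0.0, ("P", "P"): 0.0}
MOVES = [(1, 0), (-1, 0), (0, 1), (0, -1)]


def self_avoiding_walks(n):
    """Yield every self-avoiding walk of n residues on the 2D square lattice."""
    def extend(path, occupied):
        if len(path) == n:
            yield tuple(path)
            return
        x, y = path[-1]
        for dx, dy in MOVES:
            nxt = (x + dx, y + dy)
            if nxt not in occupied:
                path.append(nxt)
                occupied.add(nxt)
                yield from extend(path, occupied)
                occupied.remove(nxt)
                path.pop()

    start = (0, 0)
    yield from extend([start], {start})


def conformation_energy(seq, path):
    """Sum contact energies over lattice neighbours that are not chain neighbours."""
    index = {pos: i for i, pos in enumerate(path)}
    energy = 0.0
    for i, (x, y) in enumerate(path):
        for dx, dy in MOVES:
            j = index.get((x + dx, y + dy))
            if j is not None and j > i + 1:  # count each non-bonded contact once
                energy += CONTACT_ENERGY[(seq[i], seq[j])]
    return energy


def exact_thermodynamics(seq, kT=1.0):
    """Return (mean energy, ground-state probability, number of conformations)."""
    energies = [conformation_energy(seq, p) for p in self_avoiding_walks(len(seq))]
    e_min = min(energies)
    # Shift by e_min for numerical stability; the shift cancels in every ratio.
    weights = [math.exp(-(e - e_min) / kT) for e in energies]
    z = sum(weights)  # partition function up to the constant factor exp(-e_min / kT)
    mean_energy = sum(e * w for e, w in zip(energies, weights)) / z
    p_ground = sum(w for e, w in zip(energies, weights) if e == e_min) / z
    return mean_energy, p_ground, len(energies)
```

Calling `exact_thermodynamics("HPPH")` sums over all conformations of the four-residue chain, symmetry-related copies included, so repeated calls return identical values. The cost grows exponentially with chain length, which is why this approach only stays cheap for short chains.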

The model also has the structural properties needed for a meaningful fitness landscape. It produces epistasis (the effect of mutation A depends on the presence of mutation B through structural contacts). It couples sequence to function through an intermediate structural step (folding). It has a genetic code layer that creates synonymous and nonsynonymous mutations with different fitness consequences. These are the same properties that make real protein evolution interesting, even though the specific physics is simplified.
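Two of these properties can be made concrete with a short sketch. The first function classifies a single-nucleotide change as synonymous or nonsynonymous under the standard genetic code; the second measures pairwise epistasis as the deviation of a double mutant from the sum of its single-mutant effects. The function names are illustrative, and the epistasis formula assumes fitness is on an additive scale (for example log fitness or an energy), which the text above does not specify.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(nt_seq):
    """Translate a DNA coding sequence, ignoring any trailing partial codon."""
    return "".join(CODON_TABLE[nt_seq[i:i + 3]] for i in range(0, len(nt_seq) - 2, 3))


def classify_mutation(nt_seq, pos, new_base):
    """Label a single-nucleotide substitution by its effect on the protein."""
    mutant = nt_seq[:pos] + new_base + nt_seq[pos + 1:]
    return "synonymous" if translate(mutant) == translate(nt_seq) else "nonsynonymous"


def pairwise_epistasis(f_wt, f_a, f_b, f_ab):
    """Deviation of the double mutant from additivity of the single-mutant effects.

    Zero means mutations A and B act independently on the chosen fitness scale.
    """
    return (f_ab - f_wt) - ((f_a - f_wt) + (f_b - f_wt))
```

For example, in `"ATGGCT"` (Met-Ala) changing the last base to `C` leaves the protein unchanged, while changing the fourth base to `T` replaces Ala with Ser. With an exact fitness function, `pairwise_epistasis` can be evaluated for every pair of mutations without measurement noise.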

Context

The evolutionary diffusion model needs synthetic data where the true fitness landscape is fully known. This enables controlled evaluation questions that are impossible with empirical data: does the model assign higher probability to beneficial mutations? Does it capture epistasis? Does it distinguish synonymous from nonsynonymous changes? Can it predict multiple steps along a fitness gradient?

None of these require that the synthetic fitness landscape resemble real protein binding. They require that it has the right structural properties: ruggedness, epistasis, a mix of neutral/deleterious/beneficial mutations, and coupling between sequence and function mediated by structure.
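The first of these questions could be scored along the following lines. This is a minimal sketch, assuming the exact fitness change of each candidate mutation and a per-mutation score from the model (for example a log probability) are already available; the function name and inputs are hypothetical, and AUROC is only one of several reasonable metrics.

```python
def beneficial_auroc(true_delta_fitness, model_scores):
    """Probability that a random beneficial mutation outscores a random non-beneficial one.

    true_delta_fitness: exact fitness change of each candidate mutation.
    model_scores: the model's score for the same mutations, in the same order.
    Ties in score count as half a win, as in the Mann-Whitney U statistic.
    """
    pos = [s for d, s in zip(true_delta_fitness, model_scores) if d > 0]
    neg = [s for d, s in zip(true_delta_fitness, model_scores) if d <= 0]
    if not pos or not neg:
        raise ValueError("need at least one beneficial and one non-beneficial mutation")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

A value of 0.5 means the model's scores carry no information about which mutations are beneficial, and 1.0 means every beneficial mutation is ranked above every non-beneficial one. Because the labels come from exact fitness values, any shortfall is attributable to the model rather than to label noise.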

Alternatives considered

Protein language model oracles (e.g., ESM-2 pseudolikelihood as fitness). Advantage: the landscape reflects real evolutionary constraints. Disadvantage: "ground truth" is another model's approximation — testing one model against another model's predictions is a much weaker evaluation than testing against mathematically exact fitness values.

Physics-based energy functions (e.g., Rosetta REF15 on short peptides). Advantage: more realistic treatment of backbone flexibility, solvation, and side-chain rotamers. Disadvantages: orders of magnitude slower (minutes per sequence vs. seconds), the energy function is still parameterized rather than exact, and stochastic sampling introduces noise so the same sequence can give different energies on repeated evaluations.

Molecular dynamics FEP (free energy perturbation). Advantage: the gold standard for binding free energy. Disadvantage: completely impractical for generating strong-selection, weak-mutation (SSWM) trajectories, because each fitness evaluation takes hours of GPU time.
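The per-evaluation costs above matter because trajectory generation multiplies them. The sketch below shows one SSWM (strong selection, weak mutation) step under assumptions that are not confirmed by this document: the population is monomorphic, every single-nucleotide neighbour is scored, the selection coefficient is the difference in fitness on a log scale, and the next sequence is sampled in proportion to a haploid Kimura-style fixation probability. All names are illustrative.

```python
import math
import random


def fixation_probability(s, pop_size):
    """Fixation probability of a new haploid mutant with selection coefficient s."""
    if abs(s) < 1e-12:
        return 1.0 / pop_size  # neutral limit
    exponent = -2.0 * pop_size * s
    if exponent > 700.0:  # strongly deleterious: the probability underflows to zero
        return 0.0
    # Equals (1 - exp(-2s)) / (1 - exp(-2Ns)), written with expm1 for accuracy.
    return math.expm1(-2.0 * s) / math.expm1(exponent)


def sswm_step(seq, fitness, pop_size, rng):
    """Sample the next fixed sequence among all single-nucleotide neighbours of seq."""
    neighbours = [
        seq[:i] + b + seq[i + 1:]
        for i in range(len(seq))
        for b in "ACGT"
        if b != seq[i]
    ]
    f0 = fitness(seq)
    weights = [fixation_probability(fitness(m) - f0, pop_size) for m in neighbours]
    if sum(weights) == 0.0:
        return seq  # no neighbour can fix; the trajectory stalls
    return rng.choices(neighbours, weights=weights, k=1)[0]
```

Under these assumptions a single step on a sequence of L nucleotides needs 3L + 1 fitness evaluations, for example `sswm_step(seq, fitness, 1000, random.Random(0))` with any callable `fitness`. That is manageable when each evaluation takes seconds and prohibitive when each takes minutes or hours.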

Consequences

The lattice model's simplicity is a feature, not a limitation. If the diffusion model cannot learn to predict the next step in a lattice protein trajectory, where the fitness landscape is exactly known, deterministic, and clearly structured, it is very unlikely to succeed on real evolutionary data, where the landscape is noisier and more complex.

The lattice evaluation is necessary but not sufficient for trusting the model on harder problems. It tests whether the model can learn evolutionary dynamics at all, under controlled conditions that allow precise diagnosis of what the model has and hasn't learned. The empirical benchmarks (DMS correlation, nucleotide frequency baseline, cancer driver classification) test whether it works on real biology. Both are needed; they answer different questions.

Because ground truth is mathematically exact rather than another model's prediction, failures on the lattice benchmark are unambiguously model failures rather than disagreements between two approximations.