Alex Carlin

Novel enzymes from a diffusion model (paper review)

Enzyme design is a bit of a grand challenge in protein science. Let’s just put it this way: enzymes have proven to be incredibly difficult to design effectively. Mayo and coworkers proposed an automated method for de novo enzyme design in 1997, and more and more smart people have been dealing with the insane challenges posed by the problem of designing proficient catalysts ever since.

In 2008, the labs of David Baker and Dan Tawfik created de novo enzymes for a relatively simple reaction, Kemp elimination. They succinctly describe what has since become the standard procedure for enzyme design. We first "choose a catalytic mechanism and then … use quantum mechanical transition state calculations to create an idealized active site with protein functional groups positioned so as to maximize transition state stabilization." Following this geometric definition, an algorithm searches for natural protein backbones that match the alpha carbons of those functional groups.

In 2010, Baker's group designed enzymes for a bimolecular reaction: the Diels-Alder reaction. While Siegel and coworkers showed that it was possible to create novel biocatalysts for desired reactions, they note that the proficiency of the created enzymes doesn't match the proficiency of natural enzymes. And this is still true today. Designed enzymes do not achieve the same level of catalytic proficiency as natural enzymes do. The vast majority of enzymes in use in biotechnology, whether they are used in a pharmaceutical process or in your laundry soap, are lightly-engineered versions of natural proteins, because they work better than de novo designed enzymes.

The problem is, as the fields of biotechnology, medicine, food science, and others use enzymes to solve more and more problems, there are always new enzymes to design, and we can't rely on natural ones. Especially when we start to solve really fun problems that don't exist in nature, like inventing an injectable enzyme that makes soldiers invulnerable to VX gas (one of the patents I've coauthored). Since the whole point of an enzyme, really, is to make things go fast, the field of enzyme design is still searching for ways to make better enzymes.

In a new paper out this month, folks from David Baker's group take another step towards this goal. In "Computational design of serine hydrolases" by Anna Lauko, Samuel J. Pellock, and others, they describe how they've reworked the computational enzyme design workflow by replacing two key parts with deep neural networks trained on natural proteins.

The first piece remains the same. We must first define the chemical reaction we wish to catalyze. Here, the authors chose "serine hydrolysis", or hydrolysis via a catalytic triad of a serine, an aspartic acid (or glutamic acid), and a histidine residue positioned to activate the serine's hydroxyl group to perform a nucleophilic attack on the substrate. (They chose a nice fluorogenic substrate to make for a tractable enzyme assay.) This defines their theozyme.
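To make the idea of a theozyme a little more concrete, here is a minimal Python sketch that treats it as a list of distance constraints between catalytic atoms and checks a candidate active site against them. The atom names follow PDB conventions (OG for the serine hydroxyl oxygen, NE2 and ND1 for the histidine ring nitrogens, OD1 for an aspartate carboxylate oxygen), but the distance ranges and the coordinates are illustrative assumptions of mine, not values from the paper. A real theozyme would also constrain angles and the geometry of the transition state itself.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class DistanceConstraint:
    atom_a: str
    atom_b: str
    min_dist: float  # angstroms
    max_dist: float  # angstroms


# Hypothetical hydrogen-bond distance ranges for a Ser-His-Asp triad.
# These numbers are assumptions for illustration, not taken from the paper.
TRIAD_THEOZYME = [
    DistanceConstraint("SER:OG", "HIS:NE2", 2.5, 3.3),
    DistanceConstraint("HIS:ND1", "ASP:OD1", 2.5, 3.3),
]


def violations(coords, constraints):
    """Return (constraint, measured_distance) for every constraint that fails."""
    failed = []
    for c in constraints:
        d = math.dist(coords[c.atom_a], coords[c.atom_b])
        if not (c.min_dist <= d <= c.max_dist):
            failed.append((c, d))
    return failed


# Made-up coordinates (angstroms) for a site that satisfies both constraints.
good_site = {
    "SER:OG": (0.0, 0.0, 0.0),
    "HIS:NE2": (2.8, 0.0, 0.0),
    "HIS:ND1": (4.9, 0.7, 0.0),
    "ASP:OD1": (7.6, 0.7, 0.0),
}

# Same site with the aspartate pulled too far away from the histidine.
bad_site = dict(good_site)
bad_site["ASP:OD1"] = (10.0, 0.7, 0.0)

print(len(violations(good_site, TRIAD_THEOZYME)))
print(len(violations(bad_site, TRIAD_THEOZYME)))
```

The point of writing it this way is that the theozyme says nothing about the rest of the protein: any backbone that can hold these atoms in place is a candidate.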

The next step is the key innovation. Instead of searching an existing library of protein backbones for something that'll fit the theozyme, they use the protein diffusion network RFdiffusion (also from Baker's group) to create completely novel backbones built around the theozyme. This, combined with an auxiliary neural network called ChemNet used as a classifier, lets the authors generate novel backbone structures for their theozymes rather than rely on natural proteins. They then use LigandMPNN, a deep learning-based method for sequence design from backbones, which I've written about quite a lot here, to design sequences for those backbones, and AlphaFold to check that the predicted structures of those sequences match the designed backbones.
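The final validation step boils down to comparing two sets of coordinates: the designed backbone and the structure predicted for the designed sequence. The usual metric for this is C-alpha RMSD after superimposing the two structures. The sketch below swaps in a simpler relative, distance RMSD (dRMSD), which compares all pairwise C-alpha distances and so needs no superposition and no libraries beyond the standard library. It is not the authors' code, the 1.5 angstrom cutoff is a hypothetical value I picked for illustration, and dRMSD cannot tell a structure from its mirror image.

```python
import math
from itertools import combinations


def drmsd(coords_a, coords_b):
    """Distance RMSD between two equal-length lists of (x, y, z) C-alpha coordinates."""
    if len(coords_a) != len(coords_b):
        raise ValueError("structures must have the same number of residues")
    if len(coords_a) < 2:
        raise ValueError("need at least two residues")
    sq_sum = 0.0
    n_pairs = 0
    for i, j in combinations(range(len(coords_a)), 2):
        da = math.dist(coords_a[i], coords_a[j])
        db = math.dist(coords_b[i], coords_b[j])
        sq_sum += (da - db) ** 2
        n_pairs += 1
    return math.sqrt(sq_sum / n_pairs)


def passes_filter(designed, predicted, cutoff=1.5):
    """Keep a design if the prediction stays within a (hypothetical) dRMSD cutoff."""
    return drmsd(designed, predicted) <= cutoff


# Toy four-residue "backbones" with 3.8 angstrom C-alpha spacing.
designed = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]

# Same shape, just moved in space: internal distances are unchanged.
shifted = [(x + 5.0, y - 2.0, z + 1.0) for x, y, z in designed]

# A chain that folds back on itself instead of staying extended.
bent = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (3.8, 3.8, 0.0), (0.0, 3.8, 0.0)]

print(passes_filter(designed, shifted))
print(passes_filter(designed, bent))
```

In a real pipeline the coordinates would come from the design model and the structure predictor, and the filter would run over thousands of candidates before anything is ordered for the lab.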

They further use their ChemNet model to perform detailed, atomistic studies of their designed catalysts, and then use what they learned to improve their design procedure. The active site geometry that the authors chose in this work is considerably more complex than that in previous work. Despite this formidable challenge, in very few rounds, with a relatively small number of designs, they have shown completely novel designed enzymes with catalytic efficiencies (kcat/KM) of 10³ M⁻¹s⁻¹. Strictly speaking, that's a second-order rate constant rather than a count of turnovers per second: it tells you how fast the enzyme works per unit of substrate concentration when substrate is scarce. For a sense of scale on raw turnover, slow natural enzymes like lysozyme manage about 1 turnover per second. On the catalytic efficiency scale, most natural enzymes are around 10⁵ M⁻¹s⁻¹, and the very fastest are around 10⁸ M⁻¹s⁻¹.
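For readers who want to see where a number like 10³ M⁻¹s⁻¹ comes from, here is a small Python sketch using the standard Michaelis-Menten rate law. It shows that when substrate is well below KM, the rate is approximately (kcat/KM)·[E]·[S], which is why catalytic efficiency is the natural yardstick here. The individual kcat and KM values and the concentrations are assumptions I chose so that kcat/KM comes out to 10³ M⁻¹s⁻¹; they are not measurements from the paper.

```python
import math


def mm_rate(kcat, km, enzyme_conc, substrate_conc):
    """Michaelis-Menten rate in M/s: v = kcat * [E] * [S] / (KM + [S])."""
    return kcat * enzyme_conc * substrate_conc / (km + substrate_conc)


def fold_gap(efficiency_a, efficiency_b):
    """How many times larger efficiency_b is than efficiency_a."""
    return efficiency_b / efficiency_a


# Hypothetical kinetic constants chosen so that kcat / KM = 1e3 M^-1 s^-1.
kcat = 1.0   # s^-1
km = 1e-3    # M
efficiency = kcat / km

# Low-substrate regime: [S] is a thousand times below KM.
enzyme_conc = 1e-6     # M
substrate_conc = 1e-6  # M
exact = mm_rate(kcat, km, enzyme_conc, substrate_conc)
approx = efficiency * enzyme_conc * substrate_conc
print(exact, approx)

# Order-of-magnitude efficiencies quoted in the text (M^-1 s^-1).
designed = 1e3
typical_natural = 1e5
fastest_natural = 1e8
print(fold_gap(designed, typical_natural))
print(math.log10(fold_gap(designed, fastest_natural)))
```

The two printed rates agree to within about 0.1%, and the gap to a typical natural enzyme works out to two orders of magnitude, with five to the very fastest.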

This paper is really cool because the authors have shown that it's possible to create completely novel enzymes from a desired chemical reaction, with a complex catalytic site, using diffusion models trained on natural protein structures. While the number of designs they test in the lab is quite small, and their success rate is very high, the catalytic efficiency of their designed biocatalysts is still perhaps 100-fold lower than that of most natural enzymes. We can only hope that in the future we can design enzymes with catalytic efficiencies that rival those honed by natural selection.