Protein Tools

How does it work?

--- PREPRINT --- Protein sequence and structure co-generation is a long-standing problem in protein design. By implementing DDPM-style diffusion over protein sequence space, we generate protein sequence and structure pairs. Starting with RoseTTAFold, a protein structure prediction network, we fine-tuned it to predict sequence and structure given a partially noised sequence. By applying losses to both the predicted sequence and structure, the model is forced to generate meaningful pairs. Diffusing in sequence space makes it easy to implement potentials that guide the diffusion process toward a particular amino acid composition, net charge, and more! Furthermore, you can sample proteins from a family of sequences or even train a small sequence-to-function classifier to guide generation toward desired sequences.
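To make the idea of a guiding potential concrete, here is a minimal sketch of a net-charge potential acting on per-residue amino acid logits. It is an illustration only, not the model's actual implementation: the function names, the simplified integer charge table (histidine treated as neutral), the `step_size` parameter, and the sign-based update rule are all assumptions made for this example.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Assumed, simplified side-chain charges near neutral pH.
CHARGE = {aa: 0.0 for aa in AMINO_ACIDS}
CHARGE.update({"K": 1.0, "R": 1.0, "D": -1.0, "E": -1.0})


def softmax(logits):
    """Convert one position's logits into probabilities."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def expected_net_charge(logits_per_position):
    """Expected net charge of the sequence under the current distribution."""
    total = 0.0
    for logits in logits_per_position:
        probs = softmax(logits)
        total += sum(p * CHARGE[aa] for p, aa in zip(probs, AMINO_ACIDS))
    return total


def apply_charge_potential(logits_per_position, target_charge, step_size=0.5):
    """Nudge logits so the expected net charge moves toward the target.

    Hypothetical update rule: raise the logits of residues whose charge
    has the same sign as the remaining error, and lower the others.
    """
    error = target_charge - expected_net_charge(logits_per_position)
    direction = 1.0 if error > 0 else -1.0 if error < 0 else 0.0
    return [
        [x + step_size * direction * CHARGE[aa]
         for x, aa in zip(logits, AMINO_ACIDS)]
        for logits in logits_per_position
    ]


if __name__ == "__main__":
    # Ten positions with no preference for any amino acid.
    uniform = [[0.0] * len(AMINO_ACIDS) for _ in range(10)]
    guided = apply_charge_potential(uniform, target_charge=3.0)
    print(expected_net_charge(uniform), expected_net_charge(guided))
```

In a real diffusion loop, a nudge like this would be applied to the predicted sequence distribution at each denoising step, so the guidance accumulates gradually instead of forcing the composition in a single update.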

How to use it?

You can either supply a custom input sequence to diffuse from or specify a length below. To scaffold a sequence, use the following format, where each X represents a residue to diffuse: XXXXXXXXSCIENCESCIENCEXXXXXXXXXXXXXXXXXXX. You can even design a protein with your name: XXXXXXXXXXXXNAMEHEREXXXXXXXXXXXXX!
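The scaffold format described above can be assembled and inspected with a few lines of Python. This is a small helper sketch, not part of the demo itself: `build_scaffold`, `diffused_positions`, and `nonstandard_letters` are hypothetical names, and how the demo treats letters outside the 20 standard amino acids is not stated here, so the last helper only reports them.

```python
STANDARD_AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


def build_scaffold(motif, n_before, n_after, mask_char="X"):
    """Flank a fixed motif with masked residues to be diffused."""
    if n_before < 0 or n_after < 0:
        raise ValueError("flank lengths must be non-negative")
    if not motif.isalpha():
        raise ValueError("motif must contain letters only")
    return mask_char * n_before + motif.upper() + mask_char * n_after


def diffused_positions(sequence, mask_char="X"):
    """Return the 0-based indices of residues marked for diffusion."""
    return [i for i, c in enumerate(sequence) if c == mask_char]


def nonstandard_letters(motif, mask_char="X"):
    """Report motif letters that are not standard amino acid codes."""
    return sorted(set(motif.upper()) - STANDARD_AMINO_ACIDS - {mask_char})


if __name__ == "__main__":
    seq = build_scaffold("SCIENCE", n_before=8, n_after=12)
    print(seq)
    print(len(diffused_positions(seq)), "residues will be diffused")
    print("non-standard letters:", nonstandard_letters("SCIENCE"))
```

Note that any X inside the motif itself would be read as a masked residue, so a name containing the letter X cannot be held fixed with this format.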

Acknowledgements

Thank you to Simon Dürr and the Hugging Face team for setting us up with a community GPU grant!

Model in Action

INPUTS

Start Sequence

Specify the protein length for fully unconditional generation, or scaffold a motif (or your name) using the custom sequence input.

How would you like to specify the starting sequence?

Optional Parameters

Try changing the sliders or inputting explicit secondary structure conditioning for each residue.

How would you like to specify secondary structure?

OUTPUTS

Confidence score for generated structure at each timestep

Output protein sequence

Download PDB file

Structure viewer