- ProtoBind-Diff creates drug-like molecules for particular protein targets by using only their amino acid sequences, eliminating the need for 3D structures.
- This model rivals top structure-based tools in its ability to predict binding strength, while also producing new and chemically varied compounds.
SINGAPORE, June 25, 2025 — Gero, a biotech firm specializing in aging and chronic illnesses, today unveiled ProtoBind-Diff, a masked diffusion language model capable of generating small molecules exclusively from protein sequences. Having been trained on over one million active protein-ligand combinations, ProtoBind-Diff marks a significant advancement in molecular creation. In contrast to structure-based models, which are constrained by the limited and skewed collection of determined protein-ligand complexes, ProtoBind-Diff utilizes the considerably larger volume of activity data found in public repositories. This allows for training across a much wider chemical and biological spectrum, enabling the model to perform well on less-studied targets where structural information is scarce or absent.
Gero released a preprint detailing the model’s performance and design.
“Creating small molecules that accurately engage with protein targets is among the most challenging issues in drug discovery. Traditional modeling faces difficulties because the energy levels, polarization effects, and the intricate nature of protein dynamics make precise predictions almost unachievable. Perhaps we’ve been approaching this problem incorrectly,” stated Peter Fedichev, Ph.D., CEO and Co-Founder of Gero. “Nature has already addressed this challenge; evolution has perfected a biochemical language that dictates how proteins and molecules interact. ProtoBind-Diff allows us to leverage this inherent language. It is a language model that derives understanding from sequences, rather than structures. Instead of simulating physics, it grasps the underlying principles of bioactivity from millions of actual instances.”
ProtoBind-Diff was created to serve as a core element of Gero’s generative drug discovery platform. The model utilizes pre-trained protein embeddings (ESM-2) and a denoising diffusion framework to produce chemically sound and new molecules in SMILES format, relying solely on sequence-based data.
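To make the "masked diffusion" idea concrete, the following minimal sketch shows only the forward (noising) step on a SMILES string: each token is independently replaced by a mask token with probability t. It is an illustration under stated assumptions, not Gero's implementation. The character-level tokenizer, the `[MASK]` token, and the function names are hypothetical, and the learned denoiser, which would be conditioned on ESM-2 protein embeddings, is not shown.

```python
import random

MASK = "[MASK]"  # hypothetical mask token, chosen for this illustration


def tokenize(smiles):
    """Naive character-level tokenizer (assumption).

    Real SMILES tokenizers usually treat multi-character atoms such as
    'Cl' or 'Br' and bracket atoms as single tokens.
    """
    return list(smiles)


def forward_mask(tokens, t, rng):
    """Mask each token independently with probability t (0 <= t <= 1)."""
    if not 0.0 <= t <= 1.0:
        raise ValueError("t must be in [0, 1]")
    return [MASK if rng.random() < t else tok for tok in tokens]


if __name__ == "__main__":
    rng = random.Random(0)
    tokens = tokenize("CC(=O)Oc1ccccc1C(=O)O")  # aspirin, used as an example
    for t in (0.0, 0.5, 1.0):
        noisy = forward_mask(tokens, t, rng)
        # Show masked positions as '_' for readability
        print(t, "".join("_" if tok == MASK else tok for tok in noisy))
```

In a masked diffusion language model, generation runs this process in reverse: starting from a fully masked sequence, a trained network fills in tokens over several steps.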
Key results from the preprint include:
- Performance comparable to structure-based models (e.g., Pocket2Mol, TargetDiff) in structure-aware evaluations using Boltz-1, a neural network that anticipates protein-ligand complexes and assesses their binding efficacy. ProtoBind-Diff either equals or surpasses these models across both established (“easy”) and data-scarce (“hard”) targets.
- Spontaneous interpretability, where attention heads correspond to known binding residues even without exposure to 3D binding site annotations during its training phase.
- Significant novelty, drug-like qualities, and ease of synthesis for the molecules generated, as determined by metrics for structural resemblance, drug-likeness, and synthesizability.
- Open-source availability on GitHub, accompanied by a waiting list for the public demonstration of the complete model and its codebase.
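The novelty result above rests on structural-similarity metrics. As a rough sketch of how such a metric can work, the snippet below computes the Tanimoto similarity between a generated molecule and its nearest neighbor in a reference set; a low value suggests a novel structure. The exact metrics used in the preprint are not stated here, and the character-bigram "fingerprint" is a toy stand-in (an assumption) for the chemical fingerprints a real evaluation would use. All function names are hypothetical.

```python
def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two sets of fingerprint features."""
    union = len(a | b)
    if union == 0:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(a & b) / union


def bigram_features(smiles):
    """Toy fingerprint (assumption): the set of adjacent character pairs."""
    return {smiles[i:i + 2] for i in range(len(smiles) - 1)}


def nearest_similarity(candidate, references):
    """Highest Tanimoto similarity between candidate and any reference SMILES."""
    cand = bigram_features(candidate)
    return max(
        (tanimoto(cand, bigram_features(ref)) for ref in references),
        default=0.0,  # no references: nothing to be similar to
    )


if __name__ == "__main__":
    known = ["CCO", "c1ccccc1"]  # ethanol and benzene as example references
    print(nearest_similarity("CCN", known))
```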
ProtoBind-Diff was assessed using both traditional docking techniques (AutoDock Vina) and advanced deep learning models that consider structure. Specifically, ProtoBind-Diff was evaluated with Boltz-1, an open-source neural network inspired by AlphaFold 3, a Nobel Prize-honored innovation in protein structure prediction. Boltz-1 expands on this by modeling how proteins interact with small molecules, providing a scalable, structure-aware measure for binding strength. In these evaluations, ProtoBind-Diff consistently showed a strong preference for active compounds, especially for targets with limited structural information or few identified ligands. In some instances, its Boltz-1 enrichment factors surpassed those of models trained on structural data, indicating a strong capacity to deduce spatial relationships directly from sequence embeddings.
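For readers unfamiliar with enrichment factors, the sketch below uses one common definition: the fraction of actives among the top-scoring compounds divided by the fraction of actives in the whole set. A value above 1 means actives are concentrated near the top of the ranking. The precise definition and cutoff used in the preprint may differ. The function name, the `top_fraction` parameter, and the convention that higher scores mean stronger predicted binding are assumptions made for illustration.

```python
def enrichment_factor(scores, labels, top_fraction=0.1):
    """Enrichment factor at a given top fraction of the ranked list.

    scores: predicted binding scores, higher = stronger (assumption).
    labels: 1 for known actives, 0 for inactives or decoys.
    """
    n = len(scores)
    if n == 0 or n != len(labels):
        raise ValueError("scores and labels must be non-empty and equal length")
    n_actives = sum(labels)
    if n_actives == 0:
        raise ValueError("at least one active compound is required")

    # Size of the top slice, rounded down, but never less than one compound
    n_top = max(1, int(n * top_fraction))

    # Rank compounds from highest to lowest score
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits_in_top = sum(label for _, label in ranked[:n_top])

    hit_rate_top = hits_in_top / n_top
    hit_rate_all = n_actives / n
    return hit_rate_top / hit_rate_all


if __name__ == "__main__":
    # Made-up example data: ten compounds, two of them active
    scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.0]
    labels = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
    print(enrichment_factor(scores, labels, top_fraction=0.2))
```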
“I believe we are merely at the initial stages of developing a truly ideal generative model,” stated Konstantin Avchaciov, Ph.D., a Senior Researcher at Gero and the lead scientist for this initiative. “While our evaluations show that the ProtoBind-Diff model outperforms certain current 3D structural models, I am confident that by continuing to broaden our datasets to encompass a greater variety of protein classes, we will attain substantially improved outcomes in the future.”
The introduction of ProtoBind-Diff aligns with the increasing focus on human-relevant, structure-independent methods for drug discovery, especially in fields such as pandemic preparedness, targets for neglected diseases, and proteins featuring intrinsically disordered regions.
Gero has incorporated ProtoBind-Diff into its internal discovery workflow and is actively looking for partners to utilize the model in collaborative projects spanning oncology, immunology, infectious diseases, and conditions related to aging.
About Gero
Gero is a biotechnology firm that develops new treatments for age-related illnesses and aims to promote longevity. The company uses its unique biological datasets alongside AI-powered models to comprehend and decelerate the aging process, ultimately striving to prolong a healthy human lifespan. Gero is also working with Pfizer to create therapies for fibrotic conditions, as part of its wider goal to address the fundamental causes of aging.
Media Inquiries
Kimberly Ha
KKH Advisors
917-291-5744