3D World of Molecules
The other day, while driving to work, I listened to a podcast featuring Fei-Fei Li, who recently launched World Labs, a company focused on developing spatial intelligence by building large world models. For those outside the field of AI, Fei-Fei Li is a powerhouse, widely regarded as one of the most influential figures in artificial intelligence. She’s a professor at Stanford University and has mentored several key figures in AI, including Andrej Karpathy, the former head of AI at Tesla. In the podcast, Fei-Fei and her co-founder, Justin Johnson, discussed the significance of 3D world models for understanding the physics of our environment. Current large language models, like the ones behind ChatGPT, are sequence-based and primarily operate in one dimension. Meanwhile, many of the generative image and video technologies work in 2D. Fei-Fei and Justin argue that for machines to truly perceive, reason, and act in a meaningful way, they must be able to interpret the 3D world. This is crucial for machines to grasp the physical properties of the world and, ultimately, achieve true intelligence.
3D World of Biology and Chemistry
While Fei-Fei and Justin are focused on applying 3D world models to areas like gaming and virtual reality (VR), there are interesting parallels to the biological and chemical world. At the molecular level, biology and chemistry also exist in a 3D space where components interact dynamically. In our everyday 3D world, we can generate a vast amount of 2D data (images and videos) that might help us infer the physics of 3D spaces, and perhaps even create virtual worlds in the future. However, the molecular world doesn't offer this abundance of 2D images and videos. In theory, we could simulate the molecular 3D world using quantum mechanics, but in practice, it's computationally infeasible on a large scale. As a result, we rely on approximations of atomic forces and electron behaviour, and our simulations don't always capture the full complexity of biological reality.
I recently attended the “AI in Chemistry” conference, organized by the Royal Society of Chemistry at Churchill College, Cambridge, UK. John Jumper from DeepMind delivered a fantastic talk on AlphaFold3. When asked whether AlphaFold3 understands the physics behind protein folding, he didn’t give a definitive answer but suggested that it might, given its ability to predict novel protein structures with high accuracy. Critics have long argued that AlphaFold may simply be memorizing structures based on sequence similarity within its training set, rather than truly learning the underlying physics. But when it comes to predicting 3D protein structures, this might not matter. If you’re a structural biologist, you’re likely thrilled that AlphaFold provides reliable starting models that can be used in conjunction with experimental X-ray diffraction data. It’s incredible that we now have tools capable of predicting the 3D structure of most proteins.
However, for those deeply invested in understanding the biological world, AlphaFold’s success might seem limited. Tools like AlphaFold, RoseTTAFold, and others don’t model the dynamic, ever-changing nature of biology. They provide a decent snapshot of a small portion of biological complexity, but protein structures and interactions are dynamic, and this 3D flexibility is key to their function. This is one reason we haven’t yet seen major breakthroughs in drug discovery using AlphaFold and similar models. Companies like Isomorphic Labs and Xaira Therapeutics might change that in the future, but for now, the impact on biology has been modest.
Some argue that current AI-based tools are comparable to docking methods for predicting binding interactions. Docking has been invaluable for providing approximations of molecular binding interactions, but it falls short when it comes to accurately predicting binding affinities—critical for ranking molecules in drug discovery campaigns. Without precise ranking, it’s difficult to converge on a highly potent drug candidate. While AlphaFold3 is capable of folding small molecules and proteins together, its performance in this regard is similar to, if not slightly worse than, traditional docking methods.
Extracting 3D Information from Biology: The Challenges and Possibilities
Can we extract 3D information from biological data to enable future models to learn the physics of biological interactions? My interest is in designing molecules that bind to biomolecular targets, so I’ll centre this discussion around that topic.
In the podcast, Justin also made an important point: 2D representations of the world, such as images, contain a wealth of information about the 3D world. Essentially, 2D data can be seen as a compressed version of 3D space, and with enough 2D data, you can infer 3D properties. Similarly, we often draw molecular structures in 2D, and these drawings might implicitly capture the ensemble of 3D conformations a molecule can adopt. In theory, if we had sufficient data, 2D molecular drawings might offer enough information about a molecule’s conformational diversity. Some in the field argue that 2D representations are all we need. However, unlike image and video generation, where large datasets exist, biology may not yet have the volume of 2D data necessary to fully exploit this approach. Therefore, 3D representations might provide an advantage.
In theory, physics-based molecular dynamics (MD) simulations can offer the 3D representations needed to capture the various conformational states of binding ligands, their protein targets, surrounding solvent water, and ions present in the environment. The challenge lies in converting these 3D simulations into meaningful representations—vectors and matrices—that are useful for machine learning tasks.
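As one small illustration of what "converting 3D into vectors and matrices" can mean, here is a minimal sketch that turns a single frame of atomic coordinates into a pairwise distance matrix and then a flat feature vector. This is only one of many possible featurizations, and the three-atom `frame` is a made-up example, not real simulation output. The appeal of distances is that they do not change when the whole system is translated or rotated, which raw x, y, z coordinates do.

```python
import math

def distance_matrix(coords):
    """Pairwise Euclidean distances between atoms given as (x, y, z) tuples."""
    n = len(coords)
    return [[math.dist(coords[i], coords[j]) for j in range(n)] for i in range(n)]

def flatten_upper(matrix):
    """Upper triangle (excluding the diagonal) as a flat feature vector."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]

# Hypothetical three-atom frame (coordinates in angstroms); illustrative only.
frame = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]

# The same frame moved elsewhere in space.
shifted = [(x + 5.0, y - 2.0, z + 3.0) for x, y, z in frame]

features = flatten_upper(distance_matrix(frame))
shifted_features = flatten_upper(distance_matrix(shifted))

# The two feature vectors agree (up to floating-point error) even though
# the raw coordinates differ.
same = all(abs(a - b) < 1e-9 for a, b in zip(features, shifted_features))
print(features, same)
```

A real pipeline would compute something like this per simulation frame and then still face the question of how to combine thousands of frames into one representation per molecule.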
There are many questions to consider: Should you use the lowest-energy conformer, or should you take the top 10 minimum-energy conformers? Should you dock the molecule in the binding site and select the most reasonable conformation? Or should you use hundreds of conformations for a single molecule? These are complex problems, but ones that are worth exploring.
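To make the first two options concrete, here is a minimal sketch comparing "take the lowest-energy conformer", "take the top k", and a third common idea, weighting every conformer by its Boltzmann population. The energies and the per-conformer descriptor values are invented for illustration, and the value of kT assumes room temperature with energies in kcal/mol.

```python
import math

KT_KCAL_PER_MOL = 0.593  # approximate kT at 298 K, in kcal/mol

def top_k_conformers(energies, k):
    """Indices of the k lowest-energy conformers, lowest first."""
    return sorted(range(len(energies)), key=lambda i: energies[i])[:k]

def boltzmann_weights(energies, kt=KT_KCAL_PER_MOL):
    """Normalised Boltzmann populations, computed relative to the minimum energy."""
    e_min = min(energies)
    factors = [math.exp(-(e - e_min) / kt) for e in energies]
    total = sum(factors)
    return [f / total for f in factors]

def ensemble_average(values, weights):
    """Population-weighted average of a per-conformer property."""
    return sum(v * w for v, w in zip(values, weights))

# Hypothetical relative energies (kcal/mol) for four conformers of one molecule.
energies = [2.0, 0.0, 1.0, 5.0]
# Hypothetical per-conformer descriptor, e.g. a radius of gyration in angstroms.
descriptor = [4.1, 3.2, 3.6, 5.0]

lowest = top_k_conformers(energies, 1)
top_two = top_k_conformers(energies, 2)
weights = boltzmann_weights(energies)
averaged = ensemble_average(descriptor, weights)
print(lowest, top_two, [round(w, 3) for w in weights], round(averaged, 3))
```

The weighted average sits close to the lowest-energy conformer's value but is nudged by the others, which is one way to keep some ensemble information without feeding hundreds of structures to a model.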
Another issue is that many assumptions are made in physics-based simulations, making it difficult to accurately reflect reality. However, the hope is that we can generate enough simulation data close to reality that small amounts of noise won't significantly affect the results. Some argue that if the noise is consistent across all molecules being studied, it may not matter much. I tend to agree with this viewpoint—unless the simulations deviate too far from reality.
A significant problem in the field of predicting binding interactions is the lack of reliable experimental data, especially in large quantities. Binding interaction measurements often depend on the methods and reagents used. For example, protein-RNA interactions can display different binding affinities depending on how the RNA folds under different ionic conditions in the buffer. As many in the field have pointed out, it's essential to be cautious when mixing data from various sources. I hope that one day, we’ll have models that can assist in designing biologically relevant assays by helping us select the right buffer, cells, reagents, time points, and more.
Diffusion Models, Simulations, and Protein Language Models
There’s a lot of ongoing research into using diffusion-based algorithms to learn from representations of small molecules and their binding partners. At a high level, diffusion models work by adding noise to the training data and then learning to reverse that noise. In the future, I plan to write a technical blog explaining how to code these models yourself. These algorithms gained popularity in the field of image generation. If you've followed that space, you might remember some of the bizarre images generated in the early days, such as people with multiple hands or an unrealistic number of fingers. However, diffusion models have advanced significantly in image generation, and today, they can produce images nearly indistinguishable from real photos. In our field, the focus of current diffusion model research is on generating synthetically accessible, valid molecules with appropriate bond angles and torsions. Many current models struggle with producing valid molecules and are far from being able to rank molecules effectively to find tight binders for a target pocket. While diffusion models may become usable in the near future, they are not quite there yet.
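For readers who want a feel for the "add noise, then learn to reverse it" idea, here is a minimal sketch of the noising half, assuming the standard DDPM-style formulation with a linear noise schedule. That formulation is my assumption for illustration; the molecular models discussed above use a variety of schemes. The coordinates are made up, and there is no neural network here: the last function simply shows that if you knew the exact noise that was added, you could undo it, which is the quantity a trained model learns to predict.

```python
import math
import random

def alpha_bar_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Cumulative product of (1 - beta_t) for a linear beta schedule (num_steps >= 2)."""
    alpha_bars, running = [], 1.0
    for t in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * t / (num_steps - 1)
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars

def add_noise(x0, alpha_bar, rng):
    """Forward process: x_t = sqrt(a) * x0 + sqrt(1 - a) * eps. Returns (x_t, eps)."""
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    scale_signal, scale_noise = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    xt = [scale_signal * x + scale_noise * e for x, e in zip(x0, eps)]
    return xt, eps

def denoise_with_known_eps(xt, eps, alpha_bar):
    """Invert the forward step given the true noise; a trained model would predict eps."""
    scale_signal, scale_noise = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    return [(x - scale_noise * e) / scale_signal for x, e in zip(xt, eps)]

rng = random.Random(0)
alpha_bars = alpha_bar_schedule(1000)

# Hypothetical flattened x, y, z coordinates of two atoms (angstroms).
x0 = [0.0, 0.0, 0.0, 1.5, 0.0, 0.0]

t = 500
xt, eps = add_noise(x0, alpha_bars[t], rng)
recovered = denoise_with_known_eps(xt, eps, alpha_bars[t])
print(alpha_bars[0], alpha_bars[-1])
```

Early in the schedule almost all of the original signal survives, while by the final step the sample is close to pure Gaussian noise. Training amounts to showing a network many `(xt, t)` pairs and asking it to guess `eps`.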
Another exciting line of research involves using AlphaFold to generate multiple conformations of a protein as starting points for further molecular dynamics (MD) simulations. This approach aims to capture the conformational diversity of proteins. If you’re interested in this space, I recommend looking into the work of Pratyush Tiwary. It's fascinating to see the worlds of AI and physics-based simulations coming together, and I’m excited about where this could lead in the future.
There’s also significant progress being made by using embeddings from protein language models like ESM-2 to identify strongly binding molecules, especially proteins and peptides. For more on this, you can refer to the work of Pranam Chatterjee’s group. Protein language models have shown some success in capturing structural information about proteins. Pranam’s research has taken these protein language models a step further with methods like PepPrCLIP, which have identified tight peptide binders for specific protein targets.
My Ideal Future
Personally, my dream is to have an algorithm that, given the sequence of a target protein, can not only identify binding pockets—even cryptic ones—but also suggest tight-binding small molecules with favourable drug-like properties. While we're far from achieving this, it would be a game-changer. And honestly, if we ever get there, I wouldn’t mind if the model had learned the physics of binding interactions or simply memorized every possible binding scenario.
While I’ve focused mostly on finding tight small molecule binders for protein targets—my area of interest—the broader concepts of dynamics, 3D conformations, and biomolecular interactions apply to many areas of biology. Understanding these interactions is essential, not just for drug discovery but for unravelling the complexities of biology.