Graph Algorithms and Deepchem
DeepChem is a Python library that builds on RDKit. We've built our toolkit with RDKit and TensorFlow, using SMILES strings to represent chemical structures. The SMILES strings can then be fed into machine learning models.
Recently, some groups have found that representing a molecule as a graph can be more accurate than representing it as a SMILES string. A SMILES string flattens the molecule into a sequence of characters, and some features are lost along the way. In a graph, atoms are nodes and bonds are edges, so the connectivity is explicit and each atom and bond can carry its own features. That extra information can translate into more accurate models.
Sequence data like SMILES strings is termed Euclidean data. Images and text are also Euclidean. Geometric deep learning, on the other hand, works with non-Euclidean data such as graphs and 3D structures. Representing a molecule as a SMILES string comes at the expense of much of the structural information in the molecule.
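To make the atoms-as-nodes, bonds-as-edges idea concrete, here is a minimal sketch in plain Python. The `MolGraph` class and the hand-written ethanol example are assumptions made for illustration. They are not part of RDKit or DeepChem, which provide their own molecule objects and graph featurizers.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class MolGraph:
    """Hypothetical minimal molecular graph: atoms are nodes, bonds are edges."""

    atoms: list[str]                     # node features: element symbols
    bonds: list[tuple[int, int, float]]  # edges: (atom_i, atom_j, bond order)
    adjacency: dict[int, list[int]] = field(init=False)

    def __post_init__(self) -> None:
        # Build an undirected adjacency list from the bond list.
        self.adjacency = {i: [] for i in range(len(self.atoms))}
        for i, j, _order in self.bonds:
            self.adjacency[i].append(j)
            self.adjacency[j].append(i)

    def degree(self, idx: int) -> int:
        """Number of bonded neighbors of the atom at index idx."""
        return len(self.adjacency[idx])


# Heavy atoms of ethanol (SMILES "CCO"), written out by hand; hydrogens are implicit.
ethanol = MolGraph(atoms=["C", "C", "O"], bonds=[(0, 1, 1.0), (1, 2, 1.0)])
print(ethanol.adjacency)
```

A graph model consumes the node features and the adjacency structure directly, instead of having to infer connectivity from the order of characters in a string.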
We can use an autoencoder with graphs to generate new molecules.
This code comes from the book Deep Learning for the Life Sciences.
In our next article we will show how this can be combined with a PAINS screen, a drug-likeness screen, and an average binding score filter.
Then we will feed the surviving molecules into AutoDock. In the final part of the series, we will write code that selects the top binders from each round, fragments them, and feeds them into the next round. The goal is to develop better binders and to automate the process.
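As a rough sketch of the selection step in that loop, the snippet below picks the best-scoring molecules from one round. Everything in it is an assumption for illustration: the function name `select_top_binders` is hypothetical, the scores are placeholder numbers and not real docking results, and it assumes that a lower (more negative) score means a better binder.

```python
import heapq


def select_top_binders(scores, k):
    """Return the k (smiles, score) pairs with the lowest scores.

    Assumes lower scores are better, as with a binding energy.
    """
    return heapq.nsmallest(k, scores.items(), key=lambda item: item[1])


# Placeholder scores for illustration only; these are NOT real docking results.
round_scores = {
    "CCO": -3.1,
    "c1ccccc1O": -5.4,
    "CC(=O)O": -3.8,
    "c1ccncc1": -4.6,
}

top = select_top_binders(round_scores, k=2)
seeds = [smiles for smiles, _score in top]  # molecules carried into the next round
print(seeds)
```

In the full pipeline, `seeds` would be fragmented and passed back to the generator before the next docking round.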