Building a chemical library

Patrick Chirdon
12 min read · Nov 16, 2020

Generating new molecules

To generate new molecules with a genetic algorithm, we used a package called Data Warrior. We chose it because it is an open-source, easy-to-use platform for generating new molecules from user-provided scaffolds. Besides building compound libraries, Data Warrior can calculate physicochemical properties, create graphs, and visualize data. We used its genetic algorithm functionality to generate compounds structurally similar to our 16 starting scaffolds.

Briefly, the genetic algorithm in Data Warrior works as follows: (a) take a user-provided molecular structure, called the scaffold, as input; (b) mutate it randomly by changing fragments on the molecule, and select the structures that are most similar to the original structure; (c) generate a pre-specified number of children for every scaffold; (d) select the most structurally similar molecules from the population. These molecules become the starting structures for the next generation. The steps are repeated until the desired number of generations is reached. While Data Warrior is a powerful tool for rapidly generating a large number of molecules (~10⁵), it is inflexible in that it does not let us choose a selection criterion other than structural similarity. We ran Data Warrior for 400 generations, with 4096 children generated per generation, of which 32 were selected. The algorithm selected the 32 molecules most similar to the parent generation. Since we wanted to generate compounds…
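
The mutate-and-select loop described above can be illustrated with a toy sketch. This is not Data Warrior's implementation: molecules are stood in for by plain strings, the fragment alphabet is made up, and structural similarity is approximated with Python's difflib string similarity. The function names, parameters, and the choice to rank children against the original scaffold are all assumptions made for illustration.

```python
import random
from difflib import SequenceMatcher

# Hypothetical toy "fragment" alphabet; real molecules are graphs, not strings.
ALPHABET = "CNOS"


def similarity(a, b):
    """Stand-in for structural similarity: string similarity in [0, 1]."""
    return SequenceMatcher(None, a, b).ratio()


def mutate(molecule, rng):
    """Replace one randomly chosen position with a different symbol."""
    pos = rng.randrange(len(molecule))
    options = [c for c in ALPHABET if c != molecule[pos]]
    return molecule[:pos] + rng.choice(options) + molecule[pos + 1:]


def evolve(scaffold, n_generations=5, n_children=8, n_selected=4, seed=0):
    """Toy genetic algorithm: mutate parents, keep the most similar children."""
    rng = random.Random(seed)
    parents = [scaffold]
    for _ in range(n_generations):
        # Every parent produces n_children mutated copies (duplicates collapse).
        children = {mutate(p, rng) for p in parents for _ in range(n_children)}
        children.discard(scaffold)  # keep only molecules that are new
        if not children:
            break
        # Assumption: rank by similarity to the original scaffold;
        # ties are broken alphabetically so the result is reproducible.
        ranked = sorted(children, key=lambda m: (-similarity(m, scaffold), m))
        parents = ranked[:n_selected]
    return parents


if __name__ == "__main__":
    for molecule in evolve("CCOCCNCCSC"):
        print(molecule, round(similarity(molecule, "CCOCCNCCSC"), 2))
```

In the run described in the article, the corresponding settings would be 400 generations with 32 molecules selected out of 4096 per generation; the small defaults here only keep the toy example fast.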
