OCHEM with rdkit

Patrick Chirdon
1 min read · Dec 11, 2021

A rule-based system is essentially a set of if-then statements: if this SMARTS pattern is present, then the chemical has this chemical mechanism (SN2, Schiff base, etc.) and this biological mechanism. I use rdkit for a lot of these programs. I thought I would post this because it is quite different from the data-driven machine learning approaches I would otherwise use, e.g. SVM, neural net, random forest. There are also a number of physical chemistry and thermodynamics-based rule approaches that use formulas for QSAR models, but I'll just cover a really simple example here.

from rdkit import Chem
import pandas as pd

# Tab-separated table of rules; it needs a 'SMARTS' column
mydf = pd.read_csv('smarty.csv', sep='\t')
mydflist = mydf.values.tolist()
# Query molecule, given as SMILES
m = Chem.MolFromSmiles('O=CC=Cc1ccccc1')

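myindex = []  # row positions of the rules that match m
for j, k in enumerate(mydf['SMARTS']):
    # Parse the pattern as SMARTS, not SMILES.
    # MolFromSmarts returns None when the pattern cannot be parsed.
    n = Chem.MolFromSmarts(k) if isinstance(k, str) else None
    if n is None:
        print('fail')
        continue
    if m.HasSubstructMatch(n):
        myindex.append(j)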

# Keep only the rows whose SMARTS matched.
# DataFrame.append was removed in pandas 2.0, so select the rows by position instead.
new = mydf.iloc[myindex]

For the molecule O=CC=Cc1ccccc1 (cinnamaldehyde), new holds the rows of the spreadsheet whose SMARTS pattern matched.

The rules come from a spreadsheet I created (smarty.csv above) of SMARTS substructures, the associated ochem mechanisms from the literature, and their references.
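To make the file layout concrete, here is a minimal sketch of what such a tab-separated table might look like and how to check that every pattern parses before using it. Only the SMARTS column name comes from the code above. The mechanism column and all four rows are made up for illustration and are not the contents of my spreadsheet. The last row is deliberately broken to show how a bad pattern surfaces.

```python
import io

import pandas as pd
from rdkit import Chem

# Hypothetical stand-in for smarty.csv (tab-separated).
# Only the 'SMARTS' column name is taken from the article; the rest is illustrative.
EXAMPLE_TSV = (
    "SMARTS\tmechanism\n"
    "C=CC=O\tMichael addition\n"
    "[CX3H1](=O)[#6]\tSchiff base formation\n"
    "[CX4][Cl,Br,I]\tSN2\n"
    "C(=O\tdeliberately invalid pattern\n"
)

rules = pd.read_csv(io.StringIO(EXAMPLE_TSV), sep="\t")

# Chem.MolFromSmarts returns None for a pattern it cannot parse,
# so a None entry marks a row that needs fixing in the spreadsheet.
patterns = [Chem.MolFromSmarts(s) for s in rules["SMARTS"]]
bad_rows = [i for i, p in enumerate(patterns) if p is None]

print("rows with unparsable SMARTS:", bad_rows)
```

Checking the patterns once up front means a typo in the spreadsheet shows up as a row number rather than as a silent non-match later.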

With an approach like this, you could build a simple database.
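As a rough sketch of that idea, the same matching loop can be run over several molecules and the hits collected into one table. The function name screen, the column names smiles and mechanism, and the three rules are assumptions for illustration only. A real rule set would come from the literature-backed spreadsheet.

```python
import pandas as pd
from rdkit import Chem

# Illustrative rules only: (SMARTS pattern, mechanism label).
RULES = [
    ("C=CC=O", "Michael addition"),
    ("[CX3H1](=O)[#6]", "Schiff base formation"),
    ("[CX4][Cl,Br,I]", "SN2"),
]


def screen(smiles_list, rules=RULES):
    """Return one row per (molecule, matching rule) pair."""
    # Compile each SMARTS once; unparsable patterns come back as None.
    compiled = [(s, mech, Chem.MolFromSmarts(s)) for s, mech in rules]
    rows = []
    for smi in smiles_list:
        mol = Chem.MolFromSmiles(smi)
        if mol is None:
            continue  # skip SMILES that rdkit cannot parse
        for smarts, mech, patt in compiled:
            if patt is not None and mol.HasSubstructMatch(patt):
                rows.append({"smiles": smi, "SMARTS": smarts, "mechanism": mech})
    return pd.DataFrame(rows, columns=["smiles", "SMARTS", "mechanism"])


hits = screen(["O=CC=Cc1ccccc1", "CCBr", "CCO"])
print(hits)
```

The result is in long format, one row per match, so it can be written out with to_csv or grouped by molecule or by mechanism.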
