Mass spec random forest
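This is a first stab at a machine learning program for mass spec.
A random forest is an ensemble of decision trees. The feature importances it reports show which input features contribute most to its predictions, so they are a useful guide to which features a model should focus on.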
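To make the feature-ranking idea concrete, here is a minimal sketch using scikit-learn on synthetic data. The descriptor names and the made-up target are placeholders invented for illustration, not real measurements, and I am assuming a regression setup here purely for the example.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic data: four hypothetical descriptors (names are illustrative only)
rng = np.random.default_rng(0)
n_samples = 300
feature_names = ["descriptor_a", "descriptor_b", "descriptor_c", "descriptor_d"]
X = rng.normal(size=(n_samples, len(feature_names)))

# Made-up target: depends strongly on descriptor_a, weakly on descriptor_b,
# and not at all on the other two
y = 3.0 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.1, size=n_samples)

forest = RandomForestRegressor(n_estimators=100, random_state=0)
forest.fit(X, y)

# Impurity-based importances; they are normalized to sum to 1
ranked = sorted(
    zip(feature_names, forest.feature_importances_),
    key=lambda pair: pair[1],
    reverse=True,
)
for name, score in ranked:
    print(f"{name}: {score:.3f}")
```

The feature that drives the target should come out on top. Impurity-based importances can be biased toward high-cardinality features, so on real data it may be worth cross-checking with permutation importance.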
I used the toxicity database here: https://xundrug.cn/moltox
This can be used to build a mass spec fragmenter, for example: https://cfmid.wishartlab.com/queries/25dde6e494cf990892d3f451d2180166762755b5
The idea behind mass spec is that if you do not know the structure of a compound but do know its mass-to-charge ratio, you can guess which functional groups might be present. For initial chemical discovery you might know only the chemical formula and not the exact arrangement of the atoms, e.g. if you had millions of unknown compounds to screen. For this program, which uses RDKit, we will be modeling chemicals toxic to T. pyriformis in ponds and swimming pools.
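As a rough illustration of the mass-to-charge idea, here is a small pure-Python sketch that computes a monoisotopic mass from a chemical formula and the m/z of the singly protonated ion. The helper names (`monoisotopic_mass`, `mz_protonated`) and the short element table are my own assumptions for this example, the masses are rounded, and the parser only handles simple formulas without brackets.

```python
import re

# Monoisotopic masses (in u) of the most abundant isotopes, rounded.
# Only a handful of elements are included for this illustration.
MONOISOTOPIC_MASS = {
    "H": 1.007825,
    "C": 12.000000,
    "N": 14.003074,
    "O": 15.994915,
    "S": 31.972071,
    "Cl": 34.968853,
}
PROTON_MASS = 1.007276  # mass of a proton in u


def monoisotopic_mass(formula: str) -> float:
    """Sum the element masses for a simple formula such as 'C6H6O'."""
    total = 0.0
    for symbol, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        total += MONOISOTOPIC_MASS[symbol] * (int(count) if count else 1)
    return total


def mz_protonated(formula: str, charge: int = 1) -> float:
    """m/z of the [M + nH]n+ ion, assuming ionization by protonation."""
    return (monoisotopic_mass(formula) + charge * PROTON_MASS) / charge


# Phenol (C6H6O), then C2H6O, which is both ethanol and dimethyl ether
for formula in ["C6H6O", "C2H6O"]:
    print(formula, round(monoisotopic_mass(formula), 4), round(mz_protonated(formula), 4))
```

Ethanol and dimethyl ether share the formula C2H6O, so they give the same m/z. That is why the mass alone only narrows down the formula, and fragmentation patterns are needed to guess at the arrangement of the atoms.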
# TensorFlow/Keras model and layer classes
import tensorflow as tf
from keras import Input, Model, Sequential
from keras.layers import Dense, Flatten
import pandas as pd

# Load the tab-separated data file and preview the first rows
df = pd.read_csv('new.csv', sep='\t')
print(df.head())
import keras
import matplotlib.pyplot as plt
import numpy as np
import glob
import pickle

# Jupyter magic: render plots inline in the notebook
%matplotlib inline