Generating New Drug SMILES Data Using an RNN-LSTM Model
The goal of this task is to generate new molecules de novo using a Recurrent Neural Network; de novo simply means to synthesize something new from scratch. The idea is to train the model to learn the patterns in SMILES strings so that its generated output corresponds to valid molecules. SMILES is a string representation of a molecule based on its structure and components, making it a computer-friendly way to represent molecules.
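For example, RDKit (installed in the steps below) can parse a SMILES string into a molecule object, and a parse that returns None is a quick validity check. A minimal sketch; the aspirin SMILES here is just an illustrative input, not part of the dataset:
from rdkit import Chem
# Parse a SMILES string into an RDKit molecule; invalid strings return None
mol = Chem.MolFromSmiles("CC(=O)Oc1ccccc1C(=O)O")  # aspirin, for illustration
print(mol is not None)        # True: the string encodes a valid molecule
print(Chem.MolToSmiles(mol))  # the canonical SMILES for the same molecule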
Steps:
Install RDKit: RDKit is a cheminformatics toolkit that enables working with chemical structures and data. To install it, you can use the following command:
!pip install rdkit-pypi
Install DeepChem: DeepChem is a Python library that provides tools for deep learning in cheminformatics. You can install it using the following command:
!pip install deepchem
Import Libraries: In your Jupyter Notebook, import the required libraries for working with RDKit and DeepChem:
Import general packages to pre-process the SMILES data¶
First step: install the RDKit package to work with chemical SMILES data
- Load the dataset, convert it to a DataFrame using `rdkit.Chem.PandasTools`, and do some analysis on it
%%capture
!pip install -q condacolab
import condacolab
condacolab.install()
# I HAVE ALREADY INSTALLED THIS IN MY ENVIRONMENT
!mamba install -c conda-forge rdkit
!curl -Lo deepchem_installer.py https://raw.githubusercontent.com/deepchem/deepchem/master/scripts/colab_install.py
import deepchem_installer
deepchem_installer.install()
add /root/miniconda/lib/python3.10/site-packages to PYTHONPATH
all packages are already installed
%%capture
!pip install transformers
!pip install --pre deepchem
import deepchem
deepchem.__version__
No normalization for AvgIpc. Feature removed!
Skipped loading modules with pytorch-geometric dependency, missing a dependency. No module named 'torch_geometric'
Skipped loading modules with pytorch-geometric dependency, missing a dependency. cannot import name 'DMPNN' from 'deepchem.models.torch_models' (/usr/local/lib/python3.10/site-packages/deepchem/models/torch_models/__init__.py)
Skipped loading modules with pytorch-lightning dependency, missing a dependency. No module named 'pytorch_lightning'
Skipped loading some Jax models, missing a dependency. No module named 'haiku'
'2.7.2.dev'
from rdkit.Chem import PandasTools
import pandas as pd
from rdkit.Chem.Draw import IPythonConsole
import os
from rdkit import Chem
from rdkit import RDConfig
import numpy as np
from rdkit.Chem import Draw, Descriptors
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from google.colab import drive, files
drive.mount('/drive')
Mounted at /drive
Load the dataset and process it:¶
- We read the CSV file with the delimiter set to "\t" to handle tab-separated values, and name the columns "smiles" and "labels" via the `names` parameter. The resulting DataFrame is stored in the variable `data`. Finally, the `set_index` method is called to set the "smiles" column as the index of the DataFrame; since the `inplace` parameter is set to False, the original DataFrame is not modified and the re-indexed result is simply displayed as the cell output.
data_training ='/drive/My Drive/smiles'
smifile = data_training + '/training.smi'
data = pd.read_csv(smifile, delimiter = "\t", names = ["smiles","labels"], index_col=False)
data.set_index("smiles",inplace=False)
| smiles | labels |
|---|---|
| CC(N1CCC(CC1)N(C)C(=O)C2=CN(C)C(=O)c3[nH]ccc23)c4ccccc4 | 0 |
| CN(C1CCN(CC1)c2cc(ncn2)C(F)(F)F)C(=O)C3=CN(CC=C)C(=O)c4[nH]ccc34 | 0 |
| CN1C(=O)C=Cc2ccccc12 | 0 |
| CC(C)N1CCC(CC1)N(C)C(=O)C2=CN(C)C(=O)c3[nH]ccc23 | 0 |
| CC(=O)c1cc(c2ccccc2S(=O)(=O)C)c3ccccn13 | 0 |
| ... | ... |
| COc1cc2N(C)C(=O)C=C(C)c2cc1NS(=O)(=O)c3ccc(cc3)C#N | 1 |
| CN1C(=O)C=Cc2cc(NS(=O)(=O)c3ccc(cc3)C#N)ccc12 | 1 |
| CN1C(=O)C=Cc2cc(NS(=O)(=O)c3ccc(cc3)C#N)ccc12 | 1 |
| Cc1nnc2c3ccccc3c(nn12)c4ccc(N5CCOCC5)c(NS(=O)(=O)c6ccc(Cl)cc6)c4 | 1 |
| CN1C(=O)C(=Cc2cc(NS(=O)(=O)c3ccc(cc3)C#N)ccc12)C | 1 |

102 rows × 1 columns
data.head(10)
|  | smiles | labels |
|---|---|---|
| 0 | CC(N1CCC(CC1)N(C)C(=O)C2=CN(C)C(=O)c3[nH]ccc23... | 0 |
| 1 | CN(C1CCN(CC1)c2cc(ncn2)C(F)(F)F)C(=O)C3=CN(CC=... | 0 |
| 2 | CN1C(=O)C=Cc2ccccc12 | 0 |
| 3 | CC(C)N1CCC(CC1)N(C)C(=O)C2=CN(C)C(=O)c3[nH]ccc23 | 0 |
| 4 | CC(=O)c1cc(c2ccccc2S(=O)(=O)C)c3ccccn13 | 0 |
| 5 | CC(=O)c1cc(c2ccccc2S(=O)(=O)C)c3cc(Oc4ccccc4)c... | 0 |
| 6 | CNC(=O)N1CCc2c(C1)c(nn2C3CCOCC3)N4CCCc5cc(c6cn... | 0 |
| 7 | CC(=O)N1CCc2[nH]nc(Nc3ccccc3)c2C1 | 0 |
| 8 | CC(=O)N1CCc2c(C1)c(Nc3ccc(cc3F)c4cnn(C)c4)nn2[... | 0 |
| 9 | CN(C1CCN(C)CC1)C(=O)C2=CN(C)C(=O)c3[nH]ccc23 | 0 |
data.describe()
|  | labels |
|---|---|
| count | 102.000000 |
| mean | 0.656863 |
| std | 0.477101 |
| min | 0.000000 |
| 25% | 0.000000 |
| 50% | 1.000000 |
| 75% | 1.000000 |
| max | 1.000000 |
data.iloc[0:5]
data.iloc[:-4]
|  | smiles | labels |
|---|---|---|
| 0 | CC(N1CCC(CC1)N(C)C(=O)C2=CN(C)C(=O)c3[nH]ccc23... | 0 |
| 1 | CN(C1CCN(CC1)c2cc(ncn2)C(F)(F)F)C(=O)C3=CN(CC=... | 0 |
| 2 | CN1C(=O)C=Cc2ccccc12 | 0 |
| 3 | CC(C)N1CCC(CC1)N(C)C(=O)C2=CN(C)C(=O)c3[nH]ccc23 | 0 |
| 4 | CC(=O)c1cc(c2ccccc2S(=O)(=O)C)c3ccccn13 | 0 |
| ... | ... | ... |
| 93 | COc1ccc(cc1)S(=O)(=O)Nc2cc(ccc2N3CCN(C)CC3)c4n... | 1 |
| 94 | COc1cc(cc(C)c1CN(C)C)C2=CN(C)C(=O)C(=C2)C | 1 |
| 95 | C[C@@H]1CC(=O)Nc2cccc(c2N1)c3ccc4c(c3)c(nn4C)c... | 1 |
| 96 | C[C@@H]1CC(=O)Nc2cccc(c2N1)c3ccc4c(c3)c(nn4C)c... | 1 |
| 97 | COc1cc2N(C)C(=O)C=C(C)c2cc1NS(=O)(=O)c3ccc(cc3... | 1 |

98 rows × 2 columns
data.smiles[0]
'CC(N1CCC(CC1)N(C)C(=O)C2=CN(C)C(=O)c3[nH]ccc23)c4ccccc4'
Visualize Chemical Components:¶
- The function `mol_with_atom_index(mol)` takes a molecule (`mol`) as input and sets each atom's map number to its index in the molecule, so the atom indices are shown when the molecule is drawn. Next, `IPythonConsole.drawOptions.addAtomIndices = True` enables display of atom indices in molecule visualizations, and `IPythonConsole.molSize = 300, 300` sets the size of the rendering. A molecule (`mol_1`) is created from the SMILES string in the third row of the `data` DataFrame (`data.smiles.iloc[2]`). The `mol_with_atom_index` function is called with `mol_1` as its argument, the result is stored in the variable `mol_indexAtom`, and the last line of the code cell displays the molecule with its atom indices.
def mol_with_atom_index(mol):
    # Set each atom's map number to its index so the indices appear in drawings
    for atom in mol.GetAtoms():
        atom.SetAtomMapNum(atom.GetIdx())
    return mol
IPythonConsole.drawOptions.addAtomIndices = True
IPythonConsole.molSize = 300,300
mol_1=Chem.MolFromSmiles(data.smiles.iloc[2])
mol_indexAtom=mol_with_atom_index(mol_1)
mol_indexAtom
IPythonConsole.molSize = 300,300
mol_indexAtom
IPythonConsole.drawOptions.addAtomIndices = True
IPythonConsole.molSize = 300,300
# Parse each SMILES string in the dataset; `mol` ends up holding the last molecule
for smi in data.smiles:
    mol = Chem.MolFromSmiles(smi)
mol
mol
IPythonConsole.molSize = 400,400
mol_with_atom_index(mol)
Transformations and feature engineering:¶
In this code cell, we perform transformations and feature engineering on the chemical dataset. The dataset contains SMILES representations of chemical compounds.
- We create a `charset` containing the unique characters from all SMILES strings, plus two special characters ("!" for start and "E" for end).
- We create dictionaries `char_to_int` and `int_to_char` to map characters to integers and vice versa.
- The variable `embed` is set to the length of the longest SMILES string in the dataset plus 5; it fixes the sequence length used for the one-hot encoding.
- The function `vectorize(smiles)` converts SMILES strings to one-hot encoded vectors. It iterates through each SMILES string, prepends the start character "!", encodes each character, and pads the tail with the end character "E". It returns two arrays, the encoded sequence and the same sequence shifted one position ahead, so the model can be trained to predict the next character.
- `vectorize` is applied to the training and test datasets (`smiles_train` and `smiles_test`) to create input (`X_train` and `X_test`) and output (`Y_train` and `Y_test`) arrays for training and testing.
- `print(smiles_train.iloc[0])` displays the first SMILES string in the training dataset, and `plt.matshow(X_train[0].T)` visualizes its one-hot encoded representation.
from sklearn.model_selection import train_test_split
smiles_train, smiles_test = train_test_split(data["smiles"], random_state=42)
print(smiles_train.shape)
print(smiles_test.shape)
(76,)
(26,)
charset = set("".join(list(data.smiles))+"!E")
char_to_int = dict((c,i) for i,c in enumerate(charset))
int_to_char = dict((i,c) for i,c in enumerate(charset))
embed = max([len(smile) for smile in data.smiles]) + 5
print(str(charset))
print(len(charset), embed)
{'s', '!', '@', ']', 'C', 'S', '5', 'F', 'r', '1', ')', 'E', '=', 'H', '#', 'O', 'l', '4', '(', '3', '6', 'n', 'B', 'N', '[', 'c', '2'}
27 82
def vectorize(smiles):
one_hot = np.zeros((smiles.shape[0], embed , len(charset)),dtype=np.int8)
for i,smile in enumerate(smiles):
#encode the startchar
one_hot[i,0,char_to_int["!"]] = 1
#encode the rest of the chars
for j,c in enumerate(smile):
one_hot[i,j+1,char_to_int[c]] = 1
#Encode endchar
one_hot[i,len(smile)+1:,char_to_int["E"]] = 1
#Return two, one for input and the other for output
return one_hot[:,0:-1,:], one_hot[:,1:,:]
X_train, Y_train = vectorize(smiles_train.values)
X_test,Y_test = vectorize(smiles_test.values)
print(smiles_train.iloc[0])
plt.matshow(X_train[0].T)
#print X_train.shape
C[C@H]1C[C@@H](Nc2ccc(Cl)cc2)c3cc(ccc3N1C(=O)C)c4ccc(cc4)C(=O)O
<matplotlib.image.AxesImage at 0x7db0adf3fa90>
"".join([int_to_char[idx] for idx in np.argmax(X_train[0,:,:], axis=1)])
'!C[C@H]1C[C@@H](Nc2ccc(Cl)cc2)c3cc(ccc3N1C(=O)C)c4ccc(cc4)C(=O)OEEEEEEEEEEEEEEEEE'
#Import Keras objects
from keras.models import Model
from keras.layers import Input
from keras.layers import LSTM
from keras.layers import Dense
from keras.layers import Concatenate
from keras import regularizers
# Sequence-to-sequence autoencoder: encoder LSTM -> dense bottleneck -> decoder LSTM
input_shape = X_train.shape[1:]
output_dim = Y_train.shape[-1]
latent_dim = 64
lstm_dim = 64
unroll = False
# Encoder: read the one-hot SMILES and keep the final hidden and cell states
encoder_inputs = Input(shape=input_shape)
encoder = LSTM(lstm_dim, return_state=True,
               unroll=unroll)
encoder_outputs, state_h, state_c = encoder(encoder_inputs)
states = Concatenate(axis=-1)([state_h, state_c])
# Bottleneck: compress the concatenated states into a latent vector
neck = Dense(latent_dim, activation="relu")
neck_outputs = neck(states)
# Expand the latent vector back into initial states for the decoder LSTM
decode_h = Dense(lstm_dim, activation="relu")
decode_c = Dense(lstm_dim, activation="relu")
state_h_decoded = decode_h(neck_outputs)
state_c_decoded = decode_c(neck_outputs)
encoder_states = [state_h_decoded, state_c_decoded]
# Decoder: teacher-forced LSTM followed by a softmax over the character set
decoder_inputs = Input(shape=input_shape)
decoder_lstm = LSTM(lstm_dim,
                    return_sequences=True,
                    unroll=unroll
                    )
decoder_outputs = decoder_lstm(decoder_inputs, initial_state=encoder_states)
decoder_dense = Dense(output_dim, activation='softmax')
decoder_outputs = decoder_dense(decoder_outputs)
#Define the model, which takes the training vector in two places and predicts one character ahead of the input
model = Model([encoder_inputs, decoder_inputs], decoder_outputs)
print(model.summary())
Model: "model" __________________________________________________________________________________________________ Layer (type) Output Shape Param # Connected to ================================================================================================== input_1 (InputLayer) [(None, 81, 27)] 0 [] lstm (LSTM) [(None, 64), 23552 ['input_1[0][0]'] (None, 64), (None, 64)] concatenate (Concatenate) (None, 128) 0 ['lstm[0][1]', 'lstm[0][2]'] dense (Dense) (None, 64) 8256 ['concatenate[0][0]'] input_2 (InputLayer) [(None, 81, 27)] 0 [] dense_1 (Dense) (None, 64) 4160 ['dense[0][0]'] dense_2 (Dense) (None, 64) 4160 ['dense[0][0]'] lstm_1 (LSTM) (None, 81, 64) 23552 ['input_2[0][0]', 'dense_1[0][0]', 'dense_2[0][0]'] dense_3 (Dense) (None, 81, 27) 1755 ['lstm_1[0][0]'] ================================================================================================== Total params: 65,435 Trainable params: 65,435 Non-trainable params: 0 __________________________________________________________________________________________________ None
from keras.callbacks import History, ReduceLROnPlateau
h = History()
rlr = ReduceLROnPlateau(monitor='val_loss', factor=0.5, patience=10, min_lr=0.000001, verbose=1, min_delta=1e-5)
from keras.optimizers import RMSprop, Adam
#opt=Adam(lr=0.005) #Default 0.001
model.compile(optimizer='adam', loss='categorical_crossentropy')
model.fit([X_train,X_train],Y_train,
epochs=200,
batch_size=256,
shuffle=True,
callbacks=[h])
Epoch 1/200
1/1 [==============================] - 5s 5s/step - loss: 3.3220
Epoch 2/200
1/1 [==============================] - 0s 136ms/step - loss: 3.2955
Epoch 3/200
1/1 [==============================] - 0s 120ms/step - loss: 3.2697
...
Epoch 198/200
1/1 [==============================] - 0s 194ms/step - loss: 1.2366
Epoch 199/200
1/1 [==============================] - 0s 190ms/step - loss: 1.2362
Epoch 200/200
1/1 [==============================] - 0s 190ms/step - loss: 1.2330
<keras.callbacks.History at 0x7db0ab4610f0>
plt.plot(h.history["loss"], label="Loss")
#plt.plot(h.history["val_loss"], label="Val_Loss")
plt.yscale("log")
plt.legend()
<matplotlib.legend.Legend at 0x7db0ab460e50>
# Teacher-forced check: predict each test string one character ahead
# and compare the decoded prediction with the true SMILES
for i in range(26):
    v = model.predict([X_test[i:i+1], X_test[i:i+1]])
    idxs = np.argmax(v, axis=2)
    pred = "".join([int_to_char[h] for h in idxs[0]])[:-1]
    idxs2 = np.argmax(X_test[i:i+1], axis=2)
    true = "".join([int_to_char[k] for k in idxs2[0]])[1:]
    if true != pred:
        print(true, pred)
CN1CC[C@@H](Nc2ncc(c3C=C(C)C(=O)Nc23)c4cncc(C)c4)[C@@H](C1)OCC5CCS(=O)(=O)CC5EEE CCCCCCCCCCCCCCccccccccc))))C)C))CCccccccccccccccccCCEECECCCCCEEEEEEEEEE=))C))))E
CCN1C=C(c2cccc(c2)C(F)(F)F)c3sc(cc3C1=O)C(=N)NC4CCS(=O)(=O)CC4EEEEEEEEEEEEEEEEEE CCCCCCCCCcccccccccc)))CC)c)cccccccccc)))CCCCCCCCCCC(CO)(=O)cEEEEEEEEEEEEEEEEEEEE
CN1C=C(c2cccc(c2)C#N)c3sc(cc3C1=O)C(=N)NC4CCS(=O)(=O)CC4EEEEEEEEEEEEEEEEEEEEEEEE CCCCCCCCcccccccccc)c)Cccccccccc)))CCCCCCCCCCC(CC)C=O)c)EEEEEEEEEEEEEEEEEEEEEEEEE
...
CN1C(=O)C=Cc2cc(NS(=O)(=O)c3ccc(cc3)C#N)ccc12EEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE CCCCCCCCcCCCccccc(()))))))ccccccccccccc)CcccccEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE
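The loop above only measures teacher-forced reconstruction; it does not yet generate new strings. To sample novel SMILES, the trained layers can be rewired into three small models following the standard Keras seq2seq sampling recipe: an encoder from SMILES to the latent vector, a mapper from latent vector to initial decoder states, and a stateful one-character-at-a-time decoder. The sketch below is an outline in that spirit, not code from the original run; names like smiles_to_latent and latent_to_smiles are illustrative:
from keras.models import Model
from keras.layers import Input, LSTM

# Encoder: one-hot SMILES -> latent bottleneck vector (reuses the trained layers)
smiles_to_latent = Model(encoder_inputs, neck_outputs)

# Latent vector -> initial hidden/cell states for the decoder LSTM
latent_input = Input(shape=(latent_dim,))
latent_to_states = Model(latent_input, [decode_h(latent_input), decode_c(latent_input)])

# Stateful decoder that consumes one character per call;
# weights are copied from the trained decoder LSTM
sample_inputs = Input(batch_shape=(1, 1, output_dim))
sample_lstm = LSTM(lstm_dim, return_sequences=True, stateful=True)
sample_outputs = decoder_dense(sample_lstm(sample_inputs))
sample_model = Model(sample_inputs, sample_outputs)
sample_lstm.set_weights(decoder_lstm.get_weights())

def latent_to_smiles(latent):
    # Seed the decoder states from the latent vector
    states = latent_to_states.predict(latent)
    sample_lstm.reset_states(states=[states[0], states[1]])
    # Feed the start token, then greedily sample one character at a time
    samplevec = np.zeros((1, 1, output_dim))
    samplevec[0, 0, char_to_int["!"]] = 1
    smiles = ""
    for _ in range(embed):
        out = sample_model.predict(samplevec)
        idx = np.argmax(out)
        if int_to_char[idx] == "E":  # stop at the end token
            break
        smiles += int_to_char[idx]
        samplevec = np.zeros((1, 1, output_dim))
        samplevec[0, 0, idx] = 1
    return smiles

# Round-trip a test molecule through the latent space
latent = smiles_to_latent.predict(X_test[0:1])
print(latent_to_smiles(latent))
Greedy argmax decoding like this tends to reproduce training-like strings; sampling from the softmax with a temperature, or perturbing the latent vector with noise, is the usual way to get more diverse molecules.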
data_training ='/drive/My Drive/smiles'
smifile = data_training + '/Results.txt'
data = pd.read_csv(smifile, delimiter = "\t", index_col=False)
data.head()
|  | Real_Smlies | predicted_Smlies |
|---|---|---|
| 0 | CN1CC[C@@H](Nc2ncc(c3C=C(C)C(=O)Nc23)c4cncc(C)... | ... |
| 1 | CCN1C=C(c2cccc(c2)C(F)(F)F)c3sc(cc3C1=O)C(=N)N... | ... |
| 2 | CN1C=C(c2cccc(c2)C#N)c3sc(cc3C1=O)C(=N)NC4CCS(... | ... |
| 3 | COc1cc(ccc1S(=O)(=O)Nc2ccc3N(C)C(=O)C(=Cc3c2)C... | ... |
| 4 | CC(C)CS(=O)(=O)N[C@H]1CCC(=O)N([C@@H]1c2ccc(Cl... | ... |
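Finally, since the stated goal is valid molecules, it is natural to check how many of the predicted strings RDKit can parse. A minimal sketch, assuming the predicted_Smlies column of Results.txt holds the full generated strings (with their trailing "E" padding):
# Count how many predicted strings RDKit accepts as valid molecules
valid = data["predicted_Smlies"].dropna().apply(
    lambda s: Chem.MolFromSmiles(s.strip("E")) is not None)
print(f"{int(valid.sum())} / {len(valid)} predicted strings parse as valid SMILES")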