If life gives you RNN, make a calculator 🙂
A Recurrent Neural Network (RNN) is a class of artificial neural network in which connections between nodes form a directed graph along a temporal sequence. RNNs are popular for applications like speech recognition and handwriting recognition because their internal state acts as a memory for processing variable-length sequences.
RNNs are further classified into two types. The first is the finite impulse RNN, whose network takes the form of a directed acyclic graph: a node can connect to one or more nodes ahead of it, with no cycle in the network, so it can be unrolled into a feed-forward network. The second is the infinite impulse RNN, whose network is a directed cyclic graph and cannot be unrolled into a feed-forward neural network.
What Are We Going to Do?
Let’s build a model that predicts the output of an arithmetic expression. For example, given the input ‘11+88’, the model should predict the result ‘99’. Both the input and the output are sequences of characters, since an RNN deals with sequential data.
Designing the architecture of the model is a simple task compared to collecting the dataset. Generating or gathering data is strenuous because data-hungry AI models require a fair amount of data to reach acceptable accuracy.
So this model can be implemented in 6 basic steps:
- Generating data
- Building a model
- Vectorizing and de-vectorizing the data
- Making a dataset
- Training the model
- Testing the model
Before we dive into implementing the model, let’s just import all the required libraries.
import numpy as np
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout, SimpleRNN, RepeatVector, TimeDistributed
from tensorflow.keras.callbacks import EarlyStopping, LambdaCallback
from termcolor import colored
1. Generating Data
Let’s define a character string containing every character we need to write a basic arithmetic equation: the digits 0-9 and the arithmetic operators /, *, +, -, and . (decimal point).
We cannot feed raw strings directly into our model; the data has to be passed in as tensors. Converting each string into one-hot encoded vectors gives the model a clean numerical representation. A one-hot encoded vector here is an array whose length equals the length of our character string, with a 1 only at the index of the character it represents.
For example, if our character string were ‘0123456789’ and we wanted to encode the string ‘12’, the one-hot vectors would be [ [0,1,0,0,0,0,0,0,0,0], [0,0,1,0,0,0,0,0,0,0] ]. To do this we create two dictionaries: one with indices as keys and characters as values, and the other the reverse.
char_string = '0123456789/*+-.'
num_chars = len(char_string)
character_to_index = dict((c, i) for i, c in enumerate(char_string))
index_to_character = dict((i, c) for i, c in enumerate(char_string))
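To make the mapping concrete, here is a tiny sanity check (the one_hot helper is our own illustration, not part of the tutorial's code) that encodes the string '12' exactly as described above:

def one_hot(s):
    # one row per character, with a 1 at that character's index in char_string
    vec = np.zeros((len(s), num_chars))
    for i, c in enumerate(s):
        vec[i, character_to_index[c]] = 1
    return vec

print(one_hot('12'))
# Row 0 has a 1 at index 1 ('1'); row 1 has a 1 at index 2 ('2')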
Now let’s write a function that returns a random arithmetic equation along with the result of that equation.
def division(n, d):
    # guard against division by zero; we simply return 0 in that case
    return n / d if d != 0 else 0

def datagen():
    random1 = np.random.randint(low=0, high=100)
    random2 = np.random.randint(low=0, high=100)
    op = np.random.randint(low=0, high=4)
    if op == 0:
        arith = str(random1) + '+' + str(random2)
        res = str(random1 + random2)
    elif op == 1:
        arith = str(random1) + '-' + str(random2)
        res = str(random1 - random2)
    elif op == 2:
        arith = str(random1) + '*' + str(random2)
        res = str(random1 * random2)
    else:
        arith = str(random1) + '/' + str(random2)
        res = str(round(division(random1, random2), 2))
    return arith, res
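As a quick sanity check, calling datagen() a few times should print random (expression, result) pairs; the exact values below are just illustrative since the generator is random:

for _ in range(3):
    arith, res = datagen()
    print(arith, '=', res)
# e.g. 41+87 = 128
#      3*77 = 231
#      12/5 = 2.4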
2. Building A Model
The model will have an encoder and a decoder. The encoder is a simple RNN with input shape (None, num_chars) and 128 hidden units; we pick hidden unit counts like 32, 64, or 128 because CPUs and GPUs tend to perform better when they are powers of 2.
The encoder feeds its output back into itself at each step, which is how an RNN works. An RNN layer uses ‘tanh’ activation by default, and we keep it because it fits the encoder well. This layer outputs a single vector, and to hand that vector to the decoder at every time step we pass it through a RepeatVector() layer, with the required number of repetitions as the parameter.
This repeated vector carries the essence of the input, and it is what gets fed into the decoder.
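If RepeatVector() feels opaque, this tiny standalone sketch shows what it does: it copies a (batch, features) tensor n times along a new time axis, producing (batch, n, features).

demo = tf.keras.layers.RepeatVector(3)
out = demo(tf.constant([[1.0, 2.0]]))  # input shape (1, 2)
print(out.shape)  # (1, 3, 2): the vector [1.0, 2.0] repeated 3 times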
The decoder is a simple RNN layer that generates the output sequence. Since we need it to return the prediction at every time step (a many-to-many RNN), we set ‘return_sequences’ to True.
The output of this RNN layer is fed into a Dense layer with ‘num_chars’ hidden units and softmax activation, since we need a probability for each character. We wrap the Dense layer in a TimeDistributed layer so that the same Dense layer is applied to the output of every time step.
hidden_units = 128
max_time_steps = 5  # we hardcode the output to be 5 characters long

def build_model():
    model = Sequential()
    model.add(SimpleRNN(hidden_units, input_shape=(None, num_chars)))
    model.add(RepeatVector(max_time_steps))
    model.add(SimpleRNN(hidden_units, return_sequences=True))
    model.add(TimeDistributed(Dense(num_chars, activation='softmax')))
    return model

model = build_model()
model.summary()
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
model.summary() prints the architecture of the model, along the lines of the sketch below.
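With num_chars = 15 and the layer sizes above, the summary should look roughly like this (layer names and exact formatting vary by TensorFlow version; the parameter counts follow from the standard SimpleRNN and Dense formulas):

Layer (type)                   Output Shape      Param #
simple_rnn (SimpleRNN)         (None, 128)       18432
repeat_vector (RepeatVector)   (None, 5, 128)    0
simple_rnn_1 (SimpleRNN)       (None, 5, 128)    32896
time_distributed (Dense)       (None, 5, 15)     1935
Total params: 53,263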
3. Vectorizing and De-vectorizing the Data
Let’s define functions for vectorizing and de-vectorizing the data.
Here’s the function for vectorizing the arithmetic expression and the result together.
def vectorize(arith, res):
    x = np.zeros((max_time_steps, num_chars))
    y = np.zeros((max_time_steps, num_chars))
    x_remaining = max_time_steps - len(arith)
    y_remaining = max_time_steps - len(res)
    # one-hot encode the expression, right-aligned within max_time_steps
    for i, c in enumerate(arith):
        x[x_remaining + i, character_to_index[c]] = 1
    # pad the leading positions with '0'
    for i in range(x_remaining):
        x[i, character_to_index['0']] = 1
    # same for the result
    for i, c in enumerate(res):
        y[y_remaining + i, character_to_index[c]] = 1
    for i in range(y_remaining):
        y[i, character_to_index['0']] = 1
    return x, y
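As a quick check, vectorizing the pair from our earlier example gives two (max_time_steps, num_chars) arrays:

x, y = vectorize('11+88', '99')
print(x.shape, y.shape)  # (5, 15) (5, 15)
# '99' is left-padded to '00099': rows 0-2 of y encode '0', rows 3-4 encode '9'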
Similarly, here’s the function for de-vectorizing a prediction. Since the output we receive is a vector of probabilities per time step, we use np.argmax() to pick the character with the highest probability, then use the index_to_character dictionary to map that index back to a character.
def devectorize(vectors):
    # pick the most likely character at each time step
    res = [index_to_character[np.argmax(vec)] for vec in vectors]
    return ''.join(res)
One constraint with the ‘devectorize’ function is that the leading padding comes back as ‘0’ characters. For example, if the (expression, result) pair was (‘1-20’, ‘-19’), the de-vectorized output will be (‘01-20’, ‘00-19’). We need to take care of these extra padding zeroes, so let’s write a function for stripping them.
def stripping(text):
    # skip padding '0's at the start of the string and right after an operator
    flag = False
    output = ''
    for c in text:
        if not flag and c == '0':
            continue
        if c in '+-*/.':
            flag = False
        else:
            flag = True
        output += c
    return output
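Putting the three helpers together on the example from above:

x, y = vectorize('1-20', '-19')
print(devectorize(x), devectorize(y))  # 01-20 00-19
print(stripping(devectorize(x)), stripping(devectorize(y)))  # 1-20 -19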
4. Making A Dataset
Now that we are done with defining a function for generating the data, let’s use that function and make a dataset with many such (arithmetic expression, result) pairs.
def create_dataset(num_equations):
    x_train = np.zeros((num_equations, max_time_steps, num_chars))
    y_train = np.zeros((num_equations, max_time_steps, num_chars))
    for i in range(num_equations):
        e, l = datagen()
        x, y = vectorize(e, l)
        x_train[i] = x
        y_train[i] = y
    return x_train, y_train
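A small trial run confirms the expected tensor shapes before we commit to 50,000 samples:

x_sample, y_sample = create_dataset(5)
print(x_sample.shape, y_sample.shape)  # (5, 5, 15) (5, 5, 15)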
5. Training the Model
Let’s create a dataset of 50,000 samples, a fair number for our data-hungry model, and use 25% of it for validation. We’ll also add a callback that interrupts training intelligently if the validation loss does not improve for 8 epochs; this is achieved by setting the patience parameter to 8.
x_train, y_train = create_dataset(50000)

simple_logger = LambdaCallback(
    on_epoch_end=lambda e, l: print('{:.2f}'.format(l['val_accuracy']), end=' _ ')
)
early_stopping = EarlyStopping(monitor='val_loss', patience=8)

model.fit(x_train, y_train, epochs=100, validation_split=0.25, verbose=0,
          callbacks=[simple_logger, early_stopping])
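As a side note, fit() returns a History object, so a small variant of the call above lets you check the best validation accuracy once early stopping kicks in:

history = model.fit(x_train, y_train, epochs=100, validation_split=0.25, verbose=0,
                    callbacks=[simple_logger, early_stopping])
print('Best val accuracy: {:.2f}'.format(max(history.history['val_accuracy'])))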
6. Testing the Model
Now let’s test our model on a freshly generated dataset of 20 samples.
x_test, y_test = create_dataset(num_equations=20)

preds = model.predict(x_test)
full_seq_acc = 0
for i, pred in enumerate(preds):
    pred_str = stripping(devectorize(pred))
    y_test_str = stripping(devectorize(y_test[i]))
    x_test_str = stripping(devectorize(x_test[i]))
    # print in green if the full predicted sequence matches the true result
    col = 'green' if pred_str == y_test_str else 'red'
    full_seq_acc += 1 / len(preds) * int(pred_str == y_test_str)
    outstring = 'Input: {}, Output: {}, Prediction: {}'.format(x_test_str, y_test_str, pred_str)
    print(colored(outstring, col))
print('\nFull sequence accuracy: {:.3f} %'.format(100 * full_seq_acc))
The output lists each test expression with the true result and the model’s prediction, printed in green when they match and red when they don’t.
The accuracy here is a little poor, but we can improve it by tweaking a few hyperparameters such as the number of hidden units, the validation split, and the number of epochs.
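If you want to try the trained model on an expression of your own, the existing helpers can be reused. This predict_expression wrapper is our own addition (an illustrative sketch, not part of the tutorial), and the expression must fit within max_time_steps characters:

def predict_expression(expr):
    # expr must be at most max_time_steps (5) characters long
    x, _ = vectorize(expr, '0')  # dummy result; only x is used
    pred = model.predict(x[np.newaxis, ...])[0]
    return stripping(devectorize(pred))

print(predict_expression('11+88'))  # ideally '99', accuracy permitting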
Conclusion
We’ve covered the basic workflow of an RNN and seen why RNNs are best suited for sequential data. We generated a dataset of random arithmetic equations, built a sequential model to predict the output of a basic arithmetic expression, trained it on the dataset we created, and finally tested it on a small dataset the model had never seen before.
If you’re interested in learning more about RNNs and machine learning, check out IIIT-B & upGrad’s PG Diploma in Machine Learning & AI, which is designed for working professionals and offers 450+ hours of rigorous training, 30+ case studies & assignments, IIIT-B alumni status, 5+ practical hands-on capstone projects, and job assistance with top firms.
What are the different types of neural networks in machine learning?
In machine learning, artificial neural networks are computational models designed to resemble the human brain. Different kinds of artificial neural networks are employed depending on the mathematical computation that needs to be achieved, and they learn from data in different ways. Some of the most widely used types are recurrent neural networks (including long short-term memory networks), feedforward neural networks, radial basis function neural networks, Kohonen self-organizing networks, convolutional neural networks, and modular neural networks, among others.
What are the advantages of a recurrent neural network?
Recurrent neural networks are among the most commonly used artificial neural networks in deep learning and machine learning. In this type of model, the result of the previous step is fed as input to the subsequent step. This gives recurrent networks several advantages: they can retain information over time, including their previous inputs, which makes them ideal for time series prediction; long short-term memory (LSTM) networks are the best-known instance of this. Combined with convolutional layers, recurrent networks can also model effective pixel neighborhoods.
How are neural networks employed in real-world applications?
Artificial neural networks are an integral part of deep learning, itself a specialized branch of machine learning and artificial intelligence. Neural networks are used across different industries to achieve various critical objectives. Some of the most interesting real-world applications include stock market forecasting, facial recognition, high-performance auto-piloting and fault diagnosis in the aerospace industry, analysis of armed attacks and object location in the defence sector, image processing, drug discovery and disease detection in healthcare, signature verification, handwriting analysis, weather forecasting, and social media trend forecasting, among others.