# Solving Basic Math Equation Using RNN [With Coding Example]

Last updated: 7th Dec, 2020

If life gives you RNN, make a calculator 🙂

A recurrent neural network (RNN) is a classic type of artificial neural network in which the connections between nodes form a directed graph along a sequence. RNNs are well known for applications such as speech recognition and handwriting recognition, because their internal state (memory) lets them process variable-length sequences.

RNNs fall into two broad classes. A finite impulse recurrent network is a directed acyclic graph: each node connects only to nodes further ahead and the network contains no cycle, so it can be unrolled and replaced with a strictly feed-forward network. An infinite impulse recurrent network is a directed cyclic graph, which cannot be unrolled into a feed-forward network.
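To make the idea of an internal state concrete, here is a minimal pure-Python sketch of a single recurrent unit. The function name `run_rnn` is hypothetical, and the weights `w_x`, `w_h`, and `b` are arbitrary values picked for illustration rather than learned parameters. The same function works on sequences of any length because the state `h` is carried from one step to the next.

```python
import math

def run_rnn(inputs, w_x=0.5, w_h=0.8, b=0.0):
    """Run a one-unit recurrent cell over a sequence and return every hidden state."""
    h = 0.0  # the internal state starts empty
    states = []
    for x in inputs:
        # The new state depends on the current input and on the previous state
        h = math.tanh(w_x * x + w_h * h + b)
        states.append(h)
    return states

short = run_rnn([1.0, 0.0])
longer = run_rnn([1.0, 0.0, 0.0, 0.0])
print(len(short), len(longer))  # one state per time step
print(short)                    # the second state is non-zero although its input is 0
```

The second input is zero, yet the second state is not, because the first input is still remembered through `h`.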

## What Are We Going to Do?

Let’s build a model that predicts the result of an arithmetic expression. For example, given the input ‘11+88’, the model should predict the output sequence ‘99’. Both the input and the output are treated as sequences of characters, since an RNN deals with sequential data.

Designing the architecture of the model looks simple compared with collecting a dataset. Generating or gathering data is a strenuous task, because data-hungry models need a fair amount of data to reach acceptable accuracy.

So this model can be implemented in 6 basic steps:

1. Generating data
2. Building a model
3. Vectorizing and De-vectorizing the data
4. Making a dataset
5. Training the model
6. Testing the model

Before we dive into implementing the model, let’s just import all the required libraries.

    import numpy as np
    import tensorflow as tf
    from tensorflow.keras.models import Sequential
    from tensorflow.keras.layers import Dense, Dropout, SimpleRNN, RepeatVector, TimeDistributed
    from tensorflow.keras.callbacks import EarlyStopping, LambdaCallback
    from termcolor import colored

### 1. Generating Data

Let’s define a string containing all the characters we need for writing a basic arithmetic equation: the digits 0-9, the arithmetic operators /, *, + and -, and the decimal point.

We cannot feed raw strings directly into our model; the data has to be passed in as numeric tensors. Here we convert every character into a one-hot encoded vector. A one-hot vector is an array with the same length as our character string, filled with zeros except for a single one at the index of the character it represents.

For example, if our character string is ‘0123456789’ and we want to encode the string ‘12’, the one-hot encoding is [ [0,1,0,0,0,0,0,0,0,0], [0,0,1,0,0,0,0,0,0,0] ]. To do that we create two dictionaries: one that maps each character to its index, and another that maps each index back to its character.

    char_string = '0123456789/*+-.'
    num_chars = len(char_string)

    # Lookup tables: character -> index and index -> character
    character_to_index = dict((c, i) for i, c in enumerate(char_string))
    index_to_character = dict((i, c) for i, c in enumerate(char_string))
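As a quick sanity check of the encoding described above, here is a small pure-Python sketch that uses plain lists instead of NumPy arrays. The helper name `one_hot` is made up for this example.

```python
def one_hot(text, alphabet):
    """Return one row per character, with a single 1 at that character's index."""
    index_of = {c: i for i, c in enumerate(alphabet)}
    rows = []
    for c in text:
        row = [0] * len(alphabet)
        row[index_of[c]] = 1
        rows.append(row)
    return rows

encoded = one_hot('12', '0123456789')
for row in encoded:
    print(row)
# [0, 1, 0, 0, 0, 0, 0, 0, 0, 0]
# [0, 0, 1, 0, 0, 0, 0, 0, 0, 0]
```

With the full character string ‘0123456789/*+-.’ each row has 15 entries instead of 10, and the operators take the indices after the digits.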

Now let’s write a function that returns a random arithmetic equation along with the result of that equation.

    def division(n, d):
        # Guard against division by zero by returning 0
        return n / d if d != 0 else 0

    def datagen():
        random1 = np.random.randint(low=0, high=100)
        random2 = np.random.randint(low=0, high=100)
        op = np.random.randint(low=0, high=4)  # 0: +, 1: -, 2: *, 3: /
        if op == 0:
            arith = str(random1) + '+' + str(random2)
            res = str(random1 + random2)
        elif op == 1:
            arith = str(random1) + '-' + str(random2)
            res = str(random1 - random2)
        elif op == 2:
            arith = str(random1) + '*' + str(random2)
            res = str(random1 * random2)
        else:
            arith = str(random1) + '/' + str(random2)
            res = str(round(division(random1, random2), 2))
        return arith, res
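Later on, the model hardcodes every sequence to 5 characters, so it is worth checking that nothing the generator can produce is longer than that. The sketch below enumerates every operand pair from 0 to 99 with all four operators, using the same rounding and the same division-by-zero rule as above. The helper name `max_lengths` is hypothetical.

```python
def max_lengths(limit=100):
    """Return the longest expression and the longest result over all operand pairs."""
    longest_expr = longest_res = 0
    for a in range(limit):
        for b in range(limit):
            results = {
                '+': a + b,
                '-': a - b,
                '*': a * b,
                '/': round(a / b, 2) if b != 0 else 0,
            }
            for op, value in results.items():
                longest_expr = max(longest_expr, len(f'{a}{op}{b}'))
                longest_res = max(longest_res, len(str(value)))
    return longest_expr, longest_res

print(max_lengths())  # (5, 5)
```

Expressions top out at 5 characters (for example ‘99*99’) and so do results (for example ‘32.67’ for 98/3), so a length of 5 is enough for this operand range. Widening the range would need a larger value.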

### 2. Building A Model

The model has an encoder and a decoder. The encoder is a simple RNN layer with input shape (None, num_chars) and 128 hidden units. Sizes such as 32, 64, or 128 are a common convention; powers of 2 are often said to suit CPU and GPU hardware, although other sizes would also work.

Inside the encoder the hidden units are fully connected, and the output of each step is fed back into the layer at the next step, which is how an RNN works. An RNN layer uses ‘tanh’ activation by default, and we are not going to change it. The encoder outputs a single vector that summarizes the whole input. Since the decoder expects a sequence, we use the RepeatVector() layer to repeat that vector once per output time step, passing the required number of repetitions as a parameter.

Now the output vector will have the essence of the input given, and this vector will be fed into the decoder.

The decoder is a simple RNN layer that generates the output sequence. Since we need this layer to return a prediction at every time step rather than only at the last one, we set ‘return_sequences’ to True, which makes it a many-to-many RNN.

The output of this RNN layer is fed into a Dense layer with ‘num_chars’ units and softmax activation, since we need a probability for each character. The Dense layer is wrapped in a TimeDistributed layer so that the same Dense layer is applied to the output of every time step.

    hidden_units = 128
    max_time_steps = 5  # we are hardcoding the output to be of 5 characters

    def build_model():
        model = Sequential()
        # Encoder: reads the whole input and returns a single summary vector
        model.add(SimpleRNN(hidden_units, input_shape=(None, num_chars)))
        # Repeat the summary vector once per output time step
        model.add(RepeatVector(max_time_steps))
        # Decoder: returns one output per time step
        model.add(SimpleRNN(hidden_units, return_sequences=True))
        # Character probabilities at every time step
        model.add(TimeDistributed(Dense(num_chars, activation='softmax')))
        return model

    model = build_model()
    model.summary()
    model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

The call to model.summary() prints the architecture of the model, layer by layer.
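If the roles of RepeatVector and TimeDistributed are still fuzzy, the following pure-Python stand-ins may help. These are not the Keras implementations: `repeat_vector`, `softmax`, and `time_distributed` are hypothetical helpers that only mimic what happens to a single sample, and the decoder RNN that sits between the two layers in the real model is left out.

```python
import math

def repeat_vector(vector, n):
    """Copy one vector n times to form a sequence of n time steps."""
    return [list(vector) for _ in range(n)]

def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def time_distributed(fn, sequence):
    """Apply the same function independently at every time step."""
    return [fn(step) for step in sequence]

encoded = [2.0, 0.0, -1.0]  # stand-in for the encoder's summary vector
decoder_input = repeat_vector(encoded, 5)
probabilities = time_distributed(softmax, decoder_input)

print(len(decoder_input), len(decoder_input[0]))    # 5 3
print([round(sum(p), 6) for p in probabilities])    # each step sums to 1.0
```

In the actual model the vector has 128 entries, the sequence has `max_time_steps` steps, and the final softmax produces `num_chars` probabilities per step.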

### 3. Vectorizing and De-vectorizing the Data

Let’s define functions for vectorizing and de-vectorizing the data.

Here’s the function for vectorizing the arithmetic expression and the result together.

    def vectorize(arith, res):
        x = np.zeros((max_time_steps, num_chars))
        y = np.zeros((max_time_steps, num_chars))
        # Number of padding positions in front of each string
        x_remaining = max_time_steps - len(arith)
        y_remaining = max_time_steps - len(res)
        for i, c in enumerate(arith):
            x[x_remaining + i, character_to_index[c]] = 1
        for i in range(x_remaining):
            x[i, character_to_index['0']] = 1  # pad the front with '0'
        for i, c in enumerate(res):
            y[y_remaining + i, character_to_index[c]] = 1
        for i in range(y_remaining):
            y[i, character_to_index['0']] = 1  # pad the front with '0'
        return x, y
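The padding that `vectorize` applies can be previewed on plain strings before any one-hot encoding takes place. The sketch below assumes a width of 5, matching `max_time_steps`, and the helper name `pad_left` is made up for the example.

```python
def pad_left(text, width=5, fill='0'):
    """Left-pad a string with zeroes, as vectorize does before one-hot encoding."""
    return text.rjust(width, fill)

print(pad_left('1-20'))  # 01-20
print(pad_left('-19'))   # 00-19
print(pad_left('99*99')) # 99*99 (already 5 characters, nothing to pad)
```

One thing to keep in mind with this scheme is that ‘0-19’ and ‘-19’ pad to the same string, ‘00-19’, so the padding alone cannot tell them apart.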

Similarly here’s the function for de-vectorizing the string. Since the output we receive is a vector of probabilities, we’ll use np.argmax() for picking the character with the highest probability. Now the index_to_character dictionary is used to trace back the character at that index.

    def devectorize(vectors):
        # Pick the most probable character at each time step
        res = [index_to_character[np.argmax(vec)] for vec in vectors]
        return ''.join(res)
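Here is a small worked example of the argmax step, written with plain lists so it runs without NumPy. The probabilities are invented for illustration, and `decode` is a hypothetical stand-in for the function above.

```python
def decode(prob_rows, alphabet):
    """Pick the most probable character at each time step."""
    chars = []
    for row in prob_rows:
        best = max(range(len(row)), key=lambda i: row[i])
        chars.append(alphabet[best])
    return ''.join(chars)

alphabet = '0123456789/*+-.'
rows = [[0.0] * len(alphabet) for _ in range(3)]
rows[0][0] = 0.9    # '0' (padding)
rows[1][13] = 0.6   # '-'
rows[1][12] = 0.4   # '+' is less likely, so it loses to '-'
rows[2][7] = 0.8    # '7'

print(decode(rows, alphabet))  # 0-7
```

Only the largest value in each row matters, so a prediction of 0.6 for ‘-’ and 0.4 for ‘+’ decodes to the same character as a prediction of 0.99 for ‘-’.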

One side effect of this scheme is that the de-vectorized strings keep the leading zeroes that ‘vectorize’ added as padding. For example, the pair (‘1-20’, ‘-19’) comes back as (‘01-20’, ‘00-19’). We need to take care of these extra padded zeroes, so let’s write a function for stripping them.

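    def stripping(text):
        # flag is True while we are inside a number, where zeroes must be kept
        flag = False
        output = ''
        for c in text:
            if not flag and c == '0':
                continue  # skip a zero at the start of a number
            if c in '+-*/':
                flag = False  # an operator starts a new number
            else:
                flag = True  # digits and '.' keep us inside the number ('1.05' stays '1.05')
            output += c
        # Note: an operand or result that is exactly 0 is stripped as well
        return output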

### 4. Making A Dataset

Now that we have functions for generating and vectorizing the data, let’s use them to make a dataset with many (arithmetic expression, result) pairs.

    def create_dataset(num_equations):
        x_train = np.zeros((num_equations, max_time_steps, num_chars))
        y_train = np.zeros((num_equations, max_time_steps, num_chars))
        for i in range(num_equations):
            arith, res = datagen()
            x, y = vectorize(arith, res)
            x_train[i] = x
            y_train[i] = y
        return x_train, y_train

### 5. Training the Model

Let’s create a dataset of 50,000 samples, which is a fair number for training our data-hungry model, and use 25% of it for validation. Let’s also create a callback that stops training early if the validation loss does not improve for 8 epochs. This is achieved by setting the patience parameter to 8.

    x_train, y_train = create_dataset(50000)

    # Print the validation accuracy at the end of every epoch
    simple_logger = LambdaCallback(
        on_epoch_end=lambda e, l: print('{:.2f}'.format(l['val_accuracy']), end=' _ ')
    )
    # Stop when the validation loss has not improved for 8 epochs
    early_stopping = EarlyStopping(monitor='val_loss', patience=8)

    model.fit(x_train, y_train, epochs=100, validation_split=0.25, verbose=0,
              callbacks=[simple_logger, early_stopping])
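To get a feel for what the patience parameter does, here is a simplified pure-Python simulation of patience-based stopping. It is only a sketch of the idea, not the Keras implementation: it ignores options such as a minimum improvement threshold, the function name `stopping_epoch` is hypothetical, and the loss values are invented.

```python
def stopping_epoch(val_losses, patience):
    """Return the 1-based epoch at which training stops, or None if it runs to the end."""
    best = float('inf')
    waited = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best = loss   # improvement: remember it and reset the counter
            waited = 0
        else:
            waited += 1   # no improvement this epoch
            if waited >= patience:
                return epoch
    return None

losses = [1.0, 0.8, 0.7, 0.72, 0.71, 0.75]
print(stopping_epoch(losses, patience=3))  # 6
print(stopping_epoch(losses, patience=8))  # None
```

The best loss is reached at epoch 3, and the three epochs after it bring no improvement, so a patience of 3 stops at epoch 6. With a patience of 8, this short run is never interrupted.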

### 6. Testing the Model

Now let’s test our model by creating a dataset of size 20.

    x_test, y_test = create_dataset(num_equations=20)
    preds = model.predict(x_test)
    full_seq_acc = 0

    for i, pred in enumerate(preds):
        pred_str = stripping(devectorize(pred))
        y_test_str = stripping(devectorize(y_test[i]))
        x_test_str = stripping(devectorize(x_test[i]))
        # Green when the whole predicted string matches, red otherwise
        col = 'green' if pred_str == y_test_str else 'red'
        full_seq_acc += 1 / len(preds) * int(pred_str == y_test_str)
        outstring = 'Input: {}, Output: {}, Prediction: {}'.format(x_test_str, y_test_str, pred_str)
        print(colored(outstring, col))

    print('\nFull sequence accuracy: {:.3f} %'.format(100 * full_seq_acc))

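The script prints each test expression with its expected result and the model’s prediction, coloured green for a match and red for a mismatch, followed by the full sequence accuracy.

The accuracy is a little poor here, but it can be improved by tweaking a few hyperparameters such as the number of hidden units, the validation split, and the number of epochs.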
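One reason the test score can look worse than the numbers printed during training is that the two measure different things. As far as I understand the setup above, the ‘accuracy’ metric logged by Keras counts matching characters per time step, while the test loop only counts a prediction when the whole string is right. The sketch below contrasts the two on made-up strings; `char_accuracy` and `sequence_accuracy` are hypothetical helpers, not part of the tutorial code.

```python
def char_accuracy(expected, predicted):
    """Fraction of character positions that match across all pairs."""
    matches = total = 0
    for exp, pred in zip(expected, predicted):
        for e, p in zip(exp, pred):
            matches += int(e == p)
            total += 1
    return matches / total

def sequence_accuracy(expected, predicted):
    """Fraction of pairs where the whole string matches."""
    hits = sum(int(exp == pred) for exp, pred in zip(expected, predicted))
    return hits / len(expected)

# Invented padded strings: two predictions are off by a single character each
expected = ['00099', '00-19', '09801', '00.33']
predicted = ['00099', '00-18', '09811', '00.33']

print(char_accuracy(expected, predicted))      # 0.9
print(sequence_accuracy(expected, predicted))  # 0.5
```

A single wrong digit barely moves the per-character score but makes the whole answer wrong, so the full sequence accuracy is the stricter and more honest measure for a calculator.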

## Conclusion

We’ve understood the basic workflow of an RNN, understood that RNNs are best suited for sequential data, generated a dataset of random arithmetic equations, developed a sequential model for predicting the output of a basic arithmetic expression, trained that model with the dataset which we’ve created, and finally tested that model with a small dataset which the model has never seen before.

1. What are the different types of neural networks in machine learning?

In machine learning, artificial neural networks are computational models designed to resemble the human brain. Different kinds of artificial neural networks are used depending on the mathematical computation that needs to be achieved, and they learn from data in different ways. Some of the most widely used types are the recurrent neural network (including long short-term memory), the feedforward neural network (built from artificial neurons), the radial basis function neural network, the Kohonen self-organizing neural network, the convolutional neural network, and the modular neural network, among others.

2. What are the advantages of a recurrent neural network?

Recurrent neural networks are among the most commonly used artificial neural networks in deep learning and machine learning. In this type of model, the result obtained from the previous step is fed as input to the subsequent step. The main advantage is that the hidden state carries information about previous inputs forward through time, which makes RNNs well suited to time series prediction and other sequential tasks. Variants such as long short-term memory (LSTM) networks are designed to retain that information over longer spans. RNNs can also be combined with convolutional layers to extend the effective pixel neighborhood.

3. How are neural networks employed in real-world applications?

Artificial neural networks are an integral part of deep learning, which again is a super-specialized branch of machine learning and artificial intelligence. Neural networks are used across different industries to achieve various critical objectives. Some of the most interesting real-world applications of artificial neural networks include stock market forecasting, facial recognition, high-performance auto-piloting and fault diagnosis in the aerospace industry, analysis of armed attacks and object location in the defence sector, image processing, drug discovery and disease detection in the healthcare sector, verification of signature, handwriting analysis, weather forecasting and social media trend forecasting, among others.
