Solving Basic Math Equation Using RNN [With Coding Example]

Last updated: 7th Dec, 2020

If life gives you RNN, make a calculator 🙂

A Recurrent Neural Network (RNN) is a classic type of artificial neural network in which the connections between nodes form a directed graph along a temporal sequence. RNNs are well known for applications such as speech recognition and handwriting recognition because their internal state (memory) lets them process variable-length sequences.
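To make the idea of an internal state concrete, here is a minimal NumPy sketch of the recurrence a simple (Elman-style) RNN layer computes, assuming the update h_t = tanh(W_x x_t + W_h h_(t-1) + b). The function name and the weight names W_x, W_h and b are illustrative, not taken from any library. The same weights are reused at every step, which is why one set of weights can process sequences of any length.

```python
import numpy as np

def simple_rnn_forward(inputs, W_x, W_h, b):
    """Run an Elman-style RNN over a (time_steps, input_dim) sequence.

    Returns the hidden state after every step, shape (time_steps, hidden_dim).
    """
    h = np.zeros(W_h.shape[0])                  # initial hidden state (the "memory")
    states = []
    for x_t in inputs:                          # one step per element of the sequence
        h = np.tanh(W_x @ x_t + W_h @ h + b)    # new state depends on the input AND the previous state
        states.append(h)
    return np.stack(states)

rng = np.random.default_rng(0)
input_dim, hidden_dim = 3, 4
W_x = rng.normal(size=(hidden_dim, input_dim))
W_h = rng.normal(size=(hidden_dim, hidden_dim))
b = np.zeros(hidden_dim)

# The same weights handle sequences of different lengths
short_seq = rng.normal(size=(2, input_dim))
long_seq = rng.normal(size=(7, input_dim))
print(simple_rnn_forward(short_seq, W_x, W_h, b).shape)  # (2, 4)
print(simple_rnn_forward(long_seq, W_x, W_h, b).shape)   # (7, 4)
```

Setting W_h to zeros removes the feedback loop, and each state then depends only on the current input.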

RNNs are further classified into two types. The first is the finite impulse recurrent network, whose structure is a directed acyclic graph: each node connects only to one or more nodes ahead of it, so the network has no cycles and can be unrolled into a strictly feed-forward neural network. The second is the infinite impulse recurrent network, whose structure is a directed cyclic graph that cannot be unrolled into a feed-forward neural network.

What Are We Going to Do?

Let’s build a model that predicts the result of an arithmetic expression. For example, given the input ‘11+88’, the model should predict the output sequence ‘99’. Both the input and the output are sequences of characters, which is exactly the kind of sequential data an RNN handles.

Designing the model architecture is a simple task compared with collecting the dataset. Generating or gathering data is strenuous because data-hungry AI models need a fair amount of data to reach acceptable accuracy.

So this model can be implemented in 6 basic steps:

  1. Generating data
  2. Building a model
  3. Vectorizing and de-vectorizing the data
  4. Making a dataset
  5. Training the model
  6. Testing the model

Before we dive into implementing the model, let’s just import all the required libraries.

import numpy as np
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout, SimpleRNN, RepeatVector, TimeDistributed
from tensorflow.keras.callbacks import EarlyStopping, LambdaCallback
from termcolor import colored

1. Generating Data

Let’s define a character string containing every character needed to write a basic arithmetic equation: the digits 0-9, the arithmetic operators /, *, + and -, and the decimal point (.).

We cannot feed raw strings directly into the model; the data has to be passed in as numeric tensors. We therefore convert each string into a sequence of one-hot encoded vectors. Each one-hot vector has the same length as our character string and contains a 1 only at the index of the character it represents, with 0s everywhere else.

For example, if our character string were ‘0123456789’ and we wanted to encode the string ‘12’, the one-hot vectors would be [ [0,1,0,0,0,0,0,0,0,0], [0,0,1,0,0,0,0,0,0,0] ]. To do this we create two dictionaries: one mapping each character to its index and the other mapping each index back to its character.

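char_string = '0123456789/*+-.'
num_chars = len(char_string)   # 15 distinct characters
# Lookup tables in both directions between characters and their one-hot positions
character_to_index = dict((c, i) for i, c in enumerate(char_string))
index_to_character = dict((i, c) for i, c in enumerate(char_string))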
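The ‘12’ example can be checked with a short NumPy sketch. The helper one_hot_encode is hypothetical (it is not one of the article’s functions); it builds the same kind of character-to-index dictionary as above and sets exactly one 1 per row.

```python
import numpy as np

def one_hot_encode(text, chars):
    """Return a (len(text), len(chars)) array with a single 1 per row."""
    char_to_idx = {c: i for i, c in enumerate(chars)}
    encoded = np.zeros((len(text), len(chars)), dtype=int)
    for row, c in enumerate(text):
        encoded[row, char_to_idx[c]] = 1   # mark the position of this character
    return encoded

print(one_hot_encode('12', '0123456789').tolist())
# [[0, 1, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 1, 0, 0, 0, 0, 0, 0, 0]]
```

With the full ‘0123456789/*+-.’ string each vector has 15 entries, and ‘+’ maps to index 12.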

Now let’s write a function that returns a random arithmetic equation along with the result of that equation.

def division(n, d):
    # Guard against division by zero: return 0 instead of raising
    return n / d if d != 0 else 0

def datagen():
    # Two random operands in [0, 99] and a random operator index in [0, 3]
    random1 = np.random.randint(low=0, high=100)
    random2 = np.random.randint(low=0, high=100)
    op = np.random.randint(low=0, high=4)
    if op == 0:
        arith = str(random1) + '+' + str(random2)
        res = str(random1 + random2)
    elif op == 1:
        arith = str(random1) + '-' + str(random2)
        res = str(random1 - random2)
    elif op == 2:
        arith = str(random1) + '*' + str(random2)
        res = str(random1 * random2)
    else:
        arith = str(random1) + '/' + str(random2)
        # Division results are rounded to 2 decimal places
        res = str(round(division(random1, random2), 2))
    return arith, res
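Because the model later uses a fixed sequence length of 5 characters, it is worth confirming that every expression and result datagen can produce fits in 5 characters. The sketch below brute-forces all 100 × 100 × 4 operand/operator combinations using the same formatting rules as datagen (integer results for +, - and *, division rounded to 2 decimal places, 0 on division by zero). It uses plain Python loops instead of calling datagen, and format_result is a hypothetical helper, so the check is self-contained.

```python
def division(n, d):
    return n / d if d != 0 else 0

def format_result(a, b, op):
    """Format a result the same way datagen does."""
    if op == '+':
        return str(a + b)
    if op == '-':
        return str(a - b)
    if op == '*':
        return str(a * b)
    return str(round(division(a, b), 2))

max_expr_len = 0
max_res_len = 0
for a in range(100):
    for b in range(100):
        for op in '+-*/':
            expr = str(a) + op + str(b)
            res = format_result(a, b, op)
            max_expr_len = max(max_expr_len, len(expr))
            max_res_len = max(max_res_len, len(res))

print(max_expr_len, max_res_len)  # 5 5
```

The longest expressions look like ‘99+99’, and the longest results come from divisions such as 97/3 = 32.33.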

2. Building A Model

The model has an encoder and a decoder. The encoder is a SimpleRNN layer with input shape (None, num_chars), where None lets the input sequence have any length, and 128 hidden units. Layer sizes such as 32, 64 or 128 are conventional choices because powers of 2 are often said to map more efficiently onto CPU and GPU hardware.

Inside the encoder, the hidden state computed at each time step is fed back into the layer at the next time step; this feedback loop is what makes the network recurrent. A SimpleRNN layer uses the ‘tanh’ activation by default, and we keep that default for the encoder. Because return_sequences is left at its default of False, the encoder outputs only its final hidden state: a single vector summarizing the whole input. The decoder needs one input per output time step, so we pass this vector through a RepeatVector() layer, which repeats it the required number of times (max_time_steps).

This repeated vector captures the essence of the input expression, and it is what gets fed into the decoder.
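RepeatVector itself has no weights; it only copies its input along a new time axis. Assuming a batch of encoder outputs with shape (batch, hidden_units), the NumPy sketch below reproduces that reshaping so the shapes are easy to follow. It illustrates the operation and is not the Keras implementation.

```python
import numpy as np

batch, hidden_units, max_time_steps = 2, 128, 5

# Stand-in for the encoder's final hidden states: one vector per example
encoded = np.random.default_rng(0).normal(size=(batch, hidden_units))

# Copy each vector max_time_steps times along a new time axis
repeated = np.repeat(encoded[:, np.newaxis, :], max_time_steps, axis=1)

print(encoded.shape, '->', repeated.shape)  # (2, 128) -> (2, 5, 128)
```

Every time step of the decoder therefore starts from the same summary vector; the decoder’s own recurrence is what makes the outputs at different steps differ.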

The decoder is a SimpleRNN layer that generates the output sequence. Since we need it to return a prediction for every time step rather than only the last one, we set return_sequences=True, which makes this a many-to-many RNN.

The decoder’s output is fed into a Dense layer with num_chars units and a softmax activation, since we need a probability for each character. Because the Dense layer has to be applied to the output of every time step, we wrap it in a TimeDistributed layer.
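TimeDistributed(Dense(...)) applies one Dense layer, with a single shared weight matrix, to every time step independently. The NumPy sketch below mimics that for decoder outputs of shape (batch, time_steps, hidden_units), assuming randomly initialised weights W and b chosen for illustration; the softmax turns each step’s scores into a probability distribution over the num_chars = 15 characters.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)   # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

batch, time_steps, hidden_units, num_chars = 2, 5, 128, 15
rng = np.random.default_rng(0)
decoder_out = rng.normal(size=(batch, time_steps, hidden_units))
W = rng.normal(size=(hidden_units, num_chars)) * 0.1   # one weight matrix shared by all steps
b = np.zeros(num_chars)

# Apply the same Dense layer to every time step at once
probs = softmax(decoder_out @ W + b)

# Equivalent explicit loop over time steps
looped = np.stack([softmax(decoder_out[:, t, :] @ W + b) for t in range(time_steps)], axis=1)

print(probs.shape)                          # (2, 5, 15)
print(np.allclose(probs, looped))           # True
print(np.allclose(probs.sum(axis=-1), 1))   # True
```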

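hidden_units = 128
max_time_steps = 5    # inputs and outputs are both padded to exactly 5 characters

def build_model():
    model = Sequential()
    # Encoder: reads the one-hot input sequence and returns only its final hidden state
    model.add(SimpleRNN(hidden_units, input_shape=(None, num_chars)))
    # Repeat the encoded vector once per output time step
    model.add(RepeatVector(max_time_steps))
    # Decoder: returns its hidden state at every time step (many-to-many)
    model.add(SimpleRNN(hidden_units, return_sequences=True))
    # The same Dense softmax layer is applied to each time step, giving per-character probabilities
    model.add(TimeDistributed(Dense(num_chars, activation='softmax')))
    return model

model = build_model()
model.summary()
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])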

Calling model.summary() prints the architecture of the model layer by layer.

3. Vectorizing and De-vectorizing the Data

Let’s define functions for vectorizing and de-vectorizing the data.

Here’s the function for vectorizing the arithmetic expression and the result together.

 

def vectorize(arith, res):
    # One row per time step, one column per character
    x = np.zeros((max_time_steps, num_chars))
    y = np.zeros((max_time_steps, num_chars))
    # Number of padding positions needed on the left of each string
    x_remaining = max_time_steps - len(arith)
    y_remaining = max_time_steps - len(res)
    # Place the real characters at the end of the sequence
    for i, c in enumerate(arith):
        x[x_remaining + i, character_to_index[c]] = 1
    # Left-pad with the character '0'
    for i in range(x_remaining):
        x[i, character_to_index['0']] = 1
    for i, c in enumerate(res):
        y[y_remaining + i, character_to_index[c]] = 1
    for i in range(y_remaining):
        y[i, character_to_index['0']] = 1
    return x, y

 

Similarly, here’s the function for de-vectorizing a sequence. Since the model outputs a vector of probabilities for each time step, we use np.argmax() to pick the index of the most probable character, and the index_to_character dictionary maps that index back to its character.

def devectorize(vectors):
    # Pick the most probable (or one-hot) index at each time step and map it back to a character
    res = [index_to_character[np.argmax(vec)] for vec in vectors]
    return ''.join(res)

The catch is that vectorize left-pads every sequence with ‘0’ characters, so devectorize returns those padding zeroes along with the real characters. For example, if the input pair is (‘1-20’, ‘-19’), the de-vectorized output will be (‘01-20’, ‘00-19’). To remove these extra leading zeroes, let’s write a stripping function.

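def stripping(text):
    # flag is True once a digit has been kept, so any following zeroes are real digits
    flag = False
    output = ''
    for c in text:
        if not flag and c == '0':
            continue   # drop a padding (leading) zero
        if c in '+-*/.':
            flag = False   # after an operator or decimal point, leading zeroes are skipped again
        else:
            flag = True
        output += c
    return output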
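The padding, de-vectorizing and stripping steps can be traced end to end with the self-contained sketch below. It re-declares the character tables and the devectorize and stripping logic from above; pad_and_encode is a hypothetical helper that reproduces what vectorize does to a single string (left-pad with ‘0’, then one-hot encode).

```python
import numpy as np

char_string = '0123456789/*+-.'
character_to_index = {c: i for i, c in enumerate(char_string)}
index_to_character = {i: c for i, c in enumerate(char_string)}
max_time_steps = 5

def pad_and_encode(text):
    """Left-pad with '0' to max_time_steps, then one-hot encode (as vectorize does)."""
    padded = text.rjust(max_time_steps, '0')
    x = np.zeros((max_time_steps, len(char_string)))
    for i, c in enumerate(padded):
        x[i, character_to_index[c]] = 1
    return x

def devectorize(vectors):
    return ''.join(index_to_character[int(np.argmax(v))] for v in vectors)

def stripping(text):
    flag = False
    output = ''
    for c in text:
        if not flag and c == '0':
            continue
        flag = c not in '+-*/.'   # same rule as above, written as one expression
        output += c
    return output

for s in ['1-20', '-19', '0+5', '0.05', '0.5']:
    decoded = devectorize(pad_and_encode(s))
    print(repr(s), '->', repr(decoded), '->', repr(stripping(decoded)))
# '1-20' -> '01-20' -> '1-20'
# '-19' -> '00-19' -> '-19'
# '0+5' -> '000+5' -> '+5'
# '0.05' -> '00.05' -> '.5'
# '0.5' -> '000.5' -> '.5'
```

The first two cases match the example above. The last three show a side effect of using ‘0’ both as padding and as a digit: zeroes that belong to the value are stripped too, so ‘0+5’ prints as ‘+5’, and the results 0.05 and 0.5 both become ‘.5’.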

4. Making A Dataset

Now that we are done with defining a function for generating the data, let’s use that function and make a dataset with many such (arithmetic expression, result) pairs.

def create_dataset(num_equations):
    # One (max_time_steps, num_chars) one-hot matrix per equation
    x_train = np.zeros((num_equations, max_time_steps, num_chars))
    y_train = np.zeros((num_equations, max_time_steps, num_chars))
    for i in range(num_equations):
        e, l = datagen()          # random expression and its result
        x, y = vectorize(e, l)    # left-padded one-hot encodings
        x_train[i] = x
        y_train[i] = y
    return x_train, y_train

5. Training the Model

Let’s create a dataset of 50,000 samples, a fair number for our data-hungry model, and use 25% of it for validation. Let’s also add an EarlyStopping callback that interrupts training once the validation loss has not improved for 8 consecutive epochs; setting the patience parameter to 8 achieves this. A small LambdaCallback prints the validation accuracy after each epoch.
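The patience rule can be illustrated with a small pure-Python sketch: training stops once the monitored value (here val_loss) has gone patience consecutive epochs without beating its best value so far. The loss values are made up for illustration, the helper early_stopping_epoch is hypothetical, and this simplified version ignores EarlyStopping options such as min_delta and restore_best_weights.

```python
def early_stopping_epoch(val_losses, patience):
    """Return the (0-based) epoch at which training would stop, or None if it never stops."""
    best = float('inf')
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:          # improvement: remember it and reset the counter
            best = loss
            wait = 0
        else:                    # no improvement this epoch
            wait += 1
            if wait >= patience:
                return epoch
    return None

losses = [1.0, 0.8, 0.7, 0.72, 0.71, 0.69, 0.70, 0.73, 0.75]
print(early_stopping_epoch(losses, patience=3))  # 8
print(early_stopping_epoch(losses, patience=8))  # None
```

The improvement at epoch 5 (0.69) resets the counter, so with patience=3 training only stops after three more epochs without a new best.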

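x_train, y_train = create_dataset(50000)

# Print the validation accuracy at the end of every epoch, all on one line
simple_logger = LambdaCallback(
    on_epoch_end=lambda e, l: print('{:.2f}'.format(l['val_accuracy']), end=' _ ')
)

# Stop training once the validation loss has not improved for 8 consecutive epochs
early_stopping = EarlyStopping(monitor='val_loss', patience=8)

model.fit(x_train, y_train, epochs=100, validation_split=0.25, verbose=0,
          callbacks=[simple_logger, early_stopping])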
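With validation_split=0.25, Keras sets aside part of the training arrays for validation; its documentation states that these are the last samples in the data, selected before shuffling. The sketch below uses a hypothetical helper, split_like_validation_split, to mimic that split with NumPy and show the resulting sizes for 50,000 samples. Because create_dataset generates equations in random order, the last 25% is as representative as any other slice.

```python
import numpy as np

def split_like_validation_split(x, y, validation_split):
    """Hold out the LAST `validation_split` fraction of samples for validation."""
    split_at = int(len(x) * (1.0 - validation_split))
    return (x[:split_at], y[:split_at]), (x[split_at:], y[split_at:])

# Stand-ins for the 50,000 generated samples (only the sample count matters here)
x = np.arange(50000)
y = np.arange(50000)
(x_tr, y_tr), (x_val, y_val) = split_like_validation_split(x, y, 0.25)
print(len(x_tr), len(x_val))   # 37500 12500
print(x_val[0])                # 37500: validation starts at the last 25% of the data
```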

6. Testing the Model

Now let’s test our model on a fresh dataset of 20 equations.

 

x_test, y_test = create_dataset(num_equations=20)
preds = model.predict(x_test)
full_seq_acc = 0
for i, pred in enumerate(preds):
    pred_str = stripping(devectorize(pred))
    y_test_str = stripping(devectorize(y_test[i]))
    x_test_str = stripping(devectorize(x_test[i]))
    # Green for an exact match, red otherwise
    col = 'green' if pred_str == y_test_str else 'red'
    full_seq_acc += 1/len(preds) * int(pred_str == y_test_str)
    outstring = 'Input: {}, Output: {}, Prediction: {}'.format(x_test_str, y_test_str, pred_str)
    print(colored(outstring, col))
print('\nFull sequence accuracy: {:.3f} %'.format(100 * full_seq_acc))

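Each test equation is printed in green if the prediction matches the expected result exactly and in red otherwise, followed by the full-sequence accuracy.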

The accuracy here is fairly poor, but it can be improved by tweaking hyperparameters such as the number of hidden units, the validation split and the number of epochs.
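The two accuracy figures in this article measure different things. With metrics=['accuracy'] and a categorical cross-entropy loss, Keras scores each time step (character) separately and averages, whereas the test loop counts an equation as correct only when the entire sequence matches. The NumPy sketch below, using a hypothetical helper per_char_and_full_seq_accuracy and toy one-hot arrays, computes both numbers from the same predictions. It compares argmax indices directly rather than the stripped strings the test loop uses.

```python
import numpy as np

def per_char_and_full_seq_accuracy(y_true, y_pred):
    """y_true, y_pred: arrays of shape (samples, time_steps, num_chars)."""
    true_idx = y_true.argmax(axis=-1)      # (samples, time_steps) character indices
    pred_idx = y_pred.argmax(axis=-1)
    matches = true_idx == pred_idx
    per_char = matches.mean()              # fraction of individual characters correct
    full_seq = matches.all(axis=1).mean()  # fraction of sequences with every character correct
    return float(per_char), float(full_seq)

# Toy example: 2 sequences of 5 characters over a 15-character vocabulary
eye = np.eye(15)
y_true = eye[[[0, 0, 0, 9, 9], [0, 0, 1, 2, 3]]]
y_pred = eye[[[0, 0, 0, 9, 9], [0, 0, 1, 2, 4]]]   # last character of sequence 2 is wrong
print(per_char_and_full_seq_accuracy(y_true, y_pred))  # (0.9, 0.5)
```

A single wrong character costs only 10% of the per-character score here but halves the full-sequence score.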

Conclusion 

We’ve covered the basic workflow of an RNN and why RNNs are suited to sequential data, generated a dataset of random arithmetic equations, built a sequential model that predicts the result of a basic arithmetic expression, trained it on that dataset, and finally tested it on a small, freshly generated test set.

Pavan Vadapalli

Blog Author
Director of Engineering @ upGrad. Motivated to leverage technology to solve problems. Seasoned leader for startups and fast moving orgs. Working on solving problems of scale and long term technology strategy.

Frequently Asked Questions (FAQs)

1. What are the different types of neural networks in machine learning?

In machine learning, artificial neural networks are computational models loosely inspired by the human brain. Different kinds of neural networks are used depending on the computation that needs to be performed, and they learn from data in different ways. Some of the most widely used types are recurrent neural networks (including long short-term memory networks), feedforward neural networks, radial basis function networks, Kohonen self-organizing networks, convolutional neural networks and modular neural networks, among others.

2. What are the advantages of a recurrent neural network?

Recurrent neural networks are among the most commonly used artificial neural networks in deep learning and machine learning. In this type of network, the result obtained from the previous step is fed as input to the next step. This brings several advantages: the hidden state carries information about previous inputs, which makes RNNs well suited to time series prediction, and variants such as long short-term memory (LSTM) networks are designed to retain that information over longer spans. RNNs can also be combined with convolutional layers to extend the effective pixel neighborhood in image tasks.

3. How are neural networks employed in real-world applications?

Artificial neural networks are an integral part of deep learning, which again is a super-specialized branch of machine learning and artificial intelligence. Neural networks are used across different industries to achieve various critical objectives. Some of the most interesting real-world applications of artificial neural networks include stock market forecasting, facial recognition, high-performance auto-piloting and fault diagnosis in the aerospace industry, analysis of armed attacks and object location in the defence sector, image processing, drug discovery and disease detection in the healthcare sector, verification of signature, handwriting analysis, weather forecasting and social media trend forecasting, among others.
