Solving Basic Math Equation Using RNN [With Coding Example]

Last updated:
7th Dec, 2020

If life gives you RNN, make a calculator 🙂

A recurrent neural network (RNN) is a classic type of artificial neural network in which the connections between the nodes form a directed graph along a sequence. RNNs are widely used for applications like speech recognition and handwriting recognition because their internal state (memory) lets them process variable-length sequences.

RNNs are further classified into two types. The first is the finite impulse RNN, whose network is a directed acyclic graph: each node connects only to nodes ahead of it, so there is no cycle and the network can be unrolled into a feed-forward neural network. The second is the infinite impulse RNN, whose network is a directed cyclic graph that cannot be unrolled into a feed-forward neural network.
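To make the idea of an internal state concrete, here is a minimal NumPy sketch of a tanh recurrence. The sizes, the random weights and the name rnn_forward are our own assumptions for illustration; this is not the Keras implementation used later in the article.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes only (assumptions, not the article's model)
input_size, hidden_size = 4, 3
W_x = rng.normal(size=(hidden_size, input_size)) * 0.1    # input-to-hidden weights
W_h = rng.normal(size=(hidden_size, hidden_size)) * 0.1   # hidden-to-hidden weights
b = np.zeros(hidden_size)

def rnn_forward(sequence):
    """Run a simple tanh RNN over a sequence and return the final hidden state."""
    h = np.zeros(hidden_size)          # the internal state starts at zero
    for x_t in sequence:               # one update per time step, for any length
        h = np.tanh(W_x @ x_t + W_h @ h + b)
    return h

short_seq = rng.normal(size=(2, input_size))   # 2 time steps
long_seq = rng.normal(size=(7, input_size))    # 7 time steps
h_short = rnn_forward(short_seq)
h_long = rnn_forward(long_seq)
print(h_short.shape, h_long.shape)
```

The same weights are reused at every step, so sequences of different lengths both end in a hidden state of the same size.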

What Are We Going to Do?

Let’s build a model that predicts the output of an arithmetic expression. For example, if I give the input ‘11+88’, then the model should predict the next word in the sequence as ‘99’. Both the input and the output are sequences of characters, since an RNN deals with sequential data.

Designing the architecture of the model looks like a simple task compared to collecting a dataset. Generating or gathering data is a strenuous task because data-hungry AI models require a fair amount of data to reach acceptable accuracy.

So this model can be implemented in 6 basic steps:

  1. Generating data
  2. Building a model
  3. Vectorizing and de-vectorizing the data
  4. Making a dataset
  5. Training the model
  6. Testing the model

Before we dive into implementing the model, let’s just import all the required libraries.

import numpy as np
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout, SimpleRNN, RepeatVector, TimeDistributed
from tensorflow.keras.callbacks import EarlyStopping, LambdaCallback
from termcolor import colored

1. Generating Data

Let’s define a char string containing all the characters we need for writing a basic arithmetic equation. The string consists of the digits 0-9, the arithmetic operators /, *, +, - and the decimal point (.).

We cannot feed character strings directly into our model; we need to pass the data in the form of tensors. We will convert each string to a sequence of one-hot encoded vectors. A one-hot encoded vector is an array with the same length as our char string, containing a single one at the index of the character it represents and zeroes everywhere else.

For example, let’s say our character string is ‘0123456789’, and we want to encode a string like ‘12’. The one-hot encoding would be [ [0,1,0,0,0,0,0,0,0,0], [0,0,1,0,0,0,0,0,0,0] ]. To do that we need to create two dictionaries: one that maps each character to its index, and another that maps each index back to its character.
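The ‘12’ example above can be reproduced with a few lines of plain Python. This is only a sketch: the names alphabet, char_to_idx and one_hot are hypothetical and not part of the article’s code.

```python
alphabet = '0123456789'   # the reduced character string from the example
char_to_idx = {c: i for i, c in enumerate(alphabet)}

def one_hot(text):
    """Return one row per character, with a single 1 at that character's index."""
    rows = []
    for ch in text:
        row = [0] * len(alphabet)
        row[char_to_idx[ch]] = 1
        rows.append(row)
    return rows

encoded = one_hot('12')
print(encoded)
# [[0, 1, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 1, 0, 0, 0, 0, 0, 0, 0]]
```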

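# All characters that can appear in an expression or its result
char_string = '0123456789/*+-.'
num_chars = len(char_string)
# Lookup tables: character -> index and index -> character
character_to_index = dict((c, i) for i, c in enumerate(char_string))
index_to_character = dict((i, c) for i, c in enumerate(char_string))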

Now let’s write a function that returns a random arithmetic equation along with the result of that equation.

def division(n, d):
    # Guard against division by zero by returning 0
    return n / d if d != 0 else 0

def datagen():
    # Two random operands in [0, 99] and a random operator index in [0, 3]
    random1 = np.random.randint(low=0, high=100)
    random2 = np.random.randint(low=0, high=100)
    op = np.random.randint(low=0, high=4)
    if op == 0:
        arith = str(random1) + '+' + str(random2)
        res = str(random1 + random2)
    elif op == 1:
        arith = str(random1) + '-' + str(random2)
        res = str(random1 - random2)
    elif op == 2:
        arith = str(random1) + '*' + str(random2)
        res = str(random1 * random2)
    else:
        arith = str(random1) + '/' + str(random2)
        # Division results are rounded to two decimal places
        res = str(round(division(random1, random2), 2))
    return arith, res
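Since the operands are limited to 0-99, it is worth checking how long an expression or a result can get, because the model works with a fixed number of characters. The sketch below enumerates every operand pair in plain Python, assuming the four operators are +, -, * and / and that division is rounded to two decimals as described above. The helper result_for is hypothetical.

```python
def division(n, d):
    return n / d if d != 0 else 0

def result_for(a, b, symbol):
    """Mirror the result formatting used for each operator in the generator."""
    if symbol == '+':
        return str(a + b)
    if symbol == '-':
        return str(a - b)
    if symbol == '*':
        return str(a * b)
    return str(round(division(a, b), 2))

max_expr_len = 0
max_res_len = 0
for a in range(100):
    for b in range(100):
        for symbol in '+-*/':
            expr = str(a) + symbol + str(b)
            max_expr_len = max(max_expr_len, len(expr))
            max_res_len = max(max_res_len, len(result_for(a, b, symbol)))

print(max_expr_len, max_res_len)
```

Both maxima come out as 5 (for example ‘99+99’ and ‘32.33’), so five characters are enough for every expression and every result in this setup.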

2. Building A Model

The model will have an encoder and a decoder. The encoder is a simple RNN layer with input shape (None, num_chars) and 128 hidden units; the None leaves the number of time steps unspecified. We pick hidden-unit counts such as 32, 64 or 128 because powers of 2 are a common convention and often map well onto CPU and GPU hardware.

At every time step the output of the encoder is fed back into the layer together with the next input; that is how an RNN works. An RNN layer uses ‘tanh’ activation by default, and we are not going to change that. The output of this layer is a single vector summarizing the whole input. The decoder expects a sequence, so we’ll use the RepeatVector() layer to repeat this vector, passing the required number of repetitions as a parameter.

The repeated vector carries the essence of the input given, and it is fed into the decoder.

The decoder is a simple RNN layer that generates the output sequence. Since we need this layer to return the predicted sequence, we set ‘return_sequences’ to True. With ‘return_sequences’ set to True, the RNN layer returns an output for each time step (a many-to-many RNN).

The output of this RNN layer is fed into a Dense layer with ‘num_chars’ units, and we’ll use softmax activation since we need the probability of each character. We wrap the Dense layer in a TimeDistributed layer because we need to apply it to the output of each time step.
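The data flow described above can be followed shape by shape with plain NumPy. This is a sketch under stated assumptions: the sizes (128, 5 and 15) are the ones used in this article, but the random weights and the tanh stand-in for the decoder are placeholders, not what Keras computes.

```python
import numpy as np

rng = np.random.default_rng(0)
hidden_units, max_time_steps, num_chars = 128, 5, 15

# Stand-in for the encoder's final hidden state: one vector per input expression.
encoded = rng.normal(size=(hidden_units,))

# RepeatVector: copy that vector once per output time step.
repeated = np.repeat(encoded[np.newaxis, :], max_time_steps, axis=0)   # (5, 128)

# Stand-in for the decoder output with return_sequences=True: one vector per step.
# (A real decoder applies a recurrence; tanh here only keeps the shapes honest.)
decoder_out = np.tanh(repeated)                                        # (5, 128)

# TimeDistributed(Dense(softmax)): the same weights applied to every time step.
W = rng.normal(size=(hidden_units, num_chars)) * 0.1
b = np.zeros(num_chars)
logits = decoder_out @ W + b                                           # (5, 15)
exp = np.exp(logits - logits.max(axis=-1, keepdims=True))
probs = exp / exp.sum(axis=-1, keepdims=True)

print(repeated.shape, probs.shape)
print(probs.sum(axis=-1))   # each time step is a probability distribution
```

Each of the 5 output positions ends up with its own probability distribution over the 15 characters.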

hidden_units = 128
max_time_steps = 5    # we are hardcoding the output to be 5 characters long

def build_model():
    model = Sequential()
    # Encoder: reads the whole expression and returns a single vector
    model.add(SimpleRNN(hidden_units, input_shape=(None, num_chars)))
    # Repeat that vector once per output time step
    model.add(RepeatVector(max_time_steps))
    # Decoder: returns an output for every time step
    model.add(SimpleRNN(hidden_units, return_sequences=True))
    # Character probabilities for every time step
    model.add(TimeDistributed(Dense(num_chars, activation='softmax')))
    return model

model = build_model()
model.summary()
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

The model.summary() call prints the architecture of the model layer by layer.

3. Vectorizing and De-vectorizing the Data

Let’s define functions for vectorizing and de-vectorizing the data.

Here’s the function for vectorizing the arithmetic expression and the result together.

 

def vectorize(arith, res):
    # One-hot matrices for the expression (x) and its result (y)
    x = np.zeros((max_time_steps, num_chars))
    y = np.zeros((max_time_steps, num_chars))
    # Number of leading positions that need padding
    x_remaining = max_time_steps - len(arith)
    y_remaining = max_time_steps - len(res)
    # Right-align the expression and pad the front with '0'
    for i, c in enumerate(arith):
        x[x_remaining+i, character_to_index[c]] = 1
    for i in range(x_remaining):
        x[i, character_to_index['0']] = 1
    # Same treatment for the result
    for i, c in enumerate(res):
        y[y_remaining+i, character_to_index[c]] = 1
    for i in range(y_remaining):
        y[i, character_to_index['0']] = 1
    return x, y

 

Similarly here’s the function for de-vectorizing the string. Since the output we receive is a vector of probabilities, we’ll use np.argmax() for picking the character with the highest probability. Now the index_to_character dictionary is used to trace back the character at that index.
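As a quick illustration of the argmax step, the sketch below decodes a made-up probability matrix. The probabilities are invented for the example and do not come from a trained model; the names chars and idx_to_char are hypothetical.

```python
import numpy as np

chars = '0123456789/*+-.'
idx_to_char = dict(enumerate(chars))

# Invented output for three time steps: every character gets 0.01,
# except one favourite per step that gets the remaining 0.86.
probs = np.full((3, len(chars)), 0.01)
probs[0, chars.index('-')] = 0.86
probs[1, chars.index('1')] = 0.86
probs[2, chars.index('9')] = 0.86

# Pick the most probable character at each time step and join them.
decoded = ''.join(idx_to_char[int(np.argmax(step))] for step in probs)
print(decoded)   # -19
```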

def devectorize(vectors):
    # Pick the most probable character at each time step
    res = [index_to_character[np.argmax(vec)] for vec in vectors]
    return ''.join(res)

One side effect of our vectorization is that shorter strings are padded at the front with zeroes, so the ‘devectorize’ function returns them with leading zeroes. For example, the pair (‘1-20’, ‘-19’) comes back as (‘01-20’, ‘00-19’). We need to take care of these extra padded zeroes, so let’s write a function for stripping them.

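def stripping(text):
    # flag becomes True once a non-zero digit has been seen
    flag = False
    output = ''
    for c in text:
        # Skip zeroes that look like padding
        if not flag and c == '0':
            continue
        # An operator or decimal point resets the flag, so zeroes
        # directly after one are dropped as well
        if c == '+' or c == '-' or c == '*' or c == '/' or c == '.':
            flag = False
        else:
            flag = True
        output += c
    return output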
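To see what this stripping logic does on concrete strings, here is a self-contained sketch with the same behaviour. The name strip_padding is hypothetical; it is a compact restatement for experimenting, not a replacement for the function above.

```python
def strip_padding(text):
    """Skip zeroes until a non-zero digit appears; operators and '.' reset the flag."""
    seen_digit = False
    out = ''
    for ch in text:
        if not seen_digit and ch == '0':
            continue                      # treated as padding
        seen_digit = ch not in '+-*/.'    # operators and '.' reset the flag
        out += ch
    return out

print(strip_padding('01-20'))   # 1-20
print(strip_padding('00-19'))   # -19
print(strip_padding('1.05'))    # 1.5
print(repr(strip_padding('00000')))   # ''
```

The last two lines show a limitation worth keeping in mind: a zero right after a decimal point, or a result that is exactly zero, is stripped too, because the function cannot tell a padded zero from a real one.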

4. Making A Dataset

Now that we are done with defining a function for generating the data, let’s use that function and make a dataset with many such (arithmetic expression, result) pairs.

def create_dataset(num_equations):
    # One (max_time_steps, num_chars) matrix per expression and per result
    x_train = np.zeros((num_equations, max_time_steps, num_chars))
    y_train = np.zeros((num_equations, max_time_steps, num_chars))
    for i in range(num_equations):
        e, l = datagen()          # expression and its result as strings
        x, y = vectorize(e, l)    # one-hot encode both
        x_train[i] = x
        y_train[i] = y
    return x_train, y_train

5. Training the Model

Let’s create a dataset of 50,000 samples, which is a fair number to train our data-hungry model; we’ll use 25% of this data for validation. Also, let’s create a callback that interrupts training if the validation loss does not improve for 8 epochs. This can be achieved by setting the patience parameter to 8.
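The patience idea can be sketched in a few lines of plain Python. This is a simplified model of the behaviour, with a hypothetical stopping_epoch helper and invented loss values; the real Keras callback has more options (such as a minimum improvement threshold) that are ignored here.

```python
def stopping_epoch(val_losses, patience):
    """Return the 1-based epoch at which training would stop, or None if it never does."""
    best = float('inf')
    wait = 0                       # epochs since the last improvement
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None

# Invented validation losses: the best value (0.60) is reached at epoch 3.
losses = [0.9, 0.7, 0.6, 0.61, 0.62, 0.60, 0.63]
print(stopping_epoch(losses, patience=3))   # 6
print(stopping_epoch(losses, patience=8))   # None
```

With a patience of 3 the run stops at epoch 6, three epochs after the last improvement; with a patience of 8 this short run is never interrupted.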

x_train, y_train = create_dataset(50000)

# Print the validation accuracy at the end of every epoch
simple_logger = LambdaCallback(
    on_epoch_end=lambda e, l: print('{:.2f}'.format(l['val_accuracy']), end=' _ ')
)
# Stop when the validation loss has not improved for 8 epochs
early_stopping = EarlyStopping(monitor='val_loss', patience=8)
model.fit(x_train, y_train, epochs=100, validation_split=0.25, verbose=0,
          callbacks=[simple_logger, early_stopping])

6. Testing the Model

Now let’s test our model by creating a dataset of size 20.

 

x_test, y_test = create_dataset(num_equations=20)
preds = model.predict(x_test)
full_seq_acc = 0

for i, pred in enumerate(preds):
    # Decode prediction, expected result and input back into strings
    pred_str = stripping(devectorize(pred))
    y_test_str = stripping(devectorize(y_test[i]))
    x_test_str = stripping(devectorize(x_test[i]))
    # Green when the whole predicted string matches, red otherwise
    col = 'green' if pred_str == y_test_str else 'red'
    full_seq_acc += 1/len(preds) * int(pred_str == y_test_str)
    outstring = 'Input: {}, Output: {}, Prediction: {}'.format(x_test_str, y_test_str, pred_str)
    print(colored(outstring, col))

print('\nFull sequence accuracy: {:.3f} %'.format(100 * full_seq_acc))

The output will be as follows

We can see that the accuracy is a little poor here. We can try to improve it by tweaking a few hyperparameters such as the number of hidden units, the validation split and the number of epochs.
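One thing to keep in mind when reading these numbers: as far as we understand Keras’s behaviour for this kind of output, the accuracy logged during training is counted per character, whereas the test loop above only counts a prediction when the whole string matches. The sketch below uses invented predictions to show how far apart the two figures can be.

```python
# Invented padded targets and predictions, five characters each
targets     = ['00099', '00-19', '09801', '32.33']
predictions = ['00099', '00-18', '09801', '32.34']

# Per-character accuracy: fraction of positions that match
char_hits = sum(p == t
                for pred, tgt in zip(predictions, targets)
                for p, t in zip(pred, tgt))
char_acc = char_hits / sum(len(t) for t in targets)

# Full-sequence accuracy: fraction of strings that match exactly
seq_acc = sum(pred == tgt for pred, tgt in zip(predictions, targets)) / len(targets)

print(char_acc, seq_acc)   # 0.9 0.5
```

A single wrong digit costs one character in the first metric but the whole equation in the second, so a lower full-sequence score is to be expected.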

Conclusion 

We’ve understood the basic workflow of an RNN, understood that RNNs are best suited for sequential data, generated a dataset of random arithmetic equations, developed a sequential model for predicting the output of a basic arithmetic expression, trained that model with the dataset which we’ve created, and finally tested that model with a small dataset which the model has never seen before.

Pavan Vadapalli

Blog Author
Director of Engineering @ upGrad. Motivated to leverage technology to solve problems. Seasoned leader for startups and fast moving orgs. Working on solving problems of scale and long term technology strategy.

Frequently Asked Questions (FAQs)

1. What are the different types of neural networks in machine learning?

In machine learning, artificial neural networks are computational models loosely inspired by the human brain. Different kinds of artificial neural networks are used depending on the computation that needs to be achieved, and they learn from data in different ways. Some of the most widely used types are recurrent neural networks (including long short-term memory networks), feedforward neural networks, radial basis function neural networks, Kohonen self-organizing neural networks, convolutional neural networks, and modular neural networks, among others.

2. What are the advantages of a recurrent neural network?

Recurrent neural networks are among the most commonly used artificial neural networks in deep learning and machine learning. In this type of model, the result obtained from the previous step is fed as input to the subsequent step. The main advantage is that the network carries information from earlier inputs forward through its hidden state, which makes it well suited to time series prediction and other sequential tasks. Variants such as long short-term memory (LSTM) networks are designed to retain that information over longer spans. Recurrent layers can also be combined with convolutional layers, for example to take pixel neighborhoods into account.

3. How are neural networks employed in real-world applications?

Artificial neural networks are an integral part of deep learning, which again is a super-specialized branch of machine learning and artificial intelligence. Neural networks are used across different industries to achieve various critical objectives. Some of the most interesting real-world applications of artificial neural networks include stock market forecasting, facial recognition, high-performance auto-piloting and fault diagnosis in the aerospace industry, analysis of armed attacks and object location in the defence sector, image processing, drug discovery and disease detection in the healthcare sector, verification of signature, handwriting analysis, weather forecasting and social media trend forecasting, among others.
