Solving Basic Math Equation Using RNN [With Coding Example]

If life gives you RNN, make a calculator 🙂

A Recurrent Neural Network (RNN) is one of the classic artificial neural networks, in which the connections between nodes form a directed graph along a sequence. RNNs are well known for applications like speech recognition and handwriting recognition because their internal state (memory) lets them process variable-length sequences.

RNNs are further classified into two types. The first is a finite impulse network, which forms a directed acyclic graph: each node connects only to one or more nodes ahead of it, so the network contains no cycles and can be unrolled into a feed-forward neural network. The second is an infinite impulse network, which forms a directed cyclic graph that cannot be unrolled into a feed-forward neural network.

What Are We Going to Do?

Let’s build a model that predicts the output of an arithmetic expression. For example, if I give an input ‘11+88’, then the model should predict the next word in the sequence as ‘99’. The input and output are a sequence of characters since an RNN deals with sequential data.

Designing the architecture of the model looks like a simple task compared to dataset collection. Generating or gathering a dataset is strenuous because data-hungry AI models require a fair amount of data to reach acceptable accuracy.

So this model can be implemented in 6 basic steps:

  1. Generating data
  2. Building a model
  3. Vectorizing and De-vectorizing the data
  4. Making a dataset
  5. Training the model
  6. Testing the model

Before we dive into implementing the model, let’s just import all the required libraries.

import numpy as np
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout, SimpleRNN, RepeatVector, TimeDistributed
from tensorflow.keras.callbacks import EarlyStopping, LambdaCallback
from termcolor import colored

1. Generating Data

Let’s define a character string containing all the characters we need to write a basic arithmetic equation: the digits 0-9, the arithmetic operators /, *, + and -, and the decimal point (.).

We cannot feed raw characters directly into our model; we need to pass the data in the form of tensors. To do so, we convert each string into a sequence of one-hot encoded vectors. A one-hot vector is an array as long as our character string, filled with zeros except for a single one at the index of the character it represents.

For example, if our character string is ‘0123456789’ and we want to encode the string ‘12’, the one-hot encoding is [ [0,1,0,0,0,0,0,0,0,0], [0,0,1,0,0,0,0,0,0,0] ]. To do that we need two dictionaries: one mapping characters to indices and the other mapping indices back to characters.

char_string = '0123456789/*+-.'
num_chars = len(char_string)
character_to_index = dict((c, i) for i, c in enumerate(char_string))
index_to_character = dict((i, c) for i, c in enumerate(char_string))
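To make the one-hot idea concrete, here is a minimal sketch that reproduces the ‘12’ example from above. It uses plain Python lists rather than NumPy so it runs without dependencies; the names `digits`, `digit_to_index` and `one_hot` are hypothetical helpers for illustration, not part of the model code.

```python
# Hypothetical helper for illustration: one-hot encode a string against
# a small character set, using plain Python lists.
digits = '0123456789'
digit_to_index = {c: i for i, c in enumerate(digits)}

def one_hot(text, alphabet_index):
    size = len(alphabet_index)
    rows = []
    for ch in text:
        row = [0] * size                 # all zeros ...
        row[alphabet_index[ch]] = 1      # ... except at the character's index
        rows.append(row)
    return rows

encoded = one_hot('12', digit_to_index)
for row in encoded:
    print(row)
# [0, 1, 0, 0, 0, 0, 0, 0, 0, 0]
# [0, 0, 1, 0, 0, 0, 0, 0, 0, 0]
```

Each row corresponds to one character of the input, so a string of length n becomes n vectors of length 10 here (or `num_chars` for the full character string).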

Now let’s write a function that returns a random arithmetic equation along with the result of that equation.

def division(n, d):
    # avoid a ZeroDivisionError: dividing by zero yields 0
    return n / d if d != 0 else 0

def datagen():
    random1 = np.random.randint(low=0, high=100)
    random2 = np.random.randint(low=0, high=100)
    op = np.random.randint(low=0, high=4)   # 0: +, 1: -, 2: *, 3: /
    if op == 0:
        arith = str(random1) + '+' + str(random2)
        res = str(random1 + random2)
    elif op == 1:
        arith = str(random1) + '-' + str(random2)
        res = str(random1 - random2)
    elif op == 2:
        arith = str(random1) + '*' + str(random2)
        res = str(random1 * random2)
    else:
        arith = str(random1) + '/' + str(random2)
        res = str(round(division(random1, random2), 2))
    return arith, res
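Since the model works with a fixed number of time steps, it is worth checking how long the generated strings can get. The sketch below is an illustrative check, not part of the pipeline: it enumerates every operand pair from 0 to 99 (the range `np.random.randint(low=0, high=100)` draws from) with plain Python and records the longest expression and result. The variable names are hypothetical.

```python
def division(n, d):
    return n / d if d != 0 else 0

longest_expr = ''
longest_res = ''
for a in range(100):
    for b in range(100):
        cases = [
            ('{}+{}'.format(a, b), str(a + b)),
            ('{}-{}'.format(a, b), str(a - b)),
            ('{}*{}'.format(a, b), str(a * b)),
            ('{}/{}'.format(a, b), str(round(division(a, b), 2))),
        ]
        for expr, res in cases:
            if len(expr) > len(longest_expr):
                longest_expr = expr
            if len(res) > len(longest_res):
                longest_res = res

print(len(longest_expr), len(longest_res))   # 5 5
```

Both the expressions and the results fit in 5 characters, which is why a fixed length of 5 time steps is enough for this dataset.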


2. Building A Model

The model will have an encoder and a decoder. The encoder is a simple RNN layer with input shape (None, num_chars) and 128 hidden units. We pick hidden-unit counts like 32, 64 or 128 because CPUs and GPUs tend to perform better with sizes that are powers of 2.

Each encoder unit is fully connected to the input, and the layer’s output is fed back into it at the next time step; that is how an RNN works. An RNN layer uses ‘tanh’ activation by default, and we are not going to change it because it suits the encoder well. The output of this layer is a single vector, and since the decoder expects a sequence we’ll use the RepeatVector() layer to repeat that vector, passing the required number of repetitions as a parameter.

This vector captures the essence of the given input and is fed into the decoder.
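The feedback loop of the encoder can be sketched in a few lines of plain Python. This is a toy illustration of the usual simple-RNN update h_t = tanh(W x_t + U h_(t-1) + b), with tiny hand-picked weights that are purely hypothetical; it is not how Keras implements the layer internally. The last lines mimic what repeating the final state for the decoder amounts to conceptually.

```python
import math

def rnn_step(x_t, h_prev, W, U, b):
    """One simple-RNN update: h_t = tanh(W x_t + U h_prev + b)."""
    h_t = []
    for j in range(len(b)):
        total = b[j]
        total += sum(W[j][i] * x_t[i] for i in range(len(x_t)))
        total += sum(U[j][k] * h_prev[k] for k in range(len(h_prev)))
        h_t.append(math.tanh(total))
    return h_t

# Toy sizes and weights (assumptions for illustration only).
W = [[0.5, -0.3, 0.1], [0.2, 0.4, -0.6]]   # hidden x input
U = [[0.1, 0.0], [-0.2, 0.3]]              # hidden x hidden
b = [0.0, 0.1]

sequence = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]   # three one-hot inputs
h = [0.0, 0.0]                                  # initial state
for x_t in sequence:
    h = rnn_step(x_t, h, W, U, b)               # the state is fed back in

# Conceptually, repeating the final state gives the decoder one copy per output step.
max_time_steps = 5
decoder_input = [list(h) for _ in range(max_time_steps)]
print(len(decoder_input), len(decoder_input[0]))   # 5 2
```

Because of the tanh, every entry of the state stays strictly between -1 and 1 no matter how long the input sequence is.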

The decoder consists of a simple RNN layer that generates the output sequence. Since we need this layer to return the predicted sequence, we set the ‘return_sequences’ flag to True; the RNN layer then returns an output for each time step (a many-to-many RNN).

The output of this RNN layer is fed into a Dense layer with ‘num_chars’ units and softmax activation, since we need the probability of each character. We wrap the Dense layer in a TimeDistributed layer because it has to be applied to the output of every time step.

hidden_units = 128
max_time_steps = 5    # we are hardcoding the output to be of 5 characters

def model():
    model = Sequential()
    # encoder: reads the whole expression and returns a single vector
    model.add(SimpleRNN(hidden_units, input_shape=(None, num_chars)))
    # repeat that vector once per output time step
    model.add(RepeatVector(max_time_steps))
    # decoder: returns one output per time step
    model.add(SimpleRNN(hidden_units, return_sequences=True))
    model.add(TimeDistributed(Dense(num_chars, activation='softmax')))
    return model

model = model()
model.summary()
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

model.summary() prints the architecture of the model.
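As a sanity check on the size of this architecture, the trainable parameter counts can be worked out by hand. The sketch below assumes the layers described above with bias terms enabled (the Keras default): a simple RNN layer has an input kernel, a recurrent kernel and a bias, and a Dense layer has a kernel and a bias. The helper names are hypothetical.

```python
char_string = '0123456789/*+-.'
num_chars = len(char_string)      # 15 characters
hidden_units = 128

def simple_rnn_params(input_dim, units):
    # input kernel + recurrent kernel + bias
    return units * input_dim + units * units + units

def dense_params(input_dim, units):
    # kernel + bias
    return input_dim * units + units

encoder = simple_rnn_params(num_chars, hidden_units)
decoder = simple_rnn_params(hidden_units, hidden_units)
output = dense_params(hidden_units, num_chars)
total = encoder + decoder + output    # repeating a vector adds no parameters

print(encoder, decoder, output, total)   # 18432 32896 1935 53263
```

Most of the parameters sit in the decoder, because its input is the 128-dimensional encoder vector rather than a 15-dimensional one-hot vector.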


3. Vectorizing and De-vectorizing the Data

Let’s define functions for vectorizing and de-vectorizing the data.

Here’s the function for vectorizing the arithmetic expression and the result together.


def vectorize(arith, res):
    x = np.zeros((max_time_steps, num_chars))
    y = np.zeros((max_time_steps, num_chars))
    # number of leading positions to pad so both strings are max_time_steps long
    x_remaining = max_time_steps - len(arith)
    y_remaining = max_time_steps - len(res)
    for i, c in enumerate(arith):
        x[x_remaining+i, character_to_index[c]] = 1
    for i in range(x_remaining):
        x[i, character_to_index['0']] = 1      # pad the expression with leading zeroes
    for i, c in enumerate(res):
        y[y_remaining+i, character_to_index[c]] = 1
    for i in range(y_remaining):
        y[i, character_to_index['0']] = 1      # pad the result with leading zeroes
    return x, y


Similarly here’s the function for de-vectorizing the string. Since the output we receive is a vector of probabilities, we’ll use np.argmax() for picking the character with the highest probability. Now the index_to_character dictionary is used to trace back the character at that index.

def devectorize(input):
    # pick the most probable character at every time step
    res = [index_to_character[np.argmax(vec)] for vec in input]
    return ''.join(res)
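The round trip from string to vectors and back can be illustrated without NumPy. In this sketch `encode` and `decode` are hypothetical stand-ins for the vectorizing and de-vectorizing steps: `encode` left-pads with ‘0’ to 5 characters and one-hot encodes, and `decode` takes the arg-max of each row using plain Python.

```python
char_string = '0123456789/*+-.'
char_to_idx = {c: i for i, c in enumerate(char_string)}
idx_to_char = {i: c for i, c in enumerate(char_string)}
max_time_steps = 5

def encode(text):
    padded = text.rjust(max_time_steps, '0')     # leading-zero padding
    rows = []
    for ch in padded:
        row = [0.0] * len(char_string)
        row[char_to_idx[ch]] = 1.0
        rows.append(row)
    return rows

def decode(rows):
    # arg-max per row, then map the index back to its character
    return ''.join(
        idx_to_char[max(range(len(row)), key=row.__getitem__)] for row in rows
    )

print(decode(encode('1-20')))   # 01-20
print(decode(encode('-19')))    # 00-19
```

The decoded strings come back with the padding still attached, so they are always exactly 5 characters long.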

One limitation of the ‘devectorize’ function is that its output still carries the leading zeroes added as padding during vectorization. For example, if the original pair is (‘1-20’, ‘-19’) then the de-vectorized output will be (‘01-20’, ‘00-19’). We need to take care of these extra padded zeroes, so let’s write a function for stripping them from the string.

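def stripping(input):
    flag = False          # True once a non-zero-padding digit has been seen
    output = ''
    for c in input:
        if not flag and c == '0':
            continue      # skip padding zeroes
        if c == '+' or c == '-' or c == '*' or c == '/' or c == '.':
            flag = False  # note: zeroes right after an operator or decimal point are dropped too
        else:
            flag = True
        output += c
    return output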
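Here is a short usage sketch of the stripping logic on the padded strings from the example above. The function `strip_padding` is a hypothetical, self-contained copy of that logic written for this illustration (it keeps a "seen a digit" flag that resets at every operator or decimal point).

```python
def strip_padding(text):
    seen_digit = False
    out = ''
    for c in text:
        if not seen_digit and c == '0':
            continue                      # drop padding zeroes
        seen_digit = c not in '+-*/.'     # reset after an operator or decimal point
        out += c
    return out

print(strip_padding('01-20'))   # 1-20
print(strip_padding('00-19'))   # -19
print(strip_padding('00099'))   # 99
```

One caveat of this approach, as far as the sketch goes: a string made only of zeroes, such as ‘00000’, strips down to an empty string. Since the same function is applied to both the prediction and the expected output, the comparison between them stays consistent.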

4. Making A Dataset

Now that we are done with defining a function for generating the data, let’s use that function and make a dataset with many such (arithmetic expression, result) pairs.

def create_dataset(num_equations):
    x_train = np.zeros((num_equations, max_time_steps, num_chars))
    y_train = np.zeros((num_equations, max_time_steps, num_chars))
    for i in range(num_equations):
        e, l = datagen()          # e: expression string, l: result string
        x, y = vectorize(e, l)
        x_train[i] = x
        y_train[i] = y
    return x_train, y_train

5. Training the Model

Let’s create a dataset of 50,000 samples, which is a fair number for training our data-hungry model; we’ll use 25% of this data for validation. Also, let’s create a callback that stops training early if the validation loss does not improve for 8 epochs. This is achieved by setting the patience parameter to 8.

x_train, y_train = create_dataset(50000)

# print the validation accuracy at the end of every epoch
simple_logger = LambdaCallback(
    on_epoch_end=lambda e, l: print('{:.2f}'.format(l['val_accuracy']), end=' _ ')
)

early_stopping = EarlyStopping(monitor='val_loss', patience=8)

model.fit(x_train, y_train, epochs=100, validation_split=0.25, verbose=0,
          callbacks=[simple_logger, early_stopping])
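The idea behind patience-based early stopping can be sketched in plain Python. This is an illustration of the concept under simple assumptions (any strictly lower validation loss counts as an improvement), not the actual Keras implementation; `stop_epoch` and the loss values are made up for the example.

```python
def stop_epoch(val_losses, patience):
    """Return the epoch index at which training would stop, or None."""
    best = float('inf')
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best = loss       # improvement: remember it and reset the counter
            wait = 0
        else:
            wait += 1         # no improvement this epoch
            if wait >= patience:
                return epoch
    return None

# Made-up validation losses: the best value is reached at epoch 2.
losses = [1.0, 0.8, 0.7, 0.75, 0.72, 0.71, 0.74]
print(stop_epoch(losses, patience=3))   # 5
print(stop_epoch(losses, patience=8))   # None
```

A larger patience tolerates longer plateaus before giving up, at the cost of more epochs spent without improvement.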

6. Testing the Model

Now let’s test our model by creating a dataset of size 20.


x_test, y_test = create_dataset(num_equations=20)
preds = model.predict(x_test)
full_seq_acc = 0

for i, pred in enumerate(preds):
    pred_str = stripping(devectorize(pred))
    y_test_str = stripping(devectorize(y_test[i]))
    x_test_str = stripping(devectorize(x_test[i]))
    col = 'green' if pred_str == y_test_str else 'red'
    # a prediction only counts if the whole sequence matches
    full_seq_acc += 1/len(preds) * int(pred_str == y_test_str)
    outstring = 'Input: {}, Output: {}, Prediction: {}'.format(x_test_str, y_test_str, pred_str)
    print(colored(outstring, col))

print('\nFull sequence accuracy: {:.3f} %'.format(100 * full_seq_acc))
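Full sequence accuracy is a stricter measure than an accuracy computed character by character: one wrong character makes the whole prediction wrong. The sketch below contrasts the two on a few made-up predictions, padding each string to 5 characters with leading zeroes before comparing positions. The data and the helper `pad` are hypothetical, and the per-character figure is only meant as a rough analogue of a per-time-step accuracy.

```python
def pad(text, width=5):
    return text.rjust(width, '0')

# Made-up targets and predictions for illustration.
targets = ['99', '-18', '12.5']
predictions = ['99', '-19', '12.5']

# Full-sequence accuracy: the whole string must match.
exact = sum(p == t for p, t in zip(predictions, targets)) / len(targets)

# Per-character accuracy: compare the padded strings position by position.
matches = 0
total = 0
for p, t in zip(predictions, targets):
    for pc, tc in zip(pad(p), pad(t)):
        matches += int(pc == tc)
        total += 1
per_char = matches / total

print('full-sequence: {:.3f}, per-character: {:.3f}'.format(exact, per_char))
# full-sequence: 0.667, per-character: 0.933
```

This gap is one reason a model can look strong on a per-character metric during training and still get many complete answers wrong at test time.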

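The script prints each input, the expected output and the model’s prediction, colored green for a match and red for a mismatch, followed by the full sequence accuracy.

The accuracy is a little poor here, but we can improve it by tweaking a few hyperparameters like the number of hidden units, validation split, number of epochs, etc.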


We’ve understood the basic workflow of an RNN, understood that RNNs are best suited for sequential data, generated a dataset of random arithmetic equations, developed a sequential model for predicting the output of a basic arithmetic expression, trained that model with the dataset which we’ve created, and finally tested that model with a small dataset which the model has never seen before.

