Machine learning and deep learning are growing in importance as data sets grow larger. The biggest challenge developers face is building models that remain compatible and scalable as data sets grow in size and dimensionality. TensorFlow is one of the most widely used software libraries for building such models. This article covers the TensorFlow concept, its features, benefits and architecture, and then focuses on batch normalisation in TensorFlow.
What is TensorFlow?
TensorFlow was developed by the Google Brain team for Google's internal use and released publicly in 2015 under the Apache License 2.0. It is an end-to-end open-source software library and platform for numerical computation. It provides flexible and comprehensive tools, community resources and libraries that help researchers build and deploy machine learning applications quickly. It makes machine learning and deep learning more accessible and faster.
TensorFlow runs on multiple CPUs (Central Processing Units) and GPUs (Graphics Processing Units). It carries out general-purpose computing on GPUs with CUDA and SYCL extensions. TensorFlow computations are expressed as stateful dataflow graphs. The name TensorFlow derives from the multidimensional data arrays, referred to as tensors, on which neural networks operate.
Features of TensorFlow
TensorFlow has a flexible architecture that allows easy implementation of machine learning algorithms. Its key features are mentioned as follows:
- It efficiently evaluates mathematical expressions involving multi-dimensional arrays.
- It supports machine learning and deep neural network concepts efficiently.
- The same code can be executed on both GPU and CPU computing architectures.
- It is highly scalable across substantial data sets and machines.
Thus, TensorFlow provides the perfect framework supporting the scalable production of machine intelligence.
What is Batch Normalisation?
Batch normalisation is one of the most important ideas in machine learning and deep learning. It is often referred to as batch norm.
It is a technique for training deep neural networks that standardises the inputs to a layer for every mini-batch, stabilising the learning process and reducing the number of training epochs required.
The mean and standard deviation of every input variable are calculated for each mini-batch. These statistics are then used to standardise the inputs during training.
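As a minimal sketch of this standardisation (using NumPy rather than TensorFlow, purely for illustration; the sample values are made up):

```python
import numpy as np

# A hypothetical mini-batch of 4 examples with 3 input features
batch = np.array([[1.0, 200.0, 0.5],
                  [2.0, 180.0, 0.7],
                  [3.0, 220.0, 0.6],
                  [4.0, 200.0, 0.8]])

mean = batch.mean(axis=0)          # per-feature mean over the mini-batch
std = batch.std(axis=0)            # per-feature standard deviation
standardised = (batch - mean) / std

# Each feature now has (approximately) zero mean and unit variance
print(standardised.mean(axis=0))
print(standardised.std(axis=0))
```

Note that the statistics are computed per feature, over the batch dimension, which is exactly what a batch normalisation layer does for each channel.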
Benefits of Batch Normalisation
- Faster training of a deep neural network:
Training converges in fewer iterations because batch normalisation stabilises the distributions of layer inputs, so the network as a whole trains faster.
- Higher Learning rate:
Without normalisation, gradient descent often requires small learning rates for the network to converge, and gradients shrink as networks get deeper, requiring more iterations. Batch normalisation permits a higher learning rate, which increases training speed.
- Easy weight initialisation:
While creating deep networks, weight initialisation becomes difficult. Batch normalisation reduces the network's sensitivity to the initial weights.
Application of Batch Normalisation
Machine learning and deep neural network training require preprocessing of the input data; normalisation is one such step. It is done to prevent early saturation of non-linear activation functions, to ensure all input variables share the same range of values, and so forth. The data is thus transformed to resemble a normal distribution with zero mean and unit variance.
The activation distributions in the intermediate layers keep changing during training. Because each layer has to adapt to a new distribution at every training step, the whole training process slows down. This phenomenon is known as internal covariate shift. Batch normalisation normalises the input to every layer, which reduces the internal covariate shift.
Some applications of batch normalisation in models and neural networks are:
- Standardising the raw input variables and the outputs of the hidden layers.
- Standardising the input either before or after the activation function of the previous layer.
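The two placements can be sketched as function composition. This is a NumPy illustration, not the TensorFlow API, and `bn` here is a plain standardisation without the learned scale and offset:

```python
import numpy as np

def bn(x, eps=1e-5):
    # Standardise each feature over the batch dimension
    return (x - x.mean(axis=0)) / np.sqrt(x.var(axis=0) + eps)

def relu(x):
    return np.maximum(x, 0.0)

np.random.seed(0)
x = np.random.randn(8, 4)      # a random mini-batch: 8 examples, 4 features

pre_activation = relu(bn(x))   # normalise before the activation
post_activation = bn(relu(x))  # normalise after the activation
```

Which placement works better is an empirical question; both are used in practice.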
TensorFlow Batch Normalisation
TensorFlow’s architecture has three parts:
- Data preprocessing
- Model building
- Model training and estimation
The input is taken as a multidimensional array, referred to as a tensor. A flowchart of the operations to be carried out on the input, referred to as a graph, is constructed. Once the input is entered, it flows through the architecture's operations and comes out as an output in the form of an estimation. Hence the name 'TensorFlow': a tensor enters at one end of the system, flows through numerous operations and comes out at the other end as an output.
Batch normalisation applies a transformation that keeps the mean of the output close to zero and its standard deviation close to one. Batch normalisation works differently during training and inference.
Batch Normalisation During Training:
When the layer is called with the argument training=True, it normalises its output using the mean and variance of the current batch of inputs.
Each channel is normalised independently, and the layer returns:
gamma * (batch − mean(batch)) / sqrt(var(batch) + epsilon) + beta
epsilon = a small constant that avoids division by zero
gamma = a learned scaling factor with an initial value of 1
beta = a learned offset factor with an initial value of 0.
Batch Normalisation During Inference:
The layer normalises its output using a moving average of the mean and variance of the batches seen during training. It returns:
gamma * (batch − moving_mean) / sqrt(moving_var + epsilon) + beta
where moving_mean and moving_var are non-trainable variables. They are updated each time the layer is called in training mode.
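The two modes and the moving-average updates can be sketched in NumPy as follows. The momentum value of 0.99 mirrors the Keras default for BatchNormalization; the input values are made up:

```python
import numpy as np

eps = 1e-3               # small constant to avoid division by zero
momentum = 0.99          # decay for the moving averages (Keras default)
gamma, beta = 1.0, 0.0   # learned scale and offset (initial values)

moving_mean, moving_var = 0.0, 1.0   # non-trainable running statistics

def bn_train(batch):
    """Training mode: normalise with the current batch's statistics
    and update the moving averages as a side effect."""
    global moving_mean, moving_var
    mean, var = batch.mean(), batch.var()
    moving_mean = momentum * moving_mean + (1 - momentum) * mean
    moving_var = momentum * moving_var + (1 - momentum) * var
    return gamma * (batch - mean) / np.sqrt(var + eps) + beta

def bn_infer(batch):
    """Inference mode: normalise with the moving averages instead."""
    return gamma * (batch - moving_mean) / np.sqrt(moving_var + eps) + beta

x = np.array([10.0, 12.0, 14.0, 16.0])
y = bn_train(x)   # normalised with this batch's mean and variance
z = bn_infer(x)   # normalised with the (slowly updated) moving stats
```

Because the moving averages change slowly, inference outputs are deterministic and do not depend on how examples are grouped into batches.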
TensorFlow Batch Normalisation Steps
During training, the batch normalisation layer follows the steps given below:
1. Calculate the mean and variance of the input layers:
The batch mean is calculated by the following formula:
μ = (1/m) Σ xₜ  (summing over the m examples t = 1, …, m in the mini-batch)
The following formula calculates the batch variance:
σ² = (1/m) Σ (xₜ − μ)²
2. Normalise the input using the previously calculated batch statistics:
tf.nn.batch_normalization(x, mean, variance, offset, scale, variance_epsilon, name=None)
This operation normalises a tensor by mean and variance, and applies a scale γ and an offset β to it. Its arguments are:
x = the input Tensor
mean = a mean Tensor
variance = a variance Tensor
offset = an offset Tensor (β)
scale = a scale Tensor (γ)
variance_epsilon = a small float number that avoids division by zero (ε)
name = a name for the operation
The input is normalised using the following formula:
x̂ₜ = (xₜ − μ) / √(σ² + ε)
3. Obtain the layer output by scaling and shifting:
The following formula is used to scale and shift the normalised input:
yₜ = γ · x̂ₜ + β
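The three steps above can be combined into one function. This NumPy sketch mirrors the arguments of tf.nn.batch_normalization described earlier; the sample values are made up:

```python
import numpy as np

def batch_normalization(x, mean, variance, offset, scale, variance_epsilon):
    # Step 2: normalise the input with the batch statistics
    x_hat = (x - mean) / np.sqrt(variance + variance_epsilon)
    # Step 3: scale by gamma (scale) and shift by beta (offset)
    return scale * x_hat + offset

# Step 1: compute the batch mean and variance of a mini-batch
x = np.array([2.0, 4.0, 6.0, 8.0])
mu = x.mean()    # μ = (1/m) Σ xₜ
var = x.var()    # σ² = (1/m) Σ (xₜ − μ)²

y = batch_normalization(x, mu, var, offset=0.0, scale=1.0,
                        variance_epsilon=1e-3)
```

With offset=0 and scale=1, the output has mean 0 and standard deviation very close to 1, as the formulas predict.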
All this math is implemented in TensorFlow by the layer tf.layers.batch_normalization (tf.keras.layers.BatchNormalization in TensorFlow 2).
In short, TensorFlow helps to simplify the machine learning and deep learning processes of acquiring data, training models, serving predictions and refining future results.
Keras is another neural network library, built in Python and as simple to use as TensorFlow. If you are unsure which of the two to choose, the article published by upGrad on their differences can guide you towards the one that suits you.
TensorFlow is a free and open-source library that eases model building for machine learning and deep neural networks. Batch normalisation is implemented to address the internal covariate shift that occurs in each layer during training: in TensorFlow, it normalises the inputs to each layer of a deep neural network.
Batch normalisation follows a few basic steps that need to be applied consistently. It uses the mean and standard deviation of each mini-batch to normalise, scale and shift the activations, and the formulas for these statistics are built into TensorFlow.
If you are curious to learn TensorFlow and master machine learning and AI, boost your career with a Master of Science in ML & AI with IIIT-B & Liverpool John Moores University.