Vectorization and Broadcasting in Python

Last updated:
1st Dec, 2020

Vectorization and Broadcasting are ways to speed up compute time and optimize memory usage while doing mathematical operations with Numpy. These methods are crucial for keeping running time low so that algorithms don’t become bottlenecks, which is necessary for applications to scale. We’ll go over both these techniques and implement some examples.

By the end of this tutorial, you will know the following:

  • How Vectorization is handled by Numpy
  • Time differences with and without Vectorization
  • What Broadcasting is
  • How Broadcasting is different from usual Matrix multiplication

Vectorization

A lot of times we require mathematical operations on arrays, such as array multiplication. A non-vectorized way would be to do element-wise multiplication using a loop. Implemented that way, every single multiplication goes through the Python interpreter separately, which wastes compute resources when the data is large. Let’s take a quick look.

Non vectorized way:

import random

a = [random.randint(1, 100) for _ in range(10000)]
b = [random.randint(1, 100) for _ in range(10000)]
%timeit [i*j for i, j in zip(a,b)]

 

#Output:
>> 1000 loops, best of 3: 658 µs per loop

Vectorized way:

import numpy as np
a = np.array([random.randint(1, 100) for _ in range(10000)])
b = np.array([random.randint(1, 100) for _ in range(10000)])
%timeit a*b

 

#Output:
>>100000 loops, best of 3: 7.25 µs per loop

As we see, the time elapsed went from 658 microseconds to just 7.25 microseconds. This is because a Numpy array created with np.array() stores its elements in a contiguous, typed block of memory, and when we do a*b, Numpy runs the multiplication over the whole array in compiled code instead of executing a Python-level loop. This is what vectorization means here.

Here we use the %timeit magic command (available in IPython and Jupyter) to time the execution. The exact figures will differ on your machine.
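Outside IPython or Jupyter, the %timeit magic is not available. A minimal sketch of the same comparison using the standard-library timeit module is shown below. The variable names and the repeat count of 100 are illustrative choices, not part of the original example, and the timings printed will depend on your machine.

```python
import random
import timeit

import numpy as np

# Same data held both as Python lists and as NumPy arrays
a_list = [random.randint(1, 100) for _ in range(10000)]
b_list = [random.randint(1, 100) for _ in range(10000)]
a_arr = np.array(a_list)
b_arr = np.array(b_list)

# Both approaches must give the same element-wise products
loop_result = [i * j for i, j in zip(a_list, b_list)]
vector_result = a_arr * b_arr
same = np.array_equal(loop_result, vector_result)

# Average time per call, in seconds, over 100 runs of each version
runs = 100
loop_time = timeit.timeit(
    lambda: [i * j for i, j in zip(a_list, b_list)], number=runs
) / runs
vector_time = timeit.timeit(lambda: a_arr * b_arr, number=runs) / runs

print(f"Results match: {same}")
print(f"Loop:       {loop_time * 1e6:.1f} µs per call")
print(f"Vectorized: {vector_time * 1e6:.1f} µs per call")
```

Checking that both results match before timing them guards against comparing two computations that are not actually equivalent.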

Let’s take a look at another example: the outer product of 2 vectors with dimensions (n x 1) and (1 x m). The output will have dimensions (n x m).

import time
import random
import array
import numpy

a = array.array('i', [random.randint(1, 100) for _ in range(100)])
b = array.array('i', [random.randint(1, 100) for _ in range(100)])

T1 = time.process_time()
# The outer product of (n x 1) and (1 x m) has shape (n x m)
c = numpy.zeros((len(a), len(b)))

for i in range(len(a)):
    for j in range(len(b)):
        c[i][j] = a[i] * b[j]

T2 = time.process_time()

print(f"Computation time = {1000*(T2-T1)}ms")

 

#Output:
>> Computation time = 6.819299000000001ms

 

Now, let’s do it with Numpy:

T1 = time.process_time()
c = numpy.outer(a, b)
T2 = time.process_time()

print(f"Computation time = {1000*(T2-T1)}ms")

 

#Output:
>> Computation time = 0.2256630000001536ms

As we see again, Numpy processes the same operation way faster by vectorization.
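Before trusting a speedup, it helps to confirm that the loop and the vectorized call produce the same matrix. The sketch below does that on two small vectors chosen for illustration (the values 1, 2, 3 and 10, 20 are not from the example above), so the (n x m) result is easy to read.

```python
import numpy as np

a = np.array([1, 2, 3])   # n = 3
b = np.array([10, 20])    # m = 2

# Loop version: fill the (n x m) result one element at a time
c_loop = np.zeros((len(a), len(b)), dtype=a.dtype)
for i in range(len(a)):
    for j in range(len(b)):
        c_loop[i, j] = a[i] * b[j]

# Vectorized version: one call computes every product a[i] * b[j]
c_outer = np.outer(a, b)

print(c_outer)
print(np.array_equal(c_loop, c_outer))
```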

Broadcasting

So up until now, we saw examples where the arrays had the same size. What if the sizes of the arrays are different? Here’s where another great Numpy feature, Broadcasting, comes into the picture.

Broadcasting is an extension of vectorization where arrays need not be of the same size for operations like addition, subtraction, multiplication, etc. to be performed on them. Let’s understand this with a very simple example of adding a scalar to an array.

 

a = np.array([1, 1, 1, 1])
a+5

 

#Output:
array([6, 6, 6, 6])

 

As we see, the scalar 5 got added to all the elements. So how did it happen?

To imagine the process, you can think of the scalar 5 as being repeated 4 times to make an array, which is then added to the array a. But keep in mind that Numpy doesn’t actually create any such array, which would only take up memory. Numpy just “broadcasts” the scalar 5, reusing the same value for all 4 positions while adding it to the array a.
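One way to see that no repeated copy is needed is np.broadcast_to, which returns a read-only view stretched to a target shape. This is only a sketch for inspection: the variable name five is illustrative, and a + 5 does not require you to call np.broadcast_to yourself.

```python
import numpy as np

a = np.array([1, 1, 1, 1])

# A read-only view of the scalar 5, stretched to the shape of a
five = np.broadcast_to(5, a.shape)

print(five)          # looks like four copies of 5
print(five.strides)  # a stride of 0 means every position reads the same value in memory
print(a + five)      # same result as a + 5
```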

Let’s take another easy example.

a = np.ones((3,3))
b = np.ones(3)
a+b

 

#Output:
>> array([[2., 2., 2.],
          [2., 2., 2.],
          [2., 2., 2.]])

In the above example, the array b of shape (3,) was treated as shape (1,3) and then broadcasted to (3,3) to match array a.
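Because every value in that example is 1, the direction of the broadcast is hard to see. The sketch below prints the shapes involved and then adds an array with distinct values (the name row and its values are illustrative) so it becomes visible that a 1-D array is repeated along the rows.

```python
import numpy as np

a = np.ones((3, 3))
b = np.ones(3)

print(b.shape)                   # b is 1-D with 3 elements
print(b[np.newaxis, :].shape)    # the shape b is treated as when combined with a
print(np.broadcast(a, b).shape)  # shape of the result of a + b

# Distinct values show that the 1-D array is applied to every row of a
row = np.array([10.0, 20.0, 30.0])
print(a + row)
```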

But does this mean that any array with any dimension can be broadcasted to match an array with any dimension? 

NO!

Broadcasting rules

Numpy follows a set of easy rules to make sure only the arrays following the criteria are broadcasted. Let’s take a look.

The rule of broadcasting says that Numpy compares the shapes of the 2 arrays dimension by dimension, starting from the last (rightmost) dimension. Two dimensions are compatible if they are equal or if either of them is 1.

Let’s see this in action.
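The rule can be written out as a short pure-Python function. This is a sketch for understanding only: broadcast_shape is a hypothetical helper, not a Numpy function, and it assumes the shorter shape is padded on the left with 1s before the dimensions are compared.

```python
def broadcast_shape(shape_a, shape_b):
    """Return the broadcast result shape, or raise ValueError (hypothetical helper)."""
    # Pad the shorter shape on the left with 1s so both have the same length
    ndim = max(len(shape_a), len(shape_b))
    padded_a = (1,) * (ndim - len(shape_a)) + tuple(shape_a)
    padded_b = (1,) * (ndim - len(shape_b)) + tuple(shape_b)

    result = []
    for dim_a, dim_b in zip(padded_a, padded_b):
        if dim_a == dim_b or dim_b == 1:
            result.append(dim_a)   # equal, or b is stretched to match a
        elif dim_a == 1:
            result.append(dim_b)   # a is stretched to match b
        else:
            raise ValueError(
                f"shapes {shape_a} and {shape_b} cannot be broadcast together"
            )
    return tuple(result)


print(broadcast_shape((3, 4, 7), (3, 4, 1)))
print(broadcast_shape((3, 3), (3,)))
```

Since padded dimensions are always 1, comparing the padded shapes left to right gives the same answer as comparing the original shapes from the right.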

Example 1:

Consider below arrays of dimensions:

a = 3 x 4 x 7

b = 3 x 4 x 1

Here b’s last dimension will be broadcasted to match that of a to 7.

Hence, result = 3 x 4 x 7

 

Example 2:

a = 3 x 4 x 7

b =     4 x 1

Now, the number of dimensions of a and b are unequal. In such cases the shape of the array with fewer dimensions is padded with 1s on the left, so b is treated as 1 x 4 x 1.

So, here, b’s first and last dimensions are 1, so they will be broadcasted to match those of a, 3 and 7.

Hence, result = 3 x 4 x 7.

Example 3:

a = 3 x 4 x 1 x 5

b = 3 x 1 x 7 x 1

Here, again, b’s second and last dimensions will be broadcasted to match that of a to 4 and 5. Also, the third dimension of a will be broadcasted to match that of b to 7.

Hence, result = 3 x 4 x 7 x 5

Now let’s see when the condition fails:

Example 4:

a = 3 x 4 x 7 x 5

b = 3 x 3 x 7 x 4

Here, the second and fourth dimensions of b do not match those of a, and neither of them is 1. In this case, Numpy will raise a ValueError:

ValueError: operands could not be broadcast together with shapes (3,4,7,5) (3,3,7,4)

Example 5:

a = 3 x 4 x 1 x 5

b = 3 x 2 x 3

Result: ValueError

Here, b is first padded to 1 x 3 x 2 x 3. The second dimensions (4 and 3) and the last dimensions (5 and 3) don’t match, and none of them is 1.
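The shapes in Examples 1, 3, 4 and 5 can be checked directly in Numpy. The sketch below uses arrays of zeros, since only the shapes matter; result_shape is a hypothetical helper written for this check, not a Numpy function.

```python
import numpy as np


def result_shape(shape_a, shape_b):
    """Add two zero arrays of the given shapes; return the result shape or the error text."""
    try:
        return (np.zeros(shape_a) + np.zeros(shape_b)).shape
    except ValueError as err:
        return f"ValueError: {err}"


ex1 = result_shape((3, 4, 7), (3, 4, 1))        # Example 1
ex3 = result_shape((3, 4, 1, 5), (3, 1, 7, 1))  # Example 3
ex4 = result_shape((3, 4, 7, 5), (3, 3, 7, 4))  # Example 4
ex5 = result_shape((3, 4, 1, 5), (3, 2, 3))     # Example 5

for name, outcome in [("Example 1", ex1), ("Example 3", ex3),
                      ("Example 4", ex4), ("Example 5", ex5)]:
    print(name, "->", outcome)
```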

Before You Go

Vectorization and Broadcasting are both methods by which Numpy makes its processing optimized and more efficient. These concepts should be kept in mind especially when dealing with matrices and n-dimensional arrays, which are very common in image data and Neural Networks.

Rohit Sharma

Blog Author
Rohit Sharma is the Program Director for the UpGrad-IIIT Bangalore, PG Diploma Data Analytics Program.

Frequently Asked Questions (FAQs)

1. What is Vectorization in Python?

Numpy is a Python package that provides standard mathematical functions for fast operations on large arrays of data without explicit Python loops. Vectorization means expressing an operation on whole arrays at once so that the looping happens in Numpy’s compiled code rather than in Python, which reduces the time the code takes to execute. Common vectorized operations include the dot product of two vectors (also called the scalar product because it produces a single number), the outer product (which produces a matrix of dimension n x m for vectors of lengths n and m), and element-wise multiplication (which multiplies the elements that share the same index).

2. What is Broadcasting in Python?

The word broadcasting refers to how Numpy handles arrays with different shapes during arithmetic operations. Subject to certain constraints, the smaller array is broadcast across the larger array so that their shapes are compatible. Broadcasting lets you vectorize array operations so that the looping takes place in C rather than in Python. It accomplishes this without creating unnecessary copies of data, which usually results in efficient algorithm implementations. In certain situations, however, broadcasting is a bad idea because it leads to inefficient use of memory, which slows down processing.

3. What are the uses of NumPy in Python?

NumPy, or Numerical Python, is a free and open-source Python library used in nearly every branch of research and engineering. The NumPy library includes multidimensional array and matrix data structures and offers methods for efficiently operating on the ndarray, a homogeneous n-dimensional array object. Users can use NumPy to execute a wide range of mathematical operations on arrays. It enhances Python with strong data structures that provide efficient computations with arrays and matrices, as well as a large library of high-level mathematical functions that work on these arrays and matrices.
