Python is a multi-paradigm programming language with a structured, object-oriented design approach and a simple, uncluttered syntax, and it is rapidly emerging as the language of choice for programmers working on projects of varying complexity and scale.
Python's libraries offer pre-built algorithms that either accomplish a task on their own or serve as a step toward a larger, more complex goal. Sorting is among the most common of these tasks, and Merge Sort is one of the more popular sorting algorithms.
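As a side remark before the walkthrough: in everyday code, the standard library already covers both sorting and merging. The sketch below uses the built-in sorted() and heapq.merge(), which merges inputs that are already sorted and so corresponds to the merge step of Merge Sort. The variable names are illustrative only.

```python
import heapq

letters = ["N", "H", "V", "B", "Q", "D", "Z", "R"]

# sorted() returns a new sorted list and leaves the input untouched.
sorted_letters = sorted(letters)
print(sorted_letters)

# heapq.merge() expects inputs that are already sorted and yields
# their elements in sorted order.
left = ["B", "H", "N", "V"]
right = ["D", "Q", "R", "Z"]
merged = list(heapq.merge(left, right))
print(merged)
```

Writing the algorithm by hand, as the rest of this article does, is mainly useful for understanding how it works.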
What is Merge Sort?
It is a general-purpose sorting technique that takes a dataset of any comparable type, from any source, and splits it in repeated stages until it is broken down into its individual elements. This recursive technique is commonly referred to as the ‘divide and conquer’ method.
The algorithm then puts the individual elements back together, again in repeated stages, sorting them into a predetermined logical order at each stage through basic comparisons, until the entire data series is reconstituted in the desired order.
Divide and Conquer technique
Take, for instance, a random dataset of letters of the alphabet: N, H, V, B, Q, D, Z, R.
Step 1: The original dataset first gets broken down into two groups as follows:
N, H, V, B | Q, D, Z, R
Step 2: Both the resulting arrays get further sub-divided as follows:
N, H | V, B | Q, D | Z, R
Step 3: Finally, all four arrays are further split up until the entire data series gets broken down into its individual components:
N | H | V | B | Q | D | Z | R
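The three division steps can be reproduced with a short sketch. The helper name split_levels is hypothetical and exists only for illustration. It halves every group repeatedly and records each stage, which shows the splitting alone, not the sorting.

```python
def split_levels(items):
    """Return the successive stages of halving, starting from the whole list."""
    levels = [[list(items)]]
    # Keep halving until every group holds a single element.
    while any(len(group) > 1 for group in levels[-1]):
        next_level = []
        for group in levels[-1]:
            if len(group) > 1:
                middle = len(group) // 2
                next_level.append(group[:middle])
                next_level.append(group[middle:])
            else:
                next_level.append(group)
        levels.append(next_level)
    return levels


letters = ["N", "H", "V", "B", "Q", "D", "Z", "R"]
for level in split_levels(letters):
    # Groups within one stage are separated by " | ".
    print(" | ".join(", ".join(group) for group in level))
```

The first printed line is the original dataset, and the following three lines correspond to Steps 1 to 3.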
The process then reverses, and the individual data points begin to merge stage by stage. During this merging, the elements of each sub-array are compared and reordered so that they fall into a logical sequence (alphabetical order), as follows:
Step 4: Individual elements merge into pairs while swapping positions as required to form the correct sequence:
H, N | B, V | D, Q | R, Z
Step 5: The recursive process of merging and sorting continues to the next iteration:
B, H, N, V | D, Q, R, Z
Step 6: The entire data series is finally reconstituted in its logical alphabetical order:
B, D, H, N, Q, R, V, Z
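The merging stages in Steps 4 to 6 can be traced in the same spirit. This is a minimal sketch, assuming the hypothetical helper names merge and merge_levels. It merges neighbouring groups pairwise and records the result of each stage.

```python
def merge(left, right):
    """Merge two already-sorted lists into one sorted list."""
    result = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            result.append(left[i])
            i += 1
        else:
            result.append(right[j])
            j += 1
    # At most one of these two slices is non-empty.
    result.extend(left[i:])
    result.extend(right[j:])
    return result


def merge_levels(items):
    """Return each stage of pairwise merging, from single elements upward."""
    groups = [[item] for item in items]
    levels = []
    while len(groups) > 1:
        merged = []
        for pos in range(0, len(groups), 2):
            pair = groups[pos:pos + 2]
            # An unpaired group at the end is carried over unchanged.
            merged.append(merge(pair[0], pair[1]) if len(pair) == 2 else pair[0])
        groups = merged
        levels.append(groups)
    return levels


letters = ["N", "H", "V", "B", "Q", "D", "Z", "R"]
for level in merge_levels(letters):
    print(" | ".join(", ".join(group) for group in level))
```

Each printed line corresponds to one of Steps 4, 5 and 6.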
Merge Sort Implementations
There are two approaches to implementing Merge Sort in Python: the top-down approach and the bottom-up approach.
Top-down Approach:
The more commonly used top-down approach is the one described above. Because it is recursive, each split costs a function call and two temporary sub-lists, an overhead that weighs most heavily on small datasets. It is nevertheless dependable, particularly when applied to large datasets.
Input code:
def merge_sort(inp_arr):
    size = len(inp_arr)
    if size > 1:
        # Divide: split the list into two halves and sort each one recursively.
        middle = size // 2
        left_arr = inp_arr[:middle]
        right_arr = inp_arr[middle:]
        merge_sort(left_arr)
        merge_sort(right_arr)
        # i and j traverse the left and right halves, respectively,
        # and k traverses the overall data series.
        i = 0
        j = 0
        k = 0
        left_size = len(left_arr)
        right_size = len(right_arr)
        # Merge: copy the smaller front element back into inp_arr.
        while i < left_size and j < right_size:
            # <= keeps equal elements in their original order.
            if left_arr[i] <= right_arr[j]:
                inp_arr[k] = left_arr[i]
                i += 1
            else:
                inp_arr[k] = right_arr[j]
                j += 1
            k += 1
        # Copy whatever is left over in either half.
        while i < left_size:
            inp_arr[k] = left_arr[i]
            i += 1
            k += 1
        while j < right_size:
            inp_arr[k] = right_arr[j]
            j += 1
            k += 1

inp_arr = ["N", "H", "V", "B", "Q", "D", "Z", "R"]
print("Input Array:\n")
print(inp_arr)
merge_sort(inp_arr)
print("Sorted Array:\n")
print(inp_arr)
Output:
Input Array:

['N', 'H', 'V', 'B', 'Q', 'D', 'Z', 'R']
Sorted Array:

['B', 'D', 'H', 'N', 'Q', 'R', 'V', 'Z']
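One way to gain confidence in a hand-written sort is to compare it with the built-in sorted() on random inputs. The check below is an illustrative sketch, not part of the listing above. It repeats a compact copy of the top-down function so that it runs on its own. The helper name check_against_builtin, the trial count, the seed and the value range are all assumptions chosen for the example.

```python
import random


def merge_sort(inp_arr):
    """Compact in-place top-down merge sort, repeated so the check is self-contained."""
    if len(inp_arr) <= 1:
        return
    middle = len(inp_arr) // 2
    left_arr, right_arr = inp_arr[:middle], inp_arr[middle:]
    merge_sort(left_arr)
    merge_sort(right_arr)
    i = j = k = 0
    while i < len(left_arr) and j < len(right_arr):
        if left_arr[i] <= right_arr[j]:
            inp_arr[k] = left_arr[i]
            i += 1
        else:
            inp_arr[k] = right_arr[j]
            j += 1
        k += 1
    # Copy whatever remains; at most one of the two slices is non-empty.
    inp_arr[k:] = left_arr[i:] + right_arr[j:]


def check_against_builtin(trials=200, seed=0):
    """Return True if merge_sort agrees with sorted() on every random trial."""
    rng = random.Random(seed)
    for _ in range(trials):
        data = [rng.randint(-50, 50) for _ in range(rng.randint(0, 30))]
        expected = sorted(data)
        merge_sort(data)
        if data != expected:
            return False
    return True


print(check_against_builtin())
```

The random lists include empty lists, single elements and duplicates, which are the cases most likely to expose an off-by-one error in the merge loop.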
Bottom-up approach:
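The bottom-up approach is iterative: it skips the recursive splitting and merges sub-lists of width 1, then 2, then 4 and so on until the whole list is sorted. Avoiding recursion removes the function-call overhead, which can make it somewhat quicker and lighter on memory. It is nevertheless the less frequently used of the two.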
Input code:
def merge(left, right):
    result = []
    i, j = 0, 0
    for k in range(0, len(left) + len(right)):
        if i == len(left):  # if at the end of 1st half,
            result.append(right[j])  # add all values of 2nd half
            j += 1
        elif j == len(right):  # if at the end of 2nd half,
            result.append(left[i])  # add all values of 1st half
            i += 1
        elif right[j] < left[i]:
            result.append(right[j])
            j += 1
        else:
            result.append(left[i])
            i += 1
    return result

def mergesort(ar_list):
    length = len(ar_list)
    size = 1
    while size < length:
        size += size  # doubles each pass: 2, 4, 8, ...
        for pos in range(0, length, size):
            start = pos
            mid = pos + size // 2
            end = pos + size
            # Slices that run past the end of the list are simply shorter or empty.
            left = ar_list[start:mid]
            right = ar_list[mid:end]
            ar_list[start:end] = merge(left, right)
    return ar_list

ar_list = ["N", "H", "V", "B", "Q", "D", "Z", "R"]
print(mergesort(ar_list))
Output:
['B', 'D', 'H', 'N', 'Q', 'R', 'V', 'Z']
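To see which slices the bottom-up version merges on each pass, the loop bounds can be listed without doing any sorting. This is a sketch under stated assumptions: merge_schedule is a hypothetical helper. Unlike the listing above, it clamps mid and end to the list length so that the printed bounds are the real slice limits.

```python
def merge_schedule(length):
    """List the (start, mid, end) slice bounds merged on each pass."""
    passes = []
    size = 1
    while size < length:
        size += size  # width of the merged runs after this pass: 2, 4, 8, ...
        bounds = []
        for pos in range(0, length, size):
            start = pos
            mid = min(pos + size // 2, length)
            end = min(pos + size, length)
            bounds.append((start, mid, end))
        passes.append((size, bounds))
    return passes


for size, bounds in merge_schedule(8):
    print(size, bounds)
```

For the eight-letter example this gives three passes, merging runs into widths 2, 4 and 8. With a length that is not a power of two, such as 5, the last group in a pass has an empty right half and is left as it is until a later pass picks it up.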
Merge Sort implementation applied to more complex, real-life datasets
Let’s apply the top-down approach to four random off-road vehicles in India:
Brand | Model | Ex-showroom price in Rs Crore |
Jeep | Wrangler | 0.58 |
Ford | Endeavour | 0.35 |
Jaguar Land Rover | Range Rover Sport | 1.76 |
Mercedes Benz | G-class | 2.42 |
Input code:
class Car:
    def __init__(self, brand, model, price):
        self.brand = brand
        self.model = model
        self.price = price

    def __str__(self):
        return "Brand: {}, Model: {}, Price: {}".format(self.brand, self.model, self.price)

def merge(list1, i, j, k, comp_fun):
    # i and j are the first and last indices of the range; k is its midpoint.
    left_copy = list1[i:k + 1]
    j_sublist = list1[k + 1:j + 1]
    left_copy_index = 0
    j_sublist_index = 0
    sorted_index = i
    # Take the element that comp_fun ranks first until one side runs out.
    while left_copy_index < len(left_copy) and j_sublist_index < len(j_sublist):
        if comp_fun(left_copy[left_copy_index], j_sublist[j_sublist_index]):
            list1[sorted_index] = left_copy[left_copy_index]
            left_copy_index = left_copy_index + 1
        else:
            list1[sorted_index] = j_sublist[j_sublist_index]
            j_sublist_index = j_sublist_index + 1
        sorted_index = sorted_index + 1
    # Copy whatever is left over on either side.
    while left_copy_index < len(left_copy):
        list1[sorted_index] = left_copy[left_copy_index]
        left_copy_index = left_copy_index + 1
        sorted_index = sorted_index + 1
    while j_sublist_index < len(j_sublist):
        list1[sorted_index] = j_sublist[j_sublist_index]
        j_sublist_index = j_sublist_index + 1
        sorted_index = sorted_index + 1

def merge_sort(list1, i, j, comp_fun):
    # Sort list1[i..j] (both ends inclusive) in place.
    if i >= j:
        return
    k = (i + j) // 2
    merge_sort(list1, i, k, comp_fun)
    merge_sort(list1, k + 1, j, comp_fun)
    merge(list1, i, j, k, comp_fun)

car1 = Car("Jeep", "Wrangler", 0.58)
car2 = Car("Ford", "Endeavour", 0.35)
car3 = Car("Jaguar Land Rover", "Range Rover Sport", 1.76)
car4 = Car("Mercedes Benz", "G-class", 2.42)
list1 = [car1, car2, car3, car4]

merge_sort(list1, 0, len(list1) - 1, lambda carA, carB: carA.brand < carB.brand)
print("Cars sorted by brand:")
for car in list1:
    print(car)
print()

merge_sort(list1, 0, len(list1) - 1, lambda carA, carB: carA.price < carB.price)
print("Cars sorted by price:")
for car in list1:
    print(car)
Output:
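Cars sorted by brand:
Brand: Ford, Model: Endeavour, Price: 0.35
Brand: Jaguar Land Rover, Model: Range Rover Sport, Price: 1.76
Brand: Jeep, Model: Wrangler, Price: 0.58
Brand: Mercedes Benz, Model: G-class, Price: 2.42

Cars sorted by price:
Brand: Ford, Model: Endeavour, Price: 0.35
Brand: Jeep, Model: Wrangler, Price: 0.58
Brand: Jaguar Land Rover, Model: Range Rover Sport, Price: 1.76
Brand: Mercedes Benz, Model: G-class, Price: 2.42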
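For comparison, the same two orderings can be obtained from Python's built-in sorted() with a key function in place of a hand-written comparison. This is a sketch of an alternative, not the method used in the listing above. The dataclass version of Car is an assumption made to keep the example short, and the prices are the ones used in the code listing.

```python
from dataclasses import dataclass
from operator import attrgetter


@dataclass
class Car:
    brand: str
    model: str
    price: float  # ex-showroom price in Rs crore, as in the code listing


cars = [
    Car("Jeep", "Wrangler", 0.58),
    Car("Ford", "Endeavour", 0.35),
    Car("Jaguar Land Rover", "Range Rover Sport", 1.76),
    Car("Mercedes Benz", "G-class", 2.42),
]

# attrgetter("brand") builds a function that returns car.brand for each car.
by_brand = sorted(cars, key=attrgetter("brand"))
by_price = sorted(cars, key=attrgetter("price"))

print([car.brand for car in by_brand])
print([car.brand for car in by_price])
```

A key function is called once per element, whereas a comparison function such as the lambdas above is called once per comparison. That is one reason the key style is common in everyday Python code.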
You can learn both theoretical and practical aspects of Python with upGrad’s Professional Certificate in Data Science and Business Analytics from the University of Maryland. This course helps you learn Python from scratch. Even if you are new to programming and coding, upGrad offers a two-week preparatory course so that you can pick up the basics of programming. You will learn about tools such as Python and SQL while working on multiple industry projects.
