Python’s collections module offers a set of container data types that extend the built-in containers: lists, tuples, sets, and dictionaries. These specialized containers keep the features of the built-in ones and add extra methods that come in handy for certain tasks.
By the end of this tutorial, you’ll know:
- What the collections module is
- How its most commonly used container types work:
  - Counter
  - ChainMap
  - deque
  - namedtuple
- Working examples
The collections module is part of Python’s standard library, so there is nothing to pip install. Just import it and you’re ready to go! Let’s go through the most used container types in detail.
Counter
Counter is arguably the most used class in the collections module. It is a subclass of the built-in dict. It counts the number of occurrences of each element in an iterable (such as a string, tuple, or list) and stores the counts in a dictionary. The dictionary keys are the unique elements in the iterable and the values are the counts of those elements.
Let’s try it out with some examples.
    import collections

    Marvel = 'Bad Wolverine bullied poor Iron Man Bad Wolverine poor poor Iron Man'
    Marvel_count = collections.Counter(Marvel.split())
    Marvel_count
    #Output: Counter({'poor': 3, 'Bad': 2, 'Wolverine': 2, 'Iron': 2, 'Man': 2, 'bullied': 1})
As we see, it counted the occurrences of every word and put them in a dictionary. Counter works on any iterable whose elements are hashable. Now let’s look at some of its methods.
    Marvel_count['Bad'] #>> 2
    Marvel_count.values() #>> dict_values([2, 2, 1, 3, 2, 2])
    Marvel_count.keys() #>> dict_keys(['Bad', 'Wolverine', 'bullied', 'poor', 'Iron', 'Man'])
The most_common(n) method returns a list of (element, count) tuples for the n most common elements, arranged in descending order of count.
    Marvel_count.most_common(2) #>> [('poor', 3), ('Bad', 2)]
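As a small supplementary sketch (the sample data and variable names here are made up for illustration), the following shows two Counter behaviours that are easy to miss: looking up a key that was never counted gives 0 instead of raising a KeyError, and update() adds to the existing counts rather than replacing them.

```python
from collections import Counter

# Hypothetical sample data, not taken from the example above.
letters = Counter("mississippi")          # a string is an iterable of characters
votes = Counter(["red", "blue", "red"])   # a list works the same way

# A missing key returns 0 and is not added to the Counter.
missing = votes["green"]
print(missing)                 #>> 0
print("green" in votes)        #>> False

# update() adds counts on top of what is already there.
votes.update(["blue", "blue"])
print(votes["red"], votes["blue"])   #>> 2 3

print(letters["s"])            #>> 4
```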
ChainMap
ChainMap groups several dictionaries into a single view, so that they can be read and updated through one object, the ChainMap itself. Keep in mind that a ChainMap holds only references to the underlying dictionaries: changes made to those dictionaries are visible through the ChainMap, and any write, update, or deletion made through the ChainMap is applied to the first dictionary in the chain.
ChainMap offers the same mapping interface as a dictionary (it is a MutableMapping rather than a subclass of dict), so the usual dictionary methods are supported, plus a few extras that we’ll be going over.
    dic1 = {'a': 1, 'b': 2}
    dic2 = {'b': 3, 'c': 4}
    dic3 = {'b': 9, 'd': 4}
    chain1 = collections.ChainMap(dic2, dic1)
    chain1
In the above code, we define three dictionaries and put two of them, dic2 and dic1, in a ChainMap object. The third, dic3, is used later on.
    #Output: ChainMap({'b': 3, 'c': 4}, {'a': 1, 'b': 2})
As we see, dic2 is ‘chained’ with dic1 in this very order. In essence, you can imagine dic2 being connected to dic1 like dic2–>dic1. A lookup searches the first mapping, dic2, and moves on to the next mapping only if the key is not found there.
Therefore, the order of the mappings in the ChainMap determines which one is searched first. Let’s see that in action.
    chain1['b'] #>> 3
The key ‘b’ is present in both dictionaries. The lookup finds it in the first mapping, dic2, and returns that value without consulting dic1.
maps attribute
The maps attribute of a ChainMap is the list of its mappings in search order, i.e., dic2 is first in the list, so it is searched first, and so on.
    chain1.maps #>> [{'b': 3, 'c': 4}, {'a': 1, 'b': 2}]
Similarly, we can check for keys and values:
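    list(chain1.keys()) #>> ['a', 'b', 'c']
    list(chain1.values()) #>> [1, 3, 4]
As we see, each key is listed only once, and its value comes from the first mapping that contains it. The order shown is the one produced by Python 3.7 and later; older versions may list the keys in a different order.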
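To make the lookup order and the reference behaviour concrete, here is a minimal sketch using two hypothetical settings dictionaries (defaults and user_prefs are illustrative names, not part of the tutorial’s example). It shows a lookup falling through to the second mapping, a change to an underlying dictionary showing up in the ChainMap, and a write through the ChainMap landing in the first mapping only.

```python
from collections import ChainMap

# Hypothetical settings dictionaries, named for illustration only.
defaults = {"theme": "light", "font_size": 12}
user_prefs = {"theme": "dark"}

settings = ChainMap(user_prefs, defaults)   # user_prefs is searched first

print(settings["theme"])       #>> dark  (found in user_prefs)
print(settings["font_size"])   #>> 12    (falls through to defaults)

# The ChainMap holds references, so editing a source dict is visible here.
defaults["font_size"] = 14
print(settings["font_size"])   #>> 14

# Writing through the ChainMap only touches the first mapping.
settings["font_size"] = 16
print(user_prefs)   #>> {'theme': 'dark', 'font_size': 16}
print(defaults)     #>> {'theme': 'light', 'font_size': 14}
```

This search-then-fall-back pattern is one common reason to reach for ChainMap: layered configuration where the more specific source shadows the more general one.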
new_child(m=None)
The new_child() method is used to put a new map in front of the existing ones. It does not modify the ChainMap it is called on: it returns a new ChainMap with the new map as the first map, followed by the rest of the maps. If m is specified, it becomes the first map; otherwise an empty dictionary is added as the first map.
    chain1 = chain1.new_child(dic3)
    chain1.maps
    #Output: [{'b': 9, 'd': 4}, {'b': 3, 'c': 4}, {'a': 1, 'b': 2}]
As we see, it placed dic3 at the beginning and returned a new ChainMap object, which we assigned back to chain1.
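The following sketch, with made-up variable names, illustrates that new_child() leaves the original ChainMap untouched and returns a separate object. It also uses the parents property, which gives a ChainMap containing every map except the first.

```python
from collections import ChainMap

base = ChainMap({"x": 1, "y": 2})     # hypothetical outer scope
child = base.new_child({"x": 100})    # new ChainMap with an extra first map

print(child["x"])    #>> 100  (found in the newly added map)
print(child["y"])    #>> 2    (found further down the chain)
print(base["x"])     #>> 1    (the original ChainMap is unchanged)
print(len(base.maps), len(child.maps))   #>> 1 2

# Calling new_child() without an argument puts an empty dict in front.
scratch = base.new_child()
print(scratch.maps[0])   #>> {}

# parents drops the first map again.
print(child.parents == base)   #>> True
```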
reversed
You might be wondering how you can change the search order of a ChainMap. One way is the built-in reversed function, which returns an iterator over the maps in reverse order. Let’s see this in action.
The key ‘b’ is now in all the maps. The first map in the ChainMap has key ‘b’ with the value 9.
    chain1['b'] #>> 9
Let’s see what happens once we reverse the order of the maps.
    chain1.maps = list(reversed(chain1.maps))
    chain1['b'] #>> 2
Keep in mind that the reversed function doesn’t reverse the list in place; it only returns a reversed iterator. That is why the result is wrapped in list() before being assigned back to maps: an iterator can be walked only once, while maps has to be searched again on every lookup.
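If the original search order should stay available, one option is to build a second ChainMap from the reversed maps instead of reassigning maps. This is a sketch with illustrative dictionaries, not the tutorial’s chain1.

```python
from collections import ChainMap

first = {"b": 9}
second = {"b": 3}
third = {"b": 2}
chain = ChainMap(first, second, third)

# reversed() only yields the maps in reverse order; chain.maps is not modified.
flipped = ChainMap(*reversed(chain.maps))

print(chain["b"])     #>> 9
print(flipped["b"])   #>> 2
print(flipped["b"])   #>> 2  (repeated lookups keep working)
print(chain.maps)     #>> [{'b': 9}, {'b': 3}, {'b': 2}]
```

Both ChainMaps still reference the same three dictionaries, so a change to any of them is visible through either view.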
Deque
Deque (pronounced ‘deck’) is a list-like container built for fast access at both ends. The name stands for double-ended queue: we can append and pop elements efficiently on either end, unlike a list, which is only efficient at its right end because inserting or removing at the left shifts every other element.
deque(iterable, maxlen) takes an optional iterable and returns a deque object. The optional maxlen parameter sets the upper limit on the number of elements. If it is not specified, the deque can grow to any length. Let’s take a look at its methods.
    deq = collections.deque([1, 2, 3, 4, 5], maxlen=6)
    deq.appendleft(8)
    deq
    #Output: deque([8, 1, 2, 3, 4, 5], maxlen=6)
As we see, calling the appendleft method appended the element on the left end. The deque has now reached the maxlen of 6 that we initialized it with. Appending another element will not raise an error; instead, an element is silently discarded from the opposite end to make room.
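To see what a bounded deque does once it is full, here is a minimal sketch on a fresh, hypothetical deque named recent. Appending on one side drops an item from the other side, while insert() is the one method that refuses to work on a full bounded deque and raises an IndexError.

```python
from collections import deque

recent = deque([1, 2, 3], maxlen=3)   # already full

recent.append(4)        # adding on the right drops 1 from the left
print(recent)           #>> deque([2, 3, 4], maxlen=3)

recent.appendleft(0)    # adding on the left drops 4 from the right
print(recent)           #>> deque([0, 2, 3], maxlen=3)

# insert() on a full bounded deque raises IndexError instead of discarding.
try:
    recent.insert(1, 99)
except IndexError as exc:
    print("IndexError:", exc)
```

The discard-on-append behaviour is what makes a bounded deque a convenient way to keep only the last few items of a stream.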
Now let’s remove the leftmost element using popleft, which returns the removed element:
    deq.popleft() #>> 8
We can also remove the first occurrence of a specific value using remove. It modifies the deque in place and returns None:
    deq.remove(5)
    deq #>> deque([1, 2, 3, 4], maxlen=6)
Note: calling the remove method with an element that is not in the deque raises a ValueError.
We can insert any element at the specified index using insert(index, element).
    deq.insert(2, 7)
    deq #>> deque([1, 2, 7, 3, 4], maxlen=6)
A deque can be reversed in place by calling the reverse method.
    deq.reverse()
    deq #>> deque([4, 3, 7, 2, 1], maxlen=6)
A deque can also be rotated to the right (positive argument) or to the left (negative argument) using the rotate method.
    # Rotate right by 2
    deq.rotate(2)
    deq #>> deque([2, 1, 4, 3, 7], maxlen=6)
    # Rotate left by 2
    deq.rotate(-2)
    deq #>> deque([4, 3, 7, 2, 1], maxlen=6)
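One way to picture rotate is as repeated pop-and-push. The sketch below, using two throwaway deques d and e, shows that rotating one step to the right matches popping from the right end and pushing the element onto the left end, and that a negative argument undoes it.

```python
from collections import deque

d = deque([1, 2, 3, 4, 5])
d.rotate(1)                  # one step to the right
print(d)                     #>> deque([5, 1, 2, 3, 4])

# The same step done by hand: pop from the right, push onto the left.
e = deque([1, 2, 3, 4, 5])
e.appendleft(e.pop())
print(e == d)                #>> True

d.rotate(-1)                 # one step to the left restores the original order
print(d)                     #>> deque([1, 2, 3, 4, 5])
```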
Named Tuple
namedtuple() is a factory function that creates tuple subclasses with named fields, a great uplift of the usual tuple object in Python. Named tuples allow us to access elements by name as well as by position. You can think of a named tuple type as a table, with the type name as the table name and the field names as the column names. A named tuple essentially assigns meaning to each element for easier access and more readable code.
Let’s take some examples and understand how it works.
    Performance = collections.namedtuple('Employee_Rating', ['Q1', 'Q2', 'Q3', 'Q4'])
In the above code, we created a named tuple class called “Employee_Rating” and bound it to the variable Performance. Its fields “Q1”, “Q2”, “Q3” and “Q4” will store the quarterly ratings of the employees. Let’s make 2 named tuple entries of Employee_Rating.
    rahul = Performance(3, 4, 3.5, 4.5)
    ankit = Performance(4, 4.5, 4, 4.5)
    ankit #>> Employee_Rating(Q1=4, Q2=4.5, Q3=4, Q4=4.5)
    rahul #>> Employee_Rating(Q1=3, Q2=4, Q3=3.5, Q4=4.5)
Now that we have created 2 entries, we can access their values by field name.
    ankit.Q1 #>> 4
    ankit.Q3 > rahul.Q3 #>> True
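Because a named tuple is still a tuple, access by name sits alongside everything a plain tuple can do. The sketch below uses a hypothetical Point type (not part of the tutorial’s example) to show name access, index access, and unpacking side by side.

```python
from collections import namedtuple

# Hypothetical record type for illustration.
Point = namedtuple("Point", ["x", "y"])
p = Point(3, 4)

print(p.x, p[0])             #>> 3 3  (name and position reach the same element)

x, y = p                     # unpacks like any other tuple
print(x + y)                 #>> 7

print(isinstance(p, tuple))  #>> True
print(Point._fields)         #>> ('x', 'y')
```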
To create a new named tuple instance from an existing list or other iterable, we can use the _make() class method.
    Milkha = Performance._make([4, 5, 5, 4.5])
    Milkha
    #Output: Employee_Rating(Q1=4, Q2=5, Q3=5, Q4=4.5)
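Named tuples are immutable, so their elements cannot be edited in place. Instead, the _replace method returns a new named tuple with the given fields changed and leaves the original untouched.
    rahul._replace(Q1=2)
    #Output: Employee_Rating(Q1=2, Q2=4, Q3=3.5, Q4=4.5)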
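Here is a short sketch of what immutability means in practice, using a hypothetical two-field Rating type. Assigning to a field fails, _replace hands back a new instance, and _asdict converts an instance into a mapping of field names to values.

```python
from collections import namedtuple

Rating = namedtuple("Rating", ["Q1", "Q2"])   # hypothetical two-field type
original = Rating(3, 4)

# Direct assignment is rejected because tuples are immutable.
try:
    original.Q1 = 2
except AttributeError:
    print("cannot assign to a field")

updated = original._replace(Q1=2)   # returns a new instance
print(original)   #>> Rating(Q1=3, Q2=4)
print(updated)    #>> Rating(Q1=2, Q2=4)

# _asdict() maps field names to values; dict() normalizes the return type.
print(dict(updated._asdict()))   #>> {'Q1': 2, 'Q2': 4}
```

To keep a changed value, assign the result back, for example original = original._replace(Q1=2).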
Before you go
The collections module has a few more useful classes, such as OrderedDict, defaultdict, UserList, UserString, and UserDict. Make sure you get some hands-on practice with the container types we discussed in this tutorial. They not only make your life easier, but also improve the quality of the code you write.
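As a small taste of one of those, here is a minimal defaultdict sketch on made-up sample data. A defaultdict calls the factory it was given (list here) to create a value the first time a missing key is accessed, which removes the need to check for the key before appending.

```python
from collections import defaultdict

# Sample data for illustration only.
words = ["apple", "avocado", "banana", "blueberry", "cherry"]

by_letter = defaultdict(list)   # missing keys start out as an empty list
for word in words:
    by_letter[word[0]].append(word)

print(dict(by_letter))
#>> {'a': ['apple', 'avocado'], 'b': ['banana', 'blueberry'], 'c': ['cherry']}
```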