What is Hashing in Data Structure? Explore Hashing Techniques, Benefits, Limitations, and More
By Rohan Vats
Updated on Dec 02, 2025 | 28 min read | 187.08K+ views
Hashing in data structure maps data to fixed-size values using a hash function, allowing fast storage and quick access. You see it in search operations, indexing, caching, and secure verification systems. It cuts lookup time, avoids full scans, and keeps data organized through direct addressing. This approach powers databases, compilers, file systems, and many real-world applications where fast retrieval is essential.
In this guide, you’ll read more about the basics, hashing techniques, collision handling methods, benefits, limitations, and common use cases.
Think of a library with thousands of books. If you had to search shelf by shelf, it would take a long time. But if every book had a unique code that pointed you directly to its exact spot, you could find it in seconds.
That is the basic idea behind hashing in data structure. It gives every piece of data a special value that tells the system where it should be stored or found.
Hashing in DSA is a method of converting input data into a fixed-size value called a hash value. A hash function performs this conversion. When you store something, the hash value decides the location. When you search for it later, the same hash value helps you jump straight to the right spot.
The core flow is simple: a key goes into the hash function, the function produces a hash code, and that code maps to an index in the hash table where the value is stored.
When we define hashing in data structures, it is also important to know about the core components.
Let’s explore the components of hashing in detail now.
1. Key
A key is the piece of data you want to store and retrieve. It can be a number, text, or any unique identifier. Hashing in DSA uses this key as the starting point for generating a hash code. The system applies a hash function to the key so it can decide exactly where the data should be placed.
When you use meaningful keys, locating or updating entries becomes much easier. A key must remain consistent across operations so that the same input always produces the same code.
2. Hash Function
The hash function converts each key into a numeric output called a hash code. Hashing in Data Structures relies on this step because the hash code decides where the data will be stored in the table. A well-designed hash function distributes keys evenly across the hash table and reduces collisions.
Simple versions might use modulo arithmetic (key % tableSize), while advanced ones use more elaborate math or cryptographic functions.
A good hashing function in data structure should do the following things:
- Produce the same hash code every time it sees the same key (deterministic behavior).
- Spread keys evenly across the table to minimize collisions.
- Compute quickly, since it runs on every insertion, search, and deletion.
- Use the full range of available indexes instead of clustering values in a few slots.
3. Hash Table
A hash table is an array or similar structure that reserves spots for each hash code. The table’s size depends on factors like expected data volume and desired efficiency. Each index in the table points to a location where data is kept.
When a new key arrives, the hash function determines which slot the data should occupy. If a collision occurs, the system uses hashing collision resolution techniques — like chaining or open addressing — to store items without discarding anything.
You can also check out upGrad’s free tutorial, Data Structures and Algorithms. Explore categories and algorithms. Master all the key operations in data handling and decipher DSA like never before.
Also Read: Hash Tables and Hash Maps in Python
Each component in hashing works together to keep data access quick and predictable. The key represents the data you want to store. The hash function turns that key into a code that points to a specific place. The hash table stores the value so the system can reach it instantly. When you define hashing in data structure, this is the core idea: convert a key into a location and reach it without scanning everything.
This setup supports fast insertion, retrieval, and deletion. It works well in large databases, indexing systems, caching, and any situation where quick access to information is important.
Here's how hashing in DSA works, step by step:
1. Take the key you want to store or look up.
2. Pass the key through the hash function to get a hash code.
3. Map the hash code to an index in the hash table, commonly with a modulo operation.
4. Store the value at that index, or read it from there during a search.
5. If two keys land on the same index, apply a collision resolution technique.
Let's look at two examples that illustrate how hashing in data structure works.
Example 1: Numeric Hashing
Suppose you have three numbers: 14, 19, and 23, along with a simple hash function h(key) = key mod 5.
The table below shows how each key maps to a slot in a hash table of size 5 (indexes 0 through 4):
| Key | h(key) = key mod 5 | Resulting Index |
|-----|--------------------|-----------------|
| 14  | 14 % 5 = 4         | 4               |
| 19  | 19 % 5 = 4         | 4 (collision)   |
| 23  | 23 % 5 = 3         | 3               |
You place 14 at index 4, but 19 also maps to index 4, which causes a collision. The last key, 23, goes to index 3 with no issue. Collisions get handled through methods like chaining or open addressing, which you’ll learn more about later.
Example 2: String Hashing
When hashing a string like "CAT," you might add the ASCII values of each character (C=67, A=65, T=84) for a total of 216. If you apply 216 % 10, you get 6 as the final index.
Below is a table for three strings:
| String | Sum of ASCII Values | Computation  | Final Index |
|--------|---------------------|--------------|-------------|
| CAT    | 67 + 65 + 84 = 216  | 216 % 10 = 6 | 6           |
| DOG    | 68 + 79 + 71 = 218  | 218 % 10 = 8 | 8           |
| KEY    | 75 + 69 + 89 = 233  | 233 % 10 = 3 | 3           |
Each string generates an index in the table, although collisions can still occur if different strings produce the same final value. The main advantage is that hashing can point you directly to each item's spot instead of forcing you to check every element in the table.
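Here is a minimal Python sketch of this string-hashing scheme, assuming a table size of 10 as in the table above (the helper name string_hash is just illustrative):
def string_hash(s, table_size=10):
    # Sum the ASCII values of the characters, then take the remainder
    # by the table size to get an index (the same scheme as the table above).
    return sum(ord(ch) for ch in s) % table_size

for word in ["CAT", "DOG", "KEY"]:
    print(word, "->", string_hash(word))
# Prints: CAT -> 6, DOG -> 8, KEY -> 3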
Also Read: Sorting in Data Structure: Categories & Types [With Examples]
Hash functions follow different approaches, or hashing techniques, to turn keys into compact codes. While all hashing techniques in a data structure aim to minimize collisions, they handle numeric or string inputs in unique ways.
Below is a closer look at widely used hashing techniques, complete with examples to show how keys transform into hash codes.
The division method is one of the simplest types of hashing in data structure. You take a key and divide it by the size of the hash table, then use the remainder as the index. This approach is quick because it involves a single division operation, but picking the right table size is important to reduce collisions.
How Does It Work?
- Divide the key by the table size.
- Use the remainder (key % tableSize) as the index.
- If two keys share a remainder, resolve the collision with chaining or probing.
Let’s understand this through an example: Suppose you have keys [13, 27, 18, 42] and a hash table of size 10:
| Key | Computation | Remainder | Index |
|-----|-------------|-----------|-------|
| 13  | 13 ÷ 10     | 3         | 3     |
| 27  | 27 ÷ 10     | 7         | 7     |
| 18  | 18 ÷ 10     | 8         | 8     |
| 42  | 42 ÷ 10     | 2         | 2     |
You place each key at its remainder index. Different keys can still produce the same remainder, so you need a plan for handling collisions (for example, chaining or open addressing).
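As a rough sketch, the same division-method example looks like this in Python (division_hash is a hypothetical helper, and the table size of 10 comes from the example above):
def division_hash(key, table_size):
    # Division method: the remainder of key / table_size is the index.
    return key % table_size

table = [None] * 10
for key in [13, 27, 18, 42]:
    index = division_hash(key, len(table))
    table[index] = key              # none of these keys collide
print(table)  # [None, None, 42, 13, None, None, None, 27, 18, None]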
The mid-square method squares the key and extracts the middle digits to form the index. This hashing technique can work well if the squared values vary enough to reduce collisions.
How Does It Work?
- Square the key.
- Extract the middle digits of the squared value.
- Use those digits (reduced by the table size if needed) as the index.
Example: Use a table size of 100 for clarity. Take the key 56: 56² = 3136, and the two middle digits are 13, so the key maps to index 13.
If the key is 123, then 123² = 15129, and taking the two digits around the middle gives 51, so the index is 51. Exactly which digits count as the "middle" is a convention you fix in advance.
This approach taps into more of the key's digits than basic division and usually spreads values better, provided the squared numbers vary enough. If certain digit patterns repeat, collisions can still occur.
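A small Python sketch of the mid-square idea, assuming we keep the two digits around the middle of the square and a table of size 100 (mid_square_hash is an illustrative name, not a standard function):
def mid_square_hash(key, table_size=100):
    # Square the key, then pull out the two digits around the middle of the square.
    squared = str(key * key)
    mid = len(squared) // 2
    middle_digits = squared[mid - 1:mid + 1]
    return int(middle_digits) % table_size

print(mid_square_hash(56))   # 56^2 = 3136 -> middle digits "13" -> index 13
print(mid_square_hash(123))  # 123^2 = 15129 -> middle digits "51" -> index 51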
In folding, you divide the key into equal parts and add those parts together to get the final index. The last segment may be shorter if the key length doesn’t divide evenly. This method works well when keys have multiple segments, like phone numbers or identification codes. In hashing in data structure, folding is a simple way to break down large keys and convert them into manageable values for quick storage and lookup.
How Does It Work?
- Split the key into equal-sized parts (the last part may be shorter).
- Add the parts together.
- Reduce the sum by the table size to get the index.
Example: Take a phone-number-like key, such as 123456, and split it into two-digit segments: 12, 34, 56. Adding them gives 12 + 34 + 56 = 102; with a table of size 10, the index is 102 % 10 = 2.
If the key has an odd number of digits, the last chunk is shorter, but you still add it to the sum.
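Here's a rough Python sketch of folding, assuming two-digit chunks and a table of size 10 (fold_hash is a made-up helper name):
def fold_hash(key, chunk_size=2, table_size=10):
    digits = str(key)
    # Split the key into fixed-size chunks; the last chunk may be shorter.
    parts = [int(digits[i:i + chunk_size]) for i in range(0, len(digits), chunk_size)]
    return sum(parts) % table_size

print(fold_hash(123456))   # 12 + 34 + 56 = 102 -> 102 % 10 = 2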
The multiplication method multiplies the key by a constant between 0 and 1, then uses the fractional part of the result to compute the index. This approach aims to spread keys more evenly than simple division. In hashing in data structure, this method is popular because it reduces patterns that can cluster keys in the same area.
How Does It Work?
- Multiply the key by a constant A between 0 and 1.
- Keep only the fractional part of the product.
- Multiply that fraction by the table size and take the floor to get the index.
Example
Key = 45, A = 0.7, table size = 10
45 × 0.7 = 31.5
Fractional part = 0.5
Index = 0.5 × 10 = 5
The multiplication method can still produce collisions, but it generally distributes data more smoothly than basic division.
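A minimal Python version of the multiplication method, reusing the numbers from the example above. Fraction is used so that 45 × 0.7 stays exactly 31.5; with plain floats, the rounding error in 0.7 can nudge the floor down by one.
import math
from fractions import Fraction

def multiplication_hash(key, table_size=10, A=Fraction(7, 10)):
    # Keep only the fractional part of key * A, then scale by the table size.
    fractional = (key * A) % 1
    return math.floor(fractional * table_size)

print(multiplication_hash(45))   # 45 * 0.7 = 31.5 -> fractional part 0.5 -> index 5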
Universal hashing is one of those hashing techniques in data structure that picks a random function from a set of possible options. This makes it harder for any adversary or pattern to consistently cause collisions. It can be more secure for certain applications, though it can add complexity.
How Does It Work?
- Keep a family of hash functions rather than one fixed function.
- Pick one member of the family at random when the table is created.
- Because the choice is random, no fixed set of keys can reliably force collisions.
Example: A common family has the form h(k) = ((a·k + b) mod p) mod m, where p is a prime larger than any key, m is the table size, and a and b are picked at random. For instance, one run might use h(k) = ((3k + 7) mod 97) mod 10 and another h(k) = ((5k + 2) mod 97) mod 10.
We pick one function at run time. Since it's random, a worst-case scenario for one function may not apply to the next one.
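Here is a sketch of that family in Python; make_universal_hash is an illustrative helper, and the prime p is just one reasonable choice.
import random

def make_universal_hash(table_size, p=10**9 + 7):
    # Pick a random member of the family h(k) = ((a*k + b) mod p) mod m.
    a = random.randint(1, p - 1)
    b = random.randint(0, p - 1)
    return lambda key: ((a * key + b) % p) % table_size

h = make_universal_hash(table_size=10)
print([h(k) for k in [13, 27, 18, 42]])   # the indices change from run to run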
Cryptographic hashing techniques are commonly used for passwords, digital checks, and data verification, but they can also support hashing techniques in data structures when stronger collision resistance is required. These functions create fixed outputs that are extremely hard to reverse, making them useful when accuracy and integrity matter.
How Does It Work?
- Feed the key into a cryptographic hash function such as SHA-256.
- The function returns a fixed-length digest that is practically impossible to reverse.
- If you need a table index, reduce the digest by the table size.
Example
If you apply SHA-256 to the word “apple,” you get a long hexadecimal output of 64 characters. This method costs more computing power than simple arithmetic-based approaches, but it provides stronger protection and very stable distribution.
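For instance, Python's hashlib module exposes SHA-256 directly; reducing the digest modulo the table size (an assumed size of 10 here) turns it into an index:
import hashlib

digest = hashlib.sha256("apple".encode()).hexdigest()
print(len(digest))   # 64 hexadecimal characters

# To use the digest as a table index, reduce it by the table size.
table_size = 10
print(int(digest, 16) % table_size)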
Also Read: What is MD5 Algorithm? How Does it Work?
Many programming languages offer built-in structures that use a hashing function in data structure operations to store and retrieve values quickly. These structures take a key, compute a hash code internally, and place the value in a table without you handling the process manually. This is one of the reasons hashing in DSA is widely used in real projects.
Below are quick Python, Java, and C++ examples that show how each language manages key–value pairs.
Python Hash Method
Python’s built-in dictionary uses a hash function on each key.
Below is a short code snippet that stores and retrieves data.
def demo_python_dict():
    # Create a dictionary
    phonebook = {
        "Asha": 12345,
        "Rahul": 67890
    }
    # Insert a new key-value pair
    phonebook["Meera"] = 55555
    # Retrieve values
    print("Asha's number:", phonebook["Asha"])
    print("Rahul's number:", phonebook["Rahul"])
    print("Meera's number:", phonebook["Meera"])

if __name__ == "__main__":
    demo_python_dict()
Output:
Asha's number: 12345
Rahul's number: 67890
Meera's number: 55555
Python's internal hash function generates an index for each name. When you call phonebook["Asha"], Python uses that hash code to jump directly to the stored value. This gives you constant average-time access without scanning the entire table.
Java Hash Method
Java provides the HashMap class, which maps keys to values using hashing. It automatically resizes the table when the load factor becomes high and handles collisions by linking entries in buckets.
import java.util.HashMap;

public class DemoHashMap {
    public static void main(String[] args) {
        HashMap<String, Integer> phonebook = new HashMap<>();
        // Insert entries
        phonebook.put("Asha", 12345);
        phonebook.put("Rahul", 67890);
        // Insert a new one
        phonebook.put("Meera", 55555);
        // Retrieve values
        System.out.println("Asha's number: " + phonebook.get("Asha"));
        System.out.println("Rahul's number: " + phonebook.get("Rahul"));
        System.out.println("Meera's number: " + phonebook.get("Meera"));
    }
}
Output:
Asha's number: 12345
Rahul's number: 67890
Meera's number: 55555
When you add a key, HashMap turns that key into a hash code and places the data in the corresponding bucket. If two keys end up in the same bucket, Java uses a linked list or tree to differentiate them internally.
Also Read: What is Hashtable in Java? Explained with Examples
C++ Hash Method
C++ provides an unordered_map that relies on a hashing function in data structure operations. When you insert a key, it generates a hash code and finds the right bucket to place the value. Collisions are usually handled through chaining, allowing multiple entries to occupy the same bucket safely.
#include <iostream>
#include <unordered_map>
#include <string>

int main() {
    std::unordered_map<std::string, int> phonebook;
    // Insert entries
    phonebook["Asha"] = 12345;
    phonebook["Rahul"] = 67890;
    // Insert a new one
    phonebook["Meera"] = 55555;
    // Retrieve values
    std::cout << "Asha's number: " << phonebook["Asha"] << std::endl;
    std::cout << "Rahul's number: " << phonebook["Rahul"] << std::endl;
    std::cout << "Meera's number: " << phonebook["Meera"] << std::endl;
    return 0;
}
Output:
Asha's number: 12345
Rahul's number: 67890
Meera's number: 55555
unordered_map calculates a hash code for each name. When you call phonebook["Asha"], it looks up the code and jumps to the correct spot. This method keeps data retrieval at a constant average time.
Also Read: Types of Data Structures in Python That Every Coder Should Know!
Hashing makes large-scale lookups feel instant. You can store thousands or even millions of records and still reach the item you want in constant average time. Many data-driven systems rely on this hashing technique for real-time performance, whether it’s checking passwords or searching entries in a phone directory.
The idea is straightforward: convert each key into a numeric code, then use that code to jump directly to the data’s location. You skip repetitive scans through entire lists, which boosts speed dramatically. By cutting down the overhead of sequential searches, hashing keeps data operations efficient, even as your collection expands.
Below are some key reasons why hashing offers strong benefits for data retrieval:
- Constant average-time lookup, insertion, and deletion, even as the dataset grows.
- No need to scan the whole collection; the hash code points straight to the right slot.
- Predictable performance when the hash function spreads keys evenly and the load factor stays balanced.
- A natural fit for indexing, caching, and membership checks where speed matters most.
Collisions happen when two different keys map to the same index in a hash table. In hashing in data structure, you work with a limited number of slots, but the space of possible keys is usually much larger. Because of this mismatch, two inputs can produce the same hash code, leading both items to compete for the same position. This is normal, since no hashing function in data structure can guarantee a unique output for every possible key.
A strong hash function, a suitable table size, and balanced load factors reduce collisions, but they cannot eliminate them fully. That’s why hashing techniques in data structure focus heavily on how to manage collisions safely without losing data.
Common reasons collisions occur:
- The set of possible keys is far larger than the number of slots in the table.
- The hash function clusters keys instead of spreading them evenly.
- The table is too small, or the load factor has crept too high.
- Incoming keys share patterns (similar prefixes, sequential IDs) that the function maps to the same codes.
Collision resolution is a core part of hashing in data structure because storing and retrieving values reliably depends on handling these conflicts correctly.
Separate chaining stores multiple values at the same index by linking them in a small list. Instead of overwriting data, each slot points to a chain of entries. It is one of the most common collision-handling techniques, used in dictionaries, symbol tables, and indexing systems.
How it works:
- Compute the index for the incoming key as usual.
- If the slot is empty, start a new chain with this entry.
- If the slot is occupied, append the new entry to the chain already there.
- A search computes the index, then walks that chain comparing keys (see the sketch after the next paragraph).
Separate chaining is easy to implement and scales well, although long chains can slow lookups.
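As a quick sketch, here is separate chaining with plain Python lists standing in for the linked lists, reusing the keys 14 and 19 that collided in the earlier example:
buckets = [[] for _ in range(5)]            # 5 slots, each holding a chain

def put(key, value):
    buckets[key % 5].append((key, value))   # colliding keys just extend the chain

put(14, "A")
put(19, "B")          # 19 % 5 == 4, the same slot as 14
print(buckets[4])     # [(14, 'A'), (19, 'B')] -- both entries survive the collision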
Open addressing keeps all elements inside the main array. When a collision occurs, the system probes other slots until it finds an empty one. This approach is common in hashing in DSA because it keeps memory contiguous.
The main difference between open-addressing variants is the probing strategy:
Linear Probing
Moves step-by-step to the next index until it finds an empty slot. Simple but prone to clustering.
Quadratic Probing
Jumps wider using square increments like 1, 4, 9, and 16. This reduces consecutive collisions.
Double Hashing
Uses a second hash function to determine how far to jump during each probe. This often spreads keys more evenly.
How linear probing works (example):
- Suppose a key hashes to index 4, but slot 4 is already occupied.
- The table checks slot 5, then 6, and so on, wrapping back to 0 at the end, until it finds an empty slot.
- A later search for that key follows the same probe sequence until it finds the key or hits an empty slot.
These methods form the foundation of hashing techniques in data structure.
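Here is a minimal linear-probing sketch in Python. It assumes a fixed-size table with at least one free slot and ignores deletions, both simplifications for illustration:
def probe_insert(table, key):
    # Start at key % len(table) and step forward until a slot is free.
    index = key % len(table)
    while table[index] is not None:
        index = (index + 1) % len(table)   # wrap around at the end of the table
    table[index] = key

table = [None] * 5
for key in (14, 19, 23):
    probe_insert(table, key)
print(table)   # [19, None, None, 23, 14] -- 19 collided with 14 and wrapped to slot 0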
A custom hash table gives you full control over how data gets stored and how collisions are resolved. You can pick a hash function, decide on the table size, and choose whether to use chaining or open addressing.
Below is a step-by-step approach, followed by an example in pseudo-code.
Step 1. Choose a Hash Function
Select a function that spreads your keys. A simple choice is the division method:
index = key % tableSize
If you need a better distribution, you can choose more advanced approaches (mid-square, multiplication). Consider data patterns before deciding.
Step 2. Decide on Table Size
A table that is too small leads to excessive collisions. Depending on the function, a prime number or power of two is common. Ensure you keep an eye on the load factor — if too many items fill up your table, your performance may decline.
Step 3. Select a Collision Resolution Method
Decide between separate chaining (each slot holds a linked list of entries) and open addressing (probe for the next free slot). Chaining is simpler to implement; open addressing keeps everything in one contiguous array.
Step 4. Implement Basic Operations
Write insert, search, and delete so that each one computes the index with your hash function and then applies your chosen collision strategy at that slot.
Step 5. Manage the Load Factor and Rehashing
Track the ratio of stored items to slots. When it passes your chosen threshold (0.7 is a common choice), create a larger table and rehash every existing key into it.
Below is a short pseudo-code showing how you might implement insertion and search using separate chaining.
Initialize array hashTable of size N
For i in [0..N-1]:
    hashTable[i] = EmptyLinkedList()

function hashFunction(key):
    return key % N

function insert(key, value):
    index = hashFunction(key)
    // Insert at the head or tail of the linked list
    hashTable[index].addNode(key, value)

function search(key):
    index = hashFunction(key)
    // Traverse linked list at hashTable[index]
    for node in hashTable[index]:
        if node.key == key:
            return node.value
    return null  // Not found
How Does It Work?
- hashFunction maps each key to an index between 0 and N-1.
- insert computes that index and adds a node holding the key and value to the linked list stored in that slot.
- search computes the same index, walks the list in that slot, and returns the matching value, or null if nothing matches.
What About Collision Handling?
Here, collisions occur when more than one key lands on the same index. The linked list approach stores them all in that slot. This keeps the insert logic simple, although searches can take longer if the list grows too big. Balancing the table size and keeping an eye on the load factor help maintain a fast average case.
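Here is one way the pseudo-code above could look as runnable Python; the class and method names mirror the pseudo-code rather than any standard library API, and plain lists stand in for the linked lists:
class CustomHashTable:
    def __init__(self, n=10):
        self.n = n
        self.table = [[] for _ in range(n)]   # each slot holds a chain of (key, value)

    def hash_function(self, key):
        return key % self.n

    def insert(self, key, value):
        self.table[self.hash_function(key)].append((key, value))

    def search(self, key):
        for k, v in self.table[self.hash_function(key)]:
            if k == key:
                return v
        return None   # not found

t = CustomHashTable()
t.insert(13, "thirteen")
t.insert(23, "twenty-three")   # 13 and 23 both hash to index 3 and share a chain
print(t.search(23))            # twenty-three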
Below is a clear comparison of the advantages and disadvantages of Hashing:
| Aspect | Advantages | Disadvantages |
|---|---|---|
| Performance | Fast O(1) access for most operations | Slows down with poor hash functions or heavy collisions |
| Memory | Fixed-size table with minimal overhead | Resizing requires rehashing all entries |
| Usability | Simple to implement and widely supported | Hard to perform ordered operations like min/max |
| Flexibility | Works with many key types | Table size and hash design require careful planning |
| Reliability | Stable performance with balanced load factor | Collisions are unavoidable and must be managed |
Hashing in data structure remains one of the most effective ways to manage large collections when your main focus is speed and direct access. Despite its limitations, it stays a core technique in DSA because of its balance between simplicity and performance.
Hashing in DSA underpins everything from password validation in online services to database indexing at large scale. By converting keys into fixed-size outputs, hashing allows systems to locate, verify, or compare data in constant average time, even when volumes soar.
Below is a quick overview of how various industries and technologies rely on hashing, along with an example of hashing to make the idea clearer:
1. Fast Data Retrieval in Databases
Problem: Large databases take time to search.
How Hashing Helps: Hash-based indexing converts search keys into hash values that point directly to the right record.
Why It Matters: Databases return results almost instantly, which is crucial for e-commerce, finance, and analytics.
2. Secure Password Storage
Problem: Storing plain passwords puts users at risk.
How Hashing Helps: Passwords are converted into irreversible hash strings. Adding salt makes each hash unique.
Why It Matters: Even if attackers access the database, they can’t easily recover real passwords.
3. Message and File Integrity
Problem: Files can be corrupted or tampered with during transfer.
How Hashing Helps: Functions like MD5 and SHA-256 create a digest of the data. Any change produces a completely different hash.
Why It Matters: Used in secure messaging, software updates, and blockchain verification.
4. Optimized Caching Systems
Problem: Repeated data requests slow down apps and servers.
How Hashing Helps: Caches store frequently accessed data using hash-based lookups.
Why It Matters: Reduces latency and improves performance for busy platforms like social media and news sites.
5. Blockchain Integrity
Problem: Decentralized systems must ensure that stored data cannot be altered.
How Hashing Helps: Every block contains the hash of the previous one. Any change breaks the chain.
Why It Matters: Hashing keeps blockchain tamper-proof and trustworthy.
6. Symbol Tables in Compilers
Problem: Compilers need quick access to variable and function names.
How Hashing Helps: Symbol tables use hash tables to store and retrieve identifiers.
Why It Matters: Speeds up compilation and reduces memory usage.
7. File Verification and Download Safety
Problem: Downloads may be corrupted or manipulated.
How Hashing Helps: Comparing the file’s hash with the original confirms integrity.
Why It Matters: Common in software distribution and open-source downloads.
8. Load Balancing in Distributed Systems
Problem: Servers must share traffic evenly.
How Hashing Helps: Consistent hashing spreads requests across servers and adapts smoothly when nodes change.
Why It Matters: Supports high availability in cloud systems and microservices.
Hashing and encryption both transform data, but they solve different problems. Hashing in DSA creates a fixed output used for integrity checks or fast lookups. Encryption protects data by converting it into a form that can be reversed only with the right key. Understanding this difference is important when designing secure systems or choosing the right method for storage.
Hashing vs Encryption
| Aspect | Hashing | Encryption |
|---|---|---|
| Reversibility | Not reversible | Reversible with a key |
| Purpose | Integrity, authentication, fast lookup | Confidentiality and secure transfer |
| Output | Fixed-size value | Variable-size encrypted data |
| Keys | No key required (salt optional) | Key required |
| Focus | Collision resistance | Preventing unauthorized access |
| Uses | Password storage, checksums, file verification | Messaging, secure file storage |
A hash table performs well when collisions stay low, and the load factor helps you track this. It measures how full the table is: the number of stored entries divided by the number of available slots. If it rises too high, collisions increase and lookups slow down.
When the load factor crosses the limit, the table needs rehashing.
Rehashing creates a larger table and recomputes positions for every key so they spread out again.
Both concepts work together. The load factor warns you of rising collisions, and rehashing restores speed by redistributing data across more space.
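A short sketch of the load-factor check and the rehashing step, assuming a chained table and a threshold of 0.7 (a common convention, not a fixed rule):
def load_factor(num_items, num_slots):
    return num_items / num_slots

def rehash(old_buckets):
    # Double the slot count and recompute every key's position in the new table.
    new_buckets = [[] for _ in range(2 * len(old_buckets))]
    for chain in old_buckets:
        for key, value in chain:
            new_buckets[hash(key) % len(new_buckets)].append((key, value))
    return new_buckets

buckets = [[] for _ in range(5)]
for key in (1, 11, 7, 3):
    buckets[hash(key) % len(buckets)].append((key, str(key)))

if load_factor(4, len(buckets)) > 0.7:   # 4 / 5 = 0.8, so the table rehashes
    buckets = rehash(buckets)
print(len(buckets))                      # 10 slots after rehashing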
It's clear that hashing in data structure is a cornerstone of efficient data retrieval and a fundamental concept for any serious programmer. By converting a key into an index, it allows for near-instantaneous access to data, making it the powerhouse behind dictionaries, hash maps, and database indexing.
Frequently Asked Questions (FAQs)
Why is it called hashing?
The term comes from the culinary concept of "chopping and mixing." Similarly, hashing in data structure chops input data and mixes it mathematically to create a fixed-size output. This process effectively dices the input to create a consistent, condensed representation, much like a culinary hash acts as a uniform mix of ingredients.
What is hashing, and how does it relate to a hashmap?
Hashing in data structure is converting a key into a fixed-size integer called a hash code. A hashmap is the data structure using this process to map keys to values in an array. This combination allows for highly efficient data retrieval, insertions, and deletions.
What is the difference between hashing and a hash table?
Hashing is the method, while a hash table is the implementation. Think of hashing as the action of generating a code, whereas the table is the array where data lives. Simply put, hashing is the "how," and the table is the "where."
What makes a good hash function?
A robust hashing function in data structure must be deterministic, always producing the same code for a specific input. It requires uniform distribution to spread keys evenly across the table to minimize collisions. Finally, it must be computationally fast, as it runs during every insertion and lookup operation.
What is a collision in hashing?
A collision happens when hashing in DSA assigns the same index to two different keys. Since one slot cannot hold two distinct items naturally, the system requires specific strategies to handle this overlap. Managing collisions efficiently is vital to maintaining the high performance of the hash table.
How are collisions handled?
There are two main hashing techniques for collisions: Separate Chaining and Open Addressing. Chaining uses a linked list at the index to store multiple items. Open Addressing probes the array for the next empty slot. Both methods ensure data integrity when multiple keys generate the same hash code.
How do Separate Chaining and Open Addressing differ?
In Separate Chaining, colliding elements form a linked list at the index. In Open Addressing, the table finds the next open slot within the array itself. Chaining is generally easier to implement, while Open Addressing can be more memory-efficient if managed correctly.
What is the load factor in a hash table?
The load factor measures how full a hash table is, calculated as the ratio of items to available slots. If the load factor gets too high (often > 0.7), the table resizes to prevent performance degradation from excessive collisions, ensuring lookups remain fast.
How is the index calculated from a hash code?
The most common formula is index = hash_code % table_size. This modulo operation ensures the generated hash code maps to a valid index within the array's bounds, preventing overflow errors and placing data in the correct slot.
Why does the choice of hash function matter?
A hashing function in data structure is critical; a good one distributes keys uniformly to minimize collisions. A poor choice clusters keys, degrading performance to O(n) like a linked list. Standard non-cryptographic functions like MurmurHash are often used to ensure speed and even distribution.
How is hashing related to indexing?
Indexing creates a lookup structure mapping keys to data locations, similar to a book index. Hashing is essentially a highly efficient form of indexing that uses mathematical formulas to jump directly to the data location, avoiding the need to scan every item.
What is rehashing?
Rehashing occurs when a hash table's load factor gets too high. The system creates a larger array, and every existing item is hashed again to find its new position. This expensive but necessary operation restores efficient access speeds.
What is the time complexity of hashing operations?
Ideally, hashing in data structure offers O(1) constant time for search, insert, and delete. The function computes the index directly. However, in worst-case scenarios with poor collision handling, complexity can degrade to O(n), requiring linear traversal of all elements to find the correct data.
Can a hash value be reversed?
No, a standard hash is a one-way function. Think of a blender: you cannot un-blend the ingredients. This irreversibility is intentional, especially in cryptography, making it computationally infeasible to retrieve original inputs from the hash code.
How is hashing different from encryption?
Hashing is a one-way process used to verify integrity, while encryption is two-way and designed to be reversed (decrypted). Hashing creates a fixed fingerprint, whereas encryption hides data that is meant to be read again later.
What is the MD5 algorithm?
The MD5 algorithm produces a 128-bit hash, represented as a 32-character hexadecimal string. Processing "hello" through MD5 always yields the same 32-digit string. However, due to vulnerabilities, MD5 is no longer safe for security, though it remains a classic example.
What is a simple example of hashing in programming?
A clear example of hashing is a Python dictionary. When you save dict["key"] = value, Python uses a hash function to convert "key" into an index. This allows the program to find the memory address of "value" instantly, demonstrating the core utility of hashmaps in programming.
What are the different types of hashing?
There are multiple hashing techniques in data structure, primarily categorized by collision resolution like Separate Chaining and Open Addressing. In cryptography, types are defined by algorithms like MD5 or SHA-256. Each type serves a specific purpose, ranging from fast data retrieval to secure password verification.
How can I learn hashing techniques in data structure?
To master hashing techniques, combine theory with practice. Start with structured courses like those from upGrad, then implement a hash table from scratch in Python or Java. Building the structure yourself is the best way to internalize how collisions and sizing work.
What is a hash value, and what types exist?
A hash value is the fixed-size output of a hashing function in data structure. It acts as a digital fingerprint. Types include non-cryptographic hashes for fast data structures (like hashmaps) and cryptographic hashes (like SHA-256) used for security. Both rely on the core concept of mapping inputs to fixed outputs.