MongoDB Architecture: Structure, Terminologies, Requirement & Benefits

Overview

There is no doubt that the internet is the backbone of the modern world economy. Today, nearly 4.7 billion people use it every day, relying on Internet-driven applications to read the news, shop for clothes, order food, listen to music, plan their commute to and from the office, and more.

With such a vast number of users making digital contributions daily, it is no wonder that huMONGOus amounts of unstructured data are generated in cyberspace every single day.

This gave rise to an urgent need for a new database paradigm that can store, serve, and support ‘Big Data’ applications (as they came to be known) 24/7, without breaking down.

Enter NoSQL. 

The Rise Of NoSQL Databases

NoSQL, loosely expanded as “Not Only SQL,” is an alternative to SQL databases, which are constrained by fixed table schemas. Because NoSQL databases are highly flexible, they overcome this structural drawback and are built to scale horizontally, by adding more machines rather than a bigger machine. They were designed to boost developer productivity, giving developers a simple and elegant data model for complex data processing and management operations.

Broadly, these data storage models come in four types: Document, Key-Value, Wide-Column, and Graph. In this blog, we will focus on document databases and on the architecture of MongoDB, the leading NoSQL database.

The MongoDB Structure

Source: MongoDB documentation

MongoDB architecture follows a flexible data model. Unlike an RDBMS, which requires a schema to be declared before data is inserted, MongoDB does not enforce a fixed document structure by default, although optional schema validation rules can be added to a collection.

Terminologies

Fields

A key-value pair in a document. It is the counterpart of a column in relational databases.

Document

A set of fields stored together. This is the equivalent of a record (row) in an RDBMS.

Collections

A group of documents is called a collection. This is analogous to a table in an RDBMS.
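To make the three terms concrete, here is a minimal sketch that models them with plain Python dictionaries and lists. The book data is hypothetical, and this only illustrates the data model; MongoDB itself stores documents in a binary format (BSON) and no server or driver is involved here.

```python
# Illustrative sketch: MongoDB terminology modeled with plain Python
# dicts and lists. No MongoDB server or driver is involved.

# A document: a set of fields (key-value pairs), comparable to a row.
book = {"_id": 1, "title": "Dune", "author": "Frank Herbert", "year": 1965}

# A second document in the same collection with a different shape:
# it has no "year" field but adds an array field, with no schema change.
ebook = {
    "_id": 2,
    "title": "Neuromancer",
    "author": "William Gibson",
    "formats": ["epub", "pdf"],
}

# A collection: a group of documents, comparable to a table.
books = [book, ebook]

# Every distinct field name used anywhere in the collection.
all_fields = sorted({field for doc in books for field in doc})
print(all_fields)  # ['_id', 'author', 'formats', 'title', 'year']
```

Note how the two documents share some fields but not others; in a relational table, adding "formats" would first require altering the table's schema.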

Differences Between RDBMS and MongoDB Architecture

Joins

In an RDBMS, data can be distributed among multiple tables and joined together to present it in a single view. MongoDB has no SQL-style JOIN; its closest equivalent is the $lookup aggregation stage, which performs a left outer join between collections. Instead of relying on joins, related data is usually stored together in a single document, using nested (embedded) documents and arrays.
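The contrast can be sketched in plain Python. This is a minimal illustration with hypothetical customer and order data, not MongoDB code: the relational style keeps two "tables" and joins them at read time, while the document style embeds the orders inside the customer document.

```python
# Relational style: two "tables" linked by customer_id, joined at read time.
customers = [{"customer_id": 7, "name": "Asha"}]
orders = [
    {"order_id": 100, "customer_id": 7, "total": 250},
    {"order_id": 101, "customer_id": 7, "total": 90},
]


def join_customer_orders(customer_id):
    """Emulate a JOIN: find the customer, then collect the matching orders."""
    customer = next(c for c in customers if c["customer_id"] == customer_id)
    matching = [o for o in orders if o["customer_id"] == customer_id]
    return {**customer, "orders": matching}


# Document style: the orders are embedded in the customer document,
# so a single lookup returns everything without a join step.
customer_doc = {
    "_id": 7,
    "name": "Asha",
    "orders": [
        {"order_id": 100, "total": 250},
        {"order_id": 101, "total": 90},
    ],
}

joined = join_customer_orders(7)
print(len(joined["orders"]), sum(o["total"] for o in customer_doc["orders"]))  # 2 340
```

Embedding works best when the related data is read together and stays reasonably small; data that grows without bound is usually kept in a separate collection and referenced instead.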

Normalization 

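An RDBMS warrants normalizing data to avoid duplicates and orphaned records. MongoDB’s flexible document model reduces the need for normalization: related data is often embedded and deliberately duplicated (denormalized) for faster reads, although documents can still reference one another when duplication would be costly.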

Structure

An RDBMS suits domains such as banking, where the exact database structure is known a priori. MongoDB supports huge volumes of unstructured data and is used across cloud, mobile, web, and Big Data applications.

The Need and Benefits of MongoDB Architecture

MongoDB architecture can handle structural changes on the fly, which makes it a good fit for scenarios where you don’t know your database structure beforehand.

Following are some of its key benefits:

Document-based

It can accommodate changes in the shape of the data dynamically, adapting to changing business requirements in real time.

Ad hoc queries: a powerful query language that can return only the specified fields. It also allows for highly granular searches, such as by field, by range, and by regular expression.
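The idea of a filter plus a projection can be sketched in plain Python. The `find` and `matches` helpers below are hypothetical: they only mimic the shape of MongoDB query documents (operators such as `$gte`, `$lt`, and `$regex`) for a handful of operators, and they are not the MongoDB driver API. Unlike real MongoDB, this sketch does not include `_id` in projected results automatically.

```python
import re

# Hypothetical helpers that mimic the shape of MongoDB query documents.
# Only a handful of operators are supported, and a document that lacks a
# queried field simply does not match (a simplification).
OPERATORS = {
    "$gt": lambda value, arg: value > arg,
    "$gte": lambda value, arg: value >= arg,
    "$lt": lambda value, arg: value < arg,
    "$lte": lambda value, arg: value <= arg,
    "$regex": lambda value, arg: re.search(arg, value) is not None,
}


def matches(doc, query):
    """Return True if doc satisfies every condition in query."""
    for field, condition in query.items():
        if field not in doc:
            return False
        value = doc[field]
        if isinstance(condition, dict):
            # Operator form, e.g. {"age": {"$gte": 18, "$lt": 30}}
            if not all(OPERATORS[op](value, arg) for op, arg in condition.items()):
                return False
        elif value != condition:
            # Plain equality form, e.g. {"city": "Pune"}
            return False
    return True


def find(collection, query, projection=None):
    """Filter documents, then keep only the fields marked 1 in projection."""
    results = []
    for doc in collection:
        if matches(doc, query):
            if projection:
                doc = {f: doc[f] for f, keep in projection.items() if keep and f in doc}
            results.append(doc)
    return results


users = [
    {"_id": 1, "name": "Ravi", "age": 24, "city": "Pune"},
    {"_id": 2, "name": "Meera", "age": 35, "city": "Pune"},
    {"_id": 3, "name": "Rohan", "age": 19, "city": "Delhi"},
]

# Users aged 18-29 whose name starts with "R", returning only name and age.
young_r = find(
    users,
    {"age": {"$gte": 18, "$lt": 30}, "name": {"$regex": "^R"}},
    {"name": 1, "age": 1},
)
print(young_r)  # [{'name': 'Ravi', 'age': 24}, {'name': 'Rohan', 'age': 19}]
```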

Indexing

You can index any field in a document, including fields inside embedded documents and arrays, to speed up data retrieval.
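Why an index speeds up retrieval can be sketched with a sorted list and binary search. MongoDB's indexes are B-trees; the `SingleFieldIndex` class here is a hypothetical stand-in that keeps (value, position) pairs in sorted order so a range query can jump straight to the matching entries instead of scanning every document.

```python
import bisect


class SingleFieldIndex:
    """Hypothetical sorted index over one field. It is built once and is
    not updated on later writes, unlike a real database index."""

    def __init__(self, collection, field):
        self.collection = collection
        # (field value, document position) pairs, sorted by value.
        self.entries = sorted(
            (doc[field], pos) for pos, doc in enumerate(collection) if field in doc
        )
        self.keys = [key for key, _ in self.entries]

    def range(self, low, high):
        """Return documents whose field value lies in [low, high)."""
        start = bisect.bisect_left(self.keys, low)
        end = bisect.bisect_left(self.keys, high)
        return [self.collection[pos] for _, pos in self.entries[start:end]]


orders = [{"_id": i, "total": t} for i, t in enumerate([250, 90, 400, 120, 75])]
by_total = SingleFieldIndex(orders, "total")

cheap = by_total.range(80, 200)
print([d["_id"] for d in cheap])  # [1, 3]
```

The binary search costs O(log n) per lookup, while a scan without an index touches every document; the trade-off is extra work on every write to keep the index sorted.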

Let us now take a deep dive into the MongoDB architecture.

But before we do that, we need to understand the CAP Theorem.

The CAP Theorem

CAP denotes the trifecta of Consistency, Availability, and Partition Tolerance. 

Let Us Look At What Each Term Means In This Context 

Consistency

If you write data into a distributed database, you should be able to read that same data from any node in the system at any point in time: every read returns the most recent write (or an error). It’s about preserving the integrity of the written data.

Availability

This is about minimizing the downtime of a system: every read/write request sent to a working node in the cluster should receive a response, without fail.

Partition Tolerance Or Fault Tolerance

This indicates a system’s ability to keep functioning even when a network partition occurs, i.e., when parts of the cluster cannot talk to each other because messages between them are lost or delayed.

The CAP Theorem states that a distributed system cannot guarantee all three properties at once. Since network partitions are unavoidable in practice, a distributed system HAS to be Partition Tolerant: a network partition must not bring the whole system crashing down.

In other words, when a partition occurs, you can only guarantee one of ‘Consistency’ and ‘Availability’ alongside Partition Tolerance.

This gives rise to a triangle like this:

Source: Data Science Pedia

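MongoDB is generally classified as CP: when there is a partition in the system, it favors consistency over availability. A primary that cannot reach a majority of its replica set steps down, and writes are refused until a primary is elected on the side of the partition that holds a majority, so the cluster never accepts conflicting writes on both sides.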
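The majority rule behind this behaviour can be sketched in a few lines of Python. This is a simplified model, assuming a five-member replica set with hypothetical member names where every member votes; it ignores write concerns, read preferences, and election timing. It shows why at most one side of a partition can accept writes.

```python
from itertools import combinations


def majority(n_voting):
    """Smallest vote count that is more than half of n_voting."""
    return n_voting // 2 + 1


def can_accept_writes(side, n_voting):
    """A partition side can hold a primary, and therefore accept writes,
    only if it contains a majority of the voting members."""
    return len(side) >= majority(n_voting)


members = {"A", "B", "C", "D", "E"}
minority_side = {"A", "B"}           # the old primary, A, is stranded here
majority_side = members - minority_side

print(can_accept_writes(minority_side, len(members)))  # False: A steps down
print(can_accept_writes(majority_side, len(members)))  # True: C, D or E takes over

# For every possible two-way split, at most one side can accept writes,
# so the two halves never accept conflicting writes at the same time.
for size in range(len(members) + 1):
    for side in combinations(sorted(members), size):
        other = members - set(side)
        assert not (
            can_accept_writes(set(side), len(members))
            and can_accept_writes(other, len(members))
        )
```

The cost of this choice is availability: clients stuck on the minority side cannot write until the partition heals.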

MongoDB Architecture

MongoDB replica sets employ a single-master architecture, meaning there is one primary node that takes charge of all client-side write operations. All other data-bearing members constitute the secondary nodes. By default, reads also go to the primary, but clients can choose a read preference that sends reads to the secondaries.

The secondaries replicate the primary’s operations and act as backup copies, a failsafe against the primary crashing.
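Replication works by having the primary record every write in an operations log (the oplog, which in MongoDB is the capped collection `local.oplog.rs`) that secondaries copy and apply asynchronously. The sketch below is a simplified Python model of that idea; the `Node`, `Primary`, and `sync` names are hypothetical, and an "update" here simply replaces the whole document.

```python
class Node:
    """A replica set member holding documents keyed by _id."""

    def __init__(self, name):
        self.name = name
        self.data = {}      # _id -> document
        self.applied = 0    # number of oplog entries applied so far

    def apply(self, entry):
        op, doc = entry
        if op in ("insert", "update"):
            # Simplification: an update replaces the whole document.
            self.data[doc["_id"]] = doc
        elif op == "delete":
            self.data.pop(doc["_id"], None)
        self.applied += 1


class Primary(Node):
    """The only member that accepts writes; it records each one in its oplog."""

    def __init__(self, name):
        super().__init__(name)
        self.oplog = []

    def write(self, op, doc):
        entry = (op, doc)
        self.apply(entry)
        self.oplog.append(entry)


def sync(secondary, primary):
    """Apply only the oplog entries the secondary has not seen yet."""
    for entry in primary.oplog[secondary.applied:]:
        secondary.apply(entry)


p, s1, s2 = Primary("p"), Node("s1"), Node("s2")
p.write("insert", {"_id": 1, "x": 1})
p.write("update", {"_id": 1, "x": 2})
sync(s1, p)                          # s1 catches up; s2 is still lagging
p.write("insert", {"_id": 2, "x": 5})
sync(s1, p)
sync(s2, p)
print(s1.data == p.data == s2.data)  # True
```

Because secondaries apply the log asynchronously, a secondary can briefly lag behind the primary, which is why reads from secondaries may return slightly stale data.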

These servers are grouped into a Replica Set. You can have multiple Replica Sets (for example, in a sharded cluster), each with its own primary and secondary servers.

Source: MongoDB Documentation

If the primary goes down, the remaining members hold an election. A secondary that stops hearing from the primary calls an election, and it becomes the new primary if it wins the votes of a majority of the voting members; member priority and how up to date each member’s data is influence which secondary wins. This is why you should have an odd number of voting members in your replica set (at least three): it avoids ties and lets a majority be formed with the fewest servers.

If you don’t want to pay for a third data-bearing server, you can add an ‘Arbiter’ node instead: it stores no data, and its only job is to vote in elections for the primary.
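The arithmetic behind the odd-number advice, and the role of an arbiter, can be sketched in Python. This is a simplified model, assuming every member that is up votes for the first available data-bearing candidate; it ignores priorities, election terms, and oplog freshness, and the member names are hypothetical.

```python
def majority(n_voting):
    """Smallest vote count that is more than half of n_voting."""
    return n_voting // 2 + 1


def fault_tolerance(n_voting):
    """How many voting members can fail while a primary can still be elected."""
    return n_voting - majority(n_voting)


# Adding a fourth member does not let the set survive more failures than three.
for n in range(2, 8):
    print(n, "members: majority", majority(n), "tolerates", fault_tolerance(n))

# Two data-bearing members plus an arbiter give three voting members.
members = [
    {"name": "primary", "data": True, "up": False},    # has just crashed
    {"name": "secondary", "data": True, "up": True},
    {"name": "arbiter", "data": False, "up": True},    # votes, stores no data
]


def run_election(members):
    """Elect the first live data-bearing member if a majority of votes exists."""
    votes_available = sum(1 for m in members if m["up"])
    candidates = [m for m in members if m["up"] and m["data"]]
    if candidates and votes_available >= majority(len(members)):
        return candidates[0]["name"]
    return None


print(run_election(members))  # secondary
```

An arbiter can only vote; it cannot become primary, which is why the candidate list above includes only data-bearing members.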

Sharding

Sharding in MongoDB lets you distribute your Big Data across several machines, called shards, each of which is typically deployed as a replica set.

Source: MongoDB Documentation

Suppose you have an application with millions of users. Sharding lets you partition these users across different replica sets (shards) based on a shard key, such as a User ID. The Application Server sends its queries to a router process called mongos, which uses metadata stored on the Config Servers (themselves deployed as a replica set, commonly of three members) to work out which ‘Shard’ contains the data it is seeking. A background balancer process migrates chunks of data between shards to keep the load (in this case, the number of users stored on each shard) evenly distributed.
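Range-based routing and balancing can be sketched in Python. The `Router` class is a hypothetical stand-in for mongos: it maps user-ID ranges (chunks) to shards, and its chunk-to-shard list plays the role of the config server metadata. The `balance` method is a simplified balancer that evens out chunk counts; MongoDB’s real balancer uses its own migration thresholds and also moves the documents themselves.

```python
import bisect


class Router:
    """Hypothetical stand-in for mongos. The chunk-to-shard map plays the
    role of the metadata that MongoDB keeps on the config servers."""

    def __init__(self, split_points, chunk_owner, shards):
        # Chunk i covers user IDs in [split_points[i-1], split_points[i]);
        # the first and last chunks are unbounded on one side.
        assert len(chunk_owner) == len(split_points) + 1
        self.split_points = split_points
        self.chunk_owner = chunk_owner
        self.shards = shards

    def shard_for(self, user_id):
        """Route a shard-key value to the shard that owns its chunk."""
        return self.chunk_owner[bisect.bisect_right(self.split_points, user_id)]

    def chunk_counts(self):
        return {shard: self.chunk_owner.count(shard) for shard in self.shards}

    def balance(self):
        """Move one chunk at a time from the shard with the most chunks to
        the one with the fewest until the counts differ by at most one.
        Only metadata changes here; a real migration also copies documents."""
        while True:
            counts = self.chunk_counts()
            busiest = max(self.shards, key=counts.get)
            idlest = min(self.shards, key=counts.get)
            if counts[busiest] - counts[idlest] <= 1:
                return
            self.chunk_owner[self.chunk_owner.index(busiest)] = idlest


# Six chunks of user IDs, all initially on shardA.
router = Router(
    split_points=[1000, 2000, 3000, 4000, 5000],
    chunk_owner=["shardA"] * 6,
    shards=["shardA", "shardB", "shardC"],
)
router.balance()
print(router.chunk_counts())   # {'shardA': 2, 'shardB': 2, 'shardC': 2}
print(router.shard_for(4500))  # shardA
```

Because routing only needs the small chunk map, the application never has to know where a given user’s data physically lives.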

Conclusion

We hope this article helped you understand how the MongoDB Architecture works and how the system operates. To know more, please look at our other blogs. 
