There is no doubt that the internet is the backbone of the modern world economy. Today, nearly 4.7 billion people in the world go online every day, using Internet-driven applications to read the news, shop for clothes, order food, listen to music, commute to and from the office, and more.
With so many users making digital contributions daily, it is no wonder that huMONGOus amounts of unstructured data are generated in cyberspace every single day.
This gave rise to an urgent need for a new database paradigm that can store, serve, and support ‘Big Data’ applications (as they came to be known) 24/7, without breaking down.
The Rise Of NoSQL Databases
NoSQL, loosely known as “Not Only SQL,” is an alternative to SQL databases, which are constrained by their fixed table schemas. By being highly flexible, NoSQL overcomes this structural drawback and is equipped to scale horizontally. NoSQL databases were also designed to boost developer productivity, giving developers a simple and elegant data model for complex data processing and management operations.
Broadly, these data storage models come in four types: Document, Key-Value, Wide-Column, and Graph. This blog focuses on document databases and on the architecture of MongoDB, a leading NoSQL database.
The MongoDB Structure
Source: MongoDB documentation
MongoDB architecture follows a flexible data model. Unlike RDBMS, which mandates a schema declaration before inserting data, MongoDB does not enforce a fixed document structure.
Field: A key-value pair in a document. It is the counterpart of a column in relational databases.
Document: A set of fields stored together. This is the equivalent of a record (row) in RDBMS.
Collection: A group of documents. This is analogous to an RDBMS table.
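To make these three terms concrete, here is a minimal sketch in plain Python. It does not use MongoDB or any driver; dictionaries simply stand in for documents, and the names `users`, `alice`, and `bob` are made up for illustration.

```python
# Toy model only: plain Python dicts stand in for MongoDB documents.
# The collection and field names below are illustrative assumptions.

# A document is a set of field/value pairs (the counterpart of a row).
alice = {"_id": 1, "name": "Alice", "email": "alice@example.com"}

# A second document in the same collection may carry different fields,
# because no fixed structure is enforced.
bob = {"_id": 2, "name": "Bob", "phones": ["555-0100", "555-0101"]}

# A collection is a group of documents (the counterpart of a table).
users = [alice, bob]

# Each document exposes its own set of fields (the counterpart of columns).
fields_per_document = {doc["_id"]: sorted(doc.keys()) for doc in users}
print(fields_per_document)
```

Note that the two documents share a collection even though one has an `email` field and the other has a `phones` list, which is the flexibility described above.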
Differences Between RDBMS and MongoDB Architecture
In RDBMS, data can be distributed among multiple tables and joined together to access it in a single view. MongoDB has no SQL-style JOIN clause. Instead, related data is usually stored together in a single collection as nested (embedded) documents. When data does live in separate collections, the aggregation framework's $lookup stage can combine it in a way similar to a left outer join.
RDBMS calls for normalizing data to avoid duplicates and orphaned records. MongoDB’s flexible document model reduces the need for normalization, because related data can be embedded in a single document.
RDBMS is widely used in sectors such as banking, where the exact database structure is known a priori. MongoDB supports huge volumes of unstructured data and is extensible across cloud, mobile, web, and Big Data applications.
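The embedding approach from the first difference above can be sketched in plain Python. This is a toy model under stated assumptions: a nested dictionary stands in for a document, and the customer and order fields are invented for the example.

```python
# Toy model only: a nested dict stands in for a MongoDB document with
# embedded sub-documents. All names and values are illustrative assumptions.

customer = {
    "_id": 42,
    "name": "Alice",
    # In a relational design these orders would live in a separate table
    # and be joined on a customer id; here they are embedded.
    "orders": [
        {"order_id": "A-1", "total": 30.0, "items": ["shirt"]},
        {"order_id": "A-2", "total": 12.5, "items": ["socks", "cap"]},
    ],
}

# One lookup returns the customer together with the related orders.
order_totals = [order["total"] for order in customer["orders"]]
lifetime_value = sum(order_totals)
print(lifetime_value)
```

Reading the customer and all of its orders takes a single document access here, where a normalized relational design would join two tables.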
The Need and Benefits of MongoDB Architecture
MongoDB architecture can handle structural changes on the fly, which many modern applications need. This is perfect for scenarios where you do not know your database structure beforehand.
Following are some of its key benefits:
Dynamic schema – It can accommodate changes in the shape of incoming data on the fly, adapting to changing business requirements.
Ad hoc queries – A powerful query language that can return only the specified fields. It also allows for highly granular searches (by field, by range, by regular expression, and more).
Indexing – You can index any field in a document to speed up data retrieval.
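The ideas behind these benefits (a range filter, returning only chosen fields, and an index) can be imitated in a few lines of plain Python. This is only a sketch of the concepts, not MongoDB's query engine; the helper names `find` and `build_index` and the `products` data are hypothetical.

```python
# Toy model only: this is NOT MongoDB's query engine. It mimics, in plain
# Python, a range filter, a projection, and a lookup index.
from collections import defaultdict

products = [
    {"_id": 1, "name": "shirt", "price": 20, "category": "clothes"},
    {"_id": 2, "name": "cap", "price": 8, "category": "clothes"},
    {"_id": 3, "name": "pizza", "price": 12, "category": "food"},
]


def find(collection, field, low, high, projection):
    """Return the projected fields of documents with low <= doc[field] <= high."""
    return [
        {key: doc[key] for key in projection if key in doc}
        for doc in collection
        if field in doc and low <= doc[field] <= high
    ]


def build_index(collection, field):
    """Map each value of `field` to the documents holding it."""
    index = defaultdict(list)
    for doc in collection:
        if field in doc:
            index[doc[field]].append(doc)
    return index


# Range query on price that returns only the name field.
cheap = find(products, "price", 5, 15, ["name"])

# Index on category, then a direct lookup instead of scanning every document.
by_category = build_index(products, "category")
food_names = [doc["name"] for doc in by_category["food"]]
print(cheap, food_names)
```

The index trades extra memory for faster lookups: once built, finding all documents in a category no longer requires scanning the whole collection.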
Let us now take a deep dive into the MongoDB architecture.
But before we do that, we need to understand the CAP Theorem.
The CAP Theorem
CAP denotes the trifecta of Consistency, Availability, and Partition Tolerance.
Let Us Look At What Each Term Means In This Context
Consistency
If you write data into a distributed database, you should be able to read that same data back from any node in the system at any point in time. It’s about preserving the integrity of the written data, so that every read reflects the most recent write.
Availability
This is about minimizing the downtime of a system. Every read or write request that reaches a working machine in the cluster should receive a response.
Partition Tolerance Or Fault Tolerance
This indicates a system’s ability to keep functioning even in the case of a network partition, i.e., when some parts of the cluster cannot talk to each other and messages between them are lost or delayed.
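The CAP Theorem states that a distributed system can guarantee at most two of these three properties at the same time. Because network partitions cannot be ruled out in a real distributed system, Partition Tolerance is effectively mandatory: a network partition must not bring the whole system crashing down.
In other words, when a partition occurs you can guarantee only one of ‘Consistency’ and ‘Availability’ in a distributed system, the other guarantee being Partition Tolerance.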
This gives rise to a triangle like this:
Source: Data Science Pedia
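MongoDB, in its default configuration, picks consistency over availability whenever there is a partition in the system (CP). Writes are accepted only by a node (the primary) that can reach a majority of the cluster. The side of the partition that lacks a majority stops accepting writes until connectivity is restored.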
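The CP trade-off can be illustrated with a small sketch of the majority rule. This is a toy model under stated assumptions and does not reproduce MongoDB's actual replication protocol; the function names are hypothetical.

```python
# Toy model only: illustrates the majority rule behind CP behaviour.
# It does not reproduce MongoDB's actual election or replication protocol.

def has_majority(reachable_members, total_members):
    """True if this side of a partition can see a strict majority of members."""
    return reachable_members > total_members // 2


def accepts_writes(reachable_members, total_members):
    """A side can take writes only while it can see a majority of the cluster."""
    return has_majority(reachable_members, total_members)


# A 5-member cluster splits into a group of 3 and a group of 2.
total = 5
majority_side = accepts_writes(3, total)  # can still take writes
minority_side = accepts_writes(2, total)  # refuses writes to stay consistent
print(majority_side, minority_side)
```

The minority side gives up availability for writes so that the two halves can never accept conflicting updates, which is the essence of choosing CP.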
MongoDB employs a single-master architecture, meaning there is one primary machine taking charge of all client-side write operations. All other instances you add later to the cluster constitute the secondary nodes, which replicate the primary's data. By default, read operations also go to the primary, but clients can be configured (through a read preference) to read from the secondaries.
The secondaries are basically backup copies of the primary server, kept as a failsafe against the primary crashing.
All these servers are grouped in Replica Sets. You can have multiple Replica Sets, each having its own primary and secondary servers.
In case the primary goes down, the remaining members hold an election to choose a new primary from among the secondary nodes. The choice is not arbitrary: a candidate needs votes from a majority of the voting members, and factors such as how up to date its data is and its configured priority influence the outcome. This is why an odd number of servers in your cluster (minimum 3) is recommended, so that a primary can be elected with a clear majority.
If you don’t want to spend money on three full servers, you can appoint an ‘Arbiter’ node in place of the third. An arbiter holds no data; its only job is to vote on electing the primary.
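The reasoning behind the odd-number recommendation is simple arithmetic, sketched below in plain Python. The function names are hypothetical, and the numbers describe only the majority rule, not any other detail of MongoDB elections.

```python
# Toy arithmetic only: voting majority for different replica set sizes.

def votes_needed(voting_members):
    """Smallest number of votes that is a strict majority."""
    return voting_members // 2 + 1


def failures_tolerated(voting_members):
    """How many voting members can be lost while a majority remains."""
    return voting_members - votes_needed(voting_members)


# Compare set sizes from 2 to 5 voting members.
table = {n: (votes_needed(n), failures_tolerated(n)) for n in range(2, 6)}
for members, (needed, tolerated) in table.items():
    print(f"{members} voting members: {needed} votes to elect, "
          f"tolerates {tolerated} failure(s)")
```

Two data-bearing servers plus an arbiter give three voting members, so the set survives the loss of one member. Going from three to four voting members adds cost without adding fault tolerance, which is why odd sizes are preferred.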
Sharding in MongoDB lets you distribute your Big Data across several servers, each holding a subset of the data called a shard.
Source: MongoDB Documentation
Suppose you have an application with millions of users. Sharding lets you partition these users (based on a shard key, such as a User ID) across different replica sets, each of which acts as one shard. The Application Server talks to a routing process called mongos, which consults the Config Servers (historically exactly three; deployed as a replica set in recent MongoDB versions) to find out which ‘Shard’ contains the data it is seeking. A background process called the balancer automatically redistributes ranges of data (chunks) so that the load (in this case, the number of users) stays roughly even between all the shards.
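The routing idea can be sketched with a simple hash of the shard key. This is a toy model under stated assumptions: real MongoDB sharding works with chunks, config metadata, and a balancer, none of which is modeled here, and the shard names and the functions `shard_for` and `route` are hypothetical.

```python
# Toy model only: hash-based routing of a shard key to a shard.
# MongoDB's real mechanism (chunks, config metadata, balancer) is more involved.
import hashlib

SHARDS = ["shard-a", "shard-b", "shard-c"]  # hypothetical shard names


def shard_for(user_id, shards=SHARDS):
    """Pick a shard deterministically from the shard key (here, a user id)."""
    digest = hashlib.sha256(str(user_id).encode("utf-8")).hexdigest()
    return shards[int(digest, 16) % len(shards)]


def route(user_ids, shards=SHARDS):
    """Group user ids by the shard that would hold them, as a router would."""
    placement = {name: [] for name in shards}
    for user_id in user_ids:
        placement[shard_for(user_id, shards)].append(user_id)
    return placement


placement = route(range(1, 1001))
print({name: len(ids) for name, ids in placement.items()})
```

Because the same user id always hashes to the same shard, a router can send each request straight to the shard that owns the data instead of querying every shard.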
We hope this article helped you understand how the MongoDB Architecture works and how the system operates. To know more, please look at our other blogs.