Overview
There is no doubt that the internet is the backbone of the modern world economy. Today, nearly 4.7 billion people in the world use it every day, relying on Internet-driven applications to read the news, shop for clothes, order food, listen to music, commute to and from the office, and more.
With such a vast number of users making digital contributions daily, it is no wonder that huMONGOus amounts of unstructured data are generated in cyberspace every single day.
This gave rise to an urgent need for a new database paradigm that can store, serve, and support ‘Big Data’ applications (as they came to be known) 24/7, without breaking down.
Enter NoSQL.
The Rise Of NoSQL Databases
NoSQL, loosely known as “Not Only SQL,” is an alternative to SQL databases, which are constrained by their fixed table schemas. By being highly flexible, NoSQL overcomes this structural drawback of SQL databases and is equipped to scale horizontally. NoSQL databases were designed to boost developer productivity, arming developers with a simple and elegant data model for complex data processing and management operations.
Broadly, these data storage models come in 4 types: Document, Key-Value, Wide-Column, and Graph. In this blog, we will focus on Document Databases and the architecture of MongoDB, the leading NoSQL database.
The MongoDB Structure
Source: MongoDB documentation
MongoDB architecture follows a flexible data model. Unlike RDBMS, which mandates a schema declaration before inserting data, MongoDB does not enforce a fixed document structure.
Terminologies
Fields
A field is a key-value pair in a document. It is the counterpart of a column in relational databases.
Document
A document is the equivalent of a record (row) in an RDBMS.
Collections
A group of documents is called a collection. This is analogous to an RDBMS table.
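To make the three terms concrete, here is a minimal sketch that models a collection as a plain Python list of dictionaries. The collection name `users`, the field names, and the helper `field_names` are all hypothetical, and no MongoDB driver is involved; the point is only that documents in one collection can carry different fields.

```python
# Two documents in the same (hypothetical) "users" collection.
# Each key-value pair is a field; the documents need not share a structure.
users = [
    {"_id": 1, "name": "Asha", "email": "asha@example.com"},
    {"_id": 2, "name": "Ravi", "address": {"city": "Pune", "zip": "411001"}, "tags": ["admin"]},
]


def field_names(collection):
    """Return the set of top-level field names used by each document."""
    return [set(doc) for doc in collection]


for doc, names in zip(users, field_names(users)):
    print(doc["_id"], sorted(names))
```

The second document holds a nested `address` document and a `tags` array, neither of which appears in the first one.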
Differences Between RDBMS and MongoDB Architecture
Joins
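In RDBMS, data can be distributed among multiple tables and joined together to access it in a single view. MongoDB has no direct equivalent of the SQL JOIN clause (the closest is the $lookup stage of the aggregation pipeline, which performs a left outer join between collections). Instead, related data is usually stored together in a single collection and organized via nesting or embedded documents.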
Normalization
RDBMS warrants normalizing data to avoid duplicates and orphaned records. MongoDB’s flexibility reduces the need for normalization, since related data can be embedded in the document that uses it.
Structure
RDBMS is well suited to domains such as banking, where the exact database structure is known a priori. MongoDB supports huge volumes of unstructured data and is extensible across cloud, mobile, web, and Big Data applications.
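The contrast between joining tables and embedding documents can be sketched in plain Python. This is an illustration only: the `customers` and `orders` data and both helper functions are hypothetical, and the join is emulated in memory rather than run by any database.

```python
# Relational style: two "tables" linked by a key, combined with a join.
customers = [{"customer_id": 1, "name": "Asha"}]
orders = [
    {"order_id": 10, "customer_id": 1, "item": "book"},
    {"order_id": 11, "customer_id": 1, "item": "pen"},
]


def join_orders(customers, orders):
    """Emulate an inner join of customers and orders on customer_id."""
    by_id = {c["customer_id"]: c for c in customers}
    return [
        {"name": by_id[o["customer_id"]]["name"], "item": o["item"]}
        for o in orders
        if o["customer_id"] in by_id
    ]


# Document style: the orders are embedded in the customer document.
customer_doc = {
    "_id": 1,
    "name": "Asha",
    "orders": [{"order_id": 10, "item": "book"}, {"order_id": 11, "item": "pen"}],
}


def embedded_orders(doc):
    """Read the same view from a single document, with no join."""
    return [{"name": doc["name"], "item": o["item"]} for o in doc["orders"]]


print(join_orders(customers, orders) == embedded_orders(customer_doc))
```

Both approaches produce the same view of the data; the embedded form simply reads it from one document.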
The Need and Benefits of MongoDB Architecture
MongoDB architecture can handle structural changes on the fly, which is the need of the hour. This is perfect for scenarios where you don’t have visibility into your database structure beforehand.
Following are some of its key benefits:
Document-based
It can accommodate data flow changes dynamically, adapting to changing business requirements in real time.
Ad hoc queries
A powerful query language can return only the specified fields. It also allows for highly granular searches (field-wise, range-wise, regular expressions, and more).
Indexing
You can index any field in a document to speed up the process of data retrieval.
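The ideas behind ad hoc queries and indexing can be illustrated with a toy, in-memory stand-in. Nothing below is the MongoDB API: `find`, `build_index`, and the `products` data are hypothetical, and the index is a simple dictionary, whereas MongoDB's default indexes are B-tree based. Treat it as a sketch of the concepts only.

```python
import re
from collections import defaultdict

products = [
    {"_id": 1, "name": "Blue shirt", "price": 20},
    {"_id": 2, "name": "Red shirt", "price": 35},
    {"_id": 3, "name": "Blue jeans", "price": 50},
]


def find(collection, predicate, fields):
    """Return only the requested fields of the documents matching the predicate."""
    return [
        {f: doc[f] for f in fields if f in doc}
        for doc in collection
        if predicate(doc)
    ]


# Range query on one field, returning only the name.
mid_priced = find(products, lambda d: 30 <= d["price"] <= 60, ["name"])

# Regular-expression query on another field, returning only the _id.
blue_items = find(products, lambda d: re.search(r"^Blue", d["name"]), ["_id"])


def build_index(collection, field):
    """Map each value of `field` to the documents holding it, avoiding a full scan."""
    index = defaultdict(list)
    for doc in collection:
        index[doc[field]].append(doc)
    return index


price_index = build_index(products, "price")
print(mid_priced, blue_items, price_index[35])
```

Without the index, every lookup by price has to scan all documents; with it, the lookup goes straight to the matching entries.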
Let us now take a deep dive into the MongoDB architecture.
But before we do that, we need to understand the CAP Theorem.
The CAP Theorem
CAP denotes the trifecta of Consistency, Availability, and Partition Tolerance.
Let Us Look At What Each Term Means In This Context
Consistency
If you write data into a distributed database, you should be able to access the same data from any node in the system at any point in time. It’s about preserving the integrity of the written data.
Availability
This is about minimizing the downtime of a system. Read/write operations should happen on any machine in the cluster, without fail.
Partition Tolerance Or Fault Tolerance
This indicates a system’s ability to keep functioning even in the case of a network partition, i.e., when some parts of the cluster temporarily cannot talk to each other. Once the partition heals, the nodes should be able to synchronize effectively again.
In practice, a distributed system HAS to be Partition Tolerant, because network failures cannot be ruled out. A network partition must not bring the whole system crashing down.
In other words, while a partition lasts, you can guarantee only one of ‘Consistency’ and ‘Availability’ in a distributed system, with Partition Tolerance being a given.
This gives rise to the CAP triangle, with Consistency, Availability, and Partition Tolerance at its three corners. (Figure source: Data Science Pedia)
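In its default configuration, MongoDB picks consistency over availability whenever there is a partition in the system (CP). Writes go only to the primary, so if the primary is cut off from a majority of its replica set, write operations are blocked until a primary that can reach a majority is available and can ensure the accurate execution of those writes.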
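The CP trade-off can be sketched with a toy store that refuses writes whenever it cannot reach a majority of nodes. The `CPStore` class and its attributes are hypothetical and greatly simplified; this is not how MongoDB is implemented, only an illustration of choosing consistency over availability during a partition.

```python
class CPStore:
    """Toy CP store: accepts a write only if a majority of nodes is reachable."""

    def __init__(self, node_count):
        self.node_count = node_count
        self.reachable = node_count  # nodes the writer can currently contact
        self.data = {}

    def has_majority(self):
        return self.reachable > self.node_count // 2

    def write(self, key, value):
        if not self.has_majority():
            # Prefer consistency: refuse the write rather than risk divergence.
            raise RuntimeError("no majority reachable; write rejected")
        self.data[key] = value
        return True


store = CPStore(node_count=3)
store.write("user:1", "Asha")  # all 3 nodes reachable: accepted

store.reachable = 1  # a partition isolates this node from the other two
try:
    store.write("user:2", "Ravi")
except RuntimeError as err:
    print(err)
```

The second write is rejected, so the store stays consistent but is unavailable for writes until the partition heals.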
MongoDB Architecture
MongoDB employs a single-master architecture, meaning there is one primary machine that takes charge of all client-side write operations. All other instances you add later to the cluster are secondary nodes, which replicate the primary’s data and can also serve read operations when the client’s read preference allows it (by default, reads go to the primary).
Secondaries are essentially backup copies of the primary server, acting as a failsafe against the primary crashing.
All these servers are grouped in Replica Sets. You can have multiple Replica Sets, each having its own primary and secondary servers.
In case the primary goes down, the remaining members hold an election and choose a new primary from among the secondary nodes. The choice is not arbitrary: a candidate needs votes from a majority of the voting members, and factors such as how up to date its data is and its configured priority affect whether it can win. This is why an odd number of voting members (minimum 3) is recommended, so that a primary can be elected with a clear majority.
If you don’t want to run three data-bearing servers, you can add an ‘Arbiter’ node, which stores no data and whose only job is to vote on electing the primary.
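The arithmetic behind the majority rule and the arbiter is easy to check. The sketch below covers only the vote counting; the function names are hypothetical, and real elections also take data freshness and priorities into account.

```python
def votes_needed(voting_members):
    """Strict majority of the voting members."""
    return voting_members // 2 + 1


def can_elect_primary(voting_members, reachable_voters):
    """A primary can be elected only if a majority of voters can be reached."""
    return reachable_voters >= votes_needed(voting_members)


# Two data-bearing servers: losing one leaves 1 of 2 votes, not a majority.
print(can_elect_primary(voting_members=2, reachable_voters=1))

# Adding an arbiter gives 3 voters: losing one server still leaves 2 of 3.
print(can_elect_primary(voting_members=3, reachable_voters=2))

# Four voters tolerate no more failures than three do.
print(votes_needed(3), votes_needed(4))
```

With three voters, two votes are enough; with four, three are needed, so the fourth member adds cost without adding fault tolerance.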
Sharding
Sharding in MongoDB lets you distribute your Big Data across several machines, each holding one portion (a shard) of the data.
Source: MongoDB Documentation
Suppose you have an application with millions of users. Sharding lets you partition these users (based on a shard key, such as a User ID) across different shards, each of which is typically a replica set. The application server connects to a routing process called mongos, which consults the config servers (three in the classic deployment; a replica set in current versions) to find out which ‘Shard’ contains the data it is seeking. A balancer process runs in the background to automatically distribute the load (in this case, the number of users) evenly between all the shards.
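The routing idea can be sketched with a toy hash-based router. This is a simplification under stated assumptions: the shard names and both functions are hypothetical, and a plain hash-modulo scheme is used here, whereas MongoDB maps ranges of (optionally hashed) shard-key values to shards using metadata held by the config servers.

```python
import hashlib

SHARDS = ["shard-a", "shard-b", "shard-c"]  # hypothetical shard names


def shard_for(user_id, shards=SHARDS):
    """Pick a shard by hashing the shard key (here, the user ID)."""
    digest = hashlib.md5(str(user_id).encode("utf-8")).hexdigest()
    return shards[int(digest, 16) % len(shards)]


def distribution(user_ids, shards=SHARDS):
    """Count how many users land on each shard."""
    counts = {name: 0 for name in shards}
    for uid in user_ids:
        counts[shard_for(uid, shards)] += 1
    return counts


print(shard_for(42))
print(distribution(range(10_000)))
```

The same user ID always routes to the same shard, and hashing spreads a large population of users roughly evenly, which is the effect the balancer aims to maintain.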
Conclusion
We hope this article helped you understand how the MongoDB Architecture works and how the system operates. To know more, please look at our other blogs.
As a developer, what is the way to work with MongoDB documents?
MongoDB aims to keep developers productive by providing native drivers for popular programming languages and frameworks, including Python, Perl, Java, Scala, PHP, and C#/.NET. Because each driver follows the conventions of its language, it is easy for developers to work with documents directly from their application code. Furthermore, developers can use MongoDB Compass, the graphical user interface for MongoDB, to explore and manipulate their data visually, including large data sets. Compass can also be used for plenty of other purposes, such as schema validation and analysing query performance.
Is there a specific place to run MongoDB?
No. MongoDB is versatile enough to run on most platforms, so you are not tied to a specific environment. MongoDB Atlas is a fully managed service that runs MongoDB on Google Cloud, AWS, and Azure. You can also download MongoDB and run it yourself. If you have your own infrastructure, you can use MongoDB Ops Manager to deploy, monitor, and back up your MongoDB data. MongoDB Cloud Manager offers similar capabilities as a hosted service.
How does MongoDB keep up with high availability?
MongoDB achieves high availability through replica sets. A replica set keeps copies of the same data on multiple servers, which can be located in different data centres. If one server goes down, another member can take over, which protects the database from downtime. A replica set is made up of replica set members. Out of these members, one is the primary, and the rest are secondaries.
