Big Data, as a concept, has been invoked in almost every conversation about digital innovation, the Internet of Things (IoT), and data science research. However, there’s still some confusion about what exactly the term means. In this Big Data tutorial, we aim to clarify the essentials you need to know before getting started with Big Data.
Simply put, big data is the gathering, analysis, and processing of large amounts of varied data emerging from multiple sources. These large datasets can provide insights into human behaviour and inform business practices, strategies, product design, artificial intelligence, and more. In this Big Data tutorial, we’ll walk you through the key concepts and terminology around the buzzword.
We hope that by the end of this tutorial, you’ll know enough to take your first steps into Big Data. But before we proceed, let’s look at the difference between small data and Big Data.
Small data vs. Big Data
It’s easy to understand the scope of big data through comparison to small data. Small data is information that can be managed by a single machine, or by using traditional methods of analysis. The source and impact of this data are on a smaller scale. For example, production logs can be used to develop weekly performance reports on the productivity of a manufacturing line; or survey results can be used in a marketing report about brand perception.
To understand the clear distinction between the two types of data, all we have to do is look at some statistics: one widely cited projection estimated that by 2020, every person on earth would generate 1.7 MB of data per second, sourced from over 50 billion devices connected to the internet. Data on that scale, from so many sources, can be used to inform business decisions for entire industries, restructure e-commerce sites, and even revolutionize health-care delivery.
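To put the 1.7 MB-per-second figure in perspective, the short Python sketch below converts it into a daily volume. It assumes decimal units (1 GB = 1,000 MB), and the function name `daily_volume_mb` is purely illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds in a day


def daily_volume_mb(rate_mb_per_second: float) -> float:
    """Convert a sustained data rate in MB per second into MB per day."""
    return rate_mb_per_second * SECONDS_PER_DAY


per_person_mb = daily_volume_mb(1.7)  # the projected per-person rate quoted above
per_person_gb = per_person_mb / 1000  # assumes decimal units: 1 GB = 1,000 MB
print(f"{per_person_mb:,.0f} MB/day, roughly {per_person_gb:,.1f} GB/day per person")
```

At that rate, a single person would account for roughly 147 GB of data per day.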
Now that you have a rough idea of what Big Data is, let’s take this Big Data tutorial a step further and talk about the core concepts.
Big Data Characteristics
How do you process heterogeneous data on such a large scale, where traditional methods of analytics fail? This has been one of the most significant challenges for big data scientists. To frame the answer, Doug Laney, an analyst at Gartner, presented three fundamental concepts to define “big data”: volume, velocity, and variety, often called the Three Vs.
Volume
The sheer amount of data is the primary distinguisher when it comes to Big Data systems. Each of us has a digital footprint, and the number of datasets that can be gathered from our devices is mind-boggling. Take Facebook, for example: as of 2016, there were 2.6 trillion posts on the social networking platform. Twitter sees about 500 million tweets per day. Add all the other digital devices people are connected to, and it is easy to see how every human on the planet generates an average of 0.77 GB of data per day.
Velocity
By one widely quoted estimate, 90% of the data currently available was generated in the last two years alone, and 2.5 quintillion bytes of data are generated every single day. This data is expected to be processed in real time (or near real time) to produce insights that do not become obsolete in a constantly changing world. This is why big data analysts have moved away from a purely batch-oriented approach and adopted real-time analysis, so that the information they generate reflects the current situation.
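The shift from batch to real-time processing can be illustrated with a minimal Python sketch: instead of waiting for a complete dataset, a streaming job updates its result as each event arrives, here by counting events in a trailing time window. The `(timestamp, payload)` event format, the 5-second window, and the function names are assumptions for illustration; production systems would rely on a dedicated stream-processing framework rather than an in-memory deque.

```python
from collections import deque
from typing import Deque, Iterable, Iterator, List, Tuple

Event = Tuple[float, str]  # (timestamp in seconds, payload); an assumed event format


def batch_count(events: List[Event]) -> int:
    """Batch style: wait for the complete dataset, then compute a single answer."""
    return len(events)


def sliding_window_counts(events: Iterable[Event], window_seconds: float) -> Iterator[Tuple[float, int]]:
    """Streaming style: emit an updated count as each event arrives.

    Counts events whose timestamps fall in the trailing window
    (timestamp - window_seconds, timestamp]. Events are assumed to arrive in time order.
    """
    window: Deque[float] = deque()
    for timestamp, _payload in events:
        window.append(timestamp)
        # Evict events that have aged out of the trailing window.
        while window and window[0] <= timestamp - window_seconds:
            window.popleft()
        yield timestamp, len(window)


stream = [(0.0, "click"), (1.5, "click"), (2.0, "view"), (6.0, "click"), (8.0, "view")]
for ts, count in sliding_window_counts(stream, window_seconds=5.0):
    print(f"t={ts:>4}s  events in last 5 s: {count}")
print("batch total:", batch_count(stream))
```

Each windowed count is available the moment its event arrives, whereas the batch total only exists once every event has been collected.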
Variety
What makes big data systems so relevant to businesses and communities is that these are unique datasets: they emerge from varied sources and are processed using diverse methods. Data can be sourced from social media feeds, physical devices such as Fitbit trackers, home security systems, automobile GPS systems, and more. The data itself is hugely diverse: it could be rich media (photos, videos, audio), structured logs, or unstructured data. The USP of big data is that it consolidates all this information, regardless of its origin, into a comprehensive dataset for each user.
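Consolidating varied sources usually starts by mapping every input onto a common schema. The Python sketch below does this for three hypothetical feeds (a fitness-tracker JSON payload, a GPS log in CSV form, and a free-text social post); all field names and formats are invented for illustration and do not correspond to any real device or platform API.

```python
import csv
import io
import json
from typing import Any, Dict, List

Record = Dict[str, Any]  # common schema: user_id, source, metric, value


def from_tracker_json(raw: str) -> Record:
    """Hypothetical fitness-tracker payload, e.g. '{"uid": "u1", "steps": 8200}'."""
    data = json.loads(raw)
    return {"user_id": data["uid"], "source": "tracker", "metric": "steps", "value": data["steps"]}


def from_gps_csv(raw: str) -> List[Record]:
    """Hypothetical GPS log exported as CSV with columns user_id,km_driven."""
    rows = csv.DictReader(io.StringIO(raw))
    return [
        {"user_id": row["user_id"], "source": "gps", "metric": "km_driven", "value": float(row["km_driven"])}
        for row in rows
    ]


def from_social_post(post: Dict[str, str]) -> Record:
    """Hypothetical social-media post; the unstructured text is reduced to a word count."""
    return {"user_id": post["author"], "source": "social", "metric": "post_words", "value": len(post["text"].split())}


def build_user_profiles(records: List[Record]) -> Dict[str, Dict[str, Any]]:
    """Consolidate normalized records into one metrics dictionary per user."""
    profiles: Dict[str, Dict[str, Any]] = {}
    for record in records:
        profiles.setdefault(record["user_id"], {})[record["metric"]] = record["value"]
    return profiles


records = [from_tracker_json('{"uid": "u1", "steps": 8200}')]
records += from_gps_csv("user_id,km_driven\nu1,12.5\nu2,40.0\n")
records.append(from_social_post({"author": "u2", "text": "Stuck in traffic again"}))
print(build_user_profiles(records))
```

Keeping a `source` field on every normalized record is a design choice that preserves where each value came from, even after the per-user profiles are merged.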
The Three Vs have been used to distinguish big data since 2001, but more recent accounts favour adding veracity, visualization, variability, and value to this list, which widens the scope of big data analysis even further.
That covers the characteristics of Big Data. Next in this Big Data tutorial, let’s talk about how to make this data workable and derive insights from it.
How to make sense of big data?
The USP of Big Data is the variety of insights that can be drawn from it. These usually cannot be obtained through traditional methods, as many of the insights, trends, and patterns are not obvious. Moreover, analysis tools built for small data do not scale to the volume and variety of content that big data involves.
To overcome these barriers, various new technologies have been developed, the most popular being Apache Hadoop. These technologies use clusters of computers to ingest information into a data system, process and analyze the data, and visualize the resulting data streams.
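Hadoop’s classic processing model, MapReduce, splits a job into a map phase that runs in parallel on each node’s slice of the data, a shuffle that groups intermediate results by key, and a reduce phase that aggregates them. The pure-Python sketch below simulates that flow for a word count on a single machine; it does not use Hadoop’s actual API, and the two-partition input is an assumption chosen for illustration.

```python
from collections import defaultdict
from itertools import chain
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(partition: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Map: emit (word, 1) for every word in one partition of the input."""
    for line in partition:
        for word in line.lower().split():
            yield word, 1


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all emitted values by key, as the framework does between phases."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped: Dict[str, List[int]]) -> Dict[str, int]:
    """Reduce: combine the values for each key into a final count."""
    return {key: sum(values) for key, values in grouped.items()}


# Each inner list stands in for a block of data stored on a different cluster node.
partitions = [
    ["big data needs big clusters"],
    ["clusters process data in parallel"],
]
# On a real cluster each partition would be mapped on its own node; here they run sequentially.
mapped = chain.from_iterable(map_phase(p) for p in partitions)
word_counts = reduce_phase(shuffle_phase(mapped))
print(word_counts)
```

Because each map call only sees its own partition, the map phase can be spread across as many machines as there are data blocks, which is what lets the approach scale.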
Big Data has found a firm place in almost every imaginable domain, and it would be wrong not to talk about what it is already achieving.
Let’s wrap up this Big Data tutorial by looking at some applications of Big Data:
Applications of Big Data
- Personal development: On a more individual level, big data is being used to optimize personal health. Armbands and smartwatches use data about sleep cycles, calorie consumption, activity levels, and more to develop insights on improving the user’s health, which are then fed back to each user in a personalized way.
- Advertising: Marketing companies are using a variety of data points, including GPS, traffic patterns, and eye-movement tracking, to determine which advertisements people are most interested in, and thereby build more accurate marketing strategies. This is a break from traditional marketing, where ads were priced simply ‘per impression’.
- Supply chain optimization: Big data plays a major role in delivery route optimization (a huge concern for companies like Amazon and eBay), where live traffic data, driver behaviour, and more are tracked using radio-frequency identification (RFID) tags and GPS systems to identify the best route for the time of day and year.
- Weather forecasting: Mobile phone applications are being used to crowdsource information about weather patterns in real time. By combining readings from ambient thermometers, barometers, and hygrometers, these apps can feed real-time data into predictive models, which can significantly improve the accuracy of weather forecasting systems.
- Building smart city infrastructure: Cities are piloting big data analysis systems to develop smart city infrastructure. Drought-ridden California used big data analytics to track consumers’ water usage, helping cut water usage by 80%. Los Angeles has reduced its traffic congestion by 16% by monitoring traffic signals around the city.
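Delivery route optimization, described in the supply chain item above, is at its core a shortest-path problem in which live traffic data changes the edge weights. A minimal Python sketch using Dijkstra’s algorithm follows; the four-node road network, travel times, and congestion multipliers are made up for illustration, and real routing systems work on far larger graphs with richer time-of-day models.

```python
import heapq
from typing import Dict, List, Tuple

# Hypothetical road network: base travel time in minutes between locations.
ROADS: Dict[str, List[Tuple[str, float]]] = {
    "depot": [("highway", 10.0), ("downtown", 6.0)],
    "highway": [("customer", 8.0)],
    "downtown": [("customer", 7.0)],
    "customer": [],
}

# Hypothetical live-traffic multipliers per road segment (absent = free-flowing, 1.0).
TRAFFIC: Dict[Tuple[str, str], float] = {
    ("depot", "downtown"): 2.0,  # rush-hour congestion
    ("downtown", "customer"): 1.5,
}


def fastest_route(start: str, goal: str, traffic: Dict[Tuple[str, str], float]) -> Tuple[float, List[str]]:
    """Dijkstra's shortest-path algorithm over traffic-adjusted travel times."""
    queue: List[Tuple[float, str, List[str]]] = [(0.0, start, [start])]
    best: Dict[str, float] = {start: 0.0}
    while queue:
        minutes, node, path = heapq.heappop(queue)
        if node == goal:
            return minutes, path
        if minutes > best.get(node, float("inf")):
            continue  # stale entry: a faster path to this node was already found
        for neighbour, base in ROADS[node]:
            cost = minutes + base * traffic.get((node, neighbour), 1.0)
            if cost < best.get(neighbour, float("inf")):
                best[neighbour] = cost
                heapq.heappush(queue, (cost, neighbour, path + [neighbour]))
    return float("inf"), []  # goal unreachable


print(fastest_route("depot", "customer", TRAFFIC))  # with live traffic
print(fastest_route("depot", "customer", {}))       # free-flowing roads
```

With free-flowing roads the downtown route is quicker, but once the congestion multipliers are applied the highway route wins, which is exactly the kind of decision live traffic data is meant to inform.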
With each passing year, Big Data only gets bigger and tightens its grip on every domain. We hope this Big Data tutorial has helped you understand the hype behind the term “Big Data”. If you’re interested in diving deeper, there are numerous Big Data tutorials, courses, and certifications that can help you get started.
Don’t wait any longer: let this Big Data tutorial be the spark you need to tame the beast that is big data.
By Mohit Soni