
The Ultimate Data Science Cheat Sheet Every Data Scientist Should Have

Last updated:
29th Jan, 2021
Read Time
12 Mins

For all those budding professionals and newbies alike who are thinking of taking a dive into the booming world of data science, we have compiled a quick cheat sheet to get you brushed up on the basics and methodologies that underlie this field.

Data Science-The Basics

The data generated in our world starts out in raw form, i.e., numbers, codes, words, sentences, etc. Data Science processes this raw data using scientific methods and transforms it into meaningful forms that yield knowledge and insights.


Before we dive into the tenets of data science, let’s talk a bit about data, its types, and data processing.

Types of Data

Structured – Data that is stored in a tabulated format in databases. It can be either numeric or text

Unstructured – Data that cannot be tabulated with any definitive structure to speak of is called unstructured data

Semi-structured – Mixed data with traits of both structured and unstructured data

Quantitative – Data with definite numeric values that can be quantified

Big Data – Data stored in huge databases spanning multiple computers or server farms is called Big Data. Biometric data, social media data, etc. are considered Big Data. Big Data is commonly characterised by 4 V’s: Volume, Velocity, Variety, and Veracity


Data Preprocessing

Data Classification – It’s the process of categorizing or labeling data into classes such as numerical, text, image, video, etc.

Data Cleansing – It consists of weeding out missing/inconsistent/incompatible data or replacing data using one of the following methods.

  1. Interpolation
  2. Heuristic
  3. Random Assignment
  4. Nearest Neighbour
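As a rough illustration of the first method, here is a minimal sketch of linear interpolation in plain Python. The function name `interpolate_missing` and the sample readings are made up for this example, and it assumes missing values are marked as `None` and sit between two known values.

```python
def interpolate_missing(values):
    """Fill None gaps by linear interpolation between the known neighbours.

    Gaps at the very start or end of the list are left as None, because
    there is no neighbour on one side to interpolate from.
    """
    filled = list(values)
    known = [i for i, v in enumerate(filled) if v is not None]
    # Walk over each pair of consecutive known positions and fill between them.
    for left, right in zip(known, known[1:]):
        step = (filled[right] - filled[left]) / (right - left)
        for i in range(left + 1, right):
            filled[i] = filled[left] + step * (i - left)
    return filled


# Hypothetical sensor readings with missing entries.
readings = [10.0, None, None, 16.0, 18.0, None, 22.0]
cleaned = interpolate_missing(readings)
print(cleaned)
```

Interpolation suits ordered data such as time series. For unordered records, one of the other listed methods is usually a better fit.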

Data Masking – Hiding or masking out confidential data to maintain the privacy of sensitive information while still being able to process it.
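One simple way to picture masking is to hide all but the last few characters of a sensitive value. The sketch below is only an illustration: the helper `mask_value` and the sample card number are hypothetical, and real systems often use stronger techniques such as tokenization or hashing.

```python
def mask_value(value, visible=4, mask_char="*"):
    """Replace all but the last `visible` characters with `mask_char`."""
    text = str(value)
    if len(text) <= visible:
        # Too short to reveal anything safely, so mask everything.
        return mask_char * len(text)
    return mask_char * (len(text) - visible) + text[-visible:]


# Hypothetical test card number, not a real account.
masked = mask_value("4111111111111111")
print(masked)
```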


What is Data Science Made of?

Concepts of Statistics


Linear Regression

Linear Regression is used to establish a relationship between two variables such as supply and demand, price and consumption, etc. It models one variable y as a linear function of another variable x as follows

Y = f(x) or Y = mx + c, where m = coefficient (slope) and c = intercept
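To make the formula concrete, here is a minimal sketch that estimates m and c with ordinary least squares in plain Python. The function name `fit_line` and the data points are assumptions made for this example.

```python
def fit_line(xs, ys):
    """Return (m, c) for the least-squares line y = m*x + c."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = co-variation of x and y divided by the variation of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    m = sxy / sxx
    # The fitted line always passes through the point (mean_x, mean_y).
    c = mean_y - m * mean_x
    return m, c


# Illustrative data generated from y = 2x + 1.
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]
m, c = fit_line(xs, ys)
print(m, c)
```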

Logistic regression

Logistic regression establishes a probabilistic relationship rather than a linear one between variables. The model outputs a probability p between 0 and 1 that follows an S-shaped curve, and the predicted class is either 0 or 1.

p = e^(b0 + b1x) / (1 + e^(b0 + b1x))

where b0 = bias and b1 = coefficient

If p < 0.5, then the predicted class is 0, else 1
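The logistic formula and the 0.5 cut-off can be sketched in a few lines of Python. The function names and the values of b0 and b1 below are illustrative assumptions, not fitted parameters. The code uses the equivalent form 1 / (1 + e^-(b0 + b1x)).

```python
import math


def predict_probability(x, b0, b1):
    """Logistic function: e^z / (1 + e^z), written as 1 / (1 + e^-z)."""
    z = b0 + b1 * x
    return 1.0 / (1.0 + math.exp(-z))


def predict_class(x, b0, b1, threshold=0.5):
    """Return 0 when the probability is below the threshold, else 1."""
    return 1 if predict_probability(x, b0, b1) >= threshold else 0


# Hypothetical parameters: bias b0 = -1, coefficient b1 = 1.
print(predict_probability(0, 0, 1))
print(predict_class(2, -1, 1), predict_class(0, -1, 1))
```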


Probability helps to predict the likelihood of occurrence of an event. Some terminology:

Sample space: The set of all possible outcomes

Event: A subset of the sample space

Random Variable: A random variable maps each outcome in the sample space to a number

Probability Distributions

Discrete Distributions: Give the probability of each value of a random variable that takes discrete (countable) values, such as integers

P[X=x] = p(x)
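As a small worked example, the probability mass function p(x) of a fair six-sided die can be written as a dictionary. The die is an assumption chosen for illustration. Exact fractions are used so the probabilities sum to exactly 1.

```python
from fractions import Fraction

# P[X = x] = 1/6 for each face x of a fair die.
pmf = {face: Fraction(1, 6) for face in range(1, 7)}

# The probabilities of all outcomes in the sample space must add up to 1.
total = sum(pmf.values())

# Probability of the event "the roll is even" is the sum over its outcomes.
p_even = sum(p for face, p in pmf.items() if face % 2 == 0)

print(total, p_even)
```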


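Continuous Distributions: Give the probability that a random variable falls within an interval of continuous values instead of at discrete points, using a probability density function f(x). Formula:

$P[a \le X \le b] = \int_a^b f(x)\,dx$, where a and b are the endpoints of the interval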
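The integral can be approximated numerically. The sketch below assumes an exponential density with rate 1 as the example f(x) and uses the trapezoidal rule. Both choices, and the helper names, are illustrative.

```python
import math


def exponential_pdf(x, rate=1.0):
    """Density of an exponential distribution (zero for negative x)."""
    return rate * math.exp(-rate * x) if x >= 0 else 0.0


def probability_between(pdf, a, b, steps=10_000):
    """Approximate the integral of pdf from a to b with the trapezoidal rule."""
    width = (b - a) / steps
    total = 0.5 * (pdf(a) + pdf(b))
    for i in range(1, steps):
        total += pdf(a + i * width)
    return total * width


p = probability_between(exponential_pdf, 0.0, 1.0)
exact = 1 - math.exp(-1)  # closed-form value of the same integral
print(p, exact)
```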


Correlation and Covariance

Standard Deviation: The spread of the values in a dataset around its mean value

$\sigma = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \bar{x})^2}{N - 1}}$


Covariance defines the extent to which random variables X and Y vary together around their respective means.

$\mathrm{Cov}(X, Y) = \sigma_{XY} = E[(X - \mu_X)(Y - \mu_Y)] = E[XY] - \mu_X \mu_Y$


Correlation defines the extent of a linear relationship between variables along with their direction, +ve or -ve

$\rho_{XY} = \frac{\sigma_{XY}}{\sigma_X \sigma_Y}$
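The three quantities in this section can be computed together in plain Python. This is a minimal sketch: the function names and the data are assumptions, and it uses sample estimates (dividing by N - 1) for both the standard deviation and the covariance so that the correlation ratio is consistent.

```python
import math


def mean(values):
    return sum(values) / len(values)


def sample_std(values):
    """Sample standard deviation (divides by N - 1)."""
    m = mean(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))


def sample_covariance(xs, ys):
    """Sample covariance of paired observations (divides by N - 1)."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)


def correlation(xs, ys):
    """Covariance scaled by both standard deviations, so it lies in [-1, 1]."""
    return sample_covariance(xs, ys) / (sample_std(xs) * sample_std(ys))


# Illustrative data where y is exactly 2x, a perfect positive linear relationship.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 6, 8, 10]
print(sample_std(xs), sample_covariance(xs, ys), correlation(xs, ys))
```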


Artificial Intelligence

The ability of machines to acquire knowledge and make decisions based on inputs is called Artificial Intelligence or simply AI.


  1. Reactive Machines: Reactive machine AI works by learning to react to predefined scenarios by narrowing down to the fastest and best options. These machines lack memory and are best for tasks with a defined set of parameters. They are highly reliable and consistent.
  2. Limited Memory: This AI has some real-world observational and legacy data fed to it. It can learn and make decisions based on the given data but cannot gain new experiences.
  3. Theory of Mind: It is an interactive AI that can make decisions based on the behaviour of the surrounding entities.
  4. Self Awareness: This AI is aware of its existence and functioning apart from the surroundings. It can develop cognitive abilities and understand and evaluate the impacts of its own actions on the surroundings.


AI terms

Neural Networks

Neural Networks are networks of interconnected nodes that relay data and information in a system. NNs are modeled to mimic neurons in our brains and can make decisions by learning and predicting.


Heuristics is the ability to predict based on approximations and estimates quickly using prior experience in situations where available information is patchy. It’s quick but not accurate or precise.

Case-Based Reasoning

The ability to learn from previous problem-solving cases and apply them in current situations to arrive at an acceptable solution

Natural Language Processing

It’s simply the ability of a machine to understand and interact directly in human speech or text. For example, voice commands in a car.

Machine Learning

Machine Learning is simply an application of AI using various models and algorithms to predict and solve problems.



Supervised learning relies on input data that is associated with known output data. The machine is provided with a set of target variables Y, and it has to learn to arrive at them from a set of input variables X, with an optimization algorithm reducing the error between its predictions and the targets. Examples of supervised learning are Neural Networks, Random Forest, Deep Learning, Support Vector Machines, etc.


In unsupervised learning, input variables have no labeling or association, and algorithms work to find patterns and clusters resulting in new knowledge and insights.


Reinforcement learning focuses on improving the learning behaviour through trial and error. It is a reward-based method where the machine gradually improves its actions to maximize a target reward.

Modeling Methods


Regression models always give numbers as output through interpolation or extrapolation of continuous data.


Classification models come up with outputs as a class or label and are better at predicting discrete outcomes like ‘what kind’

Both regression and classification are supervised models.


Clustering is an unsupervised model that identifies clusters based on traits, attributes, features, etc.
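To show what identifying clusters can look like, here is a minimal sketch of k-means on one-dimensional data. k-means is one common clustering algorithm that this cheat sheet does not name, so treat it as an illustrative choice. The function name, data points, and starting centroids are all assumptions.

```python
def k_means_1d(points, centroids, iterations=10):
    """Group 1-D points around centroids by repeated assign-and-update steps."""
    centroids = list(centroids)
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters


# Unlabelled data with two visible groups, and two hypothetical starting centroids.
points = [1.0, 1.5, 2.0, 9.0, 10.0, 11.0]
centroids, clusters = k_means_1d(points, centroids=[0.0, 5.0])
print(centroids, clusters)
```

No labels are supplied at any point. The two groups emerge only from the distances between the points.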

ML Algorithms

Decision Trees

Decision trees use a binary approach to arrive at a solution based on successive questions at each stage such that the outcome is either of the two possible ones like ‘Yes’ or ‘No’. Decision trees are simple to implement and interpret.

Random Forest or Bagging

Random Forest is an ensemble of decision trees. It trains a large number of decision trees, each on a random sample of the data (bagging), which makes the structure dense and complex like a forest. It combines the outcomes of all the trees, by majority vote or averaging, which generally leads to more accurate results than a single tree.

K- Nearest Neighbour (KNN)

kNN uses the nearest data points on a plot relative to a new data point to predict which category it falls in. The new data point gets assigned to the category that is most common among its k nearest neighbours.

k = number of nearest neighbours
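The voting idea can be sketched in plain Python as follows. The function name `knn_predict`, the Euclidean distance metric, and the tiny labelled dataset are assumptions made for this example. `math.dist` requires Python 3.8 or later.

```python
import math
from collections import Counter


def knn_predict(training, new_point, k=3):
    """Classify new_point by majority vote among its k nearest neighbours."""
    # Sort labelled points by Euclidean distance to the new point.
    by_distance = sorted(training, key=lambda item: math.dist(item[0], new_point))
    labels = [label for _, label in by_distance[:k]]
    # The most common label among the k closest points wins.
    return Counter(labels).most_common(1)[0][0]


# Hypothetical labelled points: (coordinates, category).
training = [
    ((1, 1), "A"), ((1, 2), "A"), ((2, 1), "A"),
    ((6, 6), "B"), ((7, 6), "B"), ((6, 7), "B"),
]
print(knn_predict(training, (2, 2)), knn_predict(training, (6, 5)))
```

An odd k is a common choice for two-category problems because it avoids tied votes.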

Naïve Bayes

Naïve Bayes works on two pillars: first, the assumption that the features of a data point are independent of each other, and second, the Bayes theorem, which gives the probability of an outcome given an observed condition.

Bayes Theorem:

P(X|Y) = {P(Y|X) * P(X)} / P(Y)

Where P(X|Y) = Conditional probability of X given occurrence of Y

P(Y|X) = Conditional probability of Y given occurrence of X

P(X), P(Y) = Probability of X and Y individually
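Here is a short worked example of the theorem in Python. All the numbers are hypothetical: X stands for "the email is spam" and Y for "the email contains the word offer". P(Y) is obtained from the law of total probability.

```python
def bayes_posterior(p_y_given_x, p_x, p_y):
    """P(X|Y) = P(Y|X) * P(X) / P(Y)."""
    return p_y_given_x * p_x / p_y


# Hypothetical probabilities for a toy spam filter.
p_spam = 0.2                 # P(X): prior probability that an email is spam
p_offer_given_spam = 0.6     # P(Y|X): "offer" appears in a spam email
p_offer_given_ham = 0.05     # "offer" appears in a non-spam email

# P(Y): overall probability of seeing "offer", summed over both cases.
p_offer = p_offer_given_spam * p_spam + p_offer_given_ham * (1 - p_spam)

posterior = bayes_posterior(p_offer_given_spam, p_spam, p_offer)
print(posterior)
```

With these made-up numbers, seeing the word raises the probability of spam from the prior of 0.2 to about 0.75.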

Support Vector Machines

This algorithm tries to segregate data in space based on boundaries which can be either a line or a plane. This boundary is called a ‘hyperplane’ and is defined by the nearest data points of each class which in turn are called ‘support vectors’. The distance between the hyperplane and the support vectors on either side is called the margin, and the algorithm picks the hyperplane that maximizes it.

Neural Networks


The most fundamental neural network, the perceptron, works by taking weighted inputs, summing them, and producing an output based on a threshold value.
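The weighted-sum-and-threshold idea fits in a few lines of Python. In this sketch the function name, the weights, and the bias are hand-picked assumptions that make the unit behave like a logical AND gate. Nothing is learned here.

```python
def perceptron(inputs, weights, bias, threshold=0.0):
    """Output 1 if the weighted sum of inputs plus bias exceeds the threshold."""
    weighted_sum = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1 if weighted_sum > threshold else 0


# Hand-picked (not trained) parameters that implement a logical AND.
weights = [1.0, 1.0]
bias = -1.5

for a in (0, 1):
    for b in (0, 1):
        print(a, b, perceptron([a, b], weights, bias))
```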

Feed Forward Neural Network 

FFN is the simplest network that transmits data in only one direction. May or may not have hidden layers.

Convolutional Neural Networks  

CNN uses a convolution layer to process small local regions of the input data, followed by a pooling layer that condenses the results before the output is produced.

Recurrent Neural Networks

RNN consists of a few recurrent layers between I/O layers that can store ‘historic’ data. The output of a recurrent layer at one step is fed back into it at the next step, so earlier inputs can improve later predictions.

Deep Neural Networks and Deep Learning

DNN is a network with multiple hidden layers between I/O layers. The hidden layers apply successive transformations to the data before sending it to the output layer.
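The successive transformations can be sketched as a forward pass in plain Python. Everything specific here is an assumption for illustration: the ReLU activation, the helper names, and the hand-written weights for two hidden layers and one output layer. A real network would learn its weights from data.

```python
def relu(vector):
    """Common activation: negative values become 0."""
    return [max(0.0, v) for v in vector]


def dense(inputs, weights, biases):
    """One layer: each output is a weighted sum of all inputs plus a bias."""
    return [
        sum(i * w for i, w in zip(inputs, row)) + b
        for row, b in zip(weights, biases)
    ]


def forward(inputs, layers):
    """Pass inputs through every layer; hidden layers also apply ReLU."""
    activation = inputs
    for index, (weights, biases) in enumerate(layers):
        activation = dense(activation, weights, biases)
        if index < len(layers) - 1:  # the last layer is the output layer
            activation = relu(activation)
    return activation


# Hypothetical weights: two hidden layers of 2 units, then 1 output unit.
layers = [
    ([[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0]),
    ([[1.0, 0.0], [-1.0, 1.0]], [0.0, 0.0]),
    ([[1.0, 1.0]], [0.5]),
]
output = forward([2.0, 1.0], layers)
print(output)
```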

‘Deep Learning’ is facilitated through DNN and can handle huge amounts of complex data and achieve high accuracy because of multiple hidden layers.



Data science is a vast field that runs through different streams but comes across as a revolution and a revelation for us. Data science is booming and will change how our systems work and feel in the future.

If you are curious to learn about data science, check out IIIT-B & upGrad’s PG Diploma in Data Science which is created for working professionals and offers 10+ case studies & projects, practical hands-on workshops, mentorship with industry experts, 1-on-1 with industry mentors, 400+ hours of learning and job assistance with top firms.


Rohit Sharma

Blog Author
Rohit Sharma is the Program Director for the UpGrad-IIIT Bangalore, PG Diploma Data Analytics Program.

Frequently Asked Questions (FAQs)

1. Which programming language is best suited for Data Science and why?

3. This language is by far one of the most convenient and easy to write languages with a clean syntax which improves its readability.

2. What are the concepts that make data science complete?

Machine Learning
Machine Learning is another crucial component of Data Science that deals with teaching machines to predict future outcomes based on the provided data. Machine learning has three prominent modelling methods- Clustering, regression, and Classification.

3. Describe the types of Machine Learning?

Machine Learning, or simply ML, has three major types based on their working methods. These types are as follows:
1. Supervised Learning
This is the most basic type of ML, where the input data is labelled. The machine is trained on a smaller set of labelled data that gives it an insight into the problem.
2. Unsupervised Learning
The biggest advantage of this type is that the data is unlabelled here and the human labour is almost negligible. This opens the gate for much larger datasets to be introduced to the model.
3. Reinforced Learning
This is the most advanced type of ML that gets inspired by the lives of human beings. Desired outputs are reinforced while the useless outputs are discouraged.
