Anyone interested in data science needs to understand probability distributions. Data science concepts ranging from inferential statistics to Bayesian networks are built on the basic concepts of probability, so learning probability is a must before entering the world of statistics.
What Is Statistics?
Statistics is the practice of collecting, analysing, and interpreting numerical data using different methods.
It gives us a more holistic view of the numbers. Statistics is crucial for data science: data science is all about figures, and statistics makes them simpler to handle and easier to comprehend.
What Is Probability?
Probability is an intuitive concept. We use it unknowingly in our daily life. Probability is a measure of how likely an event is to occur. For example, if there is a 60% chance of rain tomorrow, then the probability of rain is 60%, or 0.6.
What Is Probability Distribution?
A probability distribution can be represented as a table or an equation that links every outcome of a statistical experiment with its probability of occurrence.
Probability distributions can be calculated even for simple events, such as tossing a coin.
The following table shows the probability distribution for tossing a coin, listing each outcome with its probability.
| Number of heads | Probability |
Probability distributions can also describe complex events, such as the probability that a certain vaccine is effective against COVID-19.
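As a minimal sketch of how such a table can be computed for the coin example, the snippet below enumerates every equally likely sequence of tosses and counts the heads. It assumes a fair coin; the function name `heads_distribution` is made up for this illustration.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

def heads_distribution(n_tosses):
    """Return {number_of_heads: probability} for n_tosses of a fair coin (assumed fair)."""
    # Every sequence of H/T is equally likely when the coin is fair.
    outcomes = list(product("HT", repeat=n_tosses))
    counts = Counter(seq.count("H") for seq in outcomes)
    return {heads: Fraction(c, len(outcomes)) for heads, c in sorted(counts.items())}

# One toss: either 0 heads or 1 head.
for heads, prob in heads_distribution(1).items():
    print(heads, prob)
```

For a single toss this prints a probability of 1/2 for each outcome; passing a larger `n_tosses` gives the distribution for repeated tosses.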
Prerequisites of Probability Distribution
To know about probability distributions, you must know about variables and random variables.
- A variable is a symbol (A, B, x, y, etc.) that can take any of a specified set of values.
- A random variable is a variable whose value is determined by the outcome of a statistical experiment.
Usually, a capital letter denotes a random variable, and a lower-case letter denotes one of its values.
- X denotes the random variable X.
- P(X) denotes the probability of X.
- P(X = x) is the probability that the random variable X is equal to a particular value, denoted by x.
For example, P(X = 1) is the probability that the random variable X is equal to 1.
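The notation can be made concrete with a small sketch. It assumes a fair six-sided die as the experiment and defines a hypothetical random variable X that equals 1 when the roll is a 5 or a 6 and 0 otherwise; both the experiment and the rule are illustrative choices, not part of the definitions above.

```python
from fractions import Fraction

SAMPLE_SPACE = [1, 2, 3, 4, 5, 6]  # faces of a fair die (assumed experiment)

def X(outcome):
    """Hypothetical random variable: 1 if the roll is 5 or 6, otherwise 0."""
    return 1 if outcome >= 5 else 0

def prob_X_equals(x):
    """P(X = x): share of equally likely outcomes that X maps to the value x."""
    favourable = [o for o in SAMPLE_SPACE if X(o) == x]
    return Fraction(len(favourable), len(SAMPLE_SPACE))

print(prob_X_equals(1))  # P(X = 1)
print(prob_X_equals(0))  # P(X = 0)
```

Here X is the random variable, while 1 and 0 are the particular values x that it can take.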
Types of Probability Distributions
Statisticians divide probability distributions into the following types:
- Discrete Probability Distributions
- Continuous Probability Distributions
Discrete Probability Distributions
Discrete probability distributions are described by probability mass functions. The random variable takes only a countable set of distinct values.
For example, when you toss a coin, the outcome is discrete because there are no in-between values: you get either heads or tails. Similarly, when counting the number of books borrowed per hour at a library, you can count 31 or 32 books and nothing in between.
Types of Discrete Probability Distributions
- Binomial distributions – A binomial distribution counts the successes in a fixed number of independent trials. Its simplest case, a single trial, is the Bernoulli distribution, which has only two outcomes, 1 and 0. The random variable X takes the value 1 with probability of success p, and the value 0 with probability of failure q = 1 − p.
Thus, if you toss a coin, a head denotes success and a tail denotes failure.
The probability function is $P(X = x) = p^x (1-p)^{1-x}$, where $x \in \{0, 1\}$.
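The probability function above can be sketched in a few lines. The binomial extension to n trials is added here as an assumption about how the single-trial case generalises; the function names are illustrative only.

```python
from math import comb

def bernoulli_pmf(x, p):
    """P(X = x) for one trial: p**x * (1 - p)**(1 - x), with x in {0, 1}."""
    if x not in (0, 1):
        raise ValueError("x must be 0 or 1")
    return p**x * (1 - p) ** (1 - x)

def binomial_pmf(k, n, p):
    """P(exactly k successes in n independent trials, each with success probability p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

p = 0.5  # fair coin: heads counts as success
print(bernoulli_pmf(1, p), bernoulli_pmf(0, p))
print(binomial_pmf(1, 2, p))  # exactly one head in two tosses
```

With n = 1 the binomial formula reduces to the Bernoulli one, which is why the two are usually introduced together.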
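- Normal distributions – The normal distribution is strictly a continuous distribution, but it is listed here because it is the most commonly encountered distribution and often approximates discrete ones. It has the following characteristics:
- The mean, median, and mode coincide.
- The distribution curve is bell-shaped.
- The distribution curve is symmetric about x = μ.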
- The area under the curve is 1.
- Poisson distributions – A Poisson distribution models the number of events that occur in a fixed interval. Counting the number of books borrowed per hour at a library is an example.
Poisson distributions have the following assumptions:
- A successful event does not influence the outcome of another successful event.
- The average rate of success is constant, so the probability of a success in an interval is proportional to the length of that interval.
- The probability of success in an interval approaches zero as the interval becomes smaller.
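The Poisson model and its last assumption can be sketched as follows. The rate of 30 books per hour is a hypothetical figure chosen only to match the library example, and the function names are illustrative.

```python
from math import exp, factorial

def poisson_pmf(k, rate):
    """P(X = k): probability of exactly k events when the average is `rate` per interval."""
    return rate**k * exp(-rate) / factorial(k)

def prob_at_least_one(rate, fraction_of_interval):
    """P(at least one event) in a sub-interval covering the given fraction of the interval."""
    return 1 - exp(-rate * fraction_of_interval)

BOOKS_PER_HOUR = 30  # hypothetical average rate, for illustration only

print(poisson_pmf(31, BOOKS_PER_HOUR))  # exactly 31 books in an hour
print(poisson_pmf(32, BOOKS_PER_HOUR))  # exactly 32 books in an hour

# The probability of seeing any event shrinks toward zero with the interval.
for fraction in (1.0, 0.01, 0.0001):
    print(fraction, prob_at_least_one(BOOKS_PER_HOUR, fraction))
```

The loop at the end illustrates the third assumption: the shorter the interval, the closer the probability of a success gets to zero.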
Continuous Probability Distributions
Continuous probability distributions are described by probability density functions. A distribution is continuous if the variable can take an infinite number of values between any two values. Continuous variables are measured on scales, such as height, weight, and temperature.
Unlike discrete probability distributions, where individual values can have non-zero probability, a continuous distribution assigns zero probability to any single exact value; only ranges of values have non-zero probability. For example, the probability of measuring a temperature of exactly 40 degrees is zero.
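A short sketch can make the temperature example tangible. It assumes, purely for illustration, that temperature follows a normal distribution with mean 38 and standard deviation 2, and uses `statistics.NormalDist` from the Python standard library.

```python
from statistics import NormalDist

# Hypothetical model: mean 38 degrees, standard deviation 2 (illustrative values).
temperature = NormalDist(mu=38.0, sigma=2.0)

def prob_between(dist, low, high):
    """P(low <= X <= high), computed from the cumulative distribution function."""
    return dist.cdf(high) - dist.cdf(low)

print(prob_between(temperature, 40.0, 40.0))  # a single exact value: probability 0.0
print(prob_between(temperature, 39.5, 40.5))  # a range around 40: positive probability
print(temperature.pdf(40.0))                  # the density at 40 is not a probability
```

The density at 40 is positive even though the probability of exactly 40 is zero, which is the usual source of confusion between density and probability.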
Types of Continuous Probability Distributions
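- Uniform distributions – When rolling a die, the outcomes are 1 to 6. The probabilities of these outcomes are equal, which makes the distribution uniform. (A die is strictly a discrete uniform distribution; a continuous uniform distribution spreads probability evenly over an interval in the same way.)
Suppose the random variable X can take k different values $x_1, \dots, x_k$, each with the same probability.
Then $P(X = x_i) = 1/k$ for every $i$.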
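The 1/k rule for the die, and its continuous counterpart on an interval, can be sketched as below. The interval [a, b] and the function names are assumptions introduced for this example.

```python
from fractions import Fraction

def discrete_uniform_pmf(values):
    """Assign probability 1/k to each of the k distinct values."""
    k = len(values)
    return {v: Fraction(1, k) for v in values}

def continuous_uniform_prob(a, b, low, high):
    """P(low <= X <= high) for X spread evenly over the interval [a, b]."""
    low, high = max(low, a), min(high, b)  # clip the range to [a, b]
    return max(high - low, 0) / (b - a)

die = discrete_uniform_pmf([1, 2, 3, 4, 5, 6])
print(die[3])                                # probability of rolling a 3
print(continuous_uniform_prob(0, 10, 2, 5))  # share of [0, 10] covered by [2, 5]
```

In the continuous case the probability depends only on the length of the range, not on where it sits inside the interval.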
- Cumulative probability distributions – A cumulative probability is the probability that the value of a random variable X falls within a specified range, typically at or below a given value.
Suppose you toss a coin and ask for the probability of getting one or fewer heads. This is a cumulative probability, written P(X ≤ 1).
| Number of heads: x | Probability: P(X = x) | Cumulative probability: P(X ≤ x) |
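The three columns of such a table can be generated with a running sum over the individual probabilities. The sketch below assumes a fair coin and uses two tosses as a hypothetical example; the number of tosses is a parameter, not something fixed by the text.

```python
from fractions import Fraction
from itertools import accumulate
from math import comb

def heads_pmf(n_tosses):
    """P(X = x) for the number of heads in n_tosses of a fair coin (assumed fair)."""
    total = 2**n_tosses
    return {x: Fraction(comb(n_tosses, x), total) for x in range(n_tosses + 1)}

def cumulative(pmf):
    """P(X <= x) for each x, obtained as a running sum of P(X = x)."""
    xs = sorted(pmf)
    return dict(zip(xs, accumulate(pmf[x] for x in xs)))

pmf = heads_pmf(2)  # two tosses, chosen for illustration
cdf = cumulative(pmf)
for x in pmf:
    print(x, pmf[x], cdf[x])  # x, P(X = x), P(X <= x)
```

The last cumulative value always equals 1, since it covers every possible outcome.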
- A probability distribution shows the expected outcomes of the possible values for a given data-generating process.
- Probability distributions come in different types with different characteristics, which are mainly defined by the mean and standard deviation.
- Investors heavily rely on probability distributions to forecast returns on assets such as stocks over time and to foresee their risk.