Anyone interested in data science must know about probability distributions. Data science concepts ranging from inferential statistics to Bayesian networks are built on the basic concepts of probability. So to enter the world of statistics, learning probability is a must.
What Is Statistics?
Statistics is the practice of collecting, analysing and interpreting numerical data using different methods.
It gives us a more holistic view of the numbers. Statistics is crucial for data science: data science is all about figures, and statistics makes them simpler and easier to understand.
What Is Probability?
Probability is an intuitive concept; we use it unknowingly in our daily lives. Probability is the measure of how likely an event is to occur. For example, if there is a 60% chance of rain tomorrow, then the probability of rain is 60%, or 0.6.
What Is Probability Distribution?
A probability distribution can be represented as a table or an equation that pairs every outcome of a statistical experiment with its probability of occurrence.
Probability distributions can be calculated even for simple events, such as tossing a coin.
The following table shows the probability distribution for a single toss of a fair coin, pairing each outcome with its probability.
|Number of heads|Probability|
|---|---|
|0|0.5|
|1|0.5|
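As a small sketch, assuming a fair coin, the same distribution can be held either as a table (here a dictionary from outcome to probability) or as an equation (here a function). The names `coin_table` and `coin_pmf` are illustrative, not from any library.

```python
from fractions import Fraction

# Table form: each outcome (number of heads in one toss) mapped to its probability.
coin_table = {0: Fraction(1, 2), 1: Fraction(1, 2)}

def coin_pmf(x: int) -> Fraction:
    """Equation form of the same distribution: P(X = x) = 1/2 for x in {0, 1}, else 0."""
    return Fraction(1, 2) if x in (0, 1) else Fraction(0)

# Both representations agree on every outcome.
for outcome, prob in coin_table.items():
    print(outcome, prob, coin_pmf(outcome) == prob)

# The probabilities of all outcomes add up to 1.
print(sum(coin_table.values()))  # 1
```

Exact fractions are used instead of floats so that the total comes out as exactly 1.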
Probability distributions can also describe complex events, such as whether a certain vaccine successfully treats COVID-19.
Prerequisites of Probability Distribution
To understand probability distributions, you need to know about variables and random variables.
- A variable is a symbol (A, B, x, y, etc.) that can take any value from a specified set of values.
- A random variable is a variable whose value is the numerical outcome of a statistical experiment.
Usually, a capital letter denotes a random variable, and a lower-case letter denotes one of its values.
- X denotes the random variable X.
- P(X) denotes the probability of X.
- P(X = x) is the probability that the random variable X is equal to a particular value, denoted by x.
For example, P(X = 1) is the probability that the random variable X is equal to 1.
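To make the notation concrete, the sketch below assumes two tosses of a fair coin. It treats X as a function that maps each outcome to its number of heads, and computes P(X = x) by adding up the probabilities of all outcomes where X equals x. The helper names `X` and `prob_X_equals` are hypothetical.

```python
from fractions import Fraction
from itertools import product

# Sample space of two tosses of a fair coin: HH, HT, TH, TT, each equally likely.
outcomes = list(product("HT", repeat=2))
p_outcome = Fraction(1, len(outcomes))

def X(outcome: tuple) -> int:
    """Random variable X: the number of heads in an outcome."""
    return outcome.count("H")

def prob_X_equals(x: int) -> Fraction:
    """P(X = x): sum the probabilities of all outcomes for which X takes the value x."""
    return sum((p_outcome for o in outcomes if X(o) == x), Fraction(0))

for x in range(3):
    print(f"P(X = {x}) = {prob_X_equals(x)}")
# P(X = 0) = 1/4
# P(X = 1) = 1/2
# P(X = 2) = 1/4
```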
Types of Probability Distributions
Statisticians divide probability distributions into the following types:
- Discrete Probability Distributions
- Continuous Probability Distributions
Discrete Probability Distributions
Discrete probability distributions are described by probability mass functions (PMFs). The random variable can take only a countable number of distinct values.
For example, the number of heads in a coin toss is discrete because there are no in-between values: you get either heads or tails. Similarly, when counting the number of books borrowed per hour at a library, you can count 31 or 32 books but nothing in between.
Types of Discrete Probability Distributions
- Bernoulli and binomial distributions – A Bernoulli distribution has only two outcomes, 1 and 0. The random variable X takes the value 1 with probability of success p, and the value 0 with probability of failure q = 1 − p. A binomial distribution counts the number of successes in n independent Bernoulli trials that share the same p.
Thus, if you toss a coin, a head denotes success, and a tail denotes failure.
The probability mass function of a Bernoulli variable is $P(X = x) = p^x (1-p)^{1-x}$, where $x \in \{0, 1\}$.
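The Bernoulli formula above, together with the standard binomial formula P(X = k) = C(n, k) p^k (1 − p)^(n − k), can be sketched in Python as follows. The function names are illustrative, and `math.comb` requires Python 3.8 or later.

```python
from math import comb

def bernoulli_pmf(x: int, p: float) -> float:
    """P(X = x) = p**x * (1 - p)**(1 - x) for x in {0, 1}."""
    if x not in (0, 1):
        raise ValueError("a Bernoulli variable only takes the values 0 and 1")
    return p**x * (1 - p) ** (1 - x)

def binomial_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n independent Bernoulli(p) trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

print(bernoulli_pmf(1, 0.5), bernoulli_pmf(0, 0.5))  # 0.5 0.5
print(binomial_pmf(3, 10, 0.5))  # exactly 3 heads in 10 fair tosses: 0.1171875
```

With n = 1, `binomial_pmf` reduces to `bernoulli_pmf`, which is why the Bernoulli distribution is often described as a binomial distribution with a single trial.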
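- Normal distributions – The normal (Gaussian) distribution is, strictly speaking, continuous rather than discrete. It is closely related to the discrete distributions in this list, though: binomial and Poisson distributions are well approximated by a normal curve when the number of trials or the mean count is large. It has the following characteristics: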
- Mean, median, and mode coincide.
- The distribution curve is bell-shaped.
- The distribution curve is symmetric about x = μ.
- The area under the curve is 1.
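These characteristics can be checked numerically with Python's standard-library `statistics.NormalDist` (Python 3.8+). This is a rough sketch using the standard normal distribution (μ = 0, σ = 1) as an assumed example; the total area is approximated by a simple Riemann sum over [−10, 10] rather than computed exactly.

```python
from statistics import NormalDist

# Standard normal distribution: mean mu = 0, standard deviation sigma = 1.
dist = NormalDist(mu=0, sigma=1)

# Symmetric about x = mu: the density at mu - d equals the density at mu + d.
print(dist.pdf(-1.5) == dist.pdf(1.5))  # True

# Median equals the mean: exactly half the probability lies below mu.
print(dist.cdf(0))  # 0.5

# Total area under the curve is 1: a Riemann sum over [-10, 10] comes very close.
n = 20_000
step = 20 / n
area = sum(dist.pdf(-10 + i * step) * step for i in range(n))
print(round(area, 6))
```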
- Poisson distributions – A Poisson distribution models the number of events that occur in a fixed interval of time or space, such as the number of books borrowed per hour at a library.
Poisson distributions have the following assumptions:
- One successful event does not influence the outcome of another successful event.
- The average rate of successes is constant, so the probability of a success is the same in any two intervals of equal length.
- The probability of a success in an interval approaches zero as the interval becomes smaller.
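Under these assumptions, the probability of observing exactly k events when the average count per interval is λ is P(X = k) = λ^k e^(−λ) / k!. A minimal sketch, assuming a hypothetical average of 30 books borrowed per hour; the function name `poisson_pmf` is illustrative.

```python
from math import exp, factorial

def poisson_pmf(k: int, lam: float) -> float:
    """P(X = k) = lam**k * e**(-lam) / k! for k = 0, 1, 2, ..."""
    return lam**k * exp(-lam) / factorial(k)

# Hypothetical library: on average 30 books are borrowed per hour.
rate = 30
print(poisson_pmf(31, rate))  # probability that exactly 31 books are borrowed in an hour
print(poisson_pmf(32, rate))  # probability that exactly 32 books are borrowed in an hour
```

For very large k this direct formula overflows floating point; a library implementation such as `scipy.stats.poisson` works in log space instead.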
Continuous Probability Distributions
A continuous probability distribution is described by a probability density function (PDF). A distribution is continuous if the variable can take an infinite number of values between any two values. Continuous variables are measured on scales, like height, weight and temperature.
In a discrete distribution, each possible value has a nonzero probability. In a continuous distribution, the probability of any single exact value is zero; only ranges of values have nonzero probability. For example, the probability that a temperature is exactly 40 degrees is zero, while the probability that it falls between 39.5 and 40.5 degrees can be positive.
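One way to see this in code, assuming a hypothetical normal model for temperature with mean 38 and standard deviation 2 degrees: the probability of a range is the difference of the cumulative distribution function at its endpoints, so a range of zero width gets probability zero. The helper `prob_between` is illustrative.

```python
from statistics import NormalDist

# Hypothetical model: temperature ~ Normal(mean 38, standard deviation 2) degrees.
temperature = NormalDist(mu=38, sigma=2)

def prob_between(dist: NormalDist, low: float, high: float) -> float:
    """P(low <= X <= high) for a continuous variable: CDF(high) - CDF(low)."""
    return dist.cdf(high) - dist.cdf(low)

print(prob_between(temperature, 40, 40))      # 0.0: a single exact value has zero probability
print(prob_between(temperature, 39.5, 40.5))  # a range of values has positive probability
```

Note that `temperature.pdf(40)` returns a positive number, but that value is a density, not a probability.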
Types of Continuous Probability Distributions
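- Uniform distributions – In a uniform distribution, every outcome is equally likely. Rolling a fair die is a discrete example: the outcomes 1 to 6 each have the same probability, 1/6. The continuous uniform distribution on an interval [a, b] has a constant density of 1/(b − a).
Suppose the random variable X takes k different values and $P(X = x_k)$ is the same for every value.
Then $P(X = x_k) = 1/k$.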
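A short sketch of the fair-die case: each of the k = 6 faces gets probability 1/k, and a quick simulation (with an arbitrary seed, chosen only so runs are repeatable) shows each face appearing roughly one sixth of the time.

```python
import random
from fractions import Fraction

faces = [1, 2, 3, 4, 5, 6]
k = len(faces)

# Discrete uniform distribution: every one of the k values has probability 1/k.
pmf = {face: Fraction(1, k) for face in faces}
print(pmf[3])  # 1/6

# Simulation: with many rolls, each face shows up roughly 1/6 of the time.
rng = random.Random(0)
rolls = [rng.choice(faces) for _ in range(60_000)]
freqs = {face: rolls.count(face) / len(rolls) for face in faces}
print(freqs)
```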
- Cumulative probability distributions – A cumulative probability is the probability that a random variable X falls within a specified range, most often written P(X ≤ x). The cumulative distribution function applies to both discrete and continuous variables.
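Suppose you toss a fair coin twice. What is the probability of getting one or fewer heads? This is a cumulative probability: P(X ≤ 1) = P(X = 0) + P(X = 1) = 0.25 + 0.50 = 0.75.
|Number of heads: x|Probability: P(X = x)|Cumulative probability: P(X ≤ x)|
|---|---|---|
|0|0.25|0.25|
|1|0.50|0.75|
|2|0.25|1.00|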
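Assuming two tosses of a fair coin, the cumulative column is just a running total of the probability column, which `itertools.accumulate` computes directly:

```python
from fractions import Fraction
from itertools import accumulate

# Two tosses of a fair coin: X = number of heads.
values = [0, 1, 2]
probs = [Fraction(1, 4), Fraction(1, 2), Fraction(1, 4)]

# The cumulative probability P(X <= x) is a running total of P(X = x).
cumulative = list(accumulate(probs))

for x, p, c in zip(values, probs, cumulative):
    print(f"x = {x}: P(X = x) = {p}, P(X <= x) = {c}")
```

The last cumulative value is always 1, since every possible outcome has been included by then.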
To summarize:
- A probability distribution describes the possible values of a data-generating process and how likely each one is.
- There are many types of probability distributions, each with different characteristics. These are commonly summarized by the mean and standard deviation.
- Investors rely heavily on probability distributions to forecast returns on assets such as stocks over time and to estimate their risk.
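For a discrete distribution, the mean and standard deviation mentioned above can be computed directly from the probability table. A minimal sketch, using the two-coin-toss distribution as an assumed example; `mean_and_sd` is an illustrative name.

```python
from math import sqrt

def mean_and_sd(pmf: dict) -> tuple:
    """Mean E[X] = sum of x * P(X = x); SD = sqrt(sum of (x - mean)**2 * P(X = x))."""
    mean = sum(x * p for x, p in pmf.items())
    variance = sum((x - mean) ** 2 * p for x, p in pmf.items())
    return mean, sqrt(variance)

# Number of heads in two fair coin tosses.
two_tosses = {0: 0.25, 1: 0.5, 2: 0.25}
print(mean_and_sd(two_tosses))  # mean 1.0, standard deviation about 0.707
```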
What are the properties of a probability distribution?
A function must satisfy three properties to be a valid probability distribution. First, every probability must be non-negative and no greater than 1 (for a continuous variable, the density must be non-negative, although it may exceed 1). Second, the probabilities of all possible outcomes must add up to exactly 1; for a continuous variable, the total area under the density curve must equal 1. Third, the probability of a set of mutually exclusive outcomes equals the sum of their individual probabilities; for example, the probability of rolling a 1 or a 2 with a fair die is 1/6 + 1/6 = 1/3.
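These properties can be turned into a simple check for a discrete distribution stored as a dictionary from outcome to probability. This is a sketch under that assumption; the function names are hypothetical, and the tolerance allows for floating-point rounding.

```python
from math import isclose

def is_valid_pmf(pmf: dict, tol: float = 1e-9) -> bool:
    """Check the first two properties for a discrete distribution (outcome -> probability)."""
    probs = list(pmf.values())
    # Property 1: every probability lies between 0 and 1.
    if any(p < 0 or p > 1 for p in probs):
        return False
    # Property 2: the probabilities of all possible outcomes add up to 1.
    return isclose(sum(probs), 1.0, abs_tol=tol)

def prob_of_event(pmf: dict, event: set) -> float:
    """Property 3: the probability of mutually exclusive outcomes is the sum of their probabilities."""
    return sum(pmf.get(x, 0.0) for x in event)

die = {face: 1 / 6 for face in range(1, 7)}
print(is_valid_pmf(die))               # True
print(is_valid_pmf({0: 0.7, 1: 0.7}))  # False: the probabilities sum to 1.4
print(prob_of_event(die, {1, 2}))      # approximately 1/3
```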
How are probability distributions used in decision making?
In decision making, probability distributions are used in a wide range of applications where the outcome of a process is uncertain. Casinos use them to determine the odds of a particular outcome. In medicine, they are used to estimate the likelihood of a particular disease. In business, they are used to estimate the possible outcomes of an action. The range of applications is very broad.
What is a probability distribution?
A probability distribution is a mathematical function that gives the probability that a random variable takes any particular value. The concept of a random variable is central to probability theory. The probability distribution of a discrete random variable takes the form of a list of probabilities of its individual possible values, while the distribution of a continuous random variable is described by a density function, from which you obtain the probability that the variable falls within a given range.