Simply put, hypothesis testing is the process of examining claims made about a process with the help of observed data. The process can be anything and is not limited to statistical problems.

Consider a set of random variables $X_1, X_2, X_3, \ldots, X_N$.

Let $F$ denote the distribution function of the set of random variables.

Note that $F$ is chosen in keeping with the experiment's model and belongs to a family of distributions $\mathcal{F}$.

Now, the above problem would fall under the umbrella of hypothesis testing if a suggestion of the form

$H_0 : F \in \mathcal{F}_0$

is encountered, where $\mathcal{F}_0$ is a specified proper subset of $\mathcal{F}$.

A statistical hypothesis is a statement used to examine the validity of claims made about the distributions of a set of random variables. The examination process is performed based on a set of observations on the random variable.

The process of examination of the above claims is known as hypothesis testing.

Definition:

If a hypothesis H0 (taken together with the model) specifies the joint distribution of X1,X2, X3, ..., Xn completely, then it is known as a simple hypothesis.

If H0 does not specify the joint distribution completely, it is said to be a composite hypothesis.

A problem of hypothesis testing falls under a parametric setup if it is assumed that the distribution function $F$ of the random variables $X_1, X_2, X_3, \ldots, X_n$ is known (usually assumed to be a Normal distribution) except for some parameter or parameters $\theta$.

A non-parametric setup is used for testing a hypothesis when the assumption of normality is violated. Tests like the t-test and F-test work efficiently when the random variables follow a normal distribution, but for non-normal distributions these methods are suboptimal.

Another term used to define a non-parametric setup is distribution-free because the procedures used for testing under this case do not depend on the distribution of the random variables.

In a testing problem, the null hypothesis is the statistical hypothesis that is put to the test. It usually states that there is no effect or no difference, for example that a parameter equals a specified value.

It is denoted by H0.

Example:

Consider a testing problem where it is required to test if the mean of a particular distribution, indicated by $F$ (say), takes a specific value $\mu_0$ (say). If $\mu$ denotes the mean of the distribution, the null hypothesis will be

$H_0 : \mu = \mu_0$

An alternative or alternate hypothesis is proposed in a testing problem to counter the null hypothesis. If the data from the experiment contradicts the null hypothesis, the alternate hypothesis is suggested as another option.

It is generally represented by H1 or Ha.

Example:

Consider again the problem of testing whether the mean $\mu$ of a particular distribution $F$ takes a specific value $\mu_0$. The null hypothesis is

$H_0 : \mu = \mu_0$

Now, if the data obtained contradicts $H_0$, then it gets rejected by the experimenter, and the alternate hypothesis gets accepted, denoted by

$H_1 : \mu \neq \mu_0$

The testing problem is usually written as

To test: $H_0 : \mu = \mu_0$ against $H_1 : \mu \neq \mu_0$

The alternate hypothesis can also be of the form

$H_1 : \mu < \mu_0$ or $H_1 : \mu > \mu_0$

A null hypothesis is rejected or not rejected based on the data collected by the experimenter.

Consider the following testing problem:

Let $X_1, X_2, X_3, \ldots, X_n$ be a set of random variables independently and identically distributed following a normal distribution with mean $\mu$ and standard deviation $\sigma_0$, where the value of $\sigma_0$ is known.

To test: $H_0 : \mu = \mu_0$ against $H_1 : \mu \neq \mu_0$

Where the value of $\mu_0$ is known.

Now, one can carry out testing in two ways. Either a formal test can be used, or the sample mean can simply be calculated from the observed values of $X$ and compared with $\mu_0$.

Suppose, after calculation, the sample mean comes out to be $\bar{X}$. Two cases may arise.

Case I: $\bar{X} = \mu_0$

In this case, the observed data does not contradict the null hypothesis, so the null hypothesis is not rejected in favor of the alternate hypothesis.

Case II: $\bar{X} \neq \mu_0$

In this case, the observed data points away from the null hypothesis. Because a sample mean almost never equals $\mu_0$ exactly, a formal test is needed to decide whether the difference is large enough, relative to sampling variation, to reject the null hypothesis in favor of the alternate hypothesis.

The p-value method of hypothesis testing is used to decide whether the observed data gives enough evidence to reject the null hypothesis in favor of the alternate hypothesis. But before calculating the p-value, several steps have to be followed.

A testing problem is generally carried out by using the transformation method to generate a test statistic, the value of which is used to determine whether to accept or reject the null hypothesis.

After determining the test statistic, you can determine the critical points: the limits within which the calculated value of the test statistic should lie for the null hypothesis not to be rejected.

The region lying between these limits is known as the region of acceptance, denoted by W. The region outside the limits is known as the rejection region (or critical region), which is represented by A.

The level of significance, denoted by $\alpha$, is the probability of rejecting a true null hypothesis. It is also used to determine the critical region of the test statistic distribution under the null hypothesis.

The significance level is mostly 5%, but differs from problem to problem and is usually provided to the experimenter.

Consider a testing problem where the population is denoted by $X_1, X_2, X_3, \ldots, X_N$, whose members are independently and identically distributed following a normal distribution.

Let $\mu$ denote the population mean and $\sigma$ denote the population standard deviation.

Let $\alpha$ denote the level of significance.

Now, from the above population of size $N$, a sample of size $n$ is chosen randomly.

Let the random sample be denoted by $x_1, x_2, x_3, \ldots, x_n$.

Let $\bar{x}$ be the sample mean and $s$ be the sample standard deviation.

So the standard error is given by:

$SE = \sigma / \sqrt{n}$

To test: $H_0$ against $H_1$

Let $Z$ denote the test statistic.

We define $Z$ as:

$Z = (\bar{x} - \mu) / SE$

So a calculated value of $Z$, denoted by $Z_{obs}$, can be obtained using the observed values of the random sample, with $\mu$ set to the value specified by $H_0$.

Under $H_0$: $Z \sim N(0, 1)$

For a right-tailed alternative, the P-value can be calculated by:

$p = 1 - \Phi(Z_{obs})$

where $\Phi$ is the standard normal distribution function. For a two-tailed alternative, $p = 2\,(1 - \Phi(|Z_{obs}|))$.

Based on the P-value, the null hypothesis will not be rejected in favor of the alternative hypothesis if

$p >$ level of significance ($\alpha$).

The null hypothesis is rejected in favor of the alternative hypothesis if $p <$ level of significance ($\alpha$).

In general, the P-value associated with a test statistic is the probability, calculated assuming the null hypothesis is true, of obtaining a value of the test statistic at least as extreme as the one actually observed. Experimenters use these values to decide whether or not to reject a null hypothesis.

So, the P-value or probability value measures how likely the observed result (or a more extreme one) would be under the conditions of the null hypothesis. A small P-value means the data would be unusual if the null hypothesis were true.
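The P-value steps above can be sketched in a few lines of Python. This is a minimal illustration, not a definitive implementation: the function name `z_test_p_value`, the `tail` argument, and all the numbers (sample mean 1020, hypothesized mean 1000, known standard deviation 50, sample size 25) are assumptions made up for the example.

```python
from math import sqrt
from statistics import NormalDist


def z_test_p_value(sample_mean, mu0, sigma, n, tail="two-sided"):
    """Return (z_obs, p_value) for a z-test with known population sigma."""
    se = sigma / sqrt(n)               # standard error of the sample mean
    z_obs = (sample_mean - mu0) / se   # observed value of the test statistic
    phi = NormalDist().cdf             # standard normal distribution function
    if tail == "right":
        p = 1 - phi(z_obs)
    elif tail == "left":
        p = phi(z_obs)
    else:                              # two-sided alternative
        p = 2 * (1 - phi(abs(z_obs)))
    return z_obs, p


# Hypothetical numbers, chosen only to illustrate the calculation.
z_obs, p = z_test_p_value(sample_mean=1020, mu0=1000, sigma=50, n=25, tail="right")
alpha = 0.05
reject_h0 = p < alpha
print(f"z_obs = {z_obs:.3f}, p = {p:.4f}, reject H0: {reject_h0}")
```

With these made-up inputs the standard error is 10, so the observed statistic is 2 and the right-tailed P-value falls below 0.05.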

**Example:**

Let there be a bulb manufacturer who claims that the bulbs in a particular lot have an average lifetime of $\mu_0$ units. Suppose $N$ bulbs are present in the lot.

This will constitute a testing problem of the form:

To test: $H_0$: Average lifetime of the bulbs is $\mu_0$ units

Against

$H_1$: Average lifetime of the bulbs is not $\mu_0$ units.

Let a sample of size $n$ be drawn randomly from the $N$ bulbs.

Now, if on calculation the average lifetime of the $n$ bulbs attains a value very close to $\mu_0$ (the exact value can never be attained due to underlying errors), then the calculated value of the chosen test statistic will be close to the value the statistic assumes under the conditions of the null hypothesis. In this case, the P-value will be close to 1 (but never equal to 1).

Suppose instead that the average lifetime of the $n$ bulbs differs significantly from $\mu_0$. Then the calculated value of the test statistic will also differ significantly from the value that the test statistic assumes under the conditions of the null hypothesis. In this case, the P-value will be close to 0 (but never equal to 0).

In a testing problem, the null hypothesis is not rejected in favor of the alternate hypothesis if the calculated value of the chosen test statistic (denoted by $T_{calc}$, say) falls within the region of acceptance, denoted by W.

If the value of $T_{calc}$ falls outside W, then the null hypothesis is rejected in favor of the alternate hypothesis.

It may happen that the null hypothesis is actually true, yet $T_{calc} \notin W$ and the null hypothesis gets rejected.

This type of error is known as type I error.

**Definition:**

The error committed by rejecting a true null hypothesis is known as a type I error.

It may also happen that the null hypothesis is actually false, yet $T_{calc} \in W$ and the null hypothesis does not get rejected in favor of the alternate hypothesis. This type of error is known as type II error.

**Definition:**

The error committed by accepting (that is, failing to reject) a false null hypothesis is known as a type II error.

| Decision \ Situation | H0 True | H0 False |
|---|---|---|
| H0 Rejected | Type I Error | Correct Decision |
| H0 Not Rejected | Correct Decision | Type II Error |

In a testing problem, the choice of the test should be made keeping in mind both types of errors. A test is termed good if both types of errors are kept under control since, for practical purposes, it is impossible to eliminate both errors at once.

Now, the commission of these errors is treated as a random event. As such, the experimenter can calculate the probabilities associated with them.

Since the problem of hypothesis testing involves an unknown parameter (say $\theta$), the probabilities will also depend on it.

The probability of type I error associated with $\theta$ is given by:

$P[\text{Type I Error}] = P_\theta[(X_1, X_2, X_3, \ldots, X_N) \in A] = P_\theta(A), \quad \theta \in \Omega_0$

Where

$X_1, X_2, X_3, \ldots, X_N$ denotes the random variables under study

$A$ denotes the rejection region

$\Omega_0$ denotes a specified proper subset of the parameter space $\Omega$

Let $\alpha$ be any number such that $0 < \alpha < 1$. This value indicates the level at which the probability of type I error should be kept for a good test. So we have,

$P_\theta(A) = \alpha, \quad \theta \in \Omega_0$

and $\alpha$ is known as the test's significance level.

The probability of type II error associated with $\theta$ is given by:

$P[\text{Type II Error}] = P_\theta[(X_1, X_2, X_3, \ldots, X_N) \in W] = P_\theta(W), \quad \theta \in \Omega - \Omega_0$

Where

$X_1, X_2, X_3, \ldots, X_N$ denotes the random variables under study

$W$ denotes the acceptance region

$\Omega - \Omega_0$ denotes the part of the parameter space not covered by the null hypothesis

The region of acceptance, W, and the rejection region, A, can be thought of as two sets. The union of these two sets forms the entire range of values for the test.

These two regions are complements of each other, i.e., $W = A^c$

Where $A^c$ is the set complementary to A.

So, the probability of type II error can also be written as:

$P_\theta(W) = P_\theta(A^c) = 1 - P_\theta(A)$

For $\theta \in \Omega - \Omega_0$

The probability $\beta(\theta) = P_\theta(A)$, viewed as a function of $\theta$, is called the power function of the test.

We have:

$\beta(\theta)$ = the probability of type I error associated with $\theta$, for $\theta \in \Omega_0$

$\beta(\theta)$ = 1 - the probability of type II error associated with $\theta$, for $\theta \in \Omega - \Omega_0$

The power function is used to judge the nature of the whole test.
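To make the power function concrete, here is a minimal Python sketch for one specific case that is an assumption of this example rather than something the text fixes: a right-tailed z-test of a normal mean with known standard deviation. The function name `power_right_tailed_z` and the numbers used are hypothetical. The function returns the probability of rejecting the null hypothesis as a function of the true mean.

```python
from math import sqrt
from statistics import NormalDist


def power_right_tailed_z(mu, mu0, sigma, n, alpha=0.05):
    """P(reject H0: mean = mu0 in favor of mean > mu0) when the true mean is mu."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha)   # boundary of the rejection region
    shift = (mu - mu0) / (sigma / sqrt(n))   # mean of Z when the true mean is mu
    return 1 - std_normal.cdf(z_crit - shift)


# At mu = mu0 the power function equals the type I error probability (alpha).
size = power_right_tailed_z(mu=1000, mu0=1000, sigma=50, n=25)

# Away from mu0 it equals 1 minus the type II error probability.
power_at_1020 = power_right_tailed_z(mu=1020, mu0=1000, sigma=50, n=25)
type_ii_at_1020 = 1 - power_at_1020

print(f"size = {size:.3f}, power at 1020 = {power_at_1020:.3f}, "
      f"type II error at 1020 = {type_ii_at_1020:.3f}")
```

Evaluating the function over a grid of true means gives the whole power curve, which is how the two error probabilities can be judged together.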

The null and alternative hypotheses statements corresponding to a testing problem differ from problem to problem.

Usually, the claim made about the parameter is chosen as the alternative hypothesis when dealing with a problem. Consider the following problem:

Problem:

A lightbulb manufacturer packs their bulbs into cartons, each carton containing 100 bulbs. Out of these 100 bulbs, 30 bulbs are picked at random for testing. According to the manufacturer, the average lifetime of a bulb is 1,000 hours. Now, a new manufacturing process has been introduced, which is said to increase the average lifetime of the bulbs. Check whether the new approach is effective, assuming that the lifetime of the bulbs follows a normal distribution.

Solution:

In the above problem, it has been provided that the average lifetime of the bulbs using the old method is 1,000 hours.

A claim has been made that the new manufacturing process will increase the lifetime of the bulb, i.e., it will be more than 1,000 hours.

So, we have to test if the new process actually increases the lifetime of the bulbs.

Let $\mu$ denote the average lifetime of the bulbs.

As such, the testing problem can be written as:

To test: $H_0 : \mu = 1{,}000$ against $H_1 : \mu > 1{,}000$

The level of significance of a test ($\alpha$) is the probability of type I error. This is usually provided to the experimenter.

A test statistic is used to decide the rejection criteria for the null hypothesis in a testing problem. Different tests have different test statistics.

Let $X_1, X_2, X_3, \ldots, X_n$ denote a random sample whose members independently and identically follow a normal distribution with mean $\mu$ and variance $\sigma^2$. Here both the mean and variance are unknown parameters.

Let the testing problem be defined as

To test: $H_0 : \mu = \mu_0$ against $H_1 : \mu = \mu_1$

Where $\mu_1 \neq \mu_0$

We define the test statistic as

$T = (\bar{X} - \mu) / SE$

Where $\bar{X}$ is the mean of the random sample

And $SE$ is the standard error of the sample mean

Now, under $H_0$,

$T = (\bar{X} - \mu_0) / SE \sim t_{n-1}$

So the critical value of the test statistic can be obtained at a particular level of significance from a t-distribution table.

The observed value of the test statistic can be computed methodically by using the observations.

Let $X_1, X_2, X_3, \ldots, X_n$ denote a random sample whose members independently and identically follow a normal distribution with mean $\mu$ and variance $\sigma^2$. Here both the mean and variance are unknown parameters.

Let the testing problem be defined as

To test: $H_0 : \sigma^2 = \sigma_0^2$ against $H_1 : \sigma^2 = \sigma_1^2$

Where $\sigma_1^2 \neq \sigma_0^2$

We define the test statistic as

$T = (n-1)\,s^2 / \sigma^2$

Where $s^2$ denotes the sample variance

Under $H_0$,

$T = (n-1)\,s^2 / \sigma_0^2 \sim \chi^2_{n-1}$

So the critical value of the test statistic can be obtained at a particular level of significance from a chi-square distribution table.

The observed value of the test statistic is computed methodically by using the observations.
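As a rough illustration of computing the observed statistic for the variance test, the sketch below uses made-up data and an upper-tailed alternative (the variance exceeds the hypothesized value), both of which are assumptions of this example. The function name `chi_square_variance_statistic` is hypothetical, and the critical value 9.488 is the standard table value for the upper 5% point of a chi-square distribution with 4 degrees of freedom, hard-coded here because the Python standard library has no chi-square distribution.

```python
from statistics import variance


def chi_square_variance_statistic(sample, sigma0_sq):
    """Return T = (n - 1) * s^2 / sigma0^2 for a test on a normal variance."""
    n = len(sample)
    s_sq = variance(sample)   # sample variance with an (n - 1) denominator
    return (n - 1) * s_sq / sigma0_sq


# Hypothetical observations and hypothesized variance.
sample = [4.0, 6.0, 8.0, 10.0, 12.0]
t_obs = chi_square_variance_statistic(sample, sigma0_sq=4.0)

# Table value: upper 5% point of chi-square with n - 1 = 4 degrees of freedom.
critical_value = 9.488
reject_h0 = t_obs > critical_value
print(f"T_obs = {t_obs:.3f}, reject H0: {reject_h0}")
```

Here the sample variance is 10, so the observed statistic is 4 × 10 / 4 = 10, which is just above the table value.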

After calculating the value of the test statistic T, denoted by Tobs (say) we need to compare it to the critical value to determine whether H0 gets rejected.

The test statistic's critical value is obtained from the tables provided or by using software. The critical value is calculated at a particular level of significance, $\alpha$ say.

Suppose the calculated value of the test statistic comes out to be greater than the critical value at significance level $\alpha$. In such a case, the null hypothesis is rejected in favor of the alternate hypothesis.

Correctly reporting the results of an experiment is one of the most crucial tasks of the experimenter. While dealing with the problem of hypothesis testing, a particular syntax is followed by statisticians all around the globe.

After comparing the value of the test statistic to the critical value, either the null hypothesis will get rejected, or it will not get rejected.

As the calculated value of the test statistic is greater than the critical value, we reject H0 in favor of H1.

As the calculated value of the test statistic is less than the critical value, we do not reject H0 in favor of H1.

It is also preferable to report all the values obtained in a tabular format.

The one-sample t-test is generally used to determine if a significant difference exists between the mean of a population and a particular value. It is used when the standard deviation of the population is unknown.

Assumptions:

Data must be continuous

The data must follow a normal distribution

Sampling should be done using simple random sample techniques such that the probability of selection of each sample is equal

The prerequisites for performing this test are the hypothesized mean, the sample size, the sample mean, and the sample standard deviation.

Let $X_1, X_2, X_3, \ldots, X_n$ denote a random sample whose members independently and identically follow a normal distribution with mean $\mu$ and variance $\sigma^2$, where the variance is unknown.

Let the testing problem be denoted as:

To test: $H_0 : \mu = \mu_0$ against $H_1 : \mu \neq \mu_0$ (two-tailed test)

$H_0 : \mu = \mu_0$ against $H_1 : \mu > \mu_0$ (right-tailed test)

$H_0 : \mu = \mu_0$ against $H_1 : \mu < \mu_0$ (left-tailed test)

Where $\mu_0$ is the value of the hypothesized mean

Now, the standard error of the sample mean is given by:

$SE = s / \sqrt{n}$

Where $s$ is the standard deviation of the random sample

The test statistic is defined as

$T = (\bar{X} - \mu) / SE = \sqrt{n}\,(\bar{X} - \mu) / s$

Under $H_0$,

$T = (\bar{X} - \mu_0) / SE = \sqrt{n}\,(\bar{X} - \mu_0) / s \sim t_{n-1}$

I.e., the test statistic follows a t-distribution with $n-1$ degrees of freedom

The critical value is given by: $T_{critical} = t_{\alpha;\, n-1}$ (for a one-tailed test)

$T_{critical} = t_{\alpha/2;\, n-1}$ (for a two-tailed test)

Where $\alpha$ is the level of significance
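The one-sample t-test calculation can be sketched as follows. The data, the hypothesized mean of 12, and the function name `one_sample_t` are made up for illustration. The critical value 2.776 is the standard table value of the t-distribution with 4 degrees of freedom for a two-tailed test at the 5% level, hard-coded because the Python standard library has no t-distribution.

```python
from math import sqrt
from statistics import mean, stdev


def one_sample_t(sample, mu0):
    """Return (T, degrees of freedom) for a one-sample t-test of mean = mu0."""
    n = len(sample)
    se = stdev(sample) / sqrt(n)   # standard error: s / sqrt(n)
    t = (mean(sample) - mu0) / se
    return t, n - 1


# Hypothetical observations.
sample = [12, 14, 16, 18, 20]
t_obs, df = one_sample_t(sample, mu0=12)

# Two-tailed test at the 5% level: table value of t with df = 4.
t_critical = 2.776
reject_h0 = abs(t_obs) > t_critical
print(f"T = {t_obs:.3f}, df = {df}, reject H0: {reject_h0}")
```

For a one-tailed test the comparison would use the one-tailed table value instead and drop the absolute value.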

In a testing problem, the z-test is used to check for a significant difference between a population mean and a specified value (or between two population means) when the standard deviation of the population is known.

Assumptions:

Data should be continuous

The data should follow a normal distribution

The sample should be generated from the population using simple random sampling techniques, such that the probabilities of selecting the samples are equal.

The population standard deviation should be known.

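Let $X_1, X_2, X_3, \ldots, X_n$ denote a random sample whose members independently and identically follow a normal distribution with mean $\mu$ and variance $\sigma^2$, where the variance is known.

Let the testing problem be denoted as:

To test: $H_0 : \mu = \mu_0$ against $H_1 : \mu \neq \mu_0$ (two-tailed test)

$H_0 : \mu = \mu_0$ against $H_1 : \mu > \mu_0$ (right-tailed test)

$H_0 : \mu = \mu_0$ against $H_1 : \mu < \mu_0$ (left-tailed test)

Where $\mu_0$ is the value of the hypothesized mean

The test statistic is defined as

$Z = (\bar{X} - \mu) / (\sigma / \sqrt{n})$

Where $\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i$

Under $H_0$,

$Z = (\bar{X} - \mu_0) / (\sigma / \sqrt{n}) \sim N(0, 1)$

I.e., the test statistic follows a standard normal distribution

The critical value is given by: $Z_{critical} = z_{\alpha}$ (for a one-tailed test)

$Z_{critical} = z_{\alpha/2}$ (for a two-tailed test)

The standard normal distribution has no degrees of freedom, so the critical value depends only on $\alpha$.

Where $\alpha$ is the level of significance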
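For the z-test, the critical values need not come from a table. A small sketch using the Python standard library's `statistics.NormalDist` follows; the helper name `z_critical` and the observed statistic used at the end are assumptions for illustration.

```python
from statistics import NormalDist


def z_critical(alpha, two_tailed=False):
    """Upper critical value of the standard normal at significance level alpha."""
    tail_area = alpha / 2 if two_tailed else alpha
    return NormalDist().inv_cdf(1 - tail_area)


z_one_tailed = z_critical(0.05)                    # about 1.645
z_two_tailed = z_critical(0.05, two_tailed=True)   # about 1.960

# Decision for a hypothetical observed statistic in a two-tailed test.
z_obs = 2.3
reject_h0 = abs(z_obs) > z_two_tailed
print(f"one-tailed: {z_one_tailed:.3f}, two-tailed: {z_two_tailed:.3f}, "
      f"reject H0: {reject_h0}")
```

For a left-tailed test the rejection rule becomes `z_obs < -z_critical(alpha)`, by symmetry of the normal distribution.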

The use of t-test can be extended beyond one sample, i.e., it can also be used to check for a significant difference between the means of two different independent populations.

Assumptions:

Data must be continuous

The data should be generated by random sampling from the population.

The data should follow a normal distribution

The variances of the two independent groups should be equal

Let $X_1, X_2, X_3, \ldots, X_{n_X}$ denote the first random sample and $Y_1, Y_2, Y_3, \ldots, Y_{n_Y}$ denote the second random sample such that they are independent of each other.

Let the first sample follow a normal distribution with mean $\mu_X$, and let $s_X^2$ denote its sample variance.

Let the second sample follow a normal distribution with mean $\mu_Y$, and let $s_Y^2$ denote its sample variance.

To test: $H_0 : \mu_X = \mu_Y$ against $H_1 : \mu_X \neq \mu_Y$ (two-tailed test)

$H_0 : \mu_X = \mu_Y$ against $H_1 : \mu_X > \mu_Y$ (right-tailed test)

$H_0 : \mu_X = \mu_Y$ against $H_1 : \mu_X < \mu_Y$ (left-tailed test)

We define the test statistic as

$T = \dfrac{\bar{X} - \bar{Y}}{SE \sqrt{\frac{1}{n_X} + \frac{1}{n_Y}}}$

Where $SE$, the pooled standard deviation, is given by

$SE = \sqrt{\dfrac{(n_X - 1)\,s_X^2 + (n_Y - 1)\,s_Y^2}{n_X + n_Y - 2}}$

Under $H_0$, $T \sim t_{n_X + n_Y - 2}$

The critical value is given by: $T_{critical} = t_{\alpha;\, n_X + n_Y - 2}$ (for a one-tailed test)

$T_{critical} = t_{\alpha/2;\, n_X + n_Y - 2}$ (for a two-tailed test)

Where $\alpha$ is the level of significance
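The pooled two-sample statistic can be sketched in Python as below, assuming equal variances as listed in the assumptions. The two samples and the function name `pooled_t` are hypothetical, and 2.306 is the standard two-tailed 5% table value of the t-distribution with 8 degrees of freedom.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t(x, y):
    """Return (T, degrees of freedom) for the equal-variance two-sample t-test."""
    nx, ny = len(x), len(y)
    # Pooled variance: weighted average of the two sample variances.
    pooled_var = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2)
    pooled_sd = sqrt(pooled_var)
    t = (mean(x) - mean(y)) / (pooled_sd * sqrt(1 / nx + 1 / ny))
    return t, nx + ny - 2


# Hypothetical samples from two independent groups.
x = [12, 14, 16, 18, 20]
y = [8, 10, 12, 14, 16]
t_obs, df = pooled_t(x, y)

# Two-tailed test at the 5% level: table value of t with df = 8.
t_critical = 2.306
reject_h0 = abs(t_obs) > t_critical
print(f"T = {t_obs:.3f}, df = {df}, reject H0: {reject_h0}")
```

With these made-up samples the means differ by 4 and the denominator works out to 2, so the statistic is 2 and the null hypothesis is not rejected at the 5% level.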

A paired t-test is used to check for the presence of any significant difference between two measurements taken on the same subjects. Usually, the two measurements are separated by time.

**Example:**

An experimenter may want to find if there is any significant difference between deaths due to COVID-19 in May 2020 as compared to June 2020.

So, a paired t-test is used to check whether the mean difference between the pairs of observations differs significantly.

**Assumptions:**

The subjects (pairs) under study must be independent of one another, i.e., measurements made on one subject should not affect those made on another subject.

Each sample pair must be obtained from the same subject, e.g., the weights of patients before and after undergoing a diet.

The differences within the pairs must follow a normal distribution.

Let $X_1, X_2, X_3, \ldots, X_n$ denote the first random sample and $Y_1, Y_2, Y_3, \ldots, Y_n$ denote the second random sample, where $X_i$ and $Y_i$ are measured on the same subject. Let both of them be normally distributed.

Let $Z$ be a new random variable denoting the difference between the two samples, i.e.,

$Z_i = X_i - Y_i$

Let $\bar{Z}$ denote the sample mean of the differences, $s_Z^2$ denote the sample variance of the differences, and $\mu_Z$ denote the population mean difference.

To test: $H_0 : \mu_Z = 0$ against $H_1 : \mu_Z \neq 0$ (two-tailed test)

$H_0 : \mu_Z = 0$ against $H_1 : \mu_Z > 0$ (right-tailed test)

$H_0 : \mu_Z = 0$ against $H_1 : \mu_Z < 0$ (left-tailed test)

The test statistic is given by

$T = \dfrac{\bar{Z}}{s_Z / \sqrt{n}}$

Under $H_0$, $T \sim t_{n-1}$

The critical value is given by: $T_{critical} = t_{\alpha;\, n-1}$ (for a one-tailed test)

$T_{critical} = t_{\alpha/2;\, n-1}$ (for a two-tailed test)

Where $\alpha$ is the level of significance
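A paired t-test reduces to a one-sample t-test on the differences, which the following sketch makes explicit. The before/after values and the function name `paired_t` are made up for illustration, and 2.776 is the standard two-tailed 5% table value of the t-distribution with 4 degrees of freedom.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(first, second):
    """Return (T, degrees of freedom) for a paired t-test on matched samples."""
    if len(first) != len(second):
        raise ValueError("paired samples must have the same length")
    diffs = [a - b for a, b in zip(first, second)]   # one difference per subject
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / sqrt(n))
    return t, n - 1


# Hypothetical weights of five patients before and after a diet.
before = [82, 90, 77, 85, 95]
after = [80, 86, 76, 80, 92]
t_obs, df = paired_t(before, after)

# Two-tailed test at the 5% level: table value of t with df = 4.
t_critical = 2.776
reject_h0 = abs(t_obs) > t_critical
print(f"T = {t_obs:.3f}, df = {df}, reject H0: {reject_h0}")
```

The differences here are 2, 4, 1, 5 and 3, with mean 3 and sample variance 2.5, which gives a statistic of about 4.24.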

If observations are taken from a population with a given mean, it is not necessary that they will be identical. Due to the presence of random observation error, the observations fluctuate around the mean. This is a natural, inevitable variation. On top of this, another source of variation or sources of variation is deliberately introduced or suspected to enter due to circumstances beyond our control.

Hence, observations are heterogeneous or not homogeneous concerning the source or sources of variation.

Example:

An experimenter wishes to assess the effect of a sleeping drug on the average amount of sleep of patients.

A deliberately introduced source of variation, for example, a sleeping drug, is called “treatment” or “factor”. Thus certain patients who do not receive the “treatment” form one group, and the other groups are formed by changing the “dose” of the drug. Besides the drug, the patients can be classified according to other factors such as age or gender.

The effect of these sources of variation (that is, of the treatments) can be assessed by analyzing the total variation and splitting it into components corresponding to these sources of variation.

Now, this analysis can be done in several ways, Analysis of Variance or ANOVA being one such method. The analysis of variance is a body of statistical methods for analyzing observations assumed to be of the structure

$Y_i = b_1 x_{i1} + b_2 x_{i2} + \ldots + b_p x_{ip} + e_i, \quad i = 1, \ldots, n; \; j = 1, \ldots, p$

where the coefficients $\{x_{ij}\}$ are the values of "counter variables" or "indicator variables", which refer to the presence or absence of the effects $\{b_j\}$ in the conditions under which the observations are taken: $x_{ij}$ is the number of times $b_j$ occurs in the $i$th observation, and this is usually 0 or 1. In general, in the analysis of variance, all factors are treated qualitatively.

Now the experimenter may also be interested to know if the effect of any of the treatments in an ANOVA setup differs significantly concerning the other treatments.

Let the data be modeled as

$Y_i = \mu + \tau_i + e_i, \quad i = 1, \ldots, n$

Where $\mu$ is the process mean

$\tau_i$ denotes the effect due to the $i$th treatment

$e_i$ is the random error associated with the process

To test: $H_0 : \tau_1 = \tau_2 = \tau_3 = \ldots = \tau_n = 0$ against $H_1$: not $H_0$

Now there may be two cases that the experimenter may face.

**Case I: Null Hypothesis Is Not Rejected**

In this case, since H0 is not rejected in favor of H1, the data gives no evidence of a significant difference between the effects of the treatments.

**Case II: Null Hypothesis Rejected**

If the null hypothesis gets rejected in favor of the alternate hypothesis, then the experimenter can claim that the effects due to one or more treatments are different.

Pairwise testing is used on all treatment pairs to determine which treatments are responsible for the difference. This process is known as post hoc analysis.
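The text does not spell out the ANOVA test statistic, so the following is only a sketch under the assumption of the usual one-way layout, where the F ratio compares the between-treatment mean square with the within-treatment mean square. The data and the function name `one_way_anova_f` are hypothetical, and 5.14 is the standard upper 5% table value of the F-distribution with (2, 6) degrees of freedom.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way layout."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = mean([v for g in groups for v in g])

    # Variation explained by the treatments.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation left over inside each treatment group (random error).
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)

    df_between, df_within = k - 1, n_total - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within


# Hypothetical responses under three treatments.
groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
f_obs, df1, df2 = one_way_anova_f(groups)

# Table value: upper 5% point of F with (2, 6) degrees of freedom.
f_critical = 5.14
reject_h0 = f_obs > f_critical
print(f"F = {f_obs:.2f}, df = ({df1}, {df2}), reject H0: {reject_h0}")
```

When the null hypothesis is rejected, as with these made-up numbers, pairwise comparisons of the treatment groups would be the next step.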

Many courses are available today that provide quality education on hypothesis testing. These courses are especially beneficial because they will save you a lot of time and energy.

The main advantage of opting for an online course is that you can learn at your own pace. In offline courses, once a topic is covered, it will be up to you to learn it because the professor may move on to the next topic without waiting for you to finish. This does not happen in online courses. Online courses follow your pace of learning and thus offer better learning opportunities.

Another significant advantage of online courses is that you can attend classes from the comfort of your home, significantly reducing travel expenses.

When you opt for online courses, you will be provided with a choice of instructors and can select someone who suits your needs best. This will allow you to learn much more effectively than offline courses, where your choices remain limited.

Online courses also have excellent doubt-clearing facilities that offline courses lack.

So, for the reasons given above, an online hypothesis testing course can be a better choice than an offline one.

The syllabus for a hypothesis testing course covers:

Test of a statistical hypothesis and critical region

Type I and type II errors

Level of significance and power of test

Optimum tests in different situations

Unbiased tests

Neyman-Pearson lemma

Construction of most powerful (MP) and uniformly most powerful (UMP) critical regions

MP and UMP regions in random sampling from a normal distribution

Construction of type A regions

Construction of type A1 regions

Optimum regions and sufficient statistics

Randomized tests

Composite hypotheses and similar regions

Similar regions and complete sufficient statistics

Construction of most powerful similar regions

Test for the mean of a normal distribution

Test for the variance of a normal distribution

Monotonicity of power function

Consistency

Invariance

Likelihood-ratio tests

Comparing the means of k normal distributions with common variance

Properties of likelihood-ratio tests

Hypothesis testing is widely used across industries to make well-informed, data-driven decisions. It enables professionals to test their theories before putting them into action, which helps organisations capture value while reducing the risk of costly mistakes.

In business and investment settings, it helps experts run statistical analyses on the available datasets and back a strategy with evidence. As more businesses adopt hypothesis testing to check their theories before committing resources, demand for these skills is expected to keep growing.

Today, many in-demand jobs are in the data science field. A data science course may just land you your dream job.

Hypothesis testing is one of the most pivotal concepts in statistics, and it is used across industries. As a result, there has been a huge demand for hypothesis testing courses in India.

These courses cover the knowledge a data scientist needs, allowing you to apply for your dream job no matter your educational background.

Hypothesis testing is a part of statistics. So, solving these problems generally falls on the data scientists or data analysts who deal with statistics as a whole.

The median salary of data scientists in India is Rs. 46,953 per annum.

The entry-level salary for an analyst with an experience of less than a year is Rs. 3,67,000. For an experienced data analyst with more than 20 years of experience, the salary is Rs. 2 million.

Different factors affect the job of a data analyst. A base-level data analyst should know basic statistics and tools like Python, SQL, and R. Apart from these, they must also possess project management and organizational skills.

Data analysts should also have an analytical mind that allows them to work seamlessly with large unstructured data sets.

Other factors that determine their salary are the company they work at, its size and reputation, their position, work experience, and geographic location.

The median salary of a data analyst in the US is $ 63,259 (Rs. 49,43,545.35) per annum.

The median salary of a data analyst in the UK is £28,218 (Rs. 27,06,930.73) per annum.



Real-life hypothesis testing allows researchers to test new theories before implementing them. It is used in different industries to set standards for their products. It is especially helpful to statisticians when designing an experiment with many parameters.

Statistical hypotheses are of two types: simple and composite.

A statistical hypothesis that completely specifies the distribution of the parent population from which the random sample used for testing is drawn is known as a simple hypothesis.

A statistical hypothesis that does not completely specify the distribution of the parent population from which the random sample used for testing is drawn is known as a composite hypothesis.

One needs to know probability theory, the different types of probability distributions, and statistical inference to get a good grasp of hypothesis testing.