Introduction
Machine Learning interviews can vary according to type or category; for instance, a few recruiters ask many Linear Regression interview questions. Interviews for a Machine Learning Engineer role can specialise in categories like Coding, Research, Case Study, Project Management, Presentation, System Design, and Statistics. We will focus on the most common categories and how to prepare for them.
Getting your desired job as a machine learning engineer may need you to pass a machine learning interview. The categories included in these interviews are frequently coding, machine learning concepts, screening, and system design. Different facets of your expertise and knowledge in the topic are assessed in each category. In this article, we’ll examine the most typical machine learning interview questions and offer helpful preparation advice for each of them.
It is a common practice to test data science aspirants on commonly used machine learning algorithms in interviews. These conventional algorithms include linear regression, logistic regression, clustering, and decision trees. Data scientists are expected to possess in-depth knowledge of these algorithms.
We consulted hiring managers and data scientists from various organisations to learn about the typical ML questions they ask in an interview. Based on their extensive feedback, a set of questions and answers was prepared to help aspiring data scientists in their conversations. Linear Regression interview questions are the most common in Machine Learning interviews. Q&As on these algorithms will be provided in a series of four blog posts.
Each blog post will cover one of the following topics:
Linear Regression
Logistic Regression
Clustering
Decision Trees, along with questions that pertain to all algorithms
Let’s get started with linear regression!
1. What is linear regression?
In simple terms, linear regression is a method of finding the best straight line fitting to the given data, i.e. finding the best linear relationship between the independent and dependent variables.
In technical terms, linear regression is a machine learning algorithm that finds the best linear-fit relationship on any given data, between independent and dependent variables. It is mostly done by the Sum of Squared Residuals Method.
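The closed-form least-squares fit for a single feature can be sketched in a few lines of pure Python (the data here is made up purely for illustration):

```python
# Minimal sketch: fit y = b0 + b1*x by minimising the sum of squared
# residuals (ordinary least squares), using the closed-form solution
# for simple linear regression.

def fit_line(xs, ys):
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # b1 = covariance(x, y) / variance(x); b0 places the line through the means
    b1 = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys)) / \
         sum((x - x_mean) ** 2 for x in xs)
    b0 = y_mean - b1 * x_mean
    return b0, b1

b0, b1 = fit_line([1, 2, 3, 4], [3, 5, 7, 9])  # data lies exactly on y = 1 + 2x
print(b0, b1)  # → 1.0 2.0
```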
2. State the assumptions in a linear regression model.
There are three main assumptions in a linear regression model:
The assumption about the form of the model:
It is assumed that there is a linear relationship between the dependent and independent variables. It is known as the ‘linearity assumption’.
Assumptions about the residuals:
Normality assumption: It is assumed that the error terms, ε(i), are normally distributed.
Zero mean assumption: It is assumed that the residuals have a mean value of zero.
Constant variance assumption: It is assumed that the residual terms have the same (but unknown) variance, σ². This assumption is also known as the assumption of homogeneity or homoscedasticity.
Independent error assumption: It is assumed that the residual terms are independent of each other, i.e. their pair-wise covariance is zero.
Assumptions about the estimators:
The independent variables are measured without error.
The independent variables are linearly independent of each other, i.e. there is no multicollinearity in the data.
Explanation:
This is self-explanatory.
If the residuals are not normally distributed, their randomness is lost, which implies that the model is not able to explain the relation in the data.
Also, the mean of the residuals should be zero.
Y(i) = β0 + β1x(i) + ε(i)
This is the assumed linear model, where ε(i) is the residual term.
E(Y(i)) = E(β0 + β1x(i) + ε(i))
= β0 + β1x(i) + E(ε(i))
If the expectation (mean) of the residuals, E(ε(i)), is zero, the expectation of the target variable matches that of the model, which is one of the targets of the model.
The residuals (also known as error terms) should be independent. This means that there is no correlation between the residuals and the predicted values, or among the residuals themselves. If some correlation is present, it implies that there is some relation that the regression model is not able to identify.
If the independent variables are not linearly independent of each other, the uniqueness of the least squares solution (or normal equation solution) is lost.
3. What is feature engineering? How do you apply it in the process of modelling?
Feature engineering is the process of transforming raw data into features that better represent the underlying problem to the predictive models, resulting in improved model accuracy on unseen data.
In layman’s terms, feature engineering means the development of new features that may help you understand and model the problem in a better way. Feature engineering is of two kinds: business-driven and data-driven. Business-driven feature engineering revolves around the inclusion of features from a business point of view. The job here is to transform the business variables into features of the problem.
In the case of data-driven feature engineering, the features you add do not have any significant physical interpretation, but they help the model in the prediction of the target variable.
To apply feature engineering, one must be fully acquainted with the dataset. This involves knowing what the given data is, what it signifies, what the raw features are, etc. You must also have a crystal clear idea of the problem, such as what factors affect the target variable, what the physical interpretation of the variable is, etc.
4. What is the use of regularisation? Explain L1 and L2 regularisations.
Regularisation is a technique that is used to tackle the problem of overfitting of the model. When a very complex model is implemented on the training data, it overfits; a model that is too simple, on the other hand, might not capture the trend in the data at all. Regularisation is used to address this problem.
Regularisation is nothing but adding the coefficient terms (betas) to the cost function so that the terms are penalised and are small in magnitude. This essentially helps in capturing the trends in the data and at the same time prevents overfitting by not letting the model become too complex.
L1 or LASSO regularisation: Here, the absolute values of the coefficients are added to the cost function, i.e. the penalty term is λ Σ|βj|. This regularisation technique gives sparse results, which leads to feature selection as well.
L2 or Ridge regularisation: Here, the squares of the coefficients are added to the cost function, i.e. the penalty term is λ Σβj².
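A small sketch of the two regularised cost functions (assuming squared error as the base loss; the intercept is conventionally excluded from the penalty, and all numbers are made up):

```python
# L1 / LASSO: penalise the sum of absolute coefficient values.
def lasso_cost(residuals, betas, lam):
    return sum(r ** 2 for r in residuals) + lam * sum(abs(b) for b in betas)

# L2 / Ridge: penalise the sum of squared coefficient values.
def ridge_cost(residuals, betas, lam):
    return sum(r ** 2 for r in residuals) + lam * sum(b ** 2 for b in betas)

print(lasso_cost([1, -1], [2, -3], lam=0.1))  # 2 + 0.1*(2 + 3) → 2.5
print(ridge_cost([1, -1], [2, -3], lam=0.1))  # 2 + 0.1*(4 + 9) ≈ 3.3
```

Larger λ pushes the coefficients toward zero; with the L1 penalty some of them become exactly zero, which is why LASSO doubles as a feature-selection method.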
5. How to choose the value of the parameter learning rate (α)?
Selecting the value of learning rate is a tricky business. If the value is too small, the gradient descent algorithm takes ages to converge to the optimal solution. On the other hand, if the value of the learning rate is high, the gradient descent will overshoot the optimal solution and most likely never converge to the optimal solution.
To overcome this problem, you can try different values of alpha over a range of values and plot the cost vs the number of iterations. Then, based on the graphs, the value corresponding to the graph showing the rapid decrease can be chosen.
In an ideal cost vs number of iterations curve, the cost initially decreases as the number of iterations increases, but after a certain number of iterations, gradient descent converges and the cost does not decrease any further.
If you see that the cost is increasing with the number of iterations, your learning rate parameter is high and it needs to be decreased.
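The alpha-sweep idea can be sketched on a toy one-parameter cost function (J(θ) = θ², chosen purely for illustration): record the cost per iteration for each candidate learning rate and inspect the trend.

```python
# Run gradient descent on J(theta) = theta^2 with a given learning rate
# and record the cost at each iteration; plotting these lists (cost vs.
# iterations) reveals which alpha converges and which overshoots.

def gd_cost_history(alpha, iters=20):
    theta = 5.0                      # start away from the minimum at 0
    history = []
    for _ in range(iters):
        history.append(theta ** 2)   # current cost J(theta)
        theta -= alpha * 2 * theta   # gradient of J is 2*theta
    return history

for alpha in (0.01, 0.1, 1.1):
    h = gd_cost_history(alpha)
    trend = "decreasing" if h[-1] < h[0] else "increasing -- alpha too high"
    print(alpha, trend)
```

For α = 1.1 the update factor exceeds 1 in magnitude, so the cost grows with every iteration, exactly the symptom described above.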
6. How to choose the value of the regularisation parameter (λ)?
Selecting the regularisation parameter is a tricky business. If the value of λ is too high, it will lead to extremely small values of the regression coefficient β, which will lead to the model underfitting (high bias – low variance). On the other hand, if the value of λ is 0 (very small), the model will tend to overfit the training data (low bias – high variance).
There is no proper way to select the value of λ. What you can do is have a sub-sample of data and run the algorithm multiple times on different sets. Here, the person has to decide how much variance can be tolerated. Once the user is satisfied with the variance, that value of λ can be chosen for the full dataset.
One thing to be noted is that the value of λ selected here was optimal for that subset, not for the entire training data.
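The select-by-validation idea can be sketched as follows. For simplicity the model is ridge regression with one feature and no intercept, whose closed-form coefficient is Σxy / (Σx² + λ); the data and the candidate λ values are made up for illustration.

```python
# Pick lambda by evaluating each candidate on held-out data.

def ridge_coef(xs, ys, lam):
    # Closed-form ridge solution for one feature, no intercept
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)

def val_error(xs, ys, b):
    # Sum of squared errors on a held-out set
    return sum((y - b * x) ** 2 for x, y in zip(xs, ys))

train_x, train_y = [1, 2, 3], [2.1, 3.9, 6.2]   # roughly y = 2x
val_x, val_y = [4, 5], [8.0, 10.1]

best = min((val_error(val_x, val_y, ridge_coef(train_x, train_y, lam)), lam)
           for lam in (0.0, 0.1, 1.0, 10.0))
print("best lambda:", best[1])  # → best lambda: 0.1
```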
7. Can we use linear regression for time series analysis?
One can use linear regression for time series analysis, but the results are not promising. So, it is generally not advisable to do so. The reasons behind this are —
Time series data is mostly used for the prediction of the future, but linear regression seldom gives good results for future prediction as it is not meant for extrapolation.
Mostly, time series data have a pattern, such as during peak hours, festive seasons, etc., which would most likely be treated as outliers in the linear regression analysis.
8. What value is the sum of the residuals of a linear regression close to? Justify.
The sum of the residuals of a linear regression is 0. Linear regression works on the assumption that the errors (residuals) are normally distributed with a mean of 0, i.e.
Y = βᵀX + ε
Here, Y is the target or dependent variable,
β is the vector of the regression coefficient,
X is the feature matrix containing all the features as the columns,
ε is the residual term such that ε ~ N(0, σ²).
So, the sum of all the residuals is approximately the expected value of the residuals times the total number of data points. Since the expectation of the residuals is 0, the sum of all the residual terms is zero. (In fact, when the model includes an intercept term, the OLS normal equations force the residuals to sum to exactly zero.)
Note: N(μ, σ²) is the standard notation for a normal distribution having mean μ and variance σ².
9. How does multicollinearity affect the linear regression?
Multicollinearity occurs when some of the independent variables are highly correlated (positively or negatively) with each other. This causes a problem as it goes against one of the basic assumptions of linear regression. The presence of multicollinearity does not affect the predictive capability of the model, so if you just want predictions, it does not affect your output. However, if you want to draw insights from the model and apply them in, let’s say, some business setting, it may cause problems.
One of the major problems caused by multicollinearity is that it leads to incorrect interpretations and provides wrong insights. The coefficients of linear regression suggest the mean change in the target value if a feature is changed by one unit. So, if multicollinearity exists, this does not hold true as changing one feature will lead to changes in the correlated variable and consequent changes in the target variable. This leads to wrong insights and can produce hazardous results for a business.
A highly effective way of dealing with multicollinearity is the use of VIF (Variance Inflation Factor). The higher the value of VIF for a feature, the more linearly correlated that feature is with the other features. One can simply remove the feature with a very high VIF value and re-train the model on the remaining dataset.
10. What is the normal form (equation) of linear regression? When should it be preferred to the gradient descent method?
The normal equation for linear regression is —
β = (XᵀX)⁻¹XᵀY
Here, Y = βᵀX is the model for the linear regression,
Y is the target or dependent variable,
β is the vector of the regression coefficient, which is arrived at using the normal equation,
X is the feature matrix containing all the features as the columns.
Note here that the first column in the X matrix consists of all 1s. This is to incorporate the offset value for the regression line.
Comparison between gradient descent and the normal equation:
- Hyper-parameters: gradient descent needs tuning of the learning rate α; the normal equation has no such need.
- Nature: gradient descent is an iterative process; the normal equation is non-iterative.
- Complexity: gradient descent is O(kn²); the normal equation is O(n³) due to the evaluation of (XᵀX)⁻¹.
- Scale: gradient descent is preferred when n is extremely large; the normal equation becomes quite slow for large values of n.
Here, ‘k’ is the maximum number of iterations for gradient descent, and ‘n’ is the number of features (the normal equation requires inverting the n×n matrix XᵀX).
Clearly, if we have a large number of features, the normal equation is not preferred. For small values of ‘n’, the normal equation is faster than gradient descent.
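A dependency-free sketch of the normal equation for the simple one-feature-plus-intercept case, with the 2×2 inverse written out by hand (data made up for illustration):

```python
# beta = (X^T X)^-1 X^T y, where X = [[1, x_i]] has a column of 1s for
# the intercept. For this X, X^T X = [[n, Sx], [Sx, Sxx]], so the 2x2
# inverse can be expanded explicitly.

def normal_equation(xs, ys):
    n = len(xs)
    Sx, Sxx = sum(xs), sum(x * x for x in xs)
    Sy, Sxy = sum(ys), sum(x * y for x, y in zip(xs, ys))
    det = n * Sxx - Sx * Sx
    # Rows of (X^T X)^-1 X^T y, expanded for the 2x2 case
    b0 = (Sxx * Sy - Sx * Sxy) / det
    b1 = (n * Sxy - Sx * Sy) / det
    return b0, b1

print(normal_equation([1, 2, 3, 4], [3, 5, 7, 9]))  # → (1.0, 2.0)
```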
11. You run your regression on different subsets of your data, and in each subset, the beta value for a certain variable varies wildly. What could be the issue here?
This case implies that the dataset is heterogeneous. So, to overcome this problem, the dataset should be clustered into different subsets, and then separate models should be built for each cluster. Another way to deal with this problem is to use non-parametric models, such as decision trees, which can deal with heterogeneous data quite efficiently.
12. Your linear regression fails to run and reports that there is an infinite number of best estimates for the regression coefficients. What could be wrong?
This condition arises when there is a perfect correlation (positive or negative) between some variables. In this case, there is no unique value for the coefficients, and hence, the given condition arises.
13. What do you mean by adjusted R2? How is it different from R2?
Adjusted R2, just like R2, shows how well the data points lie around the regression line, i.e. how well the model fits the training data. The formula for adjusted R2 is:
Adjusted R2 = 1 − [(1 − R2)(n − 1) / (n − k − 1)]
Here, n is the number of data points, and k is the number of features.
One drawback of R2 is that it will always increase with the addition of a new feature, whether the new feature is useful or not. The adjusted R2 overcomes this drawback. The value of the adjusted R2 increases only if the newly added feature plays a significant role in the model.
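The penalty for extra features is easy to see numerically (R² and the counts below are made-up values):

```python
# Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - k - 1),
# where n is the number of data points and k the number of features.

def adjusted_r2(r2, n, k):
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Adding a useless feature (k: 2 -> 3) with no gain in R^2 lowers adjusted R^2:
print(round(adjusted_r2(0.90, n=50, k=2), 4))  # → 0.8957
print(round(adjusted_r2(0.90, n=50, k=3), 4))  # → 0.8935
```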
14. How do you interpret the residual vs fitted value curve?
The residual vs fitted value plot is used to see whether the predicted values and residuals have a correlation or not. If the residuals are distributed randomly around zero with a constant variance, our model is working fine; otherwise, there is some issue with the model.
The most common problem found when training the model over a large range of a dataset is heteroscedasticity (this is explained in the answer below). The presence of heteroscedasticity can be easily seen by plotting the residual vs fitted value curve.
15. What is heteroscedasticity? What are the consequences, and how can you overcome it?
A random variable is said to be heteroscedastic when different subpopulations have different variabilities (standard deviation).
The existence of heteroscedasticity gives rise to certain problems in the regression analysis as the assumption says that error terms are uncorrelated and, hence, the variance is constant. The presence of heteroscedasticity can often be seen in the form of a cone-like scatter plot for residual vs fitted values.
One of the basic assumptions of linear regression is that heteroscedasticity is not present in the data. When this assumption is violated, the Ordinary Least Squares (OLS) estimators are no longer the Best Linear Unbiased Estimators (BLUE); they no longer have the least variance among all Linear Unbiased Estimators (LUEs).
There is no fixed procedure to overcome heteroscedasticity. However, there are some ways that may lead to a reduction of heteroscedasticity. They are —
Log-transforming the data: A series that increases exponentially often shows increasing variability. This can be overcome using a log transformation.
Using weighted linear regression: Here, the OLS method is applied to the weighted values of X and Y. One way is to attach weights directly related to the magnitude of the dependent variable.
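Both remedies can be sketched with made-up numbers. The weighted fit below assumes a no-intercept model y = b·x whose error variance grows with x, so weights w = 1/x² are a natural (hypothetical) choice:

```python
import math

# Weighted least squares for y = b*x: b = sum(w*x*y) / sum(w*x^2)
def wls_coef(xs, ys, ws):
    return sum(w * x * y for w, x, y in zip(ws, xs, ys)) / \
           sum(w * x * x for w, x in zip(ws, xs))

xs = [1.0, 2.0, 4.0, 8.0]
ys = [2.1, 3.8, 8.4, 15.0]          # noisier at larger x
b = wls_coef(xs, ys, ws=[1 / x ** 2 for x in xs])
print(round(b, 3))                   # → 1.994

# Log transform: an exponentially growing series becomes linear in log space
series = [10, 100, 1000, 10000]
print([round(math.log10(v), 1) for v in series])  # → [1.0, 2.0, 3.0, 4.0]
```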
16. What is VIF? How do you calculate it?
Variance Inflation Factor (VIF) is used to check the presence of multicollinearity in a dataset. It is calculated as:
VIFj = 1 / (1 − Rj²)
Here, VIFj is the value of VIF for the jth variable, and Rj² is the R² value of the model when that variable is regressed against all the other independent variables.
If the value of VIF is high for a variable, it implies that the R2 value of the corresponding model is high, i.e. other independent variables are able to explain that variable. In simple terms, the variable is linearly dependent on some other variables.
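A small sketch of the calculation with a hypothetical two-variable example; when there are only two independent variables, Rj² is simply their squared correlation:

```python
# VIF_j = 1 / (1 - R_j^2). With just two independent variables,
# R_j^2 is the squared Pearson correlation between them.

def pearson_r(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

x1 = [1, 2, 3, 4, 5]
x2 = [2.1, 3.9, 6.0, 8.1, 9.9]       # nearly 2 * x1 -> strong collinearity
r2 = pearson_r(x1, x2) ** 2
print(round(1 / (1 - r2), 1))        # VIF far above the usual cutoff of ~5
```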
17. How do you know that linear regression is suitable for any given data?
To see if linear regression is suitable for any given data, a scatter plot can be used. If the relationship looks linear, we can go for a linear model. But if it is not the case, we have to apply some transformations to make the relationship linear. Plotting the scatter plots is easy in case of simple or univariate linear regression. But in case of multivariate linear regression, two-dimensional pairwise scatter plots, rotating plots, and dynamic graphs can be plotted.
18. How is hypothesis testing used in linear regression?
Hypothesis testing can be carried out in linear regression for the following purposes:
To check whether a predictor is significant for the prediction of the target variable. Two common methods for this are —
By the use of p-values:
If the p-value of a variable is greater than a certain limit (usually 0.05), the variable is insignificant in the prediction of the target variable.
By checking the values of the regression coefficient:
If the value of the regression coefficient corresponding to a predictor is zero, that variable is insignificant in the prediction of the target variable and has no linear relationship with it.
To check whether the calculated regression coefficients are good estimators of the actual coefficients.
19. Explain gradient descent with respect to linear regression.
Gradient descent is an optimisation algorithm. In linear regression, it is used to optimise the cost function and find the values of the βs (estimators) corresponding to the optimised value of the cost function.
Gradient descent works like a ball rolling down a graph (ignoring the inertia). The ball moves along the direction of the greatest gradient and comes to rest at the flat surface (minima).
Mathematically, the aim of gradient descent for linear regression is to find the solution of
ArgMin J(Θ0, Θ1), where J(Θ0, Θ1) is the cost function of the linear regression. It is given by:
J(Θ0, Θ1) = (1/2m) Σ (h(x(i)) − y(i))²
Here, h is the linear hypothesis model, h=Θ0 + Θ1x, y is the true output, and m is the number of the data points in the training set.
Gradient Descent starts with a random solution, and then based on the direction of the gradient, the solution is updated to the new value where the cost function has a lower value.
The update rule, repeated until convergence (simultaneously for j = 0, 1), is:
Θj := Θj − α · ∂J(Θ0, Θ1)/∂Θj
20. How do you interpret a linear regression model?
A linear regression model is quite easy to interpret. The model is of the following form:
Y = β0 + β1x1 + β2x2 + … + βkxk
The significance of this model lies in the fact that one can easily interpret and understand the marginal changes and their consequences. For example, if the value of xi increases by 1 unit, keeping the other variables constant, the value of y increases by βi. Mathematically, the intercept term (β0) is the response when all the predictor terms are set to zero or not considered.
21. What is robust regression?
A regression model should be robust in nature. This means that with changes in a few observations, the model should not change drastically. Also, it should not be much affected by the outliers.
A regression model fit with OLS (Ordinary Least Squares) is quite sensitive to outliers. To overcome this problem, we can use the WLS (Weighted Least Squares) method to determine the estimators of the regression coefficients. Here, lower weights are assigned to the outliers or high-leverage points in the fitting, making these points less impactful.
22. Which graphs are suggested to be observed before model fitting?
Before fitting the model, one must be well aware of the data, such as what the trends, distribution, skewness, etc. in the variables are. Graphs such as histograms, box plots, and dot plots can be used to observe the distribution of the variables. Apart from this, one must also analyse what the relationship between dependent and independent variables is. This can be done by scatter plots (in case of univariate problems), rotating plots, dynamic plots, etc.
23. What is the generalized linear model?
The generalized linear model (GLM) is a generalisation of the ordinary linear regression model. GLM is more flexible in terms of residuals and can be used where linear regression does not seem appropriate. GLM allows the distribution of the residuals to be other than normal. It generalizes linear regression by connecting the linear model to the target variable through a link function. Model estimation is done using the method of maximum likelihood estimation.
24. Explain the bias-variance trade-off.
Bias refers to the difference between the values predicted by the model and the real values. It is an error. One of the goals of an ML algorithm is to have a low bias.
Variance refers to the sensitivity of the model to small fluctuations in the training dataset. Another goal of an ML algorithm is to have low variance.
For a dataset that is not exactly linear, it is not possible to have both bias and variance low at the same time. A straight line model will have low variance but high bias, whereas a high-degree polynomial will have low bias but high variance.
There is no escaping the relationship between bias and variance in machine learning.
Decreasing the bias increases the variance.
Decreasing the variance increases the bias.
So, there is a trade-off between the two; the ML specialist has to decide, based on the assigned problem, how much bias and variance can be tolerated. Based on this, the final model is built.
25. How can learning curves help create a better model?
Learning curves give the indication of the presence of overfitting or underfitting.
In a learning curve, the training error and the cross-validation error are plotted against the number of training data points.
If the training error and true error (cross-validating error) converge to the same value and the corresponding value of the error is high, it indicates that the model is underfitting and is suffering from high bias.
Machine Learning Interviews and How to Ace Them
1. Coding
Coding and programming are significant components of a machine learning interview and are frequently used to screen applicants. To do well in these interviews, you need solid programming abilities. Coding interviews typically run 45 to 60 minutes and are made up of only two questions. The interviewer poses the problem and expects the applicant to solve it in the least amount of time possible.
How to prepare – You can prepare for these interviews by having a good understanding of the data structures, complexities of time and space, management skills, and the ability to understand and resolve a problem. upGrad has a great software engineering course that can help you enhance your coding skills and ace that interview.
2. Machine Learning
Your understanding of machine learning will be evaluated through interviews. Convolutional layers, recurrent neural networks, generative adversarial networks, speech recognition, and other topics may be covered, depending on the job requirements.
How to prepare – To ace this interview, you must ensure that you have a thorough understanding of the job role and responsibilities. This will help you identify the areas of ML that you must study. If you do not come across any such specifics, you must deeply understand the basics. An in-depth course in ML, such as the one upGrad provides, can help you with that. You can also study the latest articles on ML and AI on a regular basis to keep up with the latest trends.
3. Screening
This interview is somewhat informal and typically one of the initial points of the interview. A prospective employer often handles it. This interview’s major goal is to provide the applicant with a sense of the business, the role, and the duties. In a more informal atmosphere, the candidate is also questioned about their past to determine whether their area of interest matches the position.
How to prepare – This is a very non-technical part of the interview. All it requires is your honesty and the basics of your specialisation in Machine Learning.
4. System Design
Such interviews test a person’s capacity to create a fully scalable solution from beginning to finish. The majority of engineers are so preoccupied with an issue that they frequently overlook the wider picture. A system design interview calls for an understanding of numerous elements that combine to produce a solution. These elements include the front-end layout, the load balancer, the cache, and more. An effective and scalable end-to-end system is easier to develop when these issues are well understood.
How to prepare – Understand the concepts and components of the system design project. Use real-life examples to explain the structure to your interviewer for a better understanding of the project.
If there is a significant gap between the converging values of the training and cross-validation errors, i.e. the cross-validating error is significantly higher than the training error, it suggests that the model is overfitting the training data and is suffering from a high variance.
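As an illustrative sketch, a learning curve can be built by training on growing subsets and recording both errors. The "model" below deliberately underfits (it predicts the training mean of made-up linear data), so both errors converge to a similarly high value, the high-bias signature described above:

```python
import random

def mse(ys, pred):
    # Mean squared error of a constant prediction
    return sum((y - pred) ** 2 for y in ys) / len(ys)

rng = random.Random(0)                         # fixed seed for reproducibility
data = [(x, 2 * x + 1) for x in range(1, 41)]  # truly linear data
rng.shuffle(data)
train, val = data[:30], data[30:]
val_y = [y for _, y in val]

curve = []
for m in (3, 10, 20, 30):
    subset_y = [y for _, y in train[:m]]
    pred = sum(subset_y) / m                   # mean-only model: high bias
    curve.append((m, mse(subset_y, pred), mse(val_y, pred)))
    print(curve[-1])                           # (m, training error, validation error)
```

With a flexible (high-variance) model the picture would differ: the training error would stay low while the validation error remained much higher, the gap described in the overfitting case.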
That’s the end of the first section of this series. Stick around for the next part of the series, which consists of questions based on Logistic Regression. Feel free to post your comments.
Co-authored by – Ojas Agarwal
You can check our Executive PG Programme in Machine Learning & AI, which provides practical hands-on workshops, one-to-one industry mentor, 12 case studies and assignments, IIIT-B Alumni status, and more.

10 Sep 2023

Simpson’s paradox is a phenomenon in probability and statistics, in which a trend appears in different groups of data, but disappears or reverses when these groups are combined.
You need to be very careful while calculating averages or pooling data from different sectors. It is always better to check whether the pooled data tell the same story or a different one from that of the non-aggregated data. If the story is different, then there is a high probability of Simpson’s paradox. A lurking variable must be affecting the direction of the explanatory and target variables.
Historical Background
Simpson’s Paradox was discovered in the early twentieth century, with contributions from various statisticians and scholars. In 1951, Edward H. Simpson, a British statistician, found one of the earliest prominent examples. However, the paradox itself had been observed in various forms even before Simpson’s work.
Simpson’s Paradox refers to a phenomenon in which an apparent trend or relationship in aggregated data reverses or disappears when the data is disaggregated into subgroups. If not fully understood and accounted for, this surprising behaviour can lead to incorrect findings.
Consider the famous Simpson’s Paradox example to better understand the dilemma it presents. Assume a university has two departments, A and B, and the goal is to compare their acceptance rates for male and female candidates. A surface analysis of the aggregated data suggests that men are admitted at a higher rate than women overall; however, when we break the data down by department, women are admitted at an equal or higher rate than men within each department. This trend reversal at the subgroup level is an example of Simpson’s Paradox.
Real-world Applications
Simpson’s Paradox has far-reaching implications and has been observed in various domains, including social sciences, healthcare, education, economics, and sports. Understanding this Simpson’s paradox in data science is crucial for avoiding misinterpretation of data and making accurate decisions.
In the field of healthcare, Simpson’s Paradox has been encountered in studies evaluating the effectiveness of treatments. For instance, a drug may show positive effects overall but fail to demonstrate efficacy when the data is analyzed based on different patient characteristics or disease severity levels. This highlights the importance of considering subgroup analyses to gain a comprehensive understanding of treatment outcomes.
In economics, Simpson’s Paradox can occur when analyzing income inequality across different regions or demographic groups. Aggregated data may suggest a decreasing income gap, but disaggregating the data could reveal that inequality actually worsens within each subgroup. This emphasizes the need to examine data from various perspectives to avoid overlooking underlying patterns.
Preventive Measures
To guard against Simpson’s Paradox and ensure accurate analysis, researchers should take several preventive steps. First and foremost, perform a subgroup analysis: examining the data closely at the subgroup level can expose subtleties in the underlying relationships and uncover confounding variables or interaction effects that contribute to the paradox. Sample size must also be taken into account: adequate sample sizes within subgroups are essential for reliable, statistically meaningful results, and small subgroups raise the odds of encountering Simpson’s Paradox.
Context is another significant factor. Understanding the setting in which the data was collected helps identify potential biases and confounding factors, which can then be incorporated into the analysis for a more accurate interpretation of the findings. Finally, advanced statistical techniques, such as multivariate analysis and causal modeling, can help untangle the real relationships between variables by identifying and controlling for confounders, yielding a more robust analysis.
By implementing these preventive measures, researchers and analysts can minimise the risk of being misled by Simpson’s Paradox and improve the accuracy and dependability of their findings. Approach data analysis with caution, and always consider subgroup results before drawing conclusions.
Let us understand Simpson’s paradox with the help of another example:
In 1973, a court case was registered against the University of California, Berkeley. The reason behind the case was gender bias during graduate admissions. Here, we will generate synthetic data to explain what really happened.
Let’s assume the combined data for admissions in all departments is as follows:
Gender | Applicants | Admitted | Admission Percentage
Men    | 2,691      | 1,392    | 52%
Women  | 1,835      | 789      | 43%
If you observe the data carefully, you’ll see that 52% of the males were given admission, while only 43% of the women were admitted to the university. Clearly, the admissions favoured the men, and the women were not given their due. However, the case is not so simple as it appears from this information alone. Let’s now assume that there are two different categories of departments — ‘Hard’ (hard to get into) and ‘Easy’.
Let’s divide the combined data into these categories and see what happens:
Department | Applied (Men) | Applied (Women) | Admitted (Men) | Admitted (Women) | Admission % (Men) | Admission % (Women)
Hard       | 780           | 1,266           | 200            | 336              | 26%               | 27%
Easy       | 1,911         | 569             | 1,192          | 453              | 62%               | 80%
Do you see any gender bias here? In the ‘Easy’ department, 62% of the men and 80% of the women got admission. Likewise, in the ‘Hard’ department, 26% of the men and 27% of the women got admission. Is there any bias here? Yes, there is; but, interestingly, the bias is not in favour of the men, it favours the women! Yet when you combine this data, an altogether different story emerges, and a bias favouring the men becomes apparent. In statistics, this phenomenon is known as ‘Simpson’s paradox.’ But why does this paradox occur?
Simpson’s paradox occurs if the effect of the explanatory variable on the target variable changes direction when you account for a lurking explanatory variable. In the above example, the lurking variable is the ‘department.’ Far more men than women applied to the ‘Easy’ department, while more women than men applied to the ‘Hard’ department, so a much larger share of the women’s applications were rejected. When this data is combined, it shows an apparent bias towards male admissions, which does not actually exist.
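The reversal above can be verified with a short calculation. The sketch below uses plain Python and the admission numbers quoted in this article; `rate` is a helper name chosen here for illustration.

```python
# Berkeley-style admissions data from the tables above:
# (applied, admitted) per gender, per department.
data = {
    "Hard": {"Men": (780, 200), "Women": (1266, 336)},
    "Easy": {"Men": (1911, 1192), "Women": (569, 453)},
}

def rate(applied, admitted):
    """Admission rate as a percentage."""
    return 100.0 * admitted / applied

# Per-department rates: women do as well as or better than men.
for dept, groups in data.items():
    for gender, (applied, admitted) in groups.items():
        print(f"{dept:4} {gender:5}: {rate(applied, admitted):.0f}%")

# Aggregated rates: the direction reverses and men appear favoured.
for gender in ("Men", "Women"):
    applied = sum(data[d][gender][0] for d in data)
    admitted = sum(data[d][gender][1] for d in data)
    print(f"Overall {gender:5}: {rate(applied, admitted):.0f}%")
```

Within each department the women’s rate matches or beats the men’s, yet the pooled rates favour the men, because the department (the lurking variable) determines both who applies and how hard admission is.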
Now suppose you were a statistician for the Indian government inspecting a fighter plane that returned from the 1962 war with China. Inspecting the bullet holes on the aircraft’s surface, what would you recommend? Would you recommend strengthening the areas hit by bullets?
The following is an excerpt from a StackExchange post:
“During World War II, Abraham Wald was a statistician for the U.S. government. He looked at the bombers that returned from missions and analysed the pattern of the bullet ‘wounds’ on the planes. He recommended that the Navy reinforce areas where the planes had no damage.
Why? We have selective effects at work. This sample suggests that damage inflicted on the observed areas could be withstood. Either the plane was never hit in the untouched areas — an unlikely proposition — or strikes to those parts were lethal. We care about the planes that went down, not just those that returned. Those that fell likely suffered an attack in a place that was untouched on those that survived.”
In statistics, things are not as they appear on the surface. You need to be skeptical and look beyond the obvious during analyses. Maybe it’s time to read ‘Think Like a Freak’ or ‘How to Think Like Sherlock’. Let us know if you already have and what your thoughts are on the same!


A constant form of silent evolution is machine learning. We thought computers were the big all-that that would allow us to work more efficiently; soon, machine learning was introduced to the picture, changing the discourse of our lives forever. The reshaping of the world started with teaching computers to do things for us, and now it has reached the stage where even that simple step is eliminated. It is no longer imperative for us to teach computers how to execute complex tasks like text translation or image recognition: instead, we built systems that let them do it themselves. It’s as close to magic as the muggle community will ever reach!
The exceptionally powerful form of machine learning being used today goes by the name “deep learning”. It trains complex mathematical structures called neural networks on vast quantities of data. Constructed to be analogous to how the human brain functions, neural networks themselves were first proposed in the 1940s. However, it is only in the past decade or so that computers have become powerful enough to put them to use.
What exactly is Machine Learning?
So, in general terms, machine learning is an application of Artificial Intelligence. Let’s take the example of shopping online: have you ever been in a situation where the app or website started recommending products associated with or similar to a purchase you made? If yes, then you have seen machine learning in action. Even the “bought together” combination of products is a byproduct of machine learning.
This is how companies target their audience: they divide people into various categories to serve them better and tailor the shopping experience to each person’s browsing behaviour.
Machine learning, simply put, makes predictions based on experience. It enables machines to make data-driven decisions, which is more efficient than explicitly programming them to carry out certain tasks. These algorithms are designed so that exposure to new data helps organisations learn and improve their strategies.
What is the future of Machine Learning?
Improved cognitive services
With the help of machine learning services like SDKs and APIs, developers are able to build and hone intelligent capabilities into their applications. This empowers machines to apply what they learn and carry out an array of duties like vision recognition, speech detection, and understanding of speech and dialect. Alexa is already talking to us, and our phones are already listening to our conversations; how else do you think the machine “wakes up” to run a Google search on 9/11 conspiracies for you? These improved cognitive skills are something we could not have imagined a decade ago, yet here we are. Machines’ ability to engage humans is being constantly refined to serve and understand us better.
We already spend so much time in front of screens that our mobiles have become an extension of us, and through cognitive learning, that has literally become the case. Your machine learns all about you and then alters your results accordingly. No two people’s Google search results are the same. Why? Cognitive learning.
The Rise of Quantum Computing
“Quantum computing” sounds like something straight out of a science fiction movie, no? But it has become a genuine phenomenon. Satya Nadella, the chief executive of Microsoft Corp., calls it one of the three technologies that will reshape our world. Quantum algorithms have the potential to transform and innovate the field of machine learning: they could process data at a much faster pace and accelerate our ability to draw insights and synthesize information.
Heavy-duty computation will finally be done in a jiffy, saving much time and many resources. The increased performance of machines will open many doorways and take evolution to the next level. Something as basic as two numbers, 0 and 1, changed the way of the world; imagine what could be achieved if we ventured into a whole new realm of computing and physics.
Join the AI & ML course online from the World’s top Universities – Masters, Executive Post Graduate Programs, and Advanced Certificate Program in ML & AI to fast-track your career.
Rise of Robots
With machine learning on the rise, it is only natural that the medium gets a face on it— robots! The sophistication of machine learning is not a ‘small wonder’ if you know what I mean.
Multi-agent learning, robot vision, and self-supervised learning will all be accomplished through robotisation. Drones have already become commonplace and have even begun to replace human delivery workers. With the rapid speed at which technology is moving forward, even the sky is not the limit. Our childhood fantasies of living in the era of the Jetsons will soon become reality. The smallest of tasks will be automated, and human beings will no longer need to be self-reliant, because a bot will follow you like a shadow at all times.
Career Opportunities in the field?
Now that you are aware of the reach of machine learning and how it can single-handedly change the course of the world, how can you become a part of it?
Here are some job options that you can consider:
Machine Learning Engineer – They are sophisticated programmers who develop the systems and machines that learn and apply knowledge without having any specific lead or direction.
Deep Learning Engineer – Similar to computer scientists, they specialise in using deep learning platforms to develop tasks related to artificial intelligence. Their main goal is to be able to mimic and emulate brain functions.
Data Scientist – Someone who extracts meaning from data and analyses and interprets it. The role requires a combination of statistical methods, programming, and tools.
Computer Vision Engineer – They are software developers who create vision algorithms for recognising patterns in images.
Machine learning already is changing, and will continue to change, the course of the world in the coming decade. Let’s eagerly prepare for what the future holds. Let’s hope that machines do not get the bright idea of taking over the world, because not all of us are Arnold Schwarzenegger. Fingers crossed!
Importance of Machine Learning
Machine learning is important as it helps give a new perspective on customer trends and business patterns. Machine learning is being used by various big companies today, such as Facebook, Uber, Ola, Google, etc.
It is also useful in driving business results, saving money and time, and in automating tasks that would otherwise be performed manually by an individual.
Use cases of machine learning
Healthcare
Machine learning is highly useful in the field of healthcare. Natural language processing helps extract accurate insights from clinical text, and machine learning is also applied to imaging such as CT scans, X-rays, and ultrasounds. Healthcare is a promising area for machine learning because it helps redefine age-old processes.
Banking and Finance
Machine learning uses statistical patterns to make accurate predictions. The technology is also helpful in document analysis, fraud detection, KYC processing, high-frequency trading, and more, and it is rapidly transforming the banking sector.
Image recognition
Another application of machine learning is image recognition. It is used to detect objects and faces in images across the internet; social networking sites such as Facebook use it to suggest tags in photos.
If you’re interested to learn more about machine learning, check out IIIT-B & upGrad’s Executive PG Programme in Machine Learning & AI which is designed for working professionals and offers 450+ hours of rigorous training, 30+ case studies & assignments, IIIT-B Alumni status, 5+ practical hands-on capstone projects & job assistance with top firms.


Welcome to the second part of the series of commonly asked interview questions based on machine learning algorithms. We hope that the previous section on Linear Regression was helpful to you.
Machine learning is a growing field. Its demand is increasing, and the market is expected to grow rapidly in the coming years because of machine learning’s vast applicability: its uses are almost everywhere, from a simple switch to giant technologies. It is considered one of the highest-paying careers today. The average salary of a machine learning engineer is 7.5 LPA, with a typical range of 3.5 LPA to 21.0 LPA; it can be higher still, depending on experience, skill set, and upskilling history. (Source)
Logistic regression is a machine learning classification algorithm. It is a statistical analysis method used to predict a binary outcome. It predicts a dependent variable by analysing its relationship with one or more independent variables. Whereas linear regression fits a straight line to the data, logistic regression fits an S-shaped curve.
Logistic regression is widely applicable and can be used to make predictions such as whether a political candidate will win or not, or whether a patient will have a heart attack or not. This is how to explain logistic regression in an interview.
Let’s find the answers to questions on logistic regression:
1. What is a logistic function? What is the range of values of a logistic function?
f(z) = 1 / (1 + e^(-z))
The values of a logistic function will range from 0 to 1. The values of Z will vary from -infinity to +infinity.
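As a quick sanity check, the logistic function can be implemented in a few lines of Python; `sigmoid` is a helper name chosen here for illustration.

```python
import math

def sigmoid(z):
    """Logistic function: maps any real z into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

# Large negative z gives values near 0; large positive z gives values near 1.
for z in (-10, -1, 0, 1, 10):
    print(z, round(sigmoid(z), 4))
```

Note that the output never reaches 0 or 1 exactly; it only approaches them as z goes to minus or plus infinity.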
2. Why is logistic regression very popular?
Logistic regression is popular because it can convert logits (log-odds), which range from -infinity to +infinity, into probabilities that lie between 0 and 1. As logistic functions output the probability of occurrence of an event, they can be applied to many real-life scenarios. It is for this reason that the logistic regression model is very popular.
It is one of the most commonly asked logistic regression questions. Logistic regression is also predictive analysis just like all the other regressions and is used to describe the relationship between the variables. There are many real-life examples of logistic regression such as the probability of predicting a heart attack, the probability of finding if the transaction is going to be fraudulent or not, etc.
3. What is the formula for the logistic regression function?
f(z) = 1 / (1 + e^(-(α + β1X1 + β2X2 + … + βkXk)))
4. How can the probability of a logistic regression model be expressed as conditional probability?
P(Discrete value of Target variable | X1, X2, X3….Xk). It is the probability of the target variable taking up a discrete value (either 0 or 1 in case of binary classification problems) when the values of independent variables are given. For example, the probability an employee will attrite (target variable) given his attributes such as his age, salary, KRA’s, etc.
5. What are the odds?
These types of logistic regression questions and answers are asked during the interview to gauge the candidate’s grasp of the basics. Odds are the ratio of the probability of an event occurring to the probability of the event not occurring. For example, let’s assume that the probability of winning a lottery is 0.01. Then, the probability of not winning is 1 - 0.01 = 0.99.
The odds of winning the lottery = (Probability of winning)/(probability of not winning)
The odds of winning the lottery = 0.01/0.99
The odds of winning the lottery are 1 to 99, and the odds of not winning the lottery are 99 to 1.
6. What are the outputs of the logistic model and the logistic function?
The logistic model outputs the logits, i.e. log odds; and the logistic function outputs the probabilities.
Logistic model = α + β1X1 + β2X2 + … + βkXk. The output of the same will be the logits.
Logistic function = f(z) = 1 / (1 + e^(-(α + β1X1 + β2X2 + … + βkXk))). The output, in this case, will be the probabilities.
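The distinction between the two outputs can be seen numerically. In the sketch below, the intercept `alpha` and the `betas` are made-up coefficients for illustration, not fitted values:

```python
import math

# Hypothetical coefficients: intercept alpha and betas for two features.
alpha = -1.5
betas = [0.8, 0.3]

def logit(x):
    """Logistic model: linear combination -> log-odds (any real number)."""
    return alpha + sum(b * xi for b, xi in zip(betas, x))

def probability(x):
    """Logistic function: squashes the log-odds into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-logit(x)))

x = [2.0, 1.0]
print("logit:", logit(x))              # log-odds output of the model
print("probability:", probability(x))  # probability output of the function
```

The same linear combination underlies both; the logistic function simply passes it through the sigmoid to turn log-odds into a probability.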
7. How to interpret the results of a logistic regression model? Or, what are the meanings of alpha and beta in a logistic regression model?
Alpha is the baseline in a logistic regression model. It is the log odds for an instance when all the attributes (X1, X2,………….Xk) are zero. In practical scenarios, the probability of all the attributes being zero is very low. In another interpretation, Alpha is the log odds for an instance when none of the attributes is taken into consideration.
Beta is the value by which the log odds change by a unit change in a particular attribute by keeping all other attributes fixed or unchanged (control variables).
Each beta in logistic regression is associated with a predictor X and represents the expected change in the log odds per unit change in that X, whereas alpha is a constant.
Join the Artificial Intelligence Course online from the World’s top Universities – Masters, Executive Post Graduate Programs, and Advanced Certificate Program in ML & AI to fast-track your career.
8. What is odds ratio?
The odds ratio is the ratio of the odds of an event between two groups; it compares how likely the event is in one group relative to the other. For example, let’s assume that we are trying to ascertain the effectiveness of a medicine. We administered this medicine to the ‘intervention’ group and a placebo to the ‘control’ group.
Odds ratio (OR) = (odds of the intervention group)/(odds of the control group)
Interpretation
If the odds ratio = 1, then there is no difference between the intervention group and the control group
If the odds ratio is greater than 1, the odds of the event are higher in the intervention group; when the event is a desirable outcome (such as recovery), this means the intervention group is doing better than the control group
If the odds ratio is less than 1, the odds of the event are lower in the intervention group, i.e. the control group is doing better.
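As a numerical sketch of this interpretation (the trial counts below are invented for illustration):

```python
def odds(events, non_events):
    """Odds of an event computed from counts: events / non-events."""
    return events / non_events

# Hypothetical trial: recoveries vs. non-recoveries in each group.
intervention_odds = odds(60, 40)  # 60 recovered, 40 did not
control_odds = odds(40, 60)       # 40 recovered, 60 did not

odds_ratio = intervention_odds / control_odds
print(round(odds_ratio, 2))  # greater than 1: the intervention group fared better
```

Here the odds ratio works out to 2.25, meaning the odds of recovery in the intervention group are 2.25 times those in the control group.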
9. What is the formula for calculating the odds ratio?
For two groups X1 and X0, the odds ratio under a logistic model is:
OR = e^(β1(X11 - X01) + β2(X12 - X02) + … + βk(X1k - X0k))
In the formula above, X1 and X0 stand for the two groups for which the odds ratio needs to be calculated. X1i stands for the value of attribute ‘i’ in group X1, X0i stands for the value of attribute ‘i’ in group X0, and βi stands for the corresponding coefficient of the logistic regression model. Note that the baseline (α) is not included in this formula, as it cancels out in the ratio.
10. Why can’t linear regression be used in place of logistic regression for binary classification?
The reasons why linear regressions cannot be used in the case of binary classification are as follows:
Distribution of error terms: The distribution of data in the case of linear and logistic regression is different. Linear regression assumes that error terms are normally distributed. In the case of binary classification, this assumption does not hold true.
Model output: In linear regression, the output is continuous. In the case of binary classification, an output of a continuous value does not make sense. For binary classification problems, linear regression may predict values that can go beyond 0 and 1. If we want the output in the form of probabilities, which can be mapped to two different classes, then its range should be restricted to 0 and 1. As the logistic regression model can output probabilities with logistic/sigmoid function, it is preferred over linear regression.
Variance of residual errors: Linear regression assumes that the variance of the random errors is constant. This assumption is also violated in binary classification, because the variance of a binary outcome depends on the probability of the event and is therefore not constant.
This can be asked in alternative ways, such as, “Logistic regression error values are normally distributed: true or false?” or “Select the incorrect statement about logistic regression.”
Linear regression is a model used to estimate the relationship between two variables, one dependent and one independent, using a straight line. It is helpful in predicting the value of one variable based on the value of the other. Predictions made using linear regression give a study a scientific, quantitative basis.
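The “model output” point above is easy to demonstrate. The sketch below fits a one-variable least-squares line to binary labels (the data is invented for illustration) and shows that it can predict values outside [0, 1], which cannot be interpreted as probabilities:

```python
# Binary labels with a single feature; ordinary least squares in closed form.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [0, 0, 0, 1, 1, 1]  # binary outcomes

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n
slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
    (x - mean_x) ** 2 for x in xs
)
intercept = mean_y - slope * mean_x

def predict(x):
    """Fitted regression line: unbounded, unlike a probability."""
    return intercept + slope * x

# Away from the training range the "probability" leaves [0, 1].
print(predict(10))  # greater than 1
print(predict(-5))  # less than 0
```

Logistic regression avoids this by passing the same linear combination through the sigmoid, which confines the output to (0, 1).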
11. Is the decision boundary linear or nonlinear in the case of a logistic regression model?
The decision boundary is a line that separates the target variables into different classes. The decision boundary can either be linear or nonlinear. In the case of a logistic regression model, the decision boundary is a straight line.
Logistic regression model formula: α + β1X1 + β2X2 + … + βkXk. This represents a linear decision boundary (a straight line in two dimensions, a hyperplane in general). Logistic regression is only suitable in cases where a straight line is able to separate the different classes. If a straight line cannot do it, then nonlinear algorithms should be used to achieve better results.
Decision boundaries are important: the decision boundary is the surface that separates the data points belonging to different class labels. It is not limited to the data points already provided; the model can make predictions for any new combination of attribute values as well.
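A quick way to see why the boundary is linear: the predicted probability crosses 0.5 exactly where the linear score α + β1x1 + β2x2 equals 0, so the boundary is the line defined by that equation. The coefficients below are assumptions chosen for illustration:

```python
import math

# Hypothetical coefficients for a two-feature logistic model.
alpha, b1, b2 = -2.0, 1.0, 1.0

def prob(x1, x2):
    """Predicted probability of the positive class."""
    z = alpha + b1 * x1 + b2 * x2
    return 1.0 / (1.0 + math.exp(-z))

# On the line x1 + x2 = 2 the score is 0 and the probability is exactly 0.5;
# points on either side of that line fall into different classes.
print(prob(1.0, 1.0))  # on the boundary
print(prob(2.0, 2.0))  # one side: classified positive
print(prob(0.0, 0.0))  # other side: classified negative
```

Because the sigmoid is monotonic, thresholding the probability at 0.5 is equivalent to thresholding the linear score at 0, which is why the boundary stays linear no matter how the curve bends.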
12. What is the likelihood function?
The likelihood function is the joint probability of observing the data. For example, let’s assume that a coin is tossed 100 times and we want to know the probability of getting 60 heads from the tosses. This example follows the binomial distribution formula.
p = Probability of heads from a single coin toss
n = 100 (the number of coin tosses)
x = 60 (the number of heads – success)
n-x = 40 (the number of tails)
Pr(X=60 |n = 100, p)
The likelihood function is the probability that the number of heads received is 60 in a trial of 100 coin tosses, where the probability of heads in each coin toss is p. Here the coin toss result follows a binomial distribution.
This can be reframed as follows:
Pr(X=60 | n=100, p) = c × p^60 × (1-p)^(100-60)
c = constant
p = unknown parameter
The likelihood function gives the probability of observing the results using unknown parameters.
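This likelihood can be evaluated directly. The sketch below computes the binomial likelihood over a grid of candidate p values and confirms it peaks at p = x/n = 0.6, which is the maximum likelihood estimate discussed in the next question:

```python
from math import comb

n, x = 100, 60  # 100 tosses, 60 heads

def likelihood(p):
    """Binomial likelihood of observing x heads in n tosses."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

# Evaluate over a grid of candidate p values and pick the best.
grid = [i / 100 for i in range(1, 100)]
best_p = max(grid, key=likelihood)
print(best_p)  # 0.6
```

The constant c in the formula above is comb(n, x); it does not depend on p, so it does not affect where the maximum lies.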
13. What is the Maximum Likelihood Estimator (MLE)?
The MLE chooses the set of unknown parameters (the estimator) that maximises the likelihood function. The method to find the MLE is to use calculus: set the derivative of the (log-)likelihood function with respect to each unknown parameter to zero and solve; the solution is the MLE. For a binomial model this is easy, but for a logistic model the calculations are complex, so computer programs are used to derive the MLE for logistic models.
(Here’s another approach to answering the question.)
MLE is a statistical approach to estimating the parameters of a mathematical model. MLE and ordinary least squares estimation give the same results for linear regression if the dependent variable is assumed to be normally distributed. MLE does not assume anything about the independent variables.
The point in the parameter space that maximises the likelihood function is known as the maximum likelihood estimate. This method has gained popularity for statistical inference owing to its intuitive and flexible features.
Maximum likelihood estimators have some interesting properties, such as consistency, functional equivariance, efficiency, and second-order efficiency. These properties allow for more reliable outputs.
The maximum likelihood estimator also yields approximately unbiased estimates for large data sets. Along with this, it offers a consistent yet flexible approach, making it suitable for a broad range of applications.
14. What are the different methods of MLE and when is each method preferred?
In the case of logistic regression, there are two approaches to MLE: the conditional and unconditional methods. They are algorithms that use different likelihood functions. The unconditional formula employs the joint probability of positives (for example, churn) and negatives (for example, non-churn). The conditional formula is the ratio of the probability of the observed data to the probability of all possible configurations.
The unconditional method is preferred if the number of parameters is lower compared to the number of instances. If the number of parameters is high compared to the number of instances, then conditional MLE is to be preferred. Statisticians suggest that conditional MLE is to be used when in doubt. Conditional MLE will always provide unbiased results.
15. What are the advantages and disadvantages of conditional and unconditional methods of MLE?
Conditional methods do not estimate unwanted parameters. Unconditional methods estimate the values of unwanted parameters also. Unconditional formulas can directly be developed with joint probabilities. This cannot be done with conditional probability. If the number of parameters is high relative to the number of instances, then the unconditional method will give biased results. Conditional results will be unbiased in such cases.
16. What is the output of a standard MLE program?
The output of a standard MLE program is as follows:
Maximised likelihood value: This is the numerical value obtained by replacing the unknown parameter values in the likelihood function with the MLE parameter estimator.
Estimated variance-covariance matrix: The diagonal of this matrix consists of estimated variances of the ML estimates. The off-diagonal consists of the covariances of the pairs of the ML estimates.
17. Why can’t we use Mean Square Error (MSE) as a cost function for logistic regression?
In logistic regression, we use the sigmoid function to perform a non-linear transformation and obtain probabilities. Squaring the error of this non-linear transformation leads to a non-convex cost function with local minima, so gradient descent is not guaranteed to find the global minimum. For this reason, MSE is not suitable for logistic regression; cross-entropy, or log loss, is used as the cost function instead. In the cross-entropy cost function, confident wrong predictions are penalised heavily, while confident right predictions are rewarded less. Optimising this cost function leads to convergence.
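The asymmetric penalty is easy to see numerically. The sketch below compares log loss for a confident right and a confident wrong prediction (the probability values are chosen for illustration):

```python
import math

def log_loss(y_true, p):
    """Cross-entropy loss for a single predicted probability p of class 1."""
    return -(y_true * math.log(p) + (1 - y_true) * math.log(1 - p))

# True label is 1. A confident correct prediction costs almost nothing;
# a confident wrong prediction is penalised very heavily.
print(round(log_loss(1, 0.99), 4))  # small loss
print(round(log_loss(1, 0.01), 4))  # large loss
```

As the predicted probability of the wrong class approaches 1, the loss grows without bound, which is exactly the behaviour that makes the cost convex and trainable by gradient descent.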
18. Why is accuracy not a good measure for classification problems?
Accuracy is not a good measure for classification problems because it gives equal importance to false positives and false negatives. However, this is rarely the case in business problems. For example, in cancer prediction, wrongly declaring a cancer benign is far more serious than wrongly informing a patient that he has cancer. Accuracy gives equal weight to both cases and cannot differentiate between them.
It is important to define accuracy before answering this question. Accuracy, as the name signifies, is freedom from error; it is the condition or quality of being true, correct, and defect-free. It is not a good measure for classification problems in the case of imbalanced data.
19. What is the importance of a baseline in a classification problem?
Most classification problems deal with imbalanced datasets. Examples include telecom churn, employee attrition, cancer prediction, fraud detection, online advertisement targeting, and so on. In all these problems, the number of positive classes will be very low when compared to negative classes. In some cases, it is common to have positive classes that are less than 1% of the total sample. In such cases, an accuracy of 99% may sound very good but, in reality, it may not be.
Here, the negatives are 99%, so the baseline accuracy is 99%: if the algorithm predicts every instance as negative, it still achieves 99% accuracy. In that case, however, every positive is predicted wrongly, and the positives are usually what matter most to the business. So the baseline is very important, and any algorithm needs to be evaluated relative to it.
A baseline is the simplest possible prediction, and it is useful for understanding the reliability of any trained model.
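The 99% scenario is easy to reproduce. Here is a sketch with a hypothetical dataset of 1% positives, scored against the trivial all-negative baseline:

```python
# 1,000 samples, 1% positives; the baseline model predicts "negative" always.
y_true = [1] * 10 + [0] * 990
y_pred = [0] * 1000

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
recall = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1) / sum(y_true)

print(accuracy)  # 0.99 -- looks impressive
print(recall)    # 0.0  -- yet every positive case is missed
```

Any trained model should therefore be judged by how much it improves on this baseline, not by raw accuracy alone.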
20. What are false positives and false negatives?
False positives are those cases in which the negatives are wrongly predicted as positives. For example, predicting that a customer will churn when, in fact, he is not churning.
False negatives are those cases in which the positives are wrongly predicted as negatives. For example, predicting that a customer will not churn when, in fact, he churns.
Any test has a chance of producing false positives or false negatives, so professionals need to be extra cautious when working with the data to minimise both types of error.
21. What are the true positive rate (TPR), true negative rate (TNR), false-positive rate (FPR), and false-negative rate (FNR)?
TPR refers to the ratio of positives correctly predicted out of all actual positives. In simple words, it is the frequency with which true labels are predicted correctly.
True positives are the values that are actually positive and predicted positive.
TPR = TP / (TP + FN)
TNR refers to the ratio of negatives correctly predicted out of all actual negatives. It is the frequency with which false labels are predicted correctly.
True negatives are the values that are actually negative and predicted negative.
TNR = TN / (TN + FP)
FPR refers to the ratio of negatives incorrectly predicted as positive out of all actual negatives.
False positives are the values that are actually negative but predicted positive.
FPR = FP / (TN + FP)
FNR refers to the ratio of positives incorrectly predicted as negative out of all actual positives.
False negatives are the values that are actually positive but predicted negative.
FNR = FN / (TP + FN)
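All four rates follow mechanically from the confusion-matrix counts. A small sketch with made-up counts for illustration:

```python
def rates(tp, fn, fp, tn):
    """TPR, TNR, FPR and FNR from the four confusion-matrix counts."""
    return {
        "TPR": tp / (tp + fn),  # sensitivity / recall
        "TNR": tn / (tn + fp),  # specificity
        "FPR": fp / (fp + tn),
        "FNR": fn / (fn + tp),
    }

r = rates(tp=40, fn=10, fp=5, tn=45)
print(r)  # {'TPR': 0.8, 'TNR': 0.9, 'FPR': 0.1, 'FNR': 0.2}
```

Note the complements: TPR + FNR = 1 over the actual positives, and TNR + FPR = 1 over the actual negatives.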
22. What are precision and recall?
Precision is the proportion of true positives out of all predicted positives; in other words, it measures how accurate the positive predictions are. It is also known as the 'positive predictive value'.
Precision = TP / (TP + FP)
Recall is the same as the true positive rate (TPR).
It is important to examine both precision and recall while evaluating a model's effectiveness. Precision is the fraction of relevant instances among the retrieved instances, while recall is the fraction of relevant instances that were retrieved.
23. What is the F-measure?
It is the harmonic mean of precision and recall. There is often a trade-off between precision and recall; the F-measure is high only when both are high, and it drops when either one drops. Depending on the business case at hand and the goal of data analytics, an appropriate metric should be selected.
F-measure = 2 × (Precision × Recall) / (Precision + Recall)
The F-score, or F-measure, is commonly used to evaluate information retrieval systems such as search engines. It combines precision and recall into a single number, defined as their harmonic mean, and serves as a summary measure of a test's accuracy.
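The three metrics can be computed together from confusion-matrix counts. A minimal sketch with illustrative counts (scikit-learn's `precision_recall_fscore_support` computes the same quantities from raw labels):

```python
def precision_recall_f1(tp, fp, fn):
    precision = tp / (tp + fp)  # positive predictive value
    recall = tp / (tp + fn)     # true positive rate
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return precision, recall, f1

p, r, f = precision_recall_f1(tp=30, fp=10, fn=20)
print(p, r, f)  # 0.75, 0.6, ~0.667
```

Because F1 is a harmonic mean, it sits closer to the smaller of the two values, which is why it drops sharply when either precision or recall is poor.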
24. What is accuracy?
It is the number of correct predictions out of all predictions made.
Accuracy = (TP + TN) / (TP + TN + FP + FN)
25. What are sensitivity and specificity?
Specificity is the same as the true negative rate, or equivalently 1 − FPR (false positive rate).
Specificity = TN / (TN + FP)
Sensitivity is the true positive rate.
Sensitivity = TP / (TP + FN)
26. How to choose a cutoff point in the case of a logistic regression model?
The cutoff point depends on the business objective. For example, consider loan defaults: if the objective is to reduce losses, then specificity needs to be high. If the aim is to increase profits, the calculation changes entirely; it may not be the case that profits increase simply by refusing loans to all predicted default cases. The business may have to disburse loans to slightly less risky default cases to increase profits, which requires a different cutoff point, one that maximises profit. In most instances, a business operates under many constraints, and the cutoff point that satisfies the objective will differ with and without those constraints. All of this needs to be considered when selecting the cutoff. As a rule of thumb, start with a cutoff value equal to the proportion of positives in the dataset.
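Mechanically, moving the cutoff just changes where predicted probabilities are split into classes. A small sketch with hypothetical model outputs:

```python
def classify(probs, cutoff):
    """Turn predicted probabilities into 0/1 labels at the chosen cutoff."""
    return [1 if p >= cutoff else 0 for p in probs]

probs = [0.10, 0.35, 0.50, 0.80, 0.95]  # hypothetical model outputs

print(classify(probs, 0.5))  # [0, 0, 1, 1, 1]
print(classify(probs, 0.3))  # [0, 1, 1, 1, 1] -- lower cutoff flags more positives
```

Sweeping the cutoff and recomputing profit (or sensitivity and specificity) at each value is how the business-optimal point is found in practice.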
27. How does logistic regression handle categorical variables?
The inputs to a logistic regression model need to be numeric, so the algorithm cannot handle categorical variables directly; they must first be converted into a format the algorithm can process. Each level of a categorical variable is assigned its own 0/1 indicator, known as a dummy variable, and these dummy variables are then handled by the logistic regression model like any other numeric input.
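A minimal pure-Python sketch of dummy (one-hot) encoding; in practice `pandas.get_dummies` or scikit-learn's `OneHotEncoder` is used, and the city names below are made up:

```python
def one_hot(values):
    """Encode a categorical column as 0/1 dummy columns, one per level."""
    levels = sorted(set(values))
    return [[int(v == level) for level in levels] for v in values]

cities = ["Delhi", "Mumbai", "Delhi", "Pune"]
print(one_hot(cities))
# [[1, 0, 0], [0, 1, 0], [1, 0, 0], [0, 0, 1]]
# columns: Delhi, Mumbai, Pune
```

Each row now carries purely numeric inputs, so the encoded columns can be fed to a logistic regression model directly.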
28. What is a cumulative response curve (CRV)?
To convey the results of an analysis to management, a 'cumulative response curve' is often used because it is more intuitive than the ROC curve, which can be hard to understand for someone outside data science. A CRV plots the true positive rate (the percentage of positives correctly classified) on the Y-axis against the percentage of the population targeted on the X-axis, with the population ranked by the model in descending order of score (either probabilities or expected values). If the model is good, then targeting the top portion of the ranked list captures a high percentage of the positives. As with the ROC curve, a diagonal line represents random performance: if 50% of the list is targeted, it is expected to capture 50% of the positives.
29. What are lift curves?
Lift is the improvement in model performance (the increase in the true positive rate) compared to random performance. Random performance means that if 50% of the instances are targeted, 50% of the positives are expected to be detected. If a model performs better than random, its lift will be greater than 1.
In a lift curve, the lift is plotted on the Y-axis and the percentage of the population (sorted in descending order) on the X-axis. At a given percentage of the target population, a model with a high lift is preferred.
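Lift at a given targeting depth can be sketched directly from a model-ranked list of labels. The labels below are a made-up ranking, used only for illustration:

```python
def lift_at(ranked_labels, depth):
    """Lift at a given depth of the list, ranked by model score (highest first)."""
    n = int(len(ranked_labels) * depth)
    captured = sum(ranked_labels[:n]) / sum(ranked_labels)  # share of positives found
    return captured / depth  # vs. the `depth` share random targeting would find

# 4 positives, mostly ranked near the top by a hypothetical model:
labels = [1, 1, 0, 1, 1, 0, 0, 0, 0, 0]
print(lift_at(labels, 0.5))  # 2.0 -- twice as good as random at 50% depth
print(lift_at(labels, 1.0))  # 1.0 -- targeting everyone matches random
```

Plotting this value for every depth from 0 to 1 gives the lift curve described above.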
30. Which algorithm is better at handling outliers: logistic regression or SVM?
Logistic regression will shift its linear boundary, if one exists, in order to accommodate outliers. SVM, on the other hand, is insensitive to individual samples: there will not be a major shift in the linear boundary to accommodate an outlier. SVM also comes with built-in complexity controls that take care of overfitting, which is not true of logistic regression. So SVM handles outliers better.
31. How will you deal with the multiclass classification problem using logistic regression?
The most common method of dealing with multiclass classification using logistic regression is the one-vs-all approach. Under this approach, as many models are trained as there are classes. Each model works in a specific way: the first model classifies a data point as belonging to class 1 or some other class, the second model classifies it into class 2 or some other class, and so on. This way, each data point is checked against every class.
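The one-vs-all decision rule itself is simple: score the data point with each binary model and pick the class with the highest score. A sketch with hypothetical per-class scoring functions standing in for fitted binary logistic models (scikit-learn's `OneVsRestClassifier` automates the training side):

```python
def one_vs_all_predict(x, models):
    """models maps each class to a binary scorer returning P(class | x)."""
    scores = {cls: score(x) for cls, score in models.items()}
    return max(scores, key=scores.get)

# Hypothetical fitted models; each was trained as "this class vs. the rest".
models = {
    "class1": lambda x: 0.20,
    "class2": lambda x: 0.70,
    "class3": lambda x: 0.10,
}
print(one_vs_all_predict(None, models))  # class2
```

In a real pipeline, each lambda would be replaced by a fitted binary logistic regression model's probability output for its class.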
32. Explain the use of ROC curves and the AUC of a ROC Curve.
A ROC (Receiver Operating Characteristic) curve illustrates the performance of a binary classification model. It is basically a TPR versus FPR (true positive rate versus false-positive rate) curve for all the threshold values ranging from 0 to 1. In a ROC curve, each point in the ROC space will be associated with a different confusion matrix. A diagonal line from the bottom-left to the top-right on the ROC graph represents random guessing. The Area Under the Curve (AUC) signifies how good the classifier model is. If the value for AUC is high (near 1), then the model is working satisfactorily, whereas if the value is low (around 0.5), then the model is not working properly and just guessing randomly.
33. How can you use the concept of ROC in a multiclass classification?
The concept of ROC curves can easily be used for multiclass classification by using the one-vs-all approach. For example, let’s say that we have three classes ‘a’, ’b’, and ‘c’. Then, the first class comprises class ‘a’ (true class) and the second class comprises both class ‘b’ and class ‘c’ together (false class). Thus, the ROC curve is plotted. Similarly, for all three classes, we will plot three ROC curves and perform our analysis of AUC.
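For a binary problem, the ROC points and AUC can be computed in a few lines; the labels and scores below are illustrative, and in practice `sklearn.metrics.roc_curve` and `roc_auc_score` are used:

```python
def roc_points(y_true, scores):
    """(FPR, TPR) pairs, sweeping the threshold from high to low."""
    ranked = sorted(zip(scores, y_true), reverse=True)
    pos = sum(y_true)
    neg = len(y_true) - pos
    tp = fp = 0
    points = [(0.0, 0.0)]
    for _, y in ranked:
        if y == 1:
            tp += 1
        else:
            fp += 1
        points.append((fp / neg, tp / pos))
    return points

def auc(points):
    """Trapezoidal area under the ROC curve."""
    return sum((x2 - x1) * (y1 + y2) / 2
               for (x1, y1), (x2, y2) in zip(points, points[1:]))

pts = roc_points([1, 1, 0, 1, 0], [0.9, 0.8, 0.7, 0.6, 0.2])
print(auc(pts))  # ~0.833, well above the 0.5 of random guessing
```

For the one-vs-all multiclass case, this same routine would be run once per class, with that class as the positive label and everything else as negative.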
We have so far covered the two most basic ML algorithms, Linear and Logistic Regression, and we hope that you have found these resources helpful.
The next part of this series is based on another very important ML Algorithm, Clustering. Feel free to post your doubts and questions in the comment section below.
Co-authored by – Ojas Agarwal

13 Sep 2022



With Data Science jobs on the rise, there’s a question that often lurks in the minds of aspirants – What’s the difference between a Data Scientist and a Data Analyst?
Are these two the same?
Such questions have been a source of great confusion among youngsters who wish to make a successful career in Data Science. Today, we're here to put these questions to rest and clarify the entire matter for you!
Before diving deep into the job profiles of a Data Scientist and a Data Analyst, let's first understand the core difference between the two roles.
Data Scientist Job Role – Data Scientists are expert professionals equipped with a combination of coding, mathematical, statistical, analytical, and ML skills. Even during a Data Science interview, most of the questions revolve around these concepts. They explore and examine large datasets gathered from multiple sources, then clean, organize, and process the data to facilitate interpretation. While they can perform the analysis tasks of an analyst, they also work with advanced ML algorithms, predictive models, and programming and statistical tools to make sense of data and develop new processes for data modeling. A Data Scientist can also be labeled a Data Researcher or a Data Developer, depending upon the skill set and job demand.
Data Analyst Job Role – As the name suggests, Data Analysts are primarily involved with day-to-day data collection and analysis tasks. They sift through data to identify meaningful insights, look at business problems, and try to answer a specific set of questions from a given set of data. Furthermore, Data Analysts create visual representations of data in the form of graphs, charts, etc., for the ease of understanding of every stakeholder involved in the business process. A Data Analyst can also be labelled as a Data Architect, Data Administrator, or Analytics Engineer, depending upon the skill set and job demand.
Gathering from this description of the two job profiles, it is clear that a Data Scientist mainly deals with finding meaning from incoherence (unstructured/semi-structured datasets), whereas a Data Analyst has to find answers to questions based on the findings of a Data Scientist. However, sometimes the job roles do overlap, thereby giving rise to a grey area. And while Data Analysts and Data Scientists both share some similarities, there are certain pivotal differences between the two roles.
Data Scientist and Data Analyst – A Comparison
1. Responsibilities
Just a minute ago, we talked about the primary job responsibilities of a Data Scientist and Data Analyst in a nutshell. Now, we’ll talk about their respective job responsibilities in detail.
Data Scientist:
To create and define programs for data collection, modelling, analysis, and reporting.
To perform data cleansing and processing operations to mine valuable insights from data.
To develop custom data models and ML algorithms to suit company/customer needs.
To mine and analyze data from company databases to foster optimization and improvement of business operations (product development, marketing techniques, and customer satisfaction).
To use the right data visualization and predictive modelling tools to boost revenue generation, marketing strategies, enhance customer experiences, etc.
To develop new ML methods and analytical models.
To correlate different datasets, determine the validity of new data sources and data collection methods.
To coordinate and communicate with both IT and business management teams to implement data models and monitor the outcomes.
To identify new business opportunities and determine how the findings can be used to enhance business strategies and outcomes.
To create sophisticated tools/processes to monitor and analyze the performance of data models accurately.
To develop A/B testing frameworks to test model functioning and quality.
To take on the role of a visionary who can unlock new possibilities from data.
Data Analyst:
To analyze and mine business data to identify correlations and discover valuable patterns from disparate data points.
To work with customer-centric algorithm models and personalize them to fit individual customer requirements.
To create and deploy custom models to uncover answers to business matters such as marketing strategies and their performance, customer taste, and preference patterns, etc.
To map and trace data from multiple systems to solve specific business problems.
To write SQL queries to extract data from the data warehouse and to identify the answers to complex business issues.
To apply statistical analysis methods to conduct consumer data research and analytics.
To coordinate with Data Scientists and Data Engineers to gather new data from multiple sources.
To design and develop data visualization reports, dashboards, etc., to help the business management team to make better business decisions.
To perform routine analysis tasks as well as quantitative analysis as and when required to support day-to-day business functioning and decision making.
2. Skills
The role of a Data Scientist is highly specialized and versatile. Hence, Data Scientists mostly hold advanced degrees such as a Master's or a PhD. According to KDnuggets, nearly 88% of Data Scientists have a master's degree, and at least 46% hold a PhD. Let's take a look at the role requirements of a Data Scientist:
A minimum of a Master’s degree in Statistics/Mathematics/Computer Science. Better if you have a PhD.
Proficiency in programming languages like R, Python, Java, SQL, to name a few.
In-depth knowledge of ML techniques, including clustering, decision trees, artificial neural networks, etc.
In-depth knowledge of advanced statistical techniques and concepts (regression, properties of distributions, statistical tests, etc.).
Experience in working with statistical and data mining techniques (linear regression, random forest, trees, text mining, social network analysis, etc.).
Experience in working with as well as creating data architectures.
Experience in manipulating data sets and developing statistical models.
Experience in using web services such as S3, Spark, Redshift, DigitalOcean, etc.
Experience in analyzing data from third-party providers like Google Analytics, AdWords, Facebook Insights, Site Catalyst, Coremetrics, etc.
Experience in working with distributed data/computing tools like Map/Reduce, Hadoop, Spark, Hive, MySQL, etc.
Experience in data visualization using tools like ggplot, Tableau, Periscope, Business Objects, D3, etc.
For the job role of a Data Analyst, the minimum requirement is to have an undergraduate STEM (science, technology, engineering, or math) degree. Having advanced degrees is excellent, but it is not a necessity. If you have strong Math, Science, Programming, Database, Predictive Analytics, and Data Modeling skills, you’re good to go. Here’s a list of all the essential requirements for a Data Analyst:
Undergraduate degree in Mathematics/Statistics/Business with a focus on analytics.
Proficiency in programming languages like R, Python, Java, SQL, to name a few.
A solid combination of analytical skills, intellectual curiosity, and business acumen.
In-depth knowledge of data mining techniques and emerging technologies including MapReduce, Spark, ML, Deep Learning, artificial neural networks, etc.
Experience in working with the agile methodology.
Experience in working with Microsoft Excel and Office.
Strong communication skills (both verbal and written).
Ability to manage and handle multiple priorities simultaneously.
3. Salary
According to a PwC study report, by 2020 there will be around 2.7 million job openings for Data Scientists and Data Analysts. It further states that applicants for these roles must be 'T-shaped': they must possess not only technical and analytical skills but also soft skills such as communication, teamwork, and creativity. Since it is difficult to find talent with the right skill set, and the demand for Data Scientists and Analysts exceeds the supply by a large margin, these roles promise handsome salary packages.
However, since the job of a Data Scientist is much more demanding than that of a Data Analyst, the salary of Data Analysts is naturally lower. According to Glassdoor, the average annual salary of a Data Scientist is Rs. 10,00,000, whereas that of a Data Analyst is Rs. 4,82,041.
Concluding thoughts
Considering all the points mentioned above, the job titles of Data Scientist and Data Analyst seem deceptively similar owing to a few overlaps in skill sets and job responsibilities. For instance, if you have a STEM background with a flair for programming, analytics, and statistics, you are well suited for a career in Data Science. However, the subtle differences between the two roles give rise to the significant disparity in salary levels.
If you still cannot make a choice, let's make it simpler for you: if you are great with numbers but still have a long way to go to perfect your coding and data modelling skills, you'd better start your career as a Data Analyst. Gradually, you can upskill and become a Data Scientist; this way, the job of a Data Analyst can become a stepping stone to becoming a Data Scientist. All in all, both options are emerging and highly lucrative career choices, so you'll have a promising career in Data Science no matter which one you choose.

08 Jul 2019



Industries are using Data science in exciting and creative ways. Data Science is turning up in unexpected places improving the efficiency of various sectors. It is powering up human decision making and impacting the top and bottom lines of the business like never before. Industries are delighting millions of customers by powering up their applications with data science and machine learning.
This blog series aims to talk about interesting applications of data science and machine learning in various companies, spotlighting one company in each post. It will cover how companies like Google, Apple, LinkedIn, Uber, Instagram, Twitter, Instacart, Netflix, The Washington Post, Quora, Pinterest, Amazon, Medium, Microsoft, etc. are leveraging data science and machine learning to power their businesses. So, let us start this series with Netflix.
NETFLIX
It is well known that Netflix uses recommendation systems to suggest movies and shows to its customers. Apart from movie recommendations, there are many other lesser-known areas in which Netflix uses data science and machine learning:
Deciding personalised Artwork for the movies and shows
Suggesting the best frames from a show to the editors for creative work
Improving the quality of service (QoS) of streaming through decisions about video encoding, advancements in client-side and server-side algorithms, caching of video, etc.
Optimizing different stages of production
Experimenting with various algorithms using A/B testing and causal inference, and reducing the time taken for experimentation using techniques such as interleaving.
Personalised Artwork
Every movie recommended by Netflix comes with associated artwork, and this artwork is not the same for everyone. Like the movie recommendation itself, the artwork for a show is personalised: rather than all members seeing a single best image, a portfolio of artwork is created for each title, and a machine learning algorithm chooses, based on the taste and preferences of the viewer, the artwork that maximises the chances of the title being viewed.
(Image: a portfolio of artwork created for the title 'Stranger Things'.)
(Image: personalisation at work. Top row: artwork suggested for a viewer who likes the actress Uma Thurman. Bottom row: artwork suggested for a viewer who likes the actor John Travolta.)
Artwork personalisation is not always straightforward; it comes with several challenges. First, only a single image can be chosen per title, whereas many movies can be recommended at a time. Second, the artwork suggestion must work in association with the movie recommendation engine; it typically sits on top of the movie recommendations. Third, personalised artwork recommendation should take into account the image suggestions for other movies; otherwise, there will be no variation or diversity in the suggestions, which becomes monotonous. Fourth, should the same artwork be displayed between sessions or a different one? Showing a different image every time will confuse the viewer and also leads to the attribution problem: which artwork led the viewer to watch the show?
Artwork personalisation leads to significant improvements in content discovery by viewers. It is the first instance of personalising not only what is recommended but also how the recommendation is presented to members. Netflix is still actively researching and perfecting this nascent technique.
Art of Image Discovery
A single hour of 'Stranger Things' consists of 86,000 static video frames, and a single season (10 episodes) consists of roughly 9 million frames in total. Netflix adds content regularly to cater to its global customers, so it is not possible to manually sift through this volume to find the 'right' artwork for the 'right' person; it is next to impossible for human editors to search for the best frames that bring out the unique elements of a show. To tackle this challenge at scale, Netflix built a suite of tools to surface the best frames, the ones that truly capture the spirit of the show.
(Image: pipeline to automatically capture the best frames for a show.)
Frame annotations are used to capture the objective signals used for image ranking. To annotate frames, a video is divided into many small chunks, which are processed in parallel using a framework known as 'Archer'; this parallel processing helps Netflix capture frame annotations at scale. Each chunk is handled by a machine vision algorithm to obtain frame characteristics such as colour, brightness, and contrast. Another category of features, captured during frame annotation to describe what is happening in a frame, includes face detection, motion estimation, object detection, etc. Netflix also identified a set of properties drawn from the core principles of photography, cinematography, and visual aesthetic design, such as the rule of thirds, which are captured during frame annotation as well.
The next step after frame annotation is to rank the images. Some factors considered for ranking are the actors, the diversity of the images, content maturity, etc. Netflix uses deep learning techniques to cluster the images of actors in a show, prioritise the main characters, and de-prioritise the secondary characters. Frames with violence or nudity are given a very low score. Using this ranking method, the best frames for a show are surfaced, so the artwork and editorial teams have a set of high-quality images to work with instead of dealing with millions of frames per episode.
Data Science in Production
Netflix is spending eight billion dollars this year on creating original content, which is produced for millions of viewers across the globe in more than 20 languages. It should not surprise us that Netflix uses data science for producing original content; in fact, it uses data science in every step of content production.
Typically, producing content consists of pre-production, production, and post-production stages. Planning, budgeting, etc. happen in pre-production; principal photography is part of production; and steps like editing and sound mixing are part of post-production. Adding subtitles and removing technical glitches are part of localisation and quality control. Now let us see how data science helps optimise each stage of production.
As said earlier, budgeting is part of pre-production, and many decisions need to be taken before production starts, for example, the location for shooting. Data science is extensively used to analyse the cost implications of a specific location. Decisions are taken by delicately balancing the creative vision against the budget, so costs are minimised without compromising the vision of the content.
Production involves shooting thousands of shots spanning many months. Production has an objective, but it must be carried out under specific constraints: an actor may be available for only one week, a location may be available only on particular days, the crew's working day is 8 hours, there are timing constraints such as day shots versus night shots, and the team may have to move locations between shoots. Preparing a shooting schedule under all these constraints can be a nightmare for the director. Mathematical optimisation techniques are used here, with an objective and constraints, to produce a rough shooting schedule, which is then refined further with adjustments.
Post-production will take as much time as production if not more. Data visualisation techniques are used to check the bottlenecks in post-production. Visualisation techniques are also used to track the trend in post-production and project it into the future. This forecasting is done to see the workload of various teams and staffing the team appropriately.
In localisation, shows are dubbed from one language to another. Which shows need to be dubbed is prioritised based on data analysis: dubbed content which proved popular in the past gets priority. Quality control checks for issues like syncing between audio and video, syncing of subtitles with sound, etc. It is done both before and after encoding (the process of compressing videos into different bitrates for streaming on different devices). Netflix accumulated historical data from manual quality control checks: the errors which occurred in the past, the video formats in which the errors were found, the partners from whom the content was obtained, the genre of the content, etc. Yes, Netflix saw a pattern of errors by genre as well. Using this data, a machine learning model was built which predicts whether an asset will 'pass' or 'fail' the quality checks. If the model predicts 'fail', the asset goes through a round of manual quality checks.
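The pass/fail idea can be sketched with a toy model. Everything below (the history records, the feature values, the failure-rate scoring) is invented for illustration; Netflix's actual quality-control model is not public:

```python
from collections import defaultdict

# Invented historical QC records: (features of the asset, manual-check outcome).
history = [
    ({"format": "mpeg2", "genre": "anime"}, "fail"),
    ({"format": "mpeg2", "genre": "drama"}, "fail"),
    ({"format": "h264", "genre": "drama"}, "pass"),
    ({"format": "h264", "genre": "anime"}, "pass"),
    ({"format": "h264", "genre": "drama"}, "pass"),
]

def train(records):
    # Compute the historical failure rate of each (feature, value) pair.
    fails, totals = defaultdict(int), defaultdict(int)
    for features, outcome in records:
        for key in features.items():
            totals[key] += 1
            fails[key] += outcome == "fail"
    return {key: fails[key] / totals[key] for key in totals}

def predict(rates, features, threshold=0.5):
    # Average the failure rates of the asset's feature values; a high average
    # predicts 'fail', which would trigger a round of manual checks.
    scores = [rates.get(key, 0.0) for key in features.items()]
    risk = sum(scores) / len(scores)
    return "fail" if risk >= threshold else "pass"

rates = train(history)
print(predict(rates, {"format": "mpeg2", "genre": "anime"}))  # fail
print(predict(rates, {"format": "h264", "genre": "drama"}))   # pass
```

A production system would use a proper classifier trained on many more features, but the shape is the same: learn from past manual checks, then route only the risky assets back to humans.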
Streaming Quality of Experience and A/B testing
Data science is extensively used for ensuring the quality of the streaming experience. The quality of network connectivity is predicted to ensure smooth streaming. Netflix actively predicts which show is going to be streamed in a particular location and caches the content on a nearby server. The caching and storing of content are done when internet traffic is low. This ensures content is streamed without buffering and customer satisfaction is maximised. A/B testing is used extensively whenever a change is made to an existing algorithm or a new algorithm is proposed. Techniques like interleaving and repeated measures are used to speed up the A/B testing process with far fewer samples.
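Interleaving can be illustrated with a small sketch. This is team-draft interleaving in a simplified, deterministic form (production systems randomise which ranker picks first in each round); the rankings are invented:

```python
def team_draft_interleave(ranking_a, ranking_b):
    # Merge two rankers' lists into one list shown to the user; clicks are
    # later credited to whichever ranker contributed the clicked item.
    merged, credit = [], {}
    ia = ib = 0
    turn_a = True  # deterministic alternation for simplicity
    while True:
        # Skip items the other ranker has already placed in the merged list.
        while ia < len(ranking_a) and ranking_a[ia] in credit:
            ia += 1
        while ib < len(ranking_b) and ranking_b[ib] in credit:
            ib += 1
        if ia >= len(ranking_a) and ib >= len(ranking_b):
            break
        if (turn_a and ia < len(ranking_a)) or ib >= len(ranking_b):
            merged.append(ranking_a[ia])
            credit[ranking_a[ia]] = "A"
            ia += 1
        else:
            merged.append(ranking_b[ib])
            credit[ranking_b[ib]] = "B"
            ib += 1
        turn_a = not turn_a
    return merged, credit

merged, credit = team_draft_interleave(["x", "y", "z"], ["y", "w", "z"])
print(merged)   # ['x', 'y', 'z', 'w']
print(credit)   # {'x': 'A', 'y': 'B', 'z': 'A', 'w': 'B'}
```

Because every user sees results from both rankers at once, far fewer sessions are needed to detect which ranker is better than in a classical A/B split.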
To conclude, these are some ways Netflix is using data analysis to engage and awe the customers. If you are interested in diving deep and knowing more about how this marvellous company is using data science, visit their Research blog. There is a treasure trove of articles on their blog waiting to be explored.
In an upcoming post in this series, we will see how Instacart is leveraging data science and machine learning. Now that you have read this blog, let us know what you think about this article, and suggest which company you would like to see covered in the future.

Published: 21 Aug 2018


In a lot of data science interviews, it is common to ask business-related questions, where the interviewee is expected to solve a challenge faced by a business. For example: the profits earned by a newspaper company are dropping, so what can be done to rescue the situation? How can Reliance Jio decide whether it is beneficial to start operations in a new location? How would the launch of a Baba Ramdev SIM affect the business of Reliance Jio?
Interviewees usually prepare well for data science questions. They expect and rehearse questions like "How do you impute missing values?" and "How do you decide which algorithm is suitable for a dataset?". However, candidates are often completely baffled when they face business cases. Part of the reason is that they are not expecting business case questions in an interview. Another reason is that hardly any data science blog or course touches upon how to convert business problems into data science problems. There are frameworks available for data analysis, but they are surprisingly quiet on this conversion. For example, the CRISP-DM framework is well known for data analysis; its first and second steps are 'Business understanding' and 'Data understanding', yet most aspiring data scientists do not know how to proceed from the first step to the second. The aim of this article is to fill this gap. Victor Cheng's book and videos helped me understand how to convert business problems into data problems, and this article is based on his teachings.
Interviewer’s Mindset
Before diving into how to answer case interviews, let us understand the interviewer's mindset. What are interviewers looking for when they ask business questions? Some of the things they look for are:-
Do the candidates possess independent thinking mindset?
Are the answers good enough if not precise?
Are the solutions offered by the candidate client-friendly (in the case of consulting companies)?
Is the solution offered by the candidates linear?
Did the candidate explain the solution visually?
Is the candidate’s solution practical to implement?
The candidate's solution should proceed linearly and logically from the challenge to a solution. If the solution is scattered, jumping from one point to another arbitrarily, then the interview is over in the mind of the interviewer. Additionally, a right approach with a wrong answer is preferred over a right answer with a wrong approach: if the approach is wrong but the answer is correct, interviewers will assume that the candidate got lucky, and luck will not hold in every situation. If the approach is correct, then it is repeatable and can be applied to many business situations.
Answering ‘Case interview’ questions
The answers to case interview questions can be divided into three stages: open, analyse and close. The open and close stages are formulaic and can be mastered with practice. The analyse stage differs according to the business problem and involves thinking and creativity. Let us see what to do in each stage of answering the question.
The steps for answering Case interview questions are:-
Stalling
Verify your understanding
Identify the structure of the problem
Analyse
Close
Stalling (Opening Stage)
Victor Cheng suggests pausing for up to five seconds before saying something like, "Ah, this is a very interesting problem". This is known as 'stalling'. If you wait more than about five seconds to answer, the interviewer will think that you don't know the solution. Stalling buys some valuable time to think through the problem.
Verify your understanding
Usually, 'case interview' questions are one-liners, and not all the information required to answer will be provided. Candidates are expected to ask questions and verify their understanding. As previously said, if the case is about increasing a newspaper's profits, you may ask questions like 'What topics does the newspaper cover?' or 'What is the target audience of this newspaper?'. Clarify any terminology the interviewer used that you are not sure about. It is important not to assume anything here: assumptions may lead to solving the wrong problem, one the company does not actually face. This exercise also shows the interviewer how good you are at seeking help when it is required.
During the initial phases of answering, you have the freedom to ask open-ended questions. As time progresses you lose that freedom and will only be able to ask close-ended questions. Asking open-ended questions towards the end will make the interviewer think that you are fishing for the answers. So don't hesitate: ask the questions which you feel are valuable for answering early on.
Identify the Structure
Once you get the required information after asking questions, identify the structure to which the question belongs. Victor specifies four different frameworks to which a case interview question can belong. The frameworks in their order of importance (according to the frequency asked in interviews) are:
Profit
Business situation
Merger & Acquisition
Supply / Demand
The examples of Business situation framework are – the launch of a New product, responding to the competitor behaviour, changes in demand, growth strategies for a company, etc. Examples of Supply / Demand problems are building a new factory or shutting down a factory, change in capacity through acquisition, change in demand, etc.
These frameworks are not hard and fast. The profit problem may ultimately end up in a business situation or supply/demand problem. Nevertheless, this categorisation will provide a structure to our thinking and help us in moving forward with the challenge at hand.
Once you identify the problem structure and match it to the appropriate framework, the next step is to describe the key components of the framework. For example, in case of increasing the profits of a newspaper, you can talk about revenue and cost. Be careful and don’t name the framework explicitly. A framework is only for structuring your thinking and not for mentioning it to the interviewer. Draw the key components of the framework along with your description. As you practice these steps it will become second nature to you. The key points in identifying the structure are:
Identify the nature of the problem
Match the problem to the appropriate framework
Describe the key components of the framework
Draw
Once this is done proceed to the next step in the process which is analysing the problem.
Analyse
Here 'analyse' does not mean analysing the data; that is still far away, as data comes at the end. Before asking for data, there are other things which need to be addressed.
Start your analysis by asking where to start. To improve the profitability of a newspaper ask if you should proceed with cost or revenue. Depending on the choice of the interviewer proceed with your answer.
Visualise the underlying problem as a decision tree and you start at the root. State the hypothesis and pick a branch according to the choice of the interviewer. Identify and state the key issues within a branch. Ask standard questions and keep drilling down. In this process keep refining your hypothesis. If you reach a dead end of a branch without any resolution, trace up to the node of the branch and traverse the opposite direction.
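The decision-tree view above can be sketched for the newspaper profit case as a nested structure (all figures invented). Drilling down a branch corresponds to decomposing its node further:

```python
# The hypothesis tree for a profit case, as a nested dict. Leaves hold
# invented numbers; internal nodes are the branches you can drill into.
profit_tree = {
    "profit": {
        "revenue": {"subscriptions": 60, "advertising": 25},
        "cost": {"printing": 40, "distribution": 20, "salaries": 30},
    }
}

def total(node):
    # Leaves are numbers; internal nodes sum over their children.
    if isinstance(node, dict):
        return sum(total(child) for child in node.values())
    return node

revenue = total(profit_tree["profit"]["revenue"])   # 85
cost = total(profit_tree["profit"]["cost"])         # 90
print(revenue - cost)  # -5: loss-making, so drill into the larger branch next
```

In the interview, each branch choice (revenue vs cost, then printing vs distribution, and so on) is made together with the interviewer, refining the hypothesis at every level.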
Think out loud during the whole process. This will help interviewers to know about your thinking and analytical skills. If there is any flaw in your thinking they may even point it out and help you in the right direction. It is always a good practice to think aloud about case interview questions.
Once you reach a leaf of your decision tree with a hypothesis, or get to a point where the interviewers are unsure about the hypothesis, you may go ahead and ask for data: data which can help either accept or refute your hypothesis, and which will help derive insights and make business decisions. A point to remember is that all data requests should be backed by a solid explanation. Asking for data without any explanation will not go down well with the interviewers, or with the clients in the case of consultancy.
This completes the conversion of a business problem into a data analysis problem. Once this is done, you know how to proceed with analysing the data. From here, the standard CRISP-DM steps follow:
Understanding the data
Modelling the data
Validating the model
Model deployment
Updating the model and keeping it relevant
Tips and Tricks for Answering Case Interview Questions
Segment your numbers
Company Vs competitor
Current year Vs past year
Think aloud
Explain why you need data before asking for it
Don’t assume anything
Practice answering business problems
Segment your numbers
Data science is all about breaking down a problem into its constituent parts, analysing the parts and deriving insights, then combining the parts and offering recommendations backed by data analysis. This combination of parts is also known as 'synthesis' in the language of consultancy.
Better insights can be derived when data is segmented into parts. Let us assume that the newspaper's profits are down due to losses in revenue. The loss in revenue as a whole does not offer many insights, but if the revenue is segmented into different buckets based on the age of subscribers, it may provide more valuable ones. The company can then execute an action targeting a particular age group to improve revenue. Given a business problem, always see how best to break it into different parts.
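The segmentation idea can be sketched in a few lines; the subscriber records and age bands below are invented for illustration:

```python
# Invented subscriber data: revenue per subscriber, tagged with age.
subscribers = [
    {"age": 22, "revenue": 10}, {"age": 27, "revenue": 12},
    {"age": 41, "revenue": 30}, {"age": 45, "revenue": 28},
    {"age": 63, "revenue": 55}, {"age": 70, "revenue": 50},
]

def segment_revenue(records):
    # Bucket total revenue by age band instead of reporting one aggregate.
    buckets = {"18-34": 0, "35-54": 0, "55+": 0}
    for r in records:
        if r["age"] < 35:
            buckets["18-34"] += r["revenue"]
        elif r["age"] < 55:
            buckets["35-54"] += r["revenue"]
        else:
            buckets["55+"] += r["revenue"]
    return buckets

print(segment_revenue(subscribers))
# {'18-34': 22, '35-54': 58, '55+': 105}
```

Seen this way, the youngest segment clearly contributes the least revenue, which immediately suggests a targeted action that the aggregate number could never reveal.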
Given a business problem, think about whether it is faced only by a particular company or by the whole industry; the solution and recommendations will differ between the two cases. Another line of thinking is comparing past and present performance. Both these lines of thought provide a direction for answering the case interview question.
Knowing the steps and the different frameworks will only take you so far. What will help you successfully answer case interview questions is practice, practice and more practice. There is no other way around it. Fluency in asking verifying questions, identifying the structure, matching the problem to a framework, formulating the hypothesis, traversing the decision tree, etc. only comes with practice. It is impossible to crack a case interview question, or to convert a business problem into a data analysis problem, without it.
To summarise, converting a business problem into a data science problem can be equated with answering a case interview question. The steps to follow are: stall, verify your understanding, identify the structure of the problem, match it to the underlying framework, formulate a hypothesis, traverse the decision tree by asking relevant questions, and finally ask for the data while explaining why you want it. And, above all, practise. Once you are able to convert a business problem into a data science problem, follow the CRISP-DM framework to analyse the data and provide recommendations backed by it.

Published: 02 Jul 2018


This article aims to explore the connection between the game ‘Go’ and artificial intelligence. The objective is to answer the questions – What makes the game of Go, special? Why was mastering the game of Go difficult for a computer? Why was a computer program able to beat a chess grandmaster in 1997? Why did it take close to two decades to crack Go?
“Gentlemen should not waste their time on trivial games – they should study Go”
– Confucius
In fact, artificial intelligence pundits thought computers would only be able to beat a world Go champion by 2027. Thanks to DeepMind, an artificial intelligence company under the umbrella of Google, this formidable task was achieved a decade earlier. This article will talk about the technologies used by DeepMind to beat the world Go champion. Finally, this post discusses how this technology can be used to resolve some complex, real-world problems.
Go – What is it?
Go is a 3000-year-old Chinese strategy board game, which has retained its popularity through the ages. Played by tens of millions of people worldwide, Go is a two-player board game with simple rules and intuitive strategy. Different board sizes are in use for playing this game; professionals use a 19×19 board.
The game starts with an empty board. Each player then takes turns to place the black and white stones (black goes first) on the board, at the intersection of the lines (unlike chess, where you place pieces in the squares). A player can capture the stones of the opponent by surrounding it from all sides. For each captured stone, some points are awarded to the player. The objective of the game is to occupy maximum territory on the board along with capturing your opponents’ stones.
Go is about creation, unlike Chess, which is about destruction. Go requires freedom, creativity, intuition, balance, strategy and intellectual depth to master the game. Playing Go involves both sides of the brain. In fact, the brain scans of Go players have revealed that Go helps in brain development by improving connections between both the brain hemispheres.
Go and the Challenge to Artificial Intelligence (AI)
Computers were able to master Tic-Tac-Toe in 1952. Deep Blue beat chess grandmaster Garry Kasparov in 1997. IBM's Watson won against champions of Jeopardy! (a popular American quiz show) in 2011. DeepMind's AlphaGo defeated a world Go champion in 2016. Why was it considered so challenging for a computer program to master the game of Go?
Chess is played on an 8×8 board, whereas Go uses a 19×19 board. In the opening of a chess game, a player has 20 possible moves; in a Go opening, a player has 361 possible moves. The number of possible Go board positions is about 10 to the power 170, more than the number of atoms in the observable universe! This makes Go a googol times (10 to the power 100) more complex than chess.
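A quick back-of-the-envelope check of these numbers: each of the 361 intersections can be empty, black or white, giving 3^361 raw configurations, an upper bound of roughly 10^172. Per published counts, only a small fraction of these configurations are legal under Go's rules, which is where the famous ~10^170 figure for legal positions comes from:

```python
# Each of the 361 intersections on a 19x19 board can be empty, black or white,
# so 3**361 is an upper bound on the number of board configurations.
raw_positions = 3 ** 361
print(len(str(raw_positions)))  # 173 digits, i.e. on the order of 10**172
```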
In chess, for each step, a player is faced with a choice of 35 moves. On average, a Go player will have 250 possible moves at each step. In Chess, at any given position, it is relatively easy for a computer to do brute force search and choose the best possible move which maximises the chances of winning. A brute force search is not possible in the case of Go, as the potential number of legal moves allowed for each step is humongous.
For a computer to master chess, it becomes easier as the game progresses because the pieces are removed from the board. In Go, it becomes more difficult for the computer program as stones are added to the board as the game progresses. Typically, a Go game will last 3 times longer than a game of chess.
For all these reasons, a computer Go program was only able to catch up with the world champion in 2016, after a huge explosion of new machine learning techniques. Scientists at DeepMind came up with a computer program called AlphaGo, which defeated world champion Lee Sedol. Achieving this was not easy: the researchers at DeepMind produced many novel innovations in the process of creating AlphaGo.
“The rules of Go are so elegant, organic, and rigorously logical that if intelligent life forms exist elsewhere in the universe, they almost certainly play Go.”
– Edward Lasker
How AlphaGo Works
AlphaGo is a general-purpose algorithm, which means it can be put to use for solving other tasks as well. By contrast, Deep Blue from IBM was designed specifically for playing chess: the rules of chess, together with knowledge accumulated over centuries of play, were programmed into it. Deep Blue can't be used even for trivial games like Tic-Tac-Toe; it can do only one specific thing, which it is very good at, i.e. playing chess. AlphaGo can learn to play other games apart from Go. Such general-purpose algorithms constitute a novel field of research called Artificial General Intelligence.
AlphaGo uses state-of-the-art methods – Deep Neural Networks (DNN), Reinforcement Learning (RL), Monte Carlo Tree Search (MCTS), Deep Q Networks (DQN) (a novel technique introduced and popularised by DeepMind which combines neural networks with reinforcement learning), to name a few. It then combines all these methods innovatively to achieve superhuman level mastery in the game of Go.
Let’s first look at each individual piece of this puzzle before going into how these pieces are tied together to achieve the task at hand.
Deep Neural Networks
DNNs are a technique to perform machine learning, loosely inspired by the functioning of the human brain. A DNN’s architecture consists of layers of neurons. DNN can recognise patterns in data without being explicitly programmed for it.
It maps inputs to outputs without anyone specifically programming it to do so. As an example, let us assume that we have fed the network a lot of cat and dog photos. At the same time, we train the system by telling it (in the form of labels) whether a particular image is of a cat or a dog (this is called supervised learning). A DNN will learn to recognise the patterns in the photos that differentiate a cat from a dog. The main objective of the training is that when the DNN sees a new picture of either a dog or a cat, it should be able to classify it correctly, i.e. predict whether it is a cat or a dog.
Let us understand the architecture of a simple DNN. The number of neurons in the input layer corresponds to the size of the input. Let us assume our cat and dog photos are 28×28 images: each image has 28 rows and 28 columns of pixels, 784 pixels in total. In that case the input layer will comprise 784 neurons, one for each pixel. The number of neurons in the output layer depends on the number of classes into which the output needs to be classified; here, the output layer will consist of two neurons, one corresponding to 'cat' and the other to 'dog'.
There will be many layers of neurons between the input and output layers (which is the origin of the term 'deep' in 'Deep Neural Network'). These are called 'hidden layers'. The number of hidden layers and the number of neurons in each layer are not fixed; in fact, changing these values is exactly what leads to optimisation of performance. Such values are called hyperparameters, and they need to be tuned according to the problem at hand. Much of the experimentation around neural networks involves finding optimal values for these hyperparameters.
The training phase of DNNs will consist of a forward pass and a backward pass. First, all the connections between the neurons are initialised with random weights. During the forward pass, the network is fed with a single image. The inputs (pixel data from the image) are combined with the parameters of the network (weights, biases and activation functions) and feed-forwarded through hidden layers, all the way to the output, which returns a probability of a photo belonging to each of the classes.
Then, this probability is compared with the actual class label, and an “error” is calculated. At this point, the backward pass is performed – this error information is passed back through the network through a technique called “back-propagation”. During initial phases of training, this error will be high, and a good training mechanism will gradually reduce this error.
The DNNs are trained in this way with a forward and backward pass until the weights stop changing (this is known as convergence). Then the DNNs will be able to predict and classify the images with a high degree of accuracy, i.e. whether the picture has a cat or a dog.
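The training loop described above can be illustrated with the smallest possible "network": a single logistic neuron, trained in pure Python on an invented two-class toy dataset. Real DNNs stack many such layers (and initialise weights randomly), but each pass has the same shape:

```python
import math

# Invented toy data: the class label depends only on the first feature.
data = [([0.0, 0.0], 0), ([0.0, 1.0], 0), ([1.0, 0.0], 1), ([1.0, 1.0], 1)]
w, b, lr = [0.0, 0.0], 0.0, 0.5            # zero init for brevity

def forward(x):
    # Forward pass: weighted sum, then sigmoid -> probability of class 1.
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 / (1 + math.exp(-z))

for epoch in range(200):                   # repeated forward + backward passes
    for x, y in data:
        p = forward(x)                     # forward pass
        err = p - y                        # error (gradient of log-loss w.r.t. z)
        for i in range(len(w)):            # backward pass: adjust the weights
            w[i] -= lr * err * x[i]
        b -= lr * err

print([round(forward(x)) for x, _ in data])  # [0, 0, 1, 1]: all classified correctly
```

The error shrinks epoch by epoch until the weights settle, which is exactly the convergence behaviour described above.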
Research has given us many different deep neural network architectures. For computer vision problems (i.e. problems involving images), Convolutional Neural Networks (CNNs) have traditionally given good results. For problems which involve a sequence, such as speech recognition or language translation, Recurrent Neural Networks (RNNs) provide excellent results.
In the case of AlphaGo, the process was as follows: first, a Convolutional Neural Network (CNN) was trained on millions of images of board positions. During training, the network was informed of the subsequent move played by the human experts in each position. In the same manner as described earlier, the prediction was compared with the actual move and an "error" metric was computed.
At the end of training, the DNN outputs the next moves, along with the probabilities that each is the move an expert human player would make. This kind of network can only come up with moves of the sort a human expert would play. DeepMind was able to achieve an accuracy of 60% in predicting the move that a human would make. However, to beat a human expert at Go, this is not sufficient. The output from the DNN is therefore processed further using deep reinforcement learning, an approach conceived by DeepMind which combines deep neural networks and reinforcement learning.
Deep Reinforcement Learning
Reinforcement learning (RL) is not a new concept. Nobel laureate Ivan Pavlov was experimenting with conditioning in dogs and uncovering the principles of reinforcement as early as 1902. RL is also one of the methods by which humans and animals learn new skills. Ever wondered how the dolphins in shows are trained to jump to such great heights out of the water? It is with the help of RL. First, the rope used for training the dolphins is submerged in the pool. Whenever the dolphin crosses over the rope, it is rewarded with food; when it does not, the reward is withheld. Slowly the dolphin learns that it is rewarded whenever it passes over the rope, and the height of the rope is increased gradually to train it.
Agents in reinforcement learning are trained using the same principle. The agent takes an action and interacts with the environment; the action causes the environment to change, and the agent receives feedback from it. The agent is either rewarded or not, depending on its action and the objective at hand. The important point is that this objective is never explicitly stated to the agent. Given sufficient time, the agent learns how to maximise its future rewards.
Combining this with DNNs, DeepMind invented Deep Reinforcement Learning, realised as Deep Q-Networks (DQN), where Q stands for the maximum future reward obtainable. DQNs were first applied to Atari games, and DQN learnt to play many different types of Atari games out of the box. The breakthrough was that no explicit programming was required to represent the different games: a single program was smart enough to learn about all the different game environments and, through self-play, was able to master many of them.
In 2014, DQN outperformed previous machine learning methods in 43 of the 49 games (now it has been tested on more than 70 games). In fact, in more than half the games, it performed at more than 75% of the level of a professional human player. In certain games, DQN even came up with surprisingly far-sighted strategies that allowed it to achieve the maximum attainable score—for example, in Breakout, it learned to first dig a tunnel at one end of the brick wall, so the ball would bounce around the back and knock out bricks from behind.
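The Q-learning update at the heart of DQN can be shown in its tabular form on a toy one-dimensional world (the states, reward and hyperparameters below are invented for illustration; DQN replaces this Q-table with a deep network, but the update rule has the same shape):

```python
import random

# Toy 1-D world: states 0..4, reward only for reaching state 4.
random.seed(0)
N_STATES = 5
ACTIONS = [-1, +1]                          # move left or move right
q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
alpha, gamma, epsilon = 0.5, 0.9, 0.2       # learning rate, discount, exploration

for episode in range(300):
    s = 0
    while s != N_STATES - 1:
        if random.random() < epsilon:       # explore occasionally
            a = random.choice(ACTIONS)
        else:                               # otherwise exploit current estimates
            a = max(ACTIONS, key=lambda act: q[(s, act)])
        s2 = min(max(s + a, 0), N_STATES - 1)
        r = 1.0 if s2 == N_STATES - 1 else 0.0
        # Q-update: move Q(s, a) toward reward + discounted best future value.
        best_next = max(q[(s2, act)] for act in ACTIONS)
        q[(s, a)] += alpha * (r + gamma * best_next - q[(s, a)])
        s = s2

policy = [max(ACTIONS, key=lambda act: q[(s, act)]) for s in range(N_STATES - 1)]
print(policy)  # [1, 1, 1, 1]: the agent has learned to always move right
```

Nobody told the agent that "right" is the objective; the reward signal alone, propagated backwards through the Q-values, produced the policy.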
Policy and Value Networks
There are two main types of networks inside AlphaGo: a policy network and a value network.
One of the objectives of AlphaGo's DQN is to go beyond human expert play and discover new, innovative moves by playing against itself millions of times, thereby incrementally improving its weights. This DQN had an 80% win rate against the plain DNN. DeepMind combined these two neural networks (DNN and DQN) to form the first type of network, the 'policy network'. Briefly, the job of the policy network is to reduce the breadth of the search for the next move and come up with a few good moves which are worth further exploration.
Once the policy network is frozen, it plays against itself millions of times. These games generate a new Go dataset consisting of various board positions and the outcomes of the games, and this dataset is used to create an evaluation function. The second type of network, the 'value network', is used to predict the outcome of the game: it learns to take a board position as input and predict the outcome of the game from it.
Combining the Policy and Value Networks
After all this training, DeepMind ended up with two neural networks: the policy and value networks. The policy network takes the board position as input and outputs a probability distribution over the possible moves in that position. The value network also takes the board position as input, and outputs a single real number between 0 and 1, where 0 means that white is completely winning and 1 indicates a complete win for the player with the black stones.
The policy network narrows down the candidate moves in the current position, and the value network evaluates the positions those moves lead to. This division of tasks into two networks was one of the major reasons behind the success of AlphaGo.
Combining Policy and Value networks with Monte Carlo Tree Search (MCTS) and Rollouts
The neural networks on their own will not be enough. To win the game of Go, some more strategising is required. This plan is achieved with the help of MCTS. Monte Carlo Tree Search also helps in stitching the two neural networks together in an innovative way. Neural networks assist in an efficient search for the next best move.
Let’s construct an example to help you visualise all of this. Imagine the game reaches a new position, one which has not been encountered before. The policy network is called upon to propose promising moves from the current position, while the value network, supported by Monte Carlo rollouts, evaluates the desirability of each resulting path.
The policy network finds all the plausible “good” moves, and the value network evaluates each of their outcomes. In Monte Carlo rollouts, a few thousand random games are played out from the positions recognised by the policy network. DeepMind experimented to determine the relative importance of the value network against the Monte Carlo rollouts, and as a result assigned 80% weightage to the value network and 20% to the Monte Carlo rollout evaluation function.
The policy network reduces the width of the search from 200-odd possible moves to the 4 or 5 best ones, and the search tree is expanded only from those moves. The value network cuts down the depth of the tree search by instantly estimating the outcome of the game from a given position. Finally, the move with the highest Q value – the one with maximum benefit – is selected.
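The selection step above can be sketched in a few lines. This is a toy illustration, not DeepMind’s actual implementation: `policy_net`, `value_net` and `rollout_fn` are hypothetical stand-ins for the real networks and rollout machinery, and the 80/20 blend mirrors the weightage described above.

```python
def select_move(position, policy_net, value_net, rollout_fn, top_k=5, lam=0.8):
    """Pick a move by combining a policy network, a value network and rollouts.

    Assumed (hypothetical) interfaces:
      policy_net(position) -> {move: prior probability} over legal moves
      value_net(position)  -> estimated win probability in [0, 1]
      rollout_fn(position) -> win-rate estimate from cheap random playouts
      position.play(move)  -> the resulting position
    """
    priors = policy_net(position)
    # Width reduction: keep only the top_k moves the policy network favours.
    candidates = sorted(priors, key=priors.get, reverse=True)[:top_k]

    scores = {}
    for move in candidates:
        nxt = position.play(move)
        # Depth reduction: blend the value network's instant evaluation
        # with a Monte Carlo rollout estimate (80% / 20% mix).
        scores[move] = lam * value_net(nxt) + (1 - lam) * rollout_fn(nxt)

    # Return the move with the highest blended value.
    return max(scores, key=scores.get)
```

In the real system the blending happens inside the MCTS node statistics rather than in one flat pass, but the width/depth reduction idea is the same.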
“The game is played primarily through intuition and feel, and because of its beauty, subtlety and intellectual depth it has captured the human imagination for centuries.”
– Demis Hassabis
Application of AlphaGo to real-world problems
The vision of DeepMind, from their website, is very telling – “Solve intelligence. Use this knowledge to make the world a better place”. The end goal of this algorithm is to make it general-purpose so that it can be used to solve complex real-world problems. AlphaGo is a significant step forward in the quest for Artificial General Intelligence (AGI). DeepMind has already used its technology successfully to solve real-world problems – let’s look at some examples:
Reduction in energy consumption
DeepMind’s AI was successfully utilised to reduce Google’s data centre cooling cost by 40%. In any large-scale energy-consuming environment, this is a phenomenal improvement. Cooling is one of the primary sources of energy consumption for a data centre: the heat generated by running servers must be removed to keep the centre operational, which is accomplished by large-scale industrial equipment like pumps, chillers and cooling towers. Because the environment of a data centre is highly dynamic, operating at optimal energy efficiency is challenging, and DeepMind’s AI was used to tackle this problem.
First, they used historical data collected by thousands of sensors within the data centre. Using this data, they trained an ensemble of deep neural networks to predict the average future Power Usage Effectiveness (PUE). As this is a general-purpose approach, it is planned to be applied to other challenges in the data centre environment as well.
The possible applications of this technology include getting more energy from the same unit of input, reducing semiconductor manufacturing energy and water usage, etc. DeepMind announced in its blog post that this knowledge would be shared in a future publication so that other data centres, industrial operators and ultimately the environment can greatly benefit from this significant step.
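The core recipe – train several models on historical sensor data and average their PUE predictions – can be sketched with synthetic data. The sensor features, coefficients and simple least-squares models below are invented for illustration and stand in for DeepMind’s deep networks.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for sensor data: columns might represent
# outside temperature, server load and pump speed (all made up).
X = rng.normal(size=(500, 3))
true_w = np.array([0.05, 0.30, -0.10])
y = 1.1 + X @ true_w + rng.normal(scale=0.01, size=500)  # PUE around 1.1

def fit_least_squares(X, y):
    # Ordinary least squares with an intercept column.
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

# Ensemble: each model is fit on a different bootstrap resample.
models = []
for _ in range(10):
    idx = rng.integers(0, len(X), size=len(X))
    models.append(fit_least_squares(X[idx], y[idx]))

def predict_pue(x, models):
    a = np.concatenate([[1.0], x])
    return float(np.mean([a @ m for m in models]))  # average the ensemble

print(round(predict_pue(np.zeros(3), models), 2))  # ≈ 1.1, the baseline PUE
```

The averaging is what makes it an ensemble: each model’s individual error partly cancels out, which matters when the prediction drives real cooling equipment.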
Radiotherapy planning for head and neck cancers
DeepMind has collaborated with the radiotherapy department at University College London Hospital’s NHS Foundation Trust, a world leader in cancer treatment.
One in 75 men and one in 150 women are diagnosed with oral cancer in their lifetime. Due to the sensitive nature of the structures and organs in the head and neck area, radiologists need to take extreme care while treating them.
Before radiotherapy is administered, a detailed map needs to be prepared with the areas to be treated and the areas to be avoided. This is known as segmentation. This segmented map is fed into the radiography machine, which will then target cancer cells without harming healthy cells.
In the case of cancer of the head or neck region, this is a painstaking job for the radiologists as it involves very sensitive organs. It takes around four hours for the radiologists to create a segmented map for this area. DeepMind, through its algorithms, is aiming to reduce the time required for generating the segmented maps, from four to one hour. This will significantly free up the radiologist’s time. More importantly, this segmentation algorithm can be utilised for other parts of the body.
To summarise, AlphaGo beat the 18-time world Go champion, Lee Sedol, four games to one in their 2016 match. In 2017, it even beat a team of the world’s best players. It uses a combination of DNN and DQN as a policy network for coming up with the next best move, and a DNN as a value network to evaluate the outcome of the game. Monte Carlo tree search is used along with both the policy and value networks to reduce the width and depth of the search and to improve the evaluation function. The ultimate aim of this algorithm is not to solve board games but to work towards an Artificial General Intelligence algorithm, and AlphaGo is undoubtedly a big step in that direction.
Of course, there have been other effects. As the news of AlphaGo vs Lee Sedol went viral, the demand for Go boards jumped tenfold. Many stores reported Go boards going out of stock, and it became challenging to purchase one.
Fortunately, I just found one and ordered it for myself and my kid. Are you planning to buy the board and learn Go?
Learn ML courses from the World’s top Universities. Earn Masters, Executive PGP, or Advanced Certificate Programs to fast-track your career.

14 Feb 2018


Cancer is not one disease; it is many diseases. Let us understand its cause through a simple example. When you photocopy a document, stray dots or smears sometimes appear on the copy even though they are not present in the original. In the same way, errors occur inadvertently in gene replication processes. Most of the time, the genes with errors cannot sustain themselves and ultimately perish.
In some rare cases, the mutated gene survives and gets replicated uncontrollably. This uncontrolled replication of mutated genes is the primary cause of cancer. The mutation can happen in any of the roughly twenty thousand genes in our body, and variation in any one gene or combination of genes makes cancer a severe disease to conquer. To eradicate cancer, we need methods that destroy the rogue cells without harming the body’s functional cells, which makes it doubly hard to defeat.
Cancer and its complexity
Cancer is a disease with a long tail distribution, meaning there are many different causes for the condition and no single solution for eradicating it. Some diseases affect a large percentage of the population but have a sole cause of occurrence. Consider cholera: it is caused by eating food or drinking water contaminated by the bacterium Vibrio cholerae, and by nothing else. Once the single cause of a disease is found, it is relatively easy to conquer.
What if a condition occurs because of multiple reasons? A mutation can occur in any of the twenty thousand genes in our body, and we also need to consider their combinations: cancer may arise not just from a random mutation in one gene but from a combination of gene mutations. The number of possible causes becomes exponential, and there is no single mechanism to cure it. For example, a mutation in any of the genes ALK, BRAF, DDR2, EGFR, ERBB2, KRAS, MAP2K1, NRAS, PIK3CA, PTEN, RET and RIT1 can cause lung cancer. There are many ways for cancer to occur, and that is why it is a disease with a long tail distribution.
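The scale of that combinatorial explosion is easy to verify: even restricting attention to pairs of the roughly 20,000 genes gives nearly 200 million candidate combinations, and the count of all possible gene subsets is astronomically larger.

```python
from math import comb

genes = 20_000
print(comb(genes, 2))        # pairs of genes: 199,990,000
print(len(str(2 ** genes)))  # digits in the count of all possible gene subsets
```

No single lookup table can cover a cause space that large, which is exactly why pattern-finding algorithms are needed.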
In our arsenal for waging this war on cancer and conquering it, big data and machine learning are critical tools. How can big data help in fighting this war? What does machine learning have to do with cancer? How are they going to help in fighting a disease with many causes, a condition with a long tail distribution? Firstly, how and where is this big data generated? Let us find answers to these questions.
Gene Sequencing and explosion in data
Gene sequencing is one area producing humongous amounts of data. Exactly how much? According to the Washington Post, the human data generated through gene sequencing (approximately 2.5 lakh sequences) takes up about a fourth of the size of YouTube’s yearly data production. If all this data, combined with the extra information that comes with sequencing genomes, were recorded on 4GB DVDs, the stack would be about half a mile high.
Methods for gene sequencing have improved over the years, and the cost has plummeted exponentially: in 2008, sequencing a genome cost 10 million dollars; today it costs only about 1,000 dollars, and it is expected to fall further. It is estimated that one billion people will have their genes sequenced by 2025, so within the next decade, genomics will generate somewhere between 2 and 40 exabytes of data a year. An exabyte is 10^18 bytes – a billion gigabytes.
Before coming to how data will help cure cancer, let us take one concrete example of how data analysis helped find the cause of an infectious disease and fight it – not today, but in the nineteenth century itself. That disease was cholera.
Clustering in the Nineteenth Century – the Cholera breakthrough
John Snow was an anaesthesiologist; in September 1854, cholera broke out near his house. To find its cause, Snow marked the home address of each patient on a map of London. From this exercise, he saw that people suffering from cholera were clustered around certain water wells. He firmly believed a contaminated pump was responsible for the epidemic and, against the will of the local authorities, had the pump disabled by removing its handle. This drastically reduced the spread of cholera.
Snow subsequently published a map of the outbreak to support his theory, showing the locations of the 13 public wells in the area and the 578 cholera deaths mapped by home address. This map ultimately led to the understanding that cholera was an infectious disease that spread quickly through water. Snow’s work is the earliest example of applying a clustering approach to find the cause of an illness and help eradicate it. In the nineteenth century, he could do it on a city map with a pencil; with cancer as the target disease, that level of analysis is no longer possible by hand. We need sophisticated tools and technologies to mine the data, and that is where we leverage modern technologies like machine learning and big data.
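Snow’s pencil-and-map analysis amounts to assigning each case to its nearest well and counting – essentially one step of a clustering algorithm. The well names and coordinates below are invented for illustration, not Snow’s real map data.

```python
# Hypothetical well and case coordinates (made up for illustration).
wells = {"Broad Street": (0.0, 0.0),
         "Crown Chapel": (5.0, 5.0),
         "Warwick":      (-4.0, 3.0)}
cases = [(0.2, -0.1), (0.5, 0.3), (-0.3, 0.4), (4.8, 5.2), (0.1, 0.1)]

def nearest_well(point, wells):
    # Assign a case to the well with the smallest squared distance.
    return min(wells, key=lambda w: (wells[w][0] - point[0]) ** 2
                                    + (wells[w][1] - point[1]) ** 2)

counts = {}
for c in cases:
    w = nearest_well(c, wells)
    counts[w] = counts.get(w, 0) + 1

print(counts)  # the suspect pump is the one with the most cases around it
```

Modern clustering algorithms like k-means generalise exactly this nearest-centre assignment, just at a scale no pencil could manage.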
Big data and Machine learning – tools to fight cancer
Vast amounts of data along with machine learning algorithms will help us in our fight with cancer in many ways. It can help us with diagnosis, treatment, and prognosis. Mainly, it will help customise the therapy according to the patient, which is not possible otherwise. It will also help deal with the long tail of the distribution.
Given the enormous amounts of Electronic Medical Record (EMR) data generated and recorded by various hospitals, it is possible to use ‘labelled’ data in diagnosing cancer. Techniques like Natural Language Processing (NLP) are utilised to make sense of doctors’ notes, and deep neural networks are deployed to analyse CT and MRI scans. Machine learning algorithms search the EMR databases and find hidden patterns, which help in diagnosing cancers.
A college student was able to design an Artificial Neural Network from the comfort of her home and developed a model that can diagnose breast cancer with a high degree of accuracy.
Diagnosis with Big Data and Machine Learning
Brittany Wenger was 16 years old when her older cousin was diagnosed with breast cancer, which inspired her to improve the diagnostic process. Fine Needle Aspiration (FNA) was the least invasive and quickest method of biopsy, but doctors were reluctant to use it because its results were not reliable. Brittany decided to use her programming skills to improve the reliability of FNA, so that women could choose the less invasive, more comfortable diagnostic method.
Brittany found public-domain Fine Needle Aspiration data from the University of Wisconsin. She coded an Artificial Neural Network (ANN) – a model inspired by the architecture of the human brain – and used cloud technologies to process the data and train it. After much trial and error, her network was able to detect breast cancer from FNA test data with 99.1% sensitivity to malignancy. The same approach is applicable to diagnosing other cancers as well.
The accuracy of diagnosis is dependent upon the amount and quality of the data available. The more the data available, the more the algorithms will be able to query the database, find similarities and come out with valuable models.
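The Wisconsin FNA data is, in fact, bundled with scikit-learn, so a small neural network in the spirit of Wenger’s project can be sketched in a few lines. This is a toy illustration with default-ish settings, not her actual model.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler

# The Wisconsin breast cancer (FNA) dataset ships with scikit-learn.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Neural networks train far better on standardised inputs.
scaler = StandardScaler().fit(X_train)

clf = MLPClassifier(hidden_layer_sizes=(30,), max_iter=1000, random_state=0)
clf.fit(scaler.transform(X_train), y_train)

acc = clf.score(scaler.transform(X_test), y_test)
print(f"held-out accuracy: {acc:.3f}")
```

Held-out evaluation is the point here: accuracy on data the model never saw is what makes a diagnostic claim credible, echoing the paragraph above about data quantity and quality.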
Treatment with Big Data and Machine Learning
Big data and machine learning are helpful not only for diagnosis but for treatment as well. John and Kathy had been married for three decades when, at the age of 49, Kathy was diagnosed with stage III breast cancer. John, the CIO of a Boston hospital, helped plan her treatment with the help of big data tools that he had designed and brought into existence.
In 2008, five Harvard-affiliated hospitals shared their databases to create a powerful search tool known as the ‘Shared Health Research Information Network’ (SHRINE). By the time of Kathy’s diagnosis, her doctors could sift through a database of 6.1 million records for insightful information. They queried SHRINE with questions like “50-year-old Asian women diagnosed with stage III breast cancer, and their treatments”. Armed with this information, the doctors were able to treat her with chemotherapy drugs targeting the estrogen-sensitive tumour cells, avoiding surgery.
By the time Kathy completed her chemotherapy regimen the radiologists could no longer find any tumour cells. This is one example of how big data tools can help in customising the treatment plan according to the requirement of each.
As cancer is a long tail distribution a ‘one size fits all’ philosophy will not work. For customising treatments depending on the patient’s history, their gene sequence, results of diagnostic tests, a mutation found in their genes or a combination of their genes and environment, big data and machine learning tools are indispensable.
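Conceptually, a SHRINE-style query is a filter over federated, de-identified records. A toy version over an in-memory list, with all fields and values invented for illustration, looks like this:

```python
# Invented, de-identified toy records standing in for a federated EMR database.
records = [
    {"age": 50, "sex": "F", "diagnosis": "breast cancer", "stage": 3, "treatment": "chemo"},
    {"age": 51, "sex": "F", "diagnosis": "breast cancer", "stage": 3, "treatment": "chemo"},
    {"age": 48, "sex": "F", "diagnosis": "breast cancer", "stage": 2, "treatment": "surgery"},
    {"age": 63, "sex": "M", "diagnosis": "lung cancer",   "stage": 3, "treatment": "radio"},
]

def cohort(records, **criteria):
    """Return records matching every criterion - a simplified cohort query."""
    return [r for r in records if all(r.get(k) == v for k, v in criteria.items())]

matches = cohort(records, sex="F", diagnosis="breast cancer", stage=3)
print([m["treatment"] for m in matches])  # treatments given to similar patients
```

The real system adds federation across hospitals, de-identification and aggregate-count answers, but the doctor’s question reduces to exactly this kind of filter.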
Drug Discovery with Big Data and Machine Learning
Big data and machine learning will not only help in diagnosis and treatment but will also revolutionise drug discovery. Researchers can use open data and computational resources to discover new uses for drugs already approved by agencies like the FDA for other purposes. For example, scientists at the University of California, San Francisco found through number crunching that pyrvinium pamoate, a drug used to treat pinworms, could shrink hepatocellular carcinoma – a type of liver cancer – in mice. Liver cancer is the second highest contributor to cancer deaths in the world.
Big data can be used not only to discover new uses for old drugs but also to devise new drugs. By crunching data on different drugs and chemicals and their properties, the symptoms of various diseases, the chemical composition of the drugs used for those conditions, and the side effects of those medications collected from different sources, new drugs can be devised for various types of cancer. This will significantly reduce the time taken to develop new medicines, without wasting millions of dollars in the process.
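The “number crunching” behind drug repurposing is often a signature match: a disease’s gene-expression signature is compared against drug signatures, and drugs that reverse it (strong negative correlation) become repurposing candidates. A toy numpy sketch, with entirely made-up signatures and drug names:

```python
import numpy as np

# Made-up expression changes over five genes (positive = up-regulated).
disease_signature = np.array([1.2, -0.8, 0.5, 0.9, -1.1])
drug_signatures = {
    "drug A": np.array([-1.1, 0.9, -0.4, -0.8, 1.0]),  # roughly reverses the disease
    "drug B": np.array([1.0, -0.7, 0.6, 0.8, -0.9]),   # roughly mimics the disease
    "drug C": np.array([0.1, 0.0, -0.1, 0.2, 0.1]),    # largely unrelated
}

def score(drug_sig, disease_sig):
    # Pearson correlation; strongly negative = candidate for repurposing.
    return float(np.corrcoef(drug_sig, disease_sig)[0, 1])

candidate = min(drug_signatures,
                key=lambda d: score(drug_signatures[d], disease_signature))
print(candidate)  # the drug whose signature most strongly opposes the disease's
```

Real pipelines (connectivity-map-style analyses) use thousands of genes and more robust rank-based scores, but the anti-correlation idea is the same.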
Using big data and machine learning will no doubt improve diagnosis, treatment and drug discovery for cancer, but not without challenges. There are many stumbling blocks on the road ahead, and if they are not removed and these challenges not faced, our enemy will keep the upper hand in the battles to come.
Challenges in using Big Data and Machine Learning to fight Cancer
Digitisation
Except for a few large and technically advanced hospitals, most are yet to be digitised; they still capture and record data in massive stacks of paper files. Due to lack of technical expertise, affordability, economies of scale and various other reasons, digitisation has not taken place. Providing open-source EMR software, and demonstrating how helpful digital records are in treating patients and how profitable they are for hospitals, are steps in the right direction.
Data locked in enterprise warehouses
As of today, only a few hospitals can digitally capture patient records, and even that data is locked away in enterprise warehouses, inaccessible to the world at large.
Hospitals are reluctant to share their databases with other hospitals. Even if they are willing, they are plagued by the different database schemas and architectures. Critical thinking is required on this front about how hospitals can share their databases among themselves for their mutual benefit without being suspicious of each other. A consensus needs to be reached about the schema in which this data should be shared as well, for the benefit of all hospitals. This patient data should be democratised and utilised for the betterment of the future of mankind.
Patient data should not be employed for the growth of a single organisation, and utmost care should be taken to anonymise the individuals to whom the data belongs. If a person’s lipstick preference is leaked, there is not much harm; if a person’s medical history is leaked, it can significantly affect their life and prospects.
The government should take positive steps in this direction and should help create a big data infrastructure for storing medical records of patients from all hospitals. It should make it compulsory for all hospitals to share their database within this shared infrastructure. Access to this database should be made free for patient treatment and research.
Improvement in efficiency of Machine Learning Algorithms
Machine learning is not a magic pill for cancer diagnosis and treatment; it is a tool that, used well, can help in our journey to conquer cancer. Machine learning is still at a nascent stage and has its limitations. For example, the data on which an algorithm is trained needs to be very close to the data on which it is deployed; if the two differ greatly, the algorithm will not produce meaningful results.
Many machine learning algorithms exist, each with its own peculiar assumptions, advantages and disadvantages. If we could find a way to combine all these different algorithms to achieve the result we require – curing cancer – we would, needless to say, have found something hugely beneficial. The machine learning scientist Pedro Domingos, who wrote a popular science book on the subject, calls this ‘The Master Algorithm’.
According to Domingos, there are five schools of thought in machine learning: the symbolists, connectionists, Bayesians, evolutionaries and analogisers. It is difficult to go into all these different types of machine learning systems in this article; I will cover them in a future blog. For now, we need to understand that each of these methods has advantages and disadvantages of its own, and that combining them would let us derive highly impactful insights from our data – useful not only for predictions and forecasts but also for our fight against a vengeful enemy: cancer.
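A faint echo of combining the schools already exists in ensemble methods. As a sketch (far short of a true Master Algorithm), scikit-learn’s VotingClassifier can pool a decision tree (symbolist), a neural network (connectionist), naive Bayes (Bayesian) and nearest neighbours (analogiser) by majority vote:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import VotingClassifier
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# One representative per "school of thought", combined by majority vote.
vote = VotingClassifier([
    ("symbolist",     DecisionTreeClassifier(random_state=0)),
    ("connectionist", make_pipeline(StandardScaler(),
                                    MLPClassifier(max_iter=1000, random_state=0))),
    ("bayesian",      GaussianNB()),
    ("analogiser",    make_pipeline(StandardScaler(), KNeighborsClassifier())),
])
vote.fit(X_train, y_train)
print(f"combined accuracy: {vote.score(X_test, y_test):.3f}")
```

Majority voting only papers over the models’ differing assumptions rather than unifying them, which is precisely the gap Domingos’s Master Algorithm is meant to close.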
To summarise, cancer is a formidable enemy which keeps changing its form. We now possess new weapons in our arsenal, in the form of big data and machine learning, to face it competently. But to demolish it entirely, we need a still more powerful weapon: ‘The Master Algorithm’.
We also need to change the strategies and methods with which we fight this enemy: creating a big data infrastructure, making it compulsory for hospitals to share anonymised patient records, maintaining the security of the database, and allowing free access to it for patient treatment and research to cure cancer.
Get data science certification from the World’s top Universities. Learn Executive PG Programs, Advanced Certificate Programs, or Masters Programs to fast-track your career.
Wrapping up
If you are interested to know more about Big Data, check out our Advanced Certificate Programme in Big Data from IIIT Bangalore.
Learn Software Engineering degrees online from the World’s top Universities. Earn Executive PG Programs, Advanced Certificate Programs, or Masters Programs to fast-track your career.

08 Jan 2018