Data Science and Machine Learning interviews revolve largely around Machine Learning algorithms and techniques. Linear Regression is the most frequently asked about, as it is usually the first algorithm one studies. It is also widely used across the industry in multiple domains.
Linear Regression Interview Questions & Answers
Question 1: How Does Linear Regression Work?
Linear Regression, as its name implies, tries to model the data using a linear relation between the independent variables and the dependent variable, or target. If there is just one independent variable/feature, it is called Simple Linear Regression. If there are multiple features, it is called Multiple Linear Regression.
Regression means finding the best-fit line or curve for your numerical data, i.e., a functional approximation of the data. That is, you want a mapping function from your input data to the output (target). This mapping function is written as:
Ŷ = W*X + B
where B is the intercept, W is the slope of the line, and Ŷ is the predicted output. The optimal values of W and B need to be found to obtain the best-fit line.
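In the simple one-feature case, these optimal values have a closed-form least-squares solution. A minimal sketch in Python, using hypothetical toy data generated from y = 2x + 1:

```python
# Minimal sketch: fitting Y_hat = W*X + B with the closed-form
# least-squares solution for a single feature (toy data, for illustration).
def fit_simple_linear_regression(xs, ys):
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # W = covariance(x, y) / variance(x); B = mean_y - W * mean_x
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    w = cov_xy / var_x
    b = mean_y - w * mean_x
    return w, b

# Data generated exactly from y = 2x + 1, so the fit recovers W = 2, B = 1
xs = [0, 1, 2, 3, 4]
ys = [1, 3, 5, 7, 9]
w, b = fit_simple_linear_regression(xs, ys)
```

For larger problems with many features, libraries solve the same least-squares problem in matrix form rather than feature by feature.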
Question 2: How Does Linear Regression Find the Optimal Point?
Linear Regression uses the Least Squares criterion: it looks for the point where the squared error is minimum. In practice, the optimal values of the weights are often found by an iterative approximation method called Gradient Descent. Initially, random values of the weights are taken and the loss is calculated for each instance.
After calculating the cumulative error over the whole dataset, a small step is taken towards the minimum and the weights are updated by this change. By repeatedly taking these small steps, the weights approach the minimum and the algorithm terminates.
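The update loop described above can be sketched as follows (toy data and untuned, illustrative hyperparameters):

```python
# Sketch of gradient descent on the MSE loss for Y_hat = w*x + b.
# Learning rate and epoch count are illustrative, not tuned values.
def gradient_descent(xs, ys, lr=0.05, epochs=500):
    w, b = 0.0, 0.0                      # initial weights (could be random)
    n = len(xs)
    for _ in range(epochs):
        preds = [w * x + b for x in xs]
        # Gradients of MSE = (1/n) * sum((pred - y)^2) w.r.t. w and b
        grad_w = (2 / n) * sum((p - y) * x for p, y, x in zip(preds, ys, xs))
        grad_b = (2 / n) * sum(p - y for p, y in zip(preds, ys))
        w -= lr * grad_w                 # small step towards the minimum
        b -= lr * grad_b
    return w, b

xs = [0, 1, 2, 3, 4]
ys = [1, 3, 5, 7, 9]   # generated from y = 2x + 1
w, b = gradient_descent(xs, ys)
```

After enough steps, w and b settle very close to the true values 2 and 1.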
Question 3: What is Learning Rate?
Learning Rate, or alpha, is a hyperparameter that must be set to a suitable value for the algorithm to converge quickly with the least error. Alpha controls the size of the step taken during Gradient Descent towards the global minimum.
The bigger the value of alpha, the larger the step size and the faster the convergence may be. If alpha is too small, convergence can take a very long time. But if alpha is too big, the algorithm may start overshooting the minimum and fail to converge at all. The right value of alpha is found during hyperparameter optimization.
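The effect of alpha can be shown on a toy one-dimensional problem: minimising f(w) = w², whose gradient is 2w (the function and learning-rate values here are chosen purely for illustration):

```python
# Toy illustration: gradient descent on f(w) = w**2 with gradient 2*w,
# under three learning rates -- too small, reasonable, and too large.
def descend(lr, steps=50, w=5.0):
    for _ in range(steps):
        w -= lr * 2 * w        # gradient step: w_new = w - lr * f'(w)
    return w

small = descend(lr=0.001)   # crawls towards 0, still far after 50 steps
good = descend(lr=0.1)      # converges very close to the minimum at 0
big = descend(lr=1.1)       # overshoots every step and diverges
```

With lr=1.1, each step multiplies w by (1 - 2*1.1) = -1.2, so the iterates flip sign and grow without bound instead of converging.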
Question 4: What are the Assumptions of Linear Regression?
Linear Regression makes several assumptions about the data to keep the calculations simple, and that makes it vulnerable to poor results when the data does not satisfy them. The most important assumptions are:
- Linear Relationship: The first and most obvious assumption is that the features are linearly related to the target. In other words, the best-fit surface is linear. In practice, this is often not the case.
- No Multicollinearity: Linear Regression tries to estimate a coefficient for each feature according to its impact on the target. This estimation is hampered when the features themselves are dependent on, or collinear with, each other.
- Homoscedasticity: With reference to Linear Regression, homoscedasticity means that the errors, or residuals, have constant variance across all levels of the predictions. In other words, if you plot the residuals against the predicted values, there should be no clear pattern. If the data instead shows heteroscedasticity, this assumption is broken and the results cannot be trusted.
Question 5: What are the Different Types of Gradient Descent in Linear Regression?
There are mainly 3 types of gradient descent.
Vanilla (Batch) Gradient Descent updates the weights once per epoch: in essence, it averages the loss over all training instances and then updates the weights at the end of the epoch.
This can miss finer details, so Stochastic Gradient Descent instead updates the weights after every single training instance in every epoch. That is a lot of updates, which makes the optimization curve noisy and the process time-consuming as well.
Mini-Batch Gradient Descent is a middle ground between Vanilla and Stochastic. It splits the dataset into batches and updates the weights at the end of every batch. This not only makes the optimization smoother and faster but also helps when the dataset is huge and cannot be loaded into memory all at once.
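A rough sketch of the mini-batch variant, reusing the toy line-fitting problem (batch size, learning rate, and epoch count are illustrative; real implementations also handle stopping criteria more carefully):

```python
# Sketch of mini-batch gradient descent for Y_hat = w*x + b.
# Batch size and learning rate are illustrative, not tuned values.
import random

def minibatch_gd(xs, ys, lr=0.05, epochs=500, batch_size=2, seed=0):
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    idx = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(idx)                       # reshuffle every epoch
        for start in range(0, len(idx), batch_size):
            batch = idx[start:start + batch_size]
            m = len(batch)
            # Gradient of MSE computed on this batch only
            grad_w = (2 / m) * sum((w * xs[i] + b - ys[i]) * xs[i] for i in batch)
            grad_b = (2 / m) * sum(w * xs[i] + b - ys[i] for i in batch)
            w -= lr * grad_w                   # update at the end of each batch
            b -= lr * grad_b
    return w, b

xs = [0, 1, 2, 3, 4]
ys = [1, 3, 5, 7, 9]   # generated from y = 2x + 1
w, b = minibatch_gd(xs, ys)
```

Because each update only touches one batch, the same loop works even when the full dataset cannot fit in memory at once.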
Question 6: What is Heteroscedasticity?
With reference to Linear Regression, heteroscedasticity means that the residuals of the observations do not all have the same variance. This would mean the observations actually come from probability distributions with different variances, which defies one of the assumptions of Linear Regression. The quickest way to check for heteroscedasticity is to plot the residuals against the predictions and look for any pattern. If a pattern exists, heteroscedasticity may be present.
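A crude numeric stand-in for eyeballing the residual plot is to compare the spread of residuals between small and large predictions. In this synthetic example the noise grows with x, so the data is heteroscedastic by construction (the true line y = 2x + 1 is used for the residuals here for simplicity; normally you would use the fitted model's predictions):

```python
# Synthetic heteroscedastic data: noise standard deviation grows with x.
import random

rng = random.Random(42)
xs = [i / 10 for i in range(100)]
ys = [2 * x + 1 + rng.gauss(0, 0.1 + x) for x in xs]

# Residuals against the (known) true line, sorted by x / prediction size
residuals = [y - (2 * x + 1) for x, y in zip(xs, ys)]

def spread(vals):
    # Sample variance of a list of residuals
    m = sum(vals) / len(vals)
    return sum((v - m) ** 2 for v in vals) / len(vals)

low = spread(residuals[:50])    # residual variance at small predictions
high = spread(residuals[50:])   # residual variance at large predictions
# high being much larger than low signals heteroscedasticity
```

Formal tests such as Breusch-Pagan (available in statsmodels) make this comparison rigorously rather than by a simple split.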
Question 7: What is Multicollinearity and How can it Impact the Model?
Multicollinearity occurs when multiple features in a regression model are correlated or dependent on each other to some extent. A change in the value of one feature also forces a change in the features collinear with it. In other words, such features add no new information to the model, and they inflate the variance of the coefficient estimates. This can lead to overfitting, with unpredictable results on unseen data.
Question 8: How to Measure Multicollinearity?
To measure multicollinearity, the two most common techniques are the Correlation Matrix and the Variance Inflation Factor (VIF). The correlation matrix simply contains the correlation value of each feature with every other feature; extreme values signify high correlation.
VIF is another way to quantify collinearity: a value of 1 means no collinearity, while values above 5 indicate high collinearity.
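For two features, VIF reduces to 1 / (1 - R²) of one feature regressed on the other, which equals 1 / (1 - r²) for the squared Pearson correlation. A sketch with made-up feature values (statsmodels' `variance_inflation_factor` handles the general multi-feature case):

```python
# Two-feature VIF sketch: VIF = 1 / (1 - R^2), where R^2 here is the
# squared Pearson correlation between the two features (toy data).
def r_squared(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov * cov / (vx * vy)         # squared Pearson correlation

x1 = [1, 2, 3, 4, 5]
x2 = [2.1, 3.9, 6.2, 7.8, 10.1]          # roughly 2 * x1 -> highly collinear
x3 = [5, 1, 4, 2, 3]                     # unrelated ordering

vif_collinear = 1 / (1 - r_squared(x1, x2))   # large -> high collinearity
vif_unrelated = 1 / (1 - r_squared(x1, x3))   # near 1 -> little collinearity
```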
Question 9: What are the Loss Functions used in Linear Regression?
Mean Squared Error and Root Mean Squared Error are the two most common loss functions used in Linear Regression.
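Both can be written out in a few lines (the prediction values below are made up for illustration):

```python
# MSE and RMSE written out directly from their definitions.
import math

def mse(y_true, y_pred):
    # Mean of squared errors
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    # Square root of MSE, in the same units as the target
    return math.sqrt(mse(y_true, y_pred))

y_true = [3.0, 5.0, 7.0]
y_pred = [2.0, 5.0, 9.0]
# errors are 1, 0, -2, so MSE = (1 + 0 + 4) / 3
```

RMSE is often preferred for reporting because it is in the same units as the target variable.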
Question 10: What Metrics are used for Linear Regression?
The most common metrics used for Linear Regression are the R Squared (R2) score and the Adjusted R Squared score. The higher the value of R2, the better the performance of the model. However, this is not always a reliable signal, because R2 always increases when a new feature is added, even if that feature is not significant. This shortcoming is overcome by Adjusted R Squared, which increases only if the newly added feature genuinely improves the fit.
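The penalty in Adjusted R Squared can be seen directly from its formula, 1 - (1 - R²)(n - 1)/(n - p - 1), where n is the number of samples and p the number of features (the target and prediction values below are made up for illustration):

```python
# R-squared and Adjusted R-squared from their definitions (toy values).
def r2_score(y_true, y_pred):
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

def adjusted_r2(r2, n, p):
    # Penalises R^2 for the number of features p
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

y_true = [1.0, 3.0, 5.0, 7.0, 9.0]
y_pred = [1.2, 2.9, 5.1, 6.8, 9.0]
r2 = r2_score(y_true, y_pred)
# With n=5 samples and the same fit, claiming more features (p=2 vs p=1)
# lowers Adjusted R^2 -- extra features must earn their keep
adj_p1 = adjusted_r2(r2, n=5, p=1)
adj_p2 = adjusted_r2(r2, n=5, p=2)
```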
Question 11: What are the Limitations of Linear Regression?
One limitation of Linear Regression is that it is quite sensitive to outliers in the data. Another is its high bias, caused by the strong assumptions it makes about the data; when those assumptions do not hold, the model can perform very poorly.
Question 12: What are the Different Types of Regularized Regression Algorithms?
There are mainly two regularized versions of Linear Regression: Ridge and Lasso. Both algorithms add a penalty term that helps reduce overfitting of the linear model. Lasso applies an absolute (L1) penalty, so the weights of less significant features can shrink exactly to zero. Ridge uses a squared (L2) penalty, so the coefficients of less significant features come close to zero but are rarely eliminated entirely.
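The shrinkage effect of the ridge penalty is easiest to see in the one-feature case on mean-centred data, where the closed form is w = Σxy / (Σx² + λ); a toy sketch (scikit-learn's `Ridge` and `Lasso` implement the general versions):

```python
# Ridge (L2) shrinkage for a single feature on mean-centred data.
# Closed form: w = sum(x*y) / (sum(x^2) + lam). Toy data for illustration.
def ridge_weight(xs, ys, lam):
    # assumes xs and ys are already mean-centred
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)

xs = [-2, -1, 0, 1, 2]            # centred feature
ys = [-4.1, -1.9, 0.0, 2.1, 3.9]  # roughly y = 2x, centred

w_ols = ridge_weight(xs, ys, lam=0.0)    # plain least squares
w_ridge = ridge_weight(xs, ys, lam=5.0)  # penalty pulls the weight towards 0
```

As λ grows, the denominator grows and the weight shrinks towards (but never exactly to) zero; Lasso's L1 penalty, by contrast, can zero weights out entirely.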
Linear Regression is the most fundamental algorithm in Machine Learning. In this tutorial, we covered some of the questions most frequently asked in interviews. Interviewers can also ask scenario-based questions, presenting sample data and results for you to interpret.
upGrad provides a PG Diploma in Machine Learning and AI and a Master of Science in Machine Learning & AI that may guide you toward building a career in this field. These courses explain the need for Machine Learning and the further steps to build knowledge in this domain, covering varied concepts from Gradient Descent to complete Machine Learning workflows.