Welcome to the second part of the series of commonly asked interview questions based on machine learning algorithms. We hope that the previous section on Linear Regression was helpful to you.
Machine learning is a growing field. Demand is increasing and the market is expected to grow rapidly in the coming years, largely because machine learning applies to so many problems, from simple devices to large-scale technology platforms. It is considered one of the highest-paying careers today. The average salary of a machine learning engineer is 7.5 LPA, with a range of 3.5 LPA to 21.0 LPA; it can be higher depending on experience, skill set, and upskilling history. (Source)
Logistic regression is a machine learning classification algorithm. It is a statistical method for predicting a binary outcome: it models a dependent variable by analysing its relationship with one or more independent variables. It fits an S-shaped curve to the data, whereas linear regression fits a straight line.
Logistic regression is widely applicable. It can be used, for example, to predict whether a political candidate will win or whether a patient will have a heart attack. This is how to explain logistic regression in an interview.
Let’s find the answers to questions on logistic regression:
1. What is a logistic function? What is the range of values of a logistic function?
$f(z) = \frac{1}{1 + e^{-z}}$
The values of a logistic function range between 0 and 1, while z can vary from -infinity to +infinity.
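To make the range concrete, here is a minimal sketch of the logistic function in Python. The function name `logistic` and the sample inputs are illustrative choices, not part of the original answer.

```python
import math

def logistic(z: float) -> float:
    """Map any real number z to a value between 0 and 1."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    # For negative z, use an algebraically equal form that avoids overflow.
    ez = math.exp(z)
    return ez / (1.0 + ez)

# Large negative inputs approach 0, large positive inputs approach 1.
for z in (-10.0, -1.0, 0.0, 1.0, 10.0):
    print(z, logistic(z))
```

The two branches compute the same quantity; the split only keeps `math.exp` from overflowing for very negative inputs.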
2. Why is logistic regression very popular?
Logistic regression is popular because it converts logits (log odds), which can range from -infinity to +infinity, into values between 0 and 1. Because the logistic function outputs the probability of an event occurring, it can be applied to many real-life scenarios.
This is one of the most commonly asked logistic regression questions. Like other regression methods, logistic regression is a form of predictive analysis and is used to describe the relationship between variables. Real-life examples include estimating the probability of a heart attack or the probability that a transaction is fraudulent.
3. What is the formula for the logistic regression function?
$f(z) = \frac{1}{1 + e^{-(\alpha + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_k X_k)}}$
4. How can the probability of a logistic regression model be expressed as conditional probability?
P(Discrete value of Target variable | X1, X2, X3….Xk). It is the probability of the target variable taking up a discrete value (either 0 or 1 in case of binary classification problems) when the values of independent variables are given. For example, the probability an employee will attrite (target variable) given his attributes such as his age, salary, KRA’s, etc.
5. What are the odds?
Interviewers ask this kind of question to gauge how solid a candidate's fundamentals are. The odds are the ratio of the probability of an event occurring to the probability of the event not occurring. For example, let’s assume that the probability of winning a lottery is 0.01. Then, the probability of not winning is 1 - 0.01 = 0.99.
The odds of winning the lottery = (Probability of winning)/(probability of not winning)
The odds of winning the lottery = 0.01/0.99
The odds of winning the lottery are 1 to 99, and the odds of not winning the lottery are 99 to 1.
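The lottery arithmetic above can be checked with a few lines of Python. This is a small sketch; the helper name `odds` is an illustrative choice.

```python
import math

def odds(p: float) -> float:
    """Odds of an event with probability p: p / (1 - p)."""
    return p / (1.0 - p)

p_win = 0.01
odds_win = odds(p_win)             # 0.01 / 0.99, i.e. 1 to 99
odds_lose = odds(1.0 - p_win)      # 0.99 / 0.01, i.e. 99 to 1
log_odds_win = math.log(odds_win)  # the logit that logistic regression models

print(odds_win, odds_lose, log_odds_win)
```

The log odds are negative here because the event is less likely to happen than not.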
6. What are the outputs of the logistic model and the logistic function?
The logistic model outputs the logits, i.e. log odds; and the logistic function outputs the probabilities.
Logistic model: $z = \alpha + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_k X_k$. Its output is the logit.
Logistic function: $f(z) = \frac{1}{1 + e^{-(\alpha + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_k X_k)}}$. Its output, in this case, is the probability.
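The difference between the two outputs can be sketched in Python. The coefficient and attribute values below are made up for illustration only.

```python
import math

def linear_predictor(alpha, betas, xs):
    """Logistic model: alpha + beta_1*x_1 + ... + beta_k*x_k (the logit)."""
    return alpha + sum(b * x for b, x in zip(betas, xs))

def logistic(z):
    """Logistic function: turns a logit into a probability."""
    return 1.0 / (1.0 + math.exp(-z))

alpha = -1.5          # hypothetical intercept
betas = [0.8, -0.4]   # hypothetical coefficients
xs = [2.0, 1.0]       # one instance with two attributes

logit = linear_predictor(alpha, betas, xs)
prob = logistic(logit)
# Taking the log odds of the probability recovers the logit.
recovered_logit = math.log(prob / (1.0 - prob))

print(logit, prob, recovered_logit)
```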
7. How to interpret the results of a logistic regression model? Or, what are the meanings of alpha and beta in a logistic regression model?
Alpha is the baseline in a logistic regression model. It is the log odds for an instance when all the attributes (X1, X2,………….Xk) are zero. In practical scenarios, the probability of all the attributes being zero is very low. In another interpretation, Alpha is the log odds for an instance when none of the attributes is taken into consideration.
Beta is the value by which the log odds change by a unit change in a particular attribute by keeping all other attributes fixed or unchanged (control variables).
Each beta in logistic regression is associated with a predictor X and represents the expected change in log odds for a one-unit change in that predictor, whereas alpha is a constant.
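The interpretation of alpha and beta can be verified numerically. The sketch below assumes a model with a single attribute and made-up values for alpha and beta.

```python
import math

alpha, beta = -2.0, 0.7   # hypothetical fitted values

def log_odds(x):
    return alpha + beta * x

def odds(x):
    return math.exp(log_odds(x))

x = 3.0
change_in_log_odds = log_odds(x + 1.0) - log_odds(x)  # equals beta
odds_multiplier = odds(x + 1.0) / odds(x)             # equals exp(beta)
baseline_log_odds = log_odds(0.0)                     # equals alpha

print(change_in_log_odds, odds_multiplier, baseline_log_odds)
```

A one-unit increase adds beta to the log odds, which is the same as multiplying the odds by exp(beta).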
8. What is odds ratio?
The odds ratio is the ratio of the odds of an event in one group to the odds of the same event in another group. For example, let’s assume that we are trying to ascertain the effectiveness of a medicine. We administered this medicine to the ‘intervention’ group and a placebo to the ‘control’ group.
Odds ratio (OR) = (odds of the intervention group)/(odds of the control group)
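If the odds ratio = 1, then the odds of the outcome are the same in the intervention group and the control group.
If the odds ratio is greater than 1, then the outcome is more likely in the intervention group. When the outcome is an adverse event (for example, the illness persisting), this means the control group fared better than the intervention group.
If the odds ratio is less than 1, then the outcome is less likely in the intervention group. For an adverse outcome, this means the intervention group fared better than the control group.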
9. What is the formula for calculating the odds ratio?
$OR_{X_1, X_0} = e^{\sum_{i=1}^{k} \beta_i (X_{1i} - X_{0i})}$
In the formula above, X1 and X0 stand for the two groups for which the odds ratio needs to be calculated. X1i stands for the value of attribute ‘i’ in group X1, and X0i stands for the value of attribute ‘i’ in group X0. βi stands for the corresponding coefficient of the logistic regression model. Note that the baseline (alpha) is not included in this formula.
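As a sketch, the odds ratio can be computed from model coefficients under the assumption that the formula takes the commonly used form OR = exp(sum of βi × (X1i − X0i)). The coefficients and group profiles below are invented for illustration.

```python
import math

def odds_ratio(betas, x1, x0):
    """exp(sum_i beta_i * (x1_i - x0_i)); the intercept cancels out."""
    return math.exp(sum(b * (a - c) for b, a, c in zip(betas, x1, x0)))

def odds(alpha, betas, x):
    """Odds for one profile, including the intercept."""
    return math.exp(alpha + sum(b * v for b, v in zip(betas, x)))

alpha = -1.0              # hypothetical baseline
betas = [0.5, 0.03]       # hypothetical coefficients: treatment flag, age
treated = [1.0, 40.0]
control = [0.0, 40.0]

or_formula = odds_ratio(betas, treated, control)
# Dividing the two odds directly gives the same value, so alpha is not needed.
or_direct = odds(alpha, betas, treated) / odds(alpha, betas, control)

print(or_formula, or_direct)
```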
10. Why can’t linear regression be used in place of logistic regression for binary classification?
The reasons why linear regressions cannot be used in the case of binary classification are as follows:
Distribution of error terms: The distribution of data in the case of linear and logistic regression is different. Linear regression assumes that error terms are normally distributed. In the case of binary classification, this assumption does not hold true.
Model output: In linear regression, the output is continuous. In the case of binary classification, an output of a continuous value does not make sense. For binary classification problems, linear regression may predict values that can go beyond 0 and 1. If we want the output in the form of probabilities, which can be mapped to two different classes, then its range should be restricted to 0 and 1. As the logistic regression model can output probabilities with logistic/sigmoid function, it is preferred over linear regression.
Variance of Residual errors: Linear regression assumes that the variance of random errors is constant. This assumption is also violated in the case of logistic regression.
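The model-output problem is easy to demonstrate. The sketch below fits a simple least-squares line to a tiny made-up binary dataset and shows that some fitted values fall outside the 0 to 1 range.

```python
x = [1, 2, 3, 4, 5, 6]
y = [0, 0, 0, 1, 1, 1]   # binary target

mean_x = sum(x) / len(x)
mean_y = sum(y) / len(y)

# Closed-form ordinary least squares for one predictor.
slope = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sum(
    (xi - mean_x) ** 2 for xi in x
)
intercept = mean_y - slope * mean_x

def linear_predict(value):
    return intercept + slope * value

fitted = [linear_predict(v) for v in x]
print(fitted)  # the first value is below 0 and the last is above 1
```

Such values cannot be read as probabilities, which is one reason the logistic function is applied to the linear predictor instead.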
This can also be asked in alternate ways, such as “Logistic regression error values are normally distributed. State whether this is true or false.” or “Select the wrong statement about logistic regression.”
Simple linear regression is a model that estimates the relationship between two variables, one dependent and one independent, using a straight line. It is useful for predicting the value of one variable from the value of the other.
11. Is the decision boundary linear or nonlinear in the case of a logistic regression model?
The decision boundary is a line that separates the target variables into different classes. The decision boundary can either be linear or nonlinear. In the case of a logistic regression model, the decision boundary is a straight line.
The logistic regression model is $\alpha + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_k X_k$, which is linear in the attributes. Setting it to zero therefore gives a straight line (a hyperplane when there are more than two attributes). Logistic regression is only suitable in cases where a straight line is able to separate the different classes. If a straight line is not able to do it, then nonlinear algorithms should be used to achieve better results.
The importance of decision boundaries is high. It is a known fact that the decision boundary is the surface that separates the data points belonging to different class labels. These are not limited to the data points that are already provided. The model has the feature of making predictions for any new possible combinations as well.
12. What is the likelihood function?
The likelihood function is the joint probability of observing the data. For example, let’s assume that a coin is tossed 100 times and we want to know the probability of getting 60 heads from the tosses. This example follows the binomial distribution formula.
p = Probability of heads from a single coin toss
n = 100 (the number of coin tosses)
x = 60 (the number of heads – success)
n-x = 40 (the number of tails)
Pr(X=60 |n = 100, p)
The likelihood function is the probability that the number of heads received is 60 in a trial of 100 coin tosses, where the probability of heads in each coin toss is p. Here the coin toss result follows a binomial distribution.
This can be reframed as follows:
$\Pr(X = 60 \mid n = 100, p) = c \times p^{60} \times (1 - p)^{100 - 60}$
c = constant
p = unknown parameter
The likelihood function gives the probability of observing the results using unknown parameters.
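To see how the likelihood depends on the unknown parameter, the coin-toss example can be evaluated for a few candidate values of p. This is a minimal sketch; the function name and the candidate values are illustrative.

```python
import math

def binomial_likelihood(p, n=100, x=60):
    """Probability of x heads in n tosses when each toss lands heads with probability p."""
    c = math.comb(n, x)   # the constant in the formula
    return c * p ** x * (1.0 - p) ** (n - x)

for p in (0.4, 0.5, 0.6, 0.7):
    print(p, binomial_likelihood(p))
```

Among these candidates, the likelihood is largest at p = 0.6, the observed proportion of heads.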
13. What is the Maximum Likelihood Estimator (MLE)?
The MLE chooses the set of unknown parameter values (the estimator) that maximises the likelihood function. The usual method is calculus: set the derivative of the likelihood function (in practice, the log-likelihood) with respect to each unknown parameter to zero and solve. For a binomial model this is easy, but for a logistic model there is no closed-form solution, so computer programs are used to derive the MLE numerically.
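For the binomial case, the calculus gives the closed-form estimate x/n. The sketch below checks this against a brute-force search over a grid of candidate values, using the 60-heads-in-100-tosses numbers as an assumed example.

```python
import math

n, x = 100, 60   # tosses and observed heads

def log_likelihood(p):
    """Binomial log-likelihood up to an additive constant."""
    return x * math.log(p) + (n - x) * math.log(1.0 - p)

# Closed form: setting the derivative x/p - (n - x)/(1 - p) to zero gives p = x/n.
p_hat_closed_form = x / n

# Numerical check: pick the best candidate from a fine grid.
grid = [i / 1000 for i in range(1, 1000)]
p_hat_grid = max(grid, key=log_likelihood)

print(p_hat_closed_form, p_hat_grid)
```

For logistic regression no such closed form exists, which is why iterative optimisers are used in practice.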
(Here’s another approach to answering the question.)
MLE is a statistical approach to estimating the parameters of a mathematical model. MLE and ordinary least squares estimation give the same results for linear regression if the errors are assumed to be normally distributed. MLE does not assume anything about the distribution of the independent variables.
The point in the parameter space that maximises the likelihood function is known as the maximum likelihood estimate. The method is popular for statistical inference because it is intuitive and flexible.
Maximum likelihood estimators have useful properties such as consistency, functional equivariance, efficiency, and second-order efficiency. These properties make their outputs more reliable.
The maximum likelihood estimator also gives approximately unbiased estimates on large data sets. Its consistency and flexibility make it suitable for a broad range of applications.
14. What are the different methods of MLE and when is each method preferred?
In the case of logistic regression, there are two approaches to MLE: the conditional and the unconditional method. They are algorithms that use different likelihood functions. The unconditional formula employs a joint probability of positives (for example, churn) and negatives (for example, non-churn). The conditional formula is the ratio of the probability of observed data to the probability of all possible configurations.
The unconditional method is preferred if the number of parameters is lower compared to the number of instances. If the number of parameters is high compared to the number of instances, then conditional MLE is to be preferred. Statisticians suggest that conditional MLE is to be used when in doubt. Conditional MLE will always provide unbiased results.
15. What are the advantages and disadvantages of conditional and unconditional methods of MLE?
Conditional methods do not estimate unwanted parameters. Unconditional methods estimate the values of unwanted parameters also. Unconditional formulas can directly be developed with joint probabilities. This cannot be done with conditional probability. If the number of parameters is high relative to the number of instances, then the unconditional method will give biased results. Conditional results will be unbiased in such cases.
16. What is the output of a standard MLE program?
The output of a standard MLE program is as follows:
Maximised likelihood value: This is the numerical value obtained by replacing the unknown parameter values in the likelihood function with the MLE parameter estimator.
Estimated variance-covariance matrix: The diagonal of this matrix consists of estimated variances of the ML estimates. The off-diagonal consists of the covariances of the pairs of the ML estimates.
17. Why can’t we use Mean Square Error (MSE) as a cost function for logistic regression?
In logistic regression, we use the sigmoid function and perform a non-linear transformation to obtain the probabilities. Squaring the error of this non-linear transformation leads to a non-convex cost function with local minima, so gradient descent is not guaranteed to find the global minimum. For this reason, MSE is not suitable for logistic regression. Cross-entropy, or log loss, is used as the cost function instead. It is convex in the model parameters, it penalises confident wrong predictions heavily, and it rewards confident right predictions only slightly. Optimising this cost function leads to convergence.
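The way log loss treats confident predictions can be illustrated for a single instance. The sketch below uses made-up predicted probabilities for an instance whose true label is 1 and compares log loss with squared error.

```python
import math

def log_loss_single(y, p):
    """Cross-entropy for one instance with true label y (0 or 1) and predicted probability p."""
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))

def squared_error(y, p):
    return (y - p) ** 2

confident_wrong = log_loss_single(1, 0.01)
unsure = log_loss_single(1, 0.5)
confident_right = log_loss_single(1, 0.99)

print(confident_wrong, unsure, confident_right)
# Squared error is capped at 1, so it cannot punish the confident mistake as strongly.
print(squared_error(1, 0.01), squared_error(1, 0.5), squared_error(1, 0.99))
```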
18. Why is accuracy not a good measure for classification problems?
Accuracy is not a good measure for classification problems because it gives equal importance to both false positives and false negatives. However, this may not be the case in most business problems. For example, in the case of cancer prediction, declaring cancer as benign is more serious than wrongly informing the patient that he is suffering from cancer. Accuracy gives equal importance to both cases and cannot differentiate between them.
It helps to explain what accuracy is before answering this question. Accuracy is the proportion of predictions that are correct. It is a particularly poor measure for classification problems with imbalanced data.
19. What is the importance of a baseline in a classification problem?
Most classification problems deal with imbalanced datasets. Examples include telecom churn, employee attrition, cancer prediction, fraud detection, online advertisement targeting, and so on. In all these problems, the number of positive instances is very low compared to the number of negative instances. In some cases, positives make up less than 1% of the total sample. In such cases, an accuracy of 99% may sound very good but, in reality, it may not be.
Here, the negatives make up 99% of the data, so the baseline accuracy is 99%: an algorithm that predicts every instance as negative also achieves 99% accuracy. In that case, every positive is predicted wrongly, and the positives are what matter most to the business. So, the baseline is very important, and the algorithm needs to be evaluated relative to the baseline.
A baseline is the simplest possible prediction, such as always predicting the majority class. It is useful for judging how much a trained model actually adds.
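The 99% baseline can be reproduced with a toy dataset. The sketch below assumes 1,000 instances with 1% positives and a "model" that always predicts the negative class.

```python
labels = [1] * 10 + [0] * 990      # 1% positives
predictions = [0] * len(labels)    # majority-class baseline: always predict negative

correct = sum(1 for p, y in zip(predictions, labels) if p == y)
accuracy = correct / len(labels)

true_positives = sum(1 for p, y in zip(predictions, labels) if p == 1 and y == 1)
recall = true_positives / sum(labels)

print(accuracy, recall)
```

The accuracy looks excellent even though not a single positive is found, which is why a trained model should be compared against this baseline rather than against zero.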
20. What are false positives and false negatives?
False positives are those cases in which the negatives are wrongly predicted as positives. For example, predicting that a customer will churn when, in fact, he is not churning.
False negatives are those cases in which the positives are wrongly predicted as negatives. For example, predicting that a customer will not churn when, in fact, he churns.
The tests have a chance of having either false positives or false negatives. The professionals need to be extra cautious while working with the data to avoid any such scenarios of false positives and false negatives occurring.
21. What are the true positive rate (TPR), true negative rate (TNR), false-positive rate (FPR), and false-negative rate (FNR)?
TPR is the proportion of actual positives that are correctly predicted as positive. In simple words, it is the frequency of correctly predicted true labels.
True positives (TP) are the values that are actually positive and predicted positive.
TPR = TP/(TP+FN)
TNR is the proportion of actual negatives that are correctly predicted as negative. It is the frequency of correctly predicted false labels.
True negatives (TN) are the values that are actually negative and predicted negative.
TNR = TN/(TN+FP)
FPR is the proportion of actual negatives that are incorrectly predicted as positive. It is the frequency of incorrectly predicted false labels.
False positives (FP) are the values that are actually negative and predicted positive.
FPR = FP/(TN+FP)
FNR is the proportion of actual positives that are incorrectly predicted as negative. It is the frequency of incorrectly predicted true labels.
False negatives (FN) are the values that are actually positive and predicted negative.
FNR = FN/(TP+FN)
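The four rates can be computed from a pair of label lists. The data and function names below are illustrative only.

```python
def confusion_counts(y_true, y_pred):
    """Return (TP, TN, FP, FN) for binary labels coded as 0 and 1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn

def rates(tp, tn, fp, fn):
    return {
        "TPR": tp / (tp + fn),
        "TNR": tn / (tn + fp),
        "FPR": fp / (tn + fp),
        "FNR": fn / (tp + fn),
    }

y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

counts = confusion_counts(y_true, y_pred)
result = rates(*counts)
print(counts, result)
```

Note that TPR + FNR = 1 and TNR + FPR = 1, since each pair shares a denominator.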
22. What are precision and recall?
Precision is the proportion of true positives out of predicted positives. To put it in another way, it is the accuracy of the prediction. It is also known as the ‘positive predictive value’.
Precision = TP/(TP+FP)
Recall is the same as the true positive rate (TPR).
It is important to examine both precision and recall while evaluating a model’s effectiveness. In information retrieval terms, precision is the fraction of retrieved instances that are relevant, and recall is the fraction of relevant instances that were retrieved.
23. What is the F-measure?
It is the harmonic mean of precision and recall. In some cases, there will be a trade-off between precision and recall. In such cases, the F-measure will drop. It will be high when both the precision and the recall are high. Depending on the business case at hand and the goal of data analytics, an appropriate metric should be selected.
F-measure = 2 X (Precision X Recall) / (Precision+Recall)
The F-score, or F-measure, is commonly used to evaluate information retrieval systems such as search engines. It combines precision and recall into a single number that summarises how well a model performs on the positive class.
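A short sketch shows how the harmonic mean behaves when precision and recall differ. The confusion counts are made up for illustration.

```python
tp, fp, fn = 30, 10, 30   # hypothetical confusion counts

precision = tp / (tp + fp)   # 30 / 40
recall = tp / (tp + fn)      # 30 / 60

f_measure = 2 * (precision * recall) / (precision + recall)
arithmetic_mean = (precision + recall) / 2

print(precision, recall, f_measure, arithmetic_mean)
```

The F-measure sits below the arithmetic mean and is pulled toward the smaller of the two values, so a model cannot score well by excelling at only one of them.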
24. What is accuracy?
It is the number of correct predictions out of all predictions made.
Accuracy = (TP+TN)/(The total number of Predictions)
25. What are sensitivity and specificity?
Specificity is the same as true negative rate, or it is equal to 1 – false-positive rate.
Specificity = TN/(TN + FP)
Sensitivity is the true positive rate.
Sensitivity = TP/(TP + FN)
26. How to choose a cutoff point in the case of a logistic regression model?
The cutoff point depends on the business objective. For example, let’s consider loan defaults. If the business objective is to reduce the loss, then the specificity needs to be high. If the aim is to increase profits, then it is an entirely different matter. Profits will not necessarily increase by refusing loans to all predicted default cases; the business may have to disburse loans to predicted default cases that are slightly less risky in order to increase profits. In such a case, a different cutoff point, one that maximises profit, will be required. In most instances, businesses operate under many constraints, and the cutoff point that satisfies the business objective will differ depending on whether those constraints are taken into account. The cutoff point needs to be selected considering all these points. As a rule of thumb, choose a cutoff value that is equal to the proportion of positives in the dataset.
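The trade-off behind the cutoff choice can be sketched by sweeping a few cutoffs over predicted probabilities. The scores and labels below are invented, and the middle cutoff applies the rule of thumb of using the proportion of positives.

```python
scores = [0.05, 0.10, 0.20, 0.30, 0.35, 0.60, 0.70, 0.90]  # predicted probabilities
labels = [0, 0, 0, 1, 0, 0, 1, 1]                          # actual outcomes

def sensitivity_specificity(cutoff):
    preds = [1 if s >= cutoff else 0 for s in scores]
    tp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 1)
    tn = sum(1 for p, y in zip(preds, labels) if p == 0 and y == 0)
    fp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 0)
    fn = sum(1 for p, y in zip(preds, labels) if p == 0 and y == 1)
    return tp / (tp + fn), tn / (tn + fp)

prevalence = sum(labels) / len(labels)   # rule-of-thumb cutoff

for cutoff in (0.25, prevalence, 0.65):
    print(cutoff, sensitivity_specificity(cutoff))
```

Lowering the cutoff raises sensitivity at the cost of specificity, and raising it does the opposite; the business objective decides which point on that trade-off is acceptable.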
27. How does logistic regression handle categorical variables?
The inputs to a logistic regression model need to be numeric, so the algorithm cannot handle categorical variables directly. They need to be converted into a suitable format first. Each level of a categorical variable is represented by a dummy variable, a 0/1 indicator of whether an instance belongs to that level. The model then treats these dummy variables like any other numeric input.
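Dummy coding can be sketched without any libraries. The helper `one_hot` and its `drop_first` parameter are hypothetical names chosen for this example; dropping one level is a common choice because a full set of indicators is perfectly collinear with the intercept.

```python
def one_hot(values, drop_first=True):
    """Turn a list of category labels into 0/1 dummy columns."""
    levels = sorted(set(values))
    kept = levels[1:] if drop_first else levels   # the dropped level becomes the reference
    rows = [[1 if v == level else 0 for level in kept] for v in values]
    return kept, rows

cities = ["Delhi", "Mumbai", "Pune", "Delhi"]
columns, rows = one_hot(cities)

print(columns)
for city, row in zip(cities, rows):
    print(city, row)
```

Here "Delhi" is the reference level, so it is encoded as all zeros.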
28. What is a cumulative response curve (CRV)?
In order to convey the results of an analysis to the management, a ‘cumulative response curve’ is used, which is more intuitive than the ROC curve. A ROC curve is difficult to understand for someone outside the field of data science. A CRV plots the true positive rate, or the percentage of positives correctly classified, on the Y-axis against the percentage of the population targeted on the X-axis. It is important to note that the population is ranked by the model in descending order (of either the probabilities or the expected values). If the model is good, then targeting the top portion of the ranked list will capture a high percentage of the positives. As with the ROC curve, there is a diagonal line that represents random performance. For example, if a random 50% of the list is targeted, it is expected to capture 50% of the positives. This expectation is what the diagonal line represents.
29. What are the lift curves?
The lift is the improvement in model performance (increase in true positive rate) when compared to random performance. Random performance means if 50% of the instances are targeted, then it is expected that it will detect 50% of the positives. Lift is in comparison to the random performance of a model. If a model’s performance is better than its random performance, then its lift will be greater than 1.
In a lift curve, the lift is plotted on the Y-axis and the percentage of the population (sorted in descending order) on the X-axis. At a given percentage of the target population, a model with a high lift is preferred.
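One point on the cumulative response curve and the matching point on the lift curve can be computed as follows. This is a sketch with invented scores and labels; the function name is illustrative.

```python
def gain_and_lift(scores, labels, fraction):
    """Target the top `fraction` of instances ranked by score.

    Returns the share of all positives captured (cumulative response)
    and the lift relative to random targeting.
    """
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    k = max(1, round(fraction * len(ranked)))
    captured = sum(y for _, y in ranked[:k])
    gain = captured / sum(labels)
    lift = gain / (k / len(ranked))   # random targeting would capture k/n of the positives
    return gain, lift

scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
labels = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]

top_20 = gain_and_lift(scores, labels, 0.2)
everyone = gain_and_lift(scores, labels, 1.0)
print(top_20, everyone)
```

Targeting the top 20% captures two of the three positives, which is more than three times what random targeting would achieve; targeting everyone always gives a lift of exactly 1.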
30. Which algorithm is better at handling outliers: logistic regression or SVM?
Logistic regression fits its linear boundary using every sample, so it will shift the boundary in order to accommodate outliers. SVM is less sensitive to individual samples, because its boundary depends mainly on the points nearest the margin, so there will not be a major shift in the linear boundary to accommodate an outlier. SVM also comes with inbuilt complexity controls, which take care of overfitting. This is not true of plain logistic regression.
31. How will you deal with the multiclass classification problem using logistic regression?
The most famous method of dealing with multiclass classification using logistic regression is using the one-vs-all approach. Under this approach, a number of models are trained, which is equal to the number of classes. The models work in a specific way. For example, the first model classifies the datapoint depending on whether it belongs to class 1 or some other class; the second model classifies the datapoint into class 2 or some other class. This way, each data point can be checked over all the classes.
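The one-vs-all idea can be sketched end to end in plain Python. Everything here is an illustrative assumption: the tiny dataset, the learning rate, the number of epochs, and the choice of batch gradient descent as the optimiser.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def train_binary(X, y, lr=0.1, epochs=5000):
    """Fit one binary logistic model with batch gradient descent on log loss."""
    w, b = [0.0] * len(X[0]), 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_b += err
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
        b -= lr * grad_b / n
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
    return w, b

def train_one_vs_rest(X, y):
    # One model per class: that class is labelled 1, every other class 0.
    return {c: train_binary(X, [1 if label == c else 0 for label in y])
            for c in sorted(set(y))}

def predict(models, x):
    # Pick the class whose model assigns the highest probability.
    scores = {c: sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))
              for c, (w, b) in models.items()}
    return max(scores, key=scores.get)

X = [(0, 0), (1, 0), (0, 1), (5, 0), (6, 0), (5, 1), (0, 5), (0, 6), (1, 5)]
y = [0, 0, 0, 1, 1, 1, 2, 2, 2]

models = train_one_vs_rest(X, y)
print([predict(models, x) for x in X])
```

Three classes lead to three binary models, and a new data point is assigned to the class whose model is most confident.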
32. Explain the use of ROC curves and the AUC of a ROC Curve.
A ROC (Receiver Operating Characteristic) curve illustrates the performance of a binary classification model. It is basically a TPR versus FPR (true positive rate versus false-positive rate) curve for all the threshold values ranging from 0 to 1. In a ROC curve, each point in the ROC space will be associated with a different confusion matrix. A diagonal line from the bottom-left to the top-right on the ROC graph represents random guessing. The Area Under the Curve (AUC) signifies how good the classifier model is. If the value for AUC is high (near 1), then the model is working satisfactorily, whereas if the value is low (around 0.5), then the model is not working properly and just guessing randomly.
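A ROC curve and its AUC can be computed by sweeping the threshold over the predicted scores. The sketch below uses invented scores and labels and the trapezoidal rule for the area.

```python
def roc_points(scores, labels):
    """Return (FPR, TPR) points, one per distinct threshold, starting at (0, 0)."""
    pos = sum(labels)
    neg = len(labels) - pos
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
        points.append((fp / neg, tp / pos))
    return points

def auc(points):
    """Area under the curve using the trapezoidal rule."""
    return sum((x2 - x1) * (y1 + y2) / 2.0
               for (x1, y1), (x2, y2) in zip(points, points[1:]))

scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3]
labels = [1, 1, 0, 1, 0, 0]

points = roc_points(scores, labels)
auc_value = auc(points)
print(points)
print(auc_value)
```

Each threshold corresponds to one confusion matrix and therefore one point on the curve. The resulting AUC equals the fraction of positive-negative pairs in which the positive instance receives the higher score.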
33. How can you use the concept of ROC in a multiclass classification?
The concept of ROC curves can easily be used for multiclass classification by using the one-vs-all approach. For example, let’s say that we have three classes ‘a’, ’b’, and ‘c’. Then, the first class comprises class ‘a’ (true class) and the second class comprises both class ‘b’ and class ‘c’ together (false class). Thus, the ROC curve is plotted. Similarly, for all three classes, we will plot three ROC curves and perform our analysis of AUC.
We have so far covered the two most basic ML algorithms, Linear and Logistic Regression, and we hope that you have found these resources helpful.
The next part of this series is based on another very important ML Algorithm, Clustering. Feel free to post your doubts and questions in the comment section below.
Co-authored by – Ojas Agarwal
What are the cumulative Gain and Lift charts?
A Gain and Lift chart is a visual approach to assess the efficiency of several machine learning models in various ways. In addition to assisting you in evaluating how successful your prediction model is, they visually display how the response rate of a targeted group differs from that of a randomly picked group. These diagrams are valuable in corporate settings, such as target marketing. They may also be applied in other fields, such as risk modeling, supply chain analytics, and so on. In other words, Gain and Lift charts are two ways of dealing with classification difficulties involving unbalanced data sets.
What are some of the assumptions made while using logistic regression?
Some assumptions are made while using logistic regression. One of them is that the continuous predictors have no influential values (extreme values or outliers). Binary logistic regression requires the dependent variable to be binary, whereas ordinal logistic regression requires the dependent variable to be ordered. It is also assumed that there are no substantial intercorrelations (i.e. multicollinearity) among the predictors and that the observations are independent of one another.
Can I get a data scientist job if I have a fair knowledge of Machine Learning?
A Data Scientist collects, analyses, and interprets enormous volumes of data using sophisticated analytics technologies such as Machine Learning and Predictive Modeling. These are then utilized by company leaders to make the best business choices. Thus, in addition to other skills such as data mining and understanding of statistical research methodologies, Machine Learning is a critical competence for a Data Scientist. But if you want to work as a Data Scientist, you must also be familiar with big data platforms and technologies such as Hadoop, Pig, Hive, Spark, and others, as well as programming languages such as SQL, Python, and others.
How do I prepare for a machine learning interview?
Most of the machine learning interviews are conducted over a whiteboard. It is highly unlikely to be done via coding. So, it is a good idea to be prepared for some formulation and classifications. You might get some questions about a classification problem. But the most likely questions are formulation based. A typical machine learning interview consists of two parts. First is explaining your problem-solving approach, second is your coding skills. So, machine learning interviews are 80% about problem-solving and 20% about coding. While preparing for the interview, keep that in mind and practice accordingly.
What should I consider before applying for a machine learning job?
Machine learning is a field of computer science where we build algorithms which allow computers to learn things on their own. Generally, there are two kinds of machine learning jobs. One job is called data scientist. In this job, you will build the algorithms. Another job is called business analyst. In this job, you will use the algorithms built by data scientists. A machine learning job is all about doing some mathematical modelling/ programming/ research to solve huge data problems. You will have to have very strong knowledge of data structures in order to carry out data processing for machine learning tasks. Also, there will be a lot of mathematical modelling involved in this, so you should be good at the subjects you studied at the college level.
Is machine learning a good career option?
Absolutely. The reason why we say this is because machine learning is the future. Self-driving cars, self-flying drones, automated trading and many others are often powered by machine learning algorithms. And the best thing is, the demand for top notch machine learning engineers is incredibly high. So, by choosing this field, you will never have to worry about a lack of job opportunities!
Is logistic regression mainly used for regression?
Logistic regression is a method for finding the relationship between a dependent variable and one or more independent variables. Although it estimates a continuous quantity (a probability), it is mainly used for classification problems.
What is the impact of outliers on logistic regression?
Outliers are values that deviate from the expected range of values. They can pull the fitted coefficients and the decision boundary toward themselves, which can lead to incorrect results or analysis.
Is logistic regression sensitive to outliers?
Yes, logistic regression is sensitive to outliers, because they can affect the fitted model and distort the analysis.
What is the main purpose of logistic regression?
The main purpose of logistic regression is to estimate the relationship between a dependent variable and one or more independent variables. It is used to make predictions about categorical variables. For example, it can be used to predict whether a particular patient is prone to a certain disease or whether a particular political candidate will win. Logistic regression does this by measuring the impact of multiple variables such as age, medical history, and gender. It is a commonly used method for binary classification in machine learning.
What is the difference between linear regression and logistic regression?
A linear regression model is used to predict a continuous dependent variable from a given set of independent variables, whereas logistic regression is used to predict a categorical dependent variable from a given set of independent variables.
What are the applications of logistic regression?
Logistic regression has a wide range of applications, such as predicting the probability of a candidate winning an election, the probability of a student getting admitted into a college, the probability of a person having a heart attack, mortality in patients, and whether a transaction will succeed.
What type of dataset is used for logistic regression?
Logistic regression works with both continuous and discrete input data and uses it to classify new data.