What is a Guesstimate?
A guesstimate is a structured estimate built from logical reasoning and evaluation rather than exact data; it helps you reach a reasonably accurate answer quickly. It involves breaking a problem down, estimating each piece from the data and assumptions available, and consolidating the pieces into a result. It is also an essential skill for business analysts, data scientists, data architects, and other data professionals.
- Meaning: Understand the problem you want to solve, what its purpose is, and why you want to solve it.
- Definition: Define the particular object, along with the inputs and outputs of the process flow. In short, explain exactly what you are estimating.
- Guessing: Form assumptions and draw conclusions about the particular object in your problem.
- Estimate: Put numbers on each part of the given problem.
- Come up with an idea: Implement the idea, backed by research and development.
When a guesstimate question asks for the size of a market, it is called a “market-sizing” question.
Here are some basic examples of guesstimate questions:
- How many people wear blue in New York on a typical Monday?
- How many tennis balls can you fit into an aeroplane?
How to Approach a Guesstimate?
The process of solving a guesstimate problem is pretty manageable:
- Identify the parameters that may affect the final quantity and estimate their values.
- Take a step back and think.
- Clarify your thoughts.
- Voice your thoughts.
There are three common approaches:
1. Simple math approach:
This approach is typically used when the number to guesstimate is a ratio of some sort. The task is to estimate the numerator and the denominator, and then you are done!
2. Per capita approach:
This approach is used when the number to guess can be thought of as an item consumed at the individual, household, or population level within a geography.
3. Supply & demand approach:
This approach means estimating the number from the supply side, the demand side, or both.
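As an illustration of the simple math approach, the tennis-ball question from earlier reduces to a ratio. The numerator is the usable cabin volume, and the denominator is the space one ball effectively occupies. The Python sketch below has a few assumptions. The cabin dimensions are hypothetical placeholders for a single-aisle jet. The cabin is treated as a simple box. The ball diameter is about 6.7 cm, and the packing fraction of about 0.64 is the figure commonly quoted for randomly packed equal spheres.

```python
import math

def tennis_balls_in_cabin(length_m: float, width_m: float, height_m: float,
                          ball_diameter_m: float = 0.067,
                          packing_fraction: float = 0.64) -> int:
    """Simple-math guesstimate: numerator / denominator.

    Numerator: cabin volume, approximated as a box (an assumption).
    Denominator: volume of one ball divided by the packing fraction,
    i.e. the space each ball effectively takes up including the gaps.
    """
    cabin_volume = length_m * width_m * height_m
    ball_volume = (4 / 3) * math.pi * (ball_diameter_m / 2) ** 3
    return int(cabin_volume * packing_fraction / ball_volume)

# Hypothetical cabin: 30 m long, 3.5 m wide, 2.2 m high (assumed, not measured).
estimate = tennis_balls_in_cabin(30, 3.5, 2.2)
print(f"Roughly {estimate:,} tennis balls")
```

Changing one assumption at a time (cabin size, packing fraction) is a quick way to see which input the answer is most sensitive to.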
Generally speaking, you can propose guesstimates in one of these two ways:
- Top-down method
- Bottom-up method
In the top-down method, you start with the largest possible universe of which your guesstimate is a portion, with the broadest base at the top. To this universe, you then keep applying a set of conditions or filters (however you want to put it) that reduce the number from the universe to one that fits your guesstimate.
The key to the top-down estimation process lies in:
- Accurately identifying the starting universe.
- Identifying as many as possible of the relevant conditions/filters and segments that apply to your guesstimate problem.
- Segments: Frequently, you will first have to segment the universe into buckets and apply different filters to each segment.
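The top-down process with segments can be sketched in a few lines of Python, using the "people wearing blue in New York on a Monday" question. This is a minimal sketch, and every number in it is an illustrative assumption. The starting universe of roughly 8.5 million residents is assumed, and so are the segment shares and the filter fractions. None of these are data.

```python
from dataclasses import dataclass

@dataclass
class Segment:
    name: str
    share_of_universe: float  # fraction of the starting universe in this segment
    filters: list[float]      # each filter keeps this fraction of the segment

def top_down_estimate(universe: int, segments: list[Segment]) -> float:
    """Split the universe into segments, apply each segment's own filters
    multiplicatively, then add the segment results back together."""
    total_share = sum(s.share_of_universe for s in segments)
    if abs(total_share - 1.0) > 1e-9:
        raise ValueError("segment shares must add up to 1")
    total = 0.0
    for seg in segments:
        count = universe * seg.share_of_universe
        for keep_fraction in seg.filters:
            count *= keep_fraction
        total += count
    return total

# All numbers below are assumptions for illustration, not measurements.
segments = [
    # working adults: 90% out on a Monday, 40% of those wear something blue
    Segment("working adults", 0.55, [0.9, 0.4]),
    # students: 95% out on a school day, 35% of those wear blue (e.g. jeans)
    Segment("students", 0.20, [0.95, 0.35]),
    # everyone else: 60% go out, 30% of those wear blue
    Segment("others", 0.25, [0.6, 0.3]),
]
people_in_blue = top_down_estimate(8_500_000, segments)
print(f"About {people_in_blue:,.0f} people wear blue")
```

Keeping each segment's filters separate makes it easy to challenge one assumption (say, the share of students wearing blue) without redoing the whole estimate.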
Tips for guesstimate questions for Data Science:
- Practice Presenting: Practice walking an audience through the solution you have worked out.
- Practice Analyzing: Analysis plays a vital role in structuring your thought process on the given problem.
- Practice with Numbers: Working comfortably with numbers and building your own logic around them is always important.
While solving guesstimate questions for Data Science, keep these points in mind:
- You’re describing this to someone who’s not in your head. The solution isn’t for you.
- At the same time, remember not to turn each aspect into an entirely new guesstimate! It’s easy to get carried away by your own intelligence and analytical abilities.
- Focus on the question. Have you heard of analysis-paralysis?
What are the purposes of guesstimate questions for Data Science?
- To assess your ability to understand a situation.
- To gauge how well you connect different pieces of information to reach an answer.
- To test your ability to prioritize some parameters and dismiss others.
- To see how well you work with incomplete information.
Here are some guesstimate questions for Data Science:
Question:1 Create an experiment with the k-means algorithm on the UCI Iris data set:
In this experiment, perform k-means clustering using all the features in the dataset, and then compare the clustering results with the true class labels for all samples.
Use the Multiclass Logistic Regression module to perform multiclass classification and compare its performance with that of k-means clustering.
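One possible way to run this experiment in Python is shown below. This is a sketch under some assumptions. The question's "Multiclass Logistic Regression module" sounds like a drag-and-drop ML tool, so scikit-learn's `LogisticRegression` is swapped in here. Iris is loaded with scikit-learn's bundled copy of the UCI data via `load_iris`. The split size and random seeds are arbitrary choices of ours.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, adjusted_rand_score
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)  # 150 samples, 4 features, 3 classes

# k-means uses all four features and never sees the labels.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
cluster_ids = kmeans.fit_predict(X)
# Cluster ids are arbitrary (cluster 0 need not be class 0), so compare
# against the true labels with a permutation-invariant score.
ari = adjusted_rand_score(y, cluster_ids)

# Multiclass logistic regression is supervised: fit on labeled training
# rows, then measure accuracy on held-out rows.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)
clf = LogisticRegression(max_iter=1000)
clf.fit(X_train, y_train)
acc = accuracy_score(y_test, clf.predict(X_test))

print(f"k-means adjusted Rand index vs. true labels: {ari:.2f}")
print(f"Logistic regression test accuracy: {acc:.2f}")
```

The two numbers are not directly comparable, because one model is unsupervised and the other is supervised. Still, they show how much of the class structure k-means recovers without any labels.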
Question:2 In a very simple format, explain Precision & Recall?
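In a very simple format, precision asks: "of everything the model flagged as positive, how much was actually positive?" Recall asks: "of all the actual positives, how many did the model find?" Here is a plain-Python sketch with a small hand-made example. Returning 0.0 when a denominator is zero is our own convention.

```python
def precision_recall(y_true: list[int], y_pred: list[int]) -> tuple[float, float]:
    """Precision = TP / (TP + FP); Recall = TP / (TP + FN). Labels are 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# 4 real positives; the model flags 5 items, 3 of them correctly.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 1, 0, 0, 0, 0]
p, r = precision_recall(y_true, y_pred)
print(f"precision={p:.2f}, recall={r:.2f}")  # precision=0.60, recall=0.75
```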
Question:3 If you have been given a data set, how do you decide which ML algorithm to use?
Question:4 Is it better to have too many false positives? Or too many false negatives?
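A common way to reason about this question is to put a price on each kind of mistake. The sketch below compares two models on the same cases. The error counts and cost ratios are hypothetical, chosen only to show how the preferred model flips between scenarios.

```python
def total_error_cost(false_positives: int, false_negatives: int,
                     cost_fp: float, cost_fn: float) -> float:
    """Total cost of a model's mistakes when the two error types cost differently."""
    return false_positives * cost_fp + false_negatives * cost_fn

# Two hypothetical models evaluated on the same cases.
model_a = {"fp": 50, "fn": 5}   # flags aggressively: many FPs, few FNs
model_b = {"fp": 5, "fn": 30}   # flags cautiously: few FPs, more FNs

# Assumed costs: in screening a missed case (FN) is far worse;
# in a spam filter, losing a real email (FP) is worse.
for name, c_fp, c_fn in [("screening", 1, 100), ("spam filter", 20, 1)]:
    cost_a = total_error_cost(model_a["fp"], model_a["fn"], c_fp, c_fn)
    cost_b = total_error_cost(model_b["fp"], model_b["fn"], c_fp, c_fn)
    better = "A" if cost_a < cost_b else "B"
    print(f"{name}: cost A={cost_a}, cost B={cost_b} -> prefer model {better}")
```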
Question:5 What are model accuracy and model performance? In what scenarios would you apply each?
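One scenario that separates accuracy from overall performance is imbalanced data. There, a model can score high accuracy and still be useless. A minimal sketch with a hypothetical 95/5 split, such as fraud versus non-fraud:

```python
def accuracy(y_true: list[int], y_pred: list[int]) -> float:
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def recall(y_true: list[int], y_pred: list[int]) -> float:
    """Fraction of real positives (label 1) that the model predicts as 1."""
    preds_for_positives = [p for t, p in zip(y_true, y_pred) if t == 1]
    if not preds_for_positives:
        return 0.0
    return sum(preds_for_positives) / len(preds_for_positives)

y_true = [0] * 95 + [1] * 5
always_negative = [0] * 100  # a "model" that never predicts the positive class

print(accuracy(y_true, always_negative))  # 0.95
print(recall(y_true, always_negative))    # 0.0
```

Here 95% accuracy hides the fact that not a single positive case is caught. That is why performance is usually judged with several metrics (precision, recall, F1, AUC) chosen to fit the problem.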
Question:6 How do you ensure you are not over-fitting with a model? Explain with an example.
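A common check for overfitting is to compare training accuracy with cross-validated accuracy. A large gap suggests the model has memorized the training data rather than learned patterns that generalize. The scikit-learn sketch below makes a few choices of its own. It uses the bundled `load_breast_cancer` data as a stand-in dataset and a decision tree as the example model. Limiting `max_depth` is one example of reining the model in.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

results = {}
for depth in (None, 3):  # None = grow the tree until every leaf is pure
    tree = DecisionTreeClassifier(max_depth=depth, random_state=0)
    train_acc = tree.fit(X, y).score(X, y)             # accuracy on rows it has seen
    cv_acc = cross_val_score(tree, X, y, cv=5).mean()  # accuracy on held-out folds
    results[depth] = (train_acc, cv_acc)
    print(f"max_depth={depth}: train={train_acc:.3f}, 5-fold CV={cv_acc:.3f}")
```

The unrestricted tree fits the training data almost perfectly, so its cross-validated score is the honest number to report.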
Question:7 Running a binary classification tree algorithm is quite easy. But how does the tree decide which variable to split on at the root node and at its succeeding child nodes?
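CART-style trees (scikit-learn's `DecisionTreeClassifier`, for instance, uses Gini impurity by default) score every candidate split. They pick the one that reduces impurity the most and then repeat the same rule inside each child node. The plain-Python sketch below shows that selection step on hypothetical toy data. The features are two-valued, so each split is binary. Entropy/information gain works the same way with a different impurity formula.

```python
from collections import Counter

def gini(labels: list[str]) -> float:
    """Gini impurity: 1 - sum(p_k^2); 0 means the node is pure."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def weighted_gini_after_split(rows: list[dict], labels: list[str], feature: str) -> float:
    """Impurity of the child nodes after splitting on `feature`,
    weighted by the number of rows that land in each child."""
    groups: dict[str, list[str]] = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[feature], []).append(label)
    n = len(labels)
    return sum(len(g) / n * gini(g) for g in groups.values())

def best_split(rows: list[dict], labels: list[str]) -> str:
    """Pick the feature with the largest impurity decrease (Gini gain)."""
    parent = gini(labels)
    gains = {f: parent - weighted_gini_after_split(rows, labels, f) for f in rows[0]}
    return max(gains, key=gains.get)

# Hypothetical toy data: does a customer buy?
rows = [
    {"student": "yes", "region": "north"},
    {"student": "yes", "region": "south"},
    {"student": "no", "region": "north"},
    {"student": "no", "region": "south"},
    {"student": "yes", "region": "north"},
    {"student": "no", "region": "south"},
]
labels = ["buy", "buy", "skip", "skip", "buy", "skip"]
print(best_split(rows, labels))  # student
```

Here "student" separates the classes perfectly (gain 0.5), while "region" only reduces impurity from 0.5 to about 0.44. The root therefore splits on "student".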
Question:8 How would you describe NumPy and SciPy?
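In brief, NumPy is the core Python library for n-dimensional arrays and vectorized numerical operations. SciPy builds on NumPy arrays and adds scientific routines such as statistics, optimization, integration, and linear algebra. A small sketch with made-up sample values:

```python
import numpy as np
from scipy import stats

# NumPy: n-dimensional arrays with whole-array (vectorized) arithmetic.
heights_cm = np.array([162.0, 175.5, 181.2, 158.4, 169.9])  # hypothetical sample
mean = heights_cm.mean()
z_scores = (heights_cm - mean) / heights_cm.std(ddof=1)  # no explicit Python loop

# SciPy: scientific routines that operate on NumPy arrays, here a one-sample t-test.
t_stat, p_value = stats.ttest_1samp(heights_cm, popmean=170.0)

print(z_scores.round(2))
print(f"t={t_stat:.3f}, p={p_value:.3f}")
```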
Question:9 Write a basic machine learning program that imports any dataset, trains any classifier on it, and checks its accuracy.
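Since the question allows any dataset and any classifier, the sketch below picks scikit-learn's bundled wine dataset and a k-nearest-neighbors classifier. Both are our choices, and any other pair would follow the same four steps.

```python
from sklearn.datasets import load_wine
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# 1. Import a dataset (the wine data ships with scikit-learn).
X, y = load_wine(return_X_y=True)

# 2. Hold out 25% of rows so accuracy is measured on unseen data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42)

# 3. Train a classifier; k-NN is distance-based, so scale the features first.
model = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))
model.fit(X_train, y_train)

# 4. Check accuracy on the held-out set.
accuracy = accuracy_score(y_test, model.predict(X_test))
print(f"Test accuracy: {accuracy:.2%}")
```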
Question:10 Create a Regression algorithm to predict the price of a car based on different variables.
Question:11 Develop a model that uses different network features to detect which network activities are part of an intrusion/attack, using binary classification.
Question:12 How would you group (cluster) similar organizations together based on their Wikipedia descriptions?
Question:13 How would you predict who will renew their subscriptions next month?
- What data would you need to solve this?
- What kind of analysis would you do?
- What kind of predictive model algorithms would be needed?
Question:14 How would you map nicknames (Alen, Bob, Alex, Tim, etc.) to real names?
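One simple approach combines a nickname lookup table with fuzzy matching for spelling variants, and it also handles nicknames that map to more than one name. Below is a minimal standard-library sketch. The `NICKNAMES` table is hypothetical; a real system would load a curated nickname list. The fuzzy step uses `difflib.get_close_matches` with a cutoff we chose arbitrarily.

```python
import difflib

# Hypothetical lookup table, for illustration only.
NICKNAMES: dict[str, list[str]] = {
    "bob": ["Robert"],
    "alex": ["Alexander", "Alexandra"],  # one nickname can map to several names
    "tim": ["Timothy"],
    "allen": ["Allen"],                 # some nicknames are already full names
}

def real_names(nickname: str) -> list[str]:
    """Return candidate real names for a nickname.

    1. Exact lookup after normalizing case and whitespace.
    2. Otherwise, fuzzy-match to the closest known nickname to absorb
       spelling variants such as 'Alen' for 'Allen'.
    3. Otherwise, assume the input already is a real name.
    """
    key = nickname.strip().lower()
    if key in NICKNAMES:
        return list(NICKNAMES[key])
    close = difflib.get_close_matches(key, list(NICKNAMES), n=1, cutoff=0.8)
    if close:
        return list(NICKNAMES[close[0]])
    return [nickname.strip().title()]

for nick in ["Alen", "Bob", "Alex", "Tim"]:
    print(nick, "->", real_names(nick))
```

Returning a list rather than a single name keeps ambiguous cases like "Alex" explicit. Other context, such as gender or email address, could then be used to pick one.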
Question:15 Predict whether a scheduled passenger flight will be delayed, using a binary classifier in an R or Python script.
Question:16 Predict automobile prices using linear regression. Prepare and clean the data by removing the normalized-losses column, since it has many missing values, then create an experiment and model.
Question:17 How many ways can you split 14 people into 4 teams of 5?
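Note that four teams of five need 20 people, not 14, so this question cannot be answered exactly as stated. The general count for splitting n people into k unlabeled teams of size s is n! / ((s!)^k · k!). Dividing by k! removes the different orderings of the teams themselves; if the teams are labeled (Team 1, Team 2, ...), drop that factor. A sketch that computes the count and rejects inconsistent inputs:

```python
from math import factorial

def ways_to_split(n_people: int, n_teams: int, team_size: int) -> int:
    """Ways to split people into unlabeled teams of equal size:
    n! / ((team_size!)^n_teams * n_teams!)."""
    if n_teams * team_size != n_people:
        raise ValueError(
            f"{n_teams} teams of {team_size} need {n_teams * team_size} people, not {n_people}")
    return factorial(n_people) // (factorial(team_size) ** n_teams * factorial(n_teams))

print(ways_to_split(20, 4, 5))  # 488864376
try:
    ways_to_split(14, 4, 5)
except ValueError as err:
    print(err)  # 4 teams of 5 need 20 people, not 14
```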
Question:18 The area under the standard normal curve is:
- Greater than 1
- Equal to 1
- Less than 1
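The answer is "Equal to 1": the standard normal curve is a probability density, so the total area under it is exactly 1. A quick numerical check in plain Python using the trapezoidal rule. The integration range [-10, 10] is our choice, since the tails beyond |x| = 10 hold far less than 1e-20 of the probability.

```python
import math
from collections.abc import Callable

def standard_normal_pdf(x: float) -> float:
    """Density of N(0, 1): exp(-x^2 / 2) / sqrt(2*pi)."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

def trapezoid_area(f: Callable[[float], float], lo: float, hi: float,
                   steps: int = 100_000) -> float:
    """Numerically integrate f over [lo, hi] with the trapezoidal rule."""
    h = (hi - lo) / steps
    inner = sum(f(lo + i * h) for i in range(1, steps))
    return h * (f(lo) / 2 + inner + f(hi) / 2)

area = trapezoid_area(standard_normal_pdf, -10.0, 10.0)
print(round(area, 6))  # 1.0
```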
Question:19 Your manager asked you to build a random forest model with 10,000 trees. During training you got a training error of 0.00, but on testing, the validation error was 34.23. What do you think went wrong? How would you check whether your model is actually well trained?
Question:20 ‘People who bought this, also bought…’ recommendations seen on Amazon are based on which algorithm?
Question:21 Which algorithms are behind the recommendations you see in sections like ‘Today’s News and Views’?
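For the ‘People who bought this, also bought…’ question, Amazon has described its approach as item-to-item collaborative filtering. Instead of comparing users, it relates items that show up together in customers' purchase histories. The heavily simplified sketch below only counts co-purchases over a hypothetical order history. Production systems normalize these counts (for example with cosine similarity) so that popular items do not dominate every list.

```python
from collections import Counter, defaultdict
from itertools import permutations

def build_co_purchase_counts(baskets: list[set[str]]) -> dict[str, Counter]:
    """For every item, count how often each other item appears in the same basket."""
    co_counts: dict[str, Counter] = defaultdict(Counter)
    for basket in baskets:
        for a, b in permutations(basket, 2):
            co_counts[a][b] += 1
    return co_counts

def also_bought(co_counts: dict[str, Counter], item: str, k: int = 2) -> list[str]:
    """Top-k items most often bought together with `item`."""
    return [other for other, _ in co_counts[item].most_common(k)]

# Hypothetical order history.
baskets = [
    {"tent", "sleeping bag", "lantern"},
    {"tent", "sleeping bag"},
    {"tent", "lantern", "stove"},
    {"sleeping bag", "pillow"},
    {"tent", "sleeping bag", "pillow"},
]
counts = build_co_purchase_counts(baskets)
print(also_bought(counts, "tent"))  # ['sleeping bag', 'lantern']
```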
We hope this article helped you understand guesstimate questions for data science and how to tackle them. You will find more useful articles like this one at upGrad; we offer an extensive range of courses in MBA, Data Science, Machine Learning, and more. We provide mentorship from the best individuals in the industry!
If you are interested in learning Data Science and opting for a career in this field, check out IIIT-B & upGrad’s PG Diploma in Data Science, which is created for working professionals and offers 10+ case studies & projects, practical hands-on workshops, mentorship with industry experts, 1-on-1 with industry mentors, 400+ hours of learning and job assistance with top firms.