Introduction

Data Science – Everyone is talking about it. Every business, big or small, relies on data to make decisions to drive revenue, retain customers, understand patterns, launch new products and more.

Data scientists and analysts have been in high demand over the last few years, driven by a sharp rise in job opportunities and, more importantly, by the rapid growth in the volume of data being generated.

In this career guide, we will walk you through the must-have skills for Data Scientists, best practices for building an impressive resume, the most frequently asked interview questions, potential career paths, and how you can make the transition based on your experience.

Data Science Skills

Deriving insights from large volumes of data to enable better decision-making and an even better customer experience has become the norm for competitive firms. This is why having data science skills pays off so well.

Data Science Interviews

Irrespective of your background, you can potentially transition to the field of data analysis. Whether you’re an IT professional or you hold a PhD in statistics, with the right skills you can rule the world of Data Science!

Data Science Career Paths

So what do these professionals do? You have probably come across a lot of job titles that all sound the same: analyst, scientist, engineer, etc. But what do they mean?

Data Science Skills

From Mathematics to Storytelling, hone top Data Science skills

Read Time : 17 Minutes

What are the skills you need to be a Data Scientist?

Deriving insights from large volumes of data to enable better decision-making and an even better customer experience has become the norm for competitive firms. This is why having data science skills pays off so well.

While there are skills specific to every role in the Data field and software applications one must be a master of, gaining certain skills can help you start off and chart your career in this rapidly growing domain.

Here are 4 essential skills a Data Science professional must absolutely have.

  1. Good Business Understanding:

    Data Science skills help in business problem-solving. Data professionals need to understand the variables in a business, the levers they can move to bring about a significant positive change, and the external and internal factors that affect its growth, and then take decisions accordingly. Business understanding is a must-have and one of the most critical skills if you aspire to become a data analyst.

     

  2. Mathematics:

    Objective decision-making forms a very important part of how you arrive at the solution to any given problem. To be able to take decisions objectively, you must rely on Mathematics. You need to find patterns, segment data and make predictions based on historical information. You will need to use prediction, classification and clustering algorithms to arrive at the best possible solution, and that is where Mathematics comes to the rescue.

     

  3. Technical Skills:

    You can identify and solve a problem using domain understanding and mathematical skills. But most businesses are not so simple. You will need to go through data sets that are far too large to work through by hand. To be able to replicate your algorithms and business solutions at scale, it is very important that you pick up tools such as R, Python and SQL.

     

  4. Soft Skills:

    Last, but not least among data analytics skills, are soft skills. You should be able to communicate your solution to the stakeholders in the simplest and most understandable format. They might not know anything about the KS statistic, root mean square error or your clustering algorithm, and that is where your soft skills come in. Impactful communication and great visualisation, using tools like Tableau, QlikView and ggplot2, become really important.

     


Data Science Interviews

Write the perfect resume & crack the Data Science interview

Resume building and Interview Questions

Irrespective of your background, you can potentially transition to the field of data analysis. If you’re an IT professional or you hold a PhD in statistics, you just need to hone a couple of skills and you’ll be ready to rule the world with Data Science skills!

Having said that, we all have to impress recruiters and portray our skills and experience in a short period of time.

The first step towards that is building a kickass resume.

Here are a few things your resume should definitely convey:

  1. Profile summary
  2. Educational Qualification
  3. Professional experience
  4. Leadership roles and personal achievements
Profile Summary

This section does the job of introducing your profile in 1-2 sentences. It also sets the initial impression in the mind of the recruiter as they go through the rest of the resume.

A summary should be able to communicate the following:

  1. Number of years of experience
  2. Industrial domain worked for
  3. Technical domain/skills
  4. Personal strengths

 

Educational background
  1. Most recent qualification should be at the top.
  2. For people with 0-4 years of experience, PG, UG and intermediate education could be included.
  3. For people with over 4 years of experience, you may only include PG and UG educational details.
  4. It should cover the Degree Name, Specialisation, University, Location and duration of the degree.
  5. Marks/Percentage/CGPA can be included if they are worth highlighting. If you scored less than 70%, for example, you may prefer to leave them out.
  6. For people with up to 4 years of experience, it is a good idea to include academic achievements worth mentioning, for example participation in national/international events, a high GMAT score, a high CAT score or an IIT rank.

 

Professional Experience

If you are currently in a mid to senior level, the ideal split for your resume would be 70-30 where 70% would focus on your work experience, and the rest would be your academic achievements and other activities. For people with less work experience, this split could be closer to 50-50.

If you have spent more than 1 year in your current role, it is important to elaborate on your most recent (or current) job first, followed by the other jobs/organisations you have worked in.

While describing your experience, an important differentiation to be aware of is between a Task and an Achievement. Tasks are generally the activities one performs while being in a job or role. Achievements are the outcomes that one achieves at the end of the tasks.

Achievements are what differentiate one professional from another in the same role, since everyone in that role performs the same tasks.

 

Leadership roles and personal achievements
  1. For people with less than 5 years of experience, add national-level events and awards. These could be in the areas of technology, music and arts, sports, etc.
  2. Any volunteering experience with an NGO or government agency.
  3. Significant results from self-initiative. For example, you started planting trees in your neighbourhood and gradually built a team of 15 people who joined hands to spread the word and adopt 5 more nearby areas.
  4. Initiatives taken during your employment with various organisations, for example successfully planning a team trip, volunteering with CSR, etc.

Now that you have a fair idea of how to structure your resume, next up is the interview process.
To transition into a new career, everyone has to go through an interview. Having helped many of our students with successful placements, we have collated a few frequently asked questions from interviews for Data Science roles, which are used to assess candidates and their knowledge of the subject.

 

Data science Interview questions:

Q.1. How do you select variables from a large dataset?

Filter Methods: They are generally used as a preprocessing step. The selection of features is independent of any machine learning algorithms. Instead, features are selected on the basis of their scores in various statistical tests for their correlation with the outcome variable.

Wrapper methods: In this method, we try to use a subset of features and train a model using them. The problem is essentially reduced to a search problem. These methods are usually computationally very expensive. Some common examples of wrapper methods are forward feature selection, backward feature elimination, recursive feature elimination, etc.

Embedded methods: Embedded methods combine the qualities of filter and wrapper methods. They are implemented by algorithms that have their own built-in feature selection. The most popular example is LASSO regression, whose L1 penalty can shrink coefficients to exactly zero and so drop features. Ridge regression has a similar built-in penalty that reduces overfitting, although it only shrinks coefficients and does not eliminate them.
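
As a rough illustration of the filter approach described above, the sketch below ranks features by the absolute Pearson correlation between each feature and the outcome, without training any model. The function names (`pearson`, `filter_select`), the feature names and the numbers are made up for this example. In practice you would usually reach for a library rather than hand-rolled code.

```python
import math


def pearson(xs, ys):
    """Pearson correlation between two equal-length numeric lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no information
    return cov / math.sqrt(var_x * var_y)


def filter_select(features, target, k):
    """Keep the k features whose scores against the target are highest."""
    scores = {name: abs(pearson(col, target)) for name, col in features.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k]


# Hypothetical toy data: three candidate features and one outcome variable.
features = {
    "ad_spend": [1.0, 2.0, 3.0, 4.0, 5.0],
    "discount": [5.0, 3.0, 4.0, 1.0, 2.0],
    "constant": [7.0, 7.0, 7.0, 7.0, 7.0],
}
sales = [2.1, 3.9, 6.2, 8.1, 9.8]

selected = filter_select(features, sales, k=2)
print(selected)
```

A wrapper method would instead retrain a model for each candidate subset, which is why it is so much more expensive than scoring each column once as done here.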

 

Q.2. How do you know whether a generated clustering is good?

To measure the quality of clustering results, there are two kinds of validity indices: external indices and internal indices.

An external index is a measure of agreement between two partitions where the first partition is the a priori known clustering structure, and the second results from the clustering procedure (Dudoit et al., 2002).

Internal indices are used to measure the goodness of a clustering structure without external information (Tseng et al., 2005).

For external indices, we evaluate the results of a clustering algorithm based on a known cluster structure of a data set (or cluster labels).

For internal indices, we evaluate the results using quantities and features inherent in the data set. The optimal number of clusters is usually determined based on an internal validity index.

Further reading: “Determining The Optimal Number Of Clusters: 3 Must Know Methods” and “Using internal evaluation measures to validate the quality of diverse stream clustering algorithms”.

If your unsupervised learning method is probabilistic, another option is to evaluate some probability measure (log-likelihood, perplexity, etc.) on held-out data. The motivation here is that if the method assigns a high probability to similar data that wasn’t used to fit its parameters, then it has probably done a good job of capturing the distribution of interest.
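
To make the idea of an internal index concrete, here is a minimal sketch of one widely used example, the silhouette coefficient, which compares how close each point is to its own cluster with how close it is to the nearest other cluster. The source does not name a specific index, so this choice, the function name and the toy points are assumptions for illustration only.

```python
import math


def silhouette_score(points, labels):
    """Mean silhouette coefficient: near 1 is good, near 0 or below is poor."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    if len(clusters) < 2:
        raise ValueError("the silhouette needs at least two clusters")

    total = 0.0
    for point, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            continue  # by convention a singleton cluster scores 0
        # a: mean distance to the other members of the point's own cluster
        a = sum(math.dist(point, q) for q in own) / (len(own) - 1)
        # b: mean distance to the members of the nearest other cluster
        b = min(
            sum(math.dist(point, q) for q in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)


# Hypothetical toy data: two obvious groups of points.
points = [(0, 0), (0, 1), (10, 10), (10, 11)]
good = silhouette_score(points, [0, 0, 1, 1])  # groups match the data
bad = silhouette_score(points, [0, 1, 0, 1])   # groups cut across the data
print(round(good, 3), round(bad, 3))
```

Because the score needs no ground-truth labels, it can also be computed for several candidate numbers of clusters and used to pick between them.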

 

Q.3. How do you handle class imbalance in a dataset? Explain what you could do at the data level and at the model level.
  1. Collecting more data.
  2. Changing the performance metric. Accuracy often does not work well with imbalanced datasets, because a model can score highly by always predicting the majority class.
  3. Resampling the dataset. Adding samples of the under-represented class is known as oversampling. Deleting samples of the over-represented class is known as undersampling.
  4. Generating synthetic samples, for example by randomly sampling attribute values from instances in the minority class.
  5. Spot-checking different algorithms.
  6. Penalising the models. Penalised classification imposes an additional cost on the model for making classification mistakes on the minority class during training. These penalties can bias the model to pay more attention to the minority class.
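
Two of the options above can be sketched in a few lines: random oversampling at the data level, and class weights as a simple form of penalisation at the model level. The weighting formula used here, samples divided by (number of classes times class count), is one common heuristic and not the only choice. The function names and the toy "fraud" data are invented for this illustration.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate minority-class rows at random until every class is equal in size."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, count in counts.items():
        pool = [row for row, label in zip(rows, labels) if label == cls]
        for _ in range(target - count):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels


def balanced_class_weights(labels):
    """Weight each class inversely to its frequency (a common heuristic)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Hypothetical toy data: 8 legitimate transactions and 2 fraudulent ones.
rows = [[float(i)] for i in range(10)]
labels = ["legit"] * 8 + ["fraud"] * 2

balanced_rows, balanced_labels = random_oversample(rows, labels)
weights = balanced_class_weights(labels)
print(Counter(balanced_labels), weights)
```

Resampling should be applied to the training data only. If duplicated rows leak into the validation or test set, the evaluation will look better than it really is.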

 

Q.4. Why is a Naïve Bayes model called naïve?

The Naïve Bayes machine learning algorithm is considered naïve because the assumptions it makes are virtually impossible to find in real-life data. The conditional probability is calculated as a pure product of the individual probabilities of the components. This means that the algorithm assumes the presence or absence of a specific feature of a class is not related to the presence or absence of any other feature (absolute independence of features), given the class variable. For instance, a fruit may be considered to be a banana if it is yellow, long and about 5 inches in length. Even if these features depend on each other or on the existence of other features, a naïve Bayes classifier will assume that all of them contribute independently to the probability that this fruit is a banana. The assumption that all features in a dataset are equally important and independent rarely holds in real-world scenarios.
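
The "pure product" in the answer above can be shown directly. The sketch below multiplies a class prior by one likelihood per observed feature and then normalises across classes. All probabilities are invented for the banana example and are not estimated from any real data.

```python
def naive_bayes_posteriors(priors, likelihoods, observed):
    """Posterior per class, assuming features are independent given the class."""
    scores = {}
    for cls, prior in priors.items():
        score = prior
        for feature in observed:
            # The naive step: each feature contributes its own factor,
            # regardless of which other features are present.
            score *= likelihoods[cls][feature]
        scores[cls] = score
    total = sum(scores.values())
    return {cls: score / total for cls, score in scores.items()}


# Hypothetical probabilities, chosen only to illustrate the arithmetic.
priors = {"banana": 0.5, "other": 0.5}
likelihoods = {
    "banana": {"yellow": 0.8, "long": 0.9},
    "other": {"yellow": 0.2, "long": 0.1},
}

posteriors = naive_bayes_posteriors(priors, likelihoods, ["yellow", "long"])
print(posteriors)
```

If "yellow" and "long" were strongly correlated in real fruit, multiplying them as separate factors would count the same evidence twice, which is exactly the weakness the independence assumption introduces.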

 

Q.5. What are the steps in a data analytics project?

The steps in a data analytics project are as follows:

  1. Data Acquisition
  2. Data Understanding
  3. Data Preparation
  4. Data Modelling
  5. Model Evaluation
  6. Model Deployment

 

  1. Data Acquisition and understanding

    To do Data Science, you need data. The first step in the lifecycle of a data science project is to identify the person who knows what data to acquire, and when, based on the question to be answered. This person need not be a data scientist. Anyone who understands the differences between the available data sets and can make hard decisions about the organisation’s data investment strategy is the right person for the job.

    A data science project begins with identifying the various data sources. These could be logs from web servers, social media data, data from online repositories like US Census datasets, data streamed from online sources via APIs, scraped web data, data held in an Excel spreadsheet, or data from any other source. Data acquisition involves acquiring data from all the identified internal and external sources that can help answer the business question.

    A major challenge that data professionals often encounter in the data acquisition step is tracking where each data slice comes from and whether it is up to date. It is important to track this information during the entire lifecycle of a data science project, as data might have to be re-acquired to test other hypotheses or run updated experiments.

     

  2. Data Preparation

    This is often referred to as the data cleaning or data wrangling phase. Data scientists often complain that this is the most boring and time-consuming task, as it involves identifying various data quality issues. Data acquired in the first step of a data science project is usually not in a usable format for the required analysis and might contain missing entries, inconsistencies and semantic errors.

    Having acquired the data, data scientists have to clean and reformat it, either by manually editing it in a spreadsheet or by writing code. This step of the data science project lifecycle does not produce any meaningful insights. However, through regular data cleaning, data scientists can easily identify what flaws exist in the data acquisition process, what assumptions they should make and what models they can apply to produce analysis results. After reformatting, the data can be converted to JSON, CSV or any other format that makes it easy to load into one of the data science tools.

    Exploratory data analysis forms an integral part of this stage, as summarising the clean data can help identify outliers, anomalies and patterns that can be used in the subsequent steps. This is the step that helps data scientists answer the question of what they actually want to do with this data.

    “Exploratory data analysis is an attitude, a state of flexibility, a willingness to look for those things that we believe are not there, as well as those we believe to be there,” said John Tukey, an American mathematician.

     

  3. Data Modelling

    This is the core activity of a data science project that requires writing, running and refining the programs to analyse and derive meaningful business insights from data. Often these programs are written in languages like Python, R, MATLAB or Perl. Diverse machine learning techniques are applied to the data to identify the machine learning model that best fits the business needs. All the contending machine learning models are trained with the training datasets.

     

  4. Evaluation and Interpretation

    Different kinds of models call for different evaluation metrics. For instance, if the machine learning model aims to predict daily stock prices, then the RMSE (root mean squared error) will have to be considered for evaluation. If the model aims to classify spam emails, then performance metrics like average accuracy, AUC and log loss have to be considered.

    A common question when evaluating a machine learning model is which dataset should be used to measure its performance. Looking at the performance metrics on the training dataset is helpful but can be misleading, because the numbers obtained might be overly optimistic as the model has already adapted to the training dataset. Model performance should be measured and compared using validation and test sets to identify the best model based on accuracy and over-fitting.

    All the above steps from 1 to 4 are iterated as data is acquired continuously and the business understanding becomes much clearer.

     

  5. Deployment

    Machine learning models might have to be recoded before deployment, because data scientists might favour the Python programming language while the production environment supports Java. After this, the machine learning models are first deployed in a pre-production or test environment before actually being deployed into production.
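
The evaluation step above warns that metrics computed on the training data can be overly optimistic. The sketch below shows this with RMSE, a random hold-out split and a deliberately simple nearest-neighbour predictor that memorises its training data. The model, the split ratio and the synthetic data are all assumptions chosen for the illustration.

```python
import math
import random


def rmse(actual, predicted):
    """Root mean squared error between two equal-length lists."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def holdout_split(xs, ys, test_fraction=0.25, seed=0):
    """Shuffle the row indices and split them into training and test sets."""
    indices = list(range(len(xs)))
    random.Random(seed).shuffle(indices)
    cut = int(len(indices) * (1 - test_fraction))
    train, test = indices[:cut], indices[cut:]
    return (
        [xs[i] for i in train], [ys[i] for i in train],
        [xs[i] for i in test], [ys[i] for i in test],
    )


def nearest_neighbour_predict(train_x, train_y, x):
    """Predict the target of the closest training point (pure memorisation)."""
    nearest = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    return train_y[nearest]


# Synthetic data for illustration: y = 2x + 1 on the integers 0..19.
xs = list(range(20))
ys = [2.0 * x + 1.0 for x in xs]

train_x, train_y, test_x, test_y = holdout_split(xs, ys)
train_rmse = rmse(train_y, [nearest_neighbour_predict(train_x, train_y, x) for x in train_x])
test_rmse = rmse(test_y, [nearest_neighbour_predict(train_x, train_y, x) for x in test_x])
print(train_rmse, test_rmse)
```

The training error is zero because every training point is its own nearest neighbour, while the held-out error is not. Only the second number says anything about how the model behaves on unseen data.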

 

Q.6. Why does one need to perform data cleaning?

The dataset might contain discrepancies in names or codes.

The dataset might contain outliers or errors.

The dataset might lack the attributes you are interested in analysing.

All in all, a raw dataset offers quantity but not necessarily quality. So the data needs to be cleaned and transformed to make it suitable for modelling.
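
The reasons above map onto a few routine cleaning operations. The sketch below normalises inconsistent names with an alias table, drops rows that are missing a required attribute, and discards values outside a plausible range. The field names, the alias table and the age limits are hypothetical, and real projects often flag suspicious rows for review rather than dropping them silently.

```python
def clean_records(records, aliases, required, range_field, valid_range):
    """Return cleaned copies of the records that pass all checks."""
    low, high = valid_range
    cleaned = []
    for record in records:
        row = dict(record)  # copy so the raw data stays untouched

        # 1. Fix discrepancies in names or codes.
        city = row.get("city")
        if isinstance(city, str):
            key = city.strip().lower()
            row["city"] = aliases.get(key, city.strip().title())

        # 2. Drop rows that lack an attribute needed for the analysis.
        if any(row.get(field) in (None, "") for field in required):
            continue

        # 3. Drop obvious errors and outliers.
        if not low <= row[range_field] <= high:
            continue

        cleaned.append(row)
    return cleaned


# Hypothetical raw data with typical quality problems.
raw = [
    {"city": "NYC", "age": 34},
    {"city": " new york ", "age": 29},
    {"city": "Mumbai", "age": None},
    {"city": "bombay", "age": 41},
    {"city": "Delhi", "age": 430},
]
aliases = {"nyc": "New York", "new york": "New York", "bombay": "Mumbai"}

cleaned = clean_records(raw, aliases, required=("city", "age"),
                        range_field="age", valid_range=(0, 120))
print(cleaned)
```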

Some of the generic questions are as follows:

  1. Tell us about a project in your company where you used/could have used data analytics to do it better.
  2. Why should I hire you over someone with prior experience in Data analytics?
  3. What do you bring to the table over a fresher?
  4. Why do you want to move from your current industry to data analytics/Data science/Data engineering?

Irrespective of your background, you can potentially transition to the field of data science. If you’re an IT professional or you hold a PhD in statistics, you just need to hone a couple of skills and you’ll be ready to rule the world of Data Science!


Data Science Career Paths

Find your interest areas within Data Science & plan next steps

How to Choose Career Paths

So what do these professionals do? You have probably come across a lot of job titles that all sound the same: analyst, scientist, engineer, etc. But what do they mean?

The roles can be broadly classified into two key functions:

  1. Managing data (Data Engineer):

    Getting all the scattered data into the right shape so that meaningful inferences can be drawn from it.

  2. Deriving insights from that data:

    The roles focused on deriving insights are more specialised and include data analyst, data visualiser, data scientist and ML expert.

Let us break it down for you and look at the various job roles in the field of Data Science. We will use an e-commerce company as the example to illustrate what the actual expectations from these roles may be.

  1. Data Engineer:

    A data engineer creates the platform and the data structure within which all the data from users is captured. On an e-commerce website, for example, this includes the items users buy, what is currently in their cart and what is on their wish-list.
    Data engineers should make sure that the captured data is stored in a way that is not only efficient but also easily retrievable. They are comfortable working with varied data sources (both structured and unstructured) and write ETL queries to collate data from all of them. They then organise all this data in data warehouses, data lakes or databases, so that others in the company can make the best use of it.
    To become a data engineer you need to acquire knowledge of languages and tools such as Python, Java, SQL, Hadoop, Spark, Ruby and C++. You should note, however, that knowledge of all of these is not mandatory and the requirements vary from company to company.
    As a data engineer, you would be sitting at the rare intersection of a software engineering professional and a data analyst.

  2. Data Analyst :

    Data analysts are expected to draw insights from the data, which directly impacts business decisions. They are directly involved in analytics around day-to-day business activities. There are a lot of ad hoc analyses that a data analyst or a business analyst is expected to do.
    For example, a data analyst in an e-commerce company (sometimes called a marketing analyst) helps the marketing team identify the customer segments that require marketing, the best time to market a certain product, or why the last marketing campaign failed and what to do in the future to prevent such mistakes.
    Hence, for a data analyst, a good understanding of business, data, and statistics is essential. The tools and languages that would be most commonly used by a data analyst would be Excel, SQL, and R, and in some cases Tableau as well for data-driven storytelling.

  3. Data Visualiser/Business Intelligence Professional :

    There might be specialised data visualisers or business intelligence professionals at this e-commerce company. They are responsible for creating weekly dashboards to inform the management of various metrics. These metrics include weekly sales of different products, the average delivery time and the number of daily order cancellations.

  4. Data Scientist :

    A data scientist uses the data that the organisation holds to design business-oriented machine learning models. As a starting point, for example, data scientists can go through the company’s available data to look at various buying patterns, identify similar items on the website and identify similar users. They then create algorithms around these so that the website can automatically recommend products to users based on their navigation histories, purchase histories and other such metrics. This solution must be effective enough to predict future purchases, in real time, for website visitors.
    A data scientist requires knowledge of algorithms, statistics, mathematics and machine learning, along with programming languages and tools such as R, Python, SQL and Hive.
    A data scientist should also have business understanding and the aptitude for framing the right questions to ask. They should find the answers in the available data and then communicate the results effectively to team members and all the stakeholders.

 

How to make a career transition in Data Science

  1. Without experience:
    1. A lot of companies tend to hire fresh college graduates and train them in-house. This is because many times they need a set of fresh eyes without any bias to look at business problems.
    2. As a fresher, you don’t have any baggage and your biggest advantage is that you can be moulded in any way.
    3. However, if you prepare yourself beforehand, make yourself conceptually sound in statistics and learn the relevant tools/languages to demonstrate your skill set, you will have a clear upper hand.
    4. Make sure you highlight tools and programming language skills in your resume along with your relevant education.

     

  2. With experience:
    1. Non-technical experience (sales, marketing, operations, public relations, etc.) –
      1. Less than 4 years of experience –
        • For professionals with fewer years of experience: if you really want to switch industries, your previous experience may not count for as much and you might have to begin afresh.
        • But don’t forget, your business experience and problem-solving approach can prove to be a big advantage. With enough practice, you have the potential to become a formidable professional.
        • The most suitable role for you within the data industry would be that of a data analyst. For that, you will have to acquire additional technical skills (R, SQL, Tableau) and a grounding in statistics. You can start as a Business Intelligence expert, a Data Visualiser, a Data Warehousing expert or even a data analyst. With significant experience in the field, you can ultimately become a data scientist.
        • If you have the right communication and problem-solving skills, you can transition into these roles.
        • With time, try to acquire more mathematical and software skills and then you can also go for data scientist roles. Don’t forget that you will have to work your way up with practice and dedication. There is no other way.
      2. 4+ years of experience
        • If you have over 4-5 years of experience, then you have too much domain experience to simply discard it and join a completely new field.
        • Analytics is 50% domain knowledge and 50% technical skills. Don’t throw away the advantage you have.
        • Use the knowledge of DS to improve decision making within your existing role instead of switching industries/companies directly. Go for the middle ground. For example, if you have experience in marketing, then go for being a marketing analyst.
        • If you are in sales, you can switch into sales roles of analytics projects within your company or outside.
        • Look for such roles, preferably within your organisation.
        • DS can be a powerful tool in your arsenal. Use it to make more informed and data-driven decisions.
    2. Technical experience –
      1. If you have technical experience in software engineering or mathematics, then you certainly have an advantage. However, you may not get full credit for your experience, as you have knowledge of only one or two of the components necessary for being a good analyst.
      2. If you have a software background then Data engineering stream would be the easiest to switch to because it requires a good knowledge of data structures and programming languages.
      3. If you have a background in mathematics or engineering, business analyst/data analyst role would be of interest to you because these are mathematics intensive domains.
      4. One thing that people from technical domain might lack is business understanding and presentation of data. These are the skills that one can acquire with practice and by talking to relevant people in your professional circle and even within your organisation.
      5. Most of the IT companies have some analytics opportunities or the other. The best scenario for such professionals
    3. Consulting Experience –
      1. Consulting experience can be thought of as the exact opposite of technical experience. You have a good business understanding and know how to present data, but might be lacking on the technical front. Consulting experience carries good weight, even more than technical experience. You need to equip yourself with knowledge of the relevant programming languages and mathematical tools.
    4. How can you rise in your organisation in your current role (vertical hierarchy)
      1. As we have stressed before, think of Data Science as a tool. It can help you do your tasks better and faster. This skill can help you gain an advantage over your colleagues when it comes to promotions and rise in the organisation. Because of the wide applicability of Data Science in every domain, you become extremely valuable to your organisation.

 

General transition guidelines –

  1. A general piece of advice is to try to transition from your current profile to DS, preferably within your current organisation, instead of making a complete switch and beginning afresh. This way you will get maximum credit for your current experience and, at the same time, retain your seniority in the organisation. Even if you are looking to switch organisations, first work in a similar profile where you can use analytics. Once you feel that you are proficient, you can try working in a full-fledged analytics profile in the industry of your choice.
  2. Experience – Don’t think of your experience in your current field as a liability. With experience comes business understanding and knowledge of a particular field, and you must leverage that to your advantage. Look for positions in which your experience will complement the data analytics profile. It definitely counts if you are moving within the organisation.
  3. There are companies like Tata Steel which traditionally did not have a Data Science team. If they now want to create a DS team, they need someone who has good domain knowledge as well as DS skills. This is where people with 10-15 years of domain experience who are equipped with DS skills come into the picture. This is just one example, and there are many more such industries.
  4. Tools – Instead of learning a lot of tools, focus on one of R, Python or SAS. All are equally good. The techniques you use matter more. Once you master them, they can easily be replicated across languages.
