This post was originally published in Analytics India Magazine.
Crunching numbers and spotting patterns has become the gold standard in the IT industry. Data analyst jobs are in demand: LinkedIn’s Most Promising Jobs of 2017 lists Data Engineer at number 9 and Analytics Manager at number 18.
A Glassdoor study of the 50 Best Jobs in America puts Data Scientist in the top spot, with Data Engineer close behind at number 3 and Analytics Manager at an enviable number 5. However, taking the top slot isn’t easy. You need an armory of data analytics skills if you want year-on-year growth and a fat pay package from this major career move.
The internet is abuzz with free resources on how to master the fundamentals of data science, sentiment analysis and fast-track machine learning, among others. Analytics India Magazine and UpGrad help you cut through the claptrap by listing 5 basic skills needed to become a data analyst.
Put yourself in the lead for Data Analyst jobs with these skills
Let’s start with a few basics:
Not everybody can become a data analyst. You need to have a natural leaning toward math and statistics. All those years of learning calculus and probability will come in handy. A degree in Computer Science is always an added advantage.
Statistics: To become a full-fledged data analyst, a thorough grounding in statistics is essential. Being good at statistics will help you understand algorithms deeply and know when they should be used.
Brush up on applied statistics, linear algebra, real analysis, graph theory and numerical analysis. Linear algebra comes into play with regression, understanding data structures and preparing data for prescriptive and predictive data modeling.
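As a small illustration of how linear algebra sits underneath regression, here is a minimal sketch in plain Python that fits a straight line by solving the 2x2 normal equations by hand. The function name `fit_line` and the data points are made up for this example; in day-to-day work you would normally use a statistics library rather than write this yourself.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares.

    Solves the 2x2 normal equations (X^T X) b = X^T y directly.
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xx = sum(x * x for x in xs)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    # Determinant of X^T X; zero means every x is the same value.
    det = n * sum_xx - sum_x * sum_x
    if det == 0:
        raise ValueError("all x values are identical; slope is undefined")
    slope = (n * sum_xy - sum_x * sum_y) / det
    intercept = (sum_y - slope * sum_x) / n
    return slope, intercept


# Hypothetical data: years of experience vs. salary (arbitrary units).
experience = [1, 2, 3, 4, 5]
salary = [3, 5, 7, 9, 11]

slope, intercept = fit_line(experience, salary)
print(slope, intercept)  # prints: 2.0 1.0
```

Because the invented data lie exactly on a line, the fit recovers it exactly; real data would leave residuals, which is where the statistics comes in.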
1) Statistical Language:
SAS vs R vs Python: It’s a question that needles most data nerds when it comes to picking an analytical tool. While SAS (expensive) and Python (billed for low-scale data processing) are easy to learn, R (a high-level language built for statistical computing) wins hands down thanks to its advanced computing capability, better graphical capabilities and advanced tools.
Since R is open source, features and packages get added quickly, as opposed to SAS. Another reason R is thriving is the huge ecosystem backing it, which keeps it up to speed with rich features.
Pro Tip: R’s commercial appeal has made it a household (read: IT/tech-focused companies’) name, and while SAS is still widely used by enterprises, R is catching on. Be warned, though: R has a steep learning curve.
2) Querying Language:
SQL: One of the oldest querying languages, SQL is a general-purpose database language used for analytical as well as transactional queries. It is mainly used in day-to-day operations, and traditional SQL databases are generally not built to handle petabytes of data. Programs like Unity tutorial can help you familiarize yourself with PHP & MySQL in more depth.
Hive: This Hadoop query language was invented by Facebook’s Data Infrastructure team. Ever since Hive was open sourced in 2008, it has been a popular choice for business analysts. This open source data warehousing solution uses an SQL-like language called HQL and, unlike traditional SQL databases, can support terabytes and petabytes of data. The downside is that it only supports structured data.
Pig: One of the biggest advantages of Pig is that it can process both structured and unstructured data and works over MapReduce. It is the go-to language for programmers who tend to write scripts. What you need to learn is Pig Latin, which handles structured, semi-structured and unstructured data with more ease than Hive. Here’s a bit of history trivia: Pig was created at Yahoo in 2006 to perform MapReduce jobs.
Pro Tip: Knowledge of SQL will help in picking up Pig and Hive.
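To make the difference between transactional and analytical SQL concrete, here is a minimal sketch using Python’s built-in `sqlite3` module and an in-memory database. The `orders` table and its rows are invented for illustration. HiveQL borrows much of the same SELECT ... GROUP BY syntax, which is part of why SQL knowledge carries over.

```python
import sqlite3

# An in-memory database keeps the sketch self-contained.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, city TEXT, amount REAL)"
)

# Transactional work: record individual orders as they arrive.
rows = [("Mumbai", 120.0), ("Delhi", 80.0), ("Mumbai", 200.0), ("Pune", 50.0)]
with conn:  # commits on success, rolls back on error
    conn.executemany("INSERT INTO orders (city, amount) VALUES (?, ?)", rows)

# Analytical work: summarise many records in one query.
query = """
    SELECT city, COUNT(*) AS num_orders, SUM(amount) AS revenue
    FROM orders
    GROUP BY city
    ORDER BY revenue DESC
"""
summary = conn.execute(query).fetchall()
for city, num_orders, revenue in summary:
    print(city, num_orders, revenue)

conn.close()
```

The inserts touch one row at a time, while the final query scans the whole table to answer a business question (revenue per city).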
3) Scripting Language:
MATLAB: It’s a language used for data mining. Some might argue that its popularity has declined, but it wouldn’t hurt to have it in your arsenal. Remember, MATLAB has been around for a long, long time; it was invented in the late ’70s as a tool for data analysis.
Python: This is hands down one of the most popular scripting languages, and its popularity stems from its current stack: the core libraries NumPy, SciPy, Pandas, matplotlib and IPython. It is perfect for modeling and analysis. It has one drawback though – scalability for large datasets.
Pro Tip: Python has a strong community and is best used for scraping websites and data engineering. Guess what? It’s so easy that people with a non-programming background can also master it!
Machine Learning (ML) is not just a buzzword. It is finding utility across domains and gaining immense traction, which makes it an essential skill for data professionals. In ML, regression, classification and segmentation are the broad learning areas that analysts should focus on.
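Of these three areas, classification is perhaps the easiest to show in a few lines. The sketch below is a toy nearest-centroid classifier in plain Python. The customer data, the labels and the function names `train_centroids` and `predict` are all hypothetical, and a real project would normally reach for an established ML library instead.

```python
from collections import defaultdict
import math


def train_centroids(points, labels):
    """Return the mean point (centroid) of each class."""
    grouped = defaultdict(list)
    for point, label in zip(points, labels):
        grouped[label].append(point)
    centroids = {}
    for label, members in grouped.items():
        # zip(*members) transposes the rows into one tuple per feature.
        centroids[label] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return centroids


def predict(centroids, point):
    """Assign the label whose centroid is closest (Euclidean distance)."""
    return min(centroids, key=lambda label: math.dist(centroids[label], point))


# Hypothetical features: (monthly visits, monthly spend).
points = [(2, 10), (3, 12), (20, 150), (22, 170)]
labels = ["casual", "casual", "loyal", "loyal"]

centroids = train_centroids(points, labels)
print(predict(centroids, (4, 20)))    # prints: casual
print(predict(centroids, (18, 140)))  # prints: loyal
```

Training here just averages the examples in each class, and prediction picks the nearest average, which is about as simple as a classifier gets.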
You have all this data; now how do you bring it to life? Your job, as a data analyst, would be to make evocative reports, find trends and communicate these findings to the top brass. Data visualization tools to master are Tableau, Microsoft Power BI, Oracle Visual Analyser and SAS Visual Analytics. If you like R, you can use the ggplot2 package to create rich, highly customizable charts and graphs.
Pro Tip: Don’t just learn the tools. Try understanding the motive of visually encoding data as well.
Essentially used to better understand the customer, database analysis extends from basic analysis to complex data mining through various tools – Geographic Information System (GIS) or text analysis. The basic steps for analyzing databases are to extract, clean, merge, analyse and implement.
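The extract, clean, merge and analyse steps can be sketched in a few lines of plain Python. The two CSV snippets below stand in for hypothetical database exports and are inlined so the example runs on its own; the column names are assumptions made for illustration.

```python
import csv
import io
from collections import defaultdict

# Extract: two hypothetical CSV exports, inlined to keep the sketch self-contained.
customers_csv = """customer_id,city
1, Mumbai
2,delhi
3,Pune
"""
orders_csv = """customer_id,amount
1,120
1,200
2,80
4,60
"""

customers = list(csv.DictReader(io.StringIO(customers_csv)))
orders = list(csv.DictReader(io.StringIO(orders_csv)))

# Clean: trim whitespace, normalise capitalisation, convert IDs to integers.
city_by_id = {int(c["customer_id"]): c["city"].strip().title() for c in customers}

# Merge: attach a city to each order; orders from unknown customers are skipped.
merged = [
    (city_by_id[int(o["customer_id"])], float(o["amount"]))
    for o in orders
    if int(o["customer_id"]) in city_by_id
]

# Analyse: total revenue per city.
revenue = defaultdict(float)
for city, amount in merged:
    revenue[city] += amount

print(dict(revenue))  # prints: {'Mumbai': 320.0, 'Delhi': 80.0}
```

The final "implement" step is the part code cannot do for you: acting on the result, for example by deciding which city to target next.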
Data Munging or Data Wrangling
Before you start extracting insights from reams of data, the data must be cleaned. In plain speak, somebody needs to do the job of a janitor, which means manually cleaning data and processing it into a unified format before it is analyzed. So far, Excel has been used for cleaning and enriching data, but Stanford debuted an interactive tool, a work in progress called Wrangler.
Pro Tip: Give Wrangler a try and see how you can manipulate real-world data and export it for use in Tableau or R.
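If you would rather script the janitor work, the idea of forcing messy values into one unified format can be sketched in plain Python. The example below normalises dates written in several styles to a single ISO format. The list of accepted formats and the sample values are assumptions made for illustration, not a complete solution.

```python
from datetime import datetime

# Date layouts we assume might appear in the raw data (hypothetical).
KNOWN_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%d %b %Y")


def normalise_date(raw):
    """Return the date as YYYY-MM-DD, or None if no known format matches."""
    text = raw.strip()
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue  # try the next layout
    return None


raw_dates = ["2017-03-05", " 05/03/2017", "5 Mar 2017", "not a date"]
cleaned = [normalise_date(d) for d in raw_dates]
print(cleaned)
```

Values that cannot be parsed come back as `None` rather than being guessed at, so they can be reviewed by hand.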
A data analyst does not require the advanced skills a data scientist needs. However, since these roles are multi-faceted and learning is a continuous process, with additional resources you can become a junior data scientist as well.
Essentially, mathematics and statistics (32%), computer science (19%), and engineering (16%) are predominantly the most important fields of study for a data scientist. Data analysts are generally expected to be proficient with languages such as SAS and/or R.
It’s advisable for people with a computer science background to know Python, Hadoop and SQL coding. Additionally, working with unstructured data is an integral part of the data analyst job, so it’s a good idea to get accustomed to unstructured databases. Moreover, a data analyst must cultivate qualities such as business acumen and good communication/presentation skills, as these will help you stay ahead of the game.