This is the first of a two-part series.

Part One — Building A Data Warehouse

Nowadays, everyone wants to build a data warehouse. But do you really need one? Even if you do, how do you know you’re building the right thing, and when will you start reaping early benefits from it?

But first things first: what is a data warehouse? Simply put, it’s a single place where you can store data from all your sources. It helps you answer questions that require complex analysis across data from multiple sources. You can also design a data warehouse so that your most frequent data requirements are served quickly.

A year ago, we were struggling with this question at UpGrad — to build or not to build a data warehouse?

In order to answer this, and many other such questions, we talked to a lot of other people who had done it before. The first thing that we noticed was that to build a data warehouse (or DW), you need the right team of data engineers, architects, analysts and product managers. The first question we asked was — is it really worth that much investment?

To find the right answer, we need to ask ourselves the right set of questions. These questions might take a good deal of time and energy, but once you are done with these, you will be far more confident about whether to move ahead with DW or not. Here, we’ll provide the answers we got from our own exercise to enhance your understanding, and hopefully aid you in this process of deciding whether or not to set up your own data warehouse.

Question #1: What answers do you want to get from analytics/data? And at what frequency?

As you must have noted already, this is the most important question of all. You must involve other teams (Sales, Marketing, Business) while answering these questions to make sure you don’t miss anything.

What this meant for us: We wanted 3 important answers from analytics/data:

a. Which marketing channels are performing well, i.e. multi-channel attribution?

UpGrad’s marketing team uses different channels, both online and offline, for user acquisition. We conduct offline workshops and events for professionals seeking a career upgrade. We also use online channels like Facebook and Google to attract these professionals. So it becomes very important for us to know which channels are performing well, in order to craft our marketing strategy on a weekly, or even daily, basis. Further, we also want to know whether re-marketing or offline efforts have any effect on converting these users into paid students.

b. What does our conversion funnel look like?

Our funnel is much longer than most companies’. It runs: first visit, signup, application start, application submit, test taken/exempted, shortlist, paid. It is critical to know what the funnel looks like when broken down by features like city, age group, acquisition channel, etc.
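To make the funnel breakdown concrete, here is a minimal sketch in Python. The stage names, the `(segment, furthest_stage)` input shape, and the sample data are assumptions for illustration, not our actual schema. It counts how many users reached each stage per segment, then computes step-to-step conversion.

```python
# Funnel stages in order, as listed above (identifier names are illustrative).
STAGES = [
    "first_visit", "signup", "application_start", "application_submit",
    "test_taken_or_exempted", "shortlist", "paid",
]

def funnel_counts(users):
    """Count users reaching each stage, grouped by segment (e.g. city).

    `users` is an iterable of (segment, furthest_stage) pairs. A user who
    reached a stage is assumed to have passed through all earlier stages.
    """
    counts = {}
    for segment, furthest in users:
        reached = STAGES.index(furthest)
        row = counts.setdefault(segment, [0] * len(STAGES))
        for i in range(reached + 1):
            row[i] += 1
    return counts

def step_conversion(row):
    """Conversion rate from each stage to the next; None if the earlier stage is empty."""
    return [None if prev == 0 else cur / prev for prev, cur in zip(row, row[1:])]

# Made-up sample data, segmented by city.
sample = [
    ("Mumbai", "paid"), ("Mumbai", "signup"), ("Mumbai", "first_visit"),
    ("Delhi", "application_submit"), ("Delhi", "first_visit"),
]
counts = funnel_counts(sample)
for city, row in counts.items():
    print(city, row, step_conversion(row))
```

The same function works for any segmenting feature (age group, acquisition channel) as long as each user carries one segment label.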

c. Can we predict whether a user will end up paying or not, i.e. lead scoring?

Lead scoring can be based on two things — fit and interest. The fit is determined by user attributes like years of experience, GRE/GMAT/CAT score etc. Interest is based on how active the user has been on the website, or how responsive the user is to calls or emails.
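As a rough illustration of combining fit and interest, the sketch below scores each on a 0 to 1 scale and blends them. Every input name, cap, and weight here is a hypothetical placeholder; a real model would be fitted on historical conversion data rather than hand-tuned.

```python
def fit_score(years_experience, test_percentile):
    """Fit in [0, 1] from user attributes. Caps and weights are illustrative."""
    experience = min(years_experience, 10) / 10   # cap at 10 years
    test = test_percentile / 100                  # e.g. GRE/GMAT/CAT percentile
    return 0.5 * experience + 0.5 * test

def interest_score(site_visits_30d, emails_opened, calls_answered):
    """Interest in [0, 1] from recent activity. Caps are illustrative."""
    visits = min(site_visits_30d, 20) / 20
    emails = min(emails_opened, 5) / 5
    calls = min(calls_answered, 3) / 3
    return (visits + emails + calls) / 3

def lead_score(fit, interest, fit_weight=0.5):
    """Blend fit and interest into one score in [0, 1]."""
    return fit_weight * fit + (1 - fit_weight) * interest

fit = fit_score(years_experience=5, test_percentile=80)
interest = interest_score(site_visits_30d=10, emails_opened=5, calls_answered=0)
print(round(lead_score(fit, interest), 3))
```

Keeping fit and interest as separate scores is deliberate: different teams may want to use one without the other.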

Apart from these, we wanted to:

d. Track every student’s performance in a course or program so that we can help them at the right time.

e. Monitor students’ ratings and reviews of the course content.

We got many more such questions from different teams… but you get the idea.

Question #2: Which of these answers are already provided by the current setup, or would require only minimal tweaks?

Asking this question will give you a good sense of current database capabilities. Make sure you have the right engineers in the room when you ask this (hint: most of these would be backend engineers in a startup who look after the transactional database).

What this meant for us:

a. Multi-channel attribution

Before making a purchase, a visitor typically makes many visits through different channels. Sometimes they simply find you on Google and come to your website, and sometimes they come to attend an offline promotional event. So when a visitor finally buys the product, we want to be able to tell which channels were most effective. To do so, we have to merge both online and offline data* in one place and run different attribution models.
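To show what "different attribution models" can mean in practice, here is a small sketch of three common ones: first-touch, last-touch, and linear. It assumes each converting user's touchpoints have already been merged into one ordered list of channel names; the channel names and journeys are made up.

```python
from collections import defaultdict

def attribute(journeys, model="linear"):
    """Distribute one unit of conversion credit per journey across channels.

    `journeys` is a list of ordered channel lists, one per converting user.
    """
    credit = defaultdict(float)
    for touches in journeys:
        if not touches:
            continue  # no recorded touchpoints, nothing to credit
        if model == "first_touch":
            credit[touches[0]] += 1.0
        elif model == "last_touch":
            credit[touches[-1]] += 1.0
        elif model == "linear":
            share = 1.0 / len(touches)
            for channel in touches:
                credit[channel] += share
        else:
            raise ValueError(f"unknown model: {model}")
    return dict(credit)

# Made-up journeys mixing online and offline channels.
journeys = [
    ["google", "offline_event", "facebook"],
    ["facebook"],
    ["offline_event", "google"],
]
for model in ("first_touch", "last_touch", "linear"):
    print(model, attribute(journeys, model))
```

Comparing the outputs side by side is the point: an offline event that rarely closes a sale can still show up strongly under first-touch or linear credit.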

b. Conversion funnels

Our funnel also includes some offline components, like shortlists and tests, which are uploaded manually into Salesforce by the counselling team. Building the funnel therefore requires merging webstream data with Salesforce data.
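A simplified sketch of that merge follows. It assumes both sources can be reduced to `(email, stage)` pairs and that a normalised email is a usable join key; the record shapes and stage names are hypothetical, and real exports would need more cleaning.

```python
# Stage order used to decide which record represents the furthest progress.
STAGE_ORDER = [
    "first_visit", "signup", "application_start", "application_submit",
    "test_taken_or_exempted", "shortlist", "paid",
]

def merge_funnel(web_events, crm_records):
    """Return {email: furthest stage reached} across both sources.

    Both inputs are iterables of (email, stage) pairs. Emails are
    lower-cased and stripped so that the two sources line up.
    """
    rank = {stage: i for i, stage in enumerate(STAGE_ORDER)}
    furthest = {}
    for email, stage in list(web_events) + list(crm_records):
        key = email.strip().lower()
        if key not in furthest or rank[stage] > rank[furthest[key]]:
            furthest[key] = stage
    return furthest

# Made-up records: online stages from the web stream, offline ones from the CRM.
web = [("A@x.com", "signup"), ("a@x.com", "application_submit"), ("b@x.com", "first_visit")]
crm = [("a@x.com ", "shortlist")]
merged = merge_funnel(web, crm)
print(merged)
```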

c. Lead Scoring

Most lead scoring tools are basic. For example, you can score on the basis of events streamed in Pardot (by Salesforce). We needed a system which could merge data from Salesforce, web analytics, and emails to give a final score based on fit and interest.

d. Student performance

Since this data is stored in a transactional database, we could find a visualisation tool like BIME or Tableau to pull the data and create these tracking dashboards.

e. Students’ ratings and reviews

Same as (d) above.

So, we started building a data warehouse schema, keeping in mind a, b, and c. Many startups don’t require lead scoring and have only one source of data for conversion funnels and attribution. For those startups, a Business Intelligence (BI) tool is more effective than actually building a data warehouse.

Question #3: Will things look different as you scale over the next 1–2 years?

At scale, your transactional database might get very large and queries could get slower or start failing. You should plan for such situations as well, while designing the warehouse.

What this meant for us:

Our student activities database table will grow very fast as we add more courses and students. The queries have already started slowing down. It made sense to keep this in mind while designing the schema.

Question #4: Is there anywhere else you want to send the data that you want in your data warehouse?

The data stored in the warehouse might have many different use cases, apart from the principal one. These use cases help you think through the schema, and include additional fields, if needed, while building the schema.

What this meant for us:

The lead score is used by the counselling team, so we have to send it to Salesforce. The fit component of the lead score can also be used by a particular course team to auto-exempt applicants from the test. The attribution model is used by the marketing team, so we have to send its output to the BI tool in a particular format.

Finally, Question #5: Do you have the right team to make decisions like:

Which analytics database should you use, based on the scale and analytics use-cases?
What should be the schema/data model for the current use cases? Is this schema scalable?
What kind of ETL would be required for creating the analytics database? How much time would the ETL take?
What would be the update frequency of different tables? How should you handle real-time use cases, like the one for recommendation engines?

You will need a data engineer, a senior engineer who has already worked with data for 3–5 years at least, and a data scientist to make many of these decisions.

After thinking through these 5 questions, a startup can decide whether or not to build a data warehouse. Here’s a simple list of pros and cons of a data warehouse, to help you evaluate even further:

Pros —

You will have full control over your data, and can switch between third-party tools easily as and when they get too expensive for you or stop meeting your requirements.
You can build data science products: recommendations, search, sentiment analysis, spam vs ham, etc. Check beforehand whether these products will require real-time data or whether hourly/daily updates will do.
As pointed out earlier, you can save analysts a lot of time and trouble. Queries will be faster, and the data will be reliable.

Cons —

You need to invest in engineering and data storage resources heavily, long before you can start reaping benefits.
Chances are your first build will be far from perfect. If you are an early-to-mid-stage company, a lot of processes are still evolving, and you can’t cover every case that is going to come up in the next 3–6 months. Facing questions like “why didn’t we think of that?” might end up disheartening you. You will need to brush off these small setbacks and keep your eyes on the long-term goal.
Most organisations don’t put in the research and patience needed to build the right data warehouse solution for their needs. You will need to invest a lot of time before you even start.

Once you have completed this exercise, I am pretty sure you will be ready to embark upon the data analytics journey for your startup and will avoid costly mistakes. Comment below and let us know if you liked this post or found it useful. Stay tuned for the next one!

*If we had only online channels, we could have used Google Analytics’ multi-channel attribution. We also have offline events data, which can be uploaded to Google Analytics. Problem solved? Alas! GA forbids you from sending any personally identifiable information. In the absence of email information, it’s hard to link this data to other data sources, unless you map Google Analytics’ ID to emails in your own database, look up these IDs, and upload offline data with these IDs into GA.

Frequently Asked Questions (FAQs)

1. Why is Data Analytics important in a start-up?

To begin, data analysis can assist a start-up in determining its objectives. It would be difficult to set goals and track progress without metrics, which help a start-up keep improving and moving forward. Secondly, everyone in a company can utilise data to boost productivity and improve decision-making. It assists entrepreneurs in making wise, measured, and well-informed start-up decisions. Also, knowing what customers want ahead of time makes marketing campaigns more customer-centric. Finally, data analytics assists start-ups in discovering further opportunities to optimise operations and increase earnings.

2. Does Data Analytics really matter for start-ups?

The answer is Yes! Start-ups are both thrilling and exhausting. The possibilities are limitless, which is both exhilarating and overwhelming. There are numerous things that must be put in place, but data analytics is frequently overlooked. If you think data analytics is something you can put off until your company is well established, you’ll find that getting there is a lot more challenging. What you learn from data analytics could be the key to getting you to the next level. It is data that answers crucial questions about your marketing, users, product, productivity, customer service, to help you take the right direction for your start-up.

3. Which are the best Data Analytics tools for start-ups?

In the twenty-first century, data gathering and analysis are crucial to making decisions. Whether you sell a small product, run a software as a service (SaaS) business, or run a website, you need to know what motivates your customers to buy your product, what your marketing funnel looks like, and how you can improve it. Some of the most effective analytics tools to aid in the success of your business are Google Analytics, R and Python, Microsoft Excel, Tableau, RapidMiner, KNIME, Power BI, Apache Spark, QlikView, Talend, and Splunk.
