Analyze movie data from the past hundred years and find out various insights to determine what makes a movie do well. Use data manipulation, data slicing, and various other data-frame operations to successfully find usable insights from the movies.
Analysis of the Uber dataset to understand the demand and supply of cab services using data visualisation.
As part of this case study, students must use the Boto3 SDK and AWS services to build an application that scans an image stored in S3 for celebrities.
Using EDA, help Stack Overflow implement the features in its web application such as sending notifications to relevant users if a question is raised and calculate the approximate time within which the user will receive an answer.
Similar to the BFSI or the healthcare industry, the telecommunications industry deals with big data situations. Now, these situations involve collecting, storing and analyzing data to generate insights.
Analyze a public click stream dataset of a cosmetics store to extract valuable insights which generally data engineers come up with in an e-retail company Use effective querying, query optimization, file partitioning techniques for enhancing the performance of your queries.
Perform the real time hashtags analysis of tweets using the Twitter APIs Get the real time tweets from API and then filter out the hashtags using Spark Streaming.
Using the ALS algorithm, predicted the top 20 recommendation to each user who have rated certain movies based on his/her liking or disliking. The recommendation system is built on MovieLens dataset using a collaborative filtering technique i.e. ALS algorithm.
Implementation of market basket analysis using Spark with the help of Apriori algorithm. The assignment tests the business acumen by implementing multiple strategies over the results of the Apriori algorithm.
An e-commerce company is evaluating its customer experience. They run a survey among the customers to understand their feedback. The feedback shows that late delivery is one of the most frequent problems faced by the customer.
A common challenge in the real estate industry includes predicting the price of a flat given various factors such as area, bedrooms, parking, etc. Machine learning enables you to answer this question by making use of the relationship between these variables.
Build a pricing model to predict the fare of a taxi ride given set of attributes such as date-time, coordinates for pickup and drop off. Data understanding, outlier analysis, and effective feature engineering are essential in any modelling exercises.
The objective of the case study is to predict whether a customer leaves the network or not. A logistic regression model using the scikit learn library is built to predict the churn rates of individual customers. The data is divided in three different tables which need to be combined.
The dataset contains information about advertisements shown to customers. The complete dataset is anonymised for privacy reasons. The objective of the case study is to build a logistic regression model which can predict whether an advertisement will be click or not. The case study includes model building a logistic model in pyspark.
To correct predict whether a transaction is fraud or not based on the previous records. Requires building a classification decision tree.
You will be required to build a tree to predict the income of a given population, which is labelled as <= 50K and >50K. The attributes (predictors) are age, working-class type, marital status, gender, race, etc. Build a decision tree in python. Visualization and hyperparameter tuning included.
LibSVM dataset is used to represent data that contains numerous missing values in a compact form. Build a regression decision tree in spark using spark libraries.
To correctly predict whether the person has heart disease or not based on: id, age, gender, height, weight, ap_hi, ap_lo, cholesterol, gluc. smoke, alco, active, cardio. Build a decision tree in python using sklearn. Also, requires hyperparameter tuning and visualizing the tree.
Use of Random forest model to recommend an FBI code based on the different attributes of the crime for the Chicago police.
Analyse the data of different matches to understand what kind of strategy works better to improve the ranking.
In this case study, we have built a clustering model for customer segmentation. The data contains various attributes such as InvoiceDate, UnitPrice, CustomerID etc. On the basis of this data, the supermarket store wants to segment its customers into clusters based on their shopping patterns.
Implemented the movie recommendation system (Item based filtering) by reducing the dimensionality of the movies vs tags dataset using PCA algorithm. PCA act as dimensionality reduction technique and using this we are able to compress the data in 100 dimensions instead of 1000 dimensions (tags) in the original dataset.
*More details under the referral policy under Support Section