Industries are using data science in exciting and creative ways. It is turning up in unexpected places and improving efficiency across sectors. It strengthens human decision making and affects both the top and bottom lines of a business like never before. Companies are delighting millions of customers by building data science and machine learning into their applications.
This blog series looks at interesting applications of data science and machine learning in various companies, with one company spotlighted in each post. It will cover how companies such as Google, Apple, LinkedIn, Uber, Instagram, Twitter, Instacart, Netflix, The Washington Post, Quora, Pinterest, Amazon, Medium and Microsoft are leveraging data science and machine learning to power their businesses. So, let us start the series with ‘Netflix’.
It is well known that Netflix uses recommendation systems to suggest movies and shows to its customers. Apart from movie recommendations, there are many lesser-known areas in which Netflix uses data science and machine learning:
- Choosing personalised artwork for movies and shows
- Suggesting the best frames from a show to editors for creative work
- Improving streaming Quality of Service (QoS) through decisions about video encoding, advances in client-side and server-side algorithms, video caching, etc.
- Optimising the different stages of production
- Experimenting with algorithms using A/B testing and causal inference, and reducing the time experiments take with techniques such as interleaving
Every title Netflix recommends comes with associated artwork, and that artwork is not the same for everyone. Like the recommendations themselves, the artwork for a show is personalised: members do not all see a single ‘best’ image. Instead, a portfolio of artwork is created for each title, and depending on the viewer's tastes and preferences a machine learning algorithm chooses the artwork that maximises the chance of that viewer watching the title.
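One common way to frame this kind of choice is as a bandit problem: show an image, observe whether the viewer plays the title, and gradually favour the images that work for that kind of viewer. The sketch below is a minimal epsilon-greedy version written for illustration, not a description of Netflix's actual system. The class name, the idea of a viewer "segment" and the artwork ids are all hypothetical.

```python
import random
from collections import defaultdict


class ArtworkSelector:
    """Epsilon-greedy choice of one artwork per (viewer segment, title).

    Illustrative only: segments, artwork ids and the reward signal
    (1 if the viewer played the title, else 0) are assumptions.
    """

    def __init__(self, epsilon=0.1, rng=None):
        self.epsilon = epsilon
        self.rng = rng or random.Random()
        # (segment, artwork) -> [number of plays, number of impressions]
        self.stats = defaultdict(lambda: [0, 0])

    def play_rate(self, segment, artwork):
        plays, shown = self.stats[(segment, artwork)]
        return plays / shown if shown else 0.0

    def choose(self, segment, portfolio):
        # Explore: occasionally show a random image from the portfolio.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(portfolio)
        # Exploit: otherwise show the image with the best observed play rate.
        return max(portfolio, key=lambda art: self.play_rate(segment, art))

    def record(self, segment, artwork, played):
        # Update the counts after the viewer has seen the image.
        entry = self.stats[(segment, artwork)]
        entry[0] += int(played)
        entry[1] += 1
```

In use, `choose` would be called when a title is about to be displayed and `record` once it is known whether the viewer played it. Note that a single image is returned per title, which matches the constraint that only one artwork can be shown at a time.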
A portfolio of Artwork created for the title ‘Stranger Things’:
Personalisation at work. Top row – Artwork suggested for a viewer who likes the actress Uma Thurman. Bottom row – Artwork suggestion for a viewer who likes the actor John Travolta:
Artwork personalisation is not always straightforward, and it comes with several challenges. First, only a single image can be chosen to represent each title, whereas many movies can be recommended at a time. Second, artwork selection has to work together with the movie recommendation engine; it typically sits on top of it. Third, the artwork chosen for one title should take into account the images chosen for the other titles. Otherwise the suggestions lack variation and diversity and become monotonous. Fourth, should the same artwork or a different one be displayed between sessions? Showing a different image every time can confuse the viewer and also leads to the attribution problem: working out which artwork led the viewer to watch the show.
Artwork personalisation leads to significant improvements in how viewers discover content. It is the first instance of personalising not only what is recommended but also how the recommendation is presented to members. Netflix is still actively researching and refining this nascent technique.
Art of Image Discovery
A single hour of ‘Stranger Things’ consists of 86,000 static video frames. A single season (10 episodes) contains, on average, 9 million frames in total. Netflix adds content regularly to cater to its global customers. At that volume it is not feasible to comb through frames manually to find the ‘right’ artwork for the ‘right’ person; it is next to impossible for human editors to search for the best frames that bring out the unique elements of a show. To tackle this challenge at scale, Netflix built a suite of tools to surface the best frames, the ones that capture the true spirit of the show.
Pipeline to automatically capture the best frames for a show:
Frame annotations capture the objective signals used for image ranking. To produce them, a video is divided into many small chunks, which are processed in parallel using a framework known as ‘Archer’. This parallel processing lets Netflix capture frame annotations at scale. Each chunk is handled by a machine vision algorithm to obtain the characteristics of its frames. Some of the properties captured are colour, brightness, contrast, etc. Another category of features, which describes what is happening in a frame, includes face detection, motion estimation, object detection, etc. Netflix also identified a set of properties from the core principles of photography, cinematography and visual aesthetic design, such as the rule of thirds, which are captured during frame annotation.
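The chunk-and-annotate idea can be sketched in a few lines. This is only an illustration under simplifying assumptions: it does not use Archer, a standard-library thread pool stands in for the parallel framework, a frame is represented as a flat list of grayscale pixel values, and only brightness and contrast are computed. All function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from statistics import mean, pstdev


def annotate_frame(frame):
    """Return simple objective signals for one frame.

    A frame here is a flat list of grayscale pixel values in 0..255
    (an assumption made to keep the sketch dependency-free).
    """
    return {
        "brightness": mean(frame),  # average pixel intensity
        "contrast": pstdev(frame),  # spread of pixel intensities
    }


def split_into_chunks(frames, chunk_size):
    """Divide the video (a list of frames) into consecutive chunks."""
    return [frames[i:i + chunk_size] for i in range(0, len(frames), chunk_size)]


def annotate_chunk(chunk):
    """Annotate every frame in one chunk."""
    return [annotate_frame(frame) for frame in chunk]


def annotate_video(frames, chunk_size=2, workers=4):
    """Annotate every frame, processing the chunks in parallel."""
    chunks = split_into_chunks(frames, chunk_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() preserves chunk order, so annotations stay aligned with frames.
        per_chunk = list(pool.map(annotate_chunk, chunks))
    return [annotation for chunk in per_chunk for annotation in chunk]
```

Calling `annotate_video(frames)` returns one dictionary of signals per frame, in the original frame order. Richer signals such as face or object detection would slot into `annotate_frame` in the same way.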
The next step after frame annotation is to rank the images. Some of the factors considered for ranking are actors, diversity of the images, content maturity, etc. Netflix uses deep learning techniques to cluster the images of actors in a show, prioritising the main characters and de-prioritising the secondary ones. Frames containing violence or nudity are given a very low score. Using this ranking method, the best frames for a show are surfaced. This way the artwork and editorial teams have a set of high-quality images to work with instead of dealing with millions of frames for a particular show.
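A toy version of such a ranking might look like the following. It is a sketch only: the dictionary keys, the weights and the helper names are made up for illustration, the character sets are assumed to come from an upstream face-clustering step, and image diversity is left out to keep the example short.

```python
def score_frame(frame, main_characters):
    """Combine annotation signals into a single ranking score.

    `frame` is a dict with hypothetical keys: 'aesthetic' (0..1),
    'characters' (set of names) and 'mature' (True for violence or nudity).
    The weights are arbitrary illustration values.
    """
    score = frame["aesthetic"]
    if frame["characters"] & main_characters:
        score += 1.0   # prioritise frames showing a main character
    elif frame["characters"]:
        score += 0.2   # secondary characters count for less
    if frame["mature"]:
        score -= 10.0  # push sensitive frames to the bottom
    return score


def top_frames(frames, main_characters, k=3):
    """Return the k highest-scoring frames, best first."""
    ranked = sorted(
        frames,
        key=lambda f: score_frame(f, main_characters),
        reverse=True,
    )
    return ranked[:k]
```

The large penalty for mature content means such frames are only surfaced when nothing else is available, which mirrors the very low score described above.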
Data Science in Production
Netflix is spending eight billion dollars this year on original content, created for millions of viewers across the globe in more than 20 languages. It should not surprise us, then, that Netflix uses data science in producing that content. In fact, Netflix uses data science at every step of content production.
Typically, producing content consists of pre-production, production and post-production stages. Planning, budgeting, etc. happen in pre-production. Principal photography is part of production. Steps like editing and sound mixing are part of post-production. Adding subtitles and removing technical glitches are part of localisation and quality control. Now let us see how data science helps optimise each stage of production.
As said earlier, budgeting is part of pre-production. Many decisions need to be taken before production starts, for example the location for shooting. Data science is used extensively to analyse the cost implications of a specific location. Decisions are made by carefully balancing creative vision against budget, so that costs are minimised without compromising the vision of the content.
Production involves shooting thousands of shots over many months. It has an objective, but it must be carried out under specific constraints. For example, an actor may be available for only one week, a location may be available only on particular days, the crew may work only 8 hours per day, a shot may need to be filmed by day or by night, and the team may have to move between locations. Preparing a shooting schedule under all these constraints can be a nightmare for the director. Mathematical optimisation techniques are used here, with an objective and a set of constraints, to produce a rough shooting schedule. This schedule is then refined further with adjustments.
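To make "an objective and constraints" concrete, here is a deliberately tiny sketch. It assumes one scene is shot per day, treats actor and location availability as hard constraints, and minimises the number of location moves. It tries every ordering, which only works for a handful of scenes; a real schedule would need a proper optimisation solver. The data layout and function names are hypothetical.

```python
from itertools import permutations


def feasible(order, scenes, actor_days, location_days):
    """Check that every scene falls on a day when its actor and location are free."""
    for day, name in enumerate(order):
        scene = scenes[name]
        if day not in actor_days[scene["actor"]]:
            return False
        if day not in location_days[scene["location"]]:
            return False
    return True


def location_moves(order, scenes):
    """Objective: number of times the crew changes location between days."""
    locations = [scenes[name]["location"] for name in order]
    return sum(1 for a, b in zip(locations, locations[1:]) if a != b)


def best_schedule(scenes, actor_days, location_days):
    """Try every ordering (one scene per day) and keep the cheapest feasible one."""
    best = None
    for order in permutations(sorted(scenes)):
        if not feasible(order, scenes, actor_days, location_days):
            continue
        cost = location_moves(order, scenes)
        if best is None or cost < best[0]:
            best = (cost, order)
    return best  # None if no ordering satisfies the constraints
```

The result is a `(cost, order)` pair where `order[d]` is the scene to shoot on day `d`. Further constraints, such as day or night shots, would be added as extra checks in `feasible`.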
Post-production takes as much time as production, if not more. Data visualisation techniques are used to spot bottlenecks in post-production, and to track trends and project them into the future. This forecasting shows the expected workload of the various teams so that they can be staffed appropriately.
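The simplest form of "track the trend and project it into the future" is a straight-line fit. The sketch below assumes workload is recorded as one number per period (for example, assets awaiting editing each week) and extends a least-squares line forward. The source does not say which forecasting method is used, so this is only an illustration with hypothetical function names.

```python
def fit_trend(values):
    """Least-squares line through the points (0, values[0]), (1, values[1]), ..."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations to fit a trend")
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def project(values, periods_ahead):
    """Extend the fitted line `periods_ahead` steps past the last observation."""
    slope, intercept = fit_trend(values)
    last_x = len(values) - 1
    return [
        intercept + slope * (last_x + step)
        for step in range(1, periods_ahead + 1)
    ]
```

A projected workload that keeps rising for one team is the kind of signal that would prompt a staffing change.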
In localisation, shows are dubbed from one language into another. Which shows get dubbed first is decided based on data analysis: the kind of dubbed content that proved popular in the past is prioritised. Quality control checks for issues such as audio and video being out of sync or subtitles being out of sync with the sound. It is done both before and after encoding (the process of compressing videos into different bitrates for streaming on different devices). Netflix accumulated historical data from manual quality control checks. This data covered the errors that occurred in the past, the video formats in which the errors were found, the partners from whom the content was obtained, the genre of the content, etc. Yes, Netflix saw a pattern of errors by genre as well. Using this data, a machine learning model was built that predicts whether an asset will ‘pass’ or ‘fail’ the quality checks. If the model predicts ‘fail’, the asset goes through a round of manual quality checks.
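The pass/fail idea can be illustrated with a very small classifier over the kinds of fields mentioned above (format, partner, genre). The source does not say which model Netflix built, so the sketch uses a categorical naive Bayes purely as a stand-in. The class, the field names and the helper are hypothetical.

```python
import math
from collections import Counter, defaultdict


class QCFailurePredictor:
    """Tiny categorical naive Bayes for 'pass'/'fail' (illustrative only)."""

    def __init__(self):
        self.label_counts = Counter()
        # (feature name, feature value, label) -> count
        self.value_counts = Counter()
        self.values_seen = defaultdict(set)

    def fit(self, records, labels):
        """Count how often each feature value occurred with each label."""
        for record, label in zip(records, labels):
            self.label_counts[label] += 1
            for name, value in record.items():
                self.value_counts[(name, value, label)] += 1
                self.values_seen[name].add(value)
        return self

    def _log_score(self, record, label):
        total = sum(self.label_counts.values())
        score = math.log(self.label_counts[label] / total)
        for name, value in record.items():
            count = self.value_counts[(name, value, label)]
            # Laplace smoothing so unseen values do not zero out the score.
            denom = self.label_counts[label] + len(self.values_seen[name]) + 1
            score += math.log((count + 1) / denom)
        return score

    def predict(self, record):
        """Return the label with the highest score (call fit() first)."""
        return max(
            self.label_counts,
            key=lambda label: self._log_score(record, label),
        )


def needs_manual_check(model, asset):
    """Only assets predicted to fail go through manual quality checks."""
    return model.predict(asset) == "fail"
```

The model is trained on the historical manual QC results, and `needs_manual_check` then decides which new assets are routed to a human reviewer.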
Streaming Quality of Experience and A/B testing
Data science is used extensively to ensure the quality of the streaming experience. The quality of network connectivity is predicted so that streaming quality can be maintained. Netflix actively predicts which shows are going to be streamed in a particular location and caches that content on a nearby server. The caching and storing of content is done when internet traffic is low. This ensures content is streamed without buffering and customer satisfaction is maximised.
A/B testing is used extensively whenever a change is made to an existing algorithm or a new algorithm is proposed. Newer techniques such as interleaving and repeated measures are used to speed up the A/B testing process by requiring far fewer samples.
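As a baseline for what an A/B comparison computes, the sketch below runs a standard two-proportion z-test on a conversion-style metric, for example the share of members who played something under the old and the new algorithm. It illustrates the classical test only; interleaving and repeated measures are not shown, and the function name and the choice of metric are assumptions.

```python
import math


def two_proportion_z_test(successes_a, n_a, successes_b, n_b):
    """Return (z, two-sided p-value) for the difference in two proportions."""
    p_a = successes_a / n_a
    p_b = successes_b / n_b
    # Pooled rate under the null hypothesis that both groups behave the same.
    pooled = (successes_a + successes_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if std_err == 0:
        raise ValueError("no variation in outcomes; the test is undefined")
    z = (p_b - p_a) / std_err
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value
```

A small p-value suggests the difference between the two groups is unlikely to be due to chance alone. The number of members needed to reach that point is what makes plain A/B tests slow, which is the motivation for the sample-saving techniques mentioned above.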
To conclude, these are some of the ways Netflix uses data analysis to engage and awe its customers. If you are interested in diving deeper into how this marvellous company uses data science, visit their Research blog. There is a treasure trove of articles there waiting to be explored.
In the next post in this series, let us see how Instacart is leveraging data science and machine learning. Now that you have read this post, please share your feedback on it. Also, suggest which company you would like to see later in the series.
Latest posts by Thulasiram Gunipati (see all)
- Applications of Data Science and Machine Learning in NETFLIX - August 21, 2018