This blog was originally published on Medium by Aiswarya Ramachandran – an alumnus of UpGrad’s Data Science program with IIIT-Bangalore.
In one of my previous posts on Medium, I wrote about how to scrape search results for a particular query string from Medium. In this post, we will analyze the data scraped for the search term “Data Science”, group posts into different levels of popularity based on the number of claps and responses, and try to understand what makes these posts popular.
The data scraped from Medium search results was a JSON file with extensive data about each search result. To explore the structure of the JSON file, I used Notepad++ with a JSON plugin. The JSON file had data about the posts, the author of each post, and the publisher associated with that post (if any). Here’s the JSON data structure for a Medium post:
The code to extract data from the JSON file can be found here. In addition to extracting data from the JSON file, I also added a field with the date when the post was scraped.
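As a rough illustration of this extraction step, here is a minimal sketch. The sample JSON, its field names (`posts`, `id`, `claps` and so on), the helper `extract_posts`, and the scrape date are all hypothetical placeholders; the real Medium JSON is much larger and structured differently.

```python
import json
from datetime import date

# Hypothetical sample; NOT the real Medium JSON structure.
raw = """
{"posts": [
  {"id": "p1", "title": "Intro to Data Science", "author": "a1", "claps": 120, "responses": 3},
  {"id": "p2", "title": "Why Python?", "author": "a2", "claps": 15, "responses": 0}
]}
"""

def extract_posts(json_text, scraped_on):
    """Flatten each post into a dict and stamp it with the scrape date."""
    data = json.loads(json_text)
    rows = []
    for post in data["posts"]:
        rows.append({
            "post_id": post["id"],
            "title": post["title"],
            "author_id": post["author"],
            "claps": post.get("claps", 0),
            "responses": post.get("responses", 0),
            # Extra field recording when the post was scraped
            "scraped_date": scraped_on.isoformat(),
        })
    return rows

# The scrape date below is a placeholder, not the actual collection date.
rows = extract_posts(raw, date(2018, 4, 30))
for row in rows:
    print(row)
```

Each row is a flat dictionary, so the list can be written to CSV or loaded into a data frame for the analysis that follows.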
Exploratory Analysis of Posts Related to “Data Science”
On scraping results for the search term “Data Science”, 831 posts were retrieved, of which 31 were responses to a post and were excluded from the analysis, leaving 800 posts. The scraped data covers March 2013 to April 2018. Here is the number of posts published per year:
All the date fields, such as Created Date, First Published Date, and Last Updated Date, were in milliseconds elapsed since January 1, 1970. They were converted into a human-readable date format using the function below:
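# Function to convert an epoch date (in milliseconds) to a human-readable format
from datetime import datetime, timedelta

def convertToDateString(date):
    # Add the elapsed milliseconds to the epoch start, then format the result
    return (datetime(1970, 1, 1) + timedelta(milliseconds=date)).strftime("%Y-%m-%d %H:%M:%S")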
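A quick check of this conversion, as a sketch with made-up timestamps. The helper `days_between` is a hypothetical addition: it shows one way to compute the days between first publication and data collection, which is used as a clustering feature later in the post.

```python
from datetime import datetime, timedelta

def convertToDateString(date):
    """Epoch milliseconds -> 'YYYY-MM-DD HH:MM:SS' (UTC)."""
    return (datetime(1970, 1, 1) + timedelta(milliseconds=date)).strftime("%Y-%m-%d %H:%M:%S")

def days_between(start_ms, end_ms):
    """Whole days elapsed between two epoch-millisecond timestamps (hypothetical helper)."""
    return timedelta(milliseconds=end_ms - start_ms).days

# Made-up example timestamps, in milliseconds since Jan 1, 1970
published_ms = 1514764800000  # 2018-01-01 00:00:00 UTC
scraped_ms = 1522540800000    # 2018-04-01 00:00:00 UTC

print(convertToDateString(published_ms))       # 2018-01-01 00:00:00
print(days_between(published_ms, scraped_ms))  # 90
```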
The next step was to look at what words were most commonly occurring in the titles of these posts. As you can see from the word cloud below, Data Science, Big Data, AI, Analytics, Machine Learning, Python, self-driven (about self-driving cars) are some of the most frequently occurring words.
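A word cloud is essentially a picture of word frequencies. The sketch below counts words in a few made-up titles with Python's `collections.Counter`; the titles, the small stop-word list, and the helper `word_counts` are assumptions for illustration, not the code used for the word cloud in this post.

```python
import re
from collections import Counter

# Made-up titles for illustration
titles = [
    "Data Science for Beginners",
    "Machine Learning with Python",
    "Big Data and Data Science in 2018",
]

# A tiny, hypothetical stop-word list; a real analysis would use a fuller one
STOPWORDS = {"a", "an", "and", "for", "in", "of", "the", "to", "with"}

def word_counts(titles):
    """Count lowercase alphabetic words across all titles, skipping stop words."""
    counts = Counter()
    for title in titles:
        words = re.findall(r"[a-z]+", title.lower())
        counts.update(w for w in words if w not in STOPWORDS)
    return counts

counts = word_counts(titles)
print(counts.most_common(2))  # [('data', 3), ('science', 2)]
```

This counts single words only, so a phrase like "Data Science" shows up as two separate entries.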
The distributions of the number of claps and the number of responses are highly skewed. 708 posts have fewer than 500 claps, which shows that only a few posts become popular. Here’s the distribution of claps:
The reading time of most articles is between 1 and 3 minutes.
On Medium, each post can have a maximum of 5 tags. Tags help readers find content more easily. The more relevant the tags, the easier the post is to find. As the image below shows, Data Science is the most frequently used tag, followed by Machine Learning, Big Data, and Artificial Intelligence. Here are the top 10 tags related to data science:
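Assuming the 708 posts are counted among the 800 analyzed posts, about 88.5% of posts fall below 500 claps. The sketch below shows two simple ways to quantify this kind of skew: the share of posts under a threshold, and the gap between the mean and the median. The clap counts and the helper `share_below` are made up for illustration.

```python
from statistics import mean, median

# Made-up clap counts: many small values and a couple of very large ones
claps = [3, 10, 25, 40, 80, 150, 300, 450, 2500, 12000]

def share_below(values, threshold):
    """Fraction of values strictly below the threshold."""
    return sum(v < threshold for v in values) / len(values)

print(share_below(claps, 500))  # 0.8
print(median(claps))            # 115.0
print(mean(claps))              # far above the median, a sign of right skew
```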
Creating Clusters Based on User Responses
There are three metrics that measure how popular a post is on Medium: #Claps, #Responses, and #Recommends. To make a fair comparison, I also included the feature #Days between First Published and the data collection date. On this feature set, I applied k-means clustering and identified three clusters. As we can see from the image below, there is a huge difference in the three metrics across clusters (Popularity Groups). We can also see that the less popular posts have very low engagement even though their median number of days between publishing and scraping is the highest. Here are the metrics across clusters (Popularity Groups):
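To make the clustering step concrete, here is a plain-Python sketch of k-means (Lloyd's algorithm) on the four features. Everything in it is an assumption for illustration: the rows are made up, the features are standardized first (the post does not say whether scaling was applied), and the centroids are seeded with a deterministic farthest-point rule so the result is reproducible. In practice a library implementation such as scikit-learn's `KMeans` would be the usual choice.

```python
from statistics import mean, median, pstdev

# Made-up rows: [claps, responses, recommends, days_since_first_published]
posts = [
    [5, 0, 3, 900], [12, 1, 8, 950], [8, 0, 5, 1000],
    [800, 10, 200, 300], [900, 12, 230, 320], [700, 8, 180, 280],
    [12000, 90, 3000, 200], [12500, 95, 3100, 190], [11500, 85, 2900, 210],
]

def standardize(rows):
    """Rescale every column to zero mean and unit variance."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]  # guard against constant columns
    return [[(v - m) / s for v, m, s in zip(row, mus, sds)] for row in rows]

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def init_centroids(points, k):
    """Deterministic farthest-point seeding: start at the first point, then
    repeatedly add the point farthest from all centroids chosen so far."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        nxt = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(nxt))
    return centroids

def kmeans(points, k, max_iter=100):
    """Alternate assignment and centroid updates until labels stop changing."""
    centroids = init_centroids(points, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j]))
                      for p in points]
        if new_labels == labels:
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = [mean(col) for col in zip(*members)]
    return labels

labels = kmeans(standardize(posts), k=3)

# Summarize each cluster on the original (unscaled) clap counts
for cluster in sorted(set(labels)):
    cluster_claps = [row[0] for row, lab in zip(posts, labels) if lab == cluster]
    print("cluster", cluster, "median claps:", median(cluster_claps))
```

The cluster numbers themselves carry no meaning; ranking the clusters by a summary such as median claps is one way to name them as low, medium, and high popularity groups.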
Understanding What Makes a Data Science Post Popular
As we can see from the image below, the medians for high and medium popularity articles are 9 and 7, respectively. More popular articles also have more links than less popular ones. This suggests that popular posts refer to other posts and other sources of information, adding more value to the content.
Difference between Popular and Non-Popular Posts
From the image above, we can also see that posts with medium popularity are closer to the highly popular group than to the less popular group.
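A comparison like this comes down to computing a median per popularity group and looking at the gaps between groups. The sketch below does that for a single feature; the link counts, group labels, and the helper `median_by_group` are made up and are not the values behind the image.

```python
from collections import defaultdict
from statistics import median

# Made-up (popularity_group, number_of_links) pairs
posts = [
    ("High", 20), ("High", 15), ("High", 10),
    ("Medium", 12), ("Medium", 10), ("Medium", 8),
    ("Low", 3), ("Low", 1), ("Low", 0),
]

def median_by_group(pairs):
    """Collect values per group, then take the median of each group."""
    groups = defaultdict(list)
    for group, value in pairs:
        groups[group].append(value)
    return {group: median(values) for group, values in groups.items()}

medians = median_by_group(posts)
print(medians)  # {'High': 15, 'Medium': 10, 'Low': 1}

# Which neighbour is the medium group closer to on this feature?
gap_to_high = abs(medians["High"] - medians["Medium"])
gap_to_low = abs(medians["Medium"] - medians["Low"])
print(gap_to_high < gap_to_low)  # True for this made-up data
```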
With a simple k-means, we were able to identify popular and non-popular posts on Medium related to Data Science.
When it comes to Medium, how often should you post?
If you want to be successful on Medium and can't post every day, at least write 3 to 5 times each week. Consistency is the most essential thing you should strive for. Whatever timetable you come up with, be sure it is sustainable in the long term and stick to it.
Is it possible for anyone to get published on Medium?
Anyone can create a free Medium account and begin blogging right away. Writers can submit standalone pieces, contribute to existing collections of stories, or create their own collection. With Medium's simple editor, you can share your experiences with the world as a Medium writer. Publishing on Medium is completely free, and your stories will be shared with your followers as well as millions of other people who are interested in similar themes.
On Medium, what is Towards Data Science?
The company, Towards Data Science Inc., is based in Canada. They use Medium to create a forum for thousands of individuals to share ideas and learn more about data science. Authors can choose to restrict access to their posts to members exclusively as part of the Medium ecosystem. Through the Medium Partner Program, you can reach a larger audience and earn money by publishing in Towards Data Science. In line with the Medium Terms of Service, which you agree to when creating a Medium account, you are also the sole owner of your work.