Scraping Twitter Data With Python [With 2 APIs]

Last updated: 30th Nov, 2020
Read time: 8 mins

Introduction

Social media platforms like Twitter are great repositories for gathering datasets. A new data science project requires a fair amount of data, and gathering a dataset is rarely an easy task.

Twitter provides a diverse range of data because it collects tweets from people with different mindsets and sentiments. A dataset with this kind of variety is a much-needed prerequisite for training a new machine learning model.

Let’s get started!

We are going to walk through 2 APIs for Twitter data scraping.

  1. Tweepy
  2. Twint

Tweepy

Before we walk through the Python code for scraping data with the Tweepy API, note that you need the credentials of a Twitter developer account. If you already have them, this step is a piece of cake.

If you don't have a developer account, you can apply for one on the Twitter developer portal; you need a regular Twitter account before applying. The application is straightforward and asks a few basic questions, such as your reason for applying, and approval generally takes 2-3 days.

Once you receive approval for the developer account, make a note of your consumer API keys, access token, and access token secret from the “keys and tokens” section.

Also, note that Tweepy has a few constraints: you can only scrape tweets that are no more than a week old, and scraping is limited to 18,000 tweets per 15-minute window.

Great, now that we have the keys and tokens from the developer account, let's authorize them.

import tweepy
import pandas as pd

consumer_key = "your consumer key"
consumer_secret = "your consumer secret"
access_token = "your access token"
access_token_secret = "your token secret"

# authenticate with the credentials from the developer account
authorization = tweepy.OAuthHandler(consumer_key, consumer_secret)
authorization.set_access_token(access_token, access_token_secret)
api = tweepy.API(authorization, wait_on_rate_limit=True)
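As a quick optional check, you can confirm the credentials work before scraping; this is a minimal sketch using Tweepy's verify_credentials() method (the TweepError exception class applies to Tweepy 3.x, the version current when this article was written):

# raises an exception if the credentials are invalid
try:
    api.verify_credentials()
    print("Authentication successful")
except tweepy.TweepError as e:
    print("Authentication failed:", str(e))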

Now that we have authorized with our credentials, let's scrape the tweets of a particular account. For now, let's scrape the tweets of Mr. Sundar Pichai.

username = "sundarpichai"
count = 100

try:
    # line1: create an iterable object with the tweets of the given user
    tweets_obj = tweepy.Cursor(api.user_timeline, id=username).items(count)
    # line2: extract the required attributes from each tweet
    tweets_list = [[tweet.created_at, tweet.id, tweet.text] for tweet in tweets_obj]
    # line3: convert the 2D list into a data frame
    tweets_df = pd.DataFrame(tweets_list)
except BaseException as e:
    print("Something went wrong:", str(e))

In the above snippet, line1 creates an iterable object containing all the tweets and assigns it to the variable “tweets_obj”. Once we have the iterable object, we iterate over it and extract the data.

We extract only a few attributes, “created_at”, “id”, and “text”, and append them as entries to a 2D list, where each entry holds the data of one scraped tweet. With the attributes collected in a 2D list, we can convert it to a data frame using “pd.DataFrame()”.

The reason for converting the list to a data frame is flexibility: its predefined methods and easy access make it stand out among data structures for data science projects.
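To make the data frame easier to work with, you can also label the columns explicitly; a small optional sketch (the column names here are our own choice, matching the attributes we extracted):

# name the columns after the extracted attributes
tweets_df = pd.DataFrame(tweets_list, columns=["created_at", "id", "text"])
print(tweets_df.head())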

Similarly, let's walk through the code for scraping tweets that contain a particular text query.

text_query = "vocal for local"
count = 100

try:
    # line1: create an iterable object with tweets matching the text query
    tweets_obj = tweepy.Cursor(api.search, q=text_query).items(count)
    # line2: extract the required attributes from each tweet
    tweets_list = [[tweet.created_at, tweet.id, tweet.text] for tweet in tweets_obj]
    # line3: convert the 2D list into a data frame
    df = pd.DataFrame(tweets_list)
except BaseException as e:
    print("Something went wrong:", str(e))

In the above snippet, everything works the same way as in the previous one, except that line1 now queries the search endpoint. In the end, we get a data frame with all the tweets containing the text query “vocal for local”.

If you are looking for more specific or customized scraping, such as including additional attributes like retweet count and favorite count, we can adjust the syntax and extract the other attributes Tweepy provides. For further reading on these attributes, have a look at the Tweepy documentation.
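For instance, here is a minimal sketch of line2 extended with two more attributes; retweet_count and favorite_count are standard fields on Tweepy status objects:

# extract two extra attributes per tweet
tweets_list = [[tweet.created_at, tweet.id, tweet.text,
                tweet.retweet_count, tweet.favorite_count]
               for tweet in tweets_obj]
tweets_df = pd.DataFrame(tweets_list,
                         columns=["created_at", "id", "text",
                                  "retweet_count", "favorite_count"])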

Twint

The Twint API doesn't require developer account credentials; you can scrape tweets without any authorization keys. Twint also doesn't impose restrictions such as tweet age limits, time frames, or scraping limits. It provides seamless data scraping and an easy-to-use API.
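Before running the snippets below, install and import the library; a minimal setup sketch (twint is available on PyPI):

# install with: pip install twint
import twint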

We can print the list of followers of a person using their username with the Twint API.

t_obj = twint.Config()
t_obj.Username = "sundarpichai"
twint.run.Followers(t_obj)

In the above snippet, twint.Config() creates a configuration object for the Twint API. Once we have that object, we use it for the rest of the work: “t_obj.Username” sets the username we entered, and twint.run.Followers runs a search for all followers of that username.

We can also store the scraped data in a data frame, similar to the Tweepy API.

t_obj.Limit = 100
t_obj.Username = "sundarpichai"
t_obj.Pandas = True
twint.run.Followers(t_obj)
result_df = twint.storage.panda.User_df

Everything in this snippet is almost the same as the previous one, with the extra configuration “t_obj.Pandas = True” and the line “twint.storage.panda.User_df”, which exposes the scraped data as a data frame. The resulting data frame contains the list of followers of the given username.
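Once the data is in a data frame, persisting it is a one-liner; an optional sketch (the file name here is our own choice):

# save the follower data for later analysis
result_df.to_csv("followers.csv", index=False)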


Now that we have seen how to scrape the follower data of a particular username, let's walk through the code for scraping the tweets of a particular account.


t_obj.Search = "from:@sundarpichai"
t_obj.Store_object = True
t_obj.Limit = 20
twint.run.Search(t_obj)
# with Store_object set, twint collects the tweets in twint.output.tweets_list
tweets = twint.output.tweets_list
print(tweets)

In the above snippet, we configure the object to search for the tweets of a particular person; we can also cap the number of tweets scraped using “t_obj.Limit”. After running the search, Twint collects all the tweets into a list, which we can assign to a local variable as needed.
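Each entry in that list is a Twint tweet object; here is a small sketch of reading a few of its fields, assuming the standard twint attribute names date, username, and tweet:

# print the date, author, and text of each scraped tweet
for tweet in tweets:
    print(tweet.date, tweet.username, tweet.tweet)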

After seeing the snippets for scraping info from a particular account, you may have a quick question: how do we scrape tweets containing a particular keyword? Not an issue; Twint has a solution for this too.

t_obj.Search = "data science"
t_obj.Store_object = True
t_obj.Limit = 100
twint.run.Search(t_obj)
tweets = twint.output.tweets_list
print(tweets)

The above snippet is the same as the one for scraping tweets from a particular account, with a single difference: the search string in the first line. We can also convert the result to a data frame as per our convenience.
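As with the followers example, setting the Pandas flag exposes the results as a data frame; a sketch using twint.storage.panda.Tweets_df, the tweet-level counterpart of User_df:

t_obj.Pandas = True
twint.run.Search(t_obj)

# the scraped tweets as a pandas data frame
tweets_df = twint.storage.panda.Tweets_df
print(tweets_df.head())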

For further reading on the Twint API, have a look at its repository and documentation.


Conclusion

We have understood the importance of scraping data, walked through two APIs and their features for scraping Twitter data, and seen a few methods for converting the scraped data into the file format we need. Now that you are aware of these APIs, start scraping data for your data science projects!

We at upGrad are happy to help you and would also like to tell you about the opportunities that learning Python can open up. Python is used extensively in Machine Learning and Data Science, two of the most popular and fast-growing technologies. Learning Python along with these skills will help you excel in your field and find better career opportunities.

We have developed many courses along with industry experts and top academic institutes to provide you with all the skills required to excel in this field. Some of the courses that can help you apply your knowledge of Python and improve your career prospects:

Data Science:

Check out upGrad's Online Data Science Programs developed with IIIT-B: a full-fledged data science course to enter the field and make a mark in the industry with your knowledge.

Master of Science in Data Science: Developed in coordination with Liverpool John Moores University and IIIT-B, get a master's degree in Data Science from one of the top universities in the world.

Machine Learning:

Advanced Certification in Machine Learning and AI: IIT Madras, one of the best educational institutions in India, has partnered with upGrad to create an advanced course on Machine Learning that gives individuals comprehensive knowledge of the subject.

Master of Science in Machine Learning and AI: Liverpool John Moores University and IIIT-B have partnered with upGrad to provide a complete Master of Science degree for individuals to learn the technology in detail and earn a formal degree that paves a successful path in this field.

PG Diploma in Machine Learning and AI: IIIT-B and upGrad came together to offer a 12-month course on Machine Learning and AI that gives individuals the opportunity to enter this field.



Rohit Sharma

Rohit Sharma is the Program Director for the upGrad-IIIT Bangalore PG Diploma in Data Analytics Program.

Frequently Asked Questions (FAQs)

1. What is the scraping of data?

Data scraping refers to a process in which a computer program extracts data from the output created by another program. Web scraping is a type of data scraping used to gather data or information from websites. In web scraping, an application collects valuable information from a website; the program can rapidly and easily access the WWW (World Wide Web) using HTTP (Hypertext Transfer Protocol) or a web browser.

2. Why is data scraping necessary on Twitter?

Data scraping on social media aids in tracking, evaluating, and analyzing the data available on these platforms. Twitter is among the most popular platforms, and scraping Twitter data helps users analyze user behavior and competitor strategy, perform sentiment analysis, and stay up to speed with what's happening on the channel through the tweets of the people, peers, and businesses that matter to them. A Twitter data scraping service handles your end-to-end needs in the least amount of time and gives you the necessary data. Note that Twitter only allows crawlers to collect data via its API, in order to restrict the quantity of information gathered about its users and their activities.

3. What are APIs?

Application Programming Interfaces are the small pieces of code that allow digital devices, software programs, and data servers to communicate with one another, and they are the vital backbone of many of the services we rely on today. An API connects computers or pieces of software to one another, as opposed to a user interface, which connects a computer to a human. It is not designed for direct use by the end user, but by a programmer who incorporates it into software. An API is frequently composed of many components that serve as tools or services for the programmer.
