
Scraping Twitter Data With Python [With 2 APIs]

Last updated: 30th Nov, 2020

Introduction

Social media platforms like Twitter are great repositories for gathering datasets. A new data science project requires a fair amount of data, and gathering that data is rarely an easy task.

And Twitter provides a diversified genre of data because it is a collection of tweets from people with different mindsets and different sentiments. A dataset with this kind of diversity is a much-needed prerequisite for training a new machine learning model.

Let’s get started!

We are going to walk through 2 APIs for Twitter data scraping.

  1. Tweepy
  2. Twint

Tweepy

Before we walk through the Python code for scraping data with the Tweepy API, there's one thing you need to know: we need the credentials of a Twitter developer account, and this step is a piece of cake if you already have them.

For those who don't have a developer account, you can apply for one on Twitter's developer portal. Before applying for a developer account, you need to have a Twitter account. Applying is an easy process; the application asks a few basic questions, such as the reason for applying, and approval generally takes 2-3 days.

Once you receive approval for the developer account, make a note of your consumer API keys, access token, and access token secret from the “keys and tokens” section.

Also note that Tweepy comes with a few constraints: you can only scrape tweets that are no older than a week, and scraping is limited to up to 18,000 tweets per 15-minute window.

Great, now that we have keys and tokens from the developer account let’s authorize them.

import tweepy

# Credentials from the "keys and tokens" section of your developer account
consumer_key = "your consumer key"
consumer_secret = "your consumer secret"
access_token = "your access token"
access_token_secret = "your token secret"

# Authenticate via OAuth and create the API object
authorization = tweepy.OAuthHandler(consumer_key, consumer_secret)
authorization.set_access_token(access_token, access_token_secret)
api = tweepy.API(authorization, wait_on_rate_limit=True)

Now that we have authorized with our credentials, let's scrape the tweets of a particular account. For now, let's scrape the tweets of Mr. Sundar Pichai.

import pandas as pd

username = "sundarpichai"
count = 100

try:
    # line1: create an iterable object over the user's timeline
    tweets_obj = tweepy.Cursor(api.user_timeline, id=username).items(count)
    # line2: extract the attributes we need from each tweet
    tweets_list = [[tweet.created_at, tweet.id, tweet.text] for tweet in tweets_obj]
    # line3: convert the 2D list into a data frame
    tweets_df = pd.DataFrame(tweets_list)
except BaseException as e:
    print("something went wrong,", str(e))

In the above snippet, line1 creates an iterable object with all the tweets and assigns it to the variable "tweets_obj". Once the iterable object is created, we iterate over it and extract the data.

We extract only a few attributes ("created_at", "id", and "text") and append them as one entry per tweet in a 2D list, where each entry holds the data of one scraped tweet. Now that we have a 2D list with one tweet per entry, line3 converts it to a data frame using the "pd.DataFrame()" syntax.

The reason for converting the list to a data frame is that its flexibility, predefined methods, and easy access make it stand out among data structures for data science projects.
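For instance, once the tweets are in a data frame, naming the columns and persisting them to a file each take a single line (the column and file names here are illustrative):

# Label the three attributes we extracted and save them to a CSV file
tweets_df.columns = ["created_at", "id", "text"]
tweets_df.to_csv("sundarpichai_tweets.csv", index=False)
print(tweets_df.head())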

Similarly, let’s walk through a code for scraping data that has a particular text query.

text_query = "vocal for local"
count = 100

try:
    # line1: create an iterable object over the search results
    tweets_obj = tweepy.Cursor(api.search, q=text_query).items(count)
    # line2: extract the attributes we need from each tweet
    tweets_list = [[tweet.created_at, tweet.id, tweet.text] for tweet in tweets_obj]
    # line3: convert the 2D list into a data frame
    df = pd.DataFrame(tweets_list)
except BaseException as e:
    print("something went wrong,", str(e))

In the above snippet, everything is the same as in the previous snippet, except that we search by text query rather than by username. At the end, we have a data frame with all the tweets containing the text query "vocal for local".

If you are looking for more specific or customized scraping, such as including additional attributes like retweet count or favorite count, you can adapt the syntax to extract the other attributes Tweepy provides. For further reading on these attributes, have a look at the documentation.
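For example, here is a minimal sketch of such a customization, assuming the standard v1.1 Status fields "retweet_count" and "favorite_count" (the column names are illustrative):

# A rough sketch: pull extra attributes such as retweet and favorite
# counts for each scraped tweet (v1.1 Status fields)
tweets_obj = tweepy.Cursor(api.search, q=text_query).items(count)
tweets_list = [
    [tweet.created_at, tweet.id, tweet.text,
     tweet.retweet_count, tweet.favorite_count]
    for tweet in tweets_obj
]
tweets_df = pd.DataFrame(
    tweets_list,
    columns=["created_at", "id", "text", "retweets", "favorites"],
)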

Twint

The Twint API doesn't require any developer account credentials; you can scrape tweets without any authorization keys (you can install the library from PyPI with "pip3 install twint"). Twint also doesn't impose restrictions such as tweet counts, time frames, or scraping limits. It provides seamless data scraping and an easy-to-use API.

We can print the list of followers of a person using their username with the twint API.

import twint

t_obj = twint.Config()
t_obj.Username = "sundarpichai"

# Fetch and print the followers of the given username
twint.run.Followers(t_obj)

In the above snippet, twint.Config() creates a configuration object to get things started. Once the object is assigned, we use that reference for our work: "t_obj.Username" sets the username we entered, and twint.run.Followers performs a search for all followers of that username.

We can also store the scraped data in a data frame, similar to what we did with the Tweepy API.

t_obj = twint.Config()
t_obj.Limit = 100
t_obj.Username = "sundarpichai"
t_obj.Pandas = True  # enable twint's built-in pandas storage

twint.run.Followers(t_obj)

# The scraped followers are exposed as a data frame
result_df = twint.storage.panda.User_df

Everything in this snippet is almost the same as in the previous one, with the extra line "twint.storage.panda.User_df", which exposes the scraped data as a data frame. The resulting data frame contains the list of followers of the given username.
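For a quick sanity check, you can inspect the resulting frame; its exact columns depend on twint's internals:

# Peek at the scraped follower data
print(result_df.shape)
print(result_df.head())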


Now that we have seen how to scrape the follower data of a particular username, let's walk through the code for scraping the tweets of a particular account.


t_obj = twint.Config()
t_obj.Search = "from:@sundarpichai"
t_obj.Store_object = True
t_obj.Limit = 20

twint.run.Search(t_obj)

# The stored tweet objects are collected in twint's output list
tweets = twint.output.tweets_list
print(tweets)

In the above snippet, we configure the object to search for the tweets of a particular person; we can also set a limit on the number of tweets scraped using the "t_obj.Limit" syntax. After running the search, the tweets are collected in a list that we can assign to a local variable as needed.
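As a rough sketch, assuming twint's tweet objects expose fields such as ".date", ".username", and ".tweet", you could walk that list like this:

# Iterate the stored tweet objects and print a few fields
# (.date, .username, .tweet are attributes of twint's tweet object)
for tweet in tweets:
    print(tweet.date, tweet.username, tweet.tweet)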

After seeing the snippets for scraping info from a particular account, you may have a quick question: how do you scrape tweets containing a particular keyword? Not an issue; twint has a solution for this.

t_obj = twint.Config()
t_obj.Search = "data science"
t_obj.Store_object = True
t_obj.Limit = 100

twint.run.Search(t_obj)

tweets = twint.output.tweets_list
print(tweets)

The above snippet is the same as the one for scraping tweets from a particular account, with a single difference: the search query is a plain keyword instead of a "from:" filter. We can also convert the result to a data frame at our convenience, as shown below.
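For instance, here is a minimal sketch that routes the keyword search into twint's pandas storage, assuming "Tweets_df" is filled for tweet searches the way "User_df" was for followers:

# A minimal sketch: collect keyword-search results as a data frame
t_obj = twint.Config()
t_obj.Search = "data science"
t_obj.Limit = 100
t_obj.Pandas = True

twint.run.Search(t_obj)

tweets_df = twint.storage.panda.Tweets_df
print(tweets_df.head())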

For further reading on the twint API, have a look at its repository and documentation.


Conclusion

We have understood the importance of scraping data, walked through two APIs and their features for scraping Twitter data, and seen a few methods for converting the scraped data into our required format. Now that you are aware of these APIs, start scraping data for your data science projects!

We at upGrad are happy to help you and would also like to tell you about the opportunities you can open up by learning Python. Python is used extensively in Machine Learning and Data Science, two of the most popular and fast-emerging technologies. Learning Python alongside these skills will help you excel in your field and find better career opportunities.

We have developed a number of courses along with industry experts and top academic institutes to give you all the skills required to excel in this field. Here are some of the courses that can help you put your knowledge of Python to use and improve your career prospects:

Data Science:

Check out upGrad's Online Data Science Programs developed with IIIT-B. It is a full-fledged data science course that lets you enter the field and make a mark in the industry with your knowledge.

Masters of Science in Data Science: Developed in coordination with Liverpool John Moores University and IIIT-B, this program lets you get a master's degree in Data Science from one of the top universities in the world.

Machine Learning:

Advanced Certification in Machine Learning and AI: IIT Madras, one of the best educational institutions in India, has partnered with upGrad to create an advanced course that gives individuals complete knowledge of Machine Learning.

Masters of Science in Machine Learning and AI: Liverpool John Moores University and IIIT-B have partnered with upGrad to provide a complete Master of Science degree, letting individuals learn the technology in detail and earn a formal degree that paves a successful path in this field.

PG Diploma in Machine Learning and AI: IIIT-B and upGrad came together to offer a 12-month course on Machine Learning and AI that gives individuals the opportunity to enter this field.


Rohit Sharma

Rohit Sharma is the Program Director for the upGrad-IIIT Bangalore PG Diploma Data Analytics Program.

Frequently Asked Questions (FAQs)

1. What is data scraping?

Data scraping refers to a process in which a computer program extracts data from an output created by another program. Web scraping is a type of data scraping that is used to gather data or information from different websites. In web scraping, an application is used to collect valuable information from a website. The web scraping program can rapidly and easily access the WWW (World Wide Web) using HTTP (the Hypertext Transfer Protocol) or a web browser.

2. Why is data scraping necessary on Twitter?

Data scraping on social media aids in tracking, evaluating, and scrutinizing the data available on the platforms. Twitter is among the most popular platforms, and scraping Twitter data helps users analyze user behavior and competitor strategy, perform sentiment analysis, and stay up to speed with what's happening in the tweets of the people, peers, and businesses that matter to them. A Twitter data scraping service handles your end-to-end needs in the least amount of time and gives you the necessary data. Note that Twitter only allows crawlers to collect data via its API, in order to restrict the quantity of information gathered about its users and their activities.

3. What are APIs?

Application Programming Interfaces are the small pieces of code that allow digital devices, software programs, and data servers to communicate with one another, and they are the vital backbone of many of the services we rely on today. An API connects computers or pieces of software to each other, as opposed to a user interface, which connects a computer to a human. It is not designed for direct use by anyone (the end user) other than a computer programmer, who incorporates it into software. An API is frequently composed of many components that serve as tools or services for the programmer.
