Scraping Twitter Data With Python [With 2 APIs]

Last updated: 30th Nov, 2020
Read time: 8 mins

Introduction

Social media platforms like Twitter are great repositories for gathering datasets. A new data science project requires a fair amount of data, and gathering a dataset is rarely an easy task.

Twitter provides a diverse range of data because it collects tweets from people with different mindsets and sentiments. A dataset with this kind of variety is a much-needed prerequisite for training a new machine learning model.

Let’s get started!

We are going to walk through 2 APIs for Twitter data scraping.

  1. Tweepy
  2. Twint

Tweepy

Before we walk through the Python code for scraping data with the Tweepy API, note that you need the credentials of a Twitter developer account. If you already have them, this step is a piece of cake.

If you don't have a developer account, you can apply for one on the Twitter developer portal; you need a regular Twitter account before applying. The application is straightforward and asks a few basic questions, such as your reason for applying, and approval generally takes 2-3 days.

Once you receive approval for the developer account, make a note of your consumer API keys, access token, and access token secret from the “keys and tokens” section.

Also, note that Tweepy has a few constraints: you can only scrape tweets that are no more than a week old, and scraping is limited to 18,000 tweets per 15-minute window.

Great, now that we have the keys and tokens from the developer account, let's authorize them.

import tweepy
import pandas as pd

consumer_key = "your consumer key"
consumer_secret = "your consumer secret"
access_token = "your access token"
access_token_secret = "your token secret"

# authenticate with the credentials from the developer account
authorization = tweepy.OAuthHandler(consumer_key, consumer_secret)
authorization.set_access_token(access_token, access_token_secret)
api = tweepy.API(authorization, wait_on_rate_limit=True)
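As a quick optional check, you can confirm the credentials work before scraping; this is a minimal sketch using Tweepy's verify_credentials() method (the TweepError exception class applies to Tweepy 3.x, the version current when this article was written):

# raises an exception if the credentials are invalid
try:
    api.verify_credentials()
    print("Authentication successful")
except tweepy.TweepError as e:
    print("Authentication failed:", str(e))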

Now that we have authorized with our credentials, let's scrape the tweets of a particular account. For now, let's scrape the tweets of Mr. Sundar Pichai.

username = "sundarpichai"
count = 100

try:
    # line1: create an iterable object with the tweets of the given user
    tweets_obj = tweepy.Cursor(api.user_timeline, id=username).items(count)
    # line2: extract the required attributes from each tweet
    tweets_list = [[tweet.created_at, tweet.id, tweet.text] for tweet in tweets_obj]
    # line3: convert the 2D list into a data frame
    tweets_df = pd.DataFrame(tweets_list)
except BaseException as e:
    print("Something went wrong:", str(e))

In the above snippet, line1 creates an iterable object containing all the tweets and assigns it to the variable “tweets_obj”. Once we have the iterable object, we iterate over it and extract the data.

We extract only a few attributes, “created_at”, “id”, and “text”, and append them as entries to a 2D list, where each entry holds the data of one scraped tweet. With the attributes collected in a 2D list, we can convert it to a data frame using “pd.DataFrame()”.

The reason for converting the list to a data frame is flexibility: its predefined methods and easy access make it stand out among data structures for data science projects.
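To make the data frame easier to work with, you can also label the columns explicitly; a small optional sketch (the column names here are our own choice, matching the attributes we extracted):

# name the columns after the extracted attributes
tweets_df = pd.DataFrame(tweets_list, columns=["created_at", "id", "text"])
print(tweets_df.head())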

Similarly, let's walk through the code for scraping tweets that contain a particular text query.

text_query = "vocal for local"
count = 100

try:
    # line1: create an iterable object with tweets matching the text query
    tweets_obj = tweepy.Cursor(api.search, q=text_query).items(count)
    # line2: extract the required attributes from each tweet
    tweets_list = [[tweet.created_at, tweet.id, tweet.text] for tweet in tweets_obj]
    # line3: convert the 2D list into a data frame
    df = pd.DataFrame(tweets_list)
except BaseException as e:
    print("Something went wrong:", str(e))

In the above snippet, everything works the same way as in the previous one, except that line1 now queries the search endpoint. In the end, we get a data frame with all the tweets containing the text query “vocal for local”.

If you are looking for more specific or customized scraping, such as including additional attributes like retweet count and favorite count, we can adjust the syntax and extract the other attributes Tweepy provides. For further reading on these attributes, have a look at the Tweepy documentation.
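For instance, here is a minimal sketch of line2 extended with two more attributes; retweet_count and favorite_count are standard fields on Tweepy status objects:

# extract two extra attributes per tweet
tweets_list = [[tweet.created_at, tweet.id, tweet.text,
                tweet.retweet_count, tweet.favorite_count]
               for tweet in tweets_obj]
tweets_df = pd.DataFrame(tweets_list,
                         columns=["created_at", "id", "text",
                                  "retweet_count", "favorite_count"])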

Twint

The Twint API doesn't require developer account credentials; you can scrape tweets without any authorization keys. Twint also doesn't impose restrictions such as tweet age limits, time frames, or scraping limits. It provides seamless data scraping and an easy-to-use API.
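Before running the snippets below, install and import the library; a minimal setup sketch (twint is available on PyPI):

# install with: pip install twint
import twint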

We can print the list of followers of a person using their username with the Twint API.

t_obj = twint.Config()
t_obj.Username = "sundarpichai"
twint.run.Followers(t_obj)

In the above snippet, twint.Config() creates a configuration object for the Twint API. Once we have that object, we use it for the rest of the work: “t_obj.Username” sets the username we entered, and twint.run.Followers runs a search for all followers of that username.

We can also store the scraped data in a data frame, similar to the Tweepy API.

t_obj.Limit = 100
t_obj.Username = "sundarpichai"
t_obj.Pandas = True
twint.run.Followers(t_obj)
result_df = twint.storage.panda.User_df

Everything in this snippet is almost the same as the previous one, with the extra configuration “t_obj.Pandas = True” and the line “twint.storage.panda.User_df”, which exposes the scraped data as a data frame. The resulting data frame contains the list of followers of the given username.
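Once the data is in a data frame, persisting it is a one-liner; an optional sketch (the file name here is our own choice):

# save the follower data for later analysis
result_df.to_csv("followers.csv", index=False)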


Now that we have seen how to scrape the follower data of a particular username, let's walk through the code for scraping the tweets of a particular account.


t_obj.Search = "from:@sundarpichai"
t_obj.Store_object = True
t_obj.Limit = 20
twint.run.Search(t_obj)
# with Store_object set, twint collects the tweets in twint.output.tweets_list
tweets = twint.output.tweets_list
print(tweets)

In the above snippet, we configure the object to search for the tweets of a particular person; we can also cap the number of tweets scraped using “t_obj.Limit”. After running the search, Twint collects all the tweets into a list, which we can assign to a local variable as needed.
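Each entry in that list is a Twint tweet object; here is a small sketch of reading a few of its fields, assuming the standard twint attribute names date, username, and tweet:

# print the date, author, and text of each scraped tweet
for tweet in tweets:
    print(tweet.date, tweet.username, tweet.tweet)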

After seeing the snippets for scraping info from a particular account, you may have a quick question: how do we scrape tweets containing a particular keyword? Not an issue; Twint has a solution for this too.

t_obj.Search = "data science"
t_obj.Store_object = True
t_obj.Limit = 100
twint.run.Search(t_obj)
tweets = twint.output.tweets_list
print(tweets)

The above snippet is the same as the one for scraping tweets from a particular account, with a single difference: the search string in the first line. We can also convert the result to a data frame as per our convenience.
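As with the followers example, setting the Pandas flag exposes the results as a data frame; a sketch using twint.storage.panda.Tweets_df, the tweet-level counterpart of User_df:

t_obj.Pandas = True
twint.run.Search(t_obj)

# the scraped tweets as a pandas data frame
tweets_df = twint.storage.panda.Tweets_df
print(tweets_df.head())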

For further reading on the Twint API, have a look at its repository and documentation.


Conclusion

We have understood the importance of scraping data, walked through two APIs and their features for scraping Twitter data, and seen a few methods for converting the scraped data into the file format we need. Now that you are aware of these APIs, start scraping data for your data science projects!

We at upGrad are happy to help you and would also like to tell you about the opportunities that learning Python can open up. Python is used extensively in Machine Learning and Data Science, two of the most popular and fast-growing technologies. Learning Python along with these skills will help you excel in your field and find better career opportunities.

We have developed many courses along with industry experts and top academic institutes to provide you with all the skills required to excel in this field. Some of the courses that can help you apply your knowledge of Python and improve your career prospects:

Data Science:

Check out upGrad's Online Data Science Programs developed with IIIT-B: a full-fledged data science course to enter the field and make a mark in the industry with your knowledge.

Master of Science in Data Science: Developed in coordination with Liverpool John Moores University and IIIT-B, get a master's degree in Data Science from one of the top universities in the world.

Machine Learning:

Advanced Certification in Machine Learning and AI: IIT Madras, one of the best educational institutions in India, has partnered with upGrad to create an advanced course on Machine Learning that gives individuals comprehensive knowledge of the subject.

Master of Science in Machine Learning and AI: Liverpool John Moores University and IIIT-B have partnered with upGrad to provide a complete Master of Science degree for individuals to learn the technology in detail and earn a formal degree that paves a successful path in this field.

PG Diploma in Machine Learning and AI: IIIT-B and upGrad came together to offer a 12-month course on Machine Learning and AI that gives individuals the opportunity to enter this field.



Rohit Sharma

Rohit Sharma is the Program Director for the upGrad-IIIT Bangalore PG Diploma in Data Analytics Program.

Frequently Asked Questions (FAQs)

1. What is the scraping of data?

Data scraping refers to a process in which a computer program extracts data from the output created by another program. Web scraping is a type of data scraping used to gather data or information from websites. In web scraping, an application collects valuable information from a website; the program can rapidly and easily access the WWW (World Wide Web) using HTTP (Hypertext Transfer Protocol) or a web browser.

2. Why is data scraping necessary on Twitter?

Data scraping on social media aids in tracking, evaluating, and analyzing the data available on these platforms. Twitter is among the most popular platforms, and scraping Twitter data helps users analyze user behavior and competitor strategy, perform sentiment analysis, and stay up to speed with what's happening on the channel through the tweets of the people, peers, and businesses that matter to them. A Twitter data scraping service handles your end-to-end needs in the least amount of time and gives you the necessary data. Note that Twitter only allows crawlers to collect data via its API, in order to restrict the quantity of information gathered about its users and their activities.

3. What are APIs?

Application Programming Interfaces are the small pieces of code that allow digital devices, software programs, and data servers to communicate with one another, and they are the vital backbone of many of the services we rely on today. An API connects computers or pieces of software to one another, as opposed to a user interface, which connects a computer to a human. It is not designed for direct use by the end user, but by a programmer who incorporates it into software. An API is frequently composed of many components that serve as tools or services for the programmer.
