Collecting Tweets from Twitter API using Python
Introduction to Tweepy
I have been interested in the Python programming language for a while. My goal is to improve my machine learning skills alongside my data analytics workflow. So, I've set a few milestones, including machine learning, to track my progress. The first milestone is to work with the Twitter API. Later on, I will use the Flask and Vue frameworks to build a basic sentiment analysis app.
Tweepy is one of the popular Python libraries developed for Twitter API operations [2].
The library can be installed easily with the pip install tweepy command.
You can quickly list posts on certain topics or by certain users (up to the 20 most relevant for a timeline, up to 100 (re)tweets per query by ID). It is also possible to listen for events and post automatically [3][4][5][6][7][8].
Twitter API v2 and Standard v1.1 apply rate limits per endpoint in 15-minute windows. An endpoint can allow up to 900 requests per 15 minutes, while the limit for an application is 300 requests per window [9].
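To get a feel for what these numbers mean in practice, the arithmetic can be sketched as follows. The function name is illustrative, and the sketch assumes that both figures refer to a 15-minute window.

```python
def min_interval_seconds(max_requests, window_minutes=15):
    """Average spacing (in seconds) that keeps a client under a rate limit.

    Assumption: the limit is counted over a fixed window of `window_minutes`.
    """
    if max_requests <= 0:
        raise ValueError("max_requests must be positive")
    return window_minutes * 60 / max_requests


print(min_interval_seconds(900))  # 1.0
print(min_interval_seconds(300))  # 3.0
```

In other words, a loop that sends roughly one request per second stays within the 900-request limit, and one request every three seconds stays within the 300-request limit.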
Tweepy's functionality provides the following features in line with the Twitter API:
- API class
import tweepy
import pandas as pd

consumer_key = '...'
consumer_secret = '...'
access_token = '...'
access_token_secret = '...'

# OAuth
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)

# API class
# wait_on_rate_limit_notify is accepted by Tweepy 3.x (3.10.0 is used in this post)
api = tweepy.API(auth, wait_on_rate_limit=True, wait_on_rate_limit_notify=True)

# API class > Methods > User Timelines
public_tweets = api.home_timeline()
df = pd.DataFrame([tweet.text for tweet in public_tweets])
The following methods are accessible through the API class [12]:
- User timelines
- User Account
- Blocking users
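As a rough illustration of these three method groups, here is a minimal sketch. It assumes that api is an authenticated tweepy.API instance and uses the get_user, user_timeline, and create_block methods; the two helper names are hypothetical and are not part of Tweepy.

```python
def summarize_account(api, screen_name, count=5):
    """Collect a few profile fields and recent tweet texts for one account.

    `api` is assumed to be an authenticated tweepy.API instance.
    """
    user = api.get_user(screen_name=screen_name)  # User Account
    timeline = api.user_timeline(screen_name=screen_name, count=count)  # User timelines
    return {
        "screen_name": user.screen_name,
        "followers": user.followers_count,
        "tweets": [status.text for status in timeline],
    }


def block_accounts(api, screen_names):
    """Block each account in turn and return the names that were processed."""
    blocked = []
    for name in screen_names:
        api.create_block(screen_name=name)  # Blocking users
        blocked.append(name)
    return blocked
```

Passing the api object in as an argument keeps the helpers independent of how authentication is set up.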
Now let's move on to our example flow and fetch only the tweets that contain a given hashtag within a certain date range.
First, let's create a virtual environment (venv) to keep this work separate from other Python environments [13]. I used twBot as the project name; you can change it as you wish. If you do, remember to use the new name in the following steps as well.
python3 -m venv twBot
cd twBot && source bin/activate && which pip
After running these commands, which pip should print the path to pip inside the project folder, confirming that the virtual environment is active.
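The same check can also be made from inside Python. This is a minimal sketch that relies on the standard library only; the function name is illustrative.

```python
import sys


def in_virtualenv():
    """Return True when the running interpreter belongs to a venv.

    Inside a venv, sys.prefix points at the environment directory while
    sys.base_prefix still points at the base Python installation.
    """
    return sys.prefix != sys.base_prefix


print(sys.executable)   # path of the interpreter that is running
print(in_virtualenv())  # True when the twBot environment is active
```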
You can see the relevant packages and their versions below.
certifi==2021.5.30
charset-normalizer==2.0.3
DateTime==4.3
idna==3.2
oauthlib==3.1.1
PySocks==1.7.1
pytz==2021.1
requests==2.26.0
requests-oauthlib==1.3.0
six==1.16.0
tweepy==3.10.0
urllib3==1.26.6
zope.interface==5.4.0
You can save this list in the project directory as requirements.txt and install from it with the bin/pip install -r requirements.txt command.
After adding new packages, you can use the pip freeze > requirements.txt command to update the contents of the requirements.txt file.
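Because the examples in this post depend on the pinned Tweepy version, it can be useful to check that the installed packages match the pins. The following is a minimal sketch, assuming the file only contains name==version lines as in the list above; parse_pins and find_mismatches are hypothetical helper names.

```python
from importlib import metadata


def parse_pins(text):
    """Map package names to pinned versions from 'name==version' lines."""
    pins = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "==" not in line:
            continue  # skip blanks, comments, and unpinned entries
        name, version = line.split("==", 1)
        pins[name.strip()] = version.strip()
    return pins


def find_mismatches(pins):
    """Return {name: (pinned, installed)} for packages that differ or are missing."""
    mismatches = {}
    for name, wanted in pins.items():
        try:
            found = metadata.version(name)
        except metadata.PackageNotFoundError:
            found = None  # package is not installed in this environment
        if found != wanted:
            mismatches[name] = (wanted, found)
    return mismatches


sample = "tweepy==3.10.0\nrequests==2.26.0\nrequests-oauthlib==1.3.0\n"
print(parse_pins(sample)["tweepy"])  # 3.10.0
```

With the real file, find_mismatches(parse_pins(open('requirements.txt').read())) would list any package whose installed version differs from the pin.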
The Python code for this example is available as GetTweets.py [14].
Remember to replace the placeholder credentials in the snippet below with your own.
tw = getTweets(
    consumer_key='...',
    consumer_secret='...',
    access_token='...',
    access_token_secret='...'
)
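One way to avoid editing credentials into the file itself is to read them from environment variables. This is only a sketch: the TWITTER_ prefix and the load_credentials helper are assumptions made for illustration and are not part of the original code.

```python
import os

CREDENTIAL_KEYS = (
    "consumer_key",
    "consumer_secret",
    "access_token",
    "access_token_secret",
)


def load_credentials(env=None, prefix="TWITTER_"):
    """Read the four credentials from variables such as TWITTER_CONSUMER_KEY.

    The variable names are an assumption of this sketch; use whatever
    naming scheme suits your setup.
    """
    env = os.environ if env is None else env
    missing = [prefix + key.upper() for key in CREDENTIAL_KEYS
               if not env.get(prefix + key.upper())]
    if missing:
        raise KeyError("missing environment variables: " + ", ".join(missing))
    return {key: env[prefix + key.upper()] for key in CREDENTIAL_KEYS}


# Possible usage with the class from the snippet above:
# tw = getTweets(**load_credentials())
```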
If you save the code snippet as a file (you can also run it in Colab), you can run it regularly with bin/python and have the tweets written to files.
The query parameter in the code snippet below specifies which topics we will collect tweets on. The -filter:retweets operator excludes retweets. The has:images -is:retweet operators, in turn, return only tweets that include an image and are not retweets. Usernames can also be used in a query, for example from:ceaksan from:twitterapi has:links [15]. The query results are processed item by item.
However, if you want, you can also process tweets page by page with a small change [16].
tw.writeToFile(filename='twitter.csv', query='#Tokyo2020 -filter:retweets', count=100, lang='tr', items=100)
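The query string itself can be assembled with a small helper. The sketch below is an illustration rather than part of GetTweets.py: build_query is a hypothetical name, the dates are made up, and it assumes the since: and until: operators of the standard search API, which only reaches back about a week.

```python
def build_query(term, exclude_retweets=True, since=None, until=None):
    """Join a search term with optional standard search operators.

    `since` and `until` are expected as 'YYYY-MM-DD' strings (assumption:
    the standard v1.1 search operators are in use).
    """
    parts = [term]
    if exclude_retweets:
        parts.append("-filter:retweets")
    if since:
        parts.append("since:" + since)
    if until:
        parts.append("until:" + until)
    return " ".join(parts)


print(build_query("#Tokyo2020"))
# #Tokyo2020 -filter:retweets
print(build_query("#Tokyo2020", since="2021-07-23", until="2021-08-08"))
# #Tokyo2020 -filter:retweets since:2021-07-23 until:2021-08-08
```

The returned string can be passed as the query argument of writeToFile in place of a hand-written query.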
Continuing from the code snippet above, let's perform some Model operations. First, let's list the mentions, then favorite each mention and follow its author.
tweets = tw.api.mentions_timeline()
for tweet in tweets:
    print(tweet.text)
    tweet.favorite()
    tweet.user.follow()
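Running a loop like this repeatedly will try to favorite and follow the same accounts again. A slightly more defensive variant might look like the sketch below. It assumes that the returned models carry the favorited and following fields of the v1.1 payload (hence the getattr fallbacks); acknowledge_mentions and its dry_run flag are hypothetical.

```python
def acknowledge_mentions(api, dry_run=True):
    """Favorite new mentions and follow their authors.

    Returns the list of actions taken (or, with dry_run=True, the actions
    that would be taken). `api` is assumed to be an authenticated
    tweepy.API instance such as tw.api.
    """
    actions = []
    for tweet in api.mentions_timeline():
        # Skip tweets that are already favorited, if the field is present.
        if not getattr(tweet, "favorited", False):
            actions.append(("favorite", tweet.id))
            if not dry_run:
                tweet.favorite()
        # Skip authors that are already followed, if the field is present.
        if not getattr(tweet.user, "following", False):
            actions.append(("follow", tweet.user.screen_name))
            if not dry_run:
                tweet.user.follow()
    return actions
```

Calling acknowledge_mentions(tw.api) first with the default dry_run=True makes it possible to review the planned actions before anything is changed on the account.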
# Streams: print tweets that match the tracked keywords in real time
class TWStreamListener(tweepy.StreamListener):
    def on_status(self, status):
        print(status.text)

TWListener = TWStreamListener()
twStream = tweepy.Stream(auth=tw.api.auth, listener=TWListener)
twStream.filter(track=['Tokyo2020', 'Turkey'])
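The listener above runs until it is interrupted. The callback logic for stopping it can be sketched on its own, without importing Tweepy. The sketch assumes the Tweepy 3.x convention that returning False from on_status or on_error stops the stream; StatusCollector and max_statuses are hypothetical names.

```python
class StatusCollector:
    """Callback logic in the shape a Tweepy 3.x listener uses.

    Assumption: returning False from on_status or on_error tells the
    stream to disconnect.
    """

    def __init__(self, max_statuses=10):
        self.max_statuses = max_statuses
        self.texts = []

    def on_status(self, status):
        # Keep the text and stop once enough statuses have been collected.
        self.texts.append(status.text)
        return len(self.texts) < self.max_statuses

    def on_error(self, status_code):
        # 420 signals rate limiting; disconnect instead of retrying.
        return status_code != 420
```

The two methods use the same names as the listener callbacks, so their bodies could be copied into TWStreamListener.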
1. Twitter API Tools and libraries. Twitter Developer Platform
2. Tweepy Documentation. Tweepy
3. Tutorials. Twitter Developer Platform
4. API Reference. Tweepy
5. Data dictionary
6. Miguel Garcia. How to Make a Twitter Bot in Python With Tweepy
7. Getting historical Tweets using the full-archive search endpoint. Twitter Developer Platform
8. Explore a user’s Tweets. Twitter Developer Platform
9. Rate limits. Twitter Developer Platform
10. Get Tweet timelines. Twitter Developer Platform
11. Tweepy 3.10.0, AttributeError: module 'tweepy' has no attribute 'Client'. stackoverflow
12. Twitter API v1.1 Reference
13. Installing packages using pip and virtual environments. PyPA
14. ceaksan/GetTweets.py. GitHub
15. Listen for important events. Twitter Developer Platform
16. Items or Pages. Cursor Tutorial. Tweepy
17. Streaming With Tweepy
18. Stream Tweets in real-time. Twitter Developer Platform