Collecting Tweets from Twitter API using Python

Introduction to Tweepy

I have been interested in the Python programming language for a while. My goal is to improve my Machine Learning skills alongside my data analytics work, so I've set a few milestones, including machine learning, to track my progress. The first milestone is to work with the Twitter API. Later on, I will use the Flask and Vue frameworks to build a basic sentiment analysis app.


I visited the Tools and Libraries page published by Twitter and examined the available solutions1. The page lists many options with varying features, written in programming languages such as JavaScript, Go, Python, Lua, Julia, R, and Ruby. I decided to pick Python and its tweepy library2.

Tweepy

Tweepy is one of the most popular Python libraries for working with the Twitter API2.

The library can be installed easily with the pip install tweepy command.

With Tweepy, you can quickly list posts on certain topics or from certain users (up to 20 of the most recent tweets per timeline request, and up to 100 (re)tweets per request when querying by ID) and retrieve user information. It is also possible to access events and post automatically3 4 5 6 7 8.

For an endpoint in Twitter API v2 and Standard v1.1, the maximum is typically 900 requests per 15-minute window with user authentication; at the application level, the limit is 300 requests9 per window. Exact limits vary by endpoint.
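To put these limits in perspective, they can be turned into a pacing interval: 900 requests per 15 minutes averages out to one request per second, and 300 requests per 15 minutes to one every three seconds. The helper below is only a sketch of that arithmetic (min_interval_seconds is a hypothetical name, not part of Tweepy); in the examples that follow, wait_on_rate_limit=True lets Tweepy sleep automatically when a limit is reached.

```python
def min_interval_seconds(max_requests: int, window_minutes: int = 15) -> float:
    """Average gap between requests that keeps a client within one rate-limit window.

    Hypothetical helper for illustration; not part of Tweepy.
    """
    if max_requests <= 0:
        raise ValueError("max_requests must be positive")
    return window_minutes * 60 / max_requests


if __name__ == "__main__":
    # The two limits mentioned above, per 15-minute window
    for label, limit in (("user-level", 900), ("app-level", 300)):
        print(f"{label}: one request every {min_interval_seconds(limit):.1f} s")
    # prints "user-level: one request every 1.0 s"
    # prints "app-level: one request every 3.0 s"
```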

Tweepy organizes its functionality around the following components of the Twitter API:

  • OAuth
  • API class
  • Models
  • Cursors
  • Streams
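The Cursors item refers to pagination: in Tweepy 3.x, tweepy.Cursor wraps an API method and lets you iterate either over individual items with .items() or over whole pages with .pages(). The stdlib-only sketch below is not Tweepy's implementation; it just illustrates the same item-versus-page idea, with a hypothetical fetch_page function standing in for an API call so that it runs without credentials.

```python
from typing import Callable, Iterator, List, Optional, Tuple

# Hypothetical page fetcher: takes a cursor (None for the first page) and
# returns (items on this page, cursor for the next page or None when done).
PageFetcher = Callable[[Optional[int]], Tuple[List[str], Optional[int]]]


def iter_pages(fetch_page: PageFetcher) -> Iterator[List[str]]:
    """Yield whole pages until the fetcher reports that there is no next page."""
    cursor: Optional[int] = None
    while True:
        items, cursor = fetch_page(cursor)
        if items:
            yield items
        if cursor is None:
            return


def iter_items(fetch_page: PageFetcher, limit: int) -> Iterator[str]:
    """Yield single items across pages, stopping after `limit` items."""
    if limit <= 0:
        return
    count = 0
    for page in iter_pages(fetch_page):
        for item in page:
            yield item
            count += 1
            if count >= limit:
                return  # stop before requesting another page


if __name__ == "__main__":
    # Made-up in-memory "API" with three pages of tweet texts
    pages = {None: (["t1", "t2"], 1), 1: (["t3", "t4"], 2), 2: (["t5"], None)}
    print(list(iter_items(lambda c: pages[c], limit=3)))   # ['t1', 't2', 't3']
    print([len(p) for p in iter_pages(lambda c: pages[c])])  # [2, 2, 1]
```

With Tweepy itself, the equivalent would be along the lines of tweepy.Cursor(api.home_timeline).items(50) for items, or .pages(3) for pages.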

I'll use Standard v1.1 in the example flow below, since tweepy v3 does not currently cover Twitter API v2. First, I will fetch the home timeline using the tweepy.API() class10 11.

import tweepy
import pandas as pd

consumer_key = '...'
consumer_secret = '...'
access_token = '...'
access_token_secret = '...'

# OAuth
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)

# API class
api = tweepy.API(auth, wait_on_rate_limit=True, wait_on_rate_limit_notify=True)

# API class > Methods > User Timelines
# home_timeline() returns the 20 most recent tweets by default
public_tweets = api.home_timeline()
df = pd.DataFrame([tweet.text for tweet in public_tweets], columns=['text'])
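Keeping only tweet.text discards useful metadata. Tweepy 3.x model objects also keep the raw v1.1 payload in their _json attribute, so a small helper can flatten a few fields into one row per tweet. The field selection and the helper name tweet_to_row are assumptions for illustration, and the sample dictionary is hand-written, not real API output.

```python
from typing import Any, Dict


def tweet_to_row(tweet_json: Dict[str, Any]) -> Dict[str, Any]:
    """Flatten a few fields of a v1.1 tweet payload into one table row (hypothetical helper)."""
    return {
        "id": tweet_json["id"],
        "created_at": tweet_json.get("created_at"),
        "screen_name": tweet_json.get("user", {}).get("screen_name"),
        # full_text is present when tweets are requested with tweet_mode='extended'
        "text": tweet_json.get("full_text", tweet_json.get("text", "")),
        "retweet_count": tweet_json.get("retweet_count", 0),
        "favorite_count": tweet_json.get("favorite_count", 0),
    }


if __name__ == "__main__":
    # Hand-written sample shaped like a v1.1 tweet; not real data
    sample = {
        "id": 1,
        "created_at": "Fri Jul 23 12:00:00 +0000 2021",
        "text": "Opening ceremony #Tokyo2020",
        "user": {"screen_name": "example_user"},
        "retweet_count": 3,
        "favorite_count": 5,
    }
    print(tweet_to_row(sample))
```

With the objects from the snippet above, this would be used as pd.DataFrame([tweet_to_row(t._json) for t in public_tweets]).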

The following methods are accessible through the API class12:

  • User timelines
  • Tweets
  • Users
  • Followers
  • User Account
  • Likes
  • Blocking users
  • Searches
  • Trends
  • Streaming

Now let's return to our example flow and collect only tweets that contain a given hashtag within a certain date range.

First, let's create a virtual environment (venv) to isolate the project from other Python environments13. I used twBot as the project name; you can change it as you wish, but remember to apply the same change in the following steps.

python3 -m venv twBot
cd twBot && source bin/activate && which pip

After running these commands, the output should be the path of pip inside the project folder, for example /Users/user/Desktop/twBot/bin/pip.
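You can also check from inside Python that bin/python belongs to the virtual environment. A venv-created interpreter sets sys.prefix to the environment directory while sys.base_prefix still points at the base installation, so comparing the two is a simple test. The function name below is illustrative.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the running interpreter comes from a venv."""
    # venv points sys.prefix at the environment directory, while
    # sys.base_prefix keeps pointing at the base Python installation
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("venv active:", in_virtualenv())
    print("interpreter:", sys.executable)
```

Running it with bin/python should report True; running it with a system python3 outside any environment should report False.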

You can see the relevant packages and their versions below.

certifi==2021.5.30
charset-normalizer==2.0.3
DateTime==4.3
idna==3.2
oauthlib==3.1.1
PySocks==1.7.1
pytz==2021.1
requests==2.26.0
requests-oauthlib==1.3.0
six==1.16.0
tweepy==3.10.0
urllib3==1.26.6
zope.interface==5.4.0

Save this list as requirements.txt in the project directory and install it from that directory with pip:

bin/pip install -r requirements.txt

After installing new packages, you can run pip freeze > requirements.txt to update the contents of requirements.txt.
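The code in this post relies on Tweepy 3.x interfaces such as StreamListener and the wait_on_rate_limit_notify argument, which were removed in Tweepy 4.0, so it can help to check the pinned version before running a script. The stdlib helpers below are a hypothetical sketch: they only understand simple name==version lines like the list above.

```python
from typing import Dict, Optional


def parse_pins(requirements_text: str) -> Dict[str, str]:
    """Map lower-cased package names to versions pinned with '=='."""
    pins: Dict[str, str] = {}
    for raw in requirements_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding spaces
        if "==" in line:
            name, version = line.split("==", 1)
            pins[name.strip().lower()] = version.strip()
    return pins


def pinned_major(requirements_text: str, package: str) -> Optional[int]:
    """Major version pinned for `package`, or None if it is not pinned."""
    version = parse_pins(requirements_text).get(package.lower())
    return int(version.split(".")[0]) if version else None


if __name__ == "__main__":
    # A few lines from the list above
    text = "requests==2.26.0\ntweepy==3.10.0\nurllib3==1.26.6\n"
    print(pinned_major(text, "tweepy"))  # 3
```

In a real script you would pass it the contents of requirements.txt and warn if the result is not 3.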

Here is our Python code snippet14:

Don't forget to replace the credentials in this code snippet with your own.

tw = getTweets(
    consumer_key='...',
    consumer_secret='...',
    access_token='...',
    access_token_secret='...'
)

If you save the code snippet as a file (you can also run it in Google Colab), you can run it regularly with bin/python and collect the tweets into files.

bin/python ./getTweets.py
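The getTweets class is not reproduced here, so the following is not its writeToFile method. It is a hypothetical sketch of one way regular runs could keep adding tweets to a single CSV file: open it in append mode and write the header only when the file is new. The column names are assumptions.

```python
import csv
import os
import tempfile
from typing import Dict, Iterable

FIELDS = ["id", "created_at", "screen_name", "text"]  # assumed column set


def append_rows(path: str, rows: Iterable[Dict[str, object]]) -> int:
    """Append rows to a CSV file; write the header only if the file is new or empty."""
    new_file = not os.path.exists(path) or os.path.getsize(path) == 0
    written = 0
    with open(path, "a", newline="", encoding="utf-8") as fh:
        # extrasaction="ignore" drops keys that are not in FIELDS;
        # missing keys are written as empty cells
        writer = csv.DictWriter(fh, fieldnames=FIELDS, extrasaction="ignore")
        if new_file:
            writer.writeheader()
        for row in rows:
            writer.writerow(row)
            written += 1
    return written


if __name__ == "__main__":
    # Two simulated runs with made-up rows, written to a temporary directory
    path = os.path.join(tempfile.mkdtemp(), "twitter.csv")
    append_rows(path, [{"id": 1, "screen_name": "a", "text": "first run"}])
    append_rows(path, [{"id": 2, "screen_name": "b", "text": "second run"}])
    with open(path, encoding="utf-8") as fh:
        print(fh.read())
```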

The query parameter in the code snippet below specifies which topics we will collect tweets on.

In standard v1.1 search, the -filter:retweets operator excludes retweets. In the Twitter API v2 query syntax, has:images -is:retweet returns only original tweets (not retweets) that include an image, and usernames can also be used as operators, for example (from:ceaksan OR from:twitterapi) has:links15. The code retrieves results item by item.

However, with a small change, you can also retrieve tweets page by page16.

tw.writeToFile(filename='twitter.csv',
               query='#Tokyo2020 -filter:retweets',
               count=100,
               lang='tr',
               items=100)
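When experimenting with different topics, it can help to build the query string in one place. The helper below is a hypothetical sketch (build_query is not part of Tweepy or of the getTweets class): in standard search, space-separated terms must all match, and -filter:retweets is appended unless disabled. The result can be passed as the query argument of writeToFile.

```python
from typing import Iterable


def build_query(terms: Iterable[str], exclude_retweets: bool = True) -> str:
    """Join search terms with spaces (all must match) and optionally drop retweets."""
    cleaned = [t.strip() for t in terms if t.strip()]
    if not cleaned:
        raise ValueError("at least one search term is required")
    if exclude_retweets:
        cleaned.append("-filter:retweets")
    return " ".join(cleaned)


if __name__ == "__main__":
    print(build_query(["#Tokyo2020"]))  # #Tokyo2020 -filter:retweets
```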

Continuing from the code snippet above, let's perform some Model operations: list our mentions, favorite each one, and follow its author.

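tweets = tw.api.mentions_timeline()
for tweet in tweets:
    print(tweet.text)
    try:
        # Liking an already-liked tweet raises an error, so skip those
        if not tweet.favorited:
            tweet.favorite()
        tweet.user.follow()
    except tweepy.TweepError as error:  # keep going if one action fails
        print(f"Skipped {tweet.id}: {error}")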
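When this runs regularly, the same mentions would come back every time. One option is to remember the highest mention ID processed and pass it as since_id on the next call, so that only newer mentions are returned. The stdlib sketch below handles only that bookkeeping; the state file name and the helper names are hypothetical.

```python
import json
import os
import tempfile
from typing import Iterable, Optional

STATE_FILE = "last_mention_id.json"  # hypothetical file name


def load_since_id(path: str = STATE_FILE) -> Optional[int]:
    """Return the highest mention ID handled on a previous run, if any."""
    if not os.path.exists(path):
        return None
    with open(path, encoding="utf-8") as fh:
        return json.load(fh).get("since_id")


def save_since_id(tweet_ids: Iterable[int], path: str = STATE_FILE,
                  previous: Optional[int] = None) -> Optional[int]:
    """Store the largest ID seen so the next run only asks for newer mentions."""
    candidates = list(tweet_ids) + ([previous] if previous is not None else [])
    if not candidates:
        return None  # nothing seen yet; leave the state file untouched
    newest = max(candidates)
    with open(path, "w", encoding="utf-8") as fh:
        json.dump({"since_id": newest}, fh)
    return newest


if __name__ == "__main__":
    # Demo with made-up IDs in a temporary directory
    demo_path = os.path.join(tempfile.mkdtemp(), STATE_FILE)
    print(load_since_id(demo_path))                   # None
    print(save_since_id([101, 105, 103], demo_path))  # 105
    print(load_since_id(demo_path))                   # 105
```

A run could then look like: since_id = load_since_id(); kwargs = {'since_id': since_id} if since_id else {}; tweets = tw.api.mentions_timeline(**kwargs); and, after processing, save_since_id((t.id for t in tweets), previous=since_id).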

Finally, let's follow tweets in real time with the Streaming API17 18.

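class TWStreamListener(tweepy.StreamListener):
    def on_status(self, status):
        # Called for each incoming tweet that matches the filter
        print(status.text)

TWListener = TWStreamListener()
twStream = tweepy.Stream(auth=tw.api.auth, listener=TWListener)
# track: keywords to follow; tweets matching any of them are streamed
twStream.filter(track=['Tokyo2020', 'Turkey'])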
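If you store streamed tweets and later want to re-apply the same keyword filter offline (for example, to split a file by keyword), a rough approximation of the track rules can help: each list element is a phrase, phrases are combined with OR, the words inside a phrase are combined with AND, and case is ignored. The function below is a hypothetical simplification; Twitter's own matcher has additional rules (for example for URLs), so results can differ from what the stream delivers.

```python
import re
from typing import Iterable

_WORD = re.compile(r"\w+")  # letters, digits and underscore, Unicode-aware


def matches_track(text: str, phrases: Iterable[str]) -> bool:
    """Simplified offline approximation of the streaming `track` rules.

    Phrases are OR-ed together, the words inside one phrase are AND-ed,
    and matching ignores case. '#' and '@' are stripped by the tokenizer,
    so 'Tokyo2020' also matches '#Tokyo2020'.
    """
    words_in_text = set(_WORD.findall(text.lower()))
    for phrase in phrases:
        words = _WORD.findall(phrase.lower())
        if words and all(word in words_in_text for word in words):
            return True
    return False


if __name__ == "__main__":
    track = ["Tokyo2020", "Turkey"]
    print(matches_track("Watching #Tokyo2020 tonight", track))  # True
    print(matches_track("Nothing relevant here", track))        # False
```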