Twitter Scraper

Overview

tweety

Twitter's official API is annoying to work with and has many limitations. Luckily, the Twitter frontend (JavaScript) has its own API, which I reverse-engineered. No API rate limits. No restrictions. Extremely fast.

Prerequisites

Before you begin, ensure you have met the following requirements:

  • Internet Connection
  • Python 3.6+
  • BeautifulSoup (Python Module)
  • Requests (Python Module)

All Functions

  • get_tweets()
  • get_user_info()
  • get_trends() (can be used without username)
  • search() (can be used without username)
  • tweet_detail() (can be used without username)

Using tweety

Getting Tweets:

Description:

Get 20 Tweets of a Twitter User

Required Parameter:

  • Username or User profile URL while initiating the Twitter Object

Optional Parameter:

  • pages : int (default is 1) -> Number of pages of tweets to fetch
  • include_extras : boolean (default is False) -> Also return page extras, such as Topics

Output:

  • Type -> dictionary
  • Structure
    {
      "p-1" : {
        "result": {
            "tweets": []
        }
      },
      "p-2":{
        "result": {
            "tweets": []
        }
      }
    }

Example:

Python 3.7.3 (default, Mar 26 2019, 21:43:19) 
[GCC 8.2.1 20181127] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from tweet import Twitter
>>> all_tweet = Twitter("Username or URL").get_tweets(pages=2)
>>> for i in all_tweet:
...   print(all_tweet[i])
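Given the paged "p-N" structure shown above, the pages can be flattened into a single list of tweets. The sketch below uses a simulated result dict of that shape rather than a live Twitter call, so the tweet entries are placeholders:

```python
# Simulated get_tweets() output with the documented "p-N" page structure.
all_tweet = {
    "p-1": {"result": {"tweets": [{"text": "first"}, {"text": "second"}]}},
    "p-2": {"result": {"tweets": [{"text": "third"}]}},
}

# Flatten every page into one list of tweets.
tweets = [
    tweet
    for page in all_tweet.values()
    for tweet in page["result"]["tweets"]
]

print(len(tweets))  # 3
```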

Getting Trends:

Description:

Get 20 Locale Trends

Output:

  • Type -> dictionary
  • Structure
", "url":" " }, { "name":" ", "url":" " } ] } ">
  {
    "trends":[
      {
        "name":"
      
       "
      ,
        "url":"
      
       "
      
      },
      {
        "name":"
      
       "
      ,
        "url":"
      
       "
      
      }
    ]
  } 

Example :

Python 3.7.3 (default, Mar 26 2019, 21:43:19) 
[GCC 8.2.1 20181127] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from tweet import Twitter
>>> trends = Twitter().get_trends()
>>> for i in trends['trends']:
...   print(i['name'])
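Since get_trends() returns the dictionary structure shown above, filtering it is plain dict work. The sketch below uses made-up sample data of the documented shape (not a live call) to pick out only hashtag trends:

```python
# Sample of the documented get_trends() output shape.
trends = {
    "trends": [
        {"name": "#Python", "url": "https://twitter.com/search?q=%23Python"},
        {"name": "World Cup", "url": "https://twitter.com/search?q=World+Cup"},
    ]
}

# Keep only the trends that are hashtags.
hashtags = [t["name"] for t in trends["trends"] if t["name"].startswith("#")]
print(hashtags)  # ['#Python']
```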

Searching a keyword:

Description:

Get 20 Tweets for a specific Keyword or Hashtag

Required Parameter:

  • keyword : str -> Keyword to search for

Optional Parameter:

  • latest : boolean (Default is False) -> Get the latest tweets

Output:

  • Type -> list

Example:

Python 3.7.3 (default, Mar 26 2019, 21:43:19) 
[GCC 8.2.1 20181127] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from tweet import Twitter
>>> trends = Twitter().search("Pakistan")

Getting USER Info:

Description:

Get the information about the user

Required Parameter:

  • Username or User profile URL while initiating the Twitter Object

Optional Parameter:

  • banner_extensions : boolean (Default is False) -> get more information about user banner image
  • image_extensions : boolean (Default is False) -> get more information about user profile image

Output:

  • Type -> dict

Example:

Python 3.7.3 (default, Mar 26 2019, 21:43:19) 
[GCC 8.2.1 20181127] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from tweet import Twitter
>>> trends = Twitter("Username or URL").get_user_info()

Getting a Tweet Detail:

Description:

Get the details of a tweet, including its replies

Required Parameter:

  • Identifier of the Tweet -> Either Tweet URL OR Tweet ID

Output:

  • Type -> dict
  • Structure
  {
    "conversation_threads":[],
    "tweet": {}
  }

Example:

Python 3.7.3 (default, Mar 26 2019, 21:43:19) 
[GCC 8.2.1 20181127] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from tweet import Twitter
>>> trends = Twitter().tweet_detail("https://twitter.com/Microsoft/status/1442542812197801985")
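Building on the documented output structure, the sketch below shows how the two top-level keys can be read. The sample dict is made up for illustration; in the real library the entries are objects, not plain dicts, and the inner shape of each thread entry is an assumption:

```python
# Sample of the documented tweet_detail() output shape; the thread
# entries here are placeholders, not the library's real objects.
detail = {
    "conversation_threads": [{"id": 1}, {"id": 2}],
    "tweet": {"id": "1442542812197801985"},
}

tweet = detail["tweet"]
replies = detail["conversation_threads"]
print(len(replies))  # 2
```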

Updates:

Update 0.1:

  • Get multiple pages of tweets using the pages parameter in the get_tweets() function
  • The output of get_tweets() has been reworked

Update 0.2:

Update 0.2.1:

  • Fixed Hashtag Search
Comments
  • tweet.card: Card without choices throws: 'NoneType' has no len()

    Just started, yet I get an error when trying to access tweet.card on a card that has no choices:

    Traceback (most recent call last):
      File "/Users/tom/Dev/test/gayTwitter/guys.py", line 20, in <module>
        print(tweet.card)
      File "/opt/homebrew/lib/python3.10/site-packages/tweety/types/twDataTypes.py", line 610, in __repr__
        return f"Card(id={self.rest_id}, choices={len(self.choices) if self.choices else []}, end_time={self.end_time}, duration={len(self.duration)} minutes)"
    TypeError: object of type 'NoneType' has no len()

    opened by thomasf1 7
  • ValueError: sheet is not in list

    I'm running a very simple script to scrape and publish a file with .to_xlsx(), but every time I do this I get the error in the title of this issue.

    See the script I am running below: [screenshot]

    If I scrape and print each tweet, that works fine. However, if I want to change the file type to xlsx (and ultimately csv after), I am met with this error.

    Please provide guidance if I am missing something!

    opened by rm0nroe 5
  • Get quoted tweet

    I can see how to get the retweet posts, but the API seems to have no way of getting quoted tweets.

        tweets_iter = Twitter(nickname).get_tweets()
        for post in tweets_iter:
            msg = post.tweet_body
            timest = post.created_on
            print(post.is_retweet, post.author.screen_name, msg, post.threads)
    

    Even this code will display the person who retweeted the post when is_retweet is True, but not the original poster.

    opened by narodnik 3
  • pip install tweety-ns does not install latest versions

    After installing with pip install tweety-ns and pip3 install tweety-ns, I am met with this error:

    ModuleNotFoundError: No module named 'tweety.bot'; 'tweety' is not a package

    Please provide guidance. I am simply trying to run the below.

    from tweety.bot import Twitter
    
    app = Twitter("elonmusk")
    
    all_tweets = app.get_tweets()
    for tweet in all_tweets:
        print(tweet)
    
    
    opened by rm0nroe 2
  • Get followers for a user

    Thanks for this great library.

    I went through the user API and there is nothing to get the followers for a user. Is this possible? I noticed you cannot access followers without logging in.

    opened by narodnik 2
  • Rate limit exceeded

    I was trying to scrape a whole user's tweet history and after some minutes I got the following error: "requests.exceptions.JSONDecodeError: [Errno Expecting value] Rate limit exceeded"

    Could you please increase the default delay before moving to the next page of tweets? Ideally there would be a parameter on the get_tweets() function so users can adjust how long time.sleep() waits, without impairing users who can max out their scraping capabilities.

    opened by epremuz 2
  • Error: 'Tweet' object is not subscriptable

    Hello my friend.

    First of all, thanks for the beautiful way of implementing this internal/public Twitter API. I've been testing it for a few weeks, and it seems really nice. Thanks for sharing it with the community.

    I was using version 0.1.2 and it was working fine; however, after updating to 0.2, I'm receiving the following error:

    Erro de conexão/Connection Error : 'Tweet' object is not subscriptable

    Do you have any idea what could be wrong?

    opened by cangarot 2
  • documentation/update readme to fix array reference

    First, thank you for referencing this repository from twitter-scraper #197

    In setup, I found a typo in your readme: it errors because "tweets" is not defined. See below:

    all_tweets = app.get_tweets()
    for tweet in tweets:
    

    for tweet in tweets: should be updated to for tweet in all_tweets:

    Cheers 🍻

    opened by rm0nroe 1
  • Retweet author field doesn't show the actual author of the tweet

    Thanks for updating quoted tweets. That works correctly now. However the retweets don't show the correct author:

    from tweety.bot import Twitter
    
    app = Twitter("cobie")
    
    for tweet in app.get_tweets():
        if tweet.is_retweet:
            print("RT")
        print(tweet.author)
        print(tweet.text)
        #print(tweet)
        print()
    
    

    This will show:

    ...
    
    RT
    User(id=2259434528, name=Cobie, username=cobie, followers=780689, verified=True)
    BREAKING: Bank of International Settlements finalizes policy to let banks hold 2% of reserves in #Bitcoin
    
    ...
    

    When it should actually have the username as the person who wrote the tweet (not the person who RT).

    opened by narodnik 1
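One of the comments above reports "Rate limit exceeded" errors when paging through a long tweet history. A possible caller-side workaround is to retry with an increasing delay. This is only a sketch, not part of the tweety API: fetch_page below is a hypothetical stand-in for whatever call hits the rate limit, demonstrated here with a stub that fails twice before succeeding:

```python
import time

def fetch_with_backoff(fetch_page, max_retries=5, base_delay=1.0):
    """Call fetch_page(), sleeping longer after each failure."""
    for attempt in range(max_retries):
        try:
            return fetch_page()
        except Exception:
            if attempt == max_retries - 1:
                raise
            # Exponential backoff: base_delay, 2x, 4x, ...
            time.sleep(base_delay * (2 ** attempt))

# Demo with a stub that fails twice before succeeding.
calls = {"n": 0}

def flaky_page():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("Rate limit exceeded")
    return ["tweet"]

result = fetch_with_backoff(flaky_page, base_delay=0.01)
print(result)  # ['tweet']
```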
Owner

Tayyab Kharl (Newbie But Passionate)