Twitter Scraper

Overview

tweety

Twitter's official API is annoying to work with and has many limitations. Luckily, the Twitter frontend (JavaScript) has its own API, which I reverse-engineered. No API rate limits. No restrictions. Extremely fast.

Prerequisites

Before you begin, ensure you have met the following requirements:

  • Internet Connection
  • Python 3.6+
  • BeautifulSoup (Python Module)
  • Requests (Python Module)

All Functions

  • get_tweets()
  • get_user_info()
  • get_trends() (can be used without username)
  • search() (can be used without username)
  • tweet_detail() (can be used without username)

Using tweety

Getting Tweets:

Description:

Get 20 Tweets of a Twitter User

Required Parameter:

  • Username or User profile URL while initiating the Twitter Object

Optional Parameter:

  • pages : int (default is 1) -> Number of pages of tweets to fetch
  • include_extras : boolean (default is False) -> Also return page extras, such as Topics

Output:

  • Type -> dictionary
  • Structure
    {
      "p-1" : {
        "result": {
            "tweets": []
        }
      },
      "p-2":{
        "result": {
            "tweets": []
        }
      }
    }

Example:

Python 3.7.3 (default, Mar 26 2019, 21:43:19) 
[GCC 8.2.1 20181127] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from tweet import Twitter
>>> all_tweet = Twitter("Username or URL").get_tweets(pages=2)
>>> for i in all_tweet:
...   print(all_tweet[i])
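Given the paged "p-N" structure shown above, the pages can be flattened into a single list of tweets. The sketch below uses a simulated result dict of that shape rather than a live Twitter call, so the tweet entries are placeholders:

```python
# Simulated get_tweets() output with the documented "p-N" page structure.
all_tweet = {
    "p-1": {"result": {"tweets": [{"text": "first"}, {"text": "second"}]}},
    "p-2": {"result": {"tweets": [{"text": "third"}]}},
}

# Flatten every page into one list of tweets.
tweets = [
    tweet
    for page in all_tweet.values()
    for tweet in page["result"]["tweets"]
]

print(len(tweets))  # 3
```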

Getting Trends:

Description:

Get 20 Locale Trends

Output:

  • Type -> dictionary
  • Structure
", "url":" " }, { "name":" ", "url":" " } ] } ">
  {
    "trends":[
      {
        "name":"
      
       "
      ,
        "url":"
      
       "
      
      },
      {
        "name":"
      
       "
      ,
        "url":"
      
       "
      
      }
    ]
  } 

Example :

Python 3.7.3 (default, Mar 26 2019, 21:43:19) 
[GCC 8.2.1 20181127] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from tweet import Twitter
>>> trends = Twitter().get_trends()
>>> for i in trends['trends']:
...   print(i['name'])
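Since get_trends() returns the dictionary structure shown above, filtering it is plain dict work. The sketch below uses made-up sample data of the documented shape (not a live call) to pick out only hashtag trends:

```python
# Sample of the documented get_trends() output shape.
trends = {
    "trends": [
        {"name": "#Python", "url": "https://twitter.com/search?q=%23Python"},
        {"name": "World Cup", "url": "https://twitter.com/search?q=World+Cup"},
    ]
}

# Keep only the trends that are hashtags.
hashtags = [t["name"] for t in trends["trends"] if t["name"].startswith("#")]
print(hashtags)  # ['#Python']
```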

Searching a keyword:

Description:

Get 20 Tweets for a specific Keyword or Hashtag

Required Parameter:

  • keyword : str -> Keyword to search for

Optional Parameter:

  • latest : boolean (Default is False) -> Get the latest tweets

Output:

  • Type -> list

Example:

Python 3.7.3 (default, Mar 26 2019, 21:43:19) 
[GCC 8.2.1 20181127] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from tweet import Twitter
>>> trends = Twitter().search("Pakistan")

Getting USER Info:

Description:

Get the information about the user

Required Parameter:

  • Username or User profile URL while initiating the Twitter Object

Optional Parameter:

  • banner_extensions : boolean (Default is False) -> get more information about user banner image
  • image_extensions : boolean (Default is False) -> get more information about user profile image

Output:

  • Type -> dict

Example:

Python 3.7.3 (default, Mar 26 2019, 21:43:19) 
[GCC 8.2.1 20181127] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from tweet import Twitter
>>> trends = Twitter("Username or URL").get_user_info()

Getting a Tweet Detail:

Description:

Get the details of a tweet, including its replies

Required Parameter:

  • Identifier of the Tweet -> Either Tweet URL OR Tweet ID

Output:

  • Type -> dict
  • Structure
  {
    "conversation_threads":[],
    "tweet": {}
  }

Example:

Python 3.7.3 (default, Mar 26 2019, 21:43:19) 
[GCC 8.2.1 20181127] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from tweet import Twitter
>>> trends = Twitter().tweet_detail("https://twitter.com/Microsoft/status/1442542812197801985")
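Building on the documented output structure, the sketch below shows how the two top-level keys can be read. The sample dict is made up for illustration; in the real library the entries are objects, not plain dicts, and the inner shape of each thread entry is an assumption:

```python
# Sample of the documented tweet_detail() output shape; the thread
# entries here are placeholders, not the library's real objects.
detail = {
    "conversation_threads": [{"id": 1}, {"id": 2}],
    "tweet": {"id": "1442542812197801985"},
}

tweet = detail["tweet"]
replies = detail["conversation_threads"]
print(len(replies))  # 2
```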

Updates:

Update 0.1:

  • Get multiple pages of tweets using the pages parameter in the get_tweets() function
  • The output of get_tweets() has been reworked

Update 0.2:

Update 0.2.1:

  • Fixed Hashtag Search
Comments
  • tweet.card: Card without choices throws: 'NoneType' has no len()

    Just started, yet I get an error when trying to access tweet.card on a card that has no choices:

    Traceback (most recent call last):
      File "/Users/tom/Dev/test/gayTwitter/guys.py", line 20, in <module>
        print(tweet.card)
      File "/opt/homebrew/lib/python3.10/site-packages/tweety/types/twDataTypes.py", line 610, in __repr__
        return f"Card(id={self.rest_id}, choices={len(self.choices) if self.choices else []}, end_time={self.end_time}, duration={len(self.duration)} minutes)"
    TypeError: object of type 'NoneType' has no len()

    opened by thomasf1 7
  • ValueError: sheet is not in list

    I'm running a very simple script to scrape and publish a file with .to_xlsx(), but every time I do this I get the error in the title of this issue.

    See the script I am running below: [screenshot]

    If I scrape and print each tweet, that works fine. However, if I want to change the file type to xlsx (and ultimately csv after), I am met with this error.

    Please provide guidance if I am missing something!

    opened by rm0nroe 5
  • Get quoted tweet

    I can see how to get the retweet posts, but the API seems to have no way of getting quoted tweets.

        tweets_iter = Twitter(nickname).get_tweets()
        for post in tweets_iter:
            msg = post.tweet_body
            timest = post.created_on
            print(post.is_retweet, post.author.screen_name, msg, post.threads)
    

    Even this code will display the person who retweeted the post when is_retweet is True, but not the original poster.

    opened by narodnik 3
  • pip install tweety-ns does not install latest versions

    After installing with pip install tweety-ns and pip3 install tweety-ns, I am met with this error:

    ModuleNotFoundError: No module named 'tweety.bot'; 'tweety' is not a package

    Please provide guidance. I am simply trying to run the below.

    from tweety.bot import Twitter
    
    app = Twitter("elonmusk")
    
    all_tweets = app.get_tweets()
    for tweet in all_tweets:
        print(tweet)
    
    
    opened by rm0nroe 2
  • Get followers for a user

    Thanks for this great library.

    I went through the user API and there is nothing to get the followers for a user. Is this possible? I noticed you cannot access followers without logging in.

    opened by narodnik 2
  • Rate limit exceeded

    I was trying to scrape a whole user's tweet history and after some minutes I got the following error: "requests.exceptions.JSONDecodeError: [Errno Expecting value] Rate limit exceeded"

    Could you please increase the default delay before moving to the next page of tweets? Ideally there would be a parameter on the get_tweets() function so users can adjust how long time.sleep() waits, without impairing users who can max out their scraping capabilities.

    opened by epremuz 2
  • Error: 'Tweet' object is not subscriptable

    Hello my friend.

    First of all, thanks for the beautiful way of implementing this internal/public Twitter API. I've been testing it for a few weeks, and it seems really nice. Thanks for sharing it with the community.

    I was using version 0.1.2 and it was working fine; however, after updating to 0.2, I'm receiving the following error:

    Erro de conexão/Connection Error : 'Tweet' object is not subscriptable

    Do you have any idea what could be wrong?

    opened by cangarot 2
  • documentation/update readme to fix array reference

    First, thank you for referencing this repository from twitter-scraper #197

    In setup, I found a typo in your readme: it errors because "tweets" is not defined. See below:

    all_tweets = app.get_tweets()
    for tweet in tweets:
    

    for tweet in tweets: should be updated to for tweet in all_tweets:

    Cheers 🍻

    opened by rm0nroe 1
  • Retweet author field doesn't show the actual author of the tweet

    Thanks for updating quoted tweets. That works correctly now. However the retweets don't show the correct author:

    from tweety.bot import Twitter
    
    app = Twitter("cobie")
    
    for tweet in app.get_tweets():
        if tweet.is_retweet:
            print("RT")
        print(tweet.author)
        print(tweet.text)
        #print(tweet)
        print()
    
    

    This will show:

    ...
    
    RT
    User(id=2259434528, name=Cobie, username=cobie, followers=780689, verified=True)
    BREAKING: Bank of International Settlements finalizes policy to let banks hold 2% of reserves in #Bitcoin
    
    ...
    

    When it should actually have the username as the person who wrote the tweet (not the person who RT).

    opened by narodnik 1
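One of the comments above reports "Rate limit exceeded" errors when paging through a long tweet history. A possible caller-side workaround is to retry with an increasing delay. This is only a sketch, not part of the tweety API: fetch_page below is a hypothetical stand-in for whatever call hits the rate limit, demonstrated here with a stub that fails twice before succeeding:

```python
import time

def fetch_with_backoff(fetch_page, max_retries=5, base_delay=1.0):
    """Call fetch_page(), sleeping longer after each failure."""
    for attempt in range(max_retries):
        try:
            return fetch_page()
        except Exception:
            if attempt == max_retries - 1:
                raise
            # Exponential backoff: base_delay, 2x, 4x, ...
            time.sleep(base_delay * (2 ** attempt))

# Demo with a stub that fails twice before succeeding.
calls = {"n": 0}

def flaky_page():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("Rate limit exceeded")
    return ["tweet"]

result = fetch_with_backoff(flaky_page, base_delay=0.01)
print(result)  # ['tweet']
```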
Owner

Tayyab Kharl (Newbie But Passionate)