A command line tool (and Python library) for archiving Twitter JSON

twarc

Collect data at the command line from the Twitter API (v1.1 and v2).

Contributing

Documentation

The documentation is managed at ReadTheDocs. If you would like to improve the documentation you can edit the Markdown files in docs or add new ones. Then send a pull request and we can add it.

To view your documentation locally you should be able to:

pip install -r requirements-mkdocs.txt
mkdocs serve
open http://127.0.0.1:8000/

If you prefer, you can create a page on the wiki to workshop the documentation, and then, when you think it is ready to be merged, create an issue. Please feel free to create whatever documentation is useful in the wiki area.

Code

If you are interested in adding functionality to twarc or fixing something that's broken, here are the steps for setting up your development environment:

git clone https://github.com/docnow/twarc
cd twarc
pip install -r requirements.txt

Create a .env file that includes the Twitter App keys to use during testing:

BEARER_TOKEN=CHANGEME
CONSUMER_KEY=CHANGEME
CONSUMER_SECRET=CHANGEME
ACCESS_TOKEN=CHANGEME
ACCESS_TOKEN_SECRET=CHANGEME
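
A quick check like the following can catch leftover placeholder values before the test suite runs. It is a minimal sketch and not part of twarc: the helper names `parse_env` and `missing_keys` are hypothetical, and it assumes the plain KEY=VALUE layout shown above.

```python
REQUIRED_KEYS = (
    "BEARER_TOKEN",
    "CONSUMER_KEY",
    "CONSUMER_SECRET",
    "ACCESS_TOKEN",
    "ACCESS_TOKEN_SECRET",
)


def parse_env(text):
    """Parse KEY=VALUE lines, ignoring blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def missing_keys(text):
    """Return the required keys that are absent, empty, or still CHANGEME."""
    values = parse_env(text)
    return [k for k in REQUIRED_KEYS if values.get(k, "") in ("", "CHANGEME")]
```

Calling `missing_keys(open(".env").read())` returns an empty list once every key has a real value.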

Now run the tests:

python setup.py test

Add your code and some new tests, and send a pull request!

Comments
  • twarc2 search without configure on Windows throws JSON parse error

    I ran the request below: twarc2 search '#ENDSARS-is:retweet' --start-time 2017-12-01 --end-time 2020-11-30 --flatten --archive C:\Users\USER\Desktop\MyTwarcResults.json

    and I got this error message below:

    Traceback (most recent call last):
      File "C:\Users\USER\PycharmProjects\workspace\venv\Scripts\twarc2-script.py", line 33, in <module>
        sys.exit(load_entry_point('twarc==2.0.6', 'console_scripts', 'twarc2')())
      File "c:\users\user\pycharmprojects\workspace\venv\lib\site-packages\click\core.py", line 829, in __call__
        return self.main(*args, **kwargs)
      File "c:\users\user\pycharmprojects\workspace\venv\lib\site-packages\click\core.py", line 782, in main
        rv = self.invoke(ctx)
      File "c:\users\user\pycharmprojects\workspace\venv\lib\site-packages\click\core.py", line 1259, in invoke
        return _process_result(sub_ctx.command.invoke(sub_ctx))
      File "c:\users\user\pycharmprojects\workspace\venv\lib\site-packages\click\core.py", line 1066, in invoke
        return ctx.invoke(self.callback, **ctx.params)
      File "c:\users\user\pycharmprojects\workspace\venv\lib\site-packages\click\core.py", line 610, in invoke
        return callback(*args, **kwargs)
      File "c:\users\user\pycharmprojects\workspace\venv\lib\site-packages\click\decorators.py", line 33, in new_func
        return f(get_current_context().obj, *args, **kwargs)
      File "c:\users\user\pycharmprojects\workspace\venv\lib\site-packages\twarc\decorators.py", line 172, in __call__
        result = e.response.json()
      File "c:\users\user\pycharmprojects\workspace\venv\lib\site-packages\requests\models.py", line 900, in json
        return complexjson.loads(self.text, **kwargs)
      File "C:\Users\USER\AppData\Local\Programs\Python\Python36\lib\json\__init__.py", line 354, in loads
        return _default_decoder.decode(s)
      File "C:\Users\USER\AppData\Local\Programs\Python\Python36\lib\json\decoder.py", line 339, in decode
        obj, end = self.raw_decode(s, idx=_w(s, 0).end())
      File "C:\Users\USER\AppData\Local\Programs\Python\Python36\lib\json\decoder.py", line 357, in raw_decode
        raise JSONDecodeError("Expecting value", s, err.value) from None
    json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
    

    What exactly is the cause of this error, and how can I get help?

    opened by osemele 62
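
The last frames of the traceback above show `e.response.json()` raising `JSONDecodeError`, which is what happens when an error response body is empty or is not JSON. The following is a minimal sketch of a more forgiving way to read such a body. It is not twarc's actual code, and the function name `describe_error_body` is hypothetical.

```python
import json


def describe_error_body(status_code, body_text):
    """Return a readable message for an HTTP error body that may not be JSON."""
    try:
        payload = json.loads(body_text)
    except json.JSONDecodeError:
        # Empty or non-JSON bodies end up here instead of crashing the caller.
        snippet = body_text.strip()[:200] or "<empty body>"
        return f"HTTP {status_code}: non-JSON response: {snippet}"
    return f"HTTP {status_code}: {json.dumps(payload, sort_keys=True)}"
```

With this shape, the caller sees the status code and whatever text the server sent, rather than a parse error that hides the underlying problem.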
  • unable to parse 400 error as json: Bad request

    Hello everyone, I am new to programming and I have been working with twarc for two days in order to download tweets for my research. Note that I have access to the Academic Research product track (v2).

    I’m doing:

    pip install --upgrade twarc
    twarc2 configure (I paste the Bearer Token)

    Your keys have been written to …some folder…config etc etc and I get the message: Happy twarcing!
    Everything seemed to be fine!

    When I run: twarc2 search --archive --start-time 2020-04-01 --end-time 2021-03-31 "(#coronahysterie OR #coronaluege OR #wirwerdenalledasein) lang:de" results.jsonl

    It crashes with an error: Unable to parse 400 error as JSON: Bad Request

    I also tried: twarc2 search climatestrike > tweets.jsonl , but I get the same error. I changed hashtags and dates but the problem remains.

    Can someone help me?? Thank you in advance! Sofia

    opened by sofiavlachou28 38
  • Dehydrated file empty

    I collected tweets using twarc, and when I then run the dehydrate command the .txt file is empty; there is no output. The result is the same with every file I have tried.

    opened by SCrockfo 33
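
Conceptually, dehydration reduces each line of tweet JSON to its id, so an empty result means no line yielded an id. The sketch below shows that idea under stated assumptions and is not twarc's implementation: the function name `dehydrate` is used here only for illustration, the input is assumed to hold one JSON object per line, and the fallback from `id_str` to `id` is a guess at other payload shapes.

```python
import json


def dehydrate(lines):
    """Yield the tweet id from each line of tweet JSON, skipping other lines."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        tweet = json.loads(line)
        if not isinstance(tweet, dict):
            # Lines that are not JSON objects carry no tweet to dehydrate.
            continue
        tweet_id = tweet.get("id_str") or tweet.get("id")
        if tweet_id is not None:
            yield str(tweet_id)
```

Running it over a collected file and counting the ids is a quick way to confirm whether the input actually contains tweet objects.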
  • Labs endpoints

    Twitter is developing new API endpoints with all new JSON payloads as part of their Labs environment:

    https://developer.twitter.com/en/account/labs

    I think it might make sense to slowly develop a branch of twarc that uses these endpoints instead of the standard ones that are available now? It's not clear yet when (or if) the current API will be turned off. Once that is clearer I think this issue may take on some urgency.

    opened by edsu 25
  • Error: cannot import name 'Twarc'

    Hi! I'm pretty new to python and twitter; I'm trying to run the sample code to use twarc as a library to collect tweets, but I keep getting ImportError: cannot import name 'Twarc' whenever I run the code. Any ideas what I'm doing wrong?

    Thanks!

    opened by jte229 24
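
In Python generally, an ImportError like this can occur when a local file or folder has the same name as the package and is imported in its place. Whether that is the cause here is not confirmed by the report. The sketch below uses the standard library to show where a module name resolves. The helper names `locate_module` and `local_shadow` are hypothetical.

```python
import importlib.util
import os


def locate_module(name):
    """Return the file Python would import `name` from, or None if not found."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None
    return spec.origin


def local_shadow(name, directory="."):
    """Return a local file or folder that could shadow `name`, if any."""
    for candidate in (f"{name}.py", name):
        path = os.path.join(directory, candidate)
        if os.path.exists(path):
            return path
    return None
```

If `locate_module("twarc")` points into the working directory rather than site-packages, renaming the local script is the usual remedy.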
  • Archive & Hydrate failures getting OpenSSL.SSL.SysCallError: (104, 'ECONNRESET') by Twitter

    For a few weeks now, Archive.py and Twarc --Hydrate have very often failed unnoticed when launched in the background with &. No traceback is written, even when output is redirected to files. But launching both interactively I get this:

    Traceback (most recent call last):
      File "/usr/local/bin/twarc.py", line 4, in <module>
        __import__('pkg_resources').run_script('twarc==0.3.0', 'twarc.py')
      File "/usr/local/lib/python2.7/dist-packages/pkg_resources/__init__.py", line 729, in run_script
        self.require(requires)[0].run_script(script_name, ns)
      File "/usr/local/lib/python2.7/dist-packages/pkg_resources/__init__.py", line 1649, in run_script
        exec(script_code, namespace, namespace)
      File "/usr/local/lib/python2.7/dist-packages/twarc-0.3.0-py2.7.egg/EGG-INFO/scripts/twarc.py", line 335, in <module>
    
      File "/usr/local/lib/python2.7/dist-packages/twarc-0.3.0-py2.7.egg/EGG-INFO/scripts/twarc.py", line 109, in main
    
      File "/usr/local/lib/python2.7/dist-packages/twarc-0.3.0-py2.7.egg/EGG-INFO/scripts/twarc.py", line 298, in hydrate
    
      File "/usr/local/lib/python2.7/dist-packages/twarc-0.3.0-py2.7.egg/EGG-INFO/scripts/twarc.py", line 172, in new_f
    
      File "/usr/local/lib/python2.7/dist-packages/twarc-0.3.0-py2.7.egg/EGG-INFO/scripts/twarc.py", line 323, in post
    
      File "/usr/local/lib/python2.7/dist-packages/requests/sessions.py", line 507, in post
        return self.request('POST', url, data=data, json=json, **kwargs)
      File "/usr/local/lib/python2.7/dist-packages/requests/sessions.py", line 464, in request
        resp = self.send(prep, **send_kwargs)
      File "/usr/local/lib/python2.7/dist-packages/requests/sessions.py", line 576, in send
        r = adapter.send(request, **kwargs)
      File "/usr/local/lib/python2.7/dist-packages/requests/adapters.py", line 370, in send
        timeout=timeout
      File "/usr/local/lib/python2.7/dist-packages/requests/packages/urllib3/connectionpool.py", line 544, in urlopen
        body=body, headers=headers)
      File "/usr/local/lib/python2.7/dist-packages/requests/packages/urllib3/connectionpool.py", line 372, in _make_request
        httplib_response = conn.getresponse(buffering=True)
      File "/usr/lib/python2.7/httplib.py", line 1034, in getresponse
        response.begin()
      File "/usr/lib/python2.7/httplib.py", line 407, in begin
        version, status, reason = self._read_status()
      File "/usr/lib/python2.7/httplib.py", line 365, in _read_status
        line = self.fp.readline()
      File "/usr/lib/python2.7/socket.py", line 447, in readline
        data = self._sock.recv(self._rbufsize)
      File "/usr/local/lib/python2.7/dist-packages/requests/packages/urllib3/contrib/pyopenssl.py", line 188, in recv
        data = self.connection.recv(*args, **kwargs)
      File "/usr/local/lib/python2.7/dist-packages/pyOpenSSL-0.14-py2.7.egg/OpenSSL/SSL.py", line 995, in recv
        self._raise_ssl_error(self._ssl, result)
      File "/usr/local/lib/python2.7/dist-packages/pyOpenSSL-0.14-py2.7.egg/OpenSSL/SSL.py", line 862, in _raise_ssl_error
        raise SysCallError(errno, errorcode[errno])
    OpenSSL.SSL.SysCallError: (104, 'ECONNRESET')
    

    The systems used (Debian and OS X, all fully updated) have no I/O or network issues, and Twarc is the current v0.3.0. How can I help in finding a solution?

    opened by remagio 24
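
A connection reset is a transient network failure, so one common way to cope with it is to retry the request with a growing delay. The following is a generic sketch of that technique, not a description of how twarc handles it. The `retry` helper and its parameters are hypothetical, and a caller would need to pass in whichever exception classes its HTTP stack actually raises.

```python
import time


def retry(func, attempts=5, base_delay=1.0,
          exceptions=(ConnectionError,), sleep=time.sleep):
    """Call func(), retrying on the given exceptions with exponential backoff."""
    for attempt in range(attempts):
        try:
            return func()
        except exceptions:
            if attempt == attempts - 1:
                # Out of attempts: let the last error propagate.
                raise
            sleep(base_delay * (2 ** attempt))
```

The `sleep` parameter exists so the delay can be replaced in tests. In normal use the default `time.sleep` applies.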
  • conversations module "sleep" inconsistencies

    Hi there, I feel like I'm running into a similar issue as described here: https://twittercommunity.com/t/inconsistent-rate-limit-academic-research-full-archive-search/162928/18?u=igorbrigadir

    and here: https://github.com/DocNow/twarc/pull/578.

    I too am fetching all tweets related to a conversation id using the twarc2 command: twarc2 conversations --archive input_conversation_ids.txt output_conversation_tweets.jsonl

    But, I'm finding that it is doing far fewer than the 300 requests per 15 minutes that my academic Twitter API account has been allotted.

    I'm using the latest version of twarc 2.8.2

    Here is some of the log that I'm seeing. Notice that at 10:14:24 it stops, and then doesn't restart until 11:23:27 for no clear reason.

    2022-01-01 10:04:32,148 INFO fetching conversation 290902007206772736
    2022-01-01 10:04:32,148 INFO getting ('https://api.twitter.com/2/tweets/search/all',) {'params': {'expansions': 'author_id,in_reply_to_user_id,referenced_tweets.id,referenced_tweets.id.author_id,entities.mentions.username,attachments.poll_ids,attachments.media_keys,geo.place_id', 'tweet.fields': 'attachments,author_id,context_annotations,conversation_id,created_at,entities,geo,id,in_reply_to_user_id,lang,public_metrics,text,possibly_sensitive,referenced_tweets,reply_settings,source,withheld', 'user.fields': 'created_at,description,entities,id,location,name,pinned_tweet_id,profile_image_url,protected,public_metrics,url,username,verified,withheld', 'media.fields': 'alt_text,duration_ms,height,media_key,preview_image_url,type,url,width,public_metrics', 'poll.fields': 'duration_minutes,end_datetime,id,options,voting_status', 'place.fields': 'contained_within,country,country_code,full_name,geo,id,name,place_type', 'start_time': '2006-03-21T00:00:00+00:00', 'end_time': None, 'query': 'conversation_id:290902007206772736', 'max_results': 100}}
    2022-01-01 10:04:32,174 WARNING rate limit exceeded: sleeping 591.8250887393951 secs
    2022-01-01 10:14:24,003 INFO getting ('https://api.twitter.com/2/tweets/search/all',) {'params': {'expansions': 'author_id,in_reply_to_user_id,referenced_tweets.id,referenced_tweets.id.author_id,entities.mentions.username,attachments.poll_ids,attachments.media_keys,geo.place_id', 'tweet.fields': 'attachments,author_id,context_annotations,conversation_id,created_at,entities,geo,id,in_reply_to_user_id,lang,public_metrics,text,possibly_sensitive,referenced_tweets,reply_settings,source,withheld', 'user.fields': 'created_at,description,entities,id,location,name,pinned_tweet_id,profile_image_url,protected,public_metrics,url,username,verified,withheld', 'media.fields': 'alt_text,duration_ms,height,media_key,preview_image_url,type,url,width,public_metrics', 'poll.fields': 'duration_minutes,end_datetime,id,options,voting_status', 'place.fields': 'contained_within,country,country_code,full_name,geo,id,name,place_type', 'start_time': '2006-03-21T00:00:00+00:00', 'end_time': None, 'query': 'conversation_id:290902007206772736', 'max_results': 100}}
    2022-01-01 10:14:24,117 INFO Retrieved an empty page of results.
    2022-01-01 10:14:24,117 INFO No more results for search conversation_id:290902007206772736.
    2022-01-01 11:23:27,550 INFO fetching conversation 290902009131966464
    2022-01-01 11:23:27,551 INFO getting ('https://api.twitter.com/2/tweets/search/all',) {'params': {'expansions': 'author_id,in_reply_to_user_id,referenced_tweets.id,referenced_tweets.id.author_id,entities.mentions.username,attachments.poll_ids,attachments.media_keys,geo.place_id', 'tweet.fields': 'attachments,author_id,context_annotations,conversation_id,created_at,entities,geo,id,in_reply_to_user_id,lang,public_metrics,text,possibly_sensitive,referenced_tweets,reply_settings,source,withheld', 'user.fields': 'created_at,description,entities,id,location,name,pinned_tweet_id,profile_image_url,protected,public_metrics,url,username,verified,withheld', 'media.fields': 'alt_text,duration_ms,height,media_key,preview_image_url,type,url,width,public_metrics', 'poll.fields': 'duration_minutes,end_datetime,id,options,voting_status', 'place.fields': 'contained_within,country,country_code,full_name,geo,id,name,place_type', 'start_time': '2006-03-21T00:00:00+00:00', 'end_time': None, 'query': 'conversation_id:290902009131966464', 'max_results': 100}}
    2022-01-01 11:23:28,605 INFO Retrieved an empty page of results.
    2022-01-01 11:23:28,605 INFO No more results for search conversation_id:290902009131966464.
    2022-01-01 11:23:28,607 INFO fetching conversation 290902010599964672
    

    I'm also getting a lot of these warnings about "overlong sleep interval"s:

    2022-01-01 15:53:56,688 WARNING Detected overlong sleep interval - is your system clock accurate? An accurate system time is needed to calculate how long to sleep for, and data collection might be slowed.
    2022-01-01 15:53:56,688 WARNING rate limit exceeded: sleeping 901 secs
    2022-01-01 16:08:57,693 INFO getting ('https://api.twitter.com/2/tweets/search/all',) {'params': {'expansions': 'author_id,in_reply_to_user_id,referenced_tweets.id,referenced_tweets.id.author_id,entities.mentions.username,attachments.poll_ids,attachments.media_keys,geo.place_id', 'tweet.fields': 'attachments,author_id,context_annotations,conversation_id,created_at,entities,geo,id,in_reply_to_user_id,lang,public_metrics,text,possibly_sensitive,referenced_tweets,reply_settings,source,withheld', 'user.fields': 'created_at,description,entities,id,location,name,pinned_tweet_id,profile_image_url,protected,public_metrics,url,username,verified,withheld', 'media.fields': 'alt_text,duration_ms,height,media_key,preview_image_url,type,url,width,public_metrics', 'poll.fields': 'duration_minutes,end_datetime,id,options,voting_status', 'place.fields': 'contained_within,country,country_code,full_name,geo,id,name,place_type', 'start_time': '2006-03-21T00:00:00+00:00', 'end_time': None, 'query': 'conversation_id:291011973909467136', 'max_results': 100}}
    2022-01-01 16:08:57,795 INFO Retrieved an empty page of results.
    2022-01-01 16:08:57,795 INFO No more results for search conversation_id:291011973909467136.
    2022-01-01 16:08:57,795 INFO fetching conversation 291018004416839680
    2022-01-01 16:08:57,796 INFO getting ('https://api.twitter.com/2/tweets/search/all',) {'params': {'expansions': 'author_id,in_reply_to_user_id,referenced_tweets.id,referenced_tweets.id.author_id,entities.mentions.username,attachments.poll_ids,attachments.media_keys,geo.place_id', 'tweet.fields': 'attachments,author_id,context_annotations,conversation_id,created_at,entities,geo,id,in_reply_to_user_id,lang,public_metrics,text,possibly_sensitive,referenced_tweets,reply_settings,source,withheld', 'user.fields': 'created_at,description,entities,id,location,name,pinned_tweet_id,profile_image_url,protected,public_metrics,url,username,verified,withheld', 'media.fields': 'alt_text,duration_ms,height,media_key,preview_image_url,type,url,width,public_metrics', 'poll.fields': 'duration_minutes,end_datetime,id,options,voting_status', 'place.fields': 'contained_within,country,country_code,full_name,geo,id,name,place_type', 'start_time': '2006-03-21T00:00:00+00:00', 'end_time': None, 'query': 'conversation_id:291018004416839680', 'max_results': 100}}
    2022-01-01 16:08:57,820 WARNING Detected overlong sleep interval - is your system clock accurate? An accurate system time is needed to calculate how long to sleep for, and data collection might be slowed.
    2022-01-01 16:08:57,820 WARNING rate limit exceeded: sleeping 901 secs
    2022-01-01 16:23:58,834 INFO getting ('https://api.twitter.com/2/tweets/search/all',) {'params': {'expansions': 'author_id,in_reply_to_user_id,referenced_tweets.id,referenced_tweets.id.author_id,entities.mentions.username,attachments.poll_ids,attachments.media_keys,geo.place_id', 'tweet.fields': 'attachments,author_id,context_annotations,conversation_id,created_at,entities,geo,id,in_reply_to_user_id,lang,public_metrics,text,possibly_sensitive,referenced_tweets,reply_settings,source,withheld', 'user.fields': 'created_at,description,entities,id,location,name,pinned_tweet_id,profile_image_url,protected,public_metrics,url,username,verified,withheld', 'media.fields': 'alt_text,duration_ms,height,media_key,preview_image_url,type,url,width,public_metrics', 'poll.fields': 'duration_minutes,end_datetime,id,options,voting_status', 'place.fields': 'contained_within,country,country_code,full_name,geo,id,name,place_type', 'start_time': '2006-03-21T00:00:00+00:00', 'end_time': None, 'query': 'conversation_id:291018004416839680', 'max_results': 100}}
    2022-01-01 16:23:58,967 INFO Retrieved an empty page of results.
    2022-01-01 16:23:58,967 INFO No more results for search conversation_id:291018004416839680.
    2022-01-01 16:23:58,968 INFO fetching conversation 291062323647500288
    2022-01-01 16:23:58,968 INFO getting ('https://api.twitter.com/2/tweets/search/all',) {'params': {'expansions': 'author_id,in_reply_to_user_id,referenced_tweets.id,referenced_tweets.id.author_id,entities.mentions.username,attachments.poll_ids,attachments.media_keys,geo.place_id', 'tweet.fields': 'attachments,author_id,context_annotations,conversation_id,created_at,entities,geo,id,in_reply_to_user_id,lang,public_metrics,text,possibly_sensitive,referenced_tweets,reply_settings,source,withheld', 'user.fields': 'created_at,description,entities,id,location,name,pinned_tweet_id,profile_image_url,protected,public_metrics,url,username,verified,withheld', 'media.fields': 'alt_text,duration_ms,height,media_key,preview_image_url,type,url,width,public_metrics', 'poll.fields': 'duration_minutes,end_datetime,id,options,voting_status', 'place.fields': 'contained_within,country,country_code,full_name,geo,id,name,place_type', 'start_time': '2006-03-21T00:00:00+00:00', 'end_time': None, 'query': 'conversation_id:291062323647500288', 'max_results': 100}}
    2022-01-01 16:23:58,992 WARNING Detected overlong sleep interval - is your system clock accurate? An accurate system time is needed to calculate how long to sleep for, and data collection might be slowed.
    2022-01-01 16:23:58,992 WARNING rate limit exceeded: sleeping 901 secs
    

    Any help would be appreciated as at this rate, I won't put this dataset together within any reasonable amount of time.

    opened by jonlee112 21
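
The log above shows sleeps of roughly 592 and 901 seconds together with a warning about the system clock. The sketch below illustrates how a sleep interval can be derived from a rate-limit reset timestamp (the Twitter API reports one as epoch seconds in the `x-rate-limit-reset` header) and why a skewed local clock inflates it. This is an assumption-laden illustration and not twarc's code: the function name, the one-second padding, and the cap at one window plus padding (chosen to match the 901 seconds in the log) are all hypothetical.

```python
def seconds_until_reset(reset_epoch, now_epoch, window=900, padding=1):
    """Return (seconds to sleep, overlong flag) for a rate-limit reset time."""
    wait = reset_epoch - now_epoch + padding
    if wait < padding:
        # The reset time has already passed: sleep only the padding.
        wait = padding
    overlong = wait > window + padding
    if overlong:
        # A wait longer than one window suggests the local clock is off.
        wait = window + padding
    return wait, overlong
```

For scale, 300 requests per 15-minute window averages one request every 3 seconds, so a full-window sleep after a single request, as in the log, costs far more than the quota itself requires.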
  • deleted.py

    What's the usage command for deleted.py? I've been using the command

    python utils/deleted.py election_data.txt > election_deleted.jsonl

    where election_data is the dehydration output of tweet ids from an election dataset. I keep getting this error:

    Traceback (most recent call last):
      File "utils/deleted.py", line 31, in <module>
        for t in missing(tweets):
      File "utils/deleted.py", line 16, in missing
        tweet_ids = [t['id_str'] for t in tweets]
      File "utils/deleted.py", line 16, in <listcomp>
        tweet_ids = [t['id_str'] for t in tweets]
    TypeError: 'int' object is not subscriptable

    opened by ameliameyer 21
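
The traceback shows `t['id_str']` being applied to an int, which is consistent with each input line parsing to a bare number (a dehydrated id) rather than to a tweet object. The sketch below reproduces that distinction. How deleted.py actually reads its input is an assumption here, and the helper name `tweet_ids` is hypothetical.

```python
import json


def tweet_ids(lines):
    """Yield ids from lines holding either tweet JSON objects or bare ids."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        value = json.loads(line)
        if isinstance(value, dict):
            yield value["id_str"]
        else:
            # A bare id parses as an int; indexing it raises TypeError.
            yield str(value)
```

A script that expects full tweet JSON needs the hydrated file as input, whereas a file of ids only works with code that handles the second branch.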
  • replies --recursive doesn't seem to be working

    when I run for this tweet

    https://twitter.com/hey_ciara/status/1082335132818771968

    like twarc --mykeys=keys replies 1082335132818771968 --recursive

    I get the following output, which has 8 tweets and they are not the "thread".

    I understand there is no way to just get main user's tweets but this looks like it's not exhausting the search space. I want this id as the second one (1082335516232687617, 2nd tweet in the thread)

    {"created_at": "Mon Jan 07 17:56:19 +0000 2019", "id": 1082335132818771968, "id_str": "1082335132818771968", "full_text": "Y\u2019all REALLY tryna travel in 2019?? I got you. \n\nWe\u2019re covering everything today - flights, accommodation, activities, EVERYTHING! \n\nHere\u2019s a thread on how I\u2019ve traveled to over 30 countries for the low:", "truncated": false, "display_text_range": [0, 203], "entities": {"hashtags": [], "symbols": [], "user_mentions": [], "urls": []}, "source": "<a href=\"http://twitter.com\" rel=\"nofollow\">Twitter Web Client</a>", "in_reply_to_status_id": null, "in_reply_to_status_id_str": null, "in_reply_to_user_id": null, "in_reply_to_user_id_str": null, "in_reply_to_screen_name": null, "user": {"id": 786360813187571712, "id_str": "786360813187571712", "name": "Ciara Johnson", "screen_name": "hey_ciara", "location": "[email protected]", "description": "solo travel queen | Quit my job to travel the world | travel blog: https://t.co/iaRZsxQmMy. IG: Hey_ciara | Feat: @forbes @essence @elle @cosmopolitan TAMU15", "url": "https://t.co/PoGu3SY7li", "entities": {"url": {"urls": [{"url": "https://t.co/PoGu3SY7li", "expanded_url": "http://www.instagram.com/hey_ciara", "display_url": "instagram.com/hey_ciara", "indices": [0, 23]}]}, "description": {"urls": [{"url": "https://t.co/iaRZsxQmMy", "expanded_url": "http://www.heyciara.com", "display_url": "heyciara.com", "indices": [67, 90]}]}}, "protected": false, "followers_count": 31330, "friends_count": 631, "listed_count": 170, "created_at": "Thu Oct 13 00:20:02 +0000 2016", "favourites_count": 5194, "utc_offset": null, "time_zone": null, "geo_enabled": true, "verified": false, "statuses_count": 4890, "lang": "en", "contributors_enabled": false, "is_translator": false, "is_translation_enabled": false, "profile_background_color": "F5F8FA", "profile_background_image_url": null, "profile_background_image_url_https": null, "profile_background_tile": false, "profile_image_url": 
"http://pbs.twimg.com/profile_images/948180869738565634/8OcWwI-n_normal.jpg", "profile_image_url_https": "https://pbs.twimg.com/profile_images/948180869738565634/8OcWwI-n_normal.jpg", "profile_banner_url": "https://pbs.twimg.com/profile_banners/786360813187571712/1514898912", "profile_image_extensions_alt_text": null, "profile_banner_extensions_alt_text": null, "profile_link_color": "1DA1F2", "profile_sidebar_border_color": "C0DEED", "profile_sidebar_fill_color": "DDEEF6", "profile_text_color": "333333", "profile_use_background_image": true, "has_extended_profile": false, "default_profile": true, "default_profile_image": false, "following": false, "follow_request_sent": false, "notifications": false, "translator_type": "none"}, "geo": null, "coordinates": null, "place": null, "contributors": null, "is_quote_status": false, "retweet_count": 26468, "favorite_count": 97299, "favorited": false, "retweeted": false, "lang": "en"}
    {"created_at": "Mon Feb 04 22:08:03 +0000 2019", "id": 1092545345622589440, "id_str": "1092545345622589440", "full_text": "@hey_ciara Hey I\u2019m trying to go to Colorado over spring break, what is the best website to find cheap flights? Tx-&gt;Co :)", "truncated": false, "display_text_range": [11, 123], "entities": {"hashtags": [], "symbols": [], "user_mentions": [{"screen_name": "hey_ciara", "name": "Ciara Johnson", "id": 786360813187571712, "id_str": "786360813187571712", "indices": [0, 10]}], "urls": []}, "metadata": {"iso_language_code": "en", "result_type": "recent"}, "source": "<a href=\"http://twitter.com/download/iphone\" rel=\"nofollow\">Twitter for iPhone</a>", "in_reply_to_status_id": 1082335132818771968, "in_reply_to_status_id_str": "1082335132818771968", "in_reply_to_user_id": 786360813187571712, "in_reply_to_user_id_str": "786360813187571712", "in_reply_to_screen_name": "hey_ciara", "user": {"id": 3068904452, "id_str": "3068904452", "name": "Deissy \u2728", "screen_name": "Cucuuuy29", "location": "TX \ud83d\ude1c", "description": "- Instagram: iintoxicatex - Snapchat: cucuuuy - Scorpio \u264f\ufe0f", "url": "https://t.co/UeJ4Vw3DNb", "entities": {"url": {"urls": [{"url": "https://t.co/UeJ4Vw3DNb", "expanded_url": "https://youtu.be/UkwsermH0EY", "display_url": "youtu.be/UkwsermH0EY", "indices": [0, 23]}]}, "description": {"urls": []}}, "protected": false, "followers_count": 130, "friends_count": 101, "listed_count": 0, "created_at": "Mon Mar 09 01:37:39 +0000 2015", "favourites_count": 8732, "utc_offset": null, "time_zone": null, "geo_enabled": false, "verified": false, "statuses_count": 2150, "lang": "en", "contributors_enabled": false, "is_translator": false, "is_translation_enabled": false, "profile_background_color": "C0DEED", "profile_background_image_url": "http://abs.twimg.com/images/themes/theme1/bg.png", "profile_background_image_url_https": "https://abs.twimg.com/images/themes/theme1/bg.png", "profile_background_tile": false, 
"profile_image_url": "http://pbs.twimg.com/profile_images/1090821215579705351/ci0x3d1U_normal.jpg", "profile_image_url_https": "https://pbs.twimg.com/profile_images/1090821215579705351/ci0x3d1U_normal.jpg", "profile_banner_url": "https://pbs.twimg.com/profile_banners/3068904452/1529832326", "profile_link_color": "1DA1F2", "profile_sidebar_border_color": "C0DEED", "profile_sidebar_fill_color": "DDEEF6", "profile_text_color": "333333", "profile_use_background_image": true, "has_extended_profile": true, "default_profile": true, "default_profile_image": false, "following": false, "follow_request_sent": false, "notifications": false, "translator_type": "none"}, "geo": null, "coordinates": null, "place": null, "contributors": null, "is_quote_status": false, "retweet_count": 0, "favorite_count": 0, "favorited": false, "retweeted": false, "lang": "en"}
    {"created_at": "Tue Feb 05 16:00:55 +0000 2019", "id": 1092815340210438147, "id_str": "1092815340210438147", "full_text": "@Cucuuuy29 Well, we're slightly biased but... we think you will like to give this a try: https://t.co/l8Eo3ARyhp", "truncated": false, "display_text_range": [11, 112], "entities": {"hashtags": [], "symbols": [], "user_mentions": [{"screen_name": "Cucuuuy29", "name": "Deissy \u2728", "id": 3068904452, "id_str": "3068904452", "indices": [0, 10]}], "urls": [{"url": "https://t.co/l8Eo3ARyhp", "expanded_url": "https://concorde.io", "display_url": "concorde.io", "indices": [89, 112]}]}, "metadata": {"iso_language_code": "en", "result_type": "recent"}, "source": "<a href=\"https://concorde.io\" rel=\"nofollow\">Concorde Helper</a>", "in_reply_to_status_id": 1092545345622589440, "in_reply_to_status_id_str": "1092545345622589440", "in_reply_to_user_id": 3068904452, "in_reply_to_user_id_str": "3068904452", "in_reply_to_screen_name": "Cucuuuy29", "user": {"id": 4249658592, "id_str": "4249658592", "name": "Concorde Cheaps", "screen_name": "concordecheaps", "location": "In the cloud.", "description": "Discover cheap flights with Concorde.", "url": "https://t.co/YZ6V5SLzKh", "entities": {"url": {"urls": [{"url": "https://t.co/YZ6V5SLzKh", "expanded_url": "https://concorde.io", "display_url": "concorde.io", "indices": [0, 23]}]}, "description": {"urls": []}}, "protected": false, "followers_count": 769, "friends_count": 4, "listed_count": 79, "created_at": "Sun Nov 15 21:37:40 +0000 2015", "favourites_count": 49, "utc_offset": null, "time_zone": null, "geo_enabled": false, "verified": false, "statuses_count": 22381, "lang": "en", "contributors_enabled": false, "is_translator": false, "is_translation_enabled": false, "profile_background_color": "C0DEED", "profile_background_image_url": "http://abs.twimg.com/images/themes/theme1/bg.png", "profile_background_image_url_https": "https://abs.twimg.com/images/themes/theme1/bg.png", "profile_background_tile": 
false, "profile_image_url": "http://pbs.twimg.com/profile_images/666008495133626368/vBf8kjXB_normal.png", "profile_image_url_https": "https://pbs.twimg.com/profile_images/666008495133626368/vBf8kjXB_normal.png", "profile_banner_url": "https://pbs.twimg.com/profile_banners/4249658592/1454193154", "profile_link_color": "1DA1F2", "profile_sidebar_border_color": "C0DEED", "profile_sidebar_fill_color": "DDEEF6", "profile_text_color": "333333", "profile_use_background_image": true, "has_extended_profile": false, "default_profile": true, "default_profile_image": false, "following": false, "follow_request_sent": false, "notifications": false, "translator_type": "none"}, "geo": null, "coordinates": null, "place": null, "contributors": null, "is_quote_status": false, "retweet_count": 0, "favorite_count": 1, "favorited": false, "retweeted": false, "possibly_sensitive": false, "lang": "en"}
    {"created_at": "Tue Feb 05 16:29:37 +0000 2019", "id": 1092822563527495680, "id_str": "1092822563527495680", "full_text": "@concordecheaps I can\u2019t figure out your website \ud83d\ude2c but I don\u2019t see any flights from IAH to Colorado? Thank you though.", "truncated": false, "display_text_range": [16, 117], "entities": {"hashtags": [], "symbols": [], "user_mentions": [{"screen_name": "concordecheaps", "name": "Concorde Cheaps", "id": 4249658592, "id_str": "4249658592", "indices": [0, 15]}], "urls": []}, "metadata": {"iso_language_code": "en", "result_type": "recent"}, "source": "<a href=\"http://twitter.com/download/iphone\" rel=\"nofollow\">Twitter for iPhone</a>", "in_reply_to_status_id": 1092815340210438147, "in_reply_to_status_id_str": "1092815340210438147", "in_reply_to_user_id": 4249658592, "in_reply_to_user_id_str": "4249658592", "in_reply_to_screen_name": "concordecheaps", "user": {"id": 3068904452, "id_str": "3068904452", "name": "Deissy \u2728", "screen_name": "Cucuuuy29", "location": "TX \ud83d\ude1c", "description": "- Instagram: iintoxicatex - Snapchat: cucuuuy - Scorpio \u264f\ufe0f", "url": "https://t.co/UeJ4Vw3DNb", "entities": {"url": {"urls": [{"url": "https://t.co/UeJ4Vw3DNb", "expanded_url": "https://youtu.be/UkwsermH0EY", "display_url": "youtu.be/UkwsermH0EY", "indices": [0, 23]}]}, "description": {"urls": []}}, "protected": false, "followers_count": 130, "friends_count": 101, "listed_count": 0, "created_at": "Mon Mar 09 01:37:39 +0000 2015", "favourites_count": 8732, "utc_offset": null, "time_zone": null, "geo_enabled": false, "verified": false, "statuses_count": 2150, "lang": "en", "contributors_enabled": false, "is_translator": false, "is_translation_enabled": false, "profile_background_color": "C0DEED", "profile_background_image_url": "http://abs.twimg.com/images/themes/theme1/bg.png", "profile_background_image_url_https": "https://abs.twimg.com/images/themes/theme1/bg.png", "profile_background_tile": false, "profile_image_url": 
"http://pbs.twimg.com/profile_images/1090821215579705351/ci0x3d1U_normal.jpg", "profile_image_url_https": "https://pbs.twimg.com/profile_images/1090821215579705351/ci0x3d1U_normal.jpg", "profile_banner_url": "https://pbs.twimg.com/profile_banners/3068904452/1529832326", "profile_link_color": "1DA1F2", "profile_sidebar_border_color": "C0DEED", "profile_sidebar_fill_color": "DDEEF6", "profile_text_color": "333333", "profile_use_background_image": true, "has_extended_profile": true, "default_profile": true, "default_profile_image": false, "following": false, "follow_request_sent": false, "notifications": false, "translator_type": "none"}, "geo": null, "coordinates": null, "place": null, "contributors": null, "is_quote_status": false, "retweet_count": 0, "favorite_count": 0, "favorited": false, "retweeted": false, "lang": "en"}
    {"created_at": "Tue Feb 05 13:01:10 +0000 2019", "id": 1092770107196145664, "id_str": "1092770107196145664", "full_text": "@Cucuuuy29 Have you checked out Concorde? Take a look at our latest flight deals: https://t.co/l8Eo3ARyhp", "truncated": false, "display_text_range": [11, 105], "entities": {"hashtags": [], "symbols": [], "user_mentions": [{"screen_name": "Cucuuuy29", "name": "Deissy \u2728", "id": 3068904452, "id_str": "3068904452", "indices": [0, 10]}], "urls": [{"url": "https://t.co/l8Eo3ARyhp", "expanded_url": "https://concorde.io", "display_url": "concorde.io", "indices": [82, 105]}]}, "metadata": {"iso_language_code": "en", "result_type": "recent"}, "source": "<a href=\"https://concorde.io\" rel=\"nofollow\">Concorde Helper</a>", "in_reply_to_status_id": 1092545345622589440, "in_reply_to_status_id_str": "1092545345622589440", "in_reply_to_user_id": 3068904452, "in_reply_to_user_id_str": "3068904452", "in_reply_to_screen_name": "Cucuuuy29", "user": {"id": 4249658592, "id_str": "4249658592", "name": "Concorde Cheaps", "screen_name": "concordecheaps", "location": "In the cloud.", "description": "Discover cheap flights with Concorde.", "url": "https://t.co/YZ6V5SLzKh", "entities": {"url": {"urls": [{"url": "https://t.co/YZ6V5SLzKh", "expanded_url": "https://concorde.io", "display_url": "concorde.io", "indices": [0, 23]}]}, "description": {"urls": []}}, "protected": false, "followers_count": 769, "friends_count": 4, "listed_count": 79, "created_at": "Sun Nov 15 21:37:40 +0000 2015", "favourites_count": 49, "utc_offset": null, "time_zone": null, "geo_enabled": false, "verified": false, "statuses_count": 22381, "lang": "en", "contributors_enabled": false, "is_translator": false, "is_translation_enabled": false, "profile_background_color": "C0DEED", "profile_background_image_url": "http://abs.twimg.com/images/themes/theme1/bg.png", "profile_background_image_url_https": "https://abs.twimg.com/images/themes/theme1/bg.png", "profile_background_tile": false, 
"profile_image_url": "http://pbs.twimg.com/profile_images/666008495133626368/vBf8kjXB_normal.png", "profile_image_url_https": "https://pbs.twimg.com/profile_images/666008495133626368/vBf8kjXB_normal.png", "profile_banner_url": "https://pbs.twimg.com/profile_banners/4249658592/1454193154", "profile_link_color": "1DA1F2", "profile_sidebar_border_color": "C0DEED", "profile_sidebar_fill_color": "DDEEF6", "profile_text_color": "333333", "profile_use_background_image": true, "has_extended_profile": false, "default_profile": true, "default_profile_image": false, "following": false, "follow_request_sent": false, "notifications": false, "translator_type": "none"}, "geo": null, "coordinates": null, "place": null, "contributors": null, "is_quote_status": false, "retweet_count": 0, "favorite_count": 0, "favorited": false, "retweeted": false, "possibly_sensitive": false, "lang": "en"}
    {"created_at": "Mon Feb 04 18:34:34 +0000 2019", "id": 1092491619243302918, "id_str": "1092491619243302918", "full_text": "@hey_ciara @coolkidmiri", "truncated": false, "display_text_range": [11, 23], "entities": {"hashtags": [], "symbols": [], "user_mentions": [{"screen_name": "hey_ciara", "name": "Ciara Johnson", "id": 786360813187571712, "id_str": "786360813187571712", "indices": [0, 10]}, {"screen_name": "coolkidmiri", "name": "miri \ud83c\uddf3\ud83c\uddec", "id": 1053462132262690816, "id_str": "1053462132262690816", "indices": [11, 23]}], "urls": []}, "metadata": {"iso_language_code": "und", "result_type": "recent"}, "source": "<a href=\"http://twitter.com/download/iphone\" rel=\"nofollow\">Twitter for iPhone</a>", "in_reply_to_status_id": 1082335132818771968, "in_reply_to_status_id_str": "1082335132818771968", "in_reply_to_user_id": 786360813187571712, "in_reply_to_user_id_str": "786360813187571712", "in_reply_to_screen_name": "hey_ciara", "user": {"id": 2680409402, "id_str": "2680409402", "name": "nicole", "screen_name": "enimsajn_", "location": "216", "description": "", "url": null, "entities": {"description": {"urls": []}}, "protected": false, "followers_count": 2168, "friends_count": 745, "listed_count": 7, "created_at": "Fri Jul 25 20:26:50 +0000 2014", "favourites_count": 35168, "utc_offset": null, "time_zone": null, "geo_enabled": true, "verified": false, "statuses_count": 35331, "lang": "en", "contributors_enabled": false, "is_translator": false, "is_translation_enabled": false, "profile_background_color": "000000", "profile_background_image_url": "http://abs.twimg.com/images/themes/theme1/bg.png", "profile_background_image_url_https": "https://abs.twimg.com/images/themes/theme1/bg.png", "profile_background_tile": true, "profile_image_url": "http://pbs.twimg.com/profile_images/1092265592822865920/prl8CRTC_normal.jpg", "profile_image_url_https": "https://pbs.twimg.com/profile_images/1092265592822865920/prl8CRTC_normal.jpg", "profile_banner_url": 
"https://pbs.twimg.com/profile_banners/2680409402/1549251387", "profile_link_color": "BDEDFF", "profile_sidebar_border_color": "000000", "profile_sidebar_fill_color": "000000", "profile_text_color": "000000", "profile_use_background_image": true, "has_extended_profile": false, "default_profile": false, "default_profile_image": false, "following": false, "follow_request_sent": false, "notifications": false, "translator_type": "none"}, "geo": null, "coordinates": null, "place": null, "contributors": null, "is_quote_status": false, "retweet_count": 0, "favorite_count": 0, "favorited": false, "retweeted": false, "lang": "und"}
    {"created_at": "Sat Feb 02 12:02:21 +0000 2019", "id": 1091668140012974080, "id_str": "1091668140012974080", "full_text": "@hey_ciara following !", "truncated": false, "display_text_range": [11, 22], "entities": {"hashtags": [], "symbols": [], "user_mentions": [{"screen_name": "hey_ciara", "name": "Ciara Johnson", "id": 786360813187571712, "id_str": "786360813187571712", "indices": [0, 10]}], "urls": []}, "metadata": {"iso_language_code": "en", "result_type": "recent"}, "source": "<a href=\"http://twitter.com/download/iphone\" rel=\"nofollow\">Twitter for iPhone</a>", "in_reply_to_status_id": 1082335132818771968, "in_reply_to_status_id_str": "1082335132818771968", "in_reply_to_user_id": 786360813187571712, "in_reply_to_user_id_str": "786360813187571712", "in_reply_to_screen_name": "hey_ciara", "user": {"id": 28462348, "id_str": "28462348", "name": "Jh\u00e9ani", "screen_name": "iamjheani", "location": "Los Angeles, CA", "description": "pop/r&b princess IG: iamjheani", "url": null, "entities": {"description": {"urls": []}}, "protected": false, "followers_count": 2798, "friends_count": 245, "listed_count": 29, "created_at": "Fri Apr 03 00:46:41 +0000 2009", "favourites_count": 124782, "utc_offset": null, "time_zone": null, "geo_enabled": true, "verified": false, "statuses_count": 43157, "lang": "en", "contributors_enabled": false, "is_translator": false, "is_translation_enabled": false, "profile_background_color": "0E0000", "profile_background_image_url": "http://abs.twimg.com/images/themes/theme6/bg.gif", "profile_background_image_url_https": "https://abs.twimg.com/images/themes/theme6/bg.gif", "profile_background_tile": true, "profile_image_url": "http://pbs.twimg.com/profile_images/1081151604814802944/CCYQBSw4_normal.jpg", "profile_image_url_https": "https://pbs.twimg.com/profile_images/1081151604814802944/CCYQBSw4_normal.jpg", "profile_banner_url": "https://pbs.twimg.com/profile_banners/28462348/1547805519", "profile_link_color": "542403", 
"profile_sidebar_border_color": "000000", "profile_sidebar_fill_color": "010300", "profile_text_color": "4A4730", "profile_use_background_image": true, "has_extended_profile": false, "default_profile": false, "default_profile_image": false, "following": false, "follow_request_sent": false, "notifications": false, "translator_type": "none"}, "geo": null, "coordinates": null, "place": null, "contributors": null, "is_quote_status": false, "retweet_count": 0, "favorite_count": 0, "favorited": false, "retweeted": false, "lang": "en"}
    {"created_at": "Tue Jan 29 01:23:20 +0000 2019", "id": 1090057773369315328, "id_str": "1090057773369315328", "full_text": "@hey_ciara @chanslifee", "truncated": false, "display_text_range": [11, 22], "entities": {"hashtags": [], "symbols": [], "user_mentions": [{"screen_name": "hey_ciara", "name": "Ciara Johnson", "id": 786360813187571712, "id_str": "786360813187571712", "indices": [0, 10]}, {"screen_name": "chanslifee", "name": "babygirlvampedupp", "id": 2872357053, "id_str": "2872357053", "indices": [11, 22]}], "urls": []}, "metadata": {"iso_language_code": "und", "result_type": "recent"}, "source": "<a href=\"http://twitter.com/download/iphone\" rel=\"nofollow\">Twitter for iPhone</a>", "in_reply_to_status_id": 1082335132818771968, "in_reply_to_status_id_str": "1082335132818771968", "in_reply_to_user_id": 786360813187571712, "in_reply_to_user_id_str": "786360813187571712", "in_reply_to_screen_name": "hey_ciara", "user": {"id": 992933584603369474, "id_str": "992933584603369474", "name": "Shay\ud83d\udc97\ud83d\ude80", "screen_name": "Shaunequa5", "location": "United States", "description": "just livin\u2019 life \ud83e\uddd8\ud83c\udffd\u200d\u2640\ufe0f|IG:shay.leeeeeee_", "url": null, "entities": {"description": {"urls": []}}, "protected": false, "followers_count": 90, "friends_count": 81, "listed_count": 0, "created_at": "Sun May 06 01:06:29 +0000 2018", "favourites_count": 8403, "utc_offset": null, "time_zone": null, "geo_enabled": false, "verified": false, "statuses_count": 1260, "lang": "en", "contributors_enabled": false, "is_translator": false, "is_translation_enabled": false, "profile_background_color": "F5F8FA", "profile_background_image_url": null, "profile_background_image_url_https": null, "profile_background_tile": false, "profile_image_url": "http://pbs.twimg.com/profile_images/1091389797145276416/Bf-rGH0g_normal.jpg", "profile_image_url_https": "https://pbs.twimg.com/profile_images/1091389797145276416/Bf-rGH0g_normal.jpg", 
"profile_banner_url": "https://pbs.twimg.com/profile_banners/992933584603369474/1533486045", "profile_link_color": "1DA1F2", "profile_sidebar_border_color": "C0DEED", "profile_sidebar_fill_color": "DDEEF6", "profile_text_color": "333333", "profile_use_background_image": true, "has_extended_profile": true, "default_profile": true, "default_profile_image": false, "following": false, "follow_request_sent": false, "notifications": false, "translator_type": "none"}, "geo": null, "coordinates": null, "place": null, "contributors": null, "is_quote_status": false, "retweet_count": 0, "favorite_count": 1, "favorited": false, "retweeted": false, "lang": "und"}
    
    
    opened by EralpB 20
  • Refactor command line and add manually set fields and expansions

    Refactor command line and add manually set fields and expansions

    Fixes #493 and will also fix #550.

    This also refactors the command2.py click command line options.

    Needs a good bit of testing to make sure all the old commands still work.

    opened by igorbrigadir 19
  • Client forbidden

    Client forbidden

    Hi, I have configured twarc2

    Your keys have been written to /home/aborruso/.config/twarc/config
    
    
    ✨ ✨ ✨  Happy twarcing! ✨ ✨ ✨
    

    But when I run twarc2 conversation 1406852944784412675 I get

    ⚡ Client Forbidden
    

    My log

    2021-06-21 09:41:00,283 INFO using config /home/aborruso/.config/twarc/config
    2021-06-21 09:41:00,283 INFO creating HTTP session headers for app auth.
    2021-06-21 09:41:00,283 INFO getting ('https://api.twitter.com/2/tweets/search/recent',) {'params': {'expansions': 'author_id,in_reply_to_user_id,referenced_tweets.id,referenced_tweets.id.author_id,entities.mentions.username,attachments.poll_ids,attachments.media_keys,geo.place_id', 'user.fields': 'created_at,description,entities,id,location,name,pinned_tweet_id,profile_image_url,protected,public_metrics,url,username,verified,withheld', 'tweet.fields': 'attachments,author_id,context_annotations,conversation_id,created_at,entities,geo,id,in_reply_to_user_id,lang,public_metrics,text,possibly_sensitive,referenced_tweets,reply_settings,source,withheld', 'media.fields': 'duration_ms,height,media_key,preview_image_url,type,url,width,public_metrics', 'poll.fields': 'duration_minutes,end_datetime,id,options,voting_status', 'place.fields': 'contained_within,country,country_code,full_name,geo,id,name,place_type', 'max_results': 100, 'query': 'conversation_id:1406852944784412675'}}
    

    If I run twarc filter covid I get no error. I'm using twarc v2.2.0.

    Thank you

    opened by aborruso 18
  • CLI: Allow to differentiate between 404 and connection timeout

    CLI: Allow to differentiate between 404 and connection timeout

    As I'm struggling to work around the API connection issues with CLI twarc1, I need some way to differentiate unsuccessful completion with "Read timed out" and "404 no users found from the list specified".

    My app deals with accounts that last from a few hours to a few weeks, so "No users found" for users/show is a perfectly legitimate outcome that does not need to be retried or handled in any special way.

    On the contrary, in timeout scenario I need to retry twarc invocation until it succeeds.

    CLI twarc1 returns error code 1 both in case of 404 and connection timeout.

    Is there a way to either make twarc treat 404 as a non-error scenario, or to differentiate between 404 and connection timeout in the CLI utility?
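
One possible workaround from the calling side, sketched in Python. Because the exit code is 1 in both cases, this wrapper inspects the captured output instead. It assumes that the text "Read timed out" shows up in the output of the failing command; whether twarc writes that message to stderr or only to its log file is not confirmed here, so adapt the check accordingly. The names `should_retry` and `run_with_retry` are hypothetical and not part of twarc.

```python
import subprocess
import time

# Assumption: this text appears in the captured output when the request timed out.
TIMEOUT_MARKER = "Read timed out"


def should_retry(returncode, output):
    """Retry only failures whose output mentions a read timeout."""
    return returncode != 0 and TIMEOUT_MARKER in output


def run_with_retry(cmd, max_attempts=5, delay=1.0):
    """Run cmd, retrying on timeouts; return the last (returncode, output)."""
    for attempt in range(1, max_attempts + 1):
        proc = subprocess.run(cmd, capture_output=True, text=True)
        output = proc.stdout + proc.stderr
        if not should_retry(proc.returncode, output) or attempt == max_attempts:
            return proc.returncode, output
        time.sleep(delay)
```

Any other non-zero exit, including the 404 case, is returned to the caller immediately without a retry.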

    opened by antibot4navalny 4
  • Add `include_ext_is_blue_verified=True`

    Add `include_ext_is_blue_verified=True`

    Where we have this:

    https://github.com/DocNow/twarc/blob/cdb03503d3bdb7bf724e9b0edf6009fa0b9acc1a/twarc/client.py#L120

    We should also include this new one. The v1.1 API is no longer maintained, but it seems it is getting a few fields that may be useful. So for those that still have access to v1.1, this will be good to have.

    enhancement 
    opened by igorbrigadir 0
  • Add ability to configure max retries for v1.1 server errors

    Add ability to configure max retries for v1.1 server errors

    Given the increasing instability of the v1.1 API, it's helpful to be able to tune the number of retries for server errors.

    • Adds max_server_error_retries param to the v1.1 client retaining old value of 30 as default value
    • Pass max_server_error_retries through the rate_limit decorator
    • Pass max_server_error_retries through uses of Twarc.get() in client.py

    Given I haven't consulted with anyone on this change, please let me know what you think and I'm very happy to alter it!

    Some questions I have for you all:

    • Do you think the client initialisation is an appropriate place to expose the parameter?
    • Should the parameter be exposed in the CLI as well as in the library?
    • I passed the parameter through all usages of Twarc.get() in client.py, but not in the usages in utils/deletes.py as those are a bit different as they initialise their own client. What are your thoughts on how they should handle the parameter?
    • Should I pass the parameter through Twarc.post() as well so they are also configurable? Should they possibly be a separate parameter for posts rather than gets as they're for different purposes so people may want different values?

    For further context, I'm suggesting this change because we're seeing an increasing number of 500 errors returned from the v1.1 timeline endpoint, which is not surprising given everything that's going on at Twitter. For my usage of the endpoint, I'd much rather skip over that request earlier and move on to my next request - we are finding some requests do reach the 30 retries! I'd imagine that other people may want to retry for longer to increase their chances of getting the data they want.
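
To make the idea concrete, here is a minimal sketch of a retry decorator with a configurable limit. It is not twarc's actual rate_limit decorator: `ServerError` and `retry_server_errors` are hypothetical names used only for illustration, and the default of 30 simply mirrors the value mentioned above.

```python
import functools
import time


class ServerError(Exception):
    """Hypothetical stand-in for an HTTP 5xx response."""


def retry_server_errors(max_retries=30, delay=0.0):
    """Retry the wrapped call up to max_retries times on ServerError."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            retries = 0
            while True:
                try:
                    return func(*args, **kwargs)
                except ServerError:
                    retries += 1
                    if retries > max_retries:
                        raise  # give up and surface the last error
                    time.sleep(delay)
        return wrapper
    return decorator
```

A caller who prefers to move on quickly could decorate with a small `max_retries`, while one who wants to maximise the chance of getting data could raise it.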

    enhancement 
    opened by betsybookwyrm 3
  • Raise a meaningful error when trying to flatten non-tweets

    Raise a meaningful error when trying to flatten non-tweets

    Using Jupyter in VSCode. Several of the twarc functions lead to the entirety of the data being printed to the console, which makes debugging a pain and generally clogs things up.

    Example code that does this:

    from twarc.client2 import Twarc2
    from twarc.expansions import ensure_flattened

    # Placeholder credentials; substitute your own bearer token
    twarc = Twarc2(bearer_token="CHANGEME")

    listOfIds = []  # tweet IDs to look up
    allLikes = []

    for tid in listOfIds:
        search = twarc.liking_users(tid, max_results=100)
        for page in search:
            for profile in ensure_flattened(page):
                # Do something with the profile
                allLikes.append({tid: profile['username']})
    

    This code leads to every returned profile being printed in full to the console. Example of the output here.

    https://imgur.com/a/X82V5K9

    good first issue 
    opened by gdhpearson 2
  • implement home timeline reverse chrono

    implement home timeline reverse chrono

    PR for the feature requested here: https://github.com/DocNow/twarc/issues/639

    Unless I am missing something, the changes I made to the __init__ should allow users to access other v2-only endpoints as well: https://github.com/DocNow/twarc/issues/581

    opened by ntorba 5
  • Option to use full archive search by default

    Option to use full archive search by default

    I would suppose that users that have Academic Research access would typically use the full archive search. However, it is easy to forget the --archive flag, especially since it is not available for all commands.

    It would be preferable to be able to set full archive search as the default option.

    I understand that silently changing the endpoint is not a good idea. Instead, a warning could be displayed if the recent search endpoint is used where full archive search would be available.

    opened by Iseratho 2
Releases(v2.13.0)
  • v2.13.0(Dec 26, 2022)

    What's Changed

    • bump python version to >=3.6 by @igorbrigadir in https://github.com/DocNow/twarc/pull/660
    • Updates to tutorial doc by @boyd-nguyen in https://github.com/DocNow/twarc/pull/665
    • Twarc Tutorial by @SamHames in https://github.com/DocNow/twarc/pull/558
    • Add twarc demo video by @Quiet27 in https://github.com/DocNow/twarc/pull/677
    • Add missing variants field to media by @igorbrigadir in https://github.com/DocNow/twarc/pull/679

    New Contributors

    • @boyd-nguyen made their first contribution in https://github.com/DocNow/twarc/pull/665
    • @Quiet27 made their first contribution in https://github.com/DocNow/twarc/pull/677

    Full Changelog: https://github.com/DocNow/twarc/compare/v2.12.0...v2.13.0

  • v2.12.0(Oct 1, 2022)

    What's Changed

    • Add additional expansions and fields for the new tweet edit related API parameters by @SamHames in https://github.com/DocNow/twarc/pull/657

    Full Changelog: https://github.com/DocNow/twarc/compare/v2.11.3...v2.12.0

  • v2.11.3(Sep 12, 2022)

    What's Changed

    • Add __twarc metadata to list_lookup returned data by @SamHames in https://github.com/DocNow/twarc/pull/654

    Full Changelog: https://github.com/DocNow/twarc/compare/v2.11.2...v2.11.3

  • v2.11.2(Aug 16, 2022)

  • v2.11.1(Jul 18, 2022)

    What's Changed

    • Add sort_order parameter for search api by @mirkolenz in https://github.com/DocNow/twarc/pull/645
    • Append matching rules from stream when flattening by @igorbrigadir in https://github.com/DocNow/twarc/pull/646
    • Fix bug where --max-results could not be set with --no-context-annota… by @SamHames in https://github.com/DocNow/twarc/pull/648

    New Contributors

    • @mirkolenz made their first contribution in https://github.com/DocNow/twarc/pull/645

    Full Changelog: https://github.com/DocNow/twarc/compare/v2.10.4...v2.11.1

  • v2.11.0(Jun 30, 2022)

    What's Changed

    • Add sort_order parameter for search api by @mirkolenz in https://github.com/DocNow/twarc/pull/645
    • Append matching rules from stream when flattening by @igorbrigadir in https://github.com/DocNow/twarc/pull/646

    New Contributors

    • @mirkolenz made their first contribution in https://github.com/DocNow/twarc/pull/645

    Full Changelog: https://github.com/DocNow/twarc/compare/v2.10.4...v2.11.0

  • v2.10.4(Apr 29, 2022)

    This release contains two bug fixes:

    • A fix to the ensure_flattened function that handles valid API responses that contain errors, but no data #627
    • A fix to the v1.1 user_lookup function that raises a useful error when a string is passed, preventing the lookup of single character usernames. Thanks to @hauselin.
  • v2.10.3(Apr 21, 2022)

    This release fixes two issues:

    • reports a meaningful error when the timeline command is called for a user that doesn't exist
    • correctly handles counts when querying for a user that doesn't exist
  • v2.10.2(Apr 1, 2022)

    What's Changed

    • Set daemon attribute instead of using setDaemon method that was deprecated in Python 3.10. by @tirkarthi in https://github.com/DocNow/twarc/pull/617
    • Fix method name typo introduced by refactor by @SamHames in https://github.com/DocNow/twarc/pull/621
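
For readers unfamiliar with the first change, a brief sketch of the two styles in plain Python. This is a generic illustration using the standard library, not the code from the pull request; `worker` is a placeholder function.

```python
import threading


def worker():
    pass


# Preferred: pass daemon=True at construction, or set the attribute.
t = threading.Thread(target=worker, daemon=True)
# Equivalent attribute form (must be set before t.start()):
# t.daemon = True
# Deprecated since Python 3.10: t.setDaemon(True)
t.start()
t.join()
```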

    New Contributors

    • @tirkarthi made their first contribution in https://github.com/DocNow/twarc/pull/617

    Full Changelog: https://github.com/DocNow/twarc/compare/v2.10.1...v2.10.2

  • v2.10.1(Mar 25, 2022)

  • v2.10.0(Mar 23, 2022)

    This release adds support for:

    • all of the list related endpoints via the twarc2 lists subcommands and associated client methods.
    • the new quote tweet endpoint via the twarc2 quotes command and associated client methods.
  • v2.9.5(Mar 4, 2022)

    This release adds a workaround for a bug in Twitter's Counts API endpoint which was resulting in the twarc2 counts command stopping prematurely. Thanks to @melaniewalsh and @SamHames for the detective work! See #602 for the story.

  • v2.9.4(Feb 24, 2022)

    This release is functionally identical to v2.9.3, which contained a bugfix for an issue with the streaming API raising an exception and stopping early.

    Due to a mistake in the release process v2.9.3 wasn't deployed to PyPI. Rather than edit history to re-release that version, this new release is being made instead.

  • v2.9.3(Feb 9, 2022)

    This version fixes a bug in the twarc2 sample command, that would cause an exception to be raised when trying to log a non-existent tweet ID.

  • v2.9.2(Jan 31, 2022)

    This release includes new functionality to provide a User-Agent HTTP header with all Twitter API requests. For example:

    twarc/2.9.2 (Darwin x86_64) CPython/3.10.1
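
A string of that shape can be assembled from the standard library's platform module. The following is only a sketch of how such a header value might be built; `user_agent` is a hypothetical helper, and twarc's own implementation may differ.

```python
import platform


def user_agent(version):
    """Build a string shaped like 'twarc/2.9.2 (Darwin x86_64) CPython/3.10.1'."""
    return "twarc/{} ({} {}) {}/{}".format(
        version,
        platform.system(),                 # e.g. Darwin, Linux, Windows
        platform.machine(),                # e.g. x86_64, arm64
        platform.python_implementation(),  # e.g. CPython
        platform.python_version(),         # e.g. 3.10.1
    )
```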
    
  • v2.9.1(Jan 31, 2022)

  • v2.9.0(Jan 27, 2022)

    What's Changed

    • More badges in the readme by @igorbrigadir in https://github.com/DocNow/twarc/pull/577
    • Support likes and retweets endpoints. by @SamHames in https://github.com/DocNow/twarc/pull/588

    Full Changelog: https://github.com/DocNow/twarc/compare/v2.8.3...v2.9.0

  • v2.8.3(Jan 6, 2022)

    Fixes an issue where twarc was not correctly handling the 1 request/s rate limit for the search/all endpoint. Also includes better handling and error messages of situations when that rate limit is hit.

  • v2.8.2(Dec 4, 2021)

    This release includes some improvements to how the mandatory one second sleep between requests to the search API is handled with some of the twarc2 commands. See #575 for details.
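
As an illustration of what a mandatory pause between requests involves, here is a minimal throttle sketch. It is not the code from #575; `Throttle` is a hypothetical class, and the one second default only reflects the limit described above.

```python
import time


class Throttle:
    """Ensure at least `interval` seconds pass between successive calls to wait()."""

    def __init__(self, interval=1.0):
        self.interval = interval
        self._last = None

    def wait(self):
        now = time.monotonic()
        if self._last is not None:
            remaining = self.interval - (now - self._last)
            if remaining > 0:
                time.sleep(remaining)
        self._last = time.monotonic()
```

Calling `wait()` before each request means that time already spent processing the previous response counts toward the interval, so the pause is never longer than necessary.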

  • v2.8.1(Nov 15, 2021)

    v2.8.1 includes a small update to the twarc search --help message that links to Twitter's Building Queries for Search Tweets to help users figure out what's possible.

    https://developer.twitter.com/en/docs/twitter-api/tweets/search/integrate/build-a-query

  • v2.8.0(Oct 23, 2021)

    v2.8.0 adds some new controls for shaping the data that is returned from the Twitter API. The default behavior is for twarc to retrieve the fullest representation of a tweet by requesting all tweet, user, media, place and poll fields as well as all available expansions. This is generally good practice with twarc because it means that downstream processing of the collected data can rely on having all this data at its disposal. However, there may be cases where you want to customize the data that comes back. This is not recommended practice but it could be useful in some contexts.

    The following options allow you to fine tune the types of data that are requested when using the following sub-commands: search, searches, tweet, sample, hydrate, users, mentions, timeline, timelines, conversation, conversations, and stream. The options include:

      --expansions TEXT               Comma separated list of expansions to
                                      retrieve. Default is all available.
      --tweet-fields TEXT             Comma separated list of tweet fields to
                                      retrieve. Default is all available.
      --user-fields TEXT              Comma separated list of user fields to
                                      retrieve. Default is all available.
      --media-fields TEXT             Comma separated list of media fields to
                                      retrieve. Default is all available.
      --place-fields TEXT             Comma separated list of place fields to
                                      retrieve. Default is all available.
      --poll-fields TEXT              Comma separated list of poll fields to
                                      retrieve. Default is all available.
    

    These correspond to the API Fields and Expansions.
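
To show how comma separated options like these map onto request parameters, here is a small sketch. The parameter names `expansions`, `tweet.fields` and `user.fields` are the ones visible in the request log earlier on this page; `build_params` itself is a hypothetical helper, not part of twarc.

```python
def build_params(query, expansions=None, tweet_fields=None, user_fields=None):
    """Join field lists into comma separated query parameters."""
    params = {"query": query}
    if expansions:
        params["expansions"] = ",".join(expansions)
    if tweet_fields:
        params["tweet.fields"] = ",".join(tweet_fields)
    if user_fields:
        params["user.fields"] = ",".join(user_fields)
    return params
```

Parameters that are not supplied are simply left out of the dictionary in this sketch; twarc's own default, as described above, is to request everything available.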

    There is also --minimal-fields which requests just a minimal subset of data, and --no-context-annotations that does not include context-annotations, which allows more tweets to be fetched at one time (500 instead of 100). This also applies to the sub-commands: search, searches, tweet, sample, hydrate, users, mentions, timeline, timelines, conversation, conversations, stream.

      --minimal-fields                By default twarc gets all available data.
                                      This option requests the minimal retrievable
                                      amount of data - only IDs and object
                                      references are retrieved. Setting this makes
                                      --max-results 500 the default. NOTE: This
                                      argument is mutually exclusive with
                                      arguments: [--counts-only, --poll-fields,
                                      --media-fields, --expansions, --no-context-
                                      annotations, --place-fields, --user-fields,
                                      --tweet-fields].
    
  • v2.7.3(Oct 10, 2021)

  • v2.7.2(Oct 10, 2021)

  • v2.7.1(Oct 10, 2021)

    • Add the start-time/since-id parameters from the timeline CLI command to the timelines CLI command.
    • Ensure that sample command only writes JSON on stdout.
  • v2.7.0(Oct 4, 2021)

    v2.7.0 adds a new places command to search for places and their identifiers, which can be used in search and stream queries. Although it is still a v1.1 endpoint, 1.1/geo/search.json makes these place identifiers available when searching by name, geo coordinates, or IP address.

    Usage: twarc2 places [OPTIONS] VALUE [OUTFILE]
    
      Search for places by place name, geo coordinates or ip address.
    
    Options:
      --type [name|geo|ip]            How to search for places (defaults to name)
      --granularity [neighborhood|city|admin|country]
                                      What type of places to search for (defaults
                                      to neighborhood)
      --max-results INTEGER           Maximum results to return
      --json                          Output raw JSON response
      --help                          Show this message and exit.
    

    There is a corresponding twarc.client2.Twarc2.geo() method which you can use to do the lookup yourself from Python.

  • v2.6.0(Sep 27, 2021)

    • Adds the searches CLI command for running multiple searches from an input file
    • Makes progress reporting more accurate for commands that consume files one line at a time (users, conversations, hydrate etc)
  • v2.5.0(Sep 22, 2021)

    This release includes new functionality for working with Twitter's new Batch Compliance API, which allows you to upload large datasets of Tweet or user IDs and retrieve their compliance status, so you can determine what data requires action to bring your datasets into compliance.

    Usage: twarc2 compliance-job [OPTIONS] COMMAND [ARGS]...
    
      Create, retrieve and list batch compliance jobs for Tweets and Users.
    
    Options:
      --help  Show this message and exit.
    
    Commands:
      create    Create a new compliance job and upload tweet IDs.
      download  Download the compliance job with the specified ID.
      get       Returns status and download information about the job ID.
      list      Returns a list of compliance jobs by job type and status.
    
  • v2.4.2(Aug 17, 2021)

    This release ensures that the timeline, timelines, conversation and conversations commands default to a --start-time of 2006-03-21 (the first day of tweets) when being instructed to use the /tweets/search/all endpoint behind the scenes. For example when doing:

    twarc2 timeline --use-search jack 
    

    or:

    twarc2 conversation --archive 21
    

    Previously it was defaulting to the last 30 days (which is an unfortunate default set by the /tweets/search/all endpoint). Many thanks to Darren Halpin and @SamHames for identifying and fixing the issue!
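
The defaulting rule can be sketched in a few lines of Python. This is an illustration of the behaviour described in this release note, not twarc's actual code; `resolve_start_time` and `FIRST_TWEET_DAY` are hypothetical names.

```python
import datetime

# First day of tweets, per the release note above.
FIRST_TWEET_DAY = datetime.datetime(2006, 3, 21, tzinfo=datetime.timezone.utc)


def resolve_start_time(start_time=None, use_archive=False):
    """Fall back to the first day of tweets for full-archive searches."""
    if start_time is None and use_archive:
        return FIRST_TWEET_DAY
    return start_time
```

An explicitly supplied start time is always respected; the fallback only applies when none is given and the full-archive endpoint is in use.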

  • v2.4.1(Aug 11, 2021)

    This release includes support for requesting the new alt_text field for media from Twitter's v2 API:

    https://twittercommunity.com/t/media-alt-text-field-now-available-in-twitter-api-v2/157939

Owner
Documenting the Now
tools supporting the ethical collection, use, and preservation of social media
A Python command-line utility for validating that the outputs of a given Declarative Form Azure Portal UI JSON template map to the input parameters of a given ARM Deployment Template JSON template

A Python command-line utility for validating that the outputs of a given Declarative Form Azure Portal UI JSON template map to the input parameters of a given ARM Deployment Template JSON template

Glenn Musa 1 Feb 3, 2022
A command-line based, minimal torrent streaming client made using Python and Webtorrent-cli. Stream your favorite shows straight from the command line.

A command-line based, minimal torrent streaming client made using Python and Webtorrent-cli. Installation pip install -r requirements.txt It use

Jonardon Hazarika 17 Dec 11, 2022
AML Command Transfer. A lightweight tool to transfer any command line to Azure Machine Learning Services

AML Command Transfer (ACT) ACT is a lightweight tool to transfer any command from the local machine to AML or ITP, both of which are Azure Machine Lea

Microsoft 11 Aug 10, 2022
Python library and command line tool for interacting with Bugzilla

python-bugzilla This package provides two bits: bugzilla python module for talking to a Bugzilla instance over XMLRPC or REST /usr/bin/bugzilla comman

Python Bugzilla Project 112 Nov 5, 2022
A cd command that learns - easily navigate directories from the command line

NAME autojump - a faster way to navigate your filesystem DESCRIPTION autojump is a faster way to navigate your filesystem. It works by maintaining a d

William Ting 14.5k Jan 3, 2023
Ros command - Unifying the ROS command line tools

Unifying the ROS command line tools One impairment to ROS 2 adoption is that all

null 37 Dec 15, 2022
Python command line tool and python engine to label table fields and fields in data files.

Python command line tool and python engine to label table fields and fields in data files. It could help to find meaningful data in your tables and data files or to find personally identifiable information (PII).

APICrafter 22 Dec 5, 2022
A lightweight Python module and command-line tool for generating NATO APP-6(D) compliant military symbols from both ID codes and natural language names

Python military symbols This is a lightweight Python module, including a command-line script, to generate NATO APP-6(D) compliant military symbol icon

Nick Royer 5 Dec 27, 2022
gget is a free and open-source command-line tool and Python package that enables efficient querying of genomic databases.

gget is a free and open-source command-line tool and Python package that enables efficient querying of genomic databases. gget consists of a collection of separate but interoperable modules, each designed to facilitate one type of database querying in a single line of code.

Pachter Lab 570 Dec 29, 2022
MsfMania is a command line tool developed in Python that is designed to bypass antivirus software on Windows and Linux/Mac in the future

MsfMania MsfMania is a command line tool developed in Python that is designed to bypass antivirus software on Windows and Linux/Mac in the future. Sum

null 446 Dec 21, 2022
A python command line tool to calculate options max pain for a given company symbol and options expiry date.

Options-Max-Pain-Calculator A python command line tool to calculate options max pain for a given company symbol and options expiry date. Overview - Ma

null 13 Dec 26, 2022
A command line tool to hide and reveal information inside images (works for both PNGs and JPGs)

Imgrerite A command line tool to hide and reveal information inside images (works for both PNGs and JPGs) Dependencies Python 3 Git Most of the Linux

Jigyasu 10 Jul 27, 2022
Splitgraph command line client and python library

Splitgraph Overview Splitgraph is a tool for building, versioning and querying reproducible datasets. It's inspired by Docker and Git, so it feels fam

Splitgraph 313 Dec 24, 2022
PwnWiki command line searching tool & bindings written in Python

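pwsearch A command-line tool for searching the PwnWiki database. Installation You can install pwsearch directly from PyPI with pip: pip3 install -U pwsearch You can also clone this repository and run it directly from source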

PwnWiki 20 Jun 21, 2021
PyArmor is a command line tool used to obfuscate python scripts

PyArmor is a command line tool used to obfuscate python scripts, bind obfuscated scripts to fixed machine or expire obfuscated scripts.

Dashingsoft 2k Jan 7, 2023
Unofficial Open Corporates CLI: OpenCorporates is a website that shares data on corporations under the copyleft Open Database License. This is an unofficial open corporates python command line tool.

Unofficial Open Corporates CLI OpenCorporates is a website that shares data on corporations under the copyleft Open Database License. This is an unoff

Richard Mwewa 30 Sep 8, 2022
Professor Wordlist is a free open source command line tool written in python

Professor Wordlist is a free open source command line tool written in Python that generates custom wordlists, with a variety of unique parameters and functions providing many possibilities.

オークO A K Z E H オーク 1 Oct 28, 2021
A simple command line tool written in python to manage a to-do list

A simple command line tool written in python to manage a to-do list Dependencies: python Commands: todolist (-a | --add) [(-p | --priority)] [(-l | --

edwloef 0 Nov 2, 2021