Public API client for GETTR, a "non-bias [sic] social network," designed for data archival and analysis.

Stanford Internet Observatory

Last update: Dec 14, 2022


Overview

GoGettr

GoGettr is an API client for GETTR, a "non-bias [sic] social network." (We will not reward their domain with a hyperlink.) GoGettr is built and maintained by the Stanford Internet Observatory.

This tool does not currently require any authentication with GETTR; it gathers all its data through publicly accessible endpoints.

Currently, this tool can:

Pull all the posts made on the platform
Pull all the comments made on the platform
Pull all the top "trending" hashtags
Pull all the suggested users
Pull all the "trending" posts (i.e., the posts on the home page)
Pull all the posts and/or comments of a user on the platform
Pull all of a user's followers
Pull all the users a particular user follows
Pull all the comments on a particular post
Pull information about any user on the platform

GoGettr is designed for academic research, open source intelligence gathering, and data archival. It pulls all of the data from the publicly accessible API.

Installation

GoGettr is available on PyPI. To install it, run pip install gogettr. Provided your pip is set up correctly, this will make gogettr available both as a command and as a Python package. Note that GoGettr requires Python 3.8 or higher.

CLI Playbook

Pull all posts (starting at id 1, capped at 1m)

gogettr all --max 1000000

Pull all comments

gogettr all --type comments --max 1000000

Pull all posts (starting at a particular ID and moving backward through IDs)

gogettr all --rev --last pay8d

Pull all posts from a user

gogettr user USERNAME --type posts

Pull all comments from a user

gogettr user USERNAME --type comments

Pull all likes from a user

gogettr user USERNAME --type likes

Pull a user's information

gogettr user-info USERNAME

CLI Usage

Usage: gogettr [OPTIONS] COMMAND [ARGS]...

  GoGettr is an unauthenticated API client for GETTR.

Options:
  --version  Show the version and exit.
  --help     Show this message and exit.

Commands:
  all             Pull all posts (or comments) sequentially.
  comments        Pull comments on a specific post.
  hashtags        Pull the suggested hashtags (the top suggestions are...
  registered      Check if a username is registered.
  search          Search posts for the given query.
  suggested       Pull the suggested users (users displayed on the home...
  trends          Pull all the trends (posts displayed on the home page).
  user            Pull the posts, likes, or comments made by a user.
  user-followers  Pull all a user's followers.
  user-following  Pull all users a given user follows.
  user-info       Pull given user's information.

`all`

Usage: gogettr all [OPTIONS]

  Pull all posts (or comments) sequentially.

  Note that if iterating chronologically and both max and last are unset, then
  this command will run forever (as it will iterate through all post IDs to
  infinity). To prevent this, either specify a max, last post, or iterate
  reverse chronologically.

  Posts will be pulled in parallel according to the desired number of workers.
  Out of respect for GETTR's servers, avoid setting the number of workers to
  values over 50.

Options:
  --first TEXT             the ID of the first post to pull
  --last TEXT              the ID of the last post to pull
  --max INTEGER            the maximum number of posts to pull
  --rev                    increment reverse chronologically (i.e., from last
                           to first)
  --type [posts|comments]
  --workers INTEGER        the number of threads to run in parallel
  --help                   Show this message and exit.
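
The behaviour described in this help text (sequential IDs, a bounded number of parallel workers, results consumed in order) can be sketched roughly as follows. This is an illustration only, not GoGettr's actual implementation. It assumes post IDs are a "p" prefix followed by a base-36 counter, and the names post_ids, pull_in_order, and fake_fetch are hypothetical; fake_fetch stands in for a real HTTP request.

```python
from collections import deque
from concurrent.futures import ThreadPoolExecutor
from itertools import count, islice

ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyz"


def to_base36(n: int) -> str:
    """Encode a non-negative integer in lowercase base 36."""
    if n == 0:
        return "0"
    digits = []
    while n:
        n, rem = divmod(n, 36)
        digits.append(ALPHABET[rem])
    return "".join(reversed(digits))


def post_ids(first: int = 1, prefix: str = "p"):
    """Yield ASSUMED post IDs forever: p1, p2, ..., pz, p10, ...

    Because this never stops on its own, callers must cap it
    (compare the --max and --last options above).
    """
    for n in count(first):
        yield prefix + to_base36(n)


def pull_in_order(ids, fetch, workers=10):
    """Fetch IDs in parallel, yielding results in submission order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        pending = deque()
        for post_id in ids:
            pending.append(pool.submit(fetch, post_id))
            # Keep at most `workers` requests in flight at once.
            if len(pending) >= workers:
                yield pending.popleft().result()
        # Drain whatever is still running.
        while pending:
            yield pending.popleft().result()


def fake_fetch(post_id):
    """Hypothetical stand-in for a network request."""
    return {"_id": post_id}


# Cap the infinite ID stream at five items, like `--max 5`.
first_five = list(pull_in_order(islice(post_ids(), 5), fake_fetch, workers=3))
print([post["_id"] for post in first_five])
```

Capping the ID stream with islice plays the same role as --max here: without a cap, the generator would keep producing IDs indefinitely.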

`comments`

Usage: gogettr comments [OPTIONS] POST_ID

  Pull comments on a specific post.

Options:
  --max INTEGER  the maximum number of comments to pull
  --help         Show this message and exit.

`hashtags`

Usage: gogettr hashtags [OPTIONS]

  Pull the suggested hashtags (the top suggestions are displayed on the front
  page).

  Note that while the first five or so hashtags have expanded information
  associated with them, later results do not.

Options:
  --max INTEGER  the maximum number of hashtags to pull
  --help         Show this message and exit.

`registered`

Usage: gogettr registered [OPTIONS] USERNAME

  Check if a username is registered.

Options:
  --help  Show this message and exit.

`search`

Usage: gogettr search [OPTIONS] QUERY

  Search posts for the given query.

  This is equivalent to putting the query in the GETTR search box and
  archiving all the posts that result.

Options:
  --max INTEGER  the maximum number of posts to pull
  --help         Show this message and exit.

`suggested`

Usage: gogettr suggested [OPTIONS]

  Pull the suggested users (users displayed on the home page).

Options:
  --max INTEGER  the maximum number of users to pull
  --help         Show this message and exit.

`trends`

Usage: gogettr trends [OPTIONS]

  Pull all the trends (posts displayed on the home page).

Options:
  --max INTEGER  the maximum number of posts to pull
  --until TEXT   the ID of the earliest post to pull
  --help         Show this message and exit.

`user`

Usage: gogettr user [OPTIONS] USERNAME

  Pull the posts, likes, or comments made by a user.

Options:
  --max INTEGER                  the maximum number of activities to pull
  --until TEXT                   the ID of the earliest activity to pull for
                                 the user
  --type [posts|comments|likes]
  --help                         Show this message and exit.

`user-followers`

Usage: gogettr user-followers [OPTIONS] USERNAME

  Pull all a user's followers.

Options:
  --max INTEGER  the maximum number of users to pull
  --help         Show this message and exit.

`user-following`

Usage: gogettr user-following [OPTIONS] USERNAME

  Pull all users a given user follows.

Options:
  --max INTEGER  the maximum number of users to pull
  --help         Show this message and exit.

`user-info`

Usage: gogettr user-info [OPTIONS] USERNAME

  Pull given user's information.

Options:
  --help  Show this message and exit.

Module Usage

You can use GoGettr as a Python module. For example, here's how you would pull all a user's posts:

from gogettr import PublicClient
client = PublicClient()
posts = client.user_activity(username="support", type="posts")

For more examples of using GoGettr as a module, check out the tests directory. Note that the API surface can't be considered stable yet. If GETTR changes its API, GoGettr's API may change to match (though with as few public-facing changes as possible).

GoGettr groups related API functionality into the same capabilities; for example, pulling users' comments, posts, and likes is all done by the same function (inside user_activity.py), and pulling followers and following is done by the same function (inside user_relationships.py). That means there isn't perfect correspondence between the CLI surface and the API surface.
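
As a rough illustration of that grouping, the sketch below pulls several activity types through the single user_activity entry point and writes each to its own JSON Lines file. The helper name dump_activity and the file naming are hypothetical. It assumes user_activity accepts the keyword arguments username, max, and type (as the examples here and the CLI options suggest) and yields JSON-serializable dictionaries.

```python
import json
from pathlib import Path


def dump_activity(client, username, kinds=("posts", "comments", "likes"),
                  out_dir=".", max=None):
    """Write each activity type for `username` to <username>_<kind>.jsonl.

    `client` is any object with a user_activity(username=..., type=...)
    method, e.g. (assumed) gogettr.PublicClient(). Returns item counts.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    counts = {}
    for kind in kinds:
        kwargs = {"username": username, "type": kind}
        if max is not None:
            kwargs["max"] = max  # only pass a cap when one was requested
        path = out_dir / f"{username}_{kind}.jsonl"
        written = 0
        with path.open("w", encoding="utf-8") as fh:
            for item in client.user_activity(**kwargs):
                fh.write(json.dumps(item, ensure_ascii=False) + "\n")
                written += 1
        counts[kind] = written
    return counts


# Possible usage (requires network access and the gogettr package):
#   from gogettr import PublicClient
#   dump_activity(PublicClient(), "support", kinds=("posts",), max=20)
```

Taking the client as a parameter keeps the helper testable with a stub and avoids tying it to a particular GoGettr version.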

Development

To run gogettr in a development environment, you'll need Poetry. Install the dependencies by running poetry install, and then you're all set to work on gogettr locally.

To run the tests, run poetry run pytest.

To access the CLI, run poetry run gogettr.

To package and release a new version on PyPI, simply create a new release tag on GitHub.

Contributing

Contributions are encouraged! For small bug fixes and minor improvements, feel free to just open a PR. For larger changes, please open an issue first so that other contributors can discuss your plan, avoid duplicated work, and ensure it aligns with the goals of the project. Be sure to also follow the code of conduct. Thanks!

Logging

When run in CLI mode, GoGettr will log extensive debug information to gogettr.log (in the working directory). This log will include every single request GoGettr makes, and every single response GoGettr receives. Because it's possible that GoGettr accidentally loses some information when parsing API responses, consider keeping this file around just in case.

Wishlist

Support for the following capabilities is planned:

...nothing right now! (Got an idea? Submit an issue/PR!)

Comments

Improve Logging with named logger

firstly: thanks for providing this nice API!

I would like to use it in a small project, but the INFO logs are quite spammy. Could you use a named logger instead of the root logger? For example, set the logger in the api module with logger = logging.getLogger(__name__) and then call logger.info("awesome log") instead of logging.info(...).

This would enable users of your library to override the log level by using: logging.getLogger("gogettr").setLevel(logging.WARNING)
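
A minimal sketch of the idea, using only the standard logging module. The logger name "gogettr" and the function fetch_something are placeholders for illustration; this shows how a named logger would behave, not what the library currently does.

```python
import logging

# Library side: a named logger instead of the root logger.
# Inside a package this would typically be logging.getLogger(__name__).
logger = logging.getLogger("gogettr")


def fetch_something():
    """Hypothetical library function that logs at two levels."""
    logger.info("request sent")
    logger.warning("request retried")


# Application side: collect whatever reaches the root logger.
class ListHandler(logging.Handler):
    def __init__(self):
        super().__init__()
        self.messages = []

    def emit(self, record):
        self.messages.append(record.getMessage())


handler = ListHandler()
root = logging.getLogger()
root.addHandler(handler)
root.setLevel(logging.DEBUG)

# Silence the library's INFO chatter without touching other loggers.
logging.getLogger("gogettr").setLevel(logging.WARNING)

fetch_something()
print(handler.messages)  # ['request retried']
```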
help wanted

opened by twitter-79 5
Always run poetry install

Because pull requests from forks can access the cache (https://docs.github.com/en/actions/guides/caching-dependencies-to-speed-up-workflows#restrictions-for-accessing-a-cache), just using the contents of the cache could potentially be a security issue if someone submits a malicious PR.

opened by epicfaace 4
KeyError: 'txt'
Similar to Issue #13 I receive the following KeyError message with the "all" command (plain or with --rev) after the first scraped post:

if data["data"]["txt"] == "Content Not Found": KeyError: 'txt'

If it's unclear how to resolve the error then reviewing this PHP-based API client on GitHub may be useful, as I'm able to scrape with that without any issues.

Unrelated: The quoted description contained in this project's "About" and README.md, i.e. '"non-bias [sic] social network,"' attempts to emphasize an ephemeral grammatical error in the App Store description that no longer exists, highlighting the project maintainers' condescension. This reflects poorly on StanfordIO and Stanford University.
opened by hack-r 3
errors

Hi, I am trying to get hashtags but the command isn't working. Is the script still working? I am getting errors such as the following one. The only command that is working for me is related to the users' info and pulling all posts (with another error copied below). Thanks again!

Traceback (most recent call last):
  File "c:\users\ahmed\anaconda3\lib\runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "c:\users\ahmed\anaconda3\lib\runpy.py", line 87, in run_code
    exec(code, run_globals)
  File "C:\Users\Ahmed\anaconda3\Scripts\gogettr.exe_main.py", line 7, in
  File "c:\users\ahmed\anaconda3\lib\site-packages\click\core.py", line 1128, in call
    return self.main(*args, **kwargs)
  File "c:\users\ahmed\anaconda3\lib\site-packages\click\core.py", line 1053, in main
    rv = self.invoke(ctx)
  File "c:\users\ahmed\anaconda3\lib\site-packages\click\core.py", line 1659, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "c:\users\ahmed\anaconda3\lib\site-packages\click\core.py", line 1395, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "c:\users\ahmed\anaconda3\lib\site-packages\click\core.py", line 754, in invoke
    return __callback(*args, **kwargs)
  File "c:\users\ahmed\anaconda3\lib\site-packages\gogettr\cli.py", line 41, in user
    for post in client.user_activity(username, max=max, until=until, type=type):
  File "c:\users\ahmed\anaconda3\lib\site-packages\gogettr\capabilities\user_activity.py", line 38, in pull
    for data in self.client.get_paginated(
  File "c:\users\ahmed\anaconda3\lib\site-packages\gogettr\api.py", line 93, in get_paginated
    data = self.get(*args, **kwargs)
  File "c:\users\ahmed\anaconda3\lib\site-packages\gogettr\api.py", line 77, in get
    raise GettrApiError(errors[-1]) # Throw with most recent error
gogettr.errors.GettrApiError

++++++++ Pull all posts error

  File "c:\users\ahmed\anaconda3\lib\runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "c:\users\ahmed\anaconda3\lib\runpy.py", line 87, in run_code
    exec(code, run_globals)
  File "C:\Users\Ahmed\anaconda3\Scripts\gogettr.exe_main.py", line 7, in
  File "c:\users\ahmed\anaconda3\lib\site-packages\click\core.py", line 1128, in call
    return self.main(*args, **kwargs)
  File "c:\users\ahmed\anaconda3\lib\site-packages\click\core.py", line 1053, in main
    rv = self.invoke(ctx)
  File "c:\users\ahmed\anaconda3\lib\site-packages\click\core.py", line 1659, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "c:\users\ahmed\anaconda3\lib\site-packages\click\core.py", line 1395, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "c:\users\ahmed\anaconda3\lib\site-packages\click\core.py", line 754, in invoke
    return __callback(*args, **kwargs)
  File "c:\users\ahmed\anaconda3\lib\site-packages\gogettr\cli.py", line 82, in all
    for post in client.all(
  File "c:\users\ahmed\anaconda3\lib\site-packages\gogettr\capabilities\all.py", line 46, in pull
    result = futures.popleft().result()
  File "c:\users\ahmed\anaconda3\lib\concurrent\futures_base.py", line 432, in result
    return self.__get_result()
  File "c:\users\ahmed\anaconda3\lib\concurrent\futures_base.py", line 388, in __get_result
    raise self._exception
  File "c:\users\ahmed\anaconda3\lib\concurrent\futures\thread.py", line 57, in run
    result = self.fn(*self.args, **self.kwargs)
  File "c:\users\ahmed\anaconda3\lib\site-packages\gogettr\capabilities\all.py", line 108, in _pull_post
    if data["data"]["txt"] == "Content Not Found":
KeyError: 'txt'

opened by ahmedalrawi 2
Add ability to get comments from a post
uses /u/post/%s/comments URI for fetching comments from a specific post id.

uses get_paginated function to fetch comments.

More testing is needed for other situations (e.g., 0 comments), and a Python test still needs to be written.
opened by KonradIT 2
switched to using a named logger

this enables us to override the log level as discussed in #24

some of the test cases are not working but I think this has nothing to do with the logging.

opened by twitter-79 1
Retrieving all followers/following for an account

Hey!

I have been playing around with the API to get the followers of some specific accounts, and I was wondering whether GoGettr is able to get the whole list of followers for any account. In tests with a few accounts, I usually get only a fraction of the account's followers (for example, I get 1035 followers for @sloan4america, whereas he's supposed to have about 26k followers). I took a look at the gogettr.log file and noticed that followers are requested in batches of 500 using the offset parameter. However, it seems that each batch generally contains only 20 followers. It may be that I am not using it correctly, in which case please feel free to correct me.

Btw thanks again for all your hard work to develop this API, it is really an invaluable resource!
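
One way to picture the symptom described above: if the client advances the offset by the requested batch size (500) while the server returns at most 20 items per request, most followers are skipped. The sketch below advances by the number of items actually received instead. It is a generic illustration with hypothetical names (paginate, capped_fetch), not GoGettr's code, and the cap of 20 is taken from this report.

```python
def paginate(fetch_page, page_size=500, max_items=None):
    """Yield items from an offset-based API.

    The offset advances by the number of items actually received,
    so a server-side cap smaller than page_size does not skip data.
    """
    offset = 0
    yielded = 0
    while True:
        batch = fetch_page(offset, page_size)
        if not batch:
            return  # an empty page means there is nothing left
        for item in batch:
            yield item
            yielded += 1
            if max_items is not None and yielded >= max_items:
                return
        offset += len(batch)


# Hypothetical server that never returns more than 20 items per call.
followers = [f"user{i}" for i in range(45)]


def capped_fetch(offset, limit):
    return followers[offset:offset + min(limit, 20)]


print(len(list(paginate(capped_fetch))))  # 45
```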

opened by AminMekacher 1

Misc. fixes

In light of issue #13: a change was made to the backend, and now on some posts the txt string key is not present in data.

sample of posts

{
	'data': {
		'udate': 1639344588908,
		'acl': {
			'_t': 'acl'
		},
		'_t': 'post',
		'cdate': 1639344588908,
		'_id': 'pew0',
		'nfound': True,
		'txt': 'Content Not Found'
	},
	'aux': None,
	'serial': 'post'
}

{
	'data': {
		'udate': 1639344096311,
		'acl': {
			'_t': 'acl'
		},
		'_t': 'post',
		'cdate': 1639344588946,
		'_id': 'pew9',
		'vis': 'd'
	},
	'aux': None,
	'serial': 'post'
}

{
	'data': {
		'udate': 1639344588929,
		'acl': {
			'_t': 'acl'
		},
		'_t': 'post',
		'cdate': 1639344588929,
		'_id': 'pew3',
		'nfound': True,
		'txt': 'Content Not Found'
	},
	'aux': None,
	'serial': 'post'
}

{
	'data': {
		'udate': 1625151531564,
		'acl': {
			'pub': 4
		},
		'_t': 'post',
		'cdate': 1625151530055,
		'_id': 'pew8',
		'txt': 'Himalaya',
		'imgs': ['null'],
		'vid_wid': '4032',
		'vid_hgt': '3024',
		'vis': 'p',
		'meta': [{
			'wid': None,
			'hgt': None,
			'meta': {
				'heads': None
			}
		}],
		'uid': 'milesmusk',
		'shbpst': 1
	},
	'aux': {
		'shrdpst': None,
		's_pst': {
			'shbpst': 1
		},
		'uinf': {
			'milesmusk': {
				'udate': '1625274241748',
				'_t': 'uinf',
				'_id': 'milesmusk',
				'nickname': '项羽',
				'username': 'milesmusk',
				'ousername': 'milesmusk',
				'dsc': 'Everything is just beginning',
				'status': 'a',
				'cdate': '1625149711910',
				'lang': 'en',
				'infl': 2,
				'ico': 'group37/getter/2021/07/03/01/1f4e2119-04a0-6827-35e8-1acf22e82b69/a628de634b82c0cec995da13db2874a9.jpg',
				'location': 'NFSC',
				'flw': '958',
				'flg': '1444',
				'lkspst': '1',
				'shspst': '1'
			}
		},
		'lks': [],
		'shrs': []
	},
	'serial': 'post'
}
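
Given the samples above, a check for placeholder posts could tolerate the missing txt key along these lines. This is a sketch under assumptions, not the merged fix: the helper name is_missing_post is hypothetical, and treating nfound as a "not found" marker is inferred from the samples only. The sample dictionaries are trimmed to the relevant keys.

```python
def is_missing_post(response: dict) -> bool:
    """Return True for 'Content Not Found' placeholders.

    Uses .get() so that posts without a 'txt' key (such as 'pew9'
    in the samples) do not raise KeyError.
    """
    data = response.get("data") or {}
    return bool(data.get("nfound")) or data.get("txt") == "Content Not Found"


placeholder = {
    "data": {"_id": "pew0", "nfound": True, "txt": "Content Not Found"},
    "aux": None,
    "serial": "post",
}
no_text = {"data": {"_id": "pew9", "vis": "d"}, "aux": None, "serial": "post"}
real = {"data": {"_id": "pew8", "txt": "Himalaya", "uid": "milesmusk"}}

print(is_missing_post(placeholder), is_missing_post(no_text), is_missing_post(real))
# True False False
```

Whether a post like pew9 (no txt, vis set to 'd') should also be skipped is a separate decision; the sketch only ensures it no longer crashes the check.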

Also the support account's infl key is now 2.

opened by KonradIT 1

Allow non-ASCII chars in CLI output by default, with flag to revert to original behavior
As discussed here: https://github.com/stanfordio/gogettr/issues/7

The ensure_ascii flag for JSON dumps is now False by default

it can be overridden via command line boolean flag --ensure_ascii
opened by KonradIT 0
user_activity retrieves <500 posts, retrieving more requires logging in
Gettr seems to have changed its public API, and now only allows ~500 posts to be retrieved without logging in. This is also apparent when looking at a user's timeline in a web browser: scrolling to the bottom of a user timeline when you're not logged in shows ~500 user posts and says "END" when you reach the limit, while scrolling to the bottom of the timeline when you ARE logged in shows all user posts (see attached image)

Adding an X-App-Auth header parameter from the logged-in user containing the username and a generated token allows you to retrieve all of a user's posts, i.e.:

import json
import requests

# url and params are whatever the existing request already uses
HEADERS = {
    'X-App-Auth': json.dumps({'user': '$MY_USERNAME', 'token': '$MY_TOKEN'}),
}
resp = requests.get(
    url,
    params=params,
    timeout=10,
    headers=HEADERS,
)

Is implementing a login flow within the scope of this project? I could potentially take a crack at implementing it.

Might be related to issue https://github.com/stanfordio/gogettr/issues/21
opened by trislee 4
Issues Using All API for Posts Greater than p7b5gh

When I try to use the all API with posts greater than p7b5gh, I start running into issues where I think large numbers of indices are missing.

e.g., running the following command:

gogettr all --max 1000 --first p7b5gh

Returns a single post.

I tried with a couple of much larger ids (copied from another issue), I got a similar result. I did the same thing with the module mode, and still no luck. I want to be respectful of their API and don't want to like brute force until I see more posts, but I am not sure how else to collect sets of posts for a given time period or like the next n posts after a specific _id.

Before that index, I don't seem to run into quite as many issues, though there are definitely gaps in the returned indices.

Do you have any recommendations for using all with larger indices, or should I switch to scraping posts for specific users, rather than specific points in time? Am I missing something? Do the indices change to a different base or something? Is this just a weird coincidence that I am reading into too much?

Thank you for taking a look, this tool is super helpful!

opened by pjachim 1
GETTR post id does not consistently increase
Just curious if anyone else has noticed that the GETTR post id does not consistently increase in a user's timeline. I pulled the most recent 20 posts from a user using the following code:

from gogettr import PublicClient

client = PublicClient()
posts = client.user_activity(username="elisestefanik", max=20, type="posts")
for post in posts:
    print(post["_id"])

Here are the results. The oldest post is on the bottom.

puyfio07ae
puyl23a25b
puya6rd15c
puy99p209c
puy13dd6f9
puwxq28b5b
puwool7718
puwlwue81d
pux00k1c3a
pux3rocc56
pux0r51e80
puwhjz99e2
puwz4v8eff
puwvl67d70
puw00a95db
puwqhtd4ed
puwyd00b1b
puwvfme687
put13g39eb
pusdlv7b92

I assumed that the post id was a base 36 value that was always increasing over time, but when you start from the bottom of the results and go forward through time you will see the id go from puw___ to pux___ and then back to puw___ Huh? Within the puw posts it goes from puwv___ to puwy___ to puwq___

This seems to present a problem when using the 'until' parameter. My use case involves keeping track of the most recent post id that was retrieved each day, and using that on the next day to make sure I only grab the new posts. This requires a value that consistently increases. Since the value bounces around it's very likely to miss some posts since the line in user_activity.py if until is not None and until > id: assumes that new posts always have a higher id.

Each post has a 'udate' available in the dictionary at post['udate'] which is the time in milliseconds since the epoch, UTC. This seems to consistently increase for each post. Maybe a parameter 'until_time' could replace or be an alternative to 'until'?
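
A rough sketch of the proposed alternative: filter on a timestamp rather than on the ID string. The function name newer_than and its parameters are hypothetical, and it assumes each post dictionary carries udate as milliseconds since the epoch (an int or a numeric string). It filters instead of stopping early, so out-of-order items are not dropped.

```python
def newer_than(posts, until_time_ms, key="udate"):
    """Yield only posts whose timestamp is strictly after until_time_ms."""
    for post in posts:
        if int(post[key]) > until_time_ms:
            yield post


# The first three IDs from the listing above, newest first.
ids = ["puyfio07ae", "puyl23a25b", "puya6rd15c"]
# String comparison shows they are not in strictly descending order.
print(ids == sorted(ids, reverse=True))  # False

# Made-up timestamps for illustration only.
sample = [
    {"_id": "puyfio07ae", "udate": 3000},
    {"_id": "puyl23a25b", "udate": 2000},
    {"_id": "puya6rd15c", "udate": "1000"},
]
print([p["_id"] for p in newer_than(sample, 1500)])
# ['puyfio07ae', 'puyl23a25b']
```

For a daily job, the caller would store the largest udate seen on one run and pass it as until_time_ms on the next.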
opened by JusticeProject 0
Hashtags don't really make sense

Hi,

The hashtags don't really make sense, as they are full sentences starting with a hashtag. Strange, but it behaves the same way on mobile before entering a key. Those hashtags won't work because of the spaces. What's the API you are using? I'd like to be able to provide a letter and get the top hashtags for that letter.

opened by TheGitHubGuy1 0
saving file+hashtags

Hi, Thanks for the update! I have Windows 10, and I am running Python from the shell. The script is fetching user's posts now, but is there a way to save the data as a file e.g. JSON or CSV. I tried using filename.json in the end of the command, but it didn't work. Then I used --o filename.json which also didn't work

Regarding hashtag search, it does not seem to be working because of the command argument. I tried alternative formats e.g. gogettr hashtags [#selfie] OR gogettr hashtags [selfie] OR gogettr hashtags #selfie OR gogettr hashtags [OPTIONS] #selfie...etc, but I always get the same error: Error: Got unexpected extra arguments ([OPTIONS].
Thanks! Ahmed

opened by ahmedalrawi 4
Output format / issues

The CLI client outputs JSON objects on new lines to stdout, which might not be ideal for parsing in other programs. Ideally the software would allow a desired data format to be set, for instance TSV/CSV instead of JSON, or would support outputting directly to a SQLite DB or other file.

Also, the JSON objects could be outputted with a comma at the end to make parsing on other programs easier.
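
In the meantime, the one-object-per-line output can be converted downstream. The sketch below is one possible approach using only the standard library; jsonl_to_csv is a hypothetical helper, and the field names (_id, uid, txt) are taken from the sample posts earlier on this page and may not suit every command.

```python
import csv
import io
import json


def jsonl_to_csv(lines, out, fields):
    """Convert JSON Lines (one object per line) to CSV with chosen columns.

    Missing fields become empty cells; fields not listed are ignored.
    """
    writer = csv.DictWriter(out, fieldnames=fields, extrasaction="ignore")
    writer.writeheader()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        writer.writerow({field: record.get(field, "") for field in fields})


# Small in-memory demonstration.
sample = io.StringIO(
    '{"_id": "pew8", "txt": "Himalaya", "uid": "milesmusk"}\n'
    '{"_id": "pew9"}\n'
)
out = io.StringIO()
jsonl_to_csv(sample, out, ["_id", "uid", "txt"])
print(out.getvalue())
```

Passing sys.stdin and sys.stdout instead of the StringIO objects would let the function sit at the end of a shell pipeline.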

What do you all think? I could work on it and submit draft PRs.

opened by KonradIT 8

Releases(v0.8.0)

v0.8.0(Jan 13, 2022)
What's Changed

Add Live capability by @KonradIT in https://github.com/stanfordio/gogettr/pull/16

Fix followers and following search (API limit lowered from 500 to 20)

Full Changelog: https://github.com/stanfordio/gogettr/compare/v0.7.0...v0.8.0
v0.7.0(Jan 1, 2022)
What's Changed

Fix tests by @KonradIT in https://github.com/stanfordio/gogettr/pull/9

Fix Hashtags test by @KonradIT in https://github.com/stanfordio/gogettr/pull/10

Fix user activity test, uses emeraldrobinson instead of support by @KonradIT in https://github.com/stanfordio/gogettr/pull/11

Fix infl level in user info test by @iRove108 in https://github.com/stanfordio/gogettr/pull/12

Misc. fixes by @KonradIT in https://github.com/stanfordio/gogettr/pull/14

New Contributors

@iRove108 made their first contribution in https://github.com/stanfordio/gogettr/pull/12

Full Changelog: https://github.com/stanfordio/gogettr/compare/v0.6.0...v0.7.0
v0.6.0(Jul 13, 2021)
Adds the registered command to check if a username is registered (thanks @KonradIT).

Reliability improvements for the all command.

v0.5.0(Jul 9, 2021)
A comments command is now available to pull all the comments on a post. Thanks @KonradIT!

The all command now pulls posts and comments in parallel using a thread pool, making the command faster by an order of magnitude. You can set the number of workers using the --workers option.

GETTR's API is sometimes unreliable, and it's constantly changing. This release makes GoGettr less likely to unnecessarily propagate GETTR API errors. (For example, requests that timeout are now retried.)

v0.4.0(Jul 8, 2021)

v0.3.1(Jul 7, 2021)

v0.3.0(Jul 5, 2021)

v0.2.1(Jul 5, 2021)


Owner

Stanford Internet Observatory
