Bulk Downloader for Reddit

Overview

saveddit is a bulk media downloader for Reddit. Install it from PyPI:

pip3 install saveddit

Setting up authorization

  • Register an application with Reddit to obtain a Reddit client ID and client secret
  • Register an application with Imgur to obtain an Imgur client ID

These registrations will authorize you to use the Reddit and Imgur APIs to download publicly available information.

User configuration

The first time you run saveddit, you will see something like this:

foo@bar:~$ saveddit
Retrieving configuration from ~/.saveddit/user_config.yaml file
No configuration file found.
Creating one. Please edit ~/.saveddit/user_config.yaml with valid credentials.
Exiting
  • Open the generated ~/.saveddit/user_config.yaml
  • Update the client IDs and secrets from the previous step
  • If you plan on using the user API, add your Reddit username as well:
imgur_client_id: ''
reddit_client_id: ''
reddit_client_secret: ''
reddit_username: ''
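
The configuration file is plain YAML with the four keys shown above. Below is a minimal, illustrative sketch of loading and validating it with the pyyaml package (listed as a dependency in the comments further down); the load_config helper is this editor's example, not saveddit's actual API.

    import os
    import sys
    import yaml  # pyyaml

    CONFIG_PATH = os.path.expanduser("~/.saveddit/user_config.yaml")

    def load_config(path=CONFIG_PATH):
        # Read the YAML file and make sure the required keys are filled in.
        with open(path) as f:
            config = yaml.safe_load(f) or {}
        missing = [k for k in ("imgur_client_id", "reddit_client_id",
                               "reddit_client_secret") if not config.get(k)]
        if missing:
            sys.exit(f"Please fill in {', '.join(missing)} in {path}")
        return config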

Download from Subreddit

foo@bar:~$ saveddit subreddit -h
Retrieving configuration from /Users/pranav/.saveddit/user_config.yaml file

usage: saveddit subreddit [-h] [-f categories [categories ...]] [-l post_limit] [--skip-comments] [--skip-meta] [--skip-videos] -o output_path subreddits [subreddits ...]

positional arguments:
  subreddits            Names of subreddits to download, e.g., AskReddit

optional arguments:
  -h, --help            show this help message and exit
  -f categories [categories ...]
                        Categories of posts to download (default: ['hot', 'new', 'rising', 'controversial', 'top', 'gilded'])
  -l post_limit         Limit the number of submissions downloaded in each category (default: None, i.e., all submissions)
  --skip-comments       When true, saveddit will not save comments to a comments.json file
  --skip-meta           When true, saveddit will not save meta to a submission.json file on submissions
  --skip-videos         When true, saveddit will not download videos (e.g., gfycat, redgifs, youtube, v.redd.it links)
  -o output_path        Directory where saveddit will save downloaded content

Example Usage: Download the hottest 5 posts each from /r/pics and /r/aww

foo@bar:~$ saveddit subreddit pics aww -f hot -l 5 -o ~/Desktop

You can download from multiple subreddits and use multiple filters:

foo@bar:~$ saveddit subreddit funny AskReddit -f hot top new rising -l 5 -o ~/Downloads/Reddit/.
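
Under the hood, saveddit talks to Reddit through PRAW (visible in the tracebacks quoted in the comments below). For orientation, here is a rough sketch of the equivalent request made directly with PRAW; it is not saveddit's actual code, and the credentials and user_agent string are placeholders.

    import praw

    # Credentials come from ~/.saveddit/user_config.yaml; the user_agent here is made up.
    reddit = praw.Reddit(
        client_id="<reddit_client_id>",
        client_secret="<reddit_client_secret>",
        user_agent="saveddit example (illustrative)",
    )

    # Roughly what `saveddit subreddit pics -f hot -l 5 -o ...` asks Reddit for.
    for submission in reddit.subreddit("pics").hot(limit=5):
        print(submission.title, submission.url)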

Download from User's page

foo@bar:~$ saveddit user -h
Retrieving configuration from /Users/pranav/.saveddit/user_config.yaml file

usage: saveddit user [-h] users [users ...] {saved,gilded,submitted,upvoted,comments} ...

positional arguments:
  users                 Names of users to download, e.g., Poem_for_your_sprog
  {saved,gilded,submitted,upvoted,comments}

optional arguments:
  -h, --help            show this help message and exit

Example Usage: Download a user's top 10 comments

saveddit user "Poem_for_your_sprog" comments -s top -l 10 -o ~/Desktop

Example Output

foo@bar:~$ tree ~/Downloads/www.reddit.com
/Users/pranav/Downloads/www.reddit.com
├── r
│   └── aww
│       └── new
│           ├── 000_We_decided_to_foster_a_litter_of...
│           │   ├── comments.json
│           │   ├── files
│           │   │   └── 7fjt2gkp32s61.jpg
│           │   └── submission.json
│           ├── 001_Besties_
│           │   ├── comments.json
│           │   ├── files
│           │   │   └── zklpm1qo32s61.jpg
│           │   └── submission.json
│           ├── 002_My_cat_dice_with_his_best_friend...
│           │   ├── comments.json
│           │   ├── files
│           │   │   └── av3yrbmo32s61.jpg
│           │   └── submission.json
│           ├── 003_Digging_makes_her_the_happiest_
│           │   ├── comments.json
│           │   ├── files
│           │   │   └── zjw5f3yl32s61.jpg
│           │   └── submission.json
│           └── 004_Our_beloved_pup_needs_some_help_...
│               ├── comments.json
│               ├── files
│               │   ├── 66su4i9b32s61.mp4
│               │   ├── 66su4i9b32s61_audio.mp4
│               │   └── 66su4i9b32s61_video.mp4
│               └── submission.json
└── u
    └── Poem_for_your_sprog
        └── gilded
            ├── 000_Comment__The_guy_was_the_biggest_deal_an...
            │   └── comments.json
            ├── 001_Comment__tl_dr_life_is_long_Journey_s_h...
            │   └── comments.json
            ├── 002_Comment_From_Northwind_mine_to_Talos_shr...
            │   └── comments.json
            ├── 003_Comment__I_feel_terrible_having_people_j...
            │   └── comments.json
            └── 004_Comment_I_often_stop_a_time_or_two_At_...
                └── comments.json

21 directories, 22 files

Supported Links:

  • Direct links to images or videos, e.g., .png, .jpg, .mp4, .gif etc.
  • Reddit galleries reddit.com/gallery/...
  • Reddit videos v.redd.it/...
  • Gfycat links gfycat.com/...
  • Redgif links redgifs.com/...
  • Imgur images imgur.com/...
  • Imgur albums imgur.com/a/... and imgur.com/gallery/...
  • YouTube links youtube.com/... and youtu.be/...
  • Other sites supported by youtube-dl
  • Self posts
  • For all other cases, saveddit will simply fetch the HTML of the URL (a rough sketch of such a fallback follows below)
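
For that last case, the sketch below shows what a minimal HTML fallback can look like, assuming the requests library; the function name, output filename, and timeout are illustrative, not saveddit's actual implementation.

    import requests

    def save_page_html(url, out_path):
        # Fetch the page and write the raw HTML to disk.
        response = requests.get(url, timeout=30)
        response.raise_for_status()
        with open(out_path, "w", encoding="utf-8") as f:
            f.write(response.text)

    save_page_html("https://example.com/some/post", "page.html")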

Contributing

Contributions are welcome, have a look at the CONTRIBUTING.md document for more information.

License

The project is available under the MIT license.

Comments
  • filenames for large multireddits

    Hi, I've just encountered a problem. When I try to make an anonymous multireddit with about 90 subreddits in it, the name of the generated folder throws this error: [Errno 36] File name too long

    Is there a way to bypass this?

    opened by kjboa 2
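
    One hedged workaround sketch for the issue above (not part of saveddit): truncate the generated directory name to a safe byte length before creating it. The 255-byte per-component limit is typical for Linux filesystems but is an assumption here.

        def truncate_name(name, max_bytes=255):
            # Most Linux filesystems cap a single path component at 255 bytes.
            clipped = name.encode("utf-8")[:max_bytes]
            # Drop any partial multi-byte character left at the cut point.
            return clipped.decode("utf-8", errors="ignore")

        print(truncate_name("multireddit_with_90_subreddit_names_joined_together" * 20))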
  • Permission Error

    I am using Linux Mint 20.1. The following error occurred.

        Traceback (most recent call last):
          File "/home/mobi/.local/bin/saveddit", line 8, in <module>
            sys.exit(main())
          File "/home/mobi/.local/lib/python3.8/site-packages/saveddit/saveddit.py", line 68, in main
            downloader.download(args.o,
          File "/home/mobi/.local/lib/python3.8/site-packages/saveddit/subreddit_downloader.py", line 79, in download
            os.makedirs(category_dir)
          File "/usr/lib/python3.8/os.py", line 213, in makedirs
            makedirs(head, exist_ok=exist_ok)
          File "/usr/lib/python3.8/os.py", line 213, in makedirs
            makedirs(head, exist_ok=exist_ok)
          File "/usr/lib/python3.8/os.py", line 213, in makedirs
            makedirs(head, exist_ok=exist_ok)
          [Previous line repeated 2 more times]
          File "/usr/lib/python3.8/os.py", line 223, in makedirs
            mkdir(name, mode)
        PermissionError: [Errno 13] Permission denied: '/Downloads'

    opened by mubashir-rehman 2
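
    The traceback above shows saveddit trying to create '/Downloads' at the filesystem root, which a normal user cannot write to. A hedged sketch of the usual fix on the caller's side, expanding '~' and creating directories tolerantly; this is an illustration, not saveddit's code.

        import os

        # '~' has to be expanded before the path reaches os.makedirs; '/Downloads' at the
        # filesystem root is what you get when that step is skipped or the path is mistyped.
        output_path = os.path.expanduser("~/Downloads")
        os.makedirs(output_path, exist_ok=True)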
  • http 401 error

    Hello, I have the following error when running:

        python -m saveddit.saveddit -r "Ebony" -f "new" -l 2000 -o "E:\E\D\saveddit\test"

        [saveddit ASCII art banner]

        Downloader for Reddit version : v1.0.0
        URL : https://github.com/p-ranav/saveddit

        E:\E\D\saveddit\test
        Downloading from /r/Ebony/new/
        Traceback (most recent call last):
          File "C:\Users\theo\AppData\Local\Programs\Python\Python37\lib\runpy.py", line 193, in _run_module_as_main
            "__main__", mod_spec)
          File "C:\Users\theo\AppData\Local\Programs\Python\Python37\lib\runpy.py", line 85, in _run_code
            exec(code, run_globals)
          File "C:\Users\theo\Downloads\saveddit-master\saveddit-master\saveddit\saveddit.py", line 73, in <module>
            main(args)
          File "C:\Users\theo\Downloads\saveddit-master\saveddit-master\saveddit\saveddit.py", line 32, in main
            categories=args.f, post_limit=args.l, skip_videos=args.skip_videos, skip_meta=args.skip_meta, skip_comments=args.skip_comments)
          File "C:\Users\theo\Downloads\saveddit-master\saveddit-master\saveddit\subreddit_downloader.py", line 74, in download
            for i, submission in enumerate(category_function(limit=post_limit)):
          File "C:\Users\theo\AppData\Local\Programs\Python\Python37\lib\site-packages\praw\models\listing\generator.py", line 63, in __next__
            self._next_batch()
          File "C:\Users\theo\AppData\Local\Programs\Python\Python37\lib\site-packages\praw\models\listing\generator.py", line 73, in _next_batch
            self._listing = self._reddit.get(self.url, params=self.params)
          File "C:\Users\theo\AppData\Local\Programs\Python\Python37\lib\site-packages\praw\reddit.py", line 566, in get
            return self._objectify_request(method="GET", params=params, path=path)
          File "C:\Users\theo\AppData\Local\Programs\Python\Python37\lib\site-packages\praw\reddit.py", line 672, in _objectify_request
            path=path,
          File "C:\Users\theo\AppData\Local\Programs\Python\Python37\lib\site-packages\praw\reddit.py", line 855, in request
            json=json,
          File "C:\Users\theo\AppData\Local\Programs\Python\Python37\lib\site-packages\prawcore\sessions.py", line 331, in request
            url=url,
          File "C:\Users\theo\AppData\Local\Programs\Python\Python37\lib\site-packages\prawcore\sessions.py", line 257, in _request_with_retries
            url,
          File "C:\Users\theo\AppData\Local\Programs\Python\Python37\lib\site-packages\prawcore\sessions.py", line 164, in _do_retry
            retry_strategy_state=retry_strategy_state.consume_available_retry(),  # noqa: E501
          File "C:\Users\theo\AppData\Local\Programs\Python\Python37\lib\site-packages\prawcore\sessions.py", line 257, in _request_with_retries
            url,
          File "C:\Users\theo\AppData\Local\Programs\Python\Python37\lib\site-packages\prawcore\sessions.py", line 164, in _do_retry
            retry_strategy_state=retry_strategy_state.consume_available_retry(),  # noqa: E501
          File "C:\Users\theo\AppData\Local\Programs\Python\Python37\lib\site-packages\prawcore\sessions.py", line 260, in _request_with_retries
            raise self.STATUS_EXCEPTIONS[response.status_code](response)
        prawcore.exceptions.InvalidToken: received 401 HTTP response

    This happened the first time at the 514th file and happened again when I retried.

    opened by reuspppp 2
  • Move client IDs and secrets to a separate configuration file

    Main issue: In your script files you store your client IDs and secrets as constants. This can pose a number of problems.

    Main problems:

    • Sensitive data exposure. Client IDs and secrets are considered rather sensitive data. Storing them as constants is highly discouraged, as basically anyone can get hold of them.
    • Difficult configuration. If you need to change or update your tokens, this is complicated for end users, because they have to edit the script files themselves, which is rather discouraging.
    • Code redundancy. You define your credentials twice, which makes them harder to change (you need to go to every file and update them manually, which is inefficient at best) and leaves you with duplicate variables.

    Solution: Move all this data into a separate configuration file (.yaml or .json) and create a function to parse it. That way you store all your data in one place and retrieve it via a simple function call, updating it becomes much simpler, the code gets a bit leaner, and end users feel more comfortable working with a configuration file than with the raw codebase.

    If you do not mind, please assign me to this issue.

    Thanks for this awesome project!

    enhancement 
    opened by NickolaiBeloguzov 2
  • Fix for bugs plaguing the last commit & added a way to update configuration from the first start

    This update includes:

    • Emergency fixes for the last commit, which made it impossible for the script to run in some cases.
    • Updates to both documentations (readme & readmepy).
    • A way for users to add their OAuth credentials right from the terminal on first start. (This is very useful for people who use Termux on Android and have difficulty editing the yaml file.)
    opened by Theoneflop 1
  • [Help] Comments Limits & Possible no-duplicate

    Hi! I just downloaded this and was wondering how I could remove the limit on how many comments it downloads? (The current limit is top comments only.)

    Also, I was wondering how I could prevent it from re-downloading posts I have already downloaded.

    Thanks!

    opened by Theoneflop 1
  • Added configuration file support

    All website IDs and secrets were moved to an external file called 'user_config.yaml'. A new saveddit.configuration module was also created to parse this configuration file.

    The module searches for the file and, if it does not find one, creates an empty configuration file, prompts the user to place their valid credentials into it, and exits the script.

    A new dependency was added: pyyaml==5.4.1

    'user_config.yaml' was added to .gitignore to prevent leaking sensitive data.

    opened by NickolaiBeloguzov 1
  • Comments limit argument & duplicates avoidance for subreddits.

    What does this pull request change in the code? Simple.

    • From now on, users can choose whether to download only top-level comments or the whole comment section of a post inside a subreddit by including the argument "--all-comments".
    • In case of a technical error / electricity shutdown / internet issues etc. while downloading a subreddit, saveddit will make sure not to re-download already downloaded posts when the same command is run with the same output directory.
    opened by Theoneflop 0
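
    In PRAW terms, the difference between top-level-only and the full comment section usually comes down to whether the "MoreComments" placeholders get expanded. The sketch below is a rough illustration of that distinction, not the code in this pull request; credentials are placeholders and the submission id is taken from the Windows issue further down.

        import praw

        reddit = praw.Reddit(client_id="...", client_secret="...", user_agent="example")
        submission = reddit.submission(id="haucpf")  # any submission id

        # Top-level comments only: iterate the forest without expanding "more" placeholders.
        top_level = [c for c in submission.comments if hasattr(c, "body")]

        # Whole comment section (what --all-comments wants): expand every placeholder first.
        submission.comments.replace_more(limit=None)
        all_comments = submission.comments.list()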
  • Make saveddit a CL-callable module

    Main issue: Using the python3 -m saveddit.saveddit [args] command is not very convenient, for multiple reasons.

    Reasons:

    • You need to be in the same directory as saveddit, which is an unnecessary step that can be eliminated
    • You need to call the module directly, which a) can be confusing and b) can be eliminated

    Solution: Make this module callable from anywhere by creating a setup.py script and assembling it into a Python package. This way you can make saveddit available for download via PyPI - the largest Python project repository - with a simple pip install saveddit command. To use the package, you just execute the saveddit [args] command without changing your working directory. Users can also easily update the package, and you can modify its contents with ease.

    opened by NickolaiBeloguzov 0
  • Need error handling or processing of non-media posts.

    Getting the following error occasionally:

         * This is a redgif link
           - Looking for submission.preview.reddit_video_preview.fallback_url
    Traceback (most recent call last):
      File "/usr/lib/python3.8/runpy.py", line 194, in _run_module_as_main
        return _run_code(code, main_globals, None,
      File "/usr/lib/python3.8/runpy.py", line 87, in _run_code
        exec(code, run_globals)
      File "/home/christopher/saveddit/saveddit/saveddit.py", line 65, in <module>
        main(args)
      File "/home/christopher/saveddit/saveddit/saveddit.py", line 31, in main
        downloader.download(args.o,
      File "/home/christopher/saveddit/saveddit/subreddit_downloader.py", line 141, in download
        self.download_gfycat_or_redgif(submission, files_dir)
      File "/home/christopher/saveddit/saveddit/subreddit_downloader.py", line 371, in download_gfycat_or_redgif
        if "reddit_video_preview" in submission.preview:
      File "/home/christopher/.local/lib/python3.8/site-packages/praw/models/reddit/base.py", line 35, in __getattr__
        return getattr(self, attribute)
      File "/home/christopher/.local/lib/python3.8/site-packages/praw/models/reddit/base.py", line 36, in __getattr__
        raise AttributeError(
    AttributeError: 'Submission' object has no attribute 'preview'
    
    
    opened by cmullins83 0
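
    PRAW raises AttributeError lazily when a submission carries no preview data, which is what the traceback above shows. A hedged sketch of one defensive pattern, not the project's actual fix; the helper name is illustrative.

        def preview_fallback_url(submission):
            # `submission` is a praw Submission; not every post has preview data, so guard
            # the lazy attribute access instead of letting AttributeError bubble up.
            preview = getattr(submission, "preview", None)
            if preview and "reddit_video_preview" in preview:
                return preview["reddit_video_preview"]["fallback_url"]
            return None  # treat it as a non-media post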
  • Scraping comments in order

    Does this library scrape comments of a given post in the order of their occurrence without messing with the hierarchy? The praw library helps in scraping all the comments but they are not in order. Please let me know if this library can do that and the command I should use.

    I used the command below and got an error:

    python3 -m bdfr download ./path/to/output --all-comments -l "https://www.reddit.com/r/germany/comments/yydfai/what_is_your_opinion_of_graffiti_all_over_walls/"

    Error: No such option: --all-comments

    Thank you

    opened by naveenmalla046 0
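
    For reference, PRAW's comment forest does preserve the thread structure. Below is a hedged sketch of a depth-first walk that keeps the hierarchy and an oldest-first order, independent of either saveddit or bdfr; credentials are placeholders and the URL is the one quoted above.

        import praw

        reddit = praw.Reddit(client_id="...", client_secret="...", user_agent="example")
        submission = reddit.submission(
            url="https://www.reddit.com/r/germany/comments/yydfai/"
                "what_is_your_opinion_of_graffiti_all_over_walls/")

        submission.comment_sort = "old"               # oldest first, i.e. order of occurrence
        submission.comments.replace_more(limit=None)  # pull in every "load more comments" stub

        def walk(comments, depth=0):
            # Depth-first traversal keeps each reply nested under its parent.
            for comment in comments:
                print("  " * depth + comment.body.replace("\n", " ")[:80])
                walk(comment.replies, depth + 1)

        walk(submission.comments)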
  • "..." in directory names doesn't work for Windows users.

    Line 41 in submission_downloader.py causes issues for Windows users because directories can't have "..." at the end of their names. For Windows users that line should be commented out.

    opened by doctorrmcb 1
  • After merging audio and video, audio and video stay around

    When downloading a file that splits audio and video into 2 files, the individual audio and video files stay around after saveddit merges them into 1 file. Is this intended functionality, or can there be an option to only keep the merged file when downloading?

    enhancement 
    opened by Kyle-Mickan 2
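
    For anyone who wants this behaviour today, a hedged sketch of deleting the part files only after the merged output exists; the filenames follow the pattern in the example tree above, and the cleanup itself is this editor's illustration, not something saveddit currently does (hence this issue).

        import os

        def cleanup_parts(merged_path, part_paths):
            # Only delete the separate audio/video downloads once the merged file really exists.
            if os.path.isfile(merged_path) and os.path.getsize(merged_path) > 0:
                for path in part_paths:
                    if os.path.isfile(path):
                        os.remove(path)

        cleanup_parts("66su4i9b32s61.mp4",
                      ["66su4i9b32s61_audio.mp4", "66su4i9b32s61_video.mp4"])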
  • FileNotFoundError when downloading a post with a title that is truncated on Windows

    System: Windows 10 64 bit. Python 3.9.5

    Steps to reproduce: Run this on Windows: saveddit subreddit pics -f top -l 5 -o .

    As of today, the top post is this: https://old.reddit.com/r/pics/comments/haucpf/ive_found_a_few_funny_memories_during_lockdown/ Trying to download it gives this output:

    #000 "I’ve found a few funny memories during lockdown. This is from my 1st tour in 89, backstage in Vegas."
         * Processing `https://i.redd.it/f58v4g8mwh551.jpg`
    Traceback (most recent call last):
      File "c:\users\bad_g\appdata\local\programs\python\python39\lib\runpy.py", line 197, in _run_module_as_main
        return _run_code(code, main_globals, None,
      File "c:\users\bad_g\appdata\local\programs\python\python39\lib\runpy.py", line 87, in _run_code
        exec(code, run_globals)
      File "C:\Users\bad_g\AppData\Local\Programs\Python\Python39\Scripts\saveddit.exe\__main__.py", line 7, in <module>
      File "c:\users\bad_g\appdata\local\programs\python\python39\lib\site-packages\saveddit\saveddit.py", line 346, in main    downloader.download(args.o,
      File "c:\users\bad_g\appdata\local\programs\python\python39\lib\site-packages\saveddit\subreddit_downloader.py", line 67, in download
        SubmissionDownloader(submission, i, self.logger, category_dir,
      File "c:\users\bad_g\appdata\local\programs\python\python39\lib\site-packages\saveddit\submission_downloader.py", line 68, in __init__
        files_dir = create_files_dir(submission_dir)
      File "c:\users\bad_g\appdata\local\programs\python\python39\lib\site-packages\saveddit\submission_downloader.py", line 62, in create_files_dir
        os.makedirs(files_dir)
      File "c:\users\bad_g\appdata\local\programs\python\python39\lib\os.py", line 225, in makedirs
        mkdir(name, mode)
    FileNotFoundError: [WinError 3] The system cannot find the path specified: '.\\www.reddit.com\\r\\pics\\top\\000_I_ve_found_a_few_funny_memories_...\\files'
    PS C:\Users\bad_g\Downloads\Saveddit>
    

    This is probably because Windows automatically removes the ellipsis at the end of the directory name. Maybe add an option to disable the truncation and/or simply drop the "..." appended to the directory name on Windows.

    opened by Satiriques 3
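
    Both this issue and the "..." issue above come down to Windows rejecting or silently stripping trailing dots and spaces in directory names. A hedged sketch of one way to sanitize the generated name, not the project's actual fix; the function name and fallback are illustrative.

        def windows_safe_dirname(name):
            # Windows silently drops trailing dots and spaces, so strip them up front and the
            # path used later will match the directory that actually gets created.
            return name.rstrip(". ") or "untitled"

        print(windows_safe_dirname("000_I_ve_found_a_few_funny_memories_..."))
        # -> 000_I_ve_found_a_few_funny_memories_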
  • Add support for the XDG Base Directory Specification

    This is a feature request for supporting the XDG Base Directory Specification.

    The specification works around a bug from the early UNIX v2 rewrite which caused files prefixed with a '.' to be hidden from the output of ls. While this "bug" has become a feature for some, it has also become a headache for users, because developers continue to assume HOME is a great place to dump configuration files and local caches.

    To address these issues, the XDG Base Directory specification was created to give developers a standard location for these files and to give users control over where they are placed in their HOME.

    If you were to support the XDG specification the following locations would change:

    Change ~/.saveddit/ to $XDG_CONFIG_HOME/saveddit and fall back to $HOME/.config/saveddit if XDG_CONFIG_HOME is not defined.

    opened by SaraSmiseth 0
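
    A hedged sketch of resolving the config directory along the lines described above, with a fallback to the current ~/.saveddit location so existing installs keep working; the fallback order is this editor's illustration, not a committed design.

        import os

        def config_dir():
            # Prefer $XDG_CONFIG_HOME/saveddit, fall back to ~/.config/saveddit, and keep
            # honouring a pre-existing ~/.saveddit so old installs are not broken.
            legacy = os.path.expanduser("~/.saveddit")
            if os.path.isdir(legacy):
                return legacy
            xdg = os.environ.get("XDG_CONFIG_HOME") or os.path.expanduser("~/.config")
            return os.path.join(xdg, "saveddit")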