Utility for downloading fanfiction in bulk from the Archive of Our Own

Overview

What is this?

This is a program intended to help you download fanfiction from the Archive of Our Own in bulk. This program is primarily intended to work with links to the Archive of Our Own itself, but has a secondary function of downloading any Pinboard bookmarks that link to the Archive of Our Own. You can ignore the Pinboard functionality if you don't know what Pinboard is or don't use Pinboard. This program is lightly tested and is currently very likely to have bugs.

Instructions

  • install python
    • make sure to choose the option "add to PATH" when you are installing python. if you do not do this the program is even less likely to work correctly than it already was.
  • clone (or download and unzip) the repository. the "repository" means the folder containing the code. you can download the repository by clicking on the "Code" button in github and selecting "Download ZIP"
  • windows: double-click on "ao3downloader.cmd"
  • other platforms: ao3downloader should work on any platform that supports python. however, you will need to do your own research into how to run python programs on your system.

Menu Options Explanation

  • 'download from ao3 link' - this works for most links to ao3. for example, you can use this to download a single work, a series, or any ao3 page that contains links to works or series (such as your bookmarks or an author's works). the program will download multiple pages automatically without the need to enter the next page link manually.
  • 'download latest version of incomplete fics (ao3 epub files only)' - you can use this to check a folder on your computer (and any subfolders) for epub files downloaded from ao3 that are incomplete works. for each incomplete fic found, the program will check ao3 to see if there are any new chapters, and if so, will download the new version to the downloads folder. apologies but this does not work for filetypes other than epub.
  • 'download pinboard xml document' - this is the first step in downloading your ao3 bookmarks from pinboard. ignore this if you don't use pinboard. to get the api token go to settings -> password on the pinboard website.
  • 'download bookmarks from pinboard xml document' - this is the second step in downloading your ao3 bookmarks from pinboard. ignore this if you don't use pinboard or if you haven't yet downloaded the pinboard xml document.
  • 'convert logfile into interactable html' - all downloads from ao3 (and some other actions) are logged in a file called log.jsonl in the downloads folder, along with information such as whether or not the download was successful, details about errors encountered, and so on. this option converts log.jsonl into a much more human-readable, searchable and sortable html file that can be opened in any browser. the file is saved in the downloads folder and is called 'logvisualization.html'
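
For readers who want to inspect log.jsonl directly: a .jsonl (JSON Lines) file holds one JSON object per line. The sketch below shows one way to read such a file and list failed downloads. The field names "link" and "success" are assumptions made for illustration only; the real log may use different names.

```python
import json

def read_log(lines):
    """Parse JSON Lines text: one JSON object per non-empty line."""
    entries = []
    for line in lines:
        line = line.strip()
        if line:
            entries.append(json.loads(line))
    return entries

def failed_links(entries):
    # "link" and "success" are hypothetical field names, not confirmed by the program
    return [e.get("link") for e in entries if e.get("success") is False]

# Two made-up log lines standing in for the contents of log.jsonl
sample = [
    '{"link": "https://archiveofourown.org/works/1", "success": true}',
    '{"link": "https://archiveofourown.org/works/2", "success": false}',
]
entries = read_log(sample)
print(failed_links(entries))
```

To use this on a real file, pass an open file object (for example `open("log.jsonl", encoding="utf-8")`) to `read_log` in place of `sample`.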

Notes

  • The purpose of entering your ao3 login information is to download archive-locked works or anything else that is not visible when you are not logged in. If you don't care about that, there is no need to enter your login information.
  • Try to keep your ao3 browsing to a minimum while the script is running. It won't break anything, but it may cause you to hit ao3's limit on how many hits to the site you are allowed within a certain time frame. This limit is per user, or per IP if you are not logged in. If this happens, the script will pause for 5 minutes to let the limit reset, and you may see a "Retry later" message when you try to open an ao3 page during that time. Don't be alarmed by this, just wait it out.
  • You can guess the approximate runtime in seconds by multiplying the number of works to be downloaded by five. This is a very rough estimate, as many factors can affect the total runtime.
  • If the script encounters a work that is part of a series, it will also download the entire series that the work is a part of.
  • For multi-page downloads from ao3, a message will be printed to the console each time a new page starts downloading. If you need to stop the download in the middle, take note of the last page downloaded before you close the window. When you restart, enter the link to that specific page instead of the first page, to avoid repeating downloads as much as possible. Note that pinboard bookmarks are not paginated in the same way, so this will not work if you are downloading bookmarks from pinboard.
  • IMPORTANT: some of your input choices are saved in settings.json. In some cases you will not be able to change these choices unless you clear your settings by deleting settings.json (or editing it, if you are comfortable with json). In addition, please note that saved settings include passwords and keys and are saved in plain text. Use appropriate caution with this file.
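
The runtime rule of thumb from the notes above (about five seconds per work) can be written out as a small calculation. This is only the arithmetic from the note, not code taken from the program, and the function name is made up for illustration.

```python
SECONDS_PER_WORK = 5  # rough figure from the note; real speed varies

def estimate_runtime(work_count):
    """Return a rough (minutes, seconds) estimate for downloading work_count works."""
    total_seconds = work_count * SECONDS_PER_WORK
    return divmod(total_seconds, 60)

minutes, seconds = estimate_runtime(500)
print(f"about {minutes} min {seconds} s")  # 500 works -> about 41 min 40 s
```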

Troubleshooting

  • First, if you are able to create logvisualization.html (menu option 'v'), take a look through the logs to see if there are any helpful error messages.
  • If there are no logs or the logs are unhelpful, look for a folder called "venv" inside the repository. Delete "venv" and try re-running the script.
  • If deleting venv doesn't work, try deleting the entire repository and re-downloading from github (but remember to save your existing downloads if you have any!)
  • If re-downloading the repository doesn't work, try reinstalling python. Make sure to choose the option "add to PATH" during the installation.
  • If reinstalling python doesn't work, see this stackoverflow answer.
  • If you have tried all of the above and it still doesn't work, see below for how to send me a bug report.

Questions? Comments? Bug reports?

Feel free to email me at [email protected]. Please include "ao3downloader" in the subject line. If you are reporting a bug, please describe exactly what you did to make the bug happen to the best of your ability. (More is more! Be as detailed as possible.) Optionally when reporting bugs, it is also helpful if you include log.jsonl in the email as an attachment.

(Please note that while I will absolutely do my best to get back to you, I can't make any promises - I have a job, etc.)

Comments
  • Unable To Download Individual Fics From Collections

    Hello, nineyna! First and foremost, thank you so much for creating this wonderful program. I have 0 experience in programming and I really appreciate your instructions and this program in general. (I sent an ask previously, if this question is familiar to you.)

    I have run into an issue while using it. I can't download individual stories from collections, for example Big Bangs. If I include stories in a series, they are able to be downloaded; however, the individual fics do not show up in the downloads folder.

    I have tried deleting the settings and re-downloading the program itself, and I tried to see if the files were hidden, but still nothing. I checked for this issue in several separate collections and also tried to see if maybe my filters affected the download, but the result is the same.

    I have checked the logs and there is only the link of the collection listed, but then no stories listed as being downloaded if I don't select to download from a series.

    I don't have this issue if I am downloading from my favorites, from a creator's page or from a works tag, just the collections.

    I do not know if other people also ran into this issue or if it is just me. I understand if you do not have the time or if it is not possible to fix this issue but I wanted to bring this to your attention.

    Thank you so much for making this awesome program! Me and my library are very grateful to you.

    bug user requested 
    opened by ghost 4
  • Initial set-up MacOS error code return

    Let me preface by saying I don't know coding at all and am following the guide to the best of my ability. I'm on macOS and I'm getting an error in the command window during initial set-up, after running the command "pip install -r requirements.txt". I think the error means that I'm missing a key code package, but I figure it would be best to ask the developer to be sure. Output pasted below:

    Collecting beautifulsoup4==4.9.3
      Using cached beautifulsoup4-4.9.3-py3-none-any.whl (115 kB)
    Collecting certifi==2020.12.5
      Using cached certifi-2020.12.5-py2.py3-none-any.whl (147 kB)
    Collecting cffi==1.15.0
      Using cached cffi-1.15.0-cp310-cp310-macosx_11_0_arm64.whl (173 kB)
    Collecting chardet==4.0.0
      Using cached chardet-4.0.0-py2.py3-none-any.whl (178 kB)
    Collecting colorama==0.4.4
      Using cached colorama-0.4.4-py2.py3-none-any.whl (16 kB)
    Collecting cryptography==36.0.1
      Using cached cryptography-36.0.1-cp36-abi3-macosx_10_10_universal2.whl (4.8 MB)
    Collecting cssselect==1.1.0
      Using cached cssselect-1.1.0-py2.py3-none-any.whl (16 kB)
    Collecting EbookLib==0.17.1
      Using cached EbookLib-0.17.1.tar.gz (111 kB)
      Preparing metadata (setup.py) ... done
    Collecting idna==2.10
      Using cached idna-2.10-py2.py3-none-any.whl (58 kB)
    Collecting loguru==0.5.3
      Using cached loguru-0.5.3-py3-none-any.whl (57 kB)
    Collecting lxml==4.6.4
      Using cached lxml-4.6.4.tar.gz (3.2 MB)
      Preparing metadata (setup.py) ... error
      error: subprocess-exited-with-error

      × python setup.py egg_info did not run successfully.
      │ exit code: 1
      ╰─> [3 lines of output]
          Building lxml version 4.6.4.
          Building without Cython.
          Error: Please make sure the libxml2 and libxslt development packages are installed.
          [end of output]

    opened by Arukou-Arukou 4
  • Downloaded New Versions of Python and Downloader Now Have a NoModuleFoundError: no module named "requests"

    Hello,

    Thank you for making the AO3 downloader. I have an issue. I decided to download the newest versions of the downloader and python, but now I get a NoModuleFound: No module named "requests" error. I think I downloaded everything correctly, but I don't know what to do now.

    Thank you for reading.

    opened by readingfan 2
  • Work links in summaries should not be included in the download

    Regression caused by #43. Previously, the downloader only recognized internal/relative work links (of the form /works/12345). Now, work links of the form [...]/works/12345 are included also. This has the unintended side effect of including absolute work links (https://archiveofourown.org/works/12345) in the list. This can happen when an author manually links to another work on ao3 in their summary. Aside from not being intended behavior, the ao3 root url is still prepended to the absolute links (https://archiveofourown.orghttps://archiveofourown.org/works/12345) causing mayhem.

    tl;dr amend the work and series patterns to recognize internal (starting with "/") links only.

    bug 
    opened by nianeyna 1
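
The fix proposed in the issue above (recognize only links that start with "/") can be sketched with anchored regular expressions. The pattern names and the exact shape of the patterns are assumptions for illustration; the program's real patterns may differ, and "[...]" in the issue is treated here as "any path prefix".

```python
import re

# Hypothetical patterns: "^/" forces the link to be relative to the site root,
# while "(?:.*/)?" still allows a prefix such as a collection path.
WORK_PATTERN = re.compile(r"^/(?:.*/)?works/\d+")
SERIES_PATTERN = re.compile(r"^/(?:.*/)?series/\d+")

def is_internal_work_link(href):
    """True only for work links that begin with "/"."""
    return bool(WORK_PATTERN.match(href))

hrefs = [
    "/works/12345",                             # internal link: accepted
    "/collections/example/works/12345",         # internal link with a prefix: accepted
    "https://archiveofourown.org/works/12345",  # absolute link from a summary: rejected
]
for href in hrefs:
    print(href, is_internal_work_link(href))
```

Because absolute links never match, the ao3 root url is only ever prepended to links that are actually relative.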
  • Add slowmode

    Currently the program does its best to skate just under the ao3 rate limit, which makes it go as fast as possible but tends to foul up casual ao3 browsing while it's running. Add an option to halve the speed so you can browse in peace.

    wontfix 
    opened by nianeyna 1
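
For illustration only, since the issue above is marked wontfix: one way a slow mode could work is to enforce a minimum delay between requests and double that delay when the option is on. The class name, parameters and interval value below are hypothetical and do not describe how the program actually handles the rate limit.

```python
import time

class Throttle:
    """Enforce a minimum delay between consecutive requests."""

    def __init__(self, min_interval, slow_mode=False,
                 clock=time.monotonic, sleep=time.sleep):
        # Slow mode halves the speed by doubling the wait between requests
        self.min_interval = min_interval * 2 if slow_mode else min_interval
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous request, if any

    def wait(self):
        """Sleep just long enough to keep requests min_interval apart."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now

# Usage: call wait() before each request
throttle = Throttle(min_interval=0.01, slow_mode=True)
for _ in range(3):
    throttle.wait()
    # a request to ao3 would go here
```

Passing the clock and sleep functions in as parameters keeps the class easy to test without real waiting.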
  • Save embedded images

    Nothing fancy here. Probably just gonna label them with the fic name and dump them in a subfolder in downloads (presuming I can get it to work in the first place, but theoretically it should just be a matter of parsing the soup for img tags)

    opened by nianeyna 0
  • Find some way to pause non-paginated downloads

    The pinboard and redownload functions are a bit scary because it is expected that they might need to download very large numbers of fics - potentially taking many hours - and there is absolutely no way to pause them in the middle. There has GOT to be some way to save state on these guys.

    opened by nianeyna 0
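
One possible approach to the issue above, sketched under assumptions: record each finished link in a small state file, and skip anything already recorded when the download is restarted. The function names and the state file format are hypothetical, and `download` stands in for whatever actually fetches a fic.

```python
import json
import os
import tempfile

def load_done(path):
    """Return the set of links already downloaded, or an empty set."""
    if not os.path.exists(path):
        return set()
    with open(path, encoding="utf-8") as f:
        return set(json.load(f))

def save_done(path, done):
    """Write the finished links to the state file."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(sorted(done), f)

def download_all(links, state_path, download):
    """Download each link once, saving progress after every success."""
    done = load_done(state_path)
    for link in links:
        if link in done:
            continue  # finished in an earlier run
        download(link)
        done.add(link)
        save_done(state_path, done)
    return done

# Usage: the second run skips links finished by the first run
state_file = os.path.join(tempfile.mkdtemp(), "state.json")
fetched = []
download_all(["link-a", "link-b"], state_file, fetched.append)
download_all(["link-a", "link-b", "link-c"], state_file, fetched.append)
print(fetched)
```

Saving after every link costs a small file write per fic, but it means closing the window at any point loses at most the download in progress.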
  • Improve handling of custom exceptions

    • Make all custom exceptions inherit from a base class
    • Don't bother logging stack trace for custom exceptions... I know where they came from
    • Instead of "message" field, just pass the relevant message into the exception constructor (no idea why I didn't do this in the first place...)
    opened by nianeyna 0
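
The three points in the issue above could look roughly like this. All class and function names are made up for illustration and are not the program's actual exceptions.

```python
import traceback

class DownloaderError(Exception):
    """Hypothetical base class for all custom exceptions."""

class LoginError(DownloaderError):
    """Raised when logging in fails (illustrative)."""

class DownloadError(DownloaderError):
    """Raised when a work cannot be downloaded (illustrative)."""

def log_entry(exc):
    """Build a log record: message only for custom errors, stack trace otherwise."""
    entry = {"error": str(exc)}
    if not isinstance(exc, DownloaderError):
        # Unexpected errors keep their stack trace for debugging
        entry["stacktrace"] = "".join(
            traceback.format_exception(type(exc), exc, exc.__traceback__)
        )
    return entry

try:
    # The message goes straight into the constructor, no separate "message" field
    raise LoginError("ao3 login failed")
except Exception as e:
    print(log_entry(e))
```

With a shared base class, a single `isinstance` check is enough to tell expected errors from unexpected ones.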
  • Log file locations and urls of fics found to be incomplete

    Currently a log is only written at the download step, and that only happens if the fic is found to have new chapters. It would be better to have visibility into all incomplete fics, including the ones that haven't been updated.

    opened by nianeyna 0
  • Log works with creator styles

    Related to #10

    Integrating a way to preserve workskins is challenging. As a stopgap, add a column to the log file to allow users to identify works which have custom skins, so that they can manually download them some other way. h/t @heartlessglasses for this idea.

    enhancement user requested 
    opened by nianeyna 0
  • Add support for downloading from manually entered links lists

    Discussed in https://github.com/nianeyna/ao3downloader/discussions/41

    Originally posted by coolioniki, August 23, 2022:

    Hello! This is an amazing piece of work.

    I wanted to ask if there's a way to input multiple links (directly of the fic) into the program and download it that way? Like FanFicFare (which is amazing and what I have been using so far, but have also been having formatting issues with).

    enhancement user requested 
    opened by nianeyna 0
  • Use logging or events instead of print() in certain places

    Specifically:

    • inside repo.py when alerting the user about status 429 events
    • inside soup.py when alerting the user that a new page has started downloading

    Reasoning: separation of concerns between the logic layer and the UI layer

    opened by nianeyna 0
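
A minimal sketch of the separation described in the issue above, using Python's standard logging module. The function name and logger name are assumptions, and the 300-second pause simply mirrors the 5-minute pause mentioned in the notes; none of this is copied from repo.py.

```python
import logging

logger = logging.getLogger("ao3downloader")

def handle_response_status(status_code, pause_seconds=300):
    """Logic layer: report a rate-limit hit through the logger, not print()."""
    if status_code == 429:
        logger.warning("status 429 received, pausing for %d seconds", pause_seconds)
        return pause_seconds
    return 0

# UI layer: decides where and how the messages are shown
logging.basicConfig(level=logging.INFO, format="%(message)s")
handle_response_status(429)
```

The logic layer never needs to know whether messages end up in the console, a file, or a graphical interface.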
  • Let users toggle password save

    Would want to do this in some persistent way that doesn't require running the program in order to change. Possibly an ini file. Should probably be turned off by default, rather than on like it currently is.

    opened by nianeyna 0
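
The ini file idea from the issue above could be read with the standard library's configparser. The section name "settings" and the option name "save_password" are assumptions for illustration, with the option defaulting to off as the issue suggests.

```python
import configparser

DEFAULTS = {"save_password": "no"}  # off unless the user turns it on

def load_save_password(ini_text):
    """Return True only if the ini text explicitly enables password saving."""
    parser = configparser.ConfigParser()
    parser.read_dict({"settings": DEFAULTS})  # defaults first
    parser.read_string(ini_text)              # user values override them
    return parser.getboolean("settings", "save_password")

print(load_save_password(""))                                   # False
print(load_save_password("[settings]\nsave_password = yes\n"))  # True
```

Since an ini file is plain text, users can change the option in any text editor without running the program.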