Web-scraping - A bot using Python with BeautifulSoup that scrapes the IRS website by form number and returns the results as JSON

Last update: Jan 4, 2022

Related tags

Web Crawling, Web-scraping

Overview

Extract Data from the IRS website

A bot written in Python with BeautifulSoup that scrapes the IRS website (its listing of prior form publications) by form number and returns the results as JSON. It can also download the PDFs for a range of years.
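As a rough illustration of the scraping step, the sketch below pulls matching rows out of an HTML table with BeautifulSoup. The three-column table layout, the `SAMPLE_HTML` snippet, the `example.invalid` links and the `extract_rows` helper are all assumptions made for the example; the real IRS markup and the logic in Script.py may differ.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for one page of search results.
SAMPLE_HTML = """
<table>
  <tr><th>Product Number</th><th>Title</th><th>Revision Date</th></tr>
  <tr><td><a href="https://example.invalid/fw2--2020.pdf">Form W-2</a></td>
      <td>Wage and Tax Statement (Info Copy Only)</td><td>2020</td></tr>
  <tr><td><a href="https://example.invalid/fw3--2020.pdf">Form W-3</a></td>
      <td>Transmittal of Wage and Tax Statements (Info Copy Only)</td><td>2020</td></tr>
</table>
"""


def extract_rows(html, form_number):
    """Return one dict per table row whose form number matches exactly."""
    soup = BeautifulSoup(html, "html.parser")
    wanted = form_number.casefold()  # the bot's input is not case sensitive
    rows = []
    for tr in soup.find_all("tr"):
        cells = tr.find_all("td")
        if len(cells) != 3:  # skips the header row, which only has <th> cells
            continue
        number = cells[0].get_text(strip=True)
        if number.casefold() != wanted:
            continue
        link = cells[0].find("a")
        rows.append({
            "form_number": number,
            "form_title": cells[1].get_text(strip=True),
            "year": int(cells[2].get_text(strip=True)),
            "pdf_url": link.get("href") if link else None,
        })
    return rows


rows = extract_rows(SAMPLE_HTML, "form w-2")
```

Matching on the exact form number (rather than a substring) keeps a query for "Form W-2" from also picking up rows such as "Form W-2 P", assuming such rows appear in the results.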

How to run the script?

This script runs on Python 3.8. Install the libraries listed in requirements.txt into a new environment, then run 'Script.py'.

What should I expect?

The script will ask you for the form number(s) and then scrape the IRS website.
--> Please enter the complete tax form number separated by a comma followed by a space (not case sensitive): (ie. Form W-2, Form 1095-C, Form W-3, etc)
--> Form W-2, Form 1095-C
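One way to turn that answer into a clean list of form numbers is sketched below. The `parse_form_numbers` helper is hypothetical and not taken from Script.py; it assumes duplicates should be dropped without regard to case, since the prompt says the input is not case sensitive.

```python
from typing import List


def parse_form_numbers(raw: str) -> List[str]:
    """Split a comma-separated answer into unique form numbers."""
    seen = set()
    forms = []
    for part in raw.split(","):
        name = " ".join(part.split())  # trim and collapse inner whitespace
        key = name.casefold()          # case-insensitive comparison key
        if name and key not in seen:
            seen.add(key)
            forms.append(name)
    return forms


print(parse_form_numbers("Form W-2, Form 1095-C"))
# prints ['Form W-2', 'Form 1095-C']
```

Splitting on the comma alone and trimming afterwards also accepts answers typed without the space after each comma.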

Then the bot will ask if the user would like to download the forms.
--> Would you like to download all related pdfs? (Y/N)

If the user chooses to download, the bot follows up by asking for a year range.
--> Please provide the year range by using a dash in between the years (starting year must be smaller than ending year): (ie. 2018-2020)
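The year-range answer can be validated with a small parser such as the sketch below. The `parse_year_range` name and the error messages are assumptions for illustration; the strict "smaller than" check follows the wording of the prompt.

```python
import re
from typing import Tuple

# Two four-digit years separated by a dash, with optional surrounding spaces.
YEAR_RANGE = re.compile(r"^\s*(\d{4})\s*-\s*(\d{4})\s*$")


def parse_year_range(text: str) -> Tuple[int, int]:
    """Parse 'YYYY-YYYY' into (start, end); raise ValueError if invalid."""
    match = YEAR_RANGE.match(text)
    if match is None:
        raise ValueError("expected a range such as 2018-2020")
    start, end = int(match.group(1)), int(match.group(2))
    if start >= end:
        raise ValueError("starting year must be smaller than ending year")
    return start, end


start, end = parse_year_range("2018-2020")
years = list(range(start, end + 1))  # inclusive: 2018, 2019, 2020
```

Raising ValueError lets the caller re-prompt the user instead of continuing with a bad range.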

Once executed, the bot will automatically create a folder and download the relevant pdfs into the folder.
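A minimal sketch of the download step follows, using only the standard library. The folder layout (one folder per form), the file naming scheme and both helper names are assumptions; Script.py may organise its downloads differently and may use another HTTP client.

```python
from pathlib import Path
from urllib.request import urlopen


def pdf_destination(root: Path, form_number: str, year: int) -> Path:
    """Build the target path for one PDF (hypothetical naming scheme)."""
    safe_name = form_number.replace("/", "-")  # keep the name a single path part
    return root / safe_name / f"{safe_name} - {year}.pdf"


def download_pdf(url: str, destination: Path) -> None:
    """Fetch one PDF and write it to disk, creating the folder if needed."""
    destination.parent.mkdir(parents=True, exist_ok=True)
    with urlopen(url, timeout=30) as response:
        destination.write_bytes(response.read())


target = pdf_destination(Path("downloads"), "Form W-2", 2018)
# download_pdf(pdf_url, target)  # pdf_url would come from the scraped results
```

Keeping the path construction separate from the network call makes the naming easy to check without touching the IRS website.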

Finally, the results are returned as a JSON string. If there are no results, the user gets 'No results' instead.

Sample output:
[
  {'form_number': 'Form W-2', 'form_title': 'Wage and Tax Statement (Info Copy Only)', 'min_year': '1954', 'max_year': '2022'},
  {'form_number': 'Form 1095-C', 'form_title': 'Employer-Provided Health Insurance Offer and Coverage', 'min_year': '2014', 'max_year': '2022'},
  {'form_number': 'Form W-3', 'form_title': 'Transmittal of Wage and Tax Statements (Info Copy Only)', 'min_year': '1990', 'max_year': '2022'}
]
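The summary shown in the sample can be built by collapsing the per-year rows into one entry per form. The sketch below assumes the scraped rows arrive as (form_number, form_title, year) tuples and that the first title seen for a form is kept; `summarize` and `results_to_json` are hypothetical names, not functions from Script.py.

```python
import json
from typing import Dict, Iterable, List, Tuple


def summarize(rows: Iterable[Tuple[str, str, int]]) -> List[Dict[str, str]]:
    """Collapse per-year rows into one entry per form with min and max year."""
    summary: Dict[str, Dict[str, str]] = {}
    for form_number, title, year in rows:
        entry = summary.get(form_number)
        if entry is None:
            summary[form_number] = {
                "form_number": form_number,
                "form_title": title,  # first title seen is kept
                "min_year": str(year),
                "max_year": str(year),
            }
        else:
            entry["min_year"] = str(min(int(entry["min_year"]), year))
            entry["max_year"] = str(max(int(entry["max_year"]), year))
    return list(summary.values())


def results_to_json(rows: Iterable[Tuple[str, str, int]]) -> str:
    """Return the summary as a JSON string, or 'No results' when empty."""
    results = summarize(rows)
    return json.dumps(results) if results else "No results"


sample_rows = [
    ("Form W-2", "Wage and Tax Statement (Info Copy Only)", 2019),
    ("Form W-2", "Wage and Tax Statement (Info Copy Only)", 1954),
    ("Form W-3", "Transmittal of Wage and Tax Statements (Info Copy Only)", 1990),
]
output = results_to_json(sample_rows)
```

Years are stored as strings to match the sample output. Note that `json.dumps` emits double-quoted strings, whereas the single quotes in the sample match how Python prints a list of dicts.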

Note: To keep users engaged, the bot will display which task it is performing and what URL it is currently searching.

You might also like...

Trending-searches list: Python crawler + regex (re) + BeautifulSoup + XPath

A powerful annex BUBT, BUBT Soft, and BUBT website scraping script.

Web Scraping images using Selenium and Python

A training task for web scraping using python multithreading and a real-time-updated list of available proxy servers.

Source code for web scraping with the Python library Selenium.

Web-Scraping using Selenium Master

A simple django-rest-framework api using web scraping

Amazon web scraping using Scrapy Framework

Web mining module for Python, with tools for scraping, natural language processing, machine learning, network analysis and visualization.

A Python-oriented tool that scrapes WhatsApp group links from Google results using Google Dorks and returns the working links.

Simple web scraper bot to scrape webpages using Requests, html5lib and BeautifulSoup.

This is a simple website crawler which asks for a website link from the user to crawl and find specific data from the given website address.

A package that provides you Latest Cyber/Hacker News from website using Web-Scraping.

A happy and lightweight Python package that searches the Google News RSS feed, returns a usable JSON response and scrapes the complete article. No need to write scrapers for fetching articles anymore.

Web-scraping - Program that scrapes a website for a collection of quotes, picks one at random and displays it

Simple library for exploring/scraping the web or testing a website you’re developing

Github scraper app is used to scrape data for a specific user profile created using streamlit and BeautifulSoup python packages

A simple scraper written in Python using the BeautifulSoup library

A modern CSS selector implementation for BeautifulSoup