This is a webscraper for a specific website

Overview

Web-Scraper-for-a-news-website

This is a webscraper for a specific news website, the Economic Times. It is tuned to extract the site's headlines, but with small adjustments it can extract any other part of the website.
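With BeautifulSoup, "small adjustments" often means changing the CSS selector that picks elements out of the page. The sketch below shows that idea. It is not the repository's code: the `extract` helper, the sample markup and the selectors are all hypothetical, because the Economic Times' real HTML structure is not documented here and changes over time. Inspect the live page in your browser's developer tools to find the right selectors.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for an archive page; the real
# Economic Times HTML will differ.
SAMPLE_HTML = """
<ul class="content">
  <li><span class="date">Jan 1, 2019</span><a href="/news/1">Example headline A</a></li>
  <li><span class="date">Jan 2, 2019</span><a href="/news/2">Example headline B</a></li>
</ul>
"""

def extract(html, selector):
    """Return the stripped text of every element matching a CSS selector."""
    soup = BeautifulSoup(html, "html.parser")
    return [el.get_text(strip=True) for el in soup.select(selector)]

# Headlines with one (hypothetical) selector...
headlines = extract(SAMPLE_HTML, "ul.content li a")
# ...and a different part of the page just by changing the selector.
dates = extract(SAMPLE_HTML, "ul.content li span.date")
print(headlines, dates)
```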

Installation

Install the following:

  1. Selenium: follow the instructions at https://selenium-python.readthedocs.io/installation.html to install Selenium.
  2. ChromeDriver: check your Chrome browser's version (Menu -> Help -> About Google Chrome) and download the matching ChromeDriver from https://sites.google.com/chromium.org/driver/home
  3. tqdm: https://pypi.org/project/tqdm/
  4. BeautifulSoup4: https://pypi.org/project/beautifulsoup4/
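A quick way to confirm that the Python packages above can be imported before running anything. This is a sketch, not part of the repository, and the `missing_packages` helper is hypothetical. Note that BeautifulSoup4 is installed as `beautifulsoup4` but imported as `bs4`, and that ChromeDriver is a separate binary rather than a Python package, so the sketch only checks whether a `chromedriver` executable is on your PATH.

```python
import importlib.util
import shutil

# Map of import name -> pip package name for the dependencies listed above.
REQUIRED = {
    "selenium": "selenium",
    "tqdm": "tqdm",
    "bs4": "beautifulsoup4",  # BeautifulSoup4 is imported as "bs4"
}

def missing_packages(required=REQUIRED):
    """Return the pip names of required packages that cannot be found."""
    return [
        pip_name
        for module, pip_name in required.items()
        if importlib.util.find_spec(module) is None
    ]

missing = missing_packages()
if missing:
    print("Missing packages; install with: pip install " + " ".join(missing))
else:
    print("All Python packages found.")

# ChromeDriver is not a Python package: only check whether it is on PATH.
if shutil.which("chromedriver") is None:
    print("chromedriver was not found on PATH.")
```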

Using the webscraper

The scripts must be run in a fixed order, because each one builds on the output of the one before it:

  1. ET_Archive_Links.py: Run this script first, since everything that follows depends on it. It collects the initial links from the website's Archive page.
  2. ET_All_Links_Inside_Archive.py: Takes the CSV file produced by the previous script and writes a new file containing the URLs of all news articles archived on the website since 2002.
  3. ET_Content.py: Finally, scrapes the headlines along with their dates. If you want to scrape any other part of the website, this is the script to edit.
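Because the order matters, a small driver can run the three stages in sequence and stop at the first failure. This is a minimal sketch, not part of the repository: it assumes the three scripts sit in one directory and read and write their CSV files relative to that directory, and the `run_pipeline` helper is hypothetical.

```python
import subprocess
import sys
from pathlib import Path

# Stage order as given in this README.
PIPELINE = [
    "ET_Archive_Links.py",
    "ET_All_Links_Inside_Archive.py",
    "ET_Content.py",
]

def run_pipeline(scripts=PIPELINE, workdir="."):
    """Run each script in order with the current Python interpreter.

    check=True raises subprocess.CalledProcessError on a non-zero exit
    status, so later stages never run after an earlier one has failed.
    """
    workdir = Path(workdir).resolve()
    finished = []
    for name in scripts:
        script = workdir / name
        if not script.is_file():
            raise FileNotFoundError(f"missing pipeline stage: {script}")
        print(f"Running {name} ...")
        # Run inside workdir so relative CSV paths resolve there.
        subprocess.run([sys.executable, str(script)], cwd=workdir, check=True)
        finished.append(name)
    return finished
```

Calling `run_pipeline()` from the repository directory would then run all three stages in the required order.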

Dataset

I used the scraper on another news website, Businessline. Its dataset is available on Kaggle (https://www.kaggle.com/rsiyanwal/20182019-businessline-headlines).

Owner
Rahul Siyanwal