Website-Crawler-Python
This is a simple website crawler that asks the user for a website address, crawls it, and extracts specific data from the pages it finds. After receiving the address, it asks for the crawling depth, i.e. how many levels of discovered links to follow.
Website Crawler takes 3 inputs:
- A website address
- Integer value for the crawling depth
- A user-specified regular expression to find user-specific data
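The three inputs above can be sketched as a small validation helper. This is a minimal illustration, not the project's actual code; the function name and the scheme-defaulting behaviour are assumptions.

```python
import re


def read_inputs(url: str, depth: str, pattern: str):
    """Validate the crawler's three user inputs and return them in usable form:
    a normalized URL, an integer depth, and a compiled regular expression."""
    if not url.startswith(("http://", "https://")):
        url = "http://" + url  # assume a scheme if the user omitted one
    return url, int(depth), re.compile(pattern)
```

In an interactive run these values would come from `input()` prompts before being passed to a helper like this.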
General tasks:
- Find all Norwegian mobile numbers and save them into a text file.
- Find all the sub-links inside the given website and save them into a text file.
- Save the website's raw HTML code into a text file.
- Find all email addresses and save them into a text file.
- Find all the comments used in the website and save them into a text file.
- Find the five most used words and print them in the terminal.
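The extraction tasks above can be sketched with plain regular expressions and `collections.Counter`. The project itself parses pages with BeautifulSoup; this sketch uses only the standard library to stay self-contained, and the Norwegian mobile pattern (8 digits starting with 4 or 9) and the function name are assumptions.

```python
import re
from collections import Counter

# Assumed Norwegian mobile format: 8 digits, first digit 4 or 9.
NO_MOBILE = re.compile(r"\b[49]\d{7}\b")
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
COMMENT = re.compile(r"<!--(.*?)-->", re.S)
LINK = re.compile(r'href="([^"]+)"')


def extract_all(html: str) -> dict:
    """Run every extraction task over one page's raw HTML."""
    text = re.sub(r"<[^>]+>", " ", html)  # crude tag strip for word counting
    words = re.findall(r"[a-z]+", text.lower())
    return {
        "mobiles": NO_MOBILE.findall(html),
        "emails": EMAIL.findall(html),
        "comments": [c.strip() for c in COMMENT.findall(html)],
        "links": LINK.findall(html),
        "top_words": Counter(words).most_common(5),
    }
```

Each list in the returned dictionary would then be written to its own text file, and `top_words` printed to the terminal.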
This is a Python-based project that uses the following libraries to implement its functionality:
- `re` (regular expressions)
- `urllib3`
- `BeautifulSoup 4`
- `Counter` from `collections`
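The crawling-depth behaviour described above can be sketched as a breadth-first traversal. The `fetch` callable is injected so the sketch stays network-free; in the real project it would wrap an `urllib3` request. Function and parameter names here are illustrative assumptions.

```python
import re
from collections import deque


def crawl(start_url: str, depth: int, fetch) -> dict:
    """Breadth-first crawl up to `depth` link-hops away from start_url.
    `fetch(url)` must return the page's HTML as a string."""
    seen = {start_url}
    queue = deque([(start_url, 0)])
    pages = {}  # url -> raw HTML
    while queue:
        url, d = queue.popleft()
        html = fetch(url)
        pages[url] = html
        if d < depth:  # only follow links while within the requested depth
            for link in re.findall(r'href="([^"]+)"', html):
                if link not in seen:
                    seen.add(link)
                    queue.append((link, d + 1))
    return pages
```

With `urllib3`, `fetch` could be something like `lambda u: http.request("GET", u).data.decode()` where `http = urllib3.PoolManager()`.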