Web Scraping
AUTHOR: Saurabh G.
MTech Information Security, IIT Jammu.
If you find this repository useful, I would appreciate it if you star and fork it!
This project is part of the lab tutorial for the Data Organization and Retrieval course.
The tutorial is intended for MTech Data Science students of IIT Jammu, Batch 2021.
Objective
The objective of this tutorial is to help the students understand the basics of web scraping.
HOW TO RUN THIS PROJECT
Import the project into the PyCharm IDE. Use PyCharm's "Add interpreter" option and set the interpreter path to the "venv" folder provided in this repository, then run the "main.py" file.
The project will run!
Slides used for this lab can be found in the link below
Suggested Tutorials for Prerequisites
- Python: https://www.youtube.com/watch?v=_uQrJ0TkZlc
- Python file handling: https://www.w3schools.com/python/python_file_handling.asp
- Beautiful Soup: https://www.crummy.com/software/BeautifulSoup/bs4/doc/
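To give a feel for what the Beautiful Soup prerequisite covers, here is a minimal sketch that parses a small HTML string. The HTML snippet, the book titles, and the variable names are made up for illustration and are not taken from this repository's "main.py". It assumes the `bs4` package is installed.

```python
from bs4 import BeautifulSoup

# Hypothetical page content, inlined so the example needs no network access.
HTML = """
<html><body>
  <h1> Course Books </h1>
  <ul>
    <li><a href="/books/1">Intro to IR</a></li>
    <li><a href="/books/2">Data Mining</a></li>
  </ul>
</body></html>
"""

# "html.parser" is Python's built-in parser, so no extra parser package is needed.
soup = BeautifulSoup(HTML, "html.parser")

# find() returns the first matching tag; get_text(strip=True) trims whitespace.
title = soup.find("h1").get_text(strip=True)

# find_all() returns every matching tag; tag["href"] reads an attribute.
links = [(a.get_text(strip=True), a["href"]) for a in soup.find_all("a")]

print(title)
for text, href in links:
    print(text, href)
```

In a real scraper the HTML string would come from an HTTP response body instead of a literal.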
Suggested Articles
- GET and POST in Python: https://www.geeksforgeeks.org/get-post-requests-using-python/
- https://2.python-requests.org/en/master/user/advanced/#id2
- https://www.nylas.com/blog/use-python-requests-module-rest-apis/
- Response methods: https://www.geeksforgeeks.org/response-methods-python-requests/
- User agent: https://www.whatismybrowser.com/detect/what-is-my-user-agent
- https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/User-Agent
- https://www.howtogeek.com/114937/htg-explains-whats-a-browser-user-agent/
- https://www.geeksforgeeks.org/python-string-strip/
- https://www.geeksforgeeks.org/python-list-index/
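The articles above touch on three ideas that often appear together in a scraper: sending a GET request with a custom User-Agent header, trimming text with `str.strip()`, and locating an item with `list.index()`. The sketch below ties them together. To stay runnable offline it builds the request with the standard library's `urllib.request` and never sends it, rather than using the `requests` library the articles describe. The URL, the User-Agent string, the sample cell values, and the helper names are all hypothetical.

```python
from urllib.request import Request

# Hypothetical values chosen for illustration only.
URL = "https://example.com/products"
USER_AGENT = "Mozilla/5.0 (compatible; lab-tutorial-bot/0.1)"


def build_request(url, user_agent):
    """Build a GET request with a custom User-Agent header (it is not sent)."""
    return Request(url, headers={"User-Agent": user_agent})


def clean_cells(raw_cells):
    """Remove leading and trailing whitespace from each scraped string."""
    return [cell.strip() for cell in raw_cells]


def value_after(cells, label):
    """Return the item that follows `label`; list.index raises ValueError if absent."""
    position = cells.index(label)
    return cells[position + 1]


request = build_request(URL, USER_AGENT)

# Stand-in for text pulled out of table cells, with typical stray whitespace.
raw = ["  Name \n", "\tWidget ", " Price ", " 9.99\n"]
cells = clean_cells(raw)
price = value_after(cells, "Price")

print(request.get_method(), request.full_url)
print(cells)
print(price)
```

With the `requests` library, the same header would typically be passed as `requests.get(url, headers={"User-Agent": ...})`.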