download NCERT books using scrapy

Overview

download_ncert_books

NCERT_CLASS_1 NCERT_CLASS_2 NCERT_CLASS_3 NCERT_CLASS_4 NCERT_CLASS_5 NCERT_CLASS_6 NCERT_CLASS_7 NCERT_CLASS_8 NCERT_CLASS_9 NCERT_CLASS_10 NCERT_CLASS_11 NCERT_CLASS_12

Downloading Books:

You can either use the spider by cloning this repo and following the instructions given below,
or
you can download the books directly from the release section or by clicking on the badges above.

There are two kinds of zip files in the release section for every class:

  1. Book-wise, NCERT_CLASS_ClassNo_Subject_BookName.zip: these zips contain the chapters of BookName for the Subject of class ClassNo.
  2. Book text, Class_ClassNo_Text.zip: these zips contain the text extracted from all the books of class ClassNo.
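
As an illustration of the two naming patterns above, here is a minimal Python sketch that builds the expected archive names. The function names are hypothetical, and replacing spaces with underscores is an assumption based on the folder names the spider prints; check the release section for the exact file names.

```python
def book_zip_name(class_no: int, subject: str, book_name: str) -> str:
    """Build a book-wise archive name: NCERT_CLASS_ClassNo_Subject_BookName.zip."""
    # Assumption: spaces in subject and book names become underscores.
    subject_part = subject.strip().replace(" ", "_")
    book_part = book_name.strip().replace(" ", "_")
    return f"NCERT_CLASS_{class_no}_{subject_part}_{book_part}.zip"


def text_zip_name(class_no: int) -> str:
    """Build the extracted-text archive name: Class_ClassNo_Text.zip."""
    return f"Class_{class_no}_Text.zip"


if __name__ == "__main__":
    print(book_zip_name(11, "Economics", "Indian Economic Development"))
    print(text_zip_name(11))
```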

How to use the spider

Initial Setup

git clone https://github.com/nit-in/download_ncert_books.git
cd download_ncert_books
pip install -r requirements.txt

To run the spider:

scrapy crawl --nolog ncert

and follow the prompts.

For example, to download a Class 11 Economics book:

scrapy crawl --nolog ncert

Enter the class:        11

Select one the subjects:
Enter 1 for Sanskrit
Enter 2 for Accountancy
Enter 3 for Chemistry
Enter 4 for Mathematics
Enter 5 for Economics
Enter 6 for Psychology
Enter 7 for Geography

and so on ...

Enter subject number:   5

Select one the books:
Enter 1 for Indian Economic Development
Enter 2 for Statistics for Economics
Enter 3 for Sankhyiki
Enter 4 for Bhartiya Airthryavstha Ka Vikas 
Enter 5 for Hindustan Ki Moaashi Tarraqqi(Urdu)
Enter 6 for Shumariyaat Bar-e-Mushiyat(Urdu)

Enter book number:      1

Downloading...  Class: Class11  Subject: Economics      Book: Indian_Economic_Development       Chapters: 10


downloading keec1ps.pdf to  /home/user/ncert/Class11/Economics/Indian_Economic_Development/keec1ps.pdf
downloading keec101.pdf to  /home/user/ncert/Class11/Economics/Indian_Economic_Development/keec101.pdf
downloading keec102.pdf to  /home/user/ncert/Class11/Economics/Indian_Economic_Development/keec102.pdf
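
The destination paths in the output above follow a Class/Subject/Book layout. A minimal sketch of how such a path could be assembled is shown here; the function name `chapter_path` and the rule that spaces become underscores are assumptions inferred from the sample output, not the spider's actual code.

```python
from pathlib import PurePosixPath


def chapter_path(base_dir: str, class_no: int, subject: str,
                 book: str, pdf_name: str) -> PurePosixPath:
    """Return base_dir/Class<no>/<Subject>/<Book_Name>/<pdf_name>."""
    # Assumption: spaces in the book title are replaced with underscores,
    # as seen in "Indian_Economic_Development" in the sample output.
    book_dir = book.strip().replace(" ", "_")
    return PurePosixPath(base_dir) / f"Class{class_no}" / subject / book_dir / pdf_name


if __name__ == "__main__":
    print(chapter_path("/home/user/ncert", 11, "Economics",
                       "Indian Economic Development", "keec101.pdf"))
```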

OR

To download multiple books, enter their numbers separated by commas, e.g.

Select one the books:
Enter 1 for Indian Economic Development
Enter 2 for Statistics for Economics
Enter 3 for Sankhyiki
Enter 4 for Bhartiya Airthryavstha Ka Vikas 
Enter 5 for Hindustan Ki Moaashi Tarraqqi(Urdu)
Enter 6 for Shumariyaat Bar-e-Mushiyat(Urdu)

Enter book number:      1,2
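
A comma-separated answer such as `1,2` can be turned into a list of book numbers along these lines. This is only a sketch under stated assumptions: `parse_selection` is a hypothetical helper, and the spider's own input handling may differ (for example in how it treats duplicates or out-of-range numbers).

```python
from typing import List


def parse_selection(raw: str, count: int) -> List[int]:
    """Parse input like "1,2" into unique menu numbers between 1 and count."""
    chosen: List[int] = []
    for token in raw.split(","):
        token = token.strip()
        if not token:
            continue  # tolerate stray commas such as "1,,2"
        number = int(token)  # raises ValueError for non-numeric input
        if not 1 <= number <= count:
            raise ValueError(f"{number} is not between 1 and {count}")
        if number not in chosen:
            chosen.append(number)  # keep first occurrence, preserve order
    return chosen


if __name__ == "__main__":
    print(parse_selection("1,2", 6))
```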

If you want to see the Scrapy spider log, run the crawl without the --nolog option:

scrapy crawl ncert