Amazon Scraper: A command-line tool for scraping Amazon product data

Overview

Amazon Product Scraper: 2021

Description

A command-line tool for scraping Amazon product data to CSV or JSON format(s).

Requirements

  • Python 3
  • pip3

Installation

Using git clone (you'll need git installed for this):

git clone https://github.com/scrapewalrus/amazon-scraper-python-2021

Or download and extract the zip file of the project manually

You'll also need to install requirements for the project to run. Locate amazon-product-scraper folder via terminal and type pip install -r requirements.txt:

Usage

To launch the Amazon scraper locate the amazon-product-scraper folder via terminal and type python amazon_scraper.py -k "your keyword". This will start the program.

NOTE: you must declare either -k or --keyword before entering your keyword. It's a required argument.

Example:

amazon_scraper.py - the name of a scraper file.

-k or --keyword - required argument to pass before entering your keyword.

-p or --proxies - optional argument to enable proxies. To avoid getting blocked I highly recommend using proxies. I'm using Residential Proxies from Oxylabs. For highest success rate, I suggest Residential Proxies over Datacenter as they're almost impossible to detect and have the smallest footprint. If you decide to use different proxy provider services keep in mind that you'll have to make some minor adjustments in get-proxies.py file.

-j or --json - optional argument for storing extracted data in .json format. Default output format is .csv.

Example of product data #1: JSON

[
    {
        "SOURCE_URL": "https://www.amazon.com/s?k=funny+t+shirt+for+women&page=1",
        "PAGE": 1,
        "KEYWORD": "funny t shirt for women",
        "PRODUCT_LINK": "https://www.amazon.com/Mostly-T-Shirt-Womens-Letter-Printed/dp/B07QN2NQ59/ref=sr_1_3?dchild=1&keywords=funny+t+shirt+for+women&qid=1627833682&sr=8-3",
        "PRODUCT_NAME": "I'm Mostly Peace Love and Light Funny T-Shirt Womens Graphic Printed Short Sleeve Tops Tee",
        "PRICE": "$21.99",
        "PRODUCT_RATING": "4.6",
        "NUMBER_OF_RATINGS": "1,637"
    },
    {
        "SOURCE_URL": "https://www.amazon.com/s?k=funny+t+shirt+for+women&page=1",
        "PAGE": 1,
        "KEYWORD": "funny t shirt for women",
        "PRODUCT_LINK": "https://www.amazon.com/YITAN-Women-Graphic-Funny-X-Large/dp/B074QMG4D7/ref=sr_1_4?dchild=1&keywords=funny+t+shirt+for+women&qid=1627833682&sr=8-4",
        "PRODUCT_NAME": "YITAN Women's Cute Juniors Tops Teen Girl Tee Funny T Shirt",
        "PRICE": "$12.99",
        "PRODUCT_RATING": "4.6",
        "NUMBER_OF_RATINGS": "12,281"
    },
    {
        "SOURCE_URL": "https://www.amazon.com/s?k=funny+t+shirt+for+women&page=1",
        "PAGE": 1,
        "KEYWORD": "funny t shirt for women",
        "PRODUCT_LINK": "https://www.amazon.com/DANVOUY-Womens-V-Neck-Doesnt-Definitely/dp/B07V55ZXVS/ref=sr_1_5?dchild=1&keywords=funny+t+shirt+for+women&qid=1627833682&sr=8-5",
        "PRODUCT_NAME": "DANVOUY Womens If My Mouth Doesn't Say It My Face Definitely Will T Shirt",
        "PRICE": "$12.99",
        "PRODUCT_RATING": "4.6",
        "NUMBER_OF_RATINGS": "6,787"
    }]

Example of product data #2: CSV

amazon-csv-product-data-example

You might also like...
Free and Open-Source Command Line tool for Text Replacement
Free and Open-Source Command Line tool for Text Replacement

Sniplet Free and Open Source Text Replacement Tool Description: Sniplet is a work in progress CLI tool which can do text replacement globally in Linux

PwnWiki command line searching tool & bindings written in Python
PwnWiki command line searching tool & bindings written in Python

pwsearch PwnWiki 数据库搜索命令行工具。 安装 您可以直接用 pip 命令从 PyPI 安装 pwsearch: pip3 install -U pwsearch 您也可以 clone 该仓库并直接从源码启动

Official AIdea command line tool

AIdea CLI Official AIdea command line tool for https://aidea-web.tw. Installation Make sure you have installed both Python 3 and pip package manager.

PyArmor is a command line tool used to obfuscate python scripts

PyArmor is a command line tool used to obfuscate python scripts, bind obfuscated scripts to fixed machine or expire obfuscated scripts.

A command line tool to remove background from video and image
A command line tool to remove background from video and image

A command line tool to remove background from video and image, brought to you by BackgroundRemover.app which is an app made by nadermx powered by this tool

A command line tool (and Python library) for archiving Twitter JSON

A command line tool (and Python library) for archiving Twitter JSON

command line tool for frequent nmigen tasks (generate sources, show design)
command line tool for frequent nmigen tasks (generate sources, show design)

nmigen-tool command line tool for frequent nmigen tasks (generate sources, show design) Usage: generate verilog: nmigen generate verilog nmigen_librar

Command line tool to keep track of your favorite playlists on YouTube and many other places.

Command line tool to keep track of your favorite playlists on YouTube and many other places.

Owner
null
Universal Command Line Interface for Amazon Web Services

This package provides a unified command line interface to Amazon Web Services.

Amazon Web Services 13.3k Jan 7, 2023
AML Command Transfer. A lightweight tool to transfer any command line to Azure Machine Learning Services

AML Command Transfer (ACT) ACT is a lightweight tool to transfer any command from the local machine to AML or ITP, both of which are Azure Machine Lea

Microsoft 11 Aug 10, 2022
A cd command that learns - easily navigate directories from the command line

NAME autojump - a faster way to navigate your filesystem DESCRIPTION autojump is a faster way to navigate your filesystem. It works by maintaining a d

William Ting 14.5k Jan 3, 2023
Ros command - Unifying the ROS command line tools

Unifying the ROS command line tools One impairment to ROS 2 adoption is that all

null 37 Dec 15, 2022
Unofficial Open Corporates CLI: OpenCorporates is a website that shares data on corporations under the copyleft Open Database License. This is an unofficial open corporates python command line tool.

Unofficial Open Corporates CLI OpenCorporates is a website that shares data on corporations under the copyleft Open Database License. This is an unoff

Richard Mwewa 30 Sep 8, 2022
A command line tool that creates a super timeline from SentinelOne's Deep Visibility data

S1SuperTimeline A command line tool that creates a super timeline from SentinelOne's Deep Visibility data What does it do? The script accepts a S1QL q

Juan Ortega 2 Feb 8, 2022
Python3 command-line tool for the inference of Boolean rules and pathway analysis on omics data

BONITA-Python3 BONITA was originally written in Python 2 and tested with Python 2-compatible packages. This version of the packages ports BONITA to Py

null 1 Dec 22, 2021
Python command line tool and python engine to label table fields and fields in data files.

Python command line tool and python engine to label table fields and fields in data files. It could help to find meaningful data in your tables and data files or to find Personal identifable information (PII).

APICrafter 22 Dec 5, 2022
Command-line tool for looking up colors and palettes.

Colorpedia Colorpedia is a command-line tool for looking up colors, shades and palettes. Supported color models: HEX, RGB, HSL, HSV, CMYK. Requirement

Joohwan Oh 282 Dec 27, 2022
Simple command line tool for text to image generation using OpenAI's CLIP and Siren (Implicit neural representation network)

Simple command line tool for text to image generation using OpenAI's CLIP and Siren (Implicit neural representation network)

Phil Wang 4.4k Jan 9, 2023