START YOUR FIRST SCRAPY PROJECT

Before starting with Scrapy, you should already be familiar with Python and pip, the Python package installer. Both are required to install Scrapy on your machine. Scrapy requires Python 3.6+.

If you don't have Python installed on your machine, install it first.

Install Python: https://www.python.org/downloads/

Download Python from the link above and install it on your system.

After installing Python, make sure pip (the Python package installer) is available on your system, or create a virtual environment for your Scrapy project.

Upgrade pip to the latest version

python -m pip install --upgrade pip

In Python, we can also use Anaconda to manage a virtual environment.

Install Anaconda: https://www.anaconda.com/products/individual#windows

Download Anaconda from the link above and install it on your system.

pip installs Python packages, whereas conda installs packages that may contain software written in any language.
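If you prefer not to install Anaconda, Python's built-in venv module gives you a lightweight virtual environment instead; a minimal sketch (the environment name scrapyenv is just an example):

```shell
# Create a virtual environment named "scrapyenv" (example name)
python3 -m venv scrapyenv

# Activate it:
#   Windows:      scrapyenv\Scripts\activate
#   Linux/macOS:  source scrapyenv/bin/activate

# Then install Scrapy inside the activated environment:
#   pip install scrapy
```

Installing Scrapy inside an activated environment keeps its dependencies separate from your system Python.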

# Install Scrapy on your machine using the following command in cmd or the Anaconda prompt

pip install scrapy

# After installation, verify Scrapy is available on your system using the following command

scrapy

# Check the Scrapy version installed on your system

scrapy version

# Run a quick benchmark test

scrapy bench

Scrapy comes with a simple benchmarking suite that spawns a local HTTP server and crawls it at the maximum possible speed. The goal of this benchmarking is to get an idea of how Scrapy performs on your hardware, in order to have a common baseline for comparisons. It uses a simple spider that does nothing and just follows links.

# For example, fetch a static website URL using Scrapy

scrapy fetch --nolog https://docs.scrapy.org/en/latest/topics/commands.html

# Save/download a static website to your local machine using Scrapy

scrapy fetch --nolog https://docs.scrapy.org/en/latest/topics/commands.html > scrapydocs.html

# Open a static website URL in a web browser

scrapy view https://docs.scrapy.org/en/latest/topics/commands.html

# Start the Scrapy shell

scrapy shell https://docs.scrapy.org/en/latest/topics/commands.html

# Use shelp() to see all available shell objects and shortcuts

shelp()

# Use XPath or CSS selectors in the Scrapy shell

For XPath:

fetch("https://docs.scrapy.org/en/latest/topics/commands.html")
response.xpath('/html/head/title/text()').get()

For CSS:

fetch("https://docs.scrapy.org/en/latest/topics/commands.html")
response.css('title::text').get()

When you start any project in Scrapy, create a separate directory for it first.

# Make a new directory

mkdir dir

# Switch to your new directory

cd dir

# Create your first scrapy project

scrapy startproject scrapyprojectname

(here, scrapyprojectname is whatever you want to call your project)

# Enter your project folder

cd scrapyprojectname
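At this point, startproject has created a project skeleton like the following (the standard Scrapy layout; boilerplate may vary slightly between versions):

```
scrapyprojectname/
    scrapy.cfg            # deploy configuration file
    scrapyprojectname/    # the project's Python module
        __init__.py
        items.py          # item definitions
        middlewares.py    # spider/downloader middlewares
        pipelines.py      # item pipelines
        settings.py       # project settings
        spiders/          # folder where your spiders live
            __init__.py
```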

# Create the first spider for the project: switch to the project folder in cmd, then run this command

scrapy genspider -t crawl example example.com

(example is the spider name; example.com is the domain the spider will crawl)

The crawl template generates a spider that imports LinkExtractor, CrawlSpider, and Rule.

# First switch to the project folder in the command prompt, then run the spider

scrapy crawl example

(example is the spider name)

# Example project for scraping product data from the Amazon eCommerce website

Reference URL: https://blog.datahut.co/tutorial-how-to-scrape-amazon-data-using-python-scrapy/

Official Scrapy documentation:

https://docs.scrapy.org/en/latest/topics/commands.html
