Installing Scrapy on macOS without stepping into the pitfalls

Install Scrapy

Install according to the official Scrapy installation guide: https://docs.scrapy.org/en/latest/intro/install.html

Scrapy requires Python 3.6+.

Scrapy is written in pure Python and depends on a few key Python packages (among others):

  • lxml, an efficient XML and HTML parser
  • parsel, an HTML/XML data extraction library written on top of lxml
  • w3lib, a multipurpose helper for dealing with URLs and web page encodings
  • twisted, an asynchronous networking framework
  • cryptography and pyOpenSSL, to handle various network-level security needs

The minimum versions of these dependencies that Scrapy is tested against are:

  • Twisted 14.0
  • lxml 3.4
  • pyOpenSSL 0.14
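To check whether your environment meets these minimums, compare version strings numerically rather than lexically (as plain strings, "3.10" would sort below "3.4"). A minimal sketch using only the standard library; the package names and minimum versions are taken from the list above:

```python
from importlib import metadata


def version_tuple(version: str) -> tuple:
    """Convert a dotted version string like '3.4.1' into (3, 4, 1)."""
    return tuple(int(part) for part in version.split(".") if part.isdigit())


def meets_minimum(installed: str, required: str) -> bool:
    """Return True if installed >= required, comparing numerically."""
    return version_tuple(installed) >= version_tuple(required)


MINIMUMS = {"Twisted": "14.0", "lxml": "3.4", "pyOpenSSL": "0.14"}

if __name__ == "__main__":
    for package, required in MINIMUMS.items():
        try:
            installed = metadata.version(package)
        except metadata.PackageNotFoundError:
            print(f"{package}: not installed")
            continue
        status = "OK" if meets_minimum(installed, required) else "too old"
        print(f"{package} {installed} (needs >= {required}): {status}")
```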

1. Use Anaconda or Miniconda

If you use Anaconda or Miniconda, you can install Scrapy from the conda-forge channel by running conda install -c conda-forge scrapy, which avoids most installation problems.

2. It is recommended to use a virtual environment to install

It is recommended to install Scrapy in a virtual environment on all platforms. A virtual environment keeps Scrapy and its dependencies from conflicting with Python packages already installed system-wide.

For information on how to create virtual environments, see Virtual Environments and Packages.

Python packages can be installed globally (that is, system-wide) or in user space. It is not recommended to install Scrapy system-wide.
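At the command line, the usual recipe is python -m venv scrapy-env, activating the environment, then pip install Scrapy. The same environment creation can also be scripted with the standard-library venv module; a minimal sketch, where the environment name scrapy-env is just an example:

```python
import os
import tempfile
import venv


def create_env(path: str) -> str:
    """Create a virtual environment at `path`; return its config file path."""
    # with_pip=True would also bootstrap pip into the environment,
    # which you need before `pip install Scrapy` can run inside it.
    venv.create(path, with_pip=False)
    return os.path.join(path, "pyvenv.cfg")


if __name__ == "__main__":
    # Create a throwaway environment in a temporary directory as a demo.
    with tempfile.TemporaryDirectory() as tmp:
        cfg = create_env(os.path.join(tmp, "scrapy-env"))
        print("environment created:", os.path.exists(cfg))
```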

3. Platform-specific installation instructions

3.1 Windows

If you already have Anaconda or Miniconda installed, install Scrapy with: conda install -c conda-forge scrapy.

If Anaconda or Miniconda is not installed, you can install Scrapy on Windows directly with pip.

First, you need the Microsoft Visual C++ build tools to compile some Scrapy dependencies:

  1. Download and execute the Microsoft C++ Build Tools to install the Visual Studio installer.

  2. Run the Visual Studio installer.

  3. Under the Workloads section, select C++ build tools .

  4. Check the installation details and make sure the following packages are selected as optional components:

    • MSVC (e.g. MSVC v142 - VS 2019 C++ x64/x86 Build Tools (v14.23))
    • Windows SDK (e.g. Windows 10 SDK (10.0.18362.0))
  5. Install the Visual Studio Build Tools.

Then, you can install Scrapy with pip install Scrapy.

3.2 macOS

Building Scrapy's dependencies requires the presence of a C compiler and development headers. On macOS, this is usually provided by Apple's Xcode development tools. To install the Xcode command line tools, open a terminal window and run:

xcode-select --install

Create a virtual environment named py310 with conda:

conda create -n py310 python=3.10
conda info -e
conda activate py310

Then install Scrapy:

pip install Scrapy

Be careful to use pip install here. I originally used conda install, which led to the pitfalls described below. That conda approach is common advice on CSDN, but I do not recommend following it.


Installed according to the official method, there is basically no problem; I will walk through the installation process once more.

Local environment: macOS 12.0+

  1. Install the Xcode command line tools: open a terminal and run xcode-select --install.
  2. Under Miniconda or Anaconda, create a new virtual environment, e.g. py310 as above.
  3. In this virtual environment, install Scrapy with pip: pip install Scrapy. Note: it must be pip, not conda.

4. Test your first Scrapy project

Crawler:

import scrapy


class QuotesSpider(scrapy.Spider):
    name = 'quotes'
    start_urls = [
        'https://quotes.toscrape.com/tag/humor/',
    ]

    def parse(self, response):
        for quote in response.css('div.quote'):
            yield {
                'author': quote.xpath('span/small/text()').get(),
                'text': quote.css('span.text::text').get(),
            }

        next_page = response.css('li.next a::attr("href")').get()
        if next_page is not None:
            yield response.follow(next_page, self.parse)



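In the spider above, response.follow accepts the relative href extracted from li.next and resolves it against the current page URL before scheduling the request. The URL resolution behaves like urllib.parse.urljoin from the standard library:

```python
from urllib.parse import urljoin

# Current page URL, as in the spider's start_urls.
page = "https://quotes.toscrape.com/tag/humor/"

# A relative next-page link, as it might appear in li.next a::attr("href").
next_href = "/tag/humor/page/2/"

# response.follow(next_href, ...) requests the same absolute URL
# that urljoin produces here.
absolute = urljoin(page, next_href)
print(absolute)  # https://quotes.toscrape.com/tag/humor/page/2/
```

This is why the spider can yield response.follow(next_page, self.parse) without building absolute URLs by hand.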

Enter at the terminal command line:

scrapy runspider quotes_spider.py -o quotes.jl

When the run finishes successfully, it generates a quotes.jl file whose contents look like the JSON lines shown later in this post.

[Not recommended] Stepping into the pit: Scrapy installed with conda fails with the error: MemoryError: Cannot allocate write+execute memory for ffi.callback().

Local environment: macOS 12.0+, Python 3.8, Scrapy 2.6.1

Running the same quotes_spider.py example shown above, enter at the command line: scrapy runspider quotes_spider.py -o quotes.jl

If the run succeeds, it generates a quotes.jl file whose contents should be as follows:

{"author": "Jane Austen", "text": "\u201cThe person, be it gentleman or lady, who has not pleasure in a good novel, must be intolerably stupid.\u201d"}
{"author": "Steve Martin", "text": "\u201cA day without sunshine is like, you know, night.\u201d"}
{"author": "Garrison Keillor", "text": "\u201cAnyone who thinks sitting in church can make you a Christian must also think that sitting in a garage can make you a car.\u201d"}
...

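The .jl extension denotes JSON Lines: one JSON object per line. Such a file can be consumed with just the standard library; a small sketch, where the sample content mirrors the output above:

```python
import json


def read_jsonlines(text: str) -> list:
    """Parse JSON Lines content: one JSON object per non-empty line."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


# Two lines in the same shape as the quotes.jl output shown above.
sample = (
    '{"author": "Jane Austen", "text": "\\u201cThe person, be it gentleman '
    'or lady, who has not pleasure in a good novel, must be intolerably '
    'stupid.\\u201d"}\n'
    '{"author": "Steve Martin", "text": "\\u201cA day without sunshine is '
    'like, you know, night.\\u201d"}\n'
)

quotes = read_jsonlines(sample)
print(quotes[0]["author"])  # Jane Austen
```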

But in the end an error occurred:

MemoryError: Cannot allocate write+execute memory for ffi.callback(). You might be running on a system that prevents this. For more information, see https://cffi.readthedocs.io/en/latest/using.html#callbacks
2022-03-28 15:57:37 [scrapy.core.engine] INFO: Closing spider (finished)

[Not feasible] Method 1: remove the pyopenssl library and install openssl.

The problem is accompanied by SSL-verification failures. After some research (https://github.com/pyca/pyopenssl/issues/873), the initial diagnosis points at pyopenssl: openssl can allocate write+execute memory, but pyopenssl cannot, and Scrapy depends on pyopenssl.

Check whether pyopenssl exists in your environment with pip show pyopenssl; it does. Removing it with conda uninstall pyopenssl and installing openssl instead turns out not to work.

[Not feasible] Method 2: Update the requests library.

Following a reply in the "Scrapy on M1 Mac" thread on Stack Overflow, try updating requests with pip3 install --upgrade requests (updating conda first with conda update conda). My requests was already 2.27.1, the latest. This was not feasible either.

[Caution] Method 3: upgrade Python to 3.10.4, then reinstall Scrapy.

After creating the py310 virtual environment with Python 3.10 and reinstalling Scrapy, running scrapy runspider quotes_spider.py -o quotes.jl produced a new problem: Library not loaded: @rpath/libssl.1.1.dylib.

ImportError: dlopen(/Users/dan/miniforge3/envs/py310/lib/python3.10/site-packages/cryptography/hazmat/bindings/_openssl.abi3.so, 0x0002): Library not loaded: @rpath/libssl.1.1.dylib

Referenced from: /Users/dan/miniforge3/envs/py310/lib/python3.10/site-packages/cryptography/hazmat/bindings/_openssl.abi3.so
Reason: tried: '/Users/dan/miniforge3/envs/py310/lib/libssl.1.1.dylib' (no such file), '/Users/dan/miniforge3/envs/py310/lib/libssl.1.1.dylib' (no such file), '/Users/dan/miniforge3/envs/py310/lib/python3.10/site-packages/cryptography/hazmat/bindings/../../../../../libssl.1.1.dylib' (no such file), '/Users/dan/miniforge3/envs/py310/lib/libssl.1.1.dylib' (no such file), '/Users/dan/miniforge3/envs/py310/lib/libssl.1.1.dylib' (no such file), '/Users/dan/miniforge3/envs/py310/lib/python3.10/site-packages/cryptography/hazmat/bindings/../../../../../libssl.1.1.dylib' (no such file), '/Users/dan/miniforge3/envs/py310/bin/../lib/libssl.1.1.dylib' (no such file), '/Users/dan/miniforge3/envs/py310/bin/../lib/libssl.1.1.dylib' (no such file), '/usr/local/lib/libssl.1.1.dylib' (no such file), '/usr/lib/libssl.1.1.dylib' (no such file)

Referring to "Library not loaded: libcrypto.1.0.0.dylib issue in mac":

Causes:

  1. The default openssl installed by Homebrew is version 1.0, while recent Scrapy requires version 1.1.
  2. The dynamic-library search path (@rpath) does not contain the library.
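Before copying anything around, you can check which OpenSSL library (if any) the dynamic loader can already find. A small diagnostic sketch using ctypes.util.find_library from the standard library; the exact result is platform-dependent and may be None:

```python
import ctypes.util


def locate(libname: str):
    """Return the loader-visible name/path for a library, or None."""
    return ctypes.util.find_library(libname)


if __name__ == "__main__":
    # On macOS this searches the usual dylib locations; on Linux it
    # consults the ldconfig cache. None means the loader cannot find it.
    for name in ("ssl", "crypto"):
        print(name, "->", locate(name))
```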

Solution:

Step 1. Install openssl using brew.

Check whether openssl is installed with brew info openssl; here it was not. Install it with brew install openssl.


If the check finds that openssl 1.0 is installed, update to openssl 1.1 with brew reinstall openssl@1.1. Then check the version again with brew info openssl, and note the path of your openssl dynamic library libssl.1.1.dylib, e.g. /opt/homebrew/Cellar/openssl@1.1/1.1.1k/lib/libssl.1.1.dylib.


Step 2. Copy the dynamic library file libssl.1.1.dylib into one of the @rpath locations.

First, go to the directory containing your dynamic library files: cd /opt/homebrew/Cellar/openssl@1.1/1.1.1k/lib

Then, copy the dynamic library files libssl.1.1.dylib and libcrypto.1.1.dylib into one of the @rpath locations from the error message, such as /Users/dan/miniforge3/envs/py310/lib/ or /usr/lib/.

Here, the files are copied to /Users/dan/miniforge3/envs/py310/lib/:

sudo cp /opt/homebrew/Cellar/openssl@1.1/1.1.1k/lib/libssl.1.1.dylib /Users/dan/miniforge3/envs/py310/lib/
sudo cp /opt/homebrew/Cellar/openssl@1.1/1.1.1k/lib/libcrypto.1.1.dylib /Users/dan/miniforge3/envs/py310/lib/
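To confirm the copy worked, you can check the candidate paths from the "Reason: tried:" list programmatically. A sketch using only the standard library; the directory list below is illustrative:

```python
import os


def find_dylib(name: str, candidate_dirs) -> list:
    """Return the candidate paths where the library file actually exists."""
    return [
        os.path.join(d, name)
        for d in candidate_dirs
        if os.path.exists(os.path.join(d, name))
    ]


if __name__ == "__main__":
    # Directories taken from the 'Reason: tried:' list in the error above.
    candidates = [
        "/Users/dan/miniforge3/envs/py310/lib",
        "/usr/local/lib",
        "/usr/lib",
    ]
    print(find_dylib("libssl.1.1.dylib", candidates))
```

An empty result means none of the tried locations contains the library yet.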

Note that if you copy to /usr/lib/ (see the commands below), you may hit an Operation not permitted error:

This is because SIP (System Integrity Protection) and its Rootless mechanism are enabled, which prevent modifying these files even with root privileges. The Rootless mechanism is a last line of defense against malicious programs.

Workaround 1: if you get a permission error even with sudo, try copying manually to /usr/lib via Finder.

  1. Open Finder, press command+shift+G, enter the directory /opt/homebrew/Cellar/openssl@1.1/1.1.1k/lib, find the file libssl.1.1.dylib, and copy it (command+C);

  2. Press command+shift+G again, enter the directory /usr/lib, and paste the file libssl.1.1.dylib (command+V);

  3. Likewise, copy libcrypto.1.1.dylib to /usr/lib.

Workaround 2 [not recommended]: turn off the protection mechanism so that the files can be modified when necessary.

1) Restart, holding command+R during boot to enter the recovery partition, then open Terminal.

2) In Terminal, run csrutil disable.

3) Restart again; you can now modify files under /usr/lib.

4) To restore the protection mechanism, repeat the steps and run csrutil enable.

sudo cp /opt/homebrew/Cellar/openssl@1.1/1.1.1k/lib/libssl.1.1.dylib /usr/lib/
sudo cp /opt/homebrew/Cellar/openssl@1.1/1.1.1k/lib/libcrypto.1.1.dylib /usr/lib/

or

sudo ln -s /opt/homebrew/Cellar/openssl@1.1/1.1.1k/lib/libssl.1.1.dylib /usr/lib/
sudo ln -s /opt/homebrew/Cellar/openssl@1.1/1.1.1k/lib/libcrypto.1.1.dylib /usr/lib/

To summarize all the operations:

brew reinstall openssl@1.1    # install version 1.1
cd /opt/homebrew/Cellar       # path depends on where brew put the library
# If openssl is not installed, skip the next two steps
mv openssl openssl@1.0        # rename the previous version
mv openssl@1.1 openssl        # use 1.1
# Locate the openssl library
cd /opt/homebrew/Cellar/openssl@1.1/1.1.1k/lib
sudo cp libssl.1.1.dylib libcrypto.1.1.dylib /usr/lib/
# The next three steps do not seem to be required
sudo rm libssl.dylib libcrypto.dylib
sudo ln -s libssl.1.1.dylib libssl.dylib
sudo ln -s libcrypto.1.1.dylib libcrypto.dylib

Welcome to follow my personal public account, HsuDan, where I share more of my learning experiences, pitfall-avoidance summaries, interview experiences, and the latest AI news.
