Distributed deployment of pyspider on CentOS 7
- 2020-06-01 10:04:46
- OfStack
1. Setting up environment:
System version: Linux centos-linux.shared 3.10.0-123.el7.x86_64 #1 SMP Mon Jun 30 12:09:22 UTC 2014 x86_64 x86_64 GNU/Linux
Python version: Python 3.5.1
1.1. Building the Python 3 environment
After trying both approaches, I chose the Anaconda distribution.
1.1.1. Compiling from source
# Install the build dependencies
yum install -y ncurses-devel openssl openssl-devel zlib-devel gcc make glibc-devel libffi-devel glibc-static glibc-utils sqlite-devel readline-devel tk-devel gdbm-devel db4-devel libpcap-devel xz-deve
# Download the Python source
wget https://www.python.org/ftp/python/3.5.1/Python-3.5.1.tgz
# Or use a mirror in China
wget http://mirrors.sohu.com/python/3.5.1/Python-3.5.1.tgz
mv Python-3.5.1.tgz /usr/local/src;cd /usr/local/src
# Unpack
tar -zxf Python-3.5.1.tgz;cd Python-3.5.1
# Compile and install
./configure --prefix=/usr/local/python3.5 --enable-shared
make && make install
# Create a symlink and register the shared library path
ln -s /usr/local/python3.5/bin/python3 /usr/bin/python3
echo "/usr/local/python3.5/lib" > /etc/ld.so.conf.d/python3.5.conf
ldconfig
# Verify python3
python3
# Python 3.5.1 (default, Oct 9 2016, 11:44:24)
# [GCC 4.8.5 20150623 (Red Hat 4.8.5-4)] on linux
# Type "help", "copyright", "credits" or "license" for more information.
# >>>
# Upgrade pip
/usr/local/python3.5/bin/pip3 install --upgrade pip
ln -s /usr/local/python3.5/bin/pip /usr/bin/pip
# If pip is broken, reinstall it with get-pip.py
wget https://bootstrap.pypa.io/get-pip.py --no-check-certificate
python3 get-pip.py
1.1.2. Anaconda distribution
# Anaconda distribution (recommended)
wget https://repo.continuum.io/archive/Anaconda3-4.2.0-Linux-x86_64.sh
# Run the installer
bash Anaconda3-4.2.0-Linux-x86_64.sh
# If the installer fails to decompress its payload, install bzip2 and run it again
yum install bzip2
1.2. Installing MariaDB
# Install
yum -y install mariadb mariadb-server
# Start
systemctl start mariadb
# Start at boot
systemctl enable mariadb
# Set the root password (empty by default)
mysql_secure_installation
# Log in
mysql -u root -p
# Create a user; choose your own user name and password
CREATE USER 'user_name'@'localhost' IDENTIFIED BY 'user_pass';
GRANT ALL PRIVILEGES ON *.* TO 'user_name'@'localhost' WITH GRANT OPTION;
CREATE USER 'user_name'@'%' IDENTIFIED BY 'user_pass';
GRANT ALL PRIVILEGES ON *.* TO 'user_name'@'%' WITH GRANT OPTION;
1.3. Installing pyspider
I use Anaconda.
# Create a virtual environment named sbird with Python 3.*
conda create -n sbird python=3*
# Enter the environment
source activate sbird
# Install pyspider
pip install pyspider
# If you get the following error:
# it does not exist. The exported locale is "en_US.UTF-8" but it is not supported
# run the following (these lines can also be added to .bashrc)
export LC_ALL=en_US.utf-8
export LANG=en_US.utf-8
# If you get "ImportError: pycurl: libcurl link-time version (7.29.0) is older than compile-time version (7.49.0)":
conda install pycurl
# Leave the environment
source deactivate sbird
# If localhost:5000 is unreachable inside a virtual machine, stop the firewall
systemctl stop firewalld.service
# Alternatively, run directly from source
mkdir git;cd git
# Download
git clone https://github.com/binux/pyspider.git
# Run
/root/anaconda3/envs/sbird/bin/python /root/git/pyspider/run.py
Other methods
# Install virtualenv
pip install virtualenv
mkdir python;cd python
# Create a virtual environment named pyenv3
virtualenv -p /usr/bin/python3 pyenv3
# Enter and activate the virtual environment
cd pyenv3/
source ./bin/activate
pip install pyspider
# If pycurl fails to build
yum install libcurl-devel
# Then retry
pip install pyspider
# Leave the virtual environment
deactivate
I recommend the Anaconda installation.
If an error occurs while running pyspider, refer to the Anaconda installation section above. Once pyspider is running, the web interface is available at localhost:5000.
1.4. Installing Supervisor
# Install
yum install supervisor -y
# If the package cannot be found, add Aliyun's EPEL repository
vim /etc/yum.repos.d/epel.repo
# Add the following
[epel]
name=Extra Packages for Enterprise Linux 7 - $basearch
baseurl=http://mirrors.aliyun.com/epel/7/$basearch
        http://mirrors.aliyuncs.com/epel/7/$basearch
failovermethod=priority
enabled=1
gpgcheck=0
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-EPEL-7
[epel-debuginfo]
name=Extra Packages for Enterprise Linux 7 - $basearch - Debug
baseurl=http://mirrors.aliyun.com/epel/7/$basearch/debug
        http://mirrors.aliyuncs.com/epel/7/$basearch/debug
failovermethod=priority
enabled=0
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-EPEL-7
gpgcheck=0
[epel-source]
name=Extra Packages for Enterprise Linux 7 - $basearch - Source
baseurl=http://mirrors.aliyun.com/epel/7/SRPMS
        http://mirrors.aliyuncs.com/epel/7/SRPMS
failovermethod=priority
enabled=0
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-EPEL-7
gpgcheck=0
# Retry the installation
yum install supervisor -y
# Check that the installation succeeded
echo_supervisord_conf
1.4.1. Using Supervisor
supervisord    # start the Supervisor server
supervisorctl  # open the Supervisor command-line console
# For example, create a process named pyspider01
vim /etc/supervisord.d/pyspider01.ini
# Write the following
[program:pyspider01]
command = /root/anaconda3/envs/sbird/bin/python /root/git/pyspider/run.py
directory = /root/git/pyspider
user = root
process_name = %(program_name)s
autostart = true
autorestart = true
startsecs = 3
redirect_stderr = true
stdout_logfile_maxbytes = 500MB
stdout_logfile_backups = 10
stdout_logfile = /pyspider/supervisor/pyspider01.log
# Reload the configuration
supervisorctl reload
# Start the process
supervisorctl start pyspider01
# Supervisor can also be started with an explicit configuration file
supervisord -c /etc/supervisord.conf
# Check the status
supervisorctl status
# Output
pyspider01 RUNNING pid 4026, uptime 0:02:40
# Shut down Supervisor
supervisorctl shutdown
1.5. Installing Redis
# Redis is used as the message queue
mkdir download;cd download
wget http://download.redis.io/releases/redis-3.2.4.tar.gz
tar xzf redis-3.2.4.tar.gz
cd redis-3.2.4
make
# Or install it directly with yum
yum -y install redis
# Start
systemctl start redis.service
# Restart
systemctl restart redis.service
# Stop
systemctl stop redis.service
# Check the status
systemctl status redis.service
# Edit /etc/redis.conf
vim /etc/redis.conf
# Make the following changes
daemonize no    ->  daemonize yes
bind 127.0.0.1  ->  bind 10.211.55.22  (the current server's IP)
# Restart Redis
systemctl restart redis.service
1.6. Starting services at boot
# Start Supervisor at boot
systemctl enable supervisord.service
# Start Redis at boot
systemctl enable redis.service
# Keep the firewall from starting at boot
systemctl disable firewalld.service
At this point, the single-server pyspider environment is set up and deployed. Open localhost:5000 to reach the web interface.
You can also write a script, run it, and check the running state in /pyspider/supervisor/pyspider01.log.
2. Distributed deployment
Name the server you just configured centos01, then set up two more servers, centos02 and centos03, with the same configuration.
The layout is as follows:
Server name   IP             Description
centos01      10.211.55.22   redis, mariaDB, scheduler
centos02      10.211.55.23   fetcher, processor, result_worker, phantomjs
centos03      10.211.55.24   fetcher, processor, result_worker, webui
2.1. centos01
On centos01 the basic environment from step 1 is already in place. First, edit the configuration file /pyspider/config.json.
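A minimal sketch of what /pyspider/config.json might contain, assuming the MariaDB user from section 1.2 (`user_name` / `user_pass`), Redis on centos01, phantomjs on centos02 at port 25555, and the scheduler listening on pyspider's default XML-RPC port 23333. The draft path and the choice of keys are assumptions for illustration; check them against the pyspider deployment documentation before use.

```shell
#!/usr/bin/env bash
# Sketch only: draft a pyspider config.json and check that it parses.
# Assumptions (not from the original article): the draft location, the
# user_name/user_pass placeholders from section 1.2, and the scheduler
# listening on pyspider's default XML-RPC port 23333.
DRAFT="${TMPDIR:-/tmp}/pyspider-config.json"

cat > "$DRAFT" <<'EOF'
{
  "taskdb": "mysql+taskdb://user_name:user_pass@10.211.55.22:3306/taskdb",
  "projectdb": "mysql+projectdb://user_name:user_pass@10.211.55.22:3306/projectdb",
  "resultdb": "mysql+resultdb://user_name:user_pass@10.211.55.22:3306/resultdb",
  "message_queue": "redis://10.211.55.22:6379/0",
  "phantomjs-proxy": "10.211.55.23:25555",
  "webui": {
    "port": 5000,
    "scheduler-rpc": "http://10.211.55.22:23333/"
  }
}
EOF

# Validate the JSON if python3 is available, then report where the draft is.
if command -v python3 >/dev/null 2>&1; then
    python3 -m json.tool "$DRAFT" >/dev/null && echo "valid JSON: $DRAFT"
fi
# When satisfied: mkdir -p /pyspider && cp "$DRAFT" /pyspider/config.json
```

The same file can be copied unchanged to centos02 and centos03, since every component reads only the keys it needs.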
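Try running the scheduler with this configuration file:
/root/anaconda3/envs/sbird/bin/python /root/git/pyspider/run.py -c /pyspider/config.json scheduler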
After it runs successfully, change /etc/supervisord.d/pyspider01.ini so that its command starts the scheduler with this configuration file.
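As a sketch, the updated file is probably the pyspider01 program from section 1.4.1 with only the command line changed. The draft path below is an assumption; the other values are copied from the earlier listing.

```shell
#!/usr/bin/env bash
# Sketch only: the pyspider01 program from section 1.4.1, with the command
# changed to start just the scheduler. Written to a draft file first
# (the draft path is an assumption); copy it into /etc/supervisord.d/ as root.
DRAFT="${TMPDIR:-/tmp}/pyspider01.ini"

cat > "$DRAFT" <<'EOF'
[program:pyspider01]
command = /root/anaconda3/envs/sbird/bin/python /root/git/pyspider/run.py -c /pyspider/config.json scheduler
directory = /root/git/pyspider
user = root
process_name = %(program_name)s
autostart = true
autorestart = true
startsecs = 3
redirect_stderr = true
stdout_logfile_maxbytes = 500MB
stdout_logfile_backups = 10
stdout_logfile = /pyspider/supervisor/pyspider01.log
EOF

echo "draft written to $DRAFT"
# Then: cp "$DRAFT" /etc/supervisord.d/pyspider01.ini && supervisorctl reload
```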
centos01 has been deployed.
2.2. centos02
On centos02 you need to run result_worker, processor, phantomjs and fetcher.
Create the following files:
/etc/supervisord.d/result_worker.ini
[program:result_worker]
command = /root/anaconda3/envs/sbird/bin/python /root/git/pyspider/run.py -c /pyspider/config.json result_worker
directory = /root/git/pyspider
user = root
process_name = %(program_name)s
autostart = true
autorestart = true
startsecs = 3
redirect_stderr = true
stdout_logfile_maxbytes = 500MB
stdout_logfile_backups = 10
stdout_logfile = /pyspider/supervisor/result_worker.log
/etc/supervisord.d/processor.ini
[program:processor]
command = /root/anaconda3/envs/sbird/bin/python /root/git/pyspider/run.py -c /pyspider/config.json processor
directory = /root/git/pyspider
user = root
process_name = %(program_name)s
autostart = true
autorestart = true
startsecs = 3
redirect_stderr = true
stdout_logfile_maxbytes = 500MB
stdout_logfile_backups = 10
stdout_logfile = /pyspider/supervisor/processor.log
/etc/supervisord.d/phantomjs.ini
[program:phantomjs]
command = /pyspider/phantomjs --config=/pyspider/pjsconfig.json /pyspider/phantomjs_fetcher.js 25555
directory = /root/git/pyspider
user = root
process_name = %(program_name)s
autostart = true
autorestart = true
startsecs = 3
redirect_stderr = true
stdout_logfile_maxbytes = 500MB
stdout_logfile_backups = 10
stdout_logfile = /pyspider/supervisor/phantomjs.log
/etc/supervisord.d/fetcher.ini
[program:fetcher]
command = /root/anaconda3/envs/sbird/bin/python /root/git/pyspider/run.py -c /pyspider/config.json fetcher
directory = /root/git/pyspider
user = root
process_name = %(program_name)s
autostart = true
autorestart = true
startsecs = 3
redirect_stderr = true
stdout_logfile_maxbytes = 500MB
stdout_logfile_backups = 10
stdout_logfile = /pyspider/supervisor/fetcher.log
Create pjsconfig.json in the /pyspider directory.
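A minimal sketch of a pjsconfig.json, assuming you want PhantomJS to ignore SSL errors and emit UTF-8. PhantomJS reads its command-line options from this file in camel-cased form; the two keys chosen here are an assumption, not necessarily the author's settings.

```shell
#!/usr/bin/env bash
# Sketch only: draft a small PhantomJS config file.
# The draft path and the two options are assumptions for illustration.
DRAFT="${TMPDIR:-/tmp}/pjsconfig.json"

cat > "$DRAFT" <<'EOF'
{
  "ignoreSslErrors": true,
  "outputEncoding": "utf8"
}
EOF

echo "draft written to $DRAFT"
# Then: cp "$DRAFT" /pyspider/pjsconfig.json
```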
Download phantomjs into the /pyspider/ folder, and copy git/pyspider/pyspider/fetcher/phantomjs_fetcher.js to /pyspider/phantomjs_fetcher.js.
centos02 has been deployed.
2.3. centos03
Deploy the three processes fetcher, processor and result_worker the same way as on centos02. On top of those, this server also runs webui.
Create a Supervisor configuration file for webui in /etc/supervisord.d/, following the same pattern.
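Since the files for centos03 differ only in the component name, they can be generated in one pass. The sketch below follows the pattern of the centos02 listings; the helper name `write_ini`, the draft directory, and the webui program and log names are assumptions.

```shell
#!/usr/bin/env bash
# Sketch only: generate one Supervisor program file per pyspider component.
# write_ini and OUT_DIR are hypothetical names; the settings mirror the
# centos02 listings in section 2.2.
OUT_DIR="${TMPDIR:-/tmp}/supervisord.d-draft"
mkdir -p "$OUT_DIR"

write_ini() {
    # $1 = component name, reused as program name, run.py argument and log name
    cat > "$OUT_DIR/$1.ini" <<EOF
[program:$1]
command = /root/anaconda3/envs/sbird/bin/python /root/git/pyspider/run.py -c /pyspider/config.json $1
directory = /root/git/pyspider
user = root
process_name = %(program_name)s
autostart = true
autorestart = true
startsecs = 3
redirect_stderr = true
stdout_logfile_maxbytes = 500MB
stdout_logfile_backups = 10
stdout_logfile = /pyspider/supervisor/$1.log
EOF
}

for component in fetcher processor result_worker webui; do
    write_ini "$component"
done

ls "$OUT_DIR"
# Then: cp "$OUT_DIR"/*.ini /etc/supervisord.d/ && supervisorctl reload
```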
3. Summary
Visit http://10.211.55.24:5000 to start crawling.
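If the page does not load or tasks never run, a quick connectivity check between the servers can narrow things down. This is a sketch only: the helper name `port_open` is hypothetical, and the port list assumes the Redis and MariaDB defaults (6379, 3306) plus the phantomjs and webui ports used in this article.

```shell
#!/usr/bin/env bash
# Sketch only: check that the services each server depends on are reachable.
# port_open is a hypothetical helper built on bash's /dev/tcp redirection.
port_open() {
    # $1 = host, $2 = port; succeeds only if a TCP connection can be opened
    timeout 2 bash -c "exec 3<>/dev/tcp/$1/$2" 2>/dev/null
}

for endpoint in 10.211.55.22:6379 10.211.55.22:3306 10.211.55.23:25555 10.211.55.24:5000; do
    host="${endpoint%:*}"
    port="${endpoint#*:}"
    if port_open "$host" "$port"; then
        echo "$endpoint reachable"
    else
        echo "$endpoint NOT reachable"
    fi
done
```

A port reported as not reachable usually points to a stopped service, a `bind` address that only listens on 127.0.0.1, or a firewall that is still running.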