Deploying a Python Crawler Script on Linux and Running It as a Scheduled Task

  • 2021-06-28 14:45:12
  • OfStack

Last year I wrote a crawler in Python for a project. The crawled data has to be stored in the PG (PostgreSQL) database in the production environment, so the script needs to be deployed to a CentOS server and started automatically by a scheduled task.

The implementation steps are as follows:

1. Install pip (the OS ships with Python 2.6, which can be used directly, but it does not include pip)


# Download the pip installation package
wget "https://pypi.python.org/packages/source/p/pip/pip-1.5.4.tar.gz#md5=834b2904f92d46aaa333267fb1c922bb" --no-check-certificate
# Unpack and install
tar -xzvf pip-1.5.4.tar.gz
cd pip-1.5.4
python setup.py install

2. Install the third-party libraries with pip


pip install PyGreSQL==5.0.3
pip install requests==2.18.3
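
After installing, it can help to confirm that the interpreter cron will use can actually import both libraries. The following is a minimal sketch, not part of the original script: the helper name check_modules is hypothetical, and it assumes PyGreSQL is imported through its DB-API module pgdb.

```python
# Minimal import check for the libraries installed above.
# Assumption: PyGreSQL is used through its DB-API module "pgdb".
# check_modules is a hypothetical helper name chosen for this example.


def check_modules(names):
    """Return a dict mapping each module name to True if it imports, else False."""
    results = {}
    for name in names:
        try:
            __import__(name)
            results[name] = True
        except ImportError:
            results[name] = False
    return results


if __name__ == "__main__":
    for name, ok in sorted(check_modules(["pgdb", "requests"]).items()):
        print("%s: %s" % (name, "OK" if ok else "MISSING"))
```

Running this with /usr/bin/python, the same interpreter named in the crontab entries, checks the environment the scheduled job will see rather than the one in an interactive shell.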

3. Set up the scheduled tasks


# Start the cron service
service crond start
# Check the status of the cron service
service crond status
# Open the crontab editor
crontab -e
# Add two scheduled tasks that each run once a day, at 00:00 and at 12:20, and write their output to the log
0 0 * * * /usr/bin/python /home/longrise/psrd/collect.py > /home/longrise/psrd/collect.log 2>&1 &

20 12 * * * /usr/bin/python /home/longrise/psrd/collect.py > /home/longrise/psrd/collect.log 2>&1 &
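
Both entries redirect output with `>`, which truncates collect.log each time a job starts, so the 12:20 run replaces the log left by the 00:00 run. If a history of runs is wanted, one option is to let the script append timestamped lines itself. The following is a minimal sketch under that assumption, not the original collect.py; the function name build_logger and the logger name "collect" are hypothetical.

```python
import logging


def build_logger(log_path):
    """Create a logger that appends timestamped lines to log_path.

    build_logger and the logger name "collect" are hypothetical names
    chosen for this example; they do not come from the original script.
    """
    logger = logging.getLogger("collect")
    logger.setLevel(logging.INFO)
    # Attach the file handler only once, even if this is called repeatedly.
    if not logger.handlers:
        handler = logging.FileHandler(log_path, mode="a")  # "a" = append
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(message)s")
        )
        logger.addHandler(handler)
    return logger


# Possible use inside the crawler script:
#   log = build_logger("/home/longrise/psrd/collect.log")
#   log.info("crawl started")
```

If the script writes its own log file this way, the crontab entry should redirect to a different file or use `>>`, because a `>` redirect to the same path would still truncate it at the start of every run.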

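The syntax for scheduled tasks is as follows. This is the header comment from /etc/crontab; entries added with crontab -e, like the two above, omit the user-name field: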
# For details see man 4 crontabs

# Example of job definition:
# .---------------- minute (0 - 59)
# | .------------- hour (0 - 23)
# | | .---------- day of month (1 - 31)
# | | | .------- month (1 - 12) OR jan,feb,mar,apr ...
# | | | | .---- day of week (0 - 6) (Sunday=0 or 7) OR sun,mon,tue,wed,thu,fri,sat
# | | | | |
# * * * * * user-name command to be executed
