Nginx configuration for rate-limiting search engine crawlers and blocking web crawlers

  • 2020-05-07 20:55:29
  • OfStack

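# Global configuration (http context)
limit_req_zone $anti_spider zone=anti_spider:10m rate=15r/m;

# Inside a server block
limit_req zone=anti_spider burst=30 nodelay;
if ($http_user_agent ~* "xxspider|xxbot") {
    # Only matching crawlers get a non-empty key; requests with an empty key are not rate-limited
    set $anti_spider $http_user_agent;
}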

If a crawler exceeds the configured rate, nginx answers its requests with a 503.
For a detailed explanation of the directives above, search online or consult the nginx documentation. Replace xxspider and xxbot with the names of the crawlers you want to limit.
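
To make the numbers concrete: rate=15r/m allows one request every 4 seconds on average, and burst=30 with nodelay lets up to 30 excess requests through immediately before nginx starts rejecting. The same limit can also be expressed with a map block instead of if/set. The following is only a sketch of that alternative, not the configuration used above; the location layout and the optional limit_req_status line are assumptions added for illustration.

```nginx
# http context
# $anti_spider stays empty for normal visitors, so their requests are not counted.
map $http_user_agent $anti_spider {
    default               "";
    ~*(xxspider|xxbot)    $http_user_agent;
}

limit_req_zone $anti_spider zone=anti_spider:10m rate=15r/m;

server {
    listen 80;

    location / {
        limit_req zone=anti_spider burst=30 nodelay;
        # Optional (assumption): answer with 429 instead of the default 503
        # limit_req_status 429;
    }
}
```

Note that the key is the User-Agent string, so every client sending the same crawler User-Agent shares one bucket, just as in the if/set version.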

Appendix: blocking web crawlers in nginx

server {
    listen       80;
    #charset koi8-r;
    #access_log  logs/host.access.log  main;
    #location / {
    #    root   html;
    #    index  index.html index.htm;
    #}

    # Return 403 to any request whose User-Agent matches one of these crawlers (case-insensitive)
    if ($http_user_agent ~* "qihoobot|Baiduspider|Googlebot|Googlebot-Mobile|Googlebot-Image|Mediapartners-Google|Adsbot-Google|Feedfetcher-Google|Yahoo! Slurp|Yahoo! Slurp China|YoudaoBot|Sosospider|Sogou spider|Sogou web spider|MSNBot|ia_archiver|Tomato Bot") {
        return 403;
    }

    # Proxy all other requests to the backend
    location ~ ^/(.*)$ {
        proxy_pass http://localhost:8080;
        proxy_redirect          off;
        proxy_set_header        Host $host;
        proxy_set_header        X-Real-IP $remote_addr;
        proxy_set_header        X-Forwarded-For $proxy_add_x_forwarded_for;
        client_max_body_size    10m;
        client_body_buffer_size 128k;
        proxy_connect_timeout   90;
        proxy_send_timeout      90;
        proxy_read_timeout      90;
        proxy_buffer_size       4k;
        proxy_buffers           4 32k;
        proxy_busy_buffers_size 64k;
        proxy_temp_file_write_size 64k;
    }

    #error_page  404              /404.html;
    # redirect server error pages to the static page /50x.html
    error_page   500 502 503 504  /50x.html;
    location = /50x.html {
        root   html;
    }

    # proxy the PHP scripts to Apache listening on
    #location ~ \.php$ {
    #    proxy_pass;
    #}
    # pass the PHP scripts to FastCGI server listening on
    #location ~ \.php$ {
    #    root           html;
    #    fastcgi_pass;
    #    fastcgi_index  index.php;
    #    fastcgi_param  SCRIPT_FILENAME  /scripts$fastcgi_script_name;
    #    include        fastcgi_params;
    #}
    # deny access to .htaccess files, if Apache's document root
    # concurs with nginx's one
    #location ~ /\.ht {
    #    deny  all;
    #}
}
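
A long alternation inside a single if condition is hard to maintain. One possible alternative, shown here only as a sketch, moves the crawler list into a map block with one pattern per line. The variable name $block_crawler is hypothetical, the list is shortened for brevity, and the location block is reduced to the proxy_pass line.

```nginx
# http context
# $block_crawler is a hypothetical variable name: 1 for listed crawlers, 0 otherwise.
map $http_user_agent $block_crawler {
    default                 0;
    ~*qihoobot              1;
    ~*Baiduspider           1;
    ~*Googlebot             1;
    "~*Sogou web spider"    1;   # patterns containing spaces must be quoted
}

server {
    listen 80;

    # An empty string or "0" evaluates as false in an if condition
    if ($block_crawler) {
        return 403;
    }

    location / {
        proxy_pass http://localhost:8080;
    }
}
```

The map is evaluated lazily and only once per request, and adding a crawler becomes a one-line change.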

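You can test the rule with curl by sending a request with a blocked User-Agent (append your site's URL to the command below); a 403 response is expected.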

curl -I -A "qihoobot"
