Nginx anti-crawler strategy: blocking crawler UAs from scraping the website
- 2021-09-05 01:22:56
- OfStack
Create a new anti-crawler policy file:
vim /usr/www/server/nginx/conf/anti_spider.conf
File contents:
# Block scraping by tools such as Scrapy, Curl, and HttpClient
if ($http_user_agent ~* (Scrapy|Curl|HttpClient)) {
return 403;
}
# Block the specified User-Agents and requests with an empty User-Agent
if ($http_user_agent ~ "WinHttp|WebZIP|FetchURL|node-superagent|java/|FeedDemon|Jullo|JikeSpider|Indy Library|Alexa Toolbar|AskTbFXTV|AhrefsBot|CrawlDaddy|Java|Feedly|Apache-HttpAsyncClient|UniversalFeedParser|ApacheBench|Microsoft URL Control|Swiftbot|ZmEu|oBot|jaunty|Python-urllib|lightDeckReports Bot|YYSpider|DigExt|HttpClient|MJ12bot|heritrix|EasouSpider|Ezooms|BOT/0.1|YandexBot|FlightDeckReports|Linguee Bot|^$" ) {
return 403;
}
# Block request methods other than GET, HEAD, and POST
if ($request_method !~ ^(GET|HEAD|POST)$) {
return 403;
}
# Block a single IP:
#deny 123.45.6.7;
# Block the whole /8 range, 123.0.0.1 to 123.255.255.254:
#deny 123.0.0.0/8;
# Block the /16 range, 123.45.0.1 to 123.45.255.254:
#deny 123.45.0.0/16;
# Block the /24 range, 123.45.6.1 to 123.45.6.254:
#deny 123.45.6.0/24;
# The following IP range is a known abuser
#deny 58.95.66.0/24;
Using the configuration
Include the file in the site's server block:
# Anti-crawler
include /usr/www/server/nginx/conf/anti_spider.conf;
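For context, a minimal sketch of a server block with the include in place (the listen port, server_name, and root below are placeholders, not taken from the original article):

server {
    listen 80;
    server_name www.example.com;
    root /usr/www/html;

    # Anti-crawler rules apply to every request handled by this server block
    include /usr/www/server/nginx/conf/anti_spider.conf;

    location / {
        index index.html;
    }
}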
Finally, restart (or reload) nginx.
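For example, assuming the nginx binary is on the PATH, check the syntax first and then reload so the running workers pick up the new rules without dropping connections:

nginx -t          # test the configuration, including the new anti_spider.conf
nginx -s reload   # reload the configuration (or: systemctl reload nginx)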
Verifying that it works
Simulate YYSpider:
λ curl -X GET -I -A 'YYSpider' https://www.myong.top
HTTP/1.1 200 Connection established
HTTP/2 403
server: marco/2.11
date: Fri, 20 Mar 2020 08:48:50 GMT
content-type: text/html
content-length: 146
x-source: C/403
x-request-id: 3ed800d296a12ebcddc4d61c57500aa2
Simulate Baidu's Baiduspider:
λ curl -X GET -I -A 'BaiduSpider' https://www.myong.top
HTTP/1.1 200 Connection established
HTTP/2 200
server: marco/2.11
date: Fri, 20 Mar 2020 08:49:47 GMT
content-type: text/html
vary: Accept-Encoding
x-source: C/200
last-modified: Wed, 18 Mar 2020 13:16:50 GMT
etag: "5e721f42-150ce"
x-request-id: e82999a78b7d7ea2e9ff18b6f1f4cc84
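The empty User-Agent branch of the rule (the ^$ alternative in the regex) can be checked the same way. Passing an empty string to curl's -A option means the request reaches nginx with an empty $http_user_agent, so the expected response is again a 403, just like the YYSpider test above (www.myong.top is the article's test domain):

λ curl -X GET -I -A '' https://www.myong.top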
Common crawler User-Agents
FeedDemon: content scraping
BOT/0.1 (BOT for JCE): SQL injection
CrawlDaddy: SQL injection
Java: content scraping
Jullo: content scraping
Feedly: content scraping
UniversalFeedParser: content scraping
ApacheBench: CC attack tool
Swiftbot: useless crawler
YandexBot: useless crawler
AhrefsBot: useless crawler
YisouSpider: useless crawler (now part of UC's Shenma Search; this spider can be allowed through!)
jikeSpider: useless crawler
MJ12bot: useless crawler
ZmEu: phpMyAdmin vulnerability scanner
WinHttp: content scraping, CC attacks
EasouSpider: useless crawler
HttpClient: TCP attacks
Microsoft URL Control: scanner
YYSpider: useless crawler
jaunty: WordPress brute-force scanner
oBot: useless crawler
Python-urllib: content scraping
Indy Library: scanner
FlightDeckReports Bot: useless crawler
Linguee Bot: useless crawler
The above covers the details of the Nginx anti-crawler strategy for blocking crawler UAs from scraping a site. For more information about Nginx anti-crawler configuration, please see the other related articles on this site!