Using Apache .htaccess files to block wget from downloading web content

  • 2020-05-09 19:50:27
  • OfStack

I found that although wget obeys robots.txt by default, that check can be bypassed (for example with wget's -e robots=off option), so I'm going to share my own blocking methods:

1. Block wget from downloading any file

.htaccess


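# Flag any request whose User-Agent starts with "wget" (case-insensitive)
SetEnvIfNoCase User-Agent "^wget" bad_bot
# Deny flagged clients for GET and POST requests; allow everyone else
<Limit GET POST>
  Order Allow,Deny
  Allow from all
  Deny from env=bad_bot
</Limit>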
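
The Order/Allow/Deny directives above are the Apache 2.2 style; on Apache 2.4 they are only available through mod_access_compat. A minimal sketch of the same idea in the 2.4 syntax follows. It assumes mod_setenvif and mod_authz_core are loaded and that AllowOverride permits FileInfo and AuthConfig directives in .htaccess. Unlike the original, this sketch is not wrapped in <Limit>, so it applies to every request method.

```apache
# Flag any request whose User-Agent starts with "wget" (case-insensitive)
SetEnvIfNoCase User-Agent "^wget" bad_bot

# Apache 2.4 syntax (assumption: the server runs 2.4 or later)
<RequireAll>
    # Allow everyone by default...
    Require all granted
    # ...except clients flagged as bad_bot
    Require not env bad_bot
</RequireAll>
```

Either form only inspects the User-Agent header, so it stops wget only as long as the client sends its default identification.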

2. Block wget from downloading certain file types

.htaccess


# Flag any request whose User-Agent starts with "Wget" (case-insensitive);
# this also covers versioned strings such as "Wget/1.5.3" and "Wget/1.6"
SetEnvIfNoCase User-Agent "^Wget" bad_bot
# Apply the rule only to files with the listed extensions
<Files ~ "\.(html|pdf|mp3|zip|rar|exe|gif|jpe?g|png|php|jsp)$">
  Order Allow,Deny
  Allow from all
  Deny from env=bad_bot
</Files>
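
For servers running Apache 2.4, the per-extension rule can be sketched with <FilesMatch>, which the Apache documentation prefers over the <Files ~ "..."> form. This is an illustrative variant rather than the author's original configuration. It assumes mod_setenvif and mod_authz_core are available and that .htaccess overrides for FileInfo and AuthConfig are enabled.

```apache
# Flag any request whose User-Agent starts with "Wget" (case-insensitive)
SetEnvIfNoCase User-Agent "^Wget" bad_bot

# Restrict only files whose names end in one of these extensions
<FilesMatch "\.(html|pdf|mp3|zip|rar|exe|gif|jpe?g|png|php|jsp)$">
    <RequireAll>
        Require all granted
        Require not env bad_bot
    </RequireAll>
</FilesMatch>
```

The extension list can be shortened or extended to match the files that need protecting. Requests for files with other extensions are not affected by this block.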

