  1. #1
    Join Date
    Aug 2017
    Location
    South Africa
    Posts
    12

    Unanswered: Various methods of blocking online website backlink profilers

    Hello guys, I've been going mad trying to get around this one, so I'm posting it in the PHP forum, since I heard there is a PHP solution to this that I just can't find online.

    I'm being crawled by other websites and it is eating my bandwidth; I suspect they are crawling me for my backlink profile. I have tried Disallow in the robots.txt file, which doesn't seem to work.
    I have also used mod_rewrite rules in my Apache server's .htaccess file, specifically blocking user agent strings and IP ranges. However, I have come to find that the suspected bots can change their IPs out of those ranges, and they do not have to declare accurate user agent strings. So all the major backlink bots still have access to my website.

    Is there something I overlooked, perhaps a PHP solution? I tried crawling the competitor I suspected and I can't get anything on him. He managed to do it, maybe with redirects; I'm not sure how he identifies the bots and spiders. Any ideas?

  2. #2
    Join Date
    Jan 2016
    Posts
    14
    Hello,

    Can you confirm what .htaccess rules you are using?

    Do you see any pattern in bad bots?
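
    One way to look for a pattern is to tally requests per IP and per user agent from the access log. A rough sketch in Python, assuming Apache's "combined" log format; the sample lines and the function name `tally` are made up for illustration:

```python
import re
from collections import Counter

# Apache "combined" format:
# ip ident user [time] "request" status bytes "referer" "user-agent"
LINE_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]*\] "[^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)

def tally(lines):
    """Count requests per client IP and per user-agent string."""
    by_ip, by_agent = Counter(), Counter()
    for line in lines:
        m = LINE_RE.match(line)
        if m is None:
            continue  # skip lines that are not in combined format
        by_ip[m.group("ip")] += 1
        by_agent[m.group("agent")] += 1
    return by_ip, by_agent

# Hypothetical sample lines; in practice, iterate over the real access log file.
SAMPLE = [
    '203.0.113.5 - - [03/Nov/2017:03:05:00 +0200] "GET / HTTP/1.1" 200 512 "-" "ExampleBot/1.0"',
    '203.0.113.5 - - [03/Nov/2017:03:05:01 +0200] "GET /a HTTP/1.1" 200 512 "-" "ExampleBot/1.0"',
    '198.51.100.7 - - [03/Nov/2017:03:05:02 +0200] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0"',
]

by_ip, by_agent = tally(SAMPLE)
print(by_ip.most_common(2))
print(by_agent.most_common(2))
```

    The heaviest IPs and agents are only candidates for blocking; a spoofed agent string will still blend in with real browsers.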

    This is not something PHP can help with. The backlink bots will target your website one way or another.

    You can allow only the major search engines to crawl your site in robots.txt, and you need to update your firewall and .htaccess rules accordingly.
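
    As a rough illustration of the robots.txt side, here is a whitelist-style file checked with Python's standard `urllib.robotparser`. The choice of Googlebot and Bingbot as the allowed crawlers is an assumption, and robots.txt is only advisory, so this affects only bots that choose to obey it:

```python
from urllib.robotparser import RobotFileParser

# Allow two named crawlers everywhere and disallow everyone else.
# Which crawlers to allow is an assumption made for this example.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow:

User-agent: Bingbot
Disallow:

User-agent: *
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

def allowed(agent, url="/index.html"):
    """Return True if `agent` may fetch `url` under the rules above."""
    return parser.can_fetch(agent, url)

print(allowed("Googlebot"), allowed("AhrefsBot"))
```

    Anything that ignores robots.txt still has to be stopped at the firewall or .htaccess level.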

  3. #3
    Join Date
    Aug 2017
    Location
    South Africa
    Posts
    12
    I am running the rules below for now. Most people suggest going through your logs, finding the bots and adding them. However, as I stated, they are constantly changing IPs, user agent strings and so forth. I also understand that once you've been crawled by, for example, Ahrefs, you are put in a database, so even if you did block all their changing spiders, you would still show up in results for the person searching. I have read of something called a "PHP Blackhole"; it seems to be a trap. I have not looked into firewall options as of yet. Lastly, WordPress has plugins against bots, so there must be a way; perhaps download a plugin and have a look at the code to see what it is doing. It would be useless, however, if they are updating from a database.

    For now I've kind of let it go; it's too much effort, and my time is better spent improving my site.


    Thank you for your response, however. If you do have anything to add, please do.
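
    As far as I understand the "blackhole" idea, it works roughly like this: a hidden link points to a trap URL that robots.txt disallows, so well-behaved crawlers never request it, and any client that does request it gets its IP banned. A minimal in-memory sketch in Python; the path `/blackhole/` and the class name are hypothetical, and this is not the code of any actual plugin:

```python
TRAP_PATH = "/blackhole/"  # hypothetical path, also listed under Disallow in robots.txt

class Blackhole:
    """Bans any client IP that requests the trap path."""

    def __init__(self, trap_path=TRAP_PATH):
        self.trap_path = trap_path
        self.banned = set()  # a real deployment would persist this to a file or database

    def handle(self, ip, path):
        """Return the HTTP status code to send for a request from `ip` to `path`."""
        if ip in self.banned:
            return 403
        if path.startswith(self.trap_path):
            self.banned.add(ip)  # crawlers that obey robots.txt never reach this branch
            return 403
        return 200

hole = Blackhole()
print(hole.handle("203.0.113.5", "/index.html"))  # ordinary page request
print(hole.handle("203.0.113.5", "/blackhole/"))  # trips the trap
print(hole.handle("203.0.113.5", "/index.html"))  # same IP after the ban
```

    The limitation is the one already mentioned: a crawler that rotates IPs only loses one address per trap hit.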

    Code:
    RewriteEngine On
    RewriteCond %{REQUEST_URI} !/robots\.txt$
    RewriteCond %{HTTP_USER_AGENT} ^.*BLEXBot.*$ [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*BlackWidow.*$ [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*Nutch.*$ [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*Jetbot.*$ [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*WebVac.*$ [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*Stanford.*$ [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*scooter.*$ [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*naver.*$ [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*dumbot.*$ [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*Hatena\ Antenna.*$ [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*grub.*$ [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*looksmart.*$ [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*WebZip.*$ [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*larbin.*$ [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*b2w/0.1.*$ [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*Copernic.*$ [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*psbot.*$ [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*Python-urllib.*$ [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*NetMechanic.*$ [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*URL_Spider_Pro.*$ [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*CherryPicker.*$ [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*EmailCollector.*$ [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*EmailSiphon.*$ [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*WebBandit.*$ [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*EmailWolf.*$ [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*Email.*$ [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*ExtractorPro.*$ [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*CopyRightCheck.*$ [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*Crescent.*$ [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*SiteSnagger.*$ [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*ProWebWalker.*$ [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*CheeseBot.*$ [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*LNSpiderguy.*$ [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*ia_archiver.*$ [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*Alexibot.*$ [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*Teleport.*$ [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*MIIxpc.*$ [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*Mister\ PiX.*$ [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*WebAuto.*$ [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*TheNomad.*$ [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*WWW-Collector-E.*$ [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*RMA.*$ [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*libWeb/clsHTTP.*$ [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*asterias.*$ [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*httplib.*$ [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*turingos.*$ [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*spanner.*$ [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*Harvest.*$ [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*InfoNaviRobot.*$ [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*Bullseye.*$ [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*WebBandit.*$ [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*NICErsPRO.*$ [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*Microsoft\ URL\ Control.*$ [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*DittoSpyder.*$ [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*Foobot.*$ [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*WebmasterWorldForumBot.*$ [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*SpankBot.*$ [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*BotALot.*$ [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*lwp-trivial.*$ [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*WebmasterWorld.*$ [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*BunnySlippers.*$ [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*URLy\ Warning.*$ [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*Wget.*$ [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*LinkWalker.*$ [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*cosmos.*$ [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*hloader.*$ [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*humanlinks.*$ [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*LinkextractorPro.*$ [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*Offline\ Explorer.*$ [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*Mata\ Hari.*$ [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*LexiBot.*$ [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*Web\ Image\ Collector.*$ [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*The\ Intraformant.*$ [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*True_Robot.*$ [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*BlowFish.*$ [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*SearchEngineWorld.*$ [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*JennyBot.*$ [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*MIIxpc.*$ [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*BuiltBotTough.*$ [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*ProPowerBot.*$ [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*BackDoorBot.*$ [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*toCrawl/UrlDispatcher.*$ [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*WebEnhancer.*$ [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*suzuran.*$ [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*WebViewer.*$ [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*VCI.*$ [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*Szukacz.*$ [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*QueryN.*$ [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*Openfind.*$ [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*Openbot.*$ [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*Webster.*$ [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*EroCrawler.*$ [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*LinkScan.*$ [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*Keyword.*$ [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*Kenjin.*$ [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*Iron33.*$ [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*Bookmark\ search\ tool.*$ [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*GetRight.*$ [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*FairAd\ Client.*$ [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*Gaisbot.*$ [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*Aqua_Products.*$ [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*Radiation\ Retriever\ 1.1.*$ [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*Flaming\ AttackBot.*$ [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*Oracle\ Ultra\ Search.*$ [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*MSIECrawler.*$ [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*PerMan.*$ [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*searchpreview.*$ [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*sootle.*$ [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*Enterprise_Search.*$ [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*Bot\ mailto:craftbot@yahoo.com.*$ [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*ChinaClaw.*$ [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*Custo.*$ [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*DISCo.*$ [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*Download\ Demon.*$ [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*eCatch.*$ [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*EirGrabber.*$ [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*EmailSiphon.*$ [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*EmailWolf.*$ [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*Express\ WebPictures.*$ [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*ExtractorPro.*$ [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*EyeNetIE.*$ [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*FlashGet.*$ [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*GetRight.*$ [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*GetWeb!.*$ [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*Go!Zilla.*$ [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*Go-Ahead-Got-It.*$ [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*GrabNet.*$ [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*Grafula.*$ [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*HMView.*$ [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*HTTrack.*$ [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*Image\ Stripper.*$ [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*Image\ Sucker.*$ [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*Indy\ Library.*$ [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*InterGET.*$ [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*Internet\ Ninja.*$ [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*JetCar.*$ [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*JOC\ Web\ Spider.*$ [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*larbin.*$ [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*LeechFTP.*$ [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*Mass\ Downloader.*$ [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*MIDown\ tool.*$ [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*Mister\ PiX.*$ [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*Navroad.*$ [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*NearSite.*$ [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*NetAnts.*$ [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*NetSpider.*$ [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*Net\ Vampire.*$ [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*NetZIP.*$ [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*Octopus.*$ [NC,OR]
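    RewriteCond %{HTTP_USER_AGENT} ^.*Offline\ Explorer.*$ [NC]
    RewriteRule ^ http://www.google.com [R=302,L]
    ## Had to cut the list short because of character restrictions in posts.
    ## The last RewriteCond must not carry [OR]; a trailing [OR] there lets every request through to the rule.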

    Also, yes, I've considered using a whitelist (block everything except the major search engines), but it's high risk, as Google sometimes crawls anonymously and bots can also use the Googlebot user agent string if they want.
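
    For the spoofed Googlebot agent string, one approach I have seen described is a forward-confirmed reverse DNS check: look up the hostname for the client IP, check that it ends in a Google domain, then resolve that hostname and confirm it points back to the same IP. A hedged sketch in Python; the function name, the suffix list and the sample DNS data are assumptions for illustration, and the default lookups are IPv4 only:

```python
import socket

# Hostname suffixes assumed to identify Google's crawlers.
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")

def is_real_googlebot(ip, reverse=None, forward=None):
    """Forward-confirmed reverse DNS check for a client claiming to be Googlebot.

    `reverse` maps an IP to a hostname and `forward` maps a hostname to a list
    of IPs. Both default to real DNS lookups and can be replaced for testing.
    """
    reverse = reverse or (lambda addr: socket.gethostbyaddr(addr)[0])
    forward = forward or (lambda host: socket.gethostbyname_ex(host)[2])
    try:
        host = reverse(ip)
        if not host.endswith(GOOGLE_SUFFIXES):
            return False
        return ip in forward(host)
    except OSError:  # covers socket.herror and socket.gaierror
        return False

# Offline demo with made-up DNS data, so no network access is needed.
fake_ptr = {
    "192.0.2.10": "crawl-192-0-2-10.googlebot.com",
    "192.0.2.66": "host.example.net",
}
fake_a = {"crawl-192-0-2-10.googlebot.com": ["192.0.2.10"]}

def fake_forward(host):
    return fake_a.get(host, [])

print(is_real_googlebot("192.0.2.10", fake_ptr.__getitem__, fake_forward))
print(is_real_googlebot("192.0.2.66", fake_ptr.__getitem__, fake_forward))
```

    This only helps with bots pretending to be Googlebot; it does nothing about Google crawling without identifying itself.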
    Last edited by petershene; 11-03-17 at 03:05.
