Majestic-12: Distributed Search Engine

MJ12bot

Contact address (if you are too busy to read the useful information below): bot@majestic12.co.uk (we respond very quickly!)

01/06/09: We've received information that two fake bots appear to operate from IPs 81.169.145.25 and 81.169.145.28. We are currently investigating the details; if you have been crawled from those IPs, please email us.

24/11/08: New fake bot activity from IP 88.38.211.149. We have been made aware of a fake MJ12bot crawling from this IP address. We have nothing to do with it! The bot identifies itself as MJ12bot/v1.2.1 (an old version); it does not obey robots.txt and may overload websites. It also makes heavy use of HEAD requests (our bot doesn't). We believe it is a third-party multi-threaded crawler that was given a fake user-agent (ours). The good news is that we know who is doing it! The fake bot comes from progetplus.it (contact page), an Italian-based company, phone number: +39 0547 382551.

If you see this fake bot from that IP, please report the abuse to abusedesk@interbusiness.it and abuse@business.telecomitalia.it. This is the ISP that provides them with upstream connectivity, and they were very responsive. I think that if such abuse continues, they will cut them off for good.

Once again: this is a fake bot, and we are not responsible in any way for the actions of these people.

Newsflash 04/02/08: The Massive Anchor Index has been released! If you are into SEO, don't walk past without checking out Majestic-SEO!

You've most likely reached this page by clicking a link left by MJ12bot in your log files. Below are answers to some of the most frequently asked questions about MJ12bot.

  1. What is MJ12bot doing on my site(s)?
  2. What happens with the crawled data?
  3. How can I block MJ12bot?
  4. Why did my robots.txt block not work on MJ12bot?
  5. How can I slow down MJ12bot?
  6. How can I reduce bandwidth usage?
  7. What about MJ12bot/v1.0.8 (also known as fake MJ12bot)?
  8. What are the current versions of MJ12bot?

What is MJ12bot doing on my site(s)?

We spider the Web in order to build a distributed search engine, using a fast and efficient downloadable distributed crawler that enables people with broadband connections to help contribute to what we hope will become the biggest search engine in the world.

What happens with the crawled data?

Crawled data is added to the search engine index. This is a work in progress, but an alpha version of the search engine is available here.

How can I block MJ12bot?

MJ12bot adheres to the robots.txt standard. If you want to prevent the bot from crawling your website, add the following text to your robots.txt:

User-agent: MJ12bot
Disallow: /
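To check locally that a rule like this does what you expect, you can feed it to Python's standard-library `urllib.robotparser`. This is a generic robots.txt parser, not MJ12bot's own code, so treat the result as an approximation; `example.com` and the helper name `is_allowed` are placeholders.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt rule from above, held in a string for testing.
ROBOTS_TXT = """\
User-agent: MJ12bot
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if this robots.txt lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse() also marks the file as "read"
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    print(is_allowed(ROBOTS_TXT, "MJ12bot", "http://example.com/page.html"))    # False
    print(is_allowed(ROBOTS_TXT, "Googlebot", "http://example.com/page.html"))  # True
```

Pass the bot's short name (`MJ12bot`) rather than a full browser-style user-agent string, because the parser only compares the part before the first `/`.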

Please do not waste your time trying to block the bot by IP in .htaccess: we do not use any consecutive IP blocks, so your efforts will be in vain. Also, please make sure the bot can actually retrieve robots.txt itself; if it can't, it will assume (as is industry practice) that it is okay to crawl your site.

If you have reason to believe that MJ12bot did NOT obey your robots.txt commands, please let us know via email at bot@majestic12.co.uk. Please provide the URL of your website and log entries showing the bot trying to retrieve pages that it was not supposed to.

Why did my robots.txt block not work on MJ12bot?

There can be different reasons for this. Recently, the most common reason has been fake bots that claim to be MJ12bot but are not ours. Naturally, those bad guys don't bother obeying robots.txt. This is the same situation as with spammers who fake source email addresses; unfortunately there is nothing we can do about it, because these things are out of our control. In order to check whether an MJ12bot is ours, we need log entries showing the IP address of the bot and, ideally, its request for robots.txt: in the referer field of that request, the genuine MJ12bot includes debugging information that allows us to say positively whether it is fake.
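If you want to send us the log entries described above, a short script can pull them out of your access log. The Python sketch below assumes the common Apache/Nginx "combined" log format (adjust the regular expression if your server logs differently), and the names `mj12bot_hits` and `Hit` are hypothetical. It keeps the referer field, since that is where the genuine bot's robots.txt request carries its debugging information.

```python
import re
from typing import Iterable, Iterator, NamedTuple

# Assumed "combined" log format:
# IP ident user [time] "request" status size "referer" "user-agent"
COMBINED = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) \S+ '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


class Hit(NamedTuple):
    ip: str
    time: str
    request: str
    status: int
    referer: str
    agent: str


def mj12bot_hits(lines: Iterable[str]) -> Iterator[Hit]:
    """Yield log entries whose User-Agent mentions MJ12bot (genuine or fake)."""
    for line in lines:
        m = COMBINED.match(line)
        # Skip lines in other formats and requests from other agents.
        if m and "mj12bot" in m["agent"].lower():
            yield Hit(m["ip"], m["time"], m["request"], int(m["status"]),
                      m["referer"], m["agent"])
```

For example, open your log file and iterate over `mj12bot_hits(logfile)`, printing each hit's `ip`, `time`, `request` and `referer`; lines in other formats are silently skipped.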

How can I slow down MJ12bot?

You can easily slow down the bot by adding the following to your robots.txt file:

User-Agent: MJ12bot
Crawl-Delay: 5

Crawl-Delay should be an integer; it specifies the number of seconds to wait between requests. MJ12bot will honour a delay of up to 20 seconds between requests to your site. Note, however, that while it is unlikely, it is still possible for your site to be crawled by multiple MJ12bots at the same time. Setting a high Crawl-Delay should minimise the impact on your site. The Crawl-Delay parameter also takes effect if it is set for the * wildcard.

If our bot detects that you have set a Crawl-Delay for any other bot, it will automatically crawl more slowly, even though MJ12bot was not specifically asked to do so.
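The two paragraphs above describe a policy: honour a Crawl-Delay aimed at MJ12bot or at `*`, slow down if some other bot was given one, and cap the wait at 20 seconds. Below is a minimal Python sketch of that policy built on `urllib.robotparser`. It is not MJ12bot's code: `effective_delay` is a hypothetical name, and borrowing the largest delay set for another bot is an assumption, since this page does not say how much slower the bot goes.

```python
import re
from typing import Optional
from urllib.robotparser import RobotFileParser

MAX_DELAY_SECONDS = 20  # cap stated on this page

# Any Crawl-Delay line in the file, whichever bot it is for (case-insensitive).
ANY_DELAY = re.compile(r"^\s*crawl-delay\s*:\s*(\d+)", re.IGNORECASE | re.MULTILINE)


def effective_delay(robots_txt: str, agent: str = "MJ12bot") -> Optional[int]:
    """Seconds to wait between requests, or None if no delay is requested."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # The bot's own entry wins; otherwise the "*" entry's delay is used.
    delay = parser.crawl_delay(agent)
    if delay is None:
        # Hypothetical rule: borrow the largest delay given to any other bot.
        others = [int(d) for d in ANY_DELAY.findall(robots_txt)]
        delay = max(others) if others else None
    return None if delay is None else min(int(delay), MAX_DELAY_SECONDS)


if __name__ == "__main__":
    print(effective_delay("User-Agent: MJ12bot\nCrawl-Delay: 5\n"))  # 5
```

Capping at 20 seconds keeps a very large Crawl-Delay from stalling the crawl indefinitely, which is why the sketch applies `min` as the last step.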

Fake MJ12bot v1.0.8 (virus-based botnet)

31 Aug 2008: this information is now kept for historical purposes only; thankfully, the fake bot has not appeared for a long time now!

Below you can see extensive information about the fake MJ12bot. We have not received any reports of this fake bot since Feb 2008, but we decided to keep the information below, as it was posted at the time, for historical purposes. Short summary: a virus botnet unrelated to us used the user-agent of our old bot.

20 Oct 2007 - in the last few days it has been brought to our attention that a number of fake MJ12bots have appeared on the Net. These bots are not ours, but they use a fake MJ12bot user-agent. This is something we can't do anything about, just like email spammers who fake email addresses so that we all get spammed supposedly from our own addresses or someone else's. :(

6 Dec 2007 - there appears to have been a surge in activity by the fake bots in the last few weeks, and we have added many more IPs for you to block. They seem to be part of a botnet. Unfortunately we do not know who is doing this or why; the best we can do is take reports of these fakes and publish their IPs for everyone to block. This is a very difficult situation for us, as our reputation is being damaged by those scumbags. We hope you understand and direct your anger at those bad guys rather than us; we really don't have anything to do with this behavior :'-(

28 Dec 2007 - we continue to get reports from people about the fake bot. It has now become certain that this bot is actually a virus of some kind that installs itself on end users' computers and turns them into a botnet. Anti-virus vendors do not currently appear to catch this malware, but we are working very hard to collect data that will help develop a cure for this virus. Yet again we stress that we have nothing whatsoever to do with those people: our software is not used, they just fake the user-agent that we started using more than 3 years ago.

30 Dec 2007 - if your PC has been infected by this botnet, please report it in this thread on the forum of Kaspersky (an anti-virus vendor). This fake bot is now known to be 100% a virus of some kind that seems to have infected a lot of people. Yet again we want to stress that they don't use our software (it can't be used this way); they just fake the user-agent to look like us :( If you wish to discuss this fake bot on our forum, you can do so here (you can post anonymously there; no need to register).

31 Dec 2007 - Breaking news! Kaspersky Labs have successfully identified this virus, and its detection and removal will be included in their next release! Here is the relevant thread from their forum. They named this fake MJ12bot virus Trojan.Win32.Agent.dqy and Trojan.Win32.Zapchast.dv. I am going to ask the user who supplied the infected files to Kaspersky to forward them to me so that I can pass them along to other anti-virus companies; hopefully they will be as quick as Kaspersky and produce a cure for everyone. We can't be 100% sure that this botnet will disappear, but at least we now know for a fact that it was a malicious virus that, yet again, had nothing to do with us! Happy New Year to everyone, and let's hope the criminals who made this virus get what they deserve!

6 Jan 2008 - the number of reports about this virus appears to be going down. We don't know right now whether this is because the lowlifes who run it took a holiday or because anti-virus software is catching this infection more effectively than before; I certainly hope the latter is the case. In any event, now that it has been proven that this fake bot was a virus that had nothing to do with us, I hope people can see that we were the innocent party in this, victims just like the webmasters hit by this virus. We did all we could to stop this pest, including paying a small cash bounty to an infected person who helped us try to locate it. We did this because we were as pissed off as you; let's hope this problem goes away forever and never returns.

Solutions

The best solution is to ban the fake bot by the user-agent it claims, matching the keywords "MJ12bot" and "1.0.8": any MJ12bot claiming to be this version is fake, because we have not used this version for a long time. Below you can see two approaches to this, both of which require Apache. Anyone running Microsoft IIS might have similar tools that can pattern-match the user-agent and ignore matching requests; if you do, please let us know.

Solution 1: a Hexia.net blog entry explains how to block, in Apache, fake MJ12bots claiming to be v1.0.8. Read more about them below, or go to the good people of Hexia.net to get this block, which does not depend on the IP address of the fake bot. Our good bot obeys your robots.txt file, so if you wish to disallow it, it is best to use robots.txt.

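Solution 2 (updated: 6/01/08): suggested by Ken from www.kensadservice.com. Add the following to .htaccess (mod_rewrite must be switched on with RewriteEngine On for the rule to take effect):

RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} MJ12bot/v1\.0\.8 [NC]
RewriteRule ^.* - [F]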

An alternative suggestion, from Paul, is to use this .htaccess rule:

RewriteCond %{HTTP_USER_AGENT} ^MJ12bot/v1\.0\.8.*$
RewriteRule .* - [F]
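To see how the two RewriteCond patterns above differ, you can try them with Python's `re` module. Python's regular expressions are not Apache's PCRE, but for patterns this simple they behave the same: `[NC]` corresponds to `re.IGNORECASE`, and an unanchored RewriteCond pattern matches anywhere in the string, like `re.search`. The user-agent strings below are hypothetical examples.

```python
import re

# Ken's condition: unanchored and case-insensitive ([NC]).
KEN = re.compile(r"MJ12bot/v1\.0\.8", re.IGNORECASE)
# Paul's condition: anchored at the start of the string and case-sensitive.
PAUL = re.compile(r"^MJ12bot/v1\.0\.8.*$")


def blocked(pattern: re.Pattern, user_agent: str) -> bool:
    """True if the condition would match, i.e. the request gets a 403."""
    return pattern.search(user_agent) is not None


for ua in ("MJ12bot/v1.0.8",
           "Mozilla/5.0 (compatible; mj12bot/v1.0.8)",
           "MJ12bot/v1.2.4"):
    print(f"{ua!r}: Ken={blocked(KEN, ua)} Paul={blocked(PAUL, ua)}")
```

The first string is caught by both rules, the second only by Ken's (it is lowercase and does not start with MJ12bot), and the third, a genuine version number, by neither. Paul's trailing `.*$` does not change what matches.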

Another suggestion, from Olliver W.:

On another note, in your tips and tricks section for dealing
appropriately with this fake bot, you mention some sample entries
to be added to httpd.conf or .htaccess (depending on the level of access
one has on the server), but I noticed that mod_setenvif is missing. So
here is a step-by-step guide for Apache users:

1. First, create a section for mod_setenvif if one does not already exist.
It does not depend on any Directory/Location directives and can be placed
in either httpd.conf or .htaccess:

# deny fake bot
SetEnvIfNoCase User-Agent "^MJ12bot/v?1\.[01]\.[0-9]{1,2}" block

This entry creates an environment variable called "block" in case
of a match. The pattern is a bit more sophisticated, to catch
modifications that are likely once the old user-agent no longer
achieves its goal. Together with step 2, it denies access to any 1.0.x
or 1.1.x version and works even if the "v" is omitted.

2. Create an entry saying what to do when the variable is set:

Deny from env=block

In .htaccess this line merely needs to be placed after the SetEnvIf rule,
but those who want to include it in httpd.conf have to take care to place
it within the relevant <Directory> section of their VirtualHost. An example
as illustration:


[...]
  # Directory permissions (replace the path with your own document root)
  <Directory "/path/to/your/site">
    Options Indexes FollowSymlinks MultiViews
    AllowOverride All
    Order deny,allow
    # apply the SetEnvIf rule here
    Deny from env=block
  </Directory>
[...]

This should give 403 Forbidden errors to fake bot requests.
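You can try out the SetEnvIfNoCase pattern from step 1 in Python before deploying it, with `re.IGNORECASE` standing in for the NoCase variant. The helper name `sets_block` and the user-agent strings are hypothetical; the point is to show which versions the pattern catches and that the leading `^` means it only fires when the string begins with MJ12bot.

```python
import re

# Python equivalent of: SetEnvIfNoCase User-Agent "^MJ12bot/v?1\.[01]\.[0-9]{1,2}" block
BLOCK = re.compile(r"^MJ12bot/v?1\.[01]\.[0-9]{1,2}", re.IGNORECASE)


def sets_block(user_agent: str) -> bool:
    """True if Apache would set the "block" variable for this user-agent."""
    return BLOCK.search(user_agent) is not None


for ua in ("MJ12bot/v1.0.8",            # old version, "v" present
           "mj12bot/1.1.12",            # 1.1.x, lowercase, no "v"
           "MJ12bot/v1.2.4",            # genuine version number
           "Mozilla/5.0 (compatible; MJ12bot/v1.0.8)"):  # does not start with MJ12bot
    print(ua, sets_block(ua))
```

Only the first two strings set `block`. If a fake ever hides the token later in the string, dropping the `^` anchor would catch it.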

Solution 3: suggested by Michael B. If you use ColdFusion, add the following to Application.cfm:

<cfif cgi.HTTP_USER_AGENT contains "MJ12bot/v1.0.8">
<cflocation url="http://www.fbi.gov/">
</cfif>

This should result in requests from the fake bot being redirected to the FBI; maybe this will make them interested. I sure hope so, and I will be pretty happy if those fake bot guys get waterboarded in Guantanamo, harsh but just treatment that they surely deserve! Note: we don't know for certain whether their fake bot follows redirects at all, though it probably does.
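Sites built on Python can apply the same user-agent check as WSGI middleware. This is an illustrative sketch, not one of the suggestions we received: `block_fake_mj12bot` is a hypothetical name, it answers 403 Forbidden (like the Apache `[F]` rules) instead of redirecting, and it matches case-insensitively, mirroring the `[NC]` flag in Solution 2.

```python
from typing import Callable, Iterable

# Substring to refuse, compared against the lowercased User-Agent.
FAKE_MARKER = "mj12bot/v1.0.8"


def block_fake_mj12bot(app: Callable) -> Callable:
    """Wrap a WSGI app so requests claiming MJ12bot v1.0.8 get 403 Forbidden."""
    def middleware(environ: dict, start_response: Callable) -> Iterable[bytes]:
        agent = environ.get("HTTP_USER_AGENT", "").lower()
        if FAKE_MARKER in agent:
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Forbidden\n"]
        # Everyone else, including current MJ12bot versions, passes through.
        return app(environ, start_response)
    return middleware
```

To use it, wrap your application object once at startup, e.g. `application = block_fake_mj12bot(application)`. Because it matches only the fake version string, the genuine bot can still fetch robots.txt.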

You might be tempted to ban anything that has MJ12bot in its user-agent. This is not wise, for two reasons: first, you will prevent our good bot from obeying your robots.txt, because it won't be able to retrieve it; and second, you will help the bad guys achieve what they probably want: ruining our reputation as good guys and making people ban our good bot. We don't know whether the bad guys who run this fake bot want that or just picked our user-agent at random, but if you hate those guys as much as we do, don't let them achieve their goals.

Solution 4: ban known fake MJ12bot IPs. You can ban these right now, as they are known to be used by fakers: [IPs removed since they are no longer relevant]

The list of IPs is pretty big. Initially it was small, but it grew quickly: it seems that those guys run their bot, which pretends to be us, on a big botnet, which is why the IPs are so varied. Banning by IP is therefore not the best approach; it is better to catch the user-agent MJ12bot/v1.0.8 as described above. We will keep this list, however, to show our good faith towards those who have been hit by this bot: we will add your IPs to this public list to demonstrate that we have nothing to do with those people, whoever they are.

If you are in doubt whether the bot that crawled your site is genuine, please use the contact information at the bottom of this page to tell us about it, and we will tell you whether it is a genuine or fake MJ12bot. Once again, to reiterate: we don't know who fakes our user-agent or for what purposes, but you can be sure that it is not us.

Here is how to distinguish those fake bots:

  1. Version too old: v1.0.8. The current bot version is v1.2.4 (v1.2.3 is also valid until 1 Jun 2009). If you see v1.0.8 of the bot, it is fake; please tell us its IP, though, as we want to add it to the list of fake bot IPs above!
  2. It does not retrieve robots.txt immediately before crawling URLs (or within the previous 24 hours), and it does not obey it.
  3. Its "Accept" header is */* (the genuine bot normally sends "text/html,text/plain,text/xml,text/*,application/xml,application/xhtml+xml"). Thanks to Borg for this information.
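The three signs above can be combined into a rough checklist. The Python sketch below is illustrative only: `fake_signs` is a hypothetical helper, its version set is taken from the list of current versions further down this page and will go out of date, and deciding whether robots.txt was fetched recently (from your own logs) is left to the caller. When in doubt, email us the log entries rather than relying on a script.

```python
import re
from typing import List, Optional

# Versions listed as legitimate on this page; update as new releases appear.
LEGIT_VERSIONS = {"1.3.0", "1.2.5", "1.2.4"}
VERSION_RE = re.compile(r"MJ12bot/v?(\d+\.\d+\.\d+)", re.IGNORECASE)


def fake_signs(user_agent: str, accept: Optional[str],
               fetched_robots_recently: bool) -> List[str]:
    """Return the warning signs from the list above that a request shows."""
    signs = []
    m = VERSION_RE.search(user_agent)
    if m and m.group(1) not in LEGIT_VERSIONS:          # sign 1: old version
        signs.append(f"unlisted version {m.group(1)}")
    if not fetched_robots_recently:                     # sign 2: no robots.txt
        signs.append("no recent robots.txt fetch")
    if accept is not None and accept.strip() == "*/*":  # sign 3: Accept header
        signs.append("Accept: */*")
    return signs
```

An empty list does not prove a bot is genuine; it only means none of these particular signs showed up.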

If you have any scripts that check for user-agents then you can safely ban any MJ12bot that claims to be v1.0.8 - this old version that is not in use now is definately a fake. But please consider not banning whole of MJ12bot in robots.txt - it won't save you from fake bots that ignore robots.txt.

What are the current versions of MJ12bot?

Current legit versions of MJ12bot are:

  1. v1.3.0 (in beta; will replace the old versions from 15 Sep 2009)
  2. v1.2.5
  3. v1.2.4

If you are not satisfied with the information above, feel free to contact us at bot@majestic12.co.uk. Alternatively (if you don't get a reply within 24 hours, which could be due to a spam filter wrongly catching your email), feel free to post in our forum's bug section, where you don't even need to register: here.


Copyright © Majestic-12