majestic 12 bot

Postby tron » Sun May 06, 2018 2:14 am

Users browsing this forum: Google [Bot], Majestic-12 [Bot] and 10 guests

anyone know what this bot is?
tron
Posts: 507
Joined: Fri Dec 08, 2006 6:34 pm

Re: majestic 12 bot

Postby Elvis » Sun May 06, 2018 4:59 am

https://www.majestic12.co.uk/projects/d ... j12bot.php

Majestic-12 : DSearch : MJ12bot

Email address for queries about the bot (if you are too busy to read the rest of the page): bot@majestic12.co.uk (we respond very quickly!)

You may have reached this page by clicking a link left by MJ12bot in your log files. Below are some of the most frequently asked questions about MJ12bot.

What is MJ12bot doing on my site(s)?

We spider the Web to build a search engine, using a fast and efficient downloadable distributed crawler that enables people with broadband connections to contribute to what we hope will become the biggest search engine in the world. Production of a full-text search engine at Majestic-12 is currently in the research phase, funded in part by the commercialisation of research at Majestic.
What happens with crawled data?

Crawled data (currently only the web graph of links) is added to the largest public backlinks search engine index, which we maintain as a dedicated tool called Site Explorer. All webmasters can obtain full, free backlink data by verifying ownership of their site and learn about their own backlinks from this extensive index.
Why do you keep crawling 404 or 301 pages?

We have a long memory and want to ensure that temporary errors, website-down pages or other temporary changes to sites do not cause irreparable changes to your site profile when they shouldn't. Also, if there are still links to these pages, they will continue to be found and followed. Google have published a statement on this, since they are asked the same question; their reasoning is of course the same as ours, and their answer can be found here: Google 404 policy
You are crawling links with rel=nofollow

This is a common misunderstanding of the (perhaps poorly named) nofollow attribute. Google introduced the rel=nofollow attribute in 2005, stating that links so marked would not influence the target's PageRank. It does not stop the crawler from visiting the target page; this becomes particularly obvious when the target page has several links to it, some with this attribute and some without. If you wish to stop bots from crawling a page, use the robots.txt file to disallow the target page.
More information on rel=nofollow can be found here: Wikipedia Nofollow
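To illustrate the distinction, here is a minimal Python sketch (not Majestic's or Google's code) built on the standard library's html.parser. Every href on a page is a crawl candidate whether or not it carries rel=nofollow; the nofollow flag only records whether a given link is meant to pass ranking credit. The names `LinkCollector` and `crawl_targets` are hypothetical.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect <a href> targets and whether each link carries rel="nofollow"."""

    def __init__(self):
        super().__init__()
        self.links = []  # list of (href, is_nofollow) in document order

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr_map = dict(attrs)
        href = attr_map.get("href")
        if href is None:
            return  # anchors without an href are not links
        # rel may hold several space-separated tokens, or be valueless (None)
        rel_tokens = (attr_map.get("rel") or "").lower().split()
        self.links.append((href, "nofollow" in rel_tokens))


def crawl_targets(html):
    """Map every linked URL to whether at least one link to it lacks nofollow.

    All keys are crawl candidates; the boolean is only about ranking credit.
    """
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    targets = {}
    for href, nofollow in collector.links:
        targets[href] = targets.get(href, False) or not nofollow
    return targets


page = '<a href="/a" rel="nofollow">x</a> <a href="/a">y</a> <a href="/b" rel="nofollow">z</a>'
print(crawl_targets(page))
```

Both "/a" and "/b" end up as crawl targets; only the flag differs, which is why robots.txt, not nofollow, is the tool for keeping a crawler away from a page.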
How can I block MJ12bot?

MJ12bot adheres to the robots.txt standard. If you want to prevent the bot from crawling your website, add the following text to your robots.txt:

User-agent: MJ12bot
Disallow: /

Please do not waste your time trying to block the bot by IP in .htaccess: we do not use any consecutive IP blocks, so your efforts will be in vain. Also, please make sure the bot can actually retrieve robots.txt itself; if it can't, it will assume (this is the industry practice) that it's okay to crawl your site.
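To check how a standards-following parser reads the block rule above, you can use Python's standard-library urllib.robotparser. This is a minimal sketch, not MJ12bot's own parser; `is_allowed` is a hypothetical helper and example.com is a placeholder domain.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt: str, agent: str, url: str) -> bool:
    """Return True if `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    # parse() also records that the rules have been loaded, which can_fetch() requires
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# The rule recommended above: block MJ12bot from the whole site.
ROBOTS_TXT = "User-agent: MJ12bot\nDisallow: /\n"

for agent in ("MJ12bot", "Googlebot"):
    print(agent, is_allowed(ROBOTS_TXT, agent, "https://example.com/page.html"))
```

Only the MJ12bot group is blocked; other crawlers with no matching group (and no `User-agent: *` group) remain allowed.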

If you have reason to believe that MJ12bot did NOT obey your robots.txt commands, please let us know via email: bot@majestic12.co.uk. Please provide the URL of your website and log entries showing the bot trying to retrieve pages that it was not supposed to.
What non-standard features of robots.txt does MJ12bot support?

Our current crawler supports the following non-standard extensions to robots.txt:

Crawl-Delay of up to 20 seconds (higher values will be rounded down to the maximum our bot supports)
Redirects (within the same site) when trying to fetch robots.txt
Simple pattern matching in Disallow directives compatible with Yahoo's wildcard specification
Allow directives can override Disallow if they are more specific (longer in length)
Certain failures to fetch robots.txt, such as 403 Forbidden, will be treated as a blanket Disallow directive
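The wildcard and Allow-precedence rules in the list above can be sketched as follows. This is a hypothetical Python re-implementation, not Majestic-12's code. It assumes '*' matches any run of characters and a trailing '$' anchors the end of the path (the common wildcard convention), applies the longest-match rule for Allow versus Disallow, and assumes ties go to Allow, which the page does not specify. Python's built-in urllib.robotparser is not used here because it does plain prefix matching in file order.

```python
from __future__ import annotations

import re


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern into a compiled regex.

    '*' matches any run of characters; a trailing '$' anchors the end of the path.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_path_allowed(path: str, rules: list[tuple[str, str]]) -> bool:
    """Decide access from (directive, pattern) pairs for one user-agent group.

    The longest matching pattern wins, so a longer Allow overrides a shorter
    Disallow. Ties resolve to Allow here (an assumption, not documented).
    """
    best_length = -1
    allowed = True  # no matching rule: the path may be crawled
    for directive, pattern in rules:
        if not pattern:
            continue  # an empty Disallow blocks nothing
        if pattern_to_regex(pattern).match(path):  # match() anchors at the start
            length = len(pattern)
            is_allow = directive.lower() == "allow"
            if length > best_length or (length == best_length and is_allow):
                best_length = length
                allowed = is_allow
    return allowed


# Hypothetical rules for one "User-agent: MJ12bot" group.
RULES = [
    ("Disallow", "/private/"),
    ("Allow", "/private/press-kit.html"),
    ("Disallow", "/*.pdf$"),
]

for path in ("/private/notes.html", "/private/press-kit.html", "/docs/report.pdf", "/index.html"):
    print(path, is_path_allowed(path, RULES))
```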

Why did a robots.txt block not work on MJ12bot?

We are keen to see any reports of potential violations of robots.txt by MJ12bot.

A number of reported violations turn out to be false positives; the following can serve as a useful checklist when configuring a web server:

Off-site redirects when requesting robots.txt: MJ12Bot follows redirects, but only within the same domain. Ideally, robots.txt should be available at "/robots.txt", as specified in the standard.
Multiple domains running on the same server: modern web servers such as Apache can log accesses for several domains to a single file, which can make it hard to tell which site was accessed at which point. You may wish to consider adding domain information to the access log, or splitting access logs on a per-domain basis.
robots.txt out of sync with the developer copy: we have had complaints that MJ12Bot disobeyed robots.txt, only to find that the developer was testing against a development server that was not in sync with the live version.
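Separating MJ12bot requests in a shared access log by domain, as suggested in the checklist, can be sketched in Python. This assumes, hypothetically, that each log line begins with the virtual-host name (optionally followed by ":port"), as in an Apache LogFormat that starts with %v; the function name and sample lines are made up for illustration, so adjust the parsing to your own log format.

```python
from collections import defaultdict


def mj12bot_hits_by_host(log_lines):
    """Group MJ12bot log entries by the virtual host at the start of each line.

    Hypothetical log layout: "<host>[:<port>] <rest of a combined-format entry>".
    """
    hits = defaultdict(list)
    for line in log_lines:
        if "MJ12bot" not in line:
            continue  # keep only lines that mention MJ12bot
        host_field, _, rest = line.partition(" ")
        host = host_field.split(":", 1)[0]  # drop the optional ":port" suffix
        hits[host].append(rest.strip())
    return dict(hits)


# Made-up sample lines using documentation IP ranges.
SAMPLE_LOG = [
    'www.example.com:443 203.0.113.5 - - [06/May/2018:02:14:00 +0000] "GET /robots.txt HTTP/1.1" 200 120 "-" "MJ12bot/v1.4.5"',
    'blog.example.org:80 203.0.113.5 - - [06/May/2018:02:14:05 +0000] "GET /private/ HTTP/1.1" 200 512 "-" "MJ12bot/v1.4.5"',
    'www.example.com:443 198.51.100.7 - - [06/May/2018:02:15:00 +0000] "GET / HTTP/1.1" 200 900 "-" "Mozilla/5.0"',
]

for host, entries in mj12bot_hits_by_host(SAMPLE_LOG).items():
    print(host, len(entries))
```

Checking each domain's entries against that domain's own robots.txt avoids mistaking a request to one site for a violation of another site's rules.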

Historically, there was a period when the MJ12Bot User-Agent was spoofed. Bad bots often use spoofed user agents, since user-agent strings are easily faked. The discussion regarding the fake V1.08 MJ12Bot is archived here. Majestic-12 is therefore interested in hearing about any reported robots.txt violations. To check whether a given MJ12bot is ours, we need log entries showing the bot's IP address, its request for robots.txt, and the subsequent requests you believe are in violation.
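Pulling that evidence out of a log can be sketched in Python with the standard library's urllib.robotparser. This is a hypothetical helper, not a Majestic-12 tool: it assumes you have already reduced your log to (ip, path) pairs in request order for the suspect user agent, and it only flags requests made after that IP fetched /robots.txt.

```python
from urllib.robotparser import RobotFileParser


def find_violations(robots_txt, requests, agent="MJ12bot"):
    """Return (ip, path) requests that robots.txt disallows for `agent`.

    `requests` is a hypothetical list of (ip, path) pairs in log order, already
    filtered to the bot's user agent. Only requests made after that IP fetched
    /robots.txt are checked, since those are the ones worth reporting.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    ips_that_read_robots = set()
    violations = []
    for ip, path in requests:
        if path == "/robots.txt":
            ips_that_read_robots.add(ip)
        elif ip in ips_that_read_robots and not parser.can_fetch(agent, path):
            violations.append((ip, path))
    return violations


ROBOTS_TXT = "User-agent: MJ12bot\nDisallow: /private/\n"
REQUESTS = [
    ("203.0.113.5", "/robots.txt"),
    ("203.0.113.5", "/index.html"),
    ("203.0.113.5", "/private/report.html"),
    ("198.51.100.7", "/private/other.html"),  # this IP never fetched robots.txt
]
print(find_violations(ROBOTS_TXT, REQUESTS))
```

Requests from an IP that never asked for robots.txt are left out here; they may still be worth reporting, since a genuine bot should fetch robots.txt first.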
How can I slow down MJ12bot?

You can easily slow down the bot by adding the following to your robots.txt file:

User-Agent: MJ12bot
Crawl-Delay: 5

Crawl-Delay should be an integer and specifies the number of seconds to wait between requests. MJ12bot will wait up to 20 seconds between requests to your site; note, however, that while unlikely, it is still possible for your site to be crawled by multiple MJ12bots at the same time. Setting a high Crawl-Delay should minimise the impact on your site. The Crawl-Delay parameter also takes effect if it was set for the * wildcard user agent.
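The delay rules described above (an integer number of seconds, a 20-second ceiling, and fallback to the * group) can be approximated with Python's urllib.robotparser, whose crawl_delay() method (Python 3.6+) returns the value for the matching group or, failing that, the * group. This is a sketch under those assumptions, not MJ12bot's implementation; `effective_delay` is a hypothetical name.

```python
from __future__ import annotations

from urllib.robotparser import RobotFileParser

MAX_CRAWL_DELAY = 20  # seconds: the upper limit MJ12bot says it honours


def effective_delay(robots_txt: str, agent: str = "MJ12bot") -> int | None:
    """Return the Crawl-Delay that applies to `agent`, capped at MAX_CRAWL_DELAY.

    crawl_delay() checks the agent's own group first and falls back to the
    "User-agent: *" group; non-integer values are ignored by the parser.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    delay = parser.crawl_delay(agent)
    if delay is None:
        return None  # no applicable Crawl-Delay line
    return min(delay, MAX_CRAWL_DELAY)


print(effective_delay("User-Agent: MJ12bot\nCrawl-Delay: 5\n"))
print(effective_delay("User-agent: *\nCrawl-delay: 60\n"))
```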

If our bot detects that you have set a Crawl-Delay for any other bot, it will automatically crawl more slowly, even though MJ12bot itself was not specifically asked to do so.
What are the current versions of MJ12bot?

Current operating versions of MJ12bot are:

v1.4.x series: most common are v1.4.5 (new as of April 2014) and v1.4.4 (to be phased out by the end of May 2014)

If the information above has not answered your question, feel free to contact us: bot@majestic12.co.uk

Copyright © Majestic-12

“The purpose of studying economics is not to acquire a set of ready-made answers to economic questions, but to learn how to avoid being deceived by economists.” ― Joan Robinson
Elvis
Posts: 7413
Joined: Fri Apr 11, 2008 7:24 pm

Re: majestic 12 bot

Postby Elvis » Sun May 06, 2018 5:15 am

https://majestic.com/

The planet's largest Link Index database

Re: majestic 12 bot

Postby elfismiles » Sun May 06, 2018 8:45 am

There is a search function on this site. :eeyaa just sayin.

There's a Majestic-12 bot?
Post by Stephen Morgan » 22 Nov 2011
viewtopic.php?f=8&t=33576

:coolshades

elfismiles » 28 Nov 2011 08:31 wrote: WHOA!!! Déjà vu ...

FLASHBACK: Majestic-12 [bot]
viewtopic.php?f=8&t=28887
elfismiles
Posts: 8511
Joined: Fri Aug 11, 2006 6:46 pm

Re: majestic 12 bot

Postby seemslikeadream » Sun May 06, 2018 11:02 am

:)

In total there are 112 users online :: 2 registered, 2 hidden and 108 guests (based on users active over the past 5 minutes)
Mazars and Deutsche Bank could have ended this nightmare before it started.
They could still get him out of office.
But instead, they want mass death.
Don’t forget that.
seemslikeadream
Posts: 32090
Joined: Wed Apr 27, 2005 11:28 pm
Location: into the black

