Request.Browser.Crawler

In my previous post about exception logging, I showed how to log several parameters related to an exception in the database. Request.Browser.Crawler is one of them, and it's used to detect search engine crawlers. It warrants its own entry because it needs a bit of extra setup in web.config to work correctly.

You’ll have to add the following code inside the <system.web> section of your web.config file:

<!-- This section is used by Request.Browser.Crawler property to detect search engine crawlers -->
<browserCaps>
  <filter>
    <!-- SEARCH ENGINES GROUP -->
    <!-- check Google (Yahoo uses this as well) -->
    <case match="^Googlebot(\-Image)?/(?'version'(?'major'\d+)(?'minor'\.\d+)).*">
      browser=Google
      version=${version}
      majorversion=${major}
      minorversion=${minor}
      crawler=true
    </case>
    <!-- check Alta Vista (Scooter) -->
    <case match="^Scooter(/|-)(?'version'(?'major'\d+)(?'minor'\.\d+)).*">
      browser=AltaVista
      version=${version}
      majorversion=${major}
      minorversion=${minor}
      crawler=true
    </case>
    <!-- check Alta Vista (Mercator) -->
    <case match="Mercator">
      browser=AltaVista
      crawler=true
    </case>
    <!-- check Slurp (Yahoo uses this as well) -->
    <case match="Slurp">
      browser=Slurp
      crawler=true
    </case>
    <!-- check MSN -->
    <case match="MSNBOT">
      browser=MSN
      crawler=true
    </case>
    <!-- check Northern Light -->
    <case match="^Gulliver/(?'version'(?'major'\d+)(?'minor'\.\d+)).*">
      browser=NorthernLight
      version=${version}
      majorversion=${major}
      minorversion=${minor}
      crawler=true
    </case>
    <!-- check Excite -->
    <case match="ArchitextSpider">
      browser=Excite
      crawler=true
    </case>
    <!-- Lycos -->
    <case match="Lycos_Spider">
      browser=Lycos
      crawler=true
    </case>
    <!-- Ask Jeeves -->
    <case match="Ask Jeeves">
      browser=AskJeeves
      crawler=true
    </case>
    <!-- check Fast -->
    <case match="^FAST-WebCrawler/(?'version'(?'major'\d+)(?'minor'\.\d+)).*">
      browser=Fast
      version=${version}
      majorversion=${major}
      minorversion=${minor}
      crawler=true
    </case>
    <!-- IBM Research Web Crawler -->
    <case match="http\:\/\/www\.almaden\.ibm\.com\/cs\/crawler">
      browser=IBMResearchWebCrawler
      crawler=true
    </case>
  </filter>
</browserCaps>
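
For orientation, here is a minimal sketch of where that block sits in a web.config file. It assumes a standard ASP.NET configuration layout and leaves out every other setting your file will normally contain, so treat it as a placement guide rather than a complete file.

```xml
<configuration>
  <system.web>
    <!-- browserCaps is a child of system.web -->
    <browserCaps>
      <filter>
        <!-- the crawler <case> entries shown above go here -->
      </filter>
    </browserCaps>
  </system.web>
</configuration>
```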

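Now what does it all mean? Well, ASP.NET (not IIS itself) uses the information in the <browserCaps> section of your config file to decide whether the client is a crawler. If you look at it closely, it's basically a set of regular expression filters matched against the user agent string. When a case matches, the properties listed inside it, including crawler=true, are applied to Request.Browser. I presume you could add more filters in the same format to detect other crawlers.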
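
As a sketch of what an extra filter might look like, the entry below follows the same pattern as the Googlebot case. The Bing entry is my own illustration and not part of the original list: it assumes the crawler identifies itself with a token such as bingbot/2.0 somewhere in its user agent string, and it uses the inline (?i) option so the match ignores case. Check the user agent strings in your own logs before relying on it.

```xml
<!-- hypothetical extra entry: place it inside the existing <filter> element -->
<!-- check Bing (assumes a "bingbot/<major>.<minor>" token in the user agent) -->
<case match="(?i)bingbot/(?'version'(?'major'\d+)(?'minor'\.\d+))">
  browser=Bing
  version=${version}
  majorversion=${major}
  minorversion=${minor}
  crawler=true
</case>
```

The pattern is not anchored with ^ because, under this assumption, the token appears in the middle of the user agent string rather than at the start.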

Update: For the most accurate and updated version of browserCaps and other useful browser testing/detection resources you can go to one of these sites:

http://slingfive.com/pages/code/browserCaps/

http://ocean.accesswa.net/browsercaps/

http://browsers.garykeith.com/downloads.asp
