I had an idea today for checking for robots using PHP and .htaccess that doesn't involve the User Agent string at all. Instead it uses sessions and exploits the fact that all well-behaved robots will request robots.txt before they request anything else.
I start by making sure my usual robots.txt file is in place. Then I upload a robots.php file to the same location as robots.txt with a tiny bit of code in it.
session_start();
$_SESSION['robot'] = 1;
All that does is start a session, set a $_SESSION['robot'] variable that I can check in subsequent scripts, then return the contents of robots.txt.
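Put together, the whole robots.php could look something like this minimal sketch; the text/plain header and the readfile() call are one reasonable way to return the file's contents, not necessarily the only one:

<?php
// robots.php - flag this session as a robot, then serve the real robots.txt.
session_start();                     // start (or resume) the session
$_SESSION['robot'] = 1;              // mark this session as belonging to a robot

header('Content-Type: text/plain');  // robots.txt is served as plain text
readfile('robots.txt');              // return the contents of robots.txt unchanged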
In .htaccess I have the following rule, which transparently rewrites any request for robots.txt to robots.php, which in turn returns the contents of robots.txt:
RewriteEngine On
RewriteRule ^robots\.txt$ robots.php [L]
Now in my applications I can easily check for robots and skip things like advertisement banners to speed up page loads, since spiders don't look at advertisements anyway.
session_start();
echo isset($_SESSION['robot']) ? 'ROBOT !!' : 'Not a robot';
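For example, a page template could wrap its banner markup in that check. This is just a sketch of the pattern; the is_robot() helper and the banner markup are hypothetical names I've made up for illustration:

<?php
session_start();

// Hypothetical helper: true when this session requested robots.txt first.
function is_robot(): bool
{
    return isset($_SESSION['robot']);
}

// Only render the banner for human visitors.
if (!is_robot()) {
    echo '<div class="banner"><img src="/ads/banner.png" alt="Ad"></div>';
}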
Here are some of the benefits of doing it this way.
- I can continue to modify robots.txt as I normally would
- I don't need to keep up with changing User Agent strings
- Checking for the existence of a session variable is quicker than running pattern matches against the User Agent string (see the sketch below)
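To make that last point concrete, here is a rough sketch of the two approaches side by side. The user-agent regex is a deliberately incomplete assumption on my part, which is exactly the maintenance burden this trick avoids:

<?php
session_start();

// Session approach: a single array lookup, no bot list to maintain.
$is_robot = isset($_SESSION['robot']);

// User Agent approach: a regex that must be updated as new bots appear.
// This pattern is only an illustrative assumption, not an exhaustive list.
$ua = $_SERVER['HTTP_USER_AGENT'] ?? '';
$is_robot_ua = (bool) preg_match('/googlebot|bingbot|slurp|crawler|spider/i', $ua);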