Wednesday, January 26, 2011

Simple URL Shortening Script

If you're running Apache and are able to use the mod_asis module, it's very easy to set up a simple and efficient URL shortener. This URL shortening script doesn't require a database, and it doesn't add the overhead of something like PHP to every single redirect.

If you read the documentation for mod_asis, the fundamentals of how this works may be immediately obvious. If not: mod_asis lets you keep a sort of static response cache, much like a static HTML cache but with the HTTP headers included.

Assuming you have your URL shortening domain set up and a nice empty DocumentRoot for your new service, the first thing to do is create a directory named "stubs". If you want a UI for creating new shortened URLs, set the permissions on stubs so that PHP can write to it, but nobody else can.

Within the stubs directory, create an .htaccess file (ideally you'd use Directory or Location containers in your VirtualHost, but for the example's sake I'm using .htaccess).
In that .htaccess file, add the following line.

SetHandler send-as-is


That line forces every requested file in the directory through mod_asis. As the mod_asis documentation explains, other than adding Date and Server headers to the response, mod_asis sends the file to the visitor as-is.

So if I put a file named "abc123" in that directory with the following contents, the server answers with a 302 redirect pointing to google.com.

NOTE: There must be two newlines after the Location line; the resulting empty line signals the end of the HTTP headers. This is important.

Status: 302
Location: http://www.google.com/



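If you'd rather generate stub files from code than type them by hand, the format is easy to produce. Below is a minimal PHP sketch, assuming PHP 7 or later. The name build_stub() is a hypothetical helper, not something mod_asis or PHP provides. The sketch also refuses URLs that contain line breaks, since those would let a URL inject extra headers into the stub.

```php
<?php
// Sketch: build the body of a mod_asis stub that issues a 302 redirect.
// build_stub() is a hypothetical helper name used only for illustration.
function build_stub(string $target): string
{
    // A CR or LF inside the URL could inject extra headers, so refuse it.
    if (strpbrk($target, "\r\n") !== false) {
        throw new InvalidArgumentException('URL must not contain line breaks');
    }
    // "Status" sets the response code; the empty line after the last
    // header ends the header block.
    return "Status: 302\nLocation: {$target}\n\n";
}

echo build_stub('http://www.google.com/');
```

Writing the returned string to a file such as stubs/abc123 gives the same result as the hand-made file above.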
So at this point I could distribute the shortened URL "http://domain/stubs/abc123" and it would redirect visitors to google.com. This isn't all that nice though, since having "stubs" in the URL sort of defeats the purpose of a shortened URL.

That's why, in the parent directory of stubs (the DocumentRoot), I add the following to the .htaccess file.

RewriteEngine on
RewriteBase /
RewriteRule ^([a-f\d]{1,8})$ stubs/$1 [L]


Now I can distribute the shortened URL as "http://domain/abc123" and it will redirect to google.com. The version with stubs in it still works too, if I ever wanted to use it.
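The rewrite pattern and the short codes fit together because a CRC32 value printed with sprintf('%x', ...) is always 1 to 8 lowercase hex digits. This holds on 64-bit PHP, where crc32() never returns a negative number. The sketch below uses hypothetical helper names and checks paths against the same character class.

It anchors the pattern with ^ so that only a bare code such as "abc123" matches. In an .htaccess file, mod_rewrite tests the path with the directory prefix removed, so "abc123" is what the rule sees for http://domain/abc123.

```php
<?php
// Sketch: mirror the RewriteRule pattern in PHP. The ^ anchor means only
// a bare short code matches, not something like "stubs/abc123".
const SHORT_CODE_PATTERN = '#^([a-f\d]{1,8})$#';

// Hypothetical helper: the short code is the URL's CRC32 in lowercase hex
// (1 to 8 characters on 64-bit PHP).
function short_code(string $url): string
{
    return sprintf('%x', crc32($url));
}

// Hypothetical helper: would this per-directory path be rewritten to stubs/?
function is_short_code_path(string $path): bool
{
    return preg_match(SHORT_CODE_PATTERN, $path) === 1;
}

var_dump(is_short_code_path(short_code('http://www.google.com/'))); // bool(true)
var_dump(is_short_code_path('index.php'));                          // bool(false)
```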

At this point I have a small, simple, and efficient URL shortener. However, I'd have to go into the stubs directory manually and add a new file every time I want to shorten a URL.

To automate that, I use the following simple PHP script, which has a UI that can shorten URLs in bulk.

<!doctype html>  
<html lang="en">
<head>
<meta charset="utf-8">
<meta http-equiv="X-UA-Compatible" content="IE=edge,chrome=1">
<title>My URL Shortener</title>
</head>

<body>
<div id="container">
<header>
<h1>My URL Shortener</h1>
<p>Enter as many URLs as you want, one per line.</p>
</header>
<div id="main">
<?php
/*
Contains backwards compatibility code;
If you comment this include() out and don't get any errors, it's safe to leave it out.
*/
include('./lib.php');

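if( ! empty($_POST['u']))
{
$url_list = '';
// One URL per line; blank lines are skipped
foreach(preg_split('#[\r\n\f]+#', $_POST['u'], -1, PREG_SPLIT_NO_EMPTY) as $_url)
{
$url = parse_url($_url);
// Only accept URLs that rebuild to exactly the same string
if($url && http_build_url('', $url) == $_url)
{
// The hex CRC32 of the URL (1-8 characters) is the short code
$crc = sprintf('%x', crc32($_url));
if( ! file_exists("./stubs/{$crc}"))
{
file_put_contents("./stubs/{$crc}", "Status: 302\nLocation: {$_url}\n\n\n");
}
$short = sprintf('http://%s%s%s',
$_SERVER['HTTP_HOST'],
str_replace('//', '/', dirname($_SERVER['REQUEST_URI']) . '/'),
$crc
);
// Collect the list items so they end up inside the <ul>, escaped for HTML
$url_list .= sprintf('<li><a href="%1$s">%1$s</a> » %2$s</li>',
htmlspecialchars($short),
htmlspecialchars($_url)
);
}
}
if($url_list !== '')
{
echo '<ul>', $url_list, '</ul>';
}
}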
?>
<form action="index.php" method="post">
<fieldset>
<legend>URL</legend>
<textarea name="u" id="u" rows="10" style="width:400px;"></textarea>
<p><input type="submit" name="s" id="s" value="Shorten!"/></p>
</fieldset>
</form>
</div>
<footer>© Me; 2011</footer>
</div>
</body>
</html>
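The http_build_url() comparison in the script is a round-trip check. A line is accepted only if parsing it with parse_url() and rebuilding it gives back the identical string.

Here is a self-contained PHP sketch of that idea. The helper rebuild_url() is a hypothetical, cut-down stand-in for http_build_url('', $parts). The http/https scheme check is an extra assumption that the script above does not make. Without it, a scheme-less string like "not a url" survives the round trip.

```php
<?php
// Hypothetical, cut-down stand-in for http_build_url('', $parts).
function rebuild_url(array $p): string
{
    $auth = '';
    if (isset($p['user'])) {
        $auth = $p['user'] . (isset($p['pass']) ? ':' . $p['pass'] : '') . '@';
    }
    return (isset($p['scheme']) ? $p['scheme'] . '://' : '')
        . $auth
        . ($p['host'] ?? '')
        . (isset($p['port']) ? ':' . $p['port'] : '')
        . ($p['path'] ?? '')
        . (isset($p['query']) ? '?' . $p['query'] : '')
        . (isset($p['fragment']) ? '#' . $p['fragment'] : '');
}

function is_acceptable_url(string $line): bool
{
    $parts = parse_url($line);
    if ($parts === false) {
        return false;
    }
    // Extra assumption (not in the original script): allow only web URLs.
    if (!in_array($parts['scheme'] ?? '', ['http', 'https'], true)) {
        return false;
    }
    // Round trip: rebuilding the parsed parts must give back the same string.
    return rebuild_url($parts) === $line;
}

var_dump(is_acceptable_url('http://www.google.com/')); // bool(true)
var_dump(is_acceptable_url('javascript:alert(1)'));    // bool(false)
```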


Also, because the PHP function "http_build_url" comes from the PECL http extension rather than PHP core, I include the following fallback in "lib.php".

<?php
if (!function_exists('http_build_url'))
{
define('HTTP_URL_REPLACE', 1); // Replace every part of the first URL when there's one of the second URL
define('HTTP_URL_JOIN_PATH', 2); // Join relative paths
define('HTTP_URL_JOIN_QUERY', 4); // Join query strings
define('HTTP_URL_STRIP_USER', 8); // Strip any user authentication information
define('HTTP_URL_STRIP_PASS', 16); // Strip any password authentication information
define('HTTP_URL_STRIP_AUTH', 32); // Strip any authentication information
define('HTTP_URL_STRIP_PORT', 64); // Strip explicit port numbers
define('HTTP_URL_STRIP_PATH', 128); // Strip complete path
define('HTTP_URL_STRIP_QUERY', 256); // Strip query string
define('HTTP_URL_STRIP_FRAGMENT', 512); // Strip any fragments (#identifier)
define('HTTP_URL_STRIP_ALL', 1024); // Strip anything but scheme and host

// Build a URL
// The parts of the second URL are merged into the first according to the flags argument.
//
// @param mixed (Part(s) of) a URL, as a string or an associative array like parse_url() returns
// @param mixed Same as the first argument
// @param int A bitmask of OR'ed HTTP_URL constants (optional; HTTP_URL_REPLACE is the default)
// @param array If set, it is filled with the parts of the composed URL, like parse_url() would return
function http_build_url($url, $parts=array(), $flags=HTTP_URL_REPLACE, &$new_url=false)
{
$keys = array('user','pass','port','path','query','fragment');

// HTTP_URL_STRIP_ALL becomes all the HTTP_URL_STRIP_Xs
if ($flags & HTTP_URL_STRIP_ALL)
{
$flags |= HTTP_URL_STRIP_USER;
$flags |= HTTP_URL_STRIP_PASS;
$flags |= HTTP_URL_STRIP_PORT;
$flags |= HTTP_URL_STRIP_PATH;
$flags |= HTTP_URL_STRIP_QUERY;
$flags |= HTTP_URL_STRIP_FRAGMENT;
}
// HTTP_URL_STRIP_AUTH becomes HTTP_URL_STRIP_USER and HTTP_URL_STRIP_PASS
else if ($flags & HTTP_URL_STRIP_AUTH)
{
$flags |= HTTP_URL_STRIP_USER;
$flags |= HTTP_URL_STRIP_PASS;
}

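// Parse the original URL (arrays like parse_url() returns are used as-is)
$parse_url = is_array($url) ? $url : parse_url($url);
if (!is_array($parts))
$parts = parse_url($parts);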

// Scheme and Host are always replaced
if (isset($parts['scheme']))
$parse_url['scheme'] = $parts['scheme'];
if (isset($parts['host']))
$parse_url['host'] = $parts['host'];

// (If applicable) Replace the original URL with its new parts
if ($flags & HTTP_URL_REPLACE)
{
foreach ($keys as $key)
{
if (isset($parts[$key]))
$parse_url[$key] = $parts[$key];
}
}
else
{
// Join the original URL path with the new path
if (isset($parts['path']) && ($flags & HTTP_URL_JOIN_PATH))
{
if (isset($parse_url['path']))
$parse_url['path'] = rtrim(str_replace(basename($parse_url['path']), '', $parse_url['path']), '/') . '/' . ltrim($parts['path'], '/');
else
$parse_url['path'] = $parts['path'];
}

// Join the original query string with the new query string
if (isset($parts['query']) && ($flags & HTTP_URL_JOIN_QUERY))
{
if (isset($parse_url['query']))
$parse_url['query'] .= '&' . $parts['query'];
else
$parse_url['query'] = $parts['query'];
}
}

// Strips all the applicable sections of the URL
// Note: Scheme and Host are never stripped
foreach ($keys as $key)
{
if ($flags & (int)constant('HTTP_URL_STRIP_' . strtoupper($key)))
unset($parse_url[$key]);
}


$new_url = $parse_url;

return
((isset($parse_url['scheme'])) ? $parse_url['scheme'] . '://' : '')
.((isset($parse_url['user'])) ? $parse_url['user'] . ((isset($parse_url['pass'])) ? ':' . $parse_url['pass'] : '') .'@' : '')
.((isset($parse_url['host'])) ? $parse_url['host'] : '')
.((isset($parse_url['port'])) ? ':' . $parse_url['port'] : '')
.((isset($parse_url['path'])) ? $parse_url['path'] : '')
.((isset($parse_url['query'])) ? '?' . $parse_url['query'] : '')
.((isset($parse_url['fragment'])) ? '#' . $parse_url['fragment'] : '')
;
}
}
?>


That gives me a simple textarea where I can enter a list of URLs. The script writes a stub for each one to the stubs directory and returns a list of shortened URLs.
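One thing to watch with CRC32 codes is that two different URLs can hash to the same code. Because the script only checks file_exists(), the second URL would silently get the first one's redirect.

A minimal PHP sketch of a collision-aware writer follows. The names write_stub() and stub_target() are hypothetical helpers. Returning null on a collision, rather than trying another code, is just one possible choice.

```php
<?php
// Hypothetical helper: return the Location target stored in a stub file.
function stub_target(string $file): ?string
{
    $lines = file($file, FILE_IGNORE_NEW_LINES);
    if ($lines === false) {
        return null;
    }
    foreach ($lines as $line) {
        if (strncmp($line, 'Location: ', 10) === 0) {
            return substr($line, 10);
        }
    }
    return null;
}

// Hypothetical helper: write a stub for $url, or detect a CRC32 collision.
// Returns the short code, or null if a different URL already owns it.
function write_stub(string $dir, string $url): ?string
{
    $code = sprintf('%x', crc32($url));
    $file = $dir . '/' . $code;

    if (file_exists($file)) {
        // Same URL already shortened: reuse the code. Different URL: collision.
        return stub_target($file) === $url ? $code : null;
    }
    file_put_contents($file, "Status: 302\nLocation: {$url}\n\n");
    return $code;
}
```

Comparing only the Location line, rather than the whole file, means stubs written with an extra trailing newline are still recognized.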

Monday, January 24, 2011

Google Safe Browsing Wordpress Dashboard Module

I wrote a small dashboard module for Wordpress this morning that automatically fetches the Google Safe Browsing report for the current domain and displays the results on the dashboard. This is a lot more convenient than periodically visiting the Safe Browsing page yourself if you aren't using a browser with the reporting built in.

The project is named wp-google-safe-browsing-dashboard, and the plugin is available for download at Google Code. It's a nice simple plugin: just upload the zip file using your Wordpress plugin manager, activate it, and you're good to go!

Please, no applause, just throw money.

Saturday, January 22, 2011

$18.50 Average Adsense Page RPM

I've been experimenting with a sort of "bare bones" layout with articles for a few weeks now. I have a half-dozen websites using the layout and the average Adsense Page RPM is $18.50 USD. To answer the obvious question: no, these aren't articles that use ridiculously high paying no-chance-for-the-little-guy-to-compete-competition keywords either.

Basically, it's a 500-600 word article centered on the page with some navigation links on top, a descriptive heading, a paragraph of introduction text, a 728x90 advertisement, the entire article text, then some links to other articles and websites on the bottom.

There are no layout graphics at all in this layout, only the occasional article-relevant image within the article text. There are no sidebars; navigation is kept above the heading and down in the footer. I haven't been adding any links within the article text to related articles.

There are 3 sizes of black text: normal, h1, and h2. The background is solid white and the links are the default blue. The main heading is centered; the rest of the text is left-aligned within the centered 800-pixel column.

The websites use almost no bandwidth, and the CPU needed to generate the pages is minimal.

It will be interesting to see if this layout survives, it really does give new meaning to "content is king".
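For reference, page RPM is estimated earnings per 1,000 page views. An $18.50 page RPM therefore works out to about $185 for every 10,000 page views. A tiny PHP helper makes the arithmetic explicit. Its name is hypothetical, and the numbers below are illustrative, not reported data.

```php
<?php
// Page RPM: estimated earnings per 1,000 page views.
// page_rpm() is a hypothetical helper name.
function page_rpm(float $earnings, int $pageViews): float
{
    if ($pageViews <= 0) {
        throw new InvalidArgumentException('page views must be positive');
    }
    return $earnings / $pageViews * 1000;
}

printf("%.2f\n", page_rpm(185.00, 10000)); // 18.50
```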

Sunday, January 9, 2011

Text Ads or Image Ads

A question I see a lot is "Which is better with Adsense, text ads or image ads?" and the honest answer is, it depends.
Sometimes a layout will dictate whether you can use text ads or rich media ads. There are certain scenarios where using text ads would break the Adsense TOS whereas using image ads wouldn't.
Another thing to consider is which one would fit better with the placement. Sometimes using one type or the other just doesn't make any sense because it sticks out like a sore thumb rather than a piece of jewelry.

In any event, I've gathered my statistics for all of 2010 and listed the Adsense ad types from best to worst performing for each of the metrics Adsense tracks.

Page Views


  1. Image Ads
  2. Text Ads
  3. Flash Ads
  4. Animated Image Ads
  5. Rich Media Ads

CTR


  1. Animated Image Ads
  2. Text Ads
  3. Image Ads
  4. Flash Ads
  5. Rich Media Ads

CPC


  1. Rich Media Ads
  2. Flash Ads
  3. Image Ads
  4. Animated Image Ads
  5. Text Ads

Page RPM


  1. Rich Media Ads
  2. Image Ads
  3. Animated Image Ads
  4. Text Ads
  5. Flash Ads

Estimated Earnings


  1. Image Ads
  2. Rich Media Ads
  3. Text Ads
  4. Animated Image Ads
  5. Flash Ads

Thursday, January 6, 2011

Blocking Low Paying Adsense Categories

I was looking through the new Adsense interface today (the version with /v3/ in the URL) and found that the control over blocked ads is a lot better than I remember. Under the "Allow and Block Ads" tab there are two awesome options in the "Blocking Options" section of the sidebar: "General Categories" and "Sensitive Categories".

Sensitive Categories has things like dating, politics, and religion, whereas General Categories has everything else.

Both of these sections show a list of possible Adsense categories along with what percentage of the ads you've displayed came from each category, and how much of your Adsense revenue came from each category. Most of the categories also break down into multiple sub-categories.

As soon as I found it, I realized I had a couple of categories with bad impression-to-revenue relationships. For instance, one category accounted for 13% of impressions but only 3.8% of revenue. In comparison, other categories account for 6.9% of revenue on 3.3% of impressions and 4.5% of revenue on 2.5% of impressions.

So I went ahead and blocked the categories with horrible impression-to-revenue relationships. I figure that showing more ads from more profitable categories will translate into more money. Since Adsense has much improved reporting in V3, it will be easy to see how well my changes do.

There is a 50 category limit on the number of Adsense categories that can be blocked.
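The blocking decision above boils down to comparing each category's share of revenue with its share of impressions. The PHP sketch below assumes PHP 7.4 or later. It ranks categories by that ratio and returns the worst ones, capped at the 50-category limit.

The category names and the 0.5 cutoff are hypothetical, while the percentages are the ones quoted above. For example, 13% of impressions for 3.8% of revenue gives a ratio of about 0.29, versus about 2.09 and 1.8 for the other two categories.

```php
<?php
// Sketch: pick categories to block by revenue share / impression share.
const MAX_BLOCKED = 50; // Adsense's limit on blocked categories

function categories_to_block(array $stats, float $cutoff): array
{
    $ratios = [];
    foreach ($stats as $name => [$impressionPct, $revenuePct]) {
        if ($impressionPct <= 0) {
            continue; // no impressions, nothing to judge
        }
        // Below 1.0 means the category earns less than its share of impressions.
        $ratios[$name] = $revenuePct / $impressionPct;
    }
    asort($ratios); // worst ratio first, keys preserved
    $worst = array_filter($ratios, fn ($r) => $r < $cutoff);
    return array_slice(array_keys($worst), 0, MAX_BLOCKED);
}

// Hypothetical names; [impression %, revenue %] from the figures above.
$stats = [
    'Category A' => [13.0, 3.8],
    'Category B' => [3.3, 6.9],
    'Category C' => [2.5, 4.5],
];
echo implode(', ', categories_to_block($stats, 0.5)), "\n"; // Category A
```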