Wednesday, January 26, 2011

Simple URL Shortening Script

If you're running Apache and can use the mod_asis module, it's very easy to set up a simple and efficient URL shortener. It doesn't need any sort of database, and it doesn't add the overhead of something like PHP to every single redirect.

If you've read the mod_asis documentation, how this all works may already be obvious. If not: mod_asis sends a file to the client exactly as it is stored, including HTTP headers written at the top of the file. It's a lot like having a static HTML cache, except that each cached file also carries its own HTTP headers.

Assuming you have your URL shortening domain set up, with a nice empty DocumentRoot for the new service, the first thing to do is create a directory named "stubs". If you want some sort of UI for creating new shortened URLs, set the permissions on stubs so that PHP can write to it, but nobody else can.
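If you'd rather create the directory from PHP, here is a minimal sketch. It assumes PHP runs as the user that should own stubs (as it does under mod_php), and ensure_stub_dir() is a hypothetical helper name. Mode 0755 lets the owner write, while everyone else, including Apache if it runs as a different user, can only read the stub files.

```php
<?php
// Hypothetical helper: create the stubs directory so that only its owner
// (assumed to be the user PHP runs as) can write to it; others may only read.
function ensure_stub_dir(string $dir): bool
{
    if (!is_dir($dir)) {
        if (!mkdir($dir, 0755, true)) {
            return false;
        }
        // mkdir()'s mode is masked by the process umask, so set it explicitly.
        chmod($dir, 0755);
    }
    // The shortener UI only works if this process can create stub files here.
    return is_writable($dir);
}
```

Calling it once from the script that writes stubs, e.g. ensure_stub_dir(__DIR__ . '/stubs'), is enough.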

Within the stubs directory, create an .htaccess file (ideally you'd use a Directory or Location container in your VirtualHost instead, but for the sake of example I'm using .htaccess).
Add the following line to that .htaccess file:

SetHandler send-as-is


That line forces every requested file in the directory to be handled by mod_asis. As the mod_asis documentation explains, apart from adding a Date and a Server header to the response, mod_asis sends the file to the visitor as-is.

So, if I put a file named "abc123" in that directory with the following contents, requesting it gives me a 302 redirect from the server pointing to google.com:

NOTE: The Location line must be followed by an empty line, which signals the end of the HTTP headers. This is important.

Status: 302
Location: http://www.google.com/



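To see why that empty line matters, here is a rough model in PHP of how an as-is file is split into a status, headers and a body. It is an illustration, not mod_asis's actual implementation, and parse_asis() is a hypothetical name: everything before the first empty line is read as headers, a Status header sets the response code, and whatever follows is the body.

```php
<?php
// Rough model of reading an as-is file: header lines run up to the first
// empty line; everything after it is the response body.
function parse_asis(string $contents): array
{
    $parts = preg_split('/\r?\n\r?\n/', $contents, 2);
    $status = 200;
    $headers = [];
    foreach (preg_split('/\r?\n/', $parts[0]) as $line) {
        if (strpos($line, ':') === false) {
            continue; // not a header line; this sketch simply ignores it
        }
        [$name, $value] = array_map('trim', explode(':', $line, 2));
        if (strcasecmp($name, 'Status') === 0) {
            $status = (int) $value; // "302" and "302 Found" both give 302
        } else {
            $headers[$name] = $value;
        }
    }
    return ['status' => $status, 'headers' => $headers, 'body' => $parts[1] ?? ''];
}
```

Without the empty line, anything meant as the body is swallowed into the header section, which is why the note above insists on it.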
So at this point I could distribute the shortened URL "http://domain/stubs/abc123" and it would redirect visitors to google.com. That isn't very nice, though: having "stubs" in the URL sort of defeats the purpose of a shortened URL.

That's why I add the following to the .htaccess file in the parent directory of stubs (the DocumentRoot):

RewriteEngine on
RewriteBase /
RewriteRule ^([a-f\d]{1,8})$ stubs/$1 [L]


Now I can distribute my shortened URL as "http://domain/abc123" and it will redirect to google.com. The longer URL with "stubs" in it still works too, if I ever want to use it.
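mod_rewrite patterns are Perl-compatible regular expressions, so a pattern like this can be tried out in PHP with preg_match() before it goes into .htaccess. In the per-directory (.htaccess) context the leading slash is stripped, so the rule sees "abc123" rather than "/abc123". Below is a minimal sketch with stub_for_path() as a hypothetical helper. Its pattern is anchored with ^ so that only a bare short code matches; without the ^, a request such as image.gif would also match (capturing "f") and be rewritten to stubs/f.

```php
<?php
// Hypothetical helper mirroring the RewriteRule: map a request path, as
// mod_rewrite sees it in the DocumentRoot's .htaccess, to its stub file.
function stub_for_path(string $path): ?string
{
    // Anchored at both ends, so only a bare 1-8 character hex code matches.
    if (preg_match('/^([a-f\d]{1,8})$/', $path, $m)) {
        return 'stubs/' . $m[1];
    }
    return null; // not a short code: let Apache serve the path normally
}
```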

At this point I have a small, simple, and efficient URL shortener. However, I still have to go into the stubs directory and add a new file by hand every time I want to shorten a URL.

To automate that, I use the following simple PHP script, which has a UI for shortening URLs in bulk.

<!doctype html>  
<html lang="en">
<head>
<meta charset="utf-8">
<meta http-equiv="X-UA-Compatible" content="IE=edge,chrome=1">
<title>My URL Shortener</title>
</head>

<body>
<div id="container">
<header>
<h1>My URL Shortener</h1>
<p>Enter as many URLs as you want, one per line.</p>
</header>
<div id="main">
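<?php
/*
Contains backwards compatibility code (a fallback for http_build_url());
if you comment this include() out and don't get any errors, it's safe to leave it out.
*/
include('./lib.php');

if ( ! empty($_POST['u']))
{
    $url_list = '<ul>';
    foreach (preg_split('#[\r\n\f]+#', $_POST['u'], -1, PREG_SPLIT_NO_EMPTY) as $_url)
    {
        $url = parse_url($_url);
        // Only accept absolute URLs that survive a parse/rebuild round trip
        if ($url && isset($url['scheme'], $url['host']) && http_build_url('', $url) === $_url)
        {
            // The short code is the URL's CRC32 checksum in hex (1 to 8 characters)
            $crc = sprintf('%x', crc32($_url));
            $stub = "./stubs/{$crc}";
            // Status and Location headers, then an empty line to end the headers
            $contents = "Status: 302\nLocation: {$_url}\n\n\n";
            if ( ! file_exists($stub))
            {
                if (file_put_contents($stub, $contents) === false)
                {
                    $url_list .= '<li>Could not write a stub for ' . htmlspecialchars($_url) . '</li>';
                    continue;
                }
            }
            elseif (file_get_contents($stub) !== $contents)
            {
                // CRC32 collision: this code already redirects somewhere else
                $url_list .= '<li>Could not shorten ' . htmlspecialchars($_url) . ' (code already in use)</li>';
                continue;
            }
            $short = sprintf('http://%s%s%s',
                $_SERVER['HTTP_HOST'],
                str_replace('//', '/', dirname($_SERVER['REQUEST_URI']) . '/'),
                $crc
            );
            // Collect the list items so they are printed inside the <ul> below
            $url_list .= sprintf('<li><a href="%1$s">%1$s</a> » %2$s</li>',
                htmlspecialchars($short),
                htmlspecialchars($_url)
            );
        }
    }
    if (strlen($url_list) > 4)
    {
        echo $url_list, '</ul>';
    }
}
?>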
<form action="index.php" method="post">
<fieldset>
<legend>URL</legend>
<textarea name="u" id="u" rows="10" style="width:400px;"></textarea>
<p><input type="submit" name="s" id="s" value="Shorten!"/></p>
</fieldset>
</form>
</div>
<footer>© Me; 2011</footer>
</div>
</body>
</html>
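The short codes in the script above come from crc32(), printed as lowercase hex with sprintf('%x'). Because the hex isn't zero-padded, a code can be anywhere from 1 to 8 characters long, which matches the [a-f\d]{1,8} pattern in the RewriteRule. A minimal sketch isolating that step (short_code() is a hypothetical name, and the expected values assume 64-bit PHP, where crc32() never returns a negative number):

```php
<?php
// Derive a short code the same way the script does: the URL's CRC32
// checksum written as lowercase hexadecimal, without leading zeros.
function short_code(string $url): string
{
    return sprintf('%x', crc32($url));
}

// "123456789" is the standard CRC-32 check input; its checksum is 0xCBF43926.
echo short_code('123456789'), "\n"; // prints "cbf43926"
```

Since CRC32 has only 2^32 possible values, two different URLs can produce the same code, so a stub that already exists may point somewhere other than the URL being shortened.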


Also, because http_build_url() comes from the PECL http extension rather than core PHP, I put the following fallback in "lib.php".

<?php
if (!function_exists('http_build_url'))
{
define('HTTP_URL_REPLACE', 1); // Replace every part of the first URL when there's one of the second URL
define('HTTP_URL_JOIN_PATH', 2); // Join relative paths
define('HTTP_URL_JOIN_QUERY', 4); // Join query strings
define('HTTP_URL_STRIP_USER', 8); // Strip any user authentication information
define('HTTP_URL_STRIP_PASS', 16); // Strip any password authentication information
define('HTTP_URL_STRIP_AUTH', 32); // Strip any authentication information
define('HTTP_URL_STRIP_PORT', 64); // Strip explicit port numbers
define('HTTP_URL_STRIP_PATH', 128); // Strip complete path
define('HTTP_URL_STRIP_QUERY', 256); // Strip query string
define('HTTP_URL_STRIP_FRAGMENT', 512); // Strip any fragments (#identifier)
define('HTTP_URL_STRIP_ALL', 1024); // Strip anything but scheme and host

// Build a URL
// The parts of the second URL will be merged into the first according to the flags argument.
//
// @param mixed (Part(s) of) a URL in the form of a string or an associative array like parse_url() returns
// @param mixed Same as the first argument
// @param int A bitmask of binary OR'ed HTTP_URL constants (optional; HTTP_URL_REPLACE is the default)
// @param array If set, it will be filled with the parts of the composed URL like parse_url() would return
function http_build_url($url, $parts=array(), $flags=HTTP_URL_REPLACE, &$new_url=false)
{
$keys = array('user','pass','port','path','query','fragment');

// HTTP_URL_STRIP_ALL becomes all the HTTP_URL_STRIP_Xs
if ($flags & HTTP_URL_STRIP_ALL)
{
$flags |= HTTP_URL_STRIP_USER;
$flags |= HTTP_URL_STRIP_PASS;
$flags |= HTTP_URL_STRIP_PORT;
$flags |= HTTP_URL_STRIP_PATH;
$flags |= HTTP_URL_STRIP_QUERY;
$flags |= HTTP_URL_STRIP_FRAGMENT;
}
// HTTP_URL_STRIP_AUTH becomes HTTP_URL_STRIP_USER and HTTP_URL_STRIP_PASS
else if ($flags & HTTP_URL_STRIP_AUTH)
{
$flags |= HTTP_URL_STRIP_USER;
$flags |= HTTP_URL_STRIP_PASS;
}

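// Parse the original URL (an array is assumed to be in parse_url() form already)
$parse_url = is_array($url) ? $url : parse_url($url);
// The second argument may also be given as a string, as documented above
if ( ! is_array($parts))
$parts = parse_url($parts);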

// Scheme and Host are always replaced
if (isset($parts['scheme']))
$parse_url['scheme'] = $parts['scheme'];
if (isset($parts['host']))
$parse_url['host'] = $parts['host'];

// (If applicable) Replace the original URL with its new parts
if ($flags & HTTP_URL_REPLACE)
{
foreach ($keys as $key)
{
if (isset($parts[$key]))
$parse_url[$key] = $parts[$key];
}
}
else
{
// Join the original URL path with the new path
if (isset($parts['path']) && ($flags & HTTP_URL_JOIN_PATH))
{
if (isset($parse_url['path']))
// Keep the directory part of the original path (everything up to its last slash)
$parse_url['path'] = rtrim(substr($parse_url['path'], 0, (int) strrpos($parse_url['path'], '/')), '/') . '/' . ltrim($parts['path'], '/');
else
$parse_url['path'] = $parts['path'];
}

// Join the original query string with the new query string
if (isset($parts['query']) && ($flags & HTTP_URL_JOIN_QUERY))
{
if (isset($parse_url['query']))
$parse_url['query'] .= '&' . $parts['query'];
else
$parse_url['query'] = $parts['query'];
}
}

// Strips all the applicable sections of the URL
// Note: Scheme and Host are never stripped
foreach ($keys as $key)
{
if ($flags & (int)constant('HTTP_URL_STRIP_' . strtoupper($key)))
unset($parse_url[$key]);
}


$new_url = $parse_url;

return
((isset($parse_url['scheme'])) ? $parse_url['scheme'] . '://' : '')
.((isset($parse_url['user'])) ? $parse_url['user'] . ((isset($parse_url['pass'])) ? ':' . $parse_url['pass'] : '') .'@' : '')
.((isset($parse_url['host'])) ? $parse_url['host'] : '')
.((isset($parse_url['port'])) ? ':' . $parse_url['port'] : '')
.((isset($parse_url['path'])) ? $parse_url['path'] : '')
.((isset($parse_url['query'])) ? '?' . $parse_url['query'] : '')
.((isset($parse_url['fragment'])) ? '#' . $parse_url['fragment'] : '')
;
}
}
?>


That gives me a simple textarea where I can enter a list of URLs; the script writes a stub for each one to the stubs directory and gives me back a list of the shortened URLs.

1 comment:

Anonymous said...

Web server directives are great.
It's too bad free web hosts disallow quite a lot of directives; you really need to host it yourself or pay for hosting. Even then, unless it's a VPS, you're often limited in what you can edit in httpd.conf, php.ini, etc.