Weblogs: Web Development

URL hack for Lifehacker hashbang links

Tuesday, April 26, 2011

Three months ago Gawker launched its redesign based on hashbangs and JavaScript URL routing. Three months later, inbound links to their sites are still broken. So I'm fixing what bothers me the most: links to lifehacker.com.

Geo-redirects

Gawker's geo-redirection is causing problems. Their main .com sites use hashbangs, but their country-specific ones don't. Traffic coming into the .com sites from a country that has its own country-specific subdomain is redirected on the first attempt.

That geo-redirection changes the domain name, but doesn't fix the hashbang URL, so the visitor is thrown to the homepage of that domain rather than the article the link was supposed to point to.

One workaround is to hand-edit the URL each time by removing the hashbang characters. The other is to go back to the originating page, click the link again, and hope the geo-redirection doesn't kick in the second time.

Fixing the problem

Three months without fixing this is ridiculous. And it's not difficult. So I've gone ahead and created a fix myself. Gawker are welcome to take this code and implement it on their servers.

The solution is straightforward; the complicated part is trapping requests to the Gawker sites. It would be easier if I had control over the Gawker domain namespace, but I can work around that.

In effect, I'm locally mapping the Gawker domains (but not their subdomains) to my VPS. There I have a simple PHP script that checks the requested domain. If it's a Gawker site I want to see, the script sends back a tiny HTML page with some JavaScript. The JavaScript looks at the requested URL, rewrites it into a direct URL for the article on the country-specific subdomain, and redirects there. In this way I respect Gawker's geo-redirect requirements, and respect my requirement of seeing the article I intended to see.

Let's go step by step. The IP address of my VPS is 95.154.229.206 (this is actually my development/playground VPS).

Step 1: mapping Gawker domains to my VPS

On Mac or Linux boxes this is a case of editing /etc/hosts. In a terminal, run sudo nano /etc/hosts and add the following line:

95.154.229.206 lifehacker.com gizmodo.com

On Windows the corresponding file is /Windows/system32/drivers/etc/hosts. Make the same edit to this file with your text editor and save it. Windows users have one extra step to perform here: close your browser and reopen it. This clears the browser's domain name cache and allows the change to take effect.
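If you want to double-check the edit before clicking any links, here is a minimal sketch that lists which hostnames a hosts file maps to a given IP. The function name hostsMappedTo is made up for this example, and it assumes the usual "IP name name ..." layout with # comments.

```javascript
// Hypothetical helper (not part of the fix itself): return the hostnames
// that the given hosts-file text maps to `ip`.
function hostsMappedTo(hostsText, ip) {
  var names = [];
  hostsText.split(/\r?\n/).forEach(function (line) {
    // Strip trailing comments, then split the line on whitespace.
    var fields = line.replace(/#.*/, '').trim().split(/\s+/);
    if (fields[0] === ip) {
      names = names.concat(fields.slice(1));
    }
  });
  return names;
}

// Sample hosts content matching the line added in Step 1.
var sample = '127.0.0.1 localhost\n' +
             '# Gawker hashbang fix\n' +
             '95.154.229.206 lifehacker.com gizmodo.com\n';
console.log(hostsMappedTo(sample, '95.154.229.206'));
```

Under Node.js you could feed it the real file with require('fs').readFileSync('/etc/hosts', 'utf8') in place of the sample string.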

Step 2: Click on Lifehacker links like normal

There is no step 2.

Replicating this on your own VPS or webspace

The default configuration on an Ubuntu server maps all non-specified domain names to the default webroot (/var/www), so requests for any unconfigured domain are served /var/www/index.html. I'm using this little trick to deal with the Gawker domains without any extra configuration. All I've done is replace the starting index.html with a PHP script that checks the domain name of the incoming request.

The script is as follows (running at /var/www/index.php):

<?php
$gawker = array( 'lifehacker.com', 'gizmodo.com' );
$domain = $_SERVER[ 'HTTP_HOST' ];

if ( in_array($domain, $gawker) ) {
  echo <<<HTML
<html><body><script type="text/javascript">
var link = document.location.protocol
  + '//uk.'
  + document.location.host;

if (document.location.hash.indexOf('#!')===0) {
  link += '/' + document.location.hash.substring(2);
}
else {
  link += document.location.pathname;
}
window.location = link;
</script></body></html>
HTML;
}
else {
  echo <<<HTML
<html><body><h1>It works!</h1></body></html>
HTML;
}
?>

The $gawker array on line 2 is the list of Gawker domains to listen for. The in_array() check tests whether the domain of the incoming request matches one on my list. If it does, the script writes out a short HTML page containing a piece of JavaScript that takes the current page URL and recrafts it into a non-hashbanged version. The JavaScript then redirects to that page.
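To make the rewrite easier to poke at outside a browser, here is a rough sketch of the same logic as a standalone function. It assumes a runtime with the standard URL class (a modern browser or Node.js). The name toDirectUrl, the optional region argument, and the article path in the sample URL are all made up for illustration. Like the inline script, it ignores any query string.

```javascript
// Hypothetical helper: the same rewrite as the inline script, as a pure function.
// `region` defaults to 'uk', the subdomain hard-coded in the script above.
function toDirectUrl(href, region) {
  var url = new URL(href);
  var link = url.protocol + '//' + (region || 'uk') + '.' + url.host;

  if (url.hash.indexOf('#!') === 0) {
    // Hashbang link: drop the '#!' (and any leading slash) and use the rest as the path.
    link += '/' + url.hash.substring(2).replace(/^\/+/, '');
  } else {
    // Ordinary link: keep the path as it is.
    link += url.pathname;
  }
  return link;
}

// The article path here is an invented example, not a real Lifehacker URL.
console.log(toDirectUrl('http://lifehacker.com/#!12345/example-article'));
```

Stripping a leading slash after the '#!' is a small precaution I added so that a link written as #!/12345 doesn't produce a double slash; the original script doesn't do this.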

Unfortunately, because the most important bit of the Gawker URL is hidden in a fragment identifier, it is never sent to the server, so we can't do a clean version that doesn't depend on JavaScript. We have to resort to returning a small page with some JavaScript.
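A quick way to see what the server is missing: this sketch splits a URL into the part that goes over the wire and the part that stays in the browser. It assumes the standard URL class is available; requestTarget is a name I've invented, and the article path is a made-up example.

```javascript
// Hypothetical helper: separate what the server receives from what only the browser knows.
function requestTarget(href) {
  var url = new URL(href);
  return {
    host: url.host,                     // sent in the Host header
    target: url.pathname + url.search,  // sent in the HTTP request line
    fragment: url.hash                  // never sent; stays client-side
  };
}

console.log(requestTarget('http://lifehacker.com/#!12345/example-article'));
```

For a hashbang link the target is just "/", which is why a server-side redirect can only ever send you to the homepage.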

Fixing the web a little at a time

This solution solves my two main headaches with the Lifehacker site:

I should not need to do this fix. Gawker seem to have no inclination to fix it themselves. I reiterate: hashbang URLs break the Web, and here's one data point.

