Url hack for Lifehacker hashbang links
Tuesday, April 26, 2011
Three months ago Gawker launched its new redesign based on hashbangs and JavaScript URL routing. Three months later, inbound links to their sites are still broken. So I'm fixing what bothers me the most: links to lifehacker.com
Geo-redirects
Gawker's geo-redirection is causing problems. Their main .com sites use hashbangs, but their country-specific ones don't. Traffic coming into the .com sites from countries that have their own country-specific subdomains is redirected on the first attempt.
That geo-redirection changes the domain name, but doesn't fix the hashbang URL, so the visitor is thrown to the homepage of that domain rather than the article the link was supposed to point to.
One solution is to hand-edit the URL each time by removing the hashbang characters. The other is to go back to the originating page and click it again and hope the second time the geo-redirection doesn't kick in.
Fixing the problem
Three months without fixing this is ridiculous. And it's not difficult. So I've gone ahead and created a fix myself. Gawker are welcome to take this code and implement it on their servers.
The solution is straightforward; the complicated part is trapping queries to the Gawker sites. It would be easier if I had control over the Gawker domain namespace, but I can work around that.
In effect, I'm locally mapping the Gawker domains (but not their subdomains) to my VPS. There I have a simple PHP script that checks the domain and, if it's a Gawker site I want to see, sends back a tiny HTML page with some JavaScript. The JavaScript looks at the requested URL, rewrites it into a direct article URL, and redirects to that URL. In this way I respect Gawker's geo-redirect requirements, and respect my requirement of seeing the article I intended to see.
Let's go step by step. The IP address of my VPS is 95.154.229.206 (this is actually my development/playground VPS).
Step 1: mapping Gawker domains to my VPS
On a Mac or Linux box this is a case of editing /etc/hosts. Run sudo nano /etc/hosts in a terminal and add the following line:
95.154.229.206 lifehacker.com gizmodo.com
On Windows the corresponding file is C:\Windows\System32\drivers\etc\hosts. Make the same edit to this file with your text editor and save. Windows users have one extra step to perform here: close down your browser and reopen it. This will clear the domain name cache and allow the change to take effect.
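Why the subdomains stay untouched: a hosts file matches exact names only, with no wildcards. The following is a minimal sketch of that lookup in JavaScript, not how any real resolver is implemented. The parseHosts helper is hypothetical, and the localhost line is only there to make the sample look like a typical hosts file.

```javascript
// Hypothetical helper: turn hosts-file text into a hostname -> IP table.
function parseHosts(text) {
  const table = new Map();
  for (const rawLine of text.split(/\r?\n/)) {
    const line = rawLine.split('#')[0].trim(); // drop comments and blank lines
    if (line === '') continue;
    const [ip, ...names] = line.split(/\s+/);
    for (const name of names) {
      const key = name.toLowerCase();
      // Keeping the first entry for a name is an assumption of this sketch.
      if (!table.has(key)) table.set(key, ip);
    }
  }
  return table;
}

const hosts = parseHosts(
  '127.0.0.1 localhost\n' +
  '95.154.229.206 lifehacker.com gizmodo.com\n'
);

console.log(hosts.get('lifehacker.com'));    // 95.154.229.206
console.log(hosts.get('uk.lifehacker.com')); // undefined: falls through to normal DNS
```

Because uk.lifehacker.com is not listed, requests to it still reach Gawker's own servers, which is what lets the redirect target work.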
Step 2: Click on Lifehacker links like normal
There is no step 2.
Replicating this on your own VPS or webspace
The default configuration on an Ubuntu server maps all non-specified domain names to the default webroot (/var/www), so requests for any non-specified domain are served /var/www/index.html. I'm using this little trick to make it easy to deal with Gawker domains without any configuration. All I've done is replace the starting index.html with a PHP script that checks the domain name of the incoming request.
The script is as follows (running at /var/www/index.php):
<?php
$gawker = array( 'lifehacker.com', 'gizmodo.com' );
$domain = $_SERVER[ 'HTTP_HOST' ];
if ( in_array($domain, $gawker) ) {
    echo <<<HTML
<html><body><script type="text/javascript">
    var link = document.location.protocol
        + '//uk.'
        + document.location.host;
    if (document.location.hash.indexOf('#!')===0) {
        link += '/' + document.location.hash.substring(2);
    }
    else {
        link += document.location.pathname;
    }
    window.location = link;
</script></body></html>
HTML;
}
else {
    echo <<<HTML
<html><body><h1>It works!</h1></body></html>
HTML;
}
?>
Line 2 is a list of Gawker domains to listen for. Line 4 checks whether the incoming request domain matches one on my list. If it does, it writes out a short HTML page containing a piece of JavaScript that extracts the current page URL and recrafts it into a non-hashbanged version. The JavaScript then redirects to that page.
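To try the rewrite outside a browser, the same logic can be pulled into a plain function. This is only a sketch: directArticleUrl is a hypothetical name, the prefix parameter is my addition (defaulting to the 'uk.' used in the script), and the article URLs are made-up examples.

```javascript
// Hypothetical pure version of the redirect logic from the script above.
function directArticleUrl(href, prefix = 'uk.') {
  const url = new URL(href);
  let link = url.protocol + '//' + prefix + url.host;
  if (url.hash.indexOf('#!') === 0) {
    // Hashbang link: the article path lives in the fragment.
    link += '/' + url.hash.substring(2);
  } else {
    // Ordinary link: keep the path as requested.
    link += url.pathname;
  }
  return link;
}

console.log(directArticleUrl('http://lifehacker.com/#!5753509/hello-world'));
// http://uk.lifehacker.com/5753509/hello-world
console.log(directArticleUrl('http://gizmodo.com/5753509/hello-world'));
// http://uk.gizmodo.com/5753509/hello-world
```

Like the original script, this drops any query string on a non-hashbang URL.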
Unfortunately, because the most important bit of the Gawker URL is hidden in a fragment identifier, it is not visible to the server, so we can't do a clean version that doesn't depend on JavaScript. We have to resort to returning a small page with some JavaScript.
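A rough illustration of what the server is left with: a browser sends only the path and query in its request and keeps the fragment to itself. The requestTarget helper and the article URL below are hypothetical, for illustration only.

```javascript
// Hypothetical helper: the part of a URL a browser sends on the request line.
function requestTarget(href) {
  const url = new URL(href);
  return url.pathname + url.search; // url.hash is never sent
}

console.log(requestTarget('http://lifehacker.com/#!5753509/hello-world')); // /
console.log(requestTarget('http://lifehacker.com/5753509/hello-world'));   // /5753509/hello-world
```

For the hashbang link the PHP script sees a request for "/" and nothing else, which is why the rewriting has to happen in the browser.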
Fixing the web a little at a time
This solution solves my two main headaches with the Lifehacker site:
- Gawker breaking incoming links to articles
- Having to deal with that abysmal Gawker Ajax interface
I should not need to do this fix, but Gawker seem to have no inclination to fix it themselves. I reiterate: hashbang URLs break the Web, and here's one data point.