Gawker's geo-redirection is causing problems. Their main .com sites use hashbangs, but their country-specific ones don't. Visitors arriving at the .com sites from countries with their own country-specific subdomains are redirected on the first attempt.
That geo-redirection changes the domain name, but doesn't fix the hashbang URL, so the visitor is thrown to the homepage of that domain rather than the article the link was supposed to point to.
One workaround is to hand-edit the URL each time, removing the hashbang characters. The other is to go back to the originating page, click the link again, and hope the geo-redirection doesn't kick in a second time.
Three months without fixing this is ridiculous. And it's not difficult, so I've gone ahead and created a fix myself. Gawker are welcome to take this code and implement it on their servers.
The solution is straightforward; the complicated part is trapping requests to the Gawker sites. It would be easier if I had control over the Gawker domain namespace, but I can work around that.
In effect, I'm locally mapping the Gawker domains (but not their subdomains) to my VPS. There, a simple PHP script checks the domain, and if it's a Gawker site I want to see, it sends back a tiny HTML page containing some JavaScript. The JavaScript looks at the requested URL, rewrites it into a direct article URL, and redirects to it. This way I respect Gawker's geo-redirect requirements, and my own requirement of seeing the article I intended to see.
Let's go step by step. The IP address of my VPS is 95.154.229.206 (this is actually my development/playground VPS).
On Mac or Linux boxes this is a case of editing /etc/hosts. So in a terminal, run sudo nano /etc/hosts and add the following line:
95.154.229.206 lifehacker.com gizmodo.com
On Windows the corresponding file is /Windows/system32/drivers/etc/hosts. Make the same edit to this file with your text editor and save. Windows users have one extra step to perform here: close your browser and reopen it. This clears the domain name cache and allows the change to take effect.
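The edit can be sketched like this — written against a scratch copy of the hosts file so the snippet is safe to run as-is; apply the same line to the real /etc/hosts with sudo:

```shell
# Append the mapping to a scratch copy of the hosts file -- edit the real
# /etc/hosts (with sudo) to actually apply it.
cp /etc/hosts /tmp/hosts.demo 2>/dev/null || touch /tmp/hosts.demo
echo "95.154.229.206 lifehacker.com gizmodo.com" >> /tmp/hosts.demo
# Confirm the line is present:
grep "95.154.229.206" /tmp/hosts.demo
```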
There is no step 2.
The default configuration on an Ubuntu server is to map all non-specified domain names to the default webroot (/var/www), so all non-specified requests are served /var/www/index.html. I'm using this little trick to make it easy to deal with the Gawker domains without any configuration. All I've done is replace the default index.html with a PHP script that checks the domain name of the incoming request.
The script is as follows (running at /var/www/index.php):
<?php
$gawker = array( 'lifehacker.com', 'gizmodo.com' );
$domain = $_SERVER[ 'HTTP_HOST' ];
if ( in_array( $domain, $gawker ) ) {
    echo <<<HTML
<html><body><script type="text/javascript">
var link = document.location.protocol
         + '//uk.'
         + document.location.host;
if (document.location.hash.indexOf('#!') === 0) {
    link += '/' + document.location.hash.substring(2);
}
else {
    link += document.location.pathname;
}
window.location = link;
</script></body></html>
HTML;
}
else {
    echo <<<HTML
<html><body><h1>It works!</h1></body></html>
HTML;
}
?>
Line 2 is the list of Gawker domains to listen for. Line 4 checks whether the incoming request's domain matches one on that list. If it does, the script writes out a short HTML page containing a piece of JavaScript that extracts the current page URL, recrafts it into a non-hashbanged version, and then redirects to that page.
Unfortunately, because the most important bit of the Gawker URL is hidden in a fragment identifier, it is not visible to the server, so we can't do a clean, JavaScript-free version. We have to resort to returning a small page with some JavaScript.
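For illustration, the rewrite the JavaScript performs can be sketched as a shell function; the article path in the example is made up:

```shell
# Rewrite a hashbang Gawker URL into the direct uk. article URL -- the
# same transformation the served JavaScript performs client-side.
rewrite_url() {
  url="$1"
  proto="${url%%://*}"     # e.g. http
  rest="${url#*://}"
  host="${rest%%/*}"       # e.g. lifehacker.com
  path="/${rest#*/}"       # e.g. /#!5678/example-article
  case "$path" in
    "/#!"*) echo "${proto}://uk.${host}/${path#"/#!"}" ;;  # strip the hashbang
    *)      echo "${proto}://uk.${host}${path}" ;;         # plain path: keep it
  esac
}
rewrite_url 'http://lifehacker.com/#!5678/example-article'
# → http://uk.lifehacker.com/5678/example-article
```

The function assumes the URL has a path component, which is the case for any article link.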
This solution solves my two main headaches with the Lifehacker site. I should not need to do this fix, but Gawker seem to have no inclination to fix it themselves. I reiterate: hashbang URLs break the Web; here's one data point.
With a bare-bones server up and running, it's time to install CouchDB. CouchDB is available as an apt install on Ubuntu 8.10 and above, but that won't be cutting-edge code, so some features may be missing. So I had no choice but to go for the tar.gz install. But first, we can install the package dependencies:
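The dependency install would have looked something like this — the exact package names below are my assumption for Ubuntu 8.04, not taken from the original post; the command is printed rather than run so the snippet is safe anywhere:

```shell
# Hypothetical build dependencies for CouchDB 0.9 on Ubuntu 8.04 -- check
# the CouchDB README for the authoritative list.
PACKAGES="build-essential automake autoconf libtool libicu-dev libcurl4-openssl-dev libmozjs-dev"
echo "sudo apt-get install $PACKAGES"
```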
Next we need Erlang 5.6 or above. This is another case where newer releases of Ubuntu have the right version, but Ubuntu 8.04 doesn't. This time, though, the Apache folks on the CouchDB project have a nifty solution: use the Ubuntu Intrepid repository to grab the updated version of Erlang. (Chris Strom's blog post Yak Shaving is the new Dependency Hell helps deal with a number of dependency issues thrown up by configure.)
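A sketch of borrowing Erlang from the Intrepid repository while on Hardy — the sources line and package name are my assumptions based on the approach described, and the commands are written to a script for review rather than executed directly:

```shell
# Pull a newer Erlang from the Intrepid repo (sources line and package
# name are assumptions). Written to a script, not run here.
cat > get-erlang.sh <<'EOF'
echo "deb http://archive.ubuntu.com/ubuntu intrepid main universe" | \
  sudo tee -a /etc/apt/sources.list
sudo apt-get update
sudo apt-get install erlang-nox
EOF
sh -n get-erlang.sh && echo "erlang script ready"
```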
That's all the dependencies sorted. Now on to downloading the most up-to-date CouchDB source code and building it (this follows the approach documented by Craig R Webster):
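The build steps go roughly like this — the version number and mirror URL are illustrative placeholders, not from the post; check the CouchDB download page for the real ones. The commands are written to a script for you to review and run:

```shell
# Build-from-source sketch. Tarball URL and version are placeholders.
cat > build-couchdb.sh <<'EOF'
V=0.9.1
wget "http://www.apache.org/dist/couchdb/$V/apache-couchdb-$V.tar.gz"
tar xzf "apache-couchdb-$V.tar.gz"
cd "apache-couchdb-$V"
./configure
make
sudo make install
EOF
sh -n build-couchdb.sh && echo "build script ready"
```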
At this point we have a working installation of CouchDB. Craig continues with some additional work, and even gets CouchDB installed as a service that can be started and stopped using the bog-standard Ubuntu init.d methods:
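Registering it the standard Ubuntu way would look roughly like this, assuming the install placed an init script at /etc/init.d/couchdb (as in Craig's write-up); again written to a script since it needs root and a completed install:

```shell
# Hook the couchdb init script into the default runlevels, then start it.
cat > enable-couchdb.sh <<'EOF'
sudo update-rc.d couchdb defaults
sudo /etc/init.d/couchdb start
EOF
sh -n enable-couchdb.sh && echo "service script ready"
```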
At this point CouchDB is running as a daemon. And we can test it via HTTP by calling the root URL on localhost:
Wget saves the contents of this URL into index.html, which contains:
{"couchdb":"Welcome","version":"0.9.1"}
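A quick scripted check of that greeting, so you don't have to eyeball index.html — the JSON below is the response quoted above:

```shell
# Extract the version field from CouchDB's welcome JSON with sed.
RESPONSE='{"couchdb":"Welcome","version":"0.9.1"}'
echo "$RESPONSE" | sed 's/.*"version":"\([^"]*\)".*/\1/'
# → 0.9.1
```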
And we are done!
As sequels go, they never outdo the first, and Yahoo's first London Hackday is legend. The second was far better organised, though, with a brilliant venue (its below-par networking infrastructure aside) and excellent food. Congratulations are very much in order for Anil Patel and Sophie Major for pulling off a remarkable and well-run geek event.
The first day started off with tech talks. Dav Glass' overview of YUI3 was particularly insightful, and a well thought-out code example topped off a useful talk (has anyone ever seen Dav without a beanie?). Christian's talk was the best, showing off the potential of YQL, and billing it as the command line version of Pipes. I found the BOSS talk lacking in real substance.
Rasmus Lerdorf had interesting material, offering useful snippets of code for doing typical things: curling, parsing RSS, caching, authentication, YQL, and even OAuth. Aimed primarily at new hackers and non-hardcore developers, these are all very useful pieces of code for a hacker's toolkit. I feel the material would be far more useful as an online website, like a Hackday Manual.
My memories go back to the first installment and it's interesting to see the considerable change of technology. Back then YUI was just about to start gaining mindshare, Pipes (the forerunner to YQL) hadn't arrived. Ryan Kennedy did a fascinating talk about BBauth and the Yahoo Mail API. It's clear Yahoo is starting to be a leader on the technology front. They are listening and reacting in a way that is beneficial to us hackers, Yahoo employees or not.
YQL (Yahoo Query Language) is a brilliant conception. It is fundamentally sound, and it taps into the web developer mindshare: developers already deal with SQL queries, and here's something with a minimal learning curve that produces remarkable results fairly quickly. It almost converts the web into one big queryable database.
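To give a flavour, here's a minimal sketch of what a YQL request looked like: a query URL-encoded against the public endpoint. The endpoint is the one documented at the time (the service has since been retired), and the feed URL is made up, so the request URL is only printed, not fetched:

```shell
# Build a YQL request URL -- treating an RSS feed like a database table.
Q="select title from rss where url='http://example.com/feed.xml'"
ENCODED=$(printf '%s' "$Q" | sed "s/ /%20/g; s/'/%27/g")
echo "http://query.yahooapis.com/v1/public/yql?q=$ENCODED&format=json"
```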
I saw this magic first in SPARQL last year, but YQL is going to be a much bigger deal because it taps into an existing audience and fits naturally into their existing toolset. SPARQL is tarnished by the Semantic Web mumbo-jumbo and the prevalent belief that they'll reinvent the Internet in its image. YQL just works, with or without the data source's help.
One of the subtler advantages of YQL is that when enough people find an API-less website useful to scrape via YQL, the site owner is going to notice a steady stream of requests from the YQL user-agent. That should lead them to figure out why YQL is making these requests, and hopefully encourage them to make the content available as a proper API. Using YQL is almost like voting to encourage a website owner to offer API access to the data they're already making available as part of a website.
It was also particularly splendid to see David Filo in attendance again. I believe he has been to all the Yahoo Open Hackdays across the planet; has he even missed one? I regard Filo as the technical heart of Yahoo, more than just a leader: an inspiring presence, a legend in his own lifetime, and someone who, along with Jerry Yang, played a crucial role in the history of the Web we have with us today.
When 250 geeks converge in one constrained space, Internet access is going to be problematic. The one significant issue of this hackday was the wonky wifi and, perhaps more scandalous, a dedicated line well below the average household broadband connection.
A bottom-range broadband connection is nowhere near enough to satisfy the data needs of two hundred and fifty geeks armed with multiple laptops, auto-connecting 3G mobile phones and other handheld devices. Despite the phenomenal effort of the Yahoo hackday crew, who re-equipped the venue with enterprise-level wifi routers, there was nothing they could have done to alleviate the issue of a low-bandwidth connection.
Just as the lightning strike at Alexandra Palace was totally out of Yahoo's control, so was the internet connection. Except this was a man-made error, and a mistake made perhaps months or years before hosting a hackday there became a possibility.
It's an organisational failure by the venue owners. The connection might be barely adequate for an audience of 250 in which only a fraction are using their laptops for web requests; 250 data-hungry geeks are a completely different story. We live, breathe and die on TCP/IP. There are not enough hours in the day to satiate our appetites for information on the web.
As a result, my plan of building an application on top of Twitter was doomed from the start. Trying to debug Curl was a frustrating spiral: I couldn't tell whether there was something wrong with my Curl request, or whether it was just the lousy connection. With the majority of webpages timing out and showing a blank screen, it was a no-win situation for me. So I opted to leave early, bailing out on the typical overnight hackday activities.
I was building an application because there was a room of like-minded geeks building their applications, so it was the natural thing to do. But I'm building the app for me, because it's something I want to exist, and something that interests me. So after a pizza-hazed hacking session, with coding storms tempered all too frequently by Twitter's hourly limit, I decided that my idea was more valuable to me than building something in 24 hours.
I headed back into London on the Sunday to catch the end of the hacking session and for the hack presentations. I finally got to see the much-celebrated Dundee University winning hack team. They were flown from Dundee to participate in this hackday, and again they turned in a stunning application that really pushed the boundaries and conceptions around web accessibility.
Out of all the hacks, I felt that Dundee University's Intellisearch was the most polished and well-executed hack there. It was essentially a predictive interface for people with motor-related disabilities. Clicking a mouse button without moving the mouse is one of the major frustrating barriers this audience has to deal with, so the Dundee guys implemented a clickless interface, and worked around the imprecision of mouse positioning by clever use of zoom.
By hooking into Yahoo BOSS and YQL, the interface offered multiple options that could be selected just by moving the mouse nearer the preferred option. The closer the mouse came to an area, the larger that area became. This neatly solves the tradeoff between having large selectable regions and fitting as much information on the page as usability allows.
It was an awesome demonstration of creativity, an ingenious idea that was well executed. It used HTML Canvas for the interface, and I felt that the award of "Best Mozilla Hack" is an understatement. All A-Grade browsers support Canvas except for Internet Explorer. So it's not just Mozilla, but the Webkits and Operas that can make use of this immensely powerful interface conception.
I've praised Dundee University previously for turning out such talented, gifted and accessibility-aware developers. And like a broken record: well done to the Dundee crew of Chris Brett, Laurence Hole and Matthew Ross. Exceptional work. And lots of credit to the staff at Dundee University for keeping up to date with modern web development techniques, and teaching them.
Fifty hacks were presented. Many of them took advantage of YQL, and pulled off some nifty ideas. Jim O'Donnell demoed a marvellous YQL and Flickr hack built on the significant number of astronomy pictures on Flickr (and won a Best YQL Execute hack award). He pulled astronomy data surfaced in Flickr machine tags into a YQL table, exposing it to further processing. So now it is possible to start at a picture of one region of the night sky and calculate and find pictures of celestial bodies nearby.
The implications of surfacing such rich data are very similar to those of the geo-location data already available for our tiny planet. Talking to Jim on Saturday, he made a great point: geo-location is looking at a ball you are standing on, while he is looking from the inside of a bigger ball. The maths and logic are largely the same, and the newly-created opportunities equally so.
Jim has been an independent champion of YQL, and over the past few months has inadvertently demonstrated the unexpected benefits of making data easy to query. I doubt the YQL team ever imagined how beneficial their platform would be to the world of astronomy. A case of build it and see who comes? Jim has been applying YQL to both astronomy and maritime information, and it's splendid to see such a cutting-edge technology being so useful to long-established professions.
Norm, James and Richard produced an exceptional idea: a Bayesian-filtered news aggregator. By filtering stories based on user preference, you can remain subscribed to more general news feeds (like Top Stories) and automatically filter out topics you are not interested in. I loved the concept, and the end result was splendid. There are also broader implications of using Bayesian filters to build a set of user preferences; they could be applied to other things, like the recommendation engines of various retailers. It could be Skynet.
The iPhone orchestra entertained us by playing the Doctor Who theme. It was awesome, and gave the Tesla Coil interpretation a run for its money.
The other hack that caught my attention was the FireEagle temporary pass, which solves the issue of letting some people know where you are, but in an easily revocable way. That way some people can know where you are when you want or need them to know.
It has been an awesome weekend. Again, well done to Anil and Sophie, and the endless energy from the Yahoo support crew. They did a fantastic job, and the quality of hacks from the gathered geeks was very good. Perhaps better than the initial London Open Hackday.