@Media Ajax - Day two
Tuesday, September 16, 2008
The presentations on day two of @media Ajax were more technical than those of the first day, ranging from Comet through to security. And no @media is the same without a panel discussion at the end.
Joe Walker and Comet
Joe talks about Comet and digs underneath to show the audience the approaches and hackery used to simulate a server-side push. Most of the problems are down to Internet Explorer, so these techniques need some unconventional workarounds.
Long polling is essentially a "slow XMLHttpRequest" that can drip feed content back to the browser. Normally a Transfer-Encoding: chunked HTTP header is sent with the response. But even with chunked content, IE lies. The change of status is received, but IE claims it sees no data. The workaround is for the server to terminate the connection and get the browser to restart it, which gets IE to recognise that the content is there.
The forever frame approach is just a polling mechanism in a hidden iframe. IE lies here too about the content being available, and the workaround is to flush the content out by putting a script tag around each data block. The alternative is to use a content-type of text/plain and push through at least 4k of text. Thankfully, with gzip compression, a response prefixed with 4k of spaces compresses very well. The iframe needs restarting every so often, otherwise IE leaks memory like a sieve. Every URL change in the iframe also causes IE to emit a click sound. This can be worked around by confusing IE as to the parent of the iframe container, but it is a rather messy hack.
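To make the chunked framing concrete, here is a minimal sketch of how a server might frame each drip-fed message as an HTTP/1.1 chunk. The function names are my own, not anything Joe showed, and a real Comet server would write these bytes to a socket rather than build one string.

```python
from typing import Iterable

# A zero-length chunk ends the response. Ending it is the step that gets
# IE to hand the data over, after which the browser reconnects.
LAST_CHUNK = b"0\r\n\r\n"

def encode_chunk(data: bytes) -> bytes:
    """Frame one message as a chunk: hex length, CRLF, data, CRLF."""
    return f"{len(data):x}".encode("ascii") + b"\r\n" + data + b"\r\n"

def encode_stream(messages: Iterable[bytes]) -> bytes:
    """Frame several messages, then terminate the response."""
    # Empty messages are skipped: a zero-length chunk would end the stream early.
    return b"".join(encode_chunk(m) for m in messages if m) + LAST_CHUNK
```

For example, encode_chunk(b"hello") gives b"5\r\nhello\r\n".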
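The two server-side tricks above can be sketched roughly as follows. This is an illustration only: parent.onCometData is a hypothetical callback the parent page would have to define, and the padding check simply measures how small 4k of spaces becomes under gzip.

```python
import gzip
import json

PADDING = " " * 4096  # the 4k of filler text

def script_block(payload, callback="parent.onCometData"):
    """Wrap one data block in its own script tag so it runs on arrival.

    `callback` is a hypothetical function defined by the parent page.
    """
    # Escape "</" so a "</script>" inside the data cannot close the tag early.
    body = json.dumps(payload).replace("</", "<\\/")
    return f"<script>{callback}({body});</script>\n"

def compressed_padding_size():
    """Return the gzipped size in bytes of the 4k padding."""
    return len(gzip.compress(PADDING.encode("ascii")))
```

The padding costs 4096 bytes uncompressed but only a few dozen bytes on the wire once gzipped.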
All of the above techniques are nasty and horrible. HTML 5 makes things better with Web Sockets. That combined with DOMStorage will be a good alternative to the above hacks.
Eleven years ago, when server-side push was in vogue, it used MIME messaging (multipart/x-mixed-replace). This technique could also be used as it offers excellent performance, but it does not work in IE.
Even crazier is using "Forever GIFs". Gifs can be multi-framed (animated gifs), and this multiframing could be used to stream data from the server (if you are warped or sick enough).
Performance is an often cited reason to avoid the use of Comet, but testing with a Jetty server showed that even with 20,000 concurrent connections, the server still returns sub-second response times (the performance graphs showed 500 millisecond responses, versus about 250 for 10,000 connections). With that level of performance on a single server, it's more likely something else will break before performance becomes too bad.
The two connections per domain is a limiting issue, especially if you have multiple tabs open to the same domain that's streaming data. There are a number of techniques to alleviate this barrier. The best solution is multi-homed DNS - setting up loads of subdomains that map to the same space. Even if they are all on the same IP address, the browser still allows two connections per domain.
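As a rough sketch of the subdomain idea, the page could be handed a different hostname for each tab or client. The cometN naming, the example.com domain and the pool size are all assumptions for illustration, and it presumes DNS is set up so every such name resolves to the same server.

```python
def comet_host(client_index: int, base_domain: str = "example.com",
               pool_size: int = 8) -> str:
    """Pick one of `pool_size` hostnames so each client gets its own quota.

    The browser counts connections per hostname, so comet0 and comet1
    each get their own allowance even if they share an IP address.
    """
    return f"comet{client_index % pool_size}.{base_domain}"
```

Calling comet_host(0) and comet_host(1) yields comet0.example.com and comet1.example.com, so two tabs would no longer compete for the same two connections.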
Modern browsers allow the raising of the number of per-domain connections, so these tend to now be between 4 and 8.
Client side mechanisms for alleviating the per-domain connection limit include using window.name as references, where one tab takes the lead in talking to the server, and it deals with all requests for all the tabs, using the window.name property to share data between the other tabs. Cookies are another way of sharing data between the tabs.
Joe goes on to talk about the server, covering the issues of detecting failed connections. He also talks about various approaches to writing Comet servers, with code examples in Java using the DWR libraries. He covers the two Comet architecture approaches of Inboard and Outboard, which describe where the Comet server sits in relation to the rest of the server.
Comet is a hack. So is Ajax - which didn't stop Ajax. And it does work. Facebook routinely uses Comet, proof that Comet can work on massive and popular sites.
Christophe Porteneuve and Best practice solutions with Prototype
Christophe took us through three code examples of Prototype being used for Ajax features. The first showed how to neaten up some Prototype code by taking advantage of features like chaining, better choice of selectors, overriding custom behaviours, and event delegation to dramatically simplify the code. The end result was a code chunk that was more readable, easier to understand, and hopefully easier to maintain.
The second code example was building up a persistent list of checkboxes - the basis of to-do lists. Here Christophe showed a minimal PHP script to support both the rendering of the page and the Ajax interactions, using the X-Requested-With: XMLHttpRequest HTTP header to branch between a normal page request and an Ajax update. After getting the basic functionality working, Christophe improved the usability of the widget by inserting loading indicators. He later built drag-and-drop, allowing the list to be reordered in a persistent fashion.
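Christophe's example was in PHP, which I did not copy down. As a stand-in, here is a minimal Python WSGI sketch of the same branching idea, keyed on the X-Requested-With: XMLHttpRequest header that Prototype adds to its Ajax requests. The app name and the placeholder list item are my own inventions.

```python
def todo_app(environ, start_response):
    """Serve the full page normally, or just the list fragment for Ajax."""
    # WSGI exposes the X-Requested-With header under this key.
    is_ajax = environ.get("HTTP_X_REQUESTED_WITH") == "XMLHttpRequest"
    items = "<li>Buy milk</li>"  # placeholder list content
    if is_ajax:
        body = items  # Ajax update: only the fragment to swap in
    else:
        body = f'<html><body><ul id="todo">{items}</ul></body></html>'
    data = body.encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "text/html; charset=utf-8"),
        ("Content-Length", str(len(data))),
    ])
    return [data]
```

The same URL then serves both the initial page and the Ajax refresh, which keeps the server-side code small.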
A useful presentation showing not only the possibilities offered by the Prototype library, but also how a developer thinks his way through implementing an Ajax driven widget.
Dan Webb on Managing Complexity
On a typical Ajaxified page - from a music social network - Dan identifies common behaviours of Collapsible modules (using two different styles), Collection lists, Selections and Additions and Colour pickers.
Using jQuery and his own LowPro library, Dan attaches a Behaviour class to each element expressing a behaviour, and attaches the necessary event handlers to the element. He does this in a way where the Behaviour class knows the element it applies to, but not vice versa. This elegantly solves the memory leak problem of cross-references.
A profound presentation, one I cannot do justice to in this blogpost. I hope Dan can spend some time writing up notes, or a few blog posts on this topic. I think the approach and underlying logic is right, and I feel it is a nice fit for websites that are modular in nature.
Yehuda Katz making jQuery code modular
Yehuda co-authored the book "jQuery in Action", still the only book on the market that covers jQuery 1.2. In it he explains why jQuery does things a certain way, which helps developers unlock the potential of jQuery.
In this talk, Yehuda takes us step by step through building an autocomplete search field in less than 100 lines of code (that's what he promised on the session writeup). Yehuda managed it in 92 lines of code, but isn't entirely happy with certain bits that he feels aren't elegant enough.
Yehuda spends time describing the different states of an autocomplete feature, and shows how namespaced custom events can be used as a powerful mechanism for reducing code complexity to maintainable levels.
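I have not got Yehuda's code to hand, so the following is only my own sketch of the namespaced events idea, written in Python rather than jQuery. It mimics the "type.namespace" naming: triggering a type fires every handler for it, while unbinding ".namespace" removes just one feature's handlers.

```python
from collections import defaultdict

class Events:
    """Tiny event hub with jQuery-style "type.namespace" names."""

    def __init__(self):
        self._handlers = defaultdict(list)  # type -> [(namespace, handler)]

    def bind(self, name, handler):
        event_type, _, namespace = name.partition(".")
        self._handlers[event_type].append((namespace, handler))

    def unbind(self, name):
        """Remove handlers matching "type", "type.ns" or ".ns"."""
        event_type, _, namespace = name.partition(".")
        types = [event_type] if event_type else list(self._handlers)
        for t in types:
            # Keep a handler only if a namespace was given and it differs.
            self._handlers[t] = [
                (ns, h) for ns, h in self._handlers[t]
                if namespace and ns != namespace
            ]

    def trigger(self, event_type, *args):
        for _, handler in list(self._handlers[event_type]):
            handler(*args)
```

A widget can bind everything under its own namespace and later tear it all down with one unbind(".autocomplete") call, without touching handlers that other code attached to the same events.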
He takes the time to explain the facets and features inside of jQuery, including the data persistence, the custom events, how the state is handled with namespaced events.
To be honest, bits and pieces of the talk went way over my head, but Yehuda will be making the code available with the presentation on his site, so I'll have to spend some time working through the code and figuring out the bits I didn't grasp today.
Simon Willison when Ajax attacks
A few years ago, the shared wisdom about security could be boiled down to three tenets:
- Never trust user inputs
- Protect against SQL injection attacks
- Protect against cross-site scripting (XSS) attacks
The first two are boring, since they are solved problems on modern server-side platforms. SQL injection is largely dealt with by parameterised SQL and decent APIs. The third, today, is the most interesting.
Simon shows a competition based around finding XSS vulnerabilities ordered by PageRank. There are a few Yahoo domains listed there. But also, Simon shows that there are XSS vulnerabilities in Facebook, YouTube, Google Groups and Microsoft's Live Search.
There was recently a mass XSS via SQL injection, hitting a large number of sites running the same application code. The app was badly written and allowed the injection of SQL.
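For anyone who hasn't seen parameterised SQL, here is a small sketch using Python's built-in sqlite3 module. The users table and its contents are made up for the example.

```python
import sqlite3

def find_user(conn, name):
    """Look up users by name with a bound parameter."""
    # The "?" placeholder sends `name` as data, never as SQL text,
    # so quotes inside it cannot change the query.
    cursor = conn.execute("SELECT id, name FROM users WHERE name = ?", (name,))
    return cursor.fetchall()

# Hypothetical in-memory table for the example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO users (name) VALUES (?)", [("alice",), ("bob",)])
```

A classic injection string such as ' OR '1'='1 is simply treated as an odd-looking name and matches nothing.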
Preventing XSS involves:
- Use a tool that escapes everything on output
- Then only unescape stuff you know is safe and only contains markup you want to execute
- IE8 has an XSS filter that tracks potential XSS attempts and blocks them. But this is irrelevant to developers because IE8 isn't the only browser out there.
HTML sanitisation is an extremely common vector of XSS. The best idea is to avoid needing to do it at all, as it is very, very difficult to get right.
The "samy is my hero" worm, from October 2005, is the famous MySpace worm that defeated the HTML sanitisation on the user profile page. Simon dissects the worm's code, showing the various techniques used to bypass the MySpace HTML sanitisation.
The worm itself infected MySpace exponentially: one user in the first hour, 221 friend requests after 7 hours, and over one million friend requests after 20 hours, which is when MySpace crashed. (It took 18 months to prosecute Samy.)
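The "escape everything on output" rule can be sketched with Python's standard html.escape. The render_comment function is a made-up example, and most template engines will do this escaping for you by default.

```python
from html import escape

def render_comment(author: str, text: str) -> str:
    """Build a comment snippet, escaping every user-supplied value."""
    # escape() converts &, <, > and both quote characters to entities.
    return f"<p><b>{escape(author)}</b>: {escape(text)}</p>"
```

A comment containing a script tag comes out as harmless visible text, for example &lt;script&gt; instead of a tag the browser would execute.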
The flaws in MySpace that allowed this worm to propagate were:
position: absolute was used to phish 30,000 MySpace user accounts by positioning a retargeted login form directly over the genuine login form. The attackers released the details of all the hacked accounts. The good news is that there's hope for humanity yet: although a number of MySpace user passwords were password1, most passwords did show signs of basic password care, like not just using dictionary words.
CSRF: Cross Site Request Forgery. In 2005, when Google released the Google Web Accelerator, many people using web applications found they were losing data. This is because the accelerator prefetches GET URLs, and applications were using links to delete data. This caused quite a stir, especially with the 37signals group.
Apart from the argument of REST vs GET for everything, it was realised that there was a huge security risk already in using links for delete requests. A simple list of images with the src of those delete links is enough for an attacker to get the victim to delete his own data unwittingly.
form.submit(). With a hidden iframe, the victim won't know what's happened.
A few years ago, Digg had no CSRF protection on the "Digg this" button. This led to sites creating self-digging pages: programmatically hitting the "Digg this" URL without the visitor knowing.
Gmail had left a CSRF hole in their "Add filter" feature. This opened a backdoor for an attacker to add a filter rule to forward all received email to an attacker controlled email address. This hole was undiscovered for several months.
The main way to prevent CSRF is to be able to distinguish form actions that come from your site from form actions that come from an external site. Referrer checking is unreliable, as some anti-virus tools have been known to strip the referrer header from outgoing HTTP requests.
The main technique is to use a form token - called a crumb at Yahoo!. This is a hidden field on a form that contains a unique secret value that is only known to the host of the form. The server checks this crumb against a value it knows server side, and if it is valid, allows the request to proceed. The crumb is unique for every user, and it needs to change over time to prevent replay attacks. One method of generating a crumb is to derive it from a session cookie, for example as an md5 hash.
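Here is my own rough sketch of a crumb, not something Simon presented. It swaps the md5 idea for an HMAC-SHA256 of the session id plus a time window, keyed with a server-side secret. The secret, the one-hour window and the function names are all assumptions, so treat it as a starting point rather than a vetted recipe.

```python
import hashlib
import hmac
import time

SECRET = b"replace-with-a-real-server-secret"  # hypothetical placeholder
WINDOW = 3600  # seconds before the crumb value changes

def _crumb_for(session_id, bucket):
    message = f"{session_id}:{bucket}".encode("utf-8")
    return hmac.new(SECRET, message, hashlib.sha256).hexdigest()

def make_crumb(session_id, now=None):
    """Crumb for the hidden form field, tied to user and time window."""
    now = time.time() if now is None else now
    return _crumb_for(session_id, int(now // WINDOW))

def check_crumb(session_id, crumb, now=None):
    """Accept crumbs from the current or the previous time window."""
    now = time.time() if now is None else now
    bucket = int(now // WINDOW)
    # Allowing the previous window lets forms opened just before a
    # rollover still submit; anything older is rejected as stale.
    return any(
        hmac.compare_digest(crumb.encode("utf-8"),
                            _crumb_for(session_id, b).encode("utf-8"))
        for b in (bucket, bucket - 1)
    )
```

The server never has to store crumbs: it recomputes the expected value from the session id on each form submission and compares.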
If you are not using crumbs now, you have a security hole you can drive a bus through.
Protecting the crumb becomes the vital link in protecting against CSRF. This means that an XSS vulnerability can allow the crumb to be stolen, so XSS holes are CSRF holes.
Because Ajax operates under a same-domain policy, and Ajax can set HTTP headers that regular forms can't, an X-Requested-With: XMLHttpRequest header can only come from your site. So if you see that in the header, you could skip crumb checking for those requests.
Flash's crossdomain.xml also opens up a vulnerability. This is mitigated slightly by moving API calls to a different domain to the website. Flickr for example has its API calls on api.flickr.com.
Flash can be tricked with fake crossdomain.xml files. One example Simon showed was a corrupted gif file that contained a crossdomain.xml, which gave an attacker access to data on the target domain. Flash can fake the Ajax X-Requested-With HTTP header, and has access to crumbs, so it is definitely a vector for attack.
Simon spends time describing JSON-P, which is wrapping a JSON object in a function call, where the function is identified by a parameter on the request URI query string. He shows that if the JSON-P contains user sensitive information at a guessable URL, then that data is at risk. He shows a vulnerability in Google Docs which made a user's contact list publicly gettable by an attacker.
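To show what a JSON-P response looks like, here is a small sketch of the server side. The function and the callback-name check are my own additions. The check only stops junk being injected through the callback parameter; it does nothing about the data exposure Simon described, since any page that includes the URL in a script tag still receives the data.

```python
import json
import re

# Allow only plain dotted identifiers such as "showContacts" or "app.cb".
CALLBACK_RE = re.compile(r"[A-Za-z_$][\w$]*(\.[A-Za-z_$][\w$]*)*", re.ASCII)

def jsonp(callback: str, data) -> str:
    """Wrap `data` in a call to the function named in the query string."""
    if not CALLBACK_RE.fullmatch(callback):
        raise ValueError("unsafe callback name")
    return f"{callback}({json.dumps(data)});"
```

A request ending in ?callback=showContacts would get back showContacts({...}); which the including page runs as script. That is why user-sensitive data should not sit behind a guessable JSON-P URL.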
As developers we need to stay informed. Simon offers the following resources:
- Planet Web security (Atom feed)
- Simon's own hoard of security resources
- Web application security professionals
One thing Simon has yet to find is a good tutorial or article for the best way of creating crumbs.
End of day two
The second day of @media Ajax turned out much better than the first, with lots of technical details and code to dive into and peruse. Overall, the conference was a decent opportunity to see what is happening in the Ajax world. To be honest, Simon stole the event with a remarkable and interesting presentation on security.