Weblogs: Javascript

@Media Ajax - Day two

Tuesday, September 16, 2008

The presentations on Day 2 of @media Ajax were more technical than those on the first day. The talks ran from Comet through to security, and no @media would be the same without a panel discussion at the end.

Joe Walker and Comet

Joe talks about Comet and digs underneath to show the audience the approaches and hackery used to simulate a server-side push. Most of the problems are down to Internet Explorer, and so these techniques need some unconventional workarounds.

Long polling is essentially a "slow XMLHttpRequest" that can drip feed content back to the browser. Normally the response is sent with an HTTP header of Transfer-Encoding: chunked. But even with chunked content, IE lies: the change of status is received, but IE claims it sees no data. The workaround is for the server to terminate the connection and get the browser to restart it, which gets IE to recognise that the content is there.
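
As a rough sketch of that restart loop (not Joe's code), the client side can be written as a function that reopens the request every time the server closes it. The `send` function, the `cursor` value and the shape of the reply are all assumptions made for illustration; in a browser, `send` would wrap XMLHttpRequest.

```javascript
// Minimal long-polling loop. `send(cursor, done)` is a hypothetical transport:
// it opens a request and calls done(error, reply) once the server answers and
// closes the connection. `reply` is assumed to look like
// { messages: [...], cursor: <position to resume from> }.
function startLongPoll(send, onMessage, retryDelayMs = 1000) {
  let cursor = 0;
  let stopped = false;

  function poll() {
    if (stopped) return;
    send(cursor, (error, reply) => {
      if (error) {
        // Back off briefly so a dead server is not hammered with requests.
        setTimeout(poll, retryDelayMs);
        return;
      }
      reply.messages.forEach(onMessage);
      cursor = reply.cursor;
      // The server closed the connection, so open the next one straight away.
      poll();
    });
  }

  poll();
  return {
    stop() {
      stopped = true;
    },
  };
}
```

Each completed request delivers whatever the server had queued, and the immediate reconnect is what makes the exchange feel like a push.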

The forever frame approach is just a polling mechanism in a hidden iframe. IE lies here too about the content being available, and the workaround is to flush the content out by putting a script tag around each data block. The alternative is to use a content-type of text/plain and push through at least 4k of text. Thankfully, with gzip compression, prefixing the response with 4k of spaces compresses very well. The iframe needs restarting every so often, otherwise IE leaks memory like a sieve. Every URL change in the iframe also causes IE to emit a click. This can be worked around by confusing IE as to the parent of the iframe container, but it is a rather messy hack.
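
To illustrate the script-tag-per-block idea, here is a small server-side sketch that builds one chunk of the iframe response. The callback name `parent.onCometData` is hypothetical, and the escaping of `<` is an addition of this sketch so that data containing a closing script tag cannot end the block early.

```javascript
// Build one chunk of a forever-frame response: a complete <script> element
// that hands a data block to a function in the parent page.
// `parent.onCometData` is an assumed name, not part of any library.
function frameChunk(payload) {
  // JSON is valid JavaScript here; "<" is written as \u003c so that the
  // payload can never contain a literal "</script>".
  const json = JSON.stringify(payload).replace(/</g, "\\u003c");
  return "<script>parent.onCometData(" + json + ");</script>\n";
}
```

The server would write one such chunk to the open response every time it has something to push, flushing after each write.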

Internet Explorer offers an ActiveX component called htmlfile which allows the user to embed a browser into the current HTML document. It is effectively a heavy-weight iframe. This avoids the clicking on URL change, except in IE on Windows Server 2003. It is, of course, not supported in Firefox, Safari or Opera. The major downside is that IE garbage collects this component even if there are references to it and it is still in use. Experimentation shows that the component lives long enough to run fewer than 50 JavaScript statements before the garbage collection kicks in.

Callback polling uses dynamic script blocks, which can point to any domain, but this does not stream content. Flash Remoting is another possible solution. Its interaction with JavaScript could be better, but it is an option worth considering.

All of the above techniques are nasty and horrible. HTML 5 makes things better with Web Sockets. That combined with DOMStorage will be a good alternative to the above hacks.

Eleven years ago, when server-side push was in vogue, it used MIME messaging (multipart/x-mixed-replace). This technique could also be used as it offers excellent performance, but it does not work in IE.

Even crazier is using "Forever GIFs". GIFs can be multi-framed (animated GIFs), and this multiframing could be used to stream data from the server (if you are warped or sick enough).

Performance is an often cited reason to avoid the use of Comet, but testing with a Jetty server showed that even with 20,000 concurrent connections, the server still returns sub-second response times (the performance graphs showed 500 millisecond responses, versus about 250 for 10,000 connections). With that level of performance on a single server, it's more likely that something else will break before performance becomes too bad.

The limit of two connections per domain is a problem, especially if you have multiple tabs open to the same domain that's streaming data. There are a number of techniques to alleviate this barrier. The best solution is multi-homed DNS - setting up loads of subdomains that map to the same space. Even if they are all on the same IP address, the browser still allows two connections per domain.
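
A sketch of how a page might spread its streaming connections over such subdomains follows. The `comet0`, `comet1`, ... naming scheme and the idea of numbering tabs are assumptions for illustration only.

```javascript
// Pick one of `poolSize` subdomains for a given tab, so that each tab's
// streaming connection counts against a different hostname.
// The "cometN." prefix is a made-up naming convention.
function cometHost(tabNumber, baseDomain, poolSize) {
  return "comet" + (tabNumber % poolSize) + "." + baseDomain;
}
```

All of the generated hostnames would resolve to the same server; only the name the browser sees differs.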

Modern browsers have raised the number of per-domain connections, so the limit now tends to be between 4 and 8.

Client-side mechanisms for alleviating the per-domain connection limit include using window.name as a reference. One tab takes the lead in talking to the server and deals with all requests for all the tabs, using the window.name property to share data with the other tabs. Cookies are another way of sharing data between the tabs.

Joe goes on to talk about the server side, including the issue of detecting failed connections. He also talks about various approaches to writing Comet servers, using code examples in Java using the DWR library. He covers the two Comet architecture approaches of Inboard and Outboard - where the Comet server sits in relation to the rest of the server.

Comet is a hack. So is Ajax - which didn't stop Ajax. And it does work. Facebook routinely uses Comet, proof that Comet can work on massive and popular sites.

Christophe Porteneuve and Best practice solutions with Prototype

Christophe took us through 3 code examples of Prototype being used for Ajax features. The first one showed how to neaten up some Prototype code by taking advantage of features like chaining, better choice of selectors, overriding custom behaviours, and Event Delegation to dramatically simplify the code. The end result was a code chunk that was more readable, easier to understand, and hopefully easier to maintain.

The second code example was building up a persistent list of checkboxes - the basis of to-do lists. Here Christophe showed a minimal PHP script to support both the rendering of the page and the Ajax interactions - using the X-Requested-With: XMLHttpRequest HTTP header to branch between a normal page request and an Ajax update. After getting the basic functionality working, Christophe improved the usability of the widget by inserting loading indicators. He later built drag-and-drop, allowing the list to be reordered in a persistent fashion.

The third example showed how to build an "is this login name taken" feature for a signup dialogue, again showing the minimal PHP backend and the JavaScript needed for the basic functionality. Then he dramatically improved the usability with visual cues and messages, each step detailing a new aspect of the Prototype library.
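
Christophe's backend was PHP; the branching idea itself can be sketched in JavaScript as below. The function names and the "fragment" / "full-page" labels are invented for this sketch, and it assumes the request headers arrive as a plain object. The header checked is X-Requested-With, which Prototype sends with its Ajax requests.

```javascript
// Decide whether a request came from an Ajax call by looking for the
// X-Requested-With header. Header names are compared case-insensitively.
function isAjaxRequest(headers) {
  for (const name of Object.keys(headers)) {
    if (name.toLowerCase() === "x-requested-with") {
      return String(headers[name]).toLowerCase() === "xmlhttprequest";
    }
  }
  return false;
}

// Hypothetical dispatcher: Ajax updates get just the changed fragment,
// ordinary requests get the whole page.
function chooseView(headers) {
  return isAjaxRequest(headers) ? "fragment" : "full-page";
}
```

The same URL can then serve both the initial page render and the later updates.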

A useful presentation showing not only the possibilities offered by the Prototype library, but also how a developer thinks his way through implementing an Ajax driven widget.

Dan Webb on Managing Complexity

Perhaps the most inspired talk of this year's @media Ajax. Dan tackles the problem of dealing with pages laden heavily with Ajax features. He introduces what he calls the Behaviour pattern, which seeks to identify common behaviours on the page and create a loosely-coupled series of JavaScript classes to tackle each identified Behaviour.

On a typical Ajaxified page - from a music social network - Dan identifies common behaviours of Collapsible modules (using two different styles), Collection lists, Selections and Additions and Colour pickers.

Using jQuery and his own LowPro library, Dan attaches a Behaviour class to each element expressing a behaviour, and attaches the necessary event handlers to the element. He does this in a way where the Behaviour class knows the element it applies to, but not vice versa. This elegantly solves the memory leak problem of cross-references.

The end effect looks a marvel of simplicity. Encapsulate the generic behaviour in a JavaScript class, using a config option to deal with slight differences in that behaviour. The behaviour class then declares the events it is interested in and calls out to behaviour specific functions to do the work. The declarative nature of Dan's approach makes the resulting code quickly understandable.
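
The shape of the idea can be sketched without any library. This is not LowPro's API: `attachBehaviour`, the `events` map and the `Collapsible` class are all names made up for illustration, and the only thing assumed of the element is `addEventListener` and `classList`.

```javascript
// Wire a behaviour class to an element. The class declares which events it
// cares about in a static `events` map of event name -> method name.
function attachBehaviour(element, BehaviourClass, options) {
  const instance = new BehaviourClass(element, options);
  for (const eventName of Object.keys(BehaviourClass.events)) {
    const methodName = BehaviourClass.events[eventName];
    element.addEventListener(eventName, (event) => instance[methodName](event));
  }
  // Nothing is stored as a property on the element itself.
  return instance;
}

// One generic behaviour; the config option covers the slight differences
// between the collapsible modules on the page.
class Collapsible {
  constructor(element, options = {}) {
    this.element = element;
    this.collapsedClass = options.collapsedClass || "collapsed";
  }

  toggle() {
    this.element.classList.toggle(this.collapsedClass);
  }
}
Collapsible.events = { click: "toggle" };
```

A second style of collapsible module would reuse the same class with a different `collapsedClass`, rather than a second copy of the code.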

A profound presentation, one I cannot do justice to in this blog post. I hope Dan can spend some time writing up notes, or a few blog posts on this topic. I think the approach and underlying logic are right, and I feel it is a nice fit for websites that are modular in nature.

Yehuda Katz making jQuery code modular

Yehuda co-authored the book "jQuery in Action", still the only book on the market that covers jQuery 1.2. In it he explains why jQuery does things a certain way, which helps developers unlock the potential of jQuery.

Yehuda grokked JavaScript thanks to a Douglas Crockford talk where he referred to JavaScript as a "big giant hash in the sky", and prototypal inheritance seems natural when painted this way. Yehuda gives a quick explanation of event delegation in jQuery using listen and intercept.

In this talk, Yehuda takes us step by step through building an autocomplete search field in less than 100 lines of code (that's what he promised on the session writeup). Yehuda managed it in 92 lines of code, but isn't entirely happy with certain bits that he feels aren't elegant enough.

Yehuda spends time describing the different states of an autocomplete feature, and shows how namespaced custom events can be used as a powerful mechanism for reducing code complexity to maintainable levels.
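
To show why namespacing helps, here is a tiny stand-alone emitter that follows the "name.namespace" naming convention jQuery uses for events. It is a sketch written for this post, not jQuery's implementation and not Yehuda's code; the class and the event names are invented.

```javascript
// Minimal event registry with "name.namespace" handler names.
class NamespacedEvents {
  constructor() {
    this.handlers = [];
  }

  // on("shown.autocomplete", fn) registers fn for "shown" in that namespace.
  on(spec, fn) {
    const [name, namespace = ""] = spec.split(".");
    this.handlers.push({ name, namespace, fn });
  }

  // off(".autocomplete") removes every handler in the namespace;
  // off("shown") removes every "shown" handler whatever its namespace.
  off(spec) {
    const [name, namespace = ""] = spec.split(".");
    this.handlers = this.handlers.filter(
      (h) =>
        !((name === "" || h.name === name) &&
          (namespace === "" || h.namespace === namespace))
    );
  }

  // trigger("shown", data) calls every handler registered for "shown".
  trigger(name, data) {
    for (const h of this.handlers.slice()) {
      if (h.name === name) h.fn(data);
    }
  }
}
```

A widget can register all of its state handlers under one namespace and later remove them in a single call, without disturbing handlers that other code attached to the same events.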

He takes the time to explain the facets and features inside jQuery, including data persistence, custom events, and how state is handled with namespaced events.

To be honest, bits and pieces of the talk went way over my head, but Yehuda will be making the code available with the presentation on his site, so I'll have to spend some time working through the code and figuring out the bits I didn't grasp today.

Simon Willison when Ajax attacks

This was another brilliant and informative talk from Simon, who details the fundamentals of JavaScript security.

A few years ago, the shared wisdom about security could be boiled down to three tenets:

Never trust user inputs
Protect against SQL injection attacks
Don't let people inject JavaScript into your pages

The first two are boring, since they are solved problems on modern server-side platforms. SQL injection is largely dealt with by parameterised SQL and decent APIs. The third, today, is the most interesting.

XSS is cross site scripting. It is a security hole that allows an attacker to run his JavaScript code within your site. It is the most common vulnerability, and a single XSS hole is enough to compromise your site entirely (because of the same domain security policy).

This means that an attacker can gain access to cookies, call any JavaScript method, inject their own HTML (useful for phishing), retarget a form to collect login details, and embed malware and drive-by downloads. Basically the attacker can perform any action as the user.

There are two types of XSS. Reflected attacks rely on a link that contains JavaScript to be injected into the page; the user needs only to click on the link. Persistent attacks involve injecting JavaScript into a site database, through guestbooks and forums. Both boil down to displaying user-entered data.

Simon shows a competition based around finding XSS vulnerabilities ordered by pagerank. There are a few Yahoo domains listed there. But also, Simon shows that there are XSS vulnerabilities in Facebook, YouTube, Google Groups and Microsoft's live search.

There was recently a mass XSS via SQL injection, hitting a large number of sites running the same application code. The app was badly written to allow the injection of SQL.

Preventing XSS involves:

Use a tool that escapes everything on output
Then only unescape stuff you know is safe and only contains markup you want to execute
IE8 has an XSS filter that tracks potential XSS attempts and blocks them. But this is irrelevant to developers because IE8 isn't the only browser out there.
HttpOnly cookies can't be accessed by JavaScript - these are mostly a waste of time because they don't stop any evil stuff except the ability to copy the contents of cookies.
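
As a minimal illustration of "escape everything on output", the function below replaces the characters that are special in HTML text and attribute values. It is a sketch for element content and quoted attributes only; values placed inside script blocks, URLs or CSS need different handling.

```javascript
// Escape a value for insertion into HTML text or a quoted attribute.
// The ampersand must be replaced first so the other entities are not
// double-escaped.
function escapeHtml(value) {
  return String(value)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}
```

For example, escapeHtml('<script>alert("x")</script>') returns '&lt;script&gt;alert(&quot;x&quot;)&lt;/script&gt;', which the browser displays as text instead of running.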

HTML sanitisation is an extremely common vector of XSS. The best idea is to avoid needing to do it at all, because it is very, very difficult to get right.

The "samy is my hero" worm, from October 2005, is the famous MySpace worm that defeated the HTML sanitisation on the user profile page. Simon dissects the worm's code, showing the various techniques used to bypass the MySpace HTML sanitisation.

The worm itself infected MySpace exponentially: one user in the first hour, 221 friend requests after 7 hours, and over one million friend requests after 20 hours, which is when MySpace crashed. (It took 18 months to prosecute Samy.)

The flaws in MySpace that allowed this worm to propagate were:

Blacklisting HTML instead of whitelisting. This allowed an unknown attribute through the filter, and that attribute contained the bulk of the JavaScript code.
JavaScript code split mid-word by newlines also bypassed MySpace's filters. Internet Explorer glued the lines back together, which made the JavaScript ready to run.
The ultimate bad decision was to allow users to add whatever HTML, CSS and JavaScript they liked. MySpace found it difficult to remove this freedom without creating a user revolt.

Google's 404 page fell victim to a UTF-7 hole. At the time Google didn't set a default character set, so Internet Explorer tried to guess the character set by looking at the first 4000 bytes of the page. By inserting UTF-7 characters, the attacker was able to bypass Google's URL sanitisation and inject JavaScript into the page, which then ran. Google's policy is now to declare UTF-8 as the default character set on pages.

We can't trust CSS either. Samy used a url() value in the background property to run a JavaScript expression. HTC in IE and XBL in Mozilla can also be introduced through CSS, opening more vectors for attack. LiveJournal was attacked this way, which resulted in 10,000 accounts being stolen. They had to cut down on hosting user-authored CSS.

position: absolute was used to phish 30,000 MySpace user accounts by positioning a retargeted login form directly over the genuine login form. The attackers released the details of all the hacked accounts. The good news is that there's hope for humanity yet: although a number of MySpace user passwords were password1, most passwords did show signs of simple password protection, like not just using dictionary words.

CSRF: Cross Site Request Forgery. In 2005, when Google released the Google Web Accelerator, many people using web applications found they were losing data. This is because Google's accelerator prefetches GET URLs, and applications were using links to delete data. This caused quite a stir, especially with the 37signals group.

Apart from the argument of REST vs GET for everything, it was realised that there was a huge security risk already in using links for delete requests. A simple list of images with the src of those delete links is enough for an attacker to get the victim to delete his own data unwittingly.

Even hiding this delete behind a POST isn't enough protection, as a visitor can be tempted to click a button that triggers a form post that does the same thing without proper notification. Worse, the form can be submitted using JavaScript's form.submit(). With a hidden iframe, the victim won't know what's happened.

A few years ago, Digg had no CSRF protection on the "Digg this" button. This led to sites creating self-digging pages: programmatically hitting the "Digg this" URL without the visitor knowing.

Gmail had left a CSRF hole in their "Add filter" feature. This opened a backdoor for an attacker to add a filter rule to forward all received email to an attacker controlled email address. This hole was undiscovered for several months.

The main way to prevent CSRF is to be able to distinguish form actions that come from your site from form actions that come from an external site. Referrer checking is unreliable, since some anti-virus tools have been known to strip the referrer header from outgoing HTTP requests.

The main technique is to use a form token - called a crumb in Yahoo!. This is a hidden field on a form that contains a unique secret value that is only known to the host of the form. The server checks this crumb against a value known on the server side, and if it is valid, allows the request to proceed. The crumb is unique for every user, and it needs to change over time to prevent replay attacks. One method of generating a crumb is to derive it from a session cookie, for example as an md5 hash.
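
One possible way to build such a crumb on a Node.js server is sketched below. Note the swap: instead of a bare md5 of the session cookie it uses an HMAC-SHA256 keyed with a server-side secret, and it mixes in an hourly time window so that crumbs expire. The function names, the one-hour window and the choice to accept the previous window are all assumptions of this sketch, not something from Simon's talk.

```javascript
const crypto = require("crypto");

// Assumed lifetime of a crumb window: one hour.
const WINDOW_MS = 60 * 60 * 1000;

// Derive a crumb from a server-side secret, the user's session id and the
// current time window. Without the secret an attacker cannot compute it.
function makeCrumb(secret, sessionId, now = Date.now()) {
  const windowNumber = Math.floor(now / WINDOW_MS);
  return crypto
    .createHmac("sha256", secret)
    .update(sessionId + ":" + windowNumber)
    .digest("hex");
}

// Accept crumbs from the current and the previous window, so a form that
// was rendered just before the hour rolled over still submits.
function isValidCrumb(secret, sessionId, crumb, now = Date.now()) {
  const given = Buffer.from(String(crumb));
  const candidates = [
    makeCrumb(secret, sessionId, now),
    makeCrumb(secret, sessionId, now - WINDOW_MS),
  ];
  return candidates.some((expected) => {
    const expectedBytes = Buffer.from(expected);
    // timingSafeEqual requires equal lengths, so check that first.
    return (
      expectedBytes.length === given.length &&
      crypto.timingSafeEqual(expectedBytes, given)
    );
  });
}
```

The server would write makeCrumb(...) into the hidden field when rendering the form and call isValidCrumb(...) before acting on the submission.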

If you are not using crumbs now, you have a security hole you can drive a bus through.

Protecting the crumb becomes the vital link in protecting against CSRF. This means that an XSS vulnerability can allow the crumb to be stolen, so XSS holes are CSRF holes.

Because Ajax is restricted by the same-domain policy, and Ajax can set HTTP headers that regular forms can't, an X-Requested-With: XMLHttpRequest header can only come from your site. If you see it in the request, you could skip crumb checking for that request.

Plugins are security vulnerabilities. Simon talks about an exploit in PDF that allows the attacker to run any JavaScript code in the context of the domain the PDF is on. That means any domain that hosts a PDF is vulnerable to attack this way.

Flash's crossdomain.xml also opens up a vulnerability. This is mitigated slightly by moving API calls to a different domain to the website. Flickr for example has its API calls on api.flickr.com.

Flash can be tricked with fake crossdomain.xml files. One example Simon showed was a corrupted GIF file that contained a crossdomain.xml, which gave the attacker access to data on the target domain. Flash can fake the Ajax X-Requested-With HTTP header, and has access to crumbs, so it is definitely a vector for attack.

Simon spends time describing JSON-P, which is wrapping a JSON object in a function call, where the function is identified by a parameter on the request URI query string. He shows that if the JSON-P contains user sensitive information at a guessable URL, then that data is at risk. He shows a vulnerability in Google Docs which made a user's contact list publicly gettable by an attacker.
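
A sketch of what a JSON-P endpoint produces makes the risk easier to see. The function name is made up, and the check on the callback name is an extra precaution added here rather than something described in the talk.

```javascript
// Build a JSON-P response body: the JSON data wrapped in a call to the
// function named by the request's callback parameter.
function jsonpResponse(callbackName, data) {
  // Only allow a plain identifier, so the parameter cannot smuggle in
  // arbitrary script.
  if (!/^[A-Za-z_$][A-Za-z0-9_$]*$/.test(callbackName)) {
    throw new Error("invalid callback name");
  }
  return callbackName + "(" + JSON.stringify(data) + ");";
}
```

Any page on any domain can load such a URL with a script tag and define the callback itself. If the response is built from the visitor's cookies and the URL is guessable, the visitor's data is handed straight to that page.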

Even with JSON, Simon shows an exploit done by extending the Array object in JavaScript that writes out data within a JSON string.

In terms of users defending themselves, Simon talks about Firefox's NoScript extension which allows a whitelist of sites that can run JavaScript, and no JavaScript for the rest. This is enormously inconvenient for the user to enable scripting on a site by site basis, and it still doesn't protect the user from CSRF based on gettable URLs and clicking buttons.

As developers we need to stay informed. Simon offers the following resources:

One thing Simon has yet to find is a good tutorial or article for the best way of creating crumbs.

End of day two

The second day of @media Ajax turned out much better than the first, with lots of technical details and code to dive into and peruse. Overall, the conference was a decent opportunity to see what is happening in the Ajax world. To be honest, Simon stole the event with a remarkable and interesting presentation on security.
