Weblogs: Intelligent Agents

Proxy server headaches

Monday, March 24, 2003

I had a second go this weekend at creating an HTTP proxy that complies with the HTTP/1.1 specification. My first solution was a simple single-threaded loop ("read request, do request, send results back, repeat"), and it seemed to me that Internet Explorer wasn't reusing the Keep-Alive connection. It was also dire at handling HTTP requests: it had no way of knowing when a request from the browser was complete.
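To make the "is the request complete?" problem concrete, here is a minimal sketch of one way to check it. It assumes the request either has no body or declares one with a Content-Length header; chunked bodies are not handled. The class and method names (RequestFraming, completeRequestLength) are made up for illustration and are not from my proxy.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical helper: decides whether a buffer holds one complete HTTP request. */
final class RequestFraming {

    /**
     * Returns the length in bytes of the first complete request found in the
     * first len bytes of buf, or -1 if more bytes are still needed.
     * Assumption: bodies are framed by Content-Length only (no chunked encoding).
     * A malformed Content-Length value makes Integer.parseInt throw.
     */
    static int completeRequestLength(byte[] buf, int len) {
        // ISO-8859-1 maps bytes to chars one-to-one, so string indexes equal byte offsets.
        String text = new String(buf, 0, len, StandardCharsets.ISO_8859_1);

        // The header section ends at the first blank line.
        int headerEnd = text.indexOf("\r\n\r\n");
        if (headerEnd < 0) {
            return -1; // headers not fully received yet
        }
        int bodyStart = headerEnd + 4;

        // Look for a Content-Length header; no header means no body.
        int contentLength = 0;
        String[] lines = text.substring(0, headerEnd).split("\r\n");
        for (int i = 1; i < lines.length; i++) { // line 0 is the request line
            int colon = lines[i].indexOf(':');
            if (colon > 0
                    && lines[i].substring(0, colon).trim().equalsIgnoreCase("Content-Length")) {
                contentLength = Integer.parseInt(lines[i].substring(colon + 1).trim());
            }
        }

        int total = bodyStart + contentLength;
        return len >= total ? total : -1;
    }
}
```

The reading thread would call this after every read from the browser and only hand the request on once the result is no longer -1.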

The second solution was to separate the processing into two parts. One thread just read the HTTP request from the browser and built a simple Request object that knew when it had all the details available. When the full HTTP request had been received, it dumped the Request object into a Vector. A separate thread then went through the Vector, treating it as a queue, firing off each request, waiting for the response and sending that back to the browser. All seemed well on paper, but there's a nasty flaw: the second thread keeps polling the Vector with "any more?" requests, and for some odd reason it takes up to four seconds for an entry in the Vector to actually be seen by that thread. Hmmm. This is not good, although otherwise it worked quite nicely.

It also demonstrated how Internet Explorer uses multiple connections. Loading msn.com's front page used up to eight connections, each carrying anywhere from two to eight requests. The surprising thing was that Internet Explorer was requesting resources from different domain names through the same connection, so much for the Keep-Alive. Maybe that means the Keep-Alive is purely supposed to be between the browser and the proxy (of course, sending the HTTP header Proxy-Connection: Keep-Alive is probably an obvious indicator of that!).
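One possible way around the polling delay is to let the second thread block until a request arrives instead of repeatedly asking the Vector. The sketch below is not what my proxy does: it swaps the Vector for a java.util.concurrent.BlockingQueue, which assumes a JDK that ships that package, and it uses plain strings in place of the Request object. RequestHandOff and the STOP sentinel are made-up names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

/** Hypothetical hand-off between the reading thread and the forwarding thread. */
final class RequestHandOff {
    // Unique instance compared by identity, so no real request can be mistaken for it.
    private static final String STOP = new String("STOP");

    private final BlockingQueue<String> queue = new LinkedBlockingQueue<>();
    private final List<String> handled = new ArrayList<>();
    private Thread worker;

    /** Starts the consumer thread; it sleeps inside take() until work arrives. */
    void start() {
        worker = new Thread(() -> {
            try {
                while (true) {
                    String request = queue.take(); // blocks, no polling loop
                    if (request == STOP) {
                        break;
                    }
                    // A real proxy would forward the request and relay the response here.
                    synchronized (handled) {
                        handled.add(request);
                    }
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        worker.start();
    }

    /** Called by the reading thread once a request is complete. */
    void submit(String request) {
        queue.add(request); // unbounded queue, so add always succeeds
    }

    /** Asks the worker to finish the queued requests and waits for it to exit. */
    void shutdownAndWait() {
        queue.add(STOP);
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    /** Snapshot of the requests processed so far, in order. */
    List<String> handled() {
        synchronized (handled) {
            return new ArrayList<>(handled);
        }
    }
}
```

Because take() wakes the worker as soon as something is queued, there is no polling interval for an entry to sit in unnoticed.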

Java Examples in a Nutshell gives an example of a simple proxy using two anonymous threads, but it's not very bright: it doesn't look at what's coming through, so its ability to filter incoming content is severely restricted. Still, I guess I'll have to use that as a starting point and build up some HTTP intelligence. The solution is quite neat: one thread takes everything from the browser and passes it to the server, and the other carries traffic from the server back to the browser. There's no room there for the problem I had with the Vector above.
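The two-thread idea looks roughly like the sketch below. This is my own minimal version, not the code from the book; the Relay class and its method names are made up, and it works on plain streams so it can be tried without a network.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

/** Hypothetical two-way relay: one thread per direction, no inspection of the bytes. */
final class Relay {

    /** Copies bytes from in to out until end of stream, flushing after each read. */
    static void pump(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[4096];
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
            out.flush();
        }
    }

    /** Runs pump on its own thread; the thread ends when the stream closes or fails. */
    private static Thread start(InputStream in, OutputStream out) {
        Thread t = new Thread(() -> {
            try {
                pump(in, out);
            } catch (IOException e) {
                // Connection dropped; nothing more to relay in this direction.
            }
        });
        t.start();
        return t;
    }

    /** Relays browser-to-server and server-to-browser at the same time, then waits for both. */
    static void relayBothWays(InputStream fromBrowser, OutputStream toServer,
                              InputStream fromServer, OutputStream toBrowser) {
        Thread up = start(fromBrowser, toServer);
        Thread down = start(fromServer, toBrowser);
        try {
            up.join();
            down.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

With real connections the four arguments would be the input and output streams of the browser socket and the server socket. Any HTTP intelligence, such as filtering, would have to go inside pump, where the bytes pass through.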

The truly awkward bit is determining when to close the connection, since Internet Explorer doesn't explicitly ask for that to happen. A simple solution would be a time-out that starts after the last byte has been sent, with the counter stopped when a new request is received on a live connection. This gets a bit messy.
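The bookkeeping for that time-out could be as small as the sketch below. It is only an illustration of the idea: IdleTimer and its methods are made-up names, the caller supplies the current time in milliseconds, and I am assuming the clock also runs from the moment the connection is opened until the first request arrives.

```java
/** Hypothetical idle clock for one browser connection. */
final class IdleTimer {
    private static final long STOPPED = -1; // a request is in flight, so the clock is off

    private final long timeoutMillis;
    private long idleSince;

    /** The clock starts when the connection is opened (an assumption of this sketch). */
    IdleTimer(long timeoutMillis, long openedAtMillis) {
        this.timeoutMillis = timeoutMillis;
        this.idleSince = openedAtMillis;
    }

    /** A new request arrived on the live connection: stop the counter. */
    synchronized void requestReceived() {
        idleSince = STOPPED;
    }

    /** The last byte of a response was sent: restart the counter from now. */
    synchronized void responseSent(long nowMillis) {
        idleSince = nowMillis;
    }

    /** True once the connection has sat idle for at least the time-out. */
    synchronized boolean shouldClose(long nowMillis) {
        return idleSince != STOPPED && nowMillis - idleSince >= timeoutMillis;
    }
}
```

Something still has to call shouldClose periodically (or a socket read time-out has to stand in for it), which is where the messiness comes from.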

