More than anything, it's probably my lack of experience with multithreading, thread control, and socket programming that's largely to blame for my inability to solve this frustrating problem. There could be a number of factors that are causing this slowness:
So how do I resolve this? I can probably eliminate VAJ as the cause by running my application outside of it. Although I am logging to stdout at regular events, it's possible that the output is delayed, so it only looks like the problem is in the connection listener. It would be best to include a timestamp in each log entry, so I can determine from that whether the delay is where it seems to be.
As for the threading, I'm going to have to find better resources. I've winged my way through threading without actually understanding it, and without any idea of how time-slices are managed. I'm guessing (hoping) that time is allocated evenly across threads by default, but this may not be the case.
One practical solution is to use someone else's proxy server, like Surfboard. I had a brief glance at it, and although it is written completely in Java, it uses a shell script for start-up, which means it may not be all that trivial to run under Windows, especially when it comes to the location of the default configuration file.
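A minimal sketch of the timestamped logging idea. The class and method names (TimestampLog, format, log) are made up for illustration and are not part of the proxy. The timestamp is taken when the entry is created rather than when it reaches the console, so any delay in stdout shouldn't distort the picture. The thread name is included as well, on the assumption that it helps to see which thread was waiting.

```java
import java.time.LocalTime;
import java.time.format.DateTimeFormatter;

/** Hypothetical helper: prefixes each stdout log entry with a millisecond timestamp. */
final class TimestampLog {
    private static final DateTimeFormatter FORMAT =
            DateTimeFormatter.ofPattern("HH:mm:ss.SSS");

    private TimestampLog() {}

    /** Builds the entry text; kept separate from printing so it can be checked on its own. */
    static String format(LocalTime when, String message) {
        return "[" + when.format(FORMAT) + "] ["
                + Thread.currentThread().getName() + "] " + message;
    }

    /** Captures the time now, at the moment of the event, then prints the entry. */
    static void log(String message) {
        System.out.println(format(LocalTime.now(), message));
    }

    public static void main(String[] args) {
        log("listener: waiting for connection");
        log("listener: connection accepted");
    }
}
```

Comparing the timestamps of two neighbouring entries should show whether the gap really sits in the connection listener or somewhere else.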
Or maybe Java is just too slow for what I need or expect.
What sort of modifications and updates do we need to make:
So it's really about interacting with a knowledge base as if it were a text adventure. Locations become topics, for example Bruce Willis. Getting to the location about Bruce Willis by teleporting there directly is a little difficult. You'd have to start out with something broad, like "It's an actor", and gradually build up more detail: "The guy from Die Hard and Armageddon". Once you get there - to the location that is Bruce Willis - you'd be able to explore the nearby locations: his career, details about his life, his marriage, related websites, interesting information.
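To make the idea a bit more concrete, here is a rough sketch of narrowing down to a topic and then looking around it. Everything in it is an assumption for illustration: the TopicMap class, its methods, and the placeholder second topic are invented, and exact string matching on clues stands in for whatever real language handling would be needed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Hypothetical topic map: each topic has descriptive clues and nearby locations. */
final class TopicMap {
    private final Map<String, List<String>> clues = new LinkedHashMap<>();
    private final Map<String, List<String>> exits = new LinkedHashMap<>();

    void add(String topic, List<String> topicClues, List<String> topicExits) {
        List<String> lower = new ArrayList<>();
        for (String clue : topicClues) {
            lower.add(clue.toLowerCase(Locale.ROOT));
        }
        clues.put(topic, lower);
        exits.put(topic, topicExits);
    }

    /** Returns the topics that match every clue given so far. */
    List<String> narrow(List<String> cluesSoFar) {
        List<String> matches = new ArrayList<>();
        for (Map.Entry<String, List<String>> entry : clues.entrySet()) {
            boolean matchesAll = true;
            for (String clue : cluesSoFar) {
                if (!entry.getValue().contains(clue.toLowerCase(Locale.ROOT))) {
                    matchesAll = false;
                    break;
                }
            }
            if (matchesAll) {
                matches.add(entry.getKey());
            }
        }
        return matches;
    }

    /** Nearby locations that can be explored once a topic has been reached. */
    List<String> exitsFrom(String topic) {
        List<String> found = exits.get(topic);
        return found == null ? new ArrayList<String>() : found;
    }

    /** Sample data; the second topic is a placeholder, not a real entry. */
    static TopicMap sample() {
        TopicMap map = new TopicMap();
        map.add("Bruce Willis",
                Arrays.asList("actor", "die hard", "armageddon"),
                Arrays.asList("career", "life", "marriage", "related websites"));
        map.add("Some Other Actor (placeholder)",
                Arrays.asList("actor"),
                Arrays.asList("career"));
        return map;
    }

    public static void main(String[] args) {
        TopicMap map = sample();
        System.out.println(map.narrow(Arrays.asList("actor")));
        System.out.println(map.narrow(Arrays.asList("actor", "die hard")));
        System.out.println(map.exitsFrom("Bruce Willis"));
    }
}
```

The broad clue leaves more than one candidate; adding the second clue leaves a single location, whose exits are the things to explore next.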
Thankfully the more I think about this relationship between KR and IF, the more reasonable it is to see IF as nothing more than a shell in this instance - a command line. Ahh, I've been here before - when implementing a shell "API" for my proxy. Good! I was starting to worry about my mental state :-)
The second solution was to separate the processing into two bits. One thread just read in the HTTP request from the browser and built a simple Request object that knew when it had all the details available. When the full HTTP request was received, it dumped the Request object into a Vector. A separate thread then went through the Vector, treating it as a queue, firing off each request, waiting for the response, and sending that back to the browser. It all seemed well and good on paper, but there's a nasty flaw - the second thread keeps polling the Vector with "any more?" requests, and for some odd reason it takes up to four seconds for an entry in the Vector to actually be seen by the thread. Hmmm. This is not good - although it otherwise worked quite nicely. It also demonstrated how Internet Explorer uses multiple connections. A load of msn.com's front page used up to eight connections, each with anything from two to eight requests through it. The surprising thing was that Internet Explorer was requesting resources from different domain names through the same connection - so much for the Keep-Alive. Maybe that means that the Keep-Alive is purely supposed to be between the browser and the proxy (of course, sending the HTTP parameter Proxy-Connection: Keep-Alive is probably an obvious indicator of that!).
Java Examples in a Nutshell gives an example of a simple proxy, using two anonymous threads, but it's not very bright - it doesn't look at what's coming through, so its ability to filter incoming content is severely restricted. Still, I guess I'll have to use that as a starting point and build up some HTTP intelligence. The solution is quite neat: one thread takes everything from the browser and passes it to the server, while the other handles traffic from the server back to the browser. There's no room there for the problem I had with the Vector above.
The truly awkward bit is determining when to close the connection, since Internet Explorer doesn't explicitly ask for that to happen. A simple solution would be a time-out that starts after the last byte has been sent, stopping the counter when a new request is received on a live connection. This gets a bit messy.
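One common way to avoid polling a shared queue is to have the second thread sleep until the first one signals that something has arrived, using Java's built-in wait() and notifyAll(). The sketch below is only an illustration of that technique: RequestQueue and RequestQueueDemo are invented names, plain strings stand in for the Request objects, and I can't say for certain that it would cure the four-second delay seen with the Vector.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Queue;

/** Hypothetical stand-in for the shared Vector: the consumer blocks instead of polling. */
final class RequestQueue<T> {
    private final Queue<T> items = new ArrayDeque<>();

    /** Called by the reader thread once a complete request has been received. */
    synchronized void put(T request) {
        items.add(request);
        notifyAll(); // wake any thread blocked in take()
    }

    /** Called by the dispatcher thread; waits (releasing the lock) until an item exists. */
    synchronized T take() throws InterruptedException {
        while (items.isEmpty()) {
            wait();
        }
        return items.remove();
    }
}

final class RequestQueueDemo {
    /** Queues three fake requests and returns them in the order the dispatcher saw them. */
    static List<String> runDemo() {
        RequestQueue<String> queue = new RequestQueue<>();
        List<String> handled = Collections.synchronizedList(new ArrayList<String>());
        Thread dispatcher = new Thread(() -> {
            try {
                for (int i = 0; i < 3; i++) {
                    handled.add(queue.take());
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "dispatcher");
        dispatcher.start();
        queue.put("GET /a");
        queue.put("GET /b");
        queue.put("GET /c");
        try {
            dispatcher.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return handled;
    }

    public static void main(String[] args) {
        System.out.println(runDemo());
    }
}
```

Because both methods synchronize on the same object, an entry added by put() is visible to take() as soon as the waiting thread wakes up, with no "any more?" loop in between.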
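The two-thread arrangement can be sketched roughly as follows. This is my own minimal version, not the code from the book: the Pump class and the relay method are invented names, and it copies bytes blindly in each direction, which is exactly the "not very bright" behaviour described above.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.Socket;

/** Hypothetical one-directional relay: copies everything from one stream to another. */
final class Pump implements Runnable {
    private final InputStream in;
    private final OutputStream out;

    Pump(InputStream in, OutputStream out) {
        this.in = in;
        this.out = out;
    }

    @Override
    public void run() {
        byte[] buffer = new byte[4096];
        try {
            int count;
            // read() returns -1 once the other side has closed its end
            while ((count = in.read(buffer)) != -1) {
                out.write(buffer, 0, count);
                out.flush();
            }
        } catch (IOException e) {
            // In this sketch an I/O error simply ends the relay in this direction.
        }
    }

    /** Starts one thread per direction between an already connected browser and server. */
    static void relay(Socket browser, Socket server) throws IOException {
        new Thread(new Pump(browser.getInputStream(), server.getOutputStream()),
                "browser->server").start();
        new Thread(new Pump(server.getInputStream(), browser.getOutputStream()),
                "server->browser").start();
    }
}
```

Any HTTP intelligence would presumably go inside the copy loop, where the bytes could be inspected before being written on.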
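The bookkeeping for such a time-out might look something like this. IdleTracker and its methods are hypothetical names, and the clock is passed in as plain milliseconds so the logic can be followed without real timers. For the simpler case of a read that blocks too long, java.net.Socket also offers setSoTimeout, which makes a blocked read throw a SocketTimeoutException.

```java
/** Hypothetical idle tracker for one browser connection; all names are illustrative. */
final class IdleTracker {
    private final long timeoutMillis;
    private boolean requestInFlight; // true while a request is being served
    private long idleSinceMillis;    // when the last response byte went out

    IdleTracker(long timeoutMillis, long nowMillis) {
        this.timeoutMillis = timeoutMillis;
        this.idleSinceMillis = nowMillis;
    }

    /** A new request arrived on the live connection: stop the counter. */
    synchronized void requestStarted() {
        requestInFlight = true;
    }

    /** The last byte of the response has been sent: restart the counter. */
    synchronized void responseFinished(long nowMillis) {
        requestInFlight = false;
        idleSinceMillis = nowMillis;
    }

    /** True once the connection has been idle for at least the time-out. */
    synchronized boolean shouldClose(long nowMillis) {
        return !requestInFlight && nowMillis - idleSinceMillis >= timeoutMillis;
    }

    public static void main(String[] args) {
        IdleTracker tracker = new IdleTracker(5000, System.currentTimeMillis());
        tracker.requestStarted();
        tracker.responseFinished(System.currentTimeMillis());
        System.out.println("close now? " + tracker.shouldClose(System.currentTimeMillis()));
    }
}
```

Some housekeeping thread would still have to call shouldClose periodically and close the socket, which is where the messiness comes back in.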