Weblogs: Intelligent Agents

Laying the knowledgebase foundation

Wednesday, February 19, 2003

The great side effect of Chandler's discussion forums is that they spark off those little electrons in the brain, and that has a habit of getting me thinking. Thinking is a dangerous thing. Thinking causes itches - the sort of itches that, if scratched, result in new operating systems - the good old-fashioned developer's itch.

I took a few simple ideas and bundled them into a little pile of pieces; after adding more and more bits I ended up with a finished jigsaw. Unfortunately I saw the entire jigsaw, and the picture it contained. At that point I realised there is no such product, and of course once you realise there is no spoon... damn, I should have taken the blue pill.

So the product is a knowledge management, content management, semantically structured, topic-orientated, ontology-compatible tool for capturing information from email, websites and the troll play-pen called Usenet. It also needs to take advantage of an Internet connection, using Intelligent Agents to monitor updated sites for information I might be interested in (based on my interest profile, naturally). It needs to provide the services of a proxy server (so it can monitor and capture notes I make about websites and newsgroup postings), plus some sort of scheduling ability so it can fire up Intelligent Agents to trawl the web on my behalf. The application needs to be self-contained and portable, so it can run off a 340MB Microdrive, but it also needs to be networkable, so any machine on a network can update or browse its knowledge store.

So the design boils down to a client-server application. The client can be either a browser (requiring a web front end) or a specialised GUI, with a "command shell" as the fallback. The good thing is that since the bulk of the work is done in the server portion, the GUI can be either a cross-platform application or refined for the particular platform it is running on. So a generic client for Windows and Linux can be complemented with an optimised, cut-down version for the Sharp Zaurus - after all, what's the point of knowledge management that's stuck indoors and can't be consulted on the move (even better is using the Opera browser as the GUI!)? Running the app off the Microdrive allows laptops to be used immediately by just slotting in the PCMCIA card. And since the Microdrive is a Compact Flash type device, it plugs straight into the back of the Zaurus.

The server itself needs to be multithreaded, listening on multiple ports. The services need to be configurable on the fly; preferably all services should be dynamically loaded at run time. Java excels at dynamic class loading and instantiation, so it's the obvious way to pull this off. Running different services is nothing more than listening on different ports, although the service configuration needs to be file driven, so that each "environment" the knowledgebase runs in only starts the services it requires. The Zaurus, for instance, probably won't need the Intelligent Agent framework active, or the proxy server running, since there's no Internet connection - and we wouldn't want the batteries to depart for the great cell in the sky.
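To make that concrete, here's a minimal sketch of the dynamic loading idea. The Service interface, the ServiceLauncher class and the services.properties file are all my own inventions for illustration, not anything that exists yet:

    import java.io.FileInputStream;
    import java.util.Properties;

    // Hypothetical interface every pluggable service implements.
    interface Service {
        void start(int port) throws Exception;
    }

    public class ServiceLauncher {
        public static void main(String[] args) throws Exception {
            // services.properties maps a service class to the port it
            // listens on, e.g. "knowledgebase.ProxyService=8080"
            Properties config = new Properties();
            config.load(new FileInputStream("services.properties"));

            for (String className : config.stringPropertyNames()) {
                int port = Integer.parseInt(config.getProperty(className));
                // Only services named in the file get instantiated, so
                // the Zaurus configuration can simply omit the proxy
                // and the agent framework.
                Service service = (Service) Class.forName(className)
                        .getDeclaredConstructor().newInstance();
                service.start(port);
            }
        }
    }

Dropping a service from an environment is then just deleting a line from its properties file.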

The tricky bit is data storage. A database is a requirement (I don't know whether object databases will help here), but installing MySQL or PostgreSQL is out of the question: a database installed from one laptop onto the Microdrive won't simply work when the Microdrive is plugged into another machine. It has to be a box-independent, highly portable database format. Two solutions come to mind - DBF files (harking back to the dBase days, now called the XBase format), and native Java databases.

I especially like the idea of DBF databases, since you can treat them as databases in Java while they remain truly file-based. From a portability sense, that is nirvana. But being file-based, access times are a killer, so caching is probably a wise idea, with the drawback of memory usage.
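The DBF format is simple enough that even without a library you can get at the basics. A quick sketch, just reading the counts out of the header (no field parsing):

    import java.io.DataInputStream;
    import java.io.FileInputStream;
    import java.io.IOException;

    public class DbfPeek {
        public static void main(String[] args) throws IOException {
            DataInputStream in = new DataInputStream(
                    new FileInputStream(args[0]));
            byte[] header = new byte[12];
            in.readFully(header);
            in.close();

            // DBF headers are little-endian: bytes 4-7 hold the record
            // count, 8-9 the header length, 10-11 the record length.
            int records = (header[4] & 0xFF)
                        | (header[5] & 0xFF) << 8
                        | (header[6] & 0xFF) << 16
                        | (header[7] & 0xFF) << 24;
            int headerLen = (header[8] & 0xFF) | (header[9] & 0xFF) << 8;
            int recordLen = (header[10] & 0xFF) | (header[11] & 0xFF) << 8;

            System.out.println(records + " records, " + recordLen
                    + " bytes each, data starts at byte " + headerLen);
        }
    }

Fixed-width records at a known offset also make the caching story straightforward: a record's position in the file is pure arithmetic.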

The native Java database approach is tempting, and probably the preferred route if the DBF solution fails. InstantDB and TinySQL both look to be good Java-based implementations. The only disadvantage is that the data isn't as portable as the DBF format, so I'll need an export program (or service) to extract the data if I need to use external tools.
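The export would be plain JDBC. A rough sketch assuming InstantDB - though the driver class and connection URL are from memory, and the notes table is invented, so all three would need checking:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class ExportNotes {
        public static void main(String[] args) throws Exception {
            // Driver class and URL from memory - check against the
            // InstantDB documentation before trusting them.
            Class.forName("org.enhydra.instantdb.jdbc.idbDriver");
            Connection conn = DriverManager.getConnection(
                    "jdbc:idb:knowledgebase.prp");

            Statement stmt = conn.createStatement();
            ResultSet rs = stmt.executeQuery(
                    "SELECT topic, note FROM notes");
            while (rs.next()) {
                // Dump as tab-separated text - crude, but any
                // external tool can read it.
                System.out.println(rs.getString("topic")
                        + "\t" + rs.getString("note"));
            }
            conn.close();
        }
    }

Since it's all JDBC, the same export service should work unchanged if I swap InstantDB for TinySQL - only the driver class and URL change.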

I'd like multiple instances of the server running, so the client can connect to the right one, but with different locations replication of data is needed. I'm designing the file structure so that all the data and files belonging to one knowledgebase are self-contained under a single directory structure. That allows a user to quickly copy one directory (and all its little children) to another device as the simplest way of cloning a knowledgebase. Then somehow I need to keep track of what has changed in the two databases. A timestamped key log file sounds reasonable: a simple "if one changed, update the other" routine, which then lets the user merge the two changes if there is a conflict.
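Roughly what I have in mind for the log comparison. The log format here - one "key, tab, timestamp" line per change - is just a strawman:

    import java.io.BufferedReader;
    import java.io.FileReader;
    import java.io.IOException;
    import java.util.HashMap;
    import java.util.Map;

    public class LogMerge {
        // Read a change log of "key <tab> timestamp" lines, keeping
        // only the latest timestamp seen for each key.
        static Map<String, Long> readLog(String file) throws IOException {
            Map<String, Long> log = new HashMap<String, Long>();
            BufferedReader in = new BufferedReader(new FileReader(file));
            String line;
            while ((line = in.readLine()) != null) {
                String[] parts = line.split("\t");
                long ts = Long.parseLong(parts[1]);
                Long seen = log.get(parts[0]);
                if (seen == null || ts > seen) {
                    log.put(parts[0], ts);
                }
            }
            in.close();
            return log;
        }

        public static void main(String[] args) throws IOException {
            Map<String, Long> a = readLog(args[0]);
            Map<String, Long> b = readLog(args[1]);
            for (Map.Entry<String, Long> e : a.entrySet()) {
                Long other = b.get(e.getKey());
                if (other == null || other < e.getValue()) {
                    System.out.println("copy " + e.getKey() + " A -> B");
                } else if (other > e.getValue()) {
                    System.out.println("copy " + e.getKey() + " B -> A");
                }
                // Equal timestamps mean the entry is in sync. Spotting
                // a genuine conflict (changed on both sides) also needs
                // the timestamp of the last sync recorded somewhere.
            }
            for (String key : b.keySet()) {
                if (!a.containsKey(key)) {
                    System.out.println("copy " + key + " B -> A");
                }
            }
        }
    }

As the comment notes, "newest wins" alone can't tell an update from a conflict, so the log probably needs a last-synced marker before the user-merge step works properly.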

Two servers should be able to replicate over a network (that needs two services to handle it), but for redundancy one server should also be able to take two directory structures and do a Notes-style replication between them. That will be cool if it works.

Acting as a proxy server seems the best and most transparent way to let an Intelligent Agent framework watch my surfing habits and build an interest representation. Plus, by running a web server on a different port, it offers a way of annotating web pages. The server needs to cache the HTML pages (well, maybe all text/html or text/plain pages) so it doesn't have to retrieve a page again if I decide to annotate it. It should also keep a history of sites I've visited, and remind me when I request a page I've already seen that hasn't been modified.
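In spirit the proxy is just a thread-per-request relay with a hook in the middle. A bare-bones sketch - HTTP/1.0 GETs only, request headers ignored, no caching, error handling skipped:

    import java.io.*;
    import java.net.*;

    public class LoggingProxy {
        public static void main(String[] args) throws IOException {
            ServerSocket server = new ServerSocket(8080);
            while (true) {
                final Socket client = server.accept();
                // One thread per request, per the multithreaded design.
                new Thread(new Runnable() {
                    public void run() { handle(client); }
                }).start();
            }
        }

        static void handle(Socket client) {
            try {
                BufferedReader in = new BufferedReader(
                        new InputStreamReader(client.getInputStream()));
                // Proxy requests carry the absolute URL,
                // e.g. "GET http://example.com/ HTTP/1.0"
                String requestLine = in.readLine();
                if (requestLine == null) return;
                String url = requestLine.split(" ")[1];

                // This is where the interest profile gets fed: every
                // URL the browser asks for passes through here.
                System.out.println("visited: " + url);

                // Fetch the page and relay it; the cache lookup and
                // "seen this before" history check would slot in
                // around this call. The 200 is faked for brevity.
                InputStream body = new URL(url).openConnection()
                        .getInputStream();
                OutputStream out = client.getOutputStream();
                out.write("HTTP/1.0 200 OK\r\n\r\n".getBytes());
                byte[] buf = new byte[4096];
                int n;
                while ((n = body.read(buf)) > 0) out.write(buf, 0, n);
                out.close();
                client.close();
            } catch (IOException e) { /* drop the request */ }
        }
    }

A real browser sends more than this handles, but the shape is the point: one choke point that sees every URL, with caching and annotation hung off it.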

Usenet is a contentious one. I need something transparent there. The best I can come up with at the moment is a web-based interface that lets me paste in the message-IDs I want to archive, and then annotate them through the same interface. A more ambitious idea is for the server to listen on the SMTP port, hijacking outgoing emails: I could then just do a "reply to" which gets "delivered" to my knowledgebase instead of the actual poster. This won't affect my normal email usage, since I have all my email stored online and accessed via a web-based email reader anyway.
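The SMTP hijack only needs the barest subset of the protocol: greet, say "ok" to everything, and capture what comes after DATA. A sketch of the listener (the responses are minimal, and a real mail client may well demand more):

    import java.io.*;
    import java.net.*;

    public class NoteCatcher {
        public static void main(String[] args) throws IOException {
            // Port 25 is the real SMTP port; on unix an unprivileged
            // port such as 2525 is easier to test with.
            ServerSocket server = new ServerSocket(25);
            while (true) {
                Socket client = server.accept();
                BufferedReader in = new BufferedReader(
                        new InputStreamReader(client.getInputStream()));
                PrintWriter out = new PrintWriter(
                        client.getOutputStream());

                reply(out, "220 knowledgebase ready");
                StringBuilder message = new StringBuilder();
                boolean inData = false;
                String line;
                while ((line = in.readLine()) != null) {
                    if (inData) {
                        if (line.equals(".")) {   // end of message body
                            reply(out, "250 message stored");
                            inData = false;
                            // Hand the captured note off to the
                            // knowledgebase here.
                            System.out.println(message);
                        } else {
                            message.append(line).append('\n');
                        }
                    } else if (line.toUpperCase().startsWith("DATA")) {
                        reply(out, "354 end with a line containing only .");
                        inData = true;
                    } else if (line.toUpperCase().startsWith("QUIT")) {
                        reply(out, "221 bye");
                        break;
                    } else {
                        reply(out, "250 ok");   // HELO, MAIL FROM, RCPT TO
                    }
                }
                client.close();
            }
        }

        static void reply(PrintWriter out, String s) {
            out.print(s + "\r\n");   // SMTP wants CRLF line endings
            out.flush();
        }
    }

Point the mail client's outgoing server at this and "reply" becomes "file in the knowledgebase", with the RCPT TO address free to double as a topic name.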
