Weblogs: Intelligent Agents

Shining and Polishing

Monday, September 16, 2002

The second implementation, which actively seeks out textual content, is done, and theRegister parses rather nicely. I had to put back the rule that drops paragraphs with fewer than three words, in order to remove the menu titles. Initially I also had a weighting against bold and italic paragraphs, which dropped the sign-offs and author details on theRegister.
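
For illustration, the two heuristics mentioned above could be sketched roughly as follows. This is not the actual implementation: the `Paragraph` type, the `score` and `keep_content` functions, the penalty value and the threshold are all assumptions made up for the example. Only the three-word minimum and the idea of weighting against bold and italic paragraphs come from the post.

```python
from dataclasses import dataclass

MIN_WORDS = 3           # from the post: drop paragraphs with fewer than three words
EMPHASIS_PENALTY = 0.5  # assumed value; the post does not give the real weighting


@dataclass
class Paragraph:
    """Hypothetical container for one extracted paragraph."""
    text: str
    emphasized: bool = False  # True when the whole paragraph is bold or italic


def score(paragraph, penalise_emphasis=False):
    """Return a rough 'is this content?' score for a paragraph."""
    words = len(paragraph.text.split())
    if words < MIN_WORDS:
        return 0.0  # too short: most likely a menu title
    value = float(words)
    if penalise_emphasis and paragraph.emphasized:
        value *= EMPHASIS_PENALTY  # weighting against bold/italic paragraphs
    return value


def keep_content(paragraphs, threshold=1.0, penalise_emphasis=False):
    """Keep only the paragraphs whose score reaches the threshold."""
    return [p for p in paragraphs if score(p, penalise_emphasis) >= threshold]


page = [
    Paragraph("News"),
    Paragraph("Site Map"),
    Paragraph("The vendor shipped a patch on Monday."),
    Paragraph("By A. Reporter", emphasized=True),
]

# Word-count rule only: menu titles go, the italic byline stays.
plain = keep_content(page)

# With the emphasis weighting and a higher threshold, the byline goes too.
weighted = keep_content(page, threshold=2.0, penalise_emphasis=True)
```

In this sketch the short menu titles are removed by the word-count rule alone, while switching on the emphasis weighting also removes the byline, which is the effect described above.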

Now just a few refinements to put in:
