Victor Lesser, Bryan Horling, Frank Klassner, Anita Raja, Thomas Wagner, Shelley XQ. Zhang
UMass Computer Science Technical Report 1998-03
Effective information gathering on the WWW is a complex task requiring planning, scheduling, text processing, and interpretation-style reasoning about extracted data to resolve inconsistencies and to refine hypotheses about the data. This paper describes the rationale, architecture, and implementation of a next-generation information gathering system, one that integrates several areas of AI research under a single research umbrella. The goal of this system is to exploit the vast number of information sources available today on the National Information Infrastructure (NII), including a growing number of digital libraries, independent news agencies, and government agencies, as well as human experts providing a variety of services. The large number of information sources and their different levels of accessibility, reliability, and associated costs present a complex information gathering coordination problem. Our solution is an information gathering agent, BIG, that plans to gather information to support a decision process, reasons about the resource trade-offs of different possible gathering approaches, extracts information from both unstructured and structured documents, and uses the extracted information to refine its search and processing activities.