The integration of the different AI problem solvers in BIG, namely the RESUN planner, the Design-to-Criteria scheduler, and the CRYSTAL information extraction system, with each other and with the web retriever agents, the different data storage mechanisms, and the process modeling systems, is a major accomplishment in its own right. Integrating these systems and tools has enabled us to study them in a different light than is possible in a stand-alone research environment. For example, the software product domain, one of BIG's IG areas, is a new domain for the CRYSTAL extractor. It required new training and new methods for handling documents, e.g., reviews and product comparisons, that are structured differently from the genres of documents dealt with in the past (e.g., terrorist articles and medical reports). We also face an interesting extraction problem when dealing with products that are complementary to, rather than competitors of, the product being sought. For example, when searching for word processors BIG is likely to come across supplementary dictionaries, word processor tutorials, and even document exchange programs like Adobe Acrobat. These can mislead the extraction tools, and BIG in general, because they are referenced much like a competitor product, and the documents about them often contain terminology that further supports the notion that they are competitors rather than complementary products. We are experimenting with enhancements to our information extraction systems to cope with this, and we plan to use a tf/idf style document classifier [1] to prequalify documents before running the extraction system on them.
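To make the prequalification idea concrete, the following is a minimal sketch of a tf/idf style classifier that labels a document as describing a competitor or a complementary product before extraction is attempted. It is an illustration only, not the classifier of [1] or the one used in BIG: the nearest-centroid decision rule, the smoothed idf formula, the label names, and the four training snippets are all assumptions made for this example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def build_idf(docs):
    """Smoothed inverse document frequency for every term in the corpus."""
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(tokenize(doc)))
    return {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}


def tfidf_vector(text, idf):
    """Sparse tf*idf vector; terms unseen in training are ignored."""
    tf = Counter(tokenize(text))
    return {t: count * idf[t] for t, count in tf.items() if t in idf}


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def centroid(vectors):
    """Average of a list of sparse vectors."""
    total = Counter()
    for v in vectors:
        for t, w in v.items():
            total[t] += w
    return {t: w / len(vectors) for t, w in total.items()}


def train(labeled_docs):
    """Return (idf, {label: centroid}) from (text, label) pairs."""
    idf = build_idf([text for text, _ in labeled_docs])
    by_label = {}
    for text, label in labeled_docs:
        by_label.setdefault(label, []).append(tfidf_vector(text, idf))
    return idf, {label: centroid(vs) for label, vs in by_label.items()}


def classify(text, idf, centroids):
    """Assign the label whose centroid is most similar to the document."""
    vec = tfidf_vector(text, idf)
    return max(centroids, key=lambda label: cosine(vec, centroids[label]))


# Hypothetical training snippets, invented for this illustration.
TRAINING = [
    ("review of a word processor with spell checking and document formatting",
     "competitor"),
    ("this word processor offers templates, formatting and mail merge",
     "competitor"),
    ("supplementary dictionary add-on for your word processor",
     "complementary"),
    ("tutorial video teaching you how to use a word processor",
     "complementary"),
]

idf, centroids = train(TRAINING)
print(classify("tutorial dictionary add-on", idf, centroids))
```

In a pipeline of this shape, only documents labeled as competitors would be passed on to the extraction system; documents labeled complementary could be skipped or handled separately.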
We have also learned new things about the Design-to-Criteria scheduler and discovered some modeling problems in applying the TÆMS task modeling framework to this application. For example, in the information gathering task structures, search activities produce some number of documents to process, and document processing time is tied to this number. The final decision making process is likewise tied to the number of documents processed, because each processed document has some probability of leading to new information objects that must be considered at decision time. This dependency is data-driven, and TÆMS models only certain types of domain problem solving states. We have been able to model this task adequately using existing modeling constructs, but inaccuracies in the models sometimes lead to less-than-perfect expectations. The solution is the addition of a database resource in TÆMS that can record and model the state information pertaining to the number of documents retrieved, the number of documents processed, and the number of information objects to be considered at decision time. A secondary enhancement is the creation of new TÆMS non-local-effects to model soft task interactions, e.g., hinders and facilitates, that have an additive, rather than power-multiplier, effect.
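The data-driven dependency described above can be illustrated with a small sketch. It is not TÆMS and not the proposed database resource; it only shows, under simple linear assumptions, how the number of retrieved documents drives both processing time and decision time. The class name, function name, and all parameter values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class DocumentState:
    """Hypothetical record of the counts the text proposes to track."""
    retrieved: int = 0
    processed: int = 0
    info_objects: int = 0


def estimate(state, n_new_docs, secs_per_doc, p_new_object, secs_per_object):
    """Estimate downstream effort given n_new_docs documents to process.

    Assumptions (illustrative only): processing time is linear in the
    number of documents, each document independently yields a new
    information object with probability p_new_object, and decision time
    is linear in the number of information objects.
    """
    processing_time = n_new_docs * secs_per_doc
    expected_objects = state.info_objects + n_new_docs * p_new_object
    decision_time = expected_objects * secs_per_object
    return processing_time, expected_objects, decision_time


state = DocumentState(retrieved=10, processed=0, info_objects=2)
processing, objects, decision = estimate(
    state, n_new_docs=10, secs_per_doc=3.0,
    p_new_object=0.5, secs_per_object=2.0,
)
print(processing, objects, decision)
```

The point of the sketch is that the estimates for later activities cannot be fixed in advance; they depend on counts that only become known once the search activities have run.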
Another major integration issue is the balance between a top-down, end-to-end view of problem solving and a reactive, opportunistic view. These two views are embodied by the scheduler and the RESUN planner, respectively. The scheduler designs schedules to meet real-time and real-resource performance criteria by scheduling activities from start to finish. RESUN, on the other hand, is an opportunistic problem solver that responds to newly learned information and performs processing on whatever hypothesis seems most significant at a given time step. Currently, BIG uses little of RESUN's opportunistic control to react to changes in the problem solving state. We are working on integrating a two-way feedback loop among the planner, task assessor, and scheduler that will enable the system to react, where appropriate, to such changes. The major issue is identifying when it is beneficial to incur the cost of rescheduling BIG's planned actions and potentially disrupting finish time guarantees that have been communicated to the client. This tension between opportunistic, bottom-up, data-driven control and top-down, process-centric control is one of the major open questions in BIG, but it is also potentially our largest gain in terms of the ability to effectively retrieve, process, and make decisions with Web-based information. Relatedly, we also intend to study a slightly different view of BIG's control as an anytime process.
As we have discussed, the integration of these components in BIG, and the view of the IG problem as an interpretation task, has given BIG some very strong abilities. First, there is the issue of information fusion. BIG does not just retrieve documents. Instead, BIG retrieves information, extracts data from the information, and then combines the extracted data with data extracted from other documents to build a more complete model of the product at hand. RESUN's evidential framework enables BIG to reason about the sources of uncertainty (SOUs) associated with particular aspects of a product object and even to work to find corroborating or negating evidence to resolve the SOUs. BIG also learns from previous problem solving episodes and reasons about resource trade-offs. As shown, given different allotments of cost and time, and even different desired quality levels, BIG can analyze its options and plan to achieve the decision goal while meeting the client's search criteria. Though cost is not an issue spotlighted in the examples in this paper, cost on the web is a reality. For example, in the automotive product domain different sites charge different amounts for information such as invoice prices, and some sites are free but offer less timely and less precise information.
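The notion of information fusion can be sketched in a few lines. The example below merges field values extracted from several documents into one product model and flags fields that lack corroboration or have conflicting values. It is a deliberately simplified stand-in: it uses majority voting rather than RESUN's evidential reasoning, and the field names, values, and the `min_support` threshold are hypothetical.

```python
from collections import Counter, defaultdict


def fuse(extractions):
    """Merge per-document {field: value} dicts into one product model.

    For each field, keep the most frequently extracted value and record
    how many documents support it and how many disagree.
    """
    votes = defaultdict(Counter)
    for doc in extractions:
        for field, value in doc.items():
            votes[field][value] += 1

    model = {}
    for field, counter in votes.items():
        value, support = counter.most_common(1)[0]
        total = sum(counter.values())
        model[field] = {
            "value": value,
            "support": support,
            "conflicts": total - support,
        }
    return model


def uncertain_fields(model, min_support=2):
    """Fields with too little corroboration or with conflicting values."""
    return sorted(
        field for field, entry in model.items()
        if entry["support"] < min_support or entry["conflicts"] > 0
    )


# Hypothetical extractions from three documents about the same product.
docs = [
    {"price": 99, "platform": "Windows"},
    {"price": 99, "platform": "Mac"},
    {"price": 99},
]

model = fuse(docs)
print(model["price"])
print(uncertain_fields(model))
```

In this toy setting, the flagged fields play the role of unresolved uncertainty: they are the ones for which a system would seek further corroborating or negating evidence.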
In summary, we are excited by the BIG project. The integration of different AI systems in BIG is leveraging our technologies and providing us with new and fertile research ground while addressing the information explosion, a very real and important task.