Publication - Organization of the Hearsay-II Speech Understanding System

Authors: Lesser, V.R.; Fennell, R.D.; Erman, L.D. and Reddy, D.R.
Title: Organization of the Hearsay-II Speech Understanding System
Abstract: Hearsay II (HSII) is a system currently under development at Carnegie-Mellon University to study the connected speech understanding problem. It is similar to Hearsay I (HSI) in that it is based on the hypothesize-and-test paradigm, using cooperating independent knowledge sources communicating with each other through a global data structure (blackboard). It differs in the sense that many of the limitations and shortcomings of HSI are resolved in HSII. The main new features of the Hearsay II system structure are: 1) the representation of knowledge as self-activating, asynchronous, parallel processes, 2) the representation of the partial analysis in a generalized three-dimensional network (the dimensions being level of representation (e.g., acoustic, phonetic, phonemic, lexical, syntactic), time, and alternatives) with contextual and structural support connections explicitly specified, 3) a convenient modular structure for incorporating new knowledge into the system at any level, and 4) a system structure suitable for execution on a parallel processing system. The main task domain under study is the retrieval of daily wire-service news stories upon voice request by the user. The main parametric representations used for this study are 1/3-octave filter-bank and linear-predictive coding (LPC)-derived vocal tract parameters [10,11]. The acoustic segmentation and labeling procedures are parameter-independent [7]. The acoustic, phonetic, and phonological components [23] are feature-based rewriting rules which transform the segmental units into higher level phonetic units. The vocabulary size for the task is approximately 1200 words. This vocabulary information is used to generate word-level hypotheses from phonetic and surface-phonemic levels based on prosodic (stress) information. The syntax for the task permits simple English-like sentences and is used to generate hypotheses based on the probability of occurrence of that grammatical construct [19]. 
The semantic model is based on the news items of the day, analysis of the conversation, and the presence of certain content words in the partial analysis. This knowledge is to be represented as a production system. The system is expected to be operational on a 16-processor minicomputer system [3] being built at Carnegie-Mellon University. This paper deals primarily with the issues of the system organization of the HSII system.
Publication: IEEE Transactions on Acoustics, Speech, and Signal Processing, Vol: ASSP-23, Num: 1, pp. 11-24
Date: 1975
Sources: PDF: /Documents/lesser/LesserIEEE_75.pdf
Reference: Lesser, V.R.; Fennell, R.D.; Erman, L.D. and Reddy, D.R. Organization of the Hearsay-II Speech Understanding System. IEEE Transactions on Acoustics, Speech, and Signal Processing, Volume ASSP-23, Number 1, pp. 11-24. 1975.
bibtex:
@article{Lesser-280,
  author    = "V.R. Lesser and R.D. Fennell and L.D. Erman and
               D.R. Reddy",
  title     = "{Organization of the Hearsay-II Speech
               Understanding System}",
  journal   = "IEEE Transactions on Acoustics, Speech, and Signal
               Processing",
  volume    = "ASSP-23",
  number    = "1",
  pages     = "11--24",
  year      = "1975",
  url       = "http://mas.cs.umass.edu/paper/280",
}
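
Illustrative sketch (not from the paper): the abstract describes knowledge sources that activate themselves and communicate only through a shared blackboard, with hypotheses organized by level, time, and alternatives and with support connections recorded explicitly. The Python below is a minimal, single-threaded approximation of that idea. Every name in it (Hypothesis, Blackboard, KnowledgeSource, run, TOY_LEXICON, the phone string "n-uw-z", and all scores) is a hypothetical stand-in, and the sequential loop only imitates what the abstract describes as asynchronous, parallel processes.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Levels of representation named in the abstract.
LEVELS = ["acoustic", "phonetic", "phonemic", "lexical", "syntactic"]


@dataclass
class Hypothesis:
    level: str        # one of LEVELS
    start: int        # time span covered (arbitrary units, assumed)
    end: int
    label: str
    score: float
    # Explicit support links to lower-level hypotheses.
    support: List["Hypothesis"] = field(default_factory=list)


class Blackboard:
    """Global data structure shared by all knowledge sources."""

    def __init__(self) -> None:
        self.hyps: List[Hypothesis] = []
        self.pending: List[Hypothesis] = []  # posted but not yet examined

    def post(self, hyp: Hypothesis) -> None:
        self.hyps.append(hyp)
        self.pending.append(hyp)

    def alternatives(self, level: str, start: int, end: int) -> List[Hypothesis]:
        # Competing hypotheses at the same level over the same time span.
        return [h for h in self.hyps
                if h.level == level and h.start == start and h.end == end]


@dataclass
class KnowledgeSource:
    name: str
    precondition: Callable[[Hypothesis], bool]          # when to self-activate
    action: Callable[[Blackboard, Hypothesis], None]    # what to post


def run(bb: Blackboard, sources: List[KnowledgeSource]) -> None:
    # Sequential stand-in for parallel activation: each new hypothesis is
    # offered to every knowledge source whose precondition it satisfies.
    while bb.pending:
        hyp = bb.pending.pop(0)
        for ks in sources:
            if ks.precondition(hyp):
                ks.action(bb, hyp)


# Toy lexicon, invented for this sketch only.
TOY_LEXICON = {"n-uw-z": [("news", 0.9), ("knew", 0.4)]}


def word_action(bb: Blackboard, hyp: Hypothesis) -> None:
    # Hypothesize words over the same span, supported by the phonetic hypothesis.
    for word, score in TOY_LEXICON.get(hyp.label, []):
        bb.post(Hypothesis("lexical", hyp.start, hyp.end, word, score, support=[hyp]))


def demo() -> Blackboard:
    bb = Blackboard()
    word_ks = KnowledgeSource("toy-word-hypothesizer",
                              lambda h: h.level == "phonetic",
                              word_action)
    bb.post(Hypothesis("phonetic", 0, 3, "n-uw-z", 0.8))
    run(bb, [word_ks])
    return bb


if __name__ == "__main__":
    for h in demo().alternatives("lexical", 0, 3):
        print(h.label, h.score, [s.label for s in h.support])
```

In this sketch a new knowledge source is added by appending another KnowledgeSource to the list passed to run, without changing existing ones, which loosely mirrors the modularity the abstract emphasizes.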