As speech recognition technology is transferred from the laboratory to the marketplace, robustness in recognition is becoming increasingly important. Robustness in speech recognition refers to the need to maintain good recognition accuracy even when the quality of the input speech is degraded, or when the acoustical, articulatory, or phonetic characteristics of speech in the training and testing environments differ. Obstacles to robust recognition include acoustical degradations produced by additive noise, linear filtering, nonlinearities in transduction or transmission, and impulsive interfering sources, as well as diminished accuracy caused by changes in articulation produced by the presence of high-intensity noise sources. Although progress over the past decade has been impressive, there are significant obstacles to be overcome before speech recognition systems can reach their full potential. Automatic speech recognition (ASR) systems must be robust at all levels, so that they can handle background or channel noise, the occurrence of unfamiliar words, new accents, new users, or unanticipated inputs. They must exhibit more “intelligence” and integrate speech with other modalities, deriving the user’s intent by combining speech with facial expressions, eye movements, gestures, and other input features, and communicating back to the user through multimedia responses.
The aim of this e-book series is to bring together many different aspects of current research on robust automatic speech recognition and speech technology. The book is divided into four sections: i) voice activity detection, ii) speech enhancement, iii) speech recognition, and iv) emerging applications. Section i consists of four papers dealing with model-based techniques, GARCH processes, contextual or long-term information for voice activity detection, and noise suppression for robust speech recognition. Section ii consists of three papers: an in-depth review of the state of the art in speech enhancement, an independent component analysis technique for speech enhancement, and statistical model-based techniques for speech enhancement and robust speech recognition. Section iii consists of six chapters that analyze different techniques, including Bayesian networks, missing features, distribution-based feature compensation, and multiple regression, for improving the robustness of speech recognition systems in noisy environments and for single- and multi-channel speech recognition. Section iv consists of a single paper showing advances in human-machine systems for in-vehicle environments.
The e-book “Recent advances in robust speech recognition technology” is aimed at a wide audience, including: i) researchers, professionals, and technical experts working in the fields of robust speech recognition, speech enhancement, and speech/music detection in noise; ii) the entire signal processing and communications community interested in processing and transmitting speech and music for next-generation multimedia applications; and iii) technical experts requiring an understanding of speech/music transmission and recognition in noise over mobile and other networks, as well as postgraduate students working on robust speech/music processing and transmission.
One of the key benefits of this e-book is that readers will have access to novel research topics ranging from speech enhancement, robust speech recognition, and voice activity detection to their application in demanding scenarios such as in-vehicle speech management and robustness. All these topics are covered in depth and in a more illustrated fashion than is usual in journal articles.
We would like to express our gratitude to all the contributing authors who have made this book a reality. We would also like to thank Dr. Acero for writing the foreword and Bentham Science Publishers, particularly Manager Asma Ahmed, for their support and efforts.
Javier Ramírez and Juan Manuel Górriz
University of Granada, Spain