RWCP Autonomous Learning Functions MRI Laboratory

Introduction



Objective

Practical speech recognition software has recently appeared on the market. However, we are surrounded by many sounds other than speech. These non-speech sounds carry rich information, such as the impact or friction of objects or the condition of machines. Such information is also useful for computers and robots that must make decisions, learn, and move autonomously.

Non-speech sounds also offer a way to enhance human-computer communication. Anyone can make sounds easily: clapping hands, whistling, tapping a can or a bottle, or ringing a bell. Non-speech sounds let us give commands to a computer and respond to it without special input devices or training.

We are therefore developing non-speech sound recognition technology.


Themes

Non-speech Sound Recognition

Our research starts by examining (1) the relationship between non-speech sounds and Japanese onomatopoeia (words that imitate sounds) and (2) the spectral structure of typical non-speech sounds. We classified non-speech sounds into several categories.

We are developing (3) non-speech sound recognition technologies suited to individual categories, such as short impulsive sounds and broadband or narrow-band continuous sounds. Our target is to recognize more than ten types of non-speech sounds (e.g., hand claps, whistles, impacts on cans or bottles, ringing bells, and ringing telephones), either by their source names or as onomatopoeia, within 500 ms.
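As an illustration of the category split described above, the sketch below sorts a clip (500 ms in the example) into short impulsive, broadband continuous, or narrow-band continuous sounds using two simple features: how long the signal stays near its peak energy, and how flat its power spectrum is (spectral flatness). It is a hypothetical first stage written with NumPy, not the laboratory's recognizer; the function names, the 10 ms frame, the -20 dB activity threshold, the 50 ms impulse limit, and the 0.3 flatness threshold are all assumptions.

```python
import numpy as np


def spectral_flatness(x: np.ndarray) -> float:
    """Geometric mean / arithmetic mean of the power spectrum.
    Near 0 for tonal (narrow-band) sounds, larger for noise-like (broadband) sounds."""
    power = np.abs(np.fft.rfft(x)) ** 2 + 1e-12  # small floor avoids log(0)
    return float(np.exp(np.mean(np.log(power))) / np.mean(power))


def active_duration_s(x: np.ndarray, fs: int, frame: int = 480,
                      rel_db: float = -20.0) -> float:
    """Total time (s) of frames whose energy is within rel_db of the loudest frame.
    frame=480 samples is 10 ms at 48 kHz (assumed frame length)."""
    n_frames = len(x) // frame
    frames = x[: n_frames * frame].reshape(n_frames, frame)
    energy_db = 10 * np.log10(np.mean(frames ** 2, axis=1) + 1e-12)
    active = energy_db > energy_db.max() + rel_db
    return float(np.count_nonzero(active) * frame / fs)


def coarse_category(x: np.ndarray, fs: int, impulse_max_s: float = 0.05,
                    flatness_min: float = 0.3) -> str:
    """Hypothetical first-stage split before a category-specific recognizer."""
    if active_duration_s(x, fs) <= impulse_max_s:
        return "impulse"
    if spectral_flatness(x) >= flatness_min:
        return "broadband continuous"
    return "narrow-band continuous"


if __name__ == "__main__":
    fs = 48_000
    t = np.arange(fs // 2) / fs                # a 500 ms analysis window
    click = np.zeros_like(t)
    click[1000] = 1.0                          # single-sample impulse
    tone = np.sin(2 * np.pi * 1000 * t)        # whistle-like pure tone
    noise = np.random.default_rng(0).standard_normal(t.size)  # hiss-like noise
    for name, sig in (("click", click), ("tone", tone), ("noise", noise)):
        print(name, coarse_category(sig, fs))
```

Spectral flatness is used here only because it is a common, cheap measure of how noise-like a spectrum is; the actual category-specific recognizers described above may rely on quite different features.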

Our research also includes (4) beamforming technologies that use a microphone array to estimate the positions of sound sources and to emphasize sound arriving from a specific direction. Our target, using a 16-channel circular microphone array, is to estimate the directions of at least two sound sources simultaneously and to improve the SNR (signal-to-noise ratio) by 20 dB for a source separated by 30 degrees from another sound source.
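Emphasizing sound from one direction can be illustrated with the simplest beamformer, delay-and-sum, on a 16-channel uniform circular array. This is a hypothetical sketch, not necessarily the technique used in this research: it assumes far-field plane waves in the array's horizontal plane, a 0.1 m array radius, a 343 m/s speed of sound, and a single frequency component (2 kHz by default), and the function names are illustrative. It computes how strongly a beam steered at one azimuth passes a source arriving from another azimuth, for example 30 degrees away.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s; assumed value for air at roughly 20 degrees C


def steering_vector(theta_deg: float, n_mics: int = 16, radius_m: float = 0.1,
                    freq_hz: float = 2000.0) -> np.ndarray:
    """Phase terms of a far-field plane wave from azimuth theta_deg at each microphone
    of a uniform circular array (hypothetical geometry: n_mics on a circle of radius_m)."""
    mic_angles = 2 * np.pi * np.arange(n_mics) / n_mics
    theta = np.deg2rad(theta_deg)
    # Arrival-time advance of each microphone relative to the array centre.
    tau = radius_m * np.cos(theta - mic_angles) / SPEED_OF_SOUND
    return np.exp(2j * np.pi * freq_hz * tau)


def delay_and_sum_gain(look_deg: float, source_deg: float, **array_kw) -> complex:
    """Complex gain of a delay-and-sum beam steered at look_deg for a source at source_deg."""
    w = steering_vector(look_deg, **array_kw)
    w = w / len(w)  # normalise so the look direction has unit gain
    # np.vdot conjugates w, which phase-aligns the microphone signals before summing.
    return complex(np.vdot(w, steering_vector(source_deg, **array_kw)))


def attenuation_db(look_deg: float, source_deg: float, **array_kw) -> float:
    """Level of the source relative to the look direction, in dB (0 dB = no attenuation)."""
    return float(20 * np.log10(abs(delay_and_sum_gain(look_deg, source_deg, **array_kw))))


if __name__ == "__main__":
    for f in (500.0, 1000.0, 2000.0, 4000.0):
        level = attenuation_db(0.0, 30.0, freq_hz=f)
        print(f"{f:6.0f} Hz: source 30 deg off the look direction at {level:6.1f} dB")
```

Running the loop for several frequencies shows that the rejection of a source 30 degrees away depends strongly on frequency: a plain delay-and-sum beam on a small array is wide at low frequencies, which is one reason more elaborate (for example adaptive or superdirective) beamformers are studied.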

We are also working on applications of the non-speech sound recognition and beamforming technologies: (5) an interactive sound action game and (6) a non-speech sound recognition module for ETL's Jijo-2 robot.

Real World Speech and Acoustic Database

We are developing the Real World Speech and Acoustic Database, which includes a variety of non-speech sound sources and sound data recorded with microphone arrays, for research such as non-speech sound recognition and beamforming.

We collect (1) non-speech sound source data in an anechoic room at a high-quality 48 kHz sampling rate. Sound data recorded in an anechoic room is called a "dry source": since it contains no room reverberation, the sound as it would be heard in various rooms can be reproduced by convolving it with each room's impulse response.
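The reason a dry source can be reused for many rooms is that, treating a room as a linear time-invariant system, the sound heard in it is the dry signal convolved with the room's impulse response. A minimal NumPy sketch of that step follows; `reverberate` and `synthetic_rir` are illustrative names, and the exponentially decaying noise used as the impulse response (with an assumed 0.5 s reverberation time) is a hypothetical stand-in for a response measured in a real room.

```python
import numpy as np


def reverberate(dry: np.ndarray, rir: np.ndarray) -> np.ndarray:
    """Linear convolution of a dry (anechoic) recording with a room impulse response,
    computed with zero-padded FFTs."""
    n_out = len(dry) + len(rir) - 1
    # Pad to the next power of two >= n_out so the FFT product has no circular wrap-around.
    n_fft = 1 << (n_out - 1).bit_length()
    spectrum = np.fft.rfft(dry, n_fft) * np.fft.rfft(rir, n_fft)
    return np.fft.irfft(spectrum, n_fft)[:n_out]


def synthetic_rir(fs: int = 48_000, rt60_s: float = 0.5, seed: int = 0) -> np.ndarray:
    """Hypothetical stand-in for a measured room impulse response: a direct-path
    impulse followed by noise whose amplitude decays by 60 dB over rt60_s."""
    rng = np.random.default_rng(seed)
    t = np.arange(int(fs * rt60_s)) / fs
    rir = 0.1 * rng.standard_normal(t.size) * 10 ** (-3 * t / rt60_s)
    rir[0] = 1.0  # direct sound
    return rir


if __name__ == "__main__":
    fs = 48_000
    dry = np.random.default_rng(1).standard_normal(fs)  # 1 s placeholder for a dry recording
    wet = reverberate(dry, synthetic_rir(fs))
    print(len(dry), len(wet))  # the wet signal is longer by the reverberation tail
```

FFT-based convolution is used because room impulse responses at 48 kHz often run to tens of thousands of samples, where direct convolution becomes slow; `np.convolve` gives the same result and is fine for short signals.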

(2) A microphone array, in which multiple microphones are arranged in a circle or on a sphere, can achieve superdirectivity toward any desired direction. We have started by measuring the fundamental characteristics of microphone arrays. Next, we will precisely measure stationary, moving, and multiple sound sources. Real sound scenes will also be measured.

The database will be made available to the academic community on CD-ROM or DVD-ROM.
See the Real World Speech and Acoustic Database homepage (in Japanese).



RWCP Autonomous Learning Functions MRI Laboratory
Otemachi 2-3-6, Chiyoda, Tokyo, Japan, 100-8141
(in the Information Research Center, Mitsubishi Research Institute, Inc.)
Director: Kazuo Hiyane <hiya@mri.co.jp>
TEL: +81-3-3277-0750   FAX: +81-3-3277-3471