RWCP Sound Scene Database in Real Acoustical Environments
Sound interval detection for environment sound convolution speech

Kazuo Hiyane and Jun Iio, Mitsubishi Research Institute, Inc.

1. Convolution of the dry source with the impulse response

The "dry source" is sound data recorded in an anechoic room. Reverberation (reflected sound) and background sound, which are problematic in ordinary rooms and outdoors, are almost absent in an anechoic room, so only the sound reaching the microphone directly from the sound source is measured. In other words, the dry source represents the original sound generated by the sound source.

The main advantage of the dry source is reproducibility: it can be used to reconstruct how the sound would be heard in a specified room. If the acoustic characteristics of the room (its impulse response) have been measured, convolving the dry source with that impulse response adds the reverberation as if the sound had occurred in the room.

The principle is very simple. If the dry source of clapping and the impulse response of a certain room are written as x(t), t = 1, ..., N and w(i), i = 1, ..., M, respectively, the sound of clapping in the room can be reconstructed by the following formula, where x(t-i) is taken as zero whenever t-i falls outside 1, ..., N.

y(t) = \sum_{i=1}^{M} x(t-i) \, w(i)


Dry source of clapping sound (top), impulse response (middle) and reconstructed signal (bottom).
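
As a minimal sketch of the formula, the direct summation can be written in a few lines of Python. The function name direct_convolve and the sample values are hypothetical and chosen only so that the result can be checked by hand. Lists are indexed from 0 here, so the output agrees with the formula up to the choice of index origin, and samples outside the recorded range are assumed to be zero.

```python
def direct_convolve(x, w):
    """Direct-form convolution of dry source x with impulse response w.

    Uses 0-based lists; samples of x outside its recorded range are
    treated as zero (an assumption, not stated by the formula itself).
    """
    n, m = len(x), len(w)
    y = [0.0] * (n + m - 1)
    for t in range(n):        # each dry-source sample ...
        for i in range(m):    # ... adds a scaled, delayed copy of the impulse response
            y[t + i] += x[t] * w[i]
    return y


# Hypothetical toy signals, small enough to verify by hand.
dry = [1.0, 0.0, 0.5]
impulse = [1.0, 0.5]
print(direct_convolve(dry, impulse))  # prints [1.0, 0.5, 0.5, 0.25]
```

The two nested loops perform N x M multiplications, which is what makes this form impractical for real recordings.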

However, even a dry source of only 1 second sampled at 48 kHz contains 48,000 data points, and the impulse response is normally tens of thousands of points long, so direct evaluation requires on the order of a billion multiplications. The following section presents a high-speed calculation using the Fast Fourier Transform (FFT), followed by a convolution program.

2. High speed convolution calculations by Fast Fourier Transform

Simple case

When the dry source is shorter than the impulse response (N < M), the convolution can be computed with a single set of Fourier transforms. For simplicity, the data length M of the impulse response is assumed to be a power of 2.

  1. Zeros are appended to the dry source x(t) to extend its data length from N to 2M.
    Zeros are appended to the impulse response w(t) to extend its data length from M to 2M.
  2. The dry source and the impulse response are each subjected to the Fourier transform (F[*]).
         X(f) = F[x(t)]   f = 1,...,2M
         W(f) = F[w(t)]   f = 1,...,2M
    
  3. The complex product is taken at each frequency to obtain the Fourier transform of the reproduced signal.
         Y(f) = X(f) W(f)   f = 1,...,2M
    
  4. The reproduced signal is obtained by the inverse Fourier transform (F-1[*]).
         y(t) = F-1[Y(f)]   t = 1,...,2M
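
The four steps above can be sketched in Python using only the standard library. This is an illustration under stated assumptions, not the code of the distributed program: the recursive radix-2 fft function and the name fft_convolve are hypothetical, and the inverse transform is scaled by 1/2M explicitly.

```python
import cmath


def fft(a, inverse=False):
    """Recursive radix-2 FFT; len(a) must be a power of 2. The inverse is left unscaled."""
    n = len(a)
    if n == 1:
        return list(a)
    even = fft(a[0::2], inverse)
    odd = fft(a[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def fft_convolve(x, w):
    """Convolve x (length N <= M) with w (length M, a power of 2) at FFT size 2M."""
    m = len(w)
    if len(x) > m:
        raise ValueError("dry source must not be longer than the impulse response")
    size = 2 * m
    # Step 1: zero-pad both signals to length 2M.
    xp = list(x) + [0.0] * (size - len(x))
    wp = list(w) + [0.0] * (size - m)
    # Step 2: forward transforms.
    X, W = fft(xp), fft(wp)
    # Step 3: complex product at each frequency.
    Y = [a * b for a, b in zip(X, W)]
    # Step 4: inverse transform, scaled by 1/size; the imaginary part is rounding noise.
    return [v.real / size for v in fft(Y, inverse=True)]


# Hypothetical toy signals; the impulse response is padded to M = 4.
print([round(v, 6) for v in fft_convolve([1.0, 0.0, 0.5], [1.0, 0.5, 0.0, 0.0])])
```

Padding to 2M matters because the product of transforms yields a circular convolution. With N + M - 1 < 2M, the wrapped-around part falls entirely on zeros, so the result equals the ordinary convolution.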
    

Case where data length of dry source is large

The impulse response is at most 1 to 2 seconds long, while the dry source can run to ten seconds or more. In theory, the method described above still applies if zeros are appended to the impulse response until its length equals that of the dry source before the Fourier transform. However, computing over the appended zeros is inefficient, and computer memory is limited. It is therefore preferable to divide the dry source into blocks for calculation.

The steps of the divided convolution are as follows.

  1. Divide the dry source into blocks of length M.
  2. For each block, apply the Fourier transform, the complex product, and the inverse Fourier transform as described above to calculate a reproduced signal of length 2M.
  3. Add all the reproduced signals together, shifting each successive one by M (the overlap-add method).
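
A minimal Python sketch of these three steps follows. The function names fft and overlap_add are hypothetical, the compact recursive transform is included only to keep the example self-contained, and the code is not taken from the distributed program. It assumes the impulse response length M is a power of 2.

```python
import cmath


def fft(a, sign=-1):
    """Unscaled recursive radix-2 transform; sign=-1 is forward, sign=+1 is inverse."""
    n = len(a)
    if n == 1:
        return list(a)
    even, odd = fft(a[0::2], sign), fft(a[1::2], sign)
    tw = [cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k] for k in range(n // 2)]
    return ([even[k] + tw[k] for k in range(n // 2)]
            + [even[k] - tw[k] for k in range(n // 2)])


def overlap_add(x, w):
    """Convolve a long dry source x with impulse response w (length M, a power of 2)."""
    m = len(w)
    size = 2 * m
    W = fft(list(w) + [0.0] * m)              # impulse response transform, computed once
    y = [0.0] * (len(x) + m - 1)
    for start in range(0, len(x), m):         # step 1: blocks of length M
        block = list(x[start:start + m])
        block += [0.0] * (size - len(block))  # zero-pad each block to 2M
        Y = [a * b for a, b in zip(fft(block), W)]
        seg = [v.real / size for v in fft(Y, sign=+1)]  # step 2: 2M-point block output
        for k, v in enumerate(seg):           # step 3: add, shifted by M per block
            if start + k < len(y):
                y[start + k] += v
    return y


# Hypothetical toy data: a 10-sample source and a 4-sample impulse response.
source = [float(i % 3) for i in range(10)]
response = [1.0, 0.5, 0.25, 0.125]
print([round(v, 6) for v in overlap_add(source, response)])
```

Because the transform of the impulse response is reused for every block, memory use stays proportional to M regardless of how long the dry source is.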

3. Software

A program for convolving the dry source with the impulse response is provided. It has been confirmed to work on RedHat Linux 6.2, and it may also compile and run in a general UNIX environment. The software is distributed under the terms of the GPL (GNU General Public License).

Installation

In a glibc-2.1 Linux environment such as RedHat Linux 6.x, simply copy the "dryconv" executable to a directory on the execution path, such as /usr/local/bin.

To compile from the source file "dryconv.c", the GNU Scientific Library (GSL) for numerical analysis is required.

After the GSL has been installed, copy the source file to a suitable directory and compile it as follows.
    $ gcc -O2 -o dryconv dryconv.c -lgsl -lgslblasnative

Usage

  usage : dryconv [option] dryfile impulsefile [outfile]

  (necessaries)
    dryfile     : dry source file        - 16bit int
    impulsefile : impulse response file  - 32bit float
  (options)
    -len ???    : output length of data [point]
    outfile     : reproduced signal file - 16bit int

The dry source file name and the impulse response file name are given as arguments. If no output file name is given, the result is written to standard output. The data length of the output can be specified with the -len option. To keep the level of the reconstructed signal equal to that of the dry source, the impulse response is normalized before the convolution is calculated.

The file formats are as follows. All files are in Raw format without any header.

File                          Format
(input)  dry source           16 bit integer Raw format
(input)  impulse response     32 bit float Raw format
(output) reproduced signal    16 bit integer Raw format

4. Sample

To reconstruct the sound of clapping in a meeting room from claps.dat (the dry source data of hand clapping) and ir383.dat (the impulse response of the meeting room, reverberation time 0.38 sec.), both of which are shown in the first graph, the program is executed as follows.

    $ dryconv claps.dat ir383.dat output.dat
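
Files in the headerless raw formats used here could be prepared or inspected with a short Python script such as the one below. This is only a sketch: the helper names read_raw and write_raw and the file names dry_test.dat and ir_test.dat are hypothetical, and native byte order is assumed because the byte order of the database files is not stated.

```python
import array
import os
import tempfile


def read_raw(path, typecode):
    """Read a headerless file of 'h' (16-bit int) or 'f' (32-bit float) samples.

    Native byte order is assumed; the byte order is not documented here.
    """
    data = array.array(typecode)
    with open(path, "rb") as fh:
        data.frombytes(fh.read())
    return data


def write_raw(path, samples, typecode):
    """Write samples as a headerless raw file in native byte order (assumed)."""
    with open(path, "wb") as fh:
        array.array(typecode, samples).tofile(fh)


# Hypothetical stand-in data: a short click and a decaying two-tap response.
tmp = tempfile.mkdtemp()
dry_path = os.path.join(tmp, "dry_test.dat")
ir_path = os.path.join(tmp, "ir_test.dat")
write_raw(dry_path, [0, 16000, 0, -8000], "h")   # dry source: 16-bit integers
write_raw(ir_path, [1.0, 0.5], "f")              # impulse response: 32-bit floats

print(list(read_raw(dry_path, "h")))
print(list(read_raw(ir_path, "f")))
```

If the files were written on a machine with a different byte order, calling byteswap() on the array after reading would be needed.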

RWCP Sound Scene Database in Real Acoustical Environments
Copyright (c) 1998-2001 Mitsubishi Research Institute, Inc.