Jeremy S. De Bonet : Projection of voice onto recognition subspaces






Phase invariant features

Most voice recognition systems use features that are sensitive only to the amplitude spectrum of the signal. By using (typically overlapping) windows of short duration, such systems trade off phase invariance against phase sensitivity: phase is ignored within each window, but the position of each window in time is preserved.

To get a sense of which input sounds these features cannot distinguish, and which the recognition systems therefore cannot tell apart, we can randomize the phase of a speech signal within overlapping windows of varying sizes.

Figure 1: 2000 samples of speech (in black) are phase scrambled within overlapping windows of 256 samples, and re-projected back into the time domain (shown in red). The signals are clearly very different.
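
The phase scrambling described above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not the code used to produce the figure or the examples: the 50% overlap, the Hann window, the overlap-add reconstruction, and the names scramble_frame and phase_scramble are all choices made here for the sketch.

```python
import numpy as np

def scramble_frame(frame, rng):
    """Replace the phase of one frame with random phase, keeping its magnitude spectrum."""
    spectrum = np.fft.rfft(frame)
    phase = rng.uniform(-np.pi, np.pi, size=spectrum.shape)
    # The DC bin (and the Nyquist bin for even lengths) must stay real
    # so that the scrambled spectrum corresponds to a real-valued signal.
    phase[0] = 0.0
    if len(frame) % 2 == 0:
        phase[-1] = 0.0
    return np.fft.irfft(np.abs(spectrum) * np.exp(1j * phase), n=len(frame))

def phase_scramble(signal, window_size=256, seed=0):
    """Phase scramble a signal within overlapping windows and overlap-add the result.

    Assumptions of this sketch: 50% overlap and a Hann window.
    """
    signal = np.asarray(signal, dtype=float)
    rng = np.random.default_rng(seed)
    hop = window_size // 2
    window = np.hanning(window_size)
    out = np.zeros(len(signal))
    norm = np.zeros(len(signal))
    for start in range(0, len(signal) - window_size + 1, hop):
        frame = signal[start:start + window_size] * window
        out[start:start + window_size] += scramble_frame(frame, rng) * window
        norm[start:start + window_size] += window ** 2
    # Samples not covered by any window are left at zero.
    return out / np.maximum(norm, 1e-12)
```

Each scrambled frame has the same magnitude spectrum as the windowed input frame, so a feature computed from per-window amplitude spectra alone sees the same frame. Once overlapping frames are summed, the match holds only approximately for the reconstructed signal.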

Window size (samples):  Original  512  256  128  64  32
Source:                 female-clean.wav

Table 1: Examples of speech which has been phase scrambled within overlapping windows of varying length

Cepstral subspace projections

Features used in recognition systems are often based on the cepstral domain. Typically the cepstrum is computed and only the top N coefficients are kept for recognition. Any variation in the coefficients beyond the top N therefore cannot be distinguished by these features.

By projecting onto this N-dimensional subspace and adding noise of varying power in the dimensions orthogonal to it, we can create a wide variety of signals that such systems cannot tell apart.

Figure 2: The spectrum of a block of speech (shown in black) is projected onto a 10-dimensional subspace of the cepstral domain and projected back, resulting in the spectrum shown in red.

Figure 3: A section of the signal (shown in black) after projection onto a 10-dimensional subspace of the cepstral domain (shown in red).
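
A minimal sketch of the cepstral projection for a single block of samples follows. Several details are assumptions of the sketch rather than facts from this page: it uses the real cepstrum (inverse FFT of the log magnitude spectrum), keeps the original phase when projecting back, reads "N% noise" as noise energy equal to that fraction of the energy in the kept coefficients, and assumes an even block length. The names real_cepstrum and cepstral_project are hypothetical.

```python
import numpy as np

def real_cepstrum(frame):
    """Real cepstrum of a frame: inverse FFT of the log magnitude spectrum."""
    log_mag = np.log(np.abs(np.fft.rfft(frame)) + 1e-12)
    return np.fft.irfft(log_mag, n=len(frame))

def cepstral_project(frame, n_keep=10, noise_fraction=0.0, seed=0):
    """Keep the first n_keep cepstral coefficients, optionally add noise to the rest.

    Assumes an even frame length and n_keep < len(frame) // 2 + 1.
    """
    frame = np.asarray(frame, dtype=float)
    n = len(frame)
    if n % 2 != 0 or not 0 < n_keep < n // 2 + 1:
        raise ValueError("need an even frame length and 0 < n_keep < n // 2 + 1")
    spectrum = np.fft.rfft(frame)
    # The real cepstrum is symmetric (c[k] == c[n - k]), so the first
    # n // 2 + 1 values are the independent coefficients.
    half = real_cepstrum(frame)[: n // 2 + 1]
    kept = half.copy()
    kept[n_keep:] = 0.0  # projection onto the n_keep-dimensional subspace
    if noise_fraction > 0:
        rng = np.random.default_rng(seed)
        noise = rng.standard_normal(half.shape)
        noise[:n_keep] = 0.0  # noise lives only in the orthogonal dimensions
        # Assumed scaling: noise energy = noise_fraction * energy of kept coefficients.
        scale = np.sqrt(noise_fraction * np.sum(kept ** 2) / np.sum(noise ** 2))
        kept = kept + scale * noise
    # Rebuild the full symmetric cepstrum and map back to a log magnitude spectrum.
    full = np.concatenate([kept, kept[-2:0:-1]])
    new_log_mag = np.fft.rfft(full).real
    # Assumption: reuse the original phase when returning to the time domain.
    new_spectrum = np.exp(new_log_mag) * np.exp(1j * np.angle(spectrum))
    return np.fft.irfft(new_spectrum, n=n)
```

Under these assumptions, the first n_keep cepstral coefficients of the output match those of the input whatever noise level is chosen, which is the sense in which a system using only those coefficients could not tell the signals apart.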

Cepstral features:  All   30    30   30   30   20    20   20   20   10    10   10   10
Orthogonal noise:   None  None  10%  20%  50%  None  10%  20%  50%  None  10%  20%  50%
Sources:            female-noisy, male-clean, male-noisy, test2

Table 2: Several examples of speech projected onto subsets of cepstral coefficients, with varying levels of noise added in the orthogonal dimensions.



Jeremy S. De Bonet
jsd@debonet.com

Page last modified on 2006-05-27
Copyright © 1997-2024, Jeremy S. De Bonet. All rights reserved.