Decorrelation of Input Signals For Stereophonic Acoustic Echo Cancellation Using The Class of Perceptual Equivalence

This document proposes a method to decorrelate input signals for stereophonic acoustic echo cancellation using the concept of perceptual equivalence. It discusses how high correlation between input signals decreases the performance of acoustic echo cancelers. Previous decorrelation methods are classified as either not considering human auditory properties, or adding signals below the masking threshold to be inaudible. The proposed method modifies the input signals within the "class of perceptual equivalence" defined by the upper and lower bounds of perceptual equivalence, to decorrelate the signals without degrading perceptual quality. It is shown experimentally that this method improves echo canceler performance over other nonlinear transformation techniques.

Uploaded by

Anis Ben Aicha
Copyright
© Attribution Non-Commercial (BY-NC)
DECORRELATION OF INPUT SIGNALS FOR STEREOPHONIC ACOUSTIC ECHO CANCELLATION USING THE CLASS OF PERCEPTUAL EQUIVALENCE

Anis Ben Aicha and Sofia Ben Jebara


Ecole Supérieure des Communications de Tunis
Research unit TECHTRA
Route de Raoued 3.5 Km, Cité El Ghazala, Ariana, 2083, TUNISIA
anis ben aicha@yahoo.fr, sofia.benjebara@supcom.rnu.tn

ABSTRACT

In communication systems using stereophonic signals, the high correlation between the input signals dramatically degrades the behavior of acoustic echo cancelers (AECs), especially the misalignment between the estimated impulse response and the real impulse response of the receiving room. In this paper, we focus on the decorrelation of the input signals without any modification of auditory quality. Using perceptual properties, we show that it is possible to find a set of signals which are perceptually equivalent to the input signals, in spite of their different spectral and temporal shapes. Hence, the "class of perceptual equivalence" (CPE) is defined. It is an interval built in the frequency domain and limited by two bounds: the upper bound of perceptual equivalence (UBPE) and the lower bound of perceptual equivalence (LBPE). These two bounds are used as perceptually transparent nonlinear transformations of the input signals in order to decorrelate them. We show experimentally that the improvements yielded by this method are higher than those of classical nonlinear transformations.

1. INTRODUCTION

In a variety of speech communication systems, such as teleconferencing systems, acoustic echo cancelers are necessary to remove the undesirable echo that results from coupling between loudspeakers and microphones. Early systems consider the single-channel case. Novel applications such as hands-free communications, home entertainment and virtual reality contribute to the development of multi-channel signal processing techniques providing the users with enhanced sound quality.

In this paper, we consider specifically the stereophonic case. Fig. 1 represents the conventional acoustic echo cancellation (AEC) scheme. The signals x1(n) and x2(n) transmitted from the remote room are played back in the local room by two loudspeakers and picked up by two microphones. If nothing is done, the picked-up signal is retransmitted to the remote room, producing the undesired acoustic echo.

Conventional AECs seek to estimate the echo using adaptive finite impulse response filters (Ĥ1, Ĥ2) to model the acoustic impulse responses of the two echo paths (H1, H2) between the loudspeakers and one of the two microphones. Similar paths couple to the other microphone but, to keep the scheme simple, they are not shown.

In contrast to the mono-channel case, the problem of stereophonic echo cancellation is much more difficult to solve. The reasons are well explained in [1] [2] and are mainly due to the strong correlation of the transmitted signals x1 and x2. Indeed, the two signals x1 and x2 are obtained from the common source s(n) by filtering it with the impulse responses G1 and G2 of the remote room.

Figure 1: Basic stereophonic acoustic echo canceler.

To improve the behavior of AECs, many methods have been proposed in the literature to decorrelate the transmitted signals x1 and x2. We can classify them into two categories according to whether or not they use human auditory properties.

The first category groups together decorrelation techniques without perceptual considerations. In [2], an independent low-level random noise is added to each channel in order to reduce the correlation between them. In [4], a channel-dependent signal is added; it is obtained by a nonlinear transformation of each channel.

In the second category, human perceptual properties are taken into account: the signal added to each input signal should be quasi-white and lie under the masking threshold of the input signal, which renders it inaudible. This technique turns out to be more efficient than non-perceptual decorrelation methods [5].

Instead of adding another signal, we propose in this paper to perceptually modify the input signals in order to decorrelate them without perceptible degradation. This ensures the perceptual transparency of the proposed transformation and improves the behavior of stereophonic AECs.

In previous works [6] [7], we demonstrated that it is possible, for each signal, to construct a class of perceptual equivalence (CPE). It is defined as an interval where signals have the same auditory properties as the original one. This class is built in the spectral domain with perceptual rules and is limited by two bounds: the lower bound of perceptual equivalence (LBPE) and the upper bound of perceptual equivalence (UBPE). This concept introduces a large degree of freedom in choosing the signals that excite the adaptive filters: they must have the same perceptual quality as the original ones while being characterized by a low cross-correlation. Hence, this paper aims at finding and justifying such signals.

The paper is organized as follows. Section 2 recalls the fundamental problem of stereophonic acoustic echo cancellation and gives an overview of nonlinear methods to decorrelate input signals. In Section 3, we construct the CPE using auditory properties of the human ear. In Section 4, we detail the proposed technique. In Section 5, we compare the proposed technique with two techniques proposed in [3] and [5]. The results show the improvement of the proposed AEC over these techniques. Then we conclude the paper.

2. UNIFYING SCHEME FOR INPUT SIGNALS DECORRELATION

It has been shown for stereo AEC that the strong correlation between input signals leads the adaptive filters to converge to a solution that does not correctly model the transfer functions between the loudspeakers and the microphones. In fact, the cross-correlation matrix of the input signals plays a significant role in the convergence of the adaptive filters to the real impulse responses of the receiving room [2]. The better conditioned the cross-correlation matrix is, the better the behavior of the AEC [4]. If the cross-correlation matrix is ill conditioned, which is the case with highly correlated signals, the adaptive filters may converge to a non-optimal solution.

To make the cross-correlation matrix well conditioned, so that the adaptive filters converge to the real impulse responses of the local room with a small bias, we need to decorrelate the input signals.

Fig. 2 represents a unifying scheme to decorrelate input signals. x1(n) and x2(n) are nonlinearly transformed into other signals x1'(n) and x2'(n). The purpose of this transformation is to make the new signals x1'(n) and x2'(n) as uncorrelated as possible, under the constraint that the perceptual quality is not modified. In the following, we recall some methods known in the literature according to their usage of perceptual properties.

Figure 2: Unifying scheme to decorrelate stereophonic input signals by nonlinear transformations.

2.1 Non perceptual techniques (NPT)

The first idea to partially decorrelate the input signals is based on the addition of a low-level independent random noise to each channel in order to reduce the coherence between x1'(n) and x2'(n) when compared to the coherence between x1(n) and x2(n) [2]. Another technique consists in modulating each input signal with an independent random noise [9]. However, it is shown that even if the level of the added noise is very low, the quality of speech is significantly degraded [4].

Without perceptual considerations, and in order to minimize audible degradation, it is preferable to add something similar to the original signal. The main idea proposed in [4] is to add to each input signal x1(n) and x2(n) a nonlinear transformation of the signal itself:

$$x_i'(n) = x_i(n) + \alpha F[x_i(n)], \qquad (1)$$

where F is the chosen nonlinear function and α is a parameter which controls the amount of the added signal.

2.2 Perceptual techniques (PT)

The basic idea is to take advantage of human auditory properties, namely the simultaneous masking which happens in the frequency domain. In [5], the proposed method is based on the addition of a random noise spectrally shaped so as to be masked by the presence of the input signal. To achieve complete masking of the added noise, the well-known noise masking threshold is computed from each input signal. It expresses the maximum level that the added noise can reach while remaining inaudible. The masking threshold serves to shape the added noise.

2.3 Motivation

Previous perceptual techniques add a data-dependent or a data-independent signal to partially decorrelate the stereophonic input signals. A simple and intuitive idea arises: is it possible to eliminate some parts of the input signals instead of adding another signal? Of course, this elimination must operate as a nonlinear operator and must not introduce any audible degradation of the input signals. Once again, we must consider human auditory properties, but in a different manner.

The basic idea is inspired by our previous works dealing with speech denoising techniques [6] [7], where we proposed a class of perceptual equivalence (CPE). Each signal having a spectrum belonging to this class is heard as the original one. We think that if we consider the class bounds, using the upper bound as the transformed spectrum for the first input signal x1(n) and the lower bound as the transformed spectrum for the second input signal x2(n), we can reduce the correlation between them.

Before detailing the idea, let us describe in the next section the class of perceptual equivalence, its bounds and its usefulness.

3. CLASS OF PERCEPTUAL EQUIVALENCE

We aim to find an interval such that the signals belonging to it have the same auditory properties as the considered signal. For this purpose, auditory properties of the human ear are considered. More precisely, the masking concept is used: a masked signal is made inaudible by a masker if the masked signal magnitude is below the perceptual masking threshold MT [8].

Using both the signal spectrum and its masking threshold, we look for decision rules to decide on the audibility of a modified spectrum obtained by adding, subtracting or modifying some frequency components of the considered signal. These rules make it possible to construct the class of perceptual equivalence.

3.1 Upper Bound of Perceptual Equivalence UBPE

The masking threshold is a curve computed in the short-time frequency domain from the power spectrum of the considered signal. It represents, for each frequency component, the maximum level that an added noise can reach while remaining inaudible. Hence, an additive noise with a power spectrum under MT will be inaudible.
However, it modifies the shape of the power spectrum of the considered signal. Clearly, many different "inaudible" noises can be added, so that we get several power spectrum shapes with the same auditory quality as the original signal.

We seek to determine the interval formed by the power spectrum of the original speech, as a lower limit, and another curve, as an upper limit, such that any modified signal (i.e., original signal + inaudible noise) with a power spectrum between the two limits will not impair the perceptual quality of the original signal.

Since the masking threshold MT represents the maximum power of an additive inaudible noise, the upper limit that we look for is the curve that results from adding the power spectrum of the original signal to its own masking threshold.
The resulting spectrum is called the upper bound of perceptual equivalence "UBPE" and is defined as follows [6] [7]:

$$\mathrm{UBPE}(m, f) = \Gamma_s(m, f) + MT(m, f), \qquad (2)$$

where m (resp. f) denotes the frame index (resp. frequency index) and Γs(m, f) is the clean speech power spectrum.

Figure 3: An illustration of UBPE and LBPE (in dB) of a speech frame.
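
As a minimal sketch of Eq. (2), the upper bound can be computed bin by bin for one frame, assuming the power spectrum and the masking threshold are already available as linear-scale (not dB) values. The function names `ubpe` and `within_upper_interval` and the numeric values are hypothetical and only serve the illustration; how MT itself is computed is outside the scope of this sketch.

```python
def ubpe(power_spectrum, masking_threshold):
    """Eq. (2): UBPE(m, f) = Gamma_s(m, f) + MT(m, f) for the bins of one frame."""
    if len(power_spectrum) != len(masking_threshold):
        raise ValueError("spectrum and threshold must have the same number of bins")
    return [g + mt for g, mt in zip(power_spectrum, masking_threshold)]


def within_upper_interval(modified, power_spectrum, masking_threshold):
    """True if every bin of `modified` lies between Gamma_s and UBPE (Section 3.1)."""
    upper = ubpe(power_spectrum, masking_threshold)
    return all(g <= x <= u for x, g, u in zip(modified, power_spectrum, upper))


# Hypothetical linear-scale values for a 3-bin frame (not taken from the paper).
gamma_s = [4.0, 1.0, 0.25]
mt = [0.5, 0.25, 0.125]
print(ubpe(gamma_s, mt))
print(within_upper_interval([4.2, 1.1, 0.3], gamma_s, mt))
```

The second function only checks the interval described in Section 3.1 (original spectrum as lower limit, UBPE as upper limit); it says nothing about attenuated components.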

3.2 Lower Bound of Perceptual Equivalence LBPE

By duality, some attenuations of frequency components can be heard as speech distortion of the input signal. Thus, by analogy to UBPE, we propose to calculate a second curve which expresses the lower bound under which any attenuation of frequency components is heard as a distortion. We call it the lower bound of perceptual equivalence "LBPE". To compute LBPE, we use the audible spectrum introduced by Tsoukalas et al. for audio signal enhancement [10]. In that case, the audible spectrum is calculated by taking the maximum between the clean speech spectrum and the masking threshold.

When speech components are under MT, they are not heard and we can replace them by a chosen threshold σ(m, f).

The proposed LBPE is defined as follows [6] [7]:

$$\mathrm{LBPE}(m, f) = \begin{cases} \Gamma_s(m, f) & \text{if } \Gamma_s(m, f) \ge MT(m, f) \\ \sigma(m, f) & \text{otherwise.} \end{cases} \qquad (3)$$

The choice of σ(m, f) obeys only one condition: it must be under the masking threshold, σ(m, f) < MT(m, f). We choose it equal to the absolute threshold of hearing, which is defined as the minimum power of a signal to be audible in absolute silence [8].

3.3 Usefulness of UBPE and LBPE

Using UBPE and LBPE, we can define three regions characterizing the perceptual quality of a modified input signal. Modified frequency components between UBPE and LBPE are perceptually equivalent to the original input signal components. Frequency components above UBPE contain an audible noise, and frequency components under LBPE are characterized by speech distortion. Hence, UBPE and LBPE constitute the limits of admissible transparent modification of input signals: every signal having a spectral shape included between UBPE and LBPE is perceptually equivalent to the original one. The set of these signals forms the class of perceptual equivalence "CPE".

As an illustration, we present in Fig. 3 an example of a speech frame power spectrum and its related curves UBPE (upper curve in bold line) and LBPE (bottom curve in dashed line). The original speech power spectrum lies, for all frequency indices, between the two curves UBPE and LBPE.

4. PROPOSED TECHNIQUE TO DECORRELATE STEREOPHONIC SIGNALS

4.1 Proposed technique principle

The CPE permits multiple choices of signals with equivalent perceptual quality. For our application, we seek to reduce the correlation between the input signals. In other words, our purpose is to change the temporal forms of the input signals as much as possible, so that they become less correlated, without any audible degradation. Bringing the problem to the frequency domain, we can intuitively choose the two limits: UBPE for the first input signal x1(n) and LBPE for the second input signal x2(n). In fact, the two limits lead to the most different signals.

The proposed nonlinear transformations (NLT) are written as follows:

$$\begin{cases} \mathrm{NLT}_1(x_1) = \mathrm{UBPE}_{x_1} & \text{computed from } x_1(n) \\ \mathrm{NLT}_2(x_2) = \mathrm{LBPE}_{x_2} & \text{computed from } x_2(n) \end{cases} \qquad (4)$$

4.2 Reduction of coherence function

To mathematically justify our choice, let us recall the expression of the coherence function, which is computed in the spectral domain as follows:

$$C(f) = \frac{\gamma_{x_1 x_2}(f)}{\sqrt{\gamma_{x_1 x_1}(f)\,\gamma_{x_2 x_2}(f)}}, \qquad (5)$$

where γ_{xi xi}(f) denotes the power spectral density of xi (for i = 1 and i = 2) and γ_{x1 x2}(f) denotes the cross-spectral density of the signals x1 and x2.

The coherence function measures the similarity in the frequency domain between two signals. If the magnitude of C(f) is near 1, the two signals are highly correlated. In the opposite case, when it is close to zero, the two signals are decorrelated.

In our case of acoustic echo cancellation, the significance of the coherence function is related to the cross-correlation matrix of the input signals. For decorrelated input signals, the cross-correlation matrix is well conditioned, which leads to the convergence of the adaptive filters to the real impulse responses of the receiving room [4].

In Fig. 4, we represent the coherence magnitude (CM) of the original signals and the CM obtained with the three considered methods: the non-perceptual technique NPT developed in [3], the perceptual technique PT developed in [5] and our proposed technique. The input signals x1(n) and x2(n) were obtained by convolving a clean speech signal with two impulse responses G1 and G2 of length 4096.

It can be seen that the coherence function of the original signals is high for all frequencies, which corresponds to an ill-conditioned cross-correlation matrix. The NPT method reduces the CM, but the PT method does better. This is an expected result since the PT method takes perceptual concepts into account to decorrelate the input signals. However, the best reduction is achieved by our proposed technique, since the CM is much reduced even in the low and medium frequencies. Thus, better behavior of adaptive algorithms can be expected with this latter method.

Figure 4: Coherence magnitude comparison using three methods (NPT technique, PT technique and our proposed technique).

5. EXPERIMENTAL RESULTS

5.1 Evaluation criteria

We wish to evaluate our proposed method and compare it with the NPT and PT methods using criteria suited to acoustic echo cancellation.

• Mean square error (MSE)

$$\mathrm{MSE}(n) = 10 \log_{10}\left(e(n)^2\right), \qquad (6)$$

where e(n) = y(n) − ŷ(n) is the estimation error.

• Misalignment (M)

$$M_i(n) = \frac{\left(\hat{H}_i(n) - H_i\right)^T \left(\hat{H}_i(n) - H_i\right)}{H_i^T H_i}, \qquad (7)$$

where the superscript T denotes the transposition operator. M_i(n) expresses the estimation error of the impulse response of the ith path in the local room at iteration n.

• Echo return loss enhancement (ERLE)

ERLE represents the attenuation of the acoustic echo. It is computed by blocks of N samples. The ERLE(k) of the kth block is given by:

$$\mathrm{ERLE}(k) = 10 \log_{10} \frac{\sum_{n=(k-1)N+1}^{kN} y(n)^2}{\sum_{n=(k-1)N+1}^{kN} \left[y(n) - \hat{y}(n)\right]^2}. \qquad (8)$$

5.2 Simulation results

The microphone output signal y(n) in the receiving room is obtained by summing the two convolutions (H1 ∗ x1) and (H2 ∗ x2), where H1 and H2 are impulse responses of the receiving room, each of length 4096 points and truncated to 512 points. For all of our simulations, we have used the normalized least mean square (NLMS) algorithm. To smooth the curves, the misalignment and the MSE are averaged over 128 points.

Fig. 5 represents the evolution of the MSE over the number of iterations for the considered methods (NPT developed in [3], PT developed in [5] and our proposed technique). We remark that the MSE decreases when n increases. The lowest residual echo is obtained with the proposed technique, which means that the echo is reduced more than with the NPT and PT techniques.

Fig. 6 represents the misalignment of the considered techniques. Once again, the proposed method is seen to have the best performance since it greatly reduces the misalignment.

Fig. 7 represents the ERLE of the considered methods. The best ERLE is obtained with the proposed method.

Figure 5: Evolution of MSE for the three considered methods obtained with 2-NLMS.

Figure 6: Misalignment of the three considered methods obtained with 2-NLMS.

Figure 7: Evolution of ERLE for the three considered methods obtained with 2-NLMS.

6. CONCLUSION

In this paper, we have developed and tested a new method based on the use of human auditory properties to improve the behavior of stereophonic acoustic echo cancelers. Two bounds, UBPE and LBPE, are constructed in the spectral domain, leading to multiple choices of perceptually equivalent signals. The choice of UBPE for the first input signal and LBPE for the second input signal yields an efficient reduction of the coherence of the stereophonic signals, improving the behavior of the AEC. The new technique was compared with previous proposals and was found to be more efficient.

REFERENCES

[1] H. Buchner, J. Benesty, and W. Kellermann, "Multichannel Frequency-Domain Adaptive Filtering with Application to Multichannel Acoustic Echo Cancellation," in Adaptive Signal Processing: Applications to Real-World Problems, J. Benesty and Y. Huang, Eds., Berlin: Springer-Verlag, 2003, ch. 4.
[2] M. M. Sondhi and D. R. Morgan, "Stereophonic acoustic echo cancellation - an overview of fundamental problem," IEEE Signal Processing Letters, vol. 2, no. 8, pp. 148-151, 1995.
[3] J. Benesty, D. R. Morgan and M. M. Sondhi, "A better understanding and an improved solution to the specific problems of stereophonic acoustic echo cancellation," IEEE Trans. Speech and Audio Processing, vol. 6, no. 8, pp. 156-165, 1998.
[4] D. R. Morgan, J. L. Hall and J. Benesty, "Investigation of several types of nonlinearities for use in stereo acoustic echo cancellation," IEEE Trans. Speech and Audio Processing, vol. 9, no. 6, pp. 686-696, 2001.
[5] A. Gilloire and V. Turbin, "Using auditory properties to improve the behaviour of stereophonic acoustic echo cancellation," in Proc. IEEE ICASSP, pp. 3681-3684, 1998.
[6] A. Ben Aicha and S. Ben Jebara, "Caractérisation perceptuelle de la dégradation apportée par les techniques de débruitage de la parole," Traitement et Analyse de l'Information Méthodes et Applications TAIMA, Tunisia, 2007.
[7] A. Ben Aicha and S. Ben Jebara, "Quantitative perceptual separation of two kinds of degradation in speech denoising applications," Workshop on Non-Linear Speech Processing NOLISP, France, 2007.
[8] T. Painter and A. Spanias, "Perceptual coding of digital audio," in Proc. IEEE, vol. 88, pp. 451-513, 2000.
[9] S. Shimauchi and S. Makino, "Stereo projection echo canceler with true echo path estimation," in Proc. IEEE ICASSP, pp. 3059-3062, 1995.
[10] D. E. Tsoukalas, J. N. Mourjopoulos and G. K. Kokkinakis, "Speech enhancement based on audible noise suppression," IEEE Trans. Speech and Audio Processing, vol. 5, no. 6, pp. 497-517, 1997.
[11] J. D. Johnston, "Transform coding of audio signal using perceptual noise criteria," IEEE J. Select. Areas Commun., vol. 6, pp. 314-323, 1988.
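
As a complement to Sections 3.2 and 3.3, the lower bound of Eq. (3) and the three perceptual regions can be sketched per frequency bin. This is only an illustration under stated assumptions: all spectra are linear-scale powers for one frame, `floor` stands for σ(m, f), and the names `lbpe` and `classify_bins` as well as the region labels are hypothetical, not taken from the paper.

```python
def lbpe(power_spectrum, masking_threshold, floor):
    """Eq. (3): keep Gamma_s where it reaches MT, otherwise use sigma (`floor`)."""
    bound = []
    for g, mt, sigma in zip(power_spectrum, masking_threshold, floor):
        if sigma >= mt:
            # Section 3.2 requires sigma(m, f) < MT(m, f).
            raise ValueError("sigma must stay below the masking threshold")
        bound.append(g if g >= mt else sigma)
    return bound


def classify_bins(modified, power_spectrum, masking_threshold, floor):
    """Label each bin of `modified` with one of the three regions of Section 3.3."""
    lower = lbpe(power_spectrum, masking_threshold, floor)
    labels = []
    for x, g, mt, lo in zip(modified, power_spectrum, masking_threshold, lower):
        if x > g + mt:          # above UBPE of Eq. (2)
            labels.append("audible noise")
        elif x < lo:            # below LBPE of Eq. (3)
            labels.append("distortion")
        else:                   # inside the class of perceptual equivalence
            labels.append("equivalent")
    return labels


# Hypothetical 2-bin frame: the second bin of the original is below MT.
gamma_s = [4.0, 0.1]
mt = [0.5, 0.25]
sigma = [0.01, 0.01]
print(lbpe(gamma_s, mt, sigma))
print(classify_bins([5.0, 0.2], gamma_s, mt, sigma))
```

In the example, the masked second bin may be replaced by any value between σ and UBPE without leaving the class, which is the freedom the proposed transformation exploits.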

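
The evaluation criteria of Section 5.1 (Eqs. (6) to (8)) can likewise be sketched in a few lines. The sketch assumes signals and impulse responses are plain lists of floats; the function names and the small floor `EPS`, which only keeps the logarithm defined when an error is exactly zero, are assumptions of this example and not part of the paper.

```python
import math

EPS = 1e-12  # assumption: floor that avoids log10(0) and division by zero


def mse_db(y, y_hat):
    """Eq. (6): squared estimation error e(n)^2 in dB, one value per sample."""
    return [10.0 * math.log10(max((a - b) ** 2, EPS)) for a, b in zip(y, y_hat)]


def misalignment(h_hat, h):
    """Eq. (7): squared norm of (h_hat - h) divided by the squared norm of h."""
    num = sum((a - b) ** 2 for a, b in zip(h_hat, h))
    den = sum(b ** 2 for b in h)
    return num / den


def erle_db(y, y_hat, block_len):
    """Eq. (8): echo return loss enhancement per block of `block_len` samples."""
    values = []
    for start in range(0, len(y) - block_len + 1, block_len):
        stop = start + block_len
        echo = sum(v ** 2 for v in y[start:stop])
        residual = sum((a - b) ** 2 for a, b in zip(y[start:stop], y_hat[start:stop]))
        values.append(10.0 * math.log10(max(echo, EPS) / max(residual, EPS)))
    return values


# Hypothetical data: the estimate removes 90% of the echo amplitude.
y = [1.0, 1.0, 2.0, 2.0]
y_hat = [0.9, 0.9, 1.8, 1.8]
print(erle_db(y, y_hat, block_len=2))
print(misalignment([1.0, 0.0], [1.0, 1.0]))
```

Incomplete trailing blocks are dropped in `erle_db`, and no smoothing is applied; the 128-point averaging mentioned in Section 5.2 would be a separate step.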