Decorrelation of Input Signals For Stereophonic Acoustic Echo Cancellation Using The Class of Perceptual Equivalence
ABSTRACT

In communication systems using stereophonic signals, the high correlation of the input signals dramatically degrades the behavior of acoustic echo cancelers (AECs), especially the misalignment between the estimated impulse response and the real impulse response of the receiving room. In this paper, we focus on the decorrelation of input signals without any modification of auditory quality. Using perceptual properties, we show that it is possible to find a set of signals which are perceptually equivalent to the input signals, in spite of their different spectral and temporal shapes. Hence, the 'class of perceptual equivalence' (CPE) is defined. It is an interval built in the frequency domain and limited by two bounds: the upper bound of perceptual equivalence (UBPE) and the lower bound of perceptual equivalence (LBPE). These two bounds are used as perceptually transparent nonlinear transformations of the input signals in order to decorrelate them. We show experimentally that the improvements yielded by this method are higher than those of classical nonlinear transformations.

1. INTRODUCTION

In a variety of speech communication systems, such as teleconferencing systems, acoustic echo cancelers are necessary to remove the undesirable echo that results from coupling between loudspeakers and microphones. Early systems consider the single-channel case. Novel applications such as hands-free communications, home entertainment and virtual reality contribute to the development of multi-channel signal processing techniques providing the users with enhanced sound quality.

In this paper, we consider specifically the stereophonic case. Fig. 1 represents the conventional acoustic echo cancellation (AEC) scheme. The signals x1(n) and x2(n) transmitted from the remote room are reproduced in the local room by two loudspeakers and picked up by two microphones. If nothing is done, the picked-up signal is retransmitted to the remote room, producing the undesired acoustic echo.

Figure 1: Basic stereophonic acoustic echo canceler.

Conventional AECs seek to estimate the echo using adaptive finite impulse response filters (Ĥ1, Ĥ2) to model the acoustic impulse responses of the two echo paths (H1, H2) between the loudspeakers and one of the two microphones. Similar paths couple to the other microphone but, to keep the scheme simple, they are not shown.

In contrast to the mono-channel case, the problem of stereophonic echo cancellation is much more difficult to solve. The reasons are well explained in [1] [2] and are mainly due to the strong correlation of the transmitted signals x1 and x2. Indeed, the two signals x1 and x2 are obtained from the common source s(n) by filtering it with the impulse responses G1 and G2 of the remote room.

To improve the behavior of AECs, many methods have been proposed in the literature to decorrelate the transmitted signals x1 and x2. We can classify them into two categories according to whether or not they use human auditory properties.

The first category groups together decorrelation techniques without perceptual considerations. In [2], an independent low-level random noise is added to each channel in order to reduce the correlation between them. In [4], a channel-dependent signal, obtained by a nonlinear transformation of each channel, is added.

In the second category, human perceptual properties are taken into account by adding to each input signal another signal, which should be quasi-white and lie under the masking threshold of the input signal so that it is inaudible. This technique turns out to be more efficient than non-perceptual decorrelating methods [5].

Instead of adding another signal, we propose in this paper to perceptually modify the input signals in order to decorrelate them without perceptible degradation. This ensures the perceptual transparency of the proposed transformation and improves the behavior of stereophonic AECs.

In previous works [6] [7], we demonstrated that it is possible, for each signal, to construct a class of perceptual equivalence (CPE). It is defined as an interval where signals have the same auditory properties as the original one. This class is built in the spectral domain with perceptual rules and is limited by two boundaries: the lower boundary of perceptual equivalence (LBPE) and the upper boundary of perceptual equivalence (UBPE). This concept introduces a great degree of freedom in choosing signals to excite the adaptive filters which have the same perceptual quality as the original ones and are characterized by low cross-correlation. Hence, this paper aims at finding and justifying such signals.

The paper is organized as follows. Section 2 recalls the fundamental problem of stereophonic acoustic echo cancellation and gives an overview of nonlinear methods to decorrelate input signals. In section 3, we construct the CPE using auditory properties of the human ear. In section 4, we detail the proposed technique. In section 5, we compare the proposed technique with the two techniques proposed in [3] and [5]. The results show the improvement of the proposed AEC over these techniques. Then we conclude the paper.

2. UNIFYING SCHEME FOR INPUT SIGNALS DECORRELATION

It has been shown for stereo AEC that the strong correlation between input signals leads the adaptive filters to converge to a solution that does not correctly model the transfer functions between the loudspeakers and the microphones. In fact, the cross-correlation matrix of the input signals plays a significant role in the convergence of the adaptive filters to the real impulse responses of the receiving room [2]. The better conditioned the cross-correlation matrix is, the better the behavior of AECs [4]. If the cross-correlation matrix is ill-conditioned, which is the case with highly correlated signals, the adaptive filters may converge to a non-optimal solution.

To make the cross-correlation matrix well conditioned, allowing the adaptive filters to converge to the real impulse responses of the local room with small bias, we need to decorrelate the input signals.

Fig. 2 represents a unifying scheme to decorrelate input signals. x1(n) and x2(n) are nonlinearly transformed into other signals x1'(n) and x2'(n). The purpose of this transformation is to make the new signals x1'(n) and x2'(n) as uncorrelated as possible under the constraint that the perceptual quality is not modified. In the following, we recall some known methods from the literature according to their usage of perceptual properties.

Figure 2: Unifying scheme to decorrelate stereophonic input signals by nonlinear transformations (blocks NLT1 and NLT2).

2.1 Non perceptual techniques (NPT)

The first idea to partially decorrelate the input signals is based on the addition of a low-level independent random noise to each channel in order to reduce the coherence between x1'(n) and x2'(n) when compared to the coherence between x1(n) and x2(n) [2]. Another technique consists in modulating each input signal with an independent random noise [9]. However, it is shown that even if the level of added noise is very low, the quality of speech is significantly degraded [4].

Without perceptual considerations, and in order to minimize audible degradation, it is preferable to add something like the original signal. The main idea proposed in [4] is to add to each input signal x1(n) and x2(n) a nonlinear transformation of the signal itself:

$$x_i'(n) = x_i(n) + \alpha F[x_i(n)], \quad i = 1, 2, \qquad (1)$$

where F is the chosen nonlinear function and α is a parameter which controls the amount of the added signal.

2.2 Perceptual techniques (PT)

The basic idea is to take advantage of human auditory properties, namely simultaneous masking, which happens in the frequency domain. In [5], the proposed method is based on the addition of a random noise spectrally shaped to be masked by the presence of the input signal. To achieve complete masking of the added noise, the well-known noise masking threshold is computed from each input signal. It expresses the maximum level of added noise that is not audible. The masking threshold serves to shape the added noise.

2.3 Motivation

Previous perceptual techniques add a data-dependent or a data-independent signal to partially decorrelate the stereophonic input signals. A simple and intuitive idea arises: is it possible to eliminate some parts of the input signals instead of adding another signal? Of course, this elimination must operate as a nonlinear operator and must not introduce any audible degradation of the input signals. Once again, we must consider human auditory properties, but in a different manner.

The basic idea is inspired by our previous works dealing with speech denoising techniques [6] [7], where we proposed a class of perceptual equivalence (CPE). Each signal having a spectrum belonging to this class is heard as the original one. We think that if we consider the class bounds, using the upper bound as a transformed spectrum for the first input signal x1(n) and the lower bound as a transformed spectrum for the second input signal x2(n), we can reduce the correlation between them.

Before detailing the idea, let us describe the class of perceptual equivalence, its bounds and its usefulness in the next section.

3. CLASS OF PERCEPTUAL EQUIVALENCE

We aim to find an interval such that the signals belonging to it have the same auditory properties as the considered signal. For this purpose, auditory properties of the human ear are considered. More precisely, the masking concept is used: a masked signal is made inaudible by a masker if the masked signal magnitude is below the perceptual masking threshold MT [8].

Using both the signal spectrum and its masking threshold, we look for decision rules to decide on the audibility of a modified spectrum obtained by adding, subtracting or modifying some frequency components of the considered signal. These rules will permit the construction of the class of perceptual equivalence.

3.1 Upper Bound of Perceptual Equivalence (UBPE)

The masking threshold is a curve computed in the short-time frequency domain from the power spectrum of the considered signal. For each frequency component, it represents the maximum level of added noise that remains inaudible. Hence, an additive noise with power spectrum under MT will be inaudible.
However, it modifies the shape of the power spectrum of the considered signal. Clearly, many different 'inaudible' noises can be added, yielding several power spectrum shapes with the same auditory quality as the original signal.
We seek the interval bounded below by the power spectrum of the original speech and above by another curve, such that any modified signal (i.e., original signal + inaudible noise) whose power spectrum lies between the two limits does not impair the perceptual quality of the original signal.
Since the masking threshold MT represents the maximum power of an additive inaudible noise, the upper limit that we look for is the curve obtained by adding the masking threshold of the original signal to its power spectrum.
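As a rough illustration of this construction, the sketch below computes the upper limit for one frame and tests whether a candidate power spectrum lies between the original power spectrum and that limit. It is only a sketch under stated assumptions: the function names are hypothetical, and `toy_masking_threshold` is a crude stand-in (a smoothed spectrum lowered by a fixed offset), not the psychoacoustic masking threshold used in the paper.

```python
import numpy as np

def power_spectrum(frame):
    """One-sided short-time power spectrum of a single Hann-windowed frame."""
    windowed = frame * np.hanning(len(frame))
    return np.abs(np.fft.rfft(windowed)) ** 2

def toy_masking_threshold(psd, offset_db=15.0, spread=5):
    """ASSUMPTION: placeholder for MT, not a psychoacoustic model.
    Smooths the power spectrum and lowers it by offset_db decibels."""
    kernel = np.ones(spread) / spread
    smoothed = np.convolve(psd, kernel, mode="same")
    return smoothed * 10.0 ** (-offset_db / 10.0)

def upper_bound(psd, mt):
    """Upper limit as described in the text: power spectrum plus its own MT."""
    return psd + mt

def within_upper_interval(candidate_psd, psd, mt):
    """True if the candidate lies between psd and psd + mt in every bin."""
    return bool(np.all((candidate_psd >= psd) &
                       (candidate_psd <= upper_bound(psd, mt))))

# Synthetic frame standing in for a speech frame (illustration only).
rng = np.random.default_rng(0)
frame = rng.standard_normal(256)

psd = power_spectrum(frame)
mt = toy_masking_threshold(psd)
ubpe = upper_bound(psd, mt)

# A spectrum halfway between the two limits stays inside the interval,
# one that exceeds the upper limit does not.
halfway = psd + 0.5 * mt
too_loud = psd + 2.0 * mt
```

Working directly on power spectra sidesteps the cross terms that appear when a noise waveform is added in the time domain; the additive model above holds only on average for noise uncorrelated with the signal.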
The resulting spectrum is called the upper bound of perceptual equivalence (UBPE) and is defined as follows [6] [7]
the CM, but the PT method does better. This is an expected result since the PT method takes perceptual concepts into account to decorrelate the input signals. However, the best reduction is achieved by our proposed technique, since the CM is much reduced even in the low and medium frequencies. Thus, better behavior of adaptive algorithms can be expected with this latter method.

Figure 4: Coherence magnitude comparison using three methods (NPT technique, PT technique and our proposed technique).

5. EXPERIMENTAL RESULTS

5.1 Evaluation criteria

We wish to evaluate our proposed method and compare it with the NPT and PT methods using adequate criteria for acoustic echo cancellation.

• Mean square error (MSE)

$$\mathrm{MSE}(n) = 10 \log_{10} e(n)^2, \qquad (6)$$

where e(n) = y(n) − ŷ(n) is the estimation error.

• Misalignment (M)

5.2 Simulation results

The microphone output signal y(n) in the receiving room is obtained by summing the two convolutions (H1 ∗ x1) and (H2 ∗ x2), where H1 and H2 are impulse responses of the receiving room, each of length 4096 points and truncated to 512 points. For all of our simulations, we have used the normalized least mean square (NLMS) algorithm. To smooth the curves, misalignment and MSE are averaged over 128 points.

Fig. 5 represents the evolution of the MSE over the number of iterations for the considered methods (NPT developed in [3], PT developed in [5] and our proposed technique). We remark that the MSE decreases when n increases. The lowest residual echo is obtained with the proposed technique, which means that the echo is reduced more than with the NPT and PT techniques.

Fig. 6 represents the misalignment of the considered techniques. Once again, the proposed method is seen to have the best performance since it greatly reduces the misalignment.

Fig. 7 represents the echo return loss enhancement (ERLE) of the considered methods. The best ERLE is obtained with the proposed method.

Figure 5: Evolution of MSE for the three considered methods using 2-NLMS.

Figure 6: Misalignment of the three considered methods obtained with 2-NLMS.

Figure 7: Evolution of ERLE for the three considered methods obtained with 2-NLMS.

6. CONCLUSION

In this paper, we have developed and tested a new method based on the use of human auditory properties to improve the behavior of stereophonic acoustic echo cancelers. Two boundaries, UBPE and LBPE, are constructed in the spectral domain, leading to multiple choices of perceptually equivalent signals. The choice of UBPE for the first input signal and LBPE for the second input signal yields an efficient reduction of the coherence of stereophonic signals, improving the behavior of the AEC. The new technique was compared with previous proposals and was found to be more efficient.

REFERENCES

[1] H. Buchner, J. Benesty, and W. Kellermann, “Multichannel Frequency-Domain Adaptive Filtering with Application to Multichannel Acoustic Echo Cancellation,” in Adaptive Signal Processing: Applications to Real-World Problems, J. Benesty and Y. Huang, Eds., Berlin: Springer-Verlag, 2003, ch. 4.
[2] M. M. Sondhi and M. R. Morgan, “Stereophonic acoustic echo cancellation - an overview of fundamental problem,” IEEE Signal Processing Letters, vol. 2, no. 8, pp. 148-151, 1995.
[3] J. Benesty, D. R. Morgan and M. M. Sondhi, “A better understanding and an improved solution to the specific problems of stereophonic acoustic echo cancellation,” IEEE Trans. Speech and Audio Processing, vol. 6, no. 8, pp. 156-165, 1998.
[4] D. R. Morgan, J. L. Hall and J. Benesty, “Investigation of several types of nonlinearities for use in stereo acoustic echo cancellation,” IEEE Trans. Speech and Audio Processing, vol. 9, no. 6, pp. 686-696, 2001.
[5] A. Gilloire and V. Turbin, “Using auditory properties to improve the behaviour of stereophonic acoustic echo cancellation,” in Proc. IEEE ICASSP, pp. 3681-3684, 1998.
[6] A. Ben Aicha and S. Ben Jebara, “Caractérisation perceptuelle de la dégradation apportée par les techniques de débruitage de la parole,” Traitement et Analyse de l’Information Méthodes et Applications TAIMA, Tunisia, 2007.
[7] A. Ben Aicha and S. Ben Jebara, “Quantitative perceptual separation of two kinds of degradation in speech denoising applications,” Workshop on Non-Linear Speech Processing NOLISP, France, 2007.
[8] T. Painter and A. Spanias, “Perceptual coding of digital audio,” in Proc. IEEE, vol. 88, pp. 451-513, 2000.
[9] S. Shimauchi and S. Makino, “Stereo projection echo canceler with true echo path estimation,” in Proc. IEEE ICASSP, pp. 3059-3062, 1995.
[10] D. E. Tsoukalas, J. N. Mourjopoulos and G. K. Kokkinakis, “Speech enhancement based on audible noise suppression,” IEEE Trans. Speech and Audio Processing, vol. 5, no. 6, pp. 497-517, 1997.
[11] J. D. Johnston, “Transform coding of audio signal using perceptual noise criteria,” IEEE J. Select. Areas Commun., vol. 6, pp. 314-323, 1988.