Adaptive Filtering
Published by InTech
Janeza Trdine 9, 51000 Rijeka, Croatia
Preface
A digital filter is a structure that transforms an input sequence of numbers (a signal) into an output sequence, and thus models the behavior of a real system. The model, or transfer function, is a simplified mathematical representation of that system. The filter is built from a few kinds of elements: delays, multipliers, adders and, less often, nonlinear functions; their values, combination and number determine its characteristics. An adaptive filter, in addition, is able to self-adjust the parameters of these elements over time (for example, the coefficients of the multipliers) according to a certain algorithm, so that the relationship between the input and output sequences adapts to changes in the complex system being represented. This update usually takes place by iteratively minimizing a cost function.
Digital adaptive filters are therefore very popular in signal processing implementations where the modelled system and/or the input signals are time-variant, such as echo cancellation, active noise control and blind channel equalization, corresponding to the problems of system identification, inverse modeling, prediction and interference cancellation.
Any adaptive filter design focuses its attention on some of its components: the structure (transversal, recursive, lattice, systolic array, non-linear, transformed domain, etc.), the cost function (mean square error, least squares) and the coefficient update algorithm (memoryless, block, gradient, etc.), in order to obtain certain benefits: robustness, speed of convergence, low misalignment, tracking capacity, low computational complexity, low delay, etc.
This information is very interesting not only for those who work with technologies based on adaptive filtering, but also for teachers and professionals interested in digital signal processing in general, and in particular in how to deal with the complexity of real systems: non-linear, time-variant, continuous and unknown.
Convergence Evaluation of a Random Step-Size NLMS Adaptive Algorithm in System Identification and Channel Equalization
1. Introduction
Adaptive filtering algorithms have been widely applied to solve many problems in digital
communication systems [1-3]. So far, the Least Mean Square (LMS) and its normalized
version (NLMS) adaptive algorithms have been the most commonly adopted approaches
owing to the clarity of the mean-square-error cost function in terms of statistical concept and
the simplicity for computation. It is known that the NLMS algorithm gives better
convergence characteristics than the LMS because it uses a variable step-size parameter in
which the variation is achieved due to the division, at each iteration, of the fixed step size by
the input power. However, a critical issue associated with both algorithms is the choice of
the step-size parameter that is the trade-off between the steady-state misadjustment and the
speed of adaptation. Recent studies have thus presented the idea of variable step-size NLMS
algorithm to remedy this issue [4-7]. Also, many other adaptive algorithms [8, 9] have been
defined and studied to improve the adaptation performance. In this work, the proposed approach of randomizing the NLMS algorithm's step-size is introduced into the adaptation process of both channel equalisation and system identification, and tested over defined communication channels. The proposed random step-size approach yields an algorithm with a good convergence rate and steady-state stability.
The objective of this chapter is to analyze and compare the proposed random step-size NLMS and the standard NLMS algorithms, as implemented in the adaptation process
of two fundamental applications of adaptive filters, namely adaptive channel equalization
and adaptive system identification. In particular, we focus our attention on the behavior of
Mean Square Error (MSE) of the proposed and the standard NLMS algorithms in the two
mentioned applications. From the MSE performances we can determine the speed of
convergence and the steady state noise floor level. The key idea in this chapter is that a new
and simple approach to adjust the step-size (μ) of the standard NLMS adaptive algorithm
has been implemented and tested. The value of μ is controlled by a Pseudorandom Noise (PRN) sequence drawn from a uniform distribution over the interval (0, 1).
Randomizing the step-size eliminates much of the trade-off between residual error and
convergence speed compared with the fixed step-size. In this case, the adaptive filter will
change its coefficients according to the NLMS update, with the step-size pseudo-randomized by the PRN sequence. This chapter also covers the most popular
advances in adaptive filtering which include adaptive algorithms, adaptive channel
equalization, and adaptive system identification.
In this chapter, the concept of using random step-size approach in the adaptation process of
the NLMS adaptive algorithm will be introduced and investigated. The investigation
includes calculating and plotting the MSE performance of the proposed algorithm in system identification and channel equalization, and comparing the computer simulation results with those of the standard NLMS algorithm.
The organization of this chapter is as follows: In Section 2 an overview of adaptive filters
and their applications is demonstrated. Section 3 describes the standard NLMS and the
proposed random step-size NLMS algorithms. In Section 4 the performance analyses of adaptive channel equalization and adaptive system identification are given. Finally, the conclusion and the list of references are given in Sections 5 and 6, respectively.
2. Adaptive filters
In many applications requiring filtering, the necessary frequency response may not be known beforehand, or it may vary with time (for example, the suppression of engine harmonics in a car stereo). In such applications, an adaptive filter which can automatically
design itself and which can track system variations in time is extremely useful. Adaptive
filters are used extensively in a wide variety of applications, particularly in
telecommunications. Although adaptive filters have been successfully applied in many communications and signal processing fields, including adaptive system identification, adaptive channel equalization, adaptive interference (noise) cancellation, and adaptive echo cancellation, the focus here is on their applications in adaptive channel equalisation and adaptive system identification.
3. Adaptive algorithms
Adaptive filter algorithms have been used in many signal processing applications [1]. One
of the adaptive filter algorithms is the normalized least mean square (NLMS), which is the
most popular one because it is simple yet robust. NLMS generally performs better than LMS because its effective step-size is automatically normalized by the input power, whereas that of LMS is fixed [2]. A critical
issue associated with all algorithms is the choice of the step-size parameter that is the trade-
off between the steady-state misadjustment and the speed of adaptation. A recent study has
presented the idea of variable step-size LMS algorithm to remedy this issue [4].
Nevertheless, many other adaptive algorithms based upon non-mean-square cost functions can also be defined to improve the adaptation performance. For example, the use of the fourth power of the error has been investigated [8], resulting in the Least-Mean-Fourth (LMF) adaptive algorithm. The use of a switching algorithm in adaptive channel equalization has also been studied [9].
The general performance measures of an adaptive filter are its rate of convergence and its misadjustment. A fast rate of convergence allows the algorithm to adapt rapidly to a stationary environment of unknown statistics. However, the misadjustment, the quantitative measure by which the final value of the mean-square error (MSE), averaged over an ensemble of adaptive filters, deviates from the minimum MSE, grows as the rate of convergence becomes faster; a trade-off therefore exists between the two.
The adaptive weight control mechanism applies a weight adjustment to the transversal filter so as to minimize the MSE value [11]. This process is repeated over a number of iterations until the filter reaches a steady state. In summary, the purpose of the adaptive system is to filter the input signal u(n) so that it resembles the desired input signal d(n). The filter could be of any type, but the most widely used is the N-tap FIR filter because of its simplicity and stability.
[Figure: adaptive transversal filter — the input u(n) is filtered by a transversal filter to produce y(n); the error e(n) between the desired response d(n) and y(n) drives an adaptive weight control mechanism that adjusts the filter taps.]
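As a concrete illustration of the transversal structure described above, the following sketch computes the output of an N-tap FIR filter as the inner product of the tap weights with a delay line of past inputs (the tap values and signal here are illustrative, not taken from the chapter):

```python
import numpy as np

def fir_transversal(w, x):
    """N-tap FIR transversal filter: y(n) = sum_k w[k] * x(n - k)."""
    N = len(w)
    delay_line = np.zeros(N)                 # [x(n), x(n-1), ..., x(n-N+1)]
    y = np.zeros(len(x))
    for n, sample in enumerate(x):
        # Shift the delay line and insert the newest sample at the first tap.
        delay_line = np.concatenate(([sample], delay_line[:-1]))
        y[n] = np.dot(w, delay_line)         # inner product with the tap weights
    return y

w = np.array([0.5, 0.3, 0.2])                # illustrative tap weights
x = np.array([1.0, 0.0, -1.0, 2.0, 0.5])     # illustrative input sequence
y = fir_transversal(w, x)
```

The same output can be obtained with np.convolve(x, w)[:len(x)]; the explicit delay line simply mirrors the structure shown in the figure.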
d(n) = w^H(n+1) u(n)   (2)
The method of the Lagrange multiplier is used to solve this constrained problem:
w(n+1) = w(n) + (1/2) λ u(n)   (3)
The unknown multiplier, λ, can be obtained by substituting (3) into (2):
λ = 2 e(n) / ‖u(n)‖²   (4)
Then, combining (3) and (4) to formulate the optimal value of the incremental change, δw(n+1), we obtain:
δw(n+1) = w(n+1) − w(n) = (μ / ‖u(n)‖²) u(n) e(n)   (6)
Equation (6) can be written as:
w(n+1) = w(n) + (μ / (α + ‖u(n)‖²)) u(n) e(n)   (7)
where the constant α is added to the denominator so that the update remains bounded when the tap-input vector u(n) becomes very small.
3.1.3 Step-size
The stability of the NLMS algorithm depends on the value of its step-size, and thus its optimization criterion should be found [12]. The weight-error vector is defined as:
ε(n) = w − w(n)   (9)
where w is the weight vector of the unknown system. Subtracting the update from w gives:
ε(n+1) = ε(n) − (μ / ‖u(n)‖²) u(n) e(n)   (10)
To study the stability performance of adaptive filters, the mean-square deviation may be used [11]:
ξ(n) = E[‖ε(n)‖²]   (11)
From (10), the evolution of the mean-square deviation is:
ξ(n+1) − ξ(n) = μ² E[|e(n)|² / ‖u(n)‖²] − 2μ Re{E[e_u(n) e(n) / ‖u(n)‖²]}   (12)
where e_u(n) = ε^H(n) u(n) denotes the undisturbed error.
The bounded range of the normalized step-size parameter can be found from (12) as:
0 < μ < 2 Re{E[e_u(n) e(n) / ‖u(n)‖²]} / E[|e(n)|² / ‖u(n)‖²]   (14)
For the case of real-valued data, the following approximation can be used:
E[e_u²(n)] = E[‖u(n)‖²] ξ(n)   (15)
Substituting (15) into (14) yields:
0 < μ < 2 E[‖u(n)‖²] ξ(n) / E[e²(n)]   (16)
The proposed random step-size NLMS update is then:
w[n+1] = w[n] + (PN[n] μ / (α + ‖u[n]‖²)) e(n) u(n)   (18)
where w[n] is the previous weight vector of the filter, w[n+1] is the new weight vector, and PN[n] is the pseudorandom sequence that scales the step-size at each iteration.
The step-size µ directly affects how quickly the adaptive filter will converge toward the
unknown system. If µ is very small, then the coefficients change only a small amount at each
update, and the filter converges slowly. With a larger step-size, more gradient information is
included in each update, and the filter converges more quickly; however, when the step-size
is too large, the coefficients may change too quickly and the filter will diverge. (It is possible in some cases to determine analytically the largest value of μ ensuring convergence.) In summary, within the margin given in (17), a larger μ gives a faster convergence rate but less stability around the minimum value, while a smaller μ gives a slower convergence rate but more stability around the optimum value.
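A minimal sketch of the update in (18) applied to a system identification task; the "unknown" system h and all parameter values below are illustrative assumptions, not taken from the chapter. At each iteration the fixed step-size μ is scaled by a uniform PRN sample before the normalized update is applied:

```python
import numpy as np

rng = np.random.default_rng(0)

M = 8                                   # filter length (illustrative)
h = rng.standard_normal(M)              # "unknown" system to be identified
w = np.zeros(M)                         # adaptive filter weights
mu, alpha = 1.0, 1e-6                   # fixed step-size and regularization constant

x = rng.standard_normal(5000)           # input signal
for n in range(M, len(x)):
    u = x[n - M:n][::-1]                # tap-input vector u(n)
    d = h @ u + 1e-3 * rng.standard_normal()      # desired signal with small noise
    e = d - w @ u                       # error e(n)
    pn = rng.random()                   # PRN sample, uniform on (0, 1)
    w = w + (pn * mu / (alpha + u @ u)) * e * u   # random step-size NLMS, eq. (18)
```

With the PRN scaling, the effective step-size varies randomly between 0 and μ at each iteration, with mean μ/2.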
4. Performance analysis
4.1 Adaptive channel equalization
Adaptive channel equalization in digital communication systems is perhaps the most
heavily exploited area of application for adaptive filtering algorithms. Adaptive filtering
Convergence Evaluation of a Random Step-Size
NLMS Adaptive Algorithm in System Identification and Channel Equalization 9
algorithms have been widely applied to solve the problem of channel equalization in digital
communication systems. This is because, firstly, the adaptation of tap weights of an
equalizer is necessary to perform channel equalization tasks successfully and secondly, the
development of an adaptive algorithm that allows fast weight modification, while
improving the estimation performance of the equalizer, will enhance the capability of such
equalization systems in real applications.
[Figure: adaptive channel equalization — the input signal passes through a telephone channel and AWGN is added; the received signal u(n) is processed by the adaptive equalizer, whose output y(n) is fed to a slicer to produce the output signal, while the adaptive algorithm updates the equalizer taps using the error e(n) between the desired response d(n) and y(n).]
In a transversal adaptive filter, the input vector U_n and the weight vector W_n at the nth iteration are defined as follows:
U_n = [u_n, u_{n−1}, ..., u_{n−M+1}]^T   (22)
W_n = [w_0, w_1, ..., w_{M−1}]^T   (23)
where u_n is the filter input and w_i (i = 0, 1, ..., M−1) are the tap weights of a filter of length M. The filter output is obtained as follows:
y_n = W_n^T U_n   (24)
The error is measured between the output of the adaptive filter and the output of the unknown system. On the basis of this measure, the adaptive filter changes its coefficients in an attempt to reduce the error. Hence the error signal, e(n), involved in the adaptive process, is defined by:
e(n) = d(n) − y(n)
[Figure: adaptive system identification — the input u(n) drives both the unknown system w(n) and the adaptive filter ŵ(n); measurement noise v(n) is added to the unknown system output to form d(n), and the error e(n) between d(n) and the filter output y(n) drives the adaptation.]
The filter length was set to 12 for both the minimum-phase, H1(z), and the non-minimum-phase, H2(z), channels. The step-size for the NLMS algorithm was chosen from 0.01 to 0.1 and the number of transmitted bits was 2500. The comparison between the two algorithms, the standard NLMS and the pseudorandom step-size NLMS, is done by first choosing the best step-size, the one that gives the fastest convergence, and then using this step-size for the comparison.
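The comparison procedure can be sketched as follows; the channel taps, SNR, equalizer length and decision delay below are illustrative assumptions (the chapter's channels H1(z) and H2(z) are defined elsewhere). A linear transversal equalizer is trained with the standard NLMS update on a known BPSK training sequence and the squared-error learning curve is recorded:

```python
import numpy as np

rng = np.random.default_rng(1)

ch = np.array([0.3, 1.0, 0.3])           # assumed channel impulse response (illustrative)
M, mu, alpha, delay = 12, 0.05, 1e-6, 7  # equalizer length, step-size, regularizer, delay
n_bits, snr_db = 2500, 25

s = rng.choice([-1.0, 1.0], size=n_bits)          # BPSK training symbols
r = np.convolve(s, ch)[:n_bits]                   # channel output
noise_std = np.sqrt(np.mean(r**2) / 10**(snr_db / 10.0))
r = r + noise_std * rng.standard_normal(n_bits)   # add AWGN at the chosen SNR

w = np.zeros(M)
mse = np.zeros(n_bits - M)
for n in range(M, n_bits):
    u = r[n - M:n][::-1]                 # equalizer tap-input vector
    d = s[n - delay]                     # delayed training symbol as desired response
    e = d - w @ u
    w = w + (mu / (alpha + u @ u)) * e * u        # standard NLMS update
    mse[n - M] = e**2
```

Averaging such learning curves over many independent runs gives MSE curves of the kind shown in the figures; replacing mu by rng.random() * mu at each iteration reproduces the random step-size variant.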
Figure 5 shows that a step-size of 0.05 gives the fastest convergence rate over the minimum-phase channel (CH-1), while over the non-minimum-phase channel (CH-2) a step-size of 0.01 gives the fastest convergence rate. The comparison also looks at the effect of decreasing the SNR from 25 dB to 5 dB. Figures 6-8 show the mean square error against the number of iterations for various signal-to-noise ratios, using a non-linear transversal equalizer over the minimum-phase channel. From these figures it is clear that both algorithms have approximately the same speed of convergence, but the ground noise floor level decreases as the SNR decreases when the random step-size NLMS algorithm is used. The same conclusion was observed for the non-minimum-phase channel defined in (32) above. This is because the step-size parameter of the proposed random NLMS algorithm is based on a uniform random distribution, which drives the error sequence to a much lower noise floor level compared to that of the standard NLMS algorithm with a fixed step-size. Results for CH-2 are not included here due to space limitations.
Fig. 5. MSE performances of the NLMS for various step-sizes over CH-1
Fig. 6. MSE performances of the two algorithms for SNR=15dB over CH-1
Fig. 7. MSE performances of the two algorithms for SNR=10 dB over CH-1
Fig. 8. MSE performances of the two algorithms for SNR=5 dB over CH-1
Fig. 9. MSE performance of the NLMS algorithm for various step-sizes (mu)
Fig. 10. MSE performance of the NLMS algorithm for various step-sizes (mu)
Fig. 11. MSE performances of fixed and random step-size NLMS algorithms
Fig. 12. MSE performances for fixed and random step-size NLMS algorithms
Fig. 13. MSE performances for fixed and random step-size NLMS algorithms
5. Conclusion
The proposed idea of using a random step-size in the adaptation of the NLMS algorithm has been implemented and tested in the adaptation process of both channel equalization and system identification. The tests measure the MSE performance of both the standard NLMS and the random step-size NLMS algorithms in the above-mentioned applications. An extensive investigation to determine the NLMS algorithm's best fixed step-size was carried out over the defined channels, and a comparison between the NLMS with a fixed step-size and with a pseudorandom step-size was then performed, which shows the trade-off between convergence speed and noise floor level.
It can be concluded that, in the case of adaptive channel equalization, the random step-size outperforms the fixed step-size by achieving a much lower ground noise floor, especially at low signal-to-noise ratios, while maintaining a similar convergence rate.
In the case of adaptive system identification, the random step-size also outperforms the fixed step-size NLMS adaptive algorithm, achieving a lower ground noise floor, especially at larger step-sizes.
The values of the step-size parameter of the proposed random NLMS algorithm are drawn from a uniform random distribution, which drives the error sequence, and hence the MSE noise floor, to a much lower level compared to that of the standard NLMS algorithm with a fixed step-size.
6. Acknowledgment
The author would like to acknowledge the contributions of Mr. N. Al-Saeedi [Ref. 13]
towards getting some results related to the channel equalization and Dr. T. Shimamura
[Ref. 14] for his contribution towards discussing the idea of using the random step-size
approach in system identification.
7. References
[1] S. Haykin, Adaptive Filter Theory, Prentice-Hall, 2002.
[2] C. F. N. Cowan and P. M. Grant, Adaptive Filters, Prentice-Hall, Englewood Cliffs, 1985.
[3] B. Widrow and S. D. Stearns, Adaptive Signal Processing, Prentice-Hall, Englewood Cliffs, NJ, 1988.
[4] A. I. Sulyman and A. Zerguine, "Convergence and steady-state analysis of a variable step-size NLMS algorithm", Elsevier Signal Processing, Vol. 83, pp. 1255-1273, 2003.
[5] H. Takekawa, T. Shimamura, and S. A. Jimaa, "An efficient and effective variable step size NLMS algorithm", 42nd Asilomar Conference on Signals, Systems and Computers, CA, USA, October 26-29, 2008.
[6] L. Quin and M. G. Bellanger, "Convergence analysis of a variable step-size normalized adaptive filter algorithm", Proc. EUSIPCO, Adaptive Systems, PAS.4, 1996.
[7] Y. Wang, C. Zhang, and Z. Wang, "A new variable step-size LMS algorithm with application to active noise control", Proc. IEEE ICASSP, Vol. 5, pp. 573-575, 2003.
[8] E. Walach and B. Widrow, "The Least Mean Fourth (LMF) adaptive algorithm and its family", IEEE Transactions on Information Theory, 30(2), pp. 275-283, 1984.
[9] S. A. Jimaa, C. F. N. Cowan and M. J. J. Holt, "Adaptive channel equalisation using least mean switched error norm algorithm", IEE 16th Saraga Colloquium on Digital and Analogue Filters and Filtering Systems, London, 9 Dec. 1996.
[10] Y. Wang, C. Zhang, and Z. Wang, "A new variable step-size LMS algorithm with application to active noise control", Proc. IEEE ICASSP, pp. 573-575, 2003.
[11] T. Arnantapunpong, T. Shimamura, and S. A. Jimaa, "A new variable step size for normalized LMS algorithm", NCSP'10 - 2010 RISP International Workshop on Nonlinear Circuits, Communications and Signal Processing, Honolulu, Hawaii, USA, March 3-5, 2010.
[12] S. Haykin, Adaptive Filter Theory, 3rd Edition, Prentice-Hall, 1996.
[13] S. A. Jimaa, N. Al Saeedi, S. Al-Araji, and R. M. Shubair, "Performance evaluation of random step-size NLMS in adaptive channel equalization", IFIP Wireless Days Conference, Dubai, UAE, November 2008.
[14] S. A. Jimaa and T. Shimamura, "Convergence evaluation of a random step-size NLMS adaptive algorithm in system identification", 10th IEEE International Conference on Signal Processing (ICSP 2010), Beijing, China, Oct. 24-28, 2010.
2
Steady-State Performance
Analyses of Adaptive Filters
Bin Lin and Rongxi He
College of Information Science and Technology,
Dalian Maritime University, Dalian,
China
1. Introduction
Adaptive filters have become a vital part of many modern communication and control systems, with uses in system identification, adaptive equalization, echo cancellation, beamforming, and so on [1]. The least mean squares (LMS) algorithm, the most popular adaptive filtering algorithm, has enjoyed enormous popularity due to its simplicity and robustness [2], [3]. Over the years, several variants of LMS have been proposed to overcome some of its limitations by modifying the error estimation function from linear to nonlinear. The sign-error LMS algorithm is attractive for its computational simplicity [4]; the least-mean fourth (LMF) algorithm was proposed for applications in which the plant noise has a probability density function with a short tail [5]; and the LMMN algorithm achieves better steady-state performance than the LMS algorithm and better stability properties than the LMF algorithm by adjusting its mixing parameter [6], [7].
The performance of an adaptive filter is generally measured in terms of its transient behavior and its steady-state behavior. There have been numerous works in the literature on the performance of adaptive filters, with many original results and approaches [3]-[20]. In most of this literature, the steady-state performance is obtained as a limiting case of the transient behavior [13]-[16]. However, most adaptive filters are inherently nonlinear and time-variant systems. The nonlinearities in the update equations tend to lead to difficulties in the study of their steady-state performance as a limiting case of their transient performance [12]. In addition, transient analyses tend to require additional simplifying assumptions, which at times can be restrictive. Using the energy conservation relation between two successive iteration updates, N. R. Yousef and A. H. Sayed re-derived the steady-state performance for a large class of adaptive filters [11], [12], such as the sign-error LMS, LMS and LMMN algorithms, which bypassed the difficulties encountered in obtaining steady-state results as the limiting case of a transient analysis.
However, it is generally observed that most works analyzing steady-state performance study individual algorithms separately. This is because different adaptive schemes have different nonlinear update equations, and the particularities of each case tend to require different arguments and assumptions. Some authors try to investigate the steady-state performance from a general viewpoint to cover more adaptive filtering algorithms, although that is a challenging task. Based on a Taylor series expansion (TSE), S. C. Douglas and T. H. Meng obtained a general expression for the steady-state MSE of adaptive filters with error nonlinearities [10]. However, this expression is only applicable to the case of real-valued data and small step-sizes. Also using the TSE, our previous works have obtained analytical expressions of the steady-state performance for some adaptive algorithms [8], [17], [19], [28]. Using Price's theorem, T. Y. Al-Naffouri and A. H. Sayed obtained the steady-state performance as the fixed point of a nonlinear function of the EMSE [11], [18]. For many adaptive filters with error nonlinearities, closed-form analytical expressions cannot be obtained directly, and the Gaussian assumption underlying Price's theorem does not extend to other noise distributions. Recently, as a limiting case of the transient behavior, a general expression for the steady-state EMSE was obtained by H. Husøy and M. S. E. Abadi [13]. As can be seen from Table 1 in [13], this expression holds only for adaptive filters with certain kinds of preconditioned input data, and cannot be used to analyze adaptive filters with error nonlinearities.
These points motivate the development in this chapter of a unified approach to obtain general expressions for the steady-state performance of adaptive filters. In our analyses, a second-order TSE is used to analyze the performance of adaptive algorithms in the real-valued case. For the complex-valued case, a so-called complex Brandwood-form series expansion (BSE), derived by G. Yan in [22], is utilized. This series expansion is based on Brandwood's derivation operators [21] with respect to a complex-valued variable and its conjugate, and was used to analyze the MSE of the Bussgang algorithm (BA) in noiseless environments [19], [20]. Here, the method is extended to analyze other adaptive filters in the complex-valued case.
1.1 Notation
Throughout the paper, small boldface letters are used to denote vectors, and capital boldface letters are used to denote matrices, e.g., w_i and R_u. All vectors are column vectors, except for the input vector u_i, which is taken to be a row vector for convenience of notation. In addition, the following notations are adopted:
‖·‖: Euclidean norm of a vector;   Tr(·): trace of a matrix;
E[·]: expectation operator;   Re{·}: real part of a complex-valued quantity;
I_M: M × M identity matrix;   (·)*: complex conjugation for scalars;
!: factorial;   !!: double factorial;
f_x^(1)(a): first derivative of the function f(x) with respect to x at the value a;
f_{x,y}^(2)(a, b): second partial derivative of the function f(x, y) with respect to x and y at the point (a, b);
C^i(D): the set of all functions f(x) whose i-th derivative is continuous on the definition domain D.
The general adaptive filter update considered here is
w_{i+1} = w_i + μ g[u_i] u_i^H f(e_i, e_i*),   (1)
e_i = d_i − u_i w_i,   (2)
d_i = u_i w_{o,i} + v_i,   (3)
where
μ: step-size;   u_i: 1 × M row input (regressor) vector;
(·)^H: conjugate transpose;   w_i: M × 1 weight vector;
e_i: scalar-valued error signal;   d_i: scalar-valued noisy measurement;
g[u_i]: scalar variable factor for the step-size;
w_{o,i}: M × 1 unknown column vector at time i that we wish to estimate;
v_i: accounts for both measurement noise and modeling errors, with support region D_v;
f(e_i, e_i*): memoryless nonlinearity acting upon the error e_i and its complex conjugate e_i* (if e is complex-valued, the estimation error function f(e, e*) has two independent variables, e and e*). Different choices of f(e_i, e_i*) result in different adaptive algorithms; for example, Table 1 defines f(e_i, e_i*) for many well-known special cases of (1) [10]-[12].
The rest of the paper is organized as follows. In the next section, the steady-state
performances for complex and real adaptive filters are derived, which are summarized in
Theorem 1 based on separation principle and Theorem 2 for white Gaussian regressor,
respectively. In section 3, based on Theorem 1 and Theorem 2, the steady-state performances
for the real and complex least-mean p-power norm (LMP) algorithm, LMMN algorithm and
their normalized algorithms, are investigated, respectively. Simulation results are given in
Section 4, and conclusions are drawn in Section 5.
Algorithm   f(e_i, e_i*)
LMMN        e_i (δ + (1 − δ)|e_i|²)
ε-NLMP      |e_i|^(p−2) e_i / (ε + ‖u_i‖²)
ε-LMMN      e_i (δ + (1 − δ)|e_i|²) / (ε + ‖u_i‖²)
Notes:
1. The parameter p is the order of the cost function of the LMP algorithm, which includes the LMS (p = 2) and LMF (p = 4) algorithms.
2. The parameter δ, with 0 ≤ δ ≤ 1, is the mixing parameter of the LMMN algorithm; δ = 1 results in the LMS algorithm and δ = 0 results in the LMF algorithm.
3. The parameter ε of the ε-NLMP and ε-LMMN algorithms is a small positive real value.
Table 1. Examples of the estimation error function f(e_i, e_i*)
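The role of the nonlinearity f(e_i, e_i*) in (1) can be made concrete for real-valued errors. The sketch below is an assumption-level illustration (δ is written as delta, and only real e is considered); it encodes error functions corresponding to algorithms named above and checks the limits stated in Note 2, namely that the LMMN nonlinearity reduces to LMS for δ = 1 and to LMF for δ = 0:

```python
# Error nonlinearities f(e) for real-valued errors (illustrative encodings).
f_lms  = lambda e: e                          # LMS:        f(e) = e
f_lmf  = lambda e: e**3                       # LMF:        f(e) = e^3
f_sign = lambda e: (e > 0) - (e < 0)          # sign-error: f(e) = sign(e)
f_lmmn = lambda e, delta: e * (delta + (1.0 - delta) * e**2)   # LMMN mixing

e = 0.5
lms_limit = f_lmmn(e, 1.0)   # delta = 1 -> LMS behaviour
lmf_limit = f_lmmn(e, 0.0)   # delta = 0 -> LMF behaviour
```

Plugging any of these functions into the update (1) yields the corresponding member of the family.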
With these definitions, the error can be decomposed as
e_i = e_a(i) + v_i,   (4)
where e_a(i) = u_i (w_{o,i} − w_i) = u_i w̃_i denotes the a priori estimation error.
The steady-state MSE for an adaptive filter can be written as MSE = lim_{i→∞} E[|e_i|²]. To obtain the MSE, we restrict the development to a small step-size, a long filter length, appropriate initial conditions of the weights, and finite input power and noise variance in much of what follows³, which is embodied in the following two assumptions:
A.1: The noise sequence v_i, with zero mean and variance σ_v², is independent identically distributed (i.i.d.) and statistically independent of the regressor sequence u_i.
A.2: The a priori estimation error e_a(i), with zero mean, is independent of v_i. For complex-valued cases, it also satisfies the circularity condition, namely E[e_a²(i)] = 0.
The above assumptions are commonly used in the steady-state performance analyses of most adaptive algorithms [11]-[14]. Then, under A.1 and A.2, the steady-state MSE can be written as MSE = σ_v² + ζ, where ζ is the steady-state EMSE, defined by
ζ = lim_{i→∞} E[|e_a(i)|²].   (5)
In nonstationary environments, the variation of the unknown system is modeled by the random-walk model
w_{o,i+1} = w_{o,i} + q_i,   (6)
where q_i is a zero-mean i.i.d. perturbation sequence, independent of u_j and v_j, with covariance matrix Q = E[q_i q_i^H] (assumption A.3). Defining the weight-error vector w̃_i = w_{o,i} − w_i and subtracting (1) from (6) gives the weight-error recursion
w̃_{i+1} = w̃_i − μ g[u_i] u_i^H f(e_i, e_i*) + q_i.   (7)
Evaluating the squared Euclidean norm of both sides of (7) yields
‖w̃_{i+1}‖² = ‖w̃_i‖² − 2μ Re{w̃_i^H u_i^H g[u_i] f(e_i, e_i*)} + μ² g²[u_i] ‖u_i‖² |f(e_i, e_i*)|² + 2 Re{w̃_i^H q_i} − 2μ Re{q_i^H u_i^H g[u_i] f(e_i, e_i*)} + ‖q_i‖².   (8)
³ As described in [25] and [26], the convergence or stability condition of an adaptive filter with error nonlinearity is related to the initial conditions of the weights, the step-size, the filter length, the input power and the noise variance. Since our work mainly focuses on the steady-state performance of adaptive filters, these conditions are assumed to be satisfied.
Taking expectations on both sides of (8) and using A.3 and e_a(i) = u_i w̃_i, we get
E[‖w̃_{i+1}‖²] = E[‖w̃_i‖²] − 2μ Re{E[e_a*(i) g[u_i] f(e_i, e_i*)]} + μ² E[‖u_i‖² g²[u_i] |f(e_i, e_i*)|²] + E[‖q_i‖²].   (9)
At steady state, adaptive filters satisfy lim_{i→∞} E[‖w̃_{i+1}‖²] = lim_{i→∞} E[‖w̃_i‖²] [11], [12]. Then, the variance relation (9) can be rewritten compactly as
2 Re{E[e_a* g[u] f(e, e*)]} = μ E[‖u‖² g²[u] |f(e, e*)|²] + μ⁻¹ Tr(Q),   (10)
where Tr(Q) = E[‖q_i‖²] and the time index i has been omitted for ease of reading. In particular, in stationary environments the second term on the right-hand side of (10) vanishes, since q_i is a zero sequence (i.e., Tr(Q) = 0).
A.4: At steady state, ‖u_i‖² g²[u_i] and g[u_i] are statistically independent of f(e_i, e_i*). This assumption is referred to as the separation principle in [11]. Under assumptions A.2 and A.4, and using (4), we can rewrite (10) as
β_u E[h(e, e*)] = μ ᾱ_u E[h_q(e, e*)] + μ⁻¹ Tr(Q),   (11)
where
ᾱ_u = E[‖u‖² g²[u]],   β_u = E[g[u]],
h(e, e*) = 2 Re{e_a* f(e, e*)},   h_q(e, e*) = |f(e, e*)|².   (12)
Lemma 1 (complex-valued case): Evaluating h, h_q and their Brandwood-form derivatives at e_a = 0 (i.e., at (e, e*) = (v, v*)) gives
h(v, v*) = 0,   h_{e,e*}^(2)(v, v*) = 2 Re{f_e^(1)(v, v*)},
h_q(v, v*) = |f(v, v*)|²,   (h_q)_{e,e*}^(2)(v, v*) = |f_e^(1)(v, v*)|² + |f_{e*}^(1)(v, v*)|² + 2 Re{f*(v, v*) f_{e,e*}^(2)(v, v*)}.   (13)
The proofs of Lemma 1 and all subsequent lemmas in this paper are given in the appendices.⁴
⁴ Since e and e* are taken as two independent variables, every f(e, e*) in Table 1 can be considered as a 'real' function with respect to e and e*, although f(e, e*) may be complex-valued. The accustomed rules of differentiation with respect to the two variables e and e* can then be used directly.
Lemma 2 (real-valued case): The corresponding quantities at e = v are
h(v) = 0,   h_{e,e}^(2)(v) = 4 f_e^(1)(v),   (h_q)_{e,e}^(2)(v) = 2 [f_e^(1)(v)]² + 2 f(v) f_{e,e}^(2)(v).   (14)
Theorem 1 (separation principle): Under assumptions A.1, A.2 and A.4, and provided the condition C.1: A β_u − μ B ᾱ_u > 0 holds, the steady-state EMSE of (1) in a stationary environment is
ζ_EMSE = μ C ᾱ_u / (A β_u − μ B ᾱ_u),   (15)
and in a nonstationary environment the steady-state tracking EMSE is
ζ_TEMSE = (μ⁻¹ Tr(Q) + μ C ᾱ_u) / (A β_u − μ B ᾱ_u),   (16)
which is minimized by the optimum step-size
μ_opt = −B Tr(Q) / (A β_u C) + √( Tr(Q) / (C ᾱ_u) + [B Tr(Q) / (A β_u C)]² ),   (17)
where, for the complex-valued case,
A = 2 Re{E[f_e^(1)(v, v*)]},
B = E[|f_e^(1)(v, v*)|² + |f_{e*}^(1)(v, v*)|² + 2 Re{f*(v, v*) f_{e,e*}^(2)(v, v*)}],   (18a)
C = E[|f(v, v*)|²],
and, for the real-valued case,
A = 2 E[f_e^(1)(v)],   B = E[(f_e^(1)(v))²] + E[f(v) f_{e,e}^(2)(v)],   C = E[f²(v)].   (18b)
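As a sanity check of (15), consider the real-valued LMS special case, f(e) = e and g[u] = 1. Then (18b) gives A = 2, B = 1 and C = σ_v², while ᾱ_u = E[‖u‖²] = Mσ_u² and β_u = 1, so (15) reduces to the familiar LMS EMSE μMσ_u²σ_v² / (2 − μMσ_u²). The sketch below (all parameter values are illustrative assumptions) compares this prediction with a simulated LMS run:

```python
import numpy as np

rng = np.random.default_rng(2)

M, mu = 8, 0.01                         # filter length and step-size (illustrative)
sigma_u2, sigma_v2 = 1.0, 0.01          # input and noise variances (illustrative)

# Theorem 1, eq. (15), specialized to real LMS: f(e) = e, g[u] = 1, so by (18b)
# A = 2, B = 1, C = sigma_v^2, while alpha_u = M*sigma_u^2 and beta_u = 1.
A, B, C = 2.0, 1.0, sigma_v2
alpha_u, beta_u = M * sigma_u2, 1.0
zeta_theory = mu * C * alpha_u / (A * beta_u - mu * B * alpha_u)

# Simulated LMS in a stationary environment (w_o fixed, Tr(Q) = 0).
w_o = rng.standard_normal(M)            # unknown system
w = np.zeros(M)
ea2 = []
for i in range(60000):
    u = np.sqrt(sigma_u2) * rng.standard_normal(M)
    ea = u @ (w_o - w)                  # a priori estimation error e_a(i)
    e = ea + np.sqrt(sigma_v2) * rng.standard_normal()   # e_i = e_a(i) + v_i
    w = w + mu * e * u                  # LMS update: f(e) = e, g[u] = 1
    if i > 20000:                       # discard the transient
        ea2.append(ea**2)
zeta_sim = np.mean(ea2)                 # empirical steady-state EMSE
```

For these values zeta_theory ≈ 4.2e-4, and the ensemble-averaged simulated EMSE should agree with it to within the accuracy of the time average.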
Proof: Writing e = v + e_a and e* = v* + e_a*, a second-order series expansion of h(e, e*) about (v, v*) gives
h(e, e*) = h(v, v*) + h_e^(1)(v, v*) e_a + h_{e*}^(1)(v, v*) e_a* + (1/2) h_{e,e}^(2)(v, v*) e_a² + (1/2) h_{e*,e*}^(2)(v, v*) (e_a*)² + h_{e,e*}^(2)(v, v*) e_a e_a* + O(e_a, e_a*),   (19)
where O(e_a, e_a*) denotes third- and higher-power terms of e_a or e_a*. Ignoring O(e_a, e_a*)⁶ and taking expectations of both sides of the above equation, we get
E[h(e, e*)] = E[h(v, v*)] + E[h_e^(1)(v, v*) e_a] + E[h_{e*}^(1)(v, v*) e_a*] + (1/2) E[h_{e,e}^(2)(v, v*) e_a²] + (1/2) E[h_{e*,e*}^(2)(v, v*) (e_a*)²] + E[h_{e,e*}^(2)(v, v*) e_a e_a*].   (20)
Under A.2 (i.e., v and e_a are mutually independent, and E[e_a] = E[e_a²] = 0), we obtain
E[h(e, e*)] = E[h(v, v*)] + E[h_{e,e*}^(2)(v, v*)] ζ_TEMSE,   (21)
where ζ_TEMSE is defined by (5). Here, to distinguish the two kinds of steady-state EMSE, we use different subscripts for ζ, i.e., ζ_EMSE for the steady-state EMSE and ζ_TEMSE for the tracking performance. Similarly, replacing h(e, e*) in (20) by h_q(e, e*) and using A.2, we get
E[h_q(e, e*)] = E[h_q(v, v*)] + E[(h_q)_{e,e*}^(2)(v, v*)] ζ_TEMSE.   (22)
Substituting (21) and (22) into (11) yields

    \big[\zeta(u)\,E[h_{e,e^*}^{(2)}(v,v^*)] - \mu\,\xi(u)\,E[q_{e,e^*}^{(2)}(v,v^*)]\big]\,\zeta_{\mathrm{TEMSE}} = \mu\,\xi(u)\,E[q(v,v^*)] - \zeta(u)\,E[h(v,v^*)] + \mu^{-1}\mathrm{Tr}(Q).    (23)

Under Lemma 1, and since $h(v,v^*) = 0$, the above equation can be rewritten as

    \zeta_{\mathrm{TEMSE}}\,\big[A\,\zeta(u) - \mu B\,\xi(u)\big] = \mu^{-1}\mathrm{Tr}(Q) + \mu C\,\xi(u)    (24)

so that, if condition C.1 is satisfied, we obtain (16) for complex-valued cases. For real-valued data, expanding $h(e)$ around $v$ by the TSE gives

    h(e) = h(v) + h_e^{(1)}(v)e_a + \tfrac{1}{2}h_{e,e}^{(2)}(v)e_a^2 + O(e_a^3).    (25)
6 At steady state, the a priori estimation error $e_a$ becomes small if the step-size is small enough, so ignoring $O(e_a, e_a^*)$ is reasonable; this has been used to analyze the steady-state performance of adaptive filters in [11], [12], [19], [20].
7 The restrictive condition C.1 can be used to check whether the expressions (15)-(17) are applicable to a particular adaptive filter. In the later analyses, we will show that C.1 is not always satisfied for all kinds of adaptive filters. In addition, because of the influence of the initial conditions of the weights, the step-size, the filter length, the input power, the noise variance, and the residual terms $O(e_a, e_a^*)$ ignored in the preceding steps, C.1 cannot be a strict mean-square stability condition for an adaptive filter with error nonlinearity.
Ignoring $O(e_a^3)$ and taking expectations of both sides, we get

    E[h(e)] = E[h(v)] + E[h_e^{(1)}(v)e_a] + \tfrac{1}{2}E[h_{e,e}^{(2)}(v)e_a^2].    (26)

Under A.2, this reduces to

    E[h(e)] = E[h(v)] + \tfrac{1}{2}E[h_{e,e}^{(2)}(v)]\,\zeta_{\mathrm{TEMSE}}    (27)

where $\zeta_{\mathrm{TEMSE}}$ is defined by (5). Similarly, replacing $h(e)$ in (26) by $q(e)$ and using A.2, we get

    E[q(e)] = E[q(v)] + \tfrac{1}{2}E[q_{e,e}^{(2)}(v)]\,\zeta_{\mathrm{TEMSE}}.    (28)

Substituting (27) and (28) into (11), and using Lemma 2, we can obtain (24) again, now with the parameters A, B, C defined by (18b). Then, if condition C.1 is satisfied, we obtain (16) for real-valued cases.
In stationary environments, setting $\mathrm{Tr}(Q) = 0$ in (16), we obtain (15) for the steady-state EMSE, i.e., $\zeta_{\mathrm{EMSE}}$.
Finally, differentiating both sides of (16) with respect to $\mu$ and setting the derivative to zero at $\mu = \mu_{\mathrm{opt}}$ gives

    \big(C\,\xi(u) - \mu_{\mathrm{opt}}^{-2}\mathrm{Tr}(Q)\big)\big(A\,\zeta(u) - \mu_{\mathrm{opt}} B\,\xi(u)\big) + B\,\xi(u)\big(\mu_{\mathrm{opt}}^{-1}\mathrm{Tr}(Q) + \mu_{\mathrm{opt}} C\,\xi(u)\big) = 0,    (29)

which simplifies to the quadratic

    \mu_{\mathrm{opt}}^2 + \frac{2B\,\mathrm{Tr}(Q)}{AC\,\zeta(u)}\,\mu_{\mathrm{opt}} - \frac{\mathrm{Tr}(Q)}{C\,\xi(u)} = 0.    (30)

Solving the above equality, we obtain the optimum step-size expressed by (17). Here, we use the fact that $\mu_{\mathrm{opt}} > 0$. This ends the proof of Theorem 1.
Remarks:
1. Substituting (17) into (16) yields the minimum steady-state TEMSE.
2. From (18) we can see that the steady-state expressions (15)-(17) are all second-order approximations.
3. When the step-size is very small, $\mu B\,\xi(u) \ll A\,\zeta(u)$, and the expressions (15)-(17) can be simplified to

    \zeta_{\mathrm{EMSE}} = \frac{\mu C\,\xi(u)}{A\,\zeta(u)},    (31)

    \zeta_{\mathrm{TEMSE}} = \frac{\mu^{-1}\mathrm{Tr}(Q) + \mu C\,\xi(u)}{A\,\zeta(u)},    (32)
Steady-State Performance Analyses of Adaptive Filters 27
    \mu_{\mathrm{opt}} = \sqrt{\frac{\mathrm{Tr}(Q)}{C\,\xi(u)}},    (33)

    \zeta_{\min} = \frac{2}{A\,\zeta(u)}\sqrt{C\,\xi(u)\,\mathrm{Tr}(Q)}.    (34)

In addition, since $\mu B\,\xi(u)$ in the denominator of (15) has been ignored, C.1 can be simplified to $A > 0$, namely $\mathrm{Re}\{E[f_e^{(1)}(v,v^*)]\} > 0$ for complex-valued data cases and $E[f_e^{(1)}(v)] > 0$ for real-valued data cases, respectively. Here, the existence condition on the second-order partial derivatives of $f(e,e^*)$ can be weakened, i.e., $f(e,e^*) \in C^1(D_v)$.
4. For fixed step-size cases, substituting $g[u] = 1$ into (12), we get

    \zeta(u) = 1, \quad \xi(u) = E\|u\|^2 = \mathrm{Tr}(R_u).    (35)

Substituting (35) into (31) yields $\zeta_{\mathrm{EMSE}} = \mu C\,\mathrm{Tr}(R_u)/A$. For real-valued cases, this expression is the same as the one derived by S. C. Douglas and T. H.-Y. Meng in [10] (see e.g. Eq. 35 there). That is to say, Eq. 35 in [10] is a special case of (15) with small step-size, $g[u] = 1$, and real-valued data.
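As a quick numerical sanity check of the simplified expressions (31)-(34), the sketch below evaluates them for the LMS case ($f(e) = e$, $g[u] = 1$), for which (18b) gives $A = 2$, $B = 1$, $C = \sigma_v^2$. The concrete values of $M$, $\sigma_v^2$ and $\mathrm{Tr}(Q)$ are illustrative choices of ours, not taken from the chapter.

```python
import math

def emse(mu, A, C, zeta, xi):
    """Simplified steady-state EMSE, eq. (31)."""
    return mu * C * xi / (A * zeta)

def temse(mu, A, C, zeta, xi, trQ):
    """Simplified tracking EMSE, eq. (32)."""
    return (trQ / mu + mu * C * xi) / (A * zeta)

def mu_opt(C, xi, trQ):
    """Simplified optimal step-size, eq. (33)."""
    return math.sqrt(trQ / (C * xi))

def temse_min(A, C, zeta, xi, trQ):
    """Minimum tracking EMSE, eq. (34)."""
    return 2.0 * math.sqrt(C * xi * trQ) / (A * zeta)

# LMS with a white unit-power regressor of length M = 8 (illustrative values)
A, C = 2.0, 0.01                 # C = sigma_v^2
zeta, xi = 1.0, 8.0              # (35): zeta(u) = 1, xi(u) = Tr(R_u)
trQ = 1e-5
mu_o = mu_opt(C, xi, trQ)
# (32) evaluated at the optimal step-size (33) reproduces the minimum (34)
assert math.isclose(temse(mu_o, A, C, zeta, xi, trQ),
                    temse_min(A, C, zeta, xi, trQ), rel_tol=1e-9)
```

The check confirms the internal consistency of (31)-(34): at $\mu_{\mathrm{opt}}$ the two terms of (32) balance, giving the minimum (34).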
Consider now a white Gaussian regressor, i.e.,

    R_u = \sigma_u^2 I_{M \times M}.    (36)

Under the following assumption (see e.g. 6.5.13 in [11]) at steady state, i.e.,

    A.5: the weight-error vector $\tilde{w}$ is independent of $u$,

the term $E[\|u\|^2 q(e,e^*)]$ that appears on the right-hand side of (10) can be evaluated explicitly without appealing to the separation assumption (i.e., A.4), and the steady-state EMSE for adaptive filters can be obtained by the following theorem.
Theorem 2 (steady-state performance for adaptive filters with white Gaussian regressor): Consider adaptive filters of the form (1)-(3) with a white Gaussian regressor and $g[u_i] = 1$, and suppose assumptions A.1-A.3 and A.5 are satisfied. In addition, let $f(e,e^*) \in C^2(D_v)$. Then, if the following condition is satisfied, i.e.,

    C.2: \quad A > \mu B (M+\beta)\sigma_u^2,

the steady-state EMSE, TEMSE and the optimal step-size for adaptive filters can be approximated by

    \zeta_{\mathrm{EMSE}} = \frac{\mu C M \sigma_u^2}{A - \mu B (M+\beta)\sigma_u^2},    (37)

    \zeta_{\mathrm{TEMSE}} = \frac{\mu^{-1}\mathrm{Tr}(Q) + \mu C M \sigma_u^2}{A - \mu B (M+\beta)\sigma_u^2},    (38)

    \mu_{\mathrm{opt}} = \sqrt{\Big(\frac{B(M+\beta)\mathrm{Tr}(Q)}{ACM}\Big)^2 + \frac{\mathrm{Tr}(Q)}{CM\sigma_u^2}} - \frac{B(M+\beta)\mathrm{Tr}(Q)}{ACM},    (39)

where $\beta = 1$ and A, B, C are defined by (18a) for complex-valued data, and $\beta = 2$ and A, B, C are defined by (18b) for real-valued data, respectively.
The proof of Theorem 2 is given in Appendix D.
For the case of $\mu$ being small enough, the steady-state EMSE, TEMSE, the optimal step-size, and the minimum TEMSE can be expressed by (31)-(34), respectively, if we replace $\mathrm{Tr}(R_u)$ by $M\sigma_u^2$ and set $g[u_i] = 1$. That is to say, when the input vector $u$ is Gaussian with the diagonal covariance matrix (36), the steady-state performance obtained by the separation principle coincides with that obtained under A.5 for sufficiently small $\mu$.
Consider first the least-mean p-power (LMP) algorithm [23], for which

    f(e,e^*) = |e|^{p-2} e.    (40)

Its first- and second-order partial derivatives are

    f_e^{(1)}(e) = (p-1)|e|^{p-2}, \quad f_{e,e}^{(2)}(e) = (p-1)(p-2)|e|^{p-4} e    (41a)

in real-valued cases, and

    f_e^{(1)}(e,e^*) = \frac{p}{2}|e|^{p-2}, \quad f_{e^*}^{(1)}(e,e^*) = \frac{p-2}{2}|e|^{p-4} e^2, \quad f_{e,e^*}^{(2)}(e,e^*) = \frac{p(p-2)}{4}|e|^{p-4} e    (41b)
in complex-valued cases, respectively. Substituting (41b) into (18a) or (41a) into (18b), respectively, we get

    A = a\,\xi_v^{p-2}, \quad B = b\,\xi_v^{2p-4}, \quad C = \xi_v^{2p-2}    (42)

where $\xi_v^k \triangleq E|v|^k$ denotes the k-th order absolute moment of $v$, and

    a = 2(p-1),\ b = (p-1)(2p-3) \quad \text{(real-valued cases)}
    a = p,\ b = (p-1)^2 \quad \text{(complex-valued cases)}.    (43)

In this case, condition C.1 becomes

    \mu < \frac{a\,\zeta(u)\,\xi_v^{p-2}}{b\,\xi(u)\,\xi_v^{2p-4}},    (44)
and the steady-state performance for real LMP algorithms can be written as

    \zeta_{\mathrm{EMSE}} = \frac{\mu\,\xi(u)\,\xi_v^{2p-2}}{a\,\zeta(u)\,\xi_v^{p-2} - \mu b\,\xi(u)\,\xi_v^{2p-4}},    (45a)

    \zeta_{\mathrm{TEMSE}} = \frac{\mu^{-1}\mathrm{Tr}(Q) + \mu\,\xi(u)\,\xi_v^{2p-2}}{a\,\zeta(u)\,\xi_v^{p-2} - \mu b\,\xi(u)\,\xi_v^{2p-4}},    (45b)

    \mu_{\mathrm{opt}} = \sqrt{\Big(\frac{b\,\xi_v^{2p-4}\,\mathrm{Tr}(Q)}{a\,\zeta(u)\,\xi_v^{p-2}\,\xi_v^{2p-2}}\Big)^2 + \frac{\mathrm{Tr}(Q)}{\xi_v^{2p-2}\,\xi(u)}} - \frac{b\,\xi_v^{2p-4}\,\mathrm{Tr}(Q)}{a\,\zeta(u)\,\xi_v^{p-2}\,\xi_v^{2p-2}}.    (45c)
Similarly, substituting (42) into Theorem 2, we can also obtain the corresponding
expressions for the steady-state performance of LMP algorithms with white Gaussian
regressor.
Example 1: For the LMS algorithm, substituting p = 2 and (35) into (45a)-(45c), and substituting (42) with p = 2 into Theorem 2, yields the same steady-state performance results (see e.g. Lemma 6.5.1 and Lemma 7.5.1) as in [11]. For the LMF algorithm, substituting p = 4 and (35) into (45a)-(45c), and substituting (42) with p = 4 into Theorem 2, yields the same steady-state performance results (see e.g. Lemma 6.8.1 and Lemma 7.8.1 with $\delta = 0$)8 as in [11]. That is to say, the results of Lemma 6.5.1, Lemma 7.5.1, Lemma 6.8.1 and Lemma 7.8.1 in [11] are all second-order approximations.
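Example 1 can also be checked by direct simulation. The following sketch is our own (the unknown system, filter length, step-size and noise power are arbitrary illustrative choices): it runs real LMS on a white Gaussian regressor and compares the measured EMSE with (15), which for $f(e) = e$ and $g[u] = 1$ reduces to $\mu\sigma_v^2\mathrm{Tr}(R_u)/(2 - \mu\,\mathrm{Tr}(R_u))$.

```python
import numpy as np

rng = np.random.default_rng(0)
M, mu, var_v = 8, 0.01, 0.01
w_o = rng.standard_normal(M) / np.sqrt(M)   # hypothetical true system
w = np.zeros(M)
n_iter, n_settle = 100_000, 30_000
acc, cnt = 0.0, 0
for i in range(n_iter):
    u = rng.standard_normal(M)              # white input: R_u = I, Tr(R_u) = M
    v = np.sqrt(var_v) * rng.standard_normal()
    e_a = u @ (w_o - w)                     # a priori estimation error
    e = e_a + v
    w += mu * u * e                         # LMS: f(e) = e, g[u] = 1
    if i >= n_settle:                       # average e_a^2 after convergence
        acc += e_a * e_a
        cnt += 1
emse_sim = acc / cnt
emse_theory = mu * var_v * M / (2.0 - mu * M)   # (15) with A=2, B=1, C=var_v
assert 0.5 < emse_sim / emse_theory < 2.0
```

With these settings the simulated EMSE should sit close to the theoretical value of about $4.2\times 10^{-4}$; the assertion only demands agreement within a factor of two to stay robust to the random seed.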
Example 2: Consider real-valued data in Gaussian noise environments. Based on the following formula, described in [23],

    \xi_v^k = (k-1)!!\,\sigma_v^k \quad (k\ \text{even}), \qquad \xi_v^k = \sqrt{\tfrac{2}{\pi}}\,2^{(k-1)/2}\,\big(\tfrac{k-1}{2}\big)!\,\sigma_v^k \quad (k\ \text{odd}),    (46)

substituting (46) into (42) yields

    A = 2(p-1)(p-3)!!\,\sigma_v^{p-2}, \quad B = (p-1)(2p-3)!!\,\sigma_v^{2p-4}, \quad C = (2p-3)!!\,\sigma_v^{2p-2} \quad (p:\ \text{even})
    A = \sqrt{\tfrac{2}{\pi}}\,2^{(p-1)/2}(p-1)\big(\tfrac{p-3}{2}\big)!\,\sigma_v^{p-2}, \quad B = (p-1)(2p-3)!!\,\sigma_v^{2p-4}, \quad C = (2p-3)!!\,\sigma_v^{2p-2} \quad (p:\ \text{odd}).    (47)

8 The parameters a, b, c in (44)-(45) are different from those in Lemma 6.8.1 and Lemma 7.8.1 in [11].
Then, substituting (47) into Theorem 1 and Theorem 2, or substituting (46) into (45a)-(45c), yields the steady-state performance results for the real LMP algorithm in Gaussian noise environments. Here, we only give the expression for the EMSE:

    \zeta_{\mathrm{EMSE}} = \frac{\mu (2p-3)!!\,\sigma_v^{p}\,\xi(u)}{2(p-1)(p-3)!!\,\zeta(u) - \mu(p-1)(2p-3)!!\,\sigma_v^{p-2}\,\xi(u)} \quad (p:\ \text{even}),

    \zeta_{\mathrm{EMSE}} = \frac{\mu (2p-3)!!\,\sigma_v^{p}\,\xi(u)}{\sqrt{\tfrac{2}{\pi}}\,2^{(p-1)/2}(p-1)\big(\tfrac{p-3}{2}\big)!\,\zeta(u) - \mu(p-1)(2p-3)!!\,\sigma_v^{p-2}\,\xi(u)} \quad (p:\ \text{odd}).    (48)
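The moment formula (46) is easy to implement and to sanity-check against the familiar Gaussian moments $E v^2 = \sigma^2$, $E v^4 = 3\sigma^4$ and $E|v| = \sigma\sqrt{2/\pi}$; a small sketch:

```python
import math

def abs_moment_gauss(k, sigma):
    """E|v|^k for real zero-mean Gaussian v ~ N(0, sigma^2), cf. (46)."""
    if k % 2 == 0:
        df = 1
        for j in range(k - 1, 0, -2):   # double factorial (k-1)!!
            df *= j
        return df * sigma**k
    m = (k - 1) // 2
    return math.sqrt(2.0 / math.pi) * 2**m * math.factorial(m) * sigma**k

# sanity checks against well-known moments
assert math.isclose(abs_moment_gauss(2, 1.0), 1.0)
assert math.isclose(abs_moment_gauss(4, 1.0), 3.0)
assert math.isclose(abs_moment_gauss(1, 1.0), math.sqrt(2.0 / math.pi))
assert math.isclose(abs_moment_gauss(3, 2.0), 2.0 * math.sqrt(2.0 / math.pi) * 8.0)
```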
Example 3: Consider real-valued data in uniformly distributed noise environments, with $v$ uniform over $[-\Delta, \Delta]$, so that

    \xi_v^k = \frac{\Delta^k}{k+1}.    (49)

Substituting the above equation into (42), we get

    A = 2\Delta^{p-2}, \quad B = (p-1)\Delta^{2p-4}, \quad C = \frac{\Delta^{2p-2}}{2p-1}.    (50)

Then, substituting (50) into Theorem 1 and Theorem 2 yields the steady-state performance for the real LMP algorithm in uniformly distributed noise environments. Here, we also only give the EMSE expression, expressed by
    \zeta_{\mathrm{EMSE}} = \frac{\mu\,\Delta^{p}\,\xi(u)}{2(2p-1)\,\zeta(u) - \mu(2p-1)(p-1)\Delta^{p-2}\,\xi(u)}.    (51)
Example 4: Consider complex-valued data in circular complex Gaussian noise environments, for which

    \xi_v^k = E|v|^k = \big(\tfrac{k}{2}\big)!\,\sigma_v^k \quad (k:\ \text{even}), \qquad 0 \quad (k:\ \text{odd}),    (52)

and (42) becomes

    A = p\big(\tfrac{p}{2}-1\big)!\,\sigma_v^{p-2}, \quad B = (p-1)^2 (p-2)!\,\sigma_v^{2p-4}, \quad C = (p-1)!\,\sigma_v^{2p-2}    (53)

for even p. Then, substituting (53) into Theorem 1 and Theorem 2, or substituting (52) into (45a)-(45c), we can obtain the steady-state performances for complex LMP algorithms with even p in Gaussian noise environments. For instance, the EMSE expression can be written as

    \zeta_{\mathrm{EMSE}} = \frac{\mu (p-1)!\,\sigma_v^{p}\,\xi(u)}{p\big(\tfrac{p}{2}-1\big)!\,\zeta(u) - \mu(p-1)^2(p-2)!\,\sigma_v^{p-2}\,\xi(u)}.    (54)
But for odd p, substituting (40) and (52) into (18a) yields A = 0, which means that conditions C.1 and C.2 are no longer satisfied. That is to say, the proposed theorems are unsuitable for analyzing the steady-state performances in this case.
Example 5: Tracking performance comparison with LMS.
We now compare the ability of the LMP algorithm with p > 2 to track variations in nonstationary environments with that of the LMS algorithm. The ratio of the minimum achievable steady-state TEMSE of the LMS algorithm to that of the LMP algorithm is used as a performance measure. In addition, the step-size attaining this minimum is often sufficiently small, so that (34) can be used directly. Substituting (42) into (34), we obtain the minimum TEMSE for the LMP algorithm, expressed as

    \zeta_{\min}^{\mathrm{LMP}} = \kappa\,\frac{\sqrt{\xi_v^{2p-2}\,\xi(u)\,\mathrm{Tr}(Q)}}{\xi_v^{p-2}\,\zeta(u)}    (55)

where $\kappa = 2/p$ for complex-valued cases, and $\kappa = 1/(p-1)$ for real-valued cases. Then the ratio between $\zeta_{\min}^{\mathrm{LMS}} = \sigma_v\sqrt{\mathrm{Tr}(R_u)\mathrm{Tr}(Q)}$ (which can be obtained by substituting p = 2 and (35) into (55)) and $\zeta_{\min}^{\mathrm{LMP}}$ can be written as

    \frac{\zeta_{\min}^{\mathrm{LMS}}}{\zeta_{\min}^{\mathrm{LMP}}} = \frac{\xi_v^{p-2}\,\zeta(u)\,\sigma_v\sqrt{\mathrm{Tr}(R_u)}}{\kappa\sqrt{\xi_v^{2p-2}\,\xi(u)}}.    (56)
For the case of the LMF algorithm, substituting p = 4 and (35) into (56), we can obtain the same result (see e.g. Eq. 7.9.1) as in [11].
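The ratio (56) simplifies pleasantly for $g[u] = 1$, since $\mathrm{Tr}(R_u)$ and $\mathrm{Tr}(Q)$ then cancel. The sketch below (our own helper; `moments[k]` stands for $\xi_v^k$) evaluates it for real data and uniform noise on $[-\Delta, \Delta]$, where it collapses to $\sqrt{(2p-1)/3}$:

```python
import math

def ratio_lms_lmp(p, moments, sigma_v):
    """Ratio (56) for g[u] = 1 and real-valued data: kappa = 1/(p-1).
    Tr(R_u) and Tr(Q) cancel out of the ratio in this case."""
    kappa = 1.0 / (p - 1)
    return sigma_v * moments[p - 2] / (kappa * math.sqrt(moments[2 * p - 2]))

# uniform noise on [-d, d]: E|v|^k = d^k/(k+1), sigma_v = d/sqrt(3), cf. (49)
d, p = 1.0, 3
moments = {k: d**k / (k + 1) for k in (p - 2, 2 * p - 2)}
r = ratio_lms_lmp(p, moments, d / math.sqrt(3))
assert math.isclose(r, math.sqrt((2 * p - 1) / 3.0))   # > 1 for p > 2
```

For p = 3 the ratio is $\sqrt{5/3} \approx 1.29$, i.e., LMP tracks better than LMS in uniform noise, consistent with the discussion in this example.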
Consider next the least-mean mixed-norm (LMMN) algorithm [6], [7], for which

    f(e,e^*) = \delta e + (1-\delta)e|e|^2, \quad \delta \in [0,1].    (57)

Its partial derivatives are

    f_e^{(1)}(e,e^*) = \delta + 2(1-\delta)|e|^2, \quad f_{e^*}^{(1)}(e,e^*) = (1-\delta)e^2, \quad f_{e,e^*}^{(2)}(e,e^*) = 2(1-\delta)e    (58a)

for complex-valued cases, and

    f_e^{(1)}(e) = \delta + 3(1-\delta)e^2, \quad f_{e,e}^{(2)}(e) = 6(1-\delta)e    (58b)

for real-valued cases, respectively. Substituting (58a) into (18a), or substituting (58b) into (18b), respectively, we get

    A = 2\big[\delta + k_0(1-\delta)\sigma_v^2\big],
    B = \delta^2 + k_1\delta(1-\delta)\sigma_v^2 + k_2(1-\delta)^2\xi_v^4,    (59)
    C = \delta^2\sigma_v^2 + 2\delta(1-\delta)\xi_v^4 + (1-\delta)^2\xi_v^6,

with $k_0 = 3$, $k_1 = 12$, $k_2 = 15$ for real-valued cases and $k_0 = 2$, $k_1 = 8$, $k_2 = 9$ for complex-valued cases.
In this case, condition C.1 becomes

    \mu < \frac{2\big[\delta + k_0(1-\delta)\sigma_v^2\big]\,\zeta(u)}{\big[\delta^2 + k_1\delta(1-\delta)\sigma_v^2 + k_2(1-\delta)^2\xi_v^4\big]\,\xi(u)},    (60)

and the steady-state performance for LMMN algorithms (here, we only give the expression for the EMSE) can be written as

    \zeta_{\mathrm{EMSE}} = \frac{\mu\,\xi(u)\big[\delta^2\sigma_v^2 + 2\delta(1-\delta)\xi_v^4 + (1-\delta)^2\xi_v^6\big]}{2\big[\delta + k_0(1-\delta)\sigma_v^2\big]\,\zeta(u) - \mu\,\xi(u)\big[\delta^2 + k_1\delta(1-\delta)\sigma_v^2 + k_2(1-\delta)^2\xi_v^4\big]}.    (61)
Example 6: Consider the cases with $g[u] = 1$. Substituting (35) together with the identifications $A = 2b'$, $C = a'$, $B = c'$ (where $a'$, $b'$, $c'$ are the parameters used in [11]) into (15)-(17) yields the steady-state performances for real and complex LMMN algorithms, which coincide with the results (see e.g. Lemma 6.8.1 and Lemma 7.8.1) in [11].
Example 7: In Gaussian noise environments, based on (46) and (52), we can obtain

    \zeta_{\mathrm{EMSE}} = \frac{\mu\,\xi(u)\big[\delta^2\sigma_v^2 + k_3\delta(1-\delta)\sigma_v^4 + k_4(1-\delta)^2\sigma_v^6\big]}{2\big[\delta + k_0(1-\delta)\sigma_v^2\big]\,\zeta(u) - \mu\,\xi(u)\big[\delta^2 + k_1\delta(1-\delta)\sigma_v^2 + k_2(1-\delta)^2\sigma_v^4\big]}    (62)

where $k_0 = 3$, $k_1 = 12$, $k_2 = 45$, $k_3 = 6$, $k_4 = 15$ for real-valued cases, and $k_0 = 2$, $k_1 = 8$, $k_2 = 18$, $k_3 = 4$, $k_4 = 6$ for complex-valued cases.
Finally, consider the normalized algorithms ($\epsilon$-NLMP and $\epsilon$-NLMMN), for which

    g[u] = \frac{1}{\epsilon + \|u\|^2}.    (63)

Substituting (63) into (12), we get

    \zeta(u) = E\Big[\frac{1}{\epsilon + \|u\|^2}\Big], \quad \xi(u) = E\Big[\frac{\|u\|^2}{(\epsilon + \|u\|^2)^2}\Big].    (64)

In general $\epsilon \ll \|u\|^2$, so $\zeta(u)$ approximately equals $\xi(u)$, and both can be expressed as

    \zeta(u) \approx \xi(u) \approx E\Big[\frac{1}{\|u\|^2}\Big].    (65)

Substituting (65) into (15) yields a simplified expression for the steady-state EMSE

    \zeta_{\mathrm{EMSE}} = \frac{\mu C}{A - \mu B}.    (66)

Observing the above equation, we find that $\zeta_{\mathrm{EMSE}}$ is no longer related to the regressor.
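The regressor-independence predicted by (66) can be probed by simulation. The sketch below is our own (filter length, step-size, $\epsilon$ and noise power are illustrative choices): it runs real $\epsilon$-NLMS and compares the measured EMSE with $\mu\sigma_v^2/(2-\mu)$, i.e., (66) with $A = 2$, $B = 1$, $C = \sigma_v^2$. Since (66) is itself an approximation, only order-of-magnitude agreement is asserted.

```python
import numpy as np

rng = np.random.default_rng(1)
M, mu, eps, var_v = 8, 0.2, 1e-4, 0.01
w_o = rng.standard_normal(M) / np.sqrt(M)   # hypothetical true system
w = np.zeros(M)
acc, cnt = 0.0, 0
for i in range(60_000):
    u = rng.standard_normal(M)
    v = np.sqrt(var_v) * rng.standard_normal()
    e_a = u @ (w_o - w)                     # a priori estimation error
    e = e_a + v
    w += mu * u * e / (eps + u @ u)         # eps-NLMS: g[u] = 1/(eps + ||u||^2)
    if i >= 20_000:
        acc += e_a * e_a
        cnt += 1
emse_sim = acc / cnt
emse_theory = mu * var_v / (2.0 - mu)       # (66) with A=2, B=1, C=var_v
assert 0.3 < emse_sim / emse_theory < 3.0
```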
4. Simulation results
In Section 3, the analysis results for some well-known real and complex adaptive algorithms, such as the LMS, LMF and LMMN algorithms, were shown to coincide with known results. In this section, we give computer simulations of the steady-state performance of the real LMP algorithm with odd parameter p > 2 (here p = 3), and of the $\epsilon$-NLMP and $\epsilon$-NLMMN algorithms (here $\epsilon$ = 0.5), which have not been covered in the previous literature.
The correlated regressor model M.2 is generated by

    u(i+1) = a\,u(i) + \sqrt{1-a^2}\,s(i),    (67)

where s(i) is a white process. The tracking results are shown in Figs. 8-11, whose corresponding $\sigma_q$ are $5\times 10^{-5}$, $1\times 10^{-5}$, $1\times 10^{-5}$ and $1\times 10^{-5}$, where $Q = \sigma_q^2 I$; these figures also show the optimal step-sizes, at which the steady-state MSEs attain their minimum values. The simulated minima are in good agreement with the corresponding theoretical values, namely 0.0283, 0.0074, 0.0058 and 0.0074, respectively.
For uniformly distributed noise, substituting (49) and (35) into (56) gives the ratio

    \frac{\zeta_{\min}^{\mathrm{LMS}}}{\zeta_{\min}^{\mathrm{LMP}}} = \sqrt{\frac{2p-1}{3}},    (69)

which is larger than one for p > 2.
Fig. 1. Two theoretical (Theorem 1 and Theorem 2) and simulated MSEs for real LMP
algorithm under the regressor model M.1 and the noise model N.1.
Fig. 2. Two theoretical (Theorem 1 and Theorem 2) and simulated MSEs for real LMP
algorithm under the regressor model M.1 and the noise model N.2.
Fig. 3. Theoretical and simulated MSEs for real LMP algorithm under the regressor model
M.2 and the noise model N.1.
Fig. 4. Theoretical and simulated MSEs for real LMP algorithm under the regressor model
M.2 and the noise model N.2.
Fig. 5. Theoretical and simulated MSEs for real NLMP algorithm under the regressor model
M.2 and the noise model N.1.
Fig. 6. Theoretical and simulated MSEs for real LMMN algorithm under the regressor model M.2 and the noise model N.1.
Fig. 7. Theoretical and simulated MSEs for complex LMMN algorithm under the regressor
model M.2 and the noise model N.1.
Fig. 8. Two theoretical (Theorem 1 and Theorem 2) and simulated tracking MSEs for real
LMP algorithm under the regressor model M.1 and the noise model N.1.
Fig. 9. Two theoretical (Theorem 1 and Theorem 2) and simulated tracking MSEs for real
LMP algorithm under the regressor model M.1 and the noise model N.2.
Fig. 10. Theoretical and simulated tracking MSEs for real LMP algorithm under the
regressor model M.2 and the noise model N.1.
Fig. 11. Theoretical and simulated tracking MSEs for real LMP algorithm under the
regressor model M.2 and the noise model N.2.
Fig. 12. Comparisons of the tracking performance between LMS algorithm and LMP
algorithm in Gaussian noise environments and uniformly distributed noise environments.
5. Conclusions
Based on the Taylor series expansion (TSE) and the so-called complex Brandwood-form series expansion (BSE), this chapter has developed a unified approach to the steady-state mean-square-error (MSE) and tracking performance analyses of adaptive filters. General closed-form analytical expressions for the steady-state MSE, tracking performance, optimal step-size, and minimum MSE were derived; these expressions are all second-order approximations. For some well-known adaptive algorithms, such as the least-mean-square (LMS), least-mean-fourth (LMF) and least-mean mixed-norm (LMMN) algorithms, the proposed results coincide with those summarized by A. H. Sayed in [11]. The steady-state performances of the least-mean p-power (LMP) algorithm and of the normalized LMMN and LMP algorithms (i.e., $\epsilon$-NLMMN and $\epsilon$-NLMP) were also investigated. In addition, a comparison of the tracking ability of the LMP algorithm with p > 2 against that of the LMS algorithm shows the superiority of the LMS algorithm in Gaussian noise environments, and its inferiority in uniformly distributed noise environments. Extensive computer simulations confirm the accuracy of our analyses.
Appendix A: Proof of Lemma 1
Since $e_a = e - v$, we have

    h(v,v^*) = 2\,\mathrm{Re}\big[e_a^* f(e,e^*)\big]\big|_{e=v,\,e^*=v^*} = 2\,\mathrm{Re}\big[(e-v)^* f(e,e^*)\big]\big|_{e=v,\,e^*=v^*} = 0.    (A.1)

Writing $h(e,e^*) = (e-v)^* f(e,e^*) + (e-v) f^*(e,e^*)$ and differentiating,

    h_e^{(1)}(e,e^*) = (e-v)^* f_e^{(1)} + f^* + (e-v)\big[f_{e^*}^{(1)}\big]^*,
    h_{e,e^*}^{(2)}(e,e^*) = f_e^{(1)} + (e-v)^* f_{e,e^*}^{(2)} + \big[f_e^{(1)}\big]^* + (e-v)\big[f_{e^*,e^*}^{(2)}\big]^*,

so that, evaluating at $(e,e^*) = (v,v^*)$,

    h_{e,e^*}^{(2)}(v,v^*) = 2\,\mathrm{Re}\big[f_e^{(1)}(v,v^*)\big].    (A.2)

Similarly, $q(e,e^*) = f(e,e^*) f^*(e,e^*)$, so

    q_e^{(1)} = f_e^{(1)} f^* + f\big[f_{e^*}^{(1)}\big]^*,
    q_{e,e^*}^{(2)} = f_{e,e^*}^{(2)} f^* + f_e^{(1)}\big[f_e^{(1)}\big]^* + f_{e^*}^{(1)}\big[f_{e^*}^{(1)}\big]^* + f\big[f_{e,e^*}^{(2)}\big]^*,

and evaluating at $(v,v^*)$,

    q_{e,e^*}^{(2)}(v,v^*) = |f_e^{(1)}(v,v^*)|^2 + |f_{e^*}^{(1)}(v,v^*)|^2 + 2\,\mathrm{Re}\big[f^*(v,v^*)\,f_{e,e^*}^{(2)}(v,v^*)\big].    (A.3)

Here, we used the two equalities $\partial f^*/\partial e = [f_{e^*}^{(1)}]^*$ and $\partial f^*/\partial e^* = [f_e^{(1)}]^*$.
Appendix B: Proof of Lemma 2
For real-valued data, $h(e) = 2 e_a f(e) = 2(e-v) f(e)$, so

    h(v) = 2(e-v) f(e)\big|_{e=v} = 0.    (B.1)

Differentiating twice,

    h_e^{(1)}(e) = 2 f(e) + 2(e-v) f_e^{(1)}(e), \quad h_{e,e}^{(2)}(e) = 4 f_e^{(1)}(e) + 2(e-v) f_{e,e}^{(2)}(e),

and evaluating at $e = v$ gives

    h_{e,e}^{(2)}(v) = 4 f_e^{(1)}(v).    (B.2)

Similarly, $q(e) = f^2(e)$, so $q_{e,e}^{(2)}(e) = 2[f_e^{(1)}(e)]^2 + 2 f(e) f_{e,e}^{(2)}(e)$ and

    q_{e,e}^{(2)}(v) = 2[f_e^{(1)}(v)]^2 + 2 f(v)\,f_{e,e}^{(2)}(v).    (B.3)
Appendix C: Derivatives of $|z|^p$
Writing $z = x + jy$ and $|z|^p = (x^2+y^2)^{p/2}$, the Brandwood derivatives $\partial/\partial z = \tfrac{1}{2}(\partial/\partial x - j\,\partial/\partial y)$ and $\partial/\partial z^* = \tfrac{1}{2}(\partial/\partial x + j\,\partial/\partial y)$ give

    \frac{\partial |z|^p}{\partial z} = \frac{p}{2}(x^2+y^2)^{p/2-1}(x - jy) = \frac{p}{2}|z|^{p-2} z^*, \qquad \frac{\partial |z|^p}{\partial z^*} = \frac{p}{2}|z|^{p-2} z.    (C.1)

These are the identities used to obtain (41b).
Appendix D: Proof of Theorem 2
Expanding $\|u\|^2 q(e,e^*)$ around $(v,v^*)$ by the BSE, ignoring the third- and higher-order terms, and taking expectations, we obtain

    E[\|u\|^2 q(e,e^*)] = E[\|u\|^2 q(v,v^*)] + E[\|u\|^2 q_e^{(1)}(v,v^*) e_a] + E[\|u\|^2 q_{e^*}^{(1)}(v,v^*) e_a^*] + \tfrac{1}{2}E[\|u\|^2 q_{e,e}^{(2)}(v,v^*) e_a^2] + \tfrac{1}{2}E[\|u\|^2 q_{e^*,e^*}^{(2)}(v,v^*) (e_a^*)^2] + E[\|u\|^2 q_{e,e^*}^{(2)}(v,v^*) |e_a|^2].    (D.1)

Since $v$ is independent of $e_a$ and $u$ (i.e., A.1 and A.2), the above equation can be rewritten as

    E[\|u\|^2 q(e,e^*)] = E\|u\|^2\,E[q(v,v^*)] + E[\|u\|^2 e_a]\,E[q_e^{(1)}(v,v^*)] + E[\|u\|^2 e_a^*]\,E[q_{e^*}^{(1)}(v,v^*)] + \tfrac{1}{2}E[\|u\|^2 e_a^2]\,E[q_{e,e}^{(2)}(v,v^*)] + \tfrac{1}{2}E[\|u\|^2 (e_a^*)^2]\,E[q_{e^*,e^*}^{(2)}(v,v^*)] + E[\|u\|^2 |e_a|^2]\,E[q_{e,e^*}^{(2)}(v,v^*)].    (D.2)

Using $e_a = u\tilde{w}$ together with A.5, and the vanishing of the relevant odd and improper moments of the circular Gaussian regressor, we get

    E[\|u\|^2 e_a] = E[\|u\|^2 e_a^*] = E[\|u\|^2 e_a^2] = E[\|u\|^2 (e_a^*)^2] = 0.    (D.3)

Hence, substituting (D.3) into (D.2) and using Lemma 1, we obtain

    E[\|u\|^2 q(e,e^*)] = C\,E\|u\|^2 + B\,E[\|u\|^2 |e_a|^2]    (D.4)

where B and C are defined by (18a). Substituting the following formula (see e.g. 6.5.18 in [11]), i.e.,

    E[\|u\|^2 |e_a|^2] = (M+1)\sigma_u^2\,E|e_a|^2,    (D.5)

into (D.4), and using (5) and $E\|u\|^2 = M\sigma_u^2$, we have

    E[\|u\|^2 q(e,e^*)] = C M \sigma_u^2 + B(M+1)\sigma_u^2\,\zeta_{\mathrm{TEMSE}}.    (D.6)

For real-valued data, the same steps give

    E[\|u\|^2 q(e)] = C\,E\|u\|^2 + B\,E[\|u\|^2 e_a^2]    (D.8)

where B and C are now defined by (18b). Using $E\|u\|^2 = M\sigma_u^2$ and substituting the following formula (see e.g. 6.5.20 in [11]), i.e.,

    E[\|u\|^2 e_a^2] = (M+2)\sigma_u^2\,E[e_a^2],    (D.9)

into (D.8) yields the real-valued counterpart of (D.6). Substituting these expressions into the variance relation (10), and evaluating $E[h(e,e^*)]$ as in the proof of Theorem 1, gives

    A\,\zeta_{\mathrm{TEMSE}} = \mu\big[C M \sigma_u^2 + B(M+\beta)\sigma_u^2\,\zeta_{\mathrm{TEMSE}}\big] + \mu^{-1}\mathrm{Tr}(Q)

where A is defined by (18a) for $\beta = 1$ and by (18b) for $\beta = 2$. Then, if condition C.2 is satisfied, we obtain (38); next, letting $\mathrm{Tr}(Q) = 0$, we obtain the EMSE expression (37) in stationary environments.
Finally, differentiating both sides of (38) with respect to $\mu$ and setting the derivative to zero gives

    \mu_{\mathrm{opt}}^2 + \frac{2B(M+\beta)\mathrm{Tr}(Q)}{ACM}\,\mu_{\mathrm{opt}} - \frac{\mathrm{Tr}(Q)}{CM\sigma_u^2} = 0.    (D.12)

Then, we can obtain (39) by solving the above equation. This ends the proof of Theorem 2.
6. References
[1] S. Haykin, Adaptive Filter Theory. 4th edn. Englewood Cliffs, NJ: Prentice Hall 2002.
[2] B. Widrow and M. E. Hoff, Jr., “Adaptive switching circuits,” IRE WESCON Conv. Rec.,
Pt. 4, pp. 96-104, 1960.
[3] B. Widrow, J. M. McCool, M. G. Larimore, and C. R. Johnson, "Stationary and
nonstationary learning characteristics of the LMS adaptive filter," Proc. IEEE,
vol. 64, pp. 1151-1162, 1976.
[4] V. J. Mathews and S. Cho, “Improved convergence analysis of stochastic gradient
adaptive filters using the sign algorithm,” IEEE Trans. Acoust., Speech, Signal
Processing, vol. 35, pp. 450-454, 1987.
[5] E. Walach and B. Widrow, "The least-mean fourth (LMF) adaptive algorithm and its
family," IEEE Trans. Information Theory, vol. 30, pp. 275-283, 1984.
[6] J. A. Chambers, O. Tanrikulu and A. G. Constantinides, “Least mean mixed-norm
adaptive filtering,” Electron. Lett., vol. 30, pp. 1574-1575, 1994.
[7] O. Tanrikulu and J. A. Chambers, "Convergence and steady-state properties of the least-
mean mixed-norm (LMMN) adaptive algorithm," IEE Proceedings - Vision, Image and
Signal Processing, vol. 143, pp. 137-142, 1996.
[8] B. Lin, R. He, X. Wang and B. Wang, “Excess MSE analysis of the concurrent constant
modulus algorithm and soft decision-directed scheme for blind equalization,” IET
Signal Processing, vol. 2, pp. 147-155, Jun., 2008.
[9] D. L. Duttweiler, “Adaptive filter performance with nonlinearities in the correlation
multiplier,” IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-30, pp.578-586,
Aug., 1982.
46 Adaptive Filtering
[10] S. C. Douglas and T. H.-Y. Meng, “Stochastic gradient adaptation under general error
criteria,” IEEE Trans. Signal Processing, vol.42, pp.1335-1351, Jun., 1994.
[11] A. H. Sayed, Fundamentals of Adaptive Filtering, New York: Wiley, 2003.
[12] N. R. Yousef and A. H. Sayed, “A unified approach to the steady-state and tracking
analyses of adaptive filters”, IEEE Trans. Signal Processing, vol. 49, pp. 314-324, Feb.,
2001.
[13] J. H. Husøy and M. S. E. Abadi, “Unified approach to adaptive filters and their
performance,” IET Signal Processing, vol. 2, pp. 97-109, Jun., 2008.
[14] T. Y. Al-Naffouri and A. H. Sayed, “Transient analysis of adaptive filters with error
nonlinearities,” IEEE Trans. Signal Processing, vol. 51, pp. 653-663, Mar., 2003.
[15] O. Dabeer, and E. Masry, “Analysis of mean-square error and transient speed of the
LMS adaptive algorithm,” IEEE Trans. Information Theory, vol. 48, pp. 1873-1894,
July, 2002.
[16] N. J. Bershad, J. C. M. Bermudez and J. Y. Tourneret, “An affine combination of two
LMS adaptive filters-transient mean-square analysis,” IEEE Trans. Signal Processing,
vol. 56, pp. 1853-1864, May, 2008.
[17] B. Lin, R. He, L. Song and B. Wang, "Steady-state performance analysis for adaptive
filters with error nonlinearities," Proc. of ICASSP, Taipei, Taiwan, pp. 3093-3096,
Apr., 2009.
[18] N. R. Yousef and A. H. Sayed, “Fixed-point steady-state analysis of adaptive filters,”
Int. J. Contr. Signal Processing, vol. 17, pp. 237-258, 2003.
[19] B. Lin, R. He, X. Wang and B. Wang, “The excess mean square error analyses for
Bussgang algorithm,” IEEE Signal Processing Letters, vol. 15, pp. 793-796, 2008.
[20] A. Goupil and J. Palicot, “A geometrical derivation of the excess mean square error for
Bussgang algorithms in a noiseless environment”, ELSEVIER Signal Processing, vol.
84, pp. 311-315, May, 2004.
[21] D. H. Brandwood, “A complex gradient operator and its application in adaptive array
theory,” Proc. Inst. Elect. Eng. F, H, vol. 130, pp. 11-16, Feb., 1983.
[22] G. Yan and H. Fan, “A Newton-like algorithm for complex variables with applications
in blind equalization,” IEEE Trans. Signal Processing, vol. 48, pp. 553-556, Feb., 2000.
[23] S. C. Pei, and C. C. Tseng, “Least mean p-power error criterion for adaptive FIR filter,”
IEEE Journal on Selected Areas in Communications, vol. 12, pp. 1540-1547, Dec. 1994.
[24] I. S. Reed, “On a moment theorem for complex Gaussian process,” IRE Trans.
Information Theory, pp. 194-195, April, 1962.
[25] V. H. Nascimento and J. C. M. Bermudez, “Probability of divergence for the least-mean
fourth (LMF) algorithm,” IEEE Trans. Signal Processing, vol. 54, pp. 1376-1385, Apr.,
2006.
[26] J. C. M. Bermudez and V. H. Nascimento, “A mean-square stability analysis of the least
mean fourth adaptive algorithm,” IEEE Trans. Signal Processing, vol. 55, pp. 4018-
4028, Apr., 2007.
[27] A. Zerguine, "Convergence and steady-state analysis of the normalized least mean
fourth algorithm," Digital Signal Processing, vol. 17 (1), pp. 17-31, Jan., 2007.
[28] B. Lin, R. He, X. Wang and B. Wang, "The steady-state mean square error analysis for
least mean p-order algorithm," IEEE Signal Processing Letters, vol. 16, pp. 176-179,
2009.
1. Introduction
Over the past decades a number of new adaptive filter algorithms have been elaborated and
applied to meet demands for faster convergence and better tracking properties than earlier
techniques could offer. The filtered LMS algorithm is currently the most popular method
for adapting a filter, thanks to a simplicity and robustness that have made it widely
adopted in many applications, including adaptive channel equalization, adaptive
predictive speech coding, noise suppression and on-line system identification. Recently,
owing to the progress of digital signal processors, a variety of selective-coefficient-update
gradient-based adaptive algorithms have become practical to implement. Different types of
adaptive algorithms have been developed and used in conventional adaptive filters, such as
filtered LMS algorithms [1], [2], [3] and [4], filtered-X LMS algorithms [1], [2], [5] and [6],
filtered NLMS algorithms and RLS algorithms [1] and [2]. As a result, this chapter surveys
sequential filter adaptation techniques and some applications for the transversal FIR filter.
In other words, filters are devices or systems that process (or reshape) the input signal
according to some specific rules to generate an output signal (Figure 1). A filter is linear if superposition holds:

    \text{if } x_1 \to y_1 \text{ and } x_2 \to y_2, \text{ then } a x_1 + b x_2 \to a y_1 + b y_2;    (1)

meanwhile, in non-linear filters there is a nonlinearity between the input and output of the filter, so that superposition fails. For example, for the memoryless squarer $y = x^2$,

    x_1 \to y_1 = x_1^2 \text{ and } x_2 \to y_2 = x_2^2, \text{ but } x_1 + x_2 \to (x_1 + x_2)^2 \neq x_1^2 + x_2^2.    (2)
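The superposition property (1), and its failure in (2), can be demonstrated directly; the sketch below implements a small FIR filter and a memoryless squarer:

```python
def fir_filter(h, x):
    """Linear FIR filter: y(n) = sum_k h(k) x(n-k)."""
    return [sum(h[k] * x[n - k] for k in range(len(h)) if 0 <= n - k)
            for n in range(len(x))]

def square_system(x):
    """Memoryless non-linear system: y(n) = x(n)^2."""
    return [xn * xn for xn in x]

h = [1.0, -0.5, 0.25]
x1, x2, a, b = [1.0, 2.0, 3.0], [0.5, -1.0, 4.0], 2.0, -3.0
mix = [a * p + b * q for p, q in zip(x1, x2)]

# superposition holds for the FIR filter, property (1)
lhs = fir_filter(h, mix)
rhs = [a * p + b * q for p, q in zip(fir_filter(h, x1), fir_filter(h, x2))]
assert all(abs(p - q) < 1e-12 for p, q in zip(lhs, rhs))

# ... but fails for the squarer: (x1 + x2)^2 != x1^2 + x2^2, property (2)
assert square_system([2.0]) != [square_system([1.0])[0] + square_system([1.0])[0]]
```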
In this chapter we focus on adaptive filters, which can automatically adjust (or adapt) in
the face of changing environments and changing system requirements by being trained to
perform specific filtering or decision-making tasks, and which therefore require some
"adaptation algorithm" for adjusting the system's parameters (Figure 2) [7].
2. Signals
Before pursuing the study of adaptive systems, it is important to refresh our memory with
some useful definitions from the stochastic process theory. The representation of signals
could fall into two categories:
Deterministic Signals
Random Signals
An example of a deterministic signal is

    x(n) = e^{-n} \cos(wn)\,u(n)    (3)

where u(n) is the unit-step sequence. The response of a linear time-invariant filter to an input x(n) is given by

    y(n) = x(n) * h(n) = \sum_k x(k)\,h(n-k)    (4)

where h(n) is the impulse response of the filter. Knowing that the Z-transform of a given sequence x(n) and its inverse are defined as

    X(z) = \sum_n x(n) z^{-n}, \qquad x(n) = \frac{1}{2\pi j}\oint_C X(z)\,z^{n-1}\,dz    (5)

where C is a counterclockwise closed contour in the region of convergence of X(z) encircling the origin of the z-plane, then, by taking the Z-transform of both sides of equation (4), we obtain

    Y(z) = H(z)\,X(z).    (6)

The discrete-time Fourier transform is likewise

    F\{x(n)\} = X(e^{jw}) = \sum_n x(n)\,e^{-jwn},    (7)
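The convolution property (4) and its transform-domain counterpart can be verified numerically by evaluating the transforms of finite sequences on an FFT grid (a sketch with arbitrary short sequences of our choosing):

```python
import numpy as np

x = np.array([1.0, 2.0, 0.0, -1.0])
h = np.array([0.5, 0.25])
y = np.convolve(x, h)                     # time-domain convolution, eq. (4)

# evaluate the transforms on an FFT grid long enough to hold y (zero-padded)
N = len(y)
X, H, Y = (np.fft.fft(s, N) for s in (x, h, y))
assert np.allclose(Y, X * H)              # Y = H X at the sampled frequencies
```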
while a stochastic random process is a rule (or function) that assigns a time function to every
outcome of a random experiment. The elements of a stochastic process, {x(n)}, for different
values of the time-index n, are in general complex-valued random variables that are characterized
by their probability distribution functions. A stochastic random process is called stationary
in the strict sense if all of its (single and joint) distribution functions are independent of a shift
in the time origin.
In this subsection we review some useful definitions from stochastic process theory.
Stochastic average:

    m_x = E[x(n)]    (8)

where E denotes the expected value of x(n). The autocorrelation function of a stochastic process x(n) is

    \phi_{xx}(n,m) = E[x(n)\,x^*(m)]

where $x^*$ denotes the complex conjugate of x, and the symmetry property of the correlation function is $\phi_{xx}(n,m) = \phi_{xx}^*(m,n)$. Furthermore, a stochastic random process is called stationary in the wide sense if $m_x$ and $\phi_{xx}(n,m)$ are independent of a shift in the time origin for any k, m and n; the autocorrelation then depends only on the lag, $\phi_{xx}(k) = E[x(n)\,x^*(n-k)]$.
Special case: the z-spectrum of the autocorrelation sequence is

    \Phi_{xx}(z) = \sum_k \phi_{xx}(k)\,z^{-k}    (15)

    \Phi_{xx}(z) = \Phi_{xx}^*(1/z^*).    (16)
The Ultra High Speed LMS Algorithm Implemented on
Parallel Architecture Suitable for Multidimensional Adaptive Filtering 51
Equation (16) implies that if $\Phi_{xx}(z)$ is a rational function of z, then its poles and zeros must occur in complex-conjugate reciprocal pairs. Moreover, the region of convergence of $\Phi_{xx}(z)$ must be of the form $a < |z| < 1/a$, which covers the unit circle $|z| = 1$. By assuming that $\Phi_{xx}(z)$ is convergent on the unit-circle contour C and letting $z = e^{jw}$, then

    \phi_{xx}(0) = \frac{1}{2\pi}\int_{-\pi}^{\pi} \Phi_{xx}(e^{jw})\,dw    (17)

    \sigma_x^2 = E|x(n)|^2 = \frac{1}{2\pi}\int_{-\pi}^{\pi} \Phi_{xx}(e^{jw})\,dw.    (18)
The cross-correlation function of two stochastic processes is

    \phi_{xy}(n,m) = E[x(n)\,y^*(m)]    (19)

and, for jointly wide-sense stationary processes, $\phi_{xy}(k) = E[x(n)\,y^*(n-k)]$, with z-spectrum

    \Phi_{xy}(z) = \sum_k \phi_{xy}(k)\,z^{-k}    (21)

    \Phi_{xy}(z) = \Phi_{yx}^*(1/z^*).    (22)

The auto-covariance function of a stationary process is defined as

    \gamma_{xx}(k) = E\big[(x(n)-m_x)(x(n+k)-m_x)^*\big] = \phi_{xx}(k) - |m_x|^2    (23)

and the symmetry property of the auto-covariance function is

    \gamma_{xx}(-k) = \gamma_{xx}^*(k).    (24)

Special case: for a zero-mean process, $\gamma_{xx}(k) = \phi_{xx}(k)$. The cross-covariance function is

    \gamma_{xy}(k) = E\big[(x(n)-m_x)(y(n+k)-m_y)^*\big] = \phi_{xy}(k) - m_x m_y^*    (26)

and the symmetry property of the cross-covariance function is

    \gamma_{xy}(-k) = \gamma_{yx}^*(k).    (27)
For the windowed sequence

    x_N(n) = x(n),\ -N \le n \le N; \qquad 0,\ \text{otherwise},    (28)

the discrete-time Fourier transform of $x_N(n)$ is computed as

    X_N(e^{jw}) = \sum_{n=-N}^{N} x(n)\,e^{-jwn},    (29)

so that

    |X_N(e^{jw})|^2 = \sum_{n=-N}^{N}\sum_{m=-N}^{N} x(n)\,x^*(m)\,e^{-jw(n-m)}    (31)

    E|X_N(e^{jw})|^2 = \sum_{n=-N}^{N}\sum_{m=-N}^{N} E[x(n)\,x^*(m)]\,e^{-jw(n-m)}    (32)

    \frac{1}{2N+1}\,E|X_N(e^{jw})|^2 = \sum_{k=-2N}^{2N}\Big(1 - \frac{|k|}{2N+1}\Big)\phi_{xx}(k)\,e^{-jwk}.    (33)

By assuming that the summation on the right-hand side of equation (33) converges for large k, then

    \lim_{N\to\infty}\frac{1}{2N+1}\,E|X_N(e^{jw})|^2 = \sum_k \phi_{xx}(k)\,e^{-jwk} = \Phi_{xx}(e^{jw}).    (34)

The function $\Phi_{xx}(e^{jw})$ is called the power spectral density of the stochastic, wide-sense stationary process {x(n)}, and it is always real and non-negative.
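The limit (34) suggests the classical averaged-periodogram estimate of the power spectral density; the sketch below (block length, block count and noise power are arbitrary choices of ours) confirms on white noise that the estimate is real, non-negative and flat at the level $\sigma_x^2$:

```python
import numpy as np

rng = np.random.default_rng(2)
N, n_blocks, var_x = 256, 400, 2.0
psd = np.zeros(N)
for _ in range(n_blocks):
    x = np.sqrt(var_x) * rng.standard_normal(N)
    psd += np.abs(np.fft.fft(x)) ** 2 / N      # periodogram |X_N|^2 / N
psd /= n_blocks                                # average over blocks

assert np.all(psd >= 0.0)                      # real and non-negative
assert abs(psd.mean() - var_x) < 0.1           # white noise: flat at sigma_x^2
```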
The cross-spectrum between the input and output of a linear filter is

    \Phi_{xy}(z) = \sum_k \phi_{xy}(k)\,z^{-k} = \sum_k E[x(n)\,y^*(n-k)]\,z^{-k},    (35)

and with

    y(n) = \sum_l h(l)\,x(n-l)    (36)

we obtain

    \Phi_{yy}(z) = H(z)\,H^*(1/z^*)\,\Phi_{xx}(z)    (37)

and, on the unit circle,

    \Phi_{xy}(e^{jw}) = H(e^{jw})\,\Phi_{xx}(e^{jw}),
    \Phi_{yx}(e^{jw}) = H^*(e^{jw})\,\Phi_{xx}(e^{jw}),    (38)
    \Phi_{yy}(e^{jw}) = |H(e^{jw})|^2\,\Phi_{xx}(e^{jw}).
3. Regression
Data fitting is one of the oldest adaptive systems; it is a powerful tool, allowing predictions of present or future events to be made based on information about past or present events. The two basic types of regression are linear regression and multiple regression.
Fitting a line d = wx + b requires a slope w and an intercept b: the slope measures how much d changes as x changes, while the d intercept is the d value of the line when x equals zero, and defines the elevation of the line.
Fig. 4. Slope and Intercept.
The deviation from the straight line which represents the best linear fit of a set of data, as shown in figure 5, is expressed as

    d \approx w\,x + b    (39)

or, more specifically, through the error $e(n) = d(n) - (w\,x(n) + b)$, whose mean square over the N samples defines the cost

    J = \frac{1}{2N}\sum_{n=1}^{N} e_n^2.    (42)
Setting the partial derivatives of J to zero,

    \frac{\partial J}{\partial b} = 0, \qquad \frac{\partial J}{\partial w} = 0,    (43)

yields the least-squares solution

    b = \frac{\sum_n x_n^2 \sum_n d_n - \sum_n x_n \sum_n x_n d_n}{N\sum_n x_n^2 - \big(\sum_n x_n\big)^2}, \qquad w = \frac{\sum_n (x_n - \bar{x})(d_n - \bar{d})}{\sum_n (x_n - \bar{x})^2}    (44)

where the bar over a variable denotes its mean value, and the procedure for determining the coefficients of the line is called the least-squares method.
[Figure: the linear system y = wx + b, whose parameters (b, w) are adjusted by feeding back the error e(n) = d(n) - y(n).]
An adaptive system senses the error and modifies the parameters of the system. Thus, the error e(n) is fed back to the system and indirectly affects the output through a change in the parameters (b, w). With the incorporation of a mechanism that automatically modifies the system parameters, a very powerful linear system can be built that will constantly seek optimal parameters. Such systems are called adaptive systems, and are the focus of this chapter.
For multiple inputs, absorbing the bias as $w_0$ with $x(i,0) = 1$, the error is

    e(i) = d(i) - \Big(b + \sum_{k=1}^{p} w_k\,x(i,k)\Big) = d(i) - \sum_{k=0}^{p} w_k\,x(i,k)    (45)

where the goal is to find the coefficient vector W that minimizes the MSE of e(i) over the N samples (Figure 9). Setting the partial derivative of the MSE with respect to each $w_j$ to zero, for j = 0, 1, ..., p, and defining

    R(j,k) = \frac{1}{N}\sum_n x(n,j)\,x(n,k)    (48)

as the autocorrelation of the inputs for indices j and k, and

    P(j) = \frac{1}{N}\sum_n x(n,j)\,d(n)    (49)
as the cross-correlation of the input x for index j and the desired response d, and substituting these definitions into Eq. (47), the set of normal equations can be written simply as

    P = R\,W^* \quad \text{or} \quad W^* = R^{-1} P    (50)

where W is a vector with the p+1 weights $w_i$, and $W^*$ represents the value of the vector at the optimum (minimum) solution. The solution of the multiple regression problem can be computed analytically as the product of the inverse of the autocorrelation matrix of the input samples and the cross-correlation vector of the input and the desired response.
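The normal equations (50) are a one-liner with a linear solver. The sketch below builds R and P from synthetic noiseless data (sizes and weights are arbitrary choices of ours), so the solution recovers the generating weights exactly:

```python
import numpy as np

rng = np.random.default_rng(3)
N, p = 200, 3
X = rng.standard_normal((N, p + 1))
X[:, 0] = 1.0                                # x(i,0) = 1 absorbs the bias
w_true = np.array([1.0, -2.0, 0.5, 3.0])
d = X @ w_true                               # noiseless, so W* = w_true exactly

R = X.T @ X / N                              # (48): input autocorrelation matrix
P = X.T @ d / N                              # (49): cross-correlation vector
W = np.linalg.solve(R, P)                    # (50): W* = R^{-1} P
assert np.allclose(W, w_true)
```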
All the concepts previously mentioned for linear regression extend to the multiple regression case, where J in matrix notation is

    J = \frac{1}{N}\sum_n d^2(n) - 2 P^T W + W^T R W    (51)

where T denotes the transpose, and the values of the coefficients that minimize the cost are given by

    \nabla J = 0 \ \Rightarrow\ R\,W^* = P \quad \text{or} \quad W^* = R^{-1} P.    (52)
4. Wiener filters
The Wiener filter was proposed by Norbert Wiener during the 1940s and published in 1949 [8]. Its purpose is to reduce the amount of noise present in a signal by comparison with an estimate of the desired noiseless signal. The filter is an MSE-optimal stationary linear filter, mainly used for signals degraded by additive noise and blurring. The optimization of the filter is achieved by minimizing the mean square error, defined as the difference between the filter output and the desired response (Figure 10), which is known as the cost function and expressed as

    J = E[|e(n)|^2].    (53)

Fig. 10. Block schematic of a linear discrete-time filter W(z) for estimating a desired signal d(n) based on an excitation x(n), where d(n) and x(n) are random processes.
In signal processing, a causal filter is a linear and time-invariant causal system. The word causal indicates that the filter output depends only on past and present inputs; a filter whose output also depends on future inputs is non-causal. As a result, two cases should be considered for the optimization of the cost function (equation 53):
The filter W(z) is causal and/or FIR (finite impulse response).
The filter W(z) is non-causal and/or IIR (infinite impulse response).
Let

    W = [w_0\ w_1\ \cdots\ w_{N-1}]^T    (54)

be the tap-weight vector and

    X(n) = [x(n)\ x(n-1)\ \cdots\ x(n-N+1)]^T    (55)

be the input vector, where two cases should be treated separately depending on the required application:
Real-valued input signal
Complex-valued input signal
For real-valued signals the filter output is

    y(n) = \sum_{i=0}^{N-1} w_i\,x(n-i) = W^T X(n) = X^T(n)\,W    (56)
Where
T
e n d n y n d n X n W (57)
2
E en E d n W X n d n X n W
T T
(58)
E d2n W E X n d n E X n d n W W E X n X n W
T T T T
P E X n d n p0 p1 p N 1
T
(59)
and, with the same reasoning, the autocorrelation function E[X(n) X^T(n)] can be expressed
as the matrix:

R = E[X(n) X^T(n)] =
    [ r_(0,0)    r_(0,1)    …  r_(0,N-1)
      r_(1,0)    r_(1,1)    …  r_(1,N-1)
      …          …          …  …
      r_(N-1,0)  r_(N-1,1)  …  r_(N-1,N-1) ] (60)
so that

ξ = E[d^2(n)] − 2 W^T P + W^T R W (61)

This quadratic function of W has a global minimum if and only if R is a positive definite
matrix.
The gradient method is the most commonly used method to compute the tap weights that
minimize the cost function; therefore,

∂ξ/∂w_i = 0, for i = 0, 1, …, N-1 (62)

Expanding the cost function as
ξ = E[d^2(n)] − 2 Σ_(l=0)^(N-1) p_l w_l + Σ_(l=0)^(N-1) Σ_(m=0)^(N-1) w_l w_m r_(lm)
and differentiating gives

0 = −2 p_i + Σ_(l=0)^(N-1) w_l (r_(li) + r_(il)) (63)

that is,

Σ_(l=0)^(N-1) w_l (r_(li) + r_(il)) = 2 p_i

where we have used the split

Σ_(l=0)^(N-1) Σ_(m=0)^(N-1) w_l w_m r_(lm)
  = Σ_(l≠i) Σ_(m≠i) w_l w_m r_(lm) + w_i Σ_(m≠i) w_m r_(im) + w_i Σ_(l≠i) w_l r_(li) + w_i^2 r_(ii) (64)
Knowing that r_(li) = r_(il), due to the symmetry property of the autocorrelation function of a
real-valued signal, equation 63 can be expressed as:

Σ_(l=0)^(N-1) w_l r_(il) = p_i (65)
or, in matrix form,

R W_op = P (66)

known as the Wiener-Hopf equation. Its solution for the optimal tap-weight vector W_op,
assuming that R has an inverse matrix, is:

W_op = R^(-1) P (67)
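As a numerical sketch of equations 59, 60 and 67 (the 2-tap system h, the signal length and the white-noise excitation below are illustrative assumptions), R and P can be estimated by time averages and the Wiener solution recovered:

```python
import numpy as np

rng = np.random.default_rng(0)
h = np.array([0.8, -0.3])            # "unknown" 2-tap system (illustrative)
x = rng.standard_normal(20000)       # white excitation
d = np.convolve(x, h)[:len(x)]       # desired signal d(n)

# Input vectors X(n) = [x(n), x(n-1)]^T for n = 1, ..., len(x)-1
X = np.stack([x[1:], x[:-1]])
dn = d[1:]

R = X @ X.T / X.shape[1]             # autocorrelation matrix estimate (Eq. 60)
P = X @ dn / X.shape[1]              # cross-correlation vector estimate (Eq. 59)
W_op = np.linalg.solve(R, P)         # Wiener-Hopf solution (Eq. 67)
print(np.round(W_op, 2))
```

With a long white-noise record, W_op lands close to the true taps [0.8, -0.3].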
Finally, the minimum value of the cost function can be expressed as:

ξ_min = E[d^2(n)] − W_op^T P = E[d^2(n)] − W_op^T R W_op = E[d^2(n)] − P^T R^(-1) P (68)

For complex-valued signals, the cost function is defined as

ξ = E[|e(n)|^2] = E[e(n) e*(n)] (69)
and the gradient must be taken with respect to both the real and imaginary parts of each
tap weight. With

e(n) = d(n) − Σ_(k=0)^(N-1) w_k x(n-k) (71)

the complex gradients of e(n) and e*(n) with respect to w_i are

∇^c_(wi) e(n) = −x(n-i) ∇^c_(wi) w_i,  ∇^c_(wi) e*(n) = −x*(n-i) ∇^c_(wi) w_i*

where the complex gradient operator satisfies

∇^c_(wi) w_i = ∂w_i/∂w_(i,R) + j ∂w_i/∂w_(i,I) = 1 + j(j) = 0
∇^c_(wi) w_i* = ∂w_i*/∂w_(i,R) + j ∂w_i*/∂w_(i,I) = 1 − j(j) = 2 (72)

where the sub-indices R and I refer to the real and imaginary parts of the complex
number. Therefore, setting the gradient of the cost function (equation 70) to zero at the
optimum yields:
E[e_o(n) x*(n-i)] = 0 (74)

By defining

x(n) = [x(n), x(n-1), …, x(n-N+1)]^T,  x^H(n) = [x*(n), x*(n-1), …, x*(n-N+1)] (75)
w_o = [w_0, w_1, …, w_(N-1)]^T,  w_o^H = [w_0*, w_1*, …, w_(N-1)*]

where H denotes the complex-conjugate transpose (Hermitian), and by re-writing equation
71 as

e(n) = d(n) − w_o^H x(n) (76)

the orthogonality condition (74) becomes

R w_o = p (77)

where

R = E[x(n) x^H(n)],  p = E[x(n) d*(n)] (78)
Equation 77 is known as the Wiener-Hopf equation for complex-valued signals, and the
minimum of the cost function is:

ξ_min = E[|d(n)|^2] − w_o^H R w_o (79)
The steepest-descent search updates the weights according to

W(k+1) = W(k) − μ ∇(k) (80)

where μ is a positive constant known as the step size and ∇(k) denotes the gradient vector
∇ = 2RW − 2P evaluated at the point W = W(k). Therefore, equation 80 can be formulated
as:

W(k+1) = W(k) − 2μ (R W(k) − P),  i.e.  W(k+1) − W_o = (I − 2μR) (W(k) − W_o) (81)
Knowing that the autocorrelation matrix R may be diagonalized by using the unitary
similarity decomposition

R = Q Λ Q^T (82)

and defining the weight-error vector

v(k) = W(k) − W_o (83)

equation 81 becomes

v(k+1) = (I − 2μ Q Λ Q^T) v(k) (84)

Defining the rotated vector v'(k) = Q^T v(k) = [v'_0(k), v'_1(k), …, v'_(N-1)(k)]^T yields the
decoupled recursions

v'_i(k+1) = (1 − 2μ λ_i) v'_i(k), for i = 0, 1, 2, …, N-1 (86)

Equation 86 converges to zero if and only if the step-size parameter μ is selected so that:

|1 − 2μ λ_i| < 1 (87)

which means

0 < μ < 1/λ_i for all i (88)

or, equivalently,

0 < μ < 1/λ_max (89)
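The recursion of equation 81 and the bound of equation 89 can be checked directly; the R, P and iteration count below are illustrative values, not taken from the chapter:

```python
import numpy as np

# Illustrative second-order statistics
R = np.array([[1.0, 0.5], [0.5, 1.0]])
P = np.array([1.0, 0.2])
W_o = np.linalg.solve(R, P)          # optimum weights (Eq. 67)

lam_max = np.linalg.eigvalsh(R).max()
mu = 0.5 / lam_max                   # satisfies 0 < mu < 1/lambda_max (Eq. 89)

W = np.zeros(2)
for _ in range(200):
    W = W - 2 * mu * (R @ W - P)     # steepest descent (Eq. 81)
print(np.round(W, 4), np.round(W_o, 4))
```

With μ inside the bound, W converges to W_o; choosing μ above 1/λ_max makes the same loop diverge.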
Newton's method instead uses the recursion

W(k+1) = W(k) − μ R^(-1) ∇(k) (90)

which, substituting the gradient, gives

W(k+1) = W(k) − 2μ R^(-1) (R W(k) − P)
       = (1 − 2μ) W(k) + 2μ R^(-1) P
       = (1 − 2μ) W(k) + 2μ W_o (91)

so that

W(k+1) − W_o = (1 − 2μ) (W(k) − W_o) (92)
In an actual implementation of adaptive filters, the exact values of ∇(k) and R^(-1) are not
available and have to be estimated.
The LMS algorithm replaces the gradient of the mean square error by the gradient of the
instantaneous squared error:

W(n+1) = W(n) − μ ∇e^2(n) (93)

where W(n) = [w_0(n), w_1(n), …, w_(N-1)(n)]^T and ∇ = [∂/∂w_0, ∂/∂w_1, …, ∂/∂w_(N-1)]^T.

The gradient vector of e^2(n) is:

∇e^2(n) = −2 e(n) x(n) (95)

so the update becomes

W(n+1) = W(n) + 2μ e(n) x(n) (96)

where x(n) = [x(n), x(n-1), …, x(n-N+1)]^T.
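Equations 93-96 map directly onto a few lines of code. The sketch below (the plant h, step size and signal length are illustrative assumptions) identifies an unknown FIR system with the LMS update of equation 96:

```python
import numpy as np

rng = np.random.default_rng(1)
h = np.array([0.5, -0.4, 0.2])      # unknown plant (illustrative)
mu = 0.01                           # step size
N = len(h)
w = np.zeros(N)                     # adaptive weights W(n)
x = rng.standard_normal(5000)

for n in range(N - 1, len(x)):
    xn = x[n - N + 1:n + 1][::-1]   # x(n) = [x(n), x(n-1), ..., x(n-N+1)]
    d = h @ xn                      # desired response
    e = d - w @ xn                  # error e(n) = d(n) - y(n)
    w = w + 2 * mu * e * xn         # LMS update (Eq. 96)

print(np.round(w, 2))
```

In this noise-free identification setting the weights converge to the plant taps [0.5, -0.4, 0.2].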
The main point to note about sound reflections off hard surfaces is that they
undergo a 180-degree phase change upon reflection. This can lead to resonance, such as
standing waves in rooms. It also means that the sound intensity near a hard surface is
enhanced because the reflected wave adds to the incident wave, giving a pressure amplitude
that is twice as great in a thin "pressure zone" near the surface. This is used in pressure zone
microphones to increase sensitivity. The doubling of pressure gives a 6-decibel increase in
the signal picked up by the microphone (figure 19). Since the reflected wave and the
incident wave add to each other while moving in opposite directions, the appearance of
propagation is lost and the resulting vibration is called a standing wave. The modes of
vibration associated with resonance in extended objects like strings and air columns have
characteristic patterns called standing waves. These standing wave modes arise from the
combination of reflection and interference such that the reflected waves interfere
constructively with the incident waves. An important part of the condition for this
constructive interference is the fact that the waves change phase upon reflection from a fixed
end. Under these conditions, the medium appears to vibrate in segments or regions and the
fact that these vibrations are made up of traveling waves is not apparent - hence the term
"standing wave" (figure 19).
Fig. 22. The fundamental and second harmonic standing waves for a stretched string.
The sound intensity from a point source of sound will obey the inverse square law if there
are no reflections or reverberation (figure 23). Any point source, which spreads its influence
equally in all directions without a limit to its range, will obey the inverse square law. This
comes from strictly geometrical considerations. The intensity of the influence at any given
radius r is the source strength divided by the area of the sphere. Being strictly geometric in
its origin, the inverse square law applies to diverse phenomena. Point sources of
gravitational force, electric field, light, sound or radiation obey the inverse square law.
A plot of this intensity drop shows that it falls off rapidly (figure 24). A plot of the drop of
sound intensity according to the inverse square law emphasizes the rapid loss associated
with the inverse square law. In an auditorium, such a rapid loss is unacceptable. The
reverberation in a good auditorium mitigates it. This plot shows the points connected by
straight lines but the actual drop is a smooth curve between the points.
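The geometric argument above reduces to one formula; a minimal sketch (free field, no reflections, illustrative source power):

```python
import math

def intensity(power_w, r_m):
    """Sound intensity (W/m^2) of a point source: source power divided by
    the area of the sphere of radius r."""
    return power_w / (4 * math.pi * r_m ** 2)

# Doubling the distance quarters the intensity, a 6 dB drop:
i1, i2 = intensity(1.0, 1.0), intensity(1.0, 2.0)
print(round(i1 / i2, 1))                   # 4.0
print(round(10 * math.log10(i1 / i2), 1))  # 6.0 dB
```

This is the rapid loss that reverberation in a good auditorium compensates for.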
Fig. 25. Reverberant sound is the collection of all the reflected sounds in an auditorium.
7.2 ANC
In order to cancel unwanted noise, it is necessary to obtain an accurate estimate of the noise
to be cancelled. In an open environment, where the noise source can be approximated as a
point source, microphones can be spaced as far apart as necessary and each will still receive a
substantially similar estimate of the background noise. However, in a confined environment
containing reverberation noise caused by multiple sound reflections, the sound field is very
complex and each point in the environment has a very different background noise signal.
The further apart the microphones are, the more dissimilar the sound field. As a result, it is
difficult to obtain an accurate estimate of the noise to be cancelled in a confined
environment by using widely spaced microphones.
The proposed model is embodied in a dual microphone noise suppression system in which
the echo between the two microphones is substantially cancelled or suppressed.
Reverberations from one microphone to the other are cancelled by the use of first and
second line echo cancellers. Each line echo canceller models the delay and transmission
characteristics of the acoustic path between the first and second microphones Figure (26).
Fig. 26. A pictorial representation of the sound field reaching an ear set in accordance with
the proposed model intended to be worn in the human ear.
If the two microphones are moved closer together, the second microphone should provide a
better estimate of the noise to be cancelled in the first microphone. However, if the two
microphones are placed very close together, each microphone will cause an additional echo
to strike the other microphone. That is, the first microphone will act like a speaker (a sound
source) transmitting an echo of the sound field striking the second microphone. Similarly,
the second microphone will act like a speaker (a sound source) transmitting an echo of the
sound field striking the first microphone. Therefore, the signal from the first microphone
contains the sum of the background noise plus a reflection of the background noise, which
results in a poorer estimate of the background noise to be cancelled figures (27) and (28).
Fig. 29. A noise suppression System in accordance with a first embodiment of the proposed
model.
[Fig. 31 diagram labels: the Mic1 and Mic2 signals s1(n) and s2(n) pass through delay
elements and cascaded adaptive filters producing p1(n)-p4(n) and error signals e1(n)-e4(n),
combined into the output y(n), with s1(n) - p2(n) = r v(n) and s2(n) - p1(n) = r v(n).]
Fig. 31. An alternate scheme for a noise suppression communications system in accordance
with a second embodiment of the proposed model.
Fig. 32. Simulink block diagram of the proposed noise suppressor in figure 29.
The conceptual key to this proposed model is that the signals received at two closely spaced
microphones in a multi-path acoustic environment are each made up of a sum of echoes of
the signal received at the other one. This leads to the conclusion that the difference between
the two microphone signals is a sum of echoes of the acoustic source in the environment. In
the absence of a speech source, the ANS scheme proposed by Jaber first attempts to isolate
the difference signal at each of the microphones by subtracting from it an adaptively
predicted version of the other microphone signal. It then attempts to adaptively cancel the
two difference signals. When speech is present (as detected by some type of VAD-based
strategy), the adaptive cancellation stage has its adaptivity turned off (i.e. the impulse
responses of the two FIR filters, one for each microphone, are unchanged for the duration of
the speech). The effect here is that the adaptive canceller does not end up cancelling the
speech signal contained in the difference between the two microphone signals.
The Simulink implementation of the noise suppression system illustrated in figure 29 is
displayed in figure 32; figures 33 and 34 display the noise captured by Mic. 1 and Mic. 2
respectively, while figure 35 shows the filter output.
The Simulink implementation of the Active Noise Suppressor illustrated in Figure 30, with
no Voice Activity Detection (VAD) attached, is sketched in figure 36; the noise of a plane
taking off, captured by Mic. 1 and Mic. 2, is presented in figures 37 and 38 respectively,
while figure 39 shows the system output.
Fig. 36. Simulink block diagram of the proposed noise suppressor in figure 30
[Time-domain plots of the signal captured by Mic 1 and of the output y(n), 4-24 s,
amplitudes within ±0.5.]
Fig. 41. The clean speech obtained at the output of our proposed ANC (Fig. 30) by using a
VAD.
Fig. 42. The clean speech obtained at the output of our proposed ANC (Fig. 30), shown on a
reduced time scale.
Consider the input signal

x(n) = [x_0, x_1, x_2, …, x_(N-1)] (97)

and the r × r identity matrix

I_r = [I_(l,c)],  I_(l,c) = 1 for l = c, 0 elsewhere (98)

for l, c = 0, 1, …, r − 1.
Based on what was proposed in [2]-[9], we can conclude that any given discrete signal x(n)
can be decomposed into r decimated partial signals by writing its index as

l = rn + p (100)

for p = 0, 1, …, r − 1.
The mean (or expected value) of x(n) is given as:

E[x] = μ_x = (1/N) Σ_(n=0)^(N-1) x(n) (101)

which can be factorized as:

μ_x = (1/N) [ Σ_(n=0)^(N/r-1) x(rn) + … + Σ_(n=0)^(N/r-1) x(rn+r-1) ]
    = (1/r) Σ_(p=0)^(r-1) [ (r/N) Σ_(n=0)^(N/r-1) x(rn+p) ]
    = (1/r) Σ_(p=0)^(r-1) μ_(x,p) (102)
therefore, the mean of the signal x(n) is equal to the sum of the means of its r partial signals
divided by r, for p = 0, 1, …, r − 1.
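Equation 102 is easy to verify numerically; the signal and decimation factor below are arbitrary illustrative choices:

```python
# Verify Eq. (102): the mean of x(n) equals the average of the means of
# its r decimated partial signals x(rn + p), p = 0, ..., r - 1.
x = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0]   # arbitrary signal, N = 8
r = 2

mean_full = sum(x) / len(x)
partial_means = [sum(x[p::r]) / (len(x) // r) for p in range(r)]
mean_from_parts = sum(partial_means) / r

print(mean_full == mean_from_parts)   # True
```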
Similarly to the mean, the variance of the signal x(n) equals the sum of the variances of its r
partial signals, according to:
Var(x) = σ_x^2 = E[(x − μ_x)^2] = (1/N) Σ_(n=0)^(N-1) (x(n) − μ_x)^2

which splits over the r decimated partial signals as

σ_x^2 = σ^2_(x,rn) + σ^2_(x,rn+1) + … + σ^2_(x,rn+r-1) = Σ_(p=0)^(r-1) σ^2_(x,rn+p) (103)
The errors between the desired and estimated samples are

ε_0 = d_0 − y_0,  ε_r = d_r − y_r,  ε_(2r) = d_(2r) − y_(2r), …,
ε_(rn) = d_(rn) − y_(rn);  ε_1 = d_1 − y_1, …,  ε_(rn+1) = d_(rn+1) − y_(rn+1), …,
ε_(rn+r-1) = d_(rn+r-1) − y_(rn+r-1) (104)
for n = 0, 1, …, (N/r) – 1.
According to the method of least squares, the best fitting curve has the property that:
Σ_(j0=0)^(r-1) Σ_(n=0)^(N/r-1) ε^2_(rn+j0) = a minimum (105)
The parallel implementation of the least squares for the linear case can be expressed
as:

e(n) = d(n) − (b + w x(n)) = d(n) − y(n)
     = I_r [ d(rn+j0) − y(rn+j0) ]
     = I_r e(rn+j0) (106)
for j0 = 0, 1, …, r − 1. In order to pick the line which best fits the data, we need a criterion to
determine which linear estimator is "best". The sum of squared errors (also called the
mean square error, MSE) is a widely utilized performance criterion.
J = (1/2N) Σ_(n=0)^(N-1) e_n^2
  = (1/2N) Σ_(j0=0)^(r-1) Σ_(n=0)^(N/r-1) [ I_r e(rn+j0) ]^2 (107)

which can be rearranged as

J = (1/r) Σ_(j0=0)^(r-1) [ (r/2N) Σ_(n=0)^(N/r-1) I_r e^2(rn+j0) ]
  = (1/r) Σ_(j0=0)^(r-1) I_r J_(j0) = (1/r) Σ_(j0=0)^(r-1) J_(j0) (108)

where J_(j0) is the partial MSE applied on the subdivided data.
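The decomposition of equation 108 can be checked numerically; the error samples and decimation factor below are illustrative:

```python
# Verify Eq. (108): the global MSE equals the average of the r partial MSEs
# computed on the decimated error sub-sequences e(rn + j0).
e = [0.5, -1.0, 0.25, 2.0, -0.5, 1.5]    # error samples, N = 6
r = 3
N = len(e)

J = sum(v * v for v in e) / (2 * N)
J_partial = [sum(v * v for v in e[j0::r]) * r / (2 * N) for j0 in range(r)]
J_from_parts = sum(J_partial) / r

print(abs(J - J_from_parts) < 1e-12)   # True
```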
Our goal is to minimize J analytically, which according to Gauss can be done by taking its
partial derivative with respect to the unknowns and equating the resulting equations to
zero:
∂J/∂b = 0
∂J/∂w = 0 (109)
which yields:
∂J/∂b = ∂[ (1/r) Σ_(j0=0)^(r-1) I_r J_(j0) ]/∂b = (1/r) Σ_(j0=0)^(r-1) ∂J_(j0)/∂b = 0

∂J/∂w = ∂[ (1/r) Σ_(j0=0)^(r-1) I_r J_(j0) ]/∂w = (1/r) Σ_(j0=0)^(r-1) ∂J_(j0)/∂w = 0 (110)
With the same reasoning as above, the MSE can be obtained for multiple variables by:

J = (1/2N) Σ_n [ d(n) − Σ_(k=0)^(p) w_k x(n,k) ]^2
  = (1/r) Σ_(j0=0)^(r-1) (r/2N) Σ_(n=0)^(N/r-1) [ d(rn+j0) − Σ_(k=0)^(p) w_k x(rn+j0, k) ]^2 (111)
  = (1/r) Σ_(j0=0)^(r-1) J_(j0)

for j0 = 0, 1, …, r − 1, where J_(j0) is the partial MSE applied on the subdivided data.
The solution to the extreme (minimum) of this equation can be found in exactly the same
way as before, that is, by taking the derivatives of J with respect to the unknowns w_k:

∂J/∂w = (1/r) Σ_(j0=0)^(r-1) ∂J_(j0)/∂w (112)

where in this case the bias b is set to zero.
Start the search with an arbitrary initial weight w_(j0)(0), where the iteration is denoted by the
index in parentheses (Fig 43). Then compute the gradient of the performance surface at
w_(j0)(0), and modify the initial weight proportionally to the negative of the gradient at
w_(j0)(0). This changes the operating point to w_(j0)(1). Then compute the gradient at the new
position w_(j0)(1), and apply the same procedure again, i.e.

w_(j0)(k+1) = w_(j0)(k) − η ∇J_(j0)(k) (113)

where η is a small constant and ∇J_(j0)(k) denotes the gradient of the performance surface at
the k-th iteration of the j0-th parallel segment. η is used to maintain stability in the search by
ensuring that the operating point does not move too far along the performance surface. This
search procedure is called the steepest descent method (Fig 43).
The gradient can be expressed in terms of the parallel segments as

∇J(k) = (1/r) I_r ∇J_(j0)(k)

∂J/∂w = (1/r) I_r ∂J_(j0)/∂w
      = ∂/∂w [ (1/2Nr) Σ_(j0=0)^(r-1) Σ_(n=0)^(N/r-1) ( I_r e(rn+j0) )^2 ]
      = −(1/r) I_r e(rn+j0)(k) x(rn+j0)(k) (114)
What this equation tells us is that an instantaneous estimate of the gradient is simply the product
of the input to the weight and the error at iteration k. This means that the gradient can be
estimated with one multiplication per weight. This is the gradient estimate that leads to the
famous Least Mean Square (LMS) algorithm (Fig 44).
If the estimator of Eq.114 is substituted in Eq.113, the steepest descent equation becomes
I_r w_(j0)(k+1) = I_r w_(j0)(k) + (η/r) I_r e(rn+j0)(k) x(rn+j0)(k) (115)

for j0 = 0, 1, …, r − 1.
This equation is the r-parallel LMS algorithm which, used as a predictive filter, is illustrated
in Figure 45. The small constant η is called the step size or the learning rate.
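The branch structure of equation 115 can be sketched by running one LMS update per decimated sample stream; the 2-tap system h, the step size and the round-robin routing n mod r below are illustrative assumptions, not the chapter's exact configuration:

```python
import numpy as np

rng = np.random.default_rng(2)
r, mu, N = 2, 0.05, 2                  # branches, step size, taps (illustrative)
h = np.array([0.9, -0.5])              # system generating the desired signal
x = rng.standard_normal(4000)
d = np.convolve(x, h)[:len(x)]

w = [np.zeros(N) for _ in range(r)]    # one weight vector per branch
for n in range(N - 1, len(x)):
    j0 = n % r                          # sample n is routed to branch j0
    xn = x[n - N + 1:n + 1][::-1]       # [x(n), x(n-1)]
    e = d[n] - w[j0] @ xn
    w[j0] = w[j0] + 2 * mu * e * xn     # Eq. (115), applied per branch

print(np.round(w[0], 2), np.round(w[1], 2))
```

Each branch only sees every r-th sample yet converges to the same solution, which is what allows the r segments to run in parallel.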
[Fig. 45 diagram labels: r parallel branches, each with inputs d(rn+j0) and x(rn+j0), a delay
element and an adaptive filter producing y(rn+j0) and e(rn+j0); the branch errors are
combined through a Jaber product device into e(n).]
[Simulation plots: the original, predicted and error signals over 8000 samples; the first and
second 4000-sample portions of the predicted and error signals; and the reconstructed
predicted and error signals over 8000 samples.]
[Plot: convergence of the LMS; mean square error (log scale, 10^-8 to 10^2) versus samples
(0-1000).]
Fig. 50. Simulation result of the channel equalization (blue curve: two LMS filters
implemented in parallel; red curve: one LMS filter).
8. References
[1] S. Haykin, Adaptive Filter Theory, Prentice-Hall, Englewood Cliffs, NJ, 1991.
[2] B. Widrow and S.D. Stearns, "Adaptive Signal Processing", Prentice-Hall, 1985.
[3] K. Mayyas, and T. Aboulnasr, "A Robust Variable Step Size LMS-Type Algorithm:
Analysis and Simulations", IEEE 5-1995, pp. 1408-1411.
[4] T. Aboulnasr, and K. Mayyas, "Selective Coefficient Update of Gradient-Based Adaptive
Algorithms", IEEE 1997, pp. 1929-1932.
[5] E. Bjarnason, "Analysis of the Filtered-X LMS Algorithm", IEEE 4 1993, pp. III-511 -
III-514.
[6] E.A. Wan, "Adjoint LMS: An Efficient Alternative To The Filtered X LMS And Multiple
Error LMS Algorithm", Oregon Graduate Institute Of Science & Technology,
Department Of Electrical Engineering And Applied Physics, P.O. Box 91000,
Portland, OR 97291.
[7] B. Farhang-Boroujeny: “Adaptive Filters, Theory and Applications”, Wiley 1999.
[8] Wiener, Norbert (1949). “Extrapolation, Interpolation, and Smoothing of Stationary Time
Series”, New York: Wiley. ISBN 0-262-73005-7
[9] M. Jaber, "Noise Suppression System with Dual Microphone Echo Cancellation", US
patent no. US-6738482.
[10] M. Jaber, “Voice Activity detection Algorithm for Voiced /Unvoiced Decision and Pitch
Estimation in a Noisy Speech feature Extraction”, US patent application no.
60/771167, 2007.
[11] M. Jaber and D. Massicottes: “A Robust Dual Predictive Line Acoustic Noise Canceller”,
International Conference on Digital Signal Processing DSP 2009 Santorini Greece.
[12] M. Jaber, D. Massicotte, "A New FFT Concept for Efficient VLSI Implementation: Part I
– Butterfly Processing Element", 16th International Conference on Digital Signal
Processing (DSP’09), Santorini, Greece, 5-7 July 2009.
[13] J.C. Principe, W.C. Lefebvre, N.R. Euliano, “Neural Systems: Fundamentals through
Simulation”, 1996.
4
An LMS Adaptive Filter Using Distributed Arithmetic - Algorithms and Architectures
Japan
1. Introduction
In recent years, adaptive filters have been used in many applications, for example echo
cancellers, noise cancellers, adaptive equalizers and so on, and the need for their
implementation is growing in many fields. Adaptive filters require high speed, low power
dissipation, good convergence properties, small output latency, and so on. The
echo canceller used in videoconferencing requires a fast convergence property and the
capability to track a time-varying impulse response (Makino & Koizumi, 1988). Therefore,
implementations of very high order adaptive filters are required. In order to satisfy these
requirements, highly efficient algorithms and architectures are desired. The adaptive filter
is generally constructed using multipliers, adders, memories and so on, whereas a
structure without multipliers has also been proposed.
The LMS adaptive filter using distributed arithmetic can be realized by using adders and
memories without multipliers; that is, it can be implemented with a small amount of
hardware. Distributed Arithmetic (DA) is an efficient method of calculating an inner
product of constant vectors, and it has been used in DCT realizations. Furthermore, it is
suitable for the time-varying coefficient vector of an adaptive filter. Cowan and others
proposed a Least Mean Square (LMS) adaptive filter using the DA on an offset binary
coding (Cowan & Mavor, 1981; Cowan et al, 1983). However, it was found that the
convergence speed of this method is extremely degraded (Tsunekawa et al, 1999).
bias added to an input signal coded on the offset binary coding. To overcome this problem,
an update algorithm generalized with 2’s complement representation has been proposed
(Tsunekawa et al, 1999), and the convergence condition has been analyzed (Takahashi et al,
2002). The effective architectures for the LMS adaptive filter using the DA have been
proposed (Tsunekawa et al, 1999; Takahashi et al, 2001). The LMS adaptive filter using
distributed arithmetic is denoted DA-ADF. The DA is applied to the output calculation,
i.e., the inner product of the input signal vector and coefficient vector. The output signal is
obtained by the shift and addition of the partial products specified by the bit patterns of
the N-th order input signal vector. This process is performed from the LSB to the MSB at
every sampling instant, where B indicates the word length. The B partial products
used to obtain the output signal are updated from the LSB to the MSB. There exist 2^N
partial-products, and the set including all the partial-products is called Whole Adaptive
Function Space (WAFS). Furthermore, the DA-ADF using multi-memory block structure
that uses the divided WAFS (MDA-ADF) (Wei & Lou, 1986; Tsunekawa et al, 1999) and the
MDA-ADF using half-memory algorithm based on the pseudo-odd symmetry property of
the WAFS (HMDA-ADF) have been proposed (Takahashi et al, 2001). The divided WAFS is
expressed by DWAFS.
In this chapter, the new algorithm and effective architecture of the MDA-ADF are discussed.
The objectives are improvements of the MDA-ADF permitting the increase of an amount of
hardware and power dissipation. The convergence properties of the new algorithm are
evaluated by the computer simulations, and the efficiency of the proposed VLSI architecture
is evaluated.
The input signal vector of the adaptive filter is defined as

S(k) = [s(k), s(k-1), …, s(k-N+1)]^T, (1)

where s(k) is the input signal at time instance k, and T indicates the transpose of the vector.
The output signal of the adaptive filter is represented as

y(k) = S^T(k) W(k), (2)

where W(k) is the N-th order coefficient vector. The coefficients are updated by

W(k+1) = W(k) + 2μ e(k) S(k), (4)

where e(k), μ and d(k) are the error signal, the step-size parameter and the desired signal,
respectively. The step-size parameter determines the convergence speed and the accuracy
of the estimation. The error signal is obtained by

e(k) = d(k) − y(k). (5)
The fundamental structure of the LMS adaptive filter is shown in Fig. 1. The filter input
signal s(k) is fed into the delay-line, and shifted to the right direction every sampling
instance. The taps of the delay-line provide the delayed input signal corresponding to the
depth of delay elements. The tap outputs are multiplied with the corresponding
coefficients, the sum of these products is an output of the LMS adaptive filter. The error
signal is defined as the difference between the desired signal and the filter output signal.
The tap coefficients are updated using the products of the input signals and the scaled
error signal.
An LMS Adaptive Filter Using Distributed Arithmetic - Algorithms and Architectures 91
An inner product of two N-th order vectors is represented as

y = a^T v = Σ_(i=0)^(N-1) a_i v_i (6)

a = [a(0), a(1), …, a(N-1)]^T (7)

v = [v_0, v_1, …, v_(N-1)]^T. (8)

In Eq.(8), v_i is represented in B-bit fixed point, 2's complement representation, that is,

−1 ≤ v_i < 1

and

v_i = −v_i^0 + Σ_(k=1)^(B-1) v_i^k 2^(-k), i = 0, 1, …, N-1. (9)
In Eq.(9), v_i^k indicates the k-th bit of v_i, i.e., 0 or 1. By substituting Eq.(9) into Eq.(6),

y = −Φ(v_0^0, v_1^0, …, v_(N-1)^0) + Σ_(k=1)^(B-1) Φ(v_0^k, v_1^k, …, v_(N-1)^k) 2^(-k) (10)

where the partial product is defined as

Φ(v_0^k, v_1^k, …, v_(N-1)^k) = Σ_(i=0)^(N-1) a_i v_i^k. (11)
Eq.(10) indicates that the inner product y is obtained as the weighted sum of the partial
products. The first term of the right side is weighted by −1, i.e., the sign bit, and the
following terms are weighted by 2^(-k). Fig.2 shows the fundamental structure of the FIR
filter using the DA (DA-FIR). The function table is realized using a Read Only Memory
(ROM), and the right-shift and addition operation is realized using an adder and a register.
The ROM holds the partial products determined by the tap coefficient vector and the
bit pattern of the input signal vector. From the above discussion, the operation time
depends only on the word length B, not on the number of terms N. This means that the
output latency depends only on the word length B. The FIR filter using the DA can be
implemented without multipliers, that is, it is possible to reduce the amount of hardware.
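The table-lookup mechanism of Eqs. (6)-(11) can be sketched in a few lines; the coefficients, inputs and word length below are illustrative, and a dictionary stands in for the ROM:

```python
import itertools

# Distributed-arithmetic inner product (Eqs. 6-11): all 2^N partial products
# Φ(bit pattern) = Σ a_i * bit_i are precomputed; the output is formed by
# shift-and-add over the B bit planes, the sign-bit plane weighted by -1.
a = [0.25, -0.5, 0.375]     # fixed coefficients (illustrative)
v = [0.5, -0.25, 0.75]      # inputs in [-1, 1), exactly representable in B bits
B, N = 8, len(a)

# B-bit 2's complement fractional code of each v_i (Eq. 9)
bits = [[(int(round(vi * 2 ** (B - 1))) >> (B - 1 - k)) & 1 for k in range(B)]
        for vi in v]

# Function table: one partial product per N-bit address (Eq. 11)
table = {p: sum(ai for ai, bit in zip(a, p) if bit)
         for p in itertools.product((0, 1), repeat=N)}

y = -table[tuple(b[0] for b in bits)]              # sign-bit plane (Eq. 10)
for k in range(1, B):
    y += table[tuple(b[k] for b in bits)] * 2 ** (-k)

direct = sum(ai * vi for ai, vi in zip(a, v))
print(y, direct)   # both 0.53125 for these exactly representable inputs
```

No multiplication by the inputs occurs in the loop, only table lookups, shifts and additions, which is precisely the hardware saving the DA-FIR exploits.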
For the adaptive filter, the input signal vector is expressed with an address matrix A(k) and
the weighting vector F as

S(k) = A(k) F, (12)

so that the output signal becomes

y(k) = S^T(k) W(k) = F^T A^T(k) W(k). (13)
In Eq.(12) and Eq.(13), the address matrix determined by the bit pattern of the input
signal vector is represented as

A(k) = [ b_0(k)      b_0(k-1)      …  b_0(k-N+1)
         b_1(k)      b_1(k-1)      …  b_1(k-N+1)
         …           …             …  …
         b_(B-1)(k)  b_(B-1)(k-1)  …  b_(B-1)(k-N+1) ]^T, (14)

F = [−2^0, 2^(-1), …, 2^(-(B-1))]^T, (15)

where b_i(k) is the i-th bit of the input signal s(k). In Eq.(13), A^T(k)W(k) is defined as

P(k) = A^T(k) W(k), (16)

so that

y(k) = F^T P(k). (17)

The P(k) is called the adaptive function space (AFS), and is a B-th order vector:

P(k) = [p_0(k), p_1(k), …, p_(B-1)(k)]^T. (18)
The P(k) is a subset of the WAFS, including the elements specified by the row vectors (access
vectors) of the address matrix. Now, multiplying both sides of Eq.(4) by A^T(k) gives

A^T(k) W(k+1) = A^T(k) W(k) + 2μ e(k) A^T(k) A(k) F. (19)

Defining

P(k) = A^T(k) W(k)  and  P(k+1) = A^T(k) W(k+1), (20)

the relation between them can be expressed as

P(k+1) = P(k) + 2μ e(k) A^T(k) A(k) F. (21)

Replacing A^T(k)A(k)F by its expectation, E[A^T(k)A(k)]F ≈ 0.25 N F, yields the simplified
update

P(k+1) = P(k) + 0.5 μ N e(k) F, (23)

where the operator E[·] denotes an expectation. Eq.(23) can be performed by shift and
addition operations alone, without multiplications, by approximating μN with a power of
two; that is, fast real-time processing can be realized. The fundamental structure is shown in
Fig.3, and the timing chart is shown in Fig.4. The calculation block can use the
fundamental structure of the DA-FIR, and the WAFS is realized by using a Random Access
Memory (RAM).
R = N / M.
The capacity of an individual DWAFS is 2^R words, and the total capacity, 2^R × M words,
is smaller than the DA's 2^N words. For the smaller WAFS, the convergence of the algorithm
can be achieved with fewer iterations. The R-th order coefficient vector and the related AFS
are represented as
W_m(k) = [w_(m,0)(k), w_(m,1)(k), …, w_(m,R-1)(k)]^T, (24)

P_m(k) = [p_(m,0)(k), p_(m,1)(k), …, p_(m,B-1)(k)]^T, (25)

(m = 0, 1, …, M-1; R = N/M),

P_m(k) = A_m^T(k) W_m(k). (26)

The filter output is the sum over the M blocks:

y(k) = F^T Σ_(m=1)^(M) P_m(k), (27)

with the m-th address matrix

A_m(k) = [ b_(m,0)(k)      b_(m,0)(k-1)      …  b_(m,0)(k-R+1)
           b_(m,1)(k)      b_(m,1)(k-1)      …  b_(m,1)(k-R+1)
           …               …                 …  …
           b_(m,B-1)(k)    b_(m,B-1)(k-1)    …  b_(m,B-1)(k-R+1) ]^T. (28)

The update formula of the MDA-ADF is represented as

P_m(k+1) = P_m(k) + 0.5 μ R e(k) F. (29)
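The memory saving that motivates this division can be checked directly; N and M below are illustrative values:

```python
# Word counts of the adaptive function space: an undivided DA-ADF needs a
# 2^N-word WAFS, while dividing the N taps into M blocks of R = N/M taps
# needs only M * 2^R words.
N, M = 32, 8
R = N // M
full = 2 ** N          # undivided WAFS
divided = M * 2 ** R   # M divided WAFS blocks
print(full, divided)   # 4294967296 128
```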
The fundamental structure and the timing chart are shown in Fig.5 and 6, respectively.
The expression of “addressMSB” indicates the MSB of the address. Fig.7 shows the
difference of the access method between MDA and HMDA. The HMDA accesses the WAFS
with the R-1 bits address-line without MSB, and the MSB is used to activate the 2’s
complementors located both sides of the WAFS. Fig. 8 shows the comparison of the
convergence properties of the LMS, MDA, and HMDA. Results are obtained by the
computer simulations. The simulation conditions are shown in Table 2. Here, IRER
represents an impulse response error ratio. The step-size parameters of the algorithms were
adjusted so as to achieve a final IRER of -49.5 [dB]. It is found that both the MDA(R=1) and
the HMDA(R=2) achieve a good convergence properties that is equivalent to the LMS’s one.
Since both the MDA(R=1) and the HMDA(R=2) access the DWAFS with 1 bit address, the
DWAFS is the smallest size and defined for every tap of the LMS. The convergence speed of
the MDA is degraded by increasing R (Tsunekawa et al, 1999). This means that larger
capacity of the DWAFS needs larger iteration for the convergence. Because of smaller
capacity, the convergence speed of the HMDA(R=4) is faster than the MDA(R=4)’s one
(Takahashi et al, 2001). The HMDA can improve the convergence speed and reduce the
capacity of the WAFS, i.e., amount of hardware, simultaneously.
Fig. 7. Comparison of the access method for the WAFS. (a) Full-memory algorithm (b) Half-
memory algorithm.
Fig. 9. Comparison of the convergence characteristics for the different combinations of the
direction on the output calculation and update. The step-size of 0.017 is used for the (a) and
(c), and 0.051 is used for the (b) and (d).
Fig. 10. Relation of the timing between read and write of the DWAFS.
In the HMDA-ADF, the activation of the 2’s complementor is an exceptional processing for
the algorithm, that is, the processing time increases. The new algorithm is performed
without the activation of the 2’s complementor by use of the pseudo-odd symmetry
property. This is realized by using the address having inverted MSB instead of the 2’s
complementor. This new algorithm is called a simultaneous update algorithm, and the
MDA-ADF using this algorithm is called SMDA-ADF. Fig. 10 shows the timing of the read
and write of the DWAFS. The partial-product is read after writing the updated partial-
products.
The SMDA algorithm is represented as follows. The filter output is obtained in the same
manner as in the MDA-ADF. The output of the m-th DWAFS is

y_m(k) = Σ_(i=0)^(B-1) F'_(B-1-i) P_(m,B-1-i)(k). (30)
The output signal is the sum of these M outputs, and this can be expressed as
y(k) = Σ_(m=1)^(M) y_m(k). (31)
The address matrix including the inverted MSB for the output calculation is represented
as

A_m^out(k) = [ b_(m,0)(k)       b_(m,0)(k-1)       …  b_(m,0)(k-R+1)
               b_(m,1)(k)       b_(m,1)(k-1)       …  b_(m,1)(k-R+1)
               …                …                  …  …
               ~b_(m,B-1)(k)    ~b_(m,B-1)(k-1)    …  ~b_(m,B-1)(k-R+1) ]^T, (33)

where ~ denotes bit inversion. This algorithm updates the two partial products according to
the address and its inverted address, simultaneously. When the delay in Fig.10 is expressed
by d, the address matrix for the update procedure is represented as

A_m^up(k) = [ b_(m,0)(k)        b_(m,0)(k-1)        …  b_(m,0)(k-R+1)
              …                 …                   …  …
              b_(m,B-1-d)(k)    b_(m,B-1-d)(k-1)    …  b_(m,B-1-d)(k-R+1)
              b_(m,B-d)(k-1)    b_(m,B-d)(k-2)      …  b_(m,B-d)(k-R)
              …                 …                   …  …
              b_(m,B-1)(k-1)    b_(m,B-1)(k-2)      …  b_(m,B-1)(k-R) ]^T. (34)
The update formulas apply for (m = 1, 2, …, M; i = 0, 1, …, B−d) (35) and for
(m = 1, 2, …, M; i = B−d+1, …, B−1), (36) with the error signal

e(k) = d(k) − y(k). (37)

In Eq.(36) and Eq.(38), P̄_(m,i)(k) is the AFS specified by the inverted addresses.
4.3 Architecture
Fig.12 shows the block diagram of the SMDA-ADF. Examples of the sub-blocks are shown
in Fig.13, Fig.14, and Fig.15. In Fig.12, the input signal register includes (2N+1)B shift-
registers. The address matrix is provided to the DWAFS Module (DWAFSM) from the
input register. The sum of the M-outputs obtained from M-DWAFSM is fed to the Shift-
Adder.
After B shift-and-addition steps, the filter output signal is obtained. The two
error signals, e(k-1) and -e(k-1), are scaled while the partial products to be
updated are read. In Fig.13, the DWAFSM includes 2R+2 B-bit registers, 1 R-bit register, 2
decoders, 5 selectors, and 2 adders. The decoder provides the select signal to the selectors.
The two elements of DWAFS are updated, simultaneously. Fig.16 shows the timing chart of
the SMDA-ADF. The parallel computation of the output calculation and update procedure
are realized by the delayed update method.
Observation noise: white Gaussian noise, independent of the input signal, 45 dB.
Fig. 15. Example of the input registers for B=4 and R=2. The (a) is for output calculation, and
the (b) is for update.
Fig. 16. Timing chart of the SMDA-ADF. The sampling period is equal to the word length of
the input signal. The current update procedure begins after delays.
The SMDA is compared with the DLMS (Meyer & Agrawal, 1993), the pipelined LMS
structure (Harada et al, 1998), and the pipelined NCLMS (Takahashi et al, 2006). Table 4
shows the evaluation conditions. The result for the
SMDA and MDA are shown in Table 5, and the others using the multipliers are shown in
Table 6. These results were obtained by a VLSI design system PARTHENON (NTT DATA
Corporation, 1990). It is found that the SMDA achieves 380% of the MDA's sampling rate
and 67% of its output latency, whereas the power dissipation and the area are increased.
However, the improvements in sampling rate and latency outweigh the degradation in
power dissipation and area. The methods in Table 6 need both a very large number of gates
and a large area compared with the SMDA. From these results, it is found that the SMDA
has the advantages of a small amount of hardware and a sampling rate close to the LMS's.
                          MDA       SMDA
Machine cycle [ns]        31        21
Sampling rate [MHz]       0.79      3.00
Latency [ns]              713       479
Power dissipation [W]     6.40      16.47
Area [mm2]                36        54
Number of gates           175,321   258,321
Table 5. Comparison of the VLSI evaluations for the MDA and the SMDA.
5. Conclusions
In this chapter, a new LMS algorithm using distributed arithmetic and its VLSI architecture have been presented. From the above discussion, we conclude as follows:
1. The SMDA-ADF achieves good convergence speed, a higher sampling rate, and a smaller output latency than the conventional MDA-ADF.
2. Compared with the ADFs employing multipliers, the SMDA-ADF requires a small amount of hardware.
3. In spite of the delayed adaptation, the convergence speed is equivalent to that of the LMS.
4. The convergence property depends on the combination of the directions of the output calculation and the update procedure. Output calculation from LSB to MSB, with the update procedure in the reverse direction, is best when the step-size parameter is close to the upper bound.
6. References
Widrow, B., Glover, J.R., McCool, J.M., Kaunitz, J., Williams, C.S., Hearn, R.H., Zeidler, J.R., Dong, E. & Goodlin, R.C. (1975). Adaptive noise cancelling: Principles and applications. Proc. IEEE, Vol.63, pp.1692-1716
Makino, S. & Koizumi, N. (1988). Improvement on Adaptation of an Echo Canceller in a Room. IEICE Letter Fundamentals, Vol.J71-A, No.12, pp.2212-2214
Tsunekawa, Y., Takahashi, K., Toyoda, S. & Miura, M. (1999). High-performance VLSI Architecture of Multiplier-less LMS Adaptive Filter Using Distributed Arithmetic. IEICE Trans. Fundamentals, Vol.J82-A, No.10, pp.1518-1528
Takahashi, K., Tsunekawa, Y., Toyoda, S. & Miura, M. (2001). High-performance Architecture of LMS Adaptive Filter Using Distributed Arithmetic Based on Half-Memory Algorithm. IEICE Trans. Fundamentals, Vol.J84-A, No.6, pp.777-787
Takahashi, K., Tsunekawa, Y., Tayama, N. & Seki, K. (2002). Analysis of the Convergence Condition of LMS Adaptive Digital Filter Using Distributed Arithmetic. IEICE Trans. Fundamentals, Vol.E85-A, No.6, pp.1249-1256
Takahashi, K., Kanno, D. & Tsunekawa, Y. (2006). High-Performance Pipelined Architecture of Adaptive Digital Filter Employing FIR Filter with Minimum Coefficients Delay and Output Latency. IEICE Trans. Fundamentals, Vol.J89-A, No.12, pp.1130-1141
Cowan, C.F.N. & Mavor, J. (1981). New digital adaptive-filter implementation using distributed-arithmetic techniques. IEE Proc., Vol.128, Pt.F, No.4, pp.225-230
Cowan, C.F.N., Smith, S.G. & Elliott, J.H. (1983). A digital adaptive filter using a memory-accumulator architecture: theory and realization. IEEE Trans. Acoust., Speech & Signal Process., Vol.31, No.3, pp.541-549
Peled, A. & Liu, B. (1974). A new hardware realization of digital filters. IEEE Trans. Acoust., Speech & Signal Process., Vol.22, No.12, pp.456-462
Wei, C.H. & Lou, J.J. (1986). Multimemory block structure for implementing a digital adaptive filter using distributed arithmetic. IEE Proc., Vol.133, Pt.G, No.1, pp.19-26
Long, G., Ling, F. & Proakis, J.G. (1989). The LMS algorithm with delayed coefficient adaptation. IEEE Trans. Acoust., Speech & Signal Process., Vol.37, No.9, pp.1397-1405
Long, G., Ling, F. & Proakis, J.G. (1992). Corrections to "The LMS algorithm with delayed coefficient adaptation". IEEE Trans. Acoust., Speech & Signal Process., Vol.40, No.1, pp.230-232
Meyer, M.D. & Agrawal, D.P. (1993). A high sampling rate delayed LMS filter architecture. IEEE Trans. Circuits & Syst. II, Vol.40, No.11, pp.727-729
Wang, C.L. (1994). Bit-serial VLSI implementation of delayed LMS transversal adaptive filters. IEEE Trans. Signal Processing, Vol.42, No.8, pp.2169-2175
Harada, A., Nishikawa, K. & Kiya, H. (1998). Pipelined Architecture of the LMS Adaptive Digital Filter with the Minimum Output Latency. IEICE Trans. Fundamentals, Vol.E81-A, No.8, pp.1578-1585
NTT DATA Corporation (1990). PARTHENON User's Manual. Japan.
Part 2
Complex Structures,
Applications and Algorithms
5
Adaptive Filtering Using Subband Processing:
Application to Background Noise Cancellation
1. Introduction
Adaptive filters are often involved in many applications, such as system identification,
channel estimation, echo and noise cancellation in telecommunication systems. In this
context, the Least Mean Square (LMS) algorithm is used to adapt a Finite Impulse Response
(FIR) filter with a relatively low computation complexity and good performance. However,
this solution suffers from significantly degraded performance with colored interfering
signals, due to the large eigenvalue spread of the autocorrelation matrix of the input signal
(Vaseghi, 2008). Furthermore, as the length of the filter is increased, the convergence rate of
the algorithm decreases, and the computational requirements increase. This can be a
problem in acoustic applications such as noise cancellation, which demand long adaptive
filters to model the noise path. These issues are particularly important in hands-free communications, where processing power must be kept as low as possible (Johnson et al., 2004). Several solutions have been proposed in the literature to overcome or at least reduce these problems. A possible solution to reduce the complexity problem has been to use
adaptive Infinite Impulse Response (IIR) filters, such that an effectively long impulse
response can be achieved with relatively few filter coefficients (Martinez & Nakano 2008).
The complexity advantages of adaptive IIR filters are well known. However, adaptive IIR filters suffer from instability, local minima, and phase distortion, and are therefore not widely adopted. An alternative approach to reduce the computational
complexity of long adaptive FIR filters is to incorporate block updating strategies and
frequency domain adaptive filtering (Narasimha 2007; Wasfy & Ranganathan, 2008). These
techniques reduce the computational complexity, because the filter output and the adaptive
weights are computed only after a large block of data has been accumulated. However, the
application of such approaches introduces degradation in the performance, including a
substantial signal path delay corresponding to one block length, as well as a reduction in the
stable range of the algorithm step size. Therefore for nonstationary signals, the tracking
performance of the block algorithms generally becomes worse (Lin et al., 2008).
As far as speed of convergence is concerned, it has been suggested to use the Recursive Least Squares (RLS) algorithm to speed up the adaptive process (Hoge et al., 2008). The convergence rate of the RLS algorithm is independent of the eigenvalue spread. Unfortunately, the RLS algorithm has drawbacks, including its O(N²) computational requirements, which are still too high for many applications where high speed is required, or when a large number of inexpensive units must be built. The Affine Projection Algorithm (APA) (Diniz, 2008; Choi & Bae, 2007) shows better convergence behavior, but its computational complexity increases by a factor of P relative to the LMS, where P denotes the order of the APA.
As a result, adaptive filtering using subband processing becomes an attractive option for
many adaptive systems. Subband adaptive filtering belongs to two fields of digital signal
processing, namely, adaptive filtering and multirate signal processing. This approach uses
filter banks to split the input broadband signal into a number of frequency bands, each
serving as an independent input to an adaptive filter. The subband decomposition aims to reduce the update rate and the length of the adaptive filters, ideally resulting in a much lower computational complexity. Furthermore, subband signals are usually
downsampled in a multirate system. This leads to a whitening of the input signals and
therefore an improved convergence behavior of the adaptive filter system is expected. The
objectives of this chapter are: to develop subband adaptive structures which can improve
the performance of the conventional adaptive noise cancellation schemes, to investigate
the application of subband adaptive filtering to the problem of background noise
cancellation from speech signals, and to offer a design with fast convergence, low
computational requirement, and acceptable delay. The chapter is organized as follows. In
addition to this introduction section, section 2 describes the use of Quadrature Mirror
Filter (QMF) banks in adaptive noise cancellation. The effect of aliasing is analyzed and
the performance of the noise canceller is examined under various noise environments. To
overcome the problems associated with the QMF subband noise canceller, an improved version is presented in section 3. The system is based on using two-fold oversampled filter
banks to reduce aliasing distortion, while a moderate order prototype filter is optimized
for minimum amplitude distortion. Section 4 offers a solution with reduced
computational complexity. The new scheme is based on using polyphase allpass IIR filter
banks at the analysis stage, while the synthesis filter bank is optimized such that an
inherent phase correction is made at the output of the noise canceller. Finally, section 5
concludes the chapter.
(Figure: two-channel QMF bank. The input x(n) passes through the analysis filters H0(z) and H1(z), each followed by ↓2, giving the subband signals y0 and y1; after ↑2 expansion, the outputs of the synthesis filters G0(z) and G1(z) are summed to form the reconstruction x̂(n).)
X(z) = [ X(z)  X(−z) ]ᵀ    (1)
where (·)ᵀ denotes transposition. Similarly, the analysis filter bank is expressed as,
H(z) = [ H0(z)   H0(−z)
         H1(z)   H1(−z) ]    (2)
The output of the analysis stage is expressed as,
Y(z) = H(z) X(z)    (3)
X̂(z) = (1/2)[H0(z)G0(z) + H1(z)G1(z)] X(z) + (1/2)[H0(−z)G0(z) + H1(−z)G1(z)] X(−z)    (4)
The right-hand term of equation (4), containing X(−z), is the aliasing term. The presence of aliasing causes a frequency shift of π in the signal argument, which is an unwanted effect. However, it can be eliminated by choosing the filters as follows:
H1(z) = H0(−z)    (6)
G0(z) = H0(z)    (7)
G1(z) = −H0(−z)    (8)
By direct substitution into Equation (4), we see that the aliasing terms go to zero, leaving
X̂(z) = (1/2)[H0²(z) − H1²(z)] X(z)    (9)
X̂(e^jω) = (1/2)[H0²(e^jω) − H1²(e^jω)] X(e^jω)    (10)
Therefore, the objective is to determine H0(e^jω) such that the overall system frequency response approximates e^(−jωn0), i.e. approximates an allpass function with constant group delay n0. All four filters in the filter bank are specified by a length-L lowpass FIR filter.
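The alias-cancellation choices of Eqs. (6)-(8) are easy to verify numerically. The sketch below uses an arbitrary short lowpass prototype (a placeholder, not a designed QMF prototype) and checks that the aliasing term of Eq. (4) vanishes while the distortion of Eq. (9) contains only odd delays:

```python
import numpy as np

def modulate(h):
    """Coefficients of H(-z) given those of H(z)."""
    return h * (-1.0) ** np.arange(len(h))

h0 = np.array([0.1, 0.3, 0.3, 0.1])   # placeholder lowpass prototype H0(z)
h1 = modulate(h0)                     # H1(z) = H0(-z),  Eq. (6)
g0 = h0.copy()                        # G0(z) = H0(z),   Eq. (7)
g1 = -h1                              # G1(z) = -H0(-z), Eq. (8)

# Aliasing term of Eq. (4): H0(-z)G0(z) + H1(-z)G1(z) -> identically zero.
alias = np.convolve(modulate(h0), g0) + np.convolve(modulate(h1), g1)

# Distortion of Eq. (9): T(z) = (1/2)(H0^2(z) - H1^2(z)) -> odd powers only.
t = 0.5 * (np.convolve(h0, h0) - np.convolve(h1, h1))
print(np.max(np.abs(alias)), t)
```

The even-indexed coefficients of `t` are zero, confirming that the distortion reduces to odd-order delay terms, as Eq. (15) below makes explicit.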
(Figure: polyphase implementation of the two-channel QMF bank. The polyphase components P0(z) and P1(z) of the analysis stage, and the corresponding components of the synthesis stage, operate at the decimated rate between the ↓2 and ↑2 stages, with z⁻¹ delays distributing the input x and recombining the output x̂.)
T(z) = X̂(z)/X(z) = (1/2)[H0(z)G0(z) + H1(z)G1(z)]    (11)
which represents the distortion caused by the QMF bank. T(z) is the overall transfer function
(or the distortion transfer function). The processed signal x̂(n) suffers from amplitude distortion if |T(e^jω)| is not constant for all ω, and from phase distortion if T(z) does not have linear phase. To eliminate amplitude distortion, it is necessary to constrain T(z) to be allpass, whereas to eliminate phase distortion, we have to restrict T(z) to be FIR with linear phase.
Both of these distortions are eliminated if and only if T(z) is a pure delay, i.e.
T(z) = c z^(−n0)    (12)
Systems which are alias-free and satisfy (12) are called perfect reconstruction (PR) systems. For any pair of analysis filters, the choice of synthesis filters according to (7) and (8) eliminates aliasing distortion; the remaining distortion can be expressed as
T(z) = (1/2)[H0(z)H1(−z) − H1(z)H0(−z)]    (14)
The transfer function of the system in (14) can be expressed in terms of polyphase components as,
T(z) = (1/2)[H0²(z) − H0²(−z)] = 2 z⁻¹ F0(z²) F1(z²)    (15)
Since H0(z) is restricted to be FIR, this is possible if and only if F0(z) and F1(z) are delays, which means H0(z) must have the form
H0(z) = c0 z^(−2n0) + c1 z^(−(2n1+1))    (16)
For our purpose of adaptive noise cancellation, frequency responses are required to be more selective than (16). So, under the constraint of (13), perfect reconstruction is not possible. However, it is possible to minimize amplitude distortion by optimization procedures. The coefficients of H0(z) are optimized such that the distortion function is made as flat as possible, while the stopband energy of H0(z) is minimized, starting from the stopband frequency ωs. Thus, an objective function of the form
φ = α ∫[ωs, π] |H0(e^jω)|² dω + (1 − α) ∫[0, π/2] (1 − |T(e^jω)|)² dω,  0 ≤ α ≤ 1    (17)
is minimized.
The system output ŝ is obtained after passing the subband error signals e0 and e1 through the synthesis filter bank G0(z) and G1(z). The subband adaptive filter coefficients ŵ0 and ŵ1 have to be adjusted so as to minimize the noise in the output signal; in practice, the adaptive filters are adjusted so as to minimize the subband error signals e0 and e1.
(Figure: two-band subband adaptive noise canceller. The primary signal, speech s plus noise, is split by the analysis filters H0(z), H1(z) with ↓2 into the subband signals v0 and v1. The noise reference, related to the corrupting noise through the noise path A(z), passes through an identical analysis bank and drives the adaptive filters ŵ0 and ŵ1, whose outputs are y0 and y1. The subband errors e0 and e1 are recombined through ↑2 and G0(z), G1(z) to form the output ŝ.)
J(ŵ) = e0²(n) + e1²(n)    (19)
Adaptive Filtering Using Subband Processing: Application to Background Noise Cancellation 115
where J(ŵ) is a cost function which depends on the individual errors of the two adaptive filters. Taking the partial derivatives of J(ŵ) with respect to the samples of ŵ, we get the components of the instantaneous gradient vector. Then, the LMS adaptation algorithm is expressed in the form
ŵi(n+1) = ŵi(n) − μ [ 2 e0(n) ∂e0(n)/∂ŵi + 2 e1(n) ∂e1(n)/∂ŵi ]    (20)
for i = 0, 1, 2, …, Lw−1, where Lw is the length of the branch adaptive filter. The convergence of the algorithm (20) towards the optimal solution ŝ = s is controlled by the adaptation step size μ. It can be shown that the behavior of the mean square error vector is governed by the eigenvalues of the autocorrelation matrix of the input signal, which are all strictly greater than zero (Haykin, 2002). In particular, this vector converges exponentially to zero provided that μ < 1/λmax, where λmax is the largest eigenvalue of the input autocorrelation matrix. This condition is not sufficient to ensure the convergence of the Mean Square Error (MSE) to its minimum. Using the classical approach, a convergence condition for the MSE is stated as
0 < μ < 2 / tr R    (21)
where tr R is the trace of the input autocorrelation matrix R.
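A minimal sketch of the update (20), run independently in two subbands, is given below. The white subband reference signals and the toy "noise path" weights are illustrative assumptions, not the chapter's experimental setup; with no observation noise, both branch filters converge to the true weights:

```python
import numpy as np

rng = np.random.default_rng(0)
Lw = 8                                    # branch adaptive filter length
w = [np.zeros(Lw), np.zeros(Lw)]          # w0_hat and w1_hat
mu = 0.05                                 # step size, inside the bound (21)
true_w = [0.3 * rng.standard_normal(Lw),  # toy subband "noise paths"
          0.3 * rng.standard_normal(Lw)]
x = [rng.standard_normal(2000), rng.standard_normal(2000)]  # subband refs

mse = 1.0
for n in range(Lw, 2000):
    for k in (0, 1):                      # one LMS step per subband branch
        xk = x[k][n - Lw:n][::-1]         # regressor, most recent sample first
        d = true_w[k] @ xk                # subband noise to be cancelled
        e = d - w[k] @ xk                 # subband error e_k(n)
        w[k] = w[k] + 2 * mu * e * xk     # gradient step, as in Eq. (20)
        mse = 0.99 * mse + 0.01 * e * e   # smoothed squared error

print(mse)
```

Each branch runs at the decimated rate, which is where the computational saving of the subband scheme comes from.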
(1/M) Σ(k=0..M−1) Gk(z) Hk(z WM^i) = c z^(−τ) δ(i),  i = 0, 1, 2, …, M−1    (22)
where c is a constant, WM is the Mth root of unity, and τ is the analysis/synthesis reconstruction delay. Thus, the prototype filter order partly defines the signal delay in the system. The above equation is the perfect reconstruction (PR) condition in the z-transform domain for causal M-channel filter banks. The characteristic feature of the paraunitary filter bank is the relation between the analysis and synthesis subfilters; they are connected via time reversal. Then, the same PR condition can be written as,
(1/M) Σ(k=0..M−1) Hk(z⁻¹) Hk(z WM^i) = c z^(−τ) δ(i)    (23)
The reconstruction delay of a paraunitary filter bank is fixed by the prototype filter order, τ = L, where L is the order of the prototype filter. The amplitude response of such a filter bank is shown in Fig.4. The analysis matrix in (2) can be expressed for the M-band case as,
H(z) = [ H0(z)      H0(zW)      …  H0(zW^(M−1))
         H1(z)      H1(zW)      …  H1(zW^(M−1))
         …
         HM−1(z)    HM−1(zW)    …  HM−1(zW^(M−1)) ]    (24)
The matrix in (24) contains the filters and their modulated versions (by the Mth root of unity W = e^(−j2π/M)). This shows that there are M−1 alias components H(zW^k), k > 0, in the reconstructed signal.
Fig. 4. Magnitude response of the 8-band filter bank, with a prototype filter of order 63
Fig. 5. Aliasing distortion versus the number of subbands, for different prototype filter lengths
Parameter                Value
Noise path length        92
Adaptive filter length   46
Step size µ              0.02
Sampling frequency       8 kHz
Input (first test)       Variable-frequency sinusoid
Noise (first test)       Gaussian white noise with zero mean and unit variance
Input (second test)      Speech of a woman
Noise (second test)      Machinery noise
Table 1. Test parameters
In a second experiment, speech from a woman, sampled at 8 kHz, is used for testing. Machinery noise, as an environmental noise, is used to corrupt the speech signal. Mean square error plots are used as a measure of convergence behavior. These plots are smoothed with a 200-point moving average filter and displayed as shown in Fig.6 for the case of a variable-frequency sine wave corrupted by white Gaussian noise, and in Fig.7 for the case of speech input corrupted by machinery noise.
Fig. 6. Smoothed MSE (dB) versus iterations for the variable-frequency sine wave corrupted by white Gaussian noise
2.7 Discussion
The use of the two-band QMF scheme, with a near-perfect-reconstruction filter bank, should lead to an approximately zero steady-state error at the output of the noise cancellation scheme; this property has been experimentally verified as shown in Fig.6. The performance of the fullband adaptive filter, as well as that of a four-band critically sampled scheme, is shown on the same graph for the sake of comparison. The steady-state error of the scheme with two-band QMF banks is very close to the error of the fullband filter; this demonstrates the perfect identification property. These results show that the adaptive filtering process in subbands, based on the feedback of the subband errors, is able to model a system perfectly. The subband plots exhibit faster initial parts; however, after the error has decayed by about 15 dB (4-band) and 30 dB (2-band), the convergence of the four-band scheme slows down dramatically. The errors go down to asymptotic values of about -30 dB (2-band) and -20 dB (4-band). The steady-state error of the four-band system is well above that of the fullband adaptive filter, due to the high level of aliasing inserted in the system. The improvement in the transient behavior of the four-band scheme was observed only at the start of convergence. The aliased components in the output error cannot be cancelled unless cross adaptive filters are used to compensate for the overlapping regions between adjacent filters; this would lead to an even slower convergence and an increase in the computational complexity of the system. Overall, the convergence performance of the two-band scheme is significantly better than that of the four-band scheme; in particular, the steady-state error is much smaller. However, the overall convergence speed of the two-band scheme was not found to be significantly better than that of the fullband adaptive filter. Nevertheless, such schemes have the practical advantage of reduced computational complexity in comparison with the fullband adaptive filter.
Fig. 7. Smoothed MSE (dB) versus iterations for speech input corrupted by machinery noise
(Figure: M-band subband noise canceller. The noise reference n passes through the analysis filters H0(z), …, HM−1(z) with ↓D to give x0, …, xM−1, which drive the adaptive filters ŵ0, …, ŵM−1 with outputs y0, …, yM−1. The primary input s + ń passes through an identical analysis section to give v0, …, vM−1. The subband errors e0 = v0 − y0, …, eM−1 = vM−1 − yM−1 are recombined in the synthesis section G0(z), …, GM−1(z) with ↑D to produce the filtered speech ŝ.)
T0(z) = (1/M) Σ(k=0..M−1) Gk(z) Hk(z)    (25)
Ti(z) = (1/D) Σ(k=0..M−1) Gk(z) Hk(z WD^i),  for i = 1, 2, …, D−1    (26)
Critical sampling creates a severe aliasing effect due to the transition region of the prototype filter. This has been discussed in section 2. When the downsampling factor decreases, the aliasing effect is gradually reduced. Optimizing the prototype filter by minimizing both T0(z) and Ti(z) may result in performance biased toward one of them. Adjusting such an optimization process is not easy in practice, because there are two objectives in the design of the filter bank. Furthermore, minimizing the aliasing distortion Ti(z) with the distortion function T0(z) as a constraint is a very non-linear optimization problem, and the result may not reduce both distortions. Therefore, in this section, we use a two-fold oversampling factor to reduce the aliasing error, and the total system distortion is minimized by optimizing a single prototype filter for the analysis and synthesis stages. The total distortion function T0(z) and the aliasing distortion Ti(z) can be represented in the frequency domain as,
T0(e^jω) = (1/M) Σ(k=0..M−1) Gk(e^jω) Hk(e^jω)    (27)
Ti(e^jω) = (1/D) Σ(k=0..M−1) Gk(e^jω) Hk(e^jω WD^i)    (28)
The objective is to find prototype filters H0(e^jω) and G0(e^jω) that minimize the system reconstruction error. In effect, a single lowpass filter is used as a prototype to produce the analysis and synthesis filter banks by Discrete Fourier Transform (DFT) modulation,
Hk(z) = H0(z e^(−j2πk/M))    (25)
H0(e^jω) = Σ(n=0..L−1) h0(n) e^(−jωn)    (29)
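The DFT modulation of Eq. (25) can be sketched as follows. The short windowed-sinc prototype below is a placeholder, not the optimized design of this section; each modulated band peaks at its nominal center frequency k/M:

```python
import numpy as np

M, L = 8, 32
n = np.arange(L)
# Crude lowpass prototype: windowed sinc, normalized to unit DC gain.
h0 = np.sinc((n - (L - 1) / 2) / (2 * M)) * np.hamming(L)
h0 = h0 / h0.sum()

# h_k(n) = h0(n) e^{j 2 pi k n / M}  <=>  H_k(z) = H0(z W^k), W = e^{-j 2 pi / M}
bank = np.array([h0 * np.exp(2j * np.pi * k * n / M) for k in range(M)])

# Band k peaks near normalized frequency k/M (checked here for k = 1).
H = np.fft.fft(bank, 1024, axis=1)
peak = np.argmax(np.abs(H[1])) / 1024
print(peak)
```

Only the single prototype h0 needs to be designed and stored; the M bands are obtained by complex modulation.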
For a lowpass prototype filter whose stopband stretches from ωs to π, we minimize the total stopband energy according to the following function
Es = ∫[ωs, π] |H0(e^jω)|² dω    (30)
ωs = (1 + β) π / (2M)    (31)
where β is the roll-off parameter. Stopband attenuation is the measure used when comparing the design results with different parameters; its numerical value is the highest sidelobe, given in dB, when the prototype filter passband is normalized to 0 dB. Es can be expressed with a quadratic matrix as follows;
Es = hᵀ Φ h    (32)
where the vector h contains the prototype filter impulse response coefficients, and Φ is given by Nguyen (1993) as,
Φn,m = π − ωs,                                        n = m
Φn,m = [sin((n−m)π) − sin((n−m)ωs)] / (n−m),          n ≠ m    (33)
The optimum coefficients of the FIR filter are those that minimize the energy function Es in (30). For an M-band complementary filter bank, the frequency π/(2M) is located at the middle of the transition band of its prototype filter. The passband covers the frequency range 0 ≤ ω ≤ (1 − β)π/(2M). For a given number of subbands M, a roll-off factor β, and a certain prototype filter length L, we find the optimum coefficients of the FIR filter. The synthesis prototype filter G0(e^jω) is a time-reversed version of H0(e^jω). In general, it is not easy to maintain a low distortion level unless the length of the filter is increased to allow for narrow transition regions. The optimization is run for various prototype filter lengths L, different numbers of subbands M, and certain roll-off factors β. The frequency response of the final design of the prototype filter is shown in Fig.9.
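One standard way to carry out the stopband-energy minimization of Eq. (32) is the eigenfilter approach: under a unit-norm constraint on h, the optimum is the eigenvector of Φ associated with its smallest eigenvalue. The sketch below implements only this stopband term; the chapter's optimizer also accounts for the distortion function, which is omitted here, and the parameter values are illustrative:

```python
import numpy as np

M, L, beta = 8, 40, 0.5                 # bands, filter length, roll-off (assumed)
ws = (1 + beta) * np.pi / (2 * M)       # stopband edge, Eq. (31)

n = np.arange(L)
D = n[:, None] - n[None, :]             # the index difference n - m
Dsafe = np.where(D == 0, 1, D)          # dummy 1 on the diagonal (unused there)
# Eq. (33): pi - ws on the diagonal; -sin((n-m)ws)/(n-m) off it,
# since sin((n-m)pi) = 0 for integer n - m.
Phi = np.where(D == 0, np.pi - ws, -np.sin(Dsafe * ws) / Dsafe)

vals, vecs = np.linalg.eigh(Phi)        # eigenvalues in ascending order
h = vecs[:, 0]                          # minimizer of h^T Phi h, ||h|| = 1
if h.sum() < 0:
    h = -h                              # fix the sign so the DC gain is positive

Es = h @ Phi @ h                        # minimal stopband energy
print(Es)
```

Because h is a unit-norm eigenvector, the achieved stopband energy equals the smallest eigenvalue of Φ.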
ek(m) = vk(m) − yk(m)    (35)
yk(m) = ŵkᵀ(m) xk(m)    (36)
The filter weights in each branch are adjusted using the subband error signal belonging to the same branch. To prevent the adaptive filter from oscillating or being too slow, the step size of the adaptation algorithm is made inversely proportional to the power in the subband signals, such that
μk = 1 / (γ + ||xk||²)    (37)
where ||xk|| is the norm of the subband input signal and γ is a small constant used to avoid possible division by zero. On the other hand, a suitable value of the adaptation gain factor is deduced using a trial and error procedure.
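A sketch of the normalized step size of Eq. (37); the regularizer γ and the gain factor folded into the numerator are illustrative values (the chapter tunes the gain by trial and error):

```python
import numpy as np

gamma = 1e-6          # small constant guarding against division by zero
alpha = 0.5           # adaptation gain factor (assumed, tuned in practice)

def step_size(xk):
    # mu_k = alpha / (gamma + ||x_k||^2): strong subbands get smaller steps,
    # weak subbands larger ones, equalizing convergence across bands.
    return alpha / (gamma + float(np.dot(xk, xk)))

strong = np.ones(16)          # high-power subband regressor
weak = 0.1 * np.ones(16)      # low-power subband regressor
print(step_size(strong), step_size(weak))
```

This per-band normalization is what makes the subband scheme behave like an NLMS update within each branch.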
Fig. 9. Magnitude response of the optimized two-fold oversampled prototype filter (M = 8, L = 40)
utterance “Kosong, Satu, Dua, Tiga” spoken by a woman. The speech was sampled at 16 kHz. Engine noise is used as background interference to corrupt the above speech. Plots of the MSE are produced as shown in Fig.12. In this figure, convergence plots of the fullband and critically sampled systems are also depicted for comparison.
(Figure: block diagram of the proposed subband noise canceller, showing the analysis section (delays z⁻¹ and ↓D decimators), the adaptive section (subband filters ŵ0, …, ŵM−1 with an FFT/IFFT weight transformation and the filters F(z)), and the synthesis section (↑D expanders) producing the output Ŝ; the speech S is corrupted by the noise x through the acoustic path A(z).)
Parameter               Specification
Acoustic noise path     FIR processor with 512 taps
3.6 Discussion
From Figure 11, it is clear that the MSE plot of the proposed oversampled subband noise canceller converges faster than that of the fullband system. While the fullband system is converging slowly, the oversampled noise canceller approaches a 25 dB noise reduction in about 2500 iterations. In an environment where the impulse response of the noise path changes over a period of time shorter than the initial convergence period, the initial convergence will most affect the cancellation quality. On the other hand, the CS system developed using the method of Kim et al. (2008) needs a longer transient time than the OS system. The FB canceller needs around 10000 iterations to reach approximately the same noise reduction level. In the case of speech and machinery noise (Fig.12), it is clear that the FB system converges slowly with colored noise as the input to the adaptive filters. Tests performed in this part of the experiment proved that the proposed optimized OS noise canceller does have better performance than the conventional fullband model, as well as a recently developed critically sampled system. However, for white noise interference, there is still some residual error in the steady state, as can be noticed from a close inspection of Fig.11.
Fig. 11. MSE (dB) versus iterations under white noise interference: 1) proposed oversampled (OS) canceller, 2) conventional fullband (FB) canceller, 3) critically sampled (CS) canceller of Kim et al.
Fig. 12. MSE (dB) versus iterations for speech corrupted by engine noise: 1) proposed oversampled (OS) canceller, 2) conventional fullband (FB) canceller, 3) critically sampled (CS) canceller of Kim et al.
H0(z) = (1/2) Σ(k=0..N−1) Fk(z²) z^(−k)    (38)
where,
Fk(z²) = Π(n=1..Lk) Fk,n(z²) = Π(n=1..Lk) (αk,n + z⁻²) / (1 + αk,n z⁻²)    (39)
where αk,n is the coefficient of the nth allpass section in the kth branch, Lk is the number of sections in the kth branch, and N is the order of the section. These parameters can be determined from the filter specifications. The discussion in this chapter is limited to second-order allpass sections, since higher-order allpass functions can be built from products of such second-order filters.
(Figure: single-multiplier realization of the allpass section, built from one coefficient multiplier α, two z⁻ᴺ delays, and two adders operating on x(n) to form y(n).)
Fig. 14. Two-band analysis structure using the polyphase allpass branches F0(z²) and z⁻¹F1(z²); their sum and difference, each followed by ↓2, give the outputs y0 and y1
H0(z) = (1/2) (F0(z²) + z⁻¹ F1(z²))    (40)
H1(z) = (1/2) (F0(z²) − z⁻¹ F1(z²))    (41)
Filters H0(z) and H1(z) are bandlimiting filters, lowpass and highpass respectively. This modification results in half the number of calculations per input sample and half the storage requirements. In Fig.14, y0 and y1 represent the lowpass and highpass filter outputs, respectively. The polyphase structure can be further modified by shifting the downsamplers to the input to give a more efficient implementation. According to the noble identities of multirate systems, moving the downsampler to the left of the filters reduces the power of z in F0(z²) and F1(z²) by a factor of two, so the filters become F0(z) and F1(z), where F0(z) and F1(z) are causal, real, stable allpass filters. Fig.15 depicts the frequency response of the analysis filter bank.
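The structure of Eqs. (38)-(41) can be sketched directly from the section difference equation. The coefficients below are illustrative placeholders, not a designed half-band pair; because both branches are allpass, |H0(e^jω)|² + |H1(e^jω)|² = 1 at every frequency, which the sketch verifies:

```python
import numpy as np

def allpass2(x, a):
    """Second-order allpass section of Eq. (39): F(z) = (a + z^-2)/(1 + a z^-2),
    realized as y(n) = a*x(n) + x(n-2) - a*y(n-2)."""
    y = np.zeros_like(x)
    for n in range(len(x)):
        x2 = x[n - 2] if n >= 2 else 0.0
        y2 = y[n - 2] if n >= 2 else 0.0
        y[n] = a * x[n] + x2 - a * y2
    return y

def analysis(x, a0=0.25, a1=0.75):
    f0 = allpass2(x, a0)                                # branch F0(z^2)
    f1 = np.concatenate(([0.0], allpass2(x, a1)[:-1]))  # branch z^-1 F1(z^2)
    return 0.5 * (f0 + f1), 0.5 * (f0 - f1)             # H0, H1, Eqs. (40)-(41)

x = np.zeros(512)
x[0] = 1.0                                              # impulse input
low, high = analysis(x)                                 # impulse responses
P = np.abs(np.fft.fft(low)) ** 2 + np.abs(np.fft.fft(high)) ** 2
print(P.min(), P.max())                                 # both ~1
```

Only one multiplication per section per (decimated) sample is needed, which is the source of the low complexity claimed for this structure.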
Fig. 15. Magnitude frequency response of the analysis filter bank
H(e^jω) = |H(e^jω)| e^(jφ(ω))    (42)
where φ(ω) is the phase response of the analysis prototype filter. On the other hand, the synthesis filter bank is based on a prototype lowpass FIR filter that is related to the analysis prototype filter by the following relationship
Gd(e^jω) = |G0(e^jω)| e^(jθ(ω)) = |H0(e^jω)| e^(−jφ(ω))    (43)
where Gd(e^jω) is the desired frequency response of the synthesis prototype filter and θ(ω) is the phase of the synthesis filter. This compensates for any possible phase distortion at the analysis stage. The coefficients of the prototype synthesis filter Gd(e^jω) are evaluated by minimizing the weighted squared error given by
WSE = Σω Wt(ω) |G0(e^jω) − Gd(e^jω)|²    (44)
    = Σω Wt(ω) |Ĝ0(e^jω) − Gd(e^jω)|²    (45)
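The weighted least-squares fit of Eqs. (44)-(45) can be sketched as a linear problem on a frequency grid. The analysis "prototype" below is a simple FIR placeholder (the chapter's analysis bank is polyphase IIR), the weighting is uniform, and a compensation delay τ is assumed so that a causal synthesis filter can realize the conjugated phase:

```python
import numpy as np

L, Lg, K = 16, 24, 256                  # analysis length, synthesis length, grid
n = np.arange(L)
h0 = np.hamming(L)
h0 = h0 / h0.sum()                      # placeholder lowpass analysis prototype

w = np.linspace(0, np.pi, K)            # frequency grid
E = np.exp(-1j * np.outer(w, np.arange(Lg)))   # K x Lg basis e^{-j w k}
H0 = np.exp(-1j * np.outer(w, n)) @ h0  # analysis response on the grid

tau = L - 1                             # compensation delay (assumption)
# Desired response: analysis magnitude with conjugated phase plus delay,
# so that H0*Gd has exactly linear phase (cf. Eq. 43).
Gd = np.conj(H0) * np.exp(-1j * w * tau)
Wt = np.ones(K)                         # uniform weighting (assumption)

# Real-coefficient weighted least squares: stack real and imaginary parts.
A = np.vstack([np.sqrt(Wt)[:, None] * E.real,
               np.sqrt(Wt)[:, None] * E.imag])
b = np.concatenate([np.sqrt(Wt) * Gd.real, np.sqrt(Wt) * Gd.imag])
g, *_ = np.linalg.lstsq(A, b, rcond=None)   # synthesis prototype coefficients

err = np.abs(E @ g - Gd)
print(err.max())
```

For this FIR placeholder, the fit recovers the time-reversed, delayed analysis prototype exactly, which is the expected phase-compensating solution.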
(Figure: total distortion function of the designed filter bank versus normalized frequency; the deviation is on the order of 10⁻¹³.)
the length of the acoustic path is usually a few thousand taps, making the adaptive section the main bulk of the computations. As far as system delay is concerned, the prototype analysis filter has a group delay between 2.5 and 5 samples, except at the band edge where it reaches about 40 samples, as shown in Fig. 17. The maximum group delay due to the analysis filter bank is 70 samples, calculated as 40 samples for the first stage followed by two stages, each working at half the rate of the previous one. The synthesis stage has a maximum group delay of 27 samples, which brings the total delay to 97 samples.
Fig. 17. Group delay (in samples) of the prototype analysis filter versus normalized frequency
In the technique offered by Narasimha (2007), for example, the output is calculated only after the accumulation of a block of LFB samples. For the path length of 512 considered in these experiments, a delay of the same number of samples is produced, which is higher than that of the proposed scheme, particularly if a practical acoustic path is considered. Therefore, for non-stationary signals, our proposed technique offers better tracking than that of Narasimha (2007). Furthermore, a comparison of the computational complexity of our LC system with other techniques from the literature is depicted in Table 3.
at -25 dB after a fast initial convergence. This is due to the presence of colored components, as discussed in the last section. Meanwhile, the MSE plot of the proposed LC noise canceller outperforms that of the classical fullband system during initial convergence and exhibits comparable steady-state performance, with a small amount of residual noise. This is probably due to some nonlinearity which may not be fully equalized by the synthesis stage, since the synthesis filter bank is constructed by an approximation procedure. However, subjective tests showed that the effect on actual hearing is hardly noticeable. It is obvious that the LC system reaches a steady state in approximately 4000 iterations. The fullband (FB) system needs more than 10000 iterations to reach the same noise cancellation level. On the other hand, the amount of residual noise has been reduced compared to the OS FIR/FIR noise canceller. Tests performed using actual speech and ambient interference (Fig. 19) proved that the proposed LC noise canceller does have improved performance compared to the OS scheme, as well as the FB canceller. The improvement in noise reduction at steady state ranges from 15 to 20 dB compared to the fullband case, as is evident from Fig. 20. The improved results for the proposed LC system employing the polyphase IIR analysis filter bank can be traced back to the steeper transition bands, nearly perfect reconstruction, good channel separation, and very flat passband response within each band. For input speech sampled at 16 kHz, the adaptation time for the given channel and input signal is measured to be below 0.8 seconds. The convergence of the NLMS reaches above 80% in approximately 0.5 seconds. The LC noise canceller possesses the advantage of a low number of multiplications required per input sample. To sum up, the proposed LC approach showed improved performance for white and colored interference situations, proving the usefulness of the method for noise cancellation.
Fig. 18. MSE performance comparison of the proposed low complexity (LC) system with equivalent oversampled (OS) and fullband (FB) cancellers under white noise interference
Fig. 19. MSE performance comparison of the proposed low complexity (LC) system with equivalent oversampled (OS) and conventional fullband (FB) cancellers under ambient noise
5. Conclusion
Adaptive filter noise cancellation systems using subband processing are developed and
tested in this chapter. Convergence and computational advantages are expected from using
such a technique. Results obtained showed that noise cancellation techniques using
critically sampled filter banks offer no convergence improvement, except for the case of
two-band QMF decomposition, where the success was only moderate; only computational
advantages may be obtained in this case. An improved convergence behavior is obtained by
using two-fold oversampled DFT filter bank that is optimized for low amplitude distortion.
The price to be paid is the increase in computational costs. Another limitation with this
technique is the coloring effect of the filter bank when the background noise is white. The
use of polyphase allpass IIR filters at the analysis stage, with inherent phase compensation at
the synthesis stage, has reduced the computational complexity of the system and showed
convergence advantages. This saving in computational power can be exploited by using
more subbands for higher accuracy and the lower convergence time required to model very long
acoustic paths. Moreover, the low complexity system offered a lower delay than that offered
by other techniques. A further improvement to the current work can be achieved by using a
selective algorithm that can apply different adaptation algorithms for different frequency
bands. Also, the use of other transforms can be investigated.
6

Hirschman Optimal Transform (HOT) DFT Block LMS Algorithm
1. Introduction
Least mean square (LMS) adaptive filters, as introduced by Widrow and Hoff in 1960
(Widrow & Hoff, 1960), find applications in many areas of digital signal processing including
channel equalization, system identification, adaptive antennas, spectral line enhancement,
echo interference cancelation, active vibration and noise control, spectral estimation, and
linear prediction (Farhang-Boroujeny, 1999; Haykin, 2002). The computational burden and
slow convergence speed of the LMS algorithm can render its real time implementation
infeasible. To reduce the computational cost of the LMS filter, Ferrara proposed a frequency
domain implementation of the LMS algorithm (Ferrara, 1980). In this algorithm, the data is
partitioned into fixed-length blocks and the weights are allowed to change after each block
is processed. This algorithm is called the DFT block LMS algorithm. The computational
reduction in the DFT block LMS algorithm comes from using the fast DFT convolution to
calculate both the convolution between the filter input and the weights and the gradient estimate.
The Hirschman optimal transform (HOT) is a recently developed discrete unitary transform
(DeBrunner et al., 1999; Przebinda et al., 2001) that uses the orthonormal minimizers of
the entropy-based Hirschman uncertainty measure (Przebinda et al., 2001). This measure
is different from the energy-based Heisenberg uncertainty measure that is only suited for
continuous time signals. The Hirschman uncertainty measure uses entropy to quantify the
spread of discrete-time signals in time and frequency (DeBrunner et al., 1999). Since the HOT
bases are among the minimizers of the uncertainty measure, they have the novel property of
being the most compact in discrete-time and frequency. The fact that the HOT basis sequences
have many zero-valued samples, as well as their resemblance to the DFT basis sequences,
makes the HOT computationally attractive. Furthermore, it has been shown recently that a
thresholding algorithm using the HOT yields superior frequency resolution of a pure tone in
additive white noise to a similar algorithm based on the DFT (DeBrunner et al., 2005). The
HOT is similar to the DFT. For example, the $3^2$-point HOT matrix is explicitly given below:

$$
\begin{bmatrix}
1 & 0 & 0 & 1 & 0 & 0 & 1 & 0 & 0\\
0 & 1 & 0 & 0 & 1 & 0 & 0 & 1 & 0\\
0 & 0 & 1 & 0 & 0 & 1 & 0 & 0 & 1\\
1 & 0 & 0 & e^{-j2\pi/3} & 0 & 0 & e^{-j4\pi/3} & 0 & 0\\
0 & 1 & 0 & 0 & e^{-j2\pi/3} & 0 & 0 & e^{-j4\pi/3} & 0\\
0 & 0 & 1 & 0 & 0 & e^{-j2\pi/3} & 0 & 0 & e^{-j4\pi/3}\\
1 & 0 & 0 & e^{-j4\pi/3} & 0 & 0 & e^{-j8\pi/3} & 0 & 0\\
0 & 1 & 0 & 0 & e^{-j4\pi/3} & 0 & 0 & e^{-j8\pi/3} & 0\\
0 & 0 & 1 & 0 & 0 & e^{-j4\pi/3} & 0 & 0 & e^{-j8\pi/3}
\end{bmatrix} \tag{1}
$$
In general, the NK-point HOT basis is generated from the N-point DFT basis as follows.
Each DFT basis function is interpolated by K and then circularly shifted to produce
the complete set of orthogonal basis signals that define the HOT. The computational saving
of any fast block LMS algorithm depends on how efficiently each of the two convolutions
involved in the LMS algorithm are calculated (Clark et al., 1980; Ferrara, 1980). The DFT
block LMS algorithm is most efficient when the block and filter sizes are equal. Recently, we
developed a fast convolution based on the HOT (DeBrunner & Matusiak, 2003). The HOT
convolution is more efficient than the DFT convolution when the disparity in the lengths of
the sequences being convolved is large. In this chapter we introduce a new fast block LMS
algorithm based on the HOT. This algorithm is called the HOT DFT block LMS algorithm. It
is very similar to the DFT block LMS algorithm and reduces its computational complexity by
about 30% when the filter length is much smaller than the block length. In the HOT DFT block
LMS algorithm, the fast HOT convolution is used to calculate the filter output and update the
weights.
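To make the interpolate-and-shift construction concrete, here is a small illustrative numpy sketch (not code from the chapter); the $3^2$-point case reproduces the matrix in equation (1), and the rows are orthogonal with $HH^H = NI$.

```python
import numpy as np

def hot_matrix(N, K):
    """NK-point HOT: row (p*K + s) is the p-th N-point DFT basis row,
    interpolated by K (nonzeros every K-th column) and circularly
    shifted by s, as described above."""
    H = np.zeros((N * K, N * K), dtype=complex)
    for p in range(N):          # DFT frequency index
        for s in range(K):      # circular shift
            for q in range(N):  # nonzero positions q*K + s
                H[p * K + s, q * K + s] = np.exp(-2j * np.pi * p * q / N)
    return H

H9 = hot_matrix(3, 3)
# Rows with different shifts have disjoint support, so H9 @ H9.conj().T = 3*I.
```

The many structural zeros in each row are exactly what makes HOT-based convolution cheap.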
Recently, the HOT transform was used to develop the HOT LMS algorithm (Alkhouli et
al., 2005; Alkhouli & DeBrunner, 2007), which is a transform domain LMS algorithm, and
the HOT block LMS algorithm (Alkhouli & DeBrunner, 2007), which is a fast block LMS
algorithm. The HOT DFT block LMS algorithm presented here is different from the HOT
block LMS algorithm presented in (Alkhouli & DeBrunner, 2007). The HOT DFT block LMS
algorithm developed in this chapter uses the fast HOT convolution (DeBrunner & Matusiak,
2003). The main idea behind the HOT convolution is to partition the longer sequence into
sections of the same length as the shorter sequence and then convolve each section with the
shorter sequence efficiently using the fast DFT convolution. The relevance of the HOT will
become apparent when all of the (sub)convolutions are put together concisely in matrix
form, as will be shown later in this chapter.
The following notations are used throughout this chapter. Nonbold lowercase letters are
used for scalar quantities, bold lowercase is used for vectors, and bold uppercase is used
for matrices. Nonbold uppercase letters are used for integer quantities such as length or
dimensions. The lowercase letter k is reserved for the block index. The lowercase letter n
is reserved for the time index. The time and block indexes are put in brackets, whereas
subscripts are used to refer to elements of vectors and matrices. The uppercase letter N
is reserved for the filter length and the uppercase letter L is reserved for the block length.
The superscripts T and H denote vector or matrix transposition and Hermitian transposition,
respectively. The N-point DFT matrix is denoted by F N or simply by F. The subscripts F and
H are used to highlight the DFT and HOT domain quantities, respectively. The N × N identity
matrix is denoted by I N × N or I. The N × N zero matrix is denoted by 0 N × N . The linear and
Hirschman Optimal Transform (HOT) DFT Block LMS Algorithm 1373
Hirschman Optimal Transform (HOT)
DFT Block LMS Algorithm
circular convolutions are denoted by ∗ and ⊛, respectively. Diag[u] or U denotes the diagonal
matrix whose diagonal elements are the elements of the vector u.
In Section 2, the explicit relation between the DFT and HOT is developed. The HOT
convolution is presented in Section 3. In Section 4, the HOT DFT block LMS algorithm is
developed. Its computational cost is analyzed in Section 5. Section 6 contains the convergence
analysis and Section 7 contains its misadjustment. Simulations are provided in Section 8
before the conclusions in Section 9.
ũ = Pu. (4)
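The polyphase permutation ũ = Pu of equation (4) is a plain stride reordering of the K polyphase components; a minimal numpy sketch (the helper name is illustrative):

```python
import numpy as np

def polyphase_permute(u, K):
    """u_tilde = P u: stack the K polyphase components
    u(s), u(K+s), ..., u((N-1)K+s) of a length-NK vector u."""
    u = np.asarray(u)
    return u.reshape(-1, K).T.reshape(-1)
```

For example, `polyphase_permute(np.arange(12), 3)` returns `[0, 3, 6, 9, 1, 4, 7, 10, 2, 5, 8, 11]`.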
Without loss of generality, we consider the special case of N = 4 and K = 3 to find an explicit
relation between the DFT and HOT. The 4 × 3-point HOT is given by
$$
H = \begin{bmatrix}
1 & 0 & 0 & 1 & 0 & 0 & 1 & 0 & 0 & 1 & 0 & 0\\
0 & 1 & 0 & 0 & 1 & 0 & 0 & 1 & 0 & 0 & 1 & 0\\
0 & 0 & 1 & 0 & 0 & 1 & 0 & 0 & 1 & 0 & 0 & 1\\
1 & 0 & 0 & e^{-j2\pi/4} & 0 & 0 & e^{-j4\pi/4} & 0 & 0 & e^{-j6\pi/4} & 0 & 0\\
0 & 1 & 0 & 0 & e^{-j2\pi/4} & 0 & 0 & e^{-j4\pi/4} & 0 & 0 & e^{-j6\pi/4} & 0\\
0 & 0 & 1 & 0 & 0 & e^{-j2\pi/4} & 0 & 0 & e^{-j4\pi/4} & 0 & 0 & e^{-j6\pi/4}\\
1 & 0 & 0 & e^{-j4\pi/4} & 0 & 0 & e^{-j8\pi/4} & 0 & 0 & e^{-j12\pi/4} & 0 & 0\\
0 & 1 & 0 & 0 & e^{-j4\pi/4} & 0 & 0 & e^{-j8\pi/4} & 0 & 0 & e^{-j12\pi/4} & 0\\
0 & 0 & 1 & 0 & 0 & e^{-j4\pi/4} & 0 & 0 & e^{-j8\pi/4} & 0 & 0 & e^{-j12\pi/4}\\
1 & 0 & 0 & e^{-j6\pi/4} & 0 & 0 & e^{-j12\pi/4} & 0 & 0 & e^{-j18\pi/4} & 0 & 0\\
0 & 1 & 0 & 0 & e^{-j6\pi/4} & 0 & 0 & e^{-j12\pi/4} & 0 & 0 & e^{-j18\pi/4} & 0\\
0 & 0 & 1 & 0 & 0 & e^{-j6\pi/4} & 0 & 0 & e^{-j12\pi/4} & 0 & 0 & e^{-j18\pi/4}
\end{bmatrix} \tag{6}
$$
Equation (6) shows that the HOT takes the 4-point DFTs of the 3 polyphase components and
then reverses the polyphase decomposition. Therefore, the relation between the DFT and HOT
can be written as

$$
H = P \begin{bmatrix} F_4 & 0_{4\times4} & 0_{4\times4}\\ 0_{4\times4} & F_4 & 0_{4\times4}\\ 0_{4\times4} & 0_{4\times4} & F_4 \end{bmatrix} P. \tag{7}
$$
Also, it can be easily shown that

$$
H^{-1} = P \begin{bmatrix} F_4^{-1} & 0_{4\times4} & 0_{4\times4}\\ 0_{4\times4} & F_4^{-1} & 0_{4\times4}\\ 0_{4\times4} & 0_{4\times4} & F_4^{-1} \end{bmatrix} P. \tag{8}
$$
According to the overlap-save method (Mitra, 2000), y(n) for 0 ≤ n ≤ KN − 1, where K is an
integer, can be calculated by dividing u(n) into K overlapping sections of length 2N, with h(n)
post-appended with N zeros, as shown in Figure 1 for K = 3. The linear convolution in (9)
can then be calculated from the circular convolutions between h(n) and the sections of u(n).
Let u_k(n) be the kth section of u(n), and denote the 2N-point circular convolution between
u_k(n) and h(n) by c_k(n) = u_k(n) ⊛ h(n).
Fig. 1. Division of u(n) into K = 3 overlapping sections u₁(n), u₂(n), u₃(n) of length 2N, each aligned with h(n) post-appended with N zeros.
The circular convolution c_k(n) can be calculated using the 2N-point DFT as follows. First,
form the vectors

$$
h = \begin{bmatrix} h(0)\\ h(1)\\ \vdots\\ h(N-1)\\ 0\\ \vdots\\ 0 \end{bmatrix}, \tag{10}
$$

$$
c_k = \begin{bmatrix} c_k(0)\\ c_k(1)\\ \vdots\\ c_k(2N-1) \end{bmatrix}. \tag{11}
$$
Then the 2N-point DFT of c_k is given by

$$
F_{2N} c_k = F_{2N} h \cdot F_{2N} u_k, \tag{12}
$$

where u_k is the vector that contains the elements of u_k(n), and "·" indicates pointwise
matrix multiplication; throughout this chapter, pointwise matrix multiplication takes
a lower precedence than conventional matrix multiplication. Combining all of the circular
convolutions into one matrix equation, we have
$$
\begin{bmatrix} F_{2N} c_0\\ F_{2N} c_1\\ \vdots\\ F_{2N} c_{K-1} \end{bmatrix} =
\begin{bmatrix} F_{2N} h\\ F_{2N} h\\ \vdots\\ F_{2N} h \end{bmatrix} \cdot
\begin{bmatrix} F_{2N} u_0\\ F_{2N} u_1\\ \vdots\\ F_{2N} u_{K-1} \end{bmatrix}. \tag{13}
$$
Using equation (7), equation (13) can be written as

$$
H\tilde{c} = H\tilde{u} \cdot H\tilde{h}_r, \tag{14}
$$

where

$$
h_r = \begin{bmatrix} h\\ h\\ \vdots\\ h \end{bmatrix}, \tag{15}
$$

and

$$
u = \begin{bmatrix} u_0\\ u_1\\ \vdots\\ u_{K-1} \end{bmatrix}. \tag{16}
$$

Therefore, the vector of the circular convolutions is given by

$$
c = PH^{-1}\left(H\tilde{u} \cdot H\tilde{h}_r\right). \tag{17}
$$
According to the overlap-save method, only the second half of ck corresponds to the kth section
of the linear convolution. Denote the kth section of the linear convolution by y k and the vector
that contains the elements of y(n ) by y. Then y k can be written as
$$
y_k = \begin{bmatrix} 0_{N\times N} & I_{N\times N} \end{bmatrix} c_k, \tag{18}
$$

and y as

$$
y = Gc, \tag{19}
$$

where

$$
G = \begin{bmatrix}
0_{N\times N}\; I_{N\times N} & 0_{2N\times 2N} & \cdots & 0_{2N\times 2N}\\
0_{2N\times 2N} & 0_{N\times N}\; I_{N\times N} & \cdots & 0_{2N\times 2N}\\
\vdots & \vdots & \ddots & \vdots\\
0_{2N\times 2N} & 0_{2N\times 2N} & \cdots & 0_{N\times N}\; I_{N\times N}
\end{bmatrix}. \tag{20}
$$

Finally, the linear convolution using the HOT is given by

$$
y = GPH^{-1}\left(H\tilde{u} \cdot H\tilde{h}_r\right). \tag{21}
$$
1. Divide u(n) into K overlapping sections and combine them into one vector to form u.
2. Perform K-band polyphase decomposition of u to form ũ.
3. Take the HOT of ũ.
4. Post append h(n ) with N zeros and then stack the appended h(n ) K times into one vector
to form hr .
5. Perform K-band polyphase decomposition of hr to form h̃r .
6. Take the HOT of h̃r .
7. Point-wise multiply the vectors from steps 3 and 6.
8. Take the inverse HOT of the vector from step 7.
9. Perform K-band polyphase decomposition of the result from step 8.
10. Multiply the result of step 9 with G.
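The ten steps above can be checked numerically. Since the HOT of the polyphase-permuted data is, by equation (7), just the stack of 2N-point DFTs of the sections, the sketch below (a hypothetical numpy helper, not code from the chapter) carries out the equivalent per-section computation and agrees with the direct linear convolution.

```python
import numpy as np

def hot_convolution(u, h, N):
    """Linear convolution y(n), 0 <= n <= KN-1, of u (length KN) with h
    (length N) via K overlapping 2N-point sections (steps 1-10 above).
    The section DFTs stand in for the HOT of the permuted data."""
    K = len(u) // N
    h_pad = np.concatenate([h, np.zeros(N)])      # step 4: append N zeros
    Hf = np.fft.fft(h_pad)                        # one section of the "HOT of h_r"
    u_ext = np.concatenate([np.zeros(N), u])      # section k needs u(kN-N .. kN+N-1)
    y = np.empty(K * N)
    for k in range(K):
        uk = u_ext[k * N : k * N + 2 * N]         # step 1: overlapping sections
        ck = np.fft.ifft(np.fft.fft(uk) * Hf)     # steps 3, 6, 7, 8
        y[k * N : (k + 1) * N] = ck[N:].real      # step 10: G keeps each second half
    return y
```

The overlap-save discard of each circular convolution's first half is exactly the role of the selection matrix G in equation (20).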
be the vector of input samples needed in the kth block. To use the fast HOT convolution
described in the previous section, û(k) is divided into K overlapping sections. Such sections
According to the fast HOT convolution, see equation (21), the output of the adaptive filter in the
kth block,

$$
y(k) = \begin{bmatrix} y(kL)\\ y(kL+1)\\ \vdots\\ y(kL+L-2)\\ y(kL+L-1) \end{bmatrix}, \tag{26}
$$

is given by

$$
y(k) = GPH^{-1}\left(HPw_r(k) \cdot HPJ\hat{u}(k)\right). \tag{27}
$$
The desired signal vector and the filter error in the kth block are given by

$$
\hat{d}(k) = \begin{bmatrix} d(kL)\\ d(kL+1)\\ \vdots\\ d(kL+L-2)\\ d(kL+L-1) \end{bmatrix} \tag{28}
$$

and

$$
\hat{e}(k) = \begin{bmatrix} e(kL)\\ e(kL+1)\\ \vdots\\ e(kL+L-2)\\ e(kL+L-1) \end{bmatrix}, \tag{29}
$$

respectively, where

$$
e(n) = d(n) - y(n). \tag{30}
$$
The sum in equation (31) can be efficiently calculated using the (L + N)-point DFTs of the error
vector e(n) and input vector u(n). However, the (L + N)-point DFT of u(n) is not available;
only the 2N-point DFTs of the K sections of û(k) are available. Therefore, the sum in
equation (31) should be divided into K sections as follows:

$$
\sum_{i=0}^{L-1} \begin{bmatrix} u(kL+i)\\ u(kL+i-1)\\ \vdots\\ u(kL+i-N+2)\\ u(kL+i-N+1) \end{bmatrix} e(kL+i)
= \sum_{l=0}^{K-1}\sum_{j=0}^{N-1}
\begin{bmatrix} u(kL+lN+j)\\ u(kL+lN+j-1)\\ \vdots\\ u(kL+lN+j-N+2)\\ u(kL+lN+j-N+1) \end{bmatrix} e(kL+lN+j). \tag{32}
$$
For each l, the sum over j can be calculated as follows. First, form the vectors

$$
u_l(k) = \begin{bmatrix} u(kL+lN-N)\\ \vdots\\ u(kL+lN+N-2)\\ u(kL+lN+N-1) \end{bmatrix}, \tag{33}
$$

$$
e_l(k) = \begin{bmatrix} 0_{N\times1}\\ e(kL+lN)\\ \vdots\\ e(kL+lN+N-2)\\ e(kL+lN+N-1) \end{bmatrix}. \tag{34}
$$

Then the sum over j is just the first N elements of the circular convolution of e_l(k) and the
circularly shifted u_l(k), and it can be computed using the DFT as shown below:

$$
\sum_{j=0}^{N-1} u_l(k)\, e(kL+lN+j) = U_N F^{-1}\left(u_{lF}^{*}(k) \cdot e_{lF}(k)\right), \tag{35}
$$

where

$$
U_N = \begin{bmatrix} I_{N\times N} & 0_{N\times N}\\ 0_{N\times N} & 0_{N\times N} \end{bmatrix}. \tag{36}
$$
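Equation (35) can be checked numerically: the sketch below (illustrative numpy, with hypothetical variable names) computes one section's sum both directly and via the 2N-point DFT correlation, with U_N acting as "keep the first N output samples".

```python
import numpy as np

rng = np.random.default_rng(1)
N = 8
u_l = rng.standard_normal(2 * N)                # [u(kL+lN-N), ..., u(kL+lN+N-1)]
e_l = np.concatenate([np.zeros(N),              # eq. (34): N leading zeros
                      rng.standard_normal(N)])  # [e(kL+lN), ..., e(kL+lN+N-1)]

# Direct time-domain sum: tap m accumulates u(kL+lN+j-m) * e(kL+lN+j)
direct = np.array([sum(u_l[N + j - m] * e_l[N + j] for j in range(N))
                   for m in range(N)])

# DFT form of eq. (35): first N samples of F^{-1}(u_lF^* . e_lF)
dft_form = np.fft.ifft(np.conj(np.fft.fft(u_l)) * np.fft.fft(e_l))[:N].real
```

The leading zeros in e_l(k) are what prevent circular wrap-around from corrupting the first N correlation outputs.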
$$
w(k+1) = w(k) + \frac{\mu}{L}\sum_{l=0}^{K-1} U_N F^{-1}\left(u_{lF}^{*}(k) \cdot e_{lF}(k)\right). \tag{39}
$$
Next, we express the sum in equation (39) in terms of the HOT. Form the vectors

$$
u(k) = \begin{bmatrix} u_0(k)\\ u_1(k)\\ \vdots\\ u_{K-1}(k) \end{bmatrix}, \tag{40}
$$

$$
e(k) = \begin{bmatrix} e_0(k)\\ e_1(k)\\ \vdots\\ e_{K-1}(k) \end{bmatrix}. \tag{41}
$$
Then using equation (7), the filter update equation can be written as

$$
w(k+1) = w(k) + \frac{\mu}{L}\, SPH^{-1}\left(H^{*}\tilde{u}(k) \cdot H\tilde{e}(k)\right), \tag{42}
$$

where the matrix S is given by

$$
S = \begin{bmatrix}
\begin{matrix}
1_{K\times1} & 0_{K\times1} & \cdots & 0_{K\times1}\\
0_{K\times1} & 1_{K\times1} & \cdots & 0_{K\times1}\\
\vdots & \vdots & \ddots & \vdots\\
0_{K\times1} & 0_{K\times1} & \cdots & 1_{K\times1}
\end{matrix} & 0_{N\times KN}\\
0_{N\times KN} & 0_{N\times KN}
\end{bmatrix}. \tag{43}
$$
Figure 2 shows the flow block diagram of the HOT DFT block LMS adaptive filter.
Fig. 2. The flow block diagram of the HOT DFT block LMS adaptive filter.
The ratio between the number of multiplications required for the HOT DFT block LMS
algorithm and the number of multiplications required for the DFT block LMS algorithm is
plotted in Figure 3 for different filter lengths. The HOT DFT block LMS filter is always
more efficient than the DFT block LMS filter and the efficiency increases as the block length
increases.
Fig. 3. Ratio between the number of multiplications required for the HOT DFT and the DFT
block LMS algorithms, versus block size, for filter lengths N = 5, 10, 15, and 50.
$$
w_F(k+1) = w_F(k) + \frac{\mu}{L}\sum_{l=0}^{K-1} F U_N F^{-1}\left(u_{lF}^{*}(k) \cdot e_{lF}(k)\right). \tag{44}
$$
Let the desired signal be generated using the linear regression model

$$
d(n) = w^{o}(n) * u(n) + e^{o}(n), \tag{45}
$$

where w^o(n) is the impulse response of the Wiener optimal filter and e^o(n) is the irreducible
estimation error, which is white noise and statistically independent of the adaptive filter input.
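As a sanity check of this regression model, a plain sample-by-sample LMS run on data generated according to equation (45) converges to w^o; the sketch below uses white input and illustrative parameter values, not those of the chapter's simulations.

```python
import numpy as np

rng = np.random.default_rng(2)
N = 4                                    # filter length
w_o = rng.standard_normal(N)             # Wiener optimal impulse response
u = rng.standard_normal(20000)           # white input
e_o = 1e-3 * rng.standard_normal(len(u)) # irreducible estimation error
d = np.convolve(u, w_o)[:len(u)] + e_o   # eq. (45): d(n) = w_o(n)*u(n) + e_o(n)

w = np.zeros(N)
mu = 0.01
for n in range(N, len(u)):
    x = u[n - N + 1 : n + 1][::-1]       # tap vector [u(n), ..., u(n-N+1)]
    err = d[n] - w @ x
    w += mu * err * x                    # LMS weight update
```

After convergence, w stays within the small steady-state misadjustment of w_o.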
In the kth block, the lth section of the desired signal in the DFT domain is given by

$$
\hat{d}_l(k) = \begin{bmatrix} 0_{N\times N} & I_{N\times N} \end{bmatrix} F^{-1}\left(w_{F}^{o}(k) \cdot u_{lF}(k)\right) + \hat{e}_l^{o}(k), \tag{46}
$$

where

$$
L_N = \begin{bmatrix} 0_{N\times N} & 0_{N\times N}\\ 0_{N\times N} & I_{N\times N} \end{bmatrix}, \tag{48}
$$
and ε_F(k) = w_F^o − w_F(k). Using equation (44), the error in the estimation of the adaptive filter
weight vector ε_F(k) is updated according to

$$
\epsilon_F(k+1) = \epsilon_F(k) - \frac{\mu}{L}\sum_{l=0}^{K-1} U_{N,F}\left(u_{lF}^{*}(k) \cdot e_{lF}(k)\right), \tag{49}
$$

where

$$
U_{N,F} = F \begin{bmatrix} I_{N\times N} & 0_{N\times N}\\ 0_{N\times N} & 0_{N\times N} \end{bmatrix} F^{-1}. \tag{50}
$$
Taking the DFT of equation (47), we have

$$
e_{lF}(k) = L_{N,F}\left(\epsilon_F(k) \cdot u_{lF}(k)\right) + e_{lF}^{o}(k), \tag{51}
$$

where

$$
L_{N,F} = F \begin{bmatrix} 0_{N\times N} & 0_{N\times N}\\ 0_{N\times N} & I_{N\times N} \end{bmatrix} F^{-1}. \tag{52}
$$
Using equation (51), we can write

$$
u_{lF}^{*}(k) \cdot e_{lF}(k) = U_{lF}^{*}(k)\left(L_{N,F}\, U_{lF}(k)\epsilon_F(k) + e_{lF}^{o}(k)\right). \tag{53}
$$

Using

$$
U_{lF}^{*}(k)\, L_{N,F}\, U_{lF}(k) = u_{lF}^{*}(k)\, u_{lF}^{T}(k) \cdot L_{N,F}, \tag{54}
$$

equation (53) can be simplified to

$$
u_{lF}^{*}(k) \cdot e_{lF}(k) = \left(u_{lF}^{*}(k)\, u_{lF}^{T}(k) \cdot L_{N,F}\right)\epsilon_F(k) + u_{lF}^{*}(k) \cdot e_{lF}^{o}(k). \tag{55}
$$
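Identity (54) is the elementwise rule Diag(a)* M Diag(a) = (a* aᵀ) · M for the pointwise product, easy to confirm numerically (illustrative numpy, with random stand-ins for u_lF(k) and L_{N,F}):

```python
import numpy as np

rng = np.random.default_rng(3)
a = rng.standard_normal(6) + 1j * rng.standard_normal(6)  # stands in for u_lF(k)
M = rng.standard_normal((6, 6))                           # stands in for L_{N,F}
lhs = np.diag(a.conj()) @ M @ np.diag(a)                  # U_lF^* L_{N,F} U_lF
rhs = np.outer(a.conj(), a) * M                           # (u_lF^* u_lF^T) . L_{N,F}
```

Entrywise, both sides equal a*_i M_ij a_j, which is what makes the pointwise rewrite in (55) valid.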
For the block LMS algorithm, the mean square error is given by

$$
J(k) = \frac{1}{L}\, E\!\left[\sum_{i=0}^{L-1} e^{2}(kL+i)\right], \tag{61}
$$

where J^o is the mean square of e^o(n). Assuming that ε(k) and Diag[u_{lF}(k)] are independent,
the excess mean square error is given by

$$
J_{ex}(k) = \frac{1}{2NL}\sum_{l=0}^{K-1} E\!\left[\epsilon_F^{H}(k)\, U_{lF}^{H}(k)\, L_{N,F}\, U_{lF}(k)\, \epsilon_F(k)\right]. \tag{64}
$$

Using equation (54), the excess mean square error can be written as

$$
J_{ex} = \frac{K}{2NL}\, E\!\left[\epsilon_F^{H}(k)\left(R_{u,F} \cdot L_{N,F}\right)\epsilon_F(k)\right], \tag{65}
$$

or equivalently

$$
J_{ex} = \frac{K}{2NL}\,\mathrm{tr}\!\left[\left(R_{u,F} \cdot L_{N,F}\right) E\!\left[\epsilon_F(k)\epsilon_F^{H}(k)\right]\right]. \tag{66}
$$
Fig. 4. Learning curves for the LMS, HOT DFT block LMS, and DFT block LMS algorithms.
N = 4 and K = 3. ρ = 0.9.
Fig. 5. Learning curves for the LMS, HOT DFT block LMS, and DFT block LMS algorithms.
N = 50 and K = 10. ρ = 0.9.
Another coloring filter was also used to simulate the learning curves of the algorithms.
This coloring filter was a bandpass filter with $H(z) = 0.1 - 0.2z^{-1} - 0.3z^{-2} + 0.4z^{-3} +
0.4z^{-4} - 0.2z^{-5} - 0.1z^{-6}$. The frequency response of the coloring filter is shown in Figure
7. The learning curves are shown in Figure 8. The simulations are again consistent with the
theoretical predictions presented in this chapter.
Fig. 6. Learning curves for the LMS, HOT DFT block LMS, and DFT block LMS algorithms.
N = 50 and K = 10. ρ = 0.8.
Fig. 7. Frequency response of the coloring filter: magnitude (dB) and phase (degrees) versus normalized frequency (×π rad/sample).
Fig. 8. Learning curves for the LMS, HOT DFT block LMS, and DFT block LMS algorithms.
N = 50 and K = 10.
9. Conclusions
In this chapter a new computationally efficient block LMS algorithm was presented. This
algorithm is called the HOT DFT block LMS algorithm. It is based on a newly developed
transform called the HOT. The basis of the HOT has many zero-valued samples and resembles
the DFT basis, which makes the HOT computationally attractive. The HOT DFT block LMS
algorithm is very similar to the DFT block LMS algorithm and reduces it computational
complexity by about 30% when the filter length is much smaller than the block length. The
analytical predictions and simulations showed that the convergence characteristics of the HOT
DFT block LMS algorithm are similar to that of the DFT block LMS algorithm.
10. References
Alkhouli, O.; DeBrunner, V.; Zhai, Y. & Yeary, M. (2005). “FIR Adaptive filters based
on Hirschman optimal transform,” IEEE/SP 13th Workshop on Statistical Signal
Processing, 2005.
Alkhouli, O. & DeBrunner, V. (2007). “Convergence Analysis of Hirschman Optimal
Transform (HOT) LMS adaptive filter,” IEEE/SP 14th Workshop on Statistical Signal
Processing, 2007.
Alkhouli, O. & DeBrunner, V. (2007). “Hirschman optimal transform block adaptive filter,”
International conference on Acoustics, Speech, and Signal Processing (ICASSP), 2007.
Clark, G.; Mitra, S. & Parker, S. (1981). “Block implementation of adaptive digital filters,” IEEE
Trans. ASSP, pp. 744-752, Jun 1981.
DeBrunner, V.; Özaydin, M. & Przebinda T. (1999). “Resolution in time-frequency,” IEEE Trans.
ASSP, pp. 783-788, Mar 1999.
DeBrunner, V. & Matusiak, E. (2003). “An algorithm to reduce the complexity required to
convolve finite length sequences using the Hirschman optimal transform (HOT),”
ICASSP 2003, Hong Kong, China, pp. II-577-580, Apr 2003.
DeBrunner, V.; Havlicek, J.; Przebinda, T. & Özaydin, M. (2005). “Entropy-based uncertainty
measures for $L^2(\mathbb{R}^n)$, $\ell^2(\mathbb{Z})$, and $\ell^2(\mathbb{Z}/N\mathbb{Z})$ with a Hirschman optimal transform for
$\ell^2(\mathbb{Z}/N\mathbb{Z})$,” IEEE Trans. ASSP, pp. 2690-2696, August 2005.
Farhang-Boroujeny, B. (1999). Adaptive Filters: Theory and Applications. Wiley, 1999.
Farhang-Boroujeny, B. & Chan, K. (2000). “Analysis of the frequency-domain block LMS
algorithm,” IEEE Trans. ASSP, pp. 2332, Aug. 2000.
Ferrara, E. (1980). “Fast implementation of LMS adaptive filters,” IEEE Trans. ASSP, vol.
ASSP-28, NO. 4, Aug 1980.
Mitra, S. (2000). Digital Signal Processing. McGraw-Hill, Second edition, 2000.
Haykin, S. (2002). Adaptive Filter Theory. Prentice Hall information and system sciences series,
Fourth edition, 2002.
Przebinda, T.; DeBrunner, V. & Özaydin, M. (2001). “The optimal transform for the discrete
Hirschman uncertainty principle,” IEEE Trans. Infor. Theory, pp. 2086-2090, Jul 2001.
Widrow, B. & Hoff, Jr., M. (1960). “Adaptive switching circuits,” IRE WESCON Conv. Rec., pt. 4,
pp. 96-104, 1960.
7
1. Introduction
The techniques for noise cancellation have been developed with applications in signal
processing, such as homomorphic signal processing, sensor array signal processing and
statistical signal processing. Some exemplar applications may be found in the kepstrum (also
known as complex cepstrum) method, beamforming and ANC (adaptive noise cancelling),
respectively, as shown in Fig. 1.
Fig. 1. Signal processing techniques and the application of methods for noise cancellation
Based on the two-microphone approach, the applications are characterized by three
methods: identification of an unknown system in acoustic channels, adaptive speech
beamforming and adaptive noise cancellation. This can be described by a generalized
three sub-block diagram, as shown in Fig. 2, which presents the three processing
stages of (1) kepstrum (complex cepstrum), (2) beamforming and (3) ANC, together with
the two structures of beamforming and ANC.
[Fig. 2 diagram: adaptive filter 1 W₁(z) with error 1, cascaded with adaptive filter 2 W₂(z) with error 2, fed from microphone 2 through H₂.]
noise ratio) improvement over the LMS algorithm (Harrison et al., 1986) but it requires a
high demand of computational complexity for the processing. Delay filter 2 is used as a
noncausality filter to maintain causality.
As described above, the techniques have been developed on the basis of the described
methods and structures. From the above analysis, the kepstrum noise cancelling technique
has been studied, where the kepstrum has been used for the identification of the acoustic
transfer functions between two microphones, and the kepstrum coefficients from the ratio of
the two acoustic transfer functions have been applied in front of an adaptive beamforming
structure for noise cancellation and speech enhancement (Jeong & Moir, 2008).
Furthermore, using the fact that a random signal plus noise may be represented as the
output of a normalized minimum-phase spectral factor driven by the innovations white-noise
input (Kalman & Bucy, 1961), the application of an innovations-based whitened form (here
called the inverse kepstrum) has been investigated in a simulation test, where the front-end
inverse kepstrum has been analyzed with a cascaded FIR LMS algorithm
(Jeong, 2009) and also an FIR RLS algorithm (Jeong, 2010a; 2010b), both in the ANC structure for
noise cancellation.
In this chapter, for practical real-time processing using the RLS algorithm, the analysis of the
innovations-based whitening filter (inverse kepstrum) has been extended to the beamforming
structure and tested for application in a realistic environment. From the
simulation test, it will be shown that the overall estimate from front-end inverse kepstrum
processing cascaded with an FIR RLS filter approximates the estimate of an IIR RLS algorithm in the ANC
structure. This provides an alternative solution to the computational complexity of the ANC
application using a pole-zero IIR RLS filter, which is mostly not acceptable for practical
applications. For application in a realistic environment, the method has been applied to the
beamforming structure for effective noise cancelling, and it will be shown that
the front-end kepstrum application with a zero-model FIR RLS provides even better
performance than the pole-zero model IIR RLS algorithm in the ANC structure.
$$
H_{opt}(z) = \frac{1}{H^{+}(z)}\left[\frac{\Phi_{xd}(z)}{H^{-}(z)}\right]_{+} = A(z)B(z) \tag{1}
$$

From equation (1), the optimum filter may be regarded as a cascade of the transfer functions A(z) and
B(z), where Φ_xd(z) is the double-sided z-transform of the cross-correlation function between
the desired signal and the reference signal. H⁺(z) and H⁻(z) are the spectral factors of the
double-sided z-transform Φ_xx(z) of the auto-correlation of the reference signal. These
spectral factors have the property that the inverse z-transform of H⁺(z) is entirely causal
and minimum phase; on the other hand, the inverse z-transform of H⁻(z) is noncausal. The
notation + outside the bracket indicates that the z-transform of the causal part of the
inverse z-transform of B(z) is being taken.
From the optimum Wiener filtering structure, the innovations process εₙ can be obtained by
applying the inverse of the spectral factor, A(z) = 1/H⁺(z), to the input signal of desired signal
plus noise, as shown in Fig. 3. Therefore, the optimal Wiener filter can be regarded as a
combination of two cascaded filters: a front-end whitening filter A(z), which generates the
white innovations process, and a cascaded shaping filter B(z), which provides a spectral
shaping function for the input signal.
Fig. 3. Analysis of innovations-based optimal Wiener filter: A(z): whitening filter and B(z):
spectral shaping filter
It can be applied to the two-microphone noise cancelling structure as an optimum IIR Wiener
filtering approach, as shown in Fig. 4.
Fig. 4. Optimum IIR Wiener filtering approach A(z)B(z) applied to the two-microphone noise cancelling structure.
Fig. 5. (A): The generating input model for signal plus noise ( xn ) (B): whitening model for
innovations-based white noise input (ε n )
To obtain the innovations white noise, the processing procedure is as follows:
Step 1. Take the periodogram P from the FFT (fast Fourier transform) of the input signal x_n:
$$P = \frac{1}{N}\,\left|X_i\right|^{2} \qquad (2)$$
where N is the frame size and i = 0, 1, 2, ..., N-1.
Step 2. Obtain the kepstrum coefficients from the inverse FFT (IFFT) of the logarithm of the periodogram.
$$\log \frac{1}{H^{+}(z)} \leftrightarrow -K^{+}(z) \qquad (4)$$
$$(n+1)\,h_{n+1} = \sum_{m=0}^{n} (n+1-m)\,h_{m}\,k_{n+1-m}, \qquad 0 \le n \le l-1 \qquad (5)$$
Step 7. Finally, convolve the impulse response obtained from (5) with the input signal x_n to obtain the innovations whitened sequence.
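Steps 1, 2 and 7, together with the recursion (5), can be sketched in code. The frame length, the number of retained kepstrum coefficients and the normalization h_0 = 1 are illustrative assumptions here, not the chapter's exact settings:

```python
import numpy as np

def inverse_kepstrum_whitener(x, n_kep=2, n_imp=8):
    """Sketch of the whitening procedure: periodogram -> log -> IFFT
    (kepstrum), negate to get the inverse kepstrum, expand it into an
    impulse response via recursion (5), then convolve with the input."""
    N = len(x)
    # Step 1: periodogram P = |X_i|^2 / N
    P = np.abs(np.fft.fft(x)) ** 2 / N
    # Step 2: kepstrum coefficients = IFFT of the log periodogram
    c = np.real(np.fft.ifft(np.log(P + 1e-12)))
    # inverse kepstrum -K+(z): negate the causal coefficients and truncate
    k = -c[:n_kep + 1]
    # recursion (5): (n+1) h_{n+1} = sum_{m=0}^{n} (n+1-m) h_m k_{n+1-m}
    h = np.zeros(n_imp)
    h[0] = 1.0  # assumed normalization (overall gain ignored)
    for n in range(n_imp - 1):
        acc = 0.0
        for m in range(n + 1):
            if 1 <= n + 1 - m <= n_kep:
                acc += (n + 1 - m) * h[m] * k[n + 1 - m]
        h[n + 1] = acc / (n + 1)
    # Step 7: convolve the impulse response with the input to whiten it
    return np.convolve(x, h)[:N], h
```

For example, a reference signal coloured by H2(z) = 1 + 0.4z^-1 yields h ≈ [1, -0.4, 0.16, ...], the truncated expansion of 1/H2(z).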
For the rear-end adaptive filter, setting the gradient of the exponentially weighted least-squares cost function to zero,
$$\nabla_{\mathbf{w}}(J_n) = \nabla_{\mathbf{w}}\!\left(\sum_{k=1}^{n} \beta^{\,n-k} e_k^{2}\right) = 0 \qquad (6)$$
where $e_k = d_k - \mathbf{w}^{T}\mathbf{x}_k$, yields the normal equations
$$\mathbf{R}_n \mathbf{w}_n = \mathbf{p}_n \qquad (7)$$
where the autocorrelation matrix is $\mathbf{R}_n = \sum_{k=1}^{n} \beta^{\,n-k}\,\mathbf{x}_k \mathbf{x}_k^{T} = \mathbf{X}^{T}\boldsymbol{\Lambda}\mathbf{X}$ and the cross-correlation vector is $\mathbf{p}_n = \sum_{k=1}^{n} \beta^{\,n-k}\, d_k \mathbf{x}_k = \mathbf{X}^{T}\boldsymbol{\Lambda}\mathbf{d}$, with $\boldsymbol{\Lambda} = \mathrm{diag}[\beta^{\,n-1}, \beta^{\,n-2}, \ldots, 1]$.
Both $\mathbf{R}_n$ and $\mathbf{p}_n$ can be computed recursively:
$$\mathbf{R}_n = \beta\,\mathbf{R}_{n-1} + \mathbf{x}_n\mathbf{x}_n^{T}, \qquad \mathbf{p}_n = \beta\,\mathbf{p}_{n-1} + d_n\mathbf{x}_n \qquad (8)$$
To find the weight vector $\mathbf{w}_n$ from (7), we need the inverse of $\mathbf{R}_n$. Using the matrix inversion lemma (Haykin, 1996), a recursive update equation for $\mathbf{R}_n^{-1}$ is found as:
$$\mathbf{R}_n^{-1} = \beta^{-1}\left[\mathbf{R}_{n-1}^{-1} - \boldsymbol{\mu}'_n\,\mathbf{x}_n^{T}\mathbf{R}_{n-1}^{-1}\right] \qquad (9)$$
where the gain vector is
$$\boldsymbol{\mu}'_n = \frac{\beta^{-1}\mathbf{R}_{n-1}^{-1}\mathbf{x}_n}{1 + \beta^{-1}\mathbf{x}_n^{T}\mathbf{R}_{n-1}^{-1}\mathbf{x}_n}$$
Equation (9) is known as the ordinary RLS algorithm; it is valid for FIR filters because no assumption is made about the input data $\mathbf{x}_n$. We can then find the weight update equation as:
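A compact sketch of the exponentially weighted RLS recursion implied by (6)-(9), including the weight update, follows; the regularized initialization R_0^-1 = δI and the parameter values are illustrative assumptions, not the chapter's settings:

```python
import numpy as np

def rls(x, d, order=2, beta=0.99, delta=100.0):
    """Ordinary exponentially weighted RLS solving R_n w_n = p_n
    recursively via the matrix inversion lemma (Riccati update (9))."""
    w = np.zeros(order)
    Rinv = delta * np.eye(order)   # R_0^{-1} = delta*I (assumed init)
    y = np.zeros(len(d))
    e = np.zeros(len(d))
    for n in range(order - 1, len(d)):
        xn = x[n - order + 1:n + 1][::-1]            # [x_n, ..., x_{n-order+1}]
        g = (Rinv @ xn) / (beta + xn @ Rinv @ xn)    # gain vector mu'_n
        y[n] = w @ xn                                # filter output
        e[n] = d[n] - y[n]                           # a priori error
        w = w + g * e[n]                             # weight update
        Rinv = (Rinv - np.outer(g, xn @ Rinv)) / beta  # R_n^{-1}, eq. (9)
    return w, y, e
```

For a zero-model example such as d_n = x_n + 1.5 x_{n-1}, the weights converge to [1, 1.5].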
Fig. 6. (A) The typical ANC method; (B) the front-end innovations-based inverse kepstrum method; both are applied to the ANC structure.
5. Experiment
The objective is to analyze the operation of the front-end innovations-based whitening method and the rear-end FIR RLS filter in both the ANC and the beamforming structures. For the simulation test, 2 kepstrum coefficients and a first-order zero-model RLS have been used; these are compared with a pole-zero model IIR RLS having a first-order numerator polynomial and a first-order denominator polynomial in the ANC structure. Based on this, the method is then tested in the beamforming structure for real-time processing in a realistic room environment, where the noise cancelling performance is compared with the typical IIR RLS method in the ANC structure. For the signal-plus-noise input, a simple sine waveform (consisting of 500 Hz, 550 Hz and 700 Hz components) has been selected as the desired signal, standing in for a desired speech signal, with real recorded data as the noise signal. For the processing, two FFT frame sizes (2048 points in the simulation test and 4096 points in the real test) have been used, with a sampling frequency of 22050 Hz and hence a Nyquist frequency of 11025 Hz. For a precise test, the program stops the estimation and freezes both the kepstrum coefficients and the adaptive (FIR and IIR RLS) filter weights when the desired speech signal is applied (Jeong, 2010a; 2010b). The frozen coefficients and weights are then applied during the desired-signal and noise periods. For the test in a real environment, two unidirectional microphones (5 cm apart) in a broadside configuration have been set up and tested in a corner of a room (3.8 m (d) x 3 m (w) x 2.8 m (h)) with moderately reverberant conditions.
5.2 Operation of innovations-based whitening filter and cascaded zero-model FIR RLS
filter in ANC structure
To verify the operation of the inverse kepstrum whitening filter with a nonminimum-phase term in the numerator polynomial and a minimum-phase term in the denominator polynomial, H(z) = H1(z)/H2(z) has been used as a simple example of an unknown system, where the acoustic transfer functions are H1(z) = 1 + 1.5z⁻¹ and H2(z) = 1 + 0.4z⁻¹. Hence H(z) = (1 + 1.5z⁻¹)/(1 + 0.4z⁻¹), which is illustrated by the zero (z = −1.5) and the pole (p = −0.4) in Fig. 9 (A).
Therefore, it can be described as the power-series polynomial H(z) = 1 + 1.1z⁻¹ − 0.44z⁻² + 0.176z⁻³ − 0.070z⁻⁴ + ...
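This expansion can be checked numerically by long division of the coefficient vectors (a generic helper of our own, not from the chapter):

```python
import numpy as np

def poly_divide(num, den, n_terms=5):
    """Expand num(z^-1)/den(z^-1) into a power series in z^-1 by long division."""
    q = np.zeros(n_terms)
    r = np.zeros(n_terms)
    r[:len(num)] = num
    for i in range(n_terms):
        q[i] = r[i] / den[0]
        for j in range(1, len(den)):
            if i + j < n_terms:
                r[i + j] -= q[i] * den[j]
    return q

# (1 + 1.5 z^-1)/(1 + 0.4 z^-1) -> coefficients 1, 1.1, -0.44, 0.176, -0.0704
print(poly_divide([1, 1.5], [1, 0.4]))
```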
Fig. 9. Comparison of pole-zero placement: (A) ordinary IIR RLS; (B) front-end inverse kepstrum method with cascaded FIR RLS.
As shown in Fig. 9 (B), the front-end inverse kepstrum estimates the minimum-phase term (13) in the denominator polynomial, and the cascaded zero-model RLS estimates the remaining nonminimum-phase term (14) in the numerator polynomial.
The methods are also compared in terms of the overall estimate, where the overall estimate (III) in (C) is obtained from the convolution of estimate (I) and estimate (II). In Table 1, (A) is the ordinary IIR RLS with a one-pole (p = −0.4), one-zero (z = −1.5) model, (B) is its estimates, and (C) is the estimates of the front-end inverse kepstrum and the cascaded FIR RLS. From this comparison, it can be seen that the innovations-based inverse kepstrum gives a close approximation to the ordinary IIR RLS, which is also verified in Fig. 9.
(A) IIR RLS (theory):   numerator [1, 1.5];  denominator [1, 0.4];  overall [1, 1.1, −0.44, 0.176, −0.070]
(B) IIR RLS (estimate): numerator [1, 1.499];  denominator [1, 0.4];  overall [1, 1.099, −0.439, 0.175, −0.070]
(C) Inverse kepstrum estimate (I) [1, −0.397, 0.078];  FIR RLS estimate (II) [1, 1.501];  overall (III) [1, 1.096, −0.525, 0.122, 0.000]
Table 1. Comparison of overall estimates (vector weights): (A) IIR RLS in theory; (B) IIR RLS estimate; (C) front-end innovations-based inverse kepstrum with cascaded FIR RLS estimate.
Fig. 10. Application of the front-end whitening filter KI(z) and the rear-end adaptive filter L(z) to the beamforming structure.
Without the whitening filter, the acoustic path transfer function is estimated by the adaptive filter L(z) as the ratio of combined transfer functions, H(z) = (H1(z) + H2(z))/(H1(z) − H2(z)), in the beamforming structure. With the whitening filter 1/H2(z) applied, the rear-end adaptive filter estimates L(z) = (H1(z) + 1)/(H1(z) − 1), as shown in Fig. 11 (A); the adaptive filter is then related only to the estimate of H1(z). From the analysis of the last ANC structure, the adaptive filter now estimates only the numerator polynomial part (H1(z) + 1), with a one-sample delay D⁻¹, as shown in Fig. 11 (B). Both the whitening coefficients and the adaptive filter weights are continuously updated during the noise-only period, and then stopped and frozen during the signal-plus-noise period.
Fig. 11. (A) Beamforming structure with H(z) = (H1(z) + H2(z))/(H1(z) − H2(z)), whitening filter KI(z) = 1/H2(z) and adaptive filter L(z) = (H1(z) + 1)/(H1(z) − 1); (B) equivalent structure with a one-sample delay D⁻¹ and L(z) = H1(z) + 1.
Fig. 12. Locations of pole-zero placement: (A) H1(z) = 1 + 0.2z⁻¹; (B) H1(z) = 1 + 1z⁻¹; (C) H1(z) = 1 + 1.5z⁻¹; (D) H1(z) = 1 + 2z⁻¹. H2(z) is fixed in all cases as H2(z) = 1 + 0.4z⁻¹.
The FIR RLS filter then estimates (H1(z) + 1) = 2 + 1.5z⁻¹ = 2(1 + 0.75z⁻¹), which gives L(z) = 1 + 0.75z⁻¹ in leading-one form, where a0 = 0.75. The weight value is thus half of the original weight value, 1.5, in H1(z). Fig. 12 shows the pole-zero locations for different weight values in H1(z), where the a0 values are (A) 0.2, (B) 1, (C) 1.5 and (D) 2. With three inverse kepstrum coefficients, as shown in Fig. 12, the adaptive FIR RLS converges to approximately half of those values, i.e., (A) 0.1, (B) 0.5, (C) 0.75 and (D) 1, respectively.
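The halving follows directly from L(z) = H1(z) + 1: with H1(z) = 1 + a z^-1, L(z) = 2 + a z^-1 = 2(1 + (a/2) z^-1), so in leading-one form the estimated weight is a/2. A quick check (the helper name is ours):

```python
def normalized_weight(a):
    """Leading-one coefficient of L(z) = H1(z) + 1 for H1(z) = 1 + a*z^-1."""
    L = [1.0 + 1.0, a]     # L(z) = 2 + a z^-1
    return L[1] / L[0]     # = a / 2

print([normalized_weight(a) for a in [0.2, 1.0, 1.5, 2.0]])  # → [0.1, 0.5, 0.75, 1.0]
```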
5.5 Test of noise cancellation on signal plus noise for real-time processing in a
realistic environment
For real-time processing in a realistic room environment, three comparisons have been made: 1) the noise cancelling performance at each step in the beamforming structure; 2) the performance of the front-end whitening application in the ANC structure versus the beamforming structure; and 3) the noise cancelling performance, during the noise and signal-plus-noise periods, of the ordinary approach using IIR RLS in the ANC structure versus the front-end whitening approach with FIR RLS in the beamforming structure.
Firstly, as shown in Fig. 13, the noise cancelling performance has been measured at each processing stage of Fig. 10: 1) the microphone output x_n; 2) the inverse kepstrum filter output x'_n; 3) the overall output e_n with the inverse kepstrum filter only; and 4) the overall output e_n with both the inverse kepstrum filter and the FIR RLS filter. For this test, 32 inverse kepstrum coefficients have been processed with an FFT frame size of 4096. Based on this, it is found that the inverse kepstrum filter works well in the beamforming structure.
Secondly, with the inverse kepstrum filter applied alone, its noise cancelling performance has been tested in (A) the ANC structure and compared with (B) the beamforming structure, as shown in Fig. 14. From the test, it has been found that the inverse kepstrum is more effective in the beamforming structure than in the ANC structure.
Thirdly, the average power spectra have been compared between the IIR RLS in the ANC structure and the front-end inverse kepstrum filter with rear-end FIR RLS in the beamforming structure. The results show that the inverse kepstrum provides better noise cancelling performance in the frequency range above 1000 Hz during the noise-only period as well as during the signal-plus-noise period, as shown in Fig. 15.
Fig. 14. (A) Comparison in ANC structure: between (i) whitening filter application only and
(ii) no-processing, (B) comparison in beamforming structure: between (i) whitening filter
application only and (ii) no-processing
Fig. 15. Average power spectrum showing noise cancelling performance: comparison
between (i) IIR RLS in ANC structure and (ii) whitening filter with FIR RLS in beamforming
structure during the period of (A) noise and (B) signal and noise
6. Conclusion
It has been shown in the simulation test that the front-end innovations-based whitening application (inverse kepstrum method) cascaded with a zero-model FIR RLS algorithm in the ANC structure achieves almost the same convergence performance as the pole-zero model IIR RLS in the ANC structure. For more effective performance in a realistic environment, the front-end whitening application with rear-end FIR RLS in the beamforming structure has shown better noise cancelling performance than the ordinary approach using a pole-zero model IIR RLS in the ANC structure. Therefore, for real-time processing, the front-end whitening application can provide an effective solution owing to the reduced computational complexity of the inverse kepstrum processing using the FFT/IFFT, which is a benefit over the sole application of the IIR RLS algorithm.
7. Acknowledgment
This work was supported in part by the UTM Institutional Grant vote number 77523.
Real-Time Noise Cancelling Approach on Innovations-Based Whitening Application to Adaptive FIR RLS in Beamforming Structure
8. References
Berghe, J. V. & Wouters, J. (1998). An adaptive noise canceller for hearing aids using two
nearby microphones, Journal of the Acoustical Society of America, 103 (6), pp. 3621-
3626
Compernolle, D. V. (1990). Switching adaptive filters for enhancing noisy and reverberant
speech from microphone array recordings, International conference on acoustics,
speech, and signal processing (ICASSP), pp. 833-836, Albuquerque, NM
Griffiths L. J. & Jim C. W. (1982). An alternative approach to linearly constrained
adaptive beamforming, IEEE transactions on antennas and propagation, vol. AP-
30, pp. 27-34
Haykin, S. (1996). Adaptive filter theory, third ed., Prentice-Hall, Upper Saddle River, NJ
Jeong, J. & Moir, T. J. (2008). A real-time kepstrum approach to speech enhancement and
noise cancellation, Neurocomputing 71(13-15), pp.2635-2649
Jeong, J. (2009). Analysis of inverse kepstrum and innovations-based application to noise
cancellation, proceedings of the IEEE international symposium on industrial electronics
(ISIE), pp. 890-896, July 5-8, 2009
Jeong, J. (2010a). Inverse kepstrum approach to FIR RLS algorithm and application to
adaptive noise canceling, proceedings of IEEE international conference on
industrial technologies (ICIT), pp. 203-208, Viña del Mar, Chile, March 14-17,
2010
Jeong, J. (2010b). Real-time acoustic noise canceling technique on innovations-based
inverse kepstrum and FIR RLS, proceedings of IEEE international symposium on
intelligent control (ISIC), pp. 2444-2449, Yokohama, Japan, September 08-10,
2010
Kailath, T. (1968). An innovations approach to least-squares estimation-part I: linear filtering
in additive white noise, IEEE transactions on automatic control, vol. 13, issue 6, pp.
646–655
Kalman, R.E. & Bucy, R. S. (1961). New results in linear filtering and prediction theory,
Transactions of the ASME, Journal of Basic Engineering, 83 (1961), pp. 95-107
Knapp, C. & Carter, G. C. (1976). The generalized correlation method for estimation of time
delay, IEEE transaction on acoustics, speech and signal processing, ASSP-24(4), pp. 320-
327
Ljung, L. & Söderström, T. (1987). Theory and practice of recursive identification: MIT Press
Moir, T. J. & Barrett, J. F. (2003). A kepstrum approach to filtering, smoothing and prediction
with application to speech enhancement, Proc. R. Soc. Lond. A 2003(459): pp.2957-
2976
Silvia, M. T. & Robinson, E. A. (1978). Use of the kepstrum in signal analysis, Geoexploration,
16(1978), pp. 55-73.
Wallace, R. B. & Goubran, R. A. (1992). Noise cancellation using parallel adaptive filters,
IEEE transaction on circuits and systems-II: Analog and digital signal processing, 39 (4):
pp. 239-243
Widrow, B., Glover, J. R. Jr., McCool, J. M., Kaunitz, J., Williams, C. S., Hearn, R. H., Zeidler,
J. R., Dong, E. Jr., & Goodlin, R. C. (1975). Adaptive noise cancelling: principles and
applications, Proceedings of the IEEE, 63 (12), pp.1692-1716
Widrow, B. & Hoff, M. E. (1960). Adaptive switching circuits, IRE Wescon Convention Record,
pp. 96-104
8
Adaptive Fuzzy Neural Filtering for Decision Feedback Equalization and Multi-Antenna Systems
1. Introduction
1.1 Background
In ordinary channel equalizers and multi-antenna systems, many types of detection methods have been proposed to compensate distorted signals or recover the original symbols of the desired user [1]-[3]. For channel equalization, transversal equalizers (TEs) and decision feedback equalizers (DFEs) are commonly used as detectors to compensate distorted signals [2]. It is well known that a DFE performs significantly better than a TE of equivalent complexity [2]. As to multi-user multi-antenna systems, adaptive beamforming (BF) detectors have provided practical methods to recover the symbols of the desired user [3]. Many classical optimization criteria and algorithms, such as minimum mean-squared error (MMSE) [1]-[4], minimum bit-error rate (MBER) [5]-[9], adaptive MMSE/MBER training methods [6], [10]-[12] and the bagging (BAG) adaptive training method [13], have been proposed to adjust the parameters of the above-mentioned classical detectors (i.e., TE, DFE and BF).
Due to its optimal nonlinear classification characteristics in the observation space, Bayesian decision theory derived from maximum likelihood detection [15] has been extensively exploited to design the so-called Bayesian TE (BTE) [14]-[15], Bayesian DFE (BDFE) [16]-[17] and Bayesian BF (BBF) [18]-[19]. The bit-error rate (BER) or symbol-error rate (SER) results of Bayesian-based detectors are often referred to as the optimal solutions, and are markedly superior to those of MMSE, MBER, adaptive MMSE (such as the least mean square algorithm [1]), adaptive MBER (such as the linear-MBER algorithm [6]) or BAG-optimized detectors. The
BTE, BDFE and BBF can be realized by the radial basis functions (RBFs) [14], [17], [19]-[23].
Classically, the RBF TE, RBF DFE or RBF BF is trained with a clustering algorithm, such as k-means [14], [17], [24] or rival penalized competitive learning (RPCL) [25]-[31]. These clustering techniques help RBF detectors find the center vectors (also called center units or centers) associated with the radial Gaussian functions. As the channel order or the equalizer order increases linearly, the number of hidden nodes in the RBF TE grows exponentially, and so do the computational and hardware complexity [20]. The trial-and-error method is an alternative way to determine the number of hidden nodes of an RBF network.
Besides the clustering-based RBF detectors, there are other types of nonlinear detectors, such as multilayer perceptrons (MLPs) [32]-[38], the adaptive neuro-fuzzy inference system (ANFIS) [39]-[41] and self-constructing recurrent fuzzy neural networks (SCRFNNs) [42]-[44]. Traditionally, MLP and ANFIS detectors are trained by back-propagation (BP) learning [32], [34], [35], [38], [40]. However, owing to improper initial parameters of the MLP and ANFIS detectors, BP learning often gets trapped in local minima, which can lead to poor performance [38]. Recently, evolution strategy (ES) has also been used to train the parameters of MLP and ANFIS detectors [36], [41]. Although ES is inherently a global and parallel optimization learning algorithm, the tremendous computational cost of the training process makes it impractical in modern communication environments. In addition, the structures (i.e., the numbers of hidden nodes) of MLP and ANFIS detectors must be fixed and assigned in advance, determined by the trial-and-error method.
In 2005, the SCRFNN detector and another version of it, the self-constructing fuzzy neural network (SCFNN), were applied to the channel equalization problem [43]-[44]. Specifically, the SCRFNN and SCFNN equalizers perform both the self-constructing process and the BP learning process simultaneously in the training procedure, without knowledge of the channel characteristics. Initially, there are no hidden nodes (also called fuzzy rules hereinafter) in the SCRFNN or SCFNN structure. All of the nodes are flexibly generated
online during the self-constructing process, which not only automates structure modification (i.e., the number of hidden nodes is determined by the self-constructing algorithm instead of the trial-and-error method) but also locates good initial parameters for the subsequent BP algorithm. The BER or SER of the SCRFNN TE and SCFNN TE is thus markedly superior to that of the classical BP-trained MLP and ANFIS TEs, and is close to the optimal Bayesian solution. Moreover, the self-constructing process of SCRFNN and SCFNN builds a more compact structure by setting conditions that restrict the generation of new hidden nodes; hence SCRFNN and SCFNN TEs result in lower computational costs than traditional RBF and ANFIS TEs.
Although the SCRFNN TE and SCFNN TE in [43]-[44] provide a scheme to obtain satisfactory BER and SER performance with low computational complexity, they do not take advantage of decision feedback signals to improve the detection capability. In Section 2, a novel DFE structure incorporating a fast SCFNN learning algorithm is presented; we term it the fast SCFNN (FSCFNN) DFE [58]. The FSCFNN DFE is composed of several FSCFNN TEs, each corresponding to one feedback input vector. Because each feedback input vector occurs independently, only one FSCFNN TE is activated to decide the estimated symbol at each time instant. Without knowledge of the channel characteristics, the FSCFNN DFE achieves an improvement over the classical SCRFNN or SCFNN TE in terms of BER, computational cost and hardware complexity.
In modern communication channels, time-varying fading caused by the Doppler effect [33], [37], [49] and a frequency offset caused by the Doppler effect and/or a mismatch between the frequencies of the transmitter and receiver oscillators are usually unavoidable [45]. Moreover, phase noise [45] may also exist due to a distorted transmission environment and/or imperfect oscillators. Therefore, these distortions need to be compensated at the receiver to avoid serious degradation. To the best of our knowledge, most of the work in the area of nonlinear TEs or DFEs over the past few decades has focused on time-invariant channels. Therefore, the simulations of the FSCFNN DFE and the other nonlinear equalization methods will be investigated in Section 2.3 under linear and nonlinear channels in time-invariant and time-varying environments.
It is well known that the mean square error (MSE) for a DFE is always smaller than that of a TE, especially when the channel has a deep spectral null in its bandwidth [2]. However, if the channel has severe nonlinear distortions, the classical TE and DFE perform poorly. Generally speaking, the nonlinear equalization techniques proposed to address the nonlinear channel equalization problem are those presented in [14], [16], [17], [22], [32], [35], [39], [44], [54]. Chen et al. derived a Bayesian DFE (BDFE) solution [16], which not only improves performance but also reduces the computational cost compared to the Bayesian transversal equalizer (BTE). On the assumption that the channel order n_h is known, i.e., that it has been successfully estimated before the detection process, a radial basis function (RBF) detector can realize the optimal BTE and BDFE solutions [14], [16]. However, as the channel order and/or the equalizer order increases, the computational cost and memory requirements grow exponentially, as mentioned in Section 1.2.
A powerful nonlinear detection technique, the fuzzy neural network (FNN), makes effective use of both the easy interpretability of fuzzy logic and the superior learning ability of neural networks; hence it has been adopted for equalization problems, e.g. the adaptive neuro-fuzzy inference system (ANFIS)-based equalizer [39] and the self-constructing recurrent FNN (SCRFNN)-based equalizer [44]. Multilayer perceptron (MLP)-based equalizers [32], [35] are another kind of detector. Neither FNN nor MLP equalizers need to know the channel characteristics, including the channel order and channel coefficients. For ANFIS and MLP nonlinear equalizers, the structure size must be fixed by the trial-and-error method in advance, and all parameters are tuned by a gradient descent method. The SCRFNN equalizer, in contrast, can simultaneously tune both the structure size and the parameters during its online learning procedure. Although the SCRFNN equalizer provides a scheme to automatically tune the structure size, it does not derive an algorithm to improve the performance with the aid of decision feedback symbols. Thus, a novel adaptive filter based on the fast self-constructing fuzzy neural network (FSCFNN) algorithm has been proposed with the aid of decision feedback symbols [58].
$$r(n) = g(\hat{r}(n)) + v(n) = g\!\left(\sum_{i=0}^{n_h} h_i\, s(n-i)\right) + v(n) \qquad (2.1)$$
where $g(\cdot)$ is a nonlinear distortion, $h_i$ is a coefficient of the linear FIR channel $\hat{r}(n)$ with length $n_h + 1$ ($n_h$ is also called the channel order), $s(n)$ is the transmitted symbol at time instant $n$, and $v(n)$ is the AWGN with zero mean and variance $\sigma_v^2$. The standard DFE is characterized by the three integers $N_f$, $N_b$ and $d$, known as the feedforward order, feedback order and decision delay, respectively. We define the feedforward input vector at time instant $n$ as the sequence of noisy received signals $\{r(n)\}$ input to the DFE, i.e., $\mathbf{s}_f(n) = [r(n), r(n-1), \ldots, r(n-N_f+1)]^{T}$.
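The channel model (2.1) and the feedforward vector construction can be sketched as follows; the particular distortion g(·) and the coefficients below are placeholders, since the chapter's channels are specified elsewhere:

```python
import numpy as np

def channel_output(s, h, sigma_v, g=lambda u: u + 0.2 * u ** 2):
    """Nonlinear channel (2.1): r(n) = g(sum_i h_i s(n-i)) + v(n).
    g is a placeholder nonlinearity; sigma_v is the AWGN std dev."""
    r_hat = np.convolve(s, h)[:len(s)]        # linear FIR part
    v = sigma_v * np.random.randn(len(s))     # AWGN v(n)
    return g(r_hat) + v

def feedforward_vector(r, n, Nf):
    """s_f(n) = [r(n), r(n-1), ..., r(n-Nf+1)]^T"""
    return np.array([r[n - i] for i in range(Nf)])
```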
Fig. 2.2. Structure of the FSCFNN DFE: the feedforward section feeds the inputs r(n), ..., r(n−N_f+1) to the 1st through the N_s-th FSCFNN detectors, and the feedback section activates the j-th FSCFNN equalizer when s_b(n) = s_b,j.
$$R_d = \bigcup_{1 \le j \le 2^{N_b}} R_{d,j} \qquad (2.7)$$
where $R_{d,j} = \{\mathbf{s}_f(n)\,|\,\mathbf{s}_b(n) = \mathbf{s}_{b,j}\}$. Since each feedback state $\mathbf{s}_{b,j}$ occurs independently, the FSCFNN DFE uses $N_s = 2^{N_b}$ FSCFNN detectors to separately classify the $N_s$ feedforward input subsets $R_{d,j}$ into 2 classes. Thus, for the feedforward input vectors belonging to $R_{d,j}$, the $j$-th FSCFNN detector corresponding to $R_{d,j}$ is exploited, as shown in Figure 2.2, to further classify the subset $R_{d,j}$ into 2 subsets according to the value of $s(n-d)$, i.e., $R_{d,j} = R_{d,j}^{(1)} \cup R_{d,j}^{(2)}$, where $R_{d,j}^{(i)} = \{\mathbf{s}_f(n)\,|\,(\mathbf{s}_b(n) = \mathbf{s}_{b,j}) \wedge (s(n-d) = s_i)\}$, $i = 1, 2$. Thus, a feedforward input vector with $\mathbf{s}_{b,j}$ as its feedback state can be equalized by solely observing the subset $R_{d,j}$ corresponding to the $j$-th FSCFNN detector.
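The partitioning in (2.7) reduces to bookkeeping: one detector per feedback state. For BPSK feedback symbols this can be sketched as (illustrative helpers, not the chapter's code):

```python
from itertools import product
import numpy as np

def feedback_states(Nb):
    """Enumerate the Ns = 2^Nb possible feedback states s_b,j for +/-1 symbols."""
    return [np.array(p) for p in product([+1, -1], repeat=Nb)]

def active_detector(sb, states):
    """Index j of the single FSCFNN detector activated by feedback vector sb."""
    for j, s in enumerate(states):
        if np.array_equal(sb, s):
            return j
    raise ValueError("unknown feedback state")
```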
B. Learning of the FSCFNN with decision feedback
If the FSCFNN DFE (Figure 2.2) receives a feedforward input vector $\mathbf{s}_f(n)$ with $\mathbf{s}_b(n) = \mathbf{s}_{b,j}$ at time $n$, the $j$-th FSCFNN detector is activated, as mentioned above. The structure of this $j$-th FSCFNN detector is shown in Figure 2.3. The output of the $j$-th FSCFNN detector is defined as
$$O_j(n) = \sum_{k=1}^{K_j(n)} w_{k,j}(n)\, O_{k,j}^{(3)}(n) \qquad (2.9)$$
with
$$O_{k,j}^{(3)}(n) = \prod_{p=1}^{N_f} \exp\!\left(-\frac{\bigl(O_p^{(1)}(n) - m_{kp,j}(n)\bigr)^{2}}{2\,\sigma_{kp,j}^{2}(n)}\right) \qquad (2.10)$$
Fig. 2.3. Structure of the j-th FSCFNN detector, with inputs r(n), ..., r(n−N_f+1), Gaussian membership nodes O^(2) and rule weights w_{1,j}(n), ..., w_{K_j(n),j}(n).
where $m_{kp,j}(n)$ and $\sigma_{kp,j}(n)$ denote the mean and standard deviation of the Gaussian membership functions in the $j$-th FSCFNN detector. Finally, the output value of the FSCFNN DFE (Figure 2.2) at time $n$ is expressed as $y(n) = O_j(n)$.
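Equations (2.9)-(2.10) amount to a weighted sum of Gaussian rule activations over the N_f feedforward inputs; a minimal sketch (the array shapes, K rules × N_f inputs, are our convention):

```python
import numpy as np

def fscfnn_output(sf, means, sigmas, weights):
    """O_j(n) of (2.9)-(2.10): sum_k w_k * prod_p exp(-(sf_p - m_kp)^2 / (2 s_kp^2)).
    means, sigmas: (K, Nf) arrays; weights: (K,) consequent weights."""
    G = np.exp(-((sf[None, :] - means) ** 2) / (2 * sigmas ** 2)).prod(axis=1)
    return float(weights @ G)
```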
Based on the self-constructing and parameter learning phases of the SCRFNN structure [44], a fast learning version [58] has been proposed for the FSCFNN DFE to further reduce the computational cost in the training period. As before, there are initially no fuzzy rules in each FSCFNN detector. When $\mathbf{s}_b(n) = \mathbf{s}_{b,j}$ at time $n$, the proposed fast self-constructing and parameter learning phases are performed simultaneously in the $j$-th FSCFNN structure. In the self-constructing learning phase, two measures are used to judge whether to generate a hidden node. The first is the system error $e(n) = s(n-d) - \hat{s}(n-d)$, which accounts for the generalization performance of the overall network. The second is the maximum membership degree $\beta_{\max} = \max_k O_{k,j}^{(3)}(n)$. Consequently, for a feedforward input vector $\mathbf{s}_f(n)$ with $\mathbf{s}_b(n) = \mathbf{s}_{b,j}$, the fast learning algorithm contains three possible scenarios for performing the self-constructing and parameter learning phases:
a. $e(n) \ne 0$ and $\beta_{\max} < \beta_{\min}$: the network has produced an incorrect estimated symbol and no existing fuzzy rule can geometrically accommodate the current feedforward input vector $\mathbf{s}_f(n)$. The strategy in this case is to improve the overall performance of the current network by adding a fuzzy rule to cover the vector $\mathbf{s}_f(n)$, i.e., $K_j(n+1) = K_j(n) + 1$. The parameters associated with the new fuzzy rule in the antecedent part of the $j$-th FSCFNN are initialized in the same way as those of the SCRFNN.
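Scenario (a) of the self-constructing phase can be sketched as follows; the threshold β_min, the initial width σ_init and the zero initial consequent weight are illustrative assumptions based on the description above:

```python
import numpy as np

def maybe_add_rule(sf, err, means, sigmas, weights,
                   beta_min=0.05, sigma_init=1.0):
    """If the decision was wrong (err != 0) and no rule covers sf
    (beta_max < beta_min), spawn a new rule centred at sf."""
    Nf = len(sf)
    if len(weights):
        G = np.exp(-((sf - means) ** 2) / (2 * sigmas ** 2)).prod(axis=1)
        beta_max = G.max()
    else:
        beta_max = 0.0
    if err != 0 and beta_max < beta_min:
        means = np.vstack([means.reshape(-1, Nf), sf])
        sigmas = np.vstack([sigmas.reshape(-1, Nf), np.full(Nf, sigma_init)])
        weights = np.append(weights, 0.0)
    return means, sigmas, weights
```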
A. Time-invariant channel
Several comparisons are made among various methods for the nonlinear time-invariant channel A. Figure 2.4 shows the BER performance and the average numbers of fuzzy rules needed in computation for the FSCFNN DFE under various values of $\beta_{\min}$ and different training lengths. Clearly, the BER results are similar for $\beta_{\min} \ge 0.05$, but the numbers of rules increase as $\beta_{\min}$ grows. Moreover, the training data size needed for the FSCFNN DFE is about 300. Figure 2.5 shows the BER performance and average numbers of rules for the various methods. The SCRFNN with $\beta_{\min} = 0.00003$ is used in this plot. The FSCFNN DFEs with $\beta_{\min} = 0.5$ and $\beta_{\min} = 0.05$ are denoted as FSCFNN DFE(A) and FSCFNN DFE(B), respectively. Obviously, the FSCFNN DFEs are superior to the other methods. Because we want to obtain satisfactory BER performance, a training data size of 400 for the various methods and $\beta_{\min} = 0.05$ for the FSCFNN DFE are set in the following simulations.
Fig. 2.4 Performance of the FSCFNN DFE for various values of $\beta_{\min}$ and different training lengths in the time-invariant channel A at SNR = 18 dB: (a) BER; (b) numbers of fuzzy rules
Figure 2.6 illustrates the performance of the various methods at different SNRs. Note that the BERs at SNR = 20 dB are obtained by averaging 10000 runs for accuracy. Without knowledge of the channel, the FSCFNN DFE brings the BER performance close to the optimal BDFE solution with a satisfactorily low number of rules.
Figures 2.7 and 2.8 show examples of the fuzzy rules generated by the SCRFNN equalizer and the FSCFNN DFE at SNR = 18 dB. The channel states and the decision boundaries of the optimal solution are also plotted. The j-th FSCFNN detector can geometrically cluster the feedforward input vectors associated with s_b(n) = s_b,j; in Figure 2.8, only 2 fuzzy rules are generated in each FSCFNN. Because the SCRFNN equalizer needs to cluster the whole set of input vectors, 4 fuzzy rules are created for this purpose (Figure 2.7). Therefore, the FSCFNN DFE requires a lower computational cost than the SCRFNN in the learning and equalization periods. In Figure 2.8, the optimal decision boundaries for the four feedforward input vector subsets R_d,j are almost linear, whereas the optimal decision boundary for the SCRFNN is nonlinear. This also implies that classifying the distorted received signals into 2 classes is easier in the FSCFNN DFE than in the SCRFNN equalizer, which is the main reason that the BER performance of the FSCFNN DFE is superior to that of the classical SCRFNN equalizer.
B. Time-varying channel
The FSCFNN DFE is also tested in time-varying channel environments. The following linear multipath time-varying channel model is used:
Fig. 2.5 Performance of the various methods with different training lengths in the time-invariant channel A at SNR = 18 dB: (a) BER; (b) numbers of fuzzy rules
Fig. 2.6 Performance of the various methods at different SNRs in the time-invariant channel A: (a) BER; (b) numbers of fuzzy rules
Fig. 2.7 Fuzzy rules generated by the trained SCRFNN (ellipses), channel states (small circles) and the optimal decision boundary (line) in the time-invariant channel A at SNR = 18 dB
Figure 2.9 shows the performance of the various methods for different $f_d$ in the time-varying channel B. Because the fade rate $f_d$ in the real world is usually no larger than 0.1, we run the simulations from $f_d$ = 0.02 (slowly time-varying) to $f_d$ = 0.18 (fast time-varying). The FSCFNN DFEs with $\beta_{\min} = 0.95$ and $\beta_{\min} = 0.05$ are denoted as FSCFNN DFE(A) and FSCFNN DFE(B), respectively. Besides, the values of $\beta_{\min}$ in SCRFNN(A) and SCRFNN(B) are set to 0.00003 and 0.003, respectively. When the value of $\beta_{\min}$ is large enough, the BER performance of the FSCFNN DFE in various time-varying environments is satisfactory; however, the number of rules in the FSCFNN DFE also increases as $\beta_{\min}$ grows. Because the FSCFNN DFE(B) performs better in both time-varying channels B and C than the classical equalizers, the value $\beta_{\min} = 0.05$ is used in the following simulations of this chapter.
Fig. 2.8 Fuzzy rules generated by the trained FSCFNN DFE (ellipses), channel states (small circles) and optimal decision boundaries (lines) for the four feedback input vectors in the time-invariant channel A at SNR = 18 dB
Figure 2.10 shows the performance of the various methods at different SNRs in the time-varying channel B. The SCRFNN equalizer with $\beta_{\min} = 0.003$ is used here to obtain satisfactory performance. Note that the BER results at SNR = 18 dB in Figure 2.10(a) are obtained by averaging 10000 runs for accuracy. The BER performance of the FSCFNN DFE is slightly better than that of the RBF DFE; however, the RBF DFE assumes that perfect knowledge of the channel order has been acquired in advance for the simulations. Moreover, the number of rules needed in computation is lowest for the FSCFNN DFE.
[Two panels, (a) log10(BER) and (b) numbers of rules, both versus fd, for SCRFNN(A), SCRFNN(B), ANFIS DFE, RBF DFE, FSCFNN DFE(A) and FSCFNN DFE(B)]
Fig. 2.9 Performance of various methods for different fd in the time-varying channel B at SNR = 16 dB: (a) BER, (b) numbers of rules
Adaptive Fuzzy Neural Filtering for Decision Feedback Equalization and Multi-Antenna Systems 181
[Two panels, (a) log10(BER) and (b) numbers of rules, both versus SNR (dB), for SCRFNN, ANFIS DFE, RBF DFE and FSCFNN DFE]
Fig. 2.10 Performance of various methods for different SNRs at fd = 0.1 in the time-varying channel B: (a) BER, (b) numbers of rules
x_l(n) = \sum_{i=1}^{M} A_i b_i(n) e^{j\omega t_l(i)} + v_l(n), (3.1)

where 1 ≤ l ≤ L, x_l(n) = x_l^R(n) + j x_l^I(n) is the complex-valued array input signal of the l-th linear array element, n denotes the bit instant, the i-th user's signal b_i(n) is assumed to be a binary signal taking values from the set {±1} with equal probability, A_i^2 denotes the signal power of user i, t_l(i) = [(l−1)d sin θ_i]/c [57] is the relative time delay at element l for user i, θ_i is the direction of arrival (DOA) for user i, c is the speed of light, and v_l(n) is complex-valued white Gaussian noise with zero mean and variance 2σ_v^2. Without loss of generality, user 1 is assumed to be the desired user and the rest of the users are interfering users. The desired user's SNR is defined as SNR = A_1^2/(2σ_v^2). We can rewrite (3.1) in a vector form:
y_s(x(n)) = \sum_{k=1}^{K} w_k G_k(n), (3.3)

where K is the number of fuzzy rules, w_k is the real-valued consequent weight of the k-th fuzzy rule, and G_k(n) is the Gaussian membership function (GMF) of the k-th fuzzy rule, which is associated with the current array input vector x(n):
where c_kl = c_kl^R + j c_kl^I and σ_kl = σ_kl^R + j σ_kl^I are, respectively, the complex-valued center and complex-valued width of the k-th fuzzy rule for the l-th array input signal, and we define the center vector and width vector of the k-th fuzzy rule as c_k ≡ [c_k1, …, c_kL]^T and σ_k ≡ [σ_k1, …, σ_kL]^T. The major difference between equation (3.4) and a standard RBF [19] is that ellipsoidal GMFs are designed for the former, whereas radial GMFs are used for the latter. To accommodate all geometric locations of x(n) belonging to χ with small geometric clusters corresponding to the GMFs (i.e., to classify all observed vectors x(n) with a small number K), the widths of the SCFNN classifier are designed to be trainable. The estimate of b_1(n) is obtained by b̂_1(n) = sgn{y_s(n)}.
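As a concrete sketch, the output (3.3) and the decision b̂_1(n) = sgn{y_s(n)} can be evaluated as below; the ellipsoidal membership here uses one real width per input dimension, and the rule parameters are hypothetical toy values rather than trained ones.

```python
import math

def ellipsoidal_gmf(x, center, widths):
    """Ellipsoidal Gaussian membership: a separate width for every dimension."""
    return math.exp(-sum((xi - ci) ** 2 / (2 * wi ** 2)
                         for xi, ci, wi in zip(x, center, widths)))

def scfnn_output(x, rules):
    """y_s(x(n)) = sum_k w_k G_k(n) over the fuzzy rules."""
    return sum(w * ellipsoidal_gmf(x, c, s) for w, c, s in rules)

# Toy rule base: (consequent weight, centre, widths) -- hypothetical values.
rules = [(+1.0, (0.8, 0.8), (0.5, 0.3)),
         (-1.0, (-0.8, -0.8), (0.5, 0.3))]

y = scfnn_output((0.7, 0.9), rules)
b1_hat = 1 if y >= 0 else -1          # b1_hat(n) = sgn{y_s(n)}
```

A trained rule base would contain many more rules, with centers near the channel states, but the evaluation path is the same.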
As demonstrated in Section 2.2, the learning algorithm of a standard SCFNN detector involves two phases: self-constructing learning and parameter learning. Given a series of training data (x(n), b_1(n)), n = 0, 1, 2, …, the SCFNN training algorithm is performed at each time instant n. Note that there are initially no fuzzy rules in the adaptive SCFNN beamformer either. In the self-constructing learning phase, the maximum membership degree G_max = Max_{1≤k≤K} G_k(n) is again adopted to judge whether to generate a fuzzy rule, and the parameters of a newly generated fuzzy rule are then initialized properly. Consequently, the growth criterion that must be met before a new fuzzy rule is added is

G_max ≤ G_min, (3.5)

where G_min is a pre-specified threshold (0 < G_min < 1). This growth criterion implies that the geometric clusters corresponding to the existing fuzzy rules are far from the geometric location of the current array input vector x(n). Hence, a new fuzzy rule should be generated to cover x(n), i.e., K ← K + 1. Once a new fuzzy rule is added, its initial geometric cluster is assigned accordingly:
where ρ is the chosen kernel width [47]. Then the estimated error probability of an SCFNN-related beamformer at the time instant n can be given by [47]

P_E(n) = \int_{-\infty}^{0} \hat{p}(y, n)\, dy, (3.8)

where p̂(y, n) denotes the kernel density estimate of the signed beamformer output sgn(b_1(n)) y_s(n).
The objective of the standard MBER method is to minimize P_E(n) with respect to the SCFNN-related beamformer's parameters; namely, all parameters of the SCFNN are adapted by an MBER-based gradient descent method. Because the criterion G_max > G_min implies that the array input vector x(n) should be a member of the geometric cluster corresponding to G_q(n), where q = arg Max_{1≤k≤K} G_k(n), we propose to optimize only the parameters corresponding to the q-th fuzzy rule during the MBER-based parameter training phase. By adopting this method, the time cost can be significantly reduced. Consequently, this modified MBER method (called C-MBER hereinafter for convenience) optimizes the parameters of the proposed SCFNN beamformer by the update amounts in (3.9)-(3.11):
\Delta w_q = -\mu_w \frac{\partial P_E(n)}{\partial w_q} = \mu_w f(n) G_q(n) (3.9)

\Delta c_q = -\mu_c \frac{\partial P_E(n)}{\partial c_q} = \mu_c f(n) w_q G_q(n) \frac{x(n) - c_q}{\sigma_q^2} (3.10)

\Delta \sigma_q = -\mu_\sigma \frac{\partial P_E(n)}{\partial \sigma_q} = \mu_\sigma f(n) w_q G_q(n) \frac{(x(n) - c_q)^2}{\sigma_q^3} (3.11)

with

f(n) = \frac{\mathrm{sgn}(b_1(n))}{\sqrt{2\pi}\,\rho} \exp\!\left(-\frac{y_s^2(n)}{2\rho^2}\right), (3.12)

where the operations in (3.10) and (3.11) are taken element-wise over the center and width vectors, and
( ) ( ) ( ) ( )
=⋃ and =⋃ , (3.13)
( ) ( ) ( ) ( )
where = { ( )| ( ) = } and = { ( )| ( ) = }. It can be easily seen that
( )
={ ( )=
( )
+ ( )} ∈ ( ) and −
( )
≡ ( ) = (− ( ) ) − ( ) = ( )
+
( ) ( ) ( ) ( )
( ) = ∈ . Therefore, the two spaces and are distributed symmetrically,
( ) ( ) ( ) ( ) ( ) ( )
namely, for any subspace ∈ the subspace ∈ satisfies =− .
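This symmetry can be verified numerically on a toy noise-free BPSK array model of the form x̄(n) = Σ_i A_i b_i(n) a_i; the amplitudes and steering vectors below are arbitrary illustrative values, not parameters from this section.

```python
import itertools

A = [1.0, 0.7]                                # illustrative user amplitudes
steer = [[1 + 0j, 0.6 - 0.8j, -0.2 + 0.9j],   # illustrative steering vectors
         [1 + 0j, 0.9 + 0.3j, 0.5 - 0.5j]]    # (2 users, 3 array elements)

def state(bits):
    """Noise-free array output for one BPSK bit vector (b1, b2)."""
    return tuple(sum(A[i] * bits[i] * steer[i][l] for i in range(2))
                 for l in range(3))

plus = {state(b) for b in itertools.product((1, -1), repeat=2) if b[0] == +1}
minus = {state(b) for b in itertools.product((1, -1), repeat=2) if b[0] == -1}

# chi^(-) = -chi^(+): every state with b1 = -1 is the negation of a b1 = +1 state.
symmetric = all(tuple(-v for v in s) in plus for s in minus)
```

The check enumerates all bit vectors, so it scales as 2^M; for a real array one would verify the property algebraically, as in the text.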
The basic idea of SCFNN learning is to accommodate all array input vectors x(n) ∈ χ by adjusting the geometric clusters corresponding to G_k(n), k = 1, …, K. Since the signal spaces χ^{(+)} and χ^{(−)} are distributed symmetrically in the multi-antenna beamforming system, we propose to create symmetric geometric clusters to accommodate all x(n) ∈ χ. Thus, the output of the proposed S-SCFNN beamformer is defined by

y_s(x(n)) = \sum_{k=1}^{K} w_k \left( G_k^{(+)}(n) - G_k^{(-)}(n) \right), (3.14)

with G_k^{(+)}(n) and G_k^{(-)}(n) denoting the GMFs of the k-th fuzzy rule centered at c_k and −c_k, respectively, and
x̃(n) = x(n) if b_1(n) = +1, and x̃(n) = −x(n) if b_1(n) = −1. (3.16)
In the self-constructing learning phase, the maximum membership degree is again adopted. Because the k-th fuzzy rule of the S-SCFNN detector is strongly related to the geometric clusters corresponding to both c_k and −c_k, the output value of {G_k^{(+)}(n) − G_k^{(-)}(n)} is regarded as the membership degree with which the current array input vector belongs to the k-th fuzzy rule. Thus the maximum membership degree is defined as

G_max = \mathrm{Max}_{1 \le k \le K} \left\{ G(c_k, \sigma_k; x(n)) - G(-c_k, \sigma_k; x(n)) \right\}. (3.17)

Consequently, the growth criterion that must be met before a new fuzzy rule is added is G_max ≤ G_min, where G_min is as defined in (3.5). This criterion implies that the existing fuzzy rules cannot simultaneously satisfy the following two conditions:
(a) At least one of the geometric clusters of the k-th rule is close to the geometric location of x(n).
(b) The geometric cluster at −c_k should be relatively far from the geometric location of x(n) compared to that at c_k.
Hence, a new fuzzy rule should be generated to accommodate x(n), i.e., K ← K + 1. Then, its initial symmetric clusters are assigned by (3.18), and the C-MBER update amounts for the q-th rule become

\Delta w_q = \mu_w f(n) \left( G_q^{(+)}(n) - G_q^{(-)}(n) \right) (3.19)

\Delta c_q = \mu_c f(n) w_q \left( G_q^{(+)}(n) \frac{x(n) - c_q}{\sigma_q^2} + G_q^{(-)}(n) \frac{x(n) + c_q}{\sigma_q^2} \right) (3.20)

\Delta \sigma_q = \mu_\sigma f(n) w_q \left( G_q^{(+)}(n) \frac{(x(n) - c_q)^2}{\sigma_q^3} - G_q^{(-)}(n) \frac{(x(n) + c_q)^2}{\sigma_q^3} \right), (3.21)

where the operations in (3.20) and (3.21) are taken element-wise.
For the RS-SCFNN beamformer, the growth criterion additionally restricts the number of rules:

G_max ≤ G_min and K ≤ N − 1, (3.23)

where G_max and G_min are as defined in the SCFNN or S-SCFNN learning, and the integer N is pre-assigned to be no larger than 2^{M−1}, the number of symmetric channel states. With the aid of the latter criterion in (3.23), the number of fuzzy rules generated during learning can be greatly reduced at low SNRs compared with that of the S-SCFNN beamformer. The initial cluster setting of a new fuzzy rule and the parameter learning phase are the same as described in (3.18)-(3.22).
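The growth test of this section — a new rule is added only when the maximum membership degree falls below the threshold (G_max ≤ G_min) and fewer than N rules exist — can be sketched as follows; the Gaussian membership uses a single real width per rule, and the threshold, cap and initial width are illustrative values, not those of the chapter.

```python
import math

def gmf(x, center, width):
    """Gaussian membership of input x for a rule with a single real width."""
    return math.exp(-sum((xi - ci) ** 2 for xi, ci in zip(x, center))
                    / (2 * width ** 2))

def maybe_grow(rules, x, g_min=0.05, cap=4, init_width=0.5):
    """Self-constructing step: grow a rule centred at x(n) only when no
    existing rule covers x(n) (G_max <= G_min) and fewer than cap rules exist."""
    g_max = max((gmf(x, c, w) for c, w in rules), default=0.0)
    if g_max <= g_min and len(rules) < cap:
        rules.append((list(x), init_width))   # centre initialised at x(n)
        return True
    return False

rules = []
grew_a = maybe_grow(rules, (1.0, 1.0))    # empty rule base: always grows
grew_b = maybe_grow(rules, (1.1, 0.9))    # covered by the first rule: no growth
grew_c = maybe_grow(rules, (-1.0, -1.0))  # far from all centres: grows
```

Because a rule is only created where no existing rule has appreciable membership, each new centre necessarily starts far from the existing ones, which is the initialization advantage discussed below.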
The center vectors of the adaptive SRBF beamformer in the literature have been trained by several algorithms, such as classical k-means clustering, enhanced k-means clustering [48] and the MBER method [47]. However, seriously slow convergence easily occurs in these classical clustering algorithms due to improper initialization. Moreover, if the number of center vectors is very large, the classical algorithms need an even longer training sequence to achieve convergence. Compared to the adaptive SRBF beamformer, the proposed RS-SCFNN beamformer converges faster. Because of the criterion G_max ≤ G_min used during the self-constructing learning phase, the initial location of a new center vector, c_K = x(n), is guaranteed to be far from the locations of the other existing center vectors. This feature avoids two or more center vectors being initially located around the same ideal center.
Although in this section we design the adaptive RS-SCFNN beamformer for the BPSK beamforming system, its extension to higher-order quadrature amplitude modulation (QAM) schemes is also achievable. For example, for 4-QAM modulation, the array input signal space χ can be partitioned into the four subsets χ^{(±1±j)} depending on the value of b_1(n), as in the above derivation. Besides, for the four subspaces χ^{(±1±j)}, the following symmetric relationships can be verified by the same derivation as in the S-SCFNN learning: χ^{(−1+j)} = +j·χ^{(+1+j)}, χ^{(−1−j)} = −1·χ^{(+1+j)} and χ^{(+1−j)} = −j·χ^{(+1+j)}. Then, the idea of creating symmetric geometric clusters to accommodate all x(n) ∈ χ can be exploited to modify the 4-QAM FNN detector in [41], [44]. The derivation of the higher-order QAM RS-SCFNN beamformer is much more complex and is beyond the scope of this work.
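The three 4-QAM rotation relationships can likewise be checked numerically on a toy noise-free model; the amplitudes, steering vectors and the unnormalised symbol alphabet below are illustrative assumptions, not values from this section.

```python
A = [1.0, 0.7]                       # illustrative amplitudes
steer = [[1 + 0j, 0.6 - 0.8j],       # illustrative steering vectors
         [1 + 0j, 0.9 + 0.3j]]       # (2 users, 2 array elements)
alphabet = [1 + 1j, -1 + 1j, -1 - 1j, 1 - 1j]   # unnormalised 4-QAM symbols

def state(syms):
    """Noise-free array output for one pair of transmitted symbols."""
    return tuple(sum(A[i] * syms[i] * steer[i][l] for i in range(2))
                 for l in range(2))

def canon(s):
    """Round components so that sets of float states compare reliably."""
    return tuple((round(v.real, 9), round(v.imag, 9)) for v in s)

def subset(b1):
    """Noise-free states whose desired-user symbol equals b1."""
    return {canon(state((b1, b2))) for b2 in alphabet}

base = [state((1 + 1j, b2)) for b2 in alphabet]
rotations_ok = all(
    subset(b1) == {canon(tuple(factor * v for v in s)) for s in base}
    for b1, factor in [(-1 + 1j, 1j), (-1 - 1j, -1), (1 - 1j, -1j)]
)
```

The check works because multiplying every transmitted symbol by a fourth root of unity rotates the noise-free array output by the same factor while permuting the 4-QAM alphabet.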
Table 3.2 MBER learning rates for different nonlinear adaptive beamformers used in
Section 3.3
Due to the fact that N ≤ 2^{M−1} = 32, we choose N = 23 and N = 28 for the RS-SCFNN beamformers in the simulations. The chosen training data size for all adaptive beamformers is 400 in the simulated System A. Figure 3.1 depicts the BER performance of the adaptive SCFNN-related beamformers, and Figure 3.2 shows their average numbers of fuzzy rules. Since the adaptive S-SCFNN beamformer only observes half the array input signal space during training, S-SCFNN generates about half as many fuzzy rules as SCFNN. Namely, the S-SCFNN beamformer only needs to train half as many parameters (w_k, c_k and σ_k) as the SCFNN one. As a result, the parameters of the S-SCFNN beamformer converge quickly, and thus the SCFNN exploiting symmetry has better BER performance than the standard SCFNN. When the SNR is larger than 2 dB, the average numbers of fuzzy rules for the S-SCFNN and RS-SCFNN beamformers are almost the same; thus they also have similar BER performance at SNR = 4-8 dB. However, the number of fuzzy rules for the S-SCFNN beamformer increases sharply at low SNRs. With a tiny sacrifice in BER performance, the number of fuzzy rules for RS-SCFNN can be effectively limited to at most N, as mentioned in Section 3.2. In order to compare the C-MBER and MBER methods, the RS-SCFNN-M beamformer is also plotted. The results indicate that the RS-SCFNN with C-MBER performs similarly to the RS-SCFNN with MBER (RS-SCFNN-M). Note that the C-MBER method only needs to train the parameters associated with one fuzzy rule, as mentioned in Section 3.2, whereas the standard MBER method [47] has to train the parameters associated with all fuzzy rules.
Fig. 3.1 BER performance for adaptive SCFNN related beamformers (log10(BER) versus SNR in dB for SCFNN, S-SCFNN, RS-SCFNN and RS-SCFNN-M with N = 23 and N = 28)
[Numbers of rules versus SNR (dB) for SCFNN, S-SCFNN, RS-SCFNN and RS-SCFNN-M with N = 23 and N = 28]
Fig. 3.2 Numbers of hidden nodes for adaptive SCFNN related beamformers
Fig. 3.3 BER performance for various adaptive beamformers (log10(BER) versus SNR in dB for MMSE, MBER, SRBF, RS-SCFNN with N = 23 and N = 28, and the optimal Bayesian solution)
[Numbers of rules versus SNR (dB) for SRBF, RS-SCFNN with N = 23 and N = 28, and the optimal Bayesian solution]
Fig. 3.4 Numbers of hidden nodes for adaptive SRBF and RS-SCFNN beamformers
Fig. 3.5 BER convergence for different beamformers (log10(BER) versus training data size for MMSE, MBER, SRBF, RS-SCFNN with N = 23, SCFNN and the optimal Bayesian solution)
The number of rules for the SRBF beamformer is fixed before training and specified as 2^{M−1} = 32, as done in [47]. The BER performance of the various adaptive beamformers is illustrated in Figure 3.3, and their average numbers of rules are plotted in Fig. 3.4. In such a rank-deficient system with a small amount of training data, the proposed RS-SCFNN beamformers provide excellent BER performance compared to the classical linear and nonlinear ones. As shown in Figure 3.4, the number of rules for the adaptive RS-SCFNN beamformer can be determined flexibly at different SNRs, whereas that of the adaptive SRBF one must be fixed to a constant at every SNR. Of course, the SRBF beamformer could also assign different numbers of hidden nodes for various SNRs, but doing so requires considerable manual effort. The relatively large number of rules and bad initial parameters of the SRBF easily lead to slow convergence or poor BER results. In contrast, the proposed RS-SCFNN beamformer can set good initial parameters, as specified in Section 3.2, with few fuzzy rules, so the RS-SCFNN achieves BER results close to the Bayesian solutions. The BER convergence rates of the different beamformers are demonstrated in Figure 3.5. We can see that the proposed RS-SCFNN beamformer obtains satisfactory performance close to the Bayesian solutions once the training data size reaches 300.
4. Conclusion
This chapter has provided different versions of SCFNN-based detectors for various system models to improve the performance of classical nonlinear detectors. To improve on the classical RBF, FNN and SCRFNN equalizers in both time-invariant and time-varying channels, a novel FSCFNN DFE was demonstrated in Section 2. Specifically, the FSCFNN DFE is composed of several FSCFNN detectors, each of which corresponds to one feedback input vector. The fast learning processes, i.e., self-constructing and parameter learning, are adopted in the FSCFNN DFE to make it suitable for time-varying environments. The fast learning algorithm of the FSCFNN DFE sets conditions on the growth of fuzzy rules during the self-constructing phase, and the FSCFNN DFE activates only one FSCFNN detector at each time instant. Therefore, the computational complexity of the FSCFNN DFE is less than that of traditional equalizers. For multi-antenna systems, adaptive beamformers based on SCFNN detectors were presented in Section 3. By adopting the symmetric property of the array input signal space, the RS-SCFNN learning algorithm can be simpler than the standard SCFNN one. From the simulations, we can see that the SCFNN-based adaptive beamformers can flexibly and automatically determine their structure size for various SNRs. Therefore, we conclude that the adaptive RS-SCFNN beamformer is potentially a better scheme than the SRBF and SCFNN ones. Because the competitors of SCFNN-based detectors, such as RBF and FNN, have been successfully applied in the literature (IEEE/IET journals, etc.) to space-time equalization, turbo equalization, DOA estimation, high-level QAM systems, OOK optical communications and CDMA or OFDM communication systems, future work for SCFNN detectors could be the extension to the above-mentioned systems, or the improvement of SCFNN-based detectors with respect to the determination of the threshold min and methods for reducing complexity.
5. References
[1] S. Haykin, Adaptive filter theory (4th Edition), Prentice Hall, 2002.
[2] T.S. Rappaport, Wireless communications: principles and practice (2nd Edition), Prentice
Hall, 2002.
[3] J. Litva, T.K.Y. Lo, Digital beamforming in wireless communications, Artech House,
1996.
[4] Y. Gong, X. Hong, “OFDM joint data detection and phase noise cancellation based on
minimum mean square prediction error,” Signal Process., vol. 89, pp. 502-509, 2009.
[5] M.Y. Alias, A.K. Samingan, S. Chen, L. Hanzo, “Multiple antenna aided OFDM
employing minimum bit error rate multiuser detection,” Electron. Lett., vol. 39, no.
24, pp. 1769-1770, 2003.
[6] S. Chen, N.N. Ahmad, L. Hanzo, “Adaptive minimum bit error rate beamforming,” IEEE
Trans. Wirel. Commun., vol. 4, no.2, pp. 341-348, 2005.
[7] J. Li, G. Wei, F. Chen, “On minimum-BER linear multiuser detection for DS-CDMA
channels,” IEEE Trans. Signal Process., vol. 55, no.3, pp. 1093-1103, 2007.
[8] T.A. Samir, S. Elnoubi, A. Elnashar, “Block-Shannon minimum bit error rate
beamforming,” IEEE Trans. Veh. Technol., vol. 57, no.5, pp. 2981-2990, 2008.
[9] W. Yao, S. Chen, S. Tan, L. Hanzo, “Minimum bit error rate multiuser Transmission
designs using particle swarm optimisation,” IEEE Trans. Wirel. Commun., vol. 8,
no.10, pp. 5012-5017, 2009.
[10] S. Gollamudi, S. Nagaraj, S. Kapoor, Y.F. Huang, “Set-membership filtering and a set-
membership normalized LMS algorithm with an adaptive step size,” IEEE Signal
Process. Lett., vol. 5, no. 5, pp. 111-114, 1998.
[11] Y.C. Liang, F.P.C. Chin, “Coherent LMS algorithms,” IEEE Commun. Lett., vol. 4, no. 3,
pp. 92-94, 2000.
[12] S. Choi, T.W. Lee, D. Hong, “Adaptive error-constrained method for LMS algorithms
and applications,” Signal Process., vol. 85, pp. 1875-1897, 2005.
[13] E.F. Harrington, “A BPSK decision-feedback equalization method robust to phase and
timing errors,” IEEE Signal Process. Lett., vol. 12, pp. 313-316, 2005.
[14] S. Chen, B. Mulgrew, P.M. Grant, “A clustering technique for digital communications
channel equalization using radial basis function networks,” IEEE Trans. Neural
Netw., vol. 4, no. 4, pp. 570-579, 1993.
[15] J. Montalvão, B. Dorizzi, J. Cesar M. Mota, “Why use Bayesian equalization based on
finite data blocks,” Signal Process., vol. 81, pp. 137-147, 2001.
[16] S. Chen, B. Mulgrew, S. McLaughlin, “Adaptive Bayesian equalizer with decision
feedback,” IEEE Trans. Signal Process., vol. 41, no. 9, pp. 2918-2927, 1993.
[17] S. Chen, S. McLaughlin, B. Mulgrew, P.M. Grant, “Bayesian decision feedback equalizer
for overcoming co-channel interference,” IEE Proc.-Commun., vol. 143, no. 4, pp.
219-225, 1996.
[18] S. Chen, L. Hanzo, A. Wolfgang, “Kernel-based nonlinear beamforming construction
using orthogonal forward selection with the fisher ratio class separability
measure,” IEEE Signal Process. Lett., vol. 11, no. 5, pp. 478-481, 2004.
[19] S. Chen, L. Hanzo, A. Wolfgang, “Nonlinear multiantenna detection methods,”
EURASIP J Appl. Signal Process., vol. 9, pp. 1225-1237, 2004.
[20] J. Lee, C. Beach, N. Tepedelenlioglu, “A practical radial basis function equalizer,” IEEE
Trans. Neural Netw., vol. 10, pp. 450-455, 1999.
[21] P.C. Kumar, P. Saratchandran, N. Sundararajan, “Minimal radial basis function neural
networks for nonlinear channel equalization,” IEE Proc.-Vis. Image Signal Process.,
vol. 147, no. 5, pp. 428-435, 2000.
[22] M.S. Yee, B.L. Yeap, L. Hanzo, “Radial basis function-assisted turbo equalization,” IEEE
Trans. Commun., vol. 51, pp. 664-675, 2003.
[23] A. Wolfgang, S. Chen, L. Hanzo, “Radial basis function network assisted space-time
equalisation for dispersive fading environments,” Electron. Lett., vol. 40, no. 16, pp.
1006-1007, 2004.
[24] J.B. MacQueen, “Some methods for classification and analysis of multivariate
observations,” Proceedings of the 5th Berkeley Symposium on Mathematics
Statistics and Probability, Berkeley, U.S.A., pp. 281-297, 1967.
[25] R. Assaf, S.E. Assad, Y. Harkouss, “Adaptive equalization for digital channels RBF
neural network,” Proceedings of the European Conference on Wireless Technology,
Paris, France, pp. 347-350, 2005.
[26] R. Assaf, S.E. Assad, Y. Harkouss, “Adaptive equalization of nonlinear time varying-
channel using radial basis network,” Proceedings of the 2006 International
Conference on Information and Communication Technologies, Damascus, Syria,
pp. 1866-1871, 2006.
[27] L. Xu, A. Krzyzak, E. Oja, “Rival penalized competitive learning for clustering analysis,
RBF net, and curve detection,” IEEE Trans. Neural Netw., vol. 4, no. 4, pp. 636-649,
1993.
[28] Y.M. Cheung, “On rival penalization controlled competitive learning for clustering with
automatic cluster number selection,” IEEE Trans. Knowl. Data Eng., vol. 17, no. 11,
pp. 1583-1588, 2005.
[29] J. Ma, T. Wang, “A cost-function approach to rival penalized competitive learning
(RPCL),” IEEE Trans. Syst. Man Cybern. Part B-Cybern., vol. 36, no. 4, pp. 722-737,
2006.
[30] S. Chen, T. Mei, M. Luo, H. Liang, “Study on a new RPCCL clustering algorithm,”
Proceedings of the 2007 IEEE International Conference on Mechatronics and
Automation, Harbin, China, pp. 299-303, 2007.
[31] X. Qiao, G. Ji, H. Zheng, “An improved rival penalized competitive learning algorithm
based on fractal dimension of algae image,” Proceedings of the 2008 IEEE Control
and Decision Conference, Yantai, China, pp. 199-202, 2008.
[32] S. Siu, G.J. Gibson, C.F.N. Cowan, “Decision feedback equalization using neural
network structures and performance comparison with standard architecture,” IEE
Proc.-Commun., vol. 137, pp. 221-225, 1990.
[33] J. Coloma, R.A. Carrasco, “MLP equaliser for frequency selective time-varying
channels,” Electron. Lett., vol. 30, pp. 503-504, 1994.
[34] C.H. Chang, S. Siu, C.H. Wei, “Decision feedback equalization using complex
backpropagation algorithm,” Proceedings of 1997 IEEE International Symposium
on Circuits and Systems, Hong Kong, China, pp. 589-592, 1997.
[35] S.S. Yang, C.L. Ho, C.M. Lee, “HBP: improvement in BP algorithm for an adaptive MLP
decision feedback equalizer,” IEEE Trans. Circuits Syst. II-Express Briefs, vol. 53,
no. 3, pp. 240-244, 2006.
[36] S. Siu, S.S. Yang, C.M. Lee, C.L. Ho, “Improving the Back-propagation algorithm using
evolutionary strategy,” IEEE Trans. Circuits Syst. II-Express Briefs, vol. 54, no. 2,
pp. 171-175, 2007.
[37] K. Mahmood, A. Zidouri, A. Zerquine, “Performance analysis of a RLS-based MLP-DFE
in time-invariant and time-varying channels,” Digit. Signal Prog., vol. 18, no. 3, pp.
307-320, 2008.
[38] S.S. Yang, S. Siu, C.L. Ho, “Analysis of the initial values in split-complex
backpropagation algorithm,” IEEE Trans. Neural Netw., vol. 19, pp. 1564-1573,
2008.
[39] J.S.R. Jang, C.T. Sun, E. Mizutani, Neuro-fuzzy and soft computing - a computational
approach to learning and machine intelligence, Prentice Hall, 1997.
[40] C.H. Lee, Y.C. Lin, “An adaptive neuro-fuzzy filter design via periodic fuzzy neural
network,” Signal Process., vol. 85, pp. 401-411, 2005.
[41] S. Siu, C.L. Ho, C.M. Lee, “TSK-based decision feedback equalizer using an
evolutionary algorithm applied to QAM communication systems,” IEEE Trans.
Circuits Syst. II-Express Briefs,vol. 52, pp. 596-600, 2005.
[42] F.J. Lin, C.H. Lin, P.H. Shen, “Self-constructing fuzzy neural network speed controller
for permanent-magnet synchronous motor drive,” IEEE Trans. Fuzzy Syst., vol. 9,
no. 5, pp. 751-759, 2001.
[43] W.D. Weng, R.C. Lin, C.T. Hsueh, “The design of an SCFNN based nonlinear channel
equalizer,” J. Inf. Sci. Eng., vol. 21, pp. 695-709, 2005.
[44] R.C. Lin, W.D. Weng, C.T. Hsueh, “Design of an SCRFNN-based nonlinear channel
equaliser,” IEE Proc.-Commun., vol. 152, no. 6, pp. 771-779, 2005.
[45] A. Bateman, Digital communications: design for the real world, Addison Wesley,
1999.
[46] M. Martínez-Ramón, J.L. Rojo-Álvarez, G. Camps-Valls, C.G. Christodoulou, “Kernel
antenna array processing,” IEEE Trans. Antennas Propag., vol. 55, no. 3, pp. 642-
650, 2007.
[47] S. Chen, A. Wolfgang, C.J. Harris, L. Hanzo, “Adaptive nonlinear least bit error-rate
detection for symmetrical RBF beamforming,” Neural Netw., vol. 21, pp. 358-367,
2008.
[48] S. Chen, A. Wolfgang, C.J. Harris, L. Hanzo, “Symmetric complex-valued RBF receiver
for multiple-antenna-aided wireless systems,” IEEE Trans. Neural Netw., vol. 19,
no. 9, pp. 1657-1663, 2008.
[49] Q. Liang, J.M. Mendel, “Overcoming time-varying co-channel interference using type-2
fuzzy adaptive filter,” IEEE Trans. Circuits Syst. II-Express Briefs, vol. 47, pp. 1419-
1428, 2000.
[50] Q. Liang, J.M. Mendel, “Equalization of nonlinear time-varying channels using type-2
fuzzy adaptive filters,” IEEE Trans. Fuzzy Syst., vol. 8, no. 5, pp. 551-563, 2000.
[51] Y.J. Chang, C.L. Ho, “Improving the BP algorithm using RPCL for FNN-based adaptive
equalizers,” Proceedings of 2008 National Symposium on Telecommunications,
Yunlin, Taiwan, pp. 1385-1388, 2008.
[52] Y.J. Chang, C.L. Ho, “SOFNN-based equalization using rival penalized controlled
competitive learning for time-varying environments,” Proceedings of 2009
1. Introduction
The stereo acoustic echo canceller is becoming more and more important as echo cancellers are applied to consumer products such as a conversational DTV. However, it is well known that if there is strong cross-channel correlation between the right and left sounds, a stereo echo canceller cannot converge well, which results in echo path estimation misalignment. This is a serious problem in a conversational DTV because the speaker output sound is a combination of the far-end conversational sound, which is essentially monaural, and the TV program sound, which has a wide variety of characteristics: monaural sound, stereo sound or a mixture of the two.
To cope with this problem, many stereo echo cancellation algorithms have been proposed. The methods can be categorized into two approaches. The first one is to de-correlate the stereo sound by introducing independent noise or non-linear post-processing into the right and left speaker outputs. This approach is very effective in the single-source stereo case, which covers most conversational sounds, because the de-correlation prevents the rank drop encountered when solving the normal equation in a multi-channel adaptive filtering algorithm. Moreover, it is simple, since many traditional adaptation algorithms can be used without any modification. Although the approach has many advantages and is therefore widely accepted, it still has an essential problem: the de-correlation changes the sound quality, owing to the insertion of artificial distortion. Even if the inserted distortion is minimized so as to prevent audible degradation, from the viewpoint of entertainment audio equipment such as a conversational DTV, many users do not accept any distortion of the speaker output sound. The second approach is desirable for the entertainment types of equipment because no modification of the speaker outputs is required. In this approach, the algorithms utilize changes in the cross-channel correlation of the stereo sound. This approach is further divided into two, depending on how the cross-channel correlation change is utilized. One widely used approach is the affine projection method. If there are small variations in the cross-channel correlation, even in a single-source stereo sound, small un-correlated components appear in each channel. The affine projection method can produce the best update direction by excluding the bad effect of the auto-correlation in each channel and by utilizing the small un-correlated components. This approach has a great advantage, since it does not require any modification of the stereo sound; however, if the variation in the cross-channel correlation is very small, the improvement of the adaptive filter convergence is also very small. Since the rank drop problem of the stereo adaptive filter is essentially not solved, slightly inserted distortion may still be needed, which reduces the merit of this method. Another drawback is that the method requires a P-by-P inverse matrix calculation at each sample. The inverse matrix operation can be relaxed by choosing P as a small number; however, a small P sometimes cannot attain a sufficient improvement in convergence speed. To attain better performance even with a small P, the affine projection method is sometimes realized together with a sub-band method.
Another method categorized in the second approach is the “WARP” method. Unlike the affine projection method, which utilizes small changes in the cross-channel correlation, this method utilizes large changes in the cross-channel correlation. The approach is based on the nature of usual conversations: even when stereo sound is used for conversations, most parts of a conversation are single-talk monaural sound. The cross-channel correlation is usually very high, and it remains almost stable during a single talk; a large change happens when the talker changes or the talker's face moves. Therefore, the method applies a monaural adaptive filter to single-sound-source stereo sound and a multi-channel (stereo) adaptive filter to non-single-sound-source stereo sound. An important feature of the method is that the two monaural adaptive filter estimation results and the one stereo adaptive filter estimation result can be transformed into each other by using projection matrices, called WARP matrices. Since a monaural adaptive filter is applied whenever the sound is single-source stereo sound, the method does not suffer from the rank problem.
In this chapter, stereo acoustic echo canceller methods, namely the multi-channel least mean square, affine projection and WARP methods, none of which require any modification of the speaker output sounds, are surveyed targeting conversational DTV applications. Then the WARP method is explained in detail.
[Block diagram: broadcasting audio (news, music, variety, etc.) is decoded by the DTV audio receiver, mixed with the conversational speech, and passed through a sampling frequency converter to the speaker system]
1. Mixing of broadcasting sound and communication speech: The two stereo sounds from the DTV audio receiver and the local conversational speech decoder are mixed and sent to the stereo speaker system.
2. Sampling frequency conversion: The sampling frequency of DTV sound is usually higher than that of the conversational service, e.g. f_SH = 48 kHz for DTV sound and f_S = 16 kHz for the conversational service sound, so we need sampling frequency conversion between the DTV and conversational service audio parts.
3. Stereo acoustic echo canceller: A stereo acoustic echo canceller is required to prevent howling and speech quality degradation due to acoustic coupling between the stereo speaker and the microphone.
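As a sketch of item 2, converting f_SH = 48 kHz down to f_S = 16 kHz amounts to low-pass filtering below 8 kHz and keeping every third sample; the windowed-sinc design and tap count below are generic illustrative choices, not the converter of an actual DTV.

```python
import math

def lowpass_fir(num_taps, cutoff):
    """Hamming-windowed-sinc low-pass FIR; cutoff is normalised to the input
    sampling rate (0..0.5)."""
    m = num_taps - 1
    taps = []
    for n in range(num_taps):
        x = n - m / 2
        h = 2 * cutoff if x == 0 else math.sin(2 * math.pi * cutoff * x) / (math.pi * x)
        w = 0.54 - 0.46 * math.cos(2 * math.pi * n / m)
        taps.append(h * w)
    s = sum(taps)
    return [t / s for t in taps]          # normalise to unity DC gain

def downsample_48k_to_16k(signal, num_taps=63):
    """Anti-alias filter at 8 kHz (1/6 of 48 kHz), then decimate by 3."""
    taps = lowpass_fir(num_taps, cutoff=1.0 / 6)
    filtered = [sum(taps[j] * signal[i - j]
                    for j in range(num_taps) if 0 <= i - j < len(signal))
                for i in range(len(signal))]
    return filtered[::3]                  # keep every third sample -> 16 kHz

out = downsample_48k_to_16k([1.0] * 300)
```

A production converter would use a polyphase structure so that only every third output sample is actually computed.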
Among the above special functionalities, the echo canceller for the conversational DTV is technically very challenging, because it should cancel a wide variety of stereo echoes from TV programs as well as from stereo speech communications.
X_Si(k) = [x_Si(k), x_Si(k−1), …, x_Si(k−P+1)]
x_Si(k) = [x_Si(k), x_Si(k−1), …, x_Si(k−N+1)]^T
x_Ri(k) = [x_Ri(k), x_Ri(k−1), …, x_Ri(k−N+1)]^T
x_Li(k) = [x_Li(k), x_Li(k−1), …, x_Li(k−N+1)]^T (1)
x′_URi(k) = [x′_URi(k), x′_URi(k−1), …, x′_URi(k−N+1)]^T
x′_ULi(k) = [x′_ULi(k), x′_ULi(k−1), …, x′_ULi(k−N+1)]^T
where P and N are the impulse response length of the FIR system and the tap length of the adaptive filter for each channel, respectively.
Then the FIR system output x_i(k) is a 2N-sample array and is expressed as

x_i(k) = [ x_Ri(k)^T  x_Li(k)^T ]^T = [ (X_Si(k) g_Ri(k) + x′_URi(k))^T  (X_Si(k) g_Li(k) + x′_ULi(k))^T ]^T, (2)

where g_Ri(k) and g_Li(k) are the P-sample impulse responses of the FIR system defined as
[Block diagram: a stereo sound generation model (LTI) drives both the echo path model h = [h_R^T  h_L^T]^T and the multi-channel adaptive filter ĥ_STi(k) = [ĥ_Ri^T(k)  ĥ_Li^T(k)]^T]
By decomposing the time-varying response g_Ri(k) into a time-invariant part ḡ_Ri and a small variation Δg_Ri(k) (and similarly for g_Li(k)), (2) is re-written as

x_Ri(k) = X_Si(k) ḡ_Ri + X_Si(k) Δg_Ri(k) + x′_URi(k)
x_Li(k) = X_Si(k) ḡ_Li + X_Si(k) Δg_Li(k) + x′_ULi(k). (5)
This situation is usual in the case of far-end single talking, because the transfer functions between the talker and the right and left microphones vary only slightly with the talker's small movements. By assuming these variation components are also un-correlated noise, (5) can be regarded as a linear time-invariant system with independent noise components x_URi(k) and x_ULi(k), as

x_Ri(k) = X_Si(k) ḡ_Ri + x_URi(k)
x_Li(k) = X_Si(k) ḡ_Li + x_ULi(k), (6)
A Stereo Acoustic Echo Canceller Using Cross-Channel Correlation 199

where
$$\begin{aligned}\mathbf x_{URi}(k)&=\mathbf X_{Si}(k)\Delta\mathbf g_{Ri}(k)+\mathbf x'_{URi}(k)\\ \mathbf x_{ULi}(k)&=\mathbf X_{Si}(k)\Delta\mathbf g_{Li}(k)+\mathbf x'_{ULi}(k).\end{aligned}\tag{7}$$
In (6), if there are no un-correlated noises, we call the situation strict single talking.
In this chapter, the sound source signal $x_{Si}(k)$ and the un-correlated noises $x'_{URi}(k)$ and $x'_{ULi}(k)$ are assumed to be independent white Gaussian noises with variances $\sigma_{xi}^2$ and $\sigma_{Ni}^2$, respectively.
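The stereo generation model of (6) can be sketched numerically as follows; the filter coefficients, noise level and helper names are illustrative assumptions. Each channel is the convolution of the common white source with its own response, plus a small independent noise.

```python
import random

def convolve(g, s, k):
    # sum_j g[j] * s[k-j], treating the signal as zero outside its range
    return sum(g[j] * (s[k - j] if 0 <= k - j < len(s) else 0.0)
               for j in range(len(g)))

random.seed(0)
s = [random.gauss(0, 1) for _ in range(64)]           # white source x_S(k)
g_R, g_L = [1.0, 0.5, 0.25], [0.8, -0.3, 0.1]         # illustrative g_Ri, g_Li
sigma_N = 0.01                                         # small un-correlated noise
x_R = [convolve(g_R, s, k) + random.gauss(0, sigma_N) for k in range(64)]
x_L = [convolve(g_L, s, k) + random.gauss(0, sigma_N) for k in range(64)]
```

With `sigma_N = 0`, the two channels become fully correlated, which is the strict single-talking case discussed below.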
$$e_i(k)=y_i(k)-\hat y_i(k)+n_i(k)\tag{8}$$
$$\hat{\mathbf h}_{STi}(k)=\begin{bmatrix}\hat{\mathbf h}_{Ri}^T(k)&\hat{\mathbf h}_{Li}^T(k)\end{bmatrix}^T.\tag{11}$$
The optimum echo path estimation $\hat{\mathbf h}_{OPT}$, which minimizes the error power $e^2(k)$, is given by solving the least-squares problem
$$\text{Minimize}\quad \sum_{k=0}^{N_{LS}-1}e_i^2(k)\tag{12}$$
where $N_{LS}$ is the number of samples used for optimization. Then the optimum echo path estimation for the $i$th LTI period, $\hat{\mathbf h}_{OPTi}$, is easily obtained by the well-known normal equation as
$$\hat{\mathbf h}_{OPTi}=\mathbf X_{NLSi}^{-1}\left(\sum_{k=0}^{N_{LS}-1}y_i(k)\,\mathbf x_i(k)\right)\tag{13}$$
where X NLSi is an auto-correlation matrix of the adaptive filter input signal and is given
by
$$\mathbf X_{NLSi}=\sum_{k=0}^{N_{LS}-1}\mathbf x_i(k)\mathbf x_i^T(k)=\begin{bmatrix}\mathbf A_i&\mathbf B_i\\ \mathbf C_i&\mathbf D_i\end{bmatrix}=\begin{bmatrix}\displaystyle\sum_{k=0}^{N_{LS}-1}\mathbf x_{Ri}(k)\mathbf x_{Ri}^T(k)&\displaystyle\sum_{k=0}^{N_{LS}-1}\mathbf x_{Ri}(k)\mathbf x_{Li}^T(k)\\[2ex] \displaystyle\sum_{k=0}^{N_{LS}-1}\mathbf x_{Li}(k)\mathbf x_{Ri}^T(k)&\displaystyle\sum_{k=0}^{N_{LS}-1}\mathbf x_{Li}(k)\mathbf x_{Li}^T(k)\end{bmatrix}.\tag{14}$$
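A minimal numerical sketch of the normal-equation solution (13)-(14) follows, assuming a 2-tap-per-channel stereo echo path and, for well-posedness, independent right and left inputs; the tiny Gaussian-elimination solver and all names are illustrative, not the chapter's code.

```python
import random

def solve(A, b):
    # Gauss-Jordan elimination with partial pivoting (small systems only).
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [a - f * v for a, v in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]

random.seed(1)
N = 2                                     # taps per channel -> 2N unknowns
h = [0.7, -0.2, 0.4, 0.1]                 # true stacked [h_R; h_L] (illustrative)
xr = [random.gauss(0, 1) for _ in range(200)]
xl = [random.gauss(0, 1) for _ in range(200)]  # independent channels: full rank
X = [[0.0] * (2 * N) for _ in range(2 * N)]    # accumulates sum x x^T, as in (14)
r = [0.0] * (2 * N)                            # accumulates sum y x, as in (13)
for k in range(N, 200):
    x = [xr[k], xr[k - 1], xl[k], xl[k - 1]]   # stereo input array x_i(k)
    y = sum(hj * xj for hj, xj in zip(h, x))   # echo sample
    for u in range(2 * N):
        r[u] += y * x[u]
        for v in range(2 * N):
            X[u][v] += x[u] * x[v]
h_hat = solve(X, r)                            # normal-equation estimate
```

Because the two channels are independent here, $\mathbf X_{NLSi}$ is full rank and the estimate matches the true path; the next sections show why this breaks down under strict single talking.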
In the case of the stereo generation model defined by (2), the sub-matrices in (14) are given by
$$\begin{aligned}
\mathbf A_i&=\sum_{k=0}^{N_{LS}-1}\left(\mathbf X_{Si}(k)\mathbf G_{RRi}\mathbf X_{Si}^T(k)+2\,\mathbf x_{URi}(k)(\mathbf X_{Si}(k)\mathbf g_{Ri})^T+\mathbf x_{URi}(k)\mathbf x_{URi}^T(k)\right)\\
\mathbf B_i&=\sum_{k=0}^{N_{LS}-1}\left(\mathbf X_{Si}(k)\mathbf G_{RLi}\mathbf X_{Si}^T(k)+(\mathbf X_{Si}(k)\mathbf g_{Ri})\mathbf x_{ULi}^T(k)+\mathbf x_{URi}(k)(\mathbf X_{Si}(k)\mathbf g_{Li})^T+\mathbf x_{URi}(k)\mathbf x_{ULi}^T(k)\right)\\
\mathbf C_i&=\sum_{k=0}^{N_{LS}-1}\left(\mathbf X_{Si}(k)\mathbf G_{LRi}\mathbf X_{Si}^T(k)+(\mathbf X_{Si}(k)\mathbf g_{Li})\mathbf x_{URi}^T(k)+\mathbf x_{ULi}(k)(\mathbf X_{Si}(k)\mathbf g_{Ri})^T+\mathbf x_{ULi}(k)\mathbf x_{URi}^T(k)\right)\\
\mathbf D_i&=\sum_{k=0}^{N_{LS}-1}\left(\mathbf X_{Si}(k)\mathbf G_{LLi}\mathbf X_{Si}^T(k)+2\,\mathbf x_{ULi}(k)(\mathbf X_{Si}(k)\mathbf g_{Li})^T+\mathbf x_{ULi}(k)\mathbf x_{ULi}^T(k)\right)
\end{aligned}\tag{16}$$
where $\mathbf G_{RRi}=\mathbf g_{Ri}\mathbf g_{Ri}^T$, $\mathbf G_{RLi}=\mathbf g_{Ri}\mathbf g_{Li}^T$, $\mathbf G_{LRi}=\mathbf g_{Li}\mathbf g_{Ri}^T$ and $\mathbf G_{LLi}=\mathbf g_{Li}\mathbf g_{Li}^T$.
In the case of strict single talking, where $\mathbf x_{URi}(k)$ and $\mathbf x_{ULi}(k)$ do not exist, (16) becomes very simple, as
$$\begin{aligned}\mathbf A_i&=\sum_{k=0}^{N_{LS}-1}\mathbf X_{Si}(k)\mathbf G_{RRi}\mathbf X_{Si}^T(k),&\mathbf B_i&=\sum_{k=0}^{N_{LS}-1}\mathbf X_{Si}(k)\mathbf G_{RLi}\mathbf X_{Si}^T(k),\\ \mathbf C_i&=\sum_{k=0}^{N_{LS}-1}\mathbf X_{Si}(k)\mathbf G_{LRi}\mathbf X_{Si}^T(k),&\mathbf D_i&=\sum_{k=0}^{N_{LS}-1}\mathbf X_{Si}(k)\mathbf G_{LLi}\mathbf X_{Si}^T(k).\end{aligned}\tag{18}$$
The Schur complement of the block matrix then vanishes:
$$\mathbf D_i-\mathbf C_i\mathbf A_i^{-1}\mathbf B_i=\sum_{k=0}^{N_{LS}-1}\mathbf X_{Si}(k)\left(\mathbf G_{LLi}-\mathbf G_{LRi}\mathbf G_{RRi}^{-1}\mathbf G_{RLi}\right)\mathbf X_{Si}^T(k)=\mathbf 0\tag{20}$$
since $\mathbf g_{Li}\mathbf g_{Li}^T-\mathbf g_{Li}\mathbf g_{Ri}^T(\mathbf g_{Ri}\mathbf g_{Ri}^T)^{-1}\mathbf g_{Ri}\mathbf g_{Li}^T=\mathbf 0$.
Hence, no unique solution can be found by solving the normal equation in the case of strict single talking, where un-correlated components do not exist. This is the well-known cross-channel correlation problem of stereo adaptive filters.
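The rank-drop (cross-channel correlation) problem can be checked numerically: when the left channel is an exact scaled copy of the right channel (strict single talking, with N = 1 tap per channel), the determinant of the 2x2 auto-correlation matrix vanishes. The scalar gain `g` is an illustrative assumption.

```python
import random

random.seed(2)
g = 0.5                                   # left channel is a scaled copy (illustrative)
xr = [random.gauss(0, 1) for _ in range(1000)]
xl = [g * v for v in xr]                  # strict single talking: fully correlated
a = sum(v * v for v in xr)                # sum of x_R^2
b = sum(u * v for u, v in zip(xr, xl))    # sum of x_R * x_L
d = sum(v * v for v in xl)                # sum of x_L^2
rel_det = abs(a * d - b * b) / (a * d)    # relative determinant of the 2x2 X_NLS
print(rel_det)                            # ~0: the normal equation has no unique solution
```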
where $\mathbf X_{M2Ni}(k)$ is an $M\times 2N$ matrix composed of the adaptive filter stereo input arrays, as defined by
where
$$f(\hat{\mathbf h}_{STi}(k))=\frac12\hat{\mathbf h}_{STi}^T(k)\,\mathbf Q_{2N2Ni}(k)\,\hat{\mathbf h}_{STi}(k)-\hat{\mathbf h}_{STi}^T(k)\,\mathbf Q_{2N2Ni}\,\mathbf h_{ST}.\tag{28}$$
For the quadratic function, gradient Δ i ( k ) is given by
$$\mathbf Q_{2N2Ni}(k)=\mathbf X_{M2Ni}^T(k)\,\mathbf X_{M2Ni}(k)\tag{32}$$
Assuming the initial tap coefficient array is a zero vector and the step gain is 0 during the 0th to $(2N-1)$th samples and 1 at the $2N$th sample, (34) can be re-written as
This iteration is performed only once, at the $2N$th sample. If $N_{LS}=2N$, the inverse matrix term in (35) is written as
$$\mathbf X_{2N2Ni}^T(k)\,\mathbf X_{2N2Ni}(k)=\sum_{k=0}^{N_{LS}-1}\mathbf x_i(k)\mathbf x_i^T(k)=\mathbf X_{NLSi}.\tag{37}$$
Comparing (13) and (35) with (37), it is found that the LS method is a special case of the gradient method when M equals 2N.
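For reference, a gradient-method iteration of the NLMS type (the M = 1 extreme of the family discussed here) can be sketched as follows; the 2-tap path, step gain and names are illustrative assumptions, not the chapter's exact algorithm.

```python
import random

def nlms_step(h_hat, x, y, mu=1.0, eps=1e-8):
    # One NLMS iteration: h <- h + mu * e * x / (x.x + eps)
    e = y - sum(hj * xj for hj, xj in zip(h_hat, x))
    nrm = sum(xj * xj for xj in x) + eps
    return [hj + mu * e * xj / nrm for hj, xj in zip(h_hat, x)], e

random.seed(3)
h = [0.6, -0.4]                       # true 2-tap echo path (illustrative)
h_hat = [0.0, 0.0]
xs = [random.gauss(0, 1) for _ in range(2000)]
for k in range(1, 2000):
    x = [xs[k], xs[k - 1]]
    y = h[0] * x[0] + h[1] * x[1]     # noiseless echo for clarity
    h_hat, e = nlms_step(h_hat, x, y)
```

With a white, single-channel input the iteration converges to the true path; the stereo case discussed below is harder because the two channels are correlated.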
$$\mathbf X_{P2Ni}(k)=\begin{bmatrix}\mathbf X_{Ri}^T(k)\\ \mathbf X_{Li}^T(k)\end{bmatrix}=\begin{bmatrix}\mathbf X_{2Si}(k)\mathbf G_{Ri}^T+\mathbf X_{URi}(k)\\ \mathbf X_{2Si}(k)\mathbf G_{Li}^T+\mathbf X_{ULi}(k)\end{bmatrix}\tag{38}$$
where $\mathbf G_{Ri}$ and $\mathbf G_{Li}$ are the source-to-microphone response $(2P-1)\times P$ matrices and are defined as
$$\mathbf Q_{2N2Ni}(k)=\begin{bmatrix}\mathbf Q_{ANNi}(k)&\mathbf Q_{BNNi}(k)\\ \mathbf Q_{CNNi}(k)&\mathbf Q_{DNNi}(k)\end{bmatrix}\tag{42}$$
where $\mathbf Q_{ANNi}(k)$ and $\mathbf Q_{DNNi}(k)$ are the right and left channel auto-correlation matrices, and $\mathbf Q_{BNNi}(k)$ and $\mathbf Q_{CNNi}(k)$ are the cross-channel correlation matrices. The expectation values of these sub-matrices are simplified, applying the statistical independence between the sound source signal and the noises and the Tlz function defined in the Appendix, as
$$\begin{aligned}
\mathbf Q_{ANNi}&=\mathrm{Tlz}\!\left(\mathbf X_{2Si}(k)\mathbf G_{Ri}^T\mathbf R_i^T(k)\mathbf R_i(k)\mathbf G_{Ri}\mathbf X_{2Si}^T(k)\right)+\mathrm{Tlz}\!\left(\mathbf X_{URi}(k)\mathbf R_i^T(k)\mathbf R_i(k)\mathbf X_{URi}^T(k)\right)\\
\mathbf Q_{BNNi}&=\mathrm{Tlz}\!\left(\mathbf X_{2Si}(k)\mathbf G_{Ri}^T\mathbf R_i^T(k)\mathbf R_i(k)\mathbf G_{Li}\mathbf X_{2Si}^T(k)\right)\\
\mathbf Q_{CNNi}&=\mathrm{Tlz}\!\left(\mathbf X_{2Si}(k)\mathbf G_{Li}^T\mathbf R_i^T(k)\mathbf R_i(k)\mathbf G_{Ri}\mathbf X_{2Si}^T(k)\right)\\
\mathbf Q_{DNNi}&=\mathrm{Tlz}\!\left(\mathbf X_{2Si}(k)\mathbf G_{Li}^T\mathbf R_i^T(k)\mathbf R_i(k)\mathbf G_{Li}\mathbf X_{2Si}^T(k)\right)+\mathrm{Tlz}\!\left(\mathbf X_{ULi}(k)\mathbf R_i^T(k)\mathbf R_i(k)\mathbf X_{ULi}^T(k)\right)
\end{aligned}\tag{44}$$
Applying matrix operations to $\mathbf Q_{2N2Ni}$, a new matrix $\mathbf Q'_{2N2Ni}$ which has the same determinant as $\mathbf Q_{2N2Ni}$ is given by
$$\mathbf Q'_{2N2Ni}(k)=\begin{bmatrix}\mathbf Q'_{ANNi}(k)&\mathbf 0\\ \mathbf 0&\mathbf Q_{DNNi}(k)\end{bmatrix}\tag{47}$$
where, since both $\mathbf X_{2Si}(k)\mathbf G_{Ri}^T$ and $\mathbf X_{2Si}(k)\mathbf G_{Li}^T$ are symmetric $P\times P$ square matrices, $\mathbf Q'_{ANNi}$ is re-written as
$$\mathbf Q'_{ANNi}=\mathbf X_{2Si}(k)\mathbf G_{Ri}^T\mathbf R_i^T(k)\mathbf R_i(k)\mathbf G_{Ri}\mathbf X_{2Si}^T(k)+\mathbf X_{2Si}(k)\mathbf G_{Li}^T\mathbf R_i^T(k)\mathbf R_i(k)\mathbf G_{Li}\mathbf X_{2Si}^T(k)+\mathbf X_{URi}(k)\mathbf R_i^T(k)\mathbf R_i(k)\mathbf X_{URi}^T(k)\tag{49}$$
As evident from (47), (48) and (49), $\mathbf Q'_{2N2Ni}(k)$ is composed of the major matrix $\mathbf Q'_{ANNi}(k)$ and the noise matrix $\mathbf Q_{DNNi}(k)$. In the case of single talking, where the sound source signal power $\sigma_X^2$ is much larger than the un-correlated signal power $\sigma_{Ni}^2$, the $\mathbf R_i^T(k)\mathbf R_i(k)$ which minimizes the eigenvalue spread in $\mathbf Q_{2N2Ni}(k)$, so as to attain the fastest convergence, is given by making $\mathbf Q'_{ANNi}$ an identity matrix, i.e. by setting $\mathbf R_i^T(k)\mathbf R_i(k)$ as
$$\mathbf R_i^T(k)\mathbf R_i(k)=\left(N\sigma_{Xi}^2\left(\mathbf G_{Ri}\mathbf G_{Ri}^T+\mathbf G_{Li}\mathbf G_{Li}^T\right)\right)^{-1}.\tag{50}$$
In other cases, such as double talking or no-talk situations, where we assume $\sigma_X^2$ is almost zero, the $\mathbf R_i^T(k)\mathbf R_i(k)$ which orthogonalizes $\mathbf Q'_{ANNi}$ is given by
$$\mathbf R_i^T(k)\mathbf R_i(k)=\left(N\sigma_{Ni}^2\,\mathbf I_P\right)^{-1}.\tag{51}$$
$$\mathbf R_i^T(k)\mathbf R_i(k)=\left(\mathbf X_{P2Ni}(k)\,\mathbf X_{P2Ni}^T(k)\right)^{-1}.\tag{52}$$
Since
$$\begin{aligned}
\mathbf X_{P2Ni}(k)\,\mathbf X_{P2Ni}^T(k)&\simeq\mathbf G_{Ri}\mathbf X_{2Si}^T(k)\mathbf X_{2Si}(k)\mathbf G_{Ri}^T+\mathbf G_{Li}\mathbf X_{2Si}^T(k)\mathbf X_{2Si}(k)\mathbf G_{Li}^T+\mathbf X_{URi}^T(k)\mathbf X_{URi}(k)+\mathbf X_{ULi}^T(k)\mathbf X_{ULi}(k)\\
&\simeq N\sigma_{Xi}^2\left(\mathbf G_{Ri}\mathbf G_{Ri}^T+\mathbf G_{Li}\mathbf G_{Li}^T\right)+2N\sigma_{Ni}^2\,\mathbf I_P,
\end{aligned}\tag{53}$$
where the cross terms are dropped using the independence of the source signal and the noises.
The method can be intuitively understood using the geometrical explanation in Fig. 3. As seen there, in the case of the traditional NLMS approach a new direction is created from the estimated coefficients in the $(i-1)$th plane by finding the nearest point on the $i$th plane. On the other hand, affine projection creates the best direction, which targets a location included in both the $(i-1)$th and $i$th planes.
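The affine projection step sketched geometrically above can be written for order M = 2 as follows; this is a generic affine-projection sketch under illustrative assumptions (2-tap path, white input, closed-form 2x2 inversion), not the chapter's exact algorithm.

```python
import random

def ap_step(h_hat, X, e, mu=1.0, eps=1e-6):
    """One affine projection step of order M = 2.

    X: the two most recent input vectors, e: the matching a-priori errors.
    Solves (X X^T + eps I) a = e, then updates h <- h + mu * X^T a.
    """
    g00 = sum(v * v for v in X[0]) + eps
    g11 = sum(v * v for v in X[1]) + eps
    g01 = sum(u * v for u, v in zip(X[0], X[1]))
    det = g00 * g11 - g01 * g01
    a0 = (g11 * e[0] - g01 * e[1]) / det
    a1 = (g00 * e[1] - g01 * e[0]) / det
    return [h + mu * (a0 * u + a1 * v) for h, u, v in zip(h_hat, X[0], X[1])]

random.seed(4)
h = [0.5, 0.3]                            # true 2-tap echo path (illustrative)
h_hat = [0.0, 0.0]
xs = [random.gauss(0, 1) for _ in range(1500)]
for k in range(2, 1500):
    X = [[xs[k], xs[k - 1]], [xs[k - 1], xs[k - 2]]]
    e = [sum((hj - hh) * xj for hj, hh, xj in zip(h, h_hat, row)) for row in X]
    h_hat = ap_step(h_hat, X, e)
```

Solving the small Gram system is exactly what makes the update direction satisfy both input planes at once.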
[Fig. 3. Geometrical comparison of the NLMS iteration and affine projection in the spaces spanned by $x_R(k), x_L(k)$ and $x_R(k-1), x_L(k-1)$, with the goal at their intersection.]
It is well known that the convergence speed of (57) depends on the smallest and largest eigenvalues of the matrix $\mathbf Q_{2N2Ni}$. In the case of the stereo generation model in Fig. 2, for single talking with small right and left noises, we obtain the following determinant of $\mathbf Q_{2N2Ni}$ for M=1 as
where $\lambda_{\min i}$ and $\lambda_{\max i}$ are the smallest and largest eigenvalues, respectively.
The eigenvalue spread of $\mathbf Q_{2N2Ni}(k)$ is given, assuming the un-correlated noise power $\sigma_{Ni}^2$ is very small ($\sigma_{Ni}^2\ll\lambda_{\min i}$), as
Hence, it is shown that the stereo NLMS echo canceller's convergence speed is largely affected by the ratio between the largest eigenvalue of $\mathbf g_{Ri}\mathbf g_{Ri}^T+\mathbf g_{Li}\mathbf g_{Li}^T$ and the un-correlated signal power $\sigma_{Ni}^2$. If the un-correlated sound power is very small in single talking, the stereo NLMS echo canceller's convergence speed becomes very slow.
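The dependence of convergence speed on eigenvalue spread can be illustrated with a 2x2 correlation matrix [[1, rho], [rho, 1]], whose eigenvalues are 1 + rho and 1 - rho: as the cross-channel correlation rho approaches 1 (i.e. the un-correlated power becomes small), the spread explodes. The values of rho are illustrative.

```python
# Eigenvalue spread of the 2x2 correlation matrix [[1, rho], [rho, 1]]:
# its eigenvalues are 1 + rho and 1 - rho, so the spread is (1 + rho)/(1 - rho).
spreads = {rho: (1 + rho) / (1 - rho) for rho in (0.0, 0.9, 0.999)}
print(spreads)   # the spread explodes as the channels become fully correlated
```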
$$\mathbf E_i(k)=\left[e_{Pi}(k),\,e_{Pi}(k-1),\,\cdots,\,e_{Pi}(k-p+1)\right]\tag{61}$$
$$\hat{\mathbf H}_{STi}(k+1)=\hat{\mathbf H}_{STi}(k)+\mathbf X_{P2Ni}^T(k)\left(\mathbf X_{P2Ni}(k)\,\mathbf X_{P2Ni}^T(k)\right)^{-1}\mathbf E_i(k)\tag{62}$$
where
$$\hat{\mathbf H}_{STi}(k)=\left[\hat{\mathbf h}_{STi}(k),\,\hat{\mathbf h}_{STi}(k-1),\,\cdots,\,\hat{\mathbf h}_{STi}(k-p+1)\right].\tag{63}$$
In the case of strict single talking, the following assumption is possible in the $i$th LTI period by (53):
$$\mathbf G_{RRLLi}=N\sigma_{Xi}^2\left(\mathbf G_{Ri}\mathbf G_{Ri}^T+\mathbf G_{Li}\mathbf G_{Li}^T\right)\tag{65}$$
$$\hat{\mathbf H}_{STi}(k+1)\,\mathbf G_{RRLLi}=\hat{\mathbf H}_{STi}(k)\,\mathbf G_{RRLLi}+\mathbf X_{P2Ni}(k)\,\mathbf E_i(k)\tag{66}$$
Re-defining the echo path estimation matrix $\hat{\mathbf H}_{STi}(k)$ by a new matrix $\hat{\mathbf H}'_{STi}(k)$, which is defined by
$$\hat{\mathbf H}'_{STi}(k)=\hat{\mathbf H}_{STi}(k)\,\mathbf G_{RRLLi},\tag{67}$$
(66) is re-written as
$$\hat{\mathbf H}'_{STi}(k+1)=\hat{\mathbf H}'_{STi}(k)+\mathbf X_{P2Ni}(k)\,\mathbf E_i(k)\tag{68}$$
$$\hat{\mathbf H}'_{STi}(k+1)=\hat{\mathbf H}'_{STi}(k)+\begin{bmatrix}\mathbf X_{2Si}(k)\mathbf G_{Ri}^T+\mathbf X_{URi}(k)\\ \mathbf X_{2Si}(k)\mathbf G_{Li}^T+\mathbf X_{ULi}(k)\end{bmatrix}\mathbf E_i(k)\tag{69}$$
In the case of strict single talking, where no un-correlated signals exist, and if $\mathbf G_{Li}$ can be assumed to be the output of an LTI system $\mathbf G_{RLi}$, a $P\times P$ symmetric regular matrix, with input $\mathbf G_{Ri}$, then (69) is given by
$$\begin{bmatrix}\hat{\mathbf H}'_{STRi}(k+1)\\ \hat{\mathbf H}'_{STLi}(k+1)\end{bmatrix}=\begin{bmatrix}\hat{\mathbf H}'_{STRi}(k)\\ \hat{\mathbf H}'_{STLi}(k)\end{bmatrix}+\begin{bmatrix}\mathbf X_{2Si}(k)\mathbf G_{Ri}^T\\ \mathbf X_{2Si}(k)\mathbf G_{Ri}^T\mathbf G_{RLi}\end{bmatrix}\mathbf E_i(k),$$
$$\begin{bmatrix}\hat{\mathbf H}'_{STRi}(k+1)\\ \hat{\mathbf H}'_{STLi}(k+1)\mathbf G_{RLi}^{-1}\end{bmatrix}=\begin{bmatrix}\hat{\mathbf H}'_{STRi}(k)\\ \hat{\mathbf H}'_{STLi}(k)\mathbf G_{RLi}^{-1}\end{bmatrix}+\begin{bmatrix}\mathbf X_{2Si}(k)\mathbf G_{Ri}^T\\ \mathbf X_{2Si}(k)\mathbf G_{Ri}^T\end{bmatrix}\mathbf E_i(k).\tag{70}$$
It is evident that the rank of the equation in (70) is $N$, not $2N$; therefore the equation becomes a monaural one by combining the first row with the second row multiplied by $(\mathbf G_{RLi})^{-1}$, as
$$\hat{\mathbf H}_{MONRLi}(k+1)=\hat{\mathbf H}_{MONRLi}(k)+2\,\mathbf X_{Ri}(k)\,\mathbf E_i(k)\tag{71}$$
where
$$\hat{\mathbf H}_{MONRLi}(k)=\hat{\mathbf H}'_{STRi}(k)+\hat{\mathbf H}'_{STLi}(k)\,\mathbf G_{RLi}^{-1},\tag{72}$$
$$\hat{\mathbf H}_{MONLRi}(k+1)=\hat{\mathbf H}_{MONLRi}(k)+2\,\mathbf X_{Li}(k)\,\mathbf E_i(k)\tag{73}$$
where
$$\hat{\mathbf H}_{MONLRi}(k)=\hat{\mathbf H}'_{STLi}(k)+\hat{\mathbf H}'_{STRi}(k)\,\mathbf G_{LRi}^{-1}.\tag{74}$$
Using (67), these are equivalently
$$\hat{\mathbf H}_{MONRLi}(k)=\hat{\mathbf H}_{STRi}(k)\,\mathbf G_{RRLLi}+\hat{\mathbf H}_{STLi}(k)\,\mathbf G_{RRLLi}\,\mathbf G_{RLi}^{-1}\tag{75}$$
or
$$\hat{\mathbf H}_{MONLRi}(k)=\hat{\mathbf H}_{STRi}(k)\,\mathbf G_{RRLLi}\,\mathbf G_{LRi}^{-1}+\hat{\mathbf H}_{STLi}(k)\,\mathbf G_{RRLLi}.\tag{76}$$
From the stereo echo path estimation viewpoint, we can obtain $\hat{\mathbf H}_{MONRLi}(k)$ or $\hat{\mathbf H}_{MONLRi}(k)$; however, we cannot identify the right and left echo path estimations from the monaural one. To cope with this problem, we use two LTI periods for separating the right and left estimation results as
where $\hat{\mathbf H}_{MONRLi}$ and $\hat{\mathbf H}_{MONRLi-1}$ are the monaural echo canceller estimation results at the end of the $i$th and $(i-1)$th LTI periods:
$$\hat{\mathbf H}_{MONi,i-1}=\mathbf W_i^{-1}\,\hat{\mathbf H}_{STi}\tag{78}$$
where $\hat{\mathbf H}_{MONi,i-1}$ is the estimation result matrix for the $(i-1)$th and $i$th LTI periods, as
$$\hat{\mathbf H}_{MONi,i-1}=\begin{bmatrix}\hat{\mathbf H}_{MONRLi}^T\\ \hat{\mathbf H}_{MONRLi-1}^T\end{bmatrix}\tag{79}$$
and $\hat{\mathbf H}_{STi}$ is the stereo echo path estimation result, as
$$\hat{\mathbf H}_{STi}=\begin{bmatrix}\hat{\mathbf H}_{STRi}^T\\ \hat{\mathbf H}_{STLi}^T\end{bmatrix}.\tag{80}$$
$\mathbf W_i^{-1}$ is a matrix which projects the stereo estimation results to the two monaural estimation results and is defined by
$$\mathbf W_i^{-1}=\begin{cases}
\begin{bmatrix}\mathbf G_{RRLLi}^T&\left(\mathbf G_{RRLLi}\mathbf G_{RLi}^{-1}\right)^T\\[1ex]\mathbf G_{RRLLi-1}^T&\left(\mathbf G_{RRLLi-1}\mathbf G_{LRi-1}^{-1}\right)^T\end{bmatrix}&\text{if }\mathbf G_{RLi}\text{ and }\mathbf G_{LRi-1}\text{ are regular matrices}\\[3ex]
\begin{bmatrix}\left(\mathbf G_{RRLLi}\mathbf G_{LRi}^{-1}\right)^T&\mathbf G_{RRLLi}^T\\[1ex]\left(\mathbf G_{RRLLi-1}\mathbf G_{RLi-1}^{-1}\right)^T&\mathbf G_{RRLLi-1}^T\end{bmatrix}&\text{if }\mathbf G_{LRi}\text{ and }\mathbf G_{RLi-1}\text{ are regular matrices}
\end{cases}\tag{81}$$
By swapping the right-hand side and the left-hand side in (78), we obtain the right and left stereo echo path estimations using the two monaural echo path estimation results as
$$\hat{\mathbf H}_{STi}=\mathbf W_i\,\hat{\mathbf H}_{MONi,i-1}.\tag{82}$$
Since $\mathbf W_i^{-1}$ and $\mathbf W_i$ are used to project the optimum solutions in the two monaural spaces to the corresponding optimum solution in the stereo space, and vice versa, we call these matrices WARP functions. The above procedure is depicted in Fig. 4. As shown there, the WARP system is regarded as an acoustic echo canceller which transforms the stereo signal into a correlated component and an un-correlated component, and a monaural acoustic echo canceller is applied to the correlated signal. To re-construct the stereo signal, a cross-channel correlation recovery matrix is inserted on the echo path side. Therefore, the WARP operation is needed only at an LTI system change.
Fig. 4. Basic Principle for WARP Method
[Figure: trajectories of the estimated coefficients $\hat{\mathbf h}_{STi-1}(k)$, $\hat{\mathbf h}_{STi-1}(k+1)$, $\hat{\mathbf h}_{STi}(k)$, $\hat{\mathbf h}_{STi}(k+1)$ and the input spaces $\mathbf x_{i-1}(k)$, $\mathbf x_{i-1}(k+1)$, $\mathbf x_i(k)$, $\mathbf x_i(k+1)$ across an LTI period boundary.]
4. Realization of WARP
4.1 Simplification by assuming direct-wave stereo sound
Both the stereo affine projection and WARP methods require a $P\times P$ inverse matrix operation, which raises a high computation load and stability problems. Even though the WARP operation is required only when the LTI system changes, such as at a far-end talker change, and its computation is much smaller than the inverse matrix operations for affine projection, which must be calculated for every sample, simplification of the WARP operation is still important. This is possible by assuming that the target stereo sound is composed of only the direct-wave sound from a talker (single talker), as shown in Fig. 6.
where $l_{Ri}$ and $l_{Li}$ are the attenuations of the transfer functions and $\tau_{Ri}$ and $\tau_{Li}$ are the analog delay values.
Since the right and left sounds are sampled at $f_S$ $(=\omega_S/2\pi)$ Hz and treated as digital signals, we use $z$-domain notation instead of the $\omega$-domain, with
$$z=\exp[\,2\pi j\omega/\omega_S\,].\tag{85}$$
Fig. 7. WARP Method using Z-Function
$$\begin{aligned}\hat y_i(z)&=\hat{\mathbf h}_i^T(z)\,\mathbf x_i(z)\\ y_i(z)&=\mathbf h^T(z)\,\mathbf x_i(z)+n_i(z)\end{aligned}\tag{87}$$
where $n_i(z)$ is the room noise, and $\hat{\mathbf h}_i(z)$ and $\mathbf h(z)$ are the stereo adaptive filter and stereo echo path characteristics at the end of the $i$th LTI period, respectively, which are defined as
$$\hat{\mathbf H}_{STi}(z)=\begin{bmatrix}\hat h_{Ri}(z)\\ \hat h_{Li}(z)\end{bmatrix},\qquad \mathbf H_{ST}(z)=\begin{bmatrix}h_R(z)\\ h_L(z)\end{bmatrix}.\tag{88}$$
$$e_i(z)=y_i(z)-\hat{\mathbf H}_{STi}^T(z)\,\mathbf x_i(z)\tag{89}$$
In the case of single talking, we can assume both $x_{URi}(z)$ and $x_{ULi}(z)$ are almost zero, and (89) can be re-written as
Since the acoustic echo can also be assumed to be driven by the single sound source $x_{Si}(z)$, we can assume a monaural echo path $h_{Monoi}(z)$ as
This equation implies that we can adopt a monaural adaptive filter by using a new monaural quasi-echo path $\hat h_{Monoi}(z)$ as
However, it is also evident that if the LTI system changes, both the echo and quasi-echo paths should be updated to match the new LTI system. This is for the same reason as for the stereo echo canceller in the case of pure single-talk stereo sound input. If we can assume the acoustic echo paths are time invariant over two adjacent LTI periods, this problem is easily solved by satisfying the required rank for solving the equation, as
$$\begin{bmatrix}\hat h_{Monoi}(z)\\ \hat h_{Monoi-1}(z)\end{bmatrix}=\mathbf W_i^{-1}(z)\begin{bmatrix}\hat h_{Ri}(z)\\ \hat h_{Li}(z)\end{bmatrix}\tag{94}$$
where
$$\mathbf W_i^{-1}(z)=\begin{bmatrix}g_{SRi}(z)&g_{SLi}(z)\\ g_{SRi-1}(z)&g_{SLi-1}(z)\end{bmatrix}.\tag{95}$$
In other words, using the two echo path estimation results for the corresponding two LTI periods, we can project the monaural-domain quasi-echo path to the stereo-domain quasi-echo path, or vice versa, using WARP operations as
$$\hat{\mathbf H}_{STi}(z)=\mathbf W_i(z)\,\hat{\mathbf H}_{Monoi}(z),\qquad \hat{\mathbf H}_{Monoi}(z)=\mathbf W_i^{-1}(z)\,\hat{\mathbf H}_{STi}(z)\tag{96}$$
where
$$\hat{\mathbf H}_{Monoi}(z)=\begin{bmatrix}\hat h_{Monoi}(z)\\ \hat h_{Monoi-1}(z)\end{bmatrix},\qquad \hat{\mathbf H}_{STi}(z)=\begin{bmatrix}\hat h_{Ri}(z)\\ \hat h_{Li}(z)\end{bmatrix}.\tag{97}$$
$$\mathbf W_i(z)=\begin{cases}
\begin{bmatrix}1&g_{RLi}(z)\\ 1&g_{RLi-1}(z)\end{bmatrix}&\text{RR Transition}\\[2ex]
\begin{bmatrix}1&g_{RLi}(z)\\ g_{LRi-1}(z)&1\end{bmatrix}&\text{RL Transition}\\[2ex]
\begin{bmatrix}g_{LRi}(z)&1\\ 1&g_{RLi-1}(z)\end{bmatrix}&\text{LR Transition}\\[2ex]
\begin{bmatrix}g_{LRi}(z)&1\\ g_{LRi-1}(z)&1\end{bmatrix}&\text{LL Transition}
\end{cases}\tag{98}$$
where g RLi ( z) and g LRi ( z) are cross-channel transfer functions between right and left stereo
sounds and are defined as
The RR, RL, LR and LL transitions in (98) denote a single talker's location changes. If a talker's location change is within the right-microphone side (the right microphone is the closest microphone), we call it an RR-transition, and if it is within the left-microphone side (the left microphone is the closest microphone), we call it an LL-transition. If the location change is from the right-microphone side to the left-microphone side, we call it an RL-transition, and if the change is the opposite, an LR-transition. Let us assume the ideal direct-wave single-talk case. Then the $\omega$-domain transfer functions $g_{RLi}(\omega)$ and $g_{LRi}(\omega)$ are expressed in the $z$-domain as
$$g_{RLi}(z)=l_{RLi}\,\varphi(\tau_{RLi},z)\,z^{-d_{RLi}},\qquad g_{LRi}(z)=l_{LRi}\,\varphi(\tau_{LRi},z)\,z^{-d_{LRi}}\tag{100}$$
where $\tau_{RLi}$ and $\tau_{LRi}$ are fractional delays and $d_{RLi}$ and $d_{LRi}$ are integer delays for the direct wave to realize the corresponding analog delays; these parameters are defined as
$$\varphi(\tau,z)=\sum_{\nu=-\infty}^{\infty}\frac{\sin(\pi(\nu-\tau))}{\pi(\nu-\tau)}\,z^{-\nu}.\tag{102}$$
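The "Sinc Interpolation" function (102), truncated to a finite number of taps, realizes a fractional delay; the tap count, tolerance at the edges and helper names below are illustrative assumptions.

```python
import math

def sinc_frac_delay(x, tau, taps=32):
    """Delay x by a fractional tau samples with a truncated sinc filter (102)."""
    def sinc(t):
        return 1.0 if abs(t) < 1e-12 else math.sin(math.pi * t) / (math.pi * t)
    half = taps // 2
    out = []
    for k in range(len(x)):
        acc = 0.0
        for nu in range(-half, half):
            j = k - nu
            if 0 <= j < len(x):
                acc += sinc(nu - tau) * x[j]   # coefficient sin(pi(nu-tau))/(pi(nu-tau))
        out.append(acc)
    return out

x = [math.sin(0.2 * n) for n in range(200)]
y = sinc_frac_delay(x, 0.5)          # y(k) ~ x(k - 0.5) away from the signal edges
```

The truncation introduces a small error, which motivates the "Quasi-Sinc Interpolation" discussed later in the chapter.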
$$\begin{aligned}\hat h_{Ri}(z)&=\frac{\hat h_{Monoi}(z)-\hat h_{Monoi-1}(z)}{g_{RLi-1}(z)-g_{RLi}(z)}\\[1ex]\hat h_{Li}(z)&=\frac{g_{RLi-1}(z)\,\hat h_{Monoi}(z)-g_{RLi}(z)\,\hat h_{Monoi-1}(z)}{g_{RLi-1}(z)-g_{RLi}(z)}\end{aligned}\qquad\text{RR Transition}\tag{103}$$
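A scalar analogue of the two-period projection (103) can be checked directly: two monaural estimates taken under two different cross-channel gains determine the stereo pair uniquely. The convention m = h_R + g * h_L and all values here are illustrative assumptions, not the chapter's exact parametrization.

```python
# Scalar illustration of the two-period WARP recovery behind (103).
h_R, h_L = 0.9, 0.4            # true stereo echo paths (illustrative scalars)
g_i, g_im1 = 0.3, 0.7          # cross-channel gains in periods i and i-1
m_i = h_R + g_i * h_L          # monaural estimate at the end of period i
m_im1 = h_R + g_im1 * h_L      # monaural estimate at the end of period i-1

h_R_hat = (g_im1 * m_i - g_i * m_im1) / (g_im1 - g_i)
h_L_hat = (m_im1 - m_i) / (g_im1 - g_i)
print(h_R_hat, h_L_hat)        # recovers h_R and h_L
```

If the gains were equal in both periods (no talker change), the 2x2 system would be singular, which is exactly why the WARP operation is applied at an LTI transition.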
or
$$\begin{aligned}\hat h_{Ri}(z)&=\frac{\hat h_{Monoi}(z)-\hat h_{Monoi-1}(z)}{l_{RLi-1}\varphi(\tau_{RLi-1},z)z^{-d_{RLi-1}}-l_{RLi}\varphi(\tau_{RLi},z)z^{-d_{RLi}}}\\[1ex]\hat h_{Li}(z)&=\frac{l_{RLi-1}\varphi(\tau_{RLi-1},z)z^{-d_{RLi-1}}\,\hat h_{Monoi}(z)-l_{RLi}\varphi(\tau_{RLi},z)z^{-d_{RLi}}\,\hat h_{Monoi-1}(z)}{l_{RLi-1}\varphi(\tau_{RLi-1},z)z^{-d_{RLi-1}}-l_{RLi}\varphi(\tau_{RLi},z)z^{-d_{RLi}}}\end{aligned}\qquad\text{RR Transition}\tag{105}$$
and
$$\varphi(\tau,z)\,\varphi(-\tau,z)=1.\tag{107}$$
$$\begin{aligned}\hat h_{Ri}(z)&=\frac{\left(\hat h_{Monoi}(z)-\hat h_{Monoi-1}(z)\right)l_{RLi-1}^{-1}\,\varphi(-\tau_{RLi-1},z)\,z^{d_{RLi-1}}}{1-l_{RLi}\,l_{RLi-1}^{-1}\,\varphi(-\tau_{RLi-1},z)\,\varphi(\tau_{RLi},z)\,z^{-(d_{RLi}-d_{RLi-1})}}\\[1ex]\hat h_{Li}(z)&=\frac{\hat h_{Monoi}(z)-l_{RLi}\,l_{RLi-1}^{-1}\,\varphi(\tau_{RLi},z)\,\varphi(-\tau_{RLi-1},z)\,z^{-(d_{RLi}-d_{RLi-1})}\,\hat h_{Monoi-1}(z)}{1-l_{RLi}\,l_{RLi-1}^{-1}\,\varphi(-\tau_{RLi-1},z)\,\varphi(\tau_{RLi},z)\,z^{-(d_{RLi}-d_{RLi-1})}}\end{aligned}\qquad\text{RR Transition}\tag{108}$$
These functions are assumed to be digital filters for the echo path estimation results, as shown in Fig. 8.

Fig. 8. Digital Filter Realization for WARP Functions
First, the amplitude conditions are
$$\begin{aligned}\left|D_{RRi}(z)\right|&<1&&\text{RR Transition}\\ \left|D_{RLi}(z)\right|&<1&&\text{RL Transition}\end{aligned}\tag{109}$$
where
$$\begin{aligned}D_{RRi}(z)&=l_{RLi}\,l_{RLi-1}^{-1}\,\varphi(-\tau_{RLi-1},z)\,\varphi(\tau_{RLi},z)\,z^{-(d_{RLi}-d_{RLi-1})}&&\text{RR Transition}\\ D_{RLi}(z)&=l_{LRi}\,l_{RLi-1}\,\varphi(\tau_{LRi},z)\,\varphi(\tau_{RLi-1},z)\,z^{-(d_{RLi-1}+d_{LRi})}&&\text{RL Transition}\end{aligned}\tag{110}$$
From (109),
$$\begin{aligned}l_{RLi}\,l_{RLi-1}^{-1}&\le 1/1.44&&\text{RR Transition}\\ l_{LRi}\,l_{RLi-1}&\le 1/1.44&&\text{RL Transition}\end{aligned}\tag{113}$$
Secondly, conditions for causality are given by checking the delay of the feedback component of the denominators $D_{RRi}(z)$ and $D_{RLi}(z)$. Since the convolution of a "Sinc Interpolation" function with another "Sinc Interpolation" function is also a "Sinc Interpolation" function,
$$\begin{aligned}D_{RRi}(z)&=l_{RLi}\,l_{RLi-1}^{-1}\,\varphi(\tau_{RLi}-\tau_{RLi-1},z)\,z^{-(d_{RLi}-d_{RLi-1})}&&\text{RR Transition}\\ D_{RLi}(z)&=l_{LRi}\,l_{RLi-1}\,\varphi(\tau_{LRi}+\tau_{RLi-1},z)\,z^{-(d_{RLi-1}+d_{LRi})}&&\text{RL Transition}\end{aligned}\tag{115}$$
The "Sinc Interpolation" function is an infinite sum toward both positive and negative delays. Therefore it is essentially impossible to guarantee causality. However, by permitting some errors, we can find conditions to maintain causality with bounded errors. To do so, we use a "Quasi-Sinc Interpolation" function, which is defined as
$$\tilde\varphi(\tau,z)=\sum_{\nu=-N_F+1}^{N_F}\frac{\sin(\pi(\nu-\tau))}{\pi(\nu-\tau)}\,z^{-\nu}.\tag{116}$$
The energy of the truncated function satisfies
$$\tilde\varphi^{*}(\tau,z)\,\tilde\varphi(\tau,z)\simeq\sum_{\nu=-N_F+1}^{N_F}\frac{\sin^2(\pi(\nu-\tau))}{\pi^2(\nu-\tau)^2}\,z^{-\nu}\tag{117}$$
and its causal (delayed) form is
$$\tilde\varphi(\tau,z)=\sum_{\nu=0}^{2N_F-1}\frac{\sin(\pi(\nu-N_F+1-\tau))}{\pi(\nu-N_F+1-\tau)}\,z^{-\nu}.\tag{118}$$
$$\begin{aligned}\tilde D_{RRi}(z)&=l_{RLi}\,l_{RLi-1}^{-1}\,\tilde\varphi(\tau_{RLi}-\tau_{RLi-1},z)\,z^{-(d_{RLi}-d_{RLi-1}-N_F+1)}&&\text{RR Transition}\\ \tilde D_{RLi}(z)&=l_{LRi}\,l_{RLi-1}\,\tilde\varphi(\tau_{LRi}+\tau_{RLi-1},z)\,z^{-(d_{RLi-1}+d_{LRi}-N_F+1)}&&\text{RL Transition}\end{aligned}\tag{119}$$
The physical meaning of the conditions is that the delay difference due to the talker's location change should be equal to or less than the cover range of the "Quasi-Sinc Interpolation" $\tilde\varphi(\tau,z)$ in the case of staying in the same microphone zone, and that the delay sum due to the talker's location change should be equal to or less than the cover range of the "Quasi-Sinc Interpolation" $\tilde\varphi(\tau,z)$ in the case of changing the microphone zone.
[Fig. 9 diagram: stereo sound generation ($x_{Si}(z)$ through $g_{SRi}(z)$ and $g_{SLi}(z)$, with un-correlated noises $x_{URi}(z)z^{-d}$ and $x_{ULi}(z)z^{-d}$), cross-channel estimates $\hat g_{RLi}(z)$, $\hat g_{LRi}(z)$, echo paths $\mathbf h_R(z)$, $\mathbf h_L(z)$, the WARP block holding $\hat{\mathbf h}_i(z)$, $\hat{\mathbf h}_{i-1}(z)$, the monaural AEC-I block (NLMS) and the stereo AEC-II block (MC-NLMS).]
By the WARP method, the monaural estimated echo paths and the stereo estimated echo paths are transformed into each other.
Fig. 9. System Configuration for WARP based Stereo Acoustic Echo Canceller
As shown in Fig. 9, the actual echo cancellation is done by the stereo acoustic echo canceller (AEC-II); however, a monaural acoustic echo canceller (AEC-I) is used during far-end single talking. The WARP block is active only when the cross-channel transfer function changes, and it projects the monaural echo canceller's echo path estimation results for two LTI periods to one stereo echo path estimation, or vice versa.
5. Computer simulations
5.1 Stereo sound generation model
Computer simulations are carried out using the stereo sound generation model shown in Fig. 10, for both white Gaussian noise (WGN) and an actual voice. The system is composed of cross-channel transfer function estimation blocks (CCTF), where all signals are assumed to be sampled at $f_S=8$ kHz after 3.4 kHz cut-off low-pass filtering. The frame length is set to 100 samples. Since the stereo sound generation model is essentially a continuous-time signal system, over-sampling (x6, $f_A=48$ kHz) is applied to simulate it. In the stereo sound
[Fig. 10 diagram: five far-end talker positions (A to E), transfer functions $F_{AR}(z)$, $F_{AL}(z)$, $F_{BR}(z)$, $F_{BL}(z)$, LPFs and room noises $N_R(z)$, $N_L(z)$ feeding the microphone signals $X_{MicRi,j}(z)$ and $X_{MicLi,j}(z)$; a 128-tap N-LMS adaptive filter (AF1) and an 8-tap N-LMS adaptive filter (AF2) form the CCTF detector; the x6 over-sampling area simulates the analog delay, and CL (dB) is calculated at the output.]
Fig. 10. Stereo Sound Generation Model and Cross-Channel Transfer Function Detector
Fig. 11. Impulse Response Estimation Results in CCTF Block

Fig. 12. Estimated Tap Coefficients by Short Tap Adaptive Filter in CCTF Estimation Block
$$\mathrm{ERLE}_{L(i-1)+j+1}=\begin{cases}10\log_{10}\!\left(\displaystyle\sum_{k=0}^{N_F-1}y_{i,j,k}^2\Big/\sum_{k=0}^{N_F-1}e_{\mathrm{MON}\,i,j,k}^2\right)&\text{Monaural Echo Canceller}\\[2ex]10\log_{10}\!\left(\displaystyle\sum_{k=0}^{N_F-1}y_{i,j,k}^2\Big/\sum_{k=0}^{N_F-1}e_{\mathrm{ST}\,i,j,k}^2\right)&\text{Stereo Echo Canceller}\end{cases}\tag{121}$$
where $e_{\mathrm{MON}\,i,j,k}$ and $e_{\mathrm{ST}\,i,j,k}$ are the residual echoes of the monaural echo canceller (AEC-I) and the stereo echo canceller (AEC-II) for the $k$th sample in the $j$th frame of the $i$th LTI period, respectively. The second measurement is the normalized misalignment of the estimated echo paths, defined as
$$\mathrm{NORM}_{L(i-1)+j+1}=10\log_{10}\!\left(\frac{(\mathbf h_R-\hat{\mathbf h}_{Ri,j})^T(\mathbf h_R-\hat{\mathbf h}_{Ri,j})+(\mathbf h_L-\hat{\mathbf h}_{Li,j})^T(\mathbf h_L-\hat{\mathbf h}_{Li,j})}{\mathbf h_R^T\mathbf h_R+\mathbf h_L^T\mathbf h_L}\right)\tag{122}$$
where $\hat{\mathbf h}_{Ri,j}$ and $\hat{\mathbf h}_{Li,j}$ are the stereo echo canceller's estimated coefficient arrays at the end of the $(i,j)$th frame, and $\mathbf h_R$ and $\mathbf h_L$ are the target stereo echo path impulse response arrays, respectively.
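The two measurements (121) and (122) can be sketched as follows; the sample values are illustrative.

```python
import math

def erle_db(y, e):
    """ERLE as in (121): echo power over residual-echo power, in dB."""
    py = sum(v * v for v in y)
    pe = sum(v * v for v in e)
    return 10 * math.log10(py / pe)

def norm_db(h_true, h_hat):
    """Normalized misalignment as in (122), in dB (lower is better)."""
    num = sum((a - b) ** 2 for a, b in zip(h_true, h_hat))
    den = sum(a * a for a in h_true)
    return 10 * math.log10(num / den)

y = [1.0, -2.0, 0.5]
e = [0.1, -0.2, 0.05]            # residual exactly 20 dB below the echo
print(round(erle_db(y, e), 1))   # -> 20.0
```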
The WARP operations are applied at the boundaries of the three LTI periods for talkers C, D and E, together with NORM for the stereo echo canceller (AEC-II). As shown here, after two LTI periods (the A and B periods), NORM and ERLE improve quickly owing to the WARP projection at the WARP timings in Fig. 16. As for ERLE, the stereo acoustic echo canceller shows better performance than the monaural echo canceller. This is because the monaural echo canceller estimates an echo path model which is a combination of the CCTF and the real stereo echo path, and its performance is therefore affected by the CCTF estimation error. On the other hand, the echo path model for the stereo echo canceller is purely the stereo echo path model, which does not include the CCTF.
where Level and $T_X$ are constants which determine the level-shift ratio and cycle. Figure 15 shows the cancellation performances when Level and $T_X$ are 10% and 500 msec, respectively.
Fig. 16. WARP Echo Canceller Performances Affected by Far-End Back Ground Noise
In Fig. 15, the WARP method shows more than 10 dB better stereo echo path estimation performance (NORM) than affine projection (P=3). ERLE by the stereo echo canceller based on the WARP method is also better than affine projection (P=3). ERLE by the monaural acoustic echo canceller based on the WARP method shows somewhat similar cancellation performance to the affine method (P=3); however, the ERLE improvement after two LTI periods by the WARP-based monaural echo canceller is better than that of the affine-based stereo echo canceller.
Figure 16 shows the echo canceller performances in the case where the CCTF estimation is degraded by room noise in the far-end terminal. The S/N in the far-end terminal is assumed to be 30 dB or 50 dB. Although the results clearly show that a lower S/N degrades ERLE and NORM, more than 15 dB of ERLE or NORM is attained after two LTI periods.
Figure 17 shows the echo canceller performances in the case where an echo path change happens. In this simulation, the echo path change is inserted at the 100th frame. The echo path change level is chosen as 20 dB, 30 dB or 40 dB. It is observed that the echo path change affects the WARP calculation, and therefore the WARP effect degrades at the 2nd and 3rd LTI period boundaries.
Fig. 17. WARP Echo Canceller Cancellation Performance Drops Due to Echo Path Change
Figure 18 summarizes the NORM results for the stereo NLMS method, the affine projection method and the WARP method. In this simulation, as a non-linear function for affine projection, independent absolute values of the right and left sounds are added by
where ABS is a constant that determines the non-linear level of the stereo sound and is set to 10%. In this simulation, an experiment is carried out assuming far-end double talking, where WGN whose power is the same as in far-end single talking is added between the 100th and 130th frames.
As evident from the results in Fig. 18, the WARP method shows better performance for the stereo echo path estimation regardless of the existence of far-end double talking. Even in the case of a 10% far-end signal level shift, the WARP method attains more than 20% better NORM than the affine method (P=3) with the 10% absolute non-linear result.
Fig. 18. Echo Path Estimation Performance Comparison for NLMS, Affine and WARP
Methods
Fig. 19. Residual Echo Level (Lres (dB)) and Normalized Estimated Echo Misalignment (NORM) for the Voice Source at the Far-End Terminal, S/N = 30 dB (level shift 0, 500 taps, step gain = 1.0): (a) NORM, far-end S/N 30 dB; (b) Lres, far-end S/N 30 dB.
6. Conclusions
In this chapter, stereo acoustic echo canceller methods are studied from the cross-channel correlation viewpoint, aiming at conversational DTV use. Among the many stereo acoustic echo cancellers, we focused on the AP (including the LS and stereo NLMS methods) and WARP methods, since these approaches cause no modification or artifacts in the speaker-output stereo sound; such modification is undesirable in consumer audio-visual products such as DTV.
In this study, the stereo sound generation system is modeled by right and left Pth-order LTI systems with independent noises. The stereo LS method (M=2P) and the stereo NLMS method (M=P=1) are two extreme cases of the general AP method, which requires an MxM inverse matrix operation for each sample. The stereo AP method (M=P) can produce the best iteration direction, fully exploiting the un-correlated components produced by small fluctuations in the stereo cross-channel correlation, by calculating PxP inverse matrix operations for each sample. The major problem of the method is that it cannot cope with strict single talking, where no un-correlated signals exist in the right and left channels, and therefore a rank-drop problem happens.
Contrary to the AP method, the WARP method creates a stereo echo path estimation model by applying a monaural adaptive filter over two LTI periods at the chance of a far-end talker change. Since it creates the stereo echo path estimation using two monaural echo path models for two LTI periods, it does not suffer from any rank-drop problem, even in strict single talking. Moreover, using the WARP method, the computational complexity can be reduced drastically, because the WARP method requires PxP inverse matrix operations only at LTI characteristics changes, such as a far-end talker change. However, contrary to the AP method, it is clear that the performance of the WARP method may drop if the fluctuation in the cross-channel correlation becomes high. Considering the above pros and cons of the affine projection and WARP methods, it looks desirable to apply the affine method and the WARP method dynamically, depending on the nature of the stereo sound. In this chapter, an acoustic echo canceller based on the WARP method, which equips both monaural and stereo adaptive filters, is discussed together with other gradient-based stereo adaptive filter methods. The WARP method observes the cross-channel correlation characteristics of the stereo sound using short-tap pre-adaptive filters. The pre-adaptive filter coefficients are used to calculate WARP functions, which project the monaural adaptive filter estimation results to the stereo adaptive filter initial coefficients, or vice versa.
To clarify the effectiveness of the WARP method, simple computer simulations are carried out using a white Gaussian noise source and a male voice, using a 128-tap NLMS cross-channel correlation estimator, a 1000-tap monaural NLMS adaptive filter for the monaural echo canceller and a 2x1000-tap (2x500-tap for voice) multi-channel NLMS adaptive filter for the stereo echo canceller.
The following is a summary of the results:
1. Considering the sampling effect for analog delay, a x6 over-sampling system is assumed for the stereo generation model. Five far-end talker positions are assumed, and the direct-wave sound from each talker is assumed to be picked up by the far-end stereo microphones together with far-end room background noise. The simulation results show that we can attain good cross-channel transfer function estimation rapidly using a 128-tap adaptive filter if the far-end noise S/N is reasonable (such as 20-40 dB).
2. Using the far-end stereo generation model and the cross-channel correlation estimation results, a 1000-tap monaural NLMS adaptive filter and 2x1000-tap stereo NLMS adaptive filters are used to clarify the effectiveness of the WARP method. In the simulation, far-end talker changes are assumed to happen every 80 frames (1 frame = 100 samples). Echo return loss enhancement (ERLE) and normalized estimation error power (NORM) are used as measurements. It is clarified that both ERLE and NORM are drastically improved at the far-end talker change by applying the WARP operation.
3. The far-end S/N affects the WARP performance; however, we can still attain around (S/N - 5) dB of ERLE or NORM.
4. We find a slight convergence improvement in the case of the AP method (P=3) with the non-linear operation. However, the improvement is much smaller than that of WARP at the far-end talker change. This is because the sound source is white Gaussian noise in this simulation, and therefore the merit of the AP method is not achieved well.
5. Since the WARP method assumes the stereo echo path characteristics remain stable, stereo echo path characteristics changes degrade the WARP effectiveness. The simulation results show that the degradation depends on how much the stereo echo path moved, and the degradation appears just after the WARP projection.
6. The WARP method works correctly for actual voice sound, too. Collaboration with the AP method may improve the total convergence speed further, because the AP method improves the convergence speed for voice independently of the WARP effect.
As for further studies, more experiments in actual environments are necessary. The author would like to continue further research to realize smooth and natural conversations in future conversational DTV.
7. Appendix
If the $N\times N$ matrix $\mathbf Q$ is defined using
$$\begin{aligned}\mathbf X_{2S}(k)&=[\tilde{\mathbf x}(k),\,\tilde{\mathbf x}(k-1),\,\cdots,\,\tilde{\mathbf x}(k-N+1)]\\ \tilde{\mathbf x}(k)&=[x(k),\,x(k-1),\,\cdots,\,x(k-2P+2)]^T\end{aligned}\tag{A-2}$$
and $\mathbf G$ is defined as a $(2P-1)\times P$ matrix as
$$\mathbf G=\begin{bmatrix}\mathbf g^T&\mathbf 0&\cdots&\mathbf 0\\ \mathbf 0&\mathbf g^T&\cdots&\mathbf 0\\ \vdots&\vdots&\ddots&\vdots\\ \mathbf 0&\mathbf 0&\cdots&\mathbf g^T\end{bmatrix}\tag{A-3}$$
where $\mathbf g$ is a $P$-sample array defined as
$$\mathbf g=[g_0,\,g_1,\,\cdots,\,g_{P-1}]^T.\tag{A-4}$$
$$\bar{\mathbf Q}=\mathrm{Tlz}(\mathbf Q)\tag{A-5}$$
Considering
$$a_{Tlz}(u,v)=\begin{cases}a(u-v,\,0)&P-1\ge u-v\ge 0\\ a(0,\,v-u)&P-1\ge v-u>0\\ 0&|u-v|\ge P\end{cases}\tag{A-8}$$
By setting the $(u,v)$th element of the $N\times N$ $(P\le N)$ Toeplitz matrix $\bar{\mathbf Q}$ as $a_{Tlz}(u,v)$, we define a function $\mathrm{Tlz}(\mathbf Q)$ which determines the $N\times N$ Toeplitz matrix $\bar{\mathbf Q}$.
It is noted that if $\mathbf Q$ is an identity matrix, $\bar{\mathbf Q}$ is also an identity matrix.
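A sketch of the Tlz function as reconstructed from (A-8): entry (u, v) of the Toeplitz output is taken from the first column of Q for u >= v and from its first row for v > u, and is zero when |u - v| exceeds the size of Q. The indexing convention is an assumption based on (A-8).

```python
def tlz(Q, n):
    """Build an n x n Toeplitz matrix from the first row/column of Q (A-8 sketch)."""
    p = len(Q)
    def a(u, v):
        d = u - v
        if 0 <= d < p:
            return Q[d][0]      # below (or on) the diagonal: first column of Q
        if 0 < -d < p:
            return Q[0][-d]     # above the diagonal: first row of Q
        return 0.0              # |u - v| outside the range of Q
    return [[a(u, v) for v in range(n)] for u in range(n)]

I2 = [[1, 0], [0, 1]]
print(tlz(I2, 3))   # identity in -> identity out, as noted in the Appendix
```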
10
EEG-fMRI Fusion:
Adaptations of the Kalman
Filter for Solving a High-Dimensional
Spatio-Temporal Inverse Problem
Thomas Deneux1,2
1Centre National de Recherche Scientifique,
Institut de Neurosciences Cognitives de la Méditerranée, Marseille,
2Institut National de la Recherche en Informatique et Automatique, Sophia-Antipolis,
France
1. Introduction
Recording the dynamics of human brain activity is a key topic for both neuroscience
fundamental research and medicine. The two main techniques used, Electro- and
Magneto-encephalography (EEG/MEG) on the one hand, functional Magnetic Resonance
Imaging (fMRI) on the other hand, measure different aspects of this activity, and have
dramatically different temporal and spatial resolutions. There is a large literature
dedicated to the analysis of EEG/MEG (REF) and fMRI data (REF). Indeed, both
techniques provide partial and noisy measures of the hidden neural activity, and
sophisticated methods are needed to reconstruct this activity as precisely as possible.
Adaptive filtering algorithms seem well-adapted to this reconstruction, since the problem
can easily be formulated as a dynamic system, but it is only recently that such
formulations have been proposed for EEG analysis (Jun et al., 2005), fMRI analysis
(Johnston et al., 2008; Murray & Storkey, 2008; Riera et al., 2004), or EEG-fMRI fusion
(Deneux & Faugeras, 2006b; Plis et al., 2010).
In this chapter, we focus on the so-called "EEG-fMRI fusion", i.e. the joint analysis of
EEG/MEG and fMRI data obtained on the same experiment. For more than a decade, EEG-fMRI fusion has been a hot topic, because it is believed that both techniques used
together should provide higher levels of information on brain activity, by taking advantage
of the high temporal resolution of EEG, and spatial resolution of fMRI. However, the two
modalities and their underlying principles are so different from each other that the
proposed solutions were often ad hoc, and lacked a common formalism. We show here how
the use of dynamic system formulation and adaptive filter algorithms appears to be a
natural way to achieve the EEG-fMRI fusion.
However, not only do adaptive filtering techniques offer new possibilities for the EEG-
fMRI fusion, but also this specific problem brings new challenges and fosters the
development of new filtering algorithms. These challenges are mostly a very high
dimensionality, due to the entanglement between the temporal and spatial dimensions,
and high levels of non-linearity, due to the complexity of the physiological processes
involved. Thus, we will present some new developments that we have introduced, in
particular the design of a variation of the Kalman filter and smoother which performs a
bi-directional sweep, first backward and then forward. We will also show directions for the
development of new algorithms.
The results presented in this chapter have already been published in (Deneux &
Faugeras, 2010). Here, therefore, we focus on explaining in detail our understanding
of the EEG-fMRI fusion problem, and of its solution through the design of new
algorithms. In this introduction, we pose the problem and its specific difficulties, and
advocate the use of adaptive filters to solve it. In a second part, we tackle a
simplified, linear problem: we present our Kalman-based fusion algorithm, discuss its
characteristics, and show that it is better suited to estimating smooth activities, while the
estimation of sparse activities would rather necessitate the development of new
algorithms based on the minimization of an L1-norm. In a third part, we address the
problem of strong nonlinearities: we present a modification of the Kalman-based
algorithm, and also call for the development of new, more flexible methods based, for
example, on particle filters.
Fig. 1. (A) Physiological basis: this figure briefly summarizes the main effects giving rise
to the measured signals. For EEG/MEG: the brain gray matter is organized in cortical
columns where large numbers of cells work synchronously; in particular, the large
pyramidal cells (1), which have a characteristic parallel vertical organization, but also
other cellular types such as the smaller interneurons (2); when synchrony is sufficiently
large, the electrical activities of the neurons sum up together and can be represented by
an equivalent dipole (3), which generates circulating currents (4) through the brain and
even outside of it; EEG sensors touching the skin, or MEG sensors placed close to it (5), can
detect the voltage differences, or currents, generated by the neural activity. For functional
MRI: neuronal activity (1,2) consumes energy, which is provided by the astrocytes (6),
which themselves extract glucose and oxygen from the blood and regulate the blood flow
in the cerebral vasculature; this affects the concentration of deoxyhemoglobin, i.e.
hemoglobin that has delivered its oxygen molecule; deoxyhemoglobin itself, due to its
paramagnetic properties, perturbs the local magnetic field and modifies the magnetic
resonance signals recorded by the MRI coil (9). (B) EEG-fMRI fusion: EEG or MEG capture
mostly temporal information about the unknown brain activity: the electric signals
measured by the sensors on the scalp. On the contrary, fMRI captures mostly spatial
information, which leads to precise maps showing which brain regions are active. Ideally,
EEG-fMRI fusion should produce an estimation of the activity which takes this
complementary information into account.
It thus appears that EEG/MEG and fMRI recordings are complementary (Figure 1(B)), the
first providing more information on the timing of the studied activity, the second, on its
localization. Therefore, many experimental studies combine acquisitions using the two
modalities. Such acquisitions can be performed separately, by repeating the same
experimental paradigm under both modalities: in such case, averaging over multiple
repetitions will be necessary to reduce the trial-to-trial variability. Also, simultaneous
acquisition of EEG and fMRI is possible and has found specific applications in the study of
epilepsy (Bénar et al., 2003; Gotman et al., 2004; Lemieux, et al., 2001; Waites et al., 2005)
and resting states (Goldman et al., 2002; Laufs et al., 2003) (note that, since MEG, on the
contrary, cannot be acquired simultaneously with fMRI, we focus here on EEG-fMRI fusion;
however, our results remain true for MEG-fMRI fusion in the context of separate
average acquisitions). Apart from a few trivial cases, the joint analysis of the two datasets
in order to best reconstruct the observed neural activity presents several specific
challenges.
Fig. 2. Schematic representations of EEG-fMRI fusion. (A) The neural activity dynamics are
represented by a 2-dimensional array (x-axis is time, y-axis is space). A yellow pattern inside
this array marks a single event; the location of this event can be identified precisely by fMRI
(red mark), and its exact dynamics can be recorded by the EEG (time courses). In this case,
this information alone, i.e. the spatial information from the fMRI and the temporal
information from the EEG, is sufficient to fully describe the event. (B) The same schematics are used,
but two events occur now, at two different locations and with different dynamics (see the
yellow and orange patterns). In this case, the spatial information from fMRI and the
temporal information from EEG alone are not sufficient to fully describe the events, since it is not
possible to determine which parts of this information correspond to which event. (C) Now,
a similar spatio-temporal array features a complex activity. Both the temporal and spatial
dimensions of EEG and fMRI are considered: the fMRI [EEG, resp.] measure is represented
by a narrow vertical [horizontal] array to indicate a reduced temporal [spatial] resolution;
more precisely, these measures were obtained by low-pass filtering and sub-sampling the
neural activity along the appropriate dimension. The "EEG-fMRI fusion" problem consists in
estimating the neural activity given the two measures, and should result in a better
reconstruction than when using only the EEG or the fMRI measures (it is indeed the case
here, since fMRI-only reconstruction lacks spatial precision, and EEG-only reconstruction
lacks temporal resolution).
As we noticed above, the temporal dimension and spatial dimension are highly entangled in
the EEG-fMRI fusion problem. Indeed, one EEG measure at time t on a specific sensor is
influenced by neural activity at time t in a large part of the cortex if not all; and conversely,
one fMRI measure at time t and at a specific spatial location x is influenced by neural
activity during the last ~10 seconds before t at location x. Therefore, traditional approaches
estimate p(u | yEEG) independently for each time point, and p(u | yfMRI) independently for
each spatial location. But it is not possible to "cut the problem in smaller pieces" in order to
estimate p(u | yEEG, yfMRI). This results in an inverse problem of very high dimensionality:
the dimension of u, which of course depends on the temporal and spatial resolutions used. If we
choose for u the spatial resolution of fMRI and the temporal resolution of EEG, then its
dimension is the product of several thousand spatial locations (the number of cortical
sources) by up to one thousand time instants per second of experiment (for an EEG
sampling rate of 1 kHz).
If the experimental data is averaged over repetitions of a specific paradigm, then its total
temporal length can be limited to a few seconds, such that the temporal and spatial sizes of
u are in the same range. On the contrary, if the interest is in estimating the neural activity
without any averaging, the length can be of several minutes or more, and the temporal size
of u becomes extremely large. It is in this case that adaptive filter techniques are particularly
interesting. Also, although fMRI measures depend on activity which occurred in the last
~10 s, they are independent of earlier activities; therefore, the adaptive filter will need to
keep in memory some information in order to link the delayed detections of the activity by
EEG and fMRI, but this memory need not cover the full temporal extent of the experiment.
Fig. 3. Graphical representation of the forward model. This model features the evolution of
the neural activity u, and of the hemodynamic activity h (driven by the neural activity), the
EEG measure yEEG (which depends only on the neural activity), and the fMRI measure
(which depends directly only on the hemodynamic state, and hence depends indirectly on
the neural activity). Blue backgrounds indicate measures, which are known, while orange
backgrounds indicate hidden states, which have to be estimated from the measures.
The graph in figure 3 represents the forward model which will guide us in designing
algorithms for EEG-fMRI fusion. The neural activity at time t, ut, has its own evolution, and is
driving the evolution of metabolic and hemodynamic variables, such as oxygen and glucose
consumption, blood flow, volume and oxygenation, represented altogether by the variable ht.
At time t, the EEG measure is a function only of neural activity at the same time, while the
fMRI measure is a function of the hemodynamic state. Note that the acquisition rate of fMRI is
in fact much lower than that of EEG, but it can be useful for the sake of simplicity to re-
interpolate it at the rate of EEG, without loss in the algorithm capabilities (Plis et al., 2010).
1.2.3 Nonlinearity
The second challenge of EEG-fMRI fusion is the high level of nonlinearity which can exist in
the forward model depicted in figure 3. To make this clear, let us first show how this graph
corresponds to the physiological models depicted in figure 1. Clearly, the EEG measure y_t^EEG is
the signal recorded by the set of EEG sensors (5) and the fMRI measure y_t^fMRI is the MRI image
reconstructed after resonance signals have been recorded by the MRI coil (9). The
metabolic/hemodynamic state ht is the state of all elements involved in the hemodynamic
response (6,7,8). The case of the so-called "neural activity" ut is more delicate, since it can be
either a complex description of the actual activities of different types of neurons (1,2), or
simply the electric dipole that averages electrical activities in a small region (3).
And here resides the main source of nonlinearity. Indeed, different types of neuronal
activities can lead to the same energy consumption and hence to similar fMRI signals, and
yet, average into very different equivalent dipoles of current. For example, some activity can
result in a dipole with an opposite orientation, or can even be invisible to the EEG. This
explains in particular why some activity can be seen only by fMRI (when electrical activities
do not have a preferred current orientation), or only by EEG (when a large area has a low-
amplitude but massively parallel activity).
Besides, authors have also often emphasized the nonlinearity of the hemodynamic process
(transition h_t → h_{t+1}), but in fact these nonlinearities, for example those found in the
“Balloon Model” (Buxton et al., 2004), are less important, and linear
approximations can be used as long as the error they introduce does not exceed the level of
noise in the data (Deneux et al., 2006a). Note also that the spatial extent of the hemodynamic
response to a local activity can be larger than that of the activity itself, since changes can be
elicited in neighboring vessels; however, such distant hemodynamic effects can still be a
linear function of the local activity. In any case, such small discrepancies in the spatial
domain are usually ignored, mostly because of a lack of knowledge.
As a summary, EEG-fMRI fusion is a difficult problem, because of its high dimensionality,
where space and time are intrinsically entangled, and because of nonlinear relations – and
surely a lack of robust knowledge – at the level of the underlying link between electric and
hemodynamic activities. Therefore, different approaches are proposed to tackle these
difficulties. The aforementioned non-symmetric methods are efficient in specific
experimental situations. Region-based methods decrease the dimensionality by clustering
the source space according to physiologically-based criteria. Here, we do not attempt to decrease
the dimensionality, or simplify the inverse problem, but we propose the use of adaptive
filters to solve it.
algorithm, while new algorithms which would minimize an L1-norm would be better adapted
to the estimation of sparse activities.
2.1 Methods
2.1.1 Kalman filter and smoother
The forward model summarized in figure 3 can be simply described in the dynamic model
formalism:
ẋ(t) = F(x(t)) + ξ(t)
y(t) = G(x(t)) + η(t), (1)
where x(t), the hidden state, is the combination of the neural activity u(t) and the
hemodynamic state h(t), and y(t), the measure, is the combination of yEEG(t) and yfMRI (t), ξ(t)
is a white noise process, and η(t) a Gaussian noise. Once time is discretized and the
evolution and measure equations are linearized, it yields:
x_1 = x̄_0 + ξ_0, ξ_0 ~ N(0, Q_0)
x_{k+1} = A x_k + ξ_k, ξ_k ~ N(0, Q) (2)
y_k = D x_k + η_k, η_k ~ N(0, R)
where A and D are the evolution and measure matrices, obtained by linearization of F and
G, N(0,Q0) is the Gaussian initial a priori distribution of the hidden state, and N(0,Q) and
N(0,R) are the Gaussian distributions of the evolution and measure noises.
Estimating the hidden state given the measures is performed with the two steps of the Kalman
filter and smoother (Kalman, 1960; Welch & Bishop, 2006; Welling, n.d.). The first step runs
forward and successively estimates the distributions p( xk | y1 ,..., yk ) for increasing values of k.
The second runs backward and estimates p ( xk | y1 ,..., yn ) for decreasing values of k (n being
the total number of measure points).
We recall here the equations for this estimation, for which we introduce the following
notation (note that all distributions are Gaussian, therefore they are fully described by their
mean and variance):
x̂_k^l = E(x_k | y_1, ..., y_l),   P_k^l = V(x_k | y_1, ..., y_l). (3)
First, the Kalman filter starts with the a priori distribution of x_1:

x̂_1^0 = x̄_0,   P_1^0 = Q_0, (4)
then repeats for k = 1, 2, ..., n the "measurement update" (5) and the "time update" (6):

K_k = P_k^{k-1} D^T (D P_k^{k-1} D^T + R)^{-1}
x̂_k^k = x̂_k^{k-1} + K_k (y_k - D x̂_k^{k-1}) (5)
P_k^k = (I - K_k D) P_k^{k-1}

x̂_{k+1}^k = A x̂_k^k
P_{k+1}^k = A P_k^k A^T + Q. (6)
Second, the Kalman smoother repeats for k = n-1, n-2, ..., 1:

J_k = P_k^k A^T (P_{k+1}^k)^{-1}
x̂_k^n = x̂_k^k + J_k (x̂_{k+1}^n - x̂_{k+1}^k) (7)
P_k^n = P_k^k + J_k (P_{k+1}^n - P_{k+1}^k) J_k^T
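For concreteness, the filter and smoother recursions can be sketched in a few lines of numpy (our own illustration, not code from the chapter; all names are ours):

```python
import numpy as np

def kalman_filter_smoother(A, D, Q, R, Q0, x0, y):
    """Forward Kalman filter followed by the backward smoother sweep;
    returns the smoothed means E(x_k | y_1, ..., y_n)."""
    n, dim_x = len(y), A.shape[0]
    x_pred = np.zeros((n, dim_x)); P_pred = np.zeros((n, dim_x, dim_x))
    x_filt = np.zeros((n, dim_x)); P_filt = np.zeros((n, dim_x, dim_x))
    xp, Pp = x0, Q0                       # a priori distribution of x_1
    for k in range(n):
        x_pred[k], P_pred[k] = xp, Pp
        # measurement update: blend the prediction with the measure y_k
        K = Pp @ D.T @ np.linalg.inv(D @ Pp @ D.T + R)
        x_filt[k] = xp + K @ (y[k] - D @ xp)
        P_filt[k] = (np.eye(dim_x) - K @ D) @ Pp
        # time update: propagate through the evolution model
        xp, Pp = A @ x_filt[k], A @ P_filt[k] @ A.T + Q
    # backward smoother sweep
    x_smooth = x_filt.copy()
    for k in range(n - 2, -1, -1):
        J = P_filt[k] @ A.T @ np.linalg.inv(P_pred[k + 1])
        x_smooth[k] = x_filt[k] + J @ (x_smooth[k + 1] - x_pred[k + 1])
    return x_smooth
```

In the linear-Gaussian case, these two sweeps compute exactly the means of p(x_k | y_1, ..., y_k) and p(x_k | y_1, ..., y_n).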
The evolution of the neural activity u is modeled by

u̇(t) = -λ u(t) + ξ_u(t), (8)
where λ is a positive parameter which controls the temporal autocorrelation of sources time
courses (<u(t)u(t+Δt)>/<u(t)u(t)> = exp(-λ Δt)), and ξu is a Gaussian innovation noise. This
noise is white in time, but can be made smooth in space and thus make u itself smooth in
space, by setting its variance matrix such that it penalizes the sum of square differences
between neighboring locations.
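A minimal simulation of this prior can be sketched as follows (our own toy sizes, not the chapter's 2,000 sources at 200 Hz). Spatial smoothness is obtained, as described above, by choosing an innovation covariance that penalizes squared differences between neighbors, here via the inverse of a regularized chain Laplacian:

```python
import numpy as np

# Sketch (ours) of a discretized version of eq. (8):
# u_{k+1} = (1 - lam*dt) * u_k + sqrt(dt) * xi_k,
# with xi white in time but smooth in space.
rng = np.random.default_rng(0)
n_src, n_t, dt, lam = 50, 200, 0.005, 1.0    # toy sizes

# innovation precision: eps*I + Laplacian of a 1-D chain of sources
L = 2 * np.eye(n_src) - np.eye(n_src, k=1) - np.eye(n_src, k=-1)
cov = np.linalg.inv(0.1 * np.eye(n_src) + L)  # innovation covariance
chol = np.linalg.cholesky(cov)

u = np.zeros((n_t, n_src))
for k in range(n_t - 1):
    xi = chol @ rng.standard_normal(n_src)    # spatially smooth innovation
    u[k + 1] = (1 - lam * dt) * u[k] + np.sqrt(dt) * xi
```

The resulting time courses have the exponential temporal autocorrelation exp(-λΔt) stated above, while neighboring sources stay correlated.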
Since u represents here the amplitudes of equivalent dipoles of current at the sources
locations, the EEG measure is a linear function of it:

y^EEG(t) = B u(t) + η^EEG(t), (9)
where B is the matrix of the EEG forward problem, constructed according to the Maxwell
equations for the propagation of currents through the different tissues and through the skull
(Hamalainen et al. 1993). The measure noise ηEEG(t) is Gaussian and independent in time
and space.
Finally, the hemodynamic state evolution and the fMRI measure are modeled using the
Balloon Model introduced by R. Buxton (Buxton et al., 2004):
f̈(t) = ε u(t) - κ_s ḟ(t) - κ_f (f(t) - 1)

v̇(t) = (1/τ) (f(t) - v(t)^{1/α}) (10)

q̇(t) = (1/τ) ( f(t) (1 - (1 - E_0)^{1/f(t)}) / E_0 - v(t)^{1/α} q(t) / v(t) )

y^fMRI(t) = V_0 ( a_1 (v(t) - 1) + a_2 (q(t) - 1) ) + η^fMRI(t)
where the hemodynamic state h(t) is represented by four variables: the blood flow f(t), its
time derivative ḟ(t), the blood volume v(t) and the blood oxygenation q(t). ε, κ_s, κ_f, τ, α, E_0, V_0,
a_1 and a_2 are physiological parameters. The measure noise η^fMRI(t) is Gaussian and
independent in time and space.
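A forward-Euler integration of equation (10) for a single source can be sketched as follows (our own illustration; the parameter values are only indicative of the physiological range, not the chapter's exact settings):

```python
import numpy as np

# Illustrative Euler integration of the balloon model (eq. 10), one source.
# Parameter values are merely indicative (assumption, not the chapter's).
eps, kappa_s, kappa_f = 0.5, 0.65, 0.41
tau, alpha, E0, V0, a1, a2 = 0.98, 0.32, 0.34, 0.02, 7.0, 2.0
dt, n_t = 0.01, 3000                      # 30 s simulated at 100 Hz

u = np.zeros(n_t); u[100:150] = 1.0       # brief neural event at t = 1 s
f, df, v, q = 1.0, 0.0, 1.0, 1.0          # flow, flow derivative, volume,
y = np.zeros(n_t)                         # deoxygenation; baseline = 1
for k in range(n_t):
    ddf = eps * u[k] - kappa_s * df - kappa_f * (f - 1)
    dv = (f - v ** (1 / alpha)) / tau
    dq = (f * (1 - (1 - E0) ** (1 / f)) / E0 - v ** (1 / alpha) * q / v) / tau
    f, df, v, q = f + dt * df, df + dt * ddf, v + dt * dv, q + dt * dq
    y[k] = V0 * (a1 * (v - 1) + a2 * (q - 1))  # noiseless fMRI signal
```

The second-order dynamics in f match the text's description of h(t) containing the blood flow and its time derivative; the resulting BOLD-like signal lags the 0.5 s neural event by several seconds.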
Fig. 4. EEG-fMRI fusion using the linear Kalman filter and smoother. (A) A neural activity has
been generated on 2,000 sources spread over the cortical surface according to the model; then
EEG and fMRI measures of this activity were generated. Five snapshots of the activity at intervals
of 100 ms are shown (row ‘SOURCE’). Below are shown the Kalman reconstructions using only
EEG measures (‘EEG-only’), only fMRI measures (‘fMRI-only’), or both together (‘FUSION’).
Noticeably, the EEG estimate is very smooth in space, the fMRI estimate varies very slowly in time,
and the fusion estimate is the closest to the true activity. The white arrows indicate one
particular source location, and the graph on the right compares the time courses of the true
activity of this source with its three estimations. We can also observe that the fMRI estimate is
much smoother than the EEG estimate, and that the fusion estimate is the closest to the true
signal. (B) In a second simulation, neural activity was chosen to represent a brief (50ms)
activation of a few neighboring sources. Similar displays show that EEG finds a brief but
spread activity, fMRI finds a precisely localized but slow activity, and the fusion estimate finds
a trade-off between the two of them. Two locations are shown with arrows: the center of the
true activity, and the center of the activity found by EEG alone: again, the two graphs on the
right show how the fusion algorithm finds a trade-off between the two previous estimates.
(C) In a third simulation, the generated activity is smooth in both time and space. The results
are similar to (A), and it appears even more clearly that the fusion algorithm efficiently
combines information provided by EEG and fMRI.
Note that the above physiological parameters, as well as the variances of the different
evolution and measure noises, are fixed to physiologically-meaningful values (Friston
et al., 2000) for both simulation and estimation.
2.2 Results
We used the forward model (equation (8)) to generate a random neural activity at 2,000 source
locations spread over the cortical surface, for a duration of 1 min at a 200 Hz sampling rate
(Figure 4(A), first row), and then generated EEG (equation (9)) and fMRI (equation (10)) measures
of this activity. The Kalman filter and smoother (equations (4)-(7)) were then used to estimate
the original activity, based either on the EEG measure alone, the fMRI measure alone, or both together.
Figure 4 shows these estimation results and compares them to the true initial activity: the
left part compares snapshots of the activity map (both true and estimated) at different time
instants, and the right part compares the true and estimated time courses of one selected
source.
Specific characteristics of the estimation results can be observed: the EEG-alone estimations
are very smooth spatially, but change fast in time; inversely, the fMRI-alone estimations
have more spatial details, but vary very slowly in time; finally, the fusion estimations are the
most similar to the true activity. All this is in accordance with the idea that EEG-fMRI
should combine the good temporal resolution of EEG and good spatial resolution of fMRI.
More quantitatively, table 1 (first column) shows indeed that the fusion estimate is the one
which best correlates with the true activity. Moreover, the part of variance explained by the fusion
estimate is almost equal to the sum of the parts of variance explained by the EEG and fMRI
estimates, indicating that the two modalities capture complementary rather than redundant
information, and that the fusion algorithm efficiently combines this information.
This is shown in more detail in figure 5, where true and estimated activities have
been decomposed as sums of activities in specific temporal and spatial bandwidths. Thus,
the goodness of fit could be computed independently for different temporal and spatial
frequencies. We can then observe that fMRI accurately estimates activities with low
temporal frequencies (blue surface), while EEG best estimates activities with low spatial
frequencies (green surface). The EEG-fMRI fusion (yellow surface) efficiently combines
the information provided by both measures, since it provides a better fit than either measure
alone in every frequency domain.
As a result, it appears that EEG-fMRI fusion performs best on activities which
are smooth both in time and space, as opposed to fast-varying activities. We illustrated this
result by producing two additional simulations. First, we generated a very sparse and
focused activity, by activating only a small number of neighboring sources during a short
period of time (50ms). Second, we generated a smooth activity, by performing a temporal
and a spatial low-pass filtering of activity generated by the model. In both cases, we used
the model to generate corresponding EEG and fMRI measures, and then estimated the
activity: results are shown in figure 4, parts B and C, and the goodness of fit is displayed in
table 1, two rightmost columns. In both cases, EEG and fMRI collaborate to produce a fusion
estimation better than either achieves alone. As expected from the previous
considerations, the fusion is very satisfying on the smooth dataset, and more than 50% of the
variance of the true sources activity is explained. On the contrary, its result on the sparse
activity is disappointing: it seems to give an average of the EEG and fMRI estimates rather
than recovering the focused activity.
Fig. 5. Spectral analysis of EEG-fMRI fusion efficiency. The correspondence between the true
activity and the estimated activity is quantified independently in separate temporal and spatial
frequency domains, using the percentage of variance explained as in table 1. We observe that
fMRI estimation is accurate at low temporal frequencies, while EEG estimation is accurate at low
spatial frequencies. The fusion estimation is efficient, in the sense that its accuracy is superior
to both the EEG and fMRI estimations in all frequency domains; in the specific domains
where EEG and fMRI do bring complementary information (at temporal scales of 1~10 s and
spatial scales >2 cm), it yields even higher values by combining this information.
2.3 Discussion
2.3.1 Computational cost
Our simulations prove the feasibility of EEG-fMRI fusion at a realistic scale despite its high
dimensionality, at least in the case of a linear forward model, using the Kalman filter and
smoother algorithms. We would like to stress here the specific aspects of their
implementation which made it possible, and further improvements which are desirable.
First of all, it is important to note that the use of a linear model was critical for
keeping a reasonable computational cost, because the Kalman filter and smoother exhibit
some particular properties which would have been lost if we had used the extended Kalman
filter to run estimations using the exact nonlinear equations (in particular, equation (10)).
Indeed, the most costly operations are the matrix computations in equations (5)-(7). The P_k^{k-1}
and P_k^k variance matrices describe the degree of certainty about the hidden state estimation
just before and after the measurement update; they obviously depend on the characteristics of the
noise (matrices Q_0, Q and R), but, in the case of the linear Kalman filter, they do not depend
on the values of the measurements y_k. Moreover, they converge to some limits (Welling,
n.d.), which we note P̄ and P̂, when k → +∞; these limits are the solutions of the system:

P̄ = A P̂ A^T + Q
P̂ = P̄ - P̄ D^T (D P̄ D^T + R)^{-1} D P̄
Therefore, the variance matrices can be pre-computed, and for sufficiently large values of k
it is possible to use P̄ and P̂ instead, which in the case of long experiments decreases
dramatically the number of times that the matrix operations in (5) and (6) must be
performed. We can even go one step further by choosing Q_0 = P̄: then, for all k, we have
P_k^{k-1} = P̄ and P_k^k = P̂. This choice for Q_0 is equivalent to saying that at time 0 the system has
been running for a long time already according to its natural evolution equations, and that
we have an information on its state from previous measures; since in fact such previous
measures do not exist, this a priori variance is too small, which could have the consequence
that the beginning of the estimated sequence underfits the measures; however, we did not
observe important estimation errors at the beginning of our simulations; on the other hand,
this choice is particularly convenient in terms of computational and memory cost.
The EEG-fMRI fusion algorithm can then be performed the following way:
• Apply iteratively the matrix computations in (5) and (6) until P_k^{k-1} and P_k^k converge to
their limits P̄ and P̂, starting with P_1^0 = Q_0 and stopping after a number of iterations
determined heuristically
• Compute K = P̄ D^T (D P̄ D^T + R)^{-1} and J = P̂ A^T P̄^{-1}
• Apply the vector operations in equations (5), (6) and (7) to compute the estimation of
the hidden states, using P_k^{k-1} = P̄ and P_k^k = P̂ for all k. Note that it is not necessary to
compute the values of the P_k^n variance matrices.
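The three steps above can be sketched as follows (our own numpy illustration; the fixed iteration count stands in for the heuristic stopping rule mentioned above):

```python
import numpy as np

def steady_state_fusion(A, D, Q, R, y, n_iter=300):
    """Sketch (ours) of the precomputed-gain Kalman filter/smoother:
    iterate the covariance recursions to their limits Pbar and Phat,
    fix the gains K and J, then apply only the cheap vector recursions."""
    dim_x = A.shape[0]
    I = np.eye(dim_x)
    # step 1: iterate the covariance recursions until convergence
    Pbar = Q.copy()
    for _ in range(n_iter):
        K = Pbar @ D.T @ np.linalg.inv(D @ Pbar @ D.T + R)
        Phat = (I - K @ D) @ Pbar
        Pbar = A @ Phat @ A.T + Q
    # step 2: fixed filter and smoother gains
    K = Pbar @ D.T @ np.linalg.inv(D @ Pbar @ D.T + R)
    Phat = (I - K @ D) @ Pbar
    J = Phat @ A.T @ np.linalg.inv(Pbar)
    # step 3: vector recursions only (Q0 = Pbar, zero prior mean)
    n = len(y)
    x_filt = np.zeros((n, dim_x))
    xp = np.zeros(dim_x)
    for k in range(n):
        x_filt[k] = xp + K @ (y[k] - D @ xp)   # measurement update
        xp = A @ x_filt[k]                     # time update
    x_smooth = x_filt.copy()
    for k in range(n - 2, -1, -1):             # backward sweep
        x_smooth[k] = x_filt[k] + J @ (x_smooth[k + 1] - A @ x_filt[k])
    return x_smooth
```

Because the gains are fixed, each time step then costs only matrix-vector products, which is what makes long experiments tractable.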
The total computational cost is determined by the cost of the first step, the matrix
convergence. The time complexity of this convergence is O(n³f_s), where n is the number of
cortical sources and f_s the sampling frequency: indeed, the limiting factor is the
multiplication of matrices of size O(n), repeated O(f_s) times to reach convergence. Since
memory limitations can also occur for large values of n, it is also important to know the space
complexity: it is equal to O(n²), the number of elements of the variance matrices. For our
simulations, the convergence of the variance matrices (300 iterations) took 24 hours on an
AMD Opteron 285 (2.6 GHz) and occupied 5 GB of RAM. Since we used n = 2,000 sources, and
since the size of the hidden state x(t) = {u(t), h(t)} is 5n, it involved multiplications and
inversions of square matrices of size 10,000.
One should note that such size is in fact quite small, compared to the total size of the neural
activity to be estimated. For example, in our simulation the number of elements of u is n*fs*T
= 2,000*200*60 = 24,000,000. As mentioned in the introduction, decreasing this
dimensionality was made possible by the use of adaptive filtering.
A promising direction for decreasing the computational cost would be to describe the
variance matrices in a more compact way than full symmetric matrices. Indeed, although
the matrices are not sparse, they exhibit some specific structure, as shown in figure 6, and
studying carefully this structure should enable the description of the matrices using a lower
dimension representation. Maybe these matrices can even be merely approximated: indeed,
they contain covariance information between all pairs of hidden states, for example between
the electric state at one location and the hemodynamic state at a second, distant location,
which is not necessarily critical for the estimation. However, it must be ensured that
approximations do not affect the numerical stability of the algorithm.
Another approach would be to use specialized numerical methods in order to reach the
convergence of the matrices P̄ and P̂ faster, unless an analytic expression as a function of Q and
R can be derived!
2.3.2 Algorithm
Our simulations showed that using the Kalman filter and smoother to perform EEG-fMRI
fusion is efficient when estimating an activity which is smooth in space and time. In such
case indeed, the effects of the disparity between the resolutions of both modalities are
attenuated, therefore the information they provide on the underlying neural activity can be
combined together in an efficient way.
On the contrary, when estimating a sparse and focused activity as in figure 4(B), results are
more disappointing. The EEG-only estimation is focused in time, but spread and imprecise
in space, while the fMRI-only estimation is focused in space, but spread in time. Thus, an
ideal fusion algorithm should be able to reconstruct an activity focalized both in time and
space. But instead, our fusion estimate looks more like an average between the EEG-only
and fMRI-only estimates.
In fact, this is a direct consequence of using the Kalman filter; more precisely, it is a consequence of
using Bayesian methods which assume Gaussian distributions! Indeed, when using the
Kalman filter or any such method, the a priori distribution of the neural activity u is a
Gaussian (fully described in our case by the initial distribution p(u1) and the Markov chain
relation p(uk+1|uk)), and estimating u relies on a term which minimizes an L2-norm of u (i.e.
a weighted sum of squared differences). We show using schematic examples that the
minimization of an L2-norm results in solutions which are smooth rather than sparse, and in
the case of two measures of the same activity u which have very different temporal and
spatial resolutions, estimation results show a pattern similar to that observed above, with two
components, one smooth in time, the other smooth in space.
These examples are as in figure 2(C): a 2-dimensional array represents u, one dimension
standing for time and the second for space. We use a simple forward model: "EEG" and
"fMRI" measures are obtained by low-pass filtering and subsampling u in the spatial and
temporal dimension, respectively, resulting in lower resolution versions of u. The
corresponding inverse problem – estimating u given the two measures – is then strongly
under-determined, as is the real EEG-fMRI fusion inverse problem, and some constraint must
be imposed on the estimate û in order to select among an infinity of solutions. We
compared minimizing the L2-norm of û, i.e. its mean square, and minimizing the L1-norm,
i.e. its mean absolute value. Figure 7 shows several different estimations, using various
patterns for the true activity u.
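This behavior can be reproduced with a small numerical experiment. The sketch below is ours, not the chapter's schematic example: it replaces the 2-D activity array by a 1-D sparse vector, uses a random under-determined forward operator A, and solves the L1 problem with ISTA (iterative soft thresholding), a standard l1 solver:

```python
import numpy as np

rng = np.random.default_rng(0)

# 1-D stand-in for the schematic fusion problem: recover a sparse vector u
# from an under-determined linear measurement y = A u.
n, m, k = 60, 30, 3                       # unknowns, measurements, true nonzeros
A = rng.standard_normal((m, n)) / np.sqrt(m)
u_true = np.zeros(n)
u_true[rng.choice(n, k, replace=False)] = 1.0
y = A @ u_true

# L2 solution: minimum-norm least squares (what Gaussian/Kalman methods favor).
u_l2 = np.linalg.lstsq(A, y, rcond=None)[0]

# L1 solution: ISTA, i.e. gradient steps on ||y - A u||^2 plus soft thresholding.
step = 1.0 / np.linalg.eigvalsh(A.T @ A).max()
lam = 0.01
u_l1 = np.zeros(n)
for _ in range(2000):
    g = u_l1 + step * (A.T @ (y - A @ u_l1))
    u_l1 = np.sign(g) * np.maximum(np.abs(g) - step * lam, 0.0)

def support(u, tol=0.05):
    return int(np.sum(np.abs(u) > tol))

print(support(u_l2), support(u_l1))   # the L1 estimate is far sparser
```

The minimum-norm L2 solution spreads the energy over nearly all coordinates, whereas the L1 solution concentrates it on a few, mirroring the smooth-versus-sparse contrast of figure 7.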
248 Adaptive Filtering
Fig. 7. Schematic reconstructions illustrate the different behavior of L2-norm based and L1-
norm based algorithms. Different patterns of activity are generated, one per column, to
produce a 2-dimensional array as in figure 2(C), representing the neural activity. Schematic
EEG and fMRI measures are generated, again as in figure 2(C). The first row displays this
activity. The second and third rows display reconstructions of the activity from the two
measures, based either on the minimization of the L2-norm or of the L1-norm of the source. We
can notice that the L1-norm method produces sparse estimates (which is preferable in the case of
a Dirac, a set of local activities, or when only a few sources are activated). The L2-norm
method produces smoother estimates, more precisely the sum of two components, one
smooth in time, the second smooth in space; it is more appropriate in the case of a smooth
activity (the ‘correlated noise’ column).
Very clearly, the L1-norm leads to sparse estimates, while the L2-norm leads to smooth
estimates, which, more precisely, show two components, one smooth in time, the second
smooth in space. For example, a Dirac is perfectly estimated by the L1-norm
algorithm, and very poorly by the L2-norm algorithm. A set of focal activities is also better
estimated using the L1-norm. On the contrary, a smooth activity is better estimated using the
L2-norm.
In fact, all this is quite well-known in the image processing domain (Aubert and Kornprobst
2006; Morel and Solimini 1995). However, the consequences are important: algorithms based
on L2-norm such as the Kalman filter and smoother are appropriate for EEG-fMRI fusion
whenever the neural activity to be estimated is known to be smooth in time and space (one
can think for example about spreading waves of activity). Inversely, if the activity is known
to be sparse, the inverse problem should be solved through the minimization of other
norms, such as the L1-norm, which means that any Bayesian method relying on Gaussian
distributions would not be appropriate. Rather, this calls for the development of new
algorithms, if possible still performing a temporal filtering, which would be based for
example on the L1-norm.
EEG-fMRI Fusion: Adaptations of the Kalman
Filter for Solving a High-Dimensional Spatio-Temporal Inverse Problem 249
Fig. 8. Limits of the extended Kalman filter and smoother. (A) Left: the time course of a
scalar hidden state is generated as a Brownian motion (solid line), and a 2-element noisy
measure is generated, the first measure being obtained by the linear identity function (dark
gray crosses), and the second by the nonlinear absolute value function (light gray crosses).
Right: reconstruction of the hidden state by the Kalman filter and smoother; black solid line:
true time course; gray solid line: estimated time course; gray dashed lines: representation
of the a posteriori variance; the true time course indeed lies within this variance. (B) More
weight is given to the nonlinear measure (the linear measure is noisier, while the nonlinear
measure is less noisy). The estimation now fails: it gets trapped in a local minimum where the
sign of the hidden state is misestimated.
Second, EKF can handle only small nonlinearities, and not strong ones such as the use of an
absolute value. Simulations in figure 8 demonstrate this fact: we use a simple dynamical
model, the hidden state is a one-dimensional Brownian motion, and the measure consists of
two scalars, the first equal to the hidden state plus noise, the second equal to the absolute
value of the hidden state plus noise (left part of the figure). In figure 8(A), the noise in the first
measure is sufficiently low so that the estimation using the Kalman filter and smoother is
successful. On the contrary, in figure 8(B), the noise in the first measure is higher, so that
information on the sign of the hidden state is weaker: then the estimation diverges strongly
from the true value. More precisely, the estimation gets trapped in a local minimum, and the
local linearizations and approximations with a Gaussian performed by the EKF become
totally erroneous compared to the real a posteriori distribution.
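Figure 8's failure mode can be reproduced in a few lines. The sketch below is a scalar EKF of our own on the model just described, with arbitrary parameter values, and it keeps only the nonlinear measure (the worst case): the linearization of |x| is H = sign(x̂), so an estimate started with the wrong sign can never cross zero:

```python
import numpy as np

rng = np.random.default_rng(1)

# Worst case: only the nonlinear measure y = |x| + noise is available.
Q, R, n = 0.005, 0.05, 200
x = 3.0 + np.cumsum(rng.normal(0.0, np.sqrt(Q), n))   # true state, positive
y = np.abs(x) + rng.normal(0.0, np.sqrt(R), n)

xe, P = -1.0, 1.0        # deliberately start the estimate with the wrong sign
est = np.empty(n)
for k in range(n):
    P = P + Q                                 # time update
    H = np.sign(xe) if xe != 0.0 else 1.0     # Jacobian of |x| at the estimate
    K = P * H / (H * P * H + R)               # Kalman gain
    xe = xe + K * (y[k] - abs(xe))            # EKF measure update
    P = (1.0 - K * H) * P
    est[k] = xe

# |est| tracks |x| accurately, but the sign never recovers: the filter is
# trapped in the wrong mode of the (bimodal) a posteriori distribution.
print(np.mean(np.abs(np.abs(est) - np.abs(x))), np.mean(est * x < 0.0))
```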
As a consequence, we would like to encourage the development of new adaptive filter
algorithms, based for example on particle filtering. However, we will show in this section,
on low-dimensional examples, that it is still possible to use the extended Kalman filter,
thanks to (i) a modeling of electric-metabolic relations which minimizes nonlinearities and
can handle the strong disparity between the fast temporal variations of the EEG signals and
slow variations of the fMRI signals, and (ii) a new backward-forward estimation scheme
that minimizes estimation errors.
3.1 Methods
3.1.1 Physiological model
Several studies attest that fMRI signals correlate particularly well with oscillating EEG activities
in specific frequency domains (Kilner et al., 2005; Laufs et al., 2003; Martinez-Montes et al.,
2004; Riera et al., 2010). In such cases, the hemodynamic activity depends linearly, not on the
current intensities themselves, but rather on the power of the currents in these specific
frequencies. We propose here a simple model for the current intensity that favors oscillating
patterns and makes it easy to link hemodynamics to the oscillation amplitude. It consists in
describing the current intensity at a given source location as:

u(t) = χ(t) cos(2π f_c t + φ(t))

where χ(t) is the envelope of the signal, f_c is the preferred frequency, and φ(t) a phase shift.
cos(2π f_c t) reflects the intrinsic dynamics of the region, while χ(t) reflects a modulation of
this activity in amplitude, and φ(t) a modulation in frequency (if φ̇(t) > 0, the frequency is
higher, and if φ̇(t) < 0, the frequency is lower). Note that although u(t) varies fast, χ(t) and
φ(t) can change at a slower pace (figure 9).
The evolution of χ and φ is modeled as follows:
χ̇(t) = −λ_χ χ(t) + ξ_χ(t),   ξ_χ ~ N(0, Q_χ)
φ̇(t) = −λ_φ (φ(t) − φ̄_{σ_φ}(t)) + ξ_φ(t),   ξ_φ ~ N(0, Q_φ)     (17)
χ is driven by the same Ornstein-Uhlenbeck process as in our earlier model (8), except that we
did not allow χ to become negative (at each iteration of the generation of the simulated data,
whenever it would become negative, it was reset to zero). φ is driven by a similar innovation
process, but the phase shift at a given source location, instead of being pulled back towards
zero, is pulled back towards the average phase shift over neighboring locations, φ̄_{σ_φ}: this
ensures a spatial coherence between sources, controlled by the model parameter σ_φ.
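A minimal Euler-Maruyama simulation of the model (17) can make the two time scales concrete. This is our own sketch: the parameter values are arbitrary, and the neighborhood average φ̄_{σ_φ} is crudely replaced by the average over all sources:

```python
import numpy as np

rng = np.random.default_rng(0)

n_src, fs, T = 5, 200, 10.0        # sources, sampling rate (Hz), duration (s)
dt, n_t = 1.0 / fs, int(T * fs)
lam_chi, lam_phi, fc = 0.5, 0.5, 10.0
q_chi, q_phi = 0.2, 0.1

chi = np.ones(n_src)               # envelopes
phi = np.zeros(n_src)              # phase shifts
u = np.zeros((n_t, n_src))
for k in range(n_t):
    # Ornstein-Uhlenbeck step for the envelope, reset at zero as in the text
    chi += -lam_chi * chi * dt + np.sqrt(q_chi * dt) * rng.standard_normal(n_src)
    chi = np.maximum(chi, 0.0)
    # phase pulled back towards the average phase (global average used here)
    phi += -lam_phi * (phi - phi.mean()) * dt + np.sqrt(q_phi * dt) * rng.standard_normal(n_src)
    u[k] = chi * np.cos(2 * np.pi * fc * k * dt + phi)

print(u.shape)   # fast oscillation at fc, slowly varying envelope and phase
```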
Fig. 9. Hidden states of the nonlinear model for neural activity. The signal u is defined by
two hidden states χ and φ which can vary much slower than u itself.
Then we model the EEG measure as in equation (9), i.e. it is a linear function of u(t). Note
however that, since u itself is now represented by χ and φ, the EEG measure has become a
nonlinear function of the hidden state: the transition x_t → y_t^EEG is now nonlinear.
Besides, the fMRI measure now depends linearly on the signal envelope χ(t): the first equation
in (10) is replaced by

f̈(t) = ε χ(t) − κ_s ḟ(t) − κ_f (f(t) − 1).     (18)
The complete state-space model now takes the general nonlinear form:

x_1 = x^0 + ξ_0,   ξ_0 ~ N(0, Q^0)
x_{k+1} = F(x_k) + ξ_k,   ξ_k ~ N(0, Q)     (19)
y_k = G(x_k) + η_k,   η_k ~ N(0, R)
where F and G are nonlinear evolution and measure functions. Whereas it would be possible
to linearize F again without major harm, the measure function G on the contrary is highly
nonlinear because of the cos(2π f_c t + φ(t)) term in the EEG measure.
We would like to use the extended Kalman filter and smoother (Arulampalam et al., 2002;
Welch & Bishop, 2006) to estimate the hidden state. However, as observed in figure 8, small
errors in the estimation can lead to erroneous linearizations, and therefore to increased errors in
the next filtering steps, causing divergence. Also, as explained earlier, the classical Kalman filter
estimates p(x_k | y_1, …, y_k) for k = 1…n, and the Kalman smoother estimates p(x_k | y_1, …, y_n) for k
= n−1…1. The problem is that p(x_k | y_1, …, y_k) does not use any fMRI information to estimate x_k,
since the fMRI measure occurs only several seconds later. Therefore, we propose to first run a
backward sweep during which we estimate p(x_k | y_k, …, y_n) for k = n…1, and only afterwards a
forward sweep, to estimate p(x_k | y_1, …, y_n) for k = 2…n. Indeed, p(x_k | y_k, …, y_n) will incur fewer
estimation errors since {y_k, …, y_n} contains both EEG and fMRI information about the activity x_k.
We introduce the new notation:
x_k^l = E(x_k | y_l, …, y_n)
P_k^l = V(x_k | y_l, …, y_n)     (20)
As a preliminary, it is important to point out that, in the absence of any measurement, all states
x_k have the same a priori distribution N(x^0, P^0), where P^0 can be obtained by repeating the
Kalman filter time update step until convergence, i.e. until P^0 = A P^0 A^T + Q.
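In matrix form, this iteration is the standard discrete Lyapunov recursion; a small sketch (with an arbitrary stable A and Q of our choosing):

```python
import numpy as np

# Iterate the time update P <- A P A^T + Q to its fixed point (values ours).
A = np.array([[0.9, 0.1],
              [0.0, 0.8]])           # stable evolution matrix (|eigenvalues| < 1)
Q = 0.1 * np.eye(2)
P = np.zeros((2, 2))
for _ in range(1000):
    P_next = A @ P @ A.T + Q
    if np.max(np.abs(P_next - P)) < 1e-12:
        P = P_next
        break
    P = P_next
# at convergence P solves the discrete Lyapunov equation P = A P A^T + Q
print(P)
```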
First, the backward sweep starts with the a priori distribution of x_n:

x_n^{n+1} = x^0
P_n^{n+1} = P^0     (21)

and then alternates, for k = n, n−1, …, 1, the classical Kalman measure update with the
following backward time update:

J = P^0 A^T (A P^0 A^T + Q)^{−1}
x_k^{k+1} = x^0 + J (x_{k+1}^{k+1} − x^0)     (23)
P_k^{k+1} = J P_{k+1}^{k+1} J^T + (I − J A) P^0
where A is the derivative of the evolution function at x_k^{k+1}: F(x) ≈ F(x_k^{k+1}) + A (x − x_k^{k+1}).
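The backward sweep can be sketched on a scalar linear model (our own toy example, with arbitrary values): it alternates the standard measure update with the backward time update (23), starting from the prior as in (21):

```python
import numpy as np

rng = np.random.default_rng(2)

# Scalar linear model x_{k+1} = A x_k + noise, y_k = x_k + noise (values ours).
A, Q, R, n = 0.95, 0.1, 0.25, 300
P0 = Q / (1 - A**2)                  # a priori variance (time-update fixed point)
x0 = 0.0                             # a priori mean

x = np.empty(n)
x[0] = rng.normal(0.0, np.sqrt(P0))
for k in range(1, n):
    x[k] = A * x[k - 1] + rng.normal(0.0, np.sqrt(Q))
y = x + rng.normal(0.0, np.sqrt(R), n)

# Backward sweep: initialize with the prior (21), then alternate the classical
# Kalman measure update with the backward time update (23).
J = P0 * A / (A * P0 * A + Q)
xe, P = x0, P0
est = np.empty(n)
for k in range(n - 1, -1, -1):
    K = P / (P + R)                  # measure update (same as the Kalman filter)
    xe = xe + K * (y[k] - xe)
    P = (1.0 - K) * P
    est[k] = xe                      # estimate of E(x_k | y_k, ..., y_n)
    xe = x0 + J * (xe - x0)          # backward time update (23)
    P = J * P * J + (1.0 - J * A) * P0

mse_prior = np.mean(x**2)            # error when only the prior mean is used
mse_back = np.mean((est - x)**2)
print(mse_prior, mse_back)           # the sweep clearly improves on the prior
```

Note that at the fixed point A P0 A + Q = P0, so here J = A; without any measurement the recursion leaves the prior untouched, as it should.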
Second, the forward smoothing step repeats for k = 2, 3, …, n.
We derived these equations in a similar way to how the measure update step is derived in
(Welling, n.d.):
- Our measure update is the same as for the Kalman filter.
- The backward time update is obtained by first proving that
p(x_k | x_{k+1}) = N(x^0 + J (x_{k+1} − x^0), (I − J A) P^0), and then by calculating
p(x_k | y_{k+1}, …, y_n) = ∫ p(x_k | x_{k+1}) p(x_{k+1} | y_{k+1}, …, y_n) dx_{k+1}.
Since we observed some divergences when applying this second smoothing step, due to
accumulating errors in the estimation of the phase, we applied it only to the estimation of
the envelope (and kept the result from the first sweep for the phase estimate).
3.2 Results
We have run this algorithm on a highly reduced dataset where the cortical surface was
downsampled to 10 cortical patches, and only 3 EEG electrodes were considered (so as to
preserve the under-determination of the EEG inverse problem). We generated 1-minute-long
patterns of activity, sampled at 200 Hz, with an average frequency of the oscillations
fc = 10 Hz. First we generated a cortical activity and measures according to the model (17);
then we estimated the activity from these measures: figure 10(A) shows estimation results,
which are quantified in table 2, first two columns. We compare estimations obtained by
applying the filtering technique to either the EEG alone, fMRI alone, or both together (in the
case of fMRI alone, since there is no information on the phase φ, only the estimated envelope
is displayed): we see that the envelope of the signals is best estimated using fMRI alone,
whereas the EEG-fMRI fusion estimates the signals better than EEG alone. We also generated
a one-second-wide pulse of activity and noisy fMRI and EEG measures, and the estimation
results (figure 10(B) and last two columns in table 2) are even more significant: the fusion
estimate clearly provides the best estimate of the signals and of their envelopes.
3.3 Discussion
3.3.1 Algorithm
We briefly discuss here the specific ideas presented in this section. First, the way we
proposed to describe the electric activity in terms of envelope and phase is very specific, and
should be used only in the analysis of experiments where it is well-founded (i.e.
experiments where the fMRI signals are known to correlate with EEG activity in a given
frequency range). However, the key idea of this modeling could still be used in different
models: it is to set apart the information which is accessible to fMRI, in our case χ, the
envelope of the activity, which somehow measures "how much the source is active" in total,
but is ignorant of the exact dynamics of the currents u. In such a way, EEG and fMRI
measures collaborate in estimating χ, but the phase shift φ is estimated only by the EEG (at
least at a first approximation, since in fact in the fusion algorithm, all the different states are
linked through their covariance). Furthermore, χ is easier to estimate for the fMRI since it
can evolve slower than u, which corresponds better to its weak temporal resolution; and
similarly, φ can be easier to estimate for the EEG if there is a high phase coherence among
the sources of a same region, because then less spatial resolution is needed.
Fig. 10. EEG-fMRI fusion using the nonlinear Kalman algorithm. (A) A neural activity has
been generated on 10 sources according to the nonlinear model, then EEG and fMRI
measures of this activity were generated (panel ‘SOURCE’). We can observe in the structure
of this activity the modulations of χ, the envelope of the signals, the intrinsic oscillatory
activity at frequency fc, and, in the magnified area, the dephasing between different sources
due to the variations of the state φ. EEG and fMRI measures were generated according to the
model, and estimations were performed on either EEG, fMRI or both (the 3 next panels).
Note that the fMRI estimate is not oscillating since φ is not estimated. We can observe the
improved estimation in the fusion case compared to the EEG case. (B) In a second
simulation, a single pulse of activity was generated. We can observe very well that the EEG
estimate finds the accurate dynamics, the fMRI estimate finds the accurate localization, and
the fusion produces a very satisfying estimate, with the correct dynamic and localization.
Indeed, we could run the estimation for ten (as shown here) to a few hundred sources spread
on the cortical surface, as shown in (Deneux & Faugeras, 2010), but estimating the activity of
several thousand sources would be too demanding, because the simplifications which apply
in the linear case (section 2.3.1) do not apply any more. However, there is large room for
further improvements.
Fig. 11. Details of the Kalman filter and smoother steps. Estimation obtained using the linear
model on a dataset with 1,000 sources with a focused activity. (A) After only the Kalman
filter has been applied, the neural activity estimate is not accurate, and one can recognize
two peaks, which ‘imprint’ the information given by the EEG and fMRI measures (the second
could not be placed correctly in time since the filter encounters the fMRI measures too late).
(B) After the Kalman smoother has been applied, this ‘fMRI-informed peak’ has been put
back at the correct time location.
Second, we want to stress that the adaptation we propose for the extended Kalman filter,
where the backward sweep is performed before the forward sweep, might be well-adapted
for many other situations where adaptive filters are used on a nonlinear forward model, but
where the measures of the phenomenon of interest are delayed in time. We already explained
above that the order between these two sweeps matters because accumulating errors cause
divergence; therefore it is preferable that the first sweep be as exact as possible. Note
that this is due exclusively to the errors made in the local linearizations performed by the
extended Kalman filter; on the contrary, the order of the sweeps does not matter in the case
of a linear model, and all the derivations for the means and variances of p(xk|y1,…,yl) are
exact. To illustrate this fact, we show in figure 11 the neural activity estimated in the
previous part by the linear Kalman filter and smoother: although the Kalman filter estimate
is not accurate, it somehow "memorizes" the information brought late by fMRI data (figure
11(A)); therefore, this information is later available to the Kalman smoother, which can then
"bring it back to the good place" (figure 11(B)).
Finally, new algorithms could be developed to specifically deal with the strong nonlinearities
of the model. For example, (Plis et al., 2010) proposed a particle filter algorithm (Arulampalam
et al., 2002) to estimate the dynamics of a
given region of interest based on EEG and fMRI measures. Unfortunately, the fact that they
reduced the spatial dimension to a single ROI reduced the interest of their method, since the
fMRI could not bring any additional information compared to the EEG, and they admit that
increasing the spatial dimension would pose important computational cost problems, as the
number of particles used for the estimation should increase exponentially with the number
of regions. However, particle filters seem to be a good direction of research, all the more
since they do not assume specific Gaussian distributions, and hence – as we saw in the
previous part – could be more efficient in estimating sparse and focused activities.
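To illustrate why particle filters are attractive here, the following toy bootstrap particle filter (our own sketch, unrelated to Plis et al.'s implementation) handles the absolute-value measure that defeats the EKF in figure 8: the particle cloud keeps both sign hypotheses alive, and the identifiable quantity |x| is estimated accurately:

```python
import numpy as np

rng = np.random.default_rng(3)

# Same troublesome measure as in figure 8: y = |x| + noise (values ours).
Q, R, n, n_part = 0.01, 0.1, 100, 500
x_true = 2.0 + np.cumsum(rng.normal(0.0, np.sqrt(Q), n))
y = np.abs(x_true) + rng.normal(0.0, np.sqrt(R), n)

parts = rng.normal(0.0, 2.0, n_part)      # prior particles: both signs present
est_abs = np.empty(n)
for k in range(n):
    parts += rng.normal(0.0, np.sqrt(Q), n_part)         # propagate
    w = np.exp(-0.5 * (y[k] - np.abs(parts))**2 / R)     # likelihood weights
    w /= w.sum()
    est_abs[k] = np.abs(parts) @ w                       # posterior mean of |x|
    parts = parts[rng.choice(n_part, n_part, p=w)]       # resample

# The posterior over x stays bimodal (+|x| and -|x|), which the particle
# cloud represents naturally; the identifiable envelope |x| is well estimated.
print(np.mean(np.abs(est_abs - np.abs(x_true))))
```

The catch, as noted above, is that the number of particles needed grows quickly with the state dimension, which is precisely the computational obstacle for whole-cortex estimation.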
Besides, there are interesting questions to ask about the dimension of the hidden neural
activity. On the one hand, we would like it to have a temporal resolution as good as that of
the EEG, and a spatial resolution as good as that of the fMRI. On the other hand, this leads
to a very high total dimension for u, which is the main cause of high computational cost, and
it is tempting to rather decrease this dimension. Several works have proposed approaches
based on regions of interests (Daunizeau et al., 2007; Ou et al., 2010), where the spatial
dimension is reduced by clustering the sources according to anatomical or physiological
considerations. It is possible however that new approaches will be able at the same time to
keep high temporal and spatial resolution and to decrease the total dimension of the source
space, by using multi-level or other complex descriptions of the neural activity relying on a
reduced number of parameters.
4. Conclusion
We have proposed two algorithms to perform EEG-fMRI fusion, both based on the Kalman
filter and smoother. Both algorithms aim at estimating the spatio-temporal patterns of a
hidden neural activity u responsible for both EEG and fMRI measurements, the relation
between them being described by a physiological forward model. The first algorithm
assumes a linear model and in such case, fusion appeared to be feasible on data of a realistic
size (namely, activity of 2,000 sources spread on the cortical surface, sampled at 200Hz
during several minutes). The second algorithm relies on a more accurate nonlinear model,
and though its increased computational cost led to estimations on only 100 sources, it gives
a proof of principle for using adaptive filters to perform EEG-fMRI fusion based on realistic
nonlinear models.
We would like to stress however that this progress on the methodological front should
not hide the fundamental physiological question of whether EEG and fMRI measure the
same phenomena at all, which we discussed in our introduction, and which recent
reviews (Rosa et al., 2010) pinpoint as still problematic. Nevertheless, even if the
question should be answered negatively in general, it remains that many simultaneous EEG-
fMRI studies proved to be successful in specific contexts such as epilepsy or α-rhythm
activity. In such cases, the development of new fusion algorithms which target the estimation
of the spatio-temporal patterns of neural activity should focus on such specific applications
and on the specific links which exist between the two measures in these contexts. This is
what we plan to do in our future work with real data: in fact, our nonlinear algorithm,
where we assumed a very specific oscillatory activity and a linear relation between the fMRI
BOLD signals and the envelope of this activity, is a preliminary for such development.
As we mentioned in the introduction, this work is not only a contribution to the human
brain imaging field, but also to the algorithmic field, since a new modification of the Kalman
filter was proposed that filters first backward, and second forward, and since it fosters the
development of new algorithms and filters. In particular, we call for new methods that can
handle nonlinear models and non-Gaussian distributions. We advocate in particular the
minimization of the L1-norm rather than the L2-norm in order to estimate accurately sparse and
focused activities, and the reduction of the computational cost of either Kalman methods or
particle filter methods through the use of low-dimensional representations of high-
dimensional distributions.
5. References
Ahlfors SP, Simpson GV. (2004). Geometrical interpretation of fMRI-guided MEG/EEG
inverse estimates, Neuroimage 1:323–332
Arulampalam, M. Sanjeev; Maskell, Simon; Neil Gordon & Clapp, Tim. (2002). A Tutorial on
Particle Filters for Online Nonlinear/Non-Gaussian Bayesian Tracking, IEEE
Transactions on Signal Processing, 50, 174-188
Bénar, C.; Aghakhani, Y.; Wang, Y.; Izenberg, A.; Al-Asmi, A.; Dubeau, F. & Gotman, J.
(2003). Quality of EEG in simultaneous EEG-fMRI for epilepsy, Clin Neurophysiol,
114, 569-580
Buxton, R. B.; Uludağ, K.; Dubowitz, D. J. & Liu, T. (2004). Modelling the hemodynamic
response to brain activation, NeuroImage, 23, 220-233
Daunizeau, J.; Grova, C.; Marrelec, G.; Mattout, J.; Jbabdi, S.; Pélégrini-Issac, M.; Lina, J. M.
& Benali, H. (2007). Symmetrical event-related EEG/fMRI information fusion in a
variational Bayesian framework, Neuroimage, 36, 69-87
Daunizeau J, Laufs H, Friston KJ. (2010). EEG-fMRI information fusion: Biophysics and data
analysis, In: EEG-fMRI: Physiology, Technique and Applications, Mulert, C. & Lemieux, L. (eds.),
Springer DE
Deneux, T. & Faugeras, O. (2006a). Using nonlinear models in fMRI data analysis: model
selection and activation detection, Neuroimage, 32, 1669-1689
Deneux, T. & Faugeras, O. (2006b). EEG-fMRI fusion of non-triggered data using Kalman
filtering, International Symposium on Biomedical Imaging, 2006, 1068-1071
Deneux, T. & Faugeras, O. (2010). EEG-fMRI Fusion of Paradigm-Free Activity Using
Kalman Filtering, Neural Comput., 22, 906-948
Friston, K. J.; Mechelli, A.; Turner, R. & Price, C. J. (2000). Nonlinear Responses in fMRI : the
Balloon Model, Volterra Kernels, and Other Hemodynamics, NeuroImage, 12, 466-
477
Goldman, R. I.; Stern, J. M.; Engel, J. J. & Cohen, M. S. (2002). Simultaneous EEG and fMRI of
the alpha rhythm, Neuroreport, 13, 2487-2492
Gotman, J.; Bénar, C. & Dubeau, F. (2004). Combining EEG and fMRI in epilepsy:
Methodological challenges and clinical results, Journal of Clinical Neurophysiology, 21
Grova C, Daunizeau J et al. (2008). Assessing the concordance between distributed EEG
source localization and simultaneous EEG-fMRI studies of epileptic spikes,
Neuroimage 39:755–774
Hämäläinen, M.; Hari, R.; Ilmoniemi, R. J.; Knuutila, J. & Lounasmaa, O. V. (1993).
Magnetoencephalography: Theory, instrumentation, and applications to
noninvasive studies of the working human brain, Rev. Modern Phys., 65, 413-497
Johnston, L. A.; Duff, E.; Mareels, I. & Egan, G. F. (2008). Nonlinear estimation of the BOLD
signal, Neuroimage, 40, 504-514
Jun, S.-C., George, J. S., Pare-Blagoev, J., Plis, S. M., Ranken, D. M., Schmidt, D. M., and
Wood, C. C. (2005). Spatiotemporal Bayesian inference dipole analysis for MEG
neuroimaging data. Neuroimage 28, 84–98.
Kalman, R. E. (1960). A New Approach to Linear Filtering and Prediction Problems,
Transactions of the ASME--Journal of Basic Engineering, 82, 35-45
Kilner, J. M.; Mattout, J.; Henson, R. & Friston, K. J. (2005). Hemodynamic correlates of EEG:
a heuristic, Neuroimage, 28, 280-286
Laufs, H.; Kleinschmidt, A.; Beyerle, A.; Eger, E.; Salek-Haddadi, A.; Preibisch, C. &
Krakow, K. (2003). EEG-correlated fMRI of human alpha activity, Neuroimage, 19,
1463-76
Lemieux, L.; Krakow, K. & Fish, D. R. (2001). Comparison of spike-triggered functional MRI
BOLD activation and EEG dipole model localization, Neuroimage, 14, 1097-1104
Martinez-Montes, E.; Valdes-Sosa, P. A.; Miwakeichi, F.; Goldman, R. I. & Cohen, M. S.
(2004). Concurrent EEG/fMRI analysis by multiway Partial Least Squares,
Neuroimage, 22, 1023-1034
Murray, L., and Storkey, A. (2008). Continuous time particle filtering for fMRI, In: Advances
in Neural Information Processing Systems, Vol. 20, eds J. C. Platt, D. Koller, Y. Singer,
and S. Roweis (Cambridge, MA: MIT Press), 1049–1056.
Ogawa, S.; Menon, R. S.; Tank, D. W.; Kim, S.; Merkle, H.; Ellerman, J. M. & Ugurbil, K.
(1993). Functional brain mapping by blood oxygenation level-dependent contrast
magnetic resonance imaging: a comparison of signal characteristics with a
biophysical model, Biophys. J., 64, 803-812
Ou, W.; Nummenmaa, A.; Ahveninen, J. & Belliveau, J. W. (2010). Multimodal functional
imaging using fMRI-informed regional EEG/MEG source estimation Neuroimage.,
52, 97-108
Riera, J.; Watanabe, J.; Kazuki, I.; Naoki, M.; Aubert, E.; Ozaki, T. & Kawashima, R. (2004). A
state-space model of the hemodynamic approach: nonlinear filtering of BOLD
signals, NeuroImage, 21, 547-567
Riera, J. & Sumiyoshi, A. (2010). Brain oscillations: Ideal scenery to understand the
neurovascular coupling, Curr Op Neurobiol, 23, 374-381
Rosa, M. J.; Daunizeau, J. & Friston, K. J. (2010). EEG-fMRI integration: A critical review of
biophysical modeling and data analysis approaches, J Integr Neurosci, 9, 453-476
Waites, A. B., Shaw, M. E., Briellmann, R. S., Labate, A., Abbott, D. F., & Jackson, G. D.
(2005). How reliable are fMRI-EEG studies of epilepsy? A nonparametric approach
to analysis validation and optimization. Neuroimage, 24(1), 192–199.
Welch, G. & Bishop, G. (2006). An Introduction to the Kalman Filter
Welling, M., Max Welling’s classnotes in machine learning. (N.d.). Available online at
http://www.cs.toronto.edu/~welling/classnotes/papers_class/KF.ps.gz.
11
Adaptive-FRESH Filtering
Omar A. Yeste Ojeda and Jesús Grajal
Universidad Politécnica de Madrid
Spain
1. Introduction
Adaptive filters are self-designing systems for information extraction which rely for their
operation on a recursive algorithm (Haykin, 2001). They find application in environments
where the optimum filter cannot be applied because of lack of knowledge about the
signal characteristics and/or the data to be processed. This a priori knowledge required
for the design of the optimum filter is commonly based on stochastic signal models,
which traditionally are stationary. However, the parameters of many signals found in
communication, radar, telemetry and many other fields can be represented as periodic
functions of time (Gardner, 1991). When this occurs, stationary signal models cannot
exploit the signal periodicities, while cyclostationary models become more suitable since they
represent more reliably the statistical signal properties.1
Firstly, let us review some of the key points of cyclostationary signals, while introducing some
definitions that will be used throughout this chapter. We have said that most signals used
in many fields such as communication, radar, or telemetry exhibit statistical parameters which
vary periodically with time. The periodicities of these parameters are related to the parameters
of the signal modulation, such as the carrier frequency or the chip rate among others (Gardner,
1986; 1994). Let the second-order2 auto-correlation function of a zero-mean stochastic signal
be defined as:
R_xx(t, λ) ≜ E{x(t) x*(λ)}     (1)
A signal is said to be (second-order) cyclostationary if, and only if, its (second-order)
auto-correlation function is a periodic function of time, namely with period T. Therefore,
it can be expanded in a Fourier series (Giannakis, 1998):
R_xx(t, λ) = ∑_{p=−∞}^{+∞} R_xx^{α_p}(t − λ) e^{j2πα_p λ}     (2)
where α_p = p/T are called the cycle frequencies of x(t), and the set of all the cycle frequencies
is referred to as the cyclic spectrum. The Fourier coefficients of the expansion in (2) are named
the cyclic auto-correlation functions of x(t):

R_xx^{α_p}(τ) = (1/T) ∫_0^T R_xx(t + τ, t) e^{−j2πα_p t} dt     (3)
1 Cyclostationary signal models are a more general class of stochastic processes which comprise the
stationary ones. Therefore, they always represent the statistical properties of the signal at least as well
as the stationary ones.
2 Since the optimality criterion used in this chapter is based on the Mean Squared Error (MSE), only the
second-order statistical moments are of interest to us. The first-order statistical moment (i.e. the mean)
is zero by assumption. Throughout this chapter, the second-order cyclostationarity is exploited. For
brevity, hereinafter the cyclostationarity and correlation functions are assumed to be of second order,
even without explicit mention.
In practice, the signal periodicities are often incommensurable with each other, which implies
that the auto-correlation function in (1) is not periodic, but an almost-periodic function of time.
In this general case (in the sense that periodic functions are a particular case of almost-periodic
ones), the signal is said to be almost-cyclostationary, and its auto-correlation function can be
expanded as a generalized Fourier series:

R_xx(t, λ) = ∑_{α_p ∈ A_xx} R_xx^{α_p}(t − λ) e^{j2πα_p λ}     (4)
where the set Axx stands for the cyclic spectrum of x (t), and is generally composed of the sum
and difference of integer multiples of the signal periodicities (Gardner, 1987; Gardner et al.,
1987; Napolitano & Spooner, 2001). Additionally, the definition of the cyclic auto-correlation
functions must be changed accordingly:
R_xx^α(τ) ≜ lim_{T→∞} (1/2T) ∫_{−T}^{T} R_xx(t + τ, t) e^{−j2παt} dt     (5)
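Definition (5) can be checked numerically on a simple amplitude-modulated carrier, which exhibits a cycle frequency at twice the carrier frequency. The discrete-time estimator below is our own illustration, with arbitrary signal parameters:

```python
import numpy as np

rng = np.random.default_rng(0)

# Amplitude-modulated carrier: x(t) = a(t) cos(2 pi f0 t), a(t) wideband noise.
fs, f0, N = 1000.0, 100.0, 20000
t = np.arange(N) / fs
a = rng.standard_normal(N)
x = a * np.cos(2 * np.pi * f0 * t)

def cyclic_autocorr(x, t, alpha, tau=0):
    # discrete-time estimate of (5): time average of x(t+tau) x(t) e^{-j2 pi alpha t}
    return np.mean(np.roll(x, -tau) * x * np.exp(-2j * np.pi * alpha * t))

on = abs(cyclic_autocorr(x, t, 2 * f0))        # at the cycle frequency 2 f0
off = abs(cyclic_autocorr(x, t, 2.74 * f0))    # at an arbitrary non-cycle frequency
print(on, off)   # 'on' is close to E[a^2]/4 = 0.25, 'off' is close to zero
```

Since x²(t) = a²(t)(1 + cos(4πf0 t))/2, the time average against e^{−j2π(2f0)t} picks up E[a²]/4, while any other α averages to (nearly) zero: a cyclic spectrum containing ±2f0 (and 0).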
One of the most important properties of almost-cyclostationary signals concerns the existence
of correlation between their spectral components. The periodicity of the auto-correlation
function turns into spectral correlation when it is transformed to the frequency domain. As a
result, almost-cyclostationarity and spectral correlation are related in such a way that a signal
exhibits almost-cyclostationary properties if, and only if, it exhibits spectral correlation too.
Let XΔ f (t, f ) be the spectral component of x (t), around time t and frequency f , and with
spectral bandwidth Δ f :
X_{Δf}(t, f) = ∫_{t − 1/(2Δf)}^{t + 1/(2Δf)} x(u) e^{−j2πf u} du     (6)
The spectral correlation function is then defined as

S_xx^α(f) = lim_{Δf→0} lim_{T→∞} (Δf/T) ∫_{−T/2}^{T/2} X_{Δf}(t, f) X*_{Δf}(t, f − α) dt     (7)

which represents the time-averaged correlation between two spectral components centered at frequencies f and f − α, as their bandwidth tends to zero. It can be demonstrated that the spectral correlation function matches the Fourier transform of the cyclic correlation functions (Gardner, 1986), that is:
S_xx^α(f) = ∫_{−∞}^{∞} R_xx^α(τ) e^{−j2πfτ} dτ     (8)
The orthogonality principle establishes that, if h(t, λ) is optimal, then the input signal x(t) and the estimation error, ε(t) = d(t) − d̂(t), are orthogonal to each other:3
where
R_uv(t, λ) ≜ E{u(t) v*(λ)}     (12)
stands for the cross-correlation function between u (t) and v(t), and hΓ (t, λ) stands for the
optimal LAPTV filter (where the meaning of the subindex Γ will be clarified in the next
paragraphs). Since d(t) and x (t) are jointly almost-cyclostationary by assumption, their auto-
and cross-correlation functions are almost-periodic functions of time, and therefore they can
be expanded as a generalized Fourier series (Corduneanu, 1968; Gardner, 1986):
where Adx and B xx are countable sets consisting of the cycle frequencies of Rdx (t, λ) and
R xx (t, λ), respectively. In addition, the cyclic cross- and auto-correlation functions Rαdx (τ )
3 Up to this point, the signals considered are real valued and therefore the complex-conjugation operator
can be ignored. However, it is incorporated in the formulation for compatibility with complex signals,
which will be considered in following sections.
R_dx(t, λ) = ∫_{−∞}^{∞} h_Γ(t, u) R_xx(u, λ) du ,  ∀t, λ ∈ ℝ     (17)

This condition can be satisfied for all t, λ ∈ ℝ if h_Γ(t, λ) is also an almost-periodic function
of time, and therefore, can be expanded as a generalized Fourier series (Gardner, 1993;
Gardner & Franks, 1975):
h_Γ(t, λ) ≜ Σ_{γ_q ∈ Γ} h_Γ^{γ_q}(t − λ) e^{j2πγ_q λ}     (18)
where Γ is the minimum set containing A_dx and B_xx which is closed under the addition and subtraction operations (Franks, 1994). The Fourier coefficients in (18), h_Γ^{γ_q}(τ), can be computed analogously to Eqs. (15) and (16). Then, the condition in (17) is developed by using the definition in (18), taking the Fourier transform, and augmenting the sets A_dx and B_xx to the set Γ, which they belong to, yielding the following condition:
Σ_{α_k ∈ Γ} S_dx^{α_k}(f) e^{j2πα_k λ} = Σ_{β_p, γ_q ∈ Γ} H_Γ^{γ_q}(f) S_xx^{β_p}(f − γ_q) e^{j2π(β_p + γ_q)λ}     (19)
which must be satisfied for all f , λ ∈ R, and where the Fourier transform of the cyclic cross-
and auto-correlation functions,
S_uv^α(f) = ∫_{−∞}^{∞} R_uv^α(τ) e^{−j2πfτ} dτ     (20)
stands for the spectral cross- and auto-correlation function, respectively (Gardner, 1986).
Finally, we use the fact that two almost-periodic functions are equal if and only if their generalized Fourier coefficients match (Corduneanu, 1968), which allows (19) to be reformulated as the design formula of optimal LAPTV filters:
S_dx^{α_k}(f) = Σ_{γ_q ∈ Γ} H_Γ^{γ_q}(f) S_xx^{α_k − γ_q}(f − γ_q) ,  ∀α_k ∈ Γ, ∀f ∈ ℝ     (21)
Note that the sets of cycle frequencies A_dx and B_xx are, in general, subsets of Γ. Consequently, the condition in (21) makes sense under the consideration that S_dx^α(f) = 0 if α ∉ A_dx, and S_xx^α(f) = 0 if α ∉ B_xx, which is coherent with the definitions in (15) and (16). Furthermore, (21) is also coherent with classical Wiener theory. When d(t) and x(t) are jointly stationary, the sets of cycle frequencies A_dx, B_xx, and Γ consist only of the zero cycle frequency, which yields that the optimal linear estimator is LTI and follows the well-known expression of the Wiener filter:
S_dx^0(f) = H_Γ^0(f) S_xx^0(f)     (22)
Let us use a simple graphical example to provide an overview of the implications of the
design formula in (21). Consider the case where the signal to be estimated, s(t), is corrupted
Fig. 1. Power spectral densities of the signal, the noise and interference in the application
example.
by additive stationary white noise, r(t), along with an interfering signal, u(t), to form the observed signal, that is, x(t) = s(t) + r(t) + u(t).
Assuming that s(t), r (t), and u (t) are statistically independent processes, the design formula
becomes:
S_ss^{α_k}(f) = Σ_{γ_q ∈ Γ} H_Γ^{γ_q}(f) [ S_ss^{α_k − γ_q}(f − γ_q) + S_rr^{α_k − γ_q}(f − γ_q) + S_uu^{α_k − γ_q}(f − γ_q) ]     (25)
In the following, let us consider the PSDs plotted in Fig. 1 for the signal, the noise and
the interference, all of which are flat in their spectral bands, with PSD levels ηs , ηr and ηu ,
respectively. Let us further simplify the example by assuming that the signal is received with a high Signal-to-Noise Ratio (SNR), but a low Signal-to-Interference Ratio (SIR), so that η_u ≫ η_s ≫ η_r. The Wiener filter can be obtained directly from Fig. 1, and becomes:
         ⎧ η_s / (η_s + η_r + η_u) ≈ 0 ,   f ∈ B_u
H_w(f) = ⎨ η_s / (η_s + η_r) ≈ 1 ,        f ∉ B_u , f ∈ B_s     (26)
         ⎩ 0 ,                            f ∉ B_s
where Bu and Bs represent the frequency intervals comprised in the spectral bands of the
interference and the signal, respectively. Thus, the Wiener filter is not capable of restoring
the signal spectral components which are highly corrupted by the interference, since they are
almost cancelled at its output.
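To put numbers on this near-cancellation, the gains of (26) can be evaluated for illustrative PSD levels (our own choice, merely satisfying η_u ≫ η_s ≫ η_r):

```python
# Illustrative PSD levels satisfying eta_u >> eta_s >> eta_r (our own values).
eta_u, eta_s, eta_r = 100.0, 1.0, 0.01

# Wiener gains of Eq. (26) in the two relevant bands.
gain_interference_band = eta_s / (eta_s + eta_r + eta_u)  # f in B_u, ~0
gain_clean_signal_band = eta_s / (eta_s + eta_r)          # f in B_s but not B_u, ~1
```

With these levels the interference band is attenuated by roughly a factor of 100, i.e. the signal components lying there are lost along with the interference.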
On the contrary, an LAPTV filter could restore the spectral components cancelled by the
Wiener filter depending on the spectral auto-correlation function of the signal and the
availability at the input of correlated spectral components of the signal which are not
perturbed by the interference. In our example, consider that the signal s(t) is Amplitude
Modulated (AM). Then, the signal exhibits spectral correlation at cycle frequencies α = ±2 f c ,
in addition to cycle frequency zero, which means that the signal spectral components at
positive frequencies are correlated with those components at negative frequencies (Gardner,
1987). 4 The spectral correlation function of the AM signal is represented in Fig. 2.
Suppose that the set of cycle frequencies Γ only consists of the signal cyclic spectrum, that is, Γ = {−2f_c, 0, 2f_c}, so that the design formula must be solved only for these values of α_k. Suppose also that the cycle frequencies ±2f_c are exclusive of s(t), so that S_xx^{±2f_c}(f) = S_ss^{±2f_c}(f). This is coherent with the assumption that the noise is stationary and with Fig. 1, where the bandwidth of the interference is narrower than 2f_c, and therefore none of its spectral components is separated 2f_c in frequency.
Fig. 3 graphically represents the conditions imposed by the design formula (25) on the
Fourier coefficients of the optimal LAPTV filter, where each column stands for the different
equations as αk takes different values from Γ. The plots in the first three rows stand for the
amplitude of the frequency shifted versions of the spectral auto-correlation function of the
signal, the noise and the interference, while the plots in the last row represent the spectral
cross-correlation between the input and the desired signals. The problem to be solved is to find the filter Fourier coefficients, {H_Γ^{−2f_c}(f), H_Γ^0(f), H_Γ^{2f_c}(f)}, which, multiplied by the spectral correlation functions represented in the first three rows and added together, yield the spectral cross-correlation function in the last row.
Firstly, let us pay attention to the spectral band of the interference at positive frequencies.
From Fig. 3, the following equation system applies:
⎧ S_xx^0(f) H_Γ^0(f) + S_xx^{−2f_c}(f − 2f_c) H_Γ^{2f_c}(f) = S_ss^0(f)
⎨ S_xx^{2f_c}(f) H_Γ^0(f) + S_xx^0(f − 2f_c) H_Γ^{2f_c}(f) = S_ss^{2f_c}(f)     , f ∈ B_u^+     (27)
⎩ S_xx^0(f + 2f_c) H_Γ^{−2f_c}(f) = 0
4 The definition of the spectral correlation function used herein (see Eq. (7)) differs in the meaning of frequency from the definition used by other authors, as in (Gardner, 1987). Therein, the frequency stands for the mean frequency of the two spectral components whose correlation is computed. Both definitions are related with each other by a simple change of variables, that is, S_xx^α(f) = S̄_xx^α(f − α/2), where S̄_xx^α(f) corresponds to the spectral correlation function according to the definition used in (Gardner, 1987).
Fig. 3. Graphical representation of the design formula of LAPTV filters, applied to our example. Each plot corresponds to a different value of α_k in (25): (a) α_k = 0; (b) α_k = 2f_c; (c) α_k = −2f_c.
The result in (31) is coherent with the fact that there are no signal spectral components located in B_u^+ after frequency shifting the input downwards by 2f_c. After using the approximations of high SNR and low SIR, the above results for the other filter coefficients can be approximated by:
The preceding example has been simplified in order to obtain comprehensible results on how an LAPTV filter performs, and to intuitively introduce the idea of frequency shifting and filtering. This idea will become clearer in Section 4, when describing the FRESH implementation of LAPTV filters. In our example, no reference to the cyclostationary properties of the interference has been made. The optimal LAPTV filter also exploits the interference cyclostationarity in order to suppress it more effectively. However, if we had considered the cyclostationary properties of the interference, the closed-form expressions for the filter Fourier coefficients would have turned out to be more complex, which would have prevented us from obtaining intuitive results. Theoretically, the set of cycle frequencies Γ consists of an infinite number of them, which makes it very hard to find a closed-form solution to the design formula in (21). This difficulty can be circumvented by forcing the number of cycle frequencies of the linear estimator h(t, λ) to be finite, at the cost of performance (the MSE increases and the filter is no longer optimal). This strategy will be described along with the FRESH implementation of LAPTV filters, in Section 4. But firstly, the expression in (22) is generalized for complex signals in the next section.
where the set of cycle frequencies Γ is defined in this case as the minimum set comprising
Adx , Adx ∗ , B xx and B xx ∗ , which is closed in the addition and subtraction operations (with Adx ∗
and B xx ∗ being the sets of cycle frequencies of the cross-correlation functions of the complex
conjugate of the input, x ∗ (t), with d(t) and x (t), respectively.)
As occurred for real signals in Section 2, finding a closed-form solution to the design formula in (37) may prove too complicated when a large number of cycle frequencies compose the set Γ. In the next section, a workaround is proposed based on the FRESH implementation of LAPTV filters and the use of a reduced set of cycle frequencies, Γ_s ⊂ Γ.
It can be clearly seen that the output of the WLAPTV filter, d̂(t), is the result of adding together the outputs of the LTI filters h_{α_k}(t) and g_{β_p}(t), whose inputs are frequency-shifted versions of the input signal x(t) and its complex conjugate x*(t), respectively. This is precisely the definition of a FRESH filter.

6 Formally, two FRESH filters are required, respectively for the input and its complex conjugate. Nevertheless, we will refer to the set of both as a FRESH filter herein.
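The structure just described can be sketched in a few lines (the helper below is our own illustration; FIR tap vectors stand in for the LTI filters):

```python
import numpy as np

def fresh_filter(x, shifts_x, filters_x, shifts_conj, filters_conj):
    """Structural sketch of a FRESH filter: the output is the sum of
    LTI-filtered (here FIR), frequency-shifted versions of x and conj(x)."""
    n = np.arange(len(x))
    y = np.zeros(len(x), dtype=complex)
    for alpha, h in zip(shifts_x, filters_x):
        y += np.convolve(x * np.exp(2j * np.pi * alpha * n), h, mode="same")
    for beta, g in zip(shifts_conj, filters_conj):
        y += np.convolve(np.conj(x) * np.exp(2j * np.pi * beta * n), g, mode="same")
    return y

# Degenerate check: one branch, zero shift, identity tap -> the input itself.
x = np.exp(2j * np.pi * 0.1 * np.arange(16))
y = fresh_filter(x, [0.0], [np.array([1.0])], [], [])
```

Each branch is an ordinary LTI (here FIR) filter; all the time-variation of the overall filter is confined to the frequency shifters.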
The most difficult problem in the design of sub-optimal FRESH filters concerns the choice of
the optimal subset of frequency shifts, Γ s ⊂ Γ, under some design constraints. For instance,
a common design constraint is the maximum number of branches of the FRESH filter. The
optimum Γ s becomes highly dependent on the spectral correlation function of the input
signal and can change with time in nonstationary environments. However, with the aim
of simplifying the FRESH implementation, Γ_s is usually fixed beforehand, and the common approach to determine the frequency shifts consists in choosing those cycle frequencies at which the spectral cross-correlation functions between d(t) and x(t), or its complex conjugate, exhibit maximum levels (Gardner, 1993; Yeste-Ojeda & Grajal, 2008).
Once the set of frequency shifts has been fixed, the design of FRESH filters is much simpler
than that of WLAPTV filters, because FRESH filters only use LTI filters. Because of the
nonstationary nature of the input (it is almost-cyclostationary), the optimality criterion used
in the design of the set of LTI filters is the MTAMSE criterion, in contrast to the MMSE
criterion (Gardner, 1993). The resulting FRESH filter is a sub-optimum solution since it is
optimum only within the set of FRESH filters using the same set of frequency shifts.
Let us formulate the output of the FRESH filter by using vector notation:
d̂(t) = ∫_{−∞}^{∞} w†(t − u) x(u) du     (41)
where w † (t) stands for the conjugate transpose of vector w (t), and each component of the
input vector x(t) represents the input of an LTI filter according to the scheme represented in
Fig. 5:
x(t) = [ x_1(t), x_2(t), …, x_P(t), x_{P+1}(t), …, x_L(t) ]^T
     = [ x(t) e^{j2πα_1 t}, x(t) e^{j2πα_2 t}, …, x(t) e^{j2πα_P t}, x*(t) e^{j2πβ_1 t}, …, x*(t) e^{j2πβ_{L−P} t} ]^T     (42)
where P is the number of branches used for filtering the frequency shifted versions of x (t),
( L − P ) is the number of branches used for filtering its complex conjugate x ∗ (t), and L
is the total number of branches. Note that the first P cycle frequencies can be repeated in the next L − P cycle frequencies, since using a frequency shift for the input x(t) does
not exclude it from being used for its complex conjugate. With the aim of computing the
MTAMSE optimal set of LTI filters, a stationarized signal model is applied to both the desired
signal d(t) and the input vector x(t), which are jointly almost-cyclostationary processes, and
therefore stationarizable (Gardner, 1978). The stationary auto- and cross-correlation functions
are obtained by taking the stationary part (time-averaging) the corresponding nonstationary
correlation functions. As a result, the orthogonality principle is formulated as follows:
E{ε(t + τ) x†(t)} = 0^T ,  ∀τ ∈ ℝ     (43)
R_dx(τ) = ∫_{−∞}^{∞} w_{Γ_s}^†(τ − u) R_xx(u) du ,  ∀τ ∈ ℝ     (44)
where 0 is the null vector, and the cross-correlation (row) vector Rdx (τ ) and the auto-
correlation matrix Rxx (τ ) are computed by time-averaging the corresponding nonstationary
correlation functions. For Rdx (τ ) this yields:
R_dx(τ) = lim_{T→∞} (1/2T) ∫_{−T}^{T} E{d(t + τ) x†(t)} dt = [ R_dx^{α_1}(τ), R_dx^{α_2}(τ), …, R_dx^{α_P}(τ), R_dx*^{α_{P+1}}(τ), …, R_dx*^{α_L}(τ) ]     (45)
where the cyclic cross-correlation functions were defined in (15). The matrix Rxx (τ ) becomes:
R_xx(τ) = lim_{T→∞} (1/2T) ∫_{−T}^{T} E{x(t + τ) x†(t)} dt     (46)
where the element in the q-th row and k-th column is:
               ⎧ R_xx^{α_k − α_q}(τ) e^{j2πα_q τ}       q ≤ P, k ≤ P
[R_xx(τ)]_qk = ⎨ R_xx*^{α_k − α_q}(τ) e^{j2πα_q τ}      q ≤ P, k > P     (47)
               ⎨ R_xx*^{α_q − α_k}(τ)* e^{j2πα_q τ}     q > P, k ≤ P
               ⎩ R_xx^{α_q − α_k}(τ)* e^{j2πα_q τ}      q > P, k > P
and the element of the q-th row and k-th column of matrix Sxx ( f ) is:
               ⎧ S_xx^{α_k − α_q}(f − α_q)      q ≤ P, k ≤ P
[S_xx(f)]_qk = ⎨ S_xx*^{α_k − α_q}(f − α_q)     q ≤ P, k > P     (51)
               ⎨ S_xx*^{α_q − α_k}(α_q − f)*    q > P, k ≤ P
               ⎩ S_xx^{α_q − α_k}(α_q − f)*     q > P, k > P
It is noteworthy that the expression in (37), which defines the optimal WLAPTV filter, is a
generalization of (48) for the case where Γ s = Γ. In addition, we want to emphasize the main
differences between optimal and sub-optimal FRESH filters:
1. Optimal FRESH filters are direct implementations of optimal WLAPTV filters defined by
(37), and generally consist of an infinite number of LTI filters. Contrarily, sub-optimal
FRESH filters are limited to a finite set of LTI filters.
2. For jointly almost-cyclostationary input and desired signals, optimal FRESH filters are optimal with respect to any other linear estimator. On the contrary, sub-optimal FRESH filters are defined for a given subset Γ_s and, therefore, they are optimal only in comparison to the rest of FRESH filters using the same frequency shifts.
3. Optimal FRESH filters minimize the MSE at all times (MMSE criterion). However,
sub-optimal FRESH filters only minimize the MSE on average (MTAMSE criterion). This
means that another FRESH filter, even by making use of the same set of frequency shifts,
could exhibit a lower MSE at specific times.
Finally, the inverse of matrix S_xx(f) may not exist at all frequencies, which would invalidate the expression in (49). However, since its main diagonal represents the power spectral densities of the frequency-shifted versions of the input, this formal problem can be ignored by assuming that a white noise component is always present at the input.
At this point, we have completed the theoretical background concerning FRESH filters. The following sections are focused on the applications of adaptive FRESH filters. Firstly, the introduction of an adaptive algorithm in a FRESH filter is reviewed in the next section.
7 Hereinafter, FRESH filters are always limited to a finite set of frequency shifts. Therefore, sub-optimal FRESH filters will be referred to as "optimal" for brevity.
In order to simplify the analysis of the adaptive algorithms, the structure of FRESH filters is
particularized to the case of discrete-time signals with the set of LTI filters exhibiting Finite
Impulse Response (FIR filters). After filtering the received signal, x (n ), the output of the
adaptive FRESH filter is compared to a desired (or reference) signal, d(n ), and the error, ε(n ),
is used by an adaptive algorithm in order to update the filter coefficients. Commonly, the
desired signal is either a known sequence (as a training sequence for equalizers), or obtained
from the received signal (as for blind equalizers).
Let the output of the FRESH filter be defined as the inner product:
d̂(n) ≜ w†(n) x(n) = Σ_{i=1}^{L} w_i†(n) x_i(n)     (52)
where the vectors w (n ) and x(n ) are the concatenation of the filter and the input vectors,
respectively:
w(n) = [ w_1^T(n), w_2^T(n), …, w_L^T(n) ]^T ,   x(n) = [ x_1^T(n), x_2^T(n), …, x_L^T(n) ]^T     (53)
wi (n ) and xi (n ) are the filter and input vectors, respectively, of the i-th branch:
where M_i is the length of the i-th filter. Consequently, the total length of the vectors w(n) and x(n) is M = Σ_{i=1}^{L} M_i.
Using the results in the previous sections, the optimal set of LTI filters is given by
multidimensional Wiener filter theory:
w_o = (R_x)^{−1} p     (56)
where R_x stands for the time-averaged auto-correlation matrix of the input vector,8

R_x = E{x(n) x†(n)}     (57)
In Eq. (56), the vector p represents the time-averaged cross-correlation between the input vector x(n) and d(n):

p = E{x(n) d*(n)}     (59)
The correlation functions in (57) and (59) are time-averaged in order to force the LTI characteristic of w_o. The expressions in (56), (57), and (59) allow one to follow the classical developments of adaptive algorithms (Haykin, 2001). Despite using classical algorithms, adaptive FRESH filters naturally exploit the cyclostationary properties of the signals, as the inputs of the LTI filters are frequency-shifted versions of the input of the adaptive FRESH filter.
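As a numerical illustration of (56), (57) and (59), the Wiener solution can be estimated from sample statistics. The two-branch input and true weight vector below are entirely artificial:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: d(n) is generated from a 2-dimensional input vector process.
N = 20000
X = rng.standard_normal((2, N)) + 1j * rng.standard_normal((2, N))
w_true = np.array([0.5 - 0.2j, -0.3 + 0.1j])
d = w_true.conj() @ X            # d(n) = w_true^H x(n), noiseless for clarity

# Sample estimates of R_x = E{x x^H} (Eq. (57)) and p = E{x d*} (Eq. (59))
R = (X @ X.conj().T) / N
p = (X @ d.conj()) / N

w_o = np.linalg.solve(R, p)      # w_o = (R_x)^{-1} p, Eq. (56)
```

Because d(n) is an exact linear combination of the inputs here, p = R w_true holds sample-by-sample and the solution recovers w_true.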
S_xx^α(f) = S_dx^α(f)     (60)
and therefore (60) could only be satisfied if the desired and the input signals are the same, which would eliminate the need for a filter. Contrarily, many stochastic processes do not exhibit spectral cross-correlation with their complex conjugate at cycle frequency zero, for instance circular stochastic processes. This allows the condition (61) to be satisfied without implying that d(t) = x(t).
When adaptive BAFRESH filters use a suitable set of frequency shifts, the correlation between the inputs of the LTI filters, x_i(n), and the reference signal x(n) is only due to the signal to be estimated. Thus, the adaptive algorithm will converge to a solution by which the desired signal is estimated at the output of the BAFRESH filter, while any other component present at the input is minimized. This is a powerful characteristic of adaptive FRESH filters, which turns them into good candidates for those applications where no reference signal is available at the receiver.
constructive one when the outputs of the branches are added together. Although this is a well
known problem, it has been exhaustively dealt with only in the field of beamforming and
direction finding, mainly regarding MUSIC (Schmidt, 1986) and SCORE (Agee et al., 1990)
algorithms. Besides describing the problem, these works have proposed some solutions. Schell and Gardner stated that the degradation in performance increases with the product of the observation time and the error in cycle frequency (Schell & Gardner, 1990). As a solution, they proposed two alternatives: using multiple cycle frequencies in order to increase robustness, or adaptively estimating the cycle frequencies by maximizing the magnitude of the FFT of the lag product of the observations. Lee and Lee proposed two procedures for estimating the cycle-frequency offset (Lee & Lee, 1999), both based on maximizing the largest eigenvalue of a sample autocorrelation-related matrix. In (Lee et al.,
2000), cycle frequencies are estimated iteratively through a gradient descent algorithm which
maximizes the output power of an adaptive beamformer, assuming that the initial cycle
frequencies are close enough. A similar approach can be found in (Jianhui et al., 2006),
where the conjugate gradient algorithm is used instead of gradient descent, and after proper
modifications in the cost function. A new approach is described in (Zhang et al., 2004), where
the robustness of the beamformer is achieved by minimizing a cost function which is averaged
over a range of cycle frequencies, instead of trying to correct the cycle frequency errors.
The approach used in this section is rather different from the cited work. The purpose of
this section is to explore an additional advantage of using adaptive filters for FRESH filtering,
which is their capability of working in the presence of errors in the frequency shifts. This is
possible because the adaptive filter is able to track these errors, by updating the coefficients of
the LTI filters in a cyclic manner. As a result, the adaptive filter at each branch of the FRESH
filter behaves as a Linear Periodically Time-Variant (LPTV) filter, rather than converging to
an LTI filter. This ability is strongly conditioned by the rate of convergence of the adaptive
algorithm. For slow convergence, the outputs of the branches with frequency-shift errors
are cancelled. As the convergence accelerates, the adaptive filters of those branches with
frequency-shift errors behave as LPTV filters in order to compensate the errors. These subjects
shall be dealt with later on, but firstly let us highlight the problem of cycle-frequency errors in
FRESH filtering.
11 However, the degradation of the estimation error depends on the contribution of the corresponding branch to the signal estimate.
Let us use the definitions introduced in the previous section, specifically from (52) to (59). For
convenience, let the auto-correlation matrix of the input vector be expressed in the form
      ⎡ R_x11                E{x_1(n) x_2L†(n)} ⎤
R_x = ⎣ E{x_2L(n) x_1†(n)}   R_x2L              ⎦     (62)
where R_xij, with i ≤ j, stands for the auto-correlation matrix of the input vectors from the i-th to the j-th branch:

R_xij = E{x_ij(n) x_ij†(n)}     (63)
Similarly, the vector x_ij(n) stands for the concatenation of the input vectors from the i-th to the j-th branch:

x_ij(n) = [ x_i^T(n), …, x_j^T(n) ]^T     (64)
Let us assume that there is an error in the frequency shift used for the first branch. Then, the
time-averaged cross-correlation between x1 (n ) and the input vectors of the rest of branches is
zero. Moreover, the time-averaged cross-correlation of x1 (n ) with the desired signal, d(n ), is
also zero. Thus, the optimal set of LTI filters when there exist errors in the frequency shift of
the first branch is:

w_o = (R_x)^{−1} p = ⎡ R_x11   0     ⎤^{−1} ⎡ 0    ⎤ = ⎡ 0                ⎤     (65)
                     ⎣ 0       R_x2L ⎦      ⎣ p_2L ⎦   ⎣ R_x2L^{−1} p_2L ⎦
Therefore, the optimal LTI filter of the first branch and its output are null. In addition, the
optimal set of filters for the other branches is equivalent to not using the first branch. In short,
what is happening is that the first frequency shift does not belong to set Γ. When all branches
exhibit errors in their frequency shifts, then the optimal filter vector is zero and the FRESH
filter becomes useless. It is noteworthy that the cancellation of the branches with an error in
their frequency shift is caused by the LTI characteristic of the filters, which has been forced by
time-averaging Rx and p. In the next section, it is shown that the errors in the frequency shifts
can be compensated by allowing the filters to be time-variant, specifically, LPTV.
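The branch-cancellation mechanism can be checked numerically. With a toy cyclostationary input of our own construction, the time-averaged cross-correlation between a frequency-shifted branch and the reference (an entry of p) collapses as soon as the shift departs from the true cycle frequency:

```python
import numpy as np

rng = np.random.default_rng(1)
N = 1 << 14
n = np.arange(N)
f0 = 1 / 16                                 # toy signal: cycle frequency 2*f0
x = np.cos(2 * np.pi * f0 * n) + 0.1 * rng.standard_normal(N)
d = x                                       # blind scheme: the input is the reference

def branch_cross_corr(shift):
    """Time-averaged cross-correlation <x(n) e^{j 2 pi shift n} d*(n)>,
    i.e. the entry of p associated with a branch using this frequency shift."""
    return np.mean(x * np.exp(2j * np.pi * shift * n) * np.conj(d))

p_exact = branch_cross_corr(2 * f0)          # correct cycle frequency
p_error = branch_cross_corr(2 * f0 + 1e-3)   # branch with a frequency-shift error
```

Even a shift error of 10^{-3} (in normalized frequency) is enough to average the cross-correlation down by more than an order of magnitude over this observation length, which is exactly why the time-averaged optimal LTI filter of such a branch is null.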
where w_oi stands for the optimal LTI filter in the i-th branch and x_i(n) are the outputs of the frequency shifters without error. The LPTV solution can be obtained by including the errors in the frequency shifts:

d̂(n) = Σ_{i=1}^{L} w_oi† x_i(n)
     = Σ_{i=1}^{L} ( e^{−j2πΔ_i n} w_oi† ) ( x_i(n) e^{j2πΔ_i n} )     (67)
     = Σ_{i=1}^{L} w̃_oi†(n) x̃_i(n) = w̃_o†(n) x̃(n)
where w̃o (n ) is the set of LPTV filters, and x̃(n ) are the input vector to these filters (the output
of the first frequency shifters in Fig. 8).
By defining the diagonal matrix

Φ ≜ diag( e^{j2πΔ_1} I_M1 , e^{j2πΔ_2} I_M2 , …, e^{j2πΔ_L} I_ML )     (68)
where I N is the N × N identity matrix, the vectors w̃o (n ) and x̃(n ) can be expressed in the
form:
w̃_o(n) = Φ^n w_o     (69)
x̃(n) = Φ^n x(n)     (70)
Additionally, the optimal set of linear time-variant filters is obtained by minimizing the instantaneous MSE. This optimal set (whose input vector is x̃(n)) is (Van Trees, 1968):

w̃_o(n) = (R_x̃(n))^{−1} p̃(n)     (71)

where
R_x̃(n) = E{x̃(n) x̃†(n)}     (72)
p̃(n) = E{x̃(n) d*(n)}     (73)
Therefore, since (69) is verified, a sufficient condition to obtain (69) from (71) is that:

R_x̃(n) = Φ^n E{x(n) x†(n)} Φ^{−n} = Φ^n R_x Φ^{−n}     (74)
p̃(n) = Φ^n E{x(n) d*(n)} = Φ^n p     (75)
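The relations (68)–(70) amount to a linearly growing phase rotation per branch. A tiny numerical check with two single-tap branches and made-up errors Δ_1, Δ_2:

```python
import numpy as np

# Made-up frequency-shift errors for two single-tap branches (M1 = M2 = 1).
d1, d2 = 1e-3, -2e-3
Phi = np.diag([np.exp(2j * np.pi * d1), np.exp(2j * np.pi * d2)])  # Eq. (68)

rng = np.random.default_rng(2)
x = rng.standard_normal(2) + 1j * rng.standard_normal(2)  # x(n) at a fixed n
n = 7

# Eq. (70): x~(n) = Phi^n x(n) is a per-branch phase growing linearly with n.
x_tilde = np.linalg.matrix_power(Phi, n) @ x
direct = x * np.exp(2j * np.pi * np.array([d1, d2]) * n)
```

Since Φ is unitary and diagonal, Φ^n reduces to multiplying the i-th branch by e^{j2πΔ_i n}, which is the periodic weight rotation the adaptive filter has to track.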
6.3 Excess MSE of the LMS algorithm in the presence of errors in the frequency shifts
In this section we develop an analytical expression for the time-averaged MSE at steady-state
of adaptive FRESH filters using the LMS algorithm (LMS-FRESH). The objective is to obtain
an expression that accounts for errors in the frequency shifts of the LMS-FRESH filter.
Basically, the LMS algorithm consists in updating the filter weights in the direction of a gradient estimate of the MSE function, and it can be thought of as a stochastic version of the steepest descent algorithm (Widrow et al., 1976). Let w̃(n) be the concatenation vector defining the LMS-FRESH filter. The updating algorithm can be formulated as follows:
w̃(n + 1) = w̃(n) + 2με*(n) x̃(n)
ε(n) = d(n) − d̂(n)     (76)
d̂(n) = w̃†(n) x̃(n)
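A minimal sketch of the update in (76); the single-branch, single-tap identification setup is our own toy example:

```python
import numpy as np

def lms_update(w, x_tilde, d, mu):
    """One iteration of Eq. (76): d_hat = w^H x~, e = d - d_hat,
    w <- w + 2 mu e* x~."""
    d_hat = np.vdot(w, x_tilde)              # w^H(n) x~(n)
    e = d - d_hat
    return w + 2 * mu * np.conj(e) * x_tilde, e

# Toy identification run (our own setup): one branch, one tap.
rng = np.random.default_rng(4)
N = 2000
x = rng.standard_normal(N) + 1j * rng.standard_normal(N)
w_true = 0.5 - 0.3j
d = np.conj(w_true) * x                      # reference generated by w_true

w = np.zeros(1, dtype=complex)
for k in range(N):
    w, e = lms_update(w, x[k:k + 1], d[k], mu=0.01)
```

In this noiseless case the weight error contracts at every step and the filter converges to w_true; with noise, the residual misadjustment would be governed by the gradient-noise term ξ_∇ discussed below.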
Let us separate the excess MSE into two terms: The excess MSE due to gradient noise and the
excess MSE due to lag error. The separation is possible by considering the weight error vector
as the sum of two terms:
The first term of the sum on the right-hand side, ũ(n) = w̃(n) − E{w̃(n)}, produces the excess MSE due to gradient noise, which is related to the gradient estimation process. The second term, ṽ(n) = E{w̃(n)} − w̃_o(n), produces the excess MSE due to lag, which quantifies the error due to the fact that the adaptive filter cannot follow the variations of w̃_o(n) with time. By substituting (78) in (77), we obtain:
ξ_e(n) = E{ũ†(n) R_x̃(n) ũ(n)} + E{ṽ†(n) R_x̃(n) ṽ(n)} + 2Re{ E{ũ†(n) R_x̃(n) ṽ(n)} }     (79)
It can be easily shown (from the definitions of ũ(n ) and ṽ (n )) that the last term of (79) is zero.
Thus, the total time-averaged MSE at steady-state of the LMS-FRESH filters can be expressed
as the sum of three terms:
ξ_LMS ≜ E{|ε(n)|²} = ξ_min + ξ_e(n) = ξ_min + ξ_∇ + ξ_lag     (80)
where ξ min is the minimum time-averaged MSE attained by the optimal FRESH filter, ξ ∇ is
the time-averaged excess MSE due to gradient noise, and ξ lag is the time-averaged excess MSE
due to lag error.
The minimum time-averaged MSE results from multivariate Wiener theory (Gardner, 1993):

ξ_min = σ_d² − w̃_o†(n) R_x̃(n) w̃_o(n)
Assuming that the LMS adaptive filter converges to an LPTV filter at each branch (as shown in Fig. 8), the time-averaged excess MSE at steady-state can be approximated by12:

ξ_∇ = μ ξ_min tr{R_x}     (83)
where tr {Rx } is the trace of matrix Rx defined in (57). From (83), ξ ∇ can be reduced as much
as desired by decreasing μ. However, the convergence factor μ controls the rate of convergence
of the LMS, so that it is faster as μ increases. Additionally, it will be shown next that the excess
MSE due to lag error increases as μ decreases.
The excess MSE due to lag error becomes apparent when the adaptive filter cannot follow
the variations of w̃o (n ) (the LPTV solution) with time. This excess MSE is a function of the
weight-error vector ṽ (n ), which represents the instantaneous difference between the expected
value of the adaptive filter and w̃o (n ):
The mathematical development of the time-averaged excess MSE at steady-state due to the
lag error can be found in the appendix of (Yeste-Ojeda & Grajal, 2010), and yields:
ξ_lag = v† R_x v     (85)
where v, which does not depend on time, is the weight-error vector at steady-state. Under the
same conditions that were assumed for (83) (see footnote 12):

lim_{μ→0} v = −R_x^{−1} p = −w_o     (88)
12 Under small error conditions, and by assuming that ε(n) and x̃(n) are Gaussian, and x̃(n) is uncorrelated over time, i.e. E{x̃(n) x̃(n + k)} = 0 for k ≠ 0 (Widrow et al., 1975; 1976).
which implies that the LMS-FRESH filter converges to the null vector in a mean sense. Moreover, the filter converges to the null vector also in a mean-square sense. This result is derived from the fact that the gradient noise tends to zero as μ tends to zero (see (83)), which entails that ũ(n) = 0, and therefore the LMS-FRESH filter vector matches its expected value, i.e. w̃(n) = E{w̃(n)} = 0. As a result, the outputs of the branches with an error in their frequency shift are null. The time-averaged excess MSE due to lag error is obtained by substituting (88) in (85), which yields:
This result could be obtained considering that, since the output of the adaptive-FRESH
filter is null, the error signal is ε(n ) = d(n ).
4. The optimal convergence factor which minimizes the time-averaged MSE of the LMS algorithm is obtained from (80), (83) and (85), which yields:

μ_o = arg min_μ { μ ξ_min tr{R_x} + v† R_x v }     (91)
where λ_max is the maximum eigenvalue of matrix R_x. This is the same condition as for stationary environments (Haykin, 2001; Widrow et al., 1976). 13
6.5 Application
Finally, let us quantify the performance of adaptive FRESH filters for cycle-frequency error
compensation through a case study where the received signal consists of a BPSK signal
embedded in stationary white Gaussian noise. The receiver incorporates a FRESH filter with
the purpose of extracting the BPSK signal from the noise. Two frequency shifts related to the
cyclic spectrum of the BPSK signal are used, the inverse of its symbol interval α1 = 1/Ts , and
twice its carrier frequency, α2 = 2 f c . These two cycle frequencies have been chosen, according
to the common approach mentioned in Section 4, because they exhibit the highest values of
13 Note that the value of the convergence factor used in (Haykin, 2001) is twice the value used herein.
Fig. 10. Analytical time-averaged MSE as a function of the convergence factor, when using
only the branch corresponding to frequency shift α = 2 f c . The thick dashed line corresponds
to Δ2 = 0.
spectral correlation for a BPSK modulation (Gardner et al., 1987). The desired signal for the
adaptive algorithm is the received signal, d(n ) = x (n ), following the blind scheme described
in Section 5.1 (see Fig. 9). In all cases, the carrier frequency and symbol interval have been set
to fc = 0.3 (normalized to the sampling frequency) and Ts = 32 samples. The noise power is set to σr² = E{|r(n)|²} = 1, and the SNR, defined as the ratio between the mean power of the signal and that of the noise, SNR = E{|s(n)|²}/σr², is fixed to 0 dB. Both w̃1(n) and w̃2(n) are FIR filters with Mi = 64 taps.
In order to clarify some concepts, let us consider firstly the case where the FRESH filter
is composed of only the branch associated with the frequency shift α2 = 2 f c . The total
time-averaged MSE at steady-state (hereinafter referred to as simply “the MSE”) is shown in
Fig. 10, as a function of the convergence factor and for different values of the frequency-shift
error, Δ2 , which are also normalized to the sampling frequency. The MSE of the LMS-FRESH
filter when Δ2 = 0 is plotted with a dashed thick line as a point of reference. This is the
lower bound of the MSE attainable by the LMS-FRESH filter, and converges to the MSE of the
optimal FRESH filter, ξ min, as μ decreases. Note that the minimum MSE is always greater than the noise power, i.e. ξ min > 1, since even if the signal were perfectly estimated (ŝ(n) = s(n)), the error signal would match the noise (ε(n) = r(n)).
When there exist errors in the frequency shift, the MSE is always higher than the lower bound.
Since the lower bound includes the gradient noise effect, the excess MSE over the lower bound
Fig. 11. Analytical time-averaged MSE as a function of the convergence factor, when using
only the branch corresponding to frequency shift α = 1/Ts . The thick dashed line
corresponds to Δ1 = 0.
is due to the lag error only, and varies from zero (for high μ) up to wo† Rx wo = σd² − ξ min (see (89)) for small μ. Thus, the total MSE tends to σd² as μ decreases. The curves also show the dependence of the MSE on the frequency-shift error, which increases with Δ2 (at any given μ). Therefore, in order to obtain a small excess MSE due to lag, a faster rate of convergence (larger μ) is required as Δ2 increases.
An additional result shown in Fig. 10 is that for high enough errors in the frequency shifts, the
MSE does not reach ξ min at any value of μ. In other words, it is impossible to simultaneously
reduce the MSE terms due to the lag error and the gradient noise. In such a case, the optimal μ
locates at an intermediate value as a result of the trade-off between reducing the lag error and
the gradient noise. In practice, the frequency-shift errors are commonly unknown. The convergence factor should then be chosen as a trade-off between the maximum cycle-frequency error that can be compensated by the adaptive filter and the increase of the excess MSE due to gradient noise.
Similar results can be extracted from Fig. 11, which corresponds to the case where only the
branch of the FRESH filter with frequency shift α = 1/Ts has been used. The main difference
between Figures 10 and 11 is that the minimum MSE, ξ min , is bigger when only α = 1/Ts is
used. The reason is that the spectral correlation level exhibited by a BPSK signal is smaller at
α = 1/Ts than at α = 2 fc (Gardner et al., 1987). Furthermore, it can be seen that σd² is not an upper bound for the MSE (as might be inferred from Fig. 10), but the limit of the MSE as μ tends to zero.
For the case illustrated in Fig. 12, the two branches shown in Fig. 9 are used, but there is
uncertainty only in the symbol rate, that is, the error in the frequency shift related to the
carrier frequency, Δ2 , is always zero. The curves show that the improvement in the MSE is
not very significant when the error in the symbol rate is compensated. This occurs because the contribution of one branch to the signal estimate is much more significant than that of the other, which is mainly caused by the different spectral correlation levels exhibited by a BPSK signal at the cycle frequencies α = 1/Ts and α = 2 fc. As a consequence, the minimum MSE when only the second branch is used (ξ min,2) and when both branches are used (ξ min,12) are very similar. Furthermore, as μ tends to zero the MSE tends to ξ min,2 instead of σd², since there is no uncertainty in the carrier frequency.
Fig. 12. Time-averaged MSE as a function of the convergence factor, when using the two
branches. Δ2 = 0 in all cases, also for the thick dashed line which corresponds to Δ1 = 0.
Simulation results are represented by cross marks.
Fig. 13. Time-averaged MSE as a function of the convergence factor, when using the two
branches. Δ1 = 10−5 in all cases, except for the thick dashed line which corresponds to
Δ1 = 0 and Δ2 = 0. Simulation results are represented by cross marks.
Fig. 13 shows the MSE when there exist errors in both frequency shifts. The curves correspond
to an error in the symbol rate Δ1 = 10−5 and different errors for the carrier frequency. In this
case, the minimum MSE is attained at a convergence factor such that the lag error of both
branches is compensated. However, compensating the error in the carrier frequency is critical,
while compensating the error in the symbol rate only produces a slight improvement. This
conclusion can be deduced from the curve for Δ2 = 10−7. Since Δ1 = 10−5, a convergence factor close to μ = 10−4 or higher is required in order to compensate the frequency-shift error at the first branch (see Fig. 11). However, the excess MSE due to lag error is small
for μ = 10−6 or higher, where only the frequency-shift error at the second branch can be
compensated.
The analytical expression for the MSE has been obtained under some assumptions (Gaussianity, small error conditions and input uncorrelated over time) which are only approximations in practice, including the case studies presented here. Therefore, in order to check its
accuracy, Figures 12 and 13 also include the MSE obtained by simulation, which is represented
by the lines with cross marks: 200 realizations have been used for the ensemble averages in
order to obtain the instantaneous MSE. Then, the instantaneous MSE has been time-averaged
over 200,000 samples after the convergence of the LMS. The agreement between theoretical
and simulation results is excellent in all cases, which confirms the validity of the assumptions
made.
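The ensemble-then-time averaging procedure just described can be sketched generically as follows; the callback name `run_filter` and the default constants are assumptions of mine, not taken from the chapter. Each trial returns one realization of the instantaneous squared error, trials are ensemble-averaged, and the converged tail is time-averaged:

```python
import numpy as np

def estimate_steady_state_mse(run_filter, n_trials=200, n_samples=5000, discard=2500):
    """Ensemble-average the instantaneous squared error over independent
    trials, then time-average the tail after convergence."""
    acc = np.zeros(n_samples)
    for trial in range(n_trials):
        # run_filter(seed) returns one realization of the squared-error sequence
        acc += run_filter(seed=trial)
    inst_mse = acc / n_trials            # ensemble average -> instantaneous MSE
    return inst_mse[discard:].mean()     # time average over the converged part
```

For example, feeding it squared unit-variance white noise returns an estimate close to 1.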
A last case study is presented with a twofold purpose: 1) To demonstrate that the adaptive
algorithm compensates cycle-frequency errors also in the presence of interferences. 2) To
demonstrate that an adaptive algorithm different from the LMS exhibits a similar behavior.
For this reason, we shall use the RLS algorithm in this last case study, despite the lack of
an analytical expression for the MSE. The scheme for the adaptive FRESH filter presented in
Fig. 9 is valid also in this case study, but a BPSK interference having random parameters
has been added to the input, along with the BPSK signal and the noise defined for the
previous case studies. The errors in the frequency shifts of the FRESH filter are also random.
All random variables are constant within each trial, but change from trial to trial according to a uniform distribution within the ranges gathered in Table 1. As regards the RLS algorithm,
it is exponentially weighted with a convergence factor λ = 1 − μ, where μ is known as the
forgetting rate (Haykin, 2001). Also, we have used in this case study the BPSK signal as the
reference signal, so that the MSE does not depend on the random interference power.
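For readers less familiar with it, the exponentially weighted RLS recursion can be sketched as follows; this is the standard textbook form (e.g. Haykin, 2001), with the forgetting factor λ = 1 − μ as in the text, and the function name and initialization constant `delta` are illustrative:

```python
import numpy as np

def rls(X, d, lam=0.999, delta=100.0):
    """Exponentially weighted RLS. Rows of X are regressor vectors x(n);
    lam is the forgetting factor (forgetting rate mu = 1 - lam)."""
    M = X.shape[1]
    w = np.zeros(M, dtype=complex)
    P = delta * np.eye(M, dtype=complex)       # inverse weighted correlation matrix
    for xv, dn in zip(X, d):
        pi = P @ xv
        k = pi / (lam + np.vdot(xv, pi).real)  # gain vector
        e = dn - np.vdot(w, xv)                # a priori error, y(n) = w^H x(n)
        w = w + k * np.conj(e)
        P = (P - np.outer(k, np.conj(xv) @ P)) / lam
    return w
```

Each iteration costs O(M²) rather than the O(M) of LMS, which is the usual price paid for the faster convergence.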
Fig. 14 shows the MSE obtained by simulation (using 100 trials for computing the ensemble
averages and 20000 samples for the time averages). The results show the capability of the RLS
algorithm for compensating errors in the frequency shifts in the presence of the interference.
Otherwise, the RLS algorithm would have converged to a FRESH filter which cancels its
output, and the obtained MSE would have been equal to σd2 . Analogously to the previous
case studies using the LMS algorithm, the RLS algorithm cannot compensate errors in the
frequency shifts if the forgetting rate is too small (slow convergence). In such a case, the
adaptive FRESH filter tends to cancel its output and the error is σd2 . For moderate forgetting
rates, the cycle-frequency errors are compensated, and the MSE approaches ξ min. On the contrary, an excessively high forgetting rate increases the MSE as a consequence of the gradient noise.
In summary, the possible existence of cycle-frequency errors in LAPTV filtering is a serious
problem that must be managed. When using the non-adaptive FRESH implementation, the
minimum time-averaged MSE is obtained when the branches with uncertainties in their
frequency shifts are not used (or equivalently, their output is cancelled). On the contrary,
adaptive-FRESH filters can work in the presence of errors in the frequency shifts. In such a
case, the adaptive-FRESH filter behaves as an LPTV filter for those branches with an error in
the frequency shift. In order to be effective, the rate of convergence of the adaptive algorithm
must be carefully chosen. The optimal rate of convergence results from the trade-off between
Fig. 14. Simulated time-averaged MSE as a function of the forgetting rate of the RLS
algorithm, in the presence of a BPSK interference with random parameters.
decreasing the excess MSE due to gradient noise (slow rate of convergence) and decreasing
the excess MSE due to lag error (fast rate of convergence). The analytical expressions in
this section allow computing the optimal convergence factor and the time-averaged MSE at steady-state of the LMS-FRESH filter.
been removed, the detection is performed on the residual, which is assumed to consist of only
the noise and the SOI.
filters. As a result, the frequency response of the optimum set of LTI filters becomes:
where ηr is the noise Power Spectral Density (PSD), and IL is the identity matrix of size L × L.
In addition, the definition of the spectral correlation vector Suu ( f ) is analogous to (50), while
the spectral autocorrelation matrices Sss ( f ) and Suu ( f ) are defined analogously to (51).
Hw(f) = 1/√(1 + |WΓs(f)|²) (96)
Moreover, using an adaptive algorithm requires some training time, during which the
detector decisions could be wrong. The scheme proposed in Fig. 16 consists of a branch for
training the interference rejection system which incorporates the adaptive FRESH filter, and
an independent non-adaptive FRESH filter which effectively performs the signal separation
task and is the one connected to the detector. The coefficients of the non-adaptive FRESH
filter are fixed once the adaptive one has converged. This requires some mechanism for
controlling the beginning and ending of a learning interval, in which the detector decisions are
considered wrong. Such a mechanism might be based on monitoring the power variations of
the interferences estimate and the SOI plus noise estimate, updating the non-adaptive FRESH
system when both powers stabilize.
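Such a freeze criterion could look like the following sketch; this is an assumption of mine, since the chapter does not specify the exact mechanism. Convergence is declared when the monitored power estimates stop varying by more than a tolerance over a sliding window:

```python
def powers_stabilized(power_history, window=10, tol=0.01):
    """Return True when the last `window` power measurements vary by less
    than a fraction `tol` of their maximum (one possible freeze criterion)."""
    if len(power_history) < window:
        return False
    recent = power_history[-window:]
    return (max(recent) - min(recent)) <= tol * max(recent)
```

In the scheme of Fig. 16, this test would be applied to both the interference-estimate power and the SOI-plus-noise-estimate power, and the non-adaptive filter copied only when both return True.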
7.3 Application
Finally, let us quantify the performance of the described interception system through a case
study whose scenario is represented in Fig. 17. In that scenario, the SOI is a Continuous-Wave
Fig. 17. Scenario for the detection of a CW-LFM signal hidden by a DS-BPSK interference.
Fig. 18. PSD of noise, the CW-LFM signal (SOI) and the DS-BPSK interference (interference).
SNR and INR fixed at 0 dB.
Linear Frequency Modulated (CW-LFM) radar signal which is being transmitted by hostile equipment and is, therefore, unknown. This signal is intentionally transmitted in the spectral
band of a known Direct-Sequence Binary Phase Shift Keying (DS-BPSK) spread-spectrum
communication signal, with the aim of hindering its detection. The SOI sweeps a total
bandwidth of BW = 20 MHz with a sweeping time Tp = 0.5 ms, while the DS-BPSK
interfering signal employs a chip rate 1/Tc = 10.23 Mcps. The PSDs of the SOI, the
interference and the noise are shown in Fig. 18.
The structure of the FRESH filter is designed based on a previous study of the performance of
the sub-optimal FRESH filters. As a result, the interference rejection system incorporates an
adaptive FRESH filter consisting of 5 branches, each one using a FIR filter of 1024 coefficients.
The frequency shifts of the FRESH filter correspond to the 5 cycle frequencies of the DS-BPSK
interference with the highest spectral correlation level, which are (Gardner et al., 1987):
{±1/Tc } for the input x (t), and {2 f c , 2 f c ± 1/Tc } for the complex conjugate of the input
x ∗ (t); where f c is the carrier frequency and Tc is the chip duration of the DS-BPSK interference.
The adaptive algorithm used is Fast-Block Least Mean Squares (FB-LMS) (Haykin, 2001), with
a convergence factor μ = 1.
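The branch structure just described can be sketched as follows; this is a generic (non-adaptive) FRESH filter with hypothetical function and argument names, and the FB-LMS coefficient update used in the chapter is omitted. Each branch frequency-shifts either x(n) or its conjugate and applies a FIR filter, and the branch outputs are summed:

```python
import numpy as np

def fresh_filter(x, shifts_x, shifts_conj, firs_x, firs_conj):
    """Sum of frequency-shifted, FIR-filtered branches: `shifts_x`/`firs_x`
    act on x(n), `shifts_conj`/`firs_conj` on conj(x(n)). Shifts are
    normalized to the sampling frequency."""
    n = np.arange(len(x))
    y = np.zeros(len(x), dtype=complex)
    for alpha, h in zip(shifts_x, firs_x):
        y += np.convolve(x * np.exp(2j * np.pi * alpha * n), h, mode="same")
    for alpha, h in zip(shifts_conj, firs_conj):
        y += np.convolve(np.conj(x) * np.exp(2j * np.pi * alpha * n), h, mode="same")
    return y
```

For the interference rejection system above, `shifts_x` would be {±1/Tc} and `shifts_conj` would be {2fc, 2fc ± 1/Tc}, each with a 1024-tap FIR filter.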
Next, we present some simulation results on the interception system performance obtained
after the training interval has finished. Firstly, the improvement in the SIR obtained at the
output of the interference rejection system is represented in Fig. 19, as a function of the input
Fig. 19. Simulated SIR improvement as a function of the input SINR and INR.
Fig. 20. Simulated INR at the output of the interference rejection system.
Fig. 21. Degradation of the PFA when the interference is present (with the interference
rejection system).
interference becomes masked by noise if the input INR is low enough (i.e. INR= 0 dB).
As the input INR increases, the interference rejection system increases its effectiveness, so
that the output INR decreases. That is why the output INR is lower for INR= 10 dB and
INR= 20 dB than for INR= 0 dB.
The output INR can provide an idea about the degradation of the probability of false alarm
(PFA ) of the detector (the probability of detection when the SOI is not present). However,
each particular detector is affected in a different way. We shall illustrate this fact with two
different detectors. The first one is an energy detector (ED), which consists of comparing the
total energy to a detection threshold set to attain a desired PFA . The second one is a detector
based on atomic decomposition (AD), such as the one proposed in (López Risueño et al., 2003). This detector exhibits excellent performance for LFM signals, such as the SOI considered in our case study. The AD-based detector can be thought of as a bank of correlators or matched filters, each one matched to a chirplet (a signal with a Gaussian envelope and LFM), whose maximum output is compared to a detection threshold. Both detectors process the signal in blocks, making a decision every 1024 samples.
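The block-wise energy detector is simple enough to sketch directly; the function name and threshold handling are illustrative, and the AD-based detector (a bank of chirplet-matched filters) is omitted:

```python
import numpy as np

def energy_detector(x, threshold, block=1024):
    """Split x into blocks and declare a detection for each block whose
    total energy exceeds the threshold."""
    n_blocks = len(x) // block
    decisions = []
    for k in range(n_blocks):
        energy = np.sum(np.abs(x[k * block:(k + 1) * block]) ** 2)
        decisions.append(energy > threshold)
    return decisions
```

In practice the threshold would be set from the noise-only energy distribution so as to attain the desired PFA, which is exactly why any residual interference energy at the detector input degrades that PFA.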
Fig. 21 shows the degraded PFA of the whole interception system in the presence of the
interference, and when the detection threshold has been determined for an input consisting
of only noise. The curves clearly show the different dependence of the PFA of both detectors
on the input INR. The energy detector exhibits a higher sensitivity to the interference than
the AD-based one. Thus, the AD-based detector visibly degrades its PFA only for an input
INR= 40 dB and above. On the contrary, ED always exhibits a degraded PFA in the presence
of the interference due to the energy excess, which is proportional to the output INR shown
in Fig. 20.
Finally, we end this application example by showing the sensitivity improvement of the
interception system obtained thanks to the interference rejection system. The sensitivity is
defined as the SNR at the input of the interception system required to attain an objective
probability of detection (PD = 90%), for a given probability of false alarm (PFA = 10−6 ). Thus,
the detection threshold takes a different value depending on the input INR so that the PFA
holds for all the INR values. The simulation results are gathered in Tab. 2, with all values
expressed in dB. At each INR value, the sensitivities of both detectors, AD and ED, with and
         With interf. rej. system   Without interf. rej. system   Sensitivity improvement
INR         AD      ED                 AD      ED                    AD      ED
−∞       -12.1    -6.1              -12.1    -6.1                   0.0     0.0
0         -9.6    -4.3               -4.0    -3.5                   5.6     0.8
10        -9.0    -4.3                5.8     1.1                  14.8     5.4
20        -9.2    -4.3               15.9     6.8                  27.1    11.1
30        -8.3    -3.4               26.0    14.3                  34.3    17.7
40        -4.1    -0.5               35.8    21.6                  39.9    22.1
Table 2. Sensitivity (SNR, dB) for the CW-LFM signal as a function of the INR. PFA = 10−6, PD = 90%.
without the interference rejection system, are shown. The sensitivity improvement obtained
by using the interference rejection system is also shown.
As can be seen, the improvement is very significant and proves the benefit of using the
interference rejection system. Moreover, the improvement is higher for increasing input INR.
However, there is still a sensitivity degradation as the INR increases, due to an increase in the detection threshold and/or a distortion of the SOI produced by the interference rejection system because of signal leakage at the FRESH output (the latter only applies to AD, since ED is insensitive to the signal waveform). And, as expected, the AD-based detector outperforms ED (López Risueño et al., 2003).
8. Summary
This chapter has described the theory of adaptive FRESH filtering. FRESH filters represent a comprehensible implementation of LAPTV filters, which are the optimum filters for estimating or extracting signal information when the signals are modelled as almost-cyclostationary stochastic processes. When dealing with complex signals, both the signal and its complex conjugate must be filtered, resulting in WLAPTV filters. The knowledge required for the design of optimal FRESH filters is rarely available beforehand in practice, which leads to the incorporation of adaptive schemes. Since FRESH filters consist of a set of LTI filters, classical adaptive algorithms can be applied by simply using the stationarized versions of the inputs of these LTI filters, which are obtained by time-averaging their statistics. Then, the optimal set of LTI filters is given by multidimensional Wiener filter theory.
In addition, thanks to their signal separation properties in the cycle-frequency domain, adaptive FRESH filters can operate blindly, that is, without a reference of the desired signal, by simply using as frequency shifts the cycle frequencies belonging uniquely to the cyclic spectrum of the desired signal. Furthermore, adaptive FRESH filters have the advantage of being able to compensate small errors in their frequency shifts, which can appear in practice due to non-ideal effects such as Doppler or oscillator instability. In this case, the convergence rate of the adaptive algorithm must be carefully chosen in order to simultaneously minimize the gradient noise and the lag errors. The chapter closes with an application example, in which an adaptive FRESH filter is used to suppress known interferences in an interception system for unknown hidden signals, demonstrating the potential of adaptive FRESH filters in this field of application.
9. References
Adlard, J., Tozer, T. & Burr, A. (1999). Interference rejection in impulsive noise for
VLF communications, IEEE Military Communications Conference Proceedings, 1999.
MILCOM 1999., pp. 296–300.
Agee, B. G., Schell, S. V. & Gardner, W. A. (1990). Spectral self-coherence restoral: A new
approach to blind adaptive signal extraction using antenna arrays, Proceedings of the
IEEE 78(4): 753–767.
Brown, W. A. (1987). On the Theory of Cyclostationary Signals, Ph.D. dissertation.
Chen, Y. & Liang, T. (2010). Application study of BA-FRESH filtering technique for
communication anti-jamming, IEEE 10th International Conference on Signal Processing
(ICSP), pp. 287–290.
Chevalier, P. & Blin, A. (2007). Widely linear MVDR beamformers for the reception of an
unknown signal corrupted by noncircular interferences, IEEE Transactions on Signal
Processing 55(11): 5323–5336.
Chevalier, P. & Maurice, A. (1997). Constrained beamforming for cyclostationary signals,
International Conference on Acoustics, Speech, and Signal Processing, ICASSP-97.
Chevalier, P. & Pipon, F. (2006). New insights into optimal widely linear array
receivers for the demodulation of BPSK, MSK, and GMSK signals corrupted by
noncircular interferences - Application to SAIC, IEEE Transactions on Signal Processing
54(3): 870–883.
Corduneanu, C. (1968). Almost Periodic Functions, Interscience Publishers.
Franks, L. E. (1994). Polyperiodic linear filtering, in W. A. Gardner (ed.), Cyclostationarity in
Communicactions and Signal Processing, IEEE Press.
Gameiro, A. (2000). Capacity enhancement of DS-CDMA synchronous channels by
frequency-shift filtering, Proceedings of the IEEE International Symposium on Personal,
Indoor and Mobile Radio Communications.
Gardner, W. A. (1978). Stationarizable random processes, IEEE Transactions on Information
Theory 24(1): 8–22.
Gardner, W. A. (1986). Introduction to Random Processes with Applications to Signals and Systems,
Macmillan Publishing Company.
Gardner, W. A. (1987). Spectral correlation of modulated signals: Part I – analog modulation,
IEEE Transactions on Communications 35(6): 584–594.
Gardner, W. A. (1991). Exploitation of spectral redundancy in cyclostationary signals, IEEE
Signal Processing Magazine 8(2): 14–36.
Gardner, W. A. (1993). Cyclic Wiener filtering: Theory and method, IEEE Transactions on
Communications 41(1): 151–163.
Gardner, W. A. (1994). Cyclostationarity in Communications and Signal Processing, IEEE Press.
Gardner, W. A., Brown, W. A. & Chen, C. K. (1987). Spectral correlation of modulated signals:
Part II – digital modulation, IEEE Transactions on Communications 35(6): 595–601.
Gardner, W. A. & Franks, L. E. (1975). Characterization of cyclostationary random signal
processes, IEEE Transactions on Information Theory 21(1): 4–14.
Gelli, G. & Verde, F. (2000). Blind LPTV joint equalization and interference suppression,
Acoustics, Speech, and Signal Processing, 2000. ICASSP ’00. Proceedings. 2000 IEEE
International Conference on.
Giannakis, G. B. (1998). Cyclostationary signal analysis, in V. K. Madisetti & D. Williams (eds),
The Digital Signal Processing Handbook, CRC Press.
Gonçalves, L. & Gameiro, A. (2002). Frequency shift based multiple access interference
canceller for multirate UMTS-TDD systems, The 13th IEEE International Symposium
on Personal, Indoor and Mobile Radio Communications.
Haykin, S. (2001). Adaptive Filter Theory, Prentice Hall.
Hu, Y., Xia, W. & Shen, L. (2007). Study of anti-interference mechanism of multiple WPANs
accessing into a HAN, International Symposium on Intelligent Signal Processing and
Communication Systems.
Jianhui, P., Zhongfu, Y. & Xu, X. (2006). A novel robust cyclostationary beamformer based
on conjugate gradient algorithm, 2006 International Conference on Communications,
Circuits and Systems Proceedings, Vol. 2, pp. 777–780.
Lee, J. H. & Lee, Y. T. (1999). Robust adaptive array beamforming for cyclostationary
signals under cycle frequency error, IEEE Transactions on Antennas and Propagation
47(2): 233–241.
Lee, J. H., Lee, Y. T. & Shih, W. H. (2000). Efficient robust adaptive beamforming for
cyclostationary signals, IEEE Transactions on Signal Processing 48(7): 1893–1901.
Li, X. & Ouyang, S. (2009). One reduced-rank blind fresh filter for spectral overlapping
interference signal extraction and DSP implementation, International Workshop on
Intelligent Systems and Applications, ISA, pp. 1–4.
Loeffler, C. M. & Burrus, C. S. (1978). Optimal design of periodically time-varying and
multirate digital filters, IEEE Transactions on Acoustics, Speech and Signal Processing
66(1): 51–83.
López Risueño, G., Grajal, J. & Yeste-Ojeda, O. A. (2003). Atomic decomposition-based
radar complex signal interception, IEE Proceedings on Radar, Sonar and Navigation
150: 323–331.
Martin, V., Chabert, M. & Lacaze, B. (2007). Digital watermarking of natural images based
on LPTV filters, Acoustics, Speech and Signal Processing, 2007. ICASSP 2007. IEEE
International Conference on.
Mirbagheri, A., Plataniotis, K. & Pasupathy, S. (2006). An enhanced widely linear
CDMA receiver with OQPSK modulation, IEEE Transactions on Communications
54(2): 261–272.
Napolitano, A. & Spooner, C. M. (2001). Cyclic spectral analysis of continuous-phase
modulated signals, IEEE Transactions on Signal Processing 49(1): 30–44.
Ngan, L. Y., Ouyang, S. & Ching, P. C. (2004). Reduced-rank blind adaptive frequency-shift
filtering for signal extraction, IEEE International Conference on Acoustics, Speech, and
Signal Processing, ICASSP, Vol. 2.
Petrus, P. & Reed, J. H. (1995). Time dependent adaptive arrays, IEEE Signal Processing Letters
2(12): 219–222.
Picinbono, B. & Chevalier, P. (1995). Widely linear estimation with complex data, IEEE
Transactions on Signal Processing 43(8): 2030–2033.
Reed, J. H. & Hsia, T. C. (1990). The performance of time-dependent adaptive filters for
interference rejection, IEEE Transactions on Acoustics, Speech and Signal Processing
38(8): 1373–1385.
Schell, S. V. & Gardner, W. A. (1990). Progress on signal-selective direction finding, Fifth ASSP
Workshop on Spectrum Estimation and Modeling, pp. 144–148.
Schmidt, R. O. (1986). Multiple emitter location and signal parameter estimation, IEEE
Transactions on Antennas and Propagation 34(3): 276–280.
Van Trees, H. L. (1968). Detection, Estimation, and Modulation Theory, Vol. 1, John Wiley and
Sons.
Whitehead, J. & Takawira, F. (2004). Low complexity constant modulus based cyclic blind
adaptive multiuser detection, AFRICON, 2004. 7th AFRICON Conference in Africa,
Vol. 1, pp. 115–120.
Whitehead, J. & Takawira, F. (2005). Blind adaptive multiuser detection for periodically time
varying interference suppression [DS-CDMA system applications], IEEE Wireless
Communications and Networking Conference, 2005, Vol. 1, pp. 273–279.
Widrow, B., Glover, Jr., J. R., McCool, J. M., Kaunitz, J., Williams, C. S., Hearn, R. H., Zeidler,
J. R., Dong, Jr., E. & Goodlin, R. C. (1975). Adaptive noise cancelling: Principles and
applications, Proceedings of the IEEE 63(12): 1692–1717.
Widrow, B., McCool, J. M., Larimore, M. G. & Johnson, Jr., C. R. (1976). Stationary and
nonstationary learning characteristics of the LMS adaptive filter, Proceedings of the
IEEE 64(8): 1151–1162.
Wong, H. E. & Chambers, J. A. (1996). Two-stage interference immune blind equaliser which
exploits cyclostationary statistics, Electronics Letters 32(19): 1763–1764.
Yeste-Ojeda, O. A. & Grajal, J. (2008). Cyclostationarity-based signal separation in interceptors
based on a single sensor, IEEE Radar Conference 2008, pp. 1–6.
Yeste-Ojeda, O. A. & Grajal, J. (2010). Adaptive-FRESH filters for compensation of
cycle-frequency errors, IEEE Transactions on Signal Processing 58(1): 1–10.
Zadeh, L. A. (1950). Frequency analysis of variable networks, Proceedings of the I.R.E.
pp. 291–299.
Zhang, H., Abdi, A. & Haimovich, A. (2006). Reduced-rank multi-antenna cyclic Wiener
filtering for interference cancellation, Military Communications Conference, 2006.
MILCOM 2006.
Zhang, J., Liao, G. & Wang, J. (2004). A novel robust adaptive beamforming for cyclostationary
signals, The 7th International Conference on Signal Processing, ICSP, Vol. 1, pp. 339–342.
Zhang, J., Wong, K. M., Luo, Z. Q. & Ching, P. C. (1999). Blind adaptive FRESH filtering for
signal extraction, IEEE Transactions on Signal Processing 47(5): 1397–1402.
12
1. Introduction
The Least Mean Square (LMS) algorithm is probably the most popular adaptive algorithm. Since its introduction in Widrow & Hoff (1960), the algorithm has been widely used in many applications such as system identification, communication channel equalization, signal prediction, sensor array processing, medical applications, and echo and noise cancellation. The popularity of the algorithm is due to its low complexity, but also to its good properties such as robustness Haykin (2002); Sayed (2008). Let us explain the algorithm using the example of system identification.
[Figure: system identification setup. The adaptive filter and the unknown plant are driven by the same input x(n); the plant output d(n) is compared with the adaptive filter output y(n) to form the error e(n) = d(n) − y(n).]
The aim is to identify the impulse response of the plant by observing its input and output signals. To do so we connect an adaptive
filter in parallel with the plant. The adaptive filter is a linear filter with output signal y(n).
We then compare the output signals of the plant and the adaptive filter and form the error
signal e(n). Obviously one would like have the error signal to be as small as possible in some
sense. The LMS algorithm achieves this by minimizing the mean squared error but doing this
in instantaneous fashion. If we collect the impulse response coefficients of our adaptive filter
computed at iteration n into a vector w(n) and the input signal samples into a vector x(n), the
LMS algorithm updates the weight vector estimate at each iteration as
where μ is the step size of the algorithm. One can see that the weight update is in fact a low
pass filter with transfer function
μ
H (z) = (2)
1 − z −1
operating on the signal e∗(n)x(n). The step size in fact determines the extent of initial averaging performed by the algorithm. If μ is small, only a little new information is passed into the algorithm at each iteration; the averaging is thus over a large number of samples and the resulting estimate is more reliable, but building the estimate takes more time. On the other hand, if μ is large, a lot of new information is passed into the weight update at each iteration; the extent of averaging is small and we get a less reliable estimate, but we get it relatively fast.
When designing an adaptive algorithm, one thus faces a trade-off between the initial convergence speed and the mean-square error in steady state. For algorithms belonging to the Least Mean Square family this trade-off is controlled by the step-size parameter: a large step size leads to fast initial convergence, but the algorithm also exhibits a large mean-square error in steady state; conversely, a small step size slows down the convergence but results in a small steady-state error.
Variable step size adaptive schemes offer a possible solution, allowing one to achieve both fast initial convergence and low steady state misadjustment Arenas-Garcia et al. (1997); Harris et al. (1986); Kwong & Johnston (1992); Matthews & Xie (1993); Shin & Sayed (2004). How
successful these schemes are depends on how well the algorithm is able to estimate the
distance of the adaptive filter weights from the optimal solution. The variable step size
algorithms use different criteria for calculating the proper step size at any given time instance.
For example squared instantaneous errors have been used in Kwong & Johnston (1992)
and the squared autocorrelation of errors at adjacent time instances have been used in
Arenas-Garcia et al. (1997). The reference Matthews & Xie (1993) investigates an algorithm that changes the time-varying convergence parameters in such a way that the change is proportional to the negative of the gradient of the squared estimation error with respect to the convergence parameter. In reference Shin & Sayed (2004) the norm of the projected weight error
vector is used as a criterion to determine how close the adaptive filter is to its optimum
performance.
Recently there has been an interest in a combination scheme that is able to optimize the
trade–off between convergence speed and steady state error Martinez-Ramon et al. (2002).
The scheme consists of two adaptive filters that are simultaneously applied to the same inputs
as depicted in Figure 2. One of the filters has a large step size allowing fast convergence
and the other one has a small step size for a small steady state error. The outputs of the
filters are combined through a mixing parameter λ. The performance of this scheme has been
studied for some parameter update schemes Arenas-Garcia et al. (2006); Bershad et al. (2008);
Transient Analysis of a Combination of Two Adaptive Filters
Candido et al. (2010); Silva et al. (2010). The reference Arenas-Garcia et al. (2006) uses a convex combination, i.e. λ is constrained to lie between 0 and 1. The references Silva et al. (2010) and Candido et al. (2010) present transient analyses of slightly modified versions of this scheme.
In those papers the parameter λ is found using an LMS-type adaptive scheme, possibly followed by computing a sigmoidal function of the result. The reference Bershad et al. (2008) takes another approach, computing the mixing parameter through an affine combination: the ratio of time averages of the instantaneous errors of the two filters is formed, and the error function of that ratio is computed to obtain λ.
In Mandic et al. (2007) a convex combination of two adaptive filters with different adaptation
schemes has been investigated with the aim to improve the steady state characteristics. One
of the adaptive filters in that paper uses LMS algorithm and the other one Generalized
Normalized Gradient Decent algorithm. The combination parameter λ is computed using
stochastic gradient adaptation. In Zhang & Chambers (2006) the convex combination of two
adaptive filters is applied in a variable filter length scheme to gain improvements in low SNR
conditions. In Kim et al. (2008) the combination has been used to join two affine projection
filters with different regularization parameters. The work Fathiyan & Eshghi (2009) uses the
combination on parallel binary structured LMS algorithms. These three works use the LMS
like scheme of Azpicueta-Ruiz et al. (2008b) to compute λ.
It should be noted that schemes involving two filters have been proposed earlier Armbruster (1992); Ochiai (1977). However, in those early schemes only one of the filters was adaptive while the other one used fixed filter weights. The fixed filter was updated by copying all the coefficients from the adaptive filter whenever the adaptive filter performed better than the fixed one.
In this chapter we compute the mixing parameter λ from the output signals of the individual filters. The scheme was independently proposed in Trump (2009a) and Azpicueta-Ruiz et al. (2008a); its steady state performance was investigated in Trump (2009b) and its tracking performance in Trump (2009c). The way of calculating the mixing parameter is optimal in the
sense that it results from minimization of the mean-squared error of the combined filter. In the
main body of this chapter we present a transient analysis of the algorithm. We will assume
throughout the chapter that the signals are complex–valued and that the combination scheme
uses two LMS adaptive filters. The italic, bold face lower case and bold face upper case letters
will be used for scalars, column vectors and matrices respectively. The superscript ∗ denotes
complex conjugation and H Hermitian transposition of a matrix. The operator E[·] denotes
mathematical expectation, tr [·] stands for trace of a matrix and Re{·} denotes the real part of
a complex variable.
2. Algorithm
Let us consider two adaptive filters, as shown in Figure 2, each of them updated using the LMS adaptation rule
wi(n) = wi(n − 1) + μi x(n)e∗i(n),   (3)
ei(n) = d(n) − wi^H(n − 1)x(n).   (4)
In the above equations the vector wi(n) is the length N vector of coefficients of the i-th adaptive filter, with i = 1, 2. The vector wo is the true weight vector we aim to identify with our adaptive scheme and x(n) is the N input vector, common for both of the adaptive filters.
Fig. 2. Combination of two adaptive filters: the filters w1(n) and w2(n) operate on the common input x(n); their outputs y1(n) and y2(n), with errors e1(n) and e2(n) formed against the desired signal d(n), are combined through λ(n) and 1 − λ(n) to give the output y(n).
The input process is assumed to be a zero mean wide sense stationary Gaussian process. The
desired signal d(n) is a sum of the output of the filter to be identified and the Gaussian, zero
mean i.i.d. measurement noise v(n). We assume that the measurement noise is statistically
independent of all the other signals. μi is the step size of i–th adaptive filter. We assume
without loss of generality that μ1 > μ2. The case μ1 = μ2 is not interesting, as in this case the two filters remain equal and the combination reduces to a single filter.
The outputs of the two adaptive filters are combined according to
y(n) = λ(n)y1(n) + (1 − λ(n))y2(n),   (5)
where yi(n) = wi^H(n − 1)x(n) and the mixing parameter λ can be any real number.
We define the a priori system error signal as the difference between the output signal of the true system at time n, given by yo(n) = wo^H x(n) = d(n) − v(n), and the output signal of our adaptive scheme y(n):
ea(n) = yo(n) − y(n).   (6)
Let us now find λ(n) by minimizing the mean square of the a priori system error. The derivative of E[|ea(n)|²] with respect to λ(n) reads
∂E[|ea(n)|²]/∂λ(n) = 2E[Re{(yo(n) − λ(n)y1(n) − (1 − λ(n))y2(n))(−y1(n) + y2(n))∗}]
= 2E[Re{(yo(n) − y2(n))(y2(n) − y1(n))∗} + λ(n)|y2(n) − y1(n)|²].   (7)
Setting the derivative to zero results in
λ(n) = E[Re{(yo(n) − y2(n))(y1(n) − y2(n))∗}] / E[|y1(n) − y2(n)|²].   (8)
3. Transient analysis
In this section we are interested in finding expressions that characterize the transient performance of the combined algorithm, i.e. we intend to derive formulae that characterize the entire course of adaptation. Before we proceed we need, however, to introduce some notation. First, let us denote the weight error vector of the i-th filter as
w̃i (n) = wo − wi (n). (9)
Then the equivalent weight error vector of the combined adaptive filter will be
w̃(n) = λw̃1 (n) + (1 − λ)w̃2 (n). (10)
The mean square deviation (MSD) of the combined filter is given by
MSD(n) = E[w̃^H(n)w̃(n)] = λ²E[w̃1^H(n)w̃1(n)] + λ(1 − λ)E[w̃1^H(n)w̃2(n)] + λ(1 − λ)E[w̃2^H(n)w̃1(n)] + (1 − λ)²E[w̃2^H(n)w̃2(n)].   (11)
As ei,a(n) = w̃i^H(n − 1)x(n), the expression for the excess mean square error becomes
EMSE(n) = λ²EMSE1,1 + λ(1 − λ)(EMSE1,2 + EMSE2,1) + (1 − λ)²EMSE2,2,   (15)
where EMSEk,l = E[w̃k^H(n − 1)x(n)x^H(n)w̃l(n − 1)]. In what follows we often drop the explicit time index n, as we have done in (15), when it is not necessary to avoid confusion.
Noting that yi(n) = wi^H(n − 1)x(n), we can rewrite the expression for λ(n) in (8) as
λ(n) = (EMSE2,2 − Re{EMSE2,1}) / (EMSE1,1 + EMSE2,2 − 2Re{EMSE1,2}).   (16)
We thus need to investigate the evolution of the individual terms of the type EMSEk,l = E[w̃k^H(n − 1)x(n)x^H(n)w̃l(n − 1)] in order to reveal the time evolution of EMSE(n) and λ(n). To do so we, however, first concentrate on the mean square deviation defined in (11).
For a single LMS filter we have, after subtracting (3) from wo and expressing ei(n) through the error of the corresponding Wiener filter eo(n),
w̃i(n) = (I − μi x(n)x^H(n))w̃i(n − 1) − μi x(n)eo∗(n).   (17)
We next approximate the outer product of input signal vectors by its correlation matrix, x(n)x^H(n) ≈ Rx. The approximation is justified by the fact that with a small step size the weight error update of the LMS algorithm (17) behaves like a low pass filter with a low cutoff frequency. With this approximation we have
w̃i(n) = (I − μi Rx)w̃i(n − 1) − μi x(n)eo∗(n).   (18)
This means in fact that we apply the small step size theory Haykin (2002) even if the
assumption of small step size is not really true for the fast adapting filter. In our simulation
study we will see, however, that the assumption works in practice rather well.
Let us now define the eigendecomposition of the correlation matrix as
Q H R x Q = Ω, (19)
where Q is a unitary matrix whose columns are the orthogonal eigenvectors of R x and Ω is
a diagonal matrix having eigenvalues associated with the corresponding eigenvectors on its
main diagonal. We also define the transformed weight error vector as
vi(n) = Q^H w̃i(n)   (20)
and the transformed noise vector as
pi(n) = μi Q^H x(n)eo∗(n).   (21)
Then we can rewrite equation (18), after multiplying both sides by Q^H from the left, as
vi(n) = (I − μiΩ)vi(n − 1) − pi(n).   (22)
We note that the mean of pi is zero by the orthogonality theorem, and the cross-correlation matrix of pk and pl equals
E[pk(n)pl^H(n)] = μkμl Q^H E[x(n)eo∗(n)eo(n)x^H(n)] Q.   (23)
For Gaussian signals the fourth-order moment factors as
E[xeo∗(n)eo(n)x^H] = E[xeo∗(n)] E[eo(n)x^H] + E[xx^H] E[|eo|²].   (24)
The first term in the above is zero due to the principle of orthogonality and the second term equals Rx Jmin. Hence we are left with
E[pk(n)pl^H(n)] = μkμl Jmin Ω,   (25)
where Jmin = E[|eo|²] is the minimum mean square error produced by the corresponding Wiener filter. As the matrices I and Ω in (22) are diagonal, it follows that the m-th element of the vector vi(n) is given by
vi,m(n) = (1 − μiωm)vi,m(n − 1) − pi,m(n),   (26)
where ωm is the m-th eigenvalue of Rx and vi,m and pi,m are the m-th components of the vectors vi and pi respectively.
We immediately see that the mean value of vi,m(n) equals
E[vi,m(n)] = (1 − μiωm)^n E[vi,m(0)],   (27)
which converges to zero as n → ∞ provided that the step size satisfies
0 < μi < 2/ωmax,   (29)
where ωmax is the largest eigenvalue of Rx.
In some applications the input signal correlation matrix and its eigenvalues are not known a
priori. In this case it may be convenient to use the fact that
tr{Rx} = ∑_{i=0}^{N−1} rx(i, i) = ∑_{i=0}^{N−1} ωi > ωmax,   (30)
where rx(i, i) is the i-th diagonal element of the matrix Rx. Then we can normalize the step size with the instantaneous estimate of the trace of the correlation matrix, x^H(n)x(n), to get the so-called normalized LMS algorithm. The normalized LMS algorithm uses the normalized step size
μi = αi / (x^H(n)x(n))
and is convergent if
0 < αi < 2.
The normalized LMS is more convenient for practical use if the properties of the input signal are unknown or vary in time, as is for example the case with speech signals.
To proceed with our development for the combination of two LMS filters we note that we can
express the MSD and its individual components in (11) through the transformed weight error
vectors as
E[w̃k^H(n)w̃l(n)] = E[vk^H(n)vl(n)] = ∑_{m=0}^{N−1} E[vk,m(n)v∗l,m(n)],   (31)
so we also need to find the auto– and cross correlations of v. Let us concentrate on
the m-th component in the sum above corresponding to the cross term and denote it as
Υm = E[vk,m (n)v∗l,m (n)]. The expressions for the component filters follow as special cases.
Substituting (26) into the expression of Υm above, taking the mathematical expectation and
noting that the vector p is independent of v(0) results in
Υm = (1 − μkωm)^n(1 − μlωm)^n E[vk,m(0)v∗l,m(0)]   (32)
+ E[ ∑_{i=0}^{n−1} ∑_{j=0}^{n−1} (1 − μkωm)^{n−1−i}(1 − μlωm)^{n−1−j} pk,m(i)p∗l,m(j) ].
We now note that most likely the two component filters are initialized to the same value, vk,m(0) = vl,m(0) = vm(0), and that
E[pk,m(i)p∗l,m(j)] = μkμlωm Jmin if i = j, and 0 otherwise.   (33)
We then have for the m-th component of the MSD
Υm = (1 − μkωm)^n(1 − μlωm)^n |vm(0)|² + μkμlωm Jmin ∑_{i=0}^{n−1} (1 − μkωm)^{n−1−i}(1 − μlωm)^{n−1−i}.   (34)
The sum over i in the above equation can be recognized as a geometric series with n terms.
The first term is equal to 1 and the geometric ratio equals (1 − μk ωm )−1 (1 − μl ωm )−1 . Hence
we have
∑_{i=0}^{n−1} (1 − μkωm)^{−i}(1 − μlωm)^{−i}   (35)
= [ (1 − μkωm)(1 − μlωm) − (1 − μkωm)^{−n+1}(1 − μlωm)^{−n+1} ] / ( μkμlωm² − μkωm − μlωm ).
After substitution of the above into (34) and simplification we are left with
Υm = E[vk,m(n)v∗l,m(n)]   (36)
= (1 − μkωm)^n(1 − μlωm)^n ( |vm(0)|² − Jmin/(1/μk + 1/μl − ωm) ) + Jmin/(1/μk + 1/μl − ωm),
which is our result for a single entry of the MSD cross-term vector. It is easy to see that for the terms involving a single filter we get expressions that coincide with those available in the literature Haykin (2002).
Let us now focus on the cross term
EMSEkl = E[w̃k^H(n − 1)x(n)x^H(n)w̃l(n − 1)]
appearing in the EMSE equation (15). Due to the independence assumption we can rewrite this, using the properties of the trace operator, as
EMSEkl = E[w̃k^H(n − 1)Rx w̃l(n − 1)] = tr{Rx E[w̃l(n − 1)w̃k^H(n − 1)]}.   (37)
Let us now recall that for any of the filters w̃i(n) = Qvi(n), to write
EMSEkl = tr{Rx E[Qvl(n − 1)vk^H(n − 1)Q^H]}
= tr{E[vk^H(n − 1)Q^H Rx Q vl(n − 1)]}
= tr{E[vk^H(n − 1)Ω vl(n − 1)]}
= ∑_{i=0}^{N−1} ωi E[v∗k,i(n − 1)vl,i(n − 1)],   (38)
where the components of the type E[v∗k,i(n − 1)vl,i(n − 1)] are given by (36). To compute λ(n) we use (16), substituting (38) for its individual components.
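The transient expressions can be evaluated numerically. The sketch below implements the cross-moment of (36), the EMSE cross-term of (38) and the mixing parameter of (16) for real-valued moments; the eigenvalues, initial deviations and Jmin are illustrative stand-ins, not the chapter's simulation values.

```python
import numpy as np

def upsilon(n, mu_k, mu_l, omega, v0_sq, j_min):
    """Cross-moment (36): E[v_k,m(n) v_l,m(n)*] under the small step-size theory."""
    decay = (1.0 - mu_k * omega) ** n * (1.0 - mu_l * omega) ** n
    c = j_min / (1.0 / mu_k + 1.0 / mu_l - omega)   # steady-state level
    return decay * (v0_sq - c) + c

def emse_cross(n, mu_k, mu_l, omegas, v0_sqs, j_min):
    """EMSE cross-term (38): eigenvalue-weighted sum of the moments (36)."""
    return sum(w * upsilon(n - 1, mu_k, mu_l, w, v0, j_min)
               for w, v0 in zip(omegas, v0_sqs))

def mixing_lambda(n, mu1, mu2, omegas, v0_sqs, j_min):
    """lambda(n) from (16), written for real-valued moments."""
    e11 = emse_cross(n, mu1, mu1, omegas, v0_sqs, j_min)
    e22 = emse_cross(n, mu2, mu2, omegas, v0_sqs, j_min)
    e12 = emse_cross(n, mu1, mu2, omegas, v0_sqs, j_min)
    return (e22 - e12) / (e11 + e22 - 2.0 * e12)

omegas = [1.0] * 4           # white input: all eigenvalues equal
v0_sqs = [0.1] * 4           # illustrative initial deviation per mode
lam_early = mixing_lambda(10, 0.005, 0.0005, omegas, v0_sqs, 1e-3)
lam_late = mixing_lambda(100000, 0.005, 0.0005, omegas, v0_sqs, 1e-3)
```

Note that early in the adaptation the unconstrained optimum can exceed unity (the two filters are still nearly identical, so the denominator is tiny), while in steady state λ settles at a small negative value, favouring the slow filter.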
4. Simulation results
A simulation study was carried out with the aim of verifying the approximations made in the
previous Section. In particular we are interested in how well the small step-size theory applies
to our combination scheme of two adaptive filters.
We have selected the sample echo path model number one shown in Figure 3 from ITU-T
Recommendation G.168 Digital Network Echo Cancellers (2009), to be the unknown system to
identify.
We have combined two 64-tap adaptive filters. In order to obtain a practical algorithm, the expectation operators in both the numerator and denominator of (8) have been replaced by exponential averaging of the type
Pu(n) = (1 − γ)Pu(n − 1) + γu(n),
where u(n) is the signal to be averaged, Pu(n) is the averaged quantity and γ = 0.01. The averaged quantities were then used in (8) to obtain λ. With this design the numerator and
denominator of the λ expression (8) are relatively small random variables at the beginning of the test cases. For practical purposes we have therefore restricted λ to be less than unity and added a small constant to the denominator to avoid division by zero.
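A possible implementation of the resulting practical scheme is sketched below. The plant, filter length and step sizes are illustrative (the chapter's own experiments use a 64-tap filter and a G.168 echo path, which are not reproduced here); d(n) stands in for yo(n) in (8), since yo(n) = d(n) − v(n), the numerator and denominator are exponentially averaged, and λ is restricted to be at most one.

```python
import numpy as np

def combined_lms(x, d, n_taps, mu1, mu2, gamma=0.01, eps=1e-6):
    """Fast (mu1) and slow (mu2) LMS filters on the same input; lambda is the
    ratio of exponentially averaged numerator/denominator of (8)."""
    w1 = np.zeros(n_taps)
    w2 = np.zeros(n_taps)
    p_num = 0.0
    p_den = 0.0
    lam_hist = np.zeros(len(x))
    for n in range(n_taps, len(x)):
        xn = x[n - n_taps:n][::-1]
        y1, y2 = w1 @ xn, w2 @ xn
        w1 += mu1 * (d[n] - y1) * xn                  # fast filter update
        w2 += mu2 * (d[n] - y2) * xn                  # slow filter update
        p_num = (1 - gamma) * p_num + gamma * (d[n] - y2) * (y1 - y2)
        p_den = (1 - gamma) * p_den + gamma * (y1 - y2) ** 2
        lam_hist[n] = min(p_num / (p_den + eps), 1.0)  # restrict lambda <= 1
    return lam_hist

rng = np.random.default_rng(3)
plant = rng.standard_normal(8)
x = rng.standard_normal(20000)
d = np.convolve(x, plant)[:len(x)] + 0.05 * rng.standard_normal(len(x))
lam_hist = combined_lms(x, d, 8, mu1=0.05, mu2=0.005)
```

In a run like this, λ starts near one (the fast filter dominates) and moves toward zero, or slightly below it, once the slow filter overtakes the fast one.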
In the Figures below the noisy blue line represents the simulation result and the smooth red
line is the theoretical result. The curves are averaged over 100 independent trials.
In our first simulation example we use Gaussian white noise with unity variance as the input
signal. The measurement noise is another white Gaussian noise with variance σv2 = 10−3 . The
step sizes are μ1 = 0.005 for the fast adapting filter and μ2 = 0.0005 for the slowly adapting
filter. Figure 4 depicts the evolution of the EMSE in time. One can see that the system converges fast in the beginning. The fast convergence is followed by a stabilization period between sample times 1000 and 7000, and then by another convergence to a lower EMSE level between sample times 8000 and 12000. The second convergence occurs when the mean squared error of the filter with the small step size surpasses the performance of the filter with the large step size. One can observe that there is good agreement between the theoretical and the simulated curves.
In Figure 5 we show the time evolution of the mean square deviation of the combination in the same test case. Again one can see that the theoretical and simulation curves fit well.
We continue with some examples with coloured input signal. In those examples the input
signal x is formed from the Gaussian white noise with unity variance by passing it through
the filter with transfer function
1
H (z) =
1 − 0.5z−1 − 0.1z−2
to get a coloured input signal. The measurement noise is Gaussian white noise, statistically
independent of x.
Fig. 4. Time evolution of the EMSE with μ1 = 0.005, μ2 = 0.0005 and σv² = 10⁻³.
Fig. 5. Time evolution of the MSD with μ1 = 0.005, μ2 = 0.0005 and σv² = 10⁻³.
In our first simulation example with coloured input we have used observation noise with variance σv² = 10⁻⁴. The step size of the fast filter is μ1 = 0.005 and the step size of the slow filter is μ2 = 0.001. As seen from Figure 6, there is a rapid convergence in the beginning, determined by the fast converging filter, followed by a stabilization period. When the EMSE of the slowly adapting filter becomes smaller than that of the fast one, between sample times 10000 and 15000, a second convergence occurs. One can observe a good resemblance between the simulation and theoretical curves.
Fig. 6. Time–evolutions of EMSE with μ1 = 0.005 and μ2 = 0.001 and σv2 = 10−4 .
In Figure 7 we have made the difference between the step sizes small. The step size of
the fast adapting filter is now μ1 = 0.003 and the step size of the slowly adapting filter
is μ2 = 0.002. One can see that the characteristic horizontal part of the learning curve has
almost disappeared. We have also increased the measurement noise level to σv2 = 10−2 . The
simulation and theoretical curves show a good match.
In Figure 8 we have increased the measurement noise level even more, to σv² = 10⁻¹. The step size of the fast adapting filter is μ1 = 0.004 and the step size of the slowly adapting filter is μ2 = 0.0005. One can see that the theoretical and simulation results agree well.
Figure 9 depicts the time evolution of the combination parameter λ in this simulation. At the beginning of the test case the combination parameter is close to one; correspondingly, the output signal of the fast filter is used as the output of the combination. After a while, when the slow filter catches up with the fast one and becomes better, λ moves toward zero and eventually becomes a small negative number. In this state the slow but more accurate filter determines the combined output. Again one can see that there is a clear similarity between the lines.
5. Conclusions
In this chapter we have investigated a combination of two LMS adaptive filters that are
simultaneously applied to the same input signals. The output signals of the two adaptive
filters are combined together using an adaptive mixing parameter. The mixing parameter λ
was computed using the output signals of the individual filters and the desired signal. The
transient behaviour of the algorithm was investigated using the assumption of small step size
and the expressions for evolution of EMSE(n) and λ(n) were derived. Finally it was shown
in the simulation study that the derived formulae fit the simulation results well.
6. References
Arenas-Garcia, J., Figueiras-Vidal, A. R. & Sayed, A. H. (1997). A robust variable step–size
lms–type algorithm: Analysis and simulations, IEEE Transactions on Signal Processing
45: 631–639.
Arenas-Garcia, J., Figueiras-Vidal, A. R. & Sayed, A. H. (2006). Mean-square performance
of convex combination of two adaptive filters, IEEE Transactions on Signal Processing
54: 1078–1090.
Armbruster, W. (1992). Wideband Acoustic Echo Canceller with Two Filter Structure, Signal
Processing VI, Theories and Applications, J Vanderwalle, R. Boite, M. Moonen and A.
Oosterlinck ed., Elsevier Science Publishers B.V.
Azpicueta-Ruiz, L. A., Figueiras-Vidal, A. R. & Arenas-Garcia, J. (2008a). A new least squares
adaptation scheme for the affine combination of two adaptive filters, Proc. IEEE
Widrow, B. & Hoff, M. E. J. (1960). Adaptive switching circuits, IRE WESCON Conv. Rec.,
pp. 96–104.
Zhang, Y. & Chambers, J. A. (2006). Convex combination of adaptive filters for a variable
tap–length lms algorithm, IEEE Signal Processing Letters 13: 628–631.
1. Introduction
In many signal processing applications, adaptive frequency estimation and tracking of noisy narrowband signals is required, for example in communications, radar, sonar, controls and biomedical signal processing, as well as in applications such as detection of a noisy sinusoidal signal and cancellation of periodic signals. In order to achieve the objective of frequency tracking and
estimation, an adaptive finite impulse response (FIR) filter or an adaptive infinite impulse
response (IIR) notch filter is generally applied. Although an adaptive FIR filter has the
stability advantage over an adaptive IIR notch filter, it requires a larger number of filter
coefficients. In practical situations, an adaptive IIR notch filter (Chicharo & Ng, 1990; Kwan
& Martin, 1989; Nehorai, 1985) is preferred due to its less number of filter coefficients and
hence less computational complexity. More importantly, a second-order adaptive pole/zero
constrained IIR notch filter (Xiao et al, 2001; Zhou & Li, 2004) can effectively be applied to
track a single sinusoidal signal. If a signal contains multiple frequency components, then we
can estimate and track its frequencies using a higher-order adaptive IIR notch filter
constructed by cascading second-order adaptive IIR notch filters (Kwan & Martin, 1989). To ensure global minimum convergence, the filter algorithm must begin with proper initial conditions, which require prior knowledge of the signal frequencies.
However, in many practical situations, a sinusoidal signal may be subjected to nonlinear
effects (Tan & Jiang, 2009a, 2009b) in which possible harmonic frequency components are
generated. For example, the signal acquired from a sensor may undergo saturation through
an amplifier. In such an environment, we may want to estimate and track the signal’s
fundamental frequency as well as any harmonic frequencies. Using a second-order adaptive
IIR notch filter to estimate fundamental and harmonic frequencies is insufficient, since it
only accommodates one frequency component. On the other hand, applying a higher-order
IIR notch filter may not be effective due to adopting multiple adaptive filter coefficients and
local minimum convergence of the adaptive algorithm. In addition, monitoring the global
minimum using a grid search method requires a huge number of computations, and thus
makes the notch filter impractical in real time processing. Therefore, in this chapter, we
propose and investigate a novel adaptive harmonic IIR notch filter with a single adaptive
coefficient to efficiently perform frequency estimation and tracking in a harmonic frequency
environment.
This chapter first reviews the standard structure of a cascaded second-order pole/zero constrained adaptive IIR notch filter and its associated adaptive algorithm. Second,
we describe the structure and algorithm for a new adaptive harmonic IIR notch filter under a
harmonic noise environment. The key feature is that the proposed filter contains only one adaptive parameter, so that the global minimum of its MSE function can easily be monitored during adaptation. For example, when the input fundamental signal frequency has a step
change (the signal frequency switches to a different frequency value), the global minimum
location of the MSE function is also changed. The traditional cascaded second-order adaptive
IIR notch filter may likely converge to local minima due to its slow convergence; and hence an
incorrect estimated fundamental frequency value could be obtained. However, with the
proposed algorithms, when a possible local minimum is detected, the global minimum can
easily be detected and relocated so that adaptive filter parameters can be reset based on the
estimated global minimum, which is determined from the computed MSE function.
In this chapter, we perform convergence analysis of the adaptive harmonic IIR notch filter
(Tan & Jiang, 2009). Although such an analysis is a very challenging task due to the
extremely complicated plain gradient and MSE functions, with reasonable simplifications
we are still able to achieve some useful theoretical results such as the convergence upper
bound of the adaptive algorithm. Based on convergence analysis, we further propose a new
robust algorithm. Finally, we demonstrate simulation results to verify the performance of
the proposed adaptive harmonic IIR notch filters.
x(n) = A cos(2π f n / fs + φ) + v(n),   (1)
where A and φ are the amplitude and phase angle, and v(n) is a zero-mean Gaussian noise process. fs and n are the sampling rate and time index, respectively.
To estimate the signal frequency, a standard second-order adaptive IIR notch filter (Zhou & Li, 2004) is applied, with its transfer function given by
H(z) = (1 − 2cos(θ)z⁻¹ + z⁻²) / (1 − 2r cos(θ)z⁻¹ + r²z⁻²).   (2)
The transfer function has one notch frequency parameter θ and has zeros on the unit circle, resulting in an infinite-depth notch. r is the pole radius, which controls the notch bandwidth; 0 < r < 1 is required for achieving a narrowband notch. Making the parameter adaptive, that is, θ = θ(n), the filter output can be expressed as
y(n) = x(n) − 2cos[θ(n)]x(n − 1) + x(n − 2) + 2r cos[θ(n)]y(n − 1) − r²y(n − 2).   (3)
Again, when r is close to 1, the 3-dB notch bandwidth can be approximated as BW ≈ 2(1 − r) radians (Tan, 2007). Our objective is to minimize the filter output power E[y²(n)]. Once the output power is minimized, the filter parameter θ(n) will converge to the angle corresponding to the frequency f Hz. For a noise-free case, the minimized output power should be zero. Note that for frequency tracking, our expected result is the parameter θ(n) rather than the filtered signal y(n). A least mean square (LMS) algorithm minimizing the instantaneous output power y²(n) is often used:
θ(n + 1) = θ(n) − 2μy(n)β(n),   (4)
where the gradient function β(n) = ∂y(n)/∂θ(n) follows by differentiating (3):
β(n) = 2 sin[θ(n)]x(n − 1) − 2r sin[θ(n)]y(n − 1) + 2r cos[θ(n)]β(n − 1) − r²β(n − 2),   (5)
and μ is the convergence factor which controls the speed of algorithm convergence. Fig. 2 illustrates the behavior of tracking and estimating a sinusoid with a frequency of 1 kHz at a sampling rate of 8 kHz in a noise-free situation. As shown in Fig. 2, the LMS algorithm converges after 2600 iterations. Again, note that the estimated frequency is 1 kHz
Fig. 2. Frequency tracking of a single sinusoid using a second-order adaptive IIR notch filter (sinusoid: A = 1, f = 1000 Hz, fs = 8000; adaptive notch filter: r = 0.95 and μ = 0.005); panels show x(n), y(n) and the estimated frequency f(n).
while the filter output approaches zero. However, when estimating multiple frequencies (or tracking a signal containing not only its fundamental frequency but also higher-order harmonic frequencies), a higher-order adaptive IIR notch filter formed by cascading second-order adaptive IIR notch filters is desirable.
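The single-frequency tracker of (2)-(5) can be sketched as follows. The gradient recursion is obtained by differentiating the notch difference equation; the initial frequency and the step size used here are illustrative choices rather than the chapter's exact settings.

```python
import numpy as np

def adaptive_notch(x, fs, f_init, r=0.95, mu=0.001):
    """Second-order adaptive IIR notch filter: plain-gradient LMS on theta."""
    theta = 2 * np.pi * f_init / fs
    x1 = x2 = y1 = y2 = b1 = b2 = 0.0   # delayed x(n), y(n), beta(n) samples
    f_hist = np.zeros(len(x))
    for n in range(len(x)):
        c, s = np.cos(theta), np.sin(theta)
        y = x[n] - 2 * c * x1 + x2 + 2 * r * c * y1 - r * r * y2      # output, (3)
        b = 2 * s * x1 - 2 * r * s * y1 + 2 * r * c * b1 - r * r * b2  # gradient, (5)
        theta -= 2 * mu * y * b                                        # LMS step, (4)
        x2, x1 = x1, x[n]
        y2, y1 = y1, y
        b2, b1 = b1, b
        f_hist[n] = theta * fs / (2 * np.pi)   # current frequency estimate in Hz
    return f_hist

fs = 8000
n = np.arange(8000)
x = np.cos(2 * np.pi * 1000 * n / fs)          # noise-free 1 kHz sinusoid
f_hist = adaptive_notch(x, fs, f_init=1300.0)  # start away from the true frequency
```

Starting from 1300 Hz, the estimate drifts down and settles at the signal frequency of 1 kHz, at which point the filter output, and hence the update term, vanishes.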
H(z) = ∏_{k=1}^{K} Hk(z) = ∏_{k=1}^{K} (1 − 2cos[θk(n)]z⁻¹ + z⁻²) / (1 − 2r cos[θk(n)]z⁻¹ + r²z⁻²)   (6)
The filter contains K stages (K sub-filters). θk(n) is the adaptive parameter for the k-th sub-filter, while r is the pole radius as defined in Section 2.1. For an adaptive version, the output from each sub-filter is expressed as
yk(n) = yk−1(n) − 2cos[θk(n)]yk−1(n − 1) + yk−1(n − 2) + 2r cos[θk(n)]yk(n − 1) − r²yk(n − 2),   (7)
with y0(n) = x(n). Note that the notch filter output y(n) is from the last-stage sub-filter, that is, y(n) = yK(n). Minimizing the instantaneous output power y²K(n) leads to the LMS update equations
θk(n + 1) = θk(n) − 2μyK(n)βk(n),  k = 1, 2, ..., K,   (8)
where βk(n) = ∂yK(n)/∂θk(n) is the gradient function, which can be determined from recursions obtained by differentiating (7) with respect to θk(n).
Fig. 3. Frequency tracking of a signal with its fundamental component, second and third harmonics, using second-order adaptive IIR notch filters in cascade (r = 0.95 and μ = 0.002): (a) input x(n) and sub-filter outputs y1(n), y2(n), y3(n); (b) estimated frequencies f1(n), f2(n), f3(n).
2625 Hz from 20001 to 30000 iterations, respectively. We can see that the algorithm exhibits slow convergence when the fundamental frequency switches. Another problem is that the algorithm may converge to a local minimum if it starts at arbitrary initial conditions. As shown in Fig. 4, if the algorithm begins with the initial conditions θ1(0) = 2π × 400/fs = 0.1π radians, θ2(0) = 0.2π radians and θ3(0) = 0.3π radians, it converges to local minima with wrongly estimated frequencies when the fundamental frequency of the input signal is 1225 Hz. When the fundamental frequency steps to 1000 Hz and 875 Hz, respectively, the algorithm continues to converge to local minima with incorrectly estimated frequency values.
Fig. 4. Frequency tracking of a signal with its fundamental component, second and third harmonics, using second-order adaptive IIR notch filters in cascade (sinusoid: A = 1, f = 1000 Hz, fs = 8000; adaptive notch filter: r = 0.95 and μ = 0.005): (a) input x(n) and sub-filter outputs y1(n), y2(n), y3(n); (b) estimated frequencies f1(n), f2(n), f3(n).
Adaptive Harmonic IIR Notch Filters for Frequency Estimation and Tracking
x(n) = ∑_{m=1}^{M} Am cos[2π(mf)n/fs + φm] + v(n),   (10)
where Am, mf and φm are the magnitude, frequency (Hz) and phase angle of the m-th harmonic component, respectively. To estimate the fundamental frequency in such a harmonic frequency environment, we can apply a harmonic IIR notch filter with the structure illustrated in Fig. 5 for the case of M = 3 (three harmonics).
Fig. 5. Pole-zero plot for the harmonic IIR notch filter for M = 3.
As shown in Fig. 5, to construct a notch filter transfer function, two constrained pole-zero pairs (Nehorai, 1985), with their angles equal to mθ (a multiple of the fundamental frequency angle θ) relative to the horizontal axis, are placed on the pole-zero plot for m = 1, 2, ..., M, respectively. Hence, we can construct M second-order IIR sub-filters. In a cascaded form (Kwan & Martin, 1989), we have
H(z) = H1(z)H2(z) ··· HM(z) = ∏_{m=1}^{M} Hm(z),   (11)
where Hm(z) denotes the m-th second-order IIR sub-filter, whose transfer function is defined as
Hm(z) = (1 − 2z⁻¹cos(mθ) + z⁻²) / (1 − 2rz⁻¹cos(mθ) + r²z⁻²).   (12)
We express the output ym(n) of the m-th sub-filter with an adaptive parameter θ(n) as
ym(n) = ym−1(n) − 2cos[mθ(n)]ym−1(n − 1) + ym−1(n − 2) + 2r cos[mθ(n)]ym(n − 1) − r²ym(n − 2),  m = 1, 2, ..., M,   (13)
with y0(n) = x(n). From (12), the transfer function has only one adaptive parameter θ(n) and has zeros on the unit circle, resulting in infinite-depth notches. Similarly, we require 0 < r < 1 for achieving narrowband notches. When r is close to 1, the 3-dB notch bandwidth can be approximated by BW ≈ 2(1 − r) radians. The MSE function at the final stage, E[y²M(n)] = E[e²(n)], is minimized, where e(n) = yM(n). It is important to notice that once the single adaptive parameter θ(n) is adapted to the angle corresponding to the fundamental frequency, each mθ(n) (m = 2, 3, ..., M) will automatically lock to its harmonic frequency. To examine the convergence property, we write the mean square error (MSE) function (Chicharo & Ng, 1990) below:
E[e²(n)] = E[y_M²(n)] = (1/(2πj)) ∮ ∏_{m=1}^{M} H_m(z)H_m(z⁻¹) Φ_xx(z) dz/z   (14)
with H_m(z) given by (12) evaluated at θ = θ(n),
where Φ_xx(z) is the power spectrum of the input signal. Since the MSE function in (14) is a
nonlinear function of the adaptive parameter θ, it may contain local minima. A closed-form
solution of (14) is difficult to achieve. However, we can examine the MSE function via a
numerical example. Fig. 6 shows the plotted MSE function versus θ for a range from 0 to
π/M radians [0 to f_s/(2M) Hz], assuming that all harmonics are within the Nyquist limit,
for the following conditions: M = 3, r = 0.95, f_a = 1000 Hz, f_s = 8000 Hz (sampling rate),
signal-to-noise power ratio (SNR) = 22 dB, and 400 filter output samples. Based on Fig. 6, we
observe that there exist four local minima, of which one global minimum is located at 1
kHz. If we let the adaptation initially start from any point inside the global minimum valley
(frequency capture range), the adaptive harmonic IIR notch filter will converge to the global
minimum of the MSE error function.
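The cascaded structure of (11)-(12) can be simulated directly from the sub-filter difference equations. The following Python fragment is an illustrative sketch (the function name and simulation parameters are ours, not from the chapter); when θ matches the fundamental angle 2πf₀/f_s, the cascade nulls a 1 kHz fundamental and its second and third harmonics:

```python
import numpy as np

def harmonic_notch(x, theta, r=0.95, M=3):
    """Cascade of M constrained second-order IIR notch sub-filters.
    Stage m computes y_m(n) from y_{m-1}(n); returns the final output y_M(n)."""
    y = np.asarray(x, dtype=float)
    for m in range(1, M + 1):
        c = np.cos(m * theta)
        out = np.zeros_like(y)
        for n in range(len(y)):
            out[n] = (y[n]
                      - 2*c*(y[n-1] if n >= 1 else 0.0)
                      + (y[n-2] if n >= 2 else 0.0)
                      + 2*r*c*(out[n-1] if n >= 1 else 0.0)
                      - r*r*(out[n-2] if n >= 2 else 0.0))
        y = out
    return y

# A 1 kHz fundamental with 2nd and 3rd harmonics sampled at 8 kHz
fs, f0 = 8000.0, 1000.0
n = np.arange(2000)
x = sum(np.cos(2*np.pi*m*f0*n/fs) for m in (1, 2, 3))
e = harmonic_notch(x, 2*np.pi*f0/fs)
# after the transient, the output power is essentially zero
print(np.mean(e[500:]**2) < 1e-6 * np.mean(x**2))
```

Because the zeros sit exactly on the unit circle, the steady-state output vanishes once the transient (governed by the pole radius r) has decayed.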
Fig. 6. Error surface of the harmonic IIR notch filter for M = 3 and r = 0.95 (the valley
around the global minimum at 1 kHz is the frequency capture range)
The adaptive parameter θ(n) is updated by an LMS-type gradient algorithm,
θ(n + 1) = θ(n) − 2μy_M(n)β_M(n)   (15)
where μ is the convergence factor (step size) and the gradient signal β_m(n) = ∂y_m(n)/∂θ(n) is computed recursively through the gradient sub-filters for m = 1, 2, …, M,
with β_0(n) = ∂y_0(n)/∂θ(n) = ∂x(n)/∂θ(n) = 0 and β_0(n − 1) = β_0(n − 2) = 0.
To prevent local minima convergence, the algorithm will start with an optimal initial value
θ_0, which is coarsely searched over the frequency range π/(180M), 2π/(180M), …, 179π/(180M),
as follows:
θ_0 = arg min_θ Ê[e²(n, θ)]   (17)
where the estimated MSE function Ê[e²(n, θ)] can be determined by using a block of N
signal samples:
Ê[e²(n, θ)] = (1/N) Σ_{i=0}^{N−1} y_M²(n − i, θ)   (18)
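The coarse pre-scan of (17)-(18) amounts to evaluating the block-estimated MSE on the grid of candidate angles and keeping the minimizer. A minimal sketch (the helper, function names, and test signal are ours):

```python
import numpy as np

def notch_output(x, theta, r=0.95, M=3):
    # final-stage output y_M(n) of the cascaded harmonic IIR notch filter (12)
    y = np.asarray(x, dtype=float)
    for m in range(1, M + 1):
        c = np.cos(m * theta)
        out = np.zeros_like(y)
        for n in range(len(y)):
            out[n] = (y[n] - 2*c*(y[n-1] if n >= 1 else 0.0)
                      + (y[n-2] if n >= 2 else 0.0)
                      + 2*r*c*(out[n-1] if n >= 1 else 0.0)
                      - r*r*(out[n-2] if n >= 2 else 0.0))
        y = out
    return y

def coarse_search(x, M=3, r=0.95, N=200):
    """Scan theta over {pi/(180M), ..., 179pi/(180M)} and return the grid
    angle minimizing the block MSE estimate (1/N) * sum y_M^2."""
    grid = np.arange(1, 180) * np.pi / (180 * M)
    mses = [np.mean(notch_output(x, th, r, M)[-N:] ** 2) for th in grid]
    return grid[int(np.argmin(mses))]

fs, f0 = 8000.0, 1000.0
n = np.arange(600)
x = sum(np.cos(2*np.pi*m*f0*n/fs + 0.1*m) for m in (1, 2, 3))
theta0 = coarse_search(x)
print(abs(0.5*fs*theta0/np.pi - f0) < 10)   # pre-scan lands near 1000 Hz
```

Since the scan is exhaustive over the whole range, it cannot be trapped by local minima; it only needs to be re-run when the deviation monitor described below fires.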
There are two problems, depicted in Fig. 7 as an example. When the fundamental frequency
switches from 875 Hz to 1225 Hz, the algorithm starting at the location of 875 Hz on the
MSE function corresponding to 1225 Hz will converge to the local minimum at 822 Hz. On
the other hand, when the fundamental frequency switches from 1225 Hz to 1000 Hz, the
algorithm will suffer a slow convergence rate due to a small gradient value of the MSE
function in the neighborhood of the location of 1225 Hz. We will solve the first problem in
this section and fix the second problem in the next section.
To prevent the problem of local minima convergence due to the change of the fundamental
frequency, we monitor the global minimum by comparing a frequency deviation
Δf = |f(n) − f_0|   (19)
where f_0 = 0.5f_sθ_0/π Hz is the pre-scanned optimal frequency via (17) and (18); BW is the
3-dB bandwidth of the notch filter, which is approximated by BW ≈ (1 − r)f_s/π in Hz. If
Fig. 7. MSE functions for the fundamental frequencies 875 Hz, 1000 Hz, and 1225 Hz
(M = 3, r = 0.96, N = 200, and f_s = 8 kHz)
Δf > Δf_max, the adaptive algorithm may possibly have converged to a local minimum. Then the
adaptive parameter θ(n) should be reset to its newly estimated optimal value θ_0 using (17)
and (18), and the algorithm will resume frequency tracking in the neighborhood of the
global minimum. The LMS-type algorithm is listed in Table 1.
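The deviation monitor of (19) can be sketched as a small supervisor routine; the function name, the use of the 3-dB bandwidth as the allowed deviation, and the reset policy shown here are illustrative assumptions:

```python
import numpy as np

def supervise(theta_n, theta_0, fs=8000.0, r=0.95, rescan=None):
    """Compare the tracked frequency f(n) = 0.5*fs*theta(n)/pi against the
    pre-scanned optimum f0; if the deviation exceeds the approximate 3-dB
    bandwidth BW = (1 - r)*fs/pi, assume local-minimum convergence and reset."""
    f_n = 0.5 * fs * theta_n / np.pi
    f_0 = 0.5 * fs * theta_0 / np.pi
    delta_f_max = (1.0 - r) * fs / np.pi      # allowed deviation in Hz
    if abs(f_n - f_0) > delta_f_max:
        # re-run the coarse scan (17)-(18) when a callback is supplied,
        # otherwise fall back to the previously scanned optimum
        return rescan() if rescan is not None else theta_0
    return theta_n

ok = supervise(0.26*np.pi, 0.25*np.pi)     # 40 Hz deviation: keep tracking
reset = supervise(0.30*np.pi, 0.25*np.pi)  # 200 Hz deviation: reset to theta_0
print(abs(ok - 0.26*np.pi) < 1e-12 and abs(reset - 0.25*np.pi) < 1e-12)
```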
where
[2sin(mθ*) / ((1 − r)(e^{jmθ*} − re^{−jmθ*}))] ∏_{k=1,k≠m}^{M} H_k(e^{jmθ*}) ≜ B(mθ*)∠φ_m   (22)
Considering the input signal x(n) in (10), we can now approximate the harmonic IIR notch
filter output as
y_M(n) ≈ Σ_{m=1}^{M} mA_mB(mθ*)cos[(mθ*)n + φ_m + ψ_m]·θ_Δ(n) + v_1(n)   (24)
where θ_Δ(n) = θ(n) − θ* is the deviation of the adaptive parameter from its optimal value
and v_1(n) is the noise output from the notch filter.
S_M(z) = Σ_{n=1}^{M} [∏_{k=1,k≠n}^{M} H_k(z)] · (2n·sin(nθ)z⁻¹)/(1 − 2r·cos(nθ)z⁻¹ + r²z⁻²)
       − H(z) Σ_{n=1}^{M} (2rn·sin(nθ)z⁻¹)/(1 − 2r·cos(nθ)z⁻¹ + r²z⁻²)   (27)
At the optimal points, mθ*, the first term in (27) is approximately constant, since we can
easily verify that these points are essentially the centers of band-pass filters (Petraglia, et al.,
1994). The second term is zero due to H(e^{jmθ*}) = 0. Using (22) and (23), we can approximate
the gradient filter frequency response at mθ* as
S_M(e^{jmθ*}) ≈ [2m·sin(mθ*) / ((1 − r)(e^{jmθ*} − re^{−jmθ*}))] ∏_{k=1,k≠m}^{M} H_k(e^{jmθ*}) = mB(mθ*)∠φ_m   (28)
Thus the gradient filter output can be approximated as
β_M(n) ≈ Σ_{m=1}^{M} mB(mθ*)A_m cos[(mθ*)n + φ_m + ψ_m] + v_2(n)   (29)
where v_2(n) is the noise output from the gradient filter. Substituting (24) and (29) in (15)
and assuming that the noise processes v_1(n) and v_2(n) are uncorrelated with the first
summation terms in (24) and (29) leads to the following:
E[θ_Δ(n + 1)] = E[θ_Δ(n)] − μ Σ_{m=1}^{M} m²A_m²B²(mθ*)E[θ_Δ(n)] − 2μE[v_1(n)v_2(n)]   (31)
where
E[v_1(n)v_2(n)] = (σ_v²/(2πj)) ∮ H(z)S_M(1/z) dz/z   (32)
where σ_v² is the input noise power in (10). To yield a stability bound, it is required that
|1 − μ Σ_{m=1}^{M} m²A_m²B²(mθ*)| < 1   (33)
which leads to the bound on the convergence factor
μ(θ*) < 2 / Σ_{m=1}^{M} m²A_m²B²(mθ*)   (34)
The last term in (31) is not zero, but we can significantly suppress it by using a varying
convergence factor developed later in this section. Since evaluating (34) still requires
knowledge of all the harmonic amplitudes, we simplify (34) by assuming that each
frequency component has the same amplitude to obtain
μ(θ*) < M / [σ_x² Σ_{m=1}^{M} m²B²(mθ*)]   (35)
where σ_x² is the power of the input signal. Practically, for the given M, we can numerically
search for the upper bound μ_max that works for the required frequency range, that is,
μ_max = min_{0<θ*<π/M} μ(θ*)   (36)
Fig. 8 plots the upper bounds based on (36) versus M using σ_x² = 1 for r = 0.8, r = 0.9, and
r = 0.96, respectively.
We can see that a smaller upper bound is required as r and M increase. We can
also observe another key feature, described in Fig. 9.
Fig. 8. Numerically searched upper bound μ_max versus M for r = 0.80, r = 0.90, and r = 0.96
(σ_x² = 1)
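The numerical search for the upper bound can be sketched by evaluating the simplified bound over a grid of θ* values in (0, π/M) and taking the minimum; the helper names and grid below are ours:

```python
import numpy as np

def Hk(z, k, theta, r):
    # k-th second-order notch sub-filter (12) evaluated at the point z
    c = np.cos(k * theta)
    return (1 - 2*c/z + 1/z**2) / (1 - 2*r*c/z + (r/z)**2)

def mu_bound(theta, M=3, r=0.96, sx2=1.0):
    # simplified bound mu(theta) < M / (sx2 * sum_m m^2 B^2(m*theta))
    total = 0.0
    for m in range(1, M + 1):
        z = np.exp(1j * m * theta)
        B = abs(2*np.sin(m*theta) / ((1 - r) * (z - r*np.conj(z))))
        for k in range(1, M + 1):
            if k != m:
                B *= abs(Hk(z, k, theta, r))
        total += (m * B) ** 2
    return M / (sx2 * total)

# search the usable upper bound over the frequency range 0 < theta < pi/M
thetas = np.linspace(0.01, np.pi/3 - 0.01, 400)   # M = 3
mu_max = min(mu_bound(t) for t in thetas)
print(0 < mu_max < 1e-2)
```

For r = 0.96 this sketch yields a bound on the order of 10⁻⁴, consistent in magnitude with the value used later in this section.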
Fig. 9. (a) MSE functions; (b) Magnitude frequency responses |H(f)| in dB for r = 0.8, 0.9,
and 0.96 (M = 3, N = 200, f_s = 8 kHz)
As shown in Fig. 9, when the pole radius r is much smaller than 1 (r = 0.8), we will have a
larger MSE function gradient starting at 1225 Hz, and the convergence speed will therefore be
increased. But using the smaller r will end up with a degradation of the notch filter frequency
response, that is, a larger notch bandwidth. On the other hand, choosing r close to 1
(r = 0.96) will maintain a narrow notch bandwidth but result in a slow convergence rate, since
the algorithm begins with a small MSE function gradient value at 1225 Hz. Furthermore, we
expect that when the algorithm approaches its global minimum, the final filter output
e(n) = y_M(n) becomes uncorrelated with its delayed signal y_M(n − 1), that is, the cross-correlation
c(n) = E[y_M(n)y_M(n − 1)] approaches zero. Hence, the cross-correlation measurement
can be adopted to control the notch bandwidth and convergence factor. We propose the
improved algorithm with varying bandwidth and convergence factor below:
c(n) = αc(n − 1) + (1 − α)y_M(n)y_M(n − 1)   (37)
r(n) = r_min + Δr·e^{−λ|c(n)|}   (38)
μ(n) = μ_max[1 − e^{−λ|c(n)|}]   (39)
where 0 < α < 1, r_min ≤ r(n) ≤ r_min + Δr < 1 with r_min = 0.8 (still providing a good notch
filter frequency response), μ_max is the upper bound of the convergence factor searched for
r(n) = r_min + Δr, and λ is the damping constant, which controls the speed of change for the
notch bandwidth and convergence factor. From (37), (38), and (39), our expectation is as
follows: when the algorithm begins to work, the cross-correlation c(n) has a large value due
to the fact that the filter output contains fundamental and harmonic signals. The pole radius
r(n) in (38) starts with a smaller value to increase the gradient value of the MSE function,
while at the same time the
step size μ(n) in (39) changes to a larger value. Considering both factors, the algorithm
achieves a fast convergence speed. On the other hand, as c(n) approaches zero, r(n)
increases its value to preserve a narrow notch bandwidth, while μ(n) decays to zero to
reduce the misadjustment described in (31).
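A sketch of this varying bandwidth and convergence factor control is given below. The exponential damping form and all parameter values are our assumptions, chosen to reproduce the qualitative behavior just described (wide notch and large step while c(n) is large; narrow notch and vanishing step as c(n) approaches zero):

```python
import numpy as np

def control_step(c_prev, yM_n, yM_n1, alpha=0.9, lam=50.0,
                 r_min=0.8, delta_r=0.16, mu_max=2.14e-4):
    # smoothed cross-correlation between y_M(n) and y_M(n-1), as in (37)
    c = alpha * c_prev + (1.0 - alpha) * yM_n * yM_n1
    damp = np.exp(-lam * abs(c))          # assumed exponential damping law
    # pole radius: near r_min while c is large, -> r_min + delta_r as c -> 0
    r = r_min + delta_r * damp
    # step size: near mu_max while c is large, decays to zero as c -> 0
    mu = mu_max * (1.0 - damp)
    return c, r, mu

# before convergence (large correlation): wide notch, large step
c, r, mu = control_step(0.5, 1.0, 1.0)
# after convergence (output whitened, c ~ 0): narrow notch, vanishing step
c2, r2, mu2 = control_step(0.0, 0.0, 0.0)
print(r < r2 and mu > mu2)
```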
To include (37), (38), and (39) in the improved algorithm, the additional computational
complexity over the algorithm proposed in (Tan & Jiang, 2009a, 2009b) for
processing each input sample requires six multiplications, four additions, two
absolute-value operations, and one exponential function operation. The new improved
algorithm is listed in Table 2.
where f_a is the fundamental frequency and f_s = 8000 Hz. The fundamental frequency
changes every 2000 samples. Our developed algorithm uses the following parameters:
the upper bound μ_max = 2.14×10⁻⁴ is numerically searched using (35) for r = 0.96. The
behaviors of the developed algorithm are demonstrated in Figs. 10-13. Fig. 10a shows a plot
of the MSE function used to locate the initial parameter, θ(0) = 2π×1222/f_s = 0.3055π, when the
fundamental frequency is 1225 Hz. Figs. 10b and 10c show the plots of the MSE functions for
resetting the initial parameters θ(0) = 0.25π and θ(0) = 0.2222π when the fundamental
frequency switches to 1000 Hz and then 875 Hz, respectively. Fig. 11 depicts the noisy input
signal and each sub-filter output. Figs. 12a to 12c depict the cross-correlation c(n), pole
radius r(n), and adaptive step size μ(n). Fig. 12d shows the tracked fundamental
frequencies. As expected, when the algorithm converges, c(n) approaches zero
(uncorrelated) and r(n) becomes r_max = 0.96 to offer the narrowest notch bandwidth. At the same
time, μ(n) approaches zero so that the misadjustment can be reduced. In addition, when
Fig. 10. MSE functions used to locate the initial parameter θ(0) for the fundamental
frequencies 1225 Hz, 1000 Hz, and 875 Hz
Fig. 11. Input and each sub-filter output for new adaptive harmonic IIR notch filter with
varying bandwidth and convergence factor
Fig. 12. (a) The cross-correlation c(n) between the notch filter output and its delayed output;
(b) Varying notch filter parameter r(n); (c) Varying convergence factor μ(n); (d) Tracked
fundamental frequencies f(n).
Fig. 13. Tracked fundamental frequencies f(n) for the adaptive harmonic IIR notch filter and
the new adaptive harmonic IIR notch filter under the SNR = 12 dB, 18 dB, and 9 dB segments
the frequency changes from 1225 Hz to 1000 Hz, the algorithm starts moving away from its
original global minimum, since the MSE function is changed. Once the tracked frequency
moves beyond the maximum allowable frequency deviation Δf_max, the algorithm relocates
θ_0 and resets θ(n) = θ_0; θ(n) is reset again after the frequency is switched from 1000 Hz
to 875 Hz. The improved algorithm successfully tracks the signal fundamental frequency
and its changes.
To compare with the algorithm recently proposed in (Tan & Jiang, 2009b), we
apply the same input signal to the adaptive harmonic IIR notch filter using a fixed notch
bandwidth (r = 0.96) and a fixed convergence factor, μ = 2.14×10⁻⁴. As shown in Fig. 13,
the improved algorithm is much more robust to noise under various SNR conditions. This is
because when the algorithm converges, the varying convergence factor approaches zero
to offer a smaller misadjustment.
Fig. 14 shows the comparison of the standard deviation of the estimated frequency between the two
algorithms, where we investigate the following condition: f_a = 1000 Hz and M = 3 using 5000
iterations. For the previous algorithm, we use r = 0.96 and μ = 10⁻⁴, while for the improved
algorithm all the parameters are the same except that μ_max = 10⁻⁴. For each algorithm, we
obtain the results using 50 independent runs under each given signal-to-noise ratio (SNR).
From Fig. 14, it is evident that the developed adaptive harmonic IIR notch filter with
varying notch bandwidth and convergence factor offers a significant improvement.
Fig. 14. Standard deviation of the estimated frequency versus SNR for the two algorithms
4. Conclusion
In this chapter, we have reviewed the standard adaptive IIR notch filters for applications of
single frequency and multiple frequency estimation as well as tracking in the harmonic
noise environment. The problems of slow convergence speed and local minima convergence
are addressed when applying a higher-order adaptive IIR notch filter for tracking multiple
frequencies or harmonic frequencies. We have demonstrated that the adaptive harmonic IIR
notch filter offers an effective solution for frequency estimation and tracking in a harmonic
noise environment. In addition, we have derived a simple and useful stability bound for the
adaptive harmonic IIR notch filter. In order to achieve the noise robustness and accuracy of
frequency tracking in the noisy environment, we have developed an improved adaptive
harmonic IIR notch filter with varying notch bandwidth and convergence factor. The
developed algorithm is able to prevent its local minima convergence even when the signal
fundamental frequency switches in the tracking process.
5. Acknowledgment
This work was supported in part by the Purdue University 2010 Summer Faculty Research
Grant, Indiana, USA.
6. References
Nehorai, A. (1985). A Minimal Parameter Adaptive Notch Filter with Constrained Poles and
Zeros. IEEE Trans. Acoust., Speech, Signal Process., Vol. 33, No. 4, pp. 983-996,
August 1985.
Kwan, T. & Martin, K. (1989). Adaptive Detection and Enhancement of Multiple Sinusoids
Using a Cascade IIR Filter. IEEE Trans Circuits Syst., Vol. 36, No. 7, pp. 937-947,
July 1989.
Chicharo, J. & Ng, T. (1990). Gradient-based Adaptive IIR Notch Filtering for Frequency
Estimation. IEEE Trans. Acoust., Speech, Signal Process., Vol. 38, No. 5, pp. 769-777,
May 1990.
Zhou, J. & Li, G. (2004). Plain Gradient-based Direct Frequency Estimation Using Second-
order Constrained Adaptive IIR Notch filter. Electronics Letters, vol. 40, No. 5, pp.
351-352, March 2004.
Xiao, Y., Takeshita, Y. & Shida, K. (2001). Steady-state Analysis of a Plain Gradient
Algorithm for a Second-order Adaptive IIR Notch Filter with Constrained Poles
and Zeros. IEEE Trans. on Circuits and Systems-II: Analog and Digital Signal
Processing, Vol. 48, No. 7, pp 733-740, July 2001.
Tan, L. & Jiang, J. (2009). Novel Adaptive IIR Notch Filter for Frequency Estimation and
Tracking. IEEE Signal Processing Magazine, Vol. 26, Issue 6, pp. 168-189, November
2009.
Tan, L. & Jiang, J. (2009). Real-Time Frequency Tracking Using Novel Adaptive Harmonic
IIR Notch Filter. The Technology Interface Journal, Vol. 9, No. 2, Spring 2009.
Tan, L. (2007). Digital Signal Processing: Fundamentals and Applications. pp. 358-362, ISBN:
978-0-12-374090-8, Elsevier/Academic Press, 2007.
Stoica, P. & Nehorai, A. (1988). Performance Analysis of an Adaptive Notch Filter with
Constrained Poles and Zeros. IEEE Trans. Acoust., Speech, Signal Process., Vol. 36,
No. 6, pp. 911-919, June 1988.
Handel, P. & Nehorai, A. (1994). Tracking Analysis of an Adaptive Notch Filter with
Constrained Poles and Zeros. IEEE Trans. Signal Process., Vol. 42, No. 2, pp. 281-291,
February 1994.
Petraglia, M., Shynk, J. & Mitra, S. (1994). Stability Bounds and Steady-state Coefficient
Variance for a Second-order Adaptive IIR Notch Filter. IEEE Trans. Signal
Process., Vol. 42, No. 7, pp. 1841-1845, July 1994.
Echo Cancellation for Hands-Free Systems
1. Introduction
Echo is defined as a delayed and attenuated version of the original signal produced by some
device, such as a loudspeaker. As a consequence, a person listens to a delayed replica of his or her
own voice signal. This is an undesired effect that appears whenever the output signal is fed
back into the system's input, and it can be quite disturbing in voice conversations.
Echo arises in long distance communication scenarios such as hands-free systems Hänsler
(1994); Jeannès et al. (2001); Liu (1994), voice over internet protocol (VoIP) Witowsky (1999),
teleconferencing Kuo & Pan (1994), mobile phone conversation, and satellite communications
among others.
In order to minimize or even remove the presence of echo in communications, echo
suppression and echo cancellation techniques have been proposed in the last three
decades Sondhi (2006). An echo suppressor is a voice-operated switch that disconnects
the communication path (or introduces a very large attenuation) whenever some decision
mechanism indicates that we are in the presence of echo. The emitting circuit is disconnected
whenever we have signal on the reception part of the circuit; the reception circuit is
disconnected whenever we have signal emission. Their behavior is not adequate for cross
conversation (full duplex) scenarios. Echo suppressors were the first approach to this problem.
In the last decade, due to their unsatisfactory results, they have been replaced by digital echo
cancelers. An echo canceler device, as opposed to an echo suppressor, does not interrupt
the echo path; it operates by removing (subtracting) the detected echo replicas from the
information signal. The term usually coined for the cancellation of echoes with acoustic
coupling is acoustic echo cancellation (AEC) Gilloire & Hänsler (1994).
In the past years, adaptive filtering techniques Haykin (2002); Sayed (2003); Widrow & Stearns
(1985) have been employed for the purpose of AEC Breining (1999); Widrow et al. (1975).
Typically, these techniques rely on the use of finite impulse response (FIR) filters Oppenheim
& Schafer (1999); Veen & Haykin (1999) whose coefficients are updated along the time by an
efficient rule guided by some statistical criterion. Usually, one employs a gradient descent
technique in order to minimize some cost (error) function. The most popular of these
techniques is the Widrow-Hoff least mean squares (LMS) algorithm, as well as its variants, which
minimize the mean square error (MSE) between two signals. Moreover, in many cases, such
as real-time conversations over mobile phones, AEC algorithms must run in real time to be
useful. We thus need efficient implementations of echo cancellation techniques
on digital embedded devices, such as field programmable gate arrays (FPGA) and/or digital signal
processors (DSP), to fulfill the real-time requirements of many applications.
This chapter reviews and compares existing solutions for AEC based on adaptive filtering
algorithms. We also focus on real-time solutions for this problem on DSP platforms. Section
2 states the echo cancellation problem. Section 3 reviews some basic concepts of adaptive
filtering techniques and algorithms. Section 4 describes some existing solutions for AEC.
Section 5 details real-time implementations of AEC systems with DSP from Texas Instruments.
Section 6 presents some experimental results and Section 7 ends the chapter with some
concluding remarks and future directions and challenges for AEC techniques.
Fig. 1. The acoustic echo scenario. A version of the received signal through the loudspeaker is
fed back into the microphone, re-entering the system and leading to undesired feedback. The
worst case happens when the feedback is sustained (non-decaying); this causes the most
annoying effect, known as howling. Functions f and g represent the echo path on both ends.
• whenever someone speaks in a room; each room has its own acoustic
conditions leading to echo Antweller & Symanzik (1995);
• the two-wire/four-wire conversion Sondhi & Berkeley (1980) in telephony carried out by a
hybrid circuit on a telephone line, as depicted in Fig. 2, in which we consider a simplified
connection between two subscribers S1 and S2 .
Fig. 2. Simplified long distance connection between two subscribers, using a hybrid circuit
for the two-wire/four-wire conversion. The impedance mismatch on the two-wire/four-wire
conversion originates the return of a portion of the emitted signal. This causes echo on both
ends of the conversation system.
The subscriber loop connects the analog telephone with a two-wire line. In order to establish
a connection, the central office must connect the two-wire line from one subscriber to another.
This way, long distance telephone connections are four-wire connections with two-wires for
transmission and the other two for reception. The hybrid circuit is a device that establishes the
connection and conversion between the two-wire and the four-wire circuits. The connection
between S1 and H1 (or between S2 and H2 ) is a two-wire connection between the subscriber
and the central office. Between central offices, we have four-wire connections (between H1
and H2 ). Each two-wire connection is usually designated as the subscriber loop; in this portion
of the circuit, both directions of communication are supported in the same pair of wires.
The hybrid circuit H1 converts the signal from S2 to the two-wire connection to S1, without
back reflection of energy from this signal. In practice, this is not possible to accomplish
because of many varying characteristics such as the length of the subscriber loop, the
individual subscriber devices, and the line impedance; these factors altogether inhibit the perfect
separation between emission and reception signals. Since there is some energy reflection from
the emitted signal, as a consequence S2 (or S1) receives a delayed version of its own voice
signal with some attenuation and distortion.
Applications such as hands-free telephony, tele-conferencing and video-conferencing require
the use of acoustic echo cancellation (AEC) techniques to eliminate acoustic feedback from the
loudspeaker to the microphone Gay & J.Benesty (2000).
Echo cancellation is usually achieved by using an adaptive filter which attempts to synthesize
a replica of the echo signal and subtract it from the returned signal. An adaptive filter changes
its coefficients along the time; as a consequence it changes its frequency response in order to
satisfy the adaptation criterion. This is the principle illustrated in Fig. 3 in the AEC context.
The adaptive filter operates on the voice signal and tries to replicate the echo signal, which is
Fig. 3. The acoustic echo cancellation scenario for the setup of Fig. 1. Echo cancellation is
achieved by using an adaptive filter which attempts to synthesize a replica of the echo signal
and subtract it from the returned signal.
subtracted from the emitted signal. The adaptive filter imitates the echo path thus canceling its
effects. In the case of the two-wire/four-wire conversion, depicted in Fig. 2, AEC is performed
with the block diagram of Fig. 4.
Fig. 4. Echo cancellation on the telephone line using an adaptive filter. The adaptive filter
compensates the non-ideal behavior of the hybrid circuit.
The adaptive filter synthesizes a replica of the echo, which is subtracted from the returned
signal. This removes/minimizes the echo without interrupting the echo path. The adaptive
filter compensates for the undesired effects of the non-ideal hybrid circuit.
Fig. 5. FIR and IIR structures for digital filtering. When these structures are applied on
adaptive filtering problems, their coefficients are time-changing, being updated by some rule.
with M zeros (given by q_k) and N poles (given by r_k). The zeros correspond to the direct
connections between input and output, whereas the poles indicate the feedback connections.
It is well known that the use of poles can lead to accomplishing a given filter specification more
easily than using only zeros. However, for causal filters, poles cause instability whenever
placed outside the unit circle in the z-plane; the adaptation algorithm has to assure stability.
IIR filters are, by definition, systems with infinite impulse response, and thus we can
theoretically accommodate an echo path of any length. In the case of IIR filters, the
adaptive filtering algorithm must take additional steps in order to keep the N poles of the
filter inside the unit circle in the z-plane. Since a FIR filter is stable independently of its
coefficients, adaptive filtering algorithms usually employ FIR filtering instead of IIR filtering.
Fig. 6. Block diagram of statistical filtering. The discrete filter is applied to the input signal
x [n] and produces output o [n], which is compared against the desired signal d[n]. The error
signal (to be minimized) is the difference between o [n] and d[n]. The discrete filter is learned
with the statistics from the input signal.
Fig. 7 shows the block diagram of a typical adaptive filtering application. The error signal is
used as input to the coefficient update algorithm of the adaptive (FIR or IIR) filter. As time
goes by and samples move along vector x, the coefficients in vector w are updated by some
Echo Cancellation for Hands-Free Systems 339
Fig. 7. Block diagram of adaptive filtering. The discrete filter is applied to the input signal
x [n] and produces output o [n], which is compared against the desired signal d[n]. The error
signal is used as input to the coefficient update algorithm of the adaptive filter. The adaptive
filter coefficients are updated according to the error signal produced at each time instant.
rule using a statistical criterion. The minimization of the mean square error (MSE) between the
output and the desired signal leads to the minimization of the energy of the error signal. If we
define the error signal as the difference between the desired signal d[n] and the output signal
o[n], we get
e[n] = d[n] − o[n] = d[n] − xwᵀ = d[n] − wxᵀ.   (7)
The energy of the error signal is defined as
E_e = Σ_{n=−∞}^{∞} e²[n] = Σ_{n=−∞}^{∞} (d[n] − o[n])²
    = Σ_{n=−∞}^{∞} (d²[n] − 2d[n]o[n] + o²[n]).   (8)
R = E[xᵀx]
  = E ⎡ x²[n]             x[n]x[n−1]         …   x[n]x[n−(M−1)]     ⎤
      ⎢ x[n−1]x[n]        x²[n−1]            …   x[n−1]x[n−(M−1)]   ⎥ ,   (11)
      ⎢ ⋮                 ⋮                      ⋮                  ⎥
      ⎣ x[n−(M−1)]x[n]    x[n−(M−1)]x[n−1]   …   x²[n−(M−1)]        ⎦
and p is the vector with the cross-correlation between the desired and the input signal
The error surface is a multidimensional paraboloid with concavity aligned along the positive
axis, leading to a surface with a single minimum. In order to find this minimum, we search
for the optimal weight vector w∗ that leads to a null gradient
∇ = ∂E[e²[n]]/∂w = 0.   (13)
For stationary input signals, the minimization of (13) leads to the Wiener filter Orfanidis
(2007); Widrow & Stearns (1985) solution that computes the optimal weight vector given by
w∗ = R⁻¹p,   (14)
leading to the optimal solution in the MSE sense. However, the Wiener filter is not adequate for
non-stationary situations, because it requires prior knowledge of the statistical properties of
the input signal. This way, the Wiener filter is optimal only if these statistics match the ones
considered for the design of the filter. In cases in which we do not have such information, it may
not be possible to design the Wiener filter, or at least it cannot be optimal. We can overcome
this problem with a two-step approach:
1. estimate the statistical parameters of those signals;
2. compute the filter parameters.
For real-time applications, the computational complexity of this approach is prohibitive. An
efficient method to solve this problem consists of using an adaptive filter. These filters
exhibit satisfactory behavior in environments in which there is no prior knowledge of the
statistical properties of the signals under consideration. The adaptive filter designs itself
automatically, with some recursive learning algorithm to update its parameters, minimizing
some cost function. It starts with a set of parameters which reflect the absence of knowledge of
the environment; in a stationary environment, the adaptive filter parameters converge,
iteration by iteration, to the Wiener filter solution given by (14) Haykin (2002); Sayed (2003);
Widrow & Stearns (1985). In a non-stationary environment, the adaptive filtering algorithm
provides a way to follow the variation of the statistical properties of the signal along the time,
provided that this variation is slow enough.
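The Wiener solution w∗ = R⁻¹p of (14) can be illustrated numerically by estimating R and p from a block of stationary data; the scenario and sizes below are ours:

```python
import numpy as np

rng = np.random.default_rng(2)
M = 8
h = rng.standard_normal(M)               # unknown system to identify
x = rng.standard_normal(20000)           # stationary (white) input
d = np.convolve(x, h)[:len(x)]           # desired signal: output of the system

# tap-vector snapshots X[n] = [x[n], x[n-1], ..., x[n-M+1]]
X = np.array([x[i - M + 1:i + 1][::-1] for i in range(M - 1, len(x))])
D = d[M - 1:]
R = (X.T @ X) / len(X)                   # sample autocorrelation matrix (11)
p = (X.T @ D) / len(X)                   # sample cross-correlation vector
w_star = np.linalg.solve(R, p)           # Wiener solution w* = R^{-1} p

print(np.linalg.norm(w_star - h) / np.linalg.norm(h) < 0.05)
```

Note that this is exactly the two-step approach: estimate the statistics, then solve for the filter; the adaptive algorithms below reach the same solution recursively without forming R and p explicitly.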
Adaptive filtering has been used in a variety of applications such as communications, radar,
sonar, seismology, and biomedical engineering. Although these applications come from such
different fields, they all share a common characteristic: using an input vector and a desired
response, we compute an estimation error; this error is then used to update the adaptive filter
coefficients. Table 1 summarizes classes and some applications of adaptive filtering. The echo
cancellation problem belongs to class IV, Interference Cancellation.
Class                           Applications
I. Identification               System Identification
                                Terrestrial Layer Modelization
II. Inverse Modelization        Predictive Deconvolution
                                Adaptive Equalization
III. Prediction                 Linear Predictive Coding
                                Auto-Regressive Spectral Analysis
                                Signal Detection
IV. Interference Cancellation   Noise Cancellation
                                Echo Cancellation
                                Adaptive Beamforming
Table 1. Classes and applications of adaptive filtering techniques.
The adaptive filter coefficients adjustment has been addressed using the least mean squares
(LMS) adaptive filtering algorithm and its variants Haykin (2002); Widrow & Stearns (1985).
The choice of the adaptive filter adjustment coefficients must be made taking into account
some important characteristics of the algorithm such as:
• rate of convergence and precision;
• numerical complexity;
• filter structure and stability.
The rate of convergence is defined as the number of iterations that the algorithm requires,
operating on stationary signals, to approximate the optimal Wiener solution. The higher
the rate of convergence, the faster the algorithm adapts to non-stationary environments with
unknown characteristics.
The numerical complexity is the number of operations needed to complete an iteration of the
algorithm. Depending on the adaptation algorithm and on the processor where the algorithm
runs, it is possible to have numerical problems. The most common source of problems of
this kind is the so-called finite precision effects, due to the limited number of bits used to represent
data types for coefficients and samples. As the algorithm performs its computations, it is
possible to accumulate quantization errors. If this situation is not monitored, the adaptive
filter coefficients may enter into an overflow (or underflow) problem. These factors prevent
the adaptive algorithm from converging to an acceptable solution. In the worst case of overflow
and underflow, we say that the algorithm is numerically unstable.
The filter stability is assured by choosing a FIR filter; whatever its coefficients are, a FIR filter is
always stable since it has no feedback from the output to the input. If the filter input becomes
zero at a given time instant, then its output will be zero after M/F_s seconds, where M is the order
of the filter and F_s is the sampling frequency. The longest echo path (time delay) that we can
accommodate is given by the order of the filter; long echo paths lead to higher-order filters that
are computationally more demanding to store in memory and to update in real time. Thus,
the computational complexity that we can put into the adaptive filter limits the longest echo
path.
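As a quick illustration of this limit (the numbers are ours, chosen for narrowband telephony):

```python
# An echo tail of T seconds at sampling rate Fs needs about M = T * Fs taps.
Fs = 8000          # Hz, narrowband telephony sampling rate
T = 0.064          # 64 ms echo tail to be covered
M = int(T * Fs)    # required FIR order
print(M)
```

Doubling the echo tail doubles the filter order and therefore the per-sample cost of both filtering and coefficient update.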
On the other hand, if we use an adaptive IIR filter care must be taken to assure stability. In
any case (FIR or IIR filter) we have a time varying impulse response, due to the adaptation of
the filter coefficients.
3.3 Adaptive filter update: Least mean squares algorithm and variants
In this section, we review the most common adaptive filtering techniques. For the purpose of
explanation we consider adaptive FIR filters. The least mean square (LMS) algorithm operates
by updating the filter coefficients w according to the gradient descent rule

w(i+1) ← w(i) + 2μ e(i) x(i),

where w(i) is the value of the coefficient vector w at time instant i, μ is the step size, e(i) is the error signal, and x(i) is the vector of present and previous samples at the filter taps at time instant i.
The LMS step μ must be chosen carefully in order to assure proper convergence. A small μ assures convergence, but the rate of convergence can be so slow that it is not adequate for a real-time conversation. A large μ can prevent the LMS algorithm from finding an adequate local minimum, thus leading to unsatisfactory echo cancellation. It can be shown
(Haykin, 2002; Widrow & Stearns, 1985) that the choice

0 < μ < 1 / ((M + 1) P(i)), (16)

assures an adequate convergence rate; M is the order of the adaptive filter and P(i) is the average power of the signal present at the input of the filter. In Özbay & Kavsaoğlu (2010), the choice of the optimal value for the LMS adaptation step is discussed.
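As a concrete illustration of the gradient-descent update and the step-size bound (16), the following Python/NumPy sketch (not from the chapter; the signal names, the white-noise test scenario, and the echo path h are assumptions for the example) identifies a short FIR echo path with LMS:

```python
import numpy as np

def lms(x, d, M, mu):
    """LMS adaptive FIR filter: w(i+1) = w(i) + 2*mu*e(i)*x(i)."""
    w = np.zeros(M + 1)               # adaptive filter coefficients
    e = np.zeros(len(x))              # error signal
    for i in range(M, len(x)):
        xi = x[i - M:i + 1][::-1]     # present and previous samples at the taps
        e[i] = d[i] - w @ xi          # error = desired minus filter output
        w += 2 * mu * e[i] * xi       # gradient-descent coefficient update
    return w, e

# System-identification test: the "echo path" is a short FIR filter (assumed).
rng = np.random.default_rng(0)
x = rng.standard_normal(5000)                 # far-end (input) signal
h = np.array([0.5, -0.3, 0.2, 0.1])           # unknown echo path to identify
d = np.convolve(x, h)[:len(x)]                # desired signal (the echo)
M = len(h) - 1
P = np.mean(x ** 2)                           # average input power P(i)
mu = 0.1 / ((M + 1) * P)                      # step well inside the bound (16)
w, e = lms(x, d, M, mu)                       # w converges toward h
```

With μ inside the bound, the coefficients converge to the echo path; choosing μ above 1/((M + 1) P) makes the recursion diverge.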
The normalized LMS (NLMS) algorithm replaces the fixed step μ by a time-varying step μ(i), normalized by the power of the signal at the filter taps; NLMS has a relatively low convergence speed, but it is quite stable and has low complexity. The leaky least mean squares (LLMS) algorithm is another variant of LMS (Kuo & Lee, 2001), which introduces a leakage factor β (0 < β < 1) into the coefficient update
w(i+1) ← βw(i) + 2μe(i) x(i) . (19)
Thus, in the following iteration only a portion of each coefficient value is retained (the coefficients have leaks, caused by β). The NLMS and LLMS updates can be used simultaneously in the coefficient update process, leading to
w(i+1) ← βw(i) + 2μ(i) e(i) x(i) . (20)
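The combined update (20) can be sketched in Python/NumPy as follows (a hypothetical illustration: the normalization of μ(i) by the tap-vector power, the constants mu0, beta and eps, and the test scenario are our assumptions, not from the chapter):

```python
import numpy as np

def leaky_nlms(x, d, M, mu0=0.5, beta=0.999, eps=1e-6):
    """Combined leaky + normalized LMS: w <- beta*w + 2*mu(i)*e(i)*x(i), as in (20)."""
    w = np.zeros(M + 1)
    e = np.zeros(len(x))
    for i in range(M, len(x)):
        xi = x[i - M:i + 1][::-1]            # taps at time i
        e[i] = d[i] - w @ xi
        mu_i = mu0 / (2 * (eps + xi @ xi))   # time-varying normalized step mu(i)
        w = beta * w + 2 * mu_i * e[i] * xi  # leaky coefficient update (20)
    return w, e

# Same identification scenario as before (assumed): a short FIR echo path.
rng = np.random.default_rng(1)
x = rng.standard_normal(5000)
h_true = np.array([0.5, -0.3, 0.2, 0.1])
d = np.convolve(x, h_true)[:len(x)]
w, e = leaky_nlms(x, d, M=3)
```

The leak β < 1 slightly biases the converged coefficients toward zero, the price paid for the added robustness against coefficient drift.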
Echo Cancellation for Hands-Free Systems
The recursive least squares (RLS) algorithm (Haykin, 2002) recursively computes the filter coefficients that minimize a weighted linear least squares cost function. Notice that the LMS algorithm aims at minimizing the mean square error. The RLS algorithm considers the input signals as deterministic, while for LMS and its variants they are considered instances of stochastic processes. RLS has extremely fast convergence at the expense of high computational complexity, which makes it unattractive for real-time implementations. The main drawback of RLS is its poor performance when the filter to be estimated changes its statistical properties.
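For comparison, a minimal exponentially weighted RLS sketch (the forgetting factor lam and the initialization constant delta are conventional assumptions; this is not an implementation from the chapter):

```python
import numpy as np

def rls(x, d, M, lam=0.99, delta=100.0):
    """Exponentially weighted RLS: minimizes sum over n of lam^(i-n) * e(n)^2."""
    w = np.zeros(M + 1)
    P = delta * np.eye(M + 1)                 # estimate of inverse correlation matrix
    e = np.zeros(len(x))
    for i in range(M, len(x)):
        xi = x[i - M:i + 1][::-1]
        e[i] = d[i] - w @ xi                  # a priori error
        k = P @ xi / (lam + xi @ P @ xi)      # gain vector
        w = w + k * e[i]                      # coefficient update
        P = (P - np.outer(k, xi @ P)) / lam   # inverse-correlation update
    return w, e

# RLS typically locks onto a stationary path within a few tens of samples.
rng = np.random.default_rng(2)
x = rng.standard_normal(1000)
h_true = np.array([0.5, -0.3, 0.2, 0.1])
d = np.convolve(x, h_true)[:len(x)]
w, e = rls(x, d, M=3)
```

Note the O(M²) work per sample in the P update, against O(M) for LMS; this is the complexity gap summarized in Table 2.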
A LMS variant named frequency-response-shaped least mean squares (FRS-LMS) was proposed
in Kukrera & Hocanin (2006) and shown to have good convergence properties. The
FRS-LMS algorithm has improved performance when a sinusoidal input signal is corrupted
by correlated noise. The algorithm shapes the frequency response of the transversal filter.
This shaping action is performed on-line using an additional term similar to leakage β in
LLMS shown in (19). This term involves the multiplication of the filter coefficient vector by
a matrix, and it is computed efficiently with the fast Fourier Transform (FFT) Oppenheim &
Schafer (1999). The authors show analytically that the adaptive filter converges in both the
mean and mean-square senses. They also analyze the filter in the steady state in order to
show its frequency-response-shaping capability. The experimental results show that FRS-LMS
is very effective even for highly correlated noise.
The decoupled partitioned block frequency domain adaptive filter (DPBFAD) approach is quite
demanding regarding computational requirements in the number of operations as well as the
required memory, when compared to LMS, NLMS, and LLMS. The coefficients are updated
in blocks, in the frequency domain, to allow real-time implementation. It requires the computation of the DFT (using the FFT) and its inverse at each iteration.
Table 2 compares three algorithms that have proven effective for adaptive filtering, namely
the least mean squares (LMS), recursive least squares (RLS), and fast RLS (also known as fast
Kalman) Orfanidis (2007). This list is by no means exhaustive, since there are many different
algorithms for this purpose.
Algorithm Rate of Convergence Complexity Stability
LMS Slow Low Stable
RLS Fast High Stable
Fast RLS Fast Low Unstable
Table 2. Rate of convergence, complexity, and stability of well-known adaptive filtering
algorithms.
is employed. The resulting algorithm enables long-distance echo cancellation with low
computational requirements. It reaches high echo return loss enhancement (ERLE) and shows
fast convergence. The key issue to use centered adaptive filters is that the echo-path impulse
response is characterized mainly by two active regions, corresponding to the near-end and the
far-end signal echo respectively, as shown in Fig. 8.
Fig. 8. Typical echo path impulse response (adapted from Marques et al. (1997)). We have
two active regions corresponding to the near-end and far-end echo, respectively.
Each active region has a length usually much shorter than the total supported echo-path
length. The proposed system is based on time-delay estimators to track the position of these
active regions, where short-length adaptive filters have to be centered.
Fig. 9 shows the impulse response of an acoustic echo path, resulting from the direct coupling
between the speaker and the microphone of an IRISTEL telephone. Although the supported
echo path length is 64 delay elements, only a small region is active. Knowing its position and
length, the adaptive filter has to adjust only the corresponding coefficients.
Fig. 9. Acoustic echo path impulse response for an IRISTEL telephone. Most of the
coefficients are near zero and only a small subset of the coefficients has a significant value
(adapted from Marques et al. (1997)).
In Marques et al. (1997) the authors compare the traditional full-tap FIR and short-length
centered filter solution in an echo path with a delay of half a second. The conventional
structure converges to a solution where the ERLE is less than 10 dB, while the centered filter
achieves approximately 80 dB, as depicted in Fig. 10.
Fig. 10. Convergence of the traditional FIR structure compared to the centered adaptive filter,
for a delay of 4000 taps (adapted from Marques et al. (1997)).
Fig. 11. The block diagram of the proposed system with two echo cancelers and a speech
detector. Each echo canceler has a short-length centered adaptive filter and a time delay
estimator (from Marques et al. (1997)).
The speech detector is very important in echo cancellation systems where double talking may occur (full-duplex mode), as this situation causes an abrupt increase of the adjustment error. The common solution of using adaptive FIR filters to approximate the echo-path impulse response becomes insufficient; if this situation occurs and no action is taken, the adaptive filter coefficients may drift (Johnson, 1995). Additionally, in this system, erroneous time-delay estimation (TDE) may happen. To overcome this problem, the strategy is to inhibit the filter adjustment and the delay estimation when double talking is detected.
In Fig. 12 a centered adaptive filter example is shown. The supported echo path length is
M taps, the position of the active region is ( M − 1) Ts and for illustration purposes only, the
considered length is 3 taps. Ts = 1/Fs is the sampling time, that is, the time between two
consecutive samples. The main advantages of the centered adaptive filter, as compared to the
typical full-length FIR solution, are:
Fig. 12. Centered adaptive filter. The supported echo path length is Na taps, but considering
an active region of 3 taps, only the corresponding 3 coefficients need adjustment.
• reduced computational cost, due to the lower number of coefficients that need adjustment,
when compared with the total supported echo path length;
• greater convergence speed, since the adaptation step can now be larger;
• reduced residual error because the coefficients which would otherwise converge to zero,
now take precisely that value.
To adjust the short length centered adaptive filter coefficients, the NLMS algorithm was
employed, due to its adequacy in the presence of speech signals Haykin (2002).
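The idea of adapting only the active region can be sketched as follows (a hypothetical Python/NumPy illustration: the function name, the NLMS step mu0, and the assumption that the delay estimate is already available are ours, not from the chapter):

```python
import numpy as np

def centered_nlms(x, d, delay, K, mu0=0.5, eps=1e-6):
    """NLMS restricted to K taps centered on the estimated echo delay."""
    w = np.zeros(K)                   # only the active-region coefficients
    e = np.zeros(len(x))
    for i in range(delay + K - 1, len(x)):
        # tap vector aligned with the active region [delay, delay + K)
        xi = x[i - delay - K + 1:i - delay + 1][::-1]
        e[i] = d[i] - w @ xi
        w += (mu0 / (eps + xi @ xi)) * e[i] * xi   # normalized (NLMS) update
    return w, e

# Assumed test: a 3-tap active region buried at delay 50 in a long echo path.
rng = np.random.default_rng(3)
x = rng.standard_normal(4000)
D, h_active = 50, np.array([0.6, -0.4, 0.3])
path = np.zeros(D + 3)
path[D:] = h_active                   # echo path: zeros except the active region
d = np.convolve(x, path)[:len(x)]
w, e = centered_nlms(x, d, delay=D, K=3)
```

Only 3 coefficients are updated per sample instead of D + 3, which is where the computational saving and the larger admissible step come from.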
where < ., . > denotes the inner product between its two arguments. Essentially, DCC is the
inner product between x [n] and shifted versions of y[n]. The DCC can be computed efficiently
on the frequency domain using the fast Fourier transform (FFT) and its inverse Oppenheim &
Schafer (1999) by
DCCxy[k] = IFFT[FFT[x[n]] .∗ FFT[y[−n]]], (22)

using L-point FFT/IFFT, with .∗ denoting the element-wise product of two vectors. The maximum value
of DCCxy [k] corresponds to the estimated location of the delay k̂ = arg maxk DCCxy [k]; the
value of td as in (1) is given by td = k̂Ts . The ASDF estimator is given by
ASDFxy[k] = (1/L) Σ_{n=0}^{L−1} (x[n] − y[n − k])², (23)

which is similar to the Euclidean distance between the two signals. Finally, the AMDF estimator computes

AMDFxy[k] = (1/L) Σ_{n=0}^{L−1} |x[n] − y[n − k]|, (24)
with the advantage, over the previous two measures, that it requires no multiplications to
measure the similarity between two signals. For ASDF and AMDF we are interested in finding
indexes k̂ = arg mink ASDFxy [k] and k̂ = arg mink AMDFxy [k].
Supported by many experimental tests, the authors in Marques et al. (1997) chose the DCC method, because it outperforms the other two (AMDF and ASDF) in low signal-to-noise ratio (SNR) scenarios. The TDE component is the most demanding block of the entire system; it takes about 90 % of the total processing time.
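The three estimators (22)-(24) can be sketched in Python/NumPy as follows (a minimal illustration under assumed signal names; the zero-padded FFT length and the search range kmax are our choices):

```python
import numpy as np

def dcc_delay(x, y):
    """DCC (22): IFFT[FFT[x] .* FFT[y[-n]]]; for real y, FFT[y[-n]] = conj(FFT[y])."""
    N = 2 * len(x)                          # zero-pad to avoid circular wrap-around
    dcc = np.fft.ifft(np.fft.fft(x, N) * np.conj(np.fft.fft(y, N))).real
    return int(np.argmax(dcc[:len(x)]))     # k maximizing the cross-correlation

def asdf_delay(x, y, kmax):
    """ASDF (23): mean squared difference between x[n] and y[n-k]; minimize over k."""
    costs = [np.mean((x[kmax:] - y[kmax - k:len(y) - k]) ** 2)
             for k in range(kmax + 1)]
    return int(np.argmin(costs))

def amdf_delay(x, y, kmax):
    """AMDF (24): mean absolute difference; needs no multiplications."""
    costs = [np.mean(np.abs(x[kmax:] - y[kmax - k:len(y) - k]))
             for k in range(kmax + 1)]
    return int(np.argmin(costs))

# Assumed test: x is y delayed by 37 samples.
rng = np.random.default_rng(4)
y = rng.standard_normal(2000)
x = np.concatenate([np.zeros(37), y[:-37]])
```

All three estimators recover the 37-sample delay here; the FFT-based DCC is the one that scales well as L grows.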
update is carried out simultaneously with the update of the centered filter coefficients. In the
experimental tests, this delay estimator with low complexity, has obtained good results even
in situations of low signal-to-noise ratio. The number of coefficients that need adjustment is
small when compared with the total number of elements in the supported delay line, thus
enabling a larger step.
Fig. 13. The block diagram of the Development Starter Kit C6713 developed by Spectrum
Digital for Texas Instruments (adapted from Spectrum Digital Inc., DSP Development
Systems (2003)).
The filters are managed as circular buffers and inline functions are used whenever possible.
The sampling rate is 8 kHz, and the number of bits per sample is 16 (the minimum allowed
by the AIC23 codec), suited for speech signals. This way, we have 125 μs between two
consecutive samples, and this is the maximum processing time to meet real-time requirements
(28125 instructions, under a 225 MHz clock). The time delay estimator takes the largest share of the total processing time; it is not possible to completely update the time delay estimate within 125 μs. Between two consecutive samples, we update only a small portion of the filter coefficients.
6. Experimental results
This section presents some experimental results obtained with the AEC systems described in
Subsections 4.1 and 4.2, respectively.
Feature Value
Maximum supported echo delay 381 ms / 2643 ms
Maximum length of dispersion area 4 ms
Absolute delay 0.375 ms
Minimum attenuation on the returned echo 6 dB
Convergence Improvement of 41 dB in 80 ms
Residual echo level -51 dBm0
Speech detector level 6 dB below emission level
Hold time after speech detection 75 ms
Table 4. Main features of the AEC approach with TMS320C50 DSP described in Subsection 4.1. The maximum supported echo delay depends on the amount of internal/external memory.
Fig. 14 shows the ERLE (in dB) obtained by the AEC system with simulated, electric, and real echo paths, as a function of time. As expected, we get the best results on the simulated echo path, due to the adequacy of the adaptive filter to this path. The electric echo path is easier to cancel than the acoustic echo path, in which, due to its non-linearities, the experimental results show less attenuation than for the other two paths. Even on the acoustic echo path, which is the most difficult, we rapidly get 10 dB of attenuation, in less than 30 ms (which is roughly the delay at which a human user perceives the echo); the attenuation then settles at about 20 dB, which is a very interesting value. In summary, ERLE is greater than 41 dB in just 80 ms on a simulated echo path; with real electric and acoustic echo paths, 24.5 dB and 19.2 dB have been measured, respectively.
Fig. 14. Echo canceller ERLE on simulated, electric, and acoustic paths. On the acoustic path, which is the most difficult, we get about 10 dB of attenuation in less than 30 ms.
Table 5 compares this system with the CCITT G.165 recommendation, for a real situation, on
the following tests:
• CR - Convergence Rate;
• FERLAC - Final Echo Return Loss After Convergence;
• IRLC - Infinite Return Loss Convergence;
• LR - Leak Rate.
We conclude that the system exceeds the recommendation levels for all these tests. The CR
and FERLAC measures are taken on the single-talk scenario. Fig. 15 shows the time delay
estimator ability to track time varying delays in the presence of real speech signals. On the
voiced parts of the speech signals the TDE block tracks the delays accurately.
Fig. 15. Real speech signal (top) and real/estimated delay obtained by the TDE module. The TDE block has a good performance in the presence of real speech. Adapted from Marques et al. (1997).
Fig. 16 emphasizes the usefulness of the speech detector in preventing filter coefficient drift: in the presence of double talk with the speech detector disabled, coefficient drift occurs.
Fig. 16. The speech detector prevents filter coefficient drift in the case of double talk. With the
speech detector disabled, coefficient drift happens. Adapted from Marques et al. (1997).
Feature Value
Absolute delay 0.375 ms
Convergence speed 27 dB (125 ms)
Digitization Fs = 8000 Hz, n = 16 bit/sample
Hold time after speech 75 ms
Max. length 256 ms
Max. length of dispersion area 4 ms
Max. memory (data + code) < 192 kB
Residual echo -42.26 dBm0
Returned echo minimum loss 6 dB
Speech detector 6 dB below threshold
Table 6. Main features of the AEC approach with TMS320C6713 DSP described in
Subsection 4.2.
have 32 or 64 coefficients, while FIR-based time delay estimator uses up to M=4000 coefficients
(delays up to 0.5 seconds), giving a reasonable range of delays, suited for several applications.
Fig. 17 shows the (typical) centered adaptive filter coefficients (with 32 active coefficients), for
a speech signal. Only a small subset of coefficients is far from zero according to the echo path
characteristics, as expected; this is a typical test situation.
Fig. 17. Centered adaptive filter coefficients. Only a small subset of coefficients is far from zero.
Fig. 18 displays the echo and error signals for a speech signal, while Fig. 19 displays the achieved attenuation, of about 20 dB, for the speech signal on its voiced parts. It is interesting to note how the attenuation increases on the voiced parts of the speech signal, according to the AEC fundamental concepts presented in Subsections 2.1 and 2.2.
Fig. 18. Echo (top) and error (bottom) signal. Whenever there is echo with higher energy the
adaptive filter error signal follows it. On its portions with higher energy, the error signal
shows a decaying behavior that corresponds to the convergence of the adaptive filter.
Fig. 19. Attenuation obtained for the speech signal of Fig. 18. We have increased attenuation
on the voiced parts of the speech signal.
Table 7 compares our system with the CCITT G.165 recommendation levels, for a real
conversation. We conclude that this system approaches the recommendation levels for
FERLAC and IRLC measures, matches for CR and exceeds it for the LR measure. The CR
and FERLAC measures are taken on the single-talk scenario.
Fig. 20 displays the attenuation obtained for several electric and acoustic echo paths, using
the average power of the received echo as the reference value, because the attenuation on the
acoustic channel is not constant along these tests.
Fig. 20. Attenuation for the real (acoustic), electric, and simulated (real-time on CCS) echo paths.
The attenuation for the simulated echo path is much larger than for the other two, as expected. On the other hand, the attenuation for the
electric echo path is around 30 dB, which is a considerable value. Finally, for the acoustic
path we get about 10 dB of attenuation, which is also an acceptable practical value. This
result was expected due to the strong non-linearities in the acoustic echo path, caused by the
loudspeakers and microphone.
7. Conclusions
In this chapter, we have addressed the problem of acoustic echo cancellation. Echo, a delayed and attenuated version of the original signal produced by some device, such as a loudspeaker, causes disturbing effects on a conversation. This problem arises in many telecommunication applications, such as hands-free systems, leading to the need for its cancellation in real time. Echo cancellation techniques have better performance than the older and simpler echo suppression techniques.
We have reviewed some concepts of digital, statistical, and adaptive filtering. Some solutions
for real-time acoustic echo cancellation based on adaptive filtering techniques, on digital
signal processors were described in detail.
We have addressed some implementation issues of adaptive filtering techniques in real-time.
After the description of the acoustic echo cancellation solutions in some detail, we have
focused on their real-time implementations on well known digital signal processor platforms,
discussing their main characteristics as well as their experimental results according to standard measures.
8. Acknowledgments
This work has been partly supported by the Portuguese Fundação para a Ciência e Tecnologia
(FCT) Project FCT PTDC/EEA-TEL/71996/2006.
9. References
Antweller, C. & Symanzik, H. (1995). Simulation of time variant room impulse responses,
IEEE International Conference on Acoustics, Speech, and Signal Processing - ICASSP’95,
Vol. 5, Detroit, USA, pp. 3031–3034.
Benesty, J., Gaensler, T., Morgan, D., Sondhi, M. & Gay, S. (2001). Advances in Network and
Acoustic Echo Cancellation, Springer-Verlag.
Birkett, A. & Goubran, R. (1995). Acoustic echo cancellation using NLMS-neural network
structures, IEEE International Conference on Acoustics, Speech, and Signal Processing -
ICASSP’95, Detroit, USA.
Breining, C. (1999). Acoustic echo control: an application of very-high-order adaptive filters, Digital Signal Processing 16(6): 42–69.
Ferreira, A. & Marques, P. (2008). An efficient long distance echo canceler, International
Conference on Signals and Electronic Systems, ICSES’08, Krakow, pp. 331–334.
Gay, S. & Benesty, J. (2000). Acoustic signal processing for telecommunications, Kluwer Academic Publishers.
Gilloire, A. & Hänsler, E. (1994). Acoustic echo control, Annals of Telecommunications
49: 359–359. 10.1007/BF02999422.
URL: http://dx.doi.org/10.1007/BF02999422
Greenberg, J. (1998). Modified LMS algorithms for speech processing with an adaptive noise
canceller, IEEE Transactions on Signal Processing 6(4): 338–351.
Hänsler, E. (1994). The hands-free telephone problem: an annotated bibliography update,
Annals of Telecommunications 49: 360–367. 10.1007/BF02999423.
URL: http://dx.doi.org/10.1007/BF02999423
Haykin, S. (2002). Adaptive Filter Theory, 4th edn, Prentice-Hall.
Instruments, T. (1986). Digital voice echo canceler with a TMS32020, I: 415–454.
Instruments, T. (1993). TMS 320C5X users guide.
Jacovitti, G. & Scarano, G. (1993). Discrete time techniques for time delay estimation, IEEE
Transactions on Signal Processing 41(2): 525–533.
Jeannès, R., Scalart, P., Faucon, G. & Beaugeant, C. (2001). Combined noise and echo reduction
in hands-free systems: A survey, IEEE Transactions on Speech And Audio Processing
9(8): 808–820.
Ni, J. & Li, F. (2010). Adaptive combination of subband adaptive filters for acoustic echo cancellation, IEEE Transactions on Consumer Electronics 56(3): 1549–1555.
Johnson, C. (1995). Yet still more on the interaction of adaptive filtering, identification, and
control, IEEE Signal Processing Magazine pp. 22 – 37.
Kehtarnavaz, N. (2004). Real-Time Digital Signal Processing, Newnes.
Krishna, E., Raghuram, M., Madhav, K. & Reddy, K. (2010). Acoustic echo cancellation using a computationally efficient transform domain LMS adaptive filter, 10th International Conference on Information Sciences Signal Processing and their Applications (ISSPA), Kuala Lumpur, Malaysia, pp. 409–412.
Kukrera, O. & Hocanin, A. (2006). Frequency-response-shaped LMS adaptive filter, Digital
Signal Processing 16(6): 855–869.
Kuo, S. & Lee, B. (2001). Real-Time Digital Signal Processing, John Wiley & Sons.
Kuo, S. & Pan, Z. (1994). Acoustic echo cancellation microphone system for large-scale
videoconferencing, IEEE International Conference on Acoustics, Speech, and Signal
Processing - ICASSP’94, Vol. 1, Adelaide, Australia, pp. 7–12.
Liu, Z. (1994). A combined filtering structure for echo cancellation in hand-free mobile phone
applications, International Conference on Signal Processing Applications & Technology,
Vol. 1, Dallas, USA, pp. 319–322.
Marques, P. (1997). Long distance echo cancellation using centered short length transversal filters,
Master’s thesis, Instituto Superior Técnico, Lisbon, Portugal.
Marques, P. & Sousa, F. (1996). TMS320C50 echo canceller based on low resolution time delay
estimation, The First European DSP Education and Research Conference, Paris.
Marques, P., Sousa, F. & Leitao, J. (1997). A DSP based long distance echo canceller using short
length centered adaptive filters, IEEE International Conference on Acoustics, Speech, and
Signal Processing - ICASSP’97, Munich.
Messerschmidt, D., Hedberg, D., Cole, C., Haoui, A. & Winship, P. (1986). Digital voice echo
canceller with a TMS320C30, Digital Signal Processing Applications 1: 415–454.
Oh, D., Shin, D., Kim, S. & Lee, D. (1994). Implementation of multi-channel echo canceler for
mobile communication with TMS320C50, International Conference on Signal Processing
Applications & Technology, Vol. 1, Dallas, USA, pp. 259–263.
Oppenheim, A. & Schafer, R. (1999). Discrete-Time Signal Processing, 2nd edn, Prentice Hall
International, Inc.
Orfanidis, S. (ed.) (2007). Optimum Signal Processing, 2nd edn, McGraw-Hill.
Özbay, Y. & Kavsaoğlu, A. (2010). An optimum algorithm for adaptive filtering on acoustic echo cancellation using TMS320C6713 DSP, Digital Signal Processing 20: 133–148.
URL: http://portal.acm.org/citation.cfm?id=1663653.1663859
Sayed, A. (2003). Fundamentals of Adaptive Filtering, Wiley-IEEE Press.
Sondhi, M. (2006). The history of echo cancellation, IEEE Signal Processing Magazine 23(5): 95
– 102.
Sondhi, M. & Berkeley, D. (1980). Silencing echoes on the telephone network, Proceedings of
the IEEE 68(8): 948–963.
Spectrum Digital Inc., DSP Development Systems (2003). TMS320C6713 DSK technical
reference.
Veen, B. & Haykin, S. (1999). Signals and Systems, John Wiley & Sons.
Widrow, B., Glover, J., McCool, J., Kaunitz, J., Williams, C., Hearn, R., Zeidler, J., Dong,
E. & Goodlin, R. (1975). Adaptive noise cancelling: Principles and applications,
Proceedings of the IEEE 63(12): 1692 – 1716.
Widrow, B. & Stearns, S. (1985). Adaptive Signal Processing, Prentice Hall.
Witowsky, W. (1999). Understanding echo cancellation in voice over IP networks, International
Conference on Signal Processing Applications and Technology (ICSPAT), Orlando.
Adaptive Heterodyne Filters
1. Introduction
The heterodyne process has been an important part of electronic communications systems
for over 100 years. The most common use of the heterodyne process is in modulation and
demodulation where a local oscillator produces the heterodyne signal which is then mixed
with (multiplied by) the signal of interest to move it from one frequency band to another.
For example, the superheterodyne receiver invented by U.S. Army Major Edwin Armstrong in 1918 uses a local oscillator to move the incoming radio signal to an intermediate band where it can be easily demodulated with fixed filters rather than needing a variable filter or series of fixed filters for each frequency being demodulated (Butler, 1989; Duman, 2005). Today you will find the heterodyne process as a critical part of any modern radio or TV receiver, cell phone, satellite communication system, etc.
In this chapter we will introduce the concept of making a tunable or adaptive filter using the
heterodyne process. The concept is very similar to that of the superheterodyne receiver, but
applied to tunable filters. Most tunable filters require a complicated mechanism for
adjusting the coefficients of the filter in order to tune the filter. Using the heterodyne
approach, we move the signal to a fixed filter and then move the signal back to its original
frequency band, minus the noise that has been removed by the fixed filter. Thus, complicated fixed filters that would be virtually impossible to tune by varying the filter parameters can easily be made tunable and adaptive.
Spectrum (THSS, e.g.: IEEE 802.15) in which the carrier is turned on and off by the random
code sequence, (4) Chirp Spread Spectrum (CSS, e.g.: IEEE 802.15.4a-2007) which uses
wideband linear frequency modulated chirp pulses to encode the information, and (5) Ultra
Wide Band (UWB, e.g.: IEEE 802.15.3a – Note: No standard assigned, MB-OFDM and DS-
UWB will compete in market) based on transmitting short duration pulses.
When working properly, the narrow-band transmissions licensed to the frequency spectrum
do not affect the broadband systems. They either interfere with a small portion of the broad-
band transmission (which may be re-sent or reconstructed) or the narrow-band signals are
themselves spread by the receiver demodulation process (Pickholtz et al., 1982). However,
in practice the narrow-band transmissions can cause serious problems in the spread-
spectrum receiver (Coulson, 2004, McCune, 2000). To alleviate these problems, it is often
necessary to include narrow-band interference attenuation or suppression circuitry in the
design of the spread-spectrum receiver. Adaptive heterodyne filters are an attractive
approach for attenuation of narrow-band interference in such broadband systems. Other
approaches include smart antennas and adaptive analog and digital filters, but adaptive
heterodyne filters are often a good choice for attenuation of narrow band interference in
broadband receivers (Soderstrand, 2010a).
Fig. 1. Simple heterodyne circuit: the input x(n) is multiplied by cos(ω0 n) to produce y(n).

x(n) e^{jω0 n} ⇔ X(z e^{−jω0}) (1)

x(n) cos(ω0 n) = ½ x(n) [e^{jω0 n} + e^{−jω0 n}] ⇔ ½ [X(z e^{−jω0}) + X(z e^{jω0})] (2)
From equation (2) we can clearly see the separation of the input signal into two signals, one
translated in frequency by rotation to the left ω0 and the other translated in frequency by
rotation to the right ω0 in the z-plane. In a modulation system, we would filter out the lower
frequency and send the higher frequency to the antenna for transmission. In a demodulator,
we would filter out the higher frequency and send the lower frequency to the IF stage for
detection.
Fig. 2. Simple tunable heterodyne band-pass filter: x(n) is multiplied by cos(ω0 n) to give u(n), filtered by H(z) to give v(n), and multiplied by cos(ω0 n) again to produce y(n). (H(z) must be a narrow-band low-pass filter.)
Using the same analysis as equation (2) we obtain:

u(n) = x(n) cos(ω0 n) = ½ x(n) [e^{jω0 n} + e^{−jω0 n}] ⇔ U(z) = ½ [X(z e^{−jω0}) + X(z e^{jω0})] (3)

V(z) = ½ [X(z e^{−jω0}) + X(z e^{jω0})] H(z) (4)

Y(z) = ¼ [H(z e^{−jω0}) + H(z e^{jω0})] X(z) + ¼ [X(z e^{−j2ω0}) H(z e^{−jω0}) + X(z e^{j2ω0}) H(z e^{jω0})] (5)

Equation (5) is obtained by the straightforward application of equation (1) for the multiplication of equation (4) by the cosine heterodyne function. Equation (5) consists of four separate terms. If H(z) is a narrow-band low-pass filter, then the first two terms of equation (5) represent a narrow-band band-pass filter centered at the heterodyne frequency ω0:

Y_BP(z) = ¼ [H(z e^{−jω0}) + H(z e^{jω0})] X(z) (6)

This narrow-band band-pass filter has only half the energy, however, because the other half of the energy appears in the high-frequency last terms of equation (5):

Y_HF(z) = ¼ [X(z e^{−j2ω0}) H(z e^{−jω0}) + X(z e^{j2ω0}) H(z e^{jω0})] (7)
Before invoking the above script, each of the input values was set (inp = 1, npoints = 1000, and ω0 = π/5, 2π/5, 3π/5, and 4π/5 in turn). The filter H(z) was selected as an inverse-Chebyshev filter designed as [b,a] = cheby2(11, 40, 0.1). As can be seen from Figure 3, we have been able to implement a tunable narrow-band band-pass filter that can be tuned by changing the heterodyne frequency.
Figure 4 shows a MatLab simulation of the ability of the circuit of Figure 2 to attenuate
frequencies outside the band-pass filter and pass frequencies inside the bandwidth of the
band-pass filter. The following MatLab script (GENINP) generates an input signal
consisting of nine cosine waves spaced by π/10 in the z-plane:
Fig. 3a. Tunable band-pass filter with ω0 = π/5. Fig. 3b. Tunable band-pass filter with ω0 = 2π/5.
Fig. 3c. Tunable band-pass filter with ω0 = 3π/5. Fig. 3d. Tunable band-pass filter with ω0 = 4π/5.
Fig. 3. MatLab simulation of circuit of Figure 2 for various values of the heterodyne
frequency ω0.
This input is then used with the previous script (COSHET) to generate Figure 4 (inp=0).
Fig. 4. Output of circuit of figure 2 when input is nine equally-spaced cosine waves.
In Figure 4 you can see the nine equally spaced cosine waves. The heterodyne frequency
was set to π/2. Thus the cosine waves at all frequencies except π/2 are severely attenuated.
There is nearly 40 dB difference between the cosine output at π/2 and the cosine output at
other frequencies. Once again, this verifies the ability of the circuit of Figure 2 to implement
a tunable narrow-band band-pass filter. (NOTE: The plots obtained from MatLab label the
Nyquist frequency π as 500. The plots show the entire frequency response from 0 to 2π.
Hence, the cosine at π/2 appears at 250 on the x-axis.).
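Since the COSHET script itself is not reproduced here, the circuit of Figure 2 can be sketched in Python/NumPy along the same lines (the windowed-sinc low-pass design with cut-off 0.05π and the test tones are our assumptions, standing in for the inverse-Chebyshev H(z) used in the chapter):

```python
import numpy as np

def heterodyne_bandpass(x, w0, h):
    """Figure 2: multiply by cos(w0*n), low-pass filter with H(z), multiply again."""
    c = np.cos(w0 * np.arange(len(x)))
    u = x * c                          # shifts content near +/-w0 toward DC
    v = np.convolve(u, h)[:len(x)]     # narrow-band low-pass filter H(z)
    return v * c                       # shifts the result back to +/-w0

# Narrow-band low-pass H(z): windowed sinc, cut-off 0.05*pi (assumed design).
M = 200
m = np.arange(M + 1) - M / 2
h = 0.05 * np.sinc(0.05 * m) * np.hamming(M + 1)
h /= h.sum()                           # normalize DC gain to 1

n = np.arange(4000)
w0 = np.pi / 2
y_pass = heterodyne_bandpass(np.cos(w0 * n), w0, h)         # tone at w0: kept
y_stop = heterodyne_bandpass(np.cos(np.pi / 5 * n), w0, h)  # off-band tone: removed
```

The in-band tone comes out with amplitude 1/2, matching the half-energy observation after equation (6), while the off-band tone is strongly attenuated.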
the maximum attenuation that can be achieved across the stop band is only 6 dB! This is
illustrated in Figure 5 where we have replaced the narrow-band band-pass H(z) from the
previous section with a wide-band high-pass H(z).
Fig. 5a. H(z) for a wideband high-pass filter Fig. 5b. Tunable band-stop filter
(poor attenuation)
Fig. 6a. Complex heterodyne circuit for software or hardware that supports complex
arithmetic
Fig. 7. Complex heterodyne rotation circuit: x(n) is heterodyned to u(n), filtered by H(z) to give v(n), and heterodyned back to y(n). (Rotates H(z) by ω0 in the z-plane so that what was at DC is now at ω0.)
The circuit of Figure 7 is straightforward to implement in MatLab.
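An equivalent Python/NumPy sketch of the rotation circuit (the prototype low-pass design and the test signals are our assumptions, not the chapter's original listing):

```python
import numpy as np

def heterodyne_rotate(x, w0, h):
    """Figure 7: rotate by -w0, filter with prototype H(z), rotate back by +w0.
    The composite filter is H(z) rotated so its DC response now sits at w0."""
    n = np.arange(len(x))
    u = x * np.exp(-1j * w0 * n)       # rotate spectrum: content at w0 moves to DC
    v = np.convolve(u, h)[:len(x)]     # prototype filter H(z)
    return v * np.exp(1j * w0 * n)     # rotate back

# Prototype low-pass H(z): windowed sinc, cut-off 0.05*pi (assumed design).
M = 200
m = np.arange(M + 1) - M / 2
h = 0.05 * np.sinc(0.05 * m) * np.hamming(M + 1)
h /= h.sum()                           # normalize DC gain to 1

n = np.arange(3000)
w0 = np.pi / 2
y_on = heterodyne_rotate(np.exp(1j * w0 * n), w0, h)          # at w0: passes
y_off = heterodyne_rotate(np.exp(1j * np.pi / 5 * n), w0, h)  # off w0: rejected
```

A complex exponential at ω0 passes with unit gain, while one away from ω0 falls in the rotated filter's stop band.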
Fig. 8a. Frequency response of prototype filter Fig. 8b. Pole-zero plot of prototype filter
Fig. 8c. Frequency response of rotated filter Fig. 8d. Pole-zero plot of rotated filter
Fig. 10a. Input X(z) Fig. 10b. Rotate ω0 to DC Fig. 10c. Rotate - ω0 to DC Fig. 10d. Rotate back
Fig. 10. Creation of a notch at ω0 using the three-way tunable complex heterodyne filter of
Figure 9 (z-plane).
Before we look at the detailed analysis of Figure 9, let’s take an overview of its operation. In
order to make the procedure clear, let’s assume that H(z) is a wide-band high-pass filter like
that of Figure 8a and 8b. Then the first heterodyne operation rotates the input signal x(n)
(shown in Figure 10a.) by -ω0 so that the frequencies that were at DC are now located at -ω0
(see Figure 10b). H(z) is then applied, attenuating frequencies that were at ω0 in x(n) before
the rotation (indicated by the white spot in Figure 10b). At this point, if we simply rotated
back like we did in the circuit of Figure 7, we would get the rotated H(z) as shown in Figures
8c and 8d. However, in Figure 9 we now rotate back 2ω0 so that the frequencies of x(n) that
were at DC are now at +ω0 (see Figure 10c). The second identical H(z) then attenuates
frequencies in x(n) that were at -ω0 before any of these rotations (indicated by the second
white spot in Figure 10c). Finally, we rotate the signal back to its original frequencies with the
attenuation having been applied both at +ω0 (first H(z)) and -ω0 (second H(z)) as shown in
Figure 10d. Since H(z) is applied twice, we will experience twice the pass-band ripple.
Hence, the prototype filter H(z) must be designed with one-half the ripple desired in the
final filter. Also because H(z) is applied twice, some portions of the stop-band will have
twice the attenuation while other parts will have the desired attenuation. Having more
attenuation than specified is not a problem, so we will design the prototype filter H(z) with
the desired stop-band attenuation (not half the stop-band attenuation).
Now let’s look at the detailed mathematics of Figure 9. Making use of the relationship of
Equation 1, we have the following as a result of passing x(n) (see Figure 10a) through the
first complex heterodyne unit in Figure 9:
x(n)e^(-jω0n) ⇔ X(ze^(jω0)) (8)
Next we apply the prototype filter H(z) (see Figure 10b):
[x(n)e^(-jω0n)] ∗ h(n) ⇔ X(ze^(jω0))H(z) (9)
Now we rotate back 2ω0 by passing through the second complex heterodyne unit (see Figure
10c):
e^(j2ω0n)[x(n)e^(-jω0n) ∗ h(n)] ⇔ X(ze^(-jω0))H(ze^(-j2ω0)) (10)
We then apply the second identical prototype filter H(z) (see Figure 10c):
{e^(j2ω0n)[x(n)e^(-jω0n) ∗ h(n)]} ∗ h(n) ⇔ X(ze^(-jω0))H(ze^(-j2ω0))H(z) (11)
Finally we pass through the last complex heterodyne unit returning the signal to its original
location (see Figure 10d):
e^(-jω0n)({e^(j2ω0n)[x(n)e^(-jω0n) ∗ h(n)]} ∗ h(n)) ⇔ X(z)H(ze^(-jω0))H(ze^(jω0)) (12)
The transfer function shown in equation (12) above is the effect of the entire Three-Way
Tunable Complex Heterodyne Filter shown in Figure 9. By choosing different prototype
filters H(z) we are able to implement tunable center-frequency band-stop and notch filters,
tunable cut-off frequency low-pass and high-pass filters, and tunable bandwidth band-pass
and band-stop filters. In the following sections we will look at the details for each of these
designs.
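Equation (12) can be verified numerically. The Python/NumPy sketch below (names ours) implements the three rotations of Figure 9 and checks that, for an impulse input, the result equals the convolution of the prototype rotated to -ω0 with the prototype rotated to +ω0, i.e. the transfer function H(ze^(jω0))H(ze^(-jω0)) acting on X(z).

```python
import numpy as np

def three_way_het(x, h, w0):
    """Figure 9 signal flow: rotate by -w0, filter, rotate by +2*w0, filter,
    rotate by -w0 (compare the n3wayhet m-file)."""
    n = np.arange(len(x))
    s = x * np.exp(-1j * w0 * n)          # Eq. (8)
    u = np.convolve(s, h)[:len(x)]        # Eq. (9): first H(z)
    v = u * np.exp(+2j * w0 * n)          # Eq. (10): rotate back 2*w0
    w = np.convolve(v, h)[:len(x)]        # Eq. (11): second H(z)
    return w * np.exp(-1j * w0 * n)       # Eq. (12): back to original location

w0 = np.pi / 3
h = np.array([1.0, -2.0, 1.0]) / 4        # toy high-pass prototype (ours)
x = np.zeros(16); x[0] = 1.0
y = three_way_het(x, h, w0)

# Eq. (12): impulse response = conv(h rotated to -w0, h rotated to +w0)
k = np.arange(len(h))
expected = np.convolve(h * np.exp(-1j * w0 * k), h * np.exp(+1j * w0 * k))
```

Because H(z) appears once rotated to +ω0 and once to -ω0, attenuation lands at both ±ω0, which is exactly how the notch of Figure 10d is formed.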
Designing tunable filters using the Three-Way Complex Heterodyne circuit of Figure 9 is
simply a matter of choosing the correct prototype filter H(z). Table 1 on the next page
shows the types of tunable filters that can be designed using the Three-Way Complex
Heterodyne Technique including the requirements for the prototype filter H(z) and the
tunable range. In the following sections we shall make use of this table to design examples
of each of these tunable filters.
Any standard filter design technique may be used for the prototype H(z), provided it meets
the requirements noted in Table 1. The examples in this chapter will all be based on linear-
phase Parks-McClellan filters. The prototype filter H(z) will be a 64-tap filter with weights
designed to obtain 40db attenuation in the stop band and a maximum ripple of 1.5db in the
prototype filter pass-band (3db in the tunable filter pass-band).
Desired Tunable Filter: Tunable center-frequency band-stop filter
Required H(z): High-pass filter with cut-off frequency equal to one-half of the desired
bandwidth, pass-band ripple equal to one-half the desired pass-band ripple and stop-band
attenuation equal to the desired stop-band attenuation for the tunable center-frequency
band-stop filter.
Tunable Range: Fully tunable from DC to the Nyquist frequency.

Desired Tunable Filter: Tunable cut-off frequency low-pass filter
Required H(z): Low-pass filter with cut-off frequency equal to one-half of the Nyquist
frequency, pass-band ripple equal to one-half the desired pass-band ripple and stop-band
attenuation equal to the desired stop-band attenuation for the tunable cut-off frequency
low-pass filter.
Tunable Range: Cut-off frequency tunable from DC to one-half the Nyquist frequency.

Desired Tunable Filter: Tunable cut-off frequency high-pass filter
Required H(z): High-pass filter with cut-off frequency equal to one-half of the Nyquist
frequency, pass-band ripple equal to one-half the desired pass-band ripple and stop-band
attenuation equal to the desired stop-band attenuation for the tunable cut-off frequency
high-pass filter.
Tunable Range: Cut-off frequency tunable from DC to one-half the Nyquist frequency.

Desired Tunable Filter: Tunable bandwidth band-pass filter
Required H(z): Band-pass filter centered at π/2 with bandwidth of π/2, pass-band ripple
equal to one-half the desired pass-band ripple and stop-band attenuation equal to the
desired stop-band attenuation for the tunable bandwidth band-pass filter.
Tunable Range: Bandwidth tunable from ε to π/2.

Desired Tunable Filter: Tunable bandwidth band-stop filter
Required H(z): Band-stop filter centered at π/2 with bandwidth of π/2, pass-band ripple
equal to one-half the desired pass-band ripple and stop-band attenuation equal to the
desired stop-band attenuation for the tunable bandwidth band-stop filter.
Tunable Range: Bandwidth tunable from ε to π/2.

NOTE: In bandwidth tuning, ε is the smallest bandwidth available. The actual value of ε
depends on the transition band of the prototype filter H(z). The narrower the transition
band, the smaller the value of ε. Attempts to tune the bandwidth to less than ε will result
in leakage at DC and the Nyquist frequency.
Table 1. Design of tunable filters using the three-way complex heterodyne circuit of Figure 9
The following MatLab code is used to implement the Three-Way Complex Heterodyne
Circuit of Figure 9:
% N3WAYHET
% Implements the Three-Way Heterodyne Rotation Filter
% Also known as the Fully Tunable Digital Heterodyne Filter
% INPUTS:
% Set the following inputs before calling N3WAYHET:
%   inp     = 0 (provide input file inpf)
%           = 1 (impulse response)
%   npoints = number of points in input
%   w0      = heterodyne frequency
%   [b a]   = coefficients of filter H(z)
%   scale   = 0 (do not scale the output)
%           = 1 (scale the output to zero db)
%
% OUTPUTS: ydb = frequency response of the filter
%          hdb, sdb, udb, vdb, wdb (intermediate outputs)
clear y ydb hdb s sdb u udb v vdb w wdb
if inp==1
    for index=1:npoints
        inpf(index)=0;
    end
    inpf(1)=1;
end
% rotate the input down by w0
for index=1:npoints
    s(index)=inpf(index)*exp(-1i*w0*(index-1));
end
% apply the first prototype filter H(z)
u=filter(b,a,s);
% rotate back up by 2*w0
for index=1:npoints
    v(index)=u(index)*exp(+2*1i*w0*(index-1));
end
% apply the second prototype filter H(z)
w=filter(b,a,v);
% rotate down by w0, returning the signal to its original location
for index=1:npoints
    y(index)=w(index)*exp(-1i*w0*(index-1));
end
[h,f]=freqz(b,a,npoints,'whole');
hdb=20*log10(abs(h));
sdb=20*log10(abs(fft(s)));
udb=20*log10(abs(fft(u)));
vdb=20*log10(abs(fft(v)));
wdb=20*log10(abs(fft(w)));
ydb=20*log10(abs(fft(y)));
if scale==1
    ydbmax=max(ydb);
    ydb=ydb-ydbmax;
end
plot(ydb,'k')
Adaptive Heterodyne Filters 373
To design a tunable center-frequency band-stop filter, the prototype filter must be a wide-
band high-pass filter with cut-off frequency equal to half the bandwidth of the desired
tunable band-stop filter (see Figure 11). Before calling the MatLab m-file n3wayhet, we
initialize the input variables listed in its header (inp, npoints, w0, [b a] and scale).
Fig. 11. Design criteria for prototype wide-band high-pass filter H(z) required to implement
a tunable band-stop filter using the three-way complex heterodyne circuit of Figure 9.
Figure 12 shows the result of running this MatLab m-file simulation of the circuit of Figure 9
for four different values of ω0: ω0 = 0, ω0 = π/4, ω0 = π/2 and ω0 = 3π/4. Key features of the
Three-Way Complex Heterodyne Technique can be seen in Figure 12. First, when ω0 = 0 we
get the frequency response shown in Figure 12a, which is the prototype filter convolved with
itself (H(z)H(z)). Thus we have over 80db attenuation in the stop band and the desired less
than 3db ripple in the pass-band. The prototype filter is High-Pass. Figure 12b shows the
circuit with ω0 = π/4. This tunes the center frequency to π/4, which shows up as 125 on the
x-axis of Figure 12b. Figure 12c shows the circuit with ω0 = π/2. This tunes the center
frequency to π/2, which shows up as 250 on the x-axis of Figure 12c. Figure 12d shows the
circuit with ω0 = 3π/4. This tunes the center frequency to 3π/4, which shows up as 375 on
the x-axis of Figure 12d. Notice that the attenuation of the tuned band-stop filters is over
40db, which is the same stop-band attenuation as the prototype filter. All of these filters
retain the linear-phase property of the prototype filter that was designed using the Parks-
McClellan algorithm.
Fig. 12a. Tunable band-stop ω0 = 0 [H(z)H(z)]
Fig. 12b. Tunable band-stop ω0 = π/4
Fig. 12c. Tunable band-stop ω0 = π/2
Fig. 12d. Tunable band-stop ω0 = 3π/4
Fig. 12. Tunable center-frequency linear-phase band-stop filter using the three-way complex
heterodyne circuit
Fig. 13. Design criteria for prototype low-pass filter H(z) with cut-off frequency at π/2
required to implement a tunable cut-off frequency low-pass filter using the three-way
complex heterodyne circuit of Figure 9.
Fig. 14a. Tunable cut-off low-pass filter ω0 = 0 [H(z)H(z)]
Fig. 14b. Tunable cut-off low-pass ω0 = π/8
Fig. 14c. Tunable cut-off low-pass ω0 = π/4
Fig. 14d. Tunable cut-off low-pass ω0 = 3π/8
Fig. 14. Tunable cut-off frequency linear-phase low-pass filter using three-way complex
heterodyne circuit
Figure 14 shows the tunable cut-off frequency low-pass filter. First, when ω0 = 0 we get the
frequency response shown in Figure 14a, which is the prototype filter convolved with itself
(H(z)H(z)). Thus we have over 80db attenuation in the stop band and the desired less than
3db ripple in the pass-band. The prototype filter is Low-Pass with bandwidth set to one-half
the Nyquist frequency (250 on the x-axis). Figure 14b shows the circuit with ω0 = π/8. This
tunes the cut-off frequency to π/2 - π/8 = 3π/8, which shows up as 187.5 on the x-axis of
Figure 14b. Figure 14c shows the circuit with ω0 = π/4. This tunes the cut-off frequency to
π/2 - π/4 = π/4, which shows up as 125 on the x-axis of Figure 14c. Figure 14d shows the
circuit with ω0 = 3π/8. This tunes the cut-off frequency to π/2 - 3π/8 = π/8, which shows
up as 62.5 on the x-axis of Figure 14d. Notice that the attenuation of the tuned low-pass
filters is over 40db, which is the same stop-band attenuation as the prototype filter. All of
these filters retain the linear-phase property of the prototype filter that was designed using
the Parks-McClellan algorithm.
Fig. 15. Design criteria for prototype high-pass filter H(z) with cut-off frequency at π/2
required to implement a tunable cut-off frequency high-pass filter using the three-way
complex heterodyne circuit of Figure 9.
Figure 16 shows the tunable cut-off frequency high-pass filter. First, when ω0 = 0 we get
the frequency response shown in Figure 16a, which is the prototype filter convolved with
itself (H(z)H(z)). Thus we have over 80db attenuation in the stop band and the desired less
than 3db ripple in the pass-band. The prototype filter is High-Pass with bandwidth set to
one-half the Nyquist frequency (250 on the x-axis). Figure 16b shows the circuit with
ω0 = π/8. This tunes the cut-off frequency to π/2 + π/8 = 5π/8, which shows up as 312.5 on
the x-axis of Figure 16b. Figure 16c shows the circuit with ω0 = π/4. This tunes the cut-off
frequency to π/2 + π/4 = 3π/4, which shows up as 375 on the x-axis of Figure 16c. Figure
16d shows the circuit with ω0 = 3π/8. This tunes the cut-off frequency to π/2 + 3π/8 =
7π/8, which shows up as 437.5 on the x-axis of Figure 16d. Notice that the attenuation of the
tuned high-pass filters is over 40db, which is the same stop-band attenuation as the
prototype filter. All of these filters retain the linear-phase property of the prototype filter
that was designed using the Parks-McClellan algorithm.
Fig. 16a. Tunable high-pass filter ω0 = 0 (H(z)H(z))
Fig. 16b. Tunable high-pass filter ω0 = π/8
Fig. 16c. Tunable high-pass filter ω0 = π/4
Fig. 16d. Tunable high-pass filter ω0 = 3π/8
Fig. 16. Tunable cut-off frequency linear-phase high-pass filter using three-way complex
heterodyne circuit
Fig. 17. Design criteria for prototype band-pass filter H(z) centered at π/2 with bandwidth of
π/2 (band edges at π/4 and 3π/4) required to implement a tunable bandwidth band-pass
filter using the three-way complex heterodyne circuit of Figure 9.
Fig. 18a. Tunable bandwidth BP filter ω0 = 0 (H(z)H(z))
Fig. 18b. Tunable bandwidth BP filter ω0 = π/8
Fig. 18c. Tunable bandwidth BP filter ω0 = π/4
Fig. 18d. Tunable bandwidth BP filter ω0 = 3π/8
Fig. 18. Tunable bandwidth linear-phase band-pass filter using three-way complex
heterodyne circuit
Fig. 19. Design criteria for prototype band-stop filter H(z) centered at π/2 with bandwidth of
π/2 (band edges at π/4 and 3π/4) required to implement a tunable bandwidth band-stop
filter using the three-way complex heterodyne circuit of Figure 9.
Fig. 20a. Tunable bandwidth BS filter ω0 = 0 (H(z)H(z))
Fig. 20b. Tunable bandwidth BS filter ω0 = π/16
Fig. 20c. Tunable bandwidth BS filter ω0 = π/8
Fig. 20d. Tunable bandwidth BS filter ω0 = 3π/16
Fig. 20. Tunable bandwidth linear-phase band-stop filter using three-way complex
heterodyne circuit
Figure 20 shows the tunable bandwidth band-stop filter. First, when ω0 = 0 we get the
frequency response shown in Figure 20a, which is the prototype filter convolved with itself
(H(z)H(z)). Thus we have over 80db attenuation in the stop band and the desired less than
3db ripple in the pass-band. The prototype filter is Band-Stop centered at π/2 with bandwidth
of π/2 (125 to 375 on the x-axis). Figure 20b shows the circuit with ω0 = π/16. This tunes the
lower band edge to π/2 - π/4 + π/16 = 5π/16 (156.25 on the x-axis of Figure 20b) and the
upper band edge to π/2 + π/4 - π/16 = 11π/16 (343.75 on the x-axis of Figure 20b). Figure 20c
shows the circuit with ω0 = π/8. This tunes the lower band edge to π/2 - π/4 + π/8 = 3π/8
(187.5 on the x-axis of Figure 20c) and the upper band edge to π/2 + π/4 - π/8 = 5π/8 (312.5
on the x-axis of Figure 20c). Figure 20d shows the circuit with ω0 = 3π/16. This tunes the
lower band edge to π/2 - π/4 + 3π/16 = 7π/16 (218.75 on the x-axis of Figure 20d) and the
upper band edge to π/2 + π/4 - 3π/16 = 9π/16 (281.25 on the x-axis of Figure 20d). Notice that
the attenuation of the tuned band-stop filters is over 40db, which is the same stop-band
attenuation as the prototype filter. All of these filters retain the linear-phase property of the
prototype filter that was designed using the Parks-McClellan algorithm.
Comparing the circuit of Figure 21 to the circuit of Figure 9, we see that the circuit of Figure
21 has one additional fixed filter block Hz(z) between the input and the first heterodyne
stage. This block allows for fixed signal processing that is not subject to the rotations of the
other two blocks. Otherwise, this is the same circuit as Figure 9. However, the addition of
this extra block gives us the flexibility to do many more signal processing operations.
We do not have sufficient room in this chapter to explore all the possibilities of the circuit of
Figure 21, so we shall limit ourselves to three: (1) tunable filters with at least some real poles
and zeros, (2) tunable filters with poles and zeros clustered together on the unit circle, and
(3) tunable filters realized with a Nyquist filter that allows the elimination of the last
heterodyne stage. This third option is so important that we will cover it as a separate topic
in section 5. The first two are covered here in sections 4.1 and 4.2 respectively.
The prototype filter for our first example is an 11th order Butterworth low-pass filter with
cut-off frequency at π/2, designed in MatLab with [b,a]=butter(11,0.5);
To design a tunable cut-off frequency low-pass filter using the circuit of Figure 21, we will
divide the poles and zeros of the prototype filter between the three transfer function boxes
such that Hz(z) contains all the real poles and zeros, HB(z) contains all the complex poles and
zeros with negative imaginary parts (those located in the bottom of the z-plane) and HT(z)
contains all the complex poles and zeros with positive imaginary parts (those located in the
top of the z-plane). The following MatLab m-file accomplishes this:
% BOTTOMTOP
% Extracts the bottom and top poles and zeros from a filter
% INPUT: [b,a] = filter coefficients
%        delta = maximum size of imaginary part to consider it zero
% OUTPUT:
%        [bz,az] = real poles and zeros
%        [bb,ab] = bottom poles and zeros
%        [bt,at] = top poles and zeros
clear rb rbz rbt rbb ra raz rat rab bz bt bb az at ab
rb=roots(b);
lb=length(b)-1;
rbz=1;
nbz=0;
nbt=0;
nbb=0;
for index=1:lb
    % find real zeros
    if abs(imag(rb(index)))<delta
        nbz=nbz+1;
        rbz(nbz,1)=real(rb(index));
    % find top zeros
    elseif imag(rb(index))>0
        nbt=nbt+1;
        rbt(nbt,1)=rb(index);
    % find bottom zeros
    else
        nbb=nbb+1;
        rbb(nbb,1)=rb(index);
    end
end
ra=roots(a);
la=length(a)-1;
raz=1;
naz=0;
nat=0;
nab=0;
for index=1:la
    % find real poles
    if abs(imag(ra(index)))<delta
        naz=naz+1;
        raz(naz,1)=real(ra(index));
    % find top poles
    elseif imag(ra(index))>0
        nat=nat+1;
        rat(nat,1)=ra(index);
    % find bottom poles
    else
        nab=nab+1;
        rab(nab,1)=ra(index);
    end
end
% rebuild polynomial coefficients from each group of roots
if nbz==0
    bz=1;
else
    bz=poly(rbz);
end
if nbt==0
    bt=1;
else
    bt=poly(rbt);
end
if nbb==0
    bb=1;
else
    bb=poly(rbb);
end
if naz==0
    az=1;
else
    az=poly(raz);
end
if nat==0
    at=1;
else
    at=poly(rat);
end
if nab==0
    ab=1;
else
    ab=poly(rab);
end
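The root-splitting step of BOTTOMTOP can be sketched compactly in Python/NumPy (function name ours). A useful sanity check is that convolving the three sub-polynomials recovers the original monic polynomial, i.e. the product of the three blocks reproduces H(z) up to the leading gain.

```python
import numpy as np

def split_roots(c, delta=1e-8):
    """Split the roots of polynomial c into real, bottom-half-plane and
    top-half-plane groups, as BOTTOMTOP does for poles and zeros."""
    r = np.roots(c)
    real = r[np.abs(r.imag) < delta].real     # |imag| < delta: treat as real
    top  = r[r.imag >= delta]                 # positive imaginary part -> HT(z)
    bot  = r[r.imag <= -delta]                # negative imaginary part -> HB(z)
    return real, bot, top

# example: one real zero and a complex-conjugate pair (ours, illustrative)
c = np.poly([0.5, 0.3 + 0.4j, 0.3 - 0.4j])
real, bot, top = split_roots(c)
recombined = np.convolve(np.convolve(np.poly(real), np.poly(bot)), np.poly(top))
```

Convolving the coefficient vectors multiplies the polynomials, so `recombined` should match the original coefficients `c` to numerical precision.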
Figure 22 shows the results of applying the above m-file to the prototype 11th order
Butterworth low-pass filter with cut-off frequency at π/2.
Fig. 22a. Pole-zero plot of H(z) (11th order Butterworth LP)
Fig. 22b. Pole-zero plot of Hz(z) (real poles and zeros)
Fig. 22c. Pole-zero plot of HB(z) (bottom poles and zeros)
Fig. 22d. Pole-zero plot of HT(z) (top poles and zeros)
Fig. 22. Illustration of the result of the MatLab m-file dividing the poles and zeros of the
prototype 11th order Butterworth low-pass filter designed with a cut-off frequency of π/2.
The resulting transfer functions Hz(z), HB(z) and HT(z) are then placed in the appropriate
boxes in the circuit of Figure 21.
To simulate the Bottom-Top Tunable Complex Heterodyne Filter of Figure 21, we make use
of the following MatLab m-file:
% CMPLXHET
% Implements the Complex Heterodyne Filter
% INPUTS:
% Set the following inputs before calling CMPLXHET:
%   inp     = 0 (provide input file inpf)
%           = 1 (impulse response)
%   npoints = number of points in input
%   w0      = heterodyne frequency
%   [bz az] = coefficients of filter Hz(z)
%   [bb ab] = coefficients of filter Hb(z)
%   [bt at] = coefficients of filter Ht(z)
%   [b a]   = coefficients of the full prototype filter (used for hdb)
%   scale   = 0 (do not scale the output)
%           = 1 (scale the output to zero db)
%
% OUTPUTS: ydb = frequency response of the filter
%          hdb, rdb, sdb, udb, vdb, wdb (intermediate outputs)
clear y ydb hdb r rdb s sdb u udb v vdb w wdb h f
if inp==1
    for index=1:npoints
        inpf(index)=0;
    end
    inpf(1)=1;
end
% fixed block Hz(z) (not rotated)
r=filter(bz,az,inpf);
% rotate up by w0
for index=1:npoints
    s(index)=r(index)*exp(1i*w0*(index-1));
end
% bottom poles and zeros Hb(z)
u=filter(bb,ab,s);
% rotate down by 2*w0
for index=1:npoints
    v(index)=u(index)*exp(-2*1i*w0*(index-1));
end
% top poles and zeros Ht(z)
w=filter(bt,at,v);
% rotate up by w0, returning the signal to its original location
for index=1:npoints
    y(index)=w(index)*exp(1i*w0*(index-1));
end
[h,f]=freqz(b,a,npoints,'whole');
hdb=20*log10(abs(h));
rdb=20*log10(abs(fft(r)));
sdb=20*log10(abs(fft(s)));
udb=20*log10(abs(fft(u)));
vdb=20*log10(abs(fft(v)));
wdb=20*log10(abs(fft(w)));
ydb=20*log10(abs(fft(y)));
if scale==1
    ydbmax=max(ydb);
    ydb=ydb-ydbmax;
end
plot(ydb,'k')
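The signal flow of Figure 21, as coded in CMPLXHET, can also be sketched in Python/NumPy (names ours; a minimal direct-form IIR loop stands in for MatLab's filter). With all three blocks set to the identity, the three rotations cancel and the circuit passes the input unchanged, which is a quick check of the rotation bookkeeping.

```python
import numpy as np

def iir_filter(b, a, x):
    """Minimal direct-form IIR filter, standing in for MatLab's filter(b,a,x)."""
    y = np.zeros(len(x), dtype=complex)
    for n in range(len(x)):
        acc = sum(b[k] * x[n - k] for k in range(len(b)) if n - k >= 0)
        acc -= sum(a[k] * y[n - k] for k in range(1, len(a)) if n - k >= 0)
        y[n] = acc / a[0]
    return y

def bottom_top_het(x, hz, hb, ht, w0):
    """Figure 21 signal flow: fixed HZ(z), rotate up by w0, HB(z),
    rotate down by 2*w0, HT(z), rotate up by w0."""
    n = np.arange(len(x))
    r = iir_filter(hz[0], hz[1], x)           # fixed block, not rotated
    s = r * np.exp(+1j * w0 * n)
    u = iir_filter(hb[0], hb[1], s)           # bottom poles and zeros
    v = u * np.exp(-2j * w0 * n)
    w = iir_filter(ht[0], ht[1], v)           # top poles and zeros
    return w * np.exp(+1j * w0 * n)

identity = ([1.0], [1.0])
x = np.exp(1j * 0.7 * np.arange(32))
y = bottom_top_het(x, identity, identity, identity, 0.9)  # rotations cancel
```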
Figure 23 shows the results of this simulation for the 11th order Butterworth low-pass
prototype filter with cut-off frequency of π/2 (250). Figure 23a shows the result for ω0 = 0.
This is the prototype filter. Unlike the Three-Way Tunable Complex Heterodyne
Technique of the previous section, we do not need to design for half the desired pass-
band ripple. We can design for exactly the desired properties of the tunable filter.
Figure 23b shows the result for ω0 = -π/8. This subtracts π/8 from the cut-off frequency of
π/2, moving the cut-off frequency to 3π/8 (187.5). Figure 23c shows the result for
ω0 = -π/4. This subtracts π/4 from the cut-off frequency of π/2, moving the cut-off
frequency to π/4 (125). Figure 23d shows the result for ω0 = -3π/8. This subtracts 3π/8
from the cut-off frequency of π/2, moving the cut-off frequency to π/8 (62.5). The
horizontal line on each of the plots indicates the 3db point for the filter. While there is
some peaking in the pass-band as the filter is tuned, it is well within the 3db tolerance of
the pass-band.
Fig. 23a. Tunable low-pass with ω0 = 0
Fig. 23b. Tunable low-pass with ω0 = -π/8
Fig. 23c. Tunable low-pass with ω0 = -π/4
Fig. 23d. Tunable low-pass with ω0 = -3π/8
Fig. 23. Implementation of a tunable cut-off frequency low-pass filter using the bottom-top
technique of Figure 21.
4.2 Tunable filters with poles and zeros clustered together on the unit circle
One of the most powerful applications of the Bottom-Top Tunable Complex Heterodyne
Technique is its ability to implement the very important tunable center-frequency band-stop
filter. Such filters, when made adaptive using the techniques of section 6 of this chapter, are
very important in the design of adaptive narrow-band noise attenuation circuits. The
Bottom-Top structure of Figure 21 is particularly well suited to the implementation of such
filters using any of the designs that result in a cluster of poles and zeros on the unit circle.
This is best accomplished by the design of narrow-band notch filters centered at π/2. All of
the IIR design techniques work well for this case including Butterworth, Chebyshev, Inverse
Chebyshev and Elliptical Filters.
As an example, we design a Butterworth 5th order band-stop filter and tune it from DC to
the Nyquist frequency. In MatLab we use [b,a]=butter(5,[0.455 0.545],'stop'); to obtain the
coefficients for the prototype filter. We then use the m-file BOTTOMTOP as before to split
the poles and zeros into the proper places in the circuit of Figure 21. Finally, we run the
MatLab m-file CMPLXHET to obtain the results shown in Figures 24 and 25.
Fig. 24a. Pole-zero plot of prototype band-stop filter
Fig. 24b. Pole-zero plot of Hz(z) (real poles and zeros)
Fig. 24c. Pole-zero plot of HB(z) (bottom poles & zeros)
Fig. 24d. Pole-zero plot of HT(z) (top poles & zeros)
Fig. 24. Distribution of poles and zeros for the 5th order Butterworth band-stop filter centered
at π/2. Notice how the poles and zeros are clustered on the unit circle. This is the ideal case
for use of the bottom-top tunable complex heterodyne filter circuit of Figure 21.
Figure 24 shows the poles and zeros clustered in the z-plane on the unit circle. Figure 24a
shows the poles and zeros of the prototype 5th order Butterworth band-stop filter centered at
π/2 designed by [b,a]=butter(5,[0.455 0.545],'stop');. Figure 24b shows the poles and zeros
assigned to Hz(z) by the MatLab m-file BOTTOMTOP. Similarly, Figures 24c and 24d show the
poles and zeros assigned by the MatLab m-file BOTTOMTOP to HB(z) and HT(z) respectively.
Figure 25 shows the MatLab simulation of the Bottom-Top Tunable Complex Heterodyne
Filter as implemented by the circuit of Figure 21 in the MatLab m-file CMPLXHET. The
band-stop center frequency is fully tunable from DC to the Nyquist frequency. The tuned
band-stop filter is identical to the prototype band-stop filter. Furthermore, this works for
any band-stop design with clustered poles and zeros such as Chebyshev, Inverse Chebyshev
and Elliptical designs. In section 6 we shall see how to make these filters adaptive so that
they can automatically zero in on narrow-band interference and attenuate that interference
very effectively. Figure 25a is for ω0 = 0, Figure 25b is for ω0 = -7π/16, Figure 25c is for ω0 =
-3π/16 and Figure 25d is for ω0 = 5π/16. Note the full tunability from DC to Nyquist.
Fig. 25a. Band Stop Tuned to π/2 (ω0 = 0)
Fig. 25b. Band Stop Tuned to π/16 (ω0 = -7π/16)
Fig. 25c. Band Stop Tuned to 5π/16 (ω0 = -3π/16)
Fig. 25d. Band Stop Tuned to 13π/16 (ω0 = 5π/16)
Fig. 25. Butterworth tunable band-stop filter implemented using bottom-top tunable
complex heterodyne technique. Note that the band-stop filter is fully tunable from DC to the
Nyquist frequency.
Fig. 26. Nyquist tunable complex heterodyne filter circuit (Soderstrand’s technique)
In the circuit of Figure 26 the signal is first passed through HNQ(z), a complex-coefficient
digital filter that removes all frequencies from the bottom half of the unit circle in the z-plane.
Thus this filter removes the negative frequencies or equivalently the frequencies above the
Nyquist frequency. Such a filter is easily designed in MatLab by designing a low-pass filter
with cut-off frequency of π/2 and then rotating it in the z-plane so as to pass positive
frequencies (frequencies above the real axis in the z-plane) and to attenuate negative
frequencies (frequencies below the real axis in the z-plane).
In the following examples we shall assume that we need a Nyquist Filter with 60db
attenuation of the negative frequencies and no more than 1db ripple in the pass-band
(positive frequencies). We shall choose a 64-tap filter, although excellent results can be
obtained with many fewer taps.
The first step in the design of the Nyquist Filter is to use FIRPM to design a low-pass filter
with cut-off frequency at π/2 with the desired specifications (60db stop-band attenuation,
1db ripple in the pass-band, and 3db attenuation at DC):
% NQFILTER
% This script rotates a low-pass filter with a
% cut-off frequency at 0.5 (pi/2) by phi radians
% in the z-plane to create a complex-coefficient
% digital filter that removes the frequencies in
% the lower half of the z-plane (phi = pi/2).
% INPUTS: [blp,alp] = lowpass filter with 0.5 cut-off
%         phi = rotation angle in radians (suggest pi/2)
% OUTPUTS:
%         [bnq,anq] = the complex coefficients of
%         the Nyquist filter
clear nb na bnq anq
nb = length(blp);
na = length(alp);
% rotate the zeros (numerator coefficients)
if nb > 1
    for index=1:nb
        bnq(index)=blp(index)*exp(1i*(index-1)*(phi));
    end
else
    bnq = 1;
end
% rotate the poles (denominator coefficients)
if na > 1
    for index=1:na
        anq(index)=alp(index)*exp(1i*(index-1)*(phi));
    end
else
    anq = 1;
end
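The coefficient rotation performed by NQFILTER has a one-line frequency-domain interpretation: multiplying b[k] by e^(jkφ) shifts the frequency response so that H(ω) becomes H(ω - φ). A Python/NumPy sketch (function names and the toy coefficients are ours):

```python
import numpy as np

def rotate_coeffs(b, phi):
    """Rotate an FIR filter by phi radians in the z-plane, as NQFILTER does:
    b[k] -> b[k] * exp(j*k*phi), so that H(w) becomes H(w - phi)."""
    k = np.arange(len(b))
    return b * np.exp(1j * k * phi)

def freq_resp(b, w):
    """Evaluate H(w) = sum_k b[k] * exp(-j*w*k) at a single frequency."""
    k = np.arange(len(b))
    return np.sum(b * np.exp(-1j * w * k))

blp = np.ones(8) / 8                       # toy low-pass coefficients
bnq = rotate_coeffs(blp, np.pi / 2)        # rotate by 90 degrees as in Figure 27
```

After the rotation, the response of the new filter at any ω equals the response of the original at ω - π/2, which is exactly how the low-pass pass-band ends up covering only the positive frequencies.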
Fig. 27a. Pole-zero plot of low-pass filter
Fig. 27b. Pole-zero plot of Nyquist filter
Fig. 27c. Frequency response plot of low-pass filter
Fig. 27d. Frequency response plot of Nyquist filter
Fig. 27. Design of a Nyquist filter by rotating a 64-tap Parks-McClellan linear-phase filter
with cut-off frequency at π/2 by 90 degrees in the z-plane to obtain a filter that attenuates all
negative frequencies.
5.2 Novel technique for tuning a complex heterodyne prototype filter (Soderstrand’s
technique)
Once the negative frequencies have been removed from the input signal, the problem of
tuning the filter becomes much simpler. Any prototype filter H(z) may be rotated from its
center at DC to its center at ω0 through the standard rotation technique of Figure 7. This is
exactly what is done in the Nyquist Tunable Complex Heterodyne Circuit of Figure 26. The
potential problem, however, is that we obtain a filter with all the poles and zeros in the
upper half of the z-plane and none in the lower half. Hence the output, y(n), will consist of
complex numbers. The novelty of Soderstrand’s Technique is that the mirror image poles
and zeros needed in the bottom half of the z-plane can be easily created by simply taking the
real part of the output y(n). Since we only need the real part of the output, this also
simplifies the hardware because we can use the simplified circuit of Figure 6d in the last
stage of the circuit of Figure 26. The simulation of the circuit of Figure 26 is accomplished in
MatLab with the m-file NQHET:
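The NQHET listing falls outside this excerpt. The Python/NumPy sketch below (names and the crude two-tap positive-frequency filter are ours, purely illustrative) captures its structure: suppress negative frequencies with HNQ(z), rotate the prototype to ω0 with the Figure 7 circuit, and take the real part of the output to restore the mirror-image poles and zeros.

```python
import numpy as np

def nqhet(x, hnq, h, w0):
    """Sketch of Figure 26 (Soderstrand's technique)."""
    n = np.arange(len(x))
    p = np.convolve(x, hnq)[:len(x)]      # HNQ(z): keep positive frequencies only
    u = p * np.exp(-1j * w0 * n)          # heterodyne down by w0
    v = np.convolve(u, h)[:len(x)]        # prototype H(z) at baseband
    y = v * np.exp(+1j * w0 * n)          # heterodyne back up to w0
    return y.real                         # real part restores conjugate poles/zeros

# toy positive-frequency filter: nulls exp(-j*pi/2*n), passes exp(+j*pi/2*n)
hnq = np.array([0.5, 0.5j])
h = np.ones(4) / 4                        # toy prototype (ours)
x = np.cos(0.8 * np.arange(64))
y = nqhet(x, h=h, hnq=hnq, w0=np.pi / 4)
```

Taking the real part halves nothing in the magnitude response that matters for the positive frequencies; it simply adds the conjugate image in the lower half of the z-plane, so the final output is a real signal.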
5.3 Example design of Nyquist tunable complex heterodyne filter using circuit of
Figure 26 (Soderstrand’s technique)
The Nyquist Technique is very general and can rotate any type of filter, IIR or FIR, low-pass,
high-pass, band-pass or band-stop. However, one of the most important uses of the
Nyquist Complex Heterodyne Filter is the design of a tunable band-stop filter that can be
used to attenuate narrow-band interference in spread-spectrum receivers. To design
tunable band-stop filters, the prototype filter, H(z), must be a high-pass filter with cut-off
frequency set to one-half the bandwidth of the desired tunable band-stop filter.
The high-pass prototype filter should have the same stop-band attenuation and pass-band
ripple as the desired tunable band-stop filter, as all characteristics of the prototype filter
are maintained in the tunable filter. The prototype high-pass filter is again designed with
FIRPM.
Fig. 28a. Nyquist tunable band-stop ω0 = 0
Fig. 28b. Nyquist tunable band-stop ω0 = π/4
Fig. 28c. Nyquist tunable band-stop ω0 = π/2
Fig. 28d. Nyquist tunable band-stop ω0 = 3π/4
Fig. 28. Example of Nyquist tunable complex heterodyne filter (Soderstrand’s technique) for
a tunable band-stop filter with 40db attenuation in the stop band, less than 0.1db ripple in
the pass-band and bandwidth π/10.
Figure 28 shows the MatLab simulation of the Nyquist Tunable Complex Heterodyne Filter
of the circuit of Figure 26. This is fully tunable from DC to the Nyquist frequency. For
example, to get the plot of Figure 28d, we use:
inp=1;npoints=1000;w0=3*pi/4;nqhet;
Notice the dip at DC and the Nyquist frequency. This is due to the Nyquist filter not
having 3db attenuation at DC. We deliberately allowed the attenuation to be 5db to
demonstrate the effect of not having 3db attenuation at DC in the Nyquist filter. However,
the effect is negligible, causing a ripple of less than 1db. If the attenuation were greater,
the dip would be greater. If the attenuation were less than 3db, we would see an upward
bulge at DC and the Nyquist frequency in Figure 28. The tunable filter has the same stop-band
attenuation and pass-band ripple as the prototype filter except for this added ripple due to
the Nyquist filter. The bandwidth of the stop-band is twice the bandwidth of the prototype
filter.
Fig. 29. Narrow-band interference detection circuit to turn tunable complex heterodyne
filters into adaptive complex heterodyne filters.
In Figure 29 the input signal is applied both to the Narrow-Band Interference Detection
Circuit and to the Tunable Complex Heterodyne Filter. The Attenuation Frequency
Detection Circuit (shown in the inset) is a simple second-order FIR LMS adaptive
notch filter. Because the detection circuit is FIR, it will identify the interference without bias.
Furthermore, this simple second-order FIR filter is known to be robust and to converge
quickly on the correct frequency. However, the simple second-order FIR filter does not
provide an adequate attenuator for the narrow-band interference because it attenuates too
much of the desired signal. Therefore we only use the detection circuit to determine the
value of ω0 needed to generate the complex heterodyne tuning signal. This value of ω0
is fed to a numerically controlled complex oscillator that produces the complex heterodyne
signal e^(-jω0n).
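The detection circuit described above can be sketched as a one-coefficient LMS loop in Python/NumPy (names and step size ours): the second-order FIR notch e(n) = x(n) + a·x(n-1) + x(n-2) has a null at ω0 when a = -2cos(ω0), so adapting a by LMS yields an unbiased estimate of the interference frequency.

```python
import numpy as np

def detect_frequency(x, mu=0.01):
    """Second-order FIR LMS adaptive notch (detection circuit of Figure 29):
    e(n) = x(n) + a*x(n-1) + x(n-2); for a sinusoid at w0, a converges to
    -2*cos(w0), giving an unbiased estimate of the interference frequency."""
    a = 0.0
    for n in range(2, len(x)):
        e = x[n] + a * x[n - 1] + x[n - 2]   # notch output (error)
        a -= mu * e * x[n - 1]               # LMS update of the one coefficient
    return np.arccos(np.clip(-a / 2.0, -1.0, 1.0))

w0_true = 1.2
x = np.cos(w0_true * np.arange(5000))        # noise-free narrow-band interferer
w0_est = detect_frequency(x)
```

The recovered ω0 would then drive the numerically controlled oscillator that generates the heterodyne signal for the tunable filter; the notch itself is discarded, exactly as described above.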
Fig. 30. Test setup for simulation in MatLab of a comparison of adaptive complex
heterodyne filters to adaptive Gray-Markel lattice filters.
Figure 31 shows plots of the energy leakage during a transition of the narrow-band
interference from one frequency to another. Both the Gray-Markel and the Complex
Heterodyne Adaptive Filters track the interference very well. However, in large
transitions such as those shown in Figure 31a (transition from frequency π/24 to 11π/24)
and Figure 31b (transition from π/12 to 5π/12) the Adaptive Complex Heterodyne Filter
provides more attenuation of the narrow-band interference than the Gray-Markel
Adaptive Filter. In the case of Figure 31a the difference is about 20db, and in the case of
Figure 31b it is only about 10db. However, these represent significant differences in the
ability to attenuate a fast-moving signal. On the smaller transitions of Figure 31c
(transition from π/8 to 3π/8) and Figure 31d (transition from π/4 to π/8) there is little
difference between the two filters (although the Gray-Markel adaptive filter is a bit
smoother in the transition). The point of this simulation is to show that Adaptive
Heterodyne Filters offer an excellent alternative to currently used adaptive filters such as
the Gray-Markel adaptive filter.
Fig. 31a. Transition from π/24 to 11π/24
Fig. 31b. Transition from π/12 to 5π/12
Fig. 31c. Transition from π/8 to 3π/8
Fig. 31d. Transition from π/4 to π/8
Fig. 31. Comparison of Gray-Markel and complex heterodyne adaptive filters while tracking
a moving signal.
8. References
Azam, Azad, Dhinesh Sasidaran, Karl Nelson, Gary Ford, Louis Johnson and Michael
Soderstrand (2000), Single-Chip Tunable Heterodyne Notch Filters Implemented in
FPGA's, IEEE International Midwest Symposium on Circuits and Systems, Lansing, MI,
August 8-11, 2000.
Butler, Lloyd, (1989) Introduction to the Superheterodyne Receiver, Amateur Radio, March 1989,
available from http://users.tpg.com.au/users/ldbutler/Superhet.htm, accessed
12/11/2010.
Cho, Grace Yoona, Louis G. Johnson & Michael A. Soderstrand (2005), New Complex-
Arithmetic Heterodyne Filter, IEEE International Symposium on Circuits and Systems,
vol. 3, pp. 593-596, May 2005.
Coulson, A.J. (2004) Narrowband Interference in Pilot Symbol Assisted OFDM Systems,
IEEE Transactions on Wireless Communications, vol. 3, no. 6, pp. 2277-2287,
November 2004.
Dorf, Richard C. & Zhen, Wan (2000), The z-Transform, In: R.C. Dorf, The Electrical
Engineering Handbook, CRC Press, available at
http://www.4shared.com/document/VA837U3Q/ebook_-_Electrical_Engineering.html,
Chapter 8, section 8.2, ISBN 978-0-84-93018-8.
Duman, Tolga M., (2005) Analog and Digital Communications, In: R.C. Dorf, The Engineering
Handbook, 2nd Ed, CRC Press, chapter 135, ISBN 0-8493-1586-7.
Etkin, R, Parekh, A & Tse, D, (2005) Spectrum Sharing for Unlicensed Bands, Proceedings of
the 43rd Allerton Conference on Communications, Control and Computing, ISBN 978-1-
60-423491-6, Monticello, Illinois, September 2005.
Gray, A.H. Jr. & John D. Markel (1973), Digital Lattice and Ladder Filter Synthesis, IEEE
Transactions on Audio and Electroacoustics, vol. AU-21, no. 6, pp. 491-500, Dec 1973.
McCune, Earl (2000), DSSS vs. FHSS Narrowband Interference Performance Issues, Mobile
Development and Design, September 1, 2000, available from
http://mobiledevdesign.com/hardware_news/radio_dsss_vs_fhss/
Nelson, K.E., P-V.N. Dao, M.A. Soderstrand, S.A. White & J.P. Woodard (1997), A Modified
Fixed-Point Computational Gradient Descent Gray-Markel Notch Filter Method for