speech processing pbl
Under Guidance of
Dr. Satish Kumar Singh
Head of Department
Prof. Deepak Nagaria
Submitted By
Aashi Diwakar (2004331001)
Abhishek Patel (2004331001)
Aditya Kumar (1904331006)
CERTIFICATE
This is to certify that this Project Based Learning Report on “Pitch Period Estimation
using Autocorrelation Function” has been successfully delivered by “Abhishek Patel,
Aditya Kumar and Aashi Diwakar (B.Tech. Final Year)” under the guidance of Dr.
Satish Kumar Singh for fulfilment of the Bachelor of Technology degree from Bundelkhand
Institute of Engineering and Technology, Jhansi, during the academic year 2023-2024.
ACKNOWLEDGEMENT
This project would not have been possible without the kind support and help of many
individuals and organizations. We are indebted to our respected Head of Department, Dr.
Deepak Nagaria Sir, for guiding us. We are also grateful to our project guide, Dr. Satish
Kumar Singh Sir, for his invaluable contribution and guidance, without which this
project would have been impossible to complete. We extend our sincerest gratitude to all the
teachers, seniors and colleagues whose help and guidance brought this project to a
successful completion.
CONTENTS
1. Abstract
2. Introduction
3. Methods
4. Code
5. Results
6. Conclusion
7. References
1. ABSTRACT
Accurate estimation of the pitch period is an important part of speech processing, particularly in
speech recognition and speech synthesis. The traditional direct peak estimation method and the
autocorrelation function method are both effective time domain estimation algorithms, and the
autocorrelation method is well suited to low-SNR conditions. Both algorithms depend on an accurate
estimate of the peak position. In this paper, a multi-line cut method for judging the position of
the peak point is proposed. The method intercepts the sampled data of the waveform with multiple
cut lines, calculates the median value from the starting and ending points of each cut line, and
evaluates the peak position indirectly from these values. Because this reduces the impact of
interference on the peak estimate, the search is less likely to fall into local extreme points,
and the peak point estimate is more accurate than one obtained by a direct search for peak points.
The simulation results show that the multi-line cut method greatly improves peak estimation
compared with the traditional direct peak estimation method, and that it also gives a certain
performance improvement when used to estimate the peak in the autocorrelation method. In addition,
performance is directly related to the number of cut lines: the more cut lines are used, the better
the performance. The method has low complexity and is easy to implement.
2. INTRODUCTION
In speech signal processing, the estimation of the pitch period is a very important step.
Pitch detection is widely used in speech analysis, speech synthesis, speech compression coding,
speech recognition and speech segmentation. Over the years, researchers have proposed various
pitch detection algorithms, such as the Autocorrelation Function (ACF) method, the Average
Magnitude Difference Function (AMDF), the wavelet transform method and the cepstrum method. In
general, pitch period extraction methods are either time domain estimation methods or transform
domain estimation methods. Time domain estimation methods estimate the pitch period directly
from the waveform of the speech signal. They were applied very early and remain widely used
because of their simple implementation and low computational complexity. The direct peak
estimation method is one of the time domain estimation methods and is still widely used at present.
The autocorrelation function method is also a time domain estimation method, and it is suitable
for pitch period extraction at low SNR. The autocorrelation function method needs
to estimate the peak position when performing the pitch extraction. When the peak position is
inaccurate because the search settles on a local extreme point, the performance is affected. This
paper describes a peak point position estimation method that makes the judgment of the peak point
position more accurate and gives a relatively accurate estimate of the pitch period of the speech
signal. The traditional direct peak estimation method and the short-term autocorrelation function
method for estimating the pitch period are described first. Then, the multi-line cut
method proposed in this paper is introduced, and the four methods are verified and evaluated.
Two time domain pitch period estimation methods
Among the time domain pitch period estimation methods, the traditional direct peak estimation
method and the autocorrelation function method are both effective algorithms. Of the two, the
autocorrelation method is the one suited to low SNR. Both algorithms require an accurate estimate
of the peak position. The following is a brief introduction to the two algorithms.
Traditional peak direct estimation
The peak direct estimation method finds two adjacent peak points of the periodic signal
and calculates the time interval T between them, which is the period of the signal.
Because of noise and interference, this method may estimate the peak points inaccurately,
which in turn makes the period estimate inaccurate. Nevertheless, the method is simple
and intuitive, has low complexity, and still has many application scenarios.
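As an illustration of direct peak estimation, the following MATLAB sketch measures the spacing
between two adjacent peaks of a synthetic, noise-free tone. The sampling frequency, the tone
frequency and the 0.5 peak-height threshold are assumptions chosen for this example and are not
taken from the project data. findpeaks requires the Signal Processing Toolbox.

```matlab
% Sketch of direct peak estimation on a synthetic, noise-free tone.
% fs, f0 and the 0.5 height threshold are illustrative assumptions.
fs = 8000;                      % sampling frequency in Hz (assumed)
f0 = 100;                       % true fundamental frequency in Hz (assumed)
t  = (0:799) / fs;              % 100 ms of samples
x  = sin(2*pi*f0*t);            % periodic test signal

% Locate waveform peaks (findpeaks is in the Signal Processing Toolbox).
[~, locs] = findpeaks(x, 'MinPeakHeight', 0.5);

% The spacing between two adjacent peaks is the period in samples.
period_samples = locs(2) - locs(1);
T_est  = period_samples / fs;   % period in seconds
f0_est = 1 / T_est;             % estimated fundamental frequency in Hz
```

On noisy speech the spacing between adjacent peaks varies from cycle to cycle, which is the
weakness of the direct method described above.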
Short-term autocorrelation function method
The autocorrelation function method is a time domain estimation algorithm. Compared with other
time domain algorithms, it is more robust to noise and interference. The extracted pitch contour
features are distinct, the accuracy is good, and the implementation is simple, so it is a widely
used algorithm in the field of speech signal processing. The principle of the algorithm is that the
autocorrelation function of the speech signal peaks at integer multiples of the pitch period, so
the pitch period can be estimated from the lag of these peaks. The autocorrelation is calculated
for each frame using the formula of the short-time autocorrelation function.
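The formula itself is not reproduced in this report. A commonly used textbook definition of the
short-time autocorrelation of a windowed frame, given here as an assumption about the intended
form, is:

```latex
R_n(k) = \sum_{m=0}^{N-1-k} x_n(m)\, x_n(m+k), \qquad 0 \le k \le N-1
```

Here x_n(m) denotes the m-th sample of the n-th windowed frame, N is the frame length and k is the
lag in samples. These symbols are introduced only for this illustration. For a voiced frame with a
pitch period of P samples, R_n(k) has its largest value at k = 0 and further peaks near k = P, 2P
and so on.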
3. Methods
Pitch period estimation is a crucial step in many audio and speech processing applications, such
as speech recognition and pitch-based musical analysis. The autocorrelation function is commonly
used for pitch period estimation. Here's a basic method using autocorrelation:
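1. Preprocessing:
Begin by pre-processing the audio signal. Typically, this involves windowing the signal to reduce
spectral leakage. A commonly used window function is the Hamming window.
2. Peak Picking:
Identify peaks in the autocorrelation function. Peaks correspond to potential pitch periods. One
way to do this is to find local maxima in the autocorrelation function. The lag of the selected
peak, in samples, is converted to a pitch period in seconds by dividing by the sampling frequency:
pitch period = lag/Fs.
Basic Steps:
Preprocess the signal by applying a window function (e.g., Hamming) to mitigate spectral leakage
and normalize amplitude.
Compute the autocorrelation function using the xcorr function.
Identify peaks in the autocorrelation function using the findpeaks function.
Estimate the pitch period based on the lag corresponding to the highest peak (excluding the
zero-lag peak).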
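The basic steps can be sketched in MATLAB as follows. This is a minimal illustration on a
synthetic frame rather than the project's code: the sampling frequency, the 200 Hz test pitch, the
frame length and all variable names are assumptions, and hamming, xcorr and findpeaks require the
Signal Processing Toolbox.

```matlab
% Sketch of the basic steps on a synthetic frame (all values are assumptions).
fs = 8000;                          % sampling frequency in Hz (assumed)
f0 = 200;                           % true pitch in Hz (assumed)
N  = 400;                           % frame length: 50 ms at 8 kHz
n  = (0:N-1)';
frame = sin(2*pi*f0*n/fs) + 0.5*sin(2*pi*2*f0*n/fs);  % voiced-like frame

% Step 1: window the frame and normalize its amplitude.
xw = frame .* hamming(N);
xw = xw / max(abs(xw));

% Step 2: autocorrelation; keep only the non-negative lags 0..N-1.
[rxx, lags] = xcorr(xw);
rxx  = rxx(lags >= 0);
lags = lags(lags >= 0);

% Step 3: local maxima of the autocorrelation. Lag 0 is an endpoint,
% so findpeaks does not report it.
[pks, idx] = findpeaks(rxx);

% Step 4: the lag of the highest remaining peak is the period in samples.
[~, best] = max(pks);
period_samples = lags(idx(best));
pitch_period   = period_samples / fs;   % seconds
pitch_hz       = 1 / pitch_period;      % Hz
```

Because the window tapers the frame, the autocorrelation peaks shrink as the lag grows, so the
highest peak after lag 0 is normally the one at the pitch period rather than at one of its
multiples.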
4. Matlab Code
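close all;
clc;

% Load the speech signal and its sampling frequency.
[x1, fs] = audioread('test1.wav');
T = 1/fs;                                  % sampling interval in seconds

% Plot the waveform against time in seconds.
plot((0:size(x1,1)-1)*T, x1)
xlabel('Time (s)');
ylabel('Amplitude')
title('Synthesis Signal');

% Analyse a short frame of 101 samples (first channel).
x = x1(700:800, 1);
[rxx, lags] = xcorr(x, x);
figure
plot(lags, rxx)
xlabel('lag');
ylabel('Correlation Measurement');
title('Auto-correlation Function')

% xcorr returns 2*length(x)-1 values, so zero lag is at index length(x).
first_peak_loc = length(x);
min_period_in_samples = 30;
half_min = min_period_in_samples/2;

% Suppress the zero-lag peak so that max() finds the next highest peak.
seq = rxx;
seq(first_peak_loc-half_min:first_peak_loc+half_min) = min(seq);
figure
plot(rxx, 'rx');
hold on
plot(seq)

[max_val, second_peak_loc] = max(seq);
period_in_samples = abs(second_peak_loc - first_peak_loc)
period_in_seconds = period_in_samples*T
Fundamental_frequency = 1/period_in_seconds
sound(x1, fs)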
5. Results
MATLAB 7.0 was used for our calculations. We chose MATLAB as our programming environment
because it offers many advantages. It contains a variety of signal processing and statistical
tools that help users generate and plot a variety of signals, and it excels at numerical
computations, especially when dealing with vectors or matrices of data. Figure 2 shows the
autocorrelation function obtained by the pitch period estimation algorithm for one of the speech
signals used in this study.
Figure 2: Autocorrelation vs. lag in the discrete domain.
6. Conclusion
In conclusion, pitch period estimation using the autocorrelation function in MATLAB is a practical
and accessible method for analyzing periodicities in audio signals, and MATLAB provides a
convenient environment for such signal processing tasks. The basic steps are as follows:
preprocess the signal by applying a window function (e.g., Hamming) to mitigate spectral leakage
and normalize amplitude; compute the autocorrelation function using the xcorr function; identify
peaks in the autocorrelation function using the findpeaks function; and estimate the pitch period
from the lag corresponding to the highest peak.
7. References
• https://www.researchgate.net/publication/259823741_VoicedUnvoiced_Decision_for_Speech_Signals_Based_on_Zero-Crossing_Rate_and_Energy
• https://www.mathworks.com/help/signal/ref/zerocrossrate.html
• "Fundamentals of Speech Recognition" by Lawrence Rabiner and Biing-Hwang Juang.
• https://www.youtube.com/watch?v=q9nki9ksHHs&t=214s