UNIT-I
The Role of Statistics in Engineering: The Engineering Method and Statistical Thinking -
Collecting Engineering Data - Basic Principles - Retrospective Study - Observational Study
- Designed Experiments - Observing Processes Over Time - Mechanistic and Empirical
Models
Data Description and Representation: Collection of data - Classification and Tabulation of
data - Stem-and-Leaf Diagrams - Frequency Distributions and Histograms - Box Plots -
Time Sequence Plots - Probability Plots.
UNIT-II
Descriptive Statistics: Measures of Central Tendency - Measures of Dispersion - Skewness
and Kurtosis. Correlation and Regression: Scatter Diagram – Types of Correlation – Karl
Pearson’s Coefficient of Correlation and Spearman’s Rank Correlation - Method of Least
Squares – Linear Regression.
UNIT-III
Sampling: Different types of sampling - Sampling Distributions - Sampling
Distribution of Mean.
Point Estimation of Parameters: General Concepts of Point Estimation -
Unbiased Estimators - Variance of a Point Estimator - Standard Error - Methods of
Point Estimation (Method of Moments - Method of Maximum Likelihood).
Statistical Intervals for a Single Sample: Confidence Interval on the Mean of a
Normal Distribution with Variance Known - Confidence Interval on the Mean of a
Normal Distribution with Variance Unknown - Confidence Interval on the Variance
and Standard Deviation of a Normal Distribution - A Large-Sample Confidence
Interval for a Population Proportion
UNIT-IV
Tests of Hypotheses for a Single Sample: Tests of Statistical Hypotheses - General
Procedure for Hypothesis Testing – Tests on the Mean of a Normal Distribution with
Variance Known - Tests on the Mean of a Normal Distribution with Variance Unknown
- Tests on the Variance and Standard Deviation of a Normal Distribution.
Statistical Inference for Two Samples: Inference For a Difference in Means of Two
Normal Distributions with Variances Known - Inference For a Difference in Means of
Two Normal Distributions with Variances Unknown - Inference on the Variances of
Two Normal Distributions – Inference on Two Population Proportions.
UNIT-V
The Analysis of Variance: Concept - Assumptions - One-way classification and two-way
classification.
Designing Engineering Experiments – Concept of Randomization, Replication and
Local Control - Completely Randomized Design - Randomized Block Design – Latin
Square Design.
Text Books
1. Douglas C. Montgomery and George C. Runger. Applied Statistics and
Probability for Engineers, (3rd Edn), John Wiley and Sons, Inc., New York,
2003.
2. Robert H. Carver and Jane Gradwohl Nash. Doing Data Analysis with SPSS
Version 18.0, (Indian Edition), Cengage Learning, New Delhi, 2012
3. Richard A. Johnson and C.B.Gupta, Probability and Statistics for Engineers, (7th
Edn.), Pearson Education, Indian Impression 2006.
Reference:
1. Mohammed A.Shayib. Applied Statistics, First Edition. eBook, Bookboon.com
2013.
2. Peter R. Nelson, Marie Coffin and Karen A. F. Copeland. Introductory Statistics for
Engineering Experimentation, Elsevier Science and Technology Books, New
York, 2003.
3. Sheldon M. Ross, Introduction to Probability and Statistics, (3rd Edn), Elsevier
Science and Technology Books, New York, 2004.
4. T.T.Soong, Fundamentals of Probability and Statistics for Engineers, John
Wiley and Sons, Ltd., New York, 2004.
5. J.P.Marques de Sá , Applied Statistics using SPSS, STATISTICA, MATLAB and
R, (2nd Edn.), Springer Verlag, Heidelberg, 2007.
The Role of Statistics in Engineering
THE ENGINEERING METHOD AND STATISTICAL THINKING
Identify, at least tentatively, the important factors that affect this problem
or that may play a role in its solution.
Propose a model for the problem, using scientific or engineering
knowledge of the phenomenon being studied. State any limitations or
assumptions of the model.
• SAMPLE
• ENUMERATION STUDY
• ANALYTIC STUDY
Collecting Engineering Data
Basic Principles
• An observational study
• A designed experiment
Retrospective Study
• A retrospective study may involve a lot of data, but that data may contain relatively little
useful information about the problem. Furthermore, some of the relevant data may be
missing, there may be transcription or recording errors resulting in outliers (or unusual
values), or data on other important factors may not have been collected and archived.
• For example, the specific concentrations of butyl alcohol and acetone in the input feed stream
are a very important factor, but they are not archived because the concentrations are too hard
to obtain on a routine basis.
• As a result of these types of issues, statistical analysis of historical data sometimes identifies
interesting phenomena, but solid and reliable explanations of these phenomena are often
difficult to obtain.
Observational Study
• In an observational study, the engineer observes the process or population, disturbing it as
little as possible, and records the quantities of interest. Because these studies are usually
conducted for a relatively short time period, sometimes variables that are not routinely
measured can be included.
• In the distillation column, the engineer would design a form to record the two temperatures
and the reflux rate when acetone concentration measurements are made. It may even be
possible to measure the input feed stream concentrations so that the impact of this factor
could be studied. Generally, an observational study tends to avoid some of the problems of a
retrospective study and goes a long way toward obtaining accurate and reliable data.
Designed Experiments
• In a designed experiment the engineer makes deliberate or purposeful
changes in the controllable variables of the system or process, observes
the resulting system output data, and then makes an inference or
decision about which variables are responsible for the observed changes
in output performance.
Observing Processes Over Time
• Often data are collected over time. In this case, it is usually very helpful to plot
the data versus time in a time series plot. Phenomena that might affect the
system or process often become more visible in a time-oriented plot and the
concept of stability can be better judged.
MECHANISTIC AND EMPIRICAL MODELS
Models play an important role in the analysis of nearly all engineering
problems. Much of the formal education of engineers involves learning
about the models relevant to specific fields and the techniques for
applying these models in problem formulation and solution. As a simple
example, suppose we are measuring the flow of current in a thin copper
wire. Our model for this phenomenon might be Ohm’s law:
Current = voltage/resistance
We call this type of model a mechanistic model because it is built from our underlying
knowledge of the basic physical mechanism that relates these variables.
EMPIRICAL MODEL
• It uses our engineering and scientific knowledge of the
phenomenon, but it is not directly developed from our theoretical or
first-principles understanding of the underlying mechanism.
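The contrast between the two kinds of model can be sketched in Python. This is a minimal illustration and not part of the original notes: the resistance value, the measured (voltage, current) pairs and the function names are all hypothetical. The mechanistic model evaluates Ohm's law directly, while the empirical model estimates the slope of a line through the origin from the data by least squares, without assuming a resistance value.

```python
def mechanistic_current(voltage, resistance):
    """Mechanistic model (Ohm's law): current = voltage / resistance."""
    return voltage / resistance


def fit_slope_through_origin(x, y):
    """Least-squares slope b for the empirical model y = b * x."""
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sxx = sum(xi * xi for xi in x)
    return sxy / sxx


# Hypothetical measurements on a wire with an assumed nominal resistance of 2 ohms.
voltages = [1.0, 2.0, 3.0, 4.0, 5.0]        # volts
currents = [0.52, 0.98, 1.51, 2.03, 2.47]   # amperes, including measurement noise

slope = fit_slope_through_origin(voltages, currents)
predicted_mechanistic = mechanistic_current(3.0, 2.0)
predicted_empirical = slope * 3.0

print(f"mechanistic prediction at 3 V: {predicted_mechanistic:.3f} A")
print(f"empirical slope: {slope:.4f} A/V, prediction at 3 V: {predicted_empirical:.3f} A")
```

With these made-up numbers the fitted slope is close to 1/2, so the two predictions nearly agree. In practice the empirical fit is mainly useful when no reliable first-principles relation is available.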
COLLECTION OF DATA
Collection of statistical data forms the fundamental basis for all statistical analysis. Care
must be taken to see that the data collected are reliable and useful for the purpose of the
inquiry.
Before collecting statistical data one should clearly define
(1) The purpose of inquiry
(2) The source of information
(3) Scope of inquiry
(4) The degree of accuracy desired
(5) Methods of collecting data
(6) The unit of data collection
The purpose of inquiry
• Statistical data are collected to draw desired conclusions based on the
data.
• This information may also be useful for some other statistical
survey.
Example:
• To find out whether a product is popular in the market.
SCOPE OF INQUIRY
NOTE:
• Primary data
• Secondary data
Statistical Unit
If the statistical data collected are numerical facts about qualities
such as male, female, employed, Indian, foreigner, etc., the classification of
the data is done according to these characteristics.
Classification according to quantitative basis
Example:
• The production of fertilizer from different parts of the country.
CHRONOLOGICAL CLASSIFICATION
Statistical data arranged according to the time of occurrence come
under this classification.
Example:
Production of wheat from the year 1980 to 1985
TABULATION
• The classified data has to be presented in a tabular form in an orderly
way before analysis and interpretation of the data.
• Tabulation is defined as “the orderly or systematic presentation of
numerical data in rows and columns, designed to facilitate the
comparison between the figures”.
• Tabulation is a statistical tool used for condensation of the data in a
statistical process.
Characteristics of a good table
• When a number of tables are presented in the analysis of statistical data, serial numbers
should be given to the tables.
• The unit of measurement used should be clearly indicated. These units are normally
mentioned at the top of the columns.
• The date of preparation of the table and the source of information should
be mentioned at the bottom of the table.
STEM AND LEAF DIAGRAM
In a stem-and-leaf plot, numerical data are listed in ascending
or descending order. The digits in the greatest place value of
the data are used for the stems. The digits in the next greatest
place value form the leaves.
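The construction described above can be sketched in Python. This is a minimal example that assumes non-negative two-digit integers, so that the tens digit is the stem and the units digit is the leaf. The data set and the function name `stem_and_leaf` are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Return {stem: [leaves]} with stems = tens digits and leaves = units digits."""
    plot = defaultdict(list)
    for v in sorted(values):          # ascending order, so leaves come out sorted
        stem, leaf = divmod(v, 10)
        plot[stem].append(leaf)
    return dict(plot)


# Hypothetical data set used only for illustration.
data = [47, 52, 38, 41, 55, 49, 36, 60, 44, 58, 41, 63]

plot = stem_and_leaf(data)
for stem in range(min(plot), max(plot) + 1):
    leaves = " ".join(str(leaf) for leaf in plot.get(stem, []))
    print(f"{stem} | {leaves}")
```

Each printed row shows one stem followed by its leaves, so the length of a row indicates how many observations fall in that range of ten.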
Box plots
Step 3: Draw a number line that will include the smallest and the largest data values.
Step 4: Draw three vertical lines at the lower quartile (12), median (22) and
the upper quartile (36), just above the number line.
Step 5: Join the lines for the lower quartile and the upper quartile to form a
box.
Step 6: Draw a line from the smallest value (5) to the left side of the box and draw a line from
the right side of the box to the largest value (53).
Consider, again, this dataset.
1 1 2 2 4 6 6.8 7.2 8 8.3 9 10 10 11.5
The first quartile is two, the median is seven, and the third quartile is nine. The
smallest value is one, and the largest value is 11.5. The following image shows
the constructed box plot.
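The five values needed for this box plot can be checked with a short Python sketch. It assumes one common convention, in which the quartiles are the medians of the lower and upper halves of the ordered data. Other quartile conventions exist and can give slightly different values. The function name `five_number_summary` is hypothetical.

```python
from statistics import median


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max), with quartiles as medians of the two halves."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]
    # For an odd sample size the middle observation is left out of both halves.
    upper = data[half:] if n % 2 == 0 else data[half + 1:]
    return data[0], median(lower), median(data), median(upper), data[-1]


data = [1, 1, 2, 2, 4, 6, 6.8, 7.2, 8, 8.3, 9, 10, 10, 11.5]
smallest, q1, med, q3, largest = five_number_summary(data)
print(smallest, q1, med, q3, largest)
```

Under this convention the result agrees with the quartiles, median and extremes quoted above for the data set.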
Time Series Plots
A time series plot is a graph where some measure of time is the unit on
the x-axis. In fact, we label the x-axis the time-axis. The y-axis is for the
variable that is being measured. Data points are plotted and generally
connected with straight lines, which allows for the analysis of the graph
generated.
From the graph generated by the plotted points, we can see any trends in
the data. A trend is a sustained change in the general direction of the data. For
example, if we see a car at a red light and then the light turns green, we
could plot the distance the car moves versus the time it takes to get to its
current position. We would notice the trend of an increasing distance
from the starting point.
Distance versus time graph
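The car example can be sketched in Python as a small time-ordered data set. This is an illustration under stated assumptions: the car starts from rest with a constant acceleration of 2 m/s^2 (an assumed value), and the plot is rendered as text rather than with a plotting library.

```python
# Assumed constant acceleration of the car after the light turns green.
ACCELERATION = 2.0  # m/s^2, hypothetical value

times = [0, 1, 2, 3, 4, 5]                                 # seconds
distances = [0.5 * ACCELERATION * t ** 2 for t in times]   # metres from the start

# Text rendering of the time series: one row per time point, in time order.
for t, d in zip(times, distances):
    print(f"t={t:>2} s | {'*' * int(d)} {d:.1f} m")

# The trend is increasing if every observation exceeds the previous one.
increasing = all(later > earlier for earlier, later in zip(distances, distances[1:]))
print("increasing trend:", increasing)
```

Keeping the observations in time order is what makes the trend visible. Sorting the distances by size instead would hide when each value occurred.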
PROBABILITY PLOT
• The probability plot (Chambers et al., 1983) is a graphical technique
for assessing whether or not a data set follows a given distribution
such as the normal or Weibull. The data are plotted against a
theoretical distribution in such a way that the points should form
approximately a straight line.
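The coordinates of a normal probability plot can be computed with the Python standard library. The sketch below assumes the plotting position (j - 0.5)/n for the j-th ordered observation, which is one common choice among several. The sample and the function names are hypothetical. The correlation between the ordered data and the theoretical quantiles is used here only as a rough numerical stand-in for judging straightness by eye.

```python
from statistics import NormalDist, mean


def normal_plot_points(values):
    """Pair each ordered observation with the standard normal quantile
    at plotting position (j - 0.5) / n."""
    data = sorted(values)
    n = len(data)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    return [(x, std_normal.inv_cdf((j - 0.5) / n))
            for j, x in enumerate(data, start=1)]


def correlation(pairs):
    """Pearson correlation of the (x, z) pairs; values near 1 suggest a straight line."""
    xs = [x for x, _ in pairs]
    zs = [z for _, z in pairs]
    mx, mz = mean(xs), mean(zs)
    sxz = sum((x - mx) * (z - mz) for x, z in pairs)
    sxx = sum((x - mx) ** 2 for x in xs)
    szz = sum((z - mz) ** 2 for z in zs)
    return sxz / (sxx * szz) ** 0.5


# Hypothetical sample used only for illustration.
sample = [9.8, 10.1, 10.4, 9.5, 10.0, 10.7, 9.9, 10.2, 10.3, 9.6]
points = normal_plot_points(sample)
for x, z in points:
    print(f"{x:>5} {z:7.3f}")
print("straightness (correlation):", round(correlation(points), 3))
```

Plotting the first column against the second gives the probability plot. Points that bend away from a line, especially in the tails, suggest that the assumed distribution does not fit.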