Statistics
in which we live. The facts we collect are often numerical, such as the number of runs scored by the Indian team against Pakistan.
- The methods and techniques of collecting, presenting, analysing and interpreting numerical data in a logical and systematic manner, so as to serve a purpose, are known as statistics.
- Statistics is a mathematical science pertaining to the collection, analysis, interpretation and presentation of data.
MEANING OF STATISTICS: Statistics is concerned with scientific methods for collecting, organizing, summarizing, presenting and analyzing data, as well as deriving valid conclusions and making reasonable decisions on the basis of this analysis.
ORIGIN AND GROWTH OF STATISTICS (HISTORY): The words statistics and statistical are derived from the Latin word status, meaning political state. The German word Statistik, first introduced by Gottfried Achenwall (1749), originally designated the analysis of data about the state. It was used by the British mainly for administrative and governmental bodies. In particular, a census provides regular information about the population. Today, however, statistics has broadened far beyond the service of a state or government; it includes areas such as business, the natural and social sciences, and medicine. Before 3000 B.C. the Babylonians used small clay tablets to record tabulations of agricultural yields and of commodities bartered or sold.
The Egyptians analyzed the population and material wealth of their kingdoms.
FUNDAMENTAL CHARACTERISTICS OF STATISTICS:
- They are related to each other and are comparable.
- They are aggregates of facts, not single observations.
- Statistics do not take individual cases into account.
- Statistical data are numerically expressed.
- Statistics are collections of data made in a systematic manner.
- Statistics are collected for a predetermined purpose.
- Statistics deals with groups and does not study individuals.
- Statistical laws are not exact; they are true only on average.
The data collected by someone else, other than the investigator, are known as secondary data. The data obtained in the original form are called ungrouped data or raw data. An arrangement of raw numerical data in ascending or descending order of magnitude is called an array.
USES AND APPLICATIONS OF STATISTICS: Statistics has been used to answer questions in the following fields:
INDUSTRIES AND BUSINESS
- Reports of earlier sales and comparison with others.
- They show where the factory or its sales are lacking and where they are good.
AGRICULTURE
- What amount of crops has been grown this year in comparison to the previous year, or in comparison to the amount of crop required for the country?
- Quality and size of grains grown with the use of different fertilizers.
FORESTRY
- How much has the area under forest grown, or how much forest has been depleted, in the last 5 years?
- By how much have different species of flora and fauna increased or decreased in the last 5 years?
EDUCATION
- Money spent on girls' education in comparison to boys' education.
- Increase in the number of girl students who sat for different exams.
- Comparison of results for the last 10 years.
ECOLOGICAL STUDIES
- Comparison of the increasing impact of pollution on global warming.
- Increasing effect of nuclear reactors on the environment.
MEDICAL STUDIES
- Number of new diseases that have appeared in the last 10 years.
- Increase in the number of patients for a particular disease.
SPORTS
- Used to compare the run rates of two different teams.
- Used to compare two different players.
Data and its classification: Data can be classified as:
- Ungrouped data
- Grouped data
~ Ungrouped data refers to data where the frequencies are not given.
~ Grouped data: Sometimes the data set is so large that it is inconvenient to list every item in the frequency distribution table. We then group the items into convenient intervals and present the data in a frequency distribution in which each class interval generally contains 5 or 10 values of the variate. The mid-value of each class is taken as the representative of every item falling in that interval.
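The grouping described above can be sketched in a few lines of Python. This is only an illustration: the function name `group_into_classes` and the sample marks are made up, and the classes are assumed to be exclusive (the upper limit belongs to the next class).

```python
from collections import Counter

def group_into_classes(values, start, width):
    """Return (lower, upper, mid_value, frequency) for each exclusive class.

    Assumes every value is >= start; a value equal to an upper limit
    is counted in the next class.
    """
    # Index of the class each value falls into: 0 for the first class, 1 for the next, ...
    counts = Counter((v - start) // width for v in values)
    table = []
    for k in range(max(counts) + 1):
        lower = start + k * width
        upper = lower + width
        table.append((lower, upper, (lower + upper) / 2, counts.get(k, 0)))
    return table

# Hypothetical raw marks, not taken from the text.
marks = [4, 12, 17, 23, 25, 31, 38, 38, 44, 49]
table = group_into_classes(marks, start=0, width=10)
for lower, upper, mid, freq in table:
    print(f"{lower}-{upper}  mid value {mid}  frequency {freq}")
```

Each row carries the mid value of its class, which is what later stands in for every item of that class when a mean is computed.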
Marks             4   5   8   18  28  29  31  46  47  50  55  70  71  75
No. of students   1   1   1   2   1   2   1   3   2   2   3   1   1   2
Data: A collection of information.
Observation: Each value of the marks in the data is an observation.
The above data written in ascending order is called arrayed data, and the arrangement itself is called an array. The arrangement of the data in the table is known as a frequency distribution.
Inclusive or discrete series: When the class intervals are fixed so that the upper limit of a class is included in that same class interval, the series is known as an inclusive series (for example: 1-10, 11-20).
Exclusive or continuous series: In an exclusive series, the upper limit of one class is the lower limit of the next class (for example: 150-155, 155-160).
The marks are called variates, and the number of students who secured a particular mark is called the frequency of the variate. In general, the number of times a value is repeated is called the frequency of the variate.
Each group into which raw data is condensed is called a class. The size of a class is known as the class interval; for example, 10 is the class interval of the class 0-10. Each class is bounded by two figures, called the class limits; for example, in the class 10-20, 10 is the lower limit and 20 is the upper limit. The difference between the upper limit and the lower limit of a class is called the class size.
The value that lies midway between the lower and upper limits of a class is known as its mid value or class mark:
$\text{Class mark} = \frac{\text{upper limit} + \text{lower limit}}{2}$
The difference between the two extreme observations in arranged data, i.e. the difference between the maximum and minimum values of the observations, is known as the range.
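As a rough illustration of these definitions, the marks table above can be expanded into raw observations and the array, frequencies, range and a class mark computed from it. The variable and function names here are illustrative choices, not part of the original notes.

```python
from collections import Counter

# Marks -> number of students, copied from the table above.
marks_table = {4: 1, 5: 1, 8: 1, 18: 2, 28: 1, 29: 2, 31: 1,
               46: 3, 47: 2, 50: 2, 55: 3, 70: 1, 71: 1, 75: 2}

# Raw (ungrouped) data: one observation per student.
raw = [mark for mark, count in marks_table.items() for _ in range(count)]

array = sorted(raw)                    # data arranged in ascending order
frequency = Counter(array)             # frequency of each variate
data_range = max(array) - min(array)   # maximum minus minimum observation

def class_mark(lower, upper):
    """Mid value of a class: (upper limit + lower limit) / 2."""
    return (upper + lower) / 2

print(len(array), data_range, frequency[46], class_mark(0, 10))
```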
CENTRAL TENDENCY:
The measures of central tendency describe the average, or typical, value of a set of scores or values in the data. There are three measures of central tendency:
- Mean: the average value of the given data.
- Mode: the value of the variate that occurs most often.
- Median: the middle-most term of the given data, once the data is arranged in order.
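For a quick feel of the three measures, Python's standard `statistics` module computes each of them directly on raw data. The scores below are made up for illustration.

```python
import statistics

scores = [2, 3, 3, 5, 7]  # hypothetical raw data

mean_value = statistics.mean(scores)      # sum of values / number of values
median_value = statistics.median(scores)  # middle term of the sorted data
mode_value = statistics.mode(scores)      # most frequent value

print(mean_value, median_value, mode_value)
```

Here the mean is 20 / 5 = 4, while the median and the mode are both 3, so the three measures need not coincide.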
MEAN:
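The mean of some data is the average score or value.
Mean of raw data (when the frequencies are not given): If $x_1, x_2, x_3, \dots, x_n$ are $n$ values of a variable $x$, then the arithmetic mean, or simply the mean, of these values is denoted by $\bar{X}$ and is defined as
$\bar{X} = \frac{x_1 + x_2 + x_3 + \dots + x_n}{n}$ or $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} x_i$
Mean of grouped data: The mean of grouped data can be found by three methods:
- Direct method
- Assumed mean method
- Step deviation method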
Direct method:
ALGORITHM:
Step I: Prepare the frequency table so that its first column consists of the values of the variate $x_i$ and the second column the corresponding frequencies $f_i$.
Step II: Multiply the frequency in each row by the corresponding value of the variable to obtain a third column containing $f_i x_i$.
Step III: Find the sum of all entries in column III to obtain $\sum f_i x_i$.
Step IV: Find the sum of all the frequencies in column II to obtain $\sum f_i$.
Step V: Use the formula $\bar{X} = \frac{\sum f_i x_i}{\sum f_i}$
Examples:
1.
Mid-values (xi)     2     3     4     5     6     Total
Frequencies (fi)    49    43    57    38    13    $\sum f_i = 200$
fixi                98    129   228   190   78    $\sum f_i x_i = 723$

$\text{Mean} = \frac{\sum f_i x_i}{\sum f_i} = \frac{723}{200} = 3.615$

2.
Class interval      10-30   30-50   50-70   70-90   90-110   Total
fi                  90      20      30      20      40       $\sum f_i = 200$
Mid value (xi)      20      40      60      80      100
fixi                1800    800     1800    1600    4000     $\sum f_i x_i = 10000$

$\text{Mean} = \frac{\sum f_i x_i}{\sum f_i} = \frac{10000}{200} = 50$
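The direct method can be checked with a short Python sketch. The function name `mean_direct` is an illustrative choice; the data are the two examples above, with the classes of the second example represented by their mid values.

```python
def mean_direct(values, freqs):
    """Mean of a frequency distribution: sum(f_i * x_i) / sum(f_i)."""
    total_f = sum(freqs)                                  # Step IV
    total_fx = sum(f * x for f, x in zip(freqs, values))  # Steps II and III
    return total_fx / total_f                             # Step V

# Example 1: the values of the variate are given directly.
mean_1 = mean_direct([2, 3, 4, 5, 6], [49, 43, 57, 38, 13])

# Example 2: class intervals, so each class is represented by its mid value.
classes = [(10, 30), (30, 50), (50, 70), (70, 90), (90, 110)]
mid_values = [(lower + upper) / 2 for lower, upper in classes]
mean_2 = mean_direct(mid_values, [90, 20, 30, 20, 40])

print(mean_1, mean_2)
```

Both results should agree with the worked values 3.615 and 50.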
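Assumed mean method:
Here an assumed mean $A$ is chosen from among the mid values, the deviations $d_i = x_i - A$ are computed, and the mean is
$\text{Mean} = A + \frac{\sum f_i d_i}{\sum f_i}$
Examples:
1.
Mid value (xi)    Frequency (fi)    di = xi - A
125               24                -150
175               40                -100
225               33                -50
275 (A)           28                0
...               30                ...
...               22                ...
...               16                ...
...               7                 ...
$\sum f_i = 200$

2.
Class       Mid value (xi)    di = xi - A    Frequency (fi)    fidi
0-50        25                -150           17                -2550
50-100      75                -100           35                -3500
100-150     125               -50            43                -2150
150-200     175 (A)           0              40                0
200-250     225               50             21                1050
250-300     275               100            24                2400
Total                                        $\sum f_i = 180$  $\sum f_i d_i = -4750$

Here, $A = 175$, $\sum f_i = 180$ and $\sum f_i d_i = -4750$.
$\text{Mean} = A + \frac{\sum f_i d_i}{\sum f_i} = 175 + \frac{-4750}{180} = 175 - 26.39 = 148.61$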
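A minimal sketch of the assumed mean calculation, using the second example above (A = 175). The function name `mean_assumed` is hypothetical.

```python
def mean_assumed(mid_values, freqs, assumed):
    """Mean = A + sum(f_i * d_i) / sum(f_i), where d_i = x_i - A."""
    deviations = [x - assumed for x in mid_values]
    total_fd = sum(f * d for f, d in zip(freqs, deviations))
    return assumed + total_fd / sum(freqs)

mid_values = [25, 75, 125, 175, 225, 275]
freqs = [17, 35, 43, 40, 21, 24]
mean_value = mean_assumed(mid_values, freqs, assumed=175)

# The choice of A only changes the arithmetic, not the answer:
# the plain sum(f_i * x_i) / sum(f_i) gives the same mean.
direct = sum(f * x for f, x in zip(freqs, mid_values)) / sum(freqs)

print(round(mean_value, 2), round(direct, 2))
```

Any other mid value used as A would give the same mean; picking one near the centre simply keeps the deviations small.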
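Step deviation method:
Here the deviations are also divided by the class size $h$, giving $u_i = \frac{x_i - A}{h}$, and the mean is
$\text{Mean} = A + \frac{\sum f_i u_i}{\sum f_i} \times h$
Examples:
1.
Classes           1400-1500   1500-1600   1600-1700   1700-1800   1800-1900   1900-2000   Total
Frequency (fi)    5           10          20          9           6           2           $\sum f_i = 52$

Here, $A = 1750$, $\sum f_i u_i = -45$, $\sum f_i = 52$ and $h = 100$.
$\text{Mean} = A + \frac{\sum f_i u_i}{\sum f_i} \times h = 1750 + \frac{-45}{52} \times 100 = 1750 - 86.54 = 1663.46$

2.
Classes           0-10   10-20   20-30   30-40   40-50   50-60   60-70   70-80   80-90   90-100   Total
Frequency (fi)    5      4       8       12      16      15      10      8       5       2        $\sum f_i = 85$

Here, $A = 55$, $\sum f_i u_i = -56$, $\sum f_i = 85$ and $h = 10$.
$\text{Mean} = 55 + \frac{-56}{85} \times 10 = 55 - 6.59 = 48.41$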
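The step deviation calculation can be sketched the same way. The function name `mean_step_deviation` is illustrative; the data are the ten classes 0-10 to 90-100 above, with A = 55 and h = 10.

```python
def mean_step_deviation(mid_values, freqs, assumed, width):
    """Mean = A + h * sum(f_i * u_i) / sum(f_i), where u_i = (x_i - A) / h."""
    steps = [(x - assumed) / width for x in mid_values]
    total_fu = sum(f * u for f, u in zip(freqs, steps))
    return assumed + width * total_fu / sum(freqs), total_fu

lower_limits = range(0, 100, 10)               # classes 0-10, 10-20, ..., 90-100
mid_values = [lo + 5 for lo in lower_limits]   # 5, 15, ..., 95
freqs = [5, 4, 8, 12, 16, 15, 10, 8, 5, 2]

mean_value, total_fu = mean_step_deviation(mid_values, freqs, assumed=55, width=10)
print(total_fu, round(mean_value, 2))
```

The sum of $f_i u_i$ comes out negative (-56) and the divisor is the total frequency 85, which is what gives the correction of about 6.59.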
MODE:
The mode or modal value of a distribution is the value of the variable for which the frequency is maximum. To compute the mode of a series of individual observations, we first convert it into a discrete frequency distribution by preparing a frequency table. From the frequency table, we identify the value having the maximum frequency. The value of the variable so obtained is the mode or modal value.
Mode for grouped data is given by
$\text{Mode} = l + \frac{f_1 - f_0}{2f_1 - f_0 - f_2} \times h$
where
l = lower limit of the modal class
h = size of the class interval
f1 = frequency of the modal class
f0 = frequency of the class preceding the modal class
f2 = frequency of the class succeeding the modal class
Examples:
1.
Class        0-10   10-20   20-30   30-40   40-50   50-60   60-70   70-80
Frequency    7      14      13      12      20      11      15      8

Here, 20 is the highest frequency, so 40-50 is the modal class. Here, $l = 40$, $f_0 = 12$, $f_1 = 20$, $f_2 = 11$ and $h = 10$.
$\text{Mode} = l + \frac{f_1 - f_0}{2f_1 - f_0 - f_2} \times h = 40 + \frac{20 - 12}{40 - 12 - 11} \times 10 = 40 + 4.7 = 44.7$

2.
Class        1-3   3-5   5-7   7-9   9-11
Frequency    7     8     2     2     1

Here the highest frequency is 8, so 3-5 is the modal class. Here, $l = 3$, $f_0 = 7$, $f_1 = 8$, $f_2 = 2$ and $h = 2$.
$\text{Mode} = l + \frac{f_1 - f_0}{2f_1 - f_0 - f_2} \times h = 3 + \frac{8 - 7}{16 - 7 - 2} \times 2 = 3 + 0.286 = 3.286$
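The grouped-data mode formula can be sketched as below. The function name `grouped_mode` is hypothetical, and two conventions are assumptions rather than something the notes state: a missing neighbouring class (when the modal class is first or last) is given frequency 0, and if several classes tie for the highest frequency the first one is used.

```python
def grouped_mode(lower_limits, freqs, width):
    """Mode = l + (f1 - f0) / (2*f1 - f0 - f2) * h for the modal class."""
    i = max(range(len(freqs)), key=lambda k: freqs[k])  # index of the modal class
    f1 = freqs[i]
    f0 = freqs[i - 1] if i > 0 else 0               # assumption: no preceding class -> 0
    f2 = freqs[i + 1] if i < len(freqs) - 1 else 0  # assumption: no succeeding class -> 0
    # Note: the denominator is zero when f0 == f1 == f2; that case is not handled here.
    return lower_limits[i] + (f1 - f0) / (2 * f1 - f0 - f2) * width

mode_1 = grouped_mode([0, 10, 20, 30, 40, 50, 60, 70],
                      [7, 14, 13, 12, 20, 11, 15, 8], width=10)
mode_2 = grouped_mode([1, 3, 5, 7, 9], [7, 8, 2, 2, 1], width=2)

print(round(mode_1, 1), round(mode_2, 3))
```

The two calls use the frequency tables of the examples above and should reproduce 44.7 and 3.286 after rounding.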
Here mean = 62.8, $\sum f_i = 40 + f$ and $\sum f_i x_i = 2640 + 50f$.
$\text{Mean} = \frac{\sum f_i x_i}{\sum f_i}$
$62.8 = \frac{2640 + 50f}{40 + f}$
$62.8(40 + f) = 2640 + 50f$
$2512 + 62.8f = 2640 + 50f$
$12.8f = 128$
$f = 10$
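The same missing-frequency calculation can be written as a one-line rearrangement: from mean × (N + f) = S + x·f it follows that f = (mean × N − S) / (x − mean). The function and parameter names below are illustrative; N = 40 is the sum of the known frequencies, S = 2640 the sum of the known products, and x = 50 the value whose frequency is missing.

```python
def missing_frequency(mean, known_f_sum, known_fx_sum, x_missing):
    """Solve mean = (S + x*f) / (N + f) for the unknown frequency f."""
    return (mean * known_f_sum - known_fx_sum) / (x_missing - mean)

f = missing_frequency(mean=62.8, known_f_sum=40, known_fx_sum=2640, x_missing=50)
print(round(f, 6))

# Check: putting f back into the mean formula should give back 62.8.
check = (2640 + 50 * f) / (40 + f)
```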
Also, Mean = 1.46
$1.46 = \frac{\sum f_i x_i}{N}$