Stat 23 Lecture Notes

This document discusses key concepts in statistics including populations, samples, parameters, variables, and levels of measurement. It defines statistics as involving methods of data collection, organization, analysis, and interpretation to understand patterns in data and make evidence-based decisions under uncertainty. It distinguishes between descriptive statistics, which analyzes a data set, and inferential statistics, which uses a sample to draw conclusions about a larger population.
Copyright
© All Rights Reserved

Neal Quizon

1.1. Things to know

1. Definition of Statistics

• In its plural sense, it refers to the data itself or to some numerical computations derived from a set of data that are systematically collected and analyzed.
• In its singular sense, it refers to the scientific discipline consisting of the theory and methods for processing collections of quantitative and qualitative data useful when making decisions in the face of uncertainty.

Statistics as a science is basically concerned with the understanding of some structures in a data set. As such, statisticians are involved with methods of data collection, data organization, and analysis, as well as interpretation of the results.

However, uncovering patterns embedded under the backdrop of uncertainty involves not just science but also art.

2. Learning the methods in statistics enables us to develop a way of thinking that helps us in many ways:

• describe or characterize persons, objects, situations, and some phenomena with some reliability;
• make assessments and comparisons in an objective manner;
• make evidence-based decisions.

3. Some Applications of Statistics:

• Determining the level of patients' satisfaction with the nursing care administered by student nurses at Central Mindanao University.
• Determining the distribution of the number of text messages sent per day by CMU students enrolled in Statistics subjects.
• Comparing the exam results in Statistics of the different CMU colleges.
• Relationship of faculty status and work commitment.
• Prediction of the number of CMU students for the next school year 2016-2017.

4. Major Categories of Statistics

Descriptive Statistics – methods concerned with collecting, describing, and analyzing a set of data without drawing conclusions (or inferences) beyond the data.

Inferential Statistics – methods concerned with the analysis of a subset of data leading to predictions or inferences about the entire set of data, that is, to generalize results beyond the data collected, provided that the data collected is a part (sample) of a larger set of items (population).

5. Examples of Descriptive Statistics

• Total number of CMU students who are university scholars.
• The CMU registrar cited statistics showing an increased number of CMU students during the past five years.

6. Example of Inferential Statistics:

• A new milk formulation designed to improve the psychomotor development of infants was tested on randomly selected infants. Based on the results, it was concluded that the new milk formulation is effective in improving the psychomotor development of infants.

7. Key Definitions

Universe – is the set of all entities under study, that is, the collection of things or observational units under study.
Variable – is a characteristic observed or measured on every unit of the universe.
Population – is the set of all possible values of the variable.
Sample – is a subset of the population.
Parameters – are numerical measures that describe the population or universe of interest.
Statistics – are numerical measures of a sample.
Frame – a listing of all the elements in a population.
Census – the process in which information is gathered for all units in the population.
Sample survey or sampling – the process in which information is obtained from only a part of the population.

“A statistic is to a sample as a parameter is to a population”.

8. Types of Variables and Data

The building blocks of statistical science are data. Specific characteristics (e.g., age, height, and weight) that we want to assess for a certain population are referred to as variables. Variables may be categorized further as qualitative and quantitative variables.

Qualitative variables – These are variables that yield observations by which individuals can be categorized according to some characteristic or quality.

• e.g., gender, marital status and blood type.
• Are expressed in categories.

Quantitative variables – These are variables that yield observations that can be measured.

• e.g., weight, height, systolic blood pressure and body mass index.

Constant – This is a characteristic or variable that assumes only one value.

Data collected on variables are classified as either qualitative or quantitative. Qualitative data (e.g., gender, marital status, and blood type) are data obtained on variables that are usually expressed in categories. Quantitative data are expressed in numbers (e.g., weight, height, systolic blood pressure and body mass index); data collected in these cases are measured or counted.

Quantitative data are classified as either discrete or continuous data.

• Discrete data – This refers to any data that can be counted, e.g., number of patients in a hospital, number of students who obtained a 1.0 grade in Math 15 and Math 34. These data assume only a countable number of values.
• Continuous data – This refers to any data that can be measured, e.g., systolic blood pressure, weight and height. These data result from infinitely many possible values that can be associated with points on a continuous scale in such a way that there are no gaps or interruptions.

Note: Arithmetical operations on quantitative data have some physical interpretation. Some variables may take numerical values, but that does not make the variable quantitative, e.g., the sum of two zip codes or the difference between your cellular phone number and your seatmate's. The arithmetic operations in these examples do not make sense. The issue is whether performing arithmetical operations on the data would make any sense.

The figure in the next page illustrates the classification of data collected on particular variables.

10. Levels of Measurement or Measurement Scales

Measurement is the assignment of numbers to objects or events according to a predetermined set of rules. For instance, if it is desired to measure a person’s weight in kilograms, we may assign the number 50 to a person and say that the person’s weight is 50 kilograms. Determining the level of measurement of a certain set of data is important because it helps in deciding which statistical inference test will be used to analyze the data. There are four types of measurement scales: nominal, ordinal, interval and ratio scales. They differ in the properties of numbers (identity, order, additivity) that they possess.

• Identity – the property that enables a person to distinguish one number from another. Numbers are recognized by the way they are written.
• Order – the property that numbers or observations are arranged in a sequence. For any integers A and B, we can determine whether A > B, A = B, or A < B.
• Additivity – the property that allows us to add two or more numbers. For any real numbers A, B, C, and D, because of the equality of scale, we can determine whether A − B = C − D or A − B > C − D.
• Absolute zero property means that there is a level at which there is nothing of the characteristic being
measured.

• Nominal scale – the lowest level of measurement; it is most often used with variables that are qualitative in nature, rather than quantitative.
- Examples: gender, eye color, smoking status and nationality.
- Data in the nominal scale possess only the property of identity. Thus, numbers or observations are only used to classify. For example, in the variable gender, if 1 is assigned to male and 2 to female, it does not mean that female is better than male.

• Ordinal scale – data in this case possess the properties of identity and order.
• We can rank-order the objects as to whether they possess more, less or the same amount of the variable being measured. Thus, we can determine whether A > B, A = B, or A < B.
• We still cannot determine how much greater or less A is than B in the attribute being measured.
• Examples: level of educational attainment, military ranks.

• Interval scale – Data at this level possess the properties of identity, order and additivity but do not have the absolute zero property.
- Examples: Celsius scale measurement of temperature and intelligence score.

• Ratio scale – Data at this level possess the properties of identity, order, equality of scale and absolute zero.
- Examples: weight and height of persons.

Introduction on the Use of R Program

Objectives
• Provide History and Overview of R
• Guide in the installation of R and RStudio
• Show Working/Changing Directory in R
• Incorporate the use of R in some Statistics and Probability Lessons
• Introduce basic commands in R
• Introduce R Script and R Markdown
• Install some R packages
• Illustrate: generate R data, data in R, and Export Excel Data in R

History and Overview of R
• R is an independent open-source implementation of a statistical analysis system developed by Ross Ihaka and Robert Gentleman at the University of Auckland in 1995.
• R can be used both as a programming language and as a piece of software. It can be used for data manipulation, calculation, and graphical display.
• One of the biggest advantages of R is that it can be distributed for free.
• R can be freely downloaded from the internet.

The R Installation
• Obtain a copy of an R language installer from a dependable source or directly from the Internet. The URL is http://cran.r-project.org/
• The latest version of R at the time of writing was 4.0.5.
• Once the installation is done, start R by clicking the Desktop icon for R.

The R Console
• Along the top of the window is a limited set of menus, which can be used for various tasks including opening, loading, and saving script windows, loading and saving your workspace, and installing packages.
• When you open an R session (i.e. start the R program), the R console opens and you are presented with a screen like this:
[Figure: R console at startup]

The RStudio
• RStudio is an integrated development environment (IDE) for R. It includes a console, a syntax-highlighting editor that supports direct code execution, as well as tools for plotting, history, debugging and workspace management.
• RStudio is available in open source and commercial editions and runs on the desktop (Windows, Mac, and Linux).
• You can download the latest version of RStudio at https://www.rstudio.com/products/rstudio/

Current and Changing Working Directory in R

getwd()            # show the current working directory
## [1] "D:/SY2223 1st sem/Stat23"

setwd("D:/DOST")   # change the working directory
getwd()
## [1] "D:/DOST"

setwd("D:/Extension Valencia District/Presentation")
dir() # Show the files in your working directory
## [1] "Lecture1.html" "Lecture1.Rmd" "R-Presentation.Rmd"
## [4] "R Presentation.Rmd" "rsconnect"

Statistics and Probability Lessons with R

*Lesson: Mean and Variance of a Random Variable

Question: What is the expected number of heads in tossing a coin twice?

x <- c(0, 1, 2)          # possible numbers of heads
p <- c(.25, .5, .25)     # probability of each value
coin <- rbind(x, p)      # probability distribution as a table
xbar <- sum(x * p)       # expected value: sum of value times probability
xbar
## [1] 1

Question: What is the variance of the number of heads in tossing a coin twice?

x <- c(0, 1, 2)
p <- c(.25, .5, .25)
coin <- rbind(x, p)
xbar <- sum(x * p)
variance <- sum(p * (x - xbar)^2)   # probability-weighted squared deviations from the mean
variance
## [1] 0.5

*Lesson: Binomial Probability Distribution

Question: You flip a fair coin 5 times. What is the probability of getting 4 or 5 heads?

bn4 <- dbinom(4, 5, 0.5)   # P(exactly 4 heads)
bn5 <- dbinom(5, 5, 0.5)   # P(exactly 5 heads)
bntotal <- bn4 + bn5
bntotal
## [1] 0.1875

Or using the pbinom function:

bntotalalt <- 1 - pbinom(3, 5, .5)   # 1 - P(at most 3 heads)
bntotalalt
## [1] 0.1875

*Lesson: Normal Distribution

Question: Suppose that diastolic blood pressures (DBPs) from men aged 30-44 are normally distributed with a mean of 85 mmHg and a standard deviation of 10 mmHg.
What is the probability that a random 30–44-year-old has a DBP less than 80?

pnorm(80, mean = 85, sd = 10, lower.tail = TRUE)
## [1] 0.3085375

Question: Brain volume for adult men is normally distributed with a mean of about 1,100 cc and a standard deviation of 70 cc. What brain volume represents the 95th percentile?

qnorm(0.95, mean = 1100, sd = 70, lower.tail = TRUE)
## [1] 1215.14

Refer to the previous example: Brain volume for adult men is normally distributed with a mean of about 1,100 cc and a standard deviation of 70 cc. Consider the sample mean of 100 random adult men from this population. What is the 95th percentile of the distribution of the sample mean?

Note: As the number of people is large enough, we can consider that the sample mean follows a normal distribution with mean 1100 and standard deviation 70/sqrt(100) = 7, where 1100 is the population mean, 70 is the population standard deviation and n = 100.

qnorm(0.95, mean = 1100, sd = 70/10, lower.tail = TRUE)
## [1] 1111.514

*Estimation and Hypothesis Testing

Question: In a population of interest, a sample of 9 men yielded a sample average brain volume of 1,100 cc and a standard deviation of 30 cc. What is a 95% Student’s t confidence interval for the mean brain volume in this new population?

n <- 9
xbar <- 1100       # sample mean
st.dev <- 30       # sample standard deviation
prob <- 0.975      # 95% interval with 2.5% in each tail
conf <- xbar + c(-1, 1) * qt(prob, df = n - 1) * st.dev / sqrt(n)
conf
## [1] 1076.94 1123.06

Steps in Hypothesis Testing

HYPOTHESIS TESTING:

Things to know

1. Two areas of Inferential Statistics: Estimation and Hypothesis Testing.

2. Hypothesis Testing is an area of statistical inference in which one evaluates a conjecture about some characteristic of the population based upon the information contained in the random sample. Usually, the conjecture concerns one of the unknown parameters of the population.

3. Hypothesis is a claim or statement about the population parameter.

4. Steps in Hypothesis Testing

o State the null and alternative hypotheses.
o Decide on a level of significance, α.
o Select the appropriate test statistic.
o Establish the critical region/regions.
o Compute the actual value of the test statistic from the sample.
o Make the statistical decision:
  ▪ If decision rule is based on region of rejection: Check if the test statistic falls in the region of rejection. If yes, reject the null hypothesis.
  ▪ If decision rule is based on p-value: Determine the p-value. If the p-value is less than or equal to α, reject the null hypothesis.
o Interpret results.

Example:

The mean weight of the sample of 100 persons from the Honolulu Heart Study is 63 kg. If the ideal weight is known to be 60 kg, is the group significantly overweight? Assume α = 0.05 and σ = 10 kg.

INTRODUCTION TO PROBABILITY

Probability Theory: Foundation for Data Science with Anne Dougherty

Learning Goals for Module 1

In this module, we’ll learn about the difference between a population and a sample and why probability is the foundation for statistics and data science. At the end of this Module, students should be able to:

➢ Explain why probability theory is relevant to statistics and data science.
➢ Describe what it means to predict the outcome of an experiment and organize the outcomes into sample spaces.
➢ Calculate probabilities of events using the Axioms of Probability.
➢ Understand permutations and combinations and be able to calculate probabilities when each simple event is equally likely.

What is Statistics?

Statistics is the science of using data effectively to gain new knowledge. We need data to learn something new. We need to collect and analyze the data ethically.

Population: Those individuals or objects from which we want to acquire information or draw a conclusion. Most of the time, the population is so large that we can only collect data on a subset of it. We will call this our sample.

In probability we assume we know the characteristics of the entire population. Then, we can pose and answer questions about the nature of a sample. In statistics, if we have a sample with particular characteristics, we want to be able to say, with some degree of confidence, whether the whole population has this characteristic, or not.

Sample Spaces and Events

Probability studies randomness and uncertainty by giving these concepts a mathematical foundation.

For example, we want to understand how to find the probability

• of getting at least 2 heads in 5 coin flips,
• that a customer will buy milk if they are also buying bread,
• that the price of a stock will be in a certain range on a certain date in the future.

Probability gives us the framework to quantify uncertainty.
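As a small illustration of the first question in the list above, the following sketch computes the probability of getting at least 2 heads in 5 coin flips. It assumes a fair coin and independent flips, which the list itself does not state, and the variable names are illustrative only. The binomial result is cross-checked by listing the whole sample space.

```r
# Assumption: fair coin, independent flips, so the number of heads X ~ Binomial(5, 0.5)
n_flips <- 5
p_heads <- 0.5

# P(X >= 2) = 1 - P(X <= 1)
p_at_least_2 <- 1 - pbinom(1, size = n_flips, prob = p_heads)

# Cross-check: enumerate all 2^5 equally likely outcomes (1 = head, 0 = tail)
flips <- expand.grid(rep(list(c(0, 1)), n_flips))
p_enumerated <- mean(rowSums(flips) >= 2)

p_at_least_2
p_enumerated
```

Both approaches should agree, because with equally likely outcomes the probability of an event is simply the share of outcomes that belong to it.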
Terminology

• An experiment is any action or process that generates observations.
• The sample space of an experiment, denoted S, is the set of all possible outcomes of an experiment.
• An event is any possible outcome, or combination of outcomes, of an experiment.
• The cardinality of a sample space or an event is the number of outcomes it contains. |S| represents the cardinality of the sample space.

Examples

For each of the following, describe the sample space, S, and give its cardinality.

• Experiment 1: Flip a coin twice.
• Experiment 2: Flip a coin until you get a tail.
• Experiment 3: Select a car coming off an assembly line and inspect it for 3 different defects (engine problem, seat belt problem, bad paint job).
• Experiment 4: Measure the arrival time between two customers.

Set Notation

For events A and B,

• A ∪ B, the union of A and B, means an outcome in A or an outcome in B occurs.
• A ∩ B, the intersection of A and B, is all the outcomes that are in both A and B.
• A^c, the complement of A, means the set of all outcomes in S that are not in A.
• A and B are mutually exclusive, or disjoint, if they have no outcomes in common. We write A ∩ B = ∅.

Examples continued

S = {000, 100, 010, 001, 110, 101, 011, 111}

Consider the following events:
➢ A is the event that there is an engine problem (defect 1). In set notation: A = {100, 110, 101, 111}
➢ B is the event that there is exactly one defect. In set notation: B = {100, 010, 001}
➢ C is the event that there are exactly two defects, so C = {110, 101, 011}
➢ A ∩ B =
➢ A^c =
➢ A^c ∪ B =
➢ B ∩ C =

Venn Diagrams

Venn diagrams can be used to help us visualize unions, intersections, and complements.

AXIOMS OF PROBABILITY

Probability Theory: Foundation for Data Science with Anne Dougherty

Learning Goals

In this module, we’ll learn about the difference between a population and a sample and why probability is the foundation for statistics and data science. We’ll also begin our study of the foundations of probability.

➢ Explain why probability theory is relevant to statistics and data science.
➢ Describe what it means to predict the outcome of an experiment and organize the outcomes into sample spaces.
➢ Calculate probabilities of events using the Axioms of Probability.
➢ Calculate probabilities when each simple event is equally likely. Understand permutations and combinations.

What is a Probability?

The goal of probability is to assign some number, P(A), called the probability of event A, which will give a precise measure to the chance that A will occur. In statistics, we draw a sample from a population, and give an estimate. So, you will be able to understand statistics more thoroughly and deeply if you first understand probabilities.

➢ Start with an experiment that generates outcomes.
➢ Organize all of the outcomes into a sample space, S.
➢ Let A be some event contained in S. That is, A is some collection of outcomes from the experiment.

What do we expect to be true of P(A)?

Axioms of Probability

Axiom 1: For any event A, 0 ≤ P(A) ≤ 1.

Axiom 2: P(S) = 1.

Axiom 3: If A1, A2, ..., An are a collection of n mutually exclusive events (i.e. the intersection of any two is the empty set), then
P(A1 ∪ A2 ∪ ··· ∪ An) = P(A1) + P(A2) + ··· + P(An).

Axioms of Probability - continued

Axiom 3’: More generally, if A1, A2, ... is an infinite collection of mutually exclusive events, then
P(A1 ∪ A2 ∪ ···) = P(A1) + P(A2) + ···.

These three properties are called the Axioms of Probability and we can derive many results from them.

Example 1

Experiment: Flip a fair coin until the first tail appears. Let 0 represent a head and 1 a tail.

S = {1, 01, 001, 0001, ...}

Let An represent the event of obtaining the first tail on the nth flip, An = {00···01}. Find P(A1), P(A2), P(A5) and P(An), where n is a positive integer.
P(A1) = 1/2
P(A2) = P({01}) = 1/4
P(A5) = P({00001}) = 1/2^5
P(An) = 1/2^n

Example 1 – continued

If B is the event that it takes at least 3 flips to obtain a tail, find P(B).
B^c, the complement of B, is the event that you obtain a tail on the first or second flip.
P(B^c) = P({1, 01}) = 1/2 + 1/4 = 3/4
We also note:
P(S) = P(B ∪ B^c) = P(B) + P(B^c) = 1. So,
P(B) = 1 − P(B^c) = 1 − 3/4 = 1/4

Consequences of the Axioms

If A and B are two events contained in the same sample space S,

➢ A ∩ A^c = ∅ and A ∪ A^c = S so 1 = P(S) = P(A ∪ A^c) = P(A) + P(A^c), which implies P(A^c) = 1 − P(A).
➢ If A ∩ B = ∅, then P(A ∩ B) = 0.
➢ P(A ∪ B) = P(A) + P(B) − P(A ∩ B)

These three consequences will help us calculate many probabilities.

Example 2

Return to our car example: Recall a randomly selected car is inspected for three defects. The sample space is

S = {000, 100, 010, 001, 110, 101, 011, 111}. Consider the three events:
➢ A is the event defect 1 is present, A = {100, 110, 101, 111}
➢ B is the event defect 2 is present, B = {010, 110, 011, 111}
➢ C is the event defect 3 is present, C = {001, 011, 101, 111}

Example 2 continued

Suppose over many days, data is collected, and it is found that 20% of the cars have defect 1, 25% have defect 2, and 30% have defect 3. Further, 5% have defects 1 and 2, 7.5% have defects 2 and 3, 6% have defects 1 and 3, and 1.5% have all three.

Example 2 continued

Calculate the probability of each of the following events for the randomly selected car:

➢ defect 1 did not occur
➢ at least one defect occurs
➢ no defect occurs
➢ defects 1 and 3 occur but 2 does not.

COUNTING: PERMUTATIONS AND COMBINATIONS

Probability Theory: Foundation for Data Science with Anne Dougherty

Learning Goals for Module 1

In this module, we’ll learn about the difference between a population and a sample and why probability is the foundation for statistics and data science. At the end of this Module, students should be able to:

➢ Explain why probability theory is relevant to statistics and data science.
➢ Describe what it means to predict the outcome of an experiment and organize the outcomes into sample spaces.
➢ Calculate probabilities of events using the Axioms of Probability.
➢ Understand permutations and combinations and be able to calculate probabilities when each simple event is equally likely.

Counting

Recall that the goal of probability is to assign some number, P(A), called the probability of event A, which will give a precise measure to the chance that A will occur. If a sample space, S, has N simple events, and if each of these events is equally likely to occur, then we need only count the number of events to find the probability.

For example, if S = {E1, E2, ..., EN} and if P(Ek) = 1/N for k = 1, 2, ..., N, and if A is an event in S, then

P(A) = (number of simple events in A)/N

Examples

Experiment: Roll a six-sided die twice.

S = {(i, j) | i, j ∈ {1, 2, 3, 4, 5, 6}}, |S| = 36 and each of the 36 outcomes of S is equally likely.

➢ Let A be the event of rolling a 1 on the first roll. P(A) =
➢ Let B be the event that the sum of the two rolls is 8. P(B) =
➢ Let C be the event that the value of the second roll is two more than the first roll. P(C) =

Permutations

Any ordered sequence of k objects taken from a set of n distinct objects is called a permutation of size k. Notation: Pk,n.

Example: Suppose an organization has 60 members. One person is selected at random to be the president, another person is selected as the vice-president, and a third is selected as the treasurer. How many ways can this be done? (This would be the cardinality of the sample space.)

Definition: n! = n(n − 1)(n − 2) ··· 3 · 2 · 1 for any positive integer n. By definition, we take 0! = 1.

Combinations

Given n distinct objects, any unordered subset of size k of the objects is called a combination. Notation: Ck,n

Example: Suppose we have 60 people and want to choose a 3-person team (order is not important). How many combinations are possible?

Example – continued

Ck,n = n! / (k! (n − k)!); this represents the number of combinations of size k chosen from n distinct objects.

Example - continued

Example: Suppose we have the same 60 people, 35 are female and 25 are male. We need to select a committee of 11 people.

➢ How many ways can such a committee be formed?
➢ What is the probability that a randomly selected committee will contain at least 5 men and at least 5 women? (Assume each committee is equally likely.)

Example: A city has bought 20 buses. Shortly after being put into service, some of them develop cracks in the frame. The buses are inspected and 8 have visible cracks.

➢ How many ways can the city select a sample of 5 for thorough inspection? (Assume each bus is equally likely to be chosen.)
➢ If 5 buses are chosen at random, find the probability that exactly 4 have cracks.
➢ If 5 buses are chosen at random, find the probability that at least 4 have cracks.
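The counting examples in these notes (the 60-member organization, the committee of 11, and the 20 buses) can be checked numerically. The sketch below is one possible way to do so in R and is not a worked solution taken from the notes. It assumes every selection is equally likely, as the examples state, and all variable names are illustrative.

```r
# Permutations: president, vice-president, treasurer from 60 members (order matters)
n_officers <- prod(58:60)        # 60 * 59 * 58 = 60! / 57!

# Combinations: 3-person team from 60 people (order does not matter)
n_teams <- choose(60, 3)

# Committee of 11 from 35 women and 25 men
n_committees <- choose(60, 11)
# "At least 5 men and at least 5 women" leaves two cases: 5 men + 6 women, or 6 men + 5 women
n_balanced <- choose(25, 5) * choose(35, 6) + choose(25, 6) * choose(35, 5)
p_balanced <- n_balanced / n_committees

# Buses: 20 buses, 8 with cracks, sample of 5
n_samples <- choose(20, 5)
p_exactly_4 <- choose(8, 4) * choose(12, 1) / n_samples
p_at_least_4 <- p_exactly_4 + choose(8, 5) * choose(12, 0) / n_samples

# The bus questions follow a hypergeometric distribution, so dhyper/phyper give a cross-check
p_exactly_4_check <- dhyper(4, m = 8, n = 12, k = 5)
p_at_least_4_check <- phyper(3, m = 8, n = 12, k = 5, lower.tail = FALSE)

n_officers   # 205320
n_teams      # 34220
n_samples    # 15504
```

Counting by hand with choose() keeps the link to P(A) = (number of simple events in A)/N visible, while dhyper and phyper are a shortcut for the same ratios.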

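Going back to the experiment of rolling a six-sided die twice, the rule P(A) = (number of simple events in A)/N can be applied by listing the sample space directly. This is a minimal sketch under the stated assumption that all 36 outcomes are equally likely; the data frame and column names are illustrative, not from the notes.

```r
# Sample space: all ordered pairs (first roll, second roll)
S <- expand.grid(first = 1:6, second = 1:6)
N <- nrow(S)   # cardinality |S|

# With equally likely outcomes, P(event) = (number of outcomes in the event) / N
p_A <- sum(S$first == 1) / N               # a 1 on the first roll
p_B <- sum(S$first + S$second == 8) / N    # the two rolls sum to 8
p_C <- sum(S$second == S$first + 2) / N    # second roll is two more than the first

c(A = p_A, B = p_B, C = p_C)
```

Enumeration is only practical for small sample spaces like this one; for larger problems the permutation and combination formulas do the counting instead.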