7 Inference L8 Unlocked

Uploaded by

moneeshbba2026
Copyright
© © All Rights Reserved

Postgraduate Diploma in Business Analytics (PGDBA)

2024-26 Batch

LECTURE 8
MODULE-II

Stat3: INFERENCE, PGDBA Programme, ISI, 2021 (December 30, 2021)
• Estimation in the context of business analytics
• Estimation as a data summarization and inferential tool
• Concepts of population, sample and estimators
• Criteria for good estimators
• Concepts of
  ◦ unbiasedness
  ◦ consistency
• Illustration of sample mean and sample proportion through Monte Carlo simulations
• Introduction to different methods of estimation
• Concepts of sampling distributions of a statistic
• Confidence intervals and their usage; examples in real-life analytic scenarios (5 weeks).

A population is a collection of all possible individuals, objects, or
measurements of interest.

A sample is a portion, or part, of the population of interest.

• Statistical inference makes propositions about a population using data drawn from the population through some sampling procedure.
• STATISTICAL INFERENCE consists of
  ◦ assuming a realistic statistical model of the process that generates the data
  ◦ deducing (statistical) propositions from the model.
• The conclusion of a statistical inference is a statistical proposition.
• Some common types of statistical proposition
  ◦ A POINT ESTIMATE: a particular value computed from the data that best approximates some parameter of interest
  ◦ AN INTERVAL ESTIMATE: an interval constructed from the data such that, under repeated sampling of such datasets, such intervals would contain the true parameter value with probability equal to the stated confidence level
  ◦ REJECTION OF A STATISTICAL HYPOTHESIS



• Statistical inference is nothing but INDUCTIVE LOGIC or REASONING.
• Inductive logic
  ◦ A collection of observations is synthesized to come up with a general principle.
  ◦ This is in contrast to deductive logic or reasoning, in which the conclusion is certain if the premises are correct.
  ◦ The truth of the conclusion of an inductive argument is only probable, based upon the evidence given.



Example of Inductive Logic
• Most Indians I have come across love spicy food.
• Therefore Indians probably love spicy food.
Example of Deductive Logic
• All viruses undergo mutations.
• SARS-CoV-2 is a virus.
• Therefore SARS-CoV-2 must undergo mutations.
https://www.stratechi.com/



• A statistical model is a set of assumptions concerning the generation of the observed data.
• The description of a statistical model usually emphasizes the role of population quantities of interest, about which we wish to draw inference.
• Three levels of modelling assumptions in statistics
  ◦ PARAMETRIC: The data-generation process is assumed to be fully described by a family of probability distributions involving a finite number of unknown parameters.
    For example, one may assume that it can be described by a $N(\mu, \sigma^2)$ distribution.
  ◦ NON-PARAMETRIC: The assumptions made about the data-generation process are much more general than in parametric statistics and may be minimal.
    For example, one may assume only that the data-generation process can be described by a continuous probability distribution.
  ◦ SEMI-PARAMETRIC: Intermediate between the fully parametric and non-parametric approaches.
    For example, one may assume that
    - the data-generation process is a continuous probability distribution with a finite mean, and
    - the mean in the population is a linear function of some covariate.



• A model for data collection which generates observations that are said to constitute a random sample from the population
• RANDOM SAMPLE
  ◦ Definition: A collection of random variables $X_1, X_2, \dots, X_n$ is said to be a random sample of size $n$ from a population characterized by a pmf/pdf $f(x)$ if
    - $X_1, X_2, \dots, X_n$ are mutually independent random variables
    - each $X_i$ has the same pmf/pdf $f(x)$
  ◦ $X_1, X_2, \dots, X_n$ are said to be independent and identically distributed random variables (or i.i.d. random variables, in short).
• A set of random observations $x_1, x_2, \dots, x_n$ of size $n$ from $f(x)$ is a realization of the random variables $X_1, X_2, \dots, X_n$.
Convention: Uppercase letters represent random variables; their lowercase counterparts represent observations on them.



• A random sample consists of independent members of a population drawn without bias.
• The statistical theory to be discussed henceforth is based on the premise that a random sample is available from the population.
• Other things being equal, the larger the sample, the more precise the inference from the data.
• Samples can be drawn in many ways.
• The analysis will depend on the sampling method used.



[Figure: Relationship between a population and a sample. The population is labelled as a mathematical model (parametric or non-parametric).]



• A statistic is any function of $X_1, X_2, \dots, X_n$, like
  ◦ sample mean: $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$
  ◦ sample variance: $S^2 = \frac{1}{n}\sum_{i=1}^{n} (X_i - \bar{X})^2$ or $\frac{1}{n-1}\sum_{i=1}^{n} (X_i - \bar{X})^2$
  ◦ sample standard deviation: $S = +\sqrt{\frac{1}{n}\sum_{i=1}^{n} (X_i - \bar{X})^2}$ or $+\sqrt{\frac{1}{n-1}\sum_{i=1}^{n} (X_i - \bar{X})^2}$
  ◦ order statistics: $X_{(1)}, X_{(2)}, \dots, X_{(n)}$, where $X_{(1)} \le X_{(2)} \le \dots \le X_{(n)}$
  ◦ sample maximum: $X_{(n)}$
  ◦ sample minimum: $X_{(1)}$
  ◦ sample range: $X_{(n)} - X_{(1)}$
• A statistic, being a function of random variables, is also a random variable.
• The sampling distribution of a statistic is its probability distribution.
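
The statistics listed above can be computed directly from a realized sample, and the sampling distribution of one of them can be approximated by simulation. The following is a minimal sketch only: the normal population with mean 50 and standard deviation 10 is an arbitrary assumption, and the function names `summarize` and `sample_means` are hypothetical, not part of the lecture.

```python
import random
import statistics


def summarize(sample):
    """Return common statistics of one realized sample x_1, ..., x_n."""
    n = len(sample)
    mean = sum(sample) / n
    ss = sum((x - mean) ** 2 for x in sample)  # sum of squared deviations
    ordered = sorted(sample)                   # order statistics x_(1) <= ... <= x_(n)
    return {
        "mean": mean,
        "var_n": ss / n,                # sample variance with divisor n
        "var_n_minus_1": ss / (n - 1),  # sample variance with divisor n - 1
        "min": ordered[0],
        "max": ordered[-1],
        "range": ordered[-1] - ordered[0],
    }


def sample_means(n, replications, mu=50.0, sigma=10.0, seed=1):
    """Draw `replications` samples of size n from an assumed N(mu, sigma^2)
    population and return the sample mean of each one."""
    rng = random.Random(seed)
    return [
        statistics.fmean(rng.gauss(mu, sigma) for _ in range(n))
        for _ in range(replications)
    ]


if __name__ == "__main__":
    rng = random.Random(0)
    one_sample = [rng.gauss(50.0, 10.0) for _ in range(25)]
    print(summarize(one_sample))

    # The simulated means approximate the sampling distribution of the mean;
    # their spread should be close to sigma / sqrt(n) = 10 / 5 = 2.
    means = sample_means(n=25, replications=5000)
    print(statistics.fmean(means), statistics.stdev(means))
```

Each call to `sample_means` yields a different realized value of the same statistic per replication, which is what makes the statistic a random variable with its own distribution.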



• In the singular sense, "statistics" refers to the discipline of Statistics, that is, a body of scientific methods dealing with the collection and analysis of numerical data.
• In the plural sense, it refers to
  ◦ more than one statistic, as defined in the previous slide, OR
  ◦ more than one fact or piece of information obtained from a study of a large quantity of numerical data.

• Monte Carlo methods are a broad class of computational algorithms that rely on repeated random sampling to obtain numerical results.
• They use randomness to solve problems that might be deterministic in principle.
• They are often used in physical and mathematical problems.
• They are most useful when it is difficult or impossible to use other approaches.
• Monte Carlo methods are mainly used for
  ◦ optimization
  ◦ numerical integration
  ◦ generating random samples from a probability distribution.



• Numerical integration: computing a definite integral
  $$I(a, b) = \int_a^b g(x)\,dx$$
  when there is no closed-form expression for the corresponding indefinite integral $\int g(x)\,dx$.
• Solution: Select $n$ points $x_1, x_2, \dots, x_n$ at random (that is, generate $n$ random numbers) from the interval $[a, b]$ and estimate $I(a, b)$ by $\hat{I}(n) = (b - a)\,\frac{1}{n}\sum_{i=1}^{n} g(x_i)$.
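
The estimator $(b - a)$ times the average of $g$ at $n$ uniform random points can be sketched in a few lines. The integrand $e^{-x^2}$ on $[0, 1]$ is an illustrative choice, not one from the lecture; it has no elementary antiderivative, but its integral can be expressed through the error function, which gives a reference value to compare against. The function name `mc_integrate` is hypothetical.

```python
import math
import random


def mc_integrate(g, a, b, n, seed=0):
    """Estimate the integral of g over [a, b] as (b - a) times the mean of g
    evaluated at n points drawn uniformly from [a, b]."""
    rng = random.Random(seed)
    total = sum(g(rng.uniform(a, b)) for _ in range(n))
    return (b - a) * total / n


if __name__ == "__main__":
    # Reference value: integral of exp(-x^2) over [0, 1] = (sqrt(pi) / 2) * erf(1).
    reference = math.sqrt(math.pi) / 2 * math.erf(1.0)
    for n in (100, 10_000, 100_000):
        estimate = mc_integrate(lambda x: math.exp(-x * x), 0.0, 1.0, n)
        print(n, estimate, abs(estimate - reference))
```

The error of such an estimate typically shrinks in proportion to $1/\sqrt{n}$, so each extra digit of accuracy costs roughly a hundred times more points.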



• Random numbers: any set of numbers exhibiting statistical randomness, that is, not exhibiting any discernible patterns or regularities.
• Generation of random numbers is at the heart of Monte Carlo methods.
• Generation of random numbers
  ◦ In earlier days, before the advent of computers, random number tables were used.
  ◦ Almost all software for statistical computation has built-in random number generators.
  ◦ Strictly speaking, these are pseudo-random number generators, since they use deterministic algorithms such as the linear congruential generator $x_{n+1} = (a x_n + b) \bmod M$, with $x_0$ as the seed and $M$ a large positive integer.
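
A linear congruential generator of the form $x_{n+1} = (a x_n + b) \bmod M$ can be sketched as below. The tiny parameters in the demonstration ($a = 5$, $b = 3$, $M = 16$) are chosen only so the full cycle is visible by eye; the default constants in `uniform_stream` are one commonly cited set and are an assumption for illustration, not a recommendation. Both function names are hypothetical.

```python
from itertools import islice


def lcg(seed, a, b, m):
    """Yield x_{n+1} = (a * x_n + b) mod m indefinitely, starting from x_0 = seed."""
    x = seed
    while True:
        x = (a * x + b) % m
        yield x


def uniform_stream(seed, a=1103515245, b=12345, m=2**31):
    """Scale LCG output to [0, 1). The default constants are an illustrative
    assumption; production generators use more elaborate algorithms."""
    for x in lcg(seed, a, b, m):
        yield x / m


if __name__ == "__main__":
    # With M = 16 the sequence visits every value 0..15 once and then repeats,
    # which makes the deterministic, periodic nature of the generator visible.
    print(list(islice(lcg(7, 5, 3, 16), 20)))
    # The same seed always reproduces the same "random" numbers.
    print(list(islice(uniform_stream(42), 3)))
    print(list(islice(uniform_stream(42), 3)))
```

Reproducibility from a fixed seed is what makes simulation results repeatable, and it is also why the numbers are only pseudo-random.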



Let $x_1, x_2, \dots, x_n, \dots$ be a sequence of random numbers in $[0,1]$.
• Ideally, plots of $(x_i, x_{i+1})$ should exhibit a complete absence of any kind of pattern, that is, the points should be truly randomly distributed.
• Pseudo-random numbers can exhibit patterns, particularly when the generator is poorly chosen.
[Figure: two panels, "Truly random numbers" and "Pseudo-random numbers". https://www.ics.uci.edu/~goodrich/]



• Simulation
  ◦ The imitation of the behaviour of a real-world process or system over time.
  ◦ Requires the use of models which represent the key characteristics of the system or process.
  ◦ Generally, computers are used to execute the simulation.
• Stochastic simulation
  ◦ A simulation of a system that involves random variables, that is, quantities that change stochastically (randomly) according to specified probabilities.
  ◦ Realizations of these random variables are generated and inserted into a model of the system.
  ◦ Outputs of the model are recorded.
  ◦ The process is repeated with a new set of random values until a sufficient amount of data is generated.
  ◦ The distribution of the outputs provides insights into the system, such as the most probable estimates of parameters.
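
The generate, insert, record, repeat loop described above might look like the sketch below. The "system" is an entirely hypothetical toy model invented for illustration: daily demand is assumed exponential with mean 100, a shop stocks 100 units at a unit cost of 2 and sells at a price of 5. None of these numbers or names come from the lecture.

```python
import math
import random
import statistics


def simulate_daily_profit(rng, mean_demand=100.0, stock=100, price=5.0, unit_cost=2.0):
    """One run of the toy model: a random demand goes in, a profit comes out."""
    # Step 1: generate a realization of the random input (assumed exponential demand).
    demand = rng.expovariate(1.0 / mean_demand)
    # Step 2: insert it into the model of the system.
    units_sold = min(demand, stock)
    # Step 3: return the output so that it can be recorded.
    return price * units_sold - unit_cost * stock


def run_simulation(replications=10_000, seed=0):
    """Step 4: repeat with fresh random values and collect all outputs."""
    rng = random.Random(seed)
    return [simulate_daily_profit(rng) for _ in range(replications)]


if __name__ == "__main__":
    profits = sorted(run_simulation())
    # Step 5: summarize the distribution of the outputs.
    print("mean profit:", statistics.fmean(profits))
    print("median profit:", statistics.median(profits))
    print("share of loss-making days:", sum(p < 0 for p in profits) / len(profits))
    # For this toy model the expected profit is known in closed form,
    # which gives a check on the simulation: 5 * 100 * (1 - e^{-1}) - 200.
    print("analytic mean:", 5.0 * 100.0 * (1 - math.exp(-1.0)) - 200.0)
```

In realistic models no closed form is available, and the simulated distribution of outputs is the only practical summary of the system's behaviour.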



• The random variables used for simulating a stochastic model are generated on a computer with a random number generator (RNG).
• The sequence of numbers generated by an RNG generally takes values in $[0,1]$.
  ◦ It can be looked upon as a realization of the uniform random variable over $[0,1]$, that is, of the $U(0,1)$ distribution.
  ◦ These numbers can be transformed into realizations of random variables with other, specified probability distributions.
https://genedan.com/



Let $f(x)$ be the pdf/pmf to be simulated from, with corresponding cumulative distribution function (CDF) $F(x)$.

• Some commonly used methods for simulating from $f(x)$
  ◦ Inversion method
  ◦ Acceptance-Rejection method
  ◦ Box-Muller method for normal variables
  ◦ Special methods for specific distributions, based on their properties.



• Inversion method: it can easily be shown that, for a continuous random variable $X$ with CDF $F$, the random variable $F(X)$ has the $U(0,1)$ distribution.
• If the inverse function of $F$, that is, $F^{-1}$, exists, then $F^{-1}(U)$ has CDF $F$, where $U \sim U(0,1)$.
• Thus, for any realization $u$ from the $U(0,1)$ distribution, $F^{-1}(u)$ is a realization from $f(x)$.
• Example
  ◦ $f(x) = \frac{1}{\mu} e^{-x/\mu}$, $x > 0$, that is, the exponential distribution with mean $\mu$.
  ◦ Here $F(x) = 1 - e^{-x/\mu}$ and $F^{-1}(u) = -\mu \log_e(1 - u)$.
  ◦ Hence $-\mu \log_e(1 - u)$ or, equivalently, $-\mu \log_e u$ (since $1 - U$ is also $U(0,1)$) is a realization from $f(x)$.
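
For the exponential example, solving $u = 1 - e^{-x/\mu}$ for $x$ gives $x = -\mu \log_e(1 - u)$, and the sketch below uses exactly that expression. The function name `exponential_by_inversion` and the choice $\mu = 2$ in the demonstration are illustrative assumptions.

```python
import math
import random
import statistics


def exponential_by_inversion(mu, n, seed=0):
    """Generate n realizations from the exponential distribution with mean mu
    via x = F^{-1}(u) = -mu * log(1 - u), where u is drawn from U(0, 1)."""
    rng = random.Random(seed)
    # rng.random() lies in [0, 1), so 1 - u lies in (0, 1] and the log is defined.
    return [-mu * math.log(1.0 - rng.random()) for _ in range(n)]


if __name__ == "__main__":
    draws = exponential_by_inversion(mu=2.0, n=100_000)
    # For an exponential distribution both the mean and the standard deviation equal mu.
    print(statistics.fmean(draws), statistics.stdev(draws))
```

Comparing the sample mean and standard deviation of the draws with $\mu$ is a quick sanity check that the inverse CDF has been derived with the right parameterization.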



• The inversion method is easily extended to the case of discrete random variables, though $F^{-1}(u)$ is not uniquely defined.
• Here $\Pr(X = x_{i+1}) = F(x_{i+1}) - F(x_i)$.
• So, for $u \sim U(0,1)$, if $F(x_i) < u \le F(x_{i+1})$, take $x_{i+1}$ to be a realization from $f$.
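
The discrete rule, returning the value whose CDF step contains $u$, can be sketched with a cumulative sum and a binary search. The three-point distribution used in the demonstration is an arbitrary assumption, and `discrete_by_inversion` is a hypothetical name.

```python
import random
from bisect import bisect_left
from itertools import accumulate


def discrete_by_inversion(values, probs, n, seed=0):
    """Draw n realizations of a discrete random variable taking `values`
    (in increasing order) with probabilities `probs`."""
    rng = random.Random(seed)
    cdf = list(accumulate(probs))  # F(x_1), F(x_2), ..., F(x_k)
    draws = []
    for _ in range(n):
        u = rng.random()
        # Smallest index j with u <= F(x_j), so that F(x_{j-1}) < u <= F(x_j).
        j = bisect_left(cdf, u)
        # Guard against the last CDF value falling slightly below 1 due to rounding.
        draws.append(values[min(j, len(values) - 1)])
    return draws


if __name__ == "__main__":
    values, probs = [0, 1, 2], [0.2, 0.5, 0.3]
    draws = discrete_by_inversion(values, probs, 100_000)
    for v, p in zip(values, probs):
        print(v, p, draws.count(v) / len(draws))
```

Each value is selected with probability equal to the height of its jump in the CDF, which is why the observed frequencies should track `probs`.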



• Acceptance-Rejection method: consider another pmf/pdf $g(\cdot)$ which has the same support as $f(\cdot)$.
• Let $c = \max_x \frac{f(x)}{g(x)} < \infty$, that is, $f(x) \le c\,g(x)$ for all $x$, with $c > 1$.
• Generate $u$ from $U(0,1)$ and $v$ from $g$, independently.
• If $u < \frac{1}{c}\,\frac{f(v)}{g(v)}$, return $v$ as a realization from $f$; else go to the previous step.
[Figure: plot of $f(x)$ and the envelope $c\,g(x)$ against $x$, with "Accept" and "Reject" regions marked.]



• Simulation from the $\mathrm{Beta}(2,2)$ density $f(x) = 6x(1 - x)$, $0 < x < 1$.
• Take the $U(0,1)$ density as the proposal density $g$ and $c = 1.5$.
• Choice of $c$ is important since the proportion of samples from $g$ that are accepted is $\frac{1}{c}$, so a proportion $1 - \frac{1}{c}$ is rejected.
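
The Beta(2,2) example with a $U(0,1)$ proposal can be sketched as follows. Since $g(v) = 1$ on $(0, 1)$, the acceptance test $u < f(v) / (c\,g(v))$ reduces to $u < 6v(1 - v)/1.5$. The function names are hypothetical, and the sketch also counts proposals so that the observed acceptance rate can be compared with $1/c = 2/3$.

```python
import random
import statistics


def beta22_density(x):
    """Target density f(x) = 6 x (1 - x) on (0, 1)."""
    return 6.0 * x * (1.0 - x)


def sample_beta22(n, c=1.5, seed=0):
    """Return n accepted draws from Beta(2,2) and the number of proposals used."""
    rng = random.Random(seed)
    accepted, proposals = [], 0
    while len(accepted) < n:
        v = rng.random()  # proposal from g = U(0, 1), so g(v) = 1
        u = rng.random()  # independent U(0, 1) number for the acceptance test
        proposals += 1
        if u < beta22_density(v) / c:  # u < f(v) / (c * g(v))
            accepted.append(v)
    return accepted, proposals


if __name__ == "__main__":
    draws, proposals = sample_beta22(50_000)
    # Beta(2,2) has mean 1/2 and variance 1/20.
    print(statistics.fmean(draws), statistics.pvariance(draws))
    print("acceptance rate:", len(draws) / proposals)
```

A larger $c$ than necessary would still give correct draws but waste more proposals, which is why the smallest valid $c$ is preferred.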



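• The Box-Muller method generates a pair of independent standard normal variates ($N(0,1)$) $Z_0$ and $Z_1$, given a pair of independent uniformly distributed random numbers $U_1$ and $U_2$, as
  $$Z_0 = \sqrt{-2\log_e U_1}\,\cos(2\pi U_2), \qquad Z_1 = \sqrt{-2\log_e U_1}\,\sin(2\pi U_2),$$
  or, equivalently, as
  $$Z_0 = R\cos\Theta, \qquad Z_1 = R\sin\Theta, \qquad \text{where } R^2 = -2\log_e U_1 \text{ and } \Theta = 2\pi U_2.$$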
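
A minimal sketch of the Box-Muller transform in its standard trigonometric form is given below; the formula is stated in the code comments, and the function names are hypothetical. The uniform input to the logarithm is shifted to $(0, 1]$ to avoid taking $\log 0$.

```python
import math
import random
import statistics


def box_muller(u1, u2):
    """Map two independent U(0,1) numbers to two independent N(0,1) variates:
    Z0 = sqrt(-2 ln u1) * cos(2 pi u2),  Z1 = sqrt(-2 ln u1) * sin(2 pi u2)."""
    r = math.sqrt(-2.0 * math.log(u1))  # radius
    theta = 2.0 * math.pi * u2          # angle
    return r * math.cos(theta), r * math.sin(theta)


def standard_normals(n_pairs, seed=0):
    """Generate n_pairs of (Z0, Z1) and return them as two lists."""
    rng = random.Random(seed)
    z0s, z1s = [], []
    for _ in range(n_pairs):
        u1 = 1.0 - rng.random()  # lies in (0, 1], so log(u1) is defined
        u2 = rng.random()
        z0, z1 = box_muller(u1, u2)
        z0s.append(z0)
        z1s.append(z1)
    return z0s, z1s


if __name__ == "__main__":
    z0s, z1s = standard_normals(50_000)
    print(statistics.fmean(z0s), statistics.stdev(z0s))
    print(statistics.fmean(z1s), statistics.stdev(z1s))
    # Independence implies zero correlation; the mean product should be near 0.
    print(statistics.fmean(a * b for a, b in zip(z0s, z1s)))
```

A general $N(\mu, \sigma^2)$ variate can then be obtained as $\mu + \sigma Z$.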
