The document outlines a series of experiments related to distribution theory and non-parametric statistics for an M.Sc. Statistics course at the Central University of South Bihar. Each experiment includes aims, methodologies using R software, and conclusions drawn from statistical analyses. The experiments cover topics such as random sampling from exponential distributions, normal distribution probabilities, and correlation between variables derived from uniform distributions.


DISTRIBUTION THEORY AND NON-PARAMETRIC STATISTICS

(STS81DC00104)

ASSIGNMENT - 1

SEMESTER – I

M.Sc. STATISTICS

[SESSION: 2024-2026]

DEPARTMENT OF STATISTICS
SCHOOL OF MATHEMATICS, STATISTICS AND COMPUTER SCIENCE
CENTRAL UNIVERSITY OF SOUTH BIHAR

INDEX

1. Experiment 1 – Random sampling from the exponential distribution
2. Experiment 2 – Probabilities for a normal distribution
3. Experiment 3 – Box-Muller transformation: correlation and normality
4. Experiment 4 – Run test for randomness (win-loss record)
5. Experiment 5 – One-sample sign test (median mpg)
6. Experiment 6 – Paired sign test (drug effectiveness)
7. Experiment 7 – Wilcoxon signed-rank test (IQ scores, n = 15)
8. Experiment 8 – Wilcoxon signed-rank test (IQ scores, n = 32)
9. Experiment 9 – Paired sign test (blood sugar reduction)
10. Experiment 10 – ECDF of a uniform sample
11. Experiment 11 – Kolmogorov-Smirnov test for normality (water pH)
12. Experiment 12 – Two-sample Kolmogorov-Smirnov test (battery lifetimes)
13. Experiment 13 – Kolmogorov-Smirnov goodness of fit (exponential)
14. Experiment 14 – Mann-Whitney U test (teaching methods)
EXPERIMENT – 1
AIM: Generate a random sample of size 30 from an exponential distribution with mean 1/θ, where θ = 0.5 and 2. Also calculate their means.
Source: R Software

Generate Random Samples

• We begin by generating random samples from an exponential distribution using the rexp() function in R.
• The function requires a sample size and a rate parameter, where the rate is θ.

Source Code and Output:

Sample Generation in R
1. For θ = 0.5
> #Generate a random sample of size 30 from exponential distribution with parameter theta = 0.5
> set.seed(123)
> theta1 <- 0.5
> sample1 <- rexp(30, rate = theta1)
> sample1
[1] 2.05573923 0.56933692 3.12610378 0.08417659 0.19726178 0.19713862 0.56077032 0.59157392
[9] 1.94485930 1.84805445 3.28480905 3.23982554 5.07229108 3.04309925 0.76002841 0.47702593
[17] 0.93297485 0.08454290 0.63935380 1.28722184 1.14512621 0.43211101 8.99734680 3.71418111
[25] 1.37091728 2.87890525 3.46230797 2.48956660 2.92660112 3.07478796
> mean1 <- mean(sample1)
> mean1
[1] 2.016268

2. For θ = 2

> #Generate a random sample of size 30 from exponential distribution with parameter theta = 2
> set.seed(123) # For reproducibility (the output below shows the seed was reset, so sample2 is sample1 scaled by the rate ratio)
> theta2 <- 2
> sample2 <- rexp(30, rate = theta2)
> sample2
[1] 0.51393481 0.14233423 0.78152594 0.02104415 0.04931544 0.04928466 0.14019258 0.14789348
[9] 0.48621482 0.46201361 0.82120226 0.80995638 1.26807277 0.76077481 0.19000710 0.11925648
[17] 0.23324371 0.02113573 0.15983845 0.32180546 0.28628155 0.10802775 2.24933670 0.92854528
[25] 0.34272932 0.71972631 0.86557699 0.62239165 0.73165028 0.76869699
> mean2 <- mean(sample2)
> mean2
[1] 0.504067

Conclusion
By generating random samples from an exponential distribution with specified parameters and calculating their means:
1. For θ = 0.5:
o Generated a random sample of size 30.
o The theoretical mean is 1/0.5 = 2
o The calculated mean of the sample is approximately 2.016268.
2. For θ = 2:
o Generated a random sample of size 30.
o The theoretical mean is 1/2 = 0.5.
o The calculated mean of the sample is approximately 0.504067.
These results show that the calculated means of the samples are close to their respective
theoretical means, demonstrating how random sampling can be used to estimate the
characteristics of probability distributions.
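The same check can be run outside R. The sketch below uses only Python's standard library (function and variable names are illustrative): as the sample size grows, the sample mean of exponential draws settles near the theoretical mean 1/θ.

```python
import random

random.seed(123)  # fix the stream for reproducibility, as in the R code

def sample_mean_exponential(rate, n):
    """Draw n exponential variates with the given rate and return their mean."""
    draws = [random.expovariate(rate) for _ in range(n)]
    return sum(draws) / n

# With n = 30 (as in the experiment) the estimate is rough; with a larger n
# it settles close to the theoretical mean 1/rate.
mean_smallrate = sample_mean_exponential(0.5, 10000)  # theoretical mean 1/0.5 = 2
mean_largerate = sample_mean_exponential(2.0, 10000)  # theoretical mean 1/2 = 0.5
print(mean_smallrate, mean_largerate)
```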

EXPERIMENT – 2
AIM: Let X ~ N(12, 16), with mean 12 and standard deviation 4. Find:
1. P[X > 20] = 1 - P[X ≤ 20] = 1 - F(20)
2. P[X < 5] = P[X ≤ 5] = F(5)
3. P[12 ≤ X ≤ 16] = F(16) - F(12)
where F(x) = P[X ≤ x] is the CDF of X.
Source: R Software
1. Define the Normal Distribution:
• We'll use the pnorm function in R, which gives the cumulative distribution function (CDF) for a normal distribution.

2. Calculate Probabilities:
• P(X > 20)
• P(X < 5)
• P(12 ≤ X ≤ 16)

Source Code and Output:


> # Define the parameters of the normal distribution
> mean <- 12
> sd <- 4
> # 1. Calculate P (X > 20) = 1 - P(X <= 20)
> P_X_greater_20 <- 1 - pnorm(20, mean = mean, sd = sd)
> P_X_greater_20
[1] 0.02275013
> # 2. Calculate P (X < 5) = P (X <= 5)
> P_X_less_5 <- pnorm(5, mean = mean, sd = sd)
> P_X_less_5
[1] 0.04005916
> # 3. Calculate P (12 <= X <= 16)
> P_12_to_16 <- pnorm(16, mean = mean, sd = sd) - pnorm(12, mean = mean, sd = sd)
> P_12_to_16
[1] 0.3413447
> # Output the results
> cat ("P (X > 20) =", P_X_greater_20, "\n")
P(X > 20) = 0.02275013
> cat("P(X < 5) =", P_X_less_5, "\n")
P(X < 5) = 0.04005916
> cat("P(12 <= X <= 16) =", P_12_to_16, "\n")
P(12 <= X <= 16) = 0.3413447

Conclusion:

For the normal distribution X ~ N(12, 16), with mean 12 and standard deviation 4, we have calculated the following probabilities using R:

1. Probability P(X > 20):
o P(X > 20) = 1 - P(X ≤ 20)
o Using the cumulative distribution function (CDF) F of X, we find P(X > 20) = 1 - F(20) = 0.02275.
2. Probability P(X < 5):
o P(X < 5) = P(X ≤ 5)
o Using the CDF, we find P(X < 5) = F(5) = 0.04006.
3. Probability P(12 ≤ X ≤ 16):
o P(12 ≤ X ≤ 16) = F(16) - F(12)
o Using the CDF, we find P(12 ≤ X ≤ 16) = 0.8413 - 0.5 = 0.3413.

Results:

• P(X > 20) = 0.02275
• P(X < 5) = 0.04006
• P(12 ≤ X ≤ 16) = 0.3413
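These values can also be verified without R. The sketch below is a minimal Python check that builds the normal CDF from math.erf (norm_cdf is an illustrative helper, not part of the assignment):

```python
import math

def norm_cdf(x, mean=12.0, sd=4.0):
    """CDF of N(mean, sd^2) via the error function: F(x) = (1 + erf(z/sqrt(2)))/2."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))

p_greater_20 = 1.0 - norm_cdf(20)         # P(X > 20)
p_less_5 = norm_cdf(5)                    # P(X < 5)
p_12_to_16 = norm_cdf(16) - norm_cdf(12)  # P(12 <= X <= 16)
print(p_greater_20, p_less_5, p_12_to_16)
```

The three results agree with the pnorm values above to the printed precision.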

EXPERIMENT – 3
AIM: Let u and v be independent uniform variates following U(0, 1), and let

R = √(-2 log u), θ = 2πv, x = R cos θ, y = R sin θ.

Then x and y independently follow a normal distribution with mean 0 and variance 1.

1. To check the relationship between x and y, compute the correlation corr(x, y).

2. To check whether x and y are normal, make histograms.

SOURCE: R Software

We generate the variables u and v, compute x and y, calculate the correlation


between x and y, and check for their normality by making histograms:

1. Generate the variables u and v following uniform distribution U(0,1).

2. Compute R and θ.

3. Calculate x and y.

4. Calculate the correlation between x and y.

5. Check the normality of x and y by making histograms


> # Number of samples
> n <- 1000
>
> # Generate u and v
> u <- runif(n)
> v <- runif(n)
>
>
> # Compute R and θ
> R <- sqrt(-2 * log(u))
> theta <- 2 * pi * v
>
>
> # Calculate x and y
> x <- R * cos(theta)
> y <- R * sin(theta)
>
> # Calculate correlation between x and y
> correlation <- cor(x, y)
> print(paste("Correlation between x and y:", correlation))
[1] "Correlation between x and y: -0.0726968617306446"

>
>
> # Check normality of x and y by making histograms
> par(mfrow=c(1,2)) # Set up the plotting area to show two plots side by side
>
> # Histogram for x
> hist(x, main="Histogram of x", xlab="x", col="lightblue", breaks=30)
>
> # Histogram for y
> hist(y, main="Histogram of y", xlab="y", col="lightgreen", breaks=30)

Conclusion:

Generating variables u and v following a uniform distribution U(0,1), and


calculating x and y from these variables, we obtained a correlation value and
checked the normality of x and y using histograms. Here's a summary:

1. Correlation between x and y: The calculated correlation value was very
close to 0, indicating that x and y are uncorrelated, as expected.
2. Normality Check:
o The histograms of x and y showed a bell-shaped distribution,
indicating that both x and y approximately follow a normal
distribution with mean 0 and variance 1.

Thus, we can conclude that the transformation x = R cos θ and y = R sin θ, derived from independent uniform variables u and v, results in variables x and y that are independently normally distributed with mean 0 and variance 1.
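As an R-independent illustration, the Box-Muller transformation can be repeated in plain Python; with a few thousand draws the correlation stays near 0 and the sample mean and variance of x stay near 0 and 1 (the seed and sample size here are arbitrary choices for the check):

```python
import math
import random

random.seed(123)
n = 5000  # sample size chosen arbitrarily for this check

# Box-Muller: two independent U(0,1) variates give two independent N(0,1) variates
u = [random.random() for _ in range(n)]
v = [random.random() for _ in range(n)]
x = [math.sqrt(-2 * math.log(ui)) * math.cos(2 * math.pi * vi) for ui, vi in zip(u, v)]
y = [math.sqrt(-2 * math.log(ui)) * math.sin(2 * math.pi * vi) for ui, vi in zip(u, v)]

def corr(a, b):
    """Pearson correlation computed from scratch."""
    m = len(a)
    ma, mb = sum(a) / m, sum(b) / m
    cov = sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))
    va = sum((ai - ma) ** 2 for ai in a)
    vb = sum((bi - mb) ** 2 for bi in b)
    return cov / math.sqrt(va * vb)

r_xy = corr(x, y)                                   # close to 0
mean_x = sum(x) / n                                 # close to 0
var_x = sum(xi ** 2 for xi in x) / n - mean_x ** 2  # close to 1
print(r_xy, mean_x, var_x)
```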

EXPERIMENT-4
AIM: The win-loss record of a certain basketball team for their 50 consecutive games was as follows:
6W L 6W L W L 3W 2L 4W L 3W 2L 6W 2L 2W 3L 2W L 3W

Apply the run test to test that the sequence of win and loss is random.

Source: R Software
Hypothesis:
Ho: The sequence of win and loss is random
H1: The sequence of win and loss is not random.

Source Code and Output:

#Make a vector
> dat=c('W','W','W','W','W','W','L','W','W','W','W','W','W','L','W','L','W','W','W','L','L','W','W','W','W','L','W','W','W','L','L','W','W','W','W','W','W','L','L','W','W','L','L','L','W','W','L','W','W','W')

> #Now calculate no. of win and no. loss


> n1=0
> n2=0
> for(i in dat){if(i=="W"){n1=n1+1}else{n2=n2+1}}
> n=n1+n2
> print(n1);print(n2);print(n)
[1] 36
[1] 14
[1] 50

> #Now calculate run.


> count=0
> for(i in seq(1,length(dat)-1)){
+ first=dat[i]
+ second=dat[i+1]
+ if(first !=second){
+ count=count+1
+ }
+ }
> count=count+1
> G=count
> print(G)
[1] 19

> #For large sample, now calculate mean and Standard deviation
> mean_G=((2*n1*n2)/n)+1
> print(mean_G)
[1] 21.16
> sd_G= sqrt((2*n1*n2*((2*n1*n2)-n1-n2))/((n^2)*(n-1)))

> print(sd_G)
[1] 2.807663

> #Calculate Test statistics.


> Z=(G-mean_G)/sd_G
> print(Z)
[1] -0.7693231

> mod_Z=abs(Z)
> print(mod_Z)
[1] 0.7693231

> #calculate p value


> #for odd value of G,G=(2*k)+1
> k=(G-1)/2
> p_value=((choose(n1-1,k)*choose(n2-1,k))+(choose(n2-1,k)*choose(n1-1,k)))/(choose(n,n1))
> print(p_value)
[1] 0.1076602

Conclusion:

At α = 0.05, p* = 0.1076602.
Since p* is greater than α (p* > α), we fail to reject the null hypothesis and conclude that the sequence of wins and losses is random.
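The run count and the large-sample statistic can be verified independently of R. The Python sketch below rebuilds the sequence from the run-length description in the aim and reproduces n1 = 36, n2 = 14, G = 19, and Z ≈ -0.7693:

```python
import math

# Rebuild the 50-game sequence from the run-length description in the aim:
# 6W L 6W L W L 3W 2L 4W L 3W 2L 6W 2L 2W 3L 2W L 3W
runs = [(6, 'W'), (1, 'L'), (6, 'W'), (1, 'L'), (1, 'W'), (1, 'L'), (3, 'W'),
        (2, 'L'), (4, 'W'), (1, 'L'), (3, 'W'), (2, 'L'), (6, 'W'), (2, 'L'),
        (2, 'W'), (3, 'L'), (2, 'W'), (1, 'L'), (3, 'W')]
seq = ''.join(c * k for k, c in runs)

n1 = seq.count('W')   # 36 wins
n2 = seq.count('L')   # 14 losses
n = n1 + n2           # 50 games
# Number of runs = number of adjacent unequal pairs, plus one
G = 1 + sum(1 for a, b in zip(seq, seq[1:]) if a != b)

mean_G = 2 * n1 * n2 / n + 1
sd_G = math.sqrt(2 * n1 * n2 * (2 * n1 * n2 - n) / (n ** 2 * (n - 1)))
Z = (G - mean_G) / sd_G
print(n1, n2, G, mean_G, sd_G, Z)
```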

EXPERIMENT – 5
AIM: The following data show the mileage per gallon (mpg) obtained with 40 new Honda cars:
24.1, 25, 24.8, 24.3, 25.3, 24.2, 23.6, 24.5, 24, 23, 23.8, 23.3, 24.5, 24.6, 24, 25.2, 27.7, 24.1, 24.6, 24.9, 24.1, 25.8, 24.2, 24.2, 24.1, 25.6, 24.5, 25.1, 24.6, 24.3, 25.2, 24.7, 24.4, 23.2, 25.2, 24.4, 24.2, 24.8, 23.3, 24.9

Perform a test to see if the median mpg exceeds 24.45 at the 5% level of significance.
SOURCE: R Software
Hypothesis:
Ho: The population median is 24.45
H1: The population median is greater than 24.45

Source Code and Output:


> ##one sample test
> #Using Right Tailed
> Y=c(24.1,25,24.8,24.3,25.3,24.2,23.6,24.5,24,23,23.8,23.3,24.5,24.6,24,25.2,27.7,24.1,24.6,24.9,24.1,25.8,24.2,24.2,24.1,25.6,24.5,25.1,24.6,24.3,25.2,24.7,24.4,23.2,25.2,24.4,24.2,24.8,23.3,24.9)
> install.packages("nonpar")
WARNING: Rtools is required to build R packages but is not currently installed. Please download and install the appropriate version of Rtools before proceeding:

https://cran.rstudio.com/bin/windows/Rtools/
Installing package into ‘C:/Users/Anshu/AppData/Local/R/win-library/4.4’
(as ‘lib’ is unspecified)
trying URL 'https://cran.rstudio.com/bin/windows/contrib/4.4/nonpar_1.0.2.zip'
Content type 'application/zip' length 42555 bytes (41 KB)
downloaded 41 KB

package ‘nonpar’ successfully unpacked and MD5 sums checked

The downloaded binary packages are in
C:\Users\Anshu\AppData\Local\Temp\Rtmp0y2xEL\downloaded_packages
> library(nonpar)
> signtest(Y,m=median(Y),alternative="greater")

Large Sample Approximation for the Sign Test

H0: The population median is = 24.45


HA: The population median is greater than 24.45

B = 20

Significance Level = 0.05

The p-value is 0.562816469418554

Conclusion:
Since the p-value (0.5628) is greater than 0.05, we fail to reject the null hypothesis and conclude that there is no significant evidence that the median mpg exceeds 24.45.
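The large-sample output above can be reproduced by hand: count the observations above the hypothesized median 24.45 and apply a normal approximation to the binomial sign statistic. The Python sketch below assumes (as the reported p-value suggests) that the nonpar package applies a 0.5 continuity correction; under that assumption it reproduces B = 20 and p ≈ 0.5628.

```python
import math

mpg = [24.1, 25, 24.8, 24.3, 25.3, 24.2, 23.6, 24.5, 24, 23, 23.8, 23.3, 24.5,
       24.6, 24, 25.2, 27.7, 24.1, 24.6, 24.9, 24.1, 25.8, 24.2, 24.2, 24.1,
       25.6, 24.5, 25.1, 24.6, 24.3, 25.2, 24.7, 24.4, 23.2, 25.2, 24.4, 24.2,
       24.8, 23.3, 24.9]
m0 = 24.45

B = sum(1 for x in mpg if x > m0)    # observations above the hypothesized median
n = sum(1 for x in mpg if x != m0)   # ties with m0 would be dropped (none here)

# Right-tailed large-sample sign test with a 0.5 continuity correction
z = (B - 0.5 - n / 2) / math.sqrt(n / 4)
p_value = 1 - 0.5 * (1 + math.erf(z / math.sqrt(2)))
print(B, p_value)
```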

EXPERIMENT – 6
AIM: Use the sign test to test whether the new drug is effective for the given set of data.
Before: 30 28 34 35 14 42 33 38 34 45 28 27 25 41 36
After:  32 29 33 32 37 43 40 41 37 44 27 33 30 38 36

Source: R Software
Hypothesis:
H0: The new drug has no effect. (The median difference is zero.)
H1: The new drug has an effect. (The median difference is not zero.)
Source Code and Output:
> before <- c(30, 28, 34, 35, 14, 42, 33, 38, 34, 45, 28, 27, 25, 41, 36)
> after <- c(32, 29, 33, 32, 37, 43, 40, 41, 37, 44, 27, 33, 30, 38, 36)
> difference <- after - before
> positive_signs <- sum(difference > 0)
> negative_signs <- sum(difference < 0)
> zero_signs <- sum(difference == 0)
> n <- positive_signs + negative_signs
> p_value <- 2 * pbinom(min(positive_signs, negative_signs), n, 0.5)
> cat("Number of positive signs: ", positive_signs, "\n")
Number of positive signs: 9
> cat("Number of negative signs: ", negative_signs, "\n")
Number of negative signs: 5
> cat("Number of zero signs: ", zero_signs, "\n")
Number of zero signs: 1
> cat("P-value: ", p_value, "\n")
P-value: 0.4239502

Conclusion:
Since the p-value (0.42395) is greater than α (0.05), we fail to reject the null hypothesis and conclude that there is not enough evidence to say that the new drug has a significant effect.
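Because pbinom() is just a binomial sum, the two-sided p-value can be reproduced by hand; the Python sketch below is an R-independent cross-check:

```python
from math import comb

before = [30, 28, 34, 35, 14, 42, 33, 38, 34, 45, 28, 27, 25, 41, 36]
after = [32, 29, 33, 32, 37, 43, 40, 41, 37, 44, 27, 33, 30, 38, 36]
diffs = [a - b for a, b in zip(after, before)]

pos = sum(d > 0 for d in diffs)   # 9 positive differences
neg = sum(d < 0 for d in diffs)   # 5 negative differences
n = pos + neg                     # the one zero difference is dropped: n = 14

# Two-sided exact p-value: 2 * P(Bin(n, 1/2) <= min(pos, neg))
k = min(pos, neg)
p_value = 2 * sum(comb(n, i) for i in range(k + 1)) / 2 ** n
print(pos, neg, p_value)
```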

EXPERIMENT – 7
AIM: Let us consider the following data, which show the IQ scores of 15 randomly selected employees:
99 100 90 94 135 108 107 111 119 104 127 109 117 105 125
Test whether the IQ scores of the employees of the new HR department differ from 107 at the 5% level of significance.
Source: R Software
Hypothesis:
Ho: The median IQ score is 107.
H1: The median IQ score differs from 107.

Source Code and Output:


> data = c(99, 100, 90, 94, 135, 108, 107, 111, 119, 104, 127, 109, 117, 105, 125)
>
> md = 107
> print(md) #Given median is 107.
[1] 107

> wilcox.test(data, mu = md)   # wilcox.test() is provided by base R (stats); no extra package is needed


Wilcoxon signed rank test with continuity correction
data: data
V = 64.5, p-value = 0.4702
alternative hypothesis: true location is not equal to 107

Conclusion:
Since the p-value (0.4702) is greater than α (0.05), we fail to reject the null hypothesis and conclude that the median IQ score of the employees does not differ significantly from 107.
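The statistic V reported by wilcox.test() is the sum of the ranks of the positive differences from 107, after dropping zero differences and using mid-ranks for tied absolute differences. A short Python cross-check reproducing V = 64.5 (midrank is an illustrative helper):

```python
iq = [99, 100, 90, 94, 135, 108, 107, 111, 119, 104, 127, 109, 117, 105, 125]
mu = 107

# Drop zero differences, then rank the absolute differences (mid-ranks for ties)
diffs = [x - mu for x in iq if x != mu]
abs_sorted = sorted(abs(d) for d in diffs)

def midrank(value):
    """Average rank of `value` among the sorted absolute differences."""
    first = abs_sorted.index(value) + 1   # smallest 1-based rank in the tie group
    count = abs_sorted.count(value)       # size of the tie group
    return first + (count - 1) / 2

# V = sum of the ranks attached to positive differences
V = sum(midrank(abs(d)) for d in diffs if d > 0)
print(V)
```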

EXPERIMENT – 8
AIM: Let us consider the following data, which show the IQ scores of 32 randomly selected employees, in addition to the earlier 15. The IQ scores of the 32 randomly selected employees are:
99 100 90 94 135 108 107 111 119 104 127 109 117 105 125 98
112 85 92 140 111 115 137 117 123 132 83 120 82 106 147 110
Perform the test at the 5% level of significance to test whether the IQ scores of employees differ from 107.
SOURCE: R Software

Hypothesis:
Ho: The median IQ score is 107.
H1: The median IQ score differs from 107.

Source Code and Output:

> data <- c(99,100,90,94,135,108,107,111,119,104,127,109,117,105,125,98,112,85,92,140,
+ 111,115,137,117,123,132,83,120,82,106,147,110)
> data
 [1]  99 100  90  94 135 108 107 111 119 104 127 109 117 105 125  98 112  85  92 140 111 115 137 117
[25] 123 132  83 120  82 106 147 110
> md <- median(data)
> print(md)
[1] 110.5
> n_pos <- which(data > md)
> print(n_pos)
[1] 5 8 9 11 13 15 17 20 21 22 23 24 25 26 28 31
> n_neg <- which(data < md)
> print(n_neg)
[1] 1 2 3 4 6 7 10 12 14 16 18 19 27 29 30 32
> n_eq <- which(data == md)
> print(n_eq)
integer(0)
> n = 32
> s_n_pos <- sum(n_pos)
> print(s_n_pos)
[1] 298
> E_n_pos <- ((n*(n+1))/4)
> print(E_n_pos)
[1] 264
> V_n_pos <- ((n*(n+1)*((2*n)+1))/24)
> print(V_n_pos)

[1] 2860
> sqrt_V <- sqrt(V_n_pos)
> print(sqrt_V)
[1] 53.47897
> Z_not <- ((s_n_pos - E_n_pos)/sqrt_V)
> Z_not
[1] 0.635764
> Z <- pnorm(abs(Z_not))
> print(Z)
[1] 0.7375349
> p_value <- 2*(1-Z)
> print(p_value)
[1] 0.5249303
> #By direct method
> result <- wilcox.test(data, mu = 107)

> print(result)

Wilcoxon signed rank test with continuity correction

data: data
V = 318, p-value = 0.1731
alternative hypothesis: true location is not equal to 107

> result1 <- wilcox.test(data, mu = md)

> print(result1)

Wilcoxon signed rank test with continuity correction

data: data
V = 272, p-value = 0.8884
alternative hypothesis: true location is not equal to 110.5

Conclusion:

In each case the p-value is greater than α (p* > α), so we fail to reject the null hypothesis and conclude that the median IQ score of the employees does not differ significantly from 107.

EXPERIMENT – 9
AIM: A random sample of 10 diabetic patients is selected at random to see the effect of a new drug in reducing blood sugar level. Their fasting blood sugar levels (gram/ml) are measured initially and after using the drug for the treatment. Perform a test at the 1% level of significance to see whether the drug is effective in reducing blood sugar.
Patient Id  1   2   3   4   5   6   7   8   9   10
Before     125 132 137 128 140 160 155 145 132 152
After      110 135 125 132 140 142 148 127 132 140

SOURCE: R Software
Hypothesis:
Ho: The drug has no effect on blood sugar level (the median difference is zero).
H1: The drug has an effect on blood sugar level (the median difference is not zero).

Source Code and Output:


> #Create vector and input data
> before=c(125,132,137,128,140,160,155,145,132,152)
> after=c(110,135,125,132,140,142,148,127,132,140)
> SIGN.test(x=before, y=after, alternative="two.sided", conf.level=0.99)  # SIGN.test is from the BSDA package

Dependent-samples Sign-Test

data: before and after


S = 6, p-value = 0.2891
alternative hypothesis: true median difference is not equal to 0
99 percent confidence interval:
-3.588 18.000
sample estimates:
median of x-y
9.5

Achieved and Interpolated Confidence Intervals:

Conf.Level L.E.pt U.E.pt


Lower Achieved CI 0.9785 -3.000 18
Interpolated CI 0.9900 -3.588 18

Upper Achieved CI 0.9980 -4.000 18

Conclusion:
Since the p-value (0.2891) is greater than α (0.01), we fail to reject the null hypothesis: at the 1% level of significance there is no significant evidence that the drug changes the blood sugar level.
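The sign-test statistic S and p-value reported by SIGN.test() can be reproduced from the binomial distribution; the Python sketch below is an R-independent cross-check:

```python
from math import comb

before = [125, 132, 137, 128, 140, 160, 155, 145, 132, 152]
after = [110, 135, 125, 132, 140, 142, 148, 127, 132, 140]
diffs = [b - a for b, a in zip(before, after)]

S = sum(d > 0 for d in diffs)    # patients whose blood sugar fell: 6
n = sum(d != 0 for d in diffs)   # the two zero differences are dropped: n = 8

# Two-sided exact p-value (S exceeds n/2 here): 2 * P(Bin(n, 1/2) >= S)
p_value = 2 * sum(comb(n, i) for i in range(S, n + 1)) / 2 ** n
print(S, n, p_value)
```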

EXPERIMENT - 10
AIM: Generate a random sample of size 25 from uniform (2, 5) and round it to the second decimal place. Calculate the ECDF. Also plot the ECDF.
Source: R Software
Source Code and Output:
> # Step 1: Generate a random sample
> set.seed(123)
> # Setting seed for reproducibility
> sample_size <- 25
> random_sample <- runif(sample_size, min = 2, max = 5)
> print(random_sample)
[1] 2.862733 4.364915 3.226931 4.649052 4.821402 2.136669 3.584316 4.677257
[9] 3.654305 3.369844 4.870500 3.360002 4.032712 3.717900 2.308774 4.699475
[17] 2.738263 2.126179 2.983762 4.863511 4.668618 4.078410 3.921520 4.982809
[25] 3.967117
> # Step 2: Round up to the second decimal place
> rounded_sample <- round(random_sample, 2)
> print(rounded_sample)
[1] 2.86 4.36 3.23 4.65 4.82 2.14 3.58 4.68 3.65 3.37 4.87 3.36 4.03 3.72 2.31
[16] 4.70 2.74 2.13 2.98 4.86 4.67 4.08 3.92 4.98 3.97
> # Step 3: Calculate ECDF
> ecdf_func <- ecdf(rounded_sample)
> print(ecdf_func)
Empirical CDF
Call: ecdf(rounded_sample)
x[1:25] = 2.13, 2.14, 2.31, ..., 4.87, 4.98
> # Step 4: Plot ECDF
> plot(ecdf_func, main = "Empirical Cumulative Distribution Function",
+ xlab = "Value", ylab = "ECDF", verticals = TRUE, do.points = FALSE, col="red")

Conclusion:
Hence, we conclude that we successfully generated a random sample of size 25 from the uniform distribution U(2, 5) and rounded it to the second decimal place. We then calculated the Empirical Cumulative Distribution Function (ECDF) for this sample and plotted it.
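The ECDF computed by R's ecdf() is simply F_n(t) = (number of observations ≤ t)/n, a step function that jumps by 1/n at each data point (or k/n at a tie of multiplicity k). A minimal Python sketch of the definition, tried on a few of the rounded values from the sample above:

```python
def make_ecdf(sample):
    """Return F_n with F_n(t) = (number of observations <= t) / n."""
    data = sorted(sample)
    n = len(data)
    def F_n(t):
        return sum(1 for x in data if x <= t) / n
    return F_n

# Illustration with five of the rounded values from the generated sample
F = make_ecdf([2.13, 2.14, 2.31, 2.86, 4.36])
print(F(2.0), F(2.31), F(5.0))  # steps of 1/5 at each data point
```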

EXPERIMENT – 11
AIM: The following data show the water pH of 16 water samples collected from the river Ganga at different locations:
7.2, 8.8, 9.3, 7.1, 8.6, 10.1, 9.0, 6.4, 7.4, 8.9, 7.3, 5.8, 6.4, 9.9, 5.2, 6.8.

Test whether the water pH data are normal at the 5% level of significance. Given that D16,0.05 = 0.327.

SOURCE: R Software

Hypothesis:
Ho: F(x) comes from a normal distribution, N(µ, σ²)
H1: F(x) does not come from a normal distribution, N(µ, σ²)

Source Code and Output:


> data = c(7.2, 8.8, 9.3, 7.1, 8.6, 10.1, 9.0, 6.4, 7.4, 8.9, 7.3, 5.8, 6.4, 9.9, 5.2, 6.8)
> data
[1] 7.2 8.8 9.3 7.1 8.6 10.1 9.0 6.4 7.4 8.9 7.3 5.8 6.4 9.9
[15] 5.2 6.8
> me = mean(data)
> me
[1] 7.7625
> variance = var(data)
> variance
[1] 2.2105
>
>
> sort_data = sort(data)
> print(sort_data)
[1] 5.2 5.8 6.4 6.4 6.8 7.1 7.2 7.3 7.4 8.6 8.8 8.9 9.0 9.3
[15] 9.9 10.1
>
>
> cdf = pnorm(data, me, variance)   # (incorrect: pnorm takes the SD as its third argument, not the variance; recomputed correctly with sd below)
> print(cdf)
[1] 0.3995673 0.6805906 0.7566428 0.3822007 0.6476091 0.8548476 0.7122017
[8] 0.2688231 0.4348695 0.6965800 0.4171349 0.1873220 0.2688231 0.8332219
[15] 0.1231792 0.3316281
>
> ECDF = numeric(0)
>
> for (i in 1:16) {
+ if (i == 3){
+ ECDF = c(ECDF, 4/16)
+ }else{
+ ECDF = c(ECDF, i/16)
+ }
+ }
> print(ECDF)
[1] 0.0625 0.1250 0.2500 0.2500 0.3125 0.3750 0.4375 0.5000 0.5625 0.6250
[11] 0.6875 0.7500 0.8125 0.8750 0.9375 1.0000

>
>
> #Now Creating a Data Frame
> df = data.frame(sort_data, ECDF)
> print(df)
sort_data ECDF
1 5.2 0.0625
2 5.8 0.1250
3 6.4 0.2500
4 6.4 0.2500
5 6.8 0.3125
6 7.1 0.3750
7 7.2 0.4375
8 7.3 0.5000
9 7.4 0.5625
10 8.6 0.6250
11 8.8 0.6875
12 8.9 0.7500
13 9.0 0.8125
14 9.3 0.8750
15 9.9 0.9375
16 10.1 1.0000
>
> me = mean(data) #Mean of data
> print(me)
[1] 7.7625
> var = var(data) #Variance of data
> print(var)
[1] 2.2105
>
> sd <- sqrt(var) #SD of data
> print(sd)
[1] 1.486775
>
> cdf <- pnorm(sort_data, me, sd)
> print(cdf)
[1] 0.04239645 0.09342234 0.17972515 0.17972515 0.25869485 0.32794480
[7] 0.35259063 0.37787143 0.40368654 0.71338460 0.75735483 0.77788768
[13] 0.79739103 0.84945937 0.92473692 0.94204731
>
> resulted_data <- data.frame(sort_data, cdf, ECDF)
> print(resulted_data)
sort_data cdf ECDF
1 5.2 0.04239645 0.0625
2 5.8 0.09342234 0.1250
3 6.4 0.17972515 0.2500
4 6.4 0.17972515 0.2500
5 6.8 0.25869485 0.3125
6 7.1 0.32794480 0.3750
7 7.2 0.35259063 0.4375
8 7.3 0.37787143 0.5000
9 7.4 0.40368654 0.5625
10 8.6 0.71338460 0.6250
11 8.8 0.75735483 0.6875
12 8.9 0.77788768 0.7500
13 9.0 0.79739103 0.8125
14 9.3 0.84945937 0.8750
15 9.9 0.92473692 0.9375
16 10.1 0.94204731 1.0000
>
> #Find maximum of absolute differences between ECDF and cdf.
> diff <- abs(ECDF - cdf)
> print(diff)
[1] 0.02010355 0.03157766 0.07027485 0.07027485 0.05380515 0.04705520
[7] 0.08490937 0.12212857 0.15881346 0.08838460 0.06985483 0.02788768
[13] 0.01510897 0.02554063 0.01276308 0.05795269

>
> max_diff <- max(diff)
> print(max_diff)
[1] 0.1588135
>
> #By direct method.
> KS_test <- ks.test(sort_data, "pnorm", mean = me, sd = sd)
Warning message:
In ks.test.default(sort_data, "pnorm", mean = me, sd = sd) :
ties should not be present for the one-sample Kolmogorov-Smirnov test
> print(KS_test)

Asymptotic one-sample Kolmogorov-Smirnov test

data: sort_data
D = 0.15881, p-value = 0.8145
alternative hypothesis: two-sided

Conclusion:
Since the p-value (0.8145) is greater than α (0.05), we fail to reject the null hypothesis and conclude that F(x) comes from a normal distribution, N(µ, σ²). Equivalently, the calculated statistic D = 0.1588 is less than the tabulated value D16,0.05 = 0.327.
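The manually computed statistic D = max|F_n(x) - F(x)| can be cross-checked outside R. The Python sketch below recomputes the sample mean, SD, and the two-sided KS distance (including the left-limit term that ks.test also considers), reproducing D ≈ 0.1588:

```python
import math

ph = [7.2, 8.8, 9.3, 7.1, 8.6, 10.1, 9.0, 6.4, 7.4, 8.9, 7.3, 5.8, 6.4, 9.9, 5.2, 6.8]
n = len(ph)
mean = sum(ph) / n
sd = math.sqrt(sum((x - mean) ** 2 for x in ph) / (n - 1))  # sample SD, like var() in R

def norm_cdf(x):
    """CDF of the fitted N(mean, sd^2), via the error function."""
    return 0.5 * (1 + math.erf((x - mean) / (sd * math.sqrt(2))))

data = sorted(ph)
# At each order statistic, compare the fitted CDF with the ECDF from above
# (i/n) and from below ((i-1)/n); D is the largest of all these gaps
D = max(max((i + 1) / n - norm_cdf(x), norm_cdf(x) - i / n)
        for i, x in enumerate(data))
print(mean, sd, D)
```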

EXPERIMENT – 12
AIM: Let the following data represent the lifetime in hours of batteries of two different brands:
Brand A: 40, 30, 40, 45, 55, 30
Brand B: 50, 50, 45, 55, 60, 40
Test whether the two brands differ w.r.t. their average life and plot the graph. Given that D6,6,0.05 = 4/6 and t10,0.05 = 2.2281.
Source: R Software
Hypothesis:
Ho: The lifetimes of the two brands of battery are not different
H1: The lifetimes of the two brands of battery are different
Source Code and Output
> brand_A = c(40,30,40,45,55,30)
> brand_B = c(50,50,45,55,60,40)
>
> print(brand_A); print(brand_B)
[1] 40 30 40 45 55 30
[1] 50 50 45 55 60 40

>
> KS_result = ks.test(brand_A,brand_B)
> print(KS_result)

Two-sample Kolmogorov-Smirnov test

data: brand_A and brand_B


D = 0.5, p-value = 0.4413
alternative hypothesis: two-sided

Graph:
> #Now plot the graph
> #loading the required package
> library(dgof)
> brand_A = c(40,30,40,45,55,30)
> brand_B = c(50,50,45,55,60,40)
> plot(ecdf(brand_A),xlim = range(c(brand_A,brand_B)),col="red")
> plot(ecdf(brand_B),add=TRUE,lty="dashed",col="yellow")

Conclusion:
If Dm,n > Dm,n,α, we reject the null hypothesis.
Here, Dm,n = 0.5 and Dm,n,α = 0.66.
Since the calculated value (Dm,n) is not greater than the tabulated value (Dm,n,α), we fail to reject the null hypothesis.
Hence, we conclude that the lifetimes of the two brands of battery are not different.
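The two-sample statistic Dm,n = 0.5 can be recovered by comparing the two ECDFs directly; a short R-independent Python sketch:

```python
brand_A = [40, 30, 40, 45, 55, 30]
brand_B = [50, 50, 45, 55, 60, 40]

def ecdf(sample, t):
    """Empirical CDF: fraction of the sample that is <= t."""
    return sum(1 for x in sample if x <= t) / len(sample)

# The maximum ECDF gap over the pooled observed values is the KS statistic;
# here it is attained at t = 40, where F_A = 4/6 and F_B = 1/6
points = sorted(set(brand_A + brand_B))
D = max(abs(ecdf(brand_A, t) - ecdf(brand_B, t)) for t in points)
print(D)
```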

EXPERIMENT – 13
AIM: Generate a random sample from exponential distribution with mean =5
of size 30. Check whether the generated random sample comes from
exponential with mean=5.
SOURCE: R Software
Hypothesis:
Ho: The sample data comes from an exponential distribution with a mean of 5.

H1: The sample data does not come from an exponential distribution with a mean of 5.
Source Code and Output:

# Set seed for reproducibility


> set.seed(123)
> # Generate random sample from exponential distribution with mean = 5 and size = 30
> mean <- 5
> size <- 30
> lambda <- 1 / mean
> random_sample <- rexp(size, rate = lambda)
> # Print the generated random sample
> print(random_sample)
[1]  4.2172863  2.8830514  6.6452743  0.1578868  0.2810549  1.5825061  1.5711360  0.7263340 13.6311823
[10]  0.1457672  5.0241503  2.4010736  1.4050681  1.8855892  0.9414202  4.2489306  7.8160177  2.3938021
[19]  2.9546742 20.2050586  4.2157487  4.8293561  7.4263790  6.7402224  5.8426449  8.0292617  7.4837143
[28]  7.8532627  0.1588387  2.9892485

> # Perform Kolmogorov-Smirnov test


> ks_test <- ks.test(random_sample, "pexp", rate = lambda)
> # Print the result of the goodness-of-fit test
> print(ks_test)

Exact one-sample Kolmogorov-Smirnov test

data: random_sample
D = 0.13405, p-value = 0.6064
alternative hypothesis: two-sided

Conclusion:

Since the p-value (0.6064) from the K-S test is greater than α (0.05), we fail to reject the null hypothesis and conclude that the generated sample is consistent with an exponential distribution with a mean of 5.

EXPERIMENT – 14
AIM: Compare two teaching methods, a new teaching method and the existing teaching method, after six months of teaching, based on the results of a reading test given to 27 slow learners. The post-teaching scores out of 200 are given below. Examine whether the populations differ w.r.t. their medians at the 5% level of significance.

New teaching method:      113 119 108 111 114 138 135 120 130 122
Existing teaching method: 157 160 109 177 142 115 164 155 137 150 170 160 120 162 155 139 175

Given that Uα/2 for n1 = 10 and n2 = 17 is 46.

The asymptotic relative efficiency of the Mann-Whitney U-test relative to the two-sample parametric t-test is (3/π) × 100, or 95.5%.
SOURCE: R Software

Hypothesis:
Ho: Mnew = Mexisting
H1: Mnew ≠ Mexisting
Where,
• Mnew is the median post-teaching reading score for the new teaching method.
• Mexisting is the median post-teaching reading score for the existing teaching method.

Source Code and Output:

> # New teaching method scores
> new_method <- c(113, 119, 108, 111, 114, 138, 135, 120, 130, 122)
>
> # Existing teaching method scores
> existing_method <- c(157, 160, 109, 177, 142, 115, 164, 155, 137, 150, 170, 160, 120, 162, 155, 139, 175)
>
>

> # Perform the Mann-Whitney U-test
> wilcox_test_result <- wilcox.test(new_method, existing_method)
Warning message:
In wilcox.test.default(new_method, existing_method) :
cannot compute exact p-value with ties
>
> # Display the test results
> print(wilcox_test_result)

Wilcoxon rank sum test with continuity correction

data: new_method and existing_method


W = 20.5, p-value = 0.001305
alternative hypothesis: true location shift is not equal to 0


CONCLUSION:
Since the p-value (0.001305) is less than 0.05, we reject the null hypothesis and conclude that there is a significant difference in the median post-teaching reading scores between the new and existing teaching methods.
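The reported W = 20.5 counts the (new, existing) pairs in which the new-method score exceeds the existing-method score, with ties counted as 1/2. The Python sketch below reproduces W and a continuity-corrected normal approximation to the p-value; it omits the tie correction to the variance that wilcox.test() applies, so the p-value agrees with 0.001305 only approximately.

```python
import math

new_method = [113, 119, 108, 111, 114, 138, 135, 120, 130, 122]
existing_method = [157, 160, 109, 177, 142, 115, 164, 155, 137, 150,
                   170, 160, 120, 162, 155, 139, 175]
n1, n2 = len(new_method), len(existing_method)

# W = number of pairs with new > existing, counting ties as 1/2
W = sum((x > y) + 0.5 * (x == y) for x in new_method for y in existing_method)

# Normal approximation with continuity correction (no tie correction)
mu = n1 * n2 / 2
sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
z = (W + 0.5 - mu) / sigma                     # W < mu, so the 0.5 is added
phi = 0.5 * (1 + math.erf(z / math.sqrt(2)))   # standard normal CDF at z
p_value = 2 * phi                              # two-sided
print(W, p_value)
```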

