sujal 4
(STS81DC00104)
ASSIGNMENT - 1
SEMESTER – I
M.Sc. STATISTICS
[SESSION: 2024-2026]
DEPARTMENT OF STATISTICS
SCHOOL OF MATHEMATICS, STATISTICS AND COMPUTER SCIENCE
CENTRAL UNIVERSITY OF SOUTH BIHAR
INDEX
S.No.  Name of Experiment
1.     EXPERIMENT-1
2.     EXPERIMENT-2
3.     EXPERIMENT-3
4.     EXPERIMENT-4
5.     EXPERIMENT-5
6.     EXPERIMENT-6
7.     EXPERIMENT-7
8.     EXPERIMENT-8
9.     EXPERIMENT-9
10.    EXPERIMENT-10
11.    EXPERIMENT-11
12.    EXPERIMENT-12
13.    EXPERIMENT-13
14.    EXPERIMENT-14
EXPERIMENT -1
AIM: Generate a random sample of size 30 from the exponential distribution with
mean 1/θ, for θ = 0.5 and θ = 2. Also calculate their means.
Source: R Software
Sample Generation in R
1. For θ = 0.5
> #Generate a random sample of size 30 from exponential distribution with parameter theta = 0.5
> set.seed(123)
> theta1 <- 0.5
> sample1 <- rexp(30, rate = theta1)
> sample1
[1] 2.05573923 0.56933692 3.12610378 0.08417659 0.19726178 0.19713862 0.56077032 0.59157392
[9] 1.94485930 1.84805445 3.28480905 3.23982554 5.07229108 3.04309925 0.76002841 0.47702593
[17] 0.93297485 0.08454290 0.63935380 1.28722184 1.14512621 0.43211101 8.99734680 3.71418111
[25] 1.37091728 2.87890525 3.46230797 2.48956660 2.92660112 3.07478796
> mean1 <- mean(sample1)
> mean1
[1] 2.016268
2. For θ = 2
> #Generate a random sample of size 30 from exponential distribution with parameter theta = 2
> set.seed(123) # Same seed as above: the output below is then exactly sample1/4, since rexp() divides each draw by the rate
> theta2 <- 2
> sample2 <- rexp(30, rate = theta2)
> sample2
[1] 0.51393481 0.14233423 0.78152594 0.02104415 0.04931544 0.04928466 0.14019258 0.14789348
[9] 0.48621482 0.46201361 0.82120226 0.80995638 1.26807277 0.76077481 0.19000710 0.11925648
[17] 0.23324371 0.02113573 0.15983845 0.32180546 0.28628155 0.10802775 2.24933670 0.92854528
[25] 0.34272932 0.71972631 0.86557699 0.62239165 0.73165028 0.76869699
> mean2 <- mean(sample2)
> mean2
[1] 0.504067
Conclusion
By generating random samples from an exponential distribution with the specified parameters and
calculating their means:
1. For θ = 0.5:
o Generated a random sample of size 30.
o The theoretical mean is 1/0.5 = 2.
o The calculated mean of the sample is approximately 2.016268.
2. For θ = 2:
o Generated a random sample of size 30.
o The theoretical mean is 1/2 = 0.5.
o The calculated mean of the sample is approximately 0.504067.
These results show that the sample means are close to their respective theoretical means
(2.016 vs 2 and 0.504 vs 0.5), demonstrating how random sampling can be used to estimate the
characteristics of a probability distribution.
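To put "close" on a scale, the small sketch below (not part of the original code) compares each sample mean with its theoretical value in units of the standard error (1/θ)/√n, using the fact that an Exp(θ) variable has standard deviation 1/θ. The two sample means are copied by hand from the output above, and n = 30 as in the aim.

```r
# Compare sample means with theoretical means, in standard-error units.
n <- 30
theta <- c(0.5, 2)
theoretical_mean <- 1 / theta                # E[X] = 1/theta for Exp(rate = theta)
se <- (1 / theta) / sqrt(n)                  # SD(X) = 1/theta, so SE of the mean
sample_mean <- c(2.016268, 0.504067)         # values printed above
z <- (sample_mean - theoretical_mean) / se   # standardised gap
print(round(data.frame(theta, theoretical_mean, se, sample_mean, z), 4))
```

Both sample means lie far less than one standard error from 1/θ (z ≈ 0.045 in each case). The two z values coincide because the second sample printed above is exactly the first one divided by 4.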
EXPERIMENT – 2
AIM: Let X ∼ N(12, 16), with mean 12 and standard deviation 4. Find:
1. P[X > 20] = 1 − P[X ≤ 20] = 1 − Φ((20 − 12)/4) = 1 − Φ(2)
2. P[X < 5] = P[X ≤ 5] = Φ((5 − 12)/4) = Φ(−1.75)
3. P[12 ≤ X ≤ 16] = Φ((16 − 12)/4) − Φ((12 − 12)/4) = Φ(1) − Φ(0)
where Φ denotes the standard normal CDF.
Source: R Software
1. Define the Normal Distribution:
We'll use the pnorm function in R, which gives the cumulative distribution function
(CDF) of a normal distribution.
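The original code for this step is not shown; a minimal sketch of the three probabilities with pnorm follows. Note that pnorm takes the standard deviation (4), not the variance (16), and that lower.tail = FALSE returns the upper tail directly instead of computing 1 minus the CDF.

```r
mu <- 12
sigma <- 4                                                    # sd = sqrt(16)
p1 <- pnorm(20, mean = mu, sd = sigma, lower.tail = FALSE)    # P[X > 20]
p2 <- pnorm(5, mean = mu, sd = sigma)                         # P[X < 5]
p3 <- pnorm(16, mean = mu, sd = sigma) - pnorm(12, mean = mu, sd = sigma)  # P[12 <= X <= 16]
print(round(c(p1 = p1, p2 = p2, p3 = p3), 6))
```

These evaluate to about 0.0228, 0.0401 and 0.3413, i.e. 1 − Φ(2), Φ(−1.75) and Φ(1) − Φ(0) for the standard normal.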
Conclusion:
For the normal distribution X ∼ N(12, 16) with a mean of 12 and a standard
deviation of 4, we have calculated the following probabilities using R:
Results:
EXPERIMENT – 3
AIM: Let u and v be independent uniform variates, u, v ∼ U(0, 1), and let
SOURCE: R Software
2. Compute R and θ.
3. Calculate x and y.
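The first step of the aim is missing, but steps 2 and 3 and the checks that follow (zero correlation, N(0, 1) histograms) match the Box–Muller transform: R = √(−2 ln u), θ = 2πv, x = R cos θ, y = R sin θ. Assuming that is the intended construction, a minimal sketch is below; the seed and the sample size of 10,000 are illustrative choices, not taken from the original.

```r
# Box-Muller transform (assumed construction; step 1 of the aim is missing)
set.seed(123)                  # illustrative seed, not from the original
n <- 10000                     # illustrative sample size
u <- runif(n)                  # u ~ U(0, 1); runif() does not return exactly 0
v <- runif(n)                  # v ~ U(0, 1), independent of u
R <- sqrt(-2 * log(u))         # step 2: radius
theta <- 2 * pi * v            # step 2: angle
x <- R * cos(theta)            # step 3: x ~ N(0, 1)
y <- R * sin(theta)            # step 3: y ~ N(0, 1), independent of x
cat("mean(x) =", round(mean(x), 3), " var(x) =", round(var(x), 3), "\n")
cat("mean(y) =", round(mean(y), 3), " var(y) =", round(var(y), 3), "\n")
cat("cor(x, y) =", round(cor(x, y), 3), "\n")
```

With a sample this large the means should be near 0, the variances near 1 and the correlation near 0, which is what the histograms and correlation check are meant to confirm.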
> # Check normality of x and y by making histograms
> par(mfrow=c(1,2)) # Set up the plotting area to show two plots side by side
>
> # Histogram for x
> hist(x, main="Histogram of x", xlab="x", col="lightblue", breaks=30)
>
> # Histogram for y
> hist(y, main="Histogram of y", xlab="y", col="lightgreen", breaks=30)
Conclusion:
1. Correlation between x and y: The calculated correlation value was very
close to 0, indicating that x and y are uncorrelated, as expected.
2. Normality Check:
o The histograms of x and y showed a bell-shaped distribution,
indicating that both x and y approximately follow a normal
distribution with mean 0 and variance 1.
EXPERIMENT-4
AIM: The win-loss record of a certain basketball team for its 50
consecutive games was as follows:
6W L 6W L W L 3W 2L 4W L 3W 2L 6W 2L 2W 3L 2W L 3W
Apply the run test to test that the sequence of win and loss is random.
Source: R Software
Hypothesis:
Ho: The sequence of win and loss is random
H1: The sequence of win and loss is not random.
> #Make a vector
> dat=c('W','W','W','W','W','W','L','W','W','W','W','W','W','L','W','L','W',
+ 'W','W','L','L','W','W','W','W','L','W','W','W','L','L','W','W','W','W',
+ 'W','W','L','L','W','W','L','L','L','W','W','L','W','W','W')
> #For large sample, now calculate mean and Standard deviation
> mean_G=((2*n1*n2)/n)+1
> print(mean_G)
[1] 21.16
> sd_G= sqrt((2*n1*n2*((2*n1*n2)-n1-n2))/((n^2)*(n-1)))
> print(sd_G)
[1] 2.807663
> mod_Z=abs(Z)
> print(mod_Z)
[1] 0.7693231
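The transcript uses n1, n2, n and Z without showing how they were computed. A self-contained sketch of the large-sample runs test is below, assuming n1 counts wins, n2 counts losses and the number of runs G is taken from rle(). It reproduces the printed mean (21.16), standard deviation (2.807663) and |Z| (0.7693231), and adds the two-sided normal p-value.

```r
# Win/loss sequence: 6W L 6W L W L 3W 2L 4W L 3W 2L 6W 2L 2W 3L 2W L 3W
dat <- c(rep("W", 6), "L", rep("W", 6), "L", "W", "L", rep("W", 3), rep("L", 2),
         rep("W", 4), "L", rep("W", 3), rep("L", 2), rep("W", 6), rep("L", 2),
         rep("W", 2), rep("L", 3), rep("W", 2), "L", rep("W", 3))
n1 <- sum(dat == "W")                 # number of wins
n2 <- sum(dat == "L")                 # number of losses
n  <- n1 + n2
G  <- length(rle(dat)$lengths)        # observed number of runs
mean_G <- 2 * n1 * n2 / n + 1
sd_G <- sqrt(2 * n1 * n2 * (2 * n1 * n2 - n1 - n2) / (n^2 * (n - 1)))
Z <- (G - mean_G) / sd_G
p_value <- 2 * pnorm(-abs(Z))         # two-sided, normal approximation
cat("n1 =", n1, " n2 =", n2, " runs =", G, "\n")
cat("Z =", round(Z, 4), " p-value =", round(p_value, 4), "\n")
```

Here n1 = 36, n2 = 14 and G = 19, giving Z ≈ −0.769 and a two-sided p-value of about 0.44.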
Conclusion:
At α = 0.05, the two-sided critical value is 1.96. Since |Z| = 0.7693 < 1.96
(equivalently, the two-sided p-value p* = 2[1 − Φ(0.7693)] ≈ 0.44 is greater than α,
p* > α), we fail to reject the null hypothesis, and we can regard the sequence of
wins and losses as random.
EXPERIMENT – 5
AIM: The following data show the mileage per gallon (mpg) obtained with 40
new Honda cars:
24.1,25,24.8,24.3,25.3,24.2,23.6,24.5,24,23,23.8,23.3,24.5,24.6,24,25.2,27.7,
24.1,24.6,24.9,24.1,25.8,24.2,24.2,24.1,25.6,24.5,25.1,24.6,24.3,25.2,24.7,24.4,
23.2,25.2,24.4,24.2,24.8,23.3,24.9
> install.packages("nonpar")
B = 20
The p-value is 0.562816469418554
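The hypotheses for this experiment are not stated in the surviving text; reading them from the conclusion as H0: median = 24.4 against H1: median > 24.4, the sketch below cross-checks B = 20 with base R's binom.test. It discards the two observations equal to 24.4, which is one common convention for ties; the package output above may handle ties differently, so its p-value need not coincide with this one.

```r
mpg <- c(24.1, 25, 24.8, 24.3, 25.3, 24.2, 23.6, 24.5, 24, 23,
         23.8, 23.3, 24.5, 24.6, 24, 25.2, 27.7, 24.1, 24.6, 24.9,
         24.1, 25.8, 24.2, 24.2, 24.1, 25.6, 24.5, 25.1, 24.6, 24.3,
         25.2, 24.7, 24.4, 23.2, 25.2, 24.4, 24.2, 24.8, 23.3, 24.9)
m0 <- 24.4                            # hypothesised median
B <- sum(mpg > m0)                    # number of '+' signs
n_eff <- sum(mpg != m0)               # sample size after dropping ties
res <- binom.test(B, n_eff, p = 0.5, alternative = "greater")
cat("B =", B, " n =", n_eff, " p-value =", round(res$p.value, 4), "\n")
```

Under this convention the p-value is about 0.436; either way it is well above 0.05, so the decision is unchanged.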
Conclusion: -
Since the p-value (0.5628) is greater than 0.05, we fail to reject the null
hypothesis and conclude that there is no significant evidence that the median
mpg exceeds 24.4.
EXPERIMENT – 6
Aim: Use the sign test to test whether the new drug is effective for the given
set of data.
Before 30 28 34 35 14 42 33 38 34 45 28 27 25 41 36
After  32 29 33 32 37 43 40 41 37 44 27 33 30 38 36
Source: R Software
Hypothesis:
H0: The new drug has no effect. (The median difference is zero)
H1: The new drug has an effect. (The median difference is not zero)
Source Code and Output:
> before <- c(30, 28, 34, 35, 14, 42, 33, 38, 34, 45, 28, 27, 25, 41, 36)
> after <- c(32, 29, 33, 32, 37, 43, 40, 41, 37, 44, 27, 33, 30, 38, 36)
> difference <- after - before
> positive_signs <- sum(difference > 0)
> negative_signs <- sum(difference < 0)
> zero_signs <- sum(difference == 0)
> n <- positive_signs + negative_signs
> p_value <- 2 * pbinom(min(positive_signs, negative_signs), n, 0.5)
> cat("Number of positive signs: ", positive_signs, "\n")
Number of positive signs: 9
> cat("Number of negative signs: ", negative_signs, "\n")
Number of negative signs: 5
> cat("Number of zero signs: ", zero_signs, "\n")
Number of zero signs: 1
> cat("P-value: ", p_value, "\n")
P-value: 0.4239502
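As a cross-check of the manual pbinom computation, base R's binom.test gives the same two-sided exact sign-test p-value once the zero difference is discarded (9 positive signs out of 14).

```r
before <- c(30, 28, 34, 35, 14, 42, 33, 38, 34, 45, 28, 27, 25, 41, 36)
after  <- c(32, 29, 33, 32, 37, 43, 40, 41, 37, 44, 27, 33, 30, 38, 36)
d <- after - before
d <- d[d != 0]                                      # the sign test discards zero differences
res <- binom.test(sum(d > 0), length(d), p = 0.5)   # two-sided by default
print(res$p.value)
```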
Conclusion:
Since the p-value (0.42395) > α (0.05), we fail to reject the null hypothesis
and conclude that there is not enough evidence to say that the new drug has a
significant effect.
EXPERIMENT – 7
AIM: Consider the following data, which show the IQ scores of 15 randomly
selected employees:
99 100 90 94 135 108 107 111 119 104 127 109 117 105 125
Test whether the IQ scores of employees of the new HR department differ
from 107 at the 5% level of significance.
Source: R Software
Hypothesis:
Ho: The median IQ score of the employees is 107.
H1: The median IQ score of the employees differs from 107.
Conclusion:
Here, the p-value (p*) is greater than α (p* > α). So, we fail to reject the
null hypothesis and conclude that there is no significant evidence that the
median IQ score of the employees differs from 107.
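The code for this experiment is not shown. Assuming the intended test is the Wilcoxon signed-rank test (the test applied to the enlarged sample in the next experiment), a minimal sketch with base R's wilcox.test is below. The score of 107 gives a zero difference and two absolute differences are tied, so exact = FALSE requests the normal approximation explicitly rather than letting wilcox.test fall back to it with a warning.

```r
iq <- c(99, 100, 90, 94, 135, 108, 107, 111, 119, 104, 127, 109, 117, 105, 125)
res <- wilcox.test(iq, mu = 107, exact = FALSE)   # two-sided by default
print(res)
```

Here V = 64.5 and the p-value is about 0.47, above α = 0.05.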
EXPERIMENT – 8
AIM: Consider the IQ scores of 17 more randomly selected employees in
addition to the earlier 15. The IQ scores of all 32 randomly selected
employees are
99 100 90 94 135 108 107 111 119 104 127 109 117 105 125 98
112 85 92 140 111 115 137 117 123 132 83 120 82 106 147 110
Perform a test at the 5% level of significance to check whether the IQ scores
of employees differ from 107.
SOURCE: R Software
Hypothesis:
Ho: The median IQ score of the employees is 107.
H1: The median IQ score of the employees differs from 107.
[1] 2860
> sqrt_V <- sqrt(V_n_pos)
> print(sqrt_V)
[1] 53.47897
> Z_not <- ((s_n_pos - E_n_pos)/sqrt_V)
> Z_not
[1] 0.635764
> Z <- pnorm(abs(Z_not))
> print(Z)
[1] 0.7375349
> p_value <- 2*(1-Z)
> print(p_value)
[1] 0.5249303
> #By direct method
> result <- wilcox.test(data, mu = 107)
> print(result)
data: data
V = 318, p-value = 0.1731
alternative hypothesis: true location is not equal to 107
> result1 <- wilcox.test(data, mu = 110.5)
> print(result1)
data: data
V = 272, p-value = 0.8884
alternative hypothesis: true location is not equal to 110.5
Conclusion:
Here, the p-value of the Wilcoxon signed-rank test (0.1731) is greater than
α = 0.05 (p* > α). So, we fail to reject the null hypothesis and conclude that
there is no significant evidence that the median IQ score of the employees
differs from 107.
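The variance printed in the manual computation, 2860, equals 32·33·65/24, so that computation used n = 32 and kept the zero difference coming from the score of 107. wilcox.test instead drops zero differences (n = 31) and corrects the variance for tied ranks, which is one reason the two p-values differ. A sketch of the large-sample computation following those conventions (with continuity correction) is below; it should agree with wilcox.test(iq, mu = 107, exact = FALSE).

```r
iq <- c(99, 100, 90, 94, 135, 108, 107, 111, 119, 104, 127, 109, 117, 105, 125, 98,
        112, 85, 92, 140, 111, 115, 137, 117, 123, 132, 83, 120, 82, 106, 147, 110)
d <- iq - 107
d <- d[d != 0]                        # drop zero differences: n = 31
n <- length(d)
r <- rank(abs(d))                     # average ranks for tied |d|
V <- sum(r[d > 0])                    # sum of positive ranks
E_V <- n * (n + 1) / 4
t_sizes <- table(r)                   # sizes of tied groups
Var_V <- n * (n + 1) * (2 * n + 1) / 24 - sum(t_sizes^3 - t_sizes) / 48
z <- (V - E_V - 0.5 * sign(V - E_V)) / sqrt(Var_V)   # continuity correction
p_value <- 2 * pnorm(-abs(z))
cat("n =", n, " V =", V, " Var =", Var_V, " p-value =", round(p_value, 4), "\n")
```

This gives V = 318 and a p-value of about 0.1731, matching the direct method.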
EXPERIMENT – 9
AIM: A random sample of 10 diabetic patients is selected to study the effect of
a new drug in reducing blood sugar level. Their fasting blood sugar levels
(gram/ml) are measured initially and after using the drug for the treatment.
Perform a test at the 1% level of significance to see whether the drug is
effective in reducing blood sugar.
Patient Id  1    2    3    4    5    6    7    8    9    10
Before      125  132  137  128  140  160  155  145  132  152
After       110  135  125  132  140  142  148  127  132  140
SOURCE: R Software
Hypothesis:
Ho: The drug is not effective in reducing blood sugar level (the median of Before − After is 0).
H1: The drug is effective in reducing blood sugar level (the median of Before − After is greater than 0).
Dependent-samples Sign-Test
Upper Achieved CI 0.9980 -4.000 18
Conclusion:
Here, the p-value (p*) is greater than α = 0.01 (p* > α). So, we fail to reject
the null hypothesis: at the 1% level of significance there is not enough evidence
to say that the drug is effective in reducing blood sugar level.
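The sign-test output above is incomplete. A minimal base-R sketch of the one-sided sign test is below, assuming differences are taken as Before − After (so a '+' means the level fell) and that the two zero differences are discarded.

```r
before <- c(125, 132, 137, 128, 140, 160, 155, 145, 132, 152)
after  <- c(110, 135, 125, 132, 140, 142, 148, 127, 132, 140)
d <- before - after                   # positive = blood sugar reduced
d <- d[d != 0]                        # drop the two zero differences
res <- binom.test(sum(d > 0), length(d), p = 0.5, alternative = "greater")
cat("plus signs =", sum(d > 0), "of", length(d),
    " p-value =", round(res$p.value, 4), "\n")
```

With 6 plus signs out of 8 the one-sided p-value is 37/256 ≈ 0.1445, well above α = 0.01.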
EXPERIMENT - 10
Aim: Generate a random sample of size 25 from the uniform distribution U(2, 5) and round it
to two decimal places. Calculate the ECDF and plot it.
Source: R Software
Source Code and Output:
> # Step 1: Generate a random sample
> set.seed(123)
> # Setting seed for reproducibility
> sample_size <- 25
> random_sample <- runif(sample_size, min = 2, max = 5)
> print(random_sample)
[1] 2.862733 4.364915 3.226931 4.649052 4.821402 2.136669 3.584316 4.677257
[9] 3.654305 3.369844 4.870500 3.360002 4.032712 3.717900 2.308774 4.699475
[17] 2.738263 2.126179 2.983762 4.863511 4.668618 4.078410 3.921520 4.982809
[25] 3.967117
> # Step 2: Round to two decimal places (round() rounds to the nearest value, not up)
> rounded_sample <- round(random_sample, 2)
> print(rounded_sample)
[1] 2.86 4.36 3.23 4.65 4.82 2.14 3.58 4.68 3.65 3.37 4.87 3.36 4.03 3.72 2.31
[16] 4.70 2.74 2.13 2.98 4.86 4.67 4.08 3.92 4.98 3.97
> # Step 3: Calculate ECDF
> ecdf_func <- ecdf(rounded_sample)
> print(ecdf_func)
Empirical CDF
Call: ecdf(rounded_sample)
x[1:25] = 2.13, 2.14, 2.31, ..., 4.87, 4.98
> # Step 4: Plot ECDF
> plot(ecdf_func, main = "Empirical Cumulative Distribution Function",
+ xlab = "Value", ylab = "ECDF", verticals = TRUE, do.points = FALSE, col = "red")
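The ECDF is F_n(x) = (1/n)·#{i : x_i ≤ x}. The short sketch below evaluates it directly from this definition and compares it with the function returned by ecdf(); the evaluation grid is an arbitrary illustrative choice.

```r
set.seed(123)
rounded_sample <- round(runif(25, min = 2, max = 5), 2)
ecdf_func <- ecdf(rounded_sample)
# Direct evaluation of F_n(x): proportion of observations <= x
manual_ecdf <- function(x) sapply(x, function(t) mean(rounded_sample <= t))
x_grid <- c(2, 2.5, 3, 3.5, 4, 4.5, 5)
print(data.frame(x = x_grid, ecdf = ecdf_func(x_grid), manual = manual_ecdf(x_grid)))
```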
Conclusion:
Hence, we successfully generated a random sample of size 25 from the uniform
distribution U(2, 5) and rounded it to two decimal places. We then calculated
the empirical cumulative distribution function (ECDF) of this sample and
plotted it.
EXPERIMENT – 11
AIM: The following data show the water pH of 16 water samples
collected from the river Ganga at different locations:
7.2, 8.8, 9.3, 7.1, 8.6, 10.1, 9.0, 6.4, 7.4, 8.9, 7.3, 5.8, 6.4, 9.9, 5.2, 6.8.
SOURCE: R Software
Hypothesis:
Ho: The data come from a normal distribution N(µ, σ²).
H1: The data do not come from a normal distribution.
> print(ECDF)
[1] 0.0625 0.1250 0.2500 0.2500 0.3125 0.3750 0.4375 0.5000 0.5625 0.6250
[11] 0.6875 0.7500 0.8125 0.8750 0.9375 1.0000
>
>
> #Now Creating a Data Frame
> df = data.frame(sort_data, ECDF)
> print(df)
sort_data ECDF
1 5.2 0.0625
2 5.8 0.1250
3 6.4 0.2500
4 6.4 0.2500
5 6.8 0.3125
6 7.1 0.3750
7 7.2 0.4375
8 7.3 0.5000
9 7.4 0.5625
10 8.6 0.6250
11 8.8 0.6875
12 8.9 0.7500
13 9.0 0.8125
14 9.3 0.8750
15 9.9 0.9375
16 10.1 1.0000
>
> me = mean(data) #Mean of data
> print(me)
[1] 7.7625
> var = var(data) #Variance of data
> print(var)
[1] 2.2105
>
> sd <- sqrt(var) #SD of data
> print(sd)
[1] 1.486775
>
> cdf <- pnorm(sort_data, me, sd)
> print(cdf)
[1] 0.04239645 0.09342234 0.17972515 0.17972515 0.25869485 0.32794480
[7] 0.35259063 0.37787143 0.40368654 0.71338460 0.75735483 0.77788768
[13] 0.79739103 0.84945937 0.92473692 0.94204731
>
> resulted_data <- data.frame(sort_data, cdf, ECDF)
> print(resulted_data)
sort_data cdf ECDF
1 5.2 0.04239645 0.0625
2 5.8 0.09342234 0.1250
3 6.4 0.17972515 0.2500
4 6.4 0.17972515 0.2500
5 6.8 0.25869485 0.3125
6 7.1 0.32794480 0.3750
7 7.2 0.35259063 0.4375
8 7.3 0.37787143 0.5000
9 7.4 0.40368654 0.5625
10 8.6 0.71338460 0.6250
11 8.8 0.75735483 0.6875
12 8.9 0.77788768 0.7500
13 9.0 0.79739103 0.8125
14 9.3 0.84945937 0.8750
15 9.9 0.92473692 0.9375
16 10.1 0.94204731 1.0000
>
> #Find maximum of absolute differences between ECDF and cdf.
> diff <- abs(ECDF - cdf)
> print(diff)
[1] 0.02010355 0.03157766 0.07027485 0.07027485 0.05380515 0.04705520
[7] 0.08490937 0.12212857 0.15881346 0.08838460 0.06985483 0.02788768
[13] 0.01510897 0.02554063 0.01276308 0.05795269
>
> max_diff <- max(diff)
> print(max_diff)
[1] 0.1588135
>
> #By direct method.
> KS_test <- ks.test(sort_data, "pnorm", mean = me, sd = sd)
Warning message:
In ks.test.default(sort_data, "pnorm", mean = me, sd = sd) :
ties should not be present for the one-sample Kolmogorov-Smirnov test
> print(KS_test)
data: sort_data
D = 0.15881, p-value = 0.8145
alternative hypothesis: two-sided
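The manual step above measures the gap |F_n(x_(i)) − F_0(x_(i))| only at the top of each ECDF step. The Kolmogorov–Smirnov statistic also checks just below each jump, D = max over i of max(i/n − F_0(x_(i)), F_0(x_(i)) − (i − 1)/n). Both give 0.1588 here, but the sketch below uses the full formula. Because µ and σ are estimated from the same data, the standard K-S p-value is conservative (the Lilliefors correction addresses this; it is not applied here).

```r
ph <- c(7.2, 8.8, 9.3, 7.1, 8.6, 10.1, 9.0, 6.4, 7.4, 8.9, 7.3, 5.8, 6.4, 9.9, 5.2, 6.8)
x <- sort(ph)
n <- length(x)
F0 <- pnorm(x, mean = mean(ph), sd = sd(ph))   # fitted normal CDF at sorted data
D_plus  <- max((1:n) / n - F0)                 # ECDF above F0 (top of each step)
D_minus <- max(F0 - (0:(n - 1)) / n)           # F0 above ECDF (just below each jump)
D <- max(D_plus, D_minus)
cat("D+ =", round(D_plus, 5), " D- =", round(D_minus, 5), " D =", round(D, 5), "\n")
```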
Conclusion:
Here, the p-value (0.8145) is greater than α = 0.05 (p* > α). So, we fail to reject
the null hypothesis and conclude that the data are consistent with a normal
distribution N(µ, σ²).
EXPERIMENT – 12
Aim: The following data represent the lifetime (in hours) of batteries of two
different brands:
Brand A: 40,30,40,45,55,30
Brand B: 50,50,45,55,60,40
Test whether the two brands differ with respect to their average life, and plot the graph.
Given that
D_{6,6,0.05} = 4/6 and t_{10,0.05} = 2.2281
Source: R Software
Hypothesis:
Ho: The lifetimes of the two brands of battery do not differ.
H1: The lifetimes of the two brands of battery differ.
Source Code and Output
> brand_A = c(40,30,40,45,55,30)
> brand_B = c(50,50,45,55,60,40)
>
> print(brand_A); print(brand_B)
[1] 40 30 40 45 55 30
[1] 50 50 45 55 60 40
>
> KS_result = ks.test(brand_A,brand_B)
> print(KS_result)
Graph:
> #Now plot the graph
> #loading the required package
> library(dgof)
> brand_A = c(40,30,40,45,55,30)
> brand_B = c(50,50,45,55,60,40)
> plot(ecdf(brand_A),xlim = range(c(brand_A,brand_B)),col="red")
> plot(ecdf(brand_B),add=TRUE,lty="dashed",col="yellow")
Conclusion:
If D_{m,n} > D_{m,n,α}, we reject the null hypothesis.
Here, D_{m,n} = 0.5 and D_{m,n,α} = 4/6 ≈ 0.667.
Since the calculated value D_{m,n} is not greater than the tabulated value D_{m,n,α}, we fail to reject
the null hypothesis.
Hence, we conclude that the lifetimes of the two brands of battery do not differ significantly.
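Both ECDFs are step functions that jump only at observed values, so D_{m,n} can be found by evaluating them at the pooled data points, as sketched below. Since the aim also quotes t_{10,0.05} = 2.2281, the sketch adds the pooled-variance two-sample t-test (6 + 6 − 2 = 10 degrees of freedom) as a cross-check on the mean lifetimes; this t-test is an added illustration, not part of the original analysis.

```r
brand_A <- c(40, 30, 40, 45, 55, 30)
brand_B <- c(50, 50, 45, 55, 60, 40)
pts <- sort(unique(c(brand_A, brand_B)))        # all jump points of both ECDFs
D <- max(abs(ecdf(brand_A)(pts) - ecdf(brand_B)(pts)))
D_crit <- 4 / 6                                 # tabulated D_{6,6,0.05}
cat("D =", D, " reject H0:", D > D_crit, "\n")
# Pooled-variance two-sample t-test, df = 10
tt <- t.test(brand_A, brand_B, var.equal = TRUE)
cat("t =", round(unname(tt$statistic), 4), " critical value = 2.2281\n")
```

Here D = 0.5 < 4/6 and |t| ≈ 2.07 < 2.2281, so both tests lead to the same decision.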
EXPERIMENT – 13
AIM: Generate a random sample of size 30 from the exponential distribution with
mean 5. Check whether the generated random sample comes from the
exponential distribution with mean 5.
SOURCE: R Software
Hypothesis:
Ho: The sample data comes from an exponential distribution with a mean of 5.
H1: The sample data does not come from an exponential distribution with a mean of 5.
Source Code and Output:
data: random_sample
D = 0.13405, p-value = 0.6064
alternative hypothesis: two-sided
Conclusion:
Since the p-value (0.6064) obtained from the K-S test is greater than α = 0.05, we fail to reject the null hypothesis.
This suggests that the generated sample is consistent with an exponential distribution with mean 5.
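The source code for this experiment is not shown. A minimal sketch follows, assuming a seed of 123 (hypothetical; the original seed is unknown) and using rate = 1/5, since an exponential distribution with mean 5 has rate 1/5. Its D and p-value depend on the sample drawn, so they will generally differ from the output above.

```r
set.seed(123)                                   # hypothetical seed
random_sample <- rexp(30, rate = 1 / 5)         # mean = 1 / rate = 5
ks_result <- ks.test(random_sample, "pexp", rate = 1 / 5)
print(ks_result)
```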
EXPERIMENT – 14
AIM: Compare two teaching methods, a new teaching method and the existing
teaching method, after six months of teaching, based on the results of a reading
test given to 27 slow learners. The post-teaching scores (out of 200) are given
below. Examine whether the two populations differ with respect to their medians
at the 5% level of significance.
New teaching method:      113 119 108 111 114 138 135 120 130 122
Existing teaching method: 157 160 109 177 142 115 164 155 137 150 170 160 120 162 155 139 175
Hypothesis:
Ho: M_new = M_existing
H1: M_new ≠ M_existing
Where,
M_new is the median post-teaching reading score for the new teaching method.
M_existing is the median post-teaching reading score for the existing teaching method.
> # Perform the Mann-Whitney U-test
> wilcox_test_result <- wilcox.test(new_method, existing_method)
Warning message:
In wilcox.test.default(new_method, existing_method) :
cannot compute exact p-value with ties
>
> # Display the test results
> print(wilcox_test_result)
p-value = 0.001305
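The transcript does not show how new_method and existing_method were defined. Entering the scores from the table (10 new-method and 17 existing-method scores), a sketch is below; exact = FALSE asks for the normal approximation that wilcox.test falls back to anyway when there are ties, so the warning above is avoided.

```r
new_method <- c(113, 119, 108, 111, 114, 138, 135, 120, 130, 122)
existing_method <- c(157, 160, 109, 177, 142, 115, 164, 155, 137, 150,
                     170, 160, 120, 162, 155, 139, 175)
res <- wilcox.test(new_method, existing_method, exact = FALSE)
print(res)
```

This gives W = 20.5 and a p-value of about 0.0013, in line with the reported 0.001305.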
CONCLUSION:
Since the p-value (0.001305) is less than 0.05, we reject the null
hypothesis and conclude that there is a significant difference in the median
post-teaching reading scores between the new and existing teaching methods.