STATS 10 Assignment 1 PT 2
You may choose to type or write your answers electronically or scan your handwritten
solutions. Please ensure that you show all steps and explanations to receive full credit,
unless otherwise instructed.
1. A data set on Shark Attacks Worldwide posted on StatCrunch records data on all shark
attacks in recorded history including attacks before 1800. The data set can be viewed here:
https://www.statcrunch.com/app/index.html?dataid=2188687
b. Which of the following questions could not be answered using this data set? Briefly
explain.
i. In what month do most shark attacks occur?
ii. Are shark attacks more likely to occur in warm temperature or cooler
temperatures?
Answer: Question ii cannot be answered because the data set has no variable for
temperature. The other questions can be answered from the variables that are recorded.
iii. Attacks by which species of shark are more likely to result in a fatality?
iv. What country has the most shark attacks per year?
c. A researcher wants to understand the age of the people in the data set and proposed some
questions of interest: Are the reported cases mostly younger people or older people?
How is the age distributed? How would you help the researcher answer these questions?
What statistical tools (e.g., graphs, measures) will you use? (You only need to describe
your approach)
i. First, I would advise the researcher to create a histogram of the age variable using
all of the data. The bins could each be 5 years wide, such as 15 to 20, 20 to 25, 25 to
30, and so on up to the oldest recorded age.
ii. From the histogram, the researcher can then see whether the reported cases are
mostly younger or older people by looking at the overall shape of the histogram,
taking note of the skewness and the modality.
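The approach described in part c can be sketched in a few lines of Python. This is only an illustration: the ages below are hypothetical values invented for the example and are not taken from the StatCrunch data set, and the helper name `bin_start` is also made up here. The sketch groups ages into bins 5 years wide, prints a text histogram, and compares the mean with the median as a rough check on skewness.

```python
from collections import Counter
from statistics import mean, median

# Hypothetical ages, invented purely for illustration (NOT from the data set).
ages = [12, 15, 17, 18, 19, 21, 22, 22, 24, 27, 31, 35, 42, 58]

BIN_WIDTH = 5  # each bin covers 5 years, e.g. 15 up to (but not including) 20


def bin_start(age, width=BIN_WIDTH):
    """Return the lower edge of the bin [start, start + width) containing age."""
    return (age // width) * width


# Count how many ages fall in each bin.
counts = Counter(bin_start(a) for a in ages)

# Print one row per bin, including empty bins, so gaps in the shape are visible.
for start in range(min(counts), max(counts) + BIN_WIDTH, BIN_WIDTH):
    print(f"{start:>2}-{start + BIN_WIDTH:<2} | {'#' * counts[start]}")

# A mean noticeably above the median suggests a right-skewed distribution.
print("mean:", round(mean(ages), 1), "median:", median(ages))
```

With these made-up ages, most values fall in the 15 to 25 range and a few older ages stretch the right tail, so the mean is larger than the median.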
2. The scores of a quiz are displayed in the graph below.
b. Would the mean score be greater than, less than, or about the same as the median score?
Explain.
i. In a graph that is skewed left, the mean score would be less than the median
score, because the few low scores in the left tail pull the mean down more than
they affect the median.
c. What measures would you use to report the center and spread? Explain.
i. To measure the center, I would use the median, because it is a good measure of a
typical value for skewed distributions. To measure the spread, I would use the
interquartile range, because it pairs with the median, is not affected by extreme
scores, and describes the range of the middle 50% of the scores.
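The reasoning in parts b and c can be checked on a small example. The scores below are hypothetical (the graph from the assignment is not reproduced here), chosen only so that the distribution is skewed left. Quartile conventions differ between textbooks and software, so the quartiles from Python's `statistics.quantiles` (default "exclusive" method) may not match a hand calculation exactly.

```python
from statistics import mean, median, quantiles

# Hypothetical left-skewed quiz scores out of 10 (invented for illustration).
scores = [2, 5, 7, 8, 8, 9, 9, 9, 10, 10, 10]

avg = mean(scores)
med = median(scores)

# quantiles(..., n=4) returns the three cut points Q1, Q2 (median), and Q3.
q1, _, q3 = quantiles(scores, n=4)
iqr = q3 - q1

print("mean:", round(avg, 2))   # pulled down by the low scores 2 and 5
print("median:", med)
print("Q1:", q1, "Q3:", q3, "IQR:", iqr)
```

For these scores the mean falls below the median, which is the pattern expected for a left-skewed distribution.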
3. The distribution of test scores in a class is unimodal and symmetric with a mean of 80 pts
and a standard deviation of 7pts. Based on the information, Adam estimated that his score is
higher than approximately 97.5% of the students in class. What score did Adam receive?
Explain.
i. Adam's score is about 94 points. Because the distribution is unimodal and
symmetric, the Empirical Rule applies: about 95% of scores fall within two standard
deviations of the mean, so about 2.5% fall above 80 + 2(7) = 94 and about 97.5%
fall below it. A score higher than approximately 97.5% of the class is therefore
two standard deviations above the mean, or 94 points.
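The arithmetic for question 3 can be sketched as follows. The first calculation uses the Empirical Rule, as in the answer above. The second is an extra check that assumes the scores are exactly normal, which is stronger than the "unimodal and symmetric" description given in the question, so it should be read as an approximation only.

```python
from statistics import NormalDist

MEAN, SD = 80, 7  # values given in the question

# Empirical Rule: about 97.5% of scores lie below mean + 2 standard deviations.
empirical_score = MEAN + 2 * SD

# Assumption (not stated in the question): scores follow an exact normal curve.
exact_score = NormalDist(mu=MEAN, sigma=SD).inv_cdf(0.975)

print("Empirical Rule estimate:", empirical_score)
print("Normal-model 97.5th percentile:", round(exact_score, 1))
```

Both approaches give a score of roughly 94 points; the normal-model value is slightly lower because the exact cutoff is about 1.96 standard deviations rather than 2.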
4. Assume that both men and women’s heights have symmetric and unimodal distributions.
Women’s distribution has a mean of 64 inches and a standard deviation of 2.5 inches. Men’s
distribution has a mean of 69 inches and a standard deviation of 3 inches.
a. What women's height corresponds with a z-score of -1.50?
i. A z-score of -1.50 means 1.5 standard deviations below the mean, so the
corresponding height for women is 64 - 1.50(2.5) = 60.25 inches.
b. Professional basketball player Evelyn Akhator is 75 inches tall and plays in the
WNBA (women’s league). Professional basketball player Draymond Green is 79 inches
tall and plays in the NBA (men’s league). Compared to their own peers, who is taller?
i. Evelyn Akhator has a z-score of (75 - 64) / 2.5 = 4.4, and Draymond Green has a
z-score of (79 - 69) / 3 = 3.3 (z-scores have no units). Because her z-score is
larger, Evelyn Akhator is taller compared to her own peers.
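The calculations for question 4 can be sketched with two small helper functions. The function names `z_score` and `value_from_z` are made up for this example; the means, standard deviations, and heights are the ones given in the question.

```python
def z_score(value, mean, sd):
    """Number of standard deviations that value lies above (+) or below (-) the mean."""
    return (value - mean) / sd


def value_from_z(z, mean, sd):
    """Convert a z-score back to the original units (inches here)."""
    return mean + z * sd


WOMEN = (64, 2.5)  # (mean, standard deviation) in inches, from the question
MEN = (69, 3)

# Part a: women's height with a z-score of -1.50.
height_4a = value_from_z(-1.50, *WOMEN)

# Part b: compare each player with the distribution for their own group.
z_akhator = z_score(75, *WOMEN)
z_green = z_score(79, *MEN)

print("Height at z = -1.50:", height_4a, "inches")
print("Akhator z:", round(z_akhator, 2), "Green z:", round(z_green, 2))
print("Taller relative to peers:", "Akhator" if z_akhator > z_green else "Green")
```

Comparing z-scores rather than raw heights is what allows a fair comparison across the two distributions, since each height is measured in standard deviations from its own group's mean.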
5. The top ten movies based on Marvel comic book characters for the U.S. box office as of fall
2017 are shown in the following table, with domestic gross rounded to the nearest hundred
million. (Source: ultimatemovieranking.com)
6. The data set below shows the number of central public libraries in 32 states.
The five number summary is given as:
Minimum   Q1   Median   Q3    Maximum
1         62   91       218   756
Sketch a boxplot using the five-number summary above and the data below.
Mark the values of the quartiles, the lower whisker, the upper whisker, and any potential
outliers in the boxplot. Explain how you determined the length of the whiskers. (The
scale of the plot does not need to be accurate)
Lower fence = Q1 - 1.5*IQR and upper fence = Q3 + 1.5*IQR, where IQR = Q3 - Q1 =
218 - 62 = 156.
I found the lower fence by subtracting 1.5 * IQR from Q1: 62 - 234 = -172. No value is
below -172, so the lower whisker ends at the minimum, 1. I found the upper fence by
adding 1.5 * IQR to Q3: 218 + 234 = 452. The upper whisker ends at the largest value
that is not above 452, and any values above 452, such as the maximum of 756, are
potential outliers.
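The fence calculation for question 6 can be checked with a short sketch that uses only the five-number summary given above. The individual state values are not reproduced here, so the code can confirm the fences and flag the maximum, but it cannot identify exactly where the upper whisker ends.

```python
# Five-number summary from the question.
MINIMUM, Q1, MEDIAN, Q3, MAXIMUM = 1, 62, 91, 218, 756

iqr = Q3 - Q1                    # spread of the middle 50% of the data
lower_fence = Q1 - 1.5 * iqr     # values below this are potential outliers
upper_fence = Q3 + 1.5 * iqr     # values above this are potential outliers

print("IQR:", iqr)
print("Lower fence:", lower_fence, "Upper fence:", upper_fence)

# The minimum is inside the lower fence, so the lower whisker ends at the minimum.
print("Minimum is a potential outlier:", MINIMUM < lower_fence)

# The maximum is beyond the upper fence, so it is a potential outlier.
print("Maximum is a potential outlier:", MAXIMUM > upper_fence)
```

Because the maximum lies beyond the upper fence, the upper whisker stops at the largest data value at or below the fence, and the points above it are drawn individually.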