Statistical Methods

This book describes some statistical methods for beginners.
STATISTICAL METHODS
Donna L. Mohr, William J. Wilson, Rudolf J. Freund

Academic Press is an imprint of Elsevier
125 London Wall, London EC2Y 5AS, United Kingdom
525 B Street, Suite 1650, San Diego, CA 92101, United States
50 Hampshire Street, 5th Floor, Cambridge, MA 02139, United States
The Boulevard, Langford Lane, Kidlington, Oxford OX5 1GB, United Kingdom

Copyright © 2022 Elsevier Inc. All rights reserved.

No part of this publication may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, recording, or any information storage and retrieval system, without permission in writing from the publisher. Details on how to seek permission, further information about the Publisher's permissions policies and our arrangements with organizations such as the Copyright Clearance Center and the Copyright Licensing Agency, can be found at our website: www.elsevier.com/permissions.

This book and the individual contributions contained in it are protected under copyright by the Publisher (other than as may be noted herein).

Notices
Knowledge and best practice in this field are constantly changing. As new research and experience broaden our understanding, changes in research methods, professional practices, or medical treatment may become necessary.

Practitioners and researchers must always rely on their own experience and knowledge in evaluating and using any information, methods, compounds, or experiments described herein. In using such information or methods they should be mindful of their own safety and the safety of others, including parties for whom they have a professional responsibility.

To the fullest extent of the law, neither the Publisher nor the authors, contributors, or editors, assume any liability for any injury and/or damage to persons or property as a matter of products liability, negligence or otherwise, or from any use or operation of any methods, products, instructions, or ideas contained in the material herein.
British Library Cataloguing-in-Publication Data
A catalogue record for this book is available from the British Library

Library of Congress Cataloging-in-Publication Data
A catalog record for this book is available from the Library of Congress

ISBN: 978-0-12-823043-5

For information on all Academic Press publications visit our website at https://www.elsevier.com/books-and-journals

Content Strategist: Katey Birtcher
Content Development Specialist: Alice Grant
Publishing Services Manager: Shereen Jameel
Project Manager: Rukmani Krishnan
Typeset by MPS Limited, Chennai, India
Printed in India
Last digit is the print number: 9 8 7 6 5 4 3 2 1

Contents

Preface

1. Data and Statistics
1.1 Introduction
  1.1.1 Data Sources
  1.1.2 Using the Computer
1.2 Observations and Variables
1.3 Types of Measurements for Variables
1.4 Distributions
  1.4.1 Graphical Representation of Distributions
1.5 Numerical Descriptive Statistics
  1.5.1 Location
  1.5.2 Dispersion
  1.5.3 Other Measures
  1.5.4 Computing the Mean and Standard Deviation from a Frequency Distribution
  1.5.5 Change of Scale
1.6 Exploratory Data Analysis
  1.6.1 The Stem and Leaf Plot
  1.6.2 The Box Plot
  1.6.3 Examples of Exploratory Data Analysis
1.7 Bivariate Data
  1.7.1 Categorical Variables
  1.7.2 Categorical and Interval Variables
  1.7.3 Interval Variables
1.8 Populations, Samples, and Statistical Inference - A Preview
1.9 Data Collection
1.10 Chapter Summary
1.11 Chapter Exercises
  Concept Questions
  Practice Exercises
  Exercises
  Projects

2. Probability and Sampling Distributions
2.1 Introduction
  2.1.1 Chapter Preview
2.2 Probability
  2.2.1 Definitions and Concepts
  2.2.2 System Reliability
  2.2.3 Random Variables
2.3 Discrete Probability Distributions
  2.3.1 Properties of Discrete Probability Distributions
  2.3.2 Descriptive Measures for Probability Distributions
  2.3.3 The Discrete Uniform Distribution
  2.3.4 The Binomial Distribution
  2.3.5 The Poisson Distribution
2.4 Continuous Probability Distributions
  2.4.1 Characteristics of a Continuous Probability Distribution
  2.4.2 The Continuous Uniform Distribution
  2.4.3 The Normal Distribution
  2.4.4 Calculating Probabilities Using the Table of the Normal Distribution
2.5 Sampling Distributions
  2.5.1 Sampling Distribution of the Mean
  2.5.2 Usefulness of the Sampling Distribution
  2.5.3 Sampling Distribution of a Proportion
2.6 Other Sampling Distributions
  2.6.1 The χ² Distribution
  2.6.2 Distribution of the Sample Variance
  2.6.3 The t Distribution
  2.6.4 Using the t Distribution
  2.6.5 The F Distribution
  2.6.6 Using the F Distribution
  2.6.7 Relationships among the Distributions
2.7 Chapter Summary
2.8 Chapter Exercises
  Concept Questions
  Practice Exercises
  Exercises

3. Principles of Inference
3.1 Introduction
3.2 Hypothesis Testing
  3.2.1 General Considerations
  3.2.2 The Hypotheses
  3.2.3 Rules for Making Decisions
  3.2.4 Possible Errors in Hypothesis Testing
  3.2.5 Probabilities of Making Errors
  3.2.6 Choosing between α and β
  3.2.7 Five-Step Procedure for Hypothesis Testing
  3.2.8 Why Do We Focus on the Type I Error?
  3.2.9 Choosing α
  3.2.10 The Five Steps for Example 3.3
  3.2.11 p Values
  3.2.12 The Probability of a Type II Error
  3.2.13 Power
  3.2.14 Uniformly Most Powerful Tests
  3.2.15 One-Tailed Hypothesis Tests
3.3 Estimation
  3.3.1 Interpreting the Confidence Coefficient
  3.3.2 Relationship between Hypothesis Testing and Confidence Intervals
3.4 Sample Size
3.5 Assumptions
  3.5.1 Statistical Significance versus Practical Significance
3.6 Chapter Summary
3.7 Chapter Exercises
  Concept Questions
  Practice Exercises
  Multiple Choice Questions
  Exercises

4. Inferences on a Single Population
4.1 Introduction
4.2 Inferences on the Population Mean
  4.2.1 Hypothesis Test on μ
  4.2.2 Estimation of μ
  4.2.3 Sample Size
  4.2.4 Degrees of Freedom
4.3 Inferences on a Proportion for Large Samples
  4.3.1 Hypothesis Test on p
  4.3.2 Estimation of p
  4.3.3 Sample Size
4.4 Inferences on the Variance of One Population
  4.4.1 Hypothesis Test on σ²
  4.4.2 Estimation of σ²
4.5 Assumptions
  4.5.1 Required Assumptions and Sources of Violations
  4.5.2 Detection of Violations
  4.5.3 Tests for Normality
  4.5.4 If Assumptions Fail
  4.5.5 Alternate Methodology
4.6 Chapter Summary
4.7 Chapter Exercises
  Concept Questions
  Practice Exercises
  Exercises
  Projects

5. Inferences for Two Populations
5.1 Introduction
5.2 Inferences on the Difference between Means Using Independent Samples
  5.2.1 Sampling Distribution of a Linear Function of Random Variables
  5.2.2 The Sampling Distribution of the Difference between Two Means
  5.2.3 Variances Known
  5.2.4 Variances Unknown but Assumed Equal
  5.2.5 The Pooled Variance Estimate
  5.2.6 The "Pooled" t Test
  5.2.7 Variances Unknown but Not Equal
  5.2.8 Choosing between the Pooled and Unequal Variance t Tests
5.3 Inferences on Variances
5.4 Inferences on Means for Dependent Samples
5.5 Inferences on Proportions for Large Samples
  5.5.1 Comparing Proportions Using Independent Samples
  5.5.2 Comparing Proportions Using Paired Samples
5.6 Assumptions and Remedial Methods
5.7 Chapter Summary
5.8 Chapter Exercises
  Concept Questions
  Practice Exercises
  Exercises
  Projects

6. Inferences for Two or More Means
6.1 Introduction
  6.1.1 Using Statistical Software
6.2 The Analysis of Variance
  6.2.1 Notation and Definitions
  6.2.2 Heuristic Justification for the Analysis of Variance
  6.2.3 Computational Formulas and the Partitioning of Sums of Squares
  6.2.4 The Sum of Squares between Means
  6.2.5 The Sum of Squares within Groups
  6.2.6 The Ratio of Variances
  6.2.7 Partitioning of the Sums of Squares
6.3 The Linear Model
  6.3.1 The Linear Model for a Single Population
  6.3.2 The Linear Model for Several Populations
  6.3.3 The Analysis of Variance Model
  6.3.4 Fixed and Random Effects Model
  6.3.5 The Hypotheses
  6.3.6 Expected Mean Squares
6.4 Assumptions
  6.4.1 Assumptions and Detection of Violations
  6.4.2 Formal Tests for the Assumption of Equal Variance
  6.4.3 Remedial Measures
6.5 Specific Comparisons
  6.5.1 Contrasts
  6.5.2 Constructing a t Statistic for a Contrast
  6.5.3 Planned Contrasts with No Pattern - Bonferroni's Method
  6.5.4 Planned Comparisons versus Control - Dunnett's Method
  6.5.5 Planned All Possible Pairwise Comparisons - Fisher's LSD and Tukey's HSD
  6.5.6 Planned Orthogonal Contrasts
  6.5.7 Unplanned Contrasts - Scheffé's Method
  6.5.8 Comments
6.6 Random Models
6.7 Analysis of Means
  6.7.1 ANOM for Proportions
  6.7.2 ANOM for Count Data
6.8 Chapter Summary
6.9 Chapter Exercises
  Concept Questions
  Practice Exercises
  Exercises
  Projects

7. Linear Regression
7.1 Introduction
7.2 The Regression Model
7.3 Estimation of Parameters β₀ and β₁
  7.3.1 A Note on Least Squares
7.4 Estimation of σ² and the Partitioning of Sums of Squares
7.5 Inferences for Regression
  7.5.1 The Analysis of Variance Test for β₁
  7.5.2 The (Equivalent) t Test for β₁
  7.5.3 Confidence Interval for β₁
  7.5.4 Inferences on the Response Variable
7.6 Using Statistical Software
7.7 Correlation
7.8 Regression Diagnostics
7.9 Chapter Summary
7.10 Chapter Exercises
  Concept Questions
  Practice Exercises
  Exercises
  Projects

8. Multiple Regression
8.1 The Multiple Regression Model
  8.1.1 The Partial Regression Coefficient
8.2 Estimation of Coefficients
  8.2.1 Simple Linear Regression with Matrices
  8.2.2 Estimating the Parameters of a Multiple Regression Model
  8.2.3 Correcting for the Mean, an Alternative Calculating Method
8.3 Inferential Procedures
  8.3.1 Estimation of σ² and the Partitioning of the Sums of Squares
  8.3.2 The Coefficient of Variation
  8.3.3 Inferences for Coefficients
  8.3.4 Tests Normally Provided by Statistical Software
  8.3.5 The Equivalent t Statistic for Individual Coefficients
  8.3.6 Inferences on the Response Variable
8.4 Correlations
  8.4.1 Multiple Correlation
  8.4.2 How Useful Is the R² Statistic?
  8.4.3 Partial Correlation
8.5 Using Statistical Software
8.6 Special Models
  8.6.1 The Polynomial Model
  8.6.2 The Multiplicative Model
  8.6.3 Nonlinear Models
8.7 Multicollinearity
  8.7.1 Redefining Variables
  8.7.2 Other Methods
8.8 Variable Selection
  8.8.1 Other Selection Procedures
8.9 Detection of Outliers, Row Diagnostics
8.10 Chapter Summary
8.11 Chapter Exercises
  Concept Questions
  Practice Exercises
  Exercises
  Projects

9. Factorial Experiments
9.1 Introduction
9.2 Concepts and Definitions
9.3 The Two-Factor Factorial Experiment
  9.3.1 The Linear Model
  9.3.2 Notation
  9.3.3 Computations for the Analysis of Variance
  9.3.4 Between-Cells Analysis
  9.3.5 The Factorial Analysis
  9.3.6 Expected Mean Squares
  9.3.7 Unbalanced Data
9.4 Specific Comparisons
  9.4.1 Preplanned Contrasts
  9.4.2 Basic Test Statistic for Contrasts
  9.4.3 Multiple Comparisons
9.5 Quantitative Factors
  9.5.1 Lack of Fit
9.6 No Replications
9.7 Three or More Factors
  9.7.1 Additional Considerations
9.8 Chapter Summary
9.9 Chapter Exercises
  Concept Questions
  Practice Exercises
  Exercises
  Projects

10. Design of Experiments
10.1 Introduction
10.2 The Randomized Block Design
  10.2.1 The Linear Model
  10.2.2 Relative Efficiency
  10.2.3 Random Treatment Effects in the Randomized Block Design
10.3 Randomized Blocks with Sampling
10.4 Other Designs
  10.4.1 Factorial Experiments in a Randomized Block Design
  10.4.2 Nested Designs
10.5 Repeated Measures Designs
  10.5.1 One Between-Subject and One Within-Subject Factor
  10.5.2 Two Within-Subject Factors
  10.5.3 Assumptions of the Repeated Measures Model
  10.5.4 Split Plot Designs
  10.5.5 Additional Topics
10.6 Chapter Summary
10.7 Chapter Exercises
  Concept Questions
  Practice Exercises
  Exercises
  Projects

11. Other Linear Models
11.1 Introduction
11.2 The Dummy Variable Model
  11.2.1 Factor Effects Coding
  11.2.2 Reference Cell Coding
  11.2.3 Comparing Coding Schemes
11.3 Unbalanced Data
11.4 Statistical Software's Implementation of the Dummy Variable Model
11.5 Models with Dummy and Interval Variables
  11.5.1 Analysis of Covariance
  11.5.2 Multiple Covariates
  11.5.3 Unequal Slopes
  11.5.4 Independence of Covariates and Factors
11.6 Extensions to Other Models
11.7 Estimating Linear Combinations of Regression Parameters
  11.7.1 Covariance Matrices
  11.7.2 Linear Combination of Regression Parameters
11.8 Weighted Least Squares
11.9 Correlated Errors
11.10 Chapter Summary
11.11 Chapter Exercises
  Concept Questions
  Practice Exercises
  Exercises
  Projects

12. Categorical Data
12.1 Introduction
12.2 Hypothesis Tests for a Multinomial Population
12.3 Goodness of Fit Using the χ² Test
  12.3.1 Test for a Discrete Distribution
  12.3.2 Test for a Continuous Distribution
12.4 Contingency Tables
  12.4.1 Computing the Test Statistic
  12.4.2 Test for Homogeneity
  12.4.3 Test for Independence
  12.4.4 Measures of Dependence
  12.4.5 Likelihood Ratio Test
  12.4.6 Fisher's Exact Test
12.5 Specific Comparisons in Contingency Tables
12.6 Chapter Summary
12.7 Chapter Exercises
  Concept Questions
  Practice Exercises
  Exercises
  Projects

13. Special Types of Regression
13.1 Introduction
  13.1.1 Maximum Likelihood and Least Squares
13.2 Logistic Regression
13.3 Poisson Regression
  13.3.1 Choosing between Logistic and Poisson Regression
13.4 Nonlinear Least Squares Regression
  13.4.1 Sigmoidal Shapes (S Curves)
  13.4.2 Symmetric Unimodal Shapes
13.5 Chapter Summary
13.6 Chapter Exercises
  Concept Questions
  Practice Exercises
  Exercises
  Projects

14. Nonparametric Methods
14.1 Introduction
  14.1.1 Ranks
  14.1.2 Randomization Tests
  14.1.3 Comparing Parametric and Nonparametric Procedures
14.2 One Sample
14.3 Two Independent Samples
14.4 More Than Two Samples
14.5 Randomized Block Design
14.6 Rank Correlation
14.7 The Bootstrap
14.8 Chapter Summary
14.9 Chapter Exercises
  Concept Questions
  Practice Exercises
  Exercises
  Projects

APPENDIX A Tables of Distributions
A.1 Table of the Standard Normal Distribution
A.1A Table of Critical Values for the Standard Normal Distribution
A.2 Student's t Distribution - Values Exceeded by a Given Probability α
A.3 The χ² Distribution - Values Exceeded by a Given Probability α
A.4 The F Distribution - 10% in the Upper Tail, P(F > c) = 0.10
A.4A The F Distribution - 5% in the Upper Tail, P(F > c) = 0.05
A.4B The F Distribution - 2.5% in the Upper Tail, P(F > c) = 0.025
A.4C The F Distribution - 1% in the Upper Tail, P(F > c) = 0.01
A.5 Critical Values for Dunnett's Two-Sided Test of Treatments versus Control
A.6 Critical Values of the Studentized Range, for Tukey's HSD
A.7 Critical Values for Use with the Analysis of Means (ANOM)
A.8 Critical Values for the Wilcoxon Signed Rank Test
A.9 Critical Values for the Mann-Whitney Rank Sums Test

APPENDIX B A Brief Introduction to Matrices
B.1 Matrix Algebra (online)
B.2 Solving Linear Equations (online)

APPENDIX C Descriptions of Data Sets
C.1 Florida Lake Data
C.2 State Education Data
C.3 National Atmospheric Deposition Program (NADP) Data
C.4 Florida County Data
C.5 Cowpea Data
C.6 Jax House Prices Data
C.7 Gainesville, FL, Weather Data
C.8 General Social Survey (GSS) 2016 Data

Hints for Selected Exercises
References
Index

Preface

The goal of Statistical Methods, Fourth Edition, is to introduce the student both to statistical reasoning and to the most commonly used statistical techniques. It is designed for undergraduates in statistics, engineering, the quantitative sciences, or mathematics, or for graduate students in a wide range of disciplines requiring statistical analysis of data. The text can be covered in a two-semester sequence, with the first semester corresponding to the foundational ideas in Chapters 1 through 7 and perhaps Chapter 12.

Throughout the text, techniques have almost universal applicability.
They may be illustrated with examples from agriculture or education, but the applications could just as easily have occurred in public administration or engineering. Our ambition is that students who master this material will be able to select, implement, and interpret the most common types of analyses as they undertake research in their own disciplines. They should be able to read research articles and in most cases understand the descriptions of the statistical results and how the authors used them to reach their conclusions. They should understand the pitfalls of collecting statistical data and the roles played by the various mathematical assumptions.

Statistics can be studied at several levels. On one hand, students can learn by rote how to plug numbers into formulas, or more often now, into statistical software, and draw a number with a neat circle around it as the answer. This limited approach rarely leads to the kind of understanding that allows students to critically select methods and interpret results. On the other hand, there are numerous textbooks that provide introductions to the elegant mathematical backgrounds of the methods. Although this is a much deeper understanding than the first approach, its prerequisite mathematical understanding closes it to practitioners from many other disciplines.

In this text, we have tried to take a middle way. We present enough of the formulas to motivate the techniques, and illustrate their numerical application in small examples. However, the focus of the discussion is on the selection of the technique, the interpretation of the results, and a critique of the validity of the analysis. We urge the student (and instructor) to focus on these skills.

Guiding Principles
• No mathematics beyond algebra is required. However, mathematically oriented students may still find the material in this book challenging, especially if they also participate in courses in statistical theory.
• Formulas are presented primarily to show the how and why of a particular statistical analysis. For that reason, there are a minimal number of exercises that plug numbers into formulas.
• All examples are worked to a logical conclusion, including interpretation of results. Where computer printouts are used, results are discussed and explained. In general, the emphasis is on conclusions rather than mechanics.
• Throughout the book we stress that certain assumptions about the data must be fulfilled for the statistical analyses to be valid, and we emphasize that although the assumptions are often fulfilled, they should be routinely checked.
• Examples of the statistical techniques, as they are actually applied by researchers, are presented throughout the text, both in the chapter discussions and in the exercises.
• Students will have opportunities to work with data drawn from a variety of disciplines.

New to this Edition
• Streamlined Presentation. Numerous sections have been completely rewritten with the goal of a more concise description of the methods.
• Practice Problems for Every Chapter. Every chapter now includes Practice Exercises, with full solutions presented at the end of the text.
• Additional Data Sets for Projects. We have added three new data sets that instructors can use in preparing assignments, and we have updated the old data sets.

Using this Book

Organization
The organization of Statistical Methods, Fourth Edition, follows the classical order. The formulas in the book are generally the so-called definitional ones that emphasize concepts rather than computational efficiency. These formulas can be used for a few of the very simplest examples and problems, but we expect that virtually all exercises will be implemented on computers using special-purpose statistical software.
The first seven chapters, which are normally covered in a first semester, include data description, probability and sampling distributions, the basics of inference for one and two sample situations, the analysis of variance, and one-variable regression. The second portion of the book starts with chapters on multiple regression, factorial experiments, experimental design, and an introduction to general linear models including the analysis of covariance. We have separated factorial experiments and design of experiments because they are different applications of the same numeric methods.

The last three chapters introduce topics in the analysis of categorical data, logistic and other special types of regression, and nonparametric statistics. These chapters provide a brief introduction to these important topics and are intended to round out the statistical education of those who will learn from this book.

Coverage
This book contains more material than can be covered in a two-semester course. We have purposely done this for two reasons:
• Because of the wide variety of audiences for statistical methods, not all instructors will want to cover the same material. For example, courses with heavy enrollments of students from the social and behavioral sciences will want to emphasize nonparametric methods and the analysis of categorical data with less emphasis on experimental design.
• Students who have taken statistical methods courses tend to keep their statistics books for future reference. We recognize that no single book will ever serve as a complete reference, but we hope that the broad coverage in this book will at least lead these students in the proper direction when the occasion demands.

Sequencing
For the most part, topics are arranged so that each new topic builds on previous topics; hence course sequencing should follow the book.
There are, however, some exceptions that may appeal to some instructors:
• In some cases it may be preferable to present the material on categorical data at an early stage. Much of the material in Chapter 12 (Categorical Data) can be taught anytime after Chapter 5 (Inferences for Two Populations).
• Some instructors prefer to present nonparametric methods along with parametric methods. Again, any of the sections in Chapter 14 (Nonparametric Methods) may be extracted and presented along with their analogous parametric topic in earlier chapters.

Data Sets
Data files for all exercises and examples are available from the text Web site at https://www.elsevier.com/books-and-journals/book-companion/9780128230435 in ASCII (txt), Excel, and SAS format. Appendix C fully describes eight data sets drawn from the geosciences, social sciences, and agricultural sciences that are suitable for a variety of small projects.

Computing
It is essential that students have access to statistical software. All the methods used in this text are common enough so that any multipurpose statistical software should suffice. (The single exception is the bootstrap, at the very end of the text.) For consistency and convenience, and because it is the most widely used single statistical computing package, we have relied heavily on the SAS System to illustrate examples in this text. However, we stress that the examples and exercises could as easily have been done in SPSS, Stata, R, Minitab, or any of a number of other software packages. As we demonstrate in a few cases, the various printouts contain enough common information that, with the aid of documentation, someone who can interpret results from one package should be able to do so from any other.

This text does not attempt to teach SAS or any other statistical software. Generic rather than software-specific instructions are the only directions given for performing the analyses.
Most common statistical software has an increasing amount of independently published material available, either in traditional print or online. For those who wish to use the SAS System, sample programs for the examples within each chapter have been provided on the text Web site at https://www.elsevier.com/books-and-journals/book-companion/9780128230435. Students may find these of use as template programs that they can adapt for the exercises.

Acknowledgments
I was pleased when Rudy Freund and Bill Wilson invited me to help with the Third Edition, and honored to have the opportunity to become lead author on the Fourth Edition. Both experiences have left me with a tremendous respect for the erudition, time, and just plain hard work that Rudy and Bill put into writing the original text. My respect for them as statisticians, teachers, and mentors is unbounded.

Sadly, Rudy Freund passed away in 2014. His reputation lives on with the numerous texts and research articles that he authored, and with the students that he inspired.

Donna Mohr, PhD
Emeritus Faculty of the University of North Florida

CHAPTER 1
Data and Statistics

Contents
1.1 Introduction
  1.1.1 Data Sources
  1.1.2 Using the Computer
1.2 Observations and Variables
1.3 Types of Measurements for Variables
1.4 Distributions
  1.4.1 Graphical Representation of Distributions
1.5 Numerical Descriptive Statistics
  1.5.1 Location
  1.5.2 Dispersion
  1.5.3 Other Measures
  1.5.4 Computing the Mean and Standard Deviation from a Frequency Distribution
  1.5.5 Change of Scale
1.6 Exploratory Data Analysis
  1.6.1 The Stem and Leaf Plot
  1.6.2 The Box Plot
  1.6.3 Examples of Exploratory Data Analysis
1.7 Bivariate Data
  1.7.1 Categorical Variables
  1.7.2 Categorical and Interval Variables
  1.7.3 Interval Variables
1.8 Populations, Samples, and Statistical Inference - A Preview
1.9 Data Collection
1.10 Chapter Summary
1.11 Chapter Exercises

Statistical Methods. © 2022 Elsevier Inc. All rights reserved.
DOI: https://doi.org/10.1016/B978-0-12-823043-5.00001-1

1.1 Introduction
To most people, the word statistics conjures up images of vast tables of numbers referring to stock prices, population, or baseball batting averages. Statistics, however, actually denotes a system for reasoning based on data. The collection of the data, its description through appropriate summaries, and the methods for drawing conclusions from it all form the discipline of statistics. It is the fundamental tool for data-driven reasoning. It is appropriate, then, to begin with a discussion of the characteristics of data. The purpose of this chapter is to
1. provide the definition of a set of data,
2. define the components of such a data set,
3. present tools that are used to describe a data set, and briefly
4. discuss methods of data collection.

Definition 1.1: A set of data is a collection of observed values representing one or more characteristics of some objects or units.

Example 1.1 GSS - A Typical Data Set
Every year, the National Opinion Research Center (NORC) publishes the results of a personal interview survey of U.S. households. This survey is called the General Social Survey (GSS) and is the basis for many studies conducted in the social sciences.
In the 1996 GSS, a total of 2904 households were sampled and asked over 70 questions concerning lifestyles, incomes, religious and political beliefs, and opinions on various topics. Table 1.1 lists the data for a sample of 50 respondents on four of the questions asked. This table illustrates a typical midsized data set. Each of the rows corresponds to a particular respondent (labeled 1 through 50 in the first column). Each of the columns, starting with column two, contains the responses to one of the following four questions:
1. AGE: The respondent's age in years
2. SEX: The respondent's sex, coded 1 for male and 2 for female
3. HAPPY: The respondent's general happiness, coded 1 for "Not too happy," 2 for "Pretty happy," and 3 for "Very happy"
4. TVHOURS: The average number of hours the respondent watched TV during a day

This data set obviously contains a lot of information about this sample of 50 respondents. Unfortunately, this information is hard to interpret when the data are presented as shown in Table 1.1. There are just too many numbers to make any sense of the data, and we are only looking at 50 respondents! By summarizing some aspects of this data set, we can obtain much more usable information and perhaps even answer some specific questions. For example, what can we say about the overall frequency of the various levels of happiness? Do some respondents watch a lot of TV? Is there a relationship between the age of the respondent and his or her general happiness? Is there a relationship between the age of the respondent and the number of hours of TV watched?

We will return to this data set in Section 1.10 after we have explored some methods for making sense of data sets like this one. As we develop more sophisticated methods of analysis in later chapters, we will again refer to this data set.¹

¹ The GSS is discussed on the following Web site: http://www.gss.norc.org
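
As a rough illustration of what "summarizing some aspects of this data set" can look like in practice, the following Python sketch tabulates the frequency of each HAPPY level and averages TVHOURS. The coding of SEX and HAPPY follows Example 1.1, but the six records, the function names, and the tuple layout are assumptions made up for this sketch; they are not rows of Table 1.1 and not output from any statistical package used in the text.

```python
from collections import Counter

# Code book as described in Example 1.1.
SEX_LABELS = {1: "male", 2: "female"}
HAPPY_LABELS = {1: "Not too happy", 2: "Pretty happy", 3: "Very happy"}

# Hypothetical records in the order (AGE, SEX, HAPPY, TVHOURS).
# These values are invented for illustration; they are NOT taken from Table 1.1.
records = [
    (34, 1, 2, 2),
    (51, 2, 3, 1),
    (27, 2, 2, 4),
    (68, 1, 1, 6),
    (45, 2, 3, 0),
    (39, 1, 2, 3),
]


def happiness_frequencies(rows):
    """Return {label: (count, relative frequency)} for each HAPPY code."""
    counts = Counter(happy for _, _, happy, _ in rows)
    n = len(rows)
    return {
        HAPPY_LABELS[code]: (counts[code], counts[code] / n)
        for code in sorted(HAPPY_LABELS)
    }


def mean_tv_hours(rows):
    """Return the arithmetic mean of the TVHOURS column."""
    return sum(tv for _, _, _, tv in rows) / len(rows)


if __name__ == "__main__":
    for label, (count, share) in happiness_frequencies(records).items():
        print(f"{label:15s} {count:3d} {share:6.1%}")
    print(f"Mean TV hours per day: {mean_tv_hours(records):.2f}")
```

Each row of the list plays the role of one observation and each tuple position the role of one variable, which mirrors the row-and-column layout described for Table 1.1. Keeping the numeric codes in the data and translating them to labels only when reporting is one common convention, not the only one.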
