• Number of slides: 35

Sampling Design

How do we gather data?
• Surveys
• Interviews
• Studies
– Observational
– Retrospective (past)
– Prospective (future)
• Experiments

Population • the entire group of individuals that we want information about

Census • a complete count of the population

How good is a census? Do frog fairy tale... The answer is 83!

Why would we not use a census all the time?
1) Not accurate
Look at the U.S. census – it has a huge amount of error in it; plus it takes a long time to compile the data, making the data obsolete by the time we get it!
2) Very expensive
Since taking a census of any population takes time, censuses are VERY costly to do!
3) Perhaps impossible
Suppose you wanted to know the average weight of the white-tail deer population in Texas – would it be feasible to do a census?
4) If using destructive sampling, you would destroy the population
• Breaking strength of soda bottles
• Lifetime of flashlight batteries
• Safety ratings for cars

Sample • A part of the population that we actually examine in order to gather information • Use sample to generalize about the population

Sampling design • refers to the method used to choose the sample from the population

Sampling frame • a list of every individual in the population

Jelly Blubber Activity • Select 10 jelly blubbers that you think are representative of the population of blubbers with regard to length. • Find the mean length of your sample.

For the Jelly Blubber colony: μ = 1.941 cm = 19.41 mm

Simple Random Sample (SRS)
• consists of n individuals from the population chosen in such a way that
– every individual has an equal chance of being selected
– every set of n individuals has an equal chance of being selected
Suppose we were to take an SRS of 100 BHS students – put each student's name in a hat. Then randomly select 100 names from the hat. Each student has the same chance to be selected!
Not only does each student have the same chance to be selected, but every possible group of 100 students has the same chance to be selected! Therefore, it has to be possible for all 100 students to be seniors in order for it to be an SRS!
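
The hat draw described above can be sketched in Python. This is only an illustration: the list `students` is a made-up sampling frame of 2000 IDs, and `simple_random_sample` is a hypothetical helper name. It relies on `random.sample`, which draws without replacement so that every set of n individuals is equally likely.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw an SRS of size n from a list (the sampling frame)."""
    rng = random.Random(seed)  # seed only makes the illustration repeatable
    # sample() draws without replacement, like pulling names from a hat
    return rng.sample(population, n)

# Hypothetical sampling frame: 2000 made-up student IDs
students = [f"student_{i:04d}" for i in range(1, 2001)]
srs = simple_random_sample(students, 100, seed=1)
print(len(srs), srs[:3])
```

Nothing in the draw looks at grade level, so a sample made up entirely of seniors is possible, which is exactly what the definition requires.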

Stratified random sample
• population is divided into homogeneous groups called strata
• SRS's are pulled from each stratum
Homogeneous groups are groups that are alike based upon some characteristic of the group members.
Suppose we were to take a stratified random sample of 100 BHS students. Since students are already divided by grade level, grade level can be our strata. Then randomly select 50 seniors and randomly select 50 juniors.
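
A minimal Python sketch of the stratified example, under stated assumptions: the frame of `(id, grade)` pairs is invented for illustration, and `stratified_sample` is a hypothetical helper. It groups the frame into strata and then pulls an SRS from each stratum.

```python
import random

def stratified_sample(frame, stratum_of, n_per_stratum, seed=None):
    """Split the frame into strata, then take an SRS within each stratum."""
    rng = random.Random(seed)
    strata = {}
    for person in frame:
        strata.setdefault(stratum_of(person), []).append(person)
    # One SRS per requested stratum, each of the requested size
    return {name: rng.sample(strata[name], size)
            for name, size in n_per_stratum.items()}

# Hypothetical frame: 1000 (id, grade) pairs, half seniors and half juniors
frame = [(i, "senior" if i % 2 == 0 else "junior") for i in range(1, 1001)]
picked = stratified_sample(frame, lambda p: p[1],
                           {"senior": 50, "junior": 50}, seed=2)
print({grade: len(members) for grade, members in picked.items()})
```

Unlike an SRS, this design can never return 100 seniors: the split between strata is fixed in advance.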

Systematic random sample
• select sample by following a systematic approach
• randomly select where to begin
Suppose we want to do a systematic random sample of BHS students - number a list of students. (There are approximately 2000 students – if we want a sample of 100, 2000/100 = 20.) Select a number between 1 and 20 at random. That student will be the first student chosen, then choose every 20th student from there.
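
The arithmetic in this example (2000/100 = 20, random start between 1 and 20, then every 20th student) can be sketched as follows. The function name `systematic_sample` and the numbered list are assumptions made for illustration only.

```python
import random

def systematic_sample(frame, n, seed=None):
    """Pick a random start in 1..k, then take every k-th entry of the list."""
    rng = random.Random(seed)
    k = len(frame) // n        # sampling interval, e.g. 2000 // 100 = 20
    start = rng.randint(1, k)  # randint includes both endpoints
    # 1-based positions: start, start + k, start + 2k, ... (n positions in all)
    return [frame[pos - 1] for pos in range(start, start + k * n, k)]

# Hypothetical numbered list of 2000 students
numbered_students = list(range(1, 2001))
chosen = systematic_sample(numbered_students, 100, seed=3)
print(chosen[:5])
```

Only the starting point is random; once it is fixed, the rest of the sample is determined.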

Cluster Sample
• based upon location
• randomly pick a location & sample all there
Suppose we want to do a cluster sample of BHS students. One way to do this would be to randomly select 10 classrooms during 2nd period. Sample all students in those rooms!
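
As a rough sketch of the classroom example, assuming a made-up school of 80 rooms with 25 students each (the `classrooms` dictionary and the helper `cluster_sample` are hypothetical):

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly choose whole clusters, then keep every member of each one."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)  # SRS of cluster names
    members = [m for name in chosen for m in clusters[name]]
    return chosen, members

# Hypothetical 2nd-period rooms: 80 rooms with 25 students in each
classrooms = {f"room_{r:02d}": [f"r{r:02d}_s{s:02d}" for s in range(25)]
              for r in range(1, 81)}
rooms, cluster_members = cluster_sample(classrooms, 10, seed=4)
print(rooms[:3], len(cluster_members))
```

The randomness applies to rooms, not to students: everyone in a chosen room is surveyed.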

Multistage sample
• select successively smaller groups within the population in stages
• SRS used at each stage
To use a multistage approach to sampling BHS students, we could first divide 2nd period classes by level (AP, Honors, Regular, etc.) and randomly select 4 second period classes from each group. Then we could randomly select 5 students from each of those classes. The selection process is done in stages!
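
The two stages in this example can be sketched like this. The class rosters are invented (three levels, 10 classes per level, 25 students per class), and `multistage_sample` is a hypothetical helper, not a standard library function.

```python
import random

def multistage_sample(classes_by_level, classes_per_level,
                      students_per_class, seed=None):
    """Stage 1: SRS of classes within each level. Stage 2: SRS of students."""
    rng = random.Random(seed)
    picks = []
    for level, classes in classes_by_level.items():
        chosen_classes = rng.sample(sorted(classes), classes_per_level)
        for name in chosen_classes:
            picks.extend(rng.sample(classes[name], students_per_class))
    return picks

# Hypothetical rosters: 3 levels, 10 classes per level, 25 students per class
classes_by_level = {
    level: {f"{level}_class_{c}": [f"{level}_{c}_{s}" for s in range(25)]
            for c in range(10)}
    for level in ("AP", "Honors", "Regular")
}
multistage_picks = multistage_sample(classes_by_level, 4, 5, seed=5)
print(len(multistage_picks))  # 3 levels x 4 classes x 5 students = 60
```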

SRS
• Advantages
– Unbiased
– Easy
• Disadvantages
– Large variance
– May not be representative
– Must have sampling frame (list of population)

Stratified
• Advantages
– More precise unbiased estimator than SRS
– Less variability
– Cost reduced if strata already exist
• Disadvantages
– Difficult to do if you must divide the population into strata yourself
– Formulas for SD & confidence intervals are more complicated
– Need sampling frame

Systematic Random Sample
• Advantages
– Unbiased
– Don't need sampling frame
– Ensure that the sample is spread across population
– More efficient, cheaper, etc.
• Disadvantages
– Large variance
– Can be confounded by trend or cycle
– Formulas are complicated

Cluster Samples
• Advantages
– Unbiased
– Cost is reduced
– Sampling frame may not be available (not needed)
• Disadvantages
– Clusters may not be representative of population
– Formulas are complicated

Identify the sampling design
1) The Educational Testing Service (ETS) needed a sample of colleges. ETS first divided all colleges into groups of similar types (small public, small private, etc.). Then they randomly selected 3 colleges from each group.
Stratified random sample

Identify the sampling design
2) A county commissioner wants to survey people in her district to determine their opinions on a particular law up for adoption. She decides to randomly select blocks in her district and then survey all who live on those blocks.
Cluster sampling

Identify the sampling design
3) A local restaurant manager wants to survey customers about the service they receive. Each night the manager randomly chooses a number between 1 & 10. He then gives a survey to that customer, and to every 10th customer after them, to fill out before they leave.
Systematic random sampling

Random digit table
• each entry is equally likely to be any of the 10 digits
• digits are independent of each other
The following is part of the random digit table found on page 847 of your textbook:
[Row 1 of the random digit table]
Numbers can be read across. Numbers can be read vertically. Numbers can be read diagonally.

Suppose your population consisted of these 20 people:
1) Aidan 2) Bob 3) Chico 4) Doug 5) Edward
6) Fred 7) Gloria 8) Hannah 9) Israel 10) Jung
11) Kathy 12) Lori 13) Matthew 14) Nan 15) Opus
16) Paul 17) Shawnie 18) Tracy 19) Uncle Sam 20) Vernon
Use the following random digits to select a sample of five from these people.
[Row 1 of the random digit table]
We will need to use double-digit random numbers, ignoring any number greater than 20. Start with Row 1 and read across. Stop when five people are selected.
So my sample would consist of: Aidan, Edward, Matthew, Opus, and Tracy
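
The reading rule used here (two-digit numbers, read across, ignore anything greater than 20, stop at five people) can be sketched in Python. The digit string below is made up for illustration and is NOT the textbook's Row 1, so it selects different people than the slide does. Skipping 00 and repeated numbers is an added assumption; the slide does not say how to handle them.

```python
def select_from_digits(digits, population_size, n):
    """Read two-digit labels left to right until n distinct labels are kept."""
    digits = digits.replace(" ", "")
    chosen = []
    for i in range(0, len(digits) - 1, 2):
        label = int(digits[i:i + 2])
        # Ignore labels outside 1..population_size and labels already chosen
        if 1 <= label <= population_size and label not in chosen:
            chosen.append(label)
        if len(chosen) == n:
            break
    return chosen

people = ["Aidan", "Bob", "Chico", "Doug", "Edward", "Fred", "Gloria",
          "Hannah", "Israel", "Jung", "Kathy", "Lori", "Matthew", "Nan",
          "Opus", "Paul", "Shawnie", "Tracy", "Uncle Sam", "Vernon"]

# Hypothetical digit row, invented for this sketch
hypothetical_row = "73059 11805 44200 01267 03"
labels = select_from_digits(hypothetical_row, len(people), 5)
print(labels)                           # [5, 18, 20, 12, 3]
print([people[k - 1] for k in labels])  # Edward, Tracy, Vernon, Lori, Chico
```

In this made-up row the pairs 73, 91, 44, 00 and 67 are ignored, and the second 05 is skipped because Edward has already been chosen.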

Bias
• A systematic error in measuring the estimate
• favors certain outcomes
Anything that causes the data to be wrong! It might be attributed to the researchers, the respondent, or to the sampling method!

Sources of Bias • things that can cause bias in your sample • cannot do anything with bad data

Voluntary response
• People choose to respond
• Usually only people with very strong opinions respond
An example would be the surveys in magazines that ask readers to mail in the survey. Other examples are call-in shows, American Idol, etc. Remember, the respondent selects themselves to participate in the survey!
Remember – the way to determine voluntary response is: Self-selection!!

Convenience sampling
• Ask people who are easy to ask
• Produces biased results
An example would be stopping friendly-looking people in the mall to survey. Another example is the surveys left on tables at restaurants - a convenient method!
The data obtained by a convenience sample will be biased – however this method is often used for surveys & results reported in newspapers and magazines!

Undercoverage
• some groups of population are left out of the selection process
Suppose you take a sample by randomly selecting names from the phone book – some groups will not have the opportunity of being selected!
People with unlisted phone numbers – usually high-income families
People without phone numbers – usually low-income families
People with ONLY cell phones – usually young adults

Response bias
• occurs when the behavior of respondent or interviewer causes bias in the sample
• wrong answers
Suppose we wanted to survey high school students on drug abuse and we used a uniformed police officer to interview each student in our sample – would we get honest answers?
Response bias occurs when for some reason (interviewer's or respondent's fault) you get incorrect answers.

Wording of the Questions
• wording can influence the answers that are given
• connotation of words
• use of “big” words or technical words
Questions must be worded as neutrally as possible to avoid influencing the response.
The level of vocabulary should be appropriate for the population you are surveying
– if surveying Podunk, TX, then you should avoid complex vocabulary.
– if surveying doctors, then use more complex, technical wording.

Source of Bias?
1) Before the presidential election of 1936, FDR against Republican Alf Landon, the magazine Literary Digest predicted that Landon would win the election in a 3-to-2 victory, based on a survey of 2.8 million people. George Gallup surveyed only 50,000 people and predicted that Roosevelt would win. The Digest's survey came from magazine subscribers, car owners, telephone directories, etc.
Undercoverage – since the Digest's survey comes from car owners, etc., the people selected were mostly from high-income families and thus mostly Republican! (other answers are possible)

2) Suppose that you want to estimate the total amount of money spent by students on textbooks each semester at SMU. You collect register receipts for students as they leave the bookstore during lunch one day.
Convenience sampling – easy way to collect data
or
Undercoverage – students who buy books from on-line bookstores are not included.

3) To find the average value of a home in Plano, one averages the price of homes that are listed for sale with a realtor.
Undercoverage – leaves out homes that are not for sale or homes that are listed with different realtors. (other answers are possible)