Sampling Design

How do we gather data?
• Surveys
• Interviews
• Studies
– Observational
– Retrospective (past)
– Prospective (future)
• Experiments

Population • the entire group of individuals that we want information about

Census • a complete count of the population

Why would we not use a census all the time? The answer is 83!

Why would we not use a census all the time?
1) Not accurate
2) Very expensive
3) Perhaps impossible
4) The U.S. census has a huge amount of error
• If using destructive sampling, you would destroy the population you wanted data on
– Breaking strength of soda bottles
– Lifetime of flashlight batteries
– Safety ratings for cars
• Since taking a census of any population takes a long time, the data become obsolete by the time we compile them; plus censuses are VERY costly to do!
• Suppose you wanted to know the average weight of the white-tail deer population in Texas – would it be feasible to do a census?

Sample • A part of the population that we actually examine in order to gather information • Use sample to generalize about the population

Sampling design • refers to the method used to choose the sample from the population

Sampling frame • a list of every individual in the population

Jelly Blubber Activity
• Select 10 jelly blubbers that you think are representative of the population of blubbers with regard to length.
• Find the mean length of your sample.

For the Jelly Blubber colony: μ = 1.941 cm = 19.41 mm

Simple Random Sample (SRS)
• consists of n individuals chosen randomly from the population in such a way that
– every individual has an equal chance of being selected
– every set of n individuals has an equal chance of being selected
Suppose we were to take an SRS of 100 BHS students: put each student's name in a hat, then randomly select 100 names from the hat. Each student has the same chance to be selected! Not only does each student have the same chance to be selected, but every possible group of 100 students has the same chance to be chosen. Therefore, it has to be possible for all 100 students to be seniors in order for it to be an SRS!
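
The "names in a hat" idea can be sketched in a few lines of Python. This is only an illustration: the roster of 2000 made-up student IDs and the function name `simple_random_sample` are assumptions, not part of the lesson.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n individuals without replacement; every set of n is equally likely."""
    rng = random.Random(seed)          # a seed makes the draw repeatable
    return rng.sample(population, n)   # the "hat": no name can be drawn twice

# Hypothetical roster of 2000 student IDs (invented for illustration)
students = [f"student_{i:04d}" for i in range(1, 2001)]
sample = simple_random_sample(students, 100, seed=1)
print(len(sample), "students selected")
```

Because `random.sample` draws without replacement from the whole list, nothing stops the 100 names from all belonging to one grade, which is exactly the property the slide points out.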

Stratified random sample
• population is divided into homogeneous groups called strata
• SRSs are pulled from each stratum
Homogeneous groups are groups that are alike based upon some characteristic of the group members. Suppose we were to take a stratified random sample of 100 BHS students. Since students are already divided by grade level, the grade levels can be our strata. Then randomly select 50 seniors and randomly select 50 juniors.
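
A rough sketch of the same idea in Python, assuming two strata (juniors and seniors, as in the example) with 500 invented names each. The function name `stratified_sample` and the roster are hypothetical.

```python
import random

def stratified_sample(strata, sizes, seed=None):
    """Take a separate SRS of sizes[name] individuals from each stratum."""
    rng = random.Random(seed)
    return {name: rng.sample(members, sizes[name])
            for name, members in strata.items()}

# Hypothetical strata: students already grouped by grade level
strata = {
    "juniors": [f"junior_{i}" for i in range(1, 501)],
    "seniors": [f"senior_{i}" for i in range(1, 501)],
}
picked = stratified_sample(strata, {"juniors": 50, "seniors": 50}, seed=2)
print({name: len(group) for name, group in picked.items()})
```

Unlike an SRS of 100, this design can never return 100 seniors: the number taken from each stratum is fixed in advance.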

Systematic random sample
• select sample by following a systematic approach
• randomly select where to begin
Suppose we want to do a systematic random sample of BHS students. Number a list of students, then select a number between 1 and 20 at random. That student will be the first student chosen; then choose every 20th student from there. (There are approximately 2000 students – if we want a sample of 100, 2000/100 = 20)
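
The "random start, then every 20th" rule might look like this in Python. The numbered list of exactly 2000 students and the name `systematic_sample` are assumptions made for the sketch.

```python
import random

def systematic_sample(frame, k, seed=None):
    """Pick a random start among the first k positions, then take every k-th."""
    rng = random.Random(seed)
    start = rng.randint(1, k)      # random starting position, 1..k inclusive
    return frame[start - 1::k]     # that individual, then every k-th after

# Hypothetical numbered list of 2000 students; interval k = 2000 / 100 = 20
frame = list(range(1, 2001))
picks = systematic_sample(frame, 20, seed=3)
print(picks[:5], "...", len(picks), "students")
```

Only the starting point is random; once it is chosen, the rest of the sample is determined.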

Cluster Sample • based upon location • randomly pick a location & sample all there Suppose we want to do a cluster sample of BHS students. One way to do this would be to randomly select 10 classrooms during 2nd period. Sample all students in those rooms!
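
As a sketch, cluster sampling randomizes over rooms rather than over students. The 80 classrooms of 25 invented students and the function name `cluster_sample` are assumptions for illustration only.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly choose whole clusters and keep everyone in each chosen cluster."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)   # pick the locations
    return {room: list(clusters[room]) for room in chosen}

# Hypothetical 2nd-period classrooms: 80 rooms with 25 students each
classrooms = {
    f"room_{r:02d}": [f"room_{r:02d}_student_{s}" for s in range(1, 26)]
    for r in range(1, 81)
}
sampled_rooms = cluster_sample(classrooms, 10, seed=4)
print(len(sampled_rooms), "rooms,",
      sum(len(v) for v in sampled_rooms.values()), "students")
```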

Multistage sample • select successively smaller groups within the population in stages • SRS used at each stage To use a multistage approach to sampling BHS students, we could first divide 2nd period classes by level (AP, Honors, Regular, etc.) and randomly select 4 second period classes from each group. Then we could randomly select 5 students from each of those classes. The selection process is done in stages!
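
The two stages described above (classes within each level, then students within each class) can be sketched as follows. The three levels, ten classes per level, and 25 students per class are invented numbers, and `multistage_sample` is a hypothetical name.

```python
import random

def multistage_sample(classes_by_level, classes_per_level, students_per_class, seed=None):
    """Stage 1: SRS of classes within each level. Stage 2: SRS of students within each class."""
    rng = random.Random(seed)
    sample = []
    for level, classes in classes_by_level.items():
        for class_name in rng.sample(sorted(classes), classes_per_level):
            sample.extend(rng.sample(classes[class_name], students_per_class))
    return sample

# Hypothetical 2nd-period classes grouped by level
classes_by_level = {
    level: {f"{level}_class_{c}": [f"{level}_class_{c}_student_{s}" for s in range(1, 26)]
            for c in range(1, 11)}
    for level in ["AP", "Honors", "Regular"]
}
chosen_students = multistage_sample(classes_by_level, 4, 5, seed=5)
print(len(chosen_students), "students selected")  # 3 levels x 4 classes x 5 students
```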

SRS
• Advantages
– Unbiased
– Easy
• Disadvantages
– Large variance
– May not be representative
– Must have sampling frame (list of population)

Stratified
• Advantages
– More precise unbiased estimator than SRS
– Less variability
– Cost reduced if strata already exist
• Disadvantages
– Difficult to do if you must divide stratum
– Formulas for SD & confidence intervals are more complicated
– Need sampling frame

Systematic Random Sample
• Advantages
– Unbiased
– Don’t need sampling frame
– Ensure that the sample is spread across population
– More efficient, cheaper, etc.
• Disadvantages
– Large variance
– Can be confounded by trend or cycle
– Formulas are complicated

Cluster Samples
• Advantages
– Unbiased
– Cost is reduced
– Sampling frame may not be available (not needed)
• Disadvantages
– Clusters may not be representative of population
– Formulas are complicated

Identify the sampling design
1) The Educational Testing Service (ETS) needed a sample of colleges. ETS first divided all colleges into groups of similar types (small public, small private, etc.). Then they randomly selected 3 colleges from each group.
Answer: Stratified random sample

Identify the sampling design
2) A county commissioner wants to survey people in her district to determine their opinions on a particular law up for adoption. She decides to randomly select blocks in her district and then survey all who live on those blocks.
Answer: Cluster sampling

Identify the sampling design
3) A local restaurant manager wants to survey customers about the service they receive. Each night the manager randomly chooses a number between 1 & 10. He then gives a survey to that customer, and to every 10th customer after them, to fill out before they leave.
Answer: Systematic random sampling

Random digit table
• each entry is equally likely to be any of the 10 digits
• digits are independent of each other
The following is part of the random digit table found on page 847 of your textbook. Numbers can be read across, vertically, or diagonally.
Row 1 1 8 5 3 3 4 5 0 2 4 2 5 5 8 0 4 5 7 0 3 8 9 9 3 4 3 5 0 6 1 0 3

Suppose your population consisted of these 20 people:
1) Aidan 2) Bob 3) Chico 4) Doug 5) Edward
6) Fred 7) Gloria 8) Hannah 9) Israel 10) Jung
11) Kathy 12) Lori 13) Matthew 14) Nan 15) Opus
16) Paul 17) Shawnie 18) Tracy 19) Uncle Sam 20) Vernon
Use the following random digits to select a sample of five from these people.
Row 1: 1 8 0 5 1 3 7 2 0 1 5 5 8 0 1 5 7 0 3 8 0 3 9 9 3 4 3 5 6
We will need to use double-digit random numbers, ignoring any number greater than 20. Start with Row 1 and read across. Stop when five people are selected.
Reading in pairs: 18 (Tracy), 05 (Edward), 13 (Matthew), 72 (ignore), 01 (Aidan), 55 (ignore), 80 (ignore), 15 (Opus).
So my sample would consist of: Aidan, Edward, Matthew, Opus, and Tracy
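
The table-reading procedure can be checked with a short Python sketch. The digit string below is an assumed reading of the start of Row 1 (the pairs 18, 05, 13, 72, 01, 55, 80, 15), and `select_with_digit_table` is a hypothetical helper. Skipping a repeated label is the usual convention when sampling without replacement and is also an assumption here.

```python
def select_with_digit_table(digits, labels, n, width=2):
    """Read the digits in groups of `width`; keep valid, unused labels until n are chosen."""
    chosen = []
    for i in range(0, len(digits) - width + 1, width):
        label = int(digits[i:i + width])
        if label in labels and label not in chosen:   # ignore 00, 21-99, and repeats
            chosen.append(label)
        if len(chosen) == n:
            break
    return [labels[label] for label in chosen]

names = ["Aidan", "Bob", "Chico", "Doug", "Edward", "Fred", "Gloria", "Hannah",
         "Israel", "Jung", "Kathy", "Lori", "Matthew", "Nan", "Opus", "Paul",
         "Shawnie", "Tracy", "Uncle Sam", "Vernon"]
labels = {i: name for i, name in enumerate(names, start=1)}   # 1 -> Aidan, ..., 20 -> Vernon

row_digits = "1805137201558015"   # assumed start of Row 1
chosen_people = select_with_digit_table(row_digits, labels, 5)
print(chosen_people)  # ['Tracy', 'Edward', 'Matthew', 'Aidan', 'Opus']
```

The people come out in the order the table produces them; sorted alphabetically they are Aidan, Edward, Matthew, Opus, and Tracy.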

Bias
• A systematic error in measuring the estimate
• favors certain outcomes
Anything that causes the data to be wrong! It might be attributed to the researchers, the respondent, or to the sampling method!

Sources of Bias • things that can cause bias in your sample • cannot do anything with bad data

Voluntary response
• People choose to respond to the survey.
• Usually only people with very strong opinions respond
An example would be the surveys in magazines that ask readers to mail in their responses. Other examples are call-in shows, American Idol, etc. Remember, the respondents select themselves to participate in the survey! Self-selection!!

Convenience sampling
• Ask people who are easy to ask
• Produces biased results
An example would be stopping friendly-looking people in the mall to survey. Another example is the surveys left on tables at restaurants - a convenient method! The data obtained by a convenience sample will be biased – however, this method is often used for surveys & results reported in newspapers and magazines!

Undercoverage
• some groups of the population are left out of the selection process
Suppose you take a sample by randomly selecting names from the phone book – some groups will not have the opportunity of being selected!
– People with unlisted phone numbers – usually high-income families
– People without phone numbers – usually low-income families
– People with ONLY cell phones – usually young adults

Response bias
• occurs when the behavior of respondent or interviewer causes bias in the sample
• wrong answers
Suppose we wanted to survey high school students on drug abuse and we used a uniformed police officer to interview each student in our sample – would we get honest answers? Response bias occurs when for some reason (interviewer’s or respondent’s fault) you get incorrect answers.

Wording of the Questions
• wording can influence the given answers
• connotation of words
Questions must be worded as neutrally as possible to avoid influencing the response. The level of vocabulary should be appropriate for the population you are surveying
– if surveying Podunk, TX, then you should avoid complex vocabulary.
– if surveying doctors, then use more complex, technical words.

Source of Bias?
1) Before the presidential election of 1936, which pitted FDR against Republican Alf Landon, the magazine Literary Digest predicted, from a survey of 2.8 million people, that Landon would win the election in a 3-to-2 victory. George Gallup surveyed only 50,000 people and predicted that Roosevelt would win. The Digest's survey came from magazine subscribers, car owners, telephone directories, etc.
Answer: Undercoverage – the Digest's sample was drawn mostly from high-income families, who were mostly Republican! (other answers are possible)

2) Suppose that you want to estimate the total amount of money spent by students on textbooks each semester at SMU. You collect receipts from students as they leave the bookstore during lunch one day.
Answer: Convenience sampling – easy way to collect data, or Undercoverage – students who buy books from online bookstores are not included.

3) To find the average value of a home in Plano, one averages the price of homes that are listed for sale with a realtor.
Answer: Undercoverage – leaves out homes that are not for sale or homes that are listed with different realtors. (other answers are possible)