R.M. Notes on Textbook

Chapter 1: Basic Concepts

Hypothesis generation is the logical deduction of expectations from an established theory, generally in this form: Theory X implies that B results from A; therefore, we hypothesize that if X is true, producing A will result in the occurrence of B. We can develop hypotheses by proposing solutions to conflicting findings, observing seemingly paradoxical behavior (e.g., the doomsday cult that became even more popular after its prophecy was demonstrated to be incorrect), or paying attention to everyday behavioral tactics that people employ in dealing with others.

The transition to objectivity in social research was marked by a strong emphasis on operationalization (the translation of abstract theoretical constructs into concrete procedures and indicators that can be observed, recorded, and replicated). We must redefine the abstract in empirical terms, specifying variables in such a way that they can be observed or manipulated, and we must specify the procedures or instruments required to make the actual observations; this is the operationalization of the conceptual model.

It is extremely important to note whether the operationalization has any psychological reality. Does it reflect the underlying processes that give rise to the observable responses?

Specific operations are always imperfect because any particular observation is a function of multiple factors, many of which are totally unrelated to the conceptual variable of interest. There are observer errors, instrumentation errors, and environmental and contextual conditions. Therefore, multiple operationalism and triangulation!

Theory testing is a process. However, the interplay between theory and data is not entirely objective, as acceptance of a particular theoretical position is determined at least in part by the prevailing social climate and the personalities of those involved in the debate. The process is as follows: (1) derive the implications of a theory; (2) conduct the experiment / collect the data; (3) compare theory with data; (4) revise theory or experiment as needed; (5) continue until some outcome is encountered that cannot be explained by the current theory, at a time when some alternative is available that accounts for all the previous findings as well as the otherwise inexplicable result.

Chapter 2: Fitting Research Design to Research Purpose

Research in the behavioral sciences falls under two categories: (1) surveys (natural settings, “real world” context, lots of data points); (2) experiments (determine causal relationships; good for situations where you need to isolate and control interrelationships). There are three types of causation: unidirectional, bidirectional, and noncausal covariation.

Type I Error vs. Type II Error: these are both errors, of course, but think of them in terms of a smiley face and a frowning face. Type I is a smile; Type II is a frown. In a Type I error (a false positive), we erroneously conclude that our treatment had an effect, so we’re smiling. In a Type II error (a false negative), we erroneously conclude that our treatment did not have an effect, even though it truly did. We’re frowning because we think we just failed in our experiment.

To have a true experiment, we need to have (1) a manipulation and (2) random assignment. The purpose of most inferential statistical tests is to assess the likelihood that the obtained pattern of data could have occurred by chance. Basic research contributes to the general body of knowledge. Applied research involves efforts that are directed toward affecting a particular phenomenon in some preconceived way.
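A minimal sketch of the chance-assessment idea in Python (the group sizes, means, and seed are all invented for illustration): generate scores for two randomly assigned groups, then ask how likely a difference this large would be under chance alone.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Hypothetical outcome scores: treatment shifted up by half an SD.
treatment = rng.normal(loc=0.5, scale=1.0, size=30)
control = rng.normal(loc=0.0, scale=1.0, size=30)

# The t-test estimates the probability of a difference this large
# arising by chance if both groups really came from one population.
t, p = stats.ttest_ind(treatment, control)
print(f"t = {t:.2f}, p = {p:.3f}")  # small p: chance is an unlikely explanation
```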

Internal validity has to do with the certainty with which one can attribute a research outcome to the application of a treatment or manipulation that is under the rigid control of the researcher. A study is said to be confounded (or internally invalid) when there is reason to believe that obtained differences in the dependent variable would have occurred even if exposure to the independent variable had not been manipulated. Threats to internal validity (MR SMITH IC): history, maturation, testing, instrumentation, statistical regression, selection, experimental mortality, selection-history interactions.

External validity is the generalizability of findings: how well do the results we discovered apply to the external world? How generalizable are they? Did we look at a broad enough segment of the population? (See Mook for an argument that external validity isn’t always important.) Need: robustness, ecological validity, relevance. Generalizability of operationalizations means we correctly identified the dependent and independent variables, along with their underlying nature and relationship. Generalizability of results to other places and participant populations means the identified relationship can be expected to recur at other times and places under different environmental conditions.

Chapter 3: Measuring Concepts

The quality of a given measure is expressed in terms of its reliability and validity.

Reliability means repeatability. Technically, the reliability of a measure is defined as the proportion of total variance in observed scores that is due to true score variability (observed score = true score + error). A perfectly reliable instrument would be one in which this proportion was equal to 1, or in which true score equaled observed score; a perfectly unreliable one would have a ratio of 0.
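A small simulation of that definition (all numbers arbitrary): generate true scores, add random error, and recover reliability as the variance ratio.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

true_score = rng.normal(100, 15, n)   # true-score variance = 15^2 = 225
error = rng.normal(0, 5, n)           # error variance = 5^2 = 25
observed = true_score + error         # observed = true + error

# Reliability: proportion of observed variance due to true scores.
print(true_score.var() / observed.var())  # ~ 225 / 250 = 0.90
```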

Random error (a source of Type II errors) is due to chance and tends to increase the variability of scores in nonsystematic ways. It reduces the measure’s sensitivity to differences between groups of research participants and tends to cancel out across the groups. Systematic error (a source of Type I errors) consistently and artificially inflates or deflates the scores within a given group of participants. It doesn’t cancel out between groups and exaggerates differences over and above those that actually exist.

Internal consistency refers to the extent to which the components of a measuring instrument predict or produce similar results. It can be measured by (1) Split-half technique: a measure consisting of a set of items is administered to a sample of respondents. The items are divided into two groups of approximately equal number, and the sums are calculated. The extent to which the two half-sets interrelate is a measure of the internal consistency of the full set. (2) Cronbach’s coefficient alpha: approximates the average of all the split-half coefficients that would be obtained if all of the items that could constitute a given scale were available and randomly put into a very large number of tests of equal size.
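A sketch of coefficient alpha computed from its variance form, alpha = k/(k-1) * (1 - sum of item variances / variance of total scores); the data shape and numbers below are assumptions for illustration, not from the text.

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Alpha for a (respondents x items) matrix of scores."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1).sum()  # sum of item variances
    total_var = items.sum(axis=1).var(ddof=1)    # variance of summed scale
    return (k / (k - 1)) * (1 - item_vars / total_var)

# Hypothetical 5-item scale answered by 200 respondents.
rng = np.random.default_rng(3)
latent = rng.normal(size=(200, 1))
items = latent + rng.normal(scale=0.8, size=(200, 5))
print(round(cronbach_alpha(items), 2))
```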

Temporal stability refers to the degree to which the data obtained in a test resemble data obtained in a second testing. (1) Test-retest is the most common assessment technique. Problems arise if there is too long or too short an interval between tests, or if there are uncontrolled events in the interim period. (2) The equivalent forms method is similar to the split-half approach: two parallel tests are administered at the same time. A high relationship between the two is an indication of the reliability of the instruments.

Validity: does a measurement truly measure what it is supposed to measure? Validity refers to the extent of correspondence between variations in the scores on the instrument and variation among respondents on the underlying construct being studied.

Predictive validity is the accuracy with which the scale predicts the specific level or extremity of the criterion. Many factors can influence the magnitude of a relationship, so predictive validity alone is not sufficient to establish the validity of a measure.

Content validity is the extent to which the content of a measure represents the complete range of the construct under consideration. With factual materials, or where the domain is relatively well specified, achieving content validity is not overly difficult. Limitation: assessment is a subjective operation; there are no statistical measures.

Construct validity asks: do the contained beliefs, attributes, processes, or predispositions form a meaningful whole? Assess it via the known-groups method: the instrument is given to different groups of people who are known to differ on the attribute that is the focus of the instrument.

Convergent and discriminant validity ask: does the instrument have relationships to other constructs that are consistent with the overall theoretical construct? When the predicted patterns occur, they are taken as supportive evidence of the scale’s validity but do not prove it.

Threats to measurement validity: (MEALS) Mood, Extreme Response, Acquiescence, Language, Social desirability.

In the Multitrait-Multimethod Matrix (MTMM), multiple measures are used to assess the degree to which measures of theoretically related constructs converge, over and above the relationships that might come about simply as a result of their sharing common methods of measurement. This is outdated and little used, but a good thought experiment according to Crano.

Chapter 4: Designing Experiments

The steps in the “true” experimental design: (1) obtain a pool of participants; (2) pretest them on the dependent variable of interest; (3) randomly assign each participant to experimental or control groups (at least 25-30 individuals per condition); (4) carefully control for differences in the application of the experimental treatment between the two groups; (5) re-measure both groups on the dependent variable at some time following the experimental treatment.

Pretest sensitization is when the pretest causes undue sensitivity to the treatment. To avoid this, we may administer the pretest well in advance of the manipulation, though this doesn’t always work, especially if it is an emotionally loaded issue: it could give participants time to ruminate on the issue, and it isn’t always administratively possible. Eliminating the pretest may be the best option if you only have one-time access to participants. Good random assignment across a large pool allows the researcher to presume equivalence of the experimental and control groups.

The Solomon four-group design is one way around pretest sensitization. Participants are assigned to one of four groups (a 2×2), so Experimental vs. Control as well as Pretested vs. Not may be tested.

  • Pretest – Treatment – Posttest (O X O)
  • Pretest – Posttest (O O)
  • Treatment – Posttest (X O)
  • Posttest (O)

Factorial designs (such as the Solomon four-group) combine multiple levels of multiple treatments. Conditions should be constructed such that all levels of each variable are combined with all levels of the others. The overall effect of each variable is called the main effect of that factor. Common forms of interaction effects: (1) divergent/convergent – results on the dependent variable diverge or converge; (2) crossover – results on the dependent variable cross over when graphed.

The experimenter controls an IV. A blocked variable is something the experimenter cannot control (things participants bring with them to the experiment, such as sex, race, disposition, etc.). Using a blocked design gives the researcher greater control over sources of variation, equalizing differences among participants across experimental conditions.

In most experimental designs, any one participant is exposed to one and only one experimental condition; however, in some cases participants may be used in more than one “cell” of the design. This design is called a repeated measures study. Variables that are manipulated in this way are called within-subject factors. Example: pretest – posttest – control group design.

Alternatively, individual participants may be exposed to a series of different treatment conditions and measured after each. Every participant may be exposed to all possible treatments, or participants may be randomly assigned to different sets of treatments. The effects of one experimental treatment might carry over in some way to influence the effects of subsequent treatments, so the order of treatments in a repeated measures experiment could have a major effect on its results. In this instance, counterbalancing is important. Conditions: (1) each participant receives each treatment only once; (2) each treatment precedes every other treatment equally often; (3) the number of participants receiving each order is equal (a balanced Latin square, sketched below, satisfies these conditions). Such designs assure the equivalence of groups exposed to different treatments, because each participant serves (in effect) as his/her own control group. However, this design is not available in all cases – for example, where the effects of one treatment might linger into subsequent treatments.
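One standard way to generate counterbalanced orders is a balanced Latin square (for an even number of treatments). A minimal sketch:

```python
def balanced_latin_square(n: int) -> list[list[int]]:
    """Treatment orders for n treatments (n even): each treatment appears
    once in each ordinal position, and immediately precedes every other
    treatment exactly once across the n orders."""
    # Classic first row: 0, 1, n-1, 2, n-2, ...
    first = [0]
    lo, hi, take_lo = 1, n - 1, True
    while lo <= hi:
        first.append(lo if take_lo else hi)
        if take_lo:
            lo += 1
        else:
            hi -= 1
        take_lo = not take_lo
    # Each subsequent order shifts every treatment label by one (mod n).
    return [[(t + i) % n for t in first] for i in range(n)]

for order in balanced_latin_square(4):
    print(order)
# [0, 1, 3, 2]
# [1, 2, 0, 3]
# [2, 3, 1, 0]
# [3, 0, 2, 1]
```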

Chapter 5: Constructing Laboratory Experiments

To create a lab experiment: (1) Select the participant pool: arrange for a pool of eligible participants, define which participants are eligible to take part in the investigation, and limit participation as needed based on select characteristics. (2) Decide on sample size: the key issue is statistical power. Aiming for a medium effect size in a two-group experiment, roughly 64 participants per condition are needed to detect a real difference at p < 0.05 with adequate power (a quick power calculation is sketched at the end of this chapter). Practical constraints usually limit the potential power of our research endeavors. (3) Prepare methods: organize procedures (instructions, measures, debriefing documents). (4) Submit to IRB. (5) Set up the environment, keeping environmental factors constant (same lab space, same instructions, same facilitator, etc.). Not everything can be controlled (for example, you will have to run experiments at different times of the day); where it can’t, make sure that every treatment condition is equally likely to occur under each environmental condition. For example, don’t run all treatments in the morning and all controls in the afternoon.

Manipulations can be (1) environmental. Example: participants are filling out a form in a waiting room, and smoke gets pumped in; the more people in the room, the longer they wait to take action. They can be (2) instructional. Instructions with an “accidental” aspect to them are common in social psychological research, and probably constitute one of the social scientist’s most powerful tools. Manipulations can be (3) social. Example: studying conformity by placing a subject in a group of confederates, then asking them to judge line lengths. When the confederates identify an obviously shorter line as the longer, how does the subject respond?

When a researcher does a manipulation check, he is determining whether or not the participants understood the treatment. Did subjects experience the treatment in the way that we intended? Did it provoke the thoughts, feelings, or concerns that we were trying to provoke?

There are basically two ways to do random assignment: (1) a process of randomization (e.g., coin toss, die roll) upon arrival; (2) conditions are randomly ordered in advance. Running more sessions with fewer participants per session minimizes the lost data, should one of the sessions suffer an unanticipated, unintended disruption (e.g., a subject has a medical emergency).

If the experimental arrangements force participants to attend to the task requirements of the research (and focus less on themselves), then the study is said to have a high degree of experimental realism. In this case, participants are unable to intellectualize their reactions and respond to the experimental situation in a way that approximates their natural, spontaneous behavior. Mundane realism refers to the degree to which various features of the experiment mirror real-world, non-laboratory events that participants might encounter in their day-to-day experiences. These two realisms are not mutually exclusive, and a good design will establish both, but experimental realism is more important for the validity of results.

Social simulations are intended to preserve many of the advantages of controlled laboratory experiments, while approaching conditions that are more generalizable to the real world. A well-designed simulation has the potential to isolate the social phenomenon of interest without destroying its natural contextual meaning.
Passive role-playing simulations are thought experiments: participants are provided with a scenario and are asked to estimate or predict how they would behave. Active role-playing simulations may involve bargaining/negotiation games or international relations simulations. In analogue experiments, participants are presented with a real situation to respond to directly (rather than acting out “as if”); every feature of the external situation that is considered theoretically relevant has a corresponding feature contained in the laboratory situation. Analogue experiments have been used for some time in various areas of clinical research (e.g., animal tests in medicine), but are relatively rare in social science research.
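The sample-size rule of thumb from step (2) above can be checked with a power calculation; a sketch using statsmodels (d = 0.5 is Cohen’s conventional “medium” effect):

```python
from statsmodels.stats.power import TTestIndPower

# Participants needed per group to detect a medium effect (d = 0.5)
# at alpha = .05 (two-tailed) with power = .80.
n_per_group = TTestIndPower().solve_power(
    effect_size=0.5, alpha=0.05, power=0.80, alternative="two-sided"
)
print(round(n_per_group))  # ~64
```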

Chapter 6: External Validity of Laboratory Experiments

The Enlightenment Effect states that as social research becomes common knowledge, knowledgeable participants confound future research. (Not widely accepted, but it has pushed research to be more concerned with generalizability.)

Research experiments tend to use college sophomores as their participant population. If generalization is the goal, then college sophomores, who tend to be smarter, healthier, and more in touch with social forces than the average population, can be a poor choice. However, if college sophomores are our only choice, they’re certainly better than no psychology at all.

Research experiments have three categorical types of participants: (1) voluntary (agree to be in the experiment); (2) involuntary (e.g., participating as a class requirement); and (3) non-voluntary (e.g., people enter a public place and don’t even know they’re part of an experiment).

Weber and Cook (1972) identified four participant types: (1) good participants who attempt to determine hypotheses in order to confirm them; (2) negative participants who attempt to determine hypotheses in order to sabotage the experiment; (3) faithful participants who cooperate, follow instructions, and do not attempt to determine hypotheses; (4) apprehensive participants who worry the experimenter will evaluate their abilities and personality and react accordingly.

When asked on an exam which type of participant was the most desirable, the correct answer was: involuntary participant. I’m not sure why this is correct; I thought the voluntary participant would be most desirable. Final note: the roles identified by Weber and Cook aren’t universally accepted.

Rosenthal (1966) identified experimenter bias, finding that the expectations of the experimenter can be transmitted to participants, whether it’s intentional or not. This may occur by systematically erring in observation, recording, or analysis of data. Or it may occur in the form of cueing the participant to the correct response through verbal or nonverbal reinforcement.

There are three ways to reduce experimenter bias: (1) Blind procedures: don’t tell experiment administrators what the experiment is testing. The administrators may still form expectations and weave them into the experiment, so try to complete the experiment quickly. Also, run all conditions at once (using recordings or written instructions) so the administrator doesn’t know which participant is in which condition. (2) Monitoring: observe the experimenter to ensure that data transcription and analysis are accurate. However, this is not highly effective and does not prevent subtle, nonverbal cueing of the participant by the experimenter. (3) Mechanized procedures: standardize. For example, record instructions, videotape manipulations, or have data analyzed by computer or a third party.

Three most important forms of external validity: (1) Robustness: Can it be replicated? Do findings hold up with different participants and settings? (2) Ecological validity: Is it representative? Is the effect representative of what happens in everyday life? (Note: this is a restrictive form of external validity and not considered useful when testing causal hypotheses. If you’re simply testing whether a given manipulation alters behavior, then whether the test is immediately representative of the real world is not a concern.) (3) Relevance: Does it matter? The connections between research findings and application are often indirect and cumulative rather than immediate. Relevance is a matter of social process (how results are transmitted and used) rather than what the results are.

Under the heading of ecological validity come (1) mundane realism: the extent to which the research setting and operations resemble events in normal, everyday life; and (2) psychological realism: the extent to which the psychological processes that occur in the experiment are the same as those that occur in everyday life. (Even if an experiment appears unrepresentative on the surface, it may have high psychological realism.)

Chapter 7: Conducting Experiments Outside the Laboratory

In a lab experiment, participants go to a designated location. In the field, the researcher brings research operations to participants in their own environment. The main difference, aside from context, is that in the field, participants are usually unaware they’re being observed. (Even if they’re informed, they’re likely to forget.) Use these two research designs in conjunction with one another.

Lab Experiment: Naturalistic Observation → Theory Development → Laboratory Research
Field Experiment: Field Research → Theory Development → Naturalistic Observation

Questions of validity can rarely be resolved in a single experiment. In exact replications, you reproduce everything from the initial study (particularly the operationalizations of independent and dependent variables) to see if the same result can be repeated; only the participants, the time, the place, and (usually) the experimenter are changed. (Weak for establishing validity.) In conceptual replications, you’re looking to see if a particular empirical relationship can be repeated. To establish construct validity, make the operationalizations of treatment and effect as dissimilar as possible; vary both content and research procedures. (Better for establishing validity.)

When controlling the IV in field experiments, the experimenter can: (1) create situations from scratch; (2) introduce a systematic variation into existing conditions; (3) direct participants’ attention to a particular aspect of the field (e.g., seating arrangements).

It is possible to use random assignment in field studies (e.g., putting a quarter in a phone booth to see how the person who enters the booth responds); however, de-selection occurs when participants cannot continue in the experimental group because they do not notice the manipulation (e.g., they fail to see the quarter).

DVs are unlikely to be influenced by experimental “demand characteristics” or social desirability response biases because there’s no self-report. We are observing behaviors with real consequences.

Problems with unobtrusive measurements (observations made without interfering) lie in reliability (which will not be as great as that of the more direct measures they are designed to mimic) and validity (they may not measure what they’re supposed to measure). The farther removed the actual measure is from the variable of interest, the less likely it is to prove valid.

Internet experiments: the reasoning is that if lab and web results are the same, then web results are not biased. However, this rests on a confirmation of the null hypothesis to establish convergent validity.

Final note: randomization is possible in field experiments.

Chapter 8: Correlational Design and Causal Analysis

When variables can’t be manipulated, analysis is correlational not causal. Usually, variables are allowed to vary freely, which enhances the potential for accurately recording the covariation between measures. In research experiments, the IV is limited by the controlled manipulation. (Covariation is the extent to which changes in one are related to the other.)

Correlational studies look at: (1) Response-response relationships: responses on one measure are compared with responses on another measure (e.g., score on a religiosity scale is correlated with score on an authoritarianism scale). Responses on both measures are unconstrained by the researcher. (2) Mixed design: researchers first sort individuals into different “experimental” conditions based on blocked characteristics, then look at the effects of an experimental manipulation. This combines experimental assignment and correlational conditions.

Control factor: a variable that enters an experiment but is determined by self-assignment (participants bring it with them). We cannot control it, but we can measure it and account for its effects.

Covariate: when a control variable is continuous (e.g. IQ).

Blocking variable: when a control variable is categorical (e.g. gender).

Median split: the group is divided at the median – those above are “high,” those below are “low.” A problem with the median split arises in situations where respondents with similar scores (e.g., 49/51) are categorized as different, but those with very different scores (e.g., 3/49) are classified within the same group (see the sketch below).
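A two-line illustration with made-up scores:

```python
import numpy as np

scores = np.array([3, 42, 47, 49, 51, 55, 88, 90])
median = np.median(scores)  # 50.0
labels = np.where(scores <= median, "low", "high")
# 49 -> low but 51 -> high (near-identical scores split apart),
# while 3 and 49 are both "low" (very different scores lumped together).
print(dict(zip(scores.tolist(), labels.tolist())))
```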

Correlation ratio (eta): provides information on the extent of a relationship between variables, whether or not it is linear. Other types of correlation provide information on the degree and type of relationship. (1) Pearson product-moment correlation coefficient: used to determine the extent of a linear relationship (most common). (2) The correlation coefficient (r) indicates the magnitude and direction of the linear relationship (-1 to +1).

When product-moment correlational analysis indicates r = 0, potential explanations are: (1) no systematic relationship; (2) a systematic relationship that is not linear (look at a graph, and see the sketch below); (3) flawed measures; (4) limitations in measurement (scores on one or both measures have been truncated).
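Point (2) is easy to demonstrate: a perfectly systematic curvilinear relationship can still yield r = 0 (made-up data):

```python
import numpy as np

x = np.linspace(-3, 3, 101)
y = x ** 2                      # perfectly systematic, but U-shaped

print(np.corrcoef(x, y)[0, 1])  # ~0: Pearson r sees no *linear* relation
```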

Multiple correlation and regression: use these when you want to know the ways in which a combination of different variables relates to a particular measure. Note: generalizing results to new data sets reduces the relationship, because the analysis presumes scores are free from error; predictor weights vary with each new analysis (shrinkage).

The problem with freely occurring variables is that they usually have natural covariates that may occur in three forms: (1) the hidden third factor; (2) multidimensional causation: the observed relationship is one of several interrelated factors; (3) confounding of the independent and dependent variables (one variable cannot be extricated from the other).

Partial correlation methodology: used to determine whether there is any common variance between criterion and predictor variables after other common variation has been removed (PARTIALLING OUT). BUT if you’re hoping to test the validity of theoretical concepts, failing to distinguish between prediction and explanation is dangerous: the existence of a significant partial correlation does not indicate how or why the relationship exists.
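The first-order partial correlation has a closed form; a sketch with hypothetical correlations (the shoe-size example is a standard textbook illustration, not from this text):

```python
import math

def partial_r(r_xy: float, r_xz: float, r_yz: float) -> float:
    """Correlation of x and y with z partialled out of both."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz**2) * (1 - r_yz**2))

# Hypothetical: children's reading skill (x) correlates with shoe size (y),
# but the relation vanishes once age (z) is partialled out.
print(partial_r(r_xy=0.60, r_xz=0.80, r_yz=0.75))  # 0.0
```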

Path analysis is appropriate when each construct can be measured directly by a single indicator (e.g., student attendance, gender), so the measures are treated as the constructs themselves. In structural equation modeling (covariance structure modeling), used when the variables of interest can’t be measured directly, multiple indicators capture the construct (the latent variable).

  • Direct effects: change in one variable is directly reflected by a subsequent change in another
  • Indirect effects: relationship is mediated by some other variable
  • Exogenous: variable for which no cause is postulated within the model
  • Endogenous: one that is, at least in part, affected by variable(s) in the theoretical model
  • Recursive model: one-way directional flow – no paths ever return to variable that has already been involved in a relationship
  • Nonrecursive model: allows causal paths to backtrack; a variable can be both a cause and an effect

Identification refers to the relative number of unknowns in a set of equations. Unknowns are paths; knowns are correlations. When a model is underidentified, there are more unknowns than knowns and no solution is possible. When it is overidentified, there are more correlations than paths. “Just identified” means exactly the same number of each.
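A quick counting check, assuming the knowns are the p(p-1)/2 correlations among p observed variables (the path count is hypothetical):

```python
p = 4                        # observed variables
knowns = p * (p - 1) // 2    # 6 correlations
paths = 5                    # hypothetical model with 5 free paths

if paths > knowns:
    print("underidentified: no solution possible")
elif paths == knowns:
    print("just identified")
else:
    print("overidentified")   # here: 5 paths < 6 correlations
```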

Disturbances are unmeasured, unspecified causal determinants of endogenous latent variables (called “error term” in ANOVA). They are conceptualized as exogenous variables (causes beyond the theoretical model).

Structural modeling can never be proven correct, but helps render alternative explanations of causal relationships implausible. Note: doesn’t allow for causal statements.

Note difference between use of causal models in hypothesis testing (all relationships predicted in advance) and exploratory research (not set in advance).

A latent variable is a variable that has manifest indicators, similar to multiple operationalism: what you use to triangulate your variable. Example: love is your construct (the latent variable). You have three scales measuring it (these are the manifest indicators).

Chapter 9: Quasi-Experiments and Evaluation Research

In correlational research, the investigator is an observer. Variables vary freely and all are DVs.

In experimental research the researcher intervenes, systematically controlling variation in the IV.

In quasi-experiments, everything is set up as if it were a basic treatment-control design; however, the quasi-experiment lacks random assignment.

  • Evaluation Research: assessing the effects of interventions (i.e. a new social program)
  • Needs Assessment: learn extent and distribution of a social problem. (Descriptive)
  • Program Development (aka Formative Evaluation): pilot studies to test programs with controlled experiments. Not to assess impact, but to assist with designing the program
  • Feasibility Studies and Efficacy Research: after design, the next question is how the program can be implemented on a large scale, so begin with a small-scale design. Deliver the treatment under ideal conditions – failure here is disastrous.
  • Program Effectiveness Evaluation (aka summative evaluation, impact evaluation, or outcome evaluation): assess whether a social program has the desired effect. Causal hypothesis testing.
  • Cost-Benefit Analysis: analysis of program benefits relative to program costs (this is difficult)

Regression toward the mean: extreme scores on one testing tend to be less extreme (closer to the mean) on a second testing. The regression fallacy is mistaking this statistical artifact for a real effect, such as the impact of a treatment.

Observed score = true score + error.

Test-retest reliability is the degree of similarity in scores between two administrations of the same test. To control for regression in pretest-posttest research designs, we can compare each obtained posttest score with a “regressed” predicted score (the posttest score expected from regression to the mean alone, given the pretest score and the test-retest correlation). However, this method is easier said than done.
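A sketch of the “regressed” predicted score, using the standard formula (predicted standardized posttest = r times standardized pretest); all numbers are invented:

```python
def regressed_prediction(pretest, r, pre_mean, pre_sd, post_mean, post_sd):
    """Posttest score expected from regression to the mean alone."""
    z_pre = (pretest - pre_mean) / pre_sd
    return post_mean + r * z_pre * post_sd

# Test-retest r = .70: an extreme pretest of 130 is expected to drift
# back toward the mean of 100 even with no treatment at all.
print(regressed_prediction(130, r=0.70, pre_mean=100, pre_sd=15,
                           post_mean=100, post_sd=15))  # 121.0
```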

When using comparison group research designs, a problem arises when groups are not randomly assigned to conditions: participants likely differ from nonparticipants in a systematic way, so there is initial non-equivalence. One way to compensate is post-hoc matching, comparing members of the two groups who attained similar scores on the pretest measure. However, making matches on pretreatment variables (e.g., age, socioeconomic status, intelligence, personality scores) is bad, because the matched cases come from different ends of their groups’ distributions and will regress toward different means. It is better to take the nonequivalence between groups into account.

In time series designs, there are two big problems. Carry-over (auto-correlated) errors make it difficult to pinpoint a change in the time series at the one specific time of interest to the evaluation researcher. Also, systematic trends or cycles that affect the pattern of data over a specific time period (e.g., seasonal effects like cold weather) play a role in any changes.

“Pre-whitening” means removing regularities in the time series before analyzing effects.

When using time series designs or archival data, access to records may be difficult. Also, there are limits to the information contained in the records.

Comparison time-series designs combine the quasi-experiment with the comparison-group design. If a social program is introduced in one location but not in another, the results can be compared if the same record-keeping system is available for both sites. If the two series are roughly parallel prior to the introduction of the experimental program but diverge significantly afterwards, many potential alternative explanations for the change in the treated series can be ruled out. Another method of analysis is to include variables in the analysis that are parallel to the critical variables but which should not be affected by the interruption (e.g., murder and minor crimes).

In regression-discontinuity designs, we must assume the relationship is linear. When a treatment program is based on a clear selection principle, this design relies on the existence of a systematic, functional relationship between the selection variable and the outcome measure of interest. (E.g., if poverty were the basis of selection, we would expect to find a negative relationship between poverty scores and later achievement without the program.) This design mimics a true experiment in that a group of participants at the cut-off point can be randomly assigned to a treatment or a control condition; we then compare those just above and just below the cut-off point. This type of design is difficult to implement and more susceptible than true experiments to measurement error.
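A small simulated sketch of the design’s logic (all numbers invented): the outcome is linear in the selection variable, treatment is assigned by a cutoff, and the program effect appears as a jump at the cutoff.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
poverty = rng.uniform(0, 100, 500)         # selection variable
treated = (poverty >= 60).astype(float)    # clear cutoff selection rule
# Hypothetical outcome: linear in poverty plus a 5-point program effect.
achievement = 80 - 0.3 * poverty + 5 * treated + rng.normal(0, 2, 500)

X = sm.add_constant(np.column_stack([poverty, treated]))
print(sm.OLS(achievement, X).fit().params)  # treated coefficient ~ 5
```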

Chapter 10: Survey Design and Sampling

Survey research is usually not experimental; the focus is on external validity rather than internal. For the most part, generalizability is much more important than internal validity.

Selection (sampling) is not assignment: selection determines who gets into the sample from the population, whereas assignment determines the condition a sampled participant experiences.

  • Census: gathers data from every unit (or individual) in a population
  • Survey (sample): gathers data from a representative portion of the pool
  • Precision (standard error): how close a sample estimate is thought to be to the population value
  • Sampling frame: the list of all units (or individuals) in a specific pool from which a sample is taken
  • Population parameter: the true population value (the value that would be obtained if everyone in the pool were included in the sample)

Two goals of survey sampling:

  1. Efficiency refers to the attempt to balance consideration of the costs with the precision desired.
  2. Economy refers to consideration of reducing expenses in sampling and data collection.

We can sample by using a table of random numbers: each unit is labeled with a number, and we select randomly. This tends to be done with a computer, though there are books filled only with pages of random numbers. Another method is systematic sampling, where we take every nth person. This works as long as the list is not arranged in a cyclical order. For example, if a list had the names of married couples in rows, with the wife listed first, a systematic sample using every 10th person would consist only of males.
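A sketch of systematic sampling with a random start (frame and names hypothetical):

```python
import random

def systematic_sample(frame: list, n: int) -> list:
    """Every k-th unit after a random start within the first interval.
    Assumes the frame has no cycle that lines up with k."""
    k = len(frame) // n                # sampling interval
    start = random.randrange(k)        # random start within first interval
    return frame[start::k][:n]

frame = [f"person_{i}" for i in range(1000)]
print(systematic_sample(frame, 10))    # 10 names, every 100th person
```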

Stratifying controls for variables and reduces the chance of an unusual distribution. To stratify, we need population data on the stratification factors (e.g. political parties) and separate lists for the strata.

  • Stratification: the population is divided into strata before the sample is drawn and a certain number from each strata are included (increases precision)
  • Proportionate stratified (random) sample: (epsem design) the same sampling fraction is used for each stratum (subgroup); a sketch appears after this list
  • Disproportionate stratified (random) sample: a different sampling fraction is used for the strata (subgroups). This design is useful if, say, one or more of the subgroups is important but very small; you might want a larger sample from this subgroup to get a more precise estimate.
  • Cluster sampling: using a segment of the population as the sampling unit (e.g. homeroom)
  • Multistage sampling: using a cluster, then sampling again within these clusters
  • Note: clusters may not be of uniform size, so random selection and the precision of our estimates are compromised. Probability proportional to size (PPS) sampling can fix this problem. This approach ensures that the likelihood of selection in a cluster or multistage sample is equal for all eligible individuals.
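A sketch of the proportionate stratified (epsem) sample mentioned above; the strata labels and sizes are made up:

```python
import random
from collections import defaultdict

def proportionate_stratified_sample(frame, strata, fraction):
    """Apply the same sampling fraction within every stratum (epsem)."""
    by_stratum = defaultdict(list)
    for unit, stratum in zip(frame, strata):
        by_stratum[stratum].append(unit)
    sample = []
    for units in by_stratum.values():
        sample.extend(random.sample(units, round(len(units) * fraction)))
    return sample

voters = list(range(1000))
party = ["D"] * 450 + ["R"] * 400 + ["I"] * 150
print(len(proportionate_stratified_sample(voters, party, 0.10)))  # 100
```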

Two-phase sampling is when all respondents take a basic survey and are then surveyed again. This is useful for sampling rare respondent groups: these groups are usually disproportionately sampled, while the rest of the population is sampled with a smaller sampling fraction.

Panel surveys assess changes in the individual respondents of the sample over time.

Two types of sampling frames: (1) a list of names of the population, plus a detailed map of a specific physical environment; (2) no list of the population, but a list of clusters where the population may be found.

Random digit dialing is when a random number generator develops lists of telephone numbers, or the experimenter randomly chooses numbers from a telephone book.

When using telephone surveys as opposed to face-to-face interviews, the anonymity provided by phone interviews may increase honesty on questions about sensitive issues. (Ex: “How many times in the last year have you driven while intoxicated?”) This intuition has been supported by numerous studies. However, using the telephone for surveys systematically under-samples the poor.

Skip pattern: a roadmap for the interview; questions may be skipped because of previous answers.

Dropouts and non-respondents present a problem, as sampling theory is based on probability theory, which assumes perfect response rates. Ways to mitigate this problem: offer cash incentives in mail surveys, send follow-ups and reminders, personalize the survey for the respondent, create shorter questionnaires, and guarantee anonymity.
