Validity, Validity, Validity
Q: What is the difference between a scale and a survey?
A: A scale seeks to measure a specific construct, behavior, or body of knowledge; a survey is a grouping of scales and instructions.
Reliability: are you getting consistent scores? (must test it over time – through the same people/same cohort/same time of day) If what you’re measuring isn’t changing, then test-retest reliability is fine.
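One common way to put a number on test-retest reliability is to correlate scores from the same people at two time points. The sketch below is only an illustration: the scores are made up, and the helper name pearson_r is an assumption, not something from these notes.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least 2 scores")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

# Hypothetical scores for the same five people, measured twice.
time_1 = [12, 18, 25, 30, 22]
time_2 = [13, 17, 26, 29, 23]

test_retest_r = pearson_r(time_1, time_2)
print(round(test_retest_r, 3))
```

A value near 1 suggests consistent scores across administrations, which only makes sense as evidence of reliability if the thing being measured is not expected to change.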
Validity: does it measure what it’s supposed to measure?
If you’re dealing with a measure that’s not expected to change, reliability is a prerequisite for measurement validity.
Reasons for poor validity:
- Unclear directions
- Difficult sentences and vocabulary (reading level too high)
- Ambiguity in statements
- Inadequate time limits
- Inappropriate level of difficulty
- Identifiable pattern of answers
- Errors in administration and scoring
- Poor choices of how to test validity
It is true that a scale can get you valid data, but we can never say a scale is valid. Validity is a matter of degree (high validity, moderate validity, low validity, etc.) and is often context-based. A self-esteem scale wouldn’t work on a 1-year-old baby, for example. So we can say a scale has been validated in a certain population. Or we can say the data we collected was valid. In general, when reporting, we don’t think about demand characteristics enough.
Construct Validity: the extent to which a test measures the construct it claims to measure. Am I measuring what I claim to be measuring?
Convergent Validity: a type of construct validity. It is achieved when one measure of a concept is associated with different types of measures of the same concept.
Divergent/Discriminant Validity: is achieved when the measure is uncorrelated with theoretically distinct constructs.
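Convergent and discriminant validity are usually checked with correlations: the new measure should correlate strongly with another measure of the same construct and weakly with a theoretically distinct one. A minimal sketch, assuming made-up scores and hypothetical variable names:

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

# Hypothetical data for six respondents.
new_scale = [10, 14, 18, 22, 26, 30]        # the scale being validated
same_construct = [11, 15, 17, 23, 25, 31]   # an established measure of the same concept
distinct_construct = [5, 9, 4, 8, 6, 7]     # a theoretically unrelated measure

convergent_r = pearson_r(new_scale, same_construct)
discriminant_r = pearson_r(new_scale, distinct_construct)

print("convergent r:", round(convergent_r, 2))
print("discriminant r:", round(discriminant_r, 2))
```

There is no universal cutoff for "high" or "low" here; what counts as adequate depends on the construct and the population.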
Criterion Validity: is the degree to which content on a test (predictor) correlates with performance on relevant criterion measures.
Predictive Validity: very similar to criterion validity; the difference is that the criterion is measured immediately for criterion validity and over time for predictive validity. The SAT predicting success in college is predictive validity because the outcome is measured over time. Criterion validity, by contrast, would be measuring piano playing via a survey and then immediately listening to the student play the piano.
Content Validity: establishing that the measure covers the full range of the concept’s meaning. For example, PERMA measures five things. If I only measure the “A” and then call it a measurement of PERMA, it does not have content validity because it only measures one portion of the whole.
Nomological Validity: a type of validity in which a measure correlates positively in the theoretically predicted way with measures of different but related concepts. How does it move with a slew of related concepts? Picture convergent and divergent of different concepts weaving in a web.
Experimental Validity: a study designed to show that the test “reacts” as it should to a specific treatment. If I have a scale of willingness to forgive, then give a forgiveness manipulation, administer the scale, and show a significant effect, then I have experimental validity. It moved exactly as I predicted it would.
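A rough sketch of what "reacting as it should" looks like in numbers: compare scale scores for a group that received the manipulation against a control group. The data are invented, and the code only computes Welch's t statistic; a p-value would need a statistics package, which is left out here.

```python
from math import sqrt
from statistics import mean, variance

def welch_t(group_a, group_b):
    """Welch's t statistic for the difference in means (group_a minus group_b)."""
    standard_error = sqrt(variance(group_a) / len(group_a)
                          + variance(group_b) / len(group_b))
    return (mean(group_a) - mean(group_b)) / standard_error

# Hypothetical willingness-to-forgive scores (higher = more willing).
treated = [6, 5, 7, 6, 8, 5]   # received the forgiveness manipulation
control = [3, 4, 2, 5, 4, 3]   # did not

t_stat = welch_t(treated, control)
print("mean difference:", round(mean(treated) - mean(control), 2))
print("Welch t:", round(t_stat, 2))
```

A large t in the predicted direction is the kind of evidence the note describes; a difference in the wrong direction, or none at all, would count against the scale.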
Face Validity: a test is said to have face validity if it looks like it is going to measure what it is supposed to measure.
Incremental Validity: asks if the test improves on the validity of whatever tests are currently being used. If a scale already exists, create your scale to be shorter OR better (over and above what the current scale measures).
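"Over and above" is often checked by asking how much the explained variance (R squared) in an outcome goes up when the new scale is added to the existing one. The sketch below uses the standard two-predictor formula for R squared from pairwise correlations; all data and names are hypothetical.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

def r_squared_two_predictors(outcome, pred_1, pred_2):
    """R squared for a regression of outcome on two predictors."""
    r_y1 = pearson_r(outcome, pred_1)
    r_y2 = pearson_r(outcome, pred_2)
    r_12 = pearson_r(pred_1, pred_2)
    return (r_y1 ** 2 + r_y2 ** 2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12 ** 2)

# Hypothetical scores for six respondents.
existing_scale = [3, 5, 6, 8, 9, 11]
new_scale = [7, 2, 9, 4, 8, 6]
outcome = [10, 9, 15, 13, 18, 18]

r2_existing = pearson_r(outcome, existing_scale) ** 2
r2_both = r_squared_two_predictors(outcome, existing_scale, new_scale)
delta_r2 = r2_both - r2_existing   # variance explained over and above the existing scale

print("R2 existing only:", round(r2_existing, 3))
print("R2 with new scale:", round(r2_both, 3))
print("delta R2:", round(delta_r2, 3))
```

The formula breaks down if the two scales are perfectly correlated, which would itself be a sign the new scale adds nothing.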
Structural Validity: use confirmatory factor analysis to determine whether items are consistent with the predicted dimensions.
Postdictive Validity: a type of criterion-related validity. Can I give you a scale now and predict things that happened in the past?
Local Validity: explicit check on validity of the test for your population and application.
Generalization Validity: degree to which validity can be extended to other contexts. (No different than external validity)
Consequential Validity: refers to whether or not the uses of the test results are valid. This is difficult because researchers can’t control how their scales are utilized.
Survey, Survey, Survey
“Flying planes can be dangerous…” What does that mean? Does it mean that a pilot flying a plane can be dangerous or that planes in the sky can be dangerous?
Response sets with predetermined answer sets drastically affect the responses. When asking a person (by a mail survey) how much time they spend doing homework per night, their answers vary drastically based on the response set. When the increments were small (e.g., 0-.5 hours, .5-1 hour, etc.) with the top response being more than 2.5 hours, 23% said they studied more than 2.5 hours. However, when the bottom increment was under 2.5 hours and the answers escalated from there, 69% said they studied more than 2.5 hours per night.
Measurement Error: the result of inaccurate responses that stem from poor question wording, poor interviewing, survey mode effects and/or some aspect of the respondent’s behavior.
When crafting a survey, it’s all about making good decisions. The more you know about what influences responses, the more questions you will know to ask yourself, and the more successful you will be.
Getting people vested in your survey gets cleaner responses.
Remember: (1) respondent’s attention is a limited resource; (2) respondent is looking for any reason to answer without reading and not feel bad about it; (3) the credibility of the survey can decrease or increase the level of focus of the respondent; (4) very few people will read every word.
Principles of Writing Survey Questions
- Avoid double-barreled questions. Therefore, avoid questions like: “How many days have you worked this season when you were injured or ill?” “Does your job require you to do repeated lifting, pushing, pulling, or bending?” It’s a bad idea to ask two questions in one. You don’t know which part of the question a person is answering. Also, people are not going to read the questions carefully, so even if it’s “properly” worded, their answers are unclear.
- Choose simple over specialized words. A question like “How much do you worry about contamination of drinking water from hydroelectric?” is a poor choice. Use ‘tired’ rather than ‘exhausted.’ Use ‘honest’ rather than ‘candid.’ Use ‘most important’ rather than ‘top priority.’
- Choose as few words as possible. You want to do whatever you can to make sure that your respondents will read all of the words. An example of what not to do: “The working environment is sufficiently enlightened.”
- Avoid negatively worded “strongly disagree/strongly agree” items. For example, say: “I hate drugs.” Don’t say: “I do not like drugs.” Here’s a poor scale item taken from a published article: “Being able to do things that don’t go against my conscience.”
- Use complete sentences to ask questions. Don’t just write “Age: ___”; instead, ask “How old are you?”
- Use a question about a specific attitude if you want to predict a specific behavior. For example, asking a question like: “What is your attitude toward breast feeding?” will give you somewhat of a relationship to the behavior. However, being more specific and asking: “What is your attitude about breastfeeding your baby” will give you a clearer predictor.
- Do not make assumptions about the literacy levels of your respondent. Ex: “How much do you worry about radionuclides in food grown near a nuclear plant?”
- Do not make assumptions about the knowledge levels of your respondent. Ex: “My parents seriously consider any of the farm worker suggestions for improving safety?” What if the person does not know?
- Avoid vague qualifiers when more precise estimates can be obtained. Here’s an example: “Graduate ‘whizz kids’ are pampered at Sallafield.” Or asking a question like “How often do you go to church?” with answer set varying from “often” to “seldom” provides unanalyzable data because everyone is answering a different question.
- Avoid specificity that exceeds the respondent’s potential for having an accurate, ready-made answer. If you ask a question like “How many emails have you received this year? Be specific.” – people will not be able to accurately answer the question. It is much easier to answer: “How many emails do you receive on average each day?”
- Use equal numbers of positive and negative categories for scalar questions. Ex: “Would you say in general that your health is: excellent, very good, good, fair, or poor?”
- Be sure the distances between scale points are equal. Ex: “How many times do you kill people in a year? Always, Nearly always, Sometimes, Rarely, Never.” There’s a big gap between “never” and “rarely.”
- Distinguish neutral, don’t know, and not applicable. For example, keep these five items together: Strongly Disagree, Disagree, Neutral, Agree, and Strongly Agree. Then have a distinct line break and list: Don’t Know.
- Spread out your response set to obtain more variance (except in low-education samples). In general, use a 7-point Likert scale, rather than a 4-point scale.
- State both sides of the attitudinal stem. Ex: “To what extent do you agree that the new safety tools are useful?” Instead, use: “To what extent do you agree or disagree…” Also, if you write “agree or disagree,” be sure to start the response set with the positive. If you’re doing a pre-test and post-test, it is important to keep the scales in the same order between versions.
- Eliminate check-all-that-apply question formats to reduce primacy effects. You’ll have a lot of error if you’re not counter-balancing. There’s also no good way to differentiate between the items. And lastly, items at the end of the list will be left off disproportionately. It’s better to revise so that each item is rated on a strongly disagree/strongly agree scale.
- Develop response options that are mutually exclusive. Example of a poorly worded question: “Where did you hear about the coal mining accident? – On my way to work; On the radio; While at work; A coworker told me; The television.”
- Use cognitive techniques to improve recall. “What did you think of your senior prom?” is a much different question than if it is asked after: “Please think about what you wore to your senior prom. Who was your date? How did you arrive? Where did you have dinner?” Setting the mood will get better information. However, there is a danger in priming participants. Try to keep the cognitive techniques value neutral – like asking about the weather – rather than planting emotions.
- Avoid the use of slang or jargon. For example, a poor Likert scale could range from (1) “totally” to (5) “not at all.”
- Provide appropriate time referents. A poor example: “Since 1976, how often have you been afraid for your safety while at work?” Don’t ask: “How many times did you see the doctor this month.” Instead, ask: “During the entirety of last month, how many times did you visit the doctor?”
- Soften the impact of potentially objectionable questions. For example, asking “Have you ever been arrested?” as your first question will not only set the tone for the survey, but will also lead participants to believe that the entire questionnaire is filled with sensitive questions. You want to build trust during the questions. Ask sensitive questions toward the end. Build on their momentum of answering questions, based on their time already invested. Also, if the sensitive question turns people off, they will at least have completed the majority of the survey.
- Likert items are meant to be neutral. Use the response set to show the intensity. A statement like “The safety training was the best ever” is difficult to answer on a Likert scale. What would be the difference between agreeing with this statement and strongly agreeing?
- Avoid items that are too similar. If you need to keep similar items on the scale, enlighten the respondent for better results. Giving them a reason such as: “some of these questions appear similar, but there are slight nuances that we have used for control purposes” will get more of a buy-in.
- If you want respondents to rate themselves on an attribute, it is often good to provide a comparison group.
- Be sure each question is technically accurate.
- Avoid asking respondents to make unnecessary calculations.
- Put demographic questions at the end of the survey.
- Choose question wordings that allow essential comparisons to be made with previously collected data.
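Some of the principles above can be checked mechanically. As one illustration, numeric response options such as "0-.5 hours, .5-1 hour" are not mutually exclusive if the endpoints are read as inclusive, because someone at exactly .5 hours fits two options. A small sketch, assuming closed intervals and a hypothetical helper named find_overlaps:

```python
def find_overlaps(bins):
    """Return adjacent pairs of (low, high) bins that share at least one value.

    Bins are treated as closed intervals, so (0, 0.5) and (0.5, 1)
    overlap at 0.5. Sorting by lower bound means any overlap in the
    set shows up between at least one pair of neighbors.
    """
    ordered = sorted(bins)
    overlaps = []
    for (low_a, high_a), (low_b, high_b) in zip(ordered, ordered[1:]):
        if low_b <= high_a:
            overlaps.append(((low_a, high_a), (low_b, high_b)))
    return overlaps

# Hypothetical homework-time options, in hours: endpoints are shared.
hour_bins = [(0, 0.5), (0.5, 1), (1, 1.5)]

# The same options rewritten in whole minutes with no shared endpoints.
minute_bins = [(0, 29), (30, 59), (60, 89)]

print(find_overlaps(hour_bins))
print(find_overlaps(minute_bins))
```

This only covers numeric ranges. Overlapping categorical options, like the coal mining example above that mixes where, when, and how someone heard the news, still need a human read.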
Respondents have to deal with multiple tasks: interpret, recall, format answers, and report.
Three primary sources of response effects:
- Wording of the question. For example: “Should your university allow hate speeches?” is a different question than: “Should your university forbid hate speeches?”
- Processing and influencing factors. The key is to know that the respondent understands the question in the same way the researcher wanted it to be understood.
- Response alternatives. Response alternatives will influence your response. Structure may limit your response set (more than open-ended questions). The scale will have an impact on the response (as discussed before: under 2.5 hours, 2.5-3 hours, 3-3.5 hours, etc.)
Question Context: the previous question can influence how the next question is perceived. Ex: Are you satisfied with your health insurance? Then next asking: Are you satisfied with the U.S. health care system? You activate a person’s perceptions of the U.S. health care system by first having them think of themselves.
When asking a person to recall or compute a judgment, it’s important to consider if the person has formed a judgment in the first place and if that judgment is accessible. For example, asking about the stairs-to-elevator relationship at the university could report a problem that isn’t even there.
To answer a question about past behavior…
- Understand the question
- Identify the behavior of interest
- Retrieve relevant instances from memory
- Correctly identify the relevant time period
- Correctly date the recalled instances to determine whether they fall within the reference period
- Correctly sum all instances of the behavior to arrive at a frequency report
- Map frequency onto the response alternatives provided by the researcher
- Candidly provide the results of the recall effort to the researcher
Formatting the response: respondents will alter their responses to fit the answer format
Range Effect: the different points on a scale will influence answers. People try to use all answers. If people get to the final question and they’ve never answered with “E” before, they’re going to be inclined to answer “E.”
Frequency Effect: answers change based on what’s around you. For example, asking a group of middle school boys how many girls they’ve kissed will yield different answers depending on the others answering.
Psychological Sources of Context Effects
- Context effects at the comprehension stage: what will happen if a respondent encounters an ambiguous question? Or what happens if we encounter an expert? The expert is more likely to answer consistently than the novice. The novice will be susceptible to mood.
- Context effects at the judgment stage: question difficulty; previously formed judgments may be primed; the norm of reciprocity; subject experience/mood; availability; interviewer bias. (The act of taking a survey often alters a person’s feelings toward a subject.)
- Context effects at the formatting stage: Big Square/Small Square; perceptive effect/range effect. If you ask how moral or immoral a person is who kills 100 people, then ask how moral or immoral the Son of Sam was, who killed 6 people, the less extreme case looks a lot better by comparison.
- Effect of rank-ordering on subsequent ratings: when asked least important/most important questions, people tend to lean left. So if you begin with “good” and end with “bad” on a Likert scale, people side more with the good on the left side.
- Context effects at the editing stage: answers to one question early on can influence questions later.
The Direction of Context Effects — The Inclusion/Exclusion Model specifies the conditions under which question order effects emerge at the judgment stage and predicts their direction, size, and generalization across related issues.
Assimilation Effects: information that is included in the temporary representation that individuals form of the target of judgment will result in assimilation effects because the judgment is based on the information included in the representation used. Basically, you’re assimilating your current mood and information set with what is being asked. The more extreme the information, the more extreme the effect. The less information that is chronically accessible, the greater the assimilation effect. Expertise helps to moderate assimilation effects. The number of preceding questions has an impact as well.
Contrast Effects: subtraction based contrast effect (with the exception of this, what do you think about that?) and comparison based contrast effect (think about the professor dating 85 students then think about the professor who forgot to cite a couple sources). It’s difficult to know which way the respondent will lean because we don’t know which information will come to their mind.
- Norm of evenhandedness (reciprocity)
- Anchoring or cognitive based anchor effect (contrast effects of being anchored here and then anchored there)
- Additional carryover effects (assimilation effects)
- Increased positiveness of summary items when asked after the specific items on the same subject. (If you ask several specific positive questions then ask about the overall of that issue, the answer will be more positive. The reverse is true when asked about the negative.)
Implications for questionnaire construction:
- The content of the preceding question determines the information that becomes temporarily accessible in memory
- The number of preceding questions is important
- The generality of the target question is important
Constructing the Questionnaire
Goals in constructing the questionnaire:
- You want every person to read every word
- You want the verbal language and graphical language in accord with one another
- Reduction of nonresponse
- Reduction or avoidance of measurement error
Dillman recommends the booklet format
Dillman’s criteria for ordering the questions:
- Organize the questions by topics
- Organize them in a logical order
- Start with the most salient questions
- Place sensitive questions at the end of the questionnaire
- When possible, group similar response sets together
- Make sure it applies to everyone
- Make sure it’s easy to understand
- Make it interesting
- Make it in accord with the cover letter
Choose the first question carefully:
Many people advise not to start with an open-ended question or else the respondent will assume they’re all open-ended and will hesitate to complete the survey.
Step 1: Define a desired navigational path for reading all information presented on each page of the questionnaire
- Write the questions in a way that minimizes the need to reread portions in order to comprehend the response task
- Place instructions exactly where that information is needed – not simply a lump of instructions all at the beginning
- Place items with the same response categories into an item-in-a-series format (carefully)
- Ask one question at a time
- Minimize the use of matrices
Step 2: Create visual navigation guides and use them in a consistent way to get respondents to follow the prescribed navigational path and correctly interpret the written information.
- Increase the size of written elements to attract attention
- Increase the brightness or color (shading) of visual elements to attract attention and establish appropriate groupings
- Use spacing to identify appropriate groupings of visual elements
- Use similarity to identify appropriate groupings of visual elements
- Maintain a consistent figure/ground format to make responding easier
- Maintain simplicity, regularity, and symmetry to make responding easier
- Begin asking questions in the upper left quadrant; place any information not needed by the respondent in the lower right quadrant
- Use the largest or brightest symbols to identify the starting point on each page
- Identify the beginning of each succeeding question in a consistent way
- Number the questions consecutively and simply from beginning to end
- Use a consistent figure/ground format to encourage the reading of all words
- Limit the use of reverse print to section headings or question numbers
- Place more blank space between questions than between subcomponents of questions
- Use dark print for questions and light print for answer choices
- Place special instructions inside of question numbers and not as freestanding entities
- Optional or occasionally needed instructions should be separated from the question statement by font or symbol variations
- Do not place instructions in a separate instruction book or in a separate section
- Use lightly shaded colors as background fields on which to write all questions to provide an effective guide to respondents
- When using a shaded background, identifying all answer spaces in white helps reduce non-response
- List answer categories vertically instead of horizontally
- Place answer spaces consistently to either the left or right of category labels
- Use numbers or simple answer boxes for recording answers
- Vertical alignment of question subcomponents among consecutive questions eases the response task
- Avoid double or triple banking of answer choices
- Maintain spaces between answer choices that are consistent with measurement intent
- Maintain consistency throughout a questionnaire in the direction scales are displayed
- Use shorter lines to prevent words from being skipped
Step 3: Develop additional visual navigational guides, the aim of which is to interrupt established navigation behavior and redirect respondents
- Major visual changes are essential for gaining compliance with skip patterns
- Words and phrases that introduce important but easy-to-miss changes in respondent expectations should be visually emphasized consistently but sparingly