Chapter 5: Methodological Issues

This chapter examines data that provide additional background about a respondent's interview session.  Section 5.1, Item Nonresponse, looks at the issue of missing data.  Section 5.2, Interview Validation, gives an overview of the process of recontacting respondents for data quality assurance.  Section 5.3, Interviewer Remarks, focuses on interviewer feedback about the interview itself.  These special data can be particularly helpful when researchers seek to understand outlier responses and anomalies within a specific case.



5.1 Item Nonresponse

Missing data, or nonresponse, occur for a number of reasons in the NLSY97 survey.  First, some respondents may not participate at all in a given survey year, so all information for those respondents is missing for that year.  The created variable "Reason for Noninterview" (RNI), available in each survey round, records why a respondent was not interviewed (unable to be located, refusal, deceased, etc.) and provides counts for each reason.  The extent of nonparticipation in each survey round is illustrated in Chapter 2 (see section 2.4, 'Retention and Reasons for Noninterview').

A second reason for missing data is that respondents do not provide a valid answer to a question.  When this happens, interviewers decide whether to record the answer as a 'refusal' or a 'don't know' value, a distinction they are trained to make.  For example, a refusal usually stems from respondent comments such as "That's none of your business," "I don't want to say," "I'm not comfortable telling you that," or "I don't want to answer."  A 'don't know' response is coded from respondents' comments such as "I have no idea," "I don't know how I could guess," "I wouldn't know," or "I'm not sure how to answer that."  Standard interviewing protocol calls for interviewers to try to convert an item nonresponse, either by allaying the concerns underlying a refusal (for example, by assuring privacy or citing the research reasons for a particular questionnaire item) or by providing cognitive aids to a respondent who "doesn't know" (for example, asking "Do you remember what season it was?" or "Do you have a guess what the range might be?").  Only if conversion attempts are ineffective do interviewers record a 'refusal' or 'don't know' response.

A valid skip is another source of missing data.  Not every question in the survey applies to every respondent; for instance, some questions apply only to females or to respondents in a certain age range.  Users should trace skip patterns back to determine whether a respondent was skipped past a question because the topic did not apply to him or her or because the respondent answered similar questions along a different path.

Missing data can also result from an incorrect flow in the survey instrument.  An incorrect flow may cause some respondents to skip a set of questions they should have answered, or others to answer questions they should not have been asked.  NLS data archivists have removed most of these extraneous responses from the data.  However, although extraneous information can be removed, missing data are not imputed in the NLSY97 surveys; instead, data missing for this reason are flagged with a special 'invalid skip' code.  Computer-assisted personal interviewing (CAPI) reduces the number of invalid skips in complex questionnaires, but some invalid skips are still possible in CAPI data: if the CAPI instrument contains a programming mistake, it can route a respondent incorrectly.  When such errors are found, the CAPI instrument can be corrected in the field to prevent further invalid skips, but data missing from cases already completed are not recovered.
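A minimal sketch of tracing a skip pattern, assuming a hypothetical universe rule (the question is asked only of female respondents) and hypothetical field names `sex` and `q`. It compares each recorded value with what the rule predicts, using the invalid skip (-3), valid skip (-4), and noninterview (-5) flags used in the NLSY97 data:

```python
# NLSY97 missing-value flags relevant to skip patterns.
INVALID_SKIP = -3  # instrument flow error; the data are not recovered
VALID_SKIP = -4    # respondent routed past the question
NONINTERVIEW = -5  # respondent not interviewed that round

def check_skip(in_universe: bool, value: int) -> str:
    """Compare one recorded value with what a (hypothetical) universe rule predicts."""
    if value == NONINTERVIEW:
        return "not interviewed"
    if value == INVALID_SKIP:
        return "invalid skip"
    if not in_universe:
        # Respondents outside the universe should carry the valid skip flag.
        return "consistent" if value == VALID_SKIP else "answered but out of universe"
    if value == VALID_SKIP:
        # May have been routed along a different path; trace the skip pattern further back.
        return "valid skip despite rule"
    return "consistent"

# Hypothetical rule: the question applies only to female respondents.
respondents = [
    {"id": 1, "sex": "F", "q": 3},
    {"id": 2, "sex": "M", "q": -4},
    {"id": 3, "sex": "F", "q": -3},
    {"id": 4, "sex": "F", "q": -4},
    {"id": 5, "sex": "M", "q": -5},
]
statuses = {r["id"]: check_skip(r["sex"] == "F", r["q"]) for r in respondents}
for rid, status in statuses.items():
    print(rid, status)
```

In real data the universe rule comes from the questionnaire's skip logic, which may depend on several earlier answers rather than on a single field.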

All missing data are clearly flagged in the NLSY97 data set with five negative values: (-1) refusal, (-2) don't know, (-3) invalid skip, (-4) valid skip, and (-5) noninterview.  In general, these five negative values are reserved as missing-value flags.  As an example, Figure 5.1.1 shows the item "How is R's general health?"  The item codeblock shows that in 2004, 7,494 respondents gave responses ranging from "excellent" to "poor," three refused to answer (-1), four said they did not know (-2), one was not asked the question and was thus a valid skip (-4), and 1,482 were not interviewed that survey year (-5).  There are no invalid skips (-3) for this item.
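Because the five negative codes are missing-value flags, they must be separated from valid answers before computing statistics; otherwise they are averaged in as if they were real values. A minimal sketch using the counts from the general-health example, with every valid answer set to the placeholder value 1 (the actual split from "excellent" to "poor" is in the codeblock and is not reproduced here). Since these codes are reserved only "in general," check each variable's codeblock before relying on this mapping:

```python
from collections import Counter

# Missing-value flags used in the NLSY97 data set.
MISSING_CODES = {
    -1: "refusal",
    -2: "don't know",
    -3: "invalid skip",
    -4: "valid skip",
    -5: "noninterview",
}

def split_missing(values):
    """Return (valid answers, Counter of labeled missing-value flags)."""
    valid = [v for v in values if v not in MISSING_CODES]
    missing = Counter(MISSING_CODES[v] for v in values if v in MISSING_CODES)
    return valid, missing

# Counts from the 2004 general-health example; 1 is a placeholder for any valid answer.
values = [1] * 7494 + [-1] * 3 + [-2] * 4 + [-4] * 1 + [-5] * 1482
valid, missing = split_missing(values)
print(len(values), len(valid), dict(missing))

naive_mean = sum(values) / len(values)  # wrong: treats the flags as numbers
valid_mean = sum(valid) / len(valid)    # valid answers only
print(round(naive_mean, 3), valid_mean)
```

The naive mean is pulled far below any legitimate answer by the 1,482 noninterview flags alone, which is why the flags are filtered first.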

5.1 Figure 1. NLSY97 Questionnaire Item Codeblock with Nonresponse Highlighted

As would be expected, more sensitive questions in the survey tend to yield more missing data in the 'refusal' category.

To improve the accuracy of reporting, many of the more sensitive questions are placed in the self-administered questionnaire (SAQ) portion of the survey, which respondents interviewed in person answer privately on a laptop.  If the survey is conducted by phone, however, the SAQ section cannot be self-administered and must instead be administered verbally by the interviewer.



5.2 Interview Validation

After each round of the NLSY97, validation reinterviews are conducted with randomly selected respondents.  These data offer opportunities for studying response variance, item reliability, and other methodological issues.

Approximately 10% of completed interviews are validated every round.  The main goals of the reinterview are to assess the quality of the interview data and to make sure the interview was conducted properly.  To confirm proper conduct, respondents are asked whether the interview took place, how long it lasted, its mode, the number of sessions required, and whether the incentive payment was made.  To assess data quality, respondents are asked a few questions from the survey, some of which change from year to year, and their answers are compared with those given in the main interview.  The selected items are ones most respondents will have answered in the main interview, and they span a range of expected reliability: some are likely to agree between interview and reinterview, while others may disagree.

Cases are selected to achieve 10% validation of each interviewer's caseload.  The current validation system is designed to randomly select approximately 13% of each interviewer's completed cases for validation.  Additional cases are included if there is concern about some aspect of the interview.  Most validation reinterviews are completed within three weeks of the main interview date; occasionally, a small number have been conducted by mail or in person.
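The per-interviewer selection described above can be sketched as follows. This is an illustration, not the project's actual system: the function name, the round-up (ceiling) rule for small caseloads, and the case IDs are assumptions; only the approximate 13 percent rate and the addition of flagged cases come from the text.

```python
import math
import random

def select_for_validation(cases_by_interviewer, rate=0.13, flagged=(), seed=None):
    """Hypothetical sketch: randomly draw about `rate` of each interviewer's
    completed cases, then add any cases flagged for concern."""
    rng = random.Random(seed)
    selected = set()
    for interviewer, cases in cases_by_interviewer.items():
        # Assumption: round up so every interviewer with cases has at least one selected.
        k = math.ceil(rate * len(cases))
        selected.update(rng.sample(cases, k))
    selected.update(flagged)  # concern-based additions on top of the random draw
    return selected

# Hypothetical caseloads for three interviewers.
cases = {
    "A": [f"A{i}" for i in range(20)],  # ceil(2.6)  -> 3 cases
    "B": [f"B{i}" for i in range(35)],  # ceil(4.55) -> 5 cases
    "C": [f"C{i}" for i in range(8)],   # ceil(1.04) -> 2 cases
}
chosen = select_for_validation(cases, flagged={"X1"}, seed=1)
print(sorted(chosen))
```

Drawing separately within each interviewer's caseload, rather than from the pooled sample, is what guarantees every interviewer's work is checked.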

The validation data can be used to study the consistency of responses between the interview and reinterview, and estimates of simple response variance can be calculated from them.  For instance, in rounds 2 through 4 the proportion of cases whose interview and reinterview responses did not match (the mismatch rate) was quite low for many factual questions, such as the respondent's type of housing, the highest grade attended, or whether the respondent reported having any income.  Some of the mismatches that did occur could be due to differences in question wording or mode and may be affected by the length of the recall period.  In round 4, however, the mismatch rate was high for questions about expectations, suggesting that respondents' expectations were quite unstable.  The validation data can also provide insights into interviewer performance, both positive and negative.
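As an illustration, the mismatch rate is the share of usable interview/reinterview pairs whose answers differ. For simple response variance, the sketch uses the textbook reinterview estimator: the sum of squared interview-reinterview differences divided by twice the number of pairs, which for a yes/no item coded 0/1 equals half the mismatch rate. That estimator assumes the reinterview repeats the original measurement conditions, an assumption the wording, mode, and recall differences noted above may violate. The item and data below are hypothetical:

```python
MISSING = {-1, -2, -3, -4, -5}  # NLSY97 missing-value flags

def usable_pairs(pairs):
    """Keep only pairs where both the interview and reinterview answers are valid."""
    usable = [(a, b) for a, b in pairs if a not in MISSING and b not in MISSING]
    if not usable:
        raise ValueError("no pairs with valid answers in both interviews")
    return usable

def mismatch_rate(pairs):
    """Proportion of usable pairs whose two answers differ."""
    usable = usable_pairs(pairs)
    return sum(a != b for a, b in usable) / len(usable)

def simple_response_variance(pairs):
    """Textbook estimator: sum of squared differences / (2 * number of usable pairs)."""
    usable = usable_pairs(pairs)
    return sum((a - b) ** 2 for a, b in usable) / (2 * len(usable))

# Hypothetical yes/no item ("any income?"): (interview answer, reinterview answer).
pairs = [(1, 1), (1, 1), (0, 0), (1, 0), (1, 1), (0, 0), (1, 1), (1, -2)]
print(mismatch_rate(pairs))             # one mismatch among seven usable pairs
print(simple_response_variance(pairs))  # half the mismatch rate for a 0/1 item
```

The last pair is dropped because the reinterview answer is a "don't know" flag, not a disagreement.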

Although reinterviews have been administered each year since round 2, the validation data from rounds 2 and 3 have not been released to public users.  The names of the public validation variables begin with "VALIDRx", and the variables are found in the main file data set.
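If the public data are loaded with question names as column labels, the validation variables can be picked out by their prefix. A minimal sketch; the column names in the example are hypothetical placeholders, not actual NLSY97 question names:

```python
def columns_with_prefix(column_names, prefix="VALIDR"):
    """Return the column names that begin with the given question-name prefix."""
    return [name for name in column_names if name.startswith(prefix)]

# Hypothetical column names for illustration only.
columns = ["VALIDR_ITEM_A", "OTHER_ITEM", "VALIDR_ITEM_B", "YIR_ITEM_C"]
print(columns_with_prefix(columns))
```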

Round 4:  Between November 2000 and July 2001, 989 respondents completed validation reinterviews for round 4.  This produced an overall project validation rate of 11.9 percent of completed interviews.  The short telephone questionnaire included a validation component that asked for details about the respondents' original round 4 interviews (e.g., duration, mode) and information on whether or not they were paid for their participation.  The reinterview component involved re-asking questions that were drawn directly from the youth interview.  This component included some characteristics of their current residence, several expectations questions, a question about weekly family activities, and two questions concerning the respondent's income from the previous year.  Finally, respondents were asked whether the interviewer they had in round 4 was the same one who conducted their interview in round 3.

Round 5:  Between November 2001 and June 2002, 1,036 respondents completed validation reinterviews for round 5.  This produced an overall project validation rate of 13.1 percent of completed interviews.  Reinterview questions included marital status, current employer name and start date, three child care questions (if the respondent was female and born in 1983-84), and the respondent's total income from the previous year.  Respondents also were asked whether the interviewer they had in round 5 was the same one who conducted their interview in round 4.  Finally, round 5 added a new question asking respondents why they completed the interview.

Round 6:  Between November 2002 and June 2003, 876 respondents completed validation reinterviews for round 6.  This produced an overall project validation rate of 11.1 percent of completed interviews.  The reinterview component involved re-asking questions drawn directly from the youth interview.  This component included the number of people in the household, current employer name and type of employer (government, private, non-profit, family business), whether the respondent was in the Armed Forces, the number of illnesses or injuries in the last 12 months, whether a close relative had died in the past 5 years, whether the respondent had ever received food stamps, and the respondent's total income from wage/salary in the previous year.  Respondents also were asked whether the interviewer they had in round 6 was the same one who conducted their interview in round 5.  In addition, respondents were asked why they completed the interview.

Round 7:  Between October 2003 and July 2004, 1,145 respondents completed validation reinterviews for round 7.  This produced an overall project validation rate of 14.8 percent of completed interviews.  The short telephone questionnaire covered the length of the reinterview, the mode of interview, the number of sessions needed to complete the interview, whether the respondent received payment for the interview and, if so, the amount, the number of people in the household, the highest grade attended, whether the respondent had lived on his or her own, whether the respondent received unemployment compensation benefits, the number of illnesses or injuries in the last 12 months, the respondent's total income from wage/salary in the previous year, whether the same interviewer conducted the prior interview, and whether the respondent read any materials provided by NORC.

Round 8:  Between November 2004 and July 2005, 934 respondents completed validation reinterviews for round 8, for an overall project validation rate of 12.4 percent of completed interviews.  The reinterview included the round 7 questions listed above except the question about living on one's own; it also asked whether the respondent had registered to vote in the November 2004 election.

Round 9:  Between November 2005 and August 2006, 1,080 respondents completed validation reinterviews for round 9, for an overall project validation rate of 14.7 percent of completed interviews.  The reinterview included questions about the length of the reinterview; mode of interview; use of a laptop during the interview; number of sessions needed to complete the interview; incentive details; number of household members; highest grade attended; whether the respondent received unemployment compensation benefits; marital status; number of illnesses or injuries in the last 12 months; the respondent's total income from wage/salary in the previous year; whether the same interviewer had conducted the prior interview; and what methods the project had used to contact the respondent.
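The validation rate is the number of completed reinterviews divided by the number of completed interviews in the round. The sketch below back-calculates the approximate number of completed interviews each reported rate implies and compares each project-wide rate with the approximately 10 percent figure given at the start of this section. Because the published rates are rounded to one decimal place, the implied counts are approximations, not official interview totals:

```python
# Reinterviews completed and reported validation rate (percent), from the round summaries above.
VALIDATION = {
    4: (989, 11.9),
    5: (1036, 13.1),
    6: (876, 11.1),
    7: (1145, 14.8),
    8: (934, 12.4),
    9: (1080, 14.7),
}
TARGET_PCT = 10.0

def implied_completed(reinterviews, rate_pct):
    """Approximate completed interviews implied by rate = reinterviews / completed."""
    return reinterviews / (rate_pct / 100)

for rnd, (n, rate) in VALIDATION.items():
    print(f"Round {rnd}: about {implied_completed(n, rate):,.0f} completed interviews; "
          f"rate {rate}% >= {TARGET_PCT:.0f}%: {rate >= TARGET_PCT}")
```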



5.3 Interviewer Remarks

Each NLSY97 questionnaire includes an interviewer remarks section containing interview-specific information, which interviewers complete after finishing the interview with the respondent.  Some of the information is objective (the presence of another person during the survey, for instance) while other information is subjective on the part of the interviewer (such as rating how cooperative the respondent was). Questions found in the interviewer remarks section have the prefix "YIR" in their question name.  Interviewers do not receive specific training on completing subjective items, so ratings are not likely to be comparable across interviewers.
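Because subjective ratings may not be comparable across interviewers, one simple check is to look at each interviewer's ratings before pooling them. A minimal sketch, assuming a hypothetical interviewer identifier (which may not be available in the public release) and a hypothetical numeric rating item standing in for a YIR variable. Differences in means can also reflect differences between interviewers' caseloads, so this is a descriptive check, not a correction:

```python
from collections import defaultdict
from statistics import mean

MISSING = {-1, -2, -3, -4, -5}  # NLSY97 missing-value flags

def ratings_by_interviewer(records, item):
    """Map each interviewer ID to (number of valid ratings, mean rating)."""
    grouped = defaultdict(list)
    for rec in records:
        value = rec[item]
        if value not in MISSING:
            grouped[rec["interviewer_id"]].append(value)
    return {iv: (len(vals), mean(vals)) for iv, vals in grouped.items()}

# Hypothetical records: "rating" stands in for a subjective YIR item.
records = [
    {"interviewer_id": "A", "rating": 1},
    {"interviewer_id": "A", "rating": 1},
    {"interviewer_id": "A", "rating": 2},
    {"interviewer_id": "B", "rating": 3},
    {"interviewer_id": "B", "rating": -1},
    {"interviewer_id": "B", "rating": 3},
    {"interviewer_id": "B", "rating": 2},
]
summary = ratings_by_interviewer(records, "rating")
for iv, (n, avg) in sorted(summary.items()):
    print(iv, n, round(avg, 2))
```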

Special circumstances. All survey rounds feature a series of questions about special circumstances that might have affected the quality of the data. The interviewers were asked to assess whether the respondent was hard of hearing, unable to see well, unable to read, lacking in basic social skills, mentally handicapped or retarded, physically handicapped, ill/injured, under the influence of drugs or alcohol, had a poor command of English, or experienced environmental distractions during the interview. Starting in round 6, interviewers also were asked to assess whether the respondent had reading abilities good enough to complete the self-administered sections of the interview without an audio component.

Respondent's general demeanor and responsiveness.  In all survey rounds, interviewers rated how informative/cooperative and how candid/honest the respondent appeared during the interview. In addition, the interviewers assessed the respondent's overall understanding of the questions (good, fair, poor). From round 2 on, interviewers also recorded the number of calls and/or visits needed to complete the interview.

Other respondent characteristics. Starting in round 2, interviewers recorded each respondent's race and gender based on their own observations. Beginning in round 6, the race question conformed to new Office of Management and Budget (OMB) guidelines on collecting data on racial identification, and all race questions in the NLSY97 interview have followed the same guidelines since then.

Presence of others during interview. All survey rounds include information about whether others were present (listening and/or participating) during the interview and who the person or persons were (infant child, family member, etc.).  Interviewers attempt to secure a private environment for all interviews, so the presence of another individual (other than a small child) is an exception and can be considered a disruption to the interview. 

Home and neighborhood characteristics. In each survey round, the interviewer recorded where the interview took place: inside the respondent's home, immediately outside the respondent's home, in the interviewer's vehicle, or in a separate location. In survey rounds 1-5, where possible, interviewers assessed the exterior and interior condition of the respondent's home and the general state of other buildings in the neighborhood. Interviewers for rounds 1-5 also provided a description of the neighborhood (rural, urban, etc.), the most common type of residence (single home or apartment buildings, for example) found on the respondent's street, and whether the interviewer was concerned for his or her safety while at the interview.

Interview methodology. Starting in round 2, interviewers recorded whether any portion of the interview took place on the phone. In all rounds, interviewers indicated if the interview was in Spanish or English.

Interviewer retention. Starting in round 2, interviewers indicated in each survey round whether they had interviewed the respondent in the previous survey year.

