Item Nonresponse


Missing data, or nonresponse, occurs for a number of reasons in the NLSY97 survey.  First, some respondents may not participate at all in a given survey year, so all information for those respondents is missing for that year.  The created variable "Reason for Noninterview" (RNI) is available in each survey round and provides counts for the different reasons (unable to be located, refusal, deceased, etc.) that respondents were not interviewed.  The extent of non-participation in each survey round is illustrated in Retention & Reasons for Noninterview.

A second source of missing data is a respondent's failure to provide a valid answer to a question.  When this happens, interviewers decide whether to record the answer as a 'refusal' or a 'don't know' value.  Interviewers are trained to distinguish between the two.  For example, a refusal usually stems from respondent comments such as "That's none of your business," "I don't want to say," "I'm not comfortable telling you that," or "I don't want to answer." A 'don't know' response is coded from respondents' comments such as "I have no idea," "I don't know how I could guess," "I wouldn't know," or "I'm not sure how to answer that." Standard interviewing protocol calls for interviewers to try to convert an item nonresponse, either by allaying the concerns underlying a refusal (for example, by assuring privacy or citing the research reasons for a particular questionnaire item) or by providing cognitive aids to the respondent who "doesn't know" (for example, asking "Do you remember what season it was?" or "Do you have a guess what the range might be?").  Only if conversion attempts are ineffective do interviewers record a 'refusal' or 'don't know' response.

A valid skip is another source of missing data.  Respondents do not answer every question in the survey; for instance, some questions apply only to females or to respondents in a certain age range.  Users should trace back skip patterns to determine whether a respondent was skipped out because a given topic was inapplicable to him/her or because the respondent answered similar questions along a different path.
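Tracing a skip pattern amounts to comparing the recorded value with what the question's universe rule implies. The sketch below illustrates that comparison under stated assumptions: the `Respondent` fields, the "females aged 18 or older" rule, and the result labels are all hypothetical and do not correspond to actual NLSY97 variables or skip logic; only the -4 (valid skip) and -5 (noninterview) codes come from this guide.

```python
from dataclasses import dataclass

# NLSY97 missing-value flags used in this check.
VALID_SKIP = -4
NONINTERVIEW = -5


@dataclass
class Respondent:
    sex: str     # hypothetical coding: "F" or "M"
    age: int     # hypothetical age at interview
    answer: int  # value recorded for a hypothetical female-only item


def in_universe(r: Respondent) -> bool:
    """Hypothetical skip rule: the item is asked only of females aged 18+."""
    return r.sex == "F" and r.age >= 18


def check_skip(r: Respondent) -> str:
    """Compare a recorded value with what the hypothetical skip rule implies."""
    if r.answer == NONINTERVIEW:
        # No interview that round, so the skip rule never came into play.
        return "not interviewed"
    if not in_universe(r):
        # Respondents outside the universe should carry the valid-skip flag.
        return "ok" if r.answer == VALID_SKIP else "unexpected answer"
    if r.answer == VALID_SKIP:
        # In the universe but skipped: the respondent may have been routed
        # along a different path, so trace the instrument's skip logic by hand.
        return "trace path"
    return "ok"


for r in [Respondent("M", 20, -4), Respondent("F", 20, 3), Respondent("F", 20, -4)]:
    print(r, check_skip(r))
```

A "trace path" result is a prompt to look further, not a verdict; real skip logic usually depends on several earlier answers rather than the single rule assumed here.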

Missing data can also occur when there is an incorrect flow in the survey instrument.  An incorrect flow may cause some respondents to skip questions they should have answered, while others answer questions they should not have been asked.  NLS data archivists have removed most of these extraneous responses from the data.  While extraneous information can be removed, missing data are not imputed in the NLSY97 surveys.  Missing data caused by incorrect flows are flagged with a special 'invalid skip' code.  The use of computer-assisted personal interviewing (CAPI) reduces the number of invalid skips in complex questionnaires; nevertheless, some invalid skips are still possible in CAPI data.  If the CAPI instrument contains a programming mistake, it can route a respondent incorrectly.  When such errors are found, the CAPI instrument can be corrected in the field to prevent further invalid skips, but the missing data from already completed cases are not retrieved.

All missing data are clearly flagged in the NLSY97 data set with five negative values: (-1) refusal, (-2) don't know, (-3) invalid skip, (-4) valid skip, and (-5) noninterview.  In general, these five negative values are reserved as missing value flags.  As an example, Figure 1 shows the item, "How is R's general health?" Within the item codeblock, the user can see that 7,494 respondents in 2004 gave responses ranging from "excellent" to "poor," three people refused to answer (-1), four people reportedly did not know (-2), one person was not asked the question and was thus a valid skip (-4), and 1,482 people were not interviewed that survey year (-5).  In this example, there are no invalid skips (-3).

Figure 1. NLSY97 Questionnaire Item Codeblock with Nonresponse Highlighted

               
              S49195.00   [YHEA-100]                             Survey Year: 2004                           

                PRIMARY VARIABLE
                             HOW IS R'S GENERAL HEALTH?
            
             Now I would like to ask you some questions about your health.
             In general, how is your health?
                2237       1 Excellent
                2709       2 Very good
                1988       3 Good
                 508       4 Fair
                  42       5 Poor
              -------
                7494 
              Refusal(-1)            3
              Don't Know(-2)         4
             TOTAL =========>     7501   VALID SKIP(-4) 1   NON-INTERVIEW(-5)  1482

-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Lead In:  S49194.00 [Default]    S48965.00 [1:1]

Default Next Question:  S49196.00
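Because the five flags are ordinary negative integers, they have to be separated from real answers before analysis; otherwise, averaging the health item, for example, would mix -5 with scores of 1 to 5. The sketch below assumes the values for one variable arrive as a plain Python list of integers (a hypothetical input format, not the layout of an actual NLSY97 extract). It counts valid answers and each flag, drops flagged values, and checks the Figure 1 arithmetic, where the TOTAL line equals valid answers plus refusals and don't-knows (Figure 1 has no invalid skips), with valid skips and noninterviews reported separately.

```python
from collections import Counter

# Reserved NLSY97 missing-value flags, as listed in the text.
MISSING_CODES = {
    -1: "Refusal",
    -2: "Don't know",
    -3: "Invalid skip",
    -4: "Valid skip",
    -5: "Noninterview",
}


def summarize(values):
    """Return (number of valid answers, count of each missing-value flag)."""
    counts = Counter(values)
    n_valid = sum(n for v, n in counts.items() if v not in MISSING_CODES)
    # Counter returns 0 for flags that never occur, so every label is present.
    missing = {label: counts[code] for code, label in MISSING_CODES.items()}
    return n_valid, missing


def codeblock_total(n_valid, missing):
    """Valid answers plus refusals and don't-knows, as on Figure 1's TOTAL line."""
    return n_valid + missing["Refusal"] + missing["Don't know"]


def valid_only(values):
    """Drop every flagged value so that only real answers enter an analysis."""
    return [v for v in values if v not in MISSING_CODES]


# Hypothetical responses to a 1-5 health item from ten respondents.
sample = [1, 3, 2, -1, -4, -5, 2, -2, 5, -5]
n_valid, missing = summarize(sample)
print(n_valid, missing, codeblock_total(n_valid, missing))
answers = valid_only(sample)
print(sum(answers) / len(answers))  # mean of the valid answers only

# Figure 1 arithmetic for the 2004 general-health item.
fig1_missing = {"Refusal": 3, "Don't know": 4, "Invalid skip": 0,
                "Valid skip": 1, "Noninterview": 1482}
total = codeblock_total(7494, fig1_missing)
print(total)  # 7501
print(total + fig1_missing["Valid skip"] + fig1_missing["Noninterview"])  # 8984
```

The final sum, 8,984, is the full NLSY97 sample: every respondent appears exactly once in the item, either with an answer or with one of the five flags.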
 
 


As would be expected, more sensitive questions in the survey tend to yield more missing data in the refusal (-1) category.

To improve the accuracy of reporting, many of the more sensitive questions are placed in the self-administered questionnaire (SAQ) portion of the survey, which respondents interviewed in person answer privately using a laptop.  If the interview is conducted by phone, however, the SAQ section cannot be self-administered and must be administered verbally by the interviewer.