Sample Design & Screening Process

Sampling Procedures

In 1978, a list of housing units in selected areas of the United States was created for the first NLSY79 interview. Interviewers visited a random sample of these homes and conducted a short interview, called the screener, which collected basic information on every resident of the household. The survey also included members of the military, who were drawn from a random sample of Department of Defense records.

Together these two processes provided information, such as name, age, sex, race, and address, on more than 155,000 people. This information was used to identify all individuals ages 14 to 21 as of December 31, 1978. Based on this screener information, each appropriately aged individual was assigned to one of the sample groups. Then, in 1979, interviewers asked individuals on this list to participate in the first NLSY79 interview. Any person who completed the first round interview is considered a member of the NLSY79 cohort.

Important Information

  • Users can identify a respondent's ID number by using variable R00001.
  • Users can identify a respondent's sample type by using variable R01736.

Three independent probability samples compose the NLSY79. These samples are designed to represent the entire population of youth aged 14 to 21 as of December 31, 1978, residing in the United States on January 1, 1979. The three samples are:

  1. a cross-sectional sample (6,111) designed to represent the noninstitutionalized civilian segment of young people living in the United States in 1979 and born January 1, 1957, through December 31, 1964
  2. a set of supplemental samples (5,295) designed to oversample civilian Hispanic or Latino, black, and economically disadvantaged, nonblack/non-Hispanic youths born in the same time period
  3. a military sample (1,280) designed to represent the population born January 1, 1957, through December 31, 1961, serving in the military as of September 30, 1978. The inclusion of the military sample allows comparative civilian-military analyses.
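
The birth-date windows in the list above can be expressed as a simple eligibility check. The following is a minimal sketch only: the names `BIRTH_WINDOWS`, `age_on`, and `eligible_samples` are illustrative assumptions, not NLSY79 variables, and the check ignores the residency and military-service conditions that also applied.

```python
from datetime import date

# Birth-date windows stated in the sample descriptions above.
# The dictionary layout and key names are illustrative, not NLSY79 conventions.
BIRTH_WINDOWS = {
    "civilian": (date(1957, 1, 1), date(1964, 12, 31)),  # cross-sectional and supplemental
    "military": (date(1957, 1, 1), date(1961, 12, 31)),
}

REFERENCE_DATE = date(1978, 12, 31)  # ages 14 to 21 are measured as of this date


def age_on(birth: date, ref: date) -> int:
    """Completed years of age on the reference date."""
    had_birthday = (ref.month, ref.day) >= (birth.month, birth.day)
    return ref.year - birth.year - (0 if had_birthday else 1)


def eligible_samples(birth: date) -> list[str]:
    """Return the sample types whose birth window contains `birth`."""
    return [name for name, (lo, hi) in BIRTH_WINDOWS.items() if lo <= birth <= hi]
```

Under these assumptions, someone born on the first day of the civilian window is 21 on the reference date and someone born on the last day is 14, which matches the stated age range.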

Beginning in 1986, additional information was collected about children born to female NLSY79 respondents. The child sample, when weighted, is representative of American children born to the population of women born in 1957 through 1964 and living in the United States in 1979.

The sampling procedures used to select the civilian and military subsamples differed and are discussed separately below. For additional information on NLSY79 sampling procedures, see Frankel et al. (1983) and the Interviewer's Manual for the 1978 household screening (NORC 1978). Sampling issues related to the Children of the NLSY79 are discussed in Baker et al. (1993) and in the NLSY79 Child & Young Adult Data Users Guide.

Screening Procedures

To find people of the correct age and ethnic composition, survey staff screened a large number of individuals in two separate procedures. First, household screening interviews were conducted to select the NLSY79 civilian cross-sectional and supplemental subsamples from the civilian population. Then, a second screening was done to identify the military sample. While the civilian screening made use of field interviewers going to selected households, the military sample was drawn from Department of Defense records.

NORC administered the civilian sample screening interview in approximately 75,000 dwellings and group quarters. These interviews occurred in 1,818 sample segments of 202 Primary Sampling Units (PSUs), which included most of the 50 States and the District of Columbia. The screening interview was designed to elicit information that would identify persons eligible for inclusion in the NLSY79 sample. The civilian screening interviews were completed within 91.2 percent of the cross-sectional and 91.9 percent of the supplemental occupied dwelling units selected for screening.

Cross-Sectional Sample. Approximately 18,000 of the screening interviews were carried out among 918 sample segments in 102 PSUs, which were selected from the NORC Master Probability Sample of the United States.

Supplemental Sample. A total of 57,000 screening interviews for the supplemental sample were carried out among 900 sample segments in a 100-PSU sample specifically designed to produce statistically efficient samples of Hispanics or Latinos, blacks, and economically disadvantaged, nonblack/non-Hispanics.

The NLS sample design, which selected every eligible person connected to the household, generated a representative sample of siblings and spouses living in the same household and satisfying the age restrictions stated above. However, NLSY79 samples do not contain nationally representative samples of siblings and spouses of all ages and living arrangements. When the NLSY79 is used to study sibling pairs and married couples, care must be used in generalizing from the findings of such studies.

Procedures were also developed to establish "linkages" between dwellings and certain types of individuals who might be temporarily absent. As part of the initial screening for the civilian sample, household respondents were asked if there were any persons with primary family connections to the household who were away from the household at the time. Included in this group were college students, military personnel, and those in prisons or other institutions. Household screener respondents were also asked to name persons who might occasionally stay at the dwelling who did not have any other "usual place of residence." For each individual identified in this process, an attempt was made to determine whether the individual would be "linked" to some other household, such as college students living off campus in their own dwelling units. All individuals without other linkages were included in the household composition for purposes of sampling.

Military Sample. Persons on active military duty as of September 30, 1978, were sampled from rosters provided by the Department of Defense. No formal screening interview was conducted.

Sampling Process

Civilian Sample. All civilian sample selection was accomplished through a multi-stage stratified area probability sample of dwelling units and group quarter units. A moderate degree of oversampling of dwelling units within sample listing segments was employed in order to increase the representation of the groups targeted by the supplemental sample.

Base year samples of Hispanics or Latinos, blacks, and economically disadvantaged, nonblack/non-Hispanics were selected from individuals identified in both the 102-PSU cross-sectional sample and the 100-PSU special purpose sample. To the extent that individuals identified in the screening phase were obtained with different probabilities of selection (because of selective oversampling), the weighting of base year samples attempts to minimize these probability differences. Since oversampling tends to decrease sample efficiency (that is, to increase variance), attempts were made to minimize required oversampling.
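
One common way to quantify the efficiency cost of unequal selection probabilities is Kish's approximate design effect for weighting, deff = n Σw² / (Σw)². The sketch below uses that textbook approximation; it is not necessarily the calculation used for the NLSY79, and the weights are made-up values chosen only for illustration.

```python
def kish_design_effect(weights):
    """Kish's approximate design effect from unequal weights: n * sum(w^2) / (sum(w))^2."""
    n = len(weights)
    total = sum(weights)
    return n * sum(w * w for w in weights) / (total * total)


def effective_sample_size(weights):
    """Nominal sample size deflated by the weighting design effect."""
    return len(weights) / kish_design_effect(weights)


# Hypothetical example: half of the sample is oversampled at four times the
# rate of the rest, so its weights are one quarter as large.
equal_weights = [1.0] * 100
unequal_weights = [1.0] * 50 + [0.25] * 50

print(kish_design_effect(equal_weights))
print(kish_design_effect(unequal_weights))
print(effective_sample_size(unequal_weights))
```

With equal weights the design effect is 1, so nothing is lost. The more the weights vary, the larger the design effect and the smaller the effective sample size, which is why heavier oversampling than necessary is undesirable.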

Important Information

At all selected dwellings, attempts were made to obtain appropriate classification information for all persons living in the dwelling. In order to minimize the potential for "interviewer effect," survey interviewers were not informed about specific groups that would be included in the subsequent interviews. However, the distribution of month of birth by birth year departs from randomness for the youngest members of the cohort, those born in 1964 (refer to Figure 1). This lack of randomness most likely comes from two sources. First, some of the screening was done by supervisors and other higher level staff who were familiar with the specific age groups that belonged in the survey, which could have introduced interviewer bias. Second, families who wanted to find out more information could contact NORC or the Department of Labor and find out the age ranges the survey was trying to capture. This extra information could have led to nonrandom self-selection at the edge of the age range.

Figure 1. Number of Respondents Born Each Month by Birth Year¹

¹ The month and year of birth are taken from the 1978 screener (R00003. and R00005.). Respondents were asked about their dates of birth again in 1981, but the use of these values does not change the results indicated above. The 1957-63 value can be found by averaging the total number of birth dates reported for each month over all of the years.

Assignment of a youth to either the cross-sectional or supplemental sample was made using information collected during the household screening interviews and a set of coding instructions prepared by NORC. During the 1978 household screening interviews, from which the sample of NLSY79 respondents was drawn, information was collected on the sex, race, and ethnic origin of each household member and on the total income of the family unit during the past 12 months. A detailed set of coding procedures transformed these raw data into a combined racial/ethnic identifier and an economically disadvantaged qualifier. These criteria were used not only to assign a youth to one of the various subsamples, but also to specify the primary race or ethnicity variable, which provides the basis for weighting.

Other technical information on the sample assignment process can be found in two documents. The Household Screener and Interviewer's Manual (NORC 1978) includes a copy of the screening instrument and detailed instructions to interviewers for administering the race, ethnic origin, and family income questions. The Technical Sampling Report (Frankel et al. 1983) describes the NLSY79 sample selection procedures for the civilian and military subsamples. Both of these documents are available at

Further information can be found in:

  1. the 10/4/78 NORC memorandum, which provides the rules used to assign race and poverty status from responses to the screening questions;
  2. a copy of the 1978 poverty income levels by family size and farm-nonfarm residence; and
  3. the Race, Ethnicity, & Immigration section of this guide, which summarizes information in these documents as it relates to the assignment of "Hispanic," "Black," and "non-Hispanic, nonblack" origins used in the sample identification code variable (R01736.) and the race/ethnicity variable (R02147.).

Base year interviews with the three subsamples were conducted between January and mid-August 1979. Table 1 summarizes base year completion rates for each sample.

Table 1. Base Year Interview Completion: NLSY79

Sample                    Designated for Interviewing   Interviewed in Initial Survey Year   Percent Interviewed
Total Cohort              14574                         12686                                87%
Cross-Sectional Sample¹   6812                          6111                                 90%
Supplemental Sample¹      5969                          5295                                 89%
Military Sample           1793                          1280                                 72%

¹ As determined through the household screening.
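
The completion rates in Table 1 are the number interviewed divided by the number designated for interviewing. The short sketch below recomputes them from the table's counts; the names `TABLE_1` and `completion_rate` are illustrative. The published percentages are whole numbers, so a recomputed value can differ from the printed one by about a point.

```python
# Counts from Table 1: (designated for interviewing, interviewed in 1979).
TABLE_1 = {
    "Cross-Sectional Sample": (6812, 6111),
    "Supplemental Sample": (5969, 5295),
    "Military Sample": (1793, 1280),
}


def completion_rate(designated, interviewed):
    """Percentage of designated individuals who completed the base year interview."""
    return 100.0 * interviewed / designated


# The Total Cohort row is the sum of the three samples.
total_designated = sum(d for d, _ in TABLE_1.values())
total_interviewed = sum(i for _, i in TABLE_1.values())

for name, (designated, interviewed) in TABLE_1.items():
    print(f"{name}: {completion_rate(designated, interviewed):.1f}%")
print(f"Total Cohort: {completion_rate(total_designated, total_interviewed):.1f}%")
```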

Cross-Sectional Sample. Following the initial screening process, 6,812 individuals from the cross-sectional sample were designated to be interviewed in the base year; of those, 90 percent or 6,111 respondents were actually interviewed in 1979. The cross-sectional sample is designed to maximize the statistical efficiency of samples which are "cross-sectional" with respect to the general population. Specifically, through the several stages of sample selection (counties, enumeration districts-block groups, sample listing units), probabilities of selection are based upon either total population or total housing units. Except for the economically disadvantaged supplemental sample, sampling of nonblack/non-Hispanic respondents was restricted to the 102-PSU National Sample.

Supplemental Sample. After screening, 5,969 individuals from the supplemental sample were designated for base year interviews, and of these, 89 percent or 5,295 respondents were actually interviewed. Stratification specifically relevant for Hispanics or Latinos, non-Hispanic blacks, and economically disadvantaged, nonblack/non-Hispanics was used. Probability proportional to size procedures were based on size measures for these groups rather than for the general population, making it possible to more nearly equalize the distribution of the targeted groups among the various sampling units than would otherwise be the case.

Military Sample. Of the 1,793 military youth selected for interviews, 1,280 or 72 percent were interviewed in 1979. Selection of the military sample was accomplished in two stages. In the first stage, a sample of approximately 200 "military units" was selected. These units were selected with probabilities proportional to the number of persons born in 1957 through 1961 and serving in the military unit as of September 30, 1978.

Within selected units, persons born in 1957 through 1961 were sampled with probabilities inversely proportional to the first-stage selection probability. Females were sampled at a rate approximately six times that of males in order to produce approximately 850 males and 450 females. Within each sex, the sample was stratified on the basis of branch of military service (Army, Navy, Air Force, and Marine Corps) and geographic location (Eastern U.S., Western U.S., Europe, Far East, other). Of those interviewed in 1979, 824 military respondents were male and 456 were female (see Table 2). The entire military sample was eligible for interview from 1979 through 1984.
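
The two-stage logic described above pairs a first-stage probability proportional to unit size with a second-stage probability inversely proportional to it, so that every eligible person of a given sex has the same overall chance of selection. The following is a minimal sketch with made-up numbers; the unit sizes, the number of units drawn, the target rate, and the function names are all hypothetical and are not taken from the military sample design.

```python
def first_stage_prob(unit_size, total_size, units_drawn):
    """PPS: chance that a unit is selected, proportional to its eligible population."""
    return units_drawn * unit_size / total_size


def second_stage_prob(overall_rate, p_first):
    """Within-unit sampling rate, inversely proportional to the first-stage probability."""
    return overall_rate / p_first


# Hypothetical eligible-population counts for five military units.
unit_sizes = [400, 250, 150, 120, 80]
total_size = sum(unit_sizes)
units_drawn = 2       # hypothetical number of units selected at the first stage
overall_rate = 0.05   # hypothetical target overall selection probability

overall_probs = []
expected_takes = []
for size in unit_sizes:
    p1 = first_stage_prob(size, total_size, units_drawn)
    p2 = second_stage_prob(overall_rate, p1)
    overall_probs.append(p1 * p2)      # the same for every unit
    expected_takes.append(size * p2)   # expected persons sampled if the unit is drawn
```

In this sketch the product of the two stages equals the target rate for every unit, and each selected unit yields the same expected number of sampled persons. Using a larger target rate for one group than another would reproduce the kind of oversampling described for females.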

Table 2. NLSY79 Military Respondents Interviewed in 1979 by Sex, Race & Military Branch

                            Total   Males   Females
Total Military              1280    824     456
  Non-black/non-Hispanic    951     609     342
  Black                     251     162     89
  Hispanic or Latino        78      53      25
Military Branch
  Army                      578     354     224
  Navy                      280     212     68
  Air Force                 293     162     131
  Marine Corps              129     96      33

Child Sample. The number of children assessed during a given child survey year is a function of the number of children born to interviewed NLSY79 mothers, the number of children living in the homes of those mothers, and, finally, the number of those children actually interviewed. Of the 5,842 NLSY79 females eligible for the first child interview in 1986, more than 2,900 mothers and 4,971 children were interviewed. From this sample of eligible children, assessment data were collected for 4,786.

Multiple Respondent Households

Respondents interviewed in 1979 originated from 8,770 unique households; 2,862 households included more than one NLSY79 respondent. The most common relationships between respondents living in multiple respondent households at the time the survey began were those of sibling or spouse (see Table 3). During the 1979 survey, 5,863 respondents were members of a household containing multiple interviewed siblings. More than 330 respondents were members of a household in which their spouse was also interviewed.

Table 3. Number of NLSY79 Civilian Respondents by Type of Household: 1979

Type of Household          Number of Respondents   Number of Households
Single Respondent          5908                    5908
Multiple Siblings
  Two Siblings             3386                    1693
  Three Siblings           1725                    575
  Four Siblings            604                     151
  Five Siblings            130                     26
  Six Siblings             18                      3
Total Multiple Siblings    5863                    2448
Spouse                     334                     167
Other                      581                     247
Totals                     12686                   8770

Note: Siblings may be biological, step, or adopted. Some households may include both siblings and spouses, as well as respondents with other relationships not presented in this table.
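
The rows of Table 3 are related by simple arithmetic: in each sibling row the number of respondents equals the number of siblings per household times the number of households, and the rows sum to the totals. The sketch below checks this using the table's own figures; the variable names are illustrative.

```python
# Rows from Table 3, keyed by siblings per household: (respondents, households).
SIBLING_ROWS = {2: (3386, 1693), 3: (1725, 575), 4: (604, 151), 5: (130, 26), 6: (18, 3)}
OTHER_ROWS = {"Single Respondent": (5908, 5908), "Spouse": (334, 167), "Other": (581, 247)}

# Each sibling row should satisfy respondents == siblings_per_household * households.
sibling_rows_consistent = all(r == k * h for k, (r, h) in SIBLING_ROWS.items())

sibling_respondents = sum(r for r, _ in SIBLING_ROWS.values())
sibling_households = sum(h for _, h in SIBLING_ROWS.values())

total_respondents = sibling_respondents + sum(r for r, _ in OTHER_ROWS.values())
total_households = sibling_households + sum(h for _, h in OTHER_ROWS.values())

# Households with more than one respondent: all households minus single-respondent ones.
multi_respondent_households = total_households - OTHER_ROWS["Single Respondent"][1]
```

The derived count of multiple-respondent households agrees with the 2,862 households cited in the text of this section.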