Appendix 8: Instrument Rosters

How should researchers use the roster data in analysis?

The data set is organized so that rosters can easily be found and used in research. Because rosters present key pieces of information in a structured format, they are the best place to obtain that information. All variables found on rosters have "Roster Item" as their main area of interest. Each roster has a unique name that serves as the beginning of the question name for all variables on the roster; the same name appears at the beginning of the variable title for each item on the roster. Different rosters have been used in different rounds, depending on the topics included in the interview and the type of information collected. The roster names and question names are shown in Figure 4.

Figure 4. Rosters Included Each Round

Roster                                      Question name                 Round 1  Rounds 2-4  Round 5  Rounds 6-14
Household Information                       HHI2 (rd. 1), HHI (rds. 2-9)  *        *           *        *
Nonresident Roster                          NONHHI                        *        *           *        *
Youth Information                           YOUTH                         *
School Roster                               NEWSCHOOL                              *           *        *
Employer Roster                             YEMP                          *        *           *        *
Freelance Jobs Roster                       FREELANCE                     *        *           *
Training Roster                             TRAINING                      *        *           *        *
Biological Children Roster                  BIOCHILD                      *        *
Biological/Adopted Children Roster          BIOADOPTCHILD                                      *        *
Parent Household Information                PARHHI                        *
Parent Youth Information                    PARYOUTH                      *
Partner/Spouse                              PARTNERS                      *        *           *        *
Other parents of respondent's children [1]  OTHERPARENTS                                                collapsed
Partner/spouse information [1]              CUMPARTNERS                                                 collapsed

[1] These are collapsed rosters. The variables combine information across survey rounds. All respondents are represented in the roster, regardless of whether they were interviewed in the most recent round. These variables are listed as "XRND" rather than being associated with a survey year in the data.


Important Information

Researchers can locate rosters in the data set in three ways: by browsing the "Roster Item" area of interest, by selecting the appropriate question name, or by searching the "any word in context" index for variables whose titles contain "ros" or "roster" along with the name of the roster of interest.
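The title search described above can be sketched in a few lines of Python for users working from an exported list of variables. This is only an illustration: the function name `find_roster_variables` and the question names and titles in `VARIABLES` are hypothetical, except for the employer UID title, which is quoted from this appendix.

```python
import re

# Hypothetical (question name, title) pairs standing in for an exported
# variable list. Only the first title is taken from this appendix.
VARIABLES = [
    ("YEMP_UID.02", "YEMP, Employer 02 Unique ID (Ros Item)"),
    ("YSCH-100", "Currently Enrolled In School"),
    ("HHI_DOB.01", "HHI, Household Member 01 Date Of Birth (Roster Item)"),
]

def find_roster_variables(variables, roster_name):
    """Return variables whose title contains 'ros' or 'roster' as a word
    and also contains the roster name."""
    ros = re.compile(r"\bros(ter)?\b", re.IGNORECASE)
    name = re.compile(rf"\b{re.escape(roster_name)}\b", re.IGNORECASE)
    return [(q, t) for q, t in variables if ros.search(t) and name.search(t)]

for question, title in find_roster_variables(VARIABLES, "YEMP"):
    print(question, "|", title)
```

Matching on whole words keeps unrelated titles that merely contain the letters "ros" out of the result.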

When the NLSY97 data set was initially created, variables could only be assigned to one area of interest. The newer data extraction software permits variables to be linked to multiple areas of interest. However, additional areas have not been assigned to every variable. Because roster items were initially located in the roster item area of interest, they may not be grouped with the rest of the data on a particular topic. For example, the school roster variables may not appear if the user searches for the "School Experience" area of interest. For this reason, it is very important that users become familiar with the rosters used in the data set. If a roster is available on the topic of a particular research project, users should always locate that roster using one of the search techniques mentioned above and examine it before using the other (non-roster) variables that relate to their research.

Using rosters in single-round analyses. When looking at the data set, users will notice that many questions are repeated for each person or thing on the roster, and the titles for these repeated questions include a number. This number indicates the line on the roster that corresponds to the person or item being described in that variable. For example, the question "Self-Employed Business/Industry Job 02" indicates the industry of the second job listed on the respondent's self-employment roster. The researcher may then want to examine information such as the respondent's start and stop dates or rate of pay for that job. To find this information, the researcher can look at the roster variables for job #02, that is, the self-employment job on the second line of the roster. For all other questions asked after the roster was created in that same survey year, job #02 will refer to the same self-employment job.
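As a rough sketch of this idea, the roster line number can be pulled out of each variable title and used to collect every variable that describes the same job. The helper names are hypothetical, and the pattern assumes titles end the item label with "Job NN" or "Employer NN"; the two start date and rate of pay titles are made up for illustration.

```python
import re

def roster_line(title):
    """Return the two-digit roster line number found in a title, or None."""
    match = re.search(r"\b(?:Job|Employer)\s+#?(\d{2})\b", title)
    return int(match.group(1)) if match else None

def variables_for_line(titles, line):
    """Return every title that refers to the given roster line."""
    return [t for t in titles if roster_line(t) == line]

titles = [
    "Self-Employed Business/Industry Job 02",  # quoted in the text above
    "Self-Employed Start Date Job 02",         # hypothetical title
    "Self-Employed Rate Of Pay Job 01",        # hypothetical title
]

print(variables_for_line(titles, 2))
```

All titles returned for line 2 describe the same self-employment job within that survey year.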

Users should be aware that, in some cases, the information contained in the rosters actually appears in the data set more than once. As Figure 1 suggests, data may first be included at the point in the interview when the information was actually collected. For example, the round 1 screener question SE-28 asked the household informant for the date of birth of each household member. After all the raw data had been gathered, the computer sorted all the answers and created the household roster. At this point the date of birth information is also located in the round 1 roster variables named HHI2_DOB. In the case of the round 1 household roster, both the raw data and roster items are included in the data set.

In other cases, the raw answers may be blanked out of the public use data set. If a reference number is not listed for a given question in the questionnaire, then that raw data item may only be represented in roster form. For example, answers to the raw data questions used to create the employer roster are blanked out and do not appear in the data set. In the printed questionnaire, these questions have no reference numbers. However, all of the data collected in these questions (except for confidential information like the name of the employer) appears in the employer roster.

Important Information

Even though the data may appear more than once, survey staff strongly recommend that researchers use the roster information rather than the raw data whenever possible. Survey staff are working to eliminate these duplicate sources of information.

For some variables, the roster information may be more accurate because some rosters are updated during the interview if the initial report was inaccurate. When survey staff prepare the data for release, they clean up the rosters if necessary but do not necessarily clean the corresponding raw data. Finally, because many rosters are sorted in a particular order, the number of a person or item on the roster will not match the number in the questions that precede roster creation. For example, in the household screener (the SE questions), person #01 is the first household resident mentioned to the interviewer. In the household roster and all later interview questions, person #01 is the oldest person in the household who was eligible for the NLSY97. Person #01 in the SE questions might be person #05 on the roster. It can be very difficult to determine to which person, school, or job a pre-sort question refers. For all of these reasons, roster data are always preferable to raw data in cases where both are available.

Using rosters from more than one round. Because the NLSY97 is a longitudinal survey, researchers often want to link data across survey rounds. However, household residents, jobs, and so on may move around on the roster in different interviews. That is, a father who was listed third on the roster in round 1 might move to position 2 or 4 in round 2. The unique identification numbers (UIDs) are the key to finding the same person or thing in different rounds. Most of the rosters contain variables assigning a unique number to each person or thing listed. This number never changes and can be used to link roster items across rounds. In some cases, it also makes it possible to link people between two different rosters in the same survey. For example, beginning in round 2 the unique ID listed for a child on the biological children roster is the same one assigned to that child on the household roster. Researchers can therefore examine data on both rosters about the same child.

An additional feature of most unique ID numbers is that they incorporate an indicator of the round in which the person or item was first reported. For example, IDs of roster items reported in round 1 may begin with "1" or "97," while those first reported in round 2 begin with "2" or "98." (Beginning with round 3, 4-digit years are used so that IDs begin with "1999" rather than just "99.")
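A minimal sketch of reading the first-reported round from a year-prefixed ID follows. It makes two assumptions that are not stated in this appendix: that one round was fielded per calendar year starting in 1997, and that the ID uses the year-based prefix ("97", "98", or a 4-digit year). IDs that use the single-digit round prefix ("1", "2") cannot be told apart from 4-digit years by inspection alone, so the function returns None for anything it does not recognize. The function name is hypothetical.

```python
FIRST_SURVEY_YEAR = 1997  # round 1; assumes one round per calendar year

def first_reported_round(uid):
    """Return the round in which a year-prefixed UID was first reported,
    or None if the prefix is not a recognizable survey year."""
    text = str(uid)
    if text[:2] in ("97", "98"):
        # Rounds 1 and 2 use 2-digit years.
        year = 1900 + int(text[:2])
    elif len(text) > 4 and text[:4].isdigit() and 1999 <= int(text[:4]) <= 2099:
        # Round 3 onward uses 4-digit years.
        year = int(text[:4])
    else:
        return None
    return year - FIRST_SURVEY_YEAR + 1

for uid in ("9703", "9801", "199902"):
    print(uid, first_reported_round(uid))
```

Users should confirm which prefix scheme a given roster uses before relying on a decoder like this one.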

Example: Use of the employer roster in analysis

Continuing the above example, this section explains how to use rosters in data analyses. Although the employer roster is used in the example, most aspects apply to other NLSY97 rosters as well. Emma's information, as organized in the employer rosters, can be used to examine the characteristics of her jobs at the date of each interview or over time. This example focuses primarily on the round 2 employer roster, but use of the roster is similar in each subsequent round.

As described above, Emma worked for Peel's Store and Steed's Diner during the period between the round 1 and round 2 interviews. Information about these employers was sorted and a roster constructed with the most recent employer appearing first. A researcher using these data would need to be aware of the impact of roster construction.

Because the roster is sorted and employers reported in different rounds may be mixed, variables with "Employer #01" in the title do not necessarily refer to employer number 9701, 9801, etc. The #01 refers solely to the order of the job as listed on the current year's roster. The unique identification numbers provide a crosswalk between the two systems of identification. The UIDs also allow users to link employers across survey rounds and to identify the round in which an employer was first reported.

For example, Emma's value for the round 2 variable R24761., "YEMP, Employer 02 Unique ID (Ros Item)," would be 9703, the UID assigned to Peel's Store. The user can identify this as an ID assigned in round 1 because it starts with "97," and can then look at the round 1 UID variables (R05311.-R05317.) to find the matching employer. In Emma's case, the variable for employer #01 in round 1 would have UID 9703. Therefore, the researcher knows that information about employer #01 in round 1 refers to the same job as variables about employer #02 in round 2. The variables from the two rounds can then be compared to determine whether there were any changes in characteristics such as hours worked, rate of pay, or occupation.
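The crosswalk in Emma's example can be sketched as a lookup on the UID. The dictionaries below map roster line numbers to UIDs; the values 9703 and 9801 come from the example, while placing Steed's Diner on line 01 in round 2 is inferred from the statement that the most recent employer appears first. The function name `crosswalk` is hypothetical.

```python
# Roster line -> employer UID for Emma.
ROUND1_EMPLOYERS = {1: 9703}           # Peel's Store
ROUND2_EMPLOYERS = {1: 9801, 2: 9703}  # Steed's Diner (inferred line), Peel's Store

def crosswalk(earlier, later):
    """Map each line on the later roster to the line holding the same UID
    on the earlier roster, or None if the item is newly reported."""
    line_by_uid = {uid: line for line, uid in earlier.items()}
    return {line: line_by_uid.get(uid) for line, uid in later.items()}

links = crosswalk(ROUND1_EMPLOYERS, ROUND2_EMPLOYERS)
for r2_line, r1_line in sorted(links.items()):
    print(f"round 2 employer #{r2_line:02d} -> round 1 employer "
          f"{'#%02d' % r1_line if r1_line else 'not on roster'}")
```

Here round 2 employer #02 maps back to round 1 employer #01, so variables for those two line numbers can be compared across rounds.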

The roster line numbers and UID variables in the event history data work in the same way. For example, a researcher might want to know Emma's employment status in the first and last week of 1998. In the first week of 1998 (variable EMP_STATUS.01.98), Emma was working at her parents' store, Peel's Store, so the status variable would have a value of 9703. Using this UID, researchers can link that job to all of the other information collected during the interview. For the last week of 1998 (variable EMP_STATUS.52.98), when Emma was working at Steed's Diner, the status variable would have a value of 9801. The second set of event history variables, the start and stop dates of each job, uses the roster line numbers. For these variables, the number in the variable title refers to the same job as in the main data set. For example, in the main round 2 data Peel's Store is job #02. The start and stop dates for Peel's Store in the event history data (variables EMP_START_WEEK.02 and EMP_END_WEEK.02) will also have #02 in the variable title.
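Putting the two identification systems together, a weekly status value that holds a UID can be translated into the roster line number and then into the names of the start and stop date variables. The sketch below uses Emma's round 2 values; the function name is hypothetical, and the line assignment for Steed's Diner is inferred from the most-recent-first sort described earlier.

```python
# Round 2 roster line -> employer UID for Emma.
ROUND2_EMPLOYERS = {1: 9801, 2: 9703}  # Steed's Diner (inferred line), Peel's Store

def date_variables_for_status(status_value, roster):
    """Given a weekly status value that is an employer UID, return the names
    of the start and stop week variables for that job. Returns None when the
    value is not a UID on the roster."""
    for line, uid in roster.items():
        if uid == status_value:
            return (f"EMP_START_WEEK.{line:02d}", f"EMP_END_WEEK.{line:02d}")
    return None

print(date_variables_for_status(9703, ROUND2_EMPLOYERS))
print(date_variables_for_status(9801, ROUND2_EMPLOYERS))
```

Status values that do not correspond to an employer on the roster return None in this sketch, so they can be handled separately.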