Documentation

Young Women Documentation
All variables present on a main file data set (accessed through NLS Investigator) are documented via: (1) a cohort-specific codebook and (2) an accompanying codebook supplement. This section describes these components and discusses the important types of information found within each.
Codebook
The codebook is the principal element of the documentation system and contains information intended to be complete and self-explanatory for each variable in a data file. Codebook information can be viewed with the use of NLS Investigator by clicking on a variable's reference number once a list of variables has been selected.
Every variable is presented within the documentation as a block of information called a "codeblock." Codeblock entries depict the following information: a reference number, variable title, coding information, frequency distribution, reference to the questionnaire item or source of the variable, and information on the derivation for created variables. The codeblocks of many variables include special notes containing additional information designed to assist in the accurate use of data from that variable.
Codebooks are arranged by reference number. Variables are first grouped according to survey year. Within each survey year, those variables related to the interview (e.g., interview method, interview date, reason for noninterview, sampling weight, etc.) appear first, followed by variables picked up directly from the questionnaire and Information Sheet. In general, created and edited variables appear last, although the created environmental variables are grouped with variables related to the interview in the early survey years.
Coding Information
Each codeblock entry presents the set of legitimate codes that a variable may assume along with a text entry describing the codes. Users should note that coding information for a given variable in the NLS codeblock is not necessarily consistent with the codes found within the questionnaire or for the same variable across years. Use only the codebook coding information for analysis. The following types of code entries occur in NLS codeblocks:
Dichotomous or yes/no variables are uniformly coded "Yes" = 1, "No" = 0. Other dichotomous variables have frequently been reformulated to permit this convention to be followed.
Discrete (Categorical), as in the case of the categories in 'Activity Most of Survey Week 93,' below:
1 Working | 5 Keeping house
2 With a job, not at work | 6 Unable to work
3 Looking for work | 7 Retired
4 Going to school | 8 Other
Continuous (Quantitative), as in the case of 'Hourly Rate of Pay at Current or Last Job 83 *KEY*.' These variables have continuous data, but the codebook presents a frequency distribution as in the sample codeblocks above for ease of use.
Combined Quantitative-Qualitative Variables, i.e., variables that are ostensibly quantitative but may have nonquantitative (categorical) responses, utilize integers equaling the actual values for the quantitative responses and 999 for the qualitative (categorical) response. For example, "YEAR STOPPED WORKING AT 1ST MOST RECENT JOB INTRVNG & LAST" is coded as follows:
60 thru 73 ---- actual year
999 ---- still working there
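A combined variable of this kind can be split into its quantitative and qualitative parts before analysis. The following is a minimal Python sketch, not part of the NLS documentation; the names `YearStopped` and `decode_year_stopped` are hypothetical, and the sketch assumes that missing-value codes have already been handled separately.

```python
from typing import NamedTuple, Optional

STILL_WORKING = 999  # qualitative code from the codeblock excerpt above


class YearStopped(NamedTuple):
    year: Optional[int]   # two-digit year exactly as recorded, or None
    still_working: bool   # True when the qualitative code was used


def decode_year_stopped(value: int) -> YearStopped:
    """Split a combined quantitative-qualitative value into its two parts."""
    if value == STILL_WORKING:
        return YearStopped(year=None, still_working=True)
    if 60 <= value <= 73:
        return YearStopped(year=value, still_working=False)
    # Anything else (including negative missing codes) is not decoded here.
    raise ValueError(f"unexpected code: {value}")
```

Keeping the qualitative code out of the numeric field prevents 999 from being treated as a year in means or ranges.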
Multiple Responses: In the early years of the surveys, response categories to multiple entry questions found in certain job search, child care, discrimination, or health questions were coded in a geometric progression. For example, more than one response to the question "Method of seeking employment to be used in next year" was possible. The response categories to that question were each assigned a value as follows:
1 | Checked with public employment agency
2 | Checked with private employment agency
4 | Checked with employer directly
8 | Checked with friends or relatives
16 | Placed or answered ads
32 | Other method
Multiple responses were then coded for each respondent by adding the individual codes, which yields a unique value for each combination. Such multiple entry variables were identified by an asterisk (*) next to the answer categories in the questionnaire. If a multiple entry has only a few unique combinations, the codebook will specify the exact combinations; those with many combinations need to be unpacked (staff are working on providing unpacked versions of these variables). See How to Unpack Multiple Entries (PDF) for a how-to guide to this process. After the 1991 survey, this multiple entry practice was discontinued and all responses were coded as yes/no.
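Because each category code is a distinct power of two, a summed value can be unpacked by testing each code against the stored sum. The sketch below illustrates the idea in Python using the job search categories listed above; it is not the procedure from the how-to guide, and the function name `unpack_multiple_entry` is hypothetical.

```python
from typing import Dict, List

# Category codes for "Method of seeking employment to be used in next year"
SEARCH_METHODS: Dict[int, str] = {
    1: "Checked with public employment agency",
    2: "Checked with private employment agency",
    4: "Checked with employer directly",
    8: "Checked with friends or relatives",
    16: "Placed or answered ads",
    32: "Other method",
}


def unpack_multiple_entry(value: int, categories: Dict[int, str]) -> List[str]:
    """Return the labels of every category included in a summed code."""
    if value < 0:
        raise ValueError("negative values are missing-response codes")
    # Bits that do not correspond to any listed category indicate a bad value.
    if value & ~sum(categories):
        raise ValueError(f"value {value} contains an unlisted category")
    return [label for code, label in sorted(categories.items()) if value & code]
```

For example, a stored value of 13 (1 + 4 + 8) unpacks to the public employment agency, employer, and friends or relatives categories.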
Missing Responses: Negative numbers are used to indicate that a respondent does not have a valid value for a particular variable. Different numbers indicate different reasons for nonresponse:
- "Refusal" indicates that the respondent refused to answer a given question. These respondents are assigned a value of ‑1. This code is used for all interviews of this cohort.
- "Don't know" indicates that the respondent did not know the answer to a given question. These respondents are assigned a value of ‑2. This code is used for all interviews of this cohort.
- "Invalid skip" indicates that the respondent was not asked a question that she should have answered, usually due to programming or interviewer error. These respondents are assigned a value of ‑3. This code is only used consistently for CAPI interviews (1995-2003). CAPI is short for Computer-Assisted Personal Interviews.
- "Valid skip" has slightly different meanings depending on survey year. In CAPI interviews (1995-2003), this code indicates that the respondent was skipped past the question intentionally, because she was not in the universe of respondents to whom that question applied. These respondents are assigned a value of -4. In paper and pencil interviews (PAPI), which were used from 1967-92, this code indicates either that the respondent is not in the applicable universe or there was some other error that resulted in a missing response (which generally would have resulted in an invalid skip code in a CAPI survey).
- Finally, a "noninterview" value of -5 indicates that a respondent was not interviewed in that survey year. This code is used for all interviews of this cohort.
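The five codes above can be turned into labels with a small lookup. The Python sketch below is illustrative only: the function name `describe_missing` and the wording of the PAPI label are assumptions, and the year range simply follows the CAPI period (1995-2003) stated above.

```python
from typing import Optional

CAPI_YEARS = range(1995, 2004)  # CAPI interviews, 1995-2003

MISSING_LABELS = {
    -1: "Refusal",
    -2: "Don't know",
    -3: "Invalid skip",
    -4: "Valid skip",
    -5: "Noninterview",
}


def describe_missing(value: int, survey_year: int) -> Optional[str]:
    """Return a label for a negative missing-value code, or None for valid data."""
    if value >= 0:
        return None
    # In PAPI years, -4 also covers errors that CAPI would flag as invalid skips.
    if value == -4 and survey_year not in CAPI_YEARS:
        return "Valid skip or other missing response (PAPI)"
    return MISSING_LABELS.get(value, f"Unlisted negative code ({value})")
```

Passing the survey year keeps the broader PAPI meaning of the valid skip code visible in any tabulation.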
User Notes
The missing value codes described above are accurate for the 1995-2003 data releases. In previous years, a more complicated system was used to indicate missing data in the PAPI interviews. Beginning in 1995, the missing values were reassigned using a standardized system that matches the Young Women's CAPI data as well as the other NLS cohorts. This standardization should make it easier to use the data in analysis. However, researchers using programs written for a previous release of the Young Women data may need to change the parts of their programming code related to missing values. Users who need more information about the codes previously used in order to make these adjustments should contact NLS User Services.
Three additional negative codes are used only with the NLS women's cohorts for particular types of nonresponse.
- In questions dealing with usual hours per week worked, if the respondent reported that her hours varied, she was assigned a code of ‑6.
- Women who had been widowed since the last survey were asked a series of questions regarding their husband's care and their financial situation since his death. A code of ‑7 was assigned to women whom the interviewer judged to be emotionally unable to answer these questions.
- Some variables in multiple response question series include codes of ‑8, indicating that the respondent was done with the series.
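Since every nonresponse code is negative, these values need to be separated from valid data before computing statistics. The Python sketch below shows one way to do this for a usual-hours variable while keeping a tally of each negative code, so that a code such as -6 (hours varied) can be examined separately. The function name `summarize_usual_hours` is hypothetical.

```python
from collections import Counter
from typing import Iterable, Optional, Tuple


def summarize_usual_hours(values: Iterable[int]) -> Tuple[Optional[float], Counter]:
    """Return the mean of valid responses and a count of each negative code."""
    valid = []
    negative_codes: Counter = Counter()
    for v in values:
        if v < 0:
            negative_codes[v] += 1   # e.g. -6 hours varied, -4 valid skip
        else:
            valid.append(v)
    mean = sum(valid) / len(valid) if valid else None
    return mean, negative_codes
```

Leaving the negative codes in a numeric column would pull the computed mean downward, which is why they are removed first.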
User Notes
In computer-assisted surveys, respondents are initially assigned a default code of ‑4 (valid skip) for all questions in the interview. Then the ‑4 codes are replaced by valid data. The ‑3 (invalid skip) codes must be inserted into the data as hand-edits when data archivists uncover skip pattern errors during the data cleaning process. Therefore, some respondents classified as valid skips may actually have skipped a question incorrectly. If researchers need to know the exact reason a question was not answered, they can examine the skip patterns and universes in the questionnaire to determine whether any additional respondents should have been identified as invalid skips.
Derivations: The decision rules employed in the creation of constructed variables have been included, whenever possible, in the codebook under the title "DERIVATIONS." This information is designed to enable researchers to determine whether available constructs are appropriate for their needs. In the 'Hourly Rate of Pay at Current or Last Job 83 *KEY*' example, the derivation describes in detail the items of the interview schedule used to create the variable. If the derivation is too lengthy to include in the codebook, the codeblock will instead refer users to the supplemental documentation item that contains variable creation information.
Frequency Distribution: In the case of discrete (categorical) variables, frequency counts are normally shown in the first column to the left of the code categories. In the case of continuous (quantitative) variables, a distribution of the variable is presented using a convenient class interval. The format of these distributions varies.
Questionnaire Item: "Questionnaire item" is a generic term identifying the source of data for a given variable. A questionnaire item may be a question, a check item, or an interviewer's reference item appearing within one of the survey instruments. Questionnaire item identifications are located in the extreme right hand column of the codebook. The question number, when available, is copied exactly from the questionnaire.
During PAPI interview years, all created variables have a question name of simply "CV." Created variables in CAPI survey years usually include the letters CV in the question name and usually have the word *KEY* in their title.
Valid Values Range: Depicted below the frequency distribution are the maximum and minimum fields, which define the range of valid values (the upper and lower limits) for a given question. "MINIMUM" indicates the smallest recorded value exclusive of nonresponse codes; "MAXIMUM" indicates the largest recorded value. In the case of the 'Hourly Rate of Pay' example, the maximum, or highest value recorded, is 9815 with two implied decimal places, or $98.15.
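Values stored with implied decimal places must be rescaled before use. A minimal Python sketch of the conversion, using the 9815 example above; the function name `apply_implied_decimals` is hypothetical, and the number of implied places has to be taken from the codeblock for each variable.

```python
from decimal import Decimal


def apply_implied_decimals(raw: int, places: int = 2) -> Decimal:
    """Convert a stored integer with implied decimal places to its real value."""
    if raw < 0:
        # Negative values are nonresponse codes and must not be rescaled.
        raise ValueError(f"{raw} is a missing-value code, not a dollar amount")
    return Decimal(raw) / (Decimal(10) ** places)
```

With the default of two places, a stored 9815 becomes 98.15. `Decimal` is used here instead of floating point so that cents are represented exactly.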
Topcoding Income and Asset Values: Confidentiality issues restrict the release of all income and asset values. To ensure respondent confidentiality, income variables are truncated in each survey year: any value exceeding the upper limit for that year is converted to a set maximum value. These upper limits vary by year, as do the set maximum values. From 1968 through 1971, upper limit dollar amounts were set to 999999. From 1972 through 1980, the set maximum value was 50000, and in 1982 and 1983 it was 50001. Beginning in 1985, income amounts exceeding $100,000 were converted to a set maximum value of 100001.
From the cohort's inception, asset variables exceeding upper limits were truncated to 999999. Beginning in 1983, assets exceeding one million were converted to a set maximum value of 999997. Starting in 1993, the Census Bureau also topcoded selected asset items if it considered that the release of the absolute value might aid in the identification of a respondent. This topcoding was conducted on a case-by-case basis with the mean of the top three values substituted for each respondent who reported such amounts.
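The income topcode values described above can be kept in a small lookup so that truncated values are flagged during analysis. The Python sketch below is one reading of the year ranges given in the text and covers income variables only; the function names are hypothetical, and years the text does not mention return `None`.

```python
from typing import Optional


def income_topcode_value(survey_year: int) -> Optional[int]:
    """Return the set maximum value for income in a survey year, if stated."""
    if 1968 <= survey_year <= 1971:
        return 999999   # as read from the 1968-1971 sentence above
    if 1972 <= survey_year <= 1980:
        return 50000
    if survey_year in (1982, 1983):
        return 50001
    if survey_year >= 1985:
        return 100001
    return None         # year not covered by the text


def may_be_topcoded(value: int, survey_year: int) -> bool:
    """True when a value equals the set maximum for its survey year."""
    return value == income_topcode_value(survey_year)
```

A match shows only that a value may have been truncated. In years where the set maximum is a round amount such as 50000, a genuine report of that amount cannot be distinguished from a topcoded one.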
Codebook Supplements
Variable creation procedures and supplemental coding information are provided within the Codebook Supplement. The following attachments and appendices are included in the Supplement:
Attachment 4: Bose Index provides a mean occupational prestige score for each of the three-digit 1960 occupation codes for respondents of the cohort.
Appendix 1: Fields of Study in College--Instructions for the Coding Scheme
Appendix 2: State Names and State Codes by Census Division Listing
Appendix 4: Listing of Median Education for Different Occupations
Appendix 5: Source for Occupational Atypicality Scores
Appendix 6: Supplemental Edit Specifications for *KEY* Variables: R03297.00, R03292.00, R03294.00, R03293.00, R03295.00
Appendix 7: Listing of Correction to Employment Status Recode for 1968 and 1969
Appendix 9: Determinants of Early Labor Market Success: Appendix A
Appendix 10: Determinants of Early Labor Market Success: Appendix B
Appendix 11: Determinants of Early Labor Market Success: Appendix C
Appendix 12: Determinants of Early Labor Market Success: Method for Variable Construction
Appendix 18: Union Categories--Copy of Coding Instructions for Name of Union or Employee Association
Appendix 20: Derivations for R05007.00, R05012.00 (Marital Status Patterns)
Appendix 21: Rules for Revising Variables Representing Month and Year since Left School
Appendix 22: GED (General Education Development), SVP (Specific Vocational Preparation), Job-Level, and Job Family Values
Appendix 23: Derivations for R05031.00-R05047.00 (Occupation and Other Job Information before Birth of Child)
Appendix 24: Derivations for R05049.00-R05060.00 (Occupation and Other Job Information after Birth of Child)
Appendix 25: New Geographic and Environmental Variables for 1968-78
Appendix 26: Derivations for 1978 *KEY* Variables
Appendix 27: Source for the Job Characteristics Index
Appendix 28: Source for the Job Satisfaction Measures
Appendix 29: Reason for Reference in Union Certification Election (Item 10e, 1982, R07627.00)
Appendix 30: Derivations for the 1983 *KEY* Variables
Appendix 31: Listing of Changes in 1983 Survey Made after Questionnaire Printed
Appendix 32: Derivations for the 1988 *KEY* Variables
Appendix 33: Derivations for the 1991 *KEY* Variables
Appendix 34: Derivations for the 1993 *KEY* Variables (includes Highest Grade Completed 1993, and topcoding information)
Appendix 35: Geometric Progression Coding
Appendix 36: Summary of the Major Differences Between the 1995 and Earlier Surveys
Appendix 37: Summary of 1995 Data Cleaning Issues
Appendix 38: Derivations for 1995 *KEY* and other Created Variables
Appendix 39: Summary of 1997 Data Cleaning Issues
Appendix 40: Derivations for 1997 *KEY* and other Created Variables
Appendix 41: Derivations for 1999 *KEY* and other Created Variables
Appendix 42: Derivations for 2001 *KEY* and other Created Variables
Appendix 43: Derivations for 2003 *KEY* and other Created Variables