Calculating Design-Corrected Standard Errors for the National Longitudinal Survey of Youth, 1997
Overview
The National Longitudinal Survey of Youth, 1997 (NLSY97) is a combination of two area-probability samples. The Cross-Sectional (CX) sample is an equal-probability multistage cluster sample of housing units for the entire United States (every housing unit in the United States in 1997 had an equal probability of being in the CX sample). The Supplemental (SU) sample is a multistage cluster sample of housing units that oversamples Hispanic and non-Hispanic Black youths; it is designed so that every eligible Hispanic and non-Hispanic Black youth in the United States in a housing unit in 1997 had an equal probability of being in the SU sample.
Since these samples are cluster samples, standard errors are larger for the NLSY97 than simple random sample calculations (calculated without correction for the design) would indicate. To calculate standard errors correctly, design variables must be used in statistical software. Without these design variables, statistical software will assume a simple random sample and underestimate standard errors. For the handful of variables shown in Table 1 below, the multiplier effects on standard errors (labeled DEFT) range from 1.30 to 1.86.
To facilitate the calculation of design effects, we provide two design variables for every Round 1 NLSY97 interview: VSTRAT and VPSU. VSTRAT is the Variance STRATum, while VPSU is the Variance Primary Sampling Unit. The combination of VSTRAT and VPSU reflects the first-stage and second-stage units selected as part of the NORC National Sampling Frame. There are two second-stage units (VPSU) for each first-stage unit (VSTRAT).
First-stage units in the NLSY97 are called Primary Sampling Units (PSUs), each of which is composed of one or more counties. The largest urban areas are selected with certainty to guarantee their representation in the NLSY97. Second-stage units in the NLSY97 are called segments, each of which is one or more Census-defined blocks. The first-stage and second-stage units are selected with probabilities proportional to size (housing units for the CX sample; minority youths for the SU sample), and the sample housing units (third-stage units) are then selected to be an equal-probability sample.
To create the variables VSTRAT and VPSU, we recode the PSUs and segments, depending on whether the PSU was selected with certainty. Certainty PSUs are considered strata, so all the segments in one certainty PSU are in one VSTRAT value, with segments divided so that half are assigned to VPSU = 1 while the other half are assigned to VPSU = 2. Some certainty PSUs are large enough to be divided into multiple VSTRAT values with up to twenty segments in one VSTRAT value (ten in each VPSU). Noncertainty PSUs are paired into one VSTRAT value with one PSU assigned to VPSU = 1 while the other PSU is assigned to VPSU = 2. It is rare, but possible, for PSUs to be combined in one VPSU. This strategy was designed by Kirk Wolter.
Here is sample Stata code to analyze the variable ANALYSISVAR within an NLSY97DATAFILE with the appropriate weight variable for the analysis, WTVAR:
use NLSY97DATAFILE.dta, clear
svyset [pweight=WTVAR], strata(vstrat) psu(vpsu) singleunit(scaled)
svy: mean ANALYSISVAR // mean for continuous variables
svy: proportion ANALYSISVAR // proportion for categorical variables
estat effects // design effects: this generates the DEFF and DEFT
svy, subpop(if SUBGROUP==1): mean ANALYSISVAR // mean within a subpopulation
svy: tabulate ANALYSISVAR // one-way table
The results generated by running this code on select NLSY97 variables are shown in Table 1. They report design-corrected standard errors as well as standard errors assuming simple random sampling (SRS), as would be estimated in the absence of these design variables.
Table 1: Variance estimation for selected NLSY97 variables

| Variable | Mean/proportion | Sample size | Estimate | Design-corrected std error | SRS std error | DEFF | DEFT |
|---|---|---|---|---|---|---|---|
| Gross family income 2009 from 2010 interview [T5206900] | mean | 6527 | 64858.5 | 923.831 | 773.64 | 1.728 | 1.314 |
| ASVAB score for Math (percentile) [R9829600] | mean | 7093 | 50.410 | 0.638 | 0.367 | 3.447 | 1.857 |
| ASVAB score for Math (percentile) for females [R9829600, R0536300] | mean | 8102 | 51.508 | 0.705 | 0.509 | 2.222 | 1.491 |
| Weeks worked in 2008 [Z9061800] | mean | 8011 | 40.021 | 0.270 | 0.224 | 1.700 | 1.304 |
| Ever received a bachelor's degree or higher as of 2011 interview [T6657200] | proportion | 7398 | 0.300 | 0.009 | 0.006 | 2.614 | 1.617 |
| Never received a high school diploma as of 2011 interview [T6657200] | proportion | 7398 | 0.199 | 0.007 | 0.005 | 2.368 | 1.539 |
| Lived with 2 parents (at least 1 bio-parent) in Round 1 [R1205300] | proportion | 8953 | 0.675 | 0.007 | 0.005 | 2.216 | 1.489 |

Tabulations use the Round 1 Cumulating Cases Sampling Weight [R1236101]
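As a reading aid for the DEFF and DEFT columns, the relationship between the two can be written out. This is a sketch under the assumption that no finite population correction is specified (none appears in the svyset command above); the symbols are introduced here for illustration and are not from the original note.

```latex
\mathrm{DEFF} = \frac{\widehat{V}_{\text{design}}}{\widehat{V}_{\text{SRS}}},
\qquad
\mathrm{DEFT} = \sqrt{\mathrm{DEFF}}
% Worked check against the bachelor's degree row of Table 1:
% \sqrt{2.614} \approx 1.617
```

Each DEFT value in Table 1 agrees with the square root of its DEFF to rounding. DEFT is the factor by which a simple random sample standard error would need to be multiplied to reflect the clustered design.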

One thing to note is that the sample size for math ASVAB scores for females is greater than the sample size for math ASVAB scores for all respondents. Though the estimate uses only the 3503 observations of females with nonmissing data for ASVAB scores, the variance calculation uses information from all observations in the data whether or not they have valid values for the specific variables. (See here for more information.)
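The point about subpopulations can be sketched in Stata as follows. This is an illustrative example, not part of the original analysis: FEMALE is a hypothetical indicator variable, and the svyset command shown earlier is assumed to have been run already.

```stata
* Preferred: keep every observation in memory and flag the subpopulation,
* so the variance calculation still sees all strata and PSUs.
svy, subpop(if FEMALE == 1): mean ANALYSISVAR

* Generally not advised: dropping cases before estimation can remove whole
* PSUs from the data and misstate the design-corrected standard error.
* keep if FEMALE == 1
* svy: mean ANALYSISVAR
```

With subpop(), the point estimate uses only the flagged cases, while the full design structure is retained for the variance.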
In SAS, one would use PROC SURVEYMEANS (for means) or PROC SURVEYFREQ (for proportions and tables) to calculate the design-corrected standard errors. SPSS is menu-driven, so no code is given here, but you can create design-corrected standard errors within SPSS using the Complex Samples add-on.
Potential Problems with Sparse Data
If you are calculating design-corrected standard errors using a subsample of the NLSY97 data set and/or using a variable that has a large number of observations with missing values, you may receive a message such as this:
Stata error handling: "missing standard error because of stratum with single sampling unit"
VSTRAT and VPSU were created so that there was a minimum of 7 NLSY97 respondents within a VSTRAT/VPSU cell. However, if all respondents within a cell are missing on a variable, it will be impossible to calculate the standard error. If the dataset is subset (to males or females, for example), this error becomes more likely to happen.
The best workaround is to merge two VSTRATA together to eliminate this problem (the VSTRATA are ordered so that similar VSTRATA are numerically consecutive). To diagnose the problem, run a frequency of the data by VSTRAT/VPSU and look for VSTRAT values with only one VPSU with respondents. Here is an example:
| VSTRAT | VPSU | # of cases |
|---|---|---|
| … | … | … |
| x-1 | 1 | 3 |
| x-1 | 2 | 5 |
| x | 1 | 4 |
| x+1 | 1 | 6 |
| x+1 | 2 | 4 |
| … | … | … |
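A frequency of this kind can be produced in Stata along the following lines. This is a minimal sketch, assuming the data have already been svyset as shown earlier and that ANALYSISVAR is the variable being analyzed; the names ncases and nvpsu are introduced here for illustration.

```stata
* List only the strata that have a single sampling unit with
* non-missing data on the analysis variable
svydescribe ANALYSISVAR, single

* Alternatively, build the VSTRAT/VPSU frequency by hand
preserve
keep if !missing(ANALYSISVAR)
contract vstrat vpsu, freq(ncases)     // one row per VSTRAT/VPSU cell
bysort vstrat: generate nvpsu = _N     // number of VPSUs left in each VSTRAT
list vstrat vpsu ncases if nvpsu == 1  // strata with only one VPSU
restore
```

If the analysis is restricted to a subgroup, the same restriction would need to be added to the keep statement so that the counts match the cases actually used.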
The error occurs because VSTRAT = x has four cases with VPSU = 1 but none with VPSU = 2. This prevents Stata from calculating the variance for this stratum (it has nothing to compare VPSU = 1 with). The cases were sorted by VSTRAT and VPSU so that the most similar VSTRATA are numbered consecutively and the most similar cases are always within the two VPSU values of one VSTRAT value. The easiest solution is therefore to make room for VSTRAT x within VSTRAT x-1 by combining the two VPSUs within VSTRAT x-1. Then, the VSTRAT x cases are moved to the other VPSU value within VSTRAT x-1. Note that VSTRAT x-1 is chosen instead of VSTRAT x+1 in this example because VSTRAT x-1 has fewer total cases (8) than VSTRAT x+1 (10). In some, but not all, cases, the VPSU 1 cases in VSTRAT x are "more similar" to the VSTRAT x-1 cases than to the VSTRAT x+1 cases. Here are the two programming steps:
1. If VSTRAT = x-1 and VPSU = 2, then VPSU = 1.
2. If VSTRAT = x, then VSTRAT = x-1 and VPSU = 2.
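In Stata, the two steps above might be written as in the sketch below. The stratum number 57 is a purely hypothetical stand-in for x, and the copies vstrat_m and vpsu_m are names introduced here so that the original design variables stay untouched. The lines need to be run together (for example, from a do-file) so that the local macro is defined.

```stata
local x = 57                        // hypothetical value of the problem stratum x

clonevar vstrat_m = vstrat          // work on copies of the design variables
clonevar vpsu_m   = vpsu

* Step 1: combine the two VPSUs within VSTRAT x-1
replace vpsu_m = 1 if vstrat_m == `x' - 1 & vpsu_m == 2

* Step 2: move the VSTRAT x cases into VSTRAT x-1 as VPSU 2
* (set VPSU first, while the cases can still be identified by VSTRAT = x)
replace vpsu_m   = 2        if vstrat_m == `x'
replace vstrat_m = `x' - 1  if vstrat_m == `x'

* Re-declare the design with the merged variables
svyset [pweight=WTVAR], strata(vstrat_m) psu(vpsu_m) singleunit(scaled)
```

The order matters: step 1 has to run before step 2, otherwise the cases moved in from VSTRAT x would themselves be recoded to VPSU 1.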
Here is the revised frequency:
| VSTRAT | VPSU | # of cases |
|---|---|---|
| … | … | … |
| x-1 | 1 | 8 |
| x-1 | 2 | 4 |
| x+1 | 1 | 6 |
| x+1 | 2 | 4 |
| … | … | … |
This eliminates the "stratum with a single sampling unit." In severe cases of data subsets, this step may be required more than once, although this may also indicate that the "clustering" has been removed (by using less than 10 percent of the total sample, for example).
Author: Steven Pedlow, NORC at the University of Chicago
Revised: October 1, 2014