Custom Weighting Program Documentation

by
Jay Zagorsky
Zagorsky.1@osu.edu

 

I. Overview

Every NLS data release contains a set of cross-sectional weights.  These weights give users a simple way to correct the raw data for the effects of over-sampling, clustering and differential base-year participation.  Unfortunately, while each series of weights provides an accurate adjustment for any single year, none of them provides an accurate way to adjust multiple years’ worth of data.  This document describes the custom weighting program, which creates a set of customized longitudinal weights.  These weights improve a researcher’s ability to accurately calculate summary statistics from multiple years of data.

After each field period, a set of round-specific survey weights is produced.  The custom weighting program simply creates a new temporary list of individuals who meet the criteria selected on the custom weighting home page.  This list is then weighted as if the individuals on it had participated in a survey round.  The weights for this temporary list are the output of the custom weighting program.

II. Details

The weight calculation program is based on the existing weighting algorithms and data that create the round-specific weights.  Conceptually, using a specific list of individuals is identical to calculating weights in non-base-year survey rounds.  In non-base-year rounds, some individuals participate and others do not.  For a custom weight, the individuals on the list play the role of participants and everyone else is treated as not participating.

The NLSY79 technical sampling report describes the six-step process used for computing the survey’s weights.

  1. Computation of a base weight, reflecting the case’s selection probability for the screening sample
  2. Adjustment for nonresponse to the screener
  3. Adjustment of the weights resulting from the second step to reflect any subsampling following the screener, such as for race/ethnicity
  4. Development of a combination weight to allow the black and Hispanic cases from the cross-sectional sample to be merged with those from the supplemental sample
  5. Adjustment of the weights for nonresponse to the main interview(s)
  6. Post-stratification of the nonresponse-adjusted weights

While these steps seem complicated, the actual process of weighting a particular round or the custom sample is relatively simple, since the program deals only with step 6.  Steps 1 to 5 were all done to calculate the base-year (1979) set of survey weights.

To do the sixth step, each NLSY79 respondent is given two weights.  The first is called the target weight and the second is called the preliminary weight.  The sum of the target weights for all people in a particular group (say, Hispanic males age 20 in 1978) is survey staff’s best estimate of the size of this group during 1978.  The preliminary weights are more complicated, but each is basically a number that reflects all the adjustments made in steps 1 through 5.

To create a custom weight, the NLSY79 respondents are broken into fine groups that partition people by race, sample group, age, military service and other factors.  The goal is for each group to contain at least 20 people for the civilian sample and 15 people for the military sample.  Much of the custom weighting program is devoted to automating the creation of these small groups, which are also called cells.
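The minimum cell sizes above can be illustrated with a small sketch.  The document does not describe how the program actually combines undersized groups, so the left-to-right merging rule below is only an assumption chosen for illustration, and the names `MIN_CELL_SIZE` and `collapse_cells` and the example counts are hypothetical.

```python
# Minimum cell sizes stated in the text.
MIN_CELL_SIZE = {"civilian": 20, "military": 15}


def collapse_cells(counts, minimum):
    """Merge adjacent groups, left to right, until each cell has at least
    `minimum` people.  `counts` is an ordered list of (label, n) pairs.
    Assumption: neighbors in the list are the natural groups to merge."""
    merged = []
    labels, total = [], 0
    for label, n in counts:
        labels.append(label)
        total += n
        if total >= minimum:
            merged.append((labels, total))
            labels, total = [], 0
    if labels:
        # An undersized tail is folded into the previous cell, if there is one.
        if merged:
            last_labels, last_total = merged.pop()
            merged.append((last_labels + labels, last_total + total))
        else:
            merged.append((labels, total))
    return merged


# Made-up counts for one race/sample group, split by age.
ages = [("age 14", 8), ("age 15", 14), ("age 16", 25), ("age 17", 6)]
cells = collapse_cells(ages, MIN_CELL_SIZE["civilian"])
for cell_labels, size in cells:
    print(cell_labels, size)
```

In this example the four age groups collapse into two cells, each with at least 20 people.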

Once the cells have been created, the target weights and the preliminary weights of the respondents in each cell are totaled separately.  These two totals are then divided to create an adjustment factor for the cell.  This adjustment factor is then multiplied by each respondent’s preliminary weight calculated in the base year.  This adjustment of the base-year weights results in the custom weight for a respondent.
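The adjustment can be sketched as follows.  The text does not say which total is the numerator or exactly which respondents enter each total.  This sketch assumes the usual post-stratification form: the target total is summed over everyone in the cell, the preliminary total is summed over only the selected respondents, and the factor is the target total divided by the preliminary total.  The function name, the dictionary keys and the numbers are hypothetical.

```python
from collections import defaultdict


def custom_weights(respondents):
    """Return {id: custom weight} for a list of respondent records.

    Each record is a dict with hypothetical keys:
    'id', 'cell', 'target', 'prelim' and 'selected'.
    """
    target_total = defaultdict(float)
    prelim_total = defaultdict(float)
    for r in respondents:
        # Assumption: target weights are totaled over the whole cell.
        target_total[r["cell"]] += r["target"]
        # Assumption: preliminary weights are totaled over selected people only.
        if r["selected"]:
            prelim_total[r["cell"]] += r["prelim"]

    weights = {}
    for r in respondents:
        if r["selected"]:
            factor = target_total[r["cell"]] / prelim_total[r["cell"]]
            weights[r["id"]] = factor * r["prelim"]
        else:
            weights[r["id"]] = 0.0  # not on the custom list
    return weights


# Made-up cell of three respondents, two of whom are selected.
sample = [
    {"id": 1, "cell": "A", "target": 100.0, "prelim": 80.0, "selected": True},
    {"id": 2, "cell": "A", "target": 100.0, "prelim": 120.0, "selected": False},
    {"id": 3, "cell": "A", "target": 100.0, "prelim": 120.0, "selected": True},
]
result = custom_weights(sample)
print(result)
```

Under these assumptions, the custom weights of the selected respondents in a cell add up to the cell's target total, which is the point of the adjustment.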

Running the custom weighting program creates an output file.  This file has two variables separated by a space: the respondent's ID and the custom weight.  All custom longitudinal weights, just like the cross-sectional weights, have two implied decimal places.  Hence, to find out how many people a respondent represents, divide the weight by 100.  A value of zero (0) means the respondent is not included or is out of the survey.
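A minimal sketch of reading this output is shown below.  It assumes each line holds an integer ID and an integer weight, as the two implied decimal places suggest.  The function name `parse_weights` and the sample values are made up for illustration.

```python
def parse_weights(lines):
    """Convert 'id weight' lines into {id: number of people represented}."""
    weights = {}
    for line in lines:
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        resp_id, raw_weight = int(parts[0]), int(parts[1])
        # Two implied decimal places: divide by 100 to get the real value.
        weights[resp_id] = raw_weight / 100
    return weights


# Made-up lines in the format described above.
example = ["1 60342", "2 0", "3 125000"]
parsed = parse_weights(example)

# A weight of zero marks a respondent who is not included.
included = {resp_id: w for resp_id, w in parsed.items() if w > 0}
print(included)
```

To read a real output file, the open file object can be passed in place of the `example` list, since iterating over a file yields its lines.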