Sample Weights & Design Effects

National Longitudinal Survey of Youth - 1997 Cohort

Methodology for Calculating Weights

The assignment of individual respondent weights involved a number of different adjustments. Complete details are found in the NLSY97 Technical Sampling Report, which has step-by-step descriptions of the entire adjustment process. Some of the major adjustments are:

Adjustment One. Computation of a base weight, reflecting the case's selection probability for the screening sample. This step also corrects for missed housing units and caps the base weights in the supplemental sample to prevent extremely high weights;

Adjustment Two. Adjustment for nonresponse to the screener;

Adjustment Three. Development of a combination weight to allow the black and Hispanic cases from the cross-sectional sample to be merged with those from the supplemental sample (non-Hispanic, non-blacks in the supplemental sample were not eligible for the NLSY97 sample);

Adjustment Four. Adjustment of the weights for nonresponse to NLSY97 interviews;

Adjustment Five. Poststratification of the nonresponse-adjusted weights to match national totals.
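The general shape of these adjustments can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure in the NLSY97 Technical Sampling Report: the function names, the adjustment cells ("A", "B"), the selection probabilities, and the control totals are all hypothetical, and Adjustment Three (the combination weight) is omitted.

```python
from collections import defaultdict

def base_weight(selection_prob, cap=None):
    """Adjustment One (sketch): inverse of the selection probability, optionally capped."""
    w = 1.0 / selection_prob
    return min(w, cap) if cap is not None else w

def nonresponse_adjust(cases):
    """Adjustments Two and Four (sketch): within each hypothetical cell, respondents
    absorb the weight of nonrespondents, so the cell's total weight is preserved."""
    total = defaultdict(float)
    responding = defaultdict(float)
    for c in cases:
        total[c["cell"]] += c["weight"]
        if c["responded"]:
            responding[c["cell"]] += c["weight"]
    adjusted = []
    for c in cases:
        if c["responded"]:
            factor = total[c["cell"]] / responding[c["cell"]]
            adjusted.append({**c, "weight": c["weight"] * factor})
    return adjusted

def poststratify(cases, control_totals):
    """Adjustment Five (sketch): scale weights so each cell sums to its control total."""
    cell_sum = defaultdict(float)
    for c in cases:
        cell_sum[c["cell"]] += c["weight"]
    return [
        {**c, "weight": c["weight"] * control_totals[c["cell"]] / cell_sum[c["cell"]]}
        for c in cases
    ]

# Hypothetical cases: selection probabilities and cells are made up for illustration.
cases = [
    {"id": 1, "cell": "A", "prob": 0.001, "responded": True},
    {"id": 2, "cell": "A", "prob": 0.001, "responded": False},
    {"id": 3, "cell": "B", "prob": 0.002, "responded": True},
    {"id": 4, "cell": "B", "prob": 0.002, "responded": True},
]
weighted = [{**c, "weight": base_weight(c["prob"])} for c in cases]
adjusted = nonresponse_adjust(weighted)
final = poststratify(adjusted, {"A": 2200.0, "B": 900.0})  # hypothetical national totals
```

In this toy example the nonrespondent in cell A drops out and the remaining cell A respondent carries the whole cell's weight before poststratification rescales each cell to its control total.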

Calculation of Weights under New "Cumulating Cases" Strategy

Starting in round 4, a new "Cumulating Cases" strategy has been used to calculate weights. Instead of calculating separate CX and SU (cross-sectional and supplemental) base weights and then later combining the separate sample weights, a Horvitz-Thompson approach to weighting is used.

In the Horvitz-Thompson approach, the weights are determined across samples depending only on the overall selection probability (into either sample) of the individual element, giving a single unified set of weights for the cumulated cases. Only Adjustments One and Three above are modified. The probability for a case to be in either sample is taken to be the sum of the probabilities of being in each sample, the two samples being independently drawn. Thus, the base weight for a case is the inverse of the sum of sample selection probabilities for a case:

$$ W_i = \frac{1}{P_i^{CX} + P_i^{SU}} $$

where $P_i^{CX}$ and $P_i^{SU}$ denote case $i$'s selection probabilities for the cross-sectional and supplemental samples.

Under the old strategy, the separate CX and SU weights from Adjustment One were the reciprocals of the selection probabilities for just that sample:

$$ W_i^{CX} = \frac{1}{P_i^{CX}}, \qquad W_i^{SU} = \frac{1}{P_i^{SU}} $$

The only other change is that Adjustment Three is now unnecessary.
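The difference between the two strategies can be illustrated with a short Python sketch. The function names and the selection probabilities below are hypothetical, chosen only to show the arithmetic; treating a case that is ineligible for the supplemental sample as having a supplemental selection probability of zero is likewise an assumption of this sketch.

```python
def cumulating_cases_base_weight(p_cx, p_su):
    """Base weight under the cumulating-cases strategy: the inverse of the
    summed selection probabilities across the CX and SU samples."""
    return 1.0 / (p_cx + p_su)

def separate_sample_base_weight(p_sample):
    """Base weight under the old strategy: the inverse of the selection
    probability for one sample only."""
    return 1.0 / p_sample

# Hypothetical case eligible for both samples.
p_cx, p_su = 0.0004, 0.0006
combined = cumulating_cases_base_weight(p_cx, p_su)   # about 1,000
old_cx = separate_sample_base_weight(p_cx)            # about 2,500
old_su = separate_sample_base_weight(p_su)            # about 1,667

# Hypothetical case that could only enter through the CX sample (assumed p_su = 0).
cx_only = cumulating_cases_base_weight(0.0004, 0.0)   # about 2,500
```

Because the combined weight already reflects both chances of selection, no later step is needed to reconcile two separate weights for the same kind of case.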

Summary of NLSY97 Weights for Rounds 1 through 17

Table 1 summarizes the weights for each of the NLSY97 rounds. The sum of the weights is the same in every round (apart from rounding), so when fewer respondents with positive weights share this sum, the individual weights tend to increase. The weight that would have gone to Round 1 respondents who are nonrespondents in a later round is redistributed among that round's (or panel's) respondents.

Table 1: Summary Table for NLSY97 Rounds 1-17 Weights Under "Cumulating Cases" Strategy

 

Statistic                Round 1     Round 2     Round 3     Round 4     Round 5     Round 6     Round 7     Round 8     Round 9
N (> 0)                    8,984       8,386       8,209       8,081       7,883       7,898       7,756       7,503       7,338
Sum                   19,378,453  19,378,454  19,378,453  19,378,453  19,378,453  19,378,453  19,378,454  19,378,453  19,378,453
Mean                    2,157.00    2,310.81    2,360.64    2,398.03    2,458.26    2,453.59    2,498.51    2,582.76    2,640.84
Standard Deviation        931.01    1,011.70    1,021.15    1,052.08    1,066.22    1,092.47    1,118.88    1,162.17    1,200.80
Minimum (> 0)             760.71      846.23      858.76      889.08      866.55      864.75      900.60      868.23      916.66
5th percentile            887.20      938.12      969.00      983.16      997.49      992.86    1,006.25    1,016.25    1,037.19
25th percentile         1,072.03    1,135.96    1,168.73    1,185.60    1,222.83    1,204.79    1,230.15    1,273.38    1,263.82
Median                  2,596.59    2,777.05    2,806.46    2,869.30    2,955.45    2,955.01    3,009.67    3,144.57    3,215.89
75th percentile         2,909.73    3,111.65    3,154.13    3,203.60    3,271.03    3,293.50    3,356.67    3,458.88    3,579.02
95th percentile         3,268.16    3,556.58    3,649.77    3,786.59    3,831.66    3,904.73    3,997.80    4,158.40    4,183.79
Maximum                15,761.82   16,718.19   16,646.80   16,950.37   17,277.02   17,167.26   17,852.01   18,026.17   18,594.09

Statistic               Round 10    Round 11    Round 12    Round 13    Round 14    Round 15    Round 16    Round 17
N (> 0)                    7,559       7,418       7,490       7,559       7,479       7,423       7,141       7,103
Sum                   19,378,454  19,378,453  19,378,455  19,378,453  19,378,452  19,378,452  19,378,452  19,378,454
Mean                    2,563.63    2,612.36    2,587.20    2,563.63    2,591.05    2,610.60    2,713.69    2,728.21
Standard Deviation      1,157.31    1,180.74    1,178.50    1,175.61    1,201.68    1,225.20    1,271.64    1,282.70
Minimum (> 0)             918.64      897.41      894.00      866.01      876.70      882.87      891.34      890.61
5th percentile          1,022.26    1,046.29    1,024.90      997.75    1,011.30    1,020.56    1,050.96    1,064.28
25th percentile         1,230.84    1,267.50    1,254.40    1,235.94    1,231.02    1,242.24    1,278.73    1,282.44
Median                  3,124.67    3,178.10    3,146.90    3,144.54    3,155.05    3,167.25    3,327.70    3,390.06
75th percentile         3,469.62    3,541.70    3,534.60    3,516.00    3,563.37    3,597.63    3,754.63    3,755.54
95th percentile         4,044.36    4,127.30    4,054.50    4,012.05    4,118.80    4,232.26    4,310.88    4,380.94
Maximum                18,521.05   18,857.91   19,165.40   18,911.82   19,222.75   19,377.82   20,094.05   19,803.54
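
The relationship between the Sum, N, and Mean rows can be checked directly: because the sum of the weights is essentially constant, the mean weight is that sum divided by the number of positive weights, and so it rises as N falls. The short Python check below uses figures copied from Table 1 for three rounds; the variable names are illustrative only.

```python
# (N with positive weight, sum of weights, published mean) for selected rounds of Table 1
table1 = {
    1: (8984, 19378453, 2157.00),
    9: (7338, 19378453, 2640.84),
    17: (7103, 19378454, 2728.21),
}

# Recompute each mean as sum / N and compare it with the published value.
checks = {}
for rnd, (n, total, published_mean) in table1.items():
    computed_mean = round(total / n, 2)
    checks[rnd] = (computed_mean, published_mean)

# Ratio of mean weights between Round 17 and Round 1 tracks the ratio of sample sizes.
mean_growth = table1[17][2] / table1[1][2]
n_shrinkage = table1[1][0] / table1[17][0]
```

The recomputed means agree with the published ones to two decimal places, and the growth in the mean weight from Round 1 to Round 17 matches the shrinkage in the number of respondents.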

Design Effects

Because the samples are multi-stage stratified random samples instead of simple random samples, respondents tend to be clustered in geographic areas (for more information, see Sample Design & Screening Process). In general, respondents within a cluster tend to be alike in a variety of ways for a variety of reasons. For example, there may be cultural differences by locality or ecological differences in labor market conditions. Depending upon the degree of this homogeneity, conventionally computed standard errors, which assume a simple random sample, may be too small. However, by controlling the rate at which particular strata are sampled, multi-stage stratified random samples can improve upon simple random samples. The ratio of the correct standard error to the standard error computed under the assumption of a simple random sample is known as the design effect. (Some texts instead define the design effect as the ratio of the corresponding variances, which is the square of this ratio.) The NLSY97 Technical Sampling Report provides design effects for the various strata.

As respondents in the cohort get older, mobility may mix the respondents more uniformly through the country, reducing the clustering of the sample as well as the design effects. Many of the persons who started out in the same PSU will have moved to different areas and may no longer be affected by similar unobservable labor market conditions. As this occurs, the correct standard errors in a regression will more closely approximate those computed for a simple random sample. However, some correlation due to respondents coming from the same household or neighborhood will most likely remain.

By examining the geocode data for the NLSY97, it may be possible to control for some of the environmental factors generating design effects or, if desired, to compute design effects based upon county or metropolitan area clusters.
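
As a rough illustration of the ratio described in this section, the Python sketch below compares a cluster-based standard error of a weighted mean with the standard error obtained when every respondent is treated as an independent draw. It rests on several assumptions that are not from the Technical Sampling Report: a single stratum, clusters treated as sampled with replacement, a standard linearization estimator, and made-up data. The cluster labels are hypothetical and could stand in for PSU, county, or metropolitan area codes.

```python
import math
from collections import defaultdict

def se_weighted_mean(values, weights, clusters):
    """Linearized standard error of a weighted mean, treating each cluster as an
    independent draw (single stratum, with-replacement approximation)."""
    total_w = sum(weights)
    mean = sum(w * y for w, y in zip(weights, values)) / total_w
    # Sum each case's linearized contribution within its cluster.
    cluster_totals = defaultdict(float)
    for y, w, c in zip(values, weights, clusters):
        cluster_totals[c] += w * (y - mean) / total_w
    k = len(cluster_totals)
    # The contributions sum to zero overall, so no centering term is needed.
    variance = k / (k - 1) * sum(z * z for z in cluster_totals.values())
    return math.sqrt(variance)

def design_effect_ratio(values, weights, clusters):
    """Ratio of the cluster-based standard error to the standard error computed
    as if each respondent were its own cluster (simple random sample)."""
    clustered = se_weighted_mean(values, weights, clusters)
    srs = se_weighted_mean(values, weights, range(len(values)))
    return clustered / srs

# Hypothetical data: three clusters whose members resemble one another.
values = [10, 11, 12, 20, 21, 22, 30, 31, 32]
weights = [1.0] * 9
clusters = ["A"] * 3 + ["B"] * 3 + ["C"] * 3

clustered_se = se_weighted_mean(values, weights, clusters)
ratio = design_effect_ratio(values, weights, clusters)
```

With respondents this similar within clusters, the ratio comes out well above 1; if the same values were scattered across clusters at random, it would tend toward 1.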

Reference

Moore, Whitney; Steven Pedlow; Parvati Krishnamurty; and Kirk Wolter. National Longitudinal Survey of Youth 1997 (NLSY97) Technical Sampling Report. Chicago: NORC, 2000.