Chapter 2: Sample Design and Attrition (cont'd)


2.7 Sample Representativeness and Attrition

The retention rate for the Mature Women at the final interview was 44.0 percent, or 2,237 of the original 5,083 respondents. Retention rate is defined as the percentage of base-year respondents who were interviewed in a given survey year; deceased and other out-of-scope respondents are included in the calculation (see Table 2.6.2 in the "Eligible Sample and Reasons for Noninterview" section of this chapter for definitions). An analysis of selected characteristics of respondents interviewed in the tenth-year samples of the Original Cohorts found that noninterviews had not seriously distorted the sample representativeness of any of the cohorts for the characteristics studied (Rhoton 1984). A second analysis of differential attrition among wealthy and non-wealthy subsamples of each of the four Original Cohorts found that non-wealthy respondents of each cohort showed a consistent tendency toward greater attrition (Rhoton and Nagi 1991). Among the three younger cohorts, almost all of the difference between wealthy and non-wealthy subsamples is accounted for by attrition reasons other than the death of the respondent. In a more recent analysis, Zagorsky and Rhoton (1998) concluded that respondents with lower socio-economic status attrited at a higher rate than those with higher income and educational attainment. Further, the authors found that white respondents were more likely to remain in the survey than black respondents and those of other races. For year-by-year retention rates, consult Table 2.4.1 in the "Interview Schedule & Fielding Periods" section of this chapter.

Table 2.7.1 presents the percentage of sampled respondents of each race for the base survey year (1967) and for the last interview year for which data are available (2003). The table also reports the number of deceased respondents by race. Figure 2.7.1 shows the percentage by race of the original sample that was interviewed at each survey point.

Table 2.7.1 Sample Characteristics by Race: 1967 and 2003

            Number of Interviewed Respondents   Retention               Number of Deaths
Race(1)     1967              2003              (2003 as % of 1967)     as of 2003(2)
Non-black   3693 (72.7%)      1693 (75.7%)      45.8                    950
Black       1390 (27.3%)      544 (24.3%)       39.1                    535

1  See section on "Race, Ethnicity & Nationality" in this guide for details on race classifications.
Respondent totals in this table are based on R00023.00.

2  Numbers are derived from R76154.00, a revised created variable that reflects mortality counts 
based on Social Security Administration records.
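
The retention figures in Table 2.7.1 follow from the definition of retention rate given above: respondents interviewed in a given year as a percentage of base-year respondents. A minimal Python sketch of that arithmetic, using the counts from the table (the variable and function names are illustrative only):

```python
# Interviewed respondents by race, taken from Table 2.7.1.
interviewed_1967 = {"Non-black": 3693, "Black": 1390}
interviewed_2003 = {"Non-black": 1693, "Black": 544}


def retention_rate(base_count, later_count):
    """Percent of base-year respondents interviewed in a later year."""
    return 100.0 * later_count / base_count


# Retention by race, rounded to one decimal place as in the table.
by_race = {
    race: round(retention_rate(interviewed_1967[race], interviewed_2003[race]), 1)
    for race in interviewed_1967
}

# Overall retention for the full cohort.
overall = round(
    retention_rate(sum(interviewed_1967.values()), sum(interviewed_2003.values())), 1
)

print(by_race)  # {'Non-black': 45.8, 'Black': 39.1}
print(overall)  # 44.0
```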

Figure 2.7.1 Interview Completion Rates among Living Respondents by Race and Survey Year

Finally, Table 2.7.2 presents the number of interviews completed by respondents, broken down by race. In this table, the "number who completed" columns show how many respondents completed exactly that number of surveys. The "cumulative %" columns show a cumulative total percent of those completing at least a given number of surveys rather than a percentage of those completing an exact number of surveys.

Table 2.7.2 Number of Interviews Respondents Completed out of 21 Surveys, by Race: 1967-2003

            All Respondents         Non-black Respondents   Black Respondents
Number of   Number who  Cumulative  Number who  Cumulative  Number who  Cumulative
Surveys(1)  completed   %           completed   %           completed   %
21          1555        30.59       1188        32.17       367         26.40
20          475         39.94       360         41.92       115         34.68
19          317         46.17       223         47.96       94          41.44
18          251         51.11       160         52.29       91          47.99
17          221         55.46       167         56.81       54          51.87
16          218         59.75       150         60.87       68          56.76
15          197         63.62       134         64.50       63          61.29
14          154         66.65       105         67.34       49          64.82
13          79          68.21       58          68.91       21          66.33
12          105         70.27       72          70.86       33          68.71
11          79          71.83       47          72.14       32          71.01
10          48          72.77       32          73.00       16          72.16
9           130         75.33       96          75.60       34          74.60
8           149         78.26       108         78.53       41          77.55
7           204         82.27       154         82.70       50          81.15
6           154         85.30       111         85.70       43          84.24
5           155         88.35       110         88.68       45          87.48
4           127         90.85       85          90.98       42          90.50
3           144         93.68       92          93.47       52          94.24
2           188         97.38       144         97.37       44          97.41
1           133         100.0       97          100.0       36          100.0
Total       5083        100.0       3693        100.0       1390        100.0
Note: This table is based on R00023.00 (race), R00002.00, R00856.10, R00884.10, R01338.10, R02053.10, R02883.10,
R03084.10, R03295.10, R04565.10, R04912.10, R05284.10, R06664.20, R07215.20, R07833.20, R08878.20,
R10093.20, R16014.00, R34985.00, R42671.00, R63203.10, and R76199.00.
1 Surveys completed in any year, not necessarily consecutive survey years.
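
The "cumulative %" columns of Table 2.7.2 can be reproduced from the "number who completed" columns with a running total. The Python sketch below does this for the All Respondents column; the variable names are illustrative only.

```python
# "Number who completed" for all respondents, from Table 2.7.2,
# ordered from 21 surveys down to 1 survey.
completed_exactly = [
    1555, 475, 317, 251, 221, 218, 197, 154, 79, 105, 79,
    48, 130, 149, 204, 154, 155, 127, 144, 188, 133,
]
total = sum(completed_exactly)  # all base-year respondents

# Running total: respondents who completed at least this many surveys.
cumulative_pct = []
running = 0
for count in completed_exactly:
    running += count
    cumulative_pct.append(round(100.0 * running / total, 2))

# Pair each percentage with its number of surveys (21, 20, ..., 1).
table = dict(zip(range(21, 0, -1), cumulative_pct))
print(total)                            # 5083
print(table[21], table[20], table[19])  # 30.59 39.94 46.17
```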



2.8 Sample Weights

This section is divided into a description of the procedures used to develop sample weights and a discussion of the practical application of these weights. Before using NLS data in an analysis, the user should consult the practical usage discussion below to determine when weighting of data is appropriate. Sample-based weights are designed to reflect the underlying population in the year in which the cohort was initially surveyed. Individual weights are assigned after each interview; these weights produce group estimates that are demographically representative of each cohort's base-year population when used in tabulations. Sampling weights for each respondent can be found on the corresponding public data release.

Base-Year Sampling Weights

Population data derived from the NLS are based on multi-stage ratio estimates. The first step was to assign each sample case a basic weight consisting of the reciprocal of the final probability of selection. This probability reflects the differential sampling by race within each stratum. The base-year weights for all those interviewed were adjusted to account for the overrepresentation of blacks in the sample as well as for persons selected after screening who were not interviewed in the initial survey. This adjustment was made separately for each of 16 groupings for the Mature Women, based on the four Census regions (Northeast, North Central, South, and West), urban/rural residence, and race (non-black/black).
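
As a rough illustration of the first step, the Python sketch below enumerates the 16 adjustment groupings and computes a basic weight as the reciprocal of a selection probability. The probabilities shown are hypothetical, chosen only to show that a group sampled at a higher rate receives a smaller basic weight; the actual selection probabilities are not given in this guide.

```python
from itertools import product

# The 16 base-year adjustment groupings: region x residence x race.
REGIONS = ["Northeast", "North Central", "South", "West"]
RESIDENCE = ["urban", "rural"]
RACE = ["non-black", "black"]
groupings = list(product(REGIONS, RESIDENCE, RACE))


def basic_weight(selection_probability):
    """Basic weight: the reciprocal of the final probability of selection."""
    if not 0 < selection_probability <= 1:
        raise ValueError("selection probability must be in (0, 1]")
    return 1.0 / selection_probability


# Hypothetical probabilities, for illustration only. A respondent from a
# group sampled at a higher rate represents fewer people.
weight_oversampled = basic_weight(0.002)   # roughly 500 people
weight_other = basic_weight(0.0005)        # roughly 2000 people

print(len(groupings))  # 16
```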

The first stage of ratio weight adjustment took into account differences, at the time of the 1960 Census, between the distribution by race and residence of the population as estimated from the sample PSUs and that of the total population in each of the four major regions of the country. Using 1960 Census data, estimated population totals by race and residence for each region were computed by appropriately weighting the Census counts for PSUs in the sample. Ratios were then computed between these estimates (based on sample PSUs) and the actual population totals for the region as shown by the 1960 Census.

In the second stage ratio adjustment, sample proportions were adjusted to independent current estimates of the civilian noninstitutionalized population by age, sex, and race. These estimates were prepared by carrying forward the most recent Census data (1960) to take account of subsequent aging of the population, mortality, and migration between the United States and other countries (Census Bureau 1966). The adjustment was made by race within three age groups.
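
The general idea of a ratio adjustment to independent population estimates can be sketched in Python as follows. This is a simplified illustration, not the Census Bureau's procedure: the cell labels, weights, and control totals are all hypothetical, and the function name `ratio_adjust` is an invention for this example.

```python
def ratio_adjust(weights_by_cell, control_totals):
    """Scale the weights in each cell so that they sum to the control total."""
    adjusted = {}
    for cell, weights in weights_by_cell.items():
        # Ratio of the independent estimate to the weighted sample estimate.
        factor = control_totals[cell] / sum(weights)
        adjusted[cell] = [w * factor for w in weights]
    return adjusted


# Hypothetical cells (race x age group) with hypothetical weights.
weights_by_cell = {
    ("non-black", "age group 1"): [2000.0, 2100.0, 1900.0],
    ("black", "age group 1"): [500.0, 450.0],
}
# Hypothetical independent population estimates for the same cells.
control_totals = {
    ("non-black", "age group 1"): 6600.0,
    ("black", "age group 1"): 1045.0,
}

adjusted = ratio_adjust(weights_by_cell, control_totals)
for cell, weights in adjusted.items():
    # After adjustment, each cell's weights sum to its control total.
    print(cell, round(sum(weights), 2))
```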

Sampling Weight Nonresponse Adjustment

Since the initial interview, reductions in sample size have occurred due to noninterviews. To compensate for these losses, the sampling weights of the individuals who were interviewed are revised. The Mature Women cohort is a panel of individuals into which no new individuals were added after the base year. As a result, all reweighting after the initial survey was calibrated to base-year population parameters. This revision was done in two stages. First, out-of-scope noninterviews in each year were identified by the Census Bureau and eliminated from the sample of noninterviews. This group consisted of individuals who were institutionalized, had died, were members of the armed services, or had moved outside the United States; that is, individuals who were no longer members of the U.S. noninstitutionalized civilian population. (Note: In 2003, an attempt was made to interview some of the institutionalized respondents.)

The second stage in the adjustment acknowledges the possible nonrepresentative characteristics of the in-scope interviews. For each survey year, those who are eligible but not interviewed, as well as those who are interviewed, were distributed into 24 nonresponse adjustment cells based on race (black and non-black), length of residence in the United States at first interview (nine or fewer years, ten or more years, N/A), and education (N/A, eight or fewer years, nine to eleven years, twelve or more years) reported in 1967. Within each of the cells, the base-year sampling weights of those interviewed were increased by a factor equal to the reciprocal of the reinterview rate (using base-year weights) in that year.
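
The cell-level adjustment described above can be sketched in Python as follows. The respondent records are hypothetical, and assigning a weight of zero to in-scope respondents who were not interviewed is an assumption of this sketch rather than something stated in the text.

```python
from collections import defaultdict


def nonresponse_adjusted_weights(respondents):
    """Adjust base-year weights within nonresponse cells.

    `respondents` holds in-scope cases only; each is a dict with the keys
    "id", "cell", "base_weight", and "interviewed".
    """
    in_scope_total = defaultdict(float)
    interviewed_total = defaultdict(float)
    for r in respondents:
        in_scope_total[r["cell"]] += r["base_weight"]
        if r["interviewed"]:
            interviewed_total[r["cell"]] += r["base_weight"]

    adjusted = {}
    for r in respondents:
        if r["interviewed"]:
            # Reciprocal of the weighted reinterview rate in the cell.
            factor = in_scope_total[r["cell"]] / interviewed_total[r["cell"]]
            adjusted[r["id"]] = r["base_weight"] * factor
        else:
            adjusted[r["id"]] = 0.0  # assumption: noninterviews get zero
    return adjusted


# Hypothetical in-scope respondents sharing one adjustment cell.
cell = ("black", "ten or more years", "twelve or more years")
respondents = [
    {"id": 1, "cell": cell, "base_weight": 100.0, "interviewed": True},
    {"id": 2, "cell": cell, "base_weight": 200.0, "interviewed": True},
    {"id": 3, "cell": cell, "base_weight": 100.0, "interviewed": False},
]

weights = nonresponse_adjusted_weights(respondents)
# The interviewed cases now carry the full in-scope weight of the cell.
print(round(sum(weights.values()), 2))  # 400.0
```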

In 1991, CHRR began investigating the effects of differential nonresponse on sampling weights as then calculated. The original weighting routine was designed to minimize an increase in variance caused by large weights for individuals with certain characteristics. One effect of this original procedure was that certain subsegments of the sample were assigned identical sampling weights. CHRR adjusted the weights to avoid this problem.

Practical Usage

The Mature Women sample was based upon stratified, multi-stage random samples with an oversample of blacks. Each case in each interview year was assigned a weight specific to that year. This weight can be interpreted as an estimate of the number of people in the corresponding population that the individual in the sample represents. This section discusses some ramifications of the weights when used for data analysis.

To tabulate characteristics of the sample (i.e., sample means, totals, or proportions) for a single interview year in order to describe the population being represented, it is necessary to weight the observations using the weights provided. For example, to estimate the average hours worked in 1987 by women age 30-44 as of March 31, 1967, researchers would simply use the weighted average of hours worked, where weight is the 1987 sample weight. These weights are approximately correct when used in this way, with item nonresponse possibly generating small errors. Other applications for which users may wish to apply weighting, but for which the application of weights may not produce the intended result, include:
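
A weighted average of this kind can be computed as in the Python sketch below. The hours and weights are made-up values for three respondents, not NLS data; in practice the weights would be the sample weights for the survey year being tabulated.

```python
def weighted_mean(values, weights):
    """Weighted average: sum(value * weight) / sum(weight)."""
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights sum to zero")
    return sum(v * w for v, w in zip(values, weights)) / total_weight


# Hypothetical hours worked and sample weights for three respondents.
hours = [40.0, 20.0, 30.0]
sample_weights = [1000.0, 3000.0, 1000.0]

unweighted = sum(hours) / len(hours)
weighted = weighted_mean(hours, sample_weights)

print(unweighted)  # 30.0
print(weighted)    # 26.0
```

Because the weights appear in both the numerator and the denominator, rescaling every weight by the same constant leaves a weighted mean unchanged; the scale matters only when estimating population totals.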

Samples Generated by Dropping Observations with Item Nonresponses: Users often confine their analysis to subsamples of respondents who provided valid answers to certain questions. In this case, a weighted mean will not represent the entire population, but rather those persons in the population who would have given a valid response to the specified questions. Item nonresponse because of refusals, don't knows, or invalid skips is usually quite small, so the resulting error in the weights is probably quite small as well. When item nonresponse affects only a small proportion of cases for the variables under analysis, population estimates (i.e., weighted sample means, medians, and proportions) should be reasonably accurate. However, population estimates based on data items that have relatively high nonresponse rates, such as family income, may not necessarily be representative of the underlying population of the cohort.

Data from Multiple Waves: Because the weights are specific to a single wave of the study, and because respondents occasionally missed an interview but were contacted in a subsequent wave, a problem similar to item nonresponse arises when the data are used longitudinally. In addition, the weights for a respondent in different years may occasionally be quite dissimilar, leaving the user uncertain about which weight is appropriate. In principle, if a user wished to apply weights to multiple wave data, weights would have to be recomputed based upon the persons for whom complete data are available. If the sample is limited to respondents interviewed in a terminal or end point year, the weight for that year can be used. Users with a more complex sample selection often can obtain reasonably accurate results by using the base-year weights.

Regression Analysis: A common question is whether one should use the provided weights to perform weighted least squares when doing regression analysis. Such a course of action may lead to incorrect estimates. If particular groups follow significantly different regression specifications, the preferred method of analysis is to estimate a separate regression for each group or to use dummy (or indicator) variables to specify group membership. If one wishes to compute the population average effect of, for example, education upon earnings, one may simply compute the weighted average of the regression coefficients obtained for each group, using the sum of the weights for the persons in each group as the weights to be applied to the coefficients. While least squares is an estimator that is linear in the dependent variable, it is nonlinear in explanatory variables, so weighting the observations will generate different results than taking the weighted average of the regression coefficients for the groups. The process of stratifying the sample into groups thought to have different regression coefficients and then testing for equality of coefficients across groups using an F-test is described in most statistics texts.

Researchers unsure of the appropriate grouping may wish to consult a statistician or other person knowledgeable about the data set before specifying the regression model. Note that if subgroups have different regression coefficients, a regression on a random sample of the population would be misspecified.
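
The group-by-group approach described above can be sketched in Python with a one-variable regression. The data are hypothetical and deliberately tiny, the group labels are placeholders, and the helper `ols_slope` is written out by hand only to keep the example self-contained; real analyses would use a statistical package.

```python
def ols_slope(x, y):
    """Least-squares slope of y on x (one regressor plus an intercept)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


# Hypothetical groups: education (x), earnings (y), and sample weights.
groups = {
    "group A": {
        "x": [10.0, 12.0, 14.0, 16.0],
        "y": [20.0, 24.0, 28.0, 32.0],
        "weights": [750.0, 750.0, 750.0, 750.0],
    },
    "group B": {
        "x": [8.0, 10.0, 12.0, 14.0],
        "y": [10.0, 11.0, 12.0, 13.0],
        "weights": [250.0, 250.0, 250.0, 250.0],
    },
}

# Step 1: estimate a separate (unweighted) regression for each group.
slopes = {name: ols_slope(g["x"], g["y"]) for name, g in groups.items()}

# Step 2: average the coefficients, weighting each group's coefficient
# by the sum of the sample weights of the persons in that group.
group_weight = {name: sum(g["weights"]) for name, g in groups.items()}
population_effect = sum(
    slopes[name] * group_weight[name] for name in groups
) / sum(group_weight.values())

print(slopes)             # {'group A': 2.0, 'group B': 0.5}
print(population_effect)  # 1.625
```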

Custom Weighting Program

Currently, every Mature Women survey contains a created variable that is the respondent's cross-sectional weight. Using these weights provides a simple method for users to correct the raw data for the effects of oversampling of blacks and the initial clustering of respondents at the survey's beginning. Unfortunately, while each set of weights provides an accurate adjustment for any single year, none of the weights provides an accurate method of adjusting multiple years' worth of data. Users analyzing more than one year of Mature Women data should use longitudinal weights, which improve a researcher's ability to accurately calculate summary statistics from multiple years of data.

Users can create longitudinal weights for the Mature Women through two avenues: by going to the Bureau of Labor Statistics Web site at www.bls.gov/nls or through the NLS Investigator at www.nlsinfo.org/investigator. For the BLS Web site, users first select the cohort-specific link they need (Mature Women, in this case) then choose the "custom weights" option to bring up the Custom Weighting program.

To create a set of custom weights, users provide an email address, select the survey years corresponding to their research, and click the "Create Custom Weights" button. The custom weighting program will generate a set of longitudinal weights and email users a compressed file in WinZip format. The resulting file contains two columns of data, separated by a blank space. The first column is the public identification (ID) number of each Mature Women respondent. The second column is the weight. If the respondent did not participate in every survey selected, she is given a weight of zero. If the respondent did participate in every survey selected, she is given a positive longitudinal weight.

The Web Investigator method of generating custom weights works in a similar fashion. The Investigator has a Custom Weights option button on the main menu page that prompts users to choose a cohort and the desired years, then generates the weights. Results are sent to the user's Investigator account in a compressed file in WinZip format.

User Notes: Like the cross-sectional weights in the data file, the longitudinal weights have two implied decimal places. Before using either type of weight, divide it by 100 to determine how many people each respondent represents.
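
A custom weights file of the kind described above (two space-separated columns, ID then weight, with two implied decimal places) could be read with a short Python routine like the one below. The sample lines are invented for illustration, and the function name `parse_custom_weights` is hypothetical; the sketch reads from a list of strings so that it does not assume any particular file name.

```python
def parse_custom_weights(lines):
    """Return {public ID: number of people represented}.

    Respondents with a weight of zero (those who did not participate in
    every selected survey) are left out of the result.
    """
    weights = {}
    for line in lines:
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        public_id = int(parts[0])
        raw_weight = float(parts[1])
        if raw_weight > 0:
            # Two implied decimal places: divide by 100.
            weights[public_id] = raw_weight / 100
    return weights


# Invented sample lines in the two-column layout.
sample_lines = ["1 123456", "2 0", "3 98765"]
custom_weights = parse_custom_weights(sample_lines)

print(custom_weights)  # {1: 1234.56, 3: 987.65}
```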

The custom weighting program is a version of the program that has been used to create the cross-sectional weights for the Original Cohorts since the 1990s. The primary difference between the cross-sectional and longitudinal weighting programs is in how the list of respondents is created. In the cross-sectional case, the weighting program is given a list of all people who participated in a particular survey round. In the longitudinal case, the weighting program creates a "dummy" survey round in which the user specifies who participated and who did not. This "dummy" round is based on the set of surveys selected. The program then determines which respondents participated in every survey round chosen by the researcher and uses that list to generate weights.
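
The participation rule for the "dummy" round amounts to a set intersection: a respondent counts as a participant only if she was interviewed in every selected survey year. A minimal Python sketch, using invented respondent IDs and a hypothetical function name:

```python
def longitudinal_participants(interviewed_by_year, selected_years):
    """IDs of respondents interviewed in every selected survey year."""
    id_sets = [set(interviewed_by_year[year]) for year in selected_years]
    if not id_sets:
        return set()
    return set.intersection(*id_sets)


# Invented interview records: survey year -> IDs interviewed that year.
interviewed_by_year = {
    1967: {1, 2, 3, 4},
    1987: {1, 2, 4},
    2003: {1, 4},
}

participants = longitudinal_participants(interviewed_by_year, [1967, 1987, 2003])
print(sorted(participants))  # [1, 4]
```

Respondents outside this set would receive a longitudinal weight of zero, as described in the discussion of the output file.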

User Notes: Researchers who need to weight very specific lists of respondents and who are comfortable modifying SAS code can ask NLS User Services for the custom weighting code used by the Web site. Using this custom weighting code, researchers can input the ID numbers of all respondents for whom they need weights generated.

The Original Cohorts weighting is derived from the base-year weights via a two-step process. First, all out-of-scope noninterviews (respondents who have died, been institutionalized, or moved outside the U.S.) are eliminated from the pool of respondents who are classified as noninterviews. Second, those who are in-scope, whether or not they complete an interview, are distributed into 24 cells based on race (black/non-black), length of residence at the time of the first interview (nine or fewer years, ten or more years, or unknown), and education (eight or fewer years, nine to eleven years, twelve or more years, or unknown).

These cells are then examined to see whether any of them has too few respondents. If a cell has too few respondents, it is collapsed with an adjoining cell. Once the optimal number of cells is created, all of the weights associated with respondents in a particular cell are totaled. These totals are then divided to create an adjustment factor. This adjustment factor is then multiplied by each respondent's base-year weight, which results in the custom longitudinal weight for a respondent.
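
The cell-collapsing step can be sketched in Python as below. The text does not say how small is "too few" or which cell counts as "adjoining", so both are assumptions here: the sketch uses a hypothetical `min_size` threshold and treats neighbors in an ordered list as adjoining cells. The respondent IDs are invented.

```python
def collapse_small_cells(cells, min_size):
    """Merge undersized cells with a neighbor in an ordered list of cells.

    An undersized cell absorbs the cell that follows it; an undersized
    cell at the end of the list is folded into the cell before it.
    """
    merged = []
    for cell in cells:
        if merged and len(merged[-1]) < min_size:
            # The previous cell is still too small: absorb this one.
            merged[-1] = merged[-1] + list(cell)
        else:
            merged.append(list(cell))
    # A trailing undersized cell has no successor; fold it backward.
    if len(merged) > 1 and len(merged[-1]) < min_size:
        last = merged.pop()
        merged[-1] = merged[-1] + last
    return merged


# Invented respondent IDs grouped into four ordered cells.
cells = [[1, 2, 3, 4, 5], [6], [7, 8, 9, 10], [11, 12]]
collapsed = collapse_small_cells(cells, min_size=3)

print([len(c) for c in collapsed])  # [5, 7]
```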

References

Census Bureau. Current Population Reports. Series P-25, No. 352, November 18, 1966.

Parnes, Herbert S.; Shea, John R.; Spitz, Ruth S.; Zeller, Frederick A. Dual Careers, Volume 1: A Longitudinal Study of Labor Market Experience of Women. Manpower Research Monograph 21, vol. 1. Washington, DC: U.S. Government Printing Office, 1970.

Rhoton, Patricia. "Attrition and the National Longitudinal Surveys of Labor Market Experience: Avoidance, Control and Correction." Columbus, OH: CHRR, The Ohio State University, 1984.

Rhoton, Patricia and Nagi, Karima. "Attrition by Wealth in the Original NLS Cohorts." Columbus, OH: CHRR, The Ohio State University, 1991.

Shea, John R.; Roderick, Roger D.; Zeller, Frederick A.; Kohen, Andrew I. Years for Decision, Volume 1: A Longitudinal Study of the Educational and Labor Market Experience of Young Women. Manpower Research Monograph 24, vol. 1. Washington, DC: U.S. Government Printing Office, 1971.

Zagorsky, Jay and Rhoton, Pat. "Attrition and the National Longitudinal Surveys' Mature Women Cohort." Columbus, OH: CHRR, The Ohio State University, 1998.