Chapter 2 continued: Sample Design and Attrition

2.7 Sample Representativeness and Attrition

The retention rate for the Young Women as of the 2001 interview was 54.4 percent, or 2,806 of the original 5,159 respondents. Retention rate is defined as the percent of base-year respondents who were interviewed in any given survey year; included in the calculations are deceased and other out-of-scope respondents (see Table 2.6.3 for definitions).

An analysis of selected characteristics of respondents interviewed in the tenth year samples of the Original Cohorts found that noninterviews had not seriously distorted the sample representativeness of any of the cohorts for the characteristics studied (Rhoton 1984). A second analysis of differential attrition among wealthy and non-wealthy subsamples of each of the four Original Cohorts found that non-wealthy respondents of each cohort showed a consistent tendency toward greater attrition (Rhoton and Nagi 1991). Among the three younger cohorts, almost all of the difference between wealthy and non-wealthy subsamples is accounted for by attrition reasons other than the death of the respondent. In a more recent analysis, Zagorsky and Rhoton (1998) concluded that respondents with lower socio-economic status attrited at a higher rate than those with higher income and educational attainment. Further, the authors found that white respondents were more likely to remain in the survey than blacks and those of other races.

For year-by-year retention rates, consult Table 2.4.1 in the “Interview Schedule & Fielding Periods” section of this chapter. In Table 2.7.1, the percentage of sampled respondents of each race is presented for the base survey year (1968) and the most recent interview year for which data are available. This table also provides information on numbers of deceased respondents by race.
Figure 2.7.1 characterizes the percentage of the original sample, by race, who have been interviewed at each survey point.

Table 2.7.1 Sample Characteristics by Race: 1968 and 2001
Figure 2.7.1 Interview Completion Rates among Living Respondents by Race and Survey Year

Finally, Table 2.7.2 presents the number of interviews completed by respondents, broken down by race. In this table, the “number who completed” columns show how many respondents completed exactly that number of surveys. The “cumulative %” columns show a cumulative total percent of those completing at least a given number of surveys rather than a percentage of those completing an exact number of surveys.

Table 2.7.2 Number of Interviews Respondents Completed out of 21 Surveys, by Race: 1968-2001
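The relationship between the “number who completed” and “cumulative %” columns can be illustrated with a short sketch. The counts below are hypothetical and are not taken from Table 2.7.2; the function name is likewise only illustrative. The sketch assumes that the percentage base is the total of all the exact counts.

```python
def cumulative_at_least(exact_counts):
    """Convert counts of respondents who completed exactly k interviews
    into the percent who completed at least k interviews.

    exact_counts: dict mapping number of interviews -> number of respondents
    (hypothetical input; the percentage base is assumed to be the sum of all counts).
    """
    total = sum(exact_counts.values())
    running = 0
    result = {}
    # Accumulate from the largest number of interviews downward,
    # so that each entry covers "k or more" interviews.
    for k in sorted(exact_counts, reverse=True):
        running += exact_counts[k]
        result[k] = 100.0 * running / total
    return result


# Hypothetical counts for a cohort surveyed only three times.
example = {3: 50, 2: 30, 1: 20}
for k, pct in cumulative_at_least(example).items():
    print(f"at least {k} interviews: {pct:.1f}%")
```

Because every base-year respondent completed at least one interview, the cumulative percent for one interview is always 100 under this assumption.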
2.8 Sample Weights

This section is divided into a description of the procedures used to develop sample weights and a discussion of the practical application of these weights. Before using NLS data in an analysis, the user should consult the practical usage discussion below to determine when weighting of data is appropriate.

Sample-based weights are designed to reflect the underlying population in the year in which the cohort was initially surveyed. Individual weights are assigned after each interview; these weights produce group estimates that are demographically representative of each cohort’s base-year population when used in tabulations. Sampling weights for each respondent can be found on the corresponding public data release.

Base-Year Sampling Weights

Population data derived from the NLS are based on multi-stage ratio estimates. The first step was to assign each sample case a basic weight consisting of the reciprocal of the final probability of selection. This probability reflects the differential sampling by race within each stratum. The base-year weights for all those interviewed were adjusted to account for the overrepresentation of blacks in the sample as well as for persons selected after screening who were not interviewed in the initial survey. This adjustment was made separately for each of 24 groupings for the Young Women, based on the four Census regions (Northeast, North Central, South, and West), race (non-black/black), and three place of residence groupings (urban, rural farm, and rural non-farm).

In the first stage of ratio weight adjustment, differences at the time of the 1960 Census between the distribution by race and residence of the population as estimated from the sample PSUs and that of total population in each of the four major regions of the country were taken into account.
Using 1960 Census data, estimated population totals by race and residence for each region were computed by appropriately weighting the Census counts for PSUs in the sample. Ratios were then computed between these estimates (based on sample PSUs) and the actual population totals for the region as shown by the 1960 Census.

In the second stage ratio adjustment, sample proportions were adjusted to independent current estimates of the civilian noninstitutionalized population by age, sex, and race. These estimates were prepared by carrying forward the most recent Census data (1960) to take account of subsequent aging of the population, mortality, and migration between the United States and other countries (Census Bureau 1966). The adjustment was made by race within five age groups.

Sampling Weight Nonresponse Adjustment

Since the initial interview, reductions in sample size have occurred due to noninterviews. To compensate for these losses, the sampling weights of the individuals who were interviewed are revised. The Young Women cohort is a panel of individuals into which no new individuals were added after the base year. As a result, all reweighting after the initial survey is calibrated to base-year population parameters.

This revision is done in two stages. First, out-of-scope noninterviews in each year are identified by the Census Bureau and eliminated from the sample of noninterviews. This group consists of individuals who are institutionalized, have died, are members of the armed services, or have moved outside the United States; that is, individuals who are no longer members of the U.S. noninstitutionalized civilian population. The second stage in the adjustment acknowledges the possible nonrepresentative characteristics of the in-scope interviews.
For each survey year, those who are eligible but not interviewed, as well as those who are interviewed, are distributed into 30 nonresponse adjustment cells based on race (black and non-black), length of residence in the United States at first interview (nine or fewer years, ten or more years, N/A) and father’s occupation (white collar, service, blue collar, farm, N/A) reported in 1968. Within each of the cells, the base-year sampling weights of those interviewed are increased by a factor equal to the reciprocal of the reinterview rate (using base-year weights) in that year.

In 1991, CHRR began investigating the effects of differential nonresponse on sampling weights as then calculated. The original weighting routine was designed to minimize an increase in variance caused by large weights for individuals with certain characteristics. One effect of this procedure was that certain subsegments of the sample were assigned identical sampling weights. CHRR adjusted the weights to avoid this problem.

Practical Usage

The Young Women sample is based upon stratified, multi-stage random samples with an oversample of blacks. Each case in each interview year is assigned a weight specific to that year. This weight can be interpreted as an estimate of the number of people in the corresponding population that the individual in the sample represents. This section discusses some ramifications of the weights when used for data analysis.

To tabulate characteristics of the sample (i.e., sample means, totals, or proportions) for a single interview year in order to describe the population being represented, it is necessary to weight the observations using the weights provided. For example, to estimate the average hours worked in 1987 by women age 14–24 as of December 31, 1967, researchers would simply use the weighted average of hours worked, where weight is the 1987 sample weight.
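As a minimal sketch of the weighted average just described, the following computes a weighted mean of hours worked. The hours and weights are hypothetical values chosen for illustration, not NLS data, and the function name is an assumption. The sketch also assumes that respondents with a weight of zero (those not interviewed in the year in question) are simply excluded.

```python
def weighted_mean(values, weights):
    """Weighted mean of values, using one survey-year weight per respondent.

    Respondents with a zero weight are skipped (an assumption of this sketch:
    a zero weight marks someone not interviewed in that year).
    """
    numerator = 0.0
    denominator = 0.0
    for value, weight in zip(values, weights):
        if weight <= 0:
            continue
        numerator += value * weight
        denominator += weight
    if denominator == 0:
        raise ValueError("no respondents with positive weight")
    return numerator / denominator


# Hypothetical hours worked and 1987 sample weights for three respondents.
hours = [40, 20, 0]
weights_1987 = [1500.0, 2500.0, 1000.0]

print(weighted_mean(hours, weights_1987))   # weighted estimate
print(sum(hours) / len(hours))              # unweighted sample mean, for comparison
```

In this made-up example the weighted and unweighted means differ because the respondent with the largest weight reports fewer hours than the sample average.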
These weights are approximately correct when used in this way, with item nonresponse possibly generating small errors. Other applications for which users may wish to apply weighting, but for which the application of weights may not produce the intended result, include:

Samples Generated by Dropping Observations with Item Nonresponses: Often users confine their analysis to subsamples of respondents who provided valid answers to certain questions. In this case, a weighted mean will not represent the entire population, but rather those persons in the population who would have given a valid response to the specified questions. Item nonresponse because of refusals, don’t knows, or invalid skips is usually quite small, so the degree to which the weights are incorrect is probably quite small. In the event that item nonresponse constitutes a small proportion of the variables under analysis, population estimates (i.e., weighted sample means, medians, and proportions) would be reasonably accurate. However, population estimates based on data items that have relatively high nonresponse rates, such as family income, may not necessarily be representative of the underlying population of the cohort.

Data from Multiple Waves: Because the weights are specific to a single wave of the study, and because respondents occasionally miss an interview but are contacted in a subsequent wave, a problem similar to item nonresponse arises when the data are used longitudinally. In addition, the weights for a respondent in different years may occasionally be quite dissimilar, leaving the user uncertain about which weight is appropriate. In principle, if a user wished to apply weights to multiple wave data, weights would have to be recomputed based upon the persons for whom complete data are available. If the sample is limited to respondents interviewed in a terminal or end point year, the weight for that year can be used.
Users with a more complex sample selection often can obtain reasonably accurate results by using the base-year weights.

Regression Analysis: A common question is whether one should use the provided weights to perform weighted least squares when doing regression analysis. Such a course of action may lead to incorrect estimates. If particular groups follow significantly different regression specifications, the preferred method of analysis is to estimate a separate regression for each group or to use dummy (or indicator) variables to specify group membership. If one wishes to compute the population average effect of, for example, education upon earnings, one may simply compute the weighted average of the regression coefficients obtained for each group, using the sum of the weights for the persons in each group as the weights to be applied to the coefficients. While least squares is an estimator that is linear in the dependent variable, it is nonlinear in explanatory variables, so weighting the observations will generate different results than taking the weighted average of the regression coefficients for the groups. The process of stratifying the sample into groups thought to have different regression coefficients and then testing for equality of coefficients across groups using an F-test is described in most statistics texts. Researchers unsure of the appropriate grouping may wish to consult a statistician or other person knowledgeable about the data set before specifying the regression model. Note that if subgroups have different regression coefficients, a regression on a random sample of the population would be misspecified.

Custom Weighting Program

Currently, every Young Women survey contains a created variable that is the respondent’s cross-sectional weight. Using these weights provides a simple method for users to correct the raw data for the effects of over-sampling of blacks and the initial clustering of respondents at the survey’s beginning.
Unfortunately, while each set of weights provides an accurate adjustment for any single year, none of the weights provides an accurate method of adjusting multiple years’ worth of data. Users analyzing more than one year of Young Women data should use longitudinal weights, which improve a researcher’s ability to accurately calculate summary statistics from multiple years of data.

Users can create longitudinal weights for the Young Women by going to the Bureau of Labor Statistics Web site at http://www.bls.gov/nls. This page provides links for data and documentation on the original cohorts. Picking “Create custom weights for the Young Women” on this page brings up the Young Women Custom Weighting program, which is shown in Figure 2.8.1.
Figure 2.8.1: Young Women Custom Weighting Program
To create a set of custom weights, first type in your email address, then select the survey years corresponding to your research and pick the “Create Custom Weights” button. The NLS server will generate a set of longitudinal weights and email you a compressed file in WinZip format. The resulting file contains two columns of data, with the columns separated by a blank space. The first column is the public identification (ID) number of each Young Women respondent. The second column is the weight. A respondent who did not participate in every selected survey is given a weight of zero. A respondent who participated in all of them is given a positive longitudinal weight.

User Notes: Researchers should note that, like the cross-sectional weights in the data file, the longitudinal weights have two implied decimal places. This means that before using either type of weight you should divide the number by 100 to know how many people each respondent represents.

The custom weighting program is an Internet version of the program used to create the cross-sectional weights for the original cohorts during the 1990s. The primary difference between the cross-sectional and longitudinal weighting programs is in how the list of respondents is created. In the cross-sectional case the weighting program is given a list of all people who participated in a particular survey round. In the longitudinal case the weighting program creates a “dummy” survey round where the user specifies who participated and who did not. This “dummy” round is based on the set of surveys selected. It then calculates which respondents participated in every survey round chosen by the researcher and uses that list to generate weights.

User Notes: Researchers needing to weight very specific lists of respondents and who are comfortable modifying SAS code can ask NLS user services for the custom weighting code used by the Web site.
Using this custom weighting code, researchers can input the ID numbers of all respondents for whom they need weights generated.

The original cohorts weighting is derived from the base-year weights via a two-step process. First, all out-of-scope noninterviews (respondents who have died, been institutionalized, or moved outside the U.S.) are eliminated from the pool of respondents who are classified as noninterviews. Second, those who are in-scope, whether or not they do an interview, are distributed into 24 cells based on race (black/non-black), length of residence at the time of the first interview (nine or fewer years, ten or more years, or unknown) and education (eight or fewer years, nine to eleven years, twelve or more years, or unknown). These cells are then examined to see whether any have too few respondents. If a cell has too few respondents, it is collapsed with an adjoining cell. Once the optimal number of cells is created, all of the weights associated with respondents in a particular cell are totaled. These totals are then divided to create an adjustment factor. This adjustment factor is then multiplied by each respondent’s base-year weight, which results in the custom longitudinal weight for a respondent.
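The two-step process above can be sketched roughly as follows. This is not the SAS code used by the Web site. It assumes that the adjustment factor for a cell is the total base-year weight of all in-scope respondents in the cell divided by the total base-year weight of those interviewed, which matches the "reciprocal of the reinterview rate" described earlier in this section. Cell collapsing is omitted, and the record layout and function name are hypothetical.

```python
from collections import defaultdict


def adjust_weights(respondents):
    """Return {id: adjusted weight} from base-year weights.

    respondents: list of dicts with hypothetical keys
      'id', 'cell', 'base_weight', 'in_scope', 'interviewed'.
    Assumed factor per cell: (sum of in-scope base weights) /
                             (sum of interviewed base weights).
    Cell collapsing for sparse cells is not implemented in this sketch.
    """
    in_scope_total = defaultdict(float)
    interviewed_total = defaultdict(float)

    # Step 1: drop out-of-scope noninterviews; total the weights by cell.
    for r in respondents:
        if not r["in_scope"]:
            continue
        in_scope_total[r["cell"]] += r["base_weight"]
        if r["interviewed"]:
            interviewed_total[r["cell"]] += r["base_weight"]

    # Step 2: scale each interviewed respondent's base-year weight.
    adjusted = {}
    for r in respondents:
        if r["in_scope"] and r["interviewed"]:
            factor = in_scope_total[r["cell"]] / interviewed_total[r["cell"]]
            adjusted[r["id"]] = r["base_weight"] * factor
        else:
            adjusted[r["id"]] = 0.0  # not interviewed: weight of zero
    return adjusted


# Hypothetical records in a single cell "A".
sample = [
    {"id": 1, "cell": "A", "base_weight": 100.0, "in_scope": True, "interviewed": True},
    {"id": 2, "cell": "A", "base_weight": 300.0, "in_scope": True, "interviewed": True},
    {"id": 3, "cell": "A", "base_weight": 100.0, "in_scope": True, "interviewed": False},
    {"id": 4, "cell": "A", "base_weight": 200.0, "in_scope": False, "interviewed": False},
]
print(adjust_weights(sample))
```

Under this assumption the adjusted weights of the interviewed respondents in a cell sum to the total base-year weight of everyone in scope in that cell, so the interviewed respondents stand in for the in-scope noninterviews.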
References

Census Bureau. Current Population Reports. Series P-25, No. 352, November 18, 1966.

Parnes, Herbert S.; Shea, John R.; Spitz, Ruth S.; and Zeller, Frederick A. Dual Careers, Volume 1: A Longitudinal Study of Labor Market Experience of Women. Manpower Research Monograph 21, vol. 1. Washington, DC: U.S. Government Printing Office, 1970.

Rhoton, Patricia. “Attrition and the National Longitudinal Surveys of Labor Market Experience: Avoidance, Control and Correction.” Columbus, OH: CHRR, The Ohio State University, 1984.

Rhoton, Patricia and Nagi, Karima. “Attrition by Wealth in the Original NLS Cohorts.” Columbus, OH: CHRR, The Ohio State University, 1991.

Shea, John R.; Roderick, Roger D.; Zeller, Frederick A.; and Kohen, Andrew I. Years for Decision, Volume 1: A Longitudinal Study of the Educational and Labor Market Experience of Young Women. Manpower Research Monograph 24, vol. 1. Washington, DC: U.S. Government Printing Office, 1971.

Zagorsky, Jay and Rhoton, Pat. “Attrition and the National Longitudinal Surveys’ Mature Women Cohort.” Columbus, OH: CHRR, The Ohio State University, 1998.