Appendix 4: Geographic Variable Creation

Several variables in the main data set provide information about the respondent’s area of residence. These variables permit researchers to identify key characteristics of the area without needing access to the geocode CD-ROM. Geographic variables are created using standard geocoding software (in Round 9, ESRI ArcGIS 9, ArcMap version 9.1); therefore, no programming code is provided for these variables. Instead, this document offers a brief general description of the methods used to generate them. For more information about the process of classifying a respondent’s metropolitan area or about the geographic variables in general, refer to the introduction to the Geocode Codebook Supplement or contact NLS User Services.

User Notes: Researchers should be aware that the process for geocoding respondent residences has been changed since the initial survey round.  All residence information was re-geocoded using the new, more accurate approach, so all variables are comparable across rounds.  However, researchers should not use geographic variables from old data sets in analyses; all geographic data should be taken from the newest release.

Census Region of Residence at Survey Date

Variable Created: CV_CENSUS_REGION

This variable classifies respondents as residing in one of four regions defined by the U.S. Bureau of the Census. These regions are as follows:

Census Region

Northeast

Connecticut, Maine, Massachusetts, New Hampshire, New Jersey, New York, Pennsylvania, Rhode Island, and Vermont

North Central

Illinois, Indiana, Iowa, Kansas, Michigan, Minnesota, Missouri, Nebraska, North Dakota, Ohio, South Dakota, and Wisconsin

South

Alabama, Arkansas, Delaware, District of Columbia, Florida, Georgia, Kentucky, Louisiana, Maryland, Mississippi, North Carolina, Oklahoma, South Carolina, Tennessee, Texas, Virginia, and West Virginia

West

Alaska, Arizona, California, Colorado, Hawaii, Idaho, Montana, Nevada, New Mexico, Oregon, Utah, Washington, and Wyoming
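
For users who want to reproduce the region grouping from a state name, the following is a minimal sketch and not the survey staff's procedure (the released variable is produced with geocoding software). The region labels are the Census Bureau's standard names; the dictionary and function names are hypothetical, and the numeric codes stored in CV_CENSUS_REGION are deliberately not assumed here.

```python
# Illustrative lookup only: states grouped into the four Census regions
# listed above. Names of the dictionary and function are hypothetical.
REGION_STATES = {
    "Northeast": [
        "Connecticut", "Maine", "Massachusetts", "New Hampshire",
        "New Jersey", "New York", "Pennsylvania", "Rhode Island", "Vermont",
    ],
    "North Central": [
        "Illinois", "Indiana", "Iowa", "Kansas", "Michigan", "Minnesota",
        "Missouri", "Nebraska", "North Dakota", "Ohio", "South Dakota",
        "Wisconsin",
    ],
    "South": [
        "Alabama", "Arkansas", "Delaware", "District of Columbia", "Florida",
        "Georgia", "Kentucky", "Louisiana", "Maryland", "Mississippi",
        "North Carolina", "Oklahoma", "South Carolina", "Tennessee", "Texas",
        "Virginia", "West Virginia",
    ],
    "West": [
        "Alaska", "Arizona", "California", "Colorado", "Hawaii", "Idaho",
        "Montana", "Nevada", "New Mexico", "Oregon", "Utah", "Washington",
        "Wyoming",
    ],
}

# Invert the table so each state points at its region.
STATE_TO_REGION = {
    state: region
    for region, states in REGION_STATES.items()
    for state in states
}


def census_region(state):
    """Return the region name for a state, or None if it is not listed."""
    return STATE_TO_REGION.get(state)
```

Places outside the 50 states and the District of Columbia return None in this sketch.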

MSA Status at Survey Date

Variable Created: CV_MSA

In rounds 1-7, this variable provided users with information about whether the respondent lived in the central city of the MSA, in another part of the MSA, or outside of an MSA. As defined by the Census Bureau, a central city is the major city lying within a Metropolitan Statistical Area (MSA). Initially, a variable was created using the TIGER/Line files (a database developed by the Census Bureau) to determine whether the respondent lived in an MSA. A second variable was created based on “places” data in the Maptitude program that identified whether the respondent lived in the central city. The variables were then combined to produce a single MSA/central city variable. For rounds 1-7, respondents are coded as follows:

  1. not in MSA

  2. in MSA, not central city

  3. in MSA, central city

  4. in MSA, not known

  5. not in country

Beginning in round 8, new Census standards were used in the creation of this variable. Rather than MSAs, the Census Bureau now defines Core-Based Statistical Areas (CBSAs): statistical geographic entities consisting of the county or counties associated with at least one core (urbanized area or urban cluster) with a population of at least 10,000, plus adjacent counties having a high degree of social and economic integration with the core as measured through commuting ties with the counties containing the core.

Metropolitan and micropolitan statistical areas are the two categories of CBSAs. Metropolitan statistical areas have at least one urbanized area of 50,000 or more population, plus adjacent territory that has a high degree of social and economic integration with the core as measured by commuting ties. Micropolitan statistical areas are a new set of statistical areas that have at least one urban cluster of at least 10,000 but less than 50,000 population, plus adjacent territory that has a high degree of social and economic integration with the core as measured by commuting ties. Metropolitan and micropolitan statistical areas are defined in terms of whole counties or county equivalents, including the six New England states. As of June 6, 2003, there are 362 metropolitan statistical areas and 560 micropolitan statistical areas in the United States.
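
The population thresholds above can be illustrated with a short sketch. It is a simplification: it looks only at the population of the largest core and ignores the urbanized area versus urban cluster distinction and the commuting-tie criteria for adjacent counties. The function name is hypothetical.

```python
def cbsa_category(core_population):
    """Classify a core by the population thresholds described above.

    Simplified illustration: returns "metropolitan" for a core of 50,000
    or more, "micropolitan" for at least 10,000 but less than 50,000,
    and None when the core is too small to anchor a CBSA.
    """
    if core_population >= 50_000:
        return "metropolitan"
    if core_population >= 10_000:
        return "micropolitan"
    return None
```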

The NLSY97 Round 8 uses CBSA codes updated in January 2006.

The largest city in each metropolitan or micropolitan statistical area is designated a "principal city." Additional cities qualify if specified requirements are met concerning population size and employment. The term "principal city" replaces "central city," the term used in previous standards. In the NLSY97, information about whether the respondent lives in a CBSA and whether the respondent lives within a principal city of the CBSA is combined into a single variable.  This variable, still called CV_MSA, is distinguished from the Rounds 1-7 MSA variables by a title stating that the variable uses "2000 Census standards." The Round 8 variable is coded as follows:

  1. not in CBSA
  2. in CBSA, not principal city
  3. in CBSA, principal city
  4. in CBSA, not known
  5. not in country
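
The way two pieces of information are combined into one code can be sketched as follows. This is only an illustration of the coding scheme listed above, not the geocoding procedure itself; the function and parameter names are hypothetical, and using None to represent an undetermined principal city status is an assumption of this sketch.

```python
from typing import Optional


def cv_msa_code(in_country: bool, in_cbsa: bool,
                in_principal_city: Optional[bool]) -> int:
    """Combine CBSA and principal city indicators into a single code.

    in_principal_city is None when the residence is in a CBSA but its
    principal city status could not be determined (an assumed encoding).
    """
    if not in_country:
        return 5  # not in country
    if not in_cbsa:
        return 1  # not in CBSA
    if in_principal_city is None:
        return 4  # in CBSA, not known
    return 3 if in_principal_city else 2
```

The rounds 1-7 codes follow the same pattern, with MSA in place of CBSA and central city in place of principal city.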

Rural vs. Urban

Variable Created: CV_URBAN_RURAL

Places are identified as urban or rural by the Census Bureau. Urban places are those in “urbanized areas” or “places” with a population of at least 2,500; all other areas are rural. According to the Census Bureau, about 25 percent of the U.S. population lives in rural areas. The “urbanized area” map in the geocoding software used in round 8 was derived from the 2000 Census Bureau TIGER/Line files. Respondents residing in urban areas are coded 1 and those residing in rural areas are coded 0. Census Bureau information on urban and rural places can be retrieved from the following internet site:

Users should note that this variable includes an “unknown” category, coded 2. This value is assigned to respondents whose zip code includes both urban and rural areas or whose residence cannot be identified precisely enough to classify it as urban or rural. Respondents without valid address data are assigned a value of -3, invalid skip. Respondents who live out of the country are assigned a value of -4, valid skip.
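
The full set of values for this variable can be summarized in a short sketch. The function and parameter names are hypothetical, and the order in which the two skip conditions are checked is an assumption; the document does not say which takes precedence.

```python
from typing import Optional


def cv_urban_rural_code(in_country: bool, has_valid_address: bool,
                        urban: Optional[bool]) -> int:
    """Return the urban/rural code described above.

    urban is True or False when the residence can be classified, and
    None when it cannot (for example, a zip code spanning both urban
    and rural areas).
    """
    if not in_country:
        return -4  # valid skip: lives out of the country
    if not has_valid_address:
        return -3  # invalid skip: no valid address data
    if urban is None:
        return 2   # unknown
    return 1 if urban else 0
```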

Distance to Parents' Residence

Variables Created:

Distance from Parents


Quality of Distance to Parent variables


The distance variables are created based on the respondent’s address and the address of their mother and father as reported in the locator section of the questionnaire.  The longitude and latitude locations of the respondent and their parents are determined using the geocoding software.  Distance is then created "as the crow flies"--that is, as a straight line between the residences, rather than according to actual travel routes.  Note that this variable is

In the public-use dataset, the distance variables are presented using the following categories:

0 Lives in the Same Household
1 1 to 5 miles
2 6 to 10 miles
3 11 to 30 miles
4 31 to 60 miles
5 61 to 100 miles
6 101 to 200 miles
7 201 to 400 miles
8 401 to 700 miles
9 > 700 miles

On the restricted-use Geocode CD, the exact distance is available.
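
A straight-line distance and its public-use category can be approximated as in the sketch below. The survey's distances come from the geocoding software; the haversine (great-circle) formula and the Earth radius used here are substitutes chosen for illustration. The treatment of fractional distances is also an assumption: a distance that falls between two listed ranges (for example, 5.5 miles) is placed in the higher category, and any distance under 1 mile for separate households is placed in category 1. All names are hypothetical.

```python
import math
from bisect import bisect_left

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius; an assumption of this sketch


def straight_line_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    # Clamp to 1.0 to guard against floating-point overshoot.
    return 2 * EARTH_RADIUS_MILES * math.asin(min(1.0, math.sqrt(a)))


# Upper bounds, in miles, of categories 1 through 8; category 9 is > 700.
UPPER_BOUNDS = [5, 10, 30, 60, 100, 200, 400, 700]


def distance_category(miles, same_household=False):
    """Map a distance in miles to the public-use category (0 to 9)."""
    if same_household:
        return 0
    return bisect_left(UPPER_BOUNDS, miles) + 1
```

For example, `distance_category(straight_line_miles(lat_r, lon_r, lat_p, lon_p))` would categorize the distance between a respondent at (`lat_r`, `lon_r`) and a parent at (`lat_p`, `lon_p`).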

Some reported addresses are incomplete, so the exact street address cannot be determined. These addresses are assigned the longitude and latitude of the center (centroid) of the zip code in which they are located. The quality variables for the distance from parents' residence variables alert users if either the respondent's or the parent's residence was located in this way ("zip centroided"):

  1. Neither respondent nor parent is zip centroided
  2. Respondent is zip centroided
  3. Parent is zip centroided
  4. Both respondent and parent are zip centroided
  5. Respondent and/or parent at a foreign location
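
The quality codes can be derived from two centroid flags and a foreign-location flag, as in this sketch. The names are hypothetical, and letting the foreign-location code take precedence over the centroid codes is an assumption not stated in the document.

```python
def distance_quality_code(respondent_centroided, parent_centroided,
                          any_foreign=False):
    """Return the quality code for a respondent-parent distance."""
    if any_foreign:
        return 5  # respondent and/or parent at a foreign location
    if respondent_centroided and parent_centroided:
        return 4  # both zip centroided
    if parent_centroided:
        return 3  # parent only
    if respondent_centroided:
        return 2  # respondent only
    return 1      # neither zip centroided
```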

Migration History (Location)

Variable Created: CV_MIGRATE.xx

To provide information about the respondent's migration history, survey staff create this variable based on state and county codes as assigned by the geocoding program. The variable released to public users classifies respondent moves into one of the following four categories: 

  1. Move within county
  2. Move within state; different county
  3. Move between states
  4. Move to or from a foreign country

Respondents who have not moved since the previous interview are assigned a value of -4, valid skip. If respondent address information is incomplete and survey staff are unable to determine locations of the respondent's residences, the respondent is assigned a value of -3, invalid skip.  The input variables from the household are coded with a '0' when the respondent address is in a foreign country.
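
The classification logic can be sketched as a comparison of state and county codes at two interviews. This is an illustration under stated assumptions, not the survey staff's program: locations are passed as hypothetical (state code, county code) pairs, None stands for a location that could not be determined, and a state code of 0 marks a foreign address, following the note above. Treating a move between two foreign addresses as category 4 is also an assumption.

```python
FOREIGN = 0  # state code used for a foreign address, per the note above


def migration_code(moved, previous, current):
    """Classify a move between two (state_code, county_code) pairs.

    previous and current are None when the location is unknown.
    """
    if not moved:
        return -4  # valid skip: no move since the previous interview
    if previous is None or current is None:
        return -3  # invalid skip: location could not be determined
    (prev_state, prev_county), (curr_state, curr_county) = previous, current
    if prev_state == FOREIGN or curr_state == FOREIGN:
        return 4   # move to or from a foreign country
    if prev_state != curr_state:
        return 3   # move between states
    if prev_county != curr_county:
        return 2   # move within state; different county
    return 1       # move within county
```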