Speech Data in the NLSY97

In round 15, speech data were collected to study the relationship between a worker's speech and his or her labor market success, building on the pilot study carried out by Grogger (2011). The work involved two main steps: collecting audio data and converting the audio data into numerical data suitable for regression analysis.

Important Information 

The speech variables are best located using the question name (QNAME) search in NLS Investigator. Search for "Question Name starts with SPCH" to find this set of variables.

Audio data collection

Audio data were collected during round 15 of the NLSY97. The data were collected in response to two speech prompts, designed to capture both informal and formal speech. One prompt was administered at the end of the interview, when respondents were asked to recount the happiest moment (HM) in their life since the date of their last interview. The second question, administered during the employment section of the interview, involved a job-search (JS) role-playing exercise where respondents were asked the following:

Let's suppose you applied for a job that sounded really interesting to you and they called you and asked you to come in for an interview. How would you describe your skills, qualifications, and experience to me if I were the person interviewing you for this job? (Employed respondents heard a slightly different preamble to the question.)

All respondents who completed in-person interviews and who gave consent to be recorded were eligible to be assigned at least one speech prompt. Answers were recorded by the on-board microphone in each field interviewer's (FI's) laptop. To make the recording, the CAPI interview software was programmed to turn on the FI's laptop microphone for one minute once a prompt was reached. FIs were provided with instructions designed to keep the respondent talking for as much of that minute as possible.

Because of similarities between African American Vernacular English (AAVE) and Southern American English (SoAE), both stimulus questions were assigned to all African-American and Southern white respondents. Southern white respondents are defined as non-Hispanic whites who resided in the South Census region at age 12. A random sample of 500 respondents who were neither black nor Southern white was also to be assigned both speech prompts, as were roughly 295 other respondents for whom speech data were collected in 2006 as part of Grogger (2011) but who did not fall into any of the categories above. All remaining respondents (non-Southern whites and others not selected for both prompts) were randomly assigned to only one of the speech prompts.
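The sampling plan described above can be sketched as a small decision rule. This is only an illustration, not the CAPI logic actually used: the function name, the boolean flags, and the equal-probability choice between HM and JS for single-prompt respondents are all assumptions made for the example.

```python
import random

def assign_prompts(is_black, is_southern_white, in_random_500, in_2006_pilot, rng):
    """Return the speech prompts for one eligible respondent (hypothetical sketch).

    The flags are illustrative names, not NLSY97 variables:
      is_black          - respondent is African-American
      is_southern_white - non-Hispanic white living in the South Census region at age 12
      in_random_500     - drawn into the random sample of 500 other respondents
      in_2006_pilot     - speech data collected in 2006 for Grogger (2011)
    """
    if is_black or is_southern_white or in_random_500 or in_2006_pilot:
        return ("HM", "JS")
    # Assumption: the single prompt is drawn with equal probability.
    return (rng.choice(["HM", "JS"]),)

rng = random.Random(0)
print(assign_prompts(True, False, False, False, rng))   # both prompts
print(assign_prompts(False, False, False, False, rng))  # exactly one prompt
```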

Table 1 provides data on round-15 speech-prompt sampling and response rates, disaggregated by race/region at age 12. Of the 8,984 original NLSY97 respondents, 7,423 were interviewed during round 15. Among those interviews, 6,579 were carried out in person. Among those, 6,080 respondents provided consent to be recorded and were thus eligible for this coding exercise. The share of round-15 respondents participating in in-person interviews and consenting to be recorded was .83 for blacks, .80 for both Southern whites and non-Southern whites, and .84 for the remaining group.
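As a quick check, the eligibility shares can be recomputed from the Table 1 counts by dividing the number of respondents who consented to recording by the number of round-15 respondents in each group. The variable names below are illustrative only.

```python
# Counts copied from Table 1, in the order of its columns.
groups = ["Black", "Southern white", "Non-Southern white", "Other"]
r15_respondents = [2036, 931, 2588, 1868]
eligible = [1698, 741, 2079, 1562]  # in-person interview and consent to record

# Share of round-15 respondents who were eligible to be recorded.
shares = {g: round(e / r, 2) for g, e, r in zip(groups, eligible, r15_respondents)}
for group, share in shares.items():
    print(f"{group}: {share:.2f}")
# Black: 0.83
# Southern white: 0.80
# Non-Southern white: 0.80
# Other: 0.84
```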

The center panel of Table 1 shows how eligible respondents were assigned to speech prompts. For the most part, the assignments followed the sampling plan fairly closely. All but seven of the black respondents, and all but two of the Southern white respondents, were assigned both questions. Among non-Southern whites and others, 795 respondents were assigned to both stimulus questions. Ten otherwise eligible respondents were not assigned either speech question.
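The assignment counts in the center panel of Table 1 can be checked for internal consistency: within each group, the four assignment categories should add up to the number of eligible respondents. The short sketch below does this with the published counts; the dictionary layout is just one convenient way to hold them.

```python
# Center panel of Table 1; columns are Black, Southern white,
# Non-Southern white, Other.
assignment = {
    "Both questions": [1691, 739, 257, 538],
    "HM only": [1, 0, 906, 516],
    "JS only": [6, 2, 913, 501],
    "No assignment": [0, 0, 3, 7],
}
eligible = [1698, 741, 2079, 1562]

# Sum the assignment categories within each race/region column.
column_totals = [sum(col) for col in zip(*assignment.values())]
both_nonsouthern_and_other = sum(assignment["Both questions"][2:])

print(column_totals)               # [1698, 741, 2079, 1562]
print(sum(column_totals))          # 6080
print(both_nonsouthern_and_other)  # 795
```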

The bottom panel of Table 1 provides counts of eligible respondents for whom audio files were actually generated by the interviews. There is a discrepancy between the number of respondents from whom audio data should have been collected and the number from whom it was actually collected. Of the 6,080 eligible respondents, audio files were obtained from only 4,907. The rate of loss among eligibles was 17 percent for blacks and Southern whites, 21 percent for non-Southern whites, and 20 percent for others. The panel also shows that there were black and Southern white respondents for whom only one audio file was obtained, when there should have been two.
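The loss rates quoted above follow from the Table 1 counts as one minus the ratio of respondents with at least one audio file to eligible respondents. A minimal sketch of that arithmetic, with illustrative variable names:

```python
# Counts copied from Table 1.
eligible = {"Black": 1698, "Southern white": 741,
            "Non-Southern white": 2079, "Other": 1562}
with_audio = {"Black": 1402, "Southern white": 616,
              "Non-Southern white": 1638, "Other": 1251}

# Percentage of eligible respondents with no audio file, by group.
loss_pct = {g: round(100 * (1 - with_audio[g] / eligible[g])) for g in eligible}
total_lost = sum(eligible.values()) - sum(with_audio.values())

print(loss_pct)
# {'Black': 17, 'Southern white': 17, 'Non-Southern white': 21, 'Other': 20}
print(total_lost)  # 1173
```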

The reasons for this loss of data are unclear. NLSY project staff indicate that audio files appear not to have been captured for the 1,173 (=6,080-4,907) respondents who were eligible to be recorded but for whom no audio data are available, perhaps due to technical difficulties in the CAPI interviewing system. The loss of recordings is widely distributed among FIs, rather than being concentrated among a few, so it appears to have been unintentional.

Producing numerical data from the audio files

To generate data suitable for the regression analysis, anonymous listeners were recruited to listen to the audio files and answer questions about the speakers. After listening to each audio file, listeners were asked to specify the speaker's sex, race/ethnicity, and region of origin. Three listeners were assigned to each audio file. Thus speakers who responded to both the HM and JS prompts have six listener reports, whereas speakers who responded to only one of the prompts have three. To deal with data security issues surrounding the use of potentially identifiable voice data, listeners were recruited from the pool of NORC FIs and research assistants. Data processing was carried out remotely using specially configured laptops that provided secure connections to NORC's computer network, where the audio files resided. All listeners received confidentiality training stipulated by both NORC and BLS. 

Summary characteristics of the listeners are reported in Table 2. The modal listener was white and female, reflecting the demographics of the available workforce. Listeners were drawn from throughout the US, with disproportionately many Midwesterners. All listeners had completed high school; most had at least some tertiary education. The 22 listeners who listened to the JS audio files tended to be older, more Southern, and less educated than the 43 listeners who listened to the HM audio files (10 listened to both). Care was taken to ensure that speakers were not assigned to listeners who had interviewed them during round 15.

The HM files were processed first. All speakers with an HM audio file were in scope for HM data processing unless the file was empty or unintelligible. The top part of Table 3 shows that about 94 percent of the HM audio files were in scope, where this fraction varied from 89 percent for black speakers to 99 percent for non-Southern whites.

Budgetary issues limited the scope of processing for the JS files. The goals for JS file processing were to maximize the number of blacks and Southern whites for whom both HM and JS data were available and to maximize the number of non-Southern whites for whom data from at least one of the speech prompts would be available, while meeting the project budget constraint. A handful of "other" speakers were processed as well. As with the HM data, JS files that were empty or inaudible were deemed out of scope. The middle part of Table 3 shows that 83 percent of the available JS files for black speakers were processed, compared to 92 percent of those for Southern whites and 79 percent of those for non-Southern whites. Speech data from at least one prompt are available for a total of 4,225 NLSY respondents.
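The in-scope and processing percentages cited for the HM and JS files can be reproduced from the Table 3 counts. The sketch below divides in-scope counts by audio-file counts for each group; the helper name `pct` is an assumption made for the example.

```python
# Counts copied from Table 3; columns are Black, Southern white,
# Non-Southern white, Other.
hm_files = [1305, 576, 900, 819]
hm_in_scope = [1162, 526, 890, 810]
js_files = [1380, 610, 932, 851]
js_in_scope = [1139, 564, 739, 59]

def pct(numerators, denominators):
    """Element-wise percentage, rounded to the nearest whole percent."""
    return [round(100 * n / d) for n, d in zip(numerators, denominators)]

hm_pct = pct(hm_in_scope, hm_files)
js_pct = pct(js_in_scope, js_files)
hm_overall = round(100 * sum(hm_in_scope) / sum(hm_files))

print(hm_pct)      # [89, 91, 99, 99]
print(js_pct)      # [83, 92, 79, 7]
print(hm_overall)  # 94
```

The low JS figure for the "other" group reflects the budget-driven decision to process only a handful of those files.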

Table 1. Round-15 response counts by respondent's race and region at age 12

| Race/region | Black | Southern white | Non-Southern white | Other | Total |
|---|---|---|---|---|---|
| Original 1997 sample | 2,335 | 1,160 | 3,253 | 2,236 | 8,984 |
| R15 respondents | 2,036 | 931 | 2,588 | 1,868 | 7,423 |
| In-person interviews | 1,833 | 797 | 2,269 | 1,680 | 6,579 |
| ...and consent to record | 1,698 | 741 | 2,079 | 1,562 | 6,080 |
| Speech prompt assignment: | | | | | |
| Both questions | 1,691 | 739 | 257 | 538 | 3,225 |
| HM only | 1 | 0 | 906 | 516 | 1,423 |
| JS only | 6 | 2 | 913 | 501 | 1,422 |
| No assignment | 0 | 0 | 3 | 7 | 10 |
| At least one audio file | 1,402 | 616 | 1,638 | 1,251 | 4,907 |
| Both questions | 1,283 | 570 | 194 | 419 | 2,466 |
| HM only | 22 | 6 | 706 | 400 | 1,134 |
| JS only | 97 | 40 | 738 | 432 | 1,307 |

Notes: HM = happiest moment; JS = job search.

Table 2. Percentage distribution of listener characteristics, by speech prompt

| Listener characteristics | Happiest Moment (HM) prompt (1) | Job Search (JS) prompt (2) |
|---|---|---|
| SEX | | |
| Male | 27 | 16 |
| Female | 73 | 84 |
| Total | 100 | 100 |
| RACE/ETHNICITY | | |
| White | 83 | 84 |
| Black | 13 | 15 |
| Hispanic | 2 | 1 |
| Other | 2 | 0 |
| Total | 100 | 100 |
| REGION OF RESIDENCE | | |
| Northeast | 21 | 19 |
| Midwest | 37 | 35 |
| South | 21 | 37 |
| West | 21 | 10 |
| Unknown | 0 | 0 |
| Total | 100 | 100 |
| LEVEL OF EDUCATION | | |
| HS diploma or GED | 5 | 24 |
| HS and some college | 38 | 33 |
| Bachelor's degree or higher | 57 | 43 |
| Total | 100 | 100 |
| Mean age of listener (years) | 48 | 54 |

Table 3. Counts of speakers with speech data, by speaker's race and region at age 12

| Speech data type | Black | Southern white | Non-Southern white | Other | Total |
|---|---|---|---|---|---|
| Happiest Moment (HM) audio file | 1,305 | 576 | 900 | 819 | 3,600 |
| In-scope for Happiest Moment (HM) speech data | 1,162 | 526 | 890 | 810 | 3,388 |
| Job Search (JS) audio file | 1,380 | 610 | 932 | 851 | 3,773 |
| In-scope for Job Search (JS) speech data | 1,139 | 564 | 739 | 59 | 2,501 |
| Any speech data | 1,168 | 567 | 1,629 | 861 | 4,225 |