Search Results

Source: Sociological Methodology
Resulting in 6 citations.
1. Brand, Jennie E.
Xu, Jiahui
Koch, Bernard
Geraldo, Pablo
Uncovering Sociological Effect Heterogeneity Using Tree-Based Machine Learning
Sociological Methodology published online (4 March 2021): DOI: 10.1177/0081175021993503.
Also: https://journals.sagepub.com/doi/full/10.1177/0081175021993503
Cohort(s): NLSY79
Publisher: American Sociological Association
Keyword(s): Educational Attainment; Heterogeneity; Methods/Methodology; Wages

Permission to reprint the abstract has not been received from the publisher.

Individuals do not respond uniformly to treatments, such as events or interventions. Sociologists routinely partition samples into subgroups to explore how the effects of treatments vary by selected covariates, such as race and gender, on the basis of theoretical priors. Data-driven discoveries are also routine, yet the analyses by which sociologists typically go about them are often problematic and seldom move us beyond our biases to explore new meaningful subgroups. Emerging machine learning methods based on decision trees allow researchers to explore sources of variation that they may not have previously considered or envisaged. In this article, the authors use tree-based machine learning, that is, causal trees, to recursively partition the sample to uncover sources of effect heterogeneity. Assessing a central topic in social inequality, college effects on wages, the authors compare what is learned from covariate and propensity score–based partitioning approaches with recursive partitioning based on causal trees. Decision trees, although superseded by forests for estimation, can be used to uncover subpopulations responsive to treatments. Using observational data, the authors expand on the existing causal tree literature by applying leaf-specific effect estimation strategies to adjust for observed confounding, including inverse propensity weighting, nearest neighbor matching, and doubly robust causal forests. We also assess localized balance metrics and sensitivity analyses to address the possibility of differential imbalance and unobserved confounding. The authors encourage researchers to follow similar data exploration practices in their work on variation in sociological effects and offer a straightforward framework by which to do so.
Bibliography Citation
Brand, Jennie E., Jiahui Xu, Bernard Koch and Pablo Geraldo. "Uncovering Sociological Effect Heterogeneity Using Tree-Based Machine Learning." Sociological Methodology published online (4 March 2021): DOI: 10.1177/0081175021993503.
2. Handcock, Mark S.
Morris, Martina
Relative Distribution Methods
Sociological Methodology 28 (1998): 53-97.
Also: http://depts.washington.edu/socmeth2/2abst98.htm
Cohort(s): NLS General
Publisher: American Sociological Association
Keyword(s): Data Analysis; Income Level; Methods/Methodology; Statistics; Variables, Independent - Covariate; Wage Levels

Permission to reprint the abstract has not been received from the publisher.

Presents an outline of relative distribution methods, with an application to recent changes in the US wage distribution, using data from the 1966 and 1979 panels of the National Longitudinal Survey. Relative distribution methods are a nonparametric statistical framework for analyzing data in a fully distributional context. The framework combines the graphical tools of exploratory data analysis with statistical summaries, decomposition, and inference. The relative distribution is similar to a density ratio, and is technically defined as the random variable obtained by transforming a variable from a comparison group by the cumulative distribution function (CDF) of that variable for a reference group. This transformation produces a set of observations, the relative data, that represent the rank of the original comparison value in terms of the reference group's CDF. The density and CDF of the relative data can be used to fully represent and analyze distributional differences, allowing analysis to move beyond comparisons of means and variances. The analytic framework is general and flexible, as the relative density is decomposable into the effect of location and shape differences, and into effects that represent both compositional changes in covariates and changes in the covariate-outcome variable relationship. 5 Tables, 6 Figures, 2 Appendixes, 67 References. Adapted from the source document.
Bibliography Citation
Handcock, Mark S. and Martina Morris. "Relative Distribution Methods." Sociological Methodology 28 (1998): 53-97.
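
The transformation described in the Handcock and Morris abstract above (comparison values passed through the reference group's CDF) can be sketched roughly as follows. This is an illustrative sketch only, not the authors' implementation: the function names `relative_data` and `relative_density` are hypothetical, the empirical CDF and a plain histogram stand in for whatever estimators the article uses, and the data are synthetic rather than NLS wages.

```python
import numpy as np

def relative_data(comparison, reference):
    """Transform comparison values by the reference group's empirical CDF."""
    ref_sorted = np.sort(np.asarray(reference, dtype=float))
    comp = np.asarray(comparison, dtype=float)
    # Empirical CDF: share of reference values <= each comparison value.
    return np.searchsorted(ref_sorted, comp, side="right") / ref_sorted.size

def relative_density(rel, bins=10):
    """Histogram estimate of the relative density on [0, 1] (an assumption;
    the article may use a different density estimator)."""
    density, edges = np.histogram(rel, bins=bins, range=(0.0, 1.0), density=True)
    return density, edges

# Synthetic example: the comparison group is shifted up and more dispersed.
rng = np.random.default_rng(0)
reference = rng.normal(loc=10.0, scale=2.0, size=5000)
comparison = rng.normal(loc=10.5, scale=2.5, size=5000)

rel = relative_data(comparison, reference)
density, edges = relative_density(rel)
```

If the two groups had the same distribution, the relative data would be approximately uniform on [0, 1] and the relative density would sit near 1 in every bin; departures from 1 indicate where the comparison group is over- or under-represented relative to the reference group.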
3. Sengupta, Nandana
Udell, Madeleine
Srebro, Nathan
Evans, James
Sparse Data Reconstruction, Missing Value and Multiple Imputation through Matrix Factorization
Sociological Methodology published online (22 October 2022): DOI: 10.1177/00811750221125799.
Also: https://journals.sagepub.com/doi/full/10.1177/00811750221125799
Cohort(s): NLSY97
Publisher: American Sociological Association
Keyword(s): Data Quality/Consistency; General Social Survey (GSS); Missing Data/Imputation

Permission to reprint the abstract has not been received from the publisher.

Social science approaches to missing values predict avoided, unrequested, or lost information from dense data sets, typically surveys. The authors propose a matrix factorization approach to missing data imputation that (1) identifies underlying factors to model similarities across respondents and responses and (2) regularizes across factors to reduce their overinfluence for optimal data reconstruction. This approach may enable social scientists to draw new conclusions from sparse data sets with a large number of features, for example, historical or archival sources, online surveys with high attrition rates, or data sets created from Web scraping, which confound traditional imputation techniques. The authors introduce matrix factorization techniques and detail their probabilistic interpretation, and they demonstrate these techniques' consistency with Rubin's multiple imputation framework. The authors show via simulations using artificial data and data from real-world subsets of the General Social Survey and National Longitudinal Study of Youth cases for which matrix factorization techniques may be preferred. These findings recommend the use of matrix factorization for data reconstruction in several settings, particularly when data are Boolean and categorical and when large proportions of the data are missing.
Bibliography Citation
Sengupta, Nandana, Madeleine Udell, Nathan Srebro and James Evans. "Sparse Data Reconstruction, Missing Value and Multiple Imputation through Matrix Factorization." Sociological Methodology published online (22 October 2022): DOI: 10.1177/00811750221125799.
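
The general idea of imputing missing entries from a regularized low-rank factorization, as outlined in the abstract above, might look something like the sketch below. It is a simplified illustration under stated assumptions, not the authors' method: it uses alternating least squares with a ridge penalty on a numeric matrix, the function name `mf_impute` and all parameter values are hypothetical, the data are synthetic, and the multiple-imputation step the abstract mentions is not shown.

```python
import numpy as np

def mf_impute(M, mask, rank=2, reg=0.1, iters=50, seed=0):
    """Fill unobserved entries of M using a rank-`rank` factorization U @ V.T
    fitted to observed entries only (mask == True) by alternating ridge regressions."""
    rng = np.random.default_rng(seed)
    n, m = M.shape
    U = rng.normal(scale=0.1, size=(n, rank))
    V = rng.normal(scale=0.1, size=(m, rank))
    ridge = reg * np.eye(rank)
    for _ in range(iters):
        for i in range(n):               # update each row factor
            obs = mask[i]
            if obs.any():
                Vo = V[obs]
                U[i] = np.linalg.solve(Vo.T @ Vo + ridge, Vo.T @ M[i, obs])
        for j in range(m):               # update each column factor
            obs = mask[:, j]
            if obs.any():
                Uo = U[obs]
                V[j] = np.linalg.solve(Uo.T @ Uo + ridge, Uo.T @ M[obs, j])
    # Keep observed values; use the reconstruction only where data are missing.
    return np.where(mask, M, U @ V.T)

# Synthetic rank-2 matrix with roughly 30% of entries hidden.
rng = np.random.default_rng(1)
truth = rng.normal(size=(60, 2)) @ rng.normal(size=(2, 20))
mask = rng.random(truth.shape) > 0.3
observed = np.where(mask, truth, np.nan)

imputed = mf_impute(observed, mask)
rmse_mf = np.sqrt(np.mean((imputed[~mask] - truth[~mask]) ** 2))

# Baseline for comparison: fill each column with its observed mean.
col_means = np.nanmean(observed, axis=0)
mean_filled = np.where(mask, truth, col_means)
rmse_mean = np.sqrt(np.mean((mean_filled[~mask] - truth[~mask]) ** 2))
```

Comparing `rmse_mf` with `rmse_mean` gives a rough sense of how much the low-rank structure helps on this synthetic example; Boolean or categorical data, which the abstract highlights, would call for a different loss function than the squared error assumed here.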
4. Shattuck, Rachel
Rendall, Michael S.
Retrospective Reporting of First Employment in the Life-courses of U.S. Women
Sociological Methodology 47,1 (August 2017): 307-344.
Also: http://journals.sagepub.com/doi/abstract/10.1177/0081175017723397
Cohort(s): NLSY97
Publisher: American Sociological Association
Keyword(s): Comparison Group (Reference group); Data Quality/Consistency; Employment History; Life Course; National Longitudinal Study of Adolescent Health (AddHealth); National Survey of Family Growth (NSFG); Survey of Income and Program Participation (SIPP)

Permission to reprint the abstract has not been received from the publisher.

The authors investigate the accuracy of young women's retrospective reporting on their first substantial employment in three major, nationally representative U.S. surveys, examining hypotheses that longer recall duration, employment histories with lower salience and higher complexity, and an absence of "anchoring" biographical details will adversely affect reporting accuracy. The authors compare retrospective reports to benchmark panel survey estimates for the same cohorts. Sociodemographic groups--notably non-Hispanic white women and women with college-educated mothers--whose early employment histories at these ages are in aggregate more complex (multiple jobs) and lower in salience (more part-time jobs) are more likely to omit the occurrence of their first substantial job or employment and to misreport their first job or employment as occurring at an older age. Also, retrospective reports are skewed toward overreporting longer, therefore more salient, later jobs over shorter, earlier jobs. The relatively small magnitudes of differences, however, indicate that the retrospective questions nevertheless capture these summary indicators of first substantial employment reasonably accurately. Moreover, these differences are especially small for groups of women who are more likely to experience labor-market disadvantage and for women with early births.
Bibliography Citation
Shattuck, Rachel and Michael S. Rendall. "Retrospective Reporting of First Employment in the Life-courses of U.S. Women." Sociological Methodology 47,1 (August 2017): 307-344.
5. Western, Bruce
Bloome, Deirdre
Variance Function Regressions for Studying Inequality
Sociological Methodology 39,1 (August 2009): 293-326.
Also: http://onlinelibrary.wiley.com/doi/10.1111/j.1467-9531.2009.01222.x/abstract
Cohort(s): NLSY79
Publisher: American Sociological Association
Keyword(s): Earnings; Incarceration/Jail; Methods/Methodology; Modeling; Risk-Taking; Variables, Independent - Covariate; Variables, Instrumental

Permission to reprint the abstract has not been received from the publisher.

Regression-based studies of inequality model only between-group differences, yet often these differences are far exceeded by residual inequality. Residual inequality is usually attributed to measurement error or the influence of unobserved characteristics. We present a model, called variance function regression, that includes covariates for both the mean and variance of a dependent variable. In this model, the residual variance is treated as a target for analysis. In analyses of inequality, the residual variance might be interpreted as measuring risk or insecurity. Variance function regressions are illustrated in an analysis of panel data on earnings among released prisoners in the National Longitudinal Survey of Youth. We extend the model to a decomposition analysis, relating the change in inequality to compositional changes in the population and changes in coefficients for the mean and variance. The decomposition is applied to the trend in U.S. earnings inequality among male workers, 1970 to 2005.
Bibliography Citation
Western, Bruce and Deirdre Bloome. "Variance Function Regressions for Studying Inequality." Sociological Methodology 39,1 (August 2009): 293-326.
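
A rough sketch of the idea in the abstract above, with covariates for both the mean and the residual variance, is given below. It is an assumption-laden simplification rather than the authors' estimator: it fits the mean by ordinary least squares and then regresses the log of the squared residuals on the same covariates, whereas the article may use a different (for example, likelihood-based) procedure. The function name and the synthetic data are hypothetical.

```python
import numpy as np

def variance_function_regression(X, y):
    """Two-stage sketch: OLS for the mean, then OLS of log squared residuals
    for the (log) variance. Returns (mean coefficients, variance coefficients)."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    # Small constant guards against log(0); it is an implementation convenience.
    log_r2 = np.log(resid ** 2 + 1e-12)
    lam, *_ = np.linalg.lstsq(X1, log_r2, rcond=None)
    return beta, lam

# Synthetic data: group x = 1 has a higher mean AND a larger residual variance.
rng = np.random.default_rng(0)
n = 20000
x = (rng.random(n) < 0.5).astype(float)
eps = rng.normal(size=n)
y = 1.0 + 2.0 * x + np.exp(0.5 * 1.0 * x) * eps   # log variance = 0 + 1.0 * x

beta, lam = variance_function_regression(x, y)
```

In this simplified version the slope in `lam` estimates the difference in log residual variance between groups, while its intercept is shifted downward by a known constant because the log of a squared normal residual has a negative expectation; only the slope is interpreted here.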
6. Xie, Yu
Brand, Jennie E.
Jann, Ben
Estimating Heterogeneous Treatment Effects with Observational Data
Sociological Methodology 42,1 (August 2012): 314-347.
Also: http://smx.sagepub.com/content/42/1/314.abstract
Cohort(s): NLSY79
Publisher: American Sociological Association
Keyword(s): College Enrollment; Fertility; Heterogeneity; Modeling; Propensity Scores

Permission to reprint the abstract has not been received from the publisher.

Individuals differ not only in their background characteristics but also in how they respond to a particular treatment, intervention, or stimulation. In particular, treatment effects may vary systematically by the propensity for treatment. In this paper, we discuss a practical approach to studying heterogeneous treatment effects as a function of the treatment propensity, under the same assumption commonly underlying regression analysis: ignorability. We describe one parametric method and two nonparametric methods for estimating interactions between treatment and the propensity for treatment. For the first method, we begin by estimating propensity scores for the probability of treatment given a set of observed covariates for each unit and construct balanced propensity score strata; we then estimate propensity score stratum-specific average treatment effects and evaluate a trend across them. For the second method, we match control units to treated units based on the propensity score and transform the data into treatment-control comparisons at the most elementary level at which such comparisons can be constructed; we then estimate treatment effects as a function of the propensity score by fitting a nonparametric model as a smoothing device. For the third method, we first estimate nonparametric regressions of the outcome variable as a function of the propensity score separately for treated units and for control units and then take the difference between the two nonparametric regressions. We illustrate the application of these methods with an empirical example of the effects of college attendance on women’s fertility.
Bibliography Citation
Xie, Yu, Jennie E. Brand and Ben Jann. "Estimating Heterogeneous Treatment Effects with Observational Data." Sociological Methodology 42,1 (August 2012): 314-347.
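
The first method described in the abstract above (estimate propensity scores, form strata, estimate stratum-specific effects, then evaluate a trend across strata) could be sketched along these lines. This is a minimal illustration under several assumptions, not the authors' procedure: propensity scores come from a hand-rolled logistic regression, strata are simple quantile bins with no balance checking, stratum effects are raw differences in means, and the trend is an unweighted linear fit on stratum rank. All function names and the synthetic data are hypothetical.

```python
import numpy as np

def fit_logit(X, t, iters=25):
    """Estimate propensity scores with logistic regression (Newton's method)."""
    X1 = np.column_stack([np.ones(len(t)), X])
    w = np.zeros(X1.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-X1 @ w))
        grad = X1.T @ (t - p)
        hess = (X1 * (p * (1.0 - p))[:, None]).T @ X1
        w += np.linalg.solve(hess, grad)
    return 1.0 / (1.0 + np.exp(-X1 @ w))

def stratum_effects(pscore, t, y, n_strata=5):
    """Difference in mean outcomes (treated minus control) within quantile
    strata of the propensity score, plus a linear trend across stratum rank."""
    edges = np.quantile(pscore, np.linspace(0.0, 1.0, n_strata + 1))
    stratum = np.clip(np.searchsorted(edges, pscore, side="right") - 1,
                      0, n_strata - 1)
    effects = np.array([
        y[(stratum == s) & (t == 1)].mean() - y[(stratum == s) & (t == 0)].mean()
        for s in range(n_strata)
    ])
    slope, _intercept = np.polyfit(np.arange(1, n_strata + 1), effects, deg=1)
    return effects, slope

# Synthetic data in which the treatment effect declines with the propensity.
rng = np.random.default_rng(0)
n = 20000
x = rng.normal(size=n)
t = (rng.random(n) < 1.0 / (1.0 + np.exp(-x))).astype(float)
y = x + (1.0 - 0.5 * x) * t + rng.normal(size=n)

pscore = fit_logit(x, t)
effects, slope = stratum_effects(pscore, t, y)
```

A negative `slope` in this synthetic example reflects effects that shrink as the propensity for treatment rises. In applied work the strata would need to be checked for covariate balance, as the abstract indicates, and the trend would typically account for the differing precision of the stratum estimates.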