I develop a multilevel model for empirical settings in which each individual belongs to a cluster and treatment is endogenously assigned at the cluster level. When treatment assignment is clustered, the treatment effect cannot be identified in a model with fully flexible cluster heterogeneity. To restrict cluster heterogeneity, I assume that potential outcomes are independent of treatment conditional on two sets of variables: cluster-level characteristics and the distribution of individual-level characteristics within each cluster. Within this selection-on-distribution framework, I control for treatment endogeneity and show how to recover treatment effect heterogeneity in both individual-level and cluster-level variables. To implement this idea, I propose a two-step estimation procedure based on the K-means algorithm. In the first step, I group clusters by their distributions of individual-level characteristics. In the second step, I use the resulting grouping structure to estimate the treatment effect. To illustrate the method, I study the disemployment effect of a minimum wage increase on teenagers.
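A minimal Python sketch of the two-step idea follows. Its choices are illustrative assumptions, not the paper's procedure: each cluster's distribution of a single individual-level covariate is summarized by a few quantiles, clusters are grouped with a plain implementation of Lloyd's K-means algorithm, and the second step is OLS of the outcome on treatment plus group dummies. The function names (`simulate`, `cluster_features`, `kmeans`, `treatment_effect`) and the data-generating process are hypothetical; in the simulation a latent cluster type raises the treatment probability and also shifts both the covariate distribution and the outcome, so the naive difference in means is biased.

```python
import numpy as np

def simulate(n_clusters=200, n_per=50, tau=1.0, seed=1):
    """Hypothetical DGP: a latent cluster type drives treatment, covariates, and outcomes."""
    rng = np.random.default_rng(seed)
    ctype = rng.integers(0, 2, size=n_clusters)
    d_c = rng.random(n_clusters) < np.where(ctype == 1, 0.7, 0.3)  # endogenous cluster-level treatment
    cluster_ids = np.repeat(np.arange(n_clusters), n_per)
    x = rng.normal(3.0 * ctype[cluster_ids], 1.0)  # covariate distribution shifts with type
    d = d_c[cluster_ids].astype(float)
    y = tau * d + 2.0 * ctype[cluster_ids] + rng.normal(size=cluster_ids.size)
    return y, d, x, cluster_ids

def cluster_features(x, cluster_ids, quantiles=(0.1, 0.25, 0.5, 0.75, 0.9)):
    """Step 1a (assumed summary): describe each cluster's covariate distribution by its quantiles."""
    labels = np.unique(cluster_ids)
    feats = np.array([np.quantile(x[cluster_ids == c], quantiles) for c in labels])
    return labels, feats

def kmeans(features, k, n_iter=100, seed=0):
    """Step 1b: Lloyd's algorithm, alternating nearest-center assignment and center updates."""
    rng = np.random.default_rng(seed)
    centers = features[rng.choice(len(features), size=k, replace=False)]
    for _ in range(n_iter):
        dists = ((features[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        assign = dists.argmin(axis=1)
        # if a group empties, keep its previous center
        new_centers = np.array([
            features[assign == g].mean(axis=0) if np.any(assign == g) else centers[g]
            for g in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return assign

def treatment_effect(y, d, cluster_ids, labels, groups, k):
    """Step 2 (assumed form): OLS of y on treatment plus dummies for the estimated cluster groups."""
    g = groups[np.searchsorted(labels, cluster_ids)]  # group of each individual's cluster
    dummies = (g[:, None] == np.arange(k)[None, :]).astype(float)
    X = np.column_stack([d, dummies])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[0]

y, d, x, cid = simulate()
labels, feats = cluster_features(x, cid)
groups = kmeans(feats, k=2)
tau_hat = treatment_effect(y, d, cid, labels, groups, k=2)
naive = y[d == 1].mean() - y[d == 0].mean()
print(f"naive difference in means: {naive:.2f}, grouped estimate: {tau_hat:.2f}")
```

Treatment varies within each estimated group here, so the group dummies absorb the type-driven confounding; a group in which every cluster is treated (or none is) would contribute no information about the effect.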
Treatment effect estimation strategies in the event-study setting, namely panel data with variation in treatment timing, often rely on the parallel trends assumption, which imposes mean independence across different treatment timings. In this paper, I relax the parallel trends assumption by introducing a latent type variable and develop a conditional two-way fixed-effects model. Under a finite-support assumption on the latent type variable, I show that an extremum classifier consistently estimates the type assignment. I then address the endogeneity of selection into treatment by conditioning on the latent type, the channel through which treatment timing is correlated with the outcome. I also allow treatment to affect units of different types differently, and thus directly model and estimate type-level heterogeneity in the treatment effect.
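The alternating logic behind a classifier of this kind can be sketched in Python. This is an assumption-laden illustration in the style of grouped fixed-effects estimators, not the paper's estimator: on a balanced panel the model is y_it = alpha_i + lambda_{g,t} + beta_g * D_it + e_it, where g is the latent type. Given types, each type's parameters are fit by two-way within OLS; given parameters, each unit is assigned to the type that minimizes its sum of squared residuals after profiling out alpha_i. All function names, the number of random starts, and the simulated design (type-specific trends and type-dependent adoption) are hypothetical.

```python
import numpy as np

def fit_given_types(Y, D, types, k):
    """Given types, fit y_it = alpha_i + lam[g, t] + beta[g] * D_it by two-way within OLS per type."""
    T = Y.shape[1]
    lam, beta, valid = np.zeros((k, T)), np.zeros(k), np.zeros(k, dtype=bool)
    for g in range(k):
        yg, dg = Y[types == g], D[types == g]
        if yg.shape[0] == 0:
            continue  # empty type: leave it invalid
        # two-way demeaning removes unit effects and this type's time effects (balanced panel)
        yw = yg - yg.mean(1, keepdims=True) - yg.mean(0) + yg.mean()
        dw = dg - dg.mean(1, keepdims=True) - dg.mean(0) + dg.mean()
        denom = (dw * dw).sum()
        beta[g] = (dw * yw).sum() / denom if denom > 0 else 0.0
        lam[g] = (yg - beta[g] * dg).mean(0)  # time profile; its level is absorbed by alpha_i
        valid[g] = True
    return lam, beta, valid

def classify(Y, D, lam, beta, valid):
    """Extremum classifier: assign each unit to the type minimizing its profiled SSR."""
    ssr = np.full((Y.shape[0], len(beta)), np.inf)
    for g in range(len(beta)):
        if not valid[g]:
            continue
        r = Y - lam[g] - beta[g] * D
        r = r - r.mean(1, keepdims=True)  # profile out the unit effect alpha_i
        ssr[:, g] = (r ** 2).sum(1)
    return ssr.argmin(1), ssr.min(1).sum()

def grouped_twfe(Y, D, k, n_starts=5, n_iter=50, seed=0):
    """Alternate fitting and classification from random starts; keep the lowest total SSR."""
    rng = np.random.default_rng(seed)
    best = None
    for _ in range(n_starts):
        types = rng.integers(0, k, size=Y.shape[0])
        for _ in range(n_iter):
            lam, beta, valid = fit_given_types(Y, D, types, k)
            new_types, obj = classify(Y, D, lam, beta, valid)
            if np.array_equal(new_types, types):
                break
            types = new_types
        if best is None or obj < best[0]:
            best = (obj, types, beta)
    return best[1], best[2]

def simulate(N=400, T=10, seed=1):
    """Hypothetical panel: type 1 trends upward and adopts treatment more often."""
    rng = np.random.default_rng(seed)
    z = rng.integers(0, 2, size=N)  # latent type
    t = np.arange(T)
    lam = np.where(z[:, None] == 1, 1.0 * t, 0.0)  # type-specific time effects
    ever = rng.random(N) < np.where(z == 1, 0.8, 0.3)  # selection into treatment depends on type
    start = np.where(ever, rng.integers(3, T, size=N), T)  # adoption period (T means never)
    D = (t[None, :] >= start[:, None]).astype(float)
    beta_true = np.array([1.0, 2.0])[z]  # type-specific treatment effect
    Y = rng.normal(size=(N, 1)) + lam + beta_true[:, None] * D + 0.5 * rng.normal(size=(N, T))
    return Y, D, z

Y, D, z = simulate()
types, beta = grouped_twfe(Y, D, k=2)
_, beta_pooled = grouped_twfe(Y, D, k=1)  # ordinary TWFE ignoring types
print("type-specific effects:", np.round(np.sort(beta), 2), "pooled TWFE:", np.round(beta_pooled, 2))
```

Type labels are identified only up to permutation, so comparisons with the truth should use sorted effects or permutation-invariant accuracy; the multiple random starts guard against the local optima that alternating algorithms of this kind can reach.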
WORK IN PROGRESS
"Clustering Sensitivity with Weakly Dependent Data"
The use of clustered standard errors can be justified by a weak dependence assumption: given a distance metric between units of observation, such as geographic distance, the dependence between two units fades as the distance between them grows. Under weak dependence, any clustering structure is valid for inference as long as it groups observations so that units in different clusters are far apart. This paper shows that inference results can vary substantially with the choice of clustering structure and proposes a simple remedy that summarizes the inference results from multiple clustering structures.
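To make the sensitivity concrete, the Python sketch below computes CR0 cluster-robust standard errors for one OLS slope under several square-grid clusterings of spatial data. The grid clusterings, the block-level shocks that generate dependence, and the function names are assumptions for illustration; the min/max range printed at the end is only a crude summary and is not the remedy proposed in the paper.

```python
import numpy as np

def ols_cluster_se(y, X, clusters):
    """OLS coefficients with CR0 cluster-robust standard errors (no small-sample correction)."""
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ (X.T @ y)
    u = y - X @ beta
    meat = np.zeros((X.shape[1], X.shape[1]))
    for c in np.unique(clusters):
        s = X[clusters == c].T @ u[clusters == c]  # summed score within cluster c
        meat += np.outer(s, s)
    V = XtX_inv @ meat @ XtX_inv
    return beta, np.sqrt(np.diag(V))

def grid_clusters(loc, m):
    """Hypothetical clustering: cut the unit square into an m-by-m grid of equal squares."""
    ix = np.minimum((loc * m).astype(int), m - 1)
    return ix[:, 0] * m + ix[:, 1]

# Hypothetical data: dependence comes from shocks shared within 5x5 spatial blocks
rng = np.random.default_rng(0)
n = 2000
loc = rng.random((n, 2))  # locations in the unit square
block = grid_clusters(loc, 5)
x = rng.normal(size=25)[block] + rng.normal(size=n)
u = rng.normal(size=25)[block] + rng.normal(size=n)
y = 1.0 + 0.5 * x + u
X = np.column_stack([np.ones(n), x])

ses = {}
for m in (2, 5, 10, 20, 50):
    _, se = ols_cluster_se(y, X, grid_clusters(loc, m))
    ses[m] = se[1]  # slope standard error under an m-by-m grid
_, se_hc0 = ols_cluster_se(y, X, np.arange(n))  # singleton clusters reproduce HC0

for m, se in ses.items():
    print(f"{m}x{m} grid: slope SE = {se:.3f}")
print(f"singleton clusters (HC0): {se_hc0[1]:.3f}")
print(f"range across grids: [{min(ses.values()):.3f}, {max(ses.values()):.3f}]")
```

Grids finer than the 5x5 dependence blocks split correlated units across clusters and tend to understate the standard error, while very coarse grids leave few clusters and make the variance estimate itself noisy.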