Working Papers
Bureaucracy as a tool for Politicians: Evidence from Germany
This paper studies the impact of a well-functioning bureaucracy on the effectiveness of repression, in the context of Germany's Nazi regime. I compare former Prussian to non-Prussian municipalities within unified Germany in a regression discontinuity framework. When the Nazis persecuted the German Jews, Prussian areas implemented deportations of Jews more efficiently. During the Weimar Republic, when Jews were legally protected, violence against Jews was lower in former Prussian areas. In both periods, Prussian local governments had greater "capacity": they were more effective at raising taxes and collecting trash. This capacity derived from greater specialization and better information processing rather than from effort. Specialization may have created the moral wiggle room to implement repugnant directives.
The Economic Effects of the English Parliamentary Enclosures
We use a dataset of the entire population of English Parliamentary enclosure acts between 1750 and 1830 to provide the first evidence of their impact. Parliamentary enclosure led to the systematic rationalization of traditional property rights. Exploiting a feature of the Parliamentary process that produced such legislation as a source of exogenous variation, we show that such enclosures were associated with significantly higher crop yields, but also higher land inequality. Our results are in line with a literature going back to Arthur Young and Karl Marx on the effects of Parliamentary enclosure on productivity and inequality. They do not support the argument that informal systems of governance, even in small, cohesive, and stable communities, were able to efficiently allocate commonly used and governed resources.
Publications
Contrastive Entity Coreference and Disambiguation for Historical Texts
Massive-scale historical document collections are crucial for social science research. Despite increasing digitization, these documents typically lack unique cross-document identifiers for individuals mentioned within the texts, as well as individual identifiers from external knowledgebases like Wikipedia/Wikidata. Existing entity disambiguation methods often fall short in accuracy for historical documents, which are replete with individuals not remembered in contemporary knowledgebases. This study makes three key contributions to improve cross-document coreference resolution and disambiguation in historical texts: (1) a massive-scale training dataset, replete with hard negatives, that sources over 190 million entity pairs from Wikipedia contexts and disambiguation pages; (2) high-quality evaluation data from hand-labeled historical newswire articles; and (3) trained models evaluated on this historical benchmark. We contrastively train bi-encoder models for coreferencing and disambiguating individuals in historical texts, achieving accurate, scalable performance that identifies out-of-knowledgebase individuals. Our approach significantly surpasses other entity disambiguation models on our historical newswire benchmark. Our models also demonstrate competitive performance on modern entity disambiguation benchmarks, particularly certain news disambiguation datasets.
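To illustrate what contrastive training of a bi-encoder involves, here is a minimal plain-Python sketch of one common objective, an InfoNCE-style loss with in-batch negatives. The choice of loss, the temperature of 0.05, and the helper names `normalize` and `info_nce_loss` are assumptions made for illustration; this is not the paper's training code. Each mention is embedded independently, and for a given anchor the embeddings of all other pairs in the batch, plus any mined hard negatives (for example, namesakes drawn from Wikipedia disambiguation pages), serve as negatives.

```python
import math


def normalize(v):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def info_nce_loss(anchors, positives, extra_negatives=(), temperature=0.05):
    """Mean InfoNCE loss with in-batch negatives (hypothetical helper).

    anchors[i] and positives[i] embed two mentions of the same individual.
    For anchor i, every positives[j] with j != i acts as a negative, as does
    every vector in extra_negatives (e.g. mined hard negatives).
    """
    a = [normalize(v) for v in anchors]
    candidates = [normalize(v) for v in positives] + [normalize(v) for v in extra_negatives]
    total = 0.0
    for i, a_i in enumerate(a):
        logits = [sum(x * y for x, y in zip(a_i, c)) / temperature for c in candidates]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_denom = m + math.log(sum(math.exp(z - m) for z in logits))
        total += log_denom - logits[i]  # cross-entropy with target = own positive
    return total / len(a)


e1, e2 = [1.0, 0.0], [0.0, 1.0]
print(info_nce_loss([e1, e2], [e1, e2]))  # near zero: each anchor matches its positive
print(info_nce_loss([e1, e2], [e2, e1]))  # large: positives are swapped
```

Because a bi-encoder embeds mentions and candidates separately, candidate embeddings can be computed once and searched by nearest neighbor at inference time. One plausible way to flag out-of-knowledgebase individuals under this setup (again an assumption, not necessarily the paper's rule) is to treat a mention whose best match falls below a similarity threshold as unmatched.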
American Stories: A Large-Scale Structured Text Dataset of Historical U.S. Newspapers
Existing full-text datasets of U.S. public domain newspapers do not recognize the often complex layouts of newspaper scans; as a result, the digitized content scrambles text from articles, headlines, captions, advertisements, and other layout regions. OCR quality can also be low. This study develops a novel deep learning pipeline for extracting full article texts from newspaper images and applies it to the nearly 20 million scans in the Library of Congress's public domain Chronicling America collection. The pipeline includes layout detection, legibility classification, custom OCR, and association of article texts spanning multiple bounding boxes. To achieve high scalability, it is built with efficient architectures designed for mobile phones. The resulting American Stories dataset provides high-quality data that could be used to pre-train a large language model for better understanding of historical English and historical world knowledge. The dataset could also be added to the external database of a retrieval-augmented language model to make historical information, ranging from interpretations of political events to minutiae about the lives of people's ancestors, more widely accessible. Furthermore, structured article texts facilitate using transformer-based methods for popular social science applications like topic classification, detection of reproduced content, and news story clustering. Finally, American Stories provides a massive silver-quality dataset for developing new multimodal layout analysis models and other multimodal applications.
The Economic Origins of Government
American Economic Review, Volume 113, Issue 10, Pages 2507–2545, lead article
We test between cooperative and extractive theories of the origins of government. We use river shifts in southern Iraq as a natural experiment, drawing on a new archeological panel dataset. A shift away creates a local demand for a government to coordinate, because private river irrigation needs to be replaced with public canals. It also disincentivizes local extraction, as land is no longer productive without irrigation. Consistent with a cooperative theory of government, a river shift away led to state formation, canal construction, and the payment of tribute. We argue that the first governments coordinated between extended households, which implemented public good provision.
The Long-Run Impact of the Dissolution of the English Monasteries
Quarterly Journal of Economics, Volume 136, Issue 4, Pages 2093–2145
We examine the long-run economic impact of the Dissolution of the English monasteries in 1535, which is plausibly linked to the commercialization of agriculture and the location of the Industrial Revolution. Using monastic income at the parish level as our explanatory variable, we show that parishes which the Dissolution impacted more had more textile mills and employed a greater share of population outside agriculture, had more gentry and agricultural patent holders, and were more likely to be enclosed. Our results extend Tawney’s famous ‘rise of the gentry’ thesis by linking social change to the Industrial Revolution.
The Origins of Violence in Rwanda
The Review of Economic Studies, Volume 88, Issue 2, Pages 730–763
This paper shows that the intensity of violence in Rwanda's recent past can be traced back to the initial establishment of its precolonial state. Villages that were brought under centralized rule one century earlier experienced a doubling of violence during the state-organized 1994 genocide. Instrumental variable estimates exploiting differences in proximity to Nyanza, an early capital, suggest these effects are causal. In other periods, when the state faced rebel attacks, violence was lower in villages with longer state presence. Using data from several sources, including a lab-in-the-field experiment across an abandoned historical boundary, I show that the effect of the historical state is primarily sustained by culturally transmitted norms of obedience. The persistent effect of the precolonial state interacts with government policy: where the state developed earlier, there is more violence when the Rwandan government mobilized for mass killing and less violence when the government pursued peace.
Short papers and invited submissions
The Collapse of Civilization in Southern Mesopotamia
Cliometrica, Volume 16, Issue 2, Pages 369–404
In the late 9th century, rural settlement, agriculture, and urbanization all collapsed in southern Mesopotamia. We first document this collapse using newly digitized archaeological data. We then present a model of hydraulic society that highlights the collapse of state capacity as a proximate cause of the collapse of the economy, and a shortened horizon of the ruler as a potential driver of the timing of the collapse. Using cross sections of tax collection data for 27 districts in southern Mesopotamia in 812, 846, and 918, we verify that the proximate cause of the crisis was the collapse in state capacity, which meant that the state no longer maintained the irrigation system. A particularly destructive succession struggle, shortening the investment horizon of rulers, determined the timing of the crisis.
Colonialism and Economic Development in Africa
In Carol Lancaster and Nicolas Van de Walle (eds.), Handbook on the Politics of Development, Oxford University Press.
In this paper we evaluate the impact of colonialism on development in Sub-Saharan Africa. In the world context, colonialism had very heterogeneous effects, operating through many mechanisms, sometimes encouraging development and sometimes retarding it. In the African case, however, this heterogeneity is muted, making an assessment of the average effect more interesting. We emphasize that to draw conclusions it is necessary not just to know what actually happened to development during the colonial period, but also to take a view on what might have happened without colonialism and to take into account the legacy of colonialism. We argue that in light of plausible counterfactuals, colonialism probably had a uniformly negative effect on development in Africa. To develop this claim we distinguish between three sorts of colonies: (1) those that coincided with a pre-colonial centralized state, (2) those of white settlement, and (3) the rest. Each has a distinct performance within the colonial period, different counterfactuals, and varied legacies.
Research methods
Spatial standard errors for several commonly used M-estimators
We present the asymptotic variance-covariance matrix for M-estimators and show how it can be used to compute spatial standard errors for a large number of commonly used (non-linear) estimators. We consider OLS, logit, and probit estimators, Poisson and negative binomial regressions, and the special Stata estimators areg and reghdfe. We provide Stata and Python software to implement our findings.
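The sandwich structure behind spatial standard errors can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' Stata or Python package: it assumes a Conley-style uniform kernel (observations closer than a distance cutoff are treated as correlated), Euclidean distance on planar coordinates, and OLS as the M-estimator, and it omits small-sample corrections. The variance is V = H^-1 B H^-1, where H is the summed Hessian (X'X for OLS) and B sums the outer products s_i s_j' of per-observation scores (x_i e_i for OLS) over all pairs within the cutoff. All function names are hypothetical.

```python
import math


def mat_inv(a):
    """Invert a small square matrix (list of lists) by Gauss-Jordan elimination."""
    n = len(a)
    m = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))  # partial pivoting
        m[col], m[pivot] = m[pivot], m[col]
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [row[n:] for row in m]


def mat_mul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def spatial_sandwich(scores, hessian, coords, cutoff):
    """Return V = H^-1 B H^-1 with B = sum over pairs (i, j) with d_ij <= cutoff of s_i s_j'.

    scores:  per-observation score vectors s_i
    hessian: summed k x k Hessian H (its sign cancels because H^-1 appears twice)
    coords:  per-observation (x, y) points; Euclidean distance is an assumption
    """
    k = len(hessian)
    meat = [[0.0] * k for _ in range(k)]
    for s_i, c_i in zip(scores, coords):
        for s_j, c_j in zip(scores, coords):
            if math.dist(c_i, c_j) <= cutoff:  # uniform kernel (assumption)
                for a in range(k):
                    for b in range(k):
                        meat[a][b] += s_i[a] * s_j[b]
    h_inv = mat_inv(hessian)
    return mat_mul(mat_mul(h_inv, meat), h_inv)


def ols_spatial_se(X, y, coords, cutoff):
    """OLS coefficients and spatial standard errors (no small-sample correction)."""
    k = len(X[0])
    xtx = [[sum(r[a] * r[b] for r in X) for b in range(k)] for a in range(k)]
    xty = [[sum(r[a] * yi for r, yi in zip(X, y))] for a in range(k)]
    beta = [row[0] for row in mat_mul(mat_inv(xtx), xty)]
    resid = [yi - sum(b * x for b, x in zip(beta, r)) for r, yi in zip(X, y)]
    scores = [[x * e for x in r] for r, e in zip(X, resid)]  # OLS score: x_i * e_i
    V = spatial_sandwich(scores, xtx, coords, cutoff)
    # A uniform kernel does not guarantee a positive semidefinite V; clamp tiny
    # negative diagonals produced by rounding before taking square roots.
    return beta, [math.sqrt(max(V[a][a], 0.0)) for a in range(k)]


X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]  # intercept and one regressor
y = [1.0, 3.0, 2.0, 5.0]
coords = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
print(ols_spatial_se(X, y, coords, cutoff=1.5))
```

For logit, probit, or Poisson, the same `spatial_sandwich` function would apply once that estimator's per-observation scores and summed Hessian are supplied. Two limiting cases show why the cutoff matters: with a cutoff below the smallest inter-point distance, B reduces to the heteroskedasticity-robust (HC0) meat, while with a cutoff spanning every observation the OLS scores sum to zero and the standard errors collapse to zero.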
OCR History
OCR History is a Python wrapper around Google Vision and Amazon Textract that allows for simple prototyping of document digitization. It supports preprocessing such as cropping, grayscale conversion, contrast/brightness adjustment, and splitting into subimages. It returns dataframes for tabular inputs.