Schwalbach, Jan and Christian Rauh. 2021. Collecting large-scale comparative text data on legislative debates. In: The Politics of Legislative Debate Around the World, eds. Marc Debus, Hanna Bäck and Jorge M. Fernandes. Oxford: Oxford University Press, pp. 91–109.
Abstract of chapter:
Parliamentary speeches constitute one of the most consistently available sources of information about the political priorities, actor positions, and conflict structures in democratic states. Recent advances in automated text analysis offer ever more tools for tapping into this information reservoir systematically. However, collecting the high-quality text data needed to unleash the comparative potential of the many available text analysis algorithms is costly and faces various pragmatic hurdles. To address this challenge, this chapter makes three contributions. First, we outline best-practice guidelines and useful tools for researchers who wish to collect new legislative debate corpora or extend existing ones. Second, we present an extended version of the ParlSpeech Corpus, which contains machine-readable full-text vectors of more than six million speeches from the key legislative chambers of nine countries, covering periods of up to 32 years. Third, we highlight the difficulties of comparing text-as-data outputs across parliaments, pointing to differences in language, in traditions and conventions, and in metadata availability. Through these three steps, the chapter aims to encourage more investment in the collection of textual data on legislative debates.
About the book: