The Effect: An Introduction to Research Design and Causality
This page contains chapters from my textbook The Effect: An Introduction to Research Design and Causality. This book is intended to introduce students to the concepts of research design and causality in the context of observational data. The book is written in an intuitive way and doesn’t overload on concepts. Why teach regression and research design at the same time when they are fundamentally different things? First learn why you want to structure a design in a certain way, and what it is you want to do to the data, and then afterwards learn the technical details of how to run the appropriate model. This book consists of a Part 1 dedicated to research design and causality, making use of causal diagrams to make the concept of identification straightforward, and a Part 2 dedicated to implementation and common research designs like regression with controls and regression discontinuity. This book is still in the revision process. Please do send any comments or questions about the material to nhuntington-klein@seattleu.edu or to me on Twitter at @nickchk.

The full book up to the currently-available chapters can be found here.
Part 1: The Design of Research

In this first half of the book, I focus on the concepts behind research. What are we actually trying to do? Develop research questions that, when answered, tell us something interesting about the world. Then we figure out how to select and describe the probability distributions and relationships that answer that research question.

This short introductory chapter makes the case for why research needs a design, and some of the hurdles we’re going to have to overcome to do it!



Empirical research allows us to learn about how the world works. In order to do that we’re going to need to design a study that answers a question. But there are a lot of questions that we can answer without actually learning much of anything. Which questions are good and which aren’t, and how can we tell? Which questions let us learn about how the world works?



This chapter is all about how to describe a single variable - distributions, types of variables, means, medians, etc. After all, properly describing certain features of a probability distribution is… uh… all that empirical research is, really.



This chapter covers how to talk about the relationship between two different variables, with a focus on what we’re actually trying to do rather than on technical aspects. What is a conditional distribution? What is a conditional conditional distribution? Should we fit a line to data or take local means?
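To make the line-versus-local-means question concrete, here is a minimal Python sketch. The data are made up for illustration, and the helper names `local_means` and `fit_line` are hypothetical, not taken from the book. It computes the mean of y at each value of x and compares that with the predictions of a straight line fit by ordinary least squares.

```python
from collections import defaultdict

# Made-up data for illustration only: x takes three values, y jumps at x = 3.
x = [1, 1, 2, 2, 3, 3]
y = [2, 4, 3, 5, 10, 12]


def local_means(xs, ys):
    """Mean of y computed separately for each distinct value of x."""
    groups = defaultdict(list)
    for xi, yi in zip(xs, ys):
        groups[xi].append(yi)
    return {xi: sum(vals) / len(vals) for xi, vals in sorted(groups.items())}


def fit_line(xs, ys):
    """Ordinary least squares intercept and slope for y = a + b * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(xs, ys))
    sxx = sum((xi - x_bar) ** 2 for xi in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


means = local_means(x, y)
intercept, slope = fit_line(x, y)
# Predictions from the fitted line at the same x values, for comparison.
line_predictions = {xi: intercept + slope * xi for xi in means}

print("local means:     ", means)
print("line predictions:", line_predictions)
```

With these made-up numbers the local means are 3, 4, and 11 while the line predicts 2, 6, and 10. The line smooths over the jump at x = 3 that the local means pick up, which is the kind of tradeoff the question is pointing at.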



This chapter talks about the concept of a data generating process and how to identify the answer to your research question. All analyses answer a question. How can you make sure that you’re answering the one you intend to answer? That’s identification.



Data generating processes are a bundle of causal relationships, and learning about how the world works often means trying to untangle one of those strands. Tall order. To do all that we’re going to need to take a good hard look at the knot first. Causal diagrams are a simple way of representing a data generating process. Once we have them figured out, the task of identifying the answer to our research question will become much easier.



Causal diagrams as a concept are well and good but they won’t do much if we don’t have one for our particular research question. This chapter covers how you can build your own model of how the world works. Then you can use it to identify the answer to your research question!



When we observe in data that two variables are related to each other, why is that? There must be some causal pathway linking them, and we should be able to see that causal pathway on a diagram. Thinking about these causal pathways both lets us learn more about why two variables are related, and lets us distinguish the paths we’re trying to uncover (the ones that are related to our research question) from the paths that we’re trying to close down (the distractions!). How can we carve out just the pathways we want?
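As a rough illustration of closing down a distracting path, here is a small Python sketch. Everything in it is an assumption made for the example: the variables `x`, `y`, and `z`, the made-up data, and the choice to close the path by comparing people only within the same value of `z`. The data are built so that `x` raises `y` by exactly 1, while `z` raises `y` by 5 and also makes `x = 1` more common.

```python
from statistics import mean

# Hypothetical data: y = 1 * x + 5 * z, and x = 1 is more common when z = 1.
rows = (
    [{"z": 0, "x": 0, "y": 0}] * 3
    + [{"z": 0, "x": 1, "y": 1}] * 1
    + [{"z": 1, "x": 0, "y": 5}] * 1
    + [{"z": 1, "x": 1, "y": 6}] * 3
)


def diff_in_means(data):
    """Mean of y where x = 1 minus mean of y where x = 0."""
    with_x = [r["y"] for r in data if r["x"] == 1]
    without_x = [r["y"] for r in data if r["x"] == 0]
    return mean(with_x) - mean(without_x)


# Raw comparison: mixes the path we want (x -> y) with the path through z.
raw_difference = diff_in_means(rows)

# Compare only within each value of z, then average by group size.
within = {}
for z_value in (0, 1):
    subset = [r for r in rows if r["z"] == z_value]
    within[z_value] = (diff_in_means(subset), len(subset))

adjusted_difference = sum(d * n for d, n in within.values()) / len(rows)

print("raw difference:     ", raw_difference)
print("adjusted difference:", adjusted_difference)
```

In this constructed example the raw difference is 3.5, but within each value of `z` the difference is 1, so the adjusted difference recovers the effect that was built into the data.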



So far we’ve talked about how to identify our answer by closing back doors, identifying our pathways of interest by carving them out of the raw data. But that’s not the only way we can do it. Sometimes it’s possible to pinpoint the paths we want and estimate them directly. It’s a little bit of magic! How can we do it?



What IS “the effect” anyway? Are we really supposed to believe, in social science no less where everything is different for everyone, that X has only a single effect on Y, and it works the same way for everybody? Of course not! Things get interesting when we start thinking about how X might have a different effect on Y for different people, and what our attempts to identify the answer to our research question even mean at this point, and what we’re estimating. What is an average treatment effect, and is it really what we want?
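Here is a minimal Python sketch of why "the" effect depends on whose effects you average. The four people and their potential outcomes (`y0` without treatment, `y1` with treatment) are hypothetical, and in real data we would never observe both outcomes for the same person. The sketch only shows the arithmetic behind different averages.

```python
from statistics import mean

# Hypothetical potential outcomes: y0 without treatment, y1 with treatment.
people = [
    {"y0": 10, "y1": 11, "treated": False},  # individual effect of 1
    {"y0": 12, "y1": 13, "treated": False},  # individual effect of 1
    {"y0": 10, "y1": 13, "treated": True},   # individual effect of 3
    {"y0": 12, "y1": 15, "treated": True},   # individual effect of 3
]

# Each person's own effect of treatment.
effects = [p["y1"] - p["y0"] for p in people]

# Average treatment effect: the mean effect over everyone.
ate = mean(effects)

# Average effect among only the people who actually got treated.
att = mean([p["y1"] - p["y0"] for p in people if p["treated"]])

# Average effect among only the people who did not get treated.
atu = mean([p["y1"] - p["y0"] for p in people if not p["treated"]])

print("average over everyone: ", ate)
print("average among treated: ", att)
print("average among untreated:", atu)
```

Here the average over everyone is 2, but it is 3 among the treated and 1 among the untreated, so which average a given design recovers matters for what question it answers.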



This book lays out a careful process for thinking about data generating processes, using all the information we have, and pinpointing exactly what has to occur to be able to identify our effects. But what if we’re a lot less certain than that? What can we do when any causal diagram we draw feels just like a guess? In this chapter we talk about a few of the approaches we can take, and some of the stumbling blocks we might run into when using them.



Part 2: The Toolbox
The second part of the book, The Toolbox, is coming a bit later (with only two chapters currently completed), and will focus on standard research designs that are commonly used in applied work. Each chapter will cover, in depth, the concept behind the research design, standard approaches to implementation, and some notes on “How the Pros Do It” - concerns, tricks, and methods related to the design that you might not always see in a textbook, but are a part of every professional researcher’s kit.