On this page I have compiled a list of useful resources for econometrics students. It includes resources for learning R, information about R packages that can be used to load or import data, resources for learning Stata, and other resources, including links about causal inference, data sources, and more. More generally, I recommend the Library of Statistical Techniques, which contains code examples for all sorts of tasks in Stata, R, Python, etc.
Stata2R. A side-by-side translation guide between Stata and two enormously powerful R packages: data.table for data wrangling and fixest for regression analysis.
Data Wrangling in the Tidyverse and/or data.table. Videos and materials about cleaning and manipulating data in R using the tidyverse or data.table (and also Python/pandas).
Power Analysis using Simulation in R. How to perform a power analysis using simulation in R.
Teaching Econometrics with R. Workshop slides about how R can be used in an econometrics classroom, focusing on data manipulation and getting familiar with the language, targeted at economics faculty who may already be familiar with other statistics packages.
R-Project.org and RStudio.com. Get started by installing R and RStudio (all for free)! First install the latest version of R itself from R-Project. Then install RStudio.
RStudio Cheat Sheets. R basics in an easily accessible, well-laid-out format! Heck, print one and tape it to the wall above where you work.
R for Economists. This is a series of videos I made to make it easy to get started with R. I try to introduce everything you are likely to need to know when using R in undergraduate economics classes.
Using R for Introductory Econometrics by Florian Heiss. This is a free e-book (physical version available cheap on the website) that describes using R in introductory econometrics courses. The book is designed to work alongside the introductory Wooldridge textbook.
Guide to R for Santa Clara University Economics Students. Extensive and very useful guide taking you from the very first steps of using R, with an economics focus. Holds your hand!
Introduction to the dplyr package. If you have your own data that you need to clean, the dplyr package is indispensable. Doing things like subsetting data, renaming variables, creating new variables, and sorting data is certainly possible without it, but much easier with it! Watch this video!
Introductory Machine Learning in R. Most of you won’t need this! But if you’re interested in machine learning, R is the place to go (unless you know Python), and you can get started here.
Graphs in R (ggplot2). More advanced stuff. But if you want to make some real pretty graphs in R you have to learn ggplot2. Here’s a place to start.
R for Stata Users. Making the switch? I’ve found this site very helpful, and a good source of functions that I’ve literally spent hours trying to replicate! The “Split-Apply-Combine” section is also a good source of more information on using dplyr.
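To give a flavor of the simulation approach to power analysis listed above, here is a minimal sketch using only base R. It is not taken from the linked guide: the function name simulate_power, the data-generating process (a standard normal regressor with standard normal noise), and all default values are assumptions chosen purely for illustration.

```r
# Minimal power-analysis-by-simulation sketch using only base R.
# simulate_power and its defaults are hypothetical, for illustration only.
simulate_power <- function(n, effect, n_sims = 500, alpha = 0.05) {
  rejections <- replicate(n_sims, {
    x <- rnorm(n)                  # regressor of interest
    y <- effect * x + rnorm(n)     # outcome with a known true effect
    fit <- summary(lm(y ~ x))      # estimate the regression
    p_value <- fit$coefficients["x", "Pr(>|t|)"]
    p_value < alpha                # TRUE if the effect is detected
  })
  mean(rejections)                 # share of simulations that reject the null
}

set.seed(1000)
power_small <- simulate_power(n = 50, effect = 0.2)
power_large <- simulate_power(n = 500, effect = 0.2)
print(c(n_50 = power_small, n_500 = power_large))
```

The estimated power is just the share of simulated data sets in which the null is rejected, so it should rise as the sample size grows. In a real application you would replace the data-generating step with something that mimics your own data and model.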
wbstats, install with install.packages('wbstats'). This package provides access to all the data available on the World Bank API, which is basically everything on their website. The World Bank keeps track of many country-level indicators over time.
tidycensus, install with install.packages('tidycensus'). This package gives you access to data from the US Census and the American Community Survey (plus, if you’re working with maps, US shape files!). These are the largest high-quality data sets you’ll find of cross-sectional data on individual people in the US. You’ll need to get a (free) API key from the Census website.
tidyquant and fredr, install with install.packages('tidyquant') or install.packages('fredr'). If you’re looking for financial data, these are your ticket! tidyquant gets data from a number of financial sources, including FRED. fredr only gets data from FRED but is a little easier to use.
icpsrdata, install with install.packages('icpsrdata'). This downloads data from ICPSR (you’ll need an account and a keycode). ICPSR is a database of datasets from published social science papers for the purposes of reproducibility. If you’re looking for a study to replicate you could do worse than looking through their database, and using this package to get the data!
NHANES, install with install.packages('NHANES'). Data from the US National Health and Nutrition Examination Survey. If you’re interested in health economics you may want to look here.
ipumsr, install with install.packages('ipumsr'). IPUMS is a fantastic site that has census data from all around the world, in addition to the US census, American Community Survey, and Current Population Survey. If you’re doing international micro work, look at IPUMS. It’s also the easiest way to get the Current Population Survey (CPS), which is very popular for labor economics. Unfortunately ipumsr won’t get the data from within R; you’ll have to make your own data extract on the IPUMS website and download it. But ipumsr will read that file into R and preserve things like names and labels.
education-data-package-r, install with devtools::install_github('UrbanInstitute/education-data-package-r') (you may have to do install.packages('devtools') first). This package provides access to data on educational institutions in the US, including colleges (in IPEDS and College Scorecard) and K-12 schools (in CCD). This package also has data on county-level poverty rates from SAIPE. You may also want to consider the rscorecard package (install with install.packages('rscorecard')) for College Scorecard data specifically, although this requires an API key.
psidR, install with install.packages('psidR'). The Panel Study of Income Dynamics is a study that doesn’t just follow people over their lifetimes, it follows their children too, generationally! A great source for studying how things follow families through generations.
atus, install with install.packages('atus'). Data from the American Time Use Survey, which is a large cross-sectional data set with information on how people spend their time.
Rilostat, install with install.packages('Rilostat'). Data from the International Labor Organization. This contains lots of different statistics on labor, like employment, wage gaps, etc., generally aggregated to the national level and changing over time.
gtrendsR, install with install.packages('gtrendsR'). This package will download data from Google Trends on the popularity of search terms over time. You can even do searches within specific geographic areas. Be aware that the results of a given query are only meant to be comparable within that query. The scale changes from search to search, so numbers from two separate queries can’t be compared directly!
In addition to the World Bank data above, there are several other sources of data that you can use to make international comparisons: gapminder, install with install.packages('gapminder'), which focuses on life expectancy and GDP per capita changes over time; democracyData, install with devtools::install_github("xmarquez/democracyData") (you may have to do install.packages('devtools') first), which contains data on the presence of democratic institutions; and similarly data360r, install with devtools::install_github("mrpsonglao/data360r"), which has data on trade competitiveness and governance indicators from the World Bank.
fivethirtyeight, install with install.packages('fivethirtyeight'), contains many of the data sets used to write articles on the website FiveThirtyEight. There are data sets on politics and entertainment, and in particular if you’re interested in sports economics you may look here (while noting that for any given sport you may be interested in, Googling something like “r package NFL” will turn up a package to download data from that sport).
There are a number of packages that help you download data related to politics and elections. politicaldata, install with install.packages('politicaldata'), has information on US elections, nominations, and polls. Maybe check out the house_results data in there so you can use close elections as an RDD! rsunlight from ProPublica, install with install.packages('rsunlight'), has detailed information on individual bills and which members of Congress sponsor them. You’ll need a (free) ProPublica API key.
R already has a bunch of data sets available in it, or in packages you might already use. This handy page documents many of the data sets available easily from within R.
There’s LOTS of data out there that is easy to get but doesn’t have a handy R package to go along with it! In addition to some major data sets that simply make you download a file and load it into R normally, like the National Longitudinal Studies (fantastic data that follows a group of people throughout their lives and is very commonly used), there are plenty of “APIs” out there prepared to give you whatever data you ask for! See this guide if you’re interested in grabbing data from APIs using R. The example it gives there grabs some data available on the New York City website.
Power Analysis using Simulation in Stata. How to perform a power analysis using simulation in Stata.
Advanced Stata Tips Videos. These are some videos I’ve made about some advanced Stata techniques.
Stata Learning Modules from IDRE. Starting-out tips for Stata, with a nice introduction.
Statalist. Where Google will take you most of the time if you have Stata questions. No better place for answers to tricky Stata questions. Be sure to search the existing threads before posting, as someone else has probably asked your question before.
Microeconometrics Using Stata by Cameron and Trivedi (https://www.stata.com/bookstore/microeconometrics-stata/). Not free! But pretty darn good. An econometrics book focused on Stata applications.
Data Visualization Checklist. A checklist to use whenever you’re preparing a data visualization.
The Effect: An Introduction to Research Design and Causality. I think it’s pretty good if I do say so!
Causal Inference Slides. Slides from my causal inference class.
How to Put Together a Regression. A flowchart for taking a bunch of variables and putting together a regression model from them.
Robustness Tests: What, Why, and How. A primer for econometrics students about how robustness tests work, how to think about them, and why they are used.
Teaching with Causal Diagrams. Slides from a workshop about teaching research design, econometrics, and applied economics to undergraduates with causal diagrams.
Kaggle data sets. The data science site Kaggle has all kinds of wild data sets on everything you can imagine. Generally these aren’t created with economics in mind, but some work for it anyway! You may be able to find something for an interesting project here.
Econometrics Navigator. A great resource with information on econometric methods, data sources, and textbooks.
Causal Inference Animated Plots. A series of animated plots showing what various causal inference methods actually do to data and how they work.
Causal Inference: The Mixtape by Cunningham. A free textbook focusing on the ins and outs of causal inference. Includes Stata code but that’s less of the focus.
IPUMS. No easier way to get census-style or panel data from large representative data sets.
Commonly Used Data Sources by Field. A list of very useful data sets you may want to consider for your project!
Google Dataset Search. A search engine for data sets! Mostly useful for finding previous studies that have made their data sets available publicly but aren’t large standard data sets of the kind you can get on IPUMS.
Web Plot Digitizer. Ever seen a graph and wished you had the underlying data to play around with? Check this out.