The WiNDC Household Data

Mitch Phillipson November 21, 2025


The WiNDC household data was created and has been maintained by Drew Schreiber. The data is pulled from a variety of sources using a sequence of R scripts. I am working on converting these methods to Julia, but I also need to be mindful of future maintenance in light of WiNDC funding. To that end, I plan to create an R package that will contain the household data processing methods. Julia can call R code through the RCall.jl package, so I can use the R package within Julia scripts until I have time to write a native solution.

WiNDC Household Data Process

The goal of the household disaggregation is to add five households, corresponding to income quintiles. The raw data comes primarily from the Current Population Survey (CPS) and the American Community Survey (ACS). This is then combined with labor and capital data from the WiNDC state-level disaggregation.
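To make the quintile idea concrete, here is a minimal sketch of how households could be assigned to income quintiles. It is written in Python purely for illustration and is not taken from the R scripts. The function name, the use of survey weights, and the midpoint rule for placing a household are all assumptions, and the incomes are made-up numbers.

```python
def assign_quintiles(incomes, weights):
    """Label each household 1-5 by weighted income rank (hypothetical helper)."""
    if len(incomes) != len(weights):
        raise ValueError("incomes and weights must have the same length")

    # Visit households from lowest to highest income.
    order = sorted(range(len(incomes)), key=lambda i: incomes[i])
    total_weight = sum(weights)

    labels = [0] * len(incomes)
    cumulative = 0.0
    for i in order:
        # Place the household at the midpoint of its share of total weight
        # (an assumed convention), then map that share onto five equal bins.
        midpoint = (cumulative + weights[i] / 2) / total_weight
        labels[i] = min(int(midpoint * 5) + 1, 5)
        cumulative += weights[i]
    return labels


# Made-up incomes with equal weights, for illustration only.
incomes = [12_000, 30_000, 45_000, 60_000, 85_000,
           150_000, 22_000, 52_000, 98_000, 38_000]
weights = [1.0] * len(incomes)
quintiles = assign_quintiles(incomes, weights)
```

With equal weights, each quintile holds the same number of households. With real survey weights, each quintile would instead hold roughly the same share of the weighted population.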

The requirement for state-level data is a weakness of the current R scripts. When updating the data it would be logical to update all of the raw data, then use GAMS to generate the GDX files and models. However, in the current process we must:

  1. Update the core data (using Python)
  2. Run the core module in GAMS to generate the state-level data
  3. Run the R scripts to generate the raw household data
  4. Run the household module in GAMS to generate the household-level data.

This process is inherently fragile: the longer the data chain, the more likely something will go wrong.

The R Scripts

The files to generate the household data were not available on GitHub until yesterday; here is the repository. There is almost no documentation, so my plan is to use this as a starting point and formalize the methods into an R package. This will allow for better version control, testing, and documentation.

Currently, each script in the repository is a standalone file. My plan is to refactor these into functions that can be called from a main script, giving the process a clear entry point. The main function will let the user specify the windc_build directory and will check that the necessary GDX files exist. It will also require the user to supply their own Census and BEA API keys, since sharing keys publicly is bad practice, even if the risk is low for publicly available data.
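The checks that such an entry point would run before doing any work can be sketched as follows. The planned implementation is an R package, so this Python version only illustrates the logic. The function name, the environment variable names CENSUS_API_KEY and BEA_API_KEY, and the idea of passing the required GDX file names as an argument are all assumptions, since the post does not specify them.

```python
import os
from pathlib import Path

# Assumed environment variable names; the real package may use others.
API_KEY_VARS = ("CENSUS_API_KEY", "BEA_API_KEY")


def check_inputs(windc_build_dir, required_gdx, env=None):
    """Return a list of problems found; an empty list means ready to run."""
    env = os.environ if env is None else env
    build_dir = Path(windc_build_dir)
    problems = []

    if not build_dir.is_dir():
        problems.append(f"windc_build directory not found: {build_dir}")
    else:
        # Only look for GDX files once the directory itself is confirmed.
        for name in required_gdx:
            if not (build_dir / name).is_file():
                problems.append(f"missing GDX file: {build_dir / name}")

    # Keys come from the user's environment, never from the repository.
    for var in API_KEY_VARS:
        if not env.get(var):
            problems.append(f"environment variable {var} is not set")

    return problems
```

Collecting every problem before stopping, rather than failing on the first one, lets the user fix a missing file and a missing key in a single pass.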