
Aggregating WiNDC National

Mitch Phillipson January 06, 2026


This example walks through the code in this repository, which aggregates the WiNDC National dataset to a size that can be run with a trial PATH license (fewer than 300 variables in the model).

We recommend using Julia with VSCode. Download VSCode from code.visualstudio.com and Julia from julialang.org. After installing Julia, install the Julia extension in VSCode by searching for “Julia” in the Extensions tab.

Setting up the Environment

To run this code, clone the repository to your local machine. Start Julia in the repository folder and activate/instantiate the environment:

julia> ]
(v1.12) pkg> activate .
(National_Aggregated.jl) pkg> instantiate

This will download the packages listed in Project.toml and create a Manifest.toml file. If you are familiar with Python, this is essentially creating a virtual environment for this project. If you are unfamiliar with environments, they are a way to specify specific package versions for a project without affecting other projects on your machine.
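If you prefer to do this from a script rather than the Pkg REPL prompt, the standard Pkg API does the same thing:

using Pkg
Pkg.activate(".")     # activate the project environment in this folder
Pkg.instantiate()     # download the package versions recorded in Project.toml/Manifest.toml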

In this example, we pin WiNDCNational to version 0.3.2 in the Project.toml file because that version has a summary.yaml file that does not include 2024. The second example below demonstrates how to modify the summary.yaml file in the WiNDCNational package to include 2024.

The First Example

The first example is in first_example.jl and demonstrates how to aggregate the sectors/commodities in the WiNDC National dataset. Open the file; with the Julia extension installed, you can run each line in VSCode by pressing Shift + Enter. Feel free to run each line in the file. We explain our logic below.

Loading Packages

We load packages with two different methods. The first is using, which loads all exported functions from a package. For example, using DataFrames means we can use every exported function in the DataFrames package without prefixing it with DataFrames (e.g. subset instead of DataFrames.subset). The second is import, which only loads the named items. For example, after import MPSGE, any function from the MPSGE package must be prefixed with MPSGE (e.g. MPSGE.solve!). We also use import WiNDCContainer: aggregate, which loads only the aggregate function from the WiNDCContainer package, so we can call aggregate without prefixing it with WiNDCContainer.
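Here is a sketch of what the loading block in first_example.jl might look like (the exact package list in the file may differ; loading WiNDCNational via using is our assumption, since build_us_table and calibrate are called unprefixed):

using DataFrames                   # exported names like subset and innerjoin come into scope
using CSV                          # read/write the aggregation mapping files
using WiNDCNational                # assumed: provides build_us_table, calibrate, national_mpsge
import MPSGE                       # qualified access only: MPSGE.solve!, MPSGE.value
import WiNDCContainer: aggregate   # bring just the aggregate function into scope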

Building the Data

In WiNDCNational, we load the summary-level US National dataset by calling:

X = build_us_table(:summary)

This function downloads the necessary data files and builds a National object that contains the summary-level national data. The WiNDCNational package provides a summary.yaml file that defines where to download the data files and the structure of the data. We will discuss this file in the next example.

WiNDCNational has support to load the detailed-level table as well. You can load the detailed-level table by calling:

build_us_table(:detailed)

The detailed data is available for the years 2007, 2012, and 2017. If you only need a few detailed sectors/commodities, you can follow this example to aggregate the detailed data to your desired level of detail.

We do not automatically calibrate the data. This is primarily for transparency, so users can see the data before calibration. To calibrate the data, we call:

summary, M = calibrate(X)

This function returns two items: a National object, summary, containing the calibrated data, and a JuMP model, M, containing the calibration model.
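Since M is an ordinary JuMP model, you can inspect the calibration solve with standard JuMP tooling (this step is ours, not part of the original script):

import JuMP
JuMP.termination_status(M)   # confirm the calibration problem solved cleanly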

Exploring the Data

Before diving into the aggregation process, we should discuss how to explore the data. The National object, summary, stores all the data in a structured format. It holds three dataframes: one for the data, one for the sets, and one for the elements. The data can be viewed using table:

table(summary)
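To peek at just the first few rows of the result, standard DataFrames usage applies (not in the original example):

first(table(summary), 5)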

This dataframe has five columns: row, col, year, parameter, and value. The first three columns define the domain of the data, the parameter column defines the type of data (e.g. intermediate_demand, import, etc.), and the value column contains the actual data values. Each entry in a domain column corresponds to an element of a set. We can see this by first looking at the sets:

sets(summary)

The table below is a subset of the sets:

name        description   domain
commodity   Commodities   row
sector      Sectors       col
year                      year
trade       Trade         col
export      Exports       col

The domain of the set commodity is row, meaning that the commodities are only in the row column of the data table. Similarly, the domain of the set sector is col, meaning that the sectors are only in the col column of the data table. We can extract all the data with elements from a specific set by using the table function with extra arguments. For example, to extract all the data working with commodities:

table(summary, :commodity)

The only entries in the row column of the data table will be commodities. Notice you didn’t need to specify the row column, because the set commodity only has a domain of row. If you want further restrictions, such as keeping only commodities and sectors, you can call:

table(summary, :commodity, :sector)

To view the elements in a set, we can use the elements function. For example, to view all the commodities:

elements(summary, :commodity)

Similar to the table function, you can specify multiple sets. For example, to view all the commodities and sectors:

elements(summary, :commodity, :sector)

Parameters are sets with elements in the parameter column of the data. To view all the parameters, we can use a DataFrame subset:

subset(sets(summary), :domain => ByRow(==(:parameter)))

By convention, parameter names are Title_Case (each word capitalized and separated by underscores) while elements are snake_case. For example, the parameter Intermediate_Demand corresponds to the element intermediate_demand:

elements(summary, :Intermediate_Demand)

Just like sets, parameters can be extracted from the data table:

table(summary, :Intermediate_Demand)

You may notice all the values in the above table are negative. By convention, we take inputs/demand/uses to be negative and outputs/supply to be positive. This simplifies the computation of aggregate parameters as we can use DataFrame operations without adjusting the sign of the values. Signs can be flipped using the normalize keyword argument in the table function:

table(summary, :Intermediate_Demand; normalize = :Use)

The value of the normalize keyword can be any parameter. We used Use above as that is the parameter that contains all the uses (inputs/demand).

Some parameters, such as Use, are composite parameters. They are made up of multiple underlying parameters. The parameter Use is composed of all the parameters defined in the BEA Use table:

elements(summary, :Use)

Finally, you can filter the data to extract a single element from a set. For example, to extract the NAICS code 111CA commodity:

table(summary, :commodity => Symbol("111CA"))

Or, to extract the NAICS code 111CA commodity in the year 2023:

table(summary, :commodity => Symbol("111CA"), :year => 2023)

Aggregation

To aggregate a set, we need to define a mapping from old elements to new elements. In this example we will aggregate both commodities and sectors. The easiest way to define the mapping is with a CSV file, modifying it in Excel. We’ve seen how to extract the commodities and sectors; now we just need to write them to a CSV file:

elements(summary, :commodity, :sector) |> 
    x -> CSV.write("sets/pre_aggregation.csv", x)

This uses a pipe, |>, in Julia. You can read this as taking the output of the left-hand side and passing it in as x on the right-hand side. So the above code is equivalent to:

temp = elements(summary, :commodity, :sector)
CSV.write("sets/pre_aggregation.csv", temp)

just with no temporary variable.

Open this file in Excel and add a column (ours is called aggregate) that defines the new aggregated element for each commodity/sector. Save this file as sets/aggregation.csv. You can define any aggregation you want; we used the base NAICS codes to aggregate the commodities and sectors to their respective 2-digit NAICS codes. You can add as many columns as you’d like; you just need to select the right ones when loading the file back into Julia, as sketched below.
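For illustration, the first few rows of such a mapping file might look like this (hypothetical entries; the exact columns depend on the elements output, and the codes depend on your chosen aggregation):

name,set,aggregate
111CA,commodity,agriculture
113FF,commodity,agriculture
111CA,sector,agriculture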

Now we can load this file back into Julia:

aggregate_sectors = CSV.read(
    "sets/aggregation.csv", 
    DataFrame, 
    select = [:aggregate, :name, :set], 
    types = Symbol
    )

This command will read the CSV file into a DataFrame, only selecting the columns aggregate, name, and set, and converting the values to Symbol type.

The aggregation is then trivial using the aggregate function:

X = aggregate(
    summary,
    aggregate_sectors;
    set = :set,                 # optional: which column contains the set names
    old_element = :name,        # optional: which column contains the old element names
    new_element = :aggregate    # optional: which column contains the new element names
)

The returned X is a new National object with the aggregated commodities and sectors. We can check the commodity elements to verify the aggregation worked:

elements(X, :commodity)

MPSGE Model

The model is created using the national_mpsge function:

M = national_mpsge(X)

To view the documentation for this function, you can either view the National Model Documentation, or use the Julia help mode by typing ? in the Julia REPL and then entering national_mpsge.

By default, the model is only for the year 2023. If you want to create a model for a different year, you can specify the year keyword argument. To create a model for 2017, use:

M_2017 = national_mpsge(X; year = 2017)

As you may have observed, the printed model is quite long. We are working on a more concise model view.

To solve the model, use the solve! function from the MPSGE package. We will first set the cumulative_iteration_limit to 0 so that we can verify the data is calibrated at the benchmark solution:

MPSGE.solve!(M; cumulative_iteration_limit=0)

If the model solves successfully, you should see a Postsolved Residual that is very close to 0 (mine is around 1e-11). After solving the model, you can extract results using value. For example, to view the value of the Y variable for agriculture:

MPSGE.value(M[:Y][:agriculture])

It’s not surprising that the value is 1; that is the benchmark solution. Let’s set a counterfactual shock and re-solve the model. There are three parameters available to shock:

  - Import_Tariff[commodity]
  - Absorption_Tax[commodity]
  - Output_Tax[sector]

Let’s set tariffs to be a flat 30% across all commodities:

MPSGE.set_value!(M[:Import_Tariff], 0.3)
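As an aside (not applied in this walkthrough), you could target a single element by indexing the parameter, assuming indexed parameters support element-wise set_value! the same way variables support indexing:

MPSGE.set_value!(M[:Output_Tax][:agriculture], 0.1)   # hypothetical: 10% output tax on agriculture only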

Now we resolve the model:

MPSGE.solve!(M)

Now if we check the value of Y for agriculture again:

MPSGE.value(M[:Y][:agriculture])

We expect an increase in the output of agricultural goods of about 2.3%. However, it is much more informative to rebuild the entire data table to see the full effects of the tariff shock. We can do this using the reconstruct_table function from the WiNDCNational package:

Y = WiNDCNational.reconstruct_table(X, M)

This is a new National object, Y, that contains the counterfactual data after the tariff shock. We can compare the benchmark and counterfactual data using an inner join:

innerjoin(
  table(X, :year => 2023),
  table(Y, :year => 2023),
  on = [:row, :col, :year, :parameter],
  renamecols = "_benchmark" => "_counterfactual"
)

We use an innerjoin because some parameters, like Investment_Final_Demand, get aggregated in the reconstruction. We can view the elements for both X and Y to see the differences:

elements(X, :Investment_Final_Demand)
elements(Y, :Investment_Final_Demand)

X is more disaggregated than Y. This is because the model uses the aggregated data, while the original data preserves the detail. For a perfect comparison, we should aggregate X to the same level of detail as the model. We’ll leave this as an exercise for the reader.

Unsurprisingly, the comparison shows both imports and exports decrease across most commodities due to the tariff shock. Feel free to explore the data further; it is fairly straightforward to create graphs and tables from the dataframes, as sketched below.
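For example, a percent-change column can be added to the comparison with standard DataFrames operations (a sketch using the names from above; we guard against zero benchmark values):

comparison = innerjoin(
    table(X, :year => 2023),
    table(Y, :year => 2023),
    on = [:row, :col, :year, :parameter],
    renamecols = "_benchmark" => "_counterfactual"
)

# Percent change between benchmark and counterfactual, skipping zero benchmarks
transform!(comparison,
    [:value_benchmark, :value_counterfactual] =>
        ByRow((b, c) -> b == 0 ? missing : 100 * (c - b) / b) =>
        :percent_change)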

The Second Example

If you explored the data in the previous example, you may have noticed that the year 2024 is missing. This is because the summary.yaml file in WiNDCNational version 0.3.2 does not include 2024. In this example, we will demonstrate how to modify the summary.yaml file to include 2024 data.

The file that WiNDCNational uses is summary.yaml, pinned to the v0.3.2 tag in the WiNDCNational.jl repository. Copy and paste the contents of that file into a new file. You can name this file anything you’d like; the repo currently has a summary.yaml file, so name yours something else.

There are only two changes needed to include 2024 data. First, add 2024 to the years list near the top of the file; this should be straightforward, following the existing formatting. Second, update the download_url_common field. The BEA likes to change the URLs for each data release. The updated URL is:

"https://apps.bea.gov/industry/release/zip/SUPPLY-USE.zip"

And that’s it. Save the file and reload the data using:

X = build_us_table("summary.yaml") # Change this to your file name
summary, M = calibrate(X)

Let’s check, just to verify that the 2024 data has been loaded:

table(summary, :year => 2024)

You can take this data, aggregate it, and run the same simulation as before. If you get the exact same results, you’ve probably forgotten the year = 2024 keyword when creating the model.
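A minimal sketch of those last steps, reusing the mapping from the first example (variable names are ours):

X_2024 = aggregate(summary, aggregate_sectors; set = :set, old_element = :name, new_element = :aggregate)
M_2024 = national_mpsge(X_2024; year = 2024)             # the year keyword is what selects the 2024 data
MPSGE.solve!(M_2024; cumulative_iteration_limit = 0)     # verify the benchmark as before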

Best of luck with your modeling! Reach out if you have questions.

Exercises

  1. Aggregate the summary data to match the reconstructed data after solving the model. Use this to compare benchmark and counterfactual data.
  2. Create a model for each year in the data, set a shock, solve the model, and reconstruct the data. You should have a single table with all years. Compare benchmark and counterfactual data across all years.
  3. Create graphs to summarize the effects of your shock.