Aggregating WiNDC National
January 06, 2026
This example walks through the code in this repository, which aggregates the WiNDC National dataset to a size that can be run with a trial PATH license (fewer than 300 variables in the model).
We recommend using Julia with VSCode. Download VSCode from https://code.visualstudio.com and Julia from https://julialang.org/downloads. After installing Julia, install the Julia extension in VSCode by searching for "Julia" in the extensions tab.
Setting up the Environment
To run this code, clone the repository to your local machine. Start Julia in the repository folder and activate/instantiate the environment:
julia> ]
(v1.12) pkg> activate .
(National_Aggregated.jl) pkg> instantiate

This will download the packages listed in Project.toml
and create a Manifest.toml file. If you are familiar with
Python, this is essentially creating a virtual environment for this
project. If you are unfamiliar with environments, they are a way to
specify specific package versions for a project without affecting other
projects on your machine.
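Equivalently, you can activate and instantiate the environment programmatically; this is standard Pkg usage rather than anything specific to this repository:

```julia
using Pkg

Pkg.activate(".")     # activate the project in the current directory
Pkg.instantiate()     # install the exact versions recorded in Manifest.toml
```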
In this example, we are specifying WiNDCNational version
0.3.2 in the Project.toml file. We are doing
this because this version of WiNDCNational has a
summary.yaml file that does not include 2024.
The second example below demonstrates how to modify the summary.yaml file in the WiNDCNational package to include 2024.
The First Example
The first example is in first_example.jl. This example
demonstrates how to aggregate the sectors/commodities in the WiNDC
National dataset. Open this file; you can run each line by pressing
Shift + Enter in VSCode with the Julia extension installed.
Feel free to run each line in the file. We will explain our logic
below.
Loading Packages
We load packages with two different methods. The first method is using, which loads all exported functions from the package. For example, using DataFrames means we can use all the functions in the DataFrames package without prefixing them with DataFrames (e.g. subset instead of DataFrames.subset). The second method is import, which only loads the given items. For example, import MPSGE means that to use any function from the MPSGE package, we must prefix it with MPSGE (e.g. MPSGE.solve!). We also use import WiNDCContainer: aggregate, which loads only the aggregate function from the WiNDCContainer package, so we can use aggregate without prefixing it with WiNDCContainer.
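For reference, the loading block at the top of first_example.jl looks roughly like the sketch below; the exact package list is an assumption, so check the file itself:

```julia
using DataFrames                   # exported names usable directly, e.g. subset
using CSV                          # used later to write/read the aggregation mapping
using WiNDCNational                # build_us_table, calibrate, national_mpsge, ...
import MPSGE                       # qualified access only, e.g. MPSGE.solve!
import WiNDCContainer: aggregate   # bring just `aggregate` into scope
```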
Building the Data
In WiNDCNational, we load the summary-level US National
dataset by calling:
build_us_table(:summary)

This function downloads the necessary data files and builds a
National object that contains the summary-level national
data. The WiNDCNational package provides a summary.yaml
file that defines where to download the data files and the structure of
the data. We will discuss this file in the next example.
WiNDCNational also supports the detailed-level table, which you can load by calling:
build_us_table(:detailed)

The detailed data is available for the years 2007, 2012, and 2017. If you only need a few detailed sectors/commodities, you can follow this example to aggregate the detailed data to your desired level of detail.
We do not automatically calibrate the data. This is primarily for transparency, so users can see the data before calibration. To calibrate the data, we call:
summary, M = calibrate(X)

This function returns two items: a National object, summary, containing the calibrated data, and M, the JuMP model used for calibration.
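Putting the pieces together, the data-building portion of the script is roughly as follows (a consolidation of the calls above, not a verbatim excerpt):

```julia
# Build the raw summary-level dataset, then calibrate it.
X = build_us_table(:summary)   # uncalibrated National object
summary, M = calibrate(X)      # calibrated National object + JuMP calibration model
```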
Exploring the Data
Before diving into the aggregation process, we should discuss how to
explore the data. The National object,
summary, contains all the data in a structured format. This
object holds three dataframes: one each for the data, the sets, and the elements. The data can be viewed using table:
table(summary)

This dataframe has five columns: row,
column, year, parameter and
value. The first three columns define the domain of the
data, the parameter column defines the type of data
(e.g. intermediate_demand, import, etc.), and the value
column contains the actual data values. Each entry in a domain column
corresponds to an element in a set. We can understand this by first
looking at the sets:
sets(summary)

The table below is a subset of the sets:
| name | description | domain |
|---|---|---|
| commodity | Commodities | row |
| sector | Sectors | col |
| year | year | |
| trade | Trade | col |
| export | Exports | col |
The domain of the set commodity is row,
meaning that the commodities are only in the row column of
the data table. Similarly, the domain of the set sector is
col, meaning that the sectors are only in the
col column of the data table. We can extract all the data whose elements belong to a specific set by using the table function with extra arguments. For example, to extract all the data involving commodities:
table(summary, :commodity)

The only entries in the row column of the data table
will be commodities. Notice you didn’t need to specify the
row column, because the set commodity only has
a domain of row. If you want further restrictions, such as only commodities and sectors, you can call:
table(summary, :commodity, :sector)

To view the elements in a set, we can use the elements
function. For example, to view all the commodities:
elements(summary, :commodity)

Similar to the table function, you can specify multiple
sets. For example, to view all the commodities and sectors:
elements(summary, :commodity, :sector)

Parameters are sets with elements in the parameter
column of the data. To view all the parameters, we can use a DataFrame
subset:
subset(sets(summary), :domain => ByRow(==(:parameter)))

By convention, we take parameter names to be Title_Case (each word capitalized and separated by underscores) with the element being
snake_case. For example, the parameter Intermediate_Demand
corresponds to the element intermediate_demand:
elements(summary, :Intermediate_Demand)

Just like sets, parameters can be extracted from the data table:
table(summary, :Intermediate_Demand)

You may notice all the values in the above table are negative. By
convention, we take inputs/demand/uses to be negative and outputs/supply
to be positive. This simplifies the computation of aggregate parameters
as we can use DataFrame operations without adjusting the sign of the
values. Signs can be flipped using the normalize keyword
argument in the table function:
table(summary, :Intermediate_Demand; normalize = :Use)

The value of the normalize keyword can be any parameter.
We used Use above as that is the parameter that contains
all the uses (inputs/demand).
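As a quick check on the sign convention, every value in the normalized table should be non-negative. A sketch, assuming the value column described above:

```julia
raw     = table(summary, :Intermediate_Demand)                   # values stored negative
flipped = table(summary, :Intermediate_Demand; normalize = :Use) # signs flipped

all(flipped.value .>= 0)   # expected: true
```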
Some parameters, such as Use, are composite parameters.
They are made up of multiple underlying parameters. The parameter
Use is composed of all the parameters defined in the BEA
Use table:
elements(summary, :Use)

Finally, you can filter the data to extract a single element from a set. For example, to extract the NAICS code 111CA commodity:
table(summary, :commodity => Symbol("111CA"))Or, to extract the naics code 111CA commodity in the
year 2023:
table(summary, :commodity => Symbol("111CA"), :year => 2023)Aggregation
To aggregate a set we need to define a mapping from old elements to new elements. In this example we will aggregate both commodities and sectors. The easiest way to define the mapping is using a CSV file and modifying it in Excel. We've seen how to extract the commodities and sectors; now we just need to write them to a CSV file:
elements(summary, :commodity, :sector) |>
x -> CSV.write("sets/pre_aggregation.csv", x)This uses a pipe, |>, in Julia. You can interpret
this as taking the output of the left-hand side and making it be
x in the right-hand side. So the above code is equivalent
to:
temp = elements(summary, :commodity, :sector)
CSV.write("sets/pre_aggregation.csv", temp)just with no temporary variable.
Open this file in Excel and add a column (ours is called aggregate) that defines the new aggregated element for each commodity/sector. Save this file as sets/aggregation.csv. You can define any aggregation you want; we used the base NAICS codes to aggregate the commodities and sectors to their respective 2-digit NAICS codes. You can add as many columns as you'd like; you just need to select them when loading the file back into Julia.
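For illustration, the first few rows of sets/aggregation.csv might look like the sketch below. The rows are hypothetical: the name and set values come from pre_aggregation.csv (column order may differ), and the aggregate labels are whatever you choose.

```
name,set,aggregate
111CA,commodity,agriculture
113FF,commodity,agriculture
111CA,sector,agriculture
```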
Now we can load this file back into Julia:
aggregate_sectors = CSV.read(
"sets/aggregation.csv",
DataFrame,
select = [:aggregate, :name, :set],
types = Symbol
)

This command will read the CSV file into a DataFrame, only selecting
the columns aggregate, name, and
set, and converting the values to Symbol
type.
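Before aggregating, it can be worth verifying that every commodity and sector appears in your mapping. A hedged sketch using DataFrames' antijoin, assuming (as above) that elements returns a DataFrame with a name column:

```julia
# Any element without a mapping row may be dropped or cause an error downstream.
unmapped = antijoin(
    elements(summary, :commodity, :sector),
    aggregate_sectors;
    on = :name
)
@assert nrow(unmapped) == 0 "some commodities/sectors have no aggregate mapping"
```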
The aggregation is then trivial using the aggregate
function:
X = aggregate(
summary,
aggregate_sectors;
set = :set, # optional, defines which column contains the set names
old_element = :name, # optional, defines which column contains the old element names
new_element = :aggregate # optional, defines which column contains the new element names
)

The returned X is a new National object with the aggregated commodities and sectors.
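Since aggregation only regroups rows, the grand total of the value column should be essentially unchanged. A quick hedged sanity check, again assuming the value column described earlier:

```julia
# Totals should survive aggregation (it regroups values, it does not rescale them).
total_before = sum(table(summary).value)
total_after  = sum(table(X).value)

isapprox(total_before, total_after; atol = 1e-6)   # expected: true
```

We can also check the commodity elements to verify the aggregation worked: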
elements(X, :commodity)

MPSGE Model
The model is created using the national_mpsge
function:
M = national_mpsge(X)

To view the documentation for this function, you can either view the
National
Model Documentation, or use the Julia help mode by typing
? in the Julia REPL and then entering
national_mpsge.
By default, the model is only for the year 2023. If you want to
create a model for a different year, you can specify the
year keyword argument. To create a model for 2017, use:
M_2017 = national_mpsge(X; year = 2017)

As you may have observed, the printed model output is quite long. We are working on a more concise model view.
To solve the model, use the solve! function from the
MPSGE package. We will first set the
cumulative_iteration_limit to 0 so that we can verify the
data is calibrated at the benchmark solution:
MPSGE.solve!(M; cumulative_iteration_limit=0)

If the model solves successfully, you should see a
Postsolved Residual that is very close to 0 (mine is around
1e-11). After solving the model, you can extract results
using value. For example, to view the value of the
Y variable for agriculture,
MPSGE.value(M[:Y][:agriculture])

It's not surprising that the value is 1; that is the benchmark solution. Let's set a counterfactual shock and re-solve the model. There are three parameters available to shock:

- Import_Tariff[commodity]
- Absorption_Tax[commodity]
- Output_Tax[sector]
Let’s set tariffs to be a flat 30% across all commodities:
MPSGE.set_value!(M[:Import_Tariff], 0.3)

Now we re-solve the model:
MPSGE.solve!(M)

Now check the value of Y for
agriculture again:
MPSGE.value(M[:Y][:agriculture])

We expect an increase in the output of agricultural goods of about
2.3%. However, it would be much more interesting to rebuild the entire data table to see the full effects of the tariff shock. We can do
this using the reconstruct_table function from the
WiNDCNational package:
Y = WiNDCNational.reconstruct_table(X, M)

This is a new National object, Y, that
contains the counterfactual data after the tariff shock. We can compare
the benchmark and counterfactual data using an inner join:
innerjoin(
table(X, :year => 2023),
table(Y, :year => 2023),
on = [:row, :col, :year, :parameter],
renamecols = "_benchmark" => "_counterfactual"
)

We are using an innerjoin as some parameters get aggregated in the
reconstruction, like Investment_Final_Demand. We can view
the elements for both X and Y to see the
differences:
elements(X, :investment_final_demand)
elements(Y, :investment_final_demand)

X is more disaggregated than Y. This is because the model uses the aggregated data while the original object preserves the full detail. For a perfect comparison, we should aggregate
X to the same level of detail as the model. We’ll leave
this as an exercise for the reader.
Unsurprisingly, the comparison shows both imports and exports decrease across most commodities due to the tariff shock. Feel free to explore the data further; it is fairly straightforward to create graphs and tables from the dataframes.
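For example, a quick way to quantify the shock is to add a percent-change column to the joined table. A sketch; the _benchmark and _counterfactual suffixes come from the renamecols argument used above:

```julia
comparison = innerjoin(
    table(X, :year => 2023),
    table(Y, :year => 2023),
    on = [:row, :col, :year, :parameter],
    renamecols = "_benchmark" => "_counterfactual"
)

# Percent change of each entry relative to the benchmark
# (entries with a zero benchmark will produce Inf/NaN here).
comparison.pct_change =
    100 .* (comparison.value_counterfactual .- comparison.value_benchmark) ./
    comparison.value_benchmark

sort(comparison, :pct_change)   # most negative changes first
```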
The Second Example
If you explored the data in the previous example, you may have
noticed that the year 2024 is missing. This is because the
summary.yaml file in WiNDCNational version
0.3.2 does not include 2024. In this example, we will
demonstrate how to modify the summary.yaml file to include
2024 data.
The file that WiNDCNational is using is located here. This is pinned to the v0.3.2 tag in the WiNDCNational.jl repository. Copy and paste the contents of this file into a new file. You can name this file anything you'd like; since the repo already contains a summary.yaml file, name yours something else.
There are only two changes needed to include 2024 data. First, we
need to add 2024 to the years list near the top of the
file. This should be straightforward; follow the existing formatting.
Second, we need to update the download_url_common field.
The BEA likes to change the URLs for each data release. The updated URL
is:
"https://apps.bea.gov/industry/release/zip/SUPPLY-USE.zip"And that’s it. Save the file and reload the data using:
X = build_us_table("summary.yaml") # Change this to your file name
summary, M = calibrate(X)

Let's verify that the 2024 data has been loaded:
table(summary, :year => 2024)

You can take this data, aggregate it, and run the same simulation as
before. If you get the exact same results, you’ve probably forgotten the
year = 2024 keyword when creating the model.
Best of luck with your modeling! Reach out if you have questions.
Exercises
- Aggregate the summary data to match the reconstructed data after solving the model. Use this to compare benchmark and counterfactual data.
- Create a model for each year in the data, set a shock, solve the model, and reconstruct the data. You should have a single table with all years. Compare benchmark and counterfactual data across all years.
- Create graphs to summarize the effects of your shock.