Structuring data for API access and WiNDC.jl
August 01, 2025
How should we be storing and using WiNDC and economic data? We currently use sparse arrays to store sets and parameters. However, this approach means every model is tightly coupled to the specific application, for example the National module has fewer parameters than the Household module. This makes it difficult write generic code that can be used across modules. In this post I discuss a new data model that is more flexible and can be used across all modules.
The information in this post is the basis for WiNDC.jl and the WiNDC API. I make explicit reference to column names and parameters in the WiNDC National module which are subject to change. The concepts in this post will remain the same. This structure is available on GitHub, this is the best place to find the most up-to-date information about the data structure.
Understanding the WiNDC National Data
I am going to use the WiNDC National module as an example to understand our new data model. But first we should understand where the data comes from, the assumptions we make, and how we store the data in GAMS.
All the data for the National module comes from the Supply-Use tables (SUT) provided by the Bureau of Economic Analysis (BEA). The Supply table shows the total supply of goods and services in the economy, while the Use table shows how those goods and services are used in production and consumption.
Here is a figure that shows the structure of the SUT as provided by the BEA:
The SUT is a comprehensive representation of the economy, capturing all transactions between sectors. Because these tables represent both sides of the economy, we get three natural assumptions about the data:
Zero Profit Condition: The total value of inputs must equal the total value of outputs for each sector. This means that the sum of all the rows in the Use table must equal the sum of all the columns in the Supply table for each sector. As an equation, this can be written as: where is a fixed sector and the sums are over all commodities .
Note that this condition does not say that individual firms cannot make profits, surplus is contained in value added. Rather, it says that over the entire economy all payments are accounted for. Or, the economy itself can’t make a profit.
Market Clearance: The total supply of each commodity must equal the total use of that commodity. This means the sum across the columns of the Supply table must equal the sum across the rows of the Use table for each commodity. In other words, all the goods produced must be consumed.
Margin Balance: This is less obvious, but the two columns representing margins,
TransportandTrade, must each sum to zero. In WiNDC we decompose these columns (by sign) into two parameters,margin_supplyandmargin_demand.
From these tables we create sets and
parameters. The sets represent the labels on the rows and
columns of the SUT (sectors, commodities,
value added, etc.) and the parameters represent the values
in the SUT, indexed by the sets. For example,
intermediate_demand is a parameter indexed by
commodities and sectors, and
exports is indexed only by commodities.
Above, I wrote the equation to compute the zero profit condition but
I did not write the market clearance condition. That’s because the
market clearance condition is a long equation, it uses every parameter
except value added and other tax. If only
there was an easy to way to group by the commodities and sum the
values.
Going Long rather than Wide
The standard trick is to make the data long rather than
wide. This means we will make a single table with four columns:
row, column, parameter and
value.
- The column
rowcontains the row labels (commodities,value addedandother tax), - The
columncolumn contains the column labels (sectors,final demand, etc.), - The
parametercolumn contains the parameter names (intermediate_demand,exports, etc.) - The
valuecolumn contains the values from the SUT. By convention, we make values fromusepositive and values fromsupplynegative.
We refer to the row and column columns as
the domain of the data. In general, our table will have
columns, where
is the dimension of the data. If our data included multiple years, then
we would have a year column in the domain and our table
would have 5 columns.
The parameter column is necessary because
intermediate demand and intermediate supply
are both indexed by commodities and sectors,
but they are different parameters.
It might appear that the parameter column duplicates
information. For instance, the exports parameter is only
indexed by commodities, meaning the labels in
column will always agree with the labels in
parameter. In the summary-level tables the column label for
exports is F040 and in the detailed tables it is
F04000. The parameter column will always be
exports, this separates the data from the labels. Meaning I
can filter my large table to view exports without having to worry about
which aggregation level I am using.
Let’s call this long table the data table as it store
all of our data.
Accounting for Sets
The data table does not contain any information about
the sets. To account for this, we create a sets table that
contains the following columns:
name: The name of the set, preferring the singular version of the name (sector,commodity, etc.)description: A description of the set, which can be used to provide more context about the set.domain: Which column of thedatatable the set is used in (row,column).
Notice the sets table does not contain the elements of
the sets. If it did then we would be duplicating the description and
domain information for each element. Instead, we will create a separate
elements table that contains the actual elements of the
sets. The columns of the elements table will be:
name: The name of the element, which is the same as the label in therowandcolumncolumns of thedatatable.description: A description of the element, which can be used to provide more context about the element. For example,F040means export.set: The name of the set that the element belongs to, which is used to link the element to the set in thesetstable.
Computing Market Clearance
Using these three tables, let’s explore the logic of computing the
market clearance condition. The general idea is to filter the
row column of data to be only
commodities then group by row and sum the
values. Let’s break down we do this with our tables:
- Use the
setstable to find which column ofdatacontains each commodity, this gives usrow. - Filter the
elementsto find eachcommodity. - Subset
dataonrowto only include the commodities. - Group
databyrowand sum the values.
The other balance conditions can be computed in similar ways. For
zero profit, you want the sector rather than
commodity. For margin balance, we create a set that
contains the two marginal codes for Transport and
Trade.
The parameter column is not used to compute any of the
balance conditions, although you could explicitly extract the parameters
needed to compute the balance.
Understanding the
parameter Column
The parameter column contain the smallest parameter of
interest that the data belongs to. Let’s take the three blocks
other final demand, personal consumption and
exports. Together, these three blocks make up
final demand. If we wanted to extract
final demand we can filter the parameter
column to only the three blocks. However, this requires us to know the
names of the blocks and is inflexible if the blocks change. For example,
other final demand is not a single column it includes
categories like government spending, investment, and inventory changes.
If we decide to break other final demand into three blocks,
then we would have to update our code to include the new blocks.
Instead, we create a fourth table called parameters that
contains the following columns:
name: The name of the parameter, this is where we would havefinal_demandsubtable: References to theparametercolumn in thedatatable, this is where we would haveother_final_demand,personal_consumption, andexports.
This table may see modification moving forward, for example we don’t
have descriptions for parameters. For example, this may become part of
the sets and elements table in the future or
two similar tables. However, for now it is sufficient to understand the
data model.
To extract final demand we filter the name
column of the parameters table. This will return three rows
with other_final_demand, personal_consumption,
and exports in the subtable column which we
can use to filter the data table. If we end up
disaggregating other_final_demand into three separate
blocks, then we can simply update the parameters table to
include the new blocks without changing any of our code.
Future Work
Currently, I have implemented the data model described above and provided a generic calibration routine. There is a lot of work to be done:
- Determine optimal data structure for the
parameterstable. - Add domain checking to ensure, for example, that
intermediate demandis only indexed bycommoditiesandsectors. - Add methods to aggregate and disaggregate the data.
- Profile the code base and optimize the performance of the data model.
I’ll be working on this project in August, expect future posts as I make progress on these tasks.