The IAMC timeseries format for scenario data
Over the past decade, the Integrated Assessment Modeling Consortium (IAMC) developed a standardised tabular timeseries format to exchange scenario data. Previous high-level use cases include reports by the Intergovernmental Panel on Climate Change (IPCC) and model comparison exercises within the Energy Modeling Forum (EMF) hosted by Stanford University.
The table below shows a typical example of integrated-assessment scenario data
following the IAMC format from the CD-LINKS project.
The pyam package is geared for analysis and visualization of any scenario
data provided in this structure.
Refer to data.ene.iiasa.ac.at/database for more information on the IAMC format and a full list of previous use cases.
The ‘variable’ column implements a “semi-hierarchical” structure using the
| character (pipe, not l or i) to indicate the depth.
Semi-hierarchical means that a hierarchy can be imposed, e.g., one can enforce
that the sum of a variable's sub-categories must be equal to the parent variable
(if there are no other sub-categories at that level).
However, this is not mandatory, e.g., the sum of Primary Energy|Gas and
Primary Energy|Fossil should not be equal to Primary Energy because this would
double-count fossil fuels.
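The pipe-based hierarchy can be illustrated with a small sketch in plain Python. The helper names (`depth`, `children`, `sums_to_parent`) and all numbers are hypothetical and only serve as an illustration; they are not part of pyam. The example shows one level where the sum check holds and one where it fails because of double-counting.

```python
import math


def depth(variable):
    """Hierarchy depth of a variable name, i.e. the number of pipe characters."""
    return variable.count("|")


def children(variable, names):
    """Return the names that sit exactly one level below `variable`."""
    prefix = variable + "|"
    return [
        n for n in names
        if n.startswith(prefix) and depth(n) == depth(variable) + 1
    ]


def sums_to_parent(variable, values):
    """Check whether the direct sub-categories of `variable` add up to it."""
    parts = children(variable, values)
    return math.isclose(sum(values[p] for p in parts), values[variable])


# Made-up values for a single model, scenario, region and year
values = {
    "Primary Energy": 10.0,
    "Primary Energy|Fossil": 8.0,
    "Primary Energy|Solar": 2.0,
    "Primary Energy|Gas": 3.0,  # gas is already counted in 'Fossil'
    "Primary Energy|Fossil|Coal": 5.0,
    "Primary Energy|Fossil|Gas": 3.0,
}

# The hierarchy holds below 'Primary Energy|Fossil' (5 + 3 == 8) ...
fossil_ok = sums_to_parent("Primary Energy|Fossil", values)
# ... but not below 'Primary Energy' (8 + 2 + 3 != 10): double-counting
total_ok = sums_to_parent("Primary Energy", values)
```

Whether such a check should be enforced for a given variable is a modelling decision; the format itself does not require it.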
In its original design, the IAMC data format (see above) assumed that the temporal dimension of any scenario data was restricted to full years represented as integer values.
Two additional use cases are currently supported by pyam, although they are
still in development.
Please reach out to the developers to get more information on this ongoing work.
The IamDataFrame class
The ‘data’ table contains the timeseries data related to an ensemble of scenarios. It is structured in long format, where each datapoint is one row. In contrast, the standard IAMC-style format is in wide format (see the example above), where each timeseries is one row and the timesteps are represented as columns.
While long-format tables have advantages for the internal implementation of many
pyam functions, wide-format tables are more intuitive to users.
The timeseries() method converts between the formats and returns a
pandas.DataFrame in wide format.
Writing an IamDataFrame to file using to_csv() also writes the data table
in wide format.
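The difference between the two layouts can be sketched with plain pandas. This is not pyam's implementation; the model, scenario and variable names as well as the values are made up for illustration, and the column list is an assumption based on the IAMC format described above.

```python
import pandas as pd

# Index columns identifying one timeseries (assumed for this sketch)
index = ["model", "scenario", "region", "variable", "unit"]

# Long format: one row per datapoint (made-up values)
long = pd.DataFrame(
    [
        ["model_a", "scen_a", "World", "Primary Energy", "EJ/yr", 2005, 1.0],
        ["model_a", "scen_a", "World", "Primary Energy", "EJ/yr", 2010, 6.0],
        ["model_a", "scen_a", "World", "Primary Energy|Coal", "EJ/yr", 2005, 0.5],
        ["model_a", "scen_a", "World", "Primary Energy|Coal", "EJ/yr", 2010, 3.0],
    ],
    columns=index + ["year", "value"],
)

# Wide format: one row per timeseries, one column per timestep
wide = long.set_index(index + ["year"])["value"].unstack("year")

# Back to long format: one row per datapoint again
back = wide.reset_index().melt(id_vars=index, var_name="year", value_name="value")
```

Here `wide` has two rows (one per variable) and the years 2005 and 2010 as columns, while `back` again holds four datapoints.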
The standard columns
The columns of the ‘data’ table are
['model', 'scenario', 'region', 'variable', 'unit', <time_format>, 'value'],
where time_format is ‘year’ when timesteps are given in years (as int)
or ‘time’ when time is represented on a continuous scale.
Custom columns of the ‘data’ table
If an IamDataFrame is initialised with columns that are not in the
list above nor interpreted as belonging to the time dimension (in wide format),
these columns are included in the ‘data’ table as additional columns.
This feature can be used, for example, to distinguish between multiple
climate models providing different values for the variable
Temperature|Global Mean.
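The classification of wide-format columns described above can be sketched in plain Python. The helper `split_columns`, the `climate_model` column name and the rule for recognising year columns are assumptions made for this illustration; pyam's own detection logic may differ, in particular for continuous-time data.

```python
# Standard index columns of the 'data' table, as assumed for this sketch
STANDARD_COLS = ["model", "scenario", "region", "variable", "unit"]


def split_columns(columns):
    """Split wide-format column labels into standard, time and custom columns."""
    standard, time, custom = [], [], []
    for col in columns:
        if col in STANDARD_COLS:
            standard.append(col)
        elif isinstance(col, int) or (isinstance(col, str) and col.isdigit()):
            # Labels that look like a year belong to the time dimension
            time.append(int(col))
        else:
            # Everything else is kept as an additional (custom) column
            custom.append(col)
    return standard, time, custom


# Header of a hypothetical wide-format table with one custom column
header = ["model", "scenario", "region", "variable", "unit",
          "climate_model", 2005, "2010"]
standard, time, custom = split_columns(header)
```

In this sketch, `climate_model` is neither a standard column nor a timestep, so it ends up in `custom`.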
Not all pyam functions currently support the continuous-time format or custom columns in a ‘data’ table. Please reach out via the mailing list or GitHub issues if you are not sure whether your use case is supported.
A word of warning when using custom columns for annotations: pyam drops any data rows where the ‘value’ column is ‘nan’, and it raises an error for ‘nan’ in any other column. Hence, if you are adding variable/region-specific meta information to ‘data’, you need to make sure that you add a value to every single row.
The reason for that implementation is that pandas does not work as expected with ‘nan’ in some situations. Therefore, enforcing that there are no ‘nan’s in an IamDataFrame ensures that pyam has a clean dataset on which to operate.
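The behaviour described above can be mimicked with plain pandas as follows. This is a sketch rather than pyam's actual code: the function name `drop_or_raise_nan`, the `climate_model` column, the order of the two checks and the error type are all assumptions.

```python
import pandas as pd


def drop_or_raise_nan(data):
    """Drop rows with a missing 'value'; raise if any other column has gaps."""
    data = data.dropna(subset=["value"])
    other = data.drop(columns=["value"])
    bad_cols = other.columns[other.isna().any()].tolist()
    if bad_cols:
        raise ValueError(f"Missing entries in columns: {bad_cols}")
    return data.reset_index(drop=True)


cols = ["model", "scenario", "region", "variable", "unit",
        "climate_model", "year", "value"]

# Made-up rows; the second one has no value and is dropped
rows = [
    ["model_a", "scen_a", "World", "Temperature|Global Mean", "°C", "climate_x", 2010, 1.0],
    ["model_a", "scen_a", "World", "Temperature|Global Mean", "°C", "climate_x", 2020, float("nan")],
]
cleaned = drop_or_raise_nan(pd.DataFrame(rows, columns=cols))

# A missing annotation in a custom column is treated as an error
incomplete = [
    ["model_a", "scen_a", "World", "Temperature|Global Mean", "°C", None, 2010, 1.0],
]
try:
    drop_or_raise_nan(pd.DataFrame(incomplete, columns=cols))
    raised = False
except ValueError:
    raised = True
```

The second case is why an annotation column has to be filled for every single row.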
The ‘meta’ table is intended for categorisation and quantitative indicators at the model-scenario level. Examples in the SR15 context are the warming category (‘Below 1.5°C’, ‘1.5°C with low overshoot’, etc.) and the cumulative CO2 emissions until the end of the century.
The ‘meta’ table is not intended for annotations of individual
data points. If you want to add meta information at this level
(e.g., which stylized climate model provided the variable
Temperature|Global Mean, or whether a data point is from the
original data source or the result of an operation), this should be done in
the ‘data’ table of the IamDataFrame using the
custom-columns feature (see custom columns above).
The pyam package provides two methods for filtering scenario data:
An IamDataFrame can be filtered using its filter(col=...) method, where
col can be any column of the ‘data’ table (i.e.,
['model', 'scenario', 'region', 'variable', 'unit', 'year'/'time'] or any custom
columns), or a column of the ‘meta’ table. The returned object is
an IamDataFrame.
A pandas.DataFrame (‘data’) with columns or index
['model', 'scenario'] can be filtered by any ‘meta’ columns from
an IamDataFrame (df) using
pyam.filter_by_meta(data, df, col=..., join_meta=False).
The returned object is a
pandas.DataFrame down-selected to those
models-and-scenarios where the ‘meta’ column satisfies the criteria given
by col=... .
Optionally, the ‘meta’ columns are joined to the returned dataframe.
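The idea of down-selecting a dataframe by ‘meta’ criteria can be sketched with plain pandas. The function `filter_by_meta_sketch` below is a hypothetical stand-in and not pyam's implementation; it assumes that `data` has 'model' and 'scenario' as columns, that `meta` is indexed by them, and that each criterion is a simple equality test. All names and values are made up.

```python
import pandas as pd


def filter_by_meta_sketch(data, meta, join_meta=False, **criteria):
    """Keep rows of `data` whose model-scenario pair matches all meta criteria."""
    keep = pd.Series(True, index=meta.index)
    for col, value in criteria.items():
        keep &= (meta[col] == value)
    # Carry the meta columns along only if they should be joined
    cols = list(criteria) if join_meta else []
    selected = meta[keep][cols].reset_index()
    return data.merge(selected, on=["model", "scenario"], how="inner")


# Made-up timeseries data in long format
data = pd.DataFrame(
    [
        ["model_a", "scen_a", "World", "Primary Energy", "EJ/yr", 2010, 6.0],
        ["model_a", "scen_a", "World", "Primary Energy", "EJ/yr", 2020, 7.0],
        ["model_a", "scen_b", "World", "Primary Energy", "EJ/yr", 2010, 8.0],
    ],
    columns=["model", "scenario", "region", "variable", "unit", "year", "value"],
)

# Made-up 'meta' table, indexed by model and scenario
meta = pd.DataFrame(
    {
        "model": ["model_a", "model_a"],
        "scenario": ["scen_a", "scen_b"],
        "category": ["Below 1.5°C", "1.5°C with low overshoot"],
    }
).set_index(["model", "scenario"])

subset = filter_by_meta_sketch(data, meta, category="Below 1.5°C")
joined = filter_by_meta_sketch(data, meta, join_meta=True, category="Below 1.5°C")
```

Here `subset` contains only the rows of `scen_a`, and `joined` additionally carries the `category` column.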
Daniel Huppmann, Elmar Kriegler, Volker Krey, Keywan Riahi, Joeri Rogelj, Katherine Calvin, Florian Humpenoeder, Alexander Popp, Steven K. Rose, John Weyant, et al. IAMC 1.5 °C Scenario Explorer and Data hosted by IIASA (release 2.0). Integrated Assessment Modeling Consortium & International Institute for Applied Systems Analysis, 2019. URL: https://data.ene.iiasa.ac.at/iamc-1.5c-explorer, doi:10.5281/zenodo.3363345.