# Data Model

## The IAMC timeseries format for scenario data

Over the past decade, the Integrated Assessment Modeling Consortium (IAMC) developed a standardised tabular timeseries format to exchange scenario data. Previous high-level use cases include reports by the Intergovernmental Panel on Climate Change (IPCC) and model comparison exercises within the Energy Modeling Forum (EMF) hosted by Stanford University.

The table below shows a typical example of integrated-assessment scenario data from the CD-LINKS project, following the IAMC format. The pyam package is geared for analysis and visualization of any scenario data provided in this structure.

Figure: Illustrative example of IAMC-format timeseries data via the IAMC 1.5°C Scenario Explorer ([1])

Refer to data.ene.iiasa.ac.at/database for more information on the IAMC format and a full list of previous use cases.

### The variable column

The variable column implements a “semi-hierarchical” structure using the | character (pipe, not l or i) to indicate the depth.

Semi-hierarchical means that a hierarchy can be imposed where appropriate. For example, one can enforce that the sum of Emissions|CO2|Energy and Emissions|CO2|Other equals Emissions|CO2 (if there are no other Emissions|CO2|… variables). However, this is not mandatory: the sum of Primary Energy|Coal, Primary Energy|Gas and Primary Energy|Fossil should not be equal to Primary Energy, because this would double-count fossil fuels.
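The two cases above can be sketched in plain Python. This is a minimal illustration with made-up values for a single model, scenario, region and year, not pyam's implementation; the helper names `depth`, `components` and `sums_to_parent` are hypothetical.

```python
import math

# Made-up values for one model, scenario, region and year (illustration only).
values = {
    "Emissions|CO2": 10.0,
    "Emissions|CO2|Energy": 7.5,
    "Emissions|CO2|Other": 2.5,
    "Primary Energy": 80.0,
    "Primary Energy|Coal": 30.0,
    "Primary Energy|Gas": 20.0,
    "Primary Energy|Fossil": 50.0,  # already contains coal and gas
}


def depth(variable):
    """Return the hierarchy depth of a variable, i.e., the number of pipes."""
    return variable.count("|")


def components(variable, data):
    """Return the variables exactly one level below `variable`."""
    prefix = variable + "|"
    return [
        v for v in data
        if v.startswith(prefix) and depth(v) == depth(variable) + 1
    ]


def sums_to_parent(variable, data):
    """Check whether the direct components of `variable` add up to its value."""
    total = sum(data[v] for v in components(variable, data))
    return math.isclose(total, data[variable])
```

With these values, the components of Emissions|CO2 add up to the parent, whereas the components of Primary Energy do not, because Primary Energy|Fossil already includes coal and gas.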

Refer to the variable list in the documentation pages of the IAMC 1.5°C Scenario Explorer to see the full list of variables used in the recent IPCC Special Report on Global Warming of 1.5°C (SR15).

### The year column

In its original design, the IAMC data format (see above) assumed that the temporal dimension of any scenario data was restricted to full years represented as integer values.

Two additional use cases are currently supported by pyam in development mode (beta):

• using representative sub-annual timesteps via the extra_cols feature (see the section on custom columns in the data table)

• using continuous time via pandas.datetime, replacing the name of the year column by time

## The IamDataFrame class

A pyam.IamDataFrame instance is a wrapper for two pandas.DataFrame instances (i.e., tables, see the pandas docs for more information).

### The data table

This table contains the timeseries data related to an ensemble of scenarios. It is structured in long format, where each datapoint is one row. In contrast, the standard IAMC-style format is in wide format (see the example above), where each timeseries is one row and the timesteps are represented as columns.

While long-format tables have advantages for the internal implementation of many pyam functions, wide-format tables are more intuitive to users. The method timeseries() converts between the formats and returns a pandas.DataFrame in wide format. Exporting an IamDataFrame to file using to_excel() or to_csv() also writes the data table in wide format.
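The relation between the two layouts can be sketched with plain pandas. This is an illustration with made-up values and hypothetical model and scenario names, not pyam's internal implementation; in pyam itself, timeseries() is the method that returns the wide-format table.

```python
import pandas as pd

index_cols = ["model", "scenario", "region", "variable", "unit"]

# Wide (IAMC-style) table with made-up values: one row per timeseries.
wide = pd.DataFrame(
    [
        ["model_a", "scen_a", "World", "Primary Energy", "EJ/yr", 1.0, 6.0],
        ["model_a", "scen_a", "World", "Primary Energy|Coal", "EJ/yr", 0.5, 3.0],
    ],
    columns=index_cols + [2005, 2010],
)

# Long format: one row per datapoint.
long_data = wide.melt(id_vars=index_cols, var_name="year", value_name="value")

# Back to wide format: one row per timeseries, one column per year.
wide_again = long_data.set_index(index_cols + ["year"])["value"].unstack("year")
```

Here `long_data` has four rows (two timeseries times two years), and `wide_again` has one row per timeseries again, indexed by the five identifier columns.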

#### The standard columns

The columns of the data table are ['model', 'scenario', 'region', 'variable', 'unit', <time_format>, 'value'], where time_format is 'year' when timesteps are given in years (as int) or 'time' when time is represented on a continuous scale (as pandas.datetime).

#### Custom columns of the data table

If an IamDataFrame is initialised with columns that are neither in the list above nor interpreted as belonging to the time dimension (in wide format), these columns are included in the data table as extra_cols. This feature can be used, for example, to distinguish between multiple climate models providing different values for the variable Temperature|Global Mean.

Warning

Not all pyam functions currently support the continuous time or custom columns in a data table. Please reach out via the mailing list or GitHub issues if you are not sure whether your use case is supported.

Warning

A word of warning for adding annotations relating to custom columns: pyam drops any data rows which have NaN values. Hence, if you are adding meta information to data, you need to make sure that you add a value to every single row.

The reason for this implementation is that pandas does not work as expected with NaN in many cases. It is therefore simpler to remove NaN values to ensure that pyam has a clean dataset on which to operate.
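The effect of this warning can be illustrated with plain pandas. The sketch below assumes that the row removal behaves like pandas.DataFrame.dropna(); the values and the custom column name `climate_model` are made up for illustration, and this is not pyam's actual code.

```python
import numpy as np
import pandas as pd

# Long-format rows with a hypothetical custom column `climate_model`.
data = pd.DataFrame(
    {
        "variable": [
            "Temperature|Global Mean",
            "Temperature|Global Mean",
            "Emissions|CO2",
        ],
        "year": [2050, 2050, 2050],
        "value": [1.6, 1.7, 20.0],
        # The last row carries no annotation.
        "climate_model": ["model_x", "model_y", np.nan],
    }
)

# Dropping rows with NaN removes the unannotated emissions row.
dropped = data.dropna()

# Filling a placeholder first keeps every row.
kept = data.fillna({"climate_model": "not applicable"}).dropna()
```

Assigning an explicit placeholder value to every row before initialising the IamDataFrame is one way to avoid losing data unintentionally.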

### The meta table

This table is intended for categorisation and quantitative indicators at the model-scenario level. Examples in the SR15 context are the warming category (‘Below 1.5°C’, ‘1.5°C with low overshoot’, etc.) and the cumulative CO2 emissions until the end of the century.

pyam attempts to keep the information in meta consistent with data when performing operations (e.g., rename(), append()). See utils.merge_meta() for details.

Note

The meta table is not intended for annotations of individual data points. If you want to add meta information at this level (e.g., which stylized climate model provided the variable Temperature|Global Mean, or whether a data point is from the original data source or the result of an operation), add it to the data table of the IamDataFrame using the custom-columns feature (see custom columns above).

## Filtering

The pyam package provides two methods for filtering scenario data:

• An existing IamDataFrame can be filtered using filter(col=...), where col can be any column of the data table (i.e., ['model', 'scenario', 'region', 'variable', 'unit', 'year'/'time'] or any custom columns), or a column of the meta table. The returned object is a new IamDataFrame instance.

• A pandas.DataFrame (data) with columns or index ['model', 'scenario'] can be filtered by any meta columns from an IamDataFrame (df) using pyam.filter_by_meta(data, df, col=..., join_meta=False). The returned object is a pandas.DataFrame down-selected to those model-scenario combinations where the meta column satisfies the criteria given by col=.... Optionally, the meta columns are joined to the returned dataframe.
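The down-selection by meta columns can be sketched with plain pandas. The function `select_by_meta` below is hypothetical and handles equality criteria only; it mirrors the behaviour described above but is not pyam's implementation. All model names, scenario names and values are made up.

```python
import pandas as pd

# Hypothetical meta table: one row per model-scenario combination.
meta = pd.DataFrame(
    {"category": ["Below 1.5°C", "1.5°C with low overshoot"]},
    index=pd.MultiIndex.from_tuples(
        [("model_a", "scen_a"), ("model_a", "scen_b")],
        names=["model", "scenario"],
    ),
)

# Any table with 'model' and 'scenario' columns (made-up values).
data = pd.DataFrame(
    {
        "model": ["model_a", "model_a", "model_a"],
        "scenario": ["scen_a", "scen_b", "scen_b"],
        "year": [2100, 2050, 2100],
        "value": [1.4, 1.6, 1.5],
    }
)


def select_by_meta(data, meta, join_meta=False, **criteria):
    """Keep rows of `data` whose model-scenario pair satisfies all `criteria`."""
    mask = pd.Series(True, index=meta.index)
    for col, value in criteria.items():
        mask &= meta[col] == value
    selected = meta[mask].reset_index()
    # An inner join on the identifier columns down-selects the rows of `data`.
    joined = data.merge(selected, on=["model", "scenario"], how="inner")
    return joined if join_meta else joined[data.columns]


low_overshoot = select_by_meta(data, meta, category="1.5°C with low overshoot")
```

In this sketch, `low_overshoot` contains only the rows of scen_b, and passing join_meta=True additionally appends the category column to the result.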

## References

[1] Daniel Huppmann, Elmar Kriegler, Volker Krey, Keywan Riahi, Joeri Rogelj, Katherine Calvin, Florian Humpenoeder, Alexander Popp, Steven K. Rose, John Weyant, et al. IAMC 1.5°C Scenario Explorer and Data hosted by IIASA (release 2.0). Integrated Assessment Modeling Consortium & International Institute for Applied Systems Analysis, 2019. URL: https://data.ene.iiasa.ac.at/iamc-1.5c-explorer, doi:10.5281/zenodo.3363345.