The IamDataFrame class

class pyam.IamDataFrame(data, **kwargs)[source]

Scenario timeseries data

The class provides a number of diagnostic features (e.g., validation of data, checking the completeness of variables), processing tools (e.g., unit conversion), as well as visualization and plotting tools.

Parameters
data : ixmp.Scenario, pandas.DataFrame or data file

an instance of an ixmp.Scenario, a pandas.DataFrame, or a data file with the required data columns. A pandas.DataFrame can have the required data as columns or index. R-style data columns for years, like “X2015”, are also supported.

kwargs

if value=<col>, melt column <col> to ‘value’ and use the <col> name as ‘variable’; or a mapping of required columns (IAMC_IDX) to any of the following:

  • one column in data

  • multiple columns, to be concatenated by |

  • a string to be used as the value for this column

A pandas.DataFrame with suitable meta indicators can be passed as meta=<df>. Its index will be downselected to those scenarios that have timeseries data.
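As a sketch of the long ("tabular") IAMC format the class accepts, the required columns can be built with plain pandas; the model and scenario names here are hypothetical, and the pyam call is shown only as a comment since it requires the pyam package:

```python
import pandas as pd

# Minimal timeseries data in IAMC long format: the required index
# columns (IAMC_IDX) plus 'year' and 'value'
data = pd.DataFrame(
    [
        ["model_a", "scen_a", "World", "Primary Energy", "EJ/yr", 2010, 500.0],
        ["model_a", "scen_a", "World", "Primary Energy", "EJ/yr", 2020, 600.0],
    ],
    columns=["model", "scenario", "region", "variable", "unit", "year", "value"],
)

# With pyam installed, this frame can be passed directly:
# import pyam
# df = pyam.IamDataFrame(data)
```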

Notes

When initializing an IamDataFrame from an xlsx file, pyam will by default look for the sheets ‘data’ and ‘meta’ to populate the respective tables. Custom sheet names can be specified with the kwargs sheet_name (default ‘data’) and meta_sheet_name (default ‘meta’). Calling the class with meta_sheet_name=False will skip the import of the ‘meta’ table.

When initializing an IamDataFrame from an object that is already an IamDataFrame instance, the new object will be hard-linked to all attributes of the original object, so any changes on one object (e.g., with inplace=True) may also modify the other object! This is intended behaviour and consistent with pandas, but may be confusing for users who are not familiar with the pandas/Python universe.

Attributes
empty

Indicator whether this object is empty

Methods

aggregate(variable[, components, method, …])

Aggregate timeseries components or sub-categories within each region

aggregate_region(variable[, region, …])

Aggregate a timeseries over a number of subregions

aggregate_time(variable[, column, value, …])

Aggregate a timeseries over a subannual time resolution

append(other[, ignore_meta_conflict, inplace])

Append any IamDataFrame-like object to this object

as_pandas([meta_cols, with_metadata])

Return object as a pandas.DataFrame

bar_plot(*args, **kwargs)

Plot timeseries bars of existing data

categorize(name, value, criteria[, color, …])

Assign scenarios to a category according to specific criteria

check_aggregate(variable[, components, …])

Check whether a timeseries matches the aggregation of its components

check_aggregate_region(variable[, region, …])

Check whether a timeseries matches the aggregation across subregions

check_internal_consistency([components])

Check whether a scenario ensemble is internally consistent

col_apply(col, func, *args, **kwargs)

Apply a function to a column of data or meta

convert_unit(current, to[, factor, …])

Convert all data having current units to new units.

copy()

Make a deepcopy of this object

downscale_region(variable[, region, …])

Downscale a timeseries to a number of subregions

equals(other)

Test if two objects contain the same data and meta indicators

export_meta(excel_writer[, sheet_name])

Write the ‘meta’ indicators of this object to an Excel sheet

filter([keep, inplace])

Return a (copy of a) filtered (downselected) IamDataFrame

head(*args, **kwargs)

Identical to pandas.DataFrame.head() operating on data

interpolate(time)

Interpolate missing values in timeseries (linear interpolation)

line_plot([x, y])

Plot timeseries lines of existing data

load_meta(path, *args, **kwargs)

Load ‘meta’ indicators from file

map_regions(map_col[, agg, copy_col, fname, …])

Map data from the current region classification to a new one

models()

Get a list of models

normalize([inplace])

Normalize data to a specific data point

pie_plot(*args, **kwargs)

Plot a pie chart

pivot_table(index, columns[, values, …])

Returns a pivot table

regions()

Get a list of regions

rename([mapping, inplace, append, …])

Rename and aggregate columns using groupby().sum() on values

require_variable(variable[, unit, year, …])

Check whether all scenarios have a required variable

reset_exclude()

Reset exclusion assignment for all scenarios to exclude: False

scatter(x, y, **kwargs)

Plot a scatter chart using meta indicators as columns

scenarios()

Get a list of scenarios

set_meta(meta[, name, index])

Add meta indicators as pandas.Series, list or value (int/float/str)

set_meta_from_data(name[, method, column])

Add meta indicators from downselected timeseries data of self

stack_plot(*args, **kwargs)

Plot timeseries stacks of existing data

swap_time_for_year([inplace])

Convert the time column to year.

tail(*args, **kwargs)

Identical to pandas.DataFrame.tail() operating on data

timeseries([iamc_index])

Returns data as pandas.DataFrame in wide format

to_csv(path[, iamc_index])

Write timeseries data of this object to a csv file

to_datapackage(path)

Write object to a frictionless Data Package

to_excel(excel_writer[, sheet_name, …])

Write object to an Excel spreadsheet

validate([criteria, exclude_on_fail])

Validate scenarios using criteria on timeseries values

variables([include_units])

Get a list of variables

aggregate(variable, components=None, method='sum', recursive=False, append=False)[source]

Aggregate timeseries components or sub-categories within each region

Parameters
variable : str or list of str

variable(s) for which the aggregate will be computed

components : list of str, optional

list of variables to aggregate, defaults to all sub-categories of variable

method : func or str, optional

method to use for aggregation, e.g. numpy.mean(), numpy.sum(), ‘min’, ‘max’

recursive : bool, optional

iterate recursively over all sub-categories of variable

append : bool, optional

append the aggregate timeseries to self and return None, else return the aggregate timeseries as a new IamDataFrame

Notes

The aggregation function interprets any missing values (numpy.nan) for individual components as 0.
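The NaN-as-zero behaviour described in the Notes can be sketched with plain pandas (hypothetical component values, not pyam's implementation):

```python
import numpy as np
import pandas as pd

# Two hypothetical component timeseries; 'Primary Energy|Wind' is
# missing (NaN) in 2010
components = pd.DataFrame(
    {
        "Primary Energy|Coal": [300.0, 320.0],
        "Primary Energy|Wind": [np.nan, 50.0],
    },
    index=[2010, 2020],
)

# pandas' row-wise sum skips NaN, which is equivalent to treating the
# missing component as 0, matching the behaviour described in the Notes
aggregate = components.sum(axis=1)
# aggregate: 2010 -> 300.0, 2020 -> 370.0
```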

aggregate_region(variable, region='World', subregions=None, components=False, method='sum', weight=None, append=False)[source]

Aggregate a timeseries over a number of subregions

This function allows adding variable sub-categories that are only defined at the region level by setting components=True

Parameters
variable : str or list of str

variable(s) to be aggregated

region : str, default ‘World’

region to which data will be aggregated

subregions : list of str, optional

list of subregions, defaults to all regions other than region

components : bool or list of str, optional

variables at the region level to be included in the aggregation (ignored if False); if True, use all sub-categories of variable included in region but not in any of the subregions; or an explicit list of variables

method : func or str, optional

method to use for aggregation, e.g. numpy.mean(), numpy.sum(), ‘min’, ‘max’

weight : str, default None

variable to use as weight for the aggregation (currently only supported with method=’sum’)

append : bool, default False

append the aggregate timeseries to self and return None, else return the aggregate timeseries as a new IamDataFrame
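The default behaviour (method='sum' over all subregions) can be sketched with a pandas groupby; the region names are hypothetical and this is not pyam's implementation:

```python
import pandas as pd

# Hypothetical regional data: two subregions, two years
data = pd.DataFrame(
    {
        "region": ["reg_a", "reg_b", "reg_a", "reg_b"],
        "year": [2010, 2010, 2020, 2020],
        "value": [2.0, 3.0, 4.0, 6.0],
    }
)

# Summing subregions per year yields the 'World' aggregate
world = data.groupby("year")["value"].sum()
# world: 2010 -> 5.0, 2020 -> 10.0
```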

aggregate_time(variable, column='subannual', value='year', components=None, method='sum', append=False)[source]

Aggregate a timeseries over a subannual time resolution

Parameters
variable : str or list of str

variable(s) to be aggregated

column : str, optional

the data column to be used as subannual time representation

value : str, optional

the name of the aggregated (subannual) time

components : list of str, optional

subannual timeslices to be aggregated; defaults to all subannual timeslices other than value

method : func or str, optional

method to use for aggregation, e.g. numpy.mean(), numpy.sum(), ‘min’, ‘max’

append : bool, optional

append the aggregate timeseries to self and return None, else return the aggregate timeseries as a new IamDataFrame

append(other, ignore_meta_conflict=False, inplace=False, **kwargs)[source]

Append any IamDataFrame-like object to this object

Indicators in other.meta that are not in self.meta are merged. Missing values are set to NaN. Conflicting data rows always raise a ValueError.

Parameters
other : IamDataFrame, ixmp.Scenario, pandas.DataFrame or data file

any object castable as IamDataFrame to be appended

ignore_meta_conflict : bool, default False

if False and other is an IamDataFrame, raise an error if any meta columns present in self and other are not identical.

inplace : bool, default False

if True, do operation inplace and return None

kwargs

passed to IamDataFrame(other, **kwargs) if other is not already an IamDataFrame

as_pandas(meta_cols=True, with_metadata=None)[source]

Return object as a pandas.DataFrame

Parameters
meta_cols : list or bool, default True

join data with all meta columns if True (default), or only with the columns in the list; return a copy of data if False

bar_plot(*args, **kwargs)[source]

Plot timeseries bars of existing data

see pyam.plotting.bar_plot() for all available options

categorize(name, value, criteria, color=None, marker=None, linestyle=None)[source]

Assign scenarios to a category according to specific criteria

Parameters
name : str

column name of the ‘meta’ table

value : str

category identifier

criteria : dict

dictionary with variables mapped to applicable checks (‘up’ and ‘lo’ for respective bounds, ‘year’ for years - optional)

color : str, optional

assign a color to this category for plotting

marker : str, optional

assign a marker to this category for plotting

linestyle : str, optional

assign a linestyle to this category for plotting

check_aggregate(variable, components=None, method='sum', exclude_on_fail=False, multiplier=1, **kwargs)[source]

Check whether a timeseries matches the aggregation of its components

Parameters
variable : str or list of str

variable(s) checked for matching aggregation of sub-categories

components : list of str, default None

list of variables, defaults to all sub-categories of variable

method : func or str, optional

method to use for aggregation, e.g. numpy.mean(), numpy.sum(), ‘min’, ‘max’

exclude_on_fail : bool, optional

flag scenarios failing validation as exclude: True

multiplier : number, optional

factor when comparing variable and sum of components

kwargs

arguments for comparison of values, passed to numpy.isclose()

check_aggregate_region(variable, region='World', subregions=None, components=False, method='sum', weight=None, exclude_on_fail=False, **kwargs)[source]

Check whether a timeseries matches the aggregation across subregions

Parameters
variable : str or list of str

variable(s) to be checked for matching aggregation of subregions

region : str, default ‘World’

region to be checked for matching aggregation of subregions

subregions : list of str, optional

list of subregions, defaults to all regions other than region

components : bool or list of str, default False

variables at the region level to be included in the aggregation (ignored if False); if True, use all sub-categories of variable included in region but not in any of the subregions; or an explicit list of variables

method : func or str, optional

method to use for aggregation, e.g. numpy.mean(), numpy.sum(), ‘min’, ‘max’

weight : str, optional

variable to use as weight for the aggregation (currently only supported with method=’sum’)

exclude_on_fail : bool, optional

flag scenarios failing validation as exclude: True

kwargs

arguments for comparison of values, passed to numpy.isclose()

check_internal_consistency(components=False, **kwargs)[source]

Check whether a scenario ensemble is internally consistent

We check that all variables are equal to the sum of their sectoral components and that all the regions add up to the World total. If the check is passed, None is returned, otherwise a DataFrame of inconsistent variables is returned.

Note: at the moment, this method’s regional checking is limited to checking that all the regions sum to the World region. We cannot make this more automatic unless we store how the regions relate (see the related issue in the pyam issue tracker).

Parameters
components : bool, optional

passed to check_aggregate_region(); if True, use all sub-categories of each variable included in World but not in any of the subregions; if False, only aggregate variables over subregions

kwargs

arguments for comparison of values, passed to numpy.isclose()

col_apply(col, func, *args, **kwargs)[source]

Apply a function to a column of data or meta

Parameters
col : str

column in either the data or meta dataframes

func : function

function to apply

convert_unit(current, to, factor=None, registry=None, context=None, inplace=False)[source]

Convert all data having current units to new units.

If factor is given, existing values are multiplied by it, and the to units are assigned to the ‘unit’ column.

Otherwise, the pint package is used to convert from current -> to units without an explicit conversion factor. Pint natively handles conversion between any standard (SI) units that have compatible dimensionality, such as exajoule to terawatt-hours, EJ -> TWh, or tonne per year to gram per second, t / yr -> g / sec.

The default registry includes additional unit definitions relevant for integrated assessment models and energy systems analysis, via the iam-units package. This registry can also be accessed directly, using:

    from iam_units import registry

When using this registry, current and to may contain the symbols of greenhouse gas (GHG) species, such as ‘CO2e’, ‘C’, ‘CH4’, ‘N2O’, ‘HFC236fa’, etc., as well as lower-case aliases like ‘co2’ supported by pyam. In this case, context must contain ‘gwp_’ followed by the name of a specific global warming potential (GWP) metric supported by iam_units, e.g. ‘gwp_AR5GWP100’.

Rows with units other than current are not altered.

Parameters
current : str

Current units to be converted.

to : str

New unit (to be converted to) or symbol for target GHG species. If only the GHG species is provided, the units (e.g. Mt / year) will be the same as current, and an expression combining units and species (e.g. ‘Mt CO2e / yr’) will be placed in the ‘unit’ column.

factor : value, optional

Explicit factor for conversion without pint.

registry : pint.UnitRegistry, optional

Specific unit registry to use for conversion. Default: the iam-units registry.

context : str or pint.Context, optional

(Name of) a pint context to use in conversion. Required when converting between GHG species using GWP metrics, unless the species indicated by current and to are the same.

inplace : bool, optional

If True, do operation inplace and return None.

Returns
IamDataFrame

If inplace is False.

None

If inplace is True.

Raises
pint.UndefinedUnitError

if attempting a GWP conversion but context is not given.

pint.DimensionalityError

without factor, when current and to are not compatible units.
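The explicit-factor path (values with the current unit multiplied by factor, then relabelled, other rows untouched) can be sketched with plain pandas; the data and the conversion factor (1 EJ is approximately 277.778 TWh) are illustrative, not pyam's implementation:

```python
import pandas as pd

# Hypothetical data with two different units
data = pd.DataFrame({"unit": ["EJ/yr", "Mt CO2/yr"], "value": [1.0, 100.0]})

current, to, factor = "EJ/yr", "TWh/yr", 277.778

# Multiply values with the current unit by the factor and relabel;
# rows with other units are not altered
mask = data["unit"] == current
data.loc[mask, "value"] *= factor
data.loc[mask, "unit"] = to
```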

copy()[source]

Make a deepcopy of this object

See copy.deepcopy() for details.

downscale_region(variable, region='World', subregions=None, proxy=None, weight=None, append=False)[source]

Downscale a timeseries to a number of subregions

Parameters
variable : str or list of str

variable(s) to be downscaled

region : str, optional

region from which data will be downscaled

subregions : list of str, optional

list of subregions, defaults to all regions other than region (if using proxy) or the region index (if using weight)

proxy : str, optional

variable (within the IamDataFrame) to be used as proxy for regional downscaling

weight : pandas.DataFrame, optional

dataframe with the time dimension as columns (year or datetime.datetime) and regions[, model, scenario] as index

append : bool, optional

append the downscaled timeseries to self and return None, else return the downscaled data as a new IamDataFrame

property empty

Indicator whether this object is empty

equals(other)[source]

Test if two objects contain the same data and meta indicators

This function allows two IamDataFrame instances to be compared against each other to see if they have the same timeseries data and meta indicators. NaN values in the same location of the meta table are considered equal.

Parameters
other : IamDataFrame

the other IamDataFrame to be compared with self

export_meta(excel_writer, sheet_name='meta')[source]

Write the ‘meta’ indicators of this object to an Excel sheet

Parameters
excel_writer : str, path object or ExcelWriter object

any valid string path, pathlib.Path or pandas.ExcelWriter

sheet_name : str

name of the sheet which will contain the dataframe of ‘meta’ indicators

filter(keep=True, inplace=False, **kwargs)[source]

Return a (copy of a) filtered (downselected) IamDataFrame

Parameters
keep : bool, optional

keep all scenarios satisfying the filters (if True) or the inverse

inplace : bool, optional

if True, do operation inplace and return None

kwargs

The following columns are available for filtering:

  • ‘meta’ columns: filter by string value of that column

  • ‘model’, ‘scenario’, ‘region’, ‘variable’, ‘unit’: string or list of strings, where * can be used as a wildcard

  • ‘level’: the maximum “depth” of IAM variables (number of ‘|’) (excluding the strings given in the ‘variable’ argument)

  • ‘year’: takes an integer (int/np.int64), a list of integers or a range. Note that the last year of a range is not included, so range(2010, 2015) is interpreted as [2010, …, 2014]

  • arguments for filtering by datetime.datetime or np.datetime64 (‘month’, ‘hour’, ‘time’)

  • ‘regexp=True’ disables pseudo-regexp syntax in pattern_match()
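The wildcard and ‘level’ semantics above can be sketched with the standard library's fnmatch (hypothetical variable names, not pyam's pattern_match() implementation):

```python
import fnmatch

variables = ["Primary Energy", "Primary Energy|Coal", "Emissions|CO2"]

# '*' matches any substring, as in filter(variable="Primary Energy*")
matched = [v for v in variables if fnmatch.fnmatch(v, "Primary Energy*")]
# -> ['Primary Energy', 'Primary Energy|Coal']

# 'level' counts the depth of a variable via the number of '|'
depth_0 = [v for v in variables if v.count("|") == 0]
# -> ['Primary Energy']
```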

head(*args, **kwargs)[source]

Identical to pandas.DataFrame.head() operating on data

interpolate(time)[source]

Interpolate missing values in timeseries (linear interpolation)

Parameters
time : int or datetime

year or datetime.datetime to be interpolated. This must match the datetime/year format of self.
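Linear interpolation between two existing time points can be sketched with numpy.interp (hypothetical values, not pyam's implementation):

```python
import numpy as np

# Two existing data points of a hypothetical timeseries
years = np.array([2010, 2020])
values = np.array([500.0, 600.0])

# Linearly interpolate the missing year 2015
value_2015 = np.interp(2015, years, values)
# -> 550.0
```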

line_plot(x='year', y='value', **kwargs)[source]

Plot timeseries lines of existing data

see pyam.plotting.line_plot() for all available options

load_meta(path, *args, **kwargs)[source]

Load ‘meta’ indicators from file

Parameters
path : str or path object

any valid string path or pathlib.Path

map_regions(map_col, agg=None, copy_col=None, fname=None, region_col=None, remove_duplicates=False, inplace=False)[source]

Map data from the current region classification to a new one, based on a region-mapping file

Parameters
map_col : str

The column used to map new regions to. Common examples include iso and 5_region.

agg : str, optional

Perform a data aggregation. Options include: sum.

copy_col : str, optional

Copy the existing region data into a new column for later use.

fname : str, optional

Use a non-default region mapping file.

region_col : str, optional

Use a non-default column name for regions to map from.

remove_duplicates : bool, optional

If there are duplicates in the mapping from one regional level to another, then remove these duplicates by counting the most common mapped value. This option is most useful when mapping from high resolution (e.g., model regions) to low resolution (e.g., 5_region).

inplace : bool, optional

if True, do operation inplace and return None

models()[source]

Get a list of models

normalize(inplace=False, **kwargs)[source]

Normalize data to a specific data point

Note: Currently only supports normalizing to a specific time.

Parameters
inplace : bool, optional

if True, do operation inplace and return None

kwargs

the column and value on which to normalize (e.g., year=2005)

pie_plot(*args, **kwargs)[source]

Plot a pie chart

see pyam.plotting.pie_plot() for all available options

pivot_table(index, columns, values='value', aggfunc='count', fill_value=None, style=None)[source]

Returns a pivot table

Parameters
index : str or list of str

rows for the pivot table

columns : str or list of str

columns for the pivot table

values : str, default ‘value’

dataframe column to aggregate or count

aggfunc : str or function, default ‘count’

function used for aggregation, accepts ‘count’, ‘mean’, and ‘sum’

fill_value : scalar, default None

value to replace missing values with

style : str, default None

output style for pivot table formatting, accepts ‘highlight_not_max’, ‘heatmap’
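The underlying operation is a pandas-style pivot; a sketch with aggfunc='sum' on hypothetical data:

```python
import pandas as pd

# Hypothetical long-format data
data = pd.DataFrame(
    {
        "region": ["reg_a", "reg_a", "reg_b"],
        "year": [2010, 2020, 2010],
        "value": [1.0, 2.0, 3.0],
    }
)

# Pivot regions to rows and years to columns, summing values
table = pd.pivot_table(
    data, index="region", columns="year", values="value", aggfunc="sum"
)
# table.loc['reg_a', 2010] -> 1.0; missing cells are NaN
```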

regions()[source]

Get a list of regions

rename(mapping=None, inplace=False, append=False, check_duplicates=True, **kwargs)[source]

Rename and aggregate columns using groupby().sum() on values

When renaming models or scenarios, the uniqueness of the index must be maintained, and the function will raise an error otherwise.

Renaming is only applied to any data where a filter matches for all columns given in mapping. Renaming can only be applied to the model and scenario columns, or to other data columns simultaneously.

Parameters
mapping : dict or kwargs

mapping of column name to rename-dictionary of that column

dict(<column_name>: {<current_name_1>: <target_name_1>,
                     <current_name_2>: <target_name_2>})

or kwargs as column_name={<current_name_1>: <target_name_1>, …}

inplace : bool, default False

if True, do operation inplace and return None

append : bool, default False

append renamed timeseries to self and return None; else return a new IamDataFrame

check_duplicates : bool, default True

check whether a conflict between existing and renamed data exists. If True, raise a ValueError; if False, rename and merge with groupby().sum().
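The rename-then-merge behaviour (check_duplicates=False) can be sketched with plain pandas; the variable names are hypothetical and this is not pyam's implementation:

```python
import pandas as pd

# Two hypothetical variables that will collide after renaming
data = pd.DataFrame(
    {
        "variable": ["Primary Energy|Gas", "Primary Energy|Oil"],
        "year": [2010, 2010],
        "value": [2.0, 3.0],
    }
)

# Map both source names onto the same target name
mapping = {
    "Primary Energy|Gas": "Primary Energy|Fossil",
    "Primary Energy|Oil": "Primary Energy|Fossil",
}
data["variable"] = data["variable"].replace(mapping)

# Colliding rows are combined with groupby().sum() on values
merged = data.groupby(["variable", "year"], as_index=False)["value"].sum()
# one row remains: ('Primary Energy|Fossil', 2010, 5.0)
```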

require_variable(variable, unit=None, year=None, exclude_on_fail=False)[source]

Check whether all scenarios have a required variable

Parameters
variable : str

required variable

unit : str, optional

name of unit

year : int or list, optional

if a list, check whether the variable exists for ANY of the years

exclude_on_fail : bool, default False

flag scenarios missing the required variable as exclude: True

reset_exclude()[source]

Reset exclusion assignment for all scenarios to exclude: False

scatter(x, y, **kwargs)[source]

Plot a scatter chart using meta indicators as columns

see pyam.plotting.scatter() for all available options

scenarios()[source]

Get a list of scenarios

set_meta(meta, name=None, index=None)[source]

Add meta indicators as pandas.Series, list or value (int/float/str)

Parameters
meta : pandas.Series, list, int, float or str

column to be added to ‘meta’ (by [‘model’, ‘scenario’] index if possible)

name : str, optional

meta column name (defaults to meta pandas.Series.name); either meta.name or the name kwarg must be defined

index : IamDataFrame, pandas.DataFrame or pandas.MultiIndex, optional

index to be used for setting the meta column ([‘model’, ‘scenario’])

set_meta_from_data(name, method=None, column='value', **kwargs)[source]

Add meta indicators from downselected timeseries data of self

Parameters
name : str

column name of the ‘meta’ table

method : function, optional

method for aggregation (e.g., numpy.max); required if the downselected data do not yield unique values

column : str, optional

the column from data to be used to derive the indicator

kwargs

passed to filter() for downselecting data

stack_plot(*args, **kwargs)[source]

Plot timeseries stacks of existing data

see pyam.plotting.stack_plot() for all available options

swap_time_for_year(inplace=False)[source]

Convert the time column to year.

Parameters
inplace : bool, default False

if True, do operation inplace and return None

Raises
ValueError

if “time” is not a column of self.data

tail(*args, **kwargs)[source]

Identical to pandas.DataFrame.tail() operating on data

timeseries(iamc_index=False)[source]

Returns data as pandas.DataFrame in wide format

Parameters
iamc_index : bool, default False

if True, use [‘model’, ‘scenario’, ‘region’, ‘variable’, ‘unit’]; else, use all ‘data’ columns

Raises
ValueError

if the IamDataFrame is empty

ValueError

if reducing to the IAMC-index yields an index with duplicates
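The wide format returned by timeseries() (index columns as rows, years as columns) can be sketched with a pandas pivot on hypothetical data:

```python
import pandas as pd

# Hypothetical long-format data for one variable
data = pd.DataFrame(
    {
        "variable": ["Primary Energy", "Primary Energy"],
        "year": [2010, 2020],
        "value": [500.0, 600.0],
    }
)

# Pivot years into columns, as in the wide format of timeseries()
wide = data.pivot(index="variable", columns="year", values="value")
# wide.loc['Primary Energy', 2020] -> 600.0
```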

to_csv(path, iamc_index=False, **kwargs)[source]

Write timeseries data of this object to a csv file

Parameters
path : str or path object

file path or pathlib.Path

iamc_index : bool, default False

if True, use [‘model’, ‘scenario’, ‘region’, ‘variable’, ‘unit’]; else, use all ‘data’ columns

to_datapackage(path)[source]

Write object to a frictionless Data Package

More information: https://frictionlessdata.io

Returns the saved datapackage.Package (read the docs). When adding metadata (descriptors), please follow the template defined by https://github.com/OpenEnergyPlatform/metadata

Parameters
path : str or path object

any valid string path or pathlib.Path

to_excel(excel_writer, sheet_name='data', iamc_index=False, include_meta=True, **kwargs)[source]

Write object to an Excel spreadsheet

Parameters
excel_writer : str, path object or ExcelWriter object

any valid string path, pathlib.Path or pandas.ExcelWriter

sheet_name : str

name of the sheet which will contain the timeseries() data

iamc_index : bool, default False

if True, use [‘model’, ‘scenario’, ‘region’, ‘variable’, ‘unit’]; else, use all ‘data’ columns

include_meta : bool or str

if True, write ‘meta’ to a sheet named ‘meta’ (default); if a string, use it as the sheet name

validate(criteria={}, exclude_on_fail=False)[source]

Validate scenarios using criteria on timeseries values

Returns all scenarios which do not match the criteria and prints a log message, or returns None if all scenarios match the criteria.

When called with exclude_on_fail=True, scenarios not satisfying the criteria will be marked as exclude=True.

Parameters
criteria : dict

dictionary with variable keys and validation mappings (‘up’ and ‘lo’ for respective bounds, ‘year’ for years)

exclude_on_fail : bool, optional

flag scenarios failing validation as exclude: True
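The criteria semantics (an ‘up’ bound, optionally restricted to a ‘year’) can be sketched with plain pandas; the data and bound are hypothetical, not pyam's implementation:

```python
import pandas as pd

# Hypothetical timeseries values for two scenarios in one year
data = pd.DataFrame(
    {
        "scenario": ["scen_a", "scen_b"],
        "year": [2030, 2030],
        "value": [4.5, 6.0],
    }
)

# criteria in the style of {'Variable': {'up': 5.0, 'year': 2030}}
criteria = {"up": 5.0, "year": 2030}

# A scenario fails if its value in the given year exceeds the upper bound
mask = (data["year"] == criteria["year"]) & (data["value"] > criteria["up"])
failing = data.loc[mask, "scenario"].tolist()
# -> ['scen_b']
```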

variables(include_units=False)[source]

Get a list of variables

Parameters
include_units : bool, default False

include the units