The IamDataFrame class

class pyam.IamDataFrame(data, **kwargs)[source]

Scenario timeseries data

The class provides a number of diagnostic features (including validation of data, completeness of variables provided), processing tools (e.g., unit conversion), as well as visualization and plotting tools.

Parameters
data : ixmp.Scenario, pandas.DataFrame or data file

an instance of an ixmp.Scenario, pandas.DataFrame, or data file with the required data columns. A pandas.DataFrame can have the required data as columns or index. Support is additionally provided for R-style year columns, like “X2015”, etc.

kwargs

if value=col, melt column col to ‘value’ and use col name as ‘variable’; or mapping of required columns (IAMC_IDX) to any of the following:

  • one column in data

  • multiple columns, to be concatenated by |

  • a string to be used as value for this column

Notes

When initializing an IamDataFrame from an xlsx file, pyam will by default look for the sheets ‘data’ and ‘meta’ to populate the respective tables. Custom sheet names can be specified with the kwargs sheet_name (‘data’) and meta_sheet_name (‘meta’). Calling the class with meta_sheet_name=False will skip the import of the ‘meta’ table.
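For illustration, a minimal table in the required IAMC format can be built with pandas; all values below are hypothetical. Passing it to pyam.IamDataFrame (shown only as a comment, to keep this sketch self-contained) would construct the object described above.

```python
import pandas as pd

# A minimal table in IAMC format; year columns may also be given
# R-style, e.g. "X2015" (hypothetical values, for illustration only).
data = pd.DataFrame(
    [
        ["model_a", "scen_a", "World", "Primary Energy", "EJ/yr", 1.0, 6.0],
        ["model_a", "scen_a", "World", "Primary Energy|Coal", "EJ/yr", 0.5, 3.0],
    ],
    columns=["model", "scenario", "region", "variable", "unit", 2005, 2010],
)

# Constructing the IamDataFrame (not executed here) would then be:
# import pyam
# df = pyam.IamDataFrame(data)
print(data.shape)  # (2, 7)
```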

Attributes
empty

Indicator whether this object is empty

Methods

aggregate(variable[, components, method, append])

Aggregate timeseries components or sub-categories within each region

aggregate_region(variable[, region, …])

Aggregate a timeseries over a number of subregions

append(other[, ignore_meta_conflict, inplace])

Append any IamDataFrame-like object to this object

as_pandas([with_metadata])

Return object as a pandas.DataFrame

bar_plot(*args, **kwargs)

Plot timeseries bars of existing data

categorize(name, value, criteria[, color, …])

Assign scenarios to a category according to specific criteria

check_aggregate(variable[, components, …])

Check whether a timeseries matches the aggregation of its components

check_aggregate_region(variable[, region, …])

Check whether a timeseries matches the aggregation across subregions

check_internal_consistency([components])

Check whether a scenario ensemble is internally consistent

col_apply(col, func, *args, **kwargs)

Apply a function to a column of data or meta

convert_unit(current[, to, factor, …])

Converts a unit using a given factor or the pint package

copy()

Make a deepcopy of this object

downscale_region(variable, proxy[, region, …])

Downscale a timeseries to a number of subregions

equals(other)

Test if two objects contain the same data and meta indicators

export_meta(excel_writer[, sheet_name])

Write the ‘meta’ table of this object to an Excel sheet

export_metadata(excel_writer[, sheet_name])

Deprecated, see export_meta()

filter([keep, inplace])

Return a (copy of a) filtered (downselected) IamDataFrame

head(*args, **kwargs)

Identical to pandas.DataFrame.head() operating on data

interpolate(time)

Interpolate missing values in timeseries (linear interpolation)

line_plot([x, y])

Plot timeseries lines of existing data

load_meta(path, *args, **kwargs)

Load ‘meta’ table from file

load_metadata(path, *args, **kwargs)

Deprecated, see load_meta()

map_regions(map_col[, agg, copy_col, fname, …])

Map data from one regional level to another

models()

Get a list of models

normalize([inplace])

Normalize data to a specific data point

pie_plot(*args, **kwargs)

Plot a pie chart

pivot_table(index, columns[, values, …])

Returns a pivot table

regions()

Get a list of regions

rename([mapping, inplace, append, …])

Rename and aggregate columns using groupby().sum() on values

require_variable(variable[, unit, year, …])

Check whether all scenarios have a required variable

reset_exclude()

Reset exclusion assignment for all scenarios to exclude: False

scatter(x, y, **kwargs)

Plot a scatter chart using metadata columns

scenarios()

Get a list of scenarios

set_meta(meta[, name, index])

Add meta indicators as pandas.Series, list or value (int/float/str)

set_meta_from_data(name[, method, column])

Add metadata indicators from downselected timeseries data of self

stack_plot(*args, **kwargs)

Plot timeseries stacks of existing data

swap_time_for_year([inplace])

Convert the time column to year.

tail(*args, **kwargs)

Identical to pandas.DataFrame.tail() operating on data

timeseries([iamc_index])

Returns ‘data’ as pandas.DataFrame in wide format (time as columns)

to_csv(path[, iamc_index])

Write timeseries data of this object to a csv file

to_datapackage(path)

Write object to a frictionless Data Package

to_excel(excel_writer[, sheet_name, …])

Write object to an Excel spreadsheet

validate([criteria, exclude_on_fail])

Validate scenarios using criteria on timeseries values

variables([include_units])

Get a list of variables

aggregate(variable, components=None, method='sum', append=False)[source]

Aggregate timeseries components or sub-categories within each region

Parameters
variable : str or list of str

variable(s) for which the aggregate will be computed

components : list of str, default None

list of variables to aggregate, defaults to all sub-categories of variable

method : func or str, default ‘sum’

method to use for aggregation, e.g. numpy.mean(), numpy.sum(), ‘min’, ‘max’

append : bool, default False

append the aggregate timeseries to self and return None, else return aggregate timeseries
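Conceptually, aggregation sums the timeseries of all direct sub-categories of a variable within each region. This is a hedged pandas sketch of that idea (hypothetical values), not pyam's actual implementation:

```python
import pandas as pd

# Long-format timeseries rows (hypothetical values)
data = pd.DataFrame(
    {
        "region": ["World", "World", "World", "World"],
        "variable": ["Primary Energy|Coal", "Primary Energy|Wind",
                     "Primary Energy|Coal", "Primary Energy|Wind"],
        "year": [2005, 2005, 2010, 2010],
        "value": [3.0, 1.0, 2.5, 2.0],
    }
)

# Conceptually, `aggregate("Primary Energy")` sums all sub-categories
# "Primary Energy|*" within each region and year:
components = data[data["variable"].str.startswith("Primary Energy|")]
aggregate = components.groupby(["region", "year"])["value"].sum()
print(aggregate.loc[("World", 2005)])  # 4.0
```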

aggregate_region(variable, region='World', subregions=None, components=False, method='sum', weight=None, append=False)[source]

Aggregate a timeseries over a number of subregions

This function allows to add variable sub-categories that are only defined at the region level by setting components=True

Parameters
variable : str or list of str

variable(s) to be aggregated

region : str, default ‘World’

region to which data will be aggregated

subregions : list of str

list of subregions, defaults to all regions other than region

components : bool or list of str, default False

variables at the region level to be included in the aggregation (ignored if False); if True, use all sub-categories of variable included in region but not in any of the subregions; or explicit list of variables

method : func or str, default ‘sum’

method to use for aggregation, e.g. numpy.mean(), numpy.sum(), ‘min’, ‘max’

weight : str, default None

variable to use as weight for the aggregation (currently only supported with method=’sum’)

append : bool, default False

append the aggregate timeseries to self and return None, else return aggregate timeseries
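The core of regional aggregation is summing a variable over all regions other than the target region. A hedged pandas sketch with hypothetical values:

```python
import pandas as pd

# Hypothetical subregional data for one variable and year
data = pd.DataFrame(
    {
        "region": ["reg_a", "reg_b", "World"],
        "variable": ["Emissions|CO2"] * 3,
        "year": [2010] * 3,
        "value": [4.0, 6.0, 10.0],
    }
)

# Conceptually, `aggregate_region("Emissions|CO2")` sums the values
# of all regions other than the target region ('World'):
subregions = data[data["region"] != "World"]
total = subregions.groupby(["variable", "year"])["value"].sum()
```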

append(other, ignore_meta_conflict=False, inplace=False, **kwargs)[source]

Append any IamDataFrame-like object to this object

Columns in other.meta that are not in self.meta are always merged; duplicate region-variable-unit-year rows raise a ValueError.

Parameters
other : IamDataFrame, ixmp.Scenario, pandas.DataFrame or data file

any object castable as IamDataFrame to be appended

ignore_meta_conflict : bool, default False

if False and other is an IamDataFrame, raise an error if any meta columns present in self and other are not identical.

inplace : bool, default False

if True, do operation inplace and return None

kwargs : arguments for initializing other as IamDataFrame

passed to IamDataFrame(other, **kwargs)

as_pandas(with_metadata=False)[source]

Return object as a pandas.DataFrame

Parameters
with_metadata : bool or dict, default False

if True, join data with all meta columns; if a dict, discover meaningful meta columns from values (in key-value)

bar_plot(*args, **kwargs)[source]

Plot timeseries bars of existing data

see pyam.plotting.bar_plot() for all available options

categorize(name, value, criteria, color=None, marker=None, linestyle=None)[source]

Assign scenarios to a category according to specific criteria

Parameters
name : str

column name of the ‘meta’ table

value : str

category identifier

criteria : dict

dictionary with variables mapped to applicable checks (‘up’ and ‘lo’ for respective bounds, ‘year’ for years - optional)

color : str

assign a color to this category for plotting

marker : str

assign a marker to this category for plotting

linestyle : str

assign a linestyle to this category for plotting
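In pyam the criteria are applied to timeseries values; for brevity, this hedged sketch checks a precomputed indicator (hypothetical column `peak_temperature`) against an upper bound and writes the category label into a meta-style table:

```python
import pandas as pd

# Hypothetical per-scenario indicator (stand-in for timeseries criteria)
meta = pd.DataFrame(
    {"scenario": ["scen_a", "scen_b"], "peak_temperature": [1.4, 2.3]}
).set_index("scenario")

# Conceptually, `categorize(name, value, criteria)` assigns `value` to
# the meta column `name` for scenarios satisfying the bounds
# ('lo' <= x <= 'up'); a sketch with an upper bound of 1.5:
meta["warming-category"] = "uncategorized"
meta.loc[meta["peak_temperature"] <= 1.5, "warming-category"] = "below 1.5C"
```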

check_aggregate(variable, components=None, method='sum', exclude_on_fail=False, multiplier=1, **kwargs)[source]

Check whether a timeseries matches the aggregation of its components

Parameters
variable : str or list of str

variable(s) checked for matching aggregation of sub-categories

components : list of str, default None

list of variables, defaults to all sub-categories of variable

method : func or str, default ‘sum’

method to use for aggregation, e.g. numpy.mean(), numpy.sum(), ‘min’, ‘max’

exclude_on_fail : bool, default False

flag scenarios failing validation as exclude: True

multiplier : number, default 1

factor when comparing variable and sum of components

kwargs : arguments for comparison of values

passed to numpy.isclose()

check_aggregate_region(variable, region='World', subregions=None, components=False, method='sum', weight=None, exclude_on_fail=False, **kwargs)[source]

Check whether a timeseries matches the aggregation across subregions

Parameters
variable : str or list of str

variable(s) to be checked for matching aggregation of subregions

region : str, default ‘World’

region to be checked for matching aggregation of subregions

subregions : list of str

list of subregions, defaults to all regions other than region

components : bool or list of str, default False

variables at the region level to be included in the aggregation (ignored if False); if True, use all sub-categories of variable included in region but not in any of the subregions; or explicit list of variables

method : func or str, default ‘sum’

method to use for aggregation, e.g. numpy.mean(), numpy.sum(), ‘min’, ‘max’

weight : str, default None

variable to use as weight for the aggregation (currently only supported with method=’sum’)

exclude_on_fail : bool, default False

flag scenarios failing validation as exclude: True

kwargs : arguments for comparison of values

passed to numpy.isclose()

check_internal_consistency(components=False, **kwargs)[source]

Check whether a scenario ensemble is internally consistent

We check that all variables are equal to the sum of their sectoral components and that all the regions add up to the World total. If the check is passed, None is returned, otherwise a DataFrame of inconsistent variables is returned.

Note: at the moment, this method’s regional checking is limited to checking that all the regions sum to the World region. We cannot make this more automatic unless we store how the regions relate, see this issue.

Parameters
kwargs : arguments for comparison of values

passed to numpy.isclose()

components : bool, default False

passed to check_aggregate_region() if True, use all sub-categories of each variable included in World but not in any of the subregions; if False, only aggregate variables over subregions

col_apply(col, func, *args, **kwargs)[source]

Apply a function to a column of data or meta

Parameters
col: str

column in either data or meta dataframes

func: function

function to apply

convert_unit(current, to=None, factor=None, registry=None, context=None, inplace=False)[source]

Converts a unit using a given factor or the pint package

The pint package natively handles conversion of standard (SI) units (e.g., exajoule to terawatt-hours, EJ -> TWh). It can also parse combined units (e.g., exajoule per year, EJ/yr).

The pint.UnitRegistry used by default loads additional unit definitions relevant for integrated assessment models and energy systems analysis from the IAMconsortium/units repository. You can access that unit registry via pint.get_application_registry().

Parameters
current : str (or mapping, deprecated)

name of current unit (to be converted from)

to : str

name of new unit (to be converted to)

factor : value, optional

conversion factor if given, otherwise defaults to the application UnitRegistry

registry : pint.UnitRegistry, optional

use a specific pint.UnitRegistry; if None, use the default application registry with definitions imported from the IAMconsortium/units repository

context : str, optional

passed to the UnitRegistry

inplace : bool, default False

if True, do operation inplace and return None
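When an explicit factor is given, conversion reduces to scaling the matching rows and relabeling the unit. A hedged pandas sketch (the EJ-to-TWh factor is approximate, for illustration):

```python
import pandas as pd

data = pd.DataFrame(
    {"variable": ["Primary Energy"], "unit": ["EJ/yr"], "value": [2.0]}
)

# With an explicit `factor`, conversion is a simple scaling;
# 1 EJ is approximately 277.778 TWh:
sel = data["unit"] == "EJ/yr"
data.loc[sel, "value"] *= 277.778
data.loc[sel, "unit"] = "TWh/yr"
print(data["value"].iloc[0])  # 555.556
```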

copy()[source]

Make a deepcopy of this object

See copy.deepcopy() for details.

downscale_region(variable, proxy, region='World', subregions=None, append=False)[source]

Downscale a timeseries to a number of subregions

Parameters
variable : str or list of str

variable(s) to be downscaled

proxy : str

variable to be used as proxy (i.e., weight) for the downscaling

region : str, default ‘World’

region from which data will be downscaled

subregions : list of str

list of subregions, defaults to all regions other than region

append : bool, default False

append the downscaled timeseries to self and return None, else return downscaled data as new IamDataFrame

property empty

Indicator whether this object is empty

equals(other)[source]

Test if two objects contain the same data and meta indicators

This function allows two IamDataFrame instances to be compared against each other to see if they have the same timeseries data and meta indicators. nan’s in the same location of the meta table are considered equal.

Parameters
other: IamDataFrame

the other IamDataFrame to be compared with self

export_meta(excel_writer, sheet_name='meta')[source]

Write the ‘meta’ table of this object to an Excel sheet

Parameters
excel_writer: str, path object or ExcelWriter object

any valid string path, pathlib.Path or pandas.ExcelWriter

sheet_name: str

name of sheet which will contain ‘meta’ table

export_metadata(excel_writer, sheet_name='meta')[source]

Deprecated, see export_meta()

filter(keep=True, inplace=False, **kwargs)[source]

Return a (copy of a) filtered (downselected) IamDataFrame

Parameters
keep: bool, default True

keep all scenarios satisfying the filters (if True) or the inverse

inplace: bool, default False

if True, do operation inplace and return None

filters by kwargs:
The following columns are available for filtering:
  • ‘meta’ columns: filter by string value of that column

  • ‘model’, ‘scenario’, ‘region’, ‘variable’, ‘unit’: string or list of strings, where * can be used as a wildcard

  • ‘level’: the maximum “depth” of IAM variables (number of ‘|’) (excluding the strings given in the ‘variable’ argument)

  • ‘year’: takes an integer, a list of integers or a range; note that the last year of a range is not included, so range(2010, 2015) is interpreted as [2010, …, 2014]

  • arguments for filtering by datetime.datetime (‘month’, ‘hour’, ‘time’)

  • ‘regexp=True’ disables pseudo-regexp syntax in pattern_match()
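The `*` wildcard behaves like a shell-style glob on the column values. As a hedged analogy (not pyam's actual pattern_match() implementation), the stdlib fnmatch shows the matching semantics:

```python
from fnmatch import fnmatch

variables = ["Primary Energy", "Primary Energy|Coal", "Emissions|CO2"]

# '*' acts as a wildcard, so 'Primary Energy|*' matches the
# sub-categories but not the total itself:
matched = [v for v in variables if fnmatch(v, "Primary Energy|*")]
print(matched)  # ['Primary Energy|Coal']
```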

head(*args, **kwargs)[source]

Identical to pandas.DataFrame.head() operating on data

interpolate(time)[source]

Interpolate missing values in timeseries (linear interpolation)

Parameters
time : int or datetime

Year or datetime.datetime to be interpolated. This must match the datetime/year format of self.
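Linear interpolation between the two neighboring timesteps can be sketched with numpy (hypothetical values):

```python
import numpy as np

years = [2005, 2010]
values = [1.0, 3.0]

# Linear interpolation at 2007, as `interpolate(2007)` would fill in:
interpolated = np.interp(2007, years, values)
print(interpolated)  # 1.8
```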

line_plot(x='year', y='value', **kwargs)[source]

Plot timeseries lines of existing data

see pyam.plotting.line_plot() for all available options

load_meta(path, *args, **kwargs)[source]

Load ‘meta’ table from file

Parameters
path: str or path object

any valid string path or pathlib.Path

load_metadata(path, *args, **kwargs)[source]

Deprecated, see load_meta()

map_regions(map_col, agg=None, copy_col=None, fname=None, region_col=None, remove_duplicates=False, inplace=False)[source]

Map data from one regional level to another (e.g., model regions to ISO codes)

Parameters
map_col : str

The column used to map new regions to. Common examples include iso and 5_region.

agg : str, optional

Perform a data aggregation. Options include: sum.

copy_col : str, optional

Copy the existing region data into a new column for later use.

fname : str, optional

Use a non-default region mapping file

region_col : str, optional

Use a non-default column name for regions to map from.

remove_duplicates : bool, optional, default False

If there are duplicates in the mapping from one regional level to another, then remove these duplicates by counting the most common mapped value. This option is most useful when mapping from high resolution (e.g., model regions) to low resolution (e.g., 5_region).

inplace : bool, default False

if True, do operation inplace and return None

models()[source]

Get a list of models

normalize(inplace=False, **kwargs)[source]

Normalize data to a specific data point

Note: Currently only supports normalizing to a specific time.

Parameters
inplace : bool, default False

if True, do operation inplace and return None

kwargs

the column and value on which to normalize (e.g., year=2005)

pie_plot(*args, **kwargs)[source]

Plot a pie chart

see pyam.plotting.pie_plot() for all available options

pivot_table(index, columns, values='value', aggfunc='count', fill_value=None, style=None)[source]

Returns a pivot table

Parameters
index : str or list of str

rows for Pivot table

columns : str or list of str

columns for Pivot table

values : str, default ‘value’

dataframe column to aggregate or count

aggfunc : str or function, default ‘count’

function used for aggregation, accepts ‘count’, ‘mean’, and ‘sum’

fill_value : scalar, default None

value to replace missing values with

style : str, default None

output style for pivot table formatting; accepts ‘highlight_not_max’, ‘heatmap’
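Since the method builds on pandas, the default aggfunc='count' behavior can be sketched directly with pandas.DataFrame.pivot_table (hypothetical values):

```python
import pandas as pd

data = pd.DataFrame(
    {
        "region": ["World", "World", "reg_a"],
        "year": [2005, 2010, 2005],
        "value": [1.0, 2.0, 3.0],
    }
)

# With aggfunc='count', the pivot table reports how many datapoints
# exist per index/columns combination (NaN where none exist):
table = data.pivot_table(index="region", columns="year",
                         values="value", aggfunc="count")
```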

regions()[source]

Get a list of regions

rename(mapping=None, inplace=False, append=False, check_duplicates=True, **kwargs)[source]

Rename and aggregate columns using groupby().sum() on values

When renaming models or scenarios, the uniqueness of the index must be maintained, and the function will raise an error otherwise.

Renaming is only applied to any data where a filter matches for all columns given in mapping. Renaming can only be applied to the model and scenario columns, or to other data columns simultaneously.

Parameters
mapping : dict or kwargs

mapping of column name to rename-dictionary of that column

dict(<column_name>: {<current_name_1>: <target_name_1>,
                     <current_name_2>: <target_name_2>})

or kwargs as column_name={<current_name_1>: <target_name_1>, …}

inplace : bool, default False

if True, do operation inplace and return None

append : bool, default False

append renamed timeseries to self and return None; else return new IamDataFrame

check_duplicates: bool, default True

check whether conflict between existing and renamed data exists. If True, raise ValueError; if False, rename and merge with groupby().sum().
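When a rename makes two rows share the same index, their values are combined with groupby().sum(). A hedged pandas sketch of that merge (hypothetical region names and values):

```python
import pandas as pd

data = pd.DataFrame(
    {
        "scenario": ["scen_a", "scen_a"],
        "region": ["reg_1", "reg_2"],
        "year": [2010, 2010],
        "value": [4.0, 6.0],
    }
)

# Renaming both regions to a common name makes the rows collide,
# and the values are merged with groupby().sum():
data["region"] = data["region"].replace({"reg_1": "reg_12", "reg_2": "reg_12"})
merged = data.groupby(["scenario", "region", "year"])["value"].sum()
```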

require_variable(variable, unit=None, year=None, exclude_on_fail=False)[source]

Check whether all scenarios have a required variable

Parameters
variable : str

required variable

unit : str, default None

name of unit (optional)

year : int or list, default None

check whether the variable exists for ANY of the years (if a list)

exclude_on_fail : bool, default False

flag scenarios missing the required variables as exclude: True

reset_exclude()[source]

Reset exclusion assignment for all scenarios to exclude: False

scatter(x, y, **kwargs)[source]

Plot a scatter chart using metadata columns

see pyam.plotting.scatter() for all available options

scenarios()[source]

Get a list of scenarios

set_meta(meta, name=None, index=None)[source]

Add meta indicators as pandas.Series, list or value (int/float/str)

Parameters
meta : pandas.Series, list, int, float or str

column to be added to ‘meta’ (by [‘model’, ‘scenario’] index if possible)

name : str, optional

meta column name (defaults to meta pandas.Series.name); either meta.name or the name kwarg must be defined

index : IamDataFrame, pandas.DataFrame or pandas.MultiIndex, optional

index to be used for setting meta column ([‘model’, ‘scenario’])

set_meta_from_data(name, method=None, column='value', **kwargs)[source]

Add metadata indicators from downselected timeseries data of self

Parameters
name : str

column name of the ‘meta’ table

method : function, optional

method for aggregation (e.g., numpy.max); required if downselected data do not yield unique values

column : str, default ‘value’

the column from ‘data’ to be used to derive the indicator

kwargs

passed to filter() for downselected data

stack_plot(*args, **kwargs)[source]

Plot timeseries stacks of existing data

see pyam.plotting.stack_plot() for all available options

swap_time_for_year(inplace=False)[source]

Convert the time column to year.

Parameters
inplace : bool, default False

if True, do operation inplace and return None

Raises
ValueError

“time” is not a column of self.data

tail(*args, **kwargs)[source]

Identical to pandas.DataFrame.tail() operating on data

timeseries(iamc_index=False)[source]

Returns ‘data’ as pandas.DataFrame in wide format (time as columns)

Parameters
iamc_index : bool, default False

if True, use [‘model’, ‘scenario’, ‘region’, ‘variable’, ‘unit’]; else, use all ‘data’ columns

Raises
ValueError

IamDataFrame is empty

ValueError

reducing to IAMC-index yields an index with duplicates
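The wide format is a standard long-to-wide pivot: the index columns stay as the row index and the time dimension becomes the columns. A pandas sketch with hypothetical values:

```python
import pandas as pd

data = pd.DataFrame(
    {
        "model": ["model_a"] * 2,
        "scenario": ["scen_a"] * 2,
        "region": ["World"] * 2,
        "variable": ["Primary Energy"] * 2,
        "unit": ["EJ/yr"] * 2,
        "year": [2005, 2010],
        "value": [1.0, 6.0],
    }
)

# Wide format: index columns become the row index, time the columns
wide = data.pivot_table(
    index=["model", "scenario", "region", "variable", "unit"],
    columns="year", values="value"
)
```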

to_csv(path, iamc_index=False, **kwargs)[source]

Write timeseries data of this object to a csv file

Parameters
path : str or path object

file path or pathlib.Path

iamc_index: bool, default False

if True, use [‘model’, ‘scenario’, ‘region’, ‘variable’, ‘unit’]; else, use all ‘data’ columns

to_datapackage(path)[source]

Write object to a frictionless Data Package

More information: https://frictionlessdata.io

Returns the saved datapackage.Package. When adding metadata (descriptors), please follow the template defined by https://github.com/OpenEnergyPlatform/metadata

Parameters
path: string or path object

any valid string path or pathlib.Path

to_excel(excel_writer, sheet_name='data', iamc_index=False, include_meta=True, **kwargs)[source]

Write object to an Excel spreadsheet

Parameters
excel_writer: str, path object or ExcelWriter object

any valid string path, pathlib.Path or pandas.ExcelWriter

sheet_name: string

name of sheet which will contain timeseries() data

iamc_index: bool, default False

if True, use [‘model’, ‘scenario’, ‘region’, ‘variable’, ‘unit’]; else, use all ‘data’ columns

include_meta : bool or str

if True, write ‘meta’ to an Excel sheet named ‘meta’ (default); if given a string, use it as the sheet name

validate(criteria={}, exclude_on_fail=False)[source]

Validate scenarios using criteria on timeseries values

Returns all scenarios which do not match the criteria and prints a log message, or returns None if all scenarios match the criteria.

When called with exclude_on_fail=True, scenarios not satisfying the criteria will be marked as exclude=True.

Parameters
criteria : dict

dictionary with variable keys and validation mappings (‘up’ and ‘lo’ for respective bounds, ‘year’ for years)

exclude_on_fail : bool, default False

flag scenarios failing validation as exclude: True
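The criteria check amounts to selecting the rows for a variable (and year, if given) and flagging those outside the bounds. A hedged pandas sketch with hypothetical values:

```python
import pandas as pd

data = pd.DataFrame(
    {
        "scenario": ["scen_a", "scen_b"],
        "variable": ["Primary Energy"] * 2,
        "year": [2010, 2010],
        "value": [5.0, 12.0],
    }
)

# A criteria entry like {'Primary Energy': {'up': 10, 'year': 2010}}
# flags rows above the upper bound; a sketch of that check:
criteria = {"Primary Energy": {"up": 10, "year": 2010}}
bounds = criteria["Primary Energy"]
sel = (data["variable"] == "Primary Energy") & (data["year"] == bounds["year"])
failed = data[sel & (data["value"] > bounds["up"])]
```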

variables(include_units=False)[source]

Get a list of variables

Parameters
include_units : bool, default False

include the units