The IamDataFrame class

class pyam.IamDataFrame(data, **kwargs)[source]

Scenario timeseries data

The class provides a number of diagnostic features (e.g., validation of data, checking the completeness of variables), processing tools (e.g., unit conversion), as well as visualization and plotting tools.

Parameters
data : ixmp.Scenario, pandas.DataFrame or data file

an instance of an ixmp.Scenario, a pandas.DataFrame, or a data file with the required data columns. A pandas.DataFrame can have the required data as columns or index. R-style data columns for years, like “X2015”, are also supported.

kwargs

if value=<col>, melt column <col> to ‘value’ and use the <col> name as ‘variable’; or a mapping of required columns (IAMC_IDX) to any of the following:

  • one column in data

  • multiple columns, to be concatenated by |

  • a string to be used as the value for this column

A pandas.DataFrame with suitable meta indicators can be passed as meta=<df>. Its index will be downselected to those scenarios that have timeseries data.
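As a sketch of the long ("tabular") IAMC format the class accepts, the required columns can be built with plain pandas; the model and scenario names here are hypothetical, and the pyam call is shown only as a comment since it requires the pyam package:

```python
import pandas as pd

# Minimal timeseries data in IAMC long format: the required index
# columns (IAMC_IDX) plus 'year' and 'value'
data = pd.DataFrame(
    [
        ["model_a", "scen_a", "World", "Primary Energy", "EJ/yr", 2010, 500.0],
        ["model_a", "scen_a", "World", "Primary Energy", "EJ/yr", 2020, 600.0],
    ],
    columns=["model", "scenario", "region", "variable", "unit", "year", "value"],
)

# With pyam installed, this frame can be passed directly:
# import pyam
# df = pyam.IamDataFrame(data)
```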

Notes

When initializing an IamDataFrame from an xlsx file, pyam will by default look for the sheets ‘data’ and ‘meta’ to populate the respective tables. Custom sheet names can be specified with the kwargs sheet_name (default ‘data’) and meta_sheet_name (default ‘meta’). Calling the class with meta_sheet_name=False will skip the import of the ‘meta’ table.

When initializing an IamDataFrame from an object that is already an IamDataFrame instance, the new object will be hard-linked to all attributes of the original object, so any changes on one object (e.g., with inplace=True) may also modify the other object! This is intended behaviour and consistent with pandas, but may be confusing for users who are not familiar with the pandas/Python universe.

Attributes
empty

Indicator whether this object is empty

Methods

aggregate(variable[, components, method, …])

Aggregate timeseries components or sub-categories within each region

aggregate_region(variable[, region, …])

Aggregate a timeseries over a number of subregions

aggregate_time(variable[, column, value, …])

Aggregate a timeseries over a subannual time resolution

append(other[, ignore_meta_conflict, inplace])

Append any IamDataFrame-like object to this object

as_pandas([meta_cols, with_metadata])

Return object as a pandas.DataFrame

bar_plot(*args, **kwargs)

Plot timeseries bars of existing data

categorize(name, value, criteria[, color, …])

Assign scenarios to a category according to specific criteria

check_aggregate(variable[, components, …])

Check whether a timeseries matches the aggregation of its components

check_aggregate_region(variable[, region, …])

Check whether a timeseries matches the aggregation across subregions

check_internal_consistency([components])

Check whether a scenario ensemble is internally consistent

col_apply(col, func, *args, **kwargs)

Apply a function to a column of data or meta

convert_unit(current, to[, factor, …])

Convert all data having current units to new units.

copy()

Make a deepcopy of this object

downscale_region(variable[, region, …])

Downscale a timeseries to a number of subregions

equals(other)

Test if two objects contain the same data and meta indicators

export_meta(excel_writer[, sheet_name])

Write the ‘meta’ indicators of this object to an Excel sheet

filter([keep, inplace])

Return a (copy of a) filtered (downselected) IamDataFrame

head(*args, **kwargs)

Identical to pandas.DataFrame.head() operating on data

interpolate(time)

Interpolate missing values in timeseries (linear interpolation)

line_plot([x, y])

Plot timeseries lines of existing data

load_meta(path, *args, **kwargs)

Load ‘meta’ indicators from file

map_regions(map_col[, agg, copy_col, fname, …])

Map data from the current region classification to a new one

models()

Get a list of models

normalize([inplace])

Normalize data to a specific data point

pie_plot(*args, **kwargs)

Plot a pie chart

pivot_table(index, columns[, values, …])

Returns a pivot table

regions()

Get a list of regions

rename([mapping, inplace, append, …])

Rename and aggregate columns using groupby().sum() on values

require_variable(variable[, unit, year, …])

Check whether all scenarios have a required variable

reset_exclude()

Reset exclusion assignment for all scenarios to exclude: False

scatter(x, y, **kwargs)

Plot a scatter chart using meta indicators as columns

scenarios()

Get a list of scenarios

set_meta(meta[, name, index])

Add meta indicators as pandas.Series, list or value (int/float/str)

set_meta_from_data(name[, method, column])

Add meta indicators from downselected timeseries data of self

stack_plot(*args, **kwargs)

Plot timeseries stacks of existing data

swap_time_for_year([inplace])

Convert the time column to year.

tail(*args, **kwargs)

Identical to pandas.DataFrame.tail() operating on data

timeseries([iamc_index])

Returns data as pandas.DataFrame in wide format

to_csv(path[, iamc_index])

Write timeseries data of this object to a csv file

to_datapackage(path)

Write object to a frictionless Data Package

to_excel(excel_writer[, sheet_name, …])

Write object to an Excel spreadsheet

validate([criteria, exclude_on_fail])

Validate scenarios using criteria on timeseries values

variables([include_units])

Get a list of variables

aggregate(variable, components=None, method='sum', recursive=False, append=False)[source]

Aggregate timeseries components or sub-categories within each region

Parameters
variable : str or list of str

variable(s) for which the aggregate will be computed

components : list of str, optional

list of variables to aggregate, defaults to all sub-categories of variable

method : func or str, optional

method to use for aggregation, e.g. numpy.mean(), numpy.sum(), ‘min’, ‘max’

recursive : bool, optional

iterate recursively over all sub-categories of variable

append : bool, optional

append the aggregate timeseries to self and return None, else return the aggregate timeseries as a new IamDataFrame

Notes

The aggregation function interprets any missing values (numpy.nan) for individual components as 0.
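The NaN-as-zero behaviour described in the Notes can be sketched with plain pandas (hypothetical component values, not pyam's implementation):

```python
import numpy as np
import pandas as pd

# Two hypothetical component timeseries; 'Primary Energy|Wind' is
# missing (NaN) in 2010
components = pd.DataFrame(
    {
        "Primary Energy|Coal": [300.0, 320.0],
        "Primary Energy|Wind": [np.nan, 50.0],
    },
    index=[2010, 2020],
)

# pandas' row-wise sum skips NaN, which is equivalent to treating the
# missing component as 0, matching the behaviour described in the Notes
aggregate = components.sum(axis=1)
# aggregate: 2010 -> 300.0, 2020 -> 370.0
```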

aggregate_region(variable, region='World', subregions=None, components=False, method='sum', weight=None, append=False)[source]

Aggregate a timeseries over a number of subregions

This function allows adding variable sub-categories that are only defined at the region level by setting components=True

Parameters
variable : str or list of str

variable(s) to be aggregated

region : str, default ‘World’

region to which data will be aggregated

subregions : list of str, optional

list of subregions, defaults to all regions other than region

components : bool or list of str, optional

variables at the region level to be included in the aggregation (ignored if False); if True, use all sub-categories of variable included in region but not in any of the subregions; or an explicit list of variables

method : func or str, optional

method to use for aggregation, e.g. numpy.mean(), numpy.sum(), ‘min’, ‘max’

weight : str, default None

variable to use as weight for the aggregation (currently only supported with method=’sum’)

append : bool, default False

append the aggregate timeseries to self and return None, else return the aggregate timeseries as a new IamDataFrame
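The default behaviour (method='sum' over all subregions) can be sketched with a pandas groupby; the region names are hypothetical and this is not pyam's implementation:

```python
import pandas as pd

# Hypothetical regional data: two subregions, two years
data = pd.DataFrame(
    {
        "region": ["reg_a", "reg_b", "reg_a", "reg_b"],
        "year": [2010, 2010, 2020, 2020],
        "value": [2.0, 3.0, 4.0, 6.0],
    }
)

# Summing subregions per year yields the 'World' aggregate
world = data.groupby("year")["value"].sum()
# world: 2010 -> 5.0, 2020 -> 10.0
```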

aggregate_time(variable, column='subannual', value='year', components=None, method='sum', append=False)[source]

Aggregate a timeseries over a subannual time resolution

Parameters
variable : str or list of str

variable(s) to be aggregated

column : str, optional

the data column to be used as subannual time representation

value : str, optional

the name of the aggregated (subannual) time

components : list of str, optional

subannual timeslices to be aggregated; defaults to all subannual timeslices other than value

method : func or str, optional

method to use for aggregation, e.g. numpy.mean(), numpy.sum(), ‘min’, ‘max’

append : bool, optional

append the aggregate timeseries to self and return None, else return the aggregate timeseries as a new IamDataFrame

append(other, ignore_meta_conflict=False, inplace=False, **kwargs)[source]

Append any IamDataFrame-like object to this object

Indicators in other.meta that are not in self.meta are merged. Missing values are set to NaN. Conflicting data rows always raise a ValueError.

Parameters
other : IamDataFrame, ixmp.Scenario, pandas.DataFrame or data file

any object castable as IamDataFrame to be appended

ignore_meta_conflict : bool, default False

if False and other is an IamDataFrame, raise an error if any meta columns present in self and other are not identical.

inplace : bool, default False

if True, do operation inplace and return None

kwargs

passed to IamDataFrame(other, **kwargs) if other is not already an IamDataFrame

as_pandas(meta_cols=True, with_metadata=None)[source]

Return object as a pandas.DataFrame

Parameters
meta_cols : list or bool, default True

join data with all meta columns if True (default), or only with the columns in the list; return a copy of data if False

bar_plot(*args, **kwargs)[source]

Plot timeseries bars of existing data

see pyam.plotting.bar_plot() for all available options

categorize(name, value, criteria, color=None, marker=None, linestyle=None)[source]

Assign scenarios to a category according to specific criteria

Parameters
name : str

column name of the ‘meta’ table

value : str

category identifier

criteria : dict

dictionary with variables mapped to applicable checks (‘up’ and ‘lo’ for respective bounds, ‘year’ for years - optional)

color : str, optional

assign a color to this category for plotting

marker : str, optional

assign a marker to this category for plotting

linestyle : str, optional

assign a linestyle to this category for plotting

check_aggregate(variable, components=None, method='sum', exclude_on_fail=False, multiplier=1, **kwargs)[source]

Check whether a timeseries matches the aggregation of its components

Parameters
variable : str or list of str

variable(s) checked for matching aggregation of sub-categories

components : list of str, default None

list of variables, defaults to all sub-categories of variable

method : func or str, optional

method to use for aggregation, e.g. numpy.mean(), numpy.sum(), ‘min’, ‘max’

exclude_on_fail : bool, optional

flag scenarios failing validation as exclude: True

multiplier : number, optional

factor when comparing variable and sum of components

kwargs

arguments for comparison of values, passed to numpy.isclose()

check_aggregate_region(variable, region='World', subregions=None, components=False, method='sum', weight=None, exclude_on_fail=False, **kwargs)[source]

Check whether a timeseries matches the aggregation across subregions

Parameters
variable : str or list of str

variable(s) to be checked for matching aggregation of subregions

region : str, default ‘World’

region to be checked for matching aggregation of subregions

subregions : list of str, optional

list of subregions, defaults to all regions other than region

components : bool or list of str, default False

variables at the region level to be included in the aggregation (ignored if False); if True, use all sub-categories of variable included in region but not in any of the subregions; or an explicit list of variables

method : func or str, optional

method to use for aggregation, e.g. numpy.mean(), numpy.sum(), ‘min’, ‘max’

weight : str, optional

variable to use as weight for the aggregation (currently only supported with method=’sum’)

exclude_on_fail : bool, optional

flag scenarios failing validation as exclude: True

kwargs

arguments for comparison of values, passed to numpy.isclose()

check_internal_consistency(components=False, **kwargs)[source]

Check whether a scenario ensemble is internally consistent

We check that all variables are equal to the sum of their sectoral components and that all the regions add up to the World total. If the check is passed, None is returned, otherwise a DataFrame of inconsistent variables is returned.

Note: at the moment, this method’s regional checking is limited to checking that all the regions sum to the World region. We cannot make this more automatic unless we store how the regions relate (see the related issue in the pyam issue tracker).

Parameters
components : bool, optional

passed to check_aggregate_region(); if True, use all sub-categories of each variable included in World but not in any of the subregions; if False, only aggregate variables over subregions

kwargs

arguments for comparison of values, passed to numpy.isclose()

col_apply(col, func, *args, **kwargs)[source]

Apply a function to a column of data or meta

Parameters
col : str

column in either the data or meta dataframes

func : function

function to apply

convert_unit(current, to, factor=None, registry=None, context=None, inplace=False)[source]

Convert all data having current units to new units.

If factor is given, existing values are multiplied by it, and the to units are assigned to the ‘unit’ column.

Otherwise, the pint package is used to convert from current -> to units without an explicit conversion factor. Pint natively handles conversion between any standard (SI) units that have compatible dimensionality, such as exajoule to terawatt-hours, EJ -> TWh, or tonne per year to gram per second, t / yr -> g / sec.

The default registry includes additional unit definitions relevant for integrated assessment models and energy systems analysis, via the iam-units package. This registry can also be accessed directly, using:

    from iam_units import registry

When using this registry, current and to may contain the symbols of greenhouse gas (GHG) species, such as ‘CO2e’, ‘C’, ‘CH4’, ‘N2O’, ‘HFC236fa’, etc., as well as lower-case aliases like ‘co2’ supported by pyam. In this case, context must contain ‘gwp_’ followed by the name of a specific global warming potential (GWP) metric supported by iam_units, e.g. ‘gwp_AR5GWP100’.

Rows with units other than current are not altered.

Parameters
current : str

Current units to be converted.

to : str

New unit (to be converted to) or symbol for target GHG species. If only the GHG species is provided, the units (e.g. Mt / year) will be the same as current, and an expression combining units and species (e.g. ‘Mt CO2e / yr’) will be placed in the ‘unit’ column.

factor : value, optional

Explicit factor for conversion without pint.

registry : pint.UnitRegistry, optional

Specific unit registry to use for conversion. Default: the iam-units registry.

context : str or pint.Context, optional

(Name of) a pint context to use in conversion. Required when converting between GHG species using GWP metrics, unless the species indicated by current and to are the same.

inplace : bool, optional

If True, do operation inplace and return None.

Returns
IamDataFrame

If inplace is False.

None

If inplace is True.

Raises
pint.UndefinedUnitError

if attempting a GWP conversion but context is not given.

pint.DimensionalityError

without factor, when current and to are not compatible units.
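The explicit-factor path (values with the current unit multiplied by factor, then relabelled, other rows untouched) can be sketched with plain pandas; the data and the conversion factor (1 EJ is approximately 277.778 TWh) are illustrative, not pyam's implementation:

```python
import pandas as pd

# Hypothetical data with two different units
data = pd.DataFrame({"unit": ["EJ/yr", "Mt CO2/yr"], "value": [1.0, 100.0]})

current, to, factor = "EJ/yr", "TWh/yr", 277.778

# Multiply values with the current unit by the factor and relabel;
# rows with other units are not altered
mask = data["unit"] == current
data.loc[mask, "value"] *= factor
data.loc[mask, "unit"] = to
```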

copy()[source]

Make a deepcopy of this object

See copy.deepcopy() for details.

downscale_region(variable, region='World', subregions=None, proxy=None, weight=None, append=False)[source]

Downscale a timeseries to a number of subregions

Parameters
variable : str or list of str

variable(s) to be downscaled

region : str, optional

region from which data will be downscaled

subregions : list of str, optional

list of subregions, defaults to all regions other than region (if using proxy) or the region index (if using weight)

proxy : str, optional

variable (within the IamDataFrame) to be used as proxy for regional downscaling

weight : pandas.DataFrame, optional

dataframe with the time dimension as columns (year or datetime.datetime) and regions[, model, scenario] as index

append : bool, optional

append the downscaled timeseries to self and return None, else return the downscaled data as a new IamDataFrame

property empty

Indicator whether this object is empty

equals(other)[source]

Test if two objects contain the same data and meta indicators

This function allows two IamDataFrame instances to be compared against each other to see if they have the same timeseries data and meta indicators. NaN values in the same location of the meta table are considered equal.

Parameters
other : IamDataFrame

the other IamDataFrame to be compared with self

export_meta(excel_writer, sheet_name='meta')[source]

Write the ‘meta’ indicators of this object to an Excel sheet

Parameters
excel_writer : str, path object or ExcelWriter object

any valid string path, pathlib.Path or pandas.ExcelWriter

sheet_name : str

name of the sheet which will contain the dataframe of ‘meta’ indicators

filter(keep=True, inplace=False, **kwargs)[source]

Return a (copy of a) filtered (downselected) IamDataFrame

Parameters
keep : bool, optional

keep all scenarios satisfying the filters (if True) or the inverse

inplace : bool, optional

if True, do operation inplace and return None

kwargs

The following columns are available for filtering:

  • ‘meta’ columns: filter by string value of that column

  • ‘model’, ‘scenario’, ‘region’, ‘variable’, ‘unit’: string or list of strings, where * can be used as a wildcard

  • ‘level’: the maximum “depth” of IAM variables (number of ‘|’) (excluding the strings given in the ‘variable’ argument)

  • ‘year’: takes an integer (int/np.int64), a list of integers or a range. Note that the last year of a range is not included, so range(2010, 2015) is interpreted as [2010, …, 2014]

  • arguments for filtering by datetime.datetime or np.datetime64 (‘month’, ‘hour’, ‘time’)

  • ‘regexp=True’ disables pseudo-regexp syntax in pattern_match()
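The wildcard and ‘level’ semantics above can be sketched with the standard library's fnmatch (hypothetical variable names, not pyam's pattern_match() implementation):

```python
import fnmatch

variables = ["Primary Energy", "Primary Energy|Coal", "Emissions|CO2"]

# '*' matches any substring, as in filter(variable="Primary Energy*")
matched = [v for v in variables if fnmatch.fnmatch(v, "Primary Energy*")]
# -> ['Primary Energy', 'Primary Energy|Coal']

# 'level' counts the depth of a variable via the number of '|'
depth_0 = [v for v in variables if v.count("|") == 0]
# -> ['Primary Energy']
```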

head(*args, **kwargs)[source]

Identical to pandas.DataFrame.head() operating on data

interpolate(time)[source]

Interpolate missing values in timeseries (linear interpolation)

Parameters
time : int or datetime

year or datetime.datetime to be interpolated. This must match the datetime/year format of self.
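Linear interpolation between two existing time points can be sketched with numpy.interp (hypothetical values, not pyam's implementation):

```python
import numpy as np

# Two existing data points of a hypothetical timeseries
years = np.array([2010, 2020])
values = np.array([500.0, 600.0])

# Linearly interpolate the missing year 2015
value_2015 = np.interp(2015, years, values)
# -> 550.0
```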

line_plot(x='year', y='value', **kwargs)[source]

Plot timeseries lines of existing data

see pyam.plotting.line_plot() for all available options

load_meta(path, *args, **kwargs)[source]

Load ‘meta’ indicators from file

Parameters
path : str or path object

any valid string path or pathlib.Path

map_regions(map_col, agg=None, copy_col=None, fname=None, region_col=None, remove_duplicates=False, inplace=False)[source]

Map data from the current region classification to a new one, based on a region-mapping file

Parameters
map_col : str

The column used to map new regions to. Common examples include iso and 5_region.

agg : str, optional

Perform a data aggregation. Options include: sum.

copy_col : str, optional

Copy the existing region data into a new column for later use.

fname : str, optional

Use a non-default region mapping file.

region_col : str, optional

Use a non-default column name for regions to map from.

remove_duplicates : bool, optional

If there are duplicates in the mapping from one regional level to another, then remove these duplicates by counting the most common mapped value. This option is most useful when mapping from high resolution (e.g., model regions) to low resolution (e.g., 5_region).

inplace : bool, optional

if True, do operation inplace and return None

models()[source]

Get a list of models

normalize(inplace=False, **kwargs)[source]

Normalize data to a specific data point

Note: Currently only supports normalizing to a specific time.

Parameters
inplace : bool, optional

if True, do operation inplace and return None

kwargs

the column and value on which to normalize (e.g., year=2005)

pie_plot(*args, **kwargs)[source]

Plot a pie chart

see pyam.plotting.pie_plot() for all available options

pivot_table(index, columns, values='value', aggfunc='count', fill_value=None, style=None)[source]

Returns a pivot table

Parameters
index : str or list of str

rows for the pivot table

columns : str or list of str

columns for the pivot table

values : str, default ‘value’

dataframe column to aggregate or count

aggfunc : str or function, default ‘count’

function used for aggregation, accepts ‘count’, ‘mean’, and ‘sum’

fill_value : scalar, default None

value to replace missing values with

style : str, default None

output style for pivot table formatting, accepts ‘highlight_not_max’, ‘heatmap’
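The underlying operation is a pandas-style pivot; a sketch with aggfunc='sum' on hypothetical data:

```python
import pandas as pd

# Hypothetical long-format data
data = pd.DataFrame(
    {
        "region": ["reg_a", "reg_a", "reg_b"],
        "year": [2010, 2020, 2010],
        "value": [1.0, 2.0, 3.0],
    }
)

# Pivot regions to rows and years to columns, summing values
table = pd.pivot_table(
    data, index="region", columns="year", values="value", aggfunc="sum"
)
# table.loc['reg_a', 2010] -> 1.0; missing cells are NaN
```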

regions()[source]

Get a list of regions

rename(mapping=None, inplace=False, append=False, check_duplicates=True, **kwargs)[source]

Rename and aggregate columns using groupby().sum() on values

When renaming models or scenarios, the uniqueness of the index must be maintained, and the function will raise an error otherwise.

Renaming is only applied to any data where a filter matches for all columns given in mapping. Renaming can only be applied to the model and scenario columns, or to other data columns simultaneously.

Parameters
mapping : dict or kwargs

mapping of column name to rename-dictionary of that column

dict(<column_name>: {<current_name_1>: <target_name_1>,
                     <current_name_2>: <target_name_2>})

or kwargs as column_name={<current_name_1>: <target_name_1>, …}

inplace : bool, default False

if True, do operation inplace and return None

append : bool, default False

append renamed timeseries to self and return None; else return a new IamDataFrame

check_duplicates : bool, default True

check whether a conflict between existing and renamed data exists. If True, raise a ValueError; if False, rename and merge with groupby().sum().
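The rename-then-merge behaviour (check_duplicates=False) can be sketched with plain pandas; the variable names are hypothetical and this is not pyam's implementation:

```python
import pandas as pd

# Two hypothetical variables that will collide after renaming
data = pd.DataFrame(
    {
        "variable": ["Primary Energy|Gas", "Primary Energy|Oil"],
        "year": [2010, 2010],
        "value": [2.0, 3.0],
    }
)

# Map both source names onto the same target name
mapping = {
    "Primary Energy|Gas": "Primary Energy|Fossil",
    "Primary Energy|Oil": "Primary Energy|Fossil",
}
data["variable"] = data["variable"].replace(mapping)

# Colliding rows are combined with groupby().sum() on values
merged = data.groupby(["variable", "year"], as_index=False)["value"].sum()
# one row remains: ('Primary Energy|Fossil', 2010, 5.0)
```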

require_variable(variable, unit=None, year=None, exclude_on_fail=False)[source]

Check whether all scenarios have a required variable

Parameters
variable : str

required variable

unit : str, optional

name of unit

year : int or list, optional

if a list, check whether the variable exists for ANY of the years

exclude_on_fail : bool, default False

flag scenarios missing the required variable as exclude: True

reset_exclude()[source]

Reset exclusion assignment for all scenarios to exclude: False

scatter(x, y, **kwargs)[source]

Plot a scatter chart using meta indicators as columns

see pyam.plotting.scatter() for all available options

scenarios()[source]

Get a list of scenarios

set_meta(meta, name=None, index=None)[source]

Add meta indicators as pandas.Series, list or value (int/float/str)

Parameters
meta : pandas.Series, list, int, float or str

column to be added to ‘meta’ (by [‘model’, ‘scenario’] index if possible)

name : str, optional

meta column name (defaults to meta pandas.Series.name); either meta.name or the name kwarg must be defined

index : IamDataFrame, pandas.DataFrame or pandas.MultiIndex, optional

index to be used for setting the meta column ([‘model’, ‘scenario’])

set_meta_from_data(name, method=None, column='value', **kwargs)[source]

Add meta indicators from downselected timeseries data of self

Parameters
name : str

column name of the ‘meta’ table

method : function, optional

method for aggregation (e.g., numpy.max); required if the downselected data do not yield unique values

column : str, optional

the column from data to be used to derive the indicator

kwargs

passed to filter() for downselecting data

stack_plot(*args, **kwargs)[source]

Plot timeseries stacks of existing data

see pyam.plotting.stack_plot() for all available options

swap_time_for_year(inplace=False)[source]

Convert the time column to year.

Parameters
inplace : bool, default False

if True, do operation inplace and return None

Raises
ValueError

if “time” is not a column of self.data

tail(*args, **kwargs)[source]

Identical to pandas.DataFrame.tail() operating on data

timeseries(iamc_index=False)[source]

Returns data as pandas.DataFrame in wide format

Parameters
iamc_index : bool, default False

if True, use [‘model’, ‘scenario’, ‘region’, ‘variable’, ‘unit’]; else, use all ‘data’ columns

Raises
ValueError

if the IamDataFrame is empty

ValueError

if reducing to the IAMC-index yields an index with duplicates
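The wide format returned by timeseries() (index columns as rows, years as columns) can be sketched with a pandas pivot on hypothetical data:

```python
import pandas as pd

# Hypothetical long-format data for one variable
data = pd.DataFrame(
    {
        "variable": ["Primary Energy", "Primary Energy"],
        "year": [2010, 2020],
        "value": [500.0, 600.0],
    }
)

# Pivot years into columns, as in the wide format of timeseries()
wide = data.pivot(index="variable", columns="year", values="value")
# wide.loc['Primary Energy', 2020] -> 600.0
```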

to_csv(path, iamc_index=False, **kwargs)[source]

Write timeseries data of this object to a csv file

Parameters
path : str or path object

file path or pathlib.Path

iamc_index : bool, default False

if True, use [‘model’, ‘scenario’, ‘region’, ‘variable’, ‘unit’]; else, use all ‘data’ columns

to_datapackage(path)[source]

Write object to a frictionless Data Package

More information: https://frictionlessdata.io

Returns the saved datapackage.Package (read the docs). When adding metadata (descriptors), please follow the template defined by https://github.com/OpenEnergyPlatform/metadata

Parameters
path : str or path object

any valid string path or pathlib.Path

to_excel(excel_writer, sheet_name='data', iamc_index=False, include_meta=True, **kwargs)[source]

Write object to an Excel spreadsheet

Parameters
excel_writer : str, path object or ExcelWriter object

any valid string path, pathlib.Path or pandas.ExcelWriter

sheet_name : str

name of the sheet which will contain the timeseries() data

iamc_index : bool, default False

if True, use [‘model’, ‘scenario’, ‘region’, ‘variable’, ‘unit’]; else, use all ‘data’ columns

include_meta : bool or str

if True, write ‘meta’ to a sheet named ‘meta’ (default); if a string, use it as the sheet name

validate(criteria={}, exclude_on_fail=False)[source]

Validate scenarios using criteria on timeseries values

Returns all scenarios which do not match the criteria and prints a log message, or returns None if all scenarios match the criteria.

When called with exclude_on_fail=True, scenarios not satisfying the criteria will be marked as exclude=True.

Parameters
criteria : dict

dictionary with variable keys and validation mappings (‘up’ and ‘lo’ for respective bounds, ‘year’ for years)

exclude_on_fail : bool, optional

flag scenarios failing validation as exclude: True
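The criteria semantics (an ‘up’ bound, optionally restricted to a ‘year’) can be sketched with plain pandas; the data and bound are hypothetical, not pyam's implementation:

```python
import pandas as pd

# Hypothetical timeseries values for two scenarios in one year
data = pd.DataFrame(
    {
        "scenario": ["scen_a", "scen_b"],
        "year": [2030, 2030],
        "value": [4.5, 6.0],
    }
)

# criteria in the style of {'Variable': {'up': 5.0, 'year': 2030}}
criteria = {"up": 5.0, "year": 2030}

# A scenario fails if its value in the given year exceeds the upper bound
mask = (data["year"] == criteria["year"]) & (data["value"] > criteria["up"])
failing = data.loc[mask, "scenario"].tolist()
# -> ['scen_b']
```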

variables(include_units=False)[source]

Get a list of variables

Parameters
include_units : bool, default False

include the units