Python API

Class IamDataFrame

class pyam.IamDataFrame(data, **kwargs)[source]

This class is a wrapper for dataframes following the IAMC format. It provides a number of diagnostic features (including validation of data and completeness of variables) as well as a number of visualization and plotting tools.

Parameters
data: ixmp.TimeSeries, ixmp.Scenario, pd.DataFrame or data file

an instance of a TimeSeries or Scenario (requires ixmp), or a pd.DataFrame or data file with IAMC-format data columns. A pd.DataFrame can have the required data as columns or index. R-style year columns such as “X2015” are also supported.

kwargs:

if value=<col>, melt column <col> to ‘value’ and use the column name as ‘variable’; else, a mapping of columns required for an IamDataFrame to any of:
  • one column in df

  • multiple columns, which will be concatenated by pipe (|)

  • a string to be used as the value for this column
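
For illustration, a minimal sketch of constructing an IamDataFrame from a pandas DataFrame with IAMC-format columns (the model, scenario, and values below are hypothetical):

>>> import pandas as pd
>>> import pyam
>>> data = pd.DataFrame(
...     [["model_a", "scen_a", "World", "Primary Energy", "EJ/y", 1.0, 6.0]],
...     columns=["model", "scenario", "region", "variable", "unit", 2005, 2010],
... )
>>> df = pyam.IamDataFrame(data)  # years given as wide-format columns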

Methods

aggregate(variable[, components, append])

Compute the aggregate of timeseries components or sub-categories

aggregate_region(variable[, region, …])

Compute the aggregate of timeseries over a number of regions including variable components only defined at the region level

append(other[, ignore_meta_conflict, inplace])

Append any castable object to this IamDataFrame.

as_pandas([with_metadata])

Return this as a pd.DataFrame

bar_plot(*args, **kwargs)

Plot timeseries bars of existing data

categorize(name, value, criteria[, color, …])

Assign scenarios to a category according to specific criteria or display the category assignment

check_aggregate(variable[, components, …])

Check whether a timeseries matches the aggregation of its components

check_aggregate_region(variable[, region, …])

Check whether the region timeseries data match the aggregation of components

check_internal_consistency(**kwargs)

Check whether the database is internally consistent

col_apply(col, func, *args, **kwargs)

Apply a function to a column

convert_unit(conversion_mapping[, inplace])

Converts units based on provided unit conversion factors

copy()

Return a deepcopy of self

export_metadata(path)

Export metadata to Excel

filter([keep, inplace])

Return a filtered IamDataFrame (i.e., a subset of current data)

head(*args, **kwargs)

Identical to pd.DataFrame.head() operating on data

interpolate(year)

Interpolate missing values in timeseries (linear interpolation)

line_plot([x, y])

Plot timeseries lines of existing data

load_metadata(path, *args, **kwargs)

Load metadata exported from pyam.IamDataFrame instance

map_regions(map_col[, agg, copy_col, fname, …])

Map data regions to a different regional classification using a region-mapping file

models()

Get a list of models

normalize([inplace])

Normalize data to a given value.

pie_plot(*args, **kwargs)

Plot a pie chart

pivot_table(index, columns[, values, …])

Returns a pivot table

regions()

Get a list of regions

rename([mapping, inplace, append, …])

Rename and aggregate column entries using groupby.sum() on values.

require_variable(variable[, unit, year, …])

Check whether all scenarios have a required variable

reset_exclude()

Reset exclusion assignment for all scenarios to exclude: False

scatter(x, y, **kwargs)

Plot a scatter chart using metadata columns

scenarios()

Get a list of scenarios

set_meta(meta[, name, index])

Add metadata indicators as pd.Series, list or value (int/float/str)

set_meta_from_data(name[, method, column])

Add metadata indicators from downselected timeseries data of self

stack_plot(*args, **kwargs)

Plot timeseries stacks of existing data

swap_time_for_year([inplace])

Convert the time column to year.

tail(*args, **kwargs)

Identical to pd.DataFrame.tail() operating on data

timeseries([iamc_index])

Returns a pd.DataFrame in wide format (years or datetime as columns)

to_csv(path[, iamc_index])

Write timeseries data to a csv file

to_excel(excel_writer[, sheet_name, iamc_index])

Write timeseries data to Excel format

validate([criteria, exclude_on_fail])

Validate scenarios using criteria on timeseries values

variables([include_units])

Get a list of variables

aggregate(variable, components=None, append=False)[source]

Compute the aggregate of timeseries components or sub-categories

Parameters
variable: str

variable for which the aggregate should be computed

components: list of str, default None

list of variables, defaults to all sub-categories of variable

append: bool, default False

append the aggregate timeseries to data and return None, else return aggregate timeseries
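
A hedged sketch of typical usage, assuming `df` is an IamDataFrame with hypothetical sub-categories of ‘Primary Energy’:

>>> # append the sum of all sub-categories as 'Primary Energy'
>>> df.aggregate("Primary Energy", append=True)
>>> # or restrict to explicit components and return the aggregate timeseries
>>> pe = df.aggregate(
...     "Primary Energy",
...     components=["Primary Energy|Coal", "Primary Energy|Wind"],
... )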

aggregate_region(variable, region='World', subregions=None, components=None, append=False)[source]

Compute the aggregate of timeseries over a number of regions including variable components only defined at the region level

Parameters
variable: str

variable for which the aggregate should be computed

region: str, default ‘World’

region to which timeseries data will be aggregated

subregions: list of str

list of subregions, defaults to all regions other than region

components: list of str

list of variables to include in the aggregate from the region level, defaults to all sub-categories of variable included in region but not in any of subregions

append: bool, default False

append the aggregate timeseries to data and return None, else return aggregate timeseries
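
A minimal sketch, assuming `df` holds subregional data for a hypothetical variable:

>>> # sum 'Emissions|CO2' over all regions other than 'World', including
>>> # components defined only at the World level, and append the result
>>> df.aggregate_region("Emissions|CO2", region="World", append=True)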

append(other, ignore_meta_conflict=False, inplace=False, **kwargs)[source]

Append any castable object to this IamDataFrame. Columns in other.meta that are not in self.meta are always merged; duplicate region-variable-unit-year rows raise a ValueError.

Parameters
other: pyam.IamDataFrame, ixmp.TimeSeries, ixmp.Scenario,
pd.DataFrame or data file

An IamDataFrame, TimeSeries or Scenario (requires ixmp), pandas.DataFrame or data file with IAMC-format data columns

ignore_meta_conflict: bool, default False

If False and other is an IamDataFrame, raise an error if any meta columns present in self and other are not identical.

inplace: bool, default False

If True, do operation inplace and return None

kwargs are passed through to `IamDataFrame(other, **kwargs)`
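
A minimal sketch, assuming `df` and `df_other` are two IamDataFrame instances with disjoint scenarios:

>>> df_all = df.append(df_other)       # returns a new IamDataFrame
>>> df.append(df_other, inplace=True)  # or modify `df` in place and return None
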
as_pandas(with_metadata=False)[source]

Return this as a pd.DataFrame

Parameters
with_metadata: bool or dict, default False

if True, join data with all meta columns; if a dict, discover meaningful meta columns from its values (as key-value pairs)

bar_plot(*args, **kwargs)[source]

Plot timeseries bars of existing data

see pyam.plotting.bar_plot() for all available options

categorize(name, value, criteria, color=None, marker=None, linestyle=None)[source]

Assign scenarios to a category according to specific criteria or display the category assignment

Parameters
name: str

category column name

value: str

category identifier

criteria: dict

dictionary with variables mapped to applicable checks (‘up’ and ‘lo’ for respective bounds, ‘year’ for years - optional)

color: str

assign a color to this category for plotting

marker: str

assign a marker to this category for plotting

linestyle: str

assign a linestyle to this category for plotting
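
A hedged sketch using a hypothetical temperature variable; the criteria format follows the description above:

>>> df.categorize(
...     name="category",
...     value="Below 2C",
...     criteria={"Temperature|Global Mean": {"up": 2.0, "year": 2100}},
...     color="cornflowerblue",
... )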

check_aggregate(variable, components=None, exclude_on_fail=False, multiplier=1, **kwargs)[source]

Check whether a timeseries matches the aggregation of its components

Parameters
variable: str

variable to be checked for matching aggregation of sub-categories

components: list of str, default None

list of variables, defaults to all sub-categories of variable

exclude_on_fail: boolean, default False

flag scenarios failing validation as exclude: True

multiplier: number, default 1

factor when comparing variable and sum of components

kwargs: passed to `np.isclose()`
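
A minimal sketch, assuming hypothetical sub-categories of ‘Primary Energy’ in `df`:

>>> # returns None if 'Primary Energy' equals the sum of its sub-categories,
>>> # otherwise the inconsistent values; failing scenarios are flagged
>>> df.check_aggregate("Primary Energy", exclude_on_fail=True)
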
check_aggregate_region(variable, region='World', subregions=None, components=None, exclude_on_fail=False, **kwargs)[source]

Check whether the region timeseries data match the aggregation of components

Parameters
variable: str

variable to be checked for matching aggregation of subregions

region: str, default ‘World’

region to be checked for matching aggregation of subregions

subregions: list of str

list of subregions, defaults to all regions other than region

components: list of str, default None

list of variables, defaults to all sub-categories of variable included in region but not in any of subregions

exclude_on_fail: boolean, default False

flag scenarios failing validation as exclude: True

kwargs: passed to `np.isclose()`

check_internal_consistency(**kwargs)[source]

Check whether the database is internally consistent

We check that all variables are equal to the sum of their sectoral components and that all the regions add up to the World total. If the check is passed, None is returned, otherwise a dictionary of inconsistent variables is returned.

Note: at the moment, this method’s regional checking is limited to checking that all the regions sum to the World region. We cannot make this more automatic unless we start to store how the regions relate; see https://github.com/IAMconsortium/pyam/issues/106.

Parameters
kwargs: passed to `np.isclose()`

col_apply(col, func, *args, **kwargs)[source]

Apply a function to a column

Parameters
col: string

column in either data or metadata

func: function

function to apply

convert_unit(conversion_mapping, inplace=False)[source]

Converts units based on provided unit conversion factors

Parameters
conversion_mapping: dict

for each unit for which a conversion should be carried out, provide current unit and target unit and conversion factor {<current unit>: [<target unit>, <conversion factor>]}

inplace: bool, default False

if True, do operation inplace and return None
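
A minimal sketch using the mapping format described above (1 EJ = 277.778 TWh):

>>> df.convert_unit({"EJ/y": ["TWh/y", 277.778]}, inplace=True)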

copy()[source]

Return a deepcopy of self

See the Python standard-library documentation of copy.deepcopy() for details.

export_metadata(path)[source]

Export metadata to Excel

Parameters
path: string

path/filename for xlsx file of metadata export

filter(keep=True, inplace=False, **kwargs)[source]

Return a filtered IamDataFrame (i.e., a subset of current data)

Parameters
keep: bool, default True

keep all scenarios satisfying the filters (if True) or the inverse

inplace: bool, default False

if True, do operation inplace and return None

filters by kwargs:
The following columns are available for filtering:
  • metadata columns: filter by category assignment

  • ‘model’, ‘scenario’, ‘region’, ‘variable’, ‘unit’: string or list of strings, where * can be used as a wildcard

  • ‘level’: the maximum “depth” of IAM variables (number of ‘|’) (excluding the strings given in the ‘variable’ argument)

  • ‘year’: takes an integer, a list of integers, or a range. Note that the last year of a range is not included, so range(2010, 2015) is interpreted as [2010, …, 2014]

  • arguments for filtering by datetime.datetime (‘month’, ‘hour’, ‘time’)

  • ‘regexp=True’ disables pseudo-regexp syntax in pattern_match()
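
A hedged sketch combining several of the filter arguments above (the model and variable names are hypothetical):

>>> sub = df.filter(
...     model="model_a",
...     variable="Primary Energy|*",  # '*' acts as a wildcard
...     year=range(2010, 2051),       # years 2010 through 2050
... )
>>> rest = df.filter(variable="Primary Energy|*", keep=False)  # inverse selection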

head(*args, **kwargs)[source]

Identical to pd.DataFrame.head() operating on data

interpolate(year)[source]

Interpolate missing values in timeseries (linear interpolation)

Parameters
year: int

year to be interpolated

line_plot(x='year', y='value', **kwargs)[source]

Plot timeseries lines of existing data

see pyam.plotting.line_plot() for all available options

load_metadata(path, *args, **kwargs)[source]

Load metadata exported from pyam.IamDataFrame instance

Parameters
path: string

xlsx file with metadata exported from pyam.IamDataFrame instance

map_regions(map_col, agg=None, copy_col=None, fname=None, region_col=None, remove_duplicates=False, inplace=False)[source]

Map data regions to a different regional classification (e.g., from model-native regions to iso or 5_region) using a region-mapping file.

Parameters
map_col: string

The column used to map new regions to. Common examples include iso and 5_region.

agg: string, optional

Perform a data aggregation. Options include: sum.

copy_col: string, optional

Copy the existing region data into a new column for later use.

fname: string, optional

Use a non-default region mapping file

region_col: string, optional

Use a non-default column name for regions to map from.

remove_duplicates: bool, optional, default: False

If there are duplicates in the mapping from one regional level to another, then remove these duplicates by counting the most common mapped value. This option is most useful when mapping from high resolution (e.g., model regions) to low resolution (e.g., 5_region).

inplace: bool, default False

if True, do operation inplace and return None

models()[source]

Get a list of models

normalize(inplace=False, **kwargs)[source]

Normalize data to a given value. Currently only supports normalizing to a specific time.

Parameters
inplace: bool, default False

if True, do operation inplace and return None

kwargs: the values on which to normalize (e.g., `year=2005`)

pie_plot(*args, **kwargs)[source]

Plot a pie chart

see pyam.plotting.pie_plot() for all available options

pivot_table(index, columns, values='value', aggfunc='count', fill_value=None, style=None)[source]

Returns a pivot table

Parameters
index: str or list of strings

rows for Pivot table

columns: str or list of strings

columns for Pivot table

values: str, default ‘value’

dataframe column to aggregate or count

aggfunc: str or function, default ‘count’

function used for aggregation, accepts ‘count’, ‘mean’, and ‘sum’

fill_value: scalar, default None

value to replace missing values with

style: str, default None

output style for pivot table formatting; accepts ‘highlight_not_max’, ‘heatmap’
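
A minimal sketch, summing values by model-scenario rows and region columns:

>>> table = df.pivot_table(
...     index=["model", "scenario"],
...     columns="region",
...     values="value",
...     aggfunc="sum",
... )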

regions()[source]

Get a list of regions

rename(mapping=None, inplace=False, append=False, check_duplicates=True, **kwargs)[source]

Rename and aggregate column entries using groupby.sum() on values. When renaming models or scenarios, the uniqueness of the index must be maintained; otherwise, the function will raise an error.

Renaming is only applied to data where a filter matches for all columns given in mapping. Renaming can be applied either to the model and scenario columns or to other data columns, but not to both simultaneously.

Parameters
mapping: dict or kwargs

mapping of column name to a rename-dictionary of that column, i.e., {<column_name>: {<current_name_1>: <target_name_1>, <current_name_2>: <target_name_2>}}, or kwargs as column_name={<current_name_1>: <target_name_1>, …}

inplace: bool, default False

if True, do operation inplace and return None

append: bool, default False

if True, append renamed timeseries to IamDataFrame

check_duplicates: bool, default True

check whether a conflict exists between existing and renamed data; if True, raise a ValueError; if False, rename and merge values with groupby().sum()
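
A minimal sketch using the mapping format above with a hypothetical variable:

>>> df.rename(
...     mapping={"variable": {"Primary Energy|Coal": "Primary Energy|Fossil|Coal"}},
...     inplace=True,
... )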

require_variable(variable, unit=None, year=None, exclude_on_fail=False)[source]

Check whether all scenarios have a required variable

Parameters
variable: str

required variable

unit: str, default None

name of unit (optional)

year: int or list, default None

years (optional)

exclude_on_fail: bool, default False

flag scenarios missing the required variables as exclude: True

reset_exclude()[source]

Reset exclusion assignment for all scenarios to exclude: False

scatter(x, y, **kwargs)[source]

Plot a scatter chart using metadata columns

see pyam.plotting.scatter() for all available options

scenarios()[source]

Get a list of scenarios

set_meta(meta, name=None, index=None)[source]

Add metadata indicators as pd.Series, list or value (int/float/str)

Parameters
meta: pd.Series, list, int, float or str

column to be added to metadata (by [‘model’, ‘scenario’] index if possible)

name: str, optional

meta column name (defaults to meta.name if meta is a pd.Series); either meta.name or the name kwarg must be defined

index: pyam.IamDataFrame, pd.DataFrame or pd.MultiIndex, optional

index to be used for setting meta column ([‘model’, ‘scenario’])
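
A hedged sketch of adding a meta indicator from a pd.Series indexed by [‘model’, ‘scenario’] (the names and value are hypothetical):

>>> import pandas as pd
>>> idx = pd.MultiIndex.from_tuples(
...     [("model_a", "scen_a")], names=["model", "scenario"]
... )
>>> df.set_meta(pd.Series(["Below 2C"], index=idx), name="category")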

set_meta_from_data(name, method=None, column='value', **kwargs)[source]

Add metadata indicators from downselected timeseries data of self

Parameters
name: str

meta column name

method: function, optional

method for aggregation, required if downselected data do not yield unique values (e.g., numpy.max())

column: str, optional

the column from data to be used to derive the indicator

kwargs: passed to :meth:`IamDataFrame.filter()` for downselected `data`
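
A hedged sketch, assuming a hypothetical temperature variable in `df`; the filter arguments are passed as kwargs:

>>> import numpy as np
>>> df.set_meta_from_data(
...     name="peak temperature",
...     method=np.max,                       # aggregate non-unique values
...     variable="Temperature|Global Mean",  # passed to IamDataFrame.filter()
... )
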
stack_plot(*args, **kwargs)[source]

Plot timeseries stacks of existing data

see pyam.plotting.stack_plot() for all available options

swap_time_for_year(inplace=False)[source]

Convert the time column to year.

Parameters
inplace: bool, default False

if True, do operation inplace and return None

Raises
ValueError

“time” is not a column of self.data

tail(*args, **kwargs)[source]

Identical to pd.DataFrame.tail() operating on data

timeseries(iamc_index=False)[source]

Returns a pd.DataFrame in wide format (years or datetime as columns)

Parameters
iamc_index: bool, default False

if True, use [‘model’, ‘scenario’, ‘region’, ‘variable’, ‘unit’]; else, use all data columns
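
A minimal sketch:

>>> wide = df.timeseries()                      # years (or datetime) as columns
>>> wide_iamc = df.timeseries(iamc_index=True)  # index restricted to the IAMC columns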

to_csv(path, iamc_index=False, **kwargs)[source]

Write timeseries data to a csv file

Parameters
path: string

file path

iamc_index: bool, default False

if True, use [‘model’, ‘scenario’, ‘region’, ‘variable’, ‘unit’]; else, use all data columns

to_excel(excel_writer, sheet_name='data', iamc_index=False, **kwargs)[source]

Write timeseries data to Excel format

Parameters
excel_writer: string or ExcelWriter object

file path or existing ExcelWriter

sheet_name: string, default ‘data’

name of sheet which will contain IamDataFrame.timeseries() data

iamc_index: bool, default False

if True, use [‘model’, ‘scenario’, ‘region’, ‘variable’, ‘unit’]; else, use all data columns
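
A minimal sketch; the file name is hypothetical:

>>> import pandas as pd
>>> df.to_excel("scenarios.xlsx", sheet_name="data")
>>> # or pass an existing pd.ExcelWriter to combine several sheets in one file
>>> with pd.ExcelWriter("scenarios.xlsx") as writer:
...     df.to_excel(writer, sheet_name="data")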

validate(criteria={}, exclude_on_fail=False)[source]

Validate scenarios using criteria on timeseries values

Returns all scenarios which do not match the criteria and prints a log message, or returns None if all scenarios match the criteria.

When called with exclude_on_fail=True, scenarios in the object not satisfying the criteria will be marked as exclude=True.

Parameters
criteria: dict

dictionary with variable keys and check values (‘up’ and ‘lo’ for respective bounds, ‘year’ for years)

exclude_on_fail: bool, default False

flag scenarios failing validation as exclude: True
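
A hedged sketch using a hypothetical upper bound on ‘Primary Energy’ in 2050:

>>> failed = df.validate(
...     criteria={"Primary Energy": {"up": 1000, "year": 2050}},
...     exclude_on_fail=True,  # failing scenarios are flagged as exclude: True
... )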

variables(include_units=False)[source]

Get a list of variables

Parameters
include_units: boolean, default False

include the units

Useful pyam functions

pyam.filter_by_meta(data, df, join_meta=False, **kwargs)[source]

Filter by and join meta columns from an IamDataFrame to a pd.DataFrame

Parameters
data: pd.DataFrame instance

DataFrame to which meta columns are to be joined, index or columns must include [‘model’, ‘scenario’]

df: IamDataFrame instance

IamDataFrame from which meta columns are filtered and joined (optional)

join_meta: bool, default False

join selected columns from df.meta on data

kwargs:

meta columns to be filtered/joined, where col=… applies filters by the given arguments (using utils.pattern_match()) and col=None joins the column without filtering (setting col to np.nan if (model, scenario) not in df.meta.index)
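
A hedged sketch, assuming `data` is a pd.DataFrame whose index or columns include [‘model’, ‘scenario’] and `df.meta` has a hypothetical ‘category’ column:

>>> matched = pyam.filter_by_meta(data, df, category="Below 2C", join_meta=True)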

pyam.cumulative(x, first_year, last_year)[source]

Returns the cumulative sum of a timeseries (indexed over years), implementing linear interpolation between years and ignoring nan’s in the range. The function includes the last-year value of the series; it raises a warning and returns nan if first_year or last_year is outside of the timeseries range.

Parameters
x: pandas.Series

a timeseries to be summed over time

first_year: int

first year of the sum

last_year: int

last year of the sum (inclusive)
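
A minimal sketch with a hypothetical three-point timeseries; intermediate years are interpolated linearly before summing:

>>> import pandas as pd
>>> import pyam
>>> x = pd.Series([1.0, 2.0, 3.0], index=[2010, 2030, 2050])
>>> pyam.cumulative(x, first_year=2010, last_year=2050)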

pyam.fill_series(x, year)[source]

Returns the value of a timeseries (indexed over years) for a year by linear interpolation.

Parameters
x: pandas.Series

a timeseries to be interpolated

year: int

year of interpolation
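
A minimal sketch; the value halfway between the two points is interpolated linearly:

>>> import pandas as pd
>>> import pyam
>>> x = pd.Series([1.0, 2.0], index=[2010, 2020])
>>> pyam.fill_series(x, year=2015)  # -> 1.5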

Class Statistics

This class provides a wrapper for generating descriptive summary statistics for timeseries data using various groupbys or filters. It uses the pandas.describe() function internally and hides the tedious work of filters, groupbys and merging of dataframes.

class pyam.Statistics(df, groupby=None, filters=None, rows=False, percentiles=[0.25, 0.5, 0.75])[source]

This class provides a wrapper for descriptive statistics of IAMC-style timeseries data.

Parameters
df: pyam.IamDataFrame

an IamDataFrame from which to retrieve metadata for grouping, filtering

groupby: str or dict

a column of df.meta to be used for the groupby feature, or a dictionary of {column: list}, where list is used for ordering

filters: list of tuples

arguments for filtering and describing, either (index, dict) or ((index[0], index[1]), dict); when also using groupby, index must have length 2.

percentiles: list-like of numbers, optional

The percentiles to include in the output of pandas.describe(). All should fall between 0 and 1. The default is [.25, .5, .75], which returns the 25th, 50th, and 75th percentiles.

Methods

add(data, header[, row, subheader])

Filter data by arguments of this Statistics instance, then apply pd.describe() and format the statistics

reindex([copy])

Reindex the summary statistics dataframe

summarize([center, fullrange, …])

Format the compiled statistics to a concise string output

add(data, header, row=None, subheader=None)[source]

Filter data by arguments of this Statistics instance, then apply pd.describe() and format the statistics

Parameters
data: pd.DataFrame or pd.Series

data for which summary statistics should be computed

header: str

column name for descriptive statistics

row: str

row name for descriptive statistics (required if pyam.Statistics(rows=True))

subheader: str, optional

column name (level=1) if data is an unnamed pd.Series

reindex(copy=True)[source]

Reindex the summary statistics dataframe

summarize(center='mean', fullrange=None, interquartile=None, custom_format='{:.2f}')[source]

Format the compiled statistics to a concise string output
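
A rough sketch of the workflow, assuming `df` is an IamDataFrame with a ‘category’ meta column and a hypothetical ‘Primary Energy’ variable:

>>> import pyam
>>> stats = pyam.Statistics(df=df, groupby="category")
>>> pe_2050 = df.filter(variable="Primary Energy").timeseries()[2050]
>>> stats.add(pe_2050, header="Primary Energy", subheader="EJ/y in 2050")
>>> summary = stats.summarize()  # concise formatted output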