The IamDataFrame class¶
- class pyam.IamDataFrame(data, meta=None, index=['model', 'scenario'], **kwargs)[source]¶
Scenario timeseries data and meta indicators
The class provides a number of diagnostic features (including validation of data and completeness of variables provided), processing tools (e.g., unit conversion), as well as visualization and plotting tools.
- Parameters:
- data : pandas.DataFrame, pathlib.Path or file-like object
Scenario timeseries data following the IAMC data format or a supported variation, as pandas object or as a path to a file.
- meta : pandas.DataFrame, optional
A dataframe with suitable ‘meta’ indicators in wide (indicator as column name) or long (key/value columns) format. The dataframe will be downselected to scenarios present in data.
- index : list, optional
Columns to use for the resulting IamDataFrame index.
- kwargs
If value=<col>, melt column <col> to ‘value’ and use <col> name as ‘variable’; or mapping of required columns (IAMC_IDX) to any of the following: one column in data; multiple columns, to be concatenated by |; or a string to be used as value for this column.
Notes
A pandas.DataFrame can have the required dimensions as columns or index. R-style integer column headers (i.e., X2015) are acceptable.
When initializing an IamDataFrame from an xlsx file, pyam will by default parse all sheets starting with ‘data’ or ‘Data’ for timeseries and a sheet ‘meta’ to populate the respective table. Sheet names can be specified with the kwargs sheet_name (default ‘data’) and meta_sheet_name (default ‘meta’), where values can be a string or a list and ‘*’ is interpreted as a wildcard. Calling the class with meta_sheet_name=False will skip the import of the ‘meta’ table.
When initializing an IamDataFrame from an object that is already an IamDataFrame instance, the new object will be hard-linked to all attributes of the original object, so any changes on one object (e.g., with inplace=True) may also modify the other object! This is intended behaviour and consistent with pandas but may be confusing for those who are not used to the pandas/Python universe.
- Attributes:
compute
Access to advanced computation methods, see IamComputeAccessor
coordinates
Return the list of data coordinates (columns not including index names)
data
Return the timeseries data as a long pandas.DataFrame
dimensions
Return the list of data columns (index names & data coordinates)
empty
Indicator whether this object is empty
exclude
Indicator for exclusion of scenarios, used by validation methods
index
Return all model-scenario combinations as pandas.MultiIndex
model
Return the list of (unique) model names
region
Return the list of (unique) regions
scenario
Return the list of (unique) scenario names
time
The time index, i.e., axis labels related to the time domain.
time_domain
Indicator of the time domain: ‘year’, ‘datetime’, or ‘mixed’
unit
Return the list of (unique) units
unit_mapping
Return a dictionary of variables to (list of) corresponding units
variable
Return the list of (unique) variables
Methods
add(a, b, name[, axis, fillna, ...])
Add timeseries data items a and b along an axis
aggregate(variable[, components, method, ...])
Aggregate timeseries data by components or subcategories within each region.
aggregate_region(variable[, region, ...])
Aggregate timeseries data by subregions.
aggregate_time(variable[, column, value, ...])
Aggregate timeseries data by subannual time resolution.
append(other[, ignore_meta_conflict, ...])
Append any IamDataFrame-like object to this object
apply(func, name[, axis, fillna, append, args])
Apply a function to components of timeseries data along an axis
as_pandas([meta_cols])
Return object as a pandas.DataFrame
categorize(name, value, criteria[, color, ...])
Assign scenarios to a category according to specific criteria
check_aggregate(variable[, components, ...])
Check whether timeseries data matches the aggregation by its components.
check_aggregate_region(variable[, region, ...])
Check whether timeseries data matches the aggregation across subregions.
check_internal_consistency([components])
Check whether a scenario ensemble is internally consistent.
col_apply(col, func, *args, **kwargs)
Apply a function to a column of data or meta
convert_unit(current, to[, factor, ...])
Convert all timeseries data having current units to new units.
copy()
Make a deepcopy of this object
diff(mapping[, periods, append])
Compute the difference of timeseries data along the time dimension
divide(a, b, name[, axis, fillna, ...])
Divide the timeseries data items a and b along an axis
downscale_region(variable[, region, ...])
Downscale timeseries data to a number of subregions.
equals(other)
Test if two objects contain the same data and meta indicators
export_meta(excel_writer[, sheet_name])
Write the 'meta' indicators of this object to an Excel spreadsheet
filter([keep, inplace])
Return a (copy of a) filtered (downselected) IamDataFrame
get_data_column(column)
Return a column from the timeseries data in long format
head(*args, **kwargs)
Identical to pandas.DataFrame.head() operating on data
info([n, meta_rows, memory_usage])
Print a summary of the object index dimensions and meta indicators
interpolate(time[, inplace])
Interpolate missing values in the timeseries data
load_meta(path[, sheet_name, ignore_conflict])
Load 'meta' indicators from file
multiply(a, b, name[, axis, fillna, ...])
Multiply timeseries data items a and b along an axis
normalize([inplace])
Normalize data to a specific data point
offset([padding, fill_value, inplace])
Compute new data which is offset from a specific data point
pivot_table(index, columns[, values, ...])
Returns a pivot table
rename([mapping, inplace, append, ...])
Rename any index dimension or data coordinate.
require_data([region, variable, unit, year, ...])
Check whether scenarios have values for all (combinations of) given elements.
require_variable(*args, **kwargs)
This method is deprecated, use df.require_data() instead.
reset_exclude()
Reset exclusion assignment for all scenarios to exclude = False
set_meta(meta[, name, index])
Add meta indicators as pandas.Series, list or value (int/float/str)
set_meta_from_data(name[, method, column])
Add meta indicators from downselected timeseries data of self
slice([keep])
Return a (filtered) slice object of the IamDataFrame timeseries data index
subtract(a, b, name[, axis, fillna, ...])
Compute the difference of timeseries data items a and b along an axis
swap_time_for_year([subannual, inplace])
Convert the time dimension to year (as integer).
swap_year_for_time([inplace])
Convert the year and subannual dimensions to time (as datetime).
tail(*args, **kwargs)
Identical to pandas.DataFrame.tail() operating on data
timeseries([iamc_index])
Returns data as pandas.DataFrame in wide format
to_csv([path, iamc_index])
Write IamDataFrame.timeseries() to a comma-separated values (csv) file
to_datapackage(path)
Write object to a frictionless Data Package
to_excel(excel_writer[, sheet_name, ...])
Write object to an Excel spreadsheet
validate([criteria, exclude_on_fail])
Validate scenarios using criteria on timeseries values
map_regions
- add(a, b, name, axis='variable', fillna=None, ignore_units=False, append=False)[source]¶
Add timeseries data items a and b along an axis
This function computes a + b. If a or b are lists, the method applies pandas.groupby().sum() on each group. If either a or b is not defined for a row and fillna is not specified, no value is computed for that row.
- Parameters:
- a, b : str, list of str or a number
Items to be used for the addition.
- name : str
Name of the computed timeseries data on the axis.
- axis : str, optional
Axis along which to compute.
- fillna : dict or scalar, optional
Value to fill holes when rows are not defined for either a or b. Can be a scalar or a dictionary of the form {arg: default}.
- ignore_units : bool or str, optional
Perform the operation on values without considering units. Set the units of the returned data to unknown (if True) or to the value of ignore_units (if str).
- append : bool, optional
Whether to append the computed timeseries data to this instance.
- Returns:
IamDataFrame or None
Computed timeseries data, or None if append=True.
Notes
This function uses the pint package and the iam-units registry (read the docs) to handle units. pyam will keep notation consistent with the input format (if possible) and otherwise uses abbreviated units '{:~}'.format(u) (see here for more information). As a result, the notation of returned units may differ from the input format. For example, the unit EJ/yr may be reformatted to EJ / a.
- aggregate(variable, components=None, method='sum', recursive=False, append=False)[source]¶
Aggregate timeseries data by components or subcategories within each region.
- Parameters:
- variable : str or list of str
Variable(s) for which the aggregate will be computed.
- components : list of str, optional
Components to be aggregated, defaults to all subcategories of variable.
- method : func or str, optional
Aggregation method, e.g. numpy.mean, numpy.sum, ‘min’, ‘max’.
- recursive : bool or str, optional
Iterate recursively (bottom-up) over all subcategories of variable. If there are existing intermediate variables, validate the aggregated value. If recursive='skip-validate', skip the validation.
- append : bool, optional
Whether to append the aggregated timeseries data to this instance.
- Returns:
IamDataFrame or None
Aggregated timeseries data, or None if append=True.
See also
add
Add timeseries data items along an axis.
aggregate_region
Aggregate timeseries data along the region dimension.
Notes
The aggregation function interprets any missing values (numpy.nan) for individual components as 0.
- aggregate_region(variable, region='World', subregions=None, components=False, method='sum', weight=None, append=False, drop_negative_weights=True)[source]¶
Aggregate timeseries data by subregions.
This function allows adding variable sub-categories that are only defined at the region level by setting components=True.
- Parameters:
- variable : str or list of str
Variable(s) to be aggregated.
- region : str, optional
Region to which data will be aggregated.
- subregions : list of str, optional
List of subregions, defaults to all regions other than region.
- components : bool or list of str, optional
Variables at the region level to be included in the aggregation (ignored if False); if True, use all sub-categories of variable included in region but not in any of the subregions; or an explicit list of variables.
- method : func or str, optional
Method to use for aggregation, e.g. numpy.mean, numpy.sum, ‘min’, ‘max’.
- weight : str, optional
Variable to use as weight for the aggregation (currently only supported with method='sum').
- append : bool, optional
Append the aggregated timeseries to self and return None, else return the aggregated timeseries as a new IamDataFrame.
- drop_negative_weights : bool, optional
Remove any aggregated values that are computed using negative weights.
- Returns:
IamDataFrame or None
Aggregated timeseries data, or None if append=True.
See also
add
Add timeseries data items a and b along an axis
aggregate
Aggregate timeseries data along the variable hierarchy.
nomenclature.RegionProcessor
Processing of model-specific region-mappings.
Notes
The nomenclature-iamc package supports structured processing of many-to-many region mappings. Read the user guide for more information.
- aggregate_time(variable, column='subannual', value='year', components=None, method='sum', append=False)[source]¶
Aggregate timeseries data by subannual time resolution.
- Parameters:
- variable : str or list of str
Variable(s) to be aggregated.
- column : str, optional
The data column to be used as subannual time representation.
- value : str, optional
The name of the aggregated (subannual) time.
- components : list of str
Subannual timeslices to be aggregated; defaults to all subannual timeslices other than value.
- method : func or str, optional
Method to use for aggregation, e.g. numpy.mean(), numpy.sum(), ‘min’, ‘max’.
- append : bool, optional
Append the aggregated timeseries to self and return None, else return the aggregated timeseries as a new IamDataFrame.
- append(other, ignore_meta_conflict=False, inplace=False, verify_integrity=True, **kwargs)[source]¶
Append any IamDataFrame-like object to this object
Indicators in other.meta that are not in self.meta are merged. Missing values are set to NaN. Conflicting data rows always raise a ValueError.
- Parameters:
- other : IamDataFrame, pandas.DataFrame or data file
Any object castable as IamDataFrame to be appended.
- ignore_meta_conflict : bool, optional
If False and other is an IamDataFrame, raise an error if any meta columns present in self and other are not identical.
- inplace : bool, optional
If True, do the operation in place and return None.
- verify_integrity : bool, optional
If True, verify the integrity of the index.
- kwargs
Passed to IamDataFrame(other, **kwargs) if other is not already an IamDataFrame.
- Returns:
- Raises:
- ValueError
If the time domain or other timeseries data index dimensions don’t match.
- apply(func, name, axis='variable', fillna=None, append=False, args=(), **kwds)[source]¶
Apply a function to components of timeseries data along an axis
This function computes a function func using timeseries data selected along an axis downselected by keyword arguments. The length of components needs to match the number of required arguments of func.
- Parameters:
- func : function
Function to apply to components along axis.
- name : str
Name of the computed timeseries data on the axis.
- axis : str, optional
Axis along which to compute.
- fillna : dict or scalar, optional
Value to fill holes when rows are not defined for items in args or kwds. Can be a scalar or a dictionary of the form {kwd: default}.
- append : bool, optional
Whether to append the computed timeseries data to this instance.
- args : tuple or list of str
List of variables to pass as positional arguments to func.
- **kwds
Additional keyword arguments to pass as keyword arguments to func. If the name of a variable is given, the associated timeseries is passed; otherwise, the value itself is passed.
- Returns:
IamDataFrame or None
Computed timeseries data, or None if append=True.
Notes
This function uses the pint package and the iam-units registry (read the docs) to handle units. pyam uses abbreviated units '{:~}'.format(u) (see here for more information). As a result, the notation of returned units may differ from the input format. For example, the unit EJ/yr may be reformatted to EJ / a.
- as_pandas(meta_cols=True)[source]¶
Return object as a pandas.DataFrame
- Parameters:
- meta_cols : list, optional
Join data with all meta columns if True (default) or only with columns in the list; or return a copy of data if False.
- categorize(name, value, criteria, color=None, marker=None, linestyle=None)[source]¶
Assign scenarios to a category according to specific criteria
- Parameters:
- name : str
Column name of the ‘meta’ table.
- value : str
Category identifier.
- criteria : dict
Dictionary with variables mapped to applicable checks (‘up’ and ‘lo’ for the respective bounds, ‘year’ for years; optional).
- color : str, optional
Assign a color to this category for plotting.
- marker : str, optional
Assign a marker to this category for plotting.
- linestyle : str, optional
Assign a linestyle to this category for plotting.
- check_aggregate(variable, components=None, method='sum', exclude_on_fail=False, multiplier=1, **kwargs)[source]¶
Check whether timeseries data matches the aggregation by its components.
- Parameters:
- variable : str or list of str
Variable(s) checked for matching aggregation of sub-categories.
- components : list of str, optional
List of variables to aggregate, defaults to the sub-categories of variable.
- method : func or str, optional
Method to use for aggregation, e.g. numpy.mean, numpy.sum, ‘min’, ‘max’.
- exclude_on_fail : bool, optional
If True, set exclude = True for all scenarios where the aggregate does not match the aggregated components.
- multiplier : number, optional
Multiplicative factor when comparing variable and the sum of components.
- kwargs
Tolerance arguments for the comparison of values, passed to numpy.isclose().
- Returns:
pandas.DataFrame or None
Data where the variable and its aggregate do not match the aggregated components.
- check_aggregate_region(variable, region='World', subregions=None, components=False, method='sum', weight=None, exclude_on_fail=False, drop_negative_weights=True, **kwargs)[source]¶
Check whether timeseries data matches the aggregation across subregions.
- Parameters:
- variable : str or list of str
Variable(s) to be checked for matching aggregation of subregions.
- region : str, optional
Region to be checked for matching aggregation of subregions.
- subregions : list of str, optional
List of subregions, defaults to all regions other than region.
- components : bool or list of str, optional
Variables at the region level to be included in the aggregation (ignored if False); if True, use all sub-categories of variable included in region but not in any of the subregions; or an explicit list of variables.
- method : func or str, optional
Method to use for aggregation, e.g. numpy.mean, numpy.sum, ‘min’, ‘max’.
- weight : str, optional
Variable to use as weight for the aggregation (currently only supported with method='sum').
- exclude_on_fail : bool, optional
If True, set exclude = True for all scenarios where the aggregate does not match the aggregated components.
- drop_negative_weights : bool, optional
Remove any aggregated values that are computed using negative weights.
- kwargs
Tolerance arguments for the comparison of values, passed to numpy.isclose().
- Returns:
pandas.DataFrame or None
Data where the variable and the region-aggregate do not match.
- check_internal_consistency(components=False, **kwargs)[source]¶
Check whether a scenario ensemble is internally consistent.
We check that all variables are equal to the sum of their sectoral components and that all regions add up to the World total. If the check passes, None is returned; otherwise, a DataFrame of inconsistent variables is returned.
Note: at the moment, this method’s regional checking is limited to checking that all regions sum to the World region. We cannot make this more automatic unless we store how the regions relate, see this issue.
- Parameters:
- components : bool, optional
Passed to check_aggregate_region(); if True, use all sub-categories of each variable included in World but not in any of the subregions; if False, only aggregate variables over subregions.
- kwargs
Arguments for the comparison of values, passed to numpy.isclose().
- col_apply(col, func, *args, **kwargs)[source]¶
Apply a function to a column of data or meta
- Parameters:
- col : str
Column in either the data or meta dataframes.
- func : function
Function to apply.
- property compute¶
Access to advanced computation methods, see IamComputeAccessor
- convert_unit(current, to, factor=None, registry=None, context=None, inplace=False)[source]¶
Convert all timeseries data having current units to new units.
If factor is given, existing values are multiplied by it, and the to units are assigned to the ‘unit’ column.
Otherwise, the pint package is used to convert from current to to units without an explicit conversion factor. Pint natively handles conversion between any standard (SI) units that have compatible dimensionality, such as exajoule to terawatt-hours (EJ -> TWh), or tonne per year to gram per second (t / yr -> g / sec).
The default registry includes additional unit definitions relevant for integrated assessment models and energy systems analysis, via the iam-units package. This registry can also be accessed directly, using:
from iam_units import registry
When using this registry, current and to may contain the symbols of greenhouse gas (GHG) species, such as ‘CO2e’, ‘C’, ‘CH4’, ‘N2O’, ‘HFC236fa’, etc., as well as lower-case aliases like ‘co2’ supported by pyam. In this case, context must be the name of a specific global warming potential (GWP) metric supported by iam_units, e.g. ‘AR5GWP100’ (optionally prefixed by ‘gwp_’, e.g. ‘gwp_AR5GWP100’).
Rows with units other than current are not altered.
- Parameters:
- current : str
Current units to be converted.
- to : str
New unit (to be converted to) or symbol for the target GHG species. If only the GHG species is provided, the units (e.g. Mt / year) will be the same as current, and an expression combining units and species (e.g. ‘Mt CO2e / yr’) will be placed in the ‘unit’ column.
- factor : value, optional
Explicit factor for conversion without pint.
- registry : pint.UnitRegistry, optional
Specific unit registry to use for conversion. Default: the iam-units registry.
- context : str or pint.Context, optional
(Name of) the context to use in conversion. Required when converting between GHG species using GWP metrics, unless the species indicated by current and to are the same.
- inplace : bool, optional
If True, do the operation in place and return None.
- Returns:
- Raises:
- pint.UndefinedUnitError
If attempting a GWP conversion but context is not given.
- pint.DimensionalityError
Without factor, when current and to are not compatible units.
- property coordinates¶
Return the list of data coordinates (columns not including index names)
- copy()[source]¶
Make a deepcopy of this object
See copy.deepcopy() for details.
- property data¶
Return the timeseries data as a long pandas.DataFrame
- diff(mapping, periods=1, append=False)[source]¶
Compute the difference of timeseries data along the time dimension
This method behaves as if applying pandas.DataFrame.diff() on the timeseries data in wide format. By default, the diff-value in period t is computed as x[t] - x[t-1].
- Parameters:
- mapping : dict
Mapping of variable item(s) to the name(s) of the diff-ed timeseries data, e.g., {"current variable": "name of diff-ed variable", ...}
- periods : int, optional
Periods to shift for calculating the difference, accepts negative values; passed to pandas.DataFrame.diff().
- append : bool, optional
Whether to append the computed timeseries data to this instance.
- Returns:
IamDataFrame or None
Computed timeseries data, or None if append=True.
Notes
This method behaves as if applying pandas.DataFrame.diff() by row in a wide data format, so the difference is computed on the previous existing value. This can lead to unexpected results if the data has inconsistent period lengths.
Use the following to ensure that no missing values exist prior to computing the difference:
df.interpolate(time=df.year)
- property dimensions¶
Return the list of data columns (index names & data coordinates)
- divide(a, b, name, axis='variable', fillna=None, ignore_units=False, append=False)[source]¶
Divide the timeseries data items a and b along an axis
This function computes a / b. If a or b are lists, the method applies pandas.groupby().sum() on each group. If either a or b is not defined for a row and fillna is not specified, no value is computed for that row.
- Parameters:
- a, b : str, list of str or a number
Items to be used for the division.
- name : str
Name of the computed timeseries data on the axis.
- axis : str, optional
Axis along which to compute.
- fillna : dict or scalar, optional
Value to fill holes when rows are not defined for either a or b. Can be a scalar or a dictionary of the form {arg: default}.
- ignore_units : bool or str, optional
Perform the operation on values without considering units. Set the units of the returned data to unknown (if True) or to the value of ignore_units (if str).
- append : bool, optional
Whether to append the computed timeseries data to this instance.
- Returns:
IamDataFrame or None
Computed timeseries data, or None if append=True.
Notes
This function uses the pint package and the iam-units registry (read the docs) to handle units. pyam will keep notation consistent with the input format (if possible) and otherwise uses abbreviated units '{:~}'.format(u) (see here for more information). As a result, the notation of returned units may differ from the input format. For example, the unit EJ/yr may be reformatted to EJ / a.
- downscale_region(variable, region='World', subregions=None, proxy=None, weight=None, append=False)[source]¶
Downscale timeseries data to a number of subregions.
- Parameters:
- variable : str or list of str
Variable(s) to be downscaled.
- region : str, optional
Region from which data will be downscaled.
- subregions : list of str, optional
List of subregions, defaults to all regions other than region (if using proxy) or the region index (if using weight).
- proxy : str, optional
Variable (within the IamDataFrame) to be used as a proxy for regional downscaling.
- weight : pandas.DataFrame, optional
Dataframe with the time dimension as columns (year or datetime.datetime) and regions[, model, scenario] as index.
- append : bool, optional
Append the downscaled timeseries to self and return None, else return the downscaled data as a new IamDataFrame.
- property empty¶
Indicator whether this object is empty
- equals(other)[source]¶
Test if two objects contain the same data and meta indicators
This function allows two IamDataFrame instances to be compared against each other to see if they have the same timeseries data and meta indicators. nan’s in the same location of the meta table are considered equal.
- Parameters:
- other : IamDataFrame
The other IamDataFrame to be compared with self.
- property exclude¶
Indicator for exclusion of scenarios, used by validation methods
- export_meta(excel_writer, sheet_name='meta', **kwargs)[source]¶
Write the ‘meta’ indicators of this object to an Excel spreadsheet
- Parameters:
- excel_writer : str, path object or ExcelWriter object
File path, pathlib.Path, or existing pandas.ExcelWriter.
- sheet_name : str
Name of the sheet which will contain ‘meta’.
- **kwargs
Passed to pandas.ExcelWriter (if excel_writer is path-like).
- filter(keep=True, inplace=False, **kwargs)[source]¶
Return a (copy of a) filtered (downselected) IamDataFrame
- Parameters:
- keep : bool, optional
Keep all scenarios satisfying the filters (if True) or the inverse.
- inplace : bool, optional
If True, do the operation in place and return None.
- **kwargs
Passed to slice().
- get_data_column(column)[source]¶
Return a column from the timeseries data in long format
Equivalent to IamDataFrame.data[column].
- Parameters:
- column : str
The column name.
- Returns:
- pandas.Series
- head(*args, **kwargs)[source]¶
Identical to pandas.DataFrame.head() operating on data
- property index¶
Return all model-scenario combinations as pandas.MultiIndex
The index allows looping over the available model-scenario combinations using:
for model, scenario in df.index: ...
- info(n=80, meta_rows=5, memory_usage=False)[source]¶
Print a summary of the object index dimensions and meta indicators
- Parameters:
- n : int
The maximum line length.
- meta_rows : int
The maximum number of meta indicators printed.
- interpolate(time, inplace=False, **kwargs)[source]¶
Interpolate missing values in the timeseries data
This method uses pandas.DataFrame.interpolate(), which applies linear interpolation by default.
- Parameters:
- time : int or datetime, or list-like thereof
Year or datetime.datetime to be interpolated. This must match the datetime/year format of self.
- inplace : bool, optional
If True, do the operation in place and return None.
- kwargs
Passed to pandas.DataFrame.interpolate().
- load_meta(path, sheet_name='meta', ignore_conflict=False, **kwargs)[source]¶
Load ‘meta’ indicators from file
- Parameters:
- path : str, pathlib.Path or pandas.ExcelFile
A valid path or instance of an xlsx or csv file.
- sheet_name : str, optional
Name of the sheet to be parsed (if xlsx).
- ignore_conflict : bool, optional
If True, values in path take precedence over existing meta. If False, raise an error in case of conflicts.
- kwargs
Passed to pandas.read_excel() or pandas.read_csv().
- property model¶
Return the list of (unique) model names
- multiply(a, b, name, axis='variable', fillna=None, ignore_units=False, append=False)[source]¶
Multiply timeseries data items a and b along an axis
This function computes a * b. If a or b are lists, the method applies pandas.groupby().sum() on each group. If either a or b is not defined for a row and fillna is not specified, no value is computed for that row.
- Parameters:
- a, b : str, list of str or a number
Items to be multiplied.
- name : str
Name of the computed timeseries data on the axis.
- axis : str, optional
Axis along which to compute.
- fillna : dict or scalar, optional
Value to fill holes when rows are not defined for either a or b. Can be a scalar or a dictionary of the form {arg: default}.
- ignore_units : bool or str, optional
Perform the operation on values without considering units. Set the units of the returned data to unknown (if True) or to the value of ignore_units (if str).
- append : bool, optional
Whether to append the computed timeseries data to this instance.
- Returns:
IamDataFrame or None
Computed timeseries data, or None if append=True.
Notes
This function uses the pint package and the iam-units registry (read the docs) to handle units. pyam will keep notation consistent with the input format (if possible) and otherwise uses abbreviated units '{:~}'.format(u) (see here for more information). As a result, the notation of returned units may differ from the input format. For example, the unit EJ/yr may be reformatted to EJ / a.
- normalize(inplace=False, **kwargs)[source]¶
Normalize data to a specific data point
Note: Currently only supports normalizing to a specific time.
- Parameters:
- inplace : bool, optional
If True, do the operation in place and return None.
- kwargs
The column and value on which to normalize (e.g., year=2005).
- offset(padding=0, fill_value=None, inplace=False, **kwargs)[source]¶
Compute new data which is offset from a specific data point
For example, offsetting from year=2005 will provide data relative to year=2005, such that the value in 2005 is 0 and all other values are value[year] - value[2005].
Conceptually, this operation performs as:
df - df.filter(**kwargs) + padding
Note: Currently only supports offsetting from a specific time.
- Parameters:
- padding : float, optional
An additional offset padding.
- fill_value : float or None, optional
Applied on subtraction. Fills existing missing (NaN) values. See https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.subtract.html
- inplace : bool, optional
If True, do the operation in place and return None.
- kwargs
The column and value on which to offset (e.g., year=2005).
- pivot_table(index, columns, values='value', aggfunc='count', fill_value=None, style=None)[source]¶
Returns a pivot table
- Parameters:
- index : str or list of str
Rows for the pivot table.
- columns : str or list of str
Columns for the pivot table.
- values : str, optional
Dataframe column to aggregate or count.
- aggfunc : str or function, optional
Function used for aggregation, accepts ‘count’, ‘mean’, and ‘sum’.
- fill_value : scalar, optional
Value to replace missing values with.
- style : str, optional
Output style for pivot table formatting, accepts ‘highlight_not_max’, ‘heatmap’.
- property region¶
Return the list of (unique) regions
- rename(mapping=None, inplace=False, append=False, check_duplicates=True, **kwargs)[source]¶
Rename any index dimension or data coordinate.
When renaming models or scenarios, the uniqueness of the index must be maintained, and the function will raise an error otherwise.
Renaming is only applied to any data row that matches for all columns given in mapping. Renaming can only be applied to the model and scenario columns, or to other data coordinates simultaneously.
- Parameters:
- mappingdict or kwargs
mapping of column name to rename-dictionary of that column
dict(<column_name>: {<current_name_1>: <target_name_1>, <current_name_2>: <target_name_2>})
or kwargs as column_name={<current_name_1>: <target_name_1>, …}
- inplacebool, optional
Do operation inplace and return None.
- appendbool, optional
Whether to append aggregated timeseries data to this instance (if inplace=True) or to a returned new instance (if inplace=False).
- check_duplicatesbool, optional
Check whether conflicts exist after renaming of timeseries data coordinates. If True, raise a ValueError; if False, rename and merge with groupby().sum().
- Returns:
IamDataFrame or None
Renamed timeseries data as new object or None if inplace=True.
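As a sketch of the mapping format and the check_duplicates=False merge behaviour, here is the analogous pandas operation on hypothetical data (model and scenario names are illustrative):

```python
import pandas as pd

# Hypothetical timeseries rows; renaming 'model_a' to 'model_b'
# creates a duplicate (model, scenario) coordinate
df = pd.DataFrame({
    "model": ["model_a", "model_b"],
    "scenario": ["scen", "scen"],
    "value": [1.0, 2.0],
})

mapping = {"model": {"model_a": "model_b"}}
for column, renames in mapping.items():
    df[column] = df[column].replace(renames)

# With check_duplicates=False, conflicting rows are merged via groupby().sum()
merged = df.groupby(["model", "scenario"], as_index=False)["value"].sum()
```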
- require_data(region=None, variable=None, unit=None, year=None, exclude_on_fail=False)[source]¶
Check whether scenarios have values for all (combinations of) given elements.
- Parameters:
- regionstr or list of str, optional
Required region(s).
- variablestr or list of str, optional
Required variable(s).
- unitstr or list of str, optional
Required unit(s).
- yearint or list of int, optional
Required year(s).
- exclude_on_failbool, optional
If True, set exclude = True for all scenarios that do not satisfy the criteria.
- Returns:
pandas.DataFrame or None
A dataframe of missing (combinations of) elements for all scenarios.
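The completeness check can be sketched in plain pandas on hypothetical data (scenario and variable names are illustrative):

```python
import pandas as pd

# Hypothetical data: 'scen_b' has no rows for 'Primary Energy'
data = pd.DataFrame({
    "scenario": ["scen_a", "scen_a", "scen_b"],
    "variable": ["Primary Energy", "Emissions|CO2", "Emissions|CO2"],
})
required = {"Primary Energy", "Emissions|CO2"}

# For each scenario, list the required variables that have no data
present = data.groupby("scenario")["variable"].apply(set)
missing = {s: sorted(required - vs) for s, vs in present.items()}
```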
- require_variable(*args, **kwargs)[source]¶
This method is deprecated, use df.require_data() instead.
- property scenario¶
Return the list of (unique) scenario names
- set_meta(meta, name=None, index=None)[source]¶
Add meta indicators as pandas.Series, list or value (int/float/str)
- Parameters:
- metapandas.DataFrame, pandas.Series, list, int, float or str
column to be added to ‘meta’ (by [‘model’, ‘scenario’] index if possible)
- namestr, optional
meta column name (defaults to meta pandas.Series.name); either meta.name or the name kwarg must be defined
- indexIamDataFrame, pandas.DataFrame or pandas.MultiIndex, optional
index to be used for setting meta column ([‘model’, ‘scenario’])
- set_meta_from_data(name, method=None, column='value', **kwargs)[source]¶
Add meta indicators from downselected timeseries data of self
- Parameters:
- namestr
column name of the ‘meta’ table
- methodfunction, optional
Method for aggregation (e.g., numpy.max); required if downselected data do not yield unique values.
- columnstr, optional
the column from data to be used to derive the indicator
- kwargs
Passed to filter() for downselecting data
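For intuition, deriving an indicator with an aggregation method reduces to a groupby on the downselected data; a sketch with hypothetical values (names are illustrative):

```python
import pandas as pd

# Hypothetical downselected timeseries values for one scenario
data = pd.DataFrame({
    "model": ["m", "m"],
    "scenario": ["s", "s"],
    "value": [1.0, 4.0],
})

# method=numpy.max collapses multiple values to one meta indicator
meta_value = data.groupby(["model", "scenario"])["value"].max()
```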
- slice(keep=True, **kwargs)[source]¶
Return a (filtered) slice object of the IamDataFrame timeseries data index
- Parameters:
- keepbool, optional
Keep all scenarios satisfying the filters (if True) or the inverse.
- **kwargs
Arguments for filtering. See the “Notes”.
- Returns:
Notes
The following arguments are available for filtering:
‘meta’ columns: filter by string value of that column
‘model’, ‘scenario’, ‘region’, ‘variable’, ‘unit’: string or list of strings, where * can be used as a wildcard
‘index’: list of (model, scenario) 2-tuples or pandas.MultiIndex
‘level’: the “depth” of entries in the variable column (number of ‘|’) (excluding the strings given in the ‘variable’ argument)
‘year’: takes an integer (int/np.int64), a list of integers or a range. Note that the last year of a range is not included, so range(2010, 2015) is interpreted as [2010, …, 2014]
‘time_domain’: can be “year” or “datetime”
arguments for filtering by datetime.datetime or np.datetime64 (‘month’, ‘hour’, ‘time’)
‘regexp=True’ disables pseudo-regexp syntax in pattern_match()
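Two of these conventions are easy to misread; a quick illustration of the ‘year’ range semantics and the ‘level’ depth count:

```python
# range(2010, 2015) excludes the end year, so it selects 2010..2014
years = list(range(2010, 2015))

# 'level' counts the depth of a variable by its '|' separators;
# this hypothetical variable sits at level 2
variable = "Emissions|CO2|Energy"
level = variable.count("|")
```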
- subtract(a, b, name, axis='variable', fillna=None, ignore_units=False, append=False)[source]¶
Compute the difference of timeseries data items a and b along an axis
This function computes a - b. If a or b are lists, the method applies pandas.groupby().sum() on each group. If either a or b is not defined for a row and fillna is not specified, no value is computed for that row.
- Parameters:
- a, bstr, list of str or a number
Items to be used for the subtraction.
- namestr
Name of the computed timeseries data on the axis.
- axisstr, optional
Axis along which to compute.
- fillnadict or scalar, optional
Value to fill holes when rows are not defined for either a or b. Can be a scalar or a dictionary of the form {arg: default}.
- ignore_unitsbool or str, optional
Perform operation on values without considering units. Set units of returned data to unknown (if True) or the value of ignore_units (if str).
- appendbool, optional
Whether to append computed timeseries data to this instance.
- Returns:
IamDataFrame or None
Computed timeseries data or None if append=True.
See also
Notes
This function uses the pint package and the iam-units registry (read the docs) to handle units. pyam will keep notation consistent with the input format (if possible) and otherwise uses abbreviated units '{:~}'.format(u) (see here for more information). As a result, the notation of returned units may differ from the input format. For example, the unit EJ/yr may be reformatted to EJ / a.
- swap_time_for_year(subannual=False, inplace=False)[source]¶
Convert the time dimension to year (as integer).
- Parameters:
- subannualbool, str or func, optional
Merge non-year components of the “time” domain as new column “subannual”. Apply strftime() on the values of the “time” domain using subannual (if a string) or “%m-%d %H:%M%z” (if True). If it is a function, apply the function on the values of the “time” domain.
- inplacebool, optional
If True, do operation inplace and return None.
- Returns:
IamDataFrame or None
Object with altered time domain or None if inplace=True.
- Raises:
- ValueError
“time” is not a column of self.data
See also
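The conversion can be sketched with pandas datetime accessors (timestamps are hypothetical; the format string is the documented default):

```python
import pandas as pd

# Hypothetical datetime-domain values
time = pd.Series(pd.to_datetime(["2010-06-17 12:00", "2020-01-01 00:00"]))

year = time.dt.year  # the new integer 'year' column
# subannual=True applies the default "%m-%d %H:%M%z" format
subannual = time.dt.strftime("%m-%d %H:%M%z")
```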
- swap_year_for_time(inplace=False)[source]¶
Convert the year and subannual dimensions to time (as datetime).
The method applies dateutil.parser.parse() on the combined columns year and subannual:
dateutil.parser.parse([f"{y}-{s}" for y, s in zip(year, subannual)])
- Parameters:
- inplacebool, optional
If True, do operation inplace and return None.
- Returns:
IamDataFrame or None
Object with altered time domain or None if inplace=True.
- Raises:
- ValueError
“year” or “subannual” is not a column of self.data
See also
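A sketch of that parsing step with hypothetical year/subannual values; note that dateutil.parser.parse() takes a single string, so it is applied per row:

```python
from dateutil import parser

year = [2010, 2010]
subannual = ["01-01 00:00", "06-17 12:00"]

# Combine each year with its subannual component and parse to datetime
time = [parser.parse(f"{y}-{s}") for y, s in zip(year, subannual)]
```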
- tail(*args, **kwargs)[source]¶
Identical to pandas.DataFrame.tail() operating on data
- property time¶
The time index, i.e., axis labels related to the time domain.
- Returns:
- A pandas.Index (dtype ‘int64’) if the time_domain is ‘year’
- A pandas.DatetimeIndex if the time_domain is ‘datetime’
- A pandas.Index if the time_domain is ‘mixed’
- property time_domain¶
Indicator of the time domain: ‘year’, ‘datetime’, or ‘mixed’
- timeseries(iamc_index=False)[source]¶
Returns data as pandas.DataFrame in wide format
- Parameters:
- iamc_indexbool, optional
if True, use [‘model’, ‘scenario’, ‘region’, ‘variable’, ‘unit’]; else, use all ‘data’ columns
- Raises:
- ValueError
IamDataFrame is empty
- ValueError
reducing to IAMC-index yields an index with duplicates
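The wide layout corresponds to a pandas pivot of long-format data on the IAMC index columns; a sketch with hypothetical values:

```python
import pandas as pd

# Hypothetical long-format data reshaped to the wide IAMC layout
data = pd.DataFrame({
    "model": ["m", "m"],
    "scenario": ["s", "s"],
    "region": ["World", "World"],
    "variable": ["Primary Energy", "Primary Energy"],
    "unit": ["EJ/yr", "EJ/yr"],
    "year": [2010, 2020],
    "value": [1.0, 2.0],
})

iamc_index = ["model", "scenario", "region", "variable", "unit"]
wide = data.pivot_table(index=iamc_index, columns="year", values="value")
```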
- to_csv(path=None, iamc_index=False, **kwargs)[source]¶
Write IamDataFrame.timeseries() to a comma-separated values (csv) file
- Parameters:
- pathstr, path or file-like, optional
File path as string or pathlib.Path, or file-like object. If None, the result is returned as a csv-formatted string. See pandas.DataFrame.to_csv() for details.
- iamc_indexbool, optional
If True, use [‘model’, ‘scenario’, ‘region’, ‘variable’, ‘unit’]; else, use all dimensions. See IamDataFrame.timeseries() for details.
- **kwargs
Passed to pandas.DataFrame.to_csv().
- to_datapackage(path)[source]¶
Write object to a frictionless Data Package
More information: https://frictionlessdata.io
Returns the saved datapackage.Package (read the docs). When adding metadata (descriptors), please follow the template defined by https://github.com/OpenEnergyPlatform/metadata
- Parameters:
- pathstring or path object
Any valid string path or pathlib.Path
- to_excel(excel_writer, sheet_name='data', iamc_index=False, include_meta=True, **kwargs)[source]¶
Write object to an Excel spreadsheet
- Parameters:
- excel_writerpath-like, file-like, or ExcelWriter object
File path as string or pathlib.Path, or existing pandas.ExcelWriter.
- sheet_namestr, optional
Name of sheet which will contain IamDataFrame.timeseries() data.
- iamc_indexbool, optional
If True, use [‘model’, ‘scenario’, ‘region’, ‘variable’, ‘unit’]; else, use all dimensions. See IamDataFrame.timeseries() for details.
- include_metabool or str, optional
If True, write meta to a sheet ‘meta’ (default); if this is a string, use it as sheet name.
- **kwargs
Passed to pandas.ExcelWriter (if excel_writer is path-like).
- property unit¶
Return the list of (unique) units
- property unit_mapping¶
Return a dictionary of variables to (list of) corresponding units
- validate(criteria={}, exclude_on_fail=False)[source]¶
Validate scenarios using criteria on timeseries values
Returns all scenarios which do not match the criteria and prints a log message, or returns None if all scenarios match the criteria.
When called with exclude_on_fail=True, scenarios not satisfying the criteria will be marked as exclude=True.
- Parameters:
- criteriadict
Dictionary with variable keys and validation mappings (‘up’ and ‘lo’ for respective bounds, ‘year’ for years)
- exclude_on_failbool, optional
If True, set exclude = True for all scenarios that do not satisfy the criteria.
- Returns:
pandas.DataFrame or None
All data points that do not satisfy the criteria.
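The criteria format and the failure check can be sketched in pandas with hypothetical bounds (variable name and values are illustrative):

```python
import pandas as pd

# Hypothetical data and criteria: 'Primary Energy' must be in [0, 5] in 2010
data = pd.DataFrame({
    "scenario": ["scen_a", "scen_b"],
    "variable": ["Primary Energy", "Primary Energy"],
    "year": [2010, 2010],
    "value": [3.0, 7.0],
})
criteria = {"Primary Energy": {"up": 5.0, "lo": 0.0, "year": 2010}}

spec = criteria["Primary Energy"]
mask = (
    (data["variable"] == "Primary Energy")
    & (data["year"] == spec["year"])
    & ~data["value"].between(spec["lo"], spec["up"])
)
failed = data[mask]  # rows that do not satisfy the criteria
```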
- property variable¶
Return the list of (unique) variables