Algebraic operations on timeseries data
The pyam package offers many tools to facilitate processing of scenario data. In this notebook, we illustrate algebraic operations on the timeseries data of an IamDataFrame: addition, subtraction, multiplication, and division.
The algebraic operations are (by default) “unit-aware”, meaning that pyam tries to handle units correctly. This is implemented via the iam-units package, an extension of the pint package.
The pint package natively handles conversion of standard (SI) units and commonly used equivalents (e.g., exajoule to terawatt-hours, EJ -> TWh), and it can parse combined units (e.g., exajoule per year, EJ/yr). To better support common use cases in energy systems analysis and integrated-assessment scenarios, pyam uses the iam-units registry as its default pint.UnitRegistry (see IAMconsortium/units). This registry extends the pint defaults with a wide range of conversion factors commonly used in that domain.
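As a quick sanity check on the kind of conversion mentioned above, the EJ -> TWh factor can be worked out by hand. The sketch below uses plain Python rather than pint; the constant and function names are illustrative assumptions and not part of pint, iam-units, or pyam.

```python
# Hand-computed conversion factor; these names are illustrative only.
JOULE_PER_EJ = 1e18          # exa = 10**18
JOULE_PER_TWH = 1e12 * 3600  # tera = 10**12, one watt-hour = 3600 joule


def ej_to_twh(value_ej: float) -> float:
    """Convert an energy value from exajoule to terawatt-hours."""
    return value_ej * JOULE_PER_EJ / JOULE_PER_TWH


print(round(ej_to_twh(1.0), 2))  # 277.78
```

A unit-aware library performs this bookkeeping automatically, which is what makes it practical for combined units such as EJ/yr.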
Overview
0. Import data from file and inspect the scenario
1. A simple subtraction
2. Multiplying timeseries data with scalars
3. Calculating shares and dealing with units
4. Overriding unit handling
5. Working on other dimensions of timeseries data
See Also
The pyam package also supports aggregation and downscaling along the sectoral and regional dimensions including consistency checks. See the aggregation/downscaling tutorial notebook for more information.
0. Import data from file and inspect the scenario
The stylized scenario used in this tutorial has data for two regions (reg_a & reg_b) as well as the World aggregate, and for four categories of variables: primary energy demand, emissions, carbon price, and population.
[1]:
from pyam import IamDataFrame
df = IamDataFrame(data="tutorial_data_aggregating_downscaling.csv")
df
[INFO] 12:03:36 - pyam.core: Reading file tutorial_data_aggregating_downscaling.csv
[1]:
<class 'pyam.core.IamDataFrame'>
Index:
* model : model_a (1)
* scenario : scen_a (1)
Timeseries data coordinates:
region : World, reg_a, reg_b (3)
variable : Emissions|CO2, Emissions|CO2|AFOLU, ... Primary Energy|Wind (9)
unit : EJ/yr, Mt CO2, USD/t CO2, million (4)
year : 2005, 2010 (2)
[2]:
df.variable
[2]:
['Emissions|CO2',
'Emissions|CO2|AFOLU',
'Emissions|CO2|Bunkers',
'Emissions|CO2|Energy',
'Population',
'Price|Carbon',
'Primary Energy',
'Primary Energy|Coal',
'Primary Energy|Wind']
1. A simple subtraction
We first display the existing variables Primary Energy and Primary Energy|Coal.
[3]:
df.filter(variable=["Primary Energy", "Primary Energy|Coal"]).timeseries()
[3]:
| model | scenario | region | variable | unit | 2005 | 2010 |
|---|---|---|---|---|---|---|
| model_a | scen_a | World | Primary Energy | EJ/yr | 12.0 | 15.0 |
| model_a | scen_a | World | Primary Energy\|Coal | EJ/yr | 9.0 | 10.0 |
| model_a | scen_a | reg_a | Primary Energy | EJ/yr | 8.0 | 9.0 |
| model_a | scen_a | reg_a | Primary Energy\|Coal | EJ/yr | 6.0 | 6.0 |
| model_a | scen_a | reg_b | Primary Energy | EJ/yr | 4.0 | 6.0 |
| model_a | scen_a | reg_b | Primary Energy\|Coal | EJ/yr | 3.0 | 4.0 |
Now, we subtract fossil fuels (coal) from the total to see non-fossil energy use, and display the timeseries in wide format.
All algebraic-operations functions follow the syntax:
df.<method>(a, b, c) => a <op> b = c
Note that in simple cases, pyam will try to keep the unit consistent during the operation.
[4]:
(
df.subtract(
"Primary Energy", "Primary Energy|Coal", "Primary Energy|Non-Fossil"
).timeseries()
)
[4]:
| model | scenario | region | variable | unit | 2005 | 2010 |
|---|---|---|---|---|---|---|
| model_a | scen_a | World | Primary Energy\|Non-Fossil | EJ/yr | 3.0 | 5.0 |
| model_a | scen_a | reg_a | Primary Energy\|Non-Fossil | EJ/yr | 2.0 | 3.0 |
| model_a | scen_a | reg_b | Primary Energy\|Non-Fossil | EJ/yr | 1.0 | 2.0 |
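To make the `a <op> b = c` semantics concrete, here is a plain-Python sketch of what the subtraction does conceptually: for each region and year, take the value of variable `a`, subtract the value of variable `b`, and store the result under the new name. This is only an illustration under simplifying assumptions (no units, no model or scenario index); the `subtract` helper is hypothetical and is not pyam's implementation.

```python
# Values copied from the Primary Energy table in this section,
# keyed by (region, variable).
data = {
    ("World", "Primary Energy"): {2005: 12.0, 2010: 15.0},
    ("World", "Primary Energy|Coal"): {2005: 9.0, 2010: 10.0},
    ("reg_a", "Primary Energy"): {2005: 8.0, 2010: 9.0},
    ("reg_a", "Primary Energy|Coal"): {2005: 6.0, 2010: 6.0},
    ("reg_b", "Primary Energy"): {2005: 4.0, 2010: 6.0},
    ("reg_b", "Primary Energy|Coal"): {2005: 3.0, 2010: 4.0},
}


def subtract(data, a, b, name):
    """Compute a - b per region and year and store the result under `name`."""
    result = {}
    for (region, variable), series in data.items():
        if variable != a:
            continue
        other = data[(region, b)]
        result[(region, name)] = {
            year: series[year] - other[year] for year in series
        }
    return result


non_fossil = subtract(
    data, "Primary Energy", "Primary Energy|Coal", "Primary Energy|Non-Fossil"
)
for key, series in non_fossil.items():
    print(key, series)
```

The printed values match the output of the pyam call shown above.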
We can also merge the newly computed timeseries directly into the original IamDataFrame using the keyword argument append=True.
The new variable Primary Energy|Non-Fossil is then part of the variable list.
[5]:
(
df.subtract(
"Primary Energy",
"Primary Energy|Coal",
"Primary Energy|Non-Fossil",
append=True,
)
)
[6]:
df.variable
[6]:
['Emissions|CO2',
'Emissions|CO2|AFOLU',
'Emissions|CO2|Bunkers',
'Emissions|CO2|Energy',
'Population',
'Price|Carbon',
'Primary Energy',
'Primary Energy|Coal',
'Primary Energy|Non-Fossil',
'Primary Energy|Wind']
2. Multiplying timeseries data with scalars
The algebraic operations work not only on items in the IamDataFrame; you can also pass scalars.
You will see that in more elaborate computations, pyam may change the notation of the units. In the example below, EJ/yr is changed to EJ / a. This is due to how the pint package works internally.
[7]:
df.multiply("Primary Energy", 3, "PE * 3").timeseries()
[7]:
| model | scenario | region | variable | unit | 2005 | 2010 |
|---|---|---|---|---|---|---|
| model_a | scen_a | World | PE * 3 | EJ / a | 36.0 | 45.0 |
| model_a | scen_a | reg_a | PE * 3 | EJ / a | 24.0 | 27.0 |
| model_a | scen_a | reg_b | PE * 3 | EJ / a | 12.0 | 18.0 |
You can also define a pint.Quantity from the iam-units registry and use this in the calculation. Note that pyam will (try to) correctly reduce the fraction.
[8]:
from iam_units import registry
q = registry.Quantity(3, "t / EJ")
df.multiply("Primary Energy", q, "custom variable").timeseries()
[8]:
| model | scenario | region | variable | unit | 2005 | 2010 |
|---|---|---|---|---|---|---|
| model_a | scen_a | World | custom variable | t / a | 36.0 | 45.0 |
| model_a | scen_a | reg_a | custom variable | t / a | 24.0 | 27.0 |
| model_a | scen_a | reg_b | custom variable | t / a | 12.0 | 18.0 |
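The reduction from `EJ / a` times `t / EJ` to `t / a` can be pictured as adding the exponents of matching base units and dropping those that cancel. The sketch below illustrates that idea with plain dictionaries; it is a toy model under that assumption, the `multiply_units` helper is hypothetical, and it does not reflect how pint is implemented internally.

```python
from collections import Counter


def multiply_units(left, right):
    """Add exponents of matching base units and drop those that cancel to zero."""
    combined = Counter(left)
    for unit, exponent in right.items():
        combined[unit] += exponent
    return {unit: exp for unit, exp in combined.items() if exp != 0}


ej_per_year = {"EJ": 1, "a": -1}   # EJ / a
tonne_per_ej = {"t": 1, "EJ": -1}  # t / EJ

result = multiply_units(ej_per_year, tonne_per_ej)
print(result)  # {'a': -1, 't': 1}, i.e. t / a
```

The magnitudes are multiplied separately (here 12.0 * 3 = 36.0 for World in 2005), so only the unit bookkeeping is shown.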
4. Overriding unit handling
Even though pint is quite powerful, it does not always work as expected. For example, Mt CO2 is (strictly speaking) not a unit, but a species indicator CO2 combined with a unit.
For illustration, computing the emissions per capita will raise a pint.UndefinedUnitError.
We can override this behavior by setting ignore_units=True; in this case, the unit of the returned timeseries data will be set to unknown.
[12]:
(
df.divide(
"Emissions|CO2", "Population", "Emissions/Capita", ignore_units=True
).timeseries()
)
[12]:
| model | scenario | region | variable | unit | 2005 | 2010 |
|---|---|---|---|---|---|---|
| model_a | scen_a | World | Emissions/Capita | unknown | 3.333333 | 2.8 |
| model_a | scen_a | reg_a | Emissions/Capita | unknown | 4.000000 | 3.2 |
| model_a | scen_a | reg_b | Emissions/Capita | unknown | 2.000000 | 1.6 |
You can also pass a string as the ignore_units keyword argument. This string will then be used as the unit of the result.
Since the unit of emissions is Mt CO2 and Population is given in million, the two factors of one million cancel, so we know that the returned value should be given in tons of CO2 (per capita).
[13]:
(
df.divide(
"Emissions|CO2", "Population", "Emissions/Capita", ignore_units="t CO2"
).timeseries()
)
[13]:
| model | scenario | region | variable | unit | 2005 | 2010 |
|---|---|---|---|---|---|---|
| model_a | scen_a | World | Emissions/Capita | t CO2 | 3.333333 | 2.8 |
| model_a | scen_a | reg_a | Emissions/Capita | t CO2 | 4.000000 | 3.2 |
| model_a | scen_a | reg_b | Emissions/Capita | t CO2 | 2.000000 | 1.6 |
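The choice of `t CO2` as the unit follows from simple prefix arithmetic, which the sketch below spells out in plain Python. The helper name and the input values are hypothetical and chosen only for illustration; they are not taken from the tutorial data.

```python
# Prefix arithmetic only; this is a sketch and not part of pyam or pint.
TONNES_PER_MT = 1e6        # "Mt" means megatonnes
PERSONS_PER_MILLION = 1e6  # population is reported in million


def per_capita_tonnes(emissions_mt: float, population_million: float) -> float:
    """Return emissions per person in t CO2, given Mt CO2 and million people."""
    return (emissions_mt * TONNES_PER_MT) / (population_million * PERSONS_PER_MILLION)


# Hypothetical inputs, chosen for illustration (not taken from the tutorial data).
print(per_capita_tonnes(10.0, 4.0))  # 2.5
```

Because both scaling factors are one million, dividing the raw numbers directly gives the same result, which is why overriding the unit with a string is safe in this particular case.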
5. Working on other dimensions of timeseries data
By default, algebraic operations in pyam will work on the variable dimension. But you can pass an axis keyword argument to, for example, perform computations between scenarios or regions.
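As a rough picture of what a computation along the region dimension means, the plain-Python sketch below subtracts one region from another for every variable they share. The `subtract_regions` helper and the result name are hypothetical; in pyam the equivalent would presumably be a call such as `df.subtract("reg_a", "reg_b", "<name>", axis="region")`, where the value `"region"` for `axis` is an assumption not confirmed in this notebook.

```python
# Primary Energy values from the tutorial data, keyed by (region, variable).
data = {
    ("reg_a", "Primary Energy"): {2005: 8.0, 2010: 9.0},
    ("reg_b", "Primary Energy"): {2005: 4.0, 2010: 6.0},
}


def subtract_regions(data, a, b, name):
    """Compute region a minus region b for every variable present in both."""
    result = {}
    for (region, variable), series in data.items():
        if region != a or (b, variable) not in data:
            continue
        other = data[(b, variable)]
        result[(name, variable)] = {
            year: series[year] - other[year] for year in series
        }
    return result


difference = subtract_regions(data, "reg_a", "reg_b", "reg_a minus reg_b")
print(difference)
```

The only change compared with an operation on the variable dimension is which index level is matched and which is renamed.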
Try it!
[ ]: