Algebraic operations on timeseries data

The pyam package offers many tools to facilitate processing of scenario data. In this notebook, we illustrate algebraic operations on the timeseries data of an IamDataFrame: addition, subtraction, multiplication, and division.

The algebraic operations are (by default) “unit-aware”, meaning that pyam tries to handle units correctly. This is implemented via the iam-units package, an extension of the pint package.

The pint package natively handles conversion of standard (SI) units and commonly used equivalents (e.g., exajoule to terawatt-hours, EJ -> TWh), and it can parse combined units (e.g., exajoule per year, EJ/yr). To better support common use cases in energy systems analysis and integrated-assessment scenarios, pyam uses the iam-units registry (see IAMconsortium/units) as its default pint.UnitRegistry. This registry extends the pint defaults with a wide range of conversion factors commonly used in that domain.
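The EJ -> TWh conversion mentioned above reduces to a fixed factor. The plain-Python sketch below (not how pint computes it internally) derives that factor from 1 EJ = 10^18 J and 1 TWh = 10^12 W × 3600 s = 3.6 × 10^15 J.

```python
# Plain-Python sketch of the factor that pint applies for EJ -> TWh.
JOULE_PER_EJ = 1e18          # exa = 10^18
JOULE_PER_TWH = 1e12 * 3600  # tera = 10^12 watt, times 3600 seconds per hour


def ej_to_twh(value_ej: float) -> float:
    """Convert an energy amount from exajoule to terawatt-hours."""
    return value_ej * JOULE_PER_EJ / JOULE_PER_TWH


# The same factor applies to rates, e.g. EJ/yr -> TWh/yr.
print(round(ej_to_twh(1.0), 2))  # 277.78
```

With the iam-units registry, the equivalent pint call would be registry.Quantity(1, "EJ").to("TWh"), which should give the same value of roughly 277.78 TWh.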

Overview

  0. Import data from file and inspect the scenario

  1. A simple subtraction

  2. Multiplying timeseries data with scalars

  3. Calculating shares and dealing with units

  4. Overriding unit handling

  5. Working on other dimensions of timeseries data

See Also

The pyam package also supports aggregation and downscaling along the sectoral and regional dimensions including consistency checks. See the aggregation/downscaling tutorial notebook for more information.

0. Import data from file and inspect the scenario

The stylized scenario used in this tutorial has data for two regions (reg_a & reg_b) as well as the World aggregate, and for four categories of variables: primary energy demand, emissions, carbon price, and population.

[1]:
from pyam import IamDataFrame

df = IamDataFrame(data="tutorial_data_aggregating_downscaling.csv")
df
[INFO] 12:03:36 - pyam.core: Reading file tutorial_data_aggregating_downscaling.csv
[1]:
<class 'pyam.core.IamDataFrame'>
Index:
 * model    : model_a (1)
 * scenario : scen_a (1)
Timeseries data coordinates:
   region   : World, reg_a, reg_b (3)
   variable : Emissions|CO2, Emissions|CO2|AFOLU, ... Primary Energy|Wind (9)
   unit     : EJ/yr, Mt CO2, USD/t CO2, million (4)
   year     : 2005, 2010 (2)
[2]:
df.variable
[2]:
['Emissions|CO2',
 'Emissions|CO2|AFOLU',
 'Emissions|CO2|Bunkers',
 'Emissions|CO2|Energy',
 'Population',
 'Price|Carbon',
 'Primary Energy',
 'Primary Energy|Coal',
 'Primary Energy|Wind']
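The data of an IamDataFrame can be viewed in the long IAMC format (via its data attribute), with one row per model, scenario, region, variable, unit and year. The timeseries() method used in the cells below returns a wide layout with one column per year instead. This pandas sketch mimics that reshaping on a small hand-typed subset; it is an illustration, not pyam's implementation.

```python
import pandas as pd

# Hand-typed subset in long IAMC format (values copied from the
# Primary Energy rows for World and reg_a shown in this notebook).
long = pd.DataFrame(
    {
        "model": "model_a",
        "scenario": "scen_a",
        "region": ["World", "World", "reg_a", "reg_a"],
        "variable": "Primary Energy",
        "unit": "EJ/yr",
        "year": [2005, 2010, 2005, 2010],
        "value": [12.0, 15.0, 8.0, 9.0],
    }
)

# Move the year dimension into the columns, similar to IamDataFrame.timeseries().
index = ["model", "scenario", "region", "variable", "unit"]
wide = long.set_index(index + ["year"])["value"].unstack("year")
print(wide)
```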

1. A simple subtraction

We first display the existing variables Primary Energy and Primary Energy|Coal.

[3]:
df.filter(variable=["Primary Energy", "Primary Energy|Coal"]).timeseries()
[3]:
model   scenario region variable            unit   2005  2010
model_a scen_a   World  Primary Energy      EJ/yr  12.0  15.0
                        Primary Energy|Coal EJ/yr   9.0  10.0
                 reg_a  Primary Energy      EJ/yr   8.0   9.0
                        Primary Energy|Coal EJ/yr   6.0   6.0
                 reg_b  Primary Energy      EJ/yr   4.0   6.0
                        Primary Energy|Coal EJ/yr   3.0   4.0

Now, we subtract fossil fuels (coal) from the total to see non-fossil energy use, and display the timeseries in wide format.

All algebraic-operations functions follow the syntax:

df.<method>(a, b, c) => a <op> b = c

Note that in simple cases, pyam will try to keep the unit consistent during the operation.
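Conceptually, df.subtract(a, b, c) pairs each row of a with the row of b that agrees on all other dimensions (model, scenario, region, year) and stores the difference under the new variable name c. The rough pandas sketch below shows that idea on the wide data from the previous output. Units are ignored, and subtract_variables is a hypothetical helper, not part of pyam.

```python
import pandas as pd

# Wide-format data from the [3] output above (model, scenario and unit omitted).
wide = pd.DataFrame(
    {
        2005: [12.0, 9.0, 8.0, 6.0, 4.0, 3.0],
        2010: [15.0, 10.0, 9.0, 6.0, 6.0, 4.0],
    },
    index=pd.MultiIndex.from_product(
        [["World", "reg_a", "reg_b"], ["Primary Energy", "Primary Energy|Coal"]],
        names=["region", "variable"],
    ),
)


def subtract_variables(data, a, b, c):
    """Hypothetical helper: compute c = a - b for every region and year."""
    left = data.xs(a, level="variable")   # rows of a, indexed by region
    right = data.xs(b, level="variable")  # rows of b, indexed by region
    result = left - right                 # pandas aligns on region and year
    # Attach the new variable name c as the second index level again.
    return result.assign(variable=c).set_index("variable", append=True)


non_fossil = subtract_variables(
    wide, "Primary Energy", "Primary Energy|Coal", "Primary Energy|Non-Fossil"
)
print(non_fossil)
```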

[4]:
(
    df.subtract(
        "Primary Energy", "Primary Energy|Coal", "Primary Energy|Non-Fossil"
    ).timeseries()
)
[4]:
model   scenario region variable                  unit   2005  2010
model_a scen_a   World  Primary Energy|Non-Fossil EJ/yr   3.0   5.0
                 reg_a  Primary Energy|Non-Fossil EJ/yr   2.0   3.0
                 reg_b  Primary Energy|Non-Fossil EJ/yr   1.0   2.0

We can also merge newly computed timeseries directly into the original IamDataFrame using the keyword argument append=True.

The new variable Primary Energy|Non-Fossil is then part of the variable list.

[5]:
(
    df.subtract(
        "Primary Energy",
        "Primary Energy|Coal",
        "Primary Energy|Non-Fossil",
        append=True,
    )
)
[6]:
df.variable
[6]:
['Emissions|CO2',
 'Emissions|CO2|AFOLU',
 'Emissions|CO2|Bunkers',
 'Emissions|CO2|Energy',
 'Population',
 'Price|Carbon',
 'Primary Energy',
 'Primary Energy|Coal',
 'Primary Energy|Non-Fossil',
 'Primary Energy|Wind']
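With append=True, the operation modifies df in place instead of returning a new IamDataFrame. This is why cell [5] displays no output. In long-format pandas terms, appending is roughly a concatenation of the computed rows to the original table, as in this sketch (illustration only, World region only, not pyam's code):

```python
import pandas as pd

# Long-format table (one row per region/variable/year), World region only.
data = pd.DataFrame(
    {
        "region": "World",
        "variable": ["Primary Energy"] * 2 + ["Primary Energy|Coal"] * 2,
        "year": [2005, 2010, 2005, 2010],
        "value": [12.0, 15.0, 9.0, 10.0],
    }
)

# Compute the new variable from the two existing ones, aligned on region and year.
keys = ["region", "year"]
total = data[data["variable"] == "Primary Energy"].set_index(keys)["value"]
coal = data[data["variable"] == "Primary Energy|Coal"].set_index(keys)["value"]
new = (total - coal).reset_index().assign(variable="Primary Energy|Non-Fossil")

# "Append": concatenate the new rows to the original table.
data = pd.concat([data, new], ignore_index=True)
print(sorted(data["variable"].unique()))
```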

2. Multiplying timeseries data with scalars

The algebraic operations work not only on items in the IamDataFrame; you can also pass scalars.

In more elaborate computations, pyam may change the notation of the units because the resulting unit is formatted by the pint package. In the example below, EJ/yr is changed to EJ / a, where a (annum) is pint's symbol for year.

[7]:
df.multiply("Primary Energy", 3, "PE * 3").timeseries()
[7]:
model   scenario region variable unit    2005  2010
model_a scen_a   World  PE * 3   EJ / a  36.0  45.0
                 reg_a  PE * 3   EJ / a  24.0  27.0
                 reg_b  PE * 3   EJ / a  12.0  18.0
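Multiplying by a dimensionless scalar scales every value of the selected variable and leaves its unit unchanged. In plain pandas, on the Primary Energy values from the [3] output (a sketch, not pyam's implementation):

```python
import pandas as pd

# Primary Energy (EJ/yr) in wide format, indexed by region.
pe = pd.DataFrame(
    {2005: [12.0, 8.0, 4.0], 2010: [15.0, 9.0, 6.0]},
    index=pd.Index(["World", "reg_a", "reg_b"], name="region"),
)

# A dimensionless scalar scales every value; the unit (EJ/yr) stays the same.
pe_times_3 = pe * 3
print(pe_times_3)
```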

You can also define a pint.Quantity from the iam-units registry and use it in the calculation. Note that pyam will try to simplify the resulting unit: here, EJ cancels out, so EJ/yr multiplied by t / EJ yields t / a.

[8]:
from iam_units import registry

q = registry.Quantity(3, "t / EJ")
df.multiply("Primary Energy", q, "custom variable").timeseries()
[8]:
model   scenario region variable        unit   2005  2010
model_a scen_a   World  custom variable t / a  36.0  45.0
                 reg_a  custom variable t / a  24.0  27.0
                 reg_b  custom variable t / a  12.0  18.0
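To see why EJ cancels, here is a toy model of unit arithmetic that represents a unit as a mapping from symbols to exponents. This only illustrates the idea. pint's actual implementation also handles prefixes, conversion factors and formatting options. The names Unit, multiply_units and format_unit are hypothetical.

```python
from collections import Counter

# Toy representation: a unit maps symbols to exponents, e.g. EJ / a -> {"EJ": 1, "a": -1}.
Unit = dict[str, int]


def multiply_units(u: Unit, v: Unit) -> Unit:
    """Add exponents symbol by symbol and drop symbols whose exponent becomes zero."""
    total = Counter(u)
    total.update(v)  # Counter.update adds counts, including negative ones
    return {sym: exp for sym, exp in total.items() if exp != 0}


def format_unit(u: Unit) -> str:
    """Format as 'num / den1 / den2' (exponents other than +/-1 are not handled)."""
    num = " * ".join(s for s, e in u.items() if e > 0) or "1"
    den = " / ".join(s for s, e in u.items() if e < 0)
    return f"{num} / {den}" if den else num


energy_rate = {"EJ": 1, "a": -1}  # EJ/yr, written with pint's year symbol "a"
factor = {"t": 1, "EJ": -1}       # the t / EJ quantity from the cell above
print(format_unit(multiply_units(energy_rate, factor)))  # t / a
```

The same toy rules reproduce the per-capita unit EJ / a / million that appears in the next section when EJ / a is divided by million.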

3. Calculating shares and dealing with units

As a next step, we calculate the primary energy use per capita.

[9]:
(df.divide("Primary Energy", "Population", "Energy/Capita").timeseries())
[9]:
model   scenario region variable      unit                  2005  2010
model_a scen_a   World  Energy/Capita EJ / a / million  4.000000   3.0
                 reg_a  Energy/Capita EJ / a / million  5.333333   3.6
                 reg_b  Energy/Capita EJ / a / million  2.666667   2.4
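The division pairs Primary Energy and Population by region and year before dividing. Here is a pandas sketch of that alignment. The population values (in million) are not shown in this notebook; they are back-calculated from the Energy/Capita output above and serve only as illustration.

```python
import pandas as pd

regions = pd.Index(["World", "reg_a", "reg_b"], name="region")
# Primary Energy in EJ/yr (from the [3] output above).
energy = pd.DataFrame({2005: [12.0, 8.0, 4.0], 2010: [15.0, 9.0, 6.0]}, index=regions)
# Population in million, back-calculated from the Energy/Capita output (illustrative).
population = pd.DataFrame({2005: [3.0, 1.5, 1.5], 2010: [5.0, 2.5, 2.5]}, index=regions)

# Element-wise division, aligned on region (rows) and year (columns).
per_capita = energy / population  # unit: EJ/yr per million people
print(per_capita.round(6))
```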

As illustrated above, the notation of the units may be changed during the computation.

If you prefer a different notation for the returned unit, you can relabel it using the rename() function.

[10]:
(
    df.divide("Primary Energy", "Population", "Energy/Capita")
    .rename(unit={"EJ / a / million": "EJ/yr/million"})
    .timeseries()
)
[10]:
model   scenario region variable      unit               2005  2010
model_a scen_a   World  Energy/Capita EJ/yr/million  4.000000   3.0
                 reg_a  Energy/Capita EJ/yr/million  5.333333   3.6
                 reg_b  Energy/Capita EJ/yr/million  2.666667   2.4

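Or you can use the convert_unit() function; see the unit conversion tutorial notebook for more information. Because million is treated as a pure number (10^6), the result is the primary energy use per person in GWh/yr.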

[11]:
(
    df.divide("Primary Energy", "Population", "Energy/Capita")
    .convert_unit("EJ / a / million", "GWh/yr")
    .timeseries()
)
[11]:
model   scenario region variable      unit        2005      2010
model_a scen_a   World  Energy/Capita GWh/yr  1.111111  0.833333
                 reg_a  Energy/Capita GWh/yr  1.481481  1.000000
                 reg_b  Energy/Capita GWh/yr  0.740741  0.666667
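The factor behind this conversion can be checked by hand. 1 EJ = 10^18 J, 1 GWh = 10^9 W × 3600 s = 3.6 × 10^12 J, and million is the pure number 10^6. So 1 EJ/yr/million = 10^18 / (3.6 × 10^12 × 10^6) GWh/yr ≈ 0.2778 GWh/yr. A plain-Python check against the 2005 values above:

```python
JOULE_PER_EJ = 1e18
JOULE_PER_GWH = 1e9 * 3600  # giga = 10^9 watt, times 3600 seconds
MILLION = 1e6               # "million" is a pure number

factor = JOULE_PER_EJ / JOULE_PER_GWH / MILLION  # EJ/yr/million -> GWh/yr

# 2005 values in EJ/yr/million from the Energy/Capita output above.
for region, value in {"World": 4.0, "reg_a": 5.333333, "reg_b": 2.666667}.items():
    print(region, round(value * factor, 6))
```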

4. Overriding unit handling

Even though pint is quite powerful, it does not always work as expected. For example, Mt CO2 is (strictly speaking) not a unit, but a species indicator CO2 combined with a unit.

For illustration, computing the emissions per capita will raise a pint.UndefinedUnitError.

We can override this behavior by setting ignore_units=True; in this case, the unit of the returned timeseries data will be set to unknown.

[12]:
(
    df.divide(
        "Emissions|CO2", "Population", "Emissions/Capita", ignore_units=True
    ).timeseries()
)
[12]:
model   scenario region variable         unit         2005  2010
model_a scen_a   World  Emissions/Capita unknown  3.333333   2.8
                 reg_a  Emissions/Capita unknown  4.000000   3.2
                 reg_b  Emissions/Capita unknown  2.000000   1.6

You can also pass a string as the ignore_units keyword argument. Then, this string will be used as unit.

Since emissions are given in Mt CO2 and Population in million, the returned value is in tonnes of CO2 per person (10^6 t / 10^6 = 1 t), so we can pass "t CO2" as the unit.

[13]:
(
    df.divide(
        "Emissions|CO2", "Population", "Emissions/Capita", ignore_units="t CO2"
    ).timeseries()
)
[13]:
model   scenario region variable         unit       2005  2010
model_a scen_a   World  Emissions/Capita t CO2  3.333333   2.8
                 reg_a  Emissions/Capita t CO2  4.000000   3.2
                 reg_b  Emissions/Capita t CO2  2.000000   1.6
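As the two outputs above show, the values are identical whether ignore_units is True or a string: the string only sets the unit label, so it is up to you to make sure the label is correct. A plain-Python check of the Mt CO2 / million arithmetic (mt_per_million_to_t_per_person is a hypothetical helper for illustration):

```python
TONNE_PER_MT = 1e6  # 1 Mt (megatonne) = 10^6 t
MILLION = 1e6       # "million" people is a pure number


def mt_per_million_to_t_per_person(value: float) -> float:
    """Convert a ratio in Mt CO2 per million people to t CO2 per person."""
    return value * TONNE_PER_MT / MILLION


# The factor is exactly 1, so the raw ratio can be labelled "t CO2" as-is.
print(mt_per_million_to_t_per_person(3.333333))
```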

5. Working on other dimensions of timeseries data

By default, algebraic operations in pyam work along the variable dimension, but you can pass an axis keyword argument to, for example, perform computations between scenarios or regions.
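As a sketch of what an operation along the region axis computes, the pandas code below takes World minus reg_a minus reg_b for Primary Energy, using the values from the [3] output. A non-zero result would indicate that the regional data do not add up to the World total. In pyam itself, a comparable call would presumably look like df.subtract("World", "reg_a", "World excl. reg_a", axis="region"); treat that exact call as an assumption and check it against the pyam API documentation.

```python
import pandas as pd

# Primary Energy (EJ/yr) with regions as columns and years as rows (from the [3] output).
pe = pd.DataFrame(
    {"World": [12.0, 15.0], "reg_a": [8.0, 9.0], "reg_b": [4.0, 6.0]},
    index=pd.Index([2005, 2010], name="year"),
)

# Operate "along the region axis": the operands are regions instead of variables.
residual = pe["World"] - pe["reg_a"] - pe["reg_b"]
print(residual)  # all zeros: the two regions add up to the World total
```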

Try it!

[ ]: