{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# First steps with the pyam package\n", "\n", "## Scope and feature overview\n", "\n", "The **pyam** package provides a range of diagnostic tools and functions\n", "for analyzing, visualizing and working with timeseries data following the format established by the *Integrated Assessment Modeling Consortium* ([IAMC](https://www.iamconsortium.org)).\n", "\n", "The format has been used in several IPCC assessments and numerous model comparison exercises.\n", "An illustrative example of this format template is shown below;\n", "[read the docs](https://pyam-iamc.readthedocs.io/en/stable/data.html) for more information.\n", "\n", "\n", "\n", "| **Model** | **Scenario** | **Region** | **Variable** | **Unit** | **2005** | **2010** | **2015** |\n", "|-----------|--------------|------------|----------------|----------|----------|----------|----------|\n", "| MESSAGE | CD-LINKS 400 | World | Primary Energy | EJ/y | 462.5 | 500.7 | ... |\n", "\n", "This notebook illustrates the basic functionality of the **pyam** package\n", "and the **IamDataFrame** class:\n", "\n", "0. Load timeseries data from a snapshot file and inspect the scenario ensemble\n", "1. Apply filters to the ensemble and display the timeseries data \n", " as [pandas.DataFrame](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.html)\n", "2. Visualize timeseries data using the plotting library based on the [matplotlib](https://matplotlib.org/) package\n", "3. Perform scenario diagnostic and validation checks\n", "4. Categorize scenarios according to timeseries data values\n", "5. Compute quantitative indicators for further scenario characterization & diagnostics\n", "6. Export data and categorization to a file\n", "\n", "\n", "## Read the docs\n", "\n", "A comprehensive documentation is available at [pyam-iamc.readthedocs.io](http://pyam-iamc.readthedocs.io)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Tutorial data\n", "\n", "The timeseries data used in this tutorial is a partial snapshot of the scenario ensemble\n", "compiled for the IPCC's *Special Report on Global Warming of 1.5°C* ([SR15](http://ipcc.ch/sr15/)).\n", "The complete scenario ensemble data is publicly available from the [IAMC 1.5°C Scenario Explorer and Data hosted by IIASA](https://data.ece.iiasa.ac.at/iamc-1.5c-explorer). \n", "\n", "Please read the [License](https://data.ece.iiasa.ac.at/iamc-1.5c-explorer/#/license) page of the IAMC 1.5°C Scenario Explorer before using the full scenario data for scientific analysis or other work.\n", "\n", "\n", "\n", "### Scenarios in the tutorial data\n", "\n", "The data used for this tutorial consists of selected variables from these sources:\n", "\n", " - an ensemble of scenarios from the *Horizon 2020* [CD-LINKS](https://www.cd-links.org) project \n", " - the \"Faster Transition Scenario\" from the IEA's [World Energy Outlook 2017](https://www.oecd-ilibrary.org/energy/world-energy-outlook-2017_weo-2017-en),\n", " - the \"1.0\" scenario submitted by the GENeSYS-MOD team ([Löffler et al., 2017](https://doi.org/10.3390/en10101468))\n", "\n", "Please refer to the [About](https://data.ece.iiasa.ac.at/iamc-1.5c-explorer/#/about) page of the *IAMC 1.5°C Scenario Explorer* for references and additional information.\n", "\n", "
\n", "\n", "The data used here is a partial snapshot of the IAMC 1.5°C Scenario Data! \n", "This tutorial is only intended as an illustration of the **pyam** package.\n", "\n", "
\n", "\n", "### Citation of the scenario ensemble\n", "\n", "> D. Huppmann, E. Kriegler, V. Krey, K. Riahi, J. Rogelj, K. Calvin, F. Humpenoeder, A. Popp, S. K. Rose, J. Weyant, et al. \n", "> *IAMC 1.5°C Scenario Explorer and Data hosted by IIASA* (release 2.0) \n", "> Integrated Assessment Modeling Consortium & International Institute for Applied Systems Analysis, 2019. \n", "> doi: [10.5281/zenodo.3363345](https://doi.org/10.5281/zenodo.3363345) | url: [data.ece.iiasa.ac.at/iamc-1.5c-explorer](https://data.ece.iiasa.ac.at/iamc-1.5c-explorer)\n", "\n", "***" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Import data from file and inspect the scenario\n", "\n", "We import the snapshot of the timeseries data from the file ``tutorial_data.csv``.\n", "\n", "
\n", "\n", "If you haven't cloned the **pyam** GitHub repository to your machine, you can download the data file\n", "from the folder [docs/tutorials](https://github.com/IAMconsortium/pyam/tree/main/docs/tutorials). \n", "Make sure to place the data file in the same folder as this notebook.\n", "\n", "
" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import pyam\n", "\n", "df = pyam.IamDataFrame(data=\"tutorial_data.csv\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As a first step, we show an overview of the **IamDataFrame** content by simply calling `df` (alternatively, you can use `print(df)` or [df.info()](https://pyam-iamc.readthedocs.io/en/stable/api/iamdataframe.html#pyam.IamDataFrame.info)).\n", "\n", "This function returns a concise (abbreviated) overview of the index dimensions and the qualitative/quantitative meta indicators (see an explanation of indicators below)." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "df" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In the following cells, we display the lists of all models, scenarios, regions, and the mapping of variables to units in the snapshot." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "df.model" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "df.scenario" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "df.region" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "df.unit_mapping" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Apply filters to the ensemble and display the timeseries data\n", "\n", "A selection of the timeseries data of an **IamDataFrame** can be obtained by applying the [filter()](https://pyam-iamc.readthedocs.io/en/stable/api/iamdataframe.html#pyam.IamDataFrame.filter) function,\n", "which takes keyword-arguments of criteria.\n", "The function returns a down-selected clone of the **IamDataFrame** instance.\n", "\n", "### Filtering by model names, scenarios and regions\n", "\n", "The feature for filtering by **model, scenario or region** \n", "are implemented using exact string matching, where ``*`` can be used as a wildcard.\n", "\n", "First, we want to display the list of all scenarios submitted by the **MESSAGE** modeling team.\n", "\n", "> Applying the filter argument ``model='MESSAGE'`` will return an empty array \n", "> (because the **MESSAGE** model in the tutorial data is actually called **MESSAGEix-GLOBIOM 1.0**)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "df.filter(model=\"MESSAGE\").scenario" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "> Filtering for ``model='MESSAGE*'`` will return all scenarios provided by the **MESSAGEix-GLOBIOM 1.0** model" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "df.filter(model=\"MESSAGE*\").scenario" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Inverting the selection\n", "\n", "Using the keyword `keep=False` allows you to select the inverse of the filter arguments." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "df.filter(region=\"World\").region" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "df.filter(region=\"World\", keep=False).region" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Filtering by variables and levels\n", "\n", "Filtering for **variable** strings works in an identical way as above,\n", "with ``*`` available as a wildcard.\n", "\n", "> Filtering for ``Primary Energy`` will return only exactly those data\n", "\n", "> Filtering for ``Primary Energy|*`` will return all sub-categories of \n", "> primary energy (and only the sub-categories)\n", "\n", "In addition, variables can be filtered by their **level**,\n", "i.e., the \"depth\" of the variable in a hierarchical reading of the string separated by `|` (*pipe*, not L or i).\n", "That is, the variable ``Primary Energy`` has level 0, while ``Primary Energy|Fossil`` has level 1.\n", "\n", "Filtering by both **variables** and **level** will search for the hierarchical depth \n", "_following the variable string_ so filter arguments ``variable='Primary Energy*'`` and ``level=1``\n", "will return all variables immediately below ``Primary Energy``.\n", "Filtering by **level** only will return all variables at that depth." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "df.filter(variable=\"Primary Energy*\", level=1).variable" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The next cell illustrates another use case of the **level** filter argument - filtering by `1-` (as string) instead of `1` (as integer) will return all timeseries data for variables *up to* the specified depth." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "df.filter(variable=\"Primary Energy*\", level=\"1-\").variable" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The last cell shows how to filter only by **level** without providing a **variable** argument.\n", "The example returns all variables that are at the second hierarchical level (i.e., not ``Primary Energy``)." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "df.filter(level=1).variable" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Displaying timeseries data\n", "\n", "As a next step, we want to view a selection of the timeseries data.\n", "\n", "
\n", "\n", "The [timeseries()](https://pyam-iamc.readthedocs.io/en/stable/api/iamdataframe.html#pyam.IamDataFrame.timeseries) function\n", "returns the data as a [pandas.DataFrame](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.html) in the standard IAMC template. \n", "This is a **wide format** table where years are shown as columns.\n", "\n", "
" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "display_df = df.filter(model=\"MESSAGE*\", variable=\"Primary Energy\", region=\"World\")\n", "display_df.timeseries()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Filtering by year\n", "\n", "Filtering for **years** can be done by one integer value, a list of integers, or the Python class [range](https://docs.python.org/3/library/stdtypes.html#ranges).\n", "\n", "
\n", "\n", "The last year of a range is not included, so `range(2010, 2015)`
\n", "is interpreted as `[2010, 2011, 2012, 2013, 2014]`.\n", "\n", "
\n", "\n", "The next cell shows the same down-selected **IamDataFrame** as above, but further reduced to three timesteps." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "display_df.filter(year=[2010, 2030, 2050]).timeseries()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Parallels to the *pandas* data analysis toolkit\n", "\n", "When developing **pyam**, we followed the syntax of the Python package **pandas** ([read the docs](https://pandas.pydata.org)) closely where possible. In many cases, you can use similar functions directly on the **IamDataFrame**.\n", "\n", "In the next cell, we illustrate this parallel behaviour. The function [pyam.IamDataFrame.head()](https://pyam-iamc.readthedocs.io/en/stable/api/iamdataframe.html#pyam.IamDataFrame.head) is similar to [pandas.DataFrame.head()](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.head.html): \n", "it returns the first n rows of the 'data' table in **long format** (columns are in year/value format).\n", "\n", "Similar to the [timeseries()](https://pyam-iamc.readthedocs.io/en/stable/api/iamdataframe.html#pyam.IamDataFrame.timeseries) function shown above, the returned object of [head()](https://pyam-iamc.readthedocs.io/en/stable/api/iamdataframe.html#pyam.IamDataFrame.head) is a [pandas.DataFrame](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.html)." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "display_df.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Getting help\n", "\n", "When in doubt, you can look at the help for any function by appending a ``?``." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "df.filter?" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Visualize timeseries data using the plotting library\n", "\n", "This section provides an illustrative example of the plotting features of the **pyam** package.\n", "\n", "In the next cell, we show a simple line plot of global CO2 emissions. The colours are assigned randomly by default, and **pyam** deactivates the legend if there are too many lines." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "df.filter(variable=\"Emissions|CO2\", region=\"World\").plot()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Most functions of the plotting library also take some intuitive keyword arguments for better styling options or using the same colors across groups of scenarios. For example, `color='scenario'` will use consistent colors for each scenario name (most of them implemented by multiple modeling frameworks).\n", "There are now less than 13 colors used, so the legend will be shown by default." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "df.filter(variable=\"Emissions|CO2\", region=\"World\").plot(color=\"scenario\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The section on categorization will show more options of the plotting features, as well as a method to set specific colors for different categories. For more information, look at the other tutorials and the [plotting gallery](https://pyam-iamc.readthedocs.io/en/stable/gallery/index.html)." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Perform scenario diagnostic and validation checks\n", "\n", "When analyzing scenario results, it is often useful to check whether certain timeseries data exist or the values are within a specific range.\n", "For example, it may make sense to ensure that reported data for historical periods are close to established reference data or that near-term developments are reasonable.\n", "\n", "Before diving into the diagnostics and validation features, we need to briefly introduce the **meta** attribute.\n", "This attribute of an **IamDataFrame** is a [pandas.DataFrame](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.html), which can be used to store categorization information and quantitative indicators of each model-scenario.\n", "In addition, an **IamDataFrame** has an attribubte **exclude**, which is a [pandas.Series](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.html) set to `False` for all model-scenario combinations.\n", "\n", "The next cell shows the first 10 rows of the 'meta' table." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "df.meta.head(10)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The following section provides three illustrations of the diagnostic tools:\n", "0. Verify that a timeseries `Primary Energy` exists in each scenario\n", " (in at least one year and, in a second step, in the last year of the horizon).\n", "1. Validate whether scenarios deviate by more than 10% from the `Primary Energy` reference data reported in the *IEA Energy Statistics* in 2010.\n", "2. Use the `exclude_on_fail` option of the validation function to create a sub-selection of the scenario ensemble.\n", "\n", "For simplicity, the example in this section operates on a down-selected data ensemble that only contains global values." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "df_world = df.filter(region=\"World\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Check for required variables\n", "\n", "We first use the [require_data()](https://pyam-iamc.readthedocs.io/en/stable/api/iamdataframe.html#pyam.IamDataFrame.require_data) function to assert that the scenarios contain data for the expected timeseries.\n", "\n", "This method returns the list of scenarios and data rows where any combination of the required arguments are missing. \n", "If the method returns **None**, all required data are present." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "df_world.require_data(variable=\"Primary Energy\")" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "df_world.require_data(variable=\"Primary Energy\", year=2100)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The two cells above show that all scenarios report primary-energy data, but not all scenarios provide this timeseries until the end of the century." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Validate numerical values in the timeseries data\n", "\n", "The [validate()](https://pyam-iamc.readthedocs.io/en/stable/api/iamdataframe.html#pyam.IamDataFrame.validate) function performs checks on specific values of timeseries data. 
It takes filter arguments (e.g., `variable`, `year`) to select the timeseries data to be checked, and an upper and/or lower bound (`upper_bound`, `lower_bound`) defining the valid range - all scenarios with a value in at least one year outside that range are considered to *not satisfy* the validation. The function returns the data points that do not satisfy the criteria.\n", "\n", "According to the [IEA Energy Statistics](https://www.iea.org/statistics/), *Total Primary Energy Supply* was ~540 EJ in 2010. In the next cell, we show all data points that deviate (downwards) by more than 10% from this reference value." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "df_world.validate(variable=\"Primary Energy\", year=2010, lower_bound=540 * 0.9)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Use the `exclude_on_fail` feature to create a sub-selection of the scenario ensemble\n", "\n", "By default, the functions above only report how many scenarios or which data points do not satisfy the validation criteria.\n", "However, they also have an `exclude_on_fail` option, which sets the **exclude** attribute to `True` for all scenarios failing the validation.\n", "This feature can be particularly helpful when a user wants to perform a number of validation steps and then efficiently remove all scenarios violating any of the criteria as part of a scripted workflow.\n", "\n", "We illustrate a simple validation workflow using CO2 emissions. The next cell shows the trajectories of CO2 emissions across all scenarios." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "df_world.filter(variable=\"Emissions|CO2\").plot()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The next two cells perform validation to exclude all scenarios that have implausibly low emissions in 2020 (i.e., unrealistic near-term behaviour) as well as those that do not reduce emissions over the century (i.e., exceed a value of 45,000 Mt CO2 in any year)." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "df_world.validate(\n", " variable=\"Emissions|CO2\", year=2020, lower_bound=38000, exclude_on_fail=True\n", ")" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "df_world.validate(variable=\"Emissions|CO2\", upper_bound=45000, exclude_on_fail=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can select all scenarios that have *not* been marked to be excluded by adding `exclude=False` to the [filter()](https://pyam-iamc.readthedocs.io/en/stable/api/iamdataframe.html#pyam.IamDataFrame.filter) statement.\n", "\n", "To highlight the difference between the full scenario set and the reduced scenario set based on the validation exclusions, the next cell puts the two plots side by side with a shared y-axis." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import matplotlib.pyplot as plt\n", "\n", "fig, ax = plt.subplots(1, 2, figsize=(8, 4), sharey=True)\n", "\n", "df_world_co2 = df_world.filter(variable=\"Emissions|CO2\")\n", "\n", "df_world_co2.plot(ax=ax[0])\n", "df_world_co2.filter(exclude=False).plot(ax=ax[1])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Categorize scenarios according to timeseries data values\n", "\n", "It is often useful to categorize scenarios into classes according to specific characteristics of their timeseries data.
In the following example, we use the median global mean temperature assessment (computed using MAGICC 6 in the AR5 configuration) to categorize scenarios by their warming by the end of the century (year 2100).\n", "\n", "### Cleaning up a scenario ensemble for simpler processing\n", "\n", "When displaying the list of variables in the scenario ensemble earlier, you probably noticed that the variable for the temperature assessment had a rather unwieldy name: `AR5 climate diagnostics|Temperature|Global Mean|MAGICC6|MED`.\n", "\n", "To simplify further processing, we use the [rename()](https://pyam-iamc.readthedocs.io/en/stable/api/iamdataframe.html#pyam.IamDataFrame.rename) function to change the variable of this timeseries data to `Temperature`. By adding the argument `inplace=True`, the renaming is performed directly on the **IamDataFrame** rather than returning a copy with the change." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "df.rename(\n", " variable={\n", " \"AR5 climate diagnostics|Temperature|Global Mean|MAGICC6|MED\": \"Temperature\"\n", " },\n", " inplace=True,\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In the next cell, we display the list of variables again to verify that the renaming was successful." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "df.variable" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now, we display the timeseries data of the warming outcome as a line plot. This helps to decide where to set the thresholds for the categories." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "df.filter(variable=\"Temperature\").plot()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Categorization assignment\n", "\n", "We now use the categorization feature to group scenarios by their temperature outcome by the end of the century.\n", "\n", "The first cell sets the `Temperature` categorization to the default 'uncategorized'.\n", "This is not necessary per se (setting a meta column via the categorization will mark all non-assigned rows as 'uncategorized' (if the value is a string) or [np.nan](https://numpy.org/devdocs/reference/constants.html#numpy.NAN).\n", "However, having this cell may be helpful in this tutorial notebook if you are going back and forth between cells to reset the assignment.\n", "\n", "The function [categorize()](https://pyam-iamc.readthedocs.io/en/stable/api/iamdataframe.html#pyam.IamDataFrame.categorize) takes `color` and similar arguments, which can then be used by the plotting library." 
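] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Before assigning the categories, it can be useful to check which scenarios actually report the `Temperature` variable at the end of the century - any scenario without a value in 2100 will remain 'uncategorized'. The next cell is an optional sketch that re-uses the [require_data()](https://pyam-iamc.readthedocs.io/en/stable/api/iamdataframe.html#pyam.IamDataFrame.require_data) function introduced above." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# optional check: list scenarios that do not report 'Temperature' in 2100\n", "df.require_data(variable=\"Temperature\", year=2100)"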
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "df.set_meta(meta=\"uncategorized\", name=\"warming-category\")" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "df.categorize(\n", " \"warming-category\",\n", " \"below 1.6C\",\n", " criteria={\"Temperature\": {\"up\": 1.6, \"year\": 2100}},\n", " color=\"xkcd:baby blue\",\n", ")" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "df.categorize(\n", " \"warming-category\",\n", " \"below 2.5C\",\n", " criteria={\"Temperature\": {\"up\": 2.5, \"lo\": 1.6, \"year\": 2100}},\n", " color=\"xkcd:green\",\n", ")" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "df.categorize(\n", " \"warming-category\",\n", " \"below 3.5C\",\n", " criteria={\"Temperature\": {\"up\": 3.5, \"lo\": 2.5, \"year\": 2100}},\n", " color=\"xkcd:goldenrod\",\n", ")" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "df.categorize(\n", " \"warming-category\",\n", " \"above 3.5C\",\n", " criteria={\"Temperature\": {\"lo\": 3.5, \"year\": 2100}},\n", " color=\"xkcd:crimson\",\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Apply categories to timeseries analysis\n", "\n", "Now, we again display the median global temperature increase for all scenarios, but we use the colouring by category to illustrate the common characteristics across scenarios." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "df.filter(variable=\"Temperature\").plot(color=\"warming-category\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As a last step, we display the aggregate CO2 emissions, but apply the color scheme of the categorization by temperature. This allows to highlight alternative pathways within the same category.\n", "\n", "Note that the emissions plot also includes one `uncategorized` scenario. The `GENeSYS-MOD` scenario did not provide timeseries data until the end of the century and hence could not be assessed for its warming outcome with MAGICC6 in the SR15 process." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "df.filter(variable=\"Emissions|CO2\", region=\"World\").plot(color=\"warming-category\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Compute quantitative indicators for further scenario characterization & diagnostics\n", "\n", "In the previous section, we classified scenarios in distinct groups by their end-of-century warming outcome. In other use cases, however, it may be of interest to easily derive quantitative indicators and use those for more detailed scenario assessment.\n", "\n", "In this section, we illustrate two ways to add quantitative indicators.\n", "First, we add two indicators derived directly from timeseries data: the warming at the end of the century (`end-of-century-temperature`) and the peak temperature over the entire model horizon (`peak-temperature`).\n", "For the end-of-century indicator, we can pass the year of relevant as a filter argument to the [set_meta_from_data()](https://pyam-iamc.readthedocs.io/en/stable/api/iamdataframe.html#pyam.IamDataFrame.set_meta_from_data) function." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "eoc = \"end-of-century-temperature\"\n", "df.set_meta_from_data(name=eoc, variable=\"Temperature\", year=2100)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If the filter arguments passed to [set_meta_from_data()](https://pyam-iamc.readthedocs.io/en/stable/api/iamdataframe.html#pyam.IamDataFrame.set_meta_from_data) do not yield a unique value (in this case without a specific year), we can pass a `method` to aggregate or select a specific value (e.g., the maximum using the **numpy** package - [read the docs](https://numpy.org))." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "\n", "peak = \"peak-temperature\"\n", "df.set_meta_from_data(name=peak, variable=\"Temperature\", method=np.max)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The second method to define quantitative indicators is the function [set_meta()](https://pyam-iamc.readthedocs.io/en/stable/api/iamdataframe.html#pyam.IamDataFrame.set_meta). It can take any [pandas.Series](https://pandas.pydata.org/pandas-docs/stable/reference/series.html) with an index including `model` and `scenario`.\n", "\n", "In the example, we can now easily derive the \"overshoot\", i.e., the reduction in global temperature after the peak,\n", "by computing the difference between the two quantitative indicators." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "overshoot = df.meta[peak] - df.meta[eoc]\n", "overshoot.head()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "df.set_meta(name=\"overshoot\", meta=overshoot)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As a last step of this illustrative example, we again display the first 10 rows of the **meta** attribute of the **IamDataFrame**.\n", "The **meta** column seen in cell 20 now includes columns with the three quantitative indicators." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "df.meta.head(10)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Export data and categorization to a file using the IAMC template\n", "\n", "The **IamDataFrame** can be exported [to_excel()](https://pyam-iamc.readthedocs.io/en/stable/api/iamdataframe.html#pyam.IamDataFrame.to_excel) and [to_csv()](https://pyam-iamc.readthedocs.io/en/stable/api/iamdataframe.html#pyam.IamDataFrame.to_csv) in the IAMC (wide) format.\n", "When writing to `xlsx`, both the timeseries data and the 'meta' table of categorization and quantitative indicators will be written to the file, to two sheets named 'data' and 'meta' respectively.\n", "\n", "As discussed before, these **pyam** functions closely follow the similar **pandas** functions [pd.DataFrame.to_excel()](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.to_excel.html) and [pd.DataFrame.to_csv()](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.to_csv.html). It can use any keyword arguments of those functions." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "df.to_excel(\"tutorial_export.xlsx\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", " \n", "**New to Jupyter notebooks?** \n", "Read [this page](https://www.dataquest.io/blog/jupyter-notebook-tips-tricks-shortcuts/) for helpful tips and tricks when working with Jupyter notebooks.\n", "\n", "
\n", "\n", "## Questions?\n", "\n", "Take a look at the next tutorials - then join our [mailing list](https://groups.io/g/pyam)!" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.3" } }, "nbformat": 4, "nbformat_minor": 2 }