{ "cells": [ { "cell_type": "markdown", "id": "a860826e", "metadata": {}, "source": [ "# Prep Notebook, Week 13/15 -- Altair stuffs\n", "\n", "So, the last lecture we ended passing data through Python to Altair to output as vega-lite. What is the benefit to using Python for data analysis? Well, for some of us Python is our bestie and so we want to hang out with it the most. For others, the benefit is that we can do data cleaning in Python and then put the cleaned data into our plots.\n", "\n", "Last time we worked though an example with the buildings dataset, now we'll do a more complex example with the corgi's dataset." ] }, { "cell_type": "code", "execution_count": 1, "id": "9225a116", "metadata": {}, "outputs": [], "source": [ "import pandas as pd\n", "import numpy as np\n", "import altair as alt\n", "import matplotlib.pyplot as plt # just in case" ] }, { "cell_type": "code", "execution_count": 2, "id": "640400f1", "metadata": {}, "outputs": [], "source": [ "myJekyllDir = '/Users/jnaiman/jnaiman.github.io/'" ] }, { "cell_type": "markdown", "id": "13531926", "metadata": {}, "source": [ "## Corgis Dataset examples" ] }, { "cell_type": "markdown", "id": "753601a0", "metadata": {}, "source": [ "Let's start by reading in this dataset and taking a look -- we'll start with the number of corgis born over time:" ] }, { "cell_type": "code", "execution_count": 3, "id": "7b21af27", "metadata": {}, "outputs": [], "source": [ "corgs = pd.read_csv('https://raw.githubusercontent.com/UIUC-iSchool-DataViz/is445_bcubcg_fall2022/main/data/corgs_per_country_over_time_columns_2020.csv')" ] }, { "cell_type": "code", "execution_count": 4, "id": "48c29e18", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
YearsUnited StatesBrazilRussiaJapanVietnamGermanyFranceUnited KingdomItaly...CroatiaNew ZealandIrelandLithuaniaUruguayLatviaSloveniaEstoniaNetherlands AntillesKosovo
01917000000010...0000000000
11918000000000...0000000000
21919000000000...0000000001
31920000000000...0000000000
41921000000000...0000000000
\n", "

5 rows × 41 columns

\n", "
" ], "text/plain": [ " Years United States Brazil Russia Japan Vietnam Germany France \\\n", "0 1917 0 0 0 0 0 0 0 \n", "1 1918 0 0 0 0 0 0 0 \n", "2 1919 0 0 0 0 0 0 0 \n", "3 1920 0 0 0 0 0 0 0 \n", "4 1921 0 0 0 0 0 0 0 \n", "\n", " United Kingdom Italy ... Croatia New Zealand Ireland Lithuania \\\n", "0 1 0 ... 0 0 0 0 \n", "1 0 0 ... 0 0 0 0 \n", "2 0 0 ... 0 0 0 0 \n", "3 0 0 ... 0 0 0 0 \n", "4 0 0 ... 0 0 0 0 \n", "\n", " Uruguay Latvia Slovenia Estonia Netherlands Antilles Kosovo \n", "0 0 0 0 0 0 0 \n", "1 0 0 0 0 0 0 \n", "2 0 0 0 0 0 1 \n", "3 0 0 0 0 0 0 \n", "4 0 0 0 0 0 0 \n", "\n", "[5 rows x 41 columns]" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "corgs.head()" ] }, { "cell_type": "markdown", "id": "3ac38572", "metadata": {}, "source": [ "So, this shows the number of corgis born over time in different countries. The easiest plot we can think of is probably a line chart of each country over time. First, let's start with the United States:" ] }, { "cell_type": "code", "execution_count": 5, "id": "c7b01113", "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "
\n", "" ], "text/plain": [ "alt.Chart(...)" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "linecorg = alt.Chart(corgs).mark_line().encode(\n", " alt.X('Years:Q'),\n", " alt.Y('United States:Q')\n", ")\n", "linecorg" ] }, { "cell_type": "markdown", "id": "367ff0b1", "metadata": {}, "source": [ "Here again, we probably want to specify that we have time units:" ] }, { "cell_type": "code", "execution_count": 6, "id": "fa4a063e", "metadata": {}, "outputs": [], "source": [ "corgs['Years'] = pd.to_datetime(corgs['Years'].astype('int'), format='%Y')" ] }, { "cell_type": "code", "execution_count": 7, "id": "a95ce9c1", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
YearsUnited StatesBrazilRussiaJapanVietnamGermanyFranceUnited KingdomItaly...CroatiaNew ZealandIrelandLithuaniaUruguayLatviaSloveniaEstoniaNetherlands AntillesKosovo
01917-01-01000000010...0000000000
11918-01-01000000000...0000000000
21919-01-01000000000...0000000001
31920-01-01000000000...0000000000
41921-01-01000000000...0000000000
..................................................................
992016-01-012290217205455282...8309000271772
1002017-01-012130260212860104...4300200825101
1012018-01-01198032903422552...6300000161256
1022019-01-01118018900306212...0000000161251
1032020-01-012016000000...00000000017
\n", "

104 rows × 41 columns

\n", "
" ], "text/plain": [ " Years United States Brazil Russia Japan Vietnam Germany \\\n", "0 1917-01-01 0 0 0 0 0 0 \n", "1 1918-01-01 0 0 0 0 0 0 \n", "2 1919-01-01 0 0 0 0 0 0 \n", "3 1920-01-01 0 0 0 0 0 0 \n", "4 1921-01-01 0 0 0 0 0 0 \n", ".. ... ... ... ... ... ... ... \n", "99 2016-01-01 229 0 217 2 0 54 \n", "100 2017-01-01 213 0 260 2 1 28 \n", "101 2018-01-01 198 0 329 0 3 42 \n", "102 2019-01-01 118 0 189 0 0 30 \n", "103 2020-01-01 2 0 16 0 0 0 \n", "\n", " France United Kingdom Italy ... Croatia New Zealand Ireland \\\n", "0 0 1 0 ... 0 0 0 \n", "1 0 0 0 ... 0 0 0 \n", "2 0 0 0 ... 0 0 0 \n", "3 0 0 0 ... 0 0 0 \n", "4 0 0 0 ... 0 0 0 \n", ".. ... ... ... ... ... ... ... \n", "99 55 28 2 ... 8 3 0 \n", "100 60 10 4 ... 4 3 0 \n", "101 25 5 2 ... 6 3 0 \n", "102 62 1 2 ... 0 0 0 \n", "103 0 0 0 ... 0 0 0 \n", "\n", " Lithuania Uruguay Latvia Slovenia Estonia Netherlands Antilles \\\n", "0 0 0 0 0 0 0 \n", "1 0 0 0 0 0 0 \n", "2 0 0 0 0 0 0 \n", "3 0 0 0 0 0 0 \n", "4 0 0 0 0 0 0 \n", ".. ... ... ... ... ... ... \n", "99 9 0 0 0 27 17 \n", "100 0 2 0 0 8 25 \n", "101 0 0 0 0 16 12 \n", "102 0 0 0 0 16 12 \n", "103 0 0 0 0 0 0 \n", "\n", " Kosovo \n", "0 0 \n", "1 0 \n", "2 1 \n", "3 0 \n", "4 0 \n", ".. ... \n", "99 72 \n", "100 101 \n", "101 56 \n", "102 51 \n", "103 17 \n", "\n", "[104 rows x 41 columns]" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "corgs" ] }, { "cell_type": "code", "execution_count": 8, "id": "1c063665", "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "
\n", "" ], "text/plain": [ "alt.Chart(...)" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "linecorg = alt.Chart(corgs).mark_line().encode(\n", " alt.X('Years:T'),\n", " alt.Y('United States:Q')\n", ")\n", "linecorg" ] }, { "cell_type": "markdown", "id": "4a16ad92", "metadata": {}, "source": [ "That looks better! To make multiple lines for each country, we want to probably use the `melt` operation we did before. To do that, we need to make years the index:" ] }, { "cell_type": "code", "execution_count": 9, "id": "b5f349e5", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
United StatesBrazilRussiaJapanVietnamGermanyFranceUnited KingdomItalySouth Africa...CroatiaNew ZealandIrelandLithuaniaUruguayLatviaSloveniaEstoniaNetherlands AntillesKosovo
Years
1917-01-010000000100...0000000000
1918-01-010000000000...0000000000
1919-01-010000000000...0000000001
1920-01-010000000000...0000000000
1921-01-010000000000...0000000000
\n", "

5 rows × 40 columns

\n", "
" ], "text/plain": [ " United States Brazil Russia Japan Vietnam Germany France \\\n", "Years \n", "1917-01-01 0 0 0 0 0 0 0 \n", "1918-01-01 0 0 0 0 0 0 0 \n", "1919-01-01 0 0 0 0 0 0 0 \n", "1920-01-01 0 0 0 0 0 0 0 \n", "1921-01-01 0 0 0 0 0 0 0 \n", "\n", " United Kingdom Italy South Africa ... Croatia New Zealand \\\n", "Years ... \n", "1917-01-01 1 0 0 ... 0 0 \n", "1918-01-01 0 0 0 ... 0 0 \n", "1919-01-01 0 0 0 ... 0 0 \n", "1920-01-01 0 0 0 ... 0 0 \n", "1921-01-01 0 0 0 ... 0 0 \n", "\n", " Ireland Lithuania Uruguay Latvia Slovenia Estonia \\\n", "Years \n", "1917-01-01 0 0 0 0 0 0 \n", "1918-01-01 0 0 0 0 0 0 \n", "1919-01-01 0 0 0 0 0 0 \n", "1920-01-01 0 0 0 0 0 0 \n", "1921-01-01 0 0 0 0 0 0 \n", "\n", " Netherlands Antilles Kosovo \n", "Years \n", "1917-01-01 0 0 \n", "1918-01-01 0 0 \n", "1919-01-01 0 1 \n", "1920-01-01 0 0 \n", "1921-01-01 0 0 \n", "\n", "[5 rows x 40 columns]" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "corgs = corgs.set_index('Years')\n", "corgs.head()" ] }, { "cell_type": "markdown", "id": "ae2eed0f", "metadata": {}, "source": [ "Now we can \"melt\" like before:" ] }, { "cell_type": "code", "execution_count": 10, "id": "86f8e8c7", "metadata": {}, "outputs": [], "source": [ "corg_source = corgs.reset_index().melt('Years', \n", " var_name='Country', value_name='country')" ] }, { "cell_type": "code", "execution_count": 11, "id": "d5b017d8", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
YearsCountrycountry
01917-01-01United States0
11918-01-01United States0
21919-01-01United States0
31920-01-01United States0
41921-01-01United States0
\n", "
" ], "text/plain": [ " Years Country country\n", "0 1917-01-01 United States 0\n", "1 1918-01-01 United States 0\n", "2 1919-01-01 United States 0\n", "3 1920-01-01 United States 0\n", "4 1921-01-01 United States 0" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "corg_source.head()" ] }, { "cell_type": "markdown", "id": "132438be", "metadata": {}, "source": [ "Now, let's try again with our multi-line plot:" ] }, { "cell_type": "code", "execution_count": 12, "id": "c3a49d72", "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "
\n", "" ], "text/plain": [ "alt.Chart(...)" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "linecorg = alt.Chart(corg_source).mark_line().encode(\n", " alt.Y('country:Q'),\n", " x='Years:T',\n", " color='Country:N'\n", ")\n", "linecorg" ] }, { "cell_type": "markdown", "id": "ace87bce", "metadata": {}, "source": [ "Since there is a large range of corgis born, we might try a log-scale on the y-axis:" ] }, { "cell_type": "code", "execution_count": 13, "id": "835b428c", "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "
\n", "" ], "text/plain": [ "alt.Chart(...)" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "linecorg = alt.Chart(corg_source).mark_line().encode(\n", " alt.Y('country:Q',scale=alt.Scale(type='log')),\n", " x='Years:T',\n", " color='Country:N'\n", ")\n", "linecorg" ] }, { "cell_type": "markdown", "id": "8bb9795c", "metadata": {}, "source": [ "Here we see that there is an issue -- this is probably because there are zeros in our dataset:" ] }, { "cell_type": "code", "execution_count": 14, "id": "315d3094", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "corg_source['country'].min()" ] }, { "cell_type": "markdown", "id": "793c1a4c", "metadata": {}, "source": [ "Potentially, we can \"clean\" these by setting any zeros to NaN's, however in this case those 0's are meaningful -- they are telling us that a particular country has no corgis born in that year! Instead, we have a few different options. One thing we can do [is apply a](https://stackoverflow.com/questions/58032074/why-is-altair-returning-an-empty-chart-when-using-log-scale) [filter opperation](https://altair-viz.github.io/user_guide/transform/filter.html#filter-transform) on our data:" ] }, { "cell_type": "code", "execution_count": 16, "id": "4b21cdd7", "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "
\n", "" ], "text/plain": [ "alt.Chart(...)" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "linecorg = alt.Chart(corg_source).transform_filter(\n", " alt.datum.country > 0\n", ").mark_line().encode(\n", " alt.Y('country:Q',scale=alt.Scale(type='log')),\n", " x='Years:T',\n", " color='Country:N'\n", ")\n", "linecorg" ] }, { "cell_type": "markdown", "id": "ab762ef4", "metadata": {}, "source": [ "This doesn't really get to our issue though -- we are just filtering out those zero years/countries. In this case, instead of filtering, we probably want to use a symmetric log instead of a log for our scale;" ] }, { "cell_type": "code", "execution_count": 17, "id": "f4b2d163", "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "
\n", "" ], "text/plain": [ "alt.Chart(...)" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "linecorg = alt.Chart(corg_source).mark_line().encode(\n", " alt.Y('country:Q',scale=alt.Scale(type='symlog')),\n", " x='Years:T',\n", " color='Country:N'\n", ")\n", "linecorg" ] }, { "cell_type": "markdown", "id": "d02700cc", "metadata": {}, "source": [ "Even with this though, we have sort of a messy plot -- even if we allowed for selections (like with the buildings dataset above) we still have many lines that \"fall off\" our legend which is not helpful.\n", "\n", "Since this is data in different countries, we could think that maybe a good idea would be some kind of map!\n", "\n", "First, let's see what mappable data \"comes with\" vega-datasets:" ] }, { "cell_type": "code", "execution_count": 18, "id": "a573129b", "metadata": {}, "outputs": [], "source": [ "from vega_datasets import data" ] }, { "cell_type": "code", "execution_count": 19, "id": "0cf9ef16", "metadata": {}, "outputs": [], "source": [ "#data.countries" ] }, { "cell_type": "code", "execution_count": 20, "id": "fb55844c", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data.world_110m" ] }, { "cell_type": "markdown", "id": "3620bf17", "metadata": {}, "source": [ "Let's try plotting both of these:" ] }, { "cell_type": "code", "execution_count": 21, "id": "05f5e5e9", "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "
\n", "" ], "text/plain": [ "alt.Chart(...)" ] }, "execution_count": 21, "metadata": {}, "output_type": "execute_result" } ], "source": [ "geo = alt.topo_feature(data.world_110m.url, feature='countries')\n", "\n", "# US states background\n", "world = alt.Chart(geo).mark_geoshape(\n", " fill='gray',\n", " stroke='white'\n", ").properties(\n", " width=800,\n", " height=500\n", ").project('equirectangular') # note we have a few projections we can use!\n", "world" ] }, { "cell_type": "markdown", "id": "cbf21c56", "metadata": {}, "source": [ "So, one way we can plot data on this plot is to plot dots over each of the countries showing how many corgis have been born in each one. For this though [we probably need the latitude and longitude points](https://altair-viz.github.io/altair-tutorial/notebooks/09-Geographic-plots.html)." ] }, { "cell_type": "markdown", "id": "f6f40961", "metadata": {}, "source": [ "One way to do this is [using the OpenStreetMap project](https://gis.stackexchange.com/questions/212796/getting-latlon-extent-of-country-by-its-name-using-python):" ] }, { "cell_type": "code", "execution_count": 22, "id": "d01ac06f", "metadata": {}, "outputs": [], "source": [ "import requests\n", "def get_boundingbox_country(country, output_as='center'):\n", " \"\"\"\n", " get the bounding box of a country in EPSG4326 given a country name\n", "\n", " Parameters\n", " ----------\n", " country : str\n", " name of the country in english and lowercase\n", " output_as : 'str\n", " chose from 'boundingbox' or 'center'. \n", " - 'boundingbox' for [latmin, latmax, lonmin, lonmax]\n", " - 'center' for [latcenter, loncenter]\n", "\n", " Returns\n", " -------\n", " output : list\n", " list with coordinates as str\n", " \"\"\"\n", " # create url\n", " url = '{0}{1}{2}'.format('http://nominatim.openstreetmap.org/search?country=',\n", " country,\n", " '&format=json&polygon=0')\n", " response = requests.get(url).json()[0]\n", "\n", " # parse response to list\n", " if output_as == 'boundingbox':\n", " lst = response[output_as]\n", " output = [float(i) for i in lst]\n", " if output_as == 'center':\n", " lst = [response.get(key) for key in ['lat','lon']]\n", " output = [float(i) for i in lst]\n", " return output" ] }, { "cell_type": "markdown", "id": "c8473e93", "metadata": {}, "source": [ "We can call OpenStreetMaps in a few different ways, with country codes or names:" ] }, { "cell_type": "code", "execution_count": 23, "id": "8b7ba374", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(39.7837304, -100.445882)" ] }, "execution_count": 23, "metadata": {}, "output_type": "execute_result" } ], "source": [ "lat,long = get_boundingbox_country('usa')\n", "lat,long" ] }, { "cell_type": "code", "execution_count": 24, "id": "1b257c1c", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(39.7837304, -100.445882)" ] }, "execution_count": 24, "metadata": {}, "output_type": "execute_result" } ], "source": [ "lat,long = get_boundingbox_country('us')\n", "lat,long" ] }, { "cell_type": "code", "execution_count": 25, "id": "d6fda5f6", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(39.7837304, -100.445882)" ] }, "execution_count": 25, "metadata": {}, "output_type": "execute_result" } ], "source": [ "lat,long = get_boundingbox_country('United States')\n", "lat,long" ] }, { "cell_type": "markdown", "id": "eeb3cb21", "metadata": {}, "source": [ "There is no guarantee that the dataset is formatted with the correct country codes -- so we'd best check and see:" ] }, { "cell_type": "code", "execution_count": 26, "id": "50e6c74a", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Index(['United States', 'Brazil', 'Russia', 'Japan', 'Vietnam', 'Germany',\n", " 'France', 'United Kingdom', 'Italy', 'South Africa', 'Ukraine', 'Spain',\n", " 'Poland', 'Canada', 'Korea, North', 'Romania', 'Australia', 'Portugal',\n", " 'Belgium', 'Czech Republic', 'Hungary', 'Belarus', 'Sweden', 'Austria',\n", " 'Switzerland', 'Israel', 'Serbia', 'Denmark', 'Finland', 'Norway',\n", " 'Croatia', 'New Zealand', 'Ireland', 'Lithuania', 'Uruguay', 'Latvia',\n", " 'Slovenia', 'Estonia', 'Netherlands Antilles', 'Kosovo'],\n", " dtype='object')" ] }, "execution_count": 26, "metadata": {}, "output_type": "execute_result" } ], "source": [ "corgs.columns" ] }, { "cell_type": "code", "execution_count": 27, "id": "7cecb018", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "United States 39.7837304 -100.445882\n", "Brazil -10.3333333 -53.2\n", "Russia 64.6863136 97.7453061\n", "Japan 36.5748441 139.2394179\n", "Vietnam 15.9266657 107.9650855\n", "Germany 51.1638175 10.4478313\n", "France 46.603354 1.8883335\n", "United Kingdom 54.7023545 -3.2765753\n", "Italy 42.6384261 12.674297\n", "South Africa -28.8166236 24.991639\n", "Ukraine 49.4871968 31.2718321\n", "Spain 39.3260685 -4.8379791\n", "Poland 52.215933 19.134422\n", "Canada 61.0666922 -107.991707\n" ] }, { "ename": "IndexError", "evalue": "list index out of range", "output_type": "error", "traceback": [ "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", "\u001b[0;31mIndexError\u001b[0m Traceback (most recent call last)", "\u001b[0;32m/var/folders/t7/bwcvd_4177q4872gxghn7p9r0000gq/T/ipykernel_98272/1235722415.py\u001b[0m in \u001b[0;36m\u001b[0;34m\u001b[0m\n\u001b[1;32m 1\u001b[0m \u001b[0mlat\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0mlong\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;34m[\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 2\u001b[0m \u001b[0;32mfor\u001b[0m \u001b[0mc\u001b[0m \u001b[0;32min\u001b[0m \u001b[0mcorgs\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mcolumns\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 3\u001b[0;31m \u001b[0mla\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0mlo\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mget_boundingbox_country\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mstr\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mc\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 4\u001b[0m \u001b[0mprint\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mc\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0mla\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0mlo\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 5\u001b[0m \u001b[0mlat\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mappend\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mla\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m;\u001b[0m \u001b[0mlong\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mappend\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mlo\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", "\u001b[0;32m/var/folders/t7/bwcvd_4177q4872gxghn7p9r0000gq/T/ipykernel_98272/973524379.py\u001b[0m in \u001b[0;36mget_boundingbox_country\u001b[0;34m(country, output_as)\u001b[0m\n\u001b[1;32m 22\u001b[0m \u001b[0mcountry\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 23\u001b[0m '&format=json&polygon=0')\n\u001b[0;32m---> 24\u001b[0;31m \u001b[0mresponse\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mrequests\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mget\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0murl\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mjson\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;36m0\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 25\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 26\u001b[0m \u001b[0;31m# parse response to list\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", "\u001b[0;31mIndexError\u001b[0m: list index out of range" ] } ], "source": [ "lat,long = [],[]\n", "for c in corgs.columns:\n", " la,lo = get_boundingbox_country(str(c))\n", " print(c,la,lo)\n", " lat.append(la); long.append(lo)" ] }, { "cell_type": "markdown", "id": "6a9839c8", "metadata": {}, "source": [ "Ah ha! We have an issue, let's see what it is:" ] }, { "cell_type": "code", "execution_count": 28, "id": "b8d24543", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'Korea, North'" ] }, "execution_count": 28, "metadata": {}, "output_type": "execute_result" } ], "source": [ "c" ] }, { "cell_type": "markdown", "id": "07980335", "metadata": {}, "source": [ "In this case, we need to do some data cleaning. We can start from our \"source\" but instead, we probably want to start from the original dataframe, just to be consistent:" ] }, { "cell_type": "code", "execution_count": 29, "id": "be9cd957", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
United StatesBrazilRussiaJapanVietnamGermanyFranceUnited KingdomItalySouth Africa...CroatiaNew ZealandIrelandLithuaniaUruguayLatviaSloveniaEstoniaNetherlands AntillesKosovo
Years
1917-01-010000000100...0000000000
1918-01-010000000000...0000000000
1919-01-010000000000...0000000001
1920-01-010000000000...0000000000
1921-01-010000000000...0000000000
\n", "

5 rows × 40 columns

\n", "
" ], "text/plain": [ " United States Brazil Russia Japan Vietnam Germany France \\\n", "Years \n", "1917-01-01 0 0 0 0 0 0 0 \n", "1918-01-01 0 0 0 0 0 0 0 \n", "1919-01-01 0 0 0 0 0 0 0 \n", "1920-01-01 0 0 0 0 0 0 0 \n", "1921-01-01 0 0 0 0 0 0 0 \n", "\n", " United Kingdom Italy South Africa ... Croatia New Zealand \\\n", "Years ... \n", "1917-01-01 1 0 0 ... 0 0 \n", "1918-01-01 0 0 0 ... 0 0 \n", "1919-01-01 0 0 0 ... 0 0 \n", "1920-01-01 0 0 0 ... 0 0 \n", "1921-01-01 0 0 0 ... 0 0 \n", "\n", " Ireland Lithuania Uruguay Latvia Slovenia Estonia \\\n", "Years \n", "1917-01-01 0 0 0 0 0 0 \n", "1918-01-01 0 0 0 0 0 0 \n", "1919-01-01 0 0 0 0 0 0 \n", "1920-01-01 0 0 0 0 0 0 \n", "1921-01-01 0 0 0 0 0 0 \n", "\n", " Netherlands Antilles Kosovo \n", "Years \n", "1917-01-01 0 0 \n", "1918-01-01 0 0 \n", "1919-01-01 0 1 \n", "1920-01-01 0 0 \n", "1921-01-01 0 0 \n", "\n", "[5 rows x 40 columns]" ] }, "execution_count": 29, "metadata": {}, "output_type": "execute_result" } ], "source": [ "corg_clean = corgs.copy()\n", "#corg_clean = corg_clean.set_index('Years') # note -- don't have to do this again since we started with a rest index\n", "corg_clean = corg_clean.rename(columns={'Korea, North':\"North Korea\"})\n", "corg_clean.head()" ] }, { "cell_type": "markdown", "id": "00761ecd", "metadata": {}, "source": [ "Let's try this again:" ] }, { "cell_type": "code", "execution_count": 30, "id": "189aa8dd", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "United States 39.7837304 -100.445882\n", "Brazil -10.3333333 -53.2\n", "Russia 64.6863136 97.7453061\n", "Japan 36.5748441 139.2394179\n", "Vietnam 15.9266657 107.9650855\n", "Germany 51.1638175 10.4478313\n", "France 46.603354 1.8883335\n", "United Kingdom 54.7023545 -3.2765753\n", "Italy 42.6384261 12.674297\n", "South Africa -28.8166236 24.991639\n", "Ukraine 49.4871968 31.2718321\n", "Spain 39.3260685 -4.8379791\n", "Poland 52.215933 19.134422\n", "Canada 61.0666922 -107.991707\n", "North Korea 40.3736611 127.0870417\n", "Romania 45.9852129 24.6859225\n", "Australia -24.7761086 134.755\n", "Portugal 39.6621648 -8.1353519\n", "Belgium 50.6402809 4.6667145\n", "Czech Republic 49.7439047 15.3381061\n", "Hungary 47.1817585 19.5060937\n", "Belarus 53.4250605 27.6971358\n", "Sweden 59.6749712 14.5208584\n", "Austria 47.59397 14.12456\n", "Switzerland 46.7985624 8.2319736\n", "Israel 30.8124247 34.8594762\n", "Serbia 44.1534121 20.55144\n", "Denmark 55.670249 10.3333283\n", "Finland 63.2467777 25.9209164\n", "Norway 60.5000209 9.0999715\n", "Croatia 45.5643442 17.0118954\n", "New Zealand -41.5000831 172.8344077\n", "Ireland 52.865196 -7.9794599\n", "Lithuania 55.3500003 23.7499997\n", "Uruguay -32.8755548 -56.0201525\n", "Latvia 56.8406494 24.7537645\n", "Slovenia 45.8133113 14.4808369\n", "Estonia 58.7523778 25.3319078\n" ] }, { "ename": "IndexError", "evalue": "list index out of range", "output_type": "error", "traceback": [ "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", "\u001b[0;31mIndexError\u001b[0m Traceback (most recent call last)", "\u001b[0;32m/var/folders/t7/bwcvd_4177q4872gxghn7p9r0000gq/T/ipykernel_98272/1978887829.py\u001b[0m in \u001b[0;36m\u001b[0;34m\u001b[0m\n\u001b[1;32m 1\u001b[0m \u001b[0mlat\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0mlong\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;34m[\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 2\u001b[0m \u001b[0;32mfor\u001b[0m \u001b[0mc\u001b[0m \u001b[0;32min\u001b[0m \u001b[0mcorg_clean\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mcolumns\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 3\u001b[0;31m \u001b[0mla\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0mlo\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mget_boundingbox_country\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mstr\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mc\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 4\u001b[0m \u001b[0mprint\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mc\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0mla\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0mlo\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 5\u001b[0m \u001b[0mlat\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mappend\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mla\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m;\u001b[0m \u001b[0mlong\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mappend\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mlo\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", "\u001b[0;32m/var/folders/t7/bwcvd_4177q4872gxghn7p9r0000gq/T/ipykernel_98272/973524379.py\u001b[0m in \u001b[0;36mget_boundingbox_country\u001b[0;34m(country, output_as)\u001b[0m\n\u001b[1;32m 22\u001b[0m \u001b[0mcountry\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 23\u001b[0m '&format=json&polygon=0')\n\u001b[0;32m---> 24\u001b[0;31m \u001b[0mresponse\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mrequests\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mget\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0murl\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mjson\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;36m0\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 25\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 26\u001b[0m \u001b[0;31m# parse response to list\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", "\u001b[0;31mIndexError\u001b[0m: list index out of range" ] } ], "source": [ "lat,long = [],[]\n", "for c in corg_clean.columns:\n", " la,lo = get_boundingbox_country(str(c))\n", " print(c,la,lo)\n", " lat.append(la); long.append(lo)" ] }, { "cell_type": "code", "execution_count": 31, "id": "b0bce0ac", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'Netherlands Antilles'" ] }, "execution_count": 31, "metadata": {}, "output_type": "execute_result" } ], "source": [ "c" ] }, { "cell_type": "markdown", "id": "4a0879e8", "metadata": {}, "source": [ "Found another one:" ] }, { "cell_type": "code", "execution_count": 32, "id": "ea02bd7a", "metadata": {}, "outputs": [], "source": [ "corg_clean = corgs.copy()\n", "corg_clean = corg_clean.rename(columns={'Korea, North':\"North Korea\",'Netherlands Antilles':\"Netherlands\"})" ] }, { "cell_type": "code", "execution_count": 33, "id": "3aa7c25c", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "United States 39.7837304 -100.445882\n", "Brazil -10.3333333 -53.2\n", "Russia 64.6863136 97.7453061\n", "Japan 36.5748441 139.2394179\n", "Vietnam 15.9266657 107.9650855\n", "Germany 51.1638175 10.4478313\n", "France 46.603354 1.8883335\n", "United Kingdom 54.7023545 -3.2765753\n", "Italy 42.6384261 12.674297\n", "South Africa -28.8166236 24.991639\n", "Ukraine 49.4871968 31.2718321\n", "Spain 39.3260685 -4.8379791\n", "Poland 52.215933 19.134422\n", "Canada 61.0666922 -107.991707\n", "North Korea 40.3736611 127.0870417\n", "Romania 45.9852129 24.6859225\n", "Australia -24.7761086 134.755\n", "Portugal 39.6621648 -8.1353519\n", "Belgium 50.6402809 4.6667145\n", "Czech Republic 49.7439047 15.3381061\n", "Hungary 47.1817585 19.5060937\n", "Belarus 53.4250605 27.6971358\n", "Sweden 59.6749712 14.5208584\n", "Austria 47.59397 14.12456\n", "Switzerland 46.7985624 8.2319736\n", "Israel 30.8124247 34.8594762\n", "Serbia 44.1534121 20.55144\n", "Denmark 55.670249 10.3333283\n", "Finland 63.2467777 25.9209164\n", "Norway 60.5000209 9.0999715\n", "Croatia 45.5643442 17.0118954\n", "New Zealand -41.5000831 172.8344077\n", "Ireland 52.865196 -7.9794599\n", "Lithuania 55.3500003 23.7499997\n", "Uruguay -32.8755548 -56.0201525\n", "Latvia 56.8406494 24.7537645\n", "Slovenia 45.8133113 14.4808369\n", "Estonia 58.7523778 25.3319078\n", "Netherlands 52.24764975 5.541246849406163\n", "Kosovo 42.5869578 20.9021231\n" ] } ], "source": [ "lat,long = [],[]\n", "for c in corg_clean.columns:\n", " la,lo = get_boundingbox_country(str(c))\n", " print(c,la,lo)\n", " lat.append(la); long.append(lo)" ] }, { "cell_type": "markdown", "id": "bb6973f2", "metadata": {}, "source": [ "Hurray! So, this is a case where being able to clean the data in Python was super useful. Let's try adding these as points to our map now that we have cleaned data." ] }, { "cell_type": "markdown", "id": "77da2776", "metadata": {}, "source": [ "One other data transformation we probably want to do is transpose this matrix so that we have countries as rows and corgis born per year as columns:" ] }, { "cell_type": "code", "execution_count": 34, "id": "18ab69a1", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Years1917-01-011918-01-011919-01-011920-01-011921-01-011922-01-011923-01-011924-01-011925-01-011926-01-01...2011-01-012012-01-012013-01-012014-01-012015-01-012016-01-012017-01-012018-01-012019-01-012020-01-01
United States0000000000...4084313762803012292131981182
Brazil0000000000...0000000000
Russia0000000000...898211512723721726032918916
Japan0000000000...0000022000
Vietnam0000000000...0000001300
\n", "

5 rows × 104 columns

\n", "
" ], "text/plain": [ "Years 1917-01-01 1918-01-01 1919-01-01 1920-01-01 1921-01-01 \\\n", "United States 0 0 0 0 0 \n", "Brazil 0 0 0 0 0 \n", "Russia 0 0 0 0 0 \n", "Japan 0 0 0 0 0 \n", "Vietnam 0 0 0 0 0 \n", "\n", "Years 1922-01-01 1923-01-01 1924-01-01 1925-01-01 1926-01-01 \\\n", "United States 0 0 0 0 0 \n", "Brazil 0 0 0 0 0 \n", "Russia 0 0 0 0 0 \n", "Japan 0 0 0 0 0 \n", "Vietnam 0 0 0 0 0 \n", "\n", "Years ... 2011-01-01 2012-01-01 2013-01-01 2014-01-01 \\\n", "United States ... 408 431 376 280 \n", "Brazil ... 0 0 0 0 \n", "Russia ... 89 82 115 127 \n", "Japan ... 0 0 0 0 \n", "Vietnam ... 0 0 0 0 \n", "\n", "Years 2015-01-01 2016-01-01 2017-01-01 2018-01-01 2019-01-01 \\\n", "United States 301 229 213 198 118 \n", "Brazil 0 0 0 0 0 \n", "Russia 237 217 260 329 189 \n", "Japan 0 2 2 0 0 \n", "Vietnam 0 0 1 3 0 \n", "\n", "Years 2020-01-01 \n", "United States 2 \n", "Brazil 0 \n", "Russia 16 \n", "Japan 0 \n", "Vietnam 0 \n", "\n", "[5 rows x 104 columns]" ] }, "execution_count": 34, "metadata": {}, "output_type": "execute_result" } ], "source": [ "corg_clean_t = corg_clean.T\n", "corg_clean_t.head()" ] }, { "cell_type": "markdown", "id": "4b6c0ef7", "metadata": {}, "source": [ "Since, technically, the index is no longer years, so we should be sure to name the index more intuatively: " ] }, { "cell_type": "code", "execution_count": 35, "id": "4fe7be97", "metadata": {}, "outputs": [], "source": [ "corg_clean_t.index.name = \"Country\"" ] }, { "cell_type": "code", "execution_count": 36, "id": "92c681dc", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'Years'" ] }, "execution_count": 36, "metadata": {}, "output_type": "execute_result" } ], "source": [ "corg_clean_t.axes[1].name" ] }, { "cell_type": "code", "execution_count": 37, "id": "b4404f0d", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Years1917-01-011918-01-011919-01-011920-01-011921-01-011922-01-011923-01-011924-01-011925-01-011926-01-01...2011-01-012012-01-012013-01-012014-01-012015-01-012016-01-012017-01-012018-01-012019-01-012020-01-01
Country
United States0000000000...4084313762803012292131981182
Brazil0000000000...0000000000
Russia0000000000...898211512723721726032918916
Japan0000000000...0000022000
Vietnam0000000000...0000001300
\n", "

5 rows × 104 columns

\n", "
" ], "text/plain": [ "Years 1917-01-01 1918-01-01 1919-01-01 1920-01-01 1921-01-01 \\\n", "Country \n", "United States 0 0 0 0 0 \n", "Brazil 0 0 0 0 0 \n", "Russia 0 0 0 0 0 \n", "Japan 0 0 0 0 0 \n", "Vietnam 0 0 0 0 0 \n", "\n", "Years 1922-01-01 1923-01-01 1924-01-01 1925-01-01 1926-01-01 \\\n", "Country \n", "United States 0 0 0 0 0 \n", "Brazil 0 0 0 0 0 \n", "Russia 0 0 0 0 0 \n", "Japan 0 0 0 0 0 \n", "Vietnam 0 0 0 0 0 \n", "\n", "Years ... 2011-01-01 2012-01-01 2013-01-01 2014-01-01 \\\n", "Country ... \n", "United States ... 408 431 376 280 \n", "Brazil ... 0 0 0 0 \n", "Russia ... 89 82 115 127 \n", "Japan ... 0 0 0 0 \n", "Vietnam ... 0 0 0 0 \n", "\n", "Years 2015-01-01 2016-01-01 2017-01-01 2018-01-01 2019-01-01 \\\n", "Country \n", "United States 301 229 213 198 118 \n", "Brazil 0 0 0 0 0 \n", "Russia 237 217 260 329 189 \n", "Japan 0 2 2 0 0 \n", "Vietnam 0 0 1 3 0 \n", "\n", "Years 2020-01-01 \n", "Country \n", "United States 2 \n", "Brazil 0 \n", "Russia 16 \n", "Japan 0 \n", "Vietnam 0 \n", "\n", "[5 rows x 104 columns]" ] }, "execution_count": 37, "metadata": {}, "output_type": "execute_result" } ], "source": [ "corg_clean_t.head()" ] }, { "cell_type": "markdown", "id": "1b97f199", "metadata": {}, "source": [ "Now we can add in the lat/long for each country:" ] }, { "cell_type": "code", "execution_count": 38, "id": "6d5178a1", "metadata": {}, "outputs": [], "source": [ "corg_clean_t['Latitude'] = lat\n", "corg_clean_t['Longitude'] = long" ] }, { "cell_type": "code", "execution_count": 39, "id": "99b6bced", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Years1917-01-01 00:00:001918-01-01 00:00:001919-01-01 00:00:001920-01-01 00:00:001921-01-01 00:00:001922-01-01 00:00:001923-01-01 00:00:001924-01-01 00:00:001925-01-01 00:00:001926-01-01 00:00:00...2013-01-01 00:00:002014-01-01 00:00:002015-01-01 00:00:002016-01-01 00:00:002017-01-01 00:00:002018-01-01 00:00:002019-01-01 00:00:002020-01-01 00:00:00LatitudeLongitude
Country
United States0000000000...376280301229213198118239.783730-100.445882
Brazil0000000000...00000000-10.333333-53.200000
Russia0000000000...1151272372172603291891664.68631497.745306
Japan0000000000...0002200036.574844139.239418
Vietnam0000000000...0000130015.926666107.965086
\n", "

5 rows × 106 columns

\n", "
" ], "text/plain": [ "Years 1917-01-01 00:00:00 1918-01-01 00:00:00 1919-01-01 00:00:00 \\\n", "Country \n", "United States 0 0 0 \n", "Brazil 0 0 0 \n", "Russia 0 0 0 \n", "Japan 0 0 0 \n", "Vietnam 0 0 0 \n", "\n", "Years 1920-01-01 00:00:00 1921-01-01 00:00:00 1922-01-01 00:00:00 \\\n", "Country \n", "United States 0 0 0 \n", "Brazil 0 0 0 \n", "Russia 0 0 0 \n", "Japan 0 0 0 \n", "Vietnam 0 0 0 \n", "\n", "Years 1923-01-01 00:00:00 1924-01-01 00:00:00 1925-01-01 00:00:00 \\\n", "Country \n", "United States 0 0 0 \n", "Brazil 0 0 0 \n", "Russia 0 0 0 \n", "Japan 0 0 0 \n", "Vietnam 0 0 0 \n", "\n", "Years 1926-01-01 00:00:00 ... 2013-01-01 00:00:00 \\\n", "Country ... \n", "United States 0 ... 376 \n", "Brazil 0 ... 0 \n", "Russia 0 ... 115 \n", "Japan 0 ... 0 \n", "Vietnam 0 ... 0 \n", "\n", "Years 2014-01-01 00:00:00 2015-01-01 00:00:00 2016-01-01 00:00:00 \\\n", "Country \n", "United States 280 301 229 \n", "Brazil 0 0 0 \n", "Russia 127 237 217 \n", "Japan 0 0 2 \n", "Vietnam 0 0 0 \n", "\n", "Years 2017-01-01 00:00:00 2018-01-01 00:00:00 2019-01-01 00:00:00 \\\n", "Country \n", "United States 213 198 118 \n", "Brazil 0 0 0 \n", "Russia 260 329 189 \n", "Japan 2 0 0 \n", "Vietnam 1 3 0 \n", "\n", "Years 2020-01-01 00:00:00 Latitude Longitude \n", "Country \n", "United States 2 39.783730 -100.445882 \n", "Brazil 0 -10.333333 -53.200000 \n", "Russia 16 64.686314 97.745306 \n", "Japan 0 36.574844 139.239418 \n", "Vietnam 0 15.926666 107.965086 \n", "\n", "[5 rows x 106 columns]" ] }, "execution_count": 39, "metadata": {}, "output_type": "execute_result" } ], "source": [ "corg_clean_t.head()" ] }, { "cell_type": "markdown", "id": "ffbdf81e", "metadata": {}, "source": [ "Now (finally) we can plot those points on our map using just the lat/long:" ] }, { "cell_type": "code", "execution_count": 40, "id": "8af51649", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
YearsCountryLatitudeLongitude
0United States39.783730-100.445882
1Brazil-10.333333-53.200000
2Russia64.68631497.745306
3Japan36.574844139.239418
4Vietnam15.926666107.965086
\n", "
" ], "text/plain": [ "Years Country Latitude Longitude\n", "0 United States 39.783730 -100.445882\n", "1 Brazil -10.333333 -53.200000\n", "2 Russia 64.686314 97.745306\n", "3 Japan 36.574844 139.239418\n", "4 Vietnam 15.926666 107.965086" ] }, "execution_count": 40, "metadata": {}, "output_type": "execute_result" } ], "source": [ "corg_subset = corg_clean_t.reset_index()[['Country','Latitude','Longitude']]\n", "corg_subset.head()" ] }, { "cell_type": "code", "execution_count": 41, "id": "d2e22c68", "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "
\n", "" ], "text/plain": [ "alt.LayerChart(...)" ] }, "execution_count": 41, "metadata": {}, "output_type": "execute_result" } ], "source": [ "geo = alt.topo_feature(data.world_110m.url, feature='countries')\n", "\n", "# US states background\n", "world = alt.Chart(geo).mark_geoshape(\n", " fill='gray',\n", " stroke='white'\n", ").properties(\n", " width=800,\n", " height=500\n", ").project('equirectangular') # note we have a few projections we can use!\n", "\n", "points = alt.Chart(corg_subset).mark_circle().encode(\n", " longitude='Longitude:Q',\n", " latitude='Latitude:Q',\n", " size=alt.value(100),\n", " tooltip='Country'\n", ")\n", "\n", "world + points" ] }, { "cell_type": "markdown", "id": "3f31fdc4", "metadata": {}, "source": [ "We probably want the size of the dot to be proportional to the actual number of corgis born." ] }, { "cell_type": "markdown", "id": "d51446fe", "metadata": {}, "source": [ "To start, let's just plot the total corgis born in a country:" ] }, { "cell_type": "code", "execution_count": 42, "id": "04940dd3", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Years1917-01-011918-01-011919-01-011920-01-011921-01-011922-01-011923-01-011924-01-011925-01-011926-01-01...2011-01-012012-01-012013-01-012014-01-012015-01-012016-01-012017-01-012018-01-012019-01-012020-01-01
Country
United States0000000000...4084313762803012292131981182
Brazil0000000000...0000000000
Russia0000000000...898211512723721726032918916
Japan0000000000...0000022000
Vietnam0000000000...0000001300
\n", "

5 rows × 104 columns

\n", "
" ], "text/plain": [ "Years 1917-01-01 1918-01-01 1919-01-01 1920-01-01 1921-01-01 \\\n", "Country \n", "United States 0 0 0 0 0 \n", "Brazil 0 0 0 0 0 \n", "Russia 0 0 0 0 0 \n", "Japan 0 0 0 0 0 \n", "Vietnam 0 0 0 0 0 \n", "\n", "Years 1922-01-01 1923-01-01 1924-01-01 1925-01-01 1926-01-01 \\\n", "Country \n", "United States 0 0 0 0 0 \n", "Brazil 0 0 0 0 0 \n", "Russia 0 0 0 0 0 \n", "Japan 0 0 0 0 0 \n", "Vietnam 0 0 0 0 0 \n", "\n", "Years ... 2011-01-01 2012-01-01 2013-01-01 2014-01-01 \\\n", "Country ... \n", "United States ... 408 431 376 280 \n", "Brazil ... 0 0 0 0 \n", "Russia ... 89 82 115 127 \n", "Japan ... 0 0 0 0 \n", "Vietnam ... 0 0 0 0 \n", "\n", "Years 2015-01-01 2016-01-01 2017-01-01 2018-01-01 2019-01-01 \\\n", "Country \n", "United States 301 229 213 198 118 \n", "Brazil 0 0 0 0 0 \n", "Russia 237 217 260 329 189 \n", "Japan 0 2 2 0 0 \n", "Vietnam 0 0 1 3 0 \n", "\n", "Years 2020-01-01 \n", "Country \n", "United States 2 \n", "Brazil 0 \n", "Russia 16 \n", "Japan 0 \n", "Vietnam 0 \n", "\n", "[5 rows x 104 columns]" ] }, "execution_count": 42, "metadata": {}, "output_type": "execute_result" } ], "source": [ "corg_clean_t.loc[:, (corg_clean_t.columns != 'Longitude') & (corg_clean_t.columns != \"Latitude\")].head()" ] }, { "cell_type": "markdown", "id": "8812a49a", "metadata": {}, "source": [ "Now we can sum along the Country axis:" ] }, { "cell_type": "code", "execution_count": 43, "id": "f5aa8ea0", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Country\n", "United States 16130\n", "Brazil 1\n", "Russia 1834\n", "Japan 7\n", "Vietnam 4\n", "Germany 892\n", "France 597\n", "United Kingdom 2649\n", "Italy 106\n", "South Africa 28\n", "Ukraine 19\n", "Spain 5\n", "Poland 859\n", "Canada 391\n", "North Korea 5\n", "Romania 7\n", "Australia 891\n", "Portugal 7\n", "Belgium 110\n", "Czech Republic 271\n", "Hungary 8\n", "Belarus 38\n", "Sweden 2008\n", "Austria 29\n", "Switzerland 30\n", "Israel 32\n", "Serbia 2\n", "Denmark 2176\n", "Finland 4051\n", "Norway 1077\n", "Croatia 21\n", "New Zealand 1097\n", "Ireland 29\n", "Lithuania 29\n", "Uruguay 2\n", "Latvia 3\n", "Slovenia 1\n", "Estonia 97\n", "Netherlands 536\n", "Kosovo 649\n", "dtype: int64" ] }, "execution_count": 43, "metadata": {}, "output_type": "execute_result" } ], "source": [ "corg_clean_t.loc[:, (corg_clean_t.columns != 'Longitude') & (corg_clean_t.columns != \"Latitude\")].sum(axis=1)" ] }, { "cell_type": "code", "execution_count": 44, "id": "ceffa58a", "metadata": {}, "outputs": [], "source": [ "corg_clean_t['Total Corg'] = corg_clean_t.loc[:, (corg_clean_t.columns != 'Longitude') & (corg_clean_t.columns != \"Latitude\")].sum(axis=1).values" ] }, { "cell_type": "code", "execution_count": 45, "id": "0e8c048e", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Years1917-01-01 00:00:001918-01-01 00:00:001919-01-01 00:00:001920-01-01 00:00:001921-01-01 00:00:001922-01-01 00:00:001923-01-01 00:00:001924-01-01 00:00:001925-01-01 00:00:001926-01-01 00:00:00...2014-01-01 00:00:002015-01-01 00:00:002016-01-01 00:00:002017-01-01 00:00:002018-01-01 00:00:002019-01-01 00:00:002020-01-01 00:00:00LatitudeLongitudeTotal Corg
Country
United States0000000000...280301229213198118239.783730-100.44588216130
Brazil0000000000...0000000-10.333333-53.2000001
Russia0000000000...1272372172603291891664.68631497.7453061834
Japan0000000000...002200036.574844139.2394187
Vietnam0000000000...000130015.926666107.9650864
\n", "

5 rows × 107 columns

\n", "
" ], "text/plain": [ "Years 1917-01-01 00:00:00 1918-01-01 00:00:00 1919-01-01 00:00:00 \\\n", "Country \n", "United States 0 0 0 \n", "Brazil 0 0 0 \n", "Russia 0 0 0 \n", "Japan 0 0 0 \n", "Vietnam 0 0 0 \n", "\n", "Years 1920-01-01 00:00:00 1921-01-01 00:00:00 1922-01-01 00:00:00 \\\n", "Country \n", "United States 0 0 0 \n", "Brazil 0 0 0 \n", "Russia 0 0 0 \n", "Japan 0 0 0 \n", "Vietnam 0 0 0 \n", "\n", "Years 1923-01-01 00:00:00 1924-01-01 00:00:00 1925-01-01 00:00:00 \\\n", "Country \n", "United States 0 0 0 \n", "Brazil 0 0 0 \n", "Russia 0 0 0 \n", "Japan 0 0 0 \n", "Vietnam 0 0 0 \n", "\n", "Years 1926-01-01 00:00:00 ... 2014-01-01 00:00:00 \\\n", "Country ... \n", "United States 0 ... 280 \n", "Brazil 0 ... 0 \n", "Russia 0 ... 127 \n", "Japan 0 ... 0 \n", "Vietnam 0 ... 0 \n", "\n", "Years 2015-01-01 00:00:00 2016-01-01 00:00:00 2017-01-01 00:00:00 \\\n", "Country \n", "United States 301 229 213 \n", "Brazil 0 0 0 \n", "Russia 237 217 260 \n", "Japan 0 2 2 \n", "Vietnam 0 0 1 \n", "\n", "Years 2018-01-01 00:00:00 2019-01-01 00:00:00 2020-01-01 00:00:00 \\\n", "Country \n", "United States 198 118 2 \n", "Brazil 0 0 0 \n", "Russia 329 189 16 \n", "Japan 0 0 0 \n", "Vietnam 3 0 0 \n", "\n", "Years Latitude Longitude Total Corg \n", "Country \n", "United States 39.783730 -100.445882 16130 \n", "Brazil -10.333333 -53.200000 1 \n", "Russia 64.686314 97.745306 1834 \n", "Japan 36.574844 139.239418 7 \n", "Vietnam 15.926666 107.965086 4 \n", "\n", "[5 rows x 107 columns]" ] }, "execution_count": 45, "metadata": {}, "output_type": "execute_result" } ], "source": [ "corg_clean_t.head()" ] }, { "cell_type": "markdown", "id": "a561485c", "metadata": {}, "source": [ "Let's again subset our data:" ] }, { "cell_type": "code", "execution_count": 46, "id": "a0e18e70", "metadata": {}, "outputs": [], "source": [ "corg_subset2 = corg_clean_t.reset_index()[['Country','Latitude','Longitude','Total Corg']]" ] }, { "cell_type": "code", "execution_count": 47, "id": "6ea03e09", "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "
\n", "" ], "text/plain": [ "alt.LayerChart(...)" ] }, "execution_count": 47, "metadata": {}, "output_type": "execute_result" } ], "source": [ "geo = alt.topo_feature(data.world_110m.url, feature='countries')\n", "\n", "# US states background\n", "world = alt.Chart(geo).mark_geoshape(\n", " fill='gray',\n", " stroke='white'\n", ").properties(\n", " width=800,\n", " height=500\n", ").project('equirectangular') # note we have a few projections we can use!\n", "\n", "points = alt.Chart(corg_subset2).mark_circle().encode(\n", " longitude='Longitude:Q',\n", " latitude='Latitude:Q',\n", " size=alt.Size('Total Corg:Q'),\n", " tooltip='Country',\n", ")\n", "\n", "world + points" ] }, { "cell_type": "markdown", "id": "395f4c9e", "metadata": {}, "source": [ "By default, we see that there is some scale applied to the points where there are some that are very small. Let's see if we can mess with this:" ] }, { "cell_type": "code", "execution_count": 48, "id": "04936076", "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "
\n", "" ], "text/plain": [ "alt.LayerChart(...)" ] }, "execution_count": 48, "metadata": {}, "output_type": "execute_result" } ], "source": [ "geo = alt.topo_feature(data.world_110m.url, feature='countries')\n", "\n", "# US states background\n", "world = alt.Chart(geo).mark_geoshape(\n", " fill='gray',\n", " stroke='white'\n", ").properties(\n", " width=800,\n", " height=500\n", ").project('equirectangular') # note we have a few projections we can use!\n", "\n", "points = alt.Chart(corg_subset2).mark_circle().encode(\n", " longitude='Longitude:Q',\n", " latitude='Latitude:Q',\n", " size=alt.Size('Total Corg:Q',scale=None),\n", " tooltip='Country',\n", ")\n", "\n", "world + points" ] }, { "cell_type": "markdown", "id": "2a3e1a96", "metadata": {}, "source": [ "Whoa! Now that seems to have gone the other way! Now we have so many overlapping points that its hard to see in \"high corgi\" areas like the UK. Let's try a log scale:" ] }, { "cell_type": "code", "execution_count": 49, "id": "33af2154", "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "
\n", "" ], "text/plain": [ "alt.LayerChart(...)" ] }, "execution_count": 49, "metadata": {}, "output_type": "execute_result" } ], "source": [ "geo = alt.topo_feature(data.world_110m.url, feature='countries')\n", "\n", "# US states background\n", "world = alt.Chart(geo).mark_geoshape(\n", " fill='gray',\n", " stroke='white'\n", ").properties(\n", " width=800,\n", " height=500\n", ").project('equirectangular') # note we have a few projections we can use!\n", "\n", "points = alt.Chart(corg_subset2).mark_circle().encode(\n", " longitude='Longitude:Q',\n", " latitude='Latitude:Q',\n", " size=alt.Size('Total Corg:Q',scale=alt.Scale(type='log')),\n", " tooltip='Country',\n", ")\n", "\n", "world + points" ] }, { "cell_type": "markdown", "id": "7eba854d", "metadata": {}, "source": [ "Hey that looks better! Though, that blue is a bit hard to see, let's try messing with the colors:" ] }, { "cell_type": "code", "execution_count": 50, "id": "823bb08d", "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "
\n", "" ], "text/plain": [ "alt.LayerChart(...)" ] }, "execution_count": 50, "metadata": {}, "output_type": "execute_result" } ], "source": [ "geo = alt.topo_feature(data.world_110m.url, feature='countries')\n", "\n", "# US states background\n", "world = alt.Chart(geo).mark_geoshape(\n", " fill='lightgray',\n", " stroke='white'\n", ").properties(\n", " width=800,\n", " height=500\n", ").project('equirectangular') # note we have a few projections we can use!\n", "\n", "points = alt.Chart(corg_subset2).mark_circle().encode(\n", " longitude='Longitude:Q',\n", " latitude='Latitude:Q',\n", " size=alt.Size('Total Corg:Q',scale=alt.Scale(type='log')),\n", " tooltip='Country',\n", ")\n", "\n", "world + points" ] }, { "cell_type": "markdown", "id": "1745af2d", "metadata": {}, "source": [ "Neat! Now, wouldn't it be fun if we could see those data points \"grow\" over time. Let's look at our dataframe again:" ] }, { "cell_type": "code", "execution_count": 51, "id": "8c674e4f", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Years1917-01-01 00:00:001918-01-01 00:00:001919-01-01 00:00:001920-01-01 00:00:001921-01-01 00:00:001922-01-01 00:00:001923-01-01 00:00:001924-01-01 00:00:001925-01-01 00:00:001926-01-01 00:00:00...2014-01-01 00:00:002015-01-01 00:00:002016-01-01 00:00:002017-01-01 00:00:002018-01-01 00:00:002019-01-01 00:00:002020-01-01 00:00:00LatitudeLongitudeTotal Corg
Country
United States0000000000...280301229213198118239.783730-100.44588216130
Brazil0000000000...0000000-10.333333-53.2000001
Russia0000000000...1272372172603291891664.68631497.7453061834
Japan0000000000...002200036.574844139.2394187
Vietnam0000000000...000130015.926666107.9650864
\n", "

5 rows × 107 columns

\n", "
" ], "text/plain": [ "Years 1917-01-01 00:00:00 1918-01-01 00:00:00 1919-01-01 00:00:00 \\\n", "Country \n", "United States 0 0 0 \n", "Brazil 0 0 0 \n", "Russia 0 0 0 \n", "Japan 0 0 0 \n", "Vietnam 0 0 0 \n", "\n", "Years 1920-01-01 00:00:00 1921-01-01 00:00:00 1922-01-01 00:00:00 \\\n", "Country \n", "United States 0 0 0 \n", "Brazil 0 0 0 \n", "Russia 0 0 0 \n", "Japan 0 0 0 \n", "Vietnam 0 0 0 \n", "\n", "Years 1923-01-01 00:00:00 1924-01-01 00:00:00 1925-01-01 00:00:00 \\\n", "Country \n", "United States 0 0 0 \n", "Brazil 0 0 0 \n", "Russia 0 0 0 \n", "Japan 0 0 0 \n", "Vietnam 0 0 0 \n", "\n", "Years 1926-01-01 00:00:00 ... 2014-01-01 00:00:00 \\\n", "Country ... \n", "United States 0 ... 280 \n", "Brazil 0 ... 0 \n", "Russia 0 ... 127 \n", "Japan 0 ... 0 \n", "Vietnam 0 ... 0 \n", "\n", "Years 2015-01-01 00:00:00 2016-01-01 00:00:00 2017-01-01 00:00:00 \\\n", "Country \n", "United States 301 229 213 \n", "Brazil 0 0 0 \n", "Russia 237 217 260 \n", "Japan 0 2 2 \n", "Vietnam 0 0 1 \n", "\n", "Years 2018-01-01 00:00:00 2019-01-01 00:00:00 2020-01-01 00:00:00 \\\n", "Country \n", "United States 198 118 2 \n", "Brazil 0 0 0 \n", "Russia 329 189 16 \n", "Japan 0 0 0 \n", "Vietnam 3 0 0 \n", "\n", "Years Latitude Longitude Total Corg \n", "Country \n", "United States 39.783730 -100.445882 16130 \n", "Brazil -10.333333 -53.200000 1 \n", "Russia 64.686314 97.745306 1834 \n", "Japan 36.574844 139.239418 7 \n", "Vietnam 15.926666 107.965086 4 \n", "\n", "[5 rows x 107 columns]" ] }, "execution_count": 51, "metadata": {}, "output_type": "execute_result" } ], "source": [ "corg_clean_t.head()" ] }, { "cell_type": "markdown", "id": "91d03087", "metadata": {}, "source": [ "Instead of total corgs overall, what we really want is total corgs over time. Luckily, our columns are in the correct date order:" ] }, { "cell_type": "code", "execution_count": 52, "id": "4b0fc8b4", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Years1917-01-01 00:00:001918-01-01 00:00:001919-01-01 00:00:001920-01-01 00:00:001921-01-01 00:00:001922-01-01 00:00:001923-01-01 00:00:001924-01-01 00:00:001925-01-01 00:00:001926-01-01 00:00:00...2013-01-01 00:00:002014-01-01 00:00:002015-01-01 00:00:002016-01-01 00:00:002017-01-01 00:00:002018-01-01 00:00:002019-01-01 00:00:002020-01-01 00:00:00LatitudeLongitude
Country
United States0000000000...376280301229213198118239.783730-100.445882
Brazil0000000000...00000000-10.333333-53.200000
Russia0000000000...1151272372172603291891664.68631497.745306
Japan0000000000...0002200036.574844139.239418
Vietnam0000000000...0000130015.926666107.965086
\n", "

5 rows × 106 columns

\n", "
" ], "text/plain": [ "Years 1917-01-01 00:00:00 1918-01-01 00:00:00 1919-01-01 00:00:00 \\\n", "Country \n", "United States 0 0 0 \n", "Brazil 0 0 0 \n", "Russia 0 0 0 \n", "Japan 0 0 0 \n", "Vietnam 0 0 0 \n", "\n", "Years 1920-01-01 00:00:00 1921-01-01 00:00:00 1922-01-01 00:00:00 \\\n", "Country \n", "United States 0 0 0 \n", "Brazil 0 0 0 \n", "Russia 0 0 0 \n", "Japan 0 0 0 \n", "Vietnam 0 0 0 \n", "\n", "Years 1923-01-01 00:00:00 1924-01-01 00:00:00 1925-01-01 00:00:00 \\\n", "Country \n", "United States 0 0 0 \n", "Brazil 0 0 0 \n", "Russia 0 0 0 \n", "Japan 0 0 0 \n", "Vietnam 0 0 0 \n", "\n", "Years 1926-01-01 00:00:00 ... 2013-01-01 00:00:00 \\\n", "Country ... \n", "United States 0 ... 376 \n", "Brazil 0 ... 0 \n", "Russia 0 ... 115 \n", "Japan 0 ... 0 \n", "Vietnam 0 ... 0 \n", "\n", "Years 2014-01-01 00:00:00 2015-01-01 00:00:00 2016-01-01 00:00:00 \\\n", "Country \n", "United States 280 301 229 \n", "Brazil 0 0 0 \n", "Russia 127 237 217 \n", "Japan 0 0 2 \n", "Vietnam 0 0 0 \n", "\n", "Years 2017-01-01 00:00:00 2018-01-01 00:00:00 2019-01-01 00:00:00 \\\n", "Country \n", "United States 213 198 118 \n", "Brazil 0 0 0 \n", "Russia 260 329 189 \n", "Japan 2 0 0 \n", "Vietnam 1 3 0 \n", "\n", "Years 2020-01-01 00:00:00 Latitude Longitude \n", "Country \n", "United States 2 39.783730 -100.445882 \n", "Brazil 0 -10.333333 -53.200000 \n", "Russia 16 64.686314 97.745306 \n", "Japan 0 36.574844 139.239418 \n", "Vietnam 0 15.926666 107.965086 \n", "\n", "[5 rows x 106 columns]" ] }, "execution_count": 52, "metadata": {}, "output_type": "execute_result" } ], "source": [ "corg_clean2 = corg_clean_t.loc[:,(corg_clean_t.columns != 'Total Corg')].copy()\n", "corg_clean2.head()" ] }, { "cell_type": "code", "execution_count": 53, "id": "040a6c82", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Years1917-01-011918-01-011919-01-011920-01-011921-01-011922-01-011923-01-011924-01-011925-01-011926-01-01...2011-01-012012-01-012013-01-012014-01-012015-01-012016-01-012017-01-012018-01-012019-01-012020-01-01
Country
United States0000000000...4084313762803012292131981182
Brazil0000000000...0000000000
Russia0000000000...898211512723721726032918916
Japan0000000000...0000022000
Vietnam0000000000...0000001300
\n", "

5 rows × 104 columns

\n", "
" ], "text/plain": [ "Years 1917-01-01 1918-01-01 1919-01-01 1920-01-01 1921-01-01 \\\n", "Country \n", "United States 0 0 0 0 0 \n", "Brazil 0 0 0 0 0 \n", "Russia 0 0 0 0 0 \n", "Japan 0 0 0 0 0 \n", "Vietnam 0 0 0 0 0 \n", "\n", "Years 1922-01-01 1923-01-01 1924-01-01 1925-01-01 1926-01-01 \\\n", "Country \n", "United States 0 0 0 0 0 \n", "Brazil 0 0 0 0 0 \n", "Russia 0 0 0 0 0 \n", "Japan 0 0 0 0 0 \n", "Vietnam 0 0 0 0 0 \n", "\n", "Years ... 2011-01-01 2012-01-01 2013-01-01 2014-01-01 \\\n", "Country ... \n", "United States ... 408 431 376 280 \n", "Brazil ... 0 0 0 0 \n", "Russia ... 89 82 115 127 \n", "Japan ... 0 0 0 0 \n", "Vietnam ... 0 0 0 0 \n", "\n", "Years 2015-01-01 2016-01-01 2017-01-01 2018-01-01 2019-01-01 \\\n", "Country \n", "United States 301 229 213 198 118 \n", "Brazil 0 0 0 0 0 \n", "Russia 237 217 260 329 189 \n", "Japan 0 2 2 0 0 \n", "Vietnam 0 0 1 3 0 \n", "\n", "Years 2020-01-01 \n", "Country \n", "United States 2 \n", "Brazil 0 \n", "Russia 16 \n", "Japan 0 \n", "Vietnam 0 \n", "\n", "[5 rows x 104 columns]" ] }, "execution_count": 53, "metadata": {}, "output_type": "execute_result" } ], "source": [ "corg_clean2.loc[:,(corg_clean2.columns != 'Latitude') & (corg_clean2.columns!='Longitude')].head()" ] }, { "cell_type": "code", "execution_count": 54, "id": "a242c754", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Years1917-01-011918-01-011919-01-011920-01-011921-01-011922-01-011923-01-011924-01-011925-01-011926-01-01...2011-01-012012-01-012013-01-012014-01-012015-01-012016-01-012017-01-012018-01-012019-01-012020-01-01
Country
United States0000000000...13982144131478915069153701559915812160101612816130
Brazil0000000000...1111111111
Russia0000000000...26234445958682310401300162918181834
Japan0000000000...3333357777
Vietnam0000000000...0000001444
Germany0000000000...428508589666738792820862892892
France0000000000...276302338375395450510535597597
United Kingdom111111351931...2509253125572579260526332643264826492649
Italy0000000000...818181949698102104106106
South Africa0000000000...19232428282828282828
Ukraine0000000000...891010121212181919
Spain0000000000...0000000055
Poland0000000000...337379460526617692748825859859
Canada0000000000...329344359364377383386390391391
North Korea0000000000...0034445555
Romania0000000000...0000000077
Australia0000000000...814827844855861870879887891891
Portugal0000000000...0001233577
Belgium0000000000...6379949698104105107110110
Czech Republic0000000000...139150171188210233255264271271
Hungary0000000000...7778888888
Belarus0000000000...00000512213838
Sweden0000000000...1214134214611525164317911928198820072008
Austria0000000000...21222229292929292929
Switzerland0000000000...28292929303030303030
Israel0000000000...0008131721253232
Serbia0000000000...0000112222
Denmark0000000000...1549165317601856196720452115214821762176
Finland0000000000...2824296731103308350936953961404440484051
Norway0000000000...434503570669770873970105610771077
Croatia0000000000...000031115212121
New Zealand0000000000...1080108010821083108810911094109710971097
Ireland0000000000...29292929292929292929
Lithuania0000000000...552020202929292929
Uruguay0000000000...0000002222
Latvia0000000000...3333333333
Slovenia0000000000...1111111111
Estonia0000000000...11181824305765819797
Netherlands0000000000...323375417448470487512524536536
Kosovo0011111111...185204251288352424525581632649
\n", "

40 rows × 104 columns

\n", "
" ], "text/plain": [ "Years 1917-01-01 1918-01-01 1919-01-01 1920-01-01 1921-01-01 \\\n", "Country \n", "United States 0 0 0 0 0 \n", "Brazil 0 0 0 0 0 \n", "Russia 0 0 0 0 0 \n", "Japan 0 0 0 0 0 \n", "Vietnam 0 0 0 0 0 \n", "Germany 0 0 0 0 0 \n", "France 0 0 0 0 0 \n", "United Kingdom 1 1 1 1 1 \n", "Italy 0 0 0 0 0 \n", "South Africa 0 0 0 0 0 \n", "Ukraine 0 0 0 0 0 \n", "Spain 0 0 0 0 0 \n", "Poland 0 0 0 0 0 \n", "Canada 0 0 0 0 0 \n", "North Korea 0 0 0 0 0 \n", "Romania 0 0 0 0 0 \n", "Australia 0 0 0 0 0 \n", "Portugal 0 0 0 0 0 \n", "Belgium 0 0 0 0 0 \n", "Czech Republic 0 0 0 0 0 \n", "Hungary 0 0 0 0 0 \n", "Belarus 0 0 0 0 0 \n", "Sweden 0 0 0 0 0 \n", "Austria 0 0 0 0 0 \n", "Switzerland 0 0 0 0 0 \n", "Israel 0 0 0 0 0 \n", "Serbia 0 0 0 0 0 \n", "Denmark 0 0 0 0 0 \n", "Finland 0 0 0 0 0 \n", "Norway 0 0 0 0 0 \n", "Croatia 0 0 0 0 0 \n", "New Zealand 0 0 0 0 0 \n", "Ireland 0 0 0 0 0 \n", "Lithuania 0 0 0 0 0 \n", "Uruguay 0 0 0 0 0 \n", "Latvia 0 0 0 0 0 \n", "Slovenia 0 0 0 0 0 \n", "Estonia 0 0 0 0 0 \n", "Netherlands 0 0 0 0 0 \n", "Kosovo 0 0 1 1 1 \n", "\n", "Years 1922-01-01 1923-01-01 1924-01-01 1925-01-01 1926-01-01 \\\n", "Country \n", "United States 0 0 0 0 0 \n", "Brazil 0 0 0 0 0 \n", "Russia 0 0 0 0 0 \n", "Japan 0 0 0 0 0 \n", "Vietnam 0 0 0 0 0 \n", "Germany 0 0 0 0 0 \n", "France 0 0 0 0 0 \n", "United Kingdom 1 3 5 19 31 \n", "Italy 0 0 0 0 0 \n", "South Africa 0 0 0 0 0 \n", "Ukraine 0 0 0 0 0 \n", "Spain 0 0 0 0 0 \n", "Poland 0 0 0 0 0 \n", "Canada 0 0 0 0 0 \n", "North Korea 0 0 0 0 0 \n", "Romania 0 0 0 0 0 \n", "Australia 0 0 0 0 0 \n", "Portugal 0 0 0 0 0 \n", "Belgium 0 0 0 0 0 \n", "Czech Republic 0 0 0 0 0 \n", "Hungary 0 0 0 0 0 \n", "Belarus 0 0 0 0 0 \n", "Sweden 0 0 0 0 0 \n", "Austria 0 0 0 0 0 \n", "Switzerland 0 0 0 0 0 \n", "Israel 0 0 0 0 0 \n", "Serbia 0 0 0 0 0 \n", "Denmark 0 0 0 0 0 \n", "Finland 0 0 0 0 0 \n", "Norway 0 0 0 0 0 \n", "Croatia 0 0 0 0 0 \n", "New Zealand 0 0 0 0 0 \n", "Ireland 0 0 0 0 0 \n", "Lithuania 0 0 0 0 0 \n", "Uruguay 0 0 0 0 0 \n", "Latvia 0 0 0 0 0 \n", "Slovenia 0 0 0 0 0 \n", "Estonia 0 0 0 0 0 \n", "Netherlands 0 0 0 0 0 \n", "Kosovo 1 1 1 1 1 \n", "\n", "Years ... 2011-01-01 2012-01-01 2013-01-01 2014-01-01 \\\n", "Country ... \n", "United States ... 13982 14413 14789 15069 \n", "Brazil ... 1 1 1 1 \n", "Russia ... 262 344 459 586 \n", "Japan ... 3 3 3 3 \n", "Vietnam ... 0 0 0 0 \n", "Germany ... 428 508 589 666 \n", "France ... 276 302 338 375 \n", "United Kingdom ... 2509 2531 2557 2579 \n", "Italy ... 81 81 81 94 \n", "South Africa ... 19 23 24 28 \n", "Ukraine ... 8 9 10 10 \n", "Spain ... 0 0 0 0 \n", "Poland ... 337 379 460 526 \n", "Canada ... 329 344 359 364 \n", "North Korea ... 0 0 3 4 \n", "Romania ... 0 0 0 0 \n", "Australia ... 814 827 844 855 \n", "Portugal ... 0 0 0 1 \n", "Belgium ... 63 79 94 96 \n", "Czech Republic ... 139 150 171 188 \n", "Hungary ... 7 7 7 8 \n", "Belarus ... 0 0 0 0 \n", "Sweden ... 1214 1342 1461 1525 \n", "Austria ... 21 22 22 29 \n", "Switzerland ... 28 29 29 29 \n", "Israel ... 0 0 0 8 \n", "Serbia ... 0 0 0 0 \n", "Denmark ... 1549 1653 1760 1856 \n", "Finland ... 2824 2967 3110 3308 \n", "Norway ... 434 503 570 669 \n", "Croatia ... 0 0 0 0 \n", "New Zealand ... 1080 1080 1082 1083 \n", "Ireland ... 29 29 29 29 \n", "Lithuania ... 5 5 20 20 \n", "Uruguay ... 0 0 0 0 \n", "Latvia ... 3 3 3 3 \n", "Slovenia ... 1 1 1 1 \n", "Estonia ... 11 18 18 24 \n", "Netherlands ... 323 375 417 448 \n", "Kosovo ... 185 204 251 288 \n", "\n", "Years 2015-01-01 2016-01-01 2017-01-01 2018-01-01 2019-01-01 \\\n", "Country \n", "United States 15370 15599 15812 16010 16128 \n", "Brazil 1 1 1 1 1 \n", "Russia 823 1040 1300 1629 1818 \n", "Japan 3 5 7 7 7 \n", "Vietnam 0 0 1 4 4 \n", "Germany 738 792 820 862 892 \n", "France 395 450 510 535 597 \n", "United Kingdom 2605 2633 2643 2648 2649 \n", "Italy 96 98 102 104 106 \n", "South Africa 28 28 28 28 28 \n", "Ukraine 12 12 12 18 19 \n", "Spain 0 0 0 0 5 \n", "Poland 617 692 748 825 859 \n", "Canada 377 383 386 390 391 \n", "North Korea 4 4 5 5 5 \n", "Romania 0 0 0 0 7 \n", "Australia 861 870 879 887 891 \n", "Portugal 2 3 3 5 7 \n", "Belgium 98 104 105 107 110 \n", "Czech Republic 210 233 255 264 271 \n", "Hungary 8 8 8 8 8 \n", "Belarus 0 5 12 21 38 \n", "Sweden 1643 1791 1928 1988 2007 \n", "Austria 29 29 29 29 29 \n", "Switzerland 30 30 30 30 30 \n", "Israel 13 17 21 25 32 \n", "Serbia 1 1 2 2 2 \n", "Denmark 1967 2045 2115 2148 2176 \n", "Finland 3509 3695 3961 4044 4048 \n", "Norway 770 873 970 1056 1077 \n", "Croatia 3 11 15 21 21 \n", "New Zealand 1088 1091 1094 1097 1097 \n", "Ireland 29 29 29 29 29 \n", "Lithuania 20 29 29 29 29 \n", "Uruguay 0 0 2 2 2 \n", "Latvia 3 3 3 3 3 \n", "Slovenia 1 1 1 1 1 \n", "Estonia 30 57 65 81 97 \n", "Netherlands 470 487 512 524 536 \n", "Kosovo 352 424 525 581 632 \n", "\n", "Years 2020-01-01 \n", "Country \n", "United States 16130 \n", "Brazil 1 \n", "Russia 1834 \n", "Japan 7 \n", "Vietnam 4 \n", "Germany 892 \n", "France 597 \n", "United Kingdom 2649 \n", "Italy 106 \n", "South Africa 28 \n", "Ukraine 19 \n", "Spain 5 \n", "Poland 859 \n", "Canada 391 \n", "North Korea 5 \n", "Romania 7 \n", "Australia 891 \n", "Portugal 7 \n", "Belgium 110 \n", "Czech Republic 271 \n", "Hungary 8 \n", "Belarus 38 \n", "Sweden 2008 \n", "Austria 29 \n", "Switzerland 30 \n", "Israel 32 \n", "Serbia 2 \n", "Denmark 2176 \n", "Finland 4051 \n", "Norway 1077 \n", "Croatia 21 \n", "New Zealand 1097 \n", "Ireland 29 \n", "Lithuania 29 \n", "Uruguay 2 \n", "Latvia 3 \n", "Slovenia 1 \n", "Estonia 97 \n", "Netherlands 536 \n", "Kosovo 649 \n", "\n", "[40 rows x 104 columns]" ] }, "execution_count": 54, "metadata": {}, "output_type": "execute_result" } ], "source": [ "corg_clean2.loc[:,(corg_clean2.columns != 'Latitude') & (corg_clean2.columns!='Longitude')].cumsum(axis=1)" ] }, { "cell_type": "code", "execution_count": 55, "id": "efd62762", "metadata": {}, "outputs": [], "source": [ "corg_clean2.loc[:,(corg_clean2.columns != 'Latitude') & (corg_clean2.columns!='Longitude')] = \\\n", " corg_clean2.loc[:,(corg_clean2.columns != 'Latitude') & (corg_clean2.columns!='Longitude')].cumsum(axis=1)" ] }, { "cell_type": "code", "execution_count": 56, "id": "7d8f7724", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Years1917-01-01 00:00:001918-01-01 00:00:001919-01-01 00:00:001920-01-01 00:00:001921-01-01 00:00:001922-01-01 00:00:001923-01-01 00:00:001924-01-01 00:00:001925-01-01 00:00:001926-01-01 00:00:00...2013-01-01 00:00:002014-01-01 00:00:002015-01-01 00:00:002016-01-01 00:00:002017-01-01 00:00:002018-01-01 00:00:002019-01-01 00:00:002020-01-01 00:00:00LatitudeLongitude
Country
United States0000000000...147891506915370155991581216010161281613039.783730-100.445882
Brazil0000000000...11111111-10.333333-53.200000
Russia0000000000...4595868231040130016291818183464.68631497.745306
Japan0000000000...3335777736.574844139.239418
Vietnam0000000000...0000144415.926666107.965086
Germany0000000000...58966673879282086289289251.16381810.447831
France0000000000...33837539545051053559759746.6033541.888334
United Kingdom111111351931...2557257926052633264326482649264954.702354-3.276575
Italy0000000000...8194969810210410610642.63842612.674297
South Africa0000000000...2428282828282828-28.81662424.991639
Ukraine0000000000...101012121218191949.48719731.271832
Spain0000000000...0000005539.326068-4.837979
Poland0000000000...46052661769274882585985952.21593319.134422
Canada0000000000...35936437738338639039139161.066692-107.991707
North Korea0000000000...3444555540.373661127.087042
Romania0000000000...0000007745.98521324.685923
Australia0000000000...844855861870879887891891-24.776109134.755000
Portugal0000000000...0123357739.662165-8.135352
Belgium0000000000...94969810410510711011050.6402814.666715
Czech Republic0000000000...17118821023325526427127149.74390515.338106
Hungary0000000000...7888888847.18175919.506094
Belarus0000000000...00051221383853.42506127.697136
Sweden0000000000...1461152516431791192819882007200859.67497114.520858
Austria0000000000...222929292929292947.59397014.124560
Switzerland0000000000...292930303030303046.7985628.231974
Israel0000000000...0813172125323230.81242534.859476
Serbia0000000000...0011222244.15341220.551440
Denmark0000000000...1760185619672045211521482176217655.67024910.333328
Finland0000000000...3110330835093695396140444048405163.24677825.920916
Norway0000000000...57066977087397010561077107760.5000219.099972
Croatia0000000000...003111521212145.56434417.011895
New Zealand0000000000...10821083108810911094109710971097-41.500083172.834408
Ireland0000000000...292929292929292952.865196-7.979460
Lithuania0000000000...202020292929292955.35000023.750000
Uruguay0000000000...00002222-32.875555-56.020153
Latvia0000000000...3333333356.84064924.753764
Slovenia0000000000...1111111145.81331114.480837
Estonia0000000000...182430576581979758.75237825.331908
Netherlands0000000000...41744847048751252453653652.2476505.541247
Kosovo0011111111...25128835242452558163264942.58695820.902123
\n", "

40 rows × 106 columns

\n", "
" ], "text/plain": [ "Years 1917-01-01 00:00:00 1918-01-01 00:00:00 1919-01-01 00:00:00 \\\n", "Country \n", "United States 0 0 0 \n", "Brazil 0 0 0 \n", "Russia 0 0 0 \n", "Japan 0 0 0 \n", "Vietnam 0 0 0 \n", "Germany 0 0 0 \n", "France 0 0 0 \n", "United Kingdom 1 1 1 \n", "Italy 0 0 0 \n", "South Africa 0 0 0 \n", "Ukraine 0 0 0 \n", "Spain 0 0 0 \n", "Poland 0 0 0 \n", "Canada 0 0 0 \n", "North Korea 0 0 0 \n", "Romania 0 0 0 \n", "Australia 0 0 0 \n", "Portugal 0 0 0 \n", "Belgium 0 0 0 \n", "Czech Republic 0 0 0 \n", "Hungary 0 0 0 \n", "Belarus 0 0 0 \n", "Sweden 0 0 0 \n", "Austria 0 0 0 \n", "Switzerland 0 0 0 \n", "Israel 0 0 0 \n", "Serbia 0 0 0 \n", "Denmark 0 0 0 \n", "Finland 0 0 0 \n", "Norway 0 0 0 \n", "Croatia 0 0 0 \n", "New Zealand 0 0 0 \n", "Ireland 0 0 0 \n", "Lithuania 0 0 0 \n", "Uruguay 0 0 0 \n", "Latvia 0 0 0 \n", "Slovenia 0 0 0 \n", "Estonia 0 0 0 \n", "Netherlands 0 0 0 \n", "Kosovo 0 0 1 \n", "\n", "Years 1920-01-01 00:00:00 1921-01-01 00:00:00 1922-01-01 00:00:00 \\\n", "Country \n", "United States 0 0 0 \n", "Brazil 0 0 0 \n", "Russia 0 0 0 \n", "Japan 0 0 0 \n", "Vietnam 0 0 0 \n", "Germany 0 0 0 \n", "France 0 0 0 \n", "United Kingdom 1 1 1 \n", "Italy 0 0 0 \n", "South Africa 0 0 0 \n", "Ukraine 0 0 0 \n", "Spain 0 0 0 \n", "Poland 0 0 0 \n", "Canada 0 0 0 \n", "North Korea 0 0 0 \n", "Romania 0 0 0 \n", "Australia 0 0 0 \n", "Portugal 0 0 0 \n", "Belgium 0 0 0 \n", "Czech Republic 0 0 0 \n", "Hungary 0 0 0 \n", "Belarus 0 0 0 \n", "Sweden 0 0 0 \n", "Austria 0 0 0 \n", "Switzerland 0 0 0 \n", "Israel 0 0 0 \n", "Serbia 0 0 0 \n", "Denmark 0 0 0 \n", "Finland 0 0 0 \n", "Norway 0 0 0 \n", "Croatia 0 0 0 \n", "New Zealand 0 0 0 \n", "Ireland 0 0 0 \n", "Lithuania 0 0 0 \n", "Uruguay 0 0 0 \n", "Latvia 0 0 0 \n", "Slovenia 0 0 0 \n", "Estonia 0 0 0 \n", "Netherlands 0 0 0 \n", "Kosovo 1 1 1 \n", "\n", "Years 1923-01-01 00:00:00 1924-01-01 00:00:00 1925-01-01 00:00:00 \\\n", "Country \n", "United States 0 0 0 \n", "Brazil 0 0 0 \n", "Russia 0 0 0 \n", "Japan 0 0 0 \n", "Vietnam 0 0 0 \n", "Germany 0 0 0 \n", "France 0 0 0 \n", "United Kingdom 3 5 19 \n", "Italy 0 0 0 \n", "South Africa 0 0 0 \n", "Ukraine 0 0 0 \n", "Spain 0 0 0 \n", "Poland 0 0 0 \n", "Canada 0 0 0 \n", "North Korea 0 0 0 \n", "Romania 0 0 0 \n", "Australia 0 0 0 \n", "Portugal 0 0 0 \n", "Belgium 0 0 0 \n", "Czech Republic 0 0 0 \n", "Hungary 0 0 0 \n", "Belarus 0 0 0 \n", "Sweden 0 0 0 \n", "Austria 0 0 0 \n", "Switzerland 0 0 0 \n", "Israel 0 0 0 \n", "Serbia 0 0 0 \n", "Denmark 0 0 0 \n", "Finland 0 0 0 \n", "Norway 0 0 0 \n", "Croatia 0 0 0 \n", "New Zealand 0 0 0 \n", "Ireland 0 0 0 \n", "Lithuania 0 0 0 \n", "Uruguay 0 0 0 \n", "Latvia 0 0 0 \n", "Slovenia 0 0 0 \n", "Estonia 0 0 0 \n", "Netherlands 0 0 0 \n", "Kosovo 1 1 1 \n", "\n", "Years 1926-01-01 00:00:00 ... 2013-01-01 00:00:00 \\\n", "Country ... \n", "United States 0 ... 14789 \n", "Brazil 0 ... 1 \n", "Russia 0 ... 459 \n", "Japan 0 ... 3 \n", "Vietnam 0 ... 0 \n", "Germany 0 ... 589 \n", "France 0 ... 338 \n", "United Kingdom 31 ... 2557 \n", "Italy 0 ... 81 \n", "South Africa 0 ... 24 \n", "Ukraine 0 ... 10 \n", "Spain 0 ... 0 \n", "Poland 0 ... 460 \n", "Canada 0 ... 359 \n", "North Korea 0 ... 3 \n", "Romania 0 ... 0 \n", "Australia 0 ... 844 \n", "Portugal 0 ... 0 \n", "Belgium 0 ... 94 \n", "Czech Republic 0 ... 171 \n", "Hungary 0 ... 7 \n", "Belarus 0 ... 0 \n", "Sweden 0 ... 1461 \n", "Austria 0 ... 22 \n", "Switzerland 0 ... 29 \n", "Israel 0 ... 0 \n", "Serbia 0 ... 0 \n", "Denmark 0 ... 1760 \n", "Finland 0 ... 3110 \n", "Norway 0 ... 570 \n", "Croatia 0 ... 0 \n", "New Zealand 0 ... 1082 \n", "Ireland 0 ... 29 \n", "Lithuania 0 ... 20 \n", "Uruguay 0 ... 0 \n", "Latvia 0 ... 3 \n", "Slovenia 0 ... 1 \n", "Estonia 0 ... 18 \n", "Netherlands 0 ... 417 \n", "Kosovo 1 ... 251 \n", "\n", "Years 2014-01-01 00:00:00 2015-01-01 00:00:00 2016-01-01 00:00:00 \\\n", "Country \n", "United States 15069 15370 15599 \n", "Brazil 1 1 1 \n", "Russia 586 823 1040 \n", "Japan 3 3 5 \n", "Vietnam 0 0 0 \n", "Germany 666 738 792 \n", "France 375 395 450 \n", "United Kingdom 2579 2605 2633 \n", "Italy 94 96 98 \n", "South Africa 28 28 28 \n", "Ukraine 10 12 12 \n", "Spain 0 0 0 \n", "Poland 526 617 692 \n", "Canada 364 377 383 \n", "North Korea 4 4 4 \n", "Romania 0 0 0 \n", "Australia 855 861 870 \n", "Portugal 1 2 3 \n", "Belgium 96 98 104 \n", "Czech Republic 188 210 233 \n", "Hungary 8 8 8 \n", "Belarus 0 0 5 \n", "Sweden 1525 1643 1791 \n", "Austria 29 29 29 \n", "Switzerland 29 30 30 \n", "Israel 8 13 17 \n", "Serbia 0 1 1 \n", "Denmark 1856 1967 2045 \n", "Finland 3308 3509 3695 \n", "Norway 669 770 873 \n", "Croatia 0 3 11 \n", "New Zealand 1083 1088 1091 \n", "Ireland 29 29 29 \n", "Lithuania 20 20 29 \n", "Uruguay 0 0 0 \n", "Latvia 3 3 3 \n", "Slovenia 1 1 1 \n", "Estonia 24 30 57 \n", "Netherlands 448 470 487 \n", "Kosovo 288 352 424 \n", "\n", "Years 2017-01-01 00:00:00 2018-01-01 00:00:00 2019-01-01 00:00:00 \\\n", "Country \n", "United States 15812 16010 16128 \n", "Brazil 1 1 1 \n", "Russia 1300 1629 1818 \n", "Japan 7 7 7 \n", "Vietnam 1 4 4 \n", "Germany 820 862 892 \n", "France 510 535 597 \n", "United Kingdom 2643 2648 2649 \n", "Italy 102 104 106 \n", "South Africa 28 28 28 \n", "Ukraine 12 18 19 \n", "Spain 0 0 5 \n", "Poland 748 825 859 \n", "Canada 386 390 391 \n", "North Korea 5 5 5 \n", "Romania 0 0 7 \n", "Australia 879 887 891 \n", "Portugal 3 5 7 \n", "Belgium 105 107 110 \n", "Czech Republic 255 264 271 \n", "Hungary 8 8 8 \n", "Belarus 12 21 38 \n", "Sweden 1928 1988 2007 \n", "Austria 29 29 29 \n", "Switzerland 30 30 30 \n", "Israel 21 25 32 \n", "Serbia 2 2 2 \n", "Denmark 2115 2148 2176 \n", "Finland 3961 4044 4048 \n", "Norway 970 1056 1077 \n", "Croatia 15 21 21 \n", "New Zealand 1094 1097 1097 \n", "Ireland 29 29 29 \n", "Lithuania 29 29 29 \n", "Uruguay 2 2 2 \n", "Latvia 3 3 3 \n", "Slovenia 1 1 1 \n", "Estonia 65 81 97 \n", "Netherlands 512 524 536 \n", "Kosovo 525 581 632 \n", "\n", "Years 2020-01-01 00:00:00 Latitude Longitude \n", "Country \n", "United States 16130 39.783730 -100.445882 \n", "Brazil 1 -10.333333 -53.200000 \n", "Russia 1834 64.686314 97.745306 \n", "Japan 7 36.574844 139.239418 \n", "Vietnam 4 15.926666 107.965086 \n", "Germany 892 51.163818 10.447831 \n", "France 597 46.603354 1.888334 \n", "United Kingdom 2649 54.702354 -3.276575 \n", "Italy 106 42.638426 12.674297 \n", "South Africa 28 -28.816624 24.991639 \n", "Ukraine 19 49.487197 31.271832 \n", "Spain 5 39.326068 -4.837979 \n", "Poland 859 52.215933 19.134422 \n", "Canada 391 61.066692 -107.991707 \n", "North Korea 5 40.373661 127.087042 \n", "Romania 7 45.985213 24.685923 \n", "Australia 891 -24.776109 134.755000 \n", "Portugal 7 39.662165 -8.135352 \n", "Belgium 110 50.640281 4.666715 \n", "Czech Republic 271 49.743905 15.338106 \n", "Hungary 8 47.181759 19.506094 \n", "Belarus 38 53.425061 27.697136 \n", "Sweden 2008 59.674971 14.520858 \n", "Austria 29 47.593970 14.124560 \n", "Switzerland 30 46.798562 8.231974 \n", "Israel 32 30.812425 34.859476 \n", "Serbia 2 44.153412 20.551440 \n", "Denmark 2176 55.670249 10.333328 \n", "Finland 4051 63.246778 25.920916 \n", "Norway 1077 60.500021 9.099972 \n", "Croatia 21 45.564344 17.011895 \n", "New Zealand 1097 -41.500083 172.834408 \n", "Ireland 29 52.865196 -7.979460 \n", "Lithuania 29 55.350000 23.750000 \n", "Uruguay 2 -32.875555 -56.020153 \n", "Latvia 3 56.840649 24.753764 \n", "Slovenia 1 45.813311 14.480837 \n", "Estonia 97 58.752378 25.331908 \n", "Netherlands 536 52.247650 5.541247 \n", "Kosovo 649 42.586958 20.902123 \n", "\n", "[40 rows x 106 columns]" ] }, "execution_count": 56, "metadata": {}, "output_type": "execute_result" } ], "source": [ "corg_clean2" ] }, { "cell_type": "code", "execution_count": 57, "id": "c7fda0a7", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
YearsCountry1917-01-01 00:00:001918-01-01 00:00:001919-01-01 00:00:001920-01-01 00:00:001921-01-01 00:00:001922-01-01 00:00:001923-01-01 00:00:001924-01-01 00:00:001925-01-01 00:00:00...2013-01-01 00:00:002014-01-01 00:00:002015-01-01 00:00:002016-01-01 00:00:002017-01-01 00:00:002018-01-01 00:00:002019-01-01 00:00:002020-01-01 00:00:00LatitudeLongitude
0United States000000000...147891506915370155991581216010161281613039.783730-100.445882
1Brazil000000000...11111111-10.333333-53.200000
2Russia000000000...4595868231040130016291818183464.68631497.745306
3Japan000000000...3335777736.574844139.239418
4Vietnam000000000...0000144415.926666107.965086
5Germany000000000...58966673879282086289289251.16381810.447831
6France000000000...33837539545051053559759746.6033541.888334
7United Kingdom1111113519...2557257926052633264326482649264954.702354-3.276575
8Italy000000000...8194969810210410610642.63842612.674297
9South Africa000000000...2428282828282828-28.81662424.991639
10Ukraine000000000...101012121218191949.48719731.271832
11Spain000000000...0000005539.326068-4.837979
12Poland000000000...46052661769274882585985952.21593319.134422
13Canada000000000...35936437738338639039139161.066692-107.991707
14North Korea000000000...3444555540.373661127.087042
15Romania000000000...0000007745.98521324.685923
16Australia000000000...844855861870879887891891-24.776109134.755000
17Portugal000000000...0123357739.662165-8.135352
18Belgium000000000...94969810410510711011050.6402814.666715
19Czech Republic000000000...17118821023325526427127149.74390515.338106
20Hungary000000000...7888888847.18175919.506094
21Belarus000000000...00051221383853.42506127.697136
22Sweden000000000...1461152516431791192819882007200859.67497114.520858
23Austria000000000...222929292929292947.59397014.124560
24Switzerland000000000...292930303030303046.7985628.231974
25Israel000000000...0813172125323230.81242534.859476
26Serbia000000000...0011222244.15341220.551440
27Denmark000000000...1760185619672045211521482176217655.67024910.333328
28Finland000000000...3110330835093695396140444048405163.24677825.920916
29Norway000000000...57066977087397010561077107760.5000219.099972
30Croatia000000000...003111521212145.56434417.011895
31New Zealand000000000...10821083108810911094109710971097-41.500083172.834408
32Ireland000000000...292929292929292952.865196-7.979460
33Lithuania000000000...202020292929292955.35000023.750000
34Uruguay000000000...00002222-32.875555-56.020153
35Latvia000000000...3333333356.84064924.753764
36Slovenia000000000...1111111145.81331114.480837
37Estonia000000000...182430576581979758.75237825.331908
38Netherlands000000000...41744847048751252453653652.2476505.541247
39Kosovo001111111...25128835242452558163264942.58695820.902123
\n", "

40 rows × 107 columns

\n", "
" ], "text/plain": [ "Years Country 1917-01-01 00:00:00 1918-01-01 00:00:00 \\\n", "0 United States 0 0 \n", "1 Brazil 0 0 \n", "2 Russia 0 0 \n", "3 Japan 0 0 \n", "4 Vietnam 0 0 \n", "5 Germany 0 0 \n", "6 France 0 0 \n", "7 United Kingdom 1 1 \n", "8 Italy 0 0 \n", "9 South Africa 0 0 \n", "10 Ukraine 0 0 \n", "11 Spain 0 0 \n", "12 Poland 0 0 \n", "13 Canada 0 0 \n", "14 North Korea 0 0 \n", "15 Romania 0 0 \n", "16 Australia 0 0 \n", "17 Portugal 0 0 \n", "18 Belgium 0 0 \n", "19 Czech Republic 0 0 \n", "20 Hungary 0 0 \n", "21 Belarus 0 0 \n", "22 Sweden 0 0 \n", "23 Austria 0 0 \n", "24 Switzerland 0 0 \n", "25 Israel 0 0 \n", "26 Serbia 0 0 \n", "27 Denmark 0 0 \n", "28 Finland 0 0 \n", "29 Norway 0 0 \n", "30 Croatia 0 0 \n", "31 New Zealand 0 0 \n", "32 Ireland 0 0 \n", "33 Lithuania 0 0 \n", "34 Uruguay 0 0 \n", "35 Latvia 0 0 \n", "36 Slovenia 0 0 \n", "37 Estonia 0 0 \n", "38 Netherlands 0 0 \n", "39 Kosovo 0 0 \n", "\n", "Years 1919-01-01 00:00:00 1920-01-01 00:00:00 1921-01-01 00:00:00 \\\n", "0 0 0 0 \n", "1 0 0 0 \n", "2 0 0 0 \n", "3 0 0 0 \n", "4 0 0 0 \n", "5 0 0 0 \n", "6 0 0 0 \n", "7 1 1 1 \n", "8 0 0 0 \n", "9 0 0 0 \n", "10 0 0 0 \n", "11 0 0 0 \n", "12 0 0 0 \n", "13 0 0 0 \n", "14 0 0 0 \n", "15 0 0 0 \n", "16 0 0 0 \n", "17 0 0 0 \n", "18 0 0 0 \n", "19 0 0 0 \n", "20 0 0 0 \n", "21 0 0 0 \n", "22 0 0 0 \n", "23 0 0 0 \n", "24 0 0 0 \n", "25 0 0 0 \n", "26 0 0 0 \n", "27 0 0 0 \n", "28 0 0 0 \n", "29 0 0 0 \n", "30 0 0 0 \n", "31 0 0 0 \n", "32 0 0 0 \n", "33 0 0 0 \n", "34 0 0 0 \n", "35 0 0 0 \n", "36 0 0 0 \n", "37 0 0 0 \n", "38 0 0 0 \n", "39 1 1 1 \n", "\n", "Years 1922-01-01 00:00:00 1923-01-01 00:00:00 1924-01-01 00:00:00 \\\n", "0 0 0 0 \n", "1 0 0 0 \n", "2 0 0 0 \n", "3 0 0 0 \n", "4 0 0 0 \n", "5 0 0 0 \n", "6 0 0 0 \n", "7 1 3 5 \n", "8 0 0 0 \n", "9 0 0 0 \n", "10 0 0 0 \n", "11 0 0 0 \n", "12 0 0 0 \n", "13 0 0 0 \n", "14 0 0 0 \n", "15 0 0 0 \n", "16 0 0 0 \n", "17 0 0 0 \n", "18 0 0 0 \n", "19 0 0 0 \n", "20 0 0 0 \n", "21 0 0 0 \n", "22 0 0 0 \n", "23 0 0 0 \n", "24 0 0 0 \n", "25 0 0 0 \n", "26 0 0 0 \n", "27 0 0 0 \n", "28 0 0 0 \n", "29 0 0 0 \n", "30 0 0 0 \n", "31 0 0 0 \n", "32 0 0 0 \n", "33 0 0 0 \n", "34 0 0 0 \n", "35 0 0 0 \n", "36 0 0 0 \n", "37 0 0 0 \n", "38 0 0 0 \n", "39 1 1 1 \n", "\n", "Years 1925-01-01 00:00:00 ... 2013-01-01 00:00:00 2014-01-01 00:00:00 \\\n", "0 0 ... 14789 15069 \n", "1 0 ... 1 1 \n", "2 0 ... 459 586 \n", "3 0 ... 3 3 \n", "4 0 ... 0 0 \n", "5 0 ... 589 666 \n", "6 0 ... 338 375 \n", "7 19 ... 2557 2579 \n", "8 0 ... 81 94 \n", "9 0 ... 24 28 \n", "10 0 ... 10 10 \n", "11 0 ... 0 0 \n", "12 0 ... 460 526 \n", "13 0 ... 359 364 \n", "14 0 ... 3 4 \n", "15 0 ... 0 0 \n", "16 0 ... 844 855 \n", "17 0 ... 0 1 \n", "18 0 ... 94 96 \n", "19 0 ... 171 188 \n", "20 0 ... 7 8 \n", "21 0 ... 0 0 \n", "22 0 ... 1461 1525 \n", "23 0 ... 22 29 \n", "24 0 ... 29 29 \n", "25 0 ... 0 8 \n", "26 0 ... 0 0 \n", "27 0 ... 1760 1856 \n", "28 0 ... 3110 3308 \n", "29 0 ... 570 669 \n", "30 0 ... 0 0 \n", "31 0 ... 1082 1083 \n", "32 0 ... 29 29 \n", "33 0 ... 20 20 \n", "34 0 ... 0 0 \n", "35 0 ... 3 3 \n", "36 0 ... 1 1 \n", "37 0 ... 18 24 \n", "38 0 ... 417 448 \n", "39 1 ... 251 288 \n", "\n", "Years 2015-01-01 00:00:00 2016-01-01 00:00:00 2017-01-01 00:00:00 \\\n", "0 15370 15599 15812 \n", "1 1 1 1 \n", "2 823 1040 1300 \n", "3 3 5 7 \n", "4 0 0 1 \n", "5 738 792 820 \n", "6 395 450 510 \n", "7 2605 2633 2643 \n", "8 96 98 102 \n", "9 28 28 28 \n", "10 12 12 12 \n", "11 0 0 0 \n", "12 617 692 748 \n", "13 377 383 386 \n", "14 4 4 5 \n", "15 0 0 0 \n", "16 861 870 879 \n", "17 2 3 3 \n", "18 98 104 105 \n", "19 210 233 255 \n", "20 8 8 8 \n", "21 0 5 12 \n", "22 1643 1791 1928 \n", "23 29 29 29 \n", "24 30 30 30 \n", "25 13 17 21 \n", "26 1 1 2 \n", "27 1967 2045 2115 \n", "28 3509 3695 3961 \n", "29 770 873 970 \n", "30 3 11 15 \n", "31 1088 1091 1094 \n", "32 29 29 29 \n", "33 20 29 29 \n", "34 0 0 2 \n", "35 3 3 3 \n", "36 1 1 1 \n", "37 30 57 65 \n", "38 470 487 512 \n", "39 352 424 525 \n", "\n", "Years 2018-01-01 00:00:00 2019-01-01 00:00:00 2020-01-01 00:00:00 \\\n", "0 16010 16128 16130 \n", "1 1 1 1 \n", "2 1629 1818 1834 \n", "3 7 7 7 \n", "4 4 4 4 \n", "5 862 892 892 \n", "6 535 597 597 \n", "7 2648 2649 2649 \n", "8 104 106 106 \n", "9 28 28 28 \n", "10 18 19 19 \n", "11 0 5 5 \n", "12 825 859 859 \n", "13 390 391 391 \n", "14 5 5 5 \n", "15 0 7 7 \n", "16 887 891 891 \n", "17 5 7 7 \n", "18 107 110 110 \n", "19 264 271 271 \n", "20 8 8 8 \n", "21 21 38 38 \n", "22 1988 2007 2008 \n", "23 29 29 29 \n", "24 30 30 30 \n", "25 25 32 32 \n", "26 2 2 2 \n", "27 2148 2176 2176 \n", "28 4044 4048 4051 \n", "29 1056 1077 1077 \n", "30 21 21 21 \n", "31 1097 1097 1097 \n", "32 29 29 29 \n", "33 29 29 29 \n", "34 2 2 2 \n", "35 3 3 3 \n", "36 1 1 1 \n", "37 81 97 97 \n", "38 524 536 536 \n", "39 581 632 649 \n", "\n", "Years Latitude Longitude \n", "0 39.783730 -100.445882 \n", "1 -10.333333 -53.200000 \n", "2 64.686314 97.745306 \n", "3 36.574844 139.239418 \n", "4 15.926666 107.965086 \n", "5 51.163818 10.447831 \n", "6 46.603354 1.888334 \n", "7 54.702354 -3.276575 \n", "8 42.638426 12.674297 \n", "9 -28.816624 24.991639 \n", "10 49.487197 31.271832 \n", "11 39.326068 -4.837979 \n", "12 52.215933 19.134422 \n", "13 61.066692 -107.991707 \n", "14 40.373661 127.087042 \n", "15 45.985213 24.685923 \n", "16 -24.776109 134.755000 \n", "17 39.662165 -8.135352 \n", "18 50.640281 4.666715 \n", "19 49.743905 15.338106 \n", "20 47.181759 19.506094 \n", "21 53.425061 27.697136 \n", "22 59.674971 14.520858 \n", "23 47.593970 14.124560 \n", "24 46.798562 8.231974 \n", "25 30.812425 34.859476 \n", "26 44.153412 20.551440 \n", "27 55.670249 10.333328 \n", "28 63.246778 25.920916 \n", "29 60.500021 9.099972 \n", "30 45.564344 17.011895 \n", "31 -41.500083 172.834408 \n", "32 52.865196 -7.979460 \n", "33 55.350000 23.750000 \n", "34 -32.875555 -56.020153 \n", "35 56.840649 24.753764 \n", "36 45.813311 14.480837 \n", "37 58.752378 25.331908 \n", "38 52.247650 5.541247 \n", "39 42.586958 20.902123 \n", "\n", "[40 rows x 107 columns]" ] }, "execution_count": 57, "metadata": {}, "output_type": "execute_result" } ], "source": [ "corg_clean2.reset_index()" ] }, { "cell_type": "markdown", "id": "a2e7245a", "metadata": {}, "source": [ "Now let's \"remelt\" the data into a format we can use for our plot:" ] }, { "cell_type": "code", "execution_count": 58, "id": "011a29fa", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
CountryLongitudeLatitudeYearcumulative_sum
0United States-100.44588239.7837301917-01-010
1Brazil-53.200000-10.3333331917-01-010
2Russia97.74530664.6863141917-01-010
3Japan139.23941836.5748441917-01-010
4Vietnam107.96508615.9266661917-01-010
\n", "
" ], "text/plain": [ " Country Longitude Latitude Year cumulative_sum\n", "0 United States -100.445882 39.783730 1917-01-01 0\n", "1 Brazil -53.200000 -10.333333 1917-01-01 0\n", "2 Russia 97.745306 64.686314 1917-01-01 0\n", "3 Japan 139.239418 36.574844 1917-01-01 0\n", "4 Vietnam 107.965086 15.926666 1917-01-01 0" ] }, "execution_count": 58, "metadata": {}, "output_type": "execute_result" } ], "source": [ "corg_melt = corg_clean2.reset_index().melt(['Country','Longitude','Latitude'], \n", " var_name='Year', value_name='cumulative_sum')\n", "corg_melt.head()" ] }, { "cell_type": "markdown", "id": "b9dd0f22", "metadata": {}, "source": [ "Now we can make a little [slider in Altair](https://altair-viz.github.io/user_guide/interactions.html#selection-values-in-expressions) to change the date range for our circles interactively:" ] }, { "cell_type": "code", "execution_count": 59, "id": "b5ea8d6a", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(Timestamp('1917-01-01 00:00:00'), Timestamp('2020-01-01 00:00:00'))" ] }, "execution_count": 59, "metadata": {}, "output_type": "execute_result" } ], "source": [ "corg_melt['Year'].min(), corg_melt['Year'].max()" ] }, { "cell_type": "markdown", "id": "ccfc3675", "metadata": {}, "source": [ "Since sliders (at least at the time of writing) [can't have datetime inputs](https://stackoverflow.com/questions/62046930/altair-adding-date-slider-for-interactive-scatter-chart-pot) let's cheat a bit by making another column called \"year_int\":" ] }, { "cell_type": "code", "execution_count": 60, "id": "5724da74", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
CountryLongitudeLatitudeYearcumulative_sumyear_int
0United States-100.44588239.7837301917-01-0101917
1Brazil-53.200000-10.3333331917-01-0101917
2Russia97.74530664.6863141917-01-0101917
3Japan139.23941836.5748441917-01-0101917
4Vietnam107.96508615.9266661917-01-0101917
\n", "
" ], "text/plain": [ " Country Longitude Latitude Year cumulative_sum year_int\n", "0 United States -100.445882 39.783730 1917-01-01 0 1917\n", "1 Brazil -53.200000 -10.333333 1917-01-01 0 1917\n", "2 Russia 97.745306 64.686314 1917-01-01 0 1917\n", "3 Japan 139.239418 36.574844 1917-01-01 0 1917\n", "4 Vietnam 107.965086 15.926666 1917-01-01 0 1917" ] }, "execution_count": 60, "metadata": {}, "output_type": "execute_result" } ], "source": [ "corg_melt['year_int'] = corg_melt['Year'].dt.year.astype('int')\n", "corg_melt.head()" ] }, { "cell_type": "markdown", "id": "acbfe05d", "metadata": {}, "source": [ "Note here to that for [equity selections we don't have to use == signs](https://stackoverflow.com/questions/68071713/in-altair-equality-condition-doesnt-work) in Altair (it won't work... just for fun I guess)." ] }, { "cell_type": "code", "execution_count": 61, "id": "56195c43", "metadata": {}, "outputs": [], "source": [ "slider = alt.binding_range(min=corg_melt['year_int'].min(), \n", " max=corg_melt['year_int'].max(), step=1, name='Max year:')\n", "#selector = alt.selection_single(name=\"SelectorName\", fields=['cutoff'],\n", "# bind=slider, init={'cutoff': 2000})\n", "selector = alt.selection_single(name=\"SelectorName\", fields=['year_int'],\n", " bind=slider, init={'year_int': 2000})" ] }, { "cell_type": "code", "execution_count": 62, "id": "1ded3b87", "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "
\n", "" ], "text/plain": [ "alt.LayerChart(...)" ] }, "execution_count": 62, "metadata": {}, "output_type": "execute_result" } ], "source": [ "geo = alt.topo_feature(data.world_110m.url, feature='countries')\n", "\n", "# US states background\n", "world = alt.Chart(geo).mark_geoshape(\n", " fill='lightgray',\n", " stroke='white'\n", ").properties(\n", " width=800,\n", " height=500\n", ").project('equirectangular') # note we have a few projections we can use!\n", "\n", "points = alt.Chart(corg_melt).mark_circle().encode(\n", " longitude='Longitude:Q',\n", " latitude='Latitude:Q',\n", " #size=alt.Size('Total Corg:Q',scale=alt.Scale(type='log')),\n", " size=alt.condition(\n", " #((alt.datum.year_int < selector.cutoff-10)&(alt.datum.year_int >= selector.cutoff)),\n", " #(alt.datum.year_int == selector.cutoff),\n", " #\"datum.year_int == selector.cutoff\",\n", " #alt.expr.datum['year_int'] < selector.cutoff,\n", " selector,\n", " alt.Size('cumulative_sum:Q',scale=None), alt.value(0)\n", " ),\n", " tooltip='Country',\n", ").add_selection(\n", " selector\n", ")\n", "\n", "world + points" ] }, { "cell_type": "markdown", "id": "24ddbe0b", "metadata": {}, "source": [ "One final thing, let's add in some info about what each dot means in our tooltip:" ] }, { "cell_type": "code", "execution_count": 63, "id": "1fdbcbdf", "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "
\n", "" ], "text/plain": [ "alt.LayerChart(...)" ] }, "execution_count": 63, "metadata": {}, "output_type": "execute_result" } ], "source": [ "geo = alt.topo_feature(data.world_110m.url, feature='countries')\n", "\n", "# US states background\n", "world = alt.Chart(geo).mark_geoshape(\n", " fill='lightgray',\n", " stroke='black'\n", ").properties(\n", " width=800,\n", " height=500\n", "#).project('mercator') # note we have a few projections we can use!\n", ").project('equirectangular') # note we have a few projections we can use!\n", "\n", "points = alt.Chart(corg_melt).mark_circle().encode(\n", " longitude='Longitude:Q',\n", " latitude='Latitude:Q',\n", " #size=alt.Size('Total Corg:Q',scale=alt.Scale(type='log')),\n", " size=alt.condition(\n", " #((alt.datum.year_int < selector.year_int-10)&(alt.datum.year_int >= selector.year_int)),\n", " #(alt.datum.year_int < selector.year_int),\n", " #\"datum.year_int == selector.year_int\",\n", " #alt.expr.datum['year_int'] < selector.year_int,\n", " selector,\n", " alt.Size('cumulative_sum:Q',scale=None), alt.value(0)\n", " ),\n", " tooltip=['Country','cumulative_sum'],\n", ").add_selection(\n", " selector\n", ")\n", "\n", "world + points" ] }, { "cell_type": "markdown", "id": "2c24cdbf", "metadata": {}, "source": [ "Looks nice! Let's save it:" ] }, { "cell_type": "code", "execution_count": 64, "id": "0e4e3f7e", "metadata": {}, "outputs": [], "source": [ "chart_out = world + points\n", "\n", "chart_out.properties(width='container').save(myJekyllDir+\"assets/json/corgis_dotchart_world.json\") " ] }, { "cell_type": "markdown", "id": "3794475b", "metadata": {}, "source": [ "We note that when we run this though, we get a few artifacts. We can try to \"smooth\" the transitions with a bit of interpolation:" ] }, { "cell_type": "code", "execution_count": 65, "id": "65c23b3d", "metadata": {}, "outputs": [], "source": [ "alt.selection_single?" ] }, { "cell_type": "code", "execution_count": 66, "id": "d0c34d9d", "metadata": {}, "outputs": [], "source": [ "slider = alt.binding_range(min=corg_melt['year_int'].min(), \n", " max=corg_melt['year_int'].max(), step=1, name='Max year:')\n", "selector = alt.selection_single(name=\"SelectorName\", fields=['year_int'],\n", " bind=slider, init={'year_int': corg_melt['year_int'].min()},\n", " nearest=True)" ] }, { "cell_type": "code", "execution_count": 67, "id": "4467f02f", "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "
\n", "" ], "text/plain": [ "alt.LayerChart(...)" ] }, "execution_count": 67, "metadata": {}, "output_type": "execute_result" } ], "source": [ "geo = alt.topo_feature(data.world_110m.url, feature='countries')\n", "\n", "# US states background\n", "world = alt.Chart(geo).mark_geoshape(\n", " fill='lightgray',\n", " stroke='black'\n", ").properties(\n", " width=800,\n", " height=500\n", "#).project('mercator') # note we have a few projections we can use!\n", ").project('equirectangular') # note we have a few projections we can use!\n", "\n", "points = alt.Chart(corg_melt).mark_circle().encode(\n", " longitude='Longitude:Q',\n", " latitude='Latitude:Q',\n", " #size=alt.Size('Total Corg:Q',scale=alt.Scale(type='log')),\n", " size=alt.condition(\n", " #((alt.datum.year_int < selector.year_int-10)&(alt.datum.year_int >= selector.year_int)),\n", " #(alt.datum.year_int < selector.year_int),\n", " #\"datum.year_int == selector.year_int\",\n", " #alt.expr.datum['year_int'] < selector.year_int,\n", " selector,\n", " alt.Size('cumulative_sum:Q',scale=None), alt.value(0)\n", " ),\n", " tooltip=['Country','cumulative_sum'],\n", ").add_selection(\n", " selector\n", ")\n", "\n", "world + points" ] }, { "cell_type": "code", "execution_count": 68, "id": "925dd8fc", "metadata": {}, "outputs": [], "source": [ "chart_out = world + points\n", "\n", "chart_out.properties(width='container').save(myJekyllDir+\"assets/json/corgis_dotchart_world_smooth.json\") " ] }, { "cell_type": "markdown", "id": "240f2707", "metadata": {}, "source": [ "Groovy!\n", "\n", "One thing to note here is how much of the data cleaning and transformation we ended up doing in Python. In theory one probably *could* do this in Altair/vega-lite, but not without a lot of headache and in Python, we have the option of checking each \"stage\" of our data transformation so we can make sure it makes sense -- in vega-lite/Altair, we don't really have this option (as easily)." ] }, { "cell_type": "markdown", "id": "e385a097", "metadata": {}, "source": [ "## Extra: Corgi data and choropleth\n", "\n", "One final thing (well, not final final, there are infinite things we can do!) is to instead of plotting points on a map, we can color the a map of the world by the population of corgis at a particular time. \n", "\n", "This is called a [choropleth map](https://altair-viz.github.io/gallery/choropleth.html), and this is probably the last time I will EVER spell that correctly :D \n", "\n", "These can be [a little tricky in Altair](https://altair-viz.github.io/altair-tutorial/notebooks/09-Geographic-plots.html#colored-choropleths) since you have to map between pre-determined names of countries (as stored in the vegadataset world map) and however your data is stored." ] }, { "cell_type": "markdown", "id": "a5583f3d", "metadata": {}, "source": [ "Let's start with just coloring our mappable data based on the total corgis born. First, based on the [documentation about how to do this](https://altair-viz.github.io/altair-tutorial/notebooks/09-Geographic-plots.html#colored-choropleths) we know that we have to match up the world-map ID with whatever ID for each country as listed in our dataset. There are also some [other transformation-related things to be aware of](https://stackoverflow.com/questions/59224026/how-to-add-a-slider-to-a-choropleth-in-altair) that we'll cover after we deal with the ID look up stuff.\n", "\n", "\n", "Let's dig a bit deeper with geopandas:" ] }, { "cell_type": "code", "execution_count": 69, "id": "4393182c", "metadata": {}, "outputs": [], "source": [ "import geopandas" ] }, { "cell_type": "code", "execution_count": 70, "id": "f350e176", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'https://cdn.jsdelivr.net/npm/vega-datasets@v1.29.0/data/world-110m.json'" ] }, "execution_count": 70, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data.world_110m.url" ] }, { "cell_type": "code", "execution_count": 71, "id": "3b7784d1", "metadata": {}, "outputs": [], "source": [ "gdf = geopandas.read_file(data.world_110m.url,include_fields=['name'],layer='countries')" ] }, { "cell_type": "code", "execution_count": 72, "id": "822936ff", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
idgeometry
04POLYGON ((61.20961 35.64925, 62.23202 35.27011...
124MULTIPOLYGON (((23.91324 -10.92658, 24.01764 -...
28POLYGON ((20.59041 41.85586, 20.46440 41.51565...
3784POLYGON ((51.57952 24.24479, 51.75592 24.29387...
432MULTIPOLYGON (((-66.95887 -54.89756, -67.56368...
.........
172548MULTIPOLYGON (((167.51508 -16.59835, 167.18027...
173887POLYGON ((52.38592 16.38285, 52.19152 15.93771...
174710POLYGON ((28.21888 -32.77244, 27.46287 -33.227...
175894POLYGON ((32.75853 -9.23064, 33.23013 -9.67747...
176716POLYGON ((31.19251 -22.25149, 30.65971 -22.151...
\n", "

177 rows × 2 columns

\n", "
" ], "text/plain": [ " id geometry\n", "0 4 POLYGON ((61.20961 35.64925, 62.23202 35.27011...\n", "1 24 MULTIPOLYGON (((23.91324 -10.92658, 24.01764 -...\n", "2 8 POLYGON ((20.59041 41.85586, 20.46440 41.51565...\n", "3 784 POLYGON ((51.57952 24.24479, 51.75592 24.29387...\n", "4 32 MULTIPOLYGON (((-66.95887 -54.89756, -67.56368...\n", ".. ... ...\n", "172 548 MULTIPOLYGON (((167.51508 -16.59835, 167.18027...\n", "173 887 POLYGON ((52.38592 16.38285, 52.19152 15.93771...\n", "174 710 POLYGON ((28.21888 -32.77244, 27.46287 -33.227...\n", "175 894 POLYGON ((32.75853 -9.23064, 33.23013 -9.67747...\n", "176 716 POLYGON ((31.19251 -22.25149, 30.65971 -22.151...\n", "\n", "[177 rows x 2 columns]" ] }, "execution_count": 72, "metadata": {}, "output_type": "execute_result" } ], "source": [ "gdf" ] }, { "cell_type": "markdown", "id": "2f520b97", "metadata": {}, "source": [ "So, here we see that there is this ID -- this matches up with each country, for example:" ] }, { "cell_type": "code", "execution_count": 73, "id": "773ec570", "metadata": {}, "outputs": [ { "data": { "image/svg+xml": [ "" ], "text/plain": [ "" ] }, "execution_count": 73, "metadata": {}, "output_type": "execute_result" } ], "source": [ "gdf.iloc[0]['geometry']" ] }, { "cell_type": "code", "execution_count": 74, "id": "894e919d", "metadata": {}, "outputs": [ { "data": { "image/svg+xml": [ "" ], "text/plain": [ "" ] }, "execution_count": 74, "metadata": {}, "output_type": "execute_result" } ], "source": [ "gdf.iloc[1]['geometry']" ] }, { "cell_type": "markdown", "id": "229e1e69", "metadata": {}, "source": [ "Are both countries! But how to find out which ones?" ] }, { "cell_type": "markdown", "id": "f59b2829", "metadata": {}, "source": [ "To do that, we have to map the ID's to their [world country codes](https://documentation-resources.opendatasoft.com/explore/dataset/natural-earth-countries-110m/information/). Luckily, that is [already done for us](https://github.com/alisle/world-110m-country-codes)." ] }, { "cell_type": "code", "execution_count": 75, "id": "33b6169f", "metadata": {}, "outputs": [], "source": [ "country_codes = pd.read_json('https://raw.githubusercontent.com/alisle/world-110m-country-codes/master/world-110m-country-codes.json')" ] }, { "cell_type": "code", "execution_count": 76, "id": "67f8b834", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
codeidname
0AF4Afghanistan
1AL8Albania
2DZ12Algeria
3AO24Angola
4AQ10Antarctica
\n", "
" ], "text/plain": [ " code id name\n", "0 AF 4 Afghanistan\n", "1 AL 8 Albania\n", "2 DZ 12 Algeria\n", "3 AO 24 Angola\n", "4 AQ 10 Antarctica" ] }, "execution_count": 76, "metadata": {}, "output_type": "execute_result" } ], "source": [ "country_codes.head()" ] }, { "cell_type": "code", "execution_count": 77, "id": "9815d0cf", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
codename
id
4AFAfghanistan
8ALAlbania
12DZAlgeria
24AOAngola
10AQAntarctica
\n", "
" ], "text/plain": [ " code name\n", "id \n", "4 AF Afghanistan\n", "8 AL Albania\n", "12 DZ Algeria\n", "24 AO Angola\n", "10 AQ Antarctica" ] }, "execution_count": 77, "metadata": {}, "output_type": "execute_result" } ], "source": [ "country_codes = country_codes.set_index('id')\n", "country_codes.head()" ] }, { "cell_type": "code", "execution_count": 78, "id": "d8faba63", "metadata": {}, "outputs": [], "source": [ "gdf['id'] = gdf['id'].astype('int64')" ] }, { "cell_type": "code", "execution_count": 79, "id": "030f9911", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0 4\n", "1 24\n", "2 8\n", "3 784\n", "4 32\n", "Name: id, dtype: int64" ] }, "execution_count": 79, "metadata": {}, "output_type": "execute_result" } ], "source": [ "gdf['id'].head()" ] }, { "cell_type": "markdown", "id": "a8009388", "metadata": {}, "source": [ "Let's merge these two tables together by the country ID so that we can get the name." ] }, { "cell_type": "code", "execution_count": 80, "id": "7f1b4a8a", "metadata": {}, "outputs": [], "source": [ "gdf_comb = gdf.join(country_codes, on='id')" ] }, { "cell_type": "code", "execution_count": 81, "id": "3387468c", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
idgeometrycodename
04POLYGON ((61.20961 35.64925, 62.23202 35.27011...AFAfghanistan
124MULTIPOLYGON (((23.91324 -10.92658, 24.01764 -...AOAngola
28POLYGON ((20.59041 41.85586, 20.46440 41.51565...ALAlbania
3784POLYGON ((51.57952 24.24479, 51.75592 24.29387...AEUnited Arab Emirates
432MULTIPOLYGON (((-66.95887 -54.89756, -67.56368...ARArgentina
...............
172548MULTIPOLYGON (((167.51508 -16.59835, 167.18027...VUVanuatu
173887POLYGON ((52.38592 16.38285, 52.19152 15.93771...YEYemen
174710POLYGON ((28.21888 -32.77244, 27.46287 -33.227...ZASouth Africa
175894POLYGON ((32.75853 -9.23064, 33.23013 -9.67747...ZMZambia
176716POLYGON ((31.19251 -22.25149, 30.65971 -22.151...ZWZimbabwe
\n", "

177 rows × 4 columns

\n", "
" ], "text/plain": [ " id geometry code \\\n", "0 4 POLYGON ((61.20961 35.64925, 62.23202 35.27011... AF \n", "1 24 MULTIPOLYGON (((23.91324 -10.92658, 24.01764 -... AO \n", "2 8 POLYGON ((20.59041 41.85586, 20.46440 41.51565... AL \n", "3 784 POLYGON ((51.57952 24.24479, 51.75592 24.29387... AE \n", "4 32 MULTIPOLYGON (((-66.95887 -54.89756, -67.56368... AR \n", ".. ... ... ... \n", "172 548 MULTIPOLYGON (((167.51508 -16.59835, 167.18027... VU \n", "173 887 POLYGON ((52.38592 16.38285, 52.19152 15.93771... YE \n", "174 710 POLYGON ((28.21888 -32.77244, 27.46287 -33.227... ZA \n", "175 894 POLYGON ((32.75853 -9.23064, 33.23013 -9.67747... ZM \n", "176 716 POLYGON ((31.19251 -22.25149, 30.65971 -22.151... ZW \n", "\n", " name \n", "0 Afghanistan \n", "1 Angola \n", "2 Albania \n", "3 United Arab Emirates \n", "4 Argentina \n", ".. ... \n", "172 Vanuatu \n", "173 Yemen \n", "174 South Africa \n", "175 Zambia \n", "176 Zimbabwe \n", "\n", "[177 rows x 4 columns]" ] }, "execution_count": 81, "metadata": {}, "output_type": "execute_result" } ], "source": [ "gdf_comb" ] }, { "cell_type": "markdown", "id": "315eb159", "metadata": {}, "source": [ "Let's re-load our data to start from \"clean\":" ] }, { "cell_type": "code", "execution_count": 82, "id": "c63d4c21", "metadata": {}, "outputs": [], "source": [ "corgs = pd.read_csv('https://raw.githubusercontent.com/UIUC-iSchool-DataViz/is445_bcubcg_fall2022/main/data/corgs_per_country_over_time_columns_2020.csv')" ] }, { "cell_type": "markdown", "id": "e02e38be", "metadata": {}, "source": [ "Clean this data with known naming issues:" ] }, { "cell_type": "code", "execution_count": 83, "id": "67c50b6f", "metadata": {}, "outputs": [], "source": [ "corg_clean = corgs.rename(columns={'Korea, North':\"North Korea\",'Netherlands Antilles':\"Netherlands\"})" ] }, { "cell_type": "markdown", "id": "ee5cc513", "metadata": {}, "source": [ "And transpose it:" ] }, { "cell_type": "code", "execution_count": 84, "id": "98adf91d", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
YearsCountry191719181919192019211922192319241925...2011201220132014201520162017201820192020
0United States000000000...4084313762803012292131981182
1Brazil000000000...0000000000
2Russia000000000...898211512723721726032918916
3Japan000000000...0000022000
4Vietnam000000000...0000001300
\n", "

5 rows × 105 columns

\n", "
" ], "text/plain": [ "Years Country 1917 1918 1919 1920 1921 1922 1923 1924 1925 \\\n", "0 United States 0 0 0 0 0 0 0 0 0 \n", "1 Brazil 0 0 0 0 0 0 0 0 0 \n", "2 Russia 0 0 0 0 0 0 0 0 0 \n", "3 Japan 0 0 0 0 0 0 0 0 0 \n", "4 Vietnam 0 0 0 0 0 0 0 0 0 \n", "\n", "Years ... 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 \n", "0 ... 408 431 376 280 301 229 213 198 118 2 \n", "1 ... 0 0 0 0 0 0 0 0 0 0 \n", "2 ... 89 82 115 127 237 217 260 329 189 16 \n", "3 ... 0 0 0 0 0 2 2 0 0 0 \n", "4 ... 0 0 0 0 0 0 1 3 0 0 \n", "\n", "[5 rows x 105 columns]" ] }, "execution_count": 84, "metadata": {}, "output_type": "execute_result" } ], "source": [ "corg_clean_t = corg_clean.set_index('Years').T\n", "corg_clean_t.index.name = 'Country'\n", "corg_clean2 = corg_clean_t.reset_index()\n", "corg_clean2.head()" ] }, { "cell_type": "markdown", "id": "3e3a2ed6", "metadata": {}, "source": [ "We could turn this into a cumulative sum, but instead, let's leave as is this time and focus on making a chorolpeth plot that shows the corgis born *in that year* instead of a cumulative sum." ] }, { "cell_type": "markdown", "id": "ca5cd0ad", "metadata": {}, "source": [ "Now let's see how many of these columns match up with the names that we have in our corgi table:" ] }, { "cell_type": "code", "execution_count": 85, "id": "b624152c", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array(['United States', 'Brazil', 'Russia', 'Japan', 'Vietnam', 'Germany',\n", " 'France', 'United Kingdom', 'Italy', 'South Africa', 'Ukraine',\n", " 'Spain', 'Poland', 'Canada', 'North Korea', 'Romania', 'Australia',\n", " 'Portugal', 'Belgium', 'Czech Republic', 'Hungary', 'Belarus',\n", " 'Sweden', 'Austria', 'Switzerland', 'Israel', 'Serbia', 'Denmark',\n", " 'Finland', 'Norway', 'Croatia', 'New Zealand', 'Ireland',\n", " 'Lithuania', 'Uruguay', 'Latvia', 'Slovenia', 'Estonia',\n", " 'Netherlands', 'Kosovo'], dtype=object)" ] }, "execution_count": 85, "metadata": {}, "output_type": "execute_result" } ], "source": [ "corg_clean2['Country'].unique()" ] }, { "cell_type": "code", "execution_count": 86, "id": "6c382c1c", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "no Russia\n", "no Vietnam\n", "no North Korea\n", "no Kosovo\n" ] } ], "source": [ "for c in corg_clean2['Country'].unique():\n", " if c not in gdf_comb['name'].values:\n", " print('no',c)" ] }, { "cell_type": "markdown", "id": "d572b6ca", "metadata": {}, "source": [ "So, once again, it seems that we are missing some names. Let's see if we can get close with some fuzzy matching:" ] }, { "cell_type": "code", "execution_count": 87, "id": "52b28496", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array(['Afghanistan', 'Angola', 'Albania', 'United Arab Emirates',\n", " 'Argentina', 'Armenia', 'Antarctica',\n", " 'French Southern Territories', 'Australia', 'Austria',\n", " 'Azerbaijan', 'Burundi', 'Belgium', 'Benin', 'Burkina Faso',\n", " 'Bangladesh', 'Bulgaria', 'Bahamas', 'Bosnia and Herzegovina',\n", " 'Belarus', 'Belize', 'Bolivia, Plurinational State of', 'Brazil',\n", " 'Brunei Darussalam', 'Bhutan', 'Botswana',\n", " 'Central African Republic', 'Canada', 'Switzerland', 'Chile',\n", " 'China', \"Cote d'Ivoire\", 'Cameroon',\n", " 'Congo, the Democratic Republic of the', 'Congo', 'Colombia',\n", " 'Costa Rica', 'Cuba', nan, 'Cyprus', 'Czech Republic', 'Germany',\n", " 'Djibouti', 'Denmark', 'Dominican Republic', 'Algeria', 'Ecuador',\n", " 'Egypt', 'Eritrea', 'Spain', 'Estonia', 'Ethiopia', 'Finland',\n", " 'Fiji', 'Falkland Islands (Malvinas)', 'France', 'Gabon',\n", " 'United Kingdom', 'Georgia', 'Ghana', 'Guinea', 'Gambia',\n", " 'Guinea-Bissau', 'Equatorial Guinea', 'Greece', 'Greenland',\n", " 'Guatemala', 'Guyana', 'Honduras', 'Croatia', 'Haiti', 'Hungary',\n", " 'Indonesia', 'India', 'Ireland', 'Iran, Islamic Republic of',\n", " 'Iraq', 'Iceland', 'Israel', 'Italy', 'Jamaica', 'Jordan', 'Japan',\n", " 'Kazakhstan', 'Kenya', 'Kyrgyzstan', 'Cambodia',\n", " 'Korea, Republic of', nan, 'Kuwait',\n", " \"Lao People's Democratic Republic\", 'Lebanon', 'Liberia', 'Libya',\n", " 'Sri Lanka', 'Lesotho', 'Lithuania', 'Luxembourg', 'Latvia',\n", " 'Morocco', 'Moldova, Republic of', 'Madagascar', 'Mexico',\n", " 'Macedonia, the former Yugoslav Republic of', 'Mali', 'Myanmar',\n", " 'Montenegro', 'Mongolia', 'Mozambique', 'Mauritania', 'Malawi',\n", " 'Malaysia', 'Namibia', 'New Caledonia', 'Niger', 'Nigeria',\n", " 'Nicaragua', 'Netherlands', 'Norway', 'Nepal', 'New Zealand',\n", " 'Oman', 'Pakistan', 'Panama', 'Peru', 'Philippines',\n", " 'Papua New Guinea', 'Poland', 'Puerto Rico',\n", " \"Korea, Democratic People's Republic of\", 'Portugal', 'Paraguay',\n", " nan, 'Qatar', 'Romania', 'Russian Federation', 'Rwanda',\n", " 'Western Sahara', 'Saudi Arabia', 'Sudan', 'South Sudan',\n", " 'Senegal', 'Solomon Islands', 'Sierra Leone', 'El Salvador', nan,\n", " 'Somalia', 'Serbia', 'Suriname', 'Slovakia', 'Slovenia', 'Sweden',\n", " 'Swaziland', 'Syrian Arab Republic', 'Chad', 'Togo', 'Thailand',\n", " 'Tajikistan', 'Turkmenistan', 'Timor-Leste', 'Trinidad and Tobago',\n", " 'Tunisia', 'Turkey', 'Taiwan, Province of China',\n", " 'Tanzania, United Republic of', 'Uganda', 'Ukraine', 'Uruguay',\n", " 'United States', 'Uzbekistan', 'Venezuela, Bolivarian Republic of',\n", " 'Viet Nam', 'Vanuatu', 'Yemen', 'South Africa', 'Zambia',\n", " 'Zimbabwe'], dtype=object)" ] }, "execution_count": 87, "metadata": {}, "output_type": "execute_result" } ], "source": [ "gdf_comb['name'].values" ] }, { "cell_type": "code", "execution_count": 88, "id": "0bcec1ef", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
idgeometrycodename
168840MULTIPOLYGON (((-155.68896 18.91661, -155.9373...USUnited States
\n", "
" ], "text/plain": [ " id geometry code \\\n", "168 840 MULTIPOLYGON (((-155.68896 18.91661, -155.9373... US \n", "\n", " name \n", "168 United States " ] }, "execution_count": 88, "metadata": {}, "output_type": "execute_result" } ], "source": [ "gdf_comb.loc[gdf_comb['name']=='United States']" ] }, { "cell_type": "code", "execution_count": 89, "id": "69131f0a", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Russia Russian Federation\n", "found ['Russian Federation'] for Russia\n", "Vietnam Viet Nam\n", "found ['Viet Nam'] for Vietnam\n", "no match North Korea\n", "no match Kosovo\n" ] } ], "source": [ "for c in corg_clean2['Country'].unique():\n", " if c not in gdf_comb['name'].values: # if not in there, look for fuzzy\n", " #print('no',c)\n", " country_match = []\n", " for cc in gdf_comb['name'].values:\n", " #if c in cc: # there is an NaN\n", " if type(cc)==str:\n", " c2 = \"\".join(c.split()).lower()\n", " cc2 = \"\".join(cc.split()).lower()\n", " if c2 in cc2:\n", " country_match.append(cc)\n", " print(c,cc)\n", " if len(country_match) >0:\n", " print('found', country_match, 'for',c)\n", " else:\n", " print('no match',c)\n", " " ] }, { "cell_type": "code", "execution_count": 90, "id": "c4cdfe95", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "found ['Russian Federation'] for Russia\n", "found ['Viet Nam'] for Vietnam\n", "no match North Korea\n", "no match Kosovo\n" ] } ], "source": [ "# store these names to be the same as in our dataset\n", "for c in corg_clean2['Country'].unique():\n", " if c not in gdf_comb['name'].values: # if not in there, look for fuzzy\n", " #print('no',c)\n", " country_match = []\n", " for cc in gdf_comb['name'].values:\n", " #if c in cc: # there is an NaN\n", " if type(cc)==str:\n", " c2 = \"\".join(c.split()).lower()\n", " cc2 = \"\".join(cc.split()).lower()\n", " if c2 in cc2:\n", " country_match.append(cc)\n", " #print(c,cc)\n", " if len(country_match) >0:\n", " print('found', country_match, 'for',c)\n", " if len(country_match)==1: # only one\n", " gdf_comb.loc[gdf_comb['name']==country_match[0],'name'] = c # replace\n", " else:\n", " print('no match',c)" ] }, { "cell_type": "markdown", "id": "7a2cca7a", "metadata": {}, "source": [ "Missing ids for North Korea and Kosovo, in our original corgi dataset, but let's add the IDs that we can to our corgi dataset:" ] }, { "cell_type": "code", "execution_count": 91, "id": "6c1a1dd4", "metadata": {}, "outputs": [], "source": [ "corg_clean2.join?" ] }, { "cell_type": "code", "execution_count": 92, "id": "3de8ad00", "metadata": {}, "outputs": [], "source": [ "corg_join = corg_clean2.reset_index().merge(gdf_comb,right_on='name',left_on='Country')" ] }, { "cell_type": "code", "execution_count": 93, "id": "abeb7b75", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
indexCountry19171918191919201921192219231924...201520162017201820192020idgeometrycodename
00United States00000000...3012292131981182840MULTIPOLYGON (((-155.68896 18.91661, -155.9373...USUnited States
11Brazil00000000...00000076POLYGON ((-57.62398 -30.21667, -56.29196 -28.8...BRBrazil
22Russia00000000...23721726032918916643MULTIPOLYGON (((142.09162 45.96708, 141.90802 ...RURussia
33Japan00000000...022000392MULTIPOLYGON (((133.01593 32.70419, 132.36432 ...JPJapan
44Vietnam00000000...001300704POLYGON ((108.04968 21.55193, 106.71407 20.697...VNVietnam
\n", "

5 rows × 110 columns

\n", "
" ], "text/plain": [ " index Country 1917 1918 1919 1920 1921 1922 1923 1924 ... \\\n", "0 0 United States 0 0 0 0 0 0 0 0 ... \n", "1 1 Brazil 0 0 0 0 0 0 0 0 ... \n", "2 2 Russia 0 0 0 0 0 0 0 0 ... \n", "3 3 Japan 0 0 0 0 0 0 0 0 ... \n", "4 4 Vietnam 0 0 0 0 0 0 0 0 ... \n", "\n", " 2015 2016 2017 2018 2019 2020 id \\\n", "0 301 229 213 198 118 2 840 \n", "1 0 0 0 0 0 0 76 \n", "2 237 217 260 329 189 16 643 \n", "3 0 2 2 0 0 0 392 \n", "4 0 0 1 3 0 0 704 \n", "\n", " geometry code name \n", "0 MULTIPOLYGON (((-155.68896 18.91661, -155.9373... US United States \n", "1 POLYGON ((-57.62398 -30.21667, -56.29196 -28.8... BR Brazil \n", "2 MULTIPOLYGON (((142.09162 45.96708, 141.90802 ... RU Russia \n", "3 MULTIPOLYGON (((133.01593 32.70419, 132.36432 ... JP Japan \n", "4 POLYGON ((108.04968 21.55193, 106.71407 20.697... VN Vietnam \n", "\n", "[5 rows x 110 columns]" ] }, "execution_count": 93, "metadata": {}, "output_type": "execute_result" } ], "source": [ "corg_join.head()" ] }, { "cell_type": "markdown", "id": "72b6b71b", "metadata": {}, "source": [ "Now that we have merged everything together, let's simplify this table a bit. We know that we want a year slider, so let's change our columns from datetime to years:" ] }, { "cell_type": "code", "execution_count": 94, "id": "3b4ca36c", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Country191719181919192019211922192319241925...201220132014201520162017201820192020id
0United States000000000...4313762803012292131981182840
1Brazil000000000...00000000076
2Russia000000000...8211512723721726032918916643
3Japan000000000...000022000392
4Vietnam000000000...000001300704
\n", "

5 rows × 106 columns

\n", "
" ], "text/plain": [ " Country 1917 1918 1919 1920 1921 1922 1923 1924 1925 ... \\\n", "0 United States 0 0 0 0 0 0 0 0 0 ... \n", "1 Brazil 0 0 0 0 0 0 0 0 0 ... \n", "2 Russia 0 0 0 0 0 0 0 0 0 ... \n", "3 Japan 0 0 0 0 0 0 0 0 0 ... \n", "4 Vietnam 0 0 0 0 0 0 0 0 0 ... \n", "\n", " 2012 2013 2014 2015 2016 2017 2018 2019 2020 id \n", "0 431 376 280 301 229 213 198 118 2 840 \n", "1 0 0 0 0 0 0 0 0 0 76 \n", "2 82 115 127 237 217 260 329 189 16 643 \n", "3 0 0 0 0 2 2 0 0 0 392 \n", "4 0 0 0 0 0 1 3 0 0 704 \n", "\n", "[5 rows x 106 columns]" ] }, "execution_count": 94, "metadata": {}, "output_type": "execute_result" } ], "source": [ "corg_join_small = corg_join.copy()\n", "corg_join_small = corg_join_small.drop(['index','code','name','geometry'],axis=1)\n", "corg_join_small.head()" ] }, { "cell_type": "markdown", "id": "4f9f28d7", "metadata": {}, "source": [ "Before making this interactive in anyway, let's first make a static plot. This will help us explore [some of the transformations that we need to do](https://stackoverflow.com/questions/59224026/how-to-add-a-slider-to-a-choropleth-in-altair) to get this whole thing to work." ] }, { "cell_type": "markdown", "id": "f6a61aa3", "metadata": {}, "source": [ "Let's start with a black-background map:" ] }, { "cell_type": "code", "execution_count": 95, "id": "18b8427a", "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "
\n", "" ], "text/plain": [ "alt.Chart(...)" ] }, "execution_count": 95, "metadata": {}, "output_type": "execute_result" } ], "source": [ "world = alt.topo_feature(data.world_110m.url, \"countries\")\n", "\n", "background = alt.Chart(world).mark_geoshape(fill=\"black\").project(\"equirectangular\")\n", "background" ] }, { "cell_type": "markdown", "id": "69162b6d", "metadata": {}, "source": [ "Let's add the forground, and in this case, just pick one year to plot. To select this year, let's first change the type of our columns:" ] }, { "cell_type": "code", "execution_count": 96, "id": "d6fb7a4c", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Index(['Country', 1917, 1918, 1919, 1920, 1921,\n", " 1922, 1923, 1924, 1925,\n", " ...\n", " 2012, 2013, 2014, 2015, 2016, 2017,\n", " 2018, 2019, 2020, 'id'],\n", " dtype='object', length=106)" ] }, "execution_count": 96, "metadata": {}, "output_type": "execute_result" } ], "source": [ "corg_join_small.columns" ] }, { "cell_type": "code", "execution_count": 97, "id": "85f93c64", "metadata": {}, "outputs": [], "source": [ "corg_join_small.columns = corg_join_small.columns.astype('str')" ] }, { "cell_type": "code", "execution_count": 98, "id": "9c79d7ca", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Index(['Country', '1917', '1918', '1919', '1920', '1921', '1922', '1923',\n", " '1924', '1925',\n", " ...\n", " '2012', '2013', '2014', '2015', '2016', '2017', '2018', '2019', '2020',\n", " 'id'],\n", " dtype='object', length=106)" ] }, "execution_count": 98, "metadata": {}, "output_type": "execute_result" } ], "source": [ "corg_join_small.columns" ] }, { "cell_type": "code", "execution_count": 99, "id": "5850ee25", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Country2019id
0United States118840
1Brazil076
2Russia189643
3Japan0392
4Vietnam0704
\n", "
" ], "text/plain": [ " Country 2019 id\n", "0 United States 118 840\n", "1 Brazil 0 76\n", "2 Russia 189 643\n", "3 Japan 0 392\n", "4 Vietnam 0 704" ] }, "execution_count": 99, "metadata": {}, "output_type": "execute_result" } ], "source": [ "corg_join_small_2019 = corg_join_small[['Country','2019','id']]\n", "corg_join_small_2019.head()" ] }, { "cell_type": "code", "execution_count": 100, "id": "7d407956", "metadata": {}, "outputs": [ { "ename": "NameError", "evalue": "name 'columns' is not defined", "output_type": "error", "traceback": [ "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", "\u001b[0;31mNameError\u001b[0m Traceback (most recent call last)", "\u001b[0;32m/var/folders/t7/bwcvd_4177q4872gxghn7p9r0000gq/T/ipykernel_98272/3057673268.py\u001b[0m in \u001b[0;36m\u001b[0;34m\u001b[0m\n\u001b[1;32m 2\u001b[0m \u001b[0;34m)\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mtransform_lookup\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 3\u001b[0m \u001b[0mlookup\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;34m'id'\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 4\u001b[0;31m \u001b[0mfrom_\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0malt\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mLookupData\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mcorg_join_small_2019\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m'id'\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mcolumns\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 5\u001b[0m \u001b[0;34m)\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mencode\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 6\u001b[0m \u001b[0mcolor\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0malt\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mColor\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m'2019:Q'\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mscale\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0malt\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mScale\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mscheme\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;34m'blues'\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", "\u001b[0;31mNameError\u001b[0m: name 'columns' is not defined" ] } ], "source": [ "foreground = alt.Chart(world).mark_geoshape(\n", ").transform_lookup(\n", " lookup='id',\n", " from_=alt.LookupData(corg_join_small_2019, 'id', columns)\n", ").encode(\n", " color = alt.Color('2019:Q', scale=alt.Scale(scheme='blues'))\n", ")\n", "background+foreground" ] }, { "cell_type": "markdown", "id": "a8c16888", "metadata": {}, "source": [ "Next, let's use a [transform_fold](https://altair-viz.github.io/user_guide/transform/fold.html) operation to \"unwrap\" our data into a `year` variable and a `Corgs_Born` variable. This is very similar to the \"melting\" operations we did above, but now, it will be done \"on the fly\" in vega-lite/Altair with each year selection.\n", "\n", "To do that, we need to specify which columns to use. In our current case, we just have the 2019 column as what we want to transform by \"folding\":" ] }, { "cell_type": "code", "execution_count": 101, "id": "3cf73021", "metadata": {}, "outputs": [], "source": [ "columns = ['2019']" ] }, { "cell_type": "code", "execution_count": 102, "id": "83bb3b4d", "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "
\n", "" ], "text/plain": [ "alt.LayerChart(...)" ] }, "execution_count": 102, "metadata": {}, "output_type": "execute_result" } ], "source": [ "foreground = alt.Chart(world).mark_geoshape(\n", ").transform_lookup(\n", " lookup='id',\n", " from_=alt.LookupData(corg_join_small_2019, 'id', columns)\n", ").transform_fold( # this is so that we can do continuous updates\n", " columns, as_=['year', 'Corgs_Born']\n", ").encode(\n", " color = alt.Color('Corgs_Born:Q', scale=alt.Scale(scheme='blues'))\n", ")\n", "background+foreground" ] }, { "cell_type": "markdown", "id": "f47ed4e2", "metadata": {}, "source": [ "So, this works for one year, but we want multiple years. For this we need a slider like we had above:" ] }, { "cell_type": "code", "execution_count": 103, "id": "f92d367f", "metadata": {}, "outputs": [], "source": [ "slider = alt.binding_range(min=1917, max=2020, step=1, name='Year: ')\n", "select_year = alt.selection_single(name=\"year\", fields=['year'], # will operate on a \"year\" variable calculated \"on the fly\"\n", " bind=slider, init={'year': 2006})" ] }, { "cell_type": "markdown", "id": "0f57cf59", "metadata": {}, "source": [ "Which years columns to use now? Well, all of them!" ] }, { "cell_type": "code", "execution_count": 104, "id": "e84f9357", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array(['1917', '1918', '1919', '1920', '1921'], dtype='\n", "" ], "text/plain": [ "alt.LayerChart(...)" ] }, "execution_count": 105, "metadata": {}, "output_type": "execute_result" } ], "source": [ "foreground = alt.Chart(world).mark_geoshape(\n", ").transform_lookup(\n", " lookup='id',\n", " from_=alt.LookupData(corg_join_small, 'id', columns)\n", ").transform_fold( # this is so that we can do continuous updates\n", " columns, as_=['year', 'Corgs_Born']\n", ").encode(\n", " color = alt.Color('Corgs_Born:Q', scale=alt.Scale(scheme='blues'))\n", ").add_selection(\n", " select_year\n", ").transform_filter(\n", " select_year\n", ")\n", "background+foreground" ] }, { "cell_type": "markdown", "id": "ad653963", "metadata": {}, "source": [ "But wait! That doesn't work how we think it should... as we can see by the colorbar -- we are getting NaN's. That is because as discussed above, the slider is expecting integers and `year` is a string. We got around this before by having an `year_int` column, but now we want to do this \"on the fly\" -- we can accomplish this by adding another transformation to our Altair specs:" ] }, { "cell_type": "code", "execution_count": 106, "id": "ab8dd3d4", "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "
\n", "" ], "text/plain": [ "alt.LayerChart(...)" ] }, "execution_count": 106, "metadata": {}, "output_type": "execute_result" } ], "source": [ "foreground = alt.Chart(world).mark_geoshape(\n", ").transform_lookup(\n", " lookup='id',\n", " from_=alt.LookupData(corg_join_small, 'id', columns)\n", ").transform_fold( # this is so that we can do continuous updates\n", " columns, as_=['year', 'Corgs_Born']\n", ").transform_calculate(\n", " year='parseInt(datum.year)',\n", ").encode(\n", " color = alt.Color('Corgs_Born:Q', scale=alt.Scale(scheme='blues'))\n", ").add_selection(\n", " select_year\n", ").transform_filter(\n", " select_year\n", ")\n", "background+foreground" ] }, { "cell_type": "markdown", "id": "8a8d9471", "metadata": {}, "source": [ "Nice! One last thing -- let's make it a bit bigger and maybe change the colormap:" ] }, { "cell_type": "code", "execution_count": 107, "id": "f7424df9", "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "
\n", "" ], "text/plain": [ "alt.LayerChart(...)" ] }, "execution_count": 107, "metadata": {}, "output_type": "execute_result" } ], "source": [ "foreground = alt.Chart(world).mark_geoshape(\n", ").transform_lookup(\n", " lookup='id',\n", " from_=alt.LookupData(corg_join_small, 'id', columns)\n", ").transform_fold( # this is so that we can do continuous updates\n", " columns, as_=['year', 'Corgs_Born']\n", ").transform_calculate(\n", " year='parseInt(datum.year)',\n", ").encode(\n", " color = alt.Color('Corgs_Born:Q', scale=alt.Scale(scheme='viridis')),\n", ").add_selection(\n", " select_year\n", ").transform_filter(\n", " select_year\n", ")\n", "choro_corg = (background+foreground).properties(\n", " width=700,\n", " height=400\n", ")\n", "choro_corg" ] }, { "cell_type": "markdown", "id": "5d34ae4f", "metadata": {}, "source": [ "I lied! One last-last thing, [let's add in our tooltip](https://stackoverflow.com/questions/65987572/adding-dynamic-tooltip-and-title-to-choropleth-with-slider-in-altair):" ] }, { "cell_type": "code", "execution_count": 108, "id": "7557ffb2", "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "
\n", "" ], "text/plain": [ "alt.LayerChart(...)" ] }, "execution_count": 108, "metadata": {}, "output_type": "execute_result" } ], "source": [ "foreground = alt.Chart(world).mark_geoshape(\n", ").transform_lookup(\n", " lookup='id',\n", " from_=alt.LookupData(corg_join_small, 'id', columns)\n", ").transform_fold( # this is so that we can do continuous updates\n", " columns, as_=['year', 'Corgs_Born']\n", ").transform_calculate(\n", " year='parseInt(datum.year)',\n", ").encode(\n", " color = alt.Color('Corgs_Born:Q', scale=alt.Scale(scheme='viridis')),\n", " tooltip=['Country:N','Corgs_Born:Q'] # data types might not be necessary\n", ").add_selection(\n", " select_year\n", ").transform_filter(\n", " select_year\n", ")\n", "choro_corg = (background+foreground).properties(\n", " width=700,\n", " height=400\n", ")\n", "choro_corg" ] }, { "cell_type": "markdown", "id": "33e22cb0", "metadata": {}, "source": [ "Let's save it with a contain width tag!" ] }, { "cell_type": "code", "execution_count": 109, "id": "ecaab709", "metadata": {}, "outputs": [], "source": [ "chart_out = background+foreground\n", "\n", "chart_out.properties(width='container').save(myJekyllDir+\"assets/json/corgis_choro_world.json\") " ] }, { "cell_type": "code", "execution_count": null, "id": "b6cf0caf", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "c337295a", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "6d816ca9", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "id": "450fa51e", "metadata": {}, "source": [ "## Optional:\n", "\n", "Maybe try saving with HTML just as an example? https://altair-viz.github.io/user_guide/saving_charts.html" ] }, { "cell_type": "markdown", "id": "927b1453", "metadata": {}, "source": [ "https://stackoverflow.com/questions/59224026/how-to-add-a-slider-to-a-choropleth-in-altair" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.12" } }, "nbformat": 4, "nbformat_minor": 5 }