{
"cells": [
{
"cell_type": "markdown",
"id": "a860826e",
"metadata": {},
"source": [
"# Prep Notebook, Week 13\n",
"\n",
"So, the last lecture we ended passing data through Python to Altair to output as vega-lite. What is the benefit to using Python for data analysis? Well, for some of us Python is our bestie and so we want to hang out with it the most. For others, the benefit is that we can do data cleaning in Python and then put the cleaned data into our plots.\n",
"\n",
"Let's work through a few examples:\n",
"\n",
"1. With the corgis dataset"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "9225a116",
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"import numpy as np\n",
"import altair as alt\n",
"import matplotlib.pyplot as plt # just in case"
]
},
{
"cell_type": "markdown",
"id": "13531926",
"metadata": {},
"source": [
"## 1. Corgis Dataset examples"
]
},
{
"cell_type": "markdown",
"id": "753601a0",
"metadata": {},
"source": [
"Let's start by reading in this dataset and taking a look -- we'll start with the number of corgis born over time:"
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "7b21af27",
"metadata": {},
"outputs": [],
"source": [
"corgs = pd.read_csv('https://raw.githubusercontent.com/UIUC-iSchool-DataViz/is445_bcubcg_fall2022/main/data/corgs_per_country_over_time_columns_2020.csv')"
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "48c29e18",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"
\n",
"\n",
"
\n",
" \n",
" \n",
" \n",
" Years \n",
" United States \n",
" Brazil \n",
" Russia \n",
" Japan \n",
" Vietnam \n",
" Germany \n",
" France \n",
" United Kingdom \n",
" Italy \n",
" ... \n",
" Croatia \n",
" New Zealand \n",
" Ireland \n",
" Lithuania \n",
" Uruguay \n",
" Latvia \n",
" Slovenia \n",
" Estonia \n",
" Netherlands Antilles \n",
" Kosovo \n",
" \n",
" \n",
" \n",
" \n",
" 0 \n",
" 1917 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 1 \n",
" 0 \n",
" ... \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" \n",
" \n",
" 1 \n",
" 1918 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" ... \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" \n",
" \n",
" 2 \n",
" 1919 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" ... \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 1 \n",
" \n",
" \n",
" 3 \n",
" 1920 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" ... \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" \n",
" \n",
" 4 \n",
" 1921 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" ... \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" \n",
" \n",
"
\n",
"
5 rows × 41 columns
\n",
"
"
],
"text/plain": [
" Years United States Brazil Russia Japan Vietnam Germany France \\\n",
"0 1917 0 0 0 0 0 0 0 \n",
"1 1918 0 0 0 0 0 0 0 \n",
"2 1919 0 0 0 0 0 0 0 \n",
"3 1920 0 0 0 0 0 0 0 \n",
"4 1921 0 0 0 0 0 0 0 \n",
"\n",
" United Kingdom Italy ... Croatia New Zealand Ireland Lithuania \\\n",
"0 1 0 ... 0 0 0 0 \n",
"1 0 0 ... 0 0 0 0 \n",
"2 0 0 ... 0 0 0 0 \n",
"3 0 0 ... 0 0 0 0 \n",
"4 0 0 ... 0 0 0 0 \n",
"\n",
" Uruguay Latvia Slovenia Estonia Netherlands Antilles Kosovo \n",
"0 0 0 0 0 0 0 \n",
"1 0 0 0 0 0 0 \n",
"2 0 0 0 0 0 1 \n",
"3 0 0 0 0 0 0 \n",
"4 0 0 0 0 0 0 \n",
"\n",
"[5 rows x 41 columns]"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"corgs.head()"
]
},
{
"cell_type": "markdown",
"id": "3ac38572",
"metadata": {},
"source": [
"So, this shows the number of corgis born over time in different countries. The easiest plot we can think of is probably a line chart of each country over time. First, let's start with the United States:"
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "c7b01113",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
""
],
"text/plain": [
"alt.Chart(...)"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"linecorg = alt.Chart(corgs).mark_line().encode(\n",
" alt.X('Years:Q'),\n",
" alt.Y('United States:Q')\n",
")\n",
"linecorg"
]
},
{
"cell_type": "markdown",
"id": "367ff0b1",
"metadata": {},
"source": [
"Here again, we probably want to specify that we have time units:"
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "fa4a063e",
"metadata": {},
"outputs": [],
"source": [
"corgs['Years'] = pd.to_datetime(corgs['Years'].astype('int'), format='%Y')"
]
},
{
"cell_type": "code",
"execution_count": 10,
"id": "a95ce9c1",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" \n",
" Years \n",
" United States \n",
" Brazil \n",
" Russia \n",
" Japan \n",
" Vietnam \n",
" Germany \n",
" France \n",
" United Kingdom \n",
" Italy \n",
" ... \n",
" Croatia \n",
" New Zealand \n",
" Ireland \n",
" Lithuania \n",
" Uruguay \n",
" Latvia \n",
" Slovenia \n",
" Estonia \n",
" Netherlands Antilles \n",
" Kosovo \n",
" \n",
" \n",
" \n",
" \n",
" 0 \n",
" 1917-01-01 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 1 \n",
" 0 \n",
" ... \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" \n",
" \n",
" 1 \n",
" 1918-01-01 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" ... \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" \n",
" \n",
" 2 \n",
" 1919-01-01 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" ... \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 1 \n",
" \n",
" \n",
" 3 \n",
" 1920-01-01 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" ... \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" \n",
" \n",
" 4 \n",
" 1921-01-01 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" ... \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" \n",
" \n",
"
\n",
"
5 rows × 41 columns
\n",
"
"
],
"text/plain": [
" Years United States Brazil Russia Japan Vietnam Germany France \\\n",
"0 1917-01-01 0 0 0 0 0 0 0 \n",
"1 1918-01-01 0 0 0 0 0 0 0 \n",
"2 1919-01-01 0 0 0 0 0 0 0 \n",
"3 1920-01-01 0 0 0 0 0 0 0 \n",
"4 1921-01-01 0 0 0 0 0 0 0 \n",
"\n",
" United Kingdom Italy ... Croatia New Zealand Ireland Lithuania \\\n",
"0 1 0 ... 0 0 0 0 \n",
"1 0 0 ... 0 0 0 0 \n",
"2 0 0 ... 0 0 0 0 \n",
"3 0 0 ... 0 0 0 0 \n",
"4 0 0 ... 0 0 0 0 \n",
"\n",
" Uruguay Latvia Slovenia Estonia Netherlands Antilles Kosovo \n",
"0 0 0 0 0 0 0 \n",
"1 0 0 0 0 0 0 \n",
"2 0 0 0 0 0 1 \n",
"3 0 0 0 0 0 0 \n",
"4 0 0 0 0 0 0 \n",
"\n",
"[5 rows x 41 columns]"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"corgs.head()"
]
},
{
"cell_type": "code",
"execution_count": 11,
"id": "1c063665",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
""
],
"text/plain": [
"alt.Chart(...)"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"linecorg = alt.Chart(corgs).mark_line().encode(\n",
" alt.X('Years:T'),\n",
" alt.Y('United States:Q')\n",
")\n",
"linecorg"
]
},
{
"cell_type": "markdown",
"id": "4a16ad92",
"metadata": {},
"source": [
"That looks better! To make multiple lines for each country, we want to probably use the `melt` operation we did before. To do that, we need to make years the index:"
]
},
{
"cell_type": "code",
"execution_count": 12,
"id": "b5f349e5",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" \n",
" United States \n",
" Brazil \n",
" Russia \n",
" Japan \n",
" Vietnam \n",
" Germany \n",
" France \n",
" United Kingdom \n",
" Italy \n",
" South Africa \n",
" ... \n",
" Croatia \n",
" New Zealand \n",
" Ireland \n",
" Lithuania \n",
" Uruguay \n",
" Latvia \n",
" Slovenia \n",
" Estonia \n",
" Netherlands Antilles \n",
" Kosovo \n",
" \n",
" \n",
" Years \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" 1917-01-01 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 1 \n",
" 0 \n",
" 0 \n",
" ... \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" \n",
" \n",
" 1918-01-01 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" ... \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" \n",
" \n",
" 1919-01-01 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" ... \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 1 \n",
" \n",
" \n",
" 1920-01-01 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" ... \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" \n",
" \n",
" 1921-01-01 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" ... \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" \n",
" \n",
"
\n",
"
5 rows × 40 columns
\n",
"
"
],
"text/plain": [
" United States Brazil Russia Japan Vietnam Germany France \\\n",
"Years \n",
"1917-01-01 0 0 0 0 0 0 0 \n",
"1918-01-01 0 0 0 0 0 0 0 \n",
"1919-01-01 0 0 0 0 0 0 0 \n",
"1920-01-01 0 0 0 0 0 0 0 \n",
"1921-01-01 0 0 0 0 0 0 0 \n",
"\n",
" United Kingdom Italy South Africa ... Croatia New Zealand \\\n",
"Years ... \n",
"1917-01-01 1 0 0 ... 0 0 \n",
"1918-01-01 0 0 0 ... 0 0 \n",
"1919-01-01 0 0 0 ... 0 0 \n",
"1920-01-01 0 0 0 ... 0 0 \n",
"1921-01-01 0 0 0 ... 0 0 \n",
"\n",
" Ireland Lithuania Uruguay Latvia Slovenia Estonia \\\n",
"Years \n",
"1917-01-01 0 0 0 0 0 0 \n",
"1918-01-01 0 0 0 0 0 0 \n",
"1919-01-01 0 0 0 0 0 0 \n",
"1920-01-01 0 0 0 0 0 0 \n",
"1921-01-01 0 0 0 0 0 0 \n",
"\n",
" Netherlands Antilles Kosovo \n",
"Years \n",
"1917-01-01 0 0 \n",
"1918-01-01 0 0 \n",
"1919-01-01 0 1 \n",
"1920-01-01 0 0 \n",
"1921-01-01 0 0 \n",
"\n",
"[5 rows x 40 columns]"
]
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"corgs = corgs.set_index('Years')\n",
"corgs.head()"
]
},
{
"cell_type": "markdown",
"id": "ae2eed0f",
"metadata": {},
"source": [
"Now we can \"melt\" like before:"
]
},
{
"cell_type": "code",
"execution_count": 13,
"id": "86f8e8c7",
"metadata": {},
"outputs": [],
"source": [
"corg_source = corgs.reset_index().melt('Years', \n",
" var_name='Country', value_name='country')"
]
},
{
"cell_type": "code",
"execution_count": 14,
"id": "d5b017d8",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" \n",
" Years \n",
" Country \n",
" country \n",
" \n",
" \n",
" \n",
" \n",
" 0 \n",
" 1917-01-01 \n",
" United States \n",
" 0 \n",
" \n",
" \n",
" 1 \n",
" 1918-01-01 \n",
" United States \n",
" 0 \n",
" \n",
" \n",
" 2 \n",
" 1919-01-01 \n",
" United States \n",
" 0 \n",
" \n",
" \n",
" 3 \n",
" 1920-01-01 \n",
" United States \n",
" 0 \n",
" \n",
" \n",
" 4 \n",
" 1921-01-01 \n",
" United States \n",
" 0 \n",
" \n",
" \n",
"
\n",
"
"
],
"text/plain": [
" Years Country country\n",
"0 1917-01-01 United States 0\n",
"1 1918-01-01 United States 0\n",
"2 1919-01-01 United States 0\n",
"3 1920-01-01 United States 0\n",
"4 1921-01-01 United States 0"
]
},
"execution_count": 14,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"corg_source.head()"
]
},
{
"cell_type": "markdown",
"id": "132438be",
"metadata": {},
"source": [
"Now, let's try again with our multi-line plot:"
]
},
{
"cell_type": "code",
"execution_count": 15,
"id": "c3a49d72",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
""
],
"text/plain": [
"alt.Chart(...)"
]
},
"execution_count": 15,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"linecorg = alt.Chart(corg_source).mark_line().encode(\n",
" alt.Y('country:Q'),\n",
" x='Years:T',\n",
" color='Country:N'\n",
")\n",
"linecorg"
]
},
{
"cell_type": "markdown",
"id": "ace87bce",
"metadata": {},
"source": [
"Since there is a large range of corgis born, we might try a log-scale on the y-axis:"
]
},
{
"cell_type": "code",
"execution_count": 16,
"id": "835b428c",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
""
],
"text/plain": [
"alt.Chart(...)"
]
},
"execution_count": 16,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"linecorg = alt.Chart(corg_source).mark_line().encode(\n",
" alt.Y('country:Q',scale=alt.Scale(type='log')),\n",
" x='Years:T',\n",
" color='Country:N'\n",
")\n",
"linecorg"
]
},
{
"cell_type": "markdown",
"id": "8bb9795c",
"metadata": {},
"source": [
"Here we see that there is an issue -- this is probably because there are zeros in our dataset:"
]
},
{
"cell_type": "code",
"execution_count": 17,
"id": "315d3094",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0"
]
},
"execution_count": 17,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"corg_source['country'].min()"
]
},
{
"cell_type": "markdown",
"id": "793c1a4c",
"metadata": {},
"source": [
"Potentially, we can \"clean\" these by setting any zeros to NaN's, however in this case those 0's are meaningful -- they are telling us that a particular country has no corgis born in that year! Instead, we have a few different options. One thing we can do [is apply a](https://stackoverflow.com/questions/58032074/why-is-altair-returning-an-empty-chart-when-using-log-scale) [filter opperation](https://altair-viz.github.io/user_guide/transform/filter.html#filter-transform) on our data:"
]
},
{
"cell_type": "code",
"execution_count": 18,
"id": "4b21cdd7",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
""
],
"text/plain": [
"alt.Chart(...)"
]
},
"execution_count": 18,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"linecorg = alt.Chart(corg_source).transform_filter(\n",
" alt.datum.country > 0\n",
").mark_line().encode(\n",
" alt.Y('country:Q',scale=alt.Scale(type='log')),\n",
" x='Years:T',\n",
" color='Country:N'\n",
")\n",
"linecorg"
]
},
{
"cell_type": "markdown",
"id": "ab762ef4",
"metadata": {},
"source": [
"This doesn't really get to our issue though -- we are just filtering out those zero years/countries. In this case, instead of filtering, we probably want to use a symmetric log instead of a log for our scale;"
]
},
{
"cell_type": "code",
"execution_count": 19,
"id": "f4b2d163",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
""
],
"text/plain": [
"alt.Chart(...)"
]
},
"execution_count": 19,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"linecorg = alt.Chart(corg_source).mark_line().encode(\n",
" alt.Y('country:Q',scale=alt.Scale(type='symlog')),\n",
" x='Years:T',\n",
" color='Country:N'\n",
")\n",
"linecorg"
]
},
{
"cell_type": "markdown",
"id": "d02700cc",
"metadata": {},
"source": [
"Even with this though, we have sort of a messy plot -- even if we allowed for selections (like with the buildings dataset above) we still have many lines that \"fall off\" our legend which is not helpful.\n",
"\n",
"Since this is data in different countries, we could think that maybe a good idea would be some kind of map!\n",
"\n",
"First, let's see what mappable data \"comes with\" vega-datasets:"
]
},
{
"cell_type": "code",
"execution_count": 20,
"id": "a573129b",
"metadata": {},
"outputs": [],
"source": [
"from vega_datasets import data"
]
},
{
"cell_type": "code",
"execution_count": 21,
"id": "0cf9ef16",
"metadata": {},
"outputs": [],
"source": [
"#data.countries"
]
},
{
"cell_type": "code",
"execution_count": 22,
"id": "fb55844c",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
""
]
},
"execution_count": 22,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"data.world_110m"
]
},
{
"cell_type": "markdown",
"id": "3620bf17",
"metadata": {},
"source": [
"Let's try plotting both of these:"
]
},
{
"cell_type": "code",
"execution_count": 23,
"id": "05f5e5e9",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
""
],
"text/plain": [
"alt.Chart(...)"
]
},
"execution_count": 23,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"geo = alt.topo_feature(data.world_110m.url, feature='countries')\n",
"\n",
"# US states background\n",
"world = alt.Chart(geo).mark_geoshape(\n",
" fill='gray',\n",
" stroke='white'\n",
").properties(\n",
" width=800,\n",
" height=500\n",
").project('equirectangular') # note we have a few projections we can use!\n",
"world"
]
},
{
"cell_type": "markdown",
"id": "cbf21c56",
"metadata": {},
"source": [
"So, one way we can plot data on this plot is to plot dots over each of the countries showing how many corgis have been born in each one. For this though [we probably need the latitude and longitude points](https://altair-viz.github.io/altair-tutorial/notebooks/09-Geographic-plots.html)."
]
},
{
"cell_type": "markdown",
"id": "f6f40961",
"metadata": {},
"source": [
"One way to do this is [using the OpenStreetMap project](https://gis.stackexchange.com/questions/212796/getting-latlon-extent-of-country-by-its-name-using-python):"
]
},
{
"cell_type": "code",
"execution_count": 24,
"id": "d01ac06f",
"metadata": {},
"outputs": [],
"source": [
"import requests\n",
"def get_boundingbox_country(country, output_as='center'):\n",
" \"\"\"\n",
" get the bounding box of a country in EPSG4326 given a country name\n",
"\n",
" Parameters\n",
" ----------\n",
" country : str\n",
" name of the country in english and lowercase\n",
" output_as : 'str\n",
" chose from 'boundingbox' or 'center'. \n",
" - 'boundingbox' for [latmin, latmax, lonmin, lonmax]\n",
" - 'center' for [latcenter, loncenter]\n",
"\n",
" Returns\n",
" -------\n",
" output : list\n",
" list with coordinates as str\n",
" \"\"\"\n",
" # create url\n",
" url = '{0}{1}{2}'.format('http://nominatim.openstreetmap.org/search?country=',\n",
" country,\n",
" '&format=json&polygon=0')\n",
" response = requests.get(url).json()[0]\n",
"\n",
" # parse response to list\n",
" if output_as == 'boundingbox':\n",
" lst = response[output_as]\n",
" output = [float(i) for i in lst]\n",
" if output_as == 'center':\n",
" lst = [response.get(key) for key in ['lat','lon']]\n",
" output = [float(i) for i in lst]\n",
" return output"
]
},
{
"cell_type": "markdown",
"id": "c8473e93",
"metadata": {},
"source": [
"We can call OpenStreetMaps in a few different ways, with country codes or names:"
]
},
{
"cell_type": "code",
"execution_count": 25,
"id": "8b7ba374",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(39.7837304, -100.445882)"
]
},
"execution_count": 25,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"lat,long = get_boundingbox_country('usa')\n",
"lat,long"
]
},
{
"cell_type": "code",
"execution_count": 26,
"id": "1b257c1c",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(39.7837304, -100.445882)"
]
},
"execution_count": 26,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"lat,long = get_boundingbox_country('us')\n",
"lat,long"
]
},
{
"cell_type": "code",
"execution_count": 27,
"id": "d6fda5f6",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(39.7837304, -100.445882)"
]
},
"execution_count": 27,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"lat,long = get_boundingbox_country('United States')\n",
"lat,long"
]
},
{
"cell_type": "markdown",
"id": "eeb3cb21",
"metadata": {},
"source": [
"There is no guarantee that the dataset is formatted with the correct country codes -- so we'd best check and see:"
]
},
{
"cell_type": "code",
"execution_count": 28,
"id": "50e6c74a",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Index(['United States', 'Brazil', 'Russia', 'Japan', 'Vietnam', 'Germany',\n",
" 'France', 'United Kingdom', 'Italy', 'South Africa', 'Ukraine', 'Spain',\n",
" 'Poland', 'Canada', 'Korea, North', 'Romania', 'Australia', 'Portugal',\n",
" 'Belgium', 'Czech Republic', 'Hungary', 'Belarus', 'Sweden', 'Austria',\n",
" 'Switzerland', 'Israel', 'Serbia', 'Denmark', 'Finland', 'Norway',\n",
" 'Croatia', 'New Zealand', 'Ireland', 'Lithuania', 'Uruguay', 'Latvia',\n",
" 'Slovenia', 'Estonia', 'Netherlands Antilles', 'Kosovo'],\n",
" dtype='object')"
]
},
"execution_count": 28,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"corgs.columns"
]
},
{
"cell_type": "code",
"execution_count": 29,
"id": "7cecb018",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"United States 39.7837304 -100.445882\n",
"Brazil -10.3333333 -53.2\n",
"Russia 64.6863136 97.7453061\n",
"Japan 36.5748441 139.2394179\n",
"Vietnam 15.9266657 107.9650855\n",
"Germany 51.1638175 10.4478313\n",
"France 46.603354 1.8883335\n",
"United Kingdom 54.7023545 -3.2765753\n",
"Italy 42.6384261 12.674297\n",
"South Africa -28.8166236 24.991639\n",
"Ukraine 49.4871968 31.2718321\n",
"Spain 39.3260685 -4.8379791\n",
"Poland 52.215933 19.134422\n",
"Canada 61.0666922 -107.991707\n",
"Korea, North 40.3736611 127.0870417\n",
"Romania 45.9852129 24.6859225\n",
"Australia -24.7761086 134.755\n",
"Portugal 39.6621648 -8.1353519\n",
"Belgium 50.6402809 4.6667145\n",
"Czech Republic 49.7439047 15.3381061\n",
"Hungary 47.1817585 19.5060937\n",
"Belarus 53.4250605 27.6971358\n",
"Sweden 59.6749712 14.5208584\n",
"Austria 47.59397 14.12456\n",
"Switzerland 46.7985624 8.2319736\n",
"Israel 31.39480005 34.63358319049945\n",
"Serbia 44.1534121 20.55144\n",
"Denmark 55.670249 10.3333283\n",
"Finland 63.2467777 25.9209164\n",
"Norway 61.1529386 8.7876653\n",
"Croatia 45.3658443 15.6575209\n",
"New Zealand -41.5000831 172.8344077\n",
"Ireland 52.865196 -7.9794599\n",
"Lithuania 55.3500003 23.7499997\n",
"Uruguay -32.8755548 -56.0201525\n",
"Latvia 56.8406494 24.7537645\n",
"Slovenia 46.1199444 14.8153333\n",
"Estonia 58.7523778 25.3319078\n"
]
},
{
"ename": "IndexError",
"evalue": "list index out of range",
"output_type": "error",
"traceback": [
"\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
"\u001b[0;31mIndexError\u001b[0m Traceback (most recent call last)",
"Cell \u001b[0;32mIn[29], line 3\u001b[0m\n\u001b[1;32m 1\u001b[0m lat,long \u001b[38;5;241m=\u001b[39m [],[]\n\u001b[1;32m 2\u001b[0m \u001b[38;5;28;01mfor\u001b[39;00m c \u001b[38;5;129;01min\u001b[39;00m corgs\u001b[38;5;241m.\u001b[39mcolumns:\n\u001b[0;32m----> 3\u001b[0m la,lo \u001b[38;5;241m=\u001b[39m \u001b[43mget_boundingbox_country\u001b[49m\u001b[43m(\u001b[49m\u001b[38;5;28;43mstr\u001b[39;49m\u001b[43m(\u001b[49m\u001b[43mc\u001b[49m\u001b[43m)\u001b[49m\u001b[43m)\u001b[49m\n\u001b[1;32m 4\u001b[0m \u001b[38;5;28mprint\u001b[39m(c,la,lo)\n\u001b[1;32m 5\u001b[0m lat\u001b[38;5;241m.\u001b[39mappend(la); long\u001b[38;5;241m.\u001b[39mappend(lo)\n",
"Cell \u001b[0;32mIn[24], line 24\u001b[0m, in \u001b[0;36mget_boundingbox_country\u001b[0;34m(country, output_as)\u001b[0m\n\u001b[1;32m 20\u001b[0m \u001b[38;5;66;03m# create url\u001b[39;00m\n\u001b[1;32m 21\u001b[0m url \u001b[38;5;241m=\u001b[39m \u001b[38;5;124m'\u001b[39m\u001b[38;5;132;01m{0}\u001b[39;00m\u001b[38;5;132;01m{1}\u001b[39;00m\u001b[38;5;132;01m{2}\u001b[39;00m\u001b[38;5;124m'\u001b[39m\u001b[38;5;241m.\u001b[39mformat(\u001b[38;5;124m'\u001b[39m\u001b[38;5;124mhttp://nominatim.openstreetmap.org/search?country=\u001b[39m\u001b[38;5;124m'\u001b[39m,\n\u001b[1;32m 22\u001b[0m country,\n\u001b[1;32m 23\u001b[0m \u001b[38;5;124m'\u001b[39m\u001b[38;5;124m&format=json&polygon=0\u001b[39m\u001b[38;5;124m'\u001b[39m)\n\u001b[0;32m---> 24\u001b[0m response \u001b[38;5;241m=\u001b[39m \u001b[43mrequests\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mget\u001b[49m\u001b[43m(\u001b[49m\u001b[43murl\u001b[49m\u001b[43m)\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mjson\u001b[49m\u001b[43m(\u001b[49m\u001b[43m)\u001b[49m\u001b[43m[\u001b[49m\u001b[38;5;241;43m0\u001b[39;49m\u001b[43m]\u001b[49m\n\u001b[1;32m 26\u001b[0m \u001b[38;5;66;03m# parse response to list\u001b[39;00m\n\u001b[1;32m 27\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m output_as \u001b[38;5;241m==\u001b[39m \u001b[38;5;124m'\u001b[39m\u001b[38;5;124mboundingbox\u001b[39m\u001b[38;5;124m'\u001b[39m:\n",
"\u001b[0;31mIndexError\u001b[0m: list index out of range"
]
}
],
"source": [
"lat,long = [],[]\n",
"for c in corgs.columns:\n",
" la,lo = get_boundingbox_country(str(c))\n",
" print(c,la,lo)\n",
" lat.append(la); long.append(lo)"
]
},
{
"cell_type": "markdown",
"id": "6a9839c8",
"metadata": {},
"source": [
"Ah ha! We have an issue, let's see what it is:"
]
},
{
"cell_type": "code",
"execution_count": 30,
"id": "b8d24543",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'Netherlands Antilles'"
]
},
"execution_count": 30,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"c"
]
},
{
"cell_type": "markdown",
"id": "07980335",
"metadata": {},
"source": [
"In this case, we need to do some data cleaning. We can start from our \"source\" but instead, we probably want to start from the original dataframe, just to be consistent:"
]
},
{
"cell_type": "code",
"execution_count": 31,
"id": "be9cd957",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" \n",
" United States \n",
" Brazil \n",
" Russia \n",
" Japan \n",
" Vietnam \n",
" Germany \n",
" France \n",
" United Kingdom \n",
" Italy \n",
" South Africa \n",
" ... \n",
" Croatia \n",
" New Zealand \n",
" Ireland \n",
" Lithuania \n",
" Uruguay \n",
" Latvia \n",
" Slovenia \n",
" Estonia \n",
" Netherlands \n",
" Kosovo \n",
" \n",
" \n",
" Years \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" 1917-01-01 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 1 \n",
" 0 \n",
" 0 \n",
" ... \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" \n",
" \n",
" 1918-01-01 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" ... \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" \n",
" \n",
" 1919-01-01 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" ... \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 1 \n",
" \n",
" \n",
" 1920-01-01 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" ... \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" \n",
" \n",
" 1921-01-01 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" ... \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" \n",
" \n",
"
\n",
"
5 rows × 40 columns
\n",
"
"
],
"text/plain": [
" United States Brazil Russia Japan Vietnam Germany France \\\n",
"Years \n",
"1917-01-01 0 0 0 0 0 0 0 \n",
"1918-01-01 0 0 0 0 0 0 0 \n",
"1919-01-01 0 0 0 0 0 0 0 \n",
"1920-01-01 0 0 0 0 0 0 0 \n",
"1921-01-01 0 0 0 0 0 0 0 \n",
"\n",
" United Kingdom Italy South Africa ... Croatia New Zealand \\\n",
"Years ... \n",
"1917-01-01 1 0 0 ... 0 0 \n",
"1918-01-01 0 0 0 ... 0 0 \n",
"1919-01-01 0 0 0 ... 0 0 \n",
"1920-01-01 0 0 0 ... 0 0 \n",
"1921-01-01 0 0 0 ... 0 0 \n",
"\n",
" Ireland Lithuania Uruguay Latvia Slovenia Estonia \\\n",
"Years \n",
"1917-01-01 0 0 0 0 0 0 \n",
"1918-01-01 0 0 0 0 0 0 \n",
"1919-01-01 0 0 0 0 0 0 \n",
"1920-01-01 0 0 0 0 0 0 \n",
"1921-01-01 0 0 0 0 0 0 \n",
"\n",
" Netherlands Kosovo \n",
"Years \n",
"1917-01-01 0 0 \n",
"1918-01-01 0 0 \n",
"1919-01-01 0 1 \n",
"1920-01-01 0 0 \n",
"1921-01-01 0 0 \n",
"\n",
"[5 rows x 40 columns]"
]
},
"execution_count": 31,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"corg_clean = corgs.copy()\n",
"#corg_clean = corg_clean.set_index('Years') # note -- don't have to do this again since we started with a rest index\n",
"corg_clean = corg_clean.rename(columns={'Netherlands Antilles':\"Netherlands\"})\n",
"corg_clean.head()"
]
},
{
"cell_type": "markdown",
"id": "00761ecd",
"metadata": {},
"source": [
"Let's try this again:"
]
},
{
"cell_type": "code",
"execution_count": 32,
"id": "189aa8dd",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"United States 39.7837304 -100.445882\n",
"Brazil -10.3333333 -53.2\n",
"Russia 64.6863136 97.7453061\n",
"Japan 36.5748441 139.2394179\n",
"Vietnam 15.9266657 107.9650855\n",
"Germany 51.1638175 10.4478313\n",
"France 46.603354 1.8883335\n",
"United Kingdom 54.7023545 -3.2765753\n",
"Italy 42.6384261 12.674297\n",
"South Africa -28.8166236 24.991639\n",
"Ukraine 49.4871968 31.2718321\n",
"Spain 39.3260685 -4.8379791\n",
"Poland 52.215933 19.134422\n",
"Canada 61.0666922 -107.991707\n",
"Korea, North 40.3736611 127.0870417\n",
"Romania 45.9852129 24.6859225\n",
"Australia -24.7761086 134.755\n",
"Portugal 39.6621648 -8.1353519\n",
"Belgium 50.6402809 4.6667145\n",
"Czech Republic 49.7439047 15.3381061\n",
"Hungary 47.1817585 19.5060937\n",
"Belarus 53.4250605 27.6971358\n",
"Sweden 59.6749712 14.5208584\n",
"Austria 47.59397 14.12456\n",
"Switzerland 46.7985624 8.2319736\n",
"Israel 31.39480005 34.63358319049945\n",
"Serbia 44.1534121 20.55144\n",
"Denmark 55.670249 10.3333283\n",
"Finland 63.2467777 25.9209164\n",
"Norway 61.1529386 8.7876653\n",
"Croatia 45.3658443 15.6575209\n",
"New Zealand -41.5000831 172.8344077\n",
"Ireland 52.865196 -7.9794599\n",
"Lithuania 55.3500003 23.7499997\n",
"Uruguay -32.8755548 -56.0201525\n",
"Latvia 56.8406494 24.7537645\n",
"Slovenia 46.1199444 14.8153333\n",
"Estonia 58.7523778 25.3319078\n",
"Netherlands 52.24764975 5.541246849406163\n",
"Kosovo 42.5869578 20.9021231\n"
]
}
],
"source": [
"lat,long = [],[]\n",
"for c in corg_clean.columns:\n",
" la,lo = get_boundingbox_country(str(c))\n",
" print(c,la,lo)\n",
" lat.append(la); long.append(lo)"
]
},
{
"cell_type": "markdown",
"id": "bb6973f2",
"metadata": {},
"source": [
"Hurray! So, this is a case where being able to clean the data in Python was super useful. Let's try adding these as points to our map now that we have cleaned data."
]
},
{
"cell_type": "markdown",
"id": "77da2776",
"metadata": {},
"source": [
"One other data transformation we probably want to do is transpose this matrix so that we have countries as rows and corgis born per year as columns:"
]
},
{
"cell_type": "code",
"execution_count": 33,
"id": "18ab69a1",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" Years \n",
" 1917-01-01 \n",
" 1918-01-01 \n",
" 1919-01-01 \n",
" 1920-01-01 \n",
" 1921-01-01 \n",
" 1922-01-01 \n",
" 1923-01-01 \n",
" 1924-01-01 \n",
" 1925-01-01 \n",
" 1926-01-01 \n",
" ... \n",
" 2011-01-01 \n",
" 2012-01-01 \n",
" 2013-01-01 \n",
" 2014-01-01 \n",
" 2015-01-01 \n",
" 2016-01-01 \n",
" 2017-01-01 \n",
" 2018-01-01 \n",
" 2019-01-01 \n",
" 2020-01-01 \n",
" \n",
" \n",
" \n",
" \n",
" United States \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" ... \n",
" 408 \n",
" 431 \n",
" 376 \n",
" 280 \n",
" 301 \n",
" 229 \n",
" 213 \n",
" 198 \n",
" 118 \n",
" 2 \n",
" \n",
" \n",
" Brazil \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" ... \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" \n",
" \n",
" Russia \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" ... \n",
" 89 \n",
" 82 \n",
" 115 \n",
" 127 \n",
" 237 \n",
" 217 \n",
" 260 \n",
" 329 \n",
" 189 \n",
" 16 \n",
" \n",
" \n",
" Japan \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" ... \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 2 \n",
" 2 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" \n",
" \n",
" Vietnam \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" ... \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 1 \n",
" 3 \n",
" 0 \n",
" 0 \n",
" \n",
" \n",
"
\n",
"
5 rows × 104 columns
\n",
"
"
],
"text/plain": [
"Years 1917-01-01 1918-01-01 1919-01-01 1920-01-01 1921-01-01 \\\n",
"United States 0 0 0 0 0 \n",
"Brazil 0 0 0 0 0 \n",
"Russia 0 0 0 0 0 \n",
"Japan 0 0 0 0 0 \n",
"Vietnam 0 0 0 0 0 \n",
"\n",
"Years 1922-01-01 1923-01-01 1924-01-01 1925-01-01 1926-01-01 \\\n",
"United States 0 0 0 0 0 \n",
"Brazil 0 0 0 0 0 \n",
"Russia 0 0 0 0 0 \n",
"Japan 0 0 0 0 0 \n",
"Vietnam 0 0 0 0 0 \n",
"\n",
"Years ... 2011-01-01 2012-01-01 2013-01-01 2014-01-01 \\\n",
"United States ... 408 431 376 280 \n",
"Brazil ... 0 0 0 0 \n",
"Russia ... 89 82 115 127 \n",
"Japan ... 0 0 0 0 \n",
"Vietnam ... 0 0 0 0 \n",
"\n",
"Years 2015-01-01 2016-01-01 2017-01-01 2018-01-01 2019-01-01 \\\n",
"United States 301 229 213 198 118 \n",
"Brazil 0 0 0 0 0 \n",
"Russia 237 217 260 329 189 \n",
"Japan 0 2 2 0 0 \n",
"Vietnam 0 0 1 3 0 \n",
"\n",
"Years 2020-01-01 \n",
"United States 2 \n",
"Brazil 0 \n",
"Russia 16 \n",
"Japan 0 \n",
"Vietnam 0 \n",
"\n",
"[5 rows x 104 columns]"
]
},
"execution_count": 33,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"corg_clean_t = corg_clean.T\n",
"corg_clean_t.head()"
]
},
{
"cell_type": "markdown",
"id": "4b6c0ef7",
"metadata": {},
"source": [
"Since, technically, the index is no longer years, so we should be sure to name the index more intuatively: "
]
},
{
"cell_type": "code",
"execution_count": 34,
"id": "4fe7be97",
"metadata": {},
"outputs": [],
"source": [
"corg_clean_t.index.name = \"Country\""
]
},
{
"cell_type": "code",
"execution_count": 35,
"id": "92c681dc",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'Years'"
]
},
"execution_count": 35,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"corg_clean_t.axes[1].name"
]
},
{
"cell_type": "code",
"execution_count": 36,
"id": "b4404f0d",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" Years \n",
" 1917-01-01 \n",
" 1918-01-01 \n",
" 1919-01-01 \n",
" 1920-01-01 \n",
" 1921-01-01 \n",
" 1922-01-01 \n",
" 1923-01-01 \n",
" 1924-01-01 \n",
" 1925-01-01 \n",
" 1926-01-01 \n",
" ... \n",
" 2011-01-01 \n",
" 2012-01-01 \n",
" 2013-01-01 \n",
" 2014-01-01 \n",
" 2015-01-01 \n",
" 2016-01-01 \n",
" 2017-01-01 \n",
" 2018-01-01 \n",
" 2019-01-01 \n",
" 2020-01-01 \n",
" \n",
" \n",
" Country \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" United States \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" ... \n",
" 408 \n",
" 431 \n",
" 376 \n",
" 280 \n",
" 301 \n",
" 229 \n",
" 213 \n",
" 198 \n",
" 118 \n",
" 2 \n",
" \n",
" \n",
" Brazil \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" ... \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" \n",
" \n",
" Russia \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" ... \n",
" 89 \n",
" 82 \n",
" 115 \n",
" 127 \n",
" 237 \n",
" 217 \n",
" 260 \n",
" 329 \n",
" 189 \n",
" 16 \n",
" \n",
" \n",
" Japan \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" ... \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 2 \n",
" 2 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" \n",
" \n",
" Vietnam \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" ... \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 1 \n",
" 3 \n",
" 0 \n",
" 0 \n",
" \n",
" \n",
"
\n",
"
5 rows × 104 columns
\n",
"
"
],
"text/plain": [
"Years 1917-01-01 1918-01-01 1919-01-01 1920-01-01 1921-01-01 \\\n",
"Country \n",
"United States 0 0 0 0 0 \n",
"Brazil 0 0 0 0 0 \n",
"Russia 0 0 0 0 0 \n",
"Japan 0 0 0 0 0 \n",
"Vietnam 0 0 0 0 0 \n",
"\n",
"Years 1922-01-01 1923-01-01 1924-01-01 1925-01-01 1926-01-01 \\\n",
"Country \n",
"United States 0 0 0 0 0 \n",
"Brazil 0 0 0 0 0 \n",
"Russia 0 0 0 0 0 \n",
"Japan 0 0 0 0 0 \n",
"Vietnam 0 0 0 0 0 \n",
"\n",
"Years ... 2011-01-01 2012-01-01 2013-01-01 2014-01-01 \\\n",
"Country ... \n",
"United States ... 408 431 376 280 \n",
"Brazil ... 0 0 0 0 \n",
"Russia ... 89 82 115 127 \n",
"Japan ... 0 0 0 0 \n",
"Vietnam ... 0 0 0 0 \n",
"\n",
"Years 2015-01-01 2016-01-01 2017-01-01 2018-01-01 2019-01-01 \\\n",
"Country \n",
"United States 301 229 213 198 118 \n",
"Brazil 0 0 0 0 0 \n",
"Russia 237 217 260 329 189 \n",
"Japan 0 2 2 0 0 \n",
"Vietnam 0 0 1 3 0 \n",
"\n",
"Years 2020-01-01 \n",
"Country \n",
"United States 2 \n",
"Brazil 0 \n",
"Russia 16 \n",
"Japan 0 \n",
"Vietnam 0 \n",
"\n",
"[5 rows x 104 columns]"
]
},
"execution_count": 36,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"corg_clean_t.head()"
]
},
{
"cell_type": "markdown",
"id": "1b97f199",
"metadata": {},
"source": [
"Now we can add in the lat/long for each country:"
]
},
{
"cell_type": "code",
"execution_count": 37,
"id": "6d5178a1",
"metadata": {},
"outputs": [],
"source": [
"corg_clean_t['Latitude'] = lat\n",
"corg_clean_t['Longitude'] = long"
]
},
{
"cell_type": "code",
"execution_count": 38,
"id": "99b6bced",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" Years \n",
" 1917-01-01 00:00:00 \n",
" 1918-01-01 00:00:00 \n",
" 1919-01-01 00:00:00 \n",
" 1920-01-01 00:00:00 \n",
" 1921-01-01 00:00:00 \n",
" 1922-01-01 00:00:00 \n",
" 1923-01-01 00:00:00 \n",
" 1924-01-01 00:00:00 \n",
" 1925-01-01 00:00:00 \n",
" 1926-01-01 00:00:00 \n",
" ... \n",
" 2013-01-01 00:00:00 \n",
" 2014-01-01 00:00:00 \n",
" 2015-01-01 00:00:00 \n",
" 2016-01-01 00:00:00 \n",
" 2017-01-01 00:00:00 \n",
" 2018-01-01 00:00:00 \n",
" 2019-01-01 00:00:00 \n",
" 2020-01-01 00:00:00 \n",
" Latitude \n",
" Longitude \n",
" \n",
" \n",
" Country \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" United States \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" ... \n",
" 376 \n",
" 280 \n",
" 301 \n",
" 229 \n",
" 213 \n",
" 198 \n",
" 118 \n",
" 2 \n",
" 39.783730 \n",
" -100.445882 \n",
" \n",
" \n",
" Brazil \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" ... \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" -10.333333 \n",
" -53.200000 \n",
" \n",
" \n",
" Russia \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" ... \n",
" 115 \n",
" 127 \n",
" 237 \n",
" 217 \n",
" 260 \n",
" 329 \n",
" 189 \n",
" 16 \n",
" 64.686314 \n",
" 97.745306 \n",
" \n",
" \n",
" Japan \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" ... \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 2 \n",
" 2 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 36.574844 \n",
" 139.239418 \n",
" \n",
" \n",
" Vietnam \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" ... \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 1 \n",
" 3 \n",
" 0 \n",
" 0 \n",
" 15.926666 \n",
" 107.965086 \n",
" \n",
" \n",
"
\n",
"
5 rows × 106 columns
\n",
"
"
],
"text/plain": [
"Years 1917-01-01 00:00:00 1918-01-01 00:00:00 1919-01-01 00:00:00 \\\n",
"Country \n",
"United States 0 0 0 \n",
"Brazil 0 0 0 \n",
"Russia 0 0 0 \n",
"Japan 0 0 0 \n",
"Vietnam 0 0 0 \n",
"\n",
"Years 1920-01-01 00:00:00 1921-01-01 00:00:00 1922-01-01 00:00:00 \\\n",
"Country \n",
"United States 0 0 0 \n",
"Brazil 0 0 0 \n",
"Russia 0 0 0 \n",
"Japan 0 0 0 \n",
"Vietnam 0 0 0 \n",
"\n",
"Years 1923-01-01 00:00:00 1924-01-01 00:00:00 1925-01-01 00:00:00 \\\n",
"Country \n",
"United States 0 0 0 \n",
"Brazil 0 0 0 \n",
"Russia 0 0 0 \n",
"Japan 0 0 0 \n",
"Vietnam 0 0 0 \n",
"\n",
"Years 1926-01-01 00:00:00 ... 2013-01-01 00:00:00 \\\n",
"Country ... \n",
"United States 0 ... 376 \n",
"Brazil 0 ... 0 \n",
"Russia 0 ... 115 \n",
"Japan 0 ... 0 \n",
"Vietnam 0 ... 0 \n",
"\n",
"Years 2014-01-01 00:00:00 2015-01-01 00:00:00 2016-01-01 00:00:00 \\\n",
"Country \n",
"United States 280 301 229 \n",
"Brazil 0 0 0 \n",
"Russia 127 237 217 \n",
"Japan 0 0 2 \n",
"Vietnam 0 0 0 \n",
"\n",
"Years 2017-01-01 00:00:00 2018-01-01 00:00:00 2019-01-01 00:00:00 \\\n",
"Country \n",
"United States 213 198 118 \n",
"Brazil 0 0 0 \n",
"Russia 260 329 189 \n",
"Japan 2 0 0 \n",
"Vietnam 1 3 0 \n",
"\n",
"Years 2020-01-01 00:00:00 Latitude Longitude \n",
"Country \n",
"United States 2 39.783730 -100.445882 \n",
"Brazil 0 -10.333333 -53.200000 \n",
"Russia 16 64.686314 97.745306 \n",
"Japan 0 36.574844 139.239418 \n",
"Vietnam 0 15.926666 107.965086 \n",
"\n",
"[5 rows x 106 columns]"
]
},
"execution_count": 38,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"corg_clean_t.head()"
]
},
{
"cell_type": "markdown",
"id": "ffbdf81e",
"metadata": {},
"source": [
"Now (finally) we can plot those points on our map using just the lat/long:"
]
},
{
"cell_type": "code",
"execution_count": 39,
"id": "8af51649",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" Years \n",
" Country \n",
" Latitude \n",
" Longitude \n",
" \n",
" \n",
" \n",
" \n",
" 0 \n",
" United States \n",
" 39.783730 \n",
" -100.445882 \n",
" \n",
" \n",
" 1 \n",
" Brazil \n",
" -10.333333 \n",
" -53.200000 \n",
" \n",
" \n",
" 2 \n",
" Russia \n",
" 64.686314 \n",
" 97.745306 \n",
" \n",
" \n",
" 3 \n",
" Japan \n",
" 36.574844 \n",
" 139.239418 \n",
" \n",
" \n",
" 4 \n",
" Vietnam \n",
" 15.926666 \n",
" 107.965086 \n",
" \n",
" \n",
"
\n",
"
"
],
"text/plain": [
"Years Country Latitude Longitude\n",
"0 United States 39.783730 -100.445882\n",
"1 Brazil -10.333333 -53.200000\n",
"2 Russia 64.686314 97.745306\n",
"3 Japan 36.574844 139.239418\n",
"4 Vietnam 15.926666 107.965086"
]
},
"execution_count": 39,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"corg_subset = corg_clean_t.reset_index()[['Country','Latitude','Longitude']]\n",
"corg_subset.head()"
]
},
{
"cell_type": "code",
"execution_count": 40,
"id": "d2e22c68",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
""
],
"text/plain": [
"alt.LayerChart(...)"
]
},
"execution_count": 40,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"geo = alt.topo_feature(data.world_110m.url, feature='countries')\n",
"\n",
"# US states background\n",
"world = alt.Chart(geo).mark_geoshape(\n",
" fill='gray',\n",
" stroke='white'\n",
").properties(\n",
" width=800,\n",
" height=500\n",
").project('equirectangular') # note we have a few projections we can use!\n",
"\n",
"points = alt.Chart(corg_subset).mark_circle().encode(\n",
" longitude='Longitude:Q',\n",
" latitude='Latitude:Q',\n",
" size=alt.value(100),\n",
" tooltip='Country'\n",
")\n",
"\n",
"world + points"
]
},
{
"cell_type": "markdown",
"id": "3f31fdc4",
"metadata": {},
"source": [
"We probably want the size of the dot to be proportional to the actual number of corgis born."
]
},
{
"cell_type": "markdown",
"id": "d51446fe",
"metadata": {},
"source": [
"To start, let's just plot the total corgis born in a country:"
]
},
{
"cell_type": "code",
"execution_count": 41,
"id": "04940dd3",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" Years \n",
" 1917-01-01 \n",
" 1918-01-01 \n",
" 1919-01-01 \n",
" 1920-01-01 \n",
" 1921-01-01 \n",
" 1922-01-01 \n",
" 1923-01-01 \n",
" 1924-01-01 \n",
" 1925-01-01 \n",
" 1926-01-01 \n",
" ... \n",
" 2011-01-01 \n",
" 2012-01-01 \n",
" 2013-01-01 \n",
" 2014-01-01 \n",
" 2015-01-01 \n",
" 2016-01-01 \n",
" 2017-01-01 \n",
" 2018-01-01 \n",
" 2019-01-01 \n",
" 2020-01-01 \n",
" \n",
" \n",
" Country \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" United States \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" ... \n",
" 408 \n",
" 431 \n",
" 376 \n",
" 280 \n",
" 301 \n",
" 229 \n",
" 213 \n",
" 198 \n",
" 118 \n",
" 2 \n",
" \n",
" \n",
" Brazil \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" ... \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" \n",
" \n",
" Russia \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" ... \n",
" 89 \n",
" 82 \n",
" 115 \n",
" 127 \n",
" 237 \n",
" 217 \n",
" 260 \n",
" 329 \n",
" 189 \n",
" 16 \n",
" \n",
" \n",
" Japan \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" ... \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 2 \n",
" 2 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" \n",
" \n",
" Vietnam \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" ... \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 1 \n",
" 3 \n",
" 0 \n",
" 0 \n",
" \n",
" \n",
"
\n",
"
5 rows × 104 columns
\n",
"
"
],
"text/plain": [
"Years 1917-01-01 1918-01-01 1919-01-01 1920-01-01 1921-01-01 \\\n",
"Country \n",
"United States 0 0 0 0 0 \n",
"Brazil 0 0 0 0 0 \n",
"Russia 0 0 0 0 0 \n",
"Japan 0 0 0 0 0 \n",
"Vietnam 0 0 0 0 0 \n",
"\n",
"Years 1922-01-01 1923-01-01 1924-01-01 1925-01-01 1926-01-01 \\\n",
"Country \n",
"United States 0 0 0 0 0 \n",
"Brazil 0 0 0 0 0 \n",
"Russia 0 0 0 0 0 \n",
"Japan 0 0 0 0 0 \n",
"Vietnam 0 0 0 0 0 \n",
"\n",
"Years ... 2011-01-01 2012-01-01 2013-01-01 2014-01-01 \\\n",
"Country ... \n",
"United States ... 408 431 376 280 \n",
"Brazil ... 0 0 0 0 \n",
"Russia ... 89 82 115 127 \n",
"Japan ... 0 0 0 0 \n",
"Vietnam ... 0 0 0 0 \n",
"\n",
"Years 2015-01-01 2016-01-01 2017-01-01 2018-01-01 2019-01-01 \\\n",
"Country \n",
"United States 301 229 213 198 118 \n",
"Brazil 0 0 0 0 0 \n",
"Russia 237 217 260 329 189 \n",
"Japan 0 2 2 0 0 \n",
"Vietnam 0 0 1 3 0 \n",
"\n",
"Years 2020-01-01 \n",
"Country \n",
"United States 2 \n",
"Brazil 0 \n",
"Russia 16 \n",
"Japan 0 \n",
"Vietnam 0 \n",
"\n",
"[5 rows x 104 columns]"
]
},
"execution_count": 41,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"corg_clean_t.loc[:, (corg_clean_t.columns != 'Longitude') & (corg_clean_t.columns != \"Latitude\")].head()"
]
},
{
"cell_type": "markdown",
"id": "8812a49a",
"metadata": {},
"source": [
"Now we can sum along the Country axis:"
]
},
{
"cell_type": "code",
"execution_count": 42,
"id": "f5aa8ea0",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Country\n",
"United States 16130\n",
"Brazil 1\n",
"Russia 1834\n",
"Japan 7\n",
"Vietnam 4\n",
"Germany 892\n",
"France 597\n",
"United Kingdom 2649\n",
"Italy 106\n",
"South Africa 28\n",
"Ukraine 19\n",
"Spain 5\n",
"Poland 859\n",
"Canada 391\n",
"Korea, North 5\n",
"Romania 7\n",
"Australia 891\n",
"Portugal 7\n",
"Belgium 110\n",
"Czech Republic 271\n",
"Hungary 8\n",
"Belarus 38\n",
"Sweden 2008\n",
"Austria 29\n",
"Switzerland 30\n",
"Israel 32\n",
"Serbia 2\n",
"Denmark 2176\n",
"Finland 4051\n",
"Norway 1077\n",
"Croatia 21\n",
"New Zealand 1097\n",
"Ireland 29\n",
"Lithuania 29\n",
"Uruguay 2\n",
"Latvia 3\n",
"Slovenia 1\n",
"Estonia 97\n",
"Netherlands 536\n",
"Kosovo 649\n",
"dtype: int64"
]
},
"execution_count": 42,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"corg_clean_t.loc[:, (corg_clean_t.columns != 'Longitude') & (corg_clean_t.columns != \"Latitude\")].sum(axis=1)"
]
},
{
"cell_type": "code",
"execution_count": 43,
"id": "ceffa58a",
"metadata": {},
"outputs": [],
"source": [
"corg_clean_t['Total Corg'] = corg_clean_t.loc[:, (corg_clean_t.columns != 'Longitude') & (corg_clean_t.columns != \"Latitude\")].sum(axis=1).values"
]
},
{
"cell_type": "code",
"execution_count": 44,
"id": "0e8c048e",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" Years \n",
" 1917-01-01 00:00:00 \n",
" 1918-01-01 00:00:00 \n",
" 1919-01-01 00:00:00 \n",
" 1920-01-01 00:00:00 \n",
" 1921-01-01 00:00:00 \n",
" 1922-01-01 00:00:00 \n",
" 1923-01-01 00:00:00 \n",
" 1924-01-01 00:00:00 \n",
" 1925-01-01 00:00:00 \n",
" 1926-01-01 00:00:00 \n",
" ... \n",
" 2014-01-01 00:00:00 \n",
" 2015-01-01 00:00:00 \n",
" 2016-01-01 00:00:00 \n",
" 2017-01-01 00:00:00 \n",
" 2018-01-01 00:00:00 \n",
" 2019-01-01 00:00:00 \n",
" 2020-01-01 00:00:00 \n",
" Latitude \n",
" Longitude \n",
" Total Corg \n",
" \n",
" \n",
" Country \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" United States \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" ... \n",
" 280 \n",
" 301 \n",
" 229 \n",
" 213 \n",
" 198 \n",
" 118 \n",
" 2 \n",
" 39.783730 \n",
" -100.445882 \n",
" 16130 \n",
" \n",
" \n",
" Brazil \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" ... \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" -10.333333 \n",
" -53.200000 \n",
" 1 \n",
" \n",
" \n",
" Russia \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" ... \n",
" 127 \n",
" 237 \n",
" 217 \n",
" 260 \n",
" 329 \n",
" 189 \n",
" 16 \n",
" 64.686314 \n",
" 97.745306 \n",
" 1834 \n",
" \n",
" \n",
" Japan \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" ... \n",
" 0 \n",
" 0 \n",
" 2 \n",
" 2 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 36.574844 \n",
" 139.239418 \n",
" 7 \n",
" \n",
" \n",
" Vietnam \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" ... \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 1 \n",
" 3 \n",
" 0 \n",
" 0 \n",
" 15.926666 \n",
" 107.965086 \n",
" 4 \n",
" \n",
" \n",
"
\n",
"
5 rows × 107 columns
\n",
"
"
],
"text/plain": [
"Years 1917-01-01 00:00:00 1918-01-01 00:00:00 1919-01-01 00:00:00 \\\n",
"Country \n",
"United States 0 0 0 \n",
"Brazil 0 0 0 \n",
"Russia 0 0 0 \n",
"Japan 0 0 0 \n",
"Vietnam 0 0 0 \n",
"\n",
"Years 1920-01-01 00:00:00 1921-01-01 00:00:00 1922-01-01 00:00:00 \\\n",
"Country \n",
"United States 0 0 0 \n",
"Brazil 0 0 0 \n",
"Russia 0 0 0 \n",
"Japan 0 0 0 \n",
"Vietnam 0 0 0 \n",
"\n",
"Years 1923-01-01 00:00:00 1924-01-01 00:00:00 1925-01-01 00:00:00 \\\n",
"Country \n",
"United States 0 0 0 \n",
"Brazil 0 0 0 \n",
"Russia 0 0 0 \n",
"Japan 0 0 0 \n",
"Vietnam 0 0 0 \n",
"\n",
"Years 1926-01-01 00:00:00 ... 2014-01-01 00:00:00 \\\n",
"Country ... \n",
"United States 0 ... 280 \n",
"Brazil 0 ... 0 \n",
"Russia 0 ... 127 \n",
"Japan 0 ... 0 \n",
"Vietnam 0 ... 0 \n",
"\n",
"Years 2015-01-01 00:00:00 2016-01-01 00:00:00 2017-01-01 00:00:00 \\\n",
"Country \n",
"United States 301 229 213 \n",
"Brazil 0 0 0 \n",
"Russia 237 217 260 \n",
"Japan 0 2 2 \n",
"Vietnam 0 0 1 \n",
"\n",
"Years 2018-01-01 00:00:00 2019-01-01 00:00:00 2020-01-01 00:00:00 \\\n",
"Country \n",
"United States 198 118 2 \n",
"Brazil 0 0 0 \n",
"Russia 329 189 16 \n",
"Japan 0 0 0 \n",
"Vietnam 3 0 0 \n",
"\n",
"Years Latitude Longitude Total Corg \n",
"Country \n",
"United States 39.783730 -100.445882 16130 \n",
"Brazil -10.333333 -53.200000 1 \n",
"Russia 64.686314 97.745306 1834 \n",
"Japan 36.574844 139.239418 7 \n",
"Vietnam 15.926666 107.965086 4 \n",
"\n",
"[5 rows x 107 columns]"
]
},
"execution_count": 44,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"corg_clean_t.head()"
]
},
{
"cell_type": "markdown",
"id": "a561485c",
"metadata": {},
"source": [
"Let's again subset our data:"
]
},
{
"cell_type": "code",
"execution_count": 45,
"id": "a0e18e70",
"metadata": {},
"outputs": [],
"source": [
"corg_subset2 = corg_clean_t.reset_index()[['Country','Latitude','Longitude','Total Corg']]"
]
},
{
"cell_type": "code",
"execution_count": 46,
"id": "6ea03e09",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
""
],
"text/plain": [
"alt.LayerChart(...)"
]
},
"execution_count": 46,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"geo = alt.topo_feature(data.world_110m.url, feature='countries')\n",
"\n",
"# US states background\n",
"world = alt.Chart(geo).mark_geoshape(\n",
" fill='gray',\n",
" stroke='white'\n",
").properties(\n",
" width=800,\n",
" height=500\n",
").project('equirectangular') # note we have a few projections we can use!\n",
"\n",
"points = alt.Chart(corg_subset2).mark_circle().encode(\n",
" longitude='Longitude:Q',\n",
" latitude='Latitude:Q',\n",
" size=alt.Size('Total Corg:Q'),\n",
" tooltip='Country',\n",
")\n",
"\n",
"world + points"
]
},
{
"cell_type": "markdown",
"id": "395f4c9e",
"metadata": {},
"source": [
"By default, we see that there is some scale applied to the points where there are some that are very small. Let's see if we can mess with this:"
]
},
{
"cell_type": "code",
"execution_count": 47,
"id": "04936076",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
""
],
"text/plain": [
"alt.LayerChart(...)"
]
},
"execution_count": 47,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"geo = alt.topo_feature(data.world_110m.url, feature='countries')\n",
"\n",
"# US states background\n",
"world = alt.Chart(geo).mark_geoshape(\n",
" fill='gray',\n",
" stroke='white'\n",
").properties(\n",
" width=800,\n",
" height=500\n",
").project('equirectangular') # note we have a few projections we can use!\n",
"\n",
"points = alt.Chart(corg_subset2).mark_circle().encode(\n",
" longitude='Longitude:Q',\n",
" latitude='Latitude:Q',\n",
" size=alt.Size('Total Corg:Q',scale=None),\n",
" tooltip='Country',\n",
")\n",
"\n",
"world + points"
]
},
{
"cell_type": "markdown",
"id": "2a3e1a96",
"metadata": {},
"source": [
"Whoa! Now that seems to have gone the other way! Now we have so many overlapping points that its hard to see in \"high corgi\" areas like the UK. Let's try a log scale:"
]
},
{
"cell_type": "code",
"execution_count": 48,
"id": "33af2154",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
""
],
"text/plain": [
"alt.LayerChart(...)"
]
},
"execution_count": 48,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"geo = alt.topo_feature(data.world_110m.url, feature='countries')\n",
"\n",
"# US states background\n",
"world = alt.Chart(geo).mark_geoshape(\n",
" fill='gray',\n",
" stroke='white'\n",
").properties(\n",
" width=800,\n",
" height=500\n",
").project('equirectangular') # note we have a few projections we can use!\n",
"\n",
"points = alt.Chart(corg_subset2).mark_circle().encode(\n",
" longitude='Longitude:Q',\n",
" latitude='Latitude:Q',\n",
" size=alt.Size('Total Corg:Q',scale=alt.Scale(type='log')),\n",
" tooltip='Country',\n",
")\n",
"\n",
"world + points"
]
},
{
"cell_type": "markdown",
"id": "7eba854d",
"metadata": {},
"source": [
"Hey that looks better! Though, that blue is a bit hard to see, let's try messing with the colors:"
]
},
{
"cell_type": "code",
"execution_count": 49,
"id": "823bb08d",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
""
],
"text/plain": [
"alt.LayerChart(...)"
]
},
"execution_count": 49,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"geo = alt.topo_feature(data.world_110m.url, feature='countries')\n",
"\n",
"# US states background\n",
"world = alt.Chart(geo).mark_geoshape(\n",
" fill='lightgray',\n",
" stroke='white'\n",
").properties(\n",
" width=800,\n",
" height=500\n",
").project('equirectangular') # note we have a few projections we can use!\n",
"\n",
"points = alt.Chart(corg_subset2).mark_circle().encode(\n",
" longitude='Longitude:Q',\n",
" latitude='Latitude:Q',\n",
" size=alt.Size('Total Corg:Q',scale=alt.Scale(type='log')),\n",
" tooltip='Country',\n",
")\n",
"\n",
"world + points"
]
},
{
"cell_type": "markdown",
"id": "1745af2d",
"metadata": {},
"source": [
"Neat! Now, wouldn't it be fun if we could see those data points \"grow\" over time. Let's look at our dataframe again:"
]
},
{
"cell_type": "code",
"execution_count": 50,
"id": "8c674e4f",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" Years \n",
" 1917-01-01 00:00:00 \n",
" 1918-01-01 00:00:00 \n",
" 1919-01-01 00:00:00 \n",
" 1920-01-01 00:00:00 \n",
" 1921-01-01 00:00:00 \n",
" 1922-01-01 00:00:00 \n",
" 1923-01-01 00:00:00 \n",
" 1924-01-01 00:00:00 \n",
" 1925-01-01 00:00:00 \n",
" 1926-01-01 00:00:00 \n",
" ... \n",
" 2014-01-01 00:00:00 \n",
" 2015-01-01 00:00:00 \n",
" 2016-01-01 00:00:00 \n",
" 2017-01-01 00:00:00 \n",
" 2018-01-01 00:00:00 \n",
" 2019-01-01 00:00:00 \n",
" 2020-01-01 00:00:00 \n",
" Latitude \n",
" Longitude \n",
" Total Corg \n",
" \n",
" \n",
" Country \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" United States \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" ... \n",
" 280 \n",
" 301 \n",
" 229 \n",
" 213 \n",
" 198 \n",
" 118 \n",
" 2 \n",
" 39.783730 \n",
" -100.445882 \n",
" 16130 \n",
" \n",
" \n",
" Brazil \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" ... \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" -10.333333 \n",
" -53.200000 \n",
" 1 \n",
" \n",
" \n",
" Russia \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" ... \n",
" 127 \n",
" 237 \n",
" 217 \n",
" 260 \n",
" 329 \n",
" 189 \n",
" 16 \n",
" 64.686314 \n",
" 97.745306 \n",
" 1834 \n",
" \n",
" \n",
" Japan \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" ... \n",
" 0 \n",
" 0 \n",
" 2 \n",
" 2 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 36.574844 \n",
" 139.239418 \n",
" 7 \n",
" \n",
" \n",
" Vietnam \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" ... \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 1 \n",
" 3 \n",
" 0 \n",
" 0 \n",
" 15.926666 \n",
" 107.965086 \n",
" 4 \n",
" \n",
" \n",
"
\n",
"
5 rows × 107 columns
\n",
"
"
],
"text/plain": [
"Years 1917-01-01 00:00:00 1918-01-01 00:00:00 1919-01-01 00:00:00 \\\n",
"Country \n",
"United States 0 0 0 \n",
"Brazil 0 0 0 \n",
"Russia 0 0 0 \n",
"Japan 0 0 0 \n",
"Vietnam 0 0 0 \n",
"\n",
"Years 1920-01-01 00:00:00 1921-01-01 00:00:00 1922-01-01 00:00:00 \\\n",
"Country \n",
"United States 0 0 0 \n",
"Brazil 0 0 0 \n",
"Russia 0 0 0 \n",
"Japan 0 0 0 \n",
"Vietnam 0 0 0 \n",
"\n",
"Years 1923-01-01 00:00:00 1924-01-01 00:00:00 1925-01-01 00:00:00 \\\n",
"Country \n",
"United States 0 0 0 \n",
"Brazil 0 0 0 \n",
"Russia 0 0 0 \n",
"Japan 0 0 0 \n",
"Vietnam 0 0 0 \n",
"\n",
"Years 1926-01-01 00:00:00 ... 2014-01-01 00:00:00 \\\n",
"Country ... \n",
"United States 0 ... 280 \n",
"Brazil 0 ... 0 \n",
"Russia 0 ... 127 \n",
"Japan 0 ... 0 \n",
"Vietnam 0 ... 0 \n",
"\n",
"Years 2015-01-01 00:00:00 2016-01-01 00:00:00 2017-01-01 00:00:00 \\\n",
"Country \n",
"United States 301 229 213 \n",
"Brazil 0 0 0 \n",
"Russia 237 217 260 \n",
"Japan 0 2 2 \n",
"Vietnam 0 0 1 \n",
"\n",
"Years 2018-01-01 00:00:00 2019-01-01 00:00:00 2020-01-01 00:00:00 \\\n",
"Country \n",
"United States 198 118 2 \n",
"Brazil 0 0 0 \n",
"Russia 329 189 16 \n",
"Japan 0 0 0 \n",
"Vietnam 3 0 0 \n",
"\n",
"Years Latitude Longitude Total Corg \n",
"Country \n",
"United States 39.783730 -100.445882 16130 \n",
"Brazil -10.333333 -53.200000 1 \n",
"Russia 64.686314 97.745306 1834 \n",
"Japan 36.574844 139.239418 7 \n",
"Vietnam 15.926666 107.965086 4 \n",
"\n",
"[5 rows x 107 columns]"
]
},
"execution_count": 50,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"corg_clean_t.head()"
]
},
{
"cell_type": "markdown",
"id": "91d03087",
"metadata": {},
"source": [
"Instead of total corgs overall, what we really want is total corgs over time. Luckily, our columns are in the correct date order:"
]
},
{
"cell_type": "code",
"execution_count": 51,
"id": "4b0fc8b4",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" Years \n",
" 1917-01-01 00:00:00 \n",
" 1918-01-01 00:00:00 \n",
" 1919-01-01 00:00:00 \n",
" 1920-01-01 00:00:00 \n",
" 1921-01-01 00:00:00 \n",
" 1922-01-01 00:00:00 \n",
" 1923-01-01 00:00:00 \n",
" 1924-01-01 00:00:00 \n",
" 1925-01-01 00:00:00 \n",
" 1926-01-01 00:00:00 \n",
" ... \n",
" 2013-01-01 00:00:00 \n",
" 2014-01-01 00:00:00 \n",
" 2015-01-01 00:00:00 \n",
" 2016-01-01 00:00:00 \n",
" 2017-01-01 00:00:00 \n",
" 2018-01-01 00:00:00 \n",
" 2019-01-01 00:00:00 \n",
" 2020-01-01 00:00:00 \n",
" Latitude \n",
" Longitude \n",
" \n",
" \n",
" Country \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" United States \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" ... \n",
" 376 \n",
" 280 \n",
" 301 \n",
" 229 \n",
" 213 \n",
" 198 \n",
" 118 \n",
" 2 \n",
" 39.783730 \n",
" -100.445882 \n",
" \n",
" \n",
" Brazil \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" ... \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" -10.333333 \n",
" -53.200000 \n",
" \n",
" \n",
" Russia \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" ... \n",
" 115 \n",
" 127 \n",
" 237 \n",
" 217 \n",
" 260 \n",
" 329 \n",
" 189 \n",
" 16 \n",
" 64.686314 \n",
" 97.745306 \n",
" \n",
" \n",
" Japan \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" ... \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 2 \n",
" 2 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 36.574844 \n",
" 139.239418 \n",
" \n",
" \n",
" Vietnam \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" ... \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 1 \n",
" 3 \n",
" 0 \n",
" 0 \n",
" 15.926666 \n",
" 107.965086 \n",
" \n",
" \n",
"
\n",
"
5 rows × 106 columns
\n",
"
"
],
"text/plain": [
"Years 1917-01-01 00:00:00 1918-01-01 00:00:00 1919-01-01 00:00:00 \\\n",
"Country \n",
"United States 0 0 0 \n",
"Brazil 0 0 0 \n",
"Russia 0 0 0 \n",
"Japan 0 0 0 \n",
"Vietnam 0 0 0 \n",
"\n",
"Years 1920-01-01 00:00:00 1921-01-01 00:00:00 1922-01-01 00:00:00 \\\n",
"Country \n",
"United States 0 0 0 \n",
"Brazil 0 0 0 \n",
"Russia 0 0 0 \n",
"Japan 0 0 0 \n",
"Vietnam 0 0 0 \n",
"\n",
"Years 1923-01-01 00:00:00 1924-01-01 00:00:00 1925-01-01 00:00:00 \\\n",
"Country \n",
"United States 0 0 0 \n",
"Brazil 0 0 0 \n",
"Russia 0 0 0 \n",
"Japan 0 0 0 \n",
"Vietnam 0 0 0 \n",
"\n",
"Years 1926-01-01 00:00:00 ... 2013-01-01 00:00:00 \\\n",
"Country ... \n",
"United States 0 ... 376 \n",
"Brazil 0 ... 0 \n",
"Russia 0 ... 115 \n",
"Japan 0 ... 0 \n",
"Vietnam 0 ... 0 \n",
"\n",
"Years 2014-01-01 00:00:00 2015-01-01 00:00:00 2016-01-01 00:00:00 \\\n",
"Country \n",
"United States 280 301 229 \n",
"Brazil 0 0 0 \n",
"Russia 127 237 217 \n",
"Japan 0 0 2 \n",
"Vietnam 0 0 0 \n",
"\n",
"Years 2017-01-01 00:00:00 2018-01-01 00:00:00 2019-01-01 00:00:00 \\\n",
"Country \n",
"United States 213 198 118 \n",
"Brazil 0 0 0 \n",
"Russia 260 329 189 \n",
"Japan 2 0 0 \n",
"Vietnam 1 3 0 \n",
"\n",
"Years 2020-01-01 00:00:00 Latitude Longitude \n",
"Country \n",
"United States 2 39.783730 -100.445882 \n",
"Brazil 0 -10.333333 -53.200000 \n",
"Russia 16 64.686314 97.745306 \n",
"Japan 0 36.574844 139.239418 \n",
"Vietnam 0 15.926666 107.965086 \n",
"\n",
"[5 rows x 106 columns]"
]
},
"execution_count": 51,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"corg_clean2 = corg_clean_t.loc[:,(corg_clean_t.columns != 'Total Corg')].copy()\n",
"corg_clean2.head()"
]
},
{
"cell_type": "code",
"execution_count": 52,
"id": "040a6c82",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" Years \n",
" 1917-01-01 \n",
" 1918-01-01 \n",
" 1919-01-01 \n",
" 1920-01-01 \n",
" 1921-01-01 \n",
" 1922-01-01 \n",
" 1923-01-01 \n",
" 1924-01-01 \n",
" 1925-01-01 \n",
" 1926-01-01 \n",
" ... \n",
" 2011-01-01 \n",
" 2012-01-01 \n",
" 2013-01-01 \n",
" 2014-01-01 \n",
" 2015-01-01 \n",
" 2016-01-01 \n",
" 2017-01-01 \n",
" 2018-01-01 \n",
" 2019-01-01 \n",
" 2020-01-01 \n",
" \n",
" \n",
" Country \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" United States \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" ... \n",
" 408 \n",
" 431 \n",
" 376 \n",
" 280 \n",
" 301 \n",
" 229 \n",
" 213 \n",
" 198 \n",
" 118 \n",
" 2 \n",
" \n",
" \n",
" Brazil \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" ... \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" \n",
" \n",
" Russia \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" ... \n",
" 89 \n",
" 82 \n",
" 115 \n",
" 127 \n",
" 237 \n",
" 217 \n",
" 260 \n",
" 329 \n",
" 189 \n",
" 16 \n",
" \n",
" \n",
" Japan \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" ... \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 2 \n",
" 2 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" \n",
" \n",
" Vietnam \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" ... \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 1 \n",
" 3 \n",
" 0 \n",
" 0 \n",
" \n",
" \n",
"
\n",
"
5 rows × 104 columns
\n",
"
"
],
"text/plain": [
"Years 1917-01-01 1918-01-01 1919-01-01 1920-01-01 1921-01-01 \\\n",
"Country \n",
"United States 0 0 0 0 0 \n",
"Brazil 0 0 0 0 0 \n",
"Russia 0 0 0 0 0 \n",
"Japan 0 0 0 0 0 \n",
"Vietnam 0 0 0 0 0 \n",
"\n",
"Years 1922-01-01 1923-01-01 1924-01-01 1925-01-01 1926-01-01 \\\n",
"Country \n",
"United States 0 0 0 0 0 \n",
"Brazil 0 0 0 0 0 \n",
"Russia 0 0 0 0 0 \n",
"Japan 0 0 0 0 0 \n",
"Vietnam 0 0 0 0 0 \n",
"\n",
"Years ... 2011-01-01 2012-01-01 2013-01-01 2014-01-01 \\\n",
"Country ... \n",
"United States ... 408 431 376 280 \n",
"Brazil ... 0 0 0 0 \n",
"Russia ... 89 82 115 127 \n",
"Japan ... 0 0 0 0 \n",
"Vietnam ... 0 0 0 0 \n",
"\n",
"Years 2015-01-01 2016-01-01 2017-01-01 2018-01-01 2019-01-01 \\\n",
"Country \n",
"United States 301 229 213 198 118 \n",
"Brazil 0 0 0 0 0 \n",
"Russia 237 217 260 329 189 \n",
"Japan 0 2 2 0 0 \n",
"Vietnam 0 0 1 3 0 \n",
"\n",
"Years 2020-01-01 \n",
"Country \n",
"United States 2 \n",
"Brazil 0 \n",
"Russia 16 \n",
"Japan 0 \n",
"Vietnam 0 \n",
"\n",
"[5 rows x 104 columns]"
]
},
"execution_count": 52,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"corg_clean2.loc[:,(corg_clean2.columns != 'Latitude') & (corg_clean2.columns!='Longitude')].head()"
]
},
{
"cell_type": "code",
"execution_count": 53,
"id": "a242c754",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" Years \n",
" 1917-01-01 \n",
" 1918-01-01 \n",
" 1919-01-01 \n",
" 1920-01-01 \n",
" 1921-01-01 \n",
" 1922-01-01 \n",
" 1923-01-01 \n",
" 1924-01-01 \n",
" 1925-01-01 \n",
" 1926-01-01 \n",
" ... \n",
" 2011-01-01 \n",
" 2012-01-01 \n",
" 2013-01-01 \n",
" 2014-01-01 \n",
" 2015-01-01 \n",
" 2016-01-01 \n",
" 2017-01-01 \n",
" 2018-01-01 \n",
" 2019-01-01 \n",
" 2020-01-01 \n",
" \n",
" \n",
" Country \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" United States \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" ... \n",
" 13982 \n",
" 14413 \n",
" 14789 \n",
" 15069 \n",
" 15370 \n",
" 15599 \n",
" 15812 \n",
" 16010 \n",
" 16128 \n",
" 16130 \n",
" \n",
" \n",
" Brazil \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" ... \n",
" 1 \n",
" 1 \n",
" 1 \n",
" 1 \n",
" 1 \n",
" 1 \n",
" 1 \n",
" 1 \n",
" 1 \n",
" 1 \n",
" \n",
" \n",
" Russia \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" ... \n",
" 262 \n",
" 344 \n",
" 459 \n",
" 586 \n",
" 823 \n",
" 1040 \n",
" 1300 \n",
" 1629 \n",
" 1818 \n",
" 1834 \n",
" \n",
" \n",
" Japan \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" ... \n",
" 3 \n",
" 3 \n",
" 3 \n",
" 3 \n",
" 3 \n",
" 5 \n",
" 7 \n",
" 7 \n",
" 7 \n",
" 7 \n",
" \n",
" \n",
" Vietnam \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" ... \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 1 \n",
" 4 \n",
" 4 \n",
" 4 \n",
" \n",
" \n",
" Germany \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" ... \n",
" 428 \n",
" 508 \n",
" 589 \n",
" 666 \n",
" 738 \n",
" 792 \n",
" 820 \n",
" 862 \n",
" 892 \n",
" 892 \n",
" \n",
" \n",
" France \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" ... \n",
" 276 \n",
" 302 \n",
" 338 \n",
" 375 \n",
" 395 \n",
" 450 \n",
" 510 \n",
" 535 \n",
" 597 \n",
" 597 \n",
" \n",
" \n",
" United Kingdom \n",
" 1 \n",
" 1 \n",
" 1 \n",
" 1 \n",
" 1 \n",
" 1 \n",
" 3 \n",
" 5 \n",
" 19 \n",
" 31 \n",
" ... \n",
" 2509 \n",
" 2531 \n",
" 2557 \n",
" 2579 \n",
" 2605 \n",
" 2633 \n",
" 2643 \n",
" 2648 \n",
" 2649 \n",
" 2649 \n",
" \n",
" \n",
" Italy \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" ... \n",
" 81 \n",
" 81 \n",
" 81 \n",
" 94 \n",
" 96 \n",
" 98 \n",
" 102 \n",
" 104 \n",
" 106 \n",
" 106 \n",
" \n",
" \n",
" South Africa \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" ... \n",
" 19 \n",
" 23 \n",
" 24 \n",
" 28 \n",
" 28 \n",
" 28 \n",
" 28 \n",
" 28 \n",
" 28 \n",
" 28 \n",
" \n",
" \n",
" Ukraine \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" ... \n",
" 8 \n",
" 9 \n",
" 10 \n",
" 10 \n",
" 12 \n",
" 12 \n",
" 12 \n",
" 18 \n",
" 19 \n",
" 19 \n",
" \n",
" \n",
" Spain \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" ... \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 5 \n",
" 5 \n",
" \n",
" \n",
" Poland \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" ... \n",
" 337 \n",
" 379 \n",
" 460 \n",
" 526 \n",
" 617 \n",
" 692 \n",
" 748 \n",
" 825 \n",
" 859 \n",
" 859 \n",
" \n",
" \n",
" Canada \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" ... \n",
" 329 \n",
" 344 \n",
" 359 \n",
" 364 \n",
" 377 \n",
" 383 \n",
" 386 \n",
" 390 \n",
" 391 \n",
" 391 \n",
" \n",
" \n",
" Korea, North \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" ... \n",
" 0 \n",
" 0 \n",
" 3 \n",
" 4 \n",
" 4 \n",
" 4 \n",
" 5 \n",
" 5 \n",
" 5 \n",
" 5 \n",
" \n",
" \n",
" Romania \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" ... \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 7 \n",
" 7 \n",
" \n",
" \n",
" Australia \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" ... \n",
" 814 \n",
" 827 \n",
" 844 \n",
" 855 \n",
" 861 \n",
" 870 \n",
" 879 \n",
" 887 \n",
" 891 \n",
" 891 \n",
" \n",
" \n",
" Portugal \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" ... \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 1 \n",
" 2 \n",
" 3 \n",
" 3 \n",
" 5 \n",
" 7 \n",
" 7 \n",
" \n",
" \n",
" Belgium \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" ... \n",
" 63 \n",
" 79 \n",
" 94 \n",
" 96 \n",
" 98 \n",
" 104 \n",
" 105 \n",
" 107 \n",
" 110 \n",
" 110 \n",
" \n",
" \n",
" Czech Republic \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" ... \n",
" 139 \n",
" 150 \n",
" 171 \n",
" 188 \n",
" 210 \n",
" 233 \n",
" 255 \n",
" 264 \n",
" 271 \n",
" 271 \n",
" \n",
" \n",
" Hungary \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" ... \n",
" 7 \n",
" 7 \n",
" 7 \n",
" 8 \n",
" 8 \n",
" 8 \n",
" 8 \n",
" 8 \n",
" 8 \n",
" 8 \n",
" \n",
" \n",
" Belarus \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" ... \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 5 \n",
" 12 \n",
" 21 \n",
" 38 \n",
" 38 \n",
" \n",
" \n",
" Sweden \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" ... \n",
" 1214 \n",
" 1342 \n",
" 1461 \n",
" 1525 \n",
" 1643 \n",
" 1791 \n",
" 1928 \n",
" 1988 \n",
" 2007 \n",
" 2008 \n",
" \n",
" \n",
" Austria \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" ... \n",
" 21 \n",
" 22 \n",
" 22 \n",
" 29 \n",
" 29 \n",
" 29 \n",
" 29 \n",
" 29 \n",
" 29 \n",
" 29 \n",
" \n",
" \n",
" Switzerland \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" ... \n",
" 28 \n",
" 29 \n",
" 29 \n",
" 29 \n",
" 30 \n",
" 30 \n",
" 30 \n",
" 30 \n",
" 30 \n",
" 30 \n",
" \n",
" \n",
" Israel \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" ... \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 8 \n",
" 13 \n",
" 17 \n",
" 21 \n",
" 25 \n",
" 32 \n",
" 32 \n",
" \n",
" \n",
" Serbia \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" ... \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 1 \n",
" 1 \n",
" 2 \n",
" 2 \n",
" 2 \n",
" 2 \n",
" \n",
" \n",
" Denmark \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" ... \n",
" 1549 \n",
" 1653 \n",
" 1760 \n",
" 1856 \n",
" 1967 \n",
" 2045 \n",
" 2115 \n",
" 2148 \n",
" 2176 \n",
" 2176 \n",
" \n",
" \n",
" Finland \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" ... \n",
" 2824 \n",
" 2967 \n",
" 3110 \n",
" 3308 \n",
" 3509 \n",
" 3695 \n",
" 3961 \n",
" 4044 \n",
" 4048 \n",
" 4051 \n",
" \n",
" \n",
" Norway \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" ... \n",
" 434 \n",
" 503 \n",
" 570 \n",
" 669 \n",
" 770 \n",
" 873 \n",
" 970 \n",
" 1056 \n",
" 1077 \n",
" 1077 \n",
" \n",
" \n",
" Croatia \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" ... \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 3 \n",
" 11 \n",
" 15 \n",
" 21 \n",
" 21 \n",
" 21 \n",
" \n",
" \n",
" New Zealand \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" ... \n",
" 1080 \n",
" 1080 \n",
" 1082 \n",
" 1083 \n",
" 1088 \n",
" 1091 \n",
" 1094 \n",
" 1097 \n",
" 1097 \n",
" 1097 \n",
" \n",
" \n",
" Ireland \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" ... \n",
" 29 \n",
" 29 \n",
" 29 \n",
" 29 \n",
" 29 \n",
" 29 \n",
" 29 \n",
" 29 \n",
" 29 \n",
" 29 \n",
" \n",
" \n",
" Lithuania \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" ... \n",
" 5 \n",
" 5 \n",
" 20 \n",
" 20 \n",
" 20 \n",
" 29 \n",
" 29 \n",
" 29 \n",
" 29 \n",
" 29 \n",
" \n",
" \n",
" Uruguay \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" ... \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 2 \n",
" 2 \n",
" 2 \n",
" 2 \n",
" \n",
" \n",
" Latvia \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" ... \n",
" 3 \n",
" 3 \n",
" 3 \n",
" 3 \n",
" 3 \n",
" 3 \n",
" 3 \n",
" 3 \n",
" 3 \n",
" 3 \n",
" \n",
" \n",
" Slovenia \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" ... \n",
" 1 \n",
" 1 \n",
" 1 \n",
" 1 \n",
" 1 \n",
" 1 \n",
" 1 \n",
" 1 \n",
" 1 \n",
" 1 \n",
" \n",
" \n",
" Estonia \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" ... \n",
" 11 \n",
" 18 \n",
" 18 \n",
" 24 \n",
" 30 \n",
" 57 \n",
" 65 \n",
" 81 \n",
" 97 \n",
" 97 \n",
" \n",
" \n",
" Netherlands \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" ... \n",
" 323 \n",
" 375 \n",
" 417 \n",
" 448 \n",
" 470 \n",
" 487 \n",
" 512 \n",
" 524 \n",
" 536 \n",
" 536 \n",
" \n",
" \n",
" Kosovo \n",
" 0 \n",
" 0 \n",
" 1 \n",
" 1 \n",
" 1 \n",
" 1 \n",
" 1 \n",
" 1 \n",
" 1 \n",
" 1 \n",
" ... \n",
" 185 \n",
" 204 \n",
" 251 \n",
" 288 \n",
" 352 \n",
" 424 \n",
" 525 \n",
" 581 \n",
" 632 \n",
" 649 \n",
" \n",
" \n",
"
\n",
"
40 rows × 104 columns
\n",
"
"
],
"text/plain": [
"Years 1917-01-01 1918-01-01 1919-01-01 1920-01-01 1921-01-01 \\\n",
"Country \n",
"United States 0 0 0 0 0 \n",
"Brazil 0 0 0 0 0 \n",
"Russia 0 0 0 0 0 \n",
"Japan 0 0 0 0 0 \n",
"Vietnam 0 0 0 0 0 \n",
"Germany 0 0 0 0 0 \n",
"France 0 0 0 0 0 \n",
"United Kingdom 1 1 1 1 1 \n",
"Italy 0 0 0 0 0 \n",
"South Africa 0 0 0 0 0 \n",
"Ukraine 0 0 0 0 0 \n",
"Spain 0 0 0 0 0 \n",
"Poland 0 0 0 0 0 \n",
"Canada 0 0 0 0 0 \n",
"Korea, North 0 0 0 0 0 \n",
"Romania 0 0 0 0 0 \n",
"Australia 0 0 0 0 0 \n",
"Portugal 0 0 0 0 0 \n",
"Belgium 0 0 0 0 0 \n",
"Czech Republic 0 0 0 0 0 \n",
"Hungary 0 0 0 0 0 \n",
"Belarus 0 0 0 0 0 \n",
"Sweden 0 0 0 0 0 \n",
"Austria 0 0 0 0 0 \n",
"Switzerland 0 0 0 0 0 \n",
"Israel 0 0 0 0 0 \n",
"Serbia 0 0 0 0 0 \n",
"Denmark 0 0 0 0 0 \n",
"Finland 0 0 0 0 0 \n",
"Norway 0 0 0 0 0 \n",
"Croatia 0 0 0 0 0 \n",
"New Zealand 0 0 0 0 0 \n",
"Ireland 0 0 0 0 0 \n",
"Lithuania 0 0 0 0 0 \n",
"Uruguay 0 0 0 0 0 \n",
"Latvia 0 0 0 0 0 \n",
"Slovenia 0 0 0 0 0 \n",
"Estonia 0 0 0 0 0 \n",
"Netherlands 0 0 0 0 0 \n",
"Kosovo 0 0 1 1 1 \n",
"\n",
"Years 1922-01-01 1923-01-01 1924-01-01 1925-01-01 1926-01-01 \\\n",
"Country \n",
"United States 0 0 0 0 0 \n",
"Brazil 0 0 0 0 0 \n",
"Russia 0 0 0 0 0 \n",
"Japan 0 0 0 0 0 \n",
"Vietnam 0 0 0 0 0 \n",
"Germany 0 0 0 0 0 \n",
"France 0 0 0 0 0 \n",
"United Kingdom 1 3 5 19 31 \n",
"Italy 0 0 0 0 0 \n",
"South Africa 0 0 0 0 0 \n",
"Ukraine 0 0 0 0 0 \n",
"Spain 0 0 0 0 0 \n",
"Poland 0 0 0 0 0 \n",
"Canada 0 0 0 0 0 \n",
"Korea, North 0 0 0 0 0 \n",
"Romania 0 0 0 0 0 \n",
"Australia 0 0 0 0 0 \n",
"Portugal 0 0 0 0 0 \n",
"Belgium 0 0 0 0 0 \n",
"Czech Republic 0 0 0 0 0 \n",
"Hungary 0 0 0 0 0 \n",
"Belarus 0 0 0 0 0 \n",
"Sweden 0 0 0 0 0 \n",
"Austria 0 0 0 0 0 \n",
"Switzerland 0 0 0 0 0 \n",
"Israel 0 0 0 0 0 \n",
"Serbia 0 0 0 0 0 \n",
"Denmark 0 0 0 0 0 \n",
"Finland 0 0 0 0 0 \n",
"Norway 0 0 0 0 0 \n",
"Croatia 0 0 0 0 0 \n",
"New Zealand 0 0 0 0 0 \n",
"Ireland 0 0 0 0 0 \n",
"Lithuania 0 0 0 0 0 \n",
"Uruguay 0 0 0 0 0 \n",
"Latvia 0 0 0 0 0 \n",
"Slovenia 0 0 0 0 0 \n",
"Estonia 0 0 0 0 0 \n",
"Netherlands 0 0 0 0 0 \n",
"Kosovo 1 1 1 1 1 \n",
"\n",
"Years ... 2011-01-01 2012-01-01 2013-01-01 2014-01-01 \\\n",
"Country ... \n",
"United States ... 13982 14413 14789 15069 \n",
"Brazil ... 1 1 1 1 \n",
"Russia ... 262 344 459 586 \n",
"Japan ... 3 3 3 3 \n",
"Vietnam ... 0 0 0 0 \n",
"Germany ... 428 508 589 666 \n",
"France ... 276 302 338 375 \n",
"United Kingdom ... 2509 2531 2557 2579 \n",
"Italy ... 81 81 81 94 \n",
"South Africa ... 19 23 24 28 \n",
"Ukraine ... 8 9 10 10 \n",
"Spain ... 0 0 0 0 \n",
"Poland ... 337 379 460 526 \n",
"Canada ... 329 344 359 364 \n",
"Korea, North ... 0 0 3 4 \n",
"Romania ... 0 0 0 0 \n",
"Australia ... 814 827 844 855 \n",
"Portugal ... 0 0 0 1 \n",
"Belgium ... 63 79 94 96 \n",
"Czech Republic ... 139 150 171 188 \n",
"Hungary ... 7 7 7 8 \n",
"Belarus ... 0 0 0 0 \n",
"Sweden ... 1214 1342 1461 1525 \n",
"Austria ... 21 22 22 29 \n",
"Switzerland ... 28 29 29 29 \n",
"Israel ... 0 0 0 8 \n",
"Serbia ... 0 0 0 0 \n",
"Denmark ... 1549 1653 1760 1856 \n",
"Finland ... 2824 2967 3110 3308 \n",
"Norway ... 434 503 570 669 \n",
"Croatia ... 0 0 0 0 \n",
"New Zealand ... 1080 1080 1082 1083 \n",
"Ireland ... 29 29 29 29 \n",
"Lithuania ... 5 5 20 20 \n",
"Uruguay ... 0 0 0 0 \n",
"Latvia ... 3 3 3 3 \n",
"Slovenia ... 1 1 1 1 \n",
"Estonia ... 11 18 18 24 \n",
"Netherlands ... 323 375 417 448 \n",
"Kosovo ... 185 204 251 288 \n",
"\n",
"Years 2015-01-01 2016-01-01 2017-01-01 2018-01-01 2019-01-01 \\\n",
"Country \n",
"United States 15370 15599 15812 16010 16128 \n",
"Brazil 1 1 1 1 1 \n",
"Russia 823 1040 1300 1629 1818 \n",
"Japan 3 5 7 7 7 \n",
"Vietnam 0 0 1 4 4 \n",
"Germany 738 792 820 862 892 \n",
"France 395 450 510 535 597 \n",
"United Kingdom 2605 2633 2643 2648 2649 \n",
"Italy 96 98 102 104 106 \n",
"South Africa 28 28 28 28 28 \n",
"Ukraine 12 12 12 18 19 \n",
"Spain 0 0 0 0 5 \n",
"Poland 617 692 748 825 859 \n",
"Canada 377 383 386 390 391 \n",
"Korea, North 4 4 5 5 5 \n",
"Romania 0 0 0 0 7 \n",
"Australia 861 870 879 887 891 \n",
"Portugal 2 3 3 5 7 \n",
"Belgium 98 104 105 107 110 \n",
"Czech Republic 210 233 255 264 271 \n",
"Hungary 8 8 8 8 8 \n",
"Belarus 0 5 12 21 38 \n",
"Sweden 1643 1791 1928 1988 2007 \n",
"Austria 29 29 29 29 29 \n",
"Switzerland 30 30 30 30 30 \n",
"Israel 13 17 21 25 32 \n",
"Serbia 1 1 2 2 2 \n",
"Denmark 1967 2045 2115 2148 2176 \n",
"Finland 3509 3695 3961 4044 4048 \n",
"Norway 770 873 970 1056 1077 \n",
"Croatia 3 11 15 21 21 \n",
"New Zealand 1088 1091 1094 1097 1097 \n",
"Ireland 29 29 29 29 29 \n",
"Lithuania 20 29 29 29 29 \n",
"Uruguay 0 0 2 2 2 \n",
"Latvia 3 3 3 3 3 \n",
"Slovenia 1 1 1 1 1 \n",
"Estonia 30 57 65 81 97 \n",
"Netherlands 470 487 512 524 536 \n",
"Kosovo 352 424 525 581 632 \n",
"\n",
"Years 2020-01-01 \n",
"Country \n",
"United States 16130 \n",
"Brazil 1 \n",
"Russia 1834 \n",
"Japan 7 \n",
"Vietnam 4 \n",
"Germany 892 \n",
"France 597 \n",
"United Kingdom 2649 \n",
"Italy 106 \n",
"South Africa 28 \n",
"Ukraine 19 \n",
"Spain 5 \n",
"Poland 859 \n",
"Canada 391 \n",
"Korea, North 5 \n",
"Romania 7 \n",
"Australia 891 \n",
"Portugal 7 \n",
"Belgium 110 \n",
"Czech Republic 271 \n",
"Hungary 8 \n",
"Belarus 38 \n",
"Sweden 2008 \n",
"Austria 29 \n",
"Switzerland 30 \n",
"Israel 32 \n",
"Serbia 2 \n",
"Denmark 2176 \n",
"Finland 4051 \n",
"Norway 1077 \n",
"Croatia 21 \n",
"New Zealand 1097 \n",
"Ireland 29 \n",
"Lithuania 29 \n",
"Uruguay 2 \n",
"Latvia 3 \n",
"Slovenia 1 \n",
"Estonia 97 \n",
"Netherlands 536 \n",
"Kosovo 649 \n",
"\n",
"[40 rows x 104 columns]"
]
},
"execution_count": 53,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"corg_clean2.loc[:,(corg_clean2.columns != 'Latitude') & (corg_clean2.columns!='Longitude')].cumsum(axis=1)"
]
},
{
"cell_type": "code",
"execution_count": 54,
"id": "efd62762",
"metadata": {},
"outputs": [],
"source": [
"corg_clean2.loc[:,(corg_clean2.columns != 'Latitude') & (corg_clean2.columns!='Longitude')] = \\\n",
" corg_clean2.loc[:,(corg_clean2.columns != 'Latitude') & (corg_clean2.columns!='Longitude')].cumsum(axis=1)"
]
},
{
"cell_type": "code",
"execution_count": 55,
"id": "7d8f7724",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" Years \n",
" 1917-01-01 00:00:00 \n",
" 1918-01-01 00:00:00 \n",
" 1919-01-01 00:00:00 \n",
" 1920-01-01 00:00:00 \n",
" 1921-01-01 00:00:00 \n",
" 1922-01-01 00:00:00 \n",
" 1923-01-01 00:00:00 \n",
" 1924-01-01 00:00:00 \n",
" 1925-01-01 00:00:00 \n",
" 1926-01-01 00:00:00 \n",
" ... \n",
" 2013-01-01 00:00:00 \n",
" 2014-01-01 00:00:00 \n",
" 2015-01-01 00:00:00 \n",
" 2016-01-01 00:00:00 \n",
" 2017-01-01 00:00:00 \n",
" 2018-01-01 00:00:00 \n",
" 2019-01-01 00:00:00 \n",
" 2020-01-01 00:00:00 \n",
" Latitude \n",
" Longitude \n",
" \n",
" \n",
" Country \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" United States \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" ... \n",
" 14789 \n",
" 15069 \n",
" 15370 \n",
" 15599 \n",
" 15812 \n",
" 16010 \n",
" 16128 \n",
" 16130 \n",
" 39.783730 \n",
" -100.445882 \n",
" \n",
" \n",
" Brazil \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" ... \n",
" 1 \n",
" 1 \n",
" 1 \n",
" 1 \n",
" 1 \n",
" 1 \n",
" 1 \n",
" 1 \n",
" -10.333333 \n",
" -53.200000 \n",
" \n",
" \n",
" Russia \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" ... \n",
" 459 \n",
" 586 \n",
" 823 \n",
" 1040 \n",
" 1300 \n",
" 1629 \n",
" 1818 \n",
" 1834 \n",
" 64.686314 \n",
" 97.745306 \n",
" \n",
" \n",
" Japan \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" ... \n",
" 3 \n",
" 3 \n",
" 3 \n",
" 5 \n",
" 7 \n",
" 7 \n",
" 7 \n",
" 7 \n",
" 36.574844 \n",
" 139.239418 \n",
" \n",
" \n",
" Vietnam \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" ... \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 1 \n",
" 4 \n",
" 4 \n",
" 4 \n",
" 15.926666 \n",
" 107.965086 \n",
" \n",
" \n",
" Germany \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" ... \n",
" 589 \n",
" 666 \n",
" 738 \n",
" 792 \n",
" 820 \n",
" 862 \n",
" 892 \n",
" 892 \n",
" 51.163818 \n",
" 10.447831 \n",
" \n",
" \n",
" France \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" ... \n",
" 338 \n",
" 375 \n",
" 395 \n",
" 450 \n",
" 510 \n",
" 535 \n",
" 597 \n",
" 597 \n",
" 46.603354 \n",
" 1.888334 \n",
" \n",
" \n",
" United Kingdom \n",
" 1 \n",
" 1 \n",
" 1 \n",
" 1 \n",
" 1 \n",
" 1 \n",
" 3 \n",
" 5 \n",
" 19 \n",
" 31 \n",
" ... \n",
" 2557 \n",
" 2579 \n",
" 2605 \n",
" 2633 \n",
" 2643 \n",
" 2648 \n",
" 2649 \n",
" 2649 \n",
" 54.702354 \n",
" -3.276575 \n",
" \n",
" \n",
" Italy \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" ... \n",
" 81 \n",
" 94 \n",
" 96 \n",
" 98 \n",
" 102 \n",
" 104 \n",
" 106 \n",
" 106 \n",
" 42.638426 \n",
" 12.674297 \n",
" \n",
" \n",
" South Africa \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" ... \n",
" 24 \n",
" 28 \n",
" 28 \n",
" 28 \n",
" 28 \n",
" 28 \n",
" 28 \n",
" 28 \n",
" -28.816624 \n",
" 24.991639 \n",
" \n",
" \n",
" Ukraine \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" ... \n",
" 10 \n",
" 10 \n",
" 12 \n",
" 12 \n",
" 12 \n",
" 18 \n",
" 19 \n",
" 19 \n",
" 49.487197 \n",
" 31.271832 \n",
" \n",
" \n",
" Spain \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" ... \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 5 \n",
" 5 \n",
" 39.326068 \n",
" -4.837979 \n",
" \n",
" \n",
" Poland \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" ... \n",
" 460 \n",
" 526 \n",
" 617 \n",
" 692 \n",
" 748 \n",
" 825 \n",
" 859 \n",
" 859 \n",
" 52.215933 \n",
" 19.134422 \n",
" \n",
" \n",
" Canada \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" ... \n",
" 359 \n",
" 364 \n",
" 377 \n",
" 383 \n",
" 386 \n",
" 390 \n",
" 391 \n",
" 391 \n",
" 61.066692 \n",
" -107.991707 \n",
" \n",
" \n",
" Korea, North \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" ... \n",
" 3 \n",
" 4 \n",
" 4 \n",
" 4 \n",
" 5 \n",
" 5 \n",
" 5 \n",
" 5 \n",
" 40.373661 \n",
" 127.087042 \n",
" \n",
" \n",
" Romania \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" ... \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 7 \n",
" 7 \n",
" 45.985213 \n",
" 24.685923 \n",
" \n",
" \n",
" Australia \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" ... \n",
" 844 \n",
" 855 \n",
" 861 \n",
" 870 \n",
" 879 \n",
" 887 \n",
" 891 \n",
" 891 \n",
" -24.776109 \n",
" 134.755000 \n",
" \n",
" \n",
" Portugal \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" ... \n",
" 0 \n",
" 1 \n",
" 2 \n",
" 3 \n",
" 3 \n",
" 5 \n",
" 7 \n",
" 7 \n",
" 39.662165 \n",
" -8.135352 \n",
" \n",
" \n",
" Belgium \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" ... \n",
" 94 \n",
" 96 \n",
" 98 \n",
" 104 \n",
" 105 \n",
" 107 \n",
" 110 \n",
" 110 \n",
" 50.640281 \n",
" 4.666715 \n",
" \n",
" \n",
" Czech Republic \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" ... \n",
" 171 \n",
" 188 \n",
" 210 \n",
" 233 \n",
" 255 \n",
" 264 \n",
" 271 \n",
" 271 \n",
" 49.743905 \n",
" 15.338106 \n",
" \n",
" \n",
" Hungary \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" ... \n",
" 7 \n",
" 8 \n",
" 8 \n",
" 8 \n",
" 8 \n",
" 8 \n",
" 8 \n",
" 8 \n",
" 47.181759 \n",
" 19.506094 \n",
" \n",
" \n",
" Belarus \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" ... \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 5 \n",
" 12 \n",
" 21 \n",
" 38 \n",
" 38 \n",
" 53.425061 \n",
" 27.697136 \n",
" \n",
" \n",
" Sweden \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" ... \n",
" 1461 \n",
" 1525 \n",
" 1643 \n",
" 1791 \n",
" 1928 \n",
" 1988 \n",
" 2007 \n",
" 2008 \n",
" 59.674971 \n",
" 14.520858 \n",
" \n",
" \n",
" Austria \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" ... \n",
" 22 \n",
" 29 \n",
" 29 \n",
" 29 \n",
" 29 \n",
" 29 \n",
" 29 \n",
" 29 \n",
" 47.593970 \n",
" 14.124560 \n",
" \n",
" \n",
" Switzerland \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" ... \n",
" 29 \n",
" 29 \n",
" 30 \n",
" 30 \n",
" 30 \n",
" 30 \n",
" 30 \n",
" 30 \n",
" 46.798562 \n",
" 8.231974 \n",
" \n",
" \n",
" Israel \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" ... \n",
" 0 \n",
" 8 \n",
" 13 \n",
" 17 \n",
" 21 \n",
" 25 \n",
" 32 \n",
" 32 \n",
" 31.394800 \n",
" 34.633583 \n",
" \n",
" \n",
" Serbia \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" ... \n",
" 0 \n",
" 0 \n",
" 1 \n",
" 1 \n",
" 2 \n",
" 2 \n",
" 2 \n",
" 2 \n",
" 44.153412 \n",
" 20.551440 \n",
" \n",
" \n",
" Denmark \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" ... \n",
" 1760 \n",
" 1856 \n",
" 1967 \n",
" 2045 \n",
" 2115 \n",
" 2148 \n",
" 2176 \n",
" 2176 \n",
" 55.670249 \n",
" 10.333328 \n",
" \n",
" \n",
" Finland \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" ... \n",
" 3110 \n",
" 3308 \n",
" 3509 \n",
" 3695 \n",
" 3961 \n",
" 4044 \n",
" 4048 \n",
" 4051 \n",
" 63.246778 \n",
" 25.920916 \n",
" \n",
" \n",
" Norway \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" ... \n",
" 570 \n",
" 669 \n",
" 770 \n",
" 873 \n",
" 970 \n",
" 1056 \n",
" 1077 \n",
" 1077 \n",
" 61.152939 \n",
" 8.787665 \n",
" \n",
" \n",
" Croatia \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" ... \n",
" 0 \n",
" 0 \n",
" 3 \n",
" 11 \n",
" 15 \n",
" 21 \n",
" 21 \n",
" 21 \n",
" 45.365844 \n",
" 15.657521 \n",
" \n",
" \n",
" New Zealand \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" ... \n",
" 1082 \n",
" 1083 \n",
" 1088 \n",
" 1091 \n",
" 1094 \n",
" 1097 \n",
" 1097 \n",
" 1097 \n",
" -41.500083 \n",
" 172.834408 \n",
" \n",
" \n",
" Ireland \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" ... \n",
" 29 \n",
" 29 \n",
" 29 \n",
" 29 \n",
" 29 \n",
" 29 \n",
" 29 \n",
" 29 \n",
" 52.865196 \n",
" -7.979460 \n",
" \n",
" \n",
" Lithuania \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" ... \n",
" 20 \n",
" 20 \n",
" 20 \n",
" 29 \n",
" 29 \n",
" 29 \n",
" 29 \n",
" 29 \n",
" 55.350000 \n",
" 23.750000 \n",
" \n",
" \n",
" Uruguay \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" ... \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 2 \n",
" 2 \n",
" 2 \n",
" 2 \n",
" -32.875555 \n",
" -56.020153 \n",
" \n",
" \n",
" Latvia \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" ... \n",
" 3 \n",
" 3 \n",
" 3 \n",
" 3 \n",
" 3 \n",
" 3 \n",
" 3 \n",
" 3 \n",
" 56.840649 \n",
" 24.753764 \n",
" \n",
" \n",
" Slovenia \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" ... \n",
" 1 \n",
" 1 \n",
" 1 \n",
" 1 \n",
" 1 \n",
" 1 \n",
" 1 \n",
" 1 \n",
" 46.119944 \n",
" 14.815333 \n",
" \n",
" \n",
" Estonia \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" ... \n",
" 18 \n",
" 24 \n",
" 30 \n",
" 57 \n",
" 65 \n",
" 81 \n",
" 97 \n",
" 97 \n",
" 58.752378 \n",
" 25.331908 \n",
" \n",
" \n",
" Netherlands \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" ... \n",
" 417 \n",
" 448 \n",
" 470 \n",
" 487 \n",
" 512 \n",
" 524 \n",
" 536 \n",
" 536 \n",
" 52.247650 \n",
" 5.541247 \n",
" \n",
" \n",
" Kosovo \n",
" 0 \n",
" 0 \n",
" 1 \n",
" 1 \n",
" 1 \n",
" 1 \n",
" 1 \n",
" 1 \n",
" 1 \n",
" 1 \n",
" ... \n",
" 251 \n",
" 288 \n",
" 352 \n",
" 424 \n",
" 525 \n",
" 581 \n",
" 632 \n",
" 649 \n",
" 42.586958 \n",
" 20.902123 \n",
" \n",
" \n",
"
\n",
"
40 rows × 106 columns
\n",
"
"
],
"text/plain": [
"Years 1917-01-01 00:00:00 1918-01-01 00:00:00 1919-01-01 00:00:00 \\\n",
"Country \n",
"United States 0 0 0 \n",
"Brazil 0 0 0 \n",
"Russia 0 0 0 \n",
"Japan 0 0 0 \n",
"Vietnam 0 0 0 \n",
"Germany 0 0 0 \n",
"France 0 0 0 \n",
"United Kingdom 1 1 1 \n",
"Italy 0 0 0 \n",
"South Africa 0 0 0 \n",
"Ukraine 0 0 0 \n",
"Spain 0 0 0 \n",
"Poland 0 0 0 \n",
"Canada 0 0 0 \n",
"Korea, North 0 0 0 \n",
"Romania 0 0 0 \n",
"Australia 0 0 0 \n",
"Portugal 0 0 0 \n",
"Belgium 0 0 0 \n",
"Czech Republic 0 0 0 \n",
"Hungary 0 0 0 \n",
"Belarus 0 0 0 \n",
"Sweden 0 0 0 \n",
"Austria 0 0 0 \n",
"Switzerland 0 0 0 \n",
"Israel 0 0 0 \n",
"Serbia 0 0 0 \n",
"Denmark 0 0 0 \n",
"Finland 0 0 0 \n",
"Norway 0 0 0 \n",
"Croatia 0 0 0 \n",
"New Zealand 0 0 0 \n",
"Ireland 0 0 0 \n",
"Lithuania 0 0 0 \n",
"Uruguay 0 0 0 \n",
"Latvia 0 0 0 \n",
"Slovenia 0 0 0 \n",
"Estonia 0 0 0 \n",
"Netherlands 0 0 0 \n",
"Kosovo 0 0 1 \n",
"\n",
"Years 1920-01-01 00:00:00 1921-01-01 00:00:00 1922-01-01 00:00:00 \\\n",
"Country \n",
"United States 0 0 0 \n",
"Brazil 0 0 0 \n",
"Russia 0 0 0 \n",
"Japan 0 0 0 \n",
"Vietnam 0 0 0 \n",
"Germany 0 0 0 \n",
"France 0 0 0 \n",
"United Kingdom 1 1 1 \n",
"Italy 0 0 0 \n",
"South Africa 0 0 0 \n",
"Ukraine 0 0 0 \n",
"Spain 0 0 0 \n",
"Poland 0 0 0 \n",
"Canada 0 0 0 \n",
"Korea, North 0 0 0 \n",
"Romania 0 0 0 \n",
"Australia 0 0 0 \n",
"Portugal 0 0 0 \n",
"Belgium 0 0 0 \n",
"Czech Republic 0 0 0 \n",
"Hungary 0 0 0 \n",
"Belarus 0 0 0 \n",
"Sweden 0 0 0 \n",
"Austria 0 0 0 \n",
"Switzerland 0 0 0 \n",
"Israel 0 0 0 \n",
"Serbia 0 0 0 \n",
"Denmark 0 0 0 \n",
"Finland 0 0 0 \n",
"Norway 0 0 0 \n",
"Croatia 0 0 0 \n",
"New Zealand 0 0 0 \n",
"Ireland 0 0 0 \n",
"Lithuania 0 0 0 \n",
"Uruguay 0 0 0 \n",
"Latvia 0 0 0 \n",
"Slovenia 0 0 0 \n",
"Estonia 0 0 0 \n",
"Netherlands 0 0 0 \n",
"Kosovo 1 1 1 \n",
"\n",
"Years 1923-01-01 00:00:00 1924-01-01 00:00:00 1925-01-01 00:00:00 \\\n",
"Country \n",
"United States 0 0 0 \n",
"Brazil 0 0 0 \n",
"Russia 0 0 0 \n",
"Japan 0 0 0 \n",
"Vietnam 0 0 0 \n",
"Germany 0 0 0 \n",
"France 0 0 0 \n",
"United Kingdom 3 5 19 \n",
"Italy 0 0 0 \n",
"South Africa 0 0 0 \n",
"Ukraine 0 0 0 \n",
"Spain 0 0 0 \n",
"Poland 0 0 0 \n",
"Canada 0 0 0 \n",
"Korea, North 0 0 0 \n",
"Romania 0 0 0 \n",
"Australia 0 0 0 \n",
"Portugal 0 0 0 \n",
"Belgium 0 0 0 \n",
"Czech Republic 0 0 0 \n",
"Hungary 0 0 0 \n",
"Belarus 0 0 0 \n",
"Sweden 0 0 0 \n",
"Austria 0 0 0 \n",
"Switzerland 0 0 0 \n",
"Israel 0 0 0 \n",
"Serbia 0 0 0 \n",
"Denmark 0 0 0 \n",
"Finland 0 0 0 \n",
"Norway 0 0 0 \n",
"Croatia 0 0 0 \n",
"New Zealand 0 0 0 \n",
"Ireland 0 0 0 \n",
"Lithuania 0 0 0 \n",
"Uruguay 0 0 0 \n",
"Latvia 0 0 0 \n",
"Slovenia 0 0 0 \n",
"Estonia 0 0 0 \n",
"Netherlands 0 0 0 \n",
"Kosovo 1 1 1 \n",
"\n",
"Years 1926-01-01 00:00:00 ... 2013-01-01 00:00:00 \\\n",
"Country ... \n",
"United States 0 ... 14789 \n",
"Brazil 0 ... 1 \n",
"Russia 0 ... 459 \n",
"Japan 0 ... 3 \n",
"Vietnam 0 ... 0 \n",
"Germany 0 ... 589 \n",
"France 0 ... 338 \n",
"United Kingdom 31 ... 2557 \n",
"Italy 0 ... 81 \n",
"South Africa 0 ... 24 \n",
"Ukraine 0 ... 10 \n",
"Spain 0 ... 0 \n",
"Poland 0 ... 460 \n",
"Canada 0 ... 359 \n",
"Korea, North 0 ... 3 \n",
"Romania 0 ... 0 \n",
"Australia 0 ... 844 \n",
"Portugal 0 ... 0 \n",
"Belgium 0 ... 94 \n",
"Czech Republic 0 ... 171 \n",
"Hungary 0 ... 7 \n",
"Belarus 0 ... 0 \n",
"Sweden 0 ... 1461 \n",
"Austria 0 ... 22 \n",
"Switzerland 0 ... 29 \n",
"Israel 0 ... 0 \n",
"Serbia 0 ... 0 \n",
"Denmark 0 ... 1760 \n",
"Finland 0 ... 3110 \n",
"Norway 0 ... 570 \n",
"Croatia 0 ... 0 \n",
"New Zealand 0 ... 1082 \n",
"Ireland 0 ... 29 \n",
"Lithuania 0 ... 20 \n",
"Uruguay 0 ... 0 \n",
"Latvia 0 ... 3 \n",
"Slovenia 0 ... 1 \n",
"Estonia 0 ... 18 \n",
"Netherlands 0 ... 417 \n",
"Kosovo 1 ... 251 \n",
"\n",
"Years 2014-01-01 00:00:00 2015-01-01 00:00:00 2016-01-01 00:00:00 \\\n",
"Country \n",
"United States 15069 15370 15599 \n",
"Brazil 1 1 1 \n",
"Russia 586 823 1040 \n",
"Japan 3 3 5 \n",
"Vietnam 0 0 0 \n",
"Germany 666 738 792 \n",
"France 375 395 450 \n",
"United Kingdom 2579 2605 2633 \n",
"Italy 94 96 98 \n",
"South Africa 28 28 28 \n",
"Ukraine 10 12 12 \n",
"Spain 0 0 0 \n",
"Poland 526 617 692 \n",
"Canada 364 377 383 \n",
"Korea, North 4 4 4 \n",
"Romania 0 0 0 \n",
"Australia 855 861 870 \n",
"Portugal 1 2 3 \n",
"Belgium 96 98 104 \n",
"Czech Republic 188 210 233 \n",
"Hungary 8 8 8 \n",
"Belarus 0 0 5 \n",
"Sweden 1525 1643 1791 \n",
"Austria 29 29 29 \n",
"Switzerland 29 30 30 \n",
"Israel 8 13 17 \n",
"Serbia 0 1 1 \n",
"Denmark 1856 1967 2045 \n",
"Finland 3308 3509 3695 \n",
"Norway 669 770 873 \n",
"Croatia 0 3 11 \n",
"New Zealand 1083 1088 1091 \n",
"Ireland 29 29 29 \n",
"Lithuania 20 20 29 \n",
"Uruguay 0 0 0 \n",
"Latvia 3 3 3 \n",
"Slovenia 1 1 1 \n",
"Estonia 24 30 57 \n",
"Netherlands 448 470 487 \n",
"Kosovo 288 352 424 \n",
"\n",
"Years 2017-01-01 00:00:00 2018-01-01 00:00:00 2019-01-01 00:00:00 \\\n",
"Country \n",
"United States 15812 16010 16128 \n",
"Brazil 1 1 1 \n",
"Russia 1300 1629 1818 \n",
"Japan 7 7 7 \n",
"Vietnam 1 4 4 \n",
"Germany 820 862 892 \n",
"France 510 535 597 \n",
"United Kingdom 2643 2648 2649 \n",
"Italy 102 104 106 \n",
"South Africa 28 28 28 \n",
"Ukraine 12 18 19 \n",
"Spain 0 0 5 \n",
"Poland 748 825 859 \n",
"Canada 386 390 391 \n",
"Korea, North 5 5 5 \n",
"Romania 0 0 7 \n",
"Australia 879 887 891 \n",
"Portugal 3 5 7 \n",
"Belgium 105 107 110 \n",
"Czech Republic 255 264 271 \n",
"Hungary 8 8 8 \n",
"Belarus 12 21 38 \n",
"Sweden 1928 1988 2007 \n",
"Austria 29 29 29 \n",
"Switzerland 30 30 30 \n",
"Israel 21 25 32 \n",
"Serbia 2 2 2 \n",
"Denmark 2115 2148 2176 \n",
"Finland 3961 4044 4048 \n",
"Norway 970 1056 1077 \n",
"Croatia 15 21 21 \n",
"New Zealand 1094 1097 1097 \n",
"Ireland 29 29 29 \n",
"Lithuania 29 29 29 \n",
"Uruguay 2 2 2 \n",
"Latvia 3 3 3 \n",
"Slovenia 1 1 1 \n",
"Estonia 65 81 97 \n",
"Netherlands 512 524 536 \n",
"Kosovo 525 581 632 \n",
"\n",
"Years 2020-01-01 00:00:00 Latitude Longitude \n",
"Country \n",
"United States 16130 39.783730 -100.445882 \n",
"Brazil 1 -10.333333 -53.200000 \n",
"Russia 1834 64.686314 97.745306 \n",
"Japan 7 36.574844 139.239418 \n",
"Vietnam 4 15.926666 107.965086 \n",
"Germany 892 51.163818 10.447831 \n",
"France 597 46.603354 1.888334 \n",
"United Kingdom 2649 54.702354 -3.276575 \n",
"Italy 106 42.638426 12.674297 \n",
"South Africa 28 -28.816624 24.991639 \n",
"Ukraine 19 49.487197 31.271832 \n",
"Spain 5 39.326068 -4.837979 \n",
"Poland 859 52.215933 19.134422 \n",
"Canada 391 61.066692 -107.991707 \n",
"Korea, North 5 40.373661 127.087042 \n",
"Romania 7 45.985213 24.685923 \n",
"Australia 891 -24.776109 134.755000 \n",
"Portugal 7 39.662165 -8.135352 \n",
"Belgium 110 50.640281 4.666715 \n",
"Czech Republic 271 49.743905 15.338106 \n",
"Hungary 8 47.181759 19.506094 \n",
"Belarus 38 53.425061 27.697136 \n",
"Sweden 2008 59.674971 14.520858 \n",
"Austria 29 47.593970 14.124560 \n",
"Switzerland 30 46.798562 8.231974 \n",
"Israel 32 31.394800 34.633583 \n",
"Serbia 2 44.153412 20.551440 \n",
"Denmark 2176 55.670249 10.333328 \n",
"Finland 4051 63.246778 25.920916 \n",
"Norway 1077 61.152939 8.787665 \n",
"Croatia 21 45.365844 15.657521 \n",
"New Zealand 1097 -41.500083 172.834408 \n",
"Ireland 29 52.865196 -7.979460 \n",
"Lithuania 29 55.350000 23.750000 \n",
"Uruguay 2 -32.875555 -56.020153 \n",
"Latvia 3 56.840649 24.753764 \n",
"Slovenia 1 46.119944 14.815333 \n",
"Estonia 97 58.752378 25.331908 \n",
"Netherlands 536 52.247650 5.541247 \n",
"Kosovo 649 42.586958 20.902123 \n",
"\n",
"[40 rows x 106 columns]"
]
},
"execution_count": 55,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"corg_clean2"
]
},
{
"cell_type": "code",
"execution_count": 56,
"id": "c7fda0a7",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" Years \n",
" Country \n",
" 1917-01-01 00:00:00 \n",
" 1918-01-01 00:00:00 \n",
" 1919-01-01 00:00:00 \n",
" 1920-01-01 00:00:00 \n",
" 1921-01-01 00:00:00 \n",
" 1922-01-01 00:00:00 \n",
" 1923-01-01 00:00:00 \n",
" 1924-01-01 00:00:00 \n",
" 1925-01-01 00:00:00 \n",
" ... \n",
" 2013-01-01 00:00:00 \n",
" 2014-01-01 00:00:00 \n",
" 2015-01-01 00:00:00 \n",
" 2016-01-01 00:00:00 \n",
" 2017-01-01 00:00:00 \n",
" 2018-01-01 00:00:00 \n",
" 2019-01-01 00:00:00 \n",
" 2020-01-01 00:00:00 \n",
" Latitude \n",
" Longitude \n",
" \n",
" \n",
" \n",
" \n",
" 0 \n",
" United States \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" ... \n",
" 14789 \n",
" 15069 \n",
" 15370 \n",
" 15599 \n",
" 15812 \n",
" 16010 \n",
" 16128 \n",
" 16130 \n",
" 39.783730 \n",
" -100.445882 \n",
" \n",
" \n",
" 1 \n",
" Brazil \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" ... \n",
" 1 \n",
" 1 \n",
" 1 \n",
" 1 \n",
" 1 \n",
" 1 \n",
" 1 \n",
" 1 \n",
" -10.333333 \n",
" -53.200000 \n",
" \n",
" \n",
" 2 \n",
" Russia \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" ... \n",
" 459 \n",
" 586 \n",
" 823 \n",
" 1040 \n",
" 1300 \n",
" 1629 \n",
" 1818 \n",
" 1834 \n",
" 64.686314 \n",
" 97.745306 \n",
" \n",
" \n",
" 3 \n",
" Japan \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" ... \n",
" 3 \n",
" 3 \n",
" 3 \n",
" 5 \n",
" 7 \n",
" 7 \n",
" 7 \n",
" 7 \n",
" 36.574844 \n",
" 139.239418 \n",
" \n",
" \n",
" 4 \n",
" Vietnam \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" ... \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 1 \n",
" 4 \n",
" 4 \n",
" 4 \n",
" 15.926666 \n",
" 107.965086 \n",
" \n",
" \n",
" 5 \n",
" Germany \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" ... \n",
" 589 \n",
" 666 \n",
" 738 \n",
" 792 \n",
" 820 \n",
" 862 \n",
" 892 \n",
" 892 \n",
" 51.163818 \n",
" 10.447831 \n",
" \n",
" \n",
" 6 \n",
" France \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" ... \n",
" 338 \n",
" 375 \n",
" 395 \n",
" 450 \n",
" 510 \n",
" 535 \n",
" 597 \n",
" 597 \n",
" 46.603354 \n",
" 1.888334 \n",
" \n",
" \n",
" 7 \n",
" United Kingdom \n",
" 1 \n",
" 1 \n",
" 1 \n",
" 1 \n",
" 1 \n",
" 1 \n",
" 3 \n",
" 5 \n",
" 19 \n",
" ... \n",
" 2557 \n",
" 2579 \n",
" 2605 \n",
" 2633 \n",
" 2643 \n",
" 2648 \n",
" 2649 \n",
" 2649 \n",
" 54.702354 \n",
" -3.276575 \n",
" \n",
" \n",
" 8 \n",
" Italy \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" ... \n",
" 81 \n",
" 94 \n",
" 96 \n",
" 98 \n",
" 102 \n",
" 104 \n",
" 106 \n",
" 106 \n",
" 42.638426 \n",
" 12.674297 \n",
" \n",
" \n",
" 9 \n",
" South Africa \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" ... \n",
" 24 \n",
" 28 \n",
" 28 \n",
" 28 \n",
" 28 \n",
" 28 \n",
" 28 \n",
" 28 \n",
" -28.816624 \n",
" 24.991639 \n",
" \n",
" \n",
" 10 \n",
" Ukraine \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" ... \n",
" 10 \n",
" 10 \n",
" 12 \n",
" 12 \n",
" 12 \n",
" 18 \n",
" 19 \n",
" 19 \n",
" 49.487197 \n",
" 31.271832 \n",
" \n",
" \n",
" 11 \n",
" Spain \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" ... \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 5 \n",
" 5 \n",
" 39.326068 \n",
" -4.837979 \n",
" \n",
" \n",
" 12 \n",
" Poland \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" ... \n",
" 460 \n",
" 526 \n",
" 617 \n",
" 692 \n",
" 748 \n",
" 825 \n",
" 859 \n",
" 859 \n",
" 52.215933 \n",
" 19.134422 \n",
" \n",
" \n",
" 13 \n",
" Canada \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" ... \n",
" 359 \n",
" 364 \n",
" 377 \n",
" 383 \n",
" 386 \n",
" 390 \n",
" 391 \n",
" 391 \n",
" 61.066692 \n",
" -107.991707 \n",
" \n",
" \n",
" 14 \n",
" Korea, North \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" ... \n",
" 3 \n",
" 4 \n",
" 4 \n",
" 4 \n",
" 5 \n",
" 5 \n",
" 5 \n",
" 5 \n",
" 40.373661 \n",
" 127.087042 \n",
" \n",
" \n",
" 15 \n",
" Romania \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" ... \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 7 \n",
" 7 \n",
" 45.985213 \n",
" 24.685923 \n",
" \n",
" \n",
" 16 \n",
" Australia \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" ... \n",
" 844 \n",
" 855 \n",
" 861 \n",
" 870 \n",
" 879 \n",
" 887 \n",
" 891 \n",
" 891 \n",
" -24.776109 \n",
" 134.755000 \n",
" \n",
" \n",
" 17 \n",
" Portugal \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" ... \n",
" 0 \n",
" 1 \n",
" 2 \n",
" 3 \n",
" 3 \n",
" 5 \n",
" 7 \n",
" 7 \n",
" 39.662165 \n",
" -8.135352 \n",
" \n",
" \n",
" 18 \n",
" Belgium \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" ... \n",
" 94 \n",
" 96 \n",
" 98 \n",
" 104 \n",
" 105 \n",
" 107 \n",
" 110 \n",
" 110 \n",
" 50.640281 \n",
" 4.666715 \n",
" \n",
" \n",
" 19 \n",
" Czech Republic \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" ... \n",
" 171 \n",
" 188 \n",
" 210 \n",
" 233 \n",
" 255 \n",
" 264 \n",
" 271 \n",
" 271 \n",
" 49.743905 \n",
" 15.338106 \n",
" \n",
" \n",
" 20 \n",
" Hungary \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" ... \n",
" 7 \n",
" 8 \n",
" 8 \n",
" 8 \n",
" 8 \n",
" 8 \n",
" 8 \n",
" 8 \n",
" 47.181759 \n",
" 19.506094 \n",
" \n",
" \n",
" 21 \n",
" Belarus \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" ... \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 5 \n",
" 12 \n",
" 21 \n",
" 38 \n",
" 38 \n",
" 53.425061 \n",
" 27.697136 \n",
" \n",
" \n",
" 22 \n",
" Sweden \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" ... \n",
" 1461 \n",
" 1525 \n",
" 1643 \n",
" 1791 \n",
" 1928 \n",
" 1988 \n",
" 2007 \n",
" 2008 \n",
" 59.674971 \n",
" 14.520858 \n",
" \n",
" \n",
" 23 \n",
" Austria \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" ... \n",
" 22 \n",
" 29 \n",
" 29 \n",
" 29 \n",
" 29 \n",
" 29 \n",
" 29 \n",
" 29 \n",
" 47.593970 \n",
" 14.124560 \n",
" \n",
" \n",
" 24 \n",
" Switzerland \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" ... \n",
" 29 \n",
" 29 \n",
" 30 \n",
" 30 \n",
" 30 \n",
" 30 \n",
" 30 \n",
" 30 \n",
" 46.798562 \n",
" 8.231974 \n",
" \n",
" \n",
" 25 \n",
" Israel \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" ... \n",
" 0 \n",
" 8 \n",
" 13 \n",
" 17 \n",
" 21 \n",
" 25 \n",
" 32 \n",
" 32 \n",
" 31.394800 \n",
" 34.633583 \n",
" \n",
" \n",
" 26 \n",
" Serbia \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" ... \n",
" 0 \n",
" 0 \n",
" 1 \n",
" 1 \n",
" 2 \n",
" 2 \n",
" 2 \n",
" 2 \n",
" 44.153412 \n",
" 20.551440 \n",
" \n",
" \n",
" 27 \n",
" Denmark \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" ... \n",
" 1760 \n",
" 1856 \n",
" 1967 \n",
" 2045 \n",
" 2115 \n",
" 2148 \n",
" 2176 \n",
" 2176 \n",
" 55.670249 \n",
" 10.333328 \n",
" \n",
" \n",
" 28 \n",
" Finland \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" ... \n",
" 3110 \n",
" 3308 \n",
" 3509 \n",
" 3695 \n",
" 3961 \n",
" 4044 \n",
" 4048 \n",
" 4051 \n",
" 63.246778 \n",
" 25.920916 \n",
" \n",
" \n",
" 29 \n",
" Norway \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" ... \n",
" 570 \n",
" 669 \n",
" 770 \n",
" 873 \n",
" 970 \n",
" 1056 \n",
" 1077 \n",
" 1077 \n",
" 61.152939 \n",
" 8.787665 \n",
" \n",
" \n",
" 30 \n",
" Croatia \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" ... \n",
" 0 \n",
" 0 \n",
" 3 \n",
" 11 \n",
" 15 \n",
" 21 \n",
" 21 \n",
" 21 \n",
" 45.365844 \n",
" 15.657521 \n",
" \n",
" \n",
" 31 \n",
" New Zealand \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" ... \n",
" 1082 \n",
" 1083 \n",
" 1088 \n",
" 1091 \n",
" 1094 \n",
" 1097 \n",
" 1097 \n",
" 1097 \n",
" -41.500083 \n",
" 172.834408 \n",
" \n",
" \n",
" 32 \n",
" Ireland \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" ... \n",
" 29 \n",
" 29 \n",
" 29 \n",
" 29 \n",
" 29 \n",
" 29 \n",
" 29 \n",
" 29 \n",
" 52.865196 \n",
" -7.979460 \n",
" \n",
" \n",
" 33 \n",
" Lithuania \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" ... \n",
" 20 \n",
" 20 \n",
" 20 \n",
" 29 \n",
" 29 \n",
" 29 \n",
" 29 \n",
" 29 \n",
" 55.350000 \n",
" 23.750000 \n",
" \n",
" \n",
" 34 \n",
" Uruguay \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" ... \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 2 \n",
" 2 \n",
" 2 \n",
" 2 \n",
" -32.875555 \n",
" -56.020153 \n",
" \n",
" \n",
" 35 \n",
" Latvia \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" ... \n",
" 3 \n",
" 3 \n",
" 3 \n",
" 3 \n",
" 3 \n",
" 3 \n",
" 3 \n",
" 3 \n",
" 56.840649 \n",
" 24.753764 \n",
" \n",
" \n",
" 36 \n",
" Slovenia \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" ... \n",
" 1 \n",
" 1 \n",
" 1 \n",
" 1 \n",
" 1 \n",
" 1 \n",
" 1 \n",
" 1 \n",
" 46.119944 \n",
" 14.815333 \n",
" \n",
" \n",
" 37 \n",
" Estonia \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" ... \n",
" 18 \n",
" 24 \n",
" 30 \n",
" 57 \n",
" 65 \n",
" 81 \n",
" 97 \n",
" 97 \n",
" 58.752378 \n",
" 25.331908 \n",
" \n",
" \n",
" 38 \n",
" Netherlands \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" ... \n",
" 417 \n",
" 448 \n",
" 470 \n",
" 487 \n",
" 512 \n",
" 524 \n",
" 536 \n",
" 536 \n",
" 52.247650 \n",
" 5.541247 \n",
" \n",
" \n",
" 39 \n",
" Kosovo \n",
" 0 \n",
" 0 \n",
" 1 \n",
" 1 \n",
" 1 \n",
" 1 \n",
" 1 \n",
" 1 \n",
" 1 \n",
" ... \n",
" 251 \n",
" 288 \n",
" 352 \n",
" 424 \n",
" 525 \n",
" 581 \n",
" 632 \n",
" 649 \n",
" 42.586958 \n",
" 20.902123 \n",
" \n",
" \n",
"
\n",
"
40 rows × 107 columns
\n",
"
"
],
"text/plain": [
"Years Country 1917-01-01 00:00:00 1918-01-01 00:00:00 \\\n",
"0 United States 0 0 \n",
"1 Brazil 0 0 \n",
"2 Russia 0 0 \n",
"3 Japan 0 0 \n",
"4 Vietnam 0 0 \n",
"5 Germany 0 0 \n",
"6 France 0 0 \n",
"7 United Kingdom 1 1 \n",
"8 Italy 0 0 \n",
"9 South Africa 0 0 \n",
"10 Ukraine 0 0 \n",
"11 Spain 0 0 \n",
"12 Poland 0 0 \n",
"13 Canada 0 0 \n",
"14 Korea, North 0 0 \n",
"15 Romania 0 0 \n",
"16 Australia 0 0 \n",
"17 Portugal 0 0 \n",
"18 Belgium 0 0 \n",
"19 Czech Republic 0 0 \n",
"20 Hungary 0 0 \n",
"21 Belarus 0 0 \n",
"22 Sweden 0 0 \n",
"23 Austria 0 0 \n",
"24 Switzerland 0 0 \n",
"25 Israel 0 0 \n",
"26 Serbia 0 0 \n",
"27 Denmark 0 0 \n",
"28 Finland 0 0 \n",
"29 Norway 0 0 \n",
"30 Croatia 0 0 \n",
"31 New Zealand 0 0 \n",
"32 Ireland 0 0 \n",
"33 Lithuania 0 0 \n",
"34 Uruguay 0 0 \n",
"35 Latvia 0 0 \n",
"36 Slovenia 0 0 \n",
"37 Estonia 0 0 \n",
"38 Netherlands 0 0 \n",
"39 Kosovo 0 0 \n",
"\n",
"Years 1919-01-01 00:00:00 1920-01-01 00:00:00 1921-01-01 00:00:00 \\\n",
"0 0 0 0 \n",
"1 0 0 0 \n",
"2 0 0 0 \n",
"3 0 0 0 \n",
"4 0 0 0 \n",
"5 0 0 0 \n",
"6 0 0 0 \n",
"7 1 1 1 \n",
"8 0 0 0 \n",
"9 0 0 0 \n",
"10 0 0 0 \n",
"11 0 0 0 \n",
"12 0 0 0 \n",
"13 0 0 0 \n",
"14 0 0 0 \n",
"15 0 0 0 \n",
"16 0 0 0 \n",
"17 0 0 0 \n",
"18 0 0 0 \n",
"19 0 0 0 \n",
"20 0 0 0 \n",
"21 0 0 0 \n",
"22 0 0 0 \n",
"23 0 0 0 \n",
"24 0 0 0 \n",
"25 0 0 0 \n",
"26 0 0 0 \n",
"27 0 0 0 \n",
"28 0 0 0 \n",
"29 0 0 0 \n",
"30 0 0 0 \n",
"31 0 0 0 \n",
"32 0 0 0 \n",
"33 0 0 0 \n",
"34 0 0 0 \n",
"35 0 0 0 \n",
"36 0 0 0 \n",
"37 0 0 0 \n",
"38 0 0 0 \n",
"39 1 1 1 \n",
"\n",
"Years 1922-01-01 00:00:00 1923-01-01 00:00:00 1924-01-01 00:00:00 \\\n",
"0 0 0 0 \n",
"1 0 0 0 \n",
"2 0 0 0 \n",
"3 0 0 0 \n",
"4 0 0 0 \n",
"5 0 0 0 \n",
"6 0 0 0 \n",
"7 1 3 5 \n",
"8 0 0 0 \n",
"9 0 0 0 \n",
"10 0 0 0 \n",
"11 0 0 0 \n",
"12 0 0 0 \n",
"13 0 0 0 \n",
"14 0 0 0 \n",
"15 0 0 0 \n",
"16 0 0 0 \n",
"17 0 0 0 \n",
"18 0 0 0 \n",
"19 0 0 0 \n",
"20 0 0 0 \n",
"21 0 0 0 \n",
"22 0 0 0 \n",
"23 0 0 0 \n",
"24 0 0 0 \n",
"25 0 0 0 \n",
"26 0 0 0 \n",
"27 0 0 0 \n",
"28 0 0 0 \n",
"29 0 0 0 \n",
"30 0 0 0 \n",
"31 0 0 0 \n",
"32 0 0 0 \n",
"33 0 0 0 \n",
"34 0 0 0 \n",
"35 0 0 0 \n",
"36 0 0 0 \n",
"37 0 0 0 \n",
"38 0 0 0 \n",
"39 1 1 1 \n",
"\n",
"Years 1925-01-01 00:00:00 ... 2013-01-01 00:00:00 2014-01-01 00:00:00 \\\n",
"0 0 ... 14789 15069 \n",
"1 0 ... 1 1 \n",
"2 0 ... 459 586 \n",
"3 0 ... 3 3 \n",
"4 0 ... 0 0 \n",
"5 0 ... 589 666 \n",
"6 0 ... 338 375 \n",
"7 19 ... 2557 2579 \n",
"8 0 ... 81 94 \n",
"9 0 ... 24 28 \n",
"10 0 ... 10 10 \n",
"11 0 ... 0 0 \n",
"12 0 ... 460 526 \n",
"13 0 ... 359 364 \n",
"14 0 ... 3 4 \n",
"15 0 ... 0 0 \n",
"16 0 ... 844 855 \n",
"17 0 ... 0 1 \n",
"18 0 ... 94 96 \n",
"19 0 ... 171 188 \n",
"20 0 ... 7 8 \n",
"21 0 ... 0 0 \n",
"22 0 ... 1461 1525 \n",
"23 0 ... 22 29 \n",
"24 0 ... 29 29 \n",
"25 0 ... 0 8 \n",
"26 0 ... 0 0 \n",
"27 0 ... 1760 1856 \n",
"28 0 ... 3110 3308 \n",
"29 0 ... 570 669 \n",
"30 0 ... 0 0 \n",
"31 0 ... 1082 1083 \n",
"32 0 ... 29 29 \n",
"33 0 ... 20 20 \n",
"34 0 ... 0 0 \n",
"35 0 ... 3 3 \n",
"36 0 ... 1 1 \n",
"37 0 ... 18 24 \n",
"38 0 ... 417 448 \n",
"39 1 ... 251 288 \n",
"\n",
"Years 2015-01-01 00:00:00 2016-01-01 00:00:00 2017-01-01 00:00:00 \\\n",
"0 15370 15599 15812 \n",
"1 1 1 1 \n",
"2 823 1040 1300 \n",
"3 3 5 7 \n",
"4 0 0 1 \n",
"5 738 792 820 \n",
"6 395 450 510 \n",
"7 2605 2633 2643 \n",
"8 96 98 102 \n",
"9 28 28 28 \n",
"10 12 12 12 \n",
"11 0 0 0 \n",
"12 617 692 748 \n",
"13 377 383 386 \n",
"14 4 4 5 \n",
"15 0 0 0 \n",
"16 861 870 879 \n",
"17 2 3 3 \n",
"18 98 104 105 \n",
"19 210 233 255 \n",
"20 8 8 8 \n",
"21 0 5 12 \n",
"22 1643 1791 1928 \n",
"23 29 29 29 \n",
"24 30 30 30 \n",
"25 13 17 21 \n",
"26 1 1 2 \n",
"27 1967 2045 2115 \n",
"28 3509 3695 3961 \n",
"29 770 873 970 \n",
"30 3 11 15 \n",
"31 1088 1091 1094 \n",
"32 29 29 29 \n",
"33 20 29 29 \n",
"34 0 0 2 \n",
"35 3 3 3 \n",
"36 1 1 1 \n",
"37 30 57 65 \n",
"38 470 487 512 \n",
"39 352 424 525 \n",
"\n",
"Years 2018-01-01 00:00:00 2019-01-01 00:00:00 2020-01-01 00:00:00 \\\n",
"0 16010 16128 16130 \n",
"1 1 1 1 \n",
"2 1629 1818 1834 \n",
"3 7 7 7 \n",
"4 4 4 4 \n",
"5 862 892 892 \n",
"6 535 597 597 \n",
"7 2648 2649 2649 \n",
"8 104 106 106 \n",
"9 28 28 28 \n",
"10 18 19 19 \n",
"11 0 5 5 \n",
"12 825 859 859 \n",
"13 390 391 391 \n",
"14 5 5 5 \n",
"15 0 7 7 \n",
"16 887 891 891 \n",
"17 5 7 7 \n",
"18 107 110 110 \n",
"19 264 271 271 \n",
"20 8 8 8 \n",
"21 21 38 38 \n",
"22 1988 2007 2008 \n",
"23 29 29 29 \n",
"24 30 30 30 \n",
"25 25 32 32 \n",
"26 2 2 2 \n",
"27 2148 2176 2176 \n",
"28 4044 4048 4051 \n",
"29 1056 1077 1077 \n",
"30 21 21 21 \n",
"31 1097 1097 1097 \n",
"32 29 29 29 \n",
"33 29 29 29 \n",
"34 2 2 2 \n",
"35 3 3 3 \n",
"36 1 1 1 \n",
"37 81 97 97 \n",
"38 524 536 536 \n",
"39 581 632 649 \n",
"\n",
"Years Latitude Longitude \n",
"0 39.783730 -100.445882 \n",
"1 -10.333333 -53.200000 \n",
"2 64.686314 97.745306 \n",
"3 36.574844 139.239418 \n",
"4 15.926666 107.965086 \n",
"5 51.163818 10.447831 \n",
"6 46.603354 1.888334 \n",
"7 54.702354 -3.276575 \n",
"8 42.638426 12.674297 \n",
"9 -28.816624 24.991639 \n",
"10 49.487197 31.271832 \n",
"11 39.326068 -4.837979 \n",
"12 52.215933 19.134422 \n",
"13 61.066692 -107.991707 \n",
"14 40.373661 127.087042 \n",
"15 45.985213 24.685923 \n",
"16 -24.776109 134.755000 \n",
"17 39.662165 -8.135352 \n",
"18 50.640281 4.666715 \n",
"19 49.743905 15.338106 \n",
"20 47.181759 19.506094 \n",
"21 53.425061 27.697136 \n",
"22 59.674971 14.520858 \n",
"23 47.593970 14.124560 \n",
"24 46.798562 8.231974 \n",
"25 31.394800 34.633583 \n",
"26 44.153412 20.551440 \n",
"27 55.670249 10.333328 \n",
"28 63.246778 25.920916 \n",
"29 61.152939 8.787665 \n",
"30 45.365844 15.657521 \n",
"31 -41.500083 172.834408 \n",
"32 52.865196 -7.979460 \n",
"33 55.350000 23.750000 \n",
"34 -32.875555 -56.020153 \n",
"35 56.840649 24.753764 \n",
"36 46.119944 14.815333 \n",
"37 58.752378 25.331908 \n",
"38 52.247650 5.541247 \n",
"39 42.586958 20.902123 \n",
"\n",
"[40 rows x 107 columns]"
]
},
"execution_count": 56,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"corg_clean2.reset_index()"
]
},
{
"cell_type": "markdown",
"id": "a2e7245a",
"metadata": {},
"source": [
"Now let's \"remelt\" the data into a format we can use for our plot:"
]
},
{
"cell_type": "code",
"execution_count": 57,
"id": "011a29fa",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" \n",
" Country \n",
" Longitude \n",
" Latitude \n",
" Year \n",
" cumulative_sum \n",
" \n",
" \n",
" \n",
" \n",
" 0 \n",
" United States \n",
" -100.445882 \n",
" 39.783730 \n",
" 1917-01-01 \n",
" 0 \n",
" \n",
" \n",
" 1 \n",
" Brazil \n",
" -53.200000 \n",
" -10.333333 \n",
" 1917-01-01 \n",
" 0 \n",
" \n",
" \n",
" 2 \n",
" Russia \n",
" 97.745306 \n",
" 64.686314 \n",
" 1917-01-01 \n",
" 0 \n",
" \n",
" \n",
" 3 \n",
" Japan \n",
" 139.239418 \n",
" 36.574844 \n",
" 1917-01-01 \n",
" 0 \n",
" \n",
" \n",
" 4 \n",
" Vietnam \n",
" 107.965086 \n",
" 15.926666 \n",
" 1917-01-01 \n",
" 0 \n",
" \n",
" \n",
"
\n",
"
"
],
"text/plain": [
" Country Longitude Latitude Year cumulative_sum\n",
"0 United States -100.445882 39.783730 1917-01-01 0\n",
"1 Brazil -53.200000 -10.333333 1917-01-01 0\n",
"2 Russia 97.745306 64.686314 1917-01-01 0\n",
"3 Japan 139.239418 36.574844 1917-01-01 0\n",
"4 Vietnam 107.965086 15.926666 1917-01-01 0"
]
},
"execution_count": 57,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"corg_melt = corg_clean2.reset_index().melt(['Country','Longitude','Latitude'], \n",
" var_name='Year', value_name='cumulative_sum')\n",
"corg_melt.head()"
]
},
{
"cell_type": "markdown",
"id": "b9dd0f22",
"metadata": {},
"source": [
"Now we can make a little [slider in Altair](https://altair-viz.github.io/user_guide/interactions.html#selection-values-in-expressions) to change the date range for our circles interactively:"
]
},
{
"cell_type": "code",
"execution_count": 58,
"id": "b5ea8d6a",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(Timestamp('1917-01-01 00:00:00'), Timestamp('2020-01-01 00:00:00'))"
]
},
"execution_count": 58,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"corg_melt['Year'].min(), corg_melt['Year'].max()"
]
},
{
"cell_type": "markdown",
"id": "ccfc3675",
"metadata": {},
"source": [
"Since sliders (at least at the time of writing) [can't have datetime inputs](https://stackoverflow.com/questions/62046930/altair-adding-date-slider-for-interactive-scatter-chart-pot) let's cheat a bit by making another column called \"year_int\":"
]
},
{
"cell_type": "code",
"execution_count": 59,
"id": "5724da74",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" \n",
" Country \n",
" Longitude \n",
" Latitude \n",
" Year \n",
" cumulative_sum \n",
" year_int \n",
" \n",
" \n",
" \n",
" \n",
" 0 \n",
" United States \n",
" -100.445882 \n",
" 39.783730 \n",
" 1917-01-01 \n",
" 0 \n",
" 1917 \n",
" \n",
" \n",
" 1 \n",
" Brazil \n",
" -53.200000 \n",
" -10.333333 \n",
" 1917-01-01 \n",
" 0 \n",
" 1917 \n",
" \n",
" \n",
" 2 \n",
" Russia \n",
" 97.745306 \n",
" 64.686314 \n",
" 1917-01-01 \n",
" 0 \n",
" 1917 \n",
" \n",
" \n",
" 3 \n",
" Japan \n",
" 139.239418 \n",
" 36.574844 \n",
" 1917-01-01 \n",
" 0 \n",
" 1917 \n",
" \n",
" \n",
" 4 \n",
" Vietnam \n",
" 107.965086 \n",
" 15.926666 \n",
" 1917-01-01 \n",
" 0 \n",
" 1917 \n",
" \n",
" \n",
"
\n",
"
"
],
"text/plain": [
" Country Longitude Latitude Year cumulative_sum year_int\n",
"0 United States -100.445882 39.783730 1917-01-01 0 1917\n",
"1 Brazil -53.200000 -10.333333 1917-01-01 0 1917\n",
"2 Russia 97.745306 64.686314 1917-01-01 0 1917\n",
"3 Japan 139.239418 36.574844 1917-01-01 0 1917\n",
"4 Vietnam 107.965086 15.926666 1917-01-01 0 1917"
]
},
"execution_count": 59,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"corg_melt['year_int'] = corg_melt['Year'].dt.year.astype('int')\n",
"corg_melt.head()"
]
},
{
"cell_type": "markdown",
"id": "acbfe05d",
"metadata": {},
"source": [
"Note here to that for [equity selections we don't have to use == signs](https://stackoverflow.com/questions/68071713/in-altair-equality-condition-doesnt-work) in Altair (it won't work... just for fun I guess)."
]
},
{
"cell_type": "code",
"execution_count": 60,
"id": "07fa6c96-851c-4eab-a01b-bfdc496a58e7",
"metadata": {},
"outputs": [],
"source": [
"alt.selection_point?"
]
},
{
"cell_type": "code",
"execution_count": 61,
"id": "56195c43",
"metadata": {},
"outputs": [],
"source": [
"slider = alt.binding_range(min=corg_melt['year_int'].min(), \n",
" max=corg_melt['year_int'].max(), step=1, name='Max year:')\n",
"#selector = alt.selection_single(name=\"SelectorName\", fields=['cutoff'],\n",
"# bind=slider, init={'cutoff': 2000})\n",
"# selector = alt.selection_single(name=\"SelectorName\", fields=['year_int'],\n",
"# bind=slider, init={'year_int': 2000})\n",
"selector = alt.selection_point(name=\"SelectorName\", fields=['year_int'],\n",
" bind=slider, value=2000)"
]
},
{
"cell_type": "code",
"execution_count": 62,
"id": "1ded3b87",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
""
],
"text/plain": [
"alt.LayerChart(...)"
]
},
"execution_count": 62,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"geo = alt.topo_feature(data.world_110m.url, feature='countries')\n",
"\n",
"# US states background\n",
"world = alt.Chart(geo).mark_geoshape(\n",
" fill='lightgray',\n",
" stroke='white'\n",
").properties(\n",
" width=800,\n",
" height=500\n",
").project('equirectangular') # note we have a few projections we can use!\n",
"\n",
"points = alt.Chart(corg_melt).mark_circle().encode(\n",
" longitude='Longitude:Q',\n",
" latitude='Latitude:Q',\n",
" #size=alt.Size('Total Corg:Q',scale=alt.Scale(type='log')),\n",
" size=alt.condition(\n",
" #((alt.datum.year_int < selector.cutoff-10)&(alt.datum.year_int >= selector.cutoff)),\n",
" #(alt.datum.year_int == selector.cutoff),\n",
" #\"datum.year_int == selector.cutoff\",\n",
" #alt.expr.datum['year_int'] < selector.cutoff,\n",
" selector,\n",
" alt.Size('cumulative_sum:Q',scale=None), alt.value(0)\n",
" ),\n",
" tooltip='Country',\n",
"# ).add_selection(\n",
"# selector\n",
"# )\n",
").add_params(\n",
" selector\n",
")\n",
"\n",
"world + points"
]
},
{
"cell_type": "markdown",
"id": "24ddbe0b",
"metadata": {},
"source": [
"One final thing, let's add in some info about what each dot means in our tooltip:"
]
},
{
"cell_type": "code",
"execution_count": 63,
"id": "1fdbcbdf",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
""
],
"text/plain": [
"alt.LayerChart(...)"
]
},
"execution_count": 63,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"geo = alt.topo_feature(data.world_110m.url, feature='countries')\n",
"\n",
"# US states background\n",
"world = alt.Chart(geo).mark_geoshape(\n",
" fill='lightgray',\n",
" stroke='black'\n",
").properties(\n",
" width=800,\n",
" height=500\n",
"#).project('mercator') # note we have a few projections we can use!\n",
").project('equirectangular') # note we have a few projections we can use!\n",
"\n",
"points = alt.Chart(corg_melt).mark_circle().encode(\n",
" longitude='Longitude:Q',\n",
" latitude='Latitude:Q',\n",
" #size=alt.Size('Total Corg:Q',scale=alt.Scale(type='log')),\n",
" size=alt.condition(\n",
" #((alt.datum.year_int < selector.year_int-10)&(alt.datum.year_int >= selector.year_int)),\n",
" #(alt.datum.year_int < selector.year_int),\n",
" #\"datum.year_int == selector.year_int\",\n",
" #alt.expr.datum['year_int'] < selector.year_int,\n",
" selector,\n",
" alt.Size('cumulative_sum:Q',scale=None), alt.value(0)\n",
" ),\n",
" tooltip=['Country','cumulative_sum'],\n",
").add_params(\n",
" selector\n",
")\n",
"\n",
"world + points"
]
},
{
"cell_type": "markdown",
"id": "2c24cdbf",
"metadata": {},
"source": [
"Looks nice! Let's save it:"
]
},
{
"cell_type": "code",
"execution_count": 64,
"id": "543eb7c1",
"metadata": {},
"outputs": [],
"source": [
"myJekyllDir = '/Users/jnaiman/jnaiman.github.io/assets/json/'\n",
"\n",
"# note: for PL usages\n",
"# myJekyllDir = './'\n",
"# and then you need to download"
]
},
{
"cell_type": "code",
"execution_count": 65,
"id": "0e4e3f7e",
"metadata": {},
"outputs": [],
"source": [
"chart_out = world + points\n",
"\n",
"chart_out.properties(width='container').save(myJekyllDir+\"corgis_dotchart_world.json\") "
]
},
{
"cell_type": "markdown",
"id": "3794475b",
"metadata": {},
"source": [
"We note that when we run this though, we get a few artifacts. We can try to \"smooth\" the transitions with a bit of interpolation:"
]
},
{
"cell_type": "code",
"execution_count": 66,
"id": "65c23b3d",
"metadata": {},
"outputs": [],
"source": [
"alt.selection_point?"
]
},
{
"cell_type": "code",
"execution_count": 67,
"id": "d0c34d9d",
"metadata": {},
"outputs": [],
"source": [
"slider = alt.binding_range(min=corg_melt['year_int'].min(), \n",
" max=corg_melt['year_int'].max(), step=1, name='Max year:')\n",
"# selector = alt.selection_single(name=\"SelectorName\", fields=['year_int'],\n",
"# bind=slider, init={'year_int': corg_melt['year_int'].min()},\n",
"# nearest=True)\n",
"\n",
"selector = alt.selection_point(name=\"SelectorName\", fields=['year_int'],\n",
" bind=slider, value=corg_melt['year_int'].min(), #init={'year_int': corg_melt['year_int'].min()},\n",
" nearest=True)"
]
},
{
"cell_type": "code",
"execution_count": 68,
"id": "4467f02f",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
""
],
"text/plain": [
"alt.LayerChart(...)"
]
},
"execution_count": 68,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"geo = alt.topo_feature(data.world_110m.url, feature='countries')\n",
"\n",
"# US states background\n",
"world = alt.Chart(geo).mark_geoshape(\n",
" fill='lightgray',\n",
" stroke='black'\n",
").properties(\n",
" width=800,\n",
" height=500\n",
"#).project('mercator') # note we have a few projections we can use!\n",
").project('equirectangular') # note we have a few projections we can use!\n",
"\n",
"points = alt.Chart(corg_melt).mark_circle().encode(\n",
" longitude='Longitude:Q',\n",
" latitude='Latitude:Q',\n",
" #size=alt.Size('Total Corg:Q',scale=alt.Scale(type='log')),\n",
" size=alt.condition(\n",
" #((alt.datum.year_int < selector.year_int-10)&(alt.datum.year_int >= selector.year_int)),\n",
" #(alt.datum.year_int < selector.year_int),\n",
" #\"datum.year_int == selector.year_int\",\n",
" #alt.expr.datum['year_int'] < selector.year_int,\n",
" selector,\n",
" alt.Size('cumulative_sum:Q',scale=None), alt.value(0)\n",
" ),\n",
" tooltip=['Country','cumulative_sum'],\n",
").add_params(\n",
" selector\n",
"#).add_selection(\n",
"# selector\n",
")\n",
"\n",
"world + points"
]
},
{
"cell_type": "code",
"execution_count": 69,
"id": "925dd8fc",
"metadata": {},
"outputs": [],
"source": [
"chart_out = world + points\n",
"\n",
"chart_out.properties(width='container').save(myJekyllDir+\"corgis_dotchart_world_smooth.json\") "
]
},
{
"cell_type": "markdown",
"id": "240f2707",
"metadata": {},
"source": [
"Groovy!\n",
"\n",
"One thing to note here is how much of the data cleaning and transformation we ended up doing in Python. In theory one probably *could* do this in Altair/vega-lite, but not without a lot of headache and in Python, we have the option of checking each \"stage\" of our data transformation so we can make sure it makes sense -- in vega-lite/Altair, we don't really have this option (as easily)."
]
},
{
"cell_type": "markdown",
"id": "e385a097",
"metadata": {},
"source": [
"### Corgi data and choropleth\n",
"\n",
"One final thing (well, not final final, there are infinite things we can do!) is to instead of plotting points on a map, we can color the a map of the world by the population of corgis at a particular time. \n",
"\n",
"This is called a [choropleth map](https://altair-viz.github.io/gallery/choropleth.html), and this is probably the last time I will EVER spell that correctly :D \n",
"\n",
"These can be [a little tricky in Altair](https://altair-viz.github.io/altair-tutorial/notebooks/09-Geographic-plots.html#colored-choropleths) since you have to map between pre-determined names of countries (as stored in the vegadataset world map) and however your data is stored."
]
},
{
"cell_type": "markdown",
"id": "a5583f3d",
"metadata": {},
"source": [
"Let's start with just coloring our mappable data based on the total corgis born. First, based on the [documentation about how to do this](https://altair-viz.github.io/altair-tutorial/notebooks/09-Geographic-plots.html#colored-choropleths) we know that we have to match up the world-map ID with whatever ID for each country as listed in our dataset. There are also some [other transformation-related things to be aware of](https://stackoverflow.com/questions/59224026/how-to-add-a-slider-to-a-choropleth-in-altair) that we'll cover after we deal with the ID look up stuff.\n",
"\n",
"\n",
"Let's dig a bit deeper with geopandas:"
]
},
{
"cell_type": "code",
"execution_count": 70,
"id": "e2d642c1-cf9d-4473-8af9-771469e107b1",
"metadata": {},
"outputs": [],
"source": [
"# import fiona\n",
"# fiona.supported_drivers"
]
},
{
"cell_type": "code",
"execution_count": 71,
"id": "4393182c",
"metadata": {},
"outputs": [],
"source": [
"import geopandas"
]
},
{
"cell_type": "code",
"execution_count": 72,
"id": "f350e176",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'https://cdn.jsdelivr.net/npm/vega-datasets@v1.29.0/data/world-110m.json'"
]
},
"execution_count": 72,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"data.world_110m.url"
]
},
{
"cell_type": "code",
"execution_count": 73,
"id": "3b7784d1",
"metadata": {},
"outputs": [],
"source": [
"#gdf = geopandas.read_file(data.world_110m.url,include_fields=['name'],layer='countries', driver='GeoJSON')\n",
"gdf = geopandas.read_file(data.world_110m.url,layer='countries')"
]
},
{
"cell_type": "code",
"execution_count": 74,
"id": "822936ff",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" \n",
" id \n",
" geometry \n",
" \n",
" \n",
" \n",
" \n",
" 0 \n",
" 4 \n",
" POLYGON ((61.20961 35.64925, 62.23202 35.27011... \n",
" \n",
" \n",
" 1 \n",
" 24 \n",
" MULTIPOLYGON (((23.91324 -10.92658, 24.01764 -... \n",
" \n",
" \n",
" 2 \n",
" 8 \n",
" POLYGON ((20.59041 41.85586, 20.46440 41.51565... \n",
" \n",
" \n",
" 3 \n",
" 784 \n",
" POLYGON ((51.57952 24.24479, 51.75592 24.29387... \n",
" \n",
" \n",
" 4 \n",
" 32 \n",
" MULTIPOLYGON (((-66.95887 -54.89756, -67.56368... \n",
" \n",
" \n",
" ... \n",
" ... \n",
" ... \n",
" \n",
" \n",
" 172 \n",
" 548 \n",
" MULTIPOLYGON (((167.51508 -16.59835, 167.18027... \n",
" \n",
" \n",
" 173 \n",
" 887 \n",
" POLYGON ((52.38592 16.38285, 52.19152 15.93771... \n",
" \n",
" \n",
" 174 \n",
" 710 \n",
" POLYGON ((28.21888 -32.77244, 27.46287 -33.227... \n",
" \n",
" \n",
" 175 \n",
" 894 \n",
" POLYGON ((32.75853 -9.23064, 33.23013 -9.67747... \n",
" \n",
" \n",
" 176 \n",
" 716 \n",
" POLYGON ((31.19251 -22.25149, 30.65971 -22.151... \n",
" \n",
" \n",
"
\n",
"
177 rows × 2 columns
\n",
"
"
],
"text/plain": [
" id geometry\n",
"0 4 POLYGON ((61.20961 35.64925, 62.23202 35.27011...\n",
"1 24 MULTIPOLYGON (((23.91324 -10.92658, 24.01764 -...\n",
"2 8 POLYGON ((20.59041 41.85586, 20.46440 41.51565...\n",
"3 784 POLYGON ((51.57952 24.24479, 51.75592 24.29387...\n",
"4 32 MULTIPOLYGON (((-66.95887 -54.89756, -67.56368...\n",
".. ... ...\n",
"172 548 MULTIPOLYGON (((167.51508 -16.59835, 167.18027...\n",
"173 887 POLYGON ((52.38592 16.38285, 52.19152 15.93771...\n",
"174 710 POLYGON ((28.21888 -32.77244, 27.46287 -33.227...\n",
"175 894 POLYGON ((32.75853 -9.23064, 33.23013 -9.67747...\n",
"176 716 POLYGON ((31.19251 -22.25149, 30.65971 -22.151...\n",
"\n",
"[177 rows x 2 columns]"
]
},
"execution_count": 74,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"gdf"
]
},
{
"cell_type": "markdown",
"id": "2f520b97",
"metadata": {},
"source": [
"So, here we see that there is this ID -- this matches up with each country, for example:"
]
},
{
"cell_type": "code",
"execution_count": 75,
"id": "773ec570",
"metadata": {},
"outputs": [
{
"data": {
"image/svg+xml": [
" "
],
"text/plain": [
""
]
},
"execution_count": 75,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"gdf.iloc[0]['geometry']"
]
},
{
"cell_type": "code",
"execution_count": 76,
"id": "894e919d",
"metadata": {},
"outputs": [
{
"data": {
"image/svg+xml": [
" "
],
"text/plain": [
""
]
},
"execution_count": 76,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"gdf.iloc[1]['geometry']"
]
},
{
"cell_type": "markdown",
"id": "229e1e69",
"metadata": {},
"source": [
"Are both countries! But how to find out which ones?"
]
},
{
"cell_type": "markdown",
"id": "f59b2829",
"metadata": {},
"source": [
"To do that, we have to map the ID's to their [world country codes](https://documentation-resources.opendatasoft.com/explore/dataset/natural-earth-countries-110m/information/). Luckily, that is [already done for us](https://github.com/alisle/world-110m-country-codes)."
]
},
{
"cell_type": "code",
"execution_count": 77,
"id": "33b6169f",
"metadata": {},
"outputs": [],
"source": [
"country_codes = pd.read_json('https://raw.githubusercontent.com/alisle/world-110m-country-codes/master/world-110m-country-codes.json')"
]
},
{
"cell_type": "code",
"execution_count": 78,
"id": "67f8b834",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" \n",
" code \n",
" id \n",
" name \n",
" \n",
" \n",
" \n",
" \n",
" 0 \n",
" AF \n",
" 4 \n",
" Afghanistan \n",
" \n",
" \n",
" 1 \n",
" AL \n",
" 8 \n",
" Albania \n",
" \n",
" \n",
" 2 \n",
" DZ \n",
" 12 \n",
" Algeria \n",
" \n",
" \n",
" 3 \n",
" AO \n",
" 24 \n",
" Angola \n",
" \n",
" \n",
" 4 \n",
" AQ \n",
" 10 \n",
" Antarctica \n",
" \n",
" \n",
"
\n",
"
"
],
"text/plain": [
" code id name\n",
"0 AF 4 Afghanistan\n",
"1 AL 8 Albania\n",
"2 DZ 12 Algeria\n",
"3 AO 24 Angola\n",
"4 AQ 10 Antarctica"
]
},
"execution_count": 78,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"country_codes.head()"
]
},
{
"cell_type": "code",
"execution_count": 79,
"id": "9815d0cf",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" \n",
" code \n",
" name \n",
" \n",
" \n",
" id \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" 4 \n",
" AF \n",
" Afghanistan \n",
" \n",
" \n",
" 8 \n",
" AL \n",
" Albania \n",
" \n",
" \n",
" 12 \n",
" DZ \n",
" Algeria \n",
" \n",
" \n",
" 24 \n",
" AO \n",
" Angola \n",
" \n",
" \n",
" 10 \n",
" AQ \n",
" Antarctica \n",
" \n",
" \n",
"
\n",
"
"
],
"text/plain": [
" code name\n",
"id \n",
"4 AF Afghanistan\n",
"8 AL Albania\n",
"12 DZ Algeria\n",
"24 AO Angola\n",
"10 AQ Antarctica"
]
},
"execution_count": 79,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"country_codes = country_codes.set_index('id')\n",
"country_codes.head()"
]
},
{
"cell_type": "code",
"execution_count": 80,
"id": "d8faba63",
"metadata": {},
"outputs": [],
"source": [
"gdf['id'] = gdf['id'].astype('int64')"
]
},
{
"cell_type": "code",
"execution_count": 81,
"id": "030f9911",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0 4\n",
"1 24\n",
"2 8\n",
"3 784\n",
"4 32\n",
"Name: id, dtype: int64"
]
},
"execution_count": 81,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"gdf['id'].head()"
]
},
{
"cell_type": "markdown",
"id": "a8009388",
"metadata": {},
"source": [
"Let's merge these two tables together by the country ID so that we can get the name."
]
},
{
"cell_type": "code",
"execution_count": 82,
"id": "7f1b4a8a",
"metadata": {},
"outputs": [],
"source": [
"gdf_comb = gdf.join(country_codes, on='id')"
]
},
{
"cell_type": "code",
"execution_count": 84,
"id": "3387468c",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" \n",
" id \n",
" geometry \n",
" code \n",
" name \n",
" \n",
" \n",
" \n",
" \n",
" 0 \n",
" 4 \n",
" POLYGON ((61.20961 35.64925, 62.23202 35.27011... \n",
" AF \n",
" Afghanistan \n",
" \n",
" \n",
" 1 \n",
" 24 \n",
" MULTIPOLYGON (((23.91324 -10.92658, 24.01764 -... \n",
" AO \n",
" Angola \n",
" \n",
" \n",
" 2 \n",
" 8 \n",
" POLYGON ((20.59041 41.85586, 20.46440 41.51565... \n",
" AL \n",
" Albania \n",
" \n",
" \n",
" 3 \n",
" 784 \n",
" POLYGON ((51.57952 24.24479, 51.75592 24.29387... \n",
" AE \n",
" United Arab Emirates \n",
" \n",
" \n",
" 4 \n",
" 32 \n",
" MULTIPOLYGON (((-66.95887 -54.89756, -67.56368... \n",
" AR \n",
" Argentina \n",
" \n",
" \n",
"
\n",
"
"
],
"text/plain": [
" id geometry code \\\n",
"0 4 POLYGON ((61.20961 35.64925, 62.23202 35.27011... AF \n",
"1 24 MULTIPOLYGON (((23.91324 -10.92658, 24.01764 -... AO \n",
"2 8 POLYGON ((20.59041 41.85586, 20.46440 41.51565... AL \n",
"3 784 POLYGON ((51.57952 24.24479, 51.75592 24.29387... AE \n",
"4 32 MULTIPOLYGON (((-66.95887 -54.89756, -67.56368... AR \n",
"\n",
" name \n",
"0 Afghanistan \n",
"1 Angola \n",
"2 Albania \n",
"3 United Arab Emirates \n",
"4 Argentina "
]
},
"execution_count": 84,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"gdf_comb.head()"
]
},
{
"cell_type": "markdown",
"id": "315eb159",
"metadata": {},
"source": [
"Let's re-load our data to start from \"clean\":"
]
},
{
"cell_type": "code",
"execution_count": 86,
"id": "c63d4c21",
"metadata": {},
"outputs": [],
"source": [
"corgs = pd.read_csv('https://raw.githubusercontent.com/UIUC-iSchool-DataViz/is445_data/main/corgs_per_country_over_time_columns_2020.csv')"
]
},
{
"cell_type": "markdown",
"id": "e02e38be",
"metadata": {},
"source": [
"Clean this data with known naming issues:"
]
},
{
"cell_type": "code",
"execution_count": 87,
"id": "67c50b6f",
"metadata": {},
"outputs": [],
"source": [
"corg_clean = corgs.rename(columns={'Netherlands Antilles':\"Netherlands\"})"
]
},
{
"cell_type": "markdown",
"id": "ee5cc513",
"metadata": {},
"source": [
"And transpose it:"
]
},
{
"cell_type": "code",
"execution_count": 88,
"id": "98adf91d",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" Years \n",
" Country \n",
" 1917 \n",
" 1918 \n",
" 1919 \n",
" 1920 \n",
" 1921 \n",
" 1922 \n",
" 1923 \n",
" 1924 \n",
" 1925 \n",
" ... \n",
" 2011 \n",
" 2012 \n",
" 2013 \n",
" 2014 \n",
" 2015 \n",
" 2016 \n",
" 2017 \n",
" 2018 \n",
" 2019 \n",
" 2020 \n",
" \n",
" \n",
" \n",
" \n",
" 0 \n",
" United States \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" ... \n",
" 408 \n",
" 431 \n",
" 376 \n",
" 280 \n",
" 301 \n",
" 229 \n",
" 213 \n",
" 198 \n",
" 118 \n",
" 2 \n",
" \n",
" \n",
" 1 \n",
" Brazil \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" ... \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" \n",
" \n",
" 2 \n",
" Russia \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" ... \n",
" 89 \n",
" 82 \n",
" 115 \n",
" 127 \n",
" 237 \n",
" 217 \n",
" 260 \n",
" 329 \n",
" 189 \n",
" 16 \n",
" \n",
" \n",
" 3 \n",
" Japan \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" ... \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 2 \n",
" 2 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" \n",
" \n",
" 4 \n",
" Vietnam \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" ... \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 1 \n",
" 3 \n",
" 0 \n",
" 0 \n",
" \n",
" \n",
"
\n",
"
5 rows × 105 columns
\n",
"
"
],
"text/plain": [
"Years Country 1917 1918 1919 1920 1921 1922 1923 1924 1925 \\\n",
"0 United States 0 0 0 0 0 0 0 0 0 \n",
"1 Brazil 0 0 0 0 0 0 0 0 0 \n",
"2 Russia 0 0 0 0 0 0 0 0 0 \n",
"3 Japan 0 0 0 0 0 0 0 0 0 \n",
"4 Vietnam 0 0 0 0 0 0 0 0 0 \n",
"\n",
"Years ... 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 \n",
"0 ... 408 431 376 280 301 229 213 198 118 2 \n",
"1 ... 0 0 0 0 0 0 0 0 0 0 \n",
"2 ... 89 82 115 127 237 217 260 329 189 16 \n",
"3 ... 0 0 0 0 0 2 2 0 0 0 \n",
"4 ... 0 0 0 0 0 0 1 3 0 0 \n",
"\n",
"[5 rows x 105 columns]"
]
},
"execution_count": 88,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"corg_clean_t = corg_clean.set_index('Years').T\n",
"corg_clean_t.index.name = 'Country'\n",
"corg_clean2 = corg_clean_t.reset_index()\n",
"corg_clean2.head()"
]
},
{
"cell_type": "markdown",
"id": "3e3a2ed6",
"metadata": {},
"source": [
"We could turn this into a cumulative sum, but instead, let's leave as is this time and focus on making a chorolpeth plot that shows the corgis born *in that year* instead of a cumulative sum."
]
},
{
"cell_type": "markdown",
"id": "ca5cd0ad",
"metadata": {},
"source": [
"Now let's see how many of these columns match up with the names that we have in our corgi table:"
]
},
{
"cell_type": "code",
"execution_count": 89,
"id": "b624152c",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array(['United States', 'Brazil', 'Russia', 'Japan', 'Vietnam', 'Germany',\n",
" 'France', 'United Kingdom', 'Italy', 'South Africa', 'Ukraine',\n",
" 'Spain', 'Poland', 'Canada', 'Korea, North', 'Romania',\n",
" 'Australia', 'Portugal', 'Belgium', 'Czech Republic', 'Hungary',\n",
" 'Belarus', 'Sweden', 'Austria', 'Switzerland', 'Israel', 'Serbia',\n",
" 'Denmark', 'Finland', 'Norway', 'Croatia', 'New Zealand',\n",
" 'Ireland', 'Lithuania', 'Uruguay', 'Latvia', 'Slovenia', 'Estonia',\n",
" 'Netherlands', 'Kosovo'], dtype=object)"
]
},
"execution_count": 89,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"corg_clean2['Country'].unique()"
]
},
{
"cell_type": "code",
"execution_count": 90,
"id": "6c382c1c",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"no Russia\n",
"no Vietnam\n",
"no Korea, North\n",
"no Kosovo\n"
]
}
],
"source": [
"for c in corg_clean2['Country'].unique():\n",
" if c not in gdf_comb['name'].values:\n",
" print('no',c)"
]
},
{
"cell_type": "markdown",
"id": "d572b6ca",
"metadata": {},
"source": [
"So, once again, it seems that we are missing some names. Let's see if we can get close with some fuzzy matching:"
]
},
{
"cell_type": "code",
"execution_count": 91,
"id": "52b28496",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array(['Afghanistan', 'Angola', 'Albania', 'United Arab Emirates',\n",
" 'Argentina', 'Armenia', 'Antarctica',\n",
" 'French Southern Territories', 'Australia', 'Austria',\n",
" 'Azerbaijan', 'Burundi', 'Belgium', 'Benin', 'Burkina Faso',\n",
" 'Bangladesh', 'Bulgaria', 'Bahamas', 'Bosnia and Herzegovina',\n",
" 'Belarus', 'Belize', 'Bolivia, Plurinational State of', 'Brazil',\n",
" 'Brunei Darussalam', 'Bhutan', 'Botswana',\n",
" 'Central African Republic', 'Canada', 'Switzerland', 'Chile',\n",
" 'China', \"Cote d'Ivoire\", 'Cameroon',\n",
" 'Congo, the Democratic Republic of the', 'Congo', 'Colombia',\n",
" 'Costa Rica', 'Cuba', nan, 'Cyprus', 'Czech Republic', 'Germany',\n",
" 'Djibouti', 'Denmark', 'Dominican Republic', 'Algeria', 'Ecuador',\n",
" 'Egypt', 'Eritrea', 'Spain', 'Estonia', 'Ethiopia', 'Finland',\n",
" 'Fiji', 'Falkland Islands (Malvinas)', 'France', 'Gabon',\n",
" 'United Kingdom', 'Georgia', 'Ghana', 'Guinea', 'Gambia',\n",
" 'Guinea-Bissau', 'Equatorial Guinea', 'Greece', 'Greenland',\n",
" 'Guatemala', 'Guyana', 'Honduras', 'Croatia', 'Haiti', 'Hungary',\n",
" 'Indonesia', 'India', 'Ireland', 'Iran, Islamic Republic of',\n",
" 'Iraq', 'Iceland', 'Israel', 'Italy', 'Jamaica', 'Jordan', 'Japan',\n",
" 'Kazakhstan', 'Kenya', 'Kyrgyzstan', 'Cambodia',\n",
" 'Korea, Republic of', nan, 'Kuwait',\n",
" \"Lao People's Democratic Republic\", 'Lebanon', 'Liberia', 'Libya',\n",
" 'Sri Lanka', 'Lesotho', 'Lithuania', 'Luxembourg', 'Latvia',\n",
" 'Morocco', 'Moldova, Republic of', 'Madagascar', 'Mexico',\n",
" 'Macedonia, the former Yugoslav Republic of', 'Mali', 'Myanmar',\n",
" 'Montenegro', 'Mongolia', 'Mozambique', 'Mauritania', 'Malawi',\n",
" 'Malaysia', 'Namibia', 'New Caledonia', 'Niger', 'Nigeria',\n",
" 'Nicaragua', 'Netherlands', 'Norway', 'Nepal', 'New Zealand',\n",
" 'Oman', 'Pakistan', 'Panama', 'Peru', 'Philippines',\n",
" 'Papua New Guinea', 'Poland', 'Puerto Rico',\n",
" \"Korea, Democratic People's Republic of\", 'Portugal', 'Paraguay',\n",
" nan, 'Qatar', 'Romania', 'Russian Federation', 'Rwanda',\n",
" 'Western Sahara', 'Saudi Arabia', 'Sudan', 'South Sudan',\n",
" 'Senegal', 'Solomon Islands', 'Sierra Leone', 'El Salvador', nan,\n",
" 'Somalia', 'Serbia', 'Suriname', 'Slovakia', 'Slovenia', 'Sweden',\n",
" 'Swaziland', 'Syrian Arab Republic', 'Chad', 'Togo', 'Thailand',\n",
" 'Tajikistan', 'Turkmenistan', 'Timor-Leste', 'Trinidad and Tobago',\n",
" 'Tunisia', 'Turkey', 'Taiwan, Province of China',\n",
" 'Tanzania, United Republic of', 'Uganda', 'Ukraine', 'Uruguay',\n",
" 'United States', 'Uzbekistan', 'Venezuela, Bolivarian Republic of',\n",
" 'Viet Nam', 'Vanuatu', 'Yemen', 'South Africa', 'Zambia',\n",
" 'Zimbabwe'], dtype=object)"
]
},
"execution_count": 91,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"gdf_comb['name'].values"
]
},
{
"cell_type": "code",
"execution_count": 92,
"id": "0bcec1ef",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" \n",
" id \n",
" geometry \n",
" code \n",
" name \n",
" \n",
" \n",
" \n",
" \n",
" 168 \n",
" 840 \n",
" MULTIPOLYGON (((-155.68896 18.91661, -155.9373... \n",
" US \n",
" United States \n",
" \n",
" \n",
"
\n",
"
"
],
"text/plain": [
" id geometry code \\\n",
"168 840 MULTIPOLYGON (((-155.68896 18.91661, -155.9373... US \n",
"\n",
" name \n",
"168 United States "
]
},
"execution_count": 92,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"gdf_comb.loc[gdf_comb['name']=='United States']"
]
},
{
"cell_type": "code",
"execution_count": 93,
"id": "69131f0a",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Russia Russian Federation\n",
"found ['Russian Federation'] for Russia\n",
"Vietnam Viet Nam\n",
"found ['Viet Nam'] for Vietnam\n",
"no match Korea, North\n",
"no match Kosovo\n"
]
}
],
"source": [
"for c in corg_clean2['Country'].unique():\n",
" if c not in gdf_comb['name'].values: # if not in there, look for fuzzy\n",
" #print('no',c)\n",
" country_match = []\n",
" for cc in gdf_comb['name'].values:\n",
" #if c in cc: # there is an NaN\n",
" if type(cc)==str:\n",
" c2 = \"\".join(c.split()).lower()\n",
" cc2 = \"\".join(cc.split()).lower()\n",
" if c2 in cc2:\n",
" country_match.append(cc)\n",
" print(c,cc)\n",
" if len(country_match) >0:\n",
" print('found', country_match, 'for',c)\n",
" else:\n",
" print('no match',c)\n",
" "
]
},
{
"cell_type": "code",
"execution_count": 94,
"id": "c4cdfe95",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"found ['Russian Federation'] for Russia\n",
"found ['Viet Nam'] for Vietnam\n",
"no match Korea, North\n",
"no match Kosovo\n"
]
}
],
"source": [
"# store these names to be the same as in our dataset\n",
"for c in corg_clean2['Country'].unique():\n",
" if c not in gdf_comb['name'].values: # if not in there, look for fuzzy\n",
" #print('no',c)\n",
" country_match = []\n",
" for cc in gdf_comb['name'].values:\n",
" #if c in cc: # there is an NaN\n",
" if type(cc)==str:\n",
" c2 = \"\".join(c.split()).lower()\n",
" cc2 = \"\".join(cc.split()).lower()\n",
" if c2 in cc2:\n",
" country_match.append(cc)\n",
" #print(c,cc)\n",
" if len(country_match) >0:\n",
" print('found', country_match, 'for',c)\n",
" if len(country_match)==1: # only one\n",
" gdf_comb.loc[gdf_comb['name']==country_match[0],'name'] = c # replace\n",
" else:\n",
" print('no match',c)"
]
},
{
"cell_type": "markdown",
"id": "7a2cca7a",
"metadata": {},
"source": [
"Missing ids for North Korea and Kosovo, in our original corgi dataset, but let's add the IDs that we can to our corgi dataset:"
]
},
{
"cell_type": "code",
"execution_count": 95,
"id": "f4485747-5bdf-4e6a-9700-8b707f47418e",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" Years \n",
" Country \n",
" 1917 \n",
" 1918 \n",
" 1919 \n",
" 1920 \n",
" 1921 \n",
" 1922 \n",
" 1923 \n",
" 1924 \n",
" 1925 \n",
" ... \n",
" 2011 \n",
" 2012 \n",
" 2013 \n",
" 2014 \n",
" 2015 \n",
" 2016 \n",
" 2017 \n",
" 2018 \n",
" 2019 \n",
" 2020 \n",
" \n",
" \n",
" \n",
" \n",
" 0 \n",
" United States \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" ... \n",
" 408 \n",
" 431 \n",
" 376 \n",
" 280 \n",
" 301 \n",
" 229 \n",
" 213 \n",
" 198 \n",
" 118 \n",
" 2 \n",
" \n",
" \n",
" 1 \n",
" Brazil \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" ... \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" \n",
" \n",
" 2 \n",
" Russia \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" ... \n",
" 89 \n",
" 82 \n",
" 115 \n",
" 127 \n",
" 237 \n",
" 217 \n",
" 260 \n",
" 329 \n",
" 189 \n",
" 16 \n",
" \n",
" \n",
" 3 \n",
" Japan \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" ... \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 2 \n",
" 2 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" \n",
" \n",
" 4 \n",
" Vietnam \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" ... \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 1 \n",
" 3 \n",
" 0 \n",
" 0 \n",
" \n",
" \n",
"
\n",
"
5 rows × 105 columns
\n",
"
"
],
"text/plain": [
"Years Country 1917 1918 1919 1920 1921 1922 1923 1924 1925 \\\n",
"0 United States 0 0 0 0 0 0 0 0 0 \n",
"1 Brazil 0 0 0 0 0 0 0 0 0 \n",
"2 Russia 0 0 0 0 0 0 0 0 0 \n",
"3 Japan 0 0 0 0 0 0 0 0 0 \n",
"4 Vietnam 0 0 0 0 0 0 0 0 0 \n",
"\n",
"Years ... 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 \n",
"0 ... 408 431 376 280 301 229 213 198 118 2 \n",
"1 ... 0 0 0 0 0 0 0 0 0 0 \n",
"2 ... 89 82 115 127 237 217 260 329 189 16 \n",
"3 ... 0 0 0 0 0 2 2 0 0 0 \n",
"4 ... 0 0 0 0 0 0 1 3 0 0 \n",
"\n",
"[5 rows x 105 columns]"
]
},
"execution_count": 95,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"corg_clean2.head()"
]
},
{
"cell_type": "code",
"execution_count": 96,
"id": "6c1a1dd4",
"metadata": {},
"outputs": [],
"source": [
"#corg_clean2.join?"
]
},
{
"cell_type": "code",
"execution_count": 97,
"id": "3de8ad00",
"metadata": {},
"outputs": [],
"source": [
"corg_join = corg_clean2.reset_index().merge(gdf_comb,right_on='name',left_on='Country')"
]
},
{
"cell_type": "code",
"execution_count": 98,
"id": "abeb7b75",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" \n",
" index \n",
" Country \n",
" 1917 \n",
" 1918 \n",
" 1919 \n",
" 1920 \n",
" 1921 \n",
" 1922 \n",
" 1923 \n",
" 1924 \n",
" ... \n",
" 2015 \n",
" 2016 \n",
" 2017 \n",
" 2018 \n",
" 2019 \n",
" 2020 \n",
" id \n",
" geometry \n",
" code \n",
" name \n",
" \n",
" \n",
" \n",
" \n",
" 0 \n",
" 0 \n",
" United States \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" ... \n",
" 301 \n",
" 229 \n",
" 213 \n",
" 198 \n",
" 118 \n",
" 2 \n",
" 840 \n",
" MULTIPOLYGON (((-155.68896 18.91661, -155.9373... \n",
" US \n",
" United States \n",
" \n",
" \n",
" 1 \n",
" 1 \n",
" Brazil \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" ... \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 76 \n",
" POLYGON ((-57.62398 -30.21667, -56.29196 -28.8... \n",
" BR \n",
" Brazil \n",
" \n",
" \n",
" 2 \n",
" 2 \n",
" Russia \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" ... \n",
" 237 \n",
" 217 \n",
" 260 \n",
" 329 \n",
" 189 \n",
" 16 \n",
" 643 \n",
" MULTIPOLYGON (((142.09162 45.96708, 141.90802 ... \n",
" RU \n",
" Russia \n",
" \n",
" \n",
" 3 \n",
" 3 \n",
" Japan \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" ... \n",
" 0 \n",
" 2 \n",
" 2 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 392 \n",
" MULTIPOLYGON (((133.01593 32.70419, 132.36432 ... \n",
" JP \n",
" Japan \n",
" \n",
" \n",
" 4 \n",
" 4 \n",
" Vietnam \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" ... \n",
" 0 \n",
" 0 \n",
" 1 \n",
" 3 \n",
" 0 \n",
" 0 \n",
" 704 \n",
" POLYGON ((108.04968 21.55193, 106.71407 20.697... \n",
" VN \n",
" Vietnam \n",
" \n",
" \n",
"
\n",
"
5 rows × 110 columns
\n",
"
"
],
"text/plain": [
" index Country 1917 1918 1919 1920 1921 1922 1923 1924 ... \\\n",
"0 0 United States 0 0 0 0 0 0 0 0 ... \n",
"1 1 Brazil 0 0 0 0 0 0 0 0 ... \n",
"2 2 Russia 0 0 0 0 0 0 0 0 ... \n",
"3 3 Japan 0 0 0 0 0 0 0 0 ... \n",
"4 4 Vietnam 0 0 0 0 0 0 0 0 ... \n",
"\n",
" 2015 2016 2017 2018 2019 2020 id \\\n",
"0 301 229 213 198 118 2 840 \n",
"1 0 0 0 0 0 0 76 \n",
"2 237 217 260 329 189 16 643 \n",
"3 0 2 2 0 0 0 392 \n",
"4 0 0 1 3 0 0 704 \n",
"\n",
" geometry code name \n",
"0 MULTIPOLYGON (((-155.68896 18.91661, -155.9373... US United States \n",
"1 POLYGON ((-57.62398 -30.21667, -56.29196 -28.8... BR Brazil \n",
"2 MULTIPOLYGON (((142.09162 45.96708, 141.90802 ... RU Russia \n",
"3 MULTIPOLYGON (((133.01593 32.70419, 132.36432 ... JP Japan \n",
"4 POLYGON ((108.04968 21.55193, 106.71407 20.697... VN Vietnam \n",
"\n",
"[5 rows x 110 columns]"
]
},
"execution_count": 98,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"corg_join.head()"
]
},
{
"cell_type": "markdown",
"id": "72b6b71b",
"metadata": {},
"source": [
"Now that we have merged everything together, let's simplify this table a bit. We know that we want a year slider, so let's change our columns from datetime to years:"
]
},
{
"cell_type": "code",
"execution_count": 99,
"id": "3b4ca36c",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" \n",
" Country \n",
" 1917 \n",
" 1918 \n",
" 1919 \n",
" 1920 \n",
" 1921 \n",
" 1922 \n",
" 1923 \n",
" 1924 \n",
" 1925 \n",
" ... \n",
" 2012 \n",
" 2013 \n",
" 2014 \n",
" 2015 \n",
" 2016 \n",
" 2017 \n",
" 2018 \n",
" 2019 \n",
" 2020 \n",
" id \n",
" \n",
" \n",
" \n",
" \n",
" 0 \n",
" United States \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" ... \n",
" 431 \n",
" 376 \n",
" 280 \n",
" 301 \n",
" 229 \n",
" 213 \n",
" 198 \n",
" 118 \n",
" 2 \n",
" 840 \n",
" \n",
" \n",
" 1 \n",
" Brazil \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" ... \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 76 \n",
" \n",
" \n",
" 2 \n",
" Russia \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" ... \n",
" 82 \n",
" 115 \n",
" 127 \n",
" 237 \n",
" 217 \n",
" 260 \n",
" 329 \n",
" 189 \n",
" 16 \n",
" 643 \n",
" \n",
" \n",
" 3 \n",
" Japan \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" ... \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 2 \n",
" 2 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 392 \n",
" \n",
" \n",
" 4 \n",
" Vietnam \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" ... \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 1 \n",
" 3 \n",
" 0 \n",
" 0 \n",
" 704 \n",
" \n",
" \n",
"
\n",
"
5 rows × 106 columns
\n",
"
"
],
"text/plain": [
" Country 1917 1918 1919 1920 1921 1922 1923 1924 1925 ... \\\n",
"0 United States 0 0 0 0 0 0 0 0 0 ... \n",
"1 Brazil 0 0 0 0 0 0 0 0 0 ... \n",
"2 Russia 0 0 0 0 0 0 0 0 0 ... \n",
"3 Japan 0 0 0 0 0 0 0 0 0 ... \n",
"4 Vietnam 0 0 0 0 0 0 0 0 0 ... \n",
"\n",
" 2012 2013 2014 2015 2016 2017 2018 2019 2020 id \n",
"0 431 376 280 301 229 213 198 118 2 840 \n",
"1 0 0 0 0 0 0 0 0 0 76 \n",
"2 82 115 127 237 217 260 329 189 16 643 \n",
"3 0 0 0 0 2 2 0 0 0 392 \n",
"4 0 0 0 0 0 1 3 0 0 704 \n",
"\n",
"[5 rows x 106 columns]"
]
},
"execution_count": 99,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"corg_join_small = corg_join.copy()\n",
"corg_join_small = corg_join_small.drop(['index','code','name','geometry'],axis=1)\n",
"corg_join_small.head()"
]
},
{
"cell_type": "markdown",
"id": "4f9f28d7",
"metadata": {},
"source": [
"Before making this interactive in anyway, let's first make a static plot. This will help us explore [some of the transformations that we need to do](https://stackoverflow.com/questions/59224026/how-to-add-a-slider-to-a-choropleth-in-altair) to get this whole thing to work."
]
},
{
"cell_type": "markdown",
"id": "f6a61aa3",
"metadata": {},
"source": [
"Let's start with a black-background map:"
]
},
{
"cell_type": "code",
"execution_count": 101,
"id": "18b8427a",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
""
],
"text/plain": [
"alt.Chart(...)"
]
},
"execution_count": 101,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"world = alt.topo_feature(data.world_110m.url, \"countries\")\n",
"\n",
"background = alt.Chart(world).mark_geoshape(fill=\"black\").project(\n",
" \"equirectangular\"\n",
")\n",
"background"
]
},
{
"cell_type": "markdown",
"id": "69162b6d",
"metadata": {},
"source": [
"Let's add the forground, and in this case, just pick one year to plot. To select this year, let's first change the type of our columns:"
]
},
{
"cell_type": "code",
"execution_count": 102,
"id": "d6fb7a4c",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Index(['Country', 1917, 1918, 1919, 1920, 1921,\n",
" 1922, 1923, 1924, 1925,\n",
" ...\n",
" 2012, 2013, 2014, 2015, 2016, 2017,\n",
" 2018, 2019, 2020, 'id'],\n",
" dtype='object', length=106)"
]
},
"execution_count": 102,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"corg_join_small.columns"
]
},
{
"cell_type": "code",
"execution_count": 103,
"id": "85f93c64",
"metadata": {},
"outputs": [],
"source": [
"corg_join_small.columns = corg_join_small.columns.astype('str')"
]
},
{
"cell_type": "code",
"execution_count": 104,
"id": "9c79d7ca",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Index(['Country', '1917', '1918', '1919', '1920', '1921', '1922', '1923',\n",
" '1924', '1925',\n",
" ...\n",
" '2012', '2013', '2014', '2015', '2016', '2017', '2018', '2019', '2020',\n",
" 'id'],\n",
" dtype='object', length=106)"
]
},
"execution_count": 104,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"corg_join_small.columns"
]
},
{
"cell_type": "code",
"execution_count": 105,
"id": "5850ee25",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" \n",
" Country \n",
" 2019 \n",
" id \n",
" \n",
" \n",
" \n",
" \n",
" 0 \n",
" United States \n",
" 118 \n",
" 840 \n",
" \n",
" \n",
" 1 \n",
" Brazil \n",
" 0 \n",
" 76 \n",
" \n",
" \n",
" 2 \n",
" Russia \n",
" 189 \n",
" 643 \n",
" \n",
" \n",
" 3 \n",
" Japan \n",
" 0 \n",
" 392 \n",
" \n",
" \n",
" 4 \n",
" Vietnam \n",
" 0 \n",
" 704 \n",
" \n",
" \n",
"
\n",
"
"
],
"text/plain": [
" Country 2019 id\n",
"0 United States 118 840\n",
"1 Brazil 0 76\n",
"2 Russia 189 643\n",
"3 Japan 0 392\n",
"4 Vietnam 0 704"
]
},
"execution_count": 105,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"corg_join_small_2019 = corg_join_small[['Country','2019','id']]\n",
"corg_join_small_2019.head()"
]
},
{
"cell_type": "code",
"execution_count": 106,
"id": "3cf73021",
"metadata": {},
"outputs": [],
"source": [
"columns = ['2019']"
]
},
{
"cell_type": "code",
"execution_count": 107,
"id": "7d407956",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
""
],
"text/plain": [
"alt.LayerChart(...)"
]
},
"execution_count": 107,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"foreground = alt.Chart(world).mark_geoshape(\n",
").transform_lookup(\n",
" lookup='id',\n",
" from_=alt.LookupData(corg_join_small_2019, 'id', columns)\n",
").encode(\n",
" color = alt.Color('2019:Q', scale=alt.Scale(scheme='blues'))\n",
")\n",
"background+foreground"
]
},
{
"cell_type": "markdown",
"id": "a8c16888",
"metadata": {},
"source": [
"Next, let's use a [transform_fold](https://altair-viz.github.io/user_guide/transform/fold.html) operation to \"unwrap\" our data into a `year` variable and a `Corgs_Born` variable. This is very similar to the \"melting\" operations we did above, but now, it will be done \"on the fly\" in vega-lite/Altair with each year selection.\n",
"\n",
"To do that, we need to specify which columns to use. In our current case, we just have the 2019 column as what we want to transform by \"folding\":"
]
},
{
"cell_type": "code",
"execution_count": 108,
"id": "83bb3b4d",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
""
],
"text/plain": [
"alt.LayerChart(...)"
]
},
"execution_count": 108,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"foreground = alt.Chart(world).mark_geoshape(\n",
").transform_lookup(\n",
" lookup='id',\n",
" from_=alt.LookupData(corg_join_small_2019, 'id', columns)\n",
").transform_fold( # this is so that we can do continuous updates\n",
" columns, as_=['year', 'Corgs_Born']\n",
").encode(\n",
" color = alt.Color('Corgs_Born:Q', scale=alt.Scale(scheme='blues'))\n",
")\n",
"background+foreground"
]
},
{
"cell_type": "markdown",
"id": "f47ed4e2",
"metadata": {},
"source": [
"So, this works for one year, but we want multiple years. For this we need a slider like we had above:"
]
},
{
"cell_type": "code",
"execution_count": 109,
"id": "f92d367f",
"metadata": {},
"outputs": [],
"source": [
"slider = alt.binding_range(min=1917, max=2020, step=1, name='Year: ')\n",
"# select_year = alt.selection_single(name=\"year\", fields=['year'], # will operate on a \"year\" variable calculated \"on the fly\"\n",
"# bind=slider, init={'year': 2006})\n",
"\n",
"select_year = alt.selection_point(name=\"year\", fields=['year'], # will operate on a \"year\" variable calculated \"on the fly\"\n",
" bind=slider, value = 2006)"
]
},
{
"cell_type": "markdown",
"id": "0f57cf59",
"metadata": {},
"source": [
"Which years columns to use now? Well, all of them!"
]
},
{
"cell_type": "code",
"execution_count": 110,
"id": "e84f9357",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array(['1917', '1918', '1919', '1920', '1921'], dtype='\n",
" #altair-viz-e346a66c07fb4af2b16ad0fa04037d27.vega-embed {\n",
" width: 100%;\n",
" display: flex;\n",
" }\n",
"\n",
" #altair-viz-e346a66c07fb4af2b16ad0fa04037d27.vega-embed details,\n",
" #altair-viz-e346a66c07fb4af2b16ad0fa04037d27.vega-embed details summary {\n",
" position: relative;\n",
" }\n",
"\n",
"
\n",
""
],
"text/plain": [
"alt.LayerChart(...)"
]
},
"execution_count": 111,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"foreground = alt.Chart(world).mark_geoshape(\n",
").transform_lookup(\n",
" lookup='id',\n",
" from_=alt.LookupData(corg_join_small, 'id', columns)\n",
").transform_fold( # this is so that we can do continuous updates\n",
" columns, as_=['year', 'Corgs_Born']\n",
").encode(\n",
" color = alt.Color('Corgs_Born:Q', scale=alt.Scale(scheme='blues'))\n",
").add_params(\n",
" select_year\n",
").transform_filter(\n",
" select_year\n",
")\n",
"background+foreground"
]
},
{
"cell_type": "markdown",
"id": "ad653963",
"metadata": {},
"source": [
"But wait! That doesn't work how we think it should... as we can see by the colorbar -- we are getting NaN's. That is because as discussed above, the slider is expecting integers and `year` is a string. We got around this before by having an `year_int` column, but now we want to do this \"on the fly\" -- we can accomplish this by adding another transformation to our Altair specs:"
]
},
{
"cell_type": "code",
"execution_count": 112,
"id": "ab8dd3d4",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
""
],
"text/plain": [
"alt.LayerChart(...)"
]
},
"execution_count": 112,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"foreground = alt.Chart(world).mark_geoshape(\n",
").transform_lookup(\n",
" lookup='id',\n",
" from_=alt.LookupData(corg_join_small, 'id', columns)\n",
").transform_fold( # this is so that we can do continuous updates\n",
" columns, as_=['year', 'Corgs_Born']\n",
").transform_calculate(\n",
" year='parseInt(datum.year)',\n",
").encode(\n",
" color = alt.Color('Corgs_Born:Q', scale=alt.Scale(scheme='blues'))\n",
").add_params(\n",
" select_year\n",
").transform_filter(\n",
" select_year\n",
")\n",
"background+foreground"
]
},
{
"cell_type": "markdown",
"id": "8a8d9471",
"metadata": {},
"source": [
"Nice! One last thing -- let's make it a bit bigger and maybe change the colormap:"
]
},
{
"cell_type": "code",
"execution_count": 113,
"id": "f7424df9",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
""
],
"text/plain": [
"alt.LayerChart(...)"
]
},
"execution_count": 113,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"foreground = alt.Chart(world).mark_geoshape(\n",
").transform_lookup(\n",
" lookup='id',\n",
" from_=alt.LookupData(corg_join_small, 'id', columns)\n",
").transform_fold( # this is so that we can do continuous updates\n",
" columns, as_=['year', 'Corgs_Born']\n",
").transform_calculate(\n",
" year='parseInt(datum.year)',\n",
").encode(\n",
" color = alt.Color('Corgs_Born:Q', scale=alt.Scale(scheme='viridis')),\n",
").add_params(\n",
" select_year\n",
").transform_filter(\n",
" select_year\n",
")\n",
"choro_corg = (background+foreground).properties(\n",
" width=700,\n",
" height=400\n",
")\n",
"choro_corg"
]
},
{
"cell_type": "markdown",
"id": "5d34ae4f",
"metadata": {},
"source": [
"I lied! One last-last thing, [let's add in our tooltip](https://stackoverflow.com/questions/65987572/adding-dynamic-tooltip-and-title-to-choropleth-with-slider-in-altair):"
]
},
{
"cell_type": "code",
"execution_count": 114,
"id": "94c06b48-5940-4f32-8747-13df7fa4449f",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array(['1917', '1918', '1919', '1920', '1921', '1922', '1923', '1924',\n",
" '1925', '1926', '1927', '1928', '1929', '1930', '1931', '1932',\n",
" '1933', '1934', '1935', '1936', '1937', '1938', '1939', '1940',\n",
" '1941', '1942', '1943', '1944', '1945', '1946', '1947', '1948',\n",
" '1949', '1950', '1951', '1952', '1953', '1954', '1955', '1956',\n",
" '1957', '1958', '1959', '1960', '1961', '1962', '1963', '1964',\n",
" '1965', '1966', '1967', '1968', '1969', '1970', '1971', '1972',\n",
" '1973', '1974', '1975', '1976', '1977', '1978', '1979', '1980',\n",
" '1981', '1982', '1983', '1984', '1985', '1986', '1987', '1988',\n",
" '1989', '1990', '1991', '1992', '1993', '1994', '1995', '1996',\n",
" '1997', '1998', '1999', '2000', '2001', '2002', '2003', '2004',\n",
" '2005', '2006', '2007', '2008', '2009', '2010', '2011', '2012',\n",
" '2013', '2014', '2015', '2016', '2017', '2018', '2019', '2020',\n",
" '2021'], dtype='\n",
" #altair-viz-edc2acd6264e4ec29396cd0d075d5e84.vega-embed {\n",
" width: 100%;\n",
" display: flex;\n",
" }\n",
"\n",
" #altair-viz-edc2acd6264e4ec29396cd0d075d5e84.vega-embed details,\n",
" #altair-viz-edc2acd6264e4ec29396cd0d075d5e84.vega-embed details summary {\n",
" position: relative;\n",
" }\n",
"\n",
"
\n",
""
],
"text/plain": [
"alt.LayerChart(...)"
]
},
"execution_count": 115,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"foreground = alt.Chart(world).mark_geoshape(\n",
").transform_lookup(\n",
" lookup='id',\n",
" from_=alt.LookupData(corg_join_small, 'id', columns.tolist()+['Country']) # add country\n",
").transform_fold( # this is so that we can do continuous updates\n",
" columns, as_=['year', 'Corgs_Born'] # add in country\n",
").transform_calculate(\n",
" year='parseInt(datum.year)',\n",
").encode(\n",
" color = alt.Color('Corgs_Born:Q', scale=alt.Scale(scheme='viridis')),\n",
" tooltip=['Country:N','Corgs_Born:Q'] # data types might not be necessary\n",
").add_params(\n",
" select_year\n",
").transform_filter(\n",
" select_year\n",
")\n",
"choro_corg = (background+foreground).properties(\n",
" width=700,\n",
" height=400\n",
")\n",
"choro_corg"
]
},
{
"cell_type": "markdown",
"id": "33e22cb0",
"metadata": {},
"source": [
"Let's save it with a contain width tag!"
]
},
{
"cell_type": "code",
"execution_count": 116,
"id": "ecaab709",
"metadata": {},
"outputs": [],
"source": [
"chart_out = background+foreground\n",
"\n",
"chart_out.properties(width='container').save(myJekyllDir+\"corgis_choro_world.json\") "
]
},
{
"cell_type": "markdown",
"id": "450fa51e",
"metadata": {},
"source": [
"## Optional:\n",
"\n",
"Maybe try saving with HTML just as an example? https://altair-viz.github.io/user_guide/saving_charts.html"
]
},
{
"cell_type": "markdown",
"id": "927b1453",
"metadata": {},
"source": [
"https://stackoverflow.com/questions/59224026/how-to-add-a-slider-to-a-choropleth-in-altair"
]
},
{
"cell_type": "markdown",
"id": "23d6f729",
"metadata": {},
"source": [
"## Clickable map\n",
"\n",
"What about using the map as the \"driver\" plot for a bar plot of the number of corgis born over time?"
]
},
{
"cell_type": "code",
"execution_count": 119,
"id": "2308e4e2",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" \n",
" Country \n",
" 1917 \n",
" 1918 \n",
" 1919 \n",
" 1920 \n",
" 1921 \n",
" 1922 \n",
" 1923 \n",
" 1924 \n",
" 1925 \n",
" ... \n",
" 2012 \n",
" 2013 \n",
" 2014 \n",
" 2015 \n",
" 2016 \n",
" 2017 \n",
" 2018 \n",
" 2019 \n",
" 2020 \n",
" id \n",
" \n",
" \n",
" \n",
" \n",
" 0 \n",
" United States \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" ... \n",
" 431 \n",
" 376 \n",
" 280 \n",
" 301 \n",
" 229 \n",
" 213 \n",
" 198 \n",
" 118 \n",
" 2 \n",
" 840 \n",
" \n",
" \n",
" 1 \n",
" Brazil \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" ... \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 76 \n",
" \n",
" \n",
" 2 \n",
" Russia \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" ... \n",
" 82 \n",
" 115 \n",
" 127 \n",
" 237 \n",
" 217 \n",
" 260 \n",
" 329 \n",
" 189 \n",
" 16 \n",
" 643 \n",
" \n",
" \n",
" 3 \n",
" Japan \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" ... \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 2 \n",
" 2 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 392 \n",
" \n",
" \n",
" 4 \n",
" Vietnam \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" ... \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 1 \n",
" 3 \n",
" 0 \n",
" 0 \n",
" 704 \n",
" \n",
" \n",
"
\n",
"
5 rows × 106 columns
\n",
"
"
],
"text/plain": [
" Country 1917 1918 1919 1920 1921 1922 1923 1924 1925 ... \\\n",
"0 United States 0 0 0 0 0 0 0 0 0 ... \n",
"1 Brazil 0 0 0 0 0 0 0 0 0 ... \n",
"2 Russia 0 0 0 0 0 0 0 0 0 ... \n",
"3 Japan 0 0 0 0 0 0 0 0 0 ... \n",
"4 Vietnam 0 0 0 0 0 0 0 0 0 ... \n",
"\n",
" 2012 2013 2014 2015 2016 2017 2018 2019 2020 id \n",
"0 431 376 280 301 229 213 198 118 2 840 \n",
"1 0 0 0 0 0 0 0 0 0 76 \n",
"2 82 115 127 237 217 260 329 189 16 643 \n",
"3 0 0 0 0 2 2 0 0 0 392 \n",
"4 0 0 0 0 0 1 3 0 0 704 \n",
"\n",
"[5 rows x 106 columns]"
]
},
"execution_count": 119,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"corg_join_small.head()"
]
},
{
"cell_type": "markdown",
"id": "6ca1ac62",
"metadata": {},
"source": [
"Let's sum up all:"
]
},
{
"cell_type": "code",
"execution_count": 121,
"id": "54f31f28",
"metadata": {
"scrolled": true
},
"outputs": [
{
"data": {
"text/plain": [
"0 16130\n",
"1 1\n",
"2 1834\n",
"3 7\n",
"4 4\n",
"5 892\n",
"6 597\n",
"7 2649\n",
"8 106\n",
"9 28\n",
"10 19\n",
"11 5\n",
"12 859\n",
"13 391\n",
"14 7\n",
"15 891\n",
"16 7\n",
"17 110\n",
"18 271\n",
"19 8\n",
"20 38\n",
"21 2008\n",
"22 29\n",
"23 30\n",
"24 32\n",
"25 2\n",
"26 2176\n",
"27 4051\n",
"28 1077\n",
"29 21\n",
"30 1097\n",
"31 29\n",
"32 29\n",
"33 2\n",
"34 3\n",
"35 1\n",
"36 97\n",
"37 536\n",
"dtype: int64"
]
},
"execution_count": 121,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"corg_join_small.loc[:,(corg_join_small.columns!='Country')&(corg_join_small.columns!='id')].sum(axis=1)"
]
},
{
"cell_type": "code",
"execution_count": 122,
"id": "cde37acd",
"metadata": {},
"outputs": [],
"source": [
"corg_join = corg_join_small.copy()\n",
"corg_join['Total Corg'] = corg_join_small.loc[:,(corg_join_small.columns!='Country')&(corg_join_small.columns!='id')].sum(axis=1).values"
]
},
{
"cell_type": "code",
"execution_count": 123,
"id": "13292731",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" \n",
" Country \n",
" 1917 \n",
" 1918 \n",
" 1919 \n",
" 1920 \n",
" 1921 \n",
" 1922 \n",
" 1923 \n",
" 1924 \n",
" 1925 \n",
" ... \n",
" 2013 \n",
" 2014 \n",
" 2015 \n",
" 2016 \n",
" 2017 \n",
" 2018 \n",
" 2019 \n",
" 2020 \n",
" id \n",
" Total Corg \n",
" \n",
" \n",
" \n",
" \n",
" 0 \n",
" United States \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" ... \n",
" 376 \n",
" 280 \n",
" 301 \n",
" 229 \n",
" 213 \n",
" 198 \n",
" 118 \n",
" 2 \n",
" 840 \n",
" 16130 \n",
" \n",
" \n",
" 1 \n",
" Brazil \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" ... \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 76 \n",
" 1 \n",
" \n",
" \n",
" 2 \n",
" Russia \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" ... \n",
" 115 \n",
" 127 \n",
" 237 \n",
" 217 \n",
" 260 \n",
" 329 \n",
" 189 \n",
" 16 \n",
" 643 \n",
" 1834 \n",
" \n",
" \n",
" 3 \n",
" Japan \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" ... \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 2 \n",
" 2 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 392 \n",
" 7 \n",
" \n",
" \n",
" 4 \n",
" Vietnam \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" ... \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 0 \n",
" 1 \n",
" 3 \n",
" 0 \n",
" 0 \n",
" 704 \n",
" 4 \n",
" \n",
" \n",
"
\n",
"
5 rows × 107 columns
\n",
"
"
],
"text/plain": [
" Country 1917 1918 1919 1920 1921 1922 1923 1924 1925 ... \\\n",
"0 United States 0 0 0 0 0 0 0 0 0 ... \n",
"1 Brazil 0 0 0 0 0 0 0 0 0 ... \n",
"2 Russia 0 0 0 0 0 0 0 0 0 ... \n",
"3 Japan 0 0 0 0 0 0 0 0 0 ... \n",
"4 Vietnam 0 0 0 0 0 0 0 0 0 ... \n",
"\n",
" 2013 2014 2015 2016 2017 2018 2019 2020 id Total Corg \n",
"0 376 280 301 229 213 198 118 2 840 16130 \n",
"1 0 0 0 0 0 0 0 0 76 1 \n",
"2 115 127 237 217 260 329 189 16 643 1834 \n",
"3 0 0 0 2 2 0 0 0 392 7 \n",
"4 0 0 0 0 1 3 0 0 704 4 \n",
"\n",
"[5 rows x 107 columns]"
]
},
"execution_count": 123,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"corg_join.head()"
]
},
{
"cell_type": "markdown",
"id": "e2b8811b",
"metadata": {},
"source": [
"First, let's make a cholorpeth plot:"
]
},
{
"cell_type": "code",
"execution_count": 130,
"id": "25258a1f",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
""
],
"text/plain": [
"alt.LayerChart(...)"
]
},
"execution_count": 130,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"foreground = alt.Chart(world).mark_geoshape(\n",
").transform_lookup(\n",
" lookup='id',\n",
" from_=alt.LookupData(corg_join, 'id', ['Country', 'Total Corg']) # add country\n",
").encode(\n",
" color = alt.Color('Total Corg:Q', scale=alt.Scale(scheme='viridis', type='log')),\n",
" tooltip=['Country:N','Total Corg:Q'] # data types might not be necessary\n",
")\n",
"\n",
"choro_corg = (background+foreground).properties(\n",
" width=700,\n",
" height=400\n",
")\n",
"choro_corg"
]
},
{
"cell_type": "code",
"execution_count": 131,
"id": "c5361147",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
""
],
"text/plain": [
"alt.LayerChart(...)"
]
},
"execution_count": 131,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"selector = alt.selection_point(fields=['id'])\n",
"\n",
"foreground = alt.Chart(world).mark_geoshape(\n",
").transform_lookup(\n",
" lookup='id',\n",
" from_=alt.LookupData(corg_join, 'id', ['Country', 'Total Corg']) # add country\n",
").encode(\n",
" color = alt.Color('Total Corg:Q', scale=alt.Scale(scheme='viridis', type='log')),\n",
" tooltip=['Country:N','Total Corg:Q'] # data types might not be necessary\n",
").add_params(\n",
" selector\n",
")\n",
"\n",
"choro_corg = (background+foreground).properties(\n",
" width=700,\n",
" height=400\n",
")\n",
"choro_corg"
]
},
{
"cell_type": "code",
"execution_count": 137,
"id": "ecd1d09e",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" \n",
" Country \n",
" id \n",
" Year \n",
" Born \n",
" \n",
" \n",
" \n",
" \n",
" 0 \n",
" United States \n",
" 840 \n",
" 1917-01-01 \n",
" 0 \n",
" \n",
" \n",
" 1 \n",
" Brazil \n",
" 76 \n",
" 1917-01-01 \n",
" 0 \n",
" \n",
" \n",
" 2 \n",
" Russia \n",
" 643 \n",
" 1917-01-01 \n",
" 0 \n",
" \n",
" \n",
" 3 \n",
" Japan \n",
" 392 \n",
" 1917-01-01 \n",
" 0 \n",
" \n",
" \n",
" 4 \n",
" Vietnam \n",
" 704 \n",
" 1917-01-01 \n",
" 0 \n",
" \n",
" \n",
"
\n",
"
"
],
"text/plain": [
" Country id Year Born\n",
"0 United States 840 1917-01-01 0\n",
"1 Brazil 76 1917-01-01 0\n",
"2 Russia 643 1917-01-01 0\n",
"3 Japan 392 1917-01-01 0\n",
"4 Vietnam 704 1917-01-01 0"
]
},
"execution_count": 137,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"corg_melt2 = corg_join_small.melt(['Country','id'], \n",
" var_name='Year', value_name='Born')\n",
"corg_melt2['Year'] = pd.to_datetime(corg_melt2['Year'].astype('int'), format='%Y')\n",
"corg_melt2.head()"
]
},
{
"cell_type": "code",
"execution_count": 141,
"id": "a63c9c49",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
""
],
"text/plain": [
"alt.Chart(...)"
]
},
"execution_count": 141,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"line_corg2 = alt.Chart(corg_melt2).mark_line().encode(\n",
" alt.X('Year:T'),\n",
" alt.Y('Born:Q', scale=alt.Scale(type='symlog')),\n",
" color='Country:O'\n",
")\n",
"line_corg2"
]
},
{
"cell_type": "code",
"execution_count": 150,
"id": "bc4de1cc",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
""
],
"text/plain": [
"alt.HConcatChart(...)"
]
},
"execution_count": 150,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"selector = alt.selection_point(fields=['id'])\n",
"\n",
"foreground = alt.Chart(world).mark_geoshape(\n",
").transform_lookup(\n",
" lookup='id',\n",
" from_=alt.LookupData(corg_join, 'id', ['Country', 'Total Corg']) # add country\n",
").encode(\n",
" color = alt.Color('Total Corg:Q', scale=alt.Scale(scheme='viridis', type='log')),\n",
" tooltip=['Country:N','Total Corg:Q'] # data types might not be necessary\n",
").add_params(\n",
" selector\n",
")\n",
"\n",
"#line_corg2 = alt.Chart(corg_melt2).mark_line().encode(\n",
"line_corg2 = alt.Chart(corg_melt2).mark_bar().encode(\n",
" alt.X('Year:T'),\n",
" alt.Y('Born:Q', scale=alt.Scale(type='symlog')),\n",
" color='Country:O'\n",
").transform_filter(\n",
" selector\n",
").properties(\n",
" width=200\n",
")\n",
"\n",
"choro_corg = (background+foreground).properties(\n",
" width=500,\n",
" height=400\n",
")\n",
"chart = choro_corg | line_corg2\n",
"chart"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "d0f4f30a",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.12"
}
},
"nbformat": 4,
"nbformat_minor": 5
}