# Week 13, Prep notebook, Part 1

Last week we talked about how Jekyll sites work overall and started to build some plots in Altair.  This week, we'll finish that up and then move on to some more complex plots.

We've learned how to include plots in 4 different ways:
1. Saving JSON directly from the vega-editor
1. Copying vega-lite code from other sources (in our case Starboard) and using Altair to export it as JSON
1. Building dashboards by combining vega-lite code from other sources with Altair layouts
1. Using Altair to make the plot with Python, with the data stored online

We'll now cover the final way:
1. Using Python for the data cleaning/transformation and Altair just for plotting and saving as vega-lite JSON

In [1]:
import altair as alt

## REVIEW: Use Altair to make the chart 

To build straight from Python, we need to read in the data first.  Before, we've been linking to online data, but the nice thing about Altair is that instead of doing data manipulations "on the fly" in vega-lite, we can potentially do them in Python and then save the modified data for our plot.

Before doing that though, let's see if we can translate from vega-lite style to "Altair-vega-lite style".  We can do this with our data still stored online at a URL:

In [2]:
mobility_url = 'https://raw.githubusercontent.com/UIUC-iSchool-DataViz/is445_data/main/mobility.csv'

Let's once again re-write `chart1`, but now transforming from the dictionary we have been passing to more "original Altair" formatting and passing data through Python.

Before doing any complicated binning stuff, let's start with a simple scatter plot in "Altair style":

In [3]:
scatter1 = alt.Chart(mobility_url).mark_point().encode(
    x='Mobility:Q', # "Q for quantitative"
    y='Population:Q',
)
scatter1

Note here we used `Q` for quantitative which is one of the [Altair Encoding Data Types](https://altair-viz.github.io/user_guide/encoding.html#encoding-data-types):

| Data Type | Shorthand Code | Description |
|-----------|----------------|-------------|
| quantitative | Q | a continuous real-valued quantity |
| ordinal | O | a discrete ordered quantity |
| nominal | N | a discrete unordered category |
| temporal | T | a time or date value |
| geojson | G | a geographic shape |

Also note that we now define which mark we are using, in this case `mark_point`, after we define the "source" of the chart (in this case a URL).

Let's make this scatter plot a bit more complex by coloring by another quantitative variable, like `Income`:

In [4]:
scatter2 = alt.Chart(mobility_url).mark_point().encode(
    x='Mobility:Q', # "Q for quantitative"
    y='Population:Q',
    color=alt.Color('Income:Q')
)
scatter2

We can pick different color schemes in the same way that we use [Vega-lite colormaps](https://vega.github.io/vega/docs/schemes/#reference) and by specifying the scale for the color in Altair:

In [5]:
scatter3 = alt.Chart(mobility_url).mark_point().encode(
    x='Mobility:Q', # "Q for quantitative"
    y='Population:Q',
    color=alt.Color('Income:Q', scale=alt.Scale(scheme='viridis'))
)
scatter3

This is a bit more readable, but we still have a few things to do.  To practice binning, let's [bin our color](https://altair-viz.github.io/user_guide/transform/bin.html#bin-transforms) from our `Income` variable on the color bar and in our plot:

In [6]:
scatter4 = alt.Chart(mobility_url).mark_point().encode(
    x='Mobility:Q', # "Q for quantitative"
    y='Population:Q',
    color=alt.Color('Income:Q', scale=alt.Scale(scheme='viridis'), bin=alt.Bin(maxbins=5))
)
scatter4

For this, if we really want to highlight the bins, we probably want more hues, so let's change our colormap:

In [7]:
scatter5 = alt.Chart(mobility_url).mark_point().encode(
    x='Mobility:Q', # "Q for quantitative"
    y='Population:Q',
    color=alt.Color('Income:Q', scale=alt.Scale(scheme='sinebow'), bin=alt.Bin(maxbins=5))
)
scatter5

Groovy!  Also, it does look like we have a pretty big range in the `Population` parameter so maybe we want a log scale on our y-axis:

In [8]:
scatter6 = alt.Chart(mobility_url).mark_point().encode(
    x='Mobility:Q', # "Q for quantitative"
    #y='Population:Q',
    y=alt.Y('Population:Q', scale=alt.Scale(type='log')),
    color=alt.Color('Income:Q', scale=alt.Scale(scheme='sinebow'), bin=alt.Bin(maxbins=5))
)
scatter6

Nice!  Let's save that one:

In [10]:
myJekyllDir = '/Users/jnaiman/jnaiman.github.io/'

In [11]:
scatter6.save(myJekyllDir+"assets/json/population_scatter.json")

Now that we know a bit more about Altair-flavored vega-lite plots, let's try to remake our dashboard plot using only Altair-style code.

Let's start with the first plot.

Our `from_dict` call looks like:

```python
chart1 = alt.Chart.from_dict({
  "data": {"url":"https://raw.githubusercontent.com/UIUC-iSchool-DataViz/is445_data/main/mobility.csv"},
  "mark":"rect",
  "height":400,
  "encoding":{
    "x":{"bin":{"maxbins":10}, "field":"Student_teacher_ratio", "type":"quantitative"},
    "y":{"field":"State","type":"ordinal"},
    "color":{"aggregate":"count", "type":"quantitative"} 
  }  
})
```

We can use the [`mark_rect` Altair encoding example](https://altair-viz.github.io/gallery/interactive_cross_highlight.html#interactive-chart-with-cross-highlight) to build this plot:

In [12]:
chart1 = alt.Chart(mobility_url).mark_rect().encode(
    alt.X("Student_teacher_ratio:Q", bin=alt.Bin(maxbins=10)),
    alt.Y("State:O"),
    alt.Color("count()")
).properties(
   height=400
)
chart1

Neat!  Let's also re-make our second chart.  For reference we had the following `from_dict` call:

```python
chart2 = alt.Chart.from_dict({
  "data": {"url":"https://raw.githubusercontent.com/UIUC-iSchool-DataViz/is445_data/main/mobility.csv"},
  "mark": "bar",
  "encoding":{
    "x":{"field":"Mobility", "type":"quantitative", "bin":True, "axis":{"title":"Mobility Score"}},
    "y":{"aggregate":"count","type":"quantitative", "axis":{"title":"Mobility Score Distribution"}}
  }
})
```

We can [bin using `mark_bar`](https://altair-viz.github.io/user_guide/transform/bin.html#bin-transforms).

In [13]:
chart2 = alt.Chart(mobility_url).mark_bar().encode(
    alt.X("Mobility:Q", bin=True, axis=alt.Axis(title='Mobility Score')),
    alt.Y('count()', axis=alt.Axis(title='Mobility Score Distribution'))
)
chart2

We can then put them side by side once more:

In [14]:
chart = (chart1.properties(width=300) | chart2.properties(width=300))

In [15]:
chart

Now, we can essentially use the code we used before to make these charts interactive with each other:

In [16]:
brush = alt.selection_interval(encodings=['x','y'])

chart1 = alt.Chart(mobility_url).mark_rect().encode(
    alt.X("Student_teacher_ratio:Q", bin=alt.Bin(maxbins=10)),
    alt.Y("State:O"),
    alt.Color("count()")
).properties(
   height=400
).add_selection(
        brush
)

chart2 = alt.Chart(mobility_url).mark_bar().encode(
    alt.X("Mobility:Q", bin=True, axis=alt.Axis(title='Mobility Score')),
    alt.Y('count()', axis=alt.Axis(title='Mobility Score Distribution'))
).transform_filter(
    brush
)

chart = (chart1.properties(width=300) | chart2.properties(width=300))

chart

Great!  Let's save that again:

In [17]:
chart.save(myJekyllDir+"assets/json/altair_mobility_dashboard.json")

## 1. Python Analysis + Altair Plotting

The real "power" of Altair is that we can use it to (relatively) easily port data that is analyzed/cleaned in Python into a vega-lite JSON that can be used in our Jekyll webpage.  

Let's re-make our dashboard in one final way, which will leverage the fact that we can do data manipulations right here in Python and then "copy" that *transformed* data to a vega-lite plot.

Note: this means that the data will be stored not online *but in the JSON file that we save*.  Keep this in mind when thinking about the size of your data and of the resulting JSON file.

Ok, the first thing we need is the data!  Let's use Pandas to load it:

In [20]:
import pandas as pd
import numpy as np

In [21]:
mobility = pd.read_csv('https://raw.githubusercontent.com/UIUC-iSchool-DataViz/is445_data/main/mobility.csv')

In [22]:
mobility.head()

| Unnamed: 0 | ID | Name | Mobility | State | Population | Urban | Black | Seg_racial | Seg_income | Seg_poverty | ... | Migration_out | Foreign_born | Social_capital | Religious | Violent_crime | Single_mothers | Divorced | Married | Longitude | Latitude |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 100 | Johnson City | 0.062199 | TN | 576081 | 1 | 0.021 | 0.09 | 0.035 | 0.03 | ... | 0.005 | 0.012 | -0.298 | 0.514 | 0.001 | 0.19 | 0.11 | 0.601 | -82.436386 | 36.470371 |
| 1 | 200 | Morristown | 0.053652 | TN | 227816 | 1 | 0.02 | 0.093 | 0.026 | 0.028 | ... | 0.014 | 0.023 | -0.767 | 0.544 | 0.002 | 0.185 | 0.116 | 0.613 | -83.407249 | 36.096539 |
| 2 | 301 | Middlesborough | 0.072635 | TN | 66708 | 0 | 0.015 | 0.064 | 0.024 | 0.015 | ... | 0.012 | 0.007 | -1.27 | 0.668 | 0.001 | 0.211 | 0.113 | 0.59 | -83.535332 | 36.55154 |
| 3 | 302 | Knoxville | 0.056281 | TN | 727600 | 1 | 0.056 | 0.21 | 0.092 | 0.084 | ... | 0.014 | 0.02 | -0.222 | 0.602 | 0.001 | 0.206 | 0.114 | 0.575 | -84.24279 | 35.952259 |
| 4 | 401 | Winston-Salem | 0.044801 | NC | 493180 | 1 | 0.174 | 0.262 | 0.072 | 0.061 | ... | 0.019 | 0.053 | -0.018 | 0.488 | 0.003 | 0.22 | 0.092 | 0.586 | -80.505333 | 36.081276 |


In [23]:
brush = alt.selection_interval(encodings=['x','y'])

chart1 = alt.Chart(mobility).mark_rect().encode(
    alt.X("Student_teacher_ratio:Q", bin=alt.Bin(maxbins=10)),
    alt.Y("State:O"),
    alt.Color("count()")
).properties(
   height=400
).add_selection(
        brush
)

chart2 = alt.Chart(mobility_url).mark_bar().encode(
    alt.X("Mobility:Q", bin=True, axis=alt.Axis(title='Mobility Score')),
    alt.Y('count()', axis=alt.Axis(title='Mobility Score Distribution'))
).transform_filter(
    brush
)

chart = (chart1.properties(width=300) | chart2.properties(width=300))

chart

This looks basically identical to what we had above, because it is.  The difference is that when we save the visualization, the JSON file will contain all of the data as well:

In [24]:
chart.save(myJekyllDir+"assets/json/altair_mobility_data_dashboard.json")

But we notice the difference if we look at the size of each JSON file:

In [25]:
import os

In [26]:
os.stat(myJekyllDir+"assets/json/altair_mobility_dashboard.json").st_size # size in bytes - so ~1kb

890

In [27]:
os.stat(myJekyllDir+"assets/json/altair_mobility_data_dashboard.json").st_size # size in bytes, so ~0.7 MB (about 700 kB)

699535
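To make the two byte counts above easier to compare, a small helper can format them.  The name `human_size` is made up for this sketch, and it uses decimal units (1 kB = 1000 bytes):

```python
def human_size(num_bytes):
    """Format a byte count using decimal units (1 kB = 1000 B)."""
    size = float(num_bytes)
    for unit in ('B', 'kB', 'MB', 'GB'):
        # Stop at the first unit where the number is below 1000
        if size < 1000 or unit == 'GB':
            return f"{size:.1f} {unit}"
        size /= 1000


url_based = 890       # dashboard that links to the online CSV
embedded = 699535     # dashboard with the data stored in the JSON

print(human_size(url_based))
print(human_size(embedded))
print(round(embedded / url_based), 'times larger')
```

The file with embedded data is just under 0.7 MB, almost 800 times the size of the URL-based one.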

So, right now, there is no real reason to use this last method: the data isn't being manipulated at all, so we don't need to save it with our JSON output.  Next time we'll go through a few more examples where we can do data cleaning/analysis in Python and then save our plots with Altair.