The grammar of tables in Python (pandas) and R (gt)
Introduction
The {ggplot2} 📦 package is one of the most widely used packages for data visualization in R. It is based on the Grammar of Graphics (Wilkinson, 2012) and builds plots layer by layer. The {gt} 📦 package, the Grammar of Tables, takes the same layered approach to tables. The two can be combined to produce tables that include graphics. Here I put together an example with data from the open data portal of the Autonomous City of Buenos Aires, and replicate it in Python with {pandas} 📦.
This post is a short version of the following posts:
The code below shows how to generate the following tables:
1️⃣ Packages
The required packages 📦 are loaded, both in R and in Python.
Code
library(tidyverse)
library(lubridate)
library(circular)
library(gt)
library(gtExtras)
library(gtsummary)
library(reshape)
library(sf)
library(stringr)
library(circular)
library(webshot2)
library(reticulate)
conflicted::conflict_prefer("filter", "dplyr")
conflicted::conflict_prefer("webshot", "webshot2")
conflicted::conflict_prefer("select", "dplyr")
options(scipen=999)
A conda environment is created for this project, with python 3.9.5:
Code
reticulate::conda_create(envname='tables', python_version="3.9.5")
The required python packages 📦 are installed via conda-forge with reticulate:
Code
reticulate::conda_install(envname='tables',
  packages='numpy', channel = 'conda-forge')
reticulate::conda_install(envname='tables',
  packages='pandas', channel = 'conda-forge')
reticulate::conda_install(envname = 'tables',
  packages='matplotlib=3.5.3', channel='conda-forge')
reticulate::conda_install(envname = 'tables',
  packages='plotnine', channel='conda-forge')
reticulate::conda_install(envname = 'tables',
  packages='jinja2', channel='conda-forge')
reticulate::conda_install(envname = 'tables',
  packages='seaborn', channel='conda-forge')
reticulate::conda_install(envname = 'tables',
  packages='geopandas', channel='conda-forge')
reticulate::conda_install(envname = 'tables',
  packages='ipykernel', channel='conda-forge')
The conda environment is defined for this script:
Code
reticulate::use_condaenv(
  condaenv = 'tables',
  required = TRUE
)
Imports:
Code
import numpy as np
import pandas as pd
import geopandas as gpd
from io import BytesIO
import base64
from IPython.display import HTML
from plotnine import *
import seaborn as sns
import matplotlib.pyplot as plt
from mizani.formatters import date_format
from mizani.breaks import date_breaks
from scipy.stats import circmean
import pprint
import warnings
warnings.filterwarnings("ignore")
2️⃣ Data
Data on subway trips in Buenos Aires, Argentina, is used. The analysis covers November 2021, with October 2021 as the baseline for the percentage variation. The data is loaded with Python and then accessed from R through the {reticulate} 📦 package.
Code
base_url = 'https://cdn.buenosaires.gob.ar/datosabiertos/datasets'
dataset = 'sbase/subte-viajes-molinetes'

colors = {
    'A':'#18cccc',
    'B':'#eb0909',
    'C':'#233aa8',
    'D':'#02db2e',
    'E':'#c618cc',
    'H':'#ffdd00'
}

def read_data(url, dataset, file):
    path = f'{url}/{dataset}/{file}'

    df_ = (pd.read_csv(path, delimiter=';')
        # Remove useless data
        .query('~FECHA.isna()', engine='python')
        # Columns rename
        .rename(str.lower, axis='columns')
        .rename({'linea':'line',
                 'desde':'hour',
                 'fecha':'date',
                 'estacion':'station'
                 },axis=1)
        # Transformations
        .assign(
            line = lambda x: [i.replace('Linea','') for i in x['line']],
            date = lambda x: pd.to_datetime(x['date'],format='%d/%m/%Y'),
            color = lambda x: x['line'].replace(colors)
        )
        # Selected columns
        [['line', 'color', 'date', 'hour', 'station', 'pax_total']]
    )
    return(df_)

df = read_data(url=base_url, dataset=dataset, file='molinetes_112021.csv')
df_oct = read_data(url=base_url,dataset=dataset, file='molinetes_102021.csv')
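To see what the cleaning chain in read_data does without downloading the files, here is a minimal sketch on a made-up, semicolon-separated sample. The sample rows, the helper name `clean` and the two-entry color dictionary are assumptions for illustration only; the real files contain more columns and rows.

```python
from io import StringIO

import pandas as pd

# Made-up sample mimicking the layout assumed for the turnstile files.
raw = StringIO(
    "FECHA;DESDE;LINEA;ESTACION;pax_total\n"
    "01/11/2021;8;LineaA;Peru;10\n"
    "01/11/2021;9;LineaB;Florida;7\n"
    ";;;;\n"  # row without a date, dropped by the FECHA filter
)

colors = {"A": "#18cccc", "B": "#eb0909"}


def clean(raw_csv):
    """Hypothetical helper applying the same steps as read_data to any CSV source."""
    return (
        pd.read_csv(raw_csv, delimiter=";")
        .query("~FECHA.isna()", engine="python")   # keep rows that have a date
        .rename(str.lower, axis="columns")          # FECHA -> fecha, ...
        .rename({"linea": "line", "desde": "hour",
                 "fecha": "date", "estacion": "station"}, axis=1)
        .assign(
            line=lambda x: [i.replace("Linea", "") for i in x["line"]],
            date=lambda x: pd.to_datetime(x["date"], format="%d/%m/%Y"),
            color=lambda x: x["line"].replace(colors),  # uses the cleaned line
        )
        [["line", "color", "date", "hour", "station", "pax_total"]]
    )


sample = clean(raw)
print(sample)
```

Because assign evaluates its arguments in order, the color lambda already sees the shortened line names.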
Station information is also loaded:
Code
df_stations = (pd.read_csv('data/stations.csv')
    .rename({'linea':'line', 'estacion':'station'}, axis=1)
)
Number of passengers per station:
Code
rename_stations = {
    'Flores': 'San Jose De Flores',
    'Saenz Peña ': 'Saenz Peña',
    'Callao.b': 'Callao',
    'Retiro E': 'Retiro',
    'Independencia.h': 'Independencia',
    'Pueyrredon.d': 'Pueyrredon',
    'General Belgrano':'Belgrano',
    'Rosas': 'Juan Manuel De Rosas',
    'Patricios': 'Parque Patricios',
    'Mariano Moreno': 'Moreno'
}

df_passengers_station = (df
    .groupby(['line','color','station'], as_index=False)
    .agg(pax_total = ('pax_total','sum'))
    .assign(station = lambda x: x['station'].str.title())
    .replace(rename_stations)
    .merge(df_stations, on=['line','station'])
)
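The merge above is an inner join by default, so a station whose cleaned name has no match in the stations file silently disappears. The following sketch, on made-up rows and coordinates, shows one way to surface such names with a left merge and `indicator=True`. It also illustrates that `str.title()` capitalizes the letter after a dot, and that a dictionary passed to `replace` only matches whole cell values exactly.

```python
import pandas as pd

# Made-up trips and station coordinates, for illustration only.
trips = pd.DataFrame({
    "line": ["A", "A", "A", "B"],
    "color": ["#18cccc"] * 3 + ["#eb0909"],
    "station": ["FLORES", "FLORES", "PERU", "CALLAO.B"],
    "pax_total": [5, 7, 3, 4],
})
stations = pd.DataFrame({
    "line": ["A", "A", "B"],
    "station": ["San Jose De Flores", "Peru", "Callao"],
    "lat": [-34.63, -34.61, -34.60],
    "long": [-58.46, -58.37, -58.39],
})
rename_stations = {"Flores": "San Jose De Flores", "Callao.b": "Callao"}

per_station = (
    trips
    .groupby(["line", "color", "station"], as_index=False)
    .agg(pax_total=("pax_total", "sum"))
    .assign(station=lambda x: x["station"].str.title())  # 'CALLAO.B' -> 'Callao.B'
    .replace(rename_stations)                            # exact, whole-value matches
    .merge(stations, on=["line", "station"], how="left", indicator=True)
)

# Names that found no partner in the stations table.
unmatched = per_station.loc[per_station["_merge"] != "both", "station"].tolist()
print(unmatched)
```

In this sample the key 'Callao.b' never matches, because the title-cased value is 'Callao.B'. Checking `unmatched` before switching back to an inner merge makes that kind of mismatch visible.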
Spatial data
GeoJSON data for the city of Buenos Aires is loaded in both R and Python:
Code
base_url = 'http://cdn.buenosaires.gob.ar/datosabiertos/datasets/barrios/'
url = paste0(base_url, 'barrios.geojson')

map_caba <- st_read(url, quiet = TRUE) %>%
  mutate(barrio=str_to_title(BARRIO))
Code
map_caba = gpd.read_file(r.url)
3️⃣ Table
Style and parameters definitions
The CSS styles, titles, and footnotes are defined in Python:
Code
custom_style = [
    {'selector':"caption",
     'props':[("text-align", "left"),
              ("font-size", "135%"),
              ("font-weight", "bold")]},
    {'selector':'th',
     "props": 'text-align : center; background-color: white; color: black; font-size: 18; border-bottom: 1pt solid lightgrey'},
    {"selector": "",
     "props": [("border", "1px solid lightgrey")]}
]

title ='A summary of the use of the subway in the city of Buenos Aires, Argentina'
subtitle = 'Analysis period: November 2021'

source = "<br>Own elaboration based on data from the Open Data Portal from the city of Buenos Aires</br>"

footer_map = "% of passengers per station in relation to the total number of passengers on that line. For the cuts, the quantiles of the distribution were used. It is observed that Line C presented a very high use in the headwaters, while in the rest of the lines the use was more distributed."

footer_variation = "% Variation in relation to October 2021."

footer_clock = "The color line represents the mean hour, considering the circular distribution of the hour variable."

footer_moments_day = "Percentage per line: Morning = 5 a 11 hs, Afternoon = 12 a 17 hs, Night = 18 a 23 hs"
Functions definition
Images
A function that maps each value of a dataframe column to an HTML tag for the matching .jpg image:
Code
def map_line_img(i):
    path = f'images/{i.lower()}.jpg'
    return '<img src="'+ path + '" width="25" >'
Line plot
The evolution of passengers per day is included in the tables based on a line plot. Here, some functions are defined to generate these plots.
Code
fig_evol_pax_total <- function(.line, .color){
  py$df %>%
    filter(line == .line) %>%
    group_by(date) %>%
    summarise(n = sum(pax_total)) %>%
    ggplot(aes(x = date, y = n)) +
    geom_line(color = 'grey', size = 2.5) +
    geom_line(color = .color, size = 1.5) +
    scale_y_continuous(
      labels = scales::unit_format(unit = "K", scale = 1e-3)
    ) +
    labs(x = '', y = 'Passengers') +
    theme_minimal() +
    theme(
      text = element_text(size = 30),
      axis.title.y = element_text(color = 'grey'),
      panel.grid = element_blank()
    )
}
Code
def fig_evol_pax_total(i):
    data_line=df.query("line==@i")

    color = data_line.color.max()

    data_line=(data_line
        .groupby('date', as_index=False)
        .pax_total
        .sum()
    )

    p = (ggplot(
            data = data_line,
            mapping = aes(x='date', y='pax_total',group=1)
        ) +
        geom_line(color=color)+
        theme_minimal()+
        labs(x='',y='N')+

        scale_x_datetime(
            labels = date_format("%Y-%m"),
            breaks =date_breaks('7 days')
        )+
        labs(x = '', y = '') +
        theme_void() +
        theme(
            panel_background= element_rect(fill=None),
            plot_background = element_rect(fill=None),
            text =element_text(size=7),
            axis_text_x=element_text(vjust=-0.5)
        )
    )
    return p

def plotnine2html(p,i, width=4, height=2):
    # Save the plot to an in-memory PNG and embed it as a base64 <img> tag
    figfile = BytesIO()
    p.save(figfile, format='png', width=width, height=height, units='in')
    figfile.seek(0)
    figdata_png = base64.b64encode(figfile.getvalue()).decode()
    imgstr = f'<img src="data:image/png;base64,{figdata_png}" />'

    return imgstr

def map_plot_evol(i):
    fig = fig_evol_pax_total(i)
    return plotnine2html(fig,i)
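The key trick in plotnine2html is independent of plotnine: any PNG held in memory can be base64-encoded and placed in the `src` attribute of an `<img>` tag as a data URI, so the table needs no external image files. The sketch below uses only the standard library. The helper names `tiny_png` and `png_to_img_tag` are hypothetical, and the 1x1 image is just a stand-in for a saved plot.

```python
import base64
import struct
import zlib


def tiny_png(rgb=(24, 204, 204)):
    """Build a valid 1x1 RGB PNG in memory (stand-in for a saved plot)."""
    def chunk(kind, data):
        body = kind + data
        return struct.pack(">I", len(data)) + body + struct.pack(">I", zlib.crc32(body))

    header = struct.pack(">IIBBBBB", 1, 1, 8, 2, 0, 0, 0)  # 1x1 px, 8-bit RGB
    pixels = zlib.compress(b"\x00" + bytes(rgb))            # filter byte + one pixel
    return (b"\x89PNG\r\n\x1a\n" + chunk(b"IHDR", header)
            + chunk(b"IDAT", pixels) + chunk(b"IEND", b""))


def png_to_img_tag(png_bytes):
    """Encode PNG bytes as a data URI inside an <img> tag."""
    encoded = base64.b64encode(png_bytes).decode()
    return f'<img src="data:image/png;base64,{encoded}" />'


tag = png_to_img_tag(tiny_png())
print(tag[:40])
```

The trade-off is file size: every image is stored inline, so the resulting HTML grows with the number and resolution of the plots.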
Map
Some functions are defined in order to generate the maps in the tables.
Code
fig_map <- function(.df, .line){
  temp <- .df %>%
    filter(line==.line) %>%
    mutate(pax_percent = pax_total / sum(pax_total))
  lbreaks <- round(quantile(temp$pax_percent, c(0,0.25,0.5,0.75,1)),2) %>%
    as.numeric()
ggplot() +
geom_sf(data = map_caba,
color = "black",
fill = 'white',
size = 0.1,
show.legend = FALSE)+
geom_point(data = temp,
aes(x = long, y = lat, size=pax_percent), alpha=0.7,
fill = temp$color %>% unique(), color='black', shape=21)+
scale_size_continuous(breaks = lbreaks, range=c(1,10),
limits=c(min(temp$pax_percent),max(temp$pax_percent)),
labels = scales::percent(lbreaks, accuracy=0.1))+
theme_void()+
theme(text = element_text(size = 25),
legend.position = 'right',
axis.text = element_blank(),
plot.margin = unit(c(0, 0, 0, 0), "null"))+
labs(x='',y='',size='')
}
Code
def fig_map(i):
    total_passengers=(df_passengers_station
        .loc[df_passengers_station['line']==i]
        ['pax_total']
        .sum()
    )

    data_line=(df_passengers_station
        .loc[df_passengers_station['line']==i]
        .assign(
            pax_percent = lambda x:
                (x['pax_total']/total_passengers)*100
        )
    )

    color = data_line.color.max()

    lbreaks = round(
        data_line['pax_percent'].quantile([0,0.25,0.5,0.75,1]),2
    )

    p = (ggplot(data=map_caba)+
        geom_map(fill='white', color = "black", size = 0.1)+
        geom_point(data=data_line,
                   mapping=aes(x='long',y='lat', size='pax_percent'),
                   alpha=0.5, color='black', shape='o', fill=color)+

        scale_size_continuous(
            # `breaks` mirrors `breaks = lbreaks` in the R version
            breaks=lbreaks,
            range=[1,10],
            limits = [
                data_line['pax_percent'].min()-1,
                data_line['pax_percent'].max()+1
            ],
            labels=lambda l: [f'{round(i)}%' for i in l])+
        theme_void()+
        theme(legend_position='right')+
        labs(size='%')
    )
    return p

def map_plot_mapa(i):
    fig = fig_map(i=i)
    return plotnine2html(fig, i, width=3, height=3)
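Both map functions size the points by each station's share of the line total and place the legend breaks at the minimum, the quartiles, and the maximum of that share. A minimal sketch of that computation, with made-up station names and counts:

```python
import pandas as pd

# Made-up passenger counts for the stations of one line.
stations = pd.DataFrame({
    "station": ["S1", "S2", "S3", "S4", "S5"],
    "pax_total": [10, 20, 30, 40, 100],
})

# Share of the line total, in percent.
share = stations["pax_total"] / stations["pax_total"].sum() * 100

# Legend breaks: min, quartiles and max of the share, rounded to 2 decimals.
breaks = share.quantile([0, 0.25, 0.5, 0.75, 1]).round(2)
print(breaks.tolist())
```

With one dominant station, as in this sample, the top break sits far from the others, which is the pattern the footnote describes for Line C.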
Bar plot
In R, a ready-made function (gt_plt_bar_stack, used in the table code further down) draws the percentage of passengers per moment of the day. In Python, a function has to be defined:
Code
def fig_percent(i):
    temp = (df
        .query('line==@i')
        .assign(
            group_hour = lambda x: pd.cut(
                x['hour'], bins=3, labels = ['Morning', 'Afternoon', 'Night'])
        )
        .groupby(['line','group_hour'], as_index=False)
        .agg(pax_total = ('pax_total','sum'))
    )
    temp['perc']=round(
        temp['pax_total'] / temp.groupby('line')['pax_total'].transform('sum')*100,2)
    temp['perc_lab'] = [str(i)+'%' for i in temp['perc']]

    p=(ggplot(data=temp,
              mapping=aes(x='line', y='perc', fill='group_hour', label='perc_lab'))+
        geom_col(position= position_stack(reverse=True))+

        geom_text(
            position = position_stack(vjust = .5, reverse=True),
            color ='white', size=8)+
        coord_flip()+
        scale_fill_manual(['lightgrey', '#A3B1C9','#4C699E'])+
        theme_void()+
        theme(legend_position='none')
    )
    return p

def map_plot_percent(i):
    fig = fig_percent(i)
    return plotnine2html(fig,i, width=4, height=0.4)
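With `bins=3`, `pd.cut` splits the observed range of the hour column into three equal-width intervals. The labels therefore match the footnote (5 to 11, 12 to 17, 18 to 23) only when a line's data spans hours 5 through 23. The sketch below, which assumes integer hours and uses made-up values, compares that with explicit bin edges that do not depend on the observed range.

```python
import pandas as pd

hours = pd.Series([5, 11, 12, 17, 18, 23])  # made-up, integer hours assumed
labels = ["Morning", "Afternoon", "Night"]

# Equal-width bins derived from the observed range (as in fig_percent).
auto = pd.cut(hours, bins=3, labels=labels)

# Explicit right-closed edges: (4, 11], (11, 17], (17, 23].
fixed = pd.cut(hours, bins=[4, 11, 17, 23], labels=labels)

print(fixed.tolist())
```

On this sample both calls agree. They would diverge for a line with no trips at hour 5 or hour 23, where only the explicit edges keep the stated meaning of each label.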
Circular plot
For the circular plot, functions are defined in both R and Python.
The mean hour is generated in R, with the {circular} package:
Code
get_hour <- function(.line, .df) {
  temp <- .df %>%
    filter(line == .line) %>%
    select(hour, pax_total)
  hour <- untable(temp, num = temp$pax_total) %>%
    select(-pax_total) %>%
    mutate(circular_hour = circular(hour,
                                    template = "clock24",
                                    units = "hours")) %>%
    summarise(hour = mean(circular_hour)) %>%
    pull(hour)
  as.numeric(hour) %% 24
}
Code
df_mean_hours <- data.frame(
  line=py$df %>% pull(line) %>% unique()
) %>%
  mutate(mean_hour = map(line, ~get_hour(.line=.x, .df=py$df))) %>%
  mutate(mean_hour = unlist(mean_hour))
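An arithmetic mean is misleading for clock hours: trips split between 23:00 and 01:00 would average to noon. A circular mean treats each hour as an angle on a 24-hour dial and averages the corresponding unit vectors. The sketch below is a plain-Python illustration of the usual weighted definition, with passenger counts as weights. It is not the {circular} implementation, and the function name `circular_mean_hour` is hypothetical.

```python
import math


def circular_mean_hour(hours, weights):
    """Weighted circular mean of clock hours on a 24-hour dial."""
    angles = [2 * math.pi * h / 24 for h in hours]
    s = sum(w * math.sin(a) for a, w in zip(angles, weights))
    c = sum(w * math.cos(a) for a, w in zip(angles, weights))
    # Direction of the summed vectors, converted back to hours in [0, 24].
    return (math.atan2(s, c) * 24 / (2 * math.pi)) % 24


# Made-up counts: traffic split evenly between 23:00 and 01:00.
mean_hour = circular_mean_hour([23, 1], [100, 100])
print(round(mean_hour, 6) % 24)
```

Using the counts as weights avoids expanding the table to one row per passenger, which is what untable does in the R version.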
Code
fig_clock_plot <- function(.line, .df, .color = 'black') {

  mean_hour = df_mean_hours %>% filter(line==.line) %>% pull(mean_hour)

  temp <- data.frame(hour = seq(0, 23)) %>%
    left_join(
      .df %>%
        filter(line == .line) %>%
        group_by(hour) %>%
        summarise(pax_total = sum(pax_total)) %>%
        ungroup()
    ) %>%
    mutate(color_hour = ifelse(hour == round(mean_hour), TRUE, FALSE)) %>%
    mutate(pax_total = ifelse(is.na(pax_total), 0, pax_total))

  temp %>%
    ggplot(aes(x = hour, y = pax_total)) +
    geom_col(color = 'white', fill = 'lightgrey') +
    coord_polar(start = 0) +
    geom_vline(xintercept = mean_hour,
               color = .color,
               size = 2) +
    geom_label(
      aes(
        x = hour,
        y = max(pax_total) + quantile(pax_total, 0.3),
        color = color_hour,
        label = hour
      ),
      size = 6,
      label.size = NA,
      show.legend = FALSE
    ) +
    scale_color_manual(values = c('black', .color)) +
    scale_x_continuous(
      "",
      limits = c(0, 24),
      breaks = seq(0, 24),
      labels = seq(0, 24)
    ) +
    scale_y_continuous(labels = scales::unit_format(unit = "K", scale = 1e-3)) +
    labs(y = 'Passengers') +
    theme_minimal() +
    theme(text = element_text(size = 25, color = 'grey'),
          axis.text.x = element_blank())
}
Code
from matplotlib.ticker import FuncFormatter

def thousand_format(x, pos):
    return f'{round(x / 1000)}K'

def gen_clock_plot(df_, mean_hour, x='hour', y='pax_total', color='blue'):
    plt.rc('font', size=8)
    plt.axis('off')

    fig, ax = plt.subplots(figsize=(3,3))
    ax = plt.subplot(111, polar=True)

    cr = df_[y].astype('float').to_numpy()
    hour = df_[x].astype('float').to_numpy()

    N = 24
    bottom = 2
    theta = np.linspace(0.0, 2 * np.pi, N, endpoint=False)
    width = (2*np.pi) / N

    bars = ax.bar(theta, cr,
                  width=width,
                  bottom=bottom,
                  color='lightgrey',
                  edgecolor='white')

    ax.vlines(
        # Convert the mean hour to an angle on the 24-hour dial
        x = mean_hour * 2 * np.pi / N,
        ymin=0, ymax=df_['pax_total'].max(),
        color=color)

    ax.set_theta_zero_location("N")
    ax.set_theta_direction(-1)
    ticks = ['0:00', '3:00', '6:00', '9:00', '12:00', '15:00', '18:00', '21:00']

    ax.set_xticklabels(ticks)
    ax.yaxis.set_major_formatter(FuncFormatter(thousand_format))
    ax.grid(False)
    for key, spine in ax.spines.items():
        spine.set_visible(False)

    return fig

def fig_clock_plot(i):
    data_line=df.query("line==@i")
    color = data_line.color.max()

    data_hour = (pd.DataFrame({'hour':range(0,24)})

        .merge(df
               .query('line==@i')
               .groupby('hour', as_index=False)
               .agg(pax_total = ('pax_total','sum')),
               how='left'
        )
        .fillna(0)
    )
    mean_hour = r.df_mean_hours.query("line==@i").mean_hour

    return gen_clock_plot(data_hour, color=color, mean_hour=mean_hour)

def clock2inlinehtml(p,i):
    # Save the current matplotlib figure as an inline base64 <img> tag
    figfile = BytesIO()
    plt.savefig(figfile, format='png', dpi=100, transparent=True)
    figfile.seek(0)
    figdata_png = base64.b64encode(figfile.getvalue()).decode()
    imgstr = f'<img src="data:image/png;base64,{figdata_png}" />'

    figfile.close()
    return imgstr

def map_plot_clock(i):
    plt.figure(figsize=(2,2))
    fig = fig_clock_plot(i)
    return clock2inlinehtml(fig,i)
Data definition
First, a dataframe is built in both R and pandas. Each one will later be styled into a well-formatted table.
Code
cols_selected = c('line','Route','most_used_station','Map','pax_total',
                  'variation', 'passengers_type', 'clock_plot','passengers_per_day','color')

r_table_data <- py$df %>% select(line, color) %>% unique() %>%

  arrange(line) %>%
  # Routes
  mutate(
    Route = case_when(
      line == 'A' ~ 'Plaza de Mayo - San Pedrito',
      line == 'B' ~ 'J.M. Rosas - L.N. Alem',
      line == 'C' ~ 'Constitución - Retiro',
      line == 'D' ~ 'Congreso de Tucumán - Catedral',
      line == 'E' ~ 'Retiro - Plaza de los Virreyes',
      line == 'H' ~ 'Hospitales - Facultad de Derecho',
      TRUE ~ ''
    )
  ) %>%

  left_join(
    py$df %>%
      mutate(group_hour = cut(
        hour,
        breaks = 3,
        labels = c('Morning', 'Afternoon', 'Night')
      )) %>%
      group_by(line, group_hour) %>%
      summarise(pax_total = sum(pax_total)) %>%
      group_by(line) %>%
      mutate(pax_percent = round(pax_total / sum(pax_total) * 100)) %>%
      group_by(line) %>%
      summarise(passengers_type = list(pax_percent))
  ) %>%

  left_join(py$df %>%
              group_by(line) %>%
              summarise(pax_total = sum(pax_total))) %>%
  left_join(py$df_oct %>%
              group_by(line) %>%
              summarise(pax_total_oct = sum(pax_total))) %>%
  mutate(variation = (pax_total / pax_total_oct - 1)) %>%
  left_join(
    py$df %>%
      group_by(line, most_used_station = station) %>%
      summarise(pax_total = sum(pax_total)) %>%
      group_by(line) %>%
      slice(which.max(pax_total)) %>% select(-pax_total)
  ) %>%

  # Applying functions
  mutate(
    clock_plot = map2(line, color,
                      ~ fig_clock_plot(.line = .x, .df = py$df, .color = .y)
    ),
    passengers_per_day = map2(line, color,
                              ~ fig_evol_pax_total(.line=.x, .color=.y)
    ),
    Map = map(line, ~ fig_map(.df = py$df_passengers_station, .line = .x)
    )
  ) %>%

  # Columns selected in order:
  select(all_of(cols_selected))
Code
paths = {
    'A':'Plaza de Mayo - San Pedrito',
    'B':'J.M. Rosas - L.N. Alem',
    'C':'Constitución - Retiro',
    'D':'Congreso de Tucumán - Catedral',
    'E':'Retiro - Plaza de los Virreyes',
    'H':'Hospitales - Facultad de Derecho'
}

py_table_data = (df

    .groupby(['line','color'], as_index=False)

    .agg(
        most_used_station = ('station', pd.Series.mode),
        pax_total = ('pax_total','sum')
    )

    .merge(df_oct
           .groupby('line', as_index=False)
           .agg(pax_total_oct = ('pax_total','sum')),
           on='line', how='left')

    .assign(
        variation = lambda x: (x['pax_total']/x['pax_total_oct']-1),
        pax_total = lambda x: [str(round(i/1000000,2))+'M' for i in x['pax_total']],
        Route = lambda x: x['line'].replace(paths),
        # Placeholders: the Styler formatters later turn the line letter into plots
        passengers_per_day = lambda x: x['line'],
        Map = lambda x: x['line'],
        clock_plot = lambda x: x['line'],
        passengers_type = lambda x: x['line']
    )

    [r.cols_selected]
)

color_mapping = dict(zip(py_table_data['Route'], py_table_data['color']))
color_mapping_back = dict(zip(py_table_data['Route'], ['#f0f0f0']*6))
Table
Code
gt_table <- r_table_data %>%

  gt() %>%
  tab_header(
    title = md(paste0('**',py$title,'**')),
    subtitle = py$subtitle
  ) %>%

  # Style
  tab_style(locations = cells_title(groups = 'title'),
            style = list(
              cell_text(
                font = google_font(name = 'Raleway'),
                size = 'xx-large', weight = 'bold', align = 'left', color = 'darkblue'
              )
            )) %>%

  tab_style(locations = cells_title(groups = 'subtitle'),
            style = list(
              cell_text(
                font = google_font(name = 'Raleway'),
                size = 'medium', align = 'left', color = '#666666'
              )
            )) %>%

  opt_align_table_header('left') %>%
  cols_align('center',
             columns = c(
               'pax_total','variation','most_used_station',
               'clock_plot', 'passengers_per_day')
  ) %>%

  cols_width(
    line ~ px(50),
    Route ~ px(100),
    most_used_station ~ px(80),
    clock_plot ~ px(20),
    pax_total ~ px(80),
    variation ~ px(100)
  ) %>%

  # Grouping columns
  tab_spanner(label = "Use per hour",
              columns = c(passengers_type, clock_plot)) %>%
  tab_spanner(label= "Number of passengers",
              columns = c(pax_total, variation)) %>%
  # Colors
  tab_style(
    style = cell_text(color = "#18cccc"),
    locations = cells_body(columns = c(Route), rows = line == 'A')
  ) %>%
  tab_style(
    style = cell_text(color = "#eb0909"),
    locations = cells_body(columns = c(Route), rows = line == 'B')
  ) %>%
  tab_style(
    style = cell_text(color = "#233aa8"),
    locations = cells_body(columns = c(Route), rows = line == 'C')
  ) %>%
  tab_style(
    style = cell_text(color = "#02db2e"),
    locations = cells_body(columns = c(Route), rows = line == 'D')
  ) %>%
  tab_style(
    style = cell_text(color = "#c618cc"),
    locations = cells_body(columns = c(Route), rows = line == 'E')
  ) %>%
  tab_style(
    style = cell_text(color = "#ffdd00"),
    locations = cells_body(columns = c(Route), rows = line == 'H')
  ) %>%
  tab_style(style = list(cell_fill(color = "#f0f0f0")),
            locations = cells_body(columns = c('Route'))) %>%
  # Format numeric columns
  fmt_number(pax_total, suffixing = TRUE) %>%
  fmt_percent(variation) %>%
  # Mappings:
  text_transform(
    locations = cells_body(columns = c(line)),
    fn = function(line) {
      lapply(here::here('', paste0('images/', tolower(line), '.jpg')),
             local_image,
             height = 25)
    }
  ) %>%
  cols_label(line = '') %>%
  gt_plt_bar_stack(
    column = passengers_type,
    position = 'fill',
    labels = c("Morning","Afternoon","Night"),
    palette = c('grey', '#A3B1C9','#4C699E'),
    fmt_fn = scales::label_percent(scale=1),
    width = 60) %>%
  text_transform(
    locations = cells_body(columns = clock_plot),
    fn = function(x) {
      map(r_table_data$clock_plot,
          gt::ggplot_image,
          height = px(250),
          aspect_ratio = 2)
    }
  ) %>%

  text_transform(
    locations = cells_body(columns = passengers_per_day),
    fn = function(x) {
      map(r_table_data$passengers_per_day,
          gt::ggplot_image,
          height = px(200),
          aspect_ratio = 2)
    }
  ) %>%

  text_transform(
    locations = cells_body(columns = Map),
    fn = function(x) {
      map(r_table_data$Map,
          gt::ggplot_image,
          height = px(250),
          aspect_ratio = 2)
    }
  ) %>%

  # Naming variables
  cols_label(
    Map = md('% Passengers per station'),
    clock_plot = md('Passengers per hour'),
    pax_total = md('Total'),
    variation = md('% Variation'),
    passengers_per_day = md('Passengers per day'),
    most_used_station = md('Most used station')
  ) %>%

  tab_footnote(cells_column_labels(columns = variation),
               footnote = py$footer_variation) %>%
  tab_footnote(cells_column_labels(columns = clock_plot),
               footnote = py$footer_clock) %>%
  tab_footnote(cells_column_labels(columns = Map),
               footnote = py$footer_map) %>%
  tab_footnote(cells_column_labels(columns = passengers_type),
               footnote = py$footer_moments_day) %>%
  tab_source_note(
    source_note = html(py$source)
  ) %>%

  tab_style(locations = cells_source_notes(),
            style = list(cell_text(
              font = google_font(name = 'Raleway'),
              size = 'medium', align = 'left', color = '#666666'
            ))) %>%

  tab_style(locations = cells_footnotes(),
            style = list(cell_text(
              font = google_font(name = 'Raleway'),
              size = 'medium', align = 'left', color = '#666666'
            ))) %>%

  tab_options(
    data_row.padding = px(0),
    table.border.top.style = "hidden",
    table.border.bottom.style = "hidden",
    table_body.border.top.style = "solid",
    column_labels.border.bottom.style = "solid"
  ) %>%

  cols_hide('color')

gt_table
A summary of the use of the subway in the city of Buenos Aires, Argentina
Analysis period: November 2021
Route | Most used station | Number of passengers: Total | Number of passengers: % Variation (2)
---|---|---|---
Plaza de Mayo - San Pedrito | San Pedrito | 2.81M | 14.97%
J.M. Rosas - L.N. Alem | Federico Lacroze | 3.48M | 12.53%
Constitución - Retiro | Constitucion | 2.13M | 19.51%
Congreso de Tucumán - Catedral | Congreso de Tucuman | 3.00M | 11.58%
Retiro - Plaza de los Virreyes | Bolivar | 1.12M | 15.50%
Hospitales - Facultad de Derecho | Once | 1.52M | 9.22%
Graphic columns not reproduced here: % Passengers per station (1); Use per hour, with Morning | Afternoon | Night (3) and Passengers per hour (4); Passengers per day.
Own elaboration based on data from the Open Data Portal from the city of Buenos Aires
(1) % of passengers per station in relation to the total number of passengers on that line. For the cuts, the quantiles of the distribution were used. It is observed that Line C presented a very high use in the headwaters, while in the rest of the lines the use was more distributed.
(2) % Variation in relation to October 2021.
(3) Percentage per line: Morning = 5 a 11 hs, Afternoon = 12 a 17 hs, Night = 18 a 23 hs
(4) The color line represents the mean hour, considering the circular distribution of the hour variable.
Code
pd_table = (py_table_data.drop('color',axis=1)

    .style

    # Note: Styler.applymap was renamed Styler.map in pandas 2.1
    .applymap(lambda v: f"color: {color_mapping.get(v, 'black')}")

    .applymap(lambda v: f"background-color: {color_mapping_back.get(v, 'white')}")

    .set_table_styles(custom_style)
    .set_caption(f"""
        <h1><span style="color: darkblue">{title}</span><br></h1>
        <span style="color: black">{subtitle}</span><br><br>
        """
    )

    .hide(axis='index')

    # Mappings:
    .format(
        formatter={
            'line':map_line_img,
            'Map':map_plot_mapa,
            'passengers_per_day': map_plot_evol,
            'passengers_type':map_plot_percent,
            'clock_plot': map_plot_clock,
            'variation': '{:,.2%}'.format
        }
    )
    .set_properties(**{'font-size': '18pt'}, overwrite=False)
    .set_properties(subset=['Route'], **{'width': '10px'}, overwrite=False)
    .set_properties(subset=['line'], **{'width': '5px'}, overwrite=False)

    .to_html()
    .replace('line</th>','')
    .replace('pax_total</th>','Number of<br>passengers')
    .replace('passengers_type</th>',
        """% Passengers per moment (3)<br>
        <span style="color: lightgrey">Morning</span> |
        <span style="color: #A3B1C9">Afternoon</span> |
        <span style="color: #4C699E">Night</span>
        """
    )
    .replace('Map</th>','% Passengers per station (1)')
    .replace('variation</th>','% Variation (2)')
    .replace('passengers_per_day</th>','Passengers per day')
    .replace('clock_plot</th>','Passengers per hour (4)')
    .replace('most_used_station</th>','Most used station')
    + f"""<caption>
    (1) {footer_map}
    <br>(2) {footer_variation}
    <br>(3) {footer_moments_day}
    <br>(4) {footer_clock}
    <br>{source}
    </caption>"""
)
# Displaying the table
HTML(pd_table)
Route | Most used station | Number of passengers | % Variation (2)
---|---|---|---
Plaza de Mayo - San Pedrito | San Pedrito | 2.81M | 14.97%
J.M. Rosas - L.N. Alem | Federico Lacroze | 3.48M | 12.53%
Constitución - Retiro | Constitucion | 2.13M | 19.51%
Congreso de Tucumán - Catedral | Congreso de Tucuman | 3.0M | 11.58%
Retiro - Plaza de los Virreyes | Retiro E | 1.12M | 15.50%
Hospitales - Facultad de Derecho | Santa Fe | 1.52M | 9.22%
Graphic columns not reproduced here: % Passengers per station (1); % Passengers per moment (3), with Morning | Afternoon | Night; Passengers per hour (4); Passengers per day.
(1) % of passengers per station in relation to the total number of passengers on that line. For the cuts, the quantiles of the distribution were used. It is observed that Line C presented a very high use in the headwaters, while in the rest of the lines the use was more distributed.
(2) % Variation in relation to October 2021.
(3) Percentage per line: Morning = 5 a 11 hs, Afternoon = 12 a 17 hs, Night = 18 a 23 hs
(4) The color line represents the mean hour, considering the circular distribution of the hour variable.
Own elaboration based on data from the Open Data Portal from the city of Buenos Aires
Saving the table
Both tables are saved as PNG images.
Code
gt::gtsave(gt_table, 'gt_table.png', vwidth = 2400, vheight = 1000)
Code
# gt::gtsave(data=gt_table, filename='gt_table.html')
# webshot(
# url='gt_table.html',
# file="gt_table2.png",
# vwidth = 2000, vheight = 1500, cliprect = "viewport"
# )
Code
f = open("pd_table.html", "w")
f.write(pd_table)
546788
Code
f.close()
For the pandas table, I use {webshot2}, an R package, to convert the HTML file to PNG:
Code
webshot2::webshot(
  url='pd_table.html',
  file="pd_table.png",
  vwidth=2000,
  vheight = 2200,
  cliprect = "viewport"
)