Palmer Penguins Stacked Bar Chart
The Palmer Penguins dataset is a multivariate dataset for data exploration and visualization, often used as an alternative to the iris dataset.
The dataset contains data for three penguin species (Adelie, Gentoo, and Chinstrap) collected on three islands in the Palmer Archipelago, Antarctica. This note shows how to create a stacked bar chart that displays how the species are distributed across the islands.
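The Palmer Penguins dataset is initially stored in a SQLite database table. The AWK script below calculates the species population distribution. It expects comma-separated rows with a header line, with the species in the second field and the island in the third.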
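#!/usr/bin/awk -f
# Count penguins per species/island pair from comma-separated input.
BEGIN {
    FS = ","                # fields are comma-separated
}
NR == 1 { next }            # skip the header line
{
    a[$2 $3]++              # key is species ($2) followed by island ($3)
}
END {
    # One group per species; within each group: Biscoe, Dream, Torgersen.
    # Pairs that never occur print as 0.
    printf("%d,%d,%d ", a["Adelie" "Biscoe"], a["Adelie" "Dream"], a["Adelie" "Torgersen"])
    printf("%d,%d,%d ", a["Gentoo" "Biscoe"], a["Gentoo" "Dream"], a["Gentoo" "Torgersen"])
    printf("%d,%d,%d ", a["Chinstrap" "Biscoe"], a["Chinstrap" "Dream"], a["Chinstrap" "Torgersen"])
}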
As shown in the video below, the AWK script outputs, for each species, the number of penguins on each island as a set of comma-delimited values.
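To make the output format concrete, here is a minimal sketch that runs the same counting logic on a handful of made-up rows. The sample data and the rowid column are assumptions for illustration only. They are not taken from the real table, and merely follow the field layout the script implies (species in field 2, island in field 3).

```shell
#!/bin/sh
# Hypothetical sample rows: field 1 is a row id, field 2 the species,
# field 3 the island. The values are invented for this example.
sample='rowid,species,island
1,Adelie,Torgersen
2,Adelie,Biscoe
3,Adelie,Dream
4,Adelie,Dream
5,Gentoo,Biscoe
6,Gentoo,Biscoe
7,Chinstrap,Dream'

# Same logic as the script above, passed inline to awk.
counts=$(printf '%s\n' "$sample" | awk -F, '
    NR == 1 { next }    # skip the header line
    { a[$2 $3]++ }      # count each species/island pair
    END {
        printf("%d,%d,%d ", a["Adelie" "Biscoe"], a["Adelie" "Dream"], a["Adelie" "Torgersen"])
        printf("%d,%d,%d ", a["Gentoo" "Biscoe"], a["Gentoo" "Dream"], a["Gentoo" "Torgersen"])
        printf("%d,%d,%d ", a["Chinstrap" "Biscoe"], a["Chinstrap" "Dream"], a["Chinstrap" "Torgersen"])
    }')

echo "$counts"
```

Each space-separated group corresponds to one species (Adelie, Gentoo, Chinstrap), and the three values inside a group are the counts for Biscoe, Dream, and Torgersen, in that order. For the sample rows this gives 1,2,1 for Adelie, 2,0,0 for Gentoo, and 0,1,0 for Chinstrap.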
The AWK script's stdout is piped to xfigbar, a command-line program for making bar charts. xfigbar in turn writes its output, in .fig format, to stdout. The program was compiled with the Mingw-w64 toolchain.
Finally, the bar chart can be edited with either Xfig or the Ipe drawing editor. Ipe requires an extra step: the command-line tool figtoipe creates an Ipe XML file from an existing FIG drawing.