👩‍💻 chrismanbrown.gitlab.io

Visualizing Data with Sparklines

quick text-based charts

2024-02-20

Edward Tufte is one of my favorite designers. His print style has been emulated on the web via Tufte CSS.

https://edwardtufte.github.io/tufte-css/

He also advocates for the use of sparklines:

A sparkline is a small intense, simple, word-sized graphic with typographic resolution. Sparklines mean that graphics are no longer cartoonish special occasions with captions and boxes, but rather sparkline graphics can be everywhere a word or number can be: embedded in a sentence, table, headline, map, spreadsheet, graphic.

Here is a sparkline chart I made this morning.

2020: ▁▁▁▁█▆▆▄▂▂▁▃
2021: ▁▁▁▁▁▁▁▁▁▁█▁
2022: ▄▆▄▂▂▃▁▁▅█▆▃
2023: ▁▁▁▁▁▁▃▁▁█▄▅
2024: ▃█▁▁▁▁▁▁▁▁▁▁

I have a small database where I jot down notes on “things I consume” such as movies, television shows, podcasts, and albums. (I made this database to complement my record keeping on goodreads, where I track all my reading. This is my “everything else” database.) The graph above quickly, tersely shows post frequency by month for the years since I’ve been keeping notes.

The chart takes advantage of Unicode characters U+2581 through U+2588: ▁▂▃▄▅▆▇█. Eight bars. So if you map your data to a range of numbers 1 through 8, then you can print out the corresponding Unicode character.

This little ruby script does the trick just fine:

#!/usr/bin/env ruby
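bar = ('▁'..'█').to_a                # the eight block characters, lowest to highest
numbers = ARGV.map(&:to_f)           # parse arguments as floats so the division below is not truncated
min, max = numbers.minmax
div = (max - min) / (bar.size - 1)   # width of one bar step
# A flat series has no range to scale, so draw it as all full bars.
puts min == max ? bar.last*numbers.size : numbers.map{|num| bar[((num - min) / div).to_i]}.join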

Then you can:

$ sparkline 5 6 7 8 9 10 11 12 11 10 9 8 7 6 5
▁▂▃▄▅▆▇█▇▆▅▄▃▂▁
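
To make the scaling step concrete, here is a minimal sketch of the same mapping written as a Ruby function. It is a variant, not the script above: the names `BARS` and `sparkline` are made up for illustration, and it assumes the inputs are integers (like post counts) so that it can multiply before dividing and stay in integer arithmetic.

```ruby
# The eight block characters, U+2581 (lowest) through U+2588 (full block).
BARS = (0x2581..0x2588).map { |cp| cp.chr(Encoding::UTF_8) }

# Scale each integer to a bar index 0..7, then look up its character.
def sparkline(counts)
  min, max = counts.minmax
  # A flat series has no range to scale over, so draw every value as a
  # full bar, the same choice the script above makes.
  return BARS.last * counts.size if min == max

  # Multiply first, then integer-divide: min lands on index 0, max on index 7.
  counts.map { |n| BARS[(n - min) * (BARS.size - 1) / (max - min)] }.join
end

puts sparkline([5, 6, 7, 8, 9, 10, 11, 12, 11, 10, 9, 8, 7, 6, 5])
```

Fed the fifteen numbers from the example, it should print the same ramp up and back down.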

To make my chart, I first got a range of dates from my database:

$ # [[years]]
$ recsel db/database.rec -P created -C \
  | xargs -I {} gdate -d"{}" +"%Y" \
  | uniq
2020
2021
2022
2023
2024

Great! I’ll want to iterate over those to do something with them…

for year in `[[years]]`
do
  # do something..
done

(I’ll elide some code snippets in [[double square brackets]] for ease of reading. When you read “[[years]]” here, you can mentally substitute the recsel | xargs | uniq pipeline above.)

Now, I know that I want a list of posts per month per year. I can start by getting entries from my database.

$ export year="2022" && recsel db/database.rec \
  -P created \
  -C \
  -e "created >> '$(gdate -d"$year-01-01")' && created << '$(gdate -d"$year-12-31 +1 days")'"
Wed, 19 Jan 2022 10:59:19 -0600
Wed, 19 Jan 2022 11:00:57 -0600
Wed, 19 Jan 2022 11:01:34 -0600
...
Sat, 31 Dec 2022 11:25:54 -0700
Sat, 31 Dec 2022 11:25:54 -0700
Sat, 31 Dec 2022 11:25:54 -0700

Cool, I format those as YYYY-MM and count them:

$ # [[recsel]]
$ export year="2022" && recsel db/database.rec \
        -P created \
        -C \
        -e "created >> '$(gdate -d"$year-01-01")' && created << '$(gdate -d"$year-12-31 +1 days")'" \
    | while read -r d; do gdate -d "$d" +"%Y-%m"; done \
    | uniq -c \
    | sed 's/^ *//'
6 2022-01
10 2022-02
7 2022-03
2 2022-04
2 2022-05
4 2022-06
9 2022-09
13 2022-10
11 2022-11
4 2022-12
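
The counting step can also be sketched in Ruby, for anyone without GNU date on hand. This assumes the created values are RFC 2822 strings like the ones recsel printed above. `month_counts` and `SAMPLE_DATES` are hypothetical names for this example, and `tally` needs Ruby 2.7 or newer.

```ruby
require "time" # adds Time.rfc2822 to the core Time class

# Turn each timestamp into a "YYYY-MM" key and count how often each key occurs.
def month_counts(dates)
  dates.map { |d| Time.rfc2822(d).strftime("%Y-%m") }.tally
end

# Sample input: the six timestamps visible in the recsel output above.
SAMPLE_DATES = [
  "Wed, 19 Jan 2022 10:59:19 -0600",
  "Wed, 19 Jan 2022 11:00:57 -0600",
  "Wed, 19 Jan 2022 11:01:34 -0600",
  "Sat, 31 Dec 2022 11:25:54 -0700",
  "Sat, 31 Dec 2022 11:25:54 -0700",
  "Sat, 31 Dec 2022 11:25:54 -0700"
]

# Print in the same "count YYYY-MM" shape as the pipeline.
month_counts(SAMPLE_DATES).each { |month, count| puts "#{count} #{month}" }
```

Unlike uniq -c, tally does not need equal keys to sit next to each other, so the input order does not matter here.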

Great! But I need to fill in the gaps: I need the months in which I had zero posts too. What I’ll do is seq 1 12 and then format it:

$ # [[months]]
$ export year="2022" && \
    for month in `seq 1 12`; do printf "%d %d-%02d\n" 0 $year $month; done
0 2022-01
0 2022-02
0 2022-03
0 2022-04
0 2022-05
0 2022-06
0 2022-07
0 2022-08
0 2022-09
0 2022-10
0 2022-11
0 2022-12

… and then join them!

$ join -j 2 -t' ' -e "0" -o 2.1 -a1 <([[months]]) <([[recsel]])
6
10
7
2
2
4
0
0
9
13
11
4

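Quick join breakdown:

-j 2 joins on the second field of each input (the YYYY-MM column)
-t' ' uses a single space as the field separator
-e "0" prints 0 wherever a requested field is missing
-o 2.1 outputs only the first field of the second input (the count)
-a1 also prints lines from the first input that have no match in the second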

That gives us all twelve months!
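
The zero-filling that join does here can be sketched in Ruby as a lookup with a default. `COUNTS_2022` and `fill_months` are names invented for this example. The numbers are the 2022 counts from the uniq -c output earlier.

```ruby
# Per-month counts for 2022, copied from the uniq -c output above.
# July and August are absent because nothing was posted in those months.
COUNTS_2022 = {
  "2022-01" => 6, "2022-02" => 10, "2022-03" => 7, "2022-04" => 2,
  "2022-05" => 2, "2022-06" => 4, "2022-09" => 9, "2022-10" => 13,
  "2022-11" => 11, "2022-12" => 4
}

# Walk all twelve months and fall back to 0 when a month has no entry,
# which is the job -a1 and -e "0" do in the join command.
def fill_months(year, counts)
  (1..12).map { |m| counts.fetch(format("%d-%02d", year, m), 0) }
end

p fill_months(2022, COUNTS_2022)
```

The resulting array has twelve entries, with zeros in the seventh and eighth positions.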

So putting it all together, that gives us:

for year in `recsel db/database.rec -P created -C | xargs -I {} gdate -d"{}" +"%Y" | uniq`
do
    printf "%s: " "$year"
    # Left input: a zero row for every month. Right input: the real counts per month.
    join -j 2 -t' ' -e "0" -o 2.1 -a1 \
    <(for month in `seq 1 12`; do printf "%d %d-%02d\n" 0 "$year" "$month"; done) \
    <(recsel db/database.rec -P created -e "created >> '$(gdate -d"$year-01-01")' && created << '$(gdate -d"$year-12-31 +1 days")'" -C \
        | while read -r d; do gdate -d "$d" +"%Y-%m"; done \
        | uniq -c \
        | sed 's/^ *//') \
    | xargs sparkline
done

Which gives us the pretty little graphic we saw earlier:

2020: ▁▁▁▁█▆▆▄▂▂▁▃
2021: ▁▁▁▁▁▁▁▁▁▁█▁
2022: ▄▆▄▂▂▃▁▁▅█▆▃
2023: ▁▁▁▁▁▁▃▁▁█▄▅
2024: ▃█▁▁▁▁▁▁▁▁▁▁

Further Reading: