👩‍💻 chrismanbrown.gitlab.io

Flatten your data

You could make your data flatter

2025-04-09

Contents

  1. Structured Data
  2. Unstructured Data

Structured Data

I’ve spent the last several years being really physically disabled. But now that I am doing better I have started being a lot more active, and have started lifting weights again!

Weightlifting is good for you because it builds bone density, strength, and stability. Those are all things I want to work on now so that I don’t reinjure myself, and so that I have a solid foundation by the time my body starts to deteriorate in my 80s and 90s. Muscle can also help us when we’re sick, strengthen our cognitive function, and level out our blood glucose [1].

Anyway, here’s my strategy for logging my workouts and charting my progress. First I plan my lifts in a little notebook [2]. Then at the gym, I write down what I actually did. Then back at home, I copy it down into a little database on my laptop! Then I can query the database and see how I am improving (or plateauing) over time.

When I was first planning this database, I dismissed sql because this data is not really relational. It’s really just a log. I just need something I can easily append a line to. I decided on jsonlines [3] to store the data and jq [4] to query it.
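To make "something I can easily append a line to" concrete, here is a minimal sketch in Python (not the jq tooling the post actually uses). The helper names `append_workout` and `read_workouts` are hypothetical, and the record fields are assumed to match the nested figure below.

```python
import json
from pathlib import Path


def append_workout(log_path, workout):
    """Append one workout to the log as a single line of JSON."""
    line = json.dumps(workout, separators=(",", ":"))
    with Path(log_path).open("a", encoding="utf-8") as log:
        log.write(line + "\n")


def read_workouts(log_path):
    """Read the log back, one dict per non-empty line."""
    with Path(log_path).open(encoding="utf-8") as log:
        return [json.loads(line) for line in log if line.strip()]
```

Because every record sits on its own line, adding a workout never requires parsing or rewriting what is already in the file.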

Initially my data was fairly nested.

{
  "date": "2025-03-15",
  "name": "lower body",
  "exercises": [
    {
      "name": "leg press",
      "sets": [
        {
          "reps": 10,
          "weight": 250
        },
        ...
      ]
    },
    ...
  ]
}
Figure: nested json

I chose json because I thought that the fundamental unit of a workout was the date and type (“lower body”, “upper body”, etc) and that each workout was made of a series of exercises, which in turn had a name (e.g. leg press) and a list of sets, each with a weight and a number of reps. That is, I thought it was hierarchical in nature.

This is technically a valid representation of the data. But querying and reporting took a little too much drilling down, and just a little too much mental overhead. I realized as I was trying to create reports that all I really want is a single iterable list of objects, each containing the weight and reps, the exercise name, and the date.

And after I jq-ed [5] my json into that shape I realized, heck! This is just csv!

2025-04-08,leg extension,10,140
2025-04-08,leg extension,8,150
2025-04-08,leg extension,6,160
2025-04-08,leg press,10,240
2025-04-08,leg press,8,250
2025-04-08,leg press,6,260
Figure: flat csv
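
The reshaping from the nested record to these rows can be sketched in Python as well. This is an illustration rather than the jq I actually ran: `flatten` and `to_csv` are hypothetical names, and the sample record is made up to echo the figure above. Note that the workout type ("lower body") is simply dropped, as it is in the flat figure.

```python
import csv
import io


def flatten(workout):
    """Yield one (date, exercise name, reps, weight) row per set."""
    for exercise in workout["exercises"]:
        for one_set in exercise["sets"]:
            yield (workout["date"], exercise["name"],
                   one_set["reps"], one_set["weight"])


def to_csv(workouts):
    """Render the flattened rows of many workouts as csv text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for workout in workouts:
        writer.writerows(flatten(workout))
    return buffer.getvalue()


# Illustrative record in the nested shape (assumed field names).
workout = {
    "date": "2025-04-08",
    "name": "lower body",
    "exercises": [
        {
            "name": "leg press",
            "sets": [
                {"reps": 10, "weight": 240},
                {"reps": 8, "weight": 250},
            ],
        },
    ],
}

print(to_csv([workout]), end="")
# 2025-04-08,leg press,10,240
# 2025-04-08,leg press,8,250
```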

So now I have much flatter data that is much easier to iterate over. And I can do quick analysis on it with awk, csvsql [6], or visidata [7].
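
As a rough idea of what "quick analysis" on the flat rows looks like, here is a sketch using Python's standard library in place of awk or csvsql. The two metrics (heaviest set, and total volume as reps times weight) are my own picks for illustration, and the function names are hypothetical.

```python
import csv
import io
from collections import defaultdict

# The rows from the flat csv figure, inlined so the sketch is self-contained.
FLAT_CSV = """\
2025-04-08,leg extension,10,140
2025-04-08,leg extension,8,150
2025-04-08,leg extension,6,160
2025-04-08,leg press,10,240
2025-04-08,leg press,8,250
2025-04-08,leg press,6,260
"""


def top_weight(csv_text):
    """Heaviest weight lifted per (date, exercise)."""
    best = {}
    for date, name, _reps, weight in csv.reader(io.StringIO(csv_text)):
        key = (date, name)
        best[key] = max(best.get(key, 0), int(weight))
    return best


def total_volume(csv_text):
    """Sum of reps * weight per (date, exercise)."""
    totals = defaultdict(int)
    for date, name, reps, weight in csv.reader(io.StringIO(csv_text)):
        totals[(date, name)] += int(reps) * int(weight)
    return dict(totals)


print(top_weight(FLAT_CSV)[("2025-04-08", "leg press")])    # 260
print(total_volume(FLAT_CSV)[("2025-04-08", "leg press")])  # 5960
```

Each report is a single loop over rows, with no drilling down into nested lists.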

I don’t know whether I would have arrived at this format if I had just spent a few more minutes in the design phase. It’s possible! But it is also possible that this is the kind of thing that is only revealed by making a minimum viable product and then messing around with it.

This is FAFO Driven Development!

Unstructured Data

You can also flatten your unstructured data, also known in this case as “prose”.

I think about the following quote roughly “all the time.”

[It is] notable that the Feynman lectures (3 volumes) write about all of physics in 1800 pages, using only 2 levels of hierarchical headings: chapters and A-level heads in the text. It also uses the methodology of sentences which then cumulate sequentially into paragraphs, rather than the grunts of bullet points. Undergraduate Caltech physics is very complicated material, but it didn’t require an elaborate hierarchy to organize.

Edward Tufte, forum post, ‘Book design: advice and examples’ thread

It is easy to forget, when tempted to structure your writing, that writing already benefits from the structure of sentences and paragraphs. There is no need to go into a mania of sub-sub-headings, etc. Although I do admit to loving a good bullet list: it is concise and symmetrical, and sometimes a nice rest from the unending onslaught of sentences.

As for headings, I strive to limit myself to first and second level headings. I start to feel a little squirrely whenever I introduce a third level of headings. And I’m positively allergic to any heading level beyond that. The css I wrote for this site supports headings up to level 6, which seems outrageous to me! I ought to remove at least levels 5 and 6.

Where lists are concerned, I have two friends who are technical writers and copywriters by trade and they’re not even allowed — either by decree or by their own professional standards, I can’t remember — to use nested lists. If you can’t say what you want to say in a single level of bullet points, then you need to rethink whatever it is you’re trying to say! Outlines are certainly useful to organize your thoughts. But are you writing prose, or an opml file?

One of my main hobbies is designing tabletop games. And one thing ttrpg fans love, both designers and players, is a random table. This is the intersection of structured data and unstructured data. And the hobby is predisposed to over-structuring its data. Tables can sometimes just be lists. And lists can sometimes just be paragraphs of sentences. This is especially true when your desktop publishing application of choice is Scribus [8], which is a FOSS gem but offers only the most rudimentary support for tables, and forces you to rethink your tabular data at every turn.


  1. On Muscle: The Stuff That Moves Us and Why It Matters, Bonnie Tsui. https://search.worldcat.org/title/1493069214

  2. https://pocketmod.com/howto

  3. https://jsonlines.org/

  4. https://jqlang.org/

  5. For future reference, because I jq so infrequently that I can never remember how to actually use it. For example, this oneliner uses the rarely used (for me) but oh so helpful feature of named variables.

    jq '.date as $date
      | .exercises[]
      | .name as $name
      | .sets[]
      | { date: $date, name: $name, reps: .reps, weight: .weight }' workout.jsonl

  6. https://csvkit.readthedocs.io/en/latest/scripts/csvsql.html

  7. https://www.visidata.org/

  8. https://www.scribus.net/