recfiles
you won't believe this one weird database
2022-01-07
Introduction
recfiles (“record” files) are a file format for creating feature-rich databases in plain text with minimal markup. The GNU project offers a series of tools and libraries called recutils to work with recfiles.
This post will introduce some of the features of recfiles, list out some pros and cons, and talk about how I’ve been using them lately.
Edit: Thanks to lucidiot and mio for proofreading and submitting corrections.
Features
Records | Querying | Basic database stuff | Joins | Templates
Records
At its most basic, a recfile consists of simple fields and values.
Here’s a database called books.rec:
title: Green Eggs and Ham
author: Ted Geisel
published: 1960
location: home

title: The Very Hungry Caterpillar
author: Eric Carle
published: 1969
location: home

title: King Bidgood's in the Bathtub
author: Audrey Wood
published: 1993
location: loaned
You can see that a recfile consists of multiple records separated by a blank line.
Each record can contain an arbitrary number of lines in a field: value format.
That’s all you need to get started.
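To make the format concrete, here is a minimal Python sketch of how records could be read out of that text. This is not how recutils is implemented; parse_records is a hypothetical helper, and it deliberately ignores things a real recfile allows, such as comments, repeated fields, and continuation lines.

```python
def parse_records(text):
    """Parse recfile-style text into a list of dicts, one per record."""
    records, current = [], {}
    for line in text.splitlines():
        if not line.strip():
            # A blank line closes the record being built.
            if current:
                records.append(current)
                current = {}
            continue
        # Split on the first colon only, so values may contain colons.
        name, _, value = line.partition(":")
        current[name.strip()] = value.strip()
    if current:
        records.append(current)  # the last record has no trailing blank line
    return records


BOOKS = """\
title: Green Eggs and Ham
author: Ted Geisel
published: 1960
location: home

title: The Very Hungry Caterpillar
author: Eric Carle
published: 1969
location: home

title: King Bidgood's in the Bathtub
author: Audrey Wood
published: 1993
location: loaned
"""

books = parse_records(BOOKS)
print(len(books))          # 3
print(books[0]["author"])  # Ted Geisel
```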
Querying
The first recutil command you’ll encounter is probably recsel.
If I want to see all entries in books.rec, I can run recsel books.rec. (This is effectively equivalent to running cat books.rec.)
Or I can specify fields to print with the -p flag:
recsel -p author,title books.rec
out:
author: Ted Geisel
title: Green Eggs and Ham

author: Eric Carle
title: The Very Hungry Caterpillar

author: Audrey Wood
title: King Bidgood's in the Bathtub
You can also filter records by evaluating an expression with the -e flag:
recsel -p title -e "published > 1970" books.rec
out:
title: King Bidgood's in the Bathtub
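If it helps to think about what -p and -e are doing, here is a rough Python sketch of the same idea: keep the records that pass a test, then print only the requested fields. The select function and its where parameter are made up for illustration; recsel has its own expression language, which this does not try to reproduce.

```python
def select(records, fields, where=None):
    """Keep records that pass `where`, then project them onto `fields`."""
    selected = []
    for rec in records:
        if where is None or where(rec):
            # Copy only the requested fields, in the requested order.
            selected.append({f: rec[f] for f in fields if f in rec})
    return selected


books = [
    {"title": "Green Eggs and Ham", "author": "Ted Geisel", "published": "1960"},
    {"title": "The Very Hungry Caterpillar", "author": "Eric Carle", "published": "1969"},
    {"title": "King Bidgood's in the Bathtub", "author": "Audrey Wood", "published": "1993"},
]

# Roughly: recsel -p title -e "published > 1970" books.rec
recent = select(books, ["title"], where=lambda r: int(r["published"]) > 1970)
for rec in recent:
    for name, value in rec.items():
        print(f"{name}: {value}")
```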
Basic database stuff
You can add all kinds of metadata to your recfile, including constraints and required fields, keys, and autoincrementing fields:
%rec: book
%doc: this represents a book in your collection
%type: published date
%type: author line
%type: title line
%type: location enum home loaned
%mandatory: author title
%key: id
%type: id int
%auto: id
%unique: id

id: 0
title: Green Eggs and Ham
author: Ted Geisel
published: 1960
location: home

id: 1
title: The Very Hungry Caterpillar
author: Eric Carle
published: 1969
location: home

id: 2
title: King Bidgood's in the Bathtub
author: Audrey Wood
published: 1993
location: loaned
(I added an id field here. It’s unique and autoincrements.)
At this point you can continue just editing records by hand, but to level up, you probably want to start using tools like recins, recset, and recdel. This is because if you edit the file by hand, you might make mistakes like omitting a mandatory field or accidentally repeating a value for a unique field, whereas the corresponding recutil will report an error if you try to do that.
Caveat: you can also keep editing by hand and periodically run recfix to check for errors.
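For a feel of what "checking for errors" means here, this is a small Python sketch of two of the checks mentioned above: mandatory fields and unique fields. It is only an illustration under my own assumptions; check is a hypothetical function, and the real recfix covers far more (types, keys, and so on) with its own error messages.

```python
def check(records, mandatory=(), unique=()):
    """Return a list of human-readable problems found in `records`."""
    errors = []
    seen = {field: set() for field in unique}
    for i, rec in enumerate(records):
        for field in mandatory:
            if field not in rec:
                errors.append(f"record {i}: missing mandatory field '{field}'")
        for field in unique:
            if field in rec:
                if rec[field] in seen[field]:
                    errors.append(f"record {i}: duplicate value for unique field '{field}'")
                seen[field].add(rec[field])
    return errors


hand_edited = [
    {"id": "0", "title": "Green Eggs and Ham", "author": "Ted Geisel"},
    {"id": "0", "title": "The Very Hungry Caterpillar"},  # no author, repeated id
]

for problem in check(hand_edited, mandatory=("author", "title"), unique=("id",)):
    print(problem)
```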
Insert:
recins -f title -v "I Want My Hat Back" -f author -v "Jon Klassen" -f published -v 2012 -f location -v home books.rec
Delete:
recdel -e "id = 3" books.rec
Joins
You can join records!
Let’s update our books.rec:
%rec: book
%doc: this represents a book in your collection
%type: published date
%type: author line
%type: title line
%type: location enum home loaned
%mandatory: author title
%key: id
%type: id int
%auto: id
%unique: id
%type: loanedTo rec person

id: 2
title: King Bidgood's in the Bathtub
author: Audrey Wood
published: 1993
location: loaned
loanedTo: jeffFromWork

...

%rec: person
%key: id

id: jeffFromWork
firstName: Jeff
lastName: Osgood
Note:
We added a loanedTo field to book with a type of rec person. This refers to a new person record.
We added a new person record to the same file. It has an id that can be referenced by book.
An aside:
Having book and person in the same file means that we should probably rename books.rec to something generic like db.rec. Let’s assume we did that.
We must also now introduce the -t flag, which ought to be -r since it designates a record to work on:
This will show you all books:
recsel -t book db.rec
And this will show you all persons:
recsel -t person db.rec
</aside>
Now, finally, joins:
recsel -t book -j loanedTo -p title,loanedTo_firstName,loanedTo_lastName db.rec
out:
title: King Bidgood's in the Bathtub
loanedTo_firstName: Jeff
loanedTo_lastName: Osgood
Note: pass -j the name of the field that is a foreign key.
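Here is a Python sketch of the idea behind the join: look up the record the foreign key points at, and copy its fields in with the foreign key's name as a prefix, which is where names like loanedTo_firstName come from. The join function is hypothetical, and dropping books that have no matching person is my assumption, not something I've verified against recsel.

```python
def join(records, foreign_key, targets, key="id"):
    """Attach the fields of the referenced record, prefixed with the foreign key name."""
    by_key = {t[key]: t for t in targets}
    joined = []
    for rec in records:
        target = by_key.get(rec.get(foreign_key))
        if target is None:
            continue  # assumption: records with no match are left out
        row = dict(rec)
        for name, value in target.items():
            row[f"{foreign_key}_{name}"] = value
        joined.append(row)
    return joined


books = [
    {"id": "1", "title": "The Very Hungry Caterpillar", "location": "home"},
    {"id": "2", "title": "King Bidgood's in the Bathtub", "location": "loaned",
     "loanedTo": "jeffFromWork"},
]
persons = [{"id": "jeffFromWork", "firstName": "Jeff", "lastName": "Osgood"}]

for row in join(books, "loanedTo", persons):
    print(row["title"], row["loanedTo_firstName"], row["loanedTo_lastName"])
```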
Templates
recutils also includes a recfmt utility so you can create a template for your data.
books.templ:
Dear {{loanedTo_firstName}} {{loanedTo_lastName}},
You have my book, {{title}}, by {{author}}.
Please return it immediately or I will be forced to resort to extreme measures.
Sincerely,
The Librarian
And then:
recsel -t book \
-j loanedTo \
-p title,author,loanedTo_firstName,loanedTo_lastName db.rec |\
recfmt -f books.templ
out:
Dear Jeff Osgood,
You have my book, King Bidgood's in the Bathtub, by Audrey Wood.
Please return it immediately or I will be forced to resort to extreme measures.
Sincerely,
The Librarian
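The substitution itself is simple enough to sketch in a few lines of Python. This render function is hypothetical and only handles plain field names inside {{ }}; treating a missing field as an empty string is my assumption, not a statement about what recfmt does.

```python
import re


def render(template, record):
    """Replace each {{field}} slot in `template` with the record's value."""
    def substitute(match):
        # Assumption: unknown fields render as an empty string.
        return record.get(match.group(1).strip(), "")
    return re.sub(r"\{\{(.*?)\}\}", substitute, template)


template = "Dear {{loanedTo_firstName}} {{loanedTo_lastName}},\nYou have my book, {{title}}, by {{author}}."
record = {
    "title": "King Bidgood's in the Bathtub",
    "author": "Audrey Wood",
    "loanedTo_firstName": "Jeff",
    "loanedTo_lastName": "Osgood",
}

print(render(template, record))
```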
Pros and Cons
Pros:
A plain text database
human-readable
human-editable
diffable and can be checked into version control
Easier (less syntax) than other plain text formats like yaml and json
Serializable: rec2csv is included in recutils. From there you can use csvkit to convert it to json. And between csv and json, you can do anything you want with your data. The inverse is also true: recutils includes csv2rec. If you have a json source, you can convert to csv and then to a recfile.
Incremental features/complexity: you can just start typing in your text editor and create a simple recfile without much planning or thinking about a schema. You can add types and constraints and stuff later if you want to, but it’s never necessary.
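On the serialization point: once records are just field/value pairs, getting to csv or json is mostly mechanical. Here is a Python sketch using only the standard library. It stands in for the rec2csv and csvkit pipeline rather than reproducing it; to_csv is a hypothetical helper, and filling missing fields with empty cells is my assumption.

```python
import csv
import io
import json


def to_csv(records):
    """Serialize a list of dict records to CSV text with a header row."""
    fields = []
    for rec in records:
        for name in rec:
            if name not in fields:
                fields.append(name)  # keep first-seen field order
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fields, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)  # missing fields become empty cells
    return buf.getvalue()


books = [
    {"title": "Green Eggs and Ham", "author": "Ted Geisel"},
    {"title": "The Very Hungry Caterpillar", "author": "Eric Carle"},
]

print(to_csv(books))
print(json.dumps(books, indent=2))
```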
Cons:
I don’t know, it’s kind of weird?
Using the recutils is kind of verbose and unergonomic at times, I think, so they often beg to be wrapped up in a bash script or some other kind of script or command runner.
How I’ve Been Using It
Easier to read csv files: I frequently export my Goodreads library and my Calibre library to csv. Both are fairly hard to browse in that format, but are super easy to read and query as recfiles.
Gamelog: I started keeping a journal of all the games I play in a recfile. It’s super easy to maintain because I use few recfile features. It’s first and foremost a plain text file. But I can also query it.
Config: it’s dead easy to just start typing keys and values into a recfile and use that as configuration for a program.
Conclusion
So that’s everything I know about recfiles!
I think they’re kind of neat.
If you want some kind of queryable structured text with minimal markup and syntax, then this is for you. That’s its sweet spot.
While it offers a lot of basic database features, if that’s the direction you’re going, you might be happier just putting your data in a sqlite database or something at the cost of human-readable, human-writable, diffable text.
Resources
project home page: https://www.gnu.org/software/recutils/
tomasino labs (blog): https://labs.tomasino.org/gnu-recutils/
Recutils — Small Technology Notes: https://john.colagioia.net/blog/2020/02/05/recutils.html