Persisting data in Ruby with PStore

25 January 2014

It’s quite common when developing scripts to want to persist data. Configuration variables; the last options chosen; the previous files read; a cache of method return values or the results of complex calculations. To use a database for this sort of simple persistence, even a lightweight database like SQLite, can seem like overkill.

Faced with this situation, many developers opt to “roll their own” persistence solution. Generally, they store what they want to persist in a hash, then write that hash as JSON or YAML to a file. Sometimes these writes happen as the script runs; sometimes they happen at the end of the script’s execution.

But rolling your own in this way is a bad idea. Apart from requiring a lot of boilerplate, it’s inconvenient: you’re forced to write the file manually whenever you want to persist your data. It’s also prone to data loss if multiple processes read from and write to the same file, which forces you to implement locking (or not to, and then corrupt your data).

Wouldn’t it be nice if there was a hash or hash-like data structure that persisted itself to disk for us, without us having to worry about serialising data or writing files?

Fortunately, Ruby’s standard library has us covered with its persistent store library, which it calls PStore; it’s both a specific implementation of a persistent store and an interface that other implementations share. Let’s take a look at both the regular form of PStore and an alternative implementation, to see how they can help us persist data simply and safely.

Regular `PStore`

PStore is part of the standard library, so we can start using it with a simple require — no gems or external dependencies required.

We can create a new persistent store by passing a filename to PStore.new:

require "pstore"

store = PStore.new("data.pstore")

If the file doesn’t exist, PStore will create it the first time we write to the store; otherwise, the existing data will be read from it.
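One way to see exactly when the file appears is to check for it before and after the first write. The sketch below assumes a throwaway directory from Ruby’s tmpdir library (so nothing is left behind), and the `:greeting` key is purely illustrative:

```ruby
require "pstore"
require "tmpdir"

# Dir.mktmpdir returns the block's value and deletes the directory afterwards.
exists_after_new, exists_after_write = Dir.mktmpdir do |dir|
  path  = File.join(dir, "data.pstore")
  store = PStore.new(path)

  # PStore.new remembers the filename but doesn't create the file yet.
  after_new = File.exist?(path)

  # The first read-write transaction opens (and creates) the file.
  store.transaction { store[:greeting] = "hello" }

  [after_new, File.exist?(path)]
end

puts exists_after_new   # prints false
puts exists_after_write # prints true
```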

That’s all the groundwork we need. Our store variable now points to a persistent hash, and we can start writing data to it.

Writing data

If we try to treat our persistent store like a regular hash, though, we’ll not have much luck:

store[:last_file] = "example.txt"
# PStore::Error: not in transaction

That’s because PStore requires you to both read and write data from within transactions. But that’s a blessing, not a hindrance. Transactions neatly solve the problem of multiple processes accessing the same store: PStore locks the file for the duration of each transaction, so two transactions can never write to it at the same time. They also let you roll back your changes if you encounter a problem, ensuring that data is never written in an incomplete or corrupted state.

We can start a new transaction by calling the transaction method on our store and passing it a block to execute. Here, we make the same modification we tried to make before:

store.transaction do
  store[:last_file] = "example.txt"
end

This time no exception is raised, and when our block finishes, the changes we’ve made are written to the file automatically. We don’t need to wait for our script to finish, or for some other later point, for the persistence to happen; it happens at the end of every transaction.
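To check that the data really reached the disk, we can open the same file with a second, brand-new PStore object, which stands in for a later run of the script. This is a minimal sketch; the temporary directory is just there to keep the example self-contained:

```ruby
require "pstore"
require "tmpdir"

reloaded = Dir.mktmpdir do |dir|
  path = File.join(dir, "data.pstore")

  writer = PStore.new(path)
  writer.transaction { writer[:last_file] = "example.txt" }

  # A fresh object has no in-memory state, so anything it sees
  # must have been read back from the file.
  reader = PStore.new(path)
  reader.transaction { reader[:last_file] }
end

puts reloaded # prints example.txt
```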

We can make as many changes to the store as we like during one transaction:

store.transaction do
  store[:last_file]        = "example.txt"
  store[:last_file_opened] = Time.now
end
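The store isn’t a full Hash, but alongside `[]` and `[]=` it offers a few other hash-like methods: `roots` lists the stored keys, `root?` checks for a key, and `delete` removes one. A short sketch, again assuming a temporary directory for the file:

```ruby
require "pstore"
require "tmpdir"

keys_before, had_opened, keys_after = Dir.mktmpdir do |dir|
  store = PStore.new(File.join(dir, "data.pstore"))

  store.transaction do
    store[:last_file]        = "example.txt"
    store[:last_file_opened] = Time.now
  end

  store.transaction do
    before = store.roots                    # every key currently stored
    opened = store.root?(:last_file_opened) # like Hash#key?
    store.delete(:last_file_opened)         # gone once this transaction ends
    [before, opened, store.roots]
  end
end

p keys_before # prints [:last_file, :last_file_opened]
p had_opened  # prints true
p keys_after  # prints [:last_file]
```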

Sometimes, we’ll be calculating data as we progress through a transaction. If we discover part-way through that we can’t or don’t want to finish the transaction, we can call abort. It’ll return from the block and discard any changes that have been made.

store.transaction do
  store[:last_file] = "example.txt"

  user = User.get_current_user
  store.abort unless user

  store[:last_file_user] = user
end

Our final line in the transaction, storing the user’s details, will only be reached if we actually have a user; otherwise, no data will be stored at all, not even the first value we wrote.
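Here is a runnable version of that example. `User.get_current_user` isn’t something we can run on its own, so the sketch swaps in a hypothetical `current_user` helper that always returns nil, which forces the abort path:

```ruby
require "pstore"
require "tmpdir"

# Hypothetical stand-in for User.get_current_user: nobody is logged in.
def current_user
  nil
end

stored_keys = Dir.mktmpdir do |dir|
  store = PStore.new(File.join(dir, "data.pstore"))

  store.transaction do
    store[:last_file] = "example.txt"
    user = current_user
    store.abort unless user # discards :last_file too
    store[:last_file_user] = user
  end

  # A fresh transaction shows what actually reached the disk.
  store.transaction { store.roots }
end

p stored_keys # prints []
```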

We might also want to do the opposite, exiting the block but saving what we’ve done so far. For that, we can use commit rather than abort:

store.transaction do
  store[:last_file] = "example.txt"

  user = User.get_current_user
  store.commit unless user

  store[:last_file_user] = user
end

In this case, if a user isn’t found, we’ll exit the block before writing the :last_file_user value, but the :last_file value will be written.
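The same kind of sketch works for `commit`, again using a hypothetical `current_user` helper that returns nil. It also shows a side effect worth knowing: because `commit` (like `abort`) leaves the block early, the transaction returns nil rather than the block’s last value.

```ruby
require "pstore"
require "tmpdir"

# Hypothetical stand-in for User.get_current_user: nobody is logged in.
def current_user
  nil
end

returned, saved_keys = Dir.mktmpdir do |dir|
  store = PStore.new(File.join(dir, "data.pstore"))

  result = store.transaction do
    store[:last_file] = "example.txt"
    user = current_user
    store.commit unless user # saves :last_file and leaves the block here
    store[:last_file_user] = user
  end

  [result, store.transaction { store.roots }]
end

p returned   # prints nil
p saved_keys # prints [:last_file]
```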

Reading data

Reading data is straightforward. Just like when we write data, we need to read it in a transaction — to make sure nobody’s trying to write the data at the same time — but otherwise, we just treat the store like a normal hash:

last_file = store.transaction { store[:last_file] }

transaction returns whatever the block returns, so if we’re just fetching a single value, we can use this neat one-liner.
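When a transaction only reads, we can pass `true` to `transaction` to make it read-only. As I understand the standard implementation, PStore then takes a shared rather than exclusive file lock (so readers don’t block each other) and skips writing the file at the end; any attempt to modify the store inside it raises `PStore::Error`. A minimal sketch:

```ruby
require "pstore"
require "tmpdir"

last_file, error_message = Dir.mktmpdir do |dir|
  store = PStore.new(File.join(dir, "data.pstore"))
  store.transaction { store[:last_file] = "example.txt" }

  # Read-only transaction: reading works exactly as before.
  value = store.transaction(true) { store[:last_file] }

  # Writing inside a read-only transaction is rejected.
  message = begin
    store.transaction(true) { store[:last_file] = "other.txt" }
  rescue PStore::Error => e
    e.message
  end

  [value, message]
end

puts last_file # prints example.txt
```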

PStore offers a fetch method too, which — like Hash#fetch — allows you to specify a default value, used when the key doesn’t exist in the hash. So if we were to call:

value = store.transaction { store.fetch(:nonexistent_key, "default value") }
# => "default value"

We see that, since our key doesn’t exist, fetch returns its second argument.
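If we leave out the default, `fetch` raises instead of returning nil. Note that it raises `PStore::Error` rather than the `KeyError` that `Hash#fetch` raises. A short sketch against an empty store:

```ruby
require "pstore"
require "tmpdir"

with_default, missing_error = Dir.mktmpdir do |dir|
  store = PStore.new(File.join(dir, "data.pstore"))

  store.transaction(true) do
    fallback = store.fetch(:nonexistent_key, "default value")

    # No default: fetch raises PStore::Error for a missing key.
    error = begin
      store.fetch(:nonexistent_key)
      nil
    rescue PStore::Error => e
      e
    end

    [fallback, error]
  end
end

puts with_default        # prints default value
puts missing_error.class # prints PStore::Error
```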

The file format

Under the hood, PStore uses Marshal to convert the hash into something it can write to disk. Marshal produces a Ruby-specific binary format that isn’t human-readable; whatever its other merits, that alone is enough to make it unsuitable for some applications.
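We can confirm this by reading the file back ourselves and handing the bytes to `Marshal.load`. This sketch relies on the standard PStore writing a plain marshaled hash, which is how I understand the implementation to behave rather than something it documents as a guarantee:

```ruby
require "pstore"
require "tmpdir"

raw_bytes, decoded = Dir.mktmpdir do |dir|
  path  = File.join(dir, "data.pstore")
  store = PStore.new(path)
  store.transaction { store[:last_file] = "example.txt" }

  bytes = File.binread(path)
  [bytes, Marshal.load(bytes)]
end

# The first two bytes are Marshal's format version (4.8).
p raw_bytes.unpack("C2") # prints [4, 8]
p decoded[:last_file]    # prints "example.txt"
```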

Fortunately, the PStore class isn’t the only implementation of this interface; there are others that work the same way but read and write data in different formats. One of these, which also comes with the standard library, is YAML::Store.

`YAML::Store`

YAML::Store, as the name suggests, uses YAML as its data format when persisting the hash; the nice thing about this is that the data as written is human-readable and human-editable. Although finding yourself editing your persisted data regularly should probably prompt you to reconsider how you’re doing things, I often find it useful to be able to peek into the data and modify it, removing individual keys or adjusting individual values, which is impractical with Marshaled data.

YAML::Store actually inherits from PStore, so in practice the only difference between the two is how you create them. Rather than PStore.new, we call YAML::Store.new:

require "yaml/store"

store = YAML::Store.new("data.yml")

After that, the store we have works the same as a regular PStore; we read and write data using transactions in exactly the same way, and it behaves like a hash.
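To see the human-readable side in action, this sketch writes a couple of values, prints the resulting file, then edits the file directly (as you might in a text editor) and reads the change back through the store. It sticks to string keys and simple values, which I’d assume keeps the YAML plain and loadable across Ruby versions; the `open_count` key is illustrative only:

```ruby
require "yaml/store"
require "tmpdir"

yaml_text, edited_value = Dir.mktmpdir do |dir|
  path  = File.join(dir, "data.yml")
  store = YAML::Store.new(path)

  store.transaction do
    store["last_file"]  = "example.txt"
    store["open_count"] = 3
  end

  text = File.read(path)

  # Simulate a hand edit of the file, then read it back through the store.
  File.write(path, text.sub("example.txt", "edited.txt"))
  [text, store.transaction(true) { store["last_file"] }]
end

puts yaml_text
# ---
# last_file: example.txt
# open_count: 3
puts edited_value # prints edited.txt
```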

Which one you choose, for my money, depends on whether you value performance or human readability more in the particular application you’re writing. I almost never need to optimise the performance of my persistent store, and I generally value being able to view and edit the data; so I primarily choose YAML::Store. Your mileage, as ever, may vary.

Next time you find yourself wanting to cache small snippets of data, persist options, or otherwise write structured data to disk in a painless way that you know is safe, reach for PStore. Compared to rolling your own solution, you’re likely both to save time and to improve reliability.

Roblog