Paths aren't strings

18 January 2014

In Ruby, we often deal with files: reading them, writing them, checking whether or not they exist. When working with these files, we generally reference them by their paths on the filesystem: /etc/hosts, for example, or /usr/local/bin/git.

As in other languages, it’s pretty common in Ruby to represent these filesystem paths as strings. In a way, that’s fine: it works okay, and if we want to do something with the files they represent, there are methods on File that can help (fetching the absolute path of a relative filename, or checking whether a file exists, for example).

But in the world of Ruby, with its rich object model, this feels neither very idiomatic nor very object oriented. There’s lots of behaviour associated with paths, and strings don’t encapsulate this behaviour very well.

Paths can be relative, for example. That is, multiple paths that look different as strings can in fact refer to the same file: if we’re in the /usr/local directory, we can reach /etc/hosts using either /etc/hosts or ../../etc/hosts, and we can reach /usr/local/bin/git with either /usr/local/bin/git or bin/git. To check whether two string paths refer to the same file, then, we can’t just do path1 == path2.
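To make the problem concrete, here is a small sketch that sticks to plain strings. The base directory is passed explicitly so that the result doesn’t depend on where the script is run, and the variable names are purely illustrative.

```ruby
# Two string paths that name the same file when resolved from /usr/local.
base     = "/usr/local"
absolute = "/etc/hosts"
relative = "../../etc/hosts"

# Plain string comparison sees two different values.
naive_match = (absolute == relative)

# Resolving both against the same base directory shows that they agree.
# File.expand_path works lexically here, so neither file needs to exist.
resolved_match =
  File.expand_path(absolute, base) == File.expand_path(relative, base)

puts naive_match     # false
puts resolved_match  # true
```

The comparison only works once we remember to reach for a separate method on File; the strings themselves know nothing about it.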

That’s not all. Paths are representations of files, and those files have attributes and states that matter to our programs. Does the path point to a directory, for example? Does the file the path points to exist? How big is it? Can we read from it? Can we write to it?

Paths are fundamentally also a hierarchical data type, expressed using a delimiter (usually /); we can traverse deeper into the filesystem by adding slash-separated values to a path, and climb back up the filesystem hierarchy by removing them.

The String class in Ruby is aware of precisely none of these behaviours, so if we want them we’re forced to use a kludgey mix of static methods: File.join to build up paths, File.exist? to check for the existence of files, and so on. Some things can’t really be done cleanly at all if we store our paths as strings, assuming that traversing the filesystem with split("/") fills you, rightly, with unease.

So if storing paths as strings is an anti-pattern, what are we to do? Well, it turns out that the Ruby standard library comes with a type for just this purpose, albeit one that’s underused: Pathname.

Pathname is part of the standard library in Ruby; it’s not an external dependency like a gem, so you can safely rely on it being present in all your scripts. Once we’ve required the library, we can create a Pathname in Ruby by passing a string to Pathname.new:

require "pathname"

path = Pathname.new("/etc/hosts")

In fact, there’s a shortcut for Pathname.new; just call Pathname like a method:

path = Pathname("/etc/hosts")

If we do nothing else, we’ve got ourselves an object that behaves in many ways like a string. Its to_s method, for example, returns the path as a human-readable, ordinary string:

path.to_s
# => "/etc/hosts"

In places where things are implicitly converted to strings, then — like puts and print — we can use our Pathname object just as we would a normal string.

It also implements to_path, which is used internally by the File class; so, we can pass our Pathname object into something like File.open, and it will act just the same as if we passed it the path as a string:

File.open(path, "r") { |file| puts file.read }
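As a rough sketch of both conversions at work, the following writes and reads a file in a throwaway directory. The file name notes.txt is made up for the example, and Dir.mktmpdir is used only so the snippet can run anywhere without touching real files.

```ruby
require "pathname"
require "tmpdir"

contents = nil
interpolated = nil

Dir.mktmpdir do |dir|
  # Build a Pathname inside a temporary directory (notes.txt is hypothetical).
  path = Pathname(dir) + "notes.txt"

  # File's class methods accept the Pathname directly, via to_path.
  File.write(path, "hello\n")
  contents = File.read(path)

  # String interpolation calls to_s, so the path reads like a plain string.
  interpolated = "Wrote #{path}"
end

puts contents
puts interpolated
```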

But we also gain a lot of methods that a string doesn’t have. In this brief overview, I’m going to split them into two categories: inquiry and traversal.

Inquiry

Since our Pathname object knows that it represents a path to a file, unlike a string would, we can ask it questions about the file that our path represents. To continue our above example, we might want to check whether the path points to a directory:

path.directory?
# => false

Or whether the file actually exists:

path.exist?
# => true

We can also check whether the current process has permission to either read from or write to the file:

path.readable?
# => true
path.writable?
# => false

Of course, these aren’t particularly exciting features; they’re already fairly accessible as part of the File class thanks to the FileTest module. But it certainly feels a lot more OO to pass these messages to the path itself, rather than using some entirely separate static methods.
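For a slightly fuller picture, here is a sketch that asks a few more questions of the same kind. It assumes a temporary directory and a made-up file called report.csv, so the answers are predictable wherever it runs.

```ruby
require "pathname"
require "tmpdir"

answers = {}

Dir.mktmpdir do |dir|
  root = Pathname(dir)
  file = root + "report.csv"   # hypothetical file name
  file.write("a,b\n1,2\n")     # 8 bytes of content

  answers = {
    root_is_directory: root.directory?,
    file_is_file:      file.file?,
    file_exists:       file.exist?,
    missing_exists:    (root + "missing.txt").exist?,
    size_in_bytes:     file.size,
    extension:         file.extname,
    absolute:          file.absolute?,
  }
end

answers.each { |question, answer| puts "#{question}: #{answer}" }
```

Each question is a message sent to the path itself, which is the point of the exercise: nothing here needs a second argument telling a static method which file we mean.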

Traversal

For my money, though, it’s when traversing the filesystem that representing paths as Pathname objects really starts to feel worthwhile.

Let’s imagine that you have the following folder structure:

lib/
  + script.rb
data/
  + file.txt

You want to access file.txt from your script.rb script, but you want to make sure that this works whatever working directory you run the script from. That means you need to figure out what the absolute path to file.txt is, and then reference it using this absolute path.

If you’ve written a gem, for example, you might well have encountered this sort of task before. A solution I often see is something like the following:

path = File.expand_path(File.join(File.dirname(__FILE__), "..", "data", "file.txt"))

I see this pattern in gems a lot, and despite having seen it hundreds of times and knowing instinctively what it’s doing, it still throws me a little when I encounter it: there’s so much noise there that I have to actively think about what the author is doing.

Let’s rewrite this to use Pathname, and see if we can’t reveal our intentions a little more clearly:

path = Pathname(__FILE__).dirname.parent + "data" + "file.txt"

We start by getting a reference to the current file. Then, we go up one level to the directory that the file resides in; then up another to the directory one level above.¹

The next step, if you’re used to representing files as strings, might seem odd: we’re just using the + operator to add elements onto the path, but we’re not adding a separator as we might otherwise do either manually ("foo" + "/bar") or with File.join. That’s because Pathname will take care of adding the separators for us every time we append a new element to the path.

I don’t know about you, but to me the second example seems clearer.
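To see what the + operator produces without depending on a real project, here is a sketch that starts from a made-up script location (/home/alice/project/lib/script.rb is an assumption, standing in for __FILE__). It also shows two related behaviours worth knowing about: join, and what happens when the right-hand side is itself absolute. One caveat: __FILE__ can be a relative path depending on how the script was invoked, in which case the result is relative too; expand_path is one way to make it absolute.

```ruby
require "pathname"

# A made-up location for script.rb; nothing here touches the filesystem.
script = Pathname("/home/alice/project/lib/script.rb")

# Up to lib/, up again to the project root, then down into data/.
data_file = script.dirname.parent + "data" + "file.txt"
puts data_file   # /home/alice/project/data/file.txt

# join takes several components at once and gives the same result.
same_file = script.dirname.parent.join("data", "file.txt")

# Appending an absolute path discards the left-hand side
# (File.join would concatenate the two instead).
replaced = Pathname("/usr/local") + "/etc/hosts"

# expand_path turns a relative path into an absolute one, resolved here
# against an explicit base directory rather than the working directory.
absolute = Pathname("lib/script.rb").expand_path("/home/alice/project")
```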

We’re not just limited to this simple traversal, either. Let’s imagine we have a path to a file deep in the hierarchy of the file system:

path = Pathname("/some/really/deep/file/in/some/really/deep/folder")

Imagine we want to work our way up the filesystem from our current location until we hit a certain point: a directory with a certain name, for example. There’s no straightforward way to do this with the path represented as a string, but with Pathname it’s easy:

dir = nil
path.ascend { |f| dir = f and break if f.basename.to_s == "some" }

Here, we climb upwards through the filesystem (so we get to folder, then up to deep, then up to really, and so on backwards through the path). As soon as we find a directory whose name is some, we’ve found what we’re looking for and so break out of our loop.
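An alternative sketch of the same search, assuming a Ruby version in which ascend returns an Enumerator when called without a block (recent versions do). That lets find do the looping and the early exit for us:

```ruby
require "pathname"

path = Pathname("/some/really/deep/file/in/some/really/deep/folder")

# Without a block, ascend returns an Enumerator, so find can stop at the
# first match and return it (or nil when nothing matches).
dir = path.ascend.find { |f| f.basename.to_s == "some" }

puts dir   # /some/really/deep/file/in/some
```

Note that the match is the nearest directory called some, not the one at the top of the path, because we are climbing from the bottom.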

(If we wanted to proceed in the opposite direction — that is, to start with /, then /some, then /some/really, and so on — we could use descend, which is otherwise identical to ascend.)
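A minimal sketch of descend, using a shorter made-up path so the steps are easy to read:

```ruby
require "pathname"

path = Pathname("/some/really/deep")

# Collect each step as a string, starting from the root and working down.
steps = []
path.descend { |step| steps << step.to_s }

puts steps
# /
# /some
# /some/really
# /some/really/deep
```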

The great thing about this type of traversal is that we don’t have to touch the filesystem at all. The above example, with its path that doesn’t exist at all, will still execute perfectly well; Pathname has enough information from the path to know each step along the way, right up to the filesystem root.

That’s not to say that we can’t access the filesystem when we want to, though. For example, we don’t have to traverse the filesystem upwards: we can drill down into it with children:

etc = Pathname("/etc")
etc.children
# => [#<Pathname:/etc/AFP.conf>,
#     #<Pathname:/etc/afpovertcp.cfg>,
#     #<Pathname:/etc/aliases>,
#     ...

The array returned by children contains references — as Pathname objects, naturally — to all the files and directories in the /etc directory.

From here, it’s a short leap to powerful and expressive traversal of the filesystem, especially for methods like children that return arrays (and so have the full power of Enumerable available to them). For example, let’s fetch all the directories in the current directory that have more than 10 files in them:

big_dirs = Pathname(".").children
  .select { |child| child.directory? && child.children.length > 10 }

Or find all the directories that have at least one CSS file in them:

has_css = Pathname(".").children
  .select { |child| child.directory? && child.children.any? { |file| file.extname == ".css" } }
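To try the CSS query somewhere predictable, here is a self-contained sketch that builds a tiny directory tree first. The layout (site/ with a stylesheet, docs/ without one) is invented for the example, and it lives in a temporary directory so nothing real is touched.

```ruby
require "pathname"
require "tmpdir"

with_css = []

Dir.mktmpdir do |dir|
  root = Pathname(dir)

  # Hypothetical layout: site/ holds a stylesheet, docs/ does not.
  (root + "site").mkdir
  (root + "docs").mkdir
  (root + "site" + "main.css").write("body {}\n")
  (root + "docs" + "readme.txt").write("hello\n")

  # Keep only the directories that contain at least one .css file,
  # then reduce each match to its bare name for display.
  with_css = root.children
    .select { |child| child.directory? && child.children.any? { |f| f.extname == ".css" } }
    .map { |child| child.basename.to_s }
end

puts with_css   # site
```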

There’s much more to Pathname than the small snippet I’ve presented here, but hopefully it’s been enough to convince you to think about using Pathname the next time you want to represent file paths in Ruby. It’s powerful, semantic and, since it’s part of the Ruby standard library, there’s not much excuse not to use it.

¹ In Ruby 2.0 and later, we can simplify this further by calling Pathname(__dir__), eliminating the need for the call to dirname.

Roblog