In my recent post about using Ruby for text processing, I used examples that worked with both standard input and files without actually having to alter my code in any way.
I was able to do this using a construct that’s yet another part of
Ruby’s Perl heritage:[1]
ARGF. It’s a stream that reads from
either the files that’ve been passed on the command line or, if none
have been specified, from standard input.
Importantly, it does this without the calling code actually having to
know or care which input it’s reading from; this enables you to emulate
the behaviour of the many Unix utilities that allow you either to pipe
input in or to read from files.
Like other streams in Ruby,
ARGF responds to
each; the block you
pass to it will be invoked once per line in the stream. So, to see how
ARGF works, here’s perhaps the simplest possible use: a script that prints each line of its input straight back out.
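As a minimal sketch, assuming the script is saved as argf.rb: at its core it is just `ARGF.each { |line| puts line }`. The version below wraps that in a method (`echo_lines`, a hypothetical name, not part of Ruby) so that any object responding to `each` can stand in for ARGF.

```ruby
# argf.rb: print every line of input back out.
# `echo_lines` is a hypothetical helper name, not part of Ruby.
def echo_lines(input = ARGF, output = $stdout)
  # ARGF#each yields one line at a time, like any other IO-style stream.
  input.each do |line|
    output.puts line
  end
end

# Only read real input when run as a script, e.g. `ruby argf.rb`.
echo_lines if $PROGRAM_NAME == __FILE__
```

Because the method only relies on `each`, the same code could be handed a `File`, `$stdin`, or a `StringIO` without changes.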
## Reading from files
If we run the above script with arguments, Ruby will assume that each of the arguments is a file, and
will read from each of the files in turn, from left to right. That means
that our script is equivalent to opening each named file in order and printing its lines.
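A rough sketch of that equivalence when arguments are given might look like the following. The helper name `cat_files` is hypothetical, and unlike ARGF this version does not fall back to standard input when no arguments are passed.

```ruby
# Hypothetical helper: print each line of each named file, in order.
def cat_files(filenames, output = $stdout)
  filenames.each do |filename|
    # File.foreach raises Errno::ENOENT if the file does not exist.
    File.foreach(filename) do |line|
      output.puts line
    end
  end
end

# Treat every command-line argument as a filename.
cat_files(ARGV) if $PROGRAM_NAME == __FILE__
```

With two hypothetical files, running something like `ruby argf.rb foo.txt bar.txt` would print the contents of foo.txt followed by the contents of bar.txt.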
If one of the files doesn’t exist, Ruby will throw its standard
error, like so:
$ ruby argf.rb nonexistent.txt
argf.rb:1:in `each': No such file or directory - nonexistent.txt (Errno::ENOENT)
	from argf.rb:1:in `<main>'
## Reading from standard input
If no arguments are specified, then Ruby will read from standard input. That means that our example script is equivalent to iterating over the lines of $stdin directly.
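In code, that equivalence could be sketched roughly as below. The helper name `echo_stdin` is made up for illustration; the input defaults to `$stdin` but can be any line-oriented stream.

```ruby
# Hypothetical helper: echo a stream line by line; defaults to standard input.
def echo_stdin(input = $stdin, output = $stdout)
  input.each_line do |line|
    output.puts line
  end
end

# Only read from real standard input when run as a script.
echo_stdin if $PROGRAM_NAME == __FILE__
```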
This enables us to pipe input into our script. So we could call:
$ printf "foo\nbar\n" | ruby argf.rb
And we’d see the output:
foo
bar
More usefully, this means that we could pipe the input from another process into our script and do something interesting with it.
This “simplest possible” script is, you may have noticed, functionally equivalent to
cat; it will concatenate files passed to it, and it will
echo back standard input.
ARGF has a few methods that are unique to it.
A few are useful when
ARGF is reading from files: we can use
ARGF.filename to get the name of the file that’s currently being read, and
ARGF.file to get an
IO object pointing to the current file.
If you want to know when you’ve moved onto a new file, line numbers
come in handy:
ARGF.file.lineno stores the line number that’s
currently being read, which will naturally be
1 when a new file is
started. So, to read from all the files passed on the command line, but
output the name of the file before starting a new file, you could use:
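A minimal sketch of that idea follows. The method name `print_with_headers` is hypothetical; `filename`, `file`, and `lineno` are the ARGF and IO methods described above.

```ruby
# Hypothetical helper: print each file's name before its first line.
def print_with_headers(input = ARGF, output = $stdout)
  input.each do |line|
    # The current file's own line counter is 1 on the first line of each file.
    output.puts input.filename if input.file.lineno == 1
    output.puts line
  end
end

# Only read real input when run as a script.
print_with_headers if $PROGRAM_NAME == __FILE__
```

When reading from standard input rather than files, ARGF reports the filename as `-`, so the header would be printed once as `-`.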
If you’d like not to process a file,
ARGF has you covered too; just call
ARGF.skip. This is useful if you only want to process files of
a certain type, or want to stop processing part-way through a file (once
you’ve got what you need, for example).
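As a sketch of the first case, assuming we only care about files whose names end in .txt (an arbitrary choice for illustration), something like this could work. The method name `each_txt_line` is hypothetical.

```ruby
# Hypothetical helper: yield lines only from files whose names end in ".txt".
def each_txt_line(input = ARGF)
  input.each do |line|
    unless input.filename.end_with?(".txt")
      # Abandon the current file; reading resumes with the next one, if any.
      input.skip
      next
    end
    yield line
  end
end

# Only read real input when run as a script.
if $PROGRAM_NAME == __FILE__
  each_txt_line { |line| puts line }
end
```

The same pattern could be used to stop part-way through a file: call `skip` once a line matching whatever you were looking for has been seen.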
ARGF is one of the many great examples of how Ruby’s built-in
functionality respects “the Unix way”. It’s essential that flexible and
well-behaved Unix tools accept input both from standard input and from
files, and with
ARGF Ruby makes it trivial to support just that.
If you’ve written scripts that either emulate this behaviour themselves
or that only support one method of input (e.g. only accepting standard
input, or only reading from files), then consider using ARGF:
it can make your life easier and make your scripts more flexible, one
of those win-win situations that are pleasingly frequent in Ruby.
[1] It’s the equivalent of Perl’s