Roblog

ARGF in Ruby

3 December 2013

In my recent post about using Ruby for text processing, I used examples that worked with both standard input and files without actually having to alter my code in any way.

I was able to do this using a construct that’s yet another part of Ruby’s Perl heritage [1]: ARGF. It’s a stream that reads from the files that have been passed on the command line or, if none have been specified, from standard input.

Importantly, it does this without the calling code actually having to know or care which input it’s reading from; this enables you to emulate the behaviour of many Unix utilities — such as cat, cut, grep, and hosts of others — that allow you either to pipe input or read from files.

Diving in

Like other streams in Ruby, ARGF responds to each; the block you pass to it will be invoked once per line in the stream. So to demonstrate how ARGF works, here’s perhaps the simplest possible use of it:

# argf.rb
ARGF.each do |line|
  puts line
end

Reading from files

If we run the above script with arguments, like so:

$ ruby argf.rb foo.txt bar.txt baz.txt

Then Ruby will assume that each of the arguments is a file, and ARGF will read from each of the files in turn, from left to right. That means that our script is equivalent to:

ARGV.each do |file|
  File.foreach(file) do |line| # foreach closes each file when it's done
    puts line
  end
end
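
One difference from the hand-written loop is worth knowing about: ARGF takes each filename out of ARGV as it opens it, so ARGV is empty once everything has been read. Here's a small sketch of that behaviour. The helper name and the sample files are made up for the demo, and ARGV.replace stands in for real command-line arguments:

```ruby
require "tmpdir"

# Hypothetical helper for this demo: writes two small files, points ARGV at
# them, and returns what ARGF read plus whatever is left in ARGV afterwards.
def read_sample_files_with_argf
  Dir.mktmpdir do |dir|
    paths = %w[foo.txt bar.txt].map do |name|
      path = File.join(dir, name)
      File.write(path, "#{name} line 1\n#{name} line 2\n")
      path
    end

    ARGV.replace(paths)    # stand-in for `ruby argf.rb foo.txt bar.txt`
    lines = ARGF.readlines # reads foo.txt, then bar.txt
    [lines, ARGV.dup]
  end
end

lines, remaining = read_sample_files_with_argf
puts lines
puts "ARGV afterwards: #{remaining.inspect}"
```

If a script needs the list of filenames later on, it's safest to copy ARGV before reading from ARGF.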

If one of the files doesn’t exist, Ruby will raise its standard Errno::ENOENT error, like so:

$ ruby argf.rb nonexistent.txt
argf.rb:1:in `each': No such file or directory - nonexistent.txt (Errno::ENOENT)
	from argf.rb:1:in `<main>'
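
If you'd rather show a short message than a stack trace, one option is to rescue the error yourself. This is only a sketch: cat_argf is a hypothetical wrapper around the loop from argf.rb, and the missing path is created artificially inside a temporary directory:

```ruby
require "tmpdir"

# Hypothetical wrapper around the argf.rb loop: copies ARGF to `out` and
# returns false, rather than crashing, when a named file is missing.
def cat_argf(out = $stdout)
  ARGF.each_line { |line| out.puts line }
  true
rescue Errno::ENOENT => e
  warn "argf.rb: #{e.message}"
  false
end

# Stand-in for `ruby argf.rb nonexistent.txt`; the temp directory is empty,
# so the file is guaranteed not to exist.
ok = Dir.mktmpdir do |dir|
  ARGV.replace([File.join(dir, "nonexistent.txt")])
  cat_argf
end
puts "finished cleanly? #{ok}"
```

In a real script you'd probably finish with something like `exit 1 unless ok`, so that callers can detect the failure.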

Reading from standard input

If no arguments are specified, then ARGF will read from standard input. That means that our example script is equivalent to:

while line = $stdin.gets
  puts line
end

This enables us to pipe input into our script. So we could call:

$ printf "foo\nbar\n" | ruby argf.rb

And we’d see the output:

foo
bar

More usefully, this means that we could pipe the input from another process into our script and do something interesting with it.

This “simplest possible” script is, you may have noticed, functionally equivalent to cat; it will concatenate files passed to it, and it will echo back standard input.
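
The same approach stretches to something grep-like. The sketch below assumes the first argument is a pattern and the rest are files; the pattern has to be removed from ARGV first, because ARGF treats whatever is left there as filenames. The names grep_argf and grep_sample are made up for this example, and the temporary file stands in for a real one:

```ruby
require "tempfile"

# Hypothetical mini-grep: takes the pattern off the front of ARGV, then
# returns every line from ARGF (files or standard input) that matches it.
def grep_argf
  pattern = Regexp.new(ARGV.shift)
  ARGF.each_line.grep(pattern)
end

# Demo helper: writes `text` to a temporary file and runs grep_argf as if
# the script had been called with `pattern` and that file as arguments.
def grep_sample(pattern, text)
  Tempfile.create("argf-demo") do |file|
    file.write(text)
    file.flush
    ARGV.replace([pattern, file.path])
    grep_argf
  end
end

puts grep_sample("an", "apple\nbanana\ncherry\nmango\n")
# banana
# mango
```

Because the reading is left to ARGF, a script built around grep_argf would accept piped input too, with no extra code.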

Digging deeper

ARGF has a few methods that are unique to it.

A few are useful when ARGF is reading from files: we can use ARGF.filename to get the name of the file that’s currently being read, and use ARGF.file to get an IO object pointing to the current file.

If you want to know when you’ve moved onto a new file, ARGF.file will come in handy: ARGF.file.lineno returns the number of the line most recently read from the current file, so it will be 1 whenever the first line of a new file has just been read. So, to read from all the files passed on the command line, but output the name of each file before its first line, you could use:

ARGF.each do |line|
  puts "\n#{ARGF.filename}:" if ARGF.file.lineno == 1
  puts line
end
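
To see the per-file numbering in action without setting up files by hand, here's a self-contained sketch. The helper and the sample file names are invented for the demo. Note that it uses ARGF.file.lineno rather than ARGF.lineno, which is documented as counting lines across ARGF as a whole and so wouldn't start again at 1 for each file:

```ruby
require "tmpdir"

# Hypothetical helper: writes the given {name => contents} files to a temp
# directory, reads them through ARGF, and records the file name and per-file
# line number that ARGF reports for each line.
def number_lines_per_file(files)
  rows = []
  Dir.mktmpdir do |dir|
    paths = files.map do |name, contents|
      File.join(dir, name).tap { |path| File.write(path, contents) }
    end
    ARGV.replace(paths) # stand-in for real command-line arguments
    ARGF.each_line do |line|
      rows << [File.basename(ARGF.filename), ARGF.file.lineno, line.chomp]
    end
  end
  rows
end

rows = number_lines_per_file({ "a.txt" => "one\ntwo\n", "b.txt" => "three\n" })
rows.each { |name, number, text| puts "#{name}:#{number}: #{text}" }
# a.txt:1: one
# a.txt:2: two
# b.txt:1: three
```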

If you’d like not to process a file, ARGF has you covered too; just call ARGF.skip. This is useful if you only want to process files of a certain type, or want to stop processing part-way through a file (once you’ve got what you need, for example).
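
As a rough illustration of the "only certain file types" case, the sketch below keeps lines from files with a given extension and skips everything else. The helper name and sample files are hypothetical, and it reads with ARGF.gets in a while loop, like the standard input example earlier:

```ruby
require "tmpdir"

# Hypothetical helper: reads the given {name => contents} files through ARGF
# but keeps only the lines from files whose names end in `extension`.
def lines_from(files, extension)
  kept = []
  Dir.mktmpdir do |dir|
    paths = files.map do |name, contents|
      File.join(dir, name).tap { |path| File.write(path, contents) }
    end
    ARGV.replace(paths) # stand-in for real command-line arguments
    while (line = ARGF.gets)
      if ARGF.filename.end_with?(extension)
        kept << line
      else
        ARGF.skip # abandon the rest of this file and move on to the next
      end
    end
  end
  kept
end

samples = {
  "notes.txt" => "keep 1\nkeep 2\n",
  "data.csv"  => "a,b\n1,2\n",
  "more.txt"  => "keep 3\n"
}
puts lines_from(samples, ".txt")
# keep 1
# keep 2
# keep 3
```

The first line of an unwanted file has already been read by the time we call ARGF.skip, so the loop simply discards it.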

Summing up

ARGF is one of the many great examples of how Ruby’s built-in functionality respects “the Unix way”. It’s essential that flexible and well-behaved Unix tools accept input both from standard input and from files, and with ARGF Ruby makes it trivial to support just that behaviour.

If you’ve written scripts that either emulate this behaviour themselves or that only support one method of input (e.g. only accepting standard input, or only reading from files), then consider using ARGF instead; it can make your life easier and make your scripts more flexible — one of those win-win situations that are pleasingly frequent in Ruby.

  1. It’s the equivalent of Perl’s while(<>) idiom.