It’s often said that Ruby is “Perl done right”: that it combines the terseness and text processing power of Perl with inspiration from Smalltalk, Lisp, and CLU, and in doing so creates a language that’s “the best of all possible worlds”.
Regardless of the merit of this idea, it’s certainly true that — when it comes to text processing — anything you can do in Perl you can do in Ruby, thanks mainly to the fact that Ruby steals wholesale many of the best text processing ideas Perl has. And yet lots of Ruby developers aren’t aware of the power Ruby can give you when it comes to writing throwaway one-liners in the shell.
All users of Unix-like operating systems will find themselves in this
position eventually: you have to process some output from a process or
some files, and you’re just reaching the point where standard tools like
wc and their brethren are beginning to
show their limitations.
You could learn
awk. Or you could reach for a powerful tool that you
already have in your box: Ruby!
We all know, I’m sure, that you can invoke Ruby from the command line by passing it the filename of a script to run.
But did you know you can also pass code as an argument and have Ruby
interpret it? Just use the -e flag when invoking Ruby, for example
ruby -e 'puts "Hello, world"'.
Nifty, perhaps. But we can get much niftier.
The -n switch acts as though the code you pass to Ruby was wrapped in
a while gets ... end loop. In short, this means that the code you pass in the
-e argument is
executed once for each line in your input. So, imagining that you had
a file called
foo.txt, with the following content:
    foo
    bar
    baz
Then invoking Ruby with -n and code that prints $_ for each line, such as
ruby -ne 'puts $_' foo.txt, would print:
    foo
    bar
    baz
Congratulations! You’ve just implemented
cat in Ruby.
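To make the implicit loop concrete, here is a minimal sketch of roughly what Ruby does when given -n along with code that prints the current line. The method name cat_lines and the StringIO stand-in for foo.txt are assumptions made so the example runs on its own; the real switch reads from the files named on the command line, or from standard input.

```ruby
require "stringio"

# Longhand sketch of the loop that -n wraps around your code.
# `cat_lines` is a made-up name; `line` plays the role of $_.
def cat_lines(input, output = $stdout)
  while (line = input.gets)   # read one line; nil at end of input ends the loop
    output.puts line          # the code that would be passed with -e
  end
end

# Stand-in for foo.txt so the sketch is self-contained.
cat_lines(StringIO.new("foo\nbar\nbaz\n"))
```

Because each line already ends in a newline, puts does not add a second one, so the input comes out unchanged.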
But what’s this $_?
Throughout these examples, you’ll perhaps have noticed the use of the
special global variable
$_. When you invoke Ruby this way, it sets
$_ to the current line that’s being processed; so if you wanted to do
something like only print lines that start with “f”, that would be very easy.
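As a rough illustration of that kind of filter, the sketch below keeps only the lines that begin with “f”. It is written longhand rather than as a one-liner; the method name and the sample input are assumptions, and line stands in for $_.

```ruby
require "stringio"

# Hypothetical helper: return the lines of `input` that start with "f".
# On the command line, the test would be applied to $_ inside the -n loop.
def lines_starting_with_f(input)
  input.each_line.select { |line| line.start_with?("f") }
end

# Sample input mirroring foo.txt from earlier.
print lines_starting_with_f(StringIO.new("foo\nbar\nbaz\n")).join
```

On the command line, this would correspond to something like ruby -ne 'puts $_ if $_.start_with?("f")' foo.txt.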
Working with standard input
Of course, like
cat, this doesn’t work only with files; you can also
pipe the output of another process, and use its output as your input.
To use a slightly contrived example, we might want to find the IDs of any
instances of top that are running on our system.
We can get a list of all running processes with
ps ax. It outputs
an enormous amount, but each line is formatted as follows:
49175 s010 Ss 0:00.18 login -fp rob
We have the process ID in the first column, and the process name on the
right; so all we need to do is print the first column if the line
mentions top.
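The logic of such a filter can be sketched longhand as follows. The helper name pids_matching is hypothetical, and so is the second sample line; only the first sample line is taken from the text. On the command line the same split-and-match would run against $_ for each line piped in from ps ax.

```ruby
require "stringio"

# Hypothetical helper: given `ps ax`-style output, return the first column
# (the process ID) of every line that matches `pattern`.
def pids_matching(ps_output, pattern)
  ps_output.each_line
           .select { |line| line.match?(pattern) }
           .map    { |line| line.split.first }
end

# The first sample line comes from the text above; the second is invented
# for illustration.
sample = StringIO.new(<<~PS)
  49175 s010 Ss 0:00.18 login -fp rob
  51234 s011 S+ 0:01.02 top
PS

puts pids_matching(sample, /top/)
```

A bare /top/ matches any line containing those letters, so a real pipeline would probably want a stricter pattern.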
You could then pipe those IDs into something like xargs kill
if you wanted to get rid of all the matching processes. Handy!
(If you’d like to find out more about how you’re able to use the same code to work with both files and standard input, without changing anything, then you can read up on ARGF in Ruby.)
These solutions are pretty concise already. But what if you feel as
though all the
puts statements are a bit unnecessary? Well, Ruby has a switch for that too.
The -p switch acts similarly to
-n, in that it loops over each of
the lines in the input. However, it goes a bit further: after your code
has finished, it always prints the value of
$_. So, you can imagine it as the -n loop with a print of $_ added at the end of each pass.
It’s really useful, then, for doing transformations on the input. If you
wanted to take every line you were given, but replace every instance of
e you found with the letter
a, you could do so with an in-place substitution on $_, such as gsub!.
The code modifies the value of $_, and this modified value is what’s
printed to the screen.
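Written out longhand, the behaviour of -p for this substitution might look like the sketch below. The method name and sample input are assumptions, and line again stands in for $_; the point is only that the print happens automatically after the code has run.

```ruby
require "stringio"

# Longhand sketch of -p: loop over the input like -n, run the code,
# then print the (possibly modified) line automatically.
# `replace_e_with_a` is a made-up name; `line` plays the role of $_.
def replace_e_with_a(input, output = $stdout)
  while (line = input.gets)
    line.gsub!("e", "a")   # the code passed with -e: modify the line in place
    output.print line      # what -p adds: print the line after the code runs
  end
end

# Invented sample input.
replace_e_with_a(StringIO.new("hello\nbetter\n"))
```

Lines with no “e” in them are still printed, just unchanged, since the print does not depend on whether the substitution matched.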
Of course, our code here runs in a loop; what if we wanted to run something just once, before our loop starts? We might want to initialise a variable, for example.
In Ruby, we can use
BEGIN blocks to do this. They’re an idiom borrowed from
awk, and allow us to execute code just once, at the start of the script.
So, to output line numbers from your input, you could keep a counter:
initialise i to 0 in a BEGIN block at the start of the script. The
block executes only once, so is ignored on subsequent loops; we can then
increment i on each line and print it in front of the line, producing the following output:
    1 foo
    2 bar
    3 baz
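The split between one-off setup and per-line work can be sketched like this. The helper name number_lines is hypothetical, and the counter is set up with an ordinary assignment rather than a real BEGIN block so that the example runs as a plain script.

```ruby
require "stringio"

# Hypothetical helper: prefix each line of `input` with its line number.
def number_lines(input)
  i = 0                      # done once up front: the job of the BEGIN block
  input.each_line.map do |line|
    i += 1                   # done for every line, inside the implicit loop
    "#{i} #{line}"
  end
end

# Sample input mirroring foo.txt from earlier.
print number_lines(StringIO.new("foo\nbar\nbaz\n")).join
```

On the command line this might look something like ruby -ne 'BEGIN { i = 0 }; i += 1; puts "#{i} #{$_}"' foo.txt.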
Of course, all of these examples are fairly contrived; I haven’t done
anything that wouldn’t already be possible with tools like
tr, and so on.
But in reality you have access to the whole world of not just the Ruby
standard library but every Ruby Gem too. Just think of the power in
the String class alone, with methods such as
squeeze. Think of
Digest; think of all of the power of
Ruby’s date and time processing; the
possibilities are endless.
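As a small taste of that, the sketch below uses two of the pieces just mentioned, String#squeeze and Digest. The helper names and sample strings are invented for illustration; either method body could equally be dropped into a -n or -p one-liner and applied to $_.

```ruby
require "digest/md5"

# Hypothetical helpers showing the kind of per-line work that is awkward
# in plain shell tools but short in Ruby.

# Collapse runs of repeated spaces into a single space.
def collapse_spaces(line)
  line.squeeze(" ")
end

# Hex MD5 digest of a line, ignoring its trailing newline.
def md5_of(line)
  Digest::MD5.hexdigest(line.chomp)
end

puts collapse_spaces("foo    bar  baz")
puts md5_of("foo\n")
```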
Getting used to the idea that Ruby can be as much a part of your standard pipeline toolchain as any of the usual Unix tools is an important idea: it suddenly opens up a world of possibilities to do complex processing in a terse and expressive way. Go try it!
Text Processing with Ruby
Enjoyed this and want to find out more about data wrangling and text munging in Ruby? You might be interested in Text Processing with Ruby, a book that covers all that and more. It’s published by Pragmatic Bookshelf and is available now!