Ruby can be invoked from the command line in order to create powerful text-processing one-liners. I wrote about this a while back, in “Ruby’s -e, -n, and -p switches”.
These one-liners are powerful, concise, and expressive. For an example that verges on the magical, how about outputting the third field in a CSV file, but only if the line contains a URL?
$ ruby -F, -ane 'puts $F[2] if /http:/' file.csv
Or outputting the contents of a Markdown file, with curly quotes switched to straight ones?
$ ruby -pe 'gsub(/“|”/, "\"")' foo.markdown
In both of these examples, and indeed in all Ruby one-liners, an oddly
named global variable is at work even when we don’t see it used
explicitly. Its name is
$_ (dollar underscore).
Ruby has many global variables like this, but
$_ is one of the most useful. Indeed, along with the globals
relating to regular expressions, it’s the only one I use with any regularity.
There are five key places that
$_ is used. In each one, it’s likely
that we won’t actually see the variable itself; instead, it’s used by
Ruby internally. But knowing that it’s there can help to explain what’s
going on; it helps us thread a connection through several different
areas of the Ruby language, allowing us to peer behind the curtain and
understand Ruby’s magic a little better. Let’s dig in.
1. It’s set to the content of the current line
There are two scenarios in which we loop over lines of input. The first
is when, as in the examples we saw earlier, we use the
-n or -p switches when invoking Ruby.
When we do this, the Ruby interpreter will loop over the lines of input
for us, running the code that we pass to it once for each line of input.
In doing so, it sets the value of the
$_ variable to the contents of
the current line. For example:
$ printf "foo\nbar" | ruby -ne 'puts $_'
foo
bar
This happens because using the
-n or -p switches is essentially like wrapping your code in the following:
while gets
  # your code here
end
It’s gets that sets the
$_ variable, which means it’s also
accessible in regular Ruby scripts, not only one-liners. Wherever
gets is called, $_ will be set to the line that was read.
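To see this outside a one-liner, here is a minimal sketch of an ordinary script that relies on gets to fill in $_. The method name collect_lines is made up for illustration, and a StringIO stands in for real input so the example is self-contained.

```ruby
require "stringio"

# collect_lines is a hypothetical helper, not part of Ruby.
# It reads every line from an IO-like object and relies on gets
# to store each line in $_ as a side effect.
def collect_lines(io)
  lines = []
  # gets returns nil at the end of input, which ends the loop
  while io.gets
    lines << $_ # $_ holds the line that gets just read
  end
  lines
end

p collect_lines(StringIO.new("foo\nbar"))
# prints ["foo\n", "bar"]
```

Note that the lines keep their trailing newlines, just as they do in a one-liner.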
2. It’s outputted automatically when using -p
If we use the
-p option when starting Ruby, it’s not necessary for us
to write a
print statement ourselves; Ruby produces output for us after each line.
But what does Ruby actually output? You guessed it: the
$_ variable.
This means that, if we pass the
-p option to Ruby, we can affect the
output of our script by manipulating the content of the
$_ variable:
$ printf "foo\nbar" | ruby -pe '$_ = "baz\n"'
baz
baz
In that case we reassigned the variable entirely, but we can also mutate it:
$ printf "foo\nbar" | ruby -pe '$_.upcase!'
FOO
BAR
In this case, we transform the line of input from lowercase to
uppercase. (We can tell that the method mutates the string, rather than
returning a new one, because of the ! suffix.)
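As a rough sketch of what -p is doing for us, the upcase one-liner can be written out by hand. The method name upcase_like_p is invented for this example, and StringIO objects stand in for standard input and output.

```ruby
require "stringio"

# upcase_like_p is a hypothetical name; it imitates
# `ruby -pe '$_.upcase!'` as an ordinary method.
def upcase_like_p(input, output)
  while input.gets     # each call to gets stores the line in $_
    $_.upcase!         # the one-liner's code, run once per line
    output.print($_)   # the output step that -p adds after our code
  end
  output
end

puts upcase_like_p(StringIO.new("foo\nbar"), StringIO.new).string
# prints FOO and BAR on separate lines
```

Because the print happens after our code has run, anything we do to $_ in the meantime ends up in the output.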
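3. It’s an implicit argument to print
When we invoke Ruby with the
-n or -p switches, the behaviour of
some of Ruby’s core methods changes slightly. One such change is what
print does when it’s given no arguments.
In an ordinary Ruby script, or a one-liner without
-n or -p, a bare print outputs nothing, since $_ hasn’t been set:
$ ruby -e 'print'
If we invoke Ruby with
-n or -p, print outputs the contents of $_ when
we call it without arguments:
$ printf "foo\nbar" | ruby -ne 'print'
foo
bar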
This makes it really easy to write filters that only output lines meeting certain conditions. For example:
$ printf "foo\nbar" | ruby -ne 'print if $_.start_with? "f"'
foo
This one-liner outputs only those lines that start with the letter f.
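The same filter can be sketched as a plain method, which shows that an argument-less print falls back to $_ once gets has set it. The name print_matching is hypothetical, and StringIO is used in place of real input and output.

```ruby
require "stringio"

# print_matching is a made-up helper for illustration.
# It copies to output only the lines that start with prefix.
def print_matching(prefix, input, output)
  while input.gets
    # print is called with no arguments, so it writes $_
    output.print if $_.start_with?(prefix)
  end
  output
end

puts print_matching("f", StringIO.new("foo\nbar"), StringIO.new).string
# prints foo
```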
4. It’s the implicit receiver of some global string methods
Another behaviour that changes when Ruby is invoked with either the
-n or -p options is that some global methods are defined. They are:
chomp, chop, gsub, and sub.
They’re defined in the
Kernel module, the same place as
puts, which means that we don’t call them with a receiver; we call them
like global functions.
But how can we call, say,
gsub in this way? Normally, the receiver of
the gsub method is the string that we want to perform a substitution
within. If there’s no receiver, what string will be used instead?
There are no prizes for guessing that the answer is
$_. In this way,
these global methods allow us to perform operations on each line of
input without having to refer to that input explicitly. For example:
$ printf "foo\nbar" | ruby -ne 'puts gsub(/[aeiou]/, "_")'
f__
b_r
In this case, we output each line of input, except with all vowels replaced with underscores.
This behaviour is even more useful when used with
-p, since we can
skip the output step:
$ printf "foo\nbar" | ruby -pe 'gsub(/[aeiou]/, "_")'
f__
b_r
This works because these global methods assign their result back to
$_ as well as
returning it; in effect they behave like the
!-suffixed methods on
String, and so the above example is equivalent to:
$ printf "foo\nbar" | ruby -pe '$_.gsub!(/[aeiou]/, "_")'
f__
b_r
Particularly if you’re not comfortable with using
sed, but even if you
are, this is a really powerful way to perform find-and-replace
operations from the command line.
These global methods are otherwise identical to their counterparts from
the String class; they’re just a useful shortcut.
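The global gsub only exists under -n or -p, so a regular script has to spell the same steps out. The sketch below does that with a made-up method name, underscore_vowels, and StringIO in place of real input and output. It also shows one difference worth knowing about if you swap in the !-suffixed form yourself: gsub! returns nil when nothing was replaced.

```ruby
require "stringio"

# underscore_vowels is a hypothetical name. It writes out by hand
# what the one-liner `ruby -pe 'gsub(/[aeiou]/, "_")'` does.
def underscore_vowels(input, output)
  while input.gets
    $_ = $_.gsub(/[aeiou]/, "_") # substitute, then store the result in $_
    output.print($_)             # the output step that -p adds
  end
  output
end

puts underscore_vowels(StringIO.new("foo\nbar"), StringIO.new).string
# prints f__ and b_r on separate lines

# gsub always returns a string; gsub! returns nil if nothing changed
p "brr".gsub(/[aeiou]/, "_")      # prints "brr"
p "brr".dup.gsub!(/[aeiou]/, "_") # prints nil
```

This matters mostly with -n, where something like puts $_.gsub!(...) would print a blank line for any input line without a match.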
5. It’s the implicit matcher of regular expressions
The final place that
$_ is used is as the implicit subject of regular
expression matches. It’s this behaviour that I exploited in the very
first example in this post, and it’s this behaviour that’s perhaps most
obscure (or magical, depending on your viewpoint).
This behaviour is triggered either when we use a regular expression in
a conditional context, or by using the
~ operator on a regular
expression. For example:
$ printf "foo\nbar" | ruby -ne 'p ~ /^f/'
0
nil
$ printf "foo\nbar" | ruby -ne 'print if /^f/'
foo
In the former case, we see that an integer is returned if the expression
matched the current line of input (in this case
0, since the
expression matched at the very first character). If the expression
didn’t match, the result is nil.
It’s the latter, conditional form that’s most useful, since it allows us to do something based on whether a line matches a given expression — an incredibly common requirement for filter scripts.
Behind the scenes, this translates to the following:
$ printf "foo\nbar" | ruby -ne 'print $_ if $_ =~ /^f/'
The implicit example is much more magical, but it’s also much shorter and easier to read — and with one-liners, every character counts.
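The ~ form can be checked against the explicit one in an ordinary script. In this sketch the method name match_offsets is invented for illustration and StringIO stands in for real input; it collects the result of the implicit match for each line and confirms that it agrees with $_ =~ pattern.

```ruby
require "stringio"

# match_offsets is a hypothetical helper. For each line of input it
# records where pattern matched, or nil if it did not match.
def match_offsets(pattern, input)
  offsets = []
  while input.gets
    implicit = ~pattern       # Regexp#~ matches against $_
    explicit = $_ =~ pattern  # the same match, written out in full
    raise "implicit and explicit matches differ" unless implicit == explicit
    offsets.push(implicit)
  end
  offsets
end

p match_offsets(/^f/, StringIO.new("foo\nbar"))
# prints [0, nil]
```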
Ruby has lots of cryptic globals, but one that crops up in lots of
different places is
$_. It’s always connected to the idea of
processing input line-by-line, which is a really common requirement.
Getting to know it can help you write nicely concise text processing
scripts, and concision is particularly helpful when you’re writing
one-liners.
Text Processing with Ruby
Enjoyed this and want to find out more about data wrangling and text munging in Ruby? You might be interested in Text Processing with Ruby, a book that covers all that and more. It’s published by Pragmatic Bookshelf and is available now!