Ruby’s regular expressions, like those of most other languages, allow you to pass so-called “pattern modifiers” when creating regular expressions; these modifiers then change the way the regular expression behaves.
Most people are familiar with things like /i, which makes the regular expression case-insensitive, or perhaps /m, which allows the dot (.) character to match newlines, so that a match can span multiple lines.
But less commonly used is the /o modifier. That’s perhaps with good reason; its role is much more of a niche one than those of everyday modifiers such as /i, /m, and /x. But it’s occasionally useful, and, like most nuggets of programming-language trivia, is worth having stored away in the back of your mind.
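For reference, here is a quick sketch of how the everyday modifiers mentioned above behave. The strings and patterns are made up purely for illustration:

```ruby
# /i ignores case.
puts "RUBY".match?(/ruby/i)         # true

# Without /m, the dot refuses to match a newline...
puts "one\ntwo".match?(/one.two/)   # false
# ...but with /m, it does.
puts "one\ntwo".match?(/one.two/m)  # true

# /x ignores whitespace and comments inside the pattern.
year = /
  \A
  \d{4}   # exactly four digits
  \z
/x
puts "2015".match?(year)            # true
```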
In a nutshell, the /o modifier causes any interpolation in a regular expression to happen only once; the final regular expression is then cached, and repeated execution won’t incur the cost of performing the interpolation again.
This isn’t normally too useful, but there are two conditions that, when met, make it worth remembering: if you’re interpolating the result of a method call that might have some costly calculation involved; and if you’re matching lots of values in a loop.
If that’s lost you slightly, don’t worry. Let’s look at an example, admittedly a slightly contrived one, that shows how the modifier can result in a significant increase in performance under the right conditions.
First, let’s define a method that returns part of a regex. We’ll also include a call to sleep, to simulate the effects of performing some complex calculation, and output a message to show that the method has been called.
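A minimal sketch of such a method might look like this. The character range it returns is an assumption; the message and the half-second delay are chosen to be consistent with the rest of the example:

```ruby
# Returns the body of a character class, slowly.
def letters
  puts "letters() called"  # show each time the method runs
  sleep 0.5                # stand-in for some costly calculation
  "a-zA-Z"                 # assumed return value: any ASCII letter
end
```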
Let’s imagine we call this method when creating a regular expression — to end up with something like this, which matches a string consisting only of letters:
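Assuming letters returns a character range such as a-zA-Z (a guess at its exact return value), the interpolated literal could look something like this:

```ruby
# Stand-in for the method described above; the return value is an assumption.
def letters
  puts "letters() called"
  sleep 0.5
  "a-zA-Z"
end

# letters() runs while the literal is being evaluated, and its return value
# becomes the body of the character class.
regex = /\A[#{letters}]+\z/

puts regex.source            # \A[a-zA-Z]+\z
puts regex.match?("hello")   # true
puts regex.match?("hello1")  # false
```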
Finally, let’s match this regular expression in a loop, so that it’s created repeatedly:
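A sketch of what that loop might look like follows. The word list is hypothetical (any nine all-letter strings would do), and the letters method is the assumed one described above:

```ruby
# Stand-in for the method described above; the return value is an assumption.
def letters
  puts "letters() called"
  sleep 0.5
  "a-zA-Z"
end

# Hypothetical word list.
words = %w[apple banana cherry damson elder fig grape hazel iris]

words.each do |word|
  # The literal is re-evaluated on every iteration, so letters() runs each time.
  puts "Matches!" if word.match?(/\A[#{letters}]+\z/)
end
```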
If we were to run this script, we’d see something like the following:
letters() called
Matches!
letters() called
Matches!
letters() called
Matches!
letters() called
Matches!
letters() called
Matches!
letters() called
Matches!
letters() called
Matches!
letters() called
Matches!
letters() called
Matches!
We can see from the output that every time the regex literal is evaluated and used in a match, the letters method is called, in this case incurring a half-second penalty each time. As a result, the script is really slow; after all, it takes half a second of execution for each of the words in our array.
The /o modifier lets us avoid this. It will perform the interpolation once, and then cache the resulting regular expression; future execution of the same line will use this cached expression, and won’t perform the interpolation again.
To take advantage of this, all we need to do is pass /o when defining our regular expression:
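In a sketch of the script, with the same assumed letters method and a hypothetical word list, the only change is the trailing o on the literal:

```ruby
# Stand-in for the method described above; the return value is an assumption.
def letters
  puts "letters() called"
  sleep 0.5
  "a-zA-Z"
end

# Hypothetical word list.
words = %w[apple banana cherry damson elder fig grape hazel iris]

words.each do |word|
  # With /o, the interpolation happens on the first pass only; later
  # iterations reuse the regular expression built the first time.
  puts "Matches!" if word.match?(/\A[#{letters}]+\z/o)
end
```

One thing to keep in mind: the cached regular expression sticks around for as long as the program runs, so /o is only appropriate when the interpolated value never needs to change.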
If we modify our script to use the /o modifier we see the following output, showing that the letters method is only called once:
letters() called
Matches!
Matches!
Matches!
Matches!
Matches!
Matches!
Matches!
Matches!
Matches!
We also see a significant increase in speed. Let’s benchmark the two to show the comparison:
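One way to set up such a comparison is with the Benchmark module from Ruby's standard library. This is only a sketch: the word list is hypothetical, letters is the assumed method from earlier, and its message is dropped here to keep the report readable:

```ruby
require "benchmark"

# Stand-in for the slow method; the return value is an assumption.
def letters
  sleep 0.5
  "a-zA-Z"
end

# Hypothetical word list.
words = %w[apple banana cherry damson elder fig grape hazel iris]

Benchmark.bm(12) do |bm|
  # Interpolates (and sleeps) once per word.
  bm.report("without /o:") do
    words.each { |word| word.match?(/\A[#{letters}]+\z/) }
  end

  # Interpolates once in total, then reuses the cached regex.
  bm.report("with /o:") do
    words.each { |word| word.match?(/\A[#{letters}]+\z/o) }
  end
end
```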
The result is, predictably, not even a contest:
                   user     system      total        real
without /o:    0.000000   0.000000   0.000000 (  4.508294)
with /o:       0.000000   0.000000   0.000000 (  0.501238)
That’s 4.5 seconds versus 0.5 seconds in this admittedly entirely contrived example: nine half-second calls to letters against just one.
As I said, this isn’t something you’re going to find yourself using every day, or perhaps even every year. But when you find yourself in a situation that calls for it, it’s useful to know about. And hey, sometimes knowledge is worthwhile in and of itself, right?