Roblog


Text Processing with Ruby

So much data in the world exists in text format. Sometimes the data we need is in a database; other times it's in a file or on a website. Sometimes it's in a structured format; other times it's more freeform. Regardless, though, it's all text — and it can all be processed in the same fundamental way.

It's incredibly common to need to make sense of or transform this textual data. That's true if you're a developer, but it's also increasingly true in other professions, as more and more textual information is digitised. The ability to analyse and manipulate this data is becoming ever more valuable.

The Ruby programming language is an incredibly useful tool for performing these sorts of tasks, and Text Processing with Ruby takes you from learning the very basics to being able to write complex and powerful text processing applications. No knowledge of Ruby is assumed.

Learn how to extract value from the data that likely exists in huge quantities all around you already — and have fun while doing it.

Why did I write Text Processing with Ruby?

My day job involves a lot of data munging, text processing, and generally making sense of data that comes from database exports, CSV files, third-party APIs, and lots of natural language, human-written text.

This sort of task has always appealed to me. I'm interested in most aspects of programming and computer science, but I've always been far more drawn to the practical and the concrete than to the theoretical and the abstract. I've always found it more satisfying to extract some sort of meaning from a jumble of low-quality data than to design an algorithm or create an elegant formal proof. There's something pleasingly tangible about this sort of work.

Ruby is my tool of choice for doing these data wrangling tasks, for a variety of reasons. It's a language I know well; it's a language that I think is incredibly productive and developer-friendly; and it's a language that was consciously designed for this sort of task (and others, of course, but certainly this one too).

I'd always been surprised that, despite Ruby's eminent suitability for text processing tasks, there wasn't a book that attempted to provide a thorough manual for performing them in Ruby. And so, possibly foolishly, I set about writing Text Processing with Ruby in early 2014.

It was picked up by Pragmatic Bookshelf in August 2014 and it's now available to buy.

It covers subjects from the relatively basic — how to read from files and standard input, how to produce templated output with ERB, how to parse delimited files such as CSVs — to the much more complex, like regular expressions, writing parsers, and performing natural language processing tasks.

The book is structured into three parts. The first covers text extraction, the second text transformation and manipulation, and the third writing text to various locations. (The sharp-eyed among you may have spotted that these could easily be called extract, transform, and load, a phrase well established in the data processing world.)
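To give a flavour of that three-part shape, here is a minimal sketch. It is not an excerpt from the book. The sample data and the method names (extract, transform, render_report) are made up for illustration. It assumes a Ruby recent enough to ship the csv and erb libraries (2.6 or later for the trim_mode keyword).

```ruby
require "csv"
require "erb"

# Extract: parse delimited text into an array of hashes keyed by header.
def extract(text)
  CSV.parse(text, headers: true).map(&:to_h)
end

# Transform: convert page counts to integers and sort longest first.
def transform(rows)
  rows.map { |row| { name: row["name"], pages: Integer(row["pages"]) } }
      .sort_by { |row| -row[:pages] }
end

# Load: write the rows out as templated text using ERB.
def render_report(rows)
  template = ERB.new(<<~TEMPLATE, trim_mode: "-")
    <%- rows.each do |row| -%>
    <%= row[:name] %>: <%= row[:pages] %> pages
    <%- end -%>
  TEMPLATE
  template.result_with_hash(rows: rows)
end

# Hypothetical sample input; in practice this might come from a file
# or from standard input.
sample = <<~DATA
  name,pages
  Beta,80
  Alpha,120
DATA

puts render_report(transform(extract(sample)))
```

Run as a script, this should print one line per record, longest first. Keeping the three stages as separate methods means the input source or the output template can be swapped without touching the transformation in the middle.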

Most programmers have to process text at some point in their careers. If you'd like to discover how fun that process can be, rather than dreading it, Text Processing with Ruby might just be the book for you!