The Power of Regular Expressions
Written by Ben Friedman, August 15th, 2016
You're writing an essay, and it hits you: you've written something completely wrong! All you need to do is fix the spelling of one word, but you spelled it differently every time! Let's say you've written California, california, CaliFornia and a few other strange variations. Wouldn't it be nice if you could just change all of them at once?
With regular expressions your problem is solved, and then some! A regular expression (frequently written as 'regex') is, in a nutshell, a sequence of characters that, when interpreted, becomes a search pattern. This pattern is then applied to a given set of data, usually text, to check for matches. In the case mentioned above you might write a regex pattern as follows.
/california/i
So what does this do? It looks like just 'California' without the proper casing. In this example the / symbols are not part of the matching expression; they are delimiters that mark where the pattern begins and ends. The 'i' that follows sits outside the / symbols and acts as a modifier. Here, 'i' indicates that casing should not matter when matching. This means the pattern will match the characters 'california' regardless of whether they're styled 'CaLifornIa', 'CALIFORNIA' or 'caliFORNIA'. This regex pattern 'speaks' as follows:
- Match the string literal 'california'
- Perform this match case insensitively
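As a minimal sketch of the same idea in code, assuming Python and its standard `re` module (the article itself does not name a language, and the sample sentence and the `fix_spelling` helper are made up for illustration), the slashes and trailing 'i' become a pattern string plus the `re.IGNORECASE` flag:

```python
import re

# Hypothetical essay text containing several spellings of the same word.
essay = "I moved from CaliFornia to california, but CALIFORNIA is still home."

# Rough equivalent of /california/i: the literal 'california',
# matched case-insensitively.
pattern = re.compile(r"california", re.IGNORECASE)


def fix_spelling(text):
    """Replace every case variant of 'california' with 'California'."""
    return pattern.sub("California", text)


fixed = fix_spelling(essay)
print(fixed)
```

Every variant is replaced in one pass, which is the 'change all of them' wish from the opening paragraph.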
It's that easy! So you may be wondering: how can I use these myself, and does my program support this?
Just about any modern text editor has some sort of regex support. It most often shows up in features like 'Find and Replace', where you may have noticed an additional checkbox along the lines of 'grep' or 'regex'. If you've ever seen something like this in your text editor or word processor, chances are it's ready to use regex!
A quick disclaimer: the example above may or may not work in your program of choice. This is due to flavor and implementation. You see, regex, as a concept, is not fully standardized. There's no one way to run it. As a result, many programs implement whatever seems to best suit their needs. In fact, there are even variants of regex that use different engines, such as DFA, NFA or a mixture of both; these variants are known as flavors. It's up to the developer of your software to choose one for you and to decide what's best. They may even pick particular syntax to express certain functionality that is unique to their version, sometimes in a nonstandard way; that is the implementation.
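To make that concrete, here is a small sketch assuming Python's standard `re` module, which is only one flavor among many. Python does not use the slash-and-modifier syntax shown earlier: the slashes are treated as literal characters, and case-insensitivity is requested with a flag argument or an inline `(?i)` modifier instead. The sample text is invented for illustration.

```python
import re

text = "Welcome to CaliFornia"

# In Python the slashes and the trailing 'i' are ordinary characters,
# so this pattern looks for the literal text "/california/i" and finds nothing.
slash_style = re.search(r"/california/i", text)

# Python's own ways of saying "ignore case":
flag_style = re.search(r"california", text, re.IGNORECASE)  # flag argument
inline_style = re.search(r"(?i)california", text)           # inline modifier

print(slash_style)           # None
print(flag_style.group())    # CaliFornia
print(inline_style.group())  # CaliFornia
```

The idea is the same in each case; only the way it is written differs, which is why a pattern copied from one tool may need small adjustments in another.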
All this can be a bit overwhelming, but rest assured, it's well worth the time to learn.
If you develop software, regex packages are almost always available in nearly any language. And because regex is itself a language, you can write regex 'code' that is largely portable and can save you a tremendous amount of time. Whether you're a writer, editor, programmer, scripter or someone who simply writes documents, regex could be saving you a good amount of time.
Regular expressions are a deep subject, but with just the basics, such as literal string matching and character classes, you can already outpace most others who try to do it the hard way. You can get started with regular expressions at Regex One and www.regular-expressions.info. The former is the friendlier introduction, complete with interactive exercises that make it easy to get a grasp of the concept.
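For a taste of character classes, here is a short sketch, again assuming Python's `re` module and an invented sample sentence. A character class in square brackets matches any one of the characters listed inside it. The `+` used below (meaning 'one or more') goes slightly beyond the basics named above and is included only to make the digit example useful.

```python
import re

text = "Order 66 shipped to california, order 7 to California."

# [Cc] matches exactly one character: either 'C' or 'c'.
states = re.findall(r"[Cc]alifornia", text)

# [0-9] matches any single digit; '+' repeats it one or more times.
numbers = re.findall(r"[0-9]+", text)

print(states)   # ['california', 'California']
print(numbers)  # ['66', '7']
```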
Before you know it you may even be able to figure out what this is:
Questions? Corrections? Concerns? Contact us at email@example.com