Regex In 10 Minutes
Today, we'll look at how regular expressions work and how we can leverage them to improve our programming efficiency. We'll start by reviewing some regex basics and then we'll dive into some Xcode-specific use cases.
Introduction
Regular expressions, commonly referred to as regex, describe a search pattern as a sequence of characters, many of which have a special meaning. They are often used to identify misspelled words, validate data, check user input, or scrape the web.
With expressions like the following, it's no surprise that people avoid regex whenever possible:
^(?=(?!(.)\1)([^\DO:105-93+30])(?-1)(?<!\d(?<=(?![5-90-3])\d))).[^\WHY?]$
However, if we can make it past the awkward syntax and the learning curve, mastering regex can greatly improve our capabilities as programmers. Luckily, regular expressions are supported in nearly every programming language and the core syntax is largely the same everywhere, so we mostly only have to learn them once.
Similar to how a programming language consists of keywords like for, if, and while, regular expressions simply consist of a series of special characters used to express a variety of text patterns.
To begin, we'll examine all of the different types of characters and their respective responsibilities. While the information might seem overwhelming at first, once we look at some examples, I promise it will all make sense.
Getting Started
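In many languages, such as JavaScript and Perl, a regular expression literal is written between a pair of / delimiters (e.g. /ab*c/). Inside them, we can use any number of the following symbols (also referred to as metacharacters).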
A metacharacter is a character that has a special meaning to a computer program, such as a shell interpreter or a regular expression (regex) engine.
We use regex to match patterns by combining these metacharacters into longer expressions.
Characters
[ ]
You can use this bracket expression to match a single character against the set of characters contained within the brackets:
- [abc] would match a, b, or c
- [a-z] would match any lowercase letter from a to z
- [abd-j] would match a, b, or any letter from d to j (d, e, f, g, h, i, j)
- [a-zA-Z] would match letters from a to z and from A to Z
- [0-9] would match any digit in the range 0 through 9
- [df]og matches dog and fog
- ab matches ab, but not AB, since matching is case-sensitive by default
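To experiment with these bracket expressions outside of Xcode, here's a minimal Swift sketch built on Foundation's NSRegularExpression. The matches(_:in:options:) helper and the sample strings are hypothetical, written just for this illustration.

```swift
import Foundation

// Hypothetical helper: returns every substring of `text` that `pattern` matches.
// `try!` is acceptable here only because the patterns below are fixed, known-valid literals.
func matches(_ pattern: String, in text: String,
             options: NSRegularExpression.Options = []) -> [String] {
    let regex = try! NSRegularExpression(pattern: pattern, options: options)
    let fullRange = NSRange(text.startIndex..., in: text)
    return regex.matches(in: text, options: [], range: fullRange).compactMap { result in
        Range(result.range, in: text).map { String(text[$0]) }
    }
}

print(matches("[df]og", in: "dog fog log"))  // ["dog", "fog"]
print(matches("[abd-j]", in: "abcdhjk"))     // ["a", "b", "d", "h", "j"]
print(matches("[0-9]", in: "R2-D2"))         // ["2", "2"]
print(matches("ab", in: "ab AB"))            // ["ab"] (case-sensitive)
```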
^
You can use this character to match the starting position of the string:
- ^Hello will match strings that start with the word Hello
[^ ]
This combination of characters allows us to specify characters we do not want to match:
- [^abc] will match any single character except a, b, or c
- [^a-z] matches any single character that is not a lowercase letter from a to z
.
This can be used as a wildcard character to match any single character (excluding newlines):
- a.c would match any three-character string starting with an a and ending with a c (e.g. abc, a4c, a@c, etc.), but would not match abbc
- Remember: inside brackets, . loses its special meaning, so [a.c] would match only a, ., or c
- a.* would match an a followed by zero or more characters (e.g. a, abc, a123, etc.)
- view.*Appear would match all instances of viewDidAppear and viewWillAppear
$
Matches the ending position of a string or the position just before a string-ending newline:
- [bc]at$ matches bat and cat, but only at the end of the string (or at the end of each line, when the engine's multiline mode is enabled)
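The same kind of hypothetical matches(_:in:options:) helper can illustrate the negated set, the wildcard, and both anchors. This sketch assumes Foundation's NSRegularExpression, where ^ and $ only match at line breaks when the .anchorsMatchLines option is set; the sample strings are invented.

```swift
import Foundation

// Hypothetical helper: returns every substring of `text` that `pattern` matches.
// `try!` is acceptable here only because the patterns below are fixed, known-valid literals.
func matches(_ pattern: String, in text: String,
             options: NSRegularExpression.Options = []) -> [String] {
    let regex = try! NSRegularExpression(pattern: pattern, options: options)
    let fullRange = NSRange(text.startIndex..., in: text)
    return regex.matches(in: text, options: [], range: fullRange).compactMap { result in
        Range(result.range, in: text).map { String(text[$0]) }
    }
}

// Negated set and wildcard.
print(matches("[^abc]", in: "abcxyz"))     // ["x", "y", "z"]
print(matches("a.c", in: "abc a4c abbc"))  // ["abc", "a4c"]
print(matches("[a.c]", in: "abc."))        // ["a", "c", "."]
// `.` does not cross newlines, so each line produces its own match.
print(matches("view.*Appear", in: "viewDidAppear\nviewWillAppear")) // ["viewDidAppear", "viewWillAppear"]

// Anchors: by default ^ and $ refer to the whole string...
let lines = "Hello there\nthe cat\nbat"
print(matches("^Hello", in: lines))   // ["Hello"]
print(matches("[bc]at$", in: lines))  // ["bat"]
// ...while .anchorsMatchLines makes them match at every line break as well.
print(matches("[bc]at$", in: lines, options: .anchorsMatchLines)) // ["cat", "bat"]
```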
Quantifiers
*
This character allows you to match the previous character 0 or more times:
- /ab*c/ would match ac, abc, abbc, abbbc, etc.
- /[abc]*/ would match a, b, c, ca, cba, abcc, and any other sequence of these 3 characters (including the empty string)
- /a.*b/ would match ab, axb, axxb, a12345b, etc.
- [ab]*cd matches cd, acd, bcd, aacd, bacd, abcd, bbabacd, etc.
+
The + operator is quite similar to *, but instead allows you to match the previous character 1 or more times:
- /ab+c/ would match abc, abbc, abbbc, but would not match ac
- /[df]+og/ would match dog, fog, ddog, dfog, etc.
?
This operator allows us to match the previous character 0 or 1 times, which makes that character optional:
- /ab?c/ would match ac and abc, but would not match abbc
- /ea?/ matches one e followed by an optional a
- [bp]?at matches at, bat, and pat
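A quick way to check the quantifier examples above is to run them through a small Swift sketch. As before, matches(_:in:options:) is a hypothetical helper over Foundation's NSRegularExpression, and the test strings are made up.

```swift
import Foundation

// Hypothetical helper: returns every substring of `text` that `pattern` matches.
// `try!` is acceptable here only because the patterns below are fixed, known-valid literals.
func matches(_ pattern: String, in text: String,
             options: NSRegularExpression.Options = []) -> [String] {
    let regex = try! NSRegularExpression(pattern: pattern, options: options)
    let fullRange = NSRange(text.startIndex..., in: text)
    return regex.matches(in: text, options: [], range: fullRange).compactMap { result in
        Range(result.range, in: text).map { String(text[$0]) }
    }
}

print(matches("ab*c", in: "ac abc abbc"))     // ["ac", "abc", "abbc"]  (* = zero or more)
print(matches("ab+c", in: "ac abc abbc"))     // ["abc", "abbc"]        (+ = one or more)
print(matches("ab?c", in: "ac abc abbc"))     // ["ac", "abc"]          (? = zero or one)
print(matches("[ab]*cd", in: "cd bbabacd"))   // ["cd", "bbabacd"]
print(matches("[df]+og", in: "dog ddog dfog")) // ["dog", "ddog", "dfog"]
print(matches("[bp]?at", in: "at bat pat"))   // ["at", "bat", "pat"]
```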
\
Just like in normal programming languages, the backslash allows you to escape special characters:
- \+ will match the + in 1+2=3, which would otherwise be treated as a metacharacter
- \( and \) match literal parentheses, and \{ and \} match literal braces
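In Swift, a raw string literal such as #"1\+2"# passes backslashes through to the regex engine untouched, which keeps escapes readable. Here's a minimal sketch, again using a hypothetical matches(_:in:options:) helper, contrasting an unescaped and an escaped +:

```swift
import Foundation

// Hypothetical helper: returns every substring of `text` that `pattern` matches.
// `try!` is acceptable here only because the patterns below are fixed, known-valid literals.
func matches(_ pattern: String, in text: String,
             options: NSRegularExpression.Options = []) -> [String] {
    let regex = try! NSRegularExpression(pattern: pattern, options: options)
    let fullRange = NSRange(text.startIndex..., in: text)
    return regex.matches(in: text, options: [], range: fullRange).compactMap { result in
        Range(result.range, in: text).map { String(text[$0]) }
    }
}

// Unescaped, + is a quantifier: "one or more 1s followed by a 2".
print(matches("1+2", in: "1+2=3 112"))     // ["112"]
// Escaped, \+ matches a literal plus sign.
print(matches(#"1\+2"#, in: "1+2=3 112"))  // ["1+2"]
// Escaped parentheses match literal parentheses instead of creating a group.
print(matches(#"\(x\)"#, in: "f(x) = x"))  // ["(x)"]
```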
{n}
This operator allows you to match the previous character exactly n times:
- {3} will match the previous character exactly 3 times
- {3,} will match the previous character 3 or more times
- {2,4} will match the previous character between 2 and 4 times (inclusive)
- aa{2} matches aaa (an a followed by exactly two more a characters)
- aa{2,3} matches aaa and aaaa
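Here's a sketch that exercises each brace form against made-up input, using the same kind of hypothetical matches(_:in:options:) helper over NSRegularExpression:

```swift
import Foundation

// Hypothetical helper: returns every substring of `text` that `pattern` matches.
// `try!` is acceptable here only because the patterns below are fixed, known-valid literals.
func matches(_ pattern: String, in text: String,
             options: NSRegularExpression.Options = []) -> [String] {
    let regex = try! NSRegularExpression(pattern: pattern, options: options)
    let fullRange = NSRange(text.startIndex..., in: text)
    return regex.matches(in: text, options: [], range: fullRange).compactMap { result in
        Range(result.range, in: text).map { String(text[$0]) }
    }
}

print(matches("a{3}", in: "aaaa"))            // ["aaa"]   exactly 3
print(matches("a{3,}", in: "aa aaaaa"))       // ["aaaaa"] 3 or more
print(matches("a{2,4}", in: "a aaaaa"))       // ["aaaa"]  greedy: takes the maximum of 4
print(matches("aa{2}", in: "aa aaa"))         // ["aaa"]   the quantifier applies only to the last a
print(matches("aa{2,3}", in: "aa aaa aaaa"))  // ["aaa", "aaaa"]
```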
Logic
|
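This operator allows you to specify alternative possibilities:
- t|The matches the string t or the string The explicitly
- (t|T)he applied to "The ball is over there" matches both The and the the in "there"
- seriali[sz]e matches both serialise and serialize; for single characters, a bracket expression is often a shorter way to write an alternation like seriali(s|z)e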
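The sketch below, using a hypothetical matches(_:in:options:) helper, shows why the parentheses matter: without them, | splits the entire pattern into alternatives.

```swift
import Foundation

// Hypothetical helper: returns every substring of `text` that `pattern` matches.
// `try!` is acceptable here only because the patterns below are fixed, known-valid literals.
func matches(_ pattern: String, in text: String,
             options: NSRegularExpression.Options = []) -> [String] {
    let regex = try! NSRegularExpression(pattern: pattern, options: options)
    let fullRange = NSRange(text.startIndex..., in: text)
    return regex.matches(in: text, options: [], range: fullRange).compactMap { result in
        Range(result.range, in: text).map { String(text[$0]) }
    }
}

let sentence = "The ball is over there"
// The group limits the alternation to the first letter.
print(matches("(t|T)he", in: sentence))  // ["The", "the"]
// Without parentheses the whole pattern is "t" OR "The".
print(matches("t|The", in: sentence))    // ["The", "t"]
print(matches("seriali(s|z)e", in: "serialise serialize")) // ["serialise", "serialize"]
```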
(...)
Parentheses allow you to define what's called a capture group, which lets you extract the matched text into a variable for later use.
Given the following regular expression:
(\d\d\d)-(\d\d\d)-(\d\d\d\d)
When we apply it to "123-456-7890", the captured groups break down as follows: $1 is 123, $2 is 456, and $3 is 7890.
Now, if we wanted to remove the formatting (i.e. "1234567890"), we could concatenate the captured groups together:
$1$2$3
Note: The captured group $0 represents the entire match (i.e. 123-456-7890).
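The capture-group walkthrough above can be reproduced with NSRegularExpression directly. This is a minimal sketch: it reads each group with range(at:) and uses the same $1$2$3 template syntax with stringByReplacingMatches(in:options:range:withTemplate:).

```swift
import Foundation

let phone = "123-456-7890"
// `try!` is acceptable here only because the pattern is a fixed, known-valid literal.
let regex = try! NSRegularExpression(pattern: #"(\d\d\d)-(\d\d\d)-(\d\d\d\d)"#, options: [])
let fullRange = NSRange(phone.startIndex..., in: phone)

if let match = regex.firstMatch(in: phone, options: [], range: fullRange) {
    // Index 0 is the entire match; indices 1...3 are the parenthesized groups.
    for i in 0..<match.numberOfRanges {
        if let r = Range(match.range(at: i), in: phone) {
            print("$\(i) = \(phone[r])")
        }
    }
}

// Concatenating the groups drops the hyphens.
let digitsOnly = regex.stringByReplacingMatches(in: phone, options: [],
                                                range: fullRange, withTemplate: "$1$2$3")
print(digitsOnly) // 1234567890
```

Running it prints $0 = 123-456-7890, $1 = 123, $2 = 456, and $3 = 7890, followed by 1234567890.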
Character Classes
\w
This will match any single word character: an uppercase or lowercase letter, a digit, or "_". For ASCII text, it is equivalent to [a-zA-Z0-9_]:
- \w+ applied to "hello world my name is 42" would match hello, world, my, name, is, and 42 - notice, though, that the spaces are never matched
- \w{4,} matches any run of 4 or more word characters
- \w{4,5} matches any run of 4 to 5 word characters; to match only whole words of that length, wrap it in word boundaries: \b\w{4,5}\b
\W
This will match any single character that isn't a word character:
- \W applied to "the year is 2022" would match only the three spaces between the words; 2022 is not matched, because digits count as word characters
\d
Matches a digit (i.e. [0-9]).
\D
Matches any character other than a digit (i.e. [^0-9]), including spaces.
\s
This will match a single whitespace character, such as a space, tab, or newline.
\S
This will match any single character that isn't whitespace.
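A short sketch of the character classes in action, using a hypothetical matches(_:in:options:) helper and invented sample text. It also shows the difference between a run of word characters and a whole word:

```swift
import Foundation

// Hypothetical helper: returns every substring of `text` that `pattern` matches.
// `try!` is acceptable here only because the patterns below are fixed, known-valid literals.
func matches(_ pattern: String, in text: String,
             options: NSRegularExpression.Options = []) -> [String] {
    let regex = try! NSRegularExpression(pattern: pattern, options: options)
    let fullRange = NSRange(text.startIndex..., in: text)
    return regex.matches(in: text, options: [], range: fullRange).compactMap { result in
        Range(result.range, in: text).map { String(text[$0]) }
    }
}

let sentence = "hello world my name is 42"
print(matches(#"\w+"#, in: sentence))     // ["hello", "world", "my", "name", "is", "42"]
print(matches(#"\w{4,}"#, in: sentence))  // ["hello", "world", "name"]
print(matches(#"\w{4,5}"#, in: "elephant"))           // ["eleph"]: a run, not a whole word
print(matches(#"\b\w{4,5}\b"#, in: "hello elephant")) // ["hello"]
print(matches(#"\W"#, in: "the year is 2022").count)  // 3 (only the spaces)
print(matches(#"\d+"#, in: "the year is 2022"))       // ["2022"]
print(matches(#"\S+"#, in: "a  b\tc"))                // ["a", "b", "c"]
```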
Regular Expression Examples & Xcode
To use regular expressions in Xcode, select Regular Expression from the options in the "Find" menu.
Note: In Xcode's Find field, you type only the pattern itself; there's no need to add the surrounding / delimiters.
With the theory and the fundamentals out of the way, let's look at some real-world use cases for regex.
Validating An Email
When writing a new regular expression, I find it easier to work backwards from the requirements.
What do we know about an email address?
We know the first part of the email will contain a mixture of uppercase and lowercase letters along with zero or more digits.
As a reminder, the bracket syntax allows us to specify a set of valid characters to match against, and the + operator allows us to look for one or more instances of the previous expression.
So, combining these together we have the first part of our email validation regex:
[a-zA-Z0-9]+
Then, we know we'll see exactly one @ symbol, so our updated implementation now looks like this:
[a-zA-Z0-9]+@
The @ sits outside of any brackets because we're looking for a single literal @ character, not one character out of a set.
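Finally, we expect to see another combination of letters, digits, periods, or hyphens (the domain), followed by a literal period and a domain extension such as com:
[a-zA-Z0-9]+@[a-zA-Z0-9.-]+\.[a-zA-Z]+
Note that the period before the extension is escaped (\.); an unescaped . would match any character.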
The results seem promising as we're only catching the valid email addresses!
However, this is an overly simplified implementation and would fail on perfectly valid emails like:
- user.name@domain.com
- user_name@domain.co.in
- user-name@domain.co.in
If we tried to be extremely thorough, we'd probably end up in the neighborhood of Perl's 6,500 character long regular expression, so let's mutually agree to treat this as a stopping point 😅.
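If you want to reuse the email pattern in Swift code, here's one hedged way to do it: anchor it with ^ and $ so the whole string has to match, and pass it to String's range(of:options:) with the .regularExpression option. The isValidEmail(_:) name is hypothetical.

```swift
import Foundation

// Hypothetical validator: ^ and $ force the *entire* string to match the pattern.
func isValidEmail(_ candidate: String) -> Bool {
    let pattern = #"^[a-zA-Z0-9]+@[a-zA-Z0-9.-]+\.[a-zA-Z]+$"#
    return candidate.range(of: pattern, options: .regularExpression) != nil
}

print(isValidEmail("jane@example.com"))      // true
print(isValidEmail("jane@examplecom"))       // false: \. requires a literal period
print(isValidEmail("user.name@domain.com"))  // false: the known limitation described above
```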
Validating A Phone Number
Phone numbers can appear in a variety of formats:
- (555) 444-6789
- 555-444-6789
- 555.444.6789
- 555 444 6789
Looking at the first group of 3 digits, we can see that it may be surrounded by parentheses, so our regex will start off with an optional (escaped) opening parenthesis, the 3 digits, and an optional (escaped) closing parenthesis:
\(?\d{3}\)?
Next, we can see that we may have a space, period, or hyphen between the groups of digits, so we'll need to handle that as well:
[-.\s]?
Now, once we've combined everything together, our expression will successfully match the area code from our list of sample phone numbers above:
\(?\d{3}\)?[-.\s]?
Now, let's add support for the middle grouping of numbers which will be very similar to the previous expression:
\d{3}[-.\s]?
Finally, we can complete our implementation by validating the final group of 4 digits with \d{4}.
Here's the final regular expression:
\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}
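A minimal Swift sketch that anchors the final phone-number pattern and checks it against the sample formats listed above; isPhoneNumber(_:) is a hypothetical name.

```swift
import Foundation

// Hypothetical validator built from the final pattern, anchored with ^ and $.
func isPhoneNumber(_ s: String) -> Bool {
    let pattern = #"^\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}$"#
    return s.range(of: pattern, options: .regularExpression) != nil
}

let samples = ["(555) 444-6789", "555-444-6789", "555.444.6789", "555 444 6789"]
for s in samples {
    print(s, isPhoneNumber(s))  // each sample prints true
}
print(isPhoneNumber("555-44-6789"))    // false: the middle group needs 3 digits
print(isPhoneNumber("(555 444-6789"))  // true: unbalanced parentheses slip through
```

Because each parenthesis is optional on its own, this pattern also accepts unbalanced input; one way to tighten it would be an alternation such as (\(\d{3}\)|\d{3}) for the area code.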
Matching Whitespace
If you've ever used SwiftLint before, you're likely no stranger to linter warnings about leading and trailing whitespace. This behavior can be expressed as ^[ \t]+|[ \t]+$, which will match any excess whitespace at the beginning or end of a line.
\t matches a single tab.
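To strip that whitespace programmatically instead of through Xcode's Replace field, a sketch could pair the same pattern with the .anchorsMatchLines option (so ^ and $ apply to every line) and replace each match with an empty string. The sample source text is invented.

```swift
import Foundation

let source = "let x = 1   \n\tlet y = 2\n"
// .anchorsMatchLines lets ^ and $ match at the start and end of every line.
// `try!` is acceptable here only because the pattern is a fixed, known-valid literal.
let regex = try! NSRegularExpression(pattern: #"^[ \t]+|[ \t]+$"#,
                                     options: [.anchorsMatchLines])
let fullRange = NSRange(source.startIndex..., in: source)
// Replacing each leading or trailing run of spaces/tabs with "" removes it.
let trimmed = regex.stringByReplacingMatches(in: source, options: [],
                                             range: fullRange, withTemplate: "")
print(trimmed == "let x = 1\nlet y = 2\n") // true
```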
Standardizing Coding Style
Let's say that your code contains a mixture of variable names declared in both camelCase (e.g. loginButton) and snake_case (e.g. login_button), and you want to standardize them.
Our typical "Find and Replace" options won't work here.
With a plain-text "Find", our only option would be to search for every "_" character in our codebase, which isn't particularly useful; this is exactly the kind of problem regular expressions are built for.
We can use the following regular expression to find identifiers written in snake_case (specifically, ones followed by a space or a parenthesis) in our codebase:
\w+?_.+?(?=[( )])
This uses an advanced regex feature called a lookahead: (?=[( )]) requires the match to be followed by (, a space, or ), without including that character in the match itself.
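Here's a minimal sketch that runs the snake_case pattern over a made-up snippet and then converts each hit to camelCase. The matches(_:in:options:) and camelCased(_:) functions are hypothetical helpers; NSRegularExpression's replacement templates have no case-conversion operators, so the conversion is done in plain Swift.

```swift
import Foundation

// Hypothetical helper: returns every substring of `text` that `pattern` matches.
// `try!` is acceptable here only because the pattern below is a fixed, known-valid literal.
func matches(_ pattern: String, in text: String,
             options: NSRegularExpression.Options = []) -> [String] {
    let regex = try! NSRegularExpression(pattern: pattern, options: options)
    let fullRange = NSRange(text.startIndex..., in: text)
    return regex.matches(in: text, options: [], range: fullRange).compactMap { result in
        Range(result.range, in: text).map { String(text[$0]) }
    }
}

// Hypothetical helper: "login_button" -> "loginButton".
func camelCased(_ snake: String) -> String {
    let parts = snake.split(separator: "_")
    guard let first = parts.first else { return snake }
    // Keep the first word as-is and capitalize the first letter of every later word.
    return String(first) + parts.dropFirst()
        .map { $0.prefix(1).uppercased() + String($0.dropFirst()) }
        .joined()
}

let code = """
let login_button = UIButton()
func setup_view() {}
let loginButton = UIButton()
"""
let snakeCase = matches(#"\w+?_.+?(?=[( )])"#, in: code)
print(snakeCase)                  // ["login_button", "setup_view"]
print(snakeCase.map(camelCased))  // ["loginButton", "setupView"]
```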
Making Classes Final By Default
Let's say that we want to ensure that all of the UIViewControllers in our project are final by default.
It's easy enough to apply this change to all future UIViewControllers, but how can we apply it to our existing controllers?
We can use the following expression to find all UIViewController declarations that start with class instead of final class:
^(class)\s[\w]+ViewController:\s?[A-Z]+ViewController
Great! Since we're capturing class, we can use our captured groups in the "Replace" text field and hit "Replace All".
Remember that capture groups are numbered starting at 0, where $0 is the entire match.
Now, all of our previous UIViewController declarations are final.
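The replacement text isn't spelled out above, so this sketch assumes the template final $0: because the pattern matches the whole declaration head, $0 re-inserts it after the new keyword, whereas final $1 alone would drop everything after class. The sample declarations are invented.

```swift
import Foundation

let source = """
class HomeViewController: UIViewController {}
final class SettingsViewController: UIViewController {}
class ProfileViewController:UIViewController {}
"""
// .anchorsMatchLines so ^ matches at the start of every line, not just the first.
// `try!` is acceptable here only because the pattern is a fixed, known-valid literal.
let regex = try! NSRegularExpression(
    pattern: #"^(class)\s[\w]+ViewController:\s?[A-Z]+ViewController"#,
    options: [.anchorsMatchLines])
let fullRange = NSRange(source.startIndex..., in: source)
// Assumed template: prepend "final " to the entire match ($0).
let updated = regex.stringByReplacingMatches(in: source, options: [],
                                             range: fullRange, withTemplate: "final $0")
print(updated)
```

The line that already starts with final class is left alone, since ^(class) only matches when class is the first word on the line.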
Regular Expression Builder In Xcode
If you're still feeling a little shaky with the syntax, don't worry!
In Xcode, we can easily create basic regular expressions without the need for this special syntax.
Start by switching the Find accessory action from Contains to Regular Expression.
Then, we can select the + and use the menu that appears to visually build our regular expressions.
Although this approach is limited in its ability to produce complex regular expressions, it is a good place to start as you learn the basics.