Regular Expressions (Part 1/2)

Ever dreamed of speaking a cryptic language no one understands?

Like Ancient Egyptian from the pharaohs' time?

Well, kind of! :D In this chapter, I'll teach you how to write something like this:

#(((https?|ftp)://(w{3}\.)?)(?<!www)(\w+-?)*\.([a-z]{2,4}))#

Believe it or not, this unpronounceable gibberish... actually means something! I swear!

Okay, I won't lie to you... this is going to take some effort because we're diving into one of the trickier parts of PHP. But paradoxically, it's one of the most useful and interesting (some would even say fascinating) topics.

Just keep in mind: this chapter is challenging (in fact, I had to split it into two parts), but it's totally worth your time. Why? Because regular expressions are a super powerful and lightning-fast way to search inside strings (like entire sentences). Think of it as an ultra-advanced version of Find & Replace, a feature you won't want to live without once you've got the hang of it.

Need examples?

Automatically check if a visitor's email address is properly formatted (like "admin@siteraw.com")
Convert a date from US format (05-18-2025) to European (18/05/2025) or Japanese (2025/05/18)
Automatically turn any "https://" address into a clickable link, like some forums do
Or even create your own simplified markup language based on HTML, like bbCode ([b][/b]) or GitHub-flavored Markdown (##)

Open your ears and fasten your seatbelt. Let's go :D

Where and how do you use a regex?

Good news: unlike with PDO, you don't need to activate anything to use regular expressions. The PCRE functions are built into PHP (since PHP 5.3, they can't even be disabled).

POSIX or PCRE?

There are two main types of regular expressions in PHP, charmingly named:

POSIX: The regular expression flavor standardized by POSIX. Old versions of PHP offered it through the ereg family of functions (ereg, eregi, ereg_replace...). It was said to be a bit simpler than PCRE (though still quite complex), but it was slower. It was deprecated in PHP 5.3 and removed entirely in PHP 7.0.
PCRE: Short for "Perl Compatible Regular Expressions". This syntax comes from another language (Perl). It's a little more advanced, but much faster and more efficient.

Older versions of PHP let you choose between POSIX and PCRE. For us the choice is obvious: we're going with PCRE. In fact, it's the only option left in any modern version of PHP. Don't worry, it's not much more complicated than POSIX, and it's much faster. :)

The functions we'll use

Since we're using PCRE, we'll be working with several functions that all start with preg_:

preg_grep
preg_split
preg_quote
preg_match
preg_match_all
preg_replace
preg_replace_callback

Each of these has its own purpose — some are just for searching, others for search and replace — but they all use the same syntax, or "language."

Once you've learned how PCRE syntax works, you'll be able to use all of them without breaking a sweat.

To avoid drowning in theory (that would be suuuper boring), let's jump right in and start experimenting with one of these functions: preg_match.

preg_match

This function is great for practicing alongside me and getting a feel for how PCRE works.

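All you need to know for now is that it tells you whether it found a match. It returns 1 if it finds the word you're looking for in the string and 0 if it doesn't. If your regex itself contains an error, it returns false. In an if condition, 1 counts as true and 0 as false, so you can treat the result like a boolean.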

You have to give it two things: your regex (short for "regular expression") and the string you want to search.

Here's how you'd use it in an if condition:

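<?php
// Replace ** Your REGEX ** with a real pattern, such as #guitar#
if (preg_match("** Your REGEX **", "The string you're searching in")) {
    // Double quotes, because the text contains an apostrophe
    echo "The word you're looking for is in the string";
} else {
    echo "The word you're looking for is NOT in the string";
}
?>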
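If you'd like to see with your own eyes that preg_match hands back 1 or 0 rather than a real boolean, here's a minimal sketch. The variable names are just for illustration:

```php
<?php
// preg_match() returns 1 (match found), 0 (no match) or false (broken pattern)
$found    = preg_match('#guitar#', 'I love playing the guitar.');
$notFound = preg_match('#piano#', 'I love playing the guitar.');

var_dump($found);    // int(1)
var_dump($notFound); // int(0)

// Inside an if, 1 behaves like true and 0 like false,
// which is why the if/else pattern above works.
if ($found) {
    echo "Match!\n";
}
```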

Instead of ** Your REGEX **, you'll enter something in PCRE syntax — like what I showed you at the start of the chapter:

#(((https?|ftp)://(w{3}\.)?)(?<!www)(\w+-?)*\.([a-z]{2,4}))#

That's exactly the kind of thing we're going to focus on next.

Because, let's face it — that mess is NOT easy to read... Hieroglyphs look like child's play in comparison!

Simple regex searches

Let's start with some nice, basic searches. You should be able to follow along easily for now — it only gets tricky when we start mixing things together later. :p

First important thing to know: a regex (i.e., a regular expression) is always wrapped in delimiters — special characters.

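You can technically use almost any character as a delimiter (anything that isn't a letter, a digit, a backslash or whitespace; the slash / is another popular choice), but to keep things simple, I'm going to pick one for us: the hash symbol (#)!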

So your regex will be wrapped in hash symbols, like this:

#My regex#

Wait... what's the point of the hash symbols if the regex is already inside quotation marks in the PHP function?

Good question! It's because you can add options after the closing delimiter (PHP's manual calls them "pattern modifiers"). We won't dive into options just yet (you don't need them to get started), but just know that they go right after the second hash, like this:

#My regex#Options
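To convince yourself that the delimiter is only packaging, here's a small sketch. It compares patterns that differ only in their delimiter. The # version earns its keep when the pattern contains slashes, like the "https://" in a URL, because with / as the delimiter every slash inside the pattern has to be escaped.

```php
<?php
$sentence = 'I love playing the guitar.';

// Same regex, three different delimiters: all of them find "guitar"
$withHash  = preg_match('#guitar#', $sentence);
$withSlash = preg_match('/guitar/', $sentence);
$withTilde = preg_match('~guitar~', $sentence);

// With # as the delimiter, the slashes in "https://" need no escaping...
$url     = 'https://siteraw.com';
$hashUrl = preg_match('#^https://#', $url);
// ...whereas with / as the delimiter, each slash must be written \/
$slashUrl = preg_match('/^https:\/\//', $url);

echo $withHash, $withSlash, $withTilde, $hashUrl, $slashUrl, "\n"; // 11111
```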

In place of "My regex," you'll put the word you're trying to find.

Let's say you want to check whether a variable contains the word "guitar." Just use the following regex:

#guitar#

Here's how it looks in PHP:

<?php
if (preg_match("#guitar#", "I love playing the guitar.")) {
    echo 'TRUE';
} else {
    echo 'FALSE';
}
?>

As you can see, our script prints TRUE because the word "guitar" was found in the sentence "I love playing the guitar." :D

Keep that little code snippet in mind — we'll be reusing it a lot, changing either the regex or the sentence we're searching through.

To help you visualize how regex behaves, I'll summarize the results in a table like this:

String	Regex	Result
I love playing the guitar.	#guitar#	TRUE
I love playing the guitar.	#piano#	FALSE
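Every table in this chapter boils down to the same if/else, so you could let a tiny function fill in the Result column for you. test_regex is a name I made up for this sketch. It is not a built-in PHP function.

```php
<?php
// Hypothetical helper (not part of PHP): returns 'TRUE' or 'FALSE',
// just like the Result column of the tables in this chapter.
function test_regex(string $regex, string $subject): string
{
    return preg_match($regex, $subject) === 1 ? 'TRUE' : 'FALSE';
}

// Each row: [string, regex], as in the table above
$rows = [
    ['I love playing the guitar.', '#guitar#'],
    ['I love playing the guitar.', '#piano#'],
];

foreach ($rows as [$subject, $regex]) {
    echo $subject, "\t", $regex, "\t", test_regex($regex, $subject), "\n";
}
```

To try out any row from the tables below, add it to $rows.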

Got it so far? :)

We found "guitar" in the first one, but not "piano" in the second. Easy, right? But I'm about to crank up the difficulty!

Uppercase and lowercase

Here's something important: regular expressions are case-sensitive by default. That means they treat uppercase and lowercase as different characters. Look at these two examples:

String	Regex	Result
I love playing the guitar.	#Guitar#	FALSE
I love playing the guitar.	#GUITAR#	FALSE

So how do we make our regex ignore case differences?

By using an option — this is the only one you need to remember for now. Add the letter i after the second hash, and case differences will be ignored:

String	Regex	Result
I love playing the guitar.	#Guitar#i	TRUE
Long live the GUITAR!	#guitar#i	TRUE
Long live the GUITAR!	#guitar#	FALSE

In that last example, I didn't include the i option, so the result was FALSE. But in the others, you can see that i let us match regardless of case. :)
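Here's the same comparison as a quick PHP sketch. The only difference between the two calls is the i after the closing hash:

```php
<?php
$subject = 'Long live the GUITAR!';

$strict  = preg_match('#guitar#', $subject);  // case-sensitive: no match
$relaxed = preg_match('#guitar#i', $subject); // i option: case is ignored

var_dump($strict, $relaxed); // int(0), then int(1)
```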

The OR symbol

Now let's look at the OR symbol. In PHP conditions you write it as ||, but in a regex it's a single vertical bar: |.

With it, your regex can look for multiple possibilities. For example:

#guitar|piano#

This means you're searching for either the word "guitar" or the word "piano." If it finds either, it returns TRUE.

String	Regex	Result
I love playing the guitar.	#guitar|piano#	TRUE
I love playing the piano.	#guitar|piano#	TRUE
I love playing the banjo.	#guitar|piano#	FALSE
I love playing the banjo.	#guitar|piano|banjo#	TRUE

In the last one, I used the | symbol twice, meaning we're looking for "guitar" OR "piano" OR "banjo."
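This sketch runs the same three sentences through both versions of the regex. Only the three-way version accepts the banjo:

```php
<?php
$sentences = [
    'I love playing the guitar.',
    'I love playing the piano.',
    'I love playing the banjo.',
];

foreach ($sentences as $sentence) {
    // | separates the alternatives: guitar OR piano (OR banjo)
    $twoChoices   = preg_match('#guitar|piano#', $sentence) ? 'TRUE' : 'FALSE';
    $threeChoices = preg_match('#guitar|piano|banjo#', $sentence) ? 'TRUE' : 'FALSE';
    echo "$sentence  two choices: $twoChoices  three choices: $threeChoices\n";
}
```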

Still with me? ^^ Awesome!

Now let's look at matching the start and end of a string — then we'll really start speeding things up. :p

Start and end of a string

Regex can be incredibly precise — you'll see what I mean.

Up to now, our word could appear anywhere in the string. But what if we want to match a word only at the beginning or only at the end?

We'll need two symbols. Memorize these:

^ (caret): indicates the start of a string
$ (dollar sign): indicates the end of a string

So if you want a string that starts with "Hello," you'll use:

#^Hello#

Put the caret in front of the word, and now the word has to appear right at the start, or it'll return FALSE.

Likewise, if you want to make sure a string ends with "noob," write:

#noob$#

Let's look at a few test cases:

String	Regex	Result
Hello little noob	#^Hello#	TRUE
Hello little noob	#noob$#	TRUE
Hello little noob	#^noob#	FALSE
Hello little noob!!!	#noob$#	FALSE

Simple, right?

That last one didn't work because the string ends with "!!!", not "noob." So of course, it returned FALSE...
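Here are the four test cases from the table as a PHP sketch. It also shows one PCRE subtlety worth knowing: by default, $ also matches just before a final newline at the end of the string. The D option turns that behavior off.

```php
<?php
$rows = [
    ['Hello little noob',    '#^Hello#'], // must START with Hello
    ['Hello little noob',    '#noob$#'],  // must END with noob
    ['Hello little noob',    '#^noob#'],
    ['Hello little noob!!!', '#noob$#'],
];

$results = [];
foreach ($rows as [$subject, $regex]) {
    $results[] = preg_match($regex, $subject) === 1 ? 'TRUE' : 'FALSE';
}
echo implode(', ', $results), "\n"; // TRUE, TRUE, FALSE, FALSE

// By default $ also accepts a single trailing "\n"...
$trailingNewline = preg_match('#noob$#', "Hello little noob\n");  // 1
// ...unless you add the D option
$strictEnd       = preg_match('#noob$#D', "Hello little noob\n"); // 0
```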

Character Classes

Let's stop beating around the bush and take a close look at this regex:

#fo[ol]d#

What's inside the brackets is called a character class. It means any one of the letters inside can match. In this case, the regex will match 2 words: "food" and "fold".

It's kind of like the OR we learned earlier, except it applies to a letter, not a full word.

Now, if you put in more letters, like this:

#fo[aol]d#

It means "a" OR "o" OR "l". So this regex matches "food", "foad", and "fold"!

Let's try a few examples:

String	Regex	Result
Every day I fold my shirts	#fo[aol]d#	TRUE
Ew, that food's way too fried	#fo[aol]d#	TRUE
Ew, that food's way too fried	#fo[aol]d$#	FALSE
I am a real noobi	#[aeiouy]$#	TRUE
I am a real noobi	#^[aeiouy]#	FALSE

You probably get the first two regexes. But you might need a bit of help with the next three:

1) In "Ew, that food's way too fried", I used #fo[aol]d$#. If you remember, the $ means the string must end with "food," "foad," or "fold." Since the word's in the middle, the result is FALSE. Try moving it to the end and you'll see it works. ;)
2) Then there's "I am a real noobi" with #[aeiouy]$#. This means the string must end in a vowel (a, e, i, o, u, y). Lucky us, the last letter is "i," so the answer is TRUE. :)
3) Same string again, but now with #^[aeiouy]#. This time, the string must start with a lowercase vowel. But it starts with an uppercase "I", so... FALSE!
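To watch a character class pick out words, try this sketch. I've added ^ and $ around the class so that each word has to be exactly "fo", one letter from [aol], then "d":

```php
<?php
// [aol] means: exactly ONE character, which must be a, o or l
$words = ['food', 'fold', 'foad', 'feed', 'fond'];

$matching = array_filter($words, function (string $word): bool {
    return preg_match('#^fo[aol]d$#', $word) === 1;
});

echo implode(', ', $matching), "\n"; // food, fold, foad
```

"fond" is rejected because "n" isn't in the class, and "feed" fails as early as its second letter.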

Still with me? If you feel a bit lost, scroll back up and reread. It won't hurt. :)

Ranges in Classes

This is where things start to get impressive. :)

Using the dash (-), you can allow a whole range of characters.

For example, earlier we used [aeiouy]. That's manageable. But how about [abcdefghijklmnopqrstuvwxyz] just to match a single letter?

I've got something better! :D

You can write [a-z] instead! Way shorter! Want to stop at "e"? No problem: [a-e].

It also works with numbers: [0-9]. Want a digit from 1 to 8? Just type [1-8].

Even cooler — you can combine two ranges in one class: [a-z0-9], which means "any lowercase letter OR a digit".

And of course, you can include uppercase letters too without using those case-insensitive flags from earlier. That would be: [a-zA-Z0-9]. Much shorter than [abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789] (Hope that makes sense — I'm not about to type out the whole alphabet 50 times if I don't have to!)

Let's test a few:

String	Regex	Result
This sentence contains a letter	#[a-z]#	TRUE
this sentence has no caps or digits	#[A-Z0-9]#	FALSE
I live in the 21st century	#^[0-9]#	FALSE
<h1>An HTML heading tag</h1>	#<h[1-6]>#	TRUE

That last one's especially interesting — we're heading into practical use now. Here we're checking if the string contains an HTML heading tag (<h1> to <h6>).
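The heading check is easy to try on a few HTML snippets. In this sketch, the snippets are made-up test data:

```php
<?php
$snippets = [
    '<h1>An HTML heading tag</h1>',
    '<h4>A smaller heading</h4>',
    '<h7>No such tag</h7>',
    '<p>Just a paragraph</p>',
];

$isHeading = [];
foreach ($snippets as $html) {
    // "<h", then ONE digit between 1 and 6, then ">"
    $isHeading[] = preg_match('#<h[1-6]>#', $html) === 1;
}

var_dump($isHeading); // true, true, false, false
```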

What If I DON'T Want Those Characters?

If you want to match anything except certain characters, put a ^ at the start of the class.

Wait — what?! o_O Didn't ^ mean the start of a string?

Yes, outside the brackets. But when it's the very first character inside a class, it means you DON'T want any of the characters listed.

So this regex:

#[^0-9]#

...means you want at least one character in the string that isn't a digit.

Let's give your brain a little workout:

String	Regex	Result
This sentence contains more than just digits	#[^0-9]#	TRUE
this sentence has more than just caps and digits	#[^A-Z0-9]#	TRUE
This sentence doesn't start with a lowercase letter	#^[^a-z]#	TRUE
This sentence doesn't end in a vowel	#[^aeiouy]$#	TRUE
SwwcrrmmnnGGgGnngnMmmmmffffz	#[^aeiouy]#	TRUE

That's some hardcore regular expression manipulation :) But we're far from over.
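Here are a few negated classes as a PHP sketch. The last check shows that ^ only means "not" in first position. Anywhere else inside the brackets, it's just an ordinary caret.

```php
<?php
$checks = [
    // [^0-9]: at least one character that is NOT a digit
    'all digits'       => preg_match('#[^0-9]#', '2025'),
    // ^[^a-z]: the first character is NOT a lowercase letter
    'starts uppercase' => preg_match('#^[^a-z]#', 'This sentence'),
    // [^aeiouy]$: the last character is NOT a, e, i, o, u or y
    'ends in l'        => preg_match('#[^aeiouy]$#', "This sentence doesn't end in a vowel"),
    // ^ is not first in [0-9^], so it matches a literal caret
    'literal caret'    => preg_match('#[0-9^]#', 'x^2'),
];

var_dump($checks); // 0, 1, 1, 1
```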

Regex Quantifiers

Quantifiers are symbols that tell us how many times a character or group of characters can repeat.

Say we want to match an email address like siterawfan@siteraw.com.

We'll need to say: "Starts with one or more letters, followed by an @, followed by at least two letters, then a dot, then between 2 and 4 letters (for .com, .net, .biz — which does exist!)."

Now, we're not ready yet to write a full email regex (a bit too soon), but the point is: you have to know how to specify how many times a letter can repeat.

The Most Common Symbols

There are 3 you need to remember:

? (question mark): the character is optional. Can appear 0 or 1 time.
So #a?# matches 0 or 1 "a".
+ (plus): the character is required. Can appear 1 or more times.
So #a+# matches "a", "aa", "aaa", etc.
* (asterisk): the character is optional. Can appear 0, 1, or many times.
So #a*# matches "a", "aa", "aaa"... but it also works if there's no "a" at all!

These symbols apply to the character just before them. So we can allow both "dog" and "dogs" using #dogs?#.

Same idea applies to letters in the middle of a word:

#ph?illies#

This will match "phillies" and "pillies" (as they, like regex, often give headaches).
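This sketch compares the three quantifiers. I anchored the patterns with ^ and $ so the whole string has to fit. Without the anchors, #dogs?# would also find "dog" inside "dogss".

```php
<?php
// ? : the "s" may appear 0 or 1 time
$dog   = preg_match('#^dogs?$#', 'dog');   // 0 times: allowed
$dogs  = preg_match('#^dogs?$#', 'dogs');  // 1 time: allowed
$dogss = preg_match('#^dogs?$#', 'dogss'); // 2 times: too many for ?

// + needs at least one "e", * is happy with zero
$plus = preg_match('#^e+$#', '');
$star = preg_match('#^e*$#', '');

echo $dog, $dogs, $dogss, $plus, $star, "\n"; // 11001
```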

What If I Want 2 Letters (or More) to Repeat?

Use parentheses. For example, to match "Ayayayayayay" (Ken's battle cry from Street Fighter), use:

#Ay(ay)*#

This matches "Ay", "Ayay", "Ayayay", etc. ("Ouch Ouch Ouch" doesn't count)

You can also use the | symbol inside parentheses. #Ay(ay|oy)*# matches strings like "Ayayayoyayayayoyoyoyoyayoy" — repeating "ay" OR "oy".
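Here's a sketch of why the parentheses matter. With the group, "ay" or "oy" repeats as a unit. Without it, the * only applies to the last "y".

```php
<?php
$cries = ['Ay', 'Ayayay', 'Ayayayoyayoy', 'Ayy', 'Oy'];

$valid = [];
foreach ($cries as $cry) {
    // (ay|oy)* : repeat "ay" OR "oy", zero or more times
    if (preg_match('#^Ay(ay|oy)*$#', $cry) === 1) {
        $valid[] = $cry;
    }
}
echo implode(', ', $valid), "\n"; // Ay, Ayayay, Ayayayoyayoy

// No parentheses: * only repeats the final "y"
$noGroup1 = preg_match('#^Ayay*$#', 'Ayayay');  // 0
$noGroup2 = preg_match('#^Ayay*$#', 'Ayayyyy'); // 1
```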

Another cool trick — you can add a quantifier after a character class (the ones with brackets!).

For example: #[0-9]+# matches any number, as long as there's at least one digit.

Let's test a few:

String	Regex	Result
eeeee	#e+#	TRUE
ooo	#u?#	TRUE
magnificent	#[0-9]+#	FALSE
Yahoooooo	#^Yaho+$#	TRUE
Yahoooooo is awesome!	#^Yaho+$#	FALSE
Blablablablabla	#^Bla(bla)*$#	TRUE

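The "ooo" row may surprise you: #u?# returns TRUE even though there's no "u" anywhere. That's because ? allows zero occurrences, and "zero u's" can be found at any position, so this regex matches every string. The regex #^Yaho+$# means the entire string must be "Yah" followed by one or more "o"s. So "Yaho," "Yahoo," "Yahooo," etc. all work, as long as nothing comes before or after.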

And the final regex matches "Bla," "Blabla," "Blablabla," etc. — I used parentheses to say "bla" can repeat.
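You can check that strange #u?# result for yourself. This sketch uses preg_match's optional third argument, which we haven't covered yet. It fills an array with the text that actually matched.

```php
<?php
// Third argument: $matches[0] receives the text that matched
$result = preg_match('#u?#', 'ooo', $matches);
var_dump($result);     // int(1): the regex matched...
var_dump($matches[0]); // string(0) "": ...an empty string (zero "u")

// + grabs the whole run of digits, not just the first one
preg_match('#[0-9]+#', 'I live in the 21st century', $digits);
echo $digits[0], "\n"; // 21

// Extra text after the o's breaks the final $
$yahoo = preg_match('#^Yaho+$#', 'Yahoooooo is awesome!'); // 0
```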

Being More Precise with Curly Braces

Sometimes you want to say: "repeat exactly 4 times" or "between 4 and 6 times"... That's where curly braces {} come in.

If you followed everything so far, this part will feel easy.

There are 3 ways to use them:

{3}: repeat exactly 3 times
#a{3}# matches "aaa"
{3,5}: repeat between 3 and 5 times
#a{3,5}# matches "aaa", "aaaa", or "aaaaa"
{3,}: repeat at least 3 times
#a{3,}# matches "aaa", "aaaa", "aaaaa", etc. (I'm not writing them all, you get it)

By the way:

? = {0,1}
+ = {1,}
* = {0,}

Let's lock this in with some examples:

String	Regex	Result
eeeee	#e{2,}#	TRUE
Blablablabla	#^Bla(bla){4}$#	FALSE
546789	#^[0-9]{6}$#	TRUE
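This sketch replays the table and checks two points from this section. First, ? is shorthand for {0,1}. Second, without ^ and $, {3} happily matches three letters inside a longer run. The "colou?r" pattern is just an extra example of mine.

```php
<?php
$table = [
    ['eeeee',        '#e{2,}#'],
    ['Blablablabla', '#^Bla(bla){4}$#'], // only 3 "bla" after "Bla"
    ['546789',       '#^[0-9]{6}$#'],
];

$results = [];
foreach ($table as [$subject, $regex]) {
    $results[] = preg_match($regex, $subject);
}
echo implode(', ', $results), "\n"; // 1, 0, 1

// ? and {0,1} give the same answer
$same = preg_match('#^colou?r$#', 'color') === preg_match('#^colou{0,1}r$#', 'color');

// {3} alone finds "aaa" inside "aaaaa"; anchored, it demands exactly 3
$inside = preg_match('#a{3}#', 'aaaaa');   // 1
$exact  = preg_match('#^a{3}$#', 'aaaaa'); // 0
```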

That was quite a bit to digest, huh? o_O Time to stop here and take a proper break, because... next chapter, we put it all together!

Did I give you a headache? :D If you're still alive and reading this, you're doing great!

Right now, regexes aren't super useful on their own — but in the next chapter, we'll use them for real-world tasks.

Take your time and don't move on until you're sure you've got this down — otherwise, you're headed straight for disaster :)

Enjoyed this PHP & MySQL course?

If you liked this lesson, you can find the book "How to Build a Website in HTML and CSS" from the same authors, available on SiteRaw, in bookstores and in online libraries in either digital or paperback format. You will find a complete PHP & MySQL workshop with many exclusive bonus chapters.
