Regular Expressions


Published: Apr 07 2017

What are Regular Expressions? They are patterns that describe the text you want to search for. I don't know the formal specifics, even though I have read about them on Wikipedia.

I love Regular Expressions (RegEx). I know they aren't the best for everything and I am not even that great at them. I like that they are logical and terse and strict.

I found this fun RegEx crossword site and, after completing all the regular puzzles, I am working on the player-submitted ones. They should be much more challenging and force me to learn more of the syntax that I don't know well.

A couple of things I use RegEx for are:

  1. Find and replace text in Notepad++ (example: replace all attributes in HTML/XML tags, remove HTML/XML tags)
  2. Simple email validation

These aren't perfect use cases, though. The HTML/XML parsing needs to account for many possible types of tags, so I might run several different expressions instead of one that covers everything. The email check is usually extremely simple: it doesn't actually validate the address, it just makes sure a couple of key components are present.

Below are some actual examples of simple regular expressions that I might use in code or to get through a large file in Notepad++.


Step 1: 


This will match either of these two texts: "<div>test</div>", "<div class='test'>test</div>". It will fail, however, if the text spans multiple lines, which is why I chose the div tag here.
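As a rough sketch of the behavior described here, the Python snippet below uses a pattern that matches both single-line texts and fails on a multi-line one. The pattern `<div[^>]*>.*</div>` and the name `DIV_ONE_LINE` are assumptions made for illustration, not necessarily the expression originally used in this step.

```python
import re

# Assumed pattern (not the original): an opening div with optional
# attributes, any text on the same line, then a closing div.
DIV_ONE_LINE = re.compile(r"<div[^>]*>.*</div>")

samples = [
    "<div>test</div>",
    "<div class='test'>test</div>",
    "<div>\ntest\n</div>",  # spans several lines
]

for text in samples:
    # By default "." does not match a newline, so the last sample fails.
    print(repr(text), bool(DIV_ONE_LINE.search(text)))
```

The first two samples print `True` and the multi-line one prints `False`.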

Step 2:


This will match a div with anything in between, across multiple lines. It still will not get everything, though. If there is a div within a div, it will not match in the way you might want.

For the markup: 

<div class="div1">
<div class='div2'>

The above regex will match this text only: 

<div class="div1">
<div class='div2'>

This is because it will find the first opening div and keep matching until it finds the first closing div tag. I don't have a simple solution to that. I would likely have to replace a couple of times to get the results I needed.
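The nested-div problem can be reproduced with a short Python sketch. The `(.|\r|\n)*?` part is the same idiom used in the expression further down in this post. The `<div[^>]*>` opening and the sample markup (including the text `inner`) are made up for this illustration.

```python
import re

# (.|\r|\n)*? lazily matches any character, including line breaks.
# The <div[^>]*> opening is an assumption, not the original expression.
DIV_MULTI_LINE = re.compile(r"<div[^>]*>(.|\r|\n)*?</div>")

# Hypothetical nested markup, made up for this illustration.
markup = (
    '<div class="div1">\n'
    "<div class='div2'>inner</div>\n"
    "</div>"
)

match = DIV_MULTI_LINE.search(markup)
# The match starts at the outer div but stops at the first closing tag,
# so the outer div's own closing tag is left out.
print(match.group(0))
```

The printed match ends at the inner closing tag, so a replace would leave the outer closing tag behind. That is why it can take a couple of passes.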

I would probably avoid the div tag and use something else if I could. The nice thing about capture groups is that they make replacing text much easier.

Using the RegEx: 

(<div id='test'>)(.|\r|\n)*?(</div>)


<div id='test'>




<div id='test'>second</div>

Obviously not perfect but I am sure you can see how it's useful. 
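Here is a minimal Python sketch of a replacement that uses the capture groups from the expression above. The input text (`first`) and the replacement string `\1second\3` are assumptions for illustration. Only the expression itself and the resulting text come from this post.

```python
import re

# The expression from the post, unchanged.
PATTERN = re.compile(r"(<div id='test'>)(.|\r|\n)*?(</div>)")

# Hypothetical input; only the opening tag is taken from the post.
before = "<div id='test'>\nfirst\n</div>"

# \1 and \3 put the captured opening and closing tags back,
# so only the text between them changes.
after = PATTERN.sub(r"\1second\3", before)
print(after)  # <div id='test'>second</div>
```

Because the tags are captured in groups 1 and 3, the replacement never has to retype them. That helps when the opening tag carries attributes that should be kept.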


The email example is much simpler.


This isn't great for checking against a text file because it will match all kinds of things that have an @ symbol. It does work well for a single textbox input, though. It doesn't actually test that everything is valid, but it makes sure there is something before and after the @ and a period between the last two items.
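A check along these lines might look like the Python sketch below. The pattern and the helper name `looks_like_email` are assumptions for illustration, not the expression originally shown here.

```python
import re

# Assumed pattern: something before the @, something after it,
# and a period before the final part. Not a real validator.
SIMPLE_EMAIL = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")


def looks_like_email(value: str) -> bool:
    # fullmatch requires the whole input to fit the pattern,
    # which suits a single textbox value.
    return SIMPLE_EMAIL.fullmatch(value) is not None


print(looks_like_email("name@example.com"))  # True
print(looks_like_email("name@example"))      # False, no period after the @
print(looks_like_email("@example.com"))      # False, nothing before the @
print(looks_like_email("a@b..c"))            # True, even though it is not a valid address
```

The last line shows the limitation: the check only confirms the key components are present, so malformed addresses can still pass.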

I had planned to write a simple post about simple regular expressions but, as you can see, they can get complex. Some people find RegEx hard to understand, and there are a number of ways a match can be missed, or included when it isn't wanted.


Resources for RegEx: