Python regular expressions: usage examples

  • 2020-05-17 05:47:40
  • OfStack

As a concept, regular expressions are not unique to Python. However, the regular expressions in Python have some minor differences in their actual use.

This article is part of a series on Python regular expressions. In this first article of the series, we'll focus on how to use regular expressions in Python and highlight some of the features that are unique to Python.

We will introduce some of the methods for searching strings in Python. Then we'll talk about how to use groups to handle the subcomponents of the match objects that we find.

The regular expression module in Python that we are interested in is called 're'.

>>> import re

1. Raw strings in Python

The Python compiler uses '\' (backslash) to introduce escape sequences in string literals.

If the backslash is followed by a character sequence that the compiler recognizes, the entire escape sequence is replaced with the corresponding special character (for example, '\n' is replaced by the compiler with a newline character).

However, this presents a problem when using regular expressions in Python, because the 're' module also uses backslashes to escape special characters (such as * and +) in regular expressions.

The mix of the two means that sometimes you have to escape the escape character itself (when the special character is recognized by both the Python compiler and the regular expression compiler), but at other times you don't (when the special character is only recognized by the Python compiler).

Instead of trying to figure out how many backslashes we need, we can use raw strings.

A raw string is created simply by prefixing a normal string literal with the single character 'r' before the opening quote. When a string is raw, the Python compiler does not attempt any substitutions in it. Essentially, you're telling the compiler not to interfere with your string at all.


>>> string = 'This is a\nnormal string'
>>> rawString = r'and this is a\nraw string'
>>> print(string)
This is a
normal string
>>> print(rawString)
and this is a\nraw string

That is what a raw string looks like.

Searching with regular expressions in Python

The 're' module provides several methods for searching an input string. The methods we will discuss are:
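To make the backslash-counting problem concrete, here is a minimal sketch. The sample value `path` and the variable names are illustrative assumptions, not part of the original article. It shows that a normal string and a raw string can spell the same pattern, and how many backslashes each form needs to match one literal backslash.

```python
import re

# A normal string must double the backslash so that it survives Python's
# own escape processing; a raw string passes it through unchanged.
normal_pattern = '\\d+'
raw_pattern = r'\d+'
same_pattern = normal_pattern == raw_pattern  # both are the 3 characters \d+

# Matching one literal backslash: the regex itself needs \\, so a normal
# string literal needs four backslashes while a raw string needs two.
path = 'C:\\temp'  # hypothetical sample text: C:\temp
escaped_match = re.search('\\\\', path)
raw_match = re.search(r'\\', path)

print(same_pattern)                                   # True
print(escaped_match.group(0) == raw_match.group(0))   # True
```
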

•re.match()
•re.search()
•re.findall()

Each method receives one regular expression and one string to match. Let's look at each of these methods in more detail to see how they work and how they differ.

2. Use re.match to match at the start of a string

Let's look at the match() method first. The match() method finds a match only if the pattern matches at the beginning of the string being searched.
For example, calling the match() method on the string 'dog cat dog' with the pattern 'dog' will find a match:


>>> re.match(r'dog', 'dog cat dog')
<_sre.SRE_Match object at 0xb743e720>
>>> match = re.match(r'dog', 'dog cat dog')
>>> match.group(0)
'dog'

We'll talk more about the group() method later. For now, all we need to know is that when we call it with 0 as its argument, group() returns the text that the pattern matched.
I have also glossed over the returned SRE_Match object for now; we'll discuss it shortly.
However, if we call the match() method on the same string and look for the pattern 'cat', no match will be found.


>>> re.match(r'cat', 'dog cat dog')
>>>
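Because match() returns None when nothing matches, calling group() on the result without checking would raise an AttributeError. A minimal sketch of a guard follows; `starts_with` is a hypothetical helper name introduced here for illustration.

```python
import re

def starts_with(pattern, text):
    """Return the matched text if pattern matches at the start of text, else None."""
    match = re.match(pattern, text)
    if match is None:
        # No match at the beginning of the string.
        return None
    return match.group(0)

print(starts_with(r'dog', 'dog cat dog'))  # dog
print(starts_with(r'cat', 'dog cat dog'))  # None
```
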

3. Use re.search to find a match at any location

The search() method is similar to match(), but the search() method does not limit us to looking for matches only from the beginning of a string, so looking for 'cat' in our example string will find a match:


>>> match = re.search(r'cat', 'dog cat dog')
>>> match.group(0)
'cat'

However, the search() method stops looking after it finds a match, so using search() to look for 'dog' in our example string only finds its first occurrence.


>>> match = re.search(r'dog', 'dog cat dog')
>>> match.group(0)
'dog'

4. Use re.findall to get all matches

The lookup method I use most by far is findall(). When we call findall(), we simply get a list of all matching substrings rather than a match object (we'll talk more about match objects later), which I find easier to work with. Calling findall() on the sample string gives:


>>> re.findall(r'dog', 'dog cat dog')
['dog', 'dog']
>>> re.findall(r'cat', 'dog cat dog')
['cat']
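As a side-by-side summary of the three methods discussed so far, the following sketch runs each of them against the same sample string. The variable names are illustrative.

```python
import re

text = 'dog cat dog'

# match() only looks at the start of the string.
at_start = re.match(r'cat', text)    # None: text does not start with 'cat'

# search() scans forward and returns the first hit as a match object.
first_hit = re.search(r'cat', text)

# findall() returns every non-overlapping hit as a list of strings.
all_dogs = re.findall(r'dog', text)

print(at_start)             # None
print(first_hit.group(0))   # cat
print(all_dogs)             # ['dog', 'dog']
```
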

5. Use the match.start and match.end methods

So what is the 'match' object that the search() and match() methods returned to us earlier?
Instead of simply returning the matching part of a string, search() and match() return a "match object", which is a wrapper around the matched substring.
You saw earlier that we can get the matched substring by calling the group() method (the match object is also very useful when dealing with groups), but the match object contains more information about the matched substring as well.
For example, the match object can tell us where the matched content begins and ends in the original string:


>>> match = re.search(r'dog', 'dog cat dog')
>>> match.start()
0
>>> match.end()
3

It is sometimes very useful to know this information.
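One way this information can be used, sketched under the assumption that we want to mark the matched text inside the original string: start() and end() work directly as slice bounds, since end() is the index just past the last matched character.

```python
import re

text = 'dog cat dog'
match = re.search(r'cat', text)

start, end = match.start(), match.end()
print(start, end)         # 4 7

# Slicing with start/end recovers exactly what group(0) returns.
print(text[start:end])    # cat

# Rebuild the string with the match wrapped in brackets.
highlighted = text[:start] + '[' + text[start:end] + ']' + text[end:]
print(highlighted)        # dog [cat] dog
```
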

6. Use match.group to access groups by number

As I mentioned earlier, match objects are very handy when dealing with groups.
Grouping is the ability to address specific sub-parts of an entire regular expression match. We can define a group as a part of the entire regular expression, and then access the content matched by that group separately.
Let's take a look at an example and see how it works:

>>> contactInfo = 'Doe, John: 555-1212'

The string I just created looks like a fragment taken from someone's address book. We can match this line with a regular expression like this:


>>> re.search(r'\w+, \w+: \S+', contactInfo)
<_sre.SRE_Match object at 0xb74e1ad8>

By surrounding particular parts of the regular expression with parentheses (the characters '(' and ')'), we can group the content and then work with the subgroups separately.

>>> match = re.search(r'(\w+), (\w+): (\S+)', contactInfo)

These groups can be retrieved with the group() method of the match object. They are addressed by number, in the order in which they appear from left to right in the regular expression (starting at 1):


>>> match.group(1)
'Doe'
>>> match.group(2)
'John'
>>> match.group(3)
'555-1212'

Group numbers start at 1 because group 0 is reserved for the entire match (as we saw with the match() and search() methods earlier).


>>> match.group(0)
'Doe, John: 555-1212'
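Putting numbered groups to work, here is a minimal sketch that turns one address-book line into a dictionary. `parse_contact` and `CONTACT_PATTERN` are hypothetical names introduced for illustration; the pattern is the one used above.

```python
import re

# Same pattern as in the article: last name, first name, phone number.
CONTACT_PATTERN = r'(\w+), (\w+): (\S+)'

def parse_contact(line):
    """Return a dict of the three groups, or None if the line does not match."""
    match = re.search(CONTACT_PATTERN, line)
    if match is None:
        return None
    return {
        'last': match.group(1),
        'first': match.group(2),
        'phone': match.group(3),
    }

contact = parse_contact('Doe, John: 555-1212')
print(contact['first'])                   # John
print(parse_contact('no contact here'))   # None
```
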

7. Use match.group to access groups by name

Sometimes, especially when a regular expression has many groups, it becomes impractical to address groups by the order in which they appear. Python also allows you to give a group a name with the following syntax:

>>> match = re.search(r'(?P<last>\w+), (?P<first>\w+): (?P<phone>\S+)', contactInfo)

We can still use the group() method to get the contents of the group, but we will use the group name we specified instead of the group number we used before.


>>> match.group('last')
'Doe'
>>> match.group('first')
'John'
>>> match.group('phone')
'555-1212'
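A short sketch of two related behaviours of the standard 're' module that the text above does not cover: named groups can still be reached by number, and the match object's groupdict() method returns all named groups at once as a dictionary.

```python
import re

contact_info = 'Doe, John: 555-1212'
match = re.search(
    r'(?P<last>\w+), (?P<first>\w+): (?P<phone>\S+)', contact_info)

# A named group keeps its positional number as well.
same_value = match.group('last') == match.group(1)

# groupdict() maps every group name to the text it captured.
fields = match.groupdict()

print(same_value)        # True
print(fields['phone'])   # 555-1212
```
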

This greatly enhances the clarity and readability of the code. You can imagine that as regular expressions get more and more complex, it becomes more and more difficult to figure out what each group captures. Naming your groups tells you and your readers exactly what you intended.
Although the findall() method does not return match objects, it can also use groups. In that case findall() returns a list of tuples, where the Nth element of each tuple corresponds to the Nth group in the regular expression.


>>> re.findall(r'(\w+), (\w+): (\S+)', contactInfo)
[('Doe', 'John', '555-1212')]

However, group names have no effect on what findall() returns: the results are still plain tuples indexed by position.
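The following sketch illustrates both points on a made-up two-line address book (the second entry is hypothetical sample data): findall() yields one tuple per matching line, and a pattern with named groups produces the same tuples as one with numbered groups.

```python
import re

# Hypothetical sample data: two address-book lines.
address_book = 'Doe, John: 555-1212\nSmith, Jane: 555-3434'

numbered = re.findall(r'(\w+), (\w+): (\S+)', address_book)
named = re.findall(
    r'(?P<last>\w+), (?P<first>\w+): (?P<phone>\S+)', address_book)

print(numbered)
# [('Doe', 'John', '555-1212'), ('Smith', 'Jane', '555-3434')]
print(numbered == named)   # True
```
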

In this article, we covered some of the basics of using regular expressions in Python. We learned about raw strings (and the headaches they can help you avoid when using regular expressions). We also learned how to use the match(), search(), and findall() methods for basic queries, and how to use groups to handle the subcomponents of match objects.

As usual, the official Python documentation for the re module is a great resource for learning more about this topic.

In a future article, we will discuss the use of regular expressions in Python in more depth. We'll take a more comprehensive look at matching objects, learn how to use them to make substitutions in strings, and even use them to parse Python data structures from text files.

