Python Regex

The re module in Python provides tools for working with regular expressions, enabling pattern-based text matching, extraction, and manipulation. Regular expressions are powerful for tasks like text validation, searching, and replacing patterns.

1. Basic Pattern Matching with re.search()

re.search() finds the first match of a pattern in a string. If a match is found, it returns a Match object; otherwise, it returns None.
import re

# Example text and pattern
text = "Python is a versatile language."
pattern = r"versatile"

# Search for the pattern
match = re.search(pattern, text)
print("Match found." if match else "No match found.")

Output:

Match found.
Explanation: The re.search() function finds the first occurrence of "versatile" in text.
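The Match object carries more than a yes/no answer. The following is a minimal sketch of reading the matched text and its position, and of handling the None case; the variable names (matched_text, missing) are illustrative only.

```python
import re

text = "Python is a versatile language."

# A successful search returns a Match object describing the hit.
match = re.search(r"versatile", text)
if match:
    matched_text = match.group()   # the substring that matched
    start, end = match.span()      # start index and end index (exclusive)

# An unsuccessful search returns None, so test the result before using it.
missing = re.search(r"Java", text)
```

Because a failed search returns None, calling missing.group() directly would raise an AttributeError, which is why the result is checked first.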

2. Using re.match() to Match the Beginning of a String

re.match() checks for a match only at the beginning of a string. It returns a Match object if the pattern matches there and None otherwise. The example below reuses text from the previous section.
# Pattern to match at the beginning
pattern = r"Python"

# Match at the start of the string
match = re.match(pattern, text)
print("Match at start." if match else "No match at start.")

Output:

Match at start.
Explanation: re.match() only checks the start of text, where "Python" is found.
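To make the difference from re.search() concrete, here is a small sketch that applies re.match(), re.search(), and re.fullmatch() to the same string. re.fullmatch() is not covered above; it is part of the standard re module and succeeds only when the pattern covers the whole string.

```python
import re

text = "Python is a versatile language."

# re.match() looks only at the start of the string.
at_start = re.match(r"versatile", text)      # None: text does not begin with "versatile"

# re.search() scans the whole string.
anywhere = re.search(r"versatile", text)     # Match object

# re.fullmatch() requires the pattern to cover the entire string.
whole = re.fullmatch(r"Python", text)        # None: only a prefix matches
whole_ok = re.fullmatch(r"Python.*", text)   # Match object
```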

3. Extracting All Matches with re.findall()

re.findall() returns all non-overlapping matches of a pattern in a string as a list, scanning the string from left to right.
# Text containing several occurrences of the pattern
text = "The rain in Spain stays mainly in the plain."
pattern = r"in"

# Find all occurrences of "in"
matches = re.findall(pattern, text)
print("Occurrences of 'in':", matches)

Output:

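Occurrences of 'in': ['in', 'in', 'in', 'in', 'in', 'in']
Explanation: re.findall() finds every instance of "in" in the text, including those inside longer words: "rain", "Spain", "mainly", and "plain" each contribute one match alongside the two standalone occurrences of "in".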
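The shape of the list returned by re.findall() depends on whether the pattern contains groups. The sketch below uses a made-up sample string to show the three cases.

```python
import re

# Illustrative sample text, not taken from the examples above
text = "width=80, height=24"

# No groups: each item is the whole match.
whole = re.findall(r"\w+=\d+", text)

# One group: each item is the text captured by that group.
values = re.findall(r"\w+=(\d+)", text)

# Several groups: each item is a tuple with one entry per group.
pairs = re.findall(r"(\w+)=(\d+)", text)

print(whole)    # ['width=80', 'height=24']
print(values)   # ['80', '24']
print(pairs)    # [('width', '80'), ('height', '24')]
```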

4. Using re.finditer() for Match Locations

re.finditer() returns an iterator of Match objects, each of which carries the matched text together with its start and end positions. The example reuses pattern and text from the previous section.
# Find all "in" matches with positions
for match in re.finditer(pattern, text):
    print("Found 'in' at:", match.span())

Output:

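Found 'in' at: (6, 8)
Found 'in' at: (9, 11)
Found 'in' at: (15, 17)
Found 'in' at: (26, 28)
Found 'in' at: (31, 33)
Found 'in' at: (41, 43)
Explanation: re.finditer() makes it possible to track locations, because each Match object exposes span(), which returns a (start, end) tuple. The end index is exclusive, so text[start:end] is the matched text.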
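If only the standalone word "in" is wanted, the pattern can be wrapped in \b word boundaries. This is a small sketch of combining that with re.finditer() to collect both the text and the position of each match; the name standalone is illustrative.

```python
import re

text = "The rain in Spain stays mainly in the plain."

# \b matches at a word boundary, so "in" inside "rain" or "Spain" is skipped.
standalone = [(m.group(), m.start(), m.end())
              for m in re.finditer(r"\bin\b", text)]

print(standalone)  # [('in', 9, 11), ('in', 31, 33)]
```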

5. Using Groups in Regular Expressions

Groups are used to capture specific portions of matched text with parentheses. Access groups with match.group().
# Pattern with two groups: first and last name
pattern = r"(\w+) (\w+)"
text = "John Doe"

# Match and extract groups
match = re.match(pattern, text)
if match:
    print("First name:", match.group(1))
    print("Last name:", match.group(2))

Output:

First name: John
Last name: Doe
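Explanation: Each (\w+) captures a run of word characters (letters, digits, and underscores). The first group captures the first name and the second captures the last name; match.group(0) would return the whole match, "John Doe".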
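Groups can also be given names with the (?P<name>...) syntax, which tends to read better than numeric indexes once a pattern has more than a couple of groups. A minimal sketch, with group names first and last chosen for illustration:

```python
import re

# Same pattern as above, but with named groups
pattern = r"(?P<first>\w+) (?P<last>\w+)"
match = re.match(pattern, "John Doe")

if match:
    first = match.group("first")   # access by name
    last = match["last"]           # subscript access (Python 3.6+)
    both = match.groups()          # all groups as a tuple
    as_dict = match.groupdict()    # named groups as a dictionary

print(first, last)   # John Doe
print(as_dict)       # {'first': 'John', 'last': 'Doe'}
```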

6. Pattern Substitution with re.sub()

Use re.sub() to replace occurrences of a pattern with a replacement string.
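# Reset text, which the previous section reassigned to "John Doe"
text = "The rain in Spain stays mainly in the plain."

# Replace all occurrences of "in" with "on"
replaced_text = re.sub(r"in", "on", text)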
print("Replaced text:", replaced_text)

Output:

Replaced text: The raon on Spaon stays maonly on the plaon.
Explanation: re.sub() replaces every instance of "in" with "on" in the text.
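re.sub() can do more than fixed-string replacement. The sketch below shows three variations: backreferences to captured groups in the replacement, word boundaries to avoid touching longer words, and the count argument to limit the number of replacements. re.subn() is also shown; it returns the new string along with the number of substitutions made.

```python
import re

# Backreferences: \1 and \2 refer to the first and second captured groups.
swapped = re.sub(r"(\w+) (\w+)", r"\2, \1", "John Doe")

text = "The rain in Spain stays mainly in the plain."

# Word boundaries restrict the replacement to the standalone word "in".
only_words = re.sub(r"\bin\b", "on", text)

# count limits how many matches are replaced, starting from the left.
first_only = re.sub(r"in", "on", text, count=1)

# re.subn() also reports how many replacements were made.
result, n = re.subn(r"in", "on", text)

print(swapped)      # Doe, John
print(only_words)   # The rain on Spain stays mainly on the plain.
print(first_only)   # The raon in Spain stays mainly in the plain.
print(n)            # 6
```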

7. Splitting Strings with re.split()

re.split() splits a string by a specified pattern, returning a list of substrings.
# Split on spaces, commas, and periods
pattern = r"[ ,.]"
text = "Split by spaces, commas, and periods."

# Perform the split
split_text = re.split(pattern, text)
print("Split result:", split_text)

Output:

Split result: ['Split', 'by', 'spaces', '', 'commas', '', 'and', 'periods', '']
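Explanation: re.split() splits the text at every space, comma, or period. The empty strings appear where two delimiters are adjacent (a comma followed by a space) and after the final period, because the character class matches one delimiter at a time.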
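The empty strings in the result above are often unwanted. One possible approach, sketched here, is to add a + quantifier so that a run of delimiters counts as a single separator, and then filter out the empty string left by the trailing period. The maxsplit argument is also shown for limiting the number of splits.

```python
import re

text = "Split by spaces, commas, and periods."

# + treats consecutive delimiters (such as ", ") as one separator.
parts = re.split(r"[ ,.]+", text)

# The trailing period still leaves one empty string at the end; drop it.
words = [p for p in parts if p]

# maxsplit stops after the given number of splits.
head_tail = re.split(r"[ ,.]+", text, maxsplit=1)

print(parts)      # ['Split', 'by', 'spaces', 'commas', 'and', 'periods', '']
print(words)      # ['Split', 'by', 'spaces', 'commas', 'and', 'periods']
print(head_tail)  # ['Split', 'by spaces, commas, and periods.']
```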

8. Using Metacharacters in Regular Expressions

Metacharacters have special meanings in a pattern: . matches any character except a newline, * repeats the preceding element zero or more times, ^ anchors the match at the start of the string, and $ anchors it at the end.
# Define a pattern with metacharacters
pattern = r"^T.*d$"
text = "This is a test method"

# Check if the pattern matches the entire string
match = re.match(pattern, text)
print("Pattern matches:", bool(match))

Output:

Pattern matches: True
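Explanation: ^ anchors the pattern at the start of the string, T matches the first character, .* matches any run of characters, and d$ requires the string to end with "d". Since re.match() already starts at the beginning of the string, the ^ is redundant here but makes the intent explicit.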
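A few more metacharacters come up constantly. The following sketch uses re.fullmatch() (a standard re function that requires the whole string to match) to contrast ?, *, and +, and to show why a literal period needs escaping. The sample strings are made up for illustration.

```python
import re

# ? makes the preceding element optional.
color_us = re.fullmatch(r"colou?r", "color")     # matches
color_uk = re.fullmatch(r"colou?r", "colour")    # matches

# * allows zero repetitions, + requires at least one.
star = re.fullmatch(r"ab*c", "ac")               # matches
plus = re.fullmatch(r"ab+c", "ac")               # None

# An unescaped . matches any character; \. matches only a literal period.
loose = re.fullmatch(r"3.14", "3x14")            # matches
strict = re.fullmatch(r"3\.14", "3x14")          # None
```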

9. Compiling Regular Expressions for Efficiency

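re.compile() turns a pattern into a reusable Pattern object. This avoids repeating the pattern and its flags at each call and can save some overhead when a pattern is used many times, although the re module also caches recently used patterns internally.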
# Compile a pattern and reuse it
pattern = re.compile(r"\b\w{4}\b")
text = "This sentence has some four-letter words."

# Find all four-letter words
matches = pattern.findall(text)
print("Four-letter words:", matches)

Output:

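Four-letter words: ['This', 'some', 'four']
Explanation: The compiled pattern \b\w{4}\b matches only words of exactly four word characters. "four" matches because the hyphen that follows it is a word boundary, while "words" has five letters and is skipped.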
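The benefit of compiling shows up when the same pattern is applied to many strings. Here is a minimal sketch that reuses one compiled pattern across a list of lines; the sample lines and the names four_letter and counts are illustrative.

```python
import re

# Compile once, then reuse the Pattern object's methods.
four_letter = re.compile(r"\b\w{4}\b")

lines = [
    "This sentence has some four-letter words.",
    "Read more text here.",
    "An odd one.",
]

# Count the four-letter words on each line with the same compiled pattern.
counts = [len(four_letter.findall(line)) for line in lines]

print(counts)  # [3, 4, 0]
```

A compiled pattern offers the same methods as the module-level functions (search, match, findall, finditer, sub, split), without the pattern argument.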

10. Flags in Regular Expressions

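Flags modify how a pattern is interpreted: re.IGNORECASE makes matching case-insensitive, re.MULTILINE lets ^ and $ match at the start and end of each line, and re.DOTALL lets . match newlines as well.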
# Use IGNORECASE flag
pattern = re.compile(r"python", re.IGNORECASE)
text = "Python is versatile. I love PYTHON."

# Find all case-insensitive matches
matches = pattern.findall(text)
print("Case-insensitive matches:", matches)

Output:

Case-insensitive matches: ['Python', 'PYTHON']
Explanation: The IGNORECASE flag allows the pattern to match "Python" in any case.
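The other two flags mentioned above only matter for text that contains newlines. The sketch below uses a made-up three-line string to show the effect of re.MULTILINE and re.DOTALL, and how flags can be combined with the | operator.

```python
import re

text = "first line\nsecond line\nthird line"

# Without MULTILINE, ^ matches only at the very start of the string.
default_starts = re.findall(r"^\w+", text)
# With MULTILINE, ^ also matches at the start of each line.
line_starts = re.findall(r"^\w+", text, re.MULTILINE)

# Without DOTALL, . does not match a newline, so .* cannot span lines.
no_span = re.search(r"first.*third", text)
# With DOTALL, . matches newlines too.
span = re.search(r"first.*third", text, re.DOTALL)

# Flags are combined with the bitwise OR operator.
combined = re.findall(r"^SECOND", text, re.IGNORECASE | re.MULTILINE)

print(default_starts)  # ['first']
print(line_starts)     # ['first', 'second', 'third']
print(combined)        # ['second']
```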

Summary

The re module offers extensive tools for text searching, extraction, and manipulation with regular expressions, making it indispensable for pattern-based string processing in Python.
