How to find overlapping matches with a regex in Python?


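To find overlapping matches with a regex in Python, you can use the re module, specifically the re.finditer() function. By default, re.findall() and re.finditer() return only non-overlapping matches: once a match is found, the search resumes after the end of that match, so any occurrence that shares characters with it is skipped. Wrapping the pattern in a positive lookahead assertion that contains a capturing group, (?=(...)), lets you find overlapping matches.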
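A quick side-by-side comparison makes the difference visible. This is a minimal sketch using only the standard re module; the sample text and pattern are the same ones used in the example further down.

```python
import re

text = "abababab"

# Default behavior: after each match, the search resumes where that match ended,
# so the occurrence starting at index 2 (which shares an 'a' with the first) is skipped.
non_overlapping = re.findall("aba", text)

# Lookahead with a capturing group: the overall match is zero-width, so the engine
# tries every start position; findall() returns the text captured by the group.
overlapping = re.findall("(?=(aba))", text)

print(non_overlapping)  # ['aba', 'aba']
print(overlapping)      # ['aba', 'aba', 'aba']
```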

Here’s an example:

import re

def find_overlapping_matches(pattern, text):
    matches = re.finditer(f'(?=({pattern}))', text)
    for match in matches:
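        # The lookahead match is zero-width (start == end), so take the
        # positions from group 1, which captures the actual matched text.
        start = match.start(1)
        end = match.end(1)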
        matched_text = match.group(1)
        print(f"Matched '{matched_text}' at positions {start}-{end-1}.")

# Example usage
text = "abababab"
pattern = "aba"
find_overlapping_matches(pattern, text)

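In this example, the find_overlapping_matches() function takes a regex pattern and a text string as input. It passes re.finditer() a modified pattern, (?=(pattern)): a positive lookahead that contains a capturing group. Because a lookahead consumes no characters, every match it produces is zero-width, so the engine moves on to the very next position instead of skipping past the matched text. That is what allows the matches to overlap. The capturing group records the text the lookahead saw, so the matched text and its position come from group 1 rather than from the overall match, which is always empty.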
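To see why group 1 matters, you can inspect the first match object directly. This sketch uses the standard Match.span() method, which accepts an optional group number.

```python
import re

text = "abababab"
first = next(re.finditer("(?=(aba))", text))

# The overall match is the lookahead itself, which consumes no characters,
# so its span is empty: start() == end().
print(first.span())    # (0, 0)

# Group 1 holds the captured text and its real position in the string
# (end index is exclusive, as with slicing).
print(first.span(1))   # (0, 3)
print(first.group(1))  # aba
```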

The function then iterates over the matches and prints the matched text and its positions.
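If you need the results for further processing rather than printed output, the same loop can return them instead. The function name overlapping_spans below is hypothetical, chosen for illustration; it is a sketch of the same lookahead technique that collects (start, end, text) tuples, with end exclusive to match Python's slicing convention.

```python
import re

def overlapping_spans(pattern, text):
    """Return (start, end, matched_text) for every overlapping match.

    Hypothetical helper: end is exclusive, so text[start:end] == matched_text.
    """
    return [
        (m.start(1), m.end(1), m.group(1))
        for m in re.finditer(f"(?=({pattern}))", text)
    ]

spans = overlapping_spans("aba", "abababab")
print(spans)  # [(0, 3, 'aba'), (2, 5, 'aba'), (4, 7, 'aba')]
```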

In the example usage, the text is “abababab” and the pattern is “aba”. The output will be:

Matched 'aba' at positions 0-2.
Matched 'aba' at positions 2-4.
Matched 'aba' at positions 4-6.

Each occurrence of the pattern “aba” is found, even though they overlap.
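Because the pattern is inserted into the lookahead as a regex, characters such as '.' or '*' keep their special meaning. If you want to search for a literal substring, one option is to escape it first with the standard re.escape() function. The helper name find_overlapping_literal is hypothetical; this is a sketch, not part of the original example.

```python
import re

def find_overlapping_literal(substring, text):
    # Hypothetical helper: re.escape() turns metacharacters such as '.' into
    # literal characters, so "a.a" matches only 'a', '.', 'a' in that order.
    pattern = f"(?=({re.escape(substring)}))"
    return [m.start(1) for m in re.finditer(pattern, text)]

print(find_overlapping_literal("a.a", "a.a.a"))  # [0, 2]
print(find_overlapping_literal("a.a", "abaca"))  # []

# Without escaping, '.' matches any character, so "abaca" does match.
unescaped = re.findall("(?=(a.a))", "abaca")
print(unescaped)  # ['aba', 'aca']
```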

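Note that with this technique the engine attempts a match starting at every position in the text, including positions inside earlier matches, so it does more work than an ordinary non-overlapping search and can return many more results. The difference is usually small, but for long texts or patterns that backtrack heavily it can become noticeable, so it is worth measuring on realistic input before relying on it in performance-sensitive code.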
