Python Regular Expressions

Python regular expressions (regex) are a crucial tool for programming and data processing that make it simple to identify, match, and work with text patterns. A Python developer should be proficient in using regular expressions for tasks like validating data, processing input, and retrieving specific details from strings.

In this post, we will delve into the world of regular expressions and discuss the numerous special characters and sequences that make Python regular expressions so powerful. We’ll also go through how to use these patterns to extract data, match specific text patterns, and perform complex text operations.

Example of Python Regular Expressions (RegEx)

Python regular expressions are simple to work with because of the wide variety of functions and methods that Python's built-in re module provides. Regardless of your degree of programming experience, mastering the re module will greatly increase your ability to process textual data.

Python regular expressions examples:

Example 1: Tracking down every instance of a pattern

Let's say we have a string called "text" with the following value:

text = "The cat in the hat sported a big red hat."

The re.findall() method can be used to locate every instance of the word "hat" inside the provided text.

import re

text = "The cat in the hat sported a big red hat."

pattern = r'hat'

matches = re.findall(pattern, text)

print(matches)

Output: 

['hat', 'hat']

The re.findall() function searches the text for every instance of the pattern and returns the results as a list. Since the word "hat" occurs twice in the text, it returns ['hat', 'hat'] in this example.
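One thing worth knowing is that a bare pattern such as r'hat' also matches inside longer words. The short sketch below uses an illustrative sentence (not from the example above) to contrast it with a word-boundary version:

```python
import re

# Illustrative sentence chosen so that "hat" also appears inside other words.
text = "That hat is the hat I chatted about."

# A bare pattern also matches "hat" inside "That" and "chatted".
loose_matches = re.findall(r"hat", text)

# \b marks a word boundary, so only the standalone word matches.
whole_word_matches = re.findall(r"\bhat\b", text)

print(len(loose_matches))    # 4
print(whole_word_matches)    # ['hat', 'hat']
```

Whether the stricter form is needed depends on the data; for short, known inputs the bare pattern is often enough.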

Example 2: Replacing a pattern

Let's say we have a string called "text" with the following value:

text = "I love cats! My favorite animal is the cat."

To change every instance of the word "cats" to "dogs" in the provided text, we can use the re.sub() method.

import re

text = "I love cats! My favorite animal is the cat."

pattern = r'cats'

replacement = 'dogs'

new_text = re.sub(pattern, replacement, text)

print(new_text)

Output: 

I love dogs! My favorite animal is the cat.

The re.sub() method looks for every instance of the given pattern in the text and substitutes the replacement string. In this instance, "cats" is replaced with "dogs", while the singular "cat" is left unchanged because the pattern r'cats' does not match it.
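To replace both the singular and the plural form in one pass, one option is to pass a function as the replacement. This is a minimal sketch; the helper name to_dog is made up for illustration:

```python
import re

text = "I love cats! My favorite animal is the cat."

def to_dog(match):
    # Hypothetical helper: keep the plural "s" if the matched word had one.
    return "dogs" if match.group().endswith("s") else "dog"

# "s?" makes the plural ending optional; \b limits the match to whole words.
new_text = re.sub(r"\bcats?\b", to_dog, text)

print(new_text)  # I love dogs! My favorite animal is the dog.
```

re.sub() calls the function once per match and inserts whatever string it returns.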

MetaCharacters

In Python regular expressions, MetaCharacters are characters with a special meaning. They define the search pattern and are used for operations such as matching, repetition, and grouping. In the Python 're' module, the following MetaCharacters are frequently used:

\ – Backslash

Regular expressions depend on the backslash (\), also referred to as the escape character. It either gives an ordinary character a special meaning (for example, \d matches a digit) or removes the special meaning of a metacharacter (for example, \$ matches a literal dollar sign). Here is an illustration of how to use the backslash in regular expressions:

import re

text = "The price is $10."

pattern = r'\$'

matches = re.findall(pattern, text)

print(matches)

Output: 

['$']
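The backslash plays both roles in the sketch below: it escapes the dollar sign and the period, and it introduces the digit sequence \d. The sample price is an assumption made for illustration:

```python
import re

text = "The price is $10.50."

# \$ and \. match a literal dollar sign and period; \d matches one digit.
pattern = r"\$\d+\.\d+"

matches = re.findall(pattern, text)

print(matches)  # ['$10.50']
```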

[] – Square Brackets

In regular expressions, square brackets [] define a character set or character range. A set matches any single one of the listed characters at a given position in the text.

Here is an illustration showing how to use square brackets in regular expressions:

import re

text = "The cat sat on the mat."

pattern = r'[cm]'

matches = re.findall(pattern, text)

print(matches)

Output: 

['c', 'm']
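Sets can also be combined with other characters, written as ranges, or negated with a leading ^. A brief sketch on the same sentence:

```python
import re

text = "The cat sat on the mat."

# A set followed by literal characters: c, m, or s, then "at".
rhymes = re.findall(r"[cms]at", text)

# A range: any uppercase ASCII letter.
capitals = re.findall(r"[A-Z]", text)

# A negated set: anything that is not a lowercase letter or a space.
others = re.findall(r"[^a-z ]", text)

print(rhymes)    # ['cat', 'sat', 'mat']
print(capitals)  # ['T']
print(others)    # ['T', '.']
```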

$ – Dollar

In regular expressions, the dollar sign ($) is a metacharacter that is used to match a string's end. It is frequently combined with additional characters or character sets to specify more precise patterns.

The following example shows how to use the dollar sign in regular expressions:

import re

text = "Hello, how are you today?"

pattern = r"today\?$"

matches = re.findall(pattern, text)

print(matches)

Output: 

['today?']
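By default, $ anchors at the end of the whole string. With the re.MULTILINE flag it also matches at the end of each line. The two-line sample string below is an assumption made for illustration:

```python
import re

text = "first line\nsecond line"

# Without flags, $ only matches at the very end of the string.
end_only = re.findall(r"\w+ line$", text)

# With re.MULTILINE, $ also matches just before each newline.
per_line = re.findall(r"\w+ line$", text, flags=re.MULTILINE)

print(end_only)  # ['second line']
print(per_line)  # ['first line', 'second line']
```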

. – Dot

In regular expressions, the dot (.) is a special character that matches any single character except a newline. In a pattern, it can stand in for any character.

Let's use the following bit of code as an illustration:

import re

text = "Hello World!"

pattern = r"l."

matches = re.findall(pattern, text)

print(matches)

Output:

['ll', 'ld']
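The newline exception can be shown with a small sketch. Passing the re.DOTALL flag lifts it, so the dot matches newlines as well:

```python
import re

text = "a\nb"

# By default the dot does not match the newline between "a" and "b".
default_matches = re.findall(r"a.b", text)

# With re.DOTALL the dot matches any character, including "\n".
dotall_matches = re.findall(r"a.b", text, flags=re.DOTALL)

print(default_matches)  # []
print(dotall_matches)   # ['a\nb']
```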

| – Or

In regular expressions, the pipe (|) acts as a logical OR operator. It separates alternative patterns and matches any one of them.

For example:

import re

text = "I adore both cats and dogs."

pattern = r"cats|dogs"

matches = re.findall(pattern, text)

print(matches)

Output:

['cats', 'dogs']
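The pipe splits the whole pattern unless it is wrapped in a group, which is easy to overlook. The sketch below contrasts an ungrouped alternation with a non-capturing group (?:...):

```python
import re

text = "I adore both cats and dogs."

# Ungrouped: the alternatives are "cat" and "dogs".
ungrouped = re.findall(r"cat|dogs", text)

# Grouped: the alternatives are "cat" and "dog", each followed by "s".
grouped = re.findall(r"(?:cat|dog)s", text)

print(ungrouped)  # ['cat', 'dogs']
print(grouped)    # ['cats', 'dogs']
```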

? – Question Mark

In regular expressions, the question mark (?) makes the preceding element optional. It states that the preceding element can appear 0 or 1 times.

For example:

import re

text = "color or colour"

pattern = r"colou?r"

matches = re.findall(pattern, text)

print(matches)

Output:

['color', 'colour']
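The same idea applies to other optional characters. As a small sketch, an optional "s" lets one pattern cover both http and https links; the URLs are placeholder values:

```python
import re

text = "Visit http://example.com or https://example.org for details."

# "s?" makes the "s" optional; \S+ consumes the rest of the link up to whitespace.
links = re.findall(r"https?://\S+", text)

print(links)  # ['http://example.com', 'https://example.org']
```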

Python Regex Module

Python's built-in re module lets you use regular expressions, which are effective tools for pattern matching and text manipulation. The re module makes it simple to look for particular patterns inside a string and carry out different operations based on those patterns.

To further understand how the regex module functions, let's look at an example:

import re

text = "Hello, my email address is frank@gmail.com"

# Use the findall() method to find a pattern.

pattern = r"\w+@\w+\.\w+"

matches = re.findall(pattern, text)

# Print each email address that appears in the text.

print(matches)

Output:

['frank@gmail.com']

In this instance, we import the 're' module and define a string called 'text' that includes an email address. Then, in order to match email addresses, we write a pattern using regular expression syntax. The pattern r"\w+@\w+\.\w+" matches one or more word characters before the @ sign, one or more word characters before a literal period, and one or more word characters after the period.
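Because \w does not match dots or hyphens, the simple pattern cuts off addresses that contain them. The sketch below compares it with a somewhat broader pattern; both the address and the broader pattern are illustrative assumptions, not a complete email validator:

```python
import re

text = "Contact first.last@example.co.uk today."

# The simple pattern stops at the dot in the name and at the second domain part.
simple = re.findall(r"\w+@\w+\.\w+", text)

# Assumed broader pattern: allows ., + and - in the name,
# and one or more dot-separated domain parts.
broader = re.findall(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+", text)

print(simple)   # ['last@example.co']
print(broader)  # ['first.last@example.co.uk']
```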

re.findall()

To discover every instance of a pattern within a string, use re.findall(), which returns the matches as a list. This method comes in handy when you need to extract several matches from a text.

Let's use an illustration to better understand:

import re

text = "Hello, my name is John Doe. Bella and Max are the names of my two kitties."

# To discover all instances of a name, use findall().

pattern = r"\b[A-Z][a-z]+\b"

names = re.findall(pattern, text)

# Print every name that appears in the text.

print(names)

Output: 

['Hello', 'John', 'Doe', 'Bella', 'Max']

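The 're' module is imported in the above example, and a string called 'text' that includes names is defined. The pattern matches any word that starts with an uppercase letter followed by one or more lowercase letters, so it also picks up "Hello" at the start of the sentence, even though it is not a name.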
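When the pattern contains capturing groups, re.findall() returns the groups rather than the full match. A short sketch with two groups, using made-up addresses:

```python
import re

text = "Write to frank@gmail.com or anna@example.org."

# Two capturing groups: the part before @ and the domain after it.
pairs = re.findall(r"(\w+)@(\w+\.\w+)", text)

print(pairs)  # [('frank', 'gmail.com'), ('anna', 'example.org')]
```

With a single group the result is a list of strings; with several groups it is a list of tuples, as shown here.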

re.compile()

A regular expression pattern can be precompiled into a regex object using re.compile(). This lets you reuse the same pattern repeatedly, which can help performance when the pattern is applied many times.

Here is an illustration showing how to utilize the 're.compile()' function:

import re

# Specify a regular expression pattern

pattern = r"\b[A-Z][a-z]+\b"

# Create the regex object by compiling the pattern.

regex_object = re.compile(pattern)

# Define the text to search.

text = "Hello, my name is John Doe. Bella and Max are the names of my two kitties."

# To discover all instances of names in the text, use the compiled regex object.

names = regex_object.findall(text)

# Print every name that appears in the text.

print(names)

Output: 

['Hello', 'John', 'Doe', 'Bella', 'Max']

The 're' module is first imported in the above example. Then, using the same pattern as before, we compile it with re.compile() and call findall() on the resulting regex object instead of on the 're' module.
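Compiling pays off mainly when one pattern is applied to many strings. A minimal sketch, with sample sentences made up for illustration:

```python
import re

# Compile once, then reuse the object for every sentence.
name_pattern = re.compile(r"\b[A-Z][a-z]+\b")

sentences = [
    "Bella chased Max.",
    "no names here",
    "John Doe called.",
]

names_per_sentence = [name_pattern.findall(sentence) for sentence in sentences]

print(names_per_sentence)  # [['Bella', 'Max'], [], ['John', 'Doe']]
```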

re.split()

The 're.split()' method in the 're' module divides a string into a list of substrings based on a given pattern. Every time it encounters a match for the pattern, it splits the string at that point.

This example shows how to utilize the 're.split()' function:

import re

# Specify a regular expression pattern

pattern = r"\s+"

# Split the string into substrings wherever the pattern matches.

text = "Hello, my name is Robert Johnson."

split_text = re.split(pattern, text)

# Display the resulting substrings.

print(split_text)

Output: 

['Hello,', 'my', 'name', 'is', 'Robert', 'Johnson.']

The 're' module is first imported in this example. We then define the pattern '\s+', which matches one or more whitespace characters such as spaces, tabs, and newlines.

The string 'text' is then divided into a list of substrings using the 're.split()' method. In this instance, the method splits the string every time it comes across one or more whitespace characters.
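Two variations are sketched below: limiting the number of splits with the maxsplit argument, and splitting on more than one delimiter with a character set. The fruit list is an illustrative input:

```python
import re

text = "Hello, my name is Robert Johnson."

# maxsplit=2 performs at most two splits; the rest stays in one piece.
first_words = re.split(r"\s+", text, maxsplit=2)

# A character set splits on either a comma or a semicolon,
# plus any whitespace that follows.
items = re.split(r"[,;]\s*", "apples, pears;plums,  figs")

print(first_words)  # ['Hello,', 'my', 'name is Robert Johnson.']
print(items)        # ['apples', 'pears', 'plums', 'figs']
```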

re.subn()

In the 're' module, the 're.subn()' method replaces every match of a pattern in a string with a given replacement string. The changed string and the total number of substitutions are returned as a tuple.

This shows how to use 're.subn()':

import re

# Specify a regular expression pattern

pattern = r"\d+"

# Establish the substitute string.

replacement = "X"

# Replace every run of digits in the string.

text = "I have 3 cats and 2 dogs."

modified_text, num_substitutions = re.subn(pattern, replacement, text)

# Display the altered string along with the quantity of replacements.

print(modified_text)

print(num_substitutions)

Output:

I have X cats and X dogs.

2

The '\d+' pattern is used in this example to match one or more digits. We also designate "X" as the substitute string.

Following that, we use the 're.subn()' method to replace each match of the pattern in the string 'text' with the replacement string. It returns the modified string together with the number of replacements made, which is 2 here.
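The returned count is handy for checking whether anything changed. The sketch below also uses the optional count argument to cap the number of replacements; the second sentence is a made-up input with no digits:

```python
import re

text = "I have 3 cats and 2 dogs."

# count=1 stops after the first replacement.
first_only, first_count = re.subn(r"\d+", "X", text, count=1)

# With no match, the string comes back unchanged and the count is 0.
unchanged, zero_count = re.subn(r"\d+", "X", "I have no pets.")

print(first_only, first_count)  # I have X cats and 2 dogs. 1
print(unchanged, zero_count)    # I have no pets. 0
```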

re.escape()

Special characters in a string can be treated as literal characters by using the 're.escape()' method in the 're' module. This is helpful when a regular expression has to match text that contains special characters like parentheses or asterisks.

This example shows how to utilize the 're.escape()' function:

import re

# Create a string with special characters in it.

string = "(hello)*world"

# Escape the special characters in the string.

escaped_string = re.escape(string)

# Display the escaped string.

print(escaped_string)

Output:

\(hello\)\*world

In the above example, the string contains special characters: parentheses and an asterisk. These characters are escaped using the 're.escape()' method, producing the string \(hello\)\*world. The escaped string can now be used in a regular expression to match the original text literally.
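The difference shows up as soon as the string is used as a pattern. In this sketch, the surrounding sentence is an assumption made for illustration:

```python
import re

literal = "(hello)*world"
text = "Please say (hello)*world now."

# Unescaped, the parentheses form a group and * is a quantifier,
# so only the plain word "world" is matched.
unescaped_match = re.search(literal, text).group(0)

# Escaped, every character is matched literally.
escaped_match = re.search(re.escape(literal), text).group(0)

print(unescaped_match)  # world
print(escaped_match)    # (hello)*world
```

This is mostly useful when the text to search for comes from user input or another source you do not control.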

Conclusion

To sum up, regular expressions can be used to extract specific information from vast volumes of data. For example, a well-written regular expression can quickly pull out phone numbers in the format used by a particular country. This is really useful when you are working with material that contains phone numbers from various nations but only need to focus on one. Regular expressions offer a dependable and efficient solution for this kind of task, saving you the time and effort required to manually look for and put together the required data.

FAQs

1. Can regular expressions be used to obtain phone numbers from any country?

Yes, regular expressions can be used to extract phone numbers from any country. However, each country formats its phone numbers differently, so a different regular expression is usually needed for each format.
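As a rough sketch of a format-specific pattern, the example below assumes one common North American layout: three digits, three digits, then four digits, with optional parentheses and separators. The pattern and the sample numbers are illustrative assumptions, not a complete validator:

```python
import re

text = "Call 555-123-4567 or (555) 987-6543 for support."

# Hypothetical pattern: optional parentheses around the first group,
# then an optional "-", "." or whitespace between the digit groups.
phone_pattern = re.compile(r"\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}")

phone_numbers = phone_pattern.findall(text)

print(phone_numbers)  # ['555-123-4567', '(555) 987-6543']
```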

2. Can data be obtained using regular expressions without any limitations?

Regular expressions do have certain limitations, despite being a fantastic tool for data extraction. They tend to fall short when the data is irregular, inconsistent, or follows convoluted patterns.

3. How can I verify the accuracy of the regular expressions I've written? 

You can test and validate your regular expressions with several tools. Interactive regex testers often show right away whether the regex matches the required patterns. You can also write test cases with known inputs and expected outputs to ensure your regular expression captures the necessary data.
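A minimal sketch of the known-inputs approach is shown below, reusing the colou?r pattern from earlier. The test cases are made up for illustration:

```python
import re

pattern = re.compile(r"colou?r")

# Each case pairs an input with whether the whole string should match.
cases = [
    ("color", True),
    ("colour", True),
    ("colouur", False),
    ("colr", False),
]

# Collect every case where the actual result differs from the expected one.
failures = [
    (text, expected)
    for text, expected in cases
    if bool(pattern.fullmatch(text)) != expected
]

print(failures)  # []
```

An empty list means every case behaved as expected; any entry in it points to an input the pattern handles incorrectly.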
