Understanding Python Regular Expressions with Examples

Updated on 20/05/2025 | 5,739 Views

Ever felt like searching for a word in a paragraph was tougher than finding your slippers during a power cut? That’s where Python Regular Expressions come in. Regular expressions, also called RegEx, are tools to match patterns in text. You can think of them as CTRL+F on steroids!

They don't just search; they recognize complex patterns and help extract, replace, or validate content. In India, whether you're filtering Aadhaar numbers, checking mobile numbers, or validating email formats for job applications, regex saves time and effort. Let's walk through the mystical yet practical world of regex in Python.


What are Python Regular Expressions (RegEx)?

A regular expression is a sequence of characters that defines a search pattern. In Python, you use these patterns with the re module to check whether a string matches, find substrings, or replace them. They come in handy for validating inputs, scraping data, and many day-to-day automation tasks.


RegEx Module

Python includes a built-in module named re for working with regular expressions. Here's how to import it:

import re

Explanation: Before using any regular expression, import the re module. It provides handy functions such as match(), search(), findall(), and sub(). Without the import, a call like re.search() fails with a NameError, because Python doesn't know what re refers to.

Basic Regex Functions in Python

Here are some of the common Python RegEx functions:

| Function | Description |
|---|---|
| re.match() | Matches a pattern at the beginning of the string |
| re.search() | Searches the entire string for a pattern |
| re.findall() | Returns a list of all matches |
| re.sub() | Replaces the pattern with a new string |

Let’s break these down with examples:

re.match()

import re
result = re.match("Ram", "Ram went to school")
print(result)

Output:

<re.Match object; span=(0, 3), match='Ram'>

Explanation: The match() function checks whether the string starts with "Ram". Since it does, we get a match object. If the string started with anything else, match() would return None. It's like checking whether your train ticket has today's date: if not, you're not boarding. This makes match() useful for checking headers, titles, or other fixed starting points in text.
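To see the None case, here is a minimal sketch that calls match() on a word that appears later in the same sentence. The variable names are illustrative. Checking for None before calling .group() avoids an AttributeError when nothing matches.

```python
import re

sentence = "Ram went to school"

# match() only looks at the very beginning of the string
starts_with_ram = re.match("Ram", sentence)
starts_with_school = re.match("school", sentence)  # "school" is not at index 0

print(starts_with_ram.group())   # Ram
print(starts_with_school)        # None

# Always check for None before calling .group() on the result
if starts_with_school is None:
    print("No match at the start of the string")
```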

re.search()

import re
result = re.search("Delhi", "Mumbai to Delhi via train")
print(result)

Output:

<re.Match object; span=(10, 15), match='Delhi'>

Explanation: search() scans the entire string and returns the first match, wherever it occurs, or None if the pattern isn't found. Think of it like your mom searching for her specs all over the house: they're found eventually! Ideal for checking whether a keyword or phrase exists in large chunks of data.
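A small sketch, using an illustrative route string, shows that search() stops at the first occurrence and returns None when the pattern is absent. To get the position of every occurrence, it uses re.finditer(), which yields one match object per hit.

```python
import re

route = "Delhi to Agra and back to Delhi"

first = re.search("Delhi", route)
print(first.span())                # (0, 5): only the FIRST occurrence is returned
print(first.start(), first.end())  # 0 5

missing = re.search("Chennai", route)
print(missing)                     # None when the pattern is absent

# A simple keyword check
has_agra = re.search("Agra", route) is not None
print(has_agra)                    # True

# finditer() walks over every occurrence instead of just the first
positions = [m.start() for m in re.finditer("Delhi", route)]
print(positions)                   # [0, 26]
```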

re.findall()

import re
text = "My PIN codes are 110001 and 560034."
result = re.findall(r"\d{6}", text)
print(result)

Output:

['110001', '560034']

Explanation: Here, \d{6} looks for exactly six consecutive digits, which fits Indian PIN codes. findall() returns all matches as a list of strings, or an empty list if nothing matches. With the digit count adjusted, the same approach works for scanning a document for phone numbers or OTPs. It acts like a smart assistant collecting all relevant data points in one go.
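One catch: \d{6} has no boundaries, so it will also pull six digits out of a longer number. A minimal sketch, assuming you only want standalone six-digit runs, adds \b (word boundary) on both sides:

```python
import re

text = "PIN 110001, phone 9876543210"

# Without boundaries, \d{6} also grabs the first six digits of the phone number
loose = re.findall(r"\d{6}", text)
print(loose)    # ['110001', '987654']

# \b marks a word boundary, so the six digits must stand on their own
strict = re.findall(r"\b\d{6}\b", text)
print(strict)   # ['110001']
```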

re.sub()

import re
text = "This is bad. That is bad too."
updated_text = re.sub("bad", "awesome", text)
print(updated_text)

Output:

This is awesome. That is awesome too.

Explanation: The sub() function replaces every occurrence of "bad" with "awesome". Great for editing out those 'unwanted' terms from user comments or tweets. You can also use it to mask sensitive info, like replacing email IDs with asterisks for privacy.
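Here is a sketch of two sub() variations. The count argument limits how many replacements happen. The masking pattern uses a lookahead, (?=\d{4}), which is a concept not covered in this article: it checks that four digits follow without consuming them, so every digit except the last four is replaced. The phone number is a dummy value.

```python
import re

text = "This is bad. That is bad too."

# count=1 limits sub() to the first replacement
first_only = re.sub("bad", "awesome", text, count=1)
print(first_only)   # This is awesome. That is bad too.

# Mask every digit that still has at least four digits after it,
# leaving only the last four visible. (?=\d{4}) is a lookahead:
# it checks what follows without consuming it.
masked = re.sub(r"\d(?=\d{4})", "*", "Call 9876543210")
print(masked)       # Call ******3210
```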


What are the Commonly Used Regex Patterns?

Here are some of the commonly used regex patterns:

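| Pattern | Meaning | Example Match |
|---|---|---|
| \d | Any digit (0-9) | 5, 9 |
| \D | Non-digit character | a, # |
| \w | Word character (a-z, A-Z, 0-9, _) | A, 7, _ |
| \W | Non-word character | !, @ |
| \s | Whitespace character | space, tab |
| \S | Non-whitespace character | a, 9, % |
| . | Any character except newline | a, B, %, 1 |
| ^ | Start of the string | ^Hello matches "Hello world" |
| $ | End of the string | world$ matches "Hello world" |

Note: for string patterns, Python's \d and \w also match Unicode digits and letters (for example, the Devanagari digit ५) unless you pass the re.ASCII flag.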
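A quick sketch that exercises several rows of the table on illustrative strings; the variable names are just for demonstration.

```python
import re

numbers = re.findall(r"\d+", "Order #42 shipped on 12 May")     # runs of digits
words = re.findall(r"\w+", "Hi, I'm Ravi_K!")                    # runs of word characters
symbols = re.findall(r"\W", "a-b c!")                            # non-word characters
tidy = re.sub(r"\s+", " ", "too   many \t spaces")              # collapse whitespace
dots = re.findall(r"c.t", "cat cot c\nt")                        # . does not match \n
starts = re.search(r"^Hello", "Hello world") is not None         # ^ anchors to the start
ends = re.search(r"world$", "Hello world") is not None           # $ anchors to the end
wrong_start = re.search(r"^world", "Hello world") is not None    # "world" is not at the start

print(numbers)                    # ['42', '12']
print(words)                      # ['Hi', 'I', 'm', 'Ravi_K']
print(symbols)                    # ['-', ' ', '!']
print(tidy)                       # too many spaces
print(dots)                       # ['cat', 'cot']
print(starts, ends, wrong_start)  # True True False
```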

Example – Indian Mobile Number Validation

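import re
text = "Contact: 9876543210"
# [6-9] = first digit is 6-9; \d{9} = nine more digits (10 in total)
match = re.search(r"[6-9]\d{9}", text)
if match:  # search() returns None when there is no match
    print(match.group())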

Output:

9876543210

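Explanation: The pattern requires the first digit to be 6-9, followed by nine more digits, for 10 digits in total. Because search() finds the pattern anywhere in the text, it's great for pulling a number out of a sentence, but it doesn't reject extra characters around the number. For signup or OTP forms, where the whole field must be a mobile number, use re.fullmatch() so the entire input has to match.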
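A minimal validation sketch contrasting search() with re.fullmatch(). The helper name is_valid_mobile is hypothetical, and the sample numbers are dummy values; the sketch does not handle a +91 or 0 prefix.

```python
import re

def is_valid_mobile(number: str) -> bool:
    # fullmatch() succeeds only when the ENTIRE string fits the pattern
    return re.fullmatch(r"[6-9]\d{9}", number) is not None

# search() is happy to find ten digits inside a longer string...
loose = re.search(r"[6-9]\d{9}", "919876543210").group()
print(loose)                            # 9198765432

# ...while fullmatch() rejects anything that is not exactly the pattern
print(is_valid_mobile("9876543210"))    # True
print(is_valid_mobile("5876543210"))    # False: first digit is 5
print(is_valid_mobile("98765432101"))   # False: 11 digits
print(is_valid_mobile("919876543210"))  # False: the 91 prefix is not handled
```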

What is Grouping & Capturing?

Grouping in Python Regular Expressions allows you to isolate and extract specific parts of a match using parentheses (). It’s like filling multiple tiffin boxes from a big pot of biryani - you don't just take the whole pot, you take only the portions you want, neatly separated.

Each set of parentheses in a regex pattern defines a capture group, and when the regex matches a string, it stores each group’s result separately. This is particularly helpful when you want to extract structured data - like separating an STD code from a phone number, or a date into day, month, and year.

You can read a single group with group(1), group(2), and so on (group() or group(0) returns the whole match), or get all captured groups at once as a tuple with groups().

Example – Extract STD Code and Number

import re
text = "STD Code: 080, Number: 23456789"
match = re.search(r"(\d{3}), Number: (\d{8})", text)
print(match.groups())

Output:

('080', '23456789')

Explanation: Here, groups() returns a tuple of matched groups. Handy when you need to separate the area code from the number—like BSNL used to do! It gives you organized access to data, like slicing laddoos into neat pieces.
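Two related features not shown above may help here: named groups, written (?P<name>...), which let you fetch a part by name, and findall(), which returns one tuple per match when the pattern contains groups. A short sketch with illustrative data:

```python
import re

text = "STD Code: 080, Number: 23456789"

# (?P<name>...) gives a capture group a name as well as a number
match = re.search(r"(?P<std>\d{3}), Number: (?P<number>\d{8})", text)
if match:
    print(match.group("std"))    # 080
    print(match.group(2))        # 23456789: numbered access still works
    print(match.groupdict())     # {'std': '080', 'number': '23456789'}

# With groups in the pattern, findall() returns one tuple per match
dates = re.findall(r"(\d{2})-(\d{2})-(\d{4})", "Exams on 12-03-2025 and 05-04-2025")
print(dates)    # [('12', '03', '2025'), ('05', '04', '2025')]
```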

Regex Flags in Python

Regex flags modify how patterns behave. They’re like extra filters added to your sunglasses - changing how you see the string.

| Flag | Description |
|---|---|
| re.IGNORECASE | Ignores case differences (A == a) |
| re.MULTILINE | ^ and $ match at the start/end of each line |
| re.DOTALL | Makes . match newline characters as well |

Example – Case Insensitive Search

import re
text = "Welcome to Delhi"
match = re.search("delhi", text, re.IGNORECASE)
print(match.group())

Output:

Delhi

Explanation: Without the IGNORECASE flag, "delhi" wouldn’t match "Delhi". This flag is useful when dealing with user input in different capitalizations, like names or cities.
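The other two flags from the table are easiest to understand on multiline text. This sketch uses a made-up log and HTML snippet; it also shows that flags can be combined with the | operator.

```python
import re

log = "ERROR disk full\nINFO all good\nERROR fan failure"

# Without MULTILINE, ^ matches only at the very start of the string
default_hits = re.findall(r"^ERROR.*", log)
# With MULTILINE, ^ also matches right after every newline
multiline_hits = re.findall(r"^ERROR.*", log, re.MULTILINE)

html = "<p>Line one\nLine two</p>"
# By default . stops at the newline, so this finds nothing
no_dotall = re.search(r"<p>(.*)</p>", html)
# With DOTALL, . also matches \n and the whole paragraph is captured
with_dotall = re.search(r"<p>(.*)</p>", html, re.DOTALL)

# Flags combine with |
both = re.findall(r"^error", log, re.IGNORECASE | re.MULTILINE)

print(default_hits)          # ['ERROR disk full']
print(multiline_hits)        # ['ERROR disk full', 'ERROR fan failure']
print(no_dotall)             # None
print(with_dotall.group(1))  # Line one / Line two (printed on two lines)
print(both)                  # ['ERROR', 'ERROR']
```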

Regex Cheat Sheet

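| What You Want to Match | Regex Pattern |
|---|---|
| Indian Mobile Number | [6-9]\d{9} |
| PAN Card | [A-Z]{5}[0-9]{4}[A-Z] |
| PIN Code | \d{6} (stricter: [1-9]\d{5}, since PIN codes never start with 0) |
| Email (simplified) | \w+@\w+\.\w{2,3} |
| Vehicle Number (MH12 XY 1234) | [A-Z]{2}\d{2} [A-Z]{2} \d{4} |
| IFSC Code | [A-Z]{4}0[A-Z0-9]{6} |

These are simplified format checks. The email pattern, for instance, does not handle addresses such as first.last@example.co.in, and some vehicle numbers use a single series letter. To validate a whole input field, use these patterns with re.fullmatch().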
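As a starting point for a form validator, here is a minimal sketch that keeps the cheat-sheet patterns in a dictionary and checks whole values with re.fullmatch(). The names PATTERNS and validate are hypothetical, and the sample values are made up to fit the formats, not real documents.

```python
import re

# Simplified formats from the cheat sheet; real-world rules may be stricter
PATTERNS = {
    "mobile": r"[6-9]\d{9}",
    "pan": r"[A-Z]{5}[0-9]{4}[A-Z]",
    "pin": r"\d{6}",
    "ifsc": r"[A-Z]{4}0[A-Z0-9]{6}",
    "vehicle": r"[A-Z]{2}\d{2} [A-Z]{2} \d{4}",
}

def validate(kind: str, value: str) -> bool:
    """Return True if the whole value matches the pattern for `kind`."""
    # fullmatch() rejects extra characters before or after the expected format
    return re.fullmatch(PATTERNS[kind], value.strip()) is not None

print(validate("pan", "ABCDE1234F"))         # True
print(validate("ifsc", "ABCD0EF1234"))       # True
print(validate("vehicle", "MH12 XY 1234"))   # True
print(validate("pin", "11000"))              # False: only five digits
print(validate("mobile", "98765 43210"))     # False: contains a space
```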

Common Regex Mistakes to Avoid

Here are some of the common mistakes to avoid:

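  1. Forgetting to use raw strings (r""): In a normal string, Python processes escapes such as \b (backspace) before the regex engine sees the pattern, which creates subtle bugs. Always prefix regex patterns with r.
  2. Overusing the . wildcard: It matches almost any character, so a pattern like .* can swallow far more text than you intended. Use it with caution.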
  3. Ignoring greedy vs. non-greedy matching: By default, regex quantifiers like * and + are greedy—they match as much text as possible. This can lead to unexpected results, such as capturing too much text. Use *? or +? for non-greedy (lazy) matching to capture the smallest possible portion.
  4. Skipping escape characters: Want to match . literally? Use \. not just .
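The mistakes above are easy to reproduce. This sketch uses made-up strings and numbers its comments after the list items it illustrates.

```python
import re

# 1. Raw strings: in a normal string "\b" is a backspace character (\x08),
#    so the regex engine never sees a word boundary
no_raw = re.findall("\bcat\b", "cat catalog cat")
with_raw = re.findall(r"\bcat\b", "cat catalog cat")
print(no_raw, with_raw)     # [] ['cat', 'cat']

# 3. Greedy vs non-greedy: .* grabs as much as possible, .*? as little
html = "<b>Hi</b> and <b>Bye</b>"
greedy = re.findall(r"<b>.*</b>", html)
lazy = re.findall(r"<b>.*?</b>", html)
print(greedy)               # ['<b>Hi</b> and <b>Bye</b>']
print(lazy)                 # ['<b>Hi</b>', '<b>Bye</b>']

# 4. Escaping: an unescaped . matches any character, \. only a literal dot
unescaped = re.findall(r"3.14", "3.14 and 3514")
escaped = re.findall(r"3\.14", "3.14 and 3514")
print(unescaped, escaped)   # ['3.14', '3514'] ['3.14']
```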

Note:

  1. Use raw strings with r"" to avoid escaping backslashes (\). Much cleaner!
  2. Avoid over-engineering. Don't use regex when simple string methods can do the job.
  3. Regex in Python is case-sensitive by default. Use re.IGNORECASE if needed.
  4. Need to validate PAN numbers? Regex to the rescue!

import re
print(re.search(r"[A-Z]{5}[0-9]{4}[A-Z]", "ABCDE1234F"))

Explanation: This pattern matches the Indian PAN format: 5 uppercase letters, 4 digits, and 1 letter. Your code's version of KYC! It's essential for financial applications or backend KYC workflows.

  5. Use online tools smartly. Sites like regex101.com let you see exactly what each part of your regex is doing. Debug smarter, not harder.
  6. Practice on small problems, like extracting dates from newspaper lines or Aadhaar numbers from PDF content.
  7. Learn from mistakes. Save your old regex fails in a notepad. Next time you'll laugh and learn at the same time!
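On the "avoid over-engineering" tip: when you are looking for fixed text, plain string methods do the job and are easier to read. A small comparison sketch with illustrative strings:

```python
import re

text = "Mumbai to Delhi via train"

# Fixed text? Plain string methods are simpler to read
contains = "Delhi" in text                   # True
starts = text.startswith("Mumbai")           # True
swapped = text.replace("train", "flight")    # 'Mumbai to Delhi via flight'

# Regex earns its place once the text follows a pattern, not a fixed word
pins = re.findall(r"\b\d{6}\b", "Ship to 110001 or 560034")

print(contains, starts)   # True True
print(swapped)            # Mumbai to Delhi via flight
print(pins)               # ['110001', '560034']
```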

Real-World Use Cases

Here are some real-world use cases for Python regular expressions:

  • Validating input fields in college admission forms.
  • Filtering data while scraping job listings from Naukri or LinkedIn.
  • Masking personal data like Aadhaar or phone numbers before saving logs.
  • Extracting hashtags from Instagram captions for your side hustle.
  • Validating IFSC codes, PIN codes, and GSTINs in fintech apps.
  • Matching Indian vehicle numbers or license plate formats for traffic violation systems.
  • Validating GSTIN format in e-commerce invoices.
  • Cleaning up HTML tags or unwanted characters from scraped web data.
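A few of these use cases fit in one-line helpers. In this sketch the function names are hypothetical, the Aadhaar number is a dummy value, and strip_tags is only a quick cleanup for simple markup, not a replacement for an HTML parser.

```python
import re

def extract_hashtags(caption: str) -> list:
    # #(\w+) captures the word after each '#'; findall returns just the group
    return re.findall(r"#(\w+)", caption)

def mask_aadhaar(text: str) -> str:
    # 12 digits, optionally written in groups of four; keep only the last group
    return re.sub(r"\b\d{4}\s?\d{4}\s?(\d{4})\b", r"XXXX XXXX \1", text)

def strip_tags(html: str) -> str:
    # Remove anything that looks like <...>
    return re.sub(r"<[^>]+>", "", html)

print(extract_hashtags("Sunset at Marine Drive #mumbai #travel_diaries #2025"))
# ['mumbai', 'travel_diaries', '2025']
print(mask_aadhaar("Aadhaar: 1234 5678 9012"))   # Aadhaar: XXXX XXXX 9012
print(strip_tags("<p>Job: <b>Data Analyst</b></p>"))  # Job: Data Analyst
```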

Conclusion

Python Regular Expressions are incredibly powerful tools for text processing. With the re module, you can search, match, and manipulate strings with precision and speed. From checking the format of a PIN code to scraping phone numbers from a messy webpage, regex proves its worth in countless real-world applications.

Sure, regex might seem intimidating at first—kind of like deciphering your grandmother’s secret masala blend—but once you understand the patterns, you’re on your way to automating everything from form validations to data cleanup. It’s like giving your code a sixth sense for spotting order in chaos.

Start small, test often, and soon you’ll be writing regex like a jugaadu pro! Next time you’re tangled in a text problem, think regex—because even the messiest data has a pattern. And hey, for practice, try building a form validator for a school admission form. Validate name, mobile number, email, and PIN code.

FAQs

1. What is the use of regular expressions in Python?

Python Regular Expressions are used for pattern matching in strings. They help validate input data like emails, PIN codes, or phone numbers, extract specific patterns, and clean or modify large datasets efficiently.

2. Which Python module supports regular expressions?

The re module in Python supports all regular expression operations. You must import it before using functions like match(), search(), or findall() to perform regex-related tasks in your programs.

3. What is the difference between match() and search()?

match() checks for a pattern only at the beginning of a string, while search() scans the entire string for a match. If the pattern appears later, match() will return None, but search() can still succeed.

4. Can regex be used to validate Indian PAN numbers?

Yes. The pattern [A-Z]{5}[0-9]{4}[A-Z] checks the PAN format: five letters, four digits, and one letter, as seen on income tax forms in India. Use it with re.fullmatch() so extra characters are rejected, and remember that regex checks only the format, not whether a PAN was actually issued.

5. How do I extract multiple values using regex in Python?

Use re.findall() to extract all matching values from a string. It returns a list of results. This is helpful when scanning for multiple phone numbers, PIN codes, or dates in a single document.

6. Is Python regex case-sensitive by default?

Yes, regex in Python is case-sensitive unless you use the re.IGNORECASE flag. This allows you to match uppercase and lowercase versions of text like names, cities, or states, regardless of user input format.

7. How can I replace text using regex in Python?

You can use re.sub() to find and replace patterns within a string. It’s useful for cleaning up dirty data, replacing slang in text, or masking sensitive information like Aadhaar or email addresses.

8. What does the \d pattern mean in regex?

The \d pattern matches any single digit from 0 to 9. It's used to locate numeric values like mobile numbers, OTPs, or invoice IDs. To match exactly six digits, such as a PIN code, use \d{6}.

9. Can regex validate Indian mobile numbers?

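Yes, [6-9]\d{9} matches 10-digit Indian mobile numbers starting with 6 to 9. Use it with re.fullmatch() (or anchor it with ^ and $) so the whole input must fit; re.search() would also accept ten digits buried inside a longer string. Note that it checks the format only, so it cannot tell a mobile number from any other 10-digit number starting with 6 to 9.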

10. What are groups in Python regular expressions?

Groups are defined using parentheses () and are used to extract sub-patterns from a match. They're helpful when splitting a phone number into STD code and number, or separating date, month, and year fields.

11. Why should I use raw strings in regex patterns?

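Raw strings (r"") stop Python from interpreting backslashes before the regex engine sees the pattern. Without them, "\b" becomes a backspace character instead of a word boundary, and unrecognized escapes such as "\d" trigger a warning in recent Python versions (a SyntaxWarning since Python 3.12), leading to bugs in pattern matching.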

12. How do I make the dot (.) match newlines in regex?

Use the re.DOTALL flag to make the dot . match newline characters as well. This is useful when you want to match entire paragraphs or multiline texts without missing line breaks.

13. What’s the role of anchors like ^ and $ in regex?

The ^ anchor asserts the start of a string, and $ asserts the end; with re.MULTILINE, they also match at the start and end of each line. They're used to check whether a string starts or ends with specific words, which is ideal for validating headers or footer content in documents.

14. Can regex be used for web scraping in Python?

Yes, regex can extract specific patterns like product IDs, prices, or dates from raw HTML. But HTML is nested markup, so it's more reliable to parse the page with a library like BeautifulSoup or Scrapy and then apply regex to the extracted text.
