Explore Courses
Liverpool Business SchoolLiverpool Business SchoolMBA by Liverpool Business School
  • 18 Months
Bestseller
Golden Gate UniversityGolden Gate UniversityMBA (Master of Business Administration)
  • 15 Months
Popular
O.P.Jindal Global UniversityO.P.Jindal Global UniversityMaster of Business Administration (MBA)
  • 12 Months
New
Birla Institute of Management Technology Birla Institute of Management Technology Post Graduate Diploma in Management (BIMTECH)
  • 24 Months
Liverpool John Moores UniversityLiverpool John Moores UniversityMS in Data Science
  • 18 Months
Popular
IIIT BangaloreIIIT BangalorePost Graduate Programme in Data Science & AI (Executive)
  • 12 Months
Bestseller
Golden Gate UniversityGolden Gate UniversityDBA in Emerging Technologies with concentration in Generative AI
  • 3 Years
upGradupGradData Science Bootcamp with AI
  • 6 Months
New
University of MarylandIIIT BangalorePost Graduate Certificate in Data Science & AI (Executive)
  • 8-8.5 Months
upGradupGradData Science Bootcamp with AI
  • 6 months
Popular
upGrad KnowledgeHutupGrad KnowledgeHutData Engineer Bootcamp
  • Self-Paced
upGradupGradCertificate Course in Business Analytics & Consulting in association with PwC India
  • 06 Months
OP Jindal Global UniversityOP Jindal Global UniversityMaster of Design in User Experience Design
  • 12 Months
Popular
WoolfWoolfMaster of Science in Computer Science
  • 18 Months
New
Jindal Global UniversityJindal Global UniversityMaster of Design in User Experience
  • 12 Months
New
Rushford, GenevaRushford Business SchoolDBA Doctorate in Technology (Computer Science)
  • 36 Months
IIIT BangaloreIIIT BangaloreCloud Computing and DevOps Program (Executive)
  • 8 Months
New
upGrad KnowledgeHutupGrad KnowledgeHutAWS Solutions Architect Certification
  • 32 Hours
upGradupGradFull Stack Software Development Bootcamp
  • 6 Months
Popular
upGradupGradUI/UX Bootcamp
  • 3 Months
upGradupGradCloud Computing Bootcamp
  • 7.5 Months
Golden Gate University Golden Gate University Doctor of Business Administration in Digital Leadership
  • 36 Months
New
Jindal Global UniversityJindal Global UniversityMaster of Design in User Experience
  • 12 Months
New
Golden Gate University Golden Gate University Doctor of Business Administration (DBA)
  • 36 Months
Bestseller
Ecole Supérieure de Gestion et Commerce International ParisEcole Supérieure de Gestion et Commerce International ParisDoctorate of Business Administration (DBA)
  • 36 Months
Rushford, GenevaRushford Business SchoolDoctorate of Business Administration (DBA)
  • 36 Months
KnowledgeHut upGradKnowledgeHut upGradSAFe® 6.0 Certified ScrumMaster (SSM) Training
  • Self-Paced
KnowledgeHut upGradKnowledgeHut upGradPMP® certification
  • Self-Paced
IIM KozhikodeIIM KozhikodeProfessional Certification in HR Management and Analytics
  • 6 Months
Bestseller
Duke CEDuke CEPost Graduate Certificate in Product Management
  • 4-8 Months
Bestseller
upGrad KnowledgeHutupGrad KnowledgeHutLeading SAFe® 6.0 Certification
  • 16 Hours
Popular
upGrad KnowledgeHutupGrad KnowledgeHutCertified ScrumMaster®(CSM) Training
  • 16 Hours
Bestseller
PwCupGrad CampusCertification Program in Financial Modelling & Analysis in association with PwC India
  • 4 Months
upGrad KnowledgeHutupGrad KnowledgeHutSAFe® 6.0 POPM Certification
  • 16 Hours
O.P.Jindal Global UniversityO.P.Jindal Global UniversityMaster of Science in Artificial Intelligence and Data Science
  • 12 Months
Bestseller
Liverpool John Moores University Liverpool John Moores University MS in Machine Learning & AI
  • 18 Months
Popular
Golden Gate UniversityGolden Gate UniversityDBA in Emerging Technologies with concentration in Generative AI
  • 3 Years
IIIT BangaloreIIIT BangaloreExecutive Post Graduate Programme in Machine Learning & AI
  • 13 Months
Bestseller
IIITBIIITBExecutive Program in Generative AI for Leaders
  • 4 Months
upGradupGradAdvanced Certificate Program in GenerativeAI
  • 4 Months
New
IIIT BangaloreIIIT BangalorePost Graduate Certificate in Machine Learning & Deep Learning (Executive)
  • 8 Months
Bestseller
Jindal Global UniversityJindal Global UniversityMaster of Design in User Experience
  • 12 Months
New
Liverpool Business SchoolLiverpool Business SchoolMBA with Marketing Concentration
  • 18 Months
Bestseller
Golden Gate UniversityGolden Gate UniversityMBA with Marketing Concentration
  • 15 Months
Popular
MICAMICAAdvanced Certificate in Digital Marketing and Communication
  • 6 Months
Bestseller
MICAMICAAdvanced Certificate in Brand Communication Management
  • 5 Months
Popular
upGradupGradDigital Marketing Accelerator Program
  • 05 Months
Jindal Global Law SchoolJindal Global Law SchoolLL.M. in Corporate & Financial Law
  • 12 Months
Bestseller
Jindal Global Law SchoolJindal Global Law SchoolLL.M. in AI and Emerging Technologies (Blended Learning Program)
  • 12 Months
Jindal Global Law SchoolJindal Global Law SchoolLL.M. in Intellectual Property & Technology Law
  • 12 Months
Jindal Global Law SchoolJindal Global Law SchoolLL.M. in Dispute Resolution
  • 12 Months
upGradupGradContract Law Certificate Program
  • Self paced
New
ESGCI, ParisESGCI, ParisDoctorate of Business Administration (DBA) from ESGCI, Paris
  • 36 Months
Golden Gate University Golden Gate University Doctor of Business Administration From Golden Gate University, San Francisco
  • 36 Months
Rushford Business SchoolRushford Business SchoolDoctor of Business Administration from Rushford Business School, Switzerland)
  • 36 Months
Edgewood CollegeEdgewood CollegeDoctorate of Business Administration from Edgewood College
  • 24 Months
Golden Gate UniversityGolden Gate UniversityDBA in Emerging Technologies with Concentration in Generative AI
  • 36 Months
Golden Gate University Golden Gate University DBA in Digital Leadership from Golden Gate University, San Francisco
  • 36 Months
Liverpool Business SchoolLiverpool Business SchoolMBA by Liverpool Business School
  • 18 Months
Bestseller
Golden Gate UniversityGolden Gate UniversityMBA (Master of Business Administration)
  • 15 Months
Popular
O.P.Jindal Global UniversityO.P.Jindal Global UniversityMaster of Business Administration (MBA)
  • 12 Months
New
Deakin Business School and Institute of Management Technology, GhaziabadDeakin Business School and IMT, GhaziabadMBA (Master of Business Administration)
  • 12 Months
Liverpool John Moores UniversityLiverpool John Moores UniversityMS in Data Science
  • 18 Months
Bestseller
O.P.Jindal Global UniversityO.P.Jindal Global UniversityMaster of Science in Artificial Intelligence and Data Science
  • 12 Months
Bestseller
IIIT BangaloreIIIT BangalorePost Graduate Programme in Data Science (Executive)
  • 12 Months
Bestseller
O.P.Jindal Global UniversityO.P.Jindal Global UniversityO.P.Jindal Global University
  • 12 Months
WoolfWoolfMaster of Science in Computer Science
  • 18 Months
New
Liverpool John Moores University Liverpool John Moores University MS in Machine Learning & AI
  • 18 Months
Popular
Golden Gate UniversityGolden Gate UniversityDBA in Emerging Technologies with concentration in Generative AI
  • 3 Years
Rushford, GenevaRushford Business SchoolDoctorate of Business Administration (AI/ML)
  • 36 Months
Ecole Supérieure de Gestion et Commerce International ParisEcole Supérieure de Gestion et Commerce International ParisDBA Specialisation in AI & ML
  • 36 Months
Golden Gate University Golden Gate University Doctor of Business Administration (DBA)
  • 36 Months
Bestseller
Ecole Supérieure de Gestion et Commerce International ParisEcole Supérieure de Gestion et Commerce International ParisDoctorate of Business Administration (DBA)
  • 36 Months
Rushford, GenevaRushford Business SchoolDoctorate of Business Administration (DBA)
  • 36 Months
Liverpool Business SchoolLiverpool Business SchoolMBA with Marketing Concentration
  • 18 Months
Bestseller
Golden Gate UniversityGolden Gate UniversityMBA with Marketing Concentration
  • 15 Months
Popular
Jindal Global Law SchoolJindal Global Law SchoolLL.M. in Corporate & Financial Law
  • 12 Months
Bestseller
Jindal Global Law SchoolJindal Global Law SchoolLL.M. in Intellectual Property & Technology Law
  • 12 Months
Jindal Global Law SchoolJindal Global Law SchoolLL.M. in Dispute Resolution
  • 12 Months
IIITBIIITBExecutive Program in Generative AI for Leaders
  • 4 Months
New
IIIT BangaloreIIIT BangaloreExecutive Post Graduate Programme in Machine Learning & AI
  • 13 Months
Bestseller
upGradupGradData Science Bootcamp with AI
  • 6 Months
New
upGradupGradAdvanced Certificate Program in GenerativeAI
  • 4 Months
New
KnowledgeHut upGradKnowledgeHut upGradSAFe® 6.0 Certified ScrumMaster (SSM) Training
  • Self-Paced
upGrad KnowledgeHutupGrad KnowledgeHutCertified ScrumMaster®(CSM) Training
  • 16 Hours
upGrad KnowledgeHutupGrad KnowledgeHutLeading SAFe® 6.0 Certification
  • 16 Hours
KnowledgeHut upGradKnowledgeHut upGradPMP® certification
  • Self-Paced
upGrad KnowledgeHutupGrad KnowledgeHutAWS Solutions Architect Certification
  • 32 Hours
upGrad KnowledgeHutupGrad KnowledgeHutAzure Administrator Certification (AZ-104)
  • 24 Hours
KnowledgeHut upGradKnowledgeHut upGradAWS Cloud Practioner Essentials Certification
  • 1 Week
KnowledgeHut upGradKnowledgeHut upGradAzure Data Engineering Training (DP-203)
  • 1 Week
MICAMICAAdvanced Certificate in Digital Marketing and Communication
  • 6 Months
Bestseller
MICAMICAAdvanced Certificate in Brand Communication Management
  • 5 Months
Popular
IIM KozhikodeIIM KozhikodeProfessional Certification in HR Management and Analytics
  • 6 Months
Bestseller
Duke CEDuke CEPost Graduate Certificate in Product Management
  • 4-8 Months
Bestseller
Loyola Institute of Business Administration (LIBA)Loyola Institute of Business Administration (LIBA)Executive PG Programme in Human Resource Management
  • 11 Months
Popular
Goa Institute of ManagementGoa Institute of ManagementExecutive PG Program in Healthcare Management
  • 11 Months
IMT GhaziabadIMT GhaziabadAdvanced General Management Program
  • 11 Months
Golden Gate UniversityGolden Gate UniversityProfessional Certificate in Global Business Management
  • 6-8 Months
upGradupGradContract Law Certificate Program
  • Self paced
New
IU, GermanyIU, GermanyMaster of Business Administration (90 ECTS)
  • 18 Months
Bestseller
IU, GermanyIU, GermanyMaster in International Management (120 ECTS)
  • 24 Months
Popular
IU, GermanyIU, GermanyB.Sc. Computer Science (180 ECTS)
  • 36 Months
Clark UniversityClark UniversityMaster of Business Administration
  • 23 Months
New
Golden Gate UniversityGolden Gate UniversityMaster of Business Administration
  • 20 Months
Clark University, USClark University, USMS in Project Management
  • 20 Months
New
Edgewood CollegeEdgewood CollegeMaster of Business Administration
  • 23 Months
The American Business SchoolThe American Business SchoolMBA with specialization
  • 23 Months
New
Aivancity ParisAivancity ParisMSc Artificial Intelligence Engineering
  • 24 Months
Aivancity ParisAivancity ParisMSc Data Engineering
  • 24 Months
The American Business SchoolThe American Business SchoolMBA with specialization
  • 23 Months
New
Aivancity ParisAivancity ParisMSc Artificial Intelligence Engineering
  • 24 Months
Aivancity ParisAivancity ParisMSc Data Engineering
  • 24 Months
upGradupGradData Science Bootcamp with AI
  • 6 Months
Popular
upGrad KnowledgeHutupGrad KnowledgeHutData Engineer Bootcamp
  • Self-Paced
upGradupGradFull Stack Software Development Bootcamp
  • 6 Months
Bestseller
KnowledgeHut upGradKnowledgeHut upGradBackend Development Bootcamp
  • Self-Paced
upGradupGradUI/UX Bootcamp
  • 3 Months
upGradupGradCloud Computing Bootcamp
  • 7.5 Months
PwCupGrad CampusCertification Program in Financial Modelling & Analysis in association with PwC India
  • 5 Months
upGrad KnowledgeHutupGrad KnowledgeHutSAFe® 6.0 POPM Certification
  • 16 Hours
upGradupGradDigital Marketing Accelerator Program
  • 05 Months
upGradupGradAdvanced Certificate Program in GenerativeAI
  • 4 Months
New
upGradupGradData Science Bootcamp with AI
  • 6 Months
Popular
upGradupGradFull Stack Software Development Bootcamp
  • 6 Months
Bestseller
upGradupGradUI/UX Bootcamp
  • 3 Months
PwCupGrad CampusCertification Program in Financial Modelling & Analysis in association with PwC India
  • 4 Months
upGradupGradCertificate Course in Business Analytics & Consulting in association with PwC India
  • 06 Months
upGradupGradDigital Marketing Accelerator Program
  • 05 Months

Regular Expressions in Python [With Examples]: How to Implement?

Updated on 30 November, 2022

5.77K+ views
7 min read

While processing raw data from any source, extracting the right information is important so that meaningful insights can be obtained from the data. Sometimes it becomes difficult to take out the specific pattern from the data especially in the case of textual data.

The textual data consist of paragraphs of information collected via survey forms, scrapping websites, and other sources. The Channing of different string accessors with pandas functions or other custom functions can get the work done, but what if a more specific pattern needs to be obtained? Regular expressions do this job with ease.

What is a Regular Expression (RegEx)?

A regular expression is a representation of a set of characters for strings. It presents a generalized formula for a particular pattern in the strings which helps in segregating the right information from the pool of data. The expression usually consists of symbols or characters that help in forming the rule but, at first glance, it may seem weird and difficult to grasp. These symbols have associated meanings that are described here.

Learn data science courses from the World’s top Universities. Earn Executive PG Programs, Advanced Certificate Programs, or Masters Programs to fast-track your career.

Meta-characters in RegEx

  1. ‘.’: is a wildcard, matches a single character (any character, but just once)
  2. ^: denotes start of the string
  3. $: denotes the end of the string
  4. [ ]: matches one of the sets of characters within [ ]
  5. [a-z]: matches one of the range of characters a,b,…,z
  6. [^abc] : matches a character that is not a,b or c.
  7. a|b: matches either a or b, where a and b are strings
  8. () : provides scoping for operators
  9. \ : enables escape for special characters (\t, \n, \b, \.)
  10. \b: matches word boundary
  11. \d : any digit, equivalent to [0-9]
  12. \D: any non digit, equivalent to [^0-9]
  13. \s : any whitespace, equivalent to [ \t\n\r\f\v]
  14. \S : any non-whitespace, equivalent to [^\t\n\r\f\v]
  15. \w : any alphanumeric, equivalent to [a-zA-Z0-9_]
  16. \W : any non-alphanumeric, equivalent to [^a-zA-Z0-9_]
  17. ‘*’: matches zero or more occurrences
  18. ‘+’: matches one or more occurrences
  19. ‘?’: matches zero or one occurrence
  20. {n}: exactly n repetitions, n>=0
  21. {n,}: at least n repetitions
  22. {,n}: at most n repetitions
  23. {m,n}: at least m repetitions and at most n repetitions

Our learners also read: Top Python Free Courses

Examples to Understand The Workaround

Now that you are aware of the characters that make up a RegEx, let’s see how this works:

1. Email Filtering:

Suppose you want to filter out all the email ids from a long paragraph. The general format for an email is:

username@domain_name. <top_level_domain>

The username can be alphanumeric, and therefore, we can use \w to denote them but there is a possibility that the user creates an account as firstName.surname. To tackle this, we will escape the dot and create a set of characters. Next, domain_name should be only alphabetic and therefore, A-Za-z will denote that. The top-level domain is usually .com, .in, .org but depending on the use case, you can choose either the whole alphabet range or filter specific domains.

The regular expression of this will look like this:

^([a-zA-Z0-9_.]+)@([a-zA-Z0-9-]+)\.([a-zA-Z]{2,4})$

Here the start and end of the pattern are also declared as well the top-level domain can only contain 2-4 characters. The whole expression has 3 groups.

2. Dates Filtering:

The textual information you are extracting may contain the dates and no separate column is made available for you. The dates are an essential factor that helps in filtering data or time series analysis. A particular date takes the format of date/month/year, where date and month can interchange.

Also, months can be numeric as well as alphabets form and in alphabets either abbreviations or full names. It mainly depends on how many cases are present in our data and can only be achieved by hit and trial.

A simple RegEx that covers a variety of dates is shown below:

^(\d{1,2})[/-](\d{1,2})[/-](\d{2,4})$

This pattern captures the date format with a hyphen or forward slash. The date and month are confined to one or two-digit and year up to four0 digits. The respective entities are captured as groups that are optional in this case.

Also Read: Python Project Ideas and Topics

How to Implement it in Python?

The regular expressions we just built are satisfying the respective criteria we assumed and now it’s time to implement them in Python code. Python has a built-in module called re module that implements the working of these expressions. Simply,

import re

pattern = ‘^(\d{1,2})[/-](\d{1,2})[/-](\d{2,4})$’

Remodule offers a wide range of functions and all of them have different use cases. Let’s look at some of the important functions:

  1. re.findall(): This function returns the list of all the matches in the test string based on the pattern passed. Consider this example:

string = ‘25-12-1999 random text here 25/12/1999’

print(re.findall(pattern, string))

It will return only the dates from the string in a list.

  1. re.sub(): Sub in this function stands for substitution and does the same thing. It substitutes the matches with the replacement value provided. The function takes in the pattern, string, replacement value, and optional parameter of the count. The count parameter controls how many occurrences you want to replace. By default, it replaces all of them and returns the new string.
  2. re.split(): It splits the string at the matched sites and returns the parts as separate strings in a list.
  3. re.search(): This function returns the match object that contains the match found in the string along with all the groups it captured. It can come in handy when you want to store these groups as separate columns.

To perform this:

match  = re.search(pattern, string)

match.group(1)

Group(0) returns the whole match and corresponding next numbers denote other groups.

Checkout: Python Developer Salary in India

upGrad’s Exclusive Data Science Webinar for you –

Watch our Webinar on The Future of Consumer Data in an Open Data Economy

 

 

Conclusion

Regular expressions are a powerful way to capture patterns in textual data. It may take a bit of extra effort to hold command of the various characters but it simplifies the process of data extraction in complex use cases.

Frequently Asked Questions (FAQs)

1. Give some examples of Regular Expressions in Python.

The following examples illustrate the functioning or regular expressions in Python:
a. Email Filtering
The regular expressions can be efficiently used to filter emails. The regular syntax for email filtering is - ^((a-zA-Z0-9_.)+)@((a-zA-Z0-9-)+).((a-zA-Z){2,4})$
This expression is divided into three groups and tackles many cases including - when the username is alphanumeric and when it has a dot, for eg., “first.last@”. This expression will be used for top domains that contain 2-4 characters.
b. Dates Filtering
Dates can be a crucial factor while handling data filtering. The textual data that you are dealing with often contains dates. The regular expression or RegEx that extracts the data from a normal text is - ^(d{1,2})(/-)(d{1,2})(/-)(d{2,4})$
The date and the month can be up to 2 digits while the month can be up to 4 digits.

2. What are the functions involved in the implementation of regular expressions in Python?

The following functions are involved in the implementation of regular expressions in Python:
1. re.findall() - This function accepts a pattern that is to be matched with the text string. It returns the strings that are a match.
2. re.sub() - Sub in “re.sub” stands for “substitution”. This method performs exactly the same function as the “re.findall()” function does.
3. re.split() - It separates the strings around the separator which is to be passed to it as its parameter. The separator could be anything.
4. re.search() - This function returns the match found in the string along with other string groups that it has captured.

3. What are some special sequences used in regular expressions?

The following are some of the special sequences used in regular expressions:
1. A: Check if the string starts with the given character.
2. (Forward Slash) b: Checks if the string starts or ends with the given character. (string)/b checks for the beginning while (backward slash) b (string) checks for the end.
3. B: It is exactly opposite to the b. Checks if the string does not start with the given character.
4. d: Checks for the numerical values in the string.
5. D: Checks for any non-numerical value or character.
6. s: Checks for any whitespace character.
7. S: Checks for any non-whitespace character.
8. w: Checks for any alphanumeric character.
9. W: Checks for any non-alphanumeric character.