A collection of common regular expressions bundled with an easy to use interface.

Overview

CommonRegex

Find all times, dates, links, phone numbers, emails, IP addresses, prices, hex colors, and credit card numbers in a string. We did the hard work so you don't have to.

Pull requests welcome!

Installation

Install via pip

sudo pip install commonregex

or via setup.py

python setup.py install

Usage

>>> from commonregex import CommonRegex
>>> parsed_text = CommonRegex("""John, please get that article on www.linkedin.com to me by 5:00PM 
                               on Jan 9th 2012. 4:00 would be ideal, actually. If you have any 
                               questions, You can reach me at (519)-236-2723x341 or get in touch with
                               my associate at [email protected]""")
>>> parsed_text.times
['5:00PM', '4:00']
>>> parsed_text.dates
['Jan 9th 2012']
>>> parsed_text.links
['www.linkedin.com']
>>> parsed_text.phones
['(519)-236-2723']
>>> parsed_text.phones_with_exts
['(519)-236-2723x341']
>>> parsed_text.emails
['[email protected]']

Alternatively, you can generate a single CommonRegex instance and use it to parse multiple segments of text.

>>> parser = CommonRegex()
>>> parser.times("When are you free?  Do you want to meet up for coffee at 4:00?")
['4:00']

Finally, all regular expressions used are publicly exposed.

>>> from commonregex import email
>>> import re
>>> text = "...get in touch with my associate at [email protected]"
>>> re.sub(email, "[email protected]", text)
'...get in touch with my associate at [email protected]'
>>> from commonregex import time
>>> for m in time.finditer("Does 6:00 or 7:00 work better?"):
...     print(m.start(), m.group())
...
5 6:00
13 7:00
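
The exposed patterns are compiled regular expressions, so `finditer` can also merge matches from several patterns into one list ordered by position in the text. This is roughly what the "Return position of matched text" request in the comments asks for. The sketch below uses two hypothetical, simplified patterns as stand-ins for the library's own; `matches_with_positions` is an illustrative name, not part of CommonRegex.

```python
import re

# Hypothetical, simplified stand-ins for exposed patterns such as `time`;
# the real commonregex patterns are more thorough.
PATTERNS = {
    "time": re.compile(r"\b\d{1,2}:\d{2}(?:\s?[ap]\.?m\.?)?", re.IGNORECASE),
    "price": re.compile(r"\$\d+(?:\.\d{2})?"),
}


def matches_with_positions(text):
    """Return (start, kind, matched_text) tuples from every pattern, ordered by start offset."""
    found = []
    for kind, pattern in PATTERNS.items():
        for m in pattern.finditer(text):
            found.append((m.start(), kind, m.group()))
    # Tuples sort by their first element, so the list follows the order of the text.
    return sorted(found)
```

Sorting plain tuples keeps the sketch dependency-free; a named tuple or small dataclass would work equally well if callers prefer attribute access.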

Please note that this module is currently English/US specific.

Supported Methods/Attributes

  • obj.dates, obj.dates()
  • obj.times, obj.times()
  • obj.phones, obj.phones()
  • obj.phones_with_exts, obj.phones_with_exts()
  • obj.links, obj.links()
  • obj.emails, obj.emails()
  • obj.ips, obj.ips()
  • obj.ipv6s, obj.ipv6s()
  • obj.prices, obj.prices()
  • obj.hex_colors, obj.hex_colors()
  • obj.credit_cards, obj.credit_cards()
  • obj.btc_addresses, obj.btc_addresses()
  • obj.street_addresses, obj.street_addresses()
  • obj.zip_codes, obj.zip_codes()
  • obj.po_boxes, obj.po_boxes()
  • obj.ssn_number, obj.ssn_number()
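
Every finder above is listed both as an attribute and as a method. One way to get that behaviour is to replace each method with its result list when text is passed to the constructor. The traceback quoted in the comments (`setattr(self, key, method())` inside `__init__`) suggests CommonRegex does something similar, but the following is only a sketch: the `RegexParser` class and its simplified patterns are hypothetical, not the library's source.

```python
import re


class RegexParser:
    """Hypothetical sketch: finders usable as attributes (obj.times) or methods (obj.times(text))."""

    # Simplified stand-in patterns, not the library's real ones.
    PATTERNS = {
        "times": re.compile(r"\b\d{1,2}:\d{2}(?:\s?[ap]m)?", re.IGNORECASE),
        "emails": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    }

    def __init__(self, text=""):
        self.text = text
        for name, pattern in self.PATTERNS.items():
            finder = self._make_finder(pattern)
            # With text: store the result list on the instance, shadowing the method.
            # Without text: expose the callable so one instance can parse many strings.
            setattr(self, name, finder() if text else finder)

    def _make_finder(self, pattern):
        def finder(text=None):
            # Fall back to the text given at construction time.
            return [m.strip() for m in pattern.findall(text or self.text)]
        return finder
```

For example, `RegexParser("Meet at 4:00").times` is a list, while `RegexParser().times("coffee at 4:00?")` is a call; both yield `['4:00']`.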

CommonRegex Ports:

[CommonRegexRust] (https://github.com/hskang9/CommonRegexRust)

[CommonRegexJS] (https://github.com/talyssonoc/CommonRegexJS)

[CommonRegexScala] (https://github.com/everpeace/CommonRegexScala)

[CommonRegexJava] (https://github.com/talyssonoc/CommonRegexJava)

[CommonRegexCobra] (https://github.com/PurityLake/CommonRegex-Cobra)

[CommonRegexDart] (https://github.com/aufdemrand/CommonRegexDart)

[CommonRegexRuby] (https://github.com/talyssonoc/CommonRegexRuby)

[CommonRegexPHP] (https://github.com/james2doyle/CommonRegexPHP)


Comments
  • Regex for HTML/XHTML

    Adding support for identifying tags and their attributes, as well as replacing attribute values, to make it more awesome. I am halfway there; it still needs some testing.

    opened by fresco1108 6
  • Added a class of regular expressions that cover phone numbers w/ extensions

    Right now, CommonRegex does not match phone numbers that include extensions. I added these in as an additional type of regular expressions, phones_with_exts. I updated the tests, along with the readme to include the new type.

    opened by JasonKessler 3
  • Return position of matched text

    Instead of returning an array of literals, return an array of objects of matched text and start position inside the original parsed text. Also, it would be good to have a list of all matched texts sorted by their position on the original text.

    opened by piranna 3
  • IPv6 Support

    Can we get IPv6 support on .ip? Wouldn't mind if it were a .ipv6, actually, to keep it separate.

    CommonRegex("2001:0db8::ff00:0042:8329").ip
    []
    
    opened by KSoute 2
  • Added money support

    $[ws][sign][digits,]digits[.fractional-digits][ws]

    Elements in square brackets ([ and ]) are optional. The following table describes each element.

    ELEMENT             DESCRIPTION
    ws                  Optional white space.
    sign                An optional sign.
    digits              A sequence of digits ranging from 0 to 9.
    ,                   A culture-specific thousands separator symbol.
    .                   A culture-specific decimal point symbol.
    fractional-digits   A sequence of digits ranging from 0 to 9.

    opened by vasilcovsky 2
  • Link to Rust Support?

    @madisonmay Check out this rust version of CommonRegex!

    https://github.com/hskang9/CommonRegexRust

    https://crates.io/crates/commonregex

    Can I link this to README?

    opened by hskang9 1
  • Add regex for hexColor (rgb and rgba)

    Hello, here is another feature for CommonRegex: the module can now find rgb and rgba hex colors. What do you think? The idea comes from the JS port: https://github.com/talyssonoc/CommonRegexJS

    opened by qw1mb0 1
  • Fix raw string literal and add better ip regex

    Hello, madisonmay! I fixed the spacing (1 tab = 2 spaces, not 4), added a better IP regex, and fixed the raw string literals, because on my system they led to an error like this:

    wimbo@wimbo-hp:~/github/CommonRegex$ python3 commonregex.py
      File "commonregex.py", line 18
        return (ur'(?:' + regex + ur')?')
                  ^
    SyntaxError: invalid syntax

    After removing the prefix, I tested on Python 2.7.4 and 3.3.1:

    wimbo@wimbo-hp:~/github/CommonRegex$ python2 --version
    Python 2.7.4
    wimbo@wimbo-hp:~/github/CommonRegex$ python2 commonregex.py
    ['Jan 9th 2012', '8/23/12']
    ['8:00', '5:00AM']
    ['(520) 820 7123', '1-230-241-2422']
    []
    ['[email protected]']
    ['127.0.0.1']
    wimbo@wimbo-hp:~/github/CommonRegex$ python3 --version
    Python 3.3.1
    wimbo@wimbo-hp:~/github/CommonRegex$ python3 commonregex.py
    ['Jan 9th 2012', '8/23/12']
    ['8:00', '5:00AM']
    ['(520) 820 7123', '1-230-241-2422']
    []
    ['[email protected]']
    ['127.0.0.1']

    Looks right. What do you think?

    Sorry for my bad English.

    opened by qw1mb0 1
  • AttributeError: 'CommonRegex' object has no attribute 'ssn_number'

    I am attempting to parse the following test-data.txt with version commonregex==1.5.4:

    2523088780
    social security number: 428-34-4474
    this is far less expensive than the alternative
    114 jeffery street
    usa
    
    from commonregex import CommonRegex
    
    with open('./test-data.txt') as data:
        parsed_text = CommonRegex(data.read())
    

    and receiving the error:

    >>> parsed_text.ssn_number
    AttributeError: 'CommonRegex' object has no attribute 'ssn_number'

    No problems with emails, phones, etc.:

    >>> parsed_text.emails
    ['[email protected]']

    Appreciate it

    opened by tlgevers 2
  • Street Address can't parse following

    3015 POE RD., HOUSTON, TX 77051

    Do I need to do something first, or is this something that can be corrected in the street parser?

    Oh, the call I am making is:

    for sa in street_address.finditer(res):
        res = sa.string
        break

    This is working for other addresses but first one it failed on. Also unlike RE, it stopped on a \n and ignored the next line. Do I need to pre-parse the text?

    opened by quentinjs 1
  • PO Boxes not working

    I got this error on the most recent version of Commonregex (1.5.4): AttributeError: 'CommonRegex' object has no attribute 'po_boxes'

    My code just creates a parser object and tries to retrieve po_boxes.

    opened by RafaelCenzano 0
  • Reading from CSV error - TypeError: expected string or bytes-like object

    Traceback (most recent call last):
      File "csv_parse.py", line 14, in <module>
        parsed_text = CommonRegex({row[2]})
      File "/Users/mviraktamath/.pyenv/versions/3.8.6/lib/python3.8/site-packages/commonregex.py", line 53, in __init__
        setattr(self, key, method())
      File "/Users/mviraktamath/.pyenv/versions/3.8.6/lib/python3.8/site-packages/commonregex.py", line 39, in regex_method
        return [x.strip() for x in self.regex.findall(text or self.obj.text)]
    TypeError: expected string or bytes-like object

    opened by yuva670 0
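
The IPv6 request above reports that `2001:0db8::ff00:0042:8329` is not found. One possible approach, not necessarily how `ipv6s` is implemented, is a loose candidate regex followed by validation with the standard-library `ipaddress` module. The candidate pattern and `find_ipv6` are assumptions for illustration; IPv4-mapped forms such as `::ffff:192.0.2.1` are not handled correctly because the candidate pattern excludes dots.

```python
import ipaddress
import re

# Assumed loose candidate pattern: hex digits and colons containing at least two colons.
CANDIDATE = re.compile(r"[0-9A-Fa-f]*:[0-9A-Fa-f:]*:[0-9A-Fa-f]*")


def find_ipv6(text):
    """Return substrings of text that the standard library accepts as IPv6 addresses."""
    found = []
    for m in CANDIDATE.finditer(text):
        try:
            ipaddress.IPv6Address(m.group())
        except ValueError:  # AddressValueError subclasses ValueError
            continue  # e.g. a clock time such as "12:30:45" is rejected here
        found.append(m.group())
    return found
```

Delegating validation to `ipaddress` keeps the regex simple and avoids encoding every `::` compression rule in the pattern itself.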
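
The money-support request describes the grammar `$[ws][sign][digits,]digits[.fractional-digits][ws]`. A sketch of that grammar as a Python regex, assuming US conventions (comma as thousands separator, period as decimal point) since the module is English/US specific. `MONEY` and `find_money` are hypothetical names, and trailing whitespace is accepted by the grammar but left out of the matched text.

```python
import re

# $[ws][sign][digits,]digits[.fractional-digits] with US separators.
MONEY = re.compile(
    r"""
    \$                          # literal dollar sign
    \s*                         # [ws] optional white space
    [+-]?                       # [sign] optional sign
    (?:\d{1,3}(?:,\d{3})+|\d+)  # [digits,]digits: comma-grouped or plain digits
    (?:\.\d+)?                  # [.fractional-digits]
    """,
    re.VERBOSE,
)


def find_money(text):
    """Return every dollar amount in text that fits the grammar above."""
    return MONEY.findall(text)
```

All groups are non-capturing, so `findall` returns whole matches rather than group contents.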
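
The hex-color request asks for both rgb and rgba forms. Assuming that means the usual CSS notations `#RGB`/`#RRGGBB` and `#RGBA`/`#RRGGBBAA` (an interpretation; the request does not spell it out), a pattern could look like this. `HEX_COLOR` and `find_hex_colors` are hypothetical names.

```python
import re

# Longer forms are tried first; the trailing \b rejects runs of the wrong length
# such as "#12345" or "#abcdefg".
HEX_COLOR = re.compile(r"#(?:[0-9a-fA-F]{8}|[0-9a-fA-F]{6}|[0-9a-fA-F]{3,4})\b")


def find_hex_colors(text):
    """Return 3-, 4-, 6- and 8-digit hex colors found in text."""
    return HEX_COLOR.findall(text)
```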
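
The SyntaxError in the raw-string report comes from the `ur''` prefix, which Python 3 does not accept (PEP 414 restored `u''` in Python 3.3, but not `ur''`). Plain `r''` literals work on both Python 2 and 3. Below, the helper line from the traceback is rewritten that way; the traceback does not show the function's name, so `opt` is a hypothetical name, and the phone pattern is only an example.

```python
import re


def opt(regex):
    """Wrap a pattern fragment in an optional non-capturing group.

    r'' literals are valid on Python 2 and 3; ur'' is a SyntaxError on Python 3.
    """
    return r'(?:' + regex + r')?'


# Example: an optional "(519) " area code in front of a seven-digit number.
phone = re.compile(opt(r'\(\d{3}\)\s?') + r'\d{3}-\d{4}')
```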
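
In the street-address report, the loop reads `sa.string`, which on a Python match object is the entire string passed to `finditer`, not the matched text, and `break` stops after the first match. `finditer` itself scans across line breaks. A sketch that collects `m.group()` for every match; the `STREET` pattern is a hypothetical simplification, not the `street_address` regex CommonRegex exposes.

```python
import re

# Hypothetical, simplified street pattern: house number, name words, suffix.
STREET = re.compile(
    r"\d{1,5}\s+(?:[A-Za-z]+\s+)+(?:street|st|road|rd|avenue|ave)\b\.?",
    re.IGNORECASE,
)

SAMPLE = "Ship to 3015 POE RD., HOUSTON, TX 77051\nBill to 114 jeffery street"


def find_streets(text):
    # m.group() is the matched address; m.string would be the whole `text`.
    # No `break`, so every match is collected, including ones on later lines.
    return [m.group() for m in STREET.finditer(text)]
```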
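
The CSV traceback points at `CommonRegex({row[2]})`. The braces appear to build a one-element set rather than pass the cell string, and `re` raises this TypeError for any non-string argument. A sketch using the standard `csv` module, with a hypothetical simplified email pattern standing in for CommonRegex, passes the cell itself; `emails_in_column` is an illustrative name.

```python
import csv
import io
import re

# Hypothetical, simplified email pattern standing in for a CommonRegex finder.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


def emails_in_column(csv_text, column):
    """Return the emails found in one column of each data row."""
    reader = csv.reader(io.StringIO(csv_text))
    next(reader)  # skip the header row
    # Pass the cell itself (a str). Writing {row[column]} would pass a set,
    # and re raises a TypeError ("expected string or bytes-like object").
    return [EMAIL.findall(row[column]) for row in reader]
```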
Owner
Madison May
Machine Learning Architect at @IndicoDataSolutions