A collection of common regular expressions bundled with an easy-to-use interface.

Overview

CommonRegex

Find all times, dates, links, phone numbers, emails, IP addresses, prices, hex colors, and credit card numbers in a string. We did the hard work so you don't have to.

Pull requests welcome!

Installation

Install via pip

sudo pip install commonregex

or via setup.py

python setup.py install

Usage

>>> from commonregex import CommonRegex
>>> parsed_text = CommonRegex("""John, please get that article on www.linkedin.com to me by 5:00PM 
                               on Jan 9th 2012. 4:00 would be ideal, actually. If you have any 
                               questions, You can reach me at (519)-236-2723x341 or get in touch with
                               my associate at [email protected]""")
>>> parsed_text.times
['5:00PM', '4:00']
>>> parsed_text.dates
['Jan 9th 2012']
>>> parsed_text.links
['www.linkedin.com']
>>> parsed_text.phones
['(519)-236-2723']
>>> parsed_text.phones_with_exts
['(519)-236-2723x341']
>>> parsed_text.emails
['[email protected]']

Alternatively, you can generate a single CommonRegex instance and use it to parse multiple segments of text.

>>> parser = CommonRegex()
>>> parser.times("When are you free?  Do you want to meet up for coffee at 4:00?")
['4:00']

Finally, all regular expressions used are publicly exposed.

>>> from commonregex import email
>>> import re
>>> text = "...get in touch with my associate at [email protected]"
>>> re.sub(email, "[email protected]", text)
'...get in touch with my associate at [email protected]'
>>> from commonregex import time
>>> for m in time.finditer("Does 6:00 or 7:00 work better?"):
...     print(m.start(), m.group())
5 6:00 
13 7:00 

Please note that this module is currently English/US specific.

Supported Methods/Attributes

  • obj.dates, obj.dates()
  • obj.times, obj.times()
  • obj.phones, obj.phones()
  • obj.phones_with_exts, obj.phones_with_exts()
  • obj.links, obj.links()
  • obj.emails, obj.emails()
  • obj.ips, obj.ips()
  • obj.ipv6s, obj.ipv6s()
  • obj.prices, obj.prices()
  • obj.hex_colors, obj.hex_colors()
  • obj.credit_cards, obj.credit_cards()
  • obj.btc_addresses, obj.btc_addresses()
  • obj.street_addresses, obj.street_addresses()
  • obj.zip_codes, obj.zip_codes()
  • obj.po_boxes, obj.po_boxes()
  • obj.ssn_number, obj.ssn_number()

CommonRegex Ports:

  • CommonRegexRust (https://github.com/hskang9/CommonRegexRust)
  • CommonRegexJS (https://github.com/talyssonoc/CommonRegexJS)
  • CommonRegexScala (https://github.com/everpeace/CommonRegexScala)
  • CommonRegexJava (https://github.com/talyssonoc/CommonRegexJava)
  • CommonRegexCobra (https://github.com/PurityLake/CommonRegex-Cobra)
  • CommonRegexDart (https://github.com/aufdemrand/CommonRegexDart)
  • CommonRegexRuby (https://github.com/talyssonoc/CommonRegexRuby)
  • CommonRegexPHP (https://github.com/james2doyle/CommonRegexPHP)

Comments
  • Regex for HTML/XHTML

    Adding support for identifying tags and their attributes, as well as replacing attribute values. I am about halfway there, and it still needs some testing.

    opened by fresco1108 6
  • Added a class of regular expressions that cover phone numbers w/ extensions

    Right now, CommonRegex does not match phone numbers that include extensions. I added these in as an additional type of regular expressions, phones_with_exts. I updated the tests, along with the readme to include the new type.

    opened by JasonKessler 3
  • Return position of matched text

    Instead of returning an array of literals, return an array of objects of matched text and start position inside the original parsed text. Also, it would be good to have a list of all matched texts sorted by their position on the original text.

    opened by piranna 3
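The behaviour requested here can be approximated with the standard `re` module alone. The sketch below is only an illustration: `PATTERNS` and `find_with_positions` are hypothetical names, and the two patterns are simplified stand-ins rather than the expressions CommonRegex ships.

```python
import re

# Simplified stand-in patterns (assumptions, not CommonRegex's own regexes).
PATTERNS = {
    "time": re.compile(r"\d{1,2}:[0-5]\d"),
    "link": re.compile(r"www\.[a-z0-9-]+\.[a-z]{2,}", re.IGNORECASE),
}


def find_with_positions(text):
    """Return (start, kind, matched_text) tuples sorted by start offset."""
    hits = []
    for kind, pattern in PATTERNS.items():
        for m in pattern.finditer(text):
            hits.append((m.start(), kind, m.group()))
    # Tuples sort by their first element, i.e. position in the original text.
    return sorted(hits)


matches = find_with_positions("Read www.linkedin.com by 5:00, or 4:00 at the latest.")
for start, kind, value in matches:
    print(start, kind, value)
```

Returning tuples keeps the result easy to sort and unpack; a caller who only wants the literals can still take the last element of each tuple.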
  • IPv6 Support

    Can we get IPv6 support on .ip? Wouldn't mind if it were a .ipv6, actually, to keep it separate.

    CommonRegex("2001:0db8::ff00:0042:8329").ip
    []
    
    opened by KSoute 2
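As a rough illustration of what a separate IPv6 finder could look like, the sketch below collects loose candidates with a regex and lets the standard `ipaddress` module decide which ones are valid. `CANDIDATE` and `find_ipv6` are hypothetical names, and this is not how CommonRegex implements `ipv6s`. It ignores zone IDs and embedded IPv4 forms, and very short tokens such as `d::` would be accepted because they are technically valid addresses.

```python
import ipaddress
import re

# Loose candidate pattern: any run of hex digits and colons (an assumption
# made for this sketch). Real validation is delegated to ipaddress.
CANDIDATE = re.compile(r"[0-9A-Fa-f:]{2,}")


def find_ipv6(text):
    """Return the candidate tokens that parse as IPv6 addresses."""
    found = []
    for token in CANDIDATE.findall(text):
        try:
            ipaddress.IPv6Address(token)
        except ValueError:
            # Not an IPv6 address (for example a time such as 4:00).
            continue
        found.append(token)
    return found


result = find_ipv6("Ping 2001:0db8::ff00:0042:8329 at 4:00 today")
print(result)
```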
  • Added money support

    $[ws][sign][digits,]digits[.fractional-digits][ws]

    Elements in square brackets ([ and ]) are optional. The following table describes each element.

    ELEMENT            DESCRIPTION
    ws                 Optional white space.
    sign               An optional sign.
    digits             A sequence of digits ranging from 0 to 9.
    ,                  A culture-specific thousands separator symbol.
    .                  A culture-specific decimal point symbol.
    fractional-digits  A sequence of digits ranging from 0 to 9.

    opened by vasilcovsky 2
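The format above can be written as a regular expression roughly as follows. This is a sketch under stated assumptions: it hard-codes the US separators (`,` for thousands, `.` for decimals), leaves the trailing white space out of the match, and `PRICE` is a hypothetical name rather than the pattern merged into CommonRegex.

```python
import re

# $ [ws] [sign] [digits,]digits [.fractional-digits]
PRICE = re.compile(
    r"\$\s*"                          # dollar sign, optional white space
    r"[+-]?"                          # optional sign
    r"(?:\d{1,3}(?:,\d{3})+|\d+)"     # grouped thousands or plain digits
    r"(?:\.\d+)?"                     # optional fractional digits
)

text = "Tickets are $1,250.50 for two, $ 40 each for kids, and $-5 off with a coupon."
prices = PRICE.findall(text)
print(prices)
```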
  • Link to Rust Support?

    @madisonmay Check out this rust version of CommonRegex!

    https://github.com/hskang9/CommonRegexRust

    https://crates.io/crates/commonregex

    Can I link this to README?

    opened by hskang9 1
  • Add regex for hexColor (rgb and rgba)

    Hello, here is one more feature for CommonRegex: the module can now find rgb and rgba hex colors. What do you think? The idea comes from the JS port: https://github.com/talyssonoc/CommonRegexJS

    opened by qw1mb0 1
  • Fix raw string literal and add better ip regex

    Hello, madisonmay! I fixed the indentation (1 tab = 2 spaces, not 4 spaces) and added a better IP regex. I also fixed the raw string literals, because on my system they led to an error like this:

    wimbo@wimbo-hp:~/github/CommonRegex$ python3 commonregex.py
      File "commonregex.py", line 18
        return (ur'(?:' + regex + ur')?')
        ^
    SyntaxError: invalid syntax

    After the change, I tested on Python 2.7.4 and 3.3.1:

    wimbo@wimbo-hp:~/github/CommonRegex$ python2 --version
    Python 2.7.4
    wimbo@wimbo-hp:~/github/CommonRegex$ python2 commonregex.py
    ['Jan 9th 2012', '8/23/12']
    ['8:00', '5:00AM']
    ['(520) 820 7123', '1-230-241-2422']
    []
    ['[email protected]']
    ['127.0.0.1']
    wimbo@wimbo-hp:~/github/CommonRegex$ python3 --version
    Python 3.3.1
    wimbo@wimbo-hp:~/github/CommonRegex$ python3 commonregex.py
    ['Jan 9th 2012', '8/23/12']
    ['8:00', '5:00AM']
    ['(520) 820 7123', '1-230-241-2422']
    []
    ['[email protected]']
    ['127.0.0.1']

    Looks right. What do you think?

    Sorry for my bad English.

    opened by qw1mb0 1
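The `SyntaxError` above comes from the `ur''` string prefix, which Python 3 does not accept. A plain `r''` prefix parses on both Python 2.7 and Python 3. The sketch below shows the idea with a helper shaped like the line in the traceback; the name `opt` and the time pattern are assumptions made for illustration.

```python
import re


def opt(regex):
    """Wrap a pattern in an optional non-capturing group."""
    # r'' works on Python 2.7 and 3.x; ur'' is a SyntaxError on Python 3.
    return r'(?:' + regex + r')?'


# Stand-in time pattern with an optional am/pm suffix.
pattern = re.compile(r'\d{1,2}:\d{2}' + opt(r' ?[ap]m'), re.IGNORECASE)
hits = pattern.findall("8:00 or 5:00AM")
print(hits)
```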
  • AttributeError: 'CommonRegex' object has no attribute 'ssn_number'

    I am attempting to parse the following test-data.txt with version commonregex==1.5.4:

    2523088780
    social security number: 428-34-4474
    this is far less expensive than the alternative
    114 jeffery street
    usa
    
    from commonregex import CommonRegex
    
    with open('./test-data.txt') as data:
        parsed_text = CommonRegex(data.read())
    

    and receiving the error:

    parsed_text.ssn_number 
    

    >>> AttributeError: 'CommonRegex' object has no attribute 'ssn_number'
    

    no problems with emails, phones, etc:

    >>> parsed_text.emails
    

    ['[email protected]']
    

    Appreciate it

    opened by tlgevers 2
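If the installed release lacks the attribute, as reported here, one possible workaround is to apply a pattern directly with `re`. The `SSN` pattern below is a simplified stand-in and an assumption, not the expression CommonRegex defines, and it makes no attempt to reject invalid number ranges.

```python
import re

# Stand-in pattern: three digits, two digits, four digits, hyphen-separated.
SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

# Same content as the test-data.txt shown in the report.
sample = """2523088780
social security number: 428-34-4474
this is far less expensive than the alternative
114 jeffery street
usa
"""

ssns = SSN.findall(sample)
print(ssns)
```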
  • Street Address can't parse following

    3015 POE RD., HOUSTON, TX 77051

    Do I need to do something first, or is this something that can be corrected in the street parser?

    Oh, the call I am making is:

    for sa in street_address.finditer(res):
        res = sa.string
        break

    This works for other addresses; this is the first one it failed on. Also, unlike re, it stopped at a \n and ignored the next line. Do I need to pre-parse the text?

    opened by quentinjs 1
  • PO Boxes not working

    I got this error on the most recent version of Commonregex (1.5.4): AttributeError: 'CommonRegex' object has no attribute 'po_boxes'

    My code just creates a parser object and tries to retrieve po_boxes.

    opened by RafaelCenzano 0
  • Reading from CSV error - TypeError: expected string or bytes-like object

    Traceback (most recent call last):
      File "csv_parse.py", line 14, in <module>
        parsed_text = CommonRegex({row[2]})
      File "/Users/mviraktamath/.pyenv/versions/3.8.6/lib/python3.8/site-packages/commonregex.py", line 53, in __init__
        setattr(self, key, method())
      File "/Users/mviraktamath/.pyenv/versions/3.8.6/lib/python3.8/site-packages/commonregex.py", line 39, in regex_method
        return [x.strip() for x in self.regex.findall(text or self.obj.text)]
    TypeError: expected string or bytes-like object

    opened by yuva670 0
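The traceback shows the call `CommonRegex({row[2]})`. In Python, `{row[2]}` is a set literal holding one element, not a string, and the `re` functions reject it with this kind of `TypeError`. The sketch below reproduces the behaviour with the standard library only; `row` and `pattern` are made-up stand-ins, and the likely fix in the reporter's script is to pass `row[2]` itself.

```python
import re

pattern = re.compile(r"\d{1,2}:[0-5]\d")    # stand-in pattern, not CommonRegex's
row = ["id-1", "Alice", "Call me at 4:00"]  # hypothetical CSV row

try:
    pattern.findall({row[2]})  # a one-element set, not a str
    error = None
except TypeError as exc:
    error = str(exc)

times = pattern.findall(row[2])  # passing the string itself works
print(error)
print(times)
```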
Owner
Madison May
Machine Learning Architect at @IndicoDataSolutions