Dateparser

Python parser for human readable dates

Key Features

  • Support for almost every existing date format: absolute dates, relative dates ("two weeks ago" or "tomorrow"), timestamps, etc.
  • Support for more than 200 language locales.
  • Language autodetection
  • Customizable behavior through settings.
  • Support for non-Gregorian calendar systems.
  • Support for dates with timezone abbreviations or UTC offsets ("August 14, 2015 EST", "21 July 2013 10:15 pm +0500"...)
  • Search dates in longer texts.

How To Use

The most straightforward way to parse dates with dateparser is to use the dateparser.parse() function, which wraps most of the functionality of the module.

>>> import dateparser

>>> dateparser.parse('Fri, 12 Dec 2014 10:55:50')
datetime.datetime(2014, 12, 12, 10, 55, 50)

>>> dateparser.parse('1991-05-17')
datetime.datetime(1991, 5, 17, 0, 0)

>>> dateparser.parse('In two months')  # today is 1st Aug 2020
datetime.datetime(2020, 10, 1, 11, 12, 27, 764201)

>>> dateparser.parse('1484823450')  # timestamp
datetime.datetime(2017, 1, 19, 10, 57, 30)

>>> dateparser.parse('January 12, 2012 10:00 PM EST')
datetime.datetime(2012, 1, 12, 22, 0, tzinfo=<StaticTzInfo 'EST'>)

As you can see, dateparser works with different date formats, but it can also be used directly with strings in different languages:

>>> dateparser.parse('Martes 21 de Octubre de 2014')  # Spanish (Tuesday 21 October 2014)
datetime.datetime(2014, 10, 21, 0, 0)

>>> dateparser.parse('Le 11 Décembre 2014 à 09:00')  # French (11 December 2014 at 09:00)
datetime.datetime(2014, 12, 11, 9, 0)

>>> dateparser.parse('13 января 2015 г. в 13:34')  # Russian (13 January 2015 at 13:34)
datetime.datetime(2015, 1, 13, 13, 34)

>>> dateparser.parse('1 เดือนตุลาคม 2005, 1:00 AM')  # Thai (1 October 2005, 1:00 AM)
datetime.datetime(2005, 10, 1, 1, 0)

>>> dateparser.parse('yaklaşık 23 saat önce')  # Turkish (23 hours ago), current time: 12:46
datetime.datetime(2019, 9, 7, 13, 46)

>>> dateparser.parse('2小时前')  # Chinese (2 hours ago), current time: 22:30
datetime.datetime(2018, 5, 31, 20, 30)

You can control multiple behaviors by using the settings parameter:

>>> dateparser.parse('2014-10-12', settings={'DATE_ORDER': 'YMD'})
datetime.datetime(2014, 10, 12, 0, 0)

>>> dateparser.parse('2014-10-12', settings={'DATE_ORDER': 'YDM'})
datetime.datetime(2014, 12, 10, 0, 0)

>>> dateparser.parse('1 year', settings={'PREFER_DATES_FROM': 'future'})  # Today is 2020-09-23
datetime.datetime(2021, 9, 23, 0, 0)

>>> import datetime
>>> dateparser.parse('tomorrow', settings={'RELATIVE_BASE': datetime.datetime(1992, 1, 1)})
datetime.datetime(1992, 1, 2, 0, 0)

To see more examples on how to use the settings, check the settings section in the docs.

False positives

dateparser will do its best to return a date, dealing with multiple formats and different locales. For that reason, it is important that the input is a valid date; otherwise, it could return false positives.

To reduce the possibility of receiving false positives, make sure that:

  • The input string is a valid date and doesn't contain any other words or numbers.
  • If you know the language or languages beforehand, you add them through the languages or locales properties.

On the other hand, if you want to exclude any of the default parsers (timestamp, relative-time...) or change the order in which they are executed, you can do so through the PARSERS setting.
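As a hedged illustration of the advice above, the sketch below restricts parsing to English and enables only two parsers. The helper name parse_strict is hypothetical, and the parser names are assumed to match the default names listed in the settings documentation.

```python
# Illustrative settings: keep only two parsers so that bare timestamps and
# relative expressions ("2 hours ago") are no longer accepted.
# Assumption: these names match dateparser's documented parser names.
STRICT_SETTINGS = {"PARSERS": ["custom-formats", "absolute-time"]}


def parse_strict(text):
    """Parse `text` as an English absolute date (hypothetical helper)."""
    import dateparser  # imported lazily; requires `pip install dateparser`

    return dateparser.parse(text, languages=["en"], settings=STRICT_SETTINGS)
```

Passing the language explicitly skips language autodetection, and the shorter parser list narrows what counts as a date, which should make false positives less likely.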

Installation

Dateparser supports Python >= 3.7. You can install it by doing:

$ pip install dateparser

If you want to use the jalali or hijri calendar, you need to install the calendars extra:

$ pip install dateparser[calendars]

Common use cases

dateparser can be used for a wide variety of purposes, but it stands out when it comes to:

Consuming data from different sources:

  • Scraping: extract dates from different places with several different formats and languages
  • IoT: consuming data coming from different sources with different date formats
  • Tooling: consuming dates from different logs / sources
  • Format transformations: when transforming dates coming from different files (PDF, CSV, etc.) to other formats (database, etc).

Offering natural interaction with users:

  • Tooling and CLI: allow users to write “3 days ago” to retrieve information.
  • Search engine: allow people to search by date in an easier, more natural format.
  • Bots: allow users to interact with a bot easily

You may also like...

  • price-parser - A small library for extracting price and currency from raw text strings.
  • number-parser - Library to convert numbers written in natural language to their equivalent numeric forms.
  • Scrapy - Web crawling and web scraping framework

License

BSD 3-Clause

Comments
  • Adding after and next keywords with basic functionality.

    resolves #635 resolves #573

    Before

    >>> dateparser.parse('after 15 days').strftime('%a %Y-%m-%d')
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    AttributeError: 'NoneType' object has no attribute 'strftime'
    
    >>> dateparser.parse('next tuesday').strftime('%a %Y-%m-%d')
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    AttributeError: 'NoneType' object has no attribute 'strftime'
    

    Now

    >>> dateparser.parse('now').strftime('%a %Y-%m-%d')
    'Sun 2020-03-22'
    
    >>> dateparser.parse('after 15 days').strftime('%a %Y-%m-%d')
    'Mon 2020-04-06'
    
    >>> dateparser.parse('next sunday').strftime('%a %Y-%m-%d')
    'Sun 2020-03-29'
    
    >>> dateparser.parse('next tuesday').strftime('%a %Y-%m-%d')
    'Tue 2020-03-24'
    
    
    opened by harsh9200 48
  • adding support for pytz timezones

    I added 3 tests for EST, and all existing tests pass.

    For your consideration. There was no existing issue for anything to do with it. I'm uncertain in that case if I'm supposed to create one so I will. Issue: https://github.com/scrapinghub/dateparser/issues/5

    example of transitional_info in pytz

    opened by JBKahn 33
  • Optional Language Detect

    Implementing optional language detection and allow the creation of custom language detection libraries by users.

    This work is in progress; the PR was opened so that the maintainers can monitor progress and make suggestions.

    This PR is currently not complete.

    TODOS:

    • [X] Restructuring project to pass function for language detection.
    • [X] Managing HTTP exception for fasttext model download.
    • [x] Updating setup.py for optional language detection.
    • [x] Mock test of parse, search_dates and DateDataParser.
    • [x] Unit tests for language detect functions.
    • [x] Functionality analysis of the implemented language detection.
    • [x] Testing optional language detection settings.
    • [x] Make apply_setting independent functions.
    • [x] Fasttext model download manager for models.
    • [x] Documentation optional language detection settings.
    • [x] Language Mapping (CLDR & ISO 639).
    • [x] Check working with other settings.
    • [x] Removing unsupported locale from language detecting
    • [x] Improving docs.
    • [x] Setting fasttext default language.
    • [x] Removing langauge_map.json and sinking to language_data.py
    • [x] langdetect set default DetectorFactory without changing global state.
    • [x] dateparser-download set default caching folder.
    • [x] Documenting dateparser-download.
    • [x] Changes for preventing breaking changes.
    • [x] detect_languages_func -> detect_languages_function
    opened by gavishpoddar 26
  • bug fix and possible fix for #11

    The use of lists there seems to make sense (should be forced as a list, right?).

    I'm not sure why you're specifically removing the timezone?

    This resolves https://github.com/scrapinghub/dateparser/issues/11 if that was simply a bug. Couldn't tell the intent of the line.

    In [1]: from dateparser.date import DateDataParser
    
    In [2]: ddp = DateDataParser()
    
    In [3]: ddp.get_date_data('2014-10-09T17:57:39+00:00')['date_obj']
    Out[3]: datetime.datetime(2014, 10, 9, 17, 57, 39, tzinfo=tzutc())
    
    In [4]: ddp.get_date_data('2014-10-09T17:57:39+00:00', '')['date_obj']
    Out[4]: datetime.datetime(2014, 10, 9, 17, 57, 39, tzinfo=tzutc())
    
    In [5]: ddp.get_date_data('2014-10-09T17:57:39+00:00', '%Y')['date_obj']
    Out[5]: datetime.datetime(2014, 10, 9, 17, 57, 39, tzinfo=tzutc())
    
    
    opened by JBKahn 20
  • Added support for fractional units

    Fixes #753

    Added support for fractional units by making the regex pattern support digits before a decimal point and converting num to a float instead of an int on line 150.

    What this changes

    Before this fix, parse("in 0.5 hours") would return the time 5 hours in the future instead of 0.5 hours in the future.

    With this fix, fractional units are parsed correctly.
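The behavior described above can be reproduced with the standard re module. This is a minimal sketch, not the library's code: UNITS is a simplified stand-in for the library's unit list, and extract_amount is a hypothetical helper.

```python
import re

# Simplified stand-in for the library's unit alternatives (assumption).
UNITS = r"hours?|minutes?|days?"

# Integer-only pattern (before the fix) and fractional pattern (after the fix).
OLD_PATTERN = re.compile(r"(\d+)\s*(%s)\b" % UNITS, re.I | re.S | re.U)
NEW_PATTERN = re.compile(r"(\d+[.,]?\d*)\s*(%s)\b" % UNITS, re.I | re.S | re.U)


def extract_amount(pattern, text):
    """Return the numeric amount found in front of a unit, or None."""
    match = pattern.search(text)
    if match is None:
        return None
    # Accept both "0.5" and "0,5" as decimal notation.
    return float(match.group(1).replace(",", "."))


print(extract_amount(OLD_PATTERN, "in 0.5 hours"))  # 5.0: only the "5" is matched
print(extract_amount(NEW_PATTERN, "in 0.5 hours"))  # 0.5
```

The old pattern cannot cross the decimal point, so the search restarts after it and picks up only the digits next to the unit.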

    Tests

    >>> parse("now")
    datetime.datetime(2021, 2, 7, 18, 50, 10, 107501)
    >>> parse("in 0.5 hours")
    datetime.datetime(2021, 2, 7, 19, 20, 15, 773800)
    >>> parse("in 3.75 hours") 
    datetime.datetime(2021, 2, 7, 22, 35, 20, 463238)
    >>> parse("2.5 hours ago")  
    datetime.datetime(2021, 2, 7, 16, 20, 29, 614937)
    

    What was changed

    To make it easier for future reviewers: the change is much smaller than it appears to be. The reason 131+ files changed is that the regex in write_complete_data.py was updated and the script was rerun.

    Very few lines of code were changed in this PR:

    - PATTERN = re.compile(r'(\d+)\s*(%s)\b' % _UNITS, re.I | re.S | re.U)
    + PATTERN = re.compile(r'(\d+[.,]?\d*)\s*(%s)\b' % _UNITS, re.I | re.S | re.U)
    
    - kwargs[unit + 's'] = int(num)
    + kwargs[unit + 's'] = float(num.replace(",", "."))
    
    - string = RELATIVE_PATTERN.sub(r'(\\d+)', string)
    + string = RELATIVE_PATTERN.sub(r'(\\d+[.,]?\\d*)', string)
    
    • Number regex changed in supplementary yaml files for custom relative-type-regex sections (multiple locations)
    In all occurrences in supplementary yaml:
    - (\d+)
    + (\d+[.,]?\d*)
    
    line 40
    + RE_SANITIZE_CROATIAN = re.compile(r'(\d+)\.\s?(\d+)\.\s?(\d+)\.( u)?', flags=re.I | re.U)
    line 110
    + date_string = RE_SANITIZE_CROATIAN.sub(r'\1.\2.\3 ', date_string)  # extra '.' and 'u' interferes with parsing relative fractional dates
    
    - return not (all([isinstance(x, str) for x in value])
    -                and all(['(\\d+)' in x for x in value]))
    + return not (all([isinstance(x, str) for x in value])
    +                and all(['(\\d+[.,]?\\d*)' in x for x in value]))
    
    line 99
    + # Fractional units
    + param('2.5 hours', ago={'hours': 2.5}, period='day'),
    + param('10.75 minutes', ago={'minutes': 10.75}, period='day'),
    + param('1.5 days', ago={'days': 1.5}, period='day'),
    line 616
    + # Fractional units
    + param('2.5 hours', ago={'hours': 2.5}, period='day'),
    + param('10.75 minutes', ago={'minutes': 10.75}, period='day'),
    + param('1.5 days', ago={'days': 1.5}, period='day'),
    line 1108
    + # Fractional units
    + param('in 2.5 hours', in_future={'hours': 2.5}, period='day'),
    + param('in 10.75 minutes', in_future={'minutes': 10.75}, period='day'),
    + param('in 1.5 days', in_future={'days': 1.5}, period='day'),
    + param('in 0,5 hours', in_future={'hours': 0.5}, period='day'),
    line 1635
    + # Fractional units
    + param('2.5 hours ago', date(2010, 6, 4), time(10, 45)),
    + param('in 10.75 minutes', date(2010, 6, 4), time(13, 25, 45)),
    + param('in 1.5 days', date(2010, 6, 6), time(1, 15)),
    + param('0,5 hours ago', date(2010, 6, 4), time(12, 45)),
    

    See these comments for shortcuts to view the changes

    opened by DenverCoder1 19
  • Bad escape characters trigger an exception

    Note: As a workaround for this issue, we have pinned regex, which makes Python 3.11 support either impossible or uncomfortable. The goal now is to remove that version pin on regex without making this issue resurface.

    Hello everyone,

    Tried parsing under python 3.7.5 and 3.9

    dateparser.parse('12/12/12')

    It also gives the same output for any "valid" input shown in the doc:

    dateparser.parse('Fri, 12 Dec 2014 10:55:50')
    dateparser.parse('22 Décembre 2010', date_formats=['%d %B %Y'])
    ...
    

    Here's the error:

    
    ---------------------------------------------------------------------------
    error                                     Traceback (most recent call last)
    Input In [46], in <cell line: 1>()
    ----> 1 dateparser.parse("12/12/12")
    
    File c:\users\strey\appdata\local\programs\python\python39\lib\site-packages\dateparser\conf.py:92, in apply_settings.<locals>.wrapper(*args, **kwargs)
         89 if not isinstance(kwargs['settings'], Settings):
         90     raise TypeError("settings can only be either dict or instance of Settings class")
    ---> 92 return f(*args, **kwargs)
    
    File c:\users\strey\appdata\local\programs\python\python39\lib\site-packages\dateparser\__init__.py:61, in parse(date_string, date_formats, languages, locales, region, settings, detect_languages_function)
         57 if languages or locales or region or detect_languages_function or not settings._default:
         58     parser = DateDataParser(languages=languages, locales=locales,
         59                             region=region, settings=settings, detect_languages_function=detect_languages_function)
    ---> 61 data = parser.get_date_data(date_string, date_formats)
         63 if data:
         64     return data['date_obj']
    
    File c:\users\strey\appdata\local\programs\python\python39\lib\site-packages\dateparser\date.py:428, in DateDataParser.get_date_data(self, date_string, date_formats)
        425 date_string = sanitize_date(date_string)
        427 for locale in self._get_applicable_locales(date_string):
    --> 428     parsed_date = _DateLocaleParser.parse(
        429         locale, date_string, date_formats, settings=self._settings)
        430     if parsed_date:
        431         parsed_date['locale'] = locale.shortname
    
    File c:\users\strey\appdata\local\programs\python\python39\lib\site-packages\dateparser\date.py:178, in _DateLocaleParser.parse(cls, locale, date_string, date_formats, settings)
        175 @classmethod
        176 def parse(cls, locale, date_string, date_formats=None, settings=None):
        177     instance = cls(locale, date_string, date_formats, settings)
    --> 178     return instance._parse()
    
    File c:\users\strey\appdata\local\programs\python\python39\lib\site-packages\dateparser\date.py:182, in _DateLocaleParser._parse(self)
        180 def _parse(self):
        181     for parser_name in self._settings.PARSERS:
    --> 182         date_data = self._parsers[parser_name]()
        183         if self._is_valid_date_data(date_data):
        184             return date_data
    
    File c:\users\strey\appdata\local\programs\python\python39\lib\site-packages\dateparser\date.py:196, in _DateLocaleParser._try_freshness_parser(self)
        194 def _try_freshness_parser(self):
        195     try:
    --> 196         return freshness_date_parser.get_date_data(self._get_translated_date(), self._settings)
        197     except (OverflowError, ValueError):
        198         return None
    
    File c:\users\strey\appdata\local\programs\python\python39\lib\site-packages\dateparser\date.py:234, in _DateLocaleParser._get_translated_date(self)
        232 def _get_translated_date(self):
        233     if self._translated_date is None:
    --> 234         self._translated_date = self.locale.translate(
        235             self.date_string, keep_formatting=False, settings=self._settings)
        236     return self._translated_date
    
    File c:\users\strey\appdata\local\programs\python\python39\lib\site-packages\dateparser\languages\locale.py:131, in Locale.translate(self, date_string, keep_formatting, settings)
        128 dictionary = self._get_dictionary(settings)
        129 date_string_tokens = dictionary.split(date_string, keep_formatting)
    --> 131 relative_translations = self._get_relative_translations(settings=settings)
        133 for i, word in enumerate(date_string_tokens):
        134     word = word.lower()
    
    File c:\users\strey\appdata\local\programs\python\python39\lib\site-packages\dateparser\languages\locale.py:158, in Locale._get_relative_translations(self, settings)
        155 if settings.NORMALIZE:
        156     if self._normalized_relative_translations is None:
        157         self._normalized_relative_translations = (
    --> 158             self._generate_relative_translations(normalize=True))
        159     return self._normalized_relative_translations
        160 else:
    
    File c:\users\strey\appdata\local\programs\python\python39\lib\site-packages\dateparser\languages\locale.py:172, in Locale._generate_relative_translations(self, normalize)
        170     value = list(map(normalize_unicode, value))
        171 pattern = '|'.join(sorted(value, key=len, reverse=True))
    --> 172 pattern = DIGIT_GROUP_PATTERN.sub(r'?P<n>\d+', pattern)
        173 pattern = re.compile(r'^(?:{})$'.format(pattern), re.UNICODE | re.IGNORECASE)
        174 relative_dictionary[pattern] = key
    
    File c:\users\strey\appdata\local\programs\python\python39\lib\site-packages\regex\regex.py:700, in _compile_replacement_helper(pattern, template)
        695     break
        696 if ch == "\\":
        697     # '_compile_replacement' will return either an int group reference
        698     # or a string literal. It returns items (plural) in order to handle
        699     # a 2-character literal (an invalid escape sequence).
    --> 700     is_group, items = _compile_replacement(source, pattern, is_unicode)
        701     if is_group:
        702         # It's a group, so first flush the literal.
        703         if literal:
    
    File c:\users\strey\appdata\local\programs\python\python39\lib\site-packages\regex\_regex_core.py:1736, in _compile_replacement(source, pattern, is_unicode)
       1733         if value is not None:
       1734             return False, [value]
    -> 1736     raise error("bad escape \\%s" % ch, source.string, source.pos)
       1738 if isinstance(source.sep, bytes):
       1739     octal_mask = 0xFF
    
    error: bad escape \d at position 7
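The failing call passes a replacement template that contains \d, and the regex package rejects it as a bad escape; the standard re module does the same on Python 3.7 and later. The sketch below shows two ways to insert such text literally. DIGIT_GROUP is a hypothetical stand-in for the library's DIGIT_GROUP_PATTERN, and neither function is claimed to be the fix the project adopted.

```python
import re

# Hypothetical stand-in for DIGIT_GROUP_PATTERN: it matches the literal
# text "\d+" inside a pattern string.
DIGIT_GROUP = re.compile(r"\\d\+")


def name_digit_group(pattern_text):
    """Name the digit group without using a replacement template."""
    # The return value of a callable is inserted verbatim, so the "\d" in
    # it is never interpreted as an escape sequence.
    return DIGIT_GROUP.sub(lambda match: r"?P<n>\d+", pattern_text)


def name_digit_group_plain(pattern_text):
    """Same result with plain string replacement, no regex involved."""
    return pattern_text.replace(r"\d+", r"?P<n>\d+")


print(name_digit_group(r"(\d+) hours ago"))  # (?P<n>\d+) hours ago
```

Both variants avoid template parsing entirely, so they do not depend on how a given regex engine version treats unknown escapes.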
    

    How to reproduce (environment: Windows 10):

    • Fresh install of python 3.7.5 or 3.9
    • Make a simple python file including these 2 lines:
    import dateparser
    dateparser.parse("12/12/12")
    
    good first issue 
    opened by Etirf 17
  • Improving language detection

    Improving language detection using optional language detection library

    Related Issues #567, #575, #612.

    Implementing plugging wrappers for optional language detection library, wrapper template, and docs on implementing a custom wrapper.

    opened by gavishpoddar 17
  • WIP : added support for century

    @noviluni, this PR shows the code that triggers error #817.

    This PR adds support for the word "century" (#725). I was unable to generate en.py because of error #817. Thanks!

    opened by NEERAJAP2001 14
  • Huge performance problem: `DateDataParser` stored multiple duplicate `previous_locales`

    We were surprised to find that dateparser.parse() gets slower over time (in a long-running process, ~1 day), from hundreds of ms to 3 seconds per call! One possible cause is the following:

    DateDataParser.get_date_data() stores some previously used locales:

    if self.try_previous_locales:
        self.previous_locales.insert(0, locale)
    

    However, we observed that dateparser._default_parser.previous_locales contained many duplicate instances of the same set of locales. Eg. 2189 instances of 63 unique locales.

    The problem with the code above is that self.previous_locales should be a set, possibly ordered, but there's no check that locale is already inside.
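A minimal sketch of the missing check, using plain strings in place of locale objects. The helper name remember_locale is hypothetical, and this is not necessarily how the library resolved the issue.

```python
def remember_locale(previous_locales, locale):
    """Move `locale` to the front of the list, keeping at most one copy."""
    if locale in previous_locales:
        previous_locales.remove(locale)
    previous_locales.insert(0, locale)
    return previous_locales


locales = []
for name in ["en", "fr", "en", "en", "de", "fr"]:
    remember_locale(locales, name)

print(locales)  # ['fr', 'de', 'en']
```

The list stays ordered by most recent use and can never grow beyond the number of distinct locales.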

    Anyway, even a plain dateparser.parse() call is pretty slow (~400 ms / item). However, with only languages=['en'] it's really fast (~0.25 ms / item).

    Package version: latest 0.7.0, Python 2.7.

    performance 
    opened by bzamecnik 14
  • Result of dateparser.parse() should not depend on locale

    Python 2.7.12 (default, Jul  1 2016, 15:12:24) 
    [GCC 5.4.0 20160609] on linux2
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import dateparser
    >>> dateparser.__version__
    '0.5.0'
    >>> import locale
    >>> locale.getlocale()
    (None, None)
    >>> dateparser.parse('21-FEB-2016', date_formats=['%d-%b-%Y'], languages=['en'])
    datetime.datetime(2016, 2, 21, 0, 0)
    >>> locale.setlocale(locale.LC_ALL, 'fr_FR.UTF-8')
    'fr_FR.UTF-8'
    >>> dateparser.parse('21-FEB-2016', date_formats=['%d-%b-%Y'], languages=['en'])
    >>>
    

    This is a problem for me: I want dateparser.parse to return a correct result when I pass languages=['en'], even if my locale is not English. The users of my software (Odoo) may have any locale, but dateparser.parse() used with the "languages" argument should always give the same result!
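The %b directive of strptime matches the locale's abbreviated month name, which is a likely reason the result changes after setlocale. A locale-independent alternative can be sketched with an explicit English month table. The helper parse_day_mon_year is hypothetical and handles only the DD-MON-YYYY shape from the report.

```python
import re
from datetime import datetime

# English month abbreviations, spelled out so that the result does not
# depend on the process locale.
MONTHS_EN = {
    "jan": 1, "feb": 2, "mar": 3, "apr": 4, "may": 5, "jun": 6,
    "jul": 7, "aug": 8, "sep": 9, "oct": 10, "nov": 11, "dec": 12,
}

DAY_MON_YEAR = re.compile(r"^(\d{1,2})-([A-Za-z]{3})-(\d{4})$")


def parse_day_mon_year(text):
    """Parse 'DD-MON-YYYY' with English month names; return None on failure."""
    match = DAY_MON_YEAR.match(text.strip())
    if match is None:
        return None
    day, month_name, year = match.groups()
    month = MONTHS_EN.get(month_name.lower())
    if month is None:
        return None
    try:
        return datetime(int(year), month, int(day))
    except ValueError:  # impossible dates such as 31-FEB-2016
        return None


print(parse_day_mon_year("21-FEB-2016"))  # 2016-02-21 00:00:00
```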

    opened by alexis-via 12
  • Python 3.9 PytzUsageWarning

    When upgrading to Python 3.9, getting the following warning:

    Python 3.9.5 (default, Jun  4 2021, 12:28:51)
    [GCC 7.5.0] :: Anaconda, Inc. on linux
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import dateparser
    >>> dateparser.parse("2021-08-02")
    /opt/conda/lib/python3.9/site-packages/dateparser/date_parser.py:35: PytzUsageWarning: The localize method is no longer necessary, as this time zone supports the fold attribute (PEP 495). For more details on migrating to a PEP 495-compliant implementation, see https://pytz-deprecation-shim.readthedocs.io/en/latest/migration.html
      date_obj = stz.localize(date_obj)
    datetime.datetime(2021, 8, 2, 0, 0)
    

    Is there an anticipated fix for this or a way to suppress this?

    Thanks!
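Until a fix is available, the warning can be filtered with the standard warnings module. This is a sketch under the assumption that matching on the start of the message quoted above is acceptable; the function name silence_localize_warning is hypothetical.

```python
import warnings


def silence_localize_warning():
    """Ignore warnings whose message starts with the text quoted above."""
    # Filtering by message avoids importing the warning class itself.
    warnings.filterwarnings(
        "ignore",
        message="The localize method is no longer necessary",
    )
```

Call it once before the first dateparser.parse() call. The filter applies to the whole process, so other warnings with the same opening text are hidden too.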

    cleanup 
    opened by kkoehncke 11
  • 1st - 12th treated as months rather than days

    Using the common date abbreviations of Date + st, nd, rd, th alone returns months for 1st - 12th rather than days.

    Example of current behavior (desired) with 13th - 31st

    dateparser.parse("13th")
    dateparser.parse("25th 5pm")
    > 2023-01-13 00:00:00
    > 2023-01-25 17:00:00
    

    Example of current behavior (unwanted) with 1st - 12th

    dateparser.parse("2nd 9pm")
    dateparser.parse("12th 11am")
    > 2023-02-09 21:00:00
    > 2023-12-09 11:00:00
    

    Example of desired behavior with 1st - 12th

    dateparser.parse("2nd 9pm")
    dateparser.parse("12th 11am")
    > 2023-01-02 21:00:00
    > 2023-01-12 11:00:00
    

    I'm aware this may be avoided by passing a DATE_ORDER option through the settings, but that would break other functionality when the default order of MDY is desired. The date abbreviations specified above are very common even with MDY, so some sort of solution that allows both would be ideal.
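The desired interpretation can be sketched as a pre-processing step that reads a leading ordinal as a day of the month. The helper day_from_ordinal is hypothetical, it ignores the time part of the input, and it is not a proposal for how the library should implement this.

```python
import re
from datetime import datetime

# A leading number followed by an English ordinal suffix, e.g. "2nd" or "12th".
ORDINAL_DAY = re.compile(r"^\s*(\d{1,2})(?:st|nd|rd|th)\b", re.IGNORECASE)


def day_from_ordinal(text, base):
    """Return `base` moved to the day named by a leading ordinal, else None."""
    match = ORDINAL_DAY.match(text)
    if match is None:
        return None
    try:
        return base.replace(day=int(match.group(1)))
    except ValueError:  # e.g. "31st" when the base month has 30 days
        return None


base = datetime(2023, 1, 20)
print(day_from_ordinal("2nd 9pm", base))  # 2023-01-02 00:00:00
```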

    opened by Etorix0005 0
  • Fixing current_period logic for weekdays

    When the current_period setting is used the current period for weekdays should be treated as the current week.

    Test date: 2023-01-09 00:00:00 (Monday)
    Test input: dateparser.parse("Thursday", settings={'PREFER_DATES_FROM': 'current_period'})
    Current output: 2023-01-05 00:00:00 (previous week)
    Desired output: 2023-01-12 00:00:00 (current week)
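The desired result can be reproduced with a small date calculation. This sketch assumes that a week starts on Monday; weekday_in_current_week is a hypothetical helper, not the code changed in this PR.

```python
from datetime import datetime, timedelta

WEEKDAYS = [
    "monday", "tuesday", "wednesday", "thursday",
    "friday", "saturday", "sunday",
]


def weekday_in_current_week(name, base):
    """Return the date of weekday `name` in the Monday-based week of `base`."""
    target = WEEKDAYS.index(name.lower())
    week_start = base - timedelta(days=base.weekday())
    return week_start + timedelta(days=target)


print(weekday_in_current_week("Thursday", datetime(2023, 1, 9)))
# 2023-01-12 00:00:00
```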

    opened by Etorix0005 0
  • Is not parsing properly the Last-Modified header value when is a Monday

    Hi,

    Just in case: in the previous version we were able to parse 'Mon, 05 Dec 2022 19:43:03 GMT' without any problem, and now it returns None.

    This happens only when the day is Monday.

    It could be related to this commit: https://github.com/scrapinghub/dateparser/commit/89a5b29517a358e2784cc818687b7b6bed343159 ?

    Thank you!

    Best, Nicolas.

    Type: Bug Status: Bug confirmed 
    opened by rustico 1
  • Parse settings do not accept subclass types

    Using a settings dictionary when parsing a date results in an error if the provided object is a subclass of the type specified in the dateparser settings schema, such as a pendulum.DateTime object. Running

    settings = {
        "RELATIVE_BASE": pendulum.now()
    }
    
    dateparser.parse("-15m", settings=settings)
    

    will generate the error

    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/lib/python3.8/site-packages/dateparser/conf.py", line 92, in wrapper
        return f(*args, **kwargs)
      File "/lib/python3.8/site-packages/dateparser/__init__.py", line 58, in parse
        parser = DateDataParser(languages=languages, locales=locales,
      File "/lib/python3.8/site-packages/dateparser/conf.py", line 92, in wrapper
        return f(*args, **kwargs)
      File "/lib/python3.8/site-packages/dateparser/date.py", line 387, in __init__
        check_settings(settings)
      File "/lib/python3.8/site-packages/dateparser/conf.py", line 244, in check_settings
        raise SettingValidationError(
    dateparser.conf.SettingValidationError: "RELATIVE_BASE" must be "datetime", not "DateTime".
    

    This happens even though pendulum.DateTime is a subclass of datetime. The check at this line https://github.com/scrapinghub/dateparser/blob/master/dateparser/conf.py#L243 should call isinstance instead of directly checking type equality, so that subclassed objects are allowed in settings definitions. This is a regression from previous versions, where pendulum.DateTime objects were allowed.
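The difference between the two checks can be shown with only the standard library. CustomDateTime is a hypothetical stand-in for a third-party subclass such as pendulum.DateTime, and the two helper functions are illustrative, not the library's validation code.

```python
from datetime import datetime


class CustomDateTime(datetime):
    """Stand-in for a third-party datetime subclass."""


def is_exact_type(value, expected):
    """Strict check: rejects subclasses."""
    return type(value) is expected


def is_instance_of(value, expected):
    """Permissive check: accepts subclasses as well."""
    return isinstance(value, expected)


value = CustomDateTime(2023, 1, 9)
print(is_exact_type(value, datetime))   # False
print(is_instance_of(value, datetime))  # True
```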

    Type: Bug 
    opened by majikman111 0
  • [WIP] Fix PytzUsageWarning

    Close #1089

    >>> from datetime import datetime
    >>> dateparser.parse(
    ...     "in 3 hours",
    ...     settings={
    ...         "TO_TIMEZONE": "UTC",
    ...         "RELATIVE_BASE": datetime.now(),
    ...     },
    ... )
    datetime.datetime(2022, 12, 7, 14, 43, 47, 345343)
    
    opened by serhii73 1
  • Replace OrderedDict to dict

    Makes sense wherever it does not affect external APIs. e.g. if a function uses OrderedDict internally, then yes; if a function returns OrderedDict, we probably should follow a deprecation approach, e.g. deprecate that function (but keep it for a time) and provide a new one that returns a dict.
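The deprecation approach mentioned above could look like the following sketch. The function names get_units and get_units_ordered are hypothetical and do not refer to real dateparser functions.

```python
import warnings
from collections import OrderedDict


def get_units():
    """New function: returns a plain dict (insertion-ordered since Python 3.7)."""
    return {"hours": 1, "minutes": 30}


def get_units_ordered():
    """Old function, kept for a deprecation period."""
    warnings.warn(
        "get_units_ordered() is deprecated; use get_units() instead",
        DeprecationWarning,
        stacklevel=2,
    )
    return OrderedDict(get_units())
```

Existing callers keep receiving an OrderedDict and see a warning that points at their own call site, while new code can switch to the dict-returning function right away.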

    opened by serhii73 3
Releases(v1.1.5)
  • v1.1.5(Dec 29, 2022)

    Improvements:

    • Parse short versions of day, month, and year (#1103)
    • Add a test for “in 1d” (#1104)
    • Update languages_info (#1107)
    • Add a workaround for zipimporter not having exec_module before Python 3.10 (#1069)
    • Stabilize tests at midnight (#1111)
    • Add a test case for French (#1110)

    Cleanups:

    • Remove the requirements-build file (#1113)
  • v1.1.4(Nov 21, 2022)

    Improvements:

    • Improved support for languages such as Slovak, Indonesian, Hindi, German, and Japanese (#1064, #1094, #986, #1071, #1068)
    • Recursively create a model home (#996)
    • Replace regex sub with simple string replace (#1095)
    • Add Python 3.10, 3.11 support (#1096)
    • Drop support for Python 3.5, 3.6 versions (#1097)
  • v1.1.3(Nov 3, 2022)

    New features:

    • Add support for fractional units (#876)

    Improvements:

    • Fix the returned datetime skipping a day with time+timezone input and PREFER_DATES_FROM = 'future' (#1002)
    • Fix input translation breaking keep_formatting (#720)
    • English: support "till date" (#1005)
    • English: support “after” and “before” in relative dates (#1008)

    Cleanups:

    • Reorganize internal data (#1090)
    • CI updates (#1088)
  • v1.1.2(Oct 20, 2022)

    Improvements:

    • Added support for negative timestamp (#1060)
    • Fixed PytzUsageWarning for Python versions >= 3.6 (#1062)
    • Added support for dates with dots and spaces (#1028)
    • Improved support for Ukrainian, Croatian and Russian (#1072, #1074, #1079, #1082, #1073, #1083)
    • Added support for parsing Unix timestamps consistently regardless of timezones (#954)
    • Improved tests (#1086)
  • v1.1.1(Mar 17, 2022)

    Improvements:

    • Fixed issue with regex library by pinning dependencies to an earlier version (< 2022.3.15, https://github.com/scrapinghub/dateparser/pull/1046).
    • Extended support for Russian language dates starting with lowercase (https://github.com/scrapinghub/dateparser/pull/999).
    • Allowed use_given_order to be used for languages too (https://github.com/scrapinghub/dateparser/pull/997).
    • Fixed link to settings section (https://github.com/scrapinghub/dateparser/pull/1018).
    • Defined UTF-8 encoding for Windows (https://github.com/scrapinghub/dateparser/pull/998).
    • Fixed directories creation error in CLI utils (https://github.com/scrapinghub/dateparser/pull/1022).
  • v1.1.0(Oct 4, 2021)

    New features:

    • Support language detection based on langdetect, fastText, or a custom implementation (see #932)
    • Add support for 'by
    • Sort default language list by internet usage (see #805)

    Improvements:

    • Improved support of Chinese (#910), Czech (#977)
    • Improvements in search_dates (see #953)
    • Make order of previous locales deterministic (see #851)
    • Fix parsing with trailing space (see #841)
    • Consider RETURN_TIME_AS_PERIOD for timestamp times (see #922)
    • Exclude failing regex version (see #974)
    • Ongoing work on multithreading support (see #881, #885)
    • Add demo URL (see #883)

    QA:

    • Migrate pipelines from Travis CI to Github Actions (see #859, #879, #884, #886, #911, #966)
    • Use versioned CLDR data (see #825)
    • Add a script to update table of supported languages and locales (see #601)
    • Sort 'skip' keys in yaml files (see #844)
    • Improve test coverage (see #827)
    • Code cleanup (see #888, #907, #951, #958, #957)
  • v1.0.0(Oct 29, 2020)

    Breaking changes:

    • Drop support for Python 2.7 and pypy (see #727, #744, #748, #749, #754, #755, #758, #761, #763, #764, #777 and #783)
    • Now DateDataParser.get_date_data() returns a DateData object instead of a dict (see #778).
    • From now on, wrong settings are no longer silenced and raise SettingValidationError (see #797)
    • Now dateparser.parse() is deterministic and doesn't try previous locales. Also, DateDataParser.get_date_data() doesn't try the previous locales by default (see #781)
    • Remove the 'base-formats' parser (see #721)
    • Extract the 'no-spaces-time' parser from the 'absolute-time' parser and make it an optional parser (see #786)
    • Remove numeral_translation_data (see #782)
    • Remove the undocumented SKIP_TOKENS_PARSER and FUZZY settings (see #728, #794)
    • Remove support for using strings in date_formats (see #726)
    • The undocumented ExactLanguageSearch class has been moved to the private scope and some internal methods have changed (see #778)
    • Changes in dateparser.utils: normalize_unicode() doesn't accept bytes as input and convert_to_unicode has been deprecated (see #749)

    New features:

    • Add Python 3.9 support (see #732, #823)
    • Detect hours separated with a period/dot (see #741)
    • Add support for "decade" (see #762)
    • Add support for the hijri calendar in Python ≥ 3.6 (see #718)

    Improvements:

    • New logo! (see #719)
    • Improve the README and docs (see #779, #722)
    • Fix the "calendars" extra (see #740)
    • Fix leap years when PREFER_DATES_FROM is set (see #738)
    • Fix STRICT_PARSING setting in no-spaces-time parser (see #715)
    • Consider RETURN_TIME_AS_PERIOD setting for relative-time parser (see #807)
    • Parse the 24hr time format with meridian info (see #634)
    • Other small improvements (see #698, #709, #710, #712, #730, #731, #735, #739, #784, #788, #795 and #801)
  • v0.7.6(Jun 12, 2020)

    Improvements:

    • Rename scripts to dateparser_scripts to avoid name collisions with modules from other packages or projects (see https://github.com/scrapinghub/dateparser/pull/707)
  • v0.7.5(Jun 10, 2020)

    New features:

    • Add Python 3.8 support (see https://github.com/scrapinghub/dateparser/pull/664)
    • Implement a REQUIRE_PARTS setting (see https://github.com/scrapinghub/dateparser/pull/703)
    • Add support for subscript and superscript numbers (see https://github.com/scrapinghub/dateparser/pull/684)
    • Extended French support (see https://github.com/scrapinghub/dateparser/pull/672)
    • Extended German support (see https://github.com/scrapinghub/dateparser/pull/673)

    Improvements:

    • Migrate test suite to Pytest (see https://github.com/scrapinghub/dateparser/pull/662)
    • Add test to check the yaml and json files content (see https://github.com/scrapinghub/dateparser/pull/663 and https://github.com/scrapinghub/dateparser/pull/692)
    • Add flake8 pipeline with pytest-flake8 (see https://github.com/scrapinghub/dateparser/pull/665)
    • Add partial support for 8-digit dates without separators (see https://github.com/scrapinghub/dateparser/pull/639)
    • Fix possible OverflowError errors and explicitly avoid raising ValueError when parsing relative dates (see https://github.com/scrapinghub/dateparser/pull/686)
    • Fix double-digit GMT and UTC parsing (see https://github.com/scrapinghub/dateparser/pull/632)
    • Fix bug when using DATE_ORDER (see https://github.com/scrapinghub/dateparser/pull/628)
    • Fix bug when parsing relative time with timezone (see https://github.com/scrapinghub/dateparser/pull/503)
    • Fix milliseconds parsing (see https://github.com/scrapinghub/dateparser/pull/572 and https://github.com/scrapinghub/dateparser/pull/661)
    • Fix wrong values to be interpreted as 'future' in PREFER_DATES_FROM (see https://github.com/scrapinghub/dateparser/pull/629)
    • Other small improvements (see https://github.com/scrapinghub/dateparser/pull/667, https://github.com/scrapinghub/dateparser/pull/675, https://github.com/scrapinghub/dateparser/pull/511, https://github.com/scrapinghub/dateparser/pull/626, https://github.com/scrapinghub/dateparser/pull/512, https://github.com/scrapinghub/dateparser/pull/509, https://github.com/scrapinghub/dateparser/pull/696, https://github.com/scrapinghub/dateparser/pull/702 and https://github.com/scrapinghub/dateparser/pull/699)
  • v0.7.4(Mar 6, 2020)

    New features:

    • Extended Norwegian support (see https://github.com/scrapinghub/dateparser/pull/598)
    • Implement a PARSERS setting (see https://github.com/scrapinghub/dateparser/pull/603)

    Improvements:

    • Add support for PREFER_DATES_FROM in relative/freshness parser (see https://github.com/scrapinghub/dateparser/pull/414)
    • Add support for PREFER_DAY_OF_MONTH in base-formats parser (see https://github.com/scrapinghub/dateparser/pull/611)
    • Added UTC -00:00 as a valid offset (see https://github.com/scrapinghub/dateparser/pull/574)
    • Fix support for “one” (see https://github.com/scrapinghub/dateparser/pull/593)
    • Fix TypeError when parsing some invalid dates (see https://github.com/scrapinghub/dateparser/pull/536)
    • Fix tokenizer for unrecognized characters (see https://github.com/scrapinghub/dateparser/pull/622)
    • Prevent installing regex 2019.02.19 (see https://github.com/scrapinghub/dateparser/pull/600)
    • Resolve DeprecationWarning related to raw string escape sequences (see https://github.com/scrapinghub/dateparser/pull/596)
    • Implement a tox environment to build the documentation (see https://github.com/scrapinghub/dateparser/pull/604)
    • Improve tests stability (see https://github.com/scrapinghub/dateparser/pull/591, https://github.com/scrapinghub/dateparser/pull/605)
    • Documentation improvements (see https://github.com/scrapinghub/dateparser/pull/510, https://github.com/scrapinghub/dateparser/pull/578, https://github.com/scrapinghub/dateparser/pull/619, https://github.com/scrapinghub/dateparser/pull/614, https://github.com/scrapinghub/dateparser/pull/620)
    • Performance improvements (see https://github.com/scrapinghub/dateparser/pull/570, https://github.com/scrapinghub/dateparser/pull/569, https://github.com/scrapinghub/dateparser/pull/625)
  • v0.7.3(Mar 6, 2020)

  • v0.7.2(Sep 17, 2019)

    Features:

    • Extended Czech support
    • Added time to valid periods
    • Added timezone information to dates found with search_dates()
    • Support strings as date formats

    Improvements:

    • Fixed Collections ABCs deprecation warning
    • Fixed dates with trailing colons not being parsed
    • Fixed date format override on any settings change
    • Fixed parsing current weekday as past date, regardless of settings
    • Added UTC -2:30 as a valid offset
    • Added Python 3.7 to supported versions, dropped support for Python 3.3 and 3.4
    • Moved to importlib from imp where possible
    • Improved support for Catalan
    • Documentation improvements
  • v0.7.1(Feb 12, 2019)

    Features/news:

    • Added detected language to return value of search_dates()
    • Performance improvements
    • Refreshed versions of dependencies

    Improvements:

    • Fixed unpickleable DateTime objects with timezones
    • Fixed regex pattern to avoid new behaviour of re.split() in Python 3.7
    • Fixed an exception thrown when parsing colons
    • Fixed tests failing on days with number greater than 30
    • Fixed ZeroDivisionError exceptions
  • v0.7.0(Feb 8, 2018)

    Features added during Google Summer of Code 2017:

    • Harvesting language data from Unicode CLDR database (https://github.com/unicode-cldr/cldr-json), which includes over 200 locales (#321) - authored by Sarthak Maddan. See full currently supported locale list in README.
    • Extracting dates from longer strings of text (#324) - authored by Elena Zakharova. Special thanks for their awesome contributions!

    New features:

    • Added (independently from CLDR) Georgian (#308) and Swedish (#305)

    Improvements:

    • Improved support of Chinese (#359), Thai (#345), French (#301, #304), Russian (#302)
    • Removed ruamel.yaml from dependencies (#374). This should reduce the number of installation issues and improve performance as a result of moving away from YAML as the basic data storage format. Note that YAML is still used as the format for support language files.
    • Improved performance by pre-compiling frequently used regexes and lazy loading data (#293, #294, #295, #315)
    • Extended tests (#316, #317, #318, #323)
    • Updated nose_parameterized to its current package, parameterized (#381)

    Planned for next release:

    • Full language and locale names
    • Performance and stability improvements
    • Documentation improvements
  • v0.6.0(Mar 13, 2017)

    New features:

    • Consistent parsing in terms of true python representation of date string. See #281
    • Added support for Bangla, Bulgarian and Hindi languages.

    Improvements:

    • Major bug fixes related to parser and system's locale. See #277, #282
    • Type check for timezone arguments in settings. See #267
    • Pinned dependencies' versions in requirements. See #265
    • Improved support for Chinese (cn), Spanish (es) and Dutch. See #274, #272, #285

    Packaging:

    • The calendars feature is now packaged as an extra: install dateparser with the calendars extra if you need it.
  • v0.5.1(Dec 18, 2016)

    New features:

    • Added support for Hebrew

    Improvements:

    • Safer loading of YAML. See #251
    • Better timezone parsing for freshness dates. See #256
    • Pinned dependencies' versions in requirements. See #265
    • Improved support for zh, fi languages. See #249, #250, #248, #244
  • v0.5.0(Sep 26, 2016)

    New features:

    • DateDataParser now also returns detected language in the result dictionary.
    • Explicit and lucid timezone conversion for a given datestring using TIMEZONE, TO_TIMEZONE settings.
    • Added Hungarian language.
    • Added the STRICT_PARSING setting to ignore incomplete dates.

    Improvements:

    • Fixed quite a few parser bugs reported in issues #219, #222, #207, #224.
    • Improved support for the Chinese language.
    • Consistent interface for both Jalali and Hijri parsers.
  • v0.4.0(Jun 16, 2016)

    New features:

    • Support for language-based date order preference while parsing ambiguous dates.
    • Support for parsing dates with no spaces in between components.
    • Support for custom date order preference using settings.
    • Support for parsing generic relative dates in the future, e.g. tomorrow, in two weeks, etc.
    • Added RELATIVE_BASE settings to set date context to any datetime in past or future.
    • Replaced dateutil.parser.parse with dateparser's own parser.
    • Little/no tolerance for invalid dates

    Improvements:

    • Added simplifications for 12 noon and 12 midnight.
    • Fixed several bugs
    • Replaced the PyYAML library with its active fork ruamel.yaml, which also fixed the issues with installation on Windows using Python 3.5.
    • More predictable date_formats handling.
  • v0.3.5(Apr 27, 2016)

    New features:

    • Danish language support.
    • Japanese language support.
    • Support for parsing date strings with accents.

    Improvements:

    • Transformed languages.yaml into base file and separate files for each language.
    • Fixed Vietnamese language simplifications.
    • No more version restrictions for python-dateutil.
    • Timezone parsing improvements.
    • Fixed test environments.
    • Cleaned language codes. Now we strictly follow codes as in ISO 639-1.
    • Improved Chinese dates parsing.
  • v0.3.4(Mar 3, 2016)

  • v0.3.3(Feb 29, 2016)

    New features:

    • Finnish language support.

    Improvements:

    • Faster parsing by switching to the regex module.
    • Added the RETURN_AS_TIMEZONE_AWARE setting to return timezone-aware date objects.
    • Fixed conflicts with month/weekday names similarity across languages.
  • v0.3.2(Jan 25, 2016)

    New features:

    • Added Hijri Calendar support.
    • Added settings for better control over parsing dates.
    • Support to convert parsed time to the given timezone for both complete and relative dates.

    Improvements:

    • Fixed problem with caching datetime.now in FreshnessDateDataParser.
    • Added month names and week day names abbreviations to several languages.
    • More simplifications for Russian and Ukrainian languages.
    • Fixed problem with parsing time component of date strings with several kinds of apostrophes.
  • v0.3.1(Oct 28, 2015)

    New features:

    • Support for Jalali Calendar.
    • Belarusian language support.
    • Indonesian language support.

    Improvements:

    • Extended support for Russian and Polish.
    • Fixed bug with time zone recognition.
    • Fixed bug with incorrect translation of "second" for Portuguese.
  • v0.3.0(Jul 29, 2015)

    New features:

    • Compatibility with Python 3 and PyPy.

    Improvements:

    • languages.yaml data cleaned up to make it human-readable.
    • Improved Spanish date parsing.
  • v0.2.1(Jun 18, 2015)

Owner
Scrapinghub
Turn web content into useful data
Scrapinghub
Make Python datetime formatting human readable

Make Python datetime formatting human readable

James Timmins 0 Oct 3, 2021
Parse human-readable date/time strings

parsedatetime Parse human-readable date/time strings. Python 2.6 or greater is required for parsedatetime version 1.0 or greater. While we still test

Mike Taylor 651 Dec 23, 2022
A Python library for dealing with dates

moment A Python library for dealing with dates/times. Inspired by Moment.js and Kenneth Reitz's Requests library. Ideas were also taken from the Times

Zach Williams 709 Dec 9, 2022
Friendly Python Dates

When.py: Friendly Dates and Times Production: Development: User-friendly functions to help perform common date and time actions. Usage To get the syst

Andy Dirnberger 191 Oct 14, 2022
Better dates & times for Python

Arrow: Better dates & times for Python Arrow is a Python library that offers a sensible and human-friendly approach to creating, manipulating, formatt

Arrow 8.2k Jan 5, 2023
A datetime parser in Python by Ari24-cb24 and NekoFantic

datetimeparser A datetime parser in Python by Ari24-cb24 and NekoFantic V 1.0 Reminder for the parser: check for invalid inputs, list of Event

AriDevelopment 13 Dec 30, 2022
ISO 8601 date/time parser

ISO 8601 date/time parser This module implements ISO 8601 date, time and duration parsing. The implementation follows ISO8601:2004 standard, and imple

null 118 Dec 20, 2022
Useful extensions to the standard Python datetime features

dateutil - powerful extensions to datetime The dateutil module provides powerful extensions to the standard datetime module, available in Python. Inst

null 2k Dec 29, 2022
Python datetimes made easy

Pendulum Python datetimes made easy. Supports Python 2.7 and 3.4+. >>> import pendulum >>> now_in_paris = pendulum.now('Europe/Paris') >>> now_in_par

Sébastien Eustace 5.3k Jan 6, 2023
PyTime is an easy-use Python module which aims to operate date/time/datetime by string.

PyTime PyTime is an easy-use Python module which aims to operate date/time/datetime by string. PyTime allows you using nonregular datetime string to g

Sinux 148 Dec 9, 2022
pytz Python historical timezone library and database

pytz Brings the IANA tz database into Python. This library allows accurate and cross platform timezone calculations. pytz contains generated code, and

Stub 236 Jan 3, 2023
Generate and work with holidays in Python

python-holidays A fast, efficient Python library for generating country, province and state specific sets of holidays on the fly. It aims to make dete

Maurizio Montel 881 Dec 29, 2022
A Python module that tries to figure out what your local timezone is

tzlocal This Python module returns a tzinfo object with the local timezone information under Unix and Windows. It requires either Python 3.9+ or the b

Lennart Regebro 159 Dec 16, 2022
A simple in-process python scheduler library, designed to be integrated seamlessly with the `datetime` standard library.

scheduler A simple in-process python scheduler library, designed to be integrated seamlessly with the datetime standard library. Due to the support of

null 30 Dec 30, 2022
darts is a Python library for easy manipulation and forecasting of time series.

A python library for easy manipulation and forecasting of time series.

Unit8 5.2k Jan 1, 2023
A simple digital clock made with the help of python

Digital-Clock ⏰ Description ✔️ A simple digital clock made with the help of python. The code is easy to understand and implement. With this reposit

Mohit 0 Dec 10, 2021
Neogex is a human readable parser standard, being implemented in Python

Neogex (New Expressions) Parsing Standard Much like Regex, Neogex allows for string parsing and validation based on a set of requirements. Unlike Rege

Seamus Donnellan 1 Dec 17, 2021
A Python 3 library for parsing human-written times and dates

Chronyk A small Python 3 library containing some handy tools for handling time, especially when it comes to interfacing with those pesky humans. Featu

Felix Wiegand 339 Dec 19, 2022
Transpiles some Python into human-readable Golang.

pytago Transpiles some Python into human-readable Golang. Try out the web demo Installation and usage There are two "officially" supported ways to use

Michael Phelps 318 Jan 3, 2023