Transistor, a Python web scraping framework for intelligent use cases.

BOM Quote Manufacturing

Last update: Nov 5, 2022

Overview

Web data collection and storage for intelligent use cases.



About

The web is full of data. Transistor is a web scraping framework for collecting, storing, and using targeted data from structured web pages.

Transistor's current strengths are in being able to:

provide an interface to the Splash headless browser / JavaScript rendering service.
include optional support for the scrapinghub.com Crawlera 'smart' proxy service.
ingest keyword search terms from a spreadsheet, or use RabbitMQ or Redis as a message broker, transforming keywords into task queues.
scale one Spider into an arbitrary number of workers combined into a WorkGroup.
coordinate an arbitrary number of WorkGroups, searching an arbitrary number of websites, into one scrape job.
send out all the WorkGroups concurrently, using gevent based asynchronous I/O.
return data from each website for each search term 'task' in our list, for easy website-to-website comparison.
export data to CSV, XML, JSON, pickle, file object, and/or your own custom exporter.
save targeted scrape data to the database of your choice.

Suitable use cases include:

comparing attributes like stock status and price, for a list of book titles or part numbers, across multiple websites.
concurrently processing a large list of search terms on a search engine and then scraping the results, or following links first and then scraping the results.

Development of Transistor is sponsored by BOM Quote Manufacturing. Here is a Medium story from the author about creating Transistor: That time I coded 90-hours in one week.

Primary goals:

Enable scraping targeted data from a wide range of websites including sites rendered with Javascript.
Navigate websites which present logins, custom forms, and other blockers to data collection, like captchas.
Provide asynchronous I/O for task execution, using gevent.
Easily integrate within a web app like Flask, Django, or other Python-based web frameworks.
Provide spreadsheet-based data ingest and export options, such as importing a list of search terms from Excel, ODS, or CSV, and exporting data to each as well.
Utilize quick and easy integrated task work queues which can be automatically filled with data search terms by a simple spreadsheet import.
Be able to integrate with more robust task queues like Celery, while using RabbitMQ or Redis as a message broker as desired.
Provide hooks for users to persist data via any method they choose, while also supporting our own opinionated choice which is a PostgreSQL database along with newt.db.
Contain useful abstractions, classes, and interfaces for scraping and crawling with machine learning assistance (wip, timeline tbd).
Further support data science use cases of the persisted data, where convenient and useful for us to provide in this library (wip, timeline tbd).
Provide a command line interface (low priority wip, timeline tbd).

Quickstart

First, install Transistor from pypi:

pip install transistor

If you have previously installed Transistor, please ensure you are using the latest version:

pip install --upgrade transistor

Next, set up Splash by following the Quickstart: Setup Splash instructions below. Finally, work through the abbreviated books_to_scrape Quickstart example, also detailed below.

This example is explained in more detail in the source code found in the examples/books_to_scrape folder, including fully implementing object persistence with newt.db.

Quickstart: Setup Splash

Successful scraping is now a complex affair. Most websites with useful data will rate limit, inspect headers, present captchas, and use JavaScript that must be rendered to get the data you want.

This rules out using simple python requests scripts for most serious use. So, setup becomes much more complicated.

To deal with this, we are going to use Splash, "A Lightweight, scriptable browser as a service with an HTTP API".

Transistor also supports the optional use of a smart proxy service from scrapinghub called Crawlera. The Crawlera smart proxy service helps us:

avoid getting our own server IP banned
enable regional browsing which is important to us, because data can differ per region on the websites we want to scrape, and we are interested in those differences

The minimum monthly cost for the smallest size crawlera C10 plan is $25 USD/month. This level is useful but can easily be overly restrictive. The next level up is $100/month.

The easiest way to get set up with Splash is to use Aquarium, and that is what we are going to do. Using Aquarium requires Docker and Docker Compose.

Windows Setup

On Windows, the easiest way to get started with Docker is to use Chocolatey to install docker-desktop (the successor to docker-for-windows, which has now been deprecated). Using Chocolatey requires installing Chocolatey first.

Then, to install docker-desktop with Chocolately:

C:\> choco install docker-desktop

You will likely need to restart your Windows box after installing docker-desktop, even if it doesn't tell you to do so.

All Platforms

Install Docker for your platform. For Aquarium, follow the installation instructions.

After setting up Splash with Aquarium, ensure you set the following environment variables:

SPLASH_USERNAME = '<username you set during Aquarium setup>'
SPLASH_PASSWORD = '<password you set during Aquarium setup>'
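If you want your own scripts to fail fast when these variables are missing, a minimal sketch like the one below could help. The helper name `require_env` is hypothetical and is not part of Transistor; it only assumes the two variable names listed above, and Transistor itself may read them differently.

```python
import os
from typing import Iterable, Mapping, Optional


def require_env(names: Iterable[str], environ: Optional[Mapping[str, str]] = None) -> dict:
    """Return the requested environment variables, raising if any are unset.

    Hypothetical helper for illustration only, not a Transistor API.
    """
    env = os.environ if environ is None else environ
    names = list(names)
    # Collect every missing or empty variable so the error lists them all at once.
    missing = [name for name in names if not env.get(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return {name: env[name] for name in names}
```

For example, calling `require_env(["SPLASH_USERNAME", "SPLASH_PASSWORD"])` at the top of main.py surfaces a missing credential before any Spider is started.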

Finally, to run Splash service, cd to the Aquarium repo on your hard drive, and then run docker-compose up in your command prompt.

Troubleshooting Aquarium and Splash service:

Ensure you are in the aquarium folder when you run the docker-compose up command.
You may run into problems at first if you have not shared your hard drive with Docker.
Share your hard drive with Docker (google is your friend to figure out how to do this).
Try to run the docker-compose up command again.
Note, upon computer/server restart, you need to ensure the Splash service is started, either daemonized or with docker-compose up.

At this point, you should have a splash service running in your command prompt.

Crawlera

Using crawlera is optional and not required for this books_to_scrape quickstart.

But, if you want to use Crawlera with Transistor, first, register for the service and buy a subscription at scrapinghub.com.

After registering for Crawlera, create accounts in scrapinghub.com for each region you would like to present a proxied IP address from. In our case, we are set up to handle three regions: ALL for global, China, and USA.

Next, you should set environment variables on your computer/server with the api key for each region you need, like below:

CRAWLERA_ALL = '<your crawlera account api key for ALL regions>'
CRAWLERA_CN = '<your crawlera account api key for China region>'
CRAWLERA_USA = '<your crawlera account api key for USA region>'
CRAWLERA_REGIONS = 'CRAWLERA_ALL,CRAWLERA_USA,CRAWLERA_CN'

There are some utility functions which are helpful for working with crawlera found in transistor/utility/crawlera.py which require the CRAWLERA_REGIONS environment variable to be set. CRAWLERA_REGIONS should just be a comma separated string of whatever region environment variables you have set.
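As an illustration of how a comma separated CRAWLERA_REGIONS string can be turned into per-region API keys, here is a minimal sketch. The function `crawlera_keys_by_region` is hypothetical; the real helpers live in transistor/utility/crawlera.py and may behave differently.

```python
import os
from typing import Dict, Mapping, Optional


def crawlera_keys_by_region(environ: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Map each region variable named in CRAWLERA_REGIONS to its API key.

    Hypothetical helper for illustration; not the code in transistor/utility/crawlera.py.
    """
    env = os.environ if environ is None else environ
    regions = env.get("CRAWLERA_REGIONS", "")
    # Split the comma separated string, dropping empty entries and stray spaces.
    names = [name.strip() for name in regions.split(",") if name.strip()]
    # Skip any listed region whose API key variable is not actually set.
    return {name: env[name] for name in names if env.get(name)}
```

A scraper could then receive, for example, `crawlera_user=crawlera_keys_by_region().get('CRAWLERA_USA')`.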

Finally, to use Crawlera, you will need to pass a keyword arg like crawlera_user=<your api key> into your custom Scraper spider, subclassed from the SplashScraper class. Alternatively, you can directly set crawlera_user in your custom subclassed Scraper's __init__() method, like self.crawlera_user = os.environ.get('CRAWLERA_USA', None).

Last, you must pass in a Lua script in the script argument that supports the Crawlera service. We have included two Lua scripts in the transistor/scrapers/scripts folder which should help you get working out-of-the-box. Of course, to get the full power of Splash + Crawlera you will need to read their documentation and also come up to speed on customizing the Lua script to fully use Splash, to do things like fill out forms and navigate pages.

Quickstart: `books_to_scrape` example

See examples/books_to_scrape for a fully working example with more detailed notes in the source code. We'll go through an abbreviated setup here, without many of the longer notes and database/persistence parts that you can find in the examples folder source code.

In this abbreviated example, we will create a Spider to crawl the books.toscrape.com website in search of 20 different book titles, which are ingested from an Excel spreadsheet. After we find the book titles, we will export the targeted data to a CSV file.

The books_to_scrape example assumes we have a column of 20 book titles in an excel file, with a column heading in the spreadsheet named item. We plan to scrape the domain books.toscrape.com to find the book titles. For the book titles we find, we will scrape the sale price and stock status.

First, let's set up a custom scraper Spider by subclassing SplashScraper. This will enable it to use the Splash headless browser.

Next, create a few custom methods to parse the html found by the SplashScraper and saved in the self.page attribute, with beautifulsoup4.

from transistor.scrapers import SplashScraper

class BooksToScrapeScraper(SplashScraper):
    """
    Given a book title, scrape books.toscrape.com/index.html
    for the book cost and stock status.
    """

    def __init__(self, book_title: str, script=None, **kwargs):
        """
        Create the instance with a few custom attributes and
        set the baseurl
        """
        super().__init__(script=script, **kwargs)
        self.baseurl = 'http://books.toscrape.com/'
        self.book_title = book_title
        self.price = None
        self.stock = None

    def start_http_session(self, url=None, timeout=(3.05, 10.05)):
        """
        Starts the scrape session. Normally, you can just call
        super().start_http_session(). In this case, we also want to start out
        with a call to self._find_title() to kickoff the crawl.
        """
        super().start_http_session(url=url, timeout=timeout)
        return self._find_title()

    # now, define your custom books.toscrape.com scraper logic below

    def _find_title(self):
        """
        Search for the book title in the current page. If it isn't found, crawl
        to the next page.
        """
        if self.page:
            title = self.page.find("a", title=self.book_title)
            if title:
                return self._find_price_and_stock(title)
            else:
                return self._crawl()
        return None

    def _next_page(self):
        """
        Find the url to the next page from the pagination link.
        Returns None on the last page, where there is no 'next' link.
        """
        if self.page:
            next_li = self.page.find('li', class_='next')
            next_page = next_li.find('a') if next_li else None
            if next_page:
                if next_page['href'].startswith('catalogue'):
                    return self.baseurl + next_page['href']
                else:
                    # baseurl already ends with '/', so don't add another one
                    return self.baseurl + 'catalogue/' + next_page['href']
        return None

    def _crawl(self):
        """
        Navigate to the next url page using the SplashScraper.open() method and
        then call find_title again, to see if we found our tasked title.
        """
        next_url = self._next_page()  # compute the next url once, not twice
        if next_url:
            self.open(url=next_url)
            return self._find_title()
        print('Crawled all pages. Title not found.')
        return None

    def _find_price_and_stock(self, title):
        """
        The tasked title has been found and so now find the price and stock and
        assign them to class attributes self.price and self.stock for now.
        """
        price_div = title.find_parent(
            "h3").find_next_sibling(
            'div', class_='product_price')

        self.price = price_div.find('p', class_='price_color').text
        self.stock = price_div.find('p', class_='instock availability').text.translate(
            {ord(c): None for c in '\n\t\r'}).strip()
        print('Found the Title, Price, and Stock.')
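The manual string concatenation in `_next_page` and the `translate()` cleanup in `_find_price_and_stock` can also be expressed with the standard library. The sketch below is an alternative, not what the example ships: `resolve_next_page` and `clean_text` are hypothetical helpers, and `resolve_next_page` assumes you keep track of the URL of the page the href was found on.

```python
from urllib.parse import urljoin


def resolve_next_page(current_url: str, href: str) -> str:
    """Resolve a pagination href against the URL of the page it came from.

    urljoin handles both 'catalogue/page-2.html' (found on the home page) and
    'page-3.html' (found on pages that already live under /catalogue/).
    """
    return urljoin(current_url, href)


def clean_text(text: str) -> str:
    """Collapse newlines, tabs and runs of spaces, e.g. in the stock status text.

    Slightly more aggressive than the translate() call above, which only
    removes newline, tab and carriage-return characters.
    """
    return " ".join(text.split())
```

Because `urljoin` resolves relative links the way a browser would, it removes the need to special-case hrefs that do or do not start with 'catalogue'.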

Next, we need to set up two more subclasses, from the base classes SplashScraperItems and ItemLoader. This will allow us to export the data from the SplashScraper spider to the CSV file.

Specifically, we are interested to export the book_title, stock and price attributes. See more detail in examples/books_to_scrape/persistence/serialization.py file.

from transistor.persistence.item import Field
from transistor.persistence import SplashScraperItems
from transistor.persistence.loader import ItemLoader


class BookItems(SplashScraperItems):
    # -- names of your customized scraper class attributes go here -- #

    book_title = Field()  # the book_title which we searched
    price = Field()  # the self.price attribute
    stock = Field()  # the self.stock attribute


def serialize_price(value):
    """
    A simple serializer used in BookItemsLoader to ensure 'UK' is
    prefixed on the `price` Field, for the data returned in the scrape.
    :param value: the scraped value for the `price` Field
    """
    if value:
        return f"UK {str(value)}"


class BookItemsLoader(ItemLoader):
    def write(self):
        """
        Write your scraper's exported custom data attributes to the
        BookItems class. Call super() to also capture attributes
        built-in from the Base ItemLoader class.

        Last, ensure you assign each `self.spider.<attribute>` to the
        matching key in `self.items`, and finally you must return
        self.items in this method.
        """

        # now, define your custom items
        self.items['book_title'] = self.spider.book_title
        self.items['stock'] = self.spider.stock
        # set the value with self.serialize_field(field, name, value) as needed;
        # for example, `serialize_price` above turns '£50.10' into 'UK £50.10'.
        # '£50.10' is the original scraped value from the website, stored in
        # self.spider.price, but we think it is clearer as 'UK £50.10'
        self.items['price'] = self.serialize_field(
            field=Field(serializer=serialize_price),
            name='price',
            value=self.spider.price)

        # call super() to write the built-in SplashScraper Items from ItemLoader
        super().write()

        return self.items
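To make the field-level serializer idea concrete, here is a stdlib-only sketch of the pattern used in `write()`: a field carries an optional serializer, and `serialize_field` applies it to the scraped value. The `Field` class and `serialize_field` function below are hypothetical stand-ins modeled on the description above, not the code in transistor.persistence.

```python
from typing import Any, Callable


class Field(dict):
    """Stand-in field definition whose metadata may include a 'serializer'.

    Hypothetical: not the Field class from transistor.persistence.item.
    """


def serialize_field(field: Field, name: str, value: Any) -> Any:
    """Apply the field's serializer to value, or return value unchanged.

    `name` is unused here; it is kept only to mirror the call shape shown above.
    """
    serializer: Callable[[Any], Any] = field.get("serializer", lambda v: v)
    return serializer(value)


def serialize_price(value):
    """Prefix 'UK' onto a scraped price such as '£50.10' (None if empty)."""
    if value:
        return f"UK {value}"
    return None
```

With this arrangement, formatting rules live next to the field definitions, while the Spider keeps the raw value it scraped.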

Finally, to run the scrape, we will need to create a main.py file. This is all we need for the minimal example to scrape and export targeted data to csv.

So, at this point, we've:

Set up a custom scraper BooksToScrapeScraper by subclassing SplashScraper.
Set up BookItems by subclassing SplashScraperItems.
Set up BookItemsLoader by subclassing ItemLoader.
Written a simple serializer, the serialize_price function, which prefixes 'UK' to the returned price attribute data.

Next, we are ready to set up a main.py file as the final entry point to run our first scrape and export the data to a CSV file.

The first thing we need to do is perform some imports.

#  -*- coding: utf-8 -*-
# in main.py, monkey patching for gevent must be done first
from gevent import monkey
monkey.patch_all()
# you probably need to add your project directory to the pythonpath like below
import sys
sys.path.insert(0, "C:/Users/<username>/repos/books_to_scrape")

# finally, import from transistor and your own custom code
from transistor import StatefulBook, WorkGroup, BaseWorkGroupManager
from transistor.persistence.exporters import CsvItemExporter
from <path-to-your-custom-scraper> import BooksToScrapeScraper
from <path-to-your-custom-Items/ItemsLoader> import BookItems, BookItemsLoader

Second, set up a StatefulBook which will read the book_titles.xlsx file and transform the book titles from the spreadsheet "titles" column into task queues for our WorkGroups.

# we need to get the filepath to your book_titles.xlsx excel file, you can copy it
# from transistor/examples/books_to_scrape/schedulers/stateful_book/book_titles.xlsx
# need a variable like below:
# filepath = 'your/path/to/book_titles.xlsx'

# including some file path code here as a hint because it's not so straightforward
from pathlib import Path
from os.path import dirname as d
from os.path import abspath
root_dir = d(d(abspath(__file__)))
def get_file_path(filename):
    """
    Find the book_titles excel file path.
    """
    root = Path(root_dir)
    filepath = root / 'files' / filename
    return str(filepath)

# now we can use get_file_path to set the variable named `filepath`

filepath = get_file_path('book_titles.xlsx')
trackers = ['books.toscrape.com']
tasks = StatefulBook(filepath, trackers, keywords="titles")

Third, set up a list of exporters, which can then be passed to whichever WorkGroup objects you want to use them with. In this case, we are just going to use the built-in CsvItemExporter, but we could also use additional exporters to do multiple exports at the same time, if desired.

exporters = [
    CsvItemExporter(
        fields_to_export=['book_title', 'stock', 'price'],
        file=open('c:/book_data.csv', 'a+b'))
]
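If it helps to see what "export only these fields to CSV" means in isolation, the sketch below does the same job with the standard library's `csv.DictWriter`. It is a hypothetical stand-in written for illustration, not Transistor's CsvItemExporter, and it takes plain dicts rather than Item objects.

```python
import csv
from typing import IO, Iterable, Mapping, Sequence


def export_items_csv(items: Iterable[Mapping[str, object]],
                     fields_to_export: Sequence[str],
                     stream: IO[str],
                     write_header: bool = True) -> None:
    """Write only the chosen fields of each item as CSV rows.

    Hypothetical stdlib stand-in for illustration, not CsvItemExporter.
    """
    # extrasaction="ignore" silently drops keys not listed in fields_to_export.
    writer = csv.DictWriter(stream, fieldnames=list(fields_to_export),
                            extrasaction="ignore")
    if write_header:
        writer.writeheader()
    for item in items:
        writer.writerow(item)
```

Note that the stdlib csv module expects a text-mode file opened with `newline=''`, e.g. `open('book_data.csv', 'a', newline='', encoding='utf-8')`, and when appending to an existing file you would likely pass `write_header=False` to avoid repeating the header row.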

Fourth, set up the WorkGroup in a list we'll call groups. We use a list here because you can set up as many WorkGroup objects with unique target websites, and as many individual workers, as you need:

groups = [
    WorkGroup(
        name='books.toscrape.com',
        url='http://books.toscrape.com/',
        spider=BooksToScrapeScraper,
        items=BookItems,
        loader=BookItemsLoader,
        exporters=exporters,
        workers=20,  # this creates 20 Spiders and assigns each a book as a task
        kwargs={'timeout': (3.0, 20.0)})
]

Fifth, set up the WorkGroupManager and prepare the file to call the manager.main() method to start the scrape job:

# If you want to execute all the scrapers at the same time, ensure the pool is
# marginally larger than the sum of the total number of workers assigned in the
# list of WorkGroup objects. However, sometimes you may want to constrain your pool
# to a specific number less than your scrapers. That's also OK. This is useful, for
# example, with Crawlera's C10 plan, which only allows 10 concurrent workers: set pool=10.
manager = BaseWorkGroupManager(job_id='books_scrape', tasks=tasks, workgroups=groups, pool=25)

if __name__ == "__main__":
    manager.main()  # call manager.main() to start the job.
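The effect of the pool size can be illustrated without Splash at all. Transistor's pool is built on gevent; the sketch below swaps in threads from `concurrent.futures` purely to show that a pool of size N never lets more than N tasks run at once. `run_with_pool` is a hypothetical helper, and the sleep stands in for a slow request.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def run_with_pool(tasks, pool_size: int) -> int:
    """Run every task with at most pool_size in flight; return peak concurrency.

    Uses threads as a stand-in for a gevent pool, only to show how the pool
    size caps the number of simultaneous workers.
    """
    lock = threading.Lock()
    active = 0
    peak = 0

    def work(task):
        nonlocal active, peak
        with lock:
            active += 1
            peak = max(peak, active)
        time.sleep(0.01)  # simulate a slow request, e.g. a Splash render
        with lock:
            active -= 1
        return task

    with ThreadPoolExecutor(max_workers=pool_size) as executor:
        list(executor.map(work, tasks))
    return peak
```

With 20 workers and a pool of 10, at most 10 tasks run simultaneously and the rest wait for a free slot, which is what keeps you inside a plan's concurrency limit.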

Finally, run python main.py and then profit. After a brief Spider runtime to crawl the books.toscrape.com website and write the data, you should have a newly exported csv file in the filepath you setup, 'c:/book_data.csv' in our example above.

To summarize what we did in main.py:

We set up a BaseWorkGroupManager and wrapped our spider BooksToScrapeScraper inside a list of WorkGroup objects called groups. Then we passed the groups list to the BaseWorkGroupManager.

Passing a list of WorkGroup objects allows the WorkGroupManager to run multiple jobs targeting different websites, concurrently.
In this simple example, we are only scraping books.toscrape.com, but if we also wanted to scrape books.toscrape.com.cn, then we'd set up two BaseGroup objects and wrap each in its own WorkGroup, one for each domain.

NOTE-1: A more robust use case will also subclass the BaseWorker class, because it provides several methods as hooks for data persistence and post-scrape manipulation. You may also consider subclassing the WorkGroupManager class and overriding its monitor method. This is another hook point that gives access to the BaseWorker object before it shuts down for good.

Refer to the full example in the examples/books_to_scrape/workgroup.py file for an example of customizing BaseWorker and WorkGroupManager methods. In the example, we show how to save data to PostgreSQL with newt.db, but you can use whichever db you choose.

NOTE-2: If you do try to follow the more detailed example in examples/books_to_scrape, including data persistence with PostgreSQL and newt.db, you may need to set the environment variable:

TRANSISTOR_DEBUG = 1

Whether or not you actually need to set this TRANSISTOR_DEBUG environment variable will depend on how you setup your settings.py and newt_db.py files. If you copy the files verbatim as shown in the examples/books_to_scrape folder, then you will need to set it.

Directly Using A SplashScraper

Perhaps you just want to do a quick one-off scrape?

It is possible to just use your custom scraper subclassed from SplashScraper directly, without going through all the work to setup a StatefulBook, BaseWorker, BaseGroup, WorkGroup, and WorkGroupManager.

Just fire it up in a python repl like below and ensure the start_http_session method is run, which can generally be done by setting autorun=True.

>>> from my_custom_scrapers.component.mousekey import MouseKeyScraper
>>> ms = MouseKeyScraper(part_number='C1210C106K4RACTU', autorun=True)

After the scrape completes, the various methods and attributes from SplashScraper and SplashBrowser are available, along with the custom attributes and methods from your own subclassed scraper:

>>> ms.stock()
'4,000'
>>> ms.pricing()
'{"1" : "USD $0.379", "10" : "USD $0.349"}'

Architecture Summary

Transistor provides useful layers and objects in the following categories:

Layers & Services

javascript rendering service / headless browser layer:

Transistor uses Splash implemented with the Aquarium cookiecutter docker template.
Splash provides a programmable headless browser to render JavaScript, and Aquarium provides robust concurrency with multiple Splash instances that are load balanced with HAProxy.
Transistor provides integration with Splash through our SplashBrowser class found in transistor/browsers/splash_browser.py.

smart proxy service:

Transistor supports use of Crawlera, a paid smart proxy service providing robust protection against getting our own IP banned while scraping sites that actively present challenges to web data collection.
Crawlera use is optional. It has a minimum monthly cost of $25 USD for the starter package, and the next level up is currently $100 USD/month.
When using Crawlera, the concurrency provided by gevent for asynchronous I/O, along with Splash running under Aquarium, is absolutely required, because a single request with Splash + Crawlera is quite slow, sometimes taking 15 minutes or more to successfully return a result.

Spiders

browsers

see: transistor/browsers
wrap python-requests and beautifulsoup4 libraries to serve our various scraping browser needs.
browser API is generally created by subclassing and overriding the well known mechanicalsoup library to work with Splash and/or Splash + Crawlera.
if Javascript support is not needed for a simple scrape, it is nice to just use mechanicalsoup's StatefulBrowser class directly as a Scraper, as shown in examples/cny_exchange_rate.py.
a Browser object is generally instantiated inside of a Scraper object, where it handles items like fetching the page, parsing headers, creating a self.page object to parse with beautifulsoup4, handling failures with automatic retries, and setting class attributes accessible to our Scraper object.
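The "handling failures with automatic retries" responsibility can be pictured with a small retry-with-backoff wrapper. This is a hypothetical sketch of the general technique, not SplashBrowser's actual retry logic; `with_retries` and its parameters are assumptions chosen for illustration.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def with_retries(fetch: Callable[[], T],
                 attempts: int = 3,
                 delay: float = 1.0,
                 retry_on: Tuple[Type[BaseException], ...] = (ConnectionError, TimeoutError)) -> T:
    """Call fetch(), retrying on the listed exceptions with exponential backoff.

    Hypothetical helper illustrating the idea; not Transistor's implementation.
    """
    for attempt in range(1, attempts + 1):
        try:
            return fetch()
        except retry_on:
            if attempt == attempts:
                raise  # out of attempts: let the last error propagate
            time.sleep(delay * 2 ** (attempt - 1))  # wait 1x, 2x, 4x, ... delay
    raise ValueError("attempts must be at least 1")
```

Exponential backoff spaces out repeated requests, which tends to be kinder to rate-limited sites than retrying immediately.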

scrapers

see transistor/scrapers
instantiates a browser to grab the page object, implements various html filter methods on page to return the target data, can use Splash headless browser/javascript rendering service to navigate links, fill out forms, and submit data.
for a Splash or Splash + Crawlera based scraper Spider, the SplashScraper base class provides a minimal required Lua script and all required connection logic. However, more complex use cases will require providing your own custom modified Lua script.
the scraper design is built around gevent based asynchronous I/O, and this design allows to send out an arbitrarily large number of scraper workers, with each scraper worker assigned a specific scrape task to complete.
the current core design, in allowing to send out an arbitrarily large number of scraper workers, is not necessarily an optimal design to 'crawl' pages in search of targeted data. Where it shines is when you need to use a webpage search function on an arbitrarily large list of search tasks, await the search results for each task, and finally return a scraped result for each task.

crawlers (wip, on the to-do list)

see transistor/crawlers (not yet implemented)
this crawling Spider will be supported through a base class called SplashCrawler.
while it is straightforward to use the current Transistor scraper SplashScraper design to do basic crawling (see examples/books_to_scrape/scraper.py for an example) the current way to do this with Transistor is not optimal for crawling. So we'll implement modified designs for crawling spiders.
specifics TBD, may be fully custom or else may reuse some good architecture parts of scrapy, although if we do that, it will be done so we don't need a scrapy dependency and further it will be using gevent for asynchronous I/O.

Program I/O

schedulers:

BOOKS

see transistor/schedulers/books
a StatefulBook object provides an interface to work with spreadsheet based data.
for example, a book facilitates importing a column of keyword search term data, like 'book titles' or 'electronic component part numbers', from a designated column in an .xlsx file.
after importing the keyword search terms, the book will transform each search term into a task contained in a TaskTracker object
each TaskTracker will contain a queue of tasks to be assigned by the WorkGroupManager, and will ultimately allow an arbitrarily large number of WorkGroups of BaseWorkers to execute the tasks, concurrently.
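To illustrate the "one task queue per tracked website" idea, here is a sketch that skips the spreadsheet step and starts from a list of keywords. `SimpleTracker` and `build_trackers` are hypothetical stand-ins for illustration, not StatefulBook or TaskTracker themselves.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, Iterable, List


@dataclass
class SimpleTracker:
    """Hypothetical stand-in for a TaskTracker: one named queue of tasks."""
    name: str
    tasks: Deque[str] = field(default_factory=deque)

    def to_do(self) -> Deque[str]:
        return self.tasks


def build_trackers(keywords: Iterable[str], trackers: List[str]) -> Deque[SimpleTracker]:
    """Give every tracked website its own, independent copy of the keyword queue."""
    # Strip whitespace and drop blank cells, as a spreadsheet column may contain them.
    terms = [k.strip() for k in keywords if k and k.strip()]
    return deque(SimpleTracker(name, deque(terms)) for name in trackers)
```

Each tracker gets a separate deque, so consuming tasks for one website does not affect the queue for another.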

RabbitMQ & Redis

see transistor/schedulers/brokers
provides the ExchangeQueue class in transistor.schedulers.brokers.queues, which can be passed to the tasks parameter of BaseWorkGroupManager.
Just pass the appropriate connection string to ExchangeQueue and BaseWorkGroupManager and you can use either RabbitMQ or Redis as a message broker, thanks to kombu.
in this case, the BaseWorkGroupManager also acts as an AMQP consumer which can receive messages from the RabbitMQ message broker.

workers:

a BaseWorker object encapsulates a Spider object like the SplashScraper or SplashCrawler objects, which has been customized by the end user to navigate and extract the targeted data from a structured web page.
a BaseGroup object can then be created, to encapsulate the BaseWorker object which contains the Spider object.
The purpose of this BaseGroup object is to enable concurrency and scale by being able to spin up an arbitrarily large number of BaseWorker objects, each assigned a different scrape task for execution.
the BaseGroup object can then receive tasks to execute, like individual book titles or electronic component part numbers to search, delegated by a WorkGroupManager class.
each BaseWorker in the BaseGroup also processes web request results as they are returned from its wrapped SplashScraper object. BaseWorker methods include hooks for exporting data to multiple formats like csv/xml or saving it to the db of your choice.
each BaseGroup should be wrapped in a WorkGroup which is passed to the WorkGroupManager. Objects which the BaseWorker will use to process the Spider after it returns from the scrape should also be specified in WorkGroup, like Items, ItemLoader, and Exporter.

managers:

the overall purpose of the WorkGroupManager object is to provide yet more scale and concurrency through asynchronous I/O.
The WorkGroupManager can spin up an arbitrarily large number of WorkGroup objects while assigning each BaseWorker/Spider in each of the WorkGroup objects, individual scrape tasks.
This design approach is most useful when you have a finite pipeline of scrape tasks which you want to search and compare the same terms, across multiple different websites, with each website targeted by one WorkGroup.
for example, we may have a list of 50 electronic component part numbers, which we want to search each part number in ten different regional websites. The WorkGroupManager can spin up a WorkGroup for each of the 10 websites, assign 50 workers to each WorkGroup, and send out 500 BaseWorkers each with 1 task to fill, concurrently.
to further describe the WorkGroupManager: it is a middle layer between StatefulBook and BaseGroup. It ingests TaskTracker objects from the StatefulBook object. It also switches the states of TaskTracker objects, which is useful to track task states like completed, in progress, or failed (this last detail is a work-in-progress).
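The fan-out arithmetic in the part-number example (10 websites times 50 part numbers gives 500 single-task workers) can be sketched as a Cartesian product. `fan_out` is a hypothetical helper and the site and part names in the usage note are placeholders; it only shows how the task count grows, not how the WorkGroupManager schedules work.

```python
from itertools import product
from typing import Iterable, List, Tuple


def fan_out(websites: Iterable[str], part_numbers: Iterable[str]) -> List[Tuple[str, str]]:
    """Pair every website (one WorkGroup each) with every part number (one worker each)."""
    return list(product(websites, part_numbers))
```

For instance, `fan_out([f"site{i}.example" for i in range(10)], [f"PN{i:03d}" for i in range(50)])` yields 500 (website, part number) pairs, one per BaseWorker.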

Persistence

exporters

see transistor/persistence/exporters
export data from a Spider to various formats, including csv, xml, json, pickle, and pretty print to a file object.

Object Storage, Search, and Retrieval

Transistor can be used with whichever database or persistence model you choose to implement. But it will offer some open-source code in support of the following:

SQLAlchemy

we use SQLAlchemy extensively and may include some contributed code as we find appropriate or useful to keep in the Transistor repository. At least, an example for reference will be included in the examples folder.

object-relational database using PostgreSQL with newt.db.

persist and store your custom python objects containing your web scraped data, directly in a PostgreSQL database, while also converting your python objects to JSON, automatically indexing them for super-quick searches, and making it available to be used from within your application or externally.
leverage PostgreSQL's strong JSON support as a document database while also enabling "ease of working with your data as ordinary objects in memory".
this is accomplished with newt.db which turns PostgreSQL into an object-relational database while leveraging PostgreSQL's well integrated JSON support.
newt.db is itself a wrapper built over the battle tested ZODB python object database and RelStorage which integrates ZODB with PostgreSQL.
more on newt.db here [1] and here [2]

[1]	Why Postgres Should Be Your Document Database (blog.jetbrains.com)

[2]	Newt DB, the amphibious database (newtdb.org).

Database Setup

Transistor maintainers prefer to use PostgreSQL with newt.db. Below is a quick setup walkthrough.

After you have a valid PostgreSQL installation, you should install newt.db:

pip install newt.db

After installing newt.db, you need to provide a URI connection string for newt.db to connect to PostgreSQL. An example setup might use two files for this: one holding the URI, as shown in examples/books_to_scrape/settings.py, and a second file to set up newt.db, as shown in examples/books_to_scrape/persistence/newt_db.py and reproduced below:

examples/books_to_scrape/settings.py

not recreated here, check the source file

examples/books_to_scrape/persistence/newt_db.py:

import os
import newt.db
from examples.books_to_scrape.settings import DevConfig, ProdConfig, TestConfig
from transistor.utility.utils import get_debug_flag

def get_config():
    if 'APPVEYOR' in os.environ:
        return TestConfig
    return DevConfig if get_debug_flag() else ProdConfig

CONFIG = get_config()
ndb = newt.db.connection(CONFIG.NEWT_DB_URI)
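The config selection above is why NOTE-2 mentions TRANSISTOR_DEBUG: the debug flag decides between DevConfig and ProdConfig. The sketch below reproduces that branching with placeholder config classes and a hypothetical `debug_flag` function. The URIs are placeholders, and treating '', '0', 'false' and 'no' as "off" is an assumption; Transistor's own get_debug_flag may interpret the variable differently.

```python
import os
from typing import Mapping, Optional


class DevConfig:
    NEWT_DB_URI = "postgresql://localhost/transistor_dev"  # placeholder URI


class ProdConfig:
    NEWT_DB_URI = "postgresql://localhost/transistor"  # placeholder URI


class TestConfig:
    NEWT_DB_URI = "postgresql://localhost/transistor_test"  # placeholder URI


def debug_flag(environ: Mapping[str, str]) -> bool:
    """Assumed semantics: TRANSISTOR_DEBUG is on unless unset, empty, '0', 'false' or 'no'."""
    value = environ.get("TRANSISTOR_DEBUG", "")
    return value.strip().lower() not in ("", "0", "false", "no")


def get_config(environ: Optional[Mapping[str, str]] = None):
    env = os.environ if environ is None else environ
    if "APPVEYOR" in env:  # CI runs always use the test database
        return TestConfig
    return DevConfig if debug_flag(env) else ProdConfig
```

Passing the environment in as a mapping keeps the selection logic testable without touching the real process environment.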

Next, we need to store our first two python objects in newt.db, which are:

A list collection object, so we have a place to store our scrapes.
An object to hold our list collection object, so that we can have a list of lists

Now, from your python repl:

>>> from transistor.persistence.newt_db.collections import SpiderList, SpiderLists
>>> from transistor.newt_db import ndb
>>> # Assigning SpiderLists() is only required during initial setup, or when/if you
>>> # change the SpiderLists() object, for example, to provide more functionality to the class.
>>> ndb.root.spiders = SpiderLists()
>>> # Add a new SpiderList() anytime you need a new list container, like every single
>>> # scrape you save. See the process_exports method in examples/books_to_scrape/workgroup.py.
>>> ndb.root.spiders.add('first-scrape', SpiderList())
>>> ndb.commit()  # you must explicitly commit() after each change to newt.db.

At this point, you are ready-to-go with newt.db and PostgreSQL.

Later, when a scraper object instance such as BooksToScrapeScraper() has finished its web scrape cycle, you can store it in the SpiderList() named first-scrape like this:

>>> ndb.root.spiders['first-scrape'].add(BooksToScrapeScraper(name="books.toscrape.com", book_title="Soumission"))
>>> ndb.commit()  # As with every change to newt.db, commit explicitly.
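To make the "list of lists" layout concrete, here is a toy, in-memory model of it. ToySpiderLists and ToySpiderList are hypothetical stand-ins that only mimic the add() and [] calls used above. The real SpiderLists and SpiderList in transistor.persistence.newt_db.collections are persistent objects saved by newt.db and may behave differently; for example, rejecting a duplicate name is a choice made in this sketch, not documented behavior.

```python
# Toy, in-memory model of the SpiderLists -> SpiderList -> scraper layout.
# Hypothetical classes for illustration only; nothing here is persisted.


class ToySpiderList:
    """One container per saved scrape, holding finished scraper objects."""

    def __init__(self):
        self._items = []

    def add(self, item):
        self._items.append(item)

    def __len__(self):
        return len(self._items)

    def __iter__(self):
        return iter(self._items)


class ToySpiderLists:
    """Top-level container: maps a scrape name to its ToySpiderList."""

    def __init__(self):
        self._lists = {}

    def add(self, name, spider_list):
        # Refuse to silently overwrite an existing scrape (a design choice here).
        if name in self._lists:
            raise KeyError(f"a list named {name!r} already exists")
        self._lists[name] = spider_list

    def __getitem__(self, name):
        return self._lists[name]


spiders = ToySpiderLists()
spiders.add("first-scrape", ToySpiderList())
# A dict stands in for a finished scraper instance.
spiders["first-scrape"].add({"name": "books.toscrape.com", "book_title": "Soumission"})
```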

More on StatefulBook

Practical use requires multiple methods of input and output. StatefulBook provides a method for reading an Excel file with one column of search terms (part numbers, in the example below) that we want to search for and scrape data about on multiple websites which sell such components:

>>> from transistor import StatefulBook

>>> filepath = '/path/to/your/file.xlsx'
>>> trackers = ['mousekey.cn', 'mousekey.com', 'digidog.com.cn', 'digidog.com']

This creates four separate task trackers, one for each of the four websites, to be searched with the part numbers:

>>> book = StatefulBook(filepath, trackers, keywords="part_numbers")

>>> book.to_do()

Output:

deque([<TaskTracker(name=mousekey.cn)>, <TaskTracker(name=mousekey.com)>, <TaskTracker(name=digidog.com.cn)>, <TaskTracker(name=digidog.com)>])

Now each website we intend to scrape has its own task queue. To work with an individual tracker and see what is in its to_do work queue:

>>> for tracker in book.to_do():
...     if tracker.name == 'mousekey.cn':
...         ms_tracker = tracker
...

>>> print(ms_tracker)

    <TaskTracker(name=mousekey.cn)>

>>> ms_tracker.to_do()

    deque(['050R30-76B', '1050170001', '12401598E4#2A', '525591052', '687710152002', 'ZL38063LDG1'])
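The per-website queues described above can be modeled with one collections.deque per tracker. This is a minimal sketch, assuming the part numbers have already been read from the spreadsheet into a list; ToyTaskTracker, build_trackers, and find_tracker are hypothetical names, not StatefulBook's implementation. It also shows next() with a default as a compact alternative to the for loop for picking out a single tracker.

```python
# Minimal sketch: one independent work queue of the same keywords per website.
# Hypothetical classes/functions; they only mimic the repr and to_do() shown above.
from collections import deque


class ToyTaskTracker:
    def __init__(self, name, keywords):
        self.name = name
        # deque(keywords) builds a fresh copy, so consuming one site's queue
        # leaves the other sites' queues untouched.
        self._to_do = deque(keywords)

    def to_do(self):
        return self._to_do

    def __repr__(self):
        return f"<TaskTracker(name={self.name})>"


def build_trackers(tracker_names, keywords):
    """Create one tracker per website, each holding every keyword."""
    return deque(ToyTaskTracker(name, keywords) for name in tracker_names)


def find_tracker(trackers, name):
    """Return the tracker with the given name, or None if there is none."""
    return next((t for t in trackers if t.name == name), None)


part_numbers = ['050R30-76B', '1050170001', '12401598E4#2A']
sites = ['mousekey.cn', 'mousekey.com', 'digidog.com.cn', 'digidog.com']
trackers = build_trackers(sites, part_numbers)
ms_tracker = find_tracker(trackers, 'mousekey.cn')
first_task = ms_tracker.to_do().popleft()  # take the next part number for this site
```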

Testing

The easiest way to test your scraper logic is to download the web page's HTML, then pass its contents to your scraper with a test dict. Below is an example:

from pathlib import Path

data_folder = Path("c:/Users/<your-username>/repos/<your-repo-name>/tests/scrapers/component/mousekey")
file_to_open = data_folder / "mousekey.cn.html"
# read_text() opens, reads, and closes the saved HTML file in one step.
page = file_to_open.read_text(encoding='utf-8')
# The _test_* keys pass the saved page and a fake status code to the scraper.
test_dict = {"_test_true": True, "_test_page_text": page, "_test_status_code": 200, "autostart": True}

from my_custom_scrapers.component.mousekey import MouseKeyScraper

ms = MouseKeyScraper(part_number='GRM1555C1H180JA01D', **test_dict)

assert ms.stock() == '17,090'
assert ms.pricing() == '{"1": "CNY ¥0.7888", "10": "CNY ¥0.25984", "100": "CNY ¥0.1102", ' \
           '"500": "CNY ¥0.07888", "10,000": "CNY ¥0.03944"}'
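The test dict works because the scraper can be told to skip the network and parse the supplied page text instead. The sketch below shows that pattern in isolation, assuming a scraper that checks _test_true when it starts. ToyStockScraper, the class="stock" markup, and the parsing logic are all hypothetical; MouseKeyScraper and Transistor's own handling of these keywords will differ.

```python
# Minimal sketch of the "inject saved HTML" testing pattern (hypothetical classes).
from html.parser import HTMLParser


class _StockParser(HTMLParser):
    """Collect the text of the first element whose class is exactly 'stock'."""

    def __init__(self):
        super().__init__()
        self._in_stock = False
        self.stock = None

    def handle_starttag(self, tag, attrs):
        if self.stock is None and ("class", "stock") in attrs:
            self._in_stock = True

    def handle_data(self, data):
        if self._in_stock:
            self.stock = data.strip()
            self._in_stock = False


class ToyStockScraper:
    def __init__(self, part_number, _test_true=False, _test_page_text=None,
                 _test_status_code=None, autostart=False):
        self.part_number = part_number
        self._test_true = _test_true
        self._test_page_text = _test_page_text
        self.status_code = _test_status_code
        self.page_text = None
        if autostart:
            self.start()

    def start(self):
        if self._test_true:
            # Test mode: use the saved HTML instead of making a request.
            self.page_text = self._test_page_text
        else:
            raise NotImplementedError("network fetching is out of scope for this sketch")

    def stock(self):
        parser = _StockParser()
        parser.feed(self.page_text)
        return parser.stock


saved_html = '<html><body><span class="stock">17,090</span></body></html>'
test_dict = {"_test_true": True, "_test_page_text": saved_html,
             "_test_status_code": 200, "autostart": True}
scraper = ToyStockScraper(part_number='GRM1555C1H180JA01D', **test_dict)
```

Because parsing only depends on page_text, the same stock() logic runs unchanged against live pages once real fetching is wired into start().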

Comments

Update openpyxl requirement from <2.6.0,>=2.5.0 to >=2.5.0,<2.7.0
Updates the requirements on openpyxl to permit the latest version.

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.

Dependabot commands and options

You can trigger Dependabot actions by commenting on this PR:

@dependabot rebase will rebase this PR

@dependabot recreate will recreate this PR, overwriting any edits that have been made to it

@dependabot merge will merge this PR after your CI passes on it

@dependabot squash and merge will squash and merge this PR after your CI passes on it

@dependabot cancel merge will cancel a previously requested merge and block automerging

@dependabot reopen will reopen this PR if it is closed

@dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually

@dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)

@dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)

@dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)

@dependabot use these labels will set the current labels as the default for future PRs for this repo and language

@dependabot use these reviewers will set the current reviewers as the default for future PRs for this repo and language

@dependabot use these assignees will set the current assignees as the default for future PRs for this repo and language

@dependabot use this milestone will set the current milestone as the default for future PRs for this repo and language

@dependabot badge me will comment on this PR with code to add a "Dependabot enabled" badge to your readme

Additionally, you can set the following in your Dependabot dashboard:

Update frequency (including time of day and day of week)

Pull request limits (per update run and/or open at any time)

Out-of-range updates (receive only lockfile updates, if desired)

Security updates (receive only security updates, if desired)

Finally, you can contact us by mentioning @dependabot.

dependencies
opened by dependabot-preview[bot]

Update relstorage to 3.4.5

This PR updates RelStorage[postgresql] from 2.1.1 to 3.4.5.

Changelog

3.4.5

==================

- Scale the new timing metrics introduced in 3.4.2 to milliseconds.
This matches the scale of other timing metrics produced
automatically by the use of ``perfmetrics`` in this package.
Similarly, append ``.t`` to the end of their names for the same
reason.

3.4.4

==================

- Fix an exception sending stats when TPC is aborted because of an error
during voting such as a ``ConflictError``. This only affected those
deployments with perfmetrics configured to use a StatsD client. See
:issue:`464`.

3.4.3

==================

- PostgreSQL: Log the backend PID at the start of TPC. This can help
correlate error messages from the server. See :issue:`460`.

- Make more conflict errors include information about the OIDs and
TIDs that may have been involved in the conflict.

- Add support for pg8000 1.17 and newer; tested with 1.19.2. See
:issue:`438`.

3.4.2

==================

- Fix write replica selection after a disconnect, and generally
further improve handling of unexpectedly closed store connections.

- Release the critical section a bit sooner at commit time, when
possible. Only affects gevent-based drivers. See :issue:`454`.

- Add support for mysql-connector-python-8.0.24.

- Add StatsD counter metrics
"relstorage.storage.tpc_vote.unable_to_acquire_lock",
"relstorage.storage.tpc_vote.total_conflicts",
"relstorage.storage.tpc_vote.readCurrent_conflicts",
"relstorage.storage.tpc_vote.committed_conflicts", and
"relstorage.storage.tpc_vote.resolved_conflicts". Also add StatsD
timer metrics "relstorage.storage.tpc_vote.objects_locked" and
"relstorage.storage.tpc_vote.between_vote_and_finish" corresponding
to existing log messages. The rate at which these are sampled, as
well as the rate at which many method timings are sampled, defaults
to 10% (0.1) and can be controlled with the
``RS_PERF_STATSD_SAMPLE_RATE`` environment variable. See :issue:`453`.

3.4.1

==================

- RelStorage has moved from Travis CI to `GitHub Actions
<https://github.com/zodb/relstorage/actions>`_ for macOS and Linux
tests and manylinux wheel building. See :issue:`437`.
- RelStorage is now tested with PostgreSQL 13.1. See :issue:`427`.
- RelStorage is now tested with PyMySQL 1.0. See :issue:`434`.
- Update the bundled boost C++ library from 1.71 to 1.75.
- Improve the way store connections are managed to make it less likely
a "stale" store connection that hasn't actually been checked for
liveness gets used.

3.4.0

==================

- Improve the logging of ``zodbconvert``. The regular minute logging
contains more information and takes blob sizes into account, and
debug logging is more useful, logging about four times a minute.
Some extraneous logging was bumped down to trace.

- Fix psycopg2 logging debug-level warnings from the PostgreSQL server
on transaction commit about not actually being in a transaction.
(Sadly this just squashes the warning, it doesn't eliminate the
round trip that generates it.)

- Improve the performance of packing databases, especially
history-free databases. See :issue:`275`.

- Give ``zodbpack`` the ability to check for missing references in
RelStorages with the ``--check-refs-only`` argument. This will
perform a pre-pack with GC, and then report on any objects that
would be kept and refer to an object that does not exist. This can
be much faster than external scripts such as those provided by
``zc.zodbdgc``, though it definitely only reports missing references
one level deep.

This is new functionality. Feedback, as always, is very welcome!

- Avoid extra pickling operations of transaction meta data extensions
by using the new ``extension_bytes`` property introduced in ZODB
5.6. This results in higher-fidelity copies of storages, and may
slightly improve the speed of the process too. See :issue:`424`.

- Require ZODB 5.6, up from ZODB 5.5. See :issue:`424`.

- Make ``zodbconvert`` *much faster* (around 5 times faster) when the
destination is a history-free RelStorage and the source supports
``record_iternext()`` (like RelStorage and FileStorage do). This
also applies to the ``copyTransactionsFrom`` method. This is disabled
with the ``--incremental`` option, however. Be sure to read the
updated zodbconvert documentation.

3.3.2

==================

- Fix an ``UnboundLocalError`` in case a store connection could not be
opened. This error shadowed the original error opening the
connection. See :issue:`421`.

3.3.1

==================

- Manylinux wheels: Do not specify the C++ standard to use when
compiling. This seemed to result in an incompatibility with
manylinux1 systems that was not caught by ``auditwheel``.

3.3.0

==================

- The "MySQLdb" driver didn't properly use server-side cursors when
requested. This would result in unexpected increased memory usage
for things like packing and storage iteration.

- Make RelStorage instances implement
``IStorageCurrentRecordIteration``. This lets both
history-preserving and history-free storages work with
``zodbupdate``. See :issue:`389`.

- RelStorage instances now pool their storage connection. Depending on
the workload and ZODB configuration, this can result in requiring
fewer storage connections. See :issue:`409` and :pr:`417`.

There is a potential semantic change: Under some circumstances, the
``loadBefore`` and ``loadSerial`` methods could be used to load
states from the future (not visible to the storage's load
connection) by using the store connection. This ability has been
removed.

- Add support for Python 3.9.

- Drop support for Python 3.5.

- Build manylinux x86-64 and macOS wheels on Travis CI as part of the
release process. These join the Windows wheels in being
automatically uploaded to PyPI.

3.2.1

==================

- Improve the speed of loading large cache files by reducing the cost
of cache validation.

- The timing metrics for ``current_object_oids`` are always collected,
not just sampled. MySQL and PostgreSQL will only call this method
once at startup during persistent cache validation. Other databases
may call this method once during the commit process.

- Add the ability to limit how long persistent cache validation will
spend polling the database for invalid OIDs. Set the environment
variable ``RS_CACHE_POLL_TIMEOUT`` to a number of seconds before
importing RelStorage to use this.

- Avoid an ``AttributeError`` if a persistent ``zope.component`` site
manager is installed as the current site, it's a ghost, and we're
making a load query for the first time in a particular connection.
See :issue:`411`.

- Add some DEBUG level logging around forced invalidations of
persistent object caches due to exceeding the cache MVCC limits. See
:issue:`338`.

3.2.0

==================

- Make the ``gevent psycopg2`` driver support critical sections. This
reduces the amount of gevent switches that occur while database
locks are held under a carefully chosen set of circumstances that
attempt to balance overall throughput against latency. See
:issue:`407`.

- Source distributions: Fix installation when Cython isn't available.
Previously it incorrectly assumed a '.c' extension, which led to
compiler errors. See :issue:`405`.

- Improve various log messages.

3.1.2

==================

- Fix the psycopg2cffi driver inadvertently depending on the
``psycopg2`` package. See :issue:`403`.
- Make the error messages for unavailable drivers include more
information on underlying causes.
- Log a debug message when an "auto" driver is successfully resolved.
- Add a ``--debug`` argument to the ``zodbconvert`` command line tool
to enable DEBUG level logging.
- Add support for pg8000 1.16. Previously, a ``TypeError`` was raised.

3.1.1

==================

- Add support for pg8000 >= 1.15.3. Previously, a ``TypeError`` was
raised.

- SQLite: Committing a transaction releases some resources sooner.
This makes it more likely that auto-checkpointing of WAL files will be
able to reclaim space in some scenarios. See :issue:`401`.

3.1.0

==================

- Use unsigned BTrees for internal data structures to avoid wrapping
in large databases. Requires BTrees 4.7.2.

3.0.1

==================

- Oracle: Fix an AttributeError saving to Oracle. See :pr:`380` by Mauro
Amico.

- MySQL+gevent: Release the critical section a bit sooner. See :issue:`381`.

- SQLite+gevent: Fix possible deadlocks with gevent if switches
occurred at unexpected times. See :issue:`382`.

- MySQL+gevent: Fix possible deadlocks with gevent if switches
occurred at unexpected times. See :issue:`385`.  This also included
some minor optimizations.

.. caution::

  This introduces a change in a stored procedure that is not
  compatible with older versions of RelStorage. When this version
  is first deployed, if there are older versions of RelStorage
  still running, they will be unable to commit. They will fail with
  a transient conflict error; they may attempt retries, but will not
  succeed. Read-only transactions will continue to work.

3.0.0

==================

- Build binary wheels for Python 3.8 on Windows.

3.0rc1

===================

- SQLite: Avoid logging (at DEBUG level) an error executing ``PRAGMA
OPTIMIZE`` when closing a read-only (load) connection. Now, the
error is avoided by making the connection writable.

- PostgreSQL: Reduce the load connection's isolation level from
``SERIALIZABLE`` to ``REPEATABLE READ`` (two of the three other
supported databases also operate at this level). This allows
connecting to hot standby/streaming replicas. Since the connection
is read-only, and there were no other ``SERIALIZABLE`` transactions
(the store connection operates in ``READ COMMITTED`` mode), there
should be no other visible effects. See :issue:`376`.

- PostgreSQL: pg8000: Properly handle a ``port`` specification in the
``dsn`` configuration. See :issue:`378`.

- PostgreSQL: All drivers pass the ``application_name`` parameter at
connect time instead of later. This solves an issue with psycopg2
and psycopg2cffi connecting to hot standbys.

- All databases: If ``create-schema`` is false, use a read-only
connection to verify that the schema is correct.

- Packaging: Prune unused headers from the include/ directory.

3.0b3

==================

- SQLite: Fix a bug that could lead to invalid OIDs being allocated if
transactions were imported from another storage.

3.0b2

==================

- SQLite: Require the database to be in dedicated directory.

.. caution::

  This introduces a change to the <sqlite3> configuration.
  Please review the documentation. It is possible to migrate a
  database created earlier to the new structure, but no automated
  tooling or documentation is provided for that.

- SQLite: Allow configuration of many of SQLite's PRAGMAs for advanced
tuning.

- SQLite: Fix resetting OIDs when zapping a storage. This could be a
problem for benchmarks.

- SQLite: Fix large prefetches resulting in ``OperationalError``

- SQLite: Improve the speed of copying transactions into a SQLite
storage (e.g., with zodbconvert).

- SQLite: Substantially improve general performance. See :pr:`368`.

- SQLite: Add the ``gevent sqlite3`` driver that periodically yields
to the gevent loop at configurable intervals.

- PostgreSQL: Improve the speed of writes when using the 'gevent
psycopg2' driver.

3.0b1

==================

- Make SQLite and Oracle both use UPSERT queries instead of multiple
database round trips.

- Fix an exception with large transactions on SQLite.

- Fix compiling the C extension on very new versions of Microsoft
Visual Studio.

3.0a13

===================

- Further speed improvements and memory efficiency gains of around 30%
for the cache.

- Restore support for Python 2.7 on Windows.

- No longer require Cython to build from a sdist (.tar.gz).

- Add support for using a SQLite file as a RelStorage backend, if all
processes accessing it will be on a single machine. The advantage
over FileStorage is that multiple processes can use the database
concurrently. To allow multiple processes to use a FileStorage one
must deploy ZEO, even if all processes are on a single machine. See
:pr:`362`.

- Fix and test Oracle. The minimum required cx_oracle is now 6.0.

- Add support for Python 3.8.

3.0a12

===================

- Add the ``gevent psycopg2`` driver to allow using the fast psycopg2
driver with gevent.

- Conflict resolution prefetches data for conflicted objects, reducing
the number of database queries and locks needed.

- Introduce a driver-agnostic method for elevating database connection
priority during critical times of two-phase commit, and implement it
for the ``gevent MySQLdb`` driver. This reduces the amount of gevent
switches that occur while database locks are held under a carefully
chosen set of circumstances that attempt to balance overall
throughput against latency. See :issue:`339`.

- Drop support for Python 2.7 on Windows. The required compiler is
very old. See :issue:`358`.

- Substantially reduce the overhead of the cache, making it more
memory efficient. Also make it substantially faster. This was done
by rewriting it in C. See :issue:`358`.

3.0a11

===================

- Make ``poll_invalidations`` handle other retryable internal
exceptions besides just ``ReadConflictError`` so they don't
propagate out to ``transaction.begin()``.

- Make the zodburi resolver entry points not require a specific
RelStorage extra such as 'postgres', in case there is a desire to
use a different database driver than the default that's installed
with that extra. See :issue:`342`, reported by Éloi Rivard.

- Make the zodburi resolvers accept the 'driver' query parameter to
allow selecting a specific driver to use. This functions the same as
in a ZConfig configuration.

- Make the zodburi resolvers more strict on the distinction between
boolean arguments and arbitrary integer arguments. Previously, a
query like ``?read_only=12345&cache_local_mb=yes`` would have been
interpreted as ``True`` and ``1``, respectively. Now it produces errors.

- Fix the calculation of the persistent cache size, especially on
Python 2. This is used to determine when to shrink the disk cache.
See :issue:`317`.

- Fix several race conditions when packing history-free storages
through a combination of changes in ordering and more strongly
consistent (``READ ONLY REPEATABLE READ``) transactions.
Reported in :issue:`325` by krissik with initial PR by Andreas
Gabriel.

- Make ``zodbpack`` pass RelStorage specific options like
``--prepack`` and ``--use-prepack-state`` to the RelStorage, even
when it has been wrapped in a ``zc.zlibstorage``.

- Reduce the amount of memory required to pack a RelStorage through
more careful datastructure choices. On CPython 3, the peak
memory usage of the prepack phase can be up to 9 times less. On
CPython 2, pre-packing a 30MM row storage required 3GB memory; now
it requires about 200MB.

- Use server-side cursors during packing when available, further
reducing the amount of memory required. See :issue:`165`.

- Make history-free database iterators from the same storage use a
consistent view of the database (until a transaction is committed
using the storage or ``sync()`` is called). This prevents data loss
in some cases. See :issue:`344`.

- Make copying transactions *from* a history-free RelStorage (e.g., with
``zodbconvert``) require substantially less memory (75% less).

- Make copying transactions *to* a RelStorage clean up temporary blob
files.

- Make ``zodbconvert`` log progress at intervals instead of for every
transaction. Logging every transaction could add significant overhead
unless stdout was redirected to a file.

- Avoid attempting to lock objects being created. See :issue:`329`.

- Make cache vacuuming faster.

3.0a10

===================

- Fix a bug where the persistent cache might not properly detect
object invalidations if the MVCC index pulled too far ahead at save
time. Now it explicitly checks for invalidations at load time, as
earlier versions did. See :pr:`343`.

- Require perfmetrics 3.0.

3.0a9

==================

- Several minor logging improvements.

- Allow many internal constants to be set with environment variables
at startup for experimentation. These are presently undocumented; if
they prove useful to adjust in different environments they may be
promoted to full configuration options.

- Fix importing RelStorage when ``zope.schema`` is not installed.
``zope.schema`` is intended to be a test dependency and optional for
production deployments. Reported in :issue:`334` by Jonathan Lung.

- Make the gevent MySQL driver more efficient at avoiding needless waits.

- Due to a bug in MySQL (incorrectly rounding the 'minute' value of a
timestamp up), TIDs generated in the last half second of a minute
would suddenly jump ahead by 4,266,903,756 integers (a full minute).

- Fix leaking an internal value for ``innodb_lock_timeout`` across
commits on MySQL. This could lead to ``tpc_vote`` blocking longer
than desired. See :issue:`331`.

- Fix ``undo`` to purge the objects whose transaction was revoked from
the cache.

- Make historical storages read-only, raising
``ReadOnlyHistoryError``, during the commit process. Previously this
was only enforced at the ``Connection`` level.

- Rewrite the cache to understand the MVCC nature of the connections
that use it.

This eliminates the use of "checkpoints." Checkpoints established a
sort of index for objects to allow them to be found in the cache
without necessarily knowing their ``_p_serial`` value. To achieve
good hit rates in large databases, large values for the
``cache-delta-size-limit`` were needed, but if there were lots of
writes, polling to update those large checkpoints could become very
expensive. Because checkpoints were separate in each ZODB connection
in a process, and because when one connection changed its
checkpoints every other connection would also change its checkpoints
on the next access, this could quickly become a problem in highly
concurrent environments (many connections making many large database
queries at the same time). See :issue:`311`.

The new system uses a series of chained maps representing polling
points to build the same index data. All connections can share all
the maps for their view of the database and earlier. New polls add
new maps to the front of the list as needed, and old maps are
removed once they are no longer needed by any active transaction.
This simulates the underlying database's MVCC approach.

Other benefits of this approach include:

- No more large polls. While each connection still polls for each
 transaction it enters, they now share state and only poll against
 the last time a poll occurred, not the last time they were used.
 The result should be smaller, more predictable polling.

- Having a model of object visibility allows the cache to use more
 efficient data structures: it can now use the smaller LOBTree to
 reduce the memory occupied by the cache. It also requires
 fewer cache entries overall to store multiple revisions of an
 object, reducing the overhead. And there are no more key copies
 required after a checkpoint change, again reducing overhead and
 making the LRU algorithm more efficient.

- The cache's LRU algorithm is now at the object level, not the
 object/serial pair.

- Objects that are known to have been changed but whose old revision
 is still in the cache are preemptively removed when no references
 to them are possible, reducing cache memory usage.

- The persistent cache can now guarantee not to write out data that
 it knows to be stale.

Dropping checkpoints probably makes memcache less effective, but
memcache hasn't been recommended for a while.

3.0a8

==================

- Improve the safety of the persistent local cache in high-concurrency
environments using older versions of SQLite. Perform a quick
integrity check on startup and refuse to use the cache files if they
are reported corrupt.

- Switch the order in which object locks are taken: try shared locks
first and only then attempt exclusive locks. Shared locks do not
have to block, so a quick lock timeout here means that a
``ReadConflictError`` is inevitable. This works best on PostgreSQL
and MySQL 8, which support true non-blocking locks. On MySQL 5.7,
non-blocking locks are emulated with a 1s timeout. See :issue:`310`.

.. note:: The transaction machinery will retry read conflict errors
         by default. The more rapid detection of them may lead to
         extra retries if there was a process still finishing its
         commit. Consider adding small sleep backoffs to retry
         logic.

- Fix MySQL to immediately rollback its transaction when it gets a
lock timeout, while still in the stored procedure on the database.
Previously it would have required a round trip to the Python
process, which could take an arbitrary amount of time while the
transaction may have still been holding some locks. (After
:issue:`310` they would only be shared locks, but before they would
have been exclusive locks.) This should make for faster recovery in
heavily loaded environments with lots of conflicts. See :issue:`313`.

- Make MySQL clear its temp tables using a single round trip.
Truncation is optional and disabled by default. See :issue:`319`.

- Fix PostgreSQL to not send the definition of the temporary tables
for every transaction. This is only necessary for the first
transaction.

- Improve handling of commit and rollback, especially on PostgreSQL.
We now generate many fewer unneeded rollbacks. See :issue:`289`.

- Stop checking the status of ``readCurrent`` OIDs twice.

- Make the gevent MySQL driver yield more frequently while getting
large result sets. Previously it would block in C to read the entire
result set. Now it yields according to the cursor&#39;s ``arraysize``.
See :issue:`315`.

- Polling for changes now iterates the cursor instead of using
``fetchall()``. This can reduce memory usage and provide better
behaviour in a concurrent environment, depending on the cursor
implementation.

- Add three environment variables to control the odds of whether any
given poll actually suggests shifted checkpoints. These are all
floating point numbers between 0 and 1. They are
``RELSTORAGE_CP_REPLACEMENT_CHANCE_WHEN_FULL`` (default to 0.7,
i.e., 70%), ``RELSTORAGE_CP_REPLACEMENT_BEGIN_CONSIDERING_PERCENT``
(default 0.8) and ``RELSTORAGE_CP_REPLACEMENT_CHANCE_WHEN_CLOSE``
(default 0.2). (There are corresponding class variables on the
storage cache that could also be set.) Use values of ``1``, ``1``
and ``0`` to restore the old completely deterministic behaviour.
It's not clear whether these will be useful, so they are not
officially options yet but they may become so. Feedback is
appreciated! See :issue:`323`.

.. note::

  These were removed in 3.0a9.

3.0a7

==================

- Eliminate runtime dependency on ZEO. See :issue:`293`.

- Fix a rare race condition allocating OIDs on MySQL. See
:issue:`283`.

- Optimize the ``loadBefore`` method. It appears to be mostly used in
the tests.

- Fix the blob cache cleanup thread to use a real native thread if
we're monkey-patched by gevent, using gevent's thread pool.
Previously, cleaning up the blob cache would block the event loop
for the duration. See :issue:`296`.

- Improve the thread safety and resource usage of blob cache cleanup.
Previously it could spawn many useless threads.

- When caching a newly uploaded blob for a history free storage, if
there's an older revision of the blob in the cache, and it is not in
use, go ahead and preemptively remove it from disk. This can help
prevent the cache size from growing out of hand and limit the number
of expensive full cache checks required. See :issue:`297`.

- Change the default value of the configuration setting
``shared-blob-dir`` to false, meaning that the default is now to use
a blob cache. If you were using shared blobs before, you'll need to
explicitly set a value for ``shared-blob-dir`` to ``true`` before
starting RelStorage.

- Add an option, ``blob-cache-size-check-external``, that causes the
blob cache cleanup process to run in a subprocess instead of a
thread. This can free up the storage process to handle requests.
This is not recommended on Windows. (``python -m
relstorage.blobhelper.cached /path/to/cache size_in_bytes`` can be
used to run a manual cleanup at any time. This is currently an
internal implementation detail.)

- Abort storage transactions immediately when an exception occurs.
Previously this could be specified by setting the environment
variable ``RELSTORAGE_ABORT_EARLY``. Aborting early releases
database locks to allow other transactions to make progress
immediately. See :issue:`50`.

- Reduce the strength of locks taken by ``Connection.readCurrent`` so
that they don't conflict with other connections that just want to
verify they haven't changed. This also lets us immediately detect a
conflict error with an in-progress transaction that is trying to
alter those objects. See :issue:`302`.

- Make databases that use row-level locks (MySQL and PostgreSQL) raise
specific exceptions on failures to acquire those locks. A different
exception is raised for rows a transaction needs to modify compared
to rows it only needs to read. Both are considered transient to
encourage transaction middleware to retry. See :issue:`303`.

- Move more of the vote phase of transaction commit into a database
stored procedure on MySQL and PostgreSQL, beginning with taking the
row-level locks. This eliminates several more database round trips
and the need for the Python thread (or greenlet) to repeatedly
release and then acquire the GIL while holding global locks. See
:issue:`304`.

- Make conflict resolution require fewer database round trips,
especially on PostgreSQL and MySQL, at the expense of using more
memory. In the ideal case it now only needs one (MySQL) or two
(PostgreSQL) queries. Previously it needed at least twice the number
of trips as there were conflicting objects. On both databases, the
benchmarks are 40% to 80% faster (depending on cache configuration).

3.0a6

==================

Enhancements
------------

- Eliminate a few extra round trips to the database on transaction
completion: One extra ``ROLLBACK`` in all databases, and one query
against the ``transaction`` table in history-preserving databases.
See :issue:`159`.

- Prepare more statements used during regular polling.

- Gracefully handle certain disconnected exceptions when rolling back
connections in between transactions. See :issue:`280`.

- Fix a cache error ("TypeError: NoneType object is not
subscriptable") when an object had been deleted (such as through
undoing its creation transaction, or with ``multi-zodb-gc``).

- Implement ``IExternalGC`` for history-preserving databases. This
lets them be used with `zc.zodbdgc
<https://pypi.org/project/zc.zodbdgc/>`_, allowing for
multi-database garbage collection (see :issue:`76`). Note that you
must pack the database after running ``multi-zodb-gc`` in order to
reclaim space.

.. caution::

  It is critical that ``pack-gc`` be turned off (set to false) in a
  multi-database and that only ``multi-zodb-gc`` be used to perform
  garbage collection.

Packing
~~~~~~~

- Make ``RelStorage.pack()`` also accept a TID from the RelStorage
database to pack to. The usual Unix timestamp form for choosing a
pack time can be ambiguous in the event of multiple transactions
within a very short period of time. This is mostly a concern for
automated tests.

Similarly, it will accept a value less than 0 to mean the most
recent transaction in the database. This is useful when machine
clocks may not be well synchronized, or in automated tests.
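The ambiguity of a Unix-timestamp pack time can be illustrated with a stdlib-only sketch. It assumes ZODB's standard TID layout, the 8-byte `persistent.TimeStamp` encoding: minutes since 1900 in the high 32 bits and fractions of a minute in the low 32 bits. The sketch reimplements that layout by hand. It is not RelStorage's `pack()` code, and `unix_to_tid` and `tid_to_unix` are hypothetical names.

```python
import calendar
import time

# Resolution of the low 32 bits: 60 seconds split into 2**32 steps (exact in binary).
SCONV = 60.0 / (1 << 32)


def unix_to_tid(unix_time: float) -> int:
    """Encode a Unix timestamp as a 64-bit TID integer (TimeStamp layout)."""
    t = time.gmtime(unix_time)
    minutes = (((t.tm_year - 1900) * 12 + t.tm_mon - 1) * 31 + t.tm_mday - 1) * 24
    minutes = (minutes + t.tm_hour) * 60 + t.tm_min
    low = int((unix_time % 60) / SCONV)  # fraction of the current minute
    return (minutes << 32) | low


def tid_to_unix(tid: int) -> float:
    """Decode a 64-bit TID integer back into a (lossy) float Unix timestamp."""
    minutes, low = tid >> 32, tid & 0xFFFFFFFF
    minutes, minute = divmod(minutes, 60)
    minutes, hour = divmod(minutes, 24)
    minutes, day0 = divmod(minutes, 31)
    year0, month0 = divmod(minutes, 12)
    whole = calendar.timegm((year0 + 1900, month0 + 1, day0 + 1, hour, minute, 0))
    return whole + low * SCONV


tid = unix_to_tid(1600000000.5)
# Count how many distinct float timestamps 100 consecutive TIDs map onto.
print(len({tid_to_unix(tid + i) for i in range(100)}))
```

The smallest step between two TIDs is 60/2**32 seconds (about 14 ns). A float Unix timestamp near 2020 only resolves to roughly a quarter of a microsecond, so several distinct TIDs collapse onto the same timestamp. Passing the TID itself avoids that ambiguity.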

Implementation
--------------

- Remove vestigial top-level thread locks. No instance of RelStorage
is thread safe.

RelStorage is an ``IMVCCStorage``, which means that each ZODB
``Connection`` gets its own new storage object. No visible storage
state is shared among Connections. Connections are explicitly
documented as not being thread safe. Since 2.0, RelStorage's
Connection instances have taken advantage of that fact to be a
little lighter weight by not being thread safe. However, they
still paid the overhead of locking method calls and code complexity.

The top-level storage (the one belonging to a ``ZODB.DB``) still
used heavyweight locks in earlier releases. ``ZODB.DB.storage`` is
documented as being only useful for tests, and the ``DB`` object
itself does not expose any operations that use the storage in a way
that would require thread safety.

The remaining thread safety support has been removed. This
simplifies the code and reduces overhead.

If you were previously using the ``ZODB.DB.storage`` object, or a
``RelStorage`` instance you constructed manually, from multiple
threads, instead make sure each thread has a distinct
``RelStorage.new_instance()`` object.
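One way to follow that advice is to keep a per-thread instance in `threading.local()`. The sketch below uses a hypothetical `FakeStorage` stand-in so it runs without a database. With a real storage you would call `new_instance()` on the `RelStorage` object in the same way. ZODB's `IMVCCStorage` interface also declares a `release()` method for giving an instance back when the thread is finished.

```python
import threading


class FakeStorage:
    """Hypothetical stand-in for a RelStorage object; only new_instance() is mimicked."""

    def new_instance(self):
        # Each call hands out a separate storage object.
        return FakeStorage()


_per_thread = threading.local()


def storage_for_this_thread(root):
    """Return this thread's private storage instance, creating it on first use."""
    instance = getattr(_per_thread, "storage", None)
    if instance is None:
        instance = root.new_instance()
        _per_thread.storage = instance
    return instance
```

Repeated calls in one thread return the same object, while each thread gets its own. The shared `root` is never used directly for storage operations.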

- A ``RelStorage`` instance now only implements the appropriate subset
of ZODB storage interfaces according to its configuration. For
example, if there is no configured ``blob-dir``, it won't implement
``IBlobStorage``, and if ``keep-history`` is false, it won't
implement ``IStorageUndoable``.

- Refactor RelStorage internals for a cleaner separation of concerns.
This includes how (some) queries are written and managed, making it
easier to prepare statements, but only those actually used.


MySQL
-----

- On MySQL, move allocating a TID into the database. In benchmarks on
a local machine this can be a scant few percent faster, but it's
primarily intended to reduce the number of round-trips to the
database. This is a step towards :issue:`281`. See :pr:`286`.

- On MySQL, set the connection timezone to be UTC. This is necessary
to get values consistent between ``UTC_TIMESTAMP``,
``UNIX_TIMESTAMP``, ``FROM_UNIXTIME``, and Python's ``time.gmtime``,
as used for comparing TIDs.

- On MySQL, move most steps of finishing a transaction into a stored
procedure. Together with the TID allocation changes, this reduces
the number of database queries from::

   1 to lock
 + 1 to get TID
 + 1 to store transaction (0 in history free)
 + 1 to move states
 + 1 for blobs (2 in history free)
 + 1 to set current (0 in history free)
 + 1 to commit
 = 7 or 6 (in history free)

down to 1. This is expected to be especially helpful for gevent
deployments, as the database lock is held, the transaction finalized
and committed, and the database lock released, all without involving
greenlets or greenlet switches. By allowing the GIL to be released
for longer, it may also help threaded environments. See
:issue:`281` and :pr:`287` for benchmarks and specifics.

.. caution::

 MySQL 5.7.18 and earlier contain a severe bug that causes the
 server to crash when the stored procedure is executed.
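The query tally in the literal block above can be checked mechanically. This small sketch re-adds the listed per-step counts and scales the savings to a batch of commits. The step labels come from the list. The function names are hypothetical, and the "after" figure of one call is taken from the text.

```python
# Per-commit queries on MySQL before the stored procedure, copied from the
# list above as (history-preserving, history-free) pairs.
QUERIES_BEFORE = {
    "lock": (1, 1),
    "get TID": (1, 1),
    "store transaction": (1, 0),
    "move states": (1, 1),
    "blobs": (1, 2),
    "set current": (1, 0),
    "commit": (1, 1),
}
QUERIES_AFTER = 1  # one stored-procedure call, per the text above


def queries_before(history_free: bool) -> int:
    """Sum the per-step query counts for one storage layout."""
    column = 1 if history_free else 0
    return sum(pair[column] for pair in QUERIES_BEFORE.values())


def round_trips_saved(history_free: bool, commits: int) -> int:
    """Round trips avoided over `commits` transactions."""
    return (queries_before(history_free) - QUERIES_AFTER) * commits
```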


- Make PyMySQL use the same precision as mysqlclient when sending
floating point parameters.

- Automatically detect when MySQL stored procedures in the database
are out of date with the current source in this package and replace
them.

PostgreSQL
----------

- As for MySQL, move allocating a TID into the database.

- As for MySQL, move most steps of finishing a transaction into a
stored procedure. On psycopg2 and psycopg2cffi this is done in a
single database call. With pg8000, however, it still takes two, with
the second call being the COMMIT call that releases locks.

- Speed up getting the approximate number of objects
(``len(storage)``) in a database by using the estimates collected by
the autovacuum process or analyzing tables, instead of asking for a
full table scan.

3.0a5
==================

- Reduce the time that MySQL will wait to perform OID garbage
collection on startup. See :issue:`271`.

- Fix several instances where RelStorage could attempt to perform
operations on a database connection with outstanding results on a
cursor. Some database drivers can react badly to this, depending on
the exact circumstances. For example, mysqlclient can raise
``ProgrammingError: (2014, "Commands out of sync; you can't run this
command now")``. See :issue:`270`.

- Fix the "gevent MySQLdb" driver to be cooperative during ``commit``
and ``rollback`` operations. Previously, it would block the event
loop for the entire time it took to send the commit or rollback
request, the server to perform the request, and the result to be
returned. Now, it frees the event loop after sending the request.
See :issue:`272`.

- Call ``set_min_oid`` less often if a storage is just updating
existing objects, not creating its own.

- Fix an occasional possible deadlock in MySQL's ``set_min_oid``. See
:pr:`276`.

3.0a4
==================

- Add support for the ZODB 5 ``connection.prefetch(*args)`` API. This
takes either OIDs (``obj._p_oid``) or persistent ghost objects, or
an iterator of those things, and asks the storage to load them into
its cache for use in the future. In RelStorage, this uses the shared
cache and so may be useful for more than one thread. This can be
three or more times faster than loading objects on demand. See :issue:`239`.
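The argument forms accepted by `prefetch` can be flattened into a plain stream of OIDs as in this minimal sketch. The accepted forms are OIDs, objects exposing `_p_oid`, or an iterable of either. `iter_prefetch_oids` and `Ghost` are hypothetical helpers for illustration, not ZODB's internal implementation.

```python
from typing import Iterator


class Ghost:
    """Hypothetical stand-in for a persistent ghost: only _p_oid is modeled."""

    def __init__(self, oid: bytes):
        self._p_oid = oid


def iter_prefetch_oids(*args) -> Iterator[bytes]:
    """Flatten prefetch-style arguments into a stream of OIDs.

    Each argument may be an OID (bytes), an object with a ``_p_oid``
    attribute, or an iterable of either (one level of nesting).
    """
    for arg in args:
        if isinstance(arg, bytes):  # check bytes first: bytes are iterable too
            yield arg
        elif hasattr(arg, "_p_oid"):
            yield arg._p_oid
        else:
            for item in arg:
                yield item if isinstance(item, bytes) else item._p_oid
```

With a real ZODB 5 connection you would pass the same kinds of arguments straight to the API, for example `connection.prefetch(obj_a, [obj_b, obj_c])`, where the objects are hypothetical persistent objects.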

- Stop chunking blob uploads on PostgreSQL. All supported PostgreSQL
versions natively handle blobs greater than 2GB in size, and the
server was already chunking the blobs for storage, so our layer of
extra chunking has become unnecessary.

.. important::

  The first time a storage is opened with this version,
  blobs that have multiple chunks will be collapsed into a single
  chunk. If there are many blobs larger than 2GB, this could take
  some time.

  It is recommended you have a backup before installing this
  version.

  To verify that the blobs were correctly migrated, you should
  clean or remove your configured blob-cache directory, forcing new
  blobs to be downloaded.

- Fix a bug that left large objects behind if a PostgreSQL database
containing any blobs was ever zapped (with ``storage.zap_all()``).
The ``zodbconvert`` command, the ``zodbshootout`` command, and the
RelStorage test suite could all zap databases. Running the
``vacuumlo`` command included with PostgreSQL will free such
orphaned large objects, after which a regular ``vacuumdb`` command
can be used to reclaim space. See :issue:`260`.

- Conflict resolution can use data from the cache, thus potentially
eliminating a database hit during a very time-sensitive process.
Please file issues if you encounter any strange behaviour when
concurrently packing to the present time and also resolving
conflicts, in case there are corner cases.

- Packing a storage now invalidates the cached values that were packed
away. For the global caches this helps reduce memory pressure; for
the local cache this helps reduce memory pressure and ensure a more
useful persistent cache (this probably matters most when running on
a single machine).

- Make MySQL use ``ON DUPLICATE KEY UPDATE`` rather than ``REPLACE``.
This can be friendlier to the storage engine as it performs an
in-place ``UPDATE`` rather than a ``DELETE`` followed by an
``INSERT``. See :issue:`189`.

- Make PostgreSQL use an upsert query for moving rows into place on
history-preserving databases.
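The difference between `REPLACE` and an in-place upsert can be shown with Python's built-in `sqlite3`. SQLite supports both statements, so it stands in here for the server databases. MySQL spells the upsert `INSERT ... ON DUPLICATE KEY UPDATE`, and PostgreSQL uses `INSERT ... ON CONFLICT ... DO UPDATE`. The table is a hypothetical simplification, not RelStorage's schema, and the upsert syntax needs SQLite 3.24 or later.

```python
import sqlite3


def compare_replace_and_upsert():
    """Return (row after REPLACE, row after upsert) for a simplified table."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical table; `note` stands in for any column the write does not set.
    conn.execute(
        "CREATE TABLE object_state (zoid INTEGER PRIMARY KEY, tid INTEGER, note TEXT)"
    )
    conn.execute("INSERT INTO object_state VALUES (1, 100, 'kept?')")

    # REPLACE: the conflicting row is deleted and a new row is inserted,
    # so columns not named in the statement fall back to their defaults.
    conn.execute("REPLACE INTO object_state (zoid, tid) VALUES (1, 101)")
    after_replace = conn.execute(
        "SELECT tid, note FROM object_state WHERE zoid = 1"
    ).fetchone()

    # Upsert: the existing row is updated in place; other columns survive.
    conn.execute("UPDATE object_state SET note = 'kept?' WHERE zoid = 1")
    conn.execute(
        "INSERT INTO object_state (zoid, tid) VALUES (1, 102) "
        "ON CONFLICT(zoid) DO UPDATE SET tid = excluded.tid"
    )
    after_upsert = conn.execute(
        "SELECT tid, note FROM object_state WHERE zoid = 1"
    ).fetchone()
    conn.close()
    return after_replace, after_upsert


print(compare_replace_and_upsert())  # ((101, None), (102, 'kept?'))
```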

- Support ZODB 5's parallel commit feature. This means that the
database-wide commit lock is taken much later in the process, and
held for a much shorter time than before.

Previously, the commit lock was taken during the ``tpc_vote`` phase,
and held while we checked ``Connection.readCurrent`` values, and
checked for (and hopefully resolved) conflicts. Other transaction
resources (such as other ZODB databases in a multi-db setup) then
got to vote while we held this lock. Finally, in ``tpc_finally``,
objects were moved into place and the lock was released. This
prevented any other storage instances from checking for
``readCurrent`` or conflicts while we were doing that.

Now, ``tpc_vote`` is (usually) able to check
``Connection.readCurrent`` and check and resolve conflicts without
taking the commit lock. Only in ``tpc_finish``, when we need to
finally allocate the transaction ID, is the commit lock taken, and
only held for the duration needed to finally move objects into
place. This allows other storages for this database, and other
transaction resources for this transaction, to proceed with voting,
conflict resolution, etc, in parallel.

Consistent results are maintained by use of object-level row
locking. Thus, only transactions that attempt to modify the same
object will block each other.

There are two exceptions. First, if the ``storage.restore()`` method
is used, the commit lock must be taken very early (before
``tpc_vote``). This is usually only done as part of copying one
database to another. Second, if the storage is configured with a
shared blob directory instead of a blob cache (meaning that blobs
are *only* stored on the filesystem) and the transaction has added
or mutated blobs, the commit lock must be taken somewhat early to
ensure blobs can be saved (after conflict resolution, etc, but
before the end of ``tpc_vote``). It is recommended to store blobs on
the RDBMS server and use a blob cache; the shared blob layout can be
considered deprecated for this reason.

In addition, the new locking scheme means that packing no longer
needs to acquire a commit lock and more work can proceed in parallel
with regular commits. (There may, however, have been some speed
regressions in the deletion phase of packing on MySQL; this has not
been benchmarked.)

.. note::

  If the environment variable ``RELSTORAGE_LOCK_EARLY`` is
  set when RelStorage is imported, then parallel commit will not be
  enabled, and the commit lock will be taken at the beginning of
  the ``tpc_vote`` phase, just like before: conflict resolution and
  ``readCurrent`` will all be handled with the lock held.

  This is intended for use diagnosing and temporarily working
  around bugs, such as the database driver reporting a deadlock
  error. If you find it necessary to use this setting, please
  report an issue at https://github.com/zodb/relstorage/issues.

See :issue:`125`.
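The locking scheme described above can be sketched as a toy, in-process model. Each object has its own row lock, and a global commit lock is held only briefly at the end to allocate a TID. This is a hypothetical illustration with `threading` locks. RelStorage itself takes real row locks in the RDBMS. Acquiring locks in a consistent (sorted) order is one common way to avoid deadlocks, and this sketch does not claim that RelStorage does the same.

```python
import itertools
import threading


class RowLockTable:
    """Toy model: per-object row locks plus a short-lived global commit lock."""

    def __init__(self):
        self._guard = threading.Lock()  # protects the lock dictionary itself
        self._row_locks = {}
        self.commit_lock = threading.Lock()
        self._next_tid = itertools.count(1)

    def _lock_for(self, oid):
        with self._guard:
            return self._row_locks.setdefault(oid, threading.Lock())

    def lock_rows(self, oids, blocking=True):
        """Lock the given OIDs in sorted order; return the held locks, or None."""
        held = []
        for oid in sorted(set(oids)):
            lock = self._lock_for(oid)
            if not lock.acquire(blocking):
                for acquired in reversed(held):  # back out a partial acquisition
                    acquired.release()
                return None
            held.append(lock)
        return held

    def finish(self, held):
        """Take the commit lock only to allocate a TID, then release the rows."""
        with self.commit_lock:
            tid = next(self._next_tid)  # the real storage also moves rows here
        for lock in reversed(held):
            lock.release()
        return tid
```

Two transactions touching disjoint objects can hold their row locks at the same time. Only a transaction that needs an already-locked object has to wait or fail.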

- Deprecate the option ``shared-blob-dir``. Shared blob dirs prevent
using parallel commits when blobs are part of a transaction.

- Remove the 'umysqldb' driver option. This driver exhibited failures
with row-level locking used for parallel commits. See :issue:`264`.

- Migrate all remaining MySQL tables to InnoDB. This is primarily the
tables used during packing, but also the table used for allocating
new OIDs.

Tables will be converted the first time a storage is opened that is
allowed to create the schema (``create-schema`` in the
configuration; default is true). For large tables, this may take
some time, so it is recommended to finish any outstanding packs
before upgrading RelStorage.

If schema creation is not allowed, and required tables are not using
InnoDB, an exception will be raised. Please contact the RelStorage
maintainers on GitHub if you have a need to use a storage engine
besides InnoDB.

This allows for better error detection during packing with parallel
commits. It is also required for `MySQL Group Replication
<https://dev.mysql.com/doc/refman/8.0/en/group-replication-requirements.html>`_.
Benchmarking also shows that creating new objects can be up to 15%
faster due to faster OID allocation.

Things to be aware of:

 - MySQL's `general conversion notes
   <https://dev.mysql.com/doc/refman/8.0/en/converting-tables-to-innodb.html>`_
   suggest that if you had tuned certain server parameters for
   MyISAM tables (which RelStorage only used during packing) it
   might be good to evaluate those parameters again.
 - InnoDB tables may take more disk space than MyISAM tables.
 - The ``new_oid`` table may temporarily have more rows in it at one
   time than before. They will still be garbage collected
   eventually. The change in strategy was necessary to handle
   concurrent transactions better.

See :issue:`188`.

- Fix an ``OperationalError: database is locked`` that could occur on
startup if multiple processes were reading or writing the cache
database. See :issue:`266`.

3.0a3
==================

- Zapping a storage now also removes any persistent cache files. See
:issue:`241`.

- Zapping a MySQL storage now issues ``DROP TABLE`` statements instead
of ``DELETE FROM`` statements. This is much faster on large
databases. See :issue:`242`.

- Work around the PyPy 7.1 JIT bug when using MySQL Connector/Python. It is no
longer necessary to disable the JIT in PyPy 7.1.

- On PostgreSQL, use PostgreSQL's efficient binary ``COPY FROM`` to
store objects into the database. This can be 20-40% faster. See
:issue:`247`.

- Use more efficient mechanisms to poll the database for current TIDs
when verifying serials in transactions.

- Silence a warning about ``cursor.connection`` from pg8000. See
:issue:`238`.

- Poll the database for the correct TIDs of older transactions when
loading from a persistent cache, and only use the entries if they
are current. This restores the functionality lost in the fix for
:issue:`249`.

- Increase the default cache delta limit sizes.

- Fix a race condition accessing non-shared blobs when the blob cache
limit was reached, which could result in blobs appearing to be
spuriously empty. This was only observed on macOS. See :issue:`219`.

- Fix a bug computing the cache delta maps when restoring from
persistent cache that could cause data from a single transaction to
be stale, leading to spurious conflicts.

3.0a2
==================

- Drop support for PostgreSQL versions earlier than 9.6. See
:issue:`220`.

- Make MySQL and PostgreSQL use a prepared statement to get
transaction IDs. PostgreSQL also uses a prepared statement to set
them. This can be slightly faster. See :issue:`246`.

- Make PostgreSQL use a prepared statement to move objects to their
final destination during commit (history free only). See
:issue:`246`.

- Fix an issue with persistent caches written to from multiple
instances sometimes getting stale data after a restart. Note: This
makes the persistent cache less useful for objects that rarely
change in a database that features other actively changing objects;
it is hoped this can be addressed in the future. See :issue:`249`.

3.0a1
==================

- Add support for Python 3.7.

- Drop support for Python 3.4.

- Drop support for Python 2.7.8 and earlier.

- Drop support for ZODB 4 and ZEO 4.

- Officially drop support for versions of MySQL before 5.7.9. We haven't
been testing on anything older than that for some time, and older
than 5.6 for some time before that.

- Drop the ``poll_interval`` parameter. It has been deprecated with a
warning and ignored since 2.0.0b2. See :issue:`222`.

- Drop support for pg8000 older than 1.11.0.

- Drop support for MySQL Connector/Python older than 8.0.16. Many
older versions are known to be broken. Note that the C extension,
while available, is not currently recommended due to internal
errors. See :issue:`228`.

- Test support for MySQL Connector/Python on PyPy. See :issue:`228`.

.. caution:: Prior to PyPy 7.2 or RelStorage 3.0a3, it is necessary to disable JIT
            inlining due to `a PyPy bug
            <https://bitbucket.org/pypy/pypy/issues/3014/jit-issue-inlining-structunpack-hh>`_
            with ``struct.unpack``.

- Drop support for PyPy older than 5.3.1.

- Drop support for the "MySQL Connector/Python" driver name since it
wasn't possible to know if it would use the C extension or the
Python implementation. Instead, explicitly use the 'Py' or 'C'
prefixed name. See :pr:`229`.

- Drop the internal and undocumented environment variables that could be
used to force configurations that did not specify a database driver
to use a specific driver. Instead, list the driver in the database
configuration.

- Opening a RelStorage configuration object read from ZConfig more
than once would lose the database driver setting, reverting to
'auto'. It now retains the setting. See :issue:`231`.

- Fix Python 3 with mysqlclient 1.4. See :issue:`213`.

- Drop support for mysqlclient < 1.4.

- Make driver names in RelStorage configurations case-insensitive
(e.g., 'MySQLdb' and 'mysqldb' are both valid). See :issue:`227`.

- Rename the column ``transaction.empty`` to ``transaction.is_empty``
for compatibility with MySQL 8.0, where ``empty`` is now a reserved
word. The migration will happen automatically when a storage is
first opened, unless it is configured not to create the schema.

.. note:: This migration has not been tested for Oracle.

.. note:: You must run this migration *before* attempting to upgrade
         a MySQL 5 database to MySQL 8. If you cannot run the
         upgrade through opening the storage, the statement is
         ``ALTER TABLE transaction CHANGE empty is_empty BOOLEAN
         NOT NULL DEFAULT FALSE``.

- Stop getting a warning about invalid optimizer syntax when packing a
MySQL database (especially with the PyMySQL driver). See
:issue:`163`.

- Add ``gevent MySQLdb``, a new driver that cooperates with gevent
while still using the C extensions of ``mysqlclient`` to communicate
with MySQL. This is now recommended over ``umysqldb``, which is
deprecated and will be removed.

- Rewrite the persistent cache implementation. It now is likely to
produce much higher hit rates (100% on some benchmarks, compared to
1-2% before). It is currently slower to read and write, however.
This is a work in progress. See :pr:`243`.

- Add more aggressive validation and, when possible, corrections for
certain types of cache consistency errors. Previously an
``AssertionError`` would be raised with the message "Detected an
inconsistency between RelStorage and the database...". We now
proactively try harder to avoid that situation based on some
educated guesses about when it could happen, and should it still
happen we now reset the cache and raise a type of ``TransientError``
allowing the application to retry. A few instances where previously
incorrect data could be cached may now raise such a
``TransientError``. See :pr:`245`.

Links

PyPI: https://pypi.org/project/relstorage
Changelog: https://pyup.io/changelogs/relstorage/
Docs: https://relstorage.readthedocs.io/

opened by pyup-bot

Bump relstorage[postgresql] from 2.1.1 to 3.4.2
Bumps relstorage[postgresql] from 2.1.1 to 3.4.2.

Changelog

Sourced from relstorage[postgresql]'s changelog.

3.4.2 (2021-04-21)

Fix write replica selection after a disconnect, and generally further improve handling of unexpectedly closed store connections.

Release the critical section a bit sooner at commit time, when possible. Only affects gevent-based drivers. See 454.

Add support for mysql-connector-python-8.0.24.

Add StatsD counter metrics "relstorage.storage.tpc_vote.unable_to_acquire_lock", "relstorage.storage.tpc_vote.total_conflicts," "relstorage.storage.tpc_vote.readCurrent_conflicts," "relstorage.storage.tpc_vote.committed_conflicts," and "relstorage.storage.tpc_vote.resolved_conflicts". Also add StatsD timer metrics "relstorage.storage.tpc_vote.objects_locked" and "relstorage.storage.tpc_vote.between_vote_and_finish" corresponding to existing log messages. The rate at which these are sampled, as well as the rate at which many method timings are sampled, defaults to 10% (0.1) and can be controlled with the RS_PERF_STATSD_SAMPLE_RATE environment variable. See 453.

3.4.1 (2021-04-12)

RelStorage has moved from Travis CI to GitHub Actions for macOS and Linux tests and manylinux wheel building. See 437.

RelStorage is now tested with PostgreSQL 13.1. See 427.

RelStorage is now tested with PyMySQL 1.0. See 434.

Update the bundled boost C++ library from 1.71 to 1.75.

Improve the way store connections are managed to make it less likely a "stale" store connection that hasn't actually been checked for liveness gets used.

3.4.0 (2020-10-19)

Improve the logging of zodbconvert. The regular minute logging contains more information and takes blob sizes into account, and debug logging is more useful, logging about four times a minute. Some extraneous logging was bumped down to trace.

Fix psycopg2 logging debug-level warnings from the PostgreSQL server on transaction commit about not actually being in a transaction. (Sadly this just squashes the warning, it doesn't eliminate the round trip that generates it.)

Improve the performance of packing databases, especially history-free databases. See 275.

Give zodbpack the ability to check for missing references in RelStorages with the --check-refs-only argument. This will perform a pre-pack with GC, and then report on any objects that would be kept and refer to an object that does not exist. This can be much faster than external scripts such as those provided by zc.zodbdgc, though it definitely only reports missing references one level deep.

This is new functionality. Feedback, as always, is very welcome!

Avoid extra pickling operations of transaction meta data extensions by using the new extension_bytes property introduced in ZODB 5.6. This results in higher-fidelity copies of storages, and may slightly improve the speed of the process too. See 424.

Require ZODB 5.6, up from ZODB 5.5. See 424.

Make zodbconvert much faster (around 5 times faster) when the destination is a history-free RelStorage and the source supports record_iternext() (like RelStorage and FileStorage do). This also applies to the copyTransactionsFrom method. This is disabled with the --incremental option, however. Be sure to read the updated zodbconvert documentation.

3.3.2 (2020-09-21)

Fix an UnboundLocalError in case a store connection could not be opened. This error shadowed the original error opening the connection. See 421.

3.3.1 (2020-09-14)

Manylinux wheels: Do not specify the C++ standard to use when compiling. This seemed to result in an incompatibility with manylinux1 systems that was not caught by auditwheel.

3.3.0 (2020-09-14)

The "MySQLdb" driver didn't properly use server-side cursors when requested. This would result in unexpected increased memory usage for things like packing and storage iteration.

Make RelStorage instances implement IStorageCurrentRecordIteration. This lets both history-preserving and history-free storages work with zodbupdate. See 389.

RelStorage instances now pool their storage connection. Depending on the workload and ZODB configuration, this can result in requiring fewer storage connections. See 409 and 417.

There is a potential semantic change: Under some circumstances, the loadBefore and loadSerial methods could be used to load states from the future (not visible to the storage's load connection) by using the store connection. This ability has been removed.

... (truncated)

Commits

8d3dbf8 Preparing release 3.4.2

7919129 Merge pull request #459 from zodb/issue454

d65ddbc Another timeout to increase on CI.

c4a1b2e Add more metrics.

cc9bf10 Merge pull request #458 from zodb/issue454

0faf5d3 Tweak timeout on CI.

31f903a Add support for mysql-connector-python 8.0.24

073f46f Exit the critical phase sooner.

9c24aaf Merge pull request #456 from zodb/issue454

7fb8890 Sigh. The GHA caching is not reliable; the lint step (cpython 3.9) is restori...

Additional commits viewable in compare view

opened by dependabot-preview[bot]

Update relstorage to 3.4.2

This PR updates RelStorage[postgresql] from 2.1.1 to 3.4.2.

Changelog

3.4.2
==================

- Fix write replica selection after a disconnect, and generally
further improve handling of unexpectedly closed store connections.

- Release the critical section a bit sooner at commit time, when
possible. Only affects gevent-based drivers. See :issue:`454`.

- Add support for mysql-connector-python-8.0.24.

- Add StatsD counter metrics
"relstorage.storage.tpc_vote.unable_to_acquire_lock",
"relstorage.storage.tpc_vote.total_conflicts,"
"relstorage.storage.tpc_vote.readCurrent_conflicts,"
"relstorage.storage.tpc_vote.committed_conflicts," and
"relstorage.storage.tpc_vote.resolved_conflicts". Also add StatsD
timer metrics "relstorage.storage.tpc_vote.objects_locked" and
"relstorage.storage.tpc_vote.between_vote_and_finish" corresponding
to existing log messages. The rate at which these are sampled, as
well as the rate at which many method timings are sampled, defaults
to 10% (0.1) and can be controlled with the
``RS_PERF_STATSD_SAMPLE_RATE`` environment variable. See :issue:`453`.
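Sampling at the default rate of 10% (0.1) means roughly one in ten measurements is sent. This minimal sketch reads the documented environment variable and applies client-side sampling. `sample_rate_from_env` and `should_send` are hypothetical names, and the sketch is not RelStorage's implementation. StatsD clients conventionally tag sampled packets with the rate (for example `name:1|c|@0.1`) so the server can scale counts back up.

```python
import os
import random


def sample_rate_from_env(environ=os.environ) -> float:
    """Read RS_PERF_STATSD_SAMPLE_RATE, falling back to the documented 0.1 default."""
    raw = environ.get("RS_PERF_STATSD_SAMPLE_RATE")
    return float(raw) if raw is not None else 0.1


def should_send(rate: float, rng=random.random) -> bool:
    """Client-side sampling: emit roughly `rate` of all measurements."""
    return rng() < rate
```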

3.4.1
==================

- RelStorage has moved from Travis CI to `GitHub Actions
<https://github.com/zodb/relstorage/actions>`_ for macOS and Linux
tests and manylinux wheel building. See :issue:`437`.
- RelStorage is now tested with PostgreSQL 13.1. See :issue:`427`.
- RelStorage is now tested with PyMySQL 1.0. See :issue:`434`.
- Update the bundled boost C++ library from 1.71 to 1.75.
- Improve the way store connections are managed to make it less likely
a "stale" store connection that hasn't actually been checked for
liveness gets used.

3.4.0
==================

- Improve the logging of ``zodbconvert``. The regular minute logging
contains more information and takes blob sizes into account, and
debug logging is more useful, logging about four times a minute.
Some extraneous logging was bumped down to trace.

- Fix psycopg2 logging debug-level warnings from the PostgreSQL server
on transaction commit about not actually being in a transaction.
(Sadly this just squashes the warning, it doesn't eliminate the
round trip that generates it.)

- Improve the performance of packing databases, especially
history-free databases. See :issue:`275`.

- Give ``zodbpack`` the ability to check for missing references in
RelStorages with the ``--check-refs-only`` argument. This will
perform a pre-pack with GC, and then report on any objects that
would be kept and refer to an object that does not exist. This can
be much faster than external scripts such as those provided by
``zc.zodbdgc``, though it definitely only reports missing references
one level deep.

This is new functionality. Feedback, as always, is very welcome!
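A toy version of such a check first marks objects reachable from the root, which models the garbage-collecting pre-pack. It then reports kept objects that point at OIDs that do not exist. This is a hypothetical sketch over an in-memory reference map, using OID 0 as the root as in ZODB. It does not model how RelStorage extracts references from pickles.

```python
from typing import Dict, List, Sequence, Tuple


def missing_references(
    refs: Dict[int, Sequence[int]], root: int = 0
) -> List[Tuple[int, int]]:
    """Return sorted (referrer, missing_oid) pairs for objects reachable from `root`.

    `refs` maps every existing OID to the OIDs its state refers to.
    """
    existing = set(refs)
    kept, stack = set(), [root]
    while stack:  # depth-first walk: everything reached here survives a GC pack
        oid = stack.pop()
        if oid in kept or oid not in existing:
            continue
        kept.add(oid)
        stack.extend(refs[oid])
    return sorted(
        (oid, target) for oid in kept for target in refs[oid] if target not in existing
    )
```

Unreachable objects are garbage and are skipped, so a dangling reference held only by garbage is not reported.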

- Avoid extra pickling operations of transaction meta data extensions
by using the new ``extension_bytes`` property introduced in ZODB
5.6. This results in higher-fidelity copies of storages, and may
slightly improve the speed of the process too. See :issue:`424`.

- Require ZODB 5.6, up from ZODB 5.5. See :issue:`424`.

- Make ``zodbconvert`` *much faster* (around 5 times faster) when the
destination is a history-free RelStorage and the source supports
``record_iternext()`` (like RelStorage and FileStorage do). This
also applies to the ``copyTransactionsFrom`` method. This is disabled
with the ``--incremental`` option, however. Be sure to read the
updated zodbconvert documentation.

3.3.2
==================

- Fix an ``UnboundLocalError`` in case a store connection could not be
opened. This error shadowed the original error opening the
connection. See :issue:`421`.

3.3.1
==================

- Manylinux wheels: Do not specify the C++ standard to use when
compiling. This seemed to result in an incompatibility with
manylinux1 systems that was not caught by ``auditwheel``.

3.3.0
==================

- The "MySQLdb" driver didn't properly use server-side cursors when
requested. This would result in unexpected increased memory usage
for things like packing and storage iteration.

- Make RelStorage instances implement
``IStorageCurrentRecordIteration``. This lets both
history-preserving and history-free storages work with
``zodbupdate``. See :issue:`389`.

- RelStorage instances now pool their storage connection. Depending on
the workload and ZODB configuration, this can result in requiring
fewer storage connections. See :issue:`409` and :pr:`417`.

There is a potential semantic change: Under some circumstances, the
``loadBefore`` and ``loadSerial`` methods could be used to load
states from the future (not visible to the storage's load
connection) by using the store connection. This ability has been
removed.

- Add support for Python 3.9.

- Drop support for Python 3.5.

- Build manylinux x86-64 and macOS wheels on Travis CI as part of the
release process. These join the Windows wheels in being
automatically uploaded to PyPI.

3.2.1
==================

- Improve the speed of loading large cache files by reducing the cost
of cache validation.

- The timing metrics for ``current_object_oids`` are always collected,
not just sampled. MySQL and PostgreSQL will only call this method
once at startup during persistent cache validation. Other databases
may call this method once during the commit process.

- Add the ability to limit how long persistent cache validation will
spend polling the database for invalid OIDs. Set the environment
variable ``RS_CACHE_POLL_TIMEOUT`` to a number of seconds before
importing RelStorage to use this.
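
A minimal sketch of setting the timeout from Python, assuming (as the entry above says) that the variable only needs to be present in the process environment before the first ``relstorage`` import. The value ``30`` is an arbitrary example, not a recommendation.

.. code-block:: python

    import os

    # RelStorage reads this variable when it is imported, so set it first.
    # 30 seconds is an arbitrary example value.
    os.environ["RS_CACHE_POLL_TIMEOUT"] = "30"

    try:
        import relstorage  # noqa: F401  (imported only after the variable is set)
    except ImportError:
        relstorage = None  # RelStorage is not installed in this environment

Exporting the variable in the shell that launches the process (for example ``RS_CACHE_POLL_TIMEOUT=30 python app.py``, where ``app.py`` is a placeholder) should have the same effect.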

- Avoid an ``AttributeError`` if a persistent ``zope.component`` site
manager is installed as the current site, it's a ghost, and we're
making a load query for the first time in a particular connection.
See :issue:`411`.

- Add some DEBUG level logging around forced invalidations of
persistent object caches due to exceeding the cache MVCC limits. See
:issue:`338`.

3.2.0
==================

- Make the ``gevent psycopg2`` driver support critical sections. This
reduces the number of gevent switches that occur while database
locks are held under a carefully chosen set of circumstances that
attempt to balance overall throughput against latency. See
:issue:`407`.

- Source distributions: Fix installation when Cython isn't available.
Previously it incorrectly assumed a '.c' extension, which led to
compiler errors. See :issue:`405`.

- Improve various log messages.

3.1.2
==================

- Fix the psycopg2cffi driver inadvertently depending on the
``psycopg2`` package. See :issue:`403`.
- Make the error messages for unavailable drivers include more
information on underlying causes.
- Log a debug message when an "auto" driver is successfully resolved.
- Add a ``--debug`` argument to the ``zodbconvert`` command line tool
to enable DEBUG level logging.
- Add support for pg8000 1.16. Previously, a ``TypeError`` was raised.

3.1.1
==================

- Add support for pg8000 >= 1.15.3. Previously, a ``TypeError`` was
raised.

- SQLite: Committing a transaction releases some resources sooner.
This makes it more likely that auto-checkpointing of WAL files will be
able to reclaim space in some scenarios. See :issue:`401`.

3.1.0
==================

- Use unsigned BTrees for internal data structures to avoid wrapping
in large databases. Requires BTrees 4.7.2.
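
To illustrate the kind of wrapping this avoids: a value held in a signed 64-bit slot tops out at ``2**63 - 1``, and the next value reinterprets as a large negative number. The sketch uses ``struct`` to show that reinterpretation; it is a generic illustration and does not reproduce RelStorage's internal data structures.

.. code-block:: python

    import struct


    def as_signed64(value: int) -> int:
        """Reinterpret an unsigned 64-bit integer as a signed two's-complement one."""
        return struct.unpack(">q", struct.pack(">Q", value))[0]


    largest_signed = 2**63 - 1
    print(as_signed64(largest_signed))      # 9223372036854775807
    print(as_signed64(largest_signed + 1))  # -9223372036854775808

An unsigned key type keeps the full ``0`` to ``2**64 - 1`` range in order, so large values sort after small ones instead of wrapping below zero.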

3.0.1
==================

- Oracle: Fix an AttributeError saving to Oracle. See :pr:`380` by Mauro
Amico.

- MySQL+gevent: Release the critical section a bit sooner. See :issue:`381`.

- SQLite+gevent: Fix possible deadlocks with gevent if switches
occurred at unexpected times. See :issue:`382`.

- MySQL+gevent: Fix possible deadlocks with gevent if switches
occurred at unexpected times. See :issue:`385`.  This also included
some minor optimizations.

.. caution::

  This introduces a change in a stored procedure that is not
  compatible with older versions of RelStorage. When this version
  is first deployed, if there are older versions of RelStorage
  still running, they will be unable to commit. They will fail with
  a transient conflict error; they may attempt retries, but will not
  succeed. Read-only transactions will continue to work.

3.0.0
==================

- Build binary wheels for Python 3.8 on Windows.

3.0rc1
===================

- SQLite: Avoid logging (at DEBUG level) an error executing ``PRAGMA
OPTIMIZE`` when closing a read-only (load) connection. Now, the
error is avoided by making the connection writable.

- PostgreSQL: Reduce the load connection's isolation level from
``SERIALIZABLE`` to ``REPEATABLE READ`` (two of the three other
supported databases also operate at this level). This allows
connecting to hot standby/streaming replicas. Since the connection
is read-only, and there were no other ``SERIALIZABLE`` transactions
(the store connection operates in ``READ COMMITTED`` mode), there
should be no other visible effects. See :issue:`376`.

- PostgreSQL: pg8000: Properly handle a ``port`` specification in the
``dsn`` configuration. See :issue:`378`.

- PostgreSQL: All drivers pass the ``application_name`` parameter at
connect time instead of later. This solves an issue with psycopg2
and psycopg2cffi connecting to hot standbys.

- All databases: If ``create-schema`` is false, use a read-only
connection to verify that the schema is correct.

- Packaging: Prune unused headers from the include/ directory.

3.0b3
==================

- SQLite: Fix a bug that could lead to invalid OIDs being allocated if
transactions were imported from another storage.

3.0b2
==================

- SQLite: Require the database to be in a dedicated directory.

.. caution::

  This introduces a change to the <sqlite3> configuration.
  Please review the documentation. It is possible to migrate a
  database created earlier to the new structure, but no automated
  tooling or documentation is provided for that.

- SQLite: Allow configuration of many of SQLite's PRAGMAs for advanced
tuning.

- SQLite: Fix resetting OIDs when zapping a storage. This could be a
problem for benchmarks.

- SQLite: Fix large prefetches resulting in ``OperationalError``.

- SQLite: Improve the speed of copying transactions into a SQLite
storage (e.g., with zodbconvert).

- SQLite: Substantially improve general performance. See :pr:`368`.

- SQLite: Add the ``gevent sqlite3`` driver that periodically yields
to the gevent loop at configurable intervals.
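
How the driver yields is not described here. One mechanism the standard library offers is ``sqlite3.Connection.set_progress_handler``, which calls a function roughly every *N* SQLite virtual machine instructions. The sketch assumes that approach and records each call in a list where a gevent-based driver might call ``gevent.sleep(0)``; ``make_yielding_handler`` is a hypothetical name, and this is not RelStorage's implementation.

.. code-block:: python

    import sqlite3


    def make_yielding_handler(yield_fn):
        """Build a progress handler that calls yield_fn and lets the statement continue."""
        def handler():
            yield_fn()  # a gevent-based driver might call gevent.sleep(0) here
            return 0    # a non-zero return value would abort the running statement
        return handler


    calls = []
    conn = sqlite3.connect(":memory:")
    # Ask SQLite to invoke the handler about every 1000 virtual machine instructions.
    conn.set_progress_handler(make_yielding_handler(lambda: calls.append(1)), 1000)
    conn.execute("CREATE TABLE t (x INTEGER)")
    conn.executemany("INSERT INTO t VALUES (?)", ((i,) for i in range(10_000)))
    total = conn.execute("SELECT SUM(x) FROM t").fetchone()[0]
    print(total, len(calls) > 0)

The interval trades responsiveness of the event loop against the overhead of switching; a smaller number yields more often.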

- PostgreSQL: Improve the speed of writes when using the
``gevent psycopg2`` driver.

3.0b1
==================

- Make SQLite and Oracle both use UPSERT queries instead of multiple
database round trips.

- Fix an exception with large transactions on SQLite.

- Fix compiling the C extension on very new versions of Microsoft
Visual Studio.

3.0a13
===================

- Further speed improvements and memory efficiency gains of around 30%
for the cache.

- Restore support for Python 2.7 on Windows.

- No longer require Cython to build from a sdist (.tar.gz).

- Add support for using a SQLite file as a RelStorage backend, if all
processes accessing it will be on a single machine. The advantage
over FileStorage is that multiple processes can use the database
concurrently. To allow multiple processes to use a FileStorage one
must deploy ZEO, even if all processes are on a single machine. See
:pr:`362`.

- Fix and test Oracle. The minimum required cx_oracle is now 6.0.

- Add support for Python 3.8.

3.0a12
===================

- Add the ``gevent psycopg2`` driver to allow using the fast psycopg2
driver with gevent.

- Conflict resolution prefetches data for conflicted objects, reducing
the number of database queries and locks needed.

- Introduce a driver-agnostic method for elevating database connection
priority during critical times of two-phase commit, and implement it
for the ``gevent MySQLdb`` driver. This reduces the number of gevent
switches that occur while database locks are held under a carefully
chosen set of circumstances that attempt to balance overall
throughput against latency. See :issue:`339`.

- Drop support for Python 2.7 on Windows. The required compiler is
very old. See :issue:`358`.

- Substantially reduce the overhead of the cache, making it more
memory efficient. Also make it substantially faster. This was done
by rewriting it in C. See :issue:`358`.

3.0a11
===================

- Make ``poll_invalidations`` handle other retryable internal
exceptions besides just ``ReadConflictError`` so they don't
propagate out to ``transaction.begin()``.

- Make the zodburi resolver entry points not require a specific
RelStorage extra such as 'postgres', in case there is a desire to
use a different database driver than the default that's installed
with that extra. See :issue:`342`, reported by Éloi Rivard.

- Make the zodburi resolvers accept the 'driver' query parameter to
allow selecting a specific driver to use. This functions the same as
in a ZConfig configuration.

- Make the zodburi resolvers more strict on the distinction between
boolean arguments and arbitrary integer arguments. Previously, a
query like ``?read_only=12345&cache_local_mb=yes`` would have been
interpreted as ``True`` and ``1``, respectively. Now it produces errors.
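
A hypothetical sketch of the stricter parsing, assuming each query parameter is declared as either boolean or integer. The ``BOOL_ARGS`` and ``INT_ARGS`` tables, the accepted boolean words, and ``parse_storage_query`` are illustrative names, not RelStorage's resolver code.

.. code-block:: python

    from urllib.parse import parse_qsl

    # Hypothetical declarations for illustration; not RelStorage's actual tables.
    BOOL_ARGS = {"read_only"}
    INT_ARGS = {"cache_local_mb"}
    TRUE_WORDS = {"1", "true", "yes", "on"}
    FALSE_WORDS = {"0", "false", "no", "off"}


    def parse_storage_query(query):
        """Convert query-string values, refusing to blur booleans and integers."""
        result = {}
        for key, raw in parse_qsl(query, strict_parsing=True):
            if key in BOOL_ARGS:
                word = raw.lower()
                if word in TRUE_WORDS:
                    result[key] = True
                elif word in FALSE_WORDS:
                    result[key] = False
                else:
                    raise ValueError(f"{key} expects a boolean, got {raw!r}")
            elif key in INT_ARGS:
                try:
                    result[key] = int(raw)
                except ValueError:
                    raise ValueError(f"{key} expects an integer, got {raw!r}") from None
            else:
                result[key] = raw  # other parameters are passed through unchanged
        return result

``parse_storage_query("read_only=true&cache_local_mb=10")`` returns ``{'read_only': True, 'cache_local_mb': 10}``, while the query quoted in the entry above raises ``ValueError``.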

- Fix the calculation of the persistent cache size, especially on
Python 2. This is used to determine when to shrink the disk cache.
See :issue:`317`.

- Fix several race conditions when packing history-free storages
through a combination of changes in ordering and more strongly
consistent (``READ ONLY REPEATABLE READ``) transactions.
Reported in :issue:`325` by krissik with initial PR by Andreas
Gabriel.

- Make ``zodbpack`` pass RelStorage specific options like
``--prepack`` and ``--use-prepack-state`` to the RelStorage, even
when it has been wrapped in a ``zc.zlibstorage``.

- Reduce the amount of memory required to pack a RelStorage through
more careful data structure choices. On CPython 3, the peak
memory usage of the prepack phase can be up to 9 times less. On
CPython 2, pre-packing a 30MM row storage required 3GB memory; now
it requires about 200MB.

- Use server-side cursors during packing when available, further
reducing the amount of memory required. See :issue:`165`.

- Make history-free database iterators from the same storage use a
consistent view of the database (until a transaction is committed
using the storage or ``sync()`` is called). This prevents data loss
in some cases. See :issue:`344`.

- Make copying transactions *from* a history-free RelStorage (e.g., with
``zodbconvert``) require substantially less memory (75% less).

- Make copying transactions *to* a RelStorage clean up temporary blob
files.

- Make ``zodbconvert`` log progress at intervals instead of for every
transaction. Logging every transaction could add significant overhead
unless stdout was redirected to a file.
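
A sketch of interval-based progress logging, assuming a copy loop that can consult a clock after each transaction. ``copy_with_progress``, ``copy_one`` and the logger name are hypothetical; this is not the ``zodbconvert`` source. Injecting ``clock`` makes the behaviour easy to check with a fake timer.

.. code-block:: python

    import itertools
    import logging
    import time

    log = logging.getLogger("copy_progress_sketch")  # hypothetical logger name


    def copy_with_progress(transactions, copy_one, interval=10.0, clock=time.monotonic):
        """Copy every transaction, logging progress at most once per interval seconds."""
        count = 0
        last_logged = clock()
        for txn in transactions:
            copy_one(txn)
            count += 1
            now = clock()
            if now - last_logged >= interval:
                log.info("Copied %d transactions so far", count)
                last_logged = now
        log.info("Copied %d transactions in total", count)
        return count


    ticks = itertools.count()  # fake clock that advances one unit per call
    copied = []
    copy_with_progress(range(25), copied.append, interval=10, clock=lambda: next(ticks))

With this fake clock, 25 copies at ``interval=10`` issue ``log.info`` calls after the 10th and 20th transactions and once more at the end, instead of 25 times.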

- Avoid attempting to lock objects being created. See :issue:`329`.

- Make cache vacuuming faster.

3.0a10
===================

- Fix a bug where the persistent cache might not properly detect
object invalidations if the MVCC index pulled too far ahead at save
time. Now it explicitly checks for invalidations at load time, as
earlier versions did. See :pr:`343`.

- Require perfmetrics 3.0.

3.0a9
==================

- Several minor logging improvements.

- Allow many internal constants to be set with environment variables
at startup for experimentation. These are presently undocumented; if
they prove useful to adjust in different environments they may be
promoted to full configuration options.

- Fix importing RelStorage when ``zope.schema`` is not installed.
``zope.schema`` is intended to be a test dependency and optional for
production deployments. Reported in :issue:`334` by Jonathan Lung.

- Make the gevent MySQL driver more efficient at avoiding needless waits.

- Due to a bug in MySQL (incorrectly rounding the 'minute' value of a
timestamp up), TIDs generated in the last half second of a minute
would suddenly jump ahead by 4,266,903,756 integers (a full minute).

- Fix leaking an internal value for ``innodb_lock_timeout`` across
commits on MySQL. This could lead to ``tpc_vote`` blocking longer
than desired. See :issue:`331`.

- Fix ``undo`` to purge the objects whose transaction was revoked from
the cache.

- Make historical storages read-only, raising
``ReadOnlyHistoryError``, during the commit process. Previously this
was only enforced at the ``Connection`` level.

- Rewrite the cache to understand the MVCC nature of the connections
that use it.

This eliminates the use of "checkpoints." Checkpoints established a
sort of index for objects to allow them to be found in the cache
without necessarily knowing their ``_p_serial`` value. To achieve
good hit rates in large databases, large values for the
``cache-delta-size-limit`` were needed, but if there were lots of
writes, polling to update those large checkpoints could become very
expensive. Because checkpoints were separate in each ZODB connection
in a process, and because when one connection changed its
checkpoints every other connection would also change its checkpoints
on the next access, this could quickly become a problem in highly
concurrent environments (many connections making many large database
queries at the same time). See :issue:`311`.

The new system uses a series of chained maps representing polling
points to build the same index data. All connections can share all
the maps for their view of the database and earlier. New polls add
new maps to the front of the list as needed, and old maps are
removed once they are no longer needed by any active transaction.
This simulates the underlying database's MVCC approach.

Other benefits of this approach include:

- No more large polls. While each connection still polls for each
 transaction it enters, they now share state and only poll against
 the last time a poll occurred, not the last time they were used.
 The result should be smaller, more predictable polling.

- Having a model of object visibility allows the cache to use more
 efficient data structures: it can now use the smaller LOBTree to
 reduce the memory occupied by the cache. It also requires
 fewer cache entries overall to store multiple revisions of an
 object, reducing the overhead. And there are no more key copies
 required after a checkpoint change, again reducing overhead and
 making the LRU algorithm more efficient.

- The cache's LRU algorithm is now at the object level, not the
 object/serial pair.

- Objects that are known to have been changed but whose old revision
 is still in the cache are preemptively removed when no references
 to them are possible, reducing cache memory usage.

- The persistent cache can now guarantee not to write out data that
 it knows to be stale.

Dropping checkpoints probably makes memcache less effective, but
memcache hasn't been recommended for a while.
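
A toy model of the chained-maps idea from the cache rewrite above, built on ``collections.ChainMap``: each poll contributes one map of changed OIDs, newer maps are consulted first, and a transaction's view is the chain of maps up to its own poll point. ``PollChain`` and its methods are hypothetical. Its ``prune`` collapses the maps every active transaction can already see into one base map instead of discarding them, a simplification that keeps the toy index complete; RelStorage's C implementation is not reproduced here.

.. code-block:: python

    from collections import ChainMap


    class PollChain:
        """Toy model: one {oid: tid} map per poll, newest first, shared by all readers."""

        def __init__(self):
            self._polls = []  # [(poll_tid, {oid: changed_tid}), ...], newest first

        def add_poll(self, poll_tid, changes):
            """Record the OIDs changed since the previous poll, up to poll_tid."""
            self._polls.insert(0, (poll_tid, dict(changes)))

        def view(self, as_of_tid):
            """Latest visible change per OID for a transaction that polled at as_of_tid."""
            return ChainMap(*[m for tid, m in self._polls if tid <= as_of_tid])

        def prune(self, oldest_active_tid):
            """Collapse the maps every active transaction can see into one base map."""
            newer = [(t, m) for t, m in self._polls if t > oldest_active_tid]
            shared = [(t, m) for t, m in self._polls if t <= oldest_active_tid]
            if len(shared) > 1:
                merged = dict(ChainMap(*[m for _, m in shared]))  # newest entry wins
                shared = [(shared[0][0], merged)]
            self._polls = newer + shared
            return len(self._polls)


    chain = PollChain()
    chain.add_poll(10, {1: 10, 2: 10})
    chain.add_poll(20, {2: 20})
    chain.add_poll(30, {3: 30})
    older_view = chain.view(20)  # a transaction that has not seen poll 30 yet
    print(older_view[2], 3 in older_view)  # 20 False

Because the maps are shared rather than copied, a new poll only adds the changes since the previous poll, which mirrors the "smaller, more predictable polling" benefit listed above.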

3.0a8
==================

- Improve the safety of the persistent local cache in high-concurrency
environments using older versions of SQLite. Perform a quick
integrity check on startup and refuse to use the cache files if they
are reported corrupt.

- Switch the order in which object locks are taken: try shared locks
first and only then attempt exclusive locks. Shared locks do not
have to block, so a quick lock timeout here means that a
``ReadConflictError`` is inevitable. This works best on PostgreSQL
and MySQL 8, which support true non-blocking locks. On MySQL 5.7,
non-blocking locks are emulated with a 1s timeout. See :issue:`310`.

.. note:: The transaction machinery will retry read conflict errors
         by default. The more rapid detection of them may lead to
         extra retries if there was a process still finishing its
         commit. Consider adding small sleep backoffs to retry
         logic.
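
A minimal retry helper with small, jittered exponential sleeps, as the note suggests. ``ReadConflictError`` here is a local stand-in so the sketch runs on its own; real code would catch ``ZODB.POSException.ReadConflictError`` (the ``transaction`` package's ``attempts()`` helper is another option). ``run_with_retries``, the attempt count and the delays are hypothetical example values.

.. code-block:: python

    import random
    import time


    class ReadConflictError(Exception):
        """Local stand-in for ZODB.POSException.ReadConflictError."""


    def run_with_retries(work, attempts=3, base_delay=0.05, sleep=time.sleep):
        """Call work(); after a read conflict, back off briefly and try again."""
        for attempt in range(attempts):
            try:
                return work()
            except ReadConflictError:
                if attempt == attempts - 1:
                    raise  # out of attempts: let the caller see the conflict
                # Base delays of 0.05 s, 0.1 s, 0.2 s, ..., each scaled by a random
                # factor between 0.5 and 1.5 so competing retries spread out.
                sleep(base_delay * 2**attempt * random.uniform(0.5, 1.5))

The jitter keeps several processes that hit the same conflict from retrying in lockstep against a commit that is still finishing.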

- Fix MySQL to immediately rollback its transaction when it gets a
lock timeout, while still in the stored procedure on the database.
Previously it would have required a round trip to the Python
process, which could take an arbitrary amount of time while the
transaction may have still been holding some locks. (After
:issue:`310` they would only be shared locks, but before they would
have been exclusive locks.) This should make for faster recovery in
heavily loaded environments with lots of conflicts. See :issue:`313`.

- Make MySQL clear its temp tables using a single round trip.
Truncation is optional and disabled by default. See :issue:`319`.

- Fix PostgreSQL to not send the definition of the temporary tables
for every transaction. This is only necessary for the first
transaction.

- Improve handling of commit and rollback, especially on PostgreSQL.
We now generate many fewer unneeded rollbacks. See :issue:`289`.

- Stop checking the status of ``readCurrent`` OIDs twice.

- Make the gevent MySQL driver yield more frequently while getting
large result sets. Previously it would block in C to read the entire
result set. Now it yields according to the cursor's ``arraysize``.
See :issue:`315`.

- Polling for changes now iterates the cursor instead of using
``fetchall()``. This can reduce memory usage and provide better
behaviour in a concurrent environment, depending on the cursor
implementation.
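
Both entries above describe consuming query results incrementally. A stdlib sketch of the batching pattern with ``sqlite3``: ``fetchmany()`` with no argument returns up to ``cursor.arraysize`` rows, and the hypothetical ``between_batches`` hook marks where a cooperative (for example gevent-based) driver could yield. The ``changes`` table is a placeholder; this is not RelStorage's polling query.

.. code-block:: python

    import sqlite3


    def iter_rows(cursor, between_batches=lambda: None):
        """Yield result rows in cursor.arraysize batches instead of one fetchall()."""
        while True:
            batch = cursor.fetchmany()  # returns up to cursor.arraysize rows
            if not batch:
                return
            between_batches()  # a cooperative driver could yield to its event loop here
            yield from batch


    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE changes (zoid INTEGER PRIMARY KEY, tid INTEGER)")
    conn.executemany("INSERT INTO changes VALUES (?, ?)", [(i, 100 + i) for i in range(10)])
    cur = conn.cursor()
    cur.arraysize = 4
    cur.execute("SELECT zoid, tid FROM changes ORDER BY zoid")
    batches = []
    rows = list(iter_rows(cur, between_batches=lambda: batches.append(1)))
    print(len(rows), len(batches))  # 10 3

Only one batch of rows is held by the generator at a time, so peak memory depends on ``arraysize`` rather than on the size of the result set.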

- Add three environment variables to control the odds of whether any
given poll actually suggests shifted checkpoints. These are all
floating point numbers between 0 and 1. They are
``RELSTORAGE_CP_REPLACEMENT_CHANCE_WHEN_FULL`` (default 0.7,
i.e., 70%), ``RELSTORAGE_CP_REPLACEMENT_BEGIN_CONSIDERING_PERCENT``
(default 0.8) and ``RELSTORAGE_CP_REPLACEMENT_CHANCE_WHEN_CLOSE``
(default 0.2). (There are corresponding class variables on the
storage cache that could also be set.) Use values of ``1``, ``1``
and ``0`` to restore the old completely deterministic behaviour.
It's not clear whether these will be useful, so they are not
officially options yet but they may become so. Feedback is
appreciated! See :issue:`323`.

.. note::

  These were removed in 3.0a9.

3.0a7
==================

- Eliminate runtime dependency on ZEO. See :issue:`293`.

- Fix a rare race condition allocating OIDs on MySQL. See
:issue:`283`.

- Optimize the ``loadBefore`` method. It appears to be mostly used in
the tests.

- Fix the blob cache cleanup thread to use a real native thread if
we're monkey-patched by gevent, using gevent's thread pool.
Previously, cleaning up the blob cache would block the event loop
for the duration. See :issue:`296`.

- Improve the thread safety and resource usage of blob cache cleanup.
Previously it could spawn many useless threads.

- When caching a newly uploaded blob for a history free storage, if
there's an older revision of the blob in the cache, and it is not in
use, go ahead and preemptively remove it from disk. This can help
prevent the cache size from growing out of hand and limit the number
of expensive full cache checks required. See :issue:`297`.

- Change the default value of the configuration setting
``shared-blob-dir`` to false, meaning that the default is now to use
a blob cache. If you were using shared blobs before, you'll need to
explicitly set a value for ``shared-blob-dir`` to ``true`` before
starting RelStorage.

- Add an option, ``blob-cache-size-check-external``, that causes the
blob cache cleanup process to run in a subprocess instead of a
thread. This can free up the storage process to handle requests.
This is not recommended on Windows. (``python -m
relstorage.blobhelper.cached /path/to/cache size_in_bytes`` can be
used to run a manual cleanup at any time. This is currently an
internal implementation detail.)

- Abort storage transactions immediately when an exception occurs.
Previously this could be specified by setting the environment
variable ``RELSTORAGE_ABORT_EARLY``. Aborting early releases
database locks to allow other transactions to make progress
immediately. See :issue:`50`.

- Reduce the strength of locks taken by ``Connection.readCurrent`` so
that they don't conflict with other connections that just want to
verify they haven't changed. This also lets us immediately detect a
conflict error with an in-progress transaction that is trying to
alter those objects. See :issue:`302`.

- Make databases that use row-level locks (MySQL and PostgreSQL) raise
specific exceptions on failures to acquire those locks. A different
exception is raised for rows a transaction needs to modify compared
to rows it only needs to read. Both are considered transient to
encourage transaction middleware to retry. See :issue:`303`.

- Move more of the vote phase of transaction commit into a database
stored procedure on MySQL and PostgreSQL, beginning with taking the
row-level locks. This eliminates several more database round trips
and the need for the Python thread (or greenlet) to repeatedly
release and then acquire the GIL while holding global locks. See
:issue:`304`.

- Make conflict resolution require fewer database round trips,
especially on PostgreSQL and MySQL, at the expense of using more
memory. In the ideal case it now only needs one (MySQL) or two
(PostgreSQL) queries. Previously it needed at least twice the number
of trips as there were conflicting objects. On both databases, the
benchmarks are 40% to 80% faster (depending on cache configuration).

3.0a6
==================

Enhancements
------------

- Eliminate a few extra round trips to the database on transaction
completion: One extra ``ROLLBACK`` in all databases, and one query
against the ``transaction`` table in history-preserving databases.
See :issue:`159`.

- Prepare more statements used during regular polling.

- Gracefully handle certain disconnected exceptions when rolling back
connections in between transactions. See :issue:`280`.

- Fix a cache error ("TypeError: NoneType object is not
subscriptable") when an object had been deleted (such as through
undoing its creation transaction, or with ``multi-zodb-gc``).

- Implement ``IExternalGC`` for history-preserving databases. This
lets them be used with `zc.zodbdgc
<https://pypi.org/project/zc.zodbdgc/>`_, allowing for
multi-database garbage collection (see :issue:`76`). Note that you
must pack the database after running ``multi-zodb-gc`` in order to
reclaim space.

.. caution::

  It is critical that ``pack-gc`` be turned off (set to false) in a
  multi-database and that only ``multi-zodb-gc`` be used to perform
  garbage collection.

Packing
~~~~~~~

- Make ``RelStorage.pack()`` also accept a TID from the RelStorage
database to pack to. The usual Unix timestamp form for choosing a
pack time can be ambiguous in the event of multiple transactions
within a very short period of time. This is mostly a concern for
automated tests.

Similarly, it will accept a value less than 0 to mean the most
recent transaction in the database. This is useful when machine
clocks may not be well synchronized, or from automated tests.

Implementation
--------------

- Remove vestigial top-level thread locks. No instance of RelStorage
is thread safe.

RelStorage is an ``IMVCCStorage``, which means that each ZODB
``Connection`` gets its own new storage object. No visible storage
state is shared among Connections. Connections are explicitly
documented as not being thread safe. Since 2.0, RelStorage's
Connection instances have taken advantage of that fact to be a
little lighter weight through not being thread safe. However, they
still paid the overhead of locking method calls and code complexity.

The top-level storage (the one belonging to a ``ZODB.DB``) still
used heavyweight locks in earlier releases. ``ZODB.DB.storage`` is
documented as being only useful for tests, and the ``DB`` object
itself does not expose any operations that use the storage in a way
that would require thread safety.

The remaining thread safety support has been removed. This
simplifies the code and reduces overhead.

If you were previously using the ``ZODB.DB.storage`` object, or a
``RelStorage`` instance you constructed manually, from multiple
threads, instead make sure each thread has a distinct
``RelStorage.new_instance()`` object.
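
A sketch of giving each thread its own storage instance, assuming ``root_storage`` is any object with a ``new_instance()`` method, as ZODB's ``IMVCCStorage`` interface defines. ``storage_for_this_thread`` and ``FakeStorage`` are hypothetical names used only so the example runs without a database.

.. code-block:: python

    import threading

    _local = threading.local()


    def storage_for_this_thread(root_storage):
        """Return this thread's instance of root_storage, creating it on first use."""
        instances = getattr(_local, "instances", None)
        if instances is None:
            instances = _local.instances = {}
        # Keyed by id(); assumes root storages live for the whole process.
        key = id(root_storage)
        if key not in instances:
            instances[key] = root_storage.new_instance()
        return instances[key]


    class FakeStorage:
        """Hypothetical stand-in; a real RelStorage would be used instead."""

        def new_instance(self):
            return object()

Calling ``storage_for_this_thread(root)`` twice in one thread returns the same instance, while another thread gets its own, so no instance is ever shared across threads.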

- A ``RelStorage`` instance now only implements the appropriate subset
of ZODB storage interfaces according to its configuration. For
example, if there is no configured ``blob-dir``, it won't implement
``IBlobStorage``, and if ``keep-history`` is false, it won't
implement ``IStorageUndoable``.

- Refactor RelStorage internals for a cleaner separation of concerns.
This includes how (some) queries are written and managed, making it
easier to prepare statements, but only those actually used.


MySQL
-----

- On MySQL, move allocating a TID into the database. On benchmarks
of a local machine this can be a scant few percent faster, but it's
primarily intended to reduce the number of round-trips to the
database. This is a step towards :issue:`281`. See :pr:`286`.

- On MySQL, set the connection timezone to be UTC. This is necessary
to get values consistent between ``UTC_TIMESTAMP``,
``UNIX_TIMESTAMP``, ``FROM_UNIXTIME``, and Python's ``time.gmtime``,
as used for comparing TIDs.

- On MySQL, move most steps of finishing a transaction into a stored
procedure. Together with the TID allocation changes, this reduces
the number of database queries from::

    1 to lock
  + 1 to get TID
  + 1 to store transaction (0 in history free)
  + 1 to move states
  + 1 for blobs (2 in history free)
  + 1 to set current (0 in history free)
  + 1 to commit
  = 7 or 6 (in history free)

down to 1. This is expected to be especially helpful for gevent
deployments, as the database lock is held, the transaction finalized
and committed, and the database lock released, all without involving
greenlets or greenlet switches. By allowing the GIL to be released
longer it may also be helpful for threaded environments. See
:issue:`281` and :pr:`287` for benchmarks and specifics.

.. caution::

 MySQL 5.7.18 and earlier contain a severe bug that causes the
 server to crash when the stored procedure is executed.


- Make PyMySQL use the same precision as mysqlclient when sending
floating point parameters.

- Automatically detect when MySQL stored procedures in the database
are out of date with the current source in this package and replace
them.

PostgreSQL
----------

- As for MySQL, move allocating a TID into the database.

- As for MySQL, move most steps of finishing a transaction into a
stored procedure. On psycopg2 and psycopg2cffi this is done in a
single database call. With pg8000, however, it still takes two, with
the second call being the COMMIT call that releases locks.

- Speed up getting the approximate number of objects
(``len(storage)``) in a database by using the estimates collected by
the autovacuum process or analyzing tables, instead of asking for a
full table scan.

3.0a5
==================

- Reduce the time that MySQL will wait to perform OID garbage
collection on startup. See :issue:`271`.

- Fix several instances where RelStorage could attempt to perform
operations on a database connection with outstanding results on a
cursor. Some database drivers can react badly to this, depending on
the exact circumstances. For example, mysqlclient can raise
``ProgrammingError: (2014, "Commands out of sync; you can't run this
command now")``. See :issue:`270`.

- Fix the "gevent MySQLdb" driver to be cooperative during ``commit``
and ``rollback`` operations. Previously, it would block the event
loop for the entire time it took to send the commit or rollback
request, the server to perform the request, and the result to be
returned. Now, it frees the event loop after sending the request.
See :issue:`272`.

- Call ``set_min_oid`` less often if a storage is just updating
existing objects, not creating its own.

- Fix an occasional possible deadlock in MySQL's ``set_min_oid``. See
:pr:`276`.

3.0a4
==================

- Add support for the ZODB 5 ``connection.prefetch(*args)`` API. This
takes either OIDs (``obj._p_oid``) or persistent ghost objects, or
an iterator of those things, and asks the storage to load them into
its cache for use in the future. In RelStorage, this uses the shared
cache and so may be useful for more than one thread. This can be
3x or more faster than loading objects on-demand. See :issue:`239`.
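
The argument shapes accepted by ``prefetch`` can be illustrated with a small normalizer that turns OIDs (8-byte ``bytes`` values), objects carrying ``_p_oid`` (such as ghosts), or nested iterables of those into a flat OID list. ``collect_prefetch_oids`` is a hypothetical helper, not ZODB's code; application code would simply call ``connection.prefetch(...)`` on an open ZODB connection.

.. code-block:: python

    from collections.abc import Iterable


    def collect_prefetch_oids(*args):
        """Flatten prefetch-style arguments into a list of OIDs.

        Each argument may be an OID (bytes), an object with a ``_p_oid``
        attribute (such as a persistent ghost), or an iterable of those.
        """
        oids = []
        for arg in args:
            if isinstance(arg, bytes):
                oids.append(arg)
            elif hasattr(arg, "_p_oid"):
                oids.append(arg._p_oid)
            elif isinstance(arg, Iterable) and not isinstance(arg, str):
                oids.extend(collect_prefetch_oids(*arg))
            else:
                raise TypeError(f"cannot prefetch {arg!r}")
        return oids

Checking for ``_p_oid`` before treating an argument as iterable means a persistent container is prefetched as one object rather than having its contents walked.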

- Stop chunking blob uploads on PostgreSQL. All supported PostgreSQL
versions natively handle blobs greater than 2GB in size, and the
server was already chunking the blobs for storage, so our layer of
extra chunking has become unnecessary.

.. important::

  The first time a storage is opened with this version,
  blobs that have multiple chunks will be collapsed into a single
  chunk. If there are many blobs larger than 2GB, this could take
  some time.

  It is recommended you have a backup before installing this
  version.

  To verify that the blobs were correctly migrated, you should
  clean or remove your configured blob-cache directory, forcing new
  blobs to be downloaded.

- Fix a bug that left large objects behind if a PostgreSQL database
containing any blobs was ever zapped (with ``storage.zap_all()``).
The ``zodbconvert`` command, the ``zodbshootout`` command, and the
RelStorage test suite could all zap databases. Running the
``vacuumlo`` command included with PostgreSQL will free such
orphaned large objects, after which a regular ``vacuumdb`` command
can be used to reclaim space. See :issue:`260`.

- Conflict resolution can use data from the cache, thus potentially
eliminating a database hit during a very time-sensitive process.
Please file issues if you encounter any strange behaviour when
concurrently packing to the present time and also resolving
conflicts, in case there are corner cases.

- Packing a storage now invalidates the cached values that were packed
away. For the global caches this helps reduce memory pressure; for
the local cache this helps reduce memory pressure and ensure a more
useful persistent cache (this probably matters most when running on
a single machine).

- Make MySQL use ``ON DUPLICATE KEY UPDATE`` rather than ``REPLACE``.
This can be friendlier to the storage engine as it performs an
in-place ``UPDATE`` rather than a ``DELETE`` followed by an
``INSERT``. See :issue:`189`.
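
The difference between ``REPLACE`` and an upsert can be seen without a MySQL server. The sketch uses SQLite's analogous ``INSERT ... ON CONFLICT ... DO UPDATE`` syntax (SQLite 3.24 or newer) through the standard ``sqlite3`` module, not the MySQL ``ON DUPLICATE KEY UPDATE`` statement RelStorage issues; the ``demo`` table is a placeholder. In both databases ``REPLACE`` deletes the conflicting row and inserts a new one, so columns it does not mention fall back to their defaults, whereas the upsert updates the existing row in place.

.. code-block:: python

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE demo (zoid INTEGER PRIMARY KEY, state TEXT, hits INTEGER DEFAULT 0)"
    )
    conn.executemany(
        "INSERT INTO demo (zoid, state, hits) VALUES (?, 'v1', 5)", [(1,), (2,)]
    )

    # REPLACE: the conflicting row is deleted and a new one inserted, so the
    # unmentioned hits column falls back to its default.
    conn.execute("REPLACE INTO demo (zoid, state) VALUES (1, 'v2')")

    # Upsert: the conflicting row is updated in place and hits is kept.
    conn.execute(
        "INSERT INTO demo (zoid, state) VALUES (2, 'v2') "
        "ON CONFLICT (zoid) DO UPDATE SET state = excluded.state"
    )

    rows = conn.execute("SELECT zoid, state, hits FROM demo ORDER BY zoid").fetchall()
    print(rows)  # [(1, 'v2', 0), (2, 'v2', 5)]

The replaced row lost its ``hits`` value while the upserted row kept it, which is the observable side of the delete-then-insert work that ``REPLACE`` performs.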

- Make PostgreSQL use an upsert query for moving rows into place on
history-preserving databases.

- Support ZODB 5's parallel commit feature. This means that the
database-wide commit lock is taken much later in the process, and
held for a much shorter time than before.

Previously, the commit lock was taken during the ``tpc_vote`` phase,
and held while we checked ``Connection.readCurrent`` values, and
checked for (and hopefully resolved) conflicts. Other transaction
resources (such as other ZODB databases in a multi-db setup) then
got to vote while we held this lock. Finally, in ``tpc_finally``,
objects were moved into place and the lock was released. This
prevented any other storage instances from checking for
``readCurrent`` or conflicts while we were doing that.

Now, ``tpc_vote`` is (usually) able to check
``Connection.readCurrent`` and check and resolve conflicts without
taking the commit lock. Only in ``tpc_finish``, when we need to
finally allocate the transaction ID, is the commit lock taken, and
only held for the duration needed to finally move objects into
place. This allows other storages for this database, and other
transaction resources for this transaction, to proceed with voting,
conflict resolution, etc, in parallel.

Consistent results are maintained by use of object-level row
locking. Thus, two transactions that attempt to modify the same
object will now only block each other.

There are two exceptions. First, if the ``storage.restore()`` method
is used, the commit lock must be taken very early (before
``tpc_vote``). This is usually only done as part of copying one
database to another. Second, if the storage is configured with a
shared blob directory instead of a blob cache (meaning that blobs
are *only* stored on the filesystem) and the transaction has added
or mutated blobs, the commit lock must be taken somewhat early to
ensure blobs can be saved (after conflict resolution, etc, but
before the end of ``tpc_vote``). It is recommended to store blobs on
the RDBMS server and use a blob cache. The shared blob layout can be
considered deprecated for this reason.

In addition, the new locking scheme means that packing no longer
needs to acquire a commit lock and more work can proceed in parallel
with regular commits. (However, the speed of the deletion phase of
packing on MySQL may have regressed; this has not been
benchmarked.)

.. note::

  If the environment variable ``RELSTORAGE_LOCK_EARLY`` is
  set when RelStorage is imported, then parallel commit will not be
  enabled, and the commit lock will be taken at the beginning of
  the tpc_vote phase, just like before: conflict resolution and
  readCurrent will all be handled with the lock held.

  This is intended for use diagnosing and temporarily working
  around bugs, such as the database driver reporting a deadlock
  error. If you find it necessary to use this setting, please
  report an issue at https://github.com/zodb/relstorage/issues.

See :issue:`125`.

- Deprecate the option ``shared-blob-dir``. Shared blob dirs prevent
using parallel commits when blobs are part of a transaction.

- Remove the 'umysqldb' driver option. This driver exhibited failures
with row-level locking used for parallel commits. See :issue:`264`.

- Migrate all remaining MySQL tables to InnoDB. This is primarily the
tables used during packing, but also the table used for allocating
new OIDs.

Tables will be converted the first time a storage is opened that is
allowed to create the schema (``create-schema`` in the
configuration; default is true). For large tables, this may take
some time, so it is recommended to finish any outstanding packs
before upgrading RelStorage.

If schema creation is not allowed, and required tables are not using
InnoDB, an exception will be raised. Please contact the RelStorage
maintainers on GitHub if you have a need to use a storage engine
besides InnoDB.

This allows for better error detection during packing with parallel
commits. It is also required for `MySQL Group Replication
<https://dev.mysql.com/doc/refman/8.0/en/group-replication-requirements.html>`_.
Benchmarking also shows that creating new objects can be up to 15%
faster due to faster OID allocation.

Things to be aware of:

 - MySQL's `general conversion notes
   <https://dev.mysql.com/doc/refman/8.0/en/converting-tables-to-innodb.html>`_
   suggest that if you had tuned certain server parameters for
   MyISAM tables (which RelStorage only used during packing) it
   might be good to evaluate those parameters again.
 - InnoDB tables may take more disk space than MyISAM tables.
 - The ``new_oid`` table may temporarily have more rows in it at one
   time than before. They will still be garbage collected
   eventually. The change in strategy was necessary to handle
   concurrent transactions better.

See :issue:`188`.
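Before upgrading, it may help to see which tables in a schema still use a storage engine other than InnoDB. Below is a minimal sketch that assumes a DB-API connection from a MySQL driver such as mysqlclient or PyMySQL, both of which use the `%s` placeholder style. The helper names are hypothetical and not part of RelStorage. The query reads MySQL's standard `information_schema.TABLES` view.

```python
# Query the standard information_schema.TABLES view for each base table's
# storage engine. The %s placeholder matches the "format" paramstyle used by
# MySQL drivers such as mysqlclient and PyMySQL.
TABLE_ENGINES_QUERY = (
    "SELECT TABLE_NAME, ENGINE FROM information_schema.TABLES "
    "WHERE TABLE_SCHEMA = %s AND TABLE_TYPE = 'BASE TABLE'"
)


def non_innodb_tables(rows):
    """Return the sorted names of tables whose engine is not InnoDB.

    ``rows`` is an iterable of (table_name, engine) pairs.
    """
    return sorted(
        name for name, engine in rows
        if (engine or "").lower() != "innodb"
    )


def find_non_innodb_tables(conn, schema):
    """Run the query on a DB-API connection and filter the result."""
    cur = conn.cursor()
    try:
        cur.execute(TABLE_ENGINES_QUERY, (schema,))
        return non_innodb_tables(cur.fetchall())
    finally:
        cur.close()


# Example rows shaped like the query result (table names are illustrative):
print(non_innodb_tables([("object_state", "InnoDB"), ("pack_object", "MyISAM")]))
# prints ['pack_object']
```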

- Fix an ``OperationalError: database is locked`` that could occur on
startup if multiple processes were reading or writing the cache
database. See :issue:`266`.

3.0a3

==================

- Zapping a storage now also removes any persistent cache files. See
:issue:`241`.

- Zapping a MySQL storage now issues ``DROP TABLE`` statements instead
of ``DELETE FROM`` statements. This is much faster on large
databases. See :issue:`242`.

- Work around the PyPy 7.1 JIT bug when using MySQL Connector/Python. It is no
longer necessary to disable the JIT in PyPy 7.1.

- On PostgreSQL, use PostgreSQL's efficient binary ``COPY FROM`` to
store objects into the database. This can be 20-40% faster. See
:issue:`247`.
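This changelog does not show RelStorage's own COPY code. As an illustration of what binary `COPY FROM` consumes, here is a minimal sketch of PostgreSQL's documented binary COPY stream for rows of `(bigint, bytea)`. The two-column layout and the function name are assumptions made for the example.

```python
import struct

# Every binary COPY stream starts with this 11-byte signature.
PGCOPY_SIGNATURE = b"PGCOPY\n\xff\r\n\x00"


def binary_copy_payload(rows):
    """Encode (bigint, bytea-or-None) rows in PostgreSQL's binary COPY format.

    All integers are big-endian (network byte order), as the format requires.
    """
    parts = [
        PGCOPY_SIGNATURE,
        struct.pack("!ii", 0, 0),  # flags field, then header extension length
    ]
    for oid, state in rows:
        parts.append(struct.pack("!h", 2))        # field count for this tuple
        parts.append(struct.pack("!iq", 8, oid))  # bigint: length 8, then value
        if state is None:
            parts.append(struct.pack("!i", -1))   # NULL is written as length -1
        else:
            parts.append(struct.pack("!i", len(state)) + state)
    parts.append(struct.pack("!h", -1))           # trailer marks end of data
    return b"".join(parts)


payload = binary_copy_payload([(1, b"abc"), (2, None)])
```

With psycopg2, such a payload could be streamed using `cursor.copy_expert("COPY some_table (zoid, state) FROM STDIN WITH (FORMAT binary)", io.BytesIO(payload))`. The table and column names there are placeholders.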

- Use more efficient mechanisms to poll the database for current TIDs
when verifying serials in transactions.

- Silence a warning about ``cursor.connection`` from pg8000. See
:issue:`238`.

- Poll the database for the correct TIDs of older transactions when
loading from a persistent cache, and only use the entries if they
are current. This restores the functionality lost in the fix for
:issue:`249`.

- Increase the default cache delta limit sizes.

- Fix a race condition accessing non-shared blobs when the blob cache
limit was reached, which could result in blobs appearing to be
spuriously empty. This was only observed on macOS. See :issue:`219`.

- Fix a bug computing the cache delta maps when restoring from
persistent cache that could cause data from a single transaction to
be stale, leading to spurious conflicts.

3.0a2

==================

- Drop support for PostgreSQL versions earlier than 9.6. See
:issue:`220`.

- Make MySQL and PostgreSQL use a prepared statement to get
transaction IDs. PostgreSQL also uses a prepared statement to set
them. This can be slightly faster. See :issue:`246`.

- Make PostgreSQL use a prepared statement to move objects to their
final destination during commit (history free only). See
:issue:`246`.

- Fix an issue with persistent caches written to from multiple
instances sometimes getting stale data after a restart. Note: This
makes the persistent cache less useful for objects that rarely
change in a database that features other actively changing objects;
it is hoped this can be addressed in the future. See :issue:`249`.

3.0a1

==================

- Add support for Python 3.7.

- Drop support for Python 3.4.

- Drop support for Python 2.7.8 and earlier.

- Drop support for ZODB 4 and ZEO 4.

- Officially drop support for versions of MySQL before 5.7.9. We haven't
been testing on anything older than that for some time, and older
than 5.6 for some time before that.

- Drop the ``poll_interval`` parameter. It has been deprecated with a
warning and ignored since 2.0.0b2. See :issue:`222`.

- Drop support for pg8000 older than 1.11.0.

- Drop support for MySQL Connector/Python older than 8.0.16. Many
older versions are known to be broken. Note that the C extension,
while available, is not currently recommended due to internal
errors. See :issue:`228`.

- Test support for MySQL Connector/Python on PyPy. See :issue:`228`.

.. caution:: Prior to PyPy 7.2 or RelStorage 3.0a3, it is necessary to disable JIT
            inlining due to `a PyPy bug
            <https://bitbucket.org/pypy/pypy/issues/3014/jit-issue-inlining-structunpack-hh>`_
            with ``struct.unpack``.

- Drop support for PyPy older than 5.3.1.

- Drop support for the "MySQL Connector/Python" driver name since it
wasn't possible to know if it would use the C extension or the
Python implementation. Instead, explicitly use the 'Py' or 'C'
prefixed name. See :pr:`229`.

- Drop the internal and undocumented environment variables that could be
used to force configurations that did not specify a database driver
to use a specific driver. Instead, list the driver in the database
configuration.

- Opening a RelStorage configuration object read from ZConfig more
than once would lose the database driver setting, reverting to
'auto'. It now retains the setting. See :issue:`231`.

- Fix Python 3 with mysqlclient 1.4. See :issue:`213`.

- Drop support for mysqlclient < 1.4.

- Make driver names in RelStorage configurations case-insensitive
(e.g., 'MySQLdb' and 'mysqldb' are both valid). See :issue:`227`.
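Case-insensitive lookup can be done by folding both the registered names and the requested name before comparing them. Below is a minimal sketch of that idea. The registry contents and `resolve_driver_name` are hypothetical; they are not RelStorage's internal API.

```python
# Hypothetical registry of canonical driver names; the real list lives in
# RelStorage's documentation for each database.
_CANONICAL_DRIVERS = ("MySQLdb", "PyMySQL", "gevent MySQLdb")
_BY_FOLDED_NAME = {name.casefold(): name for name in _CANONICAL_DRIVERS}


def resolve_driver_name(requested):
    """Map a user-supplied driver name to its canonical spelling, ignoring case."""
    try:
        return _BY_FOLDED_NAME[requested.casefold()]
    except KeyError:
        raise ValueError("unknown driver: %r" % (requested,)) from None


print(resolve_driver_name("mysqldb"))  # prints MySQLdb
```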

- Rename the column ``transaction.empty`` to ``transaction.is_empty``
for compatibility with MySQL 8.0, where ``empty`` is now a reserved
word. The migration will happen automatically when a storage is
first opened, unless it is configured not to create the schema.

.. note:: This migration has not been tested for Oracle.

.. note:: You must run this migration *before* attempting to upgrade
         a MySQL 5 database to MySQL 8. If you cannot run the
         upgrade through opening the storage, the statement is
         ``ALTER TABLE transaction CHANGE empty is_empty BOOLEAN
         NOT NULL DEFAULT FALSE``.

- Stop getting a warning about invalid optimizer syntax when packing a
MySQL database (especially with the PyMySQL driver). See
:issue:`163`.

- Add ``gevent MySQLdb``, a new driver that cooperates with gevent
while still using the C extensions of ``mysqlclient`` to communicate
with MySQL. This is now recommended over ``umysqldb``, which is
deprecated and will be removed.

- Rewrite the persistent cache implementation. It is now likely to
produce much higher hit rates (100% on some benchmarks, compared to
1-2% before). It is currently slower to read and write, however.
This is a work in progress. See :pr:`243`.

- Add more aggressive validation and, when possible, corrections for
certain types of cache consistency errors. Previously an
``AssertionError`` would be raised with the message "Detected an
inconsistency between RelStorage and the database...". We now
proactively try harder to avoid that situation based on some
educated guesses about when it could happen, and should it still
happen we now reset the cache and raise a type of ``TransientError``
allowing the application to retry. A few instances where previously
incorrect data could be cached may now raise such a
``TransientError``. See :pr:`245`.

Links

PyPI: https://pypi.org/project/relstorage
Changelog: https://pyup.io/changelogs/relstorage/
Docs: https://relstorage.readthedocs.io/


Update relstorage to 3.4.1

This PR updates RelStorage[postgresql] from 2.1.1 to 3.4.1.

Changelog

3.4.1

==================

- RelStorage has moved from Travis CI to `GitHub Actions
<https://github.com/zodb/relstorage/actions>`_ for macOS and Linux
tests and manylinux wheel building. See :issue:`437`.
- RelStorage is now tested with PostgreSQL 13.1. See :issue:`427`.
- RelStorage is now tested with PyMySQL 1.0. See :issue:`434`.
- Update the bundled boost C++ library from 1.71 to 1.75.
- Improve the way store connections are managed to make it less likely
a "stale" store connection that hasn't actually been checked for
liveness gets used.

3.4.0

==================

- Improve the logging of ``zodbconvert``. The regular minute logging
contains more information and takes blob sizes into account, and
debug logging is more useful, logging about four times a minute.
Some extraneous logging was bumped down to trace.

- Fix psycopg2 logging debug-level warnings from the PostgreSQL server
on transaction commit about not actually being in a transaction.
(Sadly, this just squashes the warning; it doesn't eliminate the
round trip that generates it.)

- Improve the performance of packing databases, especially
history-free databases. See :issue:`275`.

- Give ``zodbpack`` the ability to check for missing references in
RelStorages with the ``--check-refs-only`` argument. This will
perform a pre-pack with GC, and then report on any objects that
would be kept and refer to an object that does not exist. This can
be much faster than external scripts such as those provided by
``zc.zodbdgc``, though it definitely only reports missing references
one level deep.

This is new functionality. Feedback, as always, is very welcome!

- Avoid extra pickling operations of transaction meta data extensions
by using the new ``extension_bytes`` property introduced in ZODB
5.6. This results in higher-fidelity copies of storages, and may
slightly improve the speed of the process too. See :issue:`424`.

- Require ZODB 5.6, up from ZODB 5.5. See :issue:`424`.

- Make ``zodbconvert`` *much faster* (around 5 times faster) when the
destination is a history-free RelStorage and the source supports
``record_iternext()`` (like RelStorage and FileStorage do). This
also applies to the ``copyTransactionsFrom`` method. This is disabled
with the ``--incremental`` option, however. Be sure to read the
updated zodbconvert documentation.

3.3.2

==================

- Fix an ``UnboundLocalError`` in case a store connection could not be
opened. This error shadowed the original error opening the
connection. See :issue:`421`.

3.3.1

==================

- Manylinux wheels: Do not specify the C++ standard to use when
compiling. This seemed to result in an incompatibility with
manylinux1 systems that was not caught by ``auditwheel``.

3.3.0

==================

- The "MySQLdb" driver didn't properly use server-side cursors when
requested. This would result in unexpected increased memory usage
for things like packing and storage iteration.

- Make RelStorage instances implement
``IStorageCurrentRecordIteration``. This lets both
history-preserving and history-free storages work with
``zodbupdate``. See :issue:`389`.

- RelStorage instances now pool their storage connection. Depending on
the workload and ZODB configuration, this can result in requiring
fewer storage connections. See :issue:`409` and :pr:`417`.

There is a potential semantic change: Under some circumstances, the
``loadBefore`` and ``loadSerial`` methods could be used to load
states from the future (not visible to the storage's load
connection) by using the store connection. This ability has been
removed.

- Add support for Python 3.9.

- Drop support for Python 3.5.

- Build manylinux x86-64 and macOS wheels on Travis CI as part of the
release process. These join the Windows wheels in being
automatically uploaded to PyPI.

3.2.1

==================

- Improve the speed of loading large cache files by reducing the cost
of cache validation.

- The timing metrics for ``current_object_oids`` are always collected,
not just sampled. MySQL and PostgreSQL will only call this method
once at startup during persistent cache validation. Other databases
may call this method once during the commit process.

- Add the ability to limit how long persistent cache validation will
spend polling the database for invalid OIDs. Set the environment
variable ``RS_CACHE_POLL_TIMEOUT`` to a number of seconds before
importing RelStorage to use this.
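Because the variable must be set before RelStorage is imported, a script can set it at the very top, ahead of any RelStorage import. Below is a minimal sketch; the value of 30 seconds is only an example. If another module has already imported RelStorage in the same process, setting the variable afterwards likely has no effect. Exporting it in the shell that launches the process avoids that ordering problem.

```python
import os

# Per the changelog, the variable must be set before RelStorage is imported.
# The value is a number of seconds; 30 is only an example.
os.environ["RS_CACHE_POLL_TIMEOUT"] = "30"

try:
    import relstorage  # noqa: F401  imported only after the variable is set
except ImportError:
    relstorage = None  # RelStorage is not installed in this environment
```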

- Avoid an ``AttributeError`` if a persistent ``zope.component`` site
manager is installed as the current site, it's a ghost, and we're
making a load query for the first time in a particular connection.
See :issue:`411`.

- Add some DEBUG level logging around forced invalidations of
persistent object caches due to exceeding the cache MVCC limits. See
:issue:`338`.

3.2.0

==================

- Make the ``gevent psycopg2`` driver support critical sections. This
reduces the amount of gevent switches that occur while database
locks are held under a carefully chosen set of circumstances that
attempt to balance overall throughput against latency. See
:issue:`407`.

- Source distributions: Fix installation when Cython isn't available.
Previously it incorrectly assumed a '.c' extension, which led to
compiler errors. See :issue:`405`.

- Improve various log messages.

3.1.2

==================

- Fix the psycopg2cffi driver inadvertently depending on the
``psycopg2`` package. See :issue:`403`.
- Make the error messages for unavailable drivers include more
information on underlying causes.
- Log a debug message when an "auto" driver is successfully resolved.
- Add a ``--debug`` argument to the ``zodbconvert`` command line tool
to enable DEBUG level logging.
- Add support for pg8000 1.16. Previously, a ``TypeError`` was raised.

3.1.1

==================

- Add support for pg8000 >= 1.15.3. Previously, a ``TypeError`` was
raised.

- SQLite: Committing a transaction releases some resources sooner.
This makes it more likely that auto-checkpointing of WAL files will be
able to reclaim space in some scenarios. See :issue:`401`.

3.1.0

==================

- Use unsigned BTrees for internal data structures to avoid wrapping
in large databases. Requires BTrees 4.7.2.
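The changelog does not say which values wrapped. As a generic illustration of what unsigned keys avoid: a 64-bit integer with its high bit set cannot be stored as a signed 64-bit key, and if the same bytes are read as signed, the value wraps to a negative number. For context, BTrees 4.7 introduced unsigned tree families; QOBTree, for example, is the unsigned counterpart of LOBTree. The snippet below only demonstrates the wrapping itself, using `struct`.

```python
import struct

# Pack a 64-bit value whose high bit is set (larger than 2**63 - 1).
raw = struct.pack(">Q", 2**63 + 5)

unsigned_value = struct.unpack(">Q", raw)[0]  # read back as unsigned
signed_value = struct.unpack(">q", raw)[0]    # same bytes read as signed

print(unsigned_value)  # 9223372036854775813
print(signed_value)    # -9223372036854775803, i.e. unsigned_value - 2**64
```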

3.0.1

==================

- Oracle: Fix an AttributeError saving to Oracle. See :pr:`380` by Mauro
Amico.

- MySQL+gevent: Release the critical section a bit sooner. See :issue:`381`.

- SQLite+gevent: Fix possible deadlocks with gevent if switches
occurred at unexpected times. See :issue:`382`.

- MySQL+gevent: Fix possible deadlocks with gevent if switches
occurred at unexpected times. See :issue:`385`.  This also included
some minor optimizations.

.. caution::

  This introduces a change in a stored procedure that is not
  compatible with older versions of RelStorage. When this version
  is first deployed, if there are older versions of RelStorage
  still running, they will be unable to commit. They will fail with
  a transient conflict error; they may attempt retries, but will not
  succeed. Read-only transactions will continue to work.

3.0.0

==================

- Build binary wheels for Python 3.8 on Windows.

3.0rc1

===================

- SQLite: Avoid logging (at DEBUG level) an error executing ``PRAGMA
OPTIMIZE`` when closing a read-only (load) connection. Now, the
error is avoided by making the connection writable.

- PostgreSQL: Reduce the load connection's isolation level from
``SERIALIZABLE`` to ``REPEATABLE READ`` (two of the three other
supported databases also operate at this level). This allows
connecting to hot standby/streaming replicas. Since the connection
is read-only, and there were no other ``SERIALIZABLE`` transactions
(the store connection operates in ``READ COMMITTED`` mode), there
should be no other visible effects. See :issue:`376`.

- PostgreSQL: pg8000: Properly handle a ``port`` specification in the
``dsn`` configuration. See :issue:`378`.

- PostgreSQL: All drivers pass the ``application_name`` parameter at
connect time instead of later. This solves an issue with psycopg2
and psycopg2cffi connecting to hot standbys.

- All databases: If ``create-schema`` is false, use a read-only
connection to verify that the schema is correct.

- Packaging: Prune unused headers from the include/ directory.

3.0b3

==================

- SQLite: Fix a bug that could lead to invalid OIDs being allocated if
transactions were imported from another storage.

3.0b2

==================

- SQLite: Require the database to be in dedicated directory.

.. caution::

  This introduces a change to the <sqlite3> configuration.
  Please review the documentation. It is possible to migrate a
  database created earlier to the new structure, but no automated
  tooling or documentation is provided for that.

- SQLite: Allow configuration of many of SQLite's PRAGMAs for advanced
tuning.

- SQLite: Fix resetting OIDs when zapping a storage. This could be a
problem for benchmarks.

- SQLite: Fix large prefetches resulting in ``OperationalError``.

- SQLite: Improve the speed of copying transactions into a SQLite
storage (e.g., with zodbconvert).

- SQLite: Substantially improve general performance. See :pr:`368`.

- SQLite: Add the ``gevent sqlite3`` driver that periodically yields
to the gevent loop at configurable intervals.

- PostgreSQL: Improve the speed of writes when using the 'gevent
psycopg2' driver.

3.0b1

==================

- Make SQLite and Oracle both use UPSERT queries instead of multiple
database round trips.
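On SQLite, an UPSERT lets a single statement either insert a row or update the existing one, instead of first querying and then choosing between INSERT and UPDATE. Below is a minimal sketch using Python's built-in `sqlite3` module. The `object_state` table here is a simplified, hypothetical schema, not RelStorage's real one. UPSERT syntax requires SQLite 3.24.0 or later.

```python
import sqlite3

# Hypothetical table; RelStorage's real SQLite schema is not shown here.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE object_state (zoid INTEGER PRIMARY KEY, tid INTEGER, state BLOB)"
)

# One statement inserts a new row or updates the existing one
# (UPSERT syntax requires SQLite 3.24.0 or later).
UPSERT = (
    "INSERT INTO object_state (zoid, tid, state) VALUES (?, ?, ?) "
    "ON CONFLICT (zoid) DO UPDATE SET tid = excluded.tid, state = excluded.state"
)


def store(conn, zoid, tid, state):
    conn.execute(UPSERT, (zoid, tid, state))


store(conn, 1, 100, b"v1")
store(conn, 1, 101, b"v2")  # same zoid: the existing row is updated
print(conn.execute("SELECT zoid, tid, state FROM object_state").fetchall())
# prints [(1, 101, b'v2')]
```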

- Fix an exception with large transactions on SQLite.

- Fix compiling the C extension on very new versions of Microsoft
Visual Studio.

3.0a13

===================

- Further speed improvements and memory efficiency gains of around 30%
for the cache.

- Restore support for Python 2.7 on Windows.

- No longer require Cython to build from a sdist (.tar.gz).

- Add support for using a SQLite file as a RelStorage backend, if all
processes accessing it will be on a single machine. The advantage
over FileStorage is that multiple processes can use the database
concurrently. To allow multiple processes to use a FileStorage one
must deploy ZEO, even if all processes are on a single machine. See
:pr:`362`.

- Fix and test Oracle. The minimum required cx_oracle is now 6.0.

- Add support for Python 3.8.

3.0a12

===================

- Add the ``gevent psycopg2`` driver to allow using the fast psycopg2
driver with gevent.

- Conflict resolution prefetches data for conflicted objects, reducing
the number of database queries and locks needed.

- Introduce a driver-agnostic method for elevating database connection
priority during critical times of two-phase commit, and implement it
for the ``gevent MySQLdb`` driver. This reduces the amount of gevent
switches that occur while database locks are held under a carefully
chosen set of circumstances that attempt to balance overall
throughput against latency. See :issue:`339`.

- Drop support for Python 2.7 on Windows. The required compiler is
very old. See :issue:`358`.

- Substantially reduce the overhead of the cache, making it more
memory efficient. Also make it substantially faster. This was done
by rewriting it in C. See :issue:`358`.

3.0a11

===================

- Make ``poll_invalidations`` handle other retryable internal
exceptions besides just ``ReadConflictError`` so they don't
propagate out to ``transaction.begin()``.

- Make the zodburi resolver entry points not require a specific
RelStorage extra such as 'postgres', in case there is a desire to
use a different database driver than the default that's installed
with that extra. See :issue:`342`, reported by Éloi Rivard.

- Make the zodburi resolvers accept the 'driver' query parameter to
allow selecting a specific driver to use. This functions the same as
in a ZConfig configuration.

- Make the zodburi resolvers more strict on the distinction between
boolean arguments and arbitrary integer arguments. Previously, a
query like ``?read_only=12345&cache_local_mb=yes`` would have been
interpreted as ``True`` and ``1``, respectively. Now it produces errors.
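The stricter behaviour can be pictured as separate parsers for boolean and integer options, each rejecting values that do not fit its kind. Below is a minimal sketch that assumes a fixed set of accepted boolean spellings. `parse_bool`, `parse_int` and `parse_query` are hypothetical helpers; they are not zodburi's or RelStorage's API. With them, the changelog's example query fails instead of being coerced.

```python
from urllib.parse import parse_qsl

# Accepted boolean spellings are an assumption for this sketch.
_TRUE = {"1", "true", "yes", "on"}
_FALSE = {"0", "false", "no", "off"}


def parse_bool(name, value):
    lowered = value.strip().lower()
    if lowered in _TRUE:
        return True
    if lowered in _FALSE:
        return False
    raise ValueError("%s: expected a boolean, got %r" % (name, value))


def parse_int(name, value):
    try:
        return int(value)
    except ValueError:
        raise ValueError("%s: expected an integer, got %r" % (name, value)) from None


def parse_query(query, bool_keys, int_keys):
    """Parse a URI query string, rejecting values of the wrong kind."""
    options = {}
    for key, value in parse_qsl(query):
        if key in bool_keys:
            options[key] = parse_bool(key, value)
        elif key in int_keys:
            options[key] = parse_int(key, value)
        else:
            options[key] = value
    return options


print(parse_query("read_only=true&cache_local_mb=10", {"read_only"}, {"cache_local_mb"}))
# prints {'read_only': True, 'cache_local_mb': 10}
```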

- Fix the calculation of the persistent cache size, especially on
Python 2. This is used to determine when to shrink the disk cache.
See :issue:`317`.

- Fix several race conditions when packing history-free storages
through a combination of changes in ordering and more strongly
consistent (``READ ONLY REPEATABLE READ``) transactions.
Reported in :issue:`325` by krissik with initial PR by Andreas
Gabriel.

- Make ``zodbpack`` pass RelStorage specific options like
``--prepack`` and ``--use-prepack-state`` to the RelStorage, even
when it has been wrapped in a ``zc.zlibstorage``.

- Reduce the amount of memory required to pack a RelStorage through
more careful datastructure choices. On CPython 3, the peak
memory usage of the prepack phase can be up to 9 times less. On
CPython 2, pre-packing a 30MM row storage required 3GB memory; now
it requires about 200MB.

- Use server-side cursors during packing when available, further
reducing the amount of memory required. See :issue:`165`.

- Make history-free database iterators from the same storage use a
consistent view of the database (until a transaction is committed
using the storage or ``sync()`` is called). This prevents data loss
in some cases. See :issue:`344`.

- Make copying transactions *from* a history-free RelStorage (e.g., with
``zodbconvert``) require substantially less memory (75% less).

- Make copying transactions *to* a RelStorage clean up temporary blob
files.

- Make ``zodbconvert`` log progress at intervals instead of for every
transaction. Logging every transaction could add significant overhead
unless stdout was redirected to a file.

- Avoid attempting to lock objects being created. See :issue:`329`.

- Make cache vacuuming faster.

3.0a10

===================

- Fix a bug where the persistent cache might not properly detect
object invalidations if the MVCC index pulled too far ahead at save
time. Now it explicitly checks for invalidations at load time, as
earlier versions did. See :pr:`343`.

- Require perfmetrics 3.0.

3.0a9

==================

- Several minor logging improvements.

- Allow many internal constants to be set with environment variables
at startup for experimentation. These are presently undocumented; if
they prove useful to adjust in different environments they may be
promoted to full configuration options.

- Fix importing RelStorage when ``zope.schema`` is not installed.
``zope.schema`` is intended to be a test dependency and optional for
production deployments. Reported in :issue:`334` by Jonathan Lung.

- Make the gevent MySQL driver more efficient at avoiding needless waits.

- Due to a bug in MySQL (incorrectly rounding the 'minute' value of a
timestamp up), TIDs generated in the last half second of a minute
would suddenly jump ahead by 4,266,903,756 integers (a full minute).

- Fix leaking an internal value for ``innodb_lock_timeout`` across
commits on MySQL. This could lead to ``tpc_vote`` blocking longer
than desired. See :issue:`331`.

- Fix ``undo`` to purge the objects whose transaction was revoked from
the cache.

- Make historical storages read-only, raising
``ReadOnlyHistoryError``, during the commit process. Previously this
was only enforced at the ``Connection`` level.

- Rewrite the cache to understand the MVCC nature of the connections
that use it.

This eliminates the use of "checkpoints." Checkpoints established a
sort of index for objects to allow them to be found in the cache
without necessarily knowing their ``_p_serial`` value. To achieve
good hit rates in large databases, large values for the
``cache-delta-size-limit`` were needed, but if there were lots of
writes, polling to update those large checkpoints could become very
expensive. Because checkpoints were separate in each ZODB connection
in a process, and because when one connection changed its
checkpoints every other connection would also change its checkpoints
on the next access, this could quickly become a problem in highly
concurrent environments (many connections making many large database
queries at the same time). See :issue:`311`.

The new system uses a series of chained maps representing polling
points to build the same index data. All connections can share all
the maps for their view of the database and earlier. New polls add
new maps to the front of the list as needed, and old maps are
removed once they are no longer needed by any active transaction.
This simulates the underlying database's MVCC approach.

Other benefits of this approach include:

- No more large polls. While each connection still polls for each
 transaction it enters, they now share state and only poll against
 the last time a poll occurred, not the last time they were used.
 The result should be smaller, more predictable polling.

- Having a model of object visibility allows the cache to use more
 efficient data structures: it can now use the smaller LOBTree to
 reduce the memory occupied by the cache. It also requires
 fewer cache entries overall to store multiple revisions of an
 object, reducing the overhead. And there are no more key copies
 required after a checkpoint change, again reducing overhead and
 making the LRU algorithm more efficient.

- The cache's LRU algorithm is now at the object level, not the
 object/serial pair.

- Objects that are known to have been changed but whose old revision
 is still in the cache are preemptively removed when no references
 to them are possible, reducing cache memory usage.

- The persistent cache can now guarantee not to write out data that
 it knows to be stale.

Dropping checkpoints probably makes memcache less effective, but
memcache hasn't been recommended for a while.
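The chained-map polling index described in this entry can be modelled, very loosely, with Python's `collections.ChainMap`. Each poll prepends a small `{oid: tid}` map of recent changes. A transaction's view is a chain that shares those same dicts, and the oldest map is folded away once no view needs it. This is a toy model under those assumptions, not RelStorage's C implementation. The class and method names are hypothetical.

```python
from collections import ChainMap


class ChainedPollIndex:
    """Toy model of an MVCC-style index built from chained poll maps.

    Each poll contributes a dict of {oid: tid} for objects changed since the
    previous poll. Newer maps sit at the front of the chain, so a lookup
    returns the most recent tid known for an oid.
    """

    def __init__(self):
        self._chain = ChainMap({})

    def poll(self, changes):
        # A new poll only covers changes since the last poll; prepend it.
        self._chain = self._chain.new_child(dict(changes))

    def current_view(self):
        # Views share the underlying dicts instead of copying them.
        return ChainMap(*self._chain.maps)

    def view_before_latest(self, polls_back):
        # The view of a transaction that began ``polls_back`` polls ago.
        return ChainMap(*self._chain.maps[polls_back:])

    def drop_oldest(self):
        # When no active transaction needs the oldest poll boundary, fold the
        # oldest map into its neighbour so no index entries are lost.
        maps = self._chain.maps
        if len(maps) > 1:
            oldest = maps.pop()
            for oid, tid in oldest.items():
                maps[-1].setdefault(oid, tid)

    def depth(self):
        return len(self._chain.maps)


idx = ChainedPollIndex()
idx.poll({1: 10, 2: 10})
idx.poll({1: 11})
print(idx.current_view()[1], idx.view_before_latest(1)[1])  # prints 11 10
```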

3.0a8

==================

- Improve the safety of the persistent local cache in high-concurrency
environments using older versions of SQLite. Perform a quick
integrity check on startup and refuse to use the cache files if they
are reported corrupt.
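The persistent cache is stored in SQLite, so a quick startup check could use SQLite's `PRAGMA quick_check`, which returns a single `ok` row when it finds no problems. The changelog does not name the exact check RelStorage runs; the sketch below assumes `quick_check`. The helper name is hypothetical.

```python
import sqlite3


def cache_file_is_usable(path):
    """Return True if SQLite's quick integrity check reports no problems."""
    try:
        conn = sqlite3.connect(path)
    except sqlite3.Error:
        return False
    try:
        rows = conn.execute("PRAGMA quick_check").fetchall()
        return rows == [("ok",)]
    except sqlite3.DatabaseError:
        # For example, "file is not a database" for a garbage file.
        return False
    finally:
        conn.close()
```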

- Switch the order in which object locks are taken: try shared locks
first and only then attempt exclusive locks. Shared locks do not
have to block, so a quick lock timeout here means that a
``ReadConflictError`` is inevitable. This works best on PostgreSQL
and MySQL 8, which support true non-blocking locks. On MySQL 5.7,
non-blocking locks are emulated with a 1s timeout. See :issue:`310`.

.. note:: The transaction machinery will retry read conflict errors
         by default. The more rapid detection of them may lead to
         extra retries if there was a process still finishing its
         commit. Consider adding small sleep backoffs to retry
         logic.
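A small sleep backoff in retry logic could look like the sketch below. It uses jittered exponential backoff. `TransientError` here is a local stand-in class; in a real application you would catch `transaction.interfaces.TransientError`, the base class of ZODB's `ConflictError`. The function name and default delays are assumptions.

```python
import random
import time


class TransientError(Exception):
    """Stand-in for a retryable error (for example a ZODB ReadConflictError)."""


def run_with_retries(func, attempts=3, base_delay=0.05, sleep=time.sleep):
    """Call ``func``, retrying transient errors with jittered exponential backoff."""
    for attempt in range(attempts):
        try:
            return func()
        except TransientError:
            if attempt == attempts - 1:
                raise  # out of attempts; let the caller see the error
            # Wait roughly base_delay, 2*base_delay, 4*base_delay, ... with
            # random jitter so competing processes do not retry in lockstep.
            sleep(base_delay * (2 ** attempt) * random.uniform(0.5, 1.5))
```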

- Fix MySQL to immediately rollback its transaction when it gets a
lock timeout, while still in the stored procedure on the database.
Previously it would have required a round trip to the Python
process, which could take an arbitrary amount of time while the
transaction may have still been holding some locks. (After
:issue:`310` they would only be shared locks, but before they would
have been exclusive locks.) This should make for faster recovery in
heavily loaded environments with lots of conflicts. See :issue:`313`.

- Make MySQL clear its temp tables using a single round trip.
Truncation is optional and disabled by default. See :issue:`319`.

- Fix PostgreSQL to not send the definition of the temporary tables
for every transaction. This is only necessary for the first
transaction.

- Improve handling of commit and rollback, especially on PostgreSQL.
We now generate many fewer unneeded rollbacks. See :issue:`289`.

- Stop checking the status of ``readCurrent`` OIDs twice.

- Make the gevent MySQL driver yield more frequently while getting
large result sets. Previously it would block in C to read the entire
result set. Now it yields according to the cursor's ``arraysize``.
See :issue:`315`.
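Yielding according to the cursor's `arraysize` amounts to fetching a fixed-size batch at a time and giving other work a chance to run between batches. Below is a generic sketch of that pattern for any DB-API cursor. The `on_batch` hook is hypothetical; under gevent it could be `lambda: gevent.sleep(0)`. The demonstration uses the standard library's `sqlite3` so it runs anywhere.

```python
import sqlite3


def iter_rows_in_batches(cursor, on_batch=lambda: None):
    """Yield rows, fetching ``cursor.arraysize`` rows per call.

    ``on_batch`` runs after each batch is consumed, for example to let
    other greenlets run between batches.
    """
    while True:
        batch = cursor.fetchmany(cursor.arraysize)
        if not batch:
            break
        yield from batch
        on_batch()


# Demonstration with the standard library's sqlite3 driver.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (n INTEGER)")
conn.executemany("INSERT INTO t VALUES (?)", [(i,) for i in range(10)])
cur = conn.execute("SELECT n FROM t ORDER BY n")
cur.arraysize = 3

switches = []
rows = list(iter_rows_in_batches(cur, on_batch=lambda: switches.append(1)))
```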

- Polling for changes now iterates the cursor instead of using
``fetchall()``. This can reduce memory usage and provide better
behaviour in a concurrent environment, depending on the cursor
implementation.

- Add three environment variables to control the odds of whether any
given poll actually suggests shifted checkpoints. These are all
floating point numbers between 0 and 1. They are
``RELSTORAGE_CP_REPLACEMENT_CHANCE_WHEN_FULL`` (default to 0.7,
i.e., 70%), ``RELSTORAGE_CP_REPLACEMENT_BEGIN_CONSIDERING_PERCENT``
(default 0.8) and ``RELSTORAGE_CP_REPLACEMENT_CHANCE_WHEN_CLOSE``
(default 0.2). (There are corresponding class variables on the
storage cache that could also be set.) Use values of ``1``, ``1``
and ``0`` to restore the old completely deterministic behaviour.
It's not clear whether these will be useful, so they are not
officially options yet but they may become so. Feedback is
appreciated! See :issue:`323`.

.. note::

  These were removed in 3.0a9.

3.0a7

==================

- Eliminate runtime dependency on ZEO. See :issue:`293`.

- Fix a rare race condition allocating OIDs on MySQL. See
:issue:`283`.

- Optimize the ``loadBefore`` method. It appears to be mostly used in
the tests.

- Fix the blob cache cleanup thread to use a real native thread if
we're monkey-patched by gevent, using gevent's thread pool.
Previously, cleaning up the blob cache would block the event loop
for the duration. See :issue:`296`.

- Improve the thread safety and resource usage of blob cache cleanup.
Previously it could spawn many useless threads.

- When caching a newly uploaded blob for a history free storage, if
there's an older revision of the blob in the cache, and it is not in
use, go ahead and preemptively remove it from disk. This can help
prevent the cache size from growing out of hand and limit the number
of expensive full cache checks required. See :issue:`297`.

- Change the default value of the configuration setting
``shared-blob-dir`` to false, meaning that the default is now to use
a blob cache. If you were using shared blobs before, you'll need to
explicitly set ``shared-blob-dir`` to ``true`` before
starting RelStorage.

- Add an option, ``blob-cache-size-check-external``, that causes the
blob cache cleanup process to run in a subprocess instead of a
thread. This can free up the storage process to handle requests.
This is not recommended on Windows. (``python -m
relstorage.blobhelper.cached /path/to/cache size_in_bytes`` can be
used to run a manual cleanup at any time. This is currently an
internal implementation detail.)

- Abort storage transactions immediately when an exception occurs.
Previously this could be specified by setting the environment
variable ``RELSTORAGE_ABORT_EARLY``. Aborting early releases
database locks to allow other transactions to make progress
immediately. See :issue:`50`.

- Reduce the strength of locks taken by ``Connection.readCurrent`` so
that they don't conflict with other connections that just want to
verify they haven't changed. This also lets us immediately detect a
conflict error with an in-progress transaction that is trying to
alter those objects. See :issue:`302`.

- Make databases that use row-level locks (MySQL and PostgreSQL) raise
specific exceptions on failures to acquire those locks. A different
exception is raised for rows a transaction needs to modify compared
to rows it only needs to read. Both are considered transient to
encourage transaction middleware to retry. See :issue:`303`.

- Move more of the vote phase of transaction commit into a database
stored procedure on MySQL and PostgreSQL, beginning with taking the
row-level locks. This eliminates several more database round trips
and the need for the Python thread (or greenlet) to repeatedly
release and then acquire the GIL while holding global locks. See
:issue:`304`.

- Make conflict resolution require fewer database round trips,
especially on PostgreSQL and MySQL, at the expense of using more
memory. In the ideal case it now only needs one (MySQL) or two
(PostgreSQL) queries. Previously it needed at least twice the number
of trips as there were conflicting objects. On both databases, the
benchmarks are 40% to 80% faster (depending on cache configuration).
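Because both kinds of row-lock failure are raised as transient errors, transaction middleware can simply re-run the transaction. The sketch below shows such a retry loop under stated assumptions: `TransientError`, `RowLockForModifyFailed` and `RowLockForReadFailed` are hypothetical stand-ins (in a real application the base class would come from the `transaction` package and the subclasses from RelStorage), and `flaky_commit` only simulates lock contention.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical stand-in for a retryable error class."""


class RowLockForModifyFailed(TransientError):
    """Hypothetical: rows the transaction needs to modify could not be locked."""


class RowLockForReadFailed(TransientError):
    """Hypothetical: rows only checked with readCurrent could not be locked."""


def run_with_retries(attempt: Callable[[], T], max_attempts: int = 3,
                     delay: float = 0.0) -> T:
    """Call ``attempt`` again whenever it raises a TransientError."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for number in range(1, max_attempts + 1):
        try:
            return attempt()
        except TransientError:
            if number == max_attempts:
                raise  # Out of attempts: surface the last error.
            time.sleep(delay)  # Give the lock holder a chance to finish.
    raise AssertionError("unreachable")


# Simulated commit that loses the lock race twice before succeeding.
calls = []


def flaky_commit() -> str:
    calls.append(len(calls) + 1)
    if len(calls) < 3:
        raise RowLockForModifyFailed("row locked by another transaction")
    return "committed"


print(run_with_retries(flaky_commit))  # committed
```

Non-transient exceptions are deliberately not caught, so genuine bugs still fail on the first attempt.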

3.0a6
==================

Enhancements
------------

- Eliminate a few extra round trips to the database on transaction
completion: One extra ``ROLLBACK`` in all databases, and one query
against the ``transaction`` table in history-preserving databases.
See :issue:`159`.

- Prepare more statements used during regular polling.

- Gracefully handle certain disconnected exceptions when rolling back
connections in between transactions. See :issue:`280`.

- Fix a cache error ("TypeError: NoneType object is not
subscriptable") when an object had been deleted (such as through
undoing its creation transaction, or with ``multi-zodb-gc``).

- Implement ``IExternalGC`` for history-preserving databases. This
lets them be used with `zc.zodbdgc
<https://pypi.org/project/zc.zodbdgc/>`_, allowing for
multi-database garbage collection (see :issue:`76`). Note that you
must pack the database after running ``multi-zodb-gc`` in order to
reclaim space.

.. caution::

  It is critical that ``pack-gc`` be turned off (set to false) in a
  multi-database and that only ``multi-zodb-gc`` be used to perform
  garbage collection.

Packing
~~~~~~~

- Make ``RelStorage.pack()`` also accept a TID from the RelStorage
database to pack to. The usual Unix timestamp form for choosing a
pack time can be ambiguous in the event of multiple transactions
within a very short period of time. This is mostly a concern for
automated tests.

Similarly, it will accept a value less than 0 to mean the most
recent transaction in the database. This is useful when machine
clocks may not be well synchronized, or in automated tests.
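ZODB transaction IDs are themselves timestamps: eight bytes holding the UTC minute (counted from 1900, with every month treated as 31 days) in the high half and the position within that minute, in steps of 60/2**32 seconds, in the low half. The sketch below reproduces that layout in plain Python to show that TIDs sort in time order and that a TID names one specific transaction, which is why passing a TID to ``pack()`` avoids picking a Unix time between two closely spaced commits. It is an illustration only; `tid_from_unix_time` is a hypothetical helper, and real code should use ZODB's own `persistent.TimeStamp`.

```python
import struct
import time

# The low 32 bits split one minute into 2**32 steps (about 14 ns each).
SECONDS_PER_STEP = 60.0 / 2 ** 32


def tid_from_unix_time(now: float) -> bytes:
    """Encode a Unix time as an 8-byte TID using the TimeStamp layout."""
    year, month, day, hour, minute = time.gmtime(now)[:5]
    # High word: minutes since 1900-01-01, assuming 31-day months as the
    # TimeStamp format does (sparse, but still ordered correctly).
    minutes = ((((year - 1900) * 12 + month - 1) * 31 + day - 1) * 24 + hour) * 60 + minute
    # Low word: position within the current minute; clamp float rounding.
    steps = min(int((now % 60) / SECONDS_PER_STEP), 2 ** 32 - 1)
    return struct.pack(">II", minutes, steps)


a = tid_from_unix_time(1_600_000_000.000001)
b = tid_from_unix_time(1_600_000_000.000002)
print(a < b)  # True: one microsecond apart, still distinct and ordered
```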

Implementation
--------------

- Remove vestigial top-level thread locks. No instance of RelStorage
is thread safe.

RelStorage is an ``IMVCCStorage``, which means that each ZODB
``Connection`` gets its own new storage object. No visible storage
state is shared among Connections. Connections are explicitly
documented as not being thread safe. Since 2.0, RelStorage's
Connection instances have taken advantage of that fact to be a
little lighter weight through not being thread safe. However, they
still paid the overhead of locking method calls and code complexity.

The top-level storage (the one belonging to a ``ZODB.DB``) still
used heavyweight locks in earlier releases. ``ZODB.DB.storage`` is
documented as being only useful for tests, and the ``DB`` object
itself does not expose any operations that use the storage in a way
that would require thread safety.

The remaining thread safety support has been removed. This
simplifies the code and reduces overhead.

If you were previously using the ``ZODB.DB.storage`` object, or a
``RelStorage`` instance you constructed manually, from multiple
threads, instead make sure each thread has a distinct
``RelStorage.new_instance()`` object.

- A ``RelStorage`` instance now only implements the appropriate subset
of ZODB storage interfaces according to its configuration. For
example, if there is no configured ``blob-dir``, it won't implement
``IBlobStorage``, and if ``keep-history`` is false, it won't
implement ``IStorageUndoable``.

- Refactor RelStorage internals for a cleaner separation of concerns.
This includes how (some) queries are written and managed, making it
easier to prepare statements, but only those actually used.
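Following the advice above to give each thread its own ``RelStorage.new_instance()`` object, one straightforward pattern is a ``threading.local`` cache. In this sketch `FakeStorage` is a hypothetical stand-in for a ``RelStorage`` object; only the name of its ``new_instance()`` method mirrors the API mentioned in the text, and the sketch assumes a single root storage per process.

```python
import threading
from typing import Dict, Optional, Tuple


class FakeStorage:
    """Hypothetical stand-in for a (non-thread-safe) RelStorage instance."""

    def __init__(self, parent: Optional["FakeStorage"] = None) -> None:
        self.parent = parent

    def new_instance(self) -> "FakeStorage":
        # Each call yields a separate storage object for the same database.
        return FakeStorage(parent=self)


_per_thread = threading.local()


def storage_for_current_thread(root: FakeStorage) -> FakeStorage:
    """Return this thread's private instance, creating it on first use."""
    instance = getattr(_per_thread, "storage", None)
    if instance is None:
        instance = root.new_instance()
        _per_thread.storage = instance
    return instance


root = FakeStorage()
results: Dict[str, Tuple[FakeStorage, FakeStorage]] = {}


def worker(name: str) -> None:
    first = storage_for_current_thread(root)
    again = storage_for_current_thread(root)  # same thread: same instance
    results[name] = (first, again)


threads = [threading.Thread(target=worker, args=(f"t{i}",)) for i in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(len({id(first) for first, _ in results.values()}))  # 3
```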


MySQL
-----

- On MySQL, move allocating a TID into the database. On benchmarks
of a local machine this can be a scant few percent faster, but it's
primarily intended to reduce the number of round-trips to the
database. This is a step towards :issue:`281`. See :pr:`286`.

- On MySQL, set the connection timezone to be UTC. This is necessary
to get values consistent between ``UTC_TIMESTAMP``,
``UNIX_TIMESTAMP``, ``FROM_UNIXTIME``, and Python's ``time.gmtime``,
as used for comparing TIDs.

- On MySQL, move most steps of finishing a transaction into a stored
procedure. Together with the TID allocation changes, this reduces
the number of database queries from::

    1 to lock
  + 1 to get TID
  + 1 to store transaction (0 in history free)
  + 1 to move states
  + 1 for blobs (2 in history free)
  + 1 to set current (0 in history free)
  + 1 to commit
  = 7 or 6 (in history free)

down to 1. This is expected to be especially helpful for gevent
deployments, as the database lock is held, the transaction finalized
and committed, and the database lock released, all without involving
greenlets or greenlet switches. By allowing the GIL to be released
longer it may also be helpful for threaded environments. See
:issue:`281` and :pr:`287` for benchmarks and specifics.

.. caution::

 MySQL 5.7.18 and earlier contain a severe bug that causes the
 server to crash when the stored procedure is executed.


- Make PyMySQL use the same precision as mysqlclient when sending
floating point parameters.

- Automatically detect when MySQL stored procedures in the database
are out of date with the current source in this package and replace
them.
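As a quick check of the round-trip list in the stored-procedure item above, the per-step counts add up to the stated totals. The step names and counts are copied from that list; the single remaining round trip after the change is taken from the text ("down to 1").

```python
# Database round trips per commit before the MySQL stored procedure,
# as (step, history-preserving, history-free), copied from the list above.
STEPS = [
    ("lock", 1, 1),
    ("get TID", 1, 1),
    ("store transaction", 1, 0),
    ("move states", 1, 1),
    ("blobs", 1, 2),
    ("set current", 1, 0),
    ("commit", 1, 1),
]

before_hp = sum(hp for _, hp, _ in STEPS)
before_hf = sum(hf for _, _, hf in STEPS)
after = 1  # one call to the stored procedure

print(f"history-preserving: {before_hp} -> {after}")  # 7 -> 1
print(f"history-free:       {before_hf} -> {after}")  # 6 -> 1
```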

PostgreSQL
----------

- As with MySQL, move allocating a TID into the database.

- As with MySQL, move most steps of finishing a transaction into a
stored procedure. On psycopg2 and psycopg2cffi this is done in a
single database call. With pg8000, however, it still takes two, with
the second call being the COMMIT call that releases locks.

- Speed up getting the approximate number of objects
(``len(storage)``) in a database by using the estimates collected by
the autovacuum process or analyzing tables, instead of asking for a
full table scan.

3.0a5
==================

- Reduce the time that MySQL will wait to perform OID garbage
collection on startup. See :issue:`271`.

- Fix several instances where RelStorage could attempt to perform
operations on a database connection with outstanding results on a
cursor. Some database drivers can react badly to this, depending on
the exact circumstances. For example, mysqlclient can raise
``ProgrammingError: (2014, "Commands out of sync; you can't run this
command now")``. See :issue:`270`.

- Fix the "gevent MySQLdb" driver to be cooperative during ``commit``
and ``rollback`` operations. Previously, it would block the event
loop for the entire time it took to send the commit or rollback
request, the server to perform the request, and the result to be
returned. Now, it frees the event loop after sending the request.
See :issue:`272`.

- Call ``set_min_oid`` less often if a storage is just updating
existing objects, not creating its own.

- Fix an occasional possible deadlock in MySQL's ``set_min_oid``. See
:pr:`276`.

3.0a4
==================

- Add support for the ZODB 5 ``connection.prefetch(*args)`` API. This
takes either OIDs (``obj._p_oid``) or persistent ghost objects, or
an iterator of those things, and asks the storage to load them into
its cache for use in the future. In RelStorage, this uses the shared
cache and so may be useful for more than one thread. This can be
3x or more faster than loading objects on-demand. See :issue:`239`.

- Stop chunking blob uploads on PostgreSQL. All supported PostgreSQL
versions natively handle blobs greater than 2GB in size, and the
server was already chunking the blobs for storage, so our layer of
extra chunking has become unnecessary.

.. important::

  The first time a storage is opened with this version,
  blobs that have multiple chunks will be collapsed into a single
  chunk. If there are many blobs larger than 2GB, this could take
  some time.

  It is recommended you have a backup before installing this
  version.

  To verify that the blobs were correctly migrated, you should
  clean or remove your configured blob-cache directory, forcing new
  blobs to be downloaded.

- Fix a bug that left large objects behind if a PostgreSQL database
containing any blobs was ever zapped (with ``storage.zap_all()``).
The ``zodbconvert`` command, the ``zodbshootout`` command, and the
RelStorage test suite could all zap databases. Running the
``vacuumlo`` command included with PostgreSQL will free such
orphaned large objects, after which a regular ``vacuumdb`` command
can be used to reclaim space. See :issue:`260`.

- Conflict resolution can use data from the cache, thus potentially
eliminating a database hit during a very time-sensitive process.
Please file issues if you encounter any strange behaviour when
concurrently packing to the present time and also resolving
conflicts, in case there are corner cases.

- Packing a storage now invalidates the cached values that were packed
away. For the global caches this helps reduce memory pressure; for
the local cache this helps reduce memory pressure and ensure a more
useful persistent cache (this probably matters most when running on
a single machine).

- Make MySQL use ``ON DUPLICATE KEY UPDATE`` rather than ``REPLACE``.
This can be friendlier to the storage engine as it performs an
in-place ``UPDATE`` rather than a ``DELETE`` followed by an
``INSERT``. See :issue:`189`.

- Make PostgreSQL use an upsert query for moving rows into place on
history-preserving databases.

- Support ZODB 5's parallel commit feature. This means that the
database-wide commit lock is taken much later in the process, and
held for a much shorter time than before.

Previously, the commit lock was taken during the ``tpc_vote`` phase,
and held while we checked ``Connection.readCurrent`` values, and
checked for (and hopefully resolved) conflicts. Other transaction
resources (such as other ZODB databases in a multi-db setup) then
got to vote while we held this lock. Finally, in ``tpc_finally``,
objects were moved into place and the lock was released. This
prevented any other storage instances from checking for
``readCurrent`` or conflicts while we were doing that.

Now, ``tpc_vote`` is (usually) able to check
``Connection.readCurrent`` and check and resolve conflicts without
taking the commit lock. Only in ``tpc_finish``, when we need to
finally allocate the transaction ID, is the commit lock taken, and
only held for the duration needed to finally move objects into
place. This allows other storages for this database, and other
transaction resources for this transaction, to proceed with voting,
conflict resolution, etc, in parallel.

Consistent results are maintained by use of object-level row
locking. Thus, two transactions that attempt to modify the same
object will now only block each other.

There are two exceptions. First, if the ``storage.restore()`` method
is used, the commit lock must be taken very early (before
``tpc_vote``). This is usually only done as part of copying one
database to another. Second, if the storage is configured with a
shared blob directory instead of a blob cache (meaning that blobs
are *only* stored on the filesystem) and the transaction has added
or mutated blobs, the commit lock must be taken somewhat early to
ensure blobs can be saved (after conflict resolution, etc, but
before the end of ``tpc_vote``). It is recommended to store blobs on
the RDBMS server and use a blob cache. The shared blob layout can be
considered deprecated for this reason.

In addition, the new locking scheme means that packing no longer
needs to acquire a commit lock and more work can proceed in parallel
with regular commits. (There may, however, have been some speed
regressions in the deletion phase of packing on MySQL; this has not
been benchmarked.)

.. note::

  If the environment variable ``RELSTORAGE_LOCK_EARLY`` is
  set when RelStorage is imported, then parallel commit will not be
  enabled, and the commit lock will be taken at the beginning of
  the tpc_vote phase, just like before: conflict resolution and
  readCurrent will all be handled with the lock held.

  This is intended for use diagnosing and temporarily working
  around bugs, such as the database driver reporting a deadlock
  error. If you find it necessary to use this setting, please
  report an issue at https://github.com/zodb/relstorage/issues.

See :issue:`125`.
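The idea behind replacing the database-wide commit lock with object-level row locks can be modelled in-process. This sketch is only an analogy (RelStorage takes row locks inside the database, not Python locks), and every name in it is hypothetical: each OID gets its own lock, and a transaction locks just the OIDs it modifies, in sorted order, a common way to avoid lock-order deadlocks. Transactions over disjoint objects then run side by side, while a transaction that needs an already-locked object must wait.

```python
import threading
from collections import defaultdict
from contextlib import ExitStack
from typing import Callable, DefaultDict, Iterable, List, Optional

_registry_guard = threading.Lock()
_row_locks: DefaultDict[int, threading.Lock] = defaultdict(threading.Lock)


def lock_for(oid: int) -> threading.Lock:
    """Return the single lock object for ``oid``, creating it if needed."""
    with _registry_guard:
        return _row_locks[oid]


def commit(oids: Iterable[int],
           while_locked: Optional[Callable[[], object]] = None) -> None:
    """Hold locks for just the OIDs being modified, then release them all."""
    with ExitStack() as stack:
        # A fixed (sorted) acquisition order keeps two transactions from
        # each holding a lock the other one needs.
        for oid in sorted(set(oids)):
            stack.enter_context(lock_for(oid))
        if while_locked is not None:
            while_locked()  # stands in for moving object states into place


outcomes: List[str] = []
barrier = threading.Barrier(2, timeout=5)


def transaction(name: str, oids: List[int]) -> None:
    try:
        # The barrier opens only if both transactions are inside their
        # locked sections at once, i.e. disjoint OIDs did not block each other.
        commit(oids, barrier.wait)
        outcomes.append(f"{name}: committed")
    except threading.BrokenBarrierError:
        outcomes.append(f"{name}: blocked")


workers = [threading.Thread(target=transaction, args=("A", [1, 2])),
           threading.Thread(target=transaction, args=("B", [3, 4]))]
for w in workers:
    w.start()
for w in workers:
    w.join()
print(sorted(outcomes))  # ['A: committed', 'B: committed']

# While OID 1 is held, a non-blocking attempt to take it fails.
commit([1], lambda: outcomes.append(
    f"oid 1 available: {lock_for(1).acquire(blocking=False)}"))
print(outcomes[-1])  # oid 1 available: False
```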

- Deprecate the option ``shared-blob-dir``. Shared blob dirs prevent
using parallel commits when blobs are part of a transaction.

- Remove the 'umysqldb' driver option. This driver exhibited failures
with row-level locking used for parallel commits. See :issue:`264`.

- Migrate all remaining MySQL tables to InnoDB. This is primarily the
tables used during packing, but also the table used for allocating
new OIDs.

Tables will be converted the first time a storage is opened that is
allowed to create the schema (``create-schema`` in the
configuration; default is true). For large tables, this may take
some time, so it is recommended to finish any outstanding packs
before upgrading RelStorage.

If schema creation is not allowed, and required tables are not using
InnoDB, an exception will be raised. Please contact the RelStorage
maintainers on GitHub if you have a need to use a storage engine
besides InnoDB.

This allows for better error detection during packing with parallel
commits. It is also required for `MySQL Group Replication
<https://dev.mysql.com/doc/refman/8.0/en/group-replication-requirements.html>`_.
Benchmarking also shows that creating new objects can be up to 15%
faster due to faster OID allocation.

Things to be aware of:

 - MySQL's `general conversion notes
   <https://dev.mysql.com/doc/refman/8.0/en/converting-tables-to-innodb.html>`_
   suggest that if you had tuned certain server parameters for
   MyISAM tables (which RelStorage only used during packing) it
   might be good to evaluate those parameters again.
 - InnoDB tables may take more disk space than MyISAM tables.
 - The ``new_oid`` table may temporarily have more rows in it at one
   time than before. They will still be garbage collected
   eventually. The change in strategy was necessary to handle
   concurrent transactions better.

See :issue:`188`.

- Fix an ``OperationalError: database is locked`` that could occur on
startup if multiple processes were reading or writing the cache
database. See :issue:`266`.
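The switch from ``REPLACE`` to an in-place upsert listed above (``ON DUPLICATE KEY UPDATE`` on MySQL, an upsert query on PostgreSQL) can be demonstrated with Python's built-in ``sqlite3``, used here purely as a stand-in: SQLite supports both ``REPLACE`` and the PostgreSQL-style ``INSERT ... ON CONFLICT ... DO UPDATE`` (SQLite 3.24 or later), while MySQL spells the upsert differently. The table is hypothetical and only loosely modelled on an object-state table; the ``note`` column exists just to make the delete-and-insert behaviour of ``REPLACE`` visible.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE object_state ("
    " zoid INTEGER PRIMARY KEY, tid INTEGER, state TEXT, note TEXT)"
)
conn.execute("INSERT INTO object_state VALUES (1, 100, 'v1', 'keep me')")

QUERY = "SELECT tid, state, note FROM object_state WHERE zoid = 1"

# REPLACE deletes the conflicting row and inserts a new one, so a column
# that is not listed (note) falls back to its default, NULL.
conn.execute("REPLACE INTO object_state (zoid, tid, state) VALUES (1, 101, 'v2')")
after_replace = conn.execute(QUERY).fetchone()

conn.execute("UPDATE object_state SET note = 'keep me' WHERE zoid = 1")

# An upsert updates the existing row in place; unlisted columns survive.
conn.execute(
    "INSERT INTO object_state (zoid, tid, state) VALUES (1, 102, 'v3') "
    "ON CONFLICT (zoid) DO UPDATE SET tid = excluded.tid, state = excluded.state"
)
after_upsert = conn.execute(QUERY).fetchone()

print(after_replace)  # (101, 'v2', None)
print(after_upsert)   # (102, 'v3', 'keep me')
```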

3.0a3
==================

- Zapping a storage now also removes any persistent cache files. See
:issue:`241`.

- Zapping a MySQL storage now issues ``DROP TABLE`` statements instead
of ``DELETE FROM`` statements. This is much faster on large
databases. See :issue:`242`.

- Work around the PyPy 7.1 JIT bug when using MySQL Connector/Python. It is no
longer necessary to disable the JIT in PyPy 7.1.

- On PostgreSQL, use PostgreSQL's efficient binary ``COPY FROM`` to
store objects into the database. This can be 20-40% faster. See
:issue:`247`.

- Use more efficient mechanisms to poll the database for current TIDs
when verifying serials in transactions.

- Silence a warning about ``cursor.connection`` from pg8000. See
:issue:`238`.

- Poll the database for the correct TIDs of older transactions when
loading from a persistent cache, and only use the entries if they
are current. This restores the functionality lost in the fix for
:issue:`249`.

- Increase the default cache delta limit sizes.

- Fix a race condition accessing non-shared blobs when the blob cache
limit was reached which could result in blobs appearing to be
spuriously empty. This was only observed on macOS. See :issue:`219`.

- Fix a bug computing the cache delta maps when restoring from
persistent cache that could cause data from a single transaction to
be stale, leading to spurious conflicts.
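The persistent-cache fix above (poll the database for current TIDs and only use entries that are current) amounts to comparing each cached entry's TID with the object's current TID. A minimal sketch, assuming plain dicts stand in for the cache file and for one round of polling the database; all names are hypothetical.

```python
from typing import Dict, Tuple

OID = int
TID = int
CachedEntry = Tuple[bytes, TID]  # (pickled state, TID it was stored under)


def current_entries(cached: Dict[OID, CachedEntry],
                    current_tids: Dict[OID, TID]) -> Dict[OID, CachedEntry]:
    """Keep only entries whose TID still matches the database.

    ``current_tids`` stands in for one poll of the database for the latest
    TID of each cached OID; OIDs missing from it (for example, deleted
    objects) are dropped as well.
    """
    return {
        oid: entry
        for oid, entry in cached.items()
        if current_tids.get(oid) == entry[1]
    }


cache = {1: (b"state-a", 10), 2: (b"state-b", 11), 3: (b"state-c", 12)}
database = {1: 10, 2: 15}  # OID 2 changed since caching; OID 3 is gone

print(sorted(current_entries(cache, database)))  # [1]
```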

3.0a2
==================

- Drop support for PostgreSQL versions earlier than 9.6. See
:issue:`220`.

- Make MySQL and PostgreSQL use a prepared statement to get
transaction IDs. PostgreSQL also uses a prepared statement to set
them. This can be slightly faster. See :issue:`246`.

- Make PostgreSQL use a prepared statement to move objects to their
final destination during commit (history free only). See
:issue:`246`.

- Fix an issue with persistent caches written to from multiple
instances sometimes getting stale data after a restart. Note: This
makes the persistent cache less useful for objects that rarely
change in a database that features other actively changing objects;
it is hoped this can be addressed in the future. See :issue:`249`.

3.0a1
==================

- Add support for Python 3.7.

- Drop support for Python 3.4.

- Drop support for Python 2.7.8 and earlier.

- Drop support for ZODB 4 and ZEO 4.

- Officially drop support for versions of MySQL before 5.7.9. We haven't
been testing on anything older than that for some time, and older
than 5.6 for some time before that.

- Drop the ``poll_interval`` parameter. It has been deprecated with a
warning and ignored since 2.0.0b2. See :issue:`222`.

- Drop support for pg8000 older than 1.11.0.

- Drop support for MySQL Connector/Python older than 8.0.16. Many
older versions are known to be broken. Note that the C extension,
while available, is not currently recommended due to internal
errors. See :issue:`228`.

- Test support for MySQL Connector/Python on PyPy. See :issue:`228`.

.. caution:: Prior to PyPy 7.2 or RelStorage 3.0a3, it is necessary to disable JIT
            inlining due to `a PyPy bug
            <https://bitbucket.org/pypy/pypy/issues/3014/jit-issue-inlining-structunpack-hh>`_
            with ``struct.unpack``.

- Drop support for PyPy older than 5.3.1.

- Drop support for the "MySQL Connector/Python" driver name since it
wasn't possible to know if it would use the C extension or the
Python implementation. Instead, explicitly use the 'Py' or 'C'
prefixed name. See :pr:`229`.

- Drop the internal and undocumented environment variables that could be
used to force configurations that did not specify a database driver
to use a specific driver. Instead, list the driver in the database
configuration.

- Opening a RelStorage configuration object read from ZConfig more
than once would lose the database driver setting, reverting to
'auto'. It now retains the setting. See :issue:`231`.

- Fix Python 3 with mysqlclient 1.4. See :issue:`213`.

- Drop support for mysqlclient < 1.4.

- Make driver names in RelStorage configurations case-insensitive
(e.g., 'MySQLdb' and 'mysqldb' are both valid). See :issue:`227`.

- Rename the column ``transaction.empty`` to ``transaction.is_empty``
for compatibility with MySQL 8.0, where ``empty`` is now a reserved
word. The migration will happen automatically when a storage is
first opened, unless it is configured not to create the schema.

.. note:: This migration has not been tested for Oracle.

.. note:: You must run this migration *before* attempting to upgrade
         a MySQL 5 database to MySQL 8. If you cannot run the
         upgrade through opening the storage, the statement is
         ``ALTER TABLE transaction CHANGE empty is_empty BOOLEAN
         NOT NULL DEFAULT FALSE``.

- Stop getting a warning about invalid optimizer syntax when packing a
MySQL database (especially with the PyMySQL driver). See
:issue:`163`.

- Add ``gevent MySQLdb``, a new driver that cooperates with gevent
while still using the C extensions of ``mysqlclient`` to communicate
with MySQL. This is now recommended over ``umysqldb``, which is
deprecated and will be removed.

- Rewrite the persistent cache implementation. It is now likely to
produce much higher hit rates (100% on some benchmarks, compared to
1-2% before). It is currently slower to read and write, however.
This is a work in progress. See :pr:`243`.

- Add more aggressive validation and, when possible, corrections for
certain types of cache consistency errors. Previously an
``AssertionError`` would be raised with the message "Detected an
inconsistency between RelStorage and the database...". We now
proactively try harder to avoid that situation based on some
educated guesses about when it could happen, and should it still
happen we now reset the cache and raise a type of ``TransientError``
allowing the application to retry. A few instances where previously
incorrect data could be cached may now raise such a
``TransientError``. See :pr:`245`.
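Case-insensitive driver names (as in the 3.0a1 item where 'MySQLdb' and 'mysqldb' are both valid) come down to normalizing the configured name before looking it up. A minimal sketch; the registry below is hypothetical and lists only driver names that appear in this changelog, and `find_driver` is not RelStorage's function.

```python
from typing import Dict

# Hypothetical registry keyed by case-folded names; values keep the
# canonical spelling for display.
DRIVERS: Dict[str, str] = {
    name.casefold(): name
    for name in ("MySQLdb", "PyMySQL", "gevent MySQLdb", "psycopg2", "pg8000")
}


def find_driver(requested: str) -> str:
    """Resolve a configured driver name regardless of letter case."""
    try:
        return DRIVERS[requested.strip().casefold()]
    except KeyError:
        raise ValueError(f"unknown driver: {requested!r}") from None


print(find_driver("mysqldb"))         # MySQLdb
print(find_driver("GEVENT mysqldb"))  # gevent MySQLdb
```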

Links

PyPI: https://pypi.org/project/relstorage
Changelog: https://pyup.io/changelogs/relstorage/
Docs: https://relstorage.readthedocs.io/

opened by pyup-bot 1

Bump relstorage[postgresql] from 2.1.1 to 3.4.1
Bumps relstorage[postgresql] from 2.1.1 to 3.4.1.

Changelog

Sourced from relstorage[postgresql]'s changelog.

3.4.1 (2021-04-12)

RelStorage has moved from Travis CI to GitHub Actions for macOS and Linux tests and manylinux wheel building. See 437.

RelStorage is now tested with PostgreSQL 13.1. See 427.

RelStorage is now tested with PyMySQL 1.0. See 434.

Update the bundled boost C++ library from 1.71 to 1.75.

Improve the way store connections are managed to make it less likely a "stale" store connection that hasn't actually been checked for liveness gets used.

3.4.0 (2020-10-19)

Improve the logging of zodbconvert. The regular minute logging contains more information and takes blob sizes into account, and debug logging is more useful, logging about four times a minute. Some extraneous logging was bumped down to trace.

Fix psycopg2 logging debug-level warnings from the PostgreSQL server on transaction commit about not actually being in a transaction. (Sadly this just squashes the warning, it doesn't eliminate the round trip that generates it.)

Improve the performance of packing databases, especially history-free databases. See 275.

Give zodbpack the ability to check for missing references in RelStorages with the --check-refs-only argument. This will perform a pre-pack with GC, and then report on any objects that would be kept and refer to an object that does not exist. This can be much faster than external scripts such as those provided by zc.zodbdgc, though it definitely only reports missing references one level deep.

This is new functionality. Feedback, as always, is very welcome!

Avoid extra pickling operations of transaction meta data extensions by using the new extension_bytes property introduced in ZODB 5.6. This results in higher-fidelity copies of storages, and may slightly improve the speed of the process too. See 424.

Require ZODB 5.6, up from ZODB 5.5. See 424.

Make zodbconvert much faster (around 5 times faster) when the destination is a history-free RelStorage and the source supports record_iternext() (like RelStorage and FileStorage do). This also applies to the copyTransactionsFrom method. This is disabled with the --incremental option, however. Be sure to read the updated zodbconvert documentation.
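The ``--check-refs-only`` report described above looks for kept objects that refer to objects that do not exist. The idea can be sketched over an in-memory mapping of OID to referenced OIDs; this illustrates the check only and is not how zodbpack computes it (zodbpack works against the database after a pre-pack with GC). The graph and the `missing_references` helper are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple


def missing_references(refs: Dict[int, Iterable[int]]) -> List[Tuple[int, int]]:
    """Return (referrer, missing_oid) pairs for references to absent objects.

    Only direct references of existing objects are examined; nothing is
    known about what a missing object would itself have referenced.
    """
    existing = set(refs)
    problems = []
    for oid in sorted(refs):
        for target in sorted(set(refs[oid])):
            if target not in existing:
                problems.append((oid, target))
    return problems


# Hypothetical object graph: OID 0 is the root; OID 9 was never stored.
graph = {0: [1, 2], 1: [9], 2: [0]}
print(missing_references(graph))  # [(1, 9)]
```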

3.3.2 (2020-09-21)

Fix an UnboundLocalError in case a store connection could not be opened. This error shadowed the original error opening the connection. See 421.

3.3.1 (2020-09-14)

Manylinux wheels: Do not specify the C++ standard to use when compiling. This seemed to result in an incompatibility with manylinux1 systems that was not caught by auditwheel.

3.3.0 (2020-09-14)

The "MySQLdb" driver didn't properly use server-side cursors when requested. This would result in unexpected increased memory usage for things like packing and storage iteration.

Make RelStorage instances implement IStorageCurrentRecordIteration. This lets both history-preserving and history-free storages work with zodbupdate. See 389.

RelStorage instances now pool their storage connection. Depending on the workload and ZODB configuration, this can result in requiring fewer storage connections. See 409 and 417.

There is a potential semantic change: Under some circumstances, the loadBefore and loadSerial methods could be used to load states from the future (not visible to the storage's load connection) by using the store connection. This ability has been removed.

Add support for Python 3.9.

Drop support for Python 3.5.

Build manylinux x86-64 and macOS wheels on Travis CI as part of the release process. These join the Windows wheels in being automatically uploaded to PyPI.

3.2.1 (2020-08-28)

Improve the speed of loading large cache files by reducing the cost of cache validation.

... (truncated)

Commits

4bd8929 Preparing release 3.4.1

426be9f Tell zest.releaser not to create a wheel. CI does that.

5e6ef78 Merge pull request #452 from zodb/better_store_conn_cleanup

825c0aa Improve the way store connections are managed

a8f517a Add implementation notes on readCurrent on PostgreSQL: it does disk IO. [skip...

9e7a638 Add in-process contended local_client benchmarks. [skip ci]

bf567fc Tweak multiplier in testing locks on CI.

0051e3a Merge pull request #447 from zodb/boost_1_75

405febd Fix links in README.rst [skip ci]

998e3d5 Update bundled boost from 1.71 to 1.75.

Additional commits viewable in compare view

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.

Dependabot commands and options

You can trigger Dependabot actions by commenting on this PR:

@dependabot rebase will rebase this PR

@dependabot recreate will recreate this PR, overwriting any edits that have been made to it

@dependabot merge will merge this PR after your CI passes on it

@dependabot squash and merge will squash and merge this PR after your CI passes on it

@dependabot cancel merge will cancel a previously requested merge and block automerging

@dependabot reopen will reopen this PR if it is closed

@dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually

@dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)

@dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)

@dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)

@dependabot use these labels will set the current labels as the default for future PRs for this repo and language

@dependabot use these reviewers will set the current reviewers as the default for future PRs for this repo and language

@dependabot use these assignees will set the current assignees as the default for future PRs for this repo and language

@dependabot use this milestone will set the current milestone as the default for future PRs for this repo and language

@dependabot badge me will comment on this PR with code to add a "Dependabot enabled" badge to your readme

Additionally, you can set the following in your Dependabot dashboard:

Update frequency (including time of day and day of week)

Pull request limits (per update run and/or open at any time)

Out-of-range updates (receive only lockfile updates, if desired)

Security updates (receive only security updates, if desired)

dependencies
opened by dependabot-preview[bot] 1
Update keyring requirement from <22.0,>=17.0.0 to >=17.0.0,<23.0
Updates the requirements on keyring to permit the latest version.

Changelog

Sourced from keyring's changelog.

v22.0.0

Renamed macOS backend from OS_X to macOS. Any users specifying the backend by name will need to use the new name keyring.backends.macOS.

v21.8.0

#438: For better interoperability with other applications, Windows backend now attempts to decode passwords using UTF-8 if UTF-16 decoding fails. Passwords are still stored as UTF-16.
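The Windows backend change in #438 describes a decode-with-fallback: try UTF-16 first, then UTF-8. A minimal sketch of that pattern on raw bytes, assuming little-endian UTF-16 without a byte-order mark; `decode_password` is a hypothetical helper, not keyring's code.

```python
def decode_password(raw: bytes) -> str:
    """Decode a stored secret, preferring UTF-16 and falling back to UTF-8."""
    try:
        return raw.decode("utf-16-le")
    except UnicodeDecodeError:
        # Likely written by another application as UTF-8 instead.
        return raw.decode("utf-8")


print(decode_password("pässwort".encode("utf-16-le")))  # pässwort
print(decode_password(b"secret!"))                      # secret!
```

The fallback is best-effort: UTF-16 decoding only fails on malformed input (for example an odd number of bytes), so an even-length UTF-8 value can decode as meaningless UTF-16 text without raising.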

v21.7.0

#437: Package now declares typing support.

v21.6.0

#403: Keyring no longer eagerly initializes the backend on import, but instead defers the backend initialization until a keyring is accessed. Any callers reliant on this early initialization behavior may need to call keyring.core.init_backend() to explicitly initialize the detected backend.
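Deferring backend initialization until first access is a lazy-initialization pattern. The sketch below shows its shape with hypothetical names (`get_backend`, `initialize_backend`, `_detect_backend`); it is not keyring's implementation, and it is not thread-safe. Per the note above, keyring's own explicit entry point for callers that relied on eager setup is ``keyring.core.init_backend()``.

```python
from typing import List, Optional

_backend: Optional[str] = None
init_calls: List[str] = []


def _detect_backend() -> str:
    """Hypothetical stand-in for slow backend detection (e.g. probing D-Bus)."""
    init_calls.append("detect")
    return "example-backend"


def initialize_backend() -> str:
    """Explicit initialization, analogous to the old eager behaviour."""
    global _backend
    _backend = _detect_backend()
    return _backend


def get_backend() -> str:
    """Detect the backend on first access only; reuse it afterwards."""
    if _backend is None:
        return initialize_backend()
    return _backend


# Defining the module did no detection work; the first access does.
print(len(init_calls))  # 0
get_backend()
get_backend()
print(len(init_calls))  # 1
```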

v21.5.0

#474: SecretService and KWallet backends are now disabled if the relevant names are not available on D-Bus. Keyring should now be much more responsive in these environments.

#463: Fixed regression in KWallet get_credential where a simple string was returned instead of a SimpleCredential.

v21.4.0

#431: KWallet backend now supports get_credential.

v21.3.1

#445: Suppress errors when sys.argv is not a list of at least one element.

v21.3.0

#440: Keyring now honors XDG_CONFIG_HOME as ~/.config.
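Honoring ``XDG_CONFIG_HOME`` means using that directory when the variable is set and non-empty, and otherwise defaulting to ``~/.config``, as the XDG Base Directory specification describes. A minimal sketch; `config_root` and the `example_app` subdirectory are hypothetical, and the environment is passed in explicitly so the behaviour is easy to check.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def config_root(app: str, env: Optional[Mapping[str, str]] = None) -> Path:
    """Return the per-user config directory for ``app``."""
    env = os.environ if env is None else env
    # An unset or empty XDG_CONFIG_HOME means the default, ~/.config.
    base = env.get("XDG_CONFIG_HOME") or str(Path.home() / ".config")
    return Path(base) / app


print(config_root("example_app", {"XDG_CONFIG_HOME": "/tmp/xdg"}).as_posix())  # /tmp/xdg/example_app
```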

#452: SecretService get_credential now returns None for unmatched query.

v21.2.1

#426: Restored lenience on startup when entry point metadata is missing.

#423: Avoid RecursionError when initializing backends when a limit is supplied.

v21.2.0

... (truncated)

Commits

4aed205 Rename macOS backend to reflect the modern name inclusive of macOS 11.

3355c65 Remove output_password, no longer used for its intended purpose.

1da42cc Switch to argparse

6621222 Refactor cli.CommandLineTool.run for simplicity.

083f5c9 Merge https://github.com/jaraco/skeleton

77fbe1d Use extend-ignore in flake8 config (#33)

a9b3f68 Use license_files instead of license_file in meta (#35)

1731fbe Replace pep517.build with build (#37)

3e876d7 Enable complexity limit. Fixes jaraco/skeleton#34.

02c48f6 Remove legacy metaclass

Additional commits viewable in compare view


dependencies
opened by dependabot-preview[bot] 1
Update mechanicalsoup requirement from <0.13.0,>=0.11.0 to >=0.11.0,<1.1.0
Updates the requirements on mechanicalsoup to permit the latest version.
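
A minimal sketch of what the widened range changes in practice. The old specifier <0.13.0,>=0.11.0 rejected MechanicalSoup 1.0.0, while the new >=0.11.0,<1.1.0 admits it. The helpers parse_version and satisfies are hypothetical and only handle plain numeric versions; pip itself follows the full PEP 440 rules.

```python
# Hypothetical helpers: they compare plain numeric versions only (no "rc1",
# ".post1", etc.). Real tooling should rely on the `packaging` library.

def parse_version(text):
    """Turn '1.0' or '1.0.0' into (1, 0, 0) so tuples compare numerically."""
    parts = [int(piece) for piece in text.split(".")]
    parts += [0] * (3 - len(parts))  # pad short versions to three fields
    return tuple(parts)


def satisfies(version, spec):
    """Return True if `version` meets every comma-separated clause in `spec`."""
    current = parse_version(version)
    checks = {
        ">=": lambda a, b: a >= b,
        "<=": lambda a, b: a <= b,
        "==": lambda a, b: a == b,
        ">": lambda a, b: a > b,
        "<": lambda a, b: a < b,
    }
    for clause in spec.split(","):
        clause = clause.strip()
        # Two-character operators are listed first, so ">=" wins over ">".
        op = next((o for o in checks if clause.startswith(o)), None)
        if op is None:
            raise ValueError(f"unsupported clause: {clause!r}")
        if not checks[op](current, parse_version(clause[len(op):])):
            return False
    return True


# The old range excluded 1.0.0; the new one admits it.
print(satisfies("1.0.0", "<0.13.0,>=0.11.0"))  # False
print(satisfies("1.0.0", ">=0.11.0,<1.1.0"))   # True
```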

Release notes

Sourced from mechanicalsoup's releases.

Version 1.0.0

This is the last release that will support Python 2.7. Thanks to the many contributors that made this release possible!

Main changes:

Added support for Python 3.8 and 3.9.

StatefulBrowser has new properties page, form, and url, which can be used in place of the methods get_current_page, get_current_form and get_url respectively (e.g. the new x.page is equivalent to x.get_current_page()). These methods may be deprecated in a future release. #175

StatefulBrowser.form will raise an AttributeError instead of returning None if no form has been selected yet. Note that StatefulBrowser.get_current_form() still returns None for backward compatibility.
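
This hypothetical stand-in class shows how the new property differs from the old getter. It is not MechanicalSoup's source. It follows the contract described above: form raises AttributeError until a form is selected, while get_current_form() keeps returning None. The set_form method is invented for the example.

```python
class BrowserStateSketch:
    """Hypothetical stand-in mirroring the form / get_current_form() contract."""

    def __init__(self):
        self._form = None  # no form selected yet

    def set_form(self, form):
        # Invented for this sketch; MechanicalSoup selects forms differently.
        self._form = form

    def get_current_form(self):
        # Legacy getter: keeps returning None for backward compatibility.
        return self._form

    @property
    def form(self):
        # New property: fails loudly when nothing has been selected.
        if self._form is None:
            raise AttributeError("no form has been selected yet")
        return self._form
```

One consequence: hasattr(browser, "form") returns False before a form is selected, because hasattr treats an AttributeError as a missing attribute.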

Bug fixes

Decompose <select> elements with the same name when adding a new input element to a form. #297

The params and data kwargs passed to submit will now properly be forwarded to the underlying request for GET methods (whereas previously params was being overwritten by data). #343

Changelog

Sourced from mechanicalsoup's changelog.

Release Notes

Version 1.1 (in development)

Version 1.0

This is the last release that will support Python 2.7. Thanks to the many contributors that made this release possible!

Main changes:

Added support for Python 3.8 and 3.9.

StatefulBrowser has new properties page, form, and url, which can be used in place of the methods get_current_page, get_current_form and get_url respectively (e.g. the new x.page is equivalent to x.get_current_page()). These methods may be deprecated in a future release. [#175]

StatefulBrowser.form will raise an AttributeError instead of returning None if no form has been selected yet. Note that StatefulBrowser.get_current_form() still returns None for backward compatibility.

Bug fixes

Decompose <select> elements with the same name when adding a new input element to a form. [#297]

The params and data kwargs passed to submit will now properly be forwarded to the underlying request for GET methods (whereas previously params was being overwritten by data). [#343]

Version 0.12

Main changes:

Changes in official python version support: added 3.7 and dropped 3.4.

Added ability to submit a form without updating StatefulBrowser internal state: submit_selected(..., update_state=False). This means you get a response from the form submission, but your browser stays on the same page. Useful for handling forms that result in a file download or open a new tab.

Bug fixes

Improve handling of form enctype to behave like a real browser. [#242]

HTML type attributes are no longer required to be lowercase. [#245]

Form controls with the disabled attribute will no longer be submitted to improve compliance with the HTML standard. If you were relying on this bug to submit disabled elements, you can still achieve this by deleting the disabled attribute from the element in the mechanicalsoup.Form object directly. [#248]

When a form containing a file input field is submitted without choosing a file, an empty filename & content will be sent just like in a real browser. [#250]

<option> tags without a value attribute will now use their text as the value. [#252]
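
A small standard-library parser can illustrate the disabled-control and option-value rules above. This is a simplified sketch, not MechanicalSoup's implementation. It assumes explicit </option> end tags, ignores input types such as checkboxes and submit buttons, and only reports options marked selected (a real browser falls back to the first option).

```python
from html.parser import HTMLParser


class FormFieldCollector(HTMLParser):
    """Collect (name, value) pairs from <input> and <select> controls."""

    def __init__(self):
        super().__init__()
        self.fields = []
        self._select_name = None      # name of the <select> being parsed
        self._select_disabled = False
        self._option = None           # attrs of the selected <option> in progress
        self._option_text = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "input":
            if "disabled" in a or "name" not in a:
                return  # disabled controls are not submitted
            self.fields.append((a["name"], a.get("value") or ""))
        elif tag == "select":
            self._select_name = a.get("name")
            self._select_disabled = "disabled" in a
        elif tag == "option" and self._select_name and "selected" in a:
            self._option = a
            self._option_text = []

    def handle_data(self, data):
        if self._option is not None:
            self._option_text.append(data)

    def handle_endtag(self, tag):
        if tag == "option" and self._option is not None:
            # An <option> without a value attribute submits its text instead.
            value = self._option.get("value")
            if value is None:
                value = "".join(self._option_text).strip()
            if not self._select_disabled:
                self.fields.append((self._select_name, value))
            self._option = None
        elif tag == "select":
            self._select_name = None
```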

The optional url_regex argument to follow_link and download_link was fixed so that it is no longer ignored. [#256]

Allow duplicate submit elements instead of raising a LinkNotFoundError. [#264]

Our thanks to the many new contributors in this release!

Version 0.11

This release focuses on fixing bugs related to uncommon HTTP/HTML scenarios and on improving the documentation.

Bug fixes

Constructing a mechanicalsoup.Form instance from a bs4.element.Tag whose tag name is not form will now emit a warning, and may be deprecated in the future. [#228]

Commits

1ab5da4 Release 1.0.0

ad09c3c Improve handling of kwargs passed to Requests (#343)

64a2a59 Add support for Python 3.9 (#347)

e6da833 Add get_request_kwargs() to check before submitting (#339)

5dc6008 Add note that GitHub example fails if 2FA enabled

0a5e179 Link to the stable documentation, rather than the latest one

9c2a5b7 stateful_browser.py: fix flake8 warning (#332)

b6eab9d Tiny fix to set documenation (#331)

424efe3 Merge pull request #325 from hemberger/duckduckgo

7a53fd0 Fix expl_duck_duck_go.py by adding user agent

Additional commits viewable in compare view

dependencies
opened by dependabot-preview[bot] 1
Bump relstorage[postgresql] from 2.1.1 to 3.4.0
Bumps relstorage[postgresql] from 2.1.1 to 3.4.0.

Changelog

Sourced from relstorage[postgresql]'s changelog.

3.4.0 (2020-10-19)

Improve the logging of zodbconvert. The regular minute logging contains more information and takes blob sizes into account, and debug logging is more useful, logging about four times a minute. Some extraneous logging was bumped down to trace.

Fix psycopg2 logging debug-level warnings from the PostgreSQL server on transaction commit about not actually being in a transaction. (Sadly this just squashes the warning, it doesn't eliminate the round trip that generates it.)

Improve the performance of packing databases, especially history-free databases. See #275.

Give zodbpack the ability to check for missing references in RelStorages with the --check-refs-only argument. This will perform a pre-pack with GC, and then report on any objects that would be kept and refer to an object that does not exist. This can be much faster than external scripts such as those provided by zc.zodbdgc, though it definitely only reports missing references one level deep.

This is new functionality. Feedback, as always, is very welcome!
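
Below is a simplified model of the check described for zodbpack --check-refs-only. For each object that would survive the pack, it reports direct references to objects that do not exist. The function name and the plain-dict inputs are assumptions made for illustration; RelStorage works against its database tables, not Python dicts.

```python
def find_missing_references(kept, references, existing):
    """Return {oid: sorted missing targets} for objects that survive a pack.

    kept:       OIDs that would be kept after garbage collection.
    references: mapping of OID -> OIDs it refers to directly (hypothetical layout).
    existing:   every OID that actually exists in storage.
    """
    missing = {}
    for oid in kept:
        # Only each kept object's direct references are checked ("one level
        # deep"); nothing is reported about objects further down the graph.
        absent = sorted(ref for ref in references.get(oid, ()) if ref not in existing)
        if absent:
            missing[oid] = absent
    return missing
```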

Avoid extra pickling operations of transaction meta data extensions by using the new extension_bytes property introduced in ZODB 5.6. This results in higher-fidelity copies of storages, and may slightly improve the speed of the process too. See #424.

Require ZODB 5.6, up from ZODB 5.5. See #424.

Make zodbconvert much faster (around 5 times faster) when the destination is a history-free RelStorage and the source supports record_iternext() (like RelStorage and FileStorage do). This also applies to the copyTransactionsFrom method. This is disabled with the --incremental option, however. Be sure to read the updated zodbconvert documentation.
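
The record_iternext() loop that the fast path relies on can be sketched as follows. The sketch assumes the ZODB contract in which record_iternext(next) returns (oid, tid, data, next_oid), with next_oid set to None after the last record. ToyStorage and copy_current_records are hypothetical. The speedup presumably comes from a history-free destination needing only current records rather than a replay of every transaction.

```python
class ToyStorage:
    """Hypothetical in-memory storage holding only current revisions."""

    def __init__(self, records):
        # records: {oid: (tid, data)}; must be non-empty for this sketch.
        self._records = dict(records)

    def record_iternext(self, next_oid=None):
        # Assumed contract: return the record at next_oid (or the first one)
        # plus the OID to ask for next, or None when iteration is finished.
        oids = sorted(self._records)
        oid = oids[0] if next_oid is None else next_oid
        tid, data = self._records[oid]
        idx = oids.index(oid)
        following = oids[idx + 1] if idx + 1 < len(oids) else None
        return oid, tid, data, following


def copy_current_records(source):
    """Walk current records only, which is all a history-free target needs."""
    copied = {}
    next_oid = None
    while True:
        oid, tid, data, next_oid = source.record_iternext(next_oid)
        copied[oid] = (tid, data)
        if next_oid is None:
            break
    return copied
```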

3.3.2 (2020-09-21)

Fix an UnboundLocalError in case a store connection could not be opened. This error shadowed the original error opening the connection. See #421.

3.3.1 (2020-09-14)

Manylinux wheels: Do not specify the C++ standard to use when compiling. This seemed to result in an incompatibility with manylinux1 systems that was not caught by auditwheel.

3.3.0 (2020-09-14)

The "MySQLdb" driver didn't properly use server-side cursors when requested. This would result in unexpected increased memory usage for things like packing and storage iteration.

Make RelStorage instances implement IStorageCurrentRecordIteration. This lets both history-preserving and history-free storages work with zodbupdate. See #389.

RelStorage instances now pool their storage connection. Depending on the workload and ZODB configuration, this can result in requiring fewer storage connections. See #409 and #417.

There is a potential semantic change: Under some circumstances, the loadBefore and loadSerial methods could be used to load states from the future (not visible to the storage's load connection) by using the store connection. This ability has been removed.

Add support for Python 3.9.

Drop support for Python 3.5.

Build manylinux x86-64 and macOS wheels on Travis CI as part of the release process. These join the Windows wheels in being automatically uploaded to PyPI.

3.2.1 (2020-08-28)

Improve the speed of loading large cache files by reducing the cost of cache validation.

The timing metrics for current_object_oids are always collected, not just sampled. MySQL and PostgreSQL will only call this method once at startup during persistent cache validation. Other databases may call this method once during the commit process.

Add the ability to limit how long persistent cache validation will spend polling the database for invalid OIDs. Set the environment variable RS_CACHE_POLL_TIMEOUT to a number of seconds before importing RelStorage to use this.
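
Setting the variable from Python works as long as it happens before the import, as the entry requires. The value 30 is an arbitrary example. The import is guarded only so the sketch still runs when RelStorage is not installed.

```python
import importlib.util
import os

# Must be set *before* RelStorage is imported, per the changelog entry.
# "30" (seconds) is an arbitrary example value.
os.environ["RS_CACHE_POLL_TIMEOUT"] = "30"

# Import only if RelStorage is installed, so this sketch also runs without it.
if importlib.util.find_spec("relstorage") is not None:
    import relstorage  # noqa: F401  (the timeout is presumably read at import)
```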

Avoid an AttributeError if a persistent zope.component site manager is installed as the current site, it's a ghost, and we're making a load query for the first time in a particular connection. See #411.

Add some DEBUG level logging around forced invalidations of persistent object caches due to exceeding the cache MVCC limits. See #338.

3.2.0 (2020-07-20)

Make the gevent psycopg2 driver support critical sections. This reduces the amount of gevent switches that occur while database locks are held under a carefully chosen set of circumstances that attempt to balance overall throughput against latency. See #407.

Commits

9d17577 Preparing release 3.4.0

df5aa16 Merge pull request #431 from zodb/issue425

798d34c coverage

7ee766b Fix for Windows tests: close before attempting removal.

c16c3bc Update zodbconvert logging for fast copies. Also document the (unfortunate) i...

e5fe15d Handle storage wrappers like zc.zlibstorage in copyTransactionsFrom.

5aa7410 More geveric zodbconvert testing: Do a deep comparison of current recurds.

2917373 Fix tests on Python 2 and with PyMySQLConnector.

57fa895 zodbconvert: more testing.

69d9af9 Fix the record_iternext tests and explicitly disable the copy optimizations w...

Additional commits viewable in compare view

dependencies
opened by dependabot-preview[bot] 1

Update relstorage to 3.4.0

This PR updates RelStorage[postgresql] from 2.1.1 to 3.4.0.

Changelog

3.4.0

==================

- Improve the logging of ``zodbconvert``. The regular minute logging
contains more information and takes blob sizes into account, and
debug logging is more useful, logging about four times a minute.
Some extraneous logging was bumped down to trace.

- Fix psycopg2 logging debug-level warnings from the PostgreSQL server
on transaction commit about not actually being in a transaction.
(Sadly this just squashes the warning, it doesn't eliminate the
round trip that generates it.)

- Improve the performance of packing databases, especially
history-free databases. See :issue:`275`.

- Give ``zodbpack`` the ability to check for missing references in
RelStorages with the ``--check-refs-only`` argument. This will
perform a pre-pack with GC, and then report on any objects that
would be kept and refer to an object that does not exist. This can
be much faster than external scripts such as those provided by
``zc.zodbdgc``, though it definitely only reports missing references
one level deep.

This is new functionality. Feedback, as always, is very welcome!

- Avoid extra pickling operations of transaction meta data extensions
by using the new ``extension_bytes`` property introduced in ZODB
5.6. This results in higher-fidelity copies of storages, and may
slightly improve the speed of the process too. See :issue:`424`.

- Require ZODB 5.6, up from ZODB 5.5. See :issue:`424`.

- Make ``zodbconvert`` *much faster* (around 5 times faster) when the
destination is a history-free RelStorage and the source supports
``record_iternext()`` (like RelStorage and FileStorage do). This
also applies to the ``copyTransactionsFrom`` method. This is disabled
with the ``--incremental`` option, however. Be sure to read the
updated zodbconvert documentation.

3.3.2

==================

- Fix an ``UnboundLocalError`` in case a store connection could not be
opened. This error shadowed the original error opening the
connection. See :issue:`421`.

3.3.1

==================

- Manylinux wheels: Do not specify the C++ standard to use when
compiling. This seemed to result in an incompatibility with
manylinux1 systems that was not caught by ``auditwheel``.

3.3.0

==================

- The "MySQLdb" driver didn't properly use server-side cursors when
requested. This would result in unexpected increased memory usage
for things like packing and storage iteration.

- Make RelStorage instances implement
``IStorageCurrentRecordIteration``. This lets both
history-preserving and history-free storages work with
``zodbupdate``. See :issue:`389`.

- RelStorage instances now pool their storage connection. Depending on
the workload and ZODB configuration, this can result in requiring
fewer storage connections. See :issue:`409` and :pr:`417`.

There is a potential semantic change: Under some circumstances, the
``loadBefore`` and ``loadSerial`` methods could be used to load
states from the future (not visible to the storage's load
connection) by using the store connection. This ability has been
removed.

- Add support for Python 3.9.

- Drop support for Python 3.5.

- Build manylinux x86-64 and macOS wheels on Travis CI as part of the
release process. These join the Windows wheels in being
automatically uploaded to PyPI.

3.2.1

==================

- Improve the speed of loading large cache files by reducing the cost
of cache validation.

- The timing metrics for ``current_object_oids`` are always collected,
not just sampled. MySQL and PostgreSQL will only call this method
once at startup during persistent cache validation. Other databases
may call this method once during the commit process.

- Add the ability to limit how long persistent cache validation will
spend polling the database for invalid OIDs. Set the environment
variable ``RS_CACHE_POLL_TIMEOUT`` to a number of seconds before
importing RelStorage to use this.

- Avoid an ``AttributeError`` if a persistent ``zope.component`` site
manager is installed as the current site, it's a ghost, and we're
making a load query for the first time in a particular connection.
See :issue:`411`.

- Add some DEBUG level logging around forced invalidations of
persistent object caches due to exceeding the cache MVCC limits. See
:issue:`338`.

3.2.0

==================

- Make the ``gevent psycopg2`` driver support critical sections. This
reduces the amount of gevent switches that occur while database
locks are held under a carefully chosen set of circumstances that
attempt to balance overall throughput against latency. See
:issue:`407`.

- Source distributions: Fix installation when Cython isn't available.
Previously it incorrectly assumed a '.c' extension, which led to
compiler errors. See :issue:`405`.

- Improve various log messages.

3.1.2

==================

- Fix the psycopg2cffi driver inadvertently depending on the
``psycopg2`` package. See :issue:`403`.
- Make the error messages for unavailable drivers include more
information on underlying causes.
- Log a debug message when an "auto" driver is successfully resolved.
- Add a ``--debug`` argument to the ``zodbconvert`` command line tool
to enable DEBUG level logging.
- Add support for pg8000 1.16. Previously, a ``TypeError`` was raised.

3.1.1

==================

- Add support for pg8000 >= 1.15.3. Previously, a ``TypeError`` was
raised.

- SQLite: Committing a transaction releases some resources sooner.
This makes it more likely that auto-checkpointing of WAL files will be
able to reclaim space in some scenarios. See :issue:`401`.

3.1.0

==================

- Use unsigned BTrees for internal data structures to avoid wrapping
in large databases. Requires BTrees 4.7.2.
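
The standard struct module shows why signedness matters for 64-bit keys. The same eight bytes, read back as a signed integer, turn negative once the value passes 2**63 - 1. This is a general illustration of the wrapping hazard, not RelStorage code. For reference, the L-prefixed BTrees (such as LOBTree) use signed 64-bit keys.

```python
import struct


def as_signed_64(value):
    """Reinterpret an unsigned 64-bit value as a signed one (illustration only)."""
    # Pack as unsigned little-endian 64-bit, then unpack the same bytes as signed.
    return struct.unpack("<q", struct.pack("<Q", value))[0]


print(as_signed_64(2**63 - 1))  # 9223372036854775807, still positive
print(as_signed_64(2**63))      # -9223372036854775808, wrapped negative
```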

3.0.1

==================

- Oracle: Fix an AttributeError saving to Oracle. See :pr:`380` by Mauro
Amico.

- MySQL+gevent: Release the critical section a bit sooner. See :issue:`381`.

- SQLite+gevent: Fix possible deadlocks with gevent if switches
occurred at unexpected times. See :issue:`382`.

- MySQL+gevent: Fix possible deadlocks with gevent if switches
occurred at unexpected times. See :issue:`385`.  This also included
some minor optimizations.

.. caution::

  This introduces a change in a stored procedure that is not
  compatible with older versions of RelStorage. When this version
  is first deployed, if there are older versions of RelStorage
  still running, they will be unable to commit. They will fail with
  a transient conflict error; they may attempt retries, but will not
  succeed. Read-only transactions will continue to work.

3.0.0

==================

- Build binary wheels for Python 3.8 on Windows.

3.0rc1

===================

- SQLite: Avoid logging (at DEBUG level) an error executing ``PRAGMA
OPTIMIZE`` when closing a read-only (load) connection. Now, the
error is avoided by making the connection writable.

- PostgreSQL: Reduce the load connection's isolation level from
``SERIALIZABLE`` to ``REPEATABLE READ`` (two of the three other
supported databases also operate at this level). This allows
connecting to hot standby/streaming replicas. Since the connection
is read-only, and there were no other ``SERIALIZABLE`` transactions
(the store connection operates in ``READ COMMITTED`` mode), there
should be no other visible effects. See :issue:`376`.

- PostgreSQL: pg8000: Properly handle a ``port`` specification in the
``dsn`` configuration. See :issue:`378`.

- PostgreSQL: All drivers pass the ``application_name`` parameter at
connect time instead of later. This solves an issue with psycopg2
and psycopg2cffi connecting to hot standbys.

- All databases: If ``create-schema`` is false, use a read-only
connection to verify that the schema is correct.

- Packaging: Prune unused headers from the include/ directory.

3.0b3

==================

- SQLite: Fix a bug that could lead to invalid OIDs being allocated if
transactions were imported from another storage.

3.0b2

==================

- SQLite: Require the database to be in dedicated directory.

.. caution::

  This introduces a change to the <sqlite3> configuration.
  Please review the documentation. It is possible to migrate a
  database created earlier to the new structure, but no automated
  tooling or documentation is provided for that.

- SQLite: Allow configuration of many of SQLite's PRAGMAs for advanced
tuning.

- SQLite: Fix resetting OIDs when zapping a storage. This could be a
problem for benchmarks.

- SQLite: Fix large prefetches resulting in ``OperationalError``.

- SQLite: Improve the speed of copying transactions into a SQLite
storage (e.g., with zodbconvert).

- SQLite: Substantially improve general performance. See :pr:`368`.

- SQLite: Add the ``gevent sqlite3`` driver that periodically yields
to the gevent loop at configurable intervals.

- PostgreSQL: Improve the speed of writes when using the 'gevent
psycopg2' driver.

3.0b1

==================

- Make SQLite and Oracle both use UPSERT queries instead of multiple
database round trips.

- Fix an exception with large transactions on SQLite.

- Fix compiling the C extension on very new versions of Microsoft
Visual Studio.

3.0a13

===================

- Further speed improvements and memory efficiency gains of around 30%
for the cache.

- Restore support for Python 2.7 on Windows.

- No longer require Cython to build from a sdist (.tar.gz).

- Add support for using a SQLite file as a RelStorage backend, if all
processes accessing it will be on a single machine. The advantage
over FileStorage is that multiple processes can use the database
concurrently. To allow multiple processes to use a FileStorage one
must deploy ZEO, even if all processes are on a single machine. See
:pr:`362`.

- Fix and test Oracle. The minimum required cx_oracle is now 6.0.

- Add support for Python 3.8.

3.0a12

===================

- Add the ``gevent psycopg2`` driver to allow using the fast psycopg2
driver with gevent.

- Conflict resolution prefetches data for conflicted objects, reducing
the number of database queries and locks needed.

- Introduce a driver-agnostic method for elevating database connection
priority during critical times of two-phase commit, and implement it
for the ``gevent MySQLdb`` driver. This reduces the amount of gevent
switches that occur while database locks are held under a carefully
chosen set of circumstances that attempt to balance overall
throughput against latency. See :issue:`339`.

- Drop support for Python 2.7 on Windows. The required compiler is
very old. See :issue:`358`.

- Substantially reduce the overhead of the cache, making it more
memory efficient. Also make it substantially faster. This was done
by rewriting it in C. See :issue:`358`.

3.0a11

===================

- Make ``poll_invalidations`` handle other retryable internal
exceptions besides just ``ReadConflictError`` so they don't
propagate out to ``transaction.begin()``.

- Make the zodburi resolver entry points not require a specific
RelStorage extra such as 'postgres', in case there is a desire to
use a different database driver than the default that's installed
with that extra. See :issue:`342`, reported by Éloi Rivard.

- Make the zodburi resolvers accept the 'driver' query parameter to
allow selecting a specific driver to use. This functions the same as
in a ZConfig configuration.

- Make the zodburi resolvers more strict on the distinction between
boolean arguments and arbitrary integer arguments. Previously, a
query like ``?read_only=12345&cache_local_mb=yes`` would have been
interpreted as ``True`` and ``1``, respectively. Now it produces errors.

- Fix the calculation of the persistent cache size, especially on
Python 2. This is used to determine when to shrink the disk cache.
See :issue:`317`.

- Fix several race conditions when packing history-free storages
through a combination of changes in ordering and more strongly
consistent (``READ ONLY REPEATABLE READ``) transactions.
Reported in :issue:`325` by krissik with initial PR by Andreas
Gabriel.

- Make ``zodbpack`` pass RelStorage specific options like
``--prepack`` and ``--use-prepack-state`` to the RelStorage, even
when it has been wrapped in a ``zc.zlibstorage``.

- Reduce the amount of memory required to pack a RelStorage through
more careful data structure choices. On CPython 3, the peak
memory usage of the prepack phase can be up to 9 times less. On
CPython 2, pre-packing a 30MM row storage required 3GB memory; now
it requires about 200MB.

- Use server-side cursors during packing when available, further
reducing the amount of memory required. See :issue:`165`.

- Make history-free database iterators from the same storage use a
consistent view of the database (until a transaction is committed
using the storage or ``sync()`` is called). This prevents data loss
in some cases. See :issue:`344`.

- Make copying transactions *from* a history-free RelStorage (e.g., with
``zodbconvert``) require substantially less memory (75% less).

- Make copying transactions *to* a RelStorage clean up temporary blob
files.

- Make ``zodbconvert`` log progress at intervals instead of for every
transaction. Logging every transaction could add significant overhead
unless stdout was redirected to a file.

- Avoid attempting to lock objects being created. See :issue:`329`.

- Make cache vacuuming faster.
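
The stricter zodburi argument handling described in this release can be sketched with urllib.parse. The function parse_options and its accepted boolean spellings are assumptions, not the resolver's actual code. The point is that booleans and integers are validated separately, so the example query ?read_only=12345&cache_local_mb=yes now fails instead of being coerced.

```python
from urllib.parse import parse_qs

# Hypothetical set of accepted boolean spellings; the real resolver may differ.
BOOLEAN_WORDS = {"true": True, "yes": True, "on": True, "1": True,
                 "false": False, "no": False, "off": False, "0": False}


def parse_options(query, boolean_keys, int_keys):
    """Parse a query string, validating boolean and integer options strictly."""
    options = {}
    for key, values in parse_qs(query, strict_parsing=True).items():
        raw = values[-1]  # last occurrence wins
        if key in boolean_keys:
            try:
                options[key] = BOOLEAN_WORDS[raw.lower()]
            except KeyError:
                raise ValueError(f"{key} expects a boolean, got {raw!r}") from None
        elif key in int_keys:
            try:
                options[key] = int(raw)
            except ValueError:
                raise ValueError(f"{key} expects an integer, got {raw!r}") from None
        else:
            options[key] = raw
    return options
```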

3.0a10

===================

- Fix a bug where the persistent cache might not properly detect
object invalidations if the MVCC index pulled too far ahead at save
time. Now it explicitly checks for invalidations at load time, as
earlier versions did. See :pr:`343`.

- Require perfmetrics 3.0.

3.0a9

==================

- Several minor logging improvements.

- Allow many internal constants to be set with environment variables
at startup for experimentation. These are presently undocumented; if
they prove useful to adjust in different environments they may be
promoted to full configuration options.

- Fix importing RelStorage when ``zope.schema`` is not installed.
``zope.schema`` is intended to be a test dependency and optional for
production deployments. Reported in :issue:`334` by Jonathan Lung.

- Make the gevent MySQL driver more efficient at avoiding needless waits.

- Due to a bug in MySQL (incorrectly rounding the 'minute' value of a
timestamp up), TIDs generated in the last half second of a minute
would suddenly jump ahead by 4,266,903,756 integers (a full minute).

- Fix leaking an internal value for ``innodb_lock_timeout`` across
commits on MySQL. This could lead to ``tpc_vote`` blocking longer
than desired. See :issue:`331`.

- Fix ``undo`` to purge the objects whose transaction was revoked from
the cache.

- Make historical storages read-only, raising
``ReadOnlyHistoryError``, during the commit process. Previously this
was only enforced at the ``Connection`` level.

- Rewrite the cache to understand the MVCC nature of the connections
that use it.

This eliminates the use of "checkpoints." Checkpoints established a
sort of index for objects to allow them to be found in the cache
without necessarily knowing their ``_p_serial`` value. To achieve
good hit rates in large databases, large values for the
``cache-delta-size-limit`` were needed, but if there were lots of
writes, polling to update those large checkpoints could become very
expensive. Because checkpoints were separate in each ZODB connection
in a process, and because when one connection changed its
checkpoints every other connection would also change its checkpoints
on the next access, this could quickly become a problem in highly
concurrent environments (many connections making many large database
queries at the same time). See :issue:`311`.

The new system uses a series of chained maps representing polling
points to build the same index data. All connections can share all
the maps for their view of the database and earlier. New polls add
new maps to the front of the list as needed, and old maps are
removed once they are no longer needed by any active transaction.
This simulates the underlying database's MVCC approach.

Other benefits of this approach include:

- No more large polls. While each connection still polls for each
 transaction it enters, they now share state and only poll against
 the last time a poll occurred, not the last time they were used.
 The result should be smaller, more predictable polling.

- Having a model of object visibility allows the cache to use more
 efficient data structures: it can now use the smaller LOBTree to
 reduce the memory occupied by the cache. It also requires
 fewer cache entries overall to store multiple revisions of an
 object, reducing the overhead. And there are no more key copies
 required after a checkpoint change, again reducing overhead and
 making the LRU algorithm more efficient.

- The cache's LRU algorithm is now at the object level, not the
 object/serial pair.

- Objects that are known to have been changed but whose old revision
 is still in the cache are preemptively removed when no references
 to them are possible, reducing cache memory usage.

- The persistent cache can now guarantee not to write out data that
 it knows to be stale.

Dropping checkpoints probably makes memcache less effective, but
memcache hasn't been recommended for a while.
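
The chained-map design in the 3.0a9 cache rewrite resembles Python's collections.ChainMap. Each poll adds a small map at the front, lookups see the newest change first, and snapshots share the older maps instead of copying them. PollChain is a hypothetical toy model built on that idea, not RelStorage's cache.

```python
from collections import ChainMap


class PollChain:
    """Toy model of chained poll maps; not RelStorage's actual cache code."""

    def __init__(self):
        self.view = ChainMap()  # starts with a single empty map

    def poll(self, changes):
        """Put one poll's {oid: tid} changes in front of the older maps."""
        # new_child() builds a new chain that reuses the older dicts, so
        # earlier snapshots stay valid and nothing is copied.
        self.view = self.view.new_child(dict(changes))
        return self.view  # a transaction keeps this as its snapshot

    def drop_oldest(self):
        """Forget the oldest map once no active transaction still needs it."""
        if len(self.view.maps) > 1:
            # Build a new chain so snapshots handed out earlier are untouched.
            self.view = ChainMap(*self.view.maps[:-1])
```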

3.0a8

==================

- Improve the safety of the persistent local cache in high-concurrency
environments using older versions of SQLite. Perform a quick
integrity check on startup and refuse to use the cache files if they
are reported corrupt.

- Switch the order in which object locks are taken: try shared locks
first and only then attempt exclusive locks. Shared locks do not
have to block, so a quick lock timeout here means that a
``ReadConflictError`` is inevitable. This works best on PostgreSQL
and MySQL 8, which support true non-blocking locks. On MySQL 5.7,
non-blocking locks are emulated with a 1s timeout. See :issue:`310`.

.. note:: The transaction machinery will retry read conflict errors
         by default. The more rapid detection of them may lead to
         extra retries if there was a process still finishing its
         commit. Consider adding small sleep backoffs to retry
         logic.
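
Here is one way to add the suggested backoff, shown as a minimal sketch: a retry helper that sleeps for a short, growing, jittered interval between attempts. `TransientError` is a local stand-in for the retryable errors (such as `ReadConflictError`) that transaction middleware sees. `run_with_retries` is a hypothetical helper, not part of the `transaction` package.

```python
import random
import time


class TransientError(Exception):
    """Local stand-in for a retryable error such as ReadConflictError."""


def run_with_retries(func, attempts=3, base_delay=0.05, sleep=time.sleep):
    """Call func(), retrying on TransientError with jittered exponential backoff."""
    for attempt in range(attempts):
        try:
            return func()
        except TransientError:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            # Give a process that is still finishing its commit time to release locks.
            delay = base_delay * (2 ** attempt) * random.uniform(0.5, 1.5)
            sleep(delay)


calls = []


def flaky_commit():
    # Fails twice, then succeeds, to simulate a briefly conflicting writer.
    calls.append(1)
    if len(calls) < 3:
        raise TransientError("read conflict")
    return "committed"


print(run_with_retries(flaky_commit, sleep=lambda s: None))  # committed
```

The `sleep` parameter is injectable so the example can be exercised without real delays.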

- Fix MySQL to immediately roll back its transaction when it gets a
lock timeout, while still in the stored procedure on the database.
Previously it would have required a round trip to the Python
process, which could take an arbitrary amount of time while the
transaction may have still been holding some locks. (After
:issue:`310` they would only be shared locks, but before they would
have been exclusive locks.) This should make for faster recovery in
heavily loaded environments with lots of conflicts. See :issue:`313`.

- Make MySQL clear its temp tables using a single round trip.
Truncation is optional and disabled by default. See :issue:`319`.

- Fix PostgreSQL to not send the definition of the temporary tables
for every transaction. This is only necessary for the first
transaction.

- Improve handling of commit and rollback, especially on PostgreSQL.
We now generate many fewer unneeded rollbacks. See :issue:`289`.

- Stop checking the status of ``readCurrent`` OIDs twice.

- Make the gevent MySQL driver yield more frequently while getting
large result sets. Previously it would block in C to read the entire
result set. Now it yields according to the cursor's ``arraysize``.
See :issue:`315`.

- Polling for changes now iterates the cursor instead of using
``fetchall()``. This can reduce memory usage and provide better
behaviour in a concurrent environment, depending on the cursor
implementation.

- Add three environment variables to control the odds of whether any
given poll actually suggests shifted checkpoints. These are all
floating point numbers between 0 and 1. They are
``RELSTORAGE_CP_REPLACEMENT_CHANCE_WHEN_FULL`` (default 0.7,
i.e., 70%), ``RELSTORAGE_CP_REPLACEMENT_BEGIN_CONSIDERING_PERCENT``
(default 0.8) and ``RELSTORAGE_CP_REPLACEMENT_CHANCE_WHEN_CLOSE``
(default 0.2). (There are corresponding class variables on the
storage cache that could also be set.) Use values of ``1``, ``1``
and ``0`` to restore the old completely deterministic behaviour.
It's not clear whether these will be useful, so they are not
officially options yet but they may become so. Feedback is
appreciated! See :issue:`323`.

.. note::

  These were removed in 3.0a9.

3.0a7
==================

- Eliminate runtime dependency on ZEO. See :issue:`293`.

- Fix a rare race condition allocating OIDs on MySQL. See
:issue:`283`.

- Optimize the ``loadBefore`` method. It appears to be mostly used in
the tests.

- Fix the blob cache cleanup thread to use a real native thread if
we're monkey-patched by gevent, using gevent's thread pool.
Previously, cleaning up the blob cache would block the event loop
for the duration. See :issue:`296`.

- Improve the thread safety and resource usage of blob cache cleanup.
Previously it could spawn many useless threads.

- When caching a newly uploaded blob for a history-free storage, if
there's an older revision of the blob in the cache, and it is not in
use, go ahead and preemptively remove it from disk. This can help
prevent the cache size from growing out of hand and limit the number
of expensive full cache checks required. See :issue:`297`.

- Change the default value of the configuration setting
``shared-blob-dir`` to false, meaning that the default is now to use
a blob cache. If you were using shared blobs before, you'll need to
explicitly set ``shared-blob-dir`` to ``true`` before starting
RelStorage.

- Add an option, ``blob-cache-size-check-external``, that causes the
blob cache cleanup process to run in a subprocess instead of a
thread. This can free up the storage process to handle requests.
This is not recommended on Windows. (``python -m
relstorage.blobhelper.cached /path/to/cache size_in_bytes`` can be
used to run a manual cleanup at any time. This is currently an
internal implementation detail.)

- Abort storage transactions immediately when an exception occurs.
Previously this could be specified by setting the environment
variable ``RELSTORAGE_ABORT_EARLY``. Aborting early releases
database locks to allow other transactions to make progress
immediately. See :issue:`50`.

- Reduce the strength of locks taken by ``Connection.readCurrent`` so
that they don't conflict with other connections that just want to
verify they haven't changed. This also lets us immediately detect a
conflict error with an in-progress transaction that is trying to
alter those objects. See :issue:`302`.

- Make databases that use row-level locks (MySQL and PostgreSQL) raise
specific exceptions on failures to acquire those locks. A different
exception is raised for rows a transaction needs to modify compared
to rows it only needs to read. Both are considered transient to
encourage transaction middleware to retry. See :issue:`303`.

- Move more of the vote phase of transaction commit into a database
stored procedure on MySQL and PostgreSQL, beginning with taking the
row-level locks. This eliminates several more database round trips
and the need for the Python thread (or greenlet) to repeatedly
release and then acquire the GIL while holding global locks. See
:issue:`304`.

- Make conflict resolution require fewer database round trips,
especially on PostgreSQL and MySQL, at the expense of using more
memory. In the ideal case it now only needs one (MySQL) or two
(PostgreSQL) queries. Previously it needed at least twice the number
of trips as there were conflicting objects. On both databases, the
benchmarks are 40% to 80% faster (depending on cache configuration).

3.0a6
==================

Enhancements
------------

- Eliminate a few extra round trips to the database on transaction
completion: One extra ``ROLLBACK`` in all databases, and one query
against the ``transaction`` table in history-preserving databases.
See :issue:`159`.

- Prepare more statements used during regular polling.

- Gracefully handle certain disconnected exceptions when rolling back
connections in between transactions. See :issue:`280`.

- Fix a cache error ("TypeError: NoneType object is not
subscriptable") when an object had been deleted (such as through
undoing its creation transaction, or with ``multi-zodb-gc``).

- Implement ``IExternalGC`` for history-preserving databases. This
lets them be used with `zc.zodbdgc
<https://pypi.org/project/zc.zodbdgc/>`_, allowing for
multi-database garbage collection (see :issue:`76`). Note that you
must pack the database after running ``multi-zodb-gc`` in order to
reclaim space.

.. caution::

  It is critical that ``pack-gc`` be turned off (set to false) in a
  multi-database and that only ``multi-zodb-gc`` be used to perform
  garbage collection.

Packing
~~~~~~~

- Make ``RelStorage.pack()`` also accept a TID from the RelStorage
database to pack to. The usual Unix timestamp form for choosing a
pack time can be ambiguous in the event of multiple transactions
within a very short period of time. This is mostly a concern for
automated tests.

Similarly, it will accept a value less than 0 to mean the most
recent transaction in the database. This is useful when machine
clocks may not be well synchronized, or in automated tests.

Implementation
--------------

- Remove vestigial top-level thread locks. No instance of RelStorage
is thread safe.

RelStorage is an ``IMVCCStorage``, which means that each ZODB
``Connection`` gets its own new storage object. No visible storage
state is shared among Connections. Connections are explicitly
documented as not being thread safe. Since 2.0, RelStorage's
Connection instances have taken advantage of that fact to be a
little lighter weight through not being thread safe. However, they
still paid the overhead of locking method calls and code complexity.

The top-level storage (the one belonging to a ``ZODB.DB``) still
used heavyweight locks in earlier releases. ``ZODB.DB.storage`` is
documented as being only useful for tests, and the ``DB`` object
itself does not expose any operations that use the storage in a way
that would require thread safety.

The remaining thread safety support has been removed. This
simplifies the code and reduces overhead.

If you were previously using the ``ZODB.DB.storage`` object, or a
``RelStorage`` instance you constructed manually, from multiple
threads, instead make sure each thread has a distinct
``RelStorage.new_instance()`` object.
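
The per-thread pattern can be sketched minimally as follows, assuming a root storage object that offers `new_instance()` as described above. `FakeStorage` is a hypothetical stand-in so the example runs without a database. The sketch also assumes a single root storage per process, because the thread-local slot is not keyed by root.

```python
import threading


class FakeStorage:
    """Hypothetical stand-in for a RelStorage instance (not thread safe)."""

    def new_instance(self):
        return FakeStorage()


_local = threading.local()


def storage_for_thread(root):
    # Lazily create one distinct instance per thread; never share it across threads.
    instance = getattr(_local, "storage", None)
    if instance is None:
        instance = _local.storage = root.new_instance()
    return instance


root = FakeStorage()
seen = []


def worker():
    first = storage_for_thread(root)
    again = storage_for_thread(root)
    seen.append((first, first is again))


threads = [threading.Thread(target=worker) for _ in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Each thread reuses its own instance on repeated calls, and no two threads ever receive the same one.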

- A ``RelStorage`` instance now only implements the appropriate subset
of ZODB storage interfaces according to its configuration. For
example, if there is no configured ``blob-dir``, it won't implement
``IBlobStorage``, and if ``keep-history`` is false, it won't
implement ``IStorageUndoable``.

- Refactor RelStorage internals for a cleaner separation of concerns.
This includes how (some) queries are written and managed, making it
easier to prepare statements, but only those actually used.


MySQL
-----

- On MySQL, move allocating a TID into the database. On benchmarks
of a local machine this can be a scant few percent faster, but it's
primarily intended to reduce the number of round-trips to the
database. This is a step towards :issue:`281`. See :pr:`286`.

- On MySQL, set the connection timezone to be UTC. This is necessary
to get values consistent between ``UTC_TIMESTAMP``,
``UNIX_TIMESTAMP``, ``FROM_UNIXTIME``, and Python's ``time.gmtime``,
as used for comparing TIDs.

- On MySQL, move most steps of finishing a transaction into a stored
procedure. Together with the TID allocation changes, this reduces
the number of database queries from::

    1 to lock
  + 1 to get TID
  + 1 to store transaction (0 in history free)
  + 1 to move states
  + 1 for blobs (2 in history free)
  + 1 to set current (0 in history free)
  + 1 to commit
  = 7 or 6 (in history free)

down to 1. This is expected to be especially helpful for gevent
deployments, as the database lock is held, the transaction finalized
and committed, and the database lock released, all without involving
greenlets or greenlet switches. By allowing the GIL to be released
longer it may also be helpful for threaded environments. See
:issue:`281` and :pr:`287` for benchmarks and specifics.

.. caution::

 MySQL 5.7.18 and earlier contain a severe bug that causes the
 server to crash when the stored procedure is executed.


- Make PyMySQL use the same precision as mysqlclient when sending
floating point parameters.

- Automatically detect when MySQL stored procedures in the database
are out of date with the current source in this package and replace
them.

PostgreSQL
----------

- As for MySQL, move allocating a TID into the database.

- As for MySQL, move most steps of finishing a transaction into a
stored procedure. On psycopg2 and psycopg2cffi this is done in a
single database call. With pg8000, however, it still takes two, with
the second call being the COMMIT call that releases locks.

- Speed up getting the approximate number of objects
(``len(storage)``) in a database by using the estimates collected by
the autovacuum process or analyzing tables, instead of asking for a
full table scan.

3.0a5
==================

- Reduce the time that MySQL will wait to perform OID garbage
collection on startup. See :issue:`271`.

- Fix several instances where RelStorage could attempt to perform
operations on a database connection with outstanding results on a
cursor. Some database drivers can react badly to this, depending on
the exact circumstances. For example, mysqlclient can raise
``ProgrammingError: (2014, "Commands out of sync; you can't run this
command now")``. See :issue:`270`.

- Fix the "gevent MySQLdb" driver to be cooperative during ``commit``
and ``rollback`` operations. Previously, it would block the event
loop for the entire time it took to send the commit or rollback
request, the server to perform the request, and the result to be
returned. Now, it frees the event loop after sending the request.
See :issue:`272`.

- Call ``set_min_oid`` less often if a storage is just updating
existing objects, not creating its own.

- Fix an occasional possible deadlock in MySQL's ``set_min_oid``. See
:pr:`276`.

3.0a4
==================

- Add support for the ZODB 5 ``connection.prefetch(*args)`` API. This
takes either OIDs (``obj._p_oid``) or persistent ghost objects, or
an iterator of those things, and asks the storage to load them into
its cache for use in the future. In RelStorage, this uses the shared
cache and so may be useful for more than one thread. This can be
3x or more faster than loading objects on-demand. See :issue:`239`.

- Stop chunking blob uploads on PostgreSQL. All supported PostgreSQL
versions natively handle blobs greater than 2GB in size, and the
server was already chunking the blobs for storage, so our layer of
extra chunking has become unnecessary.

.. important::

  The first time a storage is opened with this version,
  blobs that have multiple chunks will be collapsed into a single
  chunk. If there are many blobs larger than 2GB, this could take
  some time.

  It is recommended you have a backup before installing this
  version.

  To verify that the blobs were correctly migrated, you should
  clean or remove your configured blob-cache directory, forcing new
  blobs to be downloaded.

- Fix a bug that left large objects behind if a PostgreSQL database
containing any blobs was ever zapped (with ``storage.zap_all()``).
The ``zodbconvert`` command, the ``zodbshootout`` command, and the
RelStorage test suite could all zap databases. Running the
``vacuumlo`` command included with PostgreSQL will free such
orphaned large objects, after which a regular ``vacuumdb`` command
can be used to reclaim space. See :issue:`260`.

- Conflict resolution can use data from the cache, thus potentially
eliminating a database hit during a very time-sensitive process.
Please file issues if you encounter any strange behaviour when
concurrently packing to the present time and also resolving
conflicts, in case there are corner cases.

- Packing a storage now invalidates the cached values that were packed
away. For the global caches this helps reduce memory pressure; for
the local cache this helps reduce memory pressure and ensure a more
useful persistent cache (this probably matters most when running on
a single machine).

- Make MySQL use ``ON DUPLICATE KEY UPDATE`` rather than ``REPLACE``.
This can be friendlier to the storage engine as it performs an
in-place ``UPDATE`` rather than a ``DELETE`` followed by an
``INSERT``. See :issue:`189`.

- Make PostgreSQL use an upsert query for moving rows into place on
history-preserving databases.
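
The difference between ``REPLACE`` and an in-place upsert can be illustrated with a sketch that uses Python's built-in `sqlite3` module as a stand-in for MySQL and PostgreSQL. It assumes SQLite 3.24 or later for the `ON CONFLICT ... DO UPDATE` syntax, which mirrors PostgreSQL's; MySQL spells the equivalent `ON DUPLICATE KEY UPDATE`. The table and its `note` column are hypothetical, not RelStorage's schema. ``REPLACE`` deletes the conflicting row and inserts a new one, so columns the statement does not name revert to their defaults. The upsert updates the existing row and leaves them intact.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE object_state ("
    " zoid INTEGER PRIMARY KEY,"
    " tid INTEGER NOT NULL,"
    " note TEXT DEFAULT 'default')"
)
conn.executemany(
    "INSERT INTO object_state (zoid, tid, note) VALUES (?, ?, 'kept')",
    [(1, 100), (2, 100)],
)

# REPLACE: the old row is deleted and a new one inserted, so `note`
# falls back to its column default.
conn.execute("REPLACE INTO object_state (zoid, tid) VALUES (1, 200)")

# Upsert: the existing row is updated in place; `note` is untouched.
conn.execute(
    "INSERT INTO object_state (zoid, tid) VALUES (2, 200) "
    "ON CONFLICT (zoid) DO UPDATE SET tid = excluded.tid"
)

rows = conn.execute("SELECT zoid, tid, note FROM object_state ORDER BY zoid").fetchall()
print(rows)  # [(1, 200, 'default'), (2, 200, 'kept')]
```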

- Support ZODB 5's parallel commit feature. This means that the
database-wide commit lock is taken much later in the process, and
held for a much shorter time than before.

Previously, the commit lock was taken during the ``tpc_vote`` phase,
and held while we checked ``Connection.readCurrent`` values, and
checked for (and hopefully resolved) conflicts. Other transaction
resources (such as other ZODB databases in a multi-db setup) then
got to vote while we held this lock. Finally, in ``tpc_finally``,
objects were moved into place and the lock was released. This
prevented any other storage instances from checking for
``readCurrent`` or conflicts while we were doing that.

Now, ``tpc_vote`` is (usually) able to check
``Connection.readCurrent`` and check and resolve conflicts without
taking the commit lock. Only in ``tpc_finish``, when we need to
finally allocate the transaction ID, is the commit lock taken, and
only held for the duration needed to finally move objects into
place. This allows other storages for this database, and other
transaction resources for this transaction, to proceed with voting,
conflict resolution, etc, in parallel.

Consistent results are maintained by use of object-level row
locking. Thus, only transactions that attempt to modify the same
object will block each other.

There are two exceptions. First, if the ``storage.restore()`` method
is used, the commit lock must be taken very early (before
``tpc_vote``). This is usually only done as part of copying one
database to another. Second, if the storage is configured with a
shared blob directory instead of a blob cache (meaning that blobs
are *only* stored on the filesystem) and the transaction has added
or mutated blobs, the commit lock must be taken somewhat early to
ensure blobs can be saved (after conflict resolution, etc, but
before the end of ``tpc_vote``). It is recommended to store blobs on
the RDBMS server and use a blob cache. The shared blob layout can be
considered deprecated for this reason.

In addition, the new locking scheme means that packing no longer
needs to acquire a commit lock and more work can proceed in parallel
with regular commits. (There may, however, have been some regressions
in the speed of the deletion phase of packing on MySQL; this has not
been benchmarked.)

.. note::

  If the environment variable ``RELSTORAGE_LOCK_EARLY`` is
  set when RelStorage is imported, then parallel commit will not be
  enabled, and the commit lock will be taken at the beginning of
  the tpc_vote phase, just like before: conflict resolution and
  readCurrent will all be handled with the lock held.

  This is intended for use diagnosing and temporarily working
  around bugs, such as the database driver reporting a deadlock
  error. If you find it necessary to use this setting, please
  report an issue at https://github.com/zodb/relstorage/issues.

See :issue:`125`.
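
The following minimal, in-process sketch shows why object-level locks let unrelated commits proceed in parallel. A `threading.Lock` per OID stands in for a database row lock. `RowLocks` and its `hold` method are hypothetical names, not RelStorage's API; a real database enforces this with row locks inside the server rather than in Python. Acquiring locks in sorted OID order is a common way to avoid deadlocks between transactions that touch overlapping objects.

```python
import threading
from collections import defaultdict
from contextlib import contextmanager


class RowLocks:
    """Hypothetical per-OID lock table standing in for database row locks."""

    def __init__(self):
        self._guard = threading.Lock()
        self._locks = defaultdict(threading.Lock)

    def _lock_for(self, oid):
        with self._guard:  # protect the defaultdict while it grows
            return self._locks[oid]

    @contextmanager
    def hold(self, oids, timeout=1.0):
        acquired = []
        try:
            # Lock in a consistent (sorted) order to avoid deadlocks.
            for oid in sorted(set(oids)):
                lock = self._lock_for(oid)
                if not lock.acquire(timeout=timeout):
                    raise TimeoutError(f"could not lock OID {oid}")
                acquired.append(lock)
            yield
        finally:
            for lock in reversed(acquired):
                lock.release()


locks = RowLocks()
results = {}


def other_transaction(name, oids):
    try:
        with locks.hold(oids, timeout=0.05):
            results[name] = "proceeded"
    except TimeoutError:
        results[name] = "blocked"


with locks.hold([1, 2]):  # first transaction modifies objects 1 and 2
    for name, oids in (("disjoint", [3]), ("overlapping", [2, 3])):
        t = threading.Thread(target=other_transaction, args=(name, oids))
        t.start()
        t.join()

print(results)  # {'disjoint': 'proceeded', 'overlapping': 'blocked'}
```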

- Deprecate the option ``shared-blob-dir``. Shared blob dirs prevent
using parallel commits when blobs are part of a transaction.

- Remove the 'umysqldb' driver option. This driver exhibited failures
with row-level locking used for parallel commits. See :issue:`264`.

- Migrate all remaining MySQL tables to InnoDB. This is primarily the
tables used during packing, but also the table used for allocating
new OIDs.

Tables will be converted the first time a storage is opened that is
allowed to create the schema (``create-schema`` in the
configuration; default is true). For large tables, this may take
some time, so it is recommended to finish any outstanding packs
before upgrading RelStorage.

If schema creation is not allowed, and required tables are not using
InnoDB, an exception will be raised. Please contact the RelStorage
maintainers on GitHub if you have a need to use a storage engine
besides InnoDB.

This allows for better error detection during packing with parallel
commits. It is also required for `MySQL Group Replication
<https://dev.mysql.com/doc/refman/8.0/en/group-replication-requirements.html>`_.
Benchmarking also shows that creating new objects can be up to 15%
faster due to faster OID allocation.

Things to be aware of:

 - MySQL's `general conversion notes
   <https://dev.mysql.com/doc/refman/8.0/en/converting-tables-to-innodb.html>`_
   suggest that if you had tuned certain server parameters for
   MyISAM tables (which RelStorage only used during packing) it
   might be good to evaluate those parameters again.
 - InnoDB tables may take more disk space than MyISAM tables.
 - The ``new_oid`` table may temporarily have more rows in it at one
   time than before. They will still be garbage collected
   eventually. The change in strategy was necessary to handle
   concurrent transactions better.

See :issue:`188`.

- Fix an ``OperationalError: database is locked`` that could occur on
startup if multiple processes were reading or writing the cache
database. See :issue:`266`.

3.0a3
==================

- Zapping a storage now also removes any persistent cache files. See
:issue:`241`.

- Zapping a MySQL storage now issues ``DROP TABLE`` statements instead
of ``DELETE FROM`` statements. This is much faster on large
databases. See :issue:`242`.

- Work around the PyPy 7.1 JIT bug when using MySQL Connector/Python. It is no
longer necessary to disable the JIT in PyPy 7.1.

- On PostgreSQL, use PostgreSQL's efficient binary ``COPY FROM`` to
store objects into the database. This can be 20-40% faster. See
:issue:`247`.

- Use more efficient mechanisms to poll the database for current TIDs
when verifying serials in transactions.

- Silence a warning about ``cursor.connection`` from pg8000. See
:issue:`238`.

- Poll the database for the correct TIDs of older transactions when
loading from a persistent cache, and only use the entries if they
are current. This restores the functionality lost in the fix for
:issue:`249`.

- Increase the default cache delta limit sizes.

- Fix a race condition accessing non-shared blobs when the blob cache
limit was reached, which could result in blobs appearing to be
spuriously empty. This was only observed on macOS. See :issue:`219`.

- Fix a bug computing the cache delta maps when restoring from
persistent cache that could cause data from a single transaction to
be stale, leading to spurious conflicts.

3.0a2
==================

- Drop support for PostgreSQL versions earlier than 9.6. See
:issue:`220`.

- Make MySQL and PostgreSQL use a prepared statement to get
transaction IDs. PostgreSQL also uses a prepared statement to set
them. This can be slightly faster. See :issue:`246`.

- Make PostgreSQL use a prepared statement to move objects to their
final destination during commit (history free only). See
:issue:`246`.

- Fix an issue with persistent caches written to from multiple
instances sometimes getting stale data after a restart. Note: This
makes the persistent cache less useful for objects that rarely
change in a database that features other actively changing objects;
it is hoped this can be addressed in the future. See :issue:`249`.

3.0a1
==================

- Add support for Python 3.7.

- Drop support for Python 3.4.

- Drop support for Python 2.7.8 and earlier.

- Drop support for ZODB 4 and ZEO 4.

- Officially drop support for versions of MySQL before 5.7.9. We haven't
been testing on anything older than that for some time, and older
than 5.6 for some time before that.

- Drop the ``poll_interval`` parameter. It has been deprecated with a
warning and ignored since 2.0.0b2. See :issue:`222`.

- Drop support for pg8000 older than 1.11.0.

- Drop support for MySQL Connector/Python older than 8.0.16. Many
older versions are known to be broken. Note that the C extension,
while available, is not currently recommended due to internal
errors. See :issue:`228`.

- Test support for MySQL Connector/Python on PyPy. See :issue:`228`.

.. caution:: Prior to PyPy 7.2 or RelStorage 3.0a3, it is necessary to disable JIT
            inlining due to `a PyPy bug
            <https://bitbucket.org/pypy/pypy/issues/3014/jit-issue-inlining-structunpack-hh>`_
            with ``struct.unpack``.

- Drop support for PyPy older than 5.3.1.

- Drop support for the "MySQL Connector/Python" driver name since it
wasn't possible to know if it would use the C extension or the
Python implementation. Instead, explicitly use the 'Py' or 'C'
prefixed name. See :pr:`229`.

- Drop the internal and undocumented environment variables that could be
used to force configurations that did not specify a database driver
to use a specific driver. Instead, list the driver in the database
configuration.

- Opening a RelStorage configuration object read from ZConfig more
than once would lose the database driver setting, reverting to
'auto'. It now retains the setting. See :issue:`231`.

- Fix Python 3 with mysqlclient 1.4. See :issue:`213`.

- Drop support for mysqlclient < 1.4.

- Make driver names in RelStorage configurations case-insensitive
(e.g., 'MySQLdb' and 'mysqldb' are both valid). See :issue:`227`.

- Rename the column ``transaction.empty`` to ``transaction.is_empty``
for compatibility with MySQL 8.0, where ``empty`` is now a reserved
word. The migration will happen automatically when a storage is
first opened, unless it is configured not to create the schema.

.. note:: This migration has not been tested for Oracle.

.. note:: You must run this migration *before* attempting to upgrade
         a MySQL 5 database to MySQL 8. If you cannot run the
         upgrade through opening the storage, the statement is
         ``ALTER TABLE transaction CHANGE empty is_empty BOOLEAN
         NOT NULL DEFAULT FALSE``.

- Stop getting a warning about invalid optimizer syntax when packing a
MySQL database (especially with the PyMySQL driver). See
:issue:`163`.

- Add ``gevent MySQLdb``, a new driver that cooperates with gevent
while still using the C extensions of ``mysqlclient`` to communicate
with MySQL. This is now recommended over ``umysqldb``, which is
deprecated and will be removed.

- Rewrite the persistent cache implementation. It is now likely to
produce much higher hit rates (100% on some benchmarks, compared to
1-2% before). It is currently slower to read and write, however.
This is a work in progress. See :pr:`243`.

- Add more aggressive validation and, when possible, corrections for
certain types of cache consistency errors. Previously an
``AssertionError`` would be raised with the message "Detected an
inconsistency between RelStorage and the database...". We now
proactively try harder to avoid that situation based on some
educated guesses about when it could happen, and should it still
happen we now reset the cache and raise a type of ``TransientError``
allowing the application to retry. A few instances where previously
incorrect data could be cached may now raise such a
``TransientError``. See :pr:`245`.

Links

PyPI: https://pypi.org/project/relstorage
Changelog: https://pyup.io/changelogs/relstorage/
Docs: https://relstorage.readthedocs.io/

opened by pyup-bot

Bump relstorage[postgresql] from 2.1.1 to 3.3.2
Bumps relstorage[postgresql] from 2.1.1 to 3.3.2.

Changelog

Sourced from relstorage[postgresql]'s changelog.

3.3.2 (2020-09-21)

Fix an UnboundLocalError in case a store connection could not be opened. This error shadowed the original error opening the connection. See 421.

3.3.1 (2020-09-14)

Manylinux wheels: Do not specify the C++ standard to use when compiling. This seemed to result in an incompatibility with manylinux1 systems that was not caught by auditwheel.

3.3.0 (2020-09-14)

The "MySQLdb" driver didn't properly use server-side cursors when requested. This would result in unexpected increased memory usage for things like packing and storage iteration.

Make RelStorage instances implement IStorageCurrentRecordIteration. This lets both history-preserving and history-free storages work with zodbupdate. See 389.

RelStorage instances now pool their storage connection. Depending on the workload and ZODB configuration, this can result in requiring fewer storage connections. See 409 and 417.
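
The pooling idea can be sketched minimally as follows, assuming a `factory` callable that opens a new connection. `ConnectionPool` and `borrow` are hypothetical names, not RelStorage's internal store-connection pool. The sketch opens the connection before entering the `try` block. If opening fails, the original exception propagates unchanged and cleanup code cannot mask it, which avoids the kind of error shadowing the 3.3.2 entry describes.

```python
import threading
from contextlib import contextmanager


class ConnectionPool:
    """Hypothetical borrow/return pool; `factory` opens a new connection."""

    def __init__(self, factory):
        self._factory = factory
        self._idle = []
        self._lock = threading.Lock()
        self.opened = 0

    @contextmanager
    def borrow(self):
        with self._lock:
            conn = self._idle.pop() if self._idle else None
        if conn is None:
            # Open outside the try: a failure here propagates unchanged.
            conn = self._factory()
            with self._lock:
                self.opened += 1
        try:
            yield conn
        finally:
            with self._lock:
                self._idle.append(conn)  # return it for the next borrower


pool = ConnectionPool(factory=object)
with pool.borrow() as first:
    pass
with pool.borrow() as second:
    pass
print(first is second, pool.opened)  # True 1
```

Sequential borrowers share one connection. A second connection is opened only when two borrowers overlap.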

There is a potential semantic change: Under some circumstances, the loadBefore and loadSerial methods could be used to load states from the future (not visible to the storage's load connection) by using the store connection. This ability has been removed.

Add support for Python 3.9.

Drop support for Python 3.5.

Build manylinux x86-64 and macOS wheels on Travis CI as part of the release process. These join the Windows wheels in being automatically uploaded to PyPI.

3.2.1 (2020-08-28)

Improve the speed of loading large cache files by reducing the cost of cache validation.

The timing metrics for current_object_oids are always collected, not just sampled. MySQL and PostgreSQL will only call this method once at startup during persistent cache validation. Other databases may call this method once during the commit process.

Add the ability to limit how long persistent cache validation will spend polling the database for invalid OIDs. Set the environment variable RS_CACHE_POLL_TIMEOUT to a number of seconds before importing RelStorage to use this.

Avoid an AttributeError if a persistent zope.component site manager is installed as the current site, it's a ghost, and we're making a load query for the first time in a particular connection. See 411.

Add some DEBUG level logging around forced invalidations of persistent object caches due to exceeding the cache MVCC limits. See 338.

3.2.0 (2020-07-20)

Make the gevent psycopg2 driver support critical sections. This reduces the amount of gevent switches that occur while database locks are held under a carefully chosen set of circumstances that attempt to balance overall throughput against latency. See 407.

Source distributions: Fix installation when Cython isn't available. Previously it incorrectly assumed a '.c' extension, which led to compiler errors. See 405.

Improve various log messages.

3.1.2 (2020-07-14)

Fix the psycopg2cffi driver inadvertently depending on the psycopg2 package. See 403.

Make the error messages for unavailable drivers include more information on underlying causes.

Log a debug message when an "auto" driver is successfully resolved.

Add a --debug argument to the zodbconvert command line tool to enable DEBUG level logging.

Add support for pg8000 1.16. Previously, a TypeError was raised.

3.1.1 (2020-07-02)

Commits

eca5589 Preparing release 3.3.2

e5e49ea Merge pull request #422 from zodb/issue421

4aef41f Fix UnboundLocalError in StoreConnectionPool.borrowing.

b030d57 Back to development: 3.3.2

a8e2f6f Preparing release 3.3.1

b5f7bab Manylinux wheels: Try not specifying -std=gnu++11

127184d Back to development: 3.3.1

399d08c Preparing release 3.3.0

0026fe1 Add FAQ on backing up a sqlite RelStorage. [skip ci]

f0bb9bb Merge pull request #419 from zodb/wheels-on-travis

Additional commits viewable in compare view

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.

dependencies
opened by dependabot-preview[bot]

Pin coverage to latest version 7.0.4

This PR pins coverage to the latest release 7.0.4.

Changelog

7.0.4

--------------------------

- Performance: an internal cache of file names was accidentally disabled,
resulting in sometimes drastic reductions in performance.  This is now fixed,
closing `issue 1527`_.   Thanks to Ivan Ciuvalschii for the reproducible test
case.

.. _issue 1527: https://github.com/nedbat/coveragepy/issues/1527
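
The 7.0.4 entry concerns an internal cache of file names. As a rough illustration of why such a cache matters, the sketch below memoizes a path-normalizing helper with ``functools.lru_cache``. The name ``canonical_filename`` and its normalization steps are assumptions for illustration, not coverage.py's actual code.

```python
import functools
import os


@functools.lru_cache(maxsize=None)
def canonical_filename(path: str) -> str:
    """Hypothetical helper: absolute, case-normalized form of a file path.

    A measurement tool may look up the same file names many times, so caching
    the result avoids repeating the os.path work on every lookup.
    """
    return os.path.normcase(os.path.abspath(path))
```

If a cache like this is accidentally bypassed, every lookup repeats the path computation, which is the kind of slowdown the entry describes.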


.. _changes_7-0-3:

7.0.3

--------------------------

- Fix: when using pytest-cov or pytest-xdist, or perhaps both, the combining
step could fail with ``assert row is not None`` using 7.0.2.  This was due to
a race condition that has always been possible and is still possible. In
7.0.1 and before, the error was silently swallowed by the combining code.
Now it will produce a message "Couldn't combine data file" and ignore the
data file as it used to do before 7.0.2.  Closes `issue 1522`_.

.. _issue 1522: https://github.com/nedbat/coveragepy/issues/1522
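
The 7.0.3 behavior (report a data file that can't be combined, then skip it instead of aborting) can be sketched as follows. This is a hypothetical stand-in: coverage.py stores its data in SQLite files, whereas the sketch assumes small JSON files mapping file names to executed line numbers, and ``combine_data_files`` is an invented name.

```python
import json
import warnings
from typing import Dict, Iterable, Set


def combine_data_files(paths: Iterable[str]) -> Dict[str, Set[int]]:
    """Hypothetical combiner: union the executed lines across data files.

    A file that can't be read or parsed is reported with a warning and
    ignored, instead of stopping the whole combine step.
    """
    combined: Dict[str, Set[int]] = {}
    for path in paths:
        try:
            with open(path, encoding="utf-8") as f:
                data = json.load(f)
        except (OSError, ValueError) as exc:  # JSONDecodeError is a ValueError
            warnings.warn(f"Couldn't combine data file {path}: {exc}")
            continue
        for filename, lines in data.items():
            combined.setdefault(filename, set()).update(lines)
    return combined
```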


.. _changes_7-0-2:

7.0.2

--------------------------

- Fix: when using the ``[run] relative_files = True`` setting, a relative
``[paths]`` pattern was still being made absolute.  This is now fixed,
closing `issue 1519`_.

- Fix: if Python doesn't provide tomllib, then TOML configuration files can
only be read if coverage.py is installed with the ``[toml]`` extra.
Coverage.py will raise an error if TOML support is not installed when it sees
your settings are in a .toml file. But it didn't understand that
``[tool.coverage]`` was a valid section header, so the error wasn't reported
if you used that header, and settings were silently ignored.  This is now
fixed, closing `issue 1516`_.

- Fix: adjusted how decorators are traced on PyPy 7.3.10, fixing `issue 1515`_.

- Fix: the ``coverage lcov`` report did not properly implement the
``--fail-under=MIN`` option.  This has been fixed.

- Refactor: added many type annotations, including a number of refactorings.
This should not affect outward behavior, but they were a bit invasive in some
places, so keep your eyes peeled for oddities.

- Refactor: removed the vestigial and long untested support for Jython and
IronPython.

.. _issue 1515: https://github.com/nedbat/coveragepy/issues/1515
.. _issue 1516: https://github.com/nedbat/coveragepy/issues/1516
.. _issue 1519: https://github.com/nedbat/coveragepy/issues/1519
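
The TOML fix in 7.0.2 combines two checks: finding a TOML parser (``tomllib`` on Python 3.11+, otherwise the ``tomli`` package that the ``[toml]`` extra installs) and recognizing coverage settings under both a bare ``[tool.coverage]`` table and dotted tables such as ``[tool.coverage.run]``. The sketch below shows that logic; the regular expression and the ``load_coverage_settings`` name are assumptions, not coverage.py's internals.

```python
import re
import sys

# Prefer the standard-library parser (Python 3.11+); otherwise fall back to
# the third-party tomli package, and record None if neither is installed.
if sys.version_info >= (3, 11):
    import tomllib
else:
    try:
        import tomli as tomllib
    except ImportError:
        tomllib = None

# Matches a bare [tool.coverage] table as well as dotted tables such as
# [tool.coverage.run] or [tool.coverage.report].
COVERAGE_HEADER = re.compile(r"^\[tool\.coverage(\.\w+)*\]\s*$", re.MULTILINE)


def load_coverage_settings(toml_text: str) -> dict:
    """Hypothetical loader: return the tool.coverage table, or {} if absent."""
    if not COVERAGE_HEADER.search(toml_text):
        return {}
    if tomllib is None:
        # Settings exist but can't be parsed: fail loudly instead of ignoring them.
        raise RuntimeError("coverage settings found in TOML, but no TOML parser is installed")
    return tomllib.loads(toml_text).get("tool", {}).get("coverage", {})
```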


.. _changes_7-0-1:

7.0.1

--------------------------

- When checking if a file mapping resolved to a file that exists, we weren't
considering files in .whl files.  This is now fixed, closing `issue 1511`_.

- File pattern rules were too strict, forbidding plus signs and curly braces in
directory and file names.  This is now fixed, closing `issue 1513`_.

- Unusual Unicode or control characters in source files could prevent
reporting.  This is now fixed, closing `issue 1512`_.

- The PyPy wheel now installs on PyPy 3.7, 3.8, and 3.9, closing `issue 1510`_.

.. _issue 1510: https://github.com/nedbat/coveragepy/issues/1510
.. _issue 1511: https://github.com/nedbat/coveragepy/issues/1511
.. _issue 1512: https://github.com/nedbat/coveragepy/issues/1512
.. _issue 1513: https://github.com/nedbat/coveragepy/issues/1513


.. _changes_7-0-0:

7.0.0

--------------------------

Nothing new beyond 7.0.0b1.


.. _changes_7-0-0b1:

7.0.0b1


- Changes to file pattern matching, which might require updating your
configuration:

- Previously, ``*`` would incorrectly match directory separators, making
 precise matching difficult.  This is now fixed, closing `issue 1407`_.

- Now ``**`` matches any number of nested directories, including none.

- Improvements to combining data files when using the
:ref:`config_run_relative_files` setting:

- During ``coverage combine``, relative file paths are implicitly combined
 without needing a ``[paths]`` configuration setting.  This also fixed
 `issue 991`_.

- A ``[paths]`` setting like ``*/foo`` will now match ``foo/bar.py`` so that
 relative file paths can be combined more easily.

- The setting is properly interpreted in more places, fixing `issue 1280`_.

- Fixed environment variable expansion in pyproject.toml files.  It was overly
broad, causing errors outside of coverage.py settings, as described in `issue
1481`_ and `issue 1345`_.  This is now fixed, but in rare cases will require
changing your pyproject.toml to quote non-string values that use environment
substitution.

- Fixed internal logic that prevented coverage.py from running on
implementations other than CPython or PyPy (`issue 1474`_).

.. _issue 991: https://github.com/nedbat/coveragepy/issues/991
.. _issue 1280: https://github.com/nedbat/coveragepy/issues/1280
.. _issue 1345: https://github.com/nedbat/coveragepy/issues/1345
.. _issue 1407: https://github.com/nedbat/coveragepy/issues/1407
.. _issue 1474: https://github.com/nedbat/coveragepy/issues/1474
.. _issue 1481: https://github.com/nedbat/coveragepy/issues/1481
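
The 7.0.0b1 pattern-matching rules (``*`` stays inside one path component, ``**`` spans any number of nested directories, including none) can be illustrated by translating a pattern into a regular expression. This is a simplified sketch that assumes forward-slash paths and ignores platform details such as Windows separators; ``pattern_to_regex`` is a hypothetical name, not coverage.py's matcher.

```python
import re


def pattern_to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate a file pattern into a compiled regex (simplified sketch).

    - "**/" matches zero or more whole directories.
    - "**" not followed by "/" matches anything, including separators.
    - "*" matches within a single path component only.
    - "?" matches one character other than a separator.
    Everything else, including "+" and "{", is matched literally.
    """
    parts = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**/", i):
            parts.append(r"(?:[^/]*/)*")
            i += 3
        elif pattern.startswith("**", i):
            parts.append(r".*")
            i += 2
        elif pattern[i] == "*":
            parts.append(r"[^/]*")
            i += 1
        elif pattern[i] == "?":
            parts.append(r"[^/]")
            i += 1
        else:
            parts.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(parts) + r"\Z")
```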


.. _changes_6-6-0b1:

6.6.0b1

----------------------------

- Changes to file pattern matching, which might require updating your
configuration:

- Previously, ``*`` would incorrectly match directory separators, making
 precise matching difficult.  This is now fixed, closing `issue 1407`_.

- Now ``**`` matches any number of nested directories, including none.

- Improvements to combining data files when using the
:ref:`config_run_relative_files` setting, which might require updating your
configuration:

- During ``coverage combine``, relative file paths are implicitly combined
 without needing a ``[paths]`` configuration setting.  This also fixed
 `issue 991`_.

- A ``[paths]`` setting like ``*/foo`` will now match ``foo/bar.py`` so that
 relative file paths can be combined more easily.

- The :ref:`config_run_relative_files` setting is properly interpreted in
 more places, fixing `issue 1280`_.

- When remapping file paths with ``[paths]``, a path will be remapped only if
the resulting path exists.  The documentation has long said the prefix had to
exist, but it was never enforced.  This fixes `issue 608`_, improves `issue
649`_, and closes `issue 757`_.

- Reporting operations now implicitly use the ``[paths]`` setting to remap file
paths within a single data file.  Combining multiple files still requires the
``coverage combine`` step, but this simplifies some single-file situations.
Closes `issue 1212`_ and `issue 713`_.

- The ``coverage report`` command now has a ``--format=`` option.  The original
style is now ``--format=text``, and is the default.

- Using ``--format=markdown`` will write the table in Markdown format, thanks
 to `Steve Oswald <pull 1479_>`_, closing `issue 1418`_.

- Using ``--format=total`` will write a single total number to the
 output.  This can be useful for making badges or writing status updates.

- Combining data files with ``coverage combine`` now hashes the data files to
skip files that add no new information.  This can reduce the time needed.
Many details affect the speed-up, but for coverage.py's own test suite,
combining is about 40% faster. Closes `issue 1483`_.

- When searching for completely un-executed files, coverage.py uses the
presence of ``__init__.py`` files to determine which directories have source
that could have been imported.  However, `implicit namespace packages`_ don't
require ``__init__.py``.  A new setting ``[report]
include_namespace_packages`` tells coverage.py to consider these directories
during reporting.  Thanks to `Felix Horvat <pull 1387_>`_ for the
contribution.  Closes `issue 1383`_ and `issue 1024`_.

- Fixed environment variable expansion in pyproject.toml files.  It was overly
broad, causing errors outside of coverage.py settings, as described in `issue
1481`_ and `issue 1345`_.  This is now fixed, but in rare cases will require
changing your pyproject.toml to quote non-string values that use environment
substitution.

- An empty file has a coverage total of 100%, but used to fail with
``--fail-under``.  This has been fixed, closing `issue 1470`_.

- The text report table no longer writes out two separator lines if there are
no files listed in the table.  One is plenty.

- Fixed a mis-measurement of a strange use of wildcard alternatives in
match/case statements, closing `issue 1421`_.

- Fixed internal logic that prevented coverage.py from running on
implementations other than CPython or PyPy (`issue 1474`_).

- The deprecated ``[run] note`` setting has been completely removed.

.. _implicit namespace packages: https://peps.python.org/pep-0420/
.. _issue 608: https://github.com/nedbat/coveragepy/issues/608
.. _issue 649: https://github.com/nedbat/coveragepy/issues/649
.. _issue 713: https://github.com/nedbat/coveragepy/issues/713
.. _issue 757: https://github.com/nedbat/coveragepy/issues/757
.. _issue 991: https://github.com/nedbat/coveragepy/issues/991
.. _issue 1024: https://github.com/nedbat/coveragepy/issues/1024
.. _issue 1212: https://github.com/nedbat/coveragepy/issues/1212
.. _issue 1280: https://github.com/nedbat/coveragepy/issues/1280
.. _issue 1345: https://github.com/nedbat/coveragepy/issues/1345
.. _issue 1383: https://github.com/nedbat/coveragepy/issues/1383
.. _issue 1407: https://github.com/nedbat/coveragepy/issues/1407
.. _issue 1418: https://github.com/nedbat/coveragepy/issues/1418
.. _issue 1421: https://github.com/nedbat/coveragepy/issues/1421
.. _issue 1470: https://github.com/nedbat/coveragepy/issues/1470
.. _issue 1474: https://github.com/nedbat/coveragepy/issues/1474
.. _issue 1481: https://github.com/nedbat/coveragepy/issues/1481
.. _issue 1483: https://github.com/nedbat/coveragepy/issues/1483
.. _pull 1387: https://github.com/nedbat/coveragepy/pull/1387
.. _pull 1479: https://github.com/nedbat/coveragepy/pull/1479


.. _changes_6-5-0:

6.5.0

--------------------------

- The JSON report now includes details of which branches were taken, and which
are missing for each file. Thanks, `Christoph Blessing <pull 1438_>`_. Closes
`issue 1425`_.

- Starting with coverage.py 6.2, ``class`` statements were marked as a branch.
This wasn't right, and has been reverted, fixing `issue 1449`_. Note this
will very slightly reduce your coverage total if you are measuring branch
coverage.

- Packaging is now compliant with `PEP 517`_, closing `issue 1395`_.

- A new debug option ``--debug=pathmap`` shows details of the remapping of
paths that happens during combine due to the ``[paths]`` setting.

- Fix an internal problem with caching of invalid Python parsing. Found by
OSS-Fuzz, fixing their `bug 50381`_.

.. _bug 50381: https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=50381
.. _PEP 517: https://peps.python.org/pep-0517/
.. _issue 1395: https://github.com/nedbat/coveragepy/issues/1395
.. _issue 1425: https://github.com/nedbat/coveragepy/issues/1425
.. _issue 1449: https://github.com/nedbat/coveragepy/issues/1449
.. _pull 1438: https://github.com/nedbat/coveragepy/pull/1438


.. _changes_6-4-4:

6.4.4

--------------------------

- Wheels are now provided for Python 3.11.


.. _changes_6-4-3:

6.4.3

--------------------------

- Fix a failure when combining data files if the file names contained glob-like
patterns.  Thanks, `Michael Krebs and Benjamin Schubert <pull 1405_>`_.

- Fix a messaging failure when combining Windows data files on a different
drive than the current directory, closing `issue 1428`_.  Thanks, `Lorenzo
Micò <pull 1430_>`_.

- Fix path calculations when running in the root directory, as you might do in
a Docker container. Thanks `Arthur Rio <pull 1403_>`_.

- Filtering in the HTML report wouldn't work when reloading the index page.
This is now fixed.  Thanks, `Marc Legendre <pull 1413_>`_.

- Fix a problem with Cython code measurement, closing `issue 972`_.  Thanks,
`Matus Valo <pull 1347_>`_.

.. _issue 972: https://github.com/nedbat/coveragepy/issues/972
.. _issue 1428: https://github.com/nedbat/coveragepy/issues/1428
.. _pull 1347: https://github.com/nedbat/coveragepy/pull/1347
.. _pull 1403: https://github.com/nedbat/coveragepy/issues/1403
.. _pull 1405: https://github.com/nedbat/coveragepy/issues/1405
.. _pull 1413: https://github.com/nedbat/coveragepy/issues/1413
.. _pull 1430: https://github.com/nedbat/coveragepy/pull/1430
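
The first 6.4.3 fix involves data file names containing glob-like patterns. A common Python defense against that class of bug is ``glob.escape`` on the literal parts of a pattern. The sketch below assumes parallel data files share the ``.coverage.`` prefix; ``find_data_files`` is a hypothetical helper, not coverage.py's code.

```python
import glob
import os
from typing import List


def find_data_files(directory: str, basename: str = ".coverage") -> List[str]:
    """Hypothetical helper: list parallel data files such as .coverage.host.123.

    glob.escape() makes "[", "*" and "?" in the directory or base name match
    literally, so only the trailing ".*" acts as a wildcard.
    """
    pattern = os.path.join(glob.escape(directory), glob.escape(basename) + ".*")
    return sorted(glob.glob(pattern))
```

Without the escaping, a directory named ``run[1]`` would be read as a character class matching ``run1``, and the real directory would never be found.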


.. _changes_6-4-2:

6.4.2

--------------------------

- Updated for a small change in Python 3.11.0 beta 4: modules now start with a
line with line number 0, which is ignored.  This line cannot be executed, so
coverage totals were thrown off.  This line is now ignored by coverage.py,
but this also means that truly empty modules (like ``__init__.py``) have no
lines in them, rather than one phantom line.  Fixes `issue 1419`_.

- Internal debugging data added to sys.modules is now an actual module, to
avoid confusing code that examines everything in sys.modules.  Thanks,
`Yilei Yang <pull 1399_>`_.

.. _issue 1419: https://github.com/nedbat/coveragepy/issues/1419
.. _pull 1399: https://github.com/nedbat/coveragepy/pull/1399


.. _changes_6-4-1:

6.4.1

--------------------------

- Greatly improved performance on PyPy, and other environments that need the
pure Python trace function.  Thanks, Carl Friedrich Bolz-Tereick (`pull
1381`_ and `pull 1388`_).  Slightly improved performance when using the C
trace function, as most environments do.  Closes `issue 1339`_.

- The conditions for using tomllib from the standard library have been made
more precise, so that 3.11 alphas will continue to work. Closes `issue
1390`_.

.. _issue 1339: https://github.com/nedbat/coveragepy/issues/1339
.. _pull 1381: https://github.com/nedbat/coveragepy/pull/1381
.. _pull 1388: https://github.com/nedbat/coveragepy/pull/1388
.. _issue 1390: https://github.com/nedbat/coveragepy/issues/1390


.. _changes_64:

6.4

------------------------

- A new setting, :ref:`config_run_sigterm`, controls whether a SIGTERM signal
handler is used.  In 6.3, the signal handler was always installed, to capture
data at unusual process ends.  Unfortunately, this introduced other problems
(see `issue 1310`_).  Now the signal handler is only used if you opt-in by
setting ``[run] sigterm = true``.

- Small changes to the HTML report:

- Added links to next and previous file, and more keyboard shortcuts: ``[``
 and ``]`` for next file and previous file; ``u`` for up to the index; and
 ``?`` to open/close the help panel.  Thanks, `J. M. F. Tsang
 <pull 1364_>`_.

- The time stamp and version are displayed at the top of the report.  Thanks,
 `Ammar Askar <pull 1354_>`_. Closes `issue 1351`_.

- A new debug option ``debug=sqldata`` adds more detail to ``debug=sql``,
logging all the data being written to the database.

- Previously, running ``coverage report`` (or any of the reporting commands) in
an empty directory would create a .coverage data file.  Now they do not,
fixing `issue 1328`_.

- On Python 3.11, the ``[toml]`` extra no longer installs tomli, instead using
tomllib from the standard library.  Thanks `Shantanu <pull 1359_>`_.

- In-memory CoverageData objects now properly update(), closing `issue 1323`_.

.. _issue 1310: https://github.com/nedbat/coveragepy/issues/1310
.. _issue 1323: https://github.com/nedbat/coveragepy/issues/1323
.. _issue 1328: https://github.com/nedbat/coveragepy/issues/1328
.. _issue 1351: https://github.com/nedbat/coveragepy/issues/1351
.. _pull 1354: https://github.com/nedbat/coveragepy/pull/1354
.. _pull 1359: https://github.com/nedbat/coveragepy/pull/1359
.. _pull 1364: https://github.com/nedbat/coveragepy/pull/1364


.. _changes_633:

6.3.3

--------------------------

- Fix: Coverage.py now builds successfully on CPython 3.11 (3.11.0b1) again.
Closes `issue 1367`_.  Some results for generators may have changed.

.. _issue 1367: https://github.com/nedbat/coveragepy/issues/1367


.. _changes_632:

6.3.2

--------------------------

- Fix: adapt to pypy3.9's decorator tracing behavior.  It now traces function
decorators like CPython 3.8: both the @-line and the def-line are traced.
Fixes `issue 1326`_.

- Debug: added ``pybehave`` to the list of :ref:`coverage debug <cmd_debug>`
and :ref:`cmd_run_debug` options.

- Fix: show an intelligible error message if ``--concurrency=multiprocessing``
is used without a configuration file.  Closes `issue 1320`_.

.. _issue 1320: https://github.com/nedbat/coveragepy/issues/1320
.. _issue 1326: https://github.com/nedbat/coveragepy/issues/1326


.. _changes_631:

6.3.1

--------------------------

- Fix: deadlocks could occur when terminating processes.  Some of these
deadlocks (described in `issue 1310`_) are now fixed.

- Fix: a signal handler was being set from multiple threads, causing an error:
"ValueError: signal only works in main thread".  This is now fixed, closing
`issue 1312`_.

- Fix: ``--precision`` on the command-line was being ignored while considering
``--fail-under``.  This is now fixed, thanks to
`Marcelo Trylesinski <pull 1317_>`_.

- Fix: releases no longer provide 3.11.0-alpha wheels. Coverage.py uses CPython
internal fields which are moving during the alpha phase. Fixes `issue 1316`_.

.. _issue 1310: https://github.com/nedbat/coveragepy/issues/1310
.. _issue 1312: https://github.com/nedbat/coveragepy/issues/1312
.. _issue 1316: https://github.com/nedbat/coveragepy/issues/1316
.. _pull 1317: https://github.com/nedbat/coveragepy/pull/1317
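
The 6.3.1 error comes from a Python rule: ``signal.signal()`` may only be called from the main thread, and raises ``ValueError`` otherwise. A minimal guard checks the current thread first. ``install_sigterm_handler`` is a hypothetical helper for illustration, not coverage.py's code.

```python
import signal
import threading
from typing import Callable


def install_sigterm_handler(handler: Callable) -> bool:
    """Hypothetical helper: install a SIGTERM handler only from the main thread.

    Returns True if the handler was installed, or False if the call came from
    another thread, where signal.signal() would raise ValueError.
    """
    if threading.current_thread() is not threading.main_thread():
        return False
    signal.signal(signal.SIGTERM, handler)
    return True
```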


.. _changes_63:

6.3

------------------------

- Feature: Added the ``lcov`` command to generate reports in LCOV format.
Thanks, `Bradley Burns <pull 1289_>`_. Closes issues `587 <issue 587_>`_
and `626 <issue 626_>`_.

- Feature: the coverage data file can now be specified on the command line with
the ``--data-file`` option in any command that reads or writes data.  This is
in addition to the existing ``COVERAGE_FILE`` environment variable.  Closes
`issue 624`_. Thanks, `Nikita Bloshchanevich <pull 1304_>`_.

- Feature: coverage measurement data will now be written when a SIGTERM signal
is received by the process.  This includes
:meth:`Process.terminate <python:multiprocessing.Process.terminate>`,
and other ways to terminate a process.  Currently this is only on Linux and
Mac; Windows is not supported.  Fixes `issue 1307`_.

- Dropped support for Python 3.6, which reached end-of-life on 2021-12-23.

- Updated Python 3.11 support to 3.11.0a4, fixing `issue 1294`_.

- Fix: the coverage data file is now created in a more robust way, to avoid
problems when multiple processes are trying to write data at once. Fixes
issues `1303 <issue 1303_>`_ and `883 <issue 883_>`_.

- Fix: a .gitignore file will only be written into the HTML report output
directory if the directory is empty.  This should prevent certain unfortunate
accidents of writing the file where it is not wanted.

- Releases now have MacOS arm64 wheels for Apple Silicon, fixing `issue 1288`_.

.. _issue 587: https://github.com/nedbat/coveragepy/issues/587
.. _issue 624: https://github.com/nedbat/coveragepy/issues/624
.. _issue 626: https://github.com/nedbat/coveragepy/issues/626
.. _issue 883: https://github.com/nedbat/coveragepy/issues/883
.. _issue 1288: https://github.com/nedbat/coveragepy/issues/1288
.. _issue 1294: https://github.com/nedbat/coveragepy/issues/1294
.. _issue 1303: https://github.com/nedbat/coveragepy/issues/1303
.. _issue 1307: https://github.com/nedbat/coveragepy/issues/1307
.. _pull 1289: https://github.com/nedbat/coveragepy/pull/1289
.. _pull 1304: https://github.com/nedbat/coveragepy/pull/1304
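
LCOV, the format written by the new ``lcov`` command, is line-oriented text: ``SF:`` names a source file, each ``DA:<line>,<count>`` line records an execution count, ``LF`` and ``LH`` give the lines found and hit, and ``end_of_record`` closes that file's block. The sketch below writes one such record from sets of executed and missing lines. ``lcov_record`` is a hypothetical helper; because it only knows whether a line ran, it writes a count of 1 or 0 rather than a true hit count.

```python
from typing import Iterable


def lcov_record(path: str, executed: Iterable[int], missing: Iterable[int]) -> str:
    """Hypothetical helper: format one LCOV record for a single source file."""
    ran = set(executed)
    all_lines = ran | set(missing)
    out = [f"SF:{path}"]
    for lineno in sorted(all_lines):
        # 1 if the line executed, 0 if it was measurable but never ran.
        out.append(f"DA:{lineno},{1 if lineno in ran else 0}")
    out.append(f"LF:{len(all_lines)}")  # lines found (measurable lines)
    out.append(f"LH:{len(ran)}")        # lines hit
    out.append("end_of_record")
    return "\n".join(out) + "\n"
```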


.. _changes_62:

6.2

------------------------

- Feature: The ``--concurrency`` setting can now have a list of values, so
that threads and another lightweight threading package can be measured
together, such as ``--concurrency=gevent,thread``.  Closes `issue 1012`_ and
`issue 1082`_.

- Fix: A module specified as the ``source`` setting is imported during startup,
before the user program imports it.  This could cause problems if the rest of
the program isn't ready yet.  For example, `issue 1203`_ describes a Django
setting that is accessed before settings have been configured.  Now the early
import is wrapped in a try/except so errors then don't stop execution.

- Fix: A colon in a decorator expression would cause an exclusion to end too
early, preventing the exclusion of the decorated function. This is now fixed.

- Fix: The HTML report now will not overwrite a .gitignore file that already
exists in the HTML output directory (follow-on for `issue 1244`_).

- API: The exceptions raised by Coverage.py have been specialized, to provide
finer-grained catching of exceptions by third-party code.

- API: Using ``suffix=False`` when constructing a Coverage object with
multiprocessing wouldn't suppress the data file suffix (`issue 989`_).  This
is now fixed.

- Debug: The ``coverage debug data`` command will now sniff out combinable data
files, and report on all of them.

- Debug: The ``coverage debug`` command used to accept a number of topics at a
time, and show all of them, though this was never documented.  This no longer
works, to allow for command-line options in the future.

.. _issue 989: https://github.com/nedbat/coveragepy/issues/989
.. _issue 1012: https://github.com/nedbat/coveragepy/issues/1012
.. _issue 1082: https://github.com/nedbat/coveragepy/issues/1082
.. _issue 1203: https://github.com/nedbat/coveragepy/issues/1203


.. _changes_612:

6.1.2

--------------------------

- Python 3.11 is supported (tested with 3.11.0a2).  One still-open issue has to
do with `exits through with-statements <issue 1270_>`_.

- Fix: When remapping file paths through the ``[paths]`` setting while
combining, the ``[run] relative_files`` setting was ignored, resulting in
absolute paths for remapped file names (`issue 1147`_).  This is now fixed.

- Fix: Complex conditionals over excluded lines could have incorrectly reported
a missing branch (`issue 1271`_). This is now fixed.

- Fix: More exceptions are now handled when trying to parse source files for
reporting.  Problems that used to terminate coverage.py can now be handled
with ``[report] ignore_errors``.  This helps with plugins failing to read
files (`django_coverage_plugin issue 78`_).

- Fix: Removed another vestige of jQuery from the source tarball
(`issue 840`_).

- Fix: Added a default value for a new-to-6.x argument of an internal class.
This unsupported class is being used by coveralls (`issue 1273`_). Although
I'd rather not "fix" unsupported interfaces, it's actually nicer with a
default value.

.. _django_coverage_plugin issue 78: https://github.com/nedbat/django_coverage_plugin/issues/78
.. _issue 1147: https://github.com/nedbat/coveragepy/issues/1147
.. _issue 1270: https://github.com/nedbat/coveragepy/issues/1270
.. _issue 1271: https://github.com/nedbat/coveragepy/issues/1271
.. _issue 1273: https://github.com/nedbat/coveragepy/issues/1273


.. _changes_611:

6.1.1

--------------------------

- Fix: The sticky header on the HTML report didn't work unless you had branch
coverage enabled. This is now fixed: the sticky header works for everyone.
(Do people still use coverage without branch measurement!? j/k)

- Fix: When using explicitly declared namespace packages, the "already imported
a file that will be measured" warning would be issued (`issue 888`_).  This
is now fixed.

.. _issue 888: https://github.com/nedbat/coveragepy/issues/888


.. _changes_61:

6.1

------------------------

- Deprecated: The ``annotate`` command and the ``Coverage.annotate`` function
will be removed in a future version, unless people let me know that they are
using it.  Instead, the ``html`` command gives better-looking (and more
accurate) output, and the ``report -m`` command will tell you line numbers of
missing lines.  Please get in touch if you have a reason to use ``annotate``
over those better options: ned@nedbatchelder.com.

- Feature: Coverage now sets an environment variable, ``COVERAGE_RUN`` when
running your code with the ``coverage run`` command.  The value is not
important, and may change in the future.  Closes `issue 553`_.

- Feature: The HTML report pages for Python source files now have a sticky
header so the file name and controls are always visible.

- Feature: The ``xml`` and ``json`` commands now describe what they wrote
where.

- Feature: The ``html``, ``combine``, ``xml``, and ``json`` commands all accept
a ``-q/--quiet`` option to suppress the messages they write to stdout about
what they are doing (`issue 1254`_).

- Feature: The ``html`` command writes a ``.gitignore`` file into the HTML
output directory, to prevent the report from being committed to git.  If you
want to commit it, you will need to delete that file.  Closes `issue 1244`_.

- Feature: Added support for PyPy 3.8.

- Fix: More generated code is now excluded from measurement.  Code such as
`attrs`_ boilerplate, or doctest code, was being measured though the
synthetic line numbers meant they were never reported.  Once Cython was
involved though, the generated .so files were parsed as Python, raising
syntax errors, as reported in `issue 1160`_.  This is now fixed.

- Fix: When sorting human-readable names, numeric components are sorted
correctly: file10.py will appear after file9.py.  This applies to file names,
module names, environment variables, and test contexts.

- Performance: Branch coverage measurement is faster, though you might only
notice on code that is executed many times, such as long-running loops.

- Build: jQuery is no longer used or vendored (`issue 840`_ and `issue 1118`_).
Huge thanks to Nils Kattenbeck (septatrix) for the conversion to vanilla
JavaScript in `pull request 1248`_.

.. _issue 553: https://github.com/nedbat/coveragepy/issues/553
.. _issue 840: https://github.com/nedbat/coveragepy/issues/840
.. _issue 1118: https://github.com/nedbat/coveragepy/issues/1118
.. _issue 1160: https://github.com/nedbat/coveragepy/issues/1160
.. _issue 1244: https://github.com/nedbat/coveragepy/issues/1244
.. _pull request 1248: https://github.com/nedbat/coveragepy/pull/1248
.. _issue 1254: https://github.com/nedbat/coveragepy/issues/1254
.. _attrs: https://www.attrs.org/
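
Sorting human-readable names so that file10.py comes after file9.py is usually done by splitting each name into text and digit runs and comparing the digit runs as integers. The sketch below shows that technique; ``human_sort_key`` is a hypothetical name, not coverage.py's function.

```python
import re
from typing import List, Union


def human_sort_key(text: str) -> List[Union[str, int]]:
    """Split text into alternating non-digit and digit runs.

    re.split with a capturing group always puts non-digit runs at even
    indexes and digit runs at odd indexes, so two keys never compare a
    str against an int.
    """
    pieces = re.split(r"(\d+)", text)
    return [int(piece) if i % 2 else piece for i, piece in enumerate(pieces)]
```

Plain string sorting would put ``file10.py`` before ``file9.py`` because ``"1" < "9"``; the key compares ``10`` and ``9`` as numbers instead.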


.. _changes_602:

6.0.2

--------------------------

- Namespace packages being measured weren't properly handled by the new code
that ignores third-party packages. If the namespace package was installed, it
was ignored as a third-party package.  That problem (`issue 1231`_) is now
fixed.

- Packages named as "source packages" (with ``source``, or ``source_pkgs``, or
pytest-cov's ``--cov``) might have been only partially measured.  Their
top-level statements could be marked as un-executed, because they were
imported by coverage.py before measurement began (`issue 1232`_).  This is
now fixed, but the package will be imported twice, once by coverage.py, then
again by your test suite.  This could cause problems if importing the package
has side effects.

- The :meth:`.CoverageData.contexts_by_lineno` method was documented to return
a dict, but was returning a defaultdict.  Now it returns a plain dict.  It
also no longer returns negative numbered keys.

.. _issue 1231: https://github.com/nedbat/coveragepy/issues/1231
.. _issue 1232: https://github.com/nedbat/coveragepy/issues/1232
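
The ``contexts_by_lineno`` change matters because a ``defaultdict`` silently inserts a key whenever a missing one is read, so a caller expecting a plain dict can watch it grow. The sketch below groups ``(line number, context)`` pairs with a defaultdict internally, skips negative numbers (assuming, as in coverage.py's arc data, that they mark entry and exit points), and returns a plain dict. The input shape is an assumption, not CoverageData's storage format.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def contexts_by_lineno(pairs: Iterable[Tuple[int, str]]) -> Dict[int, List[str]]:
    """Hypothetical: map each line number to the contexts that executed it."""
    grouped: Dict[int, List[str]] = defaultdict(list)
    for lineno, context in pairs:
        if lineno < 0:
            continue  # skip negative numbers, e.g. entry/exit markers
        grouped[lineno].append(context)
    # Convert to a plain dict so reading a missing key raises KeyError
    # instead of quietly inserting an empty list.
    return dict(grouped)
```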


.. _changes_601:

6.0.1

--------------------------

- In 6.0, the coverage.py exceptions moved from coverage.misc to
coverage.exceptions. These exceptions are not part of the public supported
API, CoverageException is. But a number of other third-party packages were
importing the exceptions from coverage.misc, so they are now available from
there again (`issue 1226`_).

- Changed an internal detail of how tomli is imported, so that tomli can use
coverage.py for their own test suite (`issue 1228`_).

- Defend against an obscure possibility under code obfuscation, where a
function can have an argument called "self", but no local named "self"
(`pull request 1210`_).  Thanks, Ben Carlsson.

.. _pull request 1210: https://github.com/nedbat/coveragepy/pull/1210
.. _issue 1226: https://github.com/nedbat/coveragepy/issues/1226
.. _issue 1228: https://github.com/nedbat/coveragepy/issues/1228


.. _changes_60:

6.0

------------------------

- The ``coverage html`` command now prints a message indicating where the HTML
report was written.  Fixes `issue 1195`_.

- The ``coverage combine`` command now prints messages indicating each data
file being combined.  Fixes `issue 1105`_.

- The HTML report now includes a sentence about skipped files due to
``skip_covered`` or ``skip_empty`` settings.  Fixes `issue 1163`_.

- Unrecognized options in the configuration file are no longer errors. They are
now warnings, to ease the use of coverage across versions.  Fixes `issue
1035`_.

- Fix handling of exceptions through context managers in Python 3.10. A missing
exception is no longer considered a missing branch from the with statement.
Fixes `issue 1205`_.

- Fix another rarer instance of "Error binding parameter 0 - probably
unsupported type." (`issue 1010`_).

- Creating a directory for the coverage data file now is safer against
conflicts when two coverage runs happen simultaneously (`pull 1220`_).
Thanks, Clément Pit-Claudel.

.. _issue 1035: https://github.com/nedbat/coveragepy/issues/1035
.. _issue 1105: https://github.com/nedbat/coveragepy/issues/1105
.. _issue 1163: https://github.com/nedbat/coveragepy/issues/1163
.. _issue 1195: https://github.com/nedbat/coveragepy/issues/1195
.. _issue 1205: https://github.com/nedbat/coveragepy/issues/1205
.. _pull 1220: https://github.com/nedbat/coveragepy/pull/1220
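
The 6.0 change from errors to warnings for unrecognized configuration options can be sketched with the standard library's ``configparser``. The set of known options below is a tiny hypothetical subset chosen for illustration, not coverage.py's full list, and ``check_options`` is an invented name.

```python
import configparser
import warnings
from typing import Dict, List, Set

# Hypothetical subset of recognized options, for illustration only.
KNOWN_OPTIONS: Dict[str, Set[str]] = {
    "run": {"branch", "source", "parallel"},
    "report": {"fail_under", "show_missing"},
}


def check_options(config_text: str) -> List[str]:
    """Warn about (rather than reject) options this version doesn't recognize."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    unknown = []
    for section in parser.sections():
        for option in parser.options(section):
            if option not in KNOWN_OPTIONS.get(section, set()):
                unknown.append(f"[{section}] {option}")
                warnings.warn(f"Unrecognized option '[{section}] {option}'")
    return unknown
```

Warning instead of failing lets one configuration file serve several versions, where a newer option is simply unknown to an older release.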


.. _changes_60b1:

6.0b1

--------------------------

- Dropped support for Python 2.7, PyPy 2, and Python 3.5.

- Added support for the Python 3.10 ``match/case`` syntax.

- Data collection is now thread-safe.  There may have been rare instances of
exceptions raised in multi-threaded programs.

- Plugins (like the `Django coverage plugin`_) were generating "Already
imported a file that will be measured" warnings about Django itself.  These
have been fixed, closing `issue 1150`_.

- Warnings generated by coverage.py are now real Python warnings.

- Using ``--fail-under=100`` with coverage near 100% could result in the
self-contradictory message :code:`total of 100 is less than fail-under=100`.
This bug (`issue 1168`_) is now fixed.

- The ``COVERAGE_DEBUG_FILE`` environment variable now accepts ``stdout`` and
``stderr`` to write to those destinations.

- TOML parsing now uses the `tomli`_ library.

- Some minor changes to usually invisible details of the HTML report:

- Use a modern hash algorithm when fingerprinting, for high-security
 environments (`issue 1189`_).  When generating the HTML report, we save the
 hash of the data, to avoid regenerating an unchanged HTML page. We used to
 use MD5 to generate the hash, and now use SHA-3-256.  This was never a
 security concern, but security scanners would notice the MD5 algorithm and
 raise a false alarm.

- Change how report file names are generated, to avoid leading underscores
 (`issue 1167`_), to avoid rare file name collisions (`issue 584`_), and to
 avoid file names becoming too long (`issue 580`_).

.. _Django coverage plugin: https://pypi.org/project/django-coverage-plugin/
.. _issue 580: https://github.com/nedbat/coveragepy/issues/580
.. _issue 584: https://github.com/nedbat/coveragepy/issues/584
.. _issue 1150: https://github.com/nedbat/coveragepy/issues/1150
.. _issue 1167: https://github.com/nedbat/coveragepy/issues/1167
.. _issue 1168: https://github.com/nedbat/coveragepy/issues/1168
.. _issue 1189: https://github.com/nedbat/coveragepy/issues/1189
.. _tomli: https://pypi.org/project/tomli/
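The skip-unchanged-pages idea from the HTML report item can be sketched with the standard library. This sketch assumes a simple in-memory ``status`` dict that maps each report page to the hash of its inputs; coverage.py keeps its own status file. The names ``fingerprint``, ``needs_rewrite`` and ``status`` are hypothetical. The only part relevant to security scanners is the hash choice: ``hashlib.sha3_256`` instead of ``hashlib.md5``.

```python
import hashlib
from typing import Dict


def fingerprint(data: bytes) -> str:
    """Return a SHA-3-256 hex digest of the data behind one report page."""
    return hashlib.sha3_256(data).hexdigest()


def needs_rewrite(page: str, data: bytes, status: Dict[str, str]) -> bool:
    """Return True (and record the new hash) if ``page`` must be regenerated."""
    digest = fingerprint(data)
    if status.get(page) == digest:
        return False  # same inputs as last run: keep the existing HTML page
    status[page] = digest
    return True


status: Dict[str, str] = {}
print(needs_rewrite("mod_py.html", b"lines=1,2,3", status))  # True: first run
print(needs_rewrite("mod_py.html", b"lines=1,2,3", status))  # False: unchanged
print(needs_rewrite("mod_py.html", b"lines=1,2", status))    # True: data changed
```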

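One way to meet the three file-naming goals is to replace the directory part of a path with a short, fixed-length hash. The goals are: no leading underscore produced by a leading ``/``, no collisions between same-named files in different directories, and bounded name length. The sketch below illustrates that assumption only. ``flat_rootname`` and the ``d_`` prefix are hypothetical here, not a documented coverage.py interface.

```python
import hashlib
import ntpath


def flat_rootname(filename: str) -> str:
    """Turn a source path into a short, flat, collision-resistant page name."""
    # ntpath.split understands both "/" and "\\" separators.
    dirname, basename = ntpath.split(filename)
    if dirname:
        # A fixed-length hash of the directory keeps names short and distinct.
        digest = hashlib.sha3_256(dirname.encode("utf-8")).hexdigest()[:16]
        prefix = f"d_{digest}_"
    else:
        prefix = ""
    return prefix + basename.replace(".", "_")


print(flat_rootname("mod.py"))  # mod_py
```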

.. _changes_56b1:

5.6b1

--------------------------

Note: 5.6 final was never released. These changes are part of 6.0.

- Third-party packages are now ignored in coverage reporting.  This solves a
few problems:

- Coverage will no longer report about other people's code (`issue 876`_).
 This is true even when using ``--source=.`` with a venv in the current
 directory.

- Coverage will no longer generate "Already imported a file that will be
 measured" warnings about coverage itself (`issue 905`_).

- The HTML report uses j/k to move up and down among the highlighted chunks of
code.  They used to highlight the current chunk, but 5.0 broke that behavior.
Now the highlighting is working again.

- The JSON report now includes ``percent_covered_display``, a string with the
total percentage, rounded to the same number of decimal places as the other
reports' totals.

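A report consumer can read the pre-rounded string directly instead of re-rounding ``percent_covered`` itself. This is a minimal sketch that assumes a JSON report written by ``coverage json`` (default file ``coverage.json``, coverage.py 6.0 or later) whose ``totals`` object carries the key. ``total_display`` is a hypothetical helper.

```python
import json
from pathlib import Path


def total_display(report_text: str) -> str:
    """Return the rounded total string stored in a coverage.py JSON report."""
    totals = json.loads(report_text)["totals"]
    # Already rounded to the configured precision, matching the other reports' totals.
    return totals["percent_covered_display"]


if Path("coverage.json").exists():
    print(total_display(Path("coverage.json").read_text(encoding="utf-8")))
```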
.. _issue 876: https://github.com/nedbat/coveragepy/issues/876
.. _issue 905: https://github.com/nedbat/coveragepy/issues/905


.. _changes_55:

5.5

------------------------

- ``coverage combine`` has a new option, ``--keep`` to keep the original data
files after combining them.  The default is still to delete the files after
they have been combined.  This was requested in `issue 1108`_ and implemented
in `pull request 1110`_.  Thanks, Éric Larivière.

- When reporting missing branches in ``coverage report``, branches that jump to
missing lines aren't reported.  This adds to the long-standing behavior of not
reporting branches from missing lines.  Now branches are only reported if both
the source and destination lines are executed.  Closes both `issue 1065`_ and
`issue 955`_.

- Minor improvements to the HTML report:

- The state of the line visibility selector buttons is saved in local storage
 so you don't have to fiddle with them so often, fixing `issue 1123`_.

- It has a little more room for line numbers so that 4-digit numbers work
 well, fixing `issue 1124`_.

- Improved the error message when combining line and branch data, so that users
will be more likely to understand what's happening, closing `issue 803`_.

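The 5.5 reporting rule for missing branches can be expressed as a filter over missing branch arcs. This minimal sketch assumes two things. First, arcs are ``(source, destination)`` pairs of positive line numbers; real branch data also uses negative numbers for exits, which this ignores. Second, the set of executed lines is known. ``reportable_missing_branches`` is a hypothetical name.

```python
from typing import Iterable, List, Set, Tuple

Arc = Tuple[int, int]


def reportable_missing_branches(missing: Iterable[Arc], executed: Set[int]) -> List[Arc]:
    """Keep only missing branches whose source and destination both ran."""
    # A branch from or to an unexecuted line is skipped: that line is
    # already reported as missing, so the branch would add no information.
    return [(src, dst) for src, dst in missing if src in executed and dst in executed]


print(reportable_missing_branches([(1, 3), (1, 5), (6, 7)], {1, 2, 3, 4}))  # [(1, 3)]
```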
.. _issue 803: https://github.com/nedbat/coveragepy/issues/803
.. _issue 955: https://github.com/nedbat/coveragepy/issues/955
.. _issue 1065: https://github.com/nedbat/coveragepy/issues/1065
.. _issue 1108: https://github.com/nedbat/coveragepy/issues/1108
.. _pull request 1110: https://github.com/nedbat/coveragepy/pull/1110
.. _issue 1123: https://github.com/nedbat/coveragepy/issues/1123
.. _issue 1124: https://github.com/nedbat/coveragepy/issues/1124


.. _changes_54:

5.4

------------------------

- The text report produced by ``coverage report`` now always outputs a TOTAL
line, even if only one Python file is reported.  This makes regex parsing
of the output easier.  Thanks, Judson Neer.  This had been requested a number
of times (`issue 1086`_, `issue 922`_, `issue 732`_).

- The ``skip_covered`` and ``skip_empty`` settings in the configuration file
can now be specified in the ``[html]`` section, so that text reports and HTML
reports can use separate settings.  The HTML report will still use the
``[report]`` settings if there isn't a value in the ``[html]`` section.
Closes `issue 1090`_.

- Combining files on Windows across drives now works properly, fixing `issue
577`_.  Thanks, `Valentin Lab <pr1080_>`_.

- Fix an obscure warning from deep in the _decimal module, as reported in
`issue 1084`_.

- Update to support Python 3.10 alphas in progress, including `PEP 626: Precise
line numbers for debugging and other tools <pep626_>`_.

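Because every text report now ends with a TOTAL line, a script can extract the overall percentage with one regular expression. This is a minimal sketch. The sample report text is illustrative, and the pattern assumes the percentage is the last column, as in the default text format without ``-m``. ``total_percent`` is a hypothetical helper.

```python
import re
from typing import Optional

# The TOTAL row ends with the overall percentage, e.g. "TOTAL   10   2   80%".
TOTAL_RE = re.compile(r"^TOTAL\s.*?(\d+(?:\.\d+)?)%\s*$", re.MULTILINE)


def total_percent(report_text: str) -> Optional[float]:
    """Return the TOTAL percentage from ``coverage report`` output, if present."""
    match = TOTAL_RE.search(report_text)
    return float(match.group(1)) if match else None


sample = """\
Name      Stmts   Miss  Cover
-----------------------------
mod.py       10      2    80%
-----------------------------
TOTAL        10      2    80%
"""
print(total_percent(sample))  # 80.0
```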
.. _issue 577: https://github.com/nedbat/coveragepy/issues/577
.. _issue 732: https://github.com/nedbat/coveragepy/issues/732
.. _issue 922: https://github.com/nedbat/coveragepy/issues/922
.. _issue 1084: https://github.com/nedbat/coveragepy/issues/1084
.. _issue 1086: https://github.com/nedbat/coveragepy/issues/1086
.. _issue 1090: https://github.com/nedbat/coveragepy/issues/1090
.. _pr1080: https://github.com/nedbat/coveragepy/pull/1080
.. _pep626: https://www.python.org/dev/peps/pep-0626/


.. _changes_531:

5.3.1

--------------------------

- When using ``--source`` on a large source tree, v5.x was slower than previous
versions.  This performance regression is now fixed, closing `issue 1037`_.

- Mysterious SQLite errors can happen on PyPy, as reported in `issue 1010`_. An
immediate retry seems to fix the problem, although it is an unsatisfying
solution.

- The HTML report now saves the sort order in a more widely supported way,
fixing `issue 986`_.  Thanks, Sebastián Ramírez (`pull request 1066`_).

- The HTML report pages now have a :ref:`Sleepy Snake <sleepy>` favicon.

- Wheels are now provided for manylinux2010, and for PyPy3 (pp36 and pp37).

- Continuous integration has moved from Travis and AppVeyor to GitHub Actions.

.. _issue 986: https://github.com/nedbat/coveragepy/issues/986
.. _issue 1037: https://github.com/nedbat/coveragepy/issues/1037
.. _issue 1010: https://github.com/nedbat/coveragepy/issues/1010
.. _pull request 1066: https://github.com/nedbat/coveragepy/pull/1066

.. _changes_53:

5.3

------------------------

- The ``source`` setting has always been interpreted as either a file path or a
module, depending on which existed.  If both interpretations were valid, it
was assumed to be a file path.  The new ``source_pkgs`` setting can be used
to name a package to disambiguate this case.  Thanks, Thomas Grainger. Fixes
`issue 268`_.

- If a plugin was disabled due to an exception, we used to still try to record
its information, causing an exception, as reported in `issue 1011`_.  This is
now fixed.
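The ambiguity that ``source_pkgs`` resolves can be illustrated with a small classifier. This sketch follows only the rule stated in the 5.3 notes: for ``source``, an existing path wins over a module name, and every ``source_pkgs`` entry is always a package. ``classify_sources`` is hypothetical and does not reproduce coverage.py's internals.

```python
import os
from typing import Iterable, List, Tuple


def classify_sources(source: Iterable[str], source_pkgs: Iterable[str] = ()) -> List[Tuple[str, str]]:
    """Label each entry as a file-system path or an importable package name."""
    result = []
    for entry in source:
        # ``source``: if a file or directory by that name exists, it is a path.
        result.append((entry, "path" if os.path.exists(entry) else "module"))
    for pkg in source_pkgs:
        # ``source_pkgs``: always a package name, even if a same-named path exists.
        result.append((pkg, "module"))
    return result
```

Suppose a ``mypkg`` directory exists in the current folder. Then ``classify_sources(["mypkg"])`` treats it as a path, while ``classify_sources([], ["mypkg"])`` treats it as a package.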

.. _issue 268: https://github.com/nedbat/coveragepy/issues/268
.. _issue 1011: https://github.com/nedbat/coveragepy/issues/1011


.. endchangesinclude

Older changes
-------------

The complete history is available in the `coverage.py docs`__.

__ https://coverage.readthedocs.io/en/latest/changes.html

Links

PyPI: https://pypi.org/project/coverage
Changelog: https://pyup.io/changelogs/coverage/
Repo: https://github.com/nedbat/coveragepy

opened by pyup-bot

Pin coverage to latest version 7.0.3

This PR pins coverage to the latest release 7.0.3.

Changelog

7.0.3

--------------------------

- Fix: when using pytest-cov or pytest-xdist, or perhaps both, the combining
step could fail with ``assert row is not None`` using 7.0.2.  This was due to
a race condition that has always been possible and is still possible. In
7.0.1 and before, the error was silently swallowed by the combining code.
Now it will produce a message "Couldn't combine data file" and ignore the
data file as it used to do before 7.0.2.  Closes `issue 1522`_.

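The restored 7.0.3 behavior follows a common pattern: skip an unreadable data file with a message instead of failing the whole combine step. This is a minimal sketch. It assumes a caller-supplied ``read_data`` function that raises on a damaged file. ``combine_tolerantly`` and ``fake_read`` are hypothetical, and only the message text comes from the changelog.

```python
from typing import Callable, Dict, Iterable, Set


def combine_tolerantly(
    paths: Iterable[str],
    read_data: Callable[[str], Dict[str, Set[int]]],
) -> Dict[str, Set[int]]:
    """Merge per-file line data from several data files, skipping bad ones."""
    combined: Dict[str, Set[int]] = {}
    for path in paths:
        try:
            data = read_data(path)
        except Exception as exc:  # e.g. a half-written file from a racing process
            print(f"Couldn't combine data file {path}: {exc}")
            continue  # ignore this file and keep going
        for source_file, lines in data.items():
            combined.setdefault(source_file, set()).update(lines)
    return combined


def fake_read(path: str) -> Dict[str, Set[int]]:
    """Stand-in reader for the example: 'bad' simulates a corrupt file."""
    if path == "bad":
        raise ValueError("row is None")
    return {"mod.py": {1, 2}} if path == "a" else {"mod.py": {3}}


print(combine_tolerantly(["a", "bad", "b"], fake_read))
```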
.. _issue 1522: https://github.com/nedbat/coveragepy/issues/1522


.. _changes_7-0-2:

7.0.2

--------------------------

- Fix: when using the ``[run] relative_files = True`` setting, a relative
``[paths]`` pattern was still being made absolute.  This is now fixed,
closing `issue 1519`_.

- Fix: if Python doesn't provide tomllib, then TOML configuration files can
only be read if coverage.py is installed with the ``[toml]`` extra.
Coverage.py will raise an error if TOML support is not installed when it sees
your settings are in a .toml file. But it didn't understand that
``[tool.coverage]`` was a valid section header, so the error wasn't reported
if you used that header, and settings were silently ignored.  This is now
fixed, closing `issue 1516`_.

- Fix: adjusted how decorators are traced on PyPy 7.3.10, fixing `issue 1515`_.

- Fix: the ``coverage lcov`` report did not properly implement the
``--fail-under=MIN`` option.  This has been fixed.

- Refactor: added many type annotations, including a number of refactorings.
This should not affect outward behavior, but they were a bit invasive in some
places, so keep your eyes peeled for oddities.

- Refactor: removed the vestigial and long untested support for Jython and
IronPython.
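The detection problem in the tomllib item can be sketched with the standard library alone. This sketch assumes settings live under pyproject.toml's ``[tool.coverage]`` table or its sub-tables, such as ``[tool.coverage.run]``. ``mentions_coverage`` and ``check_toml_support`` are hypothetical helpers. Falling back to ``tomli`` mirrors what the ``[toml]`` extra provides on Pythons older than 3.11.

```python
import re

try:
    import tomllib  # standard library on Python 3.11+
except ImportError:
    try:
        import tomli as tomllib  # third-party backport (the ``[toml]`` extra)
    except ImportError:
        tomllib = None

# Matches "[tool.coverage]" as well as "[tool.coverage.run]", "[tool.coverage.report]", ...
COVERAGE_HEADER = re.compile(r"^\[tool\.coverage(\.|\])", re.MULTILINE)


def mentions_coverage(toml_text: str) -> bool:
    """Return True if the TOML text appears to contain coverage.py settings."""
    return COVERAGE_HEADER.search(toml_text) is not None


def check_toml_support(toml_text: str) -> None:
    """Fail loudly instead of silently ignoring settings that cannot be parsed."""
    if tomllib is None and mentions_coverage(toml_text):
        raise RuntimeError("TOML settings found, but no TOML parser is installed")
```

Without the closing-bracket alternative in the pattern, a bare ``[tool.coverage]`` header would go unnoticed. That is the kind of silent miss the item describes.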

.. _issue 1515: https://github.com/nedbat/coveragepy/issues/1515
.. _issue 1516: https://github.com/nedbat/coveragepy/issues/1516
.. _issue 1519: https://github.com/nedbat/coveragepy/issues/1519


.. _changes_7-0-1:

7.0.1

--------------------------

- When checking if a file mapping resolved to a file that exists, we weren't
considering files in .whl files.  This is now fixed, closing `issue 1511`_.

- File pattern rules were too strict, forbidding plus signs and curly braces in
directory and file names.  This is now fixed, closing `issue 1513`_.

- Unusual Unicode or control characters in source files could prevent
reporting.  This is now fixed, closing `issue 1512`_.

- The PyPy wheel now installs on PyPy 3.7, 3.8, and 3.9, closing `issue 1510`_.

.. _issue 1510: https://github.com/nedbat/coveragepy/issues/1510
.. _issue 1511: https://github.com/nedbat/coveragepy/issues/1511
.. _issue 1512: https://github.com/nedbat/coveragepy/issues/1512
.. _issue 1513: https://github.com/nedbat/coveragepy/issues/1513


.. _changes_7-0-0:

7.0.0

--------------------------

Nothing new beyond 7.0.0b1.


.. _changes_7-0-0b1:

7.0.0b1

--------------------------

- Changes to file pattern matching, which might require updating your
configuration:

- Previously, ``*`` would incorrectly match directory separators, making
 precise matching difficult.  This is now fixed, closing `issue 1407`_.

- Now ``**`` matches any number of nested directories, including none.

- Improvements to combining data files when using the
:ref:`config_run_relative_files` setting:

- During ``coverage combine``, relative file paths are implicitly combined
 without needing a ``[paths]`` configuration setting.  This also fixed
 `issue 991`_.

- A ``[paths]`` setting like ``*/foo`` will now match ``foo/bar.py`` so that
 relative file paths can be combined more easily.

- The setting is properly interpreted in more places, fixing `issue 1280`_.

- Fixed environment variable expansion in pyproject.toml files.  It was overly
broad, causing errors outside of coverage.py settings, as described in `issue
1481`_ and `issue 1345`_.  This is now fixed, but in rare cases will require
changing your pyproject.toml to quote non-string values that use environment
substitution.

- Fixed internal logic that prevented coverage.py from running on
implementations other than CPython or PyPy (`issue 1474`_).
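The new ``*`` and ``**`` rules can be illustrated by translating a pattern into a regular expression. This is a simplified sketch, not coverage.py's actual matcher. It assumes ``/`` as the only separator and handles only ``*``, ``**`` and ``?``. ``glob_to_regex`` is a hypothetical name.

```python
import re


def glob_to_regex(pattern: str) -> "re.Pattern[str]":
    """Compile a file pattern where ``*`` stays within one path component."""
    parts = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**/", i):
            parts.append(r"(?:[^/]+/)*")  # zero or more whole directories
            i += 3
        elif pattern.startswith("**", i):
            parts.append(r".*")  # trailing "**": anything, separators included
            i += 2
        elif pattern[i] == "*":
            parts.append(r"[^/]*")  # never crosses a directory separator
            i += 1
        elif pattern[i] == "?":
            parts.append(r"[^/]")  # exactly one non-separator character
            i += 1
        else:
            parts.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(parts) + r"\Z")


print(bool(glob_to_regex("src/*.py").match("src/sub/mod.py")))   # False
print(bool(glob_to_regex("src/**/mod.py").match("src/mod.py")))  # True
```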

.. _issue 991: https://github.com/nedbat/coveragepy/issues/991
.. _issue 1280: https://github.com/nedbat/coveragepy/issues/1280
.. _issue 1345: https://github.com/nedbat/coveragepy/issues/1345
.. _issue 1407: https://github.com/nedbat/coveragepy/issues/1407
.. _issue 1474: https://github.com/nedbat/coveragepy/issues/1474
.. _issue 1481: https://github.com/nedbat/coveragepy/issues/1481


.. _changes_6-6-0b1:

6.6.0b1

--------------------------

- Changes to file pattern matching, which might require updating your
configuration:

- Previously, ``*`` would incorrectly match directory separators, making
 precise matching difficult.  This is now fixed, closing `issue 1407`_.

- Now ``**`` matches any number of nested directories, including none.

- Improvements to combining data files when using the
:ref:`config_run_relative_files` setting, which might require updating your
configuration:

- During ``coverage combine``, relative file paths are implicitly combined
 without needing a ``[paths]`` configuration setting.  This also fixed
 `issue 991`_.

- A ``[paths]`` setting like ``*/foo`` will now match ``foo/bar.py`` so that
 relative file paths can be combined more easily.

- The :ref:`config_run_relative_files` setting is properly interpreted in
 more places, fixing `issue 1280`_.

- When remapping file paths with ``[paths]``, a path will be remapped only if
the resulting path exists.  The documentation has long said the prefix had to
exist, but it was never enforced.  This fixes `issue 608`_, improves `issue
649`_, and closes `issue 757`_.

- Reporting operations now implicitly use the ``[paths]`` setting to remap file
paths within a single data file.  Combining multiple files still requires the
``coverage combine`` step, but this simplifies some single-file situations.
Closes `issue 1212`_ and `issue 713`_.

- The ``coverage report`` command now has a ``--format=`` option.  The original
style is now ``--format=text``, and is the default.

- Using ``--format=markdown`` will write the table in Markdown format, thanks
 to `Steve Oswald <pull 1479_>`_, closing `issue 1418`_.

- Using ``--format=total`` will write a single total number to the
 output.  This can be useful for making badges or writing status updates.

- Combining data files with ``coverage combine`` now hashes the data files to
skip files that add no new information.  This can reduce the time needed.
Many details affect the speed-up, but for coverage.py's own test suite,
combining is about 40% faster. Closes `issue 1483`_.

- When searching for completely un-executed files, coverage.py uses the
presence of ``__init__.py`` files to determine which directories have source
that could have been imported.  However, `implicit namespace packages`_ don't
require ``__init__.py``.  A new setting ``[report]
include_namespace_packages`` tells coverage.py to consider these directories
during reporting.  Thanks to `Felix Horvat <pull 1387_>`_ for the
contribution.  Closes `issue 1383`_ and `issue 1024`_.

- Fixed environment variable expansion in pyproject.toml files.  It was overly
broad, causing errors outside of coverage.py settings, as described in `issue
1481`_ and `issue 1345`_.  This is now fixed, but in rare cases will require
changing your pyproject.toml to quote non-string values that use environment
substitution.

- An empty file has a coverage total of 100%, but used to fail with
``--fail-under``.  This has been fixed, closing `issue 1470`_.

- The text report table no longer writes out two separator lines if there are
no files listed in the table.  One is plenty.

- Fixed a mis-measurement of a strange use of wildcard alternatives in
match/case statements, closing `issue 1421`_.

- Fixed internal logic that prevented coverage.py from running on
implementations other than CPython or PyPy (`issue 1474`_).

- The deprecated ``[run] note`` setting has been completely removed.

.. _implicit namespace packages: https://peps.python.org/pep-0420/
.. _issue 608: https://github.com/nedbat/coveragepy/issues/608
.. _issue 649: https://github.com/nedbat/coveragepy/issues/649
.. _issue 713: https://github.com/nedbat/coveragepy/issues/713
.. _issue 757: https://github.com/nedbat/coveragepy/issues/757
.. _issue 991: https://github.com/nedbat/coveragepy/issues/991
.. _issue 1024: https://github.com/nedbat/coveragepy/issues/1024
.. _issue 1212: https://github.com/nedbat/coveragepy/issues/1212
.. _issue 1280: https://github.com/nedbat/coveragepy/issues/1280
.. _issue 1345: https://github.com/nedbat/coveragepy/issues/1345
.. _issue 1383: https://github.com/nedbat/coveragepy/issues/1383
.. _issue 1407: https://github.com/nedbat/coveragepy/issues/1407
.. _issue 1418: https://github.com/nedbat/coveragepy/issues/1418
.. _issue 1421: https://github.com/nedbat/coveragepy/issues/1421
.. _issue 1470: https://github.com/nedbat/coveragepy/issues/1470
.. _issue 1474: https://github.com/nedbat/coveragepy/issues/1474
.. _issue 1481: https://github.com/nedbat/coveragepy/issues/1481
.. _issue 1483: https://github.com/nedbat/coveragepy/issues/1483
.. _pull 1387: https://github.com/nedbat/coveragepy/pull/1387
.. _pull 1479: https://github.com/nedbat/coveragepy/pull/1479


.. _changes_6-5-0:

6.5.0

--------------------------

- The JSON report now includes details of which branches were taken, and which
are missing for each file. Thanks, `Christoph Blessing <pull 1438_>`_. Closes
`issue 1425`_.

- Starting with coverage.py 6.2, ``class`` statements were marked as a branch.
This wasn't right, and has been reverted, fixing `issue 1449`_. Note this
will very slightly reduce your coverage total if you are measuring branch
coverage.

- Packaging is now compliant with `PEP 517`_, closing `issue 1395`_.

- A new debug option ``--debug=pathmap`` shows details of the remapping of
paths that happens during combine due to the ``[paths]`` setting.

- Fix an internal problem with caching of invalid Python parsing. Found by
OSS-Fuzz, fixing their `bug 50381`_.
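The per-file branch details in the 6.5.0 JSON report can be consumed with a few lines of code. This minimal sketch assumes the report was produced with branch measurement enabled, so each entry under ``files`` carries ``missing_branches`` as a list of ``[from, to]`` line pairs. Check the key names against the JSON your version writes. ``missing_branch_summary`` is hypothetical.

```python
import json
from typing import Dict, List, Tuple


def missing_branch_summary(report_text: str) -> Dict[str, List[Tuple[int, int]]]:
    """Map each measured file to its missing branch arcs from a JSON report."""
    files = json.loads(report_text)["files"]
    return {
        # Files without branch data (or with none missing) map to an empty list.
        name: [(arc[0], arc[1]) for arc in info.get("missing_branches", [])]
        for name, info in files.items()
    }
```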

.. _bug 50381: https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=50381
.. _PEP 517: https://peps.python.org/pep-0517/
.. _issue 1395: https://github.com/nedbat/coveragepy/issues/1395
.. _issue 1425: https://github.com/nedbat/coveragepy/issues/1425
.. _issue 1449: https://github.com/nedbat/coveragepy/issues/1449
.. _pull 1438: https://github.com/nedbat/coveragepy/pull/1438


.. _changes_6-4-4:

6.4.4

--------------------------

- Wheels are now provided for Python 3.11.


.. _changes_6-4-3:

6.4.3

--------------------------

- Fix a failure when combining data files if the file names contained glob-like
patterns.  Thanks, `Michael Krebs and Benjamin Schubert <pull 1405_>`_.

- Fix a messaging failure when combining Windows data files on a different
drive than the current directory, closing `issue 1428`_.  Thanks, `Lorenzo
Micò <pull 1430_>`_.

- Fix path calculations when running in the root directory, as you might do in
a Docker container. Thanks, `Arthur Rio <pull 1403_>`_.

- Filtering in the HTML report wouldn't work when reloading the index page.
This is now fixed.  Thanks, `Marc Legendre <pull 1413_>`_.

- Fix a problem with Cython code measurement, closing `issue 972`_.  Thanks,
`Matus Valo <pull 1347_>`_.

.. _issue 972: https://github.com/nedbat/coveragepy/issues/972
.. _issue 1428: https://github.com/nedbat/coveragepy/issues/1428
.. _pull 1347: https://github.com/nedbat/coveragepy/pull/1347
.. _pull 1403: https://github.com/nedbat/coveragepy/issues/1403
.. _pull 1405: https://github.com/nedbat/coveragepy/issues/1405
.. _pull 1413: https://github.com/nedbat/coveragepy/issues/1413
.. _pull 1430: https://github.com/nedbat/coveragepy/pull/1430


.. _changes_6-4-2:

6.4.2

--------------------------

- Updated for a small change in Python 3.11.0 beta 4: modules now start with a
line with line number 0, which is ignored.  This line cannot be executed, so
coverage totals were thrown off.  This line is now ignored by coverage.py,
but this also means that truly empty modules (like ``__init__.py``) have no
lines in them, rather than one phantom line.  Fixes `issue 1419`_.

- Internal debugging data added to sys.modules is now an actual module, to
avoid confusing code that examines everything in sys.modules.  Thanks,
`Yilei Yang <pull 1399_>`_.

.. _issue 1419: https://github.com/nedbat/coveragepy/issues/1419
.. _pull 1399: https://github.com/nedbat/coveragepy/pull/1399


.. _changes_6-4-1:

6.4.1

--------------------------

- Greatly improved performance on PyPy, and other environments that need the
pure Python trace function.  Thanks, Carl Friedrich Bolz-Tereick (`pull
1381`_ and `pull 1388`_).  Slightly improved performance when using the C
trace function, as most environments do.  Closes `issue 1339`_.

- The conditions for using tomllib from the standard library have been made
more precise, so that 3.11 alphas will continue to work. Closes `issue
1390`_.

.. _issue 1339: https://github.com/nedbat/coveragepy/issues/1339
.. _pull 1381: https://github.com/nedbat/coveragepy/pull/1381
.. _pull 1388: https://github.com/nedbat/coveragepy/pull/1388
.. _issue 1390: https://github.com/nedbat/coveragepy/issues/1390


.. _changes_64:

6.4

------------------------

- A new setting, :ref:`config_run_sigterm`, controls whether a SIGTERM signal
handler is used.  In 6.3, the signal handler was always installed, to capture
data at unusual process ends.  Unfortunately, this introduced other problems
(see `issue 1310`_).  Now the signal handler is only used if you opt-in by
setting ``[run] sigterm = true``.

- Small changes to the HTML report:

- Added links to next and previous file, and more keyboard shortcuts: ``[``
 and ``]`` for next file and previous file; ``u`` for up to the index; and
 ``?`` to open/close the help panel.  Thanks, `J. M. F. Tsang
 <pull 1364_>`_.

- The time stamp and version are displayed at the top of the report.  Thanks,
 `Ammar Askar <pull 1354_>`_. Closes `issue 1351`_.

- A new debug option ``debug=sqldata`` adds more detail to ``debug=sql``,
logging all the data being written to the database.

- Previously, running ``coverage report`` (or any of the reporting commands) in
an empty directory would create a .coverage data file.  Now they do not,
fixing `issue 1328`_.

- On Python 3.11, the ``[toml]`` extra no longer installs tomli, instead using
tomllib from the standard library.  Thanks, `Shantanu <pull 1359_>`_.

- In-memory CoverageData objects now properly update(), closing `issue 1323`_.

.. _issue 1310: https://github.com/nedbat/coveragepy/issues/1310
.. _issue 1323: https://github.com/nedbat/coveragepy/issues/1323
.. _issue 1328: https://github.com/nedbat/coveragepy/issues/1328
.. _issue 1351: https://github.com/nedbat/coveragepy/issues/1351
.. _pull 1354: https://github.com/nedbat/coveragepy/pull/1354
.. _pull 1359: https://github.com/nedbat/coveragepy/pull/1359
.. _pull 1364: https://github.com/nedbat/coveragepy/pull/1364


.. _changes_633:

6.3.3

--------------------------

- Fix: Coverage.py now builds successfully on CPython 3.11 (3.11.0b1) again.
Closes `issue 1367`_.  Some results for generators may have changed.

.. _issue 1367: https://github.com/nedbat/coveragepy/issues/1367


.. _changes_632:

6.3.2

--------------------------

- Fix: adapt to pypy3.9's decorator tracing behavior.  It now traces function
decorators like CPython 3.8: both the @-line and the def-line are traced.
Fixes `issue 1326`_.

- Debug: added ``pybehave`` to the list of :ref:`coverage debug <cmd_debug>`
and :ref:`cmd_run_debug` options.

- Fix: show an intelligible error message if ``--concurrency=multiprocessing``
is used without a configuration file.  Closes `issue 1320`_.

.. _issue 1320: https://github.com/nedbat/coveragepy/issues/1320
.. _issue 1326: https://github.com/nedbat/coveragepy/issues/1326


.. _changes_631:

6.3.1

--------------------------

- Fix: deadlocks could occur when terminating processes.  Some of these
deadlocks (described in `issue 1310`_) are now fixed.

- Fix: a signal handler was being set from multiple threads, causing an error:
"ValueError: signal only works in main thread".  This is now fixed, closing
`issue 1312`_.

- Fix: ``--precision`` on the command-line was being ignored while considering
``--fail-under``.  This is now fixed, thanks to
`Marcelo Trylesinski <pull 1317_>`_.

- Fix: releases no longer provide 3.11.0-alpha wheels. Coverage.py uses CPython
internal fields which are moving during the alpha phase. Fixes `issue 1316`_.
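CPython only allows ``signal.signal`` to be called from the main thread; elsewhere it raises ``ValueError``. The following is a minimal sketch of a guard. It assumes installing the handler can simply be skipped in other threads. ``install_sigterm_handler`` and ``on_sigterm`` are hypothetical helpers, not coverage.py's code.

```python
import signal
import threading
from typing import Callable


def install_sigterm_handler(handler: Callable) -> bool:
    """Install a SIGTERM handler only from the main thread; report success."""
    if threading.current_thread() is not threading.main_thread():
        return False  # signal.signal() would raise ValueError here
    signal.signal(signal.SIGTERM, handler)
    return True


def on_sigterm(signum, frame):
    """Placeholder handler: a real one would save data, then exit."""


results = []
worker = threading.Thread(target=lambda: results.append(install_sigterm_handler(on_sigterm)))
worker.start()
worker.join()
print(results)  # [False]: the worker thread did not touch signal handlers
```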

.. _issue 1310: https://github.com/nedbat/coveragepy/issues/1310
.. _issue 1312: https://github.com/nedbat/coveragepy/issues/1312
.. _issue 1316: https://github.com/nedbat/coveragepy/issues/1316
.. _pull 1317: https://github.com/nedbat/coveragepy/pull/1317


.. _changes_63:

6.3

------------------------

- Feature: Added the ``lcov`` command to generate reports in LCOV format.
Thanks, `Bradley Burns <pull 1289_>`_. Closes issues `587 <issue 587_>`_
and `626 <issue 626_>`_.

- Feature: the coverage data file can now be specified on the command line with
the ``--data-file`` option in any command that reads or writes data.  This is
in addition to the existing ``COVERAGE_FILE`` environment variable.  Closes
`issue 624`_. Thanks, `Nikita Bloshchanevich <pull 1304_>`_.

- Feature: coverage measurement data will now be written when a SIGTERM signal
is received by the process.  This includes
:meth:`Process.terminate <python:multiprocessing.Process.terminate>`,
and other ways to terminate a process.  Currently this is only on Linux and
Mac; Windows is not supported.  Fixes `issue 1307`_.

- Dropped support for Python 3.6, which reached end-of-life on 2021-12-23.

- Updated Python 3.11 support to 3.11.0a4, fixing `issue 1294`_.

- Fix: the coverage data file is now created in a more robust way, to avoid
problems when multiple processes are trying to write data at once. Fixes
issues `1303 <issue 1303_>`_ and `883 <issue 883_>`_.

- Fix: a .gitignore file will only be written into the HTML report output
directory if the directory is empty.  This should prevent certain unfortunate
accidents of writing the file where it is not wanted.

- Releases now have MacOS arm64 wheels for Apple Silicon, fixing `issue 1288`_.

.. _issue 587: https://github.com/nedbat/coveragepy/issues/587
.. _issue 624: https://github.com/nedbat/coveragepy/issues/624
.. _issue 626: https://github.com/nedbat/coveragepy/issues/626
.. _issue 883: https://github.com/nedbat/coveragepy/issues/883
.. _issue 1288: https://github.com/nedbat/coveragepy/issues/1288
.. _issue 1294: https://github.com/nedbat/coveragepy/issues/1294
.. _issue 1303: https://github.com/nedbat/coveragepy/issues/1303
.. _issue 1307: https://github.com/nedbat/coveragepy/issues/1307
.. _pull 1289: https://github.com/nedbat/coveragepy/pull/1289
.. _pull 1304: https://github.com/nedbat/coveragepy/pull/1304


.. _changes_62:

6.2

------------------------

- Feature: The ``--concurrency`` setting can now have a list of values, so
that threads and another lightweight threading package can be measured
together, such as ``--concurrency=gevent,thread``.  Closes `issue 1012`_ and
`issue 1082`_.

- Fix: A module specified as the ``source`` setting is imported during startup,
before the user program imports it.  This could cause problems if the rest of
the program isn't ready yet.  For example, `issue 1203`_ describes a Django
setting that is accessed before settings have been configured.  Now the early
import is wrapped in a try/except so errors then don't stop execution.

- Fix: A colon in a decorator expression would cause an exclusion to end too
early, preventing the exclusion of the decorated function. This is now fixed.

- Fix: The HTML report now will not overwrite a .gitignore file that already
exists in the HTML output directory (follow-on for `issue 1244`_).

- API: The exceptions raised by Coverage.py have been specialized, to provide
finer-grained catching of exceptions by third-party code.

- API: Using ``suffix=False`` when constructing a Coverage object with
multiprocessing wouldn't suppress the data file suffix (`issue 989`_).  This
is now fixed.

- Debug: The ``coverage debug data`` command will now sniff out combinable data
files, and report on all of them.

- Debug: The ``coverage debug`` command used to accept a number of topics at a
time, and show all of them, though this was never documented.  This no longer
works, to allow for command-line options in the future.

.. _issue 989: https://github.com/nedbat/coveragepy/issues/989
.. _issue 1012: https://github.com/nedbat/coveragepy/issues/1012
.. _issue 1082: https://github.com/nedbat/coveragepy/issues/1082
.. _issue 1203: https://github.com/nedbat/coveragepy/issues/1203


.. _changes_612:

6.1.2

--------------------------

- Python 3.11 is supported (tested with 3.11.0a2).  One still-open issue has to
do with `exits through with-statements <issue 1270_>`_.

- Fix: When remapping file paths through the ``[paths]`` setting while
combining, the ``[run] relative_files`` setting was ignored, resulting in
absolute paths for remapped file names (`issue 1147`_).  This is now fixed.

- Fix: Complex conditionals over excluded lines could have incorrectly reported
a missing branch (`issue 1271`_). This is now fixed.

- Fix: More exceptions are now handled when trying to parse source files for
reporting.  Problems that used to terminate coverage.py can now be handled
with ``[report] ignore_errors``.  This helps with plugins failing to read
files (`django_coverage_plugin issue 78`_).

- Fix: Removed another vestige of jQuery from the source tarball
(`issue 840`_).

- Fix: Added a default value for a new-to-6.x argument of an internal class.
This unsupported class is being used by coveralls (`issue 1273`_). Although
I'd rather not "fix" unsupported interfaces, it's actually nicer with a
default value.

.. _django_coverage_plugin issue 78: https://github.com/nedbat/django_coverage_plugin/issues/78
.. _issue 1147: https://github.com/nedbat/coveragepy/issues/1147
.. _issue 1270: https://github.com/nedbat/coveragepy/issues/1270
.. _issue 1271: https://github.com/nedbat/coveragepy/issues/1271
.. _issue 1273: https://github.com/nedbat/coveragepy/issues/1273


.. _changes_611:

6.1.1

--------------------------

- Fix: The sticky header on the HTML report didn't work unless you had branch
coverage enabled. This is now fixed: the sticky header works for everyone.
(Do people still use coverage without branch measurement!? j/k)

- Fix: When using explicitly declared namespace packages, the "already imported
a file that will be measured" warning would be issued (`issue 888`_).  This
is now fixed.

.. _issue 888: https://github.com/nedbat/coveragepy/issues/888


.. _changes_61:

6.1

------------------------

- Deprecated: The ``annotate`` command and the ``Coverage.annotate`` function
will be removed in a future version, unless people let me know that they are
using it.  Instead, the ``html`` command gives better-looking (and more
accurate) output, and the ``report -m`` command will tell you line numbers of
missing lines.  Please get in touch if you have a reason to use ``annotate``
over those better options: ned@nedbatchelder.com.

- Feature: Coverage now sets an environment variable, ``COVERAGE_RUN``, when
running your code with the ``coverage run`` command.  The value is not
important, and may change in the future.  Closes `issue 553`_.

- Feature: The HTML report pages for Python source files now have a sticky
header so the file name and controls are always visible.

- Feature: The ``xml`` and ``json`` commands now describe what they wrote
where.

- Feature: The ``html``, ``combine``, ``xml``, and ``json`` commands all accept
a ``-q/--quiet`` option to suppress the messages they write to stdout about
what they are doing (`issue 1254`_).

- Feature: The ``html`` command writes a ``.gitignore`` file into the HTML
output directory, to prevent the report from being committed to git.  If you
want to commit it, you will need to delete that file.  Closes `issue 1244`_.

- Feature: Added support for PyPy 3.8.

- Fix: More generated code is now excluded from measurement.  Code such as
`attrs`_ boilerplate, or doctest code, was being measured though the
synthetic line numbers meant they were never reported.  Once Cython was
involved though, the generated .so files were parsed as Python, raising
syntax errors, as reported in `issue 1160`_.  This is now fixed.

- Fix: When sorting human-readable names, numeric components are sorted
correctly: file10.py will appear after file9.py.  This applies to file names,
module names, environment variables, and test contexts.

- Performance: Branch coverage measurement is faster, though you might only
notice on code that is executed many times, such as long-running loops.

- Build: jQuery is no longer used or vendored (`issue 840`_ and `issue 1118`_).
Huge thanks to Nils Kattenbeck (septatrix) for the conversion to vanilla
JavaScript in `pull request 1248`_.
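Sorting file9.py before file10.py is usually done with a natural-sort key that compares digit runs as numbers. This is a minimal sketch; ``human_key`` is a hypothetical name, and coverage.py's own implementation may differ.

```python
import re
from typing import List, Union


def human_key(text: str) -> List[Union[str, int]]:
    """Split text into str and int chunks so digit runs compare numerically."""
    # re.split with a capture group alternates text, digits, text, ...,
    # so odd positions are always digit runs and compare as ints.
    chunks = re.split(r"(\d+)", text)
    return [int(chunk) if i % 2 else chunk for i, chunk in enumerate(chunks)]


names = ["file10.py", "file9.py", "file1.py"]
print(sorted(names, key=human_key))  # ['file1.py', 'file9.py', 'file10.py']
```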

.. _issue 553: https://github.com/nedbat/coveragepy/issues/553
.. _issue 840: https://github.com/nedbat/coveragepy/issues/840
.. _issue 1118: https://github.com/nedbat/coveragepy/issues/1118
.. _issue 1160: https://github.com/nedbat/coveragepy/issues/1160
.. _issue 1244: https://github.com/nedbat/coveragepy/issues/1244
.. _pull request 1248: https://github.com/nedbat/coveragepy/pull/1248
.. _issue 1254: https://github.com/nedbat/coveragepy/issues/1254
.. _attrs: https://www.attrs.org/


.. _changes_602:

6.0.2
--------------------------

- Namespace packages being measured weren't properly handled by the new code
that ignores third-party packages. If the namespace package was installed, it
was ignored as a third-party package.  That problem (`issue 1231`_) is now
fixed.

- Packages named as "source packages" (with ``source``, or ``source_pkgs``, or
pytest-cov's ``--cov``) might have been only partially measured.  Their
top-level statements could be marked as un-executed, because they were
imported by coverage.py before measurement began (`issue 1232`_).  This is
now fixed, but the package will be imported twice, once by coverage.py, then
again by your test suite.  This could cause problems if importing the package
has side effects.

- The :meth:`.CoverageData.contexts_by_lineno` method was documented to return
a dict, but was returning a defaultdict.  Now it returns a plain dict.  It
also no longer returns negative numbered keys.
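
The difference between the two return types matters to callers: looking up a missing key in a ``defaultdict`` silently inserts it, while a plain dict raises ``KeyError``. The sketch below uses made-up line numbers and test names; ``plain_contexts`` is a hypothetical helper that mimics the documented shape (a plain dict without negative keys), not coverage.py's code.

```python
from collections import defaultdict


def plain_contexts(raw: dict) -> dict:
    """Hypothetical helper: copy {lineno: [contexts]} into a plain dict,
    dropping negative line numbers."""
    return {lineno: list(ctxs) for lineno, ctxs in raw.items() if lineno >= 0}


raw = defaultdict(list)
raw[3].append("test_login")   # made-up test context
raw[-3].append("test_login")  # a negative key that should not be exposed
raw[7]                        # a mere lookup inserts an empty list for line 7

result = plain_contexts(raw)
print(result)  # {3: ['test_login'], 7: []}

try:
    result[8]
except KeyError:
    print("plain dict: missing line 8 raises KeyError")
```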

.. _issue 1231: https://github.com/nedbat/coveragepy/issues/1231
.. _issue 1232: https://github.com/nedbat/coveragepy/issues/1232


.. _changes_601:

6.0.1
--------------------------

- In 6.0, the coverage.py exceptions moved from coverage.misc to
coverage.exceptions. These exceptions are not part of the public supported
API; CoverageException is. But a number of other third-party packages were
importing the exceptions from coverage.misc, so they are now available from
there again (`issue 1226`_).

- Changed an internal detail of how tomli is imported, so that tomli can use
coverage.py for its own test suite (`issue 1228`_).

- Defend against an obscure possibility under code obfuscation, where a
function can have an argument called "self", but no local named "self"
(`pull request 1210`_).  Thanks, Ben Carlsson.

.. _pull request 1210: https://github.com/nedbat/coveragepy/pull/1210
.. _issue 1226: https://github.com/nedbat/coveragepy/issues/1226
.. _issue 1228: https://github.com/nedbat/coveragepy/issues/1228


.. _changes_60:

6.0
------------------------

- The ``coverage html`` command now prints a message indicating where the HTML
report was written.  Fixes `issue 1195`_.

- The ``coverage combine`` command now prints messages indicating each data
file being combined.  Fixes `issue 1105`_.

- The HTML report now includes a sentence about skipped files due to
``skip_covered`` or ``skip_empty`` settings.  Fixes `issue 1163`_.

- Unrecognized options in the configuration file are no longer errors. They are
now warnings, to ease the use of coverage across versions.  Fixes `issue
1035`_.

- Fix handling of exceptions through context managers in Python 3.10. A missing
exception is no longer considered a missing branch from the with statement.
Fixes `issue 1205`_.

- Fix another rarer instance of "Error binding parameter 0 - probably
unsupported type." (`issue 1010`_).

- Creating a directory for the coverage data file now is safer against
conflicts when two coverage runs happen simultaneously (`pull 1220`_).
Thanks, Clément Pit-Claudel.

.. _issue 1035: https://github.com/nedbat/coveragepy/issues/1035
.. _issue 1105: https://github.com/nedbat/coveragepy/issues/1105
.. _issue 1163: https://github.com/nedbat/coveragepy/issues/1163
.. _issue 1195: https://github.com/nedbat/coveragepy/issues/1195
.. _issue 1205: https://github.com/nedbat/coveragepy/issues/1205
.. _pull 1220: https://github.com/nedbat/coveragepy/pull/1220


.. _changes_60b1:

6.0b1
--------------------------

- Dropped support for Python 2.7, PyPy 2, and Python 3.5.

- Added support for the Python 3.10 ``match/case`` syntax.

- Data collection is now thread-safe.  There may have been rare instances of
exceptions raised in multi-threaded programs.

- Plugins (like the `Django coverage plugin`_) were generating "Already
imported a file that will be measured" warnings about Django itself.  These
have been fixed, closing `issue 1150`_.

- Warnings generated by coverage.py are now real Python warnings.

- Using ``--fail-under=100`` with coverage near 100% could result in the
self-contradictory message :code:`total of 100 is less than fail-under=100`.
This bug (`issue 1168`_) is now fixed.

- The ``COVERAGE_DEBUG_FILE`` environment variable now accepts ``stdout`` and
``stderr`` to write to those destinations.

- TOML parsing now uses the `tomli`_ library.

- Some minor changes to usually invisible details of the HTML report:

  - Use a modern hash algorithm when fingerprinting, for high-security
    environments (`issue 1189`_).  When generating the HTML report, we save the
    hash of the data, to avoid regenerating an unchanged HTML page. We used to
    use MD5 to generate the hash, and now use SHA-3-256.  This was never a
    security concern, but security scanners would notice the MD5 algorithm and
    raise a false alarm.

  - Change how report file names are generated, to avoid leading underscores
    (`issue 1167`_), to avoid rare file name collisions (`issue 584`_), and to
    avoid file names becoming too long (`issue 580`_).
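
The fingerprinting idea, hashing the data behind a page and skipping the rewrite when the hash is unchanged, can be sketched with the standard library's ``hashlib.sha3_256``. The ``fingerprint`` helper and the byte strings are hypothetical; coverage.py's real report code is more involved.

```python
import hashlib


def fingerprint(page_data: bytes) -> str:
    """Hypothetical helper: a SHA-3-256 hex digest of a page's source data."""
    return hashlib.sha3_256(page_data).hexdigest()


saved = fingerprint(b"mod.py: lines 1-20, 4 missing")    # from the previous run
current = fingerprint(b"mod.py: lines 1-20, 4 missing")  # from this run

# Same data gives the same digest, so the existing HTML page can be kept.
needs_rewrite = saved != current
print(needs_rewrite)  # False
```

Any stable hash would do for change detection; SHA-3-256 simply avoids the false alarms that MD5 triggers in security scanners.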

.. _Django coverage plugin: https://pypi.org/project/django-coverage-plugin/
.. _issue 580: https://github.com/nedbat/coveragepy/issues/580
.. _issue 584: https://github.com/nedbat/coveragepy/issues/584
.. _issue 1150: https://github.com/nedbat/coveragepy/issues/1150
.. _issue 1167: https://github.com/nedbat/coveragepy/issues/1167
.. _issue 1168: https://github.com/nedbat/coveragepy/issues/1168
.. _issue 1189: https://github.com/nedbat/coveragepy/issues/1189
.. _tomli: https://pypi.org/project/tomli/


.. _changes_56b1:

5.6b1
--------------------------

Note: 5.6 final was never released. These changes are part of 6.0.

- Third-party packages are now ignored in coverage reporting.  This solves a
few problems:

  - Coverage will no longer report about other people's code (`issue 876`_).
    This is true even when using ``--source=.`` with a venv in the current
    directory.

  - Coverage will no longer generate "Already imported a file that will be
    measured" warnings about coverage itself (`issue 905`_).

- The HTML report uses j/k to move up and down among the highlighted chunks of
code.  They used to highlight the current chunk, but 5.0 broke that behavior.
Now the highlighting is working again.

- The JSON report now includes ``percent_covered_display``, a string with the
total percentage, rounded to the same number of decimal places as the other
reports' totals.
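
A consumer of the JSON report can show ``percent_covered_display`` directly instead of re-rounding the float itself. The report text below is a made-up, heavily abbreviated excerpt that keeps only the ``totals`` fields used here; ``display_total`` is a hypothetical helper.

```python
import json

# Made-up, abbreviated excerpt of a `coverage json` report.
REPORT_TEXT = """
{
  "totals": {
    "percent_covered": 87.23,
    "percent_covered_display": "87"
  }
}
"""


def display_total(report_text: str) -> str:
    """Hypothetical helper: format the report's pre-rounded total for display."""
    totals = json.loads(report_text)["totals"]
    # The string is already rounded like the other reports' totals.
    return f"Coverage: {totals['percent_covered_display']}%"


print(display_total(REPORT_TEXT))  # Coverage: 87%
```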

.. _issue 876: https://github.com/nedbat/coveragepy/issues/876
.. _issue 905: https://github.com/nedbat/coveragepy/issues/905


.. _changes_55:

5.5
------------------------

- ``coverage combine`` has a new option, ``--keep``, to keep the original data
files after combining them.  The default is still to delete the files after
they have been combined.  This was requested in `issue 1108`_ and implemented
in `pull request 1110`_.  Thanks, Éric Larivière.

- When reporting missing branches in ``coverage report``, branches that jump to
missing lines aren't reported.  This adds to the long-standing behavior
of not reporting branches from missing lines.  Now branches are only reported
if both the source and destination lines are executed.  Closes both `issue
1065`_ and `issue 955`_.

- Minor improvements to the HTML report:

  - The state of the line visibility selector buttons is saved in local storage
    so you don't have to fiddle with them so often, fixing `issue 1123`_.

  - It has a little more room for line numbers so that 4-digit numbers work
    well, fixing `issue 1124`_.

- Improved the error message when combining line and branch data, so that users
will be more likely to understand what's happening, closing `issue 803`_.
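
The rule for which missing branches get reported can be modeled with plain sets of ``(source, destination)`` line pairs. This is a simplified sketch, assuming positive line numbers only (it ignores the entry and exit arcs that coverage.py also tracks), and ``reportable_missing_branches`` is a hypothetical name.

```python
def reportable_missing_branches(possible, executed_arcs, executed_lines):
    """Hypothetical helper: missing arcs whose source and destination both ran."""
    missing = set(possible) - set(executed_arcs)
    return sorted(
        (src, dst)
        for src, dst in missing
        if src in executed_lines and dst in executed_lines
    )


possible = {(1, 2), (1, 3), (2, 3), (3, 4)}
executed_arcs = {(1, 2), (2, 3)}
executed_lines = {1, 2, 3}

# (1, 3) never ran but both of its lines did, so it is reported.
# (3, 4) never ran and line 4 is missing, so it is left out.
print(reportable_missing_branches(possible, executed_arcs, executed_lines))  # [(1, 3)]
```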

.. _issue 803: https://github.com/nedbat/coveragepy/issues/803
.. _issue 955: https://github.com/nedbat/coveragepy/issues/955
.. _issue 1065: https://github.com/nedbat/coveragepy/issues/1065
.. _issue 1108: https://github.com/nedbat/coveragepy/issues/1108
.. _pull request 1110: https://github.com/nedbat/coveragepy/pull/1110
.. _issue 1123: https://github.com/nedbat/coveragepy/issues/1123
.. _issue 1124: https://github.com/nedbat/coveragepy/issues/1124


.. _changes_54:

5.4
------------------------

- The text report produced by ``coverage report`` now always outputs a TOTAL
line, even if only one Python file is reported.  This makes regex parsing
of the output easier.  Thanks, Judson Neer.  This had been requested a number
of times (`issue 1086`_, `issue 922`_, `issue 732`_).

- The ``skip_covered`` and ``skip_empty`` settings in the configuration file
can now be specified in the ``[html]`` section, so that text reports and HTML
reports can use separate settings.  The HTML report will still use the
``[report]`` settings if there isn't a value in the ``[html]`` section.
Closes `issue 1090`_.

- Combining files on Windows across drives now works properly, fixing `issue
577`_.  Thanks, `Valentin Lab <pr1080_>`_.

- Fix an obscure warning from deep in the _decimal module, as reported in
`issue 1084`_.

- Update to support Python 3.10 alphas in progress, including `PEP 626: Precise
line numbers for debugging and other tools <pep626_>`_.
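
Because the TOTAL line is now always present, a script can pull the overall percentage out of the text report with one regular expression. The sample output is made up, and the pattern assumes the percentage is the last value on the TOTAL line, as in the default layout; ``parse_total`` is a hypothetical helper.

```python
import re

# Made-up output from `coverage report` with a single file.
SAMPLE_REPORT = """\
Name       Stmts   Miss  Cover
------------------------------
mymod.py      20      4    80%
------------------------------
TOTAL         20      4    80%
"""

# "TOTAL" at the start of a line, then the last number followed by "%".
TOTAL_RE = re.compile(r"^TOTAL\b.*?(\d+(?:\.\d+)?)%\s*$", re.MULTILINE)


def parse_total(report_text: str) -> float:
    """Hypothetical helper: return the TOTAL percentage from a text report."""
    match = TOTAL_RE.search(report_text)
    if match is None:
        raise ValueError("no TOTAL line found")
    return float(match.group(1))


print(parse_total(SAMPLE_REPORT))  # 80.0
```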

.. _issue 577: https://github.com/nedbat/coveragepy/issues/577
.. _issue 732: https://github.com/nedbat/coveragepy/issues/732
.. _issue 922: https://github.com/nedbat/coveragepy/issues/922
.. _issue 1084: https://github.com/nedbat/coveragepy/issues/1084
.. _issue 1086: https://github.com/nedbat/coveragepy/issues/1086
.. _issue 1090: https://github.com/nedbat/coveragepy/issues/1090
.. _pr1080: https://github.com/nedbat/coveragepy/pull/1080
.. _pep626: https://www.python.org/dev/peps/pep-0626/


.. _changes_531:

5.3.1
--------------------------

- When using ``--source`` on a large source tree, v5.x was slower than previous
versions.  This performance regression is now fixed, closing `issue 1037`_.

- Mysterious SQLite errors can happen on PyPy, as reported in `issue 1010`_. An
immediate retry seems to fix the problem, although it is an unsatisfying
solution.

- The HTML report now saves the sort order in a more widely supported way,
fixing `issue 986`_.  Thanks, Sebastián Ramírez (`pull request 1066`_).

- The HTML report pages now have a :ref:`Sleepy Snake <sleepy>` favicon.

- Wheels are now provided for manylinux2010, and for PyPy3 (pp36 and pp37).

- Continuous integration has moved from Travis and AppVeyor to GitHub Actions.
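
The "immediate retry" workaround amounts to catching the error once and trying the same operation again. A minimal sketch follows; ``execute_with_retry`` and the flaky operation are hypothetical and only show the control flow, not coverage.py's own SQLite code.

```python
import sqlite3


def execute_with_retry(operation, retries: int = 1):
    """Hypothetical helper: call operation(), retrying after sqlite3.Error.

    The last error is re-raised once the retries are used up.
    """
    for attempt in range(retries + 1):
        try:
            return operation()
        except sqlite3.Error:
            if attempt == retries:
                raise


calls = []


def flaky_query():
    # Made-up operation: fails on the first call, succeeds on the second.
    calls.append(1)
    if len(calls) == 1:
        raise sqlite3.OperationalError("disk I/O error")
    return "ok"


print(execute_with_retry(flaky_query))  # ok
```

As the entry says, this is unsatisfying: it hides the symptom without explaining the cause, so it is best kept to a single retry.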

.. _issue 986: https://github.com/nedbat/coveragepy/issues/986
.. _issue 1037: https://github.com/nedbat/coveragepy/issues/1037
.. _issue 1010: https://github.com/nedbat/coveragepy/issues/1010
.. _pull request 1066: https://github.com/nedbat/coveragepy/pull/1066

.. _changes_53:

5.3
------------------------

- The ``source`` setting has always been interpreted as either a file path or a
module, depending on which existed.  If both interpretations were valid, it
was assumed to be a file path.  The new ``source_pkgs`` setting can be used
to name a package to disambiguate this case.  Thanks, Thomas Grainger. Fixes
`issue 268`_.

- If a plugin was disabled due to an exception, we used to still try to record
its information, causing an exception, as reported in `issue 1011`_.  This is
now fixed.

.. _issue 268: https://github.com/nedbat/coveragepy/issues/268
.. _issue 1011: https://github.com/nedbat/coveragepy/issues/1011


.. endchangesinclude

Older changes
-------------

The complete history is available in the `coverage.py docs`__.

__ https://coverage.readthedocs.io/en/latest/changes.html

Links

- PyPI: https://pypi.org/project/coverage
- Changelog: https://pyup.io/changelogs/coverage/
- Repo: https://github.com/nedbat/coveragepy

Opened by pyup-bot.

Pin coverage to latest version 7.0.2

This PR pins coverage to the latest release 7.0.2.

Changelog

7.0.2
--------------------------

- Fix: when using the ``[run] relative_files = True`` setting, a relative
``[paths]`` pattern was still being made absolute.  This is now fixed,
closing `issue 1519`_.

- Fix: if Python doesn't provide tomllib, then TOML configuration files can
only be read if coverage.py is installed with the ``[toml]`` extra.
Coverage.py will raise an error if TOML support is not installed when it sees
your settings are in a .toml file. But it didn't understand that
``[tools.coverage]`` was a valid section header, so the error wasn't reported
if you used that header, and settings were silently ignored.  This is now
fixed, closing `issue 1516`_.

- Fix: adjusted how decorators are traced on PyPy 7.3.10, fixing `issue 1515`_.

- Fix: the ``coverage lcov`` report did not properly implement the
``--fail-under=MIN`` option.  This has been fixed.

- Refactor: added many type annotations, including a number of refactorings.
This should not affect outward behavior, but they were a bit invasive in some
places, so keep your eyes peeled for oddities.

- Refactor: removed the vestigial and long untested support for Jython and
IronPython.
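
Reading TOML settings needs ``tomllib`` (standard library from Python 3.11) or the ``tomli`` package, which the ``[toml]`` extra provides on older versions. That choice is commonly handled with an import fallback. This sketch reads only the standard ``[tool.coverage]`` table of a ``pyproject.toml`` string; ``read_coverage_settings`` and its error message are hypothetical, not coverage.py's API.

```python
try:
    import tomllib  # standard library on Python 3.11+
except ModuleNotFoundError:
    try:
        import tomli as tomllib  # third-party package with the same loads() API
    except ModuleNotFoundError:
        tomllib = None  # no TOML support available


def read_coverage_settings(toml_text: str) -> dict:
    """Hypothetical helper: return the [tool.coverage] table, or {} if absent."""
    if tomllib is None:
        # Fail loudly rather than silently ignoring the settings.
        raise RuntimeError("TOML support is not installed")
    data = tomllib.loads(toml_text)
    return data.get("tool", {}).get("coverage", {})


PYPROJECT = """
[tool.coverage.run]
branch = true
"""

if tomllib is not None:
    print(read_coverage_settings(PYPROJECT))  # {'run': {'branch': True}}
```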

.. _issue 1515: https://github.com/nedbat/coveragepy/issues/1515
.. _issue 1516: https://github.com/nedbat/coveragepy/issues/1516
.. _issue 1519: https://github.com/nedbat/coveragepy/issues/1519


.. _changes_7-0-1:

7.0.1
--------------------------

- When checking if a file mapping resolved to a file that exists, we weren't
considering files in .whl files.  This is now fixed, closing `issue 1511`_.

- File pattern rules were too strict, forbidding plus signs and curly braces in
directory and file names.  This is now fixed, closing `issue 1513`_.

- Unusual Unicode or control characters in source files could prevent
reporting.  This is now fixed, closing `issue 1512`_.

- The PyPy wheel now installs on PyPy 3.7, 3.8, and 3.9, closing `issue 1510`_.

.. _issue 1510: https://github.com/nedbat/coveragepy/issues/1510
.. _issue 1511: https://github.com/nedbat/coveragepy/issues/1511
.. _issue 1512: https://github.com/nedbat/coveragepy/issues/1512
.. _issue 1513: https://github.com/nedbat/coveragepy/issues/1513


.. _changes_7-0-0:

7.0.0
--------------------------

Nothing new beyond 7.0.0b1.


.. _changes_7-0-0b1:

7.0.0b1
--------------------------


- Changes to file pattern matching, which might require updating your
configuration:

  - Previously, ``*`` would incorrectly match directory separators, making
    precise matching difficult.  This is now fixed, closing `issue 1407`_.

  - Now ``**`` matches any number of nested directories, including none.

- Improvements to combining data files when using the
:ref:`config_run_relative_files` setting:

  - During ``coverage combine``, relative file paths are implicitly combined
    without needing a ``[paths]`` configuration setting.  This also fixed
    `issue 991`_.

  - A ``[paths]`` setting like ``*/foo`` will now match ``foo/bar.py`` so that
    relative file paths can be combined more easily.

  - The :ref:`config_run_relative_files` setting is properly interpreted in
    more places, fixing `issue 1280`_.

- Fixed environment variable expansion in pyproject.toml files.  It was overly
broad, causing errors outside of coverage.py settings, as described in `issue
1481`_ and `issue 1345`_.  This is now fixed, but in rare cases will require
changing your pyproject.toml to quote non-string values that use environment
substitution.

- Fixed internal logic that prevented coverage.py from running on
implementations other than CPython or PyPy (`issue 1474`_).
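
The new matching rules, ``*`` staying within one directory and ``**`` spanning any number of directories including none, can be illustrated by translating a pattern into a regular expression. This is a simplified sketch assuming ``/`` separators; ``glob_to_regex`` and ``path_matches`` are hypothetical helpers, and coverage.py's own translation differs in detail.

```python
import re


def glob_to_regex(pattern: str) -> str:
    """Hypothetical helper: translate a path glob into a regex string."""
    out = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**/", i):
            out.append("(?:.*/)?")  # zero or more whole directories
            i += 3
        elif pattern.startswith("**", i):
            out.append(".*")  # a trailing "**" matches everything below
            i += 2
        elif pattern[i] == "*":
            out.append("[^/]*")  # never crosses a directory separator
            i += 1
        elif pattern[i] == "?":
            out.append("[^/]")  # exactly one non-separator character
            i += 1
        else:
            out.append(re.escape(pattern[i]))
            i += 1
    return "".join(out)


def path_matches(pattern: str, path: str) -> bool:
    return re.fullmatch(glob_to_regex(pattern), path) is not None


print(path_matches("src/*.py", "src/app.py"))           # True
print(path_matches("src/*.py", "src/pkg/app.py"))       # False
print(path_matches("src/**/app.py", "src/app.py"))      # True
print(path_matches("src/**/app.py", "src/a/b/app.py"))  # True
```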

.. _issue 991: https://github.com/nedbat/coveragepy/issues/991
.. _issue 1280: https://github.com/nedbat/coveragepy/issues/1280
.. _issue 1345: https://github.com/nedbat/coveragepy/issues/1345
.. _issue 1407: https://github.com/nedbat/coveragepy/issues/1407
.. _issue 1474: https://github.com/nedbat/coveragepy/issues/1474
.. _issue 1481: https://github.com/nedbat/coveragepy/issues/1481


.. _changes_6-5-0:

6.6.0
--------------------------

- Changes to file pattern matching, which might require updating your
configuration:

  - Previously, ``*`` would incorrectly match directory separators, making
    precise matching difficult.  This is now fixed, closing `issue 1407`_.

  - Now ``**`` matches any number of nested directories, including none.

- Improvements to combining data files when using the
:ref:`config_run_relative_files` setting, which might require updating your
configuration:

  - During ``coverage combine``, relative file paths are implicitly combined
    without needing a ``[paths]`` configuration setting.  This also fixed
    `issue 991`_.

  - A ``[paths]`` setting like ``*/foo`` will now match ``foo/bar.py`` so that
    relative file paths can be combined more easily.

  - The :ref:`config_run_relative_files` setting is properly interpreted in
    more places, fixing `issue 1280`_.

- When remapping file paths with ``[paths]``, a path will be remapped only if
the resulting path exists.  The documentation has long said the prefix had to
exist, but it was never enforced.  This fixes `issue 608`_, improves `issue
649`_, and closes `issue 757`_.

- Reporting operations now implicitly use the ``[paths]`` setting to remap file
paths within a single data file.  Combining multiple files still requires the
``coverage combine`` step, but this simplifies some single-file situations.
Closes `issue 1212`_ and `issue 713`_.

- The ``coverage report`` command now has a ``--format=`` option.  The original
style is now ``--format=text``, and is the default.

  - Using ``--format=markdown`` will write the table in Markdown format, thanks
    to `Steve Oswald <pull 1479_>`_, closing `issue 1418`_.

  - Using ``--format=total`` will write a single total number to the
    output.  This can be useful for making badges or writing status updates.

- Combining data files with ``coverage combine`` now hashes the data files to
skip files that add no new information.  This can reduce the time needed.
Many details affect the speed-up, but for coverage.py's own test suite,
combining is about 40% faster. Closes `issue 1483`_.

- When searching for completely un-executed files, coverage.py uses the
presence of ``__init__.py`` files to determine which directories have source
that could have been imported.  However, `implicit namespace packages`_ don't
require ``__init__.py``.  A new setting ``[report]
include_namespace_packages`` tells coverage.py to consider these directories
during reporting.  Thanks to `Felix Horvat <pull 1387_>`_ for the
contribution.  Closes `issue 1383`_ and `issue 1024`_.

- Fixed environment variable expansion in pyproject.toml files.  It was overly
broad, causing errors outside of coverage.py settings, as described in `issue
1481`_ and `issue 1345`_.  This is now fixed, but in rare cases will require
changing your pyproject.toml to quote non-string values that use environment
substitution.

- An empty file has a coverage total of 100%, but used to fail with
``--fail-under``.  This has been fixed, closing `issue 1470`_.

- The text report table no longer writes out two separator lines if there are
no files listed in the table.  One is plenty.

- Fixed a mis-measurement of a strange use of wildcard alternatives in
match/case statements, closing `issue 1421`_.

- Fixed internal logic that prevented coverage.py from running on
implementations other than CPython or PyPy (`issue 1474`_).

- The deprecated ``[run] note`` setting has been completely removed.
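
The stricter remapping rule, rewrite a path only when the rewritten path exists, can be sketched with plain prefixes. Real ``[paths]`` entries are lists of patterns whose first entry is the canonical location; this sketch simplifies that to ``(old_prefix, new_prefix)`` pairs, and ``remap_path`` is a hypothetical helper.

```python
import os


def remap_path(path: str, aliases: list) -> str:
    """Hypothetical helper: map `path` through (old_prefix, new_prefix) pairs.

    A rewrite is accepted only if the resulting file exists; otherwise the
    next alias is tried, and the original path is returned if none apply.
    """
    for old_prefix, new_prefix in aliases:
        if path.startswith(old_prefix):
            candidate = new_prefix + path[len(old_prefix):]
            if os.path.exists(candidate):
                return candidate
    return path
```

For example, with the made-up alias ``("/ci/build/", "/home/me/project/")``, the path ``/ci/build/src/mod.py`` becomes ``/home/me/project/src/mod.py`` only if that file exists locally; otherwise it is left as recorded.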

.. _implicit namespace packages: https://peps.python.org/pep-0420/
.. _issue 608: https://github.com/nedbat/coveragepy/issues/608
.. _issue 649: https://github.com/nedbat/coveragepy/issues/649
.. _issue 713: https://github.com/nedbat/coveragepy/issues/713
.. _issue 757: https://github.com/nedbat/coveragepy/issues/757
.. _issue 991: https://github.com/nedbat/coveragepy/issues/991
.. _issue 1024: https://github.com/nedbat/coveragepy/issues/1024
.. _issue 1212: https://github.com/nedbat/coveragepy/issues/1212
.. _issue 1280: https://github.com/nedbat/coveragepy/issues/1280
.. _issue 1345: https://github.com/nedbat/coveragepy/issues/1345
.. _issue 1383: https://github.com/nedbat/coveragepy/issues/1383
.. _issue 1407: https://github.com/nedbat/coveragepy/issues/1407
.. _issue 1418: https://github.com/nedbat/coveragepy/issues/1418
.. _issue 1421: https://github.com/nedbat/coveragepy/issues/1421
.. _issue 1470: https://github.com/nedbat/coveragepy/issues/1470
.. _issue 1474: https://github.com/nedbat/coveragepy/issues/1474
.. _issue 1481: https://github.com/nedbat/coveragepy/issues/1481
.. _issue 1483: https://github.com/nedbat/coveragepy/issues/1483
.. _pull 1387: https://github.com/nedbat/coveragepy/pull/1387
.. _pull 1479: https://github.com/nedbat/coveragepy/pull/1479



.. _changes_6-6-0b1:

6.6.0b1
----------------------------

6.5.0
--------------------------

- The JSON report now includes details of which branches were taken, and which
are missing for each file. Thanks, `Christoph Blessing <pull 1438_>`_. Closes
`issue 1425`_.

- Starting with coverage.py 6.2, ``class`` statements were marked as a branch.
This wasn't right, and has been reverted, fixing `issue 1449`_. Note this
will very slightly reduce your coverage total if you are measuring branch
coverage.

- Packaging is now compliant with `PEP 517`_, closing `issue 1395`_.

- A new debug option ``--debug=pathmap`` shows details of the remapping of
paths that happens during combine due to the ``[paths]`` setting.

- Fix an internal problem with caching of invalid Python parsing. Found by
OSS-Fuzz, fixing their `bug 50381`_.

.. _bug 50381: https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=50381
.. _PEP 517: https://peps.python.org/pep-0517/
.. _issue 1395: https://github.com/nedbat/coveragepy/issues/1395
.. _issue 1425: https://github.com/nedbat/coveragepy/issues/1425
.. _issue 1449: https://github.com/nedbat/coveragepy/issues/1449
.. _pull 1438: https://github.com/nedbat/coveragepy/pull/1438


.. _changes_6-4-4:

6.4.4
--------------------------

- Wheels are now provided for Python 3.11.


.. _changes_6-4-3:

6.4.3
--------------------------

- Fix a failure when combining data files if the file names contained glob-like
patterns.  Thanks, `Michael Krebs and Benjamin Schubert <pull 1405_>`_.

- Fix a messaging failure when combining Windows data files on a different
drive than the current directory, closing `issue 1428`_.  Thanks, `Lorenzo
Micò <pull 1430_>`_.

- Fix path calculations when running in the root directory, as you might do in
a Docker container. Thanks, `Arthur Rio <pull 1403_>`_.

- Filtering in the HTML report wouldn't work when reloading the index page.
This is now fixed.  Thanks, `Marc Legendre <pull 1413_>`_.

- Fix a problem with Cython code measurement, closing `issue 972`_.  Thanks,
`Matus Valo <pull 1347_>`_.

.. _issue 972: https://github.com/nedbat/coveragepy/issues/972
.. _issue 1428: https://github.com/nedbat/coveragepy/issues/1428
.. _pull 1347: https://github.com/nedbat/coveragepy/pull/1347
.. _pull 1403: https://github.com/nedbat/coveragepy/issues/1403
.. _pull 1405: https://github.com/nedbat/coveragepy/issues/1405
.. _pull 1413: https://github.com/nedbat/coveragepy/issues/1413
.. _pull 1430: https://github.com/nedbat/coveragepy/pull/1430


.. _changes_6-4-2:

6.4.2
--------------------------

- Updated for a small change in Python 3.11.0 beta 4: modules now start with a
line with line number 0, which is ignored.  This line cannot be executed, so
coverage totals were thrown off.  This line is now ignored by coverage.py,
but this also means that truly empty modules (like ``__init__.py``) have no
lines in them, rather than one phantom line.  Fixes `issue 1419`_.

- Internal debugging data added to sys.modules is now an actual module, to
avoid confusing code that examines everything in sys.modules.  Thanks,
`Yilei Yang <pull 1399_>`_.

.. _issue 1419: https://github.com/nedbat/coveragepy/issues/1419
.. _pull 1399: https://github.com/nedbat/coveragepy/pull/1399


.. _changes_6-4-1:

6.4.1
--------------------------

- Greatly improved performance on PyPy, and other environments that need the
pure Python trace function.  Thanks, Carl Friedrich Bolz-Tereick (`pull
1381`_ and `pull 1388`_).  Slightly improved performance when using the C
trace function, as most environments do.  Closes `issue 1339`_.

- The conditions for using tomllib from the standard library have been made
more precise, so that 3.11 alphas will continue to work. Closes `issue
1390`_.

.. _issue 1339: https://github.com/nedbat/coveragepy/issues/1339
.. _pull 1381: https://github.com/nedbat/coveragepy/pull/1381
.. _pull 1388: https://github.com/nedbat/coveragepy/pull/1388
.. _issue 1390: https://github.com/nedbat/coveragepy/issues/1390


.. _changes_64:

6.4
------------------------

- A new setting, :ref:`config_run_sigterm`, controls whether a SIGTERM signal
handler is used.  In 6.3, the signal handler was always installed, to capture
data at unusual process ends.  Unfortunately, this introduced other problems
(see `issue 1310`_).  Now the signal handler is only used if you opt-in by
setting ``[run] sigterm = true``.

- Small changes to the HTML report:

  - Added links to next and previous file, and more keyboard shortcuts: ``[``
    and ``]`` for next file and previous file; ``u`` for up to the index; and
    ``?`` to open/close the help panel.  Thanks, `J. M. F. Tsang
    <pull 1364_>`_.

  - The time stamp and version are displayed at the top of the report.  Thanks,
    `Ammar Askar <pull 1354_>`_. Closes `issue 1351`_.

- A new debug option ``debug=sqldata`` adds more detail to ``debug=sql``,
logging all the data being written to the database.

- Previously, running ``coverage report`` (or any of the reporting commands) in
an empty directory would create a .coverage data file.  Now they do not,
fixing `issue 1328`_.

- On Python 3.11, the ``[toml]`` extra no longer installs tomli, instead using
tomllib from the standard library.  Thanks, `Shantanu <pull 1359_>`_.

- In-memory CoverageData objects now properly update(), closing `issue 1323`_.

.. _issue 1310: https://github.com/nedbat/coveragepy/issues/1310
.. _issue 1323: https://github.com/nedbat/coveragepy/issues/1323
.. _issue 1328: https://github.com/nedbat/coveragepy/issues/1328
.. _issue 1351: https://github.com/nedbat/coveragepy/issues/1351
.. _pull 1354: https://github.com/nedbat/coveragepy/pull/1354
.. _pull 1359: https://github.com/nedbat/coveragepy/pull/1359
.. _pull 1364: https://github.com/nedbat/coveragepy/pull/1364


.. _changes_633:

6.3.3
--------------------------

- Fix: Coverage.py now builds successfully on CPython 3.11 (3.11.0b1) again.
Closes `issue 1367`_.  Some results for generators may have changed.

.. _issue 1367: https://github.com/nedbat/coveragepy/issues/1367


.. _changes_632:

6.3.2
--------------------------

- Fix: adapt to pypy3.9's decorator tracing behavior.  It now traces function
decorators like CPython 3.8: both the @-line and the def-line are traced.
Fixes `issue 1326`_.

- Debug: added ``pybehave`` to the list of :ref:`coverage debug <cmd_debug>`
and :ref:`cmd_run_debug` options.

- Fix: show an intelligible error message if ``--concurrency=multiprocessing``
is used without a configuration file.  Closes `issue 1320`_.

.. _issue 1320: https://github.com/nedbat/coveragepy/issues/1320
.. _issue 1326: https://github.com/nedbat/coveragepy/issues/1326


.. _changes_631:

6.3.1
--------------------------

- Fix: deadlocks could occur when terminating processes.  Some of these
deadlocks (described in `issue 1310`_) are now fixed.

- Fix: a signal handler was being set from multiple threads, causing an error:
"ValueError: signal only works in main thread".  This is now fixed, closing
`issue 1312`_.

- Fix: ``--precision`` on the command-line was being ignored while considering
``--fail-under``.  This is now fixed, thanks to
`Marcelo Trylesinski <pull 1317_>`_.

- Fix: releases no longer provide 3.11.0-alpha wheels. Coverage.py uses CPython
internal fields which are moving during the alpha phase. Fixes `issue 1316`_.
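
One way to make ``--fail-under`` agree with the displayed total is to round the total to the reporting precision before comparing, while treating a threshold of 100 as exact. The sketch below assumes that rule; ``should_fail_under`` is a hypothetical helper rather than a quote of coverage.py's code.

```python
def should_fail_under(total: float, fail_under: float, precision: int) -> bool:
    """Hypothetical helper: True if `total` misses the --fail-under threshold."""
    # A threshold of 100 means every line: 99.99 must fail even though it
    # would display as "100" at precision 0.
    if fail_under == 100.0 and total != 100.0:
        return True
    # Otherwise compare the total as it would be displayed.
    return round(total, precision) < fail_under


print(should_fail_under(79.96, 80, precision=1))   # False: displays as 80.0
print(should_fail_under(79.94, 80, precision=1))   # True: displays as 79.9
print(should_fail_under(99.99, 100, precision=0))  # True: 100 must be exact
```

Under this rule the same total can pass or fail depending on ``--precision``: 79.96 passes ``--fail-under=80`` at precision 1 but fails at precision 2.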

.. _issue 1310: https://github.com/nedbat/coveragepy/issues/1310
.. _issue 1312: https://github.com/nedbat/coveragepy/issues/1312
.. _issue 1316: https://github.com/nedbat/coveragepy/issues/1316
.. _pull 1317: https://github.com/nedbat/coveragepy/pull/1317


.. _changes_63:

6.3
------------------------

- Feature: Added the ``lcov`` command to generate reports in LCOV format.
Thanks, `Bradley Burns <pull 1289_>`_. Closes issues `587 <issue 587_>`_
and `626 <issue 626_>`_.

- Feature: the coverage data file can now be specified on the command line with
the ``--data-file`` option in any command that reads or writes data.  This is
in addition to the existing ``COVERAGE_FILE`` environment variable.  Closes
`issue 624`_. Thanks, `Nikita Bloshchanevich <pull 1304_>`_.

- Feature: coverage measurement data will now be written when a SIGTERM signal
is received by the process.  This includes
:meth:`Process.terminate <python:multiprocessing.Process.terminate>`,
and other ways to terminate a process.  Currently this is only on Linux and
Mac; Windows is not supported.  Fixes `issue 1307`_.

- Dropped support for Python 3.6, which reached end-of-life on 2021-12-23.

- Updated Python 3.11 support to 3.11.0a4, fixing `issue 1294`_.

- Fix: the coverage data file is now created in a more robust way, to avoid
problems when multiple processes are trying to write data at once. Fixes
issues `1303 <issue 1303_>`_ and `883 <issue 883_>`_.

- Fix: a .gitignore file will only be written into the HTML report output
directory if the directory is empty.  This should prevent certain unfortunate
accidents of writing the file where it is not wanted.

- Releases now have MacOS arm64 wheels for Apple Silicon, fixing `issue 1288`_.
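
The "only write into an empty directory" rule for the report's ``.gitignore`` can be sketched as a check before writing. ``maybe_write_gitignore`` is a hypothetical helper; the file contents here (a single ``*`` pattern, which tells git to ignore everything in that directory) are an assumption about what such a file would hold.

```python
import os


def maybe_write_gitignore(report_dir: str) -> bool:
    """Hypothetical helper: add a .gitignore only to an empty report directory.

    Returns True if the file was written.
    """
    os.makedirs(report_dir, exist_ok=True)
    if os.listdir(report_dir):
        # Existing contents (possibly someone's own .gitignore): leave them alone.
        return False
    with open(os.path.join(report_dir, ".gitignore"), "w") as f:
        f.write("*\n")  # ignore everything in this directory, this file included
    return True
```

Checking for an empty directory also covers the 6.2 fix: a pre-existing ``.gitignore`` makes the directory non-empty, so it is never overwritten.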

.. _issue 587: https://github.com/nedbat/coveragepy/issues/587
.. _issue 624: https://github.com/nedbat/coveragepy/issues/624
.. _issue 626: https://github.com/nedbat/coveragepy/issues/626
.. _issue 883: https://github.com/nedbat/coveragepy/issues/883
.. _issue 1288: https://github.com/nedbat/coveragepy/issues/1288
.. _issue 1294: https://github.com/nedbat/coveragepy/issues/1294
.. _issue 1303: https://github.com/nedbat/coveragepy/issues/1303
.. _issue 1307: https://github.com/nedbat/coveragepy/issues/1307
.. _pull 1289: https://github.com/nedbat/coveragepy/pull/1289
.. _pull 1304: https://github.com/nedbat/coveragepy/pull/1304


.. _changes_62:

6.2
------------------------

- Feature: The ``--concurrency`` setting can now have a list of values, so
that threads and another lightweight threading package can be measured
together, such as ``--concurrency=gevent,thread``.  Closes `issue 1012`_ and
`issue 1082`_.

- Fix: A module specified as the ``source`` setting is imported during startup,
before the user program imports it.  This could cause problems if the rest of
the program isn't ready yet.  For example, `issue 1203`_ describes a Django
setting that is accessed before settings have been configured.  Now the early
import is wrapped in a try/except so errors then don't stop execution.

- Fix: A colon in a decorator expression would cause an exclusion to end too
early, preventing the exclusion of the decorated function. This is now fixed.

- Fix: The HTML report now will not overwrite a .gitignore file that already
exists in the HTML output directory (follow-on for `issue 1244`_).

- API: The exceptions raised by Coverage.py have been specialized, to provide
finer-grained catching of exceptions by third-party code.

- API: Using ``suffix=False`` when constructing a Coverage object with
multiprocessing wouldn't suppress the data file suffix (`issue 989`_).  This
is now fixed.

- Debug: The ``coverage debug data`` command will now sniff out combinable data
files, and report on all of them.

- Debug: The ``coverage debug`` command used to accept a number of topics at a
time, and show all of them, though this was never documented.  This no longer
works, to allow for command-line options in the future.
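
A comma-separated ``--concurrency`` value like ``gevent,thread`` has to be split and checked before use. The sketch below does that. It is hypothetical: `parse_concurrency` is not a coverage.py function. The accepted names follow the choices coverage.py documents. The rule that at most one of gevent, greenlet, and eventlet may appear is an assumption, not a copy of its validation code.

```python
from typing import List

# Assumed set of accepted names, following coverage.py's documented choices.
VALID_CHOICES = {"thread", "multiprocessing", "gevent", "greenlet", "eventlet"}
# Greenlet-style libraries; this sketch allows at most one of them at a time.
LIGHTWEIGHT = {"gevent", "greenlet", "eventlet"}


def parse_concurrency(value: str) -> List[str]:
    """Split a value such as "gevent,thread" into a validated list of names."""
    names = [part.strip() for part in value.split(",") if part.strip()]
    unknown = [name for name in names if name not in VALID_CHOICES]
    if unknown:
        raise ValueError(f"Unknown concurrency choices: {', '.join(unknown)}")
    lightweight = sorted(set(names) & LIGHTWEIGHT)
    if len(lightweight) > 1:
        raise ValueError(f"Conflicting concurrency settings: {', '.join(lightweight)}")
    return names


print(parse_concurrency("gevent,thread"))  # ['gevent', 'thread']
```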

.. _issue 989: https://github.com/nedbat/coveragepy/issues/989
.. _issue 1012: https://github.com/nedbat/coveragepy/issues/1012
.. _issue 1082: https://github.com/nedbat/coveragepy/issues/1082
.. _issue 1203: https://github.com/nedbat/coveragepy/issues/1203
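
The try/except guard from the ``source`` fix in the 6.2 notes is a general pattern for optional early imports. A minimal sketch follows. The name `try_early_import` is hypothetical. The notes only say that errors no longer stop execution, so issuing a warning on failure is this sketch's own choice. The broad `except Exception` is deliberate, because module-level code (such as a Django settings access) can raise almost anything, not only `ImportError`.

```python
import importlib
import warnings
from types import ModuleType
from typing import Optional


def try_early_import(module_name: str) -> Optional[ModuleType]:
    """Import a module ahead of the user's program without letting it crash startup."""
    try:
        return importlib.import_module(module_name)
    except Exception as exc:  # not just ImportError: module code can raise anything
        warnings.warn(f"Couldn't import {module_name!r} early: {exc!r}")
        return None
```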


.. _changes_612:

6.1.2
--------------------------

- Python 3.11 is supported (tested with 3.11.0a2).  One still-open issue has to
do with `exits through with-statements <issue 1270_>`_.

- Fix: When remapping file paths through the ``[paths]`` setting while
combining, the ``[run] relative_files`` setting was ignored, resulting in
absolute paths for remapped file names (`issue 1147`_).  This is now fixed.

- Fix: Complex conditionals over excluded lines could have incorrectly reported
a missing branch (`issue 1271`_). This is now fixed.

- Fix: More exceptions are now handled when trying to parse source files for
reporting.  Problems that used to terminate coverage.py can now be handled
with ``[report] ignore_errors``.  This helps with plugins failing to read
files (`django_coverage_plugin issue 78`_).

- Fix: Removed another vestige of jQuery from the source tarball
(`issue 840`_).

- Fix: Added a default value for a new-to-6.x argument of an internal class.
This unsupported class is being used by coveralls (`issue 1273`_). Although
I'd rather not "fix" unsupported interfaces, it's actually nicer with a
default value.

.. _django_coverage_plugin issue 78: https://github.com/nedbat/django_coverage_plugin/issues/78
.. _issue 1147: https://github.com/nedbat/coveragepy/issues/1147
.. _issue 1270: https://github.com/nedbat/coveragepy/issues/1270
.. _issue 1271: https://github.com/nedbat/coveragepy/issues/1271
.. _issue 1273: https://github.com/nedbat/coveragepy/issues/1273


.. _changes_611:

6.1.1
--------------------------

- Fix: The sticky header on the HTML report didn't work unless you had branch
coverage enabled. This is now fixed: the sticky header works for everyone.
(Do people still use coverage without branch measurement!? j/k)

- Fix: When using explicitly declared namespace packages, the "already imported
a file that will be measured" warning would be issued (`issue 888`_).  This
is now fixed.

.. _issue 888: https://github.com/nedbat/coveragepy/issues/888


.. _changes_61:

6.1
------------------------

- Deprecated: The ``annotate`` command and the ``Coverage.annotate`` function
will be removed in a future version, unless people let me know that they are
using it.  Instead, the ``html`` command gives better-looking (and more
accurate) output, and the ``report -m`` command will tell you line numbers of
missing lines.  Please get in touch if you have a reason to use ``annotate``
over those better options: ned@nedbatchelder.com.

- Feature: Coverage now sets an environment variable, ``COVERAGE_RUN``, when
running your code with the ``coverage run`` command.  The value is not
important, and may change in the future.  Closes `issue 553`_.

- Feature: The HTML report pages for Python source files now have a sticky
header so the file name and controls are always visible.

- Feature: The ``xml`` and ``json`` commands now describe what they wrote
where.

- Feature: The ``html``, ``combine``, ``xml``, and ``json`` commands all accept
a ``-q/--quiet`` option to suppress the messages they write to stdout about
what they are doing (`issue 1254`_).

- Feature: The ``html`` command writes a ``.gitignore`` file into the HTML
output directory, to prevent the report from being committed to git.  If you
want to commit it, you will need to delete that file.  Closes `issue 1244`_.

- Feature: Added support for PyPy 3.8.

- Fix: More generated code is now excluded from measurement.  Code such as
`attrs`_ boilerplate, or doctest code, was being measured though the
synthetic line numbers meant they were never reported.  Once Cython was
involved though, the generated .so files were parsed as Python, raising
syntax errors, as reported in `issue 1160`_.  This is now fixed.

- Fix: When sorting human-readable names, numeric components are sorted
correctly: file10.py will appear after file9.py.  This applies to file names,
module names, environment variables, and test contexts.

- Performance: Branch coverage measurement is faster, though you might only
notice on code that is executed many times, such as long-running loops.

- Build: jQuery is no longer used or vendored (`issue 840`_ and `issue 1118`_).
Huge thanks to Nils Kattenbeck (septatrix) for the conversion to vanilla
JavaScript in `pull request 1248`_.
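
The notes say only that ``COVERAGE_RUN`` is set and that its value may change. Code that wants to know whether it runs under ``coverage run`` should therefore check whether the variable is present, not what it contains. A minimal sketch with a hypothetical helper name:

```python
import os


def running_under_coverage_run() -> bool:
    """True when started by `coverage run` (coverage.py 6.1 or later).

    Only presence is checked; the value is documented as unimportant.
    """
    return "COVERAGE_RUN" in os.environ


if running_under_coverage_run():
    print("This process is being measured by coverage run.")
```

A False result does not prove coverage is absent. Versions before 6.1 do not set the variable, and measurement started by other means (for example through the API) may not set it either.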

.. _issue 553: https://github.com/nedbat/coveragepy/issues/553
.. _issue 840: https://github.com/nedbat/coveragepy/issues/840
.. _issue 1118: https://github.com/nedbat/coveragepy/issues/1118
.. _issue 1160: https://github.com/nedbat/coveragepy/issues/1160
.. _issue 1244: https://github.com/nedbat/coveragepy/issues/1244
.. _pull request 1248: https://github.com/nedbat/coveragepy/pull/1248
.. _issue 1254: https://github.com/nedbat/coveragepy/issues/1254
.. _attrs: https://www.attrs.org/
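
The 6.1 notes describe a numeric-aware ordering in which file10.py sorts after file9.py. This is usually called a natural or human sort. The sketch below shows one common way to build the sort key. It is not necessarily how coverage.py does it.

```python
import re
from typing import List, Union


def human_key(text: str) -> List[Union[int, str]]:
    """Sort key that compares runs of digits as numbers."""
    # With a capturing group, re.split keeps the digit runs; they always land
    # at odd indexes, so ints and strs are never compared with each other.
    parts = re.split(r"(\d+)", text)
    return [int(part) if i % 2 else part for i, part in enumerate(parts)]


print(sorted(["file10.py", "file9.py", "file1.py"], key=human_key))
# ['file1.py', 'file9.py', 'file10.py']
```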


.. _changes_602:

6.0.2
--------------------------

- Namespace packages being measured weren't properly handled by the new code
that ignores third-party packages. If the namespace package was installed, it
was ignored as a third-party package.  That problem (`issue 1231`_) is now
fixed.

- Packages named as "source packages" (with ``source``, or ``source_pkgs``, or
pytest-cov's ``--cov``) might have been only partially measured.  Their
top-level statements could be marked as un-executed, because they were
imported by coverage.py before measurement began (`issue 1232`_).  This is
now fixed, but the package will be imported twice, once by coverage.py, then
again by your test suite.  This could cause problems if importing the package
has side effects.

- The :meth:`.CoverageData.contexts_by_lineno` method was documented to return
a dict, but was returning a defaultdict.  Now it returns a plain dict.  It
also no longer returns negative numbered keys.
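
You can see what the ``contexts_by_lineno`` fix changes without a coverage data file. In this standalone sketch the data is made up. The negative key only stands in for the internal bookkeeping entries that the method no longer returns.

```python
from collections import defaultdict

raw = defaultdict(list)
raw[3].append("test_a")
raw[-1].append("test_a")  # made-up negative key, standing in for internal entries

# What the fixed method returns: a plain dict with only real line numbers.
result = {lineno: contexts for lineno, contexts in raw.items() if lineno > 0}
print(result)  # {3: ['test_a']}

# Why a defaultdict was a surprising return value: looking up a line that
# was never recorded quietly inserts it instead of raising KeyError.
print(raw[99], 99 in raw)  # [] True
try:
    result[99]
except KeyError:
    print("KeyError")  # a plain dict reports the missing line
```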

.. _issue 1231: https://github.com/nedbat/coveragepy/issues/1231
.. _issue 1232: https://github.com/nedbat/coveragepy/issues/1232


.. _changes_601:

6.0.1
--------------------------

- In 6.0, the coverage.py exceptions moved from coverage.misc to
coverage.exceptions. These exceptions are not part of the public supported
API; CoverageException is. But a number of other third-party packages were
importing the exceptions from coverage.misc, so they are now available from
there again (`issue 1226`_).

- Changed an internal detail of how tomli is imported, so that tomli can use
coverage.py for their own test suite (`issue 1228`_).

- Defend against an obscure possibility under code obfuscation, where a
function can have an argument called "self", but no local named "self"
(`pull request 1210`_).  Thanks, Ben Carlsson.

.. _pull request 1210: https://github.com/nedbat/coveragepy/pull/1210
.. _issue 1226: https://github.com/nedbat/coveragepy/issues/1226
.. _issue 1228: https://github.com/nedbat/coveragepy/issues/1228


.. _changes_60:

6.0
------------------------

- The ``coverage html`` command now prints a message indicating where the HTML
report was written.  Fixes `issue 1195`_.

- The ``coverage combine`` command now prints messages indicating each data
file being combined.  Fixes `issue 1105`_.

- The HTML report now includes a sentence about skipped files due to
``skip_covered`` or ``skip_empty`` settings.  Fixes `issue 1163`_.

- Unrecognized options in the configuration file are no longer errors. They are
now warnings, to ease the use of coverage across versions.  Fixes `issue
1035`_.

- Fix handling of exceptions through context managers in Python 3.10. A missing
exception is no longer considered a missing branch from the with statement.
Fixes `issue 1205`_.

- Fix another rarer instance of "Error binding parameter 0 - probably
unsupported type." (`issue 1010`_).

- Creating a directory for the coverage data file now is safer against
conflicts when two coverage runs happen simultaneously (`pull 1220`_).
Thanks, Clément Pit-Claudel.
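
The race in the last 6.0 item is the classic check-then-create problem. The pull request itself isn't reproduced here. The sketch below only shows the standard-library idiom that avoids the race, using a hypothetical helper name.

```python
import os
import tempfile


def ensure_data_dir(path: str) -> None:
    """Create the directory for a data file, tolerating concurrent creators."""
    # Racy version: `if not os.path.exists(path): os.makedirs(path)`.  A second
    # process can create the directory between the check and the call, which
    # then raises FileExistsError.  exist_ok=True makes creation idempotent.
    os.makedirs(path, exist_ok=True)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "data", "coverage")
        ensure_data_dir(target)
        ensure_data_dir(target)  # second call is a no-op, not an error
        print(os.path.isdir(target))  # True
```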

.. _issue 1035: https://github.com/nedbat/coveragepy/issues/1035
.. _issue 1105: https://github.com/nedbat/coveragepy/issues/1105
.. _issue 1163: https://github.com/nedbat/coveragepy/issues/1163
.. _issue 1195: https://github.com/nedbat/coveragepy/issues/1195
.. _issue 1205: https://github.com/nedbat/coveragepy/issues/1205
.. _pull 1220: https://github.com/nedbat/coveragepy/pull/1220


.. _changes_60b1:

6.0b1
--------------------------

- Dropped support for Python 2.7, PyPy 2, and Python 3.5.

- Added support for the Python 3.10 ``match/case`` syntax.

- Data collection is now thread-safe.  There may have been rare instances of
exceptions raised in multi-threaded programs.

- Plugins (like the `Django coverage plugin`_) were generating "Already
imported a file that will be measured" warnings about Django itself.  These
have been fixed, closing `issue 1150`_.

- Warnings generated by coverage.py are now real Python warnings.

- Using ``--fail-under=100`` with coverage near 100% could result in the
self-contradictory message :code:`total of 100 is less than fail-under=100`.
This bug (`issue 1168`_) is now fixed.

- The ``COVERAGE_DEBUG_FILE`` environment variable now accepts ``stdout`` and
``stderr`` to write to those destinations.

- TOML parsing now uses the `tomli`_ library.

- Some minor changes to usually invisible details of the HTML report:

  - Use a modern hash algorithm when fingerprinting, for high-security
    environments (`issue 1189`_).  When generating the HTML report, we save the
    hash of the data, to avoid regenerating an unchanged HTML page. We used to
    use MD5 to generate the hash, and now use SHA-3-256.  This was never a
    security concern, but security scanners would notice the MD5 algorithm and
    raise a false alarm.

  - Change how report file names are generated, to avoid leading underscores
    (`issue 1167`_), to avoid rare file name collisions (`issue 584`_), and to
    avoid file names becoming too long (`issue 580`_).
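
The self-contradictory ``--fail-under`` message in the 6.0b1 notes comes from rounding. At zero decimal places, a total such as 99.99 formats as "100", yet it is still below 100. The sketch below shows one way to keep the displayed number and the check consistent. The function names are hypothetical, and the clamping rule is modeled on coverage.py's behavior, not copied from it.

```python
def display_percent(pc: float, precision: int = 0) -> str:
    """Format a percentage so values strictly between 0 and 100 never
    display as "0" or "100" at the chosen precision."""
    near0 = 1.0 / 10 ** precision
    near100 = 100.0 - near0
    if 0 < pc < near0:
        pc = near0
    elif near100 < pc < 100:
        pc = near100
    return f"{pc:.{precision}f}"


def should_fail_under(total: float, fail_under: float, precision: int = 0) -> bool:
    """Decide the --fail-under check with the same rounding the report shows."""
    if fail_under == 100.0:
        return total != 100.0  # 100 has to mean exactly 100
    return round(total, precision) < fail_under


print(f"{99.99:.0f}")                 # 100  (naive formatting)
print(display_percent(99.99))         # 99
print(should_fail_under(99.99, 100))  # True
```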

.. _Django coverage plugin: https://pypi.org/project/django-coverage-plugin/
.. _issue 580: https://github.com/nedbat/coveragepy/issues/580
.. _issue 584: https://github.com/nedbat/coveragepy/issues/584
.. _issue 1150: https://github.com/nedbat/coveragepy/issues/1150
.. _issue 1167: https://github.com/nedbat/coveragepy/issues/1167
.. _issue 1168: https://github.com/nedbat/coveragepy/issues/1168
.. _issue 1189: https://github.com/nedbat/coveragepy/issues/1189
.. _tomli: https://pypi.org/project/tomli/
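
Both HTML-report details in the 6.0b1 notes can be sketched with ``hashlib``. A SHA-3-256 fingerprint decides whether a page must be rewritten. Hashing the directory part of a path gives short, collision-resistant file names. This is an illustration under stated assumptions, not coverage.py's code: the ``d_`` prefix, the 16-character digest, and all function names are hypothetical.

```python
import hashlib
import ntpath
from typing import Dict

_previous: Dict[str, str] = {}  # page name -> fingerprint from the last run


def needs_writing(page: str, data: bytes) -> bool:
    """Return True if `data` differs from what was last written for `page`."""
    fingerprint = hashlib.sha3_256(data).hexdigest()
    if _previous.get(page) == fingerprint:
        return False
    _previous[page] = fingerprint
    return True


def flat_rootname(filename: str) -> str:
    """Map a source path to a report file stem.

    The directory part is replaced by a fixed-length hash, so deep paths
    can't produce overlong names and two directories holding the same file
    name don't collide; the "d_" prefix keeps such names from starting
    with an underscore.
    """
    dirname, basename = ntpath.split(filename)  # handles "/" and "\\"
    stem = basename.replace(".", "_")
    if not dirname:
        return stem
    digest = hashlib.sha3_256(dirname.encode("utf-8")).hexdigest()[:16]
    return f"d_{digest}_{stem}"
```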


.. _changes_56b1:

5.6b1
--------------------------

Note: 5.6 final was never released. These changes are part of 6.0.

- Third-party packages are now ignored in coverage reporting.  This solves a
  few problems:

  - Coverage will no longer report about other people's code (`issue 876`_).
    This is true even when using ``--source=.`` with a venv in the current
    directory.

  - Coverage will no longer generate "Already imported a file that will be
    measured" warnings about coverage itself (`issue 905`_).

- The HTML report uses j/k to move up and down among the highlighted chunks of
code.  They used to highlight the current chunk, but 5.0 broke that behavior.
Now the highlighting is working again.

- The JSON report now includes ``percent_covered_display``, a string with the
total percentage, rounded to the same number of decimal places as the other
reports' totals.
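
Deciding what counts as third-party code usually starts from the interpreter's install directories. The sketch below finds them with ``sysconfig``. The helper names are hypothetical. Coverage.py's actual rules are more involved; for example, packages you name explicitly as source are still measured.

```python
import os
import sysconfig
from typing import Set


def third_party_dirs() -> Set[str]:
    """Directories where installed (third-party) packages live."""
    paths = sysconfig.get_paths()
    return {os.path.abspath(paths[key]) for key in ("purelib", "platlib") if key in paths}


def is_third_party(filename: str) -> bool:
    """True if `filename` sits inside an install directory such as site-packages."""
    path = os.path.abspath(filename)
    for directory in third_party_dirs():
        try:
            if os.path.commonpath([path, directory]) == directory:
                return True
        except ValueError:  # e.g. paths on different Windows drives
            continue
    return False
```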

.. _issue 876: https://github.com/nedbat/coveragepy/issues/876
.. _issue 905: https://github.com/nedbat/coveragepy/issues/905


.. _changes_55:

5.5
------------------------

- ``coverage combine`` has a new option, ``--keep`` to keep the original data
files after combining them.  The default is still to delete the files after
they have been combined.  This was requested in `issue 1108`_ and implemented
in `pull request 1110`_.  Thanks, Éric Larivière.

- When reporting missing branches in ``coverage report``, branches that jump to
missing lines aren't reported.  This adds to the long-standing behavior of not
reporting branches from missing lines.  Now branches are only reported if both
the source and destination lines are executed.  Closes both `issue 1065`_ and
`issue 955`_.

- Minor improvements to the HTML report:

  - The state of the line visibility selector buttons is saved in local storage
    so you don't have to fiddle with them so often, fixing `issue 1123`_.

  - It has a little more room for line numbers so that 4-digit numbers work
    well, fixing `issue 1124`_.

- Improved the error message when combining line and branch data, so that users
will be more likely to understand what's happening, closing `issue 803`_.
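
The missing-branch rule in the 5.5 notes reports a branch only when both its source and destination lines ran. That rule amounts to a simple filter. A minimal sketch follows, assuming branches are plain ``(from_line, to_line)`` pairs. Exit arcs, which coverage.py tracks with negative line numbers, are left out. The function name is hypothetical.

```python
from typing import Iterable, List, Set, Tuple

Arc = Tuple[int, int]


def reportable_missing_branches(
    possible: Iterable[Arc], executed_arcs: Set[Arc], executed_lines: Set[int]
) -> List[Arc]:
    """Missing branches worth reporting: both endpoints ran, the jump didn't."""
    return sorted(
        (src, dst)
        for src, dst in possible
        if (src, dst) not in executed_arcs
        and src in executed_lines
        and dst in executed_lines
    )


possible = {(1, 2), (1, 4), (2, 3), (2, 4), (3, 4)}
executed_arcs = {(1, 2), (2, 4)}
executed_lines = {1, 2, 4}
print(reportable_missing_branches(possible, executed_arcs, executed_lines))
# [(1, 4)] -- (2, 3) and (3, 4) touch line 3, which never ran
```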

.. _issue 803: https://github.com/nedbat/coveragepy/issues/803
.. _issue 955: https://github.com/nedbat/coveragepy/issues/955
.. _issue 1065: https://github.com/nedbat/coveragepy/issues/1065
.. _issue 1108: https://github.com/nedbat/coveragepy/issues/1108
.. _pull request 1110: https://github.com/nedbat/coveragepy/pull/1110
.. _issue 1123: https://github.com/nedbat/coveragepy/issues/1123
.. _issue 1124: https://github.com/nedbat/coveragepy/issues/1124


.. _changes_54:

5.4
------------------------

- The text report produced by ``coverage report`` now always outputs a TOTAL
line, even if only one Python file is reported.  This makes regex parsing
of the output easier.  Thanks, Judson Neer.  This had been requested a number
of times (`issue 1086`_, `issue 922`_, `issue 732`_).

- The ``skip_covered`` and ``skip_empty`` settings in the configuration file
can now be specified in the ``[html]`` section, so that text reports and HTML
reports can use separate settings.  The HTML report will still use the
``[report]`` settings if there isn't a value in the ``[html]`` section.
Closes `issue 1090`_.

- Combining files on Windows across drives now works properly, fixing `issue
577`_.  Thanks, `Valentin Lab <pr1080_>`_.

- Fix an obscure warning from deep in the _decimal module, as reported in
`issue 1084`_.

- Update to support Python 3.10 alphas in progress, including `PEP 626: Precise
line numbers for debugging and other tools <pep626_>`_.
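
Because a TOTAL row is now always present, a script can pull the overall percentage out of ``coverage report`` output with one regular expression. A sketch follows. The sample report text is illustrative. The pattern assumes the percentage is the last field on the TOTAL row.

```python
import re
from typing import Optional

# "TOTAL", any columns, then a percentage such as "90%" or "87.50%" at line end.
TOTAL_RE = re.compile(r"^TOTAL\b.*?(\d+(?:\.\d+)?)%\s*$", re.MULTILINE)


def total_percent(report_text: str) -> Optional[float]:
    """Return the TOTAL percentage from a text report, or None if absent."""
    match = TOTAL_RE.search(report_text)
    return float(match.group(1)) if match else None


sample = """\
Name     Stmts   Miss  Cover
----------------------------
app.py      20      2    90%
----------------------------
TOTAL       20      2    90%
"""
print(total_percent(sample))  # 90.0
```

Before this change, a one-file report had no TOTAL row, so the same call would have returned None.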

.. _issue 577: https://github.com/nedbat/coveragepy/issues/577
.. _issue 732: https://github.com/nedbat/coveragepy/issues/732
.. _issue 922: https://github.com/nedbat/coveragepy/issues/922
.. _issue 1084: https://github.com/nedbat/coveragepy/issues/1084
.. _issue 1086: https://github.com/nedbat/coveragepy/issues/1086
.. _issue 1090: https://github.com/nedbat/coveragepy/issues/1090
.. _pr1080: https://github.com/nedbat/coveragepy/pull/1080
.. _pep626: https://www.python.org/dev/peps/pep-0626/
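
The 5.4 notes describe looking a setting up in ``[html]`` first and then in ``[report]``. That lookup can be modeled with ``configparser``. This is a simplified sketch. Coverage.py also reads ``setup.cfg``, ``tox.ini``, and TOML files with their own section prefixes. The helper name and the ``False`` default are assumptions.

```python
import configparser


def html_setting(config: configparser.ConfigParser, option: str) -> bool:
    """Read a boolean option for the HTML report, falling back to [report]."""
    for section in ("html", "report"):
        if config.has_option(section, option):
            return config.getboolean(section, option)
    return False  # assumed default when neither section sets it


cfg = configparser.ConfigParser()
cfg.read_string("""
[report]
skip_covered = True
skip_empty = True

[html]
skip_covered = False
""")
print(html_setting(cfg, "skip_covered"))  # False: [html] wins
print(html_setting(cfg, "skip_empty"))    # True: falls back to [report]
```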


.. _changes_531:

5.3.1
--------------------------

- When using ``--source`` on a large source tree, v5.x was slower than previous
versions.  This performance regression is now fixed, closing `issue 1037`_.

- Mysterious SQLite errors can happen on PyPy, as reported in `issue 1010`_. An
immediate retry seems to fix the problem, although it is an unsatisfying
solution.

- The HTML report now saves the sort order in a more widely supported way,
fixing `issue 986`_.  Thanks, Sebastián Ramírez (`pull request 1066`_).

- The HTML report pages now have a :ref:`Sleepy Snake <sleepy>` favicon.

- Wheels are now provided for manylinux2010, and for PyPy3 (pp36 and pp37).

- Continuous integration has moved from Travis and AppVeyor to GitHub Actions.
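
An "immediate retry" like the one in the 5.3.1 notes can be written as a small wrapper. In this sketch the helper name is hypothetical, and catching ``sqlite3.Error`` is an assumption. The notes don't say which errors coverage.py retries or where.

```python
import sqlite3
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_once(operation: Callable[[], T]) -> T:
    """Run `operation`; if SQLite raises, try exactly once more."""
    try:
        return operation()
    except sqlite3.Error:
        # Immediate retry, no delay; a second failure propagates to the caller.
        return operation()


conn = sqlite3.connect(":memory:")
print(retry_once(lambda: conn.execute("select 1").fetchone()))  # (1,)
conn.close()
```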

.. _issue 986: https://github.com/nedbat/coveragepy/issues/986
.. _issue 1037: https://github.com/nedbat/coveragepy/issues/1037
.. _issue 1010: https://github.com/nedbat/coveragepy/issues/1010
.. _pull request 1066: https://github.com/nedbat/coveragepy/pull/1066

.. _changes_53:

5.3
------------------------

- The ``source`` setting has always been interpreted as either a file path or a
module, depending on which existed.  If both interpretations were valid, it
was assumed to be a file path.  The new ``source_pkgs`` setting can be used
to name a package to disambiguate this case.  Thanks, Thomas Grainger. Fixes
`issue 268`_.

- If a plugin was disabled due to an exception, we used to still try to record
its information, causing an exception, as reported in `issue 1011`_.  This is
now fixed.
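
The sketch below shows the guess that a ``source`` value forces, which is the ambiguity ``source_pkgs`` resolves. A value is treated as a path if one exists, and only otherwise as an importable module. The function name and return labels are hypothetical. Note that ``importlib.util.find_spec`` imports the parent packages of a dotted name, which can have side effects.

```python
import importlib.util
import os


def interpret_source(value: str) -> str:
    """Guess what a `source` entry means: an existing path wins over a module."""
    if os.path.exists(value):
        return "path"
    try:
        if importlib.util.find_spec(value) is not None:
            return "module"
    except (ImportError, ValueError):
        # e.g. a dotted name whose parent package is missing
        pass
    return "unknown"


print(interpret_source("json"))  # module (unless ./json exists)
```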

.. _issue 268: https://github.com/nedbat/coveragepy/issues/268
.. _issue 1011: https://github.com/nedbat/coveragepy/issues/1011


.. endchangesinclude

Older changes
-------------

The complete history is available in the `coverage.py docs`__.

__ https://coverage.readthedocs.io/en/latest/changes.html

Links

PyPI: https://pypi.org/project/coverage
Changelog: https://pyup.io/changelogs/coverage/
Repo: https://github.com/nedbat/coveragepy

opened by pyup-bot

Pin lz4 to latest version 4.3.1

This PR pins lz4 to the latest release 4.3.1.

Changelog

1.9.4

perf : faster decoding speed (~+20%) on aarch64 platforms
perf : faster decoding speed (~+70%) for -BD4 setting in CLI
api  : new function `LZ4_decompress_safe_partial_usingDict()` by yawqi
api  : lz4frame: ability to provide custom allocators at state creation
api  : can skip checksum validation for improved decoding speed
api  : new experimental unit `lz4file` for file i/o API, by anjiahao1
api  : new experimental function `LZ4F_uncompressedUpdate()`, by alexmohr
cli  : `--list` works on `stdin` input, by Low-power
cli  : `--no-crc` does not produce (compression) nor check (decompression) checksums
cli  : fix: `--test` and `--list` produce an error code when parsing invalid input
cli  : fix: `--test -m` no longer creates decompressed file artifacts
cli  : fix: support skippable frames when passed via `stdin`, reported by davidmankin
build: fix: Makefile respects CFLAGS directives passed via environment variable
build: `LZ4_FREESTANDING`, new build macro for freestanding environments, by t-mat
build: `make` and `make test` are compatible with `-j` parallel run
build: AS/400 compatibility, by jonrumsey
build: Solaris 10 compatibility, by pekdon
build: MSVC 2022 support, by t-mat
build: improved meson script, by eli-schwartz
doc  : Updated LZ4 block format, provide an "implementation notes" section

1.9.3

perf: highly improved speed in kernel space, by terrelln
perf: faster speed with Visual Studio, thanks to wolfpld and remittor
perf: improved dictionary compression speed, by felixhandte
perf: fixed LZ4_compress_HC_destSize() ratio, detected by hsiangkao
perf: reduced stack usage in high compression mode, by Yanpas
api : LZ4_decompress_safe_partial() supports unknown compressed size, requested by jfkthame
api : improved LZ4F_compressBound() with automatic flushing, by Christopher Harvie
api : can (de)compress to/from NULL without UBs
api : fix alignment test on 32-bit systems (state initialization)
api : fix LZ4_saveDictHC() in corner case scenario, detected by IgorKorkin
cli : `-l` legacy format is now compatible with `-m` multiple files, by Filipe Calasans
cli : benchmark mode supports dictionary, by rkoradi
cli : fix --fast with large argument, detected by picoHz
build: link to user-defined memory functions with LZ4_USER_MEMORY_FUNCTIONS, suggested by Yuriy Levchenko
build: contrib/cmake_unofficial/ moved to build/cmake/
build: visual/* moved to build/
build: updated meson script, by neheb
build: tinycc support, by Anton Kochkov
install: Haiku support, by Jerome Duval
doc : updated LZ4 frame format, clarify EndMark

1.9.2

fix : out-of-bound read in exceptional circumstances when using decompress_partial(), by terrelln
fix : slim opportunity for out-of-bound write with compress_fast() with a large enough input and when providing an output smaller than recommended (< LZ4_compressBound(inputSize)), by terrelln
fix : rare data corruption bug with LZ4_compress_destSize(), by terrelln
fix : data corruption bug when Streaming with an Attached Dict in HC Mode, by felixhandte
perf: enable LZ4_FAST_DEC_LOOP on aarch64/GCC by default, by prekageo
perf: improved lz4frame streaming API speed, by dreambottle
perf: speed up lz4hc on slow patterns when using external dictionary, by terrelln
api: better in-place decompression and compression support
cli : --list supports multi-frames files, by gstedman
cli: --version outputs to stdout
cli : add option --best as an alias of -12, by Low-power
misc: Integration into oss-fuzz by cmeister2, expanded list of scenarios by terrelln

1.9.1

fix : decompression functions were reading a few bytes beyond input size (introduced in v1.9.0, reported by ppodolsky and danlark1)
api : fix : lz4frame initializers compatibility with c++, reported by degski
cli : added command --list, based on a patch by gabrielstedman
build: improved Windows build, by JPeterMugaas
build: AIX, by Norman Green

1.9.0

perf: large decompression speed improvement on x86/x64 (up to +20%) by djwatson
api : changed : _destSize() compression variants are promoted to stable API
api : new : LZ4_initStream(HC), replacing LZ4_resetStream(HC)
api : changed : LZ4_resetStream(HC) as recommended reset function, for better performance on small data
cli : support custom block sizes, by blezsan
build: source code can be amalgamated, by Bing Xu
build: added meson build, by lzutao
build: new build macros : LZ4_DISTANCE_MAX, LZ4_FAST_DEC_LOOP
install: MidnightBSD, by laffer1
install: msys2 on Windows 10, by vtorri

1.8.3

perf: minor decompression speed improvement (~+2%) with gcc
fix : corruption in v1.8.2 at level 9 for files > 64KB under rare conditions (560)
cli : new command --fast, by jennifermliu
cli : fixed elapsed time, and added cpu load indicator (on -vv) (555)
api : LZ4_decompress_safe_partial() now decodes exactly the nb of bytes requested (feature request 566)
build : added Haiku target, by fbrosson, and MidnightBSD, by laffer1
doc : updated documentation regarding dictionary compression

1.8.2

perf: *much* faster dictionary compression on small files, by felixhandte
perf: improved decompression speed and binary size, by Alexey Tourbin (svpv)
perf: slightly faster HC compression and decompression speed
perf: very small compression ratio improvement
fix : compression compatible with low memory addresses (< 0xFFFF)
fix : decompression segfault when provided with NULL input, by terrelln
cli : new command --favor-decSpeed
cli : benchmark mode more accurate for small inputs
fullbench : can bench _destSize() variants, by felixhandte
doc : clarified block format parsing restrictions, by Alexey Tourbin (svpv)

1.8.1

perf : faster and stronger ultra modes (levels 10+)
perf : slightly faster compression and decompression speed
perf : fix bad degenerative case, reported by c-morgenstern
fix : decompression failed when using a combination of extDict + low memory address (397), reported and fixed by Julian Scheid (jscheid)
cli : support for dictionary compression (`-D`), by Felix Handte felixhandte
cli : fix : `lz4 -d --rm` preserves timestamp (441)
cli : fix : do not modify /dev/null permission as root, by aliceatlas
api : `_destSize()` variant supported for all compression levels
build  : `make` and `make test` compatible with `-jX`, reported by mwgamera
build  : can control LZ4LIB_VISIBILITY macro, by mikir
install: fix man page directory (387), reported by Stuart Cardall (itoffshore)

1.8.0

cli : fix : do not modify /dev/null permissions, reported by Maokaman1
cli : added GNU separator -- specifying that all following arguments are files
API : added LZ4_compress_HC_destSize(), by Oleg (remittor)
API : added LZ4F_resetDecompressionContext()
API : lz4frame : negative compression levels trigger fast acceleration, request by Lawrence Chan
API : lz4frame : can control block checksum and dictionary ID
API : fix : expose obsolete decoding functions, reported by Chen Yufei
API : experimental : lz4frame_static : new dictionary compression API
build : fix : static lib installation, by Ido Rosen
build : dragonFlyBSD, OpenBSD, NetBSD supported
build : LZ4_MEMORY_USAGE can be modified at compile time, through external define
doc : Updated LZ4 Frame format to v1.6.0, restoring Dictionary-ID field
doc : lz4 api manual, by Przemyslaw Skibinski

1.7.5

lz4hc : new high compression mode : levels 10-12 compress more and slower, by Przemyslaw Skibinski
lz4cat : fix : works with relative path (284) and stdin (285) (reported by beiDei8z)
cli : fix minor notification when using -r recursive mode
API : lz4frame : LZ4F_frameBound(0) gives upper bound of *flush() and *End() operations (290, 280)
doc : markdown version of man page, by Takayuki Matsuoka (279)
build : Makefile : fix make -jX lib+exe concurrency (277)
build : cmake : improvements by Michał Górny (296)

1.7.4.2

fix : Makefile : release build compatible with PIE and customized compilation directives provided through environment variables (274, reported by Antoine Martin)

1.7.4

Improved : much better speed in -mx32 mode
cli : fix : Large file support in 32-bits mode on Mac OS-X
fix : compilation on gcc 4.4 (272), reported by Antoine Martin

1.7.3

Changed : moved to versioning; package, cli and library have same version number
Improved: Small decompression speed boost
Improved: Small compression speed improvement on 64-bits systems
Improved: Small compression ratio and speed improvement on small files
Improved: Significant speed boost on ARMv6 and ARMv7
Fix : better ratio on 64-bits big-endian targets
Improved cmake build script, by Evan Nemerson
New liblz4-dll project, by Przemyslaw Skibinski
Makefile: Generates object files (*.o) for faster (re)compilation on low power systems
cli : new : --rm and --help commands
cli : new : preserved file attributes, by Przemyslaw Skibinski
cli : fix : crash on some invalid inputs
cli : fix : -t correctly validates lz4-compressed files, by Nick Terrell
cli : fix : detects and reports fread() errors, thanks to Hiroshi Fujishima report 243
cli : bench : new : -r recursive mode
lz4cat : can cat multiple files in a single command line (184)
Added : doc/lz4_manual.html, by Przemyslaw Skibinski
Added : dictionary compression and frame decompression examples, by Nick Terrell
Added : Debianization, by Evgeniy Polyakov

r131
New    : Dos/DJGPP target, thanks to Louis Santillan (114)
Added  : Example using lz4frame library, by Zbigniew Jędrzejewski-Szmek (118)
Changed: xxhash symbols are modified (namespace emulation) within liblz4

r130:
Fixed  : incompatibility sparse mode vs console, reported by Yongwoon Cho (105)
Fixed  : LZ4IO exits too early when frame crc not present, reported by Yongwoon Cho (106)
Fixed  : incompatibility sparse mode vs append mode, reported by Takayuki Matsuoka (110)
Performance fix : big compression speed boost for clang (+30%)
New    : cross-version test, by Takayuki Matsuoka

r129:
Added  : LZ4_compress_fast(), LZ4_compress_fast_continue()
Added  : LZ4_compress_destSize()
Changed: New lz4 and lz4hc compression API. Previous function prototypes still supported.
Changed: Sparse file support enabled by default
New    : LZ4 CLI improved performance compressing/decompressing multiple files (86, kind contribution from Kyle J. Harper & Takayuki Matsuoka)
Fixed  : GCC 4.9+ optimization bug - Reported by Markus Trippelsdorf, Greg Slazinski & Evan Nemerson
Changed: Enums converted to LZ4F_ namespace convention - by Takayuki Matsuoka
Added  : AppVeyor CI environment, for Visual tests - Suggested by Takayuki Matsuoka
Modified: Obsolete functions generate warnings - Suggested by Evan Nemerson, contributed by Takayuki Matsuoka
Fixed  : Bug 75 (unfinished stream), reported by Yongwoon Cho
Updated: Documentation converted to MarkDown format

r128:
New    : lz4cli sparse file support (Requested by Neil Wilson, and contributed by Takayuki Matsuoka)
New    : command -m, to compress multiple files in a single command (suggested by Kyle J. Harper)
Fixed  : Restored lz4hc compression ratio (slightly lower since r124)
New    : lz4 cli supports long commands (suggested by Takayuki Matsuoka)
New    : lz4frame & lz4cli frame content size support
New    : lz4frame supports skippable frames, as requested by Sergey Cherepanov
Changed: Default "make install" directory is /usr/local, as notified by Ron Johnson
New    : lz4 cli supports "pass-through" mode, requested by Neil Wilson
New    : datagen can generate sparse files
New    : scan-build tests, thanks to kind help by Takayuki Matsuoka
New    : g++ compatibility tests
New    : arm cross-compilation test, thanks to kind help by Takayuki Matsuoka
Fixed  : Fuzzer + frametest compatibility with NetBSD (issue 48, reported by Thomas Klausner)
Added  : Visual project directory
Updated: Man page & Specification

r127:
N/A   : added a file on SVN

r126:
New   : lz4frame API is now integrated into liblz4
Fixed : GCC 4.9 bug on highest performance settings, reported by Greg Slazinski
Fixed : bug within LZ4 HC streaming mode, reported by James Boyle
Fixed : older compilers don't like nameless unions, reported by Cheyi Lin
Changed : lz4 is C90 compatible
Changed : added -pedantic option, fixed a few minor warnings

r125:
Changed : endian and alignment code
Changed : directory structure : new "lib" directory
Updated : lz4io, now uses lz4frame
Improved: slightly improved decoding speed
Fixed : LZ4_compress_limitedOutput(); Special thanks to Christopher Speller !
Fixed : some alignment warnings under clang
Fixed : deprecated function LZ4_slideInputBufferHC()

r124:
New : LZ4 HC streaming mode
Fixed : LZ4F_compressBound() using null preferencesPtr
Updated : xxHash to r38

1.4.1

Updated : xxHash, to r36

r121:
Added : Makefile : install for kFreeBSD and Hurd (Nobuhiro Iwamatsu)
Fix : Makefile : install for OS-X and BSD, thanks to Takayuki Matsuoka

r120:
Modified : Streaming API, using strong types
Added : LZ4_versionNumber(), thanks to Takayuki Matsuoka
Fix : OS-X : library install name, thanks to Clemens Lang
Updated : Makefile : synchronize library version number with lz4.h, thanks to Takayuki Matsuoka
Updated : Makefile : stricter compilation flags
Added : pkg-config, thanks to Zbigniew Jędrzejewski-Szmek (issue 135)
Makefile : lz4-test only tests native binaries, as suggested by Michał Górny (issue 136)
Updated : xxHash to r35

r119:
Fix : Issue 134 : extended malicious address space overflow in 32-bits mode for some specific configurations

r118:
New : LZ4 Streaming API (Fast version), special thanks to Takayuki Matsuoka
New : datagen : parametrable synthetic data generator for tests
Improved : fuzzer, support more test cases, more parameters, ability to jump to specific test
fix : support ppc64le platform (issue 131)
fix : Issue 52 (malicious address space overflow in 32-bits mode when using large custom format)
fix : Makefile : minor issue 130 : header files permissions

r117:
Added : man pages for lz4c and lz4cat
Added : automated tests on Travis, thanks to Takayuki Matsuoka !
fix : block-dependency command line (issue 127)
fix : lz4fullbench (issue 128)

r116:
hotfix (issue 124 & 125)

r115:
Added : lz4cat utility, installed on POSIX systems (issue 118)
OS-X compatible compilation of dynamic library (issue 115)

r114:
Makefile : library correctly compiled with -O3 switch (issue 114)
Makefile : library compilation compatible with clang
Makefile : library is versioned and linked (issue 119)
lz4.h : no more static inline prototypes (issue 116)
man : improved header/footer (issue 111)
Makefile : Use system default $(CC) & $(MAKE) variables (issue 112)
xxhash : updated to r34

r113:
Large decompression speed improvement for GCC 32-bits. Thanks to Valery Croizier !
LZ4HC : Compression Level is now a programmable parameter (CLI from 4 to 9)
Separated IO routines from command line (lz4io.c)
Version number into lz4.h (suggested by Francesc Alted)

r112:
quickfix

r111 :
Makefile : added capability to install libraries
Modified Directory tree, to better separate libraries from programs.

r110 :
lz4 & lz4hc : added capability to allocate state & stream state with custom allocator (issue 99)
fuzzer & fullbench : updated to test new functions
man : documented -l command (Legacy format, for Linux kernel compression) (issue 102)
cmake : improved version by Mika Attila, building programs and libraries (issue 100)
xxHash : updated to r33
Makefile : clean also deletes local package .tar.gz
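The legacy format produced by `-l` is distinguished from the current frame format by its leading magic number. The following is a stdlib-only sketch that classifies a buffer by its first four bytes. It assumes the magic values from the LZ4 frame format specification: 0x184C2102 for legacy, 0x184D2204 for a standard frame, and 0x184D2A50 to 0x184D2A5F for skippable frames, all stored little-endian. `classify_lz4_stream` is a hypothetical helper and is not part of liblz4 or python-lz4:

```python
import struct

LEGACY_MAGIC = 0x184C2102
FRAME_MAGIC = 0x184D2204
SKIPPABLE_MIN, SKIPPABLE_MAX = 0x184D2A50, 0x184D2A5F


def classify_lz4_stream(data: bytes) -> str:
    """Return 'legacy', 'frame', 'skippable' or 'unknown' from the leading magic."""
    if len(data) < 4:
        return "unknown"
    # Magic numbers are stored as little-endian 32-bit integers.
    (magic,) = struct.unpack_from("<I", data, 0)
    if magic == LEGACY_MAGIC:
        return "legacy"
    if magic == FRAME_MAGIC:
        return "frame"
    if SKIPPABLE_MIN <= magic <= SKIPPABLE_MAX:
        return "skippable"
    return "unknown"


print(classify_lz4_stream(bytes.fromhex("04224d18")))  # frame
print(classify_lz4_stream(bytes.fromhex("02214c18")))  # legacy
```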

r109 :
lz4.c : corrected issue 98 (LZ4_compress_limitedOutput())
Makefile : can specify version number from makefile

r108 :
lz4.c : corrected compression efficiency issue 97 in 64-bits chained mode (-BD) for streams > 4 GB (thanks Roman Strashkin for reporting)

r107 :
Makefile : support DESTDIR for staged installs. Thanks Jorge Aparicio.
Makefile : make install installs both lz4 and lz4c (Jorge Aparicio)
Makefile : removed -Wno-implicit-declaration compilation switch
lz4cli.c : include <unistd.h> for isatty() (Luca Barbato)
lz4.h : introduced LZ4_MAX_INPUT_SIZE constant (Shay Green)
lz4.h : LZ4_compressBound() : unified macro and inline definitions (Shay Green)
lz4.h : LZ4_decompressSafe_partial() : clarify comments (Shay Green)
lz4.c : LZ4_compress() verify input size condition (Shay Green)
bench.c : corrected a bug in free memory size evaluation
cmake : install into bin/ directory (Richard Yao)
cmake : check for just C compiler (Elan Ruusamae)
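The worst-case output size that LZ4_compressBound() reports is a simple formula tied to LZ4_MAX_INPUT_SIZE. The following Python transcription assumes the LZ4_COMPRESSBOUND definition in current lz4.h: input + input/255 + 16, or 0 when the input exceeds 0x7E000000 bytes. The function name is hypothetical:

```python
LZ4_MAX_INPUT_SIZE = 0x7E000000  # 2,113,929,216 bytes


def lz4_compress_bound(input_size: int) -> int:
    """Worst-case LZ4 block size for input_size bytes, mirroring LZ4_COMPRESSBOUND.

    Returns 0 when the input is negative or too large to compress,
    matching the unsigned comparison in the C macro.
    """
    if input_size < 0 or input_size > LZ4_MAX_INPUT_SIZE:
        return 0
    return input_size + input_size // 255 + 16


print(lz4_compress_bound(65536))  # 65809
```

Sizing the destination buffer to at least this bound is the "recommended" output size referred to in the 1.9.2 compress_fast() fix.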

r106 :
Makefile : make dist modifies text files in the package to respect Unix EoL convention
lz4cli.c : corrected small display bug in HC mode

r105 :
Makefile : New install script and man page, contributed by Prasad Pandit
lz4cli.c : Minor modifications, for easier extensibility
COPYING  : added license file
LZ4_Streaming_Format.odt : modified file name to remove white space characters
Makefile : .exe suffix now properly added only for Windows target

1.4.0

r123:
Added : experimental lz4frame API, thanks to Takayuki Matsuoka and Christopher Jackson for testing
Fix : s390x support, thanks to Nobuhiro Iwamatsu
Fix : test mode (-t) no longer requires confirmation, thanks to Thary Nguyen

r122:
Fix : AIX & AIX64 support (SamG)
Fix : mips 64-bits support (lew van)
Added : Examples directory, using code examples from Takayuki Matsuoka

Links

PyPI: https://pypi.org/project/lz4
Changelog: https://pyup.io/changelogs/lz4/
Repo: https://github.com/python-lz4/python-lz4

update

opened by pyup-bot

Pin lz4 to latest version 4.3.0

This PR pins lz4 to the latest release 4.3.0.

Changelog

1.9.4

perf : faster decoding speed (~+20%) on aarch64 platforms
perf : faster decoding speed (~+70%) for -BD4 setting in CLI
api  : new function `LZ4_decompress_safe_partial_usingDict()` by yawqi
api  : lz4frame: ability to provide custom allocators at state creation
api  : can skip checksum validation for improved decoding speed
api  : new experimental unit `lz4file` for file i/o API, by anjiahao1
api  : new experimental function `LZ4F_uncompressedUpdate()`, by alexmohr
cli  : `--list` works on `stdin` input, by Low-power
cli  : `--no-crc` does not produce (compression) nor check (decompression) checksums
cli  : fix: `--test` and `--list` produce an error code when parsing invalid input
cli  : fix: `--test -m` no longer creates decompressed file artifacts
cli  : fix: support skippable frames when passed via `stdin`, reported by davidmankin
build: fix: Makefile respects CFLAGS directives passed via environment variable
build: `LZ4_FREESTANDING`, new build macro for freestanding environments, by t-mat
build: `make` and `make test` are compatible with `-j` parallel run
build: AS/400 compatibility, by jonrumsey
build: Solaris 10 compatibility, by pekdon
build: MSVC 2022 support, by t-mat
build: improved meson script, by eli-schwartz
doc  : Updated LZ4 block format, provide an "implementation notes" section
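The content and block checksums controlled by `--no-crc` are xxHash32 (XXH32) values. The frame header also carries one XXH32-derived byte. The following is a slow, pure-Python sketch of XXH32 together with that header checksum. It assumes the frame-format rule that the header checksum is `(XXH32(descriptor, seed=0) >> 8) & 0xFF`, computed over FLG, BD and any optional descriptor fields. This is an illustration only and is not the xxHash or python-lz4 API:

```python
PRIME1, PRIME2, PRIME3 = 2654435761, 2246822519, 3266489917
PRIME4, PRIME5 = 668265263, 374761393
MASK = 0xFFFFFFFF


def _rotl(x: int, r: int) -> int:
    return ((x << r) | (x >> (32 - r))) & MASK


def _round(acc: int, lane: int) -> int:
    acc = (acc + lane * PRIME2) & MASK
    return (_rotl(acc, 13) * PRIME1) & MASK


def _u32(data: bytes, i: int) -> int:
    return int.from_bytes(data[i:i + 4], "little")


def xxh32(data: bytes, seed: int = 0) -> int:
    """Straightforward XXH32: stripes of 16 bytes, then 4-byte words, then bytes."""
    n, i = len(data), 0
    if n >= 16:
        v1 = (seed + PRIME1 + PRIME2) & MASK
        v2 = (seed + PRIME2) & MASK
        v3 = seed & MASK
        v4 = (seed - PRIME1) & MASK
        while i + 16 <= n:
            v1 = _round(v1, _u32(data, i))
            v2 = _round(v2, _u32(data, i + 4))
            v3 = _round(v3, _u32(data, i + 8))
            v4 = _round(v4, _u32(data, i + 12))
            i += 16
        h = (_rotl(v1, 1) + _rotl(v2, 7) + _rotl(v3, 12) + _rotl(v4, 18)) & MASK
    else:
        h = (seed + PRIME5) & MASK
    h = (h + n) & MASK
    while i + 4 <= n:  # remaining 4-byte words
        h = (_rotl((h + _u32(data, i) * PRIME3) & MASK, 17) * PRIME4) & MASK
        i += 4
    while i < n:  # remaining single bytes
        h = (_rotl((h + data[i] * PRIME5) & MASK, 11) * PRIME1) & MASK
        i += 1
    # Final avalanche mixing.
    h ^= h >> 15
    h = (h * PRIME2) & MASK
    h ^= h >> 13
    h = (h * PRIME3) & MASK
    h ^= h >> 16
    return h


def frame_header_checksum(descriptor: bytes) -> int:
    """HC byte: the second byte of XXH32 over the frame descriptor (seed 0)."""
    return (xxh32(descriptor) >> 8) & 0xFF


print(hex(xxh32(b"")))                                  # 0x2cc5d05
print(hex(frame_header_checksum(bytes([0x64, 0x40]))))  # 0xa7
```

Skipping checksum validation saves exactly this kind of per-byte work on every block and on the whole content.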

1.9.3

perf: highly improved speed in kernel space, by terrelln
perf: faster speed with Visual Studio, thanks to wolfpld and remittor
perf: improved dictionary compression speed, by felixhandte
perf: fixed LZ4_compress_HC_destSize() ratio, detected by hsiangkao
perf: reduced stack usage in high compression mode, by Yanpas
api : LZ4_decompress_safe_partial() supports unknown compressed size, requested by jfkthame
api : improved LZ4F_compressBound() with automatic flushing, by Christopher Harvie
api : can (de)compress to/from NULL without UBs
api : fix alignment test on 32-bit systems (state initialization)
api : fix LZ4_saveDictHC() in corner case scenario, detected by IgorKorkin
cli : `-l` legacy format is now compatible with `-m` multiple files, by Filipe Calasans
cli : benchmark mode supports dictionary, by rkoradi
cli : fix --fast with large argument, detected by picoHz
build: link to user-defined memory functions with LZ4_USER_MEMORY_FUNCTIONS, suggested by Yuriy Levchenko
build: contrib/cmake_unofficial/ moved to build/cmake/
build: visual/* moved to build/
build: updated meson script, by neheb
build: tinycc support, by Anton Kochkov
install: Haiku support, by Jerome Duval
doc : updated LZ4 frame format, clarify EndMark

1.9.2

fix : out-of-bound read in exceptional circumstances when using decompress_partial(), by terrelln
fix : slim opportunity for out-of-bound write with compress_fast() with a large enough input and when providing an output smaller than recommended (< LZ4_compressBound(inputSize)), by terrelln
fix : rare data corruption bug with LZ4_compress_destSize(), by terrelln
fix : data corruption bug when Streaming with an Attached Dict in HC Mode, by felixhandte
perf: enable LZ4_FAST_DEC_LOOP on aarch64/GCC by default, by prekageo
perf: improved lz4frame streaming API speed, by dreambottle
perf: speed up lz4hc on slow patterns when using external dictionary, by terrelln
api: better in-place decompression and compression support
cli : --list supports multi-frames files, by gstedman
cli: --version outputs to stdout
cli : add option --best as an alias of -12, by Low-power
misc: Integration into oss-fuzz by cmeister2, expanded list of scenarios by terrelln

1.9.1

fix : decompression functions were reading a few bytes beyond input size (introduced in v1.9.0, reported by ppodolsky and danlark1)
api : fix : lz4frame initializers compatibility with c++, reported by degski
cli : added command --list, based on a patch by gabrielstedman
build: improved Windows build, by JPeterMugaas
build: AIX, by Norman Green

1.9.0

perf: large decompression speed improvement on x86/x64 (up to +20%) by djwatson
api : changed : _destSize() compression variants are promoted to stable API
api : new : LZ4_initStream(HC), replacing LZ4_resetStream(HC)
api : changed : LZ4_resetStream(HC) as recommended reset function, for better performance on small data
cli : support custom block sizes, by blezsan
build: source code can be amalgamated, by Bing Xu
build: added meson build, by lzutao
build: new build macros : LZ4_DISTANCE_MAX, LZ4_FAST_DEC_LOOP
install: MidnightBSD, by laffer1
install: msys2 on Windows 10, by vtorri

1.8.3

perf: minor decompression speed improvement (~+2%) with gcc
fix : corruption in v1.8.2 at level 9 for files > 64KB under rare conditions (560)
cli : new command --fast, by jennifermliu
cli : fixed elapsed time, and added cpu load indicator (on -vv) (555)
api : LZ4_decompress_safe_partial() now decodes exactly the number of bytes requested (feature request 566)
build : added Haiku target, by fbrosson, and MidnightBSD, by laffer1
doc : updated documentation regarding dictionary compression

1.8.2

perf: *much* faster dictionary compression on small files, by felixhandte
perf: improved decompression speed and binary size, by Alexey Tourbin (svpv)
perf: slightly faster HC compression and decompression speed
perf: very small compression ratio improvement
fix : compression compatible with low memory addresses (< 0xFFFF)
fix : decompression segfault when provided with NULL input, by terrelln
cli : new command --favor-decSpeed
cli : benchmark mode more accurate for small inputs
fullbench : can bench _destSize() variants, by felixhandte
doc : clarified block format parsing restrictions, by Alexey Tourbin (svpv)
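An LZ4 block is a series of (literals, match) sequences. The following is a minimal pure-Python decoder sketch. It assumes the layout from the LZ4 block format description:

- A token byte: the high nibble is the literal length and the low nibble is the match length minus 4. A nibble of 15 means more length bytes follow, where each 255 means "keep reading".
- The literals themselves.
- A 2-byte little-endian match offset.
- The final sequence carries literals only.

`lz4_block_decompress` is a hypothetical name and is not the python-lz4 API:

```python
from typing import Tuple


def _read_length(src: bytes, pos: int, base: int) -> Tuple[int, int]:
    """Extend a nibble length of 15 with following bytes; 255 means 'continue'."""
    length = base
    if base == 15:
        while True:
            if pos >= len(src):
                raise ValueError("truncated length field")
            byte = src[pos]
            pos += 1
            length += byte
            if byte != 255:
                break
    return length, pos


def lz4_block_decompress(src: bytes) -> bytes:
    out = bytearray()
    pos = 0
    while pos < len(src):
        token = src[pos]
        pos += 1
        # Literal run: copied verbatim from the input.
        lit_len, pos = _read_length(src, pos, token >> 4)
        if pos + lit_len > len(src):
            raise ValueError("literals run past end of block")
        out += src[pos:pos + lit_len]
        pos += lit_len
        if pos == len(src):
            break  # the last sequence has no match part
        # Match: 2-byte little-endian offset back into the output produced so far.
        if pos + 2 > len(src):
            raise ValueError("truncated offset")
        offset = src[pos] | (src[pos + 1] << 8)
        pos += 2
        if offset == 0 or offset > len(out):
            raise ValueError("invalid match offset")
        match_len, pos = _read_length(src, pos, token & 0x0F)
        match_len += 4  # minimum match length
        start = len(out) - offset
        # Byte-by-byte copy so overlapping matches (offset < length) repeat data.
        for k in range(match_len):
            out.append(out[start + k])
    return bytes(out)


print(lz4_block_decompress(b"\x50hello"))          # b'hello'
print(lz4_block_decompress(b"\x11a\x01\x00\x00"))  # b'aaaaaa'
```

The second example is hand-built: one literal `a`, then a 5-byte match at offset 1. The overlapping copy turns this into a run of six `a` bytes.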

1.8.1

perf : faster and stronger ultra modes (levels 10+)
perf : slightly faster compression and decompression speed
perf : fix bad degenerative case, reported by c-morgenstern
fix : decompression failed when using a combination of extDict + low memory address (397), reported and fixed by Julian Scheid (jscheid)
cli : support for dictionary compression (`-D`), by Felix Handte felixhandte
cli : fix : `lz4 -d --rm` preserves timestamp (441)
cli : fix : do not modify /dev/null permission as root, by aliceatlas
api : `_destSize()` variant supported for all compression levels
build  : `make` and `make test` compatible with `-jX`, reported by mwgamera
build  : can control LZ4LIB_VISIBILITY macro, by mikir
install: fix man page directory (387), reported by Stuart Cardall (itoffshore)

1.8.0

cli : fix : do not modify /dev/null permissions, reported by Maokaman1
cli : added GNU separator -- specifying that all following arguments are files
API : added LZ4_compress_HC_destSize(), by Oleg (remittor)
API : added LZ4F_resetDecompressionContext()
API : lz4frame : negative compression levels trigger fast acceleration, request by Lawrence Chan
API : lz4frame : can control block checksum and dictionary ID
API : fix : expose obsolete decoding functions, reported by Chen Yufei
API : experimental : lz4frame_static : new dictionary compression API
build : fix : static lib installation, by Ido Rosen
build : dragonFlyBSD, OpenBSD, NetBSD supported
build : LZ4_MEMORY_USAGE can be modified at compile time, through external define
doc : Updated LZ4 Frame format to v1.6.0, restoring Dictionary-ID field
doc : lz4 api manual, by Przemyslaw Skibinski
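The block checksum and dictionary ID options are signalled in the two frame descriptor bytes, FLG and BD, that follow the magic number. The following sketch decodes them. It assumes the bit layout from the LZ4 frame format specification:

- FLG: version in bits 7-6 (must be 01), block independence in bit 5, block checksum in bit 4, content size in bit 3, content checksum in bit 2, dictionary ID in bit 0.
- BD: block maximum size code in bits 6-4, where codes 4 to 7 mean 64 KB, 256 KB, 1 MB and 4 MB.

`FrameDescriptor` and `parse_descriptor` are hypothetical names. Reserved bits are not validated here:

```python
from dataclasses import dataclass

BLOCK_MAX_SIZES = {4: 64 * 1024, 5: 256 * 1024, 6: 1024 * 1024, 7: 4 * 1024 * 1024}


@dataclass
class FrameDescriptor:
    version: int
    block_independent: bool
    block_checksum: bool
    content_size_present: bool
    content_checksum: bool
    dict_id_present: bool
    block_max_size: int


def parse_descriptor(flg: int, bd: int) -> FrameDescriptor:
    version = (flg >> 6) & 0b11
    if version != 1:
        raise ValueError(f"unsupported frame version {version}")
    code = (bd >> 4) & 0b111
    if code not in BLOCK_MAX_SIZES:
        raise ValueError(f"invalid block max size code {code}")
    return FrameDescriptor(
        version=version,
        block_independent=bool(flg & 0x20),
        block_checksum=bool(flg & 0x10),
        content_size_present=bool(flg & 0x08),
        content_checksum=bool(flg & 0x04),
        dict_id_present=bool(flg & 0x01),
        block_max_size=BLOCK_MAX_SIZES[code],
    )


# 0x64 / 0x40: independent 64 KB blocks with a content checksum.
d = parse_descriptor(0x64, 0x40)
print(d.block_max_size, d.content_checksum)  # 65536 True
```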

1.7.5

lz4hc : new high compression mode : levels 10-12 compress more and slower, by Przemyslaw Skibinski
lz4cat : fix : works with relative path (284) and stdin (285) (reported by beiDei8z)
cli : fix minor notification when using -r recursive mode
API : lz4frame : LZ4F_frameBound(0) gives upper bound of *flush() and *End() operations (290, 280)
doc : markdown version of man page, by Takayuki Matsuoka (279)
build : Makefile : fix make -jX lib+exe concurrency (277)
build : cmake : improvements by Michał Górny (296)

1.7.4.2

fix : Makefile : release build compatible with PIE and customized compilation directives provided through environment variables (274, reported by Antoine Martin)

1.7.4

Improved : much better speed in -mx32 mode
cli : fix : Large file support in 32-bits mode on Mac OS-X
fix : compilation on gcc 4.4 (272), reported by Antoine Martin

1.7.3

Changed : moved to versioning; package, cli and library have same version number
Improved: Small decompression speed boost
Improved: Small compression speed improvement on 64-bits systems
Improved: Small compression ratio and speed improvement on small files
Improved: Significant speed boost on ARMv6 and ARMv7
Fix : better ratio on 64-bits big-endian targets
Improved cmake build script, by Evan Nemerson
New liblz4-dll project, by Przemyslaw Skibinski
Makefile: Generates object files (*.o) for faster (re)compilation on low power systems
cli : new : --rm and --help commands
cli : new : preserved file attributes, by Przemyslaw Skibinski
cli : fix : crash on some invalid inputs
cli : fix : -t correctly validates lz4-compressed files, by Nick Terrell
cli : fix : detects and reports fread() errors, thanks to Hiroshi Fujishima report 243
cli : bench : new : -r recursive mode
lz4cat : can cat multiple files in a single command line (184)
Added : doc/lz4_manual.html, by Przemyslaw Skibinski
Added : dictionary compression and frame decompression examples, by Nick Terrell
Added : Debianization, by Evgeniy Polyakov

r131:
New    : Dos/DJGPP target, thanks to Louis Santillan (114)
Added  : Example using lz4frame library, by Zbigniew Jędrzejewski-Szmek (118)
Changed: xxhash symbols are modified (namespace emulation) within liblz4

r130:
Fixed  : incompatibility sparse mode vs console, reported by Yongwoon Cho (105)
Fixed  : LZ4IO exits too early when frame crc not present, reported by Yongwoon Cho (106)
Fixed  : incompatibility sparse mode vs append mode, reported by Takayuki Matsuoka (110)
Performance fix : big compression speed boost for clang (+30%)
New    : cross-version test, by Takayuki Matsuoka

r129:
Added  : LZ4_compress_fast(), LZ4_compress_fast_continue()
Added  : LZ4_compress_destSize()
Changed: New lz4 and lz4hc compression API. Previous function prototypes still supported.
Changed: Sparse file support enabled by default
New    : LZ4 CLI improved performance compressing/decompressing multiple files (86, kind contribution from Kyle J. Harper & Takayuki Matsuoka)
Fixed  : GCC 4.9+ optimization bug - Reported by Markus Trippelsdorf, Greg Slazinski & Evan Nemerson
Changed: Enums converted to LZ4F_ namespace convention - by Takayuki Matsuoka
Added  : AppVeyor CI environment, for Visual tests - Suggested by Takayuki Matsuoka
Modified: Obsolete functions generate warnings - Suggested by Evan Nemerson, contributed by Takayuki Matsuoka
Fixed  : Bug 75 (unfinished stream), reported by Yongwoon Cho
Updated: Documentation converted to MarkDown format
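Sparse file support means that, when writing decompressed output, runs of zero bytes are skipped with a seek instead of being written, so the file system can leave holes. The following is a minimal stdlib sketch of that idea. It assumes a fixed 4 KiB chunk granularity, which the lz4 CLI's own strategy may not use. `write_sparse` is a hypothetical helper. Whether real holes are created depends on the OS and file system:

```python
import os
import tempfile

CHUNK = 4096  # assumed granularity for detecting all-zero regions


def write_sparse(path: str, data: bytes, chunk: int = CHUNK) -> None:
    """Write data, seeking over all-zero chunks instead of writing them."""
    zero = bytes(chunk)
    with open(path, "wb") as f:
        for start in range(0, len(data), chunk):
            piece = data[start:start + chunk]
            if piece == zero[:len(piece)]:
                f.seek(len(piece), os.SEEK_CUR)  # leave a hole
            else:
                f.write(piece)
        # A trailing seek alone does not extend the file; set the final size.
        # On most platforms truncate() zero-fills when extending.
        f.truncate(len(data))


payload = b"head" + bytes(3 * CHUNK) + b"tail"
with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "out.bin")
    write_sparse(target, payload)
    with open(target, "rb") as f:
        print(f.read() == payload)  # True
```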

r128:
New    : lz4cli sparse file support (Requested by Neil Wilson, and contributed by Takayuki Matsuoka)
New    : command -m, to compress multiple files in a single command (suggested by Kyle J. Harper)
Fixed  : Restored lz4hc compression ratio (slightly lower since r124)
New    : lz4 cli supports long commands (suggested by Takayuki Matsuoka)
New    : lz4frame & lz4cli frame content size support
New    : lz4frame supports skippable frames, as requested by Sergey Cherepanov
Changed: Default "make install" directory is /usr/local, as notified by Ron Johnson
New    : lz4 cli supports "pass-through" mode, requested by Neil Wilson
New    : datagen can generate sparse files
New    : scan-build tests, thanks to kind help by Takayuki Matsuoka
New    : g++ compatibility tests
New    : arm cross-compilation test, thanks to kind help by Takayuki Matsuoka
Fixed  : Fuzzer + frametest compatibility with NetBSD (issue 48, reported by Thomas Klausner)
Added  : Visual project directory
Updated: Man page & Specification
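Skippable frames let arbitrary user data sit between LZ4 frames. This sketch assumes the layout from the LZ4 frame format specification: a magic number in the range 0x184D2A50 to 0x184D2A5F, then a 4-byte little-endian payload length, then that many bytes that a decoder steps over. It shows how to build such a frame and how to skip past it. `make_skippable_frame` and `skip_skippable_frames` are hypothetical helpers:

```python
import struct

SKIPPABLE_BASE = 0x184D2A50


def make_skippable_frame(payload: bytes, nibble: int = 0) -> bytes:
    """Wrap payload in a skippable frame; nibble (0-15) selects one of 16 magics."""
    if not 0 <= nibble <= 15:
        raise ValueError("nibble must be in 0..15")
    return struct.pack("<II", SKIPPABLE_BASE + nibble, len(payload)) + payload


def skip_skippable_frames(data: bytes, pos: int = 0) -> int:
    """Return the offset of the first byte that is not part of a skippable frame."""
    while pos + 8 <= len(data):
        magic, size = struct.unpack_from("<II", data, pos)
        if magic & 0xFFFFFFF0 != SKIPPABLE_BASE:
            break  # not a skippable frame: a real frame (or garbage) starts here
        if pos + 8 + size > len(data):
            raise ValueError("truncated skippable frame")
        pos += 8 + size
    return pos


stream = make_skippable_frame(b"metadata") + make_skippable_frame(b"", 15) + b"\x04\x22\x4d\x18"
print(skip_skippable_frames(stream))  # 24
```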

r127:
N/A   : added a file on SVN

r126:
New   : lz4frame API is now integrated into liblz4
Fixed : GCC 4.9 bug on highest performance settings, reported by Greg Slazinski
Fixed : bug within LZ4 HC streaming mode, reported by James Boyle
Fixed : older compilers don't like nameless unions, reported by Cheyi Lin
Changed : lz4 is C90 compatible
Changed : added -pedantic option, fixed a few minor warnings

r125:
Changed : endian and alignment code
Changed : directory structure : new "lib" directory
Updated : lz4io, now uses lz4frame
Improved: slightly improved decoding speed
Fixed : LZ4_compress_limitedOutput(); Special thanks to Christopher Speller !
Fixed : some alignment warnings under clang
Fixed : deprecated function LZ4_slideInputBufferHC()

r124:
New : LZ4 HC streaming mode
Fixed : LZ4F_compressBound() using null preferencesPtr
Updated : xxHash to r38

1.4.1

Updated : xxHash, to r36

r121:
Added : Makefile : install for kFreeBSD and Hurd (Nobuhiro Iwamatsu)
Fix : Makefile : install for OS-X and BSD, thanks to Takayuki Matsuoka

r120:
Modified : Streaming API, using strong types
Added : LZ4_versionNumber(), thanks to Takayuki Matsuoka
Fix : OS-X : library install name, thanks to Clemens Lang
Updated : Makefile : synchronize library version number with lz4.h, thanks to Takayuki Matsuoka
Updated : Makefile : stricter compilation flags
Added : pkg-config, thanks to Zbigniew Jędrzejewski-Szmek (issue 135)
Makefile : lz4-test only tests native binaries, as suggested by Michał Górny (issue 136)
Updated : xxHash to r35

r119:
Fix : Issue 134 : extended malicious address space overflow in 32-bits mode for some specific configurations

r118:
New : LZ4 Streaming API (Fast version), special thanks to Takayuki Matsuoka
New : datagen : parametrable synthetic data generator for tests
Improved : fuzzer, support more test cases, more parameters, ability to jump to specific test
fix : support ppc64le platform (issue 131)
fix : Issue 52 (malicious address space overflow in 32-bits mode when using large custom format)
fix : Makefile : minor issue 130 : header files permissions

r117:
Added : man pages for lz4c and lz4cat
Added : automated tests on Travis, thanks to Takayuki Matsuoka !
fix : block-dependency command line (issue 127)
fix : lz4fullbench (issue 128)

r116:
hotfix (issue 124 & 125)

r115:
Added : lz4cat utility, installed on POSIX systems (issue 118)
OS-X compatible compilation of dynamic library (issue 115)

r114:
Makefile : library correctly compiled with -O3 switch (issue 114)
Makefile : library compilation compatible with clang
Makefile : library is versioned and linked (issue 119)
lz4.h : no more static inline prototypes (issue 116)
man : improved header/footer (issue 111)
Makefile : Use system default $(CC) & $(MAKE) variables (issue 112)
xxhash : updated to r34

r113:
Large decompression speed improvement for GCC 32-bits. Thanks to Valery Croizier !
LZ4HC : Compression Level is now a programmable parameter (CLI from 4 to 9)
Separated IO routines from command line (lz4io.c)
Version number into lz4.h (suggested by Francesc Alted)

r112:
quickfix

r111 :
Makefile : added capability to install libraries
Modified Directory tree, to better separate libraries from programs.

r110 :
lz4 & lz4hc : added capability to allocate state & stream state with custom allocator (issue 99)
fuzzer & fullbench : updated to test new functions
man : documented -l command (Legacy format, for Linux kernel compression) (issue 102)
cmake : improved version by Mika Attila, building programs and libraries (issue 100)
xxHash : updated to r33
Makefile : clean also deletes local package .tar.gz

r109 :
lz4.c : corrected issue 98 (LZ4_compress_limitedOutput())
Makefile : can specify version number from makefile

r108 :
lz4.c : corrected compression efficiency issue 97 in 64-bits chained mode (-BD) for streams > 4 GB (thanks Roman Strashkin for reporting)

r107 :
Makefile : support DESTDIR for staged installs. Thanks Jorge Aparicio.
Makefile : make install installs both lz4 and lz4c (Jorge Aparicio)
Makefile : removed -Wno-implicit-declaration compilation switch
lz4cli.c : include <unistd.h> for isatty() (Luca Barbato)
lz4.h : introduced LZ4_MAX_INPUT_SIZE constant (Shay Green)
lz4.h : LZ4_compressBound() : unified macro and inline definitions (Shay Green)
lz4.h : LZ4_decompressSafe_partial() : clarify comments (Shay Green)
lz4.c : LZ4_compress() verify input size condition (Shay Green)
bench.c : corrected a bug in free memory size evaluation
cmake : install into bin/ directory (Richard Yao)
cmake : check for just C compiler (Elan Ruusamae)

r106 :
Makefile : make dist modifies text files in the package to respect Unix EoL convention
lz4cli.c : corrected small display bug in HC mode

r105 :
Makefile : New install script and man page, contributed by Prasad Pandit
lz4cli.c : Minor modifications, for easier extensibility
COPYING  : added license file
LZ4_Streaming_Format.odt : modified file name to remove white space characters
Makefile : .exe suffix now properly added only for Windows target

1.4.0

r123:
Added : experimental lz4frame API, thanks to Takayuki Matsuoka and Christopher Jackson for testing
Fix : s390x support, thanks to Nobuhiro Iwamatsu
Fix : test mode (-t) no longer requires confirmation, thanks to Thary Nguyen

r122:
Fix : AIX & AIX64 support (SamG)
Fix : mips 64-bits support (lew van)
Added : Examples directory, using code examples from Takayuki Matsuoka

Links

PyPI: https://pypi.org/project/lz4
Changelog: https://pyup.io/changelogs/lz4/
Repo: https://github.com/python-lz4/python-lz4

update

opened by pyup-bot

Pin lz4 to latest version 4.2.0

This PR pins lz4 to the latest release 4.2.0.

Changelog

1.9.4

perf : faster decoding speed (~+20%) on aarch64 platforms
perf : faster decoding speed (~+70%) for -BD4 setting in CLI
api  : new function `LZ4_decompress_safe_partial_usingDict()` by yawqi
api  : lz4frame: ability to provide custom allocators at state creation
api  : can skip checksum validation for improved decoding speed
api  : new experimental unit `lz4file` for file i/o API, by anjiahao1
api  : new experimental function `LZ4F_uncompressedUpdate()`, by alexmohr
cli  : `--list` works on `stdin` input, by Low-power
cli  : `--no-crc` does not produce (compression) nor check (decompression) checksums
cli  : fix: `--test` and `--list` produce an error code when parsing invalid input
cli  : fix: `--test -m` does no longer create decompressed file artifacts
cli  : fix: support skippable frames when passed via `stdin`, reported by davidmankin
build: fix: Makefile respects CFLAGS directives passed via environment variable
build: `LZ4_FREESTANDING`, new build macro for freestanding environments, by t-mat
build: `make` and `make test` are compatible with `-j` parallel run
build: AS/400 compatibility, by jonrumsey
build: Solaris 10 compatibility, by pekdon
build: MSVC 2022 support, by t-mat
build: improved meson script, by eli-schwartz
doc  : Updated LZ4 block format, provide an &quot;implementation notes&quot; section

1.9.3

perf: highly improved speed in kernel space, by terrelln
perf: faster speed with Visual Studio, thanks to wolfpld and remittor
perf: improved dictionary compression speed, by felixhandte
perf: fixed LZ4_compress_HC_destSize() ratio, detected by hsiangkao
perf: reduced stack usage in high compression mode, by Yanpas
api : LZ4_decompress_safe_partial() supports unknown compressed size, requested by jfkthame
api : improved LZ4F_compressBound() with automatic flushing, by Christopher Harvie
api : can (de)compress to/from NULL without UBs
api : fix alignment test on 32-bit systems (state initialization)
api : fix LZ4_saveDictHC() in corner case scenario, detected by IgorKorkin
cli : `-l` legacy format is now compatible with `-m` multiple files, by Filipe Calasans
cli : benchmark mode supports dictionary, by rkoradi
cli : fix --fast with large argument, detected by picoHz
build: link to user-defined memory functions with LZ4_USER_MEMORY_FUNCTIONS, suggested by Yuriy Levchenko
build: contrib/cmake_unofficial/ moved to build/cmake/
build: visual/* moved to build/
build: updated meson script, by neheb
build: tinycc support, by Anton Kochkov
install: Haiku support, by Jerome Duval
doc : updated LZ4 frame format, clarify EndMark

1.9.2

fix : out-of-bound read in exceptional circumstances when using decompress_partial(), by terrelln
fix : slim opportunity for out-of-bound write with compress_fast() with a large enough input and when providing an output smaller than recommended (&lt; LZ4_compressBound(inputSize)), by terrelln
fix : rare data corruption bug with LZ4_compress_destSize(), by terrelln
fix : data corruption bug when Streaming with an Attached Dict in HC Mode, by felixhandte
perf: enable LZ4_FAST_DEC_LOOP on aarch64/GCC by default, by prekageo
perf: improved lz4frame streaming API speed, by dreambottle
perf: speed up lz4hc on slow patterns when using external dictionary, by terrelln
api: better in-place decompression and compression support
cli : --list supports multi-frames files, by gstedman
cli: --version outputs to stdout
cli : add option --best as an alias of -12 , by Low-power
misc: Integration into oss-fuzz by cmeister2, expanded list of scenarios by terrelln

1.9.1

fix : decompression functions were reading a few bytes beyond input size (introduced in v1.9.0, reported by ppodolsky and danlark1)
api : fix : lz4frame initializers compatibility with c++, reported by degski
cli : added command --list, based on a patch by gabrielstedman
build: improved Windows build, by JPeterMugaas
build: AIX, by Norman Green

1.9.0

perf: large decompression speed improvement on x86/x64 (up to +20%) by djwatson
api : changed : _destSize() compression variants are promoted to stable API
api : new : LZ4_initStream(HC), replacing LZ4_resetStream(HC)
api : changed : LZ4_resetStream(HC) as recommended reset function, for better performance on small data
cli : support custom block sizes, by blezsan
build: source code can be amalgamated, by Bing Xu
build: added meson build, by lzutao
build: new build macros : LZ4_DISTANCE_MAX, LZ4_FAST_DEC_LOOP
install: MidnightBSD, by laffer1
install: msys2 on Windows 10, by vtorri

1.8.3

perf: minor decompression speed improvement (~+2%) with gcc
fix : corruption in v1.8.2 at level 9 for files &gt; 64KB under rare conditions (560)
cli : new command --fast, by jennifermliu
cli : fixed elapsed time, and added cpu load indicator (on -vv) (555)
api : LZ4_decompress_safe_partial() now decodes exactly the nb of bytes requested (feature request 566)
build : added Haiku target, by fbrosson, and MidnightBSD, by laffer1
doc : updated documentation regarding dictionary compression

1.8.2

perf: *much* faster dictionary compression on small files, by felixhandte
perf: improved decompression speed and binary size, by Alexey Tourbin (svpv)
perf: slightly faster HC compression and decompression speed
perf: very small compression ratio improvement
fix : compression compatible with low memory addresses (&lt; 0xFFFF)
fix : decompression segfault when provided with NULL input, by terrelln
cli : new command --favor-decSpeed
cli : benchmark mode more accurate for small inputs
fullbench : can bench _destSize() variants, by felixhandte
doc : clarified block format parsing restrictions, by Alexey Tourbin (svpv)

1.8.1

perf : faster and stronger ultra modes (levels 10+)
perf : slightly faster compression and decompression speed
perf : fix bad degenerative case, reported by c-morgenstern
fix : decompression failed when using a combination of extDict + low memory address (397), reported and fixed by Julian Scheid (jscheid)
cli : support for dictionary compression (`-D`), by Felix Handte felixhandte
cli : fix : `lz4 -d --rm` preserves timestamp (441)
cli : fix : do not modify /dev/null permission as root, by aliceatlas
api : `_destSize()` variant supported for all compression levels
build  : `make` and `make test` compatible with `-jX`, reported by mwgamera
build  : can control LZ4LIB_VISIBILITY macro, by mikir
install: fix man page directory (387), reported by Stuart Cardall (itoffshore)

1.8.0

cli : fix : do not modify /dev/null permissions, reported by Maokaman1
cli : added GNU separator -- specifying that all following arguments are files
API : added LZ4_compress_HC_destSize(), by Oleg (remittor)
API : added LZ4F_resetDecompressionContext()
API : lz4frame : negative compression levels trigger fast acceleration, request by Lawrence Chan
API : lz4frame : can control block checksum and dictionary ID
API : fix : expose obsolete decoding functions, reported by Chen Yufei
API : experimental : lz4frame_static : new dictionary compression API
build : fix : static lib installation, by Ido Rosen
build : dragonFlyBSD, OpenBSD, NetBSD supported
build : LZ4_MEMORY_USAGE can be modified at compile time, through external define
doc : Updated LZ4 Frame format to v1.6.0, restoring Dictionary-ID field
doc : lz4 api manual, by Przemyslaw Skibinski

1.7.5

lz4hc : new high compression mode : levels 10-12 compress more and slower, by Przemyslaw Skibinski
lz4cat : fix : works with relative path (284) and stdin (285) (reported by beiDei8z)
cli : fix minor notification when using -r recursive mode
API : lz4frame : LZ4F_frameBound(0) gives upper bound of *flush() and *End() operations (290, 280)
doc : markdown version of man page, by Takayuki Matsuoka (279)
build : Makefile : fix make -jX lib+exe concurrency (277)
build : cmake : improvements by Michał Górny (296)

1.7.4.2

fix : Makefile : release build compatible with PIE and customized compilation directives provided through environment variables (274, reported by Antoine Martin)

1.7.4

Improved : much better speed in -mx32 mode
cli : fix : Large file support in 32-bits mode on Mac OS-X
fix : compilation on gcc 4.4 (272), reported by Antoine Martin

1.7.3

Changed : moved to versioning; package, cli and library have same version number
Improved: Small decompression speed boost
Improved: Small compression speed improvement on 64-bits systems
Improved: Small compression ratio and speed improvement on small files
Improved: Significant speed boost on ARMv6 and ARMv7
Fix : better ratio on 64-bits big-endian targets
Improved cmake build script, by Evan Nemerson
New liblz4-dll project, by Przemyslaw Skibinki
Makefile: Generates object files (*.o) for faster (re)compilation on low power systems
cli : new : --rm and --help commands
cli : new : preserved file attributes, by Przemyslaw Skibinki
cli : fix : crash on some invalid inputs
cli : fix : -t correctly validates lz4-compressed files, by Nick Terrell
cli : fix : detects and reports fread() errors, thanks to Hiroshi Fujishima report 243
cli : bench : new : -r recursive mode
lz4cat : can cat multiple files in a single command line (184)
Added : doc/lz4_manual.html, by Przemyslaw Skibinski
Added : dictionary compression and frame decompression examples, by Nick Terrell
Added : Debianization, by Evgeniy Polyakov

r131
New    : Dos/DJGPP target, thanks to Louis Santillan (114)
Added  : Example using lz4frame library, by Zbigniew Jędrzejewski-Szmek (118)
Changed: xxhash symbols are modified (namespace emulation) within liblz4

r130:
Fixed  : incompatibility sparse mode vs console, reported by Yongwoon Cho (105)
Fixed  : LZ4IO exits too early when frame crc not present, reported by Yongwoon Cho (106)
Fixed  : incompatibility sparse mode vs append mode, reported by Takayuki Matsuoka (110)
Performance fix : big compression speed boost for clang (+30%)
New    : cross-version test, by Takayuki Matsuoka

r129:
Added  : LZ4_compress_fast(), LZ4_compress_fast_continue()
Added  : LZ4_compress_destSize()
Changed: New lz4 and lz4hc compression API. Previous function prototypes still supported.
Changed: Sparse file support enabled by default
New    : LZ4 CLI improved performance compressing/decompressing multiple files (86, kind contribution from Kyle J. Harper &amp; Takayuki Matsuoka)
Fixed  : GCC 4.9+ optimization bug - Reported by Markus Trippelsdorf, Greg Slazinski &amp; Evan Nemerson
Changed: Enums converted to LZ4F_ namespace convention - by Takayuki Matsuoka
Added  : AppVeyor CI environment, for Visual tests - Suggested by Takayuki Matsuoka
Modified:Obsolete functions generate warnings - Suggested by Evan Nemerson, contributed by Takayuki Matsuoka
Fixed  : Bug 75 (unfinished stream), reported by Yongwoon Cho
Updated: Documentation converted to MarkDown format

r128:
New    : lz4cli sparse file support (Requested by Neil Wilson, and contributed by Takayuki Matsuoka)
New    : command -m, to compress multiple files in a single command (suggested by Kyle J. Harper)
Fixed  : Restored lz4hc compression ratio (slightly lower since r124)
New    : lz4 cli supports long commands (suggested by Takayuki Matsuoka)
New    : lz4frame &amp; lz4cli frame content size support
New    : lz4frame supports skippable frames, as requested by Sergey Cherepanov
Changed: Default &quot;make install&quot; directory is /usr/local, as notified by Ron Johnson
New    : lz4 cli supports &quot;pass-through&quot; mode, requested by Neil Wilson
New    : datagen can generate sparse files
New    : scan-build tests, thanks to kind help by Takayuki Matsuoka
New    : g++ compatibility tests
New    : arm cross-compilation test, thanks to kind help by Takayuki Matsuoka
Fixed  : Fuzzer + frametest compatibility with NetBSD (issue 48, reported by Thomas Klausner)
Added  : Visual project directory
Updated: Man page & Specification

r127:
N/A   : added a file on SVN

r126:
New   : lz4frame API is now integrated into liblz4
Fixed : GCC 4.9 bug on highest performance settings, reported by Greg Slazinski
Fixed : bug within LZ4 HC streaming mode, reported by James Boyle
Fixed : older compilers don't like nameless unions, reported by Cheyi Lin
Changed : lz4 is C90 compatible
Changed : added -pedantic option, fixed a few minor warnings

r125:
Changed : endian and alignment code
Changed : directory structure : new "lib" directory
Updated : lz4io, now uses lz4frame
Improved: slightly improved decoding speed
Fixed : LZ4_compress_limitedOutput(); Special thanks to Christopher Speller !
Fixed : some alignment warnings under clang
Fixed : deprecated function LZ4_slideInputBufferHC()

r124:
New : LZ4 HC streaming mode
Fixed : LZ4F_compressBound() using null preferencesPtr
Updated : xxHash to r38

1.4.1

Updated : xxHash, to r36

r121:
Added : Makefile : install for kFreeBSD and Hurd (Nobuhiro Iwamatsu)
Fix : Makefile : install for OS-X and BSD, thanks to Takayuki Matsuoka

r120:
Modified : Streaming API, using strong types
Added : LZ4_versionNumber(), thanks to Takayuki Matsuoka
Fix : OS-X : library install name, thanks to Clemens Lang
Updated : Makefile : synchronize library version number with lz4.h, thanks to Takayuki Matsuoka
Updated : Makefile : stricter compilation flags
Added : pkg-config, thanks to Zbigniew Jędrzejewski-Szmek (issue 135)
Makefile : lz4-test only tests native binaries, as suggested by Michał Górny (issue 136)
Updated : xxHash to r35

r119:
Fix : Issue 134 : extended malicious address space overflow in 32-bit mode for some specific configurations

r118:
New : LZ4 Streaming API (Fast version), special thanks to Takayuki Matsuoka
New : datagen : parameterizable synthetic data generator for tests
Improved : fuzzer, support more test cases, more parameters, ability to jump to specific test
fix : support ppc64le platform (issue 131)
fix : Issue 52 (malicious address space overflow in 32-bit mode when using large custom format)
fix : Makefile : minor issue 130 : header files permissions

r117:
Added : man pages for lz4c and lz4cat
Added : automated tests on Travis, thanks to Takayuki Matsuoka !
fix : block-dependency command line (issue 127)
fix : lz4fullbench (issue 128)

r116:
hotfix (issue 124 & 125)

r115:
Added : lz4cat utility, installed on POSIX systems (issue 118)
OS-X compatible compilation of dynamic library (issue 115)

r114:
Makefile : library correctly compiled with -O3 switch (issue 114)
Makefile : library compilation compatible with clang
Makefile : library is versioned and linked (issue 119)
lz4.h : no more static inline prototypes (issue 116)
man : improved header/footer (issue 111)
Makefile : Use system default $(CC) & $(MAKE) variables (issue 112)
xxhash : updated to r34

r113:
Large decompression speed improvement for GCC in 32-bit mode. Thanks to Valery Croizier !
LZ4HC : Compression Level is now a programmable parameter (CLI from 4 to 9)
Separated IO routines from command line (lz4io.c)
Version number into lz4.h (suggested by Francesc Alted)

r112:
quickfix

r111 :
Makefile : added capability to install libraries
Modified Directory tree, to better separate libraries from programs.

r110 :
lz4 & lz4hc : added capability to allocate state & stream state with custom allocator (issue 99)
fuzzer &amp; fullbench : updated to test new functions
man : documented -l command (Legacy format, for Linux kernel compression) (issue 102)
cmake : improved version by Mika Attila, building programs and libraries (issue 100)
xxHash : updated to r33
Makefile : clean also deletes local package .tar.gz

r109 :
lz4.c : corrected issue 98 (LZ4_compress_limitedOutput())
Makefile : can specify version number from makefile

r108 :
lz4.c : corrected compression efficiency issue 97 in 64-bit chained mode (-BD) for streams > 4 GB (thanks Roman Strashkin for reporting)

r107 :
Makefile : support DESTDIR for staged installs. Thanks Jorge Aparicio.
Makefile : make install installs both lz4 and lz4c (Jorge Aparicio)
Makefile : removed -Wno-implicit-declaration compilation switch
lz4cli.c : include <unistd.h> for isatty() (Luca Barbato)
lz4.h : introduced LZ4_MAX_INPUT_SIZE constant (Shay Green)
lz4.h : LZ4_compressBound() : unified macro and inline definitions (Shay Green)
lz4.h : LZ4_decompressSafe_partial() : clarify comments (Shay Green)
lz4.c : LZ4_compress() verify input size condition (Shay Green)
bench.c : corrected a bug in free memory size evaluation
cmake : install into bin/ directory (Richard Yao)
cmake : check for just C compiler (Elan Ruusamae)

r106 :
Makefile : make dist modifies text files in the package to respect Unix EoL convention
lz4cli.c : corrected small display bug in HC mode

r105 :
Makefile : New install script and man page, contributed by Prasad Pandit
lz4cli.c : Minor modifications, for easier extensibility
COPYING  : added license file
LZ4_Streaming_Format.odt : modified file name to remove white space characters
Makefile : .exe suffix now properly added only for Windows target

1.4.0

r123:
Added : experimental lz4frame API, thanks to Takayuki Matsuoka and Christopher Jackson for testing
Fix : s390x support, thanks to Nobuhiro Iwamatsu
Fix : test mode (-t) no longer requires confirmation, thanks to Thary Nguyen

r122:
Fix : AIX & AIX64 support (SamG)
Fix : mips 64-bit support (lew van)
Added : Examples directory, using code examples from Takayuki Matsuoka

Links

PyPI: https://pypi.org/project/lz4
Changelog: https://pyup.io/changelogs/lz4/
Repo: https://github.com/python-lz4/python-lz4

update

opened by pyup-bot

Releases (v0.2.4)

v0.2.4 (Aug 3, 2020)

If you follow the documentation example that uses newt.db, you must install RelStorage<=2.1.1.
Source code(tar.gz)
Source code(zip)
transistor-0.2.4-py3-none-any.whl(129.14 KB)
transistor-0.2.4.tar.gz(129.76 KB)
v0.2.2 (Dec 3, 2018)

Fixed a bug in the BaseWorker.load_items() method that lost scrape data when the number of workers did not equal the number of tasks. Export/save results are now consistent for any number of workers or pool size. Only the scrape time changes, in proportion to the number of workers assigned. Added tests to verify this.
Source code(tar.gz)
Source code(zip)
transistor-0.2.2-py3-none-any.whl(137.51 KB)
transistor-0.2.2.tar.gz(120.37 KB)
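The v0.2.2 fix means results no longer depend on how many workers share a task list. The sketch below illustrates that invariant under stated assumptions. A shared `queue.Queue` is drained by plain threads, whereas Transistor itself uses gevent. The `scrape` and `run_workgroup` helpers are hypothetical stand-ins for a spider and a WorkGroup. This is not the actual `BaseWorker.load_items()` code.

```python
import queue
import threading


def scrape(task):
    """Hypothetical stand-in for a spider: returns one result item per task."""
    return {"keyword": task, "length": len(task)}


def run_workgroup(tasks, workers):
    """Drain every task with `workers` threads so that no task is skipped or lost."""
    task_queue = queue.Queue()
    for task in tasks:
        task_queue.put(task)

    results = []
    lock = threading.Lock()

    def worker():
        while True:
            try:
                task = task_queue.get_nowait()
            except queue.Empty:
                return  # queue drained, this worker is done
            item = scrape(task)
            with lock:  # guard the shared results list
                results.append(item)

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # Sort so the output order does not depend on thread scheduling.
    return sorted(results, key=lambda item: item["keyword"])


if __name__ == "__main__":
    tasks = ["9780451524935", "9780141439518", "9780316769488"]
    for n in (1, 2, 5):
        print(n, len(run_workgroup(tasks, n)))
```

Each worker pulls tasks until the queue is empty, so surplus workers simply exit. With fewer workers than tasks, each worker handles several tasks. Only the elapsed time should change with `workers`.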
v0.2.1 (Nov 29, 2018)

Added a url parameter to the WorkGroup, which makes for a more readable API than passing the URL in a kwarg. The URL was originally passed as a kwarg because, depending on how a custom Spider is set up, the URL may already be specified, and specifying it again is redundant. For API clarity, the URL must now be specified in the WorkGroup, which is easier to read at a glance.
Source code(tar.gz)
Source code(zip)
transistor-0.2.1-py3-none-any.whl(118.98 KB)
transistor-0.2.1.tar.gz(114.11 KB)
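The following sketch illustrates the v0.2.1 design choice. It shows a WorkGroup-like object with `url` as an explicit parameter next to a free-form `kwargs` dict. `WorkGroupSketch` and `spider_kwargs()` are hypothetical names, not Transistor's `WorkGroup` signature. Letting the explicit `url` override any `url` left in `kwargs` is also an assumption.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class WorkGroupSketch:
    """Hypothetical stand-in for a WorkGroup; not Transistor's actual class."""

    name: str
    url: str  # explicit parameter, readable at a glance
    workers: int = 1
    kwargs: Dict[str, Any] = field(default_factory=dict)  # other spider options

    def spider_kwargs(self) -> Dict[str, Any]:
        """Options handed to each spider; the explicit url wins over kwargs['url']."""
        return {**self.kwargs, "url": self.url}


if __name__ == "__main__":
    group = WorkGroupSketch(
        name="books.toscrape.com",
        url="http://books.toscrape.com/",
        workers=3,
        kwargs={"timeout": (3.0, 20.0)},
    )
    print(group.spider_kwargs())
```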
v0.2.0 (Nov 29, 2018)

Many API-breaking changes. See CHANGES at https://github.com/bomquote/transistor/blob/master/CHANGES
Source code(tar.gz)
Source code(zip)
transistor-0.2.0-py3-none-any.whl(118.84 KB)
transistor-0.2.0.tar.gz(113.55 KB)
v0.1.1 (Nov 17, 2018)
Standardized SplashScraper attributes: auth, baseurl, browser, cookies, crawlera_user, http_session_timeout, http_session_valid, LUA_SOURCE, max_retries, name, number, referrer, searchurl, splash_args, user_agent.

Nearly all of the SplashScraper attributes can now be set via **kwargs if desired.
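Here is a minimal sketch of the **kwargs pattern described above. It assumes a hypothetical `ScraperSketch` class, not Transistor's `SplashScraper`, that accepts a handful of the standardized attribute names. The default values are illustrative only. Rejecting unknown names with `TypeError` is a design choice added here to catch typos.

```python
class ScraperSketch:
    """Hypothetical scraper; attribute names follow the standardized list above."""

    # Defaults for a few of the standardized attributes (values are illustrative).
    defaults = {
        "name": None,
        "baseurl": None,
        "searchurl": None,
        "max_retries": 5,
        "user_agent": "Mozilla/5.0",
        "http_session_timeout": None,
    }

    def __init__(self, **kwargs):
        # Refuse misspelled or unsupported attribute names instead of ignoring them.
        unknown = set(kwargs) - set(self.defaults)
        if unknown:
            raise TypeError(f"unexpected attributes: {sorted(unknown)}")
        # Any attribute not passed in keeps its default value.
        for attr, default in self.defaults.items():
            setattr(self, attr, kwargs.get(attr, default))


if __name__ == "__main__":
    scraper = ScraperSketch(name="books.toscrape.com", max_retries=2)
    print(scraper.name, scraper.max_retries, scraper.user_agent)
```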

When initializing a StatefulBook instance, use a kwarg called keywords to set the name of the spreadsheet column heading that contains the target search terms, for example keywords='titles' or keywords='part_numbers'. It defaults to "item".

Source code(tar.gz)
Source code(zip)
transistor-0.1.1-py3-none-any.whl(87.38 KB)
transistor-0.1.1.tar.gz(92.20 KB)
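The `keywords` option can be sketched as reading one named column into a FIFO task queue. This sketch assumes a CSV file, whereas StatefulBook works from a spreadsheet. It uses a hypothetical `load_keyword_tasks` helper and is not StatefulBook's implementation. Skipping blank cells is also an assumption.

```python
import csv
import io
import queue


def load_keyword_tasks(csv_file, keywords="item"):
    """Read the column named by `keywords` into a FIFO task queue.

    Hypothetical helper, not StatefulBook itself; blank cells are skipped.
    """
    reader = csv.DictReader(csv_file)
    # fieldnames is None for an empty file; otherwise it holds the header row.
    if reader.fieldnames is None or keywords not in reader.fieldnames:
        raise KeyError(f"column {keywords!r} not found")
    tasks = queue.Queue()
    for row in reader:
        value = (row.get(keywords) or "").strip()
        if value:
            tasks.put(value)
    return tasks


if __name__ == "__main__":
    sheet = io.StringIO("titles,notes\nA Light in the Attic,\nTipping the Velvet,x\n")
    tasks = load_keyword_tasks(sheet, keywords="titles")
    while not tasks.empty():
        print(tasks.get())
```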
v0.1.0 (Nov 12, 2018)

Source code(tar.gz)
Source code(zip)

Owner

BOM Quote Manufacturing

Demonstration on how to use async python to control multiple playwright browsers for web-scraping

Playwright Browser Pool This example illustrates how it's possible to use a pool of browsers to retrieve page urls in a single asynchronous process. i

8 Oct 27, 2022

Scrapy, a fast high-level web crawling & scraping framework for Python.

Scrapy Overview Scrapy is a fast high-level web crawling and web scraping framework, used to crawl websites and extract structured data from their pag

45.5k Jan 7, 2023

Async Python 3.6+ web scraping micro-framework based on asyncio

Ruia: Async Python 3.6+ web scraping micro-framework based on asyncio. ⚡ Write less, run faster. Overview Ruia is an async web scraping micro-frame

1.6k Jan 1, 2023

Web Scraping Framework

Grab Framework Documentation Installation $ pip install -U grab See details about installing Grab on different platforms here http://docs.grablib.

2.3k Jan 4, 2023

A simple django-rest-framework api using web scraping

Apicell You can use this api to search in google, bing, pypi and subscene and get results Method : POST Parameter : query Example import request url =

1 Dec 19, 2021

Amazon web scraping using Scrapy Framework

Amazon-web-scraping-using-Scrapy-Framework Scrapy Scrapy is an application framework for crawling web sites and extracting structured data which can b

1 Jan 25, 2022

Web mining module for Python, with tools for scraping, natural language processing, machine learning, network analysis and visualization.

Pattern Pattern is a web mining module for Python. It has tools for: Data Mining: web services (Google, Twitter, Wikipedia), web crawler, HTML DOM par

Computational Linguistics Research Group

8.4k Jan 8, 2023

Web Scraping Practica With Python

Web-Scraping-Practica Members: Guillem Vidal Pallarols, Lídia Bandrés Solé. Files: This document is the first one we find. Next we find …

2 Nov 8, 2021

Here I provide the source code for doing web scraping using the python library, it is Selenium.

1 Nov 13, 2021

Web Scraping OLX with Python and Bsoup.

webScrap WebScraping first step. Authors: Paulo, Claudio M. First steps in Web Scraping. Project carried out for training in Web Scraping. The export

5 Sep 25, 2022

Web Scraping images using Selenium and Python

Web Scraping images using Selenium and Python. About this document: This is a markdown document about Web scraping images and videos using Selenium

3 Jul 1, 2022

Web-scraping - A bot using Python with BeautifulSoup that scrapes the IRS website by form number and returns the results as JSON

Web-scraping - A bot using Python with BeautifulSoup that scrapes the IRS website (prior form publication) by form number and returns the results as JSON. It provides the option to download PDFs over a range of years.

1 Jan 4, 2022

A web scraping pipeline project that retrieves TV and movie data from two sources, then transforms and stores data in a MySQL database.

New to Streaming Scraper An in-progress web scraping project built with Python, R, and SQL. The scraped data are movie and TV show information. The go

1 Mar 28, 2022

Overview

transistor

About

Quickstart

Quickstart: Setup Splash

Quickstart: books_to_scrape example

Directly Using A SplashScraper

Architecture Summary

Database Setup

More on StatefulBook

Testing

Comments

3.4.5

3.4.4

3.4.3

3.4.2

3.4.1

3.4.0

3.3.2

3.3.1

3.3.0

3.2.1

3.2.0

3.1.2

3.1.1

3.1.0

3.0.1

3.0.0

3.0rc1

3.0b3

3.0b2

3.0b1

3.0a13

3.0a12

3.0a11

3.0a10

3.0a9

3.0a8

3.0a7

3.0a6

3.0a5

3.0a4

3.0a3

3.0a2

3.0a1

3.4.2 (2021-04-21)

3.4.1 (2021-04-12)

3.4.0 (2020-10-19)

3.3.2 (2020-09-21)

3.3.1 (2020-09-14)

3.3.0 (2020-09-14)

Quickstart: `books_to_scrape` example