A mutable set that remembers the order of its entries. One of Python's missing data types.

Overview

Travis Codecov Pypi

An OrderedSet is a mutable data structure that is a hybrid of a list and a set. It remembers the order of its entries, and every entry has an index number that can be looked up.

Usage examples

An OrderedSet is created and used like a set:

>>> from ordered_set import OrderedSet

>>> letters = OrderedSet('abracadabra')

>>> letters
OrderedSet(['a', 'b', 'r', 'c', 'd'])

>>> 'r' in letters
True

It is efficient to find the index of an entry in an OrderedSet, or find an entry by its index. To help with this use case, the .add() method returns the index of the added item, whether it was already in the set or not.

>>> letters.index('r')
2

>>> letters[2]
'r'

>>> letters.add('r')
2

>>> letters.add('x')
5

OrderedSets implement the union (|), intersection (&), and difference (-) operators like sets do.

>>> letters |= OrderedSet('shazam')

>>> letters
OrderedSet(['a', 'b', 'r', 'c', 'd', 'x', 's', 'h', 'z', 'm'])

>>> letters & set('aeiou')
OrderedSet(['a'])

>>> letters -= 'abcd'

>>> letters
OrderedSet(['r', 'x', 's', 'h', 'z', 'm'])

The __getitem__() and index() methods have been extended to accept any iterable except a string, returning a list, to perform NumPy-like "fancy indexing".

>>> letters = OrderedSet('abracadabra')

>>> letters[[0, 2, 3]]
['a', 'r', 'c']

>>> letters.index(['a', 'r', 'c'])
[0, 2, 3]

OrderedSet implements __getstate__ and __setstate__ so it can be pickled, and implements the abstract base classes collections.MutableSet and collections.Sequence.

OrderedSet can be used as a generic collection type, similar to the collections in the typing module like List, Dict, and Set. For example, you can annotate a variable as having the type OrderedSet[str] or OrderedSet[Tuple[int, str]].

OrderedSet in data science applications

An OrderedSet can be used as a bi-directional mapping between a sparse vocabulary and dense index numbers. As of version 3.1, it accepts NumPy arrays of index numbers as well as lists.

This combination of features makes OrderedSet a simple implementation of many of the things that pandas.Index is used for, and many of its operations are faster than the equivalent pandas operations.

For further compatibility with pandas.Index, get_loc (the pandas method for looking up a single index) and get_indexer (the pandas method for fancy indexing in reverse) are both aliases for index (which handles both cases in OrderedSet).

Authors

OrderedSet was implemented by Robyn Speer. Jon Crall contributed changes and tests to make it fit the Python set API. Roman Inflianskas added the original type annotations.

Comparisons

The original implementation of OrderedSet was a recipe posted to ActiveState Recipes by Raymond Hettiger, released under the MIT license.

Hettiger's implementation kept its content in a doubly-linked list referenced by a dict. As a result, looking up an item by its index was an O(N) operation, while deletion was O(1).

This version makes different trade-offs for the sake of efficient lookups. Its content is a standard Python list instead of a doubly-linked list. This provides O(1) lookups by index at the expense of O(N) deletion, as well as slightly faster iteration.

In Python 3.6 and later, the built-in dict type is inherently ordered. If you ignore the dictionary values, that also gives you a simple ordered set, with fast O(1) insertion, deletion, iteration and membership testing. However, dict does not provide the list-like random access features of OrderedSet. You would have to convert it to a list in O(N) to look up the index of an entry or look up an entry by its index.

Comments
  • Add ordered-set-stubs installation instructions

    Add ordered-set-stubs installation instructions

    I have found that:

    1. ordered_set.pyi file is not copied to ordered_set package.
    2. #26 is not fixed, OrderedSet[str], for example, gives an error.

    This PR fixes both.

    After the merge, please release. Sorry, I didn't notice for the first time.

    opened by rominf 17
  • Please consider switching to flit_core

    Please consider switching to flit_core

    I'm the @Gentoo maintainer for core Python packages. We're currently considering options for bootstrapping setuptools while unvendoring its dependencies in the PEP 517 world. One of the interesting options is to use flit as it has much fewer dependencies than setuptools and we've got its bootstrap trivially covered.

    Using flit as ordered-set's build system would mean that we can cleanly install it before building setuptools. Flit can also generate a backwards-compatible setup.py (or be used alongside setup.py, though for this to work for us pyproject.toml would have to specify flit_core as the build-backend). FWICS it is becoming increasingly popular these days.

    Would you be interested in this? If yes, then I can prepare a PR either converting the project to flit entirely, or adding it alongside existing setup.py, as you prefer.

    opened by mgorny 12
  • Directly distribute type hints via PEP 561

    Directly distribute type hints via PEP 561

    There is already a type stub file (ordered_set.pyi), but this was not actually being distributed to end-users, so, they had to use the package ordered-set-stubs.

    This uses the mechanisms from PEP 561 to ensure that the type stub file is properly distributed so that type hints work out-of-the-box, rather than requiring ordered-set-stubs.

    Per PEP 561, this requires two changes to setup.py:

    1. Add a sentinel file py.typed and include this in package_data.
    2. Use packages rather than py_modules, as the PEP states:

    This PEP does not support distributing typing information as part of module-only distributions. The code should be refactored into a package-based distribution and indicate that the package supports typing as described above.

    opened by Eric-Arellano 12
  • Conform to set API, update to pytest, add CI badges

    Conform to set API, update to pytest, add CI badges

    This is a fairly large PR in terms of text, but I think the content is manageable for review.

    It seems like this is the best implementation of OrderedSet on PYPI, but unfortunately it is missing functions implemented by the builtin set object like: intersection, symmetric_difference_update, and issubset. This PR adds implementations for these functions.

    A brief summary of changes is:

    • Implement all functions exposed in Python's builtin set API.
    • Replace nose with pytest
    • Add TravisCI and corresponding badges in the README
    • Change getitem(slice(None)) to return a copy instead of a self reference.

    As for reasoning behind these changes:

    • I think OrderedSet should serve as a drop-in replacement for set, and therefore should implement the full API.
    • I've replaced nose with pytest, because nose is [deprecated]. (http://nose.readthedocs.io/en/latest/index.html).
    • TravisCI serves as a public facing CI, so it can see that all tests pass without having to trust text in the README. This also lets the readme publicly display test coverage, and there is a nice pypi integration badge as well.
    • The __getitem__ change may be the most debatable. The previous docs stated that OrderedSet()[:] would return an exact self reference, but I want to argue against this behavior. The main point is that this deviates from a standard behavior taken by other Mutable Python objects. Hopefully this code snippet illustrates my point:
    >>> x = [1, 2, 3]
    >>> x[:] is x
    False
    
    >>> x = np.array([1, 2, 3])
    >>> x[:] is x
    False
    
    >>> # Why should OrderedSet have behavior different than a list here?
    >>> x = OrderedSet([1, 2, 3])
    >>> x[:] is x
    True
    

    A potential counter point may be that tuple([1, 2, 3])[:], does return a reference to itself, but tuple is an immutable object, where as OrderedSet is not.

    Unfortunately, even though this change is relatively minor it is a potentially breaking change that requires a major version update. My PR includes that as well. If I fail to convince you that this is a good idea, it can obviously be taken out and the minor version can be incremented.


    Note: I'm submitting the PR in its current state to gauge interest. Some of the code --- especially the tests --- needs a bit of tweaking to match the style of the other code. I will do this work if the maintainers like the general changes.

    I have my own implementation of OrderedSet, also based on Raymond Hettiger's original recipe, in my utility package but I'd like to rely on a standard pypi package instead. Also, I agree that nobody wants O(1) deletes in exchange for O(N) lookups. Thus most of this PR was simply copy and pasted from my implementation.

    All of the new tests were autogenerated from the doctests in my old package. As such, the stdout way of testing results doesn't work and all tests would need to be updated to actually check for the right values. Lastly, many of the tests might be redundant with existing tests. Again, I do the work to make these fixes if there is any interest in merging this PR.

    A summary of the cleanup work that would need to be done is:

    • [x] change travis targets in the readme to point at LuminosoInsight instead of Erotemic.
    • [x] Ensure that all test outputs are being checked
    • [x] Either enable doctests in CI testing? or Remove doctests from the main module?
    opened by Erotemic 8
  • It should be possible to build ordered-set without setuptools installed

    It should be possible to build ordered-set without setuptools installed

    python setuptools attempt to import ordered_set. Therefore it should be possible to build ordered_set with setuptools uninstalled. Perhaps try something like this:

    try:
        from setuptools import setup
    except ImportError:
        from distutils.core import setup
    
    opened by jachymb 7
  • Package is not PEP 561 compliant

    Package is not PEP 561 compliant

    Hi, I notice that this package isn't PEP 561 compliant, making mypy (v0.782, Python 3.7) unable to detect type annotations even though they exist in the code. A remedy is to put a py.typed file in the package directory, as explained in https://mypy.readthedocs.io/en/stable/installed_packages.html#making-pep-561-compatible-packages. I may be able to make a PR for this, so let me know if you're OK with it.

    opened by kmkurn 7
  • OrderedSet is slower than OrderedDict; should use OrderedDict instead

    OrderedSet is slower than OrderedDict; should use OrderedDict instead

    Since it requires Python >= 3.5 anyway, OrderedSet can be reduced to an adapter of collections.OrderedDict. One can easily use a dict as a set by ignoring the dict's values.

    A synthetic benchmark I ran shows that, if CPU caching does not come into play (a big if), an OrderedDict is faster (especially in the case of deletion, which is O(n) ):

    Wrote profile results to set_test.py.lprof
    Timer unit: 1e-06 s
    
    Total time: 6.21423 s
    File: set_test.py
    Function: bench at line 13
    
    Line #      Hits         Time  Per Hit   % Time  Line Contents
    ==============================================================
        13                                           @profile
        14                                           def bench():
        15                                               # Creation
        16         1       5228.0   5228.0      0.1      s = OrderedSet(included)
        17         1      17110.0  17110.0      0.3      d = OrderedDict({i: 1 for i in included})
        18                                           
        19                                               # Insertion
        20      3001       1518.0      0.5      0.0      for i in missing:
        21      3000       5976.0      2.0      0.1          s.add(i)
        22      3001       1518.0      0.5      0.0      for i in missing:
        23      3000       2231.0      0.7      0.0          d[i] = 1
        24                                           
        25                                               # Deletion
        26      3001       2161.0      0.7      0.0      for i in missing:
        27      3000    6159236.0   2053.1     99.1          s.remove(i)
        28      3001       1483.0      0.5      0.0      for i in missing:
        29      3000       2533.0      0.8      0.0          d.pop(i)
        30                                           
        31                                               # Membership test (included)
        32      3001       1492.0      0.5      0.0      for i in included:
        33      3000       2764.0      0.9      0.0          assert i in s
        34      3001       1534.0      0.5      0.0      for i in included:
        35      3000       1786.0      0.6      0.0          assert i in d
        36                                           
        37                                               # Membership test (missing)
        38      3001       1510.0      0.5      0.0      for i in missing:
        39      3000       2836.0      0.9      0.0          assert i not in s
        40      3001       1519.0      0.5      0.0      for i in missing:
        41      3000       1795.0      0.6      0.0          assert i not in d
    

    I am quite convinced there would be performance improvements, if we changed the implementation to use an OrderedDict instead of a set and a list (or a Python 3.6 dict, which is ordered, as @Eric-Arellano suggests).

    Edit: However, this would make the access by index (like OrderedSet[1]) much slower (complexity O(n) instead of the current O(1)).

    Would you accept a PR doing so?

    opened by danuker 6
  • Equality comparisons with regular set don't commute

    Equality comparisons with regular set don't commute

    Equality with set is not commutative:

    >>> s = {'x', 'y'}
    >>> os = OrderedSet(['x', 'y'])
    >>> s == os
    False
    >>> os == s
    True
    

    It will be difficult to fix because this returns False instead of NotImplemented

    >>> s.__eq__(os)
    False
    

    I'm not sure the best way forward. You can allow OrderedSet.__eq__ to take the precedence if it is a proper set subclass..

    opened by wimglenn 6
  • v4.0

    v4.0

    This changes equality comparisons completely to make them more consistent with how all the other data structures in Python stdlib behave.

    • Comparisons with regular set and frozenset are order agnostic.
    • Comparisons with other OrderedSet (and other ordered-set-like-things) are order sensitive.
    • Previous equality comparison with list/deque/sequence was removed (this was unpythonic and had no precedent)
    • Comparisons with unrelated types return NotImplemented - this is allowing the other type a chance to potentially handle the operation (as opposed to returning False)
    • Major version number is bumped up because this is a backwards-incompat change.

    The behaviour of equality comparison was modeled after the way collections.OrderedDict behaves (order agnostic vs regular dict, order sensitive vs other ordered dict).

    opened by wimglenn 5
  • Alternative implementation

    Alternative implementation

    Did this as an exercise to see if it would look cleaner to implement this using collections.OrderedDict.

    Use multiple inheritance (Sequence in addition to MutableSet) to get indexing.

    This implementation doesn't have to keep track of two different objects self.items and self.map (though maybe collections.OrderedDict might be doing that internally)... either way, the code looks cleaner. Look at the implementation of the abstract methods.

    The index method has this feature where you can pass it an iterable. Was able to implement this feature without having to re-implement index... we can just call super() and use the one we get from collections.Sequence

    All tests pass without modifying test.py

    opened by ericfrederich 5
  • Fixed bitwise and

    Fixed bitwise and

    The inherited bitwise and func (__and__) returns values in reverse order. This PR adds a fix and a test. Removing my change causes test_bitwise_and to fail (because the order of result2 will be incorrect).

    The fix simply overrides __and__ to call self.intersection directly.

    opened by Erotemic 4
  • Interest in including OrderedSet in python standard library?

    Interest in including OrderedSet in python standard library?

    Hello, I am interested to know if the maintainers of this package would have interest in including this functionality in the python standard library. It seems natural for OrderedSet to sit next to OrderedDict in the collections package. There is a discussion on the Python Discussion Forums. It is not clear-cut that it should be added but I wanted to gauge the interest of the folks maintaining the current Python ordered set implementations.

    https://discuss.python.org/t/add-orderedset-to-stdlib/12730/15

    opened by jagerber48 1
  • orderedset.update -> should be iterable

    orderedset.update -> should be iterable

    Current:

        def update(self, sequence: Union[Sequence[T], Set[T]]) -> int:
    

    Proposed:

        def update(self, sequence: Iterable[T]) -> int:
    

    The reason is that I push in a generator, which works just fine. But the IDE complains because it's neither a Sequence, nor a Set.

    opened by svaningelgem 0
  • OrderedSet.pop() with a non-default index breaks the contract

    OrderedSet.pop() with a non-default index breaks the contract

    >>> from ordered_set import OrderedSet
    >>> a = OrderedSet(["a", "b", "c"])
    >>> a.index("b")
    1
    >>> a.pop(0)
    'a'
    >>> a.index("b")
    1
    >>> a[1]
    'c'
    

    This was revealed by looking into #79. pop() would need to rebuild the mapping in place, like discard() does. Its current behavior, which only works for index=-1, would be an optimization.

    opened by rspeer 1
  • Support item assignment

    Support item assignment

    Since OrderedSet supports indexing, it's natural to assume that assignments and del would work as well, but they don't:

    >>> s = OrderedSet(['foo'])
    >>> s[0]
    'foo'
    >>> s[0] = 'bar'
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: 'OrderedSet' object does not support item assignment
    >>> del s[0]
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: 'OrderedSet' object doesn't support item deletion
    

    There's an easy workaround for del (s.remove(s[0])), but not for item assignments. As far as I can tell, there is no way to reorder items at all, so it's impossible to replace an element with another one (while preserving its index).

    opened by Aran-Fey 9
Owner
Elia Robyn Lake (Robyn Speer)
Knowledge graph sorceress. She/her. Developer of ConceptNet, which provides the common-sense background knowledge for some state-of-the-art NLP.
Elia Robyn Lake (Robyn Speer)
Integrating C Buffer Data Into the instruction of `.text` segment instead of on `.data`, `.rodata` to avoid copy.

gcc-bufdata-integrating2text Integrating C Buffer Data Into the instruction of .text segment instead of on .data, .rodata to avoid copy. Usage In your

Jack Ren 1 Jan 31, 2022
pyprobables is a pure-python library for probabilistic data structures

pyprobables is a pure-python library for probabilistic data structures. The goal is to provide the developer with a pure-python implementation of common probabilistic data-structures to use in their work.

Tyler Barrus 86 Dec 25, 2022
An esoteric data type built entirely of NaNs.

NaNsAreNumbers An esoteric data type built entirely of NaNs. Installation pip install nans_are_numbers Explanation A floating point number is just co

Travis Hoppe 72 Jan 1, 2023
This repository is a compilation of important Data Structures and Algorithms based on Python.

Python DSA ?? This repository is a compilation of important Data Structures and Algorithms based on Python. Please make seperate folders for different

Bhavya Verma 27 Oct 29, 2022
This repository is for adding codes of data structures and algorithms, leetCode, hackerrank etc solutions in different languages

DSA-Code-Snippet This repository is for adding codes of data structures and algorithms, leetCode, hackerrank etc solutions in different languages Cont

DSCSRMNCR 3 Oct 22, 2021
Data Structures and algorithms package implementation

Documentation Simple and Easy Package --This is package for enabling basic linear and non-linear data structures and algos-- Data Structures Array Sta

null 1 Oct 30, 2021
This repo is all about different data structures and algorithms..

Data Structure and Algorithm : Want to learn data strutrues and algorithms ??? Then Stop thinking more and start to learn today. This repo will help y

Priyanka Kothari 7 Jul 10, 2022
This repo represents all we learned and are learning in Data Structure course.

DataStructure Journey This repo represents all we learned and are learning in Data Structure course which is based on CLRS book and is being taught by

Aprime Afr (Alireza Afroozi) 3 Jan 22, 2022
Svector (pronounced Swag-tor) provides extension methods to pyrsistent data structures

Svector Svector (pronounced Swag-tor) provides extension methods to pyrsistent data structures. Easily chain your methods confidently with tons of add

James Chua 5 Dec 9, 2022
This Repository consists of my solutions in Python 3 to various problems in Data Structures and Algorithms

Problems and it's solutions. Problem solving, a great Speed comes with a good Accuracy. The more Accurate you can write code, the more Speed you will

SAMIR PAUL 1.3k Jan 1, 2023
My notes on Data structure and Algos in golang implementation and python

My notes on DS and Algo Table of Contents Arrays LinkedList Trees Types of trees: Tree/Graph Traversal Algorithms Heap Priorty Queue Trie Graphs Graph

Chia Yong Kang 0 Feb 13, 2022
Webtesting for course Data Structures & Algorithms

Selenium job to automate queries to check last posts of Module Data Structures & Algorithms Web-testing for course Data Structures & Algorithms Struct

null 1 Dec 15, 2021
Data Structure With Python

Data-Structure-With-Python- Python programs also include in this repo Stack A stack is a linear data structure that stores items in a Last-In/First-Ou

Sumit Nautiyal 2 Jan 9, 2022
schemasheets - structuring your data using spreadsheets

schemasheets - structuring your data using spreadsheets Create a data dictionary / schema for your data using simple spreadsheets - no coding required

Linked data Modeling Language 23 Dec 1, 2022
A collection of data structures and algorithms I'm writing while learning

Data Structures and Algorithms: This is a collection of data structures and algorithms that I write while learning the subject Stack: stack.py A stack

Dhravya Shah 1 Jan 9, 2022
Final Project for Practical Python Programming and Algorithms for Data Analysis

Final Project for Practical Python Programming and Algorithms for Data Analysis (PHW2781L, Summer 2020) Redlining, Race-Exclusive Deed Restriction Lan

Aislyn Schalck 1 Jan 27, 2022
IADS 2021-22 Algorithm and Data structure collection

A collection of algorithms and datastructures introduced during UoE's Introduction to Datastructures and Algorithms class.

Artemis Livingstone 20 Nov 7, 2022
Automatically download the cwru data set, and then divide it into training data set and test data set

Automatically download the cwru data set, and then divide it into training data set and test data set.自动下载cwru数据集,然后分训练数据集和测试数据集

null 6 Jun 27, 2022
This Project is based on NLTK It generates a RANDOM WORD from a predefined list of words, From that random word it read out the word, its meaning with parts of speech , its antonyms, its synonyms

This Project is based on NLTK(Natural Language Toolkit) It generates a RANDOM WORD from a predefined list of words, From that random word it read out the word, its meaning with parts of speech , its antonyms, its synonyms

SaiVenkatDhulipudi 2 Nov 17, 2021
Mini Tool to lovers of debe from eksisozluk (one of the most famous website -reffered as collaborative dictionary like reddit- in Turkey) for pushing debe (Most Liked Entries of Yesterday) to kindle every day via Github Actions.

debe to kindle Mini Tool to lovers of debe from eksisozluk (one of the most famous website -refered as collaborative dictionary like reddit- in Turkey

null 11 Oct 11, 2022