Universal 1D/2D data containers with transformer functionality for data analysis.

Overview

XPandas (extended Pandas) implements 1D and 2D data containers for storing type-heterogeneous tabular data (for example, columns whose cells are whole time series), and encapsulates feature extraction and transformation modelling in an sklearn-compatible transformer interface.
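
A minimal usage sketch of the containers, based on the constructor calls that appear in the issues below; it is illustrative only, and transformer classes are not shown since their exact names may differ between releases.

import numpy as np
import pandas as pd
from xpandas.data_container import XSeries, XDataFrame

# An XSeries can hold arbitrary objects, e.g. one whole pandas Series per cell
ts = XSeries([pd.Series(np.random.randn(10)) for _ in range(5)])

# An XDataFrame combines type-heterogeneous columns in a single 2D container
X = XDataFrame({
    'ts': ts,                                     # column of time series
    'label': XSeries(['a', 'b', 'a', 'b', 'a']),  # column of plain labels
})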

Quickstart

Install the latest version

$ pip install xpandas

and run the example Jupyter notebook

$ jupyter notebook examples/ExampleUsage.ipynb

Documentation

The full documentation is available at https://alan-turing-institute.github.io/xpandas/.

Acknowledgements

  • Bernd Bischl (@berndbischl), who mentioned the idea of a general data container with transformers attached to columns in personal discussion with Franz Kiraly during a London visit in 2016.
  • Franz Kiraly (@fkiraly), who initiated and funded the project up to release, and who substantially contributed to the API design.
  • Haoran Xue (@HaoranXue), who, under the supervision of Franz Kiraly, earlier completed a thesis for a degree at UCL on the topic, and who wrote a similar package as part of it. No code was re-used in the creation of the XPandas package.

List of developers and contributors

Comments
  • Acknowledgments

    Acknowledgments

    Before I forget: somewhere prominently the following should be acknowledged in some form, not necessarily in the below:

    Bernd Bischl, who mentioned the idea of a general data container with transformers attached to columns in personal discussion during a London visit in 2016. Myself, having (in my opinion) substantially contributed through the API design (?). Haoran Xue, who completed a thesis on the topic earlier. While no code was transferred, lessons that were learnt may have been transferred.

    opened by fkiraly 2
  • Improved documentation

    Improved documentation

    This pull request improves the readability of the documentation.

    While going through your codebase, I realised that there's a lot of redundancy in the module naming, e.g. /transformers/transformers/series_transformers/series_transformer.py instead of /transformers/series/series_transformer.py. Is there any specific reason for that? If not, I'd suggest refactoring the modules into a more straightforward naming structure.

    opened by frthjf 1
  • sensible default for transformation: column replacement

    sensible default for transformation: column replacement

    Currently, a transformation appends the transformer output while retaining the original column; the sensible default would be to replace the column instead.

    To retain the original column explicitly, use an identityTransformer (to be implemented); a minimal sketch of such a transformer follows below.
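
    A hypothetical sketch of such an identityTransformer, written against the sklearn-compatible transformer interface mentioned in the overview; it is not part of the released package:

    from sklearn.base import BaseEstimator, TransformerMixin

    class IdentityTransformer(BaseEstimator, TransformerMixin):
        """Pass-through transformer: returns its input unchanged, so the
        original column can be kept explicitly once replacement is the default."""

        def fit(self, X, y=None):
            return self

        def transform(self, X):
            return X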

    opened by fkiraly 0
  • tutorial: separate data container from transformer tutorial

    tutorial: separate data container from transformer tutorial

    The structure should be changed to: (1) data containers (XSeries and XDataFrame), (2) transformer functionality,

    since the user should be made aware that (1) is a separate interface concept on top of which (2) may be invoked, but to which it isn't necessarily tied.

    opened by fkiraly 0
  • Bump numpy from 1.15.2 to 1.22.0

    Bump numpy from 1.15.2 to 1.22.0

    Bumps numpy from 1.15.2 to 1.22.0.

    Release notes

    Sourced from numpy's releases.

    v1.22.0

    NumPy 1.22.0 Release Notes

    NumPy 1.22.0 is a big release featuring the work of 153 contributors spread over 609 pull requests. There have been many improvements; the highlights are:

    • Annotations of the main namespace are essentially complete. Upstream is a moving target, so there will likely be further improvements, but the major work is done. This is probably the most user visible enhancement in this release.
    • A preliminary version of the proposed Array-API is provided. This is a step in creating a standard collection of functions that can be used across applications such as CuPy and JAX.
    • NumPy now has a DLPack backend. DLPack provides a common interchange format for array (tensor) data.
    • New methods for quantile, percentile, and related functions. The new methods provide a complete set of the methods commonly found in the literature.
    • A new configurable allocator for use by downstream projects.

    These are in addition to the ongoing work to provide SIMD support for commonly used functions, improvements to F2PY, and better documentation.

    The Python versions supported in this release are 3.8-3.10; Python 3.7 has been dropped. Note that 32-bit wheels are only provided for Python 3.8 and 3.9 on Windows; all other wheels are 64-bit, on account of Ubuntu, Fedora, and other Linux distributions dropping 32-bit support. All 64-bit wheels are also linked with 64-bit integer OpenBLAS, which should fix the occasional problems encountered by folks using truly huge arrays.

    Expired deprecations

    Deprecated numeric style dtype strings have been removed

    Using the strings "Bytes0", "Datetime64", "Str0", "Uint32", and "Uint64" as a dtype will now raise a TypeError.

    (gh-19539)

    Expired deprecations for loads, ndfromtxt, and mafromtxt in npyio

    numpy.loads was deprecated in v1.15, with the recommendation that users use pickle.loads instead. ndfromtxt and mafromtxt were both deprecated in v1.17 - users should use numpy.genfromtxt instead with the appropriate value for the usemask parameter.

    (gh-19615)

    ... (truncated)
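
    A small hedged sketch of the npyio migration described in the notes above (the file name is hypothetical): numpy.genfromtxt with the usemask parameter replaces the removed ndfromtxt/mafromtxt helpers, and pickle.loads replaces numpy.loads.

    import pickle
    import numpy as np

    # was: np.lib.npyio.ndfromtxt('data.txt')
    arr = np.genfromtxt('data.txt', usemask=False)

    # was: np.lib.npyio.mafromtxt('data.txt'), which returned a masked array
    masked = np.genfromtxt('data.txt', usemask=True)

    # was: np.loads(pickled_bytes)
    roundtripped = pickle.loads(pickle.dumps(arr))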

    Commits

    Dependabot compatibility score

    Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


    Dependabot commands and options

    You can trigger Dependabot actions by commenting on this PR:

    • @dependabot rebase will rebase this PR
    • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
    • @dependabot merge will merge this PR after your CI passes on it
    • @dependabot squash and merge will squash and merge this PR after your CI passes on it
    • @dependabot cancel merge will cancel a previously requested merge and block automerging
    • @dependabot reopen will reopen this PR if it is closed
    • @dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
    • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
    • @dependabot use these labels will set the current labels as the default for future PRs for this repo and language
    • @dependabot use these reviewers will set the current reviewers as the default for future PRs for this repo and language
    • @dependabot use these assignees will set the current assignees as the default for future PRs for this repo and language
    • @dependabot use this milestone will set the current milestone as the default for future PRs for this repo and language

    You can disable automated security fix PRs for this repo from the Security Alerts page.

    dependencies 
    opened by dependabot[bot] 0
  • Bump ipython from 7.0.1 to 7.16.3

    Bump ipython from 7.0.1 to 7.16.3

    Bumps ipython from 7.0.1 to 7.16.3.

    Commits

    Dependabot compatibility score

    Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


    Dependabot commands and options

    You can trigger Dependabot actions by commenting on this PR:

    • @dependabot rebase will rebase this PR
    • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
    • @dependabot merge will merge this PR after your CI passes on it
    • @dependabot squash and merge will squash and merge this PR after your CI passes on it
    • @dependabot cancel merge will cancel a previously requested merge and block automerging
    • @dependabot reopen will reopen this PR if it is closed
    • @dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
    • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
    • @dependabot use these labels will set the current labels as the default for future PRs for this repo and language
    • @dependabot use these reviewers will set the current reviewers as the default for future PRs for this repo and language
    • @dependabot use these assignees will set the current assignees as the default for future PRs for this repo and language
    • @dependabot use this milestone will set the current milestone as the default for future PRs for this repo and language

    You can disable automated security fix PRs for this repo from the Security Alerts page.

    dependencies 
    opened by dependabot[bot] 0
  • will it work for multivariate time series prediction, both regression and classification

    will it work for multivariate time series prediction, both regression and classification

    Great code, thanks. Could you clarify: will it work for multivariate time series prediction, both regression and classification?

    1. Where all values are continuous, for example:

           weight  height  age  target
        1      56     160   34     1.2
        2      77     170   54     3.5
        3      87     167   43     0.7
        4      55     198   72     0.5
        5      88     176   32     2.3

    2. Or even where the values are a mixture of continuous and categorical, for example two continuous dimensions and three categorical dimensions:

            color  weight  gender  height  age  target
        1   black      56       m     160   34     yes
        2   white      77       f     170   54      no
        3  yellow      87       m     167   43     yes
        4   white      55       m     198   72      no
        5   white      88       f     176   32     yes
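
    A hedged sketch of how such mixed columns could be represented with the XPandas containers (column names are taken from the question; model fitting is not shown):

    import numpy as np
    import pandas as pd
    from xpandas.data_container import XSeries, XDataFrame

    # one whole series per cell, plus plain categorical and continuous columns
    ts = XSeries([pd.Series(np.random.randn(20)) for _ in range(5)])
    X = XDataFrame({
        'ts': ts,                                                          # time series
        'color': XSeries(['black', 'white', 'yellow', 'white', 'white']),  # categorical
        'weight': XSeries([56, 77, 87, 55, 88]),                           # continuous
    })
    y = pd.Series(['yes', 'no', 'yes', 'no', 'yes'])                       # classification target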

    opened by Sandy4321 0
  • Many standard methods do not work (properly) on XDataFrame with hierarchical data

    Many standard methods do not work (properly) on XDataFrame with hierarchical data

    # loading some time-series data
    from io import BytesIO
    from zipfile import ZipFile
    from urllib.request import urlopen
    from xpandas.data_container import XSeries, XDataFrame
    import numpy as np
    import pandas as pd

    def read_data(file):
        data = file.readlines()
        # each row is the class label followed by the time series values, separated by double spaces
        rows = [row.decode('utf-8').strip().split('  ') for row in data]
        X = pd.DataFrame(rows, dtype=float)
        y = X.pop(0)  # first column is the target
        ts = XSeries([row for _, row in X.iterrows()])
        X = XDataFrame({'ts1': ts, 'ts2': ts})
        return X, y

    url = 'http://www.timeseriesclassification.com/Downloads/GunPoint.zip'
    url = urlopen(url)
    zipfile = ZipFile(BytesIO(url.read()))
    file = zipfile.open('GunPoint_TRAIN.txt')
    X, y = read_data(file)

    X.mean()  # returns an empty series rather than the column means; the same for many other methods like .std(), .median(), etc.

    X.apply(np.mean)  # breaks

    X['ts1'].mean()  # breaks

    X['ts1'].apply(np.mean)  # works

    X['ts1'].apply(np.percentile, args=(25,))  # breaks, does not pass on args
    
    opened by mloning 0
  • Slicing single row of XDataFrame does not work

    Slicing single row of XDataFrame does not work

    Slicing a single row of an XDataFrame does not work, probably because it tries to return a Series, which fails when the column types are heterogeneous; instead one may want to return an XDataFrame with a single row.

    import pandas as pd
    from xpandas.data_container import XDataFrame
    
    iris = pd.read_csv('https://raw.githubusercontent.com/mwaskom/seaborn-data/master/iris.csv')
    iris.iloc[0] # works
    
    irisx = XDataFrame(iris) 
    irisx.iloc[0] # breaks
    
    
    ---------------------------------------------------------------------------
    ValueError                                Traceback (most recent call last)
    <ipython-input-32-b46ce52f9af0> in <module>
    ----> 1 irisx.iloc[0]
    
    ~/.conda/envs/sktime/lib/python3.7/site-packages/pandas/core/indexing.py in __getitem__(self, key)
       1476 
       1477             maybe_callable = com._apply_if_callable(key, self.obj)
    -> 1478             return self._getitem_axis(maybe_callable, axis=axis)
       1479 
       1480     def _is_scalar_access(self, key):
    
    ~/.conda/envs/sktime/lib/python3.7/site-packages/pandas/core/indexing.py in _getitem_axis(self, key, axis)
       2102             self._validate_integer(key, axis)
       2103 
    -> 2104             return self._get_loc(key, axis=axis)
       2105 
       2106     def _convert_to_indexer(self, obj, axis=None, is_setter=False):
    
    ~/.conda/envs/sktime/lib/python3.7/site-packages/pandas/core/indexing.py in _get_loc(self, key, axis)
        143         if axis is None:
        144             axis = self.axis
    --> 145         return self.obj._ixs(key, axis=axis)
        146 
        147     def _slice(self, obj, axis=None, kind=None):
    
    ~/.conda/envs/sktime/lib/python3.7/site-packages/pandas/core/frame.py in _ixs(self, i, axis)
       2624                                                       index=self.columns,
       2625                                                       name=self.index[i],
    -> 2626                                                       dtype=new_values.dtype)
       2627                 result._set_is_copy(self, copy=copy)
       2628                 return result
    
    ~/.conda/envs/sktime/lib/python3.7/site-packages/xpandas/data_container/data_container.py in __init__(self, *args, **kwargs)
         71         check_result, data_type = _check_all_elements_have_the_same_property(data, type)
         72         if not check_result:
    ---> 73             raise ValueError('Not all elements the same type')
         74 
         75         if data_type is not None:
    
    ValueError: Not all elements the same type
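
    A possible workaround, assuming XDataFrame inherits pandas' list- and slice-based positional indexing (not verified against the released package): select with a list of positions or a slice, so that a one-row frame is returned instead of a type-heterogeneous Series.

    row_by_list = irisx.iloc[[0]]   # list of positions keeps the 2D container
    row_by_slice = irisx.iloc[0:1]  # slice-based positional indexing, same idea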
    
    opened by mloning 0
Releases: 1.0.2

Owner: The Alan Turing Institute, the UK's national institute for data science and artificial intelligence.