Universal 1d/2d data containers with Transformers functionality for data analysis.

The Alan Turing Institute

Last update: Mar 14, 2022

Overview

XPandas (extended Pandas) implements 1D and 2D data containers for storing type-heterogeneous tabular data, and encapsulates feature extraction and transformation modelling in an sklearn-compatible transformer interface.
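
To make the idea concrete, here is a minimal sketch in plain pandas. It does not use the XPandas API: the class name `LengthTransformer` and the toy data are hypothetical, and only the `fit`/`transform` convention is borrowed from scikit-learn.

```python
import pandas as pd


class LengthTransformer:
    """Hypothetical transformer: maps each cell (any sized object) to its length."""

    def fit(self, X, y=None):
        # Nothing to learn; return self to follow the scikit-learn convention.
        return self

    def transform(self, X):
        # Apply the feature extraction cell by cell.
        return X.apply(len)

    def fit_transform(self, X, y=None):
        return self.fit(X, y).transform(X)


# A 1D container whose cells are whole objects (here: lists of differing length).
signals = pd.Series([[0.1, 0.4, 0.9], [1.2, 0.3], [0.5]], name="signal")

lengths = LengthTransformer().fit_transform(signals)
print(lengths.tolist())
```

The point of the sketch is the pairing: a container whose cells can hold arbitrary objects, and a transformer that turns such a column into ordinary features.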

Quickstart

Install the latest version

$ pip install xpandas

and run the example Jupyter notebook

$ jupyter notebook examples/ExampleUsage.ipynb

Documentation

The full documentation is available at https://alan-turing-institute.github.io/xpandas/.

Acknowledgements

Bernd Bischl (@berndbischl), who mentioned the idea of a general data container with transformers attached to columns in personal discussion with Franz Kiraly during a London visit in 2016.
Franz Kiraly (@fkiraly), who initiated and funded the project up to release, and who substantially contributed to the API design.
Haoran Xue (@HaoranXue), who, under the supervision of Franz Kiraly, earlier completed a thesis for a degree at UCL on the topic, and who wrote a similar package as part of it. No code was re-used in the creation of the XPandas package.

List of developers and contributors

Comments

Acknowledgments

Before I forget: somewhere prominently the following should be acknowledged in some form, not necessarily in the below:

Bernd Bischl, who mentioned the idea of a general data container with transformers attached to columns in personal discussion during a London visit in 2016. Myself, having (in my opinion) substantially contributed through the API design (?). Haoran Xue, who completed a thesis on the topic earlier. While no code was transferred, lessons that were learnt may have been transferred.

opened by fkiraly 2
Improved documentation

This pull request improves the readability of the documentation.

While going through your codebase, I realised that there's a lot of redundancy in the module naming, e.g. /transformers/transformers/series_transformers/series_transformer.py instead of /transformers/series/series_transformer.py. Is there a specific reason for that? If not, I'd suggest refactoring the modules into a more straightforward naming structure.

opened by frthjf 1
sensible default for transformation: column replacement

currently it adds the transformer output while retaining the original column

for retaining original column: use identityTransformer (to be implemented)
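
A rough sketch of the proposed behaviour, written against plain pandas rather than XPandas. The helper `transform_column`, the `identity` function and the output column naming are all hypothetical; they only illustrate "replace by default, add an identity transformer to keep the original".

```python
import pandas as pd


def identity(value):
    # Stand-in for the proposed identity transformer: returns its input unchanged.
    return value


def transform_column(frame, column, transformers):
    """Replace `column` by the outputs of `transformers` ({name: function})."""
    result = frame.drop(columns=[column])  # default: the original column is dropped
    for name, func in transformers.items():
        result[f"{column}_{name}"] = frame[column].apply(func)
    return result


frame = pd.DataFrame({"text": ["a", "bb", "ccc"], "n": [1, 2, 3]})

# Default: the transformer output replaces the original column.
replaced = transform_column(frame, "text", {"len": len})

# Retaining the original: add the identity transformer alongside.
retained = transform_column(frame, "text", {"identity": identity, "len": len})

print(list(replaced.columns))
print(list(retained.columns))
```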

opened by fkiraly 0
tutorial: separate data container from transformer tutorial

The structure should be changed to: (1) data containers (XSeries and XDataFrame), (2) transformer functionality,

since users should be made aware that (1) is a separate interface concept, on top of which (2) may be invoked, but the two are not necessarily tied together.

opened by fkiraly 0
Bump numpy from 1.15.2 to 1.22.0
Bumps numpy from 1.15.2 to 1.22.0.

Release notes

Sourced from numpy's releases.

v1.22.0

NumPy 1.22.0 Release Notes

NumPy 1.22.0 is a big release featuring the work of 153 contributors spread over 609 pull requests. There have been many improvements, highlights are:

Annotations of the main namespace are essentially complete. Upstream is a moving target, so there will likely be further improvements, but the major work is done. This is probably the most user visible enhancement in this release.

A preliminary version of the proposed Array-API is provided. This is a step in creating a standard collection of functions that can be used across applications such as CuPy and JAX.

NumPy now has a DLPack backend. DLPack provides a common interchange format for array (tensor) data.

New methods for quantile, percentile, and related functions. The new methods provide a complete set of the methods commonly found in the literature.

A new configurable allocator for use by downstream projects.

These are in addition to the ongoing work to provide SIMD support for commonly used functions, improvements to F2PY, and better documentation.

The Python versions supported in this release are 3.8-3.10, Python 3.7 has been dropped. Note that 32 bit wheels are only provided for Python 3.8 and 3.9 on Windows, all other wheels are 64 bits on account of Ubuntu, Fedora, and other Linux distributions dropping 32 bit support. All 64 bit wheels are also linked with 64 bit integer OpenBLAS, which should fix the occasional problems encountered by folks using truly huge arrays.

Expired deprecations

Deprecated numeric style dtype strings have been removed

Using the strings "Bytes0", "Datetime64", "Str0", "Uint32", and "Uint64" as a dtype will now raise a TypeError.

(gh-19539)

Expired deprecations for loads, ndfromtxt, and mafromtxt in npyio

numpy.loads was deprecated in v1.15, with the recommendation that users use pickle.loads instead. ndfromtxt and mafromtxt were both deprecated in v1.17 - users should use numpy.genfromtxt instead with the appropriate value for the usemask parameter.

(gh-19615)
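
For code that still uses the removed functions, the replacements named above can be sketched as follows. The sample data is made up for illustration; the mapping of `ndfromtxt` to `usemask=False` and `mafromtxt` to `usemask=True` follows the recommendation quoted above.

```python
import pickle
from io import StringIO

import numpy as np

# numpy.loads(...) -> pickle.loads(...)
original = np.arange(3)
restored = pickle.loads(pickle.dumps(original))

# Toy CSV text; the second row has an empty (missing) middle field.
text = "1,2,3\n4,,6"

# numpy.mafromtxt(...) -> numpy.genfromtxt(..., usemask=True)
# The missing field comes back masked.
masked = np.genfromtxt(StringIO(text), delimiter=",", usemask=True)

# numpy.ndfromtxt(...) -> numpy.genfromtxt(..., usemask=False)
# The missing field comes back as nan in a regular array.
plain = np.genfromtxt(StringIO(text), delimiter=",", usemask=False)
```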

... (truncated)

Commits

4adc87d Merge pull request #20685 from charris/prepare-for-1.22.0-release

fd66547 REL: Prepare for the NumPy 1.22.0 release.

125304b wip

c283859 Merge pull request #20682 from charris/backport-20416

5399c03 Merge pull request #20681 from charris/backport-20954

f9c45f8 Merge pull request #20680 from charris/backport-20663

794b36f Update armccompiler.py

d93b14e Update test_public_api.py

7662c07 Update init.py

311ab52 Update armccompiler.py

Additional commits viewable in compare view

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.

Dependabot commands and options

You can trigger Dependabot actions by commenting on this PR:

@dependabot rebase will rebase this PR

@dependabot recreate will recreate this PR, overwriting any edits that have been made to it

@dependabot merge will merge this PR after your CI passes on it

@dependabot squash and merge will squash and merge this PR after your CI passes on it

@dependabot cancel merge will cancel a previously requested merge and block automerging

@dependabot reopen will reopen this PR if it is closed

@dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually

@dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)

@dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)

@dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)

@dependabot use these labels will set the current labels as the default for future PRs for this repo and language

@dependabot use these reviewers will set the current reviewers as the default for future PRs for this repo and language

@dependabot use these assignees will set the current assignees as the default for future PRs for this repo and language

@dependabot use this milestone will set the current milestone as the default for future PRs for this repo and language

You can disable automated security fix PRs for this repo from the Security Alerts page.

dependencies
opened by dependabot[bot] 0
Bump ipython from 7.0.1 to 7.16.3
Bumps ipython from 7.0.1 to 7.16.3.

Commits

d43c7c7 release 7.16.3

5fa1e40 Merge pull request from GHSA-pq7m-3gw7-gq5x

8df8971 back to dev

9f477b7 release 7.16.2

138f266 bring back release helper from master branch

5aa3634 Merge pull request #13341 from meeseeksmachine/auto-backport-of-pr-13335-on-7...

bcae8e0 Backport PR #13335: What's new 7.16.2

8fcdcd3 Pin Jedi to <0.17.2.

2486838 release 7.16.1

20bdc6f fix conda build

Additional commits viewable in compare view

dependencies
opened by dependabot[bot] 0
will it work for multivariate time series prediction both regression and classification
Great code, thanks. Could you clarify: will it work for multivariate time series prediction, both regression and classification?

1. Where all values are continuous:

   weight height age target
1  56     160    34  1.2
2  77     170    54  3.5
3  87     167    43  0.7
4  55     198    72  0.5
5  88     176    32  2.3

2. Or even where the values are a mixture of continuous and categorical, for example 2 dimensions have continuous values and 3 dimensions are categorical:

   color  weight gender height age target
1  black  56     m      160    34  yes
2  white  77     f      170    54  no
3  yellow 87     m      167    43  yes
4  white  55     m      198    72  no
5  white  88     f      176    32  yes
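
For reference, the mixed-type data from the question can be written as a plain pandas DataFrame and split by column type, as in the sketch below. This only shows how such a table might be prepared; it does not use XPandas and does not settle whether XPandas supports this use case.

```python
import pandas as pd

# The mixed-type table from the question, as a plain pandas DataFrame.
data = pd.DataFrame(
    {
        "color": ["black", "white", "yellow", "white", "white"],
        "weight": [56, 77, 87, 55, 88],
        "gender": ["m", "f", "m", "m", "f"],
        "height": [160, 170, 167, 198, 176],
        "age": [34, 54, 43, 72, 32],
        "target": ["yes", "no", "yes", "no", "yes"],
    },
    index=[1, 2, 3, 4, 5],
)

features = data.drop(columns=["target"])
continuous = features.select_dtypes(include="number")
categorical = features.select_dtypes(exclude="number")

# One-hot encode the categorical columns so that every feature is numeric.
encoded = pd.get_dummies(features, columns=list(categorical.columns))
```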
opened by Sandy4321 0

Many standard methods do not work (properly) on XDataFrame with hierarchical data

# loading some time-series data
from io import BytesIO
from zipfile import ZipFile
from urllib.request import urlopen
from xpandas.data_container import XSeries, XDataFrame
import numpy as np
import pandas as pd

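def read_data(file):
    # each line holds the class label followed by the series values, separated by two spaces
    data = file.readlines()
    rows = [row.decode('utf-8').strip().split('  ') for row in data]
    X = pd.DataFrame(rows, dtype=float)  # builtin float; the np.float alias was removed in NumPy 1.24
    y = X.pop(0)  # first column is the target
    ts = XSeries([row for _, row in X.iterrows()])  # one whole series per cell
    X = XDataFrame({'ts1': ts, 'ts2': ts})
    return X, y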

url = 'http://www.timeseriesclassification.com/Downloads/GunPoint.zip'
url = urlopen(url)
zipfile = ZipFile(BytesIO(url.read()))
file = zipfile.open('GunPoint_TRAIN.txt')
X, y = read_data(file)

X.mean() # returns an empty series rather than the mean of the series; the same holds for many other methods such as .std(), .median(), etc.

X.apply(np.mean) # breaks

X['ts1'].mean() # breaks

X['ts1'].apply(np.mean) # works

X['ts1'].apply(np.percentile, args=(25,)) # breaks, does not pass on args
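
For comparison, the per-cell behaviour one might expect can be sketched in plain pandas with nested data. This is not a fix for XDataFrame: the toy values and the helper name `cell_mean` are assumptions, and the lambda is just one way to bind the extra argument that `args` fails to pass on.

```python
import numpy as np
import pandas as pd

# Each cell holds a whole (toy) time series, stored here as a Python list.
ts = pd.Series([[1.0, 2.0, 3.0, 4.0, 5.0], [10.0, 20.0, 30.0, 40.0, 50.0]])
X = pd.DataFrame({"ts1": ts, "ts2": ts})


def cell_mean(cell):
    # Mean of the series stored in a single cell.
    return float(np.mean(cell))


# Per-cell mean of one column.
cell_means = X["ts1"].apply(cell_mean)

# Extra arguments can be bound with a lambda instead of relying on `args`.
cell_q25 = X["ts1"].apply(lambda cell: float(np.percentile(cell, 25)))

# Per-cell mean of every column: apply column by column.
frame_means = X.apply(lambda column: column.apply(cell_mean))
```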

opened by mloning 0

Slicing single row of XDataFrame does not work

Slicing a single row of an XDataFrame does not work, probably because it tries to return a series, which fails because the types are heterogeneous. Instead, one may want to return an XDataFrame with a single row.

import pandas as pd
from xpandas.data_container import XDataFrame

iris = pd.read_csv('https://raw.githubusercontent.com/mwaskom/seaborn-data/master/iris.csv')
iris.iloc[0] # works

irisx = XDataFrame(iris) 
irisx.iloc[0] # breaks

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-32-b46ce52f9af0> in <module>
----> 1 irisx.iloc[0]

~/.conda/envs/sktime/lib/python3.7/site-packages/pandas/core/indexing.py in __getitem__(self, key)
   1476 
   1477             maybe_callable = com._apply_if_callable(key, self.obj)
-> 1478             return self._getitem_axis(maybe_callable, axis=axis)
   1479 
   1480     def _is_scalar_access(self, key):

~/.conda/envs/sktime/lib/python3.7/site-packages/pandas/core/indexing.py in _getitem_axis(self, key, axis)
   2102             self._validate_integer(key, axis)
   2103 
-> 2104             return self._get_loc(key, axis=axis)
   2105 
   2106     def _convert_to_indexer(self, obj, axis=None, is_setter=False):

~/.conda/envs/sktime/lib/python3.7/site-packages/pandas/core/indexing.py in _get_loc(self, key, axis)
    143         if axis is None:
    144             axis = self.axis
--> 145         return self.obj._ixs(key, axis=axis)
    146 
    147     def _slice(self, obj, axis=None, kind=None):

~/.conda/envs/sktime/lib/python3.7/site-packages/pandas/core/frame.py in _ixs(self, i, axis)
   2624                                                       index=self.columns,
   2625                                                       name=self.index[i],
-> 2626                                                       dtype=new_values.dtype)
   2627                 result._set_is_copy(self, copy=copy)
   2628                 return result

~/.conda/envs/sktime/lib/python3.7/site-packages/xpandas/data_container/data_container.py in __init__(self, *args, **kwargs)
     71         check_result, data_type = _check_all_elements_have_the_same_property(data, type)
     72         if not check_result:
---> 73             raise ValueError('Not all elements the same type')
     74 
     75         if data_type is not None:

ValueError: Not all elements the same type
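
In plain pandas, indexing with a list already returns the kind of single-row frame suggested above, as the sketch below shows on a small made-up frame. Whether `irisx.iloc[[0]]` behaves the same way on an XDataFrame was not tested here.

```python
import pandas as pd

# Small stand-in for the iris data, with one numeric and one string column.
frame = pd.DataFrame(
    {"sepal_length": [5.1, 4.9], "species": ["setosa", "setosa"]}
)

# Integer indexer: the row comes back as a 1D Series mixing both value types.
row_as_series = frame.iloc[0]

# List indexer: the row comes back as a single-row DataFrame, column types kept.
row_as_frame = frame.iloc[[0]]
```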

opened by mloning 0

Releases (1.0.2)

1.0.2 (Oct 23, 2017)

Please refer to documentation and tutorial.

Owner

The Alan Turing Institute

The UK's national institute for data science and artificial intelligence.

GitHub https://alan-turing-institute.github.io/xpandas/
