A Python toolbox for gaining geometric insights into high-dimensional data

"To deal with hyper-planes in a 14 dimensional space, visualize a 3D space and say 'fourteen' very loudly. Everyone does it." - Geoff Hinton

Overview

HyperTools is designed to facilitate dimensionality reduction-based visual explorations of high-dimensional data. The basic pipeline is to feed in a high-dimensional dataset (or a series of high-dimensional datasets) and, in a single function call, reduce the dimensionality of the dataset(s) and create a plot. The package is built atop many familiar friends, including matplotlib, scikit-learn and seaborn. Our package was recently featured on Kaggle's No Free Hunch blog. For a general overview, you may find this talk useful (given as part of the MIND Summer School at Dartmouth).

Try it!

Launch a Binder instance with example uses (via the Binder badge), or check the repo of Jupyter notebooks from the HyperTools paper.

Installation

To install the latest stable version run:

pip install hypertools

To install the latest unstable version directly from GitHub, run:

pip install -U git+https://github.com/ContextLab/hypertools.git

Alternatively, clone the repository to your local machine:

git clone https://github.com/ContextLab/hypertools.git

Then, navigate to the folder and type:

pip install -e .

(These instructions assume that you have pip installed on your system.)

NOTE: If you have been using the development version of 0.5.0, please clear your data cache (/Users/yourusername/hypertools_data).
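
Clearing the cache amounts to deleting that folder. Here is a minimal sketch, assuming the cache sits in a hypertools_data folder directly under your home directory (as the path in the note suggests). The clear_cache helper is hypothetical and not part of HyperTools:

```python
import shutil
from pathlib import Path


def clear_cache(cache_dir=None):
    """Delete the data cache folder; return True if something was removed.

    Hypothetical helper, not part of HyperTools. The default location
    (~/hypertools_data) is an assumption based on the path in the note above.
    """
    if cache_dir is None:
        cache_dir = Path.home() / "hypertools_data"
    cache_dir = Path(cache_dir)
    if not cache_dir.is_dir():
        return False  # nothing to clear
    shutil.rmtree(cache_dir)  # removes the folder and everything inside it
    return True
```

Calling clear_cache() with no arguments targets the assumed default folder; pass a path explicitly if your cache lives elsewhere.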

Requirements

  • python 2.7, 3.5+
  • PPCA>=0.0.2
  • scikit-learn>=0.18.1
  • pandas>=0.18.0
  • seaborn>=0.8.1
  • matplotlib>=1.5.1
  • scipy>=0.17.1
  • numpy>=1.10.4
  • future
  • requests
  • deepdish
  • pytest (for development)
  • ffmpeg (for saving animations)

If you install from GitHub rather than from PyPI, you must also install the requirements: pip install -r requirements.txt
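
To see whether an existing environment already meets the version minimums listed above, a quick check along these lines may help. This is a rough sketch, not part of HyperTools: it assumes Python 3.8+ (for importlib.metadata), the check_requirements and version_tuple helpers are hypothetical, and the version comparison is deliberately simple (it ignores pre-release suffixes).

```python
import re
from importlib import metadata

# Minimum versions copied from the requirements list above.
MINIMUMS = {
    "PPCA": "0.0.2",
    "scikit-learn": "0.18.1",
    "pandas": "0.18.0",
    "seaborn": "0.8.1",
    "matplotlib": "1.5.1",
    "scipy": "0.17.1",
    "numpy": "1.10.4",
}


def version_tuple(version):
    """Turn '1.10.4' into (1, 10, 4), keeping only the leading digits of each part."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def check_requirements(minimums=MINIMUMS):
    """Return {package: problem} for packages that are missing or too old."""
    problems = {}
    for name, minimum in minimums.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            problems[name] = "not installed"
            continue
        if version_tuple(installed) < version_tuple(minimum):
            problems[name] = f"{installed} < {minimum}"
    return problems


if __name__ == "__main__":
    for name, problem in check_requirements().items():
        print(f"{name}: {problem}")
```

An empty result means every listed package is present at or above its minimum version.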

Troubleshooting

If you encounter an error related to installing deepdish (hdf5) on a macOS system, try installing hdf5 directly using Homebrew:

$ brew tap homebrew/science
$ brew install hdf5

and then restart the installation.

Documentation

Check out our readthedocs page for further documentation, complete API details, and additional examples.

Citing

We wrote a short JMLR paper about HyperTools, which you can read here, or you can check out a (longer) preprint here. We also have a repository with example notebooks from the paper here.

Please cite as:

Heusser AC, Ziman K, Owen LLW, Manning JR (2018) HyperTools: A Python toolbox for gaining geometric insights into high-dimensional data. Journal of Machine Learning Research, 18(152): 1-6.

Here is a bibtex formatted reference:

@ARTICLE {,
    author  = {Andrew C. Heusser and Kirsten Ziman and Lucy L. W. Owen and Jeremy R. Manning},    
    title   = {HyperTools: a Python Toolbox for Gaining Geometric Insights into High-Dimensional Data},    
    journal = {Journal of Machine Learning Research},
    year    = {2018},
    volume  = {18},	
    number  = {152},	
    pages   = {1-6},	
    url     = {http://jmlr.org/papers/v18/17-434.html}	
}

Contributing

Join the chat at https://gitter.im/hypertools/Lobby

If you'd like to contribute, please first read our Code of Conduct.

For specific information on how to contribute to the project, please see our Contributing page.

Testing

To test HyperTools, install pytest (pip install pytest) and run pytest in the HyperTools folder.

Examples

See here for more examples.

Plot

import hypertools as hyp
hyp.plot(list_of_arrays, '.', group=list_of_labels)

Plot example
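
The snippet above assumes that list_of_arrays and list_of_labels already exist. As a rough sketch, here is one way to build placeholder inputs with NumPy. The make_demo_data and plot_demo helpers and the synthetic Gaussian data are illustrative assumptions, not part of HyperTools, and the nested one-label-per-row layout is an assumption about what group expects:

```python
import numpy as np


def make_demo_data(n_arrays=2, n_samples=50, n_features=10, seed=0):
    """Build synthetic inputs: one (samples x features) array and one label list per group."""
    rng = np.random.default_rng(seed)
    # Shift each array's mean so the groups are easy to tell apart once reduced.
    list_of_arrays = [
        rng.normal(loc=3.0 * i, size=(n_samples, n_features)) for i in range(n_arrays)
    ]
    # One label per row, nested to mirror the list of arrays (assumed layout).
    list_of_labels = [[f"group {i}"] * n_samples for i in range(n_arrays)]
    return list_of_arrays, list_of_labels


def plot_demo():
    """Plot the synthetic data with the call shown above.

    hypertools is imported here so that make_demo_data works without it installed.
    """
    import hypertools as hyp

    list_of_arrays, list_of_labels = make_demo_data()
    hyp.plot(list_of_arrays, '.', group=list_of_labels)
```

Calling plot_demo() should then draw one cloud of points per synthetic group.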

Align

import hypertools as hyp
hyp.plot(list_of_arrays, align='hyper')

BEFORE

Align before example

AFTER

Align after example

Cluster

import hypertools as hyp
hyp.plot(array, '.', n_clusters=10)

Cluster Example

Describe

import hypertools as hyp
hyp.tools.describe(list_of_arrays, reduce='PCA', max_dims=14)

Describe Example

Comments
  • Plotting text

    This PR adds the ability to plot text data. For example:

    data = [['i like cats alot', 'cats r pretty cool', 'cats are better than dogs'],
            ['dogs rule the haus', 'dogs are my jam', 'dogs are a mans best friend']]
    hyp.plot(data,'o')
    

    yields a plot where each dot represents a sentence that was vectorized using sklearn's CountVectorizer and then modeled using LatentDirichletAllocation.

    To plot just the vectorized text, simply set hyp.plot(data, text_model=None)

    I exposed the hyp.tools.text2mat function to the user, and that's what does the heavy lifting. It can vectorize the data using CountVectorizer or TfidfVectorizer and model the data using LDA or NMF.

    opened by andrewheusser 33
  • Issue#146 and Issue#143 Feature Enhancement: GaussianMixture and BayesianGaussianMixture Clustering Algorithms

    Enhancement: Added GaussianMixture Clustering Algorithm Issue Page: https://github.com/ContextLab/hypertools/issues/146#issuecomment-348784872 and https://github.com/ContextLab/hypertools/issues/143

    opened by alokkumary2j 31
  •   Failed building wheel for hdbscan

    I've gotten this error for both methods of installation. Using an anaconda 3.6 environment.

    $ pip install -U git+https://github.com/ContextLab/hypertools.git
    
    hdbscan/dist_metrics.pyx:1140:24: Constructing Python dict not allowed without gil
      building 'hdbscan.dist_metrics' extension
      gcc -pthread -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I/home/galen/miniconda3/envs/python_3_6/include/python3.6m -I/home/galen/miniconda3/envs/python_3_6/lib/python3.6/site-packages/numpy/core/include -c hdbscan/dist_metrics.c -o build/temp.linux-x86_64-3.6/hdbscan/dist_metrics.o
      hdbscan/dist_metrics.c:1:2: error: #error Do not use this file, it is the result of a failed Cython compilation.
       #error Do not use this file, it is the result of a failed Cython compilation.
        ^
      error: command 'gcc' failed with exit status 1
    
    $ uname -a
    Linux fibonacci 4.4.0-119-generic #143-Ubuntu SMP Mon Apr 2 16:08:24 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
    
    $ conda info
    Current conda install:
    
                   platform : linux-64
              conda version : 4.3.30
           conda is private : False
          conda-env version : 4.3.30
        conda-build version : not installed
             python version : 3.6.0.final.0
           requests version : 2.12.4
           root environment : /home/galen/miniconda3  (writable)
        default environment : /home/galen/miniconda3/envs/python_3_6
           envs directories : /home/galen/miniconda3/envs
                              /home/galen/.conda/envs
              package cache : /home/galen/miniconda3/pkgs
                              /home/galen/.conda/pkgs
               channel URLs : https://repo.continuum.io/pkgs/main/linux-64
                              https://repo.continuum.io/pkgs/main/noarch
                              https://repo.continuum.io/pkgs/free/linux-64
                              https://repo.continuum.io/pkgs/free/noarch
                              https://repo.continuum.io/pkgs/r/linux-64
                              https://repo.continuum.io/pkgs/r/noarch
                              https://repo.continuum.io/pkgs/pro/linux-64
                              https://repo.continuum.io/pkgs/pro/noarch
                config file : None
                 netrc file : None
               offline mode : False
                 user-agent : conda/4.3.30 requests/2.12.4 CPython/3.6.0 Linux/4.4.0-119-generic debian/stretch/sid glibc/2.23    
                    UID:GID : 1000:1000
    
    $ conda list
    # packages in environment at /home/galen/miniconda3/envs/python_3_6:
    #
    _license                  1.1                      py36_1    anaconda
    alabaster                 0.7.9                    py36_0    anaconda
    anaconda                  custom                   py36_0    anaconda
    anaconda-client           1.6.0                    py36_0    anaconda
    anaconda-navigator        1.4.3                    py36_0    anaconda
    astroid                   1.4.9                    py36_0    anaconda
    astropy                   1.3                 np111py36_0    anaconda
    babel                     2.3.4                    py36_0    anaconda
    backports                 1.0                      py36_0    anaconda
    backports.weakref         1.0rc1                   py36_0  
    backports.weakref         1.0rc1                    <pip>
    beautifulsoup4            4.5.3                    py36_0    anaconda
    bitarray                  0.8.1                    py36_0    anaconda
    blaze                     0.10.1                   py36_0    anaconda
    bleach                    1.5.0                     <pip>
    bleach                    1.5.0                    py36_0  
    bokeh                     0.12.4                   py36_0    anaconda
    boto                      2.45.0                   py36_0    anaconda
    boto                      2.48.0                    <pip>
    boto3                     1.5.3                     <pip>
    botocore                  1.8.17                    <pip>
    bottleneck                1.2.0               np111py36_0    anaconda
    bz2file                   0.98                      <pip>
    bzip2                     1.0.6                         3    anaconda
    cairo                     1.14.8                        0    anaconda
    certifi                   2018.4.16                 <pip>
    cffi                      1.9.1                    py36_0    anaconda
    chardet                   3.0.4                     <pip>
    chardet                   2.3.0                    py36_0    anaconda
    chest                     0.2.3                    py36_0    anaconda
    click                     6.7                      py36_0    anaconda
    cloudpickle               0.2.2                    py36_0    anaconda
    clyent                    1.2.2                    py36_0    anaconda
    colorama                  0.3.7                    py36_0    anaconda
    configobj                 5.0.6                    py36_0    anaconda
    contextlib2               0.5.4                    py36_0    anaconda
    cryptography              1.7.1                    py36_0    anaconda
    curl                      7.49.0                        1    anaconda
    cycler                    0.10.0                   py36_0    anaconda
    cython                    0.25.2                   py36_0    anaconda
    Cython                    0.28.2                    <pip>
    cytoolz                   0.8.2                    py36_0    anaconda
    dask                      0.13.0                   py36_0    anaconda
    datashape                 0.5.4                    py36_0    anaconda
    dbus                      1.10.10                       0    anaconda
    decorator                 4.0.11                   py36_0    anaconda
    deepdish                  0.3.6                     <pip>
    dill                      0.2.5                    py36_0    anaconda
    docutils                  0.14                      <pip>
    docutils                  0.13.1                   py36_0    anaconda
    entrypoints               0.2.2                    py36_0    anaconda
    et_xmlfile                1.0.1                    py36_0    anaconda
    expat                     2.1.0                         0    anaconda
    fastcache                 1.0.2                    py36_1    anaconda
    flask                     0.12                     py36_0    anaconda
    flask-cors                3.0.2                    py36_0    anaconda
    font-ttf-dejavu-sans-mono 2.37                          0    anaconda
    font-ttf-inconsolata      2.000                         0    anaconda
    font-ttf-source-code-pro  2.030                         0    anaconda
    font-ttf-ubuntu           0.83                          0    anaconda
    fontconfig                2.12.1                        2    anaconda
    fonts-continuum           1                             0    anaconda
    freetype                  2.5.5                         2    anaconda
    future                    0.16.0                    <pip>
    gensim                    3.2.0                     <pip>
    geotiff                   1.4.1                         0    anaconda
    get_terminal_size         1.0.0                    py36_0    anaconda
    gevent                    1.2.1                    py36_0    anaconda
    glib                      2.50.2                        1    anaconda
    gmp                       6.1.0                         0    anaconda
    graphviz                  0.8                       <pip>
    greenlet                  0.4.11                   py36_0    anaconda
    gsl                       2.2.1                         0    anaconda
    gst-plugins-base          1.8.0                         0    anaconda
    gstreamer                 1.8.0                         0    anaconda
    h5py                      2.6.0               np111py36_2    anaconda
    harfbuzz                  0.9.39                        2    anaconda
    hdbscan                   0.8.12                    <pip>
    hdf4                      4.2.12                        0    anaconda
    hdf5                      1.8.17                        1    anaconda
    heapdict                  1.0.0                    py36_1    anaconda
    html5lib                  0.9999999                py36_0  
    html5lib                  0.9999999                 <pip>
    hypertools                0.5.0                     <pip>
    icu                       54.1                          0    anaconda
    idna                      2.6                       <pip>
    idna                      2.2                      py36_0    anaconda
    igraph                    0.7.1                         1    conda-forge
    imagesize                 0.7.1                    py36_0    anaconda
    ipykernel                 4.5.2                    py36_0    anaconda
    ipykernel                 4.6.1                     <pip>
    ipython                   5.1.0                    py36_0    anaconda
    ipython_genutils          0.1.0                    py36_0    anaconda
    ipywidgets                5.2.2                    py36_1    anaconda
    isort                     4.2.5                    py36_0    anaconda
    itsdangerous              0.24                     py36_0    anaconda
    jbig                      2.1                           0    anaconda
    jdcal                     1.3                      py36_0    anaconda
    jedi                      0.9.0                    py36_1    anaconda
    jinja2                    2.9.4                    py36_0    anaconda
    jmespath                  0.9.3                     <pip>
    jpeg                      8d                            2    anaconda
    jsonschema                2.5.1                    py36_0    anaconda
    jupyter                   1.0.0                    py36_3  
    jupyter-cms               0.6.2                     <pip>
    jupyter-dashboards        0.6.1                     <pip>
    jupyter-dashboards-bundlers 0.8.1                     <pip>
    jupyter_client            4.4.0                    py36_0    anaconda
    jupyter_console           5.0.0                    py36_0    anaconda
    jupyter_contrib_core      0.3.0                    py36_1    conda-forge
    jupyter_contrib_nbextensions 0.2.6                    py36_0    conda-forge
    jupyter_core              4.2.1                    py36_0    anaconda
    jupyter_highlight_selected_word 0.0.11                   py36_0    conda-forge
    jupyter_latex_envs        1.3.8.2                  py36_1    conda-forge
    jupyter_nbextensions_configurator 0.2.4                    py36_0    conda-forge
    jupyterlab                0.31.12                  py36_1    conda-forge
    jupyterlab_launcher       0.10.5                   py36_0    conda-forge
    kealib                    1.4.6                         0    anaconda
    keras                     2.0.5                    py36_0  
    kiwisolver                1.0.1                     <pip>
    kmodes                    0.7                       <pip>
    lazy-object-proxy         1.2.2                    py36_0    anaconda
    libffi                    3.2.1                         1    anaconda
    libgcc                    4.8.5                         2    anaconda
    libgfortran               3.0.0                         1    anaconda
    libgpuarray               0.6.9                         0  
    libiconv                  1.14                          0    anaconda
    libnetcdf                 4.4.1                         0    anaconda
    libpng                    1.6.27                        0    anaconda
    libprotobuf               3.2.0                         0  
    libsodium                 1.0.10                        0    anaconda
    libtiff                   4.0.6                         2    anaconda
    libuuid                   1.0.3                         0    anaconda
    libxcb                    1.12                          1    anaconda
    libxml2                   2.9.4                         0    anaconda
    libxslt                   1.1.29                        0    anaconda
    llvmlite                  0.15.0                   py36_0    anaconda
    llvmlite                  0.22.0                    <pip>
    locket                    0.2.0                    py36_1    anaconda
    lxml                      3.7.2                    py36_0    anaconda
    mako                      1.0.6                    py36_0  
    Markdown                  2.2.0                     <pip>
    markdown                  2.6.8                    py36_0  
    markupsafe                0.23                     py36_2    anaconda
    matplotlib                2.0.0               np111py36_0    anaconda
    matplotlib                2.2.2                     <pip>
    mistune                   0.7.3                    py36_0    anaconda
    mkl                       2017.0.1                      0    anaconda
    mkl-service               1.1.2                    py36_3    anaconda
    mpmath                    0.19                     py36_1    anaconda
    multipledispatch          0.4.9                    py36_0    anaconda
    mysql-connector-python    2.0.4                    py36_0    anaconda
    nbconvert                 4.2.0                    py36_0    anaconda
    nbformat                  4.2.0                    py36_0    anaconda
    ncurses                   5.9                          10    anaconda
    networkx                  1.11                     py36_0  
    nltk                      3.2.2                    py36_0    anaconda
    nose                      1.3.7                    py36_1    anaconda
    notebook                  4.3.1                    py36_0    anaconda
    numba                     0.37.0                    <pip>
    numba                     0.30.1              np111py36_0    anaconda
    numexpr                   2.6.1               np111py36_2    anaconda
    numexpr                   2.6.4                     <pip>
    numpy                     1.14.2                    <pip>
    numpy                     1.11.3                   py36_0    anaconda
    numpydoc                  0.6.0                    py36_0    anaconda
    odo                       0.5.0                    py36_1    anaconda
    openpyxl                  2.4.1                    py36_0    anaconda
    openssl                   1.0.2k                        0    anaconda
    pandas                    0.19.2              np111py36_1    anaconda
    pandas                    0.22.0                    <pip>
    pandoc                    1.15.0.6                      0    anaconda
    pango                     1.40.3                        1    anaconda
    partd                     0.3.7                    py36_0    anaconda
    path.py                   10.0                     py36_0    anaconda
    pathlib2                  2.2.0                    py36_0    anaconda
    patsy                     0.4.1                    py36_0    anaconda
    pcre                      8.39                          1    anaconda
    pep8                      1.7.0                    py36_0    anaconda
    pexpect                   4.2.1                    py36_0    anaconda
    pickleshare               0.7.4                    py36_0    anaconda
    pillow                    3.4.2                    py36_0    anaconda
    pip                       9.0.1                    py36_1    anaconda
    pixman                    0.34.0                        0    anaconda
    ply                       3.9                      py36_0    anaconda
    powerlaw                  1.4.3                     <pip>
    ppca                      0.0.3                     <pip>
    proj4                     4.9.2                         0    anaconda
    prompt_toolkit            1.0.9                    py36_0    anaconda
    protobuf                  3.3.0                     <pip>
    protobuf                  3.2.0                    py36_0  
    psutil                    5.0.1                    py36_0    anaconda
    ptyprocess                0.5.1                    py36_0    anaconda
    py                        1.4.33                   py36_0    anaconda
    pyasn1                    0.1.9                    py36_0    anaconda
    pycosat                   0.6.1                    py36_1    anaconda
    pycparser                 2.17                     py36_0    anaconda
    pycrypto                  2.6.1                    py36_4    anaconda
    pycurl                    7.43.0                   py36_0    anaconda
    pyflakes                  1.5.0                    py36_0    anaconda
    pygments                  2.1.3                    py36_0    anaconda
    pygpu                     0.6.9                    py36_0  
    pylint                    1.6.4                    py36_1    anaconda
    pyopenssl                 16.2.0                   py36_0    anaconda
    pyparsing                 2.2.0                     <pip>
    pyparsing                 2.1.4                    py36_0    anaconda
    pyqt                      5.6.0                    py36_2    anaconda
    pytables                  3.3.0               np111py36_0    anaconda
    pytest                    3.0.5                    py36_0    anaconda
    python                    3.6.0                         0    anaconda
    python-dateutil           2.6.0                    py36_0    anaconda
    python-dateutil           2.7.2                     <pip>
    pytz                      2016.10                  py36_0    anaconda
    pytz                      2018.4                    <pip>
    pyyaml                    3.12                     py36_0    anaconda
    pyzmq                     16.0.2                   py36_0    anaconda
    qt                        5.6.2                         2    anaconda
    qtawesome                 0.4.3                    py36_0    anaconda
    qtconsole                 4.2.1                    py36_1    anaconda
    qtpy                      1.2.1                    py36_0    anaconda
    r-assertthat              0.1                    r3.3.2_4    r
    r-backports               1.0.4                  r3.3.2_0    r
    r-base                    3.3.2                         0    r
    r-base64enc               0.1_3                  r3.3.2_0    r
    r-bh                      1.62.0_1               r3.3.2_0    r
    r-bitops                  1.0_6                  r3.3.2_2    r
    r-boot                    1.3_18                 r3.3.2_0    r
    r-broom                   0.4.1                  r3.3.2_0    r
    r-car                     2.1_4                  r3.3.2_0    r
    r-caret                   6.0_73                 r3.3.2_0    r
    r-catools                 1.17.1                 r3.3.2_2    r
    r-class                   7.3_14                 r3.3.2_0    r
    r-cluster                 2.0.5                  r3.3.2_0    r
    r-codetools               0.2_15                 r3.3.2_0    r
    r-colorspace              1.3_1                  r3.3.2_0    r
    r-crayon                  1.3.2                  r3.3.2_0    r
    r-curl                    2.3                    r3.3.2_0    r
    r-data.table              1.10.0                 r3.3.2_0    r
    r-dbi                     0.5_1                  r3.3.2_0    r
    r-dichromat               2.0_0                  r3.3.2_2    r
    r-digest                  0.6.10                 r3.3.2_0    r
    r-doparallel              1.0.10                 r3.3.2_0    r
    r-dplyr                   0.5.0                  r3.3.2_0    r
    r-essentials              1.5.2                  r3.3.2_0    r
    r-evaluate                0.10                   r3.3.2_0    r
    r-forcats                 0.1.1                  r3.3.2_0    r
    r-foreach                 1.4.3                  r3.3.2_0    r
    r-foreign                 0.8_67                 r3.3.2_0    r
    r-formatr                 1.4                    r3.3.2_0    r
    r-ggplot2                 2.2.0                  r3.3.2_0    r
    r-gistr                   0.3.6                  r3.3.2_0    r
    r-glmnet                  2.0_5                  r3.3.2_0    r
    r-gridbase                0.4_7                  r3.3.2_0    r
    r-gtable                  0.2.0                  r3.3.2_0    r
    r-haven                   1.0.0                  r3.3.2_0    r
    r-hexbin                  1.27.1                 r3.3.2_0    r
    r-highr                   0.6                    r3.3.2_0    r
    r-hms                     0.3                    r3.3.2_0    r
    r-htmltools               0.3.5                  r3.3.2_0    r
    r-htmlwidgets             0.8                    r3.3.2_0    r
    r-httpuv                  1.3.3                  r3.3.2_0    r
    r-httr                    1.2.1                  r3.3.2_0    r
    r-igraph                  1.0.1                  r3.3.2_0    r
    r-irdisplay               0.4.4                  r3.3.2_0    r
    r-irkernel                0.7.1                  r3.3.2_0    r
    r-irlba                   2.1.2                  r3.3.2_0    r
    r-iterators               1.0.8                  r3.3.2_0    r
    r-jsonlite                1.1                    r3.3.2_0    r
    r-kernsmooth              2.23_15                r3.3.2_0    r
    r-knitr                   1.15.1                 r3.3.2_0    r
    r-labeling                0.3                    r3.3.2_2    r
    r-lattice                 0.20_34                r3.3.2_0    r
    r-lazyeval                0.2.0                  r3.3.2_0    r
    r-leaflet                 1.0.1                  r3.3.2_0    r
    r-lme4                    1.1_12                 r3.3.2_0    r
    r-lubridate               1.6.0                  r3.3.2_0    r
    r-magrittr                1.5                    r3.3.2_2    r
    r-maps                    3.1.1                  r3.3.2_0    r
    r-markdown                0.7.7                  r3.3.2_2    r
    r-mass                    7.3_45                 r3.3.2_0    r
    r-matrix                  1.2_7.1                r3.3.2_0    r
    r-matrixmodels            0.4_1                  r3.3.2_0    r
    r-mgcv                    1.8_16                 r3.3.2_0    r
    r-mime                    0.5                    r3.3.2_0    r
    r-minqa                   1.2.4                  r3.3.2_2    r
    r-mnormt                  1.5_5                  r3.3.2_0    r
    r-modelmetrics            1.1.0                  r3.3.2_0    r
    r-modelr                  0.1.0                  r3.3.2_0    r
    r-munsell                 0.4.3                  r3.3.2_0    r
    r-nlme                    3.1_128                r3.3.2_0    r
    r-nloptr                  1.0.4                  r3.3.2_2    r
    r-nmf                     0.20.6                 r3.3.2_0    r
    r-nnet                    7.3_12                 r3.3.2_0    r
    r-openssl                 0.9.5                  r3.3.2_0    r
    r-packrat                 0.4.8_1                r3.3.2_0    r
    r-pbdzmq                  0.2_4                  r3.3.2_0    r
    r-pbkrtest                0.4_6                  r3.3.2_0    r
    r-pkgmaker                0.22                   r3.3.2_0    r
    r-pki                     0.1_3                  r3.3.2_0    r
    r-plyr                    1.8.4                  r3.3.2_0    r
    r-png                     0.1_7                  r3.3.2_3    r
    r-pryr                    0.1.2                  r3.3.2_0    r
    r-psych                   1.6.9                  r3.3.2_0    r
    r-purrr                   0.2.2                  r3.3.2_0    r
    r-quantmod                0.4_7                  r3.3.2_0    r
    r-quantreg                5.29                   r3.3.2_0    r
    r-r6                      2.2.0                  r3.3.2_0    r
    r-randomforest            4.6_12                 r3.3.2_0    r
    r-raster                  2.5_8                  r3.3.2_0    r
    r-rbokeh                  0.5.0                  r3.3.2_0    r
    r-rcolorbrewer            1.1_2                  r3.3.2_3    r
    r-rcpp                    0.12.8                 r3.3.2_0    r
    r-rcppeigen               0.3.2.9.0              r3.3.2_0    r
    r-rcurl                   1.95_4.8               r3.3.2_0    r
    r-readr                   1.0.0                  r3.3.2_0    r
    r-readxl                  0.1.1                  r3.3.2_0    r
    r-recommended             3.3.2                  r3.3.2_0    r
    r-registry                0.3                    r3.3.2_0    r
    r-repr                    0.10                   r3.3.2_0    r
    r-reshape2                1.4.2                  r3.3.2_0    r
    r-rjsonio                 1.3_0                  r3.3.2_2    r
    r-rmarkdown               1.3                    r3.3.2_0    r
    r-rngtools                1.2.4                  r3.3.2_0    r
    r-rpart                   4.1_10                 r3.3.2_0    r
    r-rprojroot               1.1                    r3.3.2_0    r
    r-rsconnect               0.7                    r3.3.2_0    r
    r-rstudioapi              0.6                    r3.3.2_0    r
    r-rvest                   0.3.2                  r3.3.2_0    r
    r-scales                  0.4.1                  r3.3.2_0    r
    r-selectr                 0.3_0                  r3.3.2_0    r
    r-shiny                   0.14.2                 r3.3.2_0    r
    r-sourcetools             0.1.5                  r3.3.2_0    r
    r-sp                      1.2_3                  r3.3.2_0    r
    r-sparsem                 1.74                   r3.3.2_0    r
    r-spatial                 7.3_11                 r3.3.2_0    r
    r-stringi                 1.1.2                  r3.3.2_0    r
    r-stringr                 1.1.0                  r3.3.2_0    r
    r-survival                2.40_1                 r3.3.2_0    r
    r-tibble                  1.2                    r3.3.2_0    r
    r-tidyr                   0.6.0                  r3.3.2_0    r
    r-tidyverse               1.0.0                  r3.3.2_0    r
    r-ttr                     0.23_1                 r3.3.2_0    r
    r-uuid                    0.1_2                  r3.3.2_0    r
    r-xml2                    1.0.0                  r3.3.2_0    r
    r-xtable                  1.8_2                  r3.3.2_0    r
    r-xts                     0.9_7                  r3.3.2_2    r
    r-yaml                    2.1.14                 r3.3.2_0    r
    r-zoo                     1.7_13                 r3.3.2_0    r
    readline                  6.2                           2    anaconda
    redis                     3.2.0                         0    anaconda
    redis-py                  2.10.5                   py36_0    anaconda
    requests                  2.12.4                   py36_0    anaconda
    requests                  2.18.4                    <pip>
    rope                      0.9.4                    py36_1    anaconda
    rstudio                   1.0.136                       1    r
    s3transfer                0.1.12                    <pip>
    scikit-image              0.12.3              np111py36_1    anaconda
    scikit-learn              0.19.1                    <pip>
    scikit-learn              0.18.1              np111py36_1    anaconda
    scipy                     1.0.1                     <pip>
    scipy                     0.18.1              np111py36_1    anaconda
    seaborn                   0.8.1                     <pip>
    seaborn                   0.7.1                    py36_0    anaconda
    setuptools                39.0.1                    <pip>
    setuptools                27.2.0                   py36_0    anaconda
    simplegeneric             0.8.1                    py36_1    anaconda
    singledispatch            3.4.0.3                  py36_0    anaconda
    sip                       4.18                     py36_0    anaconda
    six                       1.11.0                    <pip>
    six                       1.10.0                   py36_0    anaconda
    smart-open                1.5.5                     <pip>
    snowballstemmer           1.2.1                    py36_0    anaconda
    sockjs-tornado            1.0.3                    py36_0    anaconda
    sphinx                    1.5.1                    py36_0    anaconda
    spyder                    3.1.2                    py36_0    anaconda
    sqlalchemy                1.1.5                    py36_0    anaconda
    sqlite                    3.13.0                        0    anaconda
    statsmodels               0.6.1               np111py36_1    anaconda
    stop-words                2015.2.23.1               <pip>
    sympy                     1.0                      py36_0    anaconda
    tables                    3.4.2                     <pip>
    tensorflow                1.2.0                     <pip>
    tensorflow                1.2.1                    py36_0  
    terminado                 0.6                      py36_0    anaconda
    tflearn                   0.3.2                     <pip>
    theano                    0.9.0                    py36_0  
    tk                        8.5.18                        0    anaconda
    toolz                     0.8.2                    py36_0    anaconda
    tornado                   4.4.2                    py36_0    anaconda
    traitlets                 4.3.1                    py36_0    anaconda
    umap-learn                0.2.3                     <pip>
    unicodecsv                0.14.1                   py36_0    anaconda
    urllib3                   1.22                      <pip>
    wcwidth                   0.1.7                    py36_0    anaconda
    werkzeug                  0.11.15                  py36_0    anaconda
    wheel                     0.29.0                   py36_0    anaconda
    Whoosh                    2.7.4                     <pip>
    widgetsnbextension        1.2.6                    py36_0    anaconda
    word2vec                  0.9.2                     <pip>
    wrapt                     1.10.8                   py36_0    anaconda
    xerces-c                  3.1.4                         0    anaconda
    xlrd                      1.0.0                    py36_0    anaconda
    xlsxwriter                0.9.6                    py36_0    anaconda
    xlwt                      1.2.0                    py36_0    anaconda
    xz                        5.2.2                         1    anaconda
    yaml                      0.1.6                         0    anaconda
    zeromq                    4.1.5                         0    anaconda
    zlib                      1.2.8                         3    anaconda
    
    opened by galenwilkerson 23
  • simplifying API

    The API could be simplified in a few places. For example:

    • In hyp.plot we could include an align flag that runs hyp.tools.align on the data if set to True (default: False).
    • In hyp.tools.align and hyp.tools.procrustes we could include an ndims flag that runs hyp.tools.reduce on the dataset prior to alignment if not None (default: None)
    • In hyp.tools.align and hyp.tools.procrustes, if the data matrices don't have the same numbers of features, we should zero-pad all of the matrices to ensure they have the same number of features as the matrix with the most features
    • In hyp.tools.load we could include align and ndims flags that pass the data through the appropriate other functions (hyp.tools.reduce, followed by hyp.tools.align) so that the reduced/aligned data are returned from the start, without needing to save extra copies of the dataset
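The zero-padding idea from the third bullet can be sketched as follows. This is only an illustration using NumPy, not the hypertools implementation; the helper name `pad_to_max_features` is hypothetical.

```python
import numpy as np


def pad_to_max_features(matrices):
    """Zero-pad each 2D array on the right so all share the widest column count.

    Hypothetical helper for illustration; not part of the hypertools API.
    """
    arrays = [np.asarray(m, dtype=float) for m in matrices]
    max_features = max(a.shape[1] for a in arrays)
    return [
        # Pad only the column axis, and only on the right-hand side.
        np.pad(a, ((0, 0), (0, max_features - a.shape[1])), mode="constant")
        for a in arrays
    ]


a = np.ones((4, 2))
b = np.ones((3, 5))
padded = pad_to_max_features([a, b])
print([p.shape for p in padded])
```

After padding, every matrix has the same number of columns, which is the precondition the alignment functions need.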
    enhancement high priority mozilla sprint easy(ish) 
    opened by jeremymanning 20
  • Simplifying API by adding kw args ndims and align to some key functions

    This PR implements the kw args ndims and align in some of the key hypertools functions, as was requested in Issue #105.

    I was able to successfully run the tests using pytest and I had no issues running the following commands.

    hyp.tools.load('weights', ndims=3)
    hyp.tools.load('weights', ndims=2)
    hyp.tools.load('weights', ndims=1)
    hyp.tools.load('weights', align=True)
    hyp.tools.load('weights', ndims=3, align=True)
    hyp.tools.load('weights', ndims=2, align=True)
    hyp.tools.load('weights', ndims=1, align=True)
    

    Let me know if there's anything that I can adjust.

    opened by rarredon 19
  • MATLAB code: what should we do about it?

    Should we maintain separate MATLAB and Python codebases? The original MATLAB code is already released here: https://www.mathworks.com/matlabcentral/fileexchange/56623-hyperplot-tools

    The current Python toolbox goes way beyond the original MATLAB code, and our lab is no longer using MATLAB anyway. So I'm inclined to have us remove the MATLAB code from this repository and just have it be a Python repository.

    In a future release we could provide wrappers (for MATLAB, Javascript, R, etc.) for the Python code if we wanted to support those languages; that would allow us to maintain a single "main" codebase without re-writing everything multiple times.

    My proposal is that we replace the entire repository with the current python directory. We could also add a link to the original MATLAB code in the readme or in our writeup.

    Thoughts?

    question 
    opened by jeremymanning 17
  • Add HDBSCAN and UMAP as options for clustering and reducing.

    A quick proposal to add HDBSCAN for clustering and UMAP for reduction.

    HDBSCAN is a hierarchical density based clustering approach similar to DBSCAN. Like DBSCAN it labels some points as "noise"; that may or may not play nicely with the rest of the code.

    UMAP is a dimension reduction technique with similar output to t-SNE but can run much faster, and scale to larger datasets.

    I left the packages as unrequired for now and simply added a warning if they are unavailable. If you have a preferred way of handling that let me know.
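One common way to treat a package as optional is to attempt the import and warn when it is missing. The sketch below shows that pattern in general terms; the helper name `optional_import` is hypothetical, and this is not necessarily how the PR handles it.

```python
import importlib
import warnings


def optional_import(module_name):
    """Return the imported module, or None (after a warning) if it is missing.

    Hypothetical helper for illustration only.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError:
        warnings.warn(
            f"{module_name} is not installed; features that need it are disabled."
        )
        return None


# A standard-library module is found; a made-up name triggers the warning path.
json_module = optional_import("json")
missing = optional_import("definitely_not_installed_pkg_xyz")
print(json_module is not None, missing is None)
```

Callers can then check whether the returned value is None before offering the corresponding clustering or reduction option.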

    opened by lmcinnes 14
  • writeup

    After our OpenBCI Hackathon and whatever other polishing we want to do, we should write this up as a brief report in an appropriate forum (e.g. Nature Methods, Journal of Neuroscience Methods, arXiv, PLoS One), and then we should release the code. We should show how we can visualize a few interesting public datasets and use those visualizations to gain insights into the structure of the data. (They could be neuroscience datasets or not; the precise application will also help us narrow down a forum for reporting.)

    Proposed title: The Geometry of Big Data

    enhancement 
    opened by jeremymanning 14
  • UserWarning thrown when importing hypertools in jupyter notebook

    When importing hypertools in a jupyter notebook (import hypertools as hyp), the following warning is thrown:

    /opt/conda/lib/python3.6/site-packages/matplotlib/__init__.py:1405: UserWarning: 
    This call to matplotlib.use() has no effect because the backend has already
    been chosen; matplotlib.use() must be called *before* pylab, matplotlib.pyplot,
    or matplotlib.backends is imported for the first time.
    
      warnings.warn(_use_error_msg)
    

    Proposed fix: suppress this warning within hypertools (more info here)
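A minimal sketch of the suppression idea, using only the standard `warnings` module. The function `noisy_backend_switch` is a hypothetical stand-in for a call such as `matplotlib.use(...)`, so the example runs without matplotlib; it is not the fix that was eventually applied.

```python
import warnings


def noisy_backend_switch():
    """Stand-in for a call that may emit a UserWarning."""
    warnings.warn("backend has already been chosen", UserWarning)
    return "done"


def call_quietly(func):
    """Run func with UserWarnings suppressed, restoring the filters afterwards."""
    with warnings.catch_warnings():
        warnings.simplefilter("ignore", category=UserWarning)
        return func()


result = call_quietly(noisy_backend_switch)
print(result)
```

Because `catch_warnings` restores the previous filter state on exit, the suppression stays local to the wrapped call and does not hide warnings raised elsewhere.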

    bug easy(ish) 
    opened by jeremymanning 13
  • animation settings (speed & tail length)

    It might be interesting to:

    - let the user input a speed option and a tail length
    - default the tail length to a proportion of the dataset's overall length (only within a reasonable range, to avoid odd-looking results with very small or very large datasets)
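The proportional default with a clamped range could look roughly like this. The function name and all default values (`proportion`, `min_len`, `max_len`) are assumptions chosen for illustration, not settings taken from hypertools.

```python
def default_tail_length(n_samples, proportion=0.1, min_len=2, max_len=200):
    """Tail length as a fraction of the dataset length, clamped to a range.

    All defaults here are illustrative assumptions.
    """
    if n_samples < 1:
        raise ValueError("n_samples must be positive")
    raw = int(round(n_samples * proportion))
    upper = min(max_len, n_samples)  # never longer than the data itself
    lower = min(min_len, upper)      # keep the range valid for tiny datasets
    return max(lower, min(raw, upper))


for n in (1, 5, 1000, 100000):
    print(n, default_tail_length(n))
```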

    opened by KirstensGitHub 13
  • PCA implementation doesn't appear to normalize features before reducing

    This is important if there are mean/variance differences between the columns of features. One possibility is to z-score the columns automatically before PCA (with a flag to turn this off), or conversely to leave it off by default with an option to turn it on. Another option would be to print a warning when there are large differences in mean/variance between columns.
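To illustrate why this matters, the sketch below z-scores the columns with NumPy and compares the first principal axis with and without normalization. The helper `zscore_columns` and the synthetic data are assumptions made for the example; the PCA step is done with a plain SVD rather than with hypertools.

```python
import numpy as np


def zscore_columns(x):
    """Center each column and scale it to unit variance.

    Constant columns are only centered, to avoid dividing by zero.
    """
    x = np.asarray(x, dtype=float)
    mean = x.mean(axis=0)
    std = x.std(axis=0)
    std[std == 0] = 1.0
    return (x - mean) / std


# Two synthetic features on very different scales.
rng = np.random.default_rng(0)
data = np.column_stack([rng.normal(0, 1, 200), rng.normal(50, 1000, 200)])

# Principal axes are the right singular vectors of the centered data.
_, _, vt_raw = np.linalg.svd(data - data.mean(axis=0), full_matrices=False)
_, _, vt_z = np.linalg.svd(zscore_columns(data), full_matrices=False)

print(np.abs(vt_raw[0]))  # loadings of the first axis without normalization
print(np.abs(vt_z[0]))    # loadings of the first axis after z-scoring
```

Without normalization, the first axis is dominated by the large-variance column; after z-scoring, both columns contribute.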

    question 
    opened by andrewheusser 12
  • tests for backend management

    the matplotlib backend management code only works in ipython/jupyter notebook-based environments. we could use some of the tricks @paxtonfitzpatrick is using in the davos package to run tests for that code.

    enhancement help wanted 
    opened by jeremymanning 0
  • better tests

    many of our tests are not sufficiently rigorous-- e.g., they check that "something" gets returned, or that the proper datatype is returned, rather than checking specific values for a set of test cases.

    we should improve our pytests to better check that both datatypes and values are correct.

    bug help wanted wish list 
    opened by jeremymanning 0
  • animate=True does not work in Google Colab

    animate=False renders static plots, but animate=True renders static empty boxes.

    https://colab.research.google.com/drive/10LoEodWC7PeMYfMnEf85eK97rXNtS7lh#scrollTo=vpUphrib4qGs

    opened by mhlr 0
  • Allow option to use DataGeometry objects à la scikit-learn pipelines

    Currently, if you want to repeatedly transform text samples with hypertools.tools.format_data() using the same parameters, the function re-fits both the vectorizer and text model on each call. This ends up being fairly inefficient, and for expensive/numerous operations, makes working directly with the underlying sklearn classes the better option.

    We could add an argument to return the fit models for reuse, but a really nice feature would be something like a scikit-learn Pipeline object that you could create, fit, save, and reuse to perform various processing steps with a single call. This would also be a very attractive feature for hypertools, since it could also additionally implement methods like .plot() and .describe().
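The fit-once, reuse-many-times pattern described above can be sketched with a toy object. `TinyTextPipeline` is a hypothetical stand-in for a vectorizer plus text model; it is not a hypertools or scikit-learn class and only counts words.

```python
from collections import Counter


class TinyTextPipeline:
    """Toy fit/transform object: learns a vocabulary once, then reuses it."""

    def __init__(self):
        self.vocabulary_ = None

    def fit(self, documents):
        words = sorted({w for doc in documents for w in doc.lower().split()})
        self.vocabulary_ = {word: i for i, word in enumerate(words)}
        return self

    def transform(self, documents):
        if self.vocabulary_ is None:
            raise RuntimeError("call fit() before transform()")
        rows = []
        for doc in documents:
            counts = Counter(doc.lower().split())
            row = [0] * len(self.vocabulary_)
            for word, n in counts.items():
                # Words not seen during fit are ignored.
                if word in self.vocabulary_:
                    row[self.vocabulary_[word]] = n
            rows.append(row)
        return rows


# Fit once, then transform new samples repeatedly without re-fitting.
pipeline = TinyTextPipeline().fit(["the cat sat", "the dog ran"])
first = pipeline.transform(["the cat ran"])
second = pipeline.transform(["dog dog bird"])
print(first, second)
```

The expensive step (here, building the vocabulary) happens only in `fit`, so repeated `transform` calls stay cheap.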

    enhancement wish list 
    opened by paxtonfitzpatrick 0
  • Suggestion: animating uncertainty

    If I understand the docs correctly, hypertools animation is currently meant mainly for plotting timeseries.

    Instead of timeseries, would it be possible to use the animation to plot the uncertainty in the variable? Kind of like visually representing a bootstrap sampling. This would make hypertools a great candidate to be used as a hypothetical outcomes plot (HOP).

    Thank you in advance for any suggestion on how this could be implemented (or if it's already possible with some tweaking).
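One possible starting point is to generate one bootstrap estimate per animation frame and treat the sequence of frames as the data to animate. The sketch below only produces the frame values with the standard library; the function name, sample values, and frame count are assumptions, and passing the result to hypertools is not shown.

```python
import random
import statistics


def bootstrap_frames(samples, n_frames=20, seed=0):
    """Return one bootstrap estimate of the mean per animation frame."""
    rng = random.Random(seed)
    frames = []
    for _ in range(n_frames):
        # Resample with replacement, keeping the original sample size.
        resample = rng.choices(samples, k=len(samples))
        frames.append(statistics.mean(resample))
    return frames


observations = [2.1, 2.4, 1.9, 2.8, 2.2, 2.5, 2.0]
frames = bootstrap_frames(observations)
print(len(frames))
```

Each frame is one hypothetical outcome, so the spread across frames reflects the sampling uncertainty of the estimate.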

    opened by lrq3000 0
Releases(v0.8.0)
  • v0.8.0(Feb 12, 2022)

    updates to .geo file format

    Hypertools now saves DataGeometry objects using the pickle file format internally, rather than HDF5. With improvements made to the built-in pickle module since Hypertools's initial release, this now generally results in smaller files that save and load more quickly. It also allows us to no longer depend on deepdish, which has compatibility issues with various pandas objects, doesn't offer pre-built wheels for more recent Python versions, and is largely no longer maintained.

    If you need to load .geo files from the old format, hypertools.load now accepts a keyword-only argument, legacy. Install deepdish if necessary, and pass legacy=True to load older DataGeometry objects. You can then .save() them to convert them to the new format.

    improvements to example datasets

    All example data files have been upgraded to the new file format. Additionally, the three pre-trained scikit-learn Pipelines Hypertools provides (wiki_model, nips_model, and sotus_model) have been recreated from scratch using a newer scikit-learn version, better text preprocessing, and updated CountVectorizer and LatentDirichletAllocation parameters that result in overall better models.

    The example DataGeometry objects associated with these three models (wiki, nips, and sotus) have been updated accordingly, and additionally now use IncrementalPCA as their default reducers, resulting in faster, deterministic transform outputs.

    To use the new models and datasets, upgrade Hypertools to v0.8.0 (pip install -U hypertools) and remove the local cache of old versions ([[ -d ~/hypertools_data ]] && rm ~/hypertools_data/*). Older versions of Hypertools will continue to use the old example data.

    Other improvements

    • Hypertools is now compatible with Python 3.9! This release is also compatible in principle with Python 3.10, but numba does not yet support Python 3.10, so certain dependencies will fail to install.
    • Hypertools now works with newer scikit-learn versions! The updates above to the example datasets make Hypertools fully compatible with recent scikit-learn releases (>=0.24). This should make Hypertools easier to use in Colaboratory notebooks and more flexible in general. If you need to use an older scikit-learn version, pip-install hypertools<0.8.0.
    • Hypertools now works with newer Matplotlib versions! Recent updates to matplotlib's plotting backends were causing Hypertools's plotting interface to fail on import. We've fixed these bugs and maintained backwards compatibility with older (deprecated) interactive plotting backends as well.

    Other assorted changes

    • failures when loading example datasets and .geo files now raise HypertoolsIOError with clearer error messages
    • specifying a compression when saving a DataGeometry object raises a FutureWarning
    • CI tests now run with Python 3.6 -- 3.9, use mamba for faster environment setup, and generate more verbose output
    • dependencies and code required for Python 2/3 compatibility have been removed
    • various code causing RuntimeWarnings has been fixed
  • v0.7.0(Jun 15, 2021)

    Control over matplotlib backend & various bug fixes

    New features:

    • Create animated plots in an environment with a non-interactive matplotlib plotting backend set, without disrupting the global plotting backend
    • Create non-animated, interactive plots for easy inspection of data using the new interactive keyword argument
    • Set the plotting backend for a single plot using the new mpl_backend keyword argument, and easily switch between backends within a single Python interpreter session, IPython kernel, and even Jupyter notebook cell
    • Use the new hypertools.set_interactive_backend function to change the backend for all future plots, or use it as a context manager to temporarily switch to a different backend. You can also use this to create multiple animated/interactive plots simultaneously.
    • use hypertools's backend adjustments to control behavior of other plotting libraries
    • Set the $HYPERTOOLS_BACKEND environment variable to permanently set your preferred plotting backend for non-IPython environments

    NB: Currently supported backends include TkInter, GTK, wxPython, Qt4, Qt5, Cocoa (aka MacOSX; MacOS only), notebook/nbAgg (Jupyter notebooks only), and ipympl/widget (Jupyter notebooks only). 3D and interactive plots may not render properly in Colab notebooks due to security restrictions imposed by the Colaboratory platform.

    Bug fixes

    • importing hypertools in a notebook no longer creates phantom Python processes, issues warnings when TkInter isn't installed, fails if matplotlib.pyplot was imported first, or silently changes the plotting backend (fixes #242)
    • creating 3D plots with hypertools no longer alters the global matplotlib.rcParams object (fixes #243)
    • hypertools can now be imported for non-plotting-related uses in environments without a compatible GUI without throwing an error
    • IPython's TAB-completion no longer triggers a full import of hypertools or improperly sets the plotting backend based on the subprocess's environment
    • require scikit-learn<0.24 (full spec: scikit-learn>=0.19.1,!=0.22,<0.24) to avoid bug when loading pre-trained DataGeometry objects due to renamed sklearn module
  • v0.6.3(Oct 2, 2020)

    dependency-related updates

    • allow scikit-learn>0.22. scikit-learn==0.22.0 contains a bug that affects the CountVectorizer vocabulary. This has been fixed in 0.23.0.
    • require umap-learn>=0.4.6. We previously avoided a bug in umap-learn<=0.4.5 by installing a pre-release version from GitHub. This has now been fixed in umap-learn==0.4.6
    • Beginning with seaborn==0.11.0, "dark" color palettes are returned in reverse order from how they were previously. This difference in behavior will be reflected in hypertools, but we've changed the default cmap in hypertools._shared.helpers.vals2colors to a non-dark palette for consistent default behavior.
    • Added tests for Python 3.8
  • v0.6.2(Dec 18, 2019)

    minor patch that enables dependencies not hosted on PyPI to install properly

    • setup.py's setup command is now a custom class that inherits from setuptools.command.install.install, runs the regular installation process, then pip-installs UMAP from its GitHub URL at a pre-release commit hash. This is completely equivalent to manually running pip install git+<URL>, but takes the burden of having to do so off of end-users.
    • removed URL from requirements.txt, added a comment in its place
    • added MANIFEST.IN file to include requirements.txt
    • updated minimum Python version listed on PyPI page to 3.5 to reflect that Python 3.4 support was dropped in v0.5.1 (August 2018)

    This version is tagged as 0.6.2 to keep the versioning here and on PyPI consistent. The fix intended to be 0.6.1 was unsuccessful on TestPyPI, and PyPI does not allow removing and reuploading an existing version.

  • v0.6.0(Dec 18, 2019)

    Updates to hypertools.reduce

    • fixed bug when passing a dictionary of parameters to the reduce argument that would result in those parameters being overwritten
    • added some basic support for passing custom embedding models
    • added a warning when resolving conflicts between hypertools arguments and model-specific arguments

    Other changes

    • dropped support for Python 2.7
    • fixed bug in Travis tests
    • replaced deprecated pandas.DataFrame method in hypertools.tools.df2mat
    • require installing UMAP from the GitHub repository because a needed bug fix has not been released yet
    • updated setup.py to comply with PEP 508 guidelines for installing external dependencies
    • added unit test for hypertools.reduce bug fix
    • removed some unused imports and commented-out code
    • removed outdated pages from readthedocs
    • readthedocs build is now Python 3-based
    • build folder is ignored by default when installing from GitHub repository in editable mode
  • v0.5.1(Aug 2, 2018)

    • added flake8 to travis tests
    • refactored some of procrustes function code
    • removed support for python 3.4
    • removed hdbscan from dependencies (still can be used if installed manually)

    Code cleanup (thanks @dwillmer!):

    • Changed string comparisons from if x is 'str' to if x == 'str'; the former is an identity comparison, not equality. It happens to be true for some strings because of string interning, but == should always be used for normal comparisons.
    • Removed unused arguments from _draw function - return_data and others weren't used in the function body.
    • Removed unreachable code in normalize function (branch criteria could never be True).
    • Separated out the multiply-nested function calls in DataGeometry class for clarity.
    • Changed comparisons of the form if type(x) is list to if isinstance(x, list); the former doesn't return True for subclasses, so isinstance should always be used.
    • Set unused loop variables to _.
    • Removed unused imports.
    • Ensured all imports are at the top of the file (except lazy / circular ones)
    • Ensure 2 blank lines above functions/classes (PEP8), the code looks a bit weird without this.
    • Fixed typo repect -> respect, was copy-pasted in multiple docstrings.
    • Removed redundant pass before error raise
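The two comparison changes in this list (identity versus equality for strings, and exact type checks versus isinstance) can be illustrated with a short example. The class `TaggedList` and the helper names are hypothetical and exist only for this sketch.

```python
class TaggedList(list):
    """A list subclass, used to show how the two type checks differ."""


def is_exactly_list(x):
    return type(x) is list  # False for subclasses of list


def is_list_like(x):
    return isinstance(x, list)  # True for list and its subclasses


# Build an equal string at runtime so it need not be the same object as the literal.
literal = "str"
built = "".join(["s", "t", "r"])

print(built == literal)  # equality compares contents
print(built is literal)  # identity depends on interning; do not rely on it
print(is_exactly_list(TaggedList()), is_list_like(TaggedList()))
```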
  • v0.5.0(Apr 18, 2018)

    Enhancements:

    Plotting and transforming text data

    • hyp.plot now supports plotting text data. Simply pass a string, list of strings or list of lists of strings and the text will be transformed using a semantic model and plotted. By default, the text will be fit to a topic model (LDA) fit to a selection of wikipedia pages.
    • A new vectorizer argument in hyp.plot to specify a text vectorizer. Currently supports CountVectorizer, TfidfVectorizer, or class instances (fit or unfit) of these models.
    • A new semantic argument in hyp.plot that specifies the semantic model to use to transform text. Currently supports LatentDirichletAllocation, NMF, or class instances (fit or unfit) of these models.
    • A new corpus argument in hyp.plot that allows the user to specify text to fit a semantic model. Can be 'wiki', 'nips', 'sotus' or a custom list of text.
    • Enhanced hyp.format_data function that takes data in various forms (numpy array, dataframe, str, or list of str, or mixed list) and returns them in a standard format (a list of numpy arrays). This function can be used to transform text data using a semantic model.

    New algorithms

    • A new clustering algorithm HDBSCAN (thanks @lmcinnes!) e.g. hyp.plot(data, cluster='HDBSCAN')
    • A new dimensionality reduction algorithm UMAP (thanks @lmcinnes!) e.g. hyp.plot(data, reduce='UMAP')

    New parameters

    • A new size param to resize figure e.g. hyp.plot(data, size=[10,8])
    • A new ax param to add figure to existing axis e.g. hyp.plot(data, ax=ax)

    New text examples

    • A new dataset of NIPS papers e.g. hyp.load('nips') (from kaggle)
    • A new dataset of selected wikipedia pages e.g. hyp.load('wiki')
    • A new dataset of State of the Union text from 1989-2017. Can be loaded as hyp.load('sotus') (from kaggle)

    API changes

    • In hyp.plot, changed group arg to hue (group will still be supported but deprecated in a coming release).
    • Removed deprecated describe_pca function. Please use the more general function, describe.

    Bugs fixed

    • When using chemtrails in hyp.plot, the entire timeseries would appear for the first few seconds of an animation and then disappear.
    • The legend colors did not align with the data when using the fmt or color args.
    • When grouping with group/hue arg, labels were not reshuffled.
    • Fixed bug in describe function where correlations between data and reduced data would asymptote < 1.

    NOTE: If you have been using the development version of 0.5.0, please clear your data cache (/Users/yourusername/hypertools_data).

  • v0.4.2(Dec 11, 2017)

  • v0.4.1(Nov 19, 2017)

    • exposed format_data, which formats a numpy array, pandas df or mixed list into a list of numpy arrays (hypertools.tools.format_data)
    • added tests for the format_data function
    • added documentation to format_data
  • v0.4.0(Oct 12, 2017)

    Enhancements -

    • A new class: DataGeometry with methods for plotting, transforming new data and saving
    • Support for loading *.geo objects
    • A new function: analyze to perform combinations of transformations
    • A new function: describe for characterizing the loss of information due to dimensionality reduction algs
    • In-memory caching of time-intensive reduce, align and describe operations
    • New syntax for reduce function: model and model_params are now passed as a dictionary using the reduce arg
    • New clustering models added to the cluster function: MiniBatchKMeans, AgglomerativeClustering, Birch, FeatureAgglomeration, and SpectralClustering
    • Moved major functions (normalize, align, reduce, cluster, load) to main level (i.e. hyp.load instead of hyp.tools.load, but the latter will still work)

    Deprecations -

    • A deprecation warning is thrown for the following align arguments: normalize, ndims, and method
    • A deprecation warning is thrown for the following reduce arguments: model, model_params, align, and normalize
    • A deprecation warning is thrown for the following cluster arguments: ndims
    • A deprecation warning is thrown for the describe_pca function (replaced by describe)

    Bugs -

    • fixed #148 bug in hyp.plot where figure would be rendered despite setting show=False (thanks @chaseWilliams !)
    • fixed a bug where n_clusters would not override group, even though a warning message said it would
    • fixed a bug where hyp.plot would quit if any kwargs were not the same length as the number of arrays in the list of input data.

    Minor -

    • added brainiak toolbox citation and github link to align.py docstring
    • added additional details and fixed typos in align.py docstring
    • Upgraded seaborn requirement to 0.8.1
    • updated all examples/docs with new syntax changes
    • added new tests for new features
  • v0.3.1(Aug 11, 2017)

  • v0.3.0(Jun 14, 2017)

    This release extends hypertools to support the following dimensionality reduction / manifold learning models:

    • PCA
    • FastICA
    • IncrementalPCA
    • KernelPCA
    • FactorAnalysis
    • TruncatedSVD
    • SparsePCA
    • MiniBatchSparsePCA
    • DictionaryLearning
    • MiniBatchDictionaryLearning
    • TSNE
    • MDS
    • SpectralEmbedding
    • LocallyLinearEmbedding
    • Isomap

    The default reduction algorithm was switched from PCA to IncrementalPCA for better handling of large datasets.

    Bugs squashed:

    • fixed plot_procrustes example so that rotation matrix is orthonormal
  • v0.2.1(Jun 5, 2017)

    The work for this update was done during the Mozilla Global Sprint 2017. Thank you @alysivji and @rarredon for your contributions! Thanks @stephwright and the Mozilla Open Science Team for organizing an awesome event!

    New Features:

    • If legend is not explicitly given, it can be computed implicitly by passing legend=True
    • Align flag added to hyp.plot function
    • Align flag added to hyp.tools.reduce function
    • Reduce flag added to hyp.tools.align and hyp.tools.procrustes functions
    • Align and reduce flags added to hyp.tools.load function
    • Updated examples with new syntax

    Bugs Squashed:

    • Fixed import bug for saving animations
    • Fixed bug in align function where extra column(s) of zeros were appended to data before alignment if ndims<=2
  • v0.2.0(May 31, 2017)

    new features, new code organization and also some style changes!

    Key changes:

    • The way we handle args. Now, all keywords are handled explicitly, instead of unpacking them using the **kwargs syntax. This makes the code much cleaner, and easier to parse arguments.

    • The organization of the plotting code. I eliminated the separate static and animate code bases, because there was a lot of redundant code and handling args was a mess. It's now organized into a plot function, which is the main plotting function that also handles data manipulation prior to plotting, and a draw function, which handles all static and animated drawing.

    • Plot styling. All styles are now consistent, the static plots are now the same as the animated plots. Also, all lines/points are thinner.

    • New keyword arguments to the plot function:
      + fmt: in 0.1.0, format strings were handled as arguments, but now they are passed as a kwarg. Since the fmt kwarg is the first param after the data, the API in 0.1 and 0.2 is the same, except that in 0.1 format strings could be passed in any position, and now they must be passed immediately after the data.
      + title can be passed to add a title to the plot.
      + elev can be passed to change the elevation of the plot. Useful for static plots in jupyter notebooks.
      + azim can be passed to change the azimuth of a plot. Useful for static plots in jupyter notebooks.
      + precog is an animation-only feature which plots a low-opacity trace ahead of the data (similar to chemtrails but in the opposite direction).
      + bullettime (animation only) is the same as the combination of precog and chemtrails.
      + animate='spin' will create a "static" plot (i.e. all the data is plotted at once) that rotates.
      + return_data has been eliminated. The data is now always returned by default.

    Minor changes:

    • changed 'weights' example data from 3D numpy array to list of 2D numpy arrays
    • updated examples for new 'weights' data format
    • remove docs _build folder from repo
    • added requests to setup.py file
    • remove rogue class from init file

    Bugs fixed:

    • If the rank of the input matrix is smaller than the number of dimensions requested, the reduce function will now pad the reduced data matrix with ndims-rank columns of zeros.

  • v0.1.7(Apr 17, 2017)

    Enhancements:

    • added load function to load in example data
    • moved example data out of repo to google drive

    Bugs squashed:

    • added missing import sys statement to helpers.py
    • fixed bug in example scripts dealing with missing data
    • fixed bug that caused software to crash when using PPCA
    • fixed bug in readthedocs build caused by empty toctree
  • v0.1.6(Feb 23, 2017)

    • fixed bug where group kwarg caused code to crash on some systems
    • fixed bug where requirements still listed matplotlib <2.0, even though it is supported
    • patched bug where tail_duration would crash if not an integer value
    • added warning to align function when n features exceeds n samples
  • v0.1.5(Feb 6, 2017)

  • v0.1.4(Feb 6, 2017)

  • v0.1.3(Feb 3, 2017)

    Bugs fixed:

    • patched bug where future division was not being imported into align, procrustes and srm functions
    • fixed bug where category labels could be returned out of order because of set
  • v0.1.2(Feb 2, 2017)

    changes:

    • added docstrings to all public functions
    • added sphinx-generated documentation of API with examples
    • modified the examples so they are compatible with sphinx-gallery
    • simplified readme

    bugs fixed:

    • patched bug where matplotlib 2.0 could be used (working on this for next release)
    • patched bug where procrustes function uses dimensionality reduction during alignment by default
  • v0.1(Jan 27, 2017)

    First version! API documented below:

    Main function

    - plot - Plots high dimensional data in 1, 2, or 3 dimensions as static image, 3d interactive plot or animated plot.

    Sub functions

    - tools.align - aligns multidimensional data (see [here](http://haxbylab.dartmouth.edu/publications/HGC+11.pdf) for details)
    - tools.reduce - implements PCA to reduce dimensionality of data
    - tools.cluster - runs k-means clustering and returns cluster labels
    - tools.describe_pca - plotting tool to evaluate how well the principal components describe the data
    - tools.missing_inds - returns indices of missing data (nans)
    - tools.normalize - z-scores the columns/rows of a matrix or list of matrices
    - tools.procrustes - projects from one space to another using Procrustean transformation (shift + scaling + rotation) (adapted from [pyMVPA](https://github.com/PyMVPA/PyMVPA/blob/master/mvpa2/mappers/procrustean.py) implementation)
    - tools.df2mat - converts single-level pandas dataframe to numpy matrix

    Plot

    Plot example

    Inputs:

    A numpy array, list of arrays, or pandas dataframe or list of dataframes

    NOTE: HyperTools currently only supports single-level indexing for pandas dataframes, but we plan to support multi-level indices in the future. Additionally, be aware that if columns containing text are passed to HyperTools, those columns will be automatically converted into dummy variables (see pandas.get_dummies for details).

    Arguments:

    Format strings can be passed as a string, or tuple/list of length x. See matplotlib API for more styling options

    Keyword arguments:

    color(s) (list): A list of colors for each line to be plotted. Can be named colors, RGB values (e.g. (.3, .4, .1)) or hex codes. If defined, overrides palette. See http://matplotlib.org/examples/color/named_colors.html for list of named colors. Note: must be the same length as X.

    group (list of str, floats or ints): A list of group labels. Length must match the number of rows in your dataset. If the data type is numerical, the values will be mapped to rgb values in the specified palette. If the data type is strings, the points will be labeled categorically. To label a subset of points, use None (i.e. ['a', None, 'b','a'])

    linestyle(s) (list): a list of line styles

    marker(s) (list): a list of marker types

    palette (str): A matplotlib or seaborn color palette

    labels (list): A list of labels for each point. Must match the dimensionality of the data (X). If no label is wanted for a particular point, input None

    legend (list): A list of string labels to be plotted in a legend (one for each list item)

    ndims (int): an int representing the number of dims to plot in. Must be 1,2, or 3. NOTE: Currently only works with static plots.

    normalize (str or False) - If set to 'across', the columns of the input data will be z-scored across lists (default). If set to 'within', the columns will be z-scored within each list that is passed. If set to 'row', each row of the input data will be z-scored. If set to False, the input data will be returned (default is False).

    n_clusters (int): If n_clusters is passed, HyperTools will perform k-means clustering with the k parameter set to n_clusters. The resulting clusters will be plotted in different colors according to the color palette.

    animate (bool): If True, plots the data as an animated trajectory (default: False)

    show (bool): If set to False, the figure will not be displayed, but the figure, axis and data objects will still be returned (see Outputs) (default: True).

    save_path (str): Path to save the image/movie. Must include the file extension in the save path (e.g. save_path='/path/to/file/image.png'). NOTE: If saving an animation, FFMPEG must be installed (this is a matplotlib requirement). FFMPEG can be installed on a Mac via Homebrew (brew install ffmpeg) or on Linux via apt-get (apt-get install ffmpeg). If you don't have Homebrew (Mac only), you can install it like this: /usr/bin/ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)".

    explore (bool): If True, user-defined labels appear on hover. If no labels are passed, the point index and coordinate are shown instead. To use, set explore=True.

    Note: Explore mode is currently only supported for 3D static plots.

    Animation-specific keyword arguments:

    duration (int): Length of the animation in seconds (default: 30 seconds)

    tail_duration (int): Sets the length of the tail of the data (default: 2 seconds)

    rotations (int): Number of rotations around the box (default: 2)

    zoom (int): Zoom, positive numbers will zoom in (default: 0)

    chem_trails (bool): Adds a trail with a change in opacity (default: False)

    Outputs:

    -By default, the plot function outputs a figure handle (matplotlib.figure.Figure), axis handle (matplotlib.axes._axes.Axes) and data (list of numpy arrays), e.g. fig,axis,data = hyp.plot(x)

    -If animate=True, the plot function additionally outputs an animation handle (matplotlib.animation.FuncAnimation) e.g. fig,axis,data,line_ani = hyp.plot(x, animate=True)

    Example uses:

    Please see the examples folder for many more implementation examples.

    Import the library: import hypertools as hyp

    Plot with default color palette: hyp.plot(data)

    Plot as movie: hyp.plot(data, animate=True)

    Change color palette: hyp.plot(data,palette='Reds')

    Specify colors using unlabeled list of format strings: hyp.plot([data[0],data[1]],['r:','b--'])

    Plot data as points: hyp.plot([data[0],data[1]],'o')

    Specify colors using keyword list of colors (color codes, rgb values, hex codes or a mix): hyp.plot([data[0],data[1],data[2]],color=['r', (.5,.2,.9), '#101010'])

    Specify linestyles using keyword list: hyp.plot([data[0],data[1],data[2]],linestyle=[':','--','-'])

    Specify markers using keyword list: hyp.plot([data[0],data[1],data[2]],marker=['o','*','^'])

    Specify markers with format string and colors with keyword argument: hyp.plot([data[0],data[1],data[2]], 'o', color=['r','g','b'])

    Specify labels:

    # Label first point of each list
    labels=[]
    for idx,i in enumerate(data):
        tmp=[]
        for iidx,ii in enumerate(i):
            if iidx==0:
                tmp.append('Point ' + str(idx))
            else:
                tmp.append(None)
        labels.append(tmp)
    
    hyp.plot(data, 'o', labels=labels)
    

    Specify group:

    # Assign a random numeric group value to every point in each list
    import numpy as np

    group=[]
    for idx,i in enumerate(data):
        tmp=[]
        for iidx,ii in enumerate(i):
            tmp.append(np.random.rand())
        group.append(tmp)
    
    hyp.plot(data, 'o', group=group)
    

    Plot in 2d: hyp.plot(data, ndims=2)

    Group clusters by color: hyp.plot(data, n_clusters=10)

    Create a legend: hyp.plot([data[0],data[1]], legend=['Group A', 'Group B'])

    Turn on explore mode (experimental): hyp.plot(data, 'o', explore=True)

    Align

    BEFORE

    Align before example

    AFTER

    Align after example

    Inputs:

    A list of numpy arrays

    Outputs:

    An aligned list of numpy arrays

    Example use:

    align a list of arrays: aligned_data = hyp.tools.align(data)

    Reduce

    Inputs:

    A numpy array or list of numpy arrays

    Keyword arguments:

    • ndims - dimensionality of output data
    • normalize (str or False) - If set to 'across', the columns of the input data will be z-scored across lists. If set to 'within', the columns will be z-scored within each list that is passed. If set to 'row', each row of the input data will be z-scored. If set to False, the input data will be returned. (default is 'across').

    Outputs

    An array or list of arrays with reduced dimensionality

    Example uses

    Reduce n-dimensional array to 3d: reduced_data = hyp.tools.reduce(data, ndims=3)

    Cluster

    Inputs:

    A numpy array or list of numpy arrays

    Keyword arguments:

    • n_clusters (int) - number of clusters to fit (default=8)
    • ndims (int) - reduce data to ndims before running k-means (optional)

    Outputs

    A list of cluster labels corresponding to each data point. NOTE: During the cluster fitting, the data are stacked across lists, so if multiple lists are passed, the returned list of cluster labels will need to be reshaped.

    Example use:

    cluster_labels = hyp.tools.cluster(data, n_clusters=10)
    hyp.plot(data, 'o', group = cluster_labels)
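The note about reshaping can be made concrete with a small sketch. `split_labels` is a hypothetical helper, not part of HyperTools, and it assumes the flat label list follows the order in which the arrays were stacked (all rows of the first array, then all rows of the second, and so on).

```python
import numpy as np

def split_labels(labels, arrays):
    """Hypothetical helper: split a flat label list into one list per array."""
    lengths = [len(a) for a in arrays]
    # Cumulative row counts give the boundaries between consecutive arrays.
    split_points = np.cumsum(lengths)[:-1]
    return [part.tolist() for part in np.split(np.asarray(labels), split_points)]

data = [np.zeros((3, 4)), np.zeros((2, 4))]  # two arrays with 3 and 2 rows
flat_labels = [0, 1, 1, 2, 0]                # stand-in for stacked cluster labels
per_array = split_labels(flat_labels, data)  # one label list per input array
```

The resulting nested list has the same shape as the group argument used in the plot examples.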
    

    Cluster Example

    Describe PCA

    Inputs:

    A numpy array or list of numpy arrays

    Keyword arguments:

    • show (bool) - if true, returns figure handle, axis handle and dictionary containing the plotted data. If false, the function just returns a dictionary containing the data

    Outputs

    A plot summarizing the correlation of the covariance matrices for the raw input data and PCA-reduced data

    Example use:

    hyp.tools.describe_pca(data)

    Describe Example

    Missing inds

    Inputs:

    A numpy array or list of numpy arrays

    Outputs

    A list of indices representing rows with missing data. If a list of numpy arrays is passed, a list of lists will be returned.

    Example use:

    missing_data_inds = hyp.tools.missing_inds(data)
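As a rough illustration of the output format, the following sketch finds rows containing NaNs with plain NumPy. `missing_rows` is a hypothetical stand-in written for this example; the actual function may differ in its return types.

```python
import numpy as np

def missing_rows(x):
    """Hypothetical helper: indices of rows that contain at least one NaN."""
    if isinstance(x, list):
        # A list of arrays yields a list of index lists.
        return [missing_rows(a) for a in x]
    x = np.asarray(x, dtype=float)
    return np.where(np.isnan(x).any(axis=1))[0].tolist()

a = np.array([[1.0, 2.0], [np.nan, 3.0], [4.0, 5.0]])
b = np.array([[np.nan, np.nan], [6.0, 7.0]])
single = missing_rows(a)         # indices for one array
multiple = missing_rows([a, b])  # one index list per array
```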

    Normalize

    Inputs:

    A numpy array or list of numpy arrays

    Keyword arguments:

    • normalize (str or False) - If set to 'across', the columns of the input data will be z-scored across lists. If set to 'within', the columns will be z-scored within each list that is passed. If set to 'row', each row of the input data will be z-scored. If set to False, the input data will be returned. Note: you MUST set the normalize flag equal to 'across', 'within' or 'row', or else you will get the same data back that you put in!

    Outputs

    An array or list of normalized data

    Example use:

    normalized_data = hyp.tools.normalize(data, normalize='within')
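The difference between the three modes is easiest to see in code. The sketch below is an assumption-laden illustration in NumPy, not the library's implementation: `zscore_sketch` is a hypothetical helper, it uses the population standard deviation, and it does not guard against constant columns or rows (which would divide by zero).

```python
import numpy as np

def zscore_sketch(arrays, normalize="across"):
    """Hypothetical helper mirroring the normalize modes described above."""
    arrays = [np.asarray(a, dtype=float) for a in arrays]
    if normalize == "across":
        # Column mean/std computed over all arrays stacked together.
        stacked = np.vstack(arrays)
        mean, std = stacked.mean(axis=0), stacked.std(axis=0)
        return [(a - mean) / std for a in arrays]
    if normalize == "within":
        # Column mean/std computed separately for each array.
        return [(a - a.mean(axis=0)) / a.std(axis=0) for a in arrays]
    if normalize == "row":
        # Each row is z-scored on its own.
        return [
            (a - a.mean(axis=1, keepdims=True)) / a.std(axis=1, keepdims=True)
            for a in arrays
        ]
    # normalize=False: the input is returned unchanged.
    return arrays

a = np.array([[1.0, 10.0], [3.0, 30.0]])
b = np.array([[5.0, 50.0], [7.0, 70.0]])
across = zscore_sketch([a, b], normalize="across")
within = zscore_sketch([a, b], normalize="within")
```

With 'within', each array ends up with zero-mean columns on its own; with 'across', only the stacked data does, so differences between the arrays are preserved.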

    Procrustes

    Inputs:

    • source - a numpy array to be transformed
    • target - a numpy array to serve as template

    Outputs

    A (shifted + scaled + rotated) version of source that best matches target

    Example use:

    source_aligned_to_target = hyp.tools.procrustes(source, target)
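For intuition about the shift + scaling + rotation, here is a minimal sketch of the textbook orthogonal Procrustes solution in NumPy. It is not the PyMVPA-derived code used by the package: `procrustes_sketch` is a hypothetical helper, it assumes source and target have the same shape, and the orthogonal map it finds may include a reflection.

```python
import numpy as np

def procrustes_sketch(source, target):
    """Hypothetical helper: shift, scale, and rotate source to best match target."""
    source = np.asarray(source, dtype=float)
    target = np.asarray(target, dtype=float)
    src_mean, tgt_mean = source.mean(axis=0), target.mean(axis=0)
    src_c, tgt_c = source - src_mean, target - tgt_mean
    # The SVD of the cross-covariance gives the optimal orthogonal map.
    u, s, vt = np.linalg.svd(src_c.T @ tgt_c)
    rotation = u @ vt
    # Least-squares optimal isotropic scale.
    scale = s.sum() / (src_c ** 2).sum()
    return scale * src_c @ rotation + tgt_mean

# Build a target that is a rotated, scaled, and shifted copy of the source.
theta = np.pi / 6
rot = np.array([[np.cos(theta), -np.sin(theta)], [np.sin(theta), np.cos(theta)]])
source = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 2.0], [0.0, 1.0]])
target = 2.5 * source @ rot + np.array([3.0, -1.0])
aligned = procrustes_sketch(source, target)
```

Because the target here is an exact similarity transform of the source, the aligned result coincides with the target; with noisy data it would be the least-squares best match instead.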

    df2mat

    Inputs:

    • a single-level pandas dataframe

    Outputs

    A numpy matrix built from the dataframe with text columns replaced with dummy variables (see http://pandas.pydata.org/pandas-docs/stable/generated/pandas.get_dummies.html)

    Example use:

    matrix = hyp.tools.df2mat(df)

Owner
Contextual Dynamics Laboratory
Contextual Dynamics Laboratory at Dartmouth College