Automatically Visualize any dataset, any size with a single line of code. Created by Ram Seshadri. Collaborators Welcome. Permission Granted upon Request.

Overview

AutoViz

AutoViz performs automatic visualization of any dataset with one line. Give any input file (CSV, txt or json) and AutoViz will visualize it.

Install

Prerequisites

Before installing AutoViz, it is best to create a new environment and install the required dependencies there.

To install from PyPI:

conda create -n <your_env_name> python=3.7 anaconda
conda activate <your_env_name> # on older conda versions: `source activate <your_env_name>` (Linux/macOS) or `activate <your_env_name>` (Windows)
pip install autoviz

To install from source:

cd <AutoViz_Destination>
git clone git@github.com:AutoViML/AutoViz.git
# or download and unzip https://github.com/AutoViML/AutoViz/archive/master.zip
conda create -n <your_env_name> python=3.7 anaconda
conda activate <your_env_name> # on older conda versions: `source activate <your_env_name>` (Linux/macOS) or `activate <your_env_name>` (Windows)
cd AutoViz
pip install -r requirements.txt
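
After either install route, a quick check that the package is importable from the active environment can save a confusing failure later. The sketch below uses only the Python standard library; the helper name is_installed is hypothetical and is not part of AutoViz.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be found by this interpreter."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # "autoviz" is the import name used in the Usage section.
    print("autoviz importable:", is_installed("autoviz"))
```

If this prints False, the most likely cause is that the environment created above is not the one currently activated.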

Usage

Read this Medium article to know how to use AutoViz.

In the AutoViz directory, open a Jupyter Notebook and use these lines to instantiate the library:

from autoviz.AutoViz_Class import AutoViz_Class

AV = AutoViz_Class()

Load a dataset (any CSV or text file) into a Pandas dataframe or give the name of the path and filename you want to visualize. If you don't have a filename, you can simply assign the filename argument "" (empty string).

Call AutoViz using the filename (or dataframe) along with the separator and the name of the target variable in the input. AutoViz will do the rest. You will see charts and plots on your screen.

filename = ""  # path to your file, or "" if you pass a dataframe through dfte
sep = ","
dft = AV.AutoViz(
    filename,
    sep=sep,
    depVar="",
    dfte=None,
    header=0,
    verbose=0,
    lowess=False,
    chart_format="svg",
    max_rows_analyzed=150000,
    max_cols_analyzed=30,
)

AV.AutoViz is the main plotting function in AV.
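
The choice between a file name and a dataframe can be wrapped in a small helper. This is only a sketch: build_autoviz_args is a hypothetical name, not an AutoViz function, and it relies solely on the filename, sep, depVar and dfte arguments shown in the call above.

```python
def build_autoviz_args(source, sep=",", target=""):
    """Build keyword arguments for AV.AutoViz from a file path or a dataframe.

    Hypothetical helper: a string is treated as a file path; anything else
    is assumed to be a pandas dataframe and is passed through `dfte`.
    """
    if isinstance(source, str):
        if not source:
            raise ValueError("Give a non-empty file path or pass a dataframe.")
        # File input: the data comes from `filename`, so `dfte` stays unset.
        return {"filename": source, "sep": sep, "depVar": target, "dfte": None}
    # Dataframe input: `filename` must be the empty string.
    return {"filename": "", "sep": sep, "depVar": target, "dfte": source}
```

With this in place, AV.AutoViz(**build_autoviz_args("data.csv", target="price")) and AV.AutoViz(**build_autoviz_args(df)) follow the same rule: exactly one of filename and dfte carries the data. The file name "data.csv" and the target "price" are placeholders.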

Notes:

  • AutoViz will visualize any sized file using a statistically valid sample.
  • A comma is assumed as the default separator in the file; change it with the sep argument.
  • The first row is assumed to be the header; change it with the header argument.
  • verbose option
    • if 0, prints minimal information and displays charts in your notebook
    • if 1, prints extra information and also displays charts in your notebook
    • if 2, does not display any charts; it simply saves them on your local machine under the AutoViz_Plots directory
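
If you are unsure which separator a file uses or whether its first row is a header, the Python standard library can make a guess before you call AutoViz. The sketch below uses csv.Sniffer; guess_csv_layout is a hypothetical helper name, the sample text is made up, and the sniffer is heuristic, so treat its answer as a suggestion to verify by eye.

```python
import csv


def guess_csv_layout(sample_text, candidates=",;\t|"):
    """Guess the column separator and whether the first row is a header.

    `sample_text` is the first few lines of the file as one string.
    """
    sniffer = csv.Sniffer()
    dialect = sniffer.sniff(sample_text, delimiters=candidates)
    return dialect.delimiter, sniffer.has_header(sample_text)


# Made-up sample: semicolon-separated values with a header row.
sample = "id;price;city\n1;9.5;Oslo\n2;12.0;Rome\n3;7.25;Bergen\n"
sep, first_row_is_header = guess_csv_layout(sample)
print(repr(sep), first_row_is_header)
```

The guessed separator can then be passed as the sep argument.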

API

Arguments

  • filename - the path and name of the file to load. If no file is associated with the data and you want to use a dataframe, pass an empty string ("") as filename and give the dataframe through dfte. Otherwise, fill in the file name and leave dfte empty. Only one of these two is needed to load the data set.
  • sep - this is the separator in the file. It can be comma, semi-colon or tab or any value that you see in your file that separates each column.
  • depVar - target variable in your dataset. You can leave it as empty string if you don't have a target variable in your data.
  • dfte - this is the input dataframe in case you want to load a pandas dataframe to plot charts. In that case, leave filename as an empty string.
  • header - the row number of the header row in your file. If it is the first row, then this must be zero.
  • verbose - it accepts 3 values: 0, 1 or 2. With 0, you get all charts but limited info. With 1, you get all charts and more info. With 2, you will not see any charts, but they will be quietly generated and saved in your current directory under the AutoViz_Plots directory, which will be created. Delete this folder periodically; otherwise, you will accumulate lots of saved charts if you use the verbose=2 option often.
  • lowess - this option is very useful for small datasets, where you can see regression lines for each pair of continuous variables against the target variable. Don't use it for large data sets (that is, over 100,000 rows).
  • chart_format - this can be SVG, PNG or JPG. You will get charts generated and saved in this format if you used verbose=2 option. Very useful for generating charts and using them later.
  • max_rows_analyzed - limits the max number of rows that is used to display charts. If you have a very large data set with millions of rows, then use this option to limit the amount of time it takes to generate charts. We will take a statistically valid sample.
  • max_cols_analyzed - limits the number of continuous vars that can be analyzed
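
To make the effect of max_rows_analyzed concrete, here is a minimal sketch of capping a dataset with a simple random sample. It is an assumption made for illustration: the description above does not say how AutoViz draws its sample, and cap_rows is a hypothetical helper, not an AutoViz function.

```python
import random


def cap_rows(rows, max_rows=150000, seed=0):
    """Return at most `max_rows` rows, chosen uniformly at random.

    Illustration only: AutoViz's own sampling strategy may differ.
    """
    rows = list(rows)
    if len(rows) <= max_rows:
        return rows  # small enough, keep everything
    rng = random.Random(seed)  # fixed seed keeps the sample reproducible
    return rng.sample(rows, max_rows)  # sampling without replacement


data = list(range(200_000))  # stand-in for the rows of a large file
sampled = cap_rows(data, max_rows=150000)
print(len(sampled))
```

Datasets at or below the limit pass through unchanged, which matches the idea that the cap only matters for very large inputs.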

Maintainers

Contributing

See the contributing file!

PRs accepted.

License

Apache License, Version 2.0

DISCLAIMER

This project is not an official Google project. It is not supported by Google and Google specifically disclaims all warranties as to its quality, merchantability, or fitness for a particular purpose.

Comments
  • Project logo [help wanted]

    If anyone with design sensibilities sees this, we are open to changing the project logo.

    We like the pandas logo, for example: https://github.com/pandas-dev/pandas

    opened by morenoh149 18
  • How do we see output using a script file in a terminal?

    Hi AutoViML,

    Firstly, congratulations and thanks for this wonderful package. It works perfectly well in Jupyter notebooks, but how do I use it from an IDE, let's say Spyder?

    Thanks in advance. Mohit

    opened by bansalism2 9
  • "Not able to read or load file. Please check your inputs and try again..."

    Hello Ram, when I run the code on my dataset, dft = av.AutoViz('', sep, target, df), I get this error: "Not able to read or load file. Please check your inputs and try again..."

    What could the issue be?

    opened by isioma42 8
  • exporting the report

    Similar projects to AutoViz are Sweetviz and Pandas Profiling.

    They can export the report as an HTML file. I wonder if this library also has this function?

    opened by sunset1234321 5
  • some variables in data removed automatically

    Hi, my input CSV contains 20 variables. During preprocessing, AutoViz removed all the important columns. May I know the reason? Note: the removed columns contain complete data without null values.

    opened by sivanagendra123 5
  • Not able to read or load file using Auto viz

    I am getting the above error while using AutoViz. My target variable is price; I even tried an empty target variable, with the same result. I also tried passing a dataframe and got the same issue. I was able to read my data with pandas, but AutoViz could not. I attached a zip file of the data (house_price.zip) because it is large. Kindly assist.

    opened by gbogobabs 4
  • Suggested updates for Wordcloud

    1. Updating Stopwords List

    Currently, I can see that Stopwords are defined as a list and I can see that it is missing a few stop words like "themselves".

    
    def return_stop_words():
        STOP_WORDS = ['it', "this", "that", "to", 'its', 'am', 'is', 'are', 'was', 'were', 'a',
                    'an', 'the', 'and', 'or', 'of', 'at', 'by', 'for', 'with', 'about', 'between',
                     'into','above', 'below', 'from', 'up', 'down', 'in', 'out', 'on', 'over',
                      'under', 'again', 'further', 'then', 'once', 'all', 'any', 'both', 'each',
                       'few', 'more', 'most', 'other', 'some', 'such', 'only', 'own', 'same', 'so',
                        'than', 'too', 'very', 's', 't', 'can', 'just', 'd', 'll', 'm', 'o', 're',
                        've', 'y', 'ain', 'ma']
        add_words = ["s", "m",'you', 'not',  'get', 'no', 'via', 'one', 'still', 'us', 'u','hey','hi','oh','jeez',
                    'the', 'a', 'in', 'to', 'of', 'i', 'and', 'is', 'for', 'on', 'it', 'got','aww','awww',
                    'not', 'my', 'that', 'by', 'with', 'are', 'at', 'this', 'from', 'be', 'have', 'was',
                    '', ' ', 'say', 's', 'u', 'ap', 'afp', '...', 'n', '\\']
        stop_words = list(set(STOP_WORDS+add_words))
        return sorted(stop_words)
    
    

    Wouldn't it be better to use the NLTK stop words list?

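    from nltk.corpus import stopwords

    # Requires the NLTK stopwords corpus: nltk.download("stopwords")
    langs = ["english"]  # languages to collect stop words for
    stop_words = set()
    for lang in langs:
        stop_words.update(stopwords.words(lang))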
    
    

    Copied from: https://gist.github.com/sebleier/554280

    2. Lemmatization before plotting

    I think it would be better to lemmatize the data before plotting, so that words like "reads" and "reading" count as the same word, which will give us a better word cloud.

    opened by chekoduadarsh 4
  • Misplaced graph x ylabel

    Hi Ram, I have tried this package and found a potential bug. When I ran AV.AutoViz('', ',', 'target', df), the x and y labels of each graph were swapped (the x label appears where the y label should be, and vice versa). I tried two datasets and it happened with both. Please look into this and see whether it is a bug or I did something wrong. Thanks! Jeff

    opened by HiIamJeff 4
  • "panel" dependency support for the latest version

    requirements.txt pins the "panel" dependency at version 0.12.x, which is too old. The latest version of panel is 0.14.x, which includes new bug fixes.

    Suggestion: panel~=0.12.6 ---> support new panel version (0.14.0)

    opened by celestinoxp 3
  • Can not draw Wordcloud plot

    Shape of your Data Set loaded: (3312, 20)
    #######################################################################################
    ######################## C L A S S I F Y I N G V A R I A B L E S ####################
    #######################################################################################
    Classifying variables in data set...
    20 Predictors classified...
    2 variables removed since they were ID or low-information variables
    Number of All Scatter Plots = 6
    Could not draw wordcloud plot for Order ID
    Could not draw wordcloud plot for Product ID
    Could not draw wordcloud plot for Product Name
    Could not draw wordcloud plot for Customer ID
    Could not draw wordcloud plot for Customer Name
    Could not draw wordcloud plot for City
    Time to run AutoViz = 5 seconds

    ###################### AUTO VISUALIZATION Completed ########################

    opened by tonovoh1 3
  • [bug] problem with time series charts

    Here is minimal reproducible example with google colab:

    1. Date time column is not recognized when the input is a file:
    !pip install autoviz
    from autoviz.AutoViz_Class import AutoViz_Class
    
    import pandas as pd
    
    AV = AutoViz_Class()
    
    df = pd.DataFrame({'time': ['2020-01-15', '2020-02-15', '2020-03-15', '2020-04-15', '2020-05-15'], 'values': [1.0,2.5,3.2,4.2,5.6]})
    df['time'] = pd.to_datetime(df['time'])
    df.to_csv('ts.csv', index=False)
    
    dft = AV.AutoViz("ts.csv", verbose=2)
    
    Shape of your Data Set loaded: (5, 2)
    ############## C L A S S I F Y I N G  V A R I A B L E S  ####################
    Classifying variables in data set...
    Data Set Shape: 5 rows, 2 cols
    Data Set columns info:
    * time: 0 nulls, 5 unique vals, most common: {'2020-05-15': 1, '2020-03-15': 1}
    * values: 0 nulls, 5 unique vals, most common: {3.2: 1, 5.6: 1}
    --------------------------------------------------------------------
        Numeric Columns: ['values']
        Integer-Categorical Columns: []
        String-Categorical Columns: []
        Factor-Categorical Columns: []
        String-Boolean Columns: []
        Numeric-Boolean Columns: []
        Discrete String Columns: []
        NLP text Columns: []
        Date Time Columns: []
        ID Columns: ['time']
        Columns that will not be considered in modeling: []
        2 Predictors classified...
            This does not include the Target column(s)
            1 variables removed since they were ID or low-information variables
        List of variables removed: ['time']
    No categorical or numeric vars in data set. Hence no bar charts.
    Time to run AutoViz (in seconds) = 0.562
    
    
    2. When the input is a dataframe, the chart is not generated, but the date time column is recognized:
    dft = AV.AutoViz("", dfte=df, verbose=2)
    Shape of your Data Set loaded: (5, 2)
    ############## C L A S S I F Y I N G  V A R I A B L E S  ####################
    Classifying variables in data set...
    Data Set Shape: 5 rows, 2 cols
    Data Set columns info:
    * time: 0 nulls, 5 unique vals, most common: {Timestamp('2020-05-15 00:00:00'): 1, Timestamp('2020-04-15 00:00:00'): 1}
    * values: 0 nulls, 5 unique vals, most common: {3.2: 1, 5.6: 1}
    --------------------------------------------------------------------
        Numeric Columns: ['values']
        Integer-Categorical Columns: []
        String-Categorical Columns: []
        Factor-Categorical Columns: []
        String-Boolean Columns: []
        Numeric-Boolean Columns: []
        Discrete String Columns: []
        NLP text Columns: []
        Date Time Columns: ['time']
        ID Columns: []
        Columns that will not be considered in modeling: []
        2 Predictors classified...
            This does not include the Target column(s)
            No variables removed since no ID or low-information variables found in data set
    Could not draw Date Vars
    No categorical or numeric vars in data set. Hence no bar charts.
    Time to run AutoViz (in seconds) = 0.408
    
    

    Expected result: chart with date on x-axis, and value on y-axis.

    opened by mglowacki100 3
  • Clean up README

    I personally found the readme very hard to read. I had to scroll three quarters of the way down the page to learn how to use the library. I think it would be better to move the new features from the top of the file to the second half of the readme, for advanced users, so that beginners can start using the package more easily.

    opened by kevinlinxc 1
  • The title part at the top of the output image is cut off

    When using verbose=2 to output an svg or png file, the title at the top of the image is cut off. There seems to be a problem with the height setting; please check.

    opened by beobest2 2
Owner
AutoViz and Auto_ViML
Automated Machine Learning: Build Variant Interpretable Machine Learning models. Project Created by Ram Seshadri.