Automatically Visualize any dataset, any size with a single line of code. Created by Ram Seshadri. Collaborators Welcome. Permission Granted upon Request.

AutoViz and Auto_ViML

Last update: Jan 2, 2023

Related tags

Data Visualization visualization python machine-learning scikit-learn python3 xgboost tableau automl tpot automated-machine-learning auto-sklearn automl-algorithms

Overview

AutoViz

Automatically Visualize any dataset, any size with a single line of code.

AutoViz performs automatic visualization of any dataset with one line of code. Give it any input file (CSV, TXT or JSON) and AutoViz will visualize it.

Table of Contents

Install
Usage
API
Maintainers
Contributing
License

Install

Prerequisites

Anaconda

It is best to install AutoViz in a new conda environment so that the required dependencies stay isolated.

To install from PyPI:

conda create -n <your_env_name> python=3.7 anaconda
conda activate <your_env_name> # with older conda versions: `source activate <your_env_name>` (Linux/macOS) or `activate <your_env_name>` (Windows)
pip install autoviz

To install from source:

cd <AutoViz_Destination>
git clone git@github.com:AutoViML/AutoViz.git
# or download and unzip https://github.com/AutoViML/AutoViz/archive/master.zip
conda create -n <your_env_name> python=3.7 anaconda
conda activate <your_env_name> # with older conda versions: `source activate <your_env_name>` (Linux/macOS) or `activate <your_env_name>` (Windows)
cd AutoViz
pip install -r requirements.txt

Usage

Read this Medium article to learn how to use AutoViz.

In the AutoViz directory, open a Jupyter Notebook and use these lines to import and instantiate the class:

from autoviz.AutoViz_Class import AutoViz_Class

AV = AutoViz_Class()

Load a dataset (any CSV or text file) into a Pandas dataframe or give the name of the path and filename you want to visualize. If you don't have a filename, you can simply assign the filename argument "" (empty string).

Call AutoViz using the filename (or dataframe) along with the separator and the name of the target variable in the input. AutoViz will do the rest. You will see charts and plots on your screen.

filename = ""
sep = ","
dft = AV.AutoViz(
    filename,
    sep=sep,
    depVar="",
    dfte=None,
    header=0,
    verbose=0,
    lowess=False,
    chart_format="svg",
    max_rows_analyzed=150000,
    max_cols_analyzed=30,
)

AV.AutoViz is the main plotting function in AV.

Notes:

AutoViz will visualize a file of any size using a statistically valid sample.
A comma is assumed as the default separator in the file, but you can change it with sep.
The first row is assumed to be the header, but you can change it with header.
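The note on sampling can be illustrated with a small sketch. It assumes simple random sampling without replacement, capped at max_rows_analyzed; the README does not describe how AutoViz actually draws its sample, and `sample_rows` is a hypothetical helper, not part of the library.

```python
import random


def sample_rows(rows, max_rows_analyzed=150000, seed=0):
    """Return at most max_rows_analyzed rows, chosen uniformly at random.

    Hypothetical helper: AutoViz's own sampling strategy may differ.
    """
    rows = list(rows)
    if len(rows) <= max_rows_analyzed:
        return rows  # small enough: keep every row
    rng = random.Random(seed)  # fixed seed keeps the sample reproducible
    return rng.sample(rows, max_rows_analyzed)


if __name__ == "__main__":
    data = [{"id": i, "value": i * 0.5} for i in range(1000)]
    subset = sample_rows(data, max_rows_analyzed=100)
    print(len(subset))
```

A fixed seed is used here only so that repeated runs plot the same rows; that choice is an assumption of the sketch.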

verbose option
- if 0, prints minimal information and displays charts in your notebook
- if 1, prints extra information and also displays charts in your notebook
- if 2, does not display any charts; it simply saves them on your local machine under the AutoViz_Plots directory
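When running from a script or terminal rather than a notebook, verbose=2 is the option that leaves files on disk. The sketch below shows one way to collect those files afterwards. It assumes the charts sit under AutoViz_Plots in the current working directory, possibly in subfolders (hence the recursive search); `list_saved_charts` is a hypothetical helper, not part of AutoViz.

```python
from pathlib import Path


def list_saved_charts(plot_dir="AutoViz_Plots", chart_format="svg"):
    """Return sorted paths of chart files under plot_dir (searched recursively).

    Hypothetical helper: the directory layout is an assumption.
    """
    root = Path(plot_dir)
    if not root.is_dir():
        return []  # nothing has been saved yet
    pattern = f"*.{chart_format.lower()}"
    return sorted(root.rglob(pattern))


if __name__ == "__main__":
    for chart in list_saved_charts():
        print(chart)
```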

API

Arguments

filename - the path and name of the file to load. Pass an empty string ("") if there is no file associated with the data and you want to use a dataframe; in that case, pass the dataframe through dfte. Otherwise, fill in the file name and leave dfte empty. Only one of these two is needed to load the data set.
sep - this is the separator in the file. It can be comma, semi-colon or tab or any value that you see in your file that separates each column.
depVar - target variable in your dataset. You can leave it as empty string if you don't have a target variable in your data.
dfte - this is the input dataframe in case you want to load a pandas dataframe to plot charts. In that case, leave filename as an empty string.
header - the row number of the header row in your file. If it is the first row, then this must be zero.
verbose - it has 3 acceptable values: 0, 1 or 2. With 0, you get all charts but limited info. With 1, you get all charts and more info. With 2, you will not see any charts, but they will be quietly generated and saved in your current directory under the AutoViz_Plots directory, which will be created if needed. Delete this folder periodically; otherwise, many charts will accumulate there if you use the verbose=2 option a lot.
lowess - this option is useful for small datasets, where it shows regression lines for each continuous variable against the target variable. Don't use it for large data sets (that is, over 100,000 rows).
chart_format - this can be SVG, PNG or JPG. You will get charts generated and saved in this format if you used verbose=2 option. Very useful for generating charts and using them later.
max_rows_analyzed - limits the max number of rows that is used to display charts. If you have a very large data set with millions of rows, then use this option to limit the amount of time it takes to generate charts. We will take a statistically valid sample.
max_cols_analyzed - limits the number of continuous variables that can be analyzed.
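The argument list above can be summarized in a short sketch. It collects the defaults shown in the Usage section and enforces the rule that only one of filename or dfte is supplied. `autoviz_args` is a hypothetical helper written for illustration; the file name "data.csv" and target "price" are made-up values.

```python
def autoviz_args(filename="", dfte=None, **overrides):
    """Build the AutoViz arguments from the README defaults.

    Hypothetical helper: raises if neither or both data sources are given.
    """
    has_file = bool(filename)
    # An empty string for dfte is treated the same as "no dataframe".
    has_frame = not (dfte is None or isinstance(dfte, str))
    if has_file == has_frame:
        raise ValueError("Provide either filename or dfte, but not both")
    args = {
        "filename": filename,
        "sep": ",",
        "depVar": "",
        "dfte": dfte,
        "header": 0,
        "verbose": 0,
        "lowess": False,
        "chart_format": "svg",
        "max_rows_analyzed": 150000,
        "max_cols_analyzed": 30,
    }
    unknown = set(overrides) - set(args)
    if unknown:
        raise TypeError(f"Unknown argument(s): {sorted(unknown)}")
    args.update(overrides)
    return args


if __name__ == "__main__":
    args = autoviz_args(filename="data.csv", sep=";", depVar="price")
    print(args)
    # With AutoViz installed, the call would presumably look like this
    # (assuming AutoViz accepts these names as keyword arguments):
    # from autoviz.AutoViz_Class import AutoViz_Class
    # AV = AutoViz_Class()
    # dft = AV.AutoViz(**args)
```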

Maintainers

Contributing

See the contributing file!

PRs accepted.

License

Apache License, Version 2.0

DISCLAIMER

This project is not an official Google project. It is not supported by Google and Google specifically disclaims all warranties as to its quality, merchantability, or fitness for a particular purpose.

Comments

Project logo [help wanted]

If anyone with design sensibilities sees this: we are open to changing the project logo.

We like the pandas logo for example https://github.com/pandas-dev/pandas

opened by morenoh149 18
How do we see output using a script file in a terminal?

Hi AutoViML,

Firstly, congratulations and thanks for this wonderful package. It works perfectly fine with Jupyter notebooks, but how do I use it from an IDE, let's say Spyder?

Thanks in advance. Mohit

opened by bansalism2 9
"Not able to read or load file. Please check your inputs and try again..."

hello Ram, when I run the code on my dataset, dft = av.AutoViz('', sep, target, df), I get this error: "Not able to read or load file. Please check your inputs and try again..."

what could the issue be?

opened by isioma42 8
exporting the report

Similar projects to AutoViz are Sweetviz and Pandas Profiling.

They can export the report as an HTML file. I wonder if this library also has this function?

opened by sunset1234321 5
some variables in data removed automatically

Hi, I gave an input CSV containing 20 variables. During preprocessing, it removed all the important columns. May I know the reason? Note: the removed columns contain full data without null values.

opened by sivanagendra123 5
Not able to read or load file using Auto viz

I am getting the above error while using AutoViz. My target variable is price; I even tried an empty target variable, with the same result. I don't know what the problem is. I also tried passing a dataframe and got the same issue, although I was able to read my data with pandas. I have attached a zip file of the data (house_price.zip) because it was large. Kindly assist.

opened by gbogobabs 4

Suggested Updates for Wordcloud

1. Updating Stopwords List

Currently, stop words are defined as a hard-coded list, and it is missing a few stop words such as "themselves".


def return_stop_words():
    STOP_WORDS = ['it', "this", "that", "to", 'its', 'am', 'is', 'are', 'was', 'were', 'a',
                'an', 'the', 'and', 'or', 'of', 'at', 'by', 'for', 'with', 'about', 'between',
                 'into','above', 'below', 'from', 'up', 'down', 'in', 'out', 'on', 'over',
                  'under', 'again', 'further', 'then', 'once', 'all', 'any', 'both', 'each',
                   'few', 'more', 'most', 'other', 'some', 'such', 'only', 'own', 'same', 'so',
                    'than', 'too', 'very', 's', 't', 'can', 'just', 'd', 'll', 'm', 'o', 're',
                    've', 'y', 'ain', 'ma']
    add_words = ["s", "m",'you', 'not',  'get', 'no', 'via', 'one', 'still', 'us', 'u','hey','hi','oh','jeez',
                'the', 'a', 'in', 'to', 'of', 'i', 'and', 'is', 'for', 'on', 'it', 'got','aww','awww',
                'not', 'my', 'that', 'by', 'with', 'are', 'at', 'this', 'from', 'be', 'have', 'was',
                '', ' ', 'say', 's', 'u', 'ap', 'afp', '...', 'n', '\\']
    stop_words = list(set(STOP_WORDS+add_words))
    return sorted(stop_words)

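Isn't it better to use the NLTK stop words list?

import nltk
from nltk.corpus import stopwords

nltk.download("stopwords")  # one-time download of the stop word corpus

langs = ["english"]  # languages to include
stop_words = sorted({word for lang in langs for word in stopwords.words(lang)})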

Copied from: https://gist.github.com/sebleier/554280

2. Lemmatization before plotting

I think it is better to lemmatize the data before plotting, so that words like "reads" and "reading" count as the same word, which will give us a better word cloud.
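As a rough illustration of the idea, the sketch below merges word variants before counting. It uses a crude suffix-stripping rule as a stand-in, which is stemming rather than true lemmatization; a real implementation would more likely use a proper lemmatizer such as NLTK's WordNetLemmatizer. `crude_stem` and `word_counts` are hypothetical names, not AutoViz functions.

```python
from collections import Counter

# Suffixes are tried in this order; longer ones first.
SUFFIXES = ("ing", "ed", "s")


def crude_stem(word):
    """Strip one common English suffix, keeping at least three letters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def word_counts(text):
    """Count words after merging simple variants such as reads/reading."""
    return Counter(crude_stem(word) for word in text.split())


if __name__ == "__main__":
    print(word_counts("read reads reading"))
```

The rule is deliberately simplistic and will mangle some words (for example, "this" loses its final "s"), which is why a dictionary-based lemmatizer is the better choice in practice.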

opened by chekoduadarsh 4

Misplaced graph x ylabel

Hi Ram, I have tried this package and found a potential bug. When I run AV.AutoViz('', ',', 'target', df), the x and y labels of each graph are swapped (the x label appears where the y label should be, and vice versa). I have tried two datasets and it happened with both. Please look into this and see whether it is a bug or I did something wrong. Thanks! Jeff

opened by HiIamJeff 4
"panel" dependency support for the latest version

requirements.txt has a dependency on "panel" pinned to the 0.12.x series, which is too old. The current panel release is 0.14.x and includes new bug fixes.

Suggestion: panel~=0.12.6 ---> support new panel version (0.14.0)

opened by celestinoxp 3
Can not draw Wordcloud plot

Shape of your Data Set loaded: (3312, 20)
#######################################################################################
######################## C L A S S I F Y I N G V A R I A B L E S ####################
#######################################################################################
Classifying variables in data set...
20 Predictors classified...
2 variables removed since they were ID or low-information variables
Number of All Scatter Plots = 6
Could not draw wordcloud plot for Order ID
Could not draw wordcloud plot for Product ID
Could not draw wordcloud plot for Product Name
Could not draw wordcloud plot for Customer ID
Could not draw wordcloud plot for Customer Name
Could not draw wordcloud plot for City
Time to run AutoViz = 5 seconds
###################### AUTO VISUALIZATION Completed ########################

opened by tonovoh1 3

[bug] problem with time series charts

Here is a minimal reproducible example with Google Colab:

The date time column is not recognized when the input is a file:

!pip install autoviz
from autoviz.AutoViz_Class import AutoViz_Class

import pandas as pd

AV = AutoViz_Class()

df = pd.DataFrame({'time': ['2020-01-15', '2020-02-15', '2020-03-15', '2020-04-15', '2020-05-15'], 'values': [1.0,2.5,3.2,4.2,5.6]})
df['time'] = pd.to_datetime(df['time'])
df.to_csv('ts.csv', index=False)

dft = AV.AutoViz("ts.csv", verbose=2)

Shape of your Data Set loaded: (5, 2)
############## C L A S S I F Y I N G  V A R I A B L E S  ####################
Classifying variables in data set...
Data Set Shape: 5 rows, 2 cols
Data Set columns info:
* time: 0 nulls, 5 unique vals, most common: {'2020-05-15': 1, '2020-03-15': 1}
* values: 0 nulls, 5 unique vals, most common: {3.2: 1, 5.6: 1}
--------------------------------------------------------------------
    Numeric Columns: ['values']
    Integer-Categorical Columns: []
    String-Categorical Columns: []
    Factor-Categorical Columns: []
    String-Boolean Columns: []
    Numeric-Boolean Columns: []
    Discrete String Columns: []
    NLP text Columns: []
    Date Time Columns: []
    ID Columns: ['time']
    Columns that will not be considered in modeling: []
    2 Predictors classified...
        This does not include the Target column(s)
        1 variables removed since they were ID or low-information variables
    List of variables removed: ['time']
No categorical or numeric vars in data set. Hence no bar charts.
Time to run AutoViz (in seconds) = 0.562

When the input is a dataframe, the chart is not generated, but the date time column is recognized:

dft = AV.AutoViz("", dfte=df, verbose=2)
Shape of your Data Set loaded: (5, 2)
############## C L A S S I F Y I N G  V A R I A B L E S  ####################
Classifying variables in data set...
Data Set Shape: 5 rows, 2 cols
Data Set columns info:
* time: 0 nulls, 5 unique vals, most common: {Timestamp('2020-05-15 00:00:00'): 1, Timestamp('2020-04-15 00:00:00'): 1}
* values: 0 nulls, 5 unique vals, most common: {3.2: 1, 5.6: 1}
--------------------------------------------------------------------
    Numeric Columns: ['values']
    Integer-Categorical Columns: []
    String-Categorical Columns: []
    Factor-Categorical Columns: []
    String-Boolean Columns: []
    Numeric-Boolean Columns: []
    Discrete String Columns: []
    NLP text Columns: []
    Date Time Columns: ['time']
    ID Columns: []
    Columns that will not be considered in modeling: []
    2 Predictors classified...
        This does not include the Target column(s)
        No variables removed since no ID or low-information variables found in data set
Could not draw Date Vars
No categorical or numeric vars in data set. Hence no bar charts.
Time to run AutoViz (in seconds) = 0.408

Expected result: chart with date on x-axis, and value on y-axis.
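One possible reading of the two logs above: a CSV round trip stores dates as plain strings, so the file path sees five unique strings (classified as an ID column), while the dataframe path sees real timestamps. The sketch below uses only the standard library to show how string columns read from a CSV could be checked for ISO dates before plotting. `find_date_columns` is a hypothetical helper and does not reflect AutoViz internals.

```python
import csv
import io
from datetime import date


def find_date_columns(csv_text):
    """Return names of columns whose non-empty values all parse as ISO dates."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    date_columns = []
    for name in reader.fieldnames or []:
        values = [row[name] for row in rows if row[name]]
        if not values:
            continue  # an empty column cannot be identified as dates
        try:
            for value in values:
                date.fromisoformat(value)  # raises ValueError on non-dates
        except ValueError:
            continue
        date_columns.append(name)
    return date_columns


if __name__ == "__main__":
    text = "time,values\n2020-01-15,1.0\n2020-02-15,2.5\n2020-03-15,3.2\n"
    print(find_date_columns(text))
```

With pandas, the equivalent step is to pass parse_dates to pd.read_csv so the column regains its datetime dtype before the dataframe is handed to AutoViz.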

opened by mglowacki100 3

Clean up README

I personally found the README very hard to read. I had to scroll three quarters of the way down the page to learn how to use the library. I think it would be better to move the new features from the top of the file to the second half of the README, for advanced users, so that beginners can start using the package more easily.

opened by kevinlinxc 1
The title part at the top of the output image is cut off

When using verbose=2 to output an SVG or PNG file, the title at the top of the image is cut off. There seems to be a problem with the height setting; please check.

opened by beobest2 2

Owner

AutoViz and Auto_ViML

Automated Machine Learning: Build Variant Interpretable Machine Learning models. Project Created by Ram Seshadri.

GitHub

