Automatically Build Multiple ML Models with a Single Line of Code. Created by Ram Seshadri. Collaborators Welcome. Permission Granted upon Request.

AutoViz and Auto_ViML

Last update: Dec 30, 2022

Related tags

Deep Learning python machine-learning scikit-learn python3 xgboost automl tpot automated-machine-learning autosklearn autokeras auto-viml automl-algorithms

Overview

Auto-ViML

Automatically Build Variant Interpretable ML models fast! Auto_ViML is pronounced "auto vimal" (autovimal logo created by Sanket Ghanmare)

NEW FEATURES in this version are:
1. SMOTE -> now we use SMOTE for imbalanced data. Just set Imbalanced_Flag = True in input below
2. Auto_NLP: It automatically detects Text variables and does NLP processing on those columns
3. Date Time Variables: It automatically detects date time variables and adds extra features
4. Feature Engineering: Now you can perform feature engineering with the available featuretools library.

To upgrade to the best, most stable and full-featured version (anything over 0.1.600), do one of the following:
Use $ pip install autoviml --upgrade --ignore-installed
or pip install git+https://github.com/AutoViML/Auto_ViML.git

Background
Install
Usage
Tips for using Auto_ViML
API
Maintainers
Contributing
License

Background

Read this Medium article to learn how to use Auto_ViML.

Auto_ViML was designed for building High Performance Interpretable Models with the fewest variables. The "V" in Auto_ViML stands for Variable because it tries multiple models with multiple features to find you the best performing model for your dataset. The "i" in Auto_ViML stands for "interpretable" since Auto_ViML selects the least number of features necessary to build a simpler, more interpretable model. In most cases, Auto_ViML builds models with 20-99% fewer features than a similar performing model with all included features (this is based on my trials. Your experience may vary).

Auto_ViML is every Data Scientist's model assistant that:

Helps you with data cleaning: you can send in your entire dataframe as is and Auto_ViML will suggest changes to help with missing values, formatting variables, adding variables, etc. It loves dirty data. The dirtier the better!
Assists you with variable classification: Auto_ViML classifies variables automatically. This is very helpful when you have hundreds if not thousands of variables since it can readily identify which of those are numeric vs categorical vs NLP text vs date-time variables and so on.
Performs feature reduction automatically. When you have small data sets and you know your domain well, it is easy to perhaps do EDA and identify which variables are important. But when you have a very large data set with hundreds if not thousands of variables, selecting the best features from your model can mean the difference between a bloated and highly complex model or a simple model with the fewest and most information-rich features. Auto_ViML uses XGBoost repeatedly to perform feature selection. You must try it on your large data sets and compare!
Produces model performance results as graphs automatically. Just set verbose = 1 (or) 2
Handles text, date-time, structs (lists, dictionaries), numeric, boolean, factor and categorical variables all in one model using one straight process.
Allows you to use the featuretools library to do Feature Engineering.
See example below.
Let's say you have a few numeric features in your data called "preds". You can 'add','subtract','multiply' or 'divide' these features among themselves using this module. You can optionally send an ID column in the data so that the index ordering is preserved.
```
from autoviml.feature_engineering import feature_engineering

print(df[preds].shape)

dfmod = feature_engineering(df[preds],['add'],'ID')

print(dfmod.shape)
```

Auto_ViML is built using scikit-learn, NumPy, pandas and matplotlib. It should run on most Python 3 Anaconda installations. You won't have to import any special libraries other than the "XGBoost", "Imbalanced-Learn", "CatBoost", and "featuretools" libraries. We use the "SHAP" library for interpretability.
But if you don't have these libraries, Auto_ViML will install those for you automatically.

Install

Prerequisites:

Anaconda

To clone Auto_ViML, it is better to create a new environment, and install the required dependencies:

To install from PyPi:

conda create -n <your_env_name> python=3.7 anaconda
conda activate <your_env_name> # on older conda versions: `source activate <your_env_name>` (Linux/macOS) or `activate <your_env_name>` (Windows)
pip install autoviml
or
pip install git+https://github.com/AutoViML/Auto_ViML.git

To install from source:

cd <AutoVIML_Destination>
git clone git@github.com:AutoViML/Auto_ViML.git
# or download and unzip https://github.com/AutoViML/Auto_ViML/archive/master.zip
conda create -n <your_env_name> python=3.7 anaconda
conda activate <your_env_name> # on older conda versions: `source activate <your_env_name>` (Linux/macOS) or `activate <your_env_name>` (Windows)
cd Auto_ViML
pip install -r requirements.txt

Usage

In the same directory, open a Jupyter Notebook and use this line to import the .py file:

from autoviml.Auto_ViML import Auto_ViML

Load a data set (any CSV or text file) into a Pandas dataframe and split it into Train and Test dataframes. If you don't have a test dataframe, you can simply assign the test variable below to '' (empty string):

model, features, trainm, testm = Auto_ViML(
    train,
    target,
    test,
    sample_submission,
    hyper_param="GS",
    feature_reduction=True,
    scoring_parameter="weighted-f1",
    KMeans_Featurizer=False,
    Boosting_Flag=False,
    Binning_Flag=False,
    Add_Poly=False,
    Stacking_Flag=False,
    Imbalanced_Flag=False,
    verbose=0,
)

Finally, it writes your submission file to disk in the current directory called mysubmission.csv. This submission file is ready for you to show to clients or submit to competitions. If no submission file was given, it will still create a submission file for you named mySubmission.csv, as long as you give it a test file name. Auto_ViML works on any Multi-Class, Multi-Label Data Set, so you can have many target labels. You don't have to tell Auto_ViML whether it is a Regression or Classification problem.

Tips for using Auto_ViML:

For Classification problems and imbalanced classes, choose scoring_parameter="balanced_accuracy". It works better.
For Imbalanced Classes (<5% samples in rare class), choose Imbalanced_Flag=True. You can also set this flag to True for Regression problems where the target variable might have skewed distributions.
For a Multi-Label dataset, the target variable can be sent in as a list of variables.
It is recommended that you first set Boosting_Flag=None to get a Linear model. Once you understand that, then you can try to set Boosting_Flag=False to get a Random Forest model. Finally, try Boosting_Flag=True to get an XGBoost model. This is the order that we recommend in order to use Auto_ViML.
Finally try Boosting_Flag="CatBoost" to get a complex but high performing model.
Binning_Flag=True improves a CatBoost model since it adds to the list of categorical vars in data
KMeans_Featurizer=True works well in NLP and CatBoost models since it creates cluster variables
Add_Poly=3 improves certain models where there is date-time or categorical and numeric variables
feature_reduction=True is the default and works best. But when you have <10 features in data, set it to False
Do not use Stacking_Flag=True with Linear models since your results may not look great.
Use Stacking_Flag=True only for complex models and as a last step with Boosting_Flag=True or CatBoost
Always set hyper_param="RS" as input since it runs faster than GridSearchCV and gives better results!
KMeans_Featurizer=True does not work well for small data sets. Use it for data sets > 10,000 rows.
Finally Auto_ViML is meant to be a baseline or challenger solution to your data set. So use it for making quick models that you can compare against or in Hackathons. It is not meant for production!
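
The rare-class threshold mentioned in the tips above can be checked before choosing flags. The following is a minimal sketch in plain Python; the helper name `rare_class_share` and the sample labels are hypothetical, and the 5% cutoff is simply the figure quoted in the tips.

```python
from collections import Counter

def rare_class_share(labels):
    """Return the fraction of rows belonging to the least frequent class."""
    counts = Counter(labels)
    return min(counts.values()) / len(labels)

# Hypothetical target column: 97 negatives and 3 positives.
labels = [0] * 97 + [1] * 3
share = rare_class_share(labels)

# Suggested flag values following the tips above (5% is the threshold quoted there).
suggested = {
    "Imbalanced_Flag": share < 0.05,
    "scoring_parameter": "balanced_accuracy",
}
print(share, suggested)
```

The resulting dictionary could then be passed as keyword arguments to Auto_ViML alongside the other inputs.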

API

Arguments

train: could be a datapath+filename or a dataframe. It will detect which is which and load it.
test: could be a datapath+filename or a dataframe. If you don't have any, just leave it as "".
sample_submission: must be a datapath+filename. If you don't have any, just leave it as empty string.
target: name of the target variable in the data set.
sep: if you have a separator in the file such as "," or "\t" mention it here. Default is ",".
scoring_parameter: if you want your own scoring parameter such as "f1" give it here. If not, it will assume the appropriate scoring param for the problem and it will build the model.
hyper_param: Tuning options are GridSearch ('GS') and RandomizedSearch ('RS'). Default is 'RS'.
feature_reduction: Default = 'True' but it can be set to False if you don't want automatic feature_reduction since in Image data sets like digits and MNIST, you get better results when you don't reduce features automatically. You can always try both and see.
KMeans_Featurizer
- True: Adds a cluster label to features based on KMeans. Use for Linear.
- False (default) For Random Forests or XGB models, leave it False since it may overfit.
Boosting_Flag: you have 4 possible choices (default is False):
- None This will build a Linear Model
- False This will build a Random Forest or Extra Trees model (also known as Bagging)
- True This will build an XGBoost model
- CatBoost This will build a CatBoost model (provided you have CatBoost installed)
Add_Poly: Default is 0 which means do-nothing. But it has three interesting settings:
- 1 Add interaction variables only such as x1*x2, x2*x3, ..., x9*x10 etc.
- 2 Add Interactions and Squared variables such as x1**2, x2**2, etc.
- 3 Adds both Interactions and Squared variables such as x1*x2, x1**2, x2*x3, x2**2, etc.
Stacking_Flag: Default is False. If set to True, it will add an additional feature which is derived from predictions of another model. This is used in some cases but may result in overfitting. So be careful turning this flag "on".
Binning_Flag: Default is False. If set to True, it will convert the top numeric variables into binned variables through a technique known as "Entropy" binning. This is very helpful for certain datasets (especially hard to build models).
Imbalanced_Flag: Default is False. If set to True, it will use SMOTE from Imbalanced-Learn to oversample the "Rare Class" in an imbalanced dataset and make the classes balanced (50-50 for example in a binary classification). This also works for Regression problems where you have highly skewed distributions in the target variable. Auto_ViML creates additional samples using SMOTE for Highly Imbalanced data.
verbose: This has 3 possible states:
- 0 limited output. Great for running this silently and getting fast results.
- 1 more charts. Great for knowing how results were and making changes to flags in input.
- 2 lots of charts and output. Great for reproducing what Auto_ViML does on your own.
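
To make the Add_Poly settings more concrete, here is a small sketch of what interaction and squared terms look like for a single row of numeric features. It is an illustration in plain Python with a hypothetical helper `add_poly_terms`, not Auto_ViML's internal implementation, and it assumes interactions are pairwise products of distinct columns.

```python
from itertools import combinations

def add_poly_terms(row, interactions=True, squares=True):
    """Return a copy of `row` (a dict of numeric features) with extra terms."""
    out = dict(row)
    if interactions:
        # Pairwise products of distinct columns, e.g. x1*x2.
        for a, b in combinations(row, 2):
            out[f"{a}*{b}"] = row[a] * row[b]
    if squares:
        # Squared terms, e.g. x1**2.
        for name, value in row.items():
            out[f"{name}**2"] = value ** 2
    return out

row = {"x1": 2.0, "x2": 3.0, "x3": 5.0}
print(add_poly_terms(row, interactions=True, squares=False))
print(add_poly_terms(row))
```

With three input columns the full expansion already has nine columns, which is one reason to combine such terms with feature reduction.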

Return values

model: It will return your trained model
features: the fewest number of features in your model to make it perform well
train_modified: this is the modified train dataframe after removing and adding features
test_modified: this is the modified test dataframe with the same transformations as train

Maintainers

Contributing

See the contributing file!

PRs accepted.

License

DISCLAIMER

This project is not an official Google project. It is not supported by Google and Google specifically disclaims all warranties as to its quality, merchantability, or fitness for a particular purpose.

Comments

getting ValueError when running notebook with XGBoost on Titanic dataset.

Hi,

Thanks for sharing your work! I just tested the titanic dataset downloaded from https://www.kaggle.com/c/titanic/data with XGBoost as below- m, feats, trainm, testm = Auto_ViML(train, target, test, sample_submission, scoring_parameter=scoring_parameter, hyper_param='GS',feature_reduction=True, Boosting_Flag=True,Binning_Flag=False, Add_Poly=0, Stacking_Flag=False, Imbalanced_Flag=False, verbose=1)

Once I ran the above code then found below error- ValueError: DataFrame.dtypes for data must be int, float or bool. Did not expect the data types in fields Name

It seems same error occurs in case of Boosting_Flag=None. Logs of the console just prior to error is as below-

Train (Size: 891,12) has Single_Label with target: ['Survived']
" ################### Binary-Class ##################### "
Shuffling the data set before training
Class -> Counts -> Percent
1: 342 -> 38.4%
0: 549 -> 61.6%
Selecting 2-Class Classifier...
Using GridSearchCV for Hyper Parameter tuning...
Target Survived is already numeric. No transformation done.
Top columns in Train with missing values: ['Cabin', 'Age', 'Embarked']
and their missing value totals: [687, 177, 2]
Classifying variables in data set...
Number of Numeric Columns = 2
Number of Integer-Categorical Columns = 3
Number of String-Categorical Columns = 1
Number of Factor-Categorical Columns = 0
Number of String-Boolean Columns = 1
Number of Numeric-Boolean Columns = 0
Number of Discrete String Columns = 2
Number of NLP String Columns = 0
Number of Date Time Columns = 0
Number of ID Columns = 2
Number of Columns to Delete = 0
11 Predictors classified...
This does not include the Target column(s)
2 variables removed since they were some ID or low-information variables
Completed Label Encoding, Missing Value Imputing and Scaling of data without errors.
No Missing values in Train
Test data has no missing values
Number of numeric variables = 5
No variables were removed since no highly correlated variables found in data

Data Ready for Modeling with Target variable = Survived
Starting Selection among 11 predictors...
Number of numeric variables = 5
No variables were removed since no highly correlated variables found in data
Adding 6 categorical variables to reduced numeric variables of 5
Selected No. of variables = 11
Finding Important Features... in 11 variables

opened by dsbyprateekg 17

TypeError when performing Auto_NLP for a regression problem

Hi AutoViML community,

Thank you for providing this amazing package.

I am trying my hands on a regression problem and ended up with typeerror. Please note that the same can be replicated in a kaggle kernel. The code and error is shared below for your reference.

Thanks -

nlp_column = 'Product_Information'
target = 'Product_Price'
train_nlp, test_nlp, nlp_transformer, preds = Auto_NLP(
                nlp_column, train, test, target, score_type='neg_mean_squared_error',
                modeltype='Regression',top_num_features=50, verbose=2,
                build_model=True)

error:
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-29-1ec6fbec18bb> in <module>
      4                 nlp_column, train, test, target, score_type='neg_mean_squared_error',
      5                 modeltype='Regression',top_num_features=50, verbose=2,
----> 6                 build_model=True)

/opt/conda/lib/python3.7/site-packages/autoviml/Auto_NLP.py in Auto_NLP(nlp_column, train, test, target, score_type, modeltype, top_num_features, verbose, build_model)
   1197         gs = RandomizedSearchCV(nlp_model,params, n_iter=10, cv=scv,
   1198                                 scoring=score_type, random_state=seed)
-> 1199         gs.fit(X_train_dtm,y_train)
   1200         y_pred = gs.predict(X_test_dtm)
   1201         ##### Print the model results on Cross Validation data set (held out)

/opt/conda/lib/python3.7/site-packages/sklearn/model_selection/_search.py in fit(self, X, y, groups, **fit_params)
    737             refit_start_time = time.time()
    738             if y is not None:
--> 739                 self.best_estimator_.fit(X, y, **fit_params)
    740             else:
    741                 self.best_estimator_.fit(X, **fit_params)

/opt/conda/lib/python3.7/site-packages/sklearn/linear_model/_least_angle.py in fit(self, X, y, Xy)
    955             returns an instance of self.
    956         """
--> 957         X, y = check_X_y(X, y, y_numeric=True, multi_output=True)
    958 
    959         alpha = getattr(self, 'alpha', 0.)

/opt/conda/lib/python3.7/site-packages/sklearn/utils/validation.py in check_X_y(X, y, accept_sparse, accept_large_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, multi_output, ensure_min_samples, ensure_min_features, y_numeric, warn_on_dtype, estimator)
    753                     ensure_min_features=ensure_min_features,
    754                     warn_on_dtype=warn_on_dtype,
--> 755                     estimator=estimator)
    756     if multi_output:
    757         y = check_array(y, 'csr', force_all_finite=True, ensure_2d=False,

/opt/conda/lib/python3.7/site-packages/sklearn/utils/validation.py in check_array(array, accept_sparse, accept_large_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, ensure_min_samples, ensure_min_features, warn_on_dtype, estimator)
    509                                       dtype=dtype, copy=copy,
    510                                       force_all_finite=force_all_finite,
--> 511                                       accept_large_sparse=accept_large_sparse)
    512     else:
    513         # If np.array(..) gives ComplexWarning, then we convert the warning

/opt/conda/lib/python3.7/site-packages/sklearn/utils/validation.py in _ensure_sparse_format(spmatrix, accept_sparse, dtype, copy, force_all_finite, accept_large_sparse)
    304 
    305     if accept_sparse is False:
--> 306         raise TypeError('A sparse matrix was passed, but dense '
    307                         'data is required. Use X.toarray() to '
    308                         'convert to a dense numpy array.')

TypeError: A sparse matrix was passed, but dense data is required. Use X.toarray() to convert to a dense numpy array.

opened by abdul0807 5

Getting unboundlocalerror when setting Boosting_flag=True

Getting error when trying to use xgboost.

File "C:\ProgramData\Anaconda3\lib\site-packages\autoviml\Auto_ViML.py", line 1242, in Auto_ViML
    print('%d-fold Cross Validation %s = %0.1f%%' %(n_splits,scoring_parameter, best_score*100))

UnboundLocalError: local variable 'best_score' referenced before assignment

opened by Naseer5543 5

running Auto_ViML with hyper_param='HO' throws an exception.

I'm testing Auto_ViML with hyperopt on the titanic dataset, running it like this:

train, target, test, verbose=0, hyper_param='HO', )

This throws the following exception:

...
############### M O D E L B U I L D I N G B E G I N S ####################
Rows in Train data set = 640
Features in Train data set = 10
Rows in held-out data set = 161
Finding Best Model and Hyper Parameters for Target: Survived...
Baseline Accuracy Needed for Model = 62.17%
CPU Count = 8 in this device
Using Linear Model, Estimated Training time = 0.02 mins
Error: Not able to print validation metrics. Continuing...
Actual training time (in seconds): 0
########### S I N G L E M O D E L R E S U L T S #################

UnboundLocalError                         Traceback (most recent call last)
/tmp/core/run_auto-viml.py in
     42     train, target, test,
     43     verbose=0,
---> 44     hyper_param='HO',
     45 )
     46

/usr/local/lib/python3.7/site-packages/autoviml/Auto_ViML.py in Auto_ViML(train, target, test, sample_submission, hyper_param, feature_reduction, scoring_parameter, Boosting_Flag, KMeans_Featurizer, Add_Poly, Stacking_Flag, Binning_Flag, Imbalanced_Flag, verbose)
   1725 ############## This is for Classification Only !! ########################
   1726 if scoring_parameter in ['logloss','neg_log_loss','log_loss','log-loss','']:
-> 1727 print('{}-fold Cross Validation {} = {}'.format(n_splits, 'logloss', best_score))
   1728 elif scoring_parameter in ['accuracy','balanced-accuracy','balanced_accuracy','roc_auc','roc-auc',
   1729 'f1','precision','recall','average-precision','average_precision',

UnboundLocalError: local variable 'best_score' referenced before assignment

Specifying the evaluation metric doesn't fix the issue. I'm using `autoviml==0.1.651`.

opened by manugarri 4

How are you handling preprocessing steps during prediction?

Hi @rsesha

Currently the Auto_ViML function returns the best model (XGB), features (array), train metrics and test metrics. But how do you suggest handling the preprocessing for the prediction dataset?

For Example, if you are applying LabelEncoding on a column inside the Auto_ViML function during training
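
One common pattern for this concern, shown purely as an illustration and not as what Auto_ViML does internally, is to fit the encoding on the training data, keep the fitted mapping, and reuse it on new data. The class `SimpleLabelEncoder` is hypothetical, and mapping unseen categories to -1 is an assumption made for the sketch.

```python
class SimpleLabelEncoder:
    """Tiny label encoder that remembers the mapping learned on training data."""

    def fit(self, values):
        # Assign integer codes in sorted order of the distinct training values.
        self.mapping_ = {v: i for i, v in enumerate(sorted(set(values)))}
        return self

    def transform(self, values, unknown=-1):
        # Reuse the training mapping; unseen categories get the `unknown` code.
        return [self.mapping_.get(v, unknown) for v in values]

train_embarked = ["S", "C", "S", "Q"]
new_embarked = ["Q", "S", "X"]  # "X" never appeared in training

encoder = SimpleLabelEncoder().fit(train_embarked)
print(encoder.transform(train_embarked))
print(encoder.transform(new_embarked))
```

The key point is that the same fitted object is applied at prediction time, so it has to be saved together with the model.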

opened by deneshkumar 4

ModuleNotFoundError: No module named 'lightgbm'

I've installed from pip. I'm using py 3.9.

When attempting:

train_x, test_x, final, predicted= Auto_NLP(input_feature, train, test,target,score_type="balanced_accuracy",top_num_features=100,modeltype="Classification", verbose=2, build_model=True)

I got:

[nltk_data]    | 
[nltk_data]  Done downloading collection popular

---------------------------------------------------------------------------
ModuleNotFoundError                       Traceback (most recent call last)
Input In [35], in <cell line: 1>()
----> 1 train_x, test_x, final, predicted= Auto_NLP(input_feature, train, test,target,score_type="balanced_accuracy",top_num_features=100,modeltype="Classification", verbose=2, build_model=True)

File /usr/local/lib/python3.9/site-packages/autoviml/Auto_NLP.py:1334, in Auto_NLP(nlp_column, train, test, target, score_type, modeltype, top_num_features, verbose, build_model)
   1332 nltk.download("popular")
   1333 calibrator_flag = False
-> 1334 from lightgbm import LGBMClassifier, LGBMRegressor
   1335 seed = 99
   1336 train = copy.deepcopy(train)

ModuleNotFoundError: No module named 'lightgbm'

A missing dependency?
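
A quick way to confirm which optional packages are missing before calling Auto_NLP is to probe for them with the standard library. This sketch uses `importlib.util.find_spec`; the helper name `missing_modules` and the list of package names checked are illustrative assumptions, not an official dependency list.

```python
from importlib.util import find_spec

def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if find_spec(name) is None]

# Illustrative list; "lightgbm" is the module named in the traceback above.
to_check = ["lightgbm", "xgboost", "catboost"]
missing = missing_modules(to_check)
if missing:
    print("Install with: pip install " + " ".join(missing))
else:
    print("All listed modules are importable.")
```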

opened by emsi 3

UnboundLocalError: local variable 'missing_cols' referenced before assignment

I'm getting the following error:

Filling missing values with "missing" placeholder and adding a column for missing_flags

UnboundLocalError                         Traceback (most recent call last)
in
     12     Stacking_Flag=False,
     13     Imbalanced_Flag=True,
---> 14     verbose=2,
     15 )

/sas_cambrian/Projects/hamilton_attrib_276597/rxb427/python3_env/lib/python3.6/site-packages/autoviml/Auto_ViML.py in Auto_ViML(train, target, test, sample_submission, hyper_param, feature_reduction, scoring_parameter, Boosting_Flag, KMeans_Featurizer, Add_Poly, Stacking_Flag, Binning_Flag, Imbalanced_Flag, verbose)
    754 preds.append(new_missing_col)
    755 missing_flag_cols.append(new_missing_col)
--> 756 elif f in missing_cols:
    757 #### YOu have to do nothing for missing column yet. Leave them as is for Iterative Imputer later ##############
    758 continue

UnboundLocalError: local variable 'missing_cols' referenced before assignment

I didn't have time to dig too deep in the code. But it does appear the missing_cols gets defined in an if statement that appears to be checking if the test dataset exist before it runs. So it appears there is a way that you can get to the 756 check without providing a test dataset.

So either the defining of the missing_cols needs to change or you need to fix the logical error that allows the f in missing_cols check. You could also just get rid of that check since it doesn't do anything right now.
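
The failure mode described here can be reproduced and avoided in a few lines. The sketch below is a simplified stand-in, not the code in Auto_ViML.py: `flag_columns` and its arguments are hypothetical, and it shows the first of the suggested fixes, defining the variable before the conditional.

```python
def flag_columns(columns, test_columns_with_missing=None):
    """Collect columns to keep, without assuming a test set was provided."""
    # Define the list up front so it exists even when no test data is given.
    missing_cols = []
    if test_columns_with_missing is not None:
        missing_cols = list(test_columns_with_missing)

    kept = []
    for f in columns:
        if f in missing_cols:
            # Leave these columns for a later imputation step.
            continue
        kept.append(f)
    return kept

print(flag_columns(["Age", "Cabin", "Fare"]))             # no test set
print(flag_columns(["Age", "Cabin", "Fare"], ["Cabin"]))  # with test set
```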

opened by r0bb23 3

KeyError: "['index'] not in index" while saving results

Hello Team, I ran a simple regression model with the following

model, features, trainm, testm = Auto_ViML(
    train2.reset_index(),
    "Total_Effort",
    x_test,
    "",
    hyper_param="GS",
    feature_reduction=True,
    scoring_parameter="weighted-f1",
    KMeans_Featurizer=True,
    Boosting_Flag=False,
    Binning_Flag=False,
    Add_Poly=False,
    Stacking_Flag=False,
    Imbalanced_Flag=False,
    verbose=2,
)

The error stack -

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-41-58032da7fcff> in <module>
     13     Stacking_Flag=False,
     14     Imbalanced_Flag=False,
---> 15     verbose=2,
     16 )

~\Anaconda3\lib\site-packages\autoviml\Auto_ViML.py in Auto_ViML(train, target, test, sample_submission, hyper_param, feature_reduction, scoring_parameter, Boosting_Flag, KMeans_Featurizer, Add_Poly, Stacking_Flag, Binning_Flag, Imbalanced_Flag, verbose)
   2587         #############################################################################################
   2588         if isinstance(sample_submission, str):
-> 2589             sample_submission = testm[id_cols+[each_target+'_predictions']]
   2590         try:
   2591             write_file_to_folder(sample_submission, each_target, each_target+'_'+modeltype+'_'+'submission.csv')

~\Anaconda3\lib\site-packages\pandas\core\frame.py in __getitem__(self, key)
   2999             if is_iterator(key):
   3000                 key = list(key)
-> 3001             indexer = self.loc._convert_to_indexer(key, axis=1, raise_missing=True)
   3002 
   3003         # take() does not accept boolean indexers

~\Anaconda3\lib\site-packages\pandas\core\indexing.py in _convert_to_indexer(self, obj, axis, is_setter, raise_missing)
   1283                 # When setting, missing keys are not allowed, even with .loc:
   1284                 kwargs = {"raise_missing": True if is_setter else raise_missing}
-> 1285                 return self._get_listlike_indexer(obj, axis, **kwargs)[1]
   1286         else:
   1287             try:

~\Anaconda3\lib\site-packages\pandas\core\indexing.py in _get_listlike_indexer(self, key, axis, raise_missing)
   1090 
   1091         self._validate_read_indexer(
-> 1092             keyarr, indexer, o._get_axis_number(axis), raise_missing=raise_missing
   1093         )
   1094         return keyarr, indexer

~\Anaconda3\lib\site-packages\pandas\core\indexing.py in _validate_read_indexer(self, key, indexer, axis, raise_missing)
   1183             if not (self.name == "loc" and not raise_missing):
   1184                 not_found = list(set(key) - set(ax))
-> 1185                 raise KeyError("{} not in index".format(not_found))
   1186 
   1187             # we skip the warning on Categorical/Interval

KeyError: "['index'] not in index"

Environment -

Windows 10
Python Version: 3.7.6
AutoViz_Class version: 0.0.68

I couldn't get time to look inside the module but it looks like this has something to do with a file lock, as it saves the file Total_Effort_Regression_test_modified.csv, which is locked by the process.

opened by shivam-kotwalia 2

running Auto_ViML with python interpreter throws an ipython exception.

I'm testing Auto_ViML as one of the automl containers in my pipeline. However, running the Auto_ViML function from a script (or importing it from the standard python REPL) throws the following exception:

from autoviml.Auto_ViML import Auto_ViML
  File "/usr/local/lib/python3.7/site-packages/autoviml/Auto_ViML.py", line 29, in <module>
    get_ipython().magic(u'matplotlib inline')
NameError: name 'get_ipython' is not defined

This can be fixed by either using a jupyter notebook (which i imagine is the only thing tested so far) or using ipython instead of python. This is ok for toy examples but in production systems python is the default executable.

Would make sense to make ipython magic not fail.
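
One way to make the magic optional, shown as a sketch rather than the project's actual fix, is to guard the call so it only runs inside IPython. The helper name `enable_inline_plots` is hypothetical; `get_ipython` is only defined inside an IPython or Jupyter session, which is why the `NameError` is caught, and `run_line_magic` is the IPython shell method for running a line magic.

```python
def enable_inline_plots():
    """Turn on inline matplotlib plots when running under IPython; otherwise do nothing."""
    try:
        shell = get_ipython()  # defined only inside IPython/Jupyter
    except NameError:
        return False
    if shell is None:
        return False
    shell.run_line_magic("matplotlib", "inline")
    return True

print("inline plots enabled:", enable_inline_plots())
```

Run with the plain `python` executable, the function returns False and the import can continue instead of failing.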

I am using autoviml==0.1.651

opened by manugarri 2

how to load the Catboost model for predicting new data

I used the Catboost to train the model and saved the trained model by using the code (m.save_model('Catboost.dump')).

However, I failed to load the saved model for predicting new data unless I trained the model again. I used the code (m.load_model('Catboost.dump')); however, the error is that the name "m" is not defined.

The question is how to load such a model for predicting new data without training the model each time.

Thanks!

opened by HazelSCUT 1

Error saving output files to disk

I found this error with saving output files to disk. Maybe a sanitization function for the target should help. Thanks, Daniel

Saving predictions to .\Avg_Quadrat_Yield(t/ha)\Avg_Quadrat_Yield(t/ha)_Regression_test_modified.csv
    Error: Not able to save test modified file. Skipping...
    Saving predictions to .\Avg_Quadrat_Yield(t/ha)\Avg_Quadrat_Yield(t/ha)_Regression_submission.csv
    Error: Not able to save submission file. Skipping...
    Saving predictions to .\Avg_Quadrat_Yield(t/ha)\Avg_Quadrat_Yield(t/ha)_Regression_train_modified.csv
    Error: Not able to save train modified file. Skipping...
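
A sanitization step along the lines suggested could look like the sketch below. The function name `sanitize_for_filename` and the choice of replacing disallowed characters with underscores are assumptions for illustration; this is not part of Auto_ViML.

```python
import re

def sanitize_for_filename(name, replacement="_"):
    """Replace characters that are unsafe in file or folder names."""
    # Keep letters, digits, dot, dash and underscore; replace everything else.
    cleaned = re.sub(r"[^A-Za-z0-9._-]+", replacement, name)
    # Trim replacement characters left at either end; fall back to a default name.
    return cleaned.strip(replacement) or "target"

print(sanitize_for_filename("Avg_Quadrat_Yield(t/ha)"))
```

The target name above contains "/" which is read as a path separator, so cleaning it before building the output path avoids pointing at a folder that does not exist.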

opened by Daniel-Trung-Nguyen 1

Sample Weight Support for Regression Problems

First want to say thank you for the very interesting looking library. I've tried it briefly, and gotten very strong performance.

I wanted to ask whether it would be possible to add sample weight support for regression problems. This is typically done in scikit-learn estimators by simply passing a sample_weight parameter after X, and y. For example, LinearRegression, XGBoost, or Catboost all support the same API, so I'm hopeful this is a fairly straightforward addition.

Under the hood it's typically just multiplying the loss for each row by the sample weight, in order to give certain observations more weight than others. This can be very helpful for problems where you have sensor data with different quality sensors, or for simply downweighting older observations.
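
The weighting idea described above can be written out directly. This is a plain-Python sketch of a weighted mean squared error, with a hypothetical helper `weighted_mse` and made-up numbers; it illustrates the arithmetic only and is not how any particular library implements `sample_weight`.

```python
def weighted_mse(y_true, y_pred, sample_weight=None):
    """Mean squared error where each row's squared error is scaled by its weight."""
    if sample_weight is None:
        sample_weight = [1.0] * len(y_true)
    weighted_errors = [
        w * (t - p) ** 2 for t, p, w in zip(y_true, y_pred, sample_weight)
    ]
    # Normalize by the total weight so equal weights reduce to the ordinary MSE.
    return sum(weighted_errors) / sum(sample_weight)

y_true = [3.0, 5.0, 8.0]
y_pred = [2.0, 5.0, 10.0]

print(weighted_mse(y_true, y_pred))                    # all rows count equally
print(weighted_mse(y_true, y_pred, [1.0, 1.0, 0.25]))  # last row down-weighted
```

Down-weighting the last row shrinks the contribution of its large error, which is the effect wanted for lower-quality or older observations.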

opened by kmedved 3

groups in cross validation?

I would like to use the group params in the cross validation like in sklearn ? Is it possible to do so ?

Otherwise, thank you for your package, this is a very cool work here, especially the visualization part.
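
For reference, scikit-learn exposes this through splitters such as `GroupKFold`, which keep all rows of a group in the same fold. The sketch below shows the idea in plain Python with a hypothetical helper `group_folds`; it is a simplified round-robin assignment, not scikit-learn's algorithm, and it says nothing about whether Auto_ViML supports groups.

```python
def group_folds(groups, n_splits):
    """Yield (train_idx, test_idx) so that no group appears on both sides."""
    unique_groups = sorted(set(groups))
    # Assign whole groups to folds in round-robin order.
    fold_of_group = {g: i % n_splits for i, g in enumerate(unique_groups)}
    for fold in range(n_splits):
        test_idx = [i for i, g in enumerate(groups) if fold_of_group[g] == fold]
        train_idx = [i for i, g in enumerate(groups) if fold_of_group[g] != fold]
        yield train_idx, test_idx

groups = ["a", "a", "b", "c", "c", "d"]
for train_idx, test_idx in group_folds(groups, n_splits=2):
    print(train_idx, test_idx)
```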

opened by crsegerie 2

Training model errors out without context/stacktrace.

Training regular model first time is Erroring: Check if your Input is correct...
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-32-d59141079a76> in <module>
     12     Stacking_Flag=False,
     13     Imbalanced_Flag=False,
---> 14     verbose=0,
     15 )

TypeError: 'NoneType' object is not iterable

I DO think there is probably something wrong with the data I'm feeding in or how I'm feeding it in. However, The fact that only the error is printed and there is no context around where, what, etc. is erroring out is why I'm raising this to an issue. Its near impossible to figure out what is wrong with sparse information provided.

In short, you should probably surface the error from the training of the model here with the stacktrace.
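
A sketch of what surfacing the error could look like: catch the exception, print the full traceback with the standard `traceback` module, and re-raise so the caller still sees the original failure. `train_with_context` and `broken_fit` are hypothetical names used only for illustration.

```python
import traceback

def train_with_context(fit_model, *args, **kwargs):
    """Run a training callable and print the full stack trace if it fails."""
    try:
        return fit_model(*args, **kwargs)
    except Exception:
        print("Training failed; full traceback follows:")
        traceback.print_exc()
        raise  # re-raise so the original exception is not swallowed

def broken_fit(columns):
    # Iterating over None raises: TypeError: 'NoneType' object is not iterable
    return [c for c in columns]

try:
    train_with_context(broken_fit, None)
except TypeError as err:
    print("caught:", err)
```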
opened by r0bb23 1
