StackNet is a computational, scalable and analytical Meta modelling framework

Overview

StackNet

This repository contains StackNet Meta modelling methodology (and software) which is part of my work as a PhD Student in the computer science department at UCL. My PhD was sponsored by dunnhumby.

StackNet is empowered by H2O's algorithms

(NEW) There is a Python implementation of StackNet

StackNet and other topics can now be discussed on Facebook too.

Contents


What is StackNet

StackNet is a computational, scalable and analytical framework, implemented in Java, that resembles a feedforward neural network and uses Wolpert's stacked generalization [1] in multiple levels to improve accuracy in machine learning problems. In contrast to feedforward neural networks, the network is not trained through backpropagation; instead it is built iteratively, one layer at a time (using stacked generalization), with each layer using the final target as its target.

The software is made available under the MIT licence.

[1] Wolpert, D. H. (1992). Stacked generalization. Neural networks, 5(2), 241-259.

How does it work

Given some input data, a neural network normally applies a perceptron along with a transformation function like relu, sigmoid, tanh or others.

The StackNet model assumes that this function can take the form of any supervised machine learning algorithm.

Logically, the outputs of each neuron can be fed into the next layers.

The algorithms can be classifiers, regressors or any estimator that produces an output.

For classification problems, to create an output prediction score for any number of unique categories of the response variable, all selected algorithms in the last layer need to have an output dimensionality equal to the number of those unique classes. Where there are many such classifiers, the result is the scaled average of all these output predictions.
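As a rough illustration of combining several last-layer classifiers, the sketch below averages the per-class scores of all models and then rescales them to sum to 1. The class and method names are hypothetical and not part of StackNet, and reading "scaled" as "rescaled to sum to 1" is an assumption.

```java
import java.util.Arrays;

/** Hypothetical helper, not StackNet code: combines class scores of several models. */
class ScaledAverage {

    /**
     * predictions[m][c] is the score that model m gives to class c.
     * Returns one score per class: the average over models, rescaled to sum to 1
     * (the rescaling step is an assumption about what "scaled" means).
     */
    static double[] scaledAverage(double[][] predictions) {
        int classes = predictions[0].length;
        double[] combined = new double[classes];
        for (double[] modelScores : predictions) {
            for (int c = 0; c < classes; c++) {
                combined[c] += modelScores[c] / predictions.length;
            }
        }
        double total = Arrays.stream(combined).sum();
        if (total > 0) {
            for (int c = 0; c < classes; c++) {
                combined[c] /= total;
            }
        }
        return combined;
    }

    public static void main(String[] args) {
        double[][] predictions = {
            {0.2, 0.5, 0.1},  // scores from a first last-layer model
            {0.4, 0.4, 0.2}   // scores from a second last-layer model
        };
        System.out.println(Arrays.toString(scaledAverage(predictions)));
    }
}
```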

The Modes

The stacking element of the StackNet model can be run in two different modes.

Normal stacking mode

The first mode (i.e. the default) is the one already mentioned. It assumes that each layer uses the predictions (or output scores) of the directly preceding layer, similar to a typical feedforward neural network.

Restacking mode

The second mode (also called restacking) assumes that each layer uses the activations of the previous layer's neurons as well as those of all earlier layers (including the input layer).

The intuition behind this mode is that the higher-level algorithm has extracted information from the input data, but rescanning the input space may yield new information that was not obvious from the first passes. This is also driven by the forward training methodology discussed below and assumes that convergence needs to happen within one model iteration.
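The difference between the two modes can be sketched as a choice of input matrix for the next layer. This is a minimal illustration, not StackNet's internal code: the class name, method name and column order are assumptions.

```java
import java.util.Arrays;

/** Hypothetical sketch of how the input of the next layer could be assembled. */
class LayerInput {

    /**
     * previousInput[i] holds the features the previous layer was trained on for row i.
     * predictions[i] holds the outputs of the previous layer's models for row i.
     * Normal stacking passes on only the predictions; restacking appends them to
     * everything the previous layer saw, so the original inputs are carried forward.
     */
    static double[][] nextLayerInput(double[][] previousInput, double[][] predictions, boolean restack) {
        double[][] out = new double[predictions.length][];
        for (int i = 0; i < predictions.length; i++) {
            if (!restack) {
                out[i] = predictions[i].clone();
            } else {
                int oldLength = previousInput[i].length;
                double[] row = Arrays.copyOf(previousInput[i], oldLength + predictions[i].length);
                System.arraycopy(predictions[i], 0, row, oldLength, predictions[i].length);
                out[i] = row;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        double[][] input = {{1.0, 2.0}};
        double[][] predictions = {{0.7}};
        System.out.println(Arrays.toString(nextLayerInput(input, predictions, false)[0]));
        System.out.println(Arrays.toString(nextLayerInput(input, predictions, true)[0]));
    }
}
```

Applying the restacking branch layer after layer means each new layer sees the input layer plus the outputs of every earlier layer.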

The modes may also be viewed below:

[Figure: normal stacking and restacking modes]

K-fold Training

Typical neural networks are most commonly trained with a form of backpropagation. Stacked generalization, however, requires a forward training methodology that splits the data into two parts, one of which is used for training and the other for predictions. This split is necessary to avoid overfitting.

However, splitting the data into just two parts would mean that in each new layer the second part needs to be further dichotomized, increasing the bias, as each algorithm would have to be trained and validated on increasingly less data. To overcome this drawback, the algorithm utilises k-fold cross validation (where k is a hyperparameter) so that all the original training data is scored in k different batches, thereby outputting n training predictions, where n is the number of samples in the training data. Therefore the training process consists of two parts:

  1. Split the data k times and run k models to output predictions for each of the k parts, then bring the k parts back together in the original order so that the output predictions can be used in later stages of the model.

  2. Rerun the algorithm on the whole training data to be used later on for scoring the external test data. There is no reason to limit the ability of the model to learn using 100% of the training data since the output scoring is already unbiased (given that it is always scored as a holdout set).

The K-fold train/predict process is illustrated below:

[Figure: K-fold train/predict process]

It should be noted that (1) is only applied during training to create unbiased predictions for the second layer's models to fit on. At scoring time (after model training is complete) only (2) is in effect.
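The two training parts described above can be sketched with a deliberately trivial stand-in model (it predicts the mean of the targets it was trained on). Everything here is hypothetical illustration: the names are not StackNet's, and assigning row i to fold i % k is an assumption made to keep the example deterministic.

```java
import java.util.Arrays;

/** Hypothetical sketch of the k-fold train/predict scheme with a mean-predicting stand-in model. */
class KFoldStacking {

    /** "Trains" on every row whose fold differs from heldOut and returns the learned mean. */
    static double fitMean(double[] y, int[] foldOf, int heldOut) {
        double sum = 0.0;
        int count = 0;
        for (int i = 0; i < y.length; i++) {
            if (foldOf[i] != heldOut) {
                sum += y[i];
                count++;
            }
        }
        return count == 0 ? 0.0 : sum / count;
    }

    /** Part 1: every training row is scored by a model that never saw that row. */
    static double[] outOfFoldPredictions(double[] y, int k) {
        int[] foldOf = new int[y.length];
        for (int i = 0; i < y.length; i++) {
            foldOf[i] = i % k; // assumed fold assignment
        }
        double[] oof = new double[y.length];
        for (int fold = 0; fold < k; fold++) {
            double model = fitMean(y, foldOf, fold);
            for (int i = 0; i < y.length; i++) {
                if (foldOf[i] == fold) {
                    oof[i] = model; // predictions land back in the original row order
                }
            }
        }
        return oof;
    }

    /** Part 2: refit on 100% of the training data; this model scores external test data. */
    static double fitOnAll(double[] y) {
        return fitMean(y, new int[y.length], -1); // no fold is held out
    }

    public static void main(String[] args) {
        double[] y = {1.0, 2.0, 3.0, 4.0};
        System.out.println(Arrays.toString(outOfFoldPredictions(y, 2)));
        System.out.println(fitOnAll(y));
    }
}
```

The out-of-fold array has one prediction per training row, which is what the next layer would be trained on.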

All models must be run sequentially based on the layers, but the order of the models within a layer does not matter. In other words, all models of layer one need to be trained before proceeding to layer two, but all models within a layer can be run asynchronously and in parallel to save time. The k-fold may also be viewed as a form of regularization, where a smaller number of folds (but higher than 1) ensures that the validation data is big enough to demonstrate how well a single model could generalize. On the other hand, a higher k means that the models come closer to running with 100% of the training data and may yield more unexplained information. The best values could be found through cross-validation. Another possible way to implement this would be to save all the k models and use the average of their predictions to score the unobserved test data, but then none of the models is ever trained with 100% of the training data, which may be suboptimal.

Some Notes about StackNet

StackNet is (commonly) better than the best single model it contains in its first layer. However, its ability to perform well still relies on a mix of strong and diverse single models in order to get the best out of this Meta modelling methodology.

StackNet (methodology - not the software) was also used to win the Truly Native data modelling competition hosted by the popular data science platform Kaggle in 2015.

StackNet is also explained in simple terms in Kaggle's blog.

Network's example:

[Figure: example StackNet network]

StackNet is made available now with a handful of classifiers and regressors. The implementations are based on the original papers and software. However, most have some personal tweaks in them.

Algorithms contained

Native

Native - Not fully developed

  • knnClassifier
  • knnRegressor
  • KernelmodelClassifier
  • KernelmodelRegressor

Wrappers

H2O

Python

Sklearn(New)

Keras

Generic for user defined scripts (New)

Algorithm's Tuning parameters

For the common models, have a look at:

parameters

Run StackNet

You can do so directly from the jar file, using Java higher than 1.6. You need to add Java as an environment variable (e.g., add it to PATH).

The basic format is:

java -jar stacknet.jar [train or predict] [task=regression or classification] [parameter=value]

Installations

This section explains how to install the different external tools StackNet uses in its ensemble.

Install Xgboost

Awesome xgboost can be used as a subprocess now in StackNet. This would require privileges to save and change files where the .jar is executed.

It is already pre-compiled for windows(64), mac and linux.

Verify that the 'lib' folder is in the same directory as the StackNet.jar file. By default it should be there when you do git clone.

For linux and mac you will most probably need to change privileges for the executable:

cd lib/
cd linux/
cd xg/
chmod +x xgboost

You can test that it works with : ./xgboost

It should print :

Usage: <config>

In windows and mac the behaviour should be similar. After executing xgboost from inside lib/your_operation_system/xg/ you should see:

Usage: <config>

If you don't see this, then you need to compile it manually and drop the executables inside lib/your_operation_system/xg/ .

You may find the following sources useful:

Small Note: The user needs to delete the '.mod' files from inside the model/ folder when they are no longer needed. StackNet does not do that automatically as it is not possible to determine when they are not needed anymore.

IMPORTANT NOTE: This implementation does not include all of Xgboost's features and the user is advised to use it directly from source to exploit its full potential. Also, the version included is 6.0 and it is not certain whether it will be updated in the future, as it required manual work to find all the libraries and files that need to be included for it to run. The performance and memory consumption will also be worse than running it directly from source. Additionally, the description of the parameters may not match the one on the official website, hence it is advised to use xgboost's online parameter thread on github for more information about them.

Install lightGBM

lightGBM can be used as a subprocess now in StackNet. This would require privileges to save and change files where the .jar is executed.

It is already pre-compiled for windows(64), mac and linux.

Verify that the 'lib' folder is in the same directory where the StackNet.jar file is. By default it should be there when you do git clone

For linux and mac you will most probably need to change privileges for the executable:

cd lib/
cd linux/
cd lightgbm/
chmod +x lightgbm

You can test that it works with : ./lightgbm

It should print something in the form of:

[LightGBM] [Info] Finished loading parameters
[LightGBM] [Fatal] No training/prediction data, application quit
Met Exceptions:
No training/prediction data, application quit

In windows and mac the behaviour should be similar. After executing lightgbm from inside lib/your_operation_system/lightgbm/ you should see:

[LightGBM] [Info] Finished loading parameters...

If you don't see this, then you need to compile it manually and drop the executables inside lib/your_operation_system/lightgbm/ .

You may find the following sources useful:

Install LightGBM

Small Note: The user needs to delete the '.mod' files from inside the model/ folder when they are no longer needed. StackNet does not do that automatically as it is not possible to determine when they are not needed anymore.

IMPORTANT NOTE: This implementation does not include all of LightGBM's features and the user is advised to use it directly from source to exploit its full potential. It is not certain whether it will be updated in the future, as it required manual work to find all the libraries and files that need to be included for it to run. The performance and memory consumption will also be worse than running it directly from source. Additionally, the description of the parameters may not match the one on the official website, hence it is advised to use LightGBM's online parameter thread on github for more information about them.

Install H2O Algorithms

All the required jars are already packaged within the StackNet jar, however the user may find them inside the repo too.

No special installation is required, but from experience system protection might block it, so make certain that StackNet.jar is in the exceptions (firewall).

Additionally, the first time StackNet uses an H2O algorithm within the ensemble it takes more time (in comparison to every other time) because it sets up a cluster.

Install Fast_rgf

fast_rgf can be used as a subprocess now in StackNet. This would require privileges to save and change files where the .jar is executed.

It is already pre-compiled for windows(64), mac and linux.

Verify that the 'lib' folder is in the same directory where the StackNet.jar file is. By default it should be there when you do git clone

For linux and mac you will most probably need to change privileges for the executable:

cd lib/
cd linux/
cd frgf/
chmod +x forest_train
chmod +x forest_predict

You can test that it works with : ./forest_train

It should print something in the form of:

using up to x threads

In windows and mac the behaviour should be similar. After executing forest_train from inside lib/your_operation_system/frgf/ you should see:

using up to x threads...

If you don't see this, then you need to compile it manually and drop the executables inside lib/your_operation_system/frgf/ .

If you need to compile manually for windows, you may find it useful to download cmake from:

Install cmake

and use mingw32-make.exe as a compiler.

Small Note: The user needs to delete the '.mod' files from inside the model/ folder when they are no longer needed. StackNet does not do that automatically as it is not possible to determine when they are not needed anymore.

IMPORTANT NOTE: This implementation does not include all of fast_rgf's features and the user is advised to use it directly from source to exploit its full potential. It is not certain whether it will be updated in the future, as it required manual work to find all the libraries and files that need to be included for it to run. The performance and memory consumption will also be worse than running it directly from source. Additionally, the description of the parameters may not match the one on the official website, hence it is advised to use fast_rgf's online parameter thread on github for more information about them.

Install Sklearn Algorithms

To use Sklearn in StackNet you need Python 2.7 or higher. Python needs to be found in PATH as StackNet spawns subprocesses from the command line. This requires privileges to save and change files where the .jar is executed.

Verify that the 'lib' folder is in the same directory as the StackNet.jar file.

Once Python is installed and can be found on PATH, the user needs to install sklearn version 0.18.2.

The following should do the trick in linux and mac.

pip install scipy
pip install scikit-learn==0.18.2

For an easier installation in windows, the user could download Anaconda and make certain to check the Add Anaconda's python to PATH when it shows up during the installation.

All sklearn python scripts executed by StackNet are put in lib/python/

Install Python Generic Algorithms

This is a new feature that allows users to run their own models, as long as all the required libraries can be found on their system when calling python. Assuming python is installed as explained in the sklearn section above, the user may have a look inside lib/python/.

The scripts PythonGenericRegressor0.py and PythonGenericClassifier0.py are sample scripts that show how to format these models. The '0' is the main hyperparameter (called index) of the model PythonGenericRegressor (or PythonGenericClassifier). The data gets loaded in sparse format, but after this the user can add whatever they want.

One could make many scripts, name them PythonGenericRegressor1, PythonGenericRegressor2 ... PythonGenericRegressorN and call them as:

PythonGenericRegressor index:1 seed:1 verbose:False 
PythonGenericRegressor index:2 seed:1 verbose:False 
PythonGenericRegressor index:N seed:1 verbose:False 

Once again, verify that the 'lib' folder is in the same directory as the StackNet.jar file.

Install original libfm

libFM can be used as a subprocess now in StackNet. This would require privileges to save and change files where the .jar is executed.

It is already pre-compiled for windows(64), mac and linux. Note that for windows libfm is compiled with cygwin.

Verify that the 'lib' folder is in the same directory where the StackNet.jar file is. By default it should be there when you do git clone

For linux and mac you will most probably need to change privileges for the executable:

cd lib/
cd linux/
cd libfm/
chmod +x libfm

You can test that it works with : ./libfm

It should print something in the form of:

libFM
  Version: 1.4.2
   ...
   ...

In windows and mac the behaviour should be similar. After executing libfm from inside the lib/your_operation_system/libfm/ you should see the same.

If you don't see this, then you need to compile it manually and drop the executables inside lib/your_operation_system/libfm/.

You may find the following sources useful:

libfm manual

IMPORTANT NOTE: This implementation may not include all libFM features, and it deliberately uses a version of libFM that had a bug. You can find more information about why this was chosen in the following python wrapper for libFM. In short, the bug made it possible to get the parameters of the trained models for all training methods. These parameters are now extracted once a model has been trained and the scoring uses only these parameters (i.e. not the libFM executable).

Also, multiclass problems are formed as binary 1-vs-all.

Bear in mind the licence of libfm. If you find it useful, cite the following paper: Rendle, S. (2012). Factorization machines with libFM. ACM Transactions on Intelligent Systems and Technology (TIST), 3(3), 57.

Install vowpal wabbit

vowpal wabbit can be used as a subprocess now in StackNet. This would require privileges to save and change files where the .jar is executed.

It is already pre-compiled for windows(64) and linux.

Mac was more difficult than expected and generally there is a lack of expertise working with Mac. If someone could help here, please email me at [email protected].

For mac, you have to install vowpal wabbit from source and drop the executable in lib/mac/vw/. Consider the following link. brew install vowpal-wabbit will most probably do the trick. If that does not work, you may execute the lib/mac/vw/script.sh. This is not advised though, as it will override some files you may have already installed; use it as a last resort.

Verify that the 'lib' folder is in the same directory where the StackNet.jar file is. By default it should be there when you do git clone

For linux and mac you will most probably need to change privileges for the executable:

cd lib/
cd linux/
cd vw/
chmod +x vw

You can test that it works with : ./vw

It should print something in the form of:

	Num weight bits = 18
	learning rate = 0.5
	initial_t = 0
    ...

In windows and mac the behaviour should be similar. After executing vw from inside the lib/your_operation_system/vw/ you should see the same.

If you don't see this, then you need to compile it manually and drop the executables inside lib/your_operation_system/vw/.

You may find the following sources useful:

Download suggestions

IMPORTANT NOTE: This implementation may not include all Vowpal Wabbit features and the user is advised to use it directly from the source. Also, the version may not be the latest and it is not certain whether it will be updated in the future, as it required manual work to find all the libraries and files that need to be included for it to run. The performance and memory consumption will also be worse than running it directly. Additionally, the description of the parameters may not match the one on the website, hence it is advised to use VW's online parameter thread on github for more information about them.

Install libffm

libffm can be used as a subprocess now in StackNet. This would require privileges to save and change files where the .jar is executed.

It is already pre-compiled for windows(64), mac and linux.

Verify that the 'lib' folder is in the same directory where the StackNet.jar file is. By default it should be there when you do git clone

For linux and mac you will most probably need to change privileges for the executable:

cd lib/
cd linux/
cd libffm/
chmod +x ffm-train
chmod +x ffm-predict

You can test that it works with : ./ffm-train

It should print something in the form of:

usage: ffm-train [options] training_set_file [model_file]

options:
-l <lambda>: set regularization parameter (default 0.00002)
-k <factor>: set number of latent factors (default 4)
-t <iteration>: set number of iterations (default 15)

You should also test that it works with : ./ffm-predict

It should print:

usage: ffm-predict test_file model_file output_file

In windows and mac the behaviour should be similar. After executing ffm-train or ffm-predict from inside the lib/your_operation_system/libffm/ you should see the same results.

If you don't see this, then you need to compile it manually and drop the executables inside lib/your_operation_system/libffm/ .

You may find the following sources useful:

Install libffm . Search for Installation ... and OpenMP and SSE ...

Small Note: The user needs to delete the '.mod' files from inside the model/ folder when they are no longer needed. StackNet does not do that automatically as it is not possible to determine when they are not needed anymore.

IMPORTANT NOTE: This implementation may not include all libffm features and the user is advised to use it directly from the source. Also, the version may not be the latest and it is not certain whether it will be updated in the future, as it required manual work to find all the libraries and files that need to be included for it to run. The performance and memory consumption will also be worse than running it directly. Additionally, the description of the parameters may not match the one on the website, hence it is advised to use libffm's online parameter thread on github for more information about them. Also, multiclass problems are formed as binary 1-vs-all.

Command Line Parameters

Command Explanation
task could be either regression or classification.
sparse True if the data to be imported are in sparse format (libsvm), false if dense.
has_head True if train_file and test_file have headers else false
model Name of the output model file.
pred_file Name of the output prediction file.
train_file Name of the training file.
test_file Name of the test file.
output_name Prefix of the models to be printed per iteration. This is to allow the Meta features of each iteration to be printed. Defaults to nothing.
data_prefix Prefix to be used when the user supplies their own pairs of [X_train,X_cv] datasets for each fold, as well as an X file for the whole training data. This is particularly useful when likelihood features are needed, or generally features that must be computed within cv. Each train/valid pair is identified by prefix_train[fold_index_starting_from_zero].txt/prefix_cv[fold_index_starting_from_zero].txt, and prefix_train.txt is used for the final set. For example, if prefix=mystack and folds=2 then StackNet expects 2 pairs of train/cv files, e.g. [[mystack_train0.txt,mystack_cv0.txt],[mystack_train1.txt,mystack_cv1.txt]]. It also expects a [mystack_train.txt] for the final train set. These files can be either dense or sparse (when 'sparse=True') and need to have the target variable in the beginning. If you use output_name to extract the predictions, these will be stacked vertically in the same order as the cv files.
indices_name A prefix. When given any value, it prints a .csv file for each fold with the corresponding train (0) and validation (1) indices stacked vertically. The format is "row_index,[0 if train else 1 for validation]". First it prints the train indices and then the validation indices, in exactly the same order as they appear when modelling inside StackNet.
input_index (New) Name of the file to load in order to form the train and cv indices during kfold cross validation. This overrides the internal process for generating kfolds and ignores the given folds. Each row in that file needs to contain an integer: one line = one integer, the index of the validation fold the case belongs to. The number of rows in the file needs to be the same as in the train_file. It should not contain headers. There is an example
include_target (New) True to enable printing the target column in the output file for train holdout predictions (when output_name is not empty).
test_target True if the test file has a target variable in the beginning (left) else false (only predictors in the file).
params Parameter file where each line is a model. Empty lines correspond to the creation of new levels.
verbose True if we need StackNet to output its progress else false
threads Number of models to run in parallel. This is independent of any extra threads allocated by the selected algorithms, e.g. it is possible to run 4 models in parallel where one is a randomforest that runs on 10 threads (if selected).
metric Metric to output in cross validation for each model-neuron. Can be logloss, accuracy or auc (for binary only) for classification and rmse, rsquared or mae for regression. Defaults to 'logloss' for classification and 'rmse' for regression.
stackdata True for restacking else false
seed Integer for randomised procedures
bins A parameter that allows classifiers to be used in regression problems. It first bins (digitises) the target variable and then runs classifiers on the transformed variable. Defaults to 2.
folds Number of folds for re-usable kfold
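The file naming convention described for data_prefix can be sketched as follows. The class and method names are hypothetical; only the naming pattern itself is taken from the description above.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper that lists the files expected for a given data_prefix and number of folds. */
class DataPrefixFiles {

    static List<String> expectedFiles(String prefix, int folds) {
        List<String> files = new ArrayList<>();
        for (int fold = 0; fold < folds; fold++) {
            files.add(prefix + "_train" + fold + ".txt"); // training part of this fold
            files.add(prefix + "_cv" + fold + ".txt");    // validation part of this fold
        }
        files.add(prefix + "_train.txt"); // whole training data for the final fit
        return files;
    }

    public static void main(String[] args) {
        for (String name : expectedFiles("mystack", 2)) {
            System.out.println(name);
        }
    }
}
```

Checking that all of these files exist before starting a long run may save a failed job.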

Parameters' File

In the parameter file, each line is a model. An empty line means that any algorithm that follows is used in the next level. This is a sample format. Note that this file accepts comments (#): anything to the right of the # symbol is ignored. (New)

LogisticRegression C:1 Type:Liblinear maxim_Iteration:100 scale:true verbose:false
RandomForestClassifier bootsrap:false estimators:100 threads:5 logit.offset:0.00001 verbose:false cut_off_subsample:1.0 feature_subselection:1.0 gamma:0.00001 max_depth:8 max_features:0.25 max_tree_size:-1 min_leaf:2.0 min_split:5.0 Objective:ENTROPY row_subsample:0.95 seed:1
GradientBoostingForestClassifier estimators:100 threads: offset:0.00001 verbose:false trees:1 rounding:2 shrinkage:0.05 cut_off_subsample:1.0 feature_subselection:0.8 gamma:0.00001 max_depth:8 max_features:1.0 max_tree_size:-1 min_leaf:2.0 min_split:5.0 Objective:RMSE row_subsample:0.9 seed:1
Vanilla2hnnclassifier UseConstant:true usescale:true seed:1 Type:SGD maxim_Iteration:50 C:0.000001 learn_rate:0.009 smooth:0.02 h1:30 h2:20 connection_nonlinearity:Relu init_values:0.02
LSVC Type:Liblinear threads:1 C:1.0 maxim_Iteration:100 seed:1
LibFmClassifier lfeatures:3 init_values:0.035 smooth:0.05 learn_rate:0.1 threads:1 C:0.00001 maxim_Iteration:15 seed:1
NaiveBayesClassifier usescale:true threads:1 Shrinkage:0.1 seed:1 verbose:false
XgboostRegressor booster:gbtree objective:reg:linear num_round:100 eta:0.015 threads:1 gamma:2.0 max_depth:4 subsample:0.8 colsample_bytree:0.4 seed:1 verbose:false
XgboostRegressor booster:gblinear objective:reg:gamma num_round:500 eta:0.5 threads:1 lambda:1 alpha:1 seed:1 verbose:false

RandomForestClassifier estimators=1000 rounding:3 threads:4 max_depth:6 max_features:0.6 min_leaf:2.0 Objective:ENTROPY gamma:0.000001 row_subsample:1.0 verbose:false copy=false

Tip: To tune a single model, one may choose an algorithm for the first layer and a dummy one for the second layer. StackNet expects at least two algorithms, so with this format the user can see the performance of a single algorithm inside the K-fold. For example, to tune a Random Forest Classifier, put it in the first line (layer), put any model (let's say Logistic Regression) in the second layer, and break the process immediately after the first layer's kfold is done:

RandomForestClassifier bootsrap:false estimators:100 threads:5 logit.offset:0.00001 verbose:false cut_off_subsample:1.0 feature_subselection:1.0 gamma:0.00001 max_depth:8 max_features:0.25 max_tree_size:-1 min_leaf:2.0 min_split:5.0 Objective:ENTROPY row_subsample:0.95 seed:1

LogisticRegression verbose:false
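The level structure of the parameter file (one model per line, a blank line starting a new level, # starting a comment) can be sketched with a small parser. This is not StackNet's own loader: the names are hypothetical, and treating a comment-only line as skipped rather than as a level break is an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch that groups parameter-file lines into levels. */
class ParamsFileSketch {

    static List<List<String>> parseLevels(List<String> lines) {
        List<List<String>> levels = new ArrayList<>();
        List<String> current = new ArrayList<>();
        for (String raw : lines) {
            if (raw.trim().isEmpty()) {
                // A blank line closes the current level (if it has any models).
                if (!current.isEmpty()) {
                    levels.add(current);
                    current = new ArrayList<>();
                }
                continue;
            }
            int hash = raw.indexOf('#');
            String model = (hash >= 0 ? raw.substring(0, hash) : raw).trim();
            if (!model.isEmpty()) {
                current.add(model); // comment-only lines are skipped (assumption)
            }
        }
        if (!current.isEmpty()) {
            levels.add(current);
        }
        return levels;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
            "LogisticRegression C:1 # first level",
            "LSVC C:1.0",
            "",
            "RandomForestClassifier estimators:1000");
        System.out.println(parseLevels(lines));
    }
}
```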

Data Format

For dense input data, the file needs to start with the target variable, followed by comma-separated variables, like:

1,0,0,2,3,2.4

0,1,1,0,0,12

For sparse format, it is the same as libsvm (same example as above):

1 2:2 3:3 4:2.4

0 0:1 1:1 4:12

Warning: some algorithms (mostly tree-based) may not be very fast with this format.
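The relation between the two formats can be sketched as a row conversion. The helper below is hypothetical and not part of StackNet; it uses zero-based feature indices because that is what the sparse example above shows.

```java
/** Hypothetical helper that converts one dense row (target first) to the sparse layout shown above. */
class DenseToSparse {

    static String toLibsvm(String denseRow) {
        String[] tokens = denseRow.trim().split(",");
        StringBuilder sb = new StringBuilder(tokens[0].trim()); // the target stays first
        for (int i = 1; i < tokens.length; i++) {
            String value = tokens[i].trim();
            if (Double.parseDouble(value) != 0.0) {
                // Feature index is zero-based: the first variable after the target is 0.
                sb.append(' ').append(i - 1).append(':').append(value);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(toLibsvm("1,0,0,2,3,2.4"));
        System.out.println(toLibsvm("0,1,1,0,0,12"));
    }
}
```

Zero-valued variables are simply left out of the sparse row, which is why the two dense rows above become shorter.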

If test_target is false, then the test data is expected to have no target and to start directly with the variables.

A train method needs at least a train_file and a params_file. It also needs at least two algorithms, and the last layer must not contain a regressor unless the metric is auc and the problem is binary.

A predict method needs at least a test_file and a model_file.

Commandline Train Statement

java -jar stacknet.jar train task=classification sparse=false has_head=true model=model pred_file=pred.csv train_file=sample_train.csv test_file=sample_test.csv test_target=true params=params.txt verbose=true threads=7 metric=logloss stackdata=false seed=1 folds=5 bins=3

Note that you can have train and test at the same time. In that case after training, it scores the test data.

Commandline predict Statement

java -jar stacknet.jar predict sparse=false has_head=true model=model pred_file=pred.csv test_file=sample_test.csv test_target=true verbose=true metric=logloss

Examples

Run StackNet from within Java code

If we wanted to build a 3-level StackNet on a binary target with dense data, we would start by initializing a StackNetClassifier object:

 StackNetClassifier StackNet = new StackNetClassifier (); // Initialise a StackNet 

This is then followed by a 2-dimensional String array with the list of models in each layer along with their hyperparameters, in the form "estimator [space delimited hyper parameters]":

String models_per_level[][]=new String[][]
{//First Level
{"LogisticRegression C:0.5 maxim_Iteration:100 verbose:true", 
"RandomForestClassifier bootsrap:false estimators:100 threads:25 offset:0.00001 cut_off_subsample:1.0 feature_subselection:1.0 max_depth:15 max_features:0.3 max_tree_size:-1 min_leaf:2.0 min_split:5.0 Objective:ENTROPY row_subsample:0.95", 
"LSVC C:3 maxim_Iteration:50",
"LibFmClassifier maxim_Iteration:16 C:0.000001 lfeatures:3 init_values:0.9 learn_rate:0.9 smooth:0.1", 
"NaiveBayesClassifier Shrinkage:0.01", 
"Vanilla2hnnclassifier maxim_Iteration:20 C:0.000001 tolerance:0.01 learn_rate:0.009 smooth:0.02 h1:30 h2:20 connection_nonlinearity:Relu init_values:0.02", 
"GradientBoostingForestClassifier estimators:100 threads:25 verbose:false trees:1 rounding:2 shrinkage:0.1 feature_subselection:0.5 max_depth:8 max_features:1.0 min_leaf:2.0 min_split:5.0 row_subsample:0.9", 
"LinearRegression C:0.00001", 
"AdaboostRandomForestClassifier estimators:100 threads:3 verbose:true trees:1 rounding:2 weight_thresold:0.4 feature_subselection:0.5 max_depth:8 max_features:1.0 min_leaf:2.0 min_split:5.0 row_subsample:0.9", 
"GradientBoostingForestRegressor estimators:100 threads:3 trees:1 rounding:2 shrinkage:0.1 feature_subselection:0.5 max_depth:9 max_features:1.0 min_leaf:2.0 min_split:5.0 row_subsample:0.9", 
"RandomForestRegressor estimators:100 internal_threads:1 threads:25 offset:0.00001 verbose:true cut_off_subsample:1.0 feature_subselection:1.0 max_depth:14 max_features:0.25 max_tree_size:-1 min_leaf:2.0 min_split:5.0 Objective:RMSE row_subsample:1.0", 
"LSVR C:3 maxim_Iteration:50 P:0.2" },
//Second Level                
{"RandomForestClassifier estimators:1000  threads:25 offset:0.0000000001 verbose=false cut_off_subsample:0.1 feature_subselection:1.0 max_depth:7 max_features:0.4  max_tree_size:-1 min_leaf:1.0  min_split:2.0 Objective:ENTROPY row_subsample:1.0",
"GradientBoostingForestClassifier estimators:1000 threads:25 verbose:false trees:1 rounding:4 shrinkage:0.01 feature_subselection:0.5 max_depth:5 max_features:1.0 min_leaf:1.0 min_split:2.0 row_subsample:0.9",    
"Vanilla2hnnclassifier maxim_Iteration:20 C:0.000001 tolerance:0.01 learn_rate:0.009 smooth:0.02 h1:30 h2:20 connection_nonlinearity:Relu init_values:0.02",    
"LogisticRegression C:0.5 maxim_Iteration:100 verbose:false" },
//Third Level                    
{"RandomForestClassifier estimators:1000  threads:25 offset:0.0000000001 verbose=false cut_off_subsample:0.1 feature_subselection:1.0 max_depth:6 max_features:0.7  max_tree_size:-1 min_leaf:1.0  min_split:2.0 Objective:ENTROPY row_subsample:1.0" }
};

Alternatively, we could load directly from a file :

String modellings[][]=io.input.StackNet_Configuration("params.txt");

StackNet.parameters=models_per_level; // adding the models' specifications (use modellings if they were loaded from the file)

The remaining parameters to be specified include the cross validation training schema, the Restacking mode option, setting a random state as well as some other miscellaneous options:

StackNet.threads=4; // models to be run in parallel
StackNet.folds=5; // size of K-Fold
StackNet.stackdata=true; // use Restacking
StackNet.print=true; // this helps to avoid rerunning should the model fail
StackNet.output_name="restack";// prefix for each layer's output.
StackNet.verbose=true; // print progress while training
StackNet.seed=1; // random state
StackNet.metric="logloss"; // metric to output during cross validation
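
To make the roles of folds and seed concrete, here is a minimal sketch of how a seeded K-fold assignment of rows could look. It is an illustration only, not StackNet's internal implementation; the class KFoldSketch and the method assignFolds are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/** Hypothetical helper, not part of StackNet: seeded K-fold row assignment. */
class KFoldSketch {

    /** Returns fold[i] in [0, folds) for each of the n rows, shuffled with the given seed. */
    static int[] assignFolds(int n, int folds, long seed) {
        List<Integer> rows = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            rows.add(i);
        }
        // The same seed always gives the same shuffle, hence the same folds.
        Collections.shuffle(rows, new Random(seed));
        int[] fold = new int[n];
        for (int position = 0; position < n; position++) {
            fold[rows.get(position)] = position % folds;
        }
        return fold;
    }

    public static void main(String[] args) {
        int[] fold = assignFolds(10, 5, 1L);
        for (int k = 0; k < 5; k++) {
            StringBuilder validation = new StringBuilder();
            for (int i = 0; i < fold.length; i++) {
                if (fold[i] == k) {
                    validation.append(i).append(' ');
                }
            }
            System.out.println("fold " + k + " validation rows: " + validation.toString().trim());
        }
    }
}
```

Each row is held out exactly once, so the out-of-fold predictions of one level can serve as training input for the next level.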

Ultimately given a data object X and a 1-dimensional vector y, the model can be trained using:

StackNet.target=y; // the target variable        
StackNet.fit(X); // fitting the model on the training data

Predictions are made with:

double preds [][]=StackNet.predict_proba(X_test);
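
If class labels are needed rather than scores, the returned array can be reduced with an argmax per row. The sketch below assumes that preds holds one row per test case and one column per class; ArgmaxSketch and argmaxPerRow are hypothetical names and not part of StackNet.

```java
import java.util.Arrays;

/** Hypothetical helper, not part of StackNet: turns class scores into labels. */
class ArgmaxSketch {

    /** Returns, for each row, the index of the column with the highest score. */
    static int[] argmaxPerRow(double[][] preds) {
        int[] labels = new int[preds.length];
        for (int i = 0; i < preds.length; i++) {
            int best = 0;
            for (int j = 1; j < preds[i].length; j++) {
                if (preds[i][j] > preds[i][best]) {
                    best = j;
                }
            }
            labels[i] = best;
        }
        return labels;
    }

    public static void main(String[] args) {
        // Two test cases, three classes (made-up numbers for illustration).
        double[][] preds = { {0.7, 0.2, 0.1}, {0.1, 0.3, 0.6} };
        System.out.println(Arrays.toString(argmaxPerRow(preds))); // prints [0, 2]
    }
}
```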

Potential Next Steps

  • Add StackNetRegressor (done)
  • Add H2O
  • Increase coverage in general with well-known and well-performing ML tools (original libfm, libffm, vowpal wabbit)
  • Add data pre-processing steps
  • Make a python wrapper

Reference

For now, you may use this:

Marios Michailidis (2017), StackNet, StackNet Meta Modelling Framework, url https://github.com/kaz-Anova/StackNet

News

  • The StackNet model was presented at infiniteconf 2017 (6th-7th July); the video is available there if you sign up.
  • There is a new Facebook page to discuss StackNet and other open source data science topics.
  • StackNet and stacking were explained in Kaggle's blog.
  • There is an Ask Me Anything (AMA) thread on Kaggle with useful material about stacking and StackNet.
  • A workshop with StackNet will take place at ODSC in London, October 12-14.

Special Thanks

To my co-supervisors:

Comments
  • Exception in thread

    Exception in thread "main" java.lang.reflect.InvocationTargetException

    I tried the StackNet example from CMD on Windows and the following problem happened. Could @kaz-Anova or someone else give me tips on how to fix it? Thanks a lot.

    C:\Users\User>java -jar StackNet.jar train task=classification sparse=false has_head=false model=model train_file=train_iris.csv test_file=test_iris.csv test_target=true params=params.txt verbose=true threads=4 metric=logloss stackdata=false
    parameter name : task value : classification
    parameter name : sparse value : false
    parameter name : has_head value : false
    parameter name : model value : model
    parameter name : train_file value : train_iris.csv
    parameter name : test_file value : test_iris.csv
    parameter name : test_target value : true
    parameter name : params value : params.txt
    parameter name : verbose value : true
    parameter name : threads value : 4
    parameter name : metric value : logloss
    parameter name : stackdata value : false
    Completed: 4.04 %
    Completed: 8.08 %
    Completed: 12.12 %
    Completed: 16.16 %
    Completed: 20.20 %
    Completed: 24.24 %
    Completed: 28.28 %
    Completed: 32.32 %
    Completed: 36.36 %
    Completed: 40.40 %
    Completed: 44.44 %
    Completed: 48.48 %
    Completed: 52.53 %
    Completed: 56.57 %
    Completed: 60.61 %
    Completed: 64.65 %
    Completed: 68.69 %
    Completed: 72.73 %
    Completed: 76.77 %
    Completed: 80.81 %
    Completed: 84.85 %
    Completed: 88.89 %
    Completed: 92.93 %
    Completed: 96.97 %
    Loaded File: train_iris.csv
    Total rows in the file: 99
    Total columns in the file: 5
    Weighted variable : -1 counts: 0
    Int Id variable : -1 str id: -1 counts: 0
    Target Variables : 1 values : [0]
    Actual columns number : 4
    Number of Skipped rows : 0
    Actual Rows (removing the skipped ones) : 99
    Loaded dense train data with 99 and columns 4
    loaded data in : 0.100000
    Exception in thread "main" java.lang.reflect.InvocationTargetException
            at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
            at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
            at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
            at java.lang.reflect.Method.invoke(Unknown Source)
            at org.eclipse.jdt.internal.jarinjarloader.JarRsrcLoader.main(JarRsrcLoader.java:58)
    Caused by: java.lang.IllegalStateException: File params.txt failed to import at bufferreader params.txt (The system cannot find the file specified.)
            at io.input.StackNet_Configuration(input.java:1650)
            at stacknetrun.runstacknet.main(runstacknet.java:441)

    opened by ahbon123 38
  • ArrayIndexOutOfBounds when using GradientBoostingForestClassifier

    Hi. Thanks for the StackNet classifier. I encountered an exception when I tried to add some more features to the Kaggle Quora problem.

    Exception in thread "Thread-23" java.lang.ArrayIndexOutOfBoundsException: 12
            at ml.Tree.DecisionTreeRegressor.expand_node(DecisionTreeRegressor.java:3011)
            at ml.Tree.DecisionTreeRegressor.expand_node(DecisionTreeRegressor.java:3038)
            at ml.Tree.DecisionTreeRegressor.expand_node(DecisionTreeRegressor.java:3042)
            at ml.Tree.DecisionTreeRegressor.expand_node(DecisionTreeRegressor.java:3042)
            at ml.Tree.DecisionTreeRegressor.expand_node(DecisionTreeRegressor.java:3038)
            at ml.Tree.DecisionTreeRegressor.expand_node(DecisionTreeRegressor.java:3042)
            at ml.Tree.DecisionTreeRegressor.expand_node(DecisionTreeRegressor.java:3038)
            at ml.Tree.DecisionTreeRegressor.expand_node(DecisionTreeRegressor.java:3038)
            at ml.Tree.DecisionTreeRegressor.expand_node(DecisionTreeRegressor.java:3038)
            at ml.Tree.DecisionTreeRegressor.expand_node(DecisionTreeRegressor.java:3042)
            at ml.Tree.DecisionTreeRegressor.expand_node(DecisionTreeRegressor.java:3038)
            at ml.Tree.DecisionTreeRegressor.fit(DecisionTreeRegressor.java:2382)
            at ml.Tree.DecisionTreeRegressor.run(DecisionTreeRegressor.java:483)
            at java.lang.Thread.run(Thread.java:745)
    Exception in thread "Thread-5" java.lang.NullPointerException
            at ml.Tree.DecisionTreeRegressor.isfitted(DecisionTreeRegressor.java:3275)
            at ml.Tree.scoringhelperv2.<init>(scoringhelperv2.java:107)
            at ml.Tree.RandomForestRegressor.predict2d(RandomForestRegressor.java:744)
            at ml.Tree.GradientBoostingForestClassifier.fit(GradientBoostingForestClassifier.java:2353)
            at ml.Tree.GradientBoostingForestClassifier.run(GradientBoostingForestClassifier.java:382)
            at java.lang.Thread.run(Thread.java:745)
    
    
    Exception in thread "main" java.lang.NullPointerException
            at ml.Tree.scoringhelperfv2.<init>(scoringhelperfv2.java:107)
            at ml.Tree.GradientBoostingForestClassifier.predict_proba(GradientBoostingForestClassifier.java:603)
            at ml.stacknet.StackNetClassifier.fit(StackNetClassifier.java:2438)
            at stacknetrun.runstacknet.main(runstacknet.java:385)
    Exception in thread "Thread-28783" java.lang.NullPointerException
            at ml.Tree.DecisionTreeRegressor.isfitted(DecisionTreeRegressor.java:3275)
            at ml.Tree.scoringhelperv2.<init>(scoringhelperv2.java:107)
            at ml.Tree.RandomForestRegressor.predictfs(RandomForestRegressor.java:590)
            at ml.Tree.scoringhelperfv2.score(scoringhelperfv2.java:149)
            at ml.Tree.scoringhelperfv2.run(scoringhelperfv2.java:175)
            at java.lang.Thread.run(Thread.java:745)
    

    I used paramsv1.txt but added more threads to each base classifier.

    opened by yqf3139 13
  • InvocationTargetException error

    Hi,

    I encountered an error when running StackNet. Here is the command:

    java -Xmx12144m -jar StackNet.jar train train_file='/home/jlu/Experiments/Examples/Instacart/imba/data/nz_train_slim.csv' test_file='/home/jlu/Experiments/Examples/Instacart/imba/data/all_data_test_V1.csv' has_head=true params='/home/jlu/Experiments/Examples/Instacart/imba/paramsv1.txt' sparse=false pred_file='/home/jlu/Experiments/Examples/Instacart/imba/data/stacknet_pred_V1.csv' test_target=false verbose=true Threads=10 folds=5 seed=1 metric=auc output_name=restack_instacart folds=10 seed=1 task=classification
    

    Here is the error message. What does InvocationTargetException error here imply?

    parameter name : train_file value :  /home/jlu/experiments/examples/instacart/imba/data/nz_train_slim.csv
    parameter name : test_file value :  /home/jlu/experiments/examples/instacart/imba/data/all_data_test_v1.csv
    parameter name : has_head value :  true
    parameter name : params value :  /home/jlu/experiments/examples/instacart/imba/paramsv1.txt
    parameter name : sparse value :  false
    parameter name : pred_file value :  /home/jlu/experiments/examples/instacart/imba/data/stacknet_pred_v1.csv
    parameter name : test_target value :  false
    parameter name : verbose value :  true
    parameter name : threads value :  10
    parameter name : folds value :  5
    parameter name : seed value :  1
    parameter name : metric value :  auc
    parameter name : output_name value :  restack_instacart
    parameter name : folds value :  10
    parameter name : seed value :  1
    parameter name : task value :  classification
     Completed: 5.00 %
     Completed: 10.00 %
     Completed: 15.00 %
     Completed: 20.00 %
     Completed: 25.00 %
     Completed: 30.00 %
     Completed: 35.00 %
     Completed: 40.00 %
     Completed: 45.00 %
     Completed: 50.00 %
     Completed: 55.00 %
     Completed: 60.00 %
     Completed: 65.00 %
     Completed: 70.00 %
     Completed: 75.00 %
     Completed: 80.00 %
     Completed: 85.00 %
     Completed: 90.00 %
     Completed: 95.00 %
     Completed: 100.00 %
     Loaded File: /home/jlu/Experiments/Examples/Instacart/imba/data/nz_train_slim.csv
     Total rows in the file: 8474661
     Total columns in the file: 78
     Weighted variable : -1 counts: 0
     Int Id variable : -1 str id: -1 counts: 0
     Target Variables  : 1 values : [0]
     Actual columns number  : 77
     Number of Skipped rows   : 0
     Actual Rows (removing the skipped ones)  : 8474661
    Loaded dense train data with 8474661 and columns 77
     loaded data in : 125.971000
     Level: 1 dimensionality: 893
     Starting cross validation
    Exception in thread "main" java.lang.reflect.InvocationTargetException
            at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
            at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
            at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
            at java.lang.reflect.Method.invoke(Method.java:606)
            at org.eclipse.jdt.internal.jarinjarloader.JarRsrcLoader.main(JarRsrcLoader.java:58)
    Caused by: java.lang.NegativeArraySizeException
            at matrix.fsmatrix.<init>(fsmatrix.java:85)
            at ml.stacknet.StackNetClassifier.fit(StackNetClassifier.java:2749)
            at stacknetrun.runstacknet.main(runstacknet.java:471)
            ... 5 more
    
    
    opened by ajing 11
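
One possible reading of the NegativeArraySizeException above, offered as a hypothesis and not confirmed by the StackNet source: if the level-1 matrix is allocated as a single array of rows × dimensionality cells, then 8474661 × 893 exceeds the largest Java int and wraps around to a negative size. The sketch below only demonstrates that arithmetic; ArraySizeSketch and its methods are hypothetical names.

```java
/** Illustrative check, not StackNet code: does rows * columns fit in a Java int? */
class ArraySizeSketch {

    /** Number of cells computed in long arithmetic, so this product cannot overflow. */
    static long cells(int rows, int columns) {
        return (long) rows * columns;
    }

    /** True when a single flat array of that many cells could be indexed with an int. */
    static boolean fitsInIntArray(int rows, int columns) {
        return cells(rows, columns) <= Integer.MAX_VALUE;
    }

    public static void main(String[] args) {
        int rows = 8474661; // "Actual Rows" from the log above
        int columns = 893;  // "Level: 1 dimensionality" from the log above
        System.out.println("cells as long: " + cells(rows, columns));
        // The int product silently wraps around and becomes negative.
        System.out.println("cells as int : " + (rows * columns));
        System.out.println("fits in one int-indexed array: " + fitsInIntArray(rows, columns));
    }
}
```

If this reading is right, fewer rows or a lower level-1 dimensionality would keep the product below Integer.MAX_VALUE.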
  • Error while replicating example

    Hi, I tried to replicate one of the examples provided in this repo, in this case the Amazon one. I ran the code using param_amazon_linear as documented in that example, but all I got was this:

    > java -Xmx3048m -jar StackNet.jar train train_file=train.sparse test_file=test.sparse params=param_amazon_linear.txt pred_file=amazon_linear_pred.csv test_target=false verbose=true Threads=1 sparse=true folds=5 seed=1 metric=auc
    parameter name : train_file value :  train.sparse
    parameter name : test_file value :  test.sparse
    parameter name : params value :  param_amazon_linear.txt
    parameter name : pred_file value :  amazon_linear_pred.csv
    parameter name : test_target value :  false
    parameter name : verbose value :  true
    parameter name : threads value :  1
    parameter name : sparse value :  true
    parameter name : folds value :  5
    parameter name : seed value :  1
    parameter name : metric value :  auc
    a train method needs to have a task which may be regression or classification
    

    I checked after a while, and it had not produced any output file. Is there something that I did wrong?

    Additional note: I had already produced train.sparse and test.sparse by running prepare_data.py

    opened by arisbw 9
  • What is a typical way to use StackNet?

    I am a Kaggler trying to improve myself. :) Thanks for the great tool!

    I saw you have cases combining two StackNets, so I am wondering what the typical strategy for using StackNet is. After some data cleaning and feature engineering, do you then just run StackNet? How do you do model diagnosis with the results of StackNet? How do you gradually improve the final model?

    Thanks, Jing

    opened by ajing 8
  • can't figure this error out?

    @kaz-Anova @jq (sorry for @ )

     prediction to has failed due to 2
    printing prediction to  preds.csv has failed due to null
     predicting on test data lasted : 323.916000
    Exception in thread "main" java.lang.reflect.InvocationTargetException
            at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
            at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
            at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
            at java.base/java.lang.reflect.Method.invoke(Unknown Source)
            at org.eclipse.jdt.internal.jarinjarloader.JarRsrcLoader.main(JarRsrcLoader.java:58)
    Caused by: java.lang.NullPointerException
            at stacknetrun.runstacknet.main(runstacknet.java:745)
            ... 5 more
    

    Command on cmd(Anaconda's Terminal in a Virtual env)

    java -jar StackNet.jar train task=classification train_file=train.csv test_file=test.csv params=params.txt pred_file=preds0.csv
    test_target=false has_head=true verbose=true Threads=4 folds=3 metric=auc --output_name=model_params --indices_name=k_folds --seed=10 --include_target=True
    

    Here's my params.txt

    XgboostClassifier booster:gbtree num_round:1500 eta:0.03 max_leaves:0 gamma:.6 max_depth:8 min_child_weight:1.0 subsample:0.9 colsample_bytree:0.75 colsample_bylevel:0.85 lambda:1.0 alpha:1.0 seed:1 threads:4 bags:1 verbose:true
    SklearnknnClassifier seed:1 usedense:true use_scale:false distance:cityblock metric:uniform n_neighbors:10 thread:4 verbose:true
    KerasnnClassifier loss:categorical_crossentropy standardize:true use_log1p:true shuffle:true batch_normalization:true weight_init:lecun_uniform momentum:0.9 optimizer:sgd use_dense:true l2:0.1,0.1 hidden:30,20 activation:relu,relu droupouts:0.2,0.1 epochs:250 lr:0.01 batch_size:32 stopping_rounds:10 validation_split:0.2 seed:1 verbose:true
    SklearnknnClassifier seed:1 usedense:true use_scale:false distance:cosine metric:uniform n_neighbors:10 thread:4 verbose:true
    KerasnnClassifier loss:categorical_crossentropy standardize:true use_log1p:false shuffle:true batch_normalization:true weight_init:lecun_uniform momentum:0.9 optimizer:sgd use_dense:true l2:0.001,0.001 hidden:30,15 activation:relu,relu droupouts:0.3,0.1 epochs:250 lr:0.01 batch_size:32 stopping_rounds:10 validation_split:0.2 seed:1 verbose:true
    
    XgboostClassifier booster:gbtree num_round:1000 eta:0.03 max_leaves:0 gamma:.6 max_depth:8 min_child_weight:1.0 subsample:0.9 colsample_bytree:0.75 colsample_bylevel:0.85 lambda:1.0 alpha:1.0 seed:1 threads:4 bags:1 verbose:true
    KerasnnClassifier loss:categorical_crossentropy standardize:true use_log1p:false shuffle:true batch_normalization:true weight_init:lecun_uniform momentum:0.9 optimizer:sgd use_dense:true l2:0.001,0.001 hidden:50,25 activation:relu,relu droupouts:0.25,0.15 epochs:250 lr:0.01 batch_size:32 stopping_rounds:10 validation_split:0.2 seed:1 verbose:true
    

    I have verified that the repo is the latest version and that everything else is installed properly.

    I am on Win 10 x64, 8 GB RAM, 4 logical cores.

    The weird part is that after removing the xgb it works absolutely perfectly (I have the xgb at the end of the 2nd level).

    opened by AdityaSoni19031997 6
  • How to tune a single model?

    Hi Marios,

    Thanks for sharing StackNet, a great tool for stacking, but I'm still not clear on how to tune a single model. For example, if my parameter file is as follows:

    XgboostRegressor booster:gblinear objective:reg:linear max_leaves:0 num_round:500 eta:0.1 threads:3 gamma:1 max_depth:4 colsample_bylevel:1.0 min_child_weight:4.0 max_delta_step:0.0 subsample:0.8 colsample_bytree:0.5 scale_pos_weight:1.0 alpha:10.0 lambda:1.0 seed:1 verbose:false

    LightgbmRegressor verbose:false

    What should I put in the command line? Is it the same as this one?

    java -Xmx12048m -jar StackNet.jar train task=regression sparse=true has_head=false output_name=datasettwo model=model2 pred_file=pred2.csv train_file=dataset2_train.txt test_file=dataset2_test.txt test_target=false params=dataset2_params.txt verbose=true threads=1 metric=mae stackdata=false seed=1 folds=4 bins=3

    Thank you!

    opened by ahbon123 6
  • Fail to train a regression task?

    Exception in thread "main" java.lang.reflect.InvocationTargetException
            at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
            at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
            at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
            at java.lang.reflect.Method.invoke(Unknown Source)
            at org.eclipse.jdt.internal.jarinjarloader.JarRsrcLoader.main(JarRsrcLoader.java:58)
    Caused by: exceptions.IllegalStateException: The last layer of StackNet cannot have a classifier
            at ml.stacknet.StackNetRegressor.fit(StackNetRegressor.java:2532)
            at stacknetrun.runstacknet.main(runstacknet.java:556)
            ... 5 more

    opened by gdragone1 5
  • Question: How to tune level 2 or level 3 model?

    I know how to tune a model at level 1: just put that model on the first line of param.txt and rerun once the k-fold is finished.

    However, I don't know how to tune a model at level 2. Does anyone know?

    opened by iFe1er 5
  • Questions: Why stacknet performance bad in Zillow Competition

    Hi dear StackNet developers and the author, Mr. @kaz-Anova,

    I am new to Kaggle, and I found StackNet a COOL tool to use. However, it is not as GOOD as I expected with the default parameters. I tried tuning and adding features, but none of that has given me any improvement yet. That's why I am posting this, hoping someone can guide me on how to tune a BETTER StackNet (parameters, folds, regressors, etc.) for the Zillow Competition.

    According to the docs: https://github.com/kaz-Anova/StackNet/tree/master/example/zillow_regression_sparse

    StackNet alone achieves ONLY around 0.0647 on the LB, which is not good compared with other kernels. Some kernels achieve a high score with a single model (LightGBM alone reaches 0.0644). There are also kernels that use traditional 2-layer stacking and achieve 0.0645 (https://www.kaggle.com/wangsg/ensemble-stacking-lb-0-644/comments), which is much better than StackNet as far as the leaderboard is concerned.

    So my question is: why is StackNet not working very well on the LB in the Zillow competition?

    1. Is it a problem with the default parameters or regressors?

    If so, could @kaz-Anova please update the parameters and regressors in the example so that the performance gets better? (I tried a lot, but it did not improve.) Then more people (especially newcomers like me) would be happy to use StackNet.

    2. Is it a problem of reusable K-fold metrics?

    (As far as I tried, my 5-fold LightGBM average works much worse than my single-run LightGBM.) In my opinion, the Zillow leaderboard test data is evaluated on 2016.10 ~ 2016.12. However, there is very little data from 2016.10 ~ 2016.12 in the training set, so K-fold may be a bad way to approach the competition.
    If so, would it be possible for StackNet to support, in the future, a DIFFERENT out-of-bag scheme and not just K-fold? That way more flexible blending (i.e. dividing the data into two parts, training only on the historical data, predicting on the future data, and using the future data for stacking) or a sliding-window algorithm would be supported. (You know, especially for time-related problems, it is sometimes bad to leak the future into the past with K-fold or reusable K-fold.)
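
The "train on history, validate on the future" split described in this post could be sketched as follows. This only illustrates the idea and is not an existing StackNet option; TimeSplitSketch, split and the timestamp encoding are hypothetical.

```java
import java.util.Arrays;
import java.util.Comparator;

/** Hypothetical sketch, not a StackNet feature: train on the past, validate on the future. */
class TimeSplitSketch {

    /**
     * Sorts row indices by timestamp and returns {trainRows, validationRows},
     * where the validation rows are the most recent fraction of the data.
     */
    static int[][] split(long[] timestamps, double validationFraction) {
        Integer[] order = new Integer[timestamps.length];
        for (int i = 0; i < order.length; i++) {
            order[i] = i;
        }
        Arrays.sort(order, Comparator.comparingLong(i -> timestamps[i]));
        int cut = (int) Math.round(order.length * (1.0 - validationFraction));
        int[] train = new int[cut];
        int[] validation = new int[order.length - cut];
        for (int i = 0; i < order.length; i++) {
            if (i < cut) {
                train[i] = order[i];
            } else {
                validation[i - cut] = order[i];
            }
        }
        return new int[][] { train, validation };
    }

    public static void main(String[] args) {
        // Made-up timestamps (e.g. yyyymm) for four rows.
        long[] timestamps = { 201603L, 201601L, 201602L, 201611L };
        int[][] parts = split(timestamps, 0.25);
        System.out.println("train rows:      " + Arrays.toString(parts[0]));
        System.out.println("validation rows: " + Arrays.toString(parts[1]));
    }
}
```

No validation row is older than any training row, so the future cannot leak into the past.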

    3. Some other questions about the Zillow Competition using StackNet.

    To Mr. @kaz-Anova: sincere congratulations on the high score that you and your team achieved. However, considering the poor performance of StackNet right now, I am very curious whether you are still using StackNet in the competition as a strong predictor (rather than as a weak model for averaging).

    I apologize for my rudeness (if that is the case), and I surely know that one can achieve a better LB score just by combining with other kernels. But my question is this: the StackNet baseline is currently far behind other kernels. Are there any practical methods (or tricks) that you would like to share to make StackNet work better?

    P.S. I am now a fan of StackNet, and I want to express my gratitude to Mr. @kaz-Anova for the convenience that the powerful StackNet brings us. I wish it could be even BETTER in the future.

    Sincerely

    opened by iFe1er 5
  • Error with SklearnknnClassifier 'metric has to be between 'rbf', 'poly', 'sigmoid' or 'linear''

    I believe there is a wrong error check in the file SklearnknnClassifier.java

    		if ( !metric.equals("rbf")  && !metric.equals("poly")&& !metric.equals("sigmoid") && !metric.equals("linear") ){
    			throw new IllegalStateException(" metric has to be between 'rbf', 'poly', 'sigmoid' or 'linear'" );	
    		}
    
    		if ( !metric.equals("uniform")  && !metric.equals("distance") ){
    			throw new IllegalStateException(" metric has to be between 'uniform' or 'distance'" );	
    }
    

    I think the first metric.equals check block is incorrect and you only want the second one. I couldn't get the SklearnknnClassifier to work.
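
To see why the two quoted checks cannot both pass: every value accepted by the first one ('rbf', 'poly', 'sigmoid', 'linear') is rejected by the second one, and vice versa, so any input throws. A minimal sketch of the check as the issue author proposes it, keeping only the second block, could look like this. It is not the actual StackNet source; KnnParamCheckSketch and validateMetric are hypothetical names.

```java
/** Hypothetical sketch of the proposed check; not the actual StackNet source. */
class KnnParamCheckSketch {

    /** Accepts only the two values named in the second check of the quoted snippet. */
    static void validateMetric(String metric) {
        if (!metric.equals("uniform") && !metric.equals("distance")) {
            throw new IllegalStateException(" metric has to be between 'uniform' or 'distance'");
        }
    }

    public static void main(String[] args) {
        validateMetric("uniform"); // passes silently
        try {
            validateMetric("rbf"); // a kernel name is not a valid value here
        } catch (IllegalStateException e) {
            System.out.println("rejected:" + e.getMessage());
        }
    }
}
```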

    Also, in the docs the parameters section for KerasnnClassifier section has the wrong header (it has the header 'SklearnsvmClassifier'). There are also some typos like dropout being labeled 'droupout' and 'Toral' instead of 'Total.'

    Thanks for creating and sharing the StackNet tool.

    opened by AurelianTactics 5
  • Hyperparameter Tuning

    Thanks for the awesome package. I want to ask how hyperparameter tuning works in the K-fold paradigm. Should I use out-of-fold data to pre-tune the hyperparameters, use part of the data to pre-tune them, or do hyperparameter selection during each fold of the CV process? Thanks.

    opened by samyip123 0
  • Help please

    Exception in thread "main" java.lang.reflect.InvocationTargetException
            at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
            at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
            at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
            at java.lang.reflect.Method.invoke(Method.java:498)
            at org.eclipse.jdt.internal.jarinjarloader.JarRsrcLoader.main(JarRsrcLoader.java:58)
    Caused by: java.lang.NullPointerException
            at RO2.getro2(RO2.java:60)
            at CIRI_Full2.main(CIRI_Full2.java:272)
            ... 5 more

    opened by ishaaq34 1
  • README: Missing information

    Towards the end of the 'How Does it Work' section of the repository README, some content seems to have been trimmed off.

    In case where there are many such classifiers, the results is the scaled average of all these output predictions and can be written as:

    opened by sarthakbatragatech 0
  • Pilon terminated with

    Pilon terminated with "Exception in thread "main" java.lang.reflect.InvocationTargetException" error

    MyPassport: java -Xmx32G -jar /home/urbe/anaconda3/share/pilon-1.22-0/pilon-1.22.jar --genome MatDATA/final_edited.fa --unpaired ngmlrMAP_ONT.sorted.bam  --changes --variant --tracks --outdir seeTEST
    Pilon version 1.22 Wed Mar 15 16:38:30 2017 -0400
    Genome: MatDATA/final_edited.fa
    Fixing breaks, indels, gaps, local, snps
    Input genome size: 185688181
    Scanning BAMs
    ngmlrMAP_ONT.sorted.bam: 2280512 reads, 0 filtered, 1693834 mapped, 0 proper, 0 stray, Unpaired 100% 12353+/-8282, max 37198
    Processing i33C-assembly_contig_1030:1-1103
    unpaired ngmlrMAP_ONT.sorted.bam: coverage 108
    Total Reads: 119, Coverage: 108, minDepth: 11
    Confirmed 1024 of 1103 bases (92.84%)
    Corrected 0 snps; 0 ambiguous bases; corrected 0 small insertions totaling 0 bases, 1 small deletions totaling 1 bases
    i33C-assembly_contig_1030:1-1103 log:
    Finished processing i33C-assembly_contig_1030:1-1103
    Processing i33C-assembly_contig_118:1-596
    unpaired ngmlrMAP_ONT.sorted.bam: coverage 9
    Total Reads: 10, Coverage: 9, minDepth: 5
    Confirmed 503 of 596 bases (84.40%)
    Corrected 0 snps; 0 ambiguous bases; corrected 0 small insertions totaling 0 bases, 0 small deletions totaling 0 bases
    i33C-assembly_contig_118:1-596 log:
    Finished processing i33C-assembly_contig_118:1-596
    Processing i33C-assembly_contig_1109:1-17939
    unpaired ngmlrMAP_ONT.sorted.bam: coverage 48
    Total Reads: 220, Coverage: 48, minDepth: 5
    Confirmed 16127 of 17939 bases (89.90%)
    Corrected 0 snps; 0 ambiguous bases; corrected 0 small insertions totaling 0 bases, 26 small deletions totaling 30 bases
    # Attempting to fix local continuity breaks
    # fix break: i33C-assembly_contig_1109:472-488 0 -0 +0 NoSolution
    # fix break: i33C-assembly_contig_1109:8173 0 -0 +0 NoSolution
    # fix break: i33C-assembly_contig_1109:9619 0 -0 +0 NoSolution
    # fix break: i33C-assembly_contig_1109:9929 0 -0 +0 NoSolution
    # fix break: i33C-assembly_contig_1109:10933-12321 0 -0 +0 NoSolution
    # fix break: i33C-assembly_contig_1109:14874-14875 0 -0 +0 NoSolution
    # fix break: i33C-assembly_contig_1109:15937 0 -0 +0 NoSolution
    i33C-assembly_contig_1109:1-17939 log:
    Finished processing i33C-assembly_contig_1109:1-17939
    Processing i33C-assembly_contig_1108:1-2195
    unpaired ngmlrMAP_ONT.sorted.bam: coverage 66
    Total Reads: 148, Coverage: 66, minDepth: 7
    Confirmed 1860 of 2195 bases (84.74%)
    Corrected 2 snps; 0 ambiguous bases; corrected 2 small insertions totaling 2 bases, 7 small deletions totaling 9 bases
    i33C-assembly_contig_1108:1-2195 log:
    Finished processing i33C-assembly_contig_1108:1-2195
    Processing i33C-assembly_contig_1088:1-1151
    unpaired ngmlrMAP_ONT.sorted.bam: coverage 124
    Total Reads: 222, Coverage: 124, minDepth: 12
    Confirmed 1058 of 1151 bases (91.92%)
    Corrected 1 snps; 0 ambiguous bases; corrected 0 small insertions totaling 0 bases, 2 small deletions totaling 2 bases
    i33C-assembly_contig_1088:1-1151 log:
    Finished processing i33C-assembly_contig_1088:1-1151
    Processing i33C-assembly_contig_584:1-214
    unpaired ngmlrMAP_ONT.sorted.bam: coverage 8
    Total Reads: 8, Coverage: 8, minDepth: 5
    Confirmed 158 of 214 bases (73.83%)
    Corrected 0 snps; 0 ambiguous bases; corrected 0 small insertions totaling 0 bases, 0 small deletions totaling 0 bases
    i33C-assembly_contig_584:1-214 log:
    Finished processing i33C-assembly_contig_584:1-214
    Processing i33C-assembly_contig_589:1-6076
    unpaired ngmlrMAP_ONT.sorted.bam: coverage 34
    Total Reads: 64, Coverage: 34, minDepth: 5
    Confirmed 5750 of 6076 bases (94.63%)
    Corrected 0 snps; 0 ambiguous bases; corrected 0 small insertions totaling 0 bases, 2 small deletions totaling 2 bases
    i33C-assembly_contig_589:1-6076 log:
    Finished processing i33C-assembly_contig_589:1-6076
    Processing i33C-assembly_contig_999:1-2638
    unpaired ngmlrMAP_ONT.sorted.bam: coverage 15
    Total Reads: 17, Coverage: 15, minDepth: 5
    Confirmed 2327 of 2638 bases (88.21%)
    Corrected 0 snps; 0 ambiguous bases; corrected 0 small insertions totaling 0 bases, 9 small deletions totaling 10 bases
    # Attempting to fix local continuity breaks
    # fix break: i33C-assembly_contig_999:169-170 0 -0 +0 NoSolution
    i33C-assembly_contig_999:1-2638 log:
    Finished processing i33C-assembly_contig_999:1-2638
    Processing i33C-assembly_contig_119:1-3737
    unpaired ngmlrMAP_ONT.sorted.bam: coverage 46
    Total Reads: 117, Coverage: 46, minDepth: 5
    Confirmed 3188 of 3737 bases (85.31%)
    Corrected 0 snps; 0 ambiguous bases; corrected 0 small insertions totaling 0 bases, 3 small deletions totaling 6 bases
    i33C-assembly_contig_119:1-3737 log:
    Finished processing i33C-assembly_contig_119:1-3737
    Processing i33C-assembly_contig_628:1-1946
    unpaired ngmlrMAP_ONT.sorted.bam: coverage 22
    Total Reads: 33, Coverage: 22, minDepth: 5
    Confirmed 1593 of 1946 bases (81.86%)
    Corrected 3 snps; 0 ambiguous bases; corrected 0 small insertions totaling 0 bases, 4 small deletions totaling 7 bases
    # Attempting to fix local continuity breaks
    # fix break: i33C-assembly_contig_628:888 0 -0 +0 NoSolution
    i33C-assembly_contig_628:1-1946 log:
    Finished processing i33C-assembly_contig_628:1-1946
    Processing i33C-assembly_contig_729:1-520
    unpaired ngmlrMAP_ONT.sorted.bam: coverage 32
    Total Reads: 34, Coverage: 32, minDepth: 5
    Confirmed 471 of 520 bases (90.58%)
    Corrected 0 snps; 0 ambiguous bases; corrected 0 small insertions totaling 0 bases, 0 small deletions totaling 0 bases
    i33C-assembly_contig_729:1-520 log:
    Finished processing i33C-assembly_contig_729:1-520
    Processing i33C-assembly_contig_728:1-729
    unpaired ngmlrMAP_ONT.sorted.bam: coverage 4
    Total Reads: 8, Coverage: 4, minDepth: 5
    Confirmed 266 of 729 bases (36.49%)
    Corrected 0 snps; 0 ambiguous bases; corrected 0 small insertions totaling 0 bases, 1 small deletions totaling 1 bases
    i33C-assembly_contig_728:1-729 log:
    Finished processing i33C-assembly_contig_728:1-729
    Processing i33C-assembly_contig_727:1-767
    unpaired ngmlrMAP_ONT.sorted.bam: coverage 31
    Total Reads: 35, Coverage: 31, minDepth: 5
    Confirmed 724 of 767 bases (94.39%)
    Corrected 0 snps; 0 ambiguous bases; corrected 0 small insertions totaling 0 bases, 0 small deletions totaling 0 bases
    i33C-assembly_contig_727:1-767 log:
    Finished processing i33C-assembly_contig_727:1-767
    Processing i33C-assembly_contig_726:1-1512
    unpaired ngmlrMAP_ONT.sorted.bam: coverage 41
    Total Reads: 44, Coverage: 41, minDepth: 5
    Confirmed 1410 of 1512 bases (93.25%)
    Corrected 0 snps; 0 ambiguous bases; corrected 0 small insertions totaling 0 bases, 0 small deletions totaling 0 bases
    i33C-assembly_contig_726:1-1512 log:
    Finished processing i33C-assembly_contig_726:1-1512
    Processing i33C-assembly_contig_725:1-1512
    unpaired ngmlrMAP_ONT.sorted.bam: coverage 36
    Total Reads: 40, Coverage: 36, minDepth: 5
    Confirmed 1358 of 1512 bases (89.81%)
    Corrected 0 snps; 0 ambiguous bases; corrected 0 small insertions totaling 0 bases, 1 small deletions totaling 1 bases
    i33C-assembly_contig_725:1-1512 log:
    Finished processing i33C-assembly_contig_725:1-1512
    Processing i33C-assembly_contig_724:1-522
    unpaired ngmlrMAP_ONT.sorted.bam: coverage 0
    Total Reads: 0, Coverage: 0, minDepth: 5
    Confirmed 0 of 522 bases (0.00%)
    Corrected 0 snps; 0 ambiguous bases; corrected 0 small insertions totaling 0 bases, 0 small deletions totaling 0 bases
    i33C-assembly_contig_724:1-522 log:
    Finished processing i33C-assembly_contig_724:1-522
    Processing i33C-assembly_contig_723:1-9230
    unpaired ngmlrMAP_ONT.sorted.bam: coverage 27
    Total Reads: 57, Coverage: 27, minDepth: 5
    Confirmed 8511 of 9230 bases (92.21%)
    Corrected 0 snps; 0 ambiguous bases; corrected 0 small insertions totaling 0 bases, 7 small deletions totaling 9 bases
    # Attempting to fix local continuity breaks
    # fix break: i33C-assembly_contig_723:5069 0 -0 +0 NoSolution
    # fix break: i33C-assembly_contig_723:7143 0 -0 +0 NoSolution
    i33C-assembly_contig_723:1-9230 log:
    Finished processing i33C-assembly_contig_723:1-9230
    Processing i33C-assembly_contig_722:1-1663
    unpaired ngmlrMAP_ONT.sorted.bam: coverage 51
    Total Reads: 60, Coverage: 51, minDepth: 5
    Confirmed 1515 of 1663 bases (91.10%)
    Corrected 0 snps; 0 ambiguous bases; corrected 0 small insertions totaling 0 bases, 1 small deletions totaling 1 bases
    i33C-assembly_contig_722:1-1663 log:
    Finished processing i33C-assembly_contig_722:1-1663
    Processing i33C-assembly_contig_721:1-2061
    unpaired ngmlrMAP_ONT.sorted.bam: coverage 5
    Total Reads: 29, Coverage: 5, minDepth: 5
    Confirmed 744 of 2061 bases (36.10%)
    Corrected 0 snps; 0 ambiguous bases; corrected 3 small insertions totaling 5 bases, 9 small deletions totaling 38 bases
    i33C-assembly_contig_721:1-2061 log:
    Finished processing i33C-assembly_contig_721:1-2061
    Processing i33C-assembly_contig_720:1-636
    unpaired ngmlrMAP_ONT.sorted.bam: coverage 25
    Total Reads: 27, Coverage: 25, minDepth: 5
    Confirmed 566 of 636 bases (88.99%)
    Corrected 1 snps; 0 ambiguous bases; corrected 0 small insertions totaling 0 bases, 1 small deletions totaling 1 bases
    i33C-assembly_contig_720:1-636 log:
    Finished processing i33C-assembly_contig_720:1-636
    Processing i33C-assembly_contig_622:1-435
    unpaired ngmlrMAP_ONT.sorted.bam: coverage 12
    Total Reads: 21, Coverage: 12, minDepth: 5
    Confirmed 306 of 435 bases (70.34%)
    Corrected 0 snps; 0 ambiguous bases; corrected 0 small insertions totaling 0 bases, 0 small deletions totaling 0 bases
    i33C-assembly_contig_622:1-435 log:
    Finished processing i33C-assembly_contig_622:1-435
    Processing i33C-assembly_contig_623:1-545
    unpaired ngmlrMAP_ONT.sorted.bam: coverage 7
    Total Reads: 8, Coverage: 7, minDepth: 5
    Confirmed 446 of 545 bases (81.83%)
    Corrected 0 snps; 0 ambiguous bases; corrected 1 small insertions totaling 1 bases, 1 small deletions totaling 1 bases
    i33C-assembly_contig_623:1-545 log:
    Finished processing i33C-assembly_contig_623:1-545
    Processing i33C-assembly_contig_620:1-191
    unpaired ngmlrMAP_ONT.sorted.bam: coverage 0
    Total Reads: 0, Coverage: 0, minDepth: 5
    Confirmed 0 of 191 bases (0.00%)
    Corrected 0 snps; 0 ambiguous bases; corrected 0 small insertions totaling 0 bases, 0 small deletions totaling 0 bases
    i33C-assembly_contig_620:1-191 log:
    Finished processing i33C-assembly_contig_620:1-191
    Processing i33C-assembly_contig_548:1-760
    unpaired ngmlrMAP_ONT.sorted.bam: coverage 1
    Total Reads: 4, Coverage: 1, minDepth: 5
    Confirmed 0 of 760 bases (0.00%)
    Corrected 0 snps; 0 ambiguous bases; corrected 0 small insertions totaling 0 bases, 0 small deletions totaling 0 bases
    i33C-assembly_contig_548:1-760 log:
    Finished processing i33C-assembly_contig_548:1-760
    Processing i33C-assembly_contig_626:1-413
    unpaired ngmlrMAP_ONT.sorted.bam: coverage 3
    Total Reads: 3, Coverage: 3, minDepth: 5
    Confirmed 0 of 413 bases (0.00%)
    Corrected 0 snps; 0 ambiguous bases; corrected 0 small insertions totaling 0 bases, 0 small deletions totaling 0 bases
    i33C-assembly_contig_626:1-413 log:
    Finished processing i33C-assembly_contig_626:1-413
    Processing i33C-assembly_contig_627:1-982
    unpaired ngmlrMAP_ONT.sorted.bam: coverage 14
    Total Reads: 16, Coverage: 14, minDepth: 5
    Confirmed 825 of 982 bases (84.01%)
    Corrected 0 snps; 0 ambiguous bases; corrected 0 small insertions totaling 0 bases, 1 small deletions totaling 1 bases
    i33C-assembly_contig_627:1-982 log:
    Finished processing i33C-assembly_contig_627:1-982
    Processing i33C-assembly_contig_624:1-2861
    unpaired ngmlrMAP_ONT.sorted.bam: coverage 386
    Total Reads: 532, Coverage: 386, minDepth: 39
    Confirmed 2683 of 2861 bases (93.78%)
    Corrected 1 snps; 0 ambiguous bases; corrected 0 small insertions totaling 0 bases, 1 small deletions totaling 1 bases
    i33C-assembly_contig_624:1-2861 log:
    Finished processing i33C-assembly_contig_624:1-2861
    Processing i33C-assembly_contig_625:1-1428
    unpaired ngmlrMAP_ONT.sorted.bam: coverage 33
    Total Reads: 49, Coverage: 33, minDepth: 5
    Confirmed 1299 of 1428 bases (90.97%)
    Corrected 0 snps; 0 ambiguous bases; corrected 0 small insertions totaling 0 bases, 2 small deletions totaling 3 bases
    i33C-assembly_contig_625:1-1428 log:
    Finished processing i33C-assembly_contig_625:1-1428
    Processing i33C-assembly_contig_543:1-330
    unpaired ngmlrMAP_ONT.sorted.bam: coverage 4
    Total Reads: 5, Coverage: 4, minDepth: 5
    Confirmed 108 of 330 bases (32.73%)
    Corrected 0 snps; 0 ambiguous bases; corrected 1 small insertions totaling 1 bases, 0 small deletions totaling 0 bases
    i33C-assembly_contig_543:1-330 log:
    Finished processing i33C-assembly_contig_543:1-330
    Processing i33C-assembly_contig_542:1-1529
    unpaired ngmlrMAP_ONT.sorted.bam: coverage 22
    Total Reads: 35, Coverage: 22, minDepth: 5
    Confirmed 1347 of 1529 bases (88.10%)
    Corrected 0 snps; 0 ambiguous bases; corrected 0 small insertions totaling 0 bases, 1 small deletions totaling 1 bases
    i33C-assembly_contig_542:1-1529 log:
    Finished processing i33C-assembly_contig_542:1-1529
    Processing i33C-assembly_contig_541:1-1367
    unpaired ngmlrMAP_ONT.sorted.bam: coverage 5
    Total Reads: 5, Coverage: 5, minDepth: 5
    Confirmed 783 of 1367 bases (57.28%)
    Corrected 1 snps; 0 ambiguous bases; corrected 8 small insertions totaling 9 bases, 9 small deletions totaling 12 bases
    i33C-assembly_contig_541:1-1367 log:
    Finished processing i33C-assembly_contig_541:1-1367
    Processing i33C-assembly_contig_540:1-937
    unpaired ngmlrMAP_ONT.sorted.bam: coverage 0
    Total Reads: 1, Coverage: 0, minDepth: 5
    Confirmed 0 of 937 bases (0.00%)
    Corrected 0 snps; 0 ambiguous bases; corrected 0 small insertions totaling 0 bases, 0 small deletions totaling 0 bases
    i33C-assembly_contig_540:1-937 log:
    Finished processing i33C-assembly_contig_540:1-937
    Processing i33C-assembly_contig_547:1-620
    unpaired ngmlrMAP_ONT.sorted.bam: coverage 0
    Total Reads: 0, Coverage: 0, minDepth: 5
    Confirmed 0 of 620 bases (0.00%)
    Corrected 0 snps; 0 ambiguous bases; corrected 0 small insertions totaling 0 bases, 0 small deletions totaling 0 bases
    i33C-assembly_contig_547:1-620 log:
    Finished processing i33C-assembly_contig_547:1-620
    Processing i33C-assembly_contig_546:1-1197
    unpaired ngmlrMAP_ONT.sorted.bam: coverage 8
    Total Reads: 9, Coverage: 8, minDepth: 5
    Confirmed 1036 of 1197 bases (86.55%)
    Corrected 0 snps; 0 ambiguous bases; corrected 2 small insertions totaling 2 bases, 5 small deletions totaling 6 bases
    # Attempting to fix local continuity breaks
    # fix break: i33C-assembly_contig_546:423-623 0 -0 +0 NoSolution
    i33C-assembly_contig_546:1-1197 log:
    Finished processing i33C-assembly_contig_546:1-1197
    Processing i33C-assembly_contig_545:1-412
    unpaired ngmlrMAP_ONT.sorted.bam: coverage 12
    Total Reads: 13, Coverage: 12, minDepth: 5
    Confirmed 343 of 412 bases (83.25%)
    Corrected 0 snps; 0 ambiguous bases; corrected 0 small insertions totaling 0 bases, 0 small deletions totaling 0 bases
    i33C-assembly_contig_545:1-412 log:
    Finished processing i33C-assembly_contig_545:1-412
    Processing i33C-assembly_contig_544:1-538
    unpaired ngmlrMAP_ONT.sorted.bam: coverage 11
    Total Reads: 20, Coverage: 11, minDepth: 5
    Confirmed 477 of 538 bases (88.66%)
    Corrected 0 snps; 0 ambiguous bases; corrected 0 small insertions totaling 0 bases, 0 small deletions totaling 0 bases
    i33C-assembly_contig_544:1-538 log:
    Finished processing i33C-assembly_contig_544:1-538
    Processing i33C-assembly_contig_1036:1-819
    unpaired ngmlrMAP_ONT.sorted.bam: coverage 2
    Total Reads: 5, Coverage: 2, minDepth: 5
    Confirmed 0 of 819 bases (0.00%)
    Corrected 0 snps; 0 ambiguous bases; corrected 0 small insertions totaling 0 bases, 0 small deletions totaling 0 bases
    i33C-assembly_contig_1036:1-819 log:
    Finished processing i33C-assembly_contig_1036:1-819
    Processing i33C-assembly_contig_990:1-350
    unpaired ngmlrMAP_ONT.sorted.bam: coverage 0
    Total Reads: 0, Coverage: 0, minDepth: 5
    Confirmed 0 of 350 bases (0.00%)
    Corrected 0 snps; 0 ambiguous bases; corrected 0 small insertions totaling 0 bases, 0 small deletions totaling 0 bases
    i33C-assembly_contig_990:1-350 log:
    Finished processing i33C-assembly_contig_990:1-350
    Processing i33C-assembly_contig_994:1-1989
    unpaired ngmlrMAP_ONT.sorted.bam: coverage 72
    Total Reads: 103, Coverage: 72, minDepth: 7
    Confirmed 1884 of 1989 bases (94.72%)
    Corrected 0 snps; 0 ambiguous bases; corrected 0 small insertions totaling 0 bases, 0 small deletions totaling 0 bases
    # Attempting to fix local continuity breaks
    # fix break: i33C-assembly_contig_994:525 0 -0 +0 NoSolution
    i33C-assembly_contig_994:1-1989 log:
    Finished processing i33C-assembly_contig_994:1-1989
    Processing i33C-assembly_contig_858:1-3081
    unpaired ngmlrMAP_ONT.sorted.bam: coverage 97
    Total Reads: 129, Coverage: 97, minDepth: 10
    Confirmed 2811 of 3081 bases (91.24%)
    Corrected 0 snps; 0 ambiguous bases; corrected 0 small insertions totaling 0 bases, 2 small deletions totaling 52 bases
    i33C-assembly_contig_858:1-3081 log:
    Finished processing i33C-assembly_contig_858:1-3081
    Processing i33C-assembly_contig_1071:1-1098
    unpaired ngmlrMAP_ONT.sorted.bam: coverage 50
    Total Reads: 97, Coverage: 50, minDepth: 5
    Confirmed 917 of 1098 bases (83.52%)
    Corrected 0 snps; 0 ambiguous bases; corrected 0 small insertions totaling 0 bases, 0 small deletions totaling 0 bases
    i33C-assembly_contig_1071:1-1098 log:
    Finished processing i33C-assembly_contig_1071:1-1098
    Processing i33C-assembly_contig_1116:1-28302
    unpaired ngmlrMAP_ONT.sorted.bam: coverage 47
    Total Reads: 247, Coverage: 47, minDepth: 5
    Confirmed 25358 of 28302 bases (89.60%)
    Corrected 0 snps; 0 ambiguous bases; corrected 0 small insertions totaling 0 bases, 16 small deletions totaling 17 bases
    # Attempting to fix local continuity breaks
    # fix break: i33C-assembly_contig_1116:1726-1799 0 -0 +0 NoSolution
    # fix break: i33C-assembly_contig_1116:2959-2995 0 -0 +0 NoSolution
    # fix break: i33C-assembly_contig_1116:19433-19525 0 -0 +0 NoSolution
    # fix break: i33C-assembly_contig_1116:21118 0 -0 +0 NoSolution
    # fix break: i33C-assembly_contig_1116:23451 0 -0 +0 NoSolution
    i33C-assembly_contig_1116:1-28302 log:
    Finished processing i33C-assembly_contig_1116:1-28302
    Processing i33C-assembly_contig_992:1-583
    unpaired ngmlrMAP_ONT.sorted.bam: coverage 5
    Total Reads: 5, Coverage: 5, minDepth: 5
    Confirmed 368 of 583 bases (63.12%)
    Corrected 0 snps; 0 ambiguous bases; corrected 1 small insertions totaling 2 bases, 4 small deletions totaling 8 bases
    i33C-assembly_contig_992:1-583 log:
    Finished processing i33C-assembly_contig_992:1-583
    Processing i33C-assembly_contig_1070:1-1369
    unpaired ngmlrMAP_ONT.sorted.bam: coverage 57
    Total Reads: 73, Coverage: 57, minDepth: 6
    Confirmed 1302 of 1369 bases (95.11%)
    Corrected 0 snps; 0 ambiguous bases; corrected 0 small insertions totaling 0 bases, 2 small deletions totaling 3 bases
    i33C-assembly_contig_1070:1-1369 log:
    Finished processing i33C-assembly_contig_1070:1-1369
    Processing i33C-assembly_contig_938:1-793
    unpaired ngmlrMAP_ONT.sorted.bam: coverage 69
    Total Reads: 73, Coverage: 69, minDepth: 7
    Confirmed 766 of 793 bases (96.60%)
    Corrected 0 snps; 0 ambiguous bases; corrected 0 small insertions totaling 0 bases, 1 small deletions totaling 1 bases
    i33C-assembly_contig_938:1-793 log:
    Finished processing i33C-assembly_contig_938:1-793
    Processing i33C-assembly_contig_993:1-12973
    unpaired ngmlrMAP_ONT.sorted.bam: coverage 27
    Total Reads: 102, Coverage: 27, minDepth: 5
    Confirmed 11105 of 12973 bases (85.60%)
    Corrected 0 snps; 0 ambiguous bases; corrected 0 small insertions totaling 0 bases, 5 small deletions totaling 6 bases
    # Attempting to fix local continuity breaks
    # fix break: i33C-assembly_contig_993:4168-4870 0 -0 +0 NoSolution
    # fix break: i33C-assembly_contig_993:5074 0 -0 +0 NoSolution
    # fix break: i33C-assembly_contig_993:6508 0 -0 +0 NoSolution
    i33C-assembly_contig_993:1-12973 log:
    Finished processing i33C-assembly_contig_993:1-12973
    Processing i33C-assembly_contig_1081:1-1876
    unpaired ngmlrMAP_ONT.sorted.bam: coverage 11
    Total Reads: 14, Coverage: 11, minDepth: 5
    Confirmed 1413 of 1876 bases (75.32%)
    Corrected 0 snps; 0 ambiguous bases; corrected 0 small insertions totaling 0 bases, 7 small deletions totaling 158 bases
    i33C-assembly_contig_1081:1-1876 log:
    Finished processing i33C-assembly_contig_1081:1-1876
    Processing i33C-assembly_contig_587:1-691
    unpaired ngmlrMAP_ONT.sorted.bam: coverage 4
    Total Reads: 8, Coverage: 4, minDepth: 5
    Confirmed 275 of 691 bases (39.80%)
    Corrected 2 snps; 0 ambiguous bases; corrected 0 small insertions totaling 0 bases, 3 small deletions totaling 5 bases
    i33C-assembly_contig_587:1-691 log:
    Finished processing i33C-assembly_contig_587:1-691
    Processing i33C-assembly_contig_854:1-1517
    unpaired ngmlrMAP_ONT.sorted.bam: coverage 36
    Total Reads: 46, Coverage: 36, minDepth: 5
    Confirmed 1408 of 1517 bases (92.81%)
    Corrected 0 snps; 0 ambiguous bases; corrected 0 small insertions totaling 0 bases, 1 small deletions totaling 1 bases
    i33C-assembly_contig_854:1-1517 log:
    Finished processing i33C-assembly_contig_854:1-1517
    Processing i33C-assembly_contig_857:1-1260
    unpaired ngmlrMAP_ONT.sorted.bam: coverage 7
    Total Reads: 13, Coverage: 7, minDepth: 5
    Confirmed 543 of 1260 bases (43.10%)
    Corrected 0 snps; 0 ambiguous bases; corrected 0 small insertions totaling 0 bases, 1 small deletions totaling 1 bases
    i33C-assembly_contig_857:1-1260 log:
    Finished processing i33C-assembly_contig_857:1-1260
    Processing i33C-assembly_contig_437:1-244
    unpaired ngmlrMAP_ONT.sorted.bam: coverage 0
    Total Reads: 0, Coverage: 0, minDepth: 5
    Confirmed 0 of 244 bases (0.00%)
    Corrected 0 snps; 0 ambiguous bases; corrected 0 small insertions totaling 0 bases, 0 small deletions totaling 0 bases
    i33C-assembly_contig_437:1-244 log:
    Finished processing i33C-assembly_contig_437:1-244
    Processing i33C-assembly_contig_436:1-832
    unpaired ngmlrMAP_ONT.sorted.bam: coverage 23
    Total Reads: 48, Coverage: 23, minDepth: 5
    Confirmed 485 of 832 bases (58.29%)
    Corrected 0 snps; 0 ambiguous bases; corrected 0 small insertions totaling 0 bases, 2 small deletions totaling 2 bases
    i33C-assembly_contig_436:1-832 log:
    Finished processing i33C-assembly_contig_436:1-832
    Processing i33C-assembly_contig_435:1-3664
    unpaired ngmlrMAP_ONT.sorted.bam: coverage 24
    Total Reads: 42, Coverage: 24, minDepth: 5
    Confirmed 3363 of 3664 bases (91.78%)
    Corrected 0 snps; 0 ambiguous bases; corrected 0 small insertions totaling 0 bases, 7 small deletions totaling 8 bases
    # Attempting to fix local continuity breaks
    # fix break: i33C-assembly_contig_435:2602 0 -0 +0 NoSolution
    # fix break: i33C-assembly_contig_435:3433 0 -0 +0 NoSolution
    i33C-assembly_contig_435:1-3664 log:
    Finished processing i33C-assembly_contig_435:1-3664
    Processing i33C-assembly_contig_434:1-1274
    unpaired ngmlrMAP_ONT.sorted.bam: coverage 40
    Total Reads: 65, Coverage: 40, minDepth: 5
    Confirmed 1134 of 1274 bases (89.01%)
    Corrected 0 snps; 0 ambiguous bases; corrected 0 small insertions totaling 0 bases, 4 small deletions totaling 6 bases
    i33C-assembly_contig_434:1-1274 log:
    Finished processing i33C-assembly_contig_434:1-1274
    Processing i33C-assembly_contig_349:1-303
    unpaired ngmlrMAP_ONT.sorted.bam: coverage 21
    Total Reads: 22, Coverage: 21, minDepth: 5
    Confirmed 277 of 303 bases (91.42%)
    Corrected 0 snps; 0 ambiguous bases; corrected 0 small insertions totaling 0 bases, 0 small deletions totaling 0 bases
    i33C-assembly_contig_349:1-303 log:
    Finished processing i33C-assembly_contig_349:1-303
    Processing i33C-assembly_contig_348:1-991
    unpaired ngmlrMAP_ONT.sorted.bam: coverage 1
    Total Reads: 1, Coverage: 1, minDepth: 5
    Confirmed 0 of 991 bases (0.00%)
    Corrected 0 snps; 0 ambiguous bases; corrected 0 small insertions totaling 0 bases, 0 small deletions totaling 0 bases
    i33C-assembly_contig_348:1-991 log:
    Finished processing i33C-assembly_contig_348:1-991
    Processing i33C-assembly_contig_431:1-967
    unpaired ngmlrMAP_ONT.sorted.bam: coverage 7
    Total Reads: 9, Coverage: 7, minDepth: 5
    Confirmed 724 of 967 bases (74.87%)
    Corrected 1 snps; 0 ambiguous bases; corrected 2 small insertions totaling 32 bases, 4 small deletions totaling 5 bases
    # Attempting to fix local continuity breaks
    # fix break: i33C-assembly_contig_431:580 0 -0 +0 NoSolution
    i33C-assembly_contig_431:1-967 log:
    Finished processing i33C-assembly_contig_431:1-967
    Processing i33C-assembly_contig_430:1-922
    unpaired ngmlrMAP_ONT.sorted.bam: coverage 5
    Total Reads: 7, Coverage: 5, minDepth: 5
    Confirmed 254 of 922 bases (27.55%)
    Corrected 0 snps; 0 ambiguous bases; corrected 0 small insertions totaling 0 bases, 1 small deletions totaling 1 bases
    i33C-assembly_contig_430:1-922 log:
    Finished processing i33C-assembly_contig_430:1-922
    Processing i33C-assembly_contig_345:1-2106
    unpaired ngmlrMAP_ONT.sorted.bam: coverage 19
    Total Reads: 48, Coverage: 19, minDepth: 5
    Confirmed 1748 of 2106 bases (83.00%)
    Corrected 1 snps; 0 ambiguous bases; corrected 1 small insertions totaling 40 bases, 5 small deletions totaling 6 bases
    # Attempting to fix local continuity breaks
    # fix break: i33C-assembly_contig_345:1627-1744 0 -0 +0 NoSolution
    i33C-assembly_contig_345:1-2106 log:
    Finished processing i33C-assembly_contig_345:1-2106
    Processing i33C-assembly_contig_344:1-943
    unpaired ngmlrMAP_ONT.sorted.bam: coverage 13
    Total Reads: 27, Coverage: 13, minDepth: 5
    Confirmed 780 of 943 bases (82.71%)
    Corrected 0 snps; 0 ambiguous bases; corrected 0 small insertions totaling 0 bases, 0 small deletions totaling 0 bases
    i33C-assembly_contig_344:1-943 log:
    Finished processing i33C-assembly_contig_344:1-943
    Processing i33C-assembly_contig_347:1-1193
    unpaired ngmlrMAP_ONT.sorted.bam: coverage 19
    Total Reads: 20, Coverage: 19, minDepth: 5
    Confirmed 1080 of 1193 bases (90.53%)
    Corrected 0 snps; 0 ambiguous bases; corrected 0 small insertions totaling 0 bases, 1 small deletions totaling 1 bases
    i33C-assembly_contig_347:1-1193 log:
    Finished processing i33C-assembly_contig_347:1-1193
    Processing i33C-assembly_contig_346:1-813
    unpaired ngmlrMAP_ONT.sorted.bam: coverage 21
    Total Reads: 23, Coverage: 21, minDepth: 5
    Confirmed 721 of 813 bases (88.68%)
    Corrected 0 snps; 0 ambiguous bases; corrected 0 small insertions totaling 0 bases, 0 small deletions totaling 0 bases
    i33C-assembly_contig_346:1-813 log:
    Finished processing i33C-assembly_contig_346:1-813
    Processing i33C-assembly_contig_341:1-1039
    unpaired ngmlrMAP_ONT.sorted.bam: coverage 1
    Total Reads: 3, Coverage: 1, minDepth: 5
    Confirmed 0 of 1039 bases (0.00%)
    Corrected 0 snps; 0 ambiguous bases; corrected 0 small insertions totaling 0 bases, 0 small deletions totaling 0 bases
    i33C-assembly_contig_341:1-1039 log:
    Finished processing i33C-assembly_contig_341:1-1039
    Processing i33C-assembly_contig_340:1-1039
    unpaired ngmlrMAP_ONT.sorted.bam: coverage 0
    Total Reads: 1, Coverage: 0, minDepth: 5
    Confirmed 0 of 1039 bases (0.00%)
    Corrected 0 snps; 0 ambiguous bases; corrected 0 small insertions totaling 0 bases, 0 small deletions totaling 0 bases
    i33C-assembly_contig_340:1-1039 log:
    Finished processing i33C-assembly_contig_340:1-1039
    Processing i33C-assembly_contig_343:1-83
    unpaired ngmlrMAP_ONT.sorted.bam: coverage 0
    Total Reads: 0, Coverage: 0, minDepth: 5
    Confirmed 0 of 83 bases (0.00%)
    Corrected 0 snps; 0 ambiguous bases; corrected 0 small insertions totaling 0 bases, 0 small deletions totaling 0 bases
    i33C-assembly_contig_343:1-83 log:
    Finished processing i33C-assembly_contig_343:1-83
    Processing i33C-assembly_contig_342:1-4207
    unpaired ngmlrMAP_ONT.sorted.bam: coverage 26
    Total Reads: 57, Coverage: 26, minDepth: 5
    Confirmed 3676 of 4207 bases (87.38%)
    Corrected 0 snps; 0 ambiguous bases; corrected 0 small insertions totaling 0 bases, 9 small deletions totaling 11 bases
    # Attempting to fix local continuity breaks
    # fix break: i33C-assembly_contig_342:2586-2587 0 -0 +0 NoSolution
    i33C-assembly_contig_342:1-4207 log:
    Finished processing i33C-assembly_contig_342:1-4207
    Processing i33C-assembly_contig_262:1-228
    unpaired ngmlrMAP_ONT.sorted.bam: coverage 7
    Total Reads: 9, Coverage: 7, minDepth: 5
    Confirmed 147 of 228 bases (64.47%)
    Corrected 0 snps; 0 ambiguous bases; corrected 0 small insertions totaling 0 bases, 0 small deletions totaling 0 bases
    i33C-assembly_contig_262:1-228 log:
    Finished processing i33C-assembly_contig_262:1-228
    Processing i33C-assembly_contig_263:1-965
    unpaired ngmlrMAP_ONT.sorted.bam: coverage 4
    Total Reads: 4, Coverage: 4, minDepth: 5
    Confirmed 15 of 965 bases (1.55%)
    Corrected 0 snps; 0 ambiguous bases; corrected 0 small insertions totaling 0 bases, 0 small deletions totaling 0 bases
    i33C-assembly_contig_263:1-965 log:
    Finished processing i33C-assembly_contig_263:1-965
    Processing i33C-assembly_contig_189:1-165
    unpaired ngmlrMAP_ONT.sorted.bam: coverage 0
    Total Reads: 0, Coverage: 0, minDepth: 5
    Confirmed 0 of 165 bases (0.00%)
    Corrected 0 snps; 0 ambiguous bases; corrected 0 small insertions totaling 0 bases, 0 small deletions totaling 0 bases
    i33C-assembly_contig_189:1-165 log:
    Finished processing i33C-assembly_contig_189:1-165
    Processing i33C-assembly_contig_188:1-710
    unpaired ngmlrMAP_ONT.sorted.bam: coverage 29
    Total Reads: 43, Coverage: 29, minDepth: 5
    Confirmed 666 of 710 bases (93.80%)
    Corrected 0 snps; 0 ambiguous bases; corrected 0 small insertions totaling 0 bases, 0 small deletions totaling 0 bases
    i33C-assembly_contig_188:1-710 log:
    Finished processing i33C-assembly_contig_188:1-710
    Processing i33C-assembly_contig_266:1-4976
    unpaired ngmlrMAP_ONT.sorted.bam: coverage 21
    Total Reads: 53, Coverage: 21, minDepth: 5
    Confirmed 3851 of 4976 bases (77.39%)
    Corrected 133 snps; 6 ambiguous bases; corrected 14 small insertions totaling 16 bases, 20 small deletions totaling 24 bases
    # Attempting to fix local continuity breaks
    # fix break: i33C-assembly_contig_266:624-679 0 -0 +0 NoSolution
    # fix break: i33C-assembly_contig_266:3428-3951 0 -0 +0 NoSolution
    i33C-assembly_contig_266:1-4976 log:
    Finished processing i33C-assembly_contig_266:1-4976
    Processing i33C-assembly_contig_267:1-962
    unpaired ngmlrMAP_ONT.sorted.bam: coverage 0
    Total Reads: 0, Coverage: 0, minDepth: 5
    Confirmed 0 of 962 bases (0.00%)
    Corrected 0 snps; 0 ambiguous bases; corrected 0 small insertions totaling 0 bases, 0 small deletions totaling 0 bases
    i33C-assembly_contig_267:1-962 log:
    Finished processing i33C-assembly_contig_267:1-962
    Processing i33C-assembly_contig_264:1-1213
    unpaired ngmlrMAP_ONT.sorted.bam: coverage 23
    Total Reads: 31, Coverage: 23, minDepth: 5
    Confirmed 1092 of 1213 bases (90.02%)
    Corrected 0 snps; 0 ambiguous bases; corrected 0 small insertions totaling 0 bases, 3 small deletions totaling 4 bases
    i33C-assembly_contig_264:1-1213 log:
    Finished processing i33C-assembly_contig_264:1-1213
    Processing i33C-assembly_contig_265:1-962
    unpaired ngmlrMAP_ONT.sorted.bam: coverage 0
    Total Reads: 0, Coverage: 0, minDepth: 5
    Confirmed 0 of 962 bases (0.00%)
    Corrected 0 snps; 0 ambiguous bases; corrected 0 small insertions totaling 0 bases, 0 small deletions totaling 0 bases
    i33C-assembly_contig_265:1-962 log:
    Finished processing i33C-assembly_contig_265:1-962
    Processing i33C-assembly_contig_183:1-1475
    unpaired ngmlrMAP_ONT.sorted.bam: coverage 133
    Total Reads: 153, Coverage: 133, minDepth: 13
    Confirmed 1393 of 1475 bases (94.44%)
    Corrected 0 snps; 0 ambiguous bases; corrected 0 small insertions totaling 0 bases, 0 small deletions totaling 0 bases
    i33C-assembly_contig_183:1-1475 log:
    Finished processing i33C-assembly_contig_183:1-1475
    Processing i33C-assembly_contig_182:1-2093
    unpaired ngmlrMAP_ONT.sorted.bam: coverage 9
    Total Reads: 19, Coverage: 9, minDepth: 5
    Confirmed 1496 of 2093 bases (71.48%)
    Corrected 0 snps; 0 ambiguous bases; corrected 0 small insertions totaling 0 bases, 7 small deletions totaling 11 bases
    # Attempting to fix local continuity breaks
    # fix break: i33C-assembly_contig_182:1011 0 -0 +0 NoSolution
    i33C-assembly_contig_182:1-2093 log:
    Finished processing i33C-assembly_contig_182:1-2093
    Processing i33C-assembly_contig_181:1-1396
    unpaired ngmlrMAP_ONT.sorted.bam: coverage 42
    Total Reads: 48, Coverage: 42, minDepth: 5
    Confirmed 1256 of 1396 bases (89.97%)
    Corrected 0 snps; 0 ambiguous bases; corrected 0 small insertions totaling 0 bases, 1 small deletions totaling 2 bases
    i33C-assembly_contig_181:1-1396 log:
    Finished processing i33C-assembly_contig_181:1-1396
    Processing i33C-assembly_contig_180:1-1554
    unpaired ngmlrMAP_ONT.sorted.bam: coverage 44
    Total Reads: 53, Coverage: 44, minDepth: 5
    Confirmed 1463 of 1554 bases (94.14%)
    Corrected 0 snps; 0 ambiguous bases; corrected 0 small insertions totaling 0 bases, 1 small deletions totaling 1 bases
    i33C-assembly_contig_180:1-1554 log:
    Finished processing i33C-assembly_contig_180:1-1554
    Processing i33C-assembly_contig_187:1-1262
    unpaired ngmlrMAP_ONT.sorted.bam: coverage 32
    Total Reads: 48, Coverage: 32, minDepth: 5
    Confirmed 1128 of 1262 bases (89.38%)
    Corrected 0 snps; 0 ambiguous bases; corrected 0 small insertions totaling 0 bases, 0 small deletions totaling 0 bases
    i33C-assembly_contig_187:1-1262 log:
    Finished processing i33C-assembly_contig_187:1-1262
    Processing i33C-assembly_contig_186:1-1296
    unpaired ngmlrMAP_ONT.sorted.bam: coverage 1
    Total Reads: 4, Coverage: 1, minDepth: 5
    Confirmed 0 of 1296 bases (0.00%)
    Corrected 0 snps; 0 ambiguous bases; corrected 0 small insertions totaling 0 bases, 0 small deletions totaling 0 bases
    i33C-assembly_contig_186:1-1296 log:
    Finished processing i33C-assembly_contig_186:1-1296
    Processing i33C-assembly_contig_185:1-1011
    unpaired ngmlrMAP_ONT.sorted.bam: coverage 2
    Total Reads: 3, Coverage: 2, minDepth: 5
    Confirmed 0 of 1011 bases (0.00%)
    Corrected 0 snps; 0 ambiguous bases; corrected 0 small insertions totaling 0 bases, 0 small deletions totaling 0 bases
    i33C-assembly_contig_185:1-1011 log:
    Finished processing i33C-assembly_contig_185:1-1011
    Processing i33C-assembly_contig_184:1-147
    unpaired ngmlrMAP_ONT.sorted.bam: coverage 1
    Total Reads: 1, Coverage: 1, minDepth: 5
    Confirmed 0 of 147 bases (0.00%)
    Corrected 0 snps; 0 ambiguous bases; corrected 0 small insertions totaling 0 bases, 0 small deletions totaling 0 bases
    i33C-assembly_contig_184:1-147 log:
    Finished processing i33C-assembly_contig_184:1-147
    Processing i33C-assembly_contig_947:1-1583
    unpaired ngmlrMAP_ONT.sorted.bam: coverage 24
    Total Reads: 28, Coverage: 24, minDepth: 5
    Confirmed 1443 of 1583 bases (91.16%)
    Corrected 0 snps; 0 ambiguous bases; corrected 0 small insertions totaling 0 bases, 3 small deletions totaling 4 bases
    i33C-assembly_contig_947:1-1583 log:
    Finished processing i33C-assembly_contig_947:1-1583
    Processing i33C-assembly_contig_946:1-2067
    unpaired ngmlrMAP_ONT.sorted.bam: coverage 68
    Total Reads: 84, Coverage: 68, minDepth: 7
    Confirmed 1929 of 2067 bases (93.32%)
    Corrected 0 snps; 0 ambiguous bases; corrected 0 small insertions totaling 0 bases, 2 small deletions totaling 2 bases
    i33C-assembly_contig_946:1-2067 log:
    Finished processing i33C-assembly_contig_946:1-2067
    Processing i33C-assembly_contig_945:1-1243
    unpaired ngmlrMAP_ONT.sorted.bam: coverage 39
    Total Reads: 82, Coverage: 39, minDepth: 5
    Confirmed 1049 of 1243 bases (84.39%)
    Corrected 0 snps; 0 ambiguous bases; corrected 0 small insertions totaling 0 bases, 1 small deletions totaling 1 bases
    # Attempting to fix local continuity breaks
    # fix break: i33C-assembly_contig_945:997 0 -0 +0 NoSolution
    i33C-assembly_contig_945:1-1243 log:
    Finished processing i33C-assembly_contig_945:1-1243
    Processing i33C-assembly_contig_944:1-1335
    unpaired ngmlrMAP_ONT.sorted.bam: coverage 106
    Total Reads: 198, Coverage: 106, minDepth: 11
    Confirmed 1261 of 1335 bases (94.46%)
    Corrected 0 snps; 0 ambiguous bases; corrected 0 small insertions totaling 0 bases, 2 small deletions totaling 2 bases
    i33C-assembly_contig_944:1-1335 log:
    Finished processing i33C-assembly_contig_944:1-1335
    Processing i33C-assembly_contig_943:1-419
    unpaired ngmlrMAP_ONT.sorted.bam: coverage 25
    Total Reads: 27, Coverage: 25, minDepth: 5
    Confirmed 383 of 419 bases (91.41%)
    Corrected 0 snps; 0 ambiguous bases; corrected 0 small insertions totaling 0 bases, 1 small deletions totaling 1 bases
    i33C-assembly_contig_943:1-419 log:
    Finished processing i33C-assembly_contig_943:1-419
    Processing i33C-assembly_contig_829:1-9553
    unpaired ngmlrMAP_ONT.sorted.bam: coverage 37
    Total Reads: 81, Coverage: 37, minDepth: 5
    Confirmed 8840 of 9553 bases (92.54%)
    Corrected 0 snps; 0 ambiguous bases; corrected 0 small insertions totaling 0 bases, 6 small deletions totaling 6 bases
    # Attempting to fix local continuity breaks
    # fix break: i33C-assembly_contig_829:2680 0 -0 +0 NoSolution
    # fix break: i33C-assembly_contig_829:7514 0 -0 +0 NoSolution
    i33C-assembly_contig_829:1-9553 log:
    Finished processing i33C-assembly_contig_829:1-9553
    Processing i33C-assembly_contig_941:1-1052
    unpaired ngmlrMAP_ONT.sorted.bam: coverage 4
    Total Reads: 4, Coverage: 4, minDepth: 5
    Confirmed 1 of 1052 bases (0.10%)
    Corrected 0 snps; 0 ambiguous bases; corrected 0 small insertions totaling 0 bases, 0 small deletions totaling 0 bases
    i33C-assembly_contig_941:1-1052 log:
    Finished processing i33C-assembly_contig_941:1-1052
    Processing i33C-assembly_contig_539:1-1529
    unpaired ngmlrMAP_ONT.sorted.bam: coverage 7
    Total Reads: 15, Coverage: 7, minDepth: 5
    Confirmed 1134 of 1529 bases (74.17%)
    Corrected 1 snps; 1 ambiguous bases; corrected 0 small insertions totaling 0 bases, 3 small deletions totaling 3 bases
    i33C-assembly_contig_539:1-1529 log:
    Finished processing i33C-assembly_contig_539:1-1529
    Processing i33C-assembly_contig_536:1-1821
    unpaired ngmlrMAP_ONT.sorted.bam: coverage 597
    Total Reads: 718, Coverage: 597, minDepth: 60
    Confirmed 1719 of 1821 bases (94.40%)
    Corrected 5 snps; 0 ambiguous bases; corrected 0 small insertions totaling 0 bases, 0 small deletions totaling 0 bases
    i33C-assembly_contig_536:1-1821 log:
    Finished processing i33C-assembly_contig_536:1-1821
    Processing i33C-assembly_contig_537:1-538
    unpaired ngmlrMAP_ONT.sorted.bam: coverage 21
    Total Reads: 33, Coverage: 21, minDepth: 5
    Confirmed 478 of 538 bases (88.85%)
    Corrected 0 snps; 0 ambiguous bases; corrected 0 small insertions totaling 0 bases, 0 small deletions totaling 0 bases
    i33C-assembly_contig_537:1-538 log:
    Finished processing i33C-assembly_contig_537:1-538
    Processing i33C-assembly_contig_826:1-12097
    unpaired ngmlrMAP_ONT.sorted.bam: coverage 29
    Total Reads: 88, Coverage: 29, minDepth: 5
    Confirmed 11095 of 12097 bases (91.72%)
    Corrected 0 snps; 0 ambiguous bases; corrected 0 small insertions totaling 0 bases, 13 small deletions totaling 14 bases
    # Attempting to fix local continuity breaks
    # fix break: i33C-assembly_contig_826:9282 0 -0 +0 NoSolution
    # fix break: i33C-assembly_contig_826:9847 0 -0 +0 NoSolution
    i33C-assembly_contig_826:1-12097 log:
    Finished processing i33C-assembly_contig_826:1-12097
    Processing i33C-assembly_contig_535:1-1220
    unpaired ngmlrMAP_ONT.sorted.bam: coverage 43
    Total Reads: 61, Coverage: 43, minDepth: 5
    Confirmed 1125 of 1220 bases (92.21%)
    Corrected 0 snps; 0 ambiguous bases; corrected 0 small insertions totaling 0 bases, 0 small deletions totaling 0 bases
    # Attempting to fix local continuity breaks
    # fix break: i33C-assembly_contig_535:310 0 -0 +0 NoSolution
    i33C-assembly_contig_535:1-1220 log:
    Finished processing i33C-assembly_contig_535:1-1220
    Processing i33C-assembly_contig_532:1-51959
    unpaired ngmlrMAP_ONT.sorted.bam: coverage 11
    Total Reads: 126, Coverage: 11, minDepth: 5
    Confirmed 30937 of 51959 bases (59.54%)
    Corrected 5 snps; 0 ambiguous bases; corrected 7 small insertions totaling 9 bases, 70 small deletions totaling 214 bases
    Large collapsed region: i33C-assembly_contig_532:29980-44647 size 14668
    # Attempting to fix local continuity breaks
    # fix break: i33C-assembly_contig_532:875-908 0 -0 +0 NoSolution
    # fix break: i33C-assembly_contig_532:1274-1351 0 -0 +0 NoSolution
    # fix break: i33C-assembly_contig_532:2219 0 -0 +0 NoSolution
    # fix break: i33C-assembly_contig_532:3931-11588 0 -0 +0 NoSolution
    # fix break: i33C-assembly_contig_532:11868-12585 0 -0 +0 NoSolution
    # fix break: i33C-assembly_contig_532:12816-14819 0 -0 +0 NoSolution
    # fix break: i33C-assembly_contig_532:15083-15196 0 -0 +0 NoSolution
    # fix break: i33C-assembly_contig_532:15771 0 -0 +0 NoSolution
    # fix break: i33C-assembly_contig_532:17819 0 -0 +0 NoSolution
    # fix break: i33C-assembly_contig_532:19474-19502 0 -0 +0 NoSolution
    # fix break: i33C-assembly_contig_532:19794-24552 0 -0 +0 NoSolution
    # fix break: i33C-assembly_contig_532:28718-28780 0 -0 +0 NoSolution
    # fix break: i33C-assembly_contig_532:30196 0 -0 +0 NoSolution
    # fix break: i33C-assembly_contig_532:32097-32225 0 -0 +0 NoSolution
    # fix break: i33C-assembly_contig_532:35666 0 -0 +0 NoSolution
    # fix break: i33C-assembly_contig_532:35898 0 -0 +0 NoSolution
    # fix break: i33C-assembly_contig_532:38068-38103 0 -0 +0 NoSolution
    # fix break: i33C-assembly_contig_532:40658 0 -0 +0 NoSolution
    # fix break: i33C-assembly_contig_532:40976-40977 0 -0 +0 NoSolution
    # fix break: i33C-assembly_contig_532:42121 0 -0 +0 NoSolution
    # fix break: i33C-assembly_contig_532:44473-44474 0 -0 +0 NoSolution
    # fix break: i33C-assembly_contig_532:47364 0 -0 +0 NoSolution
    # fix break: i33C-assembly_contig_532:48007 0 -0 +0 NoSolution
    # fix break: i33C-assembly_contig_532:48359 0 -0 +0 NoSolution
    # fix break: i33C-assembly_contig_532:49642 0 -0 +0 NoSolution
    i33C-assembly_contig_532:1-51959 log:
    Finished processing i33C-assembly_contig_532:1-51959
    Processing i33C-assembly_contig_821:1-150
    unpaired ngmlrMAP_ONT.sorted.bam: coverage 10
    Total Reads: 11, Coverage: 10, minDepth: 5
    Confirmed 123 of 150 bases (82.00%)
    Corrected 0 snps; 0 ambiguous bases; corrected 0 small insertions totaling 0 bases, 0 small deletions totaling 0 bases
    i33C-assembly_contig_821:1-150 log:
    Finished processing i33C-assembly_contig_821:1-150
    Processing i33C-assembly_contig_530:1-2440
    unpaired ngmlrMAP_ONT.sorted.bam: coverage 93
    Total Reads: 147, Coverage: 93, minDepth: 9
    Confirmed 2249 of 2440 bases (92.17%)
    Corrected 0 snps; 0 ambiguous bases; corrected 0 small insertions totaling 0 bases, 3 small deletions totaling 3 bases
    i33C-assembly_contig_530:1-2440 log:
    Finished processing i33C-assembly_contig_530:1-2440
    Processing i33C-assembly_contig_531:1-811
    unpaired ngmlrMAP_ONT.sorted.bam: coverage 0
    Total Reads: 1, Coverage: 0, minDepth: 5
    Confirmed 0 of 811 bases (0.00%)
    Corrected 0 snps; 0 ambiguous bases; corrected 0 small insertions totaling 0 bases, 0 small deletions totaling 0 bases
    i33C-assembly_contig_531:1-811 log:
    Finished processing i33C-assembly_contig_531:1-811
    Processing i33C-assembly_contig_449:1-878
    unpaired ngmlrMAP_ONT.sorted.bam: coverage 87
    Total Reads: 151, Coverage: 87, minDepth: 9
    Confirmed 818 of 878 bases (93.17%)
    Corrected 7 snps; 0 ambiguous bases; corrected 0 small insertions totaling 0 bases, 0 small deletions totaling 0 bases
    i33C-assembly_contig_449:1-878 log:
    Finished processing i33C-assembly_contig_449:1-878
    Processing i33C-assembly_contig_1084:1-712
    unpaired ngmlrMAP_ONT.sorted.bam: coverage 5
    Total Reads: 5, Coverage: 5, minDepth: 5
    Confirmed 545 of 712 bases (76.54%)
    Corrected 0 snps; 0 ambiguous bases; corrected 0 small insertions totaling 0 bases, 2 small deletions totaling 4 bases
    i33C-assembly_contig_1084:1-712 log:
    Finished processing i33C-assembly_contig_1084:1-712
    Processing i33C-assembly_contig_1127:1-8076670
    unpaired ngmlrMAP_ONT.sorted.bam: coverage 62
    Total Reads: 76049, Coverage: 62, minDepth: 6
    Confirmed 7581872 of 8076670 bases (93.87%)
    Corrected 113 snps; 10 ambiguous bases; corrected 47 small insertions totaling 4767 bases, 4742 small deletions totaling 28141 bases
    Large collapsed region: i33C-assembly_contig_1127:885636-898178 size 12543
    # Attempting to fix local continuity breaks
    # fix break: i33C-assembly_contig_1127:23130-25395 0 -0 +0 NoSolution
    # fix break: i33C-assembly_contig_1127:25829 0 -0 +0 NoSolution
    # fix break: i33C-assembly_contig_1127:26799-29571 0 -0 +0 NoSolution
    # fix break: i33C-assembly_contig_1127:30127 0 -0 +0 NoSolution
    # fix break: i33C-assembly_contig_1127:30346-30369 0 -0 +0 NoSolution
    # fix break: i33C-assembly_contig_1127:30618-30625 0 -0 +0 NoSolution
    # fix break: i33C-assembly_contig_1127:31028 0 -0 +0 NoSolution
    # fix break: i33C-assembly_contig_1127:31289-31355 0 -0 +0 NoSolution
    # fix break: i33C-assembly_contig_1127:31714-31940 0 -0 +0 NoSolution
    # fix break: i33C-assembly_contig_1127:32828 0 -0 +0 NoSolution
    # fix break: i33C-assembly_contig_1127:33037-54201 0 -0 +0 NoSolution
    # fix break: i33C-assembly_contig_1127:54478 0 -0 +0 NoSolution
    # fix break: i33C-assembly_contig_1127:55116-55135 0 -0 +0 NoSolution
    # fix break: i33C-assembly_contig_1127:55553-59741 0 -0 +0 NoSolution
    # fix break: i33C-assembly_contig_1127:60002 0 -0 +0 NoSolution
    # fix break: i33C-assembly_contig_1127:60374-60814 0 -0 +0 NoSolution
    # fix break: i33C-assembly_contig_1127:61387 0 -0 +0 NoSolution
    # fix break: i33C-assembly_contig_1127:62196-62308 0 -0 +0 NoSolution
    # fix break: i33C-assembly_contig_1127:62794-62797 0 -0 +0 NoSolution
    # fix break: i33C-assembly_contig_1127:66099 0 -0 +0 NoSolution
    # fix break: i33C-assembly_contig_1127:71318-71319 0 -0 +0 NoSolution
    # fix break: i33C-assembly_contig_1127:72781-72832 0 -0 +0 NoSolution
    # fix break: i33C-assembly_contig_1127:73094 0 -0 +0 NoSolution
    # fix break: i33C-assembly_contig_1127:77821 0 -0 +0 NoSolution
    # fix break: i33C-assembly_contig_1127:79361 0 -0 +0 NoSolution
    # fix break: i33C-assembly_contig_1127:82392-82393 0 -0 +0 NoSolution
    # fix break: i33C-assembly_contig_1127:89050 0 -0 +0 NoSolution
    # fix break: i33C-assembly_contig_1127:92631 0 -0 +0 NoSolution
    # fix break: i33C-assembly_contig_1127:105584 0 -0 +0 NoSolution
    # fix break: i33C-assembly_contig_1127:108379-108521 0 -0 +0 NoSolution
    # fix break: i33C-assembly_contig_1127:117137 0 -0 +0 NoSolution
    # fix break: i33C-assembly_contig_1127:118789-118790 0 -0 +0 NoSolution
    # fix break: i33C-assembly_contig_1127:119296-120556 0 -0 +0 NoSolution
    # fix break: i33C-assembly_contig_1127:120889-120895 0 -0 +0 NoSolution
    # fix break: i33C-assembly_contig_1127:121113-121569 0 -0 +0 NoSolution
    # fix break: i33C-assembly_contig_1127:122947-126693 0 -0 +0 NoSolution
    # fix break: i33C-assembly_contig_1127:126974 0 -0 +0 NoSolution
    # fix break: i33C-assembly_contig_1127:127209-127248 0 -0 +0 NoSolution
    # fix break: i33C-assembly_contig_1127:127474-127803 0 -0 +0 NoSolution
    # fix break: i33C-assembly_contig_1127:128189-144832 0 -0 +0 NoSolution
    # fix break: i33C-assembly_contig_1127:145291 0 -0 +0 NoSolution
    # fix break: i33C-assembly_contig_1127:145506 0 -0 +0 NoSolution
    # fix break: i33C-assembly_contig_1127:147370-157996 0 -0 +0 NoSolution
    # fix break: i33C-assembly_contig_1127:159763-180309 0 -0 +0 NoSolution
    # fix break: i33C-assembly_contig_1127:180685 0 -0 +0 NoSolution
    # fix break: i33C-assembly_contig_1127:181072-181178 0 -0 +0 NoSolution
    # fix break: i33C-assembly_contig_1127:181945 0 -0 +0 NoSolution
    # fix break: i33C-assembly_contig_1127:182630-182799 0 -0 +0 NoSolution
    # fix break: i33C-assembly_contig_1127:185800-186053 0 -0 +0 NoSolution
    # fix break: i33C-assembly_contig_1127:186331-191048 0 -0 +0 NoSolution
    fix break: i33C-assembly_contig_1127:193795-193806 193795 -102 +585 OpenedGap
    # fix break: i33C-assembly_contig_1127:194778-194839 0 -0 +0 NoSolution
    # fix break: i33C-assembly_contig_1127:195370-199016 0 -0 +0 NoSolution
    # fix break: i33C-assembly_contig_1127:199240-214163 0 -0 +0 NoSolution
    # fix break: i33C-assembly_contig_1127:214583-216967 0 -0 +0 NoSolution
    # fix break: i33C-assembly_contig_1127:217187-227632 0 -0 +0 NoSolution
    # fix break: i33C-assembly_contig_1127:228162-228456 0 -0 +0 NoSolution
    # fix break: i33C-assembly_contig_1127:228725-228838 0 -0 +0 NoSolution
    # fix break: i33C-assembly_contig_1127:229076-242290 0 -0 +0 NoSolution
    # fix break: i33C-assembly_contig_1127:243257-260594 0 -0 +0 NoSolution
    # fix break: i33C-assembly_contig_1127:261744-261943 0 -0 +0 NoSolution
    Exception in thread "main" java.lang.reflect.InvocationTargetException
    	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    	at java.lang.reflect.Method.invoke(Method.java:498)
    	at com.simontuffs.onejar.Boot.run(Boot.java:340)
    	at com.simontuffs.onejar.Boot.main(Boot.java:166)
    Caused by: java.lang.OutOfMemoryError: GC overhead limit exceeded
    	at java.lang.String.substring(String.java:1969)
    	at scala.collection.immutable.StringOps$.slice$extension(StringOps.scala:44)
    	at org.broadinstitute.pilon.Assembler$$anonfun$addToPileups$1.apply$mcVI$sp(Assembler.scala:82)
    	at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:160)
    	at org.broadinstitute.pilon.Assembler.addToPileups(Assembler.scala:81)
    	at org.broadinstitute.pilon.Assembler.addRead(Assembler.scala:66)
    	at org.broadinstitute.pilon.Assembler$$anonfun$addReads$1.apply(Assembler.scala:48)
    	at org.broadinstitute.pilon.Assembler$$anonfun$addReads$1.apply(Assembler.scala:48)
    	at scala.collection.immutable.List.foreach(List.scala:381)
    	at org.broadinstitute.pilon.Assembler.addReads(Assembler.scala:48)
    	at org.broadinstitute.pilon.GapFiller.assembleIntoBreak(GapFiller.scala:124)
    	at org.broadinstitute.pilon.GapFiller.assembleAcrossBreak(GapFiller.scala:52)
    	at org.broadinstitute.pilon.GapFiller.fixBreak(GapFiller.scala:45)
    	at org.broadinstitute.pilon.GenomeRegion$$anonfun$identifyAndFixIssues$4.apply(GenomeRegion.scala:383)
    	at org.broadinstitute.pilon.GenomeRegion$$anonfun$identifyAndFixIssues$4.apply(GenomeRegion.scala:381)
    	at scala.collection.immutable.List.foreach(List.scala:381)
    	at org.broadinstitute.pilon.GenomeRegion.identifyAndFixIssues(GenomeRegion.scala:381)
    	at org.broadinstitute.pilon.GenomeFile$$anonfun$processRegions$4.apply(GenomeFile.scala:119)
    	at org.broadinstitute.pilon.GenomeFile$$anonfun$processRegions$4.apply(GenomeFile.scala:108)
    	at scala.collection.Iterator$class.foreach(Iterator.scala:893)
    	at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
    	at scala.collection.parallel.ParIterableLike$Foreach.leaf(ParIterableLike.scala:972)
    	at scala.collection.parallel.Task$$anonfun$tryLeaf$1.apply$mcV$sp(Tasks.scala:49)
    	at scala.collection.parallel.Task$$anonfun$tryLeaf$1.apply(Tasks.scala:48)
    	at scala.collection.parallel.Task$$anonfun$tryLeaf$1.apply(Tasks.scala:48)
    	at scala.collection.parallel.Task$class.tryLeaf(Tasks.scala:51)
    	at scala.collection.parallel.ParIterableLike$Foreach.tryLeaf(ParIterableLike.scala:969)
    	at scala.collection.parallel.AdaptiveWorkStealingTasks$WrappedTask$class.compute(Tasks.scala:152)
    	at scala.collection.parallel.AdaptiveWorkStealingForkJoinTasks$WrappedTask.compute(Tasks.scala:443)
    	at scala.concurrent.forkjoin.RecursiveAction.exec(RecursiveAction.java:160)
    	at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
    	at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
    
    
    opened by jnarayan81 0
  • Amazon example

    Amazon example

    Dear Marios,

    The file "StackNet/example/example_amazon/EXAMPLE.MD" gives a very good introduction to running StackNet in this competition, but I think the command line for the second part is missing the parameter "params=param_amazon_count.txt". The current version throws a Java exception. This command line worked for me (I only added params):

    java -Xmx3048m -jar StackNet.jar train task=classification data_prefix=amazon_counts test_file=amazon_counts_test.txt params=param_amazon_count.txt pred_file=amazon_count_pred.csv verbose=true Threads=1 folds=5 seed=1 metric=auc

    Hope it helps!

    opened by vhdeluca 0
Owner
Marios Michailidis
Competitive Data Scientist at H2O.ai