Vowpal Wabbit is a machine learning system which pushes the frontier of machine learning with techniques such as online, hashing, allreduce, reductions, learning2search, active, and interactive learning.

Vowpal Wabbit

Last update: Jan 6, 2023

Overview

This is the Vowpal Wabbit fast online learning code.

Why Vowpal Wabbit?

Vowpal Wabbit is a machine learning system that pushes the frontier of machine learning with techniques such as online, hashing, allreduce, reductions, learning2search, active, and interactive learning. There is a specific focus on reinforcement learning: several contextual bandit algorithms are implemented, and the online nature of the system lends itself well to the problem. Vowpal Wabbit is a destination for implementing and maturing state-of-the-art algorithms with performance in mind.

Input Format. The input format for the learning algorithm is substantially more flexible than might be expected. Examples can have features consisting of free form text, which is interpreted in a bag-of-words way. There can even be multiple sets of free form text in different namespaces.
Speed. The learning algorithm is fast -- similar to the few other online algorithm implementations out there. There are several optimization algorithms available with the baseline being sparse gradient descent (GD) on a loss function.
Scalability. This is not the same as fast. Instead, the important characteristic here is that the memory footprint of the program is bounded independent of data. This means the training set is not loaded into main memory before learning starts. In addition, the size of the set of features is bounded independent of the amount of training data using the hashing trick.
Feature Interaction. Subsets of features can be internally paired so that the algorithm is linear in the cross-product of the subsets. This is useful for ranking problems. The alternative of explicitly expanding the features before feeding them into the learning algorithm can be both computation and space intensive, depending on how it's handled.
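
The scalability and feature-interaction points above can be illustrated with a minimal sketch. It is not Vowpal Wabbit's implementation: CRC32 stands in for the real hash function, the 18-bit table size is an arbitrary illustrative choice, and `slot`, `featurize`, and `sgd_step` are hypothetical names. It shows a weight table whose size is fixed independent of the data, examples consumed one at a time, and a namespace pair crossed implicitly instead of being expanded beforehand.

```python
import zlib

NUM_BITS = 18                      # illustrative: table has 2**18 slots whatever the data size
TABLE_SIZE = 1 << NUM_BITS


def slot(name: str) -> int:
    """Map a feature name to a weight-table index (CRC32 is an assumed stand-in hash)."""
    return zlib.crc32(name.encode("utf-8")) & (TABLE_SIZE - 1)


def featurize(namespaces: dict, interactions=()) -> dict:
    """Return {index: value} for all tokens plus the requested namespace pairs."""
    feats = {}
    for ns, tokens in namespaces.items():
        for tok in tokens:                         # bag-of-words: each token counts 1.0
            i = slot(f"{ns}^{tok}")
            feats[i] = feats.get(i, 0.0) + 1.0
    for a, b in interactions:                      # implicit cross-product of two namespaces
        for ta in namespaces.get(a, ()):
            for tb in namespaces.get(b, ()):
                i = slot(f"{a}^{ta}*{b}^{tb}")
                feats[i] = feats.get(i, 0.0) + 1.0
    return feats


def predict(weights: list, feats: dict) -> float:
    return sum(weights[i] * v for i, v in feats.items())


def sgd_step(weights: list, feats: dict, label: float, lr: float = 0.1) -> float:
    """One sparse squared-loss gradient step; returns the pre-update prediction."""
    pred = predict(weights, feats)
    grad = pred - label
    for i, v in feats.items():                     # only the touched slots are updated
        weights[i] -= lr * grad * v
    return pred


weights = [0.0] * TABLE_SIZE
stream = [
    (1.0, {"q": ["cheap", "flights"], "d": ["flight", "deals"]}),
    (0.0, {"q": ["cheap", "flights"], "d": ["cooking", "recipes"]}),
]
for _ in range(50):                                # examples are seen one at a time
    for label, ns in stream:
        sgd_step(weights, featurize(ns, interactions=[("q", "d")]), label)

pos = predict(weights, featurize(stream[0][1], [("q", "d")]))
neg = predict(weights, featurize(stream[1][1], [("q", "d")]))
```

Because every feature, including each crossed pair, lands in the same fixed-size table, memory use does not grow with the number of distinct features or examples; the price is that unrelated features can occasionally collide in one slot.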

Visit the wiki to learn more.

Getting Started

For the most up-to-date instructions for getting started on Windows, macOS, or Linux, please see the wiki. This includes:

Comments

C# refactoring, memory leak fixes, general goodness,...

fixed Runtime library mismatch between zlib, libvw, vw.exe, VowpalWabbitCore.dll (CLR),... by using zlib/boost nuget provided msbuild targets
included Visual Leak Detector for memory leak detection on windows
refactored C# API to allow users to dynamically construct serializers based on alternate descriptions (not just on static annotations)
string marshalling is compatible to command line (either escaping or splitting)
schema based pre-hashing: if hash can be determined from schema it's only generated once and re-used for each example
added type extension API for marshalling
allow user to generate native and string examples in parallel in both debug and release
keep marshalling expression tree for debugging
refactored marshalling expression tree generation to improve readability
added sweeping helper
improved C# label parsing extensibility
added assembly signing
fixed memory leaks in C# usage of VW
fixed model hashing/reload interaction
fixed handling of empty line examples within set of action dependent features
fixed order issue when predicting ADF examples containing empty action dependent features
fixed default namespace incompatibility (space vs. 0)
improved RunTests to C# test wrapping (detects inter-test dependencies and input files)
unit tests are run in test/ folder, thus no need to copy all input files
added user-supplied model id support

opened by eisber 60

VWRegressor provides very different performance for loss_function = 'quantile' , quantile_tau = 0.5 and loss_function = 'squared'

Describe the bug

VWRegressor provides very different performance for loss_function = 'quantile' , quantile_tau = 0.5 and loss_function = 'squared'

loss_function = 'squared' gives a very good (low) MAE.
loss_function = 'quantile', quantile_tau = 0.5 gives a bad (high) MAE.

The data is a mixture of categorical and continuous features: 600 rows like this

To Reproduce

        # passes, l1 and l2 are assumed to be defined earlier in the script.
        use_quantile = True
        if use_quantile:
            model = VWRegressor(
                convert_to_vw=False, normalized=True,
                passes=passes,
                power_t=0.5,  # 1.0
                readable_model='my_VW.model', cache_file='my_VW.cache',
                learning_rate=2.3, l2=l2, l1=l1,
                quadratic='CC', cubic='CCC',
                loss_function='quantile', quantile_tau=0.5)
            q = 0
        else:
            model = VWRegressor(
                convert_to_vw=False, normalized=True,
                passes=passes,
                power_t=0.5,  # 1.0
                readable_model='my_VW.model', cache_file='my_VW.cache',
                learning_rate=2.1, loss_function='squared', l2=l2, l1=l1,
                quadratic='CC', cubic='CCC')

Expected behavior

My guess is that the MAE for loss_function = 'quantile', quantile_tau = 0.5 and for loss_function = 'squared' should be very similar.

In addition, loss_function = 'quantile' with quantile_tau = 0.9 and with quantile_tau = 0.1 gives very wide, even nonsensical, confidence intervals.
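
The expectation above can be made concrete with a minimal sketch, assuming the standard pinball definition of quantile loss (whether VW's `quantile` loss matches it exactly is an assumption here). The helper names `pinball_loss`, `mean_loss`, and `best_constant` are hypothetical. The sketch shows that at tau = 0.5 the loss is half the absolute error, so its minimizer is the median, while tau = 0.1 and tau = 0.9 pull the fit toward the lower and upper ends of the data.

```python
def pinball_loss(y: float, pred: float, tau: float) -> float:
    """Quantile (pinball) loss: under-predictions cost tau, over-predictions cost 1 - tau."""
    diff = y - pred
    return tau * diff if diff >= 0 else (tau - 1.0) * diff


def mean_loss(ys, pred, tau):
    return sum(pinball_loss(y, pred, tau) for y in ys) / len(ys)


def best_constant(ys, tau):
    """Best constant prediction under pinball loss, searched over the observed values."""
    return min(sorted(set(ys)), key=lambda c: mean_loss(ys, c, tau))


ys = [1.0, 2.0, 3.0, 4.0, 100.0]      # toy targets with one outlier

# At tau = 0.5 the pinball loss is exactly half the absolute error.
half_abs = [0.5 * abs(y - 2.5) for y in ys]
pinball = [pinball_loss(y, 2.5, 0.5) for y in ys]

median_fit = best_constant(ys, 0.5)   # the median of ys
low_fit = best_constant(ys, 0.1)      # lower end of the data
high_fit = best_constant(ys, 0.9)     # upper end, here the outlier itself
```

On this toy data the 0.1 and 0.9 fits span the whole range including the outlier, which is one way an interval built from two quantile models can come out very wide on a small, heavy-tailed dataset.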

Observed Behavior

How did VW behave? Please include any stack trace, log messages or crash logs.

Environment

What version of VW did you use? latest
OS: Windows 10

Additional context

Do you have a code example where VWRegressor is used with loss_function = 'quantile', quantile_tau = 0.9 and with loss_function = 'quantile', quantile_tau = 0.1?

Question

opened by Sandy4321 56

JNI Layer throws Exceptions when close method is called in parallel ON DIFFERENT MODELS

Problem

In the JNI layer, when multiple passes are enabled (> 1) and an attempt is made to close separate models in parallel, exceptions can be thrown. This is true even though each model has its own lock to guard all accesses to the native code paths. Only a global lock around calls to the model close methods seems to avoid this issue.

I'm looking for help identifying whether there are any critical sections of the C code that can be guarded by a lock to avoid thread-safety issues. I'm not asking for the C code to lock. I just want help figuring out where to put the locks in the Java code that wraps the C code.

Scope

This seems to be in PR #1291 but was not fixed with PR #1295

Discussion

From empirical testing, it appears that one of these lines is the problem. I am wondering whether any of them use global state. I am trying to figure out if we can lock only over a short critical section to avoid thread-safety issues.

adjust_used_index(*vwInstance);

vwInstance->do_reset_source = true;

VW::start_parser(*vwInstance);

LEARNER::generic_driver(*vwInstance);

VW::end_parser(*vwInstance);

Previous Conversation

In PR #1295 there was the following conversation:

@JohnLangford

There should be zero shared state between multiple created VW objects. Is that what it's doing? (Creating multiple distinct VW objects?)

@deaktator

@JohnLangford. It looks like just one VW object. Each Java call does the following on the C side:

vw* vwInstance = VW::initialize(env->GetStringUTFChars(command, NULL));

@JohnLangford

A single VW object can not be operated on in multiple threads because the code inside VW is not thread safe. If you want to have a model which is shared by multiple threads, you set this up more explicitly by initializing a new VW object with an existing model.

@deaktator

Hey @JohnLangford. We take care of multi-threaded access to VW by locking anywhere that requires access to the C code. The thread-safety issues I encountered before were on an incomplete version of the code that locked in the wrong place. When I run the tests in parallel, they seem to work just fine now. I ran them a bunch of times with forking in the tests and didn't see any issues.

Tracking Down What's Happening

It appears @jon-morra-zefr pretty much copied the C# code for multiple passes, so this seems like it might apply to C# as well. Both C# and JNI C++ code appear below as well as the calling code that blows up.

I've seen a bunch of different errors that occur at the same spot: invalidated cache, malformed LDF feature exceptions, etc.

Example Code That triggers exceptions

// Doing this many times in parallel with no locks causes problems.
val vwJNI = VWLearners.create[VWTypedLearner[_]](vwLearnString)

// Learning in here using vwJNI.learn

// PROBLEM AREA:
lock.lock()    // <== NEED GLOBAL LOCKING OR EXCEPTIONS THROWN
vwJNI.close()
lock.unlock()  // <== NEED GLOBAL LOCKING OR EXCEPTIONS THROWN
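
The locking layout described in this report can be sketched in Python as follows. This is only an illustration of the workaround's shape: `FakeLearner` is a hypothetical stand-in with no native code behind it, so it cannot reproduce the exceptions. Each model keeps its own lock for ordinary calls, and one process-wide lock serializes `close()` across all models.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class FakeLearner:
    """Hypothetical stand-in for a JNI-backed learner; not the real VW Java API."""

    def __init__(self, name: str):
        self.name = name
        self._lock = threading.Lock()   # per-model lock, as in the report
        self.closed = False

    def learn(self, example: str) -> None:
        with self._lock:
            pass                        # the native learn call would go here

    def close(self) -> None:
        with self._lock:
            self.closed = True          # remaining passes and teardown would go here


CLOSE_LOCK = threading.Lock()           # one lock shared by every model


def train_and_close(name: str) -> bool:
    learner = FakeLearner(name)
    learner.learn("1 | a b c")          # learning may overlap across models
    with CLOSE_LOCK:                    # only close() is serialized globally
        learner.close()
    return learner.closed


with ThreadPoolExecutor(max_workers=8) as pool:
    results = list(pool.map(train_and_close, [f"model-{i}" for i in range(32)]))
```

The global lock is a blunt instrument: it prevents two closes from overlapping at all, which is why the report asks for a narrower critical section.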

Similarity of the C# and JNI C++ Code

C# Code: vowpal_wabbit/cs/cli/vowpalwabbit.cpp

void VowpalWabbit::RunMultiPass()
{
  if (m_vw->numpasses > 1)
  {
    try
    {
      adjust_used_index(*m_vw);
      m_vw->do_reset_source = true;
      VW::start_parser(*m_vw);
      LEARNER::generic_driver(*m_vw);
      VW::end_parser(*m_vw);
    }
    CATCHRETHROW
  }
}

JNI C++ Code: vowpal_wabbit/java/src/main/c++/vowpalWabbit_learner_VWLearners.cc

JNIEXPORT void JNICALL Java_vowpalWabbit_learner_VWLearners_performRemainingPasses(JNIEnv *env, jclass obj, jlong vwPtr)
{
  try
  {
    vw* vwInstance = (vw*)vwPtr;
    if (vwInstance->numpasses > 1)
    {
      adjust_used_index(*vwInstance);
      vwInstance->do_reset_source = true;
      VW::start_parser(*vwInstance);
      LEARNER::generic_driver(*vwInstance);
      VW::end_parser(*vwInstance);
    }
  }
  catch(...)
  {
    rethrow_cpp_exception_as_java_exception(env);
  }
}

Any thoughts?

Lang: Java Bug In Progress Priority: Medium

opened by ryan-deak-zefr 50

Continuous actions

This is the preliminary PR for continuous actions.

This includes the cats_tree (continuous action tree with smoothing) algorithm, conversion from a PMF (discrete) to a PDF (continuous) distribution, sampling from a continuous PDF, etc.
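
The PMF-to-PDF conversion and sampling steps can be illustrated with a simplified sketch. It is not the PR's code and it omits the smoothing step: it assumes the continuous action range is split into equal-width bins, one per discrete action, and spreads each bin's mass uniformly. The names `pmf_to_pdf` and `sample_pdf` are hypothetical.

```python
import random
from bisect import bisect_right
from itertools import accumulate


def pmf_to_pdf(pmf, lo, hi):
    """Spread each bin's probability mass uniformly over its interval.

    Returns (left, right, density) segments whose total area is sum(pmf).
    """
    width = (hi - lo) / len(pmf)
    return [(lo + i * width, lo + (i + 1) * width, p / width) for i, p in enumerate(pmf)]


def sample_pdf(pdf, rng):
    """Draw an action: pick a segment by its mass, then a uniform point inside it."""
    masses = [(right - left) * density for left, right, density in pdf]
    cumulative = list(accumulate(masses))
    u = rng.random() * cumulative[-1]
    k = min(bisect_right(cumulative, u), len(pdf) - 1)
    left, right, density = pdf[k]
    return rng.uniform(left, right), density   # the action and its density value


pmf = [0.1, 0.6, 0.3]                  # discrete distribution over 3 actions
pdf = pmf_to_pdf(pmf, lo=0.0, hi=30.0)
rng = random.Random(0)
action, density = sample_pdf(pdf, rng)
total_mass = sum((r - l) * d for l, r, d in pdf)
```

Returning the density alongside the sampled action matters for off-policy learning, where the logged probability (here a density) is needed to reweight the observed cost.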

The code is for the paper available at https://arxiv.org/pdf/2006.06040.pdf

We will add more details.

opened by mmajzoubi 47

Bug fixes

Fixed NRE on empty hashes
Skip model load/initialize when seeding from in-memory model
Fixed progressive validation in Azure trainer
Includes mixed JSON string and JSON direct support
Includes native C++ JSON parsing

opened by eisber 47

Try coveralls

Added 3 new make targets: vw_gcov, library_example_gcov, and test_gcov, which build vw and the examples with GCOV support, then run the tests. This allows coveralls to analyze test coverage in the source code, but slows the tests down significantly. I also edited the travis .yml file to upload the results to coveralls.io and added the badge to the readme.

Someone will need to set up a coveralls account for the main VW project and point the badge in the readme to that badge. Currently the coveralls badge points only to my fork.

opened by zachmayer 47

Pandas to vw text format

1. Overview

The goal of this PR is to fix the issue #2308.

The PR introduces a new class DFtoVW in vowpalwabbit.pyvw that takes as input a pandas.DataFrame and special types (SimpleLabel, Feature, Namespace) that specify the desired VW conversion.

These classes make extensive use of a class Col that refers to a given column in the user specified dataframe.

A simpler interface, DFtoVW.from_colnames, can also be used for simple use cases. The main benefit is that the user need not use the specific types.

Below are some usages of this class. They all rely on the following pandas.DataFrame called df :

  house_id  need_new_roof  price  sqft   age  year_built
0      id1              0   0.23  0.25  0.05        2006
1      id2              1   0.18  0.15  0.35        1976
2      id3              0   0.53  0.32  0.87        1924

2. Simple usage using DFtoVW.from_colnames

Let's say we want to build a VW dataset with the target need_new_roof and the features age and year_built:

from vowpalwabbit.pyvw import DFtoVW
conv = DFtoVW.from_colnames(y="need_new_roof", x=["age", "year_built"], df=df)

Then we can use the method process_df:

conv.process_df()

that outputs the following list:

['0 | 0.05 2006', '1 | 0.35 1976', '0 | 0.87 1924']

This list can then directly be consumed by the method pyvw.model.learn.
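
To make the target format concrete, here is a minimal sketch of the same label-and-features conversion in plain Python. It is not the PR's implementation: `rows_to_vw` is a hypothetical helper, and a list of dicts stands in for the DataFrame so the example has no dependencies.

```python
def rows_to_vw(rows, y, x):
    """Format each row as '<label> | <feature values...>' (hypothetical helper)."""
    lines = []
    for row in rows:
        features = " ".join(str(row[col]) for col in x)
        lines.append(f"{row[y]} | {features}")
    return lines


# Same data as df above, one dict per row.
rows = [
    {"house_id": "id1", "need_new_roof": 0, "price": 0.23, "sqft": 0.25, "age": 0.05, "year_built": 2006},
    {"house_id": "id2", "need_new_roof": 1, "price": 0.18, "sqft": 0.15, "age": 0.35, "year_built": 1976},
    {"house_id": "id3", "need_new_roof": 0, "price": 0.53, "sqft": 0.32, "age": 0.87, "year_built": 1924},
]
lines = rows_to_vw(rows, y="need_new_roof", x=["age", "year_built"])
```

Each resulting string is one example in VW text format: the label, a `|` opening the default namespace, then the feature values.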

3. Advanced usages using default constructor

The class DFtoVW also allows the following patterns in its default constructor:

tag

(named) namespaces, with scaling factor

(named) features, with constant feature possible

To use these more complex patterns we need to import them using:

from vowpalwabbit.pyvw import SimpleLabel, Namespace, Feature, Col

3.1. Named namespace with scaling, and named feature

Let's create a VW dataset that includes a named namespace (with scaling) and a named feature:

conv = DFtoVW(
    df=df,
    label=SimpleLabel(Col("need_new_roof")),
    namespaces=Namespace(
        name="Imperial",
        value=0.092,
        features=Feature(value=Col("sqft"), name="sqm"),
    ),
)
conv.process_df()

which yields:

['0 |Imperial:0.092 sqm:0.25', '1 |Imperial:0.092 sqm:0.15', '0 |Imperial:0.092 sqm:0.32']

3.2. Multiple namespaces, multiple features, and tag

Let's create a more complex example with a tag and multiple namespaces with multiple features.

conv = DFtoVW(
    df=df,
    label=SimpleLabel(Col("need_new_roof")),
    tag=Col("house_id"),
    namespaces=[
        Namespace(name="Imperial", value=0.092,
                  features=Feature(value=Col("sqft"), name="sqm")),
        Namespace(name="DoubleIt", value=2,
                  features=[Feature(value=Col("price")), Feature(Col("age"))]),
    ],
)
conv.process_df()

which yields:

['0 id1|Imperial:0.092 sqm:0.25 |DoubleIt:2 0.23 0.05', '1 id2|Imperial:0.092 sqm:0.15 |DoubleIt:2 0.18 0.35', '0 id3|Imperial:0.092 sqm:0.32 |DoubleIt:2 0.53 0.87']
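
The layout of these lines (label, tag, then one `|name:scale` block per namespace) can be reproduced with a short dependency-free sketch. `Feat`, `NS`, and `format_row` are hypothetical stand-ins for illustration, not the classes introduced by the PR.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Feat:                       # hypothetical stand-in for Feature(Col(...))
    column: str
    name: Optional[str] = None    # named features render as name:value


@dataclass
class NS:                         # hypothetical stand-in for Namespace
    name: str
    value: float                  # namespace scaling factor
    features: list


def format_row(row, label, tag, namespaces):
    """Render one row as '<label> <tag>|ns:scale feats |ns:scale feats ...'."""
    parts = []
    for ns in namespaces:
        feats = " ".join(
            f"{f.name}:{row[f.column]}" if f.name else str(row[f.column])
            for f in ns.features
        )
        parts.append(f"|{ns.name}:{ns.value} {feats}")
    return f"{row[label]} {row[tag]}" + " ".join(parts)


row = {"house_id": "id1", "need_new_roof": 0, "price": 0.23, "sqft": 0.25, "age": 0.05}
namespaces = [
    NS("Imperial", 0.092, [Feat("sqft", name="sqm")]),
    NS("DoubleIt", 2, [Feat("price"), Feat("age")]),
]
line = format_row(row, label="need_new_roof", tag="house_id", namespaces=namespaces)
```

Note that the tag is attached directly to the first `|` with no space, which is what marks it as a tag in the output shown above.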

4. Implementation details

The class DFtoVW and the specific types are located in vowpalwabbit/pyvw.py. The class only depends on the pandas module.

the code includes docstrings

8 tests are included in tests/test_pyvw.py

5. Extensions

This PR does not yet handle multilines and more complex label types.

To convert a very large dataset that can't fit in RAM, one can use the pandas import option chunksize and process one chunk at a time. I could also implement this functionality directly in the class using a generator. The generator would then either be consumed by a VW learning interface or be written to an external file (for conversion purposes only).
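
The generator idea can be sketched as follows. This is an assumption-laden illustration, not the proposed implementation: the standard-library csv reader stands in for chunked pandas reading, and `vw_lines_in_chunks` is a hypothetical name. Only one chunk of rows is held in memory at a time.

```python
import csv
import io
from itertools import islice


def vw_lines_in_chunks(handle, y, x, chunksize):
    """Yield lists of VW lines, reading at most `chunksize` rows at a time."""
    reader = csv.DictReader(handle)
    while True:
        chunk = list(islice(reader, chunksize))   # pull the next chunk lazily
        if not chunk:
            return
        yield [f"{row[y]} | " + " ".join(row[col] for col in x) for row in chunk]


# An in-memory file stands in for a dataset too large to load at once.
csv_text = "need_new_roof,age,year_built\n0,0.05,2006\n1,0.35,1976\n0,0.87,1924\n"
chunks = list(
    vw_lines_in_chunks(io.StringIO(csv_text), "need_new_roof", ["age", "year_built"], chunksize=2)
)
```

A consumer can iterate over the generator and either feed each chunk to a learner or append it to an output file, so peak memory depends on the chunk size and not on the dataset size.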
opened by etiennekintzler 43

Tests don't pass on Mac OS 10.10

Mac OS 10.10 and boost version boost-1.58.0

gcc --version
Configured with: --prefix=/Library/Developer/CommandLineTools/usr --with-gxx-include-dir=/usr/include/c++/4.2.1
Apple LLVM version 6.1.0 (clang-602.0.53) (based on LLVM 3.6.0svn)
Target: x86_64-apple-darwin14.0.0
Thread model: posix

I get some warnings and test 16 doesn't pass.

Here is the full log:

make
cd vowpalwabbit; /Library/Developer/CommandLineTools/usr/bin/make -j 4 things
/usr/bin/g++ -std=c++0x   -D__extern_always_inline=inline -Wall -pedantic -O3 -fomit-frame-pointer -fno-strict-aliasing  -D_FILE_OFFSET_BITS=64 -DNDEBUG -I /usr/local/include  -fPIC  -c main.cc -o main.o
/usr/bin/g++ -std=c++0x   -D__extern_always_inline=inline -Wall -pedantic -O3 -fomit-frame-pointer -fno-strict-aliasing  -D_FILE_OFFSET_BITS=64 -DNDEBUG -I /usr/local/include  -fPIC  -c global_data.cc -o global_data.o
/usr/bin/g++ -std=c++0x   -D__extern_always_inline=inline -Wall -pedantic -O3 -fomit-frame-pointer -fno-strict-aliasing  -D_FILE_OFFSET_BITS=64 -DNDEBUG -I /usr/local/include  -fPIC  -c parse_regressor.cc -o parse_regressor.o
/usr/bin/g++ -std=c++0x   -D__extern_always_inline=inline -Wall -pedantic -O3 -fomit-frame-pointer -fno-strict-aliasing  -D_FILE_OFFSET_BITS=64 -DNDEBUG -I /usr/local/include  -fPIC  -c parse_primitives.cc -o parse_primitives.o
/usr/bin/g++ -std=c++0x   -D__extern_always_inline=inline -Wall -pedantic -O3 -fomit-frame-pointer -fno-strict-aliasing  -D_FILE_OFFSET_BITS=64 -DNDEBUG -I /usr/local/include  -fPIC  -c unique_sort.cc -o unique_sort.o
/usr/bin/g++ -std=c++0x   -D__extern_always_inline=inline -Wall -pedantic -O3 -fomit-frame-pointer -fno-strict-aliasing  -D_FILE_OFFSET_BITS=64 -DNDEBUG -I /usr/local/include  -fPIC  -c cache.cc -o cache.o
/usr/bin/g++ -std=c++0x   -D__extern_always_inline=inline -Wall -pedantic -O3 -fomit-frame-pointer -fno-strict-aliasing  -D_FILE_OFFSET_BITS=64 -DNDEBUG -I /usr/local/include  -fPIC  -c rand48.cc -o rand48.o
/usr/bin/g++ -std=c++0x   -D__extern_always_inline=inline -Wall -pedantic -O3 -fomit-frame-pointer -fno-strict-aliasing  -D_FILE_OFFSET_BITS=64 -DNDEBUG -I /usr/local/include  -fPIC  -c simple_label.cc -o simple_label.o
/usr/bin/g++ -std=c++0x   -D__extern_always_inline=inline -Wall -pedantic -O3 -fomit-frame-pointer -fno-strict-aliasing  -D_FILE_OFFSET_BITS=64 -DNDEBUG -I /usr/local/include  -fPIC  -c multiclass.cc -o multiclass.o
/usr/bin/g++ -std=c++0x   -D__extern_always_inline=inline -Wall -pedantic -O3 -fomit-frame-pointer -fno-strict-aliasing  -D_FILE_OFFSET_BITS=64 -DNDEBUG -I /usr/local/include  -fPIC  -c oaa.cc -o oaa.o
/usr/bin/g++ -std=c++0x   -D__extern_always_inline=inline -Wall -pedantic -O3 -fomit-frame-pointer -fno-strict-aliasing  -D_FILE_OFFSET_BITS=64 -DNDEBUG -I /usr/local/include  -fPIC  -c multilabel_oaa.cc -o multilabel_oaa.o
/usr/bin/g++ -std=c++0x   -D__extern_always_inline=inline -Wall -pedantic -O3 -fomit-frame-pointer -fno-strict-aliasing  -D_FILE_OFFSET_BITS=64 -DNDEBUG -I /usr/local/include  -fPIC  -c boosting.cc -o boosting.o
/usr/bin/g++ -std=c++0x   -D__extern_always_inline=inline -Wall -pedantic -O3 -fomit-frame-pointer -fno-strict-aliasing  -D_FILE_OFFSET_BITS=64 -DNDEBUG -I /usr/local/include  -fPIC  -c ect.cc -o ect.o
/usr/bin/g++ -std=c++0x   -D__extern_always_inline=inline -Wall -pedantic -O3 -fomit-frame-pointer -fno-strict-aliasing  -D_FILE_OFFSET_BITS=64 -DNDEBUG -I /usr/local/include  -fPIC  -c autolink.cc -o autolink.o
/usr/bin/g++ -std=c++0x   -D__extern_always_inline=inline -Wall -pedantic -O3 -fomit-frame-pointer -fno-strict-aliasing  -D_FILE_OFFSET_BITS=64 -DNDEBUG -I /usr/local/include  -fPIC  -c binary.cc -o binary.o
/usr/bin/g++ -std=c++0x   -D__extern_always_inline=inline -Wall -pedantic -O3 -fomit-frame-pointer -fno-strict-aliasing  -D_FILE_OFFSET_BITS=64 -DNDEBUG -I /usr/local/include  -fPIC  -c lrq.cc -o lrq.o
/usr/bin/g++ -std=c++0x   -D__extern_always_inline=inline -Wall -pedantic -O3 -fomit-frame-pointer -fno-strict-aliasing  -D_FILE_OFFSET_BITS=64 -DNDEBUG -I /usr/local/include  -fPIC  -c cost_sensitive.cc -o cost_sensitive.o
/usr/bin/g++ -std=c++0x   -D__extern_always_inline=inline -Wall -pedantic -O3 -fomit-frame-pointer -fno-strict-aliasing  -D_FILE_OFFSET_BITS=64 -DNDEBUG -I /usr/local/include  -fPIC  -c multilabel.cc -o multilabel.o
/usr/bin/g++ -std=c++0x   -D__extern_always_inline=inline -Wall -pedantic -O3 -fomit-frame-pointer -fno-strict-aliasing  -D_FILE_OFFSET_BITS=64 -DNDEBUG -I /usr/local/include  -fPIC  -c label_dictionary.cc -o label_dictionary.o
/usr/bin/g++ -std=c++0x   -D__extern_always_inline=inline -Wall -pedantic -O3 -fomit-frame-pointer -fno-strict-aliasing  -D_FILE_OFFSET_BITS=64 -DNDEBUG -I /usr/local/include  -fPIC  -c csoaa.cc -o csoaa.o
/usr/bin/g++ -std=c++0x   -D__extern_always_inline=inline -Wall -pedantic -O3 -fomit-frame-pointer -fno-strict-aliasing  -D_FILE_OFFSET_BITS=64 -DNDEBUG -I /usr/local/include  -fPIC  -c cb.cc -o cb.o
/usr/bin/g++ -std=c++0x   -D__extern_always_inline=inline -Wall -pedantic -O3 -fomit-frame-pointer -fno-strict-aliasing  -D_FILE_OFFSET_BITS=64 -DNDEBUG -I /usr/local/include  -fPIC  -c cb_adf.cc -o cb_adf.o
/usr/bin/g++ -std=c++0x   -D__extern_always_inline=inline -Wall -pedantic -O3 -fomit-frame-pointer -fno-strict-aliasing  -D_FILE_OFFSET_BITS=64 -DNDEBUG -I /usr/local/include  -fPIC  -c cb_algs.cc -o cb_algs.o
/usr/bin/g++ -std=c++0x   -D__extern_always_inline=inline -Wall -pedantic -O3 -fomit-frame-pointer -fno-strict-aliasing  -D_FILE_OFFSET_BITS=64 -DNDEBUG -I /usr/local/include  -fPIC  -c search.cc -o search.o
/usr/bin/g++ -std=c++0x   -D__extern_always_inline=inline -Wall -pedantic -O3 -fomit-frame-pointer -fno-strict-aliasing  -D_FILE_OFFSET_BITS=64 -DNDEBUG -I /usr/local/include  -fPIC  -c search_meta.cc -o search_meta.o
/usr/bin/g++ -std=c++0x   -D__extern_always_inline=inline -Wall -pedantic -O3 -fomit-frame-pointer -fno-strict-aliasing  -D_FILE_OFFSET_BITS=64 -DNDEBUG -I /usr/local/include  -fPIC  -c search_sequencetask.cc -o search_sequencetask.o
/usr/bin/g++ -std=c++0x   -D__extern_always_inline=inline -Wall -pedantic -O3 -fomit-frame-pointer -fno-strict-aliasing  -D_FILE_OFFSET_BITS=64 -DNDEBUG -I /usr/local/include  -fPIC  -c search_dep_parser.cc -o search_dep_parser.o
/usr/bin/g++ -std=c++0x   -D__extern_always_inline=inline -Wall -pedantic -O3 -fomit-frame-pointer -fno-strict-aliasing  -D_FILE_OFFSET_BITS=64 -DNDEBUG -I /usr/local/include  -fPIC  -c search_hooktask.cc -o search_hooktask.o
/usr/bin/g++ -std=c++0x   -D__extern_always_inline=inline -Wall -pedantic -O3 -fomit-frame-pointer -fno-strict-aliasing  -D_FILE_OFFSET_BITS=64 -DNDEBUG -I /usr/local/include  -fPIC  -c search_multiclasstask.cc -o search_multiclasstask.o
/usr/bin/g++ -std=c++0x   -D__extern_always_inline=inline -Wall -pedantic -O3 -fomit-frame-pointer -fno-strict-aliasing  -D_FILE_OFFSET_BITS=64 -DNDEBUG -I /usr/local/include  -fPIC  -c search_entityrelationtask.cc -o search_entityrelationtask.o
/usr/bin/g++ -std=c++0x   -D__extern_always_inline=inline -Wall -pedantic -O3 -fomit-frame-pointer -fno-strict-aliasing  -D_FILE_OFFSET_BITS=64 -DNDEBUG -I /usr/local/include  -fPIC  -c search_graph.cc -o search_graph.o
/usr/bin/g++ -std=c++0x   -D__extern_always_inline=inline -Wall -pedantic -O3 -fomit-frame-pointer -fno-strict-aliasing  -D_FILE_OFFSET_BITS=64 -DNDEBUG -I /usr/local/include  -fPIC  -c parse_example.cc -o parse_example.o
/usr/bin/g++ -std=c++0x   -D__extern_always_inline=inline -Wall -pedantic -O3 -fomit-frame-pointer -fno-strict-aliasing  -D_FILE_OFFSET_BITS=64 -DNDEBUG -I /usr/local/include  -fPIC  -c scorer.cc -o scorer.o
/usr/bin/g++ -std=c++0x   -D__extern_always_inline=inline -Wall -pedantic -O3 -fomit-frame-pointer -fno-strict-aliasing  -D_FILE_OFFSET_BITS=64 -DNDEBUG -I /usr/local/include  -fPIC  -c network.cc -o network.o
/usr/bin/g++ -std=c++0x   -D__extern_always_inline=inline -Wall -pedantic -O3 -fomit-frame-pointer -fno-strict-aliasing  -D_FILE_OFFSET_BITS=64 -DNDEBUG -I /usr/local/include  -fPIC  -c parse_args.cc -o parse_args.o
/usr/bin/g++ -std=c++0x   -D__extern_always_inline=inline -Wall -pedantic -O3 -fomit-frame-pointer -fno-strict-aliasing  -D_FILE_OFFSET_BITS=64 -DNDEBUG -I /usr/local/include  -fPIC  -c accumulate.cc -o accumulate.o
/usr/bin/g++ -std=c++0x   -D__extern_always_inline=inline -Wall -pedantic -O3 -fomit-frame-pointer -fno-strict-aliasing  -D_FILE_OFFSET_BITS=64 -DNDEBUG -I /usr/local/include  -fPIC  -c gd.cc -o gd.o
/usr/bin/g++ -std=c++0x   -D__extern_always_inline=inline -Wall -pedantic -O3 -fomit-frame-pointer -fno-strict-aliasing  -D_FILE_OFFSET_BITS=64 -DNDEBUG -I /usr/local/include  -fPIC  -c learner.cc -o learner.o
/usr/bin/g++ -std=c++0x   -D__extern_always_inline=inline -Wall -pedantic -O3 -fomit-frame-pointer -fno-strict-aliasing  -D_FILE_OFFSET_BITS=64 -DNDEBUG -I /usr/local/include  -fPIC  -c lda_core.cc -o lda_core.o
/usr/bin/g++ -std=c++0x   -D__extern_always_inline=inline -Wall -pedantic -O3 -fomit-frame-pointer -fno-strict-aliasing  -D_FILE_OFFSET_BITS=64 -DNDEBUG -I /usr/local/include  -fPIC  -c gd_mf.cc -o gd_mf.o
/usr/bin/g++ -std=c++0x   -D__extern_always_inline=inline -Wall -pedantic -O3 -fomit-frame-pointer -fno-strict-aliasing  -D_FILE_OFFSET_BITS=64 -DNDEBUG -I /usr/local/include  -fPIC  -c mf.cc -o mf.o
/usr/bin/g++ -std=c++0x   -D__extern_always_inline=inline -Wall -pedantic -O3 -fomit-frame-pointer -fno-strict-aliasing  -D_FILE_OFFSET_BITS=64 -DNDEBUG -I /usr/local/include  -fPIC  -c bfgs.cc -o bfgs.o
/usr/bin/g++ -std=c++0x   -D__extern_always_inline=inline -Wall -pedantic -O3 -fomit-frame-pointer -fno-strict-aliasing  -D_FILE_OFFSET_BITS=64 -DNDEBUG -I /usr/local/include  -fPIC  -c noop.cc -o noop.o
/usr/bin/g++ -std=c++0x   -D__extern_always_inline=inline -Wall -pedantic -O3 -fomit-frame-pointer -fno-strict-aliasing  -D_FILE_OFFSET_BITS=64 -DNDEBUG -I /usr/local/include  -fPIC  -c print.cc -o print.o
/usr/bin/g++ -std=c++0x   -D__extern_always_inline=inline -Wall -pedantic -O3 -fomit-frame-pointer -fno-strict-aliasing  -D_FILE_OFFSET_BITS=64 -DNDEBUG -I /usr/local/include  -fPIC  -c example.cc -o example.o
/usr/bin/g++ -std=c++0x   -D__extern_always_inline=inline -Wall -pedantic -O3 -fomit-frame-pointer -fno-strict-aliasing  -D_FILE_OFFSET_BITS=64 -DNDEBUG -I /usr/local/include  -fPIC  -c parser.cc -o parser.o
/usr/bin/g++ -std=c++0x   -D__extern_always_inline=inline -Wall -pedantic -O3 -fomit-frame-pointer -fno-strict-aliasing  -D_FILE_OFFSET_BITS=64 -DNDEBUG -I /usr/local/include  -fPIC  -c loss_functions.cc -o loss_functions.o
/usr/bin/g++ -std=c++0x   -D__extern_always_inline=inline -Wall -pedantic -O3 -fomit-frame-pointer -fno-strict-aliasing  -D_FILE_OFFSET_BITS=64 -DNDEBUG -I /usr/local/include  -fPIC  -c sender.cc -o sender.o
parser.cc:452:26: warning: 'daemon' is deprecated: first deprecated in OS X 10.5
      [-Wdeprecated-declarations]
      if (!all.active && daemon(1,1))
                         ^
/usr/include/stdlib.h:267:6: note: 'daemon' has been explicitly marked
      deprecated here
int      daemon(int, int) __DARWIN_1050(daemon) __OSX_AVAILABLE_BUT_DEPR...
         ^
/usr/bin/g++ -std=c++0x   -D__extern_always_inline=inline -Wall -pedantic -O3 -fomit-frame-pointer -fno-strict-aliasing  -D_FILE_OFFSET_BITS=64 -DNDEBUG -I /usr/local/include  -fPIC  -c nn.cc -o nn.o
/usr/bin/g++ -std=c++0x   -D__extern_always_inline=inline -Wall -pedantic -O3 -fomit-frame-pointer -fno-strict-aliasing  -D_FILE_OFFSET_BITS=64 -DNDEBUG -I /usr/local/include  -fPIC  -c bs.cc -o bs.o
1 warning generated.
/usr/bin/g++ -std=c++0x   -D__extern_always_inline=inline -Wall -pedantic -O3 -fomit-frame-pointer -fno-strict-aliasing  -D_FILE_OFFSET_BITS=64 -DNDEBUG -I /usr/local/include  -fPIC  -c cbify.cc -o cbify.o
/usr/bin/g++ -std=c++0x   -D__extern_always_inline=inline -Wall -pedantic -O3 -fomit-frame-pointer -fno-strict-aliasing  -D_FILE_OFFSET_BITS=64 -DNDEBUG -I /usr/local/include  -fPIC  -c topk.cc -o topk.o
/usr/bin/g++ -std=c++0x   -D__extern_always_inline=inline -Wall -pedantic -O3 -fomit-frame-pointer -fno-strict-aliasing  -D_FILE_OFFSET_BITS=64 -DNDEBUG -I /usr/local/include  -fPIC  -c stagewise_poly.cc -o stagewise_poly.o
/usr/bin/g++ -std=c++0x   -D__extern_always_inline=inline -Wall -pedantic -O3 -fomit-frame-pointer -fno-strict-aliasing  -D_FILE_OFFSET_BITS=64 -DNDEBUG -I /usr/local/include  -fPIC  -c log_multi.cc -o log_multi.o
/usr/bin/g++ -std=c++0x   -D__extern_always_inline=inline -Wall -pedantic -O3 -fomit-frame-pointer -fno-strict-aliasing  -D_FILE_OFFSET_BITS=64 -DNDEBUG -I /usr/local/include  -fPIC  -c active.cc -o active.o
/usr/bin/g++ -std=c++0x   -D__extern_always_inline=inline -Wall -pedantic -O3 -fomit-frame-pointer -fno-strict-aliasing  -D_FILE_OFFSET_BITS=64 -DNDEBUG -I /usr/local/include  -fPIC  -c kernel_svm.cc -o kernel_svm.o
/usr/bin/g++ -std=c++0x   -D__extern_always_inline=inline -Wall -pedantic -O3 -fomit-frame-pointer -fno-strict-aliasing  -D_FILE_OFFSET_BITS=64 -DNDEBUG -I /usr/local/include  -fPIC  -c best_constant.cc -o best_constant.o
/usr/bin/g++ -std=c++0x   -D__extern_always_inline=inline -Wall -pedantic -O3 -fomit-frame-pointer -fno-strict-aliasing  -D_FILE_OFFSET_BITS=64 -DNDEBUG -I /usr/local/include  -fPIC  -c ftrl.cc -o ftrl.o
/usr/bin/g++ -std=c++0x   -D__extern_always_inline=inline -Wall -pedantic -O3 -fomit-frame-pointer -fno-strict-aliasing  -D_FILE_OFFSET_BITS=64 -DNDEBUG -I /usr/local/include  -fPIC  -c svrg.cc -o svrg.o
/usr/bin/g++ -std=c++0x   -D__extern_always_inline=inline -Wall -pedantic -O3 -fomit-frame-pointer -fno-strict-aliasing  -D_FILE_OFFSET_BITS=64 -DNDEBUG -I /usr/local/include  -fPIC  -c lrqfa.cc -o lrqfa.o
/usr/bin/g++ -std=c++0x   -D__extern_always_inline=inline -Wall -pedantic -O3 -fomit-frame-pointer -fno-strict-aliasing  -D_FILE_OFFSET_BITS=64 -DNDEBUG -I /usr/local/include  -fPIC  -c interact.cc -o interact.o
/usr/bin/g++ -std=c++0x   -D__extern_always_inline=inline -Wall -pedantic -O3 -fomit-frame-pointer -fno-strict-aliasing  -D_FILE_OFFSET_BITS=64 -DNDEBUG -I /usr/local/include  -fPIC  -c comp_io.cc -o comp_io.o
/usr/bin/g++ -std=c++0x   -D__extern_always_inline=inline -Wall -pedantic -O3 -fomit-frame-pointer -fno-strict-aliasing  -D_FILE_OFFSET_BITS=64 -DNDEBUG -I /usr/local/include  -fPIC  -c interactions.cc -o interactions.o
/usr/bin/g++ -std=c++0x   -D__extern_always_inline=inline -Wall -pedantic -O3 -fomit-frame-pointer -fno-strict-aliasing  -D_FILE_OFFSET_BITS=64 -DNDEBUG -I /usr/local/include  -fPIC  -c vw_exception.cc -o vw_exception.o
/usr/bin/g++ -std=c++0x   -D__extern_always_inline=inline -Wall -pedantic -O3 -fomit-frame-pointer -fno-strict-aliasing  -D_FILE_OFFSET_BITS=64 -DNDEBUG -I /usr/local/include  -fPIC  -c allreduce.cc -o allreduce.o
/usr/bin/g++ -std=c++0x   -D__extern_always_inline=inline -Wall -pedantic -O3 -fomit-frame-pointer -fno-strict-aliasing  -D_FILE_OFFSET_BITS=64 -DNDEBUG -I /usr/local/include  -fPIC  -o active_interactor active_interactor.cc
ar rcs liballreduce.a allreduce.o
ar rcs libvw.a hash.o global_data.o io_buf.o parse_regressor.o parse_primitives.o unique_sort.o cache.o rand48.o simple_label.o multiclass.o oaa.o multilabel_oaa.o boosting.o ect.o autolink.o binary.o lrq.o cost_sensitive.o multilabel.o label_dictionary.o csoaa.o cb.o cb_adf.o cb_algs.o search.o search_meta.o search_sequencetask.o search_dep_parser.o search_hooktask.o search_multiclasstask.o search_entityrelationtask.o search_graph.o parse_example.o scorer.o network.o parse_args.o accumulate.o gd.o learner.o lda_core.o gd_mf.o mf.o bfgs.o noop.o print.o example.o parser.o loss_functions.o sender.o nn.o bs.o cbify.o topk.o stagewise_poly.o log_multi.o active.o kernel_svm.o best_constant.o ftrl.o svrg.o lrqfa.o interact.o comp_io.o interactions.o vw_exception.o
/usr/bin/g++ -std=c++0x   -D__extern_always_inline=inline -Wall -pedantic -O3 -fomit-frame-pointer -fno-strict-aliasing  -D_FILE_OFFSET_BITS=64 -DNDEBUG -I /usr/local/include  -fPIC  -o vw main.o -L. -l vw -l allreduce -L /usr/local/lib -lboost_program_options-mt -lboost_serialization-mt -l pthread -l z
cd cluster; /Library/Developer/CommandLineTools/usr/bin/make
/usr/bin/clang++ -std=c++0x   -D__extern_always_inline=inline -Wall -pedantic -O3 -fomit-frame-pointer -fno-strict-aliasing  -D_FILE_OFFSET_BITS=64 -DNDEBUG -I /usr/local/include  -fPIC  -c spanning_tree.cc -o spanning_tree.o
spanning_tree.cc:161:9: warning: 'daemon' is deprecated: first deprecated in OS
      X 10.5 [-Wdeprecated-declarations]
    if (daemon(1,1))
        ^
/usr/include/stdlib.h:267:6: note: 'daemon' has been explicitly marked
      deprecated here
int      daemon(int, int) __DARWIN_1050(daemon) __OSX_AVAILABLE_BUT_DEPR...
         ^
1 warning generated.
/usr/bin/clang++ -std=c++0x   -D__extern_always_inline=inline -Wall -pedantic -O3 -fomit-frame-pointer -fno-strict-aliasing  -D_FILE_OFFSET_BITS=64 -DNDEBUG -I /usr/local/include  -fPIC  -o spanning_tree spanning_tree.o 
cd library; /Library/Developer/CommandLineTools/usr/bin/make things
/usr/bin/g++ -g -std=c++0x   -D__extern_always_inline=inline -Wall -pedantic -O3 -fomit-frame-pointer -fno-strict-aliasing  -D_FILE_OFFSET_BITS=64 -DNDEBUG -I /usr/local/include  -fPIC  -o ezexample_predict ezexample_predict.cc -L ../vowpalwabbit -l vw -l allreduce -L /usr/local/lib -lboost_program_options-mt -lboost_serialization-mt -l pthread -l z
/usr/bin/g++ -g -std=c++0x   -D__extern_always_inline=inline -Wall -pedantic -O3 -fomit-frame-pointer -fno-strict-aliasing  -D_FILE_OFFSET_BITS=64 -DNDEBUG -I /usr/local/include  -fPIC  -o ezexample_train ezexample_train.cc -L ../vowpalwabbit -l vw -l allreduce -L /usr/local/lib -lboost_program_options-mt -lboost_serialization-mt -l pthread -l z
/usr/bin/g++ -g -std=c++0x   -D__extern_always_inline=inline -Wall -pedantic -O3 -fomit-frame-pointer -fno-strict-aliasing  -D_FILE_OFFSET_BITS=64 -DNDEBUG -I /usr/local/include  -fPIC  -o library_example library_example.cc -L ../vowpalwabbit -l vw -l allreduce -L /usr/local/lib -lboost_program_options-mt -lboost_serialization-mt -l pthread -l z
/usr/bin/g++ -g -std=c++0x   -D__extern_always_inline=inline -Wall -pedantic -O3 -fomit-frame-pointer -fno-strict-aliasing  -D_FILE_OFFSET_BITS=64 -DNDEBUG -I /usr/local/include  -fPIC  -o recommend recommend.cc -L ../vowpalwabbit -l vw -l allreduce -L /usr/local/lib -lboost_program_options-mt -lboost_serialization-mt -l pthread -l z
/usr/bin/g++ -g -std=c++0x   -D__extern_always_inline=inline -Wall -pedantic -O3 -fomit-frame-pointer -fno-strict-aliasing  -D_FILE_OFFSET_BITS=64 -DNDEBUG -I /usr/local/include  -fPIC  -o gd_mf_weights gd_mf_weights.cc -L ../vowpalwabbit -l vw -l allreduce -L /usr/local/lib -lboost_program_options-mt -lboost_serialization-mt -l pthread -l z
/usr/bin/g++ -g -std=c++0x   -D__extern_always_inline=inline -Wall -pedantic -O3 -fomit-frame-pointer -fno-strict-aliasing  -D_FILE_OFFSET_BITS=64 -DNDEBUG -I /usr/local/include  -fPIC  -o test_search test_search.cc -L ../vowpalwabbit -l vw -l allreduce -L /usr/local/lib -lboost_program_options-mt -lboost_serialization-mt -l pthread -l z

make test
cd vowpalwabbit; /Library/Developer/CommandLineTools/usr/bin/make -j 4 things
make[1]: Nothing to be done for `things'.
cd library; /Library/Developer/CommandLineTools/usr/bin/make things
make[1]: Nothing to be done for `things'.
vw running test-suite...
(cd test && ./RunTests -d -fe -E 0.001 ../vowpalwabbit/vw ../vowpalwabbit/vw)
Testing on: hostname=air-mac OS=darwin
Testing vw: ../vowpalwabbit/vw
Testing lda: ../vowpalwabbit/vw
RunTests: '-D' to see any diff output
RunTests: '-o' to force overwrite references
RunTests: test 1: stderr OK
RunTests: test 2: stderr OK
RunTests: test 2: predict OK
RunTests: test 3: stderr OK
RunTests: test 4: stdout OK
RunTests: test 4: stderr OK
RunTests: test 5: stderr OK
RunTests: test 6: stderr OK
RunTests: test 6: minor (<0.001) precision differences ignored
RunTests: test 6: predict OK
RunTests: test 7: stderr OK
RunTests: test 8: stderr OK
RunTests: test 8: minor (<0.001) precision differences ignored
RunTests: test 8: predict OK
RunTests: test 9: stderr OK
RunTests: test 9: predict OK
RunTests: test 10: stderr OK
RunTests: test 10: predict OK
RunTests: test 11: stderr OK
RunTests: test 12: stderr OK
RunTests: test 13: stderr OK
RunTests: test 14: stdout OK
RunTests: test 14: minor (<0.001) precision differences ignored
RunTests: test 14: stderr OK
RunTests: test 15: stdout OK
RunTests: test 15: stderr OK
RunTests: test 16: stdout OK
--- diff -u --minimal train-sets/ref/rcv1_small.stderr stderr.tmp
--- train-sets/ref/rcv1_small.stderr    2015-08-13 00:22:20.000000000 +0300
+++ stderr.tmp  2015-08-13 00:33:33.000000000 +0300
@@ -17,7 +17,7 @@
  5 0.47879     0.00006     0.00617      0.595892   0.183063                            0.47184     1.00000   
  6 0.47750     0.00000     0.00221      0.703360   0.403715                            0.68626     1.00000   
  7 0.47680     0.00000     0.00038      0.588395   0.175459                            0.08911     1.00000   
- 8 0.47671     0.00000     0.00002      0.568445   0.136827                            0.00444     1.00000   
+ 8 0.47671     0.00000     0.00002      0.568443   0.136827                            0.00444     1.00000   

 finished run
 number of examples = 8000
RunTests: test 16: FAILED: ref(train-sets/ref/rcv1_small.stderr) != stderr(stderr.tmp)
    cmd: ../vowpalwabbit/vw -k -c -d train-sets/rcv1_small.dat --loss_function=logistic -b 20 --bfgs --mem 7 --passes 20 --termination 0.001 --l2 1.0 --holdout_off
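
To see how small the reported discrepancy is, the two differing lines from the diff above can be compared token by token. This is only a sketch for reading the log, and it does not reproduce how RunTests itself compares outputs; the variable names are chosen here for illustration.

```python
# The reference line and the produced line for pass 8, copied from the diff.
ref = "8 0.47671     0.00000     0.00002      0.568445   0.136827                            0.00444     1.00000"
out = "8 0.47671     0.00000     0.00002      0.568443   0.136827                            0.00444     1.00000"

EPSILON = 0.001  # the value passed to RunTests via -E

# Collect every whitespace-separated token that differs, with its absolute gap.
diffs = [
    (a, b, abs(float(a) - float(b)))
    for a, b in zip(ref.split(), out.split())
    if a != b
]
for a, b, delta in diffs:
    print(f"{a} vs {b}: |delta| = {delta:.1e}, below epsilon: {delta < EPSILON}")
```

Only one token differs, and its absolute difference is far below the 0.001 given on the command line.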

opened by mrgloom 43

Trying to upgrade from vw-jni-8.2.0 to something close to vw-jni-8.4.1-SNAPSHOT
Using VW 8.4.0 installed with brew I created a simple test set initializing VW with

$ vw --csoaa 10 -b 24 --l2 0.0 -l 0.1 -c -k --passes 100 -f /Users/pat/big-data/harness/models/test_resource --save_resume

Then I paste examples in:

0:0.0 1:1.0 | user_user_2 testGroupId_1
0:0.0 1:1.0 | user_user_2 testGroupId_1
0:0.0 1:1.0 | user_user_2 testGroupId_1
0:0.0 1:1.0 | user_user_2 testGroupId_1
0:1.0 1:0.0 | user_user_1 testGroupId_1
0:1.0 1:0.0 | user_user_1 testGroupId_1
0:1.0 1:0.0 | user_user_1 testGroupId_1
0:1.0 1:0.0 | user_user_1 testGroupId_1
save_

At the save_ line, the file /Users/pat/big-data/harness/models/test_resource is updated, so all is well.
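
For reference, the pasted examples follow the cost-sensitive text layout of class:cost pairs, a pipe, then the features. A minimal sketch that builds the same lines programmatically; the helper name csoaa_line is hypothetical and is not part of VW or the JNI wrapper.

```python
def csoaa_line(costs, features):
    """Format one cost-sensitive example as 'class:cost ... | feature ...'."""
    label = " ".join(f"{cls}:{cost}" for cls, cost in costs.items())
    return f"{label} | {' '.join(features)}"

# The two distinct examples from the session above, four copies each,
# followed by the literal save_ line used in the session.
user_2 = csoaa_line({0: 0.0, 1: 1.0}, ["user_user_2", "testGroupId_1"])
user_1 = csoaa_line({0: 1.0, 1: 0.0}, ["user_user_1", "testGroupId_1"])
lines = [user_2] * 4 + [user_1] * 4 + ["save_"]
print("\n".join(lines))
```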

Using the last available JNI binary wrapper for 8.2.0, doing the same thing from Java does not update the model file. I'm not running in --quiet mode and there is no complaint from VW.

The save_ pseudo example does apparently work for 8.2.0 since another user is using it in CLI and daemon mode.

Is this feature not supported with JNI?

On the advice of @arielf it appears I need something like vw-jni-8.4.1-SNAPSHOT so trying to build for dev machine (MBP) and deploy machine (ubuntu). Dev machine first.
opened by pferrel 41

Python wrapper installation fails

I'm not able to pip install vowpalwabbit to install the python wrapper. I don't know enough to understand why it's failing, but I thought it might be worth bringing to someone's attention.

I'm on OSX and using an Anaconda environment. I installed vowpal wabbit from homebrew.

Here's my traceback:

Collecting vowpalwabbit
  Using cached vowpalwabbit-8.2.0.tar.gz
Building wheels for collected packages: vowpalwabbit
  Running setup.py bdist_wheel for vowpalwabbit ... error
  Complete output from command /Users/vvvvv/anaconda/envs/trendrank/bin/python -u -c "import setuptools, tokenize;__file__='/private/var/folders/vx/n21m77w12nd0tb9xwhfcgd75gpm86h/T/pip-build-74A7hN/vowpalwabbit/setup.py';exec(compile(getattr(tokenize, 'open', open)(__file__).read().replace('\r\n', '\n'), __file__, 'exec'))" bdist_wheel -d /var/folders/vx/n21m77w12nd0tb9xwhfcgd75gpm86h/T/tmp5FosrOpip-wheel- --python-tag cp27:
  running bdist_wheel
  running build
  running build_py
  creating build
  creating build/lib.macosx-10.5-x86_64-2.7
  creating build/lib.macosx-10.5-x86_64-2.7/vowpalwabbit
  copying vowpalwabbit/__init__.py -> build/lib.macosx-10.5-x86_64-2.7/vowpalwabbit
  copying vowpalwabbit/pyvw.py -> build/lib.macosx-10.5-x86_64-2.7/vowpalwabbit
  copying vowpalwabbit/sklearn_vw.py -> build/lib.macosx-10.5-x86_64-2.7/vowpalwabbit
  running egg_info
  writing vowpalwabbit.egg-info/PKG-INFO
  writing top-level names to vowpalwabbit.egg-info/top_level.txt
  writing dependency_links to vowpalwabbit.egg-info/dependency_links.txt
  warning: manifest_maker: standard file '-c' not found

  reading manifest file 'vowpalwabbit.egg-info/SOURCES.txt'
  reading manifest template 'MANIFEST.in'
  warning: no previously-included files matching '*.o' found anywhere in distribution
  warning: no previously-included files matching '*.exe' found anywhere in distribution
  warning: no previously-included files matching '*.pyc' found anywhere in distribution
  writing manifest file 'vowpalwabbit.egg-info/SOURCES.txt'
  running build_ext
  Traceback (most recent call last):
    File "<string>", line 1, in <module>
    File "/private/var/folders/vx/n21m77w12nd0tb9xwhfcgd75gpm86h/T/pip-build-74A7hN/vowpalwabbit/setup.py", line 184, in <module>
      tests_require=['tox'],

[...]

    File "/private/var/folders/vx/n21m77w12nd0tb9xwhfcgd75gpm86h/T/pip-build-74A7hN/vowpalwabbit/setup.py", line 38, in find_boost
      raise Exception('Could not find boost python library')
  Exception: Could not find boost python library

  ----------------------------------------
  Failed building wheel for vowpalwabbit
  Running setup.py clean for vowpalwabbit
Failed to build vowpalwabbit
Installing collected packages: vowpalwabbit
  Running setup.py install for vowpalwabbit ... error
    Complete output from command /Users/vvvvv/anaconda/envs/trendrank/bin/python -u -c "import setuptools, tokenize;__file__='/private/var/folders/vx/n21m77w12nd0tb9xwhfcgd75gpm86h/T/pip-build-74A7hN/vowpalwabbit/setup.py';exec(compile(getattr(tokenize, 'open', open)(__file__).read().replace('\r\n', '\n'), __file__, 'exec'))" install --record /var/folders/vx/n21m77w12nd0tb9xwhfcgd75gpm86h/T/pip-4GRRiq-record/install-record.txt --single-version-externally-managed --compile:
    running install
    running build
    running build_py
    creating build
    creating build/lib.macosx-10.5-x86_64-2.7
    creating build/lib.macosx-10.5-x86_64-2.7/vowpalwabbit
    copying vowpalwabbit/__init__.py -> build/lib.macosx-10.5-x86_64-2.7/vowpalwabbit
    copying vowpalwabbit/pyvw.py -> build/lib.macosx-10.5-x86_64-2.7/vowpalwabbit
    copying vowpalwabbit/sklearn_vw.py -> build/lib.macosx-10.5-x86_64-2.7/vowpalwabbit
    running egg_info
    creating vowpalwabbit.egg-info
    writing vowpalwabbit.egg-info/PKG-INFO
    writing top-level names to vowpalwabbit.egg-info/top_level.txt
    writing dependency_links to vowpalwabbit.egg-info/dependency_links.txt
    writing manifest file 'vowpalwabbit.egg-info/SOURCES.txt'
    warning: manifest_maker: standard file '-c' not found

    reading manifest file 'vowpalwabbit.egg-info/SOURCES.txt'
    reading manifest template 'MANIFEST.in'
    warning: no files found matching '*' under directory 'src'
    warning: no previously-included files matching '*.o' found anywhere in distribution
    warning: no previously-included files matching '*.exe' found anywhere in distribution
    warning: no previously-included files matching '*.pyc' found anywhere in distribution
    writing manifest file 'vowpalwabbit.egg-info/SOURCES.txt'
    running build_ext
    make: *** No rule to make target `clean'.  Stop.
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/private/var/folders/vx/n21m77w12nd0tb9xwhfcgd75gpm86h/T/pip-build-74A7hN/vowpalwabbit/setup.py", line 184, in <module>
        tests_require=['tox'],

[...]

    subprocess.CalledProcessError: Command '['make', 'clean']' returned non-zero exit status 2

    ----------------------------------------
Command "/Users/vvvvv/anaconda/envs/trendrank/bin/python -u -c "import setuptools, tokenize;__file__='/private/var/folders/vx/n21m77w12nd0tb9xwhfcgd75gpm86h/T/pip-build-74A7hN/vowpalwabbit/setup.py';exec(compile(getattr(tokenize, 'open', open)(__file__).read().replace('\r\n', '\n'), __file__, 'exec'))" install --record /var/folders/vx/n21m77w12nd0tb9xwhfcgd75gpm86h/T/pip-4GRRiq-record/install-record.txt --single-version-externally-managed --compile" failed with error code 1 in /private/var/folders/vx/n21m77w12nd0tb9xwhfcgd75gpm86h/T/pip-build-74A7hN/vowpalwabbit/

opened by vaer-k 41

Accept multiline examples in the JNI interface

This is a pretty major refactoring of the JNI layer. The impetus for this refactoring was the ability to accept multiline examples, but it led to a much larger change. The biggest change is the decoupling of the return type and the prediction function. This proved necessary to support all the different ways to extract data with --cb_explore.

I am going to go over this in detail with @deaktator offline, so @JohnLangford let's hold off on merging this for now. If anyone else has any comments at this time they are certainly welcome.

opened by jmorra 40

refactor: change initial pool size to 0

Forcing all consumers to have a preallocated pool is quite restrictive, especially now that the pool is dynamic (in the past it was a fixed size ring buffer).

Removing this initial allocation improves support for library scenarios, and in practice should have no effect on driver performance. (hyperfine shows this too)

Changing the default initial size from 256 to 0 roughly halves base memory consumption, from 13.5MB to 6.2MB (an example is ~31kb).
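
As a rough sanity check on those numbers, 256 preallocated examples at about 31kb each should account for most of the saving. A back-of-the-envelope sketch, assuming 1MB = 1024kb and taking the figures in this description at face value:

```python
POOL_SIZE = 256              # old default initial pool size
EXAMPLE_KB = 31              # approximate size of one example, per the description
BEFORE_MB, AFTER_MB = 13.5, 6.2

pool_mb = POOL_SIZE * EXAMPLE_KB / 1024   # memory held by the preallocated pool
saved_mb = BEFORE_MB - AFTER_MB           # reported reduction in base memory

print(f"estimated pool size: {pool_mb:.2f} MB")
print(f"reported saving:     {saved_mb:.2f} MB")
```

The estimate and the reported saving agree to within about half a megabyte, which is plausible given that the per-example size is only approximate.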

Startup time should also be reduced. It is hard to measure, but running vw --no_stdin under hyperfine shows:

Benchmark 1: ./build/vowpalwabbit/cli/vw --no_stdin
  Time (mean ± σ):       4.5 ms ±   2.1 ms    [User: 1.6 ms, System: 0.9 ms]
  Range (min … max):     3.0 ms …  30.6 ms    682 runs

  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet PC without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.

Benchmark 2: ./vw-master --no_stdin
  Time (mean ± σ):       5.8 ms ±   1.6 ms    [User: 2.0 ms, System: 1.5 ms]
  Range (min … max):     4.2 ms …  21.6 ms    658 runs

  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet PC without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.

Summary
  './build/vowpalwabbit/cli/vw --no_stdin' ran
    1.29 ± 0.69 times faster than './vw-master --no_stdin'
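
The summary line can be approximately reproduced from the rounded means and standard deviations printed above. The sketch assumes first-order error propagation for a ratio of two independent measurements; the exact formula hyperfine uses is not confirmed here, and it works from unrounded values, so the last digit may differ.

```python
import math

new_mean, new_sd = 4.5, 2.1   # ms, ./build/vowpalwabbit/cli/vw --no_stdin
old_mean, old_sd = 5.8, 1.6   # ms, ./vw-master --no_stdin

ratio = old_mean / new_mean
# First-order error propagation for a quotient of two independent quantities.
ratio_sd = ratio * math.sqrt((new_sd / new_mean) ** 2 + (old_sd / old_mean) ** 2)

print(f"{ratio:.2f} ± {ratio_sd:.2f} times faster")
```

The uncertainty is more than half the size of the ratio itself, which matches the caveat that the startup improvement is hard to measure.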

opened by jackgerrits 0

feat: constexpr uniform_hash and type fixes

Makes uniform hash constexpr.

It clarifies the fact we use x86 murmurhash 32bit, and fixes the types accordingly. Note, we were casting internally anyway which made for an odd API. Now it is clearer by the parameter types.
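
For readers unfamiliar with the algorithm, here is a plain Python sketch of the public MurmurHash3 x86 32-bit function. It illustrates the algorithm and the role of fixed 32-bit unsigned arithmetic; it is not VW's code. The name murmur3_x86_32 is chosen here, and that uniform_hash produces identical values for the same bytes and seed is an assumption.

```python
MASK32 = 0xFFFFFFFF  # keep every intermediate value in unsigned 32-bit range


def _rotl32(x, r):
    """Rotate a 32-bit value left by r bits."""
    return ((x << r) | (x >> (32 - r))) & MASK32


def murmur3_x86_32(data: bytes, seed: int = 0) -> int:
    c1, c2 = 0xCC9E2D51, 0x1B873593
    h = seed & MASK32
    nblocks = len(data) // 4

    # Body: mix each little-endian 4-byte block into the state.
    for i in range(nblocks):
        k = int.from_bytes(data[4 * i:4 * i + 4], "little")
        k = (k * c1) & MASK32
        k = _rotl32(k, 15)
        k = (k * c2) & MASK32
        h ^= k
        h = _rotl32(h, 13)
        h = (h * 5 + 0xE6546B64) & MASK32

    # Tail: the remaining 0 to 3 bytes, if any.
    tail = data[4 * nblocks:]
    if tail:
        k = int.from_bytes(tail, "little")
        k = (k * c1) & MASK32
        k = _rotl32(k, 15)
        k = (k * c2) & MASK32
        h ^= k

    # Finalization: fold in the length, then avalanche the bits.
    h ^= len(data) & MASK32
    h ^= h >> 16
    h = (h * 0x85EBCA6B) & MASK32
    h ^= h >> 13
    h = (h * 0xC2B2AE35) & MASK32
    h ^= h >> 16
    return h


print(hex(murmur3_x86_32(b"foo")))
```

Python integers are unbounded, so the explicit masking plays the role that the fixed-width parameter and return types play in a typed API.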

opened by jackgerrits 0

refactor: cb_algs finish functions

One small thing here: code-wise it's more elegant, but it does add one extra bit of computation in the hot path. The finish functions now choose their correct label instead of having different pointers assigned in the setup function. If anybody has an issue with this, I can always just split each function up to avoid the extra computation.

opened by peterychang 0

Releases(9.6.0)

9.6.0(Nov 8, 2022)
Large Action Spaces

This introduces the Large Action Spaces (LAS) feature. LAS is an algorithm that lets exploration happen efficiently when there are a large number of actions in a contextual bandit dataset. The main idea behind it is to eliminate actions that are similar and explore only over the most diverse actions in the dataset. For more information, see the LAS wiki page.
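
To give a feel for "explore only over the most diverse actions", here is a toy sketch. It is not the LAS algorithm: the changelog entries for this release refer to an SVD and singular values, whereas this sketch swaps in a much simpler greedy farthest-point heuristic over made-up two-dimensional action features. The function name pick_diverse and the data are assumptions for illustration only.

```python
import math


def pick_diverse(actions, k):
    """Greedily pick up to k indices whose feature vectors are far apart."""
    chosen = [0]  # start from the first action (an arbitrary choice)
    while len(chosen) < min(k, len(actions)):
        # Pick the unchosen action that is farthest from its nearest chosen one.
        best = max(
            (i for i in range(len(actions)) if i not in chosen),
            key=lambda i: min(math.dist(actions[i], actions[j]) for j in chosen),
        )
        chosen.append(best)
    return chosen


# Five made-up actions: two near the origin, two near (5, 5), one at (0, 5).
actions = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0), (0.0, 5.0)]
diverse = pick_diverse(actions, 3)
print(diverse)
```

Near-duplicate actions (indices 1 and 2 in this toy data) are skipped, so exploration would only need to cover one representative per cluster.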

Style Changes

This release introduces additional style changes to make VW code formatting more consistent. Variable and type names are snake_case, constants and macros are UPPERCASE, and the VW::details namespace is used to hide implementation details.

Faster Compile Times

Through @jackgerrits's changes to some header files, VW now builds much faster! On one machine, compile times went from around 30 seconds to 17.

What's Changed

build: remove FORCE_COLORED_OUTPUT by @jackgerrits in https://github.com/VowpalWabbit/vowpal_wabbit/pull/4213

style: fix some more style issues by @jackgerrits in https://github.com/VowpalWabbit/vowpal_wabbit/pull/4211

feat: implement serialization and deserialization for model deltas by @jackgerrits in https://github.com/VowpalWabbit/vowpal_wabbit/pull/4222

chore: update boost_math, fmt, vcpkg, zlib by @bassmang in https://github.com/VowpalWabbit/vowpal_wabbit/pull/4223

style: another round of style updates by @jackgerrits in https://github.com/VowpalWabbit/vowpal_wabbit/pull/4224

style: update style and namespacing for constants by @jackgerrits in https://github.com/VowpalWabbit/vowpal_wabbit/pull/4226

fix: [LAS] don't use shared features during SVD calculation by @olgavrou in https://github.com/VowpalWabbit/vowpal_wabbit/pull/4225

fix: [LAS] ensure vw prediction makes it to exploration by @olgavrou in https://github.com/VowpalWabbit/vowpal_wabbit/pull/4227

fix: VW should not add anything to namespace std by @jackgerrits in https://github.com/VowpalWabbit/vowpal_wabbit/pull/4230

style: update style of label_type_t by @jackgerrits in https://github.com/VowpalWabbit/vowpal_wabbit/pull/4229

test: [epsilon decay] find champ change in simulator by @bassmang in https://github.com/VowpalWabbit/vowpal_wabbit/pull/4228

refactor: reduce build time by @jackgerrits in https://github.com/VowpalWabbit/vowpal_wabbit/pull/4232

feat: [LAS] filter out (potentially) more actions than d based on singular values by @olgavrou in https://github.com/VowpalWabbit/vowpal_wabbit/pull/4234

build: do not add sse flags when doing MacOS arm cross build by @jackgerrits in https://github.com/VowpalWabbit/vowpal_wabbit/pull/4235

style: update label type to all caps by @bassmang in https://github.com/VowpalWabbit/vowpal_wabbit/pull/4236

style: update prediction type to all caps by @bassmang in https://github.com/VowpalWabbit/vowpal_wabbit/pull/4237

build: allow VW_CXX_STANDARD to be provided by consumer of VW by @jackgerrits in https://github.com/VowpalWabbit/vowpal_wabbit/pull/4238

refactor: remove beam.h by @jackgerrits in https://github.com/VowpalWabbit/vowpal_wabbit/pull/4241

style: more style fixes by @jackgerrits in https://github.com/VowpalWabbit/vowpal_wabbit/pull/4240

style: another round of style updates by @jackgerrits in https://github.com/VowpalWabbit/vowpal_wabbit/pull/4242

feat: [LAS] sparse Rademacher by @olgavrou in https://github.com/VowpalWabbit/vowpal_wabbit/pull/4243

chore: [LAS] remove unused implementations and set max actions default by @olgavrou in https://github.com/VowpalWabbit/vowpal_wabbit/pull/4247

refactor: move LabelDict namespace items into other namespaces, add const by @jackgerrits in https://github.com/VowpalWabbit/vowpal_wabbit/pull/4245

fix: don't run tests with iterations (and a simulator) with valgrind … by @olgavrou in https://github.com/VowpalWabbit/vowpal_wabbit/pull/4251

chore: [LAS] remove compile time flag and its own custom CI by @olgavrou in https://github.com/VowpalWabbit/vowpal_wabbit/pull/4249

fix: don't run tests with iterations with asan and ubsan by @olgavrou in https://github.com/VowpalWabbit/vowpal_wabbit/pull/4253

fix: [LAS] block size should never be zero by @olgavrou in https://github.com/VowpalWabbit/vowpal_wabbit/pull/4252

fix: [LAS] always return full predictions by @olgavrou in https://github.com/VowpalWabbit/vowpal_wabbit/pull/4255

refactor: use model_utils for save_load in las by @jackgerrits in https://github.com/VowpalWabbit/vowpal_wabbit/pull/4263

refactor: remove unused field in sparse_iterator by @jackgerrits in https://github.com/VowpalWabbit/vowpal_wabbit/pull/4259

test: add make_args for easier workspace creation in tests by @jackgerrits in https://github.com/VowpalWabbit/vowpal_wabbit/pull/4267

ci: change caching for benchmark job by @jackgerrits in https://github.com/VowpalWabbit/vowpal_wabbit/pull/4269

build: resolve cmake version check TODO in DetectCXXStandard.cmake by @jackgerrits in https://github.com/VowpalWabbit/vowpal_wabbit/pull/4268

fix!: save/load entire tag in flat_example + bump version to 9.6 by @bassmang in https://github.com/VowpalWabbit/vowpal_wabbit/pull/4266

test: apply make_args across test projects by @jackgerrits in https://github.com/VowpalWabbit/vowpal_wabbit/pull/4272

fix: This patches a bug with flat_example collision cleanup by @mrucker in https://github.com/VowpalWabbit/vowpal_wabbit/pull/4265

fix: explore_eval don't learn if logged action not in predicted actions by @olgavrou in https://github.com/VowpalWabbit/vowpal_wabbit/pull/4262

fix: [LAS] full predictions regardless of learn/predict path by @olgavrou in https://github.com/VowpalWabbit/vowpal_wabbit/pull/4273

Full Changelog: https://github.com/VowpalWabbit/vowpal_wabbit/compare/9.5.0...9.6.0
9.5.0(Oct 14, 2022)
Style Changes

This release includes some improvements to the style and naming conventions in VW. This includes using snake_case for all variable and class names, and converting most structs to classes. These style changes will be standardized and enforced in later releases.

Confidence Sequence Estimator

Confidence sequences have become the default estimators when evaluating policies in multi-model reductions such as AutoML and Epsilon Decay.

Changes

Features

feat: integrate confidence sequences in automl and epsilon_decay by @bassmang in https://github.com/VowpalWabbit/vowpal_wabbit/pull/4125

feat: add experimental to python by @bassmang in https://github.com/VowpalWabbit/vowpal_wabbit/pull/4165

feat: Improve large actions multithreading. by @zwd-ms in https://github.com/VowpalWabbit/vowpal_wabbit/pull/4158

feat: [epsilon decay] add initial epsilon option by @bassmang in https://github.com/VowpalWabbit/vowpal_wabbit/pull/4170

feat: Model merging with delta objects by @byronxu99 in https://github.com/VowpalWabbit/vowpal_wabbit/pull/4177

feat: Add ftrl to dump_weights_to_json and compat CIs by @bassmang in https://github.com/VowpalWabbit/vowpal_wabbit/pull/4193

Fixes

fix: build issue for model merger tool by @jackgerrits in https://github.com/VowpalWabbit/vowpal_wabbit/pull/4160

fix: remove experimental and fix up model version test by @bassmang in https://github.com/VowpalWabbit/vowpal_wabbit/pull/4162

fix: test 67 windows failure by @bassmang in https://github.com/VowpalWabbit/vowpal_wabbit/pull/4167

fix: one_of for loss_option by @bassmang in https://github.com/VowpalWabbit/vowpal_wabbit/pull/4178

fix: Fix test 67 by @jackgerrits in https://github.com/VowpalWabbit/vowpal_wabbit/pull/4194

fix: small build fixes for LAS on MacOS by @jackgerrits in https://github.com/VowpalWabbit/vowpal_wabbit/pull/4202

fix: LAS unit test bug by @bassmang in https://github.com/VowpalWabbit/vowpal_wabbit/pull/4210

fix: quake_inv_sqrt func for aarch64 test failure by @bassmang in https://github.com/VowpalWabbit/vowpal_wabbit/pull/4217

fix: only remove ksvm dump_weights by @bassmang in https://github.com/VowpalWabbit/vowpal_wabbit/pull/4195

fix: Add native runtime dependencies to nuspec by @lokitoth in https://github.com/VowpalWabbit/vowpal_wabbit/pull/4216

Other Changes

chore: [LAS] code cleanup by @olgavrou in https://github.com/VowpalWabbit/vowpal_wabbit/pull/4153

ci: Upgrade Ubuntu version used in CI pipelines by @byronxu99 in https://github.com/VowpalWabbit/vowpal_wabbit/pull/4161

ci: check current VW wheel against most recent models by @bassmang in https://github.com/VowpalWabbit/vowpal_wabbit/pull/4169

ci: Enable test 67 with ASan by @byronxu99 in https://github.com/VowpalWabbit/vowpal_wabbit/pull/4176

refactor: Use github-action-benchmark for running benchmarks by @byronxu99 in https://github.com/VowpalWabbit/vowpal_wabbit/pull/4152

docs: Update readme for benchmarks by @byronxu99 in https://github.com/VowpalWabbit/vowpal_wabbit/pull/4181

ci: check model weights for gd-based tests for forward and backward compat by @bassmang in https://github.com/VowpalWabbit/vowpal_wabbit/pull/4172

style: resolve style issues in allreduce project by @jackgerrits in https://github.com/VowpalWabbit/vowpal_wabbit/pull/4187

refactor: cleanup includes by @jackgerrits in https://github.com/VowpalWabbit/vowpal_wabbit/pull/4188

refactor: split sparse and dense parameters by @jackgerrits in https://github.com/VowpalWabbit/vowpal_wabbit/pull/4190

refactor: move open_socket into details namespace by @jackgerrits in https://github.com/VowpalWabbit/vowpal_wabbit/pull/4189

refactor: use operators for inequality instead of custom compare functions by @jackgerrits in https://github.com/VowpalWabbit/vowpal_wabbit/pull/4192

refactor: remove unused type in allreduce by @jackgerrits in https://github.com/VowpalWabbit/vowpal_wabbit/pull/4191

ci: settle on consistent style and add warnings for violation by @jackgerrits in https://github.com/VowpalWabbit/vowpal_wabbit/pull/4183

style: fix style issues in config by @jackgerrits in https://github.com/VowpalWabbit/vowpal_wabbit/pull/4198

style: don't warn on short variable names, add static constant rule by @jackgerrits in https://github.com/VowpalWabbit/vowpal_wabbit/pull/4196

style: move action scores into VW namespace by @jackgerrits in https://github.com/VowpalWabbit/vowpal_wabbit/pull/4199

style: update allreduce to snake_case by @jackgerrits in https://github.com/VowpalWabbit/vowpal_wabbit/pull/4197

refactor: replace classes with structs for consistency by @bassmang in https://github.com/VowpalWabbit/vowpal_wabbit/pull/4205

refactor: move ccb items into VW namespace by @jackgerrits in https://github.com/VowpalWabbit/vowpal_wabbit/pull/4204

style: rename label_data to VW::simple_label by @jackgerrits in https://github.com/VowpalWabbit/vowpal_wabbit/pull/4200

ci: Fix randomly failing .NET benchmarks by @byronxu99 in https://github.com/VowpalWabbit/vowpal_wabbit/pull/4209

refactor: move several labels into VW namespace, style updates by @jackgerrits in https://github.com/VowpalWabbit/vowpal_wabbit/pull/4206

style: apply more style fixes per clang-tidy by @jackgerrits in https://github.com/VowpalWabbit/vowpal_wabbit/pull/4208

docs: only document public includes with doxygen by @jackgerrits in https://github.com/VowpalWabbit/vowpal_wabbit/pull/4212

refactor: update structs to classes with public by @bassmang in https://github.com/VowpalWabbit/vowpal_wabbit/pull/4215

style: style fixes in io project by @jackgerrits in https://github.com/VowpalWabbit/vowpal_wabbit/pull/4201

refactor: No RapidJSON in header files by @byronxu99 in https://github.com/VowpalWabbit/vowpal_wabbit/pull/4219

refactor: remove empty public: by @bassmang in https://github.com/VowpalWabbit/vowpal_wabbit/pull/4220

chore: Update Version to 9.5.0 by @bassmang in https://github.com/VowpalWabbit/vowpal_wabbit/pull/4221

Full Changelog: https://github.com/VowpalWabbit/vowpal_wabbit/compare/9.4.0...9.5.0

9.4.0(Oct 13, 2022)
We tagged 9.4.0 a few weeks ago but a few delays caused the rest of the release to only be completed now. The contents of these release notes and all associated artifacts correspond to the 9.4.0 tag on September 15.

DotNet Core

DotNet core support is here! It works if you manually import dependencies, but it will become automatic in the upcoming 9.4.1.

Native CSV Parser

As part of RLOS Fest this year @HollowMan6 implemented Native CSV parsing support. It is currently disabled by default but is available using a CMake option (VW_BUILD_CSV). This feature makes it possible to download CSV datasets and process them without any extra steps. @HollowMan6 also created a tutorial to explain how to use the feature on the well-known iris dataset. Thanks for all of your hard work!
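
Without native CSV support, a dataset typically has to be converted to VW's text format before training. The kind of extra step this feature removes might look like the sketch below. The helper csv_to_vw, the column names, and the two sample rows are hypothetical, and the sketch says nothing about how the native parser itself is invoked.

```python
import csv
import io


def csv_to_vw(text, label_column):
    """Convert CSV text to VW-style lines of the form 'label | name:value ...'."""
    lines = []
    for row in csv.DictReader(io.StringIO(text)):
        label = row.pop(label_column)  # the remaining columns are features
        features = " ".join(f"{name}:{value}" for name, value in row.items())
        lines.append(f"{label} | {features}")
    return lines


# A made-up two-row sample in the style of the iris dataset,
# with the class already encoded as an integer label.
sample = "sepal_length,sepal_width,species\n5.1,3.5,1\n7.0,3.2,2\n"
converted = csv_to_vw(sample, "species")
for line in converted:
    print(line)
```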

Changes

Features

feat: native CSV parsing by @HollowMan6 in https://github.com/VowpalWabbit/vowpal_wabbit/pull/4073

feat: [las] spanner rank one determinant update implementation by @olgavrou in https://github.com/VowpalWabbit/vowpal_wabbit/pull/4090

feat: confidence sequences estimator by @bassmang in https://github.com/VowpalWabbit/vowpal_wabbit/pull/4120

feat: .NET Core Support by @lokitoth in https://github.com/VowpalWabbit/vowpal_wabbit/pull/3969

feat: Semantic Versioning by @byronxu99 in https://github.com/VowpalWabbit/vowpal_wabbit/pull/4134

feat: ptr queue to generic queue by @olgavrou in https://github.com/VowpalWabbit/vowpal_wabbit/pull/4140

feat: simple thread pool by @olgavrou in https://github.com/VowpalWabbit/vowpal_wabbit/pull/4137

feat: [LAS] use thread pool in LAS one_pass implementation by @olgavrou in https://github.com/VowpalWabbit/vowpal_wabbit/pull/4141

feat: Performance benchmarks for .NET C# bindings by @byronxu99 in https://github.com/VowpalWabbit/vowpal_wabbit/pull/4133

Fixes

fix: [automl] multiple fixes by @lalo in https://github.com/VowpalWabbit/vowpal_wabbit/pull/4089

fix: [las] unit-test-fails to link by @lalo in https://github.com/VowpalWabbit/vowpal_wabbit/pull/4118

fix: dump_options typo by @bassmang in https://github.com/VowpalWabbit/vowpal_wabbit/pull/4119

fix: [automl] call process_example before predict by @lalo in https://github.com/VowpalWabbit/vowpal_wabbit/pull/4122

fix: [automl] generate challengers on first example by @lalo in https://github.com/VowpalWabbit/vowpal_wabbit/pull/4123

fix(csv): default label regardless of if label column present by @jackgerrits in https://github.com/VowpalWabbit/vowpal_wabbit/pull/4127

fix: Fix minor typos in large action space benchmarks by @zwd-ms in https://github.com/VowpalWabbit/vowpal_wabbit/pull/4128

fix: add one_of to csoaa_ldf by @bassmang in https://github.com/VowpalWabbit/vowpal_wabbit/pull/4130

fix: Only create launch_vs target if it does not already exist by @byronxu99 in https://github.com/VowpalWabbit/vowpal_wabbit/pull/4135

Other changes

refactor: [automl] split config oracle out by @lalo in https://github.com/VowpalWabbit/vowpal_wabbit/pull/4107

refactor: [automl] move files to details dir by @lalo in https://github.com/VowpalWabbit/vowpal_wabbit/pull/4108

refactor: [automl] small refactors by @lalo in https://github.com/VowpalWabbit/vowpal_wabbit/pull/4109

refactor: clear labels before reading from cache by @bassmang in https://github.com/VowpalWabbit/vowpal_wabbit/pull/4113

refactor: [automl] template for oracle by @lalo in https://github.com/VowpalWabbit/vowpal_wabbit/pull/4111

chore: [las] separate spanners and remove aatop SVD impl by @olgavrou in https://github.com/VowpalWabbit/vowpal_wabbit/pull/4116

test: [automl] add unit tests for oracle & refactor by @lalo in https://github.com/VowpalWabbit/vowpal_wabbit/pull/4115

refactor: [automl] remove estimator dependency from oracle by @lalo in https://github.com/VowpalWabbit/vowpal_wabbit/pull/4117

refactor: [automl] prepare aml_estimator for other estimator impl by @lalo in https://github.com/VowpalWabbit/vowpal_wabbit/pull/4114

test: add csv parser to throughput measurement tool by @jackgerrits in https://github.com/VowpalWabbit/vowpal_wabbit/pull/4110

ci: check all runtests models for forwards model compatibility by @olgavrou in https://github.com/VowpalWabbit/vowpal_wabbit/pull/4126

ci: clang-tidy only errors by @lalo in https://github.com/VowpalWabbit/vowpal_wabbit/pull/4129

ci: Build and test with AddressSanitizer by @byronxu99 in https://github.com/VowpalWabbit/vowpal_wabbit/pull/4103

refactor: [automl] iterator based config generators by @lalo in https://github.com/VowpalWabbit/vowpal_wabbit/pull/4121

perf: [LAS] speedup by removing branch from hot path by @olgavrou in https://github.com/VowpalWabbit/vowpal_wabbit/pull/4136

test: Resolve gtest loading issues on Windows by @jackgerrits in https://github.com/VowpalWabbit/vowpal_wabbit/pull/4131

docs: pin theme to 0.9.0 to resolve missing TOC by @jackgerrits in https://github.com/VowpalWabbit/vowpal_wabbit/pull/4139

ci: Fix pytype error by @byronxu99 in https://github.com/VowpalWabbit/vowpal_wabbit/pull/4142

build: fix windows.h GetObject issue by @jackgerrits in https://github.com/VowpalWabbit/vowpal_wabbit/pull/4143

refactor: [automl] allow_override for selected options by @bassmang in https://github.com/VowpalWabbit/vowpal_wabbit/pull/4144

test: [LAS ] add spanner tests and test file separation by @olgavrou in https://github.com/VowpalWabbit/vowpal_wabbit/pull/4146

test: [LAS] calculate num of non-degenerate singular values by @olgavrou in https://github.com/VowpalWabbit/vowpal_wabbit/pull/4147

docs: [LAS] add some comments in the code explaining one-pass SVD by @olgavrou in https://github.com/VowpalWabbit/vowpal_wabbit/pull/4148

refactor: one_of for wap_ldf and cbzo, clean rank options by @bassmang in https://github.com/VowpalWabbit/vowpal_wabbit/pull/4145

chore: Update Version to 9.4.0 by @lokitoth in https://github.com/VowpalWabbit/vowpal_wabbit/pull/4151

Full Changelog: https://github.com/VowpalWabbit/vowpal_wabbit/compare/9.3.0...9.4.0

9.3.0(Aug 10, 2022)

This release includes a new experimental feature to merge VW models and bug fixes.

Click here to read the full release notes.
9.2.0(Jul 12, 2022)

This release includes a large refactoring of the library structure, support for learn and predict directly from a string in Java, access to model weights in Python, and many more small features and bug fixes.

Click here to read the full release notes.
9.1.0(Apr 6, 2022)

This release includes a new way to output readable weights, removal of the Boost Program Options dependency, a new loss function and plenty of bug fixes.

Click here to read the full release notes.
9.0.1(Feb 1, 2022)
This patch release resolves a build issue when VW is built against fmtlib version 8 or newer, and fixes a few bugs.

fix: use different syntax for opening namespace for custom formatters

build: use correct platform suffix for Python native shared library

fix: dftovw address post PR feedback

ci: run twine check in ci

fix: Remove content from image directives in README.rst

build: fixing run_tests.py with custom paths

fix: add long_description_content_type

fix: fix compile issues when consuming fmt 8.1.1

9.0.0(Jan 28, 2022)

Vowpal Wabbit 9 is the first major release in over 6 years! There are a number of usability improvements, new reductions, bug fixes and internal improvements here. The Python package has undergone a bit of a modernization with a more understandable module structure, naming and types. Most changes should be non breaking for standard use cases. See here for the Python migration guide.

Click here to read the full release notes.
8.11.0(Jul 14, 2021)

This release includes Python API improvements, --cubic ::: and --interactions [:]* speedup, deprecations, logging line limiting, bug fixes, and more.

Click here to read the full release notes.
8.10.2(Jun 11, 2021)
This patch release contains a fix for model loading in --cb_explore.

Fixes

fix: Fix model corruption on reading model for --cb_explore (#3063)

8.10.1(Apr 13, 2021)
This release includes no code changes but fixes the Python source distribution, and adds support for Python 3.9 binary wheels on MacOS and Windows.

Since this only affects Python, only the PyPi release channel will be updated.

All changes:

ci: run CI on release branches (#2942)

ci: update brew to mitigate bintray brownout (#2941)

chore: update version to 8.10.1

build: add 3.9 to windows python build (#2939)

Fix python manifest for ext_libs (#2938)

build: Update build_python_wheels_macos.yml (#2937)

8.10.0(Apr 1, 2021)

This release includes quadratic interaction speed improvements, ARM support, logging updates and more.

Click here to read the full release notes.
8.9.2(Mar 4, 2021)
This patch release contains a fix for Neural Network reduction (--nn).

Fixes

[nn] fix double free (#2802)

8.9.1(Jan 27, 2021)
This patch release contains a fix for the Java bindings. It changes no other code paths apart from the version number changing.

Fixes

Don't export lib symbols from vw_jni binary when static linking (#2789)

8.9.0(Nov 12, 2020)

This release includes major features such as continuous actions, square CB, probabilistic label tree, slates, CB distributionally robust optimization, CB ADF RND, Python wheels and many bug fixes.

Click here to read the full release notes.
8.8.1(Mar 3, 2020)
This patch release fixes an issue in the Python bindings where parsing examples from text sometimes caused crashes.

Fixes:

Fix delete behavior for examples created using parse function (#2206)

8.8.0(Dec 7, 2019)
There has been significant work on streamlining and improving support for the Python bindings. Thanks @peterychang! (#1928)

The Conditional Contextual Bandit reduction was merged in. This reduction lets you express problems where there are multiple slots to fill. See here for the wiki page. (#1816) (#1995) (#2078) (#2141)

CMake install targets have been added to the build files (#2172) (#2135)

Now you can find and link VW easily in other projects:

find_package(VowpalWabbit REQUIRED)
add_executable(my_exe main.cpp)
target_link_libraries(my_exe PRIVATE VowpalWabbit::vw)

Slim VW got merged into master. This is an experimental lightweight inference runtime that supports a subset of VW features. (#2028)

Bug fixes! (see all changes below)

Internal improvements

We are overhauling and modernizing VW; some of the relevant changes on that front are below:

Migrate c arrays to std::array (#2094)

Make hashing constexpr in C++14 and unify rotl impl (#2093)

Make is_example_header const (#2095)

Allow constructor arguments for calloc_or_throw (#2070)

Learner now holds type erased reduction data (#2060)

Use numeric_limits (#2107)

Unify throwing of exceptions to use vw_exception instead of bare std::exception (#2171)

Cb explore adf atomization (#2069)

Refactor cb_adf reduction (#2057)

Move cb_sample to be class based (#2087)

Atomize topk reduction (#2050)

Atomize autolink reduction (#2047)

Other notable changes:

--version now includes commit id if available at build time (#1951)

macOS added as CI target (#1965)

Allow escaped command lines (#2157)

Update MSVC Toolchain to v14.1 (#1988)

Multiinstance mode for multiline examples (#1934)

All Changes


Fix warning (#2179)

Add/exclude new folders in the python MANIFEST (#2180)

fix some warnings (#2177)

Fixes for Learning2Search Subsystem (#2176)

Install rapidjson too (#2174)

Use standard save/load functionality for sklearn Python lib (#2142)

Default the label for CCB when reading cached labels (#2158)

Allow escaped command lines (#2157)

Fix header install locations (#2172)

Unify throwing of exceptions to use vw_exception instead of bare std::exception (#2171)

Disallow combining no_sample and cb_sample (#2148)

Fix memory leak in search.cc (#2167)

Fix unlabeled sgd examples (#2162)

Add deleter for parsed examples (#2153)

Fix misinterpreted negative option (#2149)

Catalina segfault mitigation (#2152)

Implement CCB type binding for Python (#2141)

Update test dependencies (#2147)

Fix typo: setup.py is not in vowpal_wabbit/python (#2143)

Throw instead of silently append nullptr when types don't match (#2139)

Fix variadic macro warning (#2138)

Create testing harness for cluster operation of VW and add test (#2134)

Improve VW support for CMake install process (#2135)

Properly support default build type, fix comment, define project version (#2133)

Update cluster readme to markdown, cleanup, format code (#2131)

Fix slim build and various CMake fixes (#2130)

Update CMakeLists.txt (#2127)

fixed docker image version (#2126)

Fix segfault when ring_size argument is not supplied (#2125)

Python: fix deprecated joblib (#2068)

forgot to set a parameter (#2114)

Add option to turn off sampling for CCB (#2096)

Remove redundant copy from CCB reduction (#2112)

Fix segfault in CCB - MTR must clean up predictions allocated for cost sensitive examples (#2111)

Softmaxpredfile (#2113)

Update vw_types.natvis (#2109)

Use numeric_limits (#2107)

Refactor cb_adf reduction (#2057)

Fix an LGTM warning in recommend (#2105)

Update CMakeSettings.json (#2104)

Remove all usages of "using namespace std" (#2071)

Fix lots of warnings and clang-tidy suggestions (#2085)

Make hashing constexpr in C++14 and unify rotl impl (#2093)

Migrate c arrays to std::array (#2094)

Cb explore adf initialize vars (#2102)

"-q" as default nc delay option (#2098)

Move cb_sample to be class based (#2087)

Cb explore adf atomization (#2069)

Fix OSX builds when not using Anaconda. (#2097)

Mac Os X CI tests fixes (#2035)

Make is_example_header const (#2095)

Enable use of newer standards (#2092)

mitigate clang-cl SIMD issue (#2091)

Java Binding Improvements (#2081)

fix: softmax can overflow (#2088)

Fix type issues and windows version in cmake file (#2084)

Atomize topk reduction (#2050)

Update badges in Python readme (#2086)

pdrop support for cb/ccb dsjson (#2078)

Add CMake option to force color codes (#2082)

Atomize autolink reduction (#2047)

Add forwarding header for commonly used objects in reductions headers (#2080)

Fix initializer (#2073)

Force OSX to build .so files for python (#2061)

Allow constructor arguments for calloc_or_throw (#2070)

Propagate cache reading failures (#2062)

Learner now holds type erased reduction data (#2060)

Remove unnecessary null checks (#2067)

Remove most usages of unsafe sprintf function (#2054)

Fix LGTM Java build issues (#2063)

Add unit tests to coverage report (#2053)

remove copy from closure, and do by reference (#2058)

Limit python install parallelisation to number of cpus (#2056)

Update noexcept specifier (#2048)

Fix memory leak in CCB prediction (#2065)

Constrain doxygen input dirs, remove graphs (#2055)

vw_slim into master (#2028)

DBG helper and new natvis (#2042)

Multiinstance mode for multiline examples (#1934)

Properly add CCB index feature with stride/offset (#2041)

Ataymano/memory leaks fixes (#2020)

Implement explicit included actions for CCB (#1995)

Add VW-JNI SNAPSHOT publishing to nightly build

Fix compile errors on centos (#2005)

Install all headers as fix for missing headers in installed library (#1994)

Add comment to tovw to clarify usage (#1999)

Add comment to learner.h (#1997)

Update MSVC Toolchain to v14.1 (#1988)

Remove deprecated projects and old scripts (#1992)

fixed command line argument retrieval from parsed model (#1993)

Fix implicit fallthrough warning and unused variable warning in GCC (#1984)

Replace nanpattern and infpattern with std:: equivalents (#1983)

Migrate Travis to migrated Docker image + cleanup old files (#1982)

Add Azure pipeline for Linux CI (#1981)

Add constexpr and noexcept to some functions, cleanup unused functions (#1985)

vw-hyperopt. Passing additional command when training and validating (#1959)

Ensure vw object cannot be moved or copied (#1986)

Fix building GCOV with Clang (#1980)

Update RunTests to be able to find binaries in the build directory (#1979)

Test and unify usage of ec_is_example_header (#1970)

Remove thread_local_storage from ccb (#1976)

Remove hard requirement for git during Windows build (#1977)

Action scores print tag (#1971)

Fix some warnings (#1973)

Remove two makefiles that were missed in Cmake change (#1969)

cb_adf: Added importance weight probability clipping for cb_type mtr, ips, dr (#1952)

Fix file permissions for macos CI scripts (#1966)

Add macos pipeline (#1965)

Add LICENSE to the python source package (#1963)

Add source info to VW Nuget description, and update copyright years. (#1962)

Update new_version script (#1956)

Update python README to reflect new build procedures (#1961)

Conditional Contextual Bandit (#1816)

Java Maven pom.xml.in update (#1954)

Add git commit to output of --version (#1951)

cover and regcb: data.counter++ only in learn examples (not predict) (#1950)

Spark/JNI multipass fixes and AllReduce quiet support (#1949)

cb_explore_adf: fixed bug when resetting cb_type + improvements (#1948)

vw-hyperopt (add support for passing namespaces) (#1941)

Tau first should count only learn examples (not predict) (#1944)

Python distributions (#1928)

Enable suppressing NuGet version tag for official builds (#1946)

Fix predict path for cb_explore_adf First (#1939)

CS simulator v3.0 (#1932)

cs/cli/vw_label.h: Avoid to throw for precision issues (#1933)

Remove hard-coded version in Windows CI package gen script. (#1936)

Fix clear labels the correct way (#1930)

8.7.0(Jun 7, 2019)
The repo has moved to the VowpalWabbit organization

The group of core maintainers has been growing with steady improvements

Changes

As always, lots of bug fixes.

Build System

The build system for VW has been overhauled to use CMake. This means easier dependency resolution, faster build times, and easier consumption as a dependency. CMake replaces the old automake + make systems, and eventually the .sln file will be replaced too. (#1624)

Reductions/Learning Algorithms

Coin betting (#1903)

Contextual Memory Tree (#1799)

Warm start for cbify (#1534)

Softmax learner for cbadf (#1839)

cbify: --cbify_ldf for multiline (csoaa_ldf) input datasets (#1681)

Other Improvements

The parser has moved to using std types for concurrency and has a clearer data production model. You’ll slowly notice more RAII types in VW. (#1731) (#1777)

A clang format file was added and the codebase reformatted. This is to keep things more consistent and easy to read. (#1701)

The Python bindings can now use JSON as the input format, as long as the VW instance you’re using is configured to parse input as JSON. (#1809)

A new --strict_parse option was added to throw instead of warn for malformed examples (#1906)

Bare Java JNI Bindings optimized for Apache Spark (#1798)

Add multiclass support for hyperopt utl (#1682)

All Changes


Warm start for cbify (#1534)

utl/vw-varinfo: work-around for issue/1547 (#1548)

Gramhagen 1538 python make test (#1550)

Fixed bug in CLI parsing of --csoaa_ldf multiline (#1551)

ksvm lambda fix (#1556)

Fixed bug that reset dump_interval to 1 when input model -i is provided (#1558)

Extract stable_unique to own function for clarity (#1559)

Improvements and small fixes to utl (vw-lda, csv2vw) (#1580)

When parsing dsjson, skip lines not starting with "{" (#1593)

Clean windows build and unify output paths (#1599)

Remove unused cs_testcommon project and directory (#1606)

build instructions: program_options insufficient (#1607)

Fixed _labelIndex out of bound error message (#1609)

Update Readme with details on installing boost dependencies (#1613)

Add caching to appveyor build (#1616)

Use a prebuilt docker image for travis build (#1620)

CMake build definitions (#1624)

VS 2017 fixes (#1628)

Further improve build cache by saving packages (#1634)

Update Travis and Appveyor badges to reflect organization change (#1642)

Fix cb explore adf segfault (#1643)

Make VW setup projects buildable from the command-line (#1646)

Use nuget restore instead of nuget install in Appveyor build (#1659)

Remove unused boost packages from VW (#1664)

Replace $(SolutionDir) with $(ProjectDir) (#1666)

Use cerr for parser warnings (#1670)

Export vw audit output to .tsv file (#1677)

Enable selective CMake configuration, improve messaging (#1678)

Fix crash when empty multi_ex is supplied for --cb_explore_adf (#1679)

cbify: --cbify_ldf for multiline (csoaa_ldf) input datasets (#1681)

add multiclass support for hyperopt utl (#1682)

bugfix: cb_adf is not including some examples in stats calculation (#1686)

Move trace_message from arguments to vw object (#1688)

Add bug report issue template (#1693)

Fix some VW initialization memory leaks (#1697)

Fix memory leak in cb_explore_adf (#1698)

vw java 11 compatibility (#1700)

Add clang-format (#1701)

Fix best constant and best constant's loss calculation when using ksvm (#1704)

Update assembly versions to match current version (#1705)

options_i command line parsing refactor (#1706)

Disable stdin processing for all but single instance CLI. Fixes 1300 (#1708)

Use std::exp instead of exp free function (#1709)

Fix unused params warnings plus incomplete struct init (done to default values) (#1710)

Add check for hash_inv when creating json parser (#1711)

Compile fixes (#1713)

Change from strcmp to std::string operator== (#1715)

Update Dockerfile for TravisCI, define pipeline, upgrade Java (#1716)

Change to using a bool_switch for bool options (#1717)

Ataymano/mac options types fix (#1718)

Add clang-format 7.0.1 to CI image (#1719)

Create Windows build scripts and update instructions (#1721)

Make cbify reduction respect is_learn parameter (#1722)

Add natvis file for v_array and substring (#1723)

Remove -DSTATIC_LINK_VW from Ubuntu build instructions (#1727)

Fix warnings in Windows MSVC x64 build (#1730)

Move to std types for concurrency (#1731)

Scripts for running tests and generating NuGet packages (#1732)

Fix a small issue and cleanup usage of long long (#1733)

Fix file encoding for .rc files (#1734)

Fix build scripts forcing Debug builds. Add LTO mode and fix VW default visibility. (#1735)

Update README.md (#1737)

Update image used in travis (#1738)

Fix model file parser for proper treatment of negative-valued options (#1742)

shift clang-format to advise (#1744)

[tests] Make repeat.py compatible with python 3. (#1747)

Do not define BOOST_TEST_DYN_LINK when statically linking (#1750)

Fix test 175 (#1751)

Convert TopK reduction to be multiline example based (#1752)

RunTests: use test label number instead of counter (#1753)

Fix static linking (#1758)

Small Json parser cleanup (#1759)

Type erase json parser context for easier deletion (#1760)

Migrate from v_array to std::vector in some places + other changes (#1765)

On OSX, we need to catch boost::program_options::ambiguous_option directly. Probably due to different boost versions. (#1776)

Parser now uses object and prod/cons queue and object pool for example parsing (#1777)

Fix resetting of cb_type (#1779)

Fix weighted average on cluster mode (#1781)

Allow spanning_tree to be a static executable (#1787)

[audit] Make sure --audit output is reproducible across systems. (#1788)

Add example C# Simulator (#1790)

Fix usage of char instead of unsigned char for namespace index (#1791)

Fix to bug 1729: no bias regularization for BFGS not working (#1794)

Bare Java JNI Bindings optimized for Apache Spark (#1798)

Contextual Memory Tree (#1799)

expand Boost_LIBRARIES for pkg-config libs (#1801)

Use std::stable_sort when computing interactions. (#1804)

[OSX] Improve support for static linking. (#1807)

Enable JSON example parsing in Python bindings (#1809)

Fix example constructor (#1814)

[python] Fix packaging so it's possible to produce a wheels package on linux. (#1815)

Update Python readme to reflect latest way to consume bindings from source (#1817)

Fix L2S HookTask in c-library and Python bindings (#1818)

Clear cb labels when returning to pool (#1822)

Fix build scripts for python3 (#1823)

Fix "average loss" bug in cb_explore.cc (#1825)

Add gitter chat badge to readme (#1826)

Fix README.Windows.md link (#1827)

Consume RapidJson through submodule (#1828)

fix bug 1784: return probabilities for actions in the correct order in python binding (#1830)

Fix else statement (#1832)

Fix NaN issue in Box-Muller transform (#1833)

Fix v142 toolchain (#1834)

Implement finish_example, update scripts to match (#1837)

cb_adf and cb_explore_adf: Setting --cb_type mtr as default (#1838)

First attempt at softmax learner for cbadf (#1839)

Fix conda_install.sh script (#1843)

Fix counting issue with --holdout_after option (#1844)

Run coveralls inside docker (#1847)

Cmake build on Windows (#1849)

Fix coveralls badge URL in README (#1850)

Azure Pipeline for Windows CI (#1853)

cleaning up broken python3 vcxproj files (#1854)

BF: add cmake to installed dependencies in unstable singularity images (#1857)

Ataymano/c wrapper fix2 (#1859)

Fix Azure Pipeline build-break (#1867)

Enable Version and Tag override for NuGet pack (#1870)

Shared example merge reduction (#1873)

Update gd_mf_weights.cc right namespace (#1874)

Fixed warning message ips -> mtr (#1875)

Fix dsjson parser regression and add smoke-test (#1878)

do not output progressive validation loss for oaa with subsampling (#1880)

Add python documentation generation (#1884)

Update python documentation theme and index (#1885)

Change references from SolutionDir to ProjectDir (#1886)

Clean up some options warnings and remove some extra copies (#1888)

fix to make invert_hash work with bfgs (#1892)

Treat resource files as binary (#1896)

memory leaks and warnings (#1898)

Change version-generation for C# bindings to use version.txt (#1899)

Change to absolute https path for submodule (#1900)

fix for no label confidence (#1901)

Coin betting (#1903)

Use Appveyor MSBuildLogger (#1904)

Optional exception (#1906)

fix closing invalid file descriptor with memory_io_buf (#1910)

remove warnings (#1911)

Fix to save the state of FTRL models (#1912)

fix static library build (#1913)

more warnings (#1915)

fix for daemon race condition (#1918)

Bremen79 fix save ftrl (#1919)

change semantics of lambda (#1920)

8.6.1(Jul 21, 2018)

The internal version number now matches the tag number
8.6.0(Jul 5, 2018)

This version has many bug fixes of course. In addition, (1) There are many improvements to the contextual bandit code thanks to Alberto Bietti (https://arxiv.org/abs/1802.04064). (2) There are significant improvements to save_resume due to @denik.

Internally, (a) The argument parsing code has been rewritten to be more consistent throughout the code base via parser_helper.h. (b) There are significant improvements in the way that multiline examples are handled thanks to @rajan-chari.

Note that unlike previous release versions, I've left the "Makefile" build system in place rather than using Automake. Automake is still available if desired, but you'll need to run it manually if you want to use it.
8.5.0(Dec 3, 2017)

8.5.0 has further improvements, including fully working sparse model support, empirically optimized exploration algorithms, a new cost-sensitive active learning algorithm (https://arxiv.org/abs/1703.01014), and baseline prediction support.
8.4.0(Jul 22, 2017)

There aren’t many new features since 8.3.0 (better JSON language support, better offline eval, sparse parameter support), so this version simply addresses a number of issues that have shown up over time.
8.2.0(Jun 21, 2016)

Many things added including more sophisticated contextual bandits, the recall tree, and OjaNewton.
7.10(Feb 14, 2015)

No significant changes from 7.9; primarily bug fixes.
7.9(Jan 11, 2015)

Primarily updates to learning reductions modularity.
7.8(Dec 7, 2014)

A post with some notes