scikit-learn inspired API for CRFsuite

Last update: Dec 20, 2022

Overview

sklearn-crfsuite

sklearn-crfsuite is a thin CRFsuite (python-crfsuite) wrapper which provides an interface similar to scikit-learn. sklearn_crfsuite.CRF is a scikit-learn compatible estimator: you can use, for example, scikit-learn model selection utilities (cross-validation, hyperparameter optimization) with it, or save/load CRF models using joblib.

License is MIT.

Documentation can be found here.

Comments

How to create features with duplicate keys?
I see in the CRFsuite documentation (http://www.chokkan.org/software/crfsuite/manual.html) that a feature key can be duplicated:

B-NP w[1..4]=a:2 w[1..4]=man w[1..4]=eats
B-NP w[1..4]=a w[1..4]=a w[1..4]=man w[1..4]=eats
B-NP w[1..4]=a:2.0 w[1..4]=man:1.0 w[1..4]=eats:1.0

How can I create features with duplicate keys when using sklearn-crfsuite?
opened by binhnq94 8
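
A possible approach, sketched under assumptions: sklearn-crfsuite takes each item as a Python dict, so a key cannot literally appear twice, but python-crfsuite accepts float values as feature weights. Assuming, as the manual example quoted above suggests, that a repeated attribute is equivalent to a single attribute with the weights summed, duplicates can be merged before training. The helper name `merge_duplicate_keys` and the toy item are hypothetical.

```python
from collections import Counter


def merge_duplicate_keys(attributes):
    """Collapse repeated (key, weight) pairs into one dict entry per key.

    Assumption: a key that occurs several times is equivalent to the same
    key occurring once with the weights added together.
    """
    merged = Counter()
    for key, weight in attributes:
        merged[key] += weight
    return {key: float(weight) for key, weight in merged.items()}


# Hypothetical item in which the attribute "w[1..4]=a" occurs twice.
item = merge_duplicate_keys([
    ("w[1..4]=a", 1.0),
    ("w[1..4]=a", 1.0),
    ("w[1..4]=man", 1.0),
    ("w[1..4]=eats", 1.0),
])
print(item)
```

The resulting dict can be used as one element of a feature sequence passed to CRF.fit.
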
Different result despite same input
I tried creating several CRF instances and training them on the same training set with the same max_iterations parameter.

crf = sklearn_crfsuite.CRF(
    algorithm='ap',
    max_iterations=5,
)
crf.fit(X_train, Y_train)

t = sklearn_crfsuite.CRF(
    algorithm='ap',
    max_iterations=5,
)
t.fit(X_train, Y_train)

However, their results differ (I tested them on the same development set using F-measure). I hope to hear from you soon. Thank you.
opened by iamhuy 6
UnicodeEncodeError:

Hello! Thank you for your work.

I am experimenting with Russian texts, but I get this error: UnicodeEncodeError: 'ascii' codec can't encode characters in position 2-3: ordinal not in range(128). I think my data set contains some strange symbols, but how can I find them?

Python 3.6, latest version of sklearn-crfsuite.

opened by Ulitochka 5
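
One way to locate the characters in question is to scan every token for code points outside the ASCII range. This is only a sketch: the function name `find_non_ascii` and the sample sentences are hypothetical, and for Russian text every Cyrillic letter will be reported, which may indicate that the real cause is an ASCII-only encoding somewhere in the environment rather than a few stray symbols.

```python
def find_non_ascii(sequences):
    """Yield (sequence_index, token_index, token, offending_chars) tuples."""
    for seq_idx, seq in enumerate(sequences):
        for tok_idx, token in enumerate(seq):
            bad = [ch for ch in token if ord(ch) > 127]
            if bad:
                yield seq_idx, tok_idx, token, bad


# Hypothetical data: two tokenized sentences, written with escapes so the
# source file itself stays ASCII-only.
sentences = [["hello", "world"], ["caf\u00e9", "ok", "\u043c\u0438\u0440"]]
hits = list(find_non_ascii(sentences))
for seq_idx, tok_idx, token, bad in hits:
    # ascii() keeps the report printable even on an ASCII-only terminal.
    print(seq_idx, tok_idx, ascii(token), [hex(ord(ch)) for ch in bad])
```
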
Mini-batch training

Does the CRF implementation support mini-batch training? Some sklearn predictors have a partial_fit method which supports incremental training. Would there be scope to extend the current implementation to include this?

opened by uwaisiqbal 5

Sequence labelling issue: The numbers of items and labels differ...

Hi, I'm trying to use sklearn-crfsuite for sequence labelling.

When running crf.fit(train_data, train_targets) on my data, I get the stack trace below:

Traceback (most recent call last):
  File ".../argument_segmenter.py", line 49, in train
    crf.fit(train_data, train_targets)
  File "/usr/local/lib/python3.9/site-packages/sklearn_crfsuite/estimator.py", line 314, in fit
    trainer.append(xseq, yseq)
  File "pycrfsuite/_pycrfsuite.pyx", line 312, in pycrfsuite._pycrfsuite.BaseTrainer.append
ValueError: The numbers of items and labels differ: |x| = 40, |y| = 38

I noticed in https://github.com/TeamHG-Memex/sklearn-crfsuite/issues/20 that someone suggests using a custom scorer, but I don't seem to get past the fitting stage.

Any advice would be appreciated.

My code looks like this:

train_data, test_data, train_targets, test_targets = load_data()

train_data = [sent2features(s) for s in train_data]
train_targets = [sent2labels(s) for s in train_targets]

test_data = [sent2features(s) for s in test_data]
test_targets = [sent2labels(s) for s in test_targets]

crf = sklearn_crfsuite.CRF(
    algorithm='lbfgs',
    c1=0.1,
    c2=0.1,
    max_iterations=100,
    all_possible_transitions=True
)

try:
    crf.fit(train_data, train_targets)
except Exception as e:
    logging.error(e)

opened by chriswales95 3
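
The error message says that one sequence has 40 items but only 38 labels. Since the posted code builds features from train_data and labels from train_targets separately, the two may disagree in length for some sentences. A minimal sketch for locating such pairs before calling fit follows; the helper name `find_length_mismatches` and the toy data are hypothetical.

```python
def find_length_mismatches(X, y):
    """Return (index, len(xseq), len(yseq)) for every pair whose lengths differ."""
    return [
        (i, len(xseq), len(yseq))
        for i, (xseq, yseq) in enumerate(zip(X, y))
        if len(xseq) != len(yseq)
    ]


# Toy data: the second sequence has three items but only two labels.
X_demo = [[{"w": "a"}, {"w": "b"}], [{"w": "c"}, {"w": "d"}, {"w": "e"}]]
y_demo = [["O", "O"], ["O", "B"]]
mismatches = find_length_mismatches(X_demo, y_demo)
print(mismatches)  # [(1, 3, 2)]
```

Running the same check on train_data and train_targets should point to the sentence whose tokenization and labels are out of step.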

Possible memory leak problem?

Hi @kmike

My colleagues used a Java version of CRFsuite and found a memory leak in it. We therefore checked the original CRFsuite site and found a number of issues related to this problem: Results in chokkan/crfsuite. The latest fix accepted by the author was in 2016, and there are some more recent commits by other contributors.

According to the python-crfsuite documentation, the latest fix for this issue dates back to 2015. Can you tell us whether the latest python-crfsuite or sklearn-crfsuite has fixed those problems? Many thanks!

opened by acepor 3
Effective Feature Induction to Increase F1

Hello,

I want to use some conjunctions of features to increase my F1 score. Is there any functionality to induce features effectively?

Or

Does sklearn-crfsuite support the algorithm described in this paper? https://people.cs.umass.edu/~mccallum/papers/ifcrf-uai2003.pdf

Thanks

opened by emirceyani 3

Is there an easy way to obtain a confusion matrix?

I'm trying

    confusion_matrix(y_test, y_pred)

with sklearn's method, but am getting the error message

ValueError: You appear to be using a legacy multi-label data representation. Sequence of sequences are no longer supported; use a binary array or sparse matrix instead - the MultiLabelBinarizer transformer can convert to this format.

opened by goerch 2
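
The error arises because y_test and y_pred are lists of label sequences rather than flat lists of labels. A minimal sketch of a token-level confusion matrix that flattens the sequences first is shown below; the function name `flat_confusion_matrix` and the toy labels are hypothetical, and the counting is done with the standard library so the logic is visible.

```python
from collections import Counter
from itertools import chain


def flat_confusion_matrix(y_true, y_pred, labels=None):
    """Token-level confusion matrix for lists of label sequences.

    Rows correspond to true labels, columns to predicted labels.
    """
    true_flat = list(chain.from_iterable(y_true))
    pred_flat = list(chain.from_iterable(y_pred))
    if len(true_flat) != len(pred_flat):
        raise ValueError("y_true and y_pred contain different numbers of tokens")
    if labels is None:
        labels = sorted(set(true_flat) | set(pred_flat))
    counts = Counter(zip(true_flat, pred_flat))
    matrix = [[counts[(t, p)] for p in labels] for t in labels]
    return labels, matrix


# Toy data standing in for y_test and crf.predict(X_test).
y_test_demo = [["B-NP", "I-NP", "O"], ["O", "B-NP"]]
y_pred_demo = [["B-NP", "O", "O"], ["O", "B-NP"]]
labels, matrix = flat_confusion_matrix(y_test_demo, y_pred_demo)
for label, row in zip(labels, matrix):
    print(label, row)
```

The same flattened lists can also be passed to sklearn.metrics.confusion_matrix, which expects one label per sample.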

sklearn.model_selection.cross_validate() can't run.

Nice to meet you. I am a student studying with your package. I have run into a problem that I cannot solve by myself.

I tried your tutorial with the features from this site (https://qiita.com/Hironsan/items/326b66711eb4196aa9d4) and added cross-validation as follows.

from sklearn.model_selection import cross_validate
scores = cross_validate(crf, X, y, scoring="f1_macro", cv=5)
# cross_validate returns a dict, so the scores are accessed by key
print(scores["test_score"])

However, the following error occurs.

Traceback (most recent call last):
  File "/program/crf.py", line 41, in <module>
    scores = cross_validate(crf, X, y, scoring="f1_macro", cv=5)
  File "/.pyenv/versions/anaconda3-4.3.1/lib/python3.6/site-packages/sklearn/model_selection/_validation.py", line 195, in cross_validate
    for train, test in cv.split(X, y, groups))
  File "/.pyenv/versions/anaconda3-4.3.1/lib/python3.6/site-packages/sklearn/externals/joblib/parallel.py", line 779, in __call__
    while self.dispatch_one_batch(iterator):
  File "/.pyenv/versions/anaconda3-4.3.1/lib/python3.6/site-packages/sklearn/externals/joblib/parallel.py", line 625, in dispatch_one_batch
    self._dispatch(tasks)
  File "/.pyenv/versions/anaconda3-4.3.1/lib/python3.6/site-packages/sklearn/externals/joblib/parallel.py", line 588, in _dispatch
    job = self._backend.apply_async(batch, callback=cb)
  File "/.pyenv/versions/anaconda3-4.3.1/lib/python3.6/site-packages/sklearn/externals/joblib/_parallel_backends.py", line 111, in apply_async
    result = ImmediateResult(func)
  File "/.pyenv/versions/anaconda3-4.3.1/lib/python3.6/site-packages/sklearn/externals/joblib/_parallel_backends.py", line 332, in __init__
    self.results = batch()
  File "/.pyenv/versions/anaconda3-4.3.1/lib/python3.6/site-packages/sklearn/externals/joblib/parallel.py", line 131, in __call__
    return [func(*args, **kwargs) for func, args, kwargs in self.items]
  File "/.pyenv/versions/anaconda3-4.3.1/lib/python3.6/site-packages/sklearn/externals/joblib/parallel.py", line 131, in <listcomp>
    return [func(*args, **kwargs) for func, args, kwargs in self.items]
  File "/.pyenv/versions/anaconda3-4.3.1/lib/python3.6/site-packages/sklearn/model_selection/_validation.py", line 467, in _fit_and_score
    test_scores = _score(estimator, X_test, y_test, scorer, is_multimetric)
  File "/.pyenv/versions/anaconda3-4.3.1/lib/python3.6/site-packages/sklearn/model_selection/_validation.py", line 502, in _score
    return _multimetric_score(estimator, X_test, y_test, scorer)
  File "/.pyenv/versions/anaconda3-4.3.1/lib/python3.6/site-packages/sklearn/model_selection/_validation.py", line 532, in _multimetric_score
    score = scorer(estimator, X_test, y_test)
  File "/.pyenv/versions/anaconda3-4.3.1/lib/python3.6/site-packages/sklearn/metrics/scorer.py", line 108, in __call__
    **self._kwargs)
  File "/.pyenv/versions/anaconda3-4.3.1/lib/python3.6/site-packages/sklearn/metrics/classification.py", line 714, in f1_score
    sample_weight=sample_weight)
  File "/.pyenv/versions/anaconda3-4.3.1/lib/python3.6/site-packages/sklearn/metrics/classification.py", line 828, in fbeta_score
    sample_weight=sample_weight)
  File "/.pyenv/versions/anaconda3-4.3.1/lib/python3.6/site-packages/sklearn/metrics/classification.py", line 1025, in precision_recall_fscore_support
    y_type, y_true, y_pred = _check_targets(y_true, y_pred)
  File "/.pyenv/versions/anaconda3-4.3.1/lib/python3.6/site-packages/sklearn/metrics/classification.py", line 72, in _check_targets
    type_true = type_of_target(y_true)
  File "/.pyenv/versions/anaconda3-4.3.1/lib/python3.6/site-packages/sklearn/utils/multiclass.py", line 259, in type_of_target
    raise ValueError('You appear to be using a legacy multi-label data'
ValueError: You appear to be using a legacy multi-label data representation. Sequence of sequences are no longer supported; use a binary array or sparse matrix instead.

So I added the following before cross-validation.

from sklearn.preprocessing import MultiLabelBinarizer

trans_X = []
mlb = MultiLabelBinarizer()
for x in X:
    x = mlb.fit_transform(x)
    trans_X.append(x.astype(bytes))
X = trans_X
y = MultiLabelBinarizer().fit_transform(y)
y = y.astype(bytes)

However, the following error occurs.

Traceback (most recent call last):
  File "/program/crf.py", line 41, in <module>
    scores = cross_validate(crf, X, y, scoring="f1_macro", cv=5)
  File "/.pyenv/versions/anaconda3-4.3.1/lib/python3.6/site-packages/sklearn/model_selection/_validation.py", line 195, in cross_validate
    for train, test in cv.split(X, y, groups))
  File "/.pyenv/versions/anaconda3-4.3.1/lib/python3.6/site-packages/sklearn/externals/joblib/parallel.py", line 779, in __call__
    while self.dispatch_one_batch(iterator):
  File "/.pyenv/versions/anaconda3-4.3.1/lib/python3.6/site-packages/sklearn/externals/joblib/parallel.py", line 625, in dispatch_one_batch
    self._dispatch(tasks)
  File "/.pyenv/versions/anaconda3-4.3.1/lib/python3.6/site-packages/sklearn/externals/joblib/parallel.py", line 588, in _dispatch
    job = self._backend.apply_async(batch, callback=cb)
  File "/.pyenv/versions/anaconda3-4.3.1/lib/python3.6/site-packages/sklearn/externals/joblib/_parallel_backends.py", line 111, in apply_async
    result = ImmediateResult(func)
  File "/.pyenv/versions/anaconda3-4.3.1/lib/python3.6/site-packages/sklearn/externals/joblib/_parallel_backends.py", line 332, in __init__
    self.results = batch()
  File "/.pyenv/versions/anaconda3-4.3.1/lib/python3.6/site-packages/sklearn/externals/joblib/parallel.py", line 131, in __call__
    return [func(*args, **kwargs) for func, args, kwargs in self.items]
  File "/.pyenv/versions/anaconda3-4.3.1/lib/python3.6/site-packages/sklearn/externals/joblib/parallel.py", line 131, in <listcomp>
    return [func(*args, **kwargs) for func, args, kwargs in self.items]
  File "/.pyenv/versions/anaconda3-4.3.1/lib/python3.6/site-packages/sklearn/model_selection/_validation.py", line 437, in _fit_and_score
    estimator.fit(X_train, y_train, **fit_params)
  File "/.pyenv/versions/anaconda3-4.3.1/lib/python3.6/site-packages/sklearn_crfsuite/estimator.py", line 314, in fit
    trainer.append(xseq, yseq)
  File "pycrfsuite/_pycrfsuite.pyx", line 312, in pycrfsuite._pycrfsuite.BaseTrainer.append
ValueError: The numbers of items and labels differ: |x| = 62, |y| = 3

Please tell me how to solve this problem. Sorry to ask when you are busy; I appreciate your help.

opened by ss1357 2
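
The first traceback fails inside the "f1_macro" scorer, which expects one label per sample, while a CRF predicts one label sequence per sample. One possible workaround, sketched here under assumptions, is to leave X and y as lists of sequences and pass a custom scorer that flattens the sequences before computing macro F1. The names `flat_macro_f1`, `flat_macro_f1_scorer` and `_EchoModel` are hypothetical; scikit-learn accepts any callable with the signature (estimator, X, y) as a scoring argument.

```python
from itertools import chain


def flat_macro_f1(y_true, y_pred):
    """Macro-averaged F1 over tokens, after flattening the label sequences."""
    true_flat = list(chain.from_iterable(y_true))
    pred_flat = list(chain.from_iterable(y_pred))
    pairs = list(zip(true_flat, pred_flat))
    scores = []
    for label in sorted(set(true_flat) | set(pred_flat)):
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores) if scores else 0.0


def flat_macro_f1_scorer(estimator, X, y):
    """Scorer with the (estimator, X, y) signature that scikit-learn accepts."""
    return flat_macro_f1(y, estimator.predict(X))


class _EchoModel:
    """Hypothetical stand-in for a fitted CRF: it predicts its input."""

    def predict(self, X):
        return X


demo_score = flat_macro_f1_scorer(_EchoModel(), [["A", "A"], ["A"]], [["A", "B"], ["A"]])
print(demo_score)
```

With the original list-of-sequences X and y (without MultiLabelBinarizer), the call would then be cross_validate(crf, X, y, scoring=flat_macro_f1_scorer, cv=5). This has not been checked against the library versions shown in the traceback.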

How to save the trained CRF model?

Thank you so much for the work.

I'm wondering whether the trained model can be saved. In the API, CRF has a model_filename parameter for importing a trained model, and the documentation states:

By default, model files are created automatically and saved in temporary locations; the preferred way to save/load CRF models is to use pickle (or its alternatives like joblib)

How can we export the model to an explicit location?

Many thanks!

opened by acepor 2
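
Following the quoted recommendation to use pickle, a minimal sketch for writing a model to an explicit path and reading it back could look like this. The helper names `save_model` and `load_model` and the file name are hypothetical, and a plain dict stands in for a fitted sklearn_crfsuite.CRF so that the example runs without training data.

```python
import os
import pickle
import tempfile


def save_model(model, path):
    """Serialize a model object to an explicit file path."""
    with open(path, "wb") as f:
        pickle.dump(model, f)


def load_model(path):
    """Load a model object previously written by save_model."""
    with open(path, "rb") as f:
        return pickle.load(f)


# A plain dict stands in for a fitted CRF estimator in this sketch.
stand_in = {"algorithm": "lbfgs", "c1": 0.1}
path = os.path.join(tempfile.mkdtemp(), "crf_model.pkl")
save_model(stand_in, path)
restored = load_model(path)
print(restored)
```

joblib.dump(crf, path) and joblib.load(path) are the joblib equivalents mentioned in the quoted documentation. Pickle files should only be loaded from trusted sources.
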
Can features be discarded by the classifier?

Hello! I am using crfsuite to train models on my own datasets, and I am testing different sets of features (I have a lot of them). However, some of those features seem to have no effect on classification results: e.g. first I use set of features A and get an F1 = X, and then I use set A + B and get the same results, and this repeats on every train and test set I have (if it is any help, my data is various acoustic features of speech in two languages). My question is: is this normal, or is there a possibility that some of these features are somehow discarded by the model? Thank you in advance!

opened by PKholyavin 1
Maintenance is not current

Is there any possibility of transferring maintenance to someone else? There are many PRs that would fix issues and bring this package up to date with the current sklearn interface.

opened by vicissitudele 1
How to add bigram features?
Thanks for this excellent package.

Kindly help with the below questions.

How to use xt, xt-1 or even xt, xt-1, xt-2...xt-n as a feature in sklearn-crfsuite?

How to use a float feature instead of buckets of this continuous variable in sklearn-crfsuite? Any example for this implementation?

Does sklearn-crfsuite only have implementation of linear-chain crf or does it have general crf as well?
opened by deepak-george 0
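
For the first two questions, one common pattern is to put context into the feature dict of each token: the previous token as its own feature, a conjunction of the previous and current token as a bigram-style feature, and a continuous value as a float. The sketch below assumes that python-crfsuite treats string values as categorical features and float values as feature weights; the function names, feature keys and the `scores` input are hypothetical.

```python
def token2features(tokens, i, scores=None):
    """Build a feature dict for tokens[i], including previous-token context."""
    word = tokens[i].lower()
    features = {
        "bias": 1.0,
        "word": word,
    }
    if i > 0:
        prev = tokens[i - 1].lower()
        features["-1:word"] = prev                 # x_{t-1} on its own
        features["-1:word|word"] = prev + "|" + word  # conjunction of x_{t-1} and x_t
    else:
        features["BOS"] = True                     # beginning of sequence
    if scores is not None:
        features["score"] = float(scores[i])       # continuous value, not bucketed
    return features


def sent2features(tokens, scores=None):
    """Feature dicts for every token of one sentence."""
    return [token2features(tokens, i, scores) for i in range(len(tokens))]


feats = sent2features(["A", "man", "eats"], scores=[0.2, 1.5, 0.7])
for f in feats:
    print(f)
```

Longer contexts (x_{t-2} and so on) can be added in the same way by looking further back in the token list.
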
flat_classification_report seems to be broken

Hi,

It appears that flat_classification_report is now broken. Passing these arguments to scikit-learn's classification_report positionally was deprecated a while back, and it seems this is now being enforced.

Specifically, the issue is that labels is no longer accepted as a positional argument and must instead be passed as a keyword argument.

It seems to be a simple fix so I can submit a pull request later.

opened by chriswales95 3
