scikit-learn inspired API for CRFsuite

Overview

sklearn-crfsuite


sklearn-crfsuite is a thin CRFsuite (python-crfsuite) wrapper that provides an interface similar to scikit-learn. sklearn_crfsuite.CRF is a scikit-learn compatible estimator: you can use scikit-learn model selection utilities (cross-validation, hyperparameter optimization) with it, or save and load CRF models using joblib.

License is MIT.

Documentation can be found here.


Comments
  • How to create features with duplicate keys?

    I see in the CRFsuite manual (http://www.chokkan.org/software/crfsuite/manual.html) that feature keys can be duplicated:

    B-NP    w[1..4]=a:2 w[1..4]=man w[1..4]=eats
    B-NP    w[1..4]=a w[1..4]=a w[1..4]=man w[1..4]=eats
    B-NP    w[1..4]=a:2.0 w[1..4]=man:1.0 w[1..4]=eats:1.0
    

    How can I create features with duplicate keys when using sklearn-crfsuite?
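One possible approach, sketched under assumptions rather than taken from the project's documentation: sklearn-crfsuite takes each token's features as a Python dict, so a key cannot literally appear twice. python-crfsuite accepts float values as feature weights, so repeated attributes can be collapsed into a single key whose value is the count, which the quoted manual lines suggest is equivalent to repeating the attribute. The helper name `weighted_features` is hypothetical.

```python
from collections import Counter


def weighted_features(attributes):
    """Collapse repeated attribute strings into {attribute: float count}."""
    counts = Counter(attributes)
    return {name: float(count) for name, count in counts.items()}


# Attributes for one token; "w[1..4]=a" occurs twice.
token_attributes = ["w[1..4]=a", "w[1..4]=a", "w[1..4]=man", "w[1..4]=eats"]
features = weighted_features(token_attributes)
print(features)
```

A sentence would then be a list of such dicts, one per token, and the training set a list of sentences.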

    opened by binhnq94 8
  • Different result despite same input

    I created two CRF instances and trained them on the same training set with the same max_iterations parameter.

    crf = sklearn_crfsuite.CRF(
        algorithm='ap',
        max_iterations=5,
    )
    crf.fit(X_train, Y_train)

    t = sklearn_crfsuite.CRF(
        algorithm='ap',
        max_iterations=5,
    )
    t.fit(X_train, Y_train)
    

    However, their results differ (I tested both on the same development set using F-measure). I hope to hear from you soon. Thank you.

    opened by iamhuy 6
  • UnicodeEncodeError:

    Hello! Thank you for your work.

    I am experimenting with Russian texts, but I get this error: UnicodeEncodeError: 'ascii' codec can't encode characters in position 2-3: ordinal not in range(128). I think my data set contains some strange symbols, but how can I find them?

    Python 3.6, latest version of sklearn-crfsuite.
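To locate the characters involved, a small scan like the one below may help. It is only a sketch: the helper name `find_non_ascii` and the toy sentences are illustrative, and it assumes the data is a list of token lists. Note that for Russian text every Cyrillic character is non-ASCII, so the error may point to an ASCII default encoding somewhere in the I/O path rather than to a few unusual symbols.

```python
def find_non_ascii(sequences):
    """Yield (sequence_index, token_index, token, non_ascii_chars) per affected token."""
    for i, seq in enumerate(sequences):
        for j, token in enumerate(seq):
            bad = [ch for ch in token if ord(ch) > 127]
            if bad:
                yield i, j, token, bad


# Toy data standing in for the real corpus.
sentences = [["hello", "world"], ["привет", "na\u00efve"]]
hits = list(find_non_ascii(sentences))
for hit in hits:
    print(hit)
```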

    opened by Ulitochka 5
  • Mini-batch training

    Does the CRF implementation support mini-batch training? Some sklearn predictors have a partial_fit method that supports incremental training. Would there be scope to extend the current implementation to include this?

    opened by uwaisiqbal 5
  • Sequence labelling issue: The numbers of items and labels differ...

    Hi, I'm trying to use sklearn-crfsuite for sequence labelling.

    When running crf.fit(train_data, train_targets) on my data, I get the stack trace below:

    Traceback (most recent call last):
      File ".../argument_segmenter.py", line 49, in train
        crf.fit(train_data, train_targets)
      File "/usr/local/lib/python3.9/site-packages/sklearn_crfsuite/estimator.py", line 314, in fit
        trainer.append(xseq, yseq)
      File "pycrfsuite/_pycrfsuite.pyx", line 312, in pycrfsuite._pycrfsuite.BaseTrainer.append
    ValueError: The numbers of items and labels differ: |x| = 40, |y| = 38
    

    I noticed in https://github.com/TeamHG-Memex/sklearn-crfsuite/issues/20 that someone suggests using a custom scorer, but I don't seem to get past the fitting stage.

    Any advice would be appreciated.

    My code looks like this:

    train_data, test_data, train_targets, test_targets = load_data()
    
    train_data = [sent2features(s) for s in train_data]
    train_targets = [sent2labels(s) for s in train_targets]
    
    test_data = [sent2features(s) for s in test_data]
    test_targets = [sent2labels(s) for s in test_targets]
    
    crf = sklearn_crfsuite.CRF(
        algorithm='lbfgs',
        c1=0.1,
        c2=0.1,
        max_iterations=100,
        all_possible_transitions=True
    )
    
    try:
        crf.fit(train_data, train_targets)
    except Exception as e:
        logging.error(e)
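The ValueError in the traceback says that one feature sequence has 40 items while its label sequence has 38. In the posted code the features and labels come from two separately loaded collections, so a difference in how they were tokenized would be one plausible cause. A minimal check, assuming both are lists of sequences (the helper name `find_length_mismatches` is hypothetical), can list the offending pairs before calling fit:

```python
def find_length_mismatches(X, y):
    """Return (index, len(xseq), len(yseq)) for every pair whose lengths differ."""
    return [
        (i, len(xseq), len(yseq))
        for i, (xseq, yseq) in enumerate(zip(X, y))
        if len(xseq) != len(yseq)
    ]


# Toy data: the second sequence has 3 feature dicts but only 2 labels.
X = [[{"w": "a"}, {"w": "b"}], [{"w": "c"}, {"w": "d"}, {"w": "e"}]]
y = [["O", "O"], ["O", "B"]]
mismatches = find_length_mismatches(X, y)
print(mismatches)
```

Running the same check on train_data and train_targets should point to the sentences whose token and label counts disagree.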
    
    opened by chriswales95 3
  • Possible memory leak problem?

    Hi @kmike

    My colleagues used a Java version of CRFsuite and found a memory leak in it. We therefore checked the original CRFsuite site and found a number of issues related to this problem (search results in chokkan/crfsuite). The latest fix accepted by the author dates from 2016, and there are some more recent commits by other contributors.

    According to the python-crfsuite documentation, the latest fix for this issue dates back to 2015. Can you tell us whether the latest python-crfsuite or sklearn-crfsuite has fixed those problems? Many thanks!

    opened by acepor 3
  • Effective Feature Induction to Increase F1

    Hello,

    I want to use some conjunctions of features to increase my F1 score. Is there any functionality to induce features effectively?

    Or

    Does sklearn-crfsuite support the algorithm described in this paper? https://people.cs.umass.edu/~mccallum/papers/ifcrf-uai2003.pdf

    Thanks

    opened by emirceyani 3
  • Is there an easy way to obtain a confusion matrix?

    I'm trying

        confusion_matrix(y_test, y_pred)
    

    with sklearn's method, but am getting the error message

    ValueError: You appear to be using a legacy multi-label data representation. Sequence of sequences are no longer supported; use a binary array or sparse matrix instead - the MultiLabelBinarizer transformer can convert to this format.
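The error message arises because y_test and y_pred are lists of label sequences rather than flat label lists. One way around it, sketched here with hypothetical helper names and toy labels, is to flatten both to one label per token and count (true, predicted) pairs. The flattened lists are plain 1-D label lists, so they should also be acceptable to sklearn's confusion_matrix.

```python
from collections import Counter
from itertools import chain


def flatten(sequences):
    """Turn a list of label sequences into one flat list of labels."""
    return list(chain.from_iterable(sequences))


def confusion_counts(y_true, y_pred):
    """Count (true_label, predicted_label) pairs over all tokens."""
    flat_true, flat_pred = flatten(y_true), flatten(y_pred)
    if len(flat_true) != len(flat_pred):
        raise ValueError("y_true and y_pred contain different numbers of tokens")
    return Counter(zip(flat_true, flat_pred))


# Toy labels standing in for real test data and predictions.
y_test = [["B", "I", "O"], ["O", "B"]]
y_pred = [["B", "O", "O"], ["O", "B"]]
counts = confusion_counts(y_test, y_pred)
for (true_label, pred_label), n in sorted(counts.items()):
    print(true_label, pred_label, n)
```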
    
    opened by goerch 2
  • sklearn.model_selection.cross_validate() can't run.

    Nice to meet you. I am a student studying with your package. I have run into a problem that I cannot solve by myself.

    I tried your tutorial with the features from this site (https://qiita.com/Hironsan/items/326b66711eb4196aa9d4) and added cross-validation as follows.

    from sklearn.model_selection import cross_validate
    scores = cross_validate(crf, X, y, scoring="f1_macro", cv=5)
    print(scores["test_score"])  # cross_validate returns a dict of arrays
    

    However, the following error occurs.

    Traceback (most recent call last):
      File "/program/crf.py", line 41, in <module>
        scores = cross_validate(crf, X, y, scoring="f1_macro", cv=5)
      File "/.pyenv/versions/anaconda3-4.3.1/lib/python3.6/site-packages/sklearn/model_selection/_validation.py", line 195, in cross_validate
        for train, test in cv.split(X, y, groups))
      File "/.pyenv/versions/anaconda3-4.3.1/lib/python3.6/site-packages/sklearn/externals/joblib/parallel.py", line 779, in __call__
        while self.dispatch_one_batch(iterator):
      File "/.pyenv/versions/anaconda3-4.3.1/lib/python3.6/site-packages/sklearn/externals/joblib/parallel.py", line 625, in dispatch_one_batch
        self._dispatch(tasks)
      File "/.pyenv/versions/anaconda3-4.3.1/lib/python3.6/site-packages/sklearn/externals/joblib/parallel.py", line 588, in _dispatch
        job = self._backend.apply_async(batch, callback=cb)
      File "/.pyenv/versions/anaconda3-4.3.1/lib/python3.6/site-packages/sklearn/externals/joblib/_parallel_backends.py", line 111, in apply_async
        result = ImmediateResult(func)
      File "/.pyenv/versions/anaconda3-4.3.1/lib/python3.6/site-packages/sklearn/externals/joblib/_parallel_backends.py", line 332, in __init__
        self.results = batch()
      File "/.pyenv/versions/anaconda3-4.3.1/lib/python3.6/site-packages/sklearn/externals/joblib/parallel.py", line 131, in __call__
        return [func(*args, **kwargs) for func, args, kwargs in self.items]
      File "/.pyenv/versions/anaconda3-4.3.1/lib/python3.6/site-packages/sklearn/externals/joblib/parallel.py", line 131, in <listcomp>
        return [func(*args, **kwargs) for func, args, kwargs in self.items]
      File "/.pyenv/versions/anaconda3-4.3.1/lib/python3.6/site-packages/sklearn/model_selection/_validation.py", line 467, in _fit_and_score
        test_scores = _score(estimator, X_test, y_test, scorer, is_multimetric)
      File "/.pyenv/versions/anaconda3-4.3.1/lib/python3.6/site-packages/sklearn/model_selection/_validation.py", line 502, in _score
        return _multimetric_score(estimator, X_test, y_test, scorer)
      File "/.pyenv/versions/anaconda3-4.3.1/lib/python3.6/site-packages/sklearn/model_selection/_validation.py", line 532, in _multimetric_score
        score = scorer(estimator, X_test, y_test)
      File "/.pyenv/versions/anaconda3-4.3.1/lib/python3.6/site-packages/sklearn/metrics/scorer.py", line 108, in __call__
        **self._kwargs)
      File "/.pyenv/versions/anaconda3-4.3.1/lib/python3.6/site-packages/sklearn/metrics/classification.py", line 714, in f1_score
        sample_weight=sample_weight)
      File "/.pyenv/versions/anaconda3-4.3.1/lib/python3.6/site-packages/sklearn/metrics/classification.py", line 828, in fbeta_score
        sample_weight=sample_weight)
      File "/.pyenv/versions/anaconda3-4.3.1/lib/python3.6/site-packages/sklearn/metrics/classification.py", line 1025, in precision_recall_fscore_support
        y_type, y_true, y_pred = _check_targets(y_true, y_pred)
      File "/.pyenv/versions/anaconda3-4.3.1/lib/python3.6/site-packages/sklearn/metrics/classification.py", line 72, in _check_targets
        type_true = type_of_target(y_true)
      File "/.pyenv/versions/anaconda3-4.3.1/lib/python3.6/site-packages/sklearn/utils/multiclass.py", line 259, in type_of_target
        raise ValueError('You appear to be using a legacy multi-label data'
    ValueError: You appear to be using a legacy multi-label data representation. Sequence of sequences are no longer supported; use a binary array or sparse matrix instead.
    
    

    So, I added the following before cross-validation.

    from sklearn.preprocessing import MultiLabelBinarizer

    trans_X = []
    mlb = MultiLabelBinarizer()
    for x in X:
        x = mlb.fit_transform(x)
        trans_X.append(x.astype(bytes))
    X = trans_X
    y = MultiLabelBinarizer().fit_transform(y)
    y = y.astype(bytes)
    

    However, the following error occurs.

    Traceback (most recent call last):
      File "/program/crf.py", line 41, in <module>
        scores = cross_validate(crf, X, y, scoring="f1_macro", cv=5)
      File "/.pyenv/versions/anaconda3-4.3.1/lib/python3.6/site-packages/sklearn/model_selection/_validation.py", line 195, in cross_validate
        for train, test in cv.split(X, y, groups))
      File "/.pyenv/versions/anaconda3-4.3.1/lib/python3.6/site-packages/sklearn/externals/joblib/parallel.py", line 779, in __call__
        while self.dispatch_one_batch(iterator):
      File "/.pyenv/versions/anaconda3-4.3.1/lib/python3.6/site-packages/sklearn/externals/joblib/parallel.py", line 625, in dispatch_one_batch
        self._dispatch(tasks)
      File "/.pyenv/versions/anaconda3-4.3.1/lib/python3.6/site-packages/sklearn/externals/joblib/parallel.py", line 588, in _dispatch
        job = self._backend.apply_async(batch, callback=cb)
      File "/.pyenv/versions/anaconda3-4.3.1/lib/python3.6/site-packages/sklearn/externals/joblib/_parallel_backends.py", line 111, in apply_async
        result = ImmediateResult(func)
      File "/.pyenv/versions/anaconda3-4.3.1/lib/python3.6/site-packages/sklearn/externals/joblib/_parallel_backends.py", line 332, in __init__
        self.results = batch()
      File "/.pyenv/versions/anaconda3-4.3.1/lib/python3.6/site-packages/sklearn/externals/joblib/parallel.py", line 131, in __call__
        return [func(*args, **kwargs) for func, args, kwargs in self.items]
      File "/.pyenv/versions/anaconda3-4.3.1/lib/python3.6/site-packages/sklearn/externals/joblib/parallel.py", line 131, in <listcomp>
        return [func(*args, **kwargs) for func, args, kwargs in self.items]
      File "/.pyenv/versions/anaconda3-4.3.1/lib/python3.6/site-packages/sklearn/model_selection/_validation.py", line 437, in _fit_and_score
        estimator.fit(X_train, y_train, **fit_params)
      File "/.pyenv/versions/anaconda3-4.3.1/lib/python3.6/site-packages/sklearn_crfsuite/estimator.py", line 314, in fit
        trainer.append(xseq, yseq)
      File "pycrfsuite/_pycrfsuite.pyx", line 312, in pycrfsuite._pycrfsuite.BaseTrainer.append
    ValueError: The numbers of items and labels differ: |x| = 62, |y| = 3
    

    Please tell me how to solve this problem. Sorry to trouble you when you are busy; I appreciate your help.
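The first traceback shows the built-in "f1_macro" scorer rejecting lists of label sequences. Instead of binarizing X and y, one option is to keep them as lists of sequences and pass a callable scorer that flattens the labels first. The sketch below is plain Python so that it runs on its own; the names `flat_macro_f1`, `flat_macro_f1_scorer` and `FixedPredictor` are hypothetical. sklearn_crfsuite.metrics also provides flat_* helpers such as flat_f1_score, which could serve the same purpose.

```python
from itertools import chain


def flat_macro_f1(y_true, y_pred):
    """Macro-averaged F1 over tokens, for lists of label sequences."""
    true = list(chain.from_iterable(y_true))
    pred = list(chain.from_iterable(y_pred))
    labels = sorted(set(true) | set(pred))
    f1_values = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(true, pred))
        fp = sum(t != label and p == label for t, p in zip(true, pred))
        fn = sum(t == label and p != label for t, p in zip(true, pred))
        denom = 2 * tp + fp + fn
        f1_values.append(2 * tp / denom if denom else 0.0)
    return sum(f1_values) / len(f1_values) if f1_values else 0.0


def flat_macro_f1_scorer(estimator, X, y):
    """Callable with the (estimator, X, y) signature that sklearn accepts for scoring."""
    return flat_macro_f1(y, estimator.predict(X))


class FixedPredictor:
    """Stand-in estimator used only to demonstrate the scorer."""

    def __init__(self, predictions):
        self.predictions = predictions

    def predict(self, X):
        return self.predictions


demo_score = flat_macro_f1_scorer(
    FixedPredictor([["A", "A"], ["A"]]),
    [[{}, {}], [{}]],
    [["A", "B"], ["A"]],
)
print(demo_score)
```

With the original, untransformed X and y, the call would then look like cross_validate(crf, X, y, scoring=flat_macro_f1_scorer, cv=5), with the per-fold results under the "test_score" key of the returned dict.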

    opened by ss1357 2
  • How to save the trained CRF model?

    Thank you so much for the work.

    I'm wondering if the trained model can be saved? In the API, CRF has a parameter model_filename to import the trained model, and the documentation states:

    By default, model files are created automatically and saved in temporary locations; the preferred way to save/load CRF models is to use pickle (or its alternatives like joblib)

    How can we export the model to an explicit location?

    Many thanks!
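Following the quoted documentation, which names pickle (or joblib) as the preferred route, saving to an explicit location could look like the sketch below. The helper names and the file name are hypothetical, and a plain dict stands in for a fitted CRF so that the example runs without training a model.

```python
import os
import pickle
import tempfile


def save_model(model, path):
    """Serialize a model object to an explicit file path."""
    with open(path, "wb") as f:
        pickle.dump(model, f)


def load_model(path):
    """Load a model object previously written by save_model."""
    with open(path, "rb") as f:
        return pickle.load(f)


# Stand-in for a fitted sklearn_crfsuite.CRF instance (assumption for the demo).
stand_in_model = {"algorithm": "lbfgs", "c1": 0.1}
model_path = os.path.join(tempfile.mkdtemp(), "crf_model.pkl")
save_model(stand_in_model, model_path)
restored = load_model(model_path)
print(restored == stand_in_model)
```

As with any pickle file, it should only be loaded from a trusted source.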

    opened by acepor 2
  • Can features be discarded by the classifier?

    Hello! I am using crfsuite to train models on my own datasets, and I am testing different sets of features (I have a lot of them). However, some of those features seem to have no effect on classification results: e.g. first I use set of features A and get an F1 = X, and then I use set A + B and get the same results, and this repeats on every train and test set I have (if it is any help, my data is various acoustic features of speech in two languages). My question is: is this normal, or is there a possibility that some of these features are somehow discarded by the model? Thank you in advance!

    opened by PKholyavin 1
  • Maintenance is not current

    Is there any possibility of transferring maintenance to someone else? There are many open PRs that would fix a number of issues and bring this package up to date with the current sklearn interface.

    opened by vicissitudele 1
  • how to add bigram features ?

    Thanks for this excellent package.

    Kindly help with the below questions.

    1. How to use x_t, x_{t-1}, or even x_t, x_{t-1}, x_{t-2}, ..., x_{t-n} as a feature in sklearn-crfsuite?
    2. How to use a float feature instead of bucketing the continuous variable in sklearn-crfsuite? Is there an example of this?
    3. Does sklearn-crfsuite implement only linear-chain CRFs, or general CRFs as well?
    opened by deepak-george 0
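For the first two questions, one common pattern is to build the context into each token's feature dict. The sketch below is an illustration, not the package's prescribed method: the function name `token_features`, the feature key names, and the per-token `values` list are all hypothetical. It adds the previous and next word, a conjunction of the previous and current word, and a float-valued feature, relying on python-crfsuite accepting float values as feature weights.

```python
def token_features(tokens, i, values=None):
    """Build a feature dict for token i, including neighbouring-token context."""
    word = tokens[i].lower()
    features = {"bias": 1.0, "word": word}
    if i > 0:
        prev_word = tokens[i - 1].lower()
        features["-1:word"] = prev_word
        # Conjunction of previous and current word.
        features["-1:word|word"] = prev_word + "|" + word
    else:
        features["BOS"] = True  # beginning of sequence
    if i < len(tokens) - 1:
        features["+1:word"] = tokens[i + 1].lower()
    else:
        features["EOS"] = True  # end of sequence
    if values is not None:
        # Continuous feature passed as a float weight instead of a bucket.
        features["value"] = float(values[i])
    return features


tokens = ["The", "cat", "sleeps"]
values = [0.1, 0.5, 0.9]
sentence_features = [token_features(tokens, i, values) for i in range(len(tokens))]
print(sentence_features[1])
```

Longer contexts follow the same pattern by looking further back than i - 1.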
  • flat_classification_report seems to be broken

    Hi,

    it appears that flat_classification_report is now broken. Scikit-learn's classification_report no longer accepts these arguments positionally; positional use was deprecated a while back, and it seems this is now being enforced.

    Specifically, the issue is that labels is no longer a positional argument and must instead be passed as a keyword argument.

    It seems to be a simple fix so I can submit a pull request later.

    opened by chriswales95 3
A set of tools for creating and testing machine learning features, with a scikit-learn compatible API

Feature Forge This library provides a set of tools that can be useful in many machine learning applications (classification, clustering, regression, e

Machinalis 380 Nov 5, 2022
Python package for Bayesian Machine Learning with scikit-learn API

Python package for Bayesian Machine Learning with scikit-learn API Installing & Upgrading package pip install https://github.com/AmazaspShumik/sklearn

Amazasp Shaumyan 482 Jan 4, 2023
SciKit-Learn Laboratory (SKLL) makes it easy to run machine learning experiments.

SciKit-Learn Laboratory This Python package provides command-line utilities to make it easier to run machine learning experiments with scikit-learn. O

ETS 528 Nov 25, 2022
A scikit-learn compatible neural network library that wraps PyTorch

A scikit-learn compatible neural network library that wraps PyTorch. Resources Documentation Source Code Examples To see more elaborate examples, look

null 4.9k Dec 31, 2022
scikit-learn: machine learning in Python

scikit-learn is a Python module for machine learning built on top of SciPy and is distributed under the 3-Clause BSD license. The project was started

scikit-learn 52.5k Jan 8, 2023
Scikit-learn compatible estimation of general graphical models

skggm : Gaussian graphical models using the scikit-learn API In the last decade, learning networks that encode conditional independence relationships

null 213 Jan 2, 2023
Genetic feature selection module for scikit-learn

sklearn-genetic Genetic feature selection module for scikit-learn Genetic algorithms mimic the process of natural selection to search for optimal valu

Manuel Calzolari 260 Dec 14, 2022
Use evolutionary algorithms instead of gridsearch in scikit-learn

sklearn-deap Use evolutionary algorithms instead of gridsearch in scikit-learn. This allows you to reduce the time required to find the best parameter

rsteca 709 Jan 3, 2023
SigOpt wrappers for scikit-learn methods

SigOpt + scikit-learn Interfacing This package implements useful interfaces and wrappers for using SigOpt and scikit-learn together Getting Started In

SigOpt 73 Sep 30, 2022
Using python and scikit-learn to make stock predictions

MachineLearningStocks in python: a starter project and guide EDIT as of Feb 2021: MachineLearningStocks is no longer actively maintained MachineLearni

Robert Martin 1.3k Dec 29, 2022
A scikit-learn-compatible module for estimating prediction intervals.

|Anaconda|_ MAPIE - Model Agnostic Prediction Interval Estimator MAPIE allows you to easily estimate prediction intervals using your favourite sklearn

SimAI 584 Dec 27, 2022
Regression Metrics Calculation Made easy for tensorflow2 and scikit-learn

Regression Metrics Installation To install the package from the PyPi repository you can execute the following command: pip install regressionmetrics I

Ashish Patel 11 Dec 16, 2022
A real-time speech emotion recognition application using Scikit-learn and gradio

Speech-Emotion-Recognition-App A real-time speech emotion recognition application using Scikit-learn and gradio. Requirements librosa==0.6.3 numpy sou

Son Tran 6 Oct 4, 2022
Convert scikit-learn models to PyTorch modules

sk2torch sk2torch converts scikit-learn models into PyTorch modules that can be tuned with backpropagation and even compiled as TorchScript. Problems

Alex Nichol 101 Dec 16, 2022
This project uses reinforcement learning on stock market and agent tries to learn trading. The goal is to check if the agent can learn to read tape. The project is dedicated to hero in life great Jesse Livermore.

Reinforcement-trading This project uses Reinforcement learning on stock market and agent tries to learn trading. The goal is to check if the agent can

Deepender Singla 1.4k Dec 22, 2022
🌳 A Python-inspired implementation of the Optimum-Path Forest classifier.

OPFython: A Python-Inspired Optimum-Path Forest Classifier Welcome to OPFython. Note that this implementation relies purely on the standard LibOPF. Th

Gustavo Rosa 30 Jan 4, 2023
[ICLR 2021] "Neural Architecture Search on ImageNet in Four GPU Hours: A Theoretically Inspired Perspective" by Wuyang Chen, Xinyu Gong, Zhangyang Wang

Neural Architecture Search on ImageNet in Four GPU Hours: A Theoretically Inspired Perspective [PDF] Wuyang Chen, Xinyu Gong, Zhangyang Wang In ICLR 2

VITA 156 Nov 28, 2022