MonkeyLearn API for Python

Official Python client for the MonkeyLearn API. Build and run machine learning models for language processing from your Python apps.

Installation

You can use pip to install the library:

$ pip install monkeylearn

Alternatively, you can just clone the repository and run the setup.py script:

$ python setup.py install

Usage

Before making requests to the API, you need to create an instance of the MonkeyLearn client. You will have to use your account API Key:

from monkeylearn import MonkeyLearn

# Instantiate the client using your API key
ml = MonkeyLearn('<YOUR API TOKEN HERE>')

Requests

From the MonkeyLearn client instance, you can call any endpoint (check the available endpoints below). For example, you can classify a list of texts using the public Sentiment analysis classifier:

response = ml.classifiers.classify(
    model_id='cl_Jx8qzYJh',
    data=[
        'Great hotel with excellent location',
        'This is the worst hotel ever.'
    ]
)

Responses

The response object returned by every endpoint call is a MonkeyLearnResponse object. The body attribute has the parsed response from the API:

print(response.body)
# =>  [
# =>      {
# =>          'text': 'Great hotel with excellent location',
# =>          'external_id': None,
# =>          'error': False,
# =>          'classifications': [
# =>              {
# =>                  'tag_name': 'Positive',
# =>                  'tag_id': 1994,
# =>                  'confidence': 0.922,
# =>              }
# =>          ]
# =>      },
# =>      {
# =>          'text': 'This is the worst hotel ever.',
# =>          'external_id': None,
# =>          'error': False,
# =>          'classifications': [
# =>              {
# =>                  'tag_name': 'Negative',
# =>                  'tag_id': 1941,
# =>                  'confidence': 0.911,
# =>              }
# =>          ]
# =>      }
# =>  ]
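A minimal sketch of how the parsed body might be consumed, assuming it is a list of dicts shaped like the output above. The helper name top_tags is hypothetical and not part of the library, and the sample list stands in for response.body:

```python
# Stand-in for response.body, copied from the example output above.
body = [
    {
        'text': 'Great hotel with excellent location',
        'external_id': None,
        'error': False,
        'classifications': [
            {'tag_name': 'Positive', 'tag_id': 1994, 'confidence': 0.922},
        ],
    },
    {
        'text': 'This is the worst hotel ever.',
        'external_id': None,
        'error': False,
        'classifications': [
            {'tag_name': 'Negative', 'tag_id': 1941, 'confidence': 0.911},
        ],
    },
]


def top_tags(body):
    """Return (text, tag_name, confidence) for the most confident tag of each item.

    Hypothetical helper: items flagged with an error, or without any
    classifications, are reported with tag_name and confidence set to None.
    """
    results = []
    for item in body:
        classifications = item.get('classifications') or []
        if item.get('error') or not classifications:
            results.append((item['text'], None, None))
            continue
        best = max(classifications, key=lambda c: c['confidence'])
        results.append((item['text'], best['tag_name'], best['confidence']))
    return results


for text, tag, confidence in top_tags(body):
    print(text, '->', tag, confidence)
```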

You can also access other attributes in the response object to get information about the queries used or available:

print(response.plan_queries_allowed)
# =>  300

print(response.plan_queries_remaining)
# =>  240

print(response.request_queries_used)
# =>  2

Errors

Endpoint calls may raise exceptions. Here is an example of how to handle them:

from monkeylearn.exceptions import PlanQueryLimitError, MonkeyLearnException

try:
    response = ml.classifiers.classify('[MODEL_ID]', data=['My text'])
except PlanQueryLimitError as e:
    # No monthly queries left
    # e.response contains the MonkeyLearnResponse object
    print(e.error_code, e.detail)
except MonkeyLearnException:
    raise

Available exceptions:

Class	Description
`MonkeyLearnException`	Base class for every exception below.
`RequestParamsError`	An invalid parameter was sent. Check the exception message or response object for more information.
`AuthenticationError`	Authentication failed, usually because an invalid token was provided. Check the exception message. More about Authentication.
`ForbiddenError`	You don't have permissions to perform the action on the given resource.
`ModelLimitError`	You have reached the custom model limit for your plan.
`ModelNotFound`	The model does not exist. Check the `model_id`.
`TagNotFound`	The tag does not exist. Check the `tag_id` parameter.
`PlanQueryLimitError`	You have reached the monthly query limit for your plan. Consider upgrading your plan. More about Plan query limits.
`PlanRateLimitError`	You have sent too many requests in the last minute. Check the exception detail. More about Plan rate limit.
`ConcurrencyRateLimitError`	You have sent too many requests in the last second. Check the exception detail. More about Concurrency rate limit.
`ModelStateError`	The state of the model is invalid. Check the exception detail.

Auto-batching

Classify and Extract endpoints might require more than one request to the MonkeyLearn API in order to process every text in the data parameter. If the auto_batch parameter is True (which is the default value), you won't have to keep the data length below the max allowed value (200). You can just pass the full list and the library will handle the batching and make the necessary requests. If the retry_if_throttled parameter is True (which is the default value), it will also wait and retry if the API throttled a request.

Let's say you send a data parameter with 300 texts and auto_batch is enabled. The list will be split internally and two requests will be sent to MonkeyLearn with 200 and 100 texts, respectively. If all requests respond with a 200 status code, the responses will be appended and you will get the 300 classifications as usual in the MonkeyLearnResponse.body attribute:

data = ['Text to classify'] * 300
response = ml.classifiers.classify('[MODEL_ID]', data)
assert len(response.body) == 300  # => True
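The splitting described above can be sketched in plain Python. This only illustrates the idea and is not the library's internal code; the function name split_into_batches is hypothetical:

```python
def split_into_batches(data, batch_size=200):
    """Split data into consecutive lists of at most batch_size items."""
    if not 1 <= batch_size <= 200:
        raise ValueError('batch_size must be between 1 and 200')
    return [data[i:i + batch_size] for i in range(0, len(data), batch_size)]


data = ['Text to classify'] * 300
batches = split_into_batches(data)
print([len(batch) for batch in batches])  # [200, 100]
```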

Now, let's say you only had 200 queries left when trying the previous example. The second internal request would fail, since you wouldn't have any queries left after the first batch, and a PlanQueryLimitError exception would be raised. The first 200 (successful) classifications will be in the exception object. However, if you don't handle this exception with an except clause, those first 200 successful classifications will be lost. Here's how you should handle that case:

from monkeylearn.exceptions import PlanQueryLimitError

data = ['Text to classify'] * 300
batch_size = 200

try:
    response = ml.classifiers.classify('[MODEL_ID]', data, batch_size=batch_size)
except PlanQueryLimitError as e:
    partial_predictions = e.response.body  # The body of the successful responses
    non_2xx_raw_responses = e.response.failed_raw_responses  # List of requests responses objects
else:
    predictions = response.body
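To work out which texts still need processing after a partial failure, one option is to tag every element with an external_id and compare it against the partial predictions. The sketch below assumes each prediction echoes back the external_id that was sent, as described for the data parameter. The helper names are hypothetical and the partial body is simulated rather than fetched from the API:

```python
def with_external_ids(texts):
    """Wrap each text in a dict carrying its list index as external_id."""
    return [{'text': text, 'external_id': str(i)} for i, text in enumerate(texts)]


def unprocessed(data, partial_predictions):
    """Return the data elements whose external_id is missing from the predictions."""
    done = {item['external_id'] for item in partial_predictions}
    return [element for element in data if element['external_id'] not in done]


data = with_external_ids(['Text to classify'] * 300)

# Simulated e.response.body: pretend only the first 200 texts were classified.
partial_predictions = [
    {'text': element['text'], 'external_id': element['external_id'],
     'error': False, 'classifications': []}
    for element in data[:200]
]

remaining = unprocessed(data, partial_predictions)
print(len(remaining))  # 100
```

The remaining list could then be sent again once more queries are available.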

This is very convenient and usually should be enough. If you need more flexibility, you can manage batching and rate limits yourself.

from time import sleep
from monkeylearn.exceptions import PlanQueryLimitError, ConcurrencyRateLimitError, PlanRateLimitError

data = ['Text to classify'] * 300
batch_size = 200
predictions = []

for i in range(0, len(data), batch_size):
    batch_data = data[i:i + batch_size]

    retry = True
    while retry:
        try:
            response = ml.classifiers.classify('[MODEL_ID]', batch_data, auto_batch=False,
                                               retry_if_throttled=False)
        except PlanRateLimitError as e:
            sleep(e.seconds_to_wait)
        except ConcurrencyRateLimitError:
            sleep(2)
        except PlanQueryLimitError:
            raise
        else:
            retry = False

    predictions.extend(response.body)

This way you'll be able to control every request that is sent to the MonkeyLearn API.
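The loop above retries indefinitely while the API keeps throttling. A bounded variant could look like the following sketch. It runs without the library: FakeRateLimitError and flaky_classify are stand-ins invented for illustration, and call_with_retry is a hypothetical helper, not part of monkeylearn:

```python
import time


class FakeRateLimitError(Exception):
    """Stand-in for a throttling exception; not a monkeylearn class."""

    def __init__(self, seconds_to_wait):
        super().__init__('throttled')
        self.seconds_to_wait = seconds_to_wait


def call_with_retry(func, max_attempts=5, sleep=time.sleep):
    """Call func until it succeeds, sleeping as instructed after each throttle.

    The exception is re-raised once max_attempts calls have been throttled.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except FakeRateLimitError as e:
            if attempt == max_attempts:
                raise
            sleep(e.seconds_to_wait)


# Stub that is throttled twice and then succeeds.
calls = {'count': 0}


def flaky_classify():
    calls['count'] += 1
    if calls['count'] <= 2:
        raise FakeRateLimitError(seconds_to_wait=0.01)
    return ['prediction']


waits = []  # Record the requested waits instead of actually sleeping.
result = call_with_retry(flaky_classify, sleep=waits.append)
print(result, waits)
```

Passing the sleep function as an argument keeps the helper easy to test, since no real waiting is needed.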

Available endpoints

These are all the endpoints of the API. For more information about each endpoint, check out the API documentation.

Classifiers

Classify

def MonkeyLearn.classifiers.classify(model_id, data, production_model=False, batch_size=200,
                                     auto_batch=True, retry_if_throttled=True)

Parameters:

Parameter	Type	Description
model_id	`str`	Classifier ID. It always starts with `'cl'`, for example, `'cl_oJNMkt2V'`.
data	`list[str or dict]`	A list of up to 200 data elements to classify. Each element must be a string with the text or a dict with the required `text` key and the text as the value. You can provide an optional `external_id` key with a string that will be included in the response.
production_model	`bool`	Indicates if the classifications are performed by the production model. Only use this parameter with custom models (not with the public ones). Note that you first need to deploy your model to production either from the UI model settings or by using the Classifier deploy endpoint.
batch_size	`int`	Max number of texts each request will send to MonkeyLearn. A number from 1 to 200.
auto_batch	`bool`	Split the `data` list into smaller valid lists, send each one in a separate request to MonkeyLearn, and merge the responses.
retry_if_throttled	`bool`	If a request is throttled, sleep and retry the request.

Example:

data = ['First text', {'text': 'Second text', 'external_id': '2'}]
response = ml.classifiers.classify('[MODEL_ID]', data)
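If you want to check a mixed list before sending it, a client-side pre-check along these lines may help. It is a sketch based only on the parameter description above; normalize_data is a hypothetical name and the library may validate input differently:

```python
def normalize_data(data, max_items=200):
    """Turn a mixed list of strings and dicts into a list of dicts with a 'text' key."""
    if len(data) > max_items:
        raise ValueError('at most %d elements are allowed per request' % max_items)
    normalized = []
    for element in data:
        if isinstance(element, str):
            normalized.append({'text': element})
        elif isinstance(element, dict) and 'text' in element:
            normalized.append(dict(element))  # Copy so the input is not mutated.
        else:
            raise ValueError('each element must be a str or a dict with a "text" key')
    return normalized


data = ['First text', {'text': 'Second text', 'external_id': '2'}]
print(normalize_data(data))
```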

Classifier detail

def MonkeyLearn.classifiers.detail(model_id, retry_if_throttled=True)

Parameters:

Parameter	Type	Description
model_id	`str`	Classifier ID. It always starts with `'cl'`, for example, `'cl_oJNMkt2V'`.
retry_if_throttled	`bool`	If a request is throttled, sleep and retry the request.

Example:

response = ml.classifiers.detail('[MODEL_ID]')

Create Classifier

def MonkeyLearn.classifiers.create(name, description='', algorithm='nb', language='en',
                                   max_features=10000, ngram_range=(1, 1), use_stemming=True,
                                   preprocess_numbers=True, preprocess_social_media=False,
                                   normalize_weights=True, stopwords=True, whitelist=None,
                                   retry_if_throttled=True)

Parameters:

Parameter	Type	Description
name	`str`	The name of the model.
description	`str`	The description of the model.
algorithm	`str`	The algorithm used when training the model. It can be either "nb" or "svm".
language	`str`	The language of the model. Full list of supported languages.
max_features	`int`	The maximum number of features used when training the model. Between 10 and 100000.
ngram_range	`tuple(int,int)`	Indicates which n-gram range is used when training the model. A tuple of two numbers between 1 and 3 that indicate the minimum and the maximum n for the n-grams used.
use_stemming	`bool`	Indicates whether stemming is used when training the model.
preprocess_numbers	`bool`	Indicates whether number preprocessing is done when training the model.
preprocess_social_media	`bool`	Indicates whether preprocessing of social media is done when training the model.
normalize_weights	`bool`	Indicates whether weights will be normalized when training the model.
stopwords	`bool or list`	The list of stopwords used when training the model. Use False for no stopwords, True for the default stopwords, or a list of strings for custom stopwords.
whitelist	`list`	The whitelist of words used when training the model.
retry_if_throttled	`bool`	If a request is throttled, sleep and retry the request.

Example:

response = ml.classifiers.create(name='New classifier', stopwords=True)
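The documented ranges can be checked locally before calling the endpoint. The sketch below covers only three of the parameters; check_create_params is a hypothetical helper, and requiring the minimum n to be no larger than the maximum is an assumption not stated in the table:

```python
def check_create_params(algorithm='nb', max_features=10000, ngram_range=(1, 1)):
    """Return a list of problems found in the given parameters (empty if none)."""
    problems = []
    if algorithm not in ('nb', 'svm'):
        problems.append('algorithm must be "nb" or "svm"')
    if not 10 <= max_features <= 100000:
        problems.append('max_features must be between 10 and 100000')
    low, high = ngram_range
    # Assumption: the minimum n may not exceed the maximum n.
    if not (1 <= low <= high <= 3):
        problems.append('ngram_range must be (min, max) with 1 <= min <= max <= 3')
    return problems


print(check_create_params())
print(check_create_params(algorithm='knn', ngram_range=(2, 1)))
```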

Edit Classifier

def MonkeyLearn.classifiers.edit(model_id, name=None, description=None, algorithm=None,
                                 language=None, max_features=None, ngram_range=None,
                                 use_stemming=None, preprocess_numbers=None,
                                 preprocess_social_media=None, normalize_weights=None,
                                 stopwords=None, whitelist=None, retry_if_throttled=None)

Parameters:

Parameter	Type	Description
model_id	`str`	Classifier ID. It always starts with `'cl'`, for example, `'cl_oJNMkt2V'`.
name	`str`	The name of the model.
description	`str`	The description of the model.
algorithm	`str`	The algorithm used when training the model. It can be either "nb" or "svm".
language	`str`	The language of the model. Full list of supported languages.
max_features	`int`	The maximum number of features used when training the model. Between 10 and 100000.
ngram_range	`tuple(int,int)`	Indicates which n-gram range is used when training the model. A tuple of two numbers between 1 and 3 that indicate the minimum and the maximum n for the n-grams used.
use_stemming	`bool`	Indicates whether stemming is used when training the model.
preprocess_numbers	`bool`	Indicates whether number preprocessing is done when training the model.
preprocess_social_media	`bool`	Indicates whether preprocessing of social media is done when training the model.
normalize_weights	`bool`	Indicates whether weights will be normalized when training the model.
stopwords	`bool or list`	The list of stopwords used when training the model. Use False for no stopwords, True for the default stopwords, or a list of strings for custom stopwords.
whitelist	`list`	The whitelist of words used when training the model.
retry_if_throttled	`bool`	If a request is throttled, sleep and retry the request.

Example:

response = ml.classifiers.edit('[MODEL_ID]', description='The new description of the classifier')

Delete classifier

def MonkeyLearn.classifiers.delete(model_id, retry_if_throttled=True)

Parameters:

Parameter	Type	Description
model_id	`str`	Classifier ID. It always starts with `'cl'`, for example, `'cl_oJNMkt2V'`.
retry_if_throttled	`bool`	If a request is throttled, sleep and retry the request.

Example:

response = ml.classifiers.delete('[MODEL_ID]')

List Classifiers

def MonkeyLearn.classifiers.list(page=1, per_page=20, order_by='-created', retry_if_throttled=True)

Parameters:

Parameter	Type	Description
page	`int`	Specifies which page to get.
per_page	`int`	Specifies how many items per page will be returned.
order_by	`string or list`	Specifies the ordering criteria. It can either be a string for single criteria ordering or a list of strings for more than one. Each string must be a valid field name; if you want inverse/descending order of the field prepend a `-` (dash) character. Some valid examples are: `'is_public'`, `'-name'` or `['-is_public', 'name']`.
retry_if_throttled	`bool`	If a request is throttled, sleep and retry the request.

Example:

response = ml.classifiers.list(page=2, per_page=5, order_by=['-is_public', 'name'])
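The ordering itself is done by the API, but the meaning of order_by can be illustrated locally. The following sketch applies the same convention (a leading dash means descending) to a list of dicts. The function name and the sample records are made up for illustration:

```python
def sort_by_criteria(items, order_by):
    """Sort a list of dicts by one field name or a list of field names.

    A leading '-' means descending order for that field.
    """
    criteria = [order_by] if isinstance(order_by, str) else list(order_by)
    result = list(items)
    # Apply the least significant criterion first; list.sort is stable,
    # so the earlier criteria end up taking precedence.
    for criterion in reversed(criteria):
        descending = criterion.startswith('-')
        field = criterion[1:] if descending else criterion
        result.sort(key=lambda item: item[field], reverse=descending)
    return result


models = [
    {'name': 'b', 'is_public': False},
    {'name': 'a', 'is_public': True},
    {'name': 'c', 'is_public': True},
]
names = [m['name'] for m in sort_by_criteria(models, ['-is_public', 'name'])]
print(names)  # ['a', 'c', 'b']
```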

Deploy

def MonkeyLearn.classifiers.deploy(model_id, retry_if_throttled=True)

Parameters:

Parameter	Type	Description
model_id	`str`	Classifier ID. It always starts with `'cl'`, for example, `'cl_oJNMkt2V'`.
retry_if_throttled	`bool`	If a request is throttled, sleep and retry the request.

Example:

response = ml.classifiers.deploy('[MODEL_ID]')

Train

def MonkeyLearn.classifiers.train(model_id, retry_if_throttled=True)

Parameters:

Parameter	Type	Description
model_id	`str`	Classifier ID. It always starts with `'cl'`, for example, `'cl_oJNMkt2V'`.
retry_if_throttled	`bool`	If a request is throttled, sleep and retry the request.

Example:

response = ml.classifiers.train('[MODEL_ID]')

Tag detail

def MonkeyLearn.classifiers.tags.detail(model_id, tag_id, retry_if_throttled=True)

Parameters:

Parameter	Type	Description
model_id	`str`	Classifier ID. It always starts with `'cl'`, for example, `'cl_oJNMkt2V'`.
tag_id	`int`	Tag ID.
retry_if_throttled	`bool`	If a request is throttled, sleep and retry the request.

Example:

response = ml.classifiers.tags.detail('[MODEL_ID]', TAG_ID)

Create tag

def MonkeyLearn.classifiers.tags.create(model_id, name, retry_if_throttled=True)

Parameters:

Parameter	Type	Description
model_id	`str`	Classifier ID. It always starts with `'cl'`, for example, `'cl_oJNMkt2V'`.
name	`str`	The name of the new tag.
retry_if_throttled	`bool`	If a request is throttled, sleep and retry the request.

Example:

response = ml.classifiers.tags.create('[MODEL_ID]', 'Positive')

Edit tag

def MonkeyLearn.classifiers.tags.edit(model_id, tag_id, name=None,
                                      retry_if_throttled=True)

Parameters:

Parameter	Type	Description
model_id	`str`	Classifier ID. It always starts with `'cl'`, for example, `'cl_oJNMkt2V'`.
tag_id	`int`	Tag ID.
name	`str`	The new name of the tag.
retry_if_throttled	`bool`	If a request is throttled, sleep and retry the request.

Example:

response = ml.classifiers.tags.edit('[MODEL_ID]', TAG_ID, 'New name')

Delete tag

def MonkeyLearn.classifiers.tags.delete(model_id, tag_id, move_data_to=None,
                                        retry_if_throttled=True)

Parameters:

Parameter	Type	Description
model_id	`str`	Classifier ID. It always starts with `'cl'`, for example, `'cl_oJNMkt2V'`.
tag_id	`int`	Tag ID.
move_data_to	`int`	An optional tag ID. If provided, training data associated with the tag to be deleted will be moved to the specified tag before deletion.
retry_if_throttled	`bool`	If a request is throttled, sleep and retry the request.

Example:

response = ml.classifiers.tags.delete('[MODEL_ID]', TAG_ID)

Upload data

def MonkeyLearn.classifiers.upload_data(model_id, data, retry_if_throttled=True)

Parameters:

Parameter	Type	Description
model_id	`str`	Classifier ID. It always starts with `'cl'`, for example, `'cl_oJNMkt2V'`.
data	`list[dict]`	A list of dicts with the keys described below.
input_duplicates_strategy	`str`	Indicates what to do with duplicate texts in this request. Must be one of `merge`, `keep_first` or `keep_last`.
existing_duplicates_strategy	`str`	Indicates what to do with texts of this request that already exist in the model. Must be one of `overwrite` or `ignore`.
retry_if_throttled	`bool`	If a request is throttled, sleep and retry the request.

data dict keys:

Key	Description
text	A string of the text to upload.
tags	A list of tags that can be referred to by their numeric ID or their name. The text will be tagged with each tag in the list when created (in case it doesn't already exist on the model). Otherwise, its tags will be updated to the new ones. New tags will be created if they don't already exist.
markers	An optional list of strings. Each one represents a marker that will be associated with the text. New markers will be created if they don't already exist.

Example:

response = ml.classifiers.upload_data(
    model_id='[MODEL_ID]',
    data=[{'text': 'text 1', 'tags': [TAG_ID_1, '[tag_name]']},
          {'text': 'text 2', 'tags': [TAG_ID_1, TAG_ID_2]}]
)
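If your training samples are stored as (text, tags) pairs, the data list can be built with a small helper. This is a sketch that follows the key descriptions above; build_upload_data and the sample values are hypothetical:

```python
def build_upload_data(samples):
    """Convert (text, tags) pairs into the list of dicts described above.

    tags may be a single tag (numeric ID or name) or a list of them.
    """
    data = []
    for text, tags in samples:
        if isinstance(tags, (int, str)):
            tags = [tags]
        data.append({'text': text, 'tags': list(tags)})
    return data


samples = [
    ('Great hotel', 'Positive'),
    ('Worst hotel ever', ['Negative', 1941]),
]
upload_data = build_upload_data(samples)
print(upload_data)
```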

Extractors

Extract

def MonkeyLearn.extractors.extract(model_id, data, production_model=False, batch_size=200,
                                   retry_if_throttled=True, extra_args=None)

Parameters:

Parameter	Type	Description
model_id	`str`	Extractor ID. It always starts with `'ex'`, for example, `'ex_oJNMkt2V'`.
data	`list[str or dict]`	A list of up to 200 data elements to extract from. Each element must be a string with the text or a dict with the required `text` key and the text as the value. You can also provide an optional `external_id` key with a string that will be included in the response.
production_model	`bool`	Indicates if the extractions are performed by the production model. Only use this parameter with custom models (not with the public ones). Note that you first need to deploy your model to production from the UI model settings.
batch_size	`int`	Max number of texts each request will send to MonkeyLearn. A number from 1 to 200.
retry_if_throttled	`bool`	If a request is throttled, sleep and retry the request.

Example:

data = ['First text', {'text': 'Second text', 'external_id': '2'}]
response = ml.extractors.extract('[MODEL_ID]', data=data)

Extractor detail

def MonkeyLearn.extractors.detail(model_id, retry_if_throttled=True)

Parameters:

Parameter	Type	Description
model_id	`str`	Extractor ID. It always starts with `'ex'`, for example, `'ex_oJNMkt2V'`.
retry_if_throttled	`bool`	If a request is throttled, sleep and retry the request.

Example:

response = ml.extractors.detail('[MODEL_ID]')

List extractors

def MonkeyLearn.extractors.list(page=1, per_page=20, order_by='-created', retry_if_throttled=True)

Parameters:

Parameter	Type	Description
page	`int`	Specifies which page to get.
per_page	`int`	Specifies how many items per page will be returned.
order_by	`string or list`	Specifies the ordering criteria. It can either be a string for single criteria ordering or a list of strings for more than one. Each string must be a valid field name; if you want inverse/descending order of the field prepend a `-` (dash) character. Some valid examples are: `'is_public'`, `'-name'` or `['-is_public', 'name']`.
retry_if_throttled	`bool`	If a request is throttled, sleep and retry the request.

Example:

response = ml.extractors.list(page=2, per_page=5, order_by=['-is_public', 'name'])

Workflows

Workflow detail

def MonkeyLearn.workflows.detail(model_id, step_id, retry_if_throttled=True)

Parameters:

Parameter	Type	Description
model_id	`str`	Workflow ID. It always starts with `'wf'`, for example, `'wf_oJNMkt2V'`.
step_id	`int`	Step ID.
retry_if_throttled	`bool`	If a request is throttled, sleep and retry the request.

Example:

response = ml.workflows.detail('[MODEL_ID]', '[STEP_ID]')

Create workflow

def MonkeyLearn.workflows.create(name, db_name, steps, description='', webhook_url=None,
                                 custom_fields=None, sources=None, retry_if_throttled=True)

Parameters:

Parameter	Type	Description
name	`str`	The name of the model.
db_name	`str`	The name of the database where the data will be stored. The name must not already be in use by another database.
steps	`list[dict]`	A list of step dicts.
description	`str`	The description of the model.
webhook_url	`str`	A URL that will be called when an action is triggered.
custom_fields	`list[dict]`	A list of custom_field dicts that represent user-defined fields that come with the input data and that will be saved. It does not include the mandatory `text` field.
sources	`dict`	An object that represents the data sources of the workflow.

Example:

response = ml.workflows.create(
    name='Example Workflow',
    db_name='example_workflow',
    steps=[{
        'name': 'sentiment',
        'model_id': 'cl_pi3C7JiL'
    }, {
        'name': 'keywords',
        'model_id': 'ex_YCya9nrn'
    }])

Delete workflow

def MonkeyLearn.workflows.delete(model_id, retry_if_throttled=True)

Parameters:

Parameter	Type	Description
model_id	`str`	Workflow ID. It always starts with `'wf'`, for example, `'wf_oJNMkt2V'`.
retry_if_throttled	`bool`	If a request is throttled, sleep and retry the request.

Example:

response = ml.workflows.delete('[MODEL_ID]')

Step detail

def MonkeyLearn.workflows.steps.detail(model_id, step_id, retry_if_throttled=True)

Parameters:

Parameter	Type	Description
model_id	`str`	Workflow ID. It always starts with `'wf'`, for example, `'wf_oJNMkt2V'`.
step_id	`int`	Step ID.
retry_if_throttled	`bool`	If a request is throttled, sleep and retry the request.

Example:

response = ml.workflows.steps.detail('[MODEL_ID]', STEP_ID)

Create step

def MonkeyLearn.workflows.steps.create(model_id, name, step_model_id, input=None,
                                         conditions=None, retry_if_throttled=True)

Parameters:

Parameter	Type	Description
model_id	`str`	Workflow ID. It always starts with `'wf'`, for example, `'wf_oJNMkt2V'`.
name	`str`	The name of the new step.
step_model_id	`str`	The ID of the MonkeyLearn model that will run in this step. Must be an existing classifier or extractor.
input	`str`	Where the input text to use in this step comes from. It can be either the name of a step or `input_data` (the default), which means that the input will be the original text.
conditions	`list[dict]`	A list of condition dicts that indicate whether this step should execute or not. All the conditions in the list must be true for the step to execute.
retry_if_throttled	`bool`	If a request is throttled, sleep and retry the request.

Example:

response = ml.workflows.steps.create(model_id='[MODEL_ID]', name='sentiment',
                                     step_model_id='cl_pi3C7JiL')

Delete step

def MonkeyLearn.workflows.steps.delete(model_id, step_id, retry_if_throttled=True)

Parameters:

Parameter	Type	Description
model_id	`str`	Workflow ID. It always starts with `'wf'`, for example, `'wf_oJNMkt2V'`.
step_id	`int`	Step ID.
retry_if_throttled	`bool`	If a request is throttled, sleep and retry the request.

Example:

response = ml.workflows.steps.delete('[MODEL_ID]', STEP_ID)

Upload workflow data

def MonkeyLearn.workflows.data.create(model_id, data, retry_if_throttled=True)

Parameters:

Parameter	Type	Description
model_id	`str`	Workflow ID. It always starts with `'wf'`, for example, `'wf_oJNMkt2V'`.
data	`list[dict]`	A list of dicts with the keys described below.
retry_if_throttled	`bool`	If a request is throttled, sleep and retry the request.

data dict keys:

Key	Description
text	A string of the text to upload.
[custom field name]	The value for a custom field for this text. The type of the value must be the one specified when the field was created.

Example:

response = ml.workflows.data.create(
    model_id='[MODEL_ID]',
    data=[{'text': 'text 1', 'rating': 3},
          {'text': 'text 2', 'rating': 4}]
)

List workflow data

def MonkeyLearn.workflows.data.list(model_id, batch_id=None, is_processed=None,
                                    sent_to_process_date_from=None, sent_to_process_date_to=None,
                                    page=None, per_page=None, retry_if_throttled=True)

Parameters:

Parameter	Type	Description
page	`int`	The page number to be retrieved.
per_page	`int`	The maximum number of items the page should have. The maximum allowed value is `50`.
batch_id	`int`	The ID of the batch to retrieve. If unspecified, data from all batches is shown.
is_processed	`bool`	Whether to return data that has been processed or data that has not been processed yet. If unspecified, both are shown indistinctly.
sent_to_process_date_from	`str`	An ISO formatted date which specifies the oldest `sent_date` of the data to be retrieved.
sent_to_process_date_to	`str`	An ISO formatted date which specifies the most recent `sent_date` of the data to be retrieved.

Example:

response = ml.workflows.data.list('[MODEL_ID]', batch_id=1839, page=1)
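The two date filters expect ISO formatted strings, which the standard library can produce. The sketch below builds both values for a window of days. date_range_filters is a hypothetical helper, and whether the API expects a timezone suffix is not stated here, so naive datetimes are an assumption:

```python
from datetime import datetime, timedelta


def date_range_filters(end, days):
    """Return ISO formatted from/to strings covering the `days` days before `end`."""
    start = end - timedelta(days=days)
    return {
        'sent_to_process_date_from': start.isoformat(),
        'sent_to_process_date_to': end.isoformat(),
    }


filters = date_range_filters(datetime(2019, 6, 10, 12, 0, 0), days=7)
print(filters)
```

The resulting dict uses the parameter names from the signature above, so it could be passed as keyword arguments to the list call.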

Create custom field

def MonkeyLearn.workflows.custom_fields.create(model_id, name, data_type, retry_if_throttled=True)

Parameters:

Parameter	Type	Description
model_id	`str`	Workflow ID. It always starts with `'wf'`, for example, `'wf_oJNMkt2V'`.
name	`str`	The name of the new custom field.
data_type	`str`	The type of the data of the field. It must be one of `string`, `date`, `text`, `integer`, `float`, `bool`.
retry_if_throttled	`bool`	If a request is throttled, sleep and retry the request.

Example:

response = ml.workflows.custom_fields.create(model_id='[MODEL_ID]', name='rating',
                                             data_type='integer')


Calls to classify endpoint seem to use 1 API call per string, not per list of strings
If I understand the "classify" API documentation correctly, I should be able to pass multiple text strings in a single call and only use 1 API call, correct? I'm basing this on

This endpoint allows you to perform the classification of many text samples using only one request to a custom or public module.

Assuming my understanding is accurate, I am seeing different behavior. Instead of a single API call for a list of text strings, I see one API call per text string.

This is my sample code

import pprint

from monkeylearn import MonkeyLearn
import user_settings

MONKEYLEARN_API_KEY = user_settings.KEY
MODULE_ID = user_settings.MODULE_ID

ml = MonkeyLearn(MONKEYLEARN_API_KEY)
res = ml.classifiers.detail(MODULE_ID)
samples = ['this is a string',
           'a dog barks on tuesday',
           'i like purple muffins',
           'apples are funny fruit',
           'the house is up a mountain',
           'cats do not like trees',
           'computers will take over',
           'you sank my battleship',
           'duck, duck, goose',
           'yellow baby buggy bumpers']

res = ml.classifiers.classify(MODULE_ID, samples, sandbox=True, debug=False)
print("Quota Remaining")
print("-"*20)
pprint.pprint(res.query_limit_remaining, indent=4)

Two runs, back to back, of this print the following:

RUN 1: Quota Remaining

'99940'

RUN 2: Quota Remaining

'99930'

Notice that the remaining quota is 10 apart between runs. This is the same number of sample text strings. If I comment out 5 of them and run it again, I get this output:

Quota Remaining

'99915'

To me, this is confirming that I am using 1 API call to the classify endpoint per string, not per group of strings.

Is this the expected behavior, or should my list of 10 strings only be costing 1 API call?
opened by AWegnerGitHub 2

Bug CLASSIFICATION_ENDPOINT

>>> from monkeylearn import MonkeyLearn
>>> ml = MonkeyLearn('API')
>>> res = ml.classifiers.create('Test Classifier')
>>> module_id = res.result['classifier']['hashed_id']
>>> res = ml.classifiers.detail(module_id)
>>> root_id = res.result['sandbox_categories'][0]['id']
>>> res = ml.classifiers.categories.create(module_id, 'Negative', root_id)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\py34\lib\site-packages\monkeylearn\classification.py", line 20, in categories
    return Categories(self.token)
  File "C:\py34\lib\site-packages\monkeylearn\classification.py", line 110, in __init__
    self.endpoint = CLASSIFICATION_ENDPOINT
NameError: name 'CLASSIFICATION_ENDPOINT' is not defined

I am using Python 3.4.3 64-bit for Windows. Thank you.

opened by wannaphong 2

extraction.py (line 28): AttributeError: 'dict' object has no attribute 'iteritems' in python 3
  File "extraction.py", line 28, in extract
    for key, value in kwargs.iteritems():
AttributeError: 'dict' object has no attribute 'iteritems'

A similar issue is raised in http://stackoverflow.com/a/30418498/2952688: iteritems() was removed in Python 3, so items() has to be used instead.

Commit https://github.com/monkeylearn/monkeylearn-python/commit/f3eee368e13ee37048d52bde0d067efea057fef8 seems to be the culprit.
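For reference, a small sketch of the portable spelling. The kwargs dict below is made-up sample data, not the actual arguments used in extraction.py:

```python
kwargs = {'production_model': True, 'batch_size': 100}

# dict.items() exists on both Python 2 and Python 3;
# dict.iteritems() exists only on Python 2.
pairs = sorted(kwargs.items())
for key, value in pairs:
    print(key, value)

# False when run on Python 3.
has_iteritems = hasattr(kwargs, 'iteritems')
print(has_iteritems)
```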
opened by RomainBrunias 1

bug Resource not found

I followed the example at https://github.com/monkeylearn/monkeylearn-python, but it has a bug.

>>> from monkeylearn import MonkeyLearn
>>> ml = MonkeyLearn('API')
>>> res = ml.classifiers.create('Test Classifier')
>>> module_id = res.result['classifier']['hashed_id']
>>> res = ml.classifiers.detail(module_id)
>>> root_id = res.result['sandbox_categories'][0]['id']
>>> res = ml.classifiers.categories.create(module_id, 'Negative', root_id)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\py34\lib\site-packages\monkeylearn\classification.py", line 119, in create
    self.handle_errors(response)
  File "C:\py34\lib\site-packages\monkeylearn\utils.py", line 78, in handle_errors
    raise MonkeyLearnException(json.dumps(res['detail']))
monkeylearn.exceptions.MonkeyLearnException: Error: "Resource not found. Check your URL."

opened by wannaphong 1

Warn about usage of db_name parameter

The db_name parameter is being deprecated in the API; an automatically generated name will be provided instead. This PR warns users that the parameter will be removed in a future release.

opened by omab 0

Bug "No JSON object could be decoded" exception thrown when running pipeline

Happens on average once every ~5000 requests. This fix didn't solve the issue.

Error message:

    Traceback (most recent call last):
      File "classify_pipe.py", line 32, in <module>
        res = ml.pipelines.run(module_id, data, sandbox=False)
      File "/usr/local/lib/python2.7/dist-packages/monkeylearn/pipelines.py", line 21, in run
        response = self.make_request(url, 'POST', data, sleep_if_throttled)
      File "/usr/local/lib/python2.7/dist-packages/monkeylearn/utils.py", line 46, in make_request
        response_json = response.json()
      File "/usr/lib/python2.7/dist-packages/requests/models.py", line 741, in json
        return json.loads(self.text, **kwargs)
      File "/usr/lib/python2.7/json/__init__.py", line 338, in loads
        return _default_decoder.decode(s)
      File "/usr/lib/python2.7/json/decoder.py", line 366, in decode
        obj, end = self.raw_decode(s, idx=_w(s, 0).end())
      File "/usr/lib/python2.7/json/decoder.py", line 384, in raw_decode
        raise ValueError("No JSON object could be decoded")
    ValueError: No JSON object could be decoded
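
One way callers could work around an occasional non-JSON response body is to catch the `ValueError` and retry. The sketch below is an assumption about a possible workaround, not the library's fix: `parse_json_with_retry` and its `fetch_text` callback are hypothetical names, and the callback stands in for whatever performs the HTTP request and returns the raw body.

```python
import json
import time


def parse_json_with_retry(fetch_text, retries=3, delay=0.0):
    """Call fetch_text() and parse the result as JSON, retrying on bad bodies."""
    last_error = None
    for _ in range(retries):
        text = fetch_text()
        try:
            return json.loads(text)
        except ValueError as exc:
            # An empty or non-JSON body ends up here on both Python 2 and 3
            # (json.JSONDecodeError is a ValueError subclass on Python 3).
            last_error = exc
            time.sleep(delay)
    raise last_error
```

If every attempt fails, the last decoding error is re-raised so the caller still sees the original failure.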

opened by brusteca 0

v3.6.0 (Jul 26, 2021)
Remove deprecated parent_id parameter

Deprecate usage of Workflows db_name

v3.5.2 (Feb 21, 2020)
Removed deprecated parent_id param

v3.5.1 (Oct 17, 2019)
MonkeyLearn Python v3.5.1

Fixed wrong default for algorithm on Classification.create

v3.5.0 (Jun 10, 2019)
MonkeyLearn Python v3.5.0

Added input_duplicates_strategy and existing_duplicates_strategy support to classifiers.upload_data.

v3.4.0 (Jun 6, 2019)
MonkeyLearn Python v3.4.0

Added Workflow API to SDK

v3.2.4 (Jan 11, 2019)
bugfix in how we set the long_description for setup.py

v3.2.3 (Dec 28, 2018)
add user agent in request

v3.2.2 (Dec 28, 2018)
add support for workflows

v3.2.1 (Sep 13, 2018)
Cache response.body to avoid parsing on each access and improve performance when dealing with multiple requests

Improved documentation

v3.2.0 (Jul 26, 2018)
Support for MonkeyLearn API v3.2

Added order_by support in classifier and extractor list endpoints

v3.1.0 (Jul 13, 2018)
Support for MonkeyLearn API v3.1

Added classifiers.train endpoint

Added seconds_to_wait attr to PlanRateLimitError

v3.0.1 (Jul 4, 2018)
Fixed a bug when retrying rate-limited requests

Updated package metadata for PYPI

v3.0.0 (May 28, 2018)
Implemented MonkeyLearn API v3 support

v0.3.7 (Mar 21, 2017)
add category detail endpoint

v0.3.6 (Jan 4, 2017)
Add kwargs parameter to classify function

v0.3.5 (Aug 30, 2016)
Fix six import

v0.3.4 (Aug 30, 2016)
Fix iteritems for python3

v0.3.2 (Jun 15, 2016)
Handle "No JSON object could be decoded" exception

v0.3.1 (May 27, 2016)
Add clustering

v0.3.0 (Apr 29, 2016)
new params for multi-feature

v0.2.5 (Apr 4, 2016)
Fix compatibility with python 3

v0.2.4 (Mar 10, 2016)
Add the debug option to the classify endpoint

v0.2.3 (Mar 4, 2016)
Update samples upload API

v0.2.2 (Mar 3, 2016)
Add list endpoint to the classifiers api

v0.2.1 (Dec 7, 2015)
Bugfix on Categories class

v0.2 (Dec 2, 2015)
Support for python3

Added support for custom API URLs (for development, enterprise, etc...)

Added data param validation in pipelines.run

v0.1.1 (Nov 20, 2015)
Put the library on pip

v0.1.0 (Nov 20, 2015)
First version of the library