Pysolr — Python Solr client

Overview

pysolr

pysolr is a lightweight Python client for Apache Solr. It provides an interface that queries the server and returns results based on the query.

Features

  • Basic operations such as selecting, updating & deleting.
  • Index optimization.
  • "More Like This" support (if set up in Solr).
  • Spelling correction (if set up in Solr).
  • Timeout support.
  • SolrCloud awareness.

Requirements

  • Python 2.7 - 3.7
  • Requests 2.9.1+
  • Optional - simplejson
  • Optional - kazoo for SolrCloud mode
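kazoo is only needed for SolrCloud mode, so code that must run with or without it can check for it before choosing between `pysolr.Solr` and `pysolr.SolrCloud`. Here is a minimal sketch using the standard library's `importlib.util.find_spec`. `has_module` and `SOLRCLOUD_AVAILABLE` are hypothetical names, not part of pysolr.

```python
import importlib.util


def has_module(name: str) -> bool:
    """Return True if the top-level module `name` can be imported here."""
    return importlib.util.find_spec(name) is not None


# Hypothetical flag: kazoo is only required for pysolr.SolrCloud / pysolr.ZooKeeper.
SOLRCLOUD_AVAILABLE = has_module("kazoo")
```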

Installation

pysolr is on PyPI:

$ pip install pysolr

Or if you want to install directly from the repository:

$ python setup.py install

Usage

Basic usage looks like:

# If on Python 2.X
from __future__ import print_function

import pysolr

# Create a client instance. The timeout and authentication options are not required.
solr = pysolr.Solr('http://localhost:8983/solr/', always_commit=True, timeout=10)
# To authenticate, also pass auth=<a requests-compatible auth object>.

# Note that always_commit defaults to False for performance. You can set
# `always_commit=True` to have commands always update the index immediately, make
# an update call with `commit=True`, or use Solr's `autoCommit` / `commitWithin`
# to have your data be committed following a particular policy.

# Do a health check.
solr.ping()

# How you'd index data.
solr.add([
    {
        "id": "doc_1",
        "title": "A test document",
    },
    {
        "id": "doc_2",
        "title": "The Banana: Tasty or Dangerous?",
        "_doc": [
            { "id": "child_doc_1", "title": "peel" },
            { "id": "child_doc_2", "title": "seed" },
        ]
    },
])

# You can index a parent/child document relationship by
# associating a list of child documents with the special key '_doc'. This
# is helpful for queries that join together conditions on children and parent
# documents.

# Later, searching is easy. In the simple case, just a plain Lucene-style
# query is fine.
results = solr.search('bananas')

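# The ``Results`` object stores the total number of matches (``results.hits``),
# the returned documents (by default the ten most relevant) and any additional
# data like facets/highlighting/spelling/etc. ``len(results)`` counts only the
# documents returned in this response.
print("Saw {0} result(s).".format(results.hits))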

# Just loop over it to access the results.
for result in results:
    print("The title is '{0}'.".format(result['title']))

# For a more advanced query, say involving highlighting, you can pass
# additional options to Solr.
results = solr.search('bananas', **{
    'hl': 'true',
    'hl.fragsize': 10,
})

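# Traverse a cursor using its iterator. Solr requires the sort to include the
# uniqueKey field (``id`` here) whenever cursorMark is used.
for doc in solr.search('*:*', fl='id', sort='id asc', cursorMark='*'):
    print(doc['id'])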

# You can also perform More Like This searches, if your Solr is configured
# correctly.
similar = solr.more_like_this(q='id:doc_2', mltfl='text')

# Finally, you can delete either individual documents,
solr.delete(id='doc_1')

# also in batches...
solr.delete(id=['doc_1', 'doc_2'])

# ...or all documents.
solr.delete(q='*:*')

# For SolrCloud mode (requires kazoo), initialize your client like this:
zookeeper = pysolr.ZooKeeper("zkhost1:2181,zkhost2:2181,zkhost3:2181")
solr = pysolr.SolrCloud(zookeeper, "collection1")
# Pass auth=<a requests-compatible auth object> if your cluster requires authentication.
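The cursor loop above relies on the Results iterator to follow the cursor. If you want explicit control, the same deep-paging protocol can be written by hand. Send `cursorMark='*'` first, then resend the query with each response's `nextCursorMark` until Solr returns the cursor it was given. This sketch assumes the results object exposes `docs` and `nextCursorMark` attributes and that the uniqueKey field is `id`. `iterate_with_cursor` is a hypothetical helper, and `search` is any callable shaped like `solr.search`.

```python
from typing import Any, Callable, Dict, Iterator


def iterate_with_cursor(
    search: Callable[..., Any],
    query: str = "*:*",
    rows: int = 100,
    sort: str = "id asc",  # assumption: `id` is the schema's uniqueKey
    **params: Any,
) -> Iterator[Dict[str, Any]]:
    """Yield every matching document using Solr cursor-based deep paging."""
    cursor = "*"  # Solr's sentinel meaning "start from the beginning"
    while True:
        results = search(query, rows=rows, sort=sort, cursorMark=cursor, **params)
        for doc in results.docs:
            yield doc
        next_cursor = results.nextCursorMark
        # Solr marks the end of the result set by returning the same cursor again.
        if next_cursor is None or next_cursor == cursor:
            return
        cursor = next_cursor
```

With a real client this would be called as `iterate_with_cursor(solr.search, fl='id')`. An explicit loop is also a natural place to record the last cursor if a long export needs to resume later.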

Multicore Index

Simply point the URL to the index core:

# Set up a Solr instance. The timeout is optional.
solr = pysolr.Solr('http://localhost:8983/solr/core_0/', timeout=10)

Custom Request Handlers

# Set up a Solr instance. The trailing slash is optional.
solr = pysolr.Solr('http://localhost:8983/solr/core_0/', search_handler='/autocomplete', use_qt_param=False)

If use_qt_param is True, the handler name must exactly match what is configured in solrconfig.xml, including the leading slash, if any. If use_qt_param is False (the default), the leading and trailing slashes can be omitted.

If search_handler is not specified, pysolr will default to /select.

The handlers for MoreLikeThis, Update, Terms, etc. all default to the names used in the solrconfig.xml that ships with Solr: mlt, update, terms, etc. The individual methods of pysolr's Solr class (such as more_like_this and suggest_terms) accept a handler keyword argument that overrides that value. This includes the search method: setting a handler in search explicitly overrides the search_handler setting, if any.
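The following sketch makes the handler rules concrete. It computes which URL and `qt` parameter a search would use. It is a simplified model of the behaviour described above, written for illustration, and it is not pysolr's internal code. `resolve_search_target` and its per-call `handler` parameter are hypothetical names.

```python
from typing import Dict, Optional, Tuple


def resolve_search_target(
    base_url: str,
    search_handler: Optional[str] = None,
    use_qt_param: bool = False,
    handler: Optional[str] = None,
) -> Tuple[str, Dict[str, str]]:
    """Return (request URL, extra query params) for a search request.

    A per-call `handler` takes precedence over the client-wide
    `search_handler`; if neither is set, `select` is used.
    """
    chosen = handler or search_handler or "select"
    params: Dict[str, str] = {}
    if use_qt_param:
        # The name travels verbatim in ?qt=..., so it must match solrconfig.xml exactly.
        params["qt"] = chosen
        path = "select"
    else:
        # The name becomes a URL path segment, so surrounding slashes do not matter.
        path = chosen.strip("/")
    return base_url.rstrip("/") + "/" + path, params
```

With use_qt_param=True, the request goes to /select and carries the handler name as qt. That is why the leading slash has to match the configuration exactly.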

Custom Authentication

# Set up a Solr instance in a Kerberized environment
from requests_kerberos import HTTPKerberosAuth, OPTIONAL
kerberos_auth = HTTPKerberosAuth(mutual_authentication=OPTIONAL, sanitize_mutual_error_response=False)

solr = pysolr.Solr('http://localhost:8983/solr/', auth=kerberos_auth)

# Set up a SolrCloud instance in a Kerberized environment
from requests_kerberos import HTTPKerberosAuth, OPTIONAL
kerberos_auth = HTTPKerberosAuth(mutual_authentication=OPTIONAL, sanitize_mutual_error_response=False)

# A ZooKeeper chroot (here /solr) is written once, after the last host.
zookeeper = pysolr.ZooKeeper("zkhost1:2181,zkhost2:2181,...,zkhostN:2181/solr")
solr = pysolr.SolrCloud(zookeeper, "collection", auth=kerberos_auth)
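ZooKeeper connection strings list the `host:port` pairs separated by commas, with no spaces. An optional chroot path (such as `/solr`) is appended once, after the last host, and applies to the whole ensemble. The small helper below assembles such a string. `zk_connection_string` is a hypothetical name, and the host names are placeholders.

```python
from typing import Optional, Sequence


def zk_connection_string(hosts: Sequence[str], chroot: Optional[str] = None) -> str:
    """Join ZooKeeper host:port pairs and append an optional chroot path."""
    if not hosts:
        raise ValueError("at least one ZooKeeper host is required")
    joined = ",".join(h.strip() for h in hosts)
    if chroot:
        # The chroot applies to the whole ensemble, so it appears exactly once, at the end.
        joined += "/" + chroot.strip("/")
    return joined
```

Usage: `zookeeper = pysolr.ZooKeeper(zk_connection_string(["zkhost1:2181", "zkhost2:2181"], "/solr"))`.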

If your Solr servers run over HTTPS

# Set up a Solr instance in an HTTPS environment
solr = pysolr.Solr('https://localhost:8983/solr/', verify='path/to/cert.pem')

# Set up a SolrCloud instance in an HTTPS environment
zookeeper = pysolr.ZooKeeper("zkhost1:2181,zkhost2:2181,...,zkhostN:2181/solr")
solr = pysolr.SolrCloud(zookeeper, "collection", verify='path/to/cert.pem')

Custom Commit Policy

# Set up a Solr instance. The trailing slash is optional.
# All update requests to Solr will be committed immediately because `always_commit=True`:
solr = pysolr.Solr('http://localhost:8983/solr/core_0/', search_handler='/autocomplete', always_commit=True)

always_commit tells the Solr object whether to commit by default after each update request. If you are upgrading from a version whose default policy was to always commit, be sure to set this to True.

Methods like add and delete still let you override the default by passing the commit keyword argument.

It is generally good practice to limit the number of commits sent to Solr, as excessive commits risk opening too many searchers or consuming excessive system resources. See the Solr documentation for more information about the autoCommit and commitWithin options:

https://lucene.apache.org/solr/guide/7_7/updatehandlers-in-solrconfig.html#UpdateHandlersinSolrConfig-autoCommit
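Following that advice, a bulk load can send documents in batches without committing and then issue a single commit at the end. This sketch assumes the client exposes `add(docs, commit=...)` and `commit()`, as `pysolr.Solr` does. `index_in_batches` and the batch size of 500 are illustrative choices, not pysolr defaults.

```python
from itertools import islice
from typing import Any, Dict, Iterable, Iterator, List


def _chunks(items: Iterable[Dict[str, Any]], size: int) -> Iterator[List[Dict[str, Any]]]:
    """Yield successive lists of at most `size` items."""
    it = iter(items)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def index_in_batches(solr: Any, docs: Iterable[Dict[str, Any]], batch_size: int = 500) -> int:
    """Add `docs` in batches without per-batch commits, then commit once.

    Returns the number of documents sent.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be positive")
    sent = 0
    for batch in _chunks(docs, batch_size):
        # An explicit commit=False overrides always_commit=True for intermediate requests.
        solr.add(batch, commit=False)
        sent += len(batch)
    if sent:
        solr.commit()  # a single commit makes every batch visible to searchers
    return sent
```

If your Solr configuration already commits through autoCommit or commitWithin, the final explicit commit may be unnecessary.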

LICENSE

pysolr is licensed under the New BSD license.

Contributing to pysolr

For consistency, this project uses pre-commit to manage Git commit hooks:

  1. Install the pre-commit package: e.g. brew install pre-commit, pip install pre-commit, etc.
  2. Run pre-commit install each time you check out a new copy of this Git repository. This ensures that every subsequent commit is processed by pre-commit run, which you can also invoke manually at any time. To check every file rather than just the staged ones (for example, in CI), use pre-commit run --all-files.

Running Tests

The run-tests.py script will automatically perform the steps below and is recommended for testing by default unless you need more control.

Running a test Solr instance

Downloading, configuring and running Solr 4 looks like this:

$ ./start-solr-test-server.sh

Running the tests

$ python -m unittest tests

Comments
  • Feature/solrcloud

    An extension to pysolr to make it ZooKeeper/SolrCloud aware. This is cloned from the code in the SolrJ client. The tests are limited to proving that this does not break existing functionality, although I have tested (manually) that it correctly fails over between nodes when a node in a cluster fails.

    Commit Checklist

    • [ ] Test coverage for ZooKeeper / SolrCloud error states
    • [x] Add a Travis test matrix which runs without Kazoo installed to confirm that nothing breaks for traditional usage (the SolrCloud tests are supposed to be skipped)
    • [ ] Support SolrCloud 5 and have a Travis test matrix entry for both major versions
    • [x] Add test that confirms that Pysolr fails over correctly when one of the Solr nodes disappears (can be simulated with kill -STOP and kill -CONT)
    feature stale 
    opened by upayavira 45
  • delete(): allow batch delete by id

    In order to speed up batch deletes, Solr supports sending multiple document id values in a single command:

      <delete>
        <id>3</id>
        <id>14</id>
        <id>15</id>
        <id>62</id>
      </delete>
    

    Also added basic testing of the feature in tests/client.py.

    opened by cosimo 20
  • add() should not commit by default

    In Solr, a "commit" is a heavy-duty operation and shouldn't be taken lightly. A Solr client API like PySolr should NOT tell Solr to commit without the user code's deliberate request for it.

    Of course, PySolr has been around for a while now and you can't simply change it without breaking user expectations; I'm not sure what the solution is.

    feature documentation stale 
    opened by dsmiley 19
  • TypeError: Element() keywords must be strings

    Traceback (most recent call last):
      File "/storage/pydev/feelfree-v4/feelfree/../lib/haystack/management/commands/update_index.py", line 210, in handle_label
        self.update_backend(label, using)
      File "/storage/pydev/feelfree-v4/feelfree/../lib/haystack/management/commands/update_index.py", line 256, in update_backend
        do_update(backend, index, qs, start, end, total, self.verbosity)
      File "/storage/pydev/feelfree-v4/feelfree/../lib/haystack/management/commands/update_index.py", line 78, in do_update
        backend.update(index, current_qs)
      File "/storage/pydev/feelfree-v4/feelfree/../lib/haystack/backends/solr_backend.py", line 66, in update
        self.conn.add(docs, commit=commit, boost=index.get_field_weights())
      File "/storage/pydev/feelfree-v4/feelfree/../lib/pysolr.py", line 740, in add
        message.append(self._build_doc(doc, boost=boost))
      File "/storage/pydev/feelfree-v4/feelfree/../lib/pysolr.py", line 695, in _build_doc
        field = ET.Element('field', **attrs)
      File "lxml.etree.pyx", line 2560, in lxml.etree.Element (src/lxml/lxml.etree.c:52924)
    TypeError: Element() keywords must be strings
    

    attrs looks like {u'name': 'django_ct'}

    env:

    • Linux 3.5.0-22-generic #34-Ubuntu SMP Tue Jan 8 21:47:00 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux
    • python 2.7.3
    • pysolr 3.0.5
    • lxml 2.3.5

    Possible fix (works for me)

    @@ -687,10 +687,10 @@ class Solr(object):
                     if self._is_null_value(bit):
                         continue
    
    -                attrs = {'name': key}
    +                attrs = {str('name'): key}
    
                     if boost and key in boost:
    -                    attrs['boost'] = force_unicode(boost[key])
    +                    attrs[str('boost')] = force_unicode(boost[key])
    
                     field = ET.Element('field', **attrs)
                     field.text = self._from_python(bit)
    
    opened by srusskih 18
  • Add ability to declare Solr object with default commit policy

    Since Solr has hooks to auto commit after X documents or X seconds, it is best that the Solr methods get, delete, and _update do not have a policy of committing by default.

    One may already assume this to be false by default, since it is good practice to limit commits. But in case users rely on this functionality, the default policy can be declared in the Solr constructor.

    opened by efagerberg 16
  • How to use deep paging (cursorMark) with pysolr?

    results = solr.search('query', **{'fq':'title', 'cursorMark':'', 'sort':'id desc'})

    But I can't see nextCursorMark in the results. For more details on using cursorMark, see http://heliosearch.org/solr/paging-and-deep-paging/

    Any solution?

    opened by yspanchal 13
  • Add support for error response in JSON format. Closes #108, #109

    Since #60 is probably too big, I propose to support at least JSON responses with this PR. Also, I do not agree with how #92, #108 and #109 are trying to address the issue.

    Closes #108, #109.

    @toastdriven I think #92 should be fixed properly via #60 or something similar for new XML responses.

    opened by andreif 12
  • Solr errors throw TypeError

    Stuff I have:

    • Python 3.3
    • Solr 4.5.1
    • pysolr==3.1.0
    • django-haystack==2.1.0
    • lxml==3.2.3

    During rebuild_index if Solr kicks back an error message I get this traceback:

    Removing all documents from your index because you said so.
    All documents removed.
    Indexing 1785 songs
    DEBUG 2013-11-10 16:48:07,096 base 21556 139963105810240 Sending message of length 7208 to http://sentry.phazemedia.com/api/9/store/
    Traceback (most recent call last):
      File "./manage.py", line 15, in <module>
        execute_from_command_line(sys.argv)
      File "/var/www/_sites/plumradio_20131110163934/env/lib/python3.3/site-packages/django/core/management/__init__.py", line 399, in execute_from_command_line
        utility.execute()
      File "/var/www/_sites/plumradio_20131110163934/env/lib/python3.3/site-packages/django/core/management/__init__.py", line 392, in execute
        self.fetch_command(subcommand).run_from_argv(self.argv)
      File "/var/www/_sites/plumradio_20131110163934/env/lib/python3.3/site-packages/django/core/management/base.py", line 242, in run_from_argv
        self.execute(*args, **options.__dict__)
      File "/var/www/_sites/plumradio_20131110163934/env/lib/python3.3/site-packages/django/core/management/base.py", line 285, in execute
        output = self.handle(*args, **options)
      File "/var/www/_sites/plumradio_20131110163934/env/lib/python3.3/site-packages/haystack/management/commands/rebuild_index.py", line 16, in handle
        call_command('update_index', **options)
      File "/var/www/_sites/plumradio_20131110163934/env/lib/python3.3/site-packages/django/core/management/__init__.py", line 159, in call_command
        return klass.execute(*args, **defaults)
      File "/var/www/_sites/plumradio_20131110163934/env/lib/python3.3/site-packages/django/core/management/base.py", line 285, in execute
        output = self.handle(*args, **options)
      File "/var/www/_sites/plumradio_20131110163934/env/lib/python3.3/site-packages/haystack/management/commands/update_index.py", line 195, in handle
        return super(Command, self).handle(*items, **options)
      File "/var/www/_sites/plumradio_20131110163934/env/lib/python3.3/site-packages/django/core/management/base.py", line 385, in handle
        label_output = self.handle_label(label, **options)
      File "/var/www/_sites/plumradio_20131110163934/env/lib/python3.3/site-packages/haystack/management/commands/update_index.py", line 221, in handle_label
        self.update_backend(label, using)
      File "/var/www/_sites/plumradio_20131110163934/env/lib/python3.3/site-packages/haystack/management/commands/update_index.py", line 267, in update_backend
        do_update(backend, index, qs, start, end, total, self.verbosity)
      File "/var/www/_sites/plumradio_20131110163934/env/lib/python3.3/site-packages/haystack/management/commands/update_index.py", line 89, in do_update
        backend.update(index, current_qs)
      File "/var/www/_sites/plumradio_20131110163934/env/lib/python3.3/site-packages/haystack/backends/solr_backend.py", line 68, in update
        self.conn.add(docs, commit=commit, boost=index.get_field_weights())
      File "/var/www/_sites/plumradio_20131110163934/env/lib/python3.3/site-packages/pysolr.py", line 754, in add
        return self._update(m, commit=commit, waitFlush=waitFlush, waitSearcher=waitSearcher)
      File "/var/www/_sites/plumradio_20131110163934/env/lib/python3.3/site-packages/pysolr.py", line 362, in _update
        return self._send_request('post', path, message, {'Content-type': 'text/xml; charset=utf-8'})
      File "/var/www/_sites/plumradio_20131110163934/env/lib/python3.3/site-packages/pysolr.py", line 293, in _send_request
        error_message = self._extract_error(resp)
      File "/var/www/_sites/plumradio_20131110163934/env/lib/python3.3/site-packages/pysolr.py", line 372, in _extract_error
        reason, full_html = self._scrape_response(resp.headers, resp.content)
      File "/var/www/_sites/plumradio_20131110163934/env/lib/python3.3/site-packages/pysolr.py", line 442, in _scrape_response
        full_html = full_html.replace('\n', '')
    TypeError: expected bytes, bytearray or buffer compatible object
    

    And here's the culprit variable:

    > /var/www/_sites/plumradio_20131110163934/env/lib/python3.3/site-packages/pysolr.py(443)_scrape_response()
    -> full_html = full_html.replace('\n', '')
    (Pdb) full_html
    b'<response>\n<lst name="responseHeader"><int name="status">400</int><int name="QTime">1</int></lst><lst name="error"><str name="msg">ERROR: [doc=songs.song.1] unknown field \'django_ct\'</str><int name="code">400</int></lst>\n</response>'
    (Pdb)
    

    The actual error is a result of me forgetting to copy my schema.xml file (wrist slap), but the resulting TypeError is not graceful.

    stale 
    opened by dustinfarris 12
  • fixed lxml errors when reading Tomcat error messages.

    When parsing error messages, pysolr assumes that Tomcat will send a certain flavour of invalid response.

    At some point in Tomcat 6 (or maybe Solr 4), the assumption this code was based on became untrue, so pysolr's error-handling code began raising its own error. This may only be true when using lxml (it's required in my project, so I haven't tested without it).

    This fix prevents pysolr from obscuring the Tomcat error message with its own if it fails to find the tag it's looking for.

    This is what I was getting before making this fix:

    File "/home/webapps/.virtualenvs/myapp/local/lib/python2.7/site-packages/pysolr.py", line 404, in _scrape_response
    p_nodes = body_node.cssselect('p')
    AttributeError: 'NoneType' object has no attribute 'cssselect'
    
    opened by atkinson 12
  • How can pysolr import UTF-8 data into a Solr server?

    I have a request that looks like this:

        solr.add(    [
            {
                "id": "1",
                "title": "đinh bộ linh",
                "content": ["ông bà "]
            }]
    )
    

    When I call this request, I get an error like this:

    Traceback (most recent call last):
      File "C:\Users\dungdb1\AppData\Local\Programs\Python\Python35-32\lib\threading.py", line 914, in _bootstrap_inner
        self.run()
      File "C:\Users\dungdb1\AppData\Local\Programs\Python\Python35-32\lib\threading.py", line 1180, in run
        self.function(*self.args, **self.kwargs)
      File "C:\Users\dungdb1\AppData\Local\Programs\Python\Python35-32\lib\site-packages\mysql_replication-0.7-py3.5.egg\pymysqlreplication\row_event.py", line 739, in AddThread
        solr.add(value["value"])
      File "C:\Users\dungdb1\AppData\Local\Programs\Python\Python35-32\lib\site-packages\pysolr-3.3.3-py3.5.egg\pysolr.py", line 822, in add
        return self._update(m, commit=commit, softCommit=softCommit, waitFlush=waitFlush, waitSearcher=waitSearcher)
      File "C:\Users\dungdb1\AppData\Local\Programs\Python\Python35-32\lib\site-packages\pysolr-3.3.3-py3.5.egg\pysolr.py", line 400, in _update
        return self._send_request('post', path, message, {'Content-type': 'text/xml; charset=utf-8'})
      File "C:\Users\dungdb1\AppData\Local\Programs\Python\Python35-32\lib\site-packages\pysolr-3.3.3-py3.5.egg\pysolr.py", line 309, in _send_request
        timeout=self.timeout)
      File "C:\Users\dungdb1\AppData\Local\Programs\Python\Python35-32\lib\site-packages\requests-2.9.0-py3.5.egg\requests\sessions.py", line 511, in post
        return self.request('POST', url, data=data, json=json, **kwargs)
      File "C:\Users\dungdb1\AppData\Local\Programs\Python\Python35-32\lib\site-packages\requests-2.9.0-py3.5.egg\requests\sessions.py", line 454, in request
        prep = self.prepare_request(req)
      File "C:\Users\dungdb1\AppData\Local\Programs\Python\Python35-32\lib\site-packages\requests-2.9.0-py3.5.egg\requests\sessions.py", line 388, in prepare_request
        hooks=merge_hooks(request.hooks, self.hooks),
      File "C:\Users\dungdb1\AppData\Local\Programs\Python\Python35-32\lib\site-packages\requests-2.9.0-py3.5.egg\requests\models.py", line 308, in prepare
        self.prepare_body(data, files, json)
      File "C:\Users\dungdb1\AppData\Local\Programs\Python\Python35-32\lib\site-packages\requests-2.9.0-py3.5.egg\requests\models.py", line 459, in prepare_body
        body = self._encode_params(data)
      File "C:\Users\dungdb1\AppData\Local\Programs\Python\Python35-32\lib\site-packages\requests-2.9.0-py3.5.egg\requests\models.py", line 85, in _encode_params
        return to_native_string(data)
      File "C:\Users\dungdb1\AppData\Local\Programs\Python\Python35-32\lib\site-packages\requests-2.9.0-py3.5.egg\requests\utils.py", line 702, in to_native_string
        out = string.decode(encoding)
    UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 91: ordinal not in range(128)
    

    How can I send that to Solr using pysolr? Please help! Thanks.

    opened by dungdb1 11
  • Preventing empty strings from being sent to Solr unnecessary?

    _is_null_value prevents fields that map to the empty string from being sent to Solr. Why is that? The comment around this code suggests that this averts some issue within Solr itself, but Solr 4.10 at least seems to be able to handle this. Should pysolr continue to filter out empty string values?

    stale 
    opened by llvtt 10
  • Unable to store custom/nested JSON docs

    I have

    • [x] Tested with the latest release
    • [ ] Tested with the current master branch
    • [x] Searched for similar existing issues

    Expected behaviour

    I want to be able to store nested JSON document in solr, as described here: https://solr.apache.org/guide/solr/latest/indexing-guide/transforming-and-indexing-custom-json.html

    Actual behaviour

    It seems this is not possible at the moment.

    Steps to reproduce the behaviour

    1. Run the following code:
    solr.add([
        {
          "first": "John",
          "last": "Doe",
          "grade": 8,
          "exams": [
            {
              "subject": "Maths",
              "test"   : "term1",
              "marks"  : 90},
            {
              "subject": "Biology",
              "test"   : "term1",
              "marks"  : 86}
          ]
        }
    ])
    

    Configuration

    • Operating system version: Ubuntu
    • Search engine version: solr:9.1.0
    • Python version: 3.8
    • pysolr version: 3.10.0b1
    opened by yoavnash 0
  • Bump certifi from 2021.10.8 to 2022.12.7

    Bumps certifi from 2021.10.8 to 2022.12.7.

    Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


    Dependabot commands and options

    You can trigger Dependabot actions by commenting on this PR:

    • @dependabot rebase will rebase this PR
    • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
    • @dependabot merge will merge this PR after your CI passes on it
    • @dependabot squash and merge will squash and merge this PR after your CI passes on it
    • @dependabot cancel merge will cancel a previously requested merge and block automerging
    • @dependabot reopen will reopen this PR if it is closed
    • @dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
    • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
    • @dependabot use these labels will set the current labels as the default for future PRs for this repo and language
    • @dependabot use these reviewers will set the current reviewers as the default for future PRs for this repo and language
    • @dependabot use these assignees will set the current assignees as the default for future PRs for this repo and language
    • @dependabot use this milestone will set the current milestone as the default for future PRs for this repo and language

    You can disable automated security fix PRs for this repo from the Security Alerts page.

    dependencies 
    opened by dependabot[bot] 0
  • Intermittent Connection Failure

    I am connecting to a Solr cluster using SolrCloud with Kerberos authentication enabled. We get connection failures every morning after the app has been idle overnight.

    2022-09-08 16:56:13,754 - kazoo.client - WARNING - Connection dropped: socket connection error: Connection refused

    We then decided to open and close the connection on every request, but the problem sometimes remains.

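    import logging
    import os
    from contextlib import contextmanager
    from typing import Iterator, Optional

    import pysolr
    from requests_kerberos import HTTPKerberosAuth

    logger = logging.getLogger(__name__)
    # KEYTAB, ZOOKEEPER_URL and PRINCIPAL come from the application's own settings.

    @contextmanager
    def open_connection(collection) -> Iterator[pysolr.SolrCloud]:
        """Create a pysolr SolrCloud connection and close its HTTP session afterwards."""
        os.environ["KRB5_CLIENT_KTNAME"] = KEYTAB
        solr: Optional[pysolr.SolrCloud] = None
        try:
            logger.info("Opening solr connection")
            zookeeper = pysolr.ZooKeeper(ZOOKEEPER_URL)
            kerberos_auth = HTTPKerberosAuth(principal=PRINCIPAL, force_preemptive=True)
            solr = pysolr.SolrCloud(
                zookeeper,
                collection,
                auth=kerberos_auth,
                verify=False,
                search_handler="/select",
            )
            yield solr
        finally:
            if solr:
                logger.info("Closing solr connection")
                solr.get_session().close()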
    

    Configuration

    • Operating system version: RHEL 7.3
    • Search engine version: 8.6
    • Python version: 3.6
    • pysolr version: 3.9.0
    opened by chirdeeptomar 0
  • Unknown operation for the an atomic update with pysolr-3.9.0

    I have

    • [x] Tested with the latest release
    • [ ] Tested with the current master branch
    • [x] Searched for similar existing issues

    Actual behaviour

    While trying to index a dataset, I get this error:

    2022-07-22 18:10:43,208 ERROR [ckan.lib.search] Error while indexing dataset 87d3efd7-ef1c-43b4-b805-2ade2ab9d2b8: SearchIndexError('Solr returned an error: Solr responded with an error (HTTP 400): [Reason: Error:[doc=dc6fb4e7feeabffcb8ce037697ab1e83]  Unknown operation for the an atomic update: publication_date]')
    Solr returned an error: Solr responded with an error (HTTP 400): [Reason: Error:[doc=dc6fb4e7feeabffcb8ce037697ab1e83]  Unknown operation for the an atomic update: publication_date]
    
    

    This only happens with pysolr 3.9.0; everything works fine with pysolr 3.6.0. I believe the error comes from this line when solrapi is set to JSON. If solrapi is set to XML, it works like a charm.

    Here you can find the 'index' logic https://github.com/ckan/ckan/blob/0a596b8394dbf9582902853ad91450d2c0d7959b/ckan/lib/search/index.py

    Here you can find the schema.xml: https://github.com/ckan/ckan/blob/master/ckan/config/solr/schema.xml

    Configuration

    • Operating system version: macOS Monterey 12.4
    • Search engine version: 8.11.1
    • Python version: 3.9.12
    • pysolr version: 3.9.0
    opened by TomeCirun 2
  • Is there a new release for pysolr?

    I saw that a commit (https://github.com/django-haystack/pysolr/commit/f6169e9681b95c23d04829c5c0102480382e8ea6) adding support for a customized session has been merged into the master branch, but there has been no new release since then. Is there a plan for a new release of the package?

    opened by lyle-w 8
Releases (v3.9.0)
  • v3.9.0 (Apr 17, 2020)

  • v3.8.1 (Dec 13, 2018)

    • extract() handles spaces and special characters in filenames
    • Python set() instances are handled automatically just like lists
    • Updates no longer commit by default
  • v3.2.0 (Mar 4, 2014)

    • Better XML serialization: see 0b1fb09803854da3363517043d929141954cc701
    • Update content extraction for newer versions of Solr - see #104, #96, #90
    • Test updates
Owner
Haystack Search
Eland is a Python Elasticsearch client for exploring and analyzing data in Elasticsearch with a familiar Pandas-compatible API.

Python Client and Toolkit for DataFrames, Big Data, Machine Learning and ETL in Elasticsearch

elastic 463 Dec 30, 2022
Whoosh indexing capabilities for Flask-SQLAlchemy, Python 3 compatibility fork.

Flask-WhooshAlchemy3 Whoosh indexing capabilities for Flask-SQLAlchemy, Python 3 compatibility fork. Performance improvements and suggestions are read

Blake VandeMerwe 27 Mar 10, 2022
Senginta is an all-in-one search engine scraper for use via API or as a Python module. It's free!

Senginta is an all-in-one search engine scraper. Using traditional scraping, Senginta can get results from any search engine and convert them to JSON. It currently supports only Google product search engines (GShop, GVideo and many more) and the Baidu search engine.

null 33 Nov 21, 2022
esguard provides a Python decorator that waits for processing while monitoring the load of Elasticsearch.

esguard esguard provides a Python decorator that waits for processing while monitoring the load of Elasticsearch. Quick Start You need to launch elast

po3rin 5 Dec 8, 2021
A real-time tech course finder, created using Elasticsearch, Python, React+Redux, Docker, and Kubernetes.

A real-time tech course finder, created using Elasticsearch, Python, React+Redux, Docker, and Kubernetes.

Dinesh Sonachalam 130 Dec 20, 2022
a Telegram bot written in Python for searching files in Drive. Based on SearchX-bot

Drive Search Bot This is a Telegram bot written in Python for searching files in Drive. Based on SearchX-bot How to deploy? Clone this repo: git clone

Hafitz Setya 25 Dec 9, 2022
Simple algorithm search engine like google in python using function

Mini-Search-Engine-Like-Google I have created the simple algorithm search engine like google in python using function. I am matching every word with w

Sachin Vinayak Dabhade 5 Sep 24, 2021
User-friendly, tiny source code searcher written by pure Python.

User-friendly, tiny source code searcher written in pure Python. Example Usages Cat is equivalent in the regular expression as '^Cat$' bor class Cat

Furkan Onder 106 Nov 2, 2022
This is a Telegram Bot written in Python for searching data on Google Drive.

This is a Telegram Bot written in Python for searching data on Google Drive. Supports multiple Shared Drives (TDs). Manual Guide for deploying the bot

Levi 158 Dec 27, 2022
Pythonic Lucene - A simplified Python implementation of Apache Lucene

A simplified Python implementation of Apache Lucene; maybe it helps to understand how an enterprise search engine really works.

Mahdi Sadeghzadeh Ghamsary 2 Sep 12, 2022
A Python web searcher library with different search engines

Robert A simple Python web searcher library with different search engines. Install pip install roberthelper Usage from robert import GoogleSearcher

null 1 Dec 23, 2021
A fast, efficient Python package for searching and getting search results with many different search engines

search A fast, efficient Python package for searching and getting search results with many different search engines. Installation To install the pack

Neurs 0 Oct 6, 2022
Solrorm : A sort-of solr ORM for python

solrorm : A sort-of solr ORM for python solrpy - deprecated solrorm - currently in dev Usage Cores The first step to interact with solr using solrorm

Aj 1 Nov 21, 2021
Apache Solr SSRF(CVE-2021-27905)

Solr-SSRF Apache Solr SSRF #Use [-] Apache Solr SSRF vulnerability (CVE-2021-27905) [-] Options: -h or --help : usage description -u or --url

Henry4E36 70 Nov 9, 2022
Index different CKAN entities in Solr, not just datasets

ckanext-sitesearch Index different CKAN entities in Solr, not just datasets Requirements This extension requires CKAN 2.9 or higher and Python 3 Featu

Open Knowledge Foundation 3 Dec 2, 2022
Mlflow-rest-client - Python client for MLflow REST API

Python Client for MLflow Python client for MLflow REST API. Features: Minimal de

MTS 35 Dec 23, 2022
Iris-client - Python client for DFIR-IRIS

Python client dfir_iris_client offers a Python interface to communicate with IRI

DFIR-IRIS 11 Dec 22, 2022
league-connection is a python package to communicate to riot client and league client

league-connection is a python package to communicate to riot client and league client.

Sandbox 1 Sep 13, 2022
Dns-Client-Server - Dns Client Server For Python

Dns-client-server DNS Server: supporting all types of queries and replies. Shoul

Nishant Badgujar 1 Feb 15, 2022