solrpy is a Python client for Solr


solrpy

solrpy is a Python client for Solr, an enterprise search server built on top of Lucene. solrpy allows you to add documents to a Solr instance, and then to perform queries and gather search results from Solr using Python.

Overview

Here's the basic idea:

import solr

# create a connection to a solr server
s = solr.SolrConnection('http://example.org:8083/solr')

# add a document to the index
doc = {
  "id": 1,
  "title": "Lucene in Action",
  "author": ["Erik Hatcher", "Otis Gospodnetić"]
}
s.add(doc, commit=True)

# do a search
response = s.query('title:lucene')
for hit in response.results:
    print(hit['title'])

More powerful queries

Optional parameters for querying, faceting, highlighting, and MoreLikeThis can be passed as keyword arguments to the query method. Because a dot is not valid in a Python parameter name, convert the dot notation (e.g. facet.field) to underscore notation (e.g. facet_field).

For example, let's say you wanted to get faceting information in your search result::

response = s.query('title:lucene', facet='true', facet_field='subject')

and if the parameter takes multiple values you just pass them in as a list::

response = s.query('title:lucene', facet='true', facet_field=['subject', 'publisher'])
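
The underscore-to-dot convention described above can be illustrated with a small standalone helper. This is only a sketch of the idea, not solrpy's actual implementation; `to_solr_params` is a hypothetical name introduced here, and the conversion rule is assumed to be a plain character replacement.

```python
def to_solr_params(**kwargs):
    """Convert Python keyword arguments into Solr-style (name, value) pairs.

    Hypothetical helper for illustration only; it is not part of solrpy.
    """
    params = []
    for name, value in kwargs.items():
        # facet_field -> facet.field (assumed rule: every underscore is a dot)
        solr_name = name.replace('_', '.')
        # A list or tuple becomes one (name, value) pair per element.
        values = value if isinstance(value, (list, tuple)) else [value]
        for item in values:
            params.append((solr_name, item))
    return params


pairs = to_solr_params(facet='true', facet_field=['subject', 'publisher'])
for name, value in pairs:
    print(name, value)
```

A replacement this naive would also rewrite underscores that belong to a field name inside a parameter, so a real client would likely need an escape hatch for such cases.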

Tests

To run the tests, you need a running Solr instance. The easiest way to start one is:

curl -sSL https://raw.githubusercontent.com/moliware/travis-solr/master/travis-solr.sh | SOLR_VERSION=4.10.3 SOLR_CONFS=tests bash

Community

Feel free to join our discussion list if you have ideas or suggestions.

Comments
  • Batch updates broken for Solr 1.2+

    Batch updates are currently broken in trunk and have been since at
    least r6. I don't know which version of Solr the feature was written
    for, but it doesn't work with Solr 1.2+.
    
    I'm not sure what to do about it, as I don't use the feature at all.
    Personally, I have nothing against deprecating and removing it, but
    getting it working again could be a good area for contributing to the project.
    

    Original issue reported on code.google.com by [email protected] on 30 Aug 2009 at 6:32

    opened by GoogleCodeExporter 7
  • More flexible sort in queries

    What steps will reproduce the problem?
    
    solr_conn = solr.SolrConnection('http://localhost:8080/solr')
    
    solr_conn.query('test query', sort='field1 asc, field2 desc')
    
    This doesn't work. I get a "400 Bad Request" response.
    
    According to the docstrings, the sort param could be a string in the format
    that SOLR expects, or a python list/tuple of field names.
    
    I've attached a diff with a proposed fix for this.
    
    
    

    Original issue reported on code.google.com by augusto%[email protected] on 26 May 2009 at 8:35

    opened by GoogleCodeExporter 6
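
The behaviour the reporter expects (a sort given either as a Solr-style string or as a list/tuple of field names) could be normalized along these lines. This is a sketch under assumptions, not the attached patch: `normalize_sort` and its `default_order` parameter are hypothetical, and function-query sorts that themselves contain commas are not handled.

```python
def normalize_sort(sort, default_order='asc'):
    """Return a Solr sort string from a string or a list/tuple of fields.

    Hypothetical helper; the name and the default order are assumptions.
    """
    if isinstance(sort, str):
        clauses = [c.strip() for c in sort.split(',') if c.strip()]
    else:
        clauses = [str(c).strip() for c in sort]
    normalized = []
    for clause in clauses:
        parts = clause.split()
        if len(parts) == 1:
            # No explicit direction given: fall back to the default order.
            parts.append(default_order)
        normalized.append(' '.join(parts))
    return ','.join(normalized)


print(normalize_sort('field1 asc, field2 desc'))
print(normalize_sort(['field1', 'field2 desc']))
```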
  • Python 3 support.

    I've added support for Python3 to solrpy. The root of the repo has been split into python2 and python3 subdirs. All the tests of the Python2 version have been appropriately updated and are 100% passing on Python3.

    I've also updated setup.py so that it will install the correct version of solrpy given the version of Python the user is running.

    I updated solrpy to run on Python 3 around September, and the Python 3 version has been running on the Elvis Database since then. I only just updated the tests and setup this week, and feel it's ready to pull.

    This version is currently hosted on PyPI as solrpy3, but I think it would be great to add these changes to the main solrpy dist. The repo is already set up to support both Python 2 and Python 3 installs from PyPI.

    -Alex

    opened by agpar 5
  • Add LICENSE and include in MANIFEST?

    Hey-lo,

    I'm building a version of solrpy using conda for conda-forge. When possible, we try to include a link to the license file in the meta.yaml specification for the build; doing so requires that the license be in the repository and that it be listed in the MANIFEST.in file so that it gets included in the source distribution. Would you consider adding a separate copy of the license info outside of the metadata, where it currently lives, and including it in the manifest?

    opened by pmlandwehr 4
  • SolrConnection can post mal-formed XML

    It's easy to create mal-formed XML posts to Solr, and difficult to create
    an efficient (single-POST) multi-document add or delete:
    
      conn = solr.SolrConnection('http://solr.example.net/')
      conn.begin_batch()
      conn.delete_many(['one', 'two'])
      conn.end_batch(commit=True)
    
    will cause this XML to be POSTed:
    
      <delete><id>one</id></delete><delete><id>two</id></delete><commit/>
    
    This should produce two POSTs:
    
      <delete><id>one</id><id>two</id></delete>
    
    and:
    
      <commit/>
    
    I'm using solrpy 0.5 (installed from PyPI using zc.buildout).
    

    Original issue reported on code.google.com by fdrake on 6 May 2009 at 5:47

    opened by GoogleCodeExporter 4
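
The single-element form the reporter asks for can be produced with the standard library's XML builder, which also escapes special characters in the ids. This is a minimal sketch, not solrpy code; `build_delete_xml` is a hypothetical helper name.

```python
from xml.etree import ElementTree as ET


def build_delete_xml(ids):
    """Build one <delete> element containing an <id> child per document id.

    Hypothetical helper for illustration; not part of solrpy.
    """
    root = ET.Element('delete')
    for doc_id in ids:
        ET.SubElement(root, 'id').text = str(doc_id)
    # encoding='unicode' returns a str without an XML declaration.
    return ET.tostring(root, encoding='unicode')


delete_xml = build_delete_xml(['one', 'two'])
print(delete_xml)
```

The commit would then be sent as its own request, so that each POST body holds exactly one root element.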
  • post_header with s.query()

    https://code.google.com/p/solrpy/issues/detail?id=34

    What steps will reproduce the problem?

    >>> import solr
    >>> s = solr.SolrConnection('https://example.com/solr', post_headers={'x-api-key': 'xxxyyyzzz'})
    >>> s.query("fred")
    

    What is the expected output? What do you see instead?

    I expect to see the x-api-key header on the server side

    What version of the product are you using? On what operating system? pip install solrpy version on OS X

    Please provide any additional information below.

    here is a patch that fixes it for me

    https://code.google.com/r/briantinglecdliborg-solrpyx/source/detail?r=d83b854e3b301aa1f433e52c7c5e617c1b4b303e

    opened by tingletech 3
  • Retry post if badstatusline received

    In the post method, we added retries to handle BadStatusLine errors:
    
            attempts = 4
            while attempts:
                caught_exception = False
                try:
                    self.conn.request('POST', url, body.encode('utf-8'), headers)
                    return check_response_status(self.conn.getresponse())
                except (socket.error,
                        httplib.ImproperConnectionState,
                        httplib.BadStatusLine):
                    # We include BadStatusLine because such errors are
                    # spurious and may randomly happen on an otherwise
                    # fine Solr connection (though not often).
                    time.sleep(1)
                    caught_exception = True
                except SolrHTTPException:
                    msg = "HTTP error. %s tries left; retrying...\n" % attempts
                    sys.stderr.write(msg)
                    time.sleep(20)
                    caught_exception = True
                if caught_exception:
                    self._reconnect()
                    attempts -= 1
                    if not attempts:
                        raise
    
    
    

    Original issue reported on code.google.com by [email protected] on 24 Mar 2008 at 4:16

    opened by GoogleCodeExporter 3
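
The same retry idea can be written as a standalone Python 3 helper. This is a sketch under assumptions rather than the reporter's patch: `retry`, `retriable`, `on_retry`, and the defaults are hypothetical names. The bare `raise` sits inside the `except` block, because Python 3 has no active exception to re-raise once that block has ended.

```python
import time


def retry(operation, retriable, attempts=4, delay=1.0,
          on_retry=None, sleep=time.sleep):
    """Call operation() until it succeeds or the attempts run out.

    Hypothetical helper: all names and defaults here are assumptions.
    """
    for remaining in range(attempts - 1, -1, -1):
        try:
            return operation()
        except retriable:
            if remaining == 0:
                raise  # out of attempts: propagate the last error
            sleep(delay)
            if on_retry is not None:
                on_retry()  # e.g. re-open the connection before retrying
```

A caller would pass the request as `operation`, a tuple of exception classes as `retriable`, and a reconnect function as `on_retry`.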
  • fix travis tests

    I'm not sure why the first commit fixes things; I based it on this:

    https://github.com/key/solrpy/commit/5b5f14d17506a843290fccaf645d34f229a25714

    That got it down to 4 errors, and then this http://stackoverflow.com/a/7790127/1763984 fixed those errors

    opened by tingletech 2
  • numFound is a string

    What steps will reproduce the problem?
    1. Issue a query
    2. Use a condition like if response.numFound > 100
    3. Python enters the condition nonetheless
    4. isinstance(response.numFound, long) == False
    
    What is the expected output? What do you see instead?
    The numFound attribute should be of type long; similarly, start should 
    be long and maxScore should be float.
    
    
    

    Original issue reported on code.google.com by [email protected] on 18 Feb 2010 at 5:37

    opened by GoogleCodeExporter 2
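
The symptom matches Python 2's ordering rules, under which a str compares greater than any number instead of raising an error. The conversion the reporter asks for could look like the sketch below, which works on a plain dict of string attributes; `coerce_result_attrs` is a hypothetical helper, and `int` stands in for Python 2's `long`.

```python
def coerce_result_attrs(attrs):
    """Convert numFound/start to int and maxScore to float.

    Hypothetical helper operating on a plain dict of string attributes.
    """
    converters = {'numFound': int, 'start': int, 'maxScore': float}
    return {
        # Attributes without a known converter are kept as strings.
        name: converters.get(name, str)(value)
        for name, value in attrs.items()
    }


attrs = coerce_result_attrs({'numFound': '42', 'start': '0', 'maxScore': '1.5'})
print(attrs['numFound'] > 100)
```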
  • Debug mode

    solrpy currently lacks a way to determine exactly what (encoded) query was 
    sent to the Solr server and what response was received from the Solr server.
    
    This is very inconvenient when developing against a potentially misbehaving 
    SOLR server.
    
    Attached here is a simple patch that adds a "debug" flag parameter to 
    SolrConnection constructor.
    
    Usage:
    
    conn = solr.SolrConnection('http://localhost:8080/solr/', debug=True)
    
    From then on, at each request passing through conn, solrpy will log at INFO 
    level (using the logging module) both the passed in, encoded parameters and 
    the resulting XML, making it easier to determine where a bug lies.
    

    Original issue reported on code.google.com by [email protected] on 4 Feb 2010 at 10:03

    opened by GoogleCodeExporter 2
  • raw_query turns params into unicode, but urllib in python 2.x doesn't actually handle it

    raw_query() turns all its parameters into unicode objects, and then calls 
    urllib.urlencode(), which can't really handle unicode in python 2.x, so it 
    just turns non-ascii into ?'s. This makes it impossible to hand non-ascii to 
    solr queries. If you encode to utf-8 before passing it to raw_query, the 
    unicode() constructor throws an exception.
    
    query() doesn't do this to its extra parameters, so to me it seems that 
    raw_query shouldn't either. Patch attached.
    

    Original issue reported on code.google.com by [email protected] on 19 Jan 2010 at 10:04

    opened by GoogleCodeExporter 2
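
For comparison, Python 3's `urllib.parse.urlencode` percent-encodes str values as UTF-8 by default, so non-ASCII query text survives a round trip. The sketch below shows only the standard-library behaviour and makes no claim about how raw_query is implemented; the query text is an arbitrary example.

```python
from urllib.parse import parse_qs, urlencode

# Non-ASCII characters are percent-encoded as UTF-8 bytes by default.
encoded = urlencode({'q': 'author:Gospodnetić', 'rows': 10})
print(encoded)

# Decoding the query string recovers the original text.
decoded = parse_qs(encoded)
print(decoded['q'][0])
```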
  • 1.0.0 has been released with a print ?

    Hello, We've been indexing with Solr.add_many successfully for a few years, but after an update to 1.0.0 we noticed our logs contained a copy of all documents.

    It seems that the 1.0.0 package published on PyPI contains a print(''.join(lst)); that's really surprising, as it's not in the repository (https://github.com/search5/solrpy/blob/master/solr/core.py#L518)

    Can someone double-check ? If it is really there, please re-package a 1.0.1 release.

    opened by martinkirch-luxia 1
  • Support for "q.op" parameter

    Starting with Solr 7.0, this parameter is essential because defaultOperator was dropped from the schema. Because of the dot in its name, it is not straightforward to specify in solrpy using kwargs. So, let's support it natively.

    opened by rkosenko 1
  • Remove use of `eval`

    This is extremely dangerous when processing field settings from untrusted sources: https://github.com/search5/solrpy/blob/master/solr/core.py#L1116

    For Open ONI, we're considering just forking the repo to avoid this problem. A simple prototype has verified that it's trivial to construct a "value" which can run arbitrary code before the key check occurs. Given how often data is going to be stored from unknown sources, this just isn't an option for us.

    opened by jechols 0
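
One common alternative to `eval` for untrusted input is `ast.literal_eval`, which accepts only Python literals and never calls functions. The sketch below shows that technique in isolation and makes no assumption about what the linked line actually evaluates; `parse_literal` is a hypothetical helper name.

```python
import ast


def parse_literal(text):
    """Parse a Python literal without executing any code.

    Hypothetical helper; returns the raw text when it is not a literal.
    """
    try:
        return ast.literal_eval(text)
    except (ValueError, SyntaxError):
        return text


safe = parse_literal("{'indexed': True}")
rejected = parse_literal("__import__('os').getcwd()")
print(safe)
print(rejected)
```

Here the function call is returned as an unevaluated string instead of being run. Very large or deeply nested input can still exhaust memory or the recursion limit, so size limits remain advisable.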
  • README file causes UnicodeDecodeError in setup.py

    It seems that the UTF-8 name Otis Gospodnetić causes a problem for the Unicode decoder, which doesn't know what to do with the c with the acute accent.

    Possible fix: change https://github.com/search5/solrpy/blob/master/setup.py#L10 to: with open('README.md', "r", encoding="utf-8", errors="ignore") as f:

    opened by AndrewGearhart 0
  • Unicode Problems

    Calling unicode() on some types, such as Decimal, does not work well: the value is serialized as "Decimal xxx" rather than as a plain number string, which can cause errors. This can be worked around by manually converting the value to str, but some cases cannot be avoided. For example, to delete a single field: { "id": 1, "fieldToBeDelete": {"set": None} }

    None is serialized as "None" rather than null, so the Solr server accepts it as a new value instead of deleting the field. This can have serious consequences.

    Can you fix this problem, or provide some option for replacing 'unicode'?

    opened by nexttonever 0
Owner
Jiho Persy Lee
Korea National Open University Graduate Computer Science Dept. Database System