Pythonic search engine based on PyLucene.

Overview

Lupyne is a search engine based on PyLucene, the Python extension for accessing Java Lucene. Lucene is a relatively low-level toolkit, and PyLucene wraps it through automatic code generation. So although Java idioms are translated to Python idioms where possible, the resulting interface is far from Pythonic. See ./docs/examples.ipynb for comparisons with the Lucene API.

Lupyne also provides GraphQL and RESTful search services, based on Starlette. Note that Solr and Elasticsearch are popular options for Lucene-based search if no further (Python) customization is needed. So while the services are suitable for production use, their primary motivation is to serve as an extensible example.

Not having to choose up front between an embedded library and a server provides greater flexibility. It can also provide better performance, e.g., by batch indexing offline while serving searches live from a remote server. Additionally, only lightweight wrappers with extended behavior are used wherever possible. Falling back to PyLucene directly is therefore always an option, but it should never be necessary for performance.

Usage

PyLucene requires initializing the VM.

import lucene

lucene.initVM()
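If Lucene is used from more than one Python thread, each extra thread must be attached to the JVM before it touches any Lucene object. Below is a minimal sketch. It assumes PyLucene's JCC helpers `lucene.getVMEnv()`, `lucene.initVM()` and `attachCurrentThread()`. The names `ensure_vm`, `worker` and `run_threads` are hypothetical. PyLucene is imported lazily, so the module loads even where it is not installed.

```python
import threading


def ensure_vm():
    """Start the JVM once per process, reusing it if it is already running."""
    import lucene  # imported lazily; PyLucene is not pip installable

    return lucene.getVMEnv() or lucene.initVM()


def worker(results, index):
    """Hypothetical thread target that would go on to use Lucene objects."""
    import lucene

    # Threads other than the one that started the VM must attach themselves first.
    lucene.getVMEnv().attachCurrentThread()
    results[index] = 'attached'


def run_threads(count=2):
    """Start the VM in this thread, then run `count` attached worker threads."""
    ensure_vm()
    results = [None] * count
    threads = [threading.Thread(target=worker, args=(results, i)) for i in range(count)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results
```

The `lucene.getVMEnv() or lucene.initVM()` idiom reuses a VM that is already running instead of trying to start a second one.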

Indexes are accessed through an IndexSearcher (read-only), IndexWriter, or the combined Indexer.

from lupyne import engine

searcher = engine.IndexSearcher('index/path')
hits = searcher.search('text:query')
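Hits behave like a sequence of mappings of stored fields. The sketch below assumes an existing index at the given path with a `text` field. `show_hits` is a hypothetical helper. The `count` argument, `hits.count`, `hit.id` and `hit.score` are assumed from Lupyne 2.x, so check them against your installed version.

```python
def show_hits(path, query='text:query', count=10):
    """Print the top hits of a query against an existing index directory."""
    import lucene  # imported lazily; requires a PyLucene installation
    from lupyne import engine

    lucene.getVMEnv() or lucene.initVM()
    searcher = engine.IndexSearcher(path)
    hits = searcher.search(query, count=count)
    print(f'{len(hits)} shown of {hits.count} matching documents')
    for hit in hits:
        # Each hit maps stored field names to values, plus the internal doc id and score.
        print(hit.id, hit.score, dict(hit))
    return hits
```

Here `len(hits)` is the number of hits returned (at most `count`), while `hits.count` is the total number of matching documents.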

See ./lupyne/services/README.md for services usage.

Installation

% pip install lupyne[graphql,rest]

PyLucene is not pip installable.

Dependencies

  • PyLucene >=8
  • strawberry-graphql >=0.84.4 (if graphql option)
  • fastapi (if rest option)

Tests

100% branch coverage.

% pytest [--cov]

Changes

dev

  • PyLucene >=8.6 required
  • PyLucene 8.11 supported
  • CherryPy server removed

2.5

  • Python >=3.7 required
  • PyLucene 8.6 supported
  • CherryPy server deprecated

2.4

  • PyLucene >=8 required
  • Hit.keys renamed to Hit.sortkeys

2.3

  • PyLucene >=7.7 required
  • PyLucene 8 supported

2.2

  • PyLucene 7.6 supported

2.1

  • PyLucene >=7 required

2.0

  • PyLucene >=6 required
  • Python 3 support
  • client moved to external package

1.9

  • Python 2.6 dropped
  • PyLucene 4.8 and 4.9 dropped
  • IndexWriter implements context manager
  • Server DocValues updated via patch method
  • Spatial tile search optimized

1.8

  • PyLucene 4.10 supported
  • PyLucene 4.6 and 4.7 dropped
  • Comparator iteration optimized
  • Support for string-based FieldCacheRangeFilters

Comments
  • Combining Querys with BooleanQuerys

    Hi @coady, thanks for all your hard work on lupyne, it's been super helpful for me! I used your Dockerfile as a basis for compiling JCC & PyLucene to wheel files in my own non-Docker environment. I've now been able to run some of the examples, set up my own 14 GB corpus, index it to a directory, and do some basic searches based on the examples you provided in the docs.

    Right now I'm trying to write a slightly more complex query, but was having some trouble and hoping you might be able to point me in the right direction.

    I have a fairly simple index that has 4 stored fields. A text field containing the article text, a text field containing the name of the company (the list of company names is finite and each document is associated with exactly one company), a datetime field that contains the date the article was published, and an article id.

    I'm trying to write a query that finds all documents that contain the phrase "lupyne is great", fall within an arbitrary date range, and have a company_name field value of 'company a', 'company b', or 'company c'.

    I've tried the following:

    
    import lucene
    from lupyne import engine
    from datetime import date
    
    assert lucene.getVMEnv() or lucene.initVM()
    
    index_path: str = r'myindexdir'
    
    query_str: str = 'lupyne is great'
    start_date: date = date(year=2020, month=2, day=14)
    companies: [str] = ['company a', 'company b', 'company c']
    
    indexer = engine.Indexer(index_path, mode='r', nrt=True)
    
    indexer.set('article_id', stored=True)
    indexer.set('company_name', stored=True)
    indexer.set('date', engine.DateTimeField, stored=True)
    indexer.set('text', engine.Field.Text, stored=True)
    
    query_engine = engine.Query
    
    # The following works with the query string 'lupyne'
    query_str: str = 'lupyne'
    query = indexer.fields['date'].range(start_date, None) & query_engine.term('text', query_str)
    
    # This does not with the query_string 'lupyne is great',
    query_str: str = 'lupyne is great'
    query = indexer.fields['date'].range(start_date, None) & query_engine.phrase('text', query_str)
    # TypeError: unsupported operand type(s) for &: 'Query' and 'MultiPhraseQuery'
    
    # This also does not work
    range_query = query_engine.range('date', date_field.timestamp(start_date), None)
    # java.lang.IncompatibleClassChangeError
    #        at org.apache.lucene.util.BytesRef.<init>(BytesRef.java:84)
    
    # This will also break
    range_query = query_engine.range('date', start_date, None)
    # lucene.InvalidArgsError: (<class 'org.apache.lucene.util.BytesRef'>, '__init__', (datetime.date(2021, 2, 2),))
    
    

    Any suggestions on how I might go about this? Thanks again for all the hard work!

    EDIT: So, it looks like this might be because Query.ranges() doesn't return a lupyne Query object as seen here, but instead directly returns a pylucene query object. Any good way to get around this?

    opened by ZeroCool2u 7
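One way to build this query is to construct the BooleanQuery explicitly instead of using `&`. The `&` most likely fails because, as the EDIT suggests, the date range returned by `indexer.fields['date'].range` is a plain PyLucene query without Lupyne's operator overloads. This sketch makes several assumptions that the thread does not confirm. It uses the Lupyne 2.x classmethods `engine.Query.all`, `engine.Query.any`, `engine.Query.term` and `engine.Query.phrase`. It assumes `Query.phrase` takes one already-analyzed (lowercased) term per argument rather than one string. It also assumes `company_name` was indexed as an untokenized string when the documents were added, for example with `engine.Field.String`, since a field declared only with `stored=True` is not searchable. `phrase_terms` and `build_query` are hypothetical helpers.

```python
def phrase_terms(text):
    """Split a phrase into lowercase terms (a rough stand-in for the analyzer)."""
    return text.lower().split()


def build_query(indexer, phrase, start_date, companies):
    """Combine a date range, an exact phrase and a set of company names."""
    from lupyne import engine  # imported lazily; requires PyLucene

    date_query = indexer.fields['date'].range(start_date, None)
    phrase_query = engine.Query.phrase('text', *phrase_terms(phrase))
    # Any one of the companies may match (SHOULD clauses).
    company_query = engine.Query.any(*(engine.Query.term('company_name', name) for name in companies))
    # All three constraints must hold (MUST clauses).
    return engine.Query.all(date_query, phrase_query, company_query)
```

Call `build_query(indexer, 'lupyne is great', start_date, companies)` after the `indexer.set(...)` declarations from the report, then pass the result to `indexer.search(...)`.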
  • Bump actions/setup-python from 3 to 4

    Bumps actions/setup-python from 3 to 4.

    Release notes

    Sourced from actions/setup-python's releases.

    v4.0.0

    What's Changed

    • Support for python-version-file input: #336

    Example of usage:

    - uses: actions/setup-python@v4
      with:
        python-version-file: '.python-version' # Read python version from a file
    - run: python my_script.py
    

    There is no default python version for this setup-python major version, the action requires to specify either python-version input or python-version-file input. If the python-version input is not specified the action will try to read required version from file from python-version-file input.

    • Use pypyX.Y for PyPy python-version input: #349

    Example of usage:

    - uses: actions/setup-python@v4
      with:
        python-version: 'pypy3.9' # pypy-X.Y kept for backward compatibility
    - run: python my_script.py
    
    • RUNNER_TOOL_CACHE environment variable is equal AGENT_TOOLSDIRECTORY: #338

    • Bugfix: create missing pypyX.Y symlinks: #347

    • PKG_CONFIG_PATH environment variable: #400

    • Added python-path output: #405 python-path output contains Python executable path.

    • Updated zeit/ncc to vercel/ncc package: #393

    • Bugfix: fixed output for prerelease version of poetry: #409

    • Made pythonLocation environment variable consistent for Python and PyPy: #418

    • Bugfix for 3.x-dev syntax: #417

    • Other improvements: #318 #396 #384 #387 #388

    Update actions/cache version to 2.0.2

    In scope of this release we updated actions/cache package as the new version contains fixes related to GHES 3.5 (actions/setup-python#382)

    Add "cache-hit" output and fix "python-version" output for PyPy

    This release introduces new output cache-hit (actions/setup-python#373) and fix python-version output for PyPy (actions/setup-python#365)

    The cache-hit output contains boolean value indicating that an exact match was found for the key. It shows that the action uses already existing cache or not. The output is available only if cache is enabled.

    ... (truncated)

    Commits
    • d09bd5e fix: 3.x-dev can install a 3.y version (#417)
    • f72db17 Made env.var pythonLocation consistent for Python and PyPy (#418)
    • 53e1529 add support for python-version-file (#336)
    • 3f82819 Fix output for prerelease version of poetry (#409)
    • 397252c Update zeit/ncc to vercel/ncc (#393)
    • de977ad Merge pull request #412 from vsafonkin/v-vsafonkin/fix-poetry-cache-test
    • 22c6af9 Change PyPy version to rebuild cache
    • 081a3cf Merge pull request #405 from mayeut/interpreter-path
    • ff70656 feature: add a python-path output
    • fff15a2 Use pypyX.Y for PyPy python-version input (#349)
    • Additional commits viewable in compare view

    Dependabot compatibility score

    Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


    Dependabot commands and options

    You can trigger Dependabot actions by commenting on this PR:

    • @dependabot rebase will rebase this PR
    • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
    • @dependabot merge will merge this PR after your CI passes on it
    • @dependabot squash and merge will squash and merge this PR after your CI passes on it
    • @dependabot cancel merge will cancel a previously requested merge and block automerging
    • @dependabot reopen will reopen this PR if it is closed
    • @dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
    • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
    dependencies github_actions 
    opened by dependabot[bot] 2
  • Bump codecov/codecov-action from 2 to 3

    Bumps codecov/codecov-action from 2 to 3.

    Changelog

    Sourced from codecov/codecov-action's changelog.

    3.0.0

    Breaking Changes

    • #689 Bump to node16 and small fixes

    Features

    • #688 Incorporate gcov arguments for the Codecov uploader

    Dependencies

    • #548 build(deps-dev): bump jest-junit from 12.2.0 to 13.0.0
    • #603 [Snyk] Upgrade @​actions/core from 1.5.0 to 1.6.0
    • #628 build(deps): bump node-fetch from 2.6.1 to 3.1.1
    • #634 build(deps): bump node-fetch from 3.1.1 to 3.2.0
    • #636 build(deps): bump openpgp from 5.0.1 to 5.1.0
    • #652 build(deps-dev): bump @​vercel/ncc from 0.30.0 to 0.33.3
    • #653 build(deps-dev): bump @​types/node from 16.11.21 to 17.0.18
    • #659 build(deps-dev): bump @​types/jest from 27.4.0 to 27.4.1
    • #667 build(deps): bump actions/checkout from 2 to 3
    • #673 build(deps): bump node-fetch from 3.2.0 to 3.2.3
    • #683 build(deps): bump minimist from 1.2.5 to 1.2.6
    • #685 build(deps): bump @​actions/github from 5.0.0 to 5.0.1
    • #681 build(deps-dev): bump @​types/node from 17.0.18 to 17.0.23
    • #682 build(deps-dev): bump typescript from 4.5.5 to 4.6.3
    • #676 build(deps): bump @​actions/exec from 1.1.0 to 1.1.1
    • #675 build(deps): bump openpgp from 5.1.0 to 5.2.1

    2.1.0

    Features

    • #515 Allow specifying version of Codecov uploader

    Dependencies

    • #499 build(deps-dev): bump @​vercel/ncc from 0.29.0 to 0.30.0
    • #508 build(deps): bump openpgp from 5.0.0-5 to 5.0.0
    • #514 build(deps-dev): bump @​types/node from 16.6.0 to 16.9.0

    2.0.3

    Fixes

    • #464 Fix wrong link in the readme
    • #485 fix: Add override OS and linux default to platform

    Dependencies

    • #447 build(deps): bump openpgp from 5.0.0-4 to 5.0.0-5
    • #458 build(deps-dev): bump eslint from 7.31.0 to 7.32.0
    • #465 build(deps-dev): bump @​typescript-eslint/eslint-plugin from 4.28.4 to 4.29.1
    • #466 build(deps-dev): bump @​typescript-eslint/parser from 4.28.4 to 4.29.1
    • #468 build(deps-dev): bump @​types/jest from 26.0.24 to 27.0.0
    • #470 build(deps-dev): bump @​types/node from 16.4.0 to 16.6.0
    • #472 build(deps): bump path-parse from 1.0.6 to 1.0.7
    • #473 build(deps-dev): bump @​types/jest from 27.0.0 to 27.0.1
    • #478 build(deps-dev): bump @​typescript-eslint/parser from 4.29.1 to 4.29.2
    • #479 build(deps-dev): bump @​typescript-eslint/eslint-plugin from 4.29.1 to 4.29.2

    ... (truncated)

    Commits
    • e3c5604 Merge pull request #689 from codecov/feat/gcov
    • 174efc5 Update package-lock.json
    • 6243a75 bump to 3.0.0
    • 0d6466f Bump to node16
    • d4729ee fetch.default
    • 351baf6 fix: bash
    • d8cf680 Merge pull request #675 from codecov/dependabot/npm_and_yarn/openpgp-5.2.1
    • b775e90 Merge pull request #676 from codecov/dependabot/npm_and_yarn/actions/exec-1.1.1
    • 2ebc2f0 Merge pull request #682 from codecov/dependabot/npm_and_yarn/typescript-4.6.3
    • 8e2ef2b Merge pull request #681 from codecov/dependabot/npm_and_yarn/types/node-17.0.23
    • Additional commits viewable in compare view

    dependencies github_actions 
    opened by dependabot[bot] 2
  • Example Failed

    It seems that the example fails:

    lucene.initVM()
    indexer = engine.Indexer()
    indexer.set('name', stored=True)
    indexer.set('text')
    indexer.add(name='sample', text='hello world')
    indexer.commit()
    

    It raises:

    Traceback (most recent call last):
      File "index.py", line 78, in <module>
        indexer.set('text')
      File "/Users/Nasy/.pyenv/versions/3.8.0/lib/python3.8/site-packages/lupyne/engine/indexers.py", line 546, in set
        field = self.fields[name] = cls(name, **settings)
      File "/Users/Nasy/.pyenv/versions/3.8.0/lib/python3.8/site-packages/lupyne/engine/documents.py", line 61, in __init__
        assert self.stored or self.indexed or self.docvalues or self.dimensions
    AssertionError
    
    opened by nasyxx 2
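The assertion fires because a bare `indexer.set('text')` declares a field that is neither stored nor indexed. That is exactly what the asserted condition in the traceback checks. A corrected sketch declares the text field with `engine.Field.Text`, which indexes and tokenizes it. `sample_index` is a hypothetical helper name.

```python
def sample_index():
    """Build a one-document in-memory index, search it, and return matching names."""
    import lucene  # imported lazily; requires a PyLucene installation
    from lupyne import engine

    lucene.getVMEnv() or lucene.initVM()
    indexer = engine.Indexer()  # no directory given: an in-memory index
    indexer.set('name', stored=True)  # stored, but not searchable
    indexer.set('text', engine.Field.Text)  # indexed and tokenized, so searchable
    indexer.add(name='sample', text='hello world')
    indexer.commit()  # the document becomes searchable after committing
    hits = indexer.search('text:hello')
    return [hit['name'] for hit in hits]
```

With PyLucene installed, calling `sample_index()` should return the stored name of the single matching document.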
  • Bump github/codeql-action from 1 to 2

    Bumps github/codeql-action from 1 to 2.

    Changelog

    Sourced from github/codeql-action's changelog.

    2.1.8 - 08 Apr 2022

    • Update default CodeQL bundle version to 2.8.5. #1014
    • Fix error where the init action would fail due to a GitHub API request that was taking too long to complete #1025

    2.1.7 - 05 Apr 2022

    • A bug where additional queries specified in the workflow file would sometimes not be respected has been fixed. #1018

    2.1.6 - 30 Mar 2022

    • [v2+ only] The CodeQL Action now runs on Node.js v16. #1000
    • Update default CodeQL bundle version to 2.8.4. #990
    • Fix a bug where an invalid commit_oid was being sent to code scanning when a custom checkout path was being used. #956
    Commits
    • 2c03704 Allow the version of the ML-powered pack to depend on the CLI version
    • dd6b592 Simplify ML-powered query status report definition
    • a90d8bf Merge pull request #1011 from github/henrymercer/ml-powered-queries-pr-check
    • dc0338e Use latest major version of actions/upload-artifact
    • 57096fe Add a PR check to validate that ML-powered queries are run correctly
    • b0ddf36 Merge pull request #1012 from github/henrymercer/update-actions-major-versions
    • 1ea2f2d Merge branch 'main' into henrymercer/update-actions-major-versions
    • 9dcc141 Merge pull request #1010 from github/henrymercer/stop-running-ml-powered-quer...
    • ea751a9 Update other Actions from v2 to v3
    • a2949f4 Update actions/checkout from v2 to v3
    • Additional commits viewable in compare view

    dependencies github_actions 
    opened by dependabot[bot] 1
  • Bump actions/checkout from 2 to 3

    Bumps actions/checkout from 2 to 3.

    Release notes

    Sourced from actions/checkout's releases.

    v3.0.0

    • Update default runtime to node16

    v2.4.0

    • Convert SSH URLs like org-<ORG_ID>@github.com: to https://github.com/ - pr

    v2.3.5

    Update dependencies

    v2.3.4

    v2.3.3

    v2.3.2

    Add Third Party License Information to Dist Files

    v2.3.1

    Fix default branch resolution for .wiki and when using SSH

    v2.3.0

    Fallback to the default branch

    v2.2.0

    Fetch all history for all tags and branches when fetch-depth=0

    v2.1.1

    Changes to support GHES (here and here)

    v2.1.0

    dependencies github_actions 
    opened by dependabot[bot] 1
  • Could anyone help with a simple example?

    Hello, I am new to search engines and Lupyne. I want to use a search engine for a simple task: given a query, search through all documents and return the relevant ones ranked by BM25 score. How can I do it? I tried the examples in the docs. How can I set BM25 as the scoring function? And should I use different settings, such as tokenization, when searching different languages? Sorry for taking your time! Thanks!

    opened by Hannibal046 1
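Since Lucene 6.0, `BM25Similarity` (with k1 = 1.2 and b = 0.75) has been Lucene's default similarity. With the PyLucene >=8 versions Lupyne requires, ordinary searches are therefore already ranked by BM25. To make the scoring concrete, here is a pure-Python sketch of the per-term formula used by Lucene 8's `BM25Similarity`. It ignores query boosts and Lucene's lossy encoding of document lengths, so real scores can differ slightly. `bm25_term_score` and `bm25_score` are hypothetical helpers that work on pre-tokenized documents.

```python
import math


def bm25_term_score(freq, doc_len, avg_doc_len, doc_freq, doc_count, k1=1.2, b=0.75):
    """Score of one term in one document, following Lucene 8's BM25Similarity."""
    # Rarer terms (lower doc_freq) get a higher inverse document frequency.
    idf = math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))
    # Term frequency saturates via k1 and is normalized by document length via b.
    norm = k1 * (1 - b + b * doc_len / avg_doc_len)
    return idf * freq / (freq + norm)


def bm25_score(query_terms, doc_terms, docs):
    """Sum the per-term scores of a tokenized query for one document of `docs`."""
    doc_count = len(docs)
    avg_doc_len = sum(len(d) for d in docs) / doc_count
    score = 0.0
    for term in query_terms:
        freq = doc_terms.count(term)
        if freq:
            doc_freq = sum(1 for d in docs if term in d)
            score += bm25_term_score(freq, len(doc_terms), avg_doc_len, doc_freq, doc_count)
    return score
```

A different model would be configured with `setSimilarity` on Lucene's `IndexSearcher`. Since Lupyne's searcher wraps Lucene's, that method should be reachable, but treat this as an assumption to verify.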
  • indexer.commit gets stuck when using multiprocessing

    When indexer.commit() is run in a separate process (multiprocessing), the commit tends to get stuck. I've tried attachCurrentThread() as well, but it doesn't seem to work.

    Is there any way I can use multiprocessing along with Lupyne?

    Following is the code:

    import lucene
    from lupyne import engine
    lucene.initVM()
    #assert lucene.getVMEnv() or lucene.initVM()
    from multiprocessing import Process

    #vm_env = lucene.initVM(vmargs=['-Djava.awt.headless=true'])
    #from org.apache.lucene import analysis, document, index, queryparser, search, store, util
    class testd:
        def idx(self):
            #lucene.getVMEnv().attachCurrentThread()
            print("init")
            indexer = engine.Indexer()
            indexer.set('fieldname', stored=True)  # settings for all documents of indexer; indexed and tokenized is the default
            indexer.add(fieldname="sample_test")
            print("Trying to commit")
            indexer.commit()
            print("done")

    if __name__ == '__main__':
        #testd().idx()
        p = Process(target=testd().idx)
        p.start()
        p.join()

    opened by khasa3 1
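A likely cause is that the JVM was started in the parent before the fork. A forked child inherits the parent's memory but only the forking thread, not the JVM's background threads, so Lucene calls can hang. One possible workaround is sketched below. It assumes the index is written entirely in the child, uses the `spawn` start method, and starts the VM inside the child. `build_index` and `index_in_subprocess` are hypothetical names. The field is declared with `engine.Field.Text` so that it is indexed.

```python
import multiprocessing


def build_index(directory):
    """Runs in the child: the JVM is started here, never in the parent."""
    import lucene  # imported lazily so the parent process never loads the JVM
    from lupyne import engine

    lucene.initVM()
    indexer = engine.Indexer(directory)
    indexer.set('fieldname', engine.Field.Text, stored=True)
    indexer.add(fieldname='sample_test')
    indexer.commit()


def index_in_subprocess(directory):
    """Run build_index in a freshly spawned (not forked) process; return its exit code."""
    context = multiprocessing.get_context('spawn')
    process = context.Process(target=build_index, args=(directory,))
    process.start()
    process.join()
    return process.exitcode
```

Call `index_in_subprocess('index/path')` from under an `if __name__ == '__main__':` guard, which the `spawn` start method requires.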
  • Overriding dict.keys() with Hit.keys breaks Hit object displaying in IPython

    First of all, thanks for your efforts in providing a high-level Lucene Python library! I really appreciate that I can almost completely omit Java-related code in my library.

    I'm experimenting with the library in IPython and have problems with displaying Hit object:

    In [101]: print(type(h))
    <class 'lupyne.engine.documents.Hit'>

    In [102]: print(repr(h))
    {'LEMMA': ['кошка'], 'LEMMA_LANGUAGE': ['RU'], 'POS': ['n']}

    In [103]: h
    Out[103]: ---------------------------------------------------------------------------
    TypeError                                 Traceback (most recent call last)
    ~/.pyenv/versions/3.6.9/envs/babelnet-lite-3.6.9/lib/python3.6/site-packages/IPython/core/formatters.py in __call__(self, obj)
        700                 type_pprinters=self.type_printers,
        701                 deferred_pprinters=self.deferred_printers)
    --> 702             printer.pretty(obj)
        703             printer.flush()
        704             return stream.getvalue()

    ~/.pyenv/versions/3.6.9/envs/babelnet-lite-3.6.9/lib/python3.6/site-packages/IPython/lib/pretty.py in pretty(self, obj)
        383                 if cls in self.type_pprinters:
        384                     # printer registered in self.type_pprinters
    --> 385                     return self.type_pprinters[cls](obj, self, cycle)
        386                 else:
        387                     # deferred printer

    ~/.pyenv/versions/3.6.9/envs/babelnet-lite-3.6.9/lib/python3.6/site-packages/IPython/lib/pretty.py in inner(obj, p, cycle)
        606         step = len(start)
        607         p.begin_group(step, start)
    --> 608         keys = obj.keys()
        609         # if dict isn't large enough to be truncated, sort keys before displaying
        610         # From Python 3.7, dicts preserve order by definition, so we don't sort.

    TypeError: 'tuple' object is not callable
    

    I think the problem is that the Hit object is an instance of dict, so IPython tries to pretty-print it as a dict. Calling Hit.keys() then raises a TypeError, because you've overridden it with a tuple. I suggest renaming Hit.keys to Hit.keys_ to fix that and to follow the principle of least astonishment.

    opened by rominf 1
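The diagnosis can be reproduced without Lucene. Any `dict` subclass that shadows the `keys` method with a data attribute breaks code that calls `obj.keys()`, as IPython's dict printer does. The changelog above records the eventual fix: in 2.4, `Hit.keys` was renamed to `Hit.sortkeys`. A minimal sketch follows. `BrokenHit`, `FixedHit` and `keys_of` are hypothetical stand-ins, not Lupyne's classes.

```python
class BrokenHit(dict):
    """Mimics the old behaviour: an instance attribute shadows dict.keys."""

    def __init__(self, fields, keys):
        super().__init__(fields)
        self.keys = keys  # a tuple, so self.keys() is no longer callable


class FixedHit(dict):
    """Mimics the renamed attribute: dict.keys stays intact."""

    def __init__(self, fields, sortkeys):
        super().__init__(fields)
        self.sortkeys = sortkeys


def keys_of(obj):
    """Call obj.keys() the way IPython's dict pretty printer does."""
    return list(obj.keys())
```

Renaming the attribute keeps the full `dict` protocol working, so any generic mapping consumer can handle hits.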
  • Why Lupyne?

    Hello,

    While looking around for how to run PyLucene, I stumbled upon your Docker image for PyLucene and eventually ended up here. I'm curious why you wrote Lupyne. Is it to provide a more Pythonic interface to Lucene? Why should one use Lupyne over PyLucene? The lack of documentation on PyLucene makes me feel like only a handful of people are actually using it...

    Thanks, Sep

    opened by seperman 1
  • Bump actions/setup-python from 2 to 3

    Bumps actions/setup-python from 2 to 3.

    Release notes

    Sourced from actions/setup-python's releases.

    v3.0.0

    What's Changed

    Breaking Changes

    With the update to Node 16, all scripts will now be run with Node 16 rather than Node 12.

    This new major release removes support of legacy pypy2 and pypy3 keywords. Please use more specific and flexible syntax to specify a PyPy version:

    jobs:
      build:
        runs-on: ubuntu-latest
        strategy:
          matrix:
            python-version:
            - 'pypy-2.7' # the latest available version of PyPy that supports Python 2.7
            - 'pypy-3.8' # the latest available version of PyPy that supports Python 3.8
            - 'pypy-3.8-v7.3.8' # Python 3.8 and PyPy 7.3.8
        steps:
        - uses: actions/checkout@v2
        - uses: actions/setup-python@v3
          with:
            python-version: ${{ matrix.python-version }}
    

    See more usage examples in the documentation

    Update primary and restore keys for pip

    In scope of this release we include a version of python in restore and primary cache keys for pip. Besides, we add temporary fix for Windows caching issue, that the pip cache dir command returns non zero exit code or writes to stderr. Moreover we updated node-fetch dependency.

    Update actions/cache version to 1.0.8

    We have updated the actions/cache dependency to version 1.0.8 to support 10GB cache uploads.

    Support caching dependencies

    This release introduces dependency caching support (actions/setup-python#266)

    Caching dependencies.

    The action has built-in functionality for caching and restoring pip/pipenv dependencies. The cache input is optional, and caching is turned off by default.

    This release also introduces dependency caching support for monorepos and repositories with complex structures.

    By default, the action searches the whole repository for the dependency file (requirements.txt for pip or Pipfile.lock for pipenv). Use the cache-dependency-path input when you want to override this behavior and use a different file for hash generation (for example, requirements-dev.txt). This input supports wildcards or a list of file names for caching multiple dependencies.

    Caching pip dependencies:

    steps:
    - uses: actions/checkout@v2
    - uses: actions/setup-python@v2
      with:
        python-version: '3.9'
        cache: 'pip'
    

    ... (truncated)

    Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


    Dependabot commands and options

    You can trigger Dependabot actions by commenting on this PR:

    • @dependabot rebase will rebase this PR
    • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
    • @dependabot merge will merge this PR after your CI passes on it
    • @dependabot squash and merge will squash and merge this PR after your CI passes on it
    • @dependabot cancel merge will cancel a previously requested merge and block automerging
    • @dependabot reopen will reopen this PR if it is closed
    • @dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually.
    • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
    dependencies github_actions 
    opened by dependabot[bot] 0
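The Hit.keys problem diagnosed in rominf's issue above comes down to an instance attribute shadowing a dict method. The sketch below reproduces it with simplified stand-ins: the classes `ShadowedHit` and `RenamedHit` and the helper `pretty_keys` are hypothetical and are not Lupyne's actual code. `pretty_keys` only mimics generic dict-handling code that calls `.keys()`; it is not IPython's pretty-printer.

```python
class ShadowedHit(dict):
    """Hypothetical stand-in: a dict subclass that stores sort keys as `keys`."""

    def __init__(self, fields, keys=()):
        super().__init__(fields)
        # The instance attribute shadows dict.keys, so hit.keys() now tries
        # to call a tuple and raises TypeError.
        self.keys = tuple(keys)


class RenamedHit(dict):
    """Hypothetical stand-in using the rename suggested in the issue."""

    def __init__(self, fields, keys=()):
        super().__init__(fields)
        self.keys_ = tuple(keys)  # dict.keys stays intact


def pretty_keys(obj):
    """Mimic generic dict-handling code that calls obj.keys()."""
    try:
        return sorted(obj.keys())
    except TypeError as exc:
        return exc


print(pretty_keys(ShadowedHit({"title": "Lucene"}, keys=(1.0,))))  # prints the TypeError
print(pretty_keys(RenamedHit({"title": "Lucene"}, keys=(1.0,))))   # ['title']
```

Iteration and `hit[...]` lookups still work on `ShadowedHit` because they do not go through the `keys` attribute. Only code that explicitly calls `.keys()` fails, which fits the issue's observation that the error surfaces when IPython displays a hit.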
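The PyPy syntax in the setup-python v3 release notes above follows a `pypy-<python version>[-v<pypy version>]` pattern. The following is a small sketch of a parser for only the three forms shown in that workflow example. The function `parse_pypy_spec` and the `PyPySpec` tuple are hypothetical illustrations, not part of setup-python, and they do not attempt to cover every form the action may accept.

```python
import re
from typing import NamedTuple, Optional


class PyPySpec(NamedTuple):
    python_version: str
    pypy_version: Optional[str]  # None means "latest PyPy for that Python"


# Matches "pypy-<major>.<minor>" with an optional "-v<pypy version>" suffix.
_PYPY_SPEC = re.compile(r"pypy-(\d+\.\d+)(?:-v(\d+(?:\.\d+)*))?")


def parse_pypy_spec(spec: str) -> PyPySpec:
    match = _PYPY_SPEC.fullmatch(spec)
    if match is None:
        # Legacy keywords such as "pypy2" or "pypy3" are rejected here.
        raise ValueError(f"unsupported PyPy version spec: {spec!r}")
    # group(2) is None when the optional "-v..." suffix is absent.
    return PyPySpec(match.group(1), match.group(2))


for spec in ["pypy-2.7", "pypy-3.8", "pypy-3.8-v7.3.8"]:
    print(spec, "->", parse_pypy_spec(spec))
```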
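The setup-python release notes above describe pip cache keys that include the Python version and a hash of the dependency files selected through cache-dependency-path, which accepts wildcards. The following is a rough sketch of that idea, not the action's own implementation. The key layout `setup-python-<os>-python-<version>-pip-<hash>` and the functions `hash_dependency_files` and `pip_cache_key` are assumptions made for illustration.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Iterable


def hash_dependency_files(patterns: Iterable[str], root: str = ".") -> str:
    """SHA-256 over every file under `root` matching any glob pattern (e.g. '**/requirements*.txt')."""
    patterns = list(patterns)
    base = Path(root)
    # A set drops files matched by several patterns; sorting keeps the hash stable.
    paths = sorted({p for pattern in patterns for p in base.glob(pattern) if p.is_file()})
    if not paths:
        raise FileNotFoundError(f"no dependency files match {patterns!r}")
    digest = hashlib.sha256()
    for path in paths:
        # Hash the relative path too, so renaming a file also changes the key.
        digest.update(path.relative_to(base).as_posix().encode())
        digest.update(b"\0")
        digest.update(path.read_bytes())
        digest.update(b"\0")
    return digest.hexdigest()


def pip_cache_key(runner_os: str, python_version: str, patterns: Iterable[str], root: str = ".") -> str:
    """Hypothetical key layout: changes when the OS, Python version, or dependency files change."""
    return f"setup-python-{runner_os}-python-{python_version}-pip-{hash_dependency_files(patterns, root)}"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, "requirements.txt").write_text("requests\n")
        Path(tmp, "requirements-dev.txt").write_text("pytest\n")
        for version in ("3.9", "3.10"):
            print(pip_cache_key("Linux", version, ["requirements*.txt"], root=tmp))
```

Because the Python version is part of the key, upgrading Python produces a cache miss instead of restoring wheels built for another interpreter. Editing any matched requirements file has the same effect.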
Senginta is an all-in-one search engine scraper for use via an API or as a Python module. It's free!

Senginta is an all-in-one search engine scraper. Using traditional scraping, Senginta is designed to fetch results from any search engine and convert them to JSON. It currently supports only Google search products (GShop, GVideo, and many others) and the Baidu search engine.

33 Nov 21, 2022
Google Search Engine Results Pages (SERP) locally, with no API key or signup required

Local SERP: Google Search Engine Results Pages (SERP) locally, with no API key or signup required. Make sure the chromedriver and required package are in

theblackcat102 4 Jun 29, 2021
A simple search engine algorithm, like Google's, in Python using functions

Mini-Search-Engine-Like-Google: I have created a simple search engine algorithm like Google's in Python using functions. I am matching every word with w

Sachin Vinayak Dabhade 5 Sep 24, 2021
A sentence search engine that fetches examples from trusted news/media organisations. Great for writing better English.

A sentence search engine that fetches examples from trusted news/media websites. Great for improving your written and spoken English.

Stephen Appiah 1 Apr 4, 2022
A simple search engine that allows searching for chess games

A simple search engine that allows searching for chess games based on queries about opening names & opening moves. Built with Python 3.10 and python-chess.

Tyler Hoang 1 Jun 17, 2022
Pythonic Lucene - A simplified Python implementation of Apache Lucene

A simplified Python implementation of Apache Lucene that may help in understanding how an enterprise search engine really works.

Mahdi Sadeghzadeh Ghamsary 2 Sep 12, 2022
Search for emails from a domain through search engines

EmailFinder - search for emails through search engines

Josué Encinar 155 Dec 30, 2022
GitScanner is a script that makes it easy to search for exposed Git repositories through advanced Google searches.

GitScanner. Legal disclaimer: Usage of GitScanner for attacking targets without prior mutual consent is illegal. It is the end user's responsibility to

Kaio Gomes 3 Oct 28, 2022
A fast, efficient Python package for searching and getting search results from many different search engines

search: A fast, efficient Python package for searching and getting search results from many different search engines. Installation: To install the pack

Neurs 0 Oct 6, 2022
Reverse-ikea-image-search - A simple IKEA image search using jina.ai

IKEA Reverse Image Search: This is a demo project to fetch IKEA product images (IK

SOUVIK GHOSH 4 Mar 8, 2022
Modular search for Django

Haystack. Author: Daniel Lindsley. Date: 2013/07/28. Haystack provides modular search for Django. It features a unified, familiar API that allows you to

Haystack Search 3.4k Jan 4, 2023
Full-text search for Flask.

flask-msearch
Installation
To install flask-msearch:
pip install flask-msearch
# when MSEARCH_BACKEND = "whoosh"
pip install whoosh blinker
# when MSE

honmaple 197 Dec 29, 2022
Jina allows you to build deep learning-powered search-as-a-service in just minutes

Cloud-native neural search framework for any kind of data

Jina AI 17k Dec 31, 2022
Document organizer with tags and full-text search, in a simple and clean sqlite3 schema

Manos Pitsidianakis 152 Oct 29, 2022
A web search server for ParlAI, including Blenderbot2.

Description: A web search server for ParlAI, including Blenderbot2. Uses html2text to strip the mar

Jules Gagnon-Marchand 119 Jan 6, 2023
This project is a sample demo of Arxiv search related to AI/ML Papers built using Streamlit, sentence-transformers and Faiss.

Karn Deb 49 Oct 30, 2022
Google Project: Search and auto-complete sentences within given input text files, manipulating data with complex data structures.

Auto-Complete Google Project: This project implements one feature of Google's search engine - AutoComplete. Autocomplete, or wo

Hadassah Engel 10 Jun 20, 2022
Full-text multi-table search application for Django. Easy to install and use, with good performance.

django-watson: django-watson is a fast multi-model full-text search plugin for Django. It is easy to install and use, and provides high-quality search

Dave Hall 1.1k Jan 3, 2023
rclip - AI-Powered Command-Line Photo Search Tool

rclip is a command-line photo search tool based on OpenAI's awesome CLIP neural network.

Yurij Mikhalevich 394 Dec 12, 2022