Eland is a Python Elasticsearch client for exploring and analyzing data in Elasticsearch with a familiar Pandas-compatible API.

Overview

About

Eland is a Python Elasticsearch client for exploring and analyzing data in Elasticsearch with a familiar Pandas-compatible API.

Where possible, the package uses existing Python APIs and data structures to make it easy to switch between numpy, pandas, or scikit-learn and their Elasticsearch-powered equivalents. In general, the data resides in Elasticsearch and not in memory, which allows Eland to access large datasets stored in Elasticsearch.

Eland also provides tools to upload trained machine learning models from common libraries such as scikit-learn, XGBoost, and LightGBM into Elasticsearch.

Getting Started

Eland can be installed from PyPI with Pip:

$ python -m pip install eland

Eland can also be installed from Conda Forge with Conda:

$ conda install -c conda-forge eland

Compatibility

  • Supports Python 3.7+ and Pandas 1.3
  • Supports Elasticsearch clusters that are 7.11+, recommended 7.14 or later for all features to work.
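
The version ranges above can be checked before running any Eland code. The following is a minimal sketch and not part of Eland: `parse_version` and `compatibility` are hypothetical helper names, and the thresholds are simply the 7.11 and 7.14 figures listed above.

```python
def parse_version(version: str) -> tuple:
    """Turn '7.14.0' into (7, 14, 0); suffixes such as 'b1' are ignored."""
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits) if digits else 0)
    return tuple(parts)


MINIMUM = (7, 11)      # oldest supported cluster version listed above
RECOMMENDED = (7, 14)  # version recommended above for all features


def compatibility(version: str) -> str:
    """Classify a cluster version string against the ranges above."""
    parsed = parse_version(version)
    if parsed < MINIMUM:
        return "unsupported"
    if parsed < RECOMMENDED:
        return "supported"
    return "recommended"


for candidate in ("7.10.2", "7.11.0", "7.14.0", "8.3.0"):
    print(candidate, compatibility(candidate))
```

With the official client, the cluster version string should be available as `es.info()["version"]["number"]` and could be passed to `compatibility` directly.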

Connecting to Elasticsearch

Eland uses the low-level Elasticsearch client to connect to Elasticsearch. This client supports a range of connection and authentication options.

You can pass either an instance of elasticsearch.Elasticsearch to Eland APIs or a string containing the host to connect to:

import eland as ed

# Connecting to an Elasticsearch instance running on 'localhost:9200'
df = ed.DataFrame("localhost:9200", es_index_pattern="flights")

# Connecting to an Elastic Cloud instance
from elasticsearch import Elasticsearch

es = Elasticsearch(
    cloud_id="cluster-name:...",
    http_auth=("elastic", "<password>")
)
df = ed.DataFrame(es, es_index_pattern="flights")

DataFrames in Eland

eland.DataFrame wraps an Elasticsearch index in a Pandas-like API and defers all processing and filtering of data to Elasticsearch instead of your local machine. This means you can process large amounts of data within Elasticsearch from a Jupyter Notebook without overloading your machine.

Eland DataFrame API documentation

Advanced examples in a Jupyter Notebook

>>> import eland as ed

>>> # Connect to 'flights' index via localhost Elasticsearch node
>>> df = ed.DataFrame('localhost:9200', 'flights')

# eland.DataFrame instance has the same API as pandas.DataFrame
# except all data is in Elasticsearch. See .info() memory usage.
>>> df.head()
   AvgTicketPrice  Cancelled  ... dayOfWeek           timestamp
0      841.265642      False  ...         0 2018-01-01 00:00:00
1      882.982662      False  ...         0 2018-01-01 18:27:00
2      190.636904      False  ...         0 2018-01-01 17:11:14
3      181.694216       True  ...         0 2018-01-01 10:33:28
4      730.041778      False  ...         0 2018-01-01 05:13:00

[5 rows x 27 columns]

>>> df.info()
<class 'eland.dataframe.DataFrame'>
Index: 13059 entries, 0 to 13058
Data columns (total 27 columns):
 #   Column              Non-Null Count  Dtype         
---  ------              --------------  -----         
 0   AvgTicketPrice      13059 non-null  float64       
 1   Cancelled           13059 non-null  bool          
 2   Carrier             13059 non-null  object        
...      
 24  OriginWeather       13059 non-null  object        
 25  dayOfWeek           13059 non-null  int64         
 26  timestamp           13059 non-null  datetime64[ns]
dtypes: bool(2), datetime64[ns](1), float64(5), int64(2), object(17)
memory usage: 80.0 bytes
Elasticsearch storage usage: 5.043 MB

# Filtering of rows using comparisons
>>> df[(df.Carrier=="Kibana Airlines") & (df.AvgTicketPrice > 900.0) & (df.Cancelled == True)].head()
     AvgTicketPrice  Cancelled  ... dayOfWeek           timestamp
8        960.869736       True  ...         0 2018-01-01 12:09:35
26       975.812632       True  ...         0 2018-01-01 15:38:32
311      946.358410       True  ...         0 2018-01-01 11:51:12
651      975.383864       True  ...         2 2018-01-03 21:13:17
950      907.836523       True  ...         2 2018-01-03 05:14:51

[5 rows x 27 columns]

# Running aggregations across an index
>>> df[['DistanceKilometers', 'AvgTicketPrice']].aggregate(['sum', 'min', 'std'])
     DistanceKilometers  AvgTicketPrice
sum        9.261629e+07    8.204365e+06
min        0.000000e+00    1.000205e+02
std        4.578263e+03    2.663867e+02
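
Because filtering is deferred to Elasticsearch, a boolean mask like the one in the session above has to end up as a query. The sketch below builds one plausible Query DSL body for that filter by hand. It is an illustration only: the helper names are hypothetical, and the query Eland actually generates may be structured differently.

```python
import json


def term(field, value):
    """Exact-match clause, the counterpart of `df.field == value`."""
    return {"term": {field: value}}


def greater_than(field, value):
    """Range clause, the counterpart of `df.field > value`."""
    return {"range": {field: {"gt": value}}}


def all_of(*clauses):
    """Combine clauses with logical AND, the counterpart of `&`."""
    return {"bool": {"must": list(clauses)}}


# (df.Carrier == "Kibana Airlines") & (df.AvgTicketPrice > 900.0) & (df.Cancelled == True)
query = all_of(
    term("Carrier", "Kibana Airlines"),
    greater_than("AvgTicketPrice", 900.0),
    term("Cancelled", True),
)

print(json.dumps({"query": query}, indent=2))
```

The point of the design is that only the matching rows ever leave the cluster; the local process holds a small query description rather than the data.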

Machine Learning in Eland

Eland can serialize trained models from the scikit-learn, XGBoost, and LightGBM libraries and import them into Elasticsearch for use as inference models.

Eland Machine Learning API documentation

Read more about Machine Learning in Elasticsearch

>>> from xgboost import XGBClassifier
>>> from eland.ml import MLModel

# Train and exercise an XGBoost ML model locally
>>> xgb_model = XGBClassifier(booster="gbtree")
>>> xgb_model.fit(training_data[0], training_data[1])

>>> xgb_model.predict(training_data[0])
[0 1 1 0 1 0 0 0 1 0]

# Import the model into Elasticsearch
>>> es_model = MLModel.import_model(
    es_client="localhost:9200",
    model_id="xgb-classifier",
    model=xgb_model,
    feature_names=["f0", "f1", "f2", "f3", "f4"],
)

# Exercise the ML model in Elasticsearch with the training data
>>> es_model.predict(training_data[0])
[0 1 1 0 1 0 0 0 1 0]
Comments
  • Add `iterrows()` and `itertuples()` DataFrame API, its usage is similar to `pandas`

    Related to this issues: Can we get batch data use df.to_pandas() in the case of big data? close #345

    Add to_pandas_in_batch() DataFrame API

    Then we can use the code below to get the data as batches of DataFrames when the dataset is large:

    pd_df_iterator = ed_df.to_pandas_in_batch(batch_size=1000)
    for pd_df in pd_df_iterator:
        print(pd_df)
    

    If there is something wrong with the code, please give some suggestions. Thank you!

    opened by kxbin 25
  • Fix issues following update to pandas 1.0.1

    • Change _stringify_path to stringify_path (make it public)
    • Fix info() formatting
    • Update output in notebooks cells and doctest strings
    • Keep the original dtype for some aggregate functions (min, max) by adding an optional parameter (keep_domain)

    Closes #124

    opened by mesejo 20
  • Add `iterrows()` and `itertuples()` DataFrame API, its usage is similar to `pandas`

    I'm sorry about this: I wanted to squash multiple commits and accidentally modified the branch name, so the PR was closed automatically.

    For the earlier conversation, see PR #369.

    @sethmlarson Thanks for your comments. Based on these suggestions, I have completed the modifications, and the lint and docs jobs now pass.

    Thanks to this change in #379, no need to convert _es_results_to_pandas into a generator now, because they have similar effects. Now, the performance has been further improved and the logic is more concise.

    Finally, the same 50,000 data sets, my test results are as follows:

    ed.iterrows(),  It took a total of `19 seconds` after the iteration
    ed.itertuples(), It took a total of `15 seconds` after the iteration
    ed.to_pandas(), It took `15 seconds`
    

    Closes https://github.com/elastic/eland/issues/345

    opened by kxbin 19
  • Added support for 2 date formats:

    This PR addresses #22 by adding support for 2 date formats:

    1. epoch_millis
    2. epoch_second

    When the mappings are fetched, each date type will carry additional format information, e.g. date[epoch_millis] or date[epoch_second].

    This will later on be converted to pandas types according to their format: datetime64[ms], datetime64[s]
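
The two formats differ only in their unit, which a small standard-library sketch can make concrete. This is not the PR's implementation: `parse_epoch` is a hypothetical helper, and the format names are taken from the `date[...]` examples above.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def parse_epoch(value: int, es_format: str) -> datetime:
    """Convert an epoch value to a UTC datetime according to its date format."""
    if es_format == "epoch_millis":
        return EPOCH + timedelta(milliseconds=value)
    if es_format == "epoch_second":
        return EPOCH + timedelta(seconds=value)
    raise ValueError(f"unsupported date format: {es_format}")


# The same instant expressed in both formats
print(parse_epoch(1514808575000, "epoch_millis"))
print(parse_epoch(1514808575, "epoch_second"))
```

Both calls describe the same instant, which is why the format has to travel with the field type: the raw integers alone are ambiguous.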

    enhancement 
    opened by viglia 19
  • Add quantile to DataFrame and Series

    Closes #315

    • Functionality matching pandas for quantile is implemented, i.e. df.agg(['quantile', ...]), df.quantile(), and Series.quantile()
    • Added tests and documentation.

    @sethmlarson Please Review 😄

    P.S. Need to change commit message while merging to master from percentile to quantile 😃

    opened by V1NAY8 14
  • Refactor tests

    Changes made are:

    • Refactored eland/tests/* to test_eland/* and imports used in tests.
    • Updated noxfile.py
    • Added code snippets to contributing.rst
    • All tests (Including tests, doctests) were run and successful

    @sethmlarson Please review, Happy to make any changes required 😄

    opened by V1NAY8 14
  • [ML] adding inference results tests for pytorch transformer models

    This commit adds many integration tests for

    • downloading PyTorch models
    • Processing them locally prepping for Elasticsearch uploading
    • Uploading and deploying to Elasticsearch
    • Confirming inference results on the models.

    The tests cover all our current NLP tasks over only some of our supported models.

    The models chosen to test were chosen for the broadest coverage over our supported tokenization and model wrapped types.

    ci topic:ml 
    opened by benwtrent 13
  • Add nbval to CI and notebook examples

    Closes #88 The following changes are made:

    • Modified noxfile.py to include nbval tests which are executed when CI is run.
    • Added some notebook tests which are
      • # doctest: +SKIP tests in tests folder
      • A demo notebook of all basic functionality
      • ETL notebook which consists of tests for show_progress parameter
      • Metrics notebook which consists of median, mad => numeric_only tests

    @sethmlarson Please review and let me know if any tests are to be added or If existing ones are not necessary.

    • License headers are not included for .ipynb or .csv files. Is that acceptable?
    opened by V1NAY8 12
  • Handle datetime types in comparison filters

    Fixes issue #265, partially implements #236

    I implemented the handling of Python's datetime objects when comparing with series. The datetime objects will be converted to a string in the Elasticsearch format "strict_date_optional_time" (if there is a more suitable format, let me know).

    Todos:

    • [x] Handle datetime objects for less, less-equal, greater, greater-equal, equal and not-equal comparison
    • [x] Handle np.datetime64 objects for less, less-equal, greater, greater-equal, equal and not-equal comparison
    • [x] Write tests

    Tell me if I missed some types that should be supported as well!

    If this PR is helpful, I would appreciate this PR to be labeled with "hacktoberfest-accepted" (:

    Cheers :v:

    hacktoberfest-accepted 
    opened by Fju 12
  • Switch agg defaults to numeric_only=None

    Closes #254

    • Added numeric_only = True | False | None for all agg and aggs (except nunique)
    • Added logic where booleans are not supported by median_absolute_deviation
    • Added tests, doctests for every change made, also modified existing tests to work with these changes.
    • Nox and pytest sessions are successful.
    opened by V1NAY8 12
  • Add mode to dataframe and series

    Closes #215

    • Added mode to dataframe and series

    • Added tests, documentation

    • Currently mode isn't supported by pandas groupby too, Hence added a NotImplementedError

    • To satisfy mypy I had to add type hints to some other methods too.

    • Implemented es_size parameter instead of dumping the entire cluster if we have multiple mode values.

    • Currently only dropna=True is supported, because I have a question about the missing parameter of the terms aggregation (Reference)

    @sethmlarson Please review 😃

    opened by V1NAY8 10
  • Remind users to sync saved objects in order for training model to show up in Kibana

    We should consider automatically synchronizing Kibana saved objects after using eland_import_hub_model to import a training model to Elasticsearch, or add an output message to the docker run -it --rm --network host elastic/eland eland_import_hub_model script at the end of the process to instruct the user to either call the Kibana API or click on Synchronize saved objects from the ML UI to manually sync the objects in order for the training model to show up.

    This will improve the user experience when using the import script.

    opened by ppf2 0
  •  How to download only desired entry rows from Eland?

    Currently, if I want to create a dataframe with specific entries from an index, using Eland, I must first download a dataframe with all the columns, and then filter them out, locally... this is hardly desirable in terms of CPU and RAM local usage. Is there a way to push the filtering to the ES nodes instead, and simply download the already filtered dataframe?

    opened by IavTavares 0
  • pydata-sphinx-theme has broken our docs

    Some options:

    • Pin the pydata-sphinx-theme version
    • Change to a theme that's stable
    • Move all docs to elastic.co (this is what I'd like to do long term)
    opened by sethmlarson 0
  • Document Rust requirement

    Some users of the NLP features have reported that Rust is required to install eland. The dependency comes from the fast tokenizers in Hugging Face transformers.

    Closes #495

    documentation 
    opened by davidkyle 0
  • Include pitfall of `--start` in the README

    Users who follow the Eland README as a guide to importing models can easily end up seeing inexplicably poor performance due to unknowingly running the model with one allocation and one thread per allocation.

    This change spells out the effect of --start and links to alternatives that allow better use of available hardware.

    documentation 
    opened by droberts195 0
Releases(v8.3.0)
  • v8.3.0(Jul 11, 2022)

    Added

    • Added a new NLP model task type "auto" which infers the task type based on model configuration and architecture (#475)

    Changed

    • Changed required version of 'torch' package to >=1.11.0,<1.12 to match required PyTorch version for Elasticsearch 8.3 (was >=1.9.0,<2) (#479)
    • Changed the default value of the --task-type parameter for the eland_import_hub_model CLI to be "auto" (#475)

    Fixed

    • Fixed decision tree classifier serialization to account for probabilities (#465)
    • Fixed PyTorch model quantization (#472)
  • v8.2.0(May 11, 2022)

  • v8.1.0(Mar 31, 2022)

  • v8.0.0(Feb 10, 2022)

    Added

    • Added support for Natural Language Processing (NLP) models using PyTorch (#394)
    • Added new extra eland[pytorch] for installing all dependencies needed for PyTorch (#394)
    • Added a CLI script eland_import_hub_model for uploading HuggingFace models to Elasticsearch (#403)
    • Added support for v8.0 of the Python Elasticsearch client (#415)
    • Added a warning if Eland detects it's communicating with an incompatible Elasticsearch version (#419)
    • Added support for number_samples to LightGBM and Scikit-Learn models (#397, contributed by @V1NAY8)
    • Added ability to use datetime types for filtering dataframes (#284, contributed by @Fju)
    • Added pandas datetime64 type to use the Elasticsearch date type (#425, contributed by @Ashton-Sidhu)
    • Added es_verify_mapping_compatibility parameter to disable schema enforcement with pandas_to_eland (#423, contributed by @Ashton-Sidhu)

    Changed

    • Changed to_pandas() to only use Point-in-Time and search_after instead of using Scroll APIs for pagination.
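
The idea behind search_after pagination can be shown without a cluster. The sketch below simulates it over an in-memory list: each "request" returns the next page of hits that sort after the last hit already seen. The function names and the `sort_key` field are hypothetical, and it assumes sort keys are unique; against a real cluster, a Point-in-Time would additionally keep the view of the index stable between requests.

```python
def search_page(docs, size, search_after=None):
    """Stand-in for one search request: sorted hits after the cursor, at most `size`."""
    ordered = sorted(docs, key=lambda doc: doc["sort_key"])
    if search_after is not None:
        ordered = [doc for doc in ordered if doc["sort_key"] > search_after]
    return ordered[:size]


def iterate_pages(docs, size):
    """Yield pages until a request comes back empty."""
    cursor = None
    while True:
        page = search_page(docs, size, cursor)
        if not page:
            break
        yield page
        # The sort value of the last hit becomes the cursor for the next request
        cursor = page[-1]["sort_key"]


docs = [{"sort_key": i, "value": i * i} for i in range(7)]
pages = list(iterate_pages(docs, size=3))
print([len(page) for page in pages])  # [3, 3, 1]
```

Unlike a scroll, the cursor here is just a value carried by the caller, so no per-request search context has to be kept alive between pages.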
  • v8.0.0b1(Dec 16, 2021)

    Added

    • Added support for Natural Language Processing (NLP) models using PyTorch (https://github.com/elastic/eland/pull/394)
    • Added new extra eland[pytorch] for installing all dependencies needed for PyTorch (https://github.com/elastic/eland/pull/394)
    • Added a CLI script eland_import_hub_model for uploading HuggingFace models to Elasticsearch (https://github.com/elastic/eland/pull/403)
    • Added support for v8.0 of the Python Elasticsearch client (https://github.com/elastic/eland/pull/415)
    • Added a warning if Eland detects it's communicating with an incompatible Elasticsearch version (https://github.com/elastic/eland/pull/419)
    • Added support for number_samples to LightGBM and Scikit-Learn models (https://github.com/elastic/eland/pull/397, contributed by @V1NAY8)

    Changed

    • Changed to_pandas() to only use Point-in-Time and search_after instead of using Scroll APIs for pagination.
  • v7.14.1b1(Aug 30, 2021)

  • v7.14.0b1(Aug 9, 2021)

    Added

    • Added support for Pandas 1.3.x (#362, contributed by @V1NAY8)
    • Added support for LightGBM 3.x (#362, contributed by @V1NAY8)
    • Added DataFrame.idxmax() and DataFrame.idxmin() methods (#353, contributed by @V1NAY8)
    • Added type hints to eland.ndframe and eland.operations (#366, contributed by @V1NAY8)

    Removed

    • Removed support for Pandas <1.2 (#364)
    • Removed support for Python 3.6 to match Pandas (#364)

    Changed

    • Changed paginated search function to use Point-in-Time and Search After features instead of Scroll when connected to Elasticsearch 7.12+ (#370 and #376, contributed by @V1NAY8)
    • Optimized the FieldMappings.aggregate_field_name() method (#373, contributed by @V1NAY8)
  • v7.13.0b1(Jun 22, 2021)

    Added

    • Added DataFrame.quantile(), Series.quantile(), and DataFrameGroupBy.quantile() aggregations (#318 and #356, contributed by @V1NAY8)

    Changed

    • Changed the error raised when es_index_pattern doesn't point to any indices to be more user-friendly (#346)

    Fixed

    • Fixed a warning about conflicting field types when wildcards are used in es_index_pattern (#346)
    • Fixed sorting when using DataFrame.groupby() with dropna (#322, contributed by @V1NAY8)
    • Fixed deprecated usage of numpy.int in favor of numpy.int_ (#354, contributed by @V1NAY8)
  • 7.10.1b1(Jan 12, 2021)

    Added

    • Added support for Pandas 1.2.0 (#336)

    • Added DataFrame.mode() and Series.mode() aggregation (#323, contributed by @V1NAY8)

    • Added support for pd.set_option("display.max_rows", None) (#308, contributed by @V1NAY8)

    • Added Elasticsearch storage usage to df.info() (#321, contributed by @V1NAY8)

    Removed

    • Removed deprecated aliases read_es, read_csv, DataFrame.info_es, and MLModel(overwrite=True) (#331, contributed by @V1NAY8)
  • 7.10.0b1(Oct 29, 2020)

    Added

    • Added DataFrame.groupby() method with all aggregations (#278, #291, #292, #300 contributed by @V1NAY8)

    • Added es_match() method to DataFrame and Series for filtering rows with full-text search (#301)

    • Added support for type hints of the elasticsearch-py package (#295)

    • Added support for passing dictionaries to es_type_overrides parameter in the pandas_to_eland() function to directly control the field mapping generated in Elasticsearch (#310)

    • Added es_dtypes property to DataFrame and Series (#285)

    Changed

    • Changed pandas_to_eland() to use the parallel_bulk() helper instead of single-threaded bulk() helper to improve performance (#279, contributed by @V1NAY8)

    • Changed the es_type_overrides parameter in pandas_to_eland() to raise ValueError if an unknown column is given (#302)

    • Changed DataFrame.filter() to preserve the order of items (#283, contributed by @V1NAY8)

    • Changed pandas_to_eland() so that setting es_type_overrides={"column": "text"} automatically adds the column.keyword sub-field, making aggregations available for the field as well (#310)
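
A text field with a keyword sub-field is a common Elasticsearch mapping pattern: the text field serves full-text search, while the keyword sub-field holds the exact value that aggregations need. The sketch below shows what such a mapping could look like. The helper names are hypothetical and the mapping Eland actually generates may contain additional settings.

```python
def text_with_keyword_mapping():
    """Mapping for a full-text field that also has an exact-value sub-field."""
    return {
        "type": "text",
        "fields": {"keyword": {"type": "keyword"}},
    }


def build_mappings(overrides):
    """Build an index mapping body from a {column: es_type} dictionary."""
    properties = {}
    for column, es_type in overrides.items():
        if es_type == "text":
            properties[column] = text_with_keyword_mapping()
        else:
            properties[column] = {"type": es_type}
    return {"mappings": {"properties": properties}}


mappings = build_mappings({"Carrier": "text", "AvgTicketPrice": "float"})
print(mappings)
```

With a mapping like this, an aggregation would target `Carrier.keyword` while a full-text query would target `Carrier`.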

    Fixed

    • Fixed Series.__repr__ when the series is empty (#306)
  • 7.9.1a1(Sep 30, 2020)

    Added

    • Added the predict() method and model_type, feature_names, and results_field properties to MLModel (#266)

    Deprecated

    • Deprecated ImportedMLModel in favor of MLModel.import_model(...) (#266)

    Changed

    • Changed DataFrame aggregations to use numeric_only=None instead of numeric_only=True by default. This is the same behavior as Pandas (#270, contributed by @V1NAY8)

    Fixed

    • Fixed DataFrame.agg() to properly return a Series instead of a DataFrame when given a string instead of a list of aggregations (#263, contributed by @V1NAY8)
  • 7.9.0a1(Aug 18, 2020)

    7.9.0a1 (2020-08-18)

    Added

    • Added support for Pandas v1.1 (#253)
    • Added support for LightGBM LGBMRegressor and LGBMClassifier to ImportedMLModel (#247, #252)
    • Added support for multi:softmax and multi:softprob XGBoost operators to ImportedMLModel (#246)
    • Added column names to DataFrame.__dir__() for better auto-completion support (#223, contributed by @leonardbinet)
    • Added support for es_if_exists='append' to pandas_to_eland() (#217)
    • Added support for aggregating datetimes with nunique and mean (#253)
    • Added es_compress_model_definition parameter to ImportedMLModel constructor (#220)
    • Added .size and .ndim properties to DataFrame and Series (#231 and #233)
    • Added .dtype property to Series (#258)
    • Added support for using pandas.Series with Series.isin() (#231)
    • Added type hints to many APIs in DataFrame and Series (#231)

    Deprecated

    • Deprecated the overwrite parameter in favor of es_if_exists in ImportedMLModel constructor (#249, contributed by @V1NAY8)

    Changed

    • Changed aggregations for datetimes to be higher precision when available (#253)

    Fixed

    • Fixed ImportedMLModel.predict() to fail when errors are present in the ingest.simulate response (#220)
    • Fixed Series.median() aggregation to return a scalar instead of pandas.Series (#253)
    • Fixed Series.describe() to return a pandas.Series instead of pandas.DataFrame (#258)
    • Fixed DataFrame.mean() and Series.mean() dtype (#258)
    • Fixed DataFrame.agg() aggregations when using extended_stats Elasticsearch aggregation (#253)
  • 7.7.0a1(Aug 12, 2020)

    7.7.0a1 (2020-05-20)

    Added

    • Added the package to Conda Forge, install via conda install -c conda-forge eland (#209)
    • Added DataFrame.sample() and Series.sample() for querying a random sample of data from the index (#196, contributed by @mesejo)
    • Added Series.isna() and Series.notna() for filtering out missing, NaN or null values from a column (#210, contributed by @mesejo)
    • Added DataFrame.filter() and Series.filter() for reducing an axis using a sequence of items or a pattern (#212)
    • Added DataFrame.to_pandas() and Series.to_pandas() for converting an Eland dataframe or series into a Pandas dataframe or series inline (#208)
    • Added support for XGBoost v1.0.0 (#200)

    Deprecated

    • Deprecated info_es() in favor of es_info() (#208)
    • Deprecated eland.read_csv() in favor of eland.csv_to_eland() (#208)
    • Deprecated eland.read_es() in favor of eland.DataFrame() (#208)

    Changed

    • Changed var and std aggregations to use sample instead of population in line with Pandas (#185)
    • Changed painless scripts to use source rather than inline to improve script caching performance (#191, contributed by @mesejo)
    • Changed minimum elasticsearch Python library version to v7.7.0 (#207)
    • Changed name of Index.field_name to Index.es_field_name (#208)
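
The difference between sample and population statistics mentioned in the first item of this list is the divisor: the population variance divides by n, the sample variance by n - 1, which is also the pandas default (ddof=1). A small standard-library illustration, using made-up values:

```python
import statistics

# Illustrative values only; they are not taken from the flights index
values = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

population_std = statistics.pstdev(values)  # divides by n
sample_std = statistics.stdev(values)       # divides by n - 1

print(f"population std: {population_std:.4f}")
print(f"sample std:     {sample_std:.4f}")
```

The sample figure is always the larger of the two, and the gap shrinks as the number of values grows, so the change matters most for small result sets.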

    Fixed

    • Fixed DeprecationWarning raised from pandas.Series when an empty series was created without specifying dtype (#188, contributed by @mesejo)
    • Fixed a bug when filtering columns on complex combinations of and and or (#204)
    • Fixed an issue where DataFrame.shape would return a larger value than in the index if a sized operation like .head(X) was applied to the data frame (#205, contributed by @mesejo)
    • Fixed issue where both scikit-learn and xgboost libraries were required to use eland.ml.ImportedMLModel, now only one library is required to use this feature (#206)
  • 7.6.0a5(Aug 12, 2020)

    7.6.0a5 (2020-04-14)

    Added

    • Added support for Pandas v1.0.0 (#141, contributed by @mesejo)
    • Added use_pandas_index_for_es_ids parameter to pandas_to_eland() (#154)
    • Added es_type_overrides parameter to pandas_to_eland() (#181)
    • Added NDFrame.var(), .std() and .median() aggregations (#175, #176, contributed by @mesejo)
    • Added DataFrame.es_query() to allow modifying ES queries directly (#156)
    • Added eland.__version__ (#153, contributed by @mesejo)

    Changed

    • Changed ML model serialization to be slightly smaller (#159)
    • Changed minimum elasticsearch Python library version to v7.6.0 (#181)

    Fixed

    • Fixed inference_config being required on ML models for ES >=7.8 (#174)
    • Fixed unpacking for DataFrame.aggregate("median") (#161)

    Removed

    • Removed support for Python 3.5 (#150)
    • Removed eland.Client() interface, use elasticsearch.Elasticsearch() client instead (#166)
    • Removed all private objects from top-level eland namespace (#170)
    • Removed geo_points from pandas_to_eland() in favor of es_type_overrides (#181)
  • 7.6.0a4(Aug 12, 2020)
