Changelog

8.15.4 (2024-10-17)

  • Revert "Allow reading Elasticsearch certs in Wolfi image" (#734)

8.15.3 (2024-10-09)

  • Added support for DeBERTa-V2 tokenizer (#717)
  • Fixed --ca-cert with a shared Elasticsearch Docker volume (#732)

8.15.2 (2024-10-02)

  • Fixed Docker image build (#728)

8.15.1 (2024-10-01)

  • Upgraded PyTorch to version 2.3.1, which is compatible with Elasticsearch 8.15.2 or above (#718)
  • Migrated to distroless Wolfi base Docker image (#720)

8.15.0 (2024-08-12)

  • Added a default truncation setting of "second" for text similarity (#713)
  • Added note about using text_similarity for rerank in the CLI (#716)
  • Added support for lists in result hits (#707)
  • Removed input fields from exported LTR models (#708)

8.14.0 (2024-06-10)

Added

  • Added Elasticsearch Serverless support in DataFrames (#690, contributed by @AshokChoudhary11) and eland_import_hub_model (#698)

Fixed

  • Fixed Python 3.8 support (#695, contributed by @bartbroere)
  • Fixed non _source fields missing from the results hits (#693, contributed by @bartbroere)

8.13.1 (2024-05-03)

Added

  • Added support for HTTP proxies in eland_import_hub_model (#688)

8.13.0 (2024-03-27)

Added

  • Added support for Python 3.11 (#681)
  • Added eland.DataFrame.to_json function (#661, contributed by @bartbroere)
  • Added override option to specify the model's max input size (#674)

Changed

  • Upgraded torch to 2.1.2 (#671)
  • Mirrored pandas' lineterminator instead of line_terminator in to_csv (#595, contributed by @bartbroere)

8.12.1 (2024-01-30)

Fixed

  • Fix missing value support for XGBRanker (#654)

8.12.0 (2024-01-18)

Added

  • Supported XGBRanker model (#649)
  • Accepted LTR (Learning to rank) model config when importing model (#645, #651)
  • Added LTR feature logger (#648)
  • Added prefix_string config option to the import model hub script (#642)
  • Made online retail analysis notebook runnable in Colab (#641)
  • Added new movie dataset to the tests (#646)

8.11.1 (2023-11-22)

Added

  • Make demo notebook runnable in Colab (#630)

Changed

  • Bump Shap version to 0.43 (#636)

Fixed

  • Fix failed import of Sentence Transformer RoBERTa models (#637)

8.11.0 (2023-11-08)

Added

  • Support E5 small multilingual model (#625)

Changed

  • Stream writes in ed.DataFrame.to_csv() (#579)
  • Improve memory estimation for NLP models (#568)

Fixed

  • Fixed deprecations in preparation of Pandas 2.0 support (#602, #603, contributed by @bartbroere)

8.10.1 (2023-10-11)

Fixed

  • Fixed direct usage of TransformerModel (#619)

8.10.0 (2023-10-09)

Added

  • Published pre-built Docker images to docker.elastic.co/eland/eland (#613)
  • Allowed importing private HuggingFace models (#608)
  • Added Apple Silicon (arm64) support to Docker image (#615)
  • Allowed importing some DPR models like ance-dpr-context-multi (#573)
  • Allowed using the Pandas API without monitoring/main permissions (#581)

Changed

  • Updated Docker image to Debian 12 Bookworm (#613)
  • Reduced Docker image size by not installing unused PyTorch GPU support on amd64 (#615)
  • Reduced model chunk size to 1MB (#605)
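
The chunk-size change above concerns how a serialized model is split into parts before upload. A minimal sketch of fixed-size chunking follows. It assumes "1MB" means 1024 * 1024 bytes, and `iter_chunks` is a hypothetical helper written for illustration, not Eland's actual upload code.

```python
from typing import Iterator

# Assumption: "1MB" is read here as 1024 * 1024 bytes.
CHUNK_SIZE = 1024 * 1024


def iter_chunks(data: bytes, chunk_size: int = CHUNK_SIZE) -> Iterator[bytes]:
    """Yield consecutive slices of `data`, each at most `chunk_size` bytes long."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for start in range(0, len(data), chunk_size):
        yield data[start:start + chunk_size]


# A 2.5 MB payload splits into two full chunks and one partial chunk.
payload = bytes(CHUNK_SIZE * 2 + CHUNK_SIZE // 2)
chunk_lengths = [len(chunk) for chunk in iter_chunks(payload)]
```

Smaller chunks mean more requests per model but a smaller body per request.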

Fixed

  • Fixed deprecations in preparation of Pandas 2.0 support (#593, #596, contributed by @bartbroere)

8.9.0 (2023-08-24)

Added

  • Simplify embedding model support and loading (#569)
  • Make eland_import_hub_model easier to find on Windows (#559)
  • Update trained model inference endpoint (#556)
  • Add BertJapaneseTokenizer support with bert_ja tokenization configuration (#534)
  • Add ability to upload xlm-roberta tokenized models (#518)
  • Tolerate different model output formats when measuring embedding size (#535)
  • Generate valid NLP model id from file path (#541)
  • Upgrade torch to 1.13.1 and check the cluster version before uploading an NLP model (#522)
  • Set embedding_size config parameter for Text Embedding models (#532)
  • Add support for the pass_through task (#526)

Fixed

  • Fixed black to comply with the code style (#557)
  • Fixed No module named 'torch' (#553)
  • Fix autosummary directive by removing hack autosummaries (#548)
  • Prevent TypeError with None check (#525)

8.7.0 (2023-03-30)

Added

  • Added a new NLP model task type "text_similarity" (#486)
  • Added a new NLP model task type "text_expansion" (#520)
  • Added support for exporting an Elastic ML model as a scikit-learn pipeline via MLModel.export_model() (#509)

Fixed

  • Fixed an issue that occurred when LightGBM was installed but libomp wasn't installed on the system (#499)

8.3.0 (2022-07-11)

Added

  • Added a new NLP model task type "auto" which infers the task type based on model configuration and architecture (#475)

Changed

  • Changed required version of 'torch' package to >=1.11.0,<1.12 to match required PyTorch version for Elasticsearch 8.3 (was >=1.9.0,<2) (#479)
  • Changed the default value of the --task-type parameter for the eland_import_hub_model CLI to be "auto" (#475)

Fixed

  • Fixed decision tree classifier serialization to account for probabilities (#465)
  • Fixed PyTorch model quantization (#472)

8.2.0 (2022-05-09)

Added

  • Added support for passing Cloud ID via --cloud-id to eland_import_hub_model CLI tool (#462)
  • Added support for authenticating via --es-username, --es-password, and --es-api-key to the eland_import_hub_model CLI tool (#461)
  • Added support for XGBoost 1.6 (#458)
  • Added support for question_answering NLP tasks (#457)

8.1.0 (2022-03-31)

Added

  • Added support for eland.Series.unique() (#448, contributed by @V1NAY8)
  • Added --ca-certs and --insecure options to eland_import_hub_model for configuring TLS (#441)

8.0.0 (2022-02-10)

Added

  • Added support for Natural Language Processing (NLP) models using PyTorch (#394)
  • Added new extra eland[pytorch] for installing all dependencies needed for PyTorch (#394)
  • Added a CLI script eland_import_hub_model for uploading HuggingFace models to Elasticsearch (#403)
  • Added support for v8.0 of the Python Elasticsearch client (#415)
  • Added a warning if Eland detects it's communicating with an incompatible Elasticsearch version (#419)
  • Added support for number_samples to LightGBM and Scikit-Learn models (#397, contributed by @V1NAY8)
  • Added ability to use datetime types for filtering dataframes (#284, contributed by @Fju)
  • Added pandas datetime64 type to use the Elasticsearch date type (#425, contributed by @Ashton-Sidhu)
  • Added es_verify_mapping_compatibility parameter to disable schema enforcement with pandas_to_eland (#423, contributed by @Ashton-Sidhu)

Changed

  • Changed to_pandas() to only use Point-in-Time and search_after instead of using Scroll APIs for pagination.
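
The search_after style of pagination resumes each page from the sort values of the last hit on the previous page, instead of holding a scroll context open. The sketch below shows only that control flow. `paginate` and `fake_search_page` are hypothetical names, the in-memory `DOCS` list stands in for an index, and the Point-in-Time handle is left out for brevity. This is not Eland's implementation.

```python
from typing import Any, Callable, Dict, Iterator, List, Optional

Hit = Dict[str, Any]
SearchPage = Callable[[Optional[List[Any]], int], List[Hit]]


def paginate(search_page: SearchPage, batch_size: int = 2) -> Iterator[List[Hit]]:
    """Yield batches of hits, resuming each page after the last hit's sort values."""
    search_after: Optional[List[Any]] = None
    while True:
        hits = search_page(search_after, batch_size)
        if not hits:
            return  # an empty page means everything has been read
        yield hits
        search_after = hits[-1]["sort"]


# Hypothetical stand-in for a search call: five documents sorted by an integer key.
DOCS: List[Hit] = [{"_id": str(i), "sort": [i]} for i in range(5)]


def fake_search_page(search_after: Optional[List[Any]], size: int) -> List[Hit]:
    start = 0 if search_after is None else search_after[0] + 1
    return DOCS[start:start + size]


batches = list(paginate(fake_search_page))
```

With a real cluster, the sort values come back on each hit and are passed unchanged as the `search_after` value of the next request. The Point-in-Time handle keeps the view of the index consistent between pages.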

7.14.1b1 (2021-08-30)

Added

  • Added support for DataFrame.iterrows() and DataFrame.itertuples() (#380, contributed by @kxbin)

Performance

  • Simplified result collectors to increase performance transforming Elasticsearch results to pandas (#378, contributed by @V1NAY8)
  • Changed search pagination function to yield batches of hits (#379)

7.14.0b1 (2021-08-09)

Added

  • Added support for Pandas 1.3.x (#362, contributed by @V1NAY8)
  • Added support for LightGBM 3.x (#362, contributed by @V1NAY8)
  • Added DataFrame.idxmax() and DataFrame.idxmin() methods (#353, contributed by @V1NAY8)
  • Added type hints to eland.ndframe and eland.operations (#366, contributed by @V1NAY8)

Removed

  • Removed support for Pandas <1.2 (#364)
  • Removed support for Python 3.6 to match Pandas (#364)

Changed

  • Changed paginated search function to use Point-in-Time and Search After features instead of Scroll when connected to Elasticsearch 7.12+ (#370 and #376, contributed by @V1NAY8)
  • Optimized the FieldMappings.aggregate_field_name() method (#373, contributed by @V1NAY8)

7.13.0b1 (2021-06-22)

Added

  • Added DataFrame.quantile(), Series.quantile(), and DataFrameGroupBy.quantile() aggregations (#318 and #356, contributed by @V1NAY8)

Changed

  • Changed the error raised when es_index_pattern doesn't point to any indices to be more user-friendly (#346)

Fixed

  • Fixed a warning about conflicting field types when wildcards are used in es_index_pattern (#346)
  • Fixed sorting when using DataFrame.groupby() with dropna (#322, contributed by @V1NAY8)
  • Fixed deprecated usage of numpy.int in favor of numpy.int_ (#354, contributed by @V1NAY8)

7.10.1b1 (2021-01-12)

Added

  • Added support for Pandas 1.2.0 (#336)
  • Added DataFrame.mode() and Series.mode() aggregation (#323, contributed by @V1NAY8)
  • Added support for pd.set_option("display.max_rows", None) (#308, contributed by @V1NAY8)
  • Added Elasticsearch storage usage to df.info() (#321, contributed by @V1NAY8)

Removed

  • Removed deprecated aliases read_es, read_csv, DataFrame.info_es, and MLModel(overwrite=True) (#331, contributed by @V1NAY8)

7.10.0b1 (2020-10-29)

Added

  • Added DataFrame.groupby() method with all aggregations (#278, #291, #292, #300, contributed by @V1NAY8)
  • Added es_match() method to DataFrame and Series for filtering rows with full-text search (#301)
  • Added support for type hints of the elasticsearch-py package (#295)
  • Added support for passing dictionaries to es_type_overrides parameter in the pandas_to_eland() function to directly control the field mapping generated in Elasticsearch (#310)
  • Added es_dtypes property to DataFrame and Series (#285)

Changed

  • Changed pandas_to_eland() to use the parallel_bulk() helper instead of single-threaded bulk() helper to improve performance (#279, contributed by @V1NAY8)
  • Changed the es_type_overrides parameter in pandas_to_eland() to raise ValueError if an unknown column is given (#302)
  • Changed DataFrame.filter() to preserve the order of items (#283, contributed by @V1NAY8)
  • Changed pandas_to_eland() so that setting es_type_overrides={"column": "text"} automatically adds the column.keyword sub-field, making aggregations available for the field as well (#310)
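
To illustrate the text/keyword behavior described above, here is a sketch of how type overrides could be expanded into field mappings. `expand_overrides` is a hypothetical helper. The multi-field shape is the common Elasticsearch pattern for a text field with a keyword sub-field; the changelog does not spell out the exact mapping Eland generates, so treat the output shape as an assumption.

```python
from typing import Any, Dict, Union

Override = Union[str, Dict[str, Any]]


def expand_overrides(overrides: Dict[str, Override]) -> Dict[str, Dict[str, Any]]:
    """Turn per-column type overrides into Elasticsearch-style field mappings."""
    properties: Dict[str, Dict[str, Any]] = {}
    for column, es_type in overrides.items():
        if isinstance(es_type, dict):
            # A dictionary override is taken as the full field mapping.
            properties[column] = es_type
        elif es_type == "text":
            # Assumed shape: a text field plus a keyword sub-field, so the
            # column can be aggregated on through "<column>.keyword".
            properties[column] = {
                "type": "text",
                "fields": {"keyword": {"type": "keyword"}},
            }
        else:
            properties[column] = {"type": es_type}
    return properties


mapping = expand_overrides({"description": "text", "location": "geo_point"})
```

Text fields are analyzed for full-text search, so aggregations usually go through an unanalyzed keyword sub-field, which is why the sub-field is added automatically.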

Fixed

  • Fixed Series.__repr__ when the series is empty (#306)

7.9.1a1 (2020-09-29)

Added

  • Added the predict() method and model_type, feature_names, and results_field properties to MLModel (#266)

Deprecated

  • Deprecated ImportedMLModel in favor of MLModel.import_model(...) (#266)

Changed

  • Changed DataFrame aggregations to use numeric_only=None instead of numeric_only=True by default. This is the same behavior as Pandas (#270, contributed by @V1NAY8)

Fixed

  • Fixed DataFrame.agg() to return a Series instead of a DataFrame when given a string instead of a list of aggregations (#263, contributed by @V1NAY8)

7.9.0a1 (2020-08-18)

Added

  • Added support for Pandas v1.1 (#253)
  • Added support for LightGBM LGBMRegressor and LGBMClassifier to ImportedMLModel (#247, #252)
  • Added support for multi:softmax and multi:softprob XGBoost operators to ImportedMLModel (#246)
  • Added column names to DataFrame.__dir__() for better auto-completion support (#223, contributed by @leonardbinet)
  • Added support for es_if_exists='append' to pandas_to_eland() (#217)
  • Added support for aggregating datetimes with nunique and mean (#253)
  • Added es_compress_model_definition parameter to ImportedMLModel constructor (#220)
  • Added .size and .ndim properties to DataFrame and Series (#231 and #233)
  • Added .dtype property to Series (#258)
  • Added support for using pandas.Series with Series.isin() (#231)
  • Added type hints to many APIs in DataFrame and Series (#231)

Deprecated

  • Deprecated the overwrite parameter in favor of es_if_exists in ImportedMLModel constructor (#249, contributed by @V1NAY8)

Changed

  • Changed aggregations for datetimes to be higher precision when available (#253)

Fixed

  • Fixed ImportedMLModel.predict() to fail when errors are present in the ingest.simulate response (#220)
  • Fixed Series.median() aggregation to return a scalar instead of pandas.Series (#253)
  • Fixed Series.describe() to return a pandas.Series instead of pandas.DataFrame (#258)
  • Fixed DataFrame.mean() and Series.mean() dtype (#258)
  • Fixed DataFrame.agg() aggregations when using extended_stats Elasticsearch aggregation (#253)

7.7.0a1 (2020-05-20)

Added

  • Added the package to Conda Forge, install via conda install -c conda-forge eland (#209)
  • Added DataFrame.sample() and Series.sample() for querying a random sample of data from the index (#196, contributed by @mesejo)
  • Added Series.isna() and Series.notna() for filtering out missing, NaN or null values from a column (#210, contributed by @mesejo)
  • Added DataFrame.filter() and Series.filter() for reducing an axis using a sequence of items or a pattern (#212)
  • Added DataFrame.to_pandas() and Series.to_pandas() for converting an Eland dataframe or series into a Pandas dataframe or series inline (#208)
  • Added support for XGBoost v1.0.0 (#200)

Deprecated

  • Deprecated info_es() in favor of es_info() (#208)
  • Deprecated eland.read_csv() in favor of eland.csv_to_eland() (#208)
  • Deprecated eland.read_es() in favor of eland.DataFrame() (#208)

Changed

  • Changed var and std aggregations to use sample instead of population in line with Pandas (#185)
  • Changed painless scripts to use source rather than inline to improve script caching performance (#191, contributed by @mesejo)
  • Changed minimum elasticsearch Python library version to v7.7.0 (#207)
  • Changed name of Index.field_name to Index.es_field_name (#208)
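
The var and std change above comes down to the divisor: population variance divides by N, while sample variance divides by N - 1, which is what pandas does by default (ddof=1). The sketch below shows the difference with only the standard library. The input values are made up for illustration.

```python
import math
import statistics

values = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

# Population statistics divide the summed squared deviations by N.
population_var = statistics.pvariance(values)
population_std = statistics.pstdev(values)

# Sample statistics divide by N - 1, matching the pandas default.
sample_var = statistics.variance(values)
sample_std = statistics.stdev(values)

# The two variances differ by a factor of N / (N - 1).
n = len(values)
ratio = sample_var / population_var
assert math.isclose(ratio, n / (n - 1))
```

The gap matters most for small result sets. As N grows, N / (N - 1) approaches 1 and the two estimates converge.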

Fixed

  • Fixed DeprecationWarning raised from pandas.Series when an empty series was created without specifying dtype (#188, contributed by @mesejo)
  • Fixed a bug when filtering columns on complex combinations of and and or (#204)
  • Fixed an issue where DataFrame.shape would return a larger value than in the index if a sized operation like .head(X) was applied to the data frame (#205, contributed by @mesejo)
  • Fixed issue where both scikit-learn and xgboost libraries were required to use eland.ml.ImportedMLModel, now only one library is required to use this feature (#206)

7.6.0a5 (2020-04-14)

Added

  • Added support for Pandas v1.0.0 (#141, contributed by @mesejo)
  • Added use_pandas_index_for_es_ids parameter to pandas_to_eland() (#154)
  • Added es_type_overrides parameter to pandas_to_eland() (#181)
  • Added NDFrame.var(), .std() and .median() aggregations (#175, #176, contributed by @mesejo)
  • Added DataFrame.es_query() to allow modifying ES queries directly (#156)
  • Added eland.__version__ (#153, contributed by @mesejo)

Removed

  • Removed support for Python 3.5 (#150)
  • Removed eland.Client() interface, use elasticsearch.Elasticsearch() client instead (#166)
  • Removed all private objects from top-level eland namespace (#170)
  • Removed geo_points from pandas_to_eland() in favor of es_type_overrides (#181)

Changed

  • Changed ML model serialization to be slightly smaller (#159)
  • Changed minimum elasticsearch Python library version to v7.6.0 (#181)

Fixed

  • Fixed inference_config being required on ML models for ES >=7.8 (#174)
  • Fixed unpacking for DataFrame.aggregate("median") (#161)

7.6.0a4 (2020-03-23)

Changed

  • Changed requirement for xgboost from >=0.90 to ==0.90

Fixed

  • Fixed issue in DataFrame.info() when called on an empty frame (#135)
  • Fixed issues where many _source fields would generate a too_long_frame error (#135, #137)