- Revert "Allow reading Elasticsearch certs in Wolfi image" (#734)
- Added support for DeBERTa-V2 tokenizer (#717)
- Fixed `--ca-cert` with a shared Elasticsearch Docker volume (#732)
- Fixed Docker image build (#728)
- Upgraded PyTorch to version 2.3.1, which is compatible with Elasticsearch 8.15.2 or above (#718)
- Migrated to distroless Wolfi base Docker image (#720)
- Added a default truncation of `second` for text similarity (#713)
- Added note about using text_similarity for rerank in the CLI (#716)
- Added support for lists in result hits (#707)
- Removed input fields from exported LTR models (#708)
- Added Elasticsearch Serverless support in DataFrames (#690, contributed by @AshokChoudhary11) and eland_import_hub_model (#698)
- Fixed Python 3.8 support (#695, contributed by @bartbroere)
- Fixed non _source fields missing from the results hits (#693, contributed by @bartbroere)
- Added support for HTTP proxies in eland_import_hub_model (#688)
- Added support for Python 3.11 (#681)
- Added `eland.DataFrame.to_json` function (#661, contributed by @bartbroere)
- Added override option to specify the model's max input size (#674)
- Upgraded torch to 2.1.2 (#671)
- Mirrored pandas' `lineterminator` instead of `line_terminator` in `to_csv` (#595, contributed by @bartbroere)
- Fix missing value support for XGBRanker (#654)
- Supported XGBRanker model (#649)
- Accepted LTR (Learning to rank) model config when importing model (#645, #651)
- Added LTR feature logger (#648)
- Added `prefix_string` config option to the import model hub script (#642)
- Made online retail analysis notebook runnable in Colab (#641)
- Added new movie dataset to the tests (#646)
- Make demo notebook runnable in Colab (#630)
- Bump Shap version to 0.43 (#636)
- Fix failed import of Sentence Transformer RoBERTa models (#637)
- Support E5 small multilingual model (#625)
- Fixed deprecations in preparation of Pandas 2.0 support (#602, #603, contributed by @bartbroere)
- Fixed direct usage of TransformerModel (#619)
- Published pre-built Docker images to docker.elastic.co/eland/eland (#613)
- Allowed importing private HuggingFace models (#608)
- Added Apple Silicon (arm64) support to Docker image (#615)
- Allowed importing some DPR models like ance-dpr-context-multi (#573)
- Allowed using the Pandas API without monitoring/main permissions (#581)
- Updated Docker image to Debian 12 Bookworm (#613)
- Reduced Docker image size by not installing unused PyTorch GPU support on amd64 (#615)
- Reduced model chunk size to 1MB (#605)
- Fixed deprecations in preparation of Pandas 2.0 support (#593, #596, contributed by @bartbroere)
- Simplify embedding model support and loading (#569)
- Make eland_import_hub_model easier to find on Windows (#559)
- Update trained model inference endpoint (#556)
- Add BertJapaneseTokenizer support with bert_ja tokenization configuration (#534)
- Add ability to upload xlm-roberta tokenized models (#518)
- Tolerate different model output formats when measuring embedding size (#535)
- Generate valid NLP model id from file path (#541)
- Upgrade torch to 1.13.1 and check the cluster version before uploading an NLP model (#522)
- Set embedding_size config parameter for Text Embedding models (#532)
- Add support for the pass_through task (#526)
- Fixed black to comply with the code style (#557)
- Fixed No module named 'torch' (#553)
- Fix autosummary directive by removing hack autosummaries (#548)
- Prevent TypeError with None check (#525)
- Added a new NLP model task type "text_similarity" (#486)
- Added a new NLP model task type "text_expansion" (#520)
- Added support for exporting an Elastic ML model as a scikit-learn pipeline via `MLModel.export_model()` (#509)
- Fixed an issue that occurred when LightGBM was installed but libomp wasn't installed on the system. (#499)
- Added a new NLP model task type "auto" which infers the task type based on model configuration and architecture (#475)
- Changed required version of 'torch' package to >=1.11.0,<1.12 to match required PyTorch version for Elasticsearch 8.3 (was >=1.9.0,<2) (#479)
- Changed the default value of the --task-type parameter for the eland_import_hub_model CLI to be "auto" (#475)
- Fixed decision tree classifier serialization to account for probabilities (#465)
- Fixed PyTorch model quantization (#472)
- Added support for passing Cloud ID via `--cloud-id` to `eland_import_hub_model` CLI tool (#462)
- Added support for authenticating via `--es-username`, `--es-password`, and `--es-api-key` to the `eland_import_hub_model` CLI tool (#461)
- Added support for XGBoost 1.6 (#458)
- Added support for `question_answering` NLP tasks (#457)
- Added support for `eland.Series.unique()` (#448, contributed by @V1NAY8)
- Added `--ca-certs` and `--insecure` options to `eland_import_hub_model` for configuring TLS (#441)
- Added support for Natural Language Processing (NLP) models using PyTorch (#394)
- Added new extra `eland[pytorch]` for installing all dependencies needed for PyTorch (#394)
- Added a CLI script `eland_import_hub_model` for uploading HuggingFace models to Elasticsearch (#403)
- Added support for v8.0 of the Python Elasticsearch client (#415)
- Added a warning if Eland detects it's communicating with an incompatible Elasticsearch version (#419)
- Added support for `number_samples` to LightGBM and Scikit-Learn models (#397, contributed by @V1NAY8)
- Added ability to use datetime types for filtering dataframes (#284, contributed by @Fju)
- Added pandas `datetime64` type to use the Elasticsearch `date` type (#425, contributed by @Ashton-Sidhu)
- Added `es_verify_mapping_compatibility` parameter to disable schema enforcement with `pandas_to_eland` (#423, contributed by @Ashton-Sidhu)
- Changed `to_pandas()` to only use Point-in-Time and `search_after` instead of using Scroll APIs for pagination.
- Simplified result collectors to increase performance transforming Elasticsearch results to pandas (#378, contributed by @V1NAY8)
- Changed search pagination function to yield batches of hits (#379)
- Added support for Pandas 1.3.x (#362, contributed by @V1NAY8)
- Added support for LightGBM 3.x (#362, contributed by @V1NAY8)
- Added `DataFrame.idxmax()` and `DataFrame.idxmin()` methods (#353, contributed by @V1NAY8)
- Added type hints to `eland.ndframe` and `eland.operations` (#366, contributed by @V1NAY8)
- Changed paginated search function to use Point-in-Time and Search After features instead of Scroll when connected to Elasticsearch 7.12+ (#370 and #376, contributed by @V1NAY8)
- Optimized the `FieldMappings.aggregate_field_name()` method (#373, contributed by @V1NAY8)
- Added `DataFrame.quantile()`, `Series.quantile()`, and `DataFrameGroupBy.quantile()` aggregations (#318 and #356, contributed by @V1NAY8)
- Changed the error raised when `es_index_pattern` doesn't point to any indices to be more user-friendly (#346)
- Fixed a warning about conflicting field types when wildcards are used in `es_index_pattern` (#346)
- Fixed sorting when using `DataFrame.groupby()` with `dropna` (#322, contributed by @V1NAY8)
- Fixed deprecated usage `numpy.int` in favor of `numpy.int_` (#354, contributed by @V1NAY8)
- Added support for Pandas 1.2.0 (#336)
- Added `DataFrame.mode()` and `Series.mode()` aggregation (#323, contributed by @V1NAY8)
- Added support for `pd.set_option("display.max_rows", None)` (#308, contributed by @V1NAY8)
- Added Elasticsearch storage usage to `df.info()` (#321, contributed by @V1NAY8)
- Removed deprecated aliases `read_es`, `read_csv`, `DataFrame.info_es`, and `MLModel(overwrite=True)` (#331, contributed by @V1NAY8)
- Added `DataFrame.groupby()` method with all aggregations (#278, #291, #292, #300, contributed by @V1NAY8)
- Added `es_match()` method to `DataFrame` and `Series` for filtering rows with full-text search (#301)
- Added support for type hints of the `elasticsearch-py` package (#295)
- Added support for passing dictionaries to `es_type_overrides` parameter in the `pandas_to_eland()` function to directly control the field mapping generated in Elasticsearch (#310)
- Added `es_dtypes` property to `DataFrame` and `Series` (#285)
- Changed `pandas_to_eland()` to use the `parallel_bulk()` helper instead of single-threaded `bulk()` helper to improve performance (#279, contributed by @V1NAY8)
- Changed the `es_type_overrides` parameter in `pandas_to_eland()` to raise `ValueError` if an unknown column is given (#302)
- Changed `DataFrame.filter()` to preserve the order of items (#283, contributed by @V1NAY8)
- Changed `pandas_to_eland()` so that setting `es_type_overrides={"column": "text"}` automatically adds the `column.keyword` sub-field, making aggregations available for the field as well (#310)
- Fixed `Series.__repr__` when the series is empty (#306)
- Added the `predict()` method and `model_type`, `feature_names`, and `results_field` properties to `MLModel` (#266)
- Deprecated `ImportedMLModel` in favor of `MLModel.import_model(...)` (#266)
- Changed DataFrame aggregations to use `numeric_only=None` instead of `numeric_only=True` by default. This is the same behavior as Pandas (#270, contributed by @V1NAY8)
- Fixed `DataFrame.agg()` so that passing a string instead of a list of aggregations properly returns a `Series` instead of a `DataFrame` (#263, contributed by @V1NAY8)
- Added support for Pandas v1.1 (#253)
- Added support for LightGBM `LGBMRegressor` and `LGBMClassifier` to `ImportedMLModel` (#247, #252)
- Added support for `multi:softmax` and `multi:softprob` XGBoost operators to `ImportedMLModel` (#246)
- Added column names to `DataFrame.__dir__()` for better auto-completion support (#223, contributed by @leonardbinet)
- Added support for `es_if_exists='append'` to `pandas_to_eland()` (#217)
- Added support for aggregating datetimes with `nunique` and `mean` (#253)
- Added `es_compress_model_definition` parameter to `ImportedMLModel` constructor (#220)
- Added `.size` and `.ndim` properties to `DataFrame` and `Series` (#231 and #233)
- Added `.dtype` property to `Series` (#258)
- Added support for using `pandas.Series` with `Series.isin()` (#231)
- Added type hints to many APIs in `DataFrame` and `Series` (#231)
- Deprecated the `overwrite` parameter in favor of `es_if_exists` in `ImportedMLModel` constructor (#249, contributed by @V1NAY8)
- Changed aggregations for datetimes to be higher precision when available (#253)
- Fixed `ImportedMLModel.predict()` to fail when `errors` are present in the `ingest.simulate` response (#220)
- Fixed `Series.median()` aggregation to return a scalar instead of `pandas.Series` (#253)
- Fixed `Series.describe()` to return a `pandas.Series` instead of `pandas.DataFrame` (#258)
- Fixed `DataFrame.mean()` and `Series.mean()` dtype (#258)
- Fixed `DataFrame.agg()` aggregations when using `extended_stats` Elasticsearch aggregation (#253)
- Added the package to Conda Forge, install via `conda install -c conda-forge eland` (#209)
- Added `DataFrame.sample()` and `Series.sample()` for querying a random sample of data from the index (#196, contributed by @mesejo)
- Added `Series.isna()` and `Series.notna()` for filtering out missing, `NaN` or null values from a column (#210, contributed by @mesejo)
- Added `DataFrame.filter()` and `Series.filter()` for reducing an axis using a sequence of items or a pattern (#212)
- Added `DataFrame.to_pandas()` and `Series.to_pandas()` for converting an Eland dataframe or series into a Pandas dataframe or series inline (#208)
- Added support for XGBoost v1.0.0 (#200)
- Deprecated `info_es()` in favor of `es_info()` (#208)
- Deprecated `eland.read_csv()` in favor of `eland.csv_to_eland()` (#208)
- Deprecated `eland.read_es()` in favor of `eland.DataFrame()` (#208)
- Changed `var` and `std` aggregations to use sample instead of population in line with Pandas (#185)
- Changed painless scripts to use `source` rather than `inline` to improve script caching performance (#191, contributed by @mesejo)
- Changed minimum `elasticsearch` Python library version to v7.7.0 (#207)
- Changed name of `Index.field_name` to `Index.es_field_name` (#208)
- Fixed `DeprecationWarning` raised from `pandas.Series` when an empty series was created without specifying `dtype` (#188, contributed by @mesejo)
- Fixed a bug when filtering columns on complex combinations of and and or (#204)
- Fixed an issue where `DataFrame.shape` would return a larger value than in the index if a sized operation like `.head(X)` was applied to the data frame (#205, contributed by @mesejo)
- Fixed issue where both `scikit-learn` and `xgboost` libraries were required to use `eland.ml.ImportedMLModel`; now only one library is required to use this feature (#206)
- Added support for Pandas v1.0.0 (#141, contributed by @mesejo)
- Added `use_pandas_index_for_es_ids` parameter to `pandas_to_eland()` (#154)
- Added `es_type_overrides` parameter to `pandas_to_eland()` (#181)
- Added `NDFrame.var()`, `.std()` and `.median()` aggregations (#175, #176, contributed by @mesejo)
- Added `DataFrame.es_query()` to allow modifying ES queries directly (#156)
- Added `eland.__version__` (#153, contributed by @mesejo)
- Removed support for Python 3.5 (#150)
- Removed `eland.Client()` interface, use `elasticsearch.Elasticsearch()` client instead (#166)
- Removed all private objects from top-level `eland` namespace (#170)
- Removed `geo_points` from `pandas_to_eland()` in favor of `es_type_overrides` (#181)
- Changed ML model serialization to be slightly smaller (#159)
- Changed minimum `elasticsearch` Python library version to v7.6.0 (#181)
- Fixed `inference_config` being required on ML models for ES >=7.8 (#174)
- Fixed unpacking for `DataFrame.aggregate("median")` (#161)
- Changed requirement for `xgboost` from `>=0.90` to `==0.90`
- Fixed issue in `DataFrame.info()` when called on an empty frame (#135)
- Fixed issues where many `_source` fields would generate a `too_long_frame` error (#135, #137)