Releases: neuml/txtai

v4.6.0

15 Aug 14:23

🎈🎉🥳 txtai turns 2 🎈🎉🥳

We're excited to release the 25th version of txtai, marking its two-year anniversary. Thank you to the txtai community. Please remember to ⭐ txtai!

txtai 4.6 is a large but backwards-compatible release! It adds better integration between embeddings and workflows, along with a number of significant performance improvements and bug fixes.

New Features

  • Add transform workflow action to application (#281)
  • Add ability to resolve workflows within applications (#290)
  • Add OFFSET support to SQL query statements (#293)
  • Add webpage summary image generation notebook (#299)
  • Add notebook on running txtai with native code (#304)
  • Add mmap parameter to Faiss (#308)
  • Add indexing guide to docs (#312)
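
The OFFSET item above is easiest to picture as paging through results. The following is a minimal sketch of standard LIMIT/OFFSET semantics using Python's built-in sqlite3 module, not txtai's own query layer; the `sections` table, its rows and the `page` helper are hypothetical and exist only for illustration.

```python
import sqlite3

# Hypothetical table and rows, used only to illustrate LIMIT/OFFSET semantics
connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE sections (id INTEGER PRIMARY KEY, text TEXT)")
connection.executemany(
    "INSERT INTO sections (id, text) VALUES (?, ?)",
    [(i, f"document {i}") for i in range(10)],
)


def page(number, size=3):
    """Return the ids on one page: OFFSET skips the rows of all earlier pages."""
    cursor = connection.execute(
        "SELECT id FROM sections ORDER BY id LIMIT ? OFFSET ?",
        (size, number * size),
    )
    return [row[0] for row in cursor]


first_page = page(0)
second_page = page(1)
last_page = page(3)
```

With ten rows and a page size of three, the fourth page holds only the single remaining row.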

Improvements

  • Consume generator outputs in workflow tasks (#291)
  • Update pipeline workflow notebook (#292)
  • Update tabular notebook (#297)
  • Lower required version of Pillow library to prevent unnecessary upgrades (#303)
  • Embeddings vector batch improvements (#309)
  • Use single constant for current pickle protocol (#310)
  • Move quantize config param to Faiss (#311)
  • Update documentation with new demo and diagrams (#313)
  • Improve embeddings performance with large query limits (#318)

Bug Fixes

  • ModuleNotFoundError: No module named 'transformers.hf_api' (#274)
  • Dependency issue with ONNX and Protobuf (#285)
  • The key should be writable instead of path. Thank you to @csnelsonchu! (#287)
  • Fix breaking change in build script from mkdocstrings bug (#289)
  • Index id sync issue when inserting multiple data types (text, documents, objects) into Embeddings (#294)
  • Labels pipeline outputs changed with transformers 4.20.0 (#295)
  • Tabular pipeline throws error when processing list fields (#296)
  • txtai load testing (#305)
  • Add cloud config to application.upsert method (#306)

v4.5.0

17 May 13:52

This release adds the following new features, improvements and bug fixes.

New Features

  • Add scripts to train bashsql query translation model (#271)
  • Add QA database example notebook (#272)
  • Add CITATION file (#273)

Improvements

  • Improve efficiency of external vectors (#275)
  • Refactor vectors package to improve code reuse (#276)
  • Add logic to detect external vectors method (#277)

Bug Fixes

  • Fix summary pipeline issue with transformers>=4.19.0 (#278)

v4.4.0

20 Apr 14:21

This release adds the following new features, improvements and bug fixes.

New Features

  • Add semantic search explainability (#248)
  • Add notebook covering model explainability (#249)
  • Add txtai console (#252)
  • Add sequences pipeline (#261)
  • Add scripts to train query translation models (#265)
  • Add query translation logic in embeddings searches (#266)
  • Add notebook for query translation (#269)

Improvements

  • Update HFTrainer to support sequence-sequence models (#262)

Bug Fixes

  • Unit tests failing with tokenizers>=0.12 (#253)
  • Running default.config.yml returns TypeError: register() got an unexpected keyword argument 'ids' (#256)
  • Unit tests failing with transformers==4.18.0 (#258)
  • Update precommit to use latest version of psf black (#259)

v4.3.1

11 Mar 20:14

This release adds the following bug fix.

Bug Fixes

  • Fix word embeddings regression with batch transformation (#245)

v4.3.0

10 Mar 00:02

This release adds the following new features, improvements and bug fixes.

New Features

  • Add notebook covering txtai embeddings index file structure (#237)
  • Add Image Hash pipeline (#240)
  • Add support for custom SQL functions in embeddings queries (#241)
  • Add notebook for Embeddings SQL functions (#243)
  • Add notebook for near-duplicate image detection (#244)
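
As a rough illustration of what a custom SQL function is, the sketch below registers a Python callable with Python's built-in sqlite3 module and calls it from a query. It shows only the general mechanism and assumes a SQLite-backed store; txtai's own registration API is not described in these notes, and the `shout` function and `sections` table are hypothetical.

```python
import sqlite3


def shout(text):
    """Hypothetical user-defined function: upper-case a text value."""
    return text.upper() if text is not None else None


connection = sqlite3.connect(":memory:")

# Register the Python callable under the SQL name "shout", taking one argument
connection.create_function("shout", 1, shout)

# Hypothetical table and rows for the demonstration
connection.execute("CREATE TABLE sections (id INTEGER PRIMARY KEY, text TEXT)")
connection.executemany(
    "INSERT INTO sections (id, text) VALUES (?, ?)",
    [(0, "first"), (1, "second")],
)

# The registered function can now be used like any built-in SQL function
rows = connection.execute(
    "SELECT id, shout(text) FROM sections ORDER BY id"
).fetchall()
```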

Improvements

  • Rename SQLException to SQLError (#232)
  • Refactor API instance into a separate package (#233)
  • API should raise an error if attempting to modify a read-only index (#235)
  • Add last update field to index metadata (#236)
  • Update transcription pipeline to use AutoModelForCTC (#238)

Bug Fixes

  • Ensure limit always set in embeddings search/batchsearch (#234)
  • Fix issue with parsing multiline SQL statements (#242)

v4.2.1

28 Feb 01:05

This release adds the following bug fix.

Bug Fixes

  • Fixed mislabeled API config definition (#231)

v4.2.0

24 Feb 11:47

This release adds the following new features, improvements and bug fixes.

New Features

  • Add notebook for workflow notifications (#225)
  • Add default and custom docker configurations (#226)
  • Create docker configuration for AWS Lambda (#228)
  • Add support for loading/storing embedding indexes on cloud storage (#229)

Improvements

  • Add support for SQL || operator (#223)
  • Add flag to disable loading index data in API (#230)

Bug Fixes

  • Modify database decoder methods to check for None (#220)
  • Modify embeddings search to make return type consistent when index initialized and not initialized (#221)
  • Embeddings index returning malformed JSON errors in certain situations (#222)
  • Check for empty documents input before indexing (#224)

v4.1.0

03 Feb 11:55

This release adds the following new features, improvements and bug fixes.

New Features

  • Add entity extraction pipeline (#203)
  • Add workflow scheduling (#206)
  • Add workflow search task to API (#210)
  • Add Console Task (#215)
  • Add Export Task (#216)
  • Add notebook for workflow scheduling (#218)

Improvements

  • Default documentation theme to the system preference (#197)
  • Improve multi-user experience for workflow application (#198)
  • Documentation improvements (#200)
  • Add social preview image for documentation (#201)
  • Add links to txtai in all example notebooks (#202)
  • Add limit parameter to API search method (#208)
  • Add documentation on local API instances (#209)
  • Add shorthand syntax for creating workflow tasks in API (#211)
  • Accept functions as workflow task actions in API (#213)

Bug Fixes

  • Object detection model fails to load additional models (#204)
  • Update unit tests to limit cpu usage for word vector tests (#207)
  • Add better error handling around unindexed embedding instances (#212)
  • Fix issue when workflow task generates no output (#214)
  • Add lock to API search methods (#217)

v4.0.0

11 Jan 12:23

🎈🎉🥳 We're excited to announce the release of txtai 4.0! 🥳🎉🎈

Thank you to the growing txtai community. This couldn't be done without you. Please remember to ⭐ txtai if it has been helpful.

txtai 4.0 is a major release with a significant number of new features. This release adds content storage, querying with SQL, object storage, reindexing, index compression, external vectors and more!

To quantify the changes, the code base grew by 50% and 36 issues were resolved, making this by far the biggest release of txtai. These changes were designed to be fully backward compatible, but keep in mind that it is a new major release.

"What's new in txtai 4.0" covers all the changes with detailed examples. The documentation site has also been refreshed.

New Features

  • Store text content (#168)
  • Add option to index dictionaries of content (#169)
  • Add SQL support for generating combined embeddings + database queries (#170)
  • Add reindex method to embeddings (#171)
  • Add index archive support (#172)
  • Add close method to embeddings (#173)
  • Update API to work with embeddings + database search (#176)
  • Add content option to tabular pipeline (#177)
  • Update workflow example to support embeddings content (#179)
  • Add index metadata to embeddings config (#180)
  • Add object storage (#183)
  • Aggregate partial query results when clustering (#184)
  • Add function parameter to embeddings reindex (#185)
  • Add support for user defined column aliases (#186)
  • Use SQL bracket notation to support multi word and more complex JSON path expressions (#187)
  • Support SQLite 3.22+ (#190)
  • Add pre-computed vector support (#192)
  • Change document/object inserts to only keep latest record (#193)
  • Update documentation with 4.0 changes (#196)

Improvements

  • Modify workflow to select batches with slices (#158)
  • Add tensor support to workflows (#159)
  • Read YAML config if provided as a file path (#162)
  • Make adding pipelines to API easier (#163)
  • Process task actions concurrently (#164)
  • Add tensor workflow notebook (#167)
  • Update default ANN parameters (#174)
  • Require Python 3.7+ (#175)
  • Consistently name embeddings id fields (#178)
  • Add txtai version attribute (#181)
  • Refresh notebooks for 4.0 (#188)
  • Modify embeddings to only iterate over input documents once (#189)
  • Improve efficiency of vector transformations (#191)
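
Two of the items above (selecting batches with slices and iterating over input documents only once) describe a common pattern: cutting a stream into fixed-size batches in a single pass. The sketch below shows one way to do that with `itertools.islice`. It is an illustration under that reading, not txtai's implementation; the `batches` and `numbers` names are hypothetical.

```python
from itertools import islice


def batches(elements, size):
    """Yield lists of up to `size` items, reading the input exactly once."""
    iterator = iter(elements)
    while True:
        # islice pulls the next `size` items without materializing the rest
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


def numbers():
    """Hypothetical one-shot input stream (a generator cannot be rewound)."""
    yield from range(7)


result = list(batches(numbers(), 3))
```

Because the helper only ever advances a single iterator, it works the same way for lists and for generators that can be consumed just once.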

Bug Fixes

  • Add thread lock around API write calls (#160)
  • Expose caption and objects pipeline via API (#161)
  • Change pickle calls to use protocol supporting lowest Python version (#182)
  • HFOnnx expects ORT provider bug (#195)
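
The pickle fix above can be sketched as pinning one protocol number that every supported interpreter can read. Since this release requires Python 3.7+, and pickle protocol 5 only arrived in Python 3.8, protocol 4 is the highest safe choice. The constant and helper names below are hypothetical and are not taken from the txtai code base.

```python
import pickle

# Hypothetical constant: protocol 4 is the highest protocol Python 3.7 supports
# (protocol 5 requires Python 3.8 or later)
PICKLE_PROTOCOL = 4


def dumps(obj):
    """Serialize with the pinned protocol instead of pickle.HIGHEST_PROTOCOL."""
    return pickle.dumps(obj, protocol=PICKLE_PROTOCOL)


payload = dumps({"ids": [1, 2, 3]})

# The protocol is recorded in the stream, so loading needs no protocol argument
restored = pickle.loads(payload)
```

Pinning the value in one place means a file written on a newer interpreter can still be read on the oldest supported one.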

v3.7.0

23 Nov 01:28

This release adds the following new features, improvements and bug fixes.

New Features

  • Add object detection pipeline (#148)
  • Add image caption pipeline (#149)
  • Add retrieval task (#150)
  • Add no-op pipeline (#152)
  • Add new workflow functionality (#155)

Improvements

  • Add Korean translation to README.md. Thank you @0206pdh! (#138)
  • Add links to external articles (#139)
  • Update example applications to be consistent (#140)
  • Add an article summarization example (#144)
  • Add fallback mode for textractor (#145)
  • Reorganize pipeline package (#147)
  • Update optional package tests to simulate missing packages (#154)
  • Add parameter to flatten labels output (#153)
  • Update documentation with latest changes (#156)

Bug Fixes

  • Fix bug with importing service task when workflow extra not installed (#146)
  • Fix inconsistencies with url based tasks (#151)