Releases: enjalot/latent-scope
Table improvements & embedding visualization
This release fixes a few bugs with the Explore page table UI and nearest neighbor search, making it much more reliable and performant.
Thank you to @hydrosquall for issues & PRs! #49 #50 #52
A new experimental feature for visualizing embeddings directly in the table is ready to try.
Use any Sentence Transformer from HuggingFace
This release adopts sentence transformers for embedding using local open source models downloaded automatically from HuggingFace hub.
It also keeps track of recently used models and brings everything together in a much improved model selector component on the frontend.
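The notes don't say how recently used models are stored. A most-recently-used list is one simple way to get this behavior. The sketch below assumes an in-memory list with a fixed cap. `RecentModels`, `use`, `ordered` and `max_items` are hypothetical names, not latent-scope's code.

```python
from collections import OrderedDict


class RecentModels:
    """Hypothetical most-recently-used list of model ids (not latent-scope's actual code)."""

    def __init__(self, max_items: int = 5) -> None:
        self.max_items = max_items
        self._items: "OrderedDict[str, None]" = OrderedDict()

    def use(self, model_id: str) -> None:
        # Remove then re-insert so a reused model moves to the most-recent end.
        self._items.pop(model_id, None)
        self._items[model_id] = None
        # Evict the least recently used ids beyond the cap.
        while len(self._items) > self.max_items:
            self._items.popitem(last=False)

    def ordered(self) -> list:
        # Most recent first, the order a selector would list them in.
        return list(reversed(self._items))
```

Using a model again moves it to the front. Once the cap is exceeded, the oldest entry is dropped.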
Also includes a PR from @hydrosquall that fixes a bug when using truncated embeddings in the nearest neighbor search.
One minor note: for now, truncating Sentence Transformer embeddings isn't supported, because we have no general way to tell whether an arbitrary model supports it. We could maintain a separate list of Matryoshka-enabled models.
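Matryoshka-style truncation keeps only the first k dimensions of an embedding and renormalizes what remains. It only holds up for models trained for it, which is why it can't be offered for arbitrary models. The plain-Python sketch below shows truncated nearest neighbor search, assuming cosine similarity on unit-length vectors. The function names are hypothetical. In any implementation, the query must be cut to the same number of dimensions as the stored vectors.

```python
import math


def truncate_and_normalize(vec: list, dims: int) -> list:
    """Keep the first `dims` components and rescale them to unit length."""
    head = vec[:dims]
    norm = math.sqrt(sum(x * x for x in head))
    # Guard against an all-zero prefix.
    return [x / norm for x in head] if norm > 0 else head


def nearest_neighbors(query: list, corpus: list, dims: int, k: int = 2) -> list:
    """Indices of the k most cosine-similar corpus rows, compared on truncated vectors."""
    q = truncate_and_normalize(query, dims)
    scored = []
    for i, row in enumerate(corpus):
        r = truncate_and_normalize(row, dims)
        # Both sides are unit length, so the dot product is the cosine similarity.
        scored.append((sum(a * b for a, b in zip(q, r)), i))
    scored.sort(reverse=True)
    return [i for _, i in scored[:k]]
```

The same corpus can rank differently at different truncation lengths. That is expected, and it is why truncation should be limited to models trained for it.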
export interactive plots
Optionally export interactive DataMapPlots instead of static ones, thanks to @dhruv-anand-aintech
Fixes an unpinned dependency that was breaking transformers models
Export static plots
Implements #23, creating a UI to easily export static plots using datamapplot
Supports more input file types, thanks to #40
Supports proxy servers and alternate OpenAI-compatible endpoints (#44)
The requirements.txt constraints have been loosened, so Python 3.12 should now be supported and more recent versions of some important pip modules will be installed
new models
Adds a few embedding models:
- https://huggingface.co/Snowflake/snowflake-arctic-embed-s
- https://huggingface.co/Snowflake/snowflake-arctic-embed-m-long
- https://huggingface.co/BAAI/bge-m3
Also a new chat model for labeling:
- https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct
Tweaked the labeling prompt to perform better
Improve setup flow
Minor improvements to the setup flow
Refined data export
Creating a scope now also creates a combined parquet of the input data and the scope annotations.
This makes loading curated scopes into other workflows much easier.
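The notes don't list the exported columns. The sketch below shows the general idea of a combined export: each input row gets its scope annotations attached. It assumes one annotation record per input row in the same order. The `scope_` prefix and the `cluster`/`label` fields are hypothetical. With pandas installed, the result could then be written using `pandas.DataFrame(combined).to_parquet(path)`, which needs pyarrow or fastparquet.

```python
def combine_with_scope(rows: list, annotations: list) -> list:
    """Attach each point's scope annotation to its input row (hypothetical schema)."""
    if len(rows) != len(annotations):
        raise ValueError("expected one annotation per input row")
    combined = []
    for row, ann in zip(rows, annotations):
        # Copy so the original input rows are left untouched.
        merged = dict(row)
        for key, value in ann.items():
            # Prefix annotation columns so they can't overwrite input columns.
            merged[f"scope_{key}"] = value
        combined.append(merged)
    return combined
```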
0.2.0 Explore Overhaul
This release makes a number of improvements to the exploring and curation part of Latent Scope. You can now filter in a number of ways from a unified interface and perform bulk actions on the filtered points.
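The workflow here is filter first, then act on everything that matches. The sketch below assumes filters combine with AND and uses tagging as the example bulk action. `Point`, its fields, `filter_points` and `bulk_tag` are hypothetical and are not taken from latent-scope's code.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Point:
    index: int
    cluster: int
    text: str
    tags: set = field(default_factory=set)


def filter_points(points: list, *predicates: Callable[[Point], bool]) -> list:
    """Keep points that satisfy every predicate (filters combine with AND)."""
    return [p for p in points if all(pred(p) for pred in predicates)]


def bulk_tag(points: list, tag: str) -> int:
    """Apply one action to every filtered point at once; returns how many were changed."""
    for p in points:
        p.tags.add(tag)
    return len(points)
```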
The following issues were closed:
This one wasn't closed, but we can now show images in the data table when a row has an image URL:
- #24 showing images
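The notes don't say how image URLs are detected. A file-extension check is one cheap heuristic. The sketch below assumes that approach. `is_image_url` and the extension list are hypothetical, and a URL without an image extension would be missed.

```python
from urllib.parse import urlparse

# Hypothetical list of extensions treated as images.
IMAGE_EXTENSIONS = (".png", ".jpg", ".jpeg", ".gif", ".webp", ".svg")


def is_image_url(value: object) -> bool:
    """Heuristic: an http(s) URL whose path ends in a common image extension."""
    if not isinstance(value, str):
        return False
    parsed = urlparse(value.strip())
    # Check the path only, so query strings like ?size=large don't interfere.
    return parsed.scheme in ("http", "https") and parsed.path.lower().endswith(IMAGE_EXTENSIONS)
```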
Improved documentation and a number of guides have been published to https://enjalot.github.io/latent-scope/
v0.1.8
Fixed a long-standing performance issue with loading the Python module. Imports are now done on demand, shaving ~4 seconds off loading the library (including starting the server or running any of the scripts).
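The notes don't say exactly how on-demand imports were done. Common options are moving heavy imports inside the functions that need them, or using the standard library's `importlib.util.LazyLoader`. The sketch below shows a small hand-rolled proxy that defers the import until an attribute is first used. `LazyModule` is a hypothetical name.

```python
import importlib
import types
from typing import Optional


class LazyModule:
    """Proxy that imports the named module the first time one of its attributes is used."""

    def __init__(self, name: str) -> None:
        self._name = name
        self._module: Optional[types.ModuleType] = None

    def __getattr__(self, attr: str):
        # __getattr__ only runs when normal lookup fails, so _name and _module
        # (set in __init__) are found directly without recursing here.
        if self._module is None:
            self._module = importlib.import_module(self._name)
        return getattr(self._module, attr)
```

A heavy dependency wrapped like this, for example `torch = LazyModule("torch")` at module level, costs nothing until the first `torch.something` call.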
Implemented the improved ingest flow (#34). When data is ingested, we now:
- Check column types and generate summary statistics, which will enable future UI features
- Check for array columns and suggest importing those as embeddings
- Check for name collisions when uploading a dataset, giving a warning if a collision is detected
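The three checks above can be sketched on a list of row dicts. This is an assumption: the real ingest works on uploaded files, and its report format isn't described. `profile_columns`, `has_name_collision` and the report keys are hypothetical. The embedding hint flags columns where every row holds a numeric list of the same length.

```python
from collections import Counter
from statistics import mean


def _is_number(value) -> bool:
    # bool is a subclass of int, so exclude it explicitly.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def profile_columns(rows: list) -> dict:
    """Per-column type counts, numeric stats, and an embedding hint (hypothetical report shape)."""
    report = {}
    columns = sorted({key for row in rows for key in row})
    for col in columns:
        values = [row.get(col) for row in rows]
        entry = {"types": dict(Counter(type(v).__name__ for v in values))}
        if all(_is_number(v) for v in values):
            entry.update(min=min(values), mean=mean(values), max=max(values))
        # Same-length lists of numbers in every row look like precomputed embeddings.
        if (
            all(isinstance(v, list) and v and all(_is_number(x) for x in v) for v in values)
            and len({len(v) for v in values}) == 1
        ):
            entry["suggest_embedding"] = True
        report[col] = entry
    return report


def has_name_collision(name: str, existing_names: list) -> bool:
    """True if a dataset with this name already exists, so the UI can warn before overwriting."""
    return name in existing_names
```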
0.1.7 Setup process improvements
- A number of styling improvements are made to the setup page.
- Fixed an annoying loading issue that was adding ~4 seconds to each part of the process.
- Added NLTK top words as a CPU-friendly way to summarize clusters
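Top-words labeling counts the most frequent non-stopword tokens in each cluster. No model is involved, which is what makes it CPU friendly. The release uses NLTK. The sketch below swaps in a small hand-written stopword list and a regex tokenizer so it runs on the standard library alone. `top_words`, `label_clusters` and `STOPWORDS` are hypothetical names.

```python
import re
from collections import Counter

# Tiny stand-in stopword list; NLTK's English stopword corpus is much larger.
STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "it", "for", "on", "with"}


def top_words(texts: list, n: int = 3) -> list:
    """Most frequent non-stopword tokens across one cluster's texts."""
    counts = Counter(
        token
        for text in texts
        for token in re.findall(r"[a-z']+", text.lower())
        if token not in STOPWORDS
    )
    return [word for word, _ in counts.most_common(n)]


def label_clusters(texts: list, cluster_ids: list, n: int = 3) -> dict:
    """Group texts by cluster id and join each cluster's top words into a label."""
    grouped = {}
    for text, cid in zip(texts, cluster_ids):
        grouped.setdefault(cid, []).append(text)
    return {cid: " ".join(top_words(group, n)) for cid, group in grouped.items()}
```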