Vector Database AI Apps

AI Apps using Vector Database (Pinecone)

Vector databases are an essential part of the stack for developing LLM-based applications. RAG (retrieval-augmented generation) retrieves the relevant data and uses it as augmented context for the LLM application.

Vector DBs can be used for:

Text similarity search
RAG
Image similarity search
Anomaly detection
Recommendation systems

Vector DBs work well with both sparse and dense vectors.
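To make the sparse/dense distinction concrete, here is a minimal sketch in plain Python. A dense vector stores every dimension, while a sparse vector stores only the non-zero positions. The helper names `dense_to_sparse` and `sparse_dot` are hypothetical, and the indices/values layout is an assumption chosen to resemble the way sparse vectors are commonly passed to vector databases.

```python
def dense_to_sparse(dense):
    """Convert a dense list of floats to an indices/values pair, dropping zeros."""
    indices = [i for i, v in enumerate(dense) if v != 0.0]
    values = [dense[i] for i in indices]
    return {"indices": indices, "values": values}


def sparse_dot(a, b):
    """Dot product of two sparse vectors stored as indices/values pairs."""
    b_lookup = dict(zip(b["indices"], b["values"]))
    return sum(v * b_lookup.get(i, 0.0) for i, v in zip(a["indices"], a["values"]))


# Toy keyword-style vector (made-up numbers): mostly zeros, so the sparse form is compact.
dense = [0.0, 0.0, 1.5, 0.0, 0.0, 0.0, 2.0, 0.0]
sparse = dense_to_sparse(dense)
print(sparse)  # {'indices': [2, 6], 'values': [1.5, 2.0]}
```

Dense vectors typically come from embedding models, while sparse vectors usually come from keyword-weighting schemes, which is why a hybrid app may use both.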

The repo consists of the 6 apps below, which use vector DBs in various ways:

1. Basic semantic search for text documents
1. RAG
1. Recommendation system
1. Hybrid search app for product recommendation (uses dense vectors for images and sparse vectors for text)
1. Child-parent similarity app
1. Anomaly detection based on a database of server logs

1) SEMANTIC SEARCH

link to git code

Semantic search matches on the meaning of the content being searched, whereas lexical search looks for literal or pattern-matching strings.
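The limitation of lexical search can be illustrated with a small sketch. The function `lexical_search` and the sample documents are hypothetical and not part of the repo; the matcher only checks for shared tokens, so it misses a document that means the same thing but uses different words.

```python
def lexical_search(query, documents):
    """Return documents sharing at least one lowercase token with the query."""
    query_tokens = set(query.lower().split())
    return [doc for doc in documents if query_tokens & set(doc.lower().split())]


# Made-up sample documents for illustration only.
documents = [
    "How to repair a car engine",
    "Automobile maintenance tips",
    "Baking bread at home",
]

hits = lexical_search("car repair", documents)
print(hits)  # ['How to repair a car engine']
```

"Automobile maintenance tips" is not returned because it shares no literal token with the query. A semantic search over embeddings would be expected to rank it highly, since "car" and "automobile" are close in meaning.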

We will use a Sentence Transformer model for embedding.

FROM sbert.net

SentenceTransformers is a Python framework for state-of-the-art sentence, text and image embeddings (initial work: the Sentence-BERT paper).
framework to compute sentence / text embeddings for more than 100 languages.
embeddings can be compared e.g. with cosine-similarity to find sentences with a similar meaning.
useful for semantic textual similarity, semantic search, or paraphrase mining.
framework based on PyTorch and Transformers
offers a large collection of pre-trained models tuned for various tasks.
easy to fine-tune your own models

pip install -U sentence-transformers

from sentence_transformers import SentenceTransformer
sentences = ["This is an example sentence", "Each sentence is converted"]

model = SentenceTransformer('sentence-transformers/all-MiniLM-L6-v2')
embeddings = model.encode(sentences)
print(embeddings)

all-MiniLM-L6-v2

We will use the Sentence Transformer model all-MiniLM-L6-v2 for embeddings. It maps sentences and paragraphs to a 384-dimensional dense vector space and can be used for tasks like clustering or semantic search.
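Once sentences are embedded, semantic search reduces to finding the stored vectors closest to the query vector. The sketch below shows a brute-force cosine-similarity search in plain Python. The 3-dimensional vectors are made-up stand-ins for the 384-dimensional model output, and `cosine_similarity` and `top_k` are hypothetical helper names; a vector database performs this ranking at scale with an index rather than a full scan.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec, doc_vecs, k=2):
    """Score every document against the query and return the k best (id, score) pairs."""
    scored = [(doc_id, cosine_similarity(query_vec, vec)) for doc_id, vec in doc_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Stand-in embeddings (hypothetical values, not real model output).
doc_vecs = {
    "doc-cars": [0.9, 0.1, 0.0],
    "doc-autos": [0.8, 0.2, 0.1],
    "doc-bread": [0.0, 0.1, 0.9],
}
query_vec = [1.0, 0.0, 0.0]

results = top_k(query_vec, doc_vecs, k=2)
for doc_id, score in results:
    print(doc_id, round(score, 3))
```

In the real app, `query_vec` and the document vectors would come from `model.encode(...)`.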

2) RAG (Retrieval Augmented Generation)

link to git code

Instead of sending the query directly to the LLM, RAG improves the output by also referring to an authoritative knowledge base that was not part of the training data.

DATASET - Wikipedia articles
Add embeddings to the vector DB
On search query - search results from the Pinecone vector database (document retrieval)
OpenAI - augmented query sent to OpenAI
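The "augmented query" step in the flow above can be sketched as follows: the passages retrieved from the vector database are joined with the user question into one prompt. The function `build_augmented_prompt`, the prompt wording, and the `max_chars` limit are all assumptions for illustration, not the notebook's actual implementation.

```python
def build_augmented_prompt(query, contexts, max_chars=2000):
    """Join retrieved passages and the user question into one prompt string."""
    selected, used = [], 0
    for passage in contexts:
        # Stop adding passages once the (hypothetical) character budget is exceeded.
        if used + len(passage) > max_chars:
            break
        selected.append(passage)
        used += len(passage)
    context_block = "\n---\n".join(selected)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context_block}\n\n"
        f"Question: {query}\nAnswer:"
    )


# Passages as they might come back from a vector search (sample text for illustration).
contexts = [
    "Pinecone is a managed vector database.",
    "Vector databases store embeddings for similarity search.",
]
prompt = build_augmented_prompt("What is Pinecone?", contexts)
print(prompt)
```

The resulting string is what would be sent to the LLM in place of the bare question.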

Image Source

Embedding model - 'text-embedding-ada-002' (OPENAI)

text-embedding-ada-002 is used for text search, text similarity, and code search
It outperforms the previous model, Davinci

The OpenAI embedding model can be called as shown below. It converts the input text into an embedding vector.

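import openai

# Legacy interface of the openai package (versions before 1.0);
# newer versions expose the same call as client.embeddings.create(...).
response = openai.Embedding.create(
  input="I have a dream",
  model="text-embedding-ada-002"
)
embedding = response["data"][0]["embedding"]  # list of floats (1536 dimensions for this model)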

A Pinecone index stores each record in the following format:

(id, values, metadata)

from pinecone import Pinecone

pc = Pinecone(api_key="YOUR_API_KEY")
index = pc.Index("pinecone-index")

index.upsert(
  vectors=[
    {
      "id": "A", 
      "values": [0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1], 
      "metadata": {"genre": "comedy", "year": 2020}
    },
    {
      "id": "B", 
      "values": [0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2],
      "metadata": {"genre": "documentary", "year": 2019}
    }
  ]
)

(Update + insert = upsert)
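When many embeddings need to be stored, the records are usually built programmatically and sent in batches. The sketch below assembles records in the id/values/metadata shape shown above and splits them into batches. The helpers `build_records` and `batched`, the batch size, and the sample values are hypothetical; the actual `index.upsert` call is left as a comment so the snippet runs without an API key.

```python
def build_records(ids, embeddings, metadatas):
    """Zip parallel lists into records with id, values and metadata keys."""
    if not (len(ids) == len(embeddings) == len(metadatas)):
        raise ValueError("ids, embeddings and metadatas must have the same length")
    return [
        {"id": record_id, "values": list(vec), "metadata": meta}
        for record_id, vec, meta in zip(ids, embeddings, metadatas)
    ]


def batched(records, batch_size):
    """Yield consecutive slices of at most batch_size records."""
    for start in range(0, len(records), batch_size):
        yield records[start:start + batch_size]


# Made-up sample data; real values would be model embeddings.
records = build_records(
    ids=["A", "B", "C"],
    embeddings=[[0.1] * 8, [0.2] * 8, [0.3] * 8],
    metadatas=[{"genre": "comedy"}, {"genre": "documentary"}, {"genre": "drama"}],
)
batches = list(batched(records, batch_size=2))

for batch in batches:
    # With a real index, each batch would be sent as: index.upsert(vectors=batch)
    print([record["id"] for record in batch])
```

Stored records can later be retrieved by similarity, for example with `index.query(vector=query_embedding, top_k=3, include_metadata=True)`, where `query_embedding` is a placeholder for the embedded search query.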

3) RECOMMENDER SYSTEM

New article
Embeddings from article titles
Recommender system which searches across all titles

Recommender system based on content rather than topic
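The last step of such a recommender, turning raw search matches into a list of articles, can be sketched as below. The match dictionaries imitate the general shape of vector-search results (id, score, metadata), but their fields, the sample titles, and the function `recommend` are assumptions for illustration. Several matches may point to the same article, so only the best score per title is kept.

```python
def recommend(matches, top_n=3):
    """Collapse raw search matches into a ranked list of unique article titles."""
    best = {}
    for match in matches:
        title = match["metadata"]["title"]
        score = match["score"]
        # Keep only the highest score seen for each article title.
        if title not in best or score > best[title]:
            best[title] = score
    ranked = sorted(best.items(), key=lambda item: item[1], reverse=True)
    return [title for title, _ in ranked[:top_n]]


# Made-up matches standing in for the result of a similarity query.
matches = [
    {"id": "1", "score": 0.91, "metadata": {"title": "Intro to Vector Search"}},
    {"id": "2", "score": 0.88, "metadata": {"title": "Scaling Embeddings"}},
    {"id": "3", "score": 0.85, "metadata": {"title": "Intro to Vector Search"}},
    {"id": "4", "score": 0.40, "metadata": {"title": "Sourdough Basics"}},
]
print(recommend(matches, top_n=2))  # ['Intro to Vector Search', 'Scaling Embeddings']
```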

mekhiya/vector-database-ai-apps

Folders and files

Latest commit

History

Repository files navigation

Vector Database AI Apps

1) SEMANTIC SEARCH

FROM sbert.net

all-MiniLM-L6-v2

2) RAG (Retrieval Augmented Generation)

Embedding model - 'text-embedding-ada-002' (OPENAI)

PINECONE Index works with format of values:

3) RECOMMENDER SYSTEM

About

Topics

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages