MultimodalQnA Image and Audio Support Phase 1 #1071

Status: Open (wants to merge 52 commits into base: main)

Changes shown are from 33 of 52 commits.

Commits
97f4bc0 Initial implementation of image ingestion (mhbuehler, Oct 14, 2024)
08956e5 Added ability to change LVM Model (#1) (okhleif-IL, Oct 21, 2024)
56bd8c3 Use same endpoint for image and video ingestion (mhbuehler, Oct 21, 2024)
b334dc2 Update tests and docs (mhbuehler, Oct 21, 2024)
13c752a Merge branch 'main' of github.com:mhbuehler/GenAIExamples into melani… (dmsuehir, Oct 22, 2024)
5f4cf29 Renamed Dataprep Endpoints videos --> files (#3) (okhleif-IL, Oct 23, 2024)
271117e added LVM_MODEL_ID var to test file (#4) (okhleif-IL, Oct 23, 2024)
c4e0259 Updates tests per feedback (mhbuehler, Oct 24, 2024)
a541140 Merge pull request #2 from mhbuehler/melanie/combined_image_video_ing… (mhbuehler, Oct 24, 2024)
30c311d Merge branch 'main' of github.com:mhbuehler/GenAIExamples into melani… (dmsuehir, Oct 24, 2024)
e83fc44 Update LVM model for Xeon (dmsuehir, Oct 25, 2024)
4b8a5ad Merge pull request #5 from mhbuehler/dina/lvm_model (mhbuehler, Oct 25, 2024)
a6d826c Initial setup for ingest_with_text (mhbuehler, Oct 24, 2024)
547a139 Write and send custom caption file (mhbuehler, Oct 25, 2024)
7cfc343 Update docs and tests (mhbuehler, Oct 28, 2024)
69dbdfc MMQnA doc updates for audio ingestion (#7) (dmsuehir, Oct 29, 2024)
834c668 Merge branch 'main' of github.com:mhbuehler/GenAIExamples into melani… (dmsuehir, Oct 29, 2024)
a8f8dc9 Fix UI request for follow up queries with no image (#8) (dmsuehir, Oct 30, 2024)
718d02e Updated for review suggestions (mhbuehler, Oct 30, 2024)
d05cfb3 Merge branch 'melanie/mm-rag-enhanced' into melanie/images_and_text (mhbuehler, Oct 30, 2024)
72591c1 Add audio upload functionality to UI (mhbuehler, Oct 30, 2024)
431e41b Merge pull request #6 from mhbuehler/melanie/images_and_text (mhbuehler, Oct 30, 2024)
d535aa7 Merge branch 'melanie/mm-rag-enhanced' into melanie/audio_ingest_ui (mhbuehler, Oct 31, 2024)
39f43fc Minor refactor, improve display text, and suppress PDF tab (mhbuehler, Oct 31, 2024)
afc3c8a Merge pull request #9 from mhbuehler/melanie/audio_ingest_ui (mhbuehler, Oct 31, 2024)
ddd5dfb Small fixes (mhbuehler, Nov 1, 2024)
426c739 Improve appearance (mhbuehler, Nov 1, 2024)
cfa1c8c Improve upload errors and revert multimodal query box (mhbuehler, Nov 2, 2024)
cdec83f Small text edit as suggested (mhbuehler, Nov 4, 2024)
48baceb Merge pull request #11 from mhbuehler/melanie/mm-fixes (mhbuehler, Nov 4, 2024)
170b723 Merge branch 'main' of github.com:mhbuehler/GenAIExamples into melani… (dmsuehir, Nov 4, 2024)
3a23e5b [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot], Nov 5, 2024)
65a8afd Merge branch 'main' into melanie/mm-rag-enhanced (ashahba, Nov 5, 2024)
f95b946 updated readmes with MMQA info (okhleif-IL, Nov 5, 2024)
c4d5138 removed stray char (okhleif-IL, Nov 5, 2024)
a1350c5 Fixed header (okhleif-IL, Nov 5, 2024)
c7aadd2 addressed review comments (okhleif-IL, Nov 5, 2024)
7288faa removed video (okhleif-IL, Nov 5, 2024)
e108ee9 Merge branch 'main' into melanie/mm-rag-enhanced (ashahba, Nov 6, 2024)
9fdd6fe Reorder new lvm-dependent tests and fix clear textbox (mhbuehler, Nov 6, 2024)
ee387a2 Merge pull request #13 from mhbuehler/omar/mmqa-docs (mhbuehler, Nov 6, 2024)
aafcfe1 [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot], Nov 6, 2024)
359b6f8 Merge pull request #14 from mhbuehler/melanie/fix_tests (mhbuehler, Nov 6, 2024)
9d3ed45 fixed multimodalqna typos (okhleif-IL, Nov 6, 2024)
54cff40 Point git clone at specific branch of GenAIComps (mhbuehler, Nov 6, 2024)
fd9fd84 Merge pull request #15 from mhbuehler/omar/dockimg_doc (mhbuehler, Nov 6, 2024)
d88513a [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot], Nov 6, 2024)
24438e5 Merge pull request #16 from mhbuehler/melanie/clone_specific_branch (mhbuehler, Nov 6, 2024)
df2511b Fix xeon test to use llava-hf/llava-1.5-7b-hf (#17) (dmsuehir, Nov 6, 2024)
6631601 Update MMQnA xeon test to wait for LVM to be ready (#18) (dmsuehir, Nov 7, 2024)
59acc77 [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot], Nov 7, 2024)
9d4cb5f Test update: Increase wait time and add more messages (#19) (dmsuehir, Nov 7, 2024)
12 changes: 7 additions & 5 deletions MultimodalQnA/README.md
@@ -2,7 +2,7 @@

Suppose you have a set of videos and wish to perform question-answering to extract insights from them. Answering these questions typically requires understanding visual cues in the videos, knowledge derived from the audio content, or often a mix of both. The MultimodalQnA framework offers an optimal solution for this purpose.

`MultimodalQnA` addresses your questions by dynamically fetching the most pertinent multimodal information (frames, transcripts, and/or captions) from your collection of videos. For this purpose, MultimodalQnA utilizes [BridgeTower model](https://huggingface.co/BridgeTower/bridgetower-large-itm-mlm-gaudi), a multimodal encoding transformer model which merges visual and textual data into a unified semantic space. During the video ingestion phase, the BridgeTower model embeds both visual cues and auditory facts as texts, and those embeddings are then stored in a vector database. When it comes to answering a question, the MultimodalQnA will fetch its most relevant multimodal content from the vector store and feed it into a downstream Large Vision-Language Model (LVM) as input context to generate a response for the user.
`MultimodalQnA` addresses your questions by dynamically fetching the most pertinent multimodal information (frames, transcripts, and/or captions) from your collection of videos, images, and audio files. For this purpose, MultimodalQnA utilizes the [BridgeTower model](https://huggingface.co/BridgeTower/bridgetower-large-itm-mlm-gaudi), a multimodal transformer encoder that merges visual and textual data into a unified semantic space. During the ingestion phase, the BridgeTower model embeds visual cues together with the associated text (transcripts derived from the audio, or captions), and those embeddings are stored in a vector database. To answer a question, MultimodalQnA fetches the most relevant multimodal content from the vector store and feeds it into a downstream Large Vision-Language Model (LVM) as input context to generate a response for the user.
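For orientation, here is a minimal sketch of what a user query looks like once the full pipeline is running. It assumes the MegaService gateway on its default port 8888 and the simple string `messages` payload of the `BACKEND_SERVICE_ENDPOINT` defined later in this README; the question text is illustrative.

```bash
# Minimal sketch: ask the MultimodalQnA gateway a question (illustrative payload).
curl http://${host_ip}:8888/v1/multimodalqna \
    -H "Content-Type: application/json" \
    -d '{"messages": "What did the man say about the cars?"}'
```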

The MultimodalQnA architecture is shown below:

@@ -100,10 +100,12 @@ In the below, we provide a table that describes for each microservice component

By default, the embedding and LVM models are set to the values listed below:

| Service | Model |
| -------------------- | ------------------------------------------- |
| embedding-multimodal | BridgeTower/bridgetower-large-itm-mlm-gaudi |
| LVM | llava-hf/llava-v1.6-vicuna-13b-hf |
| Service | HW | Model |
| -------------------- | ----- | ----------------------------------------- |
| embedding-multimodal | Xeon | BridgeTower/bridgetower-large-itm-mlm-itc |
| LVM | Xeon | llava-hf/llava-1.5-7b-hf |
| embedding-multimodal | Gaudi | BridgeTower/bridgetower-large-itm-mlm-itc |
| LVM | Gaudi | llava-hf/llava-v1.6-vicuna-13b-hf |

You can choose other LVM models, such as `llava-hf/llava-1.5-7b-hf` and `llava-hf/llava-1.5-13b-hf`, as needed.
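If you do override the LVM, set the corresponding variable before starting the services. A minimal sketch, assuming the `LVM_MODEL_ID` environment variable that this PR wires into the compose entrypoints:

```bash
# Sketch: pick a different LVM; the compose entrypoint reads LVM_MODEL_ID.
export LVM_MODEL_ID="llava-hf/llava-1.5-13b-hf"
```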

50 changes: 37 additions & 13 deletions MultimodalQnA/docker_compose/intel/cpu/xeon/README.md
@@ -84,16 +84,18 @@ export INDEX_NAME="mm-rag-redis"
export LLAVA_SERVER_PORT=8399
export LVM_ENDPOINT="http://${host_ip}:8399"
export EMBEDDING_MODEL_ID="BridgeTower/bridgetower-large-itm-mlm-itc"
export LVM_MODEL_ID="llava-hf/llava-1.5-7b-hf"
export WHISPER_MODEL="base"
export MM_EMBEDDING_SERVICE_HOST_IP=${host_ip}
export MM_RETRIEVER_SERVICE_HOST_IP=${host_ip}
export LVM_SERVICE_HOST_IP=${host_ip}
export MEGA_SERVICE_HOST_IP=${host_ip}
export BACKEND_SERVICE_ENDPOINT="http://${host_ip}:8888/v1/multimodalqna"
export DATAPREP_INGEST_SERVICE_ENDPOINT="http://${host_ip}:6007/v1/ingest_with_text"
export DATAPREP_GEN_TRANSCRIPT_SERVICE_ENDPOINT="http://${host_ip}:6007/v1/generate_transcripts"
export DATAPREP_GEN_CAPTION_SERVICE_ENDPOINT="http://${host_ip}:6007/v1/generate_captions"
export DATAPREP_GET_VIDEO_ENDPOINT="http://${host_ip}:6007/v1/dataprep/get_videos"
export DATAPREP_DELETE_VIDEO_ENDPOINT="http://${host_ip}:6007/v1/dataprep/delete_videos"
export DATAPREP_GET_FILE_ENDPOINT="http://${host_ip}:6007/v1/dataprep/get_files"
export DATAPREP_DELETE_FILE_ENDPOINT="http://${host_ip}:6007/v1/dataprep/delete_files"
```

Note: Please replace `host_ip` with your external IP address; do not use localhost.
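One way to set it on Linux is shown below; this is a sketch, and you should verify that it selects the right network interface on your machine:

```bash
# Sketch: export the machine's first reported IP address as host_ip.
export host_ip=$(hostname -I | awk '{print $1}')
```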
@@ -274,54 +276,76 @@ curl http://${host_ip}:9399/v1/lvm \

6. dataprep-multimodal-redis

Download a sample video
Download a sample video, image, and audio file, and create a caption file

```bash
export video_fn="WeAreGoingOnBullrun.mp4"
wget http://commondatastorage.googleapis.com/gtv-videos-bucket/sample/WeAreGoingOnBullrun.mp4 -O ${video_fn}

export image_fn="apple.png"
wget https://github.com/docarray/docarray/blob/main/tests/toydata/image-data/apple.png?raw=true -O ${image_fn}

export caption_fn="apple.txt"
echo "This is an apple." > ${caption_fn}

export audio_fn="AudioSample.wav"
wget https://github.com/intel/intel-extension-for-transformers/raw/main/intel_extension_for_transformers/neural_chat/assets/audio/sample.wav -O ${audio_fn}
```

Test dataprep microservice. This command updates a knowledge base by uploading a local video .mp4.
Test the dataprep microservice's transcript generation. This command updates the knowledge base by uploading a local video (.mp4) and an audio (.wav) file.

```bash
curl --silent --write-out "HTTPSTATUS:%{http_code}" \
${DATAPREP_GEN_TRANSCRIPT_SERVICE_ENDPOINT} \
-H 'Content-Type: multipart/form-data' \
-X POST -F "files=@./${video_fn}"
-X POST \
-F "files=@./${video_fn}" \
-F "files=@./${audio_fn}"
```
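Because `--write-out` appends an `HTTPSTATUS:<code>` marker to the response body, a script can split the status back out to check for success. A sketch of that pattern (the variable names and messages are illustrative):

```bash
# Sketch: capture body plus status marker, then assert the upload returned 200.
response=$(curl --silent --write-out "HTTPSTATUS:%{http_code}" \
    ${DATAPREP_GEN_TRANSCRIPT_SERVICE_ENDPOINT} \
    -H 'Content-Type: multipart/form-data' \
    -X POST \
    -F "files=@./${video_fn}" \
    -F "files=@./${audio_fn}")
http_status=$(echo "$response" | tr -d '\n' | sed -e 's/.*HTTPSTATUS://')
if [ "$http_status" -ne 200 ]; then
    echo "Transcript ingestion failed with HTTP status $http_status"
else
    echo "Transcript ingestion succeeded"
fi
```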

Also, test dataprep microservice with generating caption using lvm microservice
Also, test the dataprep microservice's image caption generation, which uses the lvm microservice

```bash
curl --silent --write-out "HTTPSTATUS:%{http_code}" \
${DATAPREP_GEN_CAPTION_SERVICE_ENDPOINT} \
-H 'Content-Type: multipart/form-data' \
-X POST -F "files=@./${video_fn}"
-X POST -F "files=@./${image_fn}"
```

Now, test the microservice by posting an image along with a custom caption

```bash
curl --silent --write-out "HTTPSTATUS:%{http_code}" \
${DATAPREP_INGEST_SERVICE_ENDPOINT} \
-H 'Content-Type: multipart/form-data' \
-X POST -F "files=@./${image_fn}" -F "files=@./${caption_fn}"
```

Also, you are able to get the list of all videos that you uploaded:
You can also get the list of all files that you uploaded:

```bash
curl -X POST \
-H "Content-Type: application/json" \
${DATAPREP_GET_VIDEO_ENDPOINT}
${DATAPREP_GET_FILE_ENDPOINT}
```

Then you will get the response python-style LIST like this. Notice the name of each uploaded video e.g., `videoname.mp4` will become `videoname_uuid.mp4` where `uuid` is a unique ID for each uploaded video. The same video that are uploaded twice will have different `uuid`.
Then you will get a Python-style list in the response, like the one below. Notice that each uploaded file name, e.g. `videoname.mp4`, becomes `videoname_uuid.mp4`, where `uuid` is a unique ID assigned to each upload. The same file uploaded twice will have two different `uuid`s.

```bash
[
"WeAreGoingOnBullrun_7ac553a1-116c-40a2-9fc5-deccbb89b507.mp4",
"WeAreGoingOnBullrun_6d13cf26-8ba2-4026-a3a9-ab2e5eb73a29.mp4"
"WeAreGoingOnBullrun_6d13cf26-8ba2-4026-a3a9-ab2e5eb73a29.mp4",
"apple_fcade6e6-11a5-44a2-833a-3e534cbe4419.png",
"AudioSample_976a85a6-dc3e-43ab-966c-9d81beef780c.wav
]
```

To delete all uploaded videos along with data indexed with `$INDEX_NAME` in REDIS.
To delete all uploaded files, along with the data indexed under `$INDEX_NAME` in Redis:

```bash
curl -X POST \
-H "Content-Type: application/json" \
${DATAPREP_DELETE_VIDEO_ENDPOINT}
${DATAPREP_DELETE_FILE_ENDPOINT}
```
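To confirm the deletion took effect, query the file list again; an empty Python-style list is the expected result. A sketch reusing the endpoint above:

```bash
# Sketch: after deleting, the uploaded-file list should come back empty ([]).
curl -X POST \
    -H "Content-Type: application/json" \
    ${DATAPREP_GET_FILE_ENDPOINT}
```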

7. MegaService
3 changes: 3 additions & 0 deletions MultimodalQnA/docker_compose/intel/cpu/xeon/compose.yaml
@@ -36,6 +36,7 @@ services:
http_proxy: ${http_proxy}
https_proxy: ${https_proxy}
PORT: ${EMBEDDER_PORT}
entrypoint: ["python", "bridgetower_server.py", "--device", "cpu", "--model_name_or_path", $EMBEDDING_MODEL_ID]
restart: unless-stopped
embedding-multimodal:
image: ${REGISTRY:-opea}/embedding-multimodal:${TAG:-latest}
@@ -76,6 +77,7 @@ services:
no_proxy: ${no_proxy}
http_proxy: ${http_proxy}
https_proxy: ${https_proxy}
entrypoint: ["python", "llava_server.py", "--device", "cpu", "--model_name_or_path", $LVM_MODEL_ID]
restart: unless-stopped
lvm-llava-svc:
image: ${REGISTRY:-opea}/lvm-llava-svc:${TAG:-latest}
@@ -125,6 +127,7 @@ services:
- https_proxy=${https_proxy}
- http_proxy=${http_proxy}
- BACKEND_SERVICE_ENDPOINT=${BACKEND_SERVICE_ENDPOINT}
- DATAPREP_INGEST_SERVICE_ENDPOINT=${DATAPREP_INGEST_SERVICE_ENDPOINT}
- DATAPREP_GEN_TRANSCRIPT_SERVICE_ENDPOINT=${DATAPREP_GEN_TRANSCRIPT_SERVICE_ENDPOINT}
- DATAPREP_GEN_CAPTION_SERVICE_ENDPOINT=${DATAPREP_GEN_CAPTION_SERVICE_ENDPOINT}
ipc: host
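Since the new entrypoints interpolate `$EMBEDDING_MODEL_ID` and `$LVM_MODEL_ID` from the shell environment, it can help to confirm the values resolve before starting the stack. A sketch using Compose's config renderer, run from the directory containing this compose.yaml:

```bash
# Sketch: render the effective compose file and check the entrypoint model IDs.
docker compose -f compose.yaml config | grep -E "bridgetower_server|llava_server"
```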
6 changes: 4 additions & 2 deletions MultimodalQnA/docker_compose/intel/cpu/xeon/set_env.sh
@@ -15,13 +15,15 @@ export INDEX_NAME="mm-rag-redis"
export LLAVA_SERVER_PORT=8399
export LVM_ENDPOINT="http://${host_ip}:8399"
export EMBEDDING_MODEL_ID="BridgeTower/bridgetower-large-itm-mlm-itc"
export LVM_MODEL_ID="llava-hf/llava-1.5-7b-hf"
export WHISPER_MODEL="base"
export MM_EMBEDDING_SERVICE_HOST_IP=${host_ip}
export MM_RETRIEVER_SERVICE_HOST_IP=${host_ip}
export LVM_SERVICE_HOST_IP=${host_ip}
export MEGA_SERVICE_HOST_IP=${host_ip}
export BACKEND_SERVICE_ENDPOINT="http://${host_ip}:8888/v1/multimodalqna"
export DATAPREP_INGEST_SERVICE_ENDPOINT="http://${host_ip}:6007/v1/ingest_with_text"
export DATAPREP_GEN_TRANSCRIPT_SERVICE_ENDPOINT="http://${host_ip}:6007/v1/generate_transcripts"
export DATAPREP_GEN_CAPTION_SERVICE_ENDPOINT="http://${host_ip}:6007/v1/generate_captions"
export DATAPREP_GET_VIDEO_ENDPOINT="http://${host_ip}:6007/v1/dataprep/get_videos"
export DATAPREP_DELETE_VIDEO_ENDPOINT="http://${host_ip}:6007/v1/dataprep/delete_videos"
export DATAPREP_GET_FILE_ENDPOINT="http://${host_ip}:6007/v1/dataprep/get_files"
export DATAPREP_DELETE_FILE_ENDPOINT="http://${host_ip}:6007/v1/dataprep/delete_files"
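A typical way to consume this script is to source it and then bring up the stack; a sketch, assuming the repository layout shown in the file paths above:

```bash
# Sketch: load the environment variables, then start the Xeon deployment.
cd MultimodalQnA/docker_compose/intel/cpu/xeon
source ./set_env.sh
docker compose -f compose.yaml up -d
```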
51 changes: 36 additions & 15 deletions MultimodalQnA/docker_compose/intel/hpu/gaudi/README.md
@@ -40,10 +40,11 @@ export MM_RETRIEVER_SERVICE_HOST_IP=${host_ip}
export LVM_SERVICE_HOST_IP=${host_ip}
export MEGA_SERVICE_HOST_IP=${host_ip}
export BACKEND_SERVICE_ENDPOINT="http://${host_ip}:8888/v1/multimodalqna"
export DATAPREP_INGEST_SERVICE_ENDPOINT="http://${host_ip}:6007/v1/ingest_with_text"
export DATAPREP_GEN_TRANSCRIPT_SERVICE_ENDPOINT="http://${host_ip}:6007/v1/generate_transcripts"
export DATAPREP_GEN_CAPTION_SERVICE_ENDPOINT="http://${host_ip}:6007/v1/generate_captions"
export DATAPREP_GET_VIDEO_ENDPOINT="http://${host_ip}:6007/v1/dataprep/get_videos"
export DATAPREP_DELETE_VIDEO_ENDPOINT="http://${host_ip}:6007/v1/dataprep/delete_videos"
export DATAPREP_GET_FILE_ENDPOINT="http://${host_ip}:6007/v1/dataprep/get_files"
export DATAPREP_DELETE_FILE_ENDPOINT="http://${host_ip}:6007/v1/dataprep/delete_files"
```

Note: Please replace `host_ip` with your external IP address; do not use localhost.
@@ -224,56 +225,76 @@ curl http://${host_ip}:9399/v1/lvm \

6. Multimodal Dataprep Microservice

Download a sample video
Download a sample video, image, and audio file, and create a caption file

```bash
export video_fn="WeAreGoingOnBullrun.mp4"
wget http://commondatastorage.googleapis.com/gtv-videos-bucket/sample/WeAreGoingOnBullrun.mp4 -O ${video_fn}

export image_fn="apple.png"
wget https://github.com/docarray/docarray/blob/main/tests/toydata/image-data/apple.png?raw=true -O ${image_fn}

export caption_fn="apple.txt"
echo "This is an apple." > ${caption_fn}

export audio_fn="AudioSample.wav"
wget https://github.com/intel/intel-extension-for-transformers/raw/main/intel_extension_for_transformers/neural_chat/assets/audio/sample.wav -O ${audio_fn}
```

Test the dataprep microservice's transcript generation. This command updates the knowledge base by uploading a local video (.mp4) and an audio (.wav) file.

```bash
curl --silent --write-out "HTTPSTATUS:%{http_code}" \
${DATAPREP_GEN_TRANSCRIPT_SERVICE_ENDPOINT} \
-H 'Content-Type: multipart/form-data' \
-X POST -F "files=@./${video_fn}"
-X POST \
-F "files=@./${video_fn}" \
-F "files=@./${audio_fn}"
```

Also, test dataprep microservice with generating caption using lvm-tgi
Also, test the dataprep microservice's image caption generation, which uses lvm-tgi

```bash
curl --silent --write-out "HTTPSTATUS:%{http_code}" \
${DATAPREP_GEN_CAPTION_SERVICE_ENDPOINT} \
-H 'Content-Type: multipart/form-data' \
-X POST -F "files=@./${video_fn}"
-X POST -F "files=@./${image_fn}"
```
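Caption generation will fail until the LVM is actually serving (the tests in this PR were updated to wait for it). A sketch of a readiness poll, assuming the TGI server publishes its standard `/health` route on `LLAVA_SERVER_PORT`:

```bash
# Sketch: poll TGI's health route until the lvm-tgi container is ready.
until curl --silent --fail http://${host_ip}:${LLAVA_SERVER_PORT}/health > /dev/null; do
    echo "Waiting for lvm-tgi..."
    sleep 10
done
echo "lvm-tgi is ready"
```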

Now, test the microservice by posting an image along with a custom caption

```bash
curl --silent --write-out "HTTPSTATUS:%{http_code}" \
${DATAPREP_INGEST_SERVICE_ENDPOINT} \
-H 'Content-Type: multipart/form-data' \
-X POST -F "files=@./${image_fn}" -F "files=@./${caption_fn}"
```

Also, you are able to get the list of all videos that you uploaded:
You can also get the list of all files that you uploaded:

```bash
curl -X POST \
-H "Content-Type: application/json" \
${DATAPREP_GET_VIDEO_ENDPOINT}
${DATAPREP_GET_FILE_ENDPOINT}
```

Then you will get the response python-style LIST like this. Notice the name of each uploaded video e.g., `videoname.mp4` will become `videoname_uuid.mp4` where `uuid` is a unique ID for each uploaded video. The same video that are uploaded twice will have different `uuid`.
Then you will get a Python-style list in the response, like the one below. Notice that each uploaded file name, e.g. `videoname.mp4`, becomes `videoname_uuid.mp4`, where `uuid` is a unique ID assigned to each upload. The same file uploaded twice will have two different `uuid`s.

```bash
[
"WeAreGoingOnBullrun_7ac553a1-116c-40a2-9fc5-deccbb89b507.mp4",
"WeAreGoingOnBullrun_6d13cf26-8ba2-4026-a3a9-ab2e5eb73a29.mp4"
"WeAreGoingOnBullrun_6d13cf26-8ba2-4026-a3a9-ab2e5eb73a29.mp4",
"apple_fcade6e6-11a5-44a2-833a-3e534cbe4419.png",
"AudioSample_976a85a6-dc3e-43ab-966c-9d81beef780c.wav
]
```

To delete all uploaded videos along with data indexed with `$INDEX_NAME` in REDIS.
To delete all uploaded files, along with the data indexed under `$INDEX_NAME` in Redis:

```bash
curl -X POST \
-H "Content-Type: application/json" \
${DATAPREP_DELETE_VIDEO_ENDPOINT}
${DATAPREP_DELETE_FILE_ENDPOINT}
```

7. MegaService
2 changes: 2 additions & 0 deletions MultimodalQnA/docker_compose/intel/hpu/gaudi/compose.yaml
@@ -36,6 +36,7 @@ services:
http_proxy: ${http_proxy}
https_proxy: ${https_proxy}
PORT: ${EMBEDDER_PORT}
entrypoint: ["python", "bridgetower_server.py", "--device", "hpu", "--model_name_or_path", $EMBEDDING_MODEL_ID]
restart: unless-stopped
embedding-multimodal:
image: ${REGISTRY:-opea}/embedding-multimodal:${TAG:-latest}
@@ -139,6 +140,7 @@ services:
- https_proxy=${https_proxy}
- http_proxy=${http_proxy}
- BACKEND_SERVICE_ENDPOINT=${BACKEND_SERVICE_ENDPOINT}
- DATAPREP_INGEST_SERVICE_ENDPOINT=${DATAPREP_INGEST_SERVICE_ENDPOINT}
- DATAPREP_GEN_TRANSCRIPT_SERVICE_ENDPOINT=${DATAPREP_GEN_TRANSCRIPT_SERVICE_ENDPOINT}
- DATAPREP_GEN_CAPTION_SERVICE_ENDPOINT=${DATAPREP_GEN_CAPTION_SERVICE_ENDPOINT}
ipc: host
5 changes: 3 additions & 2 deletions MultimodalQnA/docker_compose/intel/hpu/gaudi/set_env.sh
@@ -22,7 +22,8 @@ export MM_RETRIEVER_SERVICE_HOST_IP=${host_ip}
export LVM_SERVICE_HOST_IP=${host_ip}
export MEGA_SERVICE_HOST_IP=${host_ip}
export BACKEND_SERVICE_ENDPOINT="http://${host_ip}:8888/v1/multimodalqna"
export DATAPREP_INGEST_SERVICE_ENDPOINT="http://${host_ip}:6007/v1/ingest_with_text"
export DATAPREP_GEN_TRANSCRIPT_SERVICE_ENDPOINT="http://${host_ip}:6007/v1/generate_transcripts"
export DATAPREP_GEN_CAPTION_SERVICE_ENDPOINT="http://${host_ip}:6007/v1/generate_captions"
export DATAPREP_GET_VIDEO_ENDPOINT="http://${host_ip}:6007/v1/dataprep/get_videos"
export DATAPREP_DELETE_VIDEO_ENDPOINT="http://${host_ip}:6007/v1/dataprep/delete_videos"
export DATAPREP_GET_FILE_ENDPOINT="http://${host_ip}:6007/v1/dataprep/get_files"
export DATAPREP_DELETE_FILE_ENDPOINT="http://${host_ip}:6007/v1/dataprep/delete_files"