[QUESTION] Is there a memory leak in huggingface embedding with pipeline mode #2054

Open
mshakirDr opened this issue Aug 10, 2024 · 1 comment
Labels: question (Further information is requested)

@mshakirDr
Question

I have been trying to ingest about 1000 PDFs through PGPT. After testing, I found that the pipeline with 1 worker is the fastest option on my system (more workers actually slow it down). However, the 8 GB of VRAM and 32 GB (out of 64 GB) of shared memory on my system quickly fill up even when I ingest only 10 PDFs at a time. I tried to work around the memory hogging by restarting the pipeline for every batch. Below is the chunking solution I built around LocalIngestWorker from ingest_folder.py.

    from pathlib import Path

    from private_gpt.di import global_injector
    from private_gpt.server.ingest.ingest_service import IngestService
    from private_gpt.settings.settings import Settings
    from scripts.ingest_folder import LocalIngestWorker

    def split_into_chunks(lst, n):
        # Split a list into consecutive batches of size n.
        return [lst[i:i + n] for i in range(0, len(lst), n)]

    files = get_list_of_combined_files(folders)
    print(len(files))
    chunks = split_into_chunks(files, 10)  # batches of 10 PDFs
    ignored = []  # patterns to skip, as in ingest_folder.py's --ignored
    for index, chunk in enumerate(chunks):
        print("Chunk number", index, "of", len(chunks))
        destination = r"\Temp\\"
        copy_new_files(destination, chunk)
        # Rebuild the ingestion stack for every batch, hoping the previous
        # one gets garbage collected and its memory released.
        ingest_service = global_injector.get(IngestService)
        settings = global_injector.get(Settings)
        worker = LocalIngestWorker(ingest_service, settings)
        worker.ingest_folder(Path(destination), ignored)
        del worker
        del ingest_service
        del settings

However, this does not release the memory at the end of the loop, and the same problem persists (I even tried del, with no luck). I searched around for potential memory leak issues with the HuggingFace text embeddings solution and found this memory leak issue.
Is anyone else facing the same issue with the ingest mode pipeline and HuggingFace embeddings on an NVIDIA GPU? I would appreciate any solutions or suggestions.
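A minimal sketch of the allocator behavior at play here (an illustration, not code from PGPT): del only drops the Python reference, and PyTorch's caching allocator keeps the freed blocks reserved for reuse until torch.cuda.empty_cache() hands them back to the driver.

    import torch

    # Allocate a large tensor on the GPU (~4 GB of float32).
    x = torch.randn(1024, 1024, 1024, device="cuda")
    print(torch.cuda.memory_allocated(), torch.cuda.memory_reserved())

    del x  # the tensor is freed, but its blocks stay in PyTorch's cache
    print(torch.cuda.memory_allocated(), torch.cuda.memory_reserved())
    # memory_allocated() is now ~0, but memory_reserved() is still ~4 GB,
    # which is why tools like nvidia-smi keep showing the VRAM as occupied.

    torch.cuda.empty_cache()  # return the cached blocks to the CUDA driver
    print(torch.cuda.memory_reserved())  # now back down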

@mshakirDr (Author)

I have found a workaround: ingest 5 PDFs at a time, clear the torch CUDA cache, and restart the process for the next batch (pipeline mode, mock profile, HuggingFace embedding model). It is slow, but it works: the memory is reset after every batch. Writing the results to the database takes time and the GPU sits idle in the meantime, but it is the most efficient approach I could find for my hardware. I added the following at the end of each iteration of my code adapted from ingest_folder.py.

    import gc

    import torch

    del worker
    del settings
    del ingest_service
    with torch.no_grad():
        torch.cuda.empty_cache()  # return cached, unreferenced blocks to the driver
        gc.collect()              # sweep any remaining Python references
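If the in-process cleanup ever stops being enough, a heavier but more reliable variant of the same idea is to run each batch in a short-lived child process, so the driver reclaims all of its VRAM when the process exits. A minimal sketch, assuming PGPT's import paths (the batch folder names and the ingest_batch helper are placeholders, not part of the original code):

    import multiprocessing as mp
    from pathlib import Path

    def ingest_batch(folder: str) -> None:
        # Import inside the child so CUDA is initialized per process.
        from private_gpt.di import global_injector
        from private_gpt.server.ingest.ingest_service import IngestService
        from private_gpt.settings.settings import Settings
        from scripts.ingest_folder import LocalIngestWorker

        ingest_service = global_injector.get(IngestService)
        settings = global_injector.get(Settings)
        worker = LocalIngestWorker(ingest_service, settings)
        worker.ingest_folder(Path(folder), [])  # [] = nothing ignored

    if __name__ == "__main__":
        ctx = mp.get_context("spawn")  # "spawn" is the safe start method with CUDA
        for batch_folder in ["batch_0", "batch_1"]:  # placeholder folders
            p = ctx.Process(target=ingest_batch, args=(batch_folder,))
            p.start()
            p.join()  # the child's VRAM is fully released when it exits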
