
[Question]: Why is indexing so slow? #13313

Closed
1 task done
jiaoqianji opened this issue May 7, 2024 · 6 comments
Labels
question Further information is requested

Comments

@jiaoqianji

Question Validation

  • I have searched both the documentation and discord for an answer.

Question

I'm trying to create indexes for a database that is 50 GB in size, and I've spent more than 5 days running the code below without any results. Then I noticed that the GPU utilization remains at 0; what should I do to speed up the indexing process?

(screenshot: GPU utilization at 0%)

from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
from llama_index.embeddings.huggingface import HuggingFaceEmbedding

data_path = "/databases/"
PERSIST_DIR = "/data_cache/fin_databases"
embedding_model_path = "/Qwen1.5-0.5B/"

# Load every file under data_path into memory as Document objects
documents = SimpleDirectoryReader(data_path).load_data()

# Local HuggingFace embedding model, pinned to GPU 7
embed_model = HuggingFaceEmbedding(
    model_name=embedding_model_path,
    device="cuda:7",
)

# Build the index, passing the embed model as the only transformation
index = VectorStoreIndex.from_documents(
    documents, show_progress=True, transformations=[embed_model]
)

index.storage_context.persist(persist_dir=PERSIST_DIR)
jiaoqianji added the question label on May 7, 2024

dosubot bot commented May 7, 2024

To address the issue of 0% GPU utilization during the indexing process with your NVIDIA A40 GPU, follow these targeted steps:

  1. Ensure CUDA Compatibility: Confirm that CUDA is correctly installed and that its version is compatible with your GPU. This is crucial for enabling GPU acceleration in Python.

  2. Framework Version Check: Make sure you're using a version of PyTorch or TensorFlow that supports your CUDA version. Compatibility between these components is essential for GPU utilization.

  3. Optimize with LocalTensorRTLLM: For efficient NVIDIA GPU utilization, particularly with TensorRT for language models, use the LocalTensorRTLLM class from llama_index.legacy.llms.nvidia_tensorrt. This class is specifically designed to leverage NVIDIA GPUs:

    from llama_index.legacy.llms.nvidia_tensorrt import LocalTensorRTLLM
    
    tensorrt_llm = LocalTensorRTLLM(
        model_path="path/to/your/model",
        tokenizer_dir="path/to/your/tokenizer",
        temperature=0.1,
        max_new_tokens=512,
        context_window=2048,
        verbose=True
    )
  4. Verify GPU Device Identifier: Double-check that cuda:7 correctly identifies your NVIDIA A40 GPU. Use nvidia-smi to list all CUDA-capable devices and ensure the identifier matches.

  5. Monitor GPU Utilization: Utilize nvidia-smi to monitor GPU utilization in real-time, confirming that your GPU is being engaged during the indexing operation.

  6. Debugging: If the issue persists, simplify your code to a basic operation that should engage the GPU, helping to isolate the problem.

This approach focuses on ensuring compatibility and proper configuration for GPU utilization, leveraging specific classes designed for NVIDIA GPUs, and monitoring to confirm active GPU engagement.


@logan-markewich
Collaborator

50 GB is a ton of data to load into memory. I wonder if it's getting stuck there? If you hit Ctrl+C, the traceback will tell us where it is in the code.

I would also suggest testing the embedding model on its own first to ensure it uses the GPU; embeddings = embed_model.get_text_embedding("Hello World!") is an easy test.

If you have the memory, you can also increase the batch size, e.g. HuggingFaceEmbedding(..., embed_batch_size=32) (the default is 10).

@logan-markewich
Collaborator

Also, by specifying transformations=[embed_model], you are removing any text splitting (not sure if that's intended or not). See the sketch below for a version that keeps chunking.

@logan-markewich
Collaborator

You also didn't specify a vector DB, so all the embeddings (embeddings for 50 GB of text!) will be held in memory -- probably also not ideal; one concrete setup is sketched below.

@logan-markewich
Collaborator

If I were to rewrite this, it might look like:

from llama_index.core import SimpleDirectoryReader, StorageContext, VectorStoreIndex
from llama_index.core.node_parser import SentenceSplitter

splitter = SentenceSplitter(chunk_size=1024, chunk_overlap=128)
vector_store = ...  # some remote/hosted/optimized vector store: qdrant, chroma, weaviate, etc.
storage_context = StorageContext.from_defaults(vector_store=vector_store)
index = VectorStoreIndex(nodes=[], storage_context=storage_context, embed_model=embed_model)

# iterate and insert in batches; iter_data() yields the documents one file at a time
cur_batch = []
for docs in SimpleDirectoryReader(data_path).iter_data():
    cur_batch.extend(docs)
    if len(cur_batch) >= 100:
        nodes = splitter(cur_batch)
        index.insert_nodes(nodes)
        cur_batch = []

# don't forget the last partial batch
if cur_batch:
    nodes = splitter(cur_batch)
    index.insert_nodes(nodes)

@jiaoqianji
Author

Thanks for your help, really appreciate it. I realized it was stuck and split the data into small batches. Thanks a lot.
