GPU OOM during the embedding process #13428

Open · 1 task done
mistoFENG opened this issue May 11, 2024 · 1 comment
Labels: question (Further information is requested)

Comments

@mistoFENG

Question Validation

  • I have searched both the documentation and discord for an answer.

Question

As the files being embedded get larger, GPU memory usage keeps increasing during the embedding process, which eventually causes a GPU OOM error. How can I avoid this? Thanks


dosubot bot commented May 11, 2024

To manage GPU memory and avoid Out of Memory (OOM) errors during the embedding process, consider implementing the following strategies:

  1. Limit Batch Size: Ensure your data is processed in smaller batches that do not exceed the maximum batch size limit, which is set to 259 in the NVIDIAEmbedding class. Smaller batches keep peak GPU memory usage down (see the first sketch after this list).

  2. Truncate Input Text: Use the truncation feature in the NVIDIAEmbedding class to keep the input text within the model's maximum token length, reducing memory usage.

  3. Optimize Memory Usage: After each embedding operation, call torch.cuda.empty_cache() to release PyTorch's cached GPU memory and gc.collect() to trigger Python garbage collection (see the second sketch below the list).

  4. Adjust Configurable Parameters: Experiment with adjusting parameters such as DEFAULT_EMBED_BATCH_SIZE, DEFAULT_CONTEXT_WINDOW, and others to find a balance that works best for your specific use case and helps in managing GPU memory usage.

  5. Utilize Error Handling Mechanisms: The codebase includes error handling for embedding processes, which can be extended or adapted to manage GPU OOM errors by logging or taking recovery actions through the callback system.
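
The first two points can be applied when the embedding model is constructed. Here is a minimal sketch, assuming the model is configured through LlamaIndex's embed_batch_size option; the model name and batch size below are illustrative assumptions, not values taken from this issue:

```python
# Minimal sketch: cap the embedding batch size so fewer texts are tokenized
# and embedded on the GPU at once. Model name and batch size are assumptions.
from llama_index.core import Settings
from llama_index.embeddings.huggingface import HuggingFaceEmbedding

Settings.embed_model = HuggingFaceEmbedding(
    model_name="BAAI/bge-small-en-v1.5",  # assumed local model, swap in your own
    embed_batch_size=8,                   # smaller batches -> lower peak GPU memory
)
```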

By carefully managing batch sizes, input text length, and memory usage, and by adjusting configurable parameters, you can mitigate the risk of GPU OOM errors during the embedding process.
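
Point 3 can be wired into a manual embedding loop. A rough sketch, assuming any LlamaIndex embedding instance that exposes get_text_embedding_batch; the helper name and chunk size are illustrative:

```python
# Minimal sketch: embed texts in small chunks and release cached GPU memory
# between chunks. embed_model is any LlamaIndex embedding instance.
import gc
import torch

def embed_in_chunks(embed_model, texts, chunk_size=64):
    embeddings = []
    for start in range(0, len(texts), chunk_size):
        chunk = texts[start:start + chunk_size]
        embeddings.extend(embed_model.get_text_embedding_batch(chunk))
        if torch.cuda.is_available():
            torch.cuda.empty_cache()  # free PyTorch's cached GPU allocations
        gc.collect()                  # collect unreachable Python objects
    return embeddings
```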


