
[Bug]: LocalTensorRTLLM not loading without any debug. #10444

Closed
teis-e opened this issue Feb 4, 2024 · 1 comment
Labels
bug, triage

Comments


teis-e commented Feb 4, 2024

Bug Description

I got all the necessary files as described in the Nvidia TensorRT-LLM documentation.

Like this in my dir:

app.py
model/
├── .gitkeep
├── config.json
├── generation_config.json
├── LICENSE.txt
├── llama_float16_tp1_rank0.engine
├── model.cache
├── model.safetensors.index.json
├── model-00001-of-00003.safetensors
├── model-00002-of-00003.safetensors
├── model-00003-of-00003.safetensors
├── pytorch_model.bin.index.json
├── pytorch_model-00001-of-00003.bin
├── pytorch_model-00002-of-00003.bin
├── pytorch_model-00003-of-00003.bin
├── README.md
├── Responsible-Use-Guide.pdf
├── special_tokens_map.json
├── tokenizer.json
├── tokenizer.model
├── tokenizer_config.json
└── USE_POLICY.md

All in a venv with a working LlamaIndex install.

CUDA, cuDNN, etc. are installed.

Operating system is WSL 2 (Ubuntu 22.04) on Windows 10.
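For reference, GPU visibility inside the venv can be confirmed with a quick probe like the one below (this assumes torch is installed; any CUDA-capable check would do):

```python
# Quick probe: confirm the venv actually sees a CUDA device under WSL 2.
import torch

print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("Device:", torch.cuda.get_device_name(0))
```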

```python
import logging
import sys

logging.basicConfig(stream=sys.stdout, level=logging.INFO)
logging.getLogger().addHandler(logging.StreamHandler(stream=sys.stdout))

from llama_index.llms import LocalTensorRTLLM


def completion_to_prompt(completion: str) -> str:
    """
    Given a completion, return the prompt using llama2 format.
    """
    return f" [INST] {completion} [/INST] "


llm = LocalTensorRTLLM(
    model_path="./model",
    engine_name="llama_float16_tp1_rank0.engine",
    tokenizer_dir="meta-llama/Llama-2-13b-chat",
    completion_to_prompt=completion_to_prompt,
    verbose=True,
)

resp = llm.complete("Who is Paul Graham?")
print(str(resp))
```

Version

0.9.43

Steps to Reproduce

Running the code with all necessary libraries installed.

Relevant Logs/Tracebacks

No response

@teis-e added the bug and triage labels on Feb 4, 2024

dosubot bot commented Feb 4, 2024

🤖

Hey @teis-e! Great to see you back. Hope you're doing well!

Based on the information you've provided, it's difficult to pinpoint the exact issue as there are no error messages or logs. However, I can suggest a few things to check:

  1. Model File Existence: Ensure that the model file exists at the specified path. The LocalTensorRTLLM class checks if the model file exists at the provided path and raises a ValueError if it does not.

  2. TensorRT-LLM Library: Make sure that the TensorRT-LLM library is installed and can be imported correctly. The code tries to import the tensorrt_llm library and raises an ImportError if it fails.

  3. CUDA Availability: The code requires a CUDA-enabled GPU to run. It checks if CUDA is available and raises an AssertionError if it is not. Please ensure that a CUDA-enabled GPU is available and the necessary drivers are installed.

  4. World Size Matching: The code checks if the world size of the engine matches the runtime world size and raises an AssertionError if they do not match. Please ensure that the world sizes match.

  5. Pipeline Parallelism: The code does not support pipeline parallelism. It raises an AssertionError if the pipeline parallelism size is not 1. Please ensure that the pipeline parallelism size is 1.

  6. Engine File Opening: The code tries to open the engine file and raises an error if it fails. Please ensure that the engine file can be opened correctly (see the sanity-check sketch below).

Regarding the model_path and engine_name parameters in the LocalTensorRTLLM class, model_path should be a directory containing the TensorRT engine and its configuration, and engine_name (if provided) should be the name of the serialized TensorRT engine file within that directory.
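For items 1, 2, and 6, a minimal sanity-check sketch along these lines may help narrow the problem down; the paths mirror the constructor call above, and it assumes tensorrt_llm is installed in the same venv:

```python
import os

# Same values as in the LocalTensorRTLLM call above.
model_path = "./model"
engine_name = "llama_float16_tp1_rank0.engine"
engine_path = os.path.join(model_path, engine_name)

# Item 1: the model directory and the serialized engine must exist on disk.
assert os.path.isdir(model_path), f"missing model dir: {model_path}"
assert os.path.isfile(engine_path), f"missing engine file: {engine_path}"

# Item 6: the engine file should at least open and be non-empty.
with open(engine_path, "rb") as f:
    assert f.read(16), "engine file appears to be empty"

# Item 2: tensorrt_llm must import cleanly in this environment.
import tensorrt_llm  # noqa: F401
print("tensorrt_llm imported OK")
```

If all of these pass, the failure is more likely in the CUDA availability or world-size checks above.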

If you're still encountering issues, could you please provide more specific error messages or logs? This would help in diagnosing the problem more accurately.


This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research.


@dosubot added the stale label on May 5, 2024
@dosubot closed this as not planned on May 12, 2024
@dosubot removed the stale label on May 12, 2024