Hey @teis-e! Great to see you back. Hope you're doing well!
Based on the information you've provided, it's difficult to pinpoint the exact issue without error messages or logs. However, here are a few things to check (a combined diagnostic sketch follows this list):

1. **Model file existence**: Ensure the model file exists at the specified path. The `LocalTensorRTLLM` class checks that the engine file exists at the provided path and raises a `ValueError` if it does not.
2. **TensorRT-LLM library**: Make sure the TensorRT-LLM library is installed and importable. The code imports `tensorrt_llm` and raises an `ImportError` if that fails.
3. **CUDA availability**: The code requires a CUDA-enabled GPU and raises an `AssertionError` if CUDA is unavailable. Ensure a CUDA-enabled GPU is present and the necessary drivers are installed.
4. **World size matching**: The code asserts that the engine's world size matches the runtime world size; make sure they agree.
5. **Pipeline parallelism**: Pipeline parallelism is not supported; the code asserts that the pipeline-parallelism size is 1.
6. **Engine file opening**: The code opens the serialized engine file and raises an error if it cannot; verify the file is readable.
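Putting those checks together, here is a minimal preflight sketch you can run before constructing the LLM. The paths mirror the directory layout from the report below; adjust them to your setup. The `config.json` inspection is only illustrative, since the build-config schema varies across TensorRT-LLM versions:

```python
import json
import os

model_path = "./model"                          # directory with engine + config
engine_name = "llama_float16_tp1_rank0.engine"  # serialized engine inside it

# 1. Model/engine file existence
engine_file = os.path.join(model_path, engine_name)
assert os.path.isdir(model_path), f"{model_path} is not a directory"
assert os.path.isfile(engine_file), f"missing engine file: {engine_file}"

# 2. TensorRT-LLM is installed and importable
try:
    import tensorrt_llm  # noqa: F401
    print("tensorrt_llm imported OK")
except ImportError as e:
    raise SystemExit(f"tensorrt_llm not importable: {e}")

# 3. CUDA availability
import torch
assert torch.cuda.is_available(), "no CUDA-enabled GPU visible"
print("CUDA device:", torch.cuda.get_device_name(0))

# 4/5. Peek at the config for parallelism sizes (field names are
# version-dependent, so this only prints what is there).
config_file = os.path.join(model_path, "config.json")
if os.path.isfile(config_file):
    with open(config_file) as f:
        config = json.load(f)
    print("config keys:", list(config.keys()))

# 6. Engine file can be opened and read
with open(engine_file, "rb") as f:
    header = f.read(16)
print(f"read {len(header)} bytes from {engine_name}")
```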
Regarding the `model_path` and `engine_name` parameters of the `LocalTensorRTLLM` class: `model_path` should be a directory containing the TensorRT engine and its configuration, and `engine_name` (if provided) should be the name of the serialized TensorRT engine file within that directory.
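Concretely, with the layout from the report below, the mapping would look like this (a sketch of the intended usage, not a verified fix):

```python
from llama_index.llms import LocalTensorRTLLM

llm = LocalTensorRTLLM(
    model_path="./model",  # directory holding the engine and its config
    engine_name="llama_float16_tp1_rank0.engine",  # engine file inside model_path
    tokenizer_dir="meta-llama/Llama-2-13b-chat",
)
```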
If you're still encountering issues, could you please provide more specific error messages or logs? This would help in diagnosing the problem more accurately.
Bug Description
I got all the necessary files from this Nvidia TensorRT-LLM documentation.
My directory looks like this:
```
app.py
model/
├── .gitkeep
├── config.json
├── generation_config.json
├── LICENSE.txt
├── llama_float16_tp1_rank0.engine
├── model.cache
├── model.safetensors.index.json
├── model-00001-of-00003.safetensors
├── model-00002-of-00003.safetensors
├── model-00003-of-00003.safetensors
├── pytorch_model.bin.index.json
├── pytorch_model-00001-of-00003.bin
├── pytorch_model-00002-of-00003.bin
├── pytorch_model-00003-of-00003.bin
├── README.md
├── Responsible-Use-Guide.pdf
├── special_tokens_map.json
├── tokenizer.json
├── tokenizer.model
├── tokenizer_config.json
└── USE_POLICY.md
```
All in a venv with a working LlamaIndex install.
CUDA, cuDNN, etc. are installed.
Operating system: WSL 2 (Ubuntu 22.04) on Windows 10.
```python
import logging
import sys

# Mirror library logs to stdout
logging.basicConfig(stream=sys.stdout, level=logging.INFO)
logging.getLogger().addHandler(logging.StreamHandler(stream=sys.stdout))

from llama_index.llms import LocalTensorRTLLM


def completion_to_prompt(completion: str) -> str:
    """
    Given a completion, return the prompt using llama2 format.
    """
    return f"[INST] {completion} [/INST] "


llm = LocalTensorRTLLM(
    model_path="./model",
    engine_name="llama_float16_tp1_rank0.engine",
    tokenizer_dir="meta-llama/Llama-2-13b-chat",
    completion_to_prompt=completion_to_prompt,
    verbose=True,
)

resp = llm.complete("Who is Paul Graham?")
print(str(resp))
```
Version
0.9.43
Steps to Reproduce
Run the code above with all necessary libraries installed.
Relevant Logs/Tracebacks
No response