Hey @teis-e! Great to see you back. Hope you're doing well!
Based on the information you've provided, it's difficult to pinpoint the exact issue without error messages or logs. However, here are a few things to check (a combined diagnostic sketch follows this list):

1. **Model file existence**: Ensure the model file exists at the specified path. The `LocalTensorRTLLM` class checks that the engine file exists at the provided path and raises a `ValueError` if it does not.
2. **TensorRT-LLM library**: Make sure the TensorRT-LLM library is installed and importable. The code imports `tensorrt_llm` and raises an `ImportError` if that fails.
3. **CUDA availability**: The code requires a CUDA-enabled GPU and raises an `AssertionError` if CUDA is unavailable. Ensure a CUDA-enabled GPU is present and the necessary drivers are installed.
4. **World size matching**: The code asserts that the engine's world size matches the runtime world size; make sure they agree.
5. **Pipeline parallelism**: Pipeline parallelism is not supported; the code asserts that the pipeline-parallelism size is 1.
6. **Engine file opening**: The code opens the serialized engine file and raises an error if it cannot; verify the file is readable.
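Putting those checks together, here is a minimal preflight sketch you can run before constructing the LLM. The paths mirror the directory layout from the report below; adjust them to your setup. The `config.json` inspection is only illustrative, since the build-config schema varies across TensorRT-LLM versions:

```python
import json
import os

model_path = "./model"                          # directory with engine + config
engine_name = "llama_float16_tp1_rank0.engine"  # serialized engine inside it

# 1. Model/engine file existence
engine_file = os.path.join(model_path, engine_name)
assert os.path.isdir(model_path), f"{model_path} is not a directory"
assert os.path.isfile(engine_file), f"missing engine file: {engine_file}"

# 2. TensorRT-LLM is installed and importable
try:
    import tensorrt_llm  # noqa: F401
    print("tensorrt_llm imported OK")
except ImportError as e:
    raise SystemExit(f"tensorrt_llm not importable: {e}")

# 3. CUDA availability
import torch
assert torch.cuda.is_available(), "no CUDA-enabled GPU visible"
print("CUDA device:", torch.cuda.get_device_name(0))

# 4/5. Peek at the config for parallelism sizes (field names are
# version-dependent, so this only prints what is there).
config_file = os.path.join(model_path, "config.json")
if os.path.isfile(config_file):
    with open(config_file) as f:
        config = json.load(f)
    print("config keys:", list(config.keys()))

# 6. Engine file can be opened and read
with open(engine_file, "rb") as f:
    header = f.read(16)
print(f"read {len(header)} bytes from {engine_name}")
```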
Regarding the `model_path` and `engine_name` parameters of the `LocalTensorRTLLM` class: `model_path` should be a directory containing the TensorRT engine and its configuration, and `engine_name` (if provided) should be the name of the serialized TensorRT engine file within that directory.
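Concretely, with the layout from the report below, the mapping would look like this (a sketch of the intended usage, not a verified fix):

```python
from llama_index.llms import LocalTensorRTLLM

llm = LocalTensorRTLLM(
    model_path="./model",  # directory holding the engine and its config
    engine_name="llama_float16_tp1_rank0.engine",  # engine file inside model_path
    tokenizer_dir="meta-llama/Llama-2-13b-chat",
)
```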
If you're still encountering issues, could you please provide more specific error messages or logs? This would help in diagnosing the problem more accurately.
Bug Description
I got all the necessary files from this Nvidia TensorRT-LLM documentation.
My directory looks like this:
```
app.py
model/
├── .gitkeep
├── config.json
├── generation_config.json
├── LICENSE.txt
├── llama_float16_tp1_rank0.engine
├── model.cache
├── model.safetensors.index.json
├── model-00001-of-00003.safetensors
├── model-00002-of-00003.safetensors
├── model-00003-of-00003.safetensors
├── pytorch_model.bin.index.json
├── pytorch_model-00001-of-00003.bin
├── pytorch_model-00002-of-00003.bin
├── pytorch_model-00003-of-00003.bin
├── README.md
├── Responsible-Use-Guide.pdf
├── special_tokens_map.json
├── tokenizer.json
├── tokenizer.model
├── tokenizer_config.json
└── USE_POLICY.md
```
All in a venv with a working LlamaIndex install.
CUDA, cuDNN, etc. are installed.
Operating system: WSL 2 (Ubuntu 22.04) on Windows 10.
```python
import logging
import sys

# Mirror library logs to stdout
logging.basicConfig(stream=sys.stdout, level=logging.INFO)
logging.getLogger().addHandler(logging.StreamHandler(stream=sys.stdout))

from llama_index.llms import LocalTensorRTLLM


def completion_to_prompt(completion: str) -> str:
    """
    Given a completion, return the prompt using llama2 format.
    """
    return f"[INST] {completion} [/INST] "


llm = LocalTensorRTLLM(
    model_path="./model",
    engine_name="llama_float16_tp1_rank0.engine",
    tokenizer_dir="meta-llama/Llama-2-13b-chat",
    completion_to_prompt=completion_to_prompt,
    verbose=True,
)

resp = llm.complete("Who is Paul Graham?")
print(str(resp))
```
Version
0.9.43
Steps to Reproduce
Run the code above with all necessary libraries installed.
Relevant Logs/Tracebacks
No response