New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Serving] Image support in JSONFFIEngine #2208
Conversation
@anibohara2000 please fix the windows build error |
bfb5154
to
727c351
Compare
cpp/json_ffi/image_utils.cc
Outdated
|
||
} // namespace json_ffi | ||
} // namespace llm | ||
} // namespace mlc |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
need new line at end of file
cpp/json_ffi/image_utils.cc
Outdated
// Center crop | ||
const int crop_x = (new_width - target_size) / 2; | ||
const int crop_y = (new_height - target_size) / 2; | ||
float* cropped_image_data = new float[target_size * target_size * 3](); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
avoid using new, instead use std::vector cropped_imagte_data(size);
They get automatically released when going out of scope
cpp/json_ffi/image_utils.cc
Outdated
MemoryBufferStream stream(base64_str.c_str(), base64_str.size()); | ||
tvm::support::Base64InStream base64_stream(&stream); | ||
size_t decoded_size = Base64DecodedSize(base64_str); | ||
unsigned char* decoded = new unsigned char[decoded_size](); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
prefer std::vector/std::string over new
cpp/json_ffi/config.cc
Outdated
} | ||
|
||
picojson::object config = json::LoadJSONFromString(model_config_str, err).value(); | ||
if (config.find("model_config") == config.end()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems we are lazily loading these config per request. I think it is better to populate once during reload
cpp/json_ffi/image_utils.cc
Outdated
const int new_width = width < height ? new_short_side : new_long_side; | ||
const int new_height = width > height ? new_short_side : new_long_side; | ||
|
||
float* processed_image_data = new float[new_width * new_height * 3](); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
prefer managed memory (std::vector
cpp/json_ffi/image_utils.h
Outdated
|
||
using namespace tvm::runtime; | ||
|
||
unsigned char* LoadImageFromBase64(std::string base64_str, int* width, int* height, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We generally prefer managed memory. We can consider return an NDArray (on CPU), with {height, width} dimension and uint8
as dtype.
|
||
|
||
class JSONFFIEngine: | ||
def __init__( # pylint: disable=too-many-arguments,too-many-locals |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let us consolidate the implementation of JSONFFIEngine into a single place.
mlc_llm/python/json_ffi/engine.py
Then import these from there, so we don't have to duplicate things for json_ffi_engine_image and json_ffi testcases
#2241 should help that lifts out the common JSONFFI into a package namespace |
48cc617
to
78f2811
Compare
cpp/json_ffi/image_utils.h
Outdated
|
||
std::optional<NDArray> LoadImageFromBase64(std::string base64_str, std::string* err); | ||
|
||
NDArray ClipPreprocessor(NDArray image_data, int target_size, DLDevice device, std::string* err); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minor nit: document all public API
cpp/json_ffi/image_utils.h
Outdated
|
||
using namespace tvm::runtime; | ||
|
||
std::optional<NDArray> LoadImageFromBase64(std::string base64_str, std::string* err); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
const std::string &base64_str
cpp/json_ffi/image_utils.h
Outdated
namespace llm { | ||
namespace json_ffi { | ||
|
||
using namespace tvm::runtime; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
avoid using namespace in header file, instead, do tvm::runtime::NDArray in arguments
cpp/json_ffi/config.cc
Outdated
*err += "vision_config should be an object"; | ||
return std::nullopt; | ||
} | ||
picojson::object vision_config = model_config["vision_config"].get<picojson::object>(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Create VisionConfig and ModelConfig as Explicit in memory structure object(ModelConfig has an optional VisionConfig field) in-memory so we don't have to such parsing per Call
cpp/json_ffi/config.cc
Outdated
*err += "model_config should be an object"; | ||
return std::nullopt; | ||
} | ||
picojson::object model_config = config["model_config"].get<picojson::object>(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use explicit object. In this case, ideally we should do const picojson::object& model_config
, otherwise it will trigger a deep copy of the object. Of course moving to explicit in-memory structure will avoid the issue. Let us do that
we just had a major refactor landed, unfortunately there was a bit of conflict, please rebase |
83a5243
to
afe66e4
Compare
afe66e4
to
bdffdf0
Compare
Thank you @anibohara2000 ! |
This PR adds support for Image input in JSONFFIEngine, so that now LLaVa model can be used.
The image has to be passed as base64 encoded. See
tests/python/json_ffi/test_json_ffi_engine_image.py
for example usage.For reading the image in C++, the header-only file
stb_image.h
is being used from the nothings/stb repository. This repository has been added as a submodule in3rdparty/
The image has to be pre-processed before passing into the LLaVa model (which uses CLIP Vision model). Currently Bilinear interpolation is being used for resizing, instead of Huggingface CLIPImageProcessor's default Bicubic interpolation