[Serving] Image support in JSONFFIEngine #2208

anibohara2000 · 2024-04-23T23:34:59Z

This PR adds support for image input in JSONFFIEngine, so the LLaVA model can now be used.

The image must be passed as a base64-encoded string. See tests/python/json_ffi/test_json_ffi_engine_image.py for example usage.

To read the image in C++, the PR uses the header-only library stb_image.h from the nothings/stb repository, which has been added as a submodule under 3rdparty/.

The image must be preprocessed before it is passed to the LLaVA model, which uses a CLIP vision encoder. Resizing currently uses bilinear interpolation rather than the bicubic interpolation that Hugging Face's CLIPImageProcessor uses by default.
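For readers comparing the two filters, bilinear resizing of an interleaved float image can be sketched as below. This is a minimal sketch, not the PR's code: the function name BilinearResize, the HWC layout, and the half-pixel (align_corners=false) sampling convention are assumptions for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not from the PR): bilinearly resize an interleaved HWC
// float image with `channels` channels to dst_h x dst_w.
// Uses half-pixel centers, i.e. the align_corners=false convention.
std::vector<float> BilinearResize(const std::vector<float>& src, int src_h, int src_w,
                                  int channels, int dst_h, int dst_w) {
  assert(static_cast<int>(src.size()) == src_h * src_w * channels);
  std::vector<float> dst(static_cast<std::size_t>(dst_h) * dst_w * channels);
  const float scale_y = static_cast<float>(src_h) / dst_h;
  const float scale_x = static_cast<float>(src_w) / dst_w;
  // Read one source sample at (row, col, channel).
  auto at = [&](int yy, int xx, int cc) {
    return src[(static_cast<std::size_t>(yy) * src_w + xx) * channels + cc];
  };
  for (int y = 0; y < dst_h; ++y) {
    // Map the destination pixel center back into source coordinates.
    const float sy = std::max((y + 0.5f) * scale_y - 0.5f, 0.0f);
    const int y0 = std::min(static_cast<int>(sy), src_h - 1);
    const int y1 = std::min(y0 + 1, src_h - 1);
    const float wy = sy - y0;
    for (int x = 0; x < dst_w; ++x) {
      const float sx = std::max((x + 0.5f) * scale_x - 0.5f, 0.0f);
      const int x0 = std::min(static_cast<int>(sx), src_w - 1);
      const int x1 = std::min(x0 + 1, src_w - 1);
      const float wx = sx - x0;
      for (int c = 0; c < channels; ++c) {
        // Interpolate horizontally on the two rows, then vertically.
        const float top = at(y0, x0, c) * (1 - wx) + at(y0, x1, c) * wx;
        const float bottom = at(y1, x0, c) * (1 - wx) + at(y1, x1, c) * wx;
        dst[(static_cast<std::size_t>(y) * dst_w + x) * channels + c] =
            top * (1 - wy) + bottom * wy;
      }
    }
  }
  return dst;
}
```

Bicubic interpolation would replace the 2x2 neighborhood with a 4x4 one and cubic weights, so pixel values will differ slightly from a bicubic pipeline even when the output size is the same.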

tqchen · 2024-04-24T17:54:08Z

@anibohara2000 please fix the Windows build error.

tqchen · 2024-04-27T20:13:35Z

cpp/json_ffi/image_utils.cc

+
+}  // namespace json_ffi
+}  // namespace llm
+}  // namespace mlc


Needs a newline at the end of the file.

tqchen · 2024-04-27T20:14:09Z

cpp/json_ffi/image_utils.cc

+  // Center crop
+  const int crop_x = (new_width - target_size) / 2;
+  const int crop_y = (new_height - target_size) / 2;
+  float* cropped_image_data = new float[target_size * target_size * 3]();


Avoid using new; instead use std::vector<float> cropped_image_data(size);

Vectors are released automatically when they go out of scope.
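A center crop written the way the reviewer suggests might look like the sketch below. CenterCrop and its HWC float layout are hypothetical, but the crop offsets follow the diff above: (size - target_size) / 2 on each axis.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: center-crop an interleaved HWC float image to
// target_size x target_size. The returned std::vector owns its memory,
// so nothing has to be deleted manually (unlike new float[...]).
std::vector<float> CenterCrop(const std::vector<float>& image, int height, int width,
                              int channels, int target_size) {
  assert(height >= target_size && width >= target_size);
  const int crop_y = (height - target_size) / 2;
  const int crop_x = (width - target_size) / 2;
  std::vector<float> cropped(static_cast<std::size_t>(target_size) * target_size * channels);
  for (int y = 0; y < target_size; ++y) {
    for (int x = 0; x < target_size; ++x) {
      for (int c = 0; c < channels; ++c) {
        cropped[(static_cast<std::size_t>(y) * target_size + x) * channels + c] =
            image[(static_cast<std::size_t>(y + crop_y) * width + (x + crop_x)) * channels + c];
      }
    }
  }
  return cropped;
}
```

Returning the vector by value is cheap because it is moved (or the copy is elided), so the caller owns the buffer without any explicit cleanup.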

tqchen · 2024-04-27T20:16:11Z

cpp/json_ffi/image_utils.cc

+  MemoryBufferStream stream(base64_str.c_str(), base64_str.size());
+  tvm::support::Base64InStream base64_stream(&stream);
+  size_t decoded_size = Base64DecodedSize(base64_str);
+  unsigned char* decoded = new unsigned char[decoded_size]();


Prefer std::vector/std::string over new.
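To illustrate decoding into managed memory, here is a standalone base64 decoder that returns a std::vector. It is a sketch under stated assumptions, not the PR's approach (the diff uses tvm::support::Base64InStream): DecodeBase64 is a hypothetical name, and it accepts only the standard alphabet with '=' padding and no whitespace, returning std::nullopt on malformed input.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Hypothetical standalone decoder: standard alphabet, '=' padding, no whitespace.
std::optional<std::vector<unsigned char>> DecodeBase64(const std::string& in) {
  auto value = [](char ch) -> int {
    if (ch >= 'A' && ch <= 'Z') return ch - 'A';
    if (ch >= 'a' && ch <= 'z') return ch - 'a' + 26;
    if (ch >= '0' && ch <= '9') return ch - '0' + 52;
    if (ch == '+') return 62;
    if (ch == '/') return 63;
    return -1;
  };
  if (in.size() % 4 != 0) return std::nullopt;
  std::vector<unsigned char> out;
  out.reserve(in.size() / 4 * 3);
  for (std::size_t i = 0; i < in.size(); i += 4) {
    std::uint32_t chunk = 0;  // four 6-bit groups packed into 24 bits
    int padding = 0;
    for (std::size_t j = 0; j < 4; ++j) {
      const char ch = in[i + j];
      if (ch == '=') {
        // Padding may only fill the last two slots of the final group.
        if (i + 4 != in.size() || j < 2) return std::nullopt;
        ++padding;
        chunk <<= 6;
        continue;
      }
      if (padding > 0) return std::nullopt;  // data after padding
      const int v = value(ch);
      if (v < 0) return std::nullopt;
      chunk = (chunk << 6) | static_cast<std::uint32_t>(v);
    }
    out.push_back(static_cast<unsigned char>((chunk >> 16) & 0xFF));
    if (padding < 2) out.push_back(static_cast<unsigned char>((chunk >> 8) & 0xFF));
    if (padding < 1) out.push_back(static_cast<unsigned char>(chunk & 0xFF));
  }
  return out;
}
```

The decoded bytes can then be handed to an image decoder as a pointer plus length (decoded.data(), decoded.size()) while the vector keeps ownership.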

tqchen · 2024-04-27T20:19:28Z

cpp/json_ffi/config.cc

+        }
+
+        picojson::object config = json::LoadJSONFromString(model_config_str, err).value();
+        if (config.find("model_config") == config.end()) {


It seems we are lazily loading this config on every request. I think it is better to populate it once during reload.
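The parse-once idea can be sketched as an engine that parses its config in Reload and only reads the cached result per request. Engine, ParsedConfig, ParseConfig and g_parse_count are hypothetical names: ParseConfig stands in for the picojson parsing and simply stores the string, and g_parse_count exists only to make the single parse observable.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical parsed form of the model config.
struct ParsedConfig {
  std::string model_type;
};

// Counts parses so the "parse once" property can be checked.
static int g_parse_count = 0;

// Stand-in for the real JSON parsing step; here it just stores the string.
ParsedConfig ParseConfig(const std::string& config_str) {
  ++g_parse_count;
  return ParsedConfig{config_str};
}

class Engine {
 public:
  // Parse the config once when the model is (re)loaded...
  void Reload(const std::string& config_str) { config_ = ParseConfig(config_str); }

  // ...and reuse the cached result for every request instead of re-parsing.
  std::string HandleRequest(const std::string& prompt) const {
    assert(config_.has_value() && "Reload must be called first");
    return config_->model_type + ": " + prompt;
  }

 private:
  std::optional<ParsedConfig> config_;
};
```

Parsing cost and error handling then move to reload time, so a malformed config fails once at startup rather than on every request.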

tqchen · 2024-04-27T20:20:05Z

cpp/json_ffi/image_utils.cc

+  const int new_width = width < height ? new_short_side : new_long_side;
+  const int new_height = width > height ? new_short_side : new_long_side;
+
+  float* processed_image_data = new float[new_width * new_height * 3]();


Prefer managed memory (std::vector).
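The new_width/new_height computation in the diff keeps the aspect ratio while resizing the shorter side to a fixed length. A minimal sketch of that computation, assuming a hypothetical ShortestSideResizeDims helper that truncates the scaled long side (rounding instead would be an equally valid choice):

```cpp
#include <cassert>
#include <utility>

// Hypothetical helper: output size for a shortest-side resize that keeps the
// aspect ratio. Returns {new_width, new_height}.
std::pair<int, int> ShortestSideResizeDims(int width, int height, int target_short_side) {
  assert(width > 0 && height > 0 && target_short_side > 0);
  const bool width_is_short = width <= height;
  const int short_side = width_is_short ? width : height;
  const int long_side = width_is_short ? height : width;
  // Scale the long side by target/short; 64-bit math avoids overflow and the
  // integer division truncates.
  const int new_long_side =
      static_cast<int>(static_cast<long long>(long_side) * target_short_side / short_side);
  return width_is_short ? std::make_pair(target_short_side, new_long_side)
                        : std::make_pair(new_long_side, target_short_side);
}
```

With the dimensions known, the resized buffer can be a std::vector<float> of new_width * new_height * 3 elements, as the comment suggests, instead of new float[...].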

tqchen · 2024-04-27T20:22:18Z

cpp/json_ffi/image_utils.h

+
+using namespace tvm::runtime;
+
+unsigned char* LoadImageFromBase64(std::string base64_str, int* width, int* height,


We generally prefer managed memory. We could consider returning an NDArray (on CPU) with {height, width} dimensions and uint8 as the dtype.
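NDArray requires TVM, so the sketch below uses a hypothetical ImageU8 struct as a CPU-only stand-in for a uint8 array (with an assumed third channel dimension for RGB). It shows how a C-allocated decoder buffer can be copied into managed storage without leaking. AdoptPixels and FreeDeleter are illustrative names; stb_image buffers are normally released with stbi_image_free rather than free, so the deleter would change accordingly.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <memory>
#include <vector>

// Hypothetical CPU-side stand-in for a uint8 NDArray: owns its pixels and
// records the shape (height, width, channels).
struct ImageU8 {
  int height = 0;
  int width = 0;
  int channels = 0;
  std::vector<std::uint8_t> data;  // freed automatically with the struct

  std::uint8_t At(int y, int x, int c) const {
    return data[(static_cast<std::size_t>(y) * width + x) * channels + c];
  }
};

// Deleter for buffers allocated with malloc().
struct FreeDeleter {
  void operator()(unsigned char* p) const { std::free(p); }
};

// Take ownership of a raw decoder buffer and copy it into managed storage.
// The unique_ptr frees the raw buffer on every exit path.
ImageU8 AdoptPixels(unsigned char* raw, int height, int width, int channels) {
  assert(raw != nullptr);
  std::unique_ptr<unsigned char, FreeDeleter> guard(raw);
  ImageU8 image;
  image.height = height;
  image.width = width;
  image.channels = channels;
  const std::size_t size = static_cast<std::size_t>(height) * width * channels;
  image.data.assign(guard.get(), guard.get() + size);
  return image;
}
```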

tqchen · 2024-04-27T20:23:38Z

tests/python/json_ffi/test_json_ffi_engine_image.py

+
+
+class JSONFFIEngine:
+    def __init__(  # pylint: disable=too-many-arguments,too-many-locals


Let us consolidate the implementation of JSONFFIEngine into a single place:

mlc_llm/python/json_ffi/engine.py

Then import it from there, so we don't have to duplicate code across the json_ffi_engine_image and json_ffi test cases.

tqchen · 2024-04-27T23:01:16Z

#2241, which lifts the common JSONFFI code into a package namespace, should help with that.

tqchen · 2024-04-29T11:58:52Z

cpp/json_ffi/image_utils.h

+
+std::optional<NDArray> LoadImageFromBase64(std::string base64_str, std::string* err);
+
+NDArray ClipPreprocessor(NDArray image_data, int target_size, DLDevice device, std::string* err);


Minor nit: document all public APIs.

tqchen · 2024-04-29T11:59:05Z

cpp/json_ffi/image_utils.h

+
+using namespace tvm::runtime;
+
+std::optional<NDArray> LoadImageFromBase64(std::string base64_str, std::string* err);


Take the argument as const std::string& base64_str.

tqchen · 2024-04-29T11:59:23Z

cpp/json_ffi/image_utils.h

+namespace llm {
+namespace json_ffi {
+
+using namespace tvm::runtime;


Avoid using namespace in a header file; instead, write tvm::runtime::NDArray in the arguments.
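Together, the three header comments above (document public APIs, take strings by const reference, avoid using namespace in headers) might look like this for a small helper. Base64DecodedSize reuses the helper name from the earlier diff, but this body, the example namespace, and the padding rule are assumptions for illustration, and the Doxygen-style comment is one possible documentation format.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

namespace example {  // hypothetical namespace; no `using namespace` at header scope

/*!
 * \brief Compute how many bytes a padded base64 string decodes to.
 * \param base64_str The base64 text, taken by const reference to avoid a copy.
 * \return The decoded size in bytes, or 0 if the length is not a positive multiple of 4.
 */
inline std::size_t Base64DecodedSize(const std::string& base64_str) {
  const std::size_t n = base64_str.size();
  if (n == 0 || n % 4 != 0) return 0;
  std::size_t padding = 0;
  if (base64_str[n - 1] == '=') ++padding;
  if (base64_str[n - 2] == '=') ++padding;
  return n / 4 * 3 - padding;
}

}  // namespace example
```

Because every type is spelled with its namespace (std::size_t, std::string), including this header does not leak names into the files that include it.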

tqchen · 2024-04-29T12:06:50Z

cpp/json_ffi/config.cc

+          *err += "vision_config should be an object";
+          return std::nullopt;
+        }
+        picojson::object vision_config = model_config["vision_config"].get<picojson::object>();


Create VisionConfig and ModelConfig as explicit in-memory structure objects (ModelConfig has an optional VisionConfig field), so we don't have to do such parsing on every call.

tqchen · 2024-04-29T12:08:12Z

cpp/json_ffi/config.cc

+          *err += "model_config should be an object";
+          return std::nullopt;
+        }
+        picojson::object model_config = config["model_config"].get<picojson::object>();


Use an explicit object. In this case, we should ideally write const picojson::object& model_config; otherwise it triggers a deep copy of the object. Of course, moving to an explicit in-memory structure avoids the issue altogether. Let us do that.
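One way to structure the explicit in-memory objects: ModelConfig carries an optional VisionConfig, both are filled once at reload, and per-request code reads them through a const reference. The field names image_size and patch_size and the helper NumImagePatches are assumptions for illustration. For a 336 px input with 14 px patches (the CLIP ViT-L/14-336 setup used by LLaVA-1.5), the helper returns 24 * 24 = 576.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical in-memory config objects, filled in once at reload time.
struct VisionConfig {
  int image_size = 0;
  int patch_size = 0;
};

struct ModelConfig {
  std::string model_type;
  std::optional<VisionConfig> vision_config;  // present only for vision-language models
};

// Per-request code takes the config by const reference: no deep copy and
// no JSON lookups, just typed field access.
int NumImagePatches(const ModelConfig& config) {
  if (!config.vision_config.has_value()) return 0;
  const VisionConfig& vc = *config.vision_config;
  assert(vc.patch_size > 0);
  const int per_side = vc.image_size / vc.patch_size;
  return per_side * per_side;
}
```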

tqchen · 2024-05-03T12:37:17Z

We just landed a major refactor, and unfortunately there are some conflicts. Please rebase.

tqchen · 2024-05-06T14:56:38Z

Thank you @anibohara2000 !

anibohara2000 requested review from tqchen and MasterJH5574 April 23, 2024 23:35

anibohara2000 force-pushed the json-ffi-image branch from bfb5154 to 727c351 on April 27, 2024 15:03

tqchen requested changes Apr 27, 2024

anibohara2000 force-pushed the json-ffi-image branch from 48cc617 to 78f2811 on April 29, 2024 03:34

tqchen reviewed Apr 29, 2024

tqchen requested changes Apr 29, 2024

anibohara2000 force-pushed the json-ffi-image branch from 83a5243 to afe66e4 on May 4, 2024 22:31

Using new Result interface

bdffdf0

anibohara2000 force-pushed the json-ffi-image branch from afe66e4 to bdffdf0 on May 5, 2024 15:06

tqchen approved these changes May 6, 2024

tqchen merged commit 5ae393a into mlc-ai:main May 6, 2024
2 checks passed
