Pulse · mlc-ai/mlc-llm · GitHub

April 23, 2024 – April 30, 2024

Overview

29 Active pull requests

33 Active issues

24 Pull requests merged by 14 people

[Tokenizer] Support ByteLevel BPE in tokenizer token table
#2248 merged Apr 30, 2024
[Fix] fix a typo in event_trace_recorder
#2253 merged Apr 30, 2024
[Serving] Introduce DraftTokenWorkspaceManager
#2250 merged Apr 30, 2024
[Support] Simplify function names in encoding.h
#2251 merged Apr 30, 2024
Model Library Delivery
#2139 merged Apr 29, 2024
[Sampler] FlashInfer sampling func integration
#2224 merged Apr 29, 2024
[DOC] Improve Install via environment variable
#2245 merged Apr 29, 2024
[Serving] Share disco sessions among multiple model function tables
#2242 merged Apr 28, 2024
[REFACTOR] Migrate JSONFFIEngine to formal namespace
#2241 merged Apr 27, 2024
[Bugfix] layer_norm_eps in GPT2Config should be float
#2240 merged Apr 27, 2024
[Serving] Creating EngineConfig from JSON
#2237 merged Apr 27, 2024
[Op] Batch Verify: accept proposal when p and q are close enough
#2236 merged Apr 27, 2024
[Op] Top-p cutoff pivot
#2221 merged Apr 27, 2024
[Docs] Update deploy/ios#bring-your-own-model-library
#2235 merged Apr 26, 2024
[Pass] Support two-stage softmax
#2220 merged Apr 26, 2024
[Sampler] Fix GPU sampler behavior when batch size is 0
#2234 merged Apr 26, 2024
[JSONFFIEngine] Support generation config in JSONFFIEngine. Default config values to NOT_GIVEN
#2225 merged Apr 26, 2024
[Serving] Remove `cli.model_metadata` import from engine base
#2226 merged Apr 26, 2024
[Serving] Support RWKV for serving
#2111 merged Apr 25, 2024
[PYTHON][KVCACHE] Enhance the thread limit for opencl
#2216 merged Apr 25, 2024
[Android] Enable OpenCL host pointer usage
#2215 merged Apr 25, 2024
[Fix] CUDA architecture detection bug fix
#2211 merged Apr 24, 2024
[Python] Rename LLMEngine to MLCEngine
#2210 merged Apr 24, 2024
[Sampler] Prob renormalization with top p for spec decoding
#2201 merged Apr 23, 2024

5 Pull requests opened by 5 people

[Serving] Image support in JSONFFIEngine
#2208 opened Apr 23, 2024
[ANDROID] Revive mlc_chat_cli utility
#2214 opened Apr 25, 2024
[SLM] Introduce microsoft/Phi-3
#2222 opened Apr 25, 2024
[SLM] Support BERT architecture. Implement a text embedding module
#2249 opened Apr 29, 2024
[Eagle] Avoid worker - engine transfer for hidden states
#2256 opened Apr 30, 2024

12 Issues closed by 4 people

How can I deploy a single-card MLC-LLM model? I want the model inference to run only on one card, not distributed.
#2213 closed Apr 29, 2024
[Bug] mlc_llm command does not respect current conda environment
#2135 closed Apr 28, 2024
[Bug] input_ids expects Tensor with ndim 2 but get 1
#1923 closed Apr 27, 2024
[Question] InternalError: Check failed: (lib_handle_ != nullptr) is false: Failed to load dynamic shared library dist\prebuilt\lib\Llama-2-7b-chat-hf\Llama-2-7b-chat-hf-q4f16_1-metal.so Stack trace not available when DMLC_LOG_STACK_TRACE is disabled at compile time.
#1927 closed Apr 27, 2024
InternalError: Check failed: (config_istream) is false:
#2021 closed Apr 27, 2024
Error! Cannot open libOpenCL! My Android Phone says this:
#1562 closed Apr 27, 2024
[Bug] Cuda Error library, TVM Compilation
#1434 closed Apr 27, 2024
[Bug]
#1481 closed Apr 27, 2024
[Bug] Llama2-13b q4f16_1 crash on Snapdragon8 gen3
#1487 closed Apr 27, 2024
[Doc] Instructions on how to install on Intel Arc dGPU
#2181 closed Apr 25, 2024
[Bug] rocm57 flow nightly crashes
#2144 closed Apr 24, 2024
[Question] CMake Error at /mnt/f/mlc-llm/CMakeLists.txt:65 (add_subdirectory)
#2209 closed Apr 24, 2024

21 Issues opened by 18 people

[Bug] `system-lib-prefix` would be cleared if `device` is not strictly `android` while `mlc_llm compile`
#2255 opened Apr 30, 2024
[Bug] `mlc_llm chat` throws errors for model `mlc-ai/Qwen1.5-1.8B-Chat-q4f16_1-MLC`
#2254 opened Apr 30, 2024
[Bug] Error: could not compile `regex-syntax`
#2252 opened Apr 30, 2024
[Question] Omniquant. (AFAIK) scores best for Q. Methods, why no adoption? In any case, is per-tensor quant. best for Mixtral/MoE models?
#2247 opened Apr 29, 2024
Phi-3-3.8 billion model [Model Request]
#2246 opened Apr 29, 2024
AutoTVM optimization?
#2244 opened Apr 28, 2024
[Bug] TVMError: Check failed: (result) is false: Failed to allocate 99121664 bytes with alignment 16 bytes
#2243 opened Apr 28, 2024
[Bug] Unexpected Error: The model weight size may be larger than GPU memory size
#2239 opened Apr 27, 2024
[Model Request] Microsoft Phi-3 mini Instruct (Faster and better than Llama 3 8B)
#2238 opened Apr 27, 2024
[Bug] libc++abi: terminating due to uncaught exception of type tvm::runtime::InternalError: [14:02:26]
#2233 opened Apr 26, 2024
[Question] Support for Custom Attention Mask
#2232 opened Apr 26, 2024
[Model Request] OpenELM
#2231 opened Apr 26, 2024
[Question] Is Apple Silicon Neural Engine (ANE) and Core ML model package format supported?
#2230 opened Apr 26, 2024
[Question] Is there an embeddings model in MLC format?
#2229 opened Apr 26, 2024
[Question] Can I serve multiple models with the same instance?
#2228 opened Apr 26, 2024
[Question] Is GGUF model package format supported with quantized models?
#2227 opened Apr 26, 2024
[Bug] Token IDs not accepted by JSON grammar
#2223 opened Apr 25, 2024
[Bug] Failed to compile because the correct code page is not set
#2219 opened Apr 25, 2024
[Question] Rust SDK + WebAssembly + GPU?
#2218 opened Apr 25, 2024
[NOTICE] Transition from ChatModule to MLCEngine
#2217 opened Apr 25, 2024
[Bug] AttributeError: Module has no function 'vm_load_executable' encountered in Step 4 of the "Bring Your Own Model Library" tutorial docs/deploy/ios.html#bring-your-own-model-library
#2212 opened Apr 24, 2024

20 Unresolved conversations

Sometimes conversations happen on old items that aren’t yet closed. Here is a list of all the Issues and Pull Requests with unresolved conversations.

Phi 3 128
#2205 commented on Apr 26, 2024 • 5 new comments
[Question] Can PagedKVCache support different size of kvcache in different layers?
#2193 commented on Apr 24, 2024 • 3 new comments
[Question] Issues with model deployment after pruning
#1654 commented on Apr 25, 2024 • 2 new comments
[Feature Request] Change OpenAI protocol default value to NOT_GIVEN
#2114 commented on Apr 27, 2024 • 2 new comments
[Bug] gemma 2b start chatting error
#2203 commented on Apr 25, 2024 • 2 new comments
Support Qwen2-MoE Architecture
#2089 commented on Apr 29, 2024 • 1 new comment
[Bug] relax.vm.AttentionKVCache expects 19 arguments, but 18 were provided.
#2162 commented on Apr 30, 2024 • 1 new comment
[Feature Request] run the LLM model on the Qualcomm Hexagon NPU in Android OS
#1689 commented on Apr 28, 2024 • 1 new comment
[Feature Request] Nightly or Weekly Android apk build
#2194 commented on Apr 25, 2024 • 1 new comment
[Question] Support for Flutter
#766 commented on Apr 25, 2024 • 1 new comment
[Feature Request] Do you have any plan to support CPU backend on Android devices?
#1106 commented on Apr 25, 2024 • 1 new comment
[Question] Why read generation config in every decode step?
#2150 commented on Apr 25, 2024 • 1 new comment
[Question] decode Func What operations exist between two adjacent operator operations?
#2149 commented on Apr 24, 2024 • 1 new comment
PoC implementation of SmoothQuant
#855 commented on Apr 25, 2024 • 0 new comments
Implement Whisper in new concise nn.Module API
#868 commented on Apr 25, 2024 • 0 new comments
Llava module implementation and pre-build
#1235 commented on Apr 25, 2024 • 0 new comments
Add docker container support
#1271 commented on Apr 25, 2024 • 0 new comments
[WebUI] Add tutorial for WebUI
#1291 commented on Apr 25, 2024 • 0 new comments
[Doc] List of SLM Supported Models
#1516 commented on Apr 25, 2024 • 0 new comments
[Serving] Support Gemma for serving
#1806 commented on Apr 25, 2024 • 0 new comments