Insights: mlc-ai/mlc-llm
Overview
24 Pull requests merged by 14 people
-
[Tokenizer] Support ByteLevel BPE in tokenizer token table
#2248 merged
Apr 30, 2024 -
[Fix] fix a typo in event_trace_recorder
#2253 merged
Apr 30, 2024 -
[Serving] Introduce DraftTokenWorkspaceManager
#2250 merged
Apr 30, 2024 -
[Support] Simplify function names in encoding.h
#2251 merged
Apr 30, 2024 -
Model Library Delivery
#2139 merged
Apr 29, 2024 -
[Sampler] FlashInfer sampling func integration
#2224 merged
Apr 29, 2024 -
[DOC] Improve Install via environment variable
#2245 merged
Apr 29, 2024 -
[Serving] Share disco sessions among multiple model function tables
#2242 merged
Apr 28, 2024 -
[REFACTOR] Migrate JSONFFIEngine to formal namespace
#2241 merged
Apr 27, 2024 -
[Bugfix] layer_norm_eps in GPT2Config should be float
#2240 merged
Apr 27, 2024 -
[Serving] Creating EngineConfig from JSON
#2237 merged
Apr 27, 2024 -
[Op] Batch Verify: accept proposal when p and q are close enough
#2236 merged
Apr 27, 2024 -
[Op] Top-p cutoff pivot
#2221 merged
Apr 27, 2024 -
[Docs] Update deploy/ios#bring-your-own-model-library
#2235 merged
Apr 26, 2024 -
[Pass] Support two-stage softmax
#2220 merged
Apr 26, 2024 -
[Sampler] Fix GPU sampler behavior when batch size is 0
#2234 merged
Apr 26, 2024 -
[JSONFFIEngine] Support generation config in JSONFFIEngine. Default config values to NOT_GIVEN
#2225 merged
Apr 26, 2024 -
[Serving] Remove `cli.model_metadata` import from engine base
#2226 merged
Apr 26, 2024 -
[Serving] Support RWKV for serving
#2111 merged
Apr 25, 2024 -
[PYTHON][KVCACHE] Enhance the thread limit for opencl
#2216 merged
Apr 25, 2024 -
[Android ] Enable OpenCL host pointer usage
#2215 merged
Apr 25, 2024 -
[Fix] CUDA architecture detection bug fix
#2211 merged
Apr 24, 2024 -
[Python] Rename LLMEngine to MLCEngine
#2210 merged
Apr 24, 2024 -
[Sampler] Prob renormalization with top p for spec decoding
#2201 merged
Apr 23, 2024
5 Pull requests opened by 5 people
-
[Serving] Image support in JSONFFIEngine
#2208 opened
Apr 23, 2024 -
[ANDROID] Revive mlc_chat_cli utility
#2214 opened
Apr 25, 2024 -
[SLM] Introduce microsoft/Phi-3
#2222 opened
Apr 25, 2024 -
[SLM] Support BERT architecture. Implement a text embedding module
#2249 opened
Apr 29, 2024 -
[Eagle] Avoid worker - engine transfer for hidden states
#2256 opened
Apr 30, 2024
12 Issues closed by 4 people
-
[Bug] mlc_llm command does not respect current conda environment
#2135 closed
Apr 28, 2024 -
[Bug] input_ids expects Tensor with ndim 2 but get 1
#1923 closed
Apr 27, 2024 -
InternalError: Check failed: (config_istream) is false:
#2021 closed
Apr 27, 2024 -
Error! Cannot open libOpenCL! My Android Phone says this:
#1562 closed
Apr 27, 2024 -
[Bug] Cuda Error library ,TVM Compilation
#1434 closed
Apr 27, 2024 -
[Bug]
#1481 closed
Apr 27, 2024 -
[Bug] Llama2-13b q4f16_1 crash on Snapdragon8 gen3
#1487 closed
Apr 27, 2024 -
[Doc] Instructions on how to install on Intel Arc dGPU
#2181 closed
Apr 25, 2024 -
[Bug] rocm57 flow nightly crashes
#2144 closed
Apr 24, 2024 -
[Question] CMake Error at /mnt/f/mlc-llm/CMakeLists.txt:65 (add_subdirectory)
#2209 closed
Apr 24, 2024
21 Issues opened by 18 people
-
[Bug] `system-lib-prefix` would be cleared if `device` is not strictly `android` while `mlc_llm compile`
#2255 opened
Apr 30, 2024 -
[Bug] `mlc_llm chat` throws errors for model `mlc-ai/Qwen1.5-1.8B-Chat-q4f16_1-MLC`
#2254 opened
Apr 30, 2024 -
[Bug] Error: could not compile `regex-syntax`
#2252 opened
Apr 30, 2024 -
Phi-3-3.8 billion model [Model Request]
#2246 opened
Apr 29, 2024 -
AutoTVM optimization?
#2244 opened
Apr 28, 2024 -
[Bug] Unexpected Error: The model weight size may be larger than GPU memory size
#2239 opened
Apr 27, 2024 -
[Model Request] Microsoft Phi-3 mini Instruct (Faster and better then LLama 3 8B)
#2238 opened
Apr 27, 2024 -
[Bug] libc++abi: terminating due to uncaught exception of type tvm::runtime::InternalError: [14:02:26]
#2233 opened
Apr 26, 2024 -
[Question] Support for Custom Attention Mask
#2232 opened
Apr 26, 2024 -
[Model Request] OpenELM
#2231 opened
Apr 26, 2024 -
[Question] Is Apple Silicon Neural Engine (ANE) and Core ML model package format supported?
#2230 opened
Apr 26, 2024 -
[Question] Is there an embeddings model in MLC format?
#2229 opened
Apr 26, 2024 -
[Question] Can I serve multiple models with the same instance?
#2228 opened
Apr 26, 2024 -
[Question] Is GGUF model package format supported with quantized models?
#2227 opened
Apr 26, 2024 -
[Bug] Token IDs not accepted by JSON grammar
#2223 opened
Apr 25, 2024 -
[Bug] Failed to compile because the correct code page is not set
#2219 opened
Apr 25, 2024 -
[Question] Rust SDK + WebAssembly + GPU?
#2218 opened
Apr 25, 2024 -
[NOTICE] Transition from ChatModule to MLCEngine
#2217 opened
Apr 25, 2024
20 Unresolved conversations
Sometimes conversations happen on old items that aren’t yet closed. Here is a list of all the Issues and Pull Requests with unresolved conversations.
-
Phi 3 128
#2205 commented on
Apr 26, 2024 • 5 new comments -
[Question] Can PagedKVCache support different size of kvcache in different layers?
#2193 commented on
Apr 24, 2024 • 3 new comments -
[Question] Issues with model deployment after pruning
#1654 commented on
Apr 25, 2024 • 2 new comments -
[Feature Request] Change OpenAI protocol default value to NOT_GIVEN
#2114 commented on
Apr 27, 2024 • 2 new comments -
[Bug] gemma 2b start chatting error
#2203 commented on
Apr 25, 2024 • 2 new comments -
Support Qwen2-MoE Architecture
#2089 commented on
Apr 29, 2024 • 1 new comment -
[Bug] relax.vm.AttentionKVCache expects 19 arguments, but 18 were provided.
#2162 commented on
Apr 30, 2024 • 1 new comment -
[Feature Request] run the LLM model on the Qualcomm Hexagon NPU in Android OS
#1689 commented on
Apr 28, 2024 • 1 new comment -
[Feature Request] Nightly or Weekly Android apk build
#2194 commented on
Apr 25, 2024 • 1 new comment -
[Question] Support for Flutter
#766 commented on
Apr 25, 2024 • 1 new comment -
[Feature Request] Do you have any plan to support CPU backend on Android devices?
#1106 commented on
Apr 25, 2024 • 1 new comment -
[Question] Why read generation config in every decode step?
#2150 commented on
Apr 25, 2024 • 1 new comment -
[Question] decode Func What operations exist between two adjacent operator operations?
#2149 commented on
Apr 24, 2024 • 1 new comment -
PoC implementation of SmoothQuant
#855 commented on
Apr 25, 2024 • 0 new comments -
Implement Whisper in new concise nn.Module API
#868 commented on
Apr 25, 2024 • 0 new comments -
Llava module implementation and pre-build
#1235 commented on
Apr 25, 2024 • 0 new comments -
Add docker container support
#1271 commented on
Apr 25, 2024 • 0 new comments -
[WebUI] Add tutorial for WebUI
#1291 commented on
Apr 25, 2024 • 0 new comments -
[Doc] List of SLM Supported Models
#1516 commented on
Apr 25, 2024 • 0 new comments -
[Serving] Support Gemma for serving
#1806 commented on
Apr 25, 2024 • 0 new comments