Insights: ggerganov/llama.cpp
Overview
35 Releases published by 1 person
- b2717 (published Apr 24, 2024)
- b2724 (published Apr 24, 2024)
- b2727 (published Apr 25, 2024)
- b2728 (published Apr 25, 2024)
- b2729 (published Apr 25, 2024)
- b2730 (published Apr 25, 2024)
- b2731 (published Apr 25, 2024)
- b2734 (published Apr 25, 2024)
- b2735 (published Apr 25, 2024)
- b2736 (published Apr 25, 2024)
- b2737 (published Apr 25, 2024)
- b2740 (published Apr 26, 2024)
- b2746 (published Apr 26, 2024)
- b2747 (published Apr 26, 2024)
- b2748 (published Apr 26, 2024)
- b2749 (published Apr 26, 2024)
- b2750 (published Apr 27, 2024)
- b2751 (published Apr 27, 2024)
- b2753 (published Apr 28, 2024)
- b2754 (published Apr 28, 2024)
- b2755 (published Apr 29, 2024)
- b2756 (published Apr 29, 2024)
- b2757 (published Apr 29, 2024)
- b2760 (published Apr 29, 2024)
- b2761 (published Apr 29, 2024)
- b2763 (published Apr 29, 2024)
- b2764 (published Apr 29, 2024)
- b2766 (published Apr 30, 2024)
- b2767 (published Apr 30, 2024)
- b2768 (published Apr 30, 2024)
- b2769 (published Apr 30, 2024)
- b2771 (published Apr 30, 2024)
- b2772 (published Apr 30, 2024)
- b2773 (published Apr 30, 2024)
- b2774 (published Apr 30, 2024)
53 Pull requests merged by 26 people
- hardcode error codes on metal (#7010, merged Apr 30, 2024)
- metal : remove deprecated error code (#7008, merged Apr 30, 2024)
- log more info when metal fails (#6987, merged Apr 30, 2024)
- ggml : add Flash Attention (#5021, merged Apr 30, 2024)
- convert : use utf8 encoding (#7000, merged Apr 30, 2024)
- Improve usability of --model-url & related flags (#6930, merged Apr 29, 2024)
- Extending grammar integration tests (#6644, merged Apr 29, 2024)
- main : fix typo in comment in main.cpp (#6985, merged Apr 29, 2024)
- build(cmake): simplify instructions (`cmake -B build && cmake --build build ...`) (#6964, merged Apr 29, 2024)
- ci : tmp disable gguf-split (#6983, merged Apr 29, 2024)
- ggml : fix __MSC_VER -> _MSC_VER (#6977, merged Apr 29, 2024)
- llama : improve BPE pre-processing + LLaMA 3 and Deepseek support (#6920, merged Apr 29, 2024)
- use std::random_device{}() for default random seed (#6962, merged Apr 29, 2024)
- Fix conversion of some BERT embedding models (#6937, merged Apr 29, 2024)
- make : change GNU make default CXX from g++ to c++ (#6966, merged Apr 29, 2024)
- ci : add building in MSYS2 environments (Windows) (#6967, merged Apr 29, 2024)
- fix typo: LAMMAFILE -> LLAMAFILE (#6974, merged Apr 29, 2024)
- Fix more int overflow during quant (PPL/CUDA). (#6563, merged Apr 28, 2024)
- gguf : enforce that tensor names are unique (#6905, merged Apr 28, 2024)
- [SYCL] add device version in SYCL device list (#6959, merged Apr 28, 2024)
- nix: update flake.lock (#6952, merged Apr 28, 2024)
- Replace "alternative" boolean operator in conditional compilation directive (#6949, merged Apr 27, 2024)
- ci: server: tests python env on github container ubuntu latest / fix n_predict (#6935, merged Apr 27, 2024)
- Reset schedule earlier to allow overlap with ggml graph computation on device (#6933, merged Apr 26, 2024)
- `quantize`: add imatrix and dataset metadata in GGUF (#6658, merged Apr 26, 2024)
- add basic tensor data validation function (#6884, merged Apr 26, 2024)
- gguf : fix mismatch between alloc and free functions (#6929, merged Apr 26, 2024)
- llamafile : use 64-bit integers in sgemm (#6928, merged Apr 26, 2024)
- ci: server: fix python installation (#6925, merged Apr 26, 2024)
- server: stop generation at `n_ctx_train` if `n_predict` is not set (#6638, merged Apr 26, 2024)
- ci: server: fix python installation again (#6922, merged Apr 26, 2024)
- ci: server: fix python installation (#6918, merged Apr 26, 2024)
- ci: fix concurrency for pull_request_target (again) (#6917, merged Apr 26, 2024)
- bench: server add stop word for PHI-2 (#6916, merged Apr 26, 2024)
- add support for moondream vision language model (#6899, merged Apr 25, 2024)
- llama : synchronize before get/set session data (#6911, merged Apr 25, 2024)
- update model list (#6908, merged Apr 25, 2024)
- llama : check that all the tensor data is in the model file (#6885, merged Apr 25, 2024)
- ggml : fix redefinition of vaddvq_f32 for 32-bit ARM (#6906, merged Apr 25, 2024)
- clip : rename lerp function to avoid conflict (#6894, merged Apr 25, 2024)
- ggml : fix MIN / MAX macros (#6904, merged Apr 25, 2024)
- tests : minor bash stuff (#6902, merged Apr 25, 2024)
- Implement '--keep-split' to quantize model into several shards (#6688, merged Apr 25, 2024)
- README: add graphic for matrix multiplication (#6881, merged Apr 24, 2024)
- add llama_get_pooling_type function (#6862, merged Apr 24, 2024)
- Server front-end: do not apply Markdown formatting in code sections (#6850, merged Apr 24, 2024)
- Fix: Revert showing control tokens by default for server OpenAI Chat completions (#6860, merged Apr 24, 2024)
- Server: fix seed for multiple slots (#6835, merged Apr 24, 2024)
- ggml : move 32-bit arm compat in ggml-impl.h (#6865, merged Apr 24, 2024)
- Add phi 3 chat template (#6857, merged Apr 24, 2024)
- add support of codeqwen due to tokenizer (#6707, merged Apr 24, 2024)
- add phi3 support (#6852, merged Apr 24, 2024)
24 Pull requests opened by 22 people
- convert : fix set_vocab_sentencepiece (#6866, opened Apr 24, 2024)
- ggml-qnn: add Qualcomm QNN(Qualcomm Neural Network,aka Qualcomm AI Engine Direct) backend (#6869, opened Apr 24, 2024)
- Clamp out of range values in K quantizer (#6888, opened Apr 25, 2024)
- AVX Q4_0 and Q8_0 sgemm (#6891, opened Apr 25, 2024)
- Fix CORS for /health endpoint (#6892, opened Apr 25, 2024)
- Properly set `clamp_qkv` value in OLMo conversion (#6910, opened Apr 25, 2024)
- Draft Idea... CPU Inference... This seems to perform better? (#6915, opened Apr 26, 2024)
- support MiniCPM-V-2 (#6919, opened Apr 26, 2024)
- fixed off by one error when context shifting in main.cpp example (#6921, opened Apr 26, 2024)
- main : don't print special tokens with --grammar (#6923, opened Apr 26, 2024)
- Fix clip build on windows + clang (#6934, opened Apr 26, 2024)
- perplexity: more statistics, added documentation (#6936, opened Apr 26, 2024)
- Implemented basic interface for llamacheck and link to weights, adapt… (#6940, opened Apr 27, 2024)
- Updated server_queue to delete tasks from queue when server is shutdown. Feature Request #6421 (#6941, opened Apr 27, 2024)
- Option to split during conversion (#6942, opened Apr 27, 2024)
- Server: add test for num slots, fails on master (#6950, opened Apr 27, 2024)
- move ndk code to a new library (#6951, opened Apr 27, 2024)
- server: avoid breaking KV cache when prompt >= n_ctx (#6958, opened Apr 28, 2024)
- llama3 custom regex split (#6965, opened Apr 28, 2024)
- Attempt at OpenElm (#6986, opened Apr 29, 2024)
- new tokenizer-verifier tool to check gguf tokenizer parameters (#6988, opened Apr 29, 2024)
- add chatglm3-6b model support [help wanted] (#6999, opened Apr 30, 2024)
- Fix flash attention for ROCm (#7011, opened Apr 30, 2024)
- Update Server's README with undocumented options for RoPE, YaRN, and KV cache quantization (#7013, opened Apr 30, 2024)
70 Issues closed by 23 people
- nix build fails on apple silicon (#7009, closed Apr 30, 2024)
- Current state Llama3 & Mixtral 8x22b conversion (#7001, closed Apr 30, 2024)
- Can't offload layers to GPU (#6261, closed Apr 30, 2024)
- llama : revisit using flash attention for prompt processing (a.k.a. prefil) + GPU implementation (#3365, closed Apr 30, 2024)
- quantize.exe Bug(s) --token-embedding-type / --output-tensor-type and - Docu? Advanced Usage Context ? (#6776, closed Apr 30, 2024)
- Custom fine-tuned DeepSeek coder model unable to be quantized to Fp16 (#5234, closed Apr 30, 2024)
- Problem connecting VSCode (Continue) to the server LlamaCpp (#5406, closed Apr 30, 2024)
- Gemma models quantized using llamacpp not working in lm studio (#5706, closed Apr 30, 2024)
- Add support for Vary-toy (#6054, closed Apr 30, 2024)
- How to Modify Hugging Face's Language Models? (#6057, closed Apr 30, 2024)
- GGML_ASSERT: ggml-quants.c:11615: besti1 >= 0 && besti2 >= 0 && best_shift != 0 (#6067, closed Apr 30, 2024)
- KeyError: ('torch.nn.modules.sparse', 'Embedding') (#6071, closed Apr 30, 2024)
- Metal kernel mv_f16_f32_l4 performance issue for long contexts, too many threads (#6089, closed Apr 30, 2024)
- -mu without -m is... tricky (#6887, closed Apr 29, 2024)
- BPE Tokenizer: Multiple newlines doesn't merge into a single token (#6809, closed Apr 29, 2024)
- Garbled output on Windows 11 Arm due to typo in ggml-impl.h file (#6976, closed Apr 29, 2024)
- Help: Batching the same request? (#6978, closed Apr 29, 2024)
- Command R Plus crashed on large context (~40K) with CUDA (#6948, closed Apr 29, 2024)
- gguf : enforce that tensor names are unique (#6836, closed Apr 28, 2024)
- support for openelm apple (#6960, closed Apr 28, 2024)
- [SYCL] fail to load llama.dll compiled by icx with -DBUILD_SHARED_LIBS=on flag on Windows (#6309, closed Apr 28, 2024)
- SYCL Hangs after ggml_backend_sycl_host_buffer_type (#6943, closed Apr 28, 2024)
- Does `"add_bos_token": false` in `tokenizer_config.json` cause no BOS to get output? (#6947, closed Apr 28, 2024)
- GPU NOT used during "normal generation" when ONE LAYER offloaded (But GPU used in prompt evaluation) (#3860, closed Apr 28, 2024)
- Save Chat History into New Prompts (#3985, closed Apr 28, 2024)
- Constrained decoding with BNF grammar fails to work with some tokens (#5599, closed Apr 28, 2024)
- tips for GPU op profile (#5865, closed Apr 28, 2024)
- Quantazation Questions - Odd bits (#6011, closed Apr 28, 2024)
- AVX512 support (#6024, closed Apr 28, 2024)
- Add Ascend NPU as a new backend (#6034, closed Apr 28, 2024)
- [SYCL] Failed when running llama.cpp on ARC770 (#6036, closed Apr 28, 2024)
- -DCMAKE_BUILD_TYPE=Debug Does not work! (#6049, closed Apr 28, 2024)
- MSVC Main exits immediately on model load (#6932, closed Apr 27, 2024)
- Centos9 compilation reports unsupported instruction `vpdpbusd' (#5316, closed Apr 27, 2024)
- Vocab problems converting QWEN 110b with convert.py (#6938, closed Apr 27, 2024)
- GGUF endianness cannot be determined from GGUF itself (#3957, closed Apr 27, 2024)
- Using convert.py with a fine tuned phi-2 (#5009, closed Apr 27, 2024)
- Error when converting safe tensors to gguf (#5559, closed Apr 27, 2024)
- segmentation fault with on mac M3 Pro with llama-7b.Q4_0.gguf (#5983, closed Apr 27, 2024)
- convert.py incompatible with most new models, including salesforce/codegen models (#6030, closed Apr 27, 2024)
- GGUF writer reverses array (tensor) dimensions (#6040, closed Apr 27, 2024)
- `quantize`: add imatrix and dataset metadata in GGUF (#6656, closed Apr 26, 2024)
- Is it normal that ROCm+HIPBLAS produces different results than on CPU or breaks completely? (#6841, closed Apr 26, 2024)
- main: crashing upon loading model since commit 83b72cb0 - Windows MSVC + CUDA (#6931, closed Apr 26, 2024)
- `llama_apply_lora_from_file_internal: bad file magic` when trying to load lora from `finetune` (#6926, closed Apr 26, 2024)
- server: index.html issue (#5788, closed Apr 26, 2024)
- Hope Support Emebdding Model Architectures: JinaBertModel (#6005, closed Apr 26, 2024)
- running clblas (opencl) slow speed on rk3588 (#6008, closed Apr 26, 2024)
- Error while building for hipBLAS on Windows 11 (#6514, closed Apr 25, 2024)
- How to fine tune LLaMA 3 in Google Colab (Pro)? (#6800, closed Apr 25, 2024)
- Truncated model files can cause llama.cpp to crash when using mmap (#6774, closed Apr 25, 2024)
- Re-quantization of a split gguf file produces "invalid split file" (#6548, closed Apr 25, 2024)
- Vulkan generated targets and shader organization (#5356, closed Apr 25, 2024)
- Low performance with Sycl Backend (#5480, closed Apr 25, 2024)
- if use MoE + Ternary, what's happen? (#5870, closed Apr 25, 2024)
- Fill in the token usage information in the usage object, and output it at the 'v1/embeddings' endpoint. (#5987, closed Apr 25, 2024)
- Design2Code (#5989, closed Apr 25, 2024)
- CUDA 12.4 released incompletely. (#5998, closed Apr 25, 2024)
- [Old models] Gibberish text at the end of chat/completion - server (#6847, closed Apr 25, 2024)
- Api llama_tokenize function problem (#6854, closed Apr 24, 2024)
- The model began to add </s > to each main and server response (#6872, closed Apr 24, 2024)
- OpenAI-Compatible Chat Completions API Endpoint Responses include EOS / stop tokens (#6859, closed Apr 24, 2024)
- server: recieving <|im_end|> in all responses of llama 3 (#6873, closed Apr 24, 2024)
- key file (#5972, closed Apr 24, 2024)
- how to set this chat_template in server? (#5974, closed Apr 24, 2024)
45 Issues opened by 45 people
- Llama 3 - Regression with apostrophes (#7006, opened Apr 30, 2024)
- server: self context extent broken (#7005, opened Apr 30, 2024)
- Pythonic way for quantization (#7003, opened Apr 30, 2024)
- LLamaCpp embedding returns an empty array for long text(While HuggingFaceEmbeddings works fine) (#6996, opened Apr 30, 2024)
- Segmentation fault on finetune with -ngl > 0, Debian 12 stable (#6994, opened Apr 30, 2024)
- About dialogue training mode (#6993, opened Apr 30, 2024)
- Intel(R) Arc(TM) A770M Setting as default instead of Iris Xe Graphics (#6991, opened Apr 29, 2024)
- main Segfault using cmake & -march=armv8.4a flag (#6990, opened Apr 29, 2024)
- [feature] Support inference on raw text input in main and server. (#6982, opened Apr 29, 2024)
- Tokenizers questions and ... proposals? (#6980, opened Apr 29, 2024)
- Fast request make the server stuck (#6979, opened Apr 29, 2024)
- Metal doesn't work in x86 macos (#6975, opened Apr 29, 2024)
- cudaDeviceReset() not working? (#6973, opened Apr 29, 2024)
- Windows cmake failed compile for rocm (#6972, opened Apr 29, 2024)
- llava-cli fails to build on M2 due to symbol(s) not found for architecture arm64 (#6963, opened Apr 28, 2024)
- llama_decode return logbits whose value are all nan (#6957, opened Apr 28, 2024)
- Regression/bug in Windows on ARM64 build between #7593639c and #4dba7e81 (#6954, opened Apr 28, 2024)
- xcrun: error: unable to find utility "metal", not a developer tool or in PATH in B2479 (#6946, opened Apr 27, 2024)
- failed to quantize: ios_base::clear: unspecified iostream_category error (#6945, opened Apr 27, 2024)
- llava-cli outputs gibberish (#6944, opened Apr 27, 2024)
- Help test CPUSet patch for Windows and Linux (#6927, opened Apr 26, 2024)
- Why does every answer end with <|img end|>? (#6924, opened Apr 26, 2024)
- Something might be wrong with either llama.cpp or the Llama 3 GGUFs (#6914, opened Apr 25, 2024)
- ggml : unified CMake build (#6913, opened Apr 25, 2024)
- main exe with deepseek-coder-1.3b-instruct.Q8_0.gguf not stopping correctly (#6912, opened Apr 25, 2024)
- Experiencing 2-3 GB GPU memory use increase compared to llama.cpp version a few weeks ago (#6909, opened Apr 25, 2024)
- ggml.c:2284:43: error: use of undeclared identifier 'cpu_set_t' (#6907, opened Apr 25, 2024)
- server: phi-3 end token not handled? (#6903, opened Apr 25, 2024)
- offload_kqv ONLY supported by python version? (#6900, opened Apr 25, 2024)
- output from server service is not proper and there are many duplicate words (#6895, opened Apr 25, 2024)
- Fix CORS in `/health` endpoint (#6893, opened Apr 25, 2024)
- Add cmake option to build without CUDA VMM (#6889, opened Apr 25, 2024)
- Error Building llama.cpp on Intel MacBook Pro with Metal (#6886, opened Apr 24, 2024)
- [Performance] Llava-cli offloading image encoding to cuda (#6883, opened Apr 24, 2024)
- Generate control vector using llama.cpp (#6880, opened Apr 24, 2024)
- Add support to ArcticForCausalLM (#6877, opened Apr 24, 2024)
- Why Ollama is using VRAM Only insted of VRAM + RAM? (#6876, opened Apr 24, 2024)
- Getting "Bad CPU type in executable" on macos-x64 build (#6875, opened Apr 24, 2024)
- Vulkan: possible NaN propagation on llama-3 8B (more testing required) (#6874, opened Apr 24, 2024)
- Support for OpenELM of Apple (#6868, opened Apr 24, 2024)
- Support for Functionary-v2 chat template (#6867, opened Apr 24, 2024)
- Implement 4-bit quantized KV Cache for faster performance and to enable longer context (#6863, opened Apr 24, 2024)
- crash on llama_new_context_with_model: failed assertion `Buffer Validation (#6861, opened Apr 24, 2024)
102 Unresolved conversations
Sometimes conversations happen on old items that aren’t yet closed. Here is a list of all the Issues and Pull Requests with unresolved conversations.
- Introduction of CUDA Graphs to LLama.cpp (#6766, commented on Apr 30, 2024; 44 new comments)
- Support for Phi-3 models (#6849, commented on Apr 30, 2024; 27 new comments)
- added implementation of DRY sampler (#6839, commented on Apr 29, 2024; 25 new comments)
- CPUSet support for Windows and Linux (#6832, commented on Apr 29, 2024; 22 new comments)
- ggml : add RPC backend (#6829, commented on Apr 30, 2024; 21 new comments)
- Server: enable lookup decoding (#6828, commented on Apr 29, 2024; 16 new comments)
- grammars: x{min,max} repetition operator (#6640, commented on Apr 30, 2024; 14 new comments)
- `grammars`: cache decoded token codepoints & early exit in candidates rejection (faster sampling) (#6811, commented on Apr 30, 2024; 10 new comments)
- Custom quantization schemes (#6844, commented on Apr 26, 2024; 8 new comments)
- split: include the option in ./convert.py and quantize (#6260, commented on Apr 27, 2024; 7 new comments)
- Added server example themes support with two sample themes and a favicon. (#6848, commented on Apr 29, 2024; 7 new comments)
- llamafile : improve moe prompt eval speed on cpu (#6840, commented on Apr 26, 2024; 7 new comments)
- llama : add Deepseek support #5981 (#6252, commented on Apr 26, 2024; 6 new comments)
- convert.py: add python logging instead of print() (#6511, commented on Apr 30, 2024; 6 new comments)
- main chat using simple json based template which drives in-prefix, in-suffix and reverse-prompt and a generic chat-apply-template helper driven by flags from same json (#6834, commented on Apr 30, 2024; 6 new comments)
- Introduce bfloat16 support (#6412, commented on Apr 29, 2024; 5 new comments)
- feat: add potential to run Jina Embeddings architecture (#6826, commented on Apr 30, 2024; 5 new comments)
- kubernetes example (#6546, commented on Apr 27, 2024; 4 new comments)
- [Feature request] Any plans for AMD XDNA AI Engine support on Ryzen 7x40 processors? (#1499, commented on Apr 25, 2024; 4 new comments)
- Refactor convert.py and add support for Metas official Llama 3 model (#6819, commented on Apr 25, 2024; 4 new comments)
- [Feature Request] Dynamic temperature sampling for better coherence / creativity (#3483, commented on Apr 27, 2024; 3 new comments)
- off topic: linking two Mac Studio together to fit larger models (#6390, commented on Apr 29, 2024; 3 new comments)
- Performance decreated between tag b1500 and b2581 on Windows ARM64 PC (#6417, commented on Apr 29, 2024; 3 new comments)
- Support CoreML like whisper.cpp? (#1714, commented on Apr 25, 2024; 3 new comments)
- When I tried to convert the Qwen-VL-chat model to gguf, an error occurred: `Can not map tensor ‘transformer.visual.positional_embedding’. What is the reason? (#5331, commented on Apr 28, 2024; 3 new comments)
- Refactor chat template API (#6822, commented on Apr 24, 2024; 3 new comments)
- Support speculative decoding in `server` example (#5877, commented on Apr 30, 2024; 3 new comments)
- Implement (properly) different chat templates in main.cpp (#6391, commented on Apr 24, 2024; 3 new comments)
- Subtle Vulkan shader compilation bug when running on Adreno GPUs (Samsung Galaxy S23 Ultra) (#5186, commented on Apr 29, 2024; 2 new comments)
- [CANN] Add Ascend NPU backend (Part 1) (#6035, commented on Apr 29, 2024; 2 new comments)
- Server CUDA Infill Segmentation Fault (#6672, commented on Apr 30, 2024; 2 new comments)
- Python 3.12 support (#6422, commented on Apr 27, 2024; 2 new comments)
- Windows ROCm Build. (#2843, commented on Apr 30, 2024; 2 new comments)
- Server: add function calling API (#5588, commented on Apr 30, 2024; 2 new comments)
- Support for InternVL (#6803, commented on Apr 27, 2024; 2 new comments)
- server: avoid full prompt eval when 'prompt >= ctx' (#6855, commented on Apr 28, 2024; 2 new comments)
- common : fix parallel shard download interleaving output (#6831, commented on Apr 29, 2024; 2 new comments)
- vulkan backend failed to load models vk::Device::createComputePipeline: ErrorUnknown (#6843, commented on Apr 26, 2024; 2 new comments)
- can llama.cpp/convert.py support tokenizer rather than 'spm', 'bpe', 'hfft' (#6690, commented on Apr 25, 2024; 2 new comments)
- MobileVLM convert.py error (#6087, commented on Apr 29, 2024; 1 new comment)
- Running convert fails with BadZipFile (Bad CRC-32) (#4365, commented on Apr 29, 2024; 1 new comment)
- llama : add T5 (encoder-decoder) support (#5763, commented on Apr 30, 2024; 1 new comment)
- llava-cli process_prompt bug (#6823, commented on Apr 24, 2024; 1 new comment)
- Server: possibility of customizable chat template? (#5922, commented on Apr 28, 2024; 1 new comment)
- The tensor shape is different during gemma-2b model conversion, resulting in loading errors during inference. Repeat python convert.py ./models/gemma-2b After multiple conversions, the tensor shape is still different, resulting in loading errors during inference. (#6437, commented on Apr 28, 2024; 1 new comment)
- Add support for OPTForCausalLM (#6473, commented on Apr 24, 2024; 1 new comment)
- error (#6601, commented on Apr 24, 2024; 1 new comment)
- Implement ANPD (3x speedup, lossless) (#6813, commented on Apr 25, 2024; 1 new comment)
- Added dependency needed for numa in numactl mode (#6784, commented on Apr 24, 2024; 1 new comment)
- qwen 1.5 Beta 1.8B output incoherently (#5459, commented on Apr 25, 2024; 1 new comment)
- wrong number of tensors for AdaptLLM/medicine-chat (#6490, commented on Apr 25, 2024; 1 new comment)
- truly opensource model called olmo (#6712, commented on Apr 25, 2024; 1 new comment)
- For CUDA versions < 11.7 a target CUDA architecture must be explicitly provided via CUDA_DOCKER_ARCH (#5976, commented on Apr 25, 2024; 1 new comment)
- Support for Alibaba-NLP/gte-large-en-v1.5 Embedding Model (#6821, commented on Apr 27, 2024; 1 new comment)
- How can i get log probs in create_chat_completions in llama-cpp , I'm using logprobs=True as an attribute but still not getting Log Probabilities. (#6423, commented on Apr 26, 2024; 1 new comment)
- Multi-GPU support for AMD? (#3051, commented on Apr 27, 2024; 1 new comment)
- Temperature slider not working (#6676, commented on Apr 26, 2024; 1 new comment)
- New optimization from NVIDIA to use CUDA Graphs in llama.cpp (#6763, commented on Apr 26, 2024; 1 new comment)
- adding support for linux binaries (#5106, commented on Apr 30, 2024; 1 new comment)
- [WIP] agent example (w/ sandboxable Tools!) & improved OAI compatibility layer (in Python) (#6389, commented on Apr 30, 2024; 1 new comment)
- Support for 2-bit Quantized Llama-2-7b-chat-hf_2bitgs8_hqq Model (#6368, commented on Apr 30, 2024; 0 new comments)
- [User] Insert summary of your issue or enhancement.. (#1471, commented on Apr 30, 2024; 0 new comments)
- llama : add phixtral support (#4912, commented on Apr 29, 2024; 0 new comments)
- llama : compute BERT graph with F16 K, V (#5891, commented on Apr 29, 2024; 0 new comments)
- Make tokenize CLI tool have nicer command line arguments. (#6188, commented on Apr 25, 2024; 0 new comments)
- [SYCL] refactor (#6408, commented on Apr 30, 2024; 0 new comments)
- Server: Unix Socket Support (#6413, commented on Apr 23, 2024; 0 new comments)
- The MLX Challenge (#6539, commented on Apr 24, 2024; 0 new comments)
- How to activate BLAS? (#627, commented on Apr 26, 2024; 0 new comments)
- Bring back multimodal support for server (#6168, commented on Apr 26, 2024; 0 new comments)
- Add a new `llama_load_model_from_buffer()` method to compliment `llama_load_model_from_file()` (#6311, commented on Apr 26, 2024; 0 new comments)
- Adding MistralForCausalLM architecture to convert-hf-to-gguf.py (#4463, commented on Apr 25, 2024; 0 new comments)
- Add support for Jais architecture, both Jais-13B and Jais-30B shares the same architecture. (#6227, commented on Apr 25, 2024; 0 new comments)
- I have a specific question regarding qwen1.8b and qwen1.8b-chat, for which I am eagerly seeking your assistance. (#6228, commented on Apr 25, 2024; 0 new comments)
- Phind-CodeLlama-34b-v2 (#6306, commented on Apr 25, 2024; 0 new comments)
- CUDA error: invalid device function when compiling and running for amd gfx 1032 (#4762, commented on Apr 24, 2024; 0 new comments)
- Finetune from text (#5170, commented on Apr 24, 2024; 0 new comments)
- Excessively slow prompt processing time with 70B partially offloaded in SYCL (#5272, commented on Apr 24, 2024; 0 new comments)
- New IQ1_S somehow much worse than previous version (#5996, commented on Apr 24, 2024; 0 new comments)
- Model Request for BAAI/bge-m3 (XLMRoberta-based Multilingual Embedding Model) (#6007, commented on Apr 24, 2024; 0 new comments)
- GGML_ASSERT: ../llama.cpp/ggml-quants.c:10340: grid_index >= 0 (#6018, commented on Apr 24, 2024; 0 new comments)
- GGML_ASSERT: llama.cpp:3817: unicode_cpts_from_utf8(word).size() > 0 (#6132, commented on Apr 24, 2024; 0 new comments)
- MiniCPM Chat Template (#6236, commented on Apr 24, 2024; 0 new comments)
- Need help in extracting logits (token + probabilities)! (#6285, commented on Apr 24, 2024; 0 new comments)
- 1-2 Tesla P40 plus a powerful graphics card, does it make sense? (#6386, commented on Apr 30, 2024; 0 new comments)
- Kompute backend: add support for Vulkan devices that do not have storageBuffer8BitAccess (#6401, commented on Apr 30, 2024; 0 new comments)
- Lack of documentation regarding RoPE scaling (#2402, commented on Apr 29, 2024; 0 new comments)
- Incomplete instruction for https://github.com/ggerganov/llama.cpp/blob/master/README-sycl.md#intel-gpu (#6318, commented on Apr 29, 2024; 0 new comments)
- Add full support for OpenCL (#6362, commented on Apr 29, 2024; 0 new comments)
- POST to server takes forever (#2572, commented on Apr 28, 2024; 0 new comments)
- [User] AMD GPU slower than CPU (#3422, commented on Apr 28, 2024; 0 new comments)
- When I used the tool to quantify the chatglm model, the following error was reported (#3808, commented on Apr 28, 2024; 0 new comments)
- corruption on slot context shift (#6002, commented on Apr 28, 2024; 0 new comments)
- Metal failure after early March versions of server startup loading the model (#6020, commented on Apr 28, 2024; 0 new comments)
- Working Fine-Tune Example? (#6361, commented on Apr 28, 2024; 0 new comments)
- May we remove the big loop which runs > 10000 times everytime. (#6375, commented on Apr 28, 2024; 0 new comments)
- error: implicit declaration of function ‘getcpu’ (#5537, commented on Apr 27, 2024; 0 new comments)
- Mixtral 8x7b QLora not able to convert to gguf after training (#5905, commented on Apr 27, 2024; 0 new comments)
- “'token_embd.weight' has wrong shape” when loading deepseek-coder-1.3b-base.Q8_0.gguf (#5910, commented on Apr 27, 2024; 0 new comments)
- Using OpenCL on Adreno & Mali GPUs is slower than CPU (#5965, commented on Apr 27, 2024; 0 new comments)