llama_model_loader: support multiple split/shard GGUFs #6187

Merged: 28 commits, Mar 22, 2024
Commits
7c64fef
split: support in llama_model_loader
phymbert Mar 19, 2024
b8feff4
Avoid copying the entire vector
phymbert Mar 21, 2024
18ff6ca
split: move llama_tensor_offset to llama_model_loader
phymbert Mar 21, 2024
60a87ae
Merge branch 'master' into hp/split/load-model
phymbert Mar 21, 2024
1892ae7
llama_model_loader: PR feedbacks:
phymbert Mar 21, 2024
00381b0
avoid copying the entire vector
phymbert Mar 21, 2024
c34a5de
Simplify this by making these optional, switch some layer creation te…
phymbert Mar 21, 2024
1c931f3
Handle optional tensors
phymbert Mar 21, 2024
d8b567d
llama_model_loader: fail if backend cannot allocate buffer
phymbert Mar 21, 2024
02020b0
fix mmap buffer management
slaren Mar 21, 2024
078a1ac
llama_model_loader: map file to backend buffer if the allocation succ…
phymbert Mar 21, 2024
69bdee9
llama_model_loader: only map tensors included in the context
phymbert Mar 21, 2024
6df9757
llama_model_loader: minor, use same variable name for consistency, fi…
phymbert Mar 21, 2024
f9a2973
llama_model_loader: fail if any of backend buffer cannot be allocated
phymbert Mar 21, 2024
0fd652e
spacing
phymbert Mar 21, 2024
1a179bf
fix loop over pointer
phymbert Mar 21, 2024
7cbe1ea
llama_model_loader: if n_tensors declared not equals to loaded tensor…
phymbert Mar 22, 2024
9940df4
llama_model_loader: ensure mappings vector has the expected size
phymbert Mar 22, 2024
ec372c6
llama_model_loader: use at instead of operator[] if this should neve…
phymbert Mar 22, 2024
a9e88c6
llama_model_loader: immediately add the backend buffer to the model b…
phymbert Mar 22, 2024
b19af36
llama_model_loader: be sure the model mappings has enough capacity be…
phymbert Mar 22, 2024
4c04400
llama_model_loader: fix map -> unordered map
phymbert Mar 22, 2024
e474e45
llama_split_prefix: use a clearer version, not pass split path len bu…
phymbert Mar 22, 2024
8326607
llama : minor
ggerganov Mar 22, 2024
dbc35ac
llama : introduce some typedef helpers
ggerganov Mar 22, 2024
f616b38
docs: add model shard in hot topic
phymbert Mar 22, 2024
1f38759
llama_model_loader: put mapping in a unique_ptr from the moment it is…
phymbert Mar 22, 2024
764c7af
fix llama_split_prefix
ngxson Mar 22, 2024
fix loop over pointer
Co-authored-by: slaren <slarengh@gmail.com>
phymbert and slaren committed Mar 21, 2024
commit 1a179bfc4e6079079e6ab7dbc7d1ddb8c5d74d5b
2 changes: 1 addition & 1 deletion llama.cpp
@@ -3040,7 +3040,7 @@ struct llama_model_loader {
         if (meta) {
             gguf_free(meta);
         }
-        for (auto & ctx : contexts) {
+        for (auto * ctx : contexts) {
             ggml_free(ctx);
         }
     }