quantize: be able to explicitly specify quantization type of output and token embedding tensors #6239

ikawrakow · 2024-03-22T14:38:24Z

Two new command line options for the quantize tool:

--output-tensor-type ggml_type specifies the type of the output tensor
--token-embedding-type ggml_type specifies the type of the token embedding tensor

The ggml_type argument is the string that ggml uses to identify the various possible types (q4_0, q4_1, ..., f16, etc.; see type_traits in ggml.c).

This can be useful, e.g., when comparing with quantization papers (where researchers tend not to worry about these tensors and just leave them at f16), or when one wants to fine-tune the size vs. quality tradeoff (particularly relevant for very low-bit quantization and/or small models). I guess it could be useful for Gemma as well.
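
To illustrate how the two options might be combined in practice, here is a minimal sketch that assembles a quantize command line from Python. Only the two option names come from this PR. The binary path `./quantize`, the file names, the `Q4_K_M` target, and the positional order (options first, then input file, output file, quantization type) are assumptions for illustration, and `build_quantize_command` is a hypothetical helper, not part of the tool.

```python
import shlex


def build_quantize_command(model_in, model_out, quant_type,
                           output_tensor_type=None,
                           token_embedding_type=None,
                           binary="./quantize"):
    """Assemble an argument list for the quantize tool (hypothetical helper).

    The binary path and the positional argument order are assumptions;
    only the two option names are taken from the PR description.
    """
    cmd = [binary]
    if output_tensor_type is not None:
        # Override the type used for the output tensor.
        cmd += ["--output-tensor-type", output_tensor_type]
    if token_embedding_type is not None:
        # Override the type used for the token embedding tensor.
        cmd += ["--token-embedding-type", token_embedding_type]
    cmd += [model_in, model_out, quant_type]
    return cmd


# Keep the output tensor at f16 and store the token embeddings as q8_0.
# File names and the Q4_K_M target are placeholders.
cmd = build_quantize_command(
    "model-f16.gguf", "model-q4_k_m.gguf", "Q4_K_M",
    output_tensor_type="f16",
    token_embedding_type="q8_0",
)
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`. Leaving both overrides unset yields a plain invocation, in which case the tool picks the types for these two tensors itself.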

…anov#6239) * quantize: be able to specify the output tensor type * quantize: be able to specify the token embedding tensor type --------- Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>

Kawrakow added 2 commits March 22, 2024 16:11

quantize: be able to specify the output tensor type

7883796

quantize: be able to specify the token embedding tensor type

0e826d1

slaren approved these changes Mar 22, 2024

ggerganov approved these changes Mar 22, 2024

ggerganov merged commit 1d0331c into master Mar 22, 2024
58 checks passed

david565656 mentioned this pull request Apr 20, 2024

quantize.exe Bug(s) --token-embedding-type / --output-tensor-type and - Docu? Advanced Usage Context ? #6776

Open
