Force 4-bit weight quantization only? #6150
Asked by wilderfield in Q&A · Answered by slaren
I notice the Q4_K_M quantized llama2 models have some weight tensors that are 4-bit and some that are 6-bit. Is there a way to force 4-bit only?
Answered by slaren (Mar 19, 2024); answer selected by wilderfield:
`./quantize --pure` should do it.
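For context, a full invocation might look like the sketch below. The positional arguments (input GGUF, output GGUF, quantization type) follow the `quantize` tool's usual order; the file paths and model name are placeholders, not from this discussion. The `--pure` flag disables the per-tensor type overrides that k-quant mixes like Q4_K_M normally apply, so every quantizable tensor gets the same base type:

```shell
# Hypothetical paths: quantize an f16 GGUF so that all quantizable
# tensors use Q4_K, with no 6-bit (Q6_K) tensors mixed in.
./quantize --pure \
    ./models/llama-2-7b/ggml-model-f16.gguf \
    ./models/llama-2-7b/ggml-model-Q4_K.gguf \
    Q4_K
```

Note that without `--pure`, Q4_K_M intentionally keeps a few sensitive tensors at higher precision to limit quality loss, so a pure 4-bit model may show somewhat worse perplexity.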