Updated "kv_cache_dtype" so that it conforms to Q4/Q4f16. The original model's dtype is now "float16".

#4

Fixed a bug in Q4 that broke dialogue. Inference failed with the following error:
OrtRun(). ERROR_CODE: 2, ERROR_MESSAGE: Unexpected input data type. Actual: (tensor(float)) , expected: (tensor(float16))
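
For readers unfamiliar with this class of error, here is a minimal sketch of how a per-variant "kv_cache_dtype" setting can decide the dtype of the cache tensors fed to the model. It is not the loader's actual code: the function names, the "float32" default, and the example mapping values are all assumptions made for illustration.

```python
# Illustrative sketch only: names, default, and mapping values are assumptions,
# not the real loader logic or the values set by this pull request.

def resolve_kv_cache_dtype(config, variant, default="float32"):
    """Return the dtype to use for the key/value cache of a model variant."""
    setting = config.get("kv_cache_dtype", default)
    if isinstance(setting, str):
        # A single string applies to every variant.
        return setting
    # Otherwise treat it as a per-variant mapping such as {"q4f16": "float16"}.
    return setting.get(variant, default)


def check_input_dtype(actual, expected):
    """Mimic the runtime check that rejects a mismatched input dtype."""
    if actual != expected:
        raise TypeError(
            f"Unexpected input data type. Actual: ({actual}), expected: ({expected})"
        )


# Hypothetical config: both quantized variants expect a float16 cache.
example_config = {"kv_cache_dtype": {"q4": "float16", "q4f16": "float16"}}

# With an entry for "q4", the cache dtype matches what the model expects.
check_input_dtype(resolve_kv_cache_dtype(example_config, "q4"), "float16")
```

Without an entry for the variant, the sketch falls back to "float32", and the check raises an error of the same shape as the one quoted above.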

willopcbeta changed pull request status to merged
