Update "kv_cache_dtype" to conform to Q4/Q4f16; the original model's dtype is now "float16"
#4 by willopcbeta - opened
Fixed a bug in the Q4 variant that broke dialogue. The failing run reported this ONNX Runtime error:
OrtRun(). ERROR_CODE: 2, ERROR_MESSAGE: Unexpected input data type. Actual: (tensor(float)) , expected: (tensor(float16))
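The error above means a model input was supplied as float32 while the graph declared it as float16, which is the kind of mismatch a per-variant "kv_cache_dtype" setting is meant to prevent. As a rough sketch only: the helper below shows how such a setting could be resolved for a selected quantization variant, assuming the setting is either a single string or a mapping from variant name to cache dtype, with float32 as the fallback. The function name `resolve_kv_cache_dtype`, the fallback value, and the example mapping are hypothetical illustrations, not the contents of this pull request or any library's API.

```python
from typing import Mapping, Optional, Union

# Assumed fallback when no cache dtype is configured for a variant.
DEFAULT_KV_DTYPE = "float32"

KvCacheSetting = Optional[Union[str, Mapping[str, str]]]


def resolve_kv_cache_dtype(setting: KvCacheSetting, selected_dtype: str) -> str:
    """Return the dtype to use for KV-cache tensors (hypothetical helper).

    `setting` may be None, a single dtype string applied to every variant,
    or a mapping from variant name (e.g. "q4f16") to cache dtype.
    """
    if setting is None:
        return DEFAULT_KV_DTYPE
    if isinstance(setting, str):
        return setting
    # Variants missing from the mapping fall back to the default.
    return setting.get(selected_dtype, DEFAULT_KV_DTYPE)


# Illustrative mapping only; the values are not taken from this pull request.
example_setting = {"q4f16": "float16", "fp16": "float16"}

for variant in ("q4", "q4f16", "fp16"):
    print(variant, "->", resolve_kv_cache_dtype(example_setting, variant))
```

If the resolved dtype disagrees with what the exported graph expects for its cache inputs, the runtime rejects the run with an "Unexpected input data type" error like the one quoted above.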
willopcbeta changed pull request status to merged