Describe the issue

attn_type="hf": (generation output shown in the original issue, omitted here)
attn_type="minference": (generation output omitted here) — the result is unreadable.

Environment:
MInference==0.1.4.post4
model_name="Qwen/Qwen2-7B-Instruct"

How can I adjust the parameters so that attn_type="minference" gives the same result as attn_type="hf"?

Reply:

The issue is due to numerical precision differences between Flash Attention (or Torch attention) and the Triton implementation of Flash Attention used by MInference. You can test with attn_type="minference_with_dense"; if the generation results are similar, this is likely the cause.
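For reference, a comparison along these lines can confirm whether the Triton kernel's numerics are the cause. This is a minimal sketch assuming the standard MInference patching API (MInference(attn_type, model_name) applied to a Hugging Face model, as in the project README); the prompt and generation settings are placeholders, not those from the original report.

```python
# Sketch: compare generations across attn_type settings to isolate the
# Triton Flash Attention numerics issue. Prompt/generation parameters are
# placeholders; MInference only changes behavior on long-context inputs.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from minference import MInference

model_name = "Qwen/Qwen2-7B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_name)
prompt = "..."  # long-context prompt from the original report (not included)

def generate_with(attn_type: str) -> str:
    # Reload the model for each backend so patches do not stack.
    model = AutoModelForCausalLM.from_pretrained(
        model_name, torch_dtype=torch.bfloat16, device_map="auto"
    )
    model = MInference(attn_type, model_name)(model)
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=128, do_sample=False)
    return tokenizer.decode(
        out[0, inputs.input_ids.shape[1]:], skip_special_tokens=True
    )

# If "minference_with_dense" matches "hf" while "minference" is garbled,
# numerical differences in the Triton Flash Attention kernel are the likely cause.
for attn_type in ("hf", "minference_with_dense", "minference"):
    print(attn_type, "->", generate_with(attn_type)[:200])
```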