[Question]: attn_type="minference" and attn_type="hf" produce different results #52

Open
qiling1345 opened this issue Jul 21, 2024 · 1 comment
Labels
question Further information is requested

qiling1345 commented Jul 21, 2024

Describe the issue

attn_type="hf":

Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████████████████████████████████| 4/4 [00:01<00:00,  2.33it/s]
Prompt: '请介绍下你自己吧。', Generated text: ' 我是一个基于大模型的AI助手,能够回答各种问题、提供信息和帮助解决问题。我被设计成能够理解和生成自然语言,以便与人类进行有效的沟通。我可以帮助你查找信息、提供建议、解答疑问、进行翻译、编写文本、总结文本、分析情绪、提供建议、开发算法、编写代码等等。无论你需要什么帮助,只要在合理的范围内,我都会尽力为你提供支持。请告诉我你需要什么样的帮助,我会尽我所能来协助你。'

(English translation. Prompt: 'Please introduce yourself.' Generated text: ' I am an AI assistant built on a large model, able to answer all kinds of questions, provide information, and help solve problems. I am designed to understand and generate natural language so that I can communicate effectively with people. I can help you look up information, give suggestions, answer questions, translate, write text, summarize text, analyze sentiment, give suggestions, develop algorithms, write code, and more. Whatever help you need, as long as it is within reasonable limits, I will do my best to support you. Please tell me what kind of help you need, and I will do everything I can to assist you.')

attn_type="minference":

Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████████████████████████████████| 4/4 [00:02<00:00,  1.97it/s]
Patched model for minference..
The `seen_tokens` attribute is deprecated and will be removed in v4.41. Use the `cache_position` model input instead.
Prompt: '请介绍下你自己吧。', Generated text: '这个题目的, 1.年则表示为了解,,将实现提供一个方法表示,提供实现 \\确保数据存储功能设计的20突如.com.cn/result.html/fileadmin/include/def:http::}在数学上,做人59系统,,在实际应用中, 0,将创建提供 \\  20, \\ \\    \\ = \\ \\ \\ \\  1 \\ \\ \\ \\ Initialise\\; \\ \\ \\ \\ \\ \\ 1 ( \\ directly  könnt = be \\ 4 \\ \\ \\ \\ \\ \\  \\ \\ 2 ( 2 \\ = \\   0为 könnt \\  \\  \\ \\ ( \\  1:0 \\ \\ \\ \\ ( \\  \\ ( \\ \\  \\ \\ 200 \\ 0 \\ \\ \\ \\ \\  \\ \\ \\ \\ \\ \\ \\ \\ \\ \\ \\ \\ \\ \\ \\ \\  -scalable \\ \\ \\ \\ \\ \\ \\0 könnt \\ \\ \\ \\ \\ \\  \\ \\ \\ \\ \\ \\ \\ ( könnt( \\ \\ \\ \\ (  \\ \\ \\ \\    \\ \\ /ok \\2 \\ \\ \\ \\ \\ \\ \\ \\ Initialise \\ \\ \\ \\ \\ \\ \\  Initialise \\ \\ \\ \\ \\ \\ 0 \\ \\l \\ \\ \\ \\ \\ 0 浇 being \\ \\   \\ \\ \\ \\ 11  \\ \\ \\ & ( könnt,  \\ \\ n  \\纳斯\n\n2 Is自发\n0 the自发\n\n- fool\n\n  \\实现 \\实现化的 \\  \\ \\ \\ { \\ @"\n = -scalable\\\\纳斯的 \\ elementary \\ \\ \\ \\ \\ \\ Initialise könnt \\ \\ \\ \\ könnt \\ \\自发 \\ \\\n\n0说 \\ \\ (1 Initialise0n自发n0飏为\n8 \\ \\ \\ \\ \\ \\ \\ \\ \\ \\ \\20 \\ \\旋 \\ \\ \\\\ könnt \\ \\ \\ 0 \\ \\0 \\  Initialise 0 0 \\ garn  \\ \\ \\2 könnt = könnt为 \\ \\ \\ one   \\ \\ \\ \\ \\ \\ 蒴 könnt  könnted \\ \\    (() \\\n\n \\ \\ \\ \\ \\ \\ 0 könnt = könnt =0 \\ \\ \\ \\  \\ \\ \\ \\ \\ \\ \\ \\ \\ \\  \\  \\0 könnt \\ not könnt() \\ and 0 \\ the könnt \\以下0 \\ \\  \\实现 \\ \\ \\ \\ könnt \\ be be be \\ \\ \\  \\ \\ \\ 2-scalable0 \\umber \\ \\ 00 \\  \\ \\ \\0 könnt \\ \\\\  \\ \\ \\  \\ \\ \\ \\ \\ \\0 könnt \\ \\ \\ \\  \\ \\ \\ \\ \\  \\  \\ \\ \\ ensuring \\ \\ \\ \\ \\ \\ -scalable  \\ \\ \\ \\ \\ \\ \\ \\ \\ \\000 \\  \\ Ens \\ \\ \\ \\ \\ \\ \\  \\ \\ 00  \\ \\7 \\ \\ \\ 0 \\ \\ \\ 00 \\  \\ \\ \\   Initialise Initialise ( \\ \\ \\ \\ \\ \\  \\ \\ \\ \\ \\00  \\实现 \\芯 \\  \\ \\ \\ /\\0 \\ \\ \\实现 \\ \\ \\ \\ \\0-scalable \\ \\ \\ \\ \\ \\0 \\ \\ \\ \\ \\ \\ \\ Initialise \\ \\[sizeof蒴0为  \\ one2为\n \\ \\为 being being "id being be时实现为确保 being being) check to \\ \\能 \\ being \\ \\ \\ \\ \\ \\ \\ \\ \\ \\ \\ \\ \\ \\ \\ 
\\0 könnt to könnt \\ \\ \\ \\ \\ \\ \\ \\ \\ \\ ( 00 \\ \\  \\ \\ Initialise Initialise-scalable-scalable \\[sizeof: \\riger = \\ \\ \\ \n\n \\ \\ \\  \\ \\ \\ \\ \\ \\ \\ \\ \\ \\ \\ \\ \\ 00 \\ote \\ garn \\ \\ \\ \\ \\ \\0 \\ \\ \\-scalable Initialise  \\ \\ \\ könnt \\ Initialise-scalable \\ \\ \\ \\ \\ \\ \\ \\ 0  \\  \\ \\ \\ \\  \\ \\ \\ \\ \\0-scalable \\ \\2 (00 \\2 \\ \\ \\ \\ \\ \\0 Initialise Initialise-scalable \\ \\ \\ \\ \\ \\ \\ \\ \\ \\ \\ \\ \\ \\ \\ Initialise-scalable00 \\ \\ \\, könnt \\0 \\ \\ \\  Initialise ( \\ \\   为的人 \\将 \\ \\0 \\ \\ \\ ensuring \\ \\  Initialise ( \\ \\和 Initialise0 \\ \\ \\ \\ \\ \\ \\ \\ \\ \\实现 könnt 0 \\ \\ \\ \\ \\ \\ \\ \\ 200 könnt \\ \\ \\ \\ \\ \\0-scalable \\ \\ \\ \\ \\ \\  \\ \\ \\ \\ \\ \\  \\ \\ \\  \\ \\  \\ 0  0 \\ \\ 0 \\表示 \\ \\ \\ \\ \\ \\ \\ \\ \\ \\ \\ 2 könnt \\  \\ \\ \\ \\ \\ Initialise könnt könnt könnt-scalable ( \\ 2 \\ \\ \\ '

With attn_type="minference", the generated text is unreadable.

MInference==0.1.4.post4
model_name="Qwen/Qwen2-7B-Instruct"
How can I adjust the parameters so that attn_type="minference" gives the same result as attn_type="hf"?

@qiling1345 qiling1345 added the question Further information is requested label Jul 21, 2024
@iofu728 iofu728 self-assigned this Jul 23, 2024
iofu728 (Contributor) commented Jul 23, 2024

Hi @qiling1345, thanks for your feedback.

The issue is likely due to numerical precision differences between Flash Attention (or Torch Attention) and the Triton version of Flash Attention. You can test this with attn_type="minference_with_dense". If the generation results are similar, then this is likely the cause.
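
One way to run the suggested check is to generate from the same prompt under each attn_type and compare the outputs. The following is a minimal sketch, not the maintainers' test script. It assumes the patch is applied as `MInference(attn_type, model_name)(model)`, as in the MInference README, and that all three attn_type values are accepted by the installed version. The helper names `similarity`, `generate_with`, and `compare_attn_types` are hypothetical, and the character-level similarity score is only a rough stand-in for "similar results".

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1]; 1.0 means identical strings."""
    return SequenceMatcher(None, a, b).ratio()


def generate_with(attn_type: str, prompt: str,
                  model_name: str = "Qwen/Qwen2-7B-Instruct",
                  max_new_tokens: int = 128) -> str:
    """Generate greedily with the given attn_type (needs a GPU and MInference)."""
    # Heavy imports stay local so similarity() is usable without them.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from minference import MInference

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(
        model_name, torch_dtype="auto", device_map="auto"
    )
    # Assumption: the patch is applied as shown in the MInference README.
    model = MInference(attn_type, model_name)(model)

    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    with torch.no_grad():
        # Greedy decoding, so differences come from attention rather than sampling.
        output = model.generate(
            **inputs, max_new_tokens=max_new_tokens, do_sample=False
        )
    # Keep only the newly generated tokens, dropping the prompt.
    new_tokens = output[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)


def compare_attn_types(prompt: str,
                       attn_types=("hf", "minference_with_dense", "minference")) -> dict:
    """Return each attn_type's (similarity to the first attn_type, text)."""
    outputs = {t: generate_with(t, prompt) for t in attn_types}
    baseline = outputs[attn_types[0]]
    return {t: (similarity(baseline, text), text) for t, text in outputs.items()}
```

Calling `compare_attn_types('请介绍下你自己吧。')` reloads the model once per attn_type, so running each attn_type in a separate process may be safer on a GPU with limited memory. If "minference_with_dense" scores about as low against "hf" as "minference" does, that would point to the kernel precision difference rather than to the sparse attention pattern.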
