Describe the issue

attn_type="hf": (generation output shown in the original issue, omitted here)
attn_type="minference": (generation output omitted here) — the result is unreadable.

Environment:
MInference==0.1.4.post4
model_name="Qwen/Qwen2-7B-Instruct"

How can I adjust the parameters so that attn_type="minference" gives the same result as attn_type="hf"?

Reply:

The issue is due to numerical precision differences between Flash Attention (or Torch attention) and the Triton implementation of Flash Attention used by MInference. You can test with attn_type="minference_with_dense"; if the generation results are similar, this is likely the cause.
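For reference, a comparison along these lines can confirm whether the Triton kernel's numerics are the cause. This is a minimal sketch assuming the standard MInference patching API (MInference(attn_type, model_name) applied to a Hugging Face model, as in the project README); the prompt and generation settings are placeholders, not those from the original report.

```python
# Sketch: compare generations across attn_type settings to isolate the
# Triton Flash Attention numerics issue. Prompt/generation parameters are
# placeholders; MInference only changes behavior on long-context inputs.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from minference import MInference

model_name = "Qwen/Qwen2-7B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_name)
prompt = "..."  # long-context prompt from the original report (not included)

def generate_with(attn_type: str) -> str:
    # Reload the model for each backend so patches do not stack.
    model = AutoModelForCausalLM.from_pretrained(
        model_name, torch_dtype=torch.bfloat16, device_map="auto"
    )
    model = MInference(attn_type, model_name)(model)
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=128, do_sample=False)
    return tokenizer.decode(
        out[0, inputs.input_ids.shape[1]:], skip_special_tokens=True
    )

# If "minference_with_dense" matches "hf" while "minference" is garbled,
# numerical differences in the Triton Flash Attention kernel are the likely cause.
for attn_type in ("hf", "minference_with_dense", "minference"):
    print(attn_type, "->", generate_with(attn_type)[:200])
```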