
Sync Hugging Face modifications of Qwen MoE model #4774

Merged
merged 6 commits into vllm-project:main
May 17, 2024

Conversation

@eigen2017 (Contributor) commented May 12, 2024

Hugging Face recently merged my PR: https://github.com/huggingface/transformers/pull/30552/files

It introduces a new config option, "mlp_only_layers", to the Qwen MoE model. I think vLLM should keep the same model forward logic as the Hugging Face model definitions, so this PR syncs those modifications of the Qwen MoE model.

The new "mlp_only_layers" option sets which layers have their experts cut, to fit limited HBM or other creative scenarios.
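For illustration, a minimal sketch of setting the option (the layer indices here are hypothetical, and `Qwen2MoeConfig` accepting this keyword assumes the linked transformers PR is installed):

```python
from transformers import Qwen2MoeConfig

# Hypothetical example: fall back to a plain MLP (no experts) on
# layers 0 and 1 to save HBM; all other layers keep their MoE experts.
config = Qwen2MoeConfig(mlp_only_layers=[0, 1])
```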

  1. The change in Qwen2MoeDecoderLayer syncs the Hugging Face modifications of the model (see the first sketch after this list).
  2. The change in load_weights handles the case where mlp_only_layers is not empty: "self.named_parameters()" then no longer contains all weights (because some experts have been cut), so "param = params_dict[name]" would raise an exception.
  3. This is also a bug in the original version of qwen2_moe.py: when decoder_sparse_step > 1 is set, experts are likewise cut, name is not in params_dict, and params_dict[name] raises an exception.
  4. In any case, checking that the key "name" exists in params_dict before reading params_dict[name] is safer (see the second sketch below).
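A minimal sketch of the two changes, assuming the names used in the Hugging Face/vLLM qwen2_moe code (`Qwen2MoeSparseMoeBlock`, `Qwen2MoeMLP`, `params_dict`); the actual diff may differ in detail. First, the layer construction in Qwen2MoeDecoderLayer:

```python
# Inside Qwen2MoeDecoderLayer.__init__: a layer gets a sparse MoE block
# only if it is not listed in mlp_only_layers AND it matches the
# decoder_sparse_step stride; otherwise it uses a plain dense MLP.
mlp_only_layers = getattr(config, "mlp_only_layers", [])
if (layer_idx not in mlp_only_layers
        and config.num_experts > 0
        and (layer_idx + 1) % config.decoder_sparse_step == 0):
    self.mlp = Qwen2MoeSparseMoeBlock(config=config)
else:
    self.mlp = Qwen2MoeMLP(config, intermediate_size=config.intermediate_size)
```

And the guard in load_weights described by points 2–4:

```python
# Inside load_weights: checkpoint tensors belonging to experts that
# were cut (by mlp_only_layers or decoder_sparse_step) have no matching
# parameter, so skip them instead of raising a KeyError.
if name not in params_dict:
    continue
param = params_dict[name]
```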

I'm willing to contribute to the great vLLM project. Any reply is welcome, thanks! ^_^

@eigen2017 (Contributor Author) commented

I'll fix the CI warnings soon.

@eigen2017 (Contributor Author) commented

This PR resolves this issue:
#4369

It also resolves these issues that occur when using MoE models:
#3931
#3563

@eigen2017 (Contributor Author) commented

@WoosukKwon @zhuohan123
Could you please arrange for someone to review my PR?
Thanks!

@eigen2017 (Contributor Author) commented

The AMD test failed with:
requests.exceptions.HTTPError: 401 Client Error: Unauthorized for url: https://huggingface.co/meta-llama/Llama-2-7b-chat-hf/resolve/main/config.json

Failed to connect to localhost port 8000: Connection refused
timeout 600 bash -c 'until curl localhost:8000/v1/models; do sleep 1; done'

It's likely a flake in the CI runner; I'll try it again.

@eigen2017 (Contributor Author) commented

@WoosukKwon @zhuohan123 CI has now passed. Could you please give a review? Thanks ^_^

@simon-mo (Collaborator) commented May 14, 2024

@JustinLin610 can you help take a look at this? Thank you! 🙏

@eigen2017 (Contributor Author) commented

> @JustinLin610 can you help take a look at this? Thank you! 🙏

Thank you for replying!

@JustinLin610 Hi ^_^, Qwen member, could you please give this PR a review? Thanks!

@eigen2017 (Contributor Author) commented May 15, 2024

@JustinLin610 @yangapku @simonJJJ @logicwong @JianxinMa @hzhwcmhf @fyabc @huybery
To all the beacon-level Alibaba Qwen team gurus: could anyone help confirm this PR?
Any viewpoints you can offer are welcome, many thanks!!

@bozheng-hit left a comment:

I think the PR is functionally okay.

@eigen2017 (Contributor Author) commented

> I think the PR is functionally okay.

Thank you for your approval!!

@simon-mo Hi, it seems the Qwen members are all very busy ...

@JustinLin610 (Contributor) commented

Bo is our member. Yes, I just noticed that this has been merged into transformers. It is not strictly necessary, since we do not use this setup for our models, but functionally it is OK. I think it is okay to merge it. @eigen2017 @simon-mo

@eigen2017 (Contributor Author) commented

> Bo is our member. Yes, I just noticed that this has been merged into transformers. It is not strictly necessary, since we do not use this setup for our models, but functionally it is OK. I think it is okay to merge it. @eigen2017 @simon-mo

Thank you very much! Yes, it's functionally OK, and it changes nothing if the option is not set.

@simon-mo If anything else is needed to merge this PR, please tell me.

@simon-mo simon-mo merged commit 48d5985 into vllm-project:main May 17, 2024
55 checks passed
dtrifiro pushed a commit to dtrifiro/vllm that referenced this pull request May 21, 2024
@78 commented May 24, 2024

config.mlp_only_layers should have a default value.
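One hedged way to do that on the vLLM side — reading the attribute defensively with a fallback is an assumption here, not necessarily how the fix eventually landed:

```python
# Treat an absent mlp_only_layers field as "no MLP-only layers", so
# older Qwen MoE checkpoints whose config predates the field still load.
mlp_only_layers = getattr(config, "mlp_only_layers", [])
```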

tybalex pushed a commit to tybalex/vllm-function-call that referenced this pull request May 25, 2024