[inductor][cpu]speech_transformer AMP single/multiple thread static/dynamic shape CPP/default wrapper performance regression in 2024-05-12 nightly release #126274

Open
zxd1997066 opened this issue May 15, 2024 · 1 comment
Labels
oncall: cpu inductor CPU Inductor issues for Intel team to triage oncall: pt2

Comments

@zxd1997066
Contributor

zxd1997066 commented May 15, 2024

🐛 Describe the bug

AMP static shape default wrapper

| suite | name | thread | batch_size_new | speed_up_new | inductor_new | eager_new | compilation_latency_new | batch_size_old | speed_up_old | inductor_old | eager_old | compilation_latency_old | Ratio Speedup(New/old) | Eager Ratio(old/new) | Inductor Ratio(old/new) | Compilation_latency_Ratio(old/new) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| torchbench | speech_transformer | multiple | 1 | 0.958469 | 0.030251814 | 0.028995425912766 | 69.833969 | 1.0 | 1.263245 | 0.023321511 | 0.029460782163194997 | 34.216882 | 0.76 | 1.02 | 0.77 | 0.49 |
| torchbench | speech_transformer | single | 1 | 0.996765 | 0.217230587 | 0.21652784605105502 | 69.387519 | 1.0 | 1.268162 | 0.176525437 | 0.223862851236794 | 33.153806 | 0.79 | 1.03 | 0.81 | 0.48 |
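
The derived columns in these tables can be recomputed from the raw ones, which is a quick sanity check when reading the regression. The following is a minimal sketch, not the benchmark's own reporting code: the `Run` class and `derived_ratios` function are hypothetical names, and it assumes the `inductor` and `eager` columns are latencies (so that speedup is eager divided by inductor) and that the ratio columns are rounded to two decimals.

```python
from dataclasses import dataclass


@dataclass
class Run:
    """Measurements for one nightly (either the 'new' or the 'old' columns)."""
    speed_up: float             # reported speed_up column
    inductor: float             # inductor column (assumed to be a latency)
    eager: float                # eager column (assumed to be a latency)
    compilation_latency: float  # compilation_latency column


def derived_ratios(new: Run, old: Run) -> dict:
    """Recompute the derived columns of the table from the raw measurements."""
    return {
        # Speedup as implied by the two latency columns (eager / inductor).
        "speedup_from_latencies_new": new.eager / new.inductor,
        # Ratio Speedup(New/old): below 1 means the speedup got worse.
        "ratio_speedup_new_over_old": round(new.speed_up / old.speed_up, 2),
        # Eager Ratio(old/new): close to 1 means the eager baseline is unchanged.
        "eager_ratio_old_over_new": round(old.eager / new.eager, 2),
        # Inductor Ratio(old/new): below 1 means the inductor figure grew.
        "inductor_ratio_old_over_new": round(old.inductor / new.inductor, 2),
        # Compilation_latency_Ratio(old/new): below 1 means compilation takes longer.
        "compilation_latency_ratio_old_over_new": round(
            old.compilation_latency / new.compilation_latency, 2
        ),
    }


# Multiple-thread row of the "AMP static shape default wrapper" table.
new = Run(0.958469, 0.030251814, 0.028995425912766, 69.833969)
old = Run(1.263245, 0.023321511, 0.029460782163194997, 34.216882)
ratios = derived_ratios(new, old)
print(ratios)
```

For this row the rounded values should match the table (0.76, 1.02, 0.77, 0.49). Under the latency assumption, the eager baseline is essentially unchanged, while the inductor figure grew by roughly 30% and compilation latency roughly doubled.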

AMP dynamic shape default wrapper

| suite | name | thread | batch_size_new | speed_up_new | inductor_new | eager_new | compilation_latency_new | batch_size_old | speed_up_old | inductor_old | eager_old | compilation_latency_old | Ratio Speedup(New/old) | Eager Ratio(old/new) | Inductor Ratio(old/new) | Compilation_latency_Ratio(old/new) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| torchbench | speech_transformer | multiple | 1 | 0.951825 | 0.030079704 | 0.0286306142598 | 69.78338 | 1.0 | 1.258736 | 0.023420975 | 0.029480824387600003 | 34.077379 | 0.76 | 1.03 | 0.78 | 0.49 |
| torchbench | speech_transformer | single | 1 | 0.996724 | 0.218491288 | 0.21777551054051203 | 69.378941 | 1.0 | 1.241771 | 0.178057441 | 0.221106566568011 | 33.172262 | 0.8 | 1.02 | 0.81 | 0.48 |

AMP static shape cpp wrapper

| suite | name | thread | batch_size_new | speed_up_new | inductor_new | eager_new | compilation_latency_new | batch_size_old | speed_up_old | inductor_old | eager_old | compilation_latency_old | Ratio Speedup(New/old) | Eager Ratio(old/new) | Inductor Ratio(old/new) | Compilation_latency_Ratio(old/new) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| torchbench | speech_transformer | multiple | 1 | 0.982142 | 0.029919203 | 0.029384905872826 | 207.79774 | 1.0 | 1.331547 | 0.022101082 | 0.029428629433854003 | 75.792615 | 0.74 | 1.0 | 0.74 | 0.36 |
| torchbench | speech_transformer | single | 1 | 0.997013 | 0.21735344 | 0.21670420527472 | 207.207765 | 1.0 | 1.273872 | 0.17650107299999998 | 0.22483977486465595 | 75.687333 | 0.78 | 1.04 | 0.81 | 0.37 |

AMP dynamic shape cpp wrapper

| suite | name | thread | batch_size_new | speed_up_new | inductor_new | eager_new | compilation_latency_new | batch_size_old | speed_up_old | inductor_old | eager_old | compilation_latency_old | Ratio Speedup(New/old) | Eager Ratio(old/new) | Inductor Ratio(old/new) | Compilation_latency_Ratio(old/new) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| torchbench | speech_transformer | multiple | 1 | 0.972185 | 0.029728354 | 0.028901459833489997 | 208.31205 | 1.0 | 1.347673 | 0.021702161 | 0.029247416421353 | 75.875923 | 0.72 | 1.01 | 0.73 | 0.36 |
| torchbench | speech_transformer | single | 1 | 0.997032 | 0.218706831 | 0.218057709125592 | 207.342141 | 1.0 | 1.278801 | 0.175010154 | 0.22380315994535402 | 75.659505 | 0.78 | 1.03 | 0.8 | 0.36 |

SW info

| name | target_branch | target_commit | refer_branch | refer_commit |
|---|---|---|---|---|
| torchbench | main | d6015d42 | main | d6015d42 |
| torch | main | 02093b6 | main | 6d30803 |
| torchvision | main | 0.19.0a0+d23a6e1 | main | 0.19.0a0+06ad737 |
| torchtext | main | 0.16.0a0+b0ebddc | main | 0.16.0a0+b0ebddc |
| torchaudio | main | 2.2.0a0+ea437b3 | main | 2.2.0a0+ea437b3 |
| torchdata | main | 0.7.1a0+0790338 | main | 0.7.1a0+0790338 |
| dynamo_benchmarks | main | nightly | main | nightly |

Repro:
inductor_single_run.sh
bash inductor_single_run.sh multiple/single inference performance torchbench speech_transformer amp first dynamic/static default/cpp
Suspected guilty commit: 0935b3d
torchbench-speech_transformer-inference-amp-static-default-single-performance-drop_guilty_commit.log
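
The repro command above uses slash notation (`multiple/single`, `dynamic/static`, `default/cpp`), which presumably means picking one value per slot. A small sketch that expands it into the eight concrete invocations covered by the tables follows. The helper `build_commands` is hypothetical, the argument order is copied verbatim from the command in this issue, and the meaning of the fixed positional arguments (for example `first`) is not explained here, so they are passed through unchanged.

```python
from itertools import product

# Script name and fixed positional arguments, copied from the repro line.
BASE = "bash inductor_single_run.sh"
FIXED = "inference performance torchbench speech_transformer amp first"


def build_commands() -> list:
    """Expand the slash-separated choices into one command per combination."""
    commands = []
    for thread, shape, wrapper in product(
        ("multiple", "single"),   # thread mode
        ("dynamic", "static"),    # shape mode
        ("default", "cpp"),       # wrapper type
    ):
        commands.append(f"{BASE} {thread} {FIXED} {shape} {wrapper}")
    return commands


commands = build_commands()
for command in commands:
    print(command)
```

Since all eight configurations show a similar drop, reproducing a single one (for instance single thread, static shape, default wrapper, which is the configuration named in the attached bisect log) is likely enough to confirm the regression.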
cc @ezyang @msaroufim @bdhirsh @anijain2305 @chauhang @WeizhuoZhang-intel @chuanqi129

@chuanqi129 chuanqi129 added the oncall: cpu inductor CPU Inductor issues for Intel team to triage label May 15, 2024
@chuanqi129
Collaborator

Hi @anijain2305, according to the bisect log, PR #125202 may have introduced this AMP performance regression on CPU. Could you please double-check it?
