Add a couple of configs into generator.py for mixed-input MM #1350
base: main
Conversation
What do you mean by symmetric (the same?)? The Tensor Core `math_instruction` shape for both `upcast_a` and `upcast_b` is 16816. The supported `tile_description` (more precisely, the tile shape) may need to be different for `upcast_a` vs. `upcast_b`.
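For context, here is a minimal Python sketch of the point above: both upcast directions share the same 16816 (16x8x16) Tensor Core instruction shape, while the threadblock tile shapes may differ per direction. This is illustrative only; the class and field names below are simplified stand-ins, not CUTLASS's actual `generator.py` API, and the tile-shape lists are hypothetical.

```python
# Illustrative sketch only; CUTLASS's generator.py defines its own
# MathInstruction / TileDescription classes with different signatures.
from dataclasses import dataclass

@dataclass(frozen=True)
class MathInstruction:
    shape: tuple      # (M, N, K) of the Tensor Core mma instruction
    element_a: str
    element_b: str

# Both upcast directions use the same 16x8x16 ("16816") instruction shape;
# only which operand is the narrow (upcast) one differs.
upcast_a_inst = MathInstruction((16, 8, 16), "s8", "f16")   # A is upcast
upcast_b_inst = MathInstruction((16, 8, 16), "f16", "s8")   # B is upcast
assert upcast_a_inst.shape == upcast_b_inst.shape == (16, 8, 16)

# Threadblock tile shapes, however, may need to differ between the two
# directions (hypothetical values for illustration).
tile_shapes = {
    "upcast_a": [(128, 128, 64), (128, 64, 64)],
    "upcast_b": [(128, 128, 64), (64, 128, 64)],
}
```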
*(force-pushed from 93dc359 to 6ee70e7)*
By symmetry, I meant […]. As far as […]

For […]. For […]
*(force-pushed from 6ee70e7 to 33f48a7)*
Thanks for the clarification. I've updated […]; matching […] produces at least one line of profiling output, which should mean that the kernel compiled and ran successfully. As a matter of fact, in these tests of mine, […]
Asking again: how do I properly run verification after my changes?
```shell
cmake --no-warn-unused-cli -DCMAKE_BUILD_TYPE:STRING=Release -DCUTLASS_NVCC_ARCHS:STRING=80 -DCUTLASS_NVCC_KEEP:STRING=OFF -DCUTLASS_ENABLE_F16C:STRING=ON -DCUTLASS_LIBRARY_KERNELS:STRING=f16_s16816gemm_f16_s8_128x128_64x*,f16_s16816gemm_s8_f16_128x128_64x*,f16_s16816gemm_u8_f16_128x128_64x*,f16_s16816gemm_f16_u8_128x128_64x*,bf16_s16816gemm_bf16_s8_128x128_64x*,bf16_s16816gemm_s8_bf16_128x128_64x*,bf16_s16816gemm_bf16_u8_128x128_64x*,bf16_s16816gemm_u8_bf16_128x128_64x*,f16_s16816gemm_f16_128x128_64x*_tn_align8,bf16_s16816gemm_bf16_128x128_64x*_tn_align8 -DCUTLASS_LIBRARY_IGNORE_KERNELS:STRING=gemm_grouped*,gemm_planar* -DCMAKE_EXPORT_COMPILE_COMMANDS:BOOL=TRUE -DCMAKE_C_COMPILER:FILEPATH=/usr/bin/gcc -DCMAKE_CXX_COMPILER:FILEPATH=/usr/bin/g++ -S/mnt/disks/gcloud_workspace/repos/cutlass/cutlass_tree_2/cutlass -B/mnt/disks/gcloud_workspace/repos/cutlass/cutlass_tree_2/build -G Ninja
```

The cmake flags to play with are […]
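A small Python sketch of how the comma-separated `-DCUTLASS_LIBRARY_KERNELS` filter in the cmake invocation above can be assembled from individual kernel-name patterns (a subset of the patterns from that command; the helper itself is just illustrative string handling, not part of CUTLASS):

```python
# Sketch: build the CUTLASS_LIBRARY_KERNELS filter value from a list of
# kernel-name glob patterns (subset of the ones in the cmake line above).
patterns = [
    "f16_s16816gemm_f16_s8_128x128_64x*",
    "f16_s16816gemm_s8_f16_128x128_64x*",
    "bf16_s16816gemm_bf16_s8_128x128_64x*",
]
kernels_flag = "-DCUTLASS_LIBRARY_KERNELS:STRING=" + ",".join(patterns)
# Each pattern is separated by a comma; no trailing separator.
assert kernels_flag.count(",") == len(patterns) - 1
```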
Apologies for the delayed response. I have been OOO for the last few weeks.
*(force-pushed from 33f48a7 to e130f14)*
Thanks for the clarifications. The PR is updated with the suggested changes: added a number of tests, so that it should all be consistent now between tests, […]. Script used to validate that […]
*(force-pushed from e130f14 to 36223dd)*
*(force-pushed from fe7e3ed to 4e57cc2)*
This PR has been labeled […]
*(force-pushed from 4e57cc2 to 14cbeab)*
I'm adding CUTLASS kernels (PR here) as an auto-tune option for the PyTorch compiler, and it would be nice to have these additional configurations available. This is not urgent, and more changes along these lines may be desired later, so if changes like this are acceptable, this PR could be kept open for a while and I'll make further additions to it as needed.
@manishucsd: Would it make sense for `GenerateSM80_TensorOp_16816_mixed_input_upcast_a` and `GenerateSM80_TensorOp_16816_mixed_input_upcast_b` to be symmetric w.r.t. `math_instructions` and `tile_descriptions`? I can change it through this PR too.