PR #12328: Make shared cache read/write logic more clearly for transpose mlir emitter #67611

Merged
merged 1 commit into master on May 15, 2024

Conversation

copybara-service[bot]

PR #12328: Make shared cache read/write logic more clearly for transpose mlir emitter

Imported from GitHub PR openxla/xla#12328

The current transpose MLIR emitter allocates the shared cache with shape 32x1x32 for the 2-1-0 transpose. However, the shared-cache read indices are {0, y, x}, as [this line](https://github.com/openxla/xla/blob/main/xla/service/gpu/fusions/transpose_mlir.cc#L190) shows, which is not compatible with the 32x1x32 shape. What is strange is that the 2-1-0 transpose still runs correctly with the transpose MLIR emitter. The reason is that the lower-tensors pass accesses the shared cache through a [linear index](https://github.com/openxla/xla/blob/main/xla/service/gpu/fusions/mlir/lower_tensors.cc#L148), which happens to land on the right element: the strides of a 32x1x32 buffer are {32, 32, 1}, so the linear index of {0, y, x} is 0 * 32 + y * 32 + x * 1 = y * 32 + x, the same as that of the in-bounds index {y, 0, x}.
I am not sure whether this is expected or just a mistake. If the reviewers think this PR is not needed, feel free to close it.
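To make the coincidence concrete, here is a minimal standalone sketch (not code from the PR or from XLA; `LinearIndex` is just an illustrative helper, and the shape, strides, and indices are the values from the description above) that checks the stride arithmetic:

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <iostream>

// Row-major strides of a 32x1x32 buffer are {32, 32, 1}.
constexpr std::array<int64_t, 3> kStrides = {32, 32, 1};

// Flatten a 3-D index into a linear offset, the way a linearized access would.
int64_t LinearIndex(const std::array<int64_t, 3>& idx) {
  return idx[0] * kStrides[0] + idx[1] * kStrides[1] + idx[2] * kStrides[2];
}

int main() {
  for (int64_t y = 0; y < 32; ++y) {
    for (int64_t x = 0; x < 32; ++x) {
      // Read indices used by the emitter: {0, y, x}. For y > 0 this is out of
      // bounds for the middle dimension (size 1) of the 32x1x32 shape...
      int64_t read = LinearIndex({0, y, x});
      // ...but it flattens to the same offset as the in-bounds index {y, 0, x}.
      int64_t in_bounds = LinearIndex({y, 0, x});
      assert(read == in_bounds);  // 0*32 + y*32 + x == y*32 + 0*32 + x
    }
  }
  std::cout << "linear indices coincide for all (y, x)\n";
  return 0;
}
```
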
Copybara import of the project:

--
bfb21798ee518dc11293a5683669add619a38e53 by Zhou, Lingzhi <lingzhi.zhou@intel.com>:

make shared cache read/write logic more clearly for transpose mlir emitter

--
0c9033334835bc8a14310e5ee059489cea7b5309 by Zhou, Lingzhi <lingzhi.zhou@intel.com>:

refactor

--
5554110835fc18207fb466587c1aeb20c3a542fe by Zhou, Lingzhi <lingzhi.zhou@intel.com>:

pad shared cache

--
8c17818baa1e2477952df15e412a6463f73106ab by Zhou, Lingzhi <lingzhi.zhou@intel.com>:

include missing file

Merging this change closes #12328

FUTURE_COPYBARA_INTEGRATE_REVIEW=openxla/xla#12328 from lingzhi98:lingzhi/transpose_mlir_210 8c17818baa1e2477952df15e412a6463f73106ab


google-cla bot commented May 15, 2024

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

make shared cache read/write logic more clearly for transpose mlir emitter

PiperOrigin-RevId: 633848774
@copybara-service copybara-service bot closed this May 15, 2024
@copybara-service copybara-service bot deleted the exported_pr_633823653 branch May 15, 2024 08:27
@copybara-service copybara-service bot merged commit a8a6dd4 into master May 15, 2024
1 check failed