PR #12328: Make shared cache read/write logic more clearly for transpose mlir emitter #67611

Merged
merged 1 commit into master on May 15, 2024

Conversation

copybara-service[bot]

PR #12328: Make shared cache read/write logic more clearly for transpose mlir emitter

Imported from GitHub PR openxla/xla#12328

The current transpose MLIR emitter allocates the shared cache with shape 32x1x32 for the 2-1-0 transpose. However, the shared-cache read indices are {0, y, x}, as [this line](https://github.com/openxla/xla/blob/main/xla/service/gpu/fusions/transpose_mlir.cc#L190) shows, which is not compatible with the 32x1x32 shape. What is strange is that the 2-1-0 transpose still runs correctly with the transpose MLIR emitter. The reason is that the lower-tensors pass accesses the shared cache through a [linear index](https://github.com/openxla/xla/blob/main/xla/service/gpu/fusions/mlir/lower_tensors.cc#L148), which happens to land on the right element: the strides of a 32x1x32 buffer are {32, 32, 1}, so the linear index of {0, y, x} is 0 * 32 + y * 32 + x * 1 = y * 32 + x, the same as that of the in-bounds index {y, 0, x}.
I am not sure whether this is expected or just a mistake. If the reviewers think this PR is not needed, feel free to close it.
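To make the coincidence concrete, here is a minimal standalone sketch (not code from the PR or from XLA; `LinearIndex` is just an illustrative helper, and the shape, strides, and indices are the values from the description above) that checks the stride arithmetic:

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <iostream>

// Row-major strides of a 32x1x32 buffer are {32, 32, 1}.
constexpr std::array<int64_t, 3> kStrides = {32, 32, 1};

// Flatten a 3-D index into a linear offset, the way a linearized access would.
int64_t LinearIndex(const std::array<int64_t, 3>& idx) {
  return idx[0] * kStrides[0] + idx[1] * kStrides[1] + idx[2] * kStrides[2];
}

int main() {
  for (int64_t y = 0; y < 32; ++y) {
    for (int64_t x = 0; x < 32; ++x) {
      // Read indices used by the emitter: {0, y, x}. For y > 0 this is out of
      // bounds for the middle dimension (size 1) of the 32x1x32 shape...
      int64_t read = LinearIndex({0, y, x});
      // ...but it flattens to the same offset as the in-bounds index {y, 0, x}.
      int64_t in_bounds = LinearIndex({y, 0, x});
      assert(read == in_bounds);  // 0*32 + y*32 + x == y*32 + 0*32 + x
    }
  }
  std::cout << "linear indices coincide for all (y, x)\n";
  return 0;
}
```
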
Copybara import of the project:

--
bfb21798ee518dc11293a5683669add619a38e53 by Zhou, Lingzhi <lingzhi.zhou@intel.com>:

make shared cache read/write logic more clearly for transpose mlir emitter

--
0c9033334835bc8a14310e5ee059489cea7b5309 by Zhou, Lingzhi <lingzhi.zhou@intel.com>:

refactor

--
5554110835fc18207fb466587c1aeb20c3a542fe by Zhou, Lingzhi <lingzhi.zhou@intel.com>:

pad shared cache

--
8c17818baa1e2477952df15e412a6463f73106ab by Zhou, Lingzhi <lingzhi.zhou@intel.com>:

include missing file

Merging this change closes #12328

FUTURE_COPYBARA_INTEGRATE_REVIEW=openxla/xla#12328 from lingzhi98:lingzhi/transpose_mlir_210 8c17818baa1e2477952df15e412a6463f73106ab


google-cla bot commented May 15, 2024

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

make shared cache read/write logic more clearly for transpose mlir emitter

PiperOrigin-RevId: 633848774
@copybara-service copybara-service bot closed this May 15, 2024
@copybara-service copybara-service bot deleted the exported_pr_633823653 branch May 15, 2024 08:27
@copybara-service copybara-service bot merged commit a8a6dd4 into master May 15, 2024
1 check failed