PR #12328: Make shared cache read/write logic more clearly for transpose mlir emitter #67611
Imported from GitHub PR openxla/xla#12328
The current transpose MLIR emitter allocates a shared cache with shape 32x1x32 for the 2-1-0 transpose. But the read indices into the shared cache are {0, y, x}, as this line shows, which is not compatible with the 32x1x32 shape. What is strange is that the 2-1-0 transpose still runs correctly with the transpose MLIR emitter. The reason is that the lower-tensor pass accesses the shared cache through a linear index, which happens to produce the right result. For example, the strides of a 32x1x32 buffer are {32, 32, 1}, so the linear index of {0, y, x} is 0 * 32 + y * 32 + x * 1 = 32y + x, which coincides with the linear index of the write indices {y, 0, x}.
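The coincidence above can be checked with a small sketch (illustrative Python, not XLA code; the helper names are made up for this example):

```python
# Row-major strides for a shape, as the lower-tensor pass would compute them.
def row_major_strides(shape):
    strides = [1] * len(shape)
    for i in reversed(range(len(shape) - 1)):
        strides[i] = strides[i + 1] * shape[i + 1]
    return strides

def linear_index(indices, strides):
    return sum(i * s for i, s in zip(indices, strides))

shape = [32, 1, 32]
strides = row_major_strides(shape)  # [32, 32, 1]

# Write side uses {y, 0, x}; read side uses {0, y, x}.
# Both collapse to the same linear offset 32*y + x, so the
# 2-1-0 transpose happens to read back the value it wrote.
y, x = 5, 7
write_off = linear_index([y, 0, x], strides)
read_off = linear_index([0, y, x], strides)
assert write_off == read_off == 32 * y + x
```

So the read and write indices disagree symbolically, yet alias the same linear offset for this particular shape, which is why the bug went unnoticed.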
I am not sure whether this is intended or just a mistake. If the reviewers think this PR is unnecessary, feel free to close it.
Copybara import of the project:
--
bfb21798ee518dc11293a5683669add619a38e53 by Zhou, Lingzhi lingzhi.zhou@intel.com:
make shared cache read/write logic more clearly for transpose mlir emitter
--
0c9033334835bc8a14310e5ee059489cea7b5309 by Zhou, Lingzhi lingzhi.zhou@intel.com:
refactor
--
5554110835fc18207fb466587c1aeb20c3a542fe by Zhou, Lingzhi lingzhi.zhou@intel.com:
pad shared cache
--
8c17818baa1e2477952df15e412a6463f73106ab by Zhou, Lingzhi lingzhi.zhou@intel.com:
include missing file
Merging this change closes #12328
FUTURE_COPYBARA_INTEGRATE_REVIEW=openxla/xla#12328 from lingzhi98:lingzhi/transpose_mlir_210 8c17818baa1e2477952df15e412a6463f73106ab