Clean up and fix numpy_helper and subbyte #6124

justinchuby · 2024-05-03T02:37:07Z

- Fix complex number handling in numpy_helper. Otherwise the line
  ```
  return np.asarray(data, dtype=storage_np_dtype).astype(np_dtype).reshape(dims)
  ```
  raises `TypeError: float() argument must be a string or a real number, not 'complex'`, because `storage_np_dtype` is float but `data` is already complex.
- Vectorize float8 conversion functions and improve readability: Speed up float8e4m3_to_float32 by 10.3x (1000x1000 input, 10 iterations, 34.829s -> 3.11s)
- Clean up int4 numpy helpers to make them more useful and performant with native NumPy vectorization. Move all int4-related functions to the subbyte module.
- Improve handling of big-endian systems
- Remove the dims parameter in numpy helper functions to simplify the implementation.
- Improve reference evaluator to_array_extended
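
For reference, the complex-number failure described above can be reproduced with plain NumPy. This is a minimal sketch, not the code in this PR: `pairs_to_complex` is a hypothetical stand-in for `combine_pairs_to_complex`, and it assumes complex values arrive as a flat list of interleaved real and imaginary parts.

```python
import numpy as np


def pairs_to_complex(flat):
    """Hypothetical stand-in: turn [re0, im0, re1, im1, ...] into complex values."""
    return [complex(flat[2 * i], flat[2 * i + 1]) for i in range(len(flat) // 2)]


flat = [1.0, 2.0, 3.0, 4.0]    # two complex numbers stored as interleaved floats
dims = (2,)
storage_np_dtype = np.float32  # assumed storage dtype for a COMPLEX64 tensor
np_dtype = np.complex64

data = pairs_to_complex(flat)

# Old path: casting already-complex Python values to the float storage dtype fails.
try:
    np.asarray(data, dtype=storage_np_dtype)
    failed = False
except TypeError:
    failed = True

# Fixed path: skip the storage dtype and cast straight to the complex dtype.
result = np.asarray(data).astype(np_dtype).reshape(dims)
```

Returning early for complex tensors, as the diff in this PR does, avoids ever passing complex values through the float storage dtype.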

@galagam for int4 updates, @AlexandreEichenberger for big-endian handling, and @xadupre for float8 functions and the reference evaluator. Thanks!

Float 8 util speed test

```
import numpy as np
import onnx
import pyinstrument
from onnx import numpy_helper

# 1000x1000 random float8 bit patterns stored as uint8
floats = np.random.randint(0, 255, [1000, 1000], dtype=np.uint8)

print(onnx.__version__)
profiler = pyinstrument.Profiler()
profiler.start()
for i in range(10):
    print(i)
    numpy_helper.float8e4m3_to_float32(floats)
profiler.stop()
profiler.print()
```

TODO: Unit tests

Fixes #6126

github-actions · 2024-05-03T02:53:45Z

Test Results

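3 files ±0, 3 suites ±0, duration 2m 15s ⏱️ (+8s)

| | Total | ✅ Passed | 💤 Skipped | ❌ Failed |
|---|---|---|---|---|
| Tests | 7 486 (±0) | 4 386 (-70) | 3 030 (±0) | 70 (+70) |
| Runs | 22 441 (+5) | 13 051 (-135) | 9 180 (-70) | 210 (+210) |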

For more details on these failures, see this check.

Results for commit 64be57c. ± Comparison against base commit 013eb5e.

♻️ This comment has been updated with latest results.

xadupre · 2024-05-03T08:44:46Z

Maybe it is worth adding a unit test.

third_party/benchmark

justinchuby · 2024-05-04T02:15:41Z

onnx/numpy_helper.py

    """
-    single_func = lambda x: subbyte.unpack_single_4bitx2(x, signed)  # noqa: E731
-    func = np.frompyfunc(single_func, 1, 2)


`np.frompyfunc` is not performant: it still calls the Python function once per element.
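
To illustrate the point, here is a rough sketch of unpacking 4-bit pairs with whole-array bitwise operations instead of a per-element Python callback. `unpack_nibbles` is a hypothetical name, not the helper in `onnx/subbyte.py`, and the sketch assumes the first element of each pair sits in the low nibble.

```python
import numpy as np


def unpack_nibbles(packed, signed: bool) -> np.ndarray:
    """Split each uint8 into (low, high) 4-bit values, low nibble first."""
    packed = np.asarray(packed, dtype=np.uint8)
    low = packed & 0x0F
    high = packed >> 4
    # Interleave so that element 2*i is the low nibble of byte i.
    out = np.stack([low, high], axis=-1).reshape(-1)
    if signed:
        # Sign-extend: 8..15 represent -8..-1 in 4-bit two's complement.
        out = out.astype(np.int8)
        out = np.where(out >= 8, out - 16, out).astype(np.int8)
    return out


packed = np.array([0x21, 0xF8], dtype=np.uint8)
unsigned_vals = unpack_nibbles(packed, signed=False)
signed_vals = unpack_nibbles(packed, signed=True)
```

Every step here runs inside NumPy, so the cost per element does not include a Python function call.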

onnx/numpy_helper.py

-    if tensor_dtype in (TensorProto.COMPLEX64, TensorProto.COMPLEX128):
-        data = combine_pairs_to_complex(data)  # type: ignore[assignment,arg-type]
+    if tensor_dtype in (onnx.TensorProto.COMPLEX64, onnx.TensorProto.COMPLEX128):
+        return np.asarray(combine_pairs_to_complex(data)).astype(np_dtype).reshape(dims)


onnx/subbyte.py

-    clip_high = INT4_MAX if signed else UINT4_MAX
-    if not isinstance(x, np.ndarray):
-        x = np.asarray(x)
+    return np.rint(np.clip(x, INT4_MIN, INT4_MAX)).astype(np.int8)


onnx/subbyte.py

+    Returns:
+        An ndarray with a single int4 element.
+    """
+    return np.rint(np.clip(x, UINT4_MIN, UINT4_MAX)).astype(np.uint8)
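
The clip-then-round pattern in the two hunks above can be tried in isolation. The sketch below is illustrative only: `cast_to_int4_range` is a hypothetical wrapper, and the constant values are assumed to be the usual 4-bit limits.

```python
import numpy as np

# Assumed values for the constants referenced in the diff.
INT4_MIN, INT4_MAX = -8, 7
UINT4_MIN, UINT4_MAX = 0, 15


def cast_to_int4_range(x, signed: bool) -> np.ndarray:
    """Saturate to the 4-bit range, then round half to even."""
    x = np.asarray(x, dtype=np.float32)
    if signed:
        return np.rint(np.clip(x, INT4_MIN, INT4_MAX)).astype(np.int8)
    return np.rint(np.clip(x, UINT4_MIN, UINT4_MAX)).astype(np.uint8)


values = [-20.0, -0.4, 2.5, 3.5, 99.0]
as_int4 = cast_to_int4_range(values, signed=True)
as_uint4 = cast_to_int4_range(values, signed=False)
```

Note that `np.rint` rounds halfway cases to the nearest even integer, so 2.5 becomes 2 while 3.5 becomes 4.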


onnx/subbyte.py

+    else:
+        i8_low = cast_uint4(val_low)
+        i8_high = cast_uint4(val_high)
+    i8_high <<= 4
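
The shift in this hunk moves the second value of each pair into the high nibble before the two halves are combined. A vectorized sketch of the whole packing step might look like the following; `pack_nibbles` is a hypothetical name, and padding odd-length input with a zero nibble is an assumption of this sketch.

```python
import numpy as np


def pack_nibbles(values) -> np.ndarray:
    """Pack 4-bit values into uint8, first element of each pair in the low nibble."""
    # Keep only the low 4 bits; negative int8 values keep their two's complement bits.
    flat = np.asarray(values).astype(np.uint8).reshape(-1) & 0x0F
    if flat.size % 2:
        flat = np.append(flat, np.uint8(0))  # pad odd lengths with a zero nibble
    low = flat[0::2]
    high = flat[1::2] << 4
    return (low | high).astype(np.uint8)


packed = pack_nibbles(np.array([1, 2, -8, -1, 7], dtype=np.int8))
```

With this layout, the five inputs above occupy three bytes: `0x21`, `0xF8`, and `0x07`.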


onnx/subbyte.py

+    x_low = x & np.uint8(0x0F)
+    x_high = (x >> 4).astype(np.uint8)
+    if signed:
+        x_low = _int4_to_int8(x_low)


onnx/subbyte.py

+    x_high = (x >> 4).astype(np.uint8)
+    if signed:
+        x_low = _int4_to_int8(x_low)
+        x_high = _int4_to_int8(x_high)


onnx/numpy_helper.py

+    # if mantissa > 0:
+    #     exponent = 127 - exponent_bias
+    #     if mantissa & 0b100 == 0:
+    #         mantissa &= 0b011
+    #         mantissa <<= 1
+    #         exponent -= 1
+    #     if mantissa & 0b100 == 0:
+    #         mantissa &= 0b011
+    #         mantissa <<= 1
+    #         exponent -= 1
+    #     result |= (mantissa & 0b011) << 21
+    #     result |= exponent << 23


onnx/numpy_helper.py

justinchuby · 2024-05-04T05:26:47Z

onnx/numpy_helper.py

-    return f
-
-
-_float8e4m3_to_float32 = np.vectorize(


Removed the use of `np.vectorize` because it is essentially a Python for loop and is not performant.
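
As a rough illustration of the difference, the sketch below decodes float8 E4M3FN bit patterns in two ways: a scalar function wrapped in `np.vectorize`, and whole-array arithmetic. Both functions are hypothetical and are not the implementation in `onnx/numpy_helper.py`; the sketch assumes the E4M3FN layout (1 sign bit, 4 exponent bits, 3 mantissa bits, bias 7, no infinities, NaN when all non-sign bits are set).

```python
import numpy as np


def scalar_e4m3fn_to_float(b) -> float:
    """Reference decoder for a single byte (slow, one Python call per element)."""
    b = int(b)
    if (b & 0x7F) == 0x7F:
        return float("nan")
    sign = -1.0 if b & 0x80 else 1.0
    exponent = (b >> 3) & 0x0F
    mantissa = b & 0x07
    if exponent == 0:  # subnormal: no implicit leading 1
        return sign * (mantissa / 8.0) * 2.0 ** -6
    return sign * (1.0 + mantissa / 8.0) * 2.0 ** (exponent - 7)


def array_e4m3fn_to_float32(x) -> np.ndarray:
    """Same decoding expressed as whole-array operations."""
    bits = np.asarray(x, dtype=np.uint8)
    sign = np.where((bits & 0x80) != 0, np.float32(-1.0), np.float32(1.0))
    exponent = ((bits >> 3) & 0x0F).astype(np.int32)
    mantissa = (bits & 0x07).astype(np.float32)
    normal = np.ldexp(1.0 + mantissa / 8.0, exponent - 7)
    subnormal = np.ldexp(mantissa / 8.0, -6)
    magnitude = np.where(exponent == 0, subnormal, normal)
    result = (sign * magnitude).astype(np.float32)
    return np.where((bits & 0x7F) == 0x7F, np.float32(np.nan), result)


codes = np.arange(256, dtype=np.uint8)
slow = np.vectorize(scalar_e4m3fn_to_float, otypes=[np.float32])(codes)
fast = array_e4m3fn_to_float32(codes)
same = np.array_equal(slow, fast, equal_nan=True)
```

The two paths agree on all 256 bit patterns, but only the second avoids a Python-level call per element.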

onnx/numpy_helper.py

+    result[normal_mask] |= exponents[normal_mask] << 23
+    result = result.view(np.float32)
+    if is_scalar:
+        return result[0]


onnx/numpy_helper.py

+    result = result.view(np.float32)
+    if is_scalar:
+        return result[0]
+    return result


onnx/numpy_helper.py

+    # if exponent == 0:
+    #     # Subnormal number
+    #     if mantissa > 0:
+    #         exponent = 127 - exponent_bias
+    #         if mantissa & 0b10 == 0:
+    #             mantissa &= 0b01
+    #             mantissa <<= 1
+    #             exponent -= 1
+    #         result |= (mantissa & 0b01) << 22
+    #         result |= exponent << 23
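
One way to sanity-check the subnormal handling in the commented reference logic above is to compare against float16. Plain float8 E5M2 (not the FNUZ variant) uses the same sign and exponent layout as IEEE float16 with the low 8 mantissa bits dropped, so placing the byte in the high half of a 16-bit word gives the equivalent float16. The sketch below uses that property; it is an alternative technique for illustration, not the approach taken in this PR, and `e5m2_to_float32_via_float16` is a hypothetical name.

```python
import numpy as np


def e5m2_to_float32_via_float16(x) -> np.ndarray:
    """Decode float8 E5M2 bytes by widening them to float16 bit patterns."""
    bits = np.asarray(x, dtype=np.uint8)
    # Shift the 8 bits into the top of a uint16; the low mantissa bits stay zero.
    as_f16 = (bits.astype(np.uint16) << 8).view(np.float16)
    return as_f16.astype(np.float32)


codes = np.array([0x3C, 0x01, 0xC0, 0x7C, 0x7F], dtype=np.uint8)
decoded = e5m2_to_float32_via_float16(codes)
```

For the smallest subnormal, `0x01`, this gives 2^-16, which matches the commented logic: the exponent starts at 127 - 15 = 112, one normalization shift lowers it to 111, and 2^(111 - 127) = 2^-16.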


Signed-off-by: Justin Chu <justinchuby@users.noreply.github.com> Signed-off-by: Justin Chu <justinchu@microsoft.com>

onnx/reference/op_run.py

-            y[i] = d
-        return y.reshape(shape)
+            data = np.array(tensor.int32_data, dtype=np.uint8)
+        data = data.view(dtype_mapping[elem_type])


onnx/reference/op_run.py

-        for i, d in enumerate(data):
-            y[i] = d
+        dtype_mapping = {TensorProto.INT4: int4, TensorProto.UINT4: uint4}
+        dtype = dtype_mapping[elem_type]


onnx/reference/op_run.py

-            y[i] = d
+        dtype_mapping = {TensorProto.INT4: int4, TensorProto.UINT4: uint4}
+        dtype = dtype_mapping[elem_type]
+        return subbyte.unpack_int4(data, dims=tensor.dims, signed=signed).view(dtype)


justinchuby · 2024-05-04T19:01:23Z

onnx/numpy_helper.py

+    return result
+
+
+def _small_endian_dtype(dtype) -> np.dtype:


Suggested change

-def _small_endian_dtype(dtype) -> np.dtype:
+def _little_endian_dtype(dtype) -> np.dtype:
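
For context on what such a helper is for, here is a minimal sketch of reading little-endian raw bytes in a way that also works on big-endian hosts. The body of `little_endian_dtype` below is an assumption for illustration and may differ from the helper in this PR.

```python
import numpy as np


def little_endian_dtype(dtype) -> np.dtype:
    """Return `dtype` with an explicit little-endian byte order."""
    return np.dtype(dtype).newbyteorder("<")


# 1.0 and 2.0 encoded as little-endian float32 bytes.
raw = b"\x00\x00\x80\x3f\x00\x00\x00\x40"

# Interpret the buffer as little-endian regardless of the host byte order ...
values = np.frombuffer(raw, dtype=little_endian_dtype(np.float32))
# ... then convert to the host's native byte order for further computation.
native = values.astype(np.float32)
```

On a little-endian machine the explicit dtype is the same as the native one, so this costs nothing there.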

Signed-off-by: Justin Chu <justinchuby@users.noreply.github.com>

justinchuby · 2024-05-05T03:34:09Z

For float8 usage, we may be better off using https://github.com/jax-ml/ml_dtypes?

xadupre · 2024-05-06T07:43:31Z

onnx/numpy_helper.py

-    return shift(data.astype(np.int32)).reshape(dims).view(np.float32)  # type: ignore[no-any-return]
-
-
-def _float8e4m3_to_float32_scalar(ival: int, fn: bool, uz: bool) -> np.float32:


I would keep the code of the old function in the documentation. The logic is easier to read so that the new code can be more easily understood.

justinchuby requested a review from a team as a code owner May 3, 2024 02:37

justinchuby changed the title ~~Fix numpy_helper to_array when tensor is complex~~ Fix numpy_helper to_array errors May 3, 2024

justinchuby requested review from xadupre and gramalingam May 3, 2024 03:04

justinchuby added this to the 1.17 milestone May 3, 2024

justinchuby marked this pull request as draft May 3, 2024 14:53

justinchuby changed the title ~~Fix numpy_helper to_array errors~~ Clean up numpy_helper and subbyte May 4, 2024

justinchuby marked this pull request as ready for review May 4, 2024 02:03

justinchuby force-pushed the justinchu/complex-numpy branch 2 times, most recently from 9ba8ff1 to b688548 Compare May 4, 2024 02:07

justinchuby added the better engineering Improve engineering quality of the project label May 4, 2024

justinchuby requested a review from AlexandreEichenberger May 4, 2024 02:11

justinchuby commented May 4, 2024

justinchuby changed the title ~~Clean up numpy_helper and subbyte~~ Clean up and fix numpy_helper and subbyte May 4, 2024

justinchuby commented May 4, 2024

justinchuby commented May 4, 2024

github-advanced-security bot found potential problems May 4, 2024

justinchuby marked this pull request as draft May 4, 2024 04:29

github-advanced-security bot found potential problems May 4, 2024

justinchuby marked this pull request as ready for review May 4, 2024 05:25

justinchuby commented May 4, 2024

github-advanced-security bot found potential problems May 4, 2024

justinchuby and others added 3 commits May 4, 2024 15:37

Fix numpy_helper to_array when tensor is complex

4a7013b

Signed-off-by: Justin Chu <justinchuby@users.noreply.github.com> Signed-off-by: Justin Chu <justinchu@microsoft.com>

Update numpy_helper.py

4da0d00

Signed-off-by: Justin Chu <justinchuby@users.noreply.github.com> Signed-off-by: Justin Chu <justinchu@microsoft.com>

float8

b397529

Signed-off-by: Justin Chu <justinchu@microsoft.com>

justinchuby and others added 14 commits May 4, 2024 15:37

sybbyte

b2f37f5

Signed-off-by: Justin Chu <justinchu@microsoft.com>

int4

0d5311a

Signed-off-by: Justin Chu <justinchu@microsoft.com>

More

756d915

Signed-off-by: Justin Chu <justinchu@microsoft.com>

Remove bench mark

8e5baaa

Signed-off-by: Justin Chu <justinchu@microsoft.com>

Update onnx/numpy_helper.py

c511da9

Signed-off-by: Justin Chu <justinchuby@users.noreply.github.com> Signed-off-by: Justin Chu <justinchu@microsoft.com>

Update onnx/numpy_helper.py

b50d1b7

Signed-off-by: Justin Chu <justinchuby@users.noreply.github.com> Signed-off-by: Justin Chu <justinchu@microsoft.com>

WIP float8e4m3_to_float32

73e2a79

Signed-off-by: Justin Chu <justinchu@microsoft.com>

float8e4m3_to_float32

126da8b

Signed-off-by: Justin Chu <justinchu@microsoft.com>

array

a9d7e76

Signed-off-by: Justin Chu <justinchu@microsoft.com>

dtype

ecb131e

Signed-off-by: Justin Chu <justinchu@microsoft.com>

Fix

a01b2f1

Signed-off-by: Justin Chu <justinchu@microsoft.com>

test_float8_e5m2fnuz_out_of_range

475f513

Signed-off-by: Justin Chu <justinchu@microsoft.com>

float8e5m2_to_float32

93c3448

Signed-off-by: Justin Chu <justinchu@microsoft.com>

Update reference

9fd4909

Signed-off-by: Justin Chu <justinchu@microsoft.com>

justinchuby force-pushed the justinchu/complex-numpy branch from 15aec39 to 9fd4909 Compare May 4, 2024 15:37

github-advanced-security bot found potential problems May 4, 2024

justinchuby requested a review from a team as a code owner May 4, 2024 16:01

Ref tests

4c18a15

Signed-off-by: Justin Chu <justinchu@microsoft.com>

justinchuby force-pushed the justinchu/complex-numpy branch from cd51e75 to 4c18a15 Compare May 4, 2024 16:04

justinchuby added 2 commits May 4, 2024 16:07

Fix sign mask

c4545cf

Signed-off-by: Justin Chu <justinchu@microsoft.com>

Use put mask to handle multi-d masks

01d1f5d

Signed-off-by: Justin Chu <justinchu@microsoft.com>

justinchuby marked this pull request as draft May 4, 2024 16:22

justinchuby marked this pull request as ready for review May 4, 2024 16:26

justinchuby mentioned this pull request May 4, 2024

[IR] Allow tensor created with numpy unsupported dtypes microsoft/onnxscript#1441

Merged

justinchuby commented May 4, 2024

docs

64be57c

Signed-off-by: Justin Chu <justinchuby@users.noreply.github.com>

xadupre reviewed May 6, 2024
