Fixed int8 quantization and added experimental mixed int8/int16 quantization #228

Open

mikljohansson wants to merge 5 commits into master
Conversation

mikljohansson

@mikljohansson mikljohansson commented Sep 12, 2020

Thanks for providing these examples of working with Yolo v4/v3!

I managed to fix the int8 quantization by adding a model.compile() call, which resolves the "optimize global tensors" exception. I was also able to remove the supported_ops override by following the quantization examples that TensorFlow Lite provides.
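
Roughly, the conversion now follows the standard TensorFlow Lite post-training quantization flow. A simplified sketch (the checkpoint path and the dummy calibration data are placeholders; the real convert_tflite.py loads actual calibration images):

    import numpy as np
    import tensorflow as tf

    def representative_data_gen():
        # Placeholder calibration data; convert_tflite.py reads real images instead
        for _ in range(10):
            yield [np.random.uniform(0.0, 1.0, size=(1, 416, 416, 3)).astype(np.float32)]

    model = tf.keras.models.load_model('./checkpoints/yolov4-416')
    model.compile()  # works around the "optimize global tensors" exception

    converter = tf.lite.TFLiteConverter.from_keras_model(model)
    converter.optimizations = [tf.lite.Optimize.DEFAULT]
    converter.representative_dataset = representative_data_gen

    tflite_model = converter.convert()
    with open('./checkpoints/yolov4-416-int8.tflite', 'wb') as f:
        f.write(tflite_model)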

FYI: I'm currently trying to port these models to the K210 / MaixPy MCU, but so far I haven't managed to get nncase to fully consume the tflite files (it doesn't support the SPLIT and DEQUANTIZE tflite op codes).

Note: This only works on the latest tf-nightly (2.4.0+). It doesn't work on tensorflow-2.3.0

It doesn't fully quantize currently, since the network uses some non-quantizable ops (EXP). I've not looked further into that yet.

Best regards,
Mikael

@JimBratsos

Hello, thanks for the great work. Will the above fixes fully quantize the yolov4/v3 model so that it can run on a TPU?

@mikljohansson
Author

Hello, thanks for the great work. Will the above fixes fully quantize the yolov4/v3 model so that it can run on a TPU?

No, unfortunately it doesn't fully quantize currently, since the network uses some non-quantizable ops (EXP). I've not looked further into that yet.

When I try with

    converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
    converter.target_spec.supported_types = [tf.int8, tf.uint8]
    converter.inference_input_type = tf.uint8
    converter.inference_output_type = tf.int8
    converter.representative_dataset = representative_data_gen

I get

RuntimeError: Quantization not yet supported for op: 'EXP'.
Quantization not yet supported for op: 'EXP'.
Quantization not yet supported for op: 'EXP'.
Quantization not yet supported for op: 'EXP'.
Quantization not yet supported for op: 'EXP'.
Quantization not yet supported for op: 'EXP'.
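
For reference, the error only appears when forcing integer-only ops like above. With the default op set (float fallback) the conversion itself goes through, but ops like EXP stay in float, which is why the model doesn't fully quantize. A sketch of that fallback configuration, reusing the converter and representative_data_gen from above:

    # Integer quantization with float fallback: don't force TFLITE_BUILTINS_INT8
    # or the int8/uint8 inference types, so non-quantizable ops (EXP) remain float
    converter.optimizations = [tf.lite.Optimize.DEFAULT]
    converter.representative_dataset = representative_data_gen
    tflite_model = converter.convert()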

@raryanpur

raryanpur commented Sep 23, 2020

@mikljohansson, running your patch set (pulling your forked repo) gives the following error in my environment

  File "convert_tflite.py", line 87, in <module>                                          
    app.run(main)                                                                                                                                                                                                                                                                                                           
  File "/Applications/anaconda3/envs/yolov3-tf2-cpu/lib/python3.7/site-packages/absl/app.py", line 300, in run                         
    _run_main(main, args)                                                                                                                                                                                                                                
  File "/Applications/anaconda3/envs/yolov3-tf2-cpu/lib/python3.7/site-packages/absl/app.py", line 251, in _run_main
    sys.exit(main(argv))                                                                                                                                                             
  File "convert_tflite.py", line 82, in main                                                                                                     
    save_tflite()                                                                                                                                                                                                                                                                           
  File "convert_tflite.py", line 56, in save_tflite                          
    tflite_model = converter.convert()                                                                                                                                            
  File "/Applications/anaconda3/envs/yolov3-tf2-cpu/lib/python3.7/site-packages/tensorflow/lite/python/lite.py", line 892, in convert                                                                                                                                                   
    self).convert(graph_def, input_tensors, output_tensors)            
  File "/Applications/anaconda3/envs/yolov3-tf2-cpu/lib/python3.7/site-packages/tensorflow/lite/python/lite.py", line 650, in convert
    result = self._calibrate_quantize_model(result, **flags)                                                                                
  File "/Applications/anaconda3/envs/yolov3-tf2-cpu/lib/python3.7/site-packages/tensorflow/lite/python/lite.py", line 478, in _calibrate_quantize_model                                                    
    inference_output_type, allow_float, activations_type)                                                  
  File "/Applications/anaconda3/envs/yolov3-tf2-cpu/lib/python3.7/site-packages/tensorflow/lite/python/optimize/calibrator.py", line 98, in calibrate_and_quantize
    np.dtype(activations_type.as_numpy_dtype()).num)                 
RuntimeError: Max and min for dynamic tensors should be recorded during calibration: Failed for tensor input_1
Empty min/max for tensor input_1

Since this depends on tf-nightly, perhaps something has changed in the 10 days since you made this PR? I'm using tf-nightly 2.4.0-dev20200918 and Python 3.7.0.

Note that the following also appeared in the debug log, well before the backtrace:

WARNING:tensorflow:No training configuration found in save file, so the model was *not* compiled. Compile it manually.                                                               
W0922 23:03:29.533921 4621229504 load.py:133] No training configuration found in save file, so the model was *not* compiled. Compile it manually.

@mikljohansson
Author

mikljohansson commented Sep 23, 2020

@mikljohansson, running your patch set (pulling your forked repo) gives the following error in my environment

RuntimeError: Max and min for dynamic tensors should be recorded during calibration: Failed for tensor input_1
Empty min/max for tensor input_1

@raryanpur I think the problem might be that some file paths in the calibration dataset are incorrect (e.g. ./data/dataset/val2017.txt). I got this error myself, and it took me a while to figure out that I had the sample image paths wrong 😅

I've improved the error reporting for this now; if you pull and try again it should give you a better message about what's wrong. If it turns out the dataset is missing, there are instructions in the README.md on how to download it.
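
Roughly, the calibration generator just reads real images from that list, so a bad path now fails loudly. A simplified sketch (not the exact convert_tflite.py code, and it assumes each line of val2017.txt starts with an image path):

    import cv2
    import numpy as np

    def representative_data_gen():
        with open('./data/dataset/val2017.txt') as f:
            paths = [line.split()[0] for line in f if line.strip()]
        for path in paths[:100]:
            img = cv2.imread(path)
            if img is None:
                # Fail with a clear message instead of the opaque min/max error
                raise FileNotFoundError('Calibration image not found: ' + path)
            img = cv2.resize(img, (416, 416)).astype(np.float32) / 255.0
            yield [img[np.newaxis, ...]]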

Best,
Mikael

@raryanpur

Ah that did the trick - thanks @mikljohansson, works now!

@raryanpur

raryanpur commented Sep 25, 2020

@mikljohansson when using this quantized model, how are the inputs and outputs scaled? My understanding is that the inputs are still floats, but the values must be scaled from [0.0, 255.0] to [-128.0, 127.0]. Do the outputs (score and box tensor values) need to be scaled as well?

@YLTsai0609

Hi @mikljohansson, thanks for your great work. After running your modification, I got my yolov3_int_8.tflite model to work.

The message shown was:

[{'name': 'input_1', 'index': 549, 'shape': array([  1, 416, 416,   3], dtype=int32), 'shape_signature': array([ -1, 416, 416,   3], dtype=int32), 'dtype': <class 'numpy.float32'>, 'quantization': (0.0, 0), 'quantization_parameters': {'scales': array([], dtype=float32), 'zero_points': array([], dtype=int32), 'quantized_dimension': 0}, 'sparsity_parameters': {}}]
[{'name': 'Identity', 'index': 550, 'shape': array([    1, 10647,     4], dtype=int32), 'shape_signature': array([ 1, -1,  4], dtype=int32), 'dtype': <class 'numpy.float32'>, 'quantization': (0.0, 0), 'quantization_parameters': {'scales': array([], dtype=float32), 'zero_points': array([], dtype=int32), 'quantized_dimension': 0}, 'sparsity_parameters': {}}, {'name': 'Identity_1', 'index': 551, 'shape': array([    1, 10647,     3], dtype=int32), 'shape_signature': array([ 1, -1,  3], dtype=int32), 'dtype': <class 'numpy.float32'>, 'quantization': (0.0, 0), 'quantization_parameters': {'scales': array([], dtype=float32), 'zero_points': array([], dtype=int32), 'quantized_dimension': 0}, 'sparsity_parameters': {}}]

My question is: why does it say int32? We actually did int8 quantization, right?
Is there any resource about this?
Much appreciated, and thanks again for your nice work.

@mikljohansson
Author

@mikljohansson when using this quantized model, how are the inputs and outputs scaled? My understanding is that the inputs are still floats, but the values must be scaled from [0.0, 255.0] to [-128.0, 127.0]. Do the outputs (score and box tensor values) need to be scaled as well?

@raryanpur sorry for not getting back sooner, the e-mail got lost in my inbox :(

I honestly don't know, sorry. I haven't dug into the input/output scaling and haven't worked on this model for a while (focusing on other things right now). Hopefully you've been able to work it out already :)
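
In general though, TF Lite uses real_value = scale * (quantized_value - zero_point), and you can read the scale and zero point for the input and output tensors from the interpreter if they end up quantized. A rough sketch (the model filename is a placeholder):

    import tensorflow as tf

    interpreter = tf.lite.Interpreter(model_path='./checkpoints/yolov4-416-int8.tflite')
    interpreter.allocate_tensors()

    for detail in interpreter.get_input_details() + interpreter.get_output_details():
        scale, zero_point = detail['quantization']
        # For a quantized tensor: real_value = scale * (quantized_value - zero_point)
        # A scale of 0.0 means the tensor is not quantized and takes plain float values
        print(detail['name'], detail['dtype'], scale, zero_point)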

@mikljohansson
Author

Hi @mikljohansson, thanks for your great work. After running your modification, I got my yolov3_int_8.tflite model to work.

The message shown was:

[{'name': 'input_1', 'index': 549, 'shape': array([  1, 416, 416,   3], dtype=int32), 'shape_signature': array([ -1, 416, 416,   3], dtype=int32), 'dtype': <class 'numpy.float32'>, 'quantization': (0.0, 0), 'quantization_parameters': {'scales': array([], dtype=float32), 'zero_points': array([], dtype=int32), 'quantized_dimension': 0}, 'sparsity_parameters': {}}]
[{'name': 'Identity', 'index': 550, 'shape': array([    1, 10647,     4], dtype=int32), 'shape_signature': array([ 1, -1,  4], dtype=int32), 'dtype': <class 'numpy.float32'>, 'quantization': (0.0, 0), 'quantization_parameters': {'scales': array([], dtype=float32), 'zero_points': array([], dtype=int32), 'quantized_dimension': 0}, 'sparsity_parameters': {}}, {'name': 'Identity_1', 'index': 551, 'shape': array([    1, 10647,     3], dtype=int32), 'shape_signature': array([ 1, -1,  3], dtype=int32), 'dtype': <class 'numpy.float32'>, 'quantization': (0.0, 0), 'quantization_parameters': {'scales': array([], dtype=float32), 'zero_points': array([], dtype=int32), 'quantized_dimension': 0}, 'sparsity_parameters': {}}]

My question is: why does it say int32? We actually did int8 quantization, right?
Is there any resource about this?
Much appreciated, and thanks again for your nice work.

Honestly I'm not sure why that is. It could be because the network doesn't quantize fully (due to the EXP operator mentioned in earlier comments on this PR).

Perhaps you could try uncommenting these lines in convert_tflite.py and see if it makes a difference:

    #converter.inference_input_type = tf.uint8
    #converter.inference_output_type = tf.int8

This flag might set all intermediate weights and calculations to 8-bit, but I don't think it would work currently because the network can't be fully quantized:

    converter.target_spec.supported_types = [tf.int8]
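
You could also list every tensor in the converted model to see which ones actually got int8 parameters and which stayed float around the EXP ops. A rough sketch (note that the dtype=int32 in your printout is just the dtype of the numpy shape arrays, while the tensor dtype itself is reported as float32):

    import tensorflow as tf

    interpreter = tf.lite.Interpreter(model_path='yolov3_int_8.tflite')
    interpreter.allocate_tensors()

    # Inspect every tensor's dtype and (scale, zero_point) quantization parameters
    for t in interpreter.get_tensor_details():
        print(t['name'], t['dtype'], t['quantization'])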

@Arfinul

Arfinul commented Jan 8, 2021

After converting a customized (not COCO) yolov3-tiny into .tflite format, I executed the command below:

    python convert_tflite.py --weights ./checkpoints/yolov4-416 --output ./checkpoints/yolov4-416-int8.tflite --quantize_mode int8 --dataset ./coco_dataset/coco/val207.txt

./checkpoints/yolov4-416 is not a COCO model; it comes from a customized/different dataset.

  1. Please suggest: should I still use ./coco_dataset/coco/val207.txt?
  2. If not, how can I convert my dataset from YOLO annotation format to the format of val207.txt?

@spalani7

RuntimeError: Quantization not yet supported for op: 'EXP'.
Quantization not yet supported for op: 'EXP'.
Quantization not yet supported for op: 'EXP'.
Quantization not yet supported for op: 'EXP'.
Quantization not yet supported for op: 'EXP'.
Quantization not yet supported for op: 'EXP'.

I also have the above error. @mikljohansson, were you able to fix this? If yes, can you provide the solution? Thanks!
