
trt.create_inference_graph step in detection.ipynb stuck for long time #77

Open
roarjn opened this issue Jun 1, 2020 · 5 comments
roarjn commented Jun 1, 2020

Hi,
In detection.ipynb I have set score_threshold=0.3 as recommended. The cells above run as expected; however, the trt.create_inference_graph cell never finishes. When I run it, top shows a Python process at 100% CPU, then CPU utilization drops to 0 while the cell keeps running. I have left it running for more than 30 minutes.
https://github.com/NVIDIA-AI-IOT/tf_trt_models/blob/master/examples/detection/detection.ipynb

Appreciate any help.
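One way to tell whether the cell is genuinely hung or just converting slowly is to raise TensorFlow's C++ log verbosity before importing it and watch the terminal that launched the notebook. A minimal sketch (these are standard TF 1.x environment variables, not something from the notebook itself):

```python
import os

# Surface TensorFlow's C++ progress messages (e.g. the TF-TRT segment logs).
# Both variables must be set before the `import tensorflow` statement runs.
os.environ["TF_CPP_MIN_LOG_LEVEL"] = "0"   # 0 = show INFO and above
os.environ["TF_CPP_MIN_VLOG_LEVEL"] = "1"  # extra verbosity from VLOG(1) call sites

# import tensorflow as tf  # import only after setting the variables

print(os.environ["TF_CPP_MIN_LOG_LEVEL"], os.environ["TF_CPP_MIN_VLOG_LEVEL"])
```

With these set, the conversion prints per-segment progress to the terminal, so a silent multi-minute gap points at a real hang rather than slow conversion.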


dkatsios commented Jun 5, 2020

I have a similar issue. I am trying to run detection.ipynb on a Jetson Nano (JetPack 4.3, Python 3.6, TensorFlow 1.15), but when it reaches trt.create_inference_graph() it hangs for several minutes and then the kernel restarts. Memory usage is 3.3/3.9 GB and swap is almost empty. Last terminal output:

2020-06-05 23:51:45.473972: I tensorflow/compiler/tf2tensorrt/convert/convert_graph.cc:633] Number of TensorRT candidate segments: 2
2020-06-05 23:51:45.688493: F tensorflow/core/util/device_name_utils.cc:92] Check failed: IsJobName(job)
[I 23:55:25.776 NotebookApp] KernelRestarter: restarting kernel (1/5), keep random ports
WARNING:root:kernel bc86b93e-4a68-4470-a522-7bdfd2c6f95a restarted

Appreciate any help.
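For context, the fatal `Check failed: IsJobName(job)` fires while TensorFlow parses a fully qualified device string of the form /job:&lt;name&gt;/replica:&lt;n&gt;/task:&lt;n&gt;/device:&lt;type&gt;:&lt;n&gt;; an empty or malformed job segment aborts the whole process (hence the kernel restart) instead of raising a Python exception. A rough illustration of the format being validated (the regex is an approximation for illustration, not TensorFlow's actual parser in device_name_utils.cc):

```python
import re

# Approximate shape of a fully qualified TF device name; TensorFlow's real
# parser lives in tensorflow/core/util/device_name_utils.cc and aborts on
# a bad job segment via a CHECK rather than raising an error.
DEVICE_NAME = re.compile(
    r"^/job:(?P<job>[a-z][a-z0-9_]*)"
    r"/replica:(?P<replica>\d+)"
    r"/task:(?P<task>\d+)"
    r"/device:(?P<type>[A-Za-z_]+):(?P<index>\d+)$"
)

def is_valid_device_name(name):
    """Return True if `name` looks like a fully qualified device string."""
    return DEVICE_NAME.match(name) is not None

print(is_valid_device_name("/job:localhost/replica:0/task:0/device:GPU:0"))  # True
print(is_valid_device_name("/job:/replica:0/task:0/device:GPU:0"))           # False: empty job
```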

@evil-potato

Hello, have you ever solved this problem? I am encountering the same issue.


@sachinkmohan

sachinkmohan commented Oct 11, 2021

(quoting @dkatsios's report above)

The kernel gets restarted in my case too.
I raised an issue with NVIDIA, but their solution didn't work for me.

My current settings are:
TF 1.15.5
TensorRT 8.0.0
Ubuntu 18.04

It seems a lot of people are facing this issue when trying to optimize the frozen graph using TensorRT.

Repository owners, please fix this bug.

@sachinkmohan

Here is the solution to this issue. @dkatsios @roarjn @evil-potato

Add one new parameter, force_nms_cpu=False, to the code below; it is not present in this repository's version of the code. Also make sure you have the right TF and JetPack versions installed.

frozen_graph, input_names, output_names = build_detection_graph(
    config=config_path,
    checkpoint=checkpoint_path,
    force_nms_cpu=False,
    #score_threshold=0.3,
    batch_size=1
)

When I looked closely at the Jupyter terminal, the error pointed to something like this:
Tensorflow TensorRT: Could not load dynamic library 'libnvinfer.so.5'
which led me to the links below.
tensorflow/tensorflow#34329
https://forums.developer.nvidia.com/t/tf-trt-error-on-jetson-nano/187611
https://forums.developer.nvidia.com/t/error-while-converting-object-detection-model-to-tensorrt/117127
tensorflow/tensorrt#197
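That missing-library error can be checked from plain Python before attempting the conversion. A minimal sketch using the standard library's ctypes.util.find_library (the library names are the ones TF-TRT tries to load; trt_libs_visible is just an illustrative helper, not part of the notebook):

```python
from ctypes.util import find_library

def trt_libs_visible(names=("nvinfer", "nvinfer_plugin")):
    """Report which TensorRT shared libraries the dynamic linker can find."""
    return {name: find_library(name) for name in names}

# On a correctly configured Jetson, each entry resolves to a soname such as
# libnvinfer.so.8; an entry of None means TF-TRT will fail to dlopen the
# library at conversion time, matching the "Could not load dynamic library"
# message above.
print(trt_libs_visible())
```

If an entry comes back None, fix the TensorRT installation (or LD_LIBRARY_PATH) before retrying trt.create_inference_graph, rather than waiting on the hung cell.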
