Hello!
I have a few questions regarding nvOCDR model deployment via Triton on the Jetson Orin NX (JetPack 5.1.2). I have been following the information (here and here) on the tao-5.0 branch to deploy the non-ViT based models. For more context, I have also been tracking this topic on the DeepStream forum here.
GPU usage query - Currently, on launching the Triton Inference Server with the nvOCDR model Python backend, I see the following log regarding model initialization.
Regarding the usage of pynvjpeg, I get the following CUDA error from the server. Any insights on this?
root@ubuntu:/enhancement# python3 client.py -d /data/images/test_img/ -bs 1 --url localhost:8001
/usr/local/lib/python3.8/dist-packages/tritongrpcclient/__init__.py:33: DeprecationWarning: The package `tritongrpcclient` is deprecated and will be removed in a future version. Please use instead `tritonclient.grpc`
warnings.warn(
[nvOCDR] Find total 2 images in /data/images/test_img/
Initializing CUDA
NvMMLiteBlockCreate : Block : BlockType = 256
[JPEG Decode] BeginSequence Display WidthxHeight 1118x1063
NvMMLiteBlockCreate : Block : BlockType = 1
[nvOCDR] Processing for: /data/images/test_img/scene_text.jpg, image size: (1063, 1118, 3)
Traceback (most recent call last):
File "client.py", line 147, in <module>
results = triton_client.infer(model_name=args.model_name,
File "/usr/local/lib/python3.8/dist-packages/tritonclient/grpc/_client.py", line 1572, in infer
raise_error_grpc(rpc_error)
File "/usr/local/lib/python3.8/dist-packages/tritonclient/grpc/_utils.py", line 77, in raise_error_grpc
raise get_error_grpc(rpc_error) from None
tritonclient.utils.InferenceServerException: [StatusCode.INTERNAL] Failed to process the request(s) for model instance 'nvOCDR', message: LogicError: cuFuncSetBlockShape failed: invalid resource handle
At:
/usr/local/lib/python3.8/dist-packages/pycuda/driver.py(481): function_call
/opt/nvocdr/ocdr/triton/utils/cuda_resize_keep_AR.py(169): image_resize
/opt/nvocdr/ocdr/triton/utils/process.py(87): preprocess
/opt/nvocdr/ocdr/triton/models/nvOCDR/1/model.py(160): execute
[JPEG Decode] NvMMLiteJPEGDecBlockPrivateClose done
[JPEG Decode] NvMMLiteJPEGDecBlockClose done
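A note on the traceback above (an assumption on my part, not a confirmed diagnosis): in PyCUDA, "invalid resource handle" from cuFuncSetBlockShape often means a kernel function handle was created under one CUDA context but launched from a thread where that context is not current - which can happen when Triton's Python backend calls execute() from a different thread than initialize(). One common mitigation is to build the kernel lazily in the thread that actually launches it. Below is a toy sketch of that thread-local lazy-initialization pattern; compile_kernel() is a hypothetical stand-in for the real pycuda.compiler.SourceModule(...).get_function(...) call, so the sketch runs without CUDA:

```python
import threading

def compile_kernel():
    # Hypothetical stand-in for compiling a PyCUDA SourceModule and
    # extracting a kernel function handle; in the real backend this
    # would be SourceModule(src).get_function("resize_kernel").
    return object()

_tls = threading.local()

def get_kernel():
    # Lazily (re)compile in the calling thread, so the function handle
    # belongs to whatever context is current when the kernel launches.
    if not hasattr(_tls, "kernel"):
        _tls.kernel = compile_kernel()
    return _tls.kernel

# Each thread gets its own handle; repeated calls in one thread reuse it.
handles = []

def worker():
    handles.append(get_kernel())

threads = [threading.Thread(target=worker) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

With real PyCUDA, the equivalent idea is to make sure the module is compiled (or the context is pushed) in the thread that performs the launch, rather than caching a function handle created at model-initialization time.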
Inference speed - Inference without pynvjpeg works fine; however, the per-file inference time reported by nvOCDR itself (printed on the server) is usually 100 - 200 ms. Image sizes vary between ~300x300 and 1200x1000. Is this inference time expected?
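To make the timing question more concrete, it may help to split the 100 - 200 ms into per-stage numbers (decode, preprocess, inference, postprocess) before deciding whether it is expected. A minimal sketch of per-stage wall-clock instrumentation; the stage functions here are hypothetical stand-ins for the real nvOCDR pipeline steps, with sleeps simulating work:

```python
import time
from contextlib import contextmanager

timings = {}

@contextmanager
def stage(name):
    # Accumulate wall-clock time per pipeline stage.
    t0 = time.perf_counter()
    try:
        yield
    finally:
        timings[name] = timings.get(name, 0.0) + (time.perf_counter() - t0)

# Hypothetical stand-ins for the real pipeline stages:
def preprocess(img):
    time.sleep(0.01)
    return img

def infer(img):
    time.sleep(0.02)
    return img

def postprocess(out):
    return out

with stage("preprocess"):
    x = preprocess("image")
with stage("infer"):
    y = infer(x)
with stage("postprocess"):
    postprocess(y)

for name, t in timings.items():
    print(f"{name}: {t * 1000:.1f} ms")
```

If most of the time lands in preprocess (e.g. the CUDA resize) rather than the engine call itself, that would point in a different direction than raw model latency on the Orin NX.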
On the GPU usage query above: I see that warpInfer() / warpInferPatches() in pybind.cpp copy data from the host to the GPU, and GPU usage increases during calls to the server. However, I wanted to confirm whether the nvOCDR model is actually utilizing the GPU for inference on the Jetson Orin NX with JetPack 5.1.2, or whether the model initialization with
CPU device 0
needs to be investigated further. Posting a few references below -
a. How to serve Python models on GPU · Issue #5889 · triton-inference-server/server · GitHub
b. Does Python backend in Triton Server for Jetson supports GPU?
c. Input tensor device placement - Triton
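Related to reference (a): the Triton Python backend exposes a model parameter, FORCE_CPU_ONLY_INPUT_TENSORS, that controls whether input tensors are always moved to CPU before reaching the model. A sketch of the config.pbtxt addition discussed there (assuming the Jetson build honors it; whether GPU tensors are supported in the Python backend on JetPack 5.1.2 is exactly the open question here):

```
parameters: {
  key: "FORCE_CPU_ONLY_INPUT_TENSORS"
  value: { string_value: "no" }
}
```

With "no", inputs may arrive on GPU and the model must handle both placements; the pb_utils.Tensor.is_cpu() check in the backend API can be used to distinguish the two cases.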