Hello!
I have a few questions regarding nvOCDR model deployment via Triton on the Jetson Orin NX (JetPack 5.1.2). I have been following the information (here and here) on the tao-5.0 branch to deploy the non-ViT based models. For more context, I have also been tracking this topic on the DeepStream forum here.
GPU usage query - Currently, on launching the Triton Inference Server with the nvOCDR model Python backend, I see the following log regarding model initialization.
Regarding the usage of pynvjpeg, I get the following CUDA error from the server. Any insights on this?
root@ubuntu:/enhancement# python3 client.py -d /data/images/test_img/ -bs 1 --url localhost:8001
/usr/local/lib/python3.8/dist-packages/tritongrpcclient/__init__.py:33: DeprecationWarning: The package `tritongrpcclient` is deprecated and will be removed in a future version. Please use instead `tritonclient.grpc`
warnings.warn(
[nvOCDR] Find total 2 images in /data/images/test_img/
Initializing CUDA
NvMMLiteBlockCreate : Block : BlockType = 256
[JPEG Decode] BeginSequence Display WidthxHeight 1118x1063
NvMMLiteBlockCreate : Block : BlockType = 1
[nvOCDR] Processing for: /data/images/test_img/scene_text.jpg, image size: (1063, 1118, 3)
Traceback (most recent call last):
File "client.py", line 147, in <module>
results = triton_client.infer(model_name=args.model_name,
File "/usr/local/lib/python3.8/dist-packages/tritonclient/grpc/_client.py", line 1572, in infer
raise_error_grpc(rpc_error)
File "/usr/local/lib/python3.8/dist-packages/tritonclient/grpc/_utils.py", line 77, in raise_error_grpc
raise get_error_grpc(rpc_error) from None
tritonclient.utils.InferenceServerException: [StatusCode.INTERNAL] Failed to process the request(s) for model instance 'nvOCDR', message: LogicError: cuFuncSetBlockShape failed: invalid resource handle
At:
/usr/local/lib/python3.8/dist-packages/pycuda/driver.py(481): function_call
/opt/nvocdr/ocdr/triton/utils/cuda_resize_keep_AR.py(169): image_resize
/opt/nvocdr/ocdr/triton/utils/process.py(87): preprocess
/opt/nvocdr/ocdr/triton/models/nvOCDR/1/model.py(160): execute
[JPEG Decode] NvMMLiteJPEGDecBlockPrivateClose done
[JPEG Decode] NvMMLiteJPEGDecBlockClose done
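A note on the traceback above (an assumption on my part, not a confirmed diagnosis): in PyCUDA, "invalid resource handle" from cuFuncSetBlockShape often means a kernel function handle was created under one CUDA context but launched from a thread where that context is not current - which can happen when Triton's Python backend calls execute() from a different thread than initialize(). One common mitigation is to build the kernel lazily in the thread that actually launches it. Below is a toy sketch of that thread-local lazy-initialization pattern; compile_kernel() is a hypothetical stand-in for the real pycuda.compiler.SourceModule(...).get_function(...) call, so the sketch runs without CUDA:

```python
import threading

def compile_kernel():
    # Hypothetical stand-in for compiling a PyCUDA SourceModule and
    # extracting a kernel function handle; in the real backend this
    # would be SourceModule(src).get_function("resize_kernel").
    return object()

_tls = threading.local()

def get_kernel():
    # Lazily (re)compile in the calling thread, so the function handle
    # belongs to whatever context is current when the kernel launches.
    if not hasattr(_tls, "kernel"):
        _tls.kernel = compile_kernel()
    return _tls.kernel

# Each thread gets its own handle; repeated calls in one thread reuse it.
handles = []

def worker():
    handles.append(get_kernel())

threads = [threading.Thread(target=worker) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

With real PyCUDA, the equivalent idea is to make sure the module is compiled (or the context is pushed) in the thread that performs the launch, rather than caching a function handle created at model-initialization time.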
Inference speed - Inference without pynvjpeg works fine; however, the per-file inference time reported by nvOCDR itself (printed on the server) is usually 100 - 200 ms. Image sizes vary between ~300x300 and 1200x1000. Is this inference time expected?
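To make the timing question more concrete, it may help to split the 100 - 200 ms into per-stage numbers (decode, preprocess, inference, postprocess) before deciding whether it is expected. A minimal sketch of per-stage wall-clock instrumentation; the stage functions here are hypothetical stand-ins for the real nvOCDR pipeline steps, with sleeps simulating work:

```python
import time
from contextlib import contextmanager

timings = {}

@contextmanager
def stage(name):
    # Accumulate wall-clock time per pipeline stage.
    t0 = time.perf_counter()
    try:
        yield
    finally:
        timings[name] = timings.get(name, 0.0) + (time.perf_counter() - t0)

# Hypothetical stand-ins for the real pipeline stages:
def preprocess(img):
    time.sleep(0.01)
    return img

def infer(img):
    time.sleep(0.02)
    return img

def postprocess(out):
    return out

with stage("preprocess"):
    x = preprocess("image")
with stage("infer"):
    y = infer(x)
with stage("postprocess"):
    postprocess(y)

for name, t in timings.items():
    print(f"{name}: {t * 1000:.1f} ms")
```

If most of the time lands in preprocess (e.g. the CUDA resize) rather than the engine call itself, that would point in a different direction than raw model latency on the Orin NX.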
On the GPU usage query above: I see that warpInfer() / warpInferPatches() in pybind.cpp copy data from the host to the GPU, and GPU usage increases during calls to the server. However, I wanted to confirm whether the nvOCDR model is actually utilizing the GPU for inference on the Jetson Orin NX with JetPack 5.1.2, or whether the model initialization with
CPU device 0
needs to be investigated further. Posting a few references below -
a. How to serve Python models on GPU · Issue #5889 · triton-inference-server/server · GitHub
b. Does Python backend in Triton Server for Jetson supports GPU?
c. Input tensor device placement - Triton
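Related to reference (a): the Triton Python backend exposes a model parameter, FORCE_CPU_ONLY_INPUT_TENSORS, that controls whether input tensors are always moved to CPU before reaching the model. A sketch of the config.pbtxt addition discussed there (assuming the Jetson build honors it; whether GPU tensors are supported in the Python backend on JetPack 5.1.2 is exactly the open question here):

```
parameters: {
  key: "FORCE_CPU_ONLY_INPUT_TENSORS"
  value: { string_value: "no" }
}
```

With "no", inputs may arrive on GPU and the model must handle both placements; the pb_utils.Tensor.is_cpu() check in the backend API can be used to distinguish the two cases.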