
nvOCDR deployment with Triton on Jetson #15

Open
abhay-iy97 opened this issue Apr 26, 2024 · 2 comments

@abhay-iy97

Hello!
I have a few questions regarding nvOCDR model deployment via Triton on the Jetson Orin NX (JetPack 5.1.2). I have been following the information (here and here) on the tao-5.0 branch to deploy the non-ViT based models. For more context, I have also been tracking this topic on the DeepStream forum here.

  1. GPU usage query - on launching the Triton Inference Server with the nvOCDR model's Python backend, I see the following log during model initialization.

    I0424 01:24:25.977171 295 python_be.cc:2055] TRITONBACKEND_ModelInstanceInitialize: nvOCDR (CPU device 0)
    

    I see that warpInfer() / warpInferPatches() in pybind.cpp copy data from the host to the GPU, and GPU usage increases during calls to the server. However, I wanted to confirm whether the nvOCDR model is actually using the GPU for inference on the Jetson Orin NX with JP5.1.2, or whether the "CPU device 0" in the model initialization log needs to be investigated further (see the pycuda memory-check sketch after this list).
    Posting a few references below:
    a. How to serve Python models on GPU · Issue #5889 · triton-inference-server/server · GitHub
    b. Does Python backend in Triton Server for Jetson supports GPU?
    c. Input tensor device placement - Triton

  2. Regarding the usage of pynvjpeg, I get the following CUDA error from the server (a context-handling sketch is included after this list). Any insights on this?

    root@ubuntu:/enhancement# python3 client.py -d /data/images/test_img/ -bs 1 --url localhost:8001
    /usr/local/lib/python3.8/dist-packages/tritongrpcclient/__init__.py:33: DeprecationWarning: The package `tritongrpcclient` is deprecated and will be removed in a future version. Please use instead `tritonclient.grpc`
      warnings.warn(
    [nvOCDR] Find total 2 images in /data/images/test_img/
    Initializing CUDA
    NvMMLiteBlockCreate : Block : BlockType = 256 
    [JPEG Decode] BeginSequence Display WidthxHeight 1118x1063
    NvMMLiteBlockCreate : Block : BlockType = 1 
    [nvOCDR] Processing for: /data/images/test_img/scene_text.jpg, image size: (1063, 1118, 3)
    Traceback (most recent call last):
      File "client.py", line 147, in <module>
        results = triton_client.infer(model_name=args.model_name,
      File "/usr/local/lib/python3.8/dist-packages/tritonclient/grpc/_client.py", line 1572, in infer
        raise_error_grpc(rpc_error)
      File "/usr/local/lib/python3.8/dist-packages/tritonclient/grpc/_utils.py", line 77, in raise_error_grpc
        raise get_error_grpc(rpc_error) from None
    tritonclient.utils.InferenceServerException: [StatusCode.INTERNAL] Failed to process the request(s) for model instance 'nvOCDR', message: LogicError: cuFuncSetBlockShape failed: invalid resource handle
    
    At:
      /usr/local/lib/python3.8/dist-packages/pycuda/driver.py(481): function_call
      /opt/nvocdr/ocdr/triton/utils/cuda_resize_keep_AR.py(169): image_resize
      /opt/nvocdr/ocdr/triton/utils/process.py(87): preprocess
      /opt/nvocdr/ocdr/triton/models/nvOCDR/1/model.py(160): execute
    
    [JPEG Decode] NvMMLiteJPEGDecBlockPrivateClose done
    [JPEG Decode] NvMMLiteJPEGDecBlockClose done
    
  3. Inference speed - inference without pynvjpeg works fine; however, the per-file inference time reported by nvOCDR itself (printed on the server) is usually 100-200 ms or more. Image sizes vary between roughly 300x300 and 1200x1000. Is this inference time expected? (A client-side timing sketch is included after this list.)
    [attachment d478fc491520058ceec726461c4d08967404dd8f]
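
For question 1, here is a minimal sketch of the kind of check I have in mind (my own illustration, not nvOCDR code). pycuda, which the nvOCDR Triton utilities already use, can report free device memory, so a similar probe placed around a warpInfer() / warpInferPatches() call should show whether the model is really allocating on the GPU. The image shape below is just a stand-in.

    # Minimal sketch: probe free device memory around an allocation to confirm
    # the Python backend is actually touching the GPU.
    import numpy as np
    import pycuda.driver as cuda
    import pycuda.autoinit  # noqa: F401 - creates and activates a CUDA context

    free_before, total = cuda.mem_get_info()

    # Stand-in for a warpInfer() / warpInferPatches() call; a plain allocation
    # keeps the sketch runnable on its own.
    buf = cuda.mem_alloc(np.zeros((1063, 1118, 3), dtype=np.uint8).nbytes)

    free_after, _ = cuda.mem_get_info()
    print(f"GPU memory delta: {(free_before - free_after) / 1024**2:.2f} MiB "
          f"of {total / 1024**2:.0f} MiB total")

On Jetson the GPU shares physical memory with the CPU, so tegrastats (the GR3D_FREQ field) is another way to watch GPU utilization while requests are in flight.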
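
For question 2, my current guess (unconfirmed) is a CUDA context mismatch: pycuda raises "cuFuncSetBlockShape failed: invalid resource handle" when a kernel is launched while a different context is current than the one the kernel was built in, and pynvjpeg's "Initializing CUDA" line suggests it sets up a context of its own. Below is a minimal sketch, not nvOCDR code, of keeping the kernel's context current around each launch, which is the pattern that avoids that error.

    # Minimal sketch: build a kernel in one context and make sure that same
    # context is current before every launch, so the function handle stays valid.
    import numpy as np
    import pycuda.driver as cuda
    from pycuda.compiler import SourceModule

    cuda.init()
    ctx = cuda.Device(0).make_context()   # context the kernel is compiled in

    mod = SourceModule(r"""
    __global__ void scale(float *x) { x[threadIdx.x] *= 2.0f; }
    """)
    scale = mod.get_function("scale")

    host = np.arange(32, dtype=np.float32)
    dev = cuda.mem_alloc(host.nbytes)
    cuda.memcpy_htod(dev, host)

    ctx.pop()   # control returns elsewhere; another library may switch contexts

    # Re-activate the kernel's own context before launching.
    ctx.push()
    try:
        scale(dev, block=(32, 1, 1), grid=(1, 1))
        cuda.memcpy_dtoh(host, dev)
    finally:
        ctx.pop()

    print(host[:4])  # -> [0. 2. 4. 6.]
    ctx.detach()     # release the context created above

If cuda_resize_keep_AR.py compiles its kernels before pynvjpeg initializes, a push/pop guard like the one above around the image_resize launch is the kind of change I would experiment with.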
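
For question 3, a minimal client-side timing sketch (my own illustration based on the tritonclient.grpc API; the tensor names "input" and "output" are placeholders for whatever the nvOCDR config.pbtxt and client.py actually use), so end-to-end latency can be compared with the per-file time printed on the server:

    # Minimal sketch: time one gRPC inference round trip against the server.
    import time
    import numpy as np
    import tritonclient.grpc as grpcclient

    client = grpcclient.InferenceServerClient(url="localhost:8001")

    img = np.zeros((1, 1063, 1118, 3), dtype=np.uint8)              # stand-in batch
    inp = grpcclient.InferInput("input", list(img.shape), "UINT8")  # placeholder name
    inp.set_data_from_numpy(img)
    out = grpcclient.InferRequestedOutput("output")                 # placeholder name

    t0 = time.perf_counter()
    result = client.infer(model_name="nvOCDR", inputs=[inp], outputs=[out])
    print(f"end-to-end latency: {(time.perf_counter() - t0) * 1e3:.1f} ms")

Triton's perf_analyzer tool can also break latency down (client, queue, compute), which would help tell whether the time is going into the models themselves or into pre/post-processing.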

@Tyler-D
Collaborator

Tyler-D commented Apr 28, 2024

@morganh-nv @Bin-NV to check TritonServer issue

@morganh-nv
Collaborator

We verify on dGPU machines only. You can refer to the dockerfile.
