Hi!
I created a simple ResNet classification model and converted it to ONNX format. I want to measure inference speed on GPU and CPU to select the best option for me.
I use nvcr.io/nvidia/tritonserver:22.03-py.
I use perf_analyzer to measure this speed, and I ran into a problem: creating multiple model instances on CPU decreases inference speed.
You might not have sufficient CPU resources to do the work. You can use top or similar tools to see what is happening with your CPU, your RAM, and whatever else your model needs. Note that perf_analyzer also runs on the CPU, so it can compete with the model instances for resources. More discussion here: #5108
If you are trying to figure out the optimal number of GPU or CPU instances, that question is best answered by Model Analyzer.
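The competition described above can be sketched with a toy time-sharing model. This is a deliberate simplification, not Triton's actual scheduler: real behavior also depends on ONNX Runtime's intra-op thread pools (each instance brings its own), cache contention, and NUMA effects.

```python
# Toy model of CPU time-sharing, NOT Triton's real scheduler: once the
# number of busy model instances exceeds the number of physical cores,
# the cores are shared and each inference slows down proportionally.

def effective_latency_ms(base_latency_ms: float, instances: int, cores: int) -> float:
    """Per-inference latency when `instances` CPU-bound instances share `cores` cores."""
    oversubscription = max(1.0, instances / cores)
    return base_latency_ms * oversubscription

# Example: a 20 ms model on a 2-core machine.
for n in (1, 2, 4):
    print(n, "instances ->", effective_latency_ms(20.0, n, cores=2), "ms")
```

Under this model, 4 instances on 2 cores roughly double per-inference latency while adding no throughput, which matches the symptom reported here. In practice the slowdown can be worse than linear, because each ONNX Runtime instance may spawn its own thread pool and oversubscribe the cores even further.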
When I create the model with the following parameters:
and run perf_analyzer, I get these results:
Terminal output 4 Instances
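The model parameters themselves were not captured in this thread. For illustration only, a config.pbtxt fragment that creates four CPU instances of a model uses Triton's standard instance_group syntax:

```
instance_group [
  {
    count: 4
    kind: KIND_CPU
  }
]
```

Changing `count` to 1 (or removing the block, since one instance is the default) gives the single-instance configuration compared below.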
When I create the model with 1 CPU instance:
perf_analyzer returns the following:
Terminal output 1 CPU Instance
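To compare the two runs numerically instead of eyeballing terminal output, perf_analyzer can export results to CSV with its -f flag. A small parser might look like this; the column names are assumptions based on typical perf_analyzer output, so verify them against the header row of your own file:

```python
# Compare perf_analyzer runs from their CSV exports (perf_analyzer -f run.csv).
# The "Inferences/Second" column name is an assumption; adjust it to match
# the header of the CSV your perf_analyzer version actually writes.
import csv
import io

def best_throughput(csv_text: str, col: str = "Inferences/Second") -> float:
    """Return the highest throughput across all concurrency levels in one run."""
    rows = csv.DictReader(io.StringIO(csv_text))
    return max(float(row[col]) for row in rows)

# Synthetic data standing in for two real exports (hypothetical numbers):
four_instances = "Concurrency,Inferences/Second\n1,55.2\n2,60.1\n"
one_instance = "Concurrency,Inferences/Second\n1,80.4\n2,95.7\n"
print(best_throughput(four_instances), best_throughput(one_instance))
```

If the single-instance run wins at every concurrency level, that is consistent with CPU oversubscription rather than a measurement artifact.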
Can you please explain why increasing the number of CPU instances increases inference time?
Thanks a lot in advance!