
GPU Acceleration Doesn't Work with CUDA 12 #316

Open · yeldarby opened this issue Mar 10, 2024 · 1 comment
Labels: bug (Something isn't working)

Comments

@yeldarby (Contributor)

Search before asking

  • I have searched the Inference issues and found no similar bug report.

Bug

If you install inference-gpu on a machine with CUDA 12, it complains and falls back to CPU execution mode.

2024-03-10 21:00:56.403012156 [W:onnxruntime:Default, onnxruntime_pybind_state.cc:640 CreateExecutionProviderInstance] Failed to create CUDAExecutionProvider. Please reference https://onnxruntime.ai/docs/execution-providers/CUDA-ExecutionProvider.html#requirements to ensure all dependencies are met.

There's a special version of onnxruntime needed for CUDA 12: https://onnxruntime.ai/docs/install/

Ideally we'd detect this automatically and make it "just work" with CUDA 12. Alternatively, we could at least let the user know why they're not getting GPU acceleration.
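One sketch of the "let the user know" option (hypothetical code, not anything in inference today; the function names are made up, and the detection just shells out to nvcc / nvidia-smi, so it's best-effort):

import re
import shutil
import subprocess
import warnings


def detect_cuda_major_version():
    """Best-effort CUDA major version from system tools, or None.

    Tries `nvcc --version` first, then falls back to the
    "CUDA Version: X.Y" banner printed by `nvidia-smi`.
    """
    if shutil.which("nvcc"):
        out = subprocess.run(["nvcc", "--version"], capture_output=True, text=True).stdout
        match = re.search(r"release (\d+)\.", out)
        if match:
            return int(match.group(1))
    if shutil.which("nvidia-smi"):
        out = subprocess.run(["nvidia-smi"], capture_output=True, text=True).stdout
        match = re.search(r"CUDA Version:\s*(\d+)\.", out)
        if match:
            return int(match.group(1))
    return None


def warn_on_cuda12_mismatch():
    # As of this writing, onnxruntime-gpu wheels on PyPI are built against
    # CUDA 11; the CUDA 12 builds come from a separate package index (see
    # https://onnxruntime.ai/docs/install/). If we're on CUDA 12, say so
    # instead of silently benchmarking on CPU.
    if detect_cuda_major_version() == 12:
        warnings.warn(
            "CUDA 12 detected: the installed onnxruntime-gpu is likely a "
            "CUDA 11 build, so the CUDAExecutionProvider will fail to load "
            "and inference will fall back to CPU. See "
            "https://onnxruntime.ai/docs/install/ for the CUDA 12 build."
        )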

Environment

root@C.10017133:/$ pip freeze | grep inference
inference-cli==0.9.15
inference-gpu==0.9.15
inference-sdk==0.9.15
root@C.10017133:/$ 

pytorch/pytorch:2.2.0-cuda12.1-cudnn8-devel Docker image, after running pip install inference-gpu


Minimal Reproducible Example

pip install inference-gpu
inference benchmark python-package-speed -m "yolov8n-640"

Additional

No response

Are you willing to submit a PR?

  • Yes, I'd like to help by submitting a PR!
yeldarby added the bug label Mar 10, 2024
@hvaria (Contributor) commented Mar 16, 2024

To resolve the issue where inference-gpu defaults to CPU execution on CUDA 12 systems due to an onnxruntime compatibility problem, consider extending the platform.py script to detect the CUDA version. This can be done by invoking system commands to extract the CUDA version and including that value in the dictionary returned by retrieve_platform_specifics. Once detection is in place, consult it in benchmark_adapter.py before running benchmarks or initializing models that require GPU support: if CUDA 12 is detected, ensure the version of onnxruntime-gpu that supports CUDA 12 is used. A rough sketch follows.
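Roughly like this, maybe (untested sketch; get_cuda_version is a hypothetical helper, and I've elided the existing keys in retrieve_platform_specifics' return dict since I haven't checked its actual shape):

import re
import shutil
import subprocess


def get_cuda_version():
    """Best-effort CUDA version from the nvidia-smi banner (e.g. "12.1"), else None."""
    if shutil.which("nvidia-smi"):
        out = subprocess.run(["nvidia-smi"], capture_output=True, text=True).stdout
        match = re.search(r"CUDA Version:\s*([\d.]+)", out)
        if match:
            return match.group(1)
    return None


def retrieve_platform_specifics():
    return {
        # ... existing platform fields, unchanged ...
        "cuda_version": get_cuda_version(),
    }

benchmark_adapter.py could then read the "cuda_version" entry up front and raise a clear error (or select the matching onnxruntime-gpu build) instead of letting onnxruntime fall back to CPU without explanation.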
