
Really bad Performance with YOLOv8 #12638

SpaceFlier100 opened this issue May 13, 2024 · 6 comments
Labels
question Further information is requested

Comments

@SpaceFlier100

Search before asking

Question

Running YOLOv8 on a laptop with a 4070, I get around 1-2 FPS. Task Manager shows GPU usage at around 20%, and the GPU is never fully utilised. I have tried resizing the input stream, but that has no visible effect on frame times, even at a resolution as low as 360x360.
My code:

import cv2
from ultralytics import YOLO

model = YOLO('runs/detect/yolov8n_v8_100e/weights/best.pt')
model.to('cuda')
cam = cv2.VideoCapture(0)

while True:
    result, img = cam.read()
    results = model(img, conf=0.2, show=True)

Additional

No response

@SpaceFlier100 added the question label May 13, 2024

👋 Hello @SpaceFlier100, thank you for your interest in Ultralytics YOLOv8 🚀! We recommend a visit to the Docs for new users where you can find many Python and CLI usage examples and where many of the most common questions may already be answered.

If this is a 🐛 Bug Report, please provide a minimum reproducible example to help us debug it.

If this is a custom training ❓ Question, please provide as much information as possible, including dataset image examples and training logs, and verify you are following our Tips for Best Training Results.

Join the vibrant Ultralytics Discord 🎧 community for real-time conversations and collaborations. This platform offers a perfect space to inquire, showcase your work, and connect with fellow Ultralytics users.

Install

Pip install the ultralytics package including all requirements in a Python>=3.8 environment with PyTorch>=1.8.

pip install ultralytics

Environments

YOLOv8 may be run in any of our up-to-date verified environments (with all dependencies including CUDA/CUDNN, Python and PyTorch preinstalled).

Status

If the Ultralytics CI badge is green, all Ultralytics CI tests are currently passing. CI tests verify correct operation of all YOLOv8 Modes and Tasks on macOS, Windows, and Ubuntu every 24 hours and on every commit.

@glenn-jocher
Member

@SpaceFlier100 hi there! It sounds frustrating to deal with such performance issues. 🙁 Let's try a few things to improve your GPU utilization:

  1. Ensure that CUDA is properly configured - Sometimes, the system may not effectively delegate the task to the GPU, despite calling .to('cuda').

  2. Minimize data transfer overhead - Running inference in a loop that transfers images from CPU to GPU one at a time might be slowing you down. Try minimizing this overhead or batch processing if possible (see the streaming sketch after this list).

  3. Adjust inference settings - Tweak parameters like batch size, number of workers in data loading, or enable features like Automatic Mixed Precision (amp) to possibly enhance the performance:

import cv2
from ultralytics import YOLO

model = YOLO('runs/detect/yolov8n_v8_100e/weights/best.pt', amp=True)  # Enabling AMP
model.to('cuda')
cam = cv2.VideoCapture(0)

while True:
    result, img = cam.read()
    results = model(img, conf=0.2, show=True, batch=8)  # Adjust the batch size as needed
  4. Check and update drivers - Sometimes, older GPU drivers might lead to poor utilization. Ensure your NVIDIA drivers are up-to-date.
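
On point 2, a low-effort option is to let ultralytics drive the capture loop itself. A minimal sketch, assuming a recent ultralytics release where predict() accepts an integer webcam index as source plus a stream=True generator mode:

from ultralytics import YOLO

model = YOLO('runs/detect/yolov8n_v8_100e/weights/best.pt')

# stream=True yields results frame by frame instead of building a list,
# and the built-in loader handles capture and preprocessing for you
for result in model(source=0, conf=0.2, stream=True, show=True, device='cuda'):
    pass  # per-frame results are available here if further processing is needed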

See if these changes help, and please reach back out if you continue experiencing issues or have any more questions!

@SpaceFlier100
Copy link
Author

Hi, I tried implementing some of your recommendations. My interpreter doesn't recognise the amp keyword, so I'm getting a TypeError. I'm afraid I need low latency for my application, so I can't batch process. I have the latest drivers. Also, can you elaborate on "Ensure that CUDA is properly configured"? I'm not sure what you mean by that.

@glenn-jocher
Copy link
Member

Hello @SpaceFlier100! Thanks for trying out the suggestions and providing feedback. Let's resolve these issues step by step:

  1. AMP Support: The TypeError occurs because the YOLO constructor doesn't accept an amp keyword. Unfortunately, AMP can't be enabled directly through the YOLO API as of now. My apologies for the incorrect advice there! (A possible FP16 alternative follows this list.)

  2. CUDA Configuration: Ensuring CUDA is properly configured typically involves checking that PyTorch is using the available GPU. You can check this by running:

    import torch
    print(torch.cuda.is_available())  # Should return True
    print(torch.cuda.current_device())  # Should show your GPU index, usually 0
    print(torch.cuda.get_device_name(0))  # Should display your GPU name
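
On the AMP point above: if the goal was lower-precision inference for speed, recent ultralytics releases accept a half argument at predict time (worth verifying against your installed version, so treat this as an assumption rather than a guarantee):

results = model(img, conf=0.2, show=True, half=True)  # FP16 inference, if supported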

Even without batch processing, you can still work on keeping per-frame overhead down. Reducing the resolution, as you've tried, might not be enough if the fixed cost of setting up each frame's detection is high.

If your setup confirms that CUDA is being used but the GPU still isn’t fully utilized, consider profiling your code to identify any bottlenecks. Tools like NVIDIA’s Nsight Systems can be quite revealing for this purpose.
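
Before reaching for Nsight, a lighter-weight first pass is plain Python timing around each stage of your loop. This sketch adapts your script and drops show=True so that window rendering doesn't pollute the inference measurement:

import time
import cv2
from ultralytics import YOLO

model = YOLO('runs/detect/yolov8n_v8_100e/weights/best.pt')
model.to('cuda')
cam = cv2.VideoCapture(0)

while True:
    t0 = time.perf_counter()
    ok, img = cam.read()  # camera capture
    if not ok:
        break
    t1 = time.perf_counter()
    results = model(img, conf=0.2)  # inference only
    t2 = time.perf_counter()
    print(f'capture {(t1 - t0) * 1e3:.1f} ms, inference {(t2 - t1) * 1e3:.1f} ms')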

Hope this helps, and let's keep the optimizations going! 😊

@SpaceFlier100
Copy link
Author

The CUDA checks all output what they should. Upon looking into Nsight, it seems to be a graphics debugger rather than an AI tool; can it be applied to AI applications as well? Do you have any other tests or suggestions to determine why my GPU isn't fully used? The only place the code hangs is when results = model(img, conf=0.2, show=True) is called; everywhere else it is lightning fast.

@glenn-jocher
Copy link
Member

Hello @SpaceFlier100! Great job diving into CUDA checks and using Nsight for debugging! 🚀 Indeed, while Nsight Systems is primarily targeted towards debugging and profiling applications at the system level, it can definitely be applied to AI applications to visualize compute and memory usage, which can help identify bottlenecks in your application.

If you're noticing that the code hangs specifically at the model inference step (results = model(img, conf=0.2, show=True)), it could be valuable to check whether data loading or preprocessing is indirectly causing delays. Given that the GPU isn't fully used, you might want to ensure that the image data is pre-loaded and ready for the GPU without waiting; a sketch of one way to do that follows.
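
Here is a hedged sketch of that idea: a background thread keeps grabbing frames so cam.read() never blocks inference, and only the most recent frame is kept (the grab helper is just illustrative, not an ultralytics API):

import threading
import cv2
from ultralytics import YOLO

latest = {'frame': None}

def grab(cam):
    # continuously overwrite the latest frame; stale frames are simply dropped
    while True:
        ok, frame = cam.read()
        if ok:
            latest['frame'] = frame

cam = cv2.VideoCapture(0)
threading.Thread(target=grab, args=(cam,), daemon=True).start()

model = YOLO('runs/detect/yolov8n_v8_100e/weights/best.pt')
model.to('cuda')

while True:
    img = latest['frame']
    if img is None:
        continue  # the grabber hasn't produced a frame yet
    results = model(img, conf=0.2, show=True)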

Another quick thing to check is whether any other processes are competing for GPU resources. Simplifying the model or reducing the resolution were good initial tests but might not address the root cause if it’s related to how the incoming data is managed.

If these suggestions don't help, running a simplified version of your script with minimal overhead can help isolate whether the issue is framework-related or specific to your current setup; a sketch of such a test is below. Keep pushing; you're on the right track! 👍
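
As a minimal isolation test (a sketch, with 'test.jpg' standing in for any local image), run inference repeatedly on one fixed frame with no camera and no display. If this reports tens of milliseconds per call on a 4070, the bottleneck is on the capture/display side rather than in the framework:

import time
import cv2
from ultralytics import YOLO

model = YOLO('runs/detect/yolov8n_v8_100e/weights/best.pt')
model.to('cuda')

img = cv2.imread('test.jpg')  # hypothetical local image; any frame will do
model(img, conf=0.2)          # warm-up call (CUDA init, first-run overhead)

n = 100
t0 = time.perf_counter()
for _ in range(n):
    model(img, conf=0.2)
print(f'{(time.perf_counter() - t0) / n * 1e3:.1f} ms per frame')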
