Really bad Performance with YOLOv8 #12638
Comments
👋 Hello @SpaceFlier100, thank you for your interest in Ultralytics YOLOv8 🚀! We recommend a visit to the Docs for new users, where you can find many Python and CLI usage examples and where many of the most common questions may already be answered.

If this is a 🐛 Bug Report, please provide a minimum reproducible example to help us debug it.

If this is a custom training ❓ Question, please provide as much information as possible, including dataset image examples and training logs, and verify you are following our Tips for Best Training Results.

Join the vibrant Ultralytics Discord 🎧 community for real-time conversations and collaborations. This platform offers a perfect space to inquire, showcase your work, and connect with fellow Ultralytics users.

Install
pip install ultralytics

Environments
YOLOv8 may be run in any of the following up-to-date verified environments (with all dependencies including CUDA/CUDNN, Python and PyTorch preinstalled):

Status
If this badge is green, all Ultralytics CI tests are currently passing. CI tests verify correct operation of all YOLOv8 Modes and Tasks on macOS, Windows, and Ubuntu every 24 hours and on every commit.
@SpaceFlier100 hi there! It sounds frustrating to deal with such performance issues. 🙁 Let's try a couple of things to optimize your model's utilization:
import cv2
from ultralytics import YOLO

model = YOLO('runs/detect/yolov8n_v8_100e/weights/best.pt', amp=True)  # Enabling AMP
model.to('cuda')  # Move the model to the GPU

cam = cv2.VideoCapture(0)  # Open the default webcam
while True:
    result, img = cam.read()  # result is False when no frame could be read
    results = model(img, conf=0.2, show=True, batch=8)  # Adjust the batch size as needed
See if these changes help, and please reach back out if you continue experiencing issues or have any more questions!
Hi, I tried implementing some of your recommendations. My interpreter doesn't recognise the amp keyword, so I'm getting a TypeError. I'm afraid I need low latency for my application, so I can't batch process. I have the latest drivers. Can you elaborate on "Ensure that CUDA is properly configured"? I'm not sure what you mean by that.
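For anyone landing here with the same question, one common reading of "CUDA is properly configured" is simply that the installed PyTorch build can see the GPU. The snippet below is a minimal sketch of that check, not an official Ultralytics diagnostic. The helper name describe_cuda is made up for illustration, and it assumes PyTorch is the backend in use (it reports that fact instead of failing if torch is not installed).

```python
def describe_cuda():
    """Return a small dict describing whether PyTorch can see a CUDA GPU.

    describe_cuda is a hypothetical helper name used only for this sketch.
    """
    try:
        import torch
    except ImportError:
        # PyTorch is not installed in this interpreter at all.
        return {"torch_installed": False}

    info = {
        "torch_installed": True,
        # CUDA version the PyTorch wheel was built against (None for CPU-only builds).
        "torch_cuda_build": torch.version.cuda,
        # True only if a usable CUDA device and driver were found.
        "cuda_available": torch.cuda.is_available(),
    }
    if info["cuda_available"]:
        info["device_name"] = torch.cuda.get_device_name(0)
    return info


if __name__ == "__main__":
    print(describe_cuda())
```

If torch_cuda_build comes back as None, the installed wheel is a CPU-only build, and up-to-date drivers alone will not change that.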
Hello @SpaceFlier100! Thanks for trying out the suggestions and providing feedback. Let's resolve these issues step by step:
Continue to ensure your system's utilization is efficient despite not being able to batch process. Sometimes reducing the resolution, as you've tried, might not be enough if the overhead of setting up each frame's detection is high. If your setup confirms that CUDA is being used but the GPU still isn't fully utilized, consider profiling your code to identify any bottlenecks. Tools like NVIDIA's Nsight Systems can be quite revealing for this purpose. Hope this helps, and let's keep the optimizations going! 😊
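Before reaching for a full profiler, a rough first pass is to time each stage of the per-frame loop separately. The sketch below only illustrates that idea: StageTimer, fake_capture and fake_infer are hypothetical names, and the two fake functions stand in for the real cam.read() and model(img) calls so the snippet runs without a camera or GPU. Keep in mind that GPU work can be queued asynchronously, so wall-clock numbers like these are only an approximation.

```python
import time
from collections import defaultdict


class StageTimer:
    """Accumulates wall-clock time per named stage of a frame loop."""

    def __init__(self):
        self.totals = defaultdict(float)
        self.counts = defaultdict(int)

    def measure(self, name, fn, *args, **kwargs):
        """Call fn, record how long it took under `name`, and return its result."""
        start = time.perf_counter()
        out = fn(*args, **kwargs)
        self.totals[name] += time.perf_counter() - start
        self.counts[name] += 1
        return out

    def report(self):
        """Return the mean seconds per call for every stage seen so far."""
        return {name: self.totals[name] / self.counts[name] for name in self.totals}


def fake_capture():
    # Hypothetical stand-in for cam.read().
    time.sleep(0.002)
    return "frame"


def fake_infer(frame):
    # Hypothetical stand-in for model(frame, ...).
    time.sleep(0.01)
    return ["detection"]


def run(n_frames=5):
    timer = StageTimer()
    for _ in range(n_frames):
        frame = timer.measure("capture", fake_capture)
        timer.measure("inference", fake_infer, frame)
    return timer


if __name__ == "__main__":
    for stage, seconds in run().report().items():
        print(f"{stage}: {seconds * 1000:.1f} ms per frame")
```

Swapping the stand-ins for the real capture and inference calls should show which stage accounts for most of the frame time.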
CUDA is outputting everything found. Looking into Nsight, it seems to be a graphics debugger rather than an AI tool; can it be applied to AI applications as well? Do you have any other tests or suggestions to determine why my GPU isn't fully used? The only gap where the code hangs is when
Hello @SpaceFlier100! Great job diving into CUDA checks and using Nsight for debugging! 🚀 Indeed, while Nsight Systems is primarily targeted towards debugging and profiling applications at the system level, it can definitely be applied to AI applications to visualize compute and memory usage, which can help identify bottlenecks in your application.

If you're noticing that the code hangs specifically at the model inference step ( Another quick thing to check is whether any other processes are competing for GPU resources. Simplifying the model or reducing the resolution were good initial tests but might not address the root cause if it's related to how the incoming data is managed.

If these suggestions don't help, running a simplified version of your script with minimal overhead can help isolate whether the issue is framework-related or specific to your current setup. Keep pushing; you're on the right track! 👍
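On the point about how incoming data is managed: one pattern sometimes used for low-latency webcam loops is to read frames on a background thread and always hand the newest frame to the detector, so inference never waits on the camera. This is only a sketch of that pattern, and nothing in the thread confirms it is the cause here. LatestFrameReader and make_fake_camera are hypothetical names, and the fake camera stands in for a real capture device so the snippet runs anywhere.

```python
import itertools
import threading
import time


class LatestFrameReader:
    """Reads frames on a background thread and keeps only the newest one."""

    def __init__(self, read_fn):
        # read_fn must return (ok, frame), the same shape as cv2.VideoCapture.read.
        self._read_fn = read_fn
        self._lock = threading.Lock()
        self._frame = None
        self._stopped = threading.Event()
        self._thread = threading.Thread(target=self._loop, daemon=True)

    def start(self):
        self._thread.start()
        return self

    def _loop(self):
        while not self._stopped.is_set():
            ok, frame = self._read_fn()
            if not ok:
                break  # the source reported that no frame could be read
            with self._lock:
                self._frame = frame  # overwrite, so stale frames are dropped

    def latest(self):
        """Return the most recent frame, or None if nothing has been read yet."""
        with self._lock:
            return self._frame

    def stop(self):
        self._stopped.set()
        self._thread.join(timeout=1.0)


def make_fake_camera():
    """Hypothetical stand-in for a camera: yields increasing frame numbers."""
    counter = itertools.count()

    def read():
        time.sleep(0.001)  # pretend the camera takes a moment per frame
        return True, next(counter)

    return read


if __name__ == "__main__":
    reader = LatestFrameReader(make_fake_camera()).start()
    time.sleep(0.05)
    print("latest frame number:", reader.latest())
    reader.stop()
```

With OpenCV, passing cam.read as read_fn should fit the same interface, since it also returns an (ok, frame) pair.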
Search before asking
Question
Running YOLOv8 on a laptop with a 4070, I get around 1 or 2 FPS. Task Manager shows GPU usage at around 20%, and it never reaches full utilisation. I have tried resizing the input stream, but that has no visible effect on frame times, even going as low as a 360x360 resolution.
My code:
Additional
No response