Add/Implement NUMA awareness #1093

GermanAizek · 2024-01-22T10:03:32Z

Current and Expected Behavior

obs-StreamFX/source/util/util-threadpool.hpp

Line 119 in bbcce86

    
           threadpool(size_t minimum = 2, size_t maximum = std::thread::hardware_concurrency());

obs-StreamFX/components/ffmpeg/source/encoders/encoder-ffmpeg.cpp

Line 207 in bbcce86

_context->thread_count = static_cast<int>(std::thread::hardware_concurrency());

obs-StreamFX/components/ffmpeg/source/encoders/encoder-ffmpeg.cpp

Line 1085 in bbcce86

    
           auto p = obs_properties_add_int_slider(grp, ST_KEY_FFMPEG_THREADS, D_TRANSLATE(ST_I18N_FFMPEG_THREADS), 0, static_cast<int64_t>(std::thread::hardware_concurrency()) * 2, 1);

Steps to Reproduce the Problem

For more detailed explanation issue and fixes bug using my PR changes, you can read further at link here: notepad-plus-plus/notepad-plus-plus#14627

Log files & Crash Dumps

No response

Any additional Information we need to know?

No response

Xaymar · 2024-01-22T13:16:01Z

Seems fine as it is. It isn't returning 0, so nothing breaks.

GermanAizek · 2024-01-22T13:22:41Z

On my configuration hes return 36, not 72.

Xaymar · 2024-01-23T17:52:05Z

I think I get what's going on, and why you're expecting a different value. You have a NUMA-enabled system with multiple CPUs, which you incorrectly assume to act as free additional performance. However in order to actually take advantage of this NUMA design (even the fake NUMA design on AMD Ryzen), we'd need to be fully aware of it.

Why? Because the OS scheduler will pin our threads and resources to the (fake) NUMA node that is closest to the primary resource. In StreamFX case, this is the CPU closest to the GPU. The other CPU will be deprioritized by the OS scheduler, so more threads won't run faster automatically. If anything, any memory allocated on the wrong CPU will end up significantly slower than normal.

The value you see reported here, 36, is the correct value for NUMA unaware software. Since there is no reason to support NUMA architectures, there's no reason to change this. If you believe that an encoder would work better with more threads, or you believe that the micro-tasks really require more cores (on average there's 1-2 tasks per minute), then you're free to implement the required NUMA awareness code.

Xaymar · 2024-01-23T17:54:14Z

Also you can use the Custom Settings field to override what the UI offers for the FFmpeg encoders. Pretty sure there's no encoder out there that can effectively make use of that many threads in real time anyway.

GermanAizek · 2024-01-25T15:50:48Z

@Xaymar,
well, for correct detection, do not need to rewrite a lot. It is enough to write your own function taking into detect logical cores, I think it will be useful for your project. I can do PR for you to merge, if u dont really want to do it yourself. Without additional half threads, my CPU load does not grow more than ~55-60% for entire OS.

Xaymar · 2024-01-25T16:15:15Z

You do need to rewrite a lot for NUMA awareness. What is already being detected is correct for NUMA unaware applications. You can manually override the threads value using Custom Settings as already stated.

GermanAizek · 2024-01-25T16:19:04Z

I understand, u don't know ffmpeg itself can use all logical cores, then I need to do PR there to fix it?

Xaymar · 2024-01-25T19:14:49Z

You would need to make FFmpeg and StreamFX and OBS Studio fully NUMA aware on all platforms. This means that you need to replace new/delete, make_shared/make_unique, etc. with NUMA aware allocations close to where the primary resource related to the task is. Anything else will reduce performance.

Again, you can simply override the number of threads to the number of threads in your system. That's what Custom Settings is for. Note that Windows' incorrectly reports Core/Thread pairs as two cores, and thus fully relies on the CPU to report full utilization. A 36 Core system with 72 Threads at 50% is actually 100% utilized on Windows, but it can still do some additional work using unused or stalled pipelines. Threads are not more cores, just subdivisions of the Core that happen while the primary work on the Core is stalled.

AFAIK, only Intel properly reports Thread usage. AMD incorrectly claims Threads are at 0% usage if nothing is running on them or can run on them.

GermanAizek · 2024-01-26T16:20:02Z

@Xaymar,
Indeed, u are right. For example, in the x64dbg debugger, rewriting VirtualAllocExNuma here really gave good performance boost.
x64dbg/x64dbg#3307

Xaymar · 2024-01-26T16:49:35Z

Indeed. And for OBS Studio, and by extension StreamFX, to run better, you would need to allocate memory and threads close to the resource in question. This would mean allocating GPU-reliant data near the NUMA node that has the GPU, network-reliant data near the NUMA node that has the NIC, and disk related data near the NUMA node that has the disk. All for a tiny gain in performance that 99.9% of users will never need, and the remaining 0.1% could simply use Custom Settings and type in -threads=NUMBERHERE and be done in StreamFX.

Not to mention that it would actually slow down OBS Studio and StreamFX for users that do not have a NUMA system, since additional care has to be taken on every single allocation. With the amount of data StreamFX throws around per frame, and the amount of data OBS Studio throws around per frame, you'd end up slower than if you just used the Custom Settings as described above.

GermanAizek added the bug Bugs such as crashing, freezing, broken functionality, etc. label Jan 22, 2024

Xaymar added enhancement Enhancements and removed bug Bugs such as crashing, freezing, broken functionality, etc. labels Jan 22, 2024

Xaymar changed the title ~~Wrong detect all threads in multi-CPU motherboards~~ Add/Implement NUMA awareness Jan 23, 2024

karliss mentioned this issue Jan 29, 2024

Add support to Windows NUMA core count rizinorg/rizin#4167

Draft

5 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add/Implement NUMA awareness #1093

Add/Implement NUMA awareness #1093

GermanAizek commented Jan 22, 2024

Xaymar commented Jan 22, 2024

GermanAizek commented Jan 22, 2024

Xaymar commented Jan 23, 2024

Xaymar commented Jan 23, 2024

GermanAizek commented Jan 25, 2024

Xaymar commented Jan 25, 2024

GermanAizek commented Jan 25, 2024

Xaymar commented Jan 25, 2024

GermanAizek commented Jan 26, 2024 •

edited

Xaymar commented Jan 26, 2024

Add/Implement NUMA awareness #1093

Add/Implement NUMA awareness #1093

Comments

GermanAizek commented Jan 22, 2024

Current and Expected Behavior

Steps to Reproduce the Problem

Log files & Crash Dumps

Any additional Information we need to know?

Xaymar commented Jan 22, 2024

GermanAizek commented Jan 22, 2024

Xaymar commented Jan 23, 2024

Xaymar commented Jan 23, 2024

GermanAizek commented Jan 25, 2024

Xaymar commented Jan 25, 2024

GermanAizek commented Jan 25, 2024

Xaymar commented Jan 25, 2024

GermanAizek commented Jan 26, 2024 • edited

Xaymar commented Jan 26, 2024

GermanAizek commented Jan 26, 2024 •

edited