Created attachment 1933971 [details]
libvirt domain xml

Description of problem:

Using IOThreads gives better performance for virtio-based disks. Ideally each IOThread is bound to a specific CPU and stays on that CPU. In the current implementation one CPU is specified automatically, but checking which CPUs the IOThreads actually run on shows that this setting is not honored for all of them. Additionally, when using multiple IOThreads, one would like to specify more than one CPU (one CPU per IOThread).

Version-Release number of selected component (if applicable):

$ oc version
Client Version: 4.10.45
Server Version: 4.10.45

How reproducible:

Use a VM with the attached configuration and check which CPUs the IOThreads run on while the guest does its I/O.

The domxml for the VM specifies:

  <emulatorpin cpuset='25'/>
  <iothreadpin iothread='1' cpuset='25'/>

but checking where the IOThreads actually run shows that only IOThread 1 uses that pinning; the others can run on almost all CPUs:

# ps -T -p 832716
    PID    SPID TTY          TIME CMD
 832716  832716 ?        00:00:10 qemu-kvm
 832716  832724 ?        00:00:00 IO iothread1
 832716  832725 ?        00:00:01 IO iothread2
 832716  832726 ?        00:01:18 IO iothread3
 832716  832727 ?        00:02:33 IO iothread4
 832716  832728 ?        00:00:05 IO iothread5

# taskset -cp 832724
pid 832724's current affinity list: 25
[root@worker-0 tools]# taskset -cp 832725
pid 832725's current affinity list: 1-25,28-111,113-136,140-223
[root@worker-0 tools]# taskset -cp 832726
pid 832726's current affinity list: 1-25,28-111,113-136,140-223
[root@worker-0 tools]# taskset -cp 832727
pid 832727's current affinity list: 1-25,28-111,113-136,140-223
[root@worker-0 tools]# taskset -cp 832728
pid 832728's current affinity list: 1-25,28-111,113-136,140-223

Expected results:

The IOThreads should run on the CPU(s) defined in the domain XML.

Additional info:

Ideally, the CPUs for the emulator pin can be specified in the VM.yml.
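The per-thread affinities above can also be collected in one pass by reading /proc directly, without running taskset per thread. A minimal sketch; it uses the shell's own PID ($$) as a stand-in, which you would replace with the qemu-kvm PID from `ps -T` (832716 in this report):

```shell
#!/bin/sh
# Print TID, thread name, and CPU affinity for every thread of a process.
# Sketch only: substitute the qemu-kvm PID for "$$".
pid=$$
for task in /proc/"$pid"/task/*; do
    tid=$(basename "$task")
    name=$(cat "$task/comm")
    # Cpus_allowed_list is the same list that `taskset -cp` reports.
    affinity=$(awk '/^Cpus_allowed_list/ {print $2}' "$task/status")
    echo "$tid $name $affinity"
done
```

On the host in this report, a correctly pinned setup would show every "IO iothreadN" line ending in a single CPU number rather than a wide range.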
This is important since you would want those threads to run on the NUMA socket where the storage controller is also located.
It might be worthwhile to point out that the following settings were used: isolateEmulatorThread: true, and dedicatedIOThread: true for each disk.
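For reference, a minimal sketch of how those two settings appear in a KubeVirt VirtualMachineInstance spec. The field names follow the KubeVirt API; the disk name and surrounding values are illustrative only:

```yaml
# Illustrative fragment; "datadisk" is a made-up disk name.
spec:
  domain:
    cpu:
      dedicatedCpuPlacement: true
      isolateEmulatorThread: true   # pins the emulator thread (and, as observed
                                    # in this bug, only iothread1) to an extra pCPU
    devices:
      disks:
        - name: datadisk
          dedicatedIOThread: true   # requests a separate iothread for this disk,
                                    # but does not pin it to any CPU
          disk:
            bus: virtio
```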
In the KubeVirt docs:

https://kubevirt.io/user-guide/virtual_machines/disks_and_volumes/#high-performance-features

the "IOThreads with Dedicated (pinned) CPUs" section appears to describe exactly the behavior you observed.

Vladik, is there a way to achieve what Nils expected to happen?
(In reply to sgott from comment #4)
> In the KubeVirt docs
>
> https://kubevirt.io/user-guide/virtual_machines/disks_and_volumes/#high-performance-features
>
> For the "IOThreads with Dedicated (pinned) CPUs" section, it appears the
> described behavior matches your description of what happened.
>
> Vladik, is there a way to achieve what Nils expected to happen?

I think it's a bug that we are pinning only the first of the created IOThreads instead of all of them. We should pin all of them to the over-allocated pCPU.

That said, I think there is no benefit in creating multiple threads when the emulator thread and the IOThread are isolated on a separate pCPU.
@vromanso

> I think it's a bug that we are pinning only the first of the created IOThreads instead of all of them. We should pin all of them to the over-allocated pCPU.
> That said, I think there is no benefit in creating multiple threads when the emulator thread and the IOThread are isolated on a separate pCPU.

What about a use case where you have two devices that sit on different NUMA nodes and hence would require different IOThreads?
We are pushing this bug to 4.15, considering its severity and the team's capacity.
This issue looks like a feature request to me; I don't see a bug.

The isolateEmulatorThread CPU option pins the emulator iothread (iothread1 in your case) to a physical CPU. That option appears to work correctly.

The dedicatedIOThread disk option, however, merely requests a separate iothread for each disk but does not pin it to anything. That option appears to work correctly as well!

The documentation in https://kubevirt.io/api-reference/main/definitions.html reflects the above.

I do agree, though, that the lack of NUMA-aware behavior in KubeVirt is really unfortunate. We should prefer allocating all CPUs and memory for a given guest on the same NUMA node, and pinning the iothread of a disk to its NUMA node would make a lot of sense. Those are definitely things I'd like to see in future releases, but again, not a bug IMO.
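For comparison, the behavior the reporter expected would correspond to a libvirt <cputune> block with one <iothreadpin> element per IOThread, along these lines (the CPU numbers for iothreads 2-5 are illustrative, not what KubeVirt currently generates):

```xml
<cputune>
  <emulatorpin cpuset='25'/>
  <iothreadpin iothread='1' cpuset='25'/>
  <!-- hypothetical: pin the remaining IOThreads as well, ideally to CPUs
       on the NUMA node where the backing storage controller sits -->
  <iothreadpin iothread='2' cpuset='26'/>
  <iothreadpin iothread='3' cpuset='27'/>
  <iothreadpin iothread='4' cpuset='28'/>
  <iothreadpin iothread='5' cpuset='29'/>
</cputune>
```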
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 120 days