Bug 838070
Summary: | host cpu offline then online, vcpupin guest vcpu to the online cpu will fail | ||
---|---|---|---|
Product: | Red Hat Enterprise Linux 7 | Reporter: | EricLee <bili> |
Component: | libvirt | Assignee: | Martin Kletzander <mkletzan> |
Status: | CLOSED ERRATA | QA Contact: | yalzhang <yalzhang> |
Severity: | medium | Docs Contact: | |
Priority: | medium | ||
Version: | 7.0 | CC: | dyasny, dyuan, gsun, jdee, jdenemar, jsuchane, lsu, mkletzan, mzhan, rbalakri, rwu, whuang, xuzhang, yalzhang, zhpeng |
Target Milestone: | rc | Keywords: | Reopened, TestOnly |
Target Release: | --- | ||
Hardware: | x86_64 | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | If docs needed, set a value | |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2018-04-10 10:33:22 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Bug Depends On: | 947004 | ||
Bug Blocks: | 1098106 |
Description
EricLee
2012-07-06 11:56:02 UTC
*** Bug 846894 has been marked as a duplicate of this bug. ***

The problem is the same as Bug 748885, Bug 714271, etc. I'm closing it as a dup, feel free to reopen (and reassign on kernel) if the problem still persists, thanks.

*** This bug has been marked as a duplicate of bug 748885 ***

This bug can also be reproduced in latest libvirt and kernel:

# rpm -q libvirt kernel
libvirt-0.10.2-29.el6.x86_64
kernel-2.6.32-422.el6.x86_64

# virsh start r6
# virsh dumpxml r6
<domain type='kvm' id='2'>
<name>r6</name>
<uuid>a8042aca-ab5f-e1d2-cff1-98266854f75b</uuid>
<memory unit='KiB'>1048576</memory>
<currentMemory unit='KiB'>1048576</currentMemory>
<vcpu placement='static'>4</vcpu>
...

# virsh vcpupin r6
VCPU: CPU Affinity
----------------------------------
0: 0-7
1: 0-7
2: 0-7
3: 0-7

# echo 0 >/sys/devices/system/cpu/cpu7/online
# virsh vcpupin r6 0 7
error: Requested operation is not valid: failed to set cpuset.cpus in cgroup for vcpu 0

# echo 1 >/sys/devices/system/cpu/cpu7/online
# virsh vcpupin r6 0 7
error: Requested operation is not valid: failed to set cpuset.cpus in cgroup for vcpu 0

# virsh vcpupin r6 0 6
# virsh vcpupin r6
VCPU: CPU Affinity
----------------------------------
0: 6
1: 0-7
2: 0-7
3: 0-7

Hi Martin, could you take a look at this problem?

Try reproducing this without libvirt and if it's still a problem, post it to the bug this has been marked duplicate of (bug 748885) and reopen if applicable.

1. check cgroup mount points
# cat /proc/mounts | grep cgroup
cgroup /cgroup/cpuset cgroup rw,relatime,cpuset 0 0
cgroup /cgroup/cpu cgroup rw,relatime,cpu 0 0
cgroup /cgroup/cpuacct cgroup rw,relatime,cpuacct 0 0
cgroup /cgroup/memory cgroup rw,relatime,memory 0 0
cgroup /cgroup/devices cgroup rw,relatime,devices 0 0
cgroup /cgroup/freezer cgroup rw,relatime,freezer 0 0
cgroup /cgroup/net_cls cgroup rw,relatime,net_cls 0 0
cgroup /cgroup/blkio cgroup rw,relatime,blkio 0 0

check qemu-kvm process id
# pidof qemu-kvm
2736

2. create cgroup point for qemu-kvm process
# mkdir /cgroup/cpuset/r6
# cat /cgroup/cpuset/r6/cpuset.cpus

# echo 0-7 > /cgroup/cpuset/r6/cpuset.cpus
# echo 0 > /cgroup/cpuset/r6/cpuset.mems
# echo `pidof qemu-kvm` > /cgroup/cpuset/r6/tasks

3. stop cpu3
# echo 0 >/sys/devices/system/cpu/cpu3/online
# cat /cgroup/cpuset/r6/cpuset.cpus
0-2,4-7

since cpu3 is down, vcpu could not be pinned to cpu3
# echo 3 > /cgroup/cpuset/r6/cpuset.cpus
-bash: echo: write error: Invalid argument

4. restart cpu3
# echo 1 >/sys/devices/system/cpu/cpu3/online

we can pin vcpu to cpu3
# echo 3 > /cgroup/cpuset/r6/cpuset.cpus
# echo $?
0

As above, without libvirt, pinning a vcpu to a restored cpu works well.

After some testing of cgroups and libvirt vcpupin, I found that this bug resides in libvirt: libvirtd does not set cpuset.cpus recursively when performing the vcpupin operation. The cpuset.cpus is hierarchical for a domain vcpu:

/cgroup/cpuset/libvirt/qemu/r6/cpuset.cpus

If one cpu is logically hot-unplugged, every cpuset.cpus on that path drops this cpu, like:

# echo 0 >/sys/devices/system/cpu/cpu3/online
# cat /cgroup/cpuset/cpuset.cpus
0-2,4-7
# cat /cgroup/cpuset/libvirt/cpuset.cpus
0-2,4-7
# cat /cgroup/cpuset/libvirt/qemu/cpuset.cpus
0-2,4-7
# cat /cgroup/cpuset/libvirt/qemu/r6/cpuset.cpus
0-2,4-7
# cat /cgroup/cpuset/libvirt/qemu/r6/vcpu0/cpuset.cpus
0-2,4-7

But if that cpu is brought back online, only the topmost cpuset.cpus is restored:

# echo 1 >/sys/devices/system/cpu/cpu3/online
# cat /cgroup/cpuset/cpuset.cpus
0-7
# cat /cgroup/cpuset/libvirt/cpuset.cpus
0-2,4-7
# cat /cgroup/cpuset/libvirt/qemu/cpuset.cpus
0-2,4-7
# cat /cgroup/cpuset/libvirt/qemu/r6/cpuset.cpus
0-2,4-7
# cat /cgroup/cpuset/libvirt/qemu/r6/vcpu0/cpuset.cpus
0-2,4-7

Libvirtd only writes the vcpupin value to /cgroup/cpuset/libvirt/qemu/r6/vcpu0/cpuset.cpus, and that write fails with "Permission denied", as seen in libvirtd.log:

2013-10-24 08:37:54.213+0000: 2413: debug : virCgroupSetValueStr:331 : Set value '/cgroup/cpuset/libvirt/qemu/r6/vcpu0/cpuset.cpus' to '3'
2013-10-24 08:37:54.213+0000: 2413: debug : virFileClose:72 : Closed fd 22
2013-10-24 08:37:54.213+0000: 2413: debug : virCgroupSetValueStr:335 : Failed to write value '3': Permission denied
2013-10-24 08:37:54.213+0000: 2413: error : qemuSetupCgroupEmulatorPin:519 : Unable to set cpuset.cpus: Permission denied
2013-10-24 08:37:54.214+0000: 2413: debug : qemudDomainPinVcpuFlags:4423 : Requested operation is not valid: failed to set cpuset.cpus in cgroup for vcpu 0

So if I adjust the upper cpuset.cpus, vcpupin can pass:

# echo 0-7 > /cgroup/cpuset/libvirt/cpuset.cpus
# echo 0-7 > /cgroup/cpuset/libvirt/qemu/cpuset.cpus
# echo 0-7 > /cgroup/cpuset/libvirt/qemu/r6/cpuset.cpus
# virsh vcpupin r6 0 3
# echo $?
0

So I think I should re-open this libvirt bug.

Unfortunately, this is not the same as what libvirt does. I'm afraid you'll have to do it a little bit differently. The problem is that the bug in kernel was related to hierarchical cgroups. Let me try to get it reproducible and I'll get back to you with more info.

This bug was not selected to be addressed in Red Hat Enterprise Linux 6. We will look at it again within the Red Hat Enterprise Linux 7 product.

*** Bug 1098101 has been marked as a duplicate of this bug. ***

*** Bug 1098106 has been marked as a duplicate of this bug. ***

Hi Martin,

Here is another scenario that hits this failure:

1. Set 'cpuset' in <vcpu> element:
# virsh dumpxml test|grep cpu
<vcpu placement='static' cpuset='1'>1</vcpu>
# virsh start test

2. Do vcpupin or emulatorpin
# virsh vcpupin test 0 0
error: Requested operation is not valid: failed to set cpuset.cpus in cgroup for vcpu 0

# virsh emulatorpin test 0
error: Requested operation is not valid: failed to set cpuset.cpus in cgroup for emulator threads

or set 'cpuset' in <vcpu> element and set 'cpuset' in <cputune> element, then try to start domain
# virsh dumpxml test|grep cpu
<vcpu placement='static' cpuset='1'>1</vcpu>
<cputune>
<vcpupin vcpu='0' cpuset='0'/>
</cputune>

# virsh start test
error: Failed to start domain test
error: Unable to write to '/sys/fs/cgroup/cpuset/machine.slice/machine-qemu\x2dtest.scope/vcpu0/cpuset.cpus': Permission denied

Two problems here (both on rhel6 and rhel7):
1. libvirt doesn't ignore 'cpuset' inside the <vcpu> element, contrary to what libvirt.org describes: http://libvirt.org/formatdomain.html#elementsCPUAllocation
2. The kernel bug (https://bugzilla.redhat.com/show_bug.cgi?id=947004) is hit without any cpu offline/online operation

What do you think? Should we open a new separate bug?

Sorry for the delay, I forgot about this bug.

About (1.), please file a new bug with the info in comment #17. About (2.), what do you mean by "without cpu off/on operation"? Anyway, I think this bug should be closed (it's still a duplicate of kernel bugs).

(In reply to Martin Kletzander from comment #18)
> Sorry for the delay, I forgot about this bug.
>
> About (1.), please file a new bug with the info in comment #17. About (2.),

Thanks, filed bug https://bugzilla.redhat.com/show_bug.cgi?id=1131486 for issue (1)

> what do you mean by "without cpu off/on operation"?

The kernel bug (and this bug) mention the offline/online operation:
# echo 0 >/sys/devices/system/cpu/cpu3/online
# echo 1 >/sys/devices/system/cpu/cpu3/online

>
> Anyway, I think this bug should be closed (it's still a duplicate of kernel
> bugs).

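A note on the "Permission denied" result in the <vcpu cpuset='1'> scenario: it is consistent with the cgroup v1 cpuset rule that a child's cpuset.cpus must be a subset of its parent's. The following is only a minimal sketch of that subset check, assuming the domain cgroup was limited to CPU 1 by <vcpu cpuset='1'> while the pin request asked for CPU 0. The helper names expand_cpulist and cpus_subset are hypothetical and are not part of libvirt or the kernel.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for illustration only.

# expand_cpulist LIST: print one CPU number per line for a list like "0-2,4-7".
expand_cpulist() {
    local list=$1 part lo hi i
    local IFS=','
    for part in $list; do
        [ -z "$part" ] && continue
        lo=${part%-*}   # "0-2" -> 0, "3" -> 3
        hi=${part#*-}   # "0-2" -> 2, "3" -> 3
        for ((i = lo; i <= hi; i++)); do
            echo "$i"
        done
    done
}

# cpus_subset CHILD PARENT: succeed only if every CPU in CHILD is also in PARENT.
cpus_subset() {
    local cpu
    for cpu in $(expand_cpulist "$1"); do
        expand_cpulist "$2" | grep -qx "$cpu" || return 1
    done
    return 0
}

# Values assumed from the scenario: parent limited to CPU 1, pin request is CPU 0.
if cpus_subset "0" "1"; then
    echo "pin target fits inside the parent cpuset"
else
    echo "pin target is outside the parent cpuset"
fi
```

With the values from the scenario the check reports that the pin target is outside the parent cpuset, which is the situation in which the write to the vcpu0 cpuset.cpus is refused.
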
You mean you encountered the kernel bug even without doing the offline/online operation? That would have nothing to do with that bug then (IIUC). Anyway, if the kernel bug is the only problem missing to fix this issue right now, I would either put it to ON_QA with TestOnly flag (there is nothing to do in libvirt after that bug is fixed) or close it as a dup of that. If there's nothing else to discuss, please consider doing one of these two things, so we can move on with this bug, thank you.

(In reply to Martin Kletzander from comment #20)
> You mean you encountered the kernel bug even without doing the
> offline/online operation? That would have nothing to do with that bug then
> (IIUC). Anyway, if the kernel bug is the only problem missing to fix this
> issue right now, I would either put it to ON_QA with TestOnly flag (there is
> nothing to do in libvirt after that bug is fixed) or close it as a dup of
> that. If there's nothing else to discuss, please consider doing one of
> these two things, so we can move on with this bug, thank you.

I agree, we can set TestOnly on this bug and re-test it once the kernel bug is fixed.

Kernel bug 947004 is still in 'new' status. Will test this once the kernel bug is fixed.

Kernel bug 947004 is targeted at rhel7.5, so moving this bug to rhel7.5 and changing the status from ON_QA to ASSIGNED for this TestOnly bug.

Tested on kernel-3.10.0-799.el7.x86_64, the result is as expected.

1. remount with option 'cpuset_v2_mode'
# mount | grep cpuset
cgroup on /sys/fs/cgroup/cpuset type cgroup (rw,nosuid,nodev,noexec,relatime,seclabel,cpuset)
# umount /sys/fs/cgroup/cpuset
# mount -t cgroup -ocpuset,nosuid,nodev,noexec,cpuset_v2_mode cgroup /sys/fs/cgroup/cpuset
# mount | grep cpuset
cgroup on /sys/fs/cgroup/cpuset type cgroup (rw,nosuid,nodev,noexec,relatime,seclabel,cpuset,cpuset_v2_mode)

2. start a vm and check the info
# virsh start rhel
# ll /sys/fs/cgroup/cpuset/machine.slice/machine-qemu\x2d1\x2drhel.scope
total 0
-rw-r--r--. 1 root root 0 Dec 11 14:33 cgroup.clone_children
--w--w--w-. 1 root root 0 Dec 11 14:33 cgroup.event_control
-rw-r--r--. 1 root root 0 Dec 11 14:33 cgroup.procs
-rw-r--r--. 1 root root 0 Dec 11 14:33 cpuset.cpu_exclusive
-rw-r--r--. 1 root root 0 Dec 11 14:33 cpuset.cpus
-r--r--r--. 1 root root 0 Dec 11 14:33 cpuset.effective_cpus --> new
-r--r--r--. 1 root root 0 Dec 11 14:33 cpuset.effective_mems --> new
-rw-r--r--. 1 root root 0 Dec 11 14:33 cpuset.mem_exclusive
-rw-r--r--. 1 root root 0 Dec 11 14:33 cpuset.mem_hardwall
-rw-r--r--. 1 root root 0 Dec 11 14:33 cpuset.memory_migrate
-r--r--r--. 1 root root 0 Dec 11 14:33 cpuset.memory_pressure
-rw-r--r--. 1 root root 0 Dec 11 14:33 cpuset.memory_spread_page
-rw-r--r--. 1 root root 0 Dec 11 14:33 cpuset.memory_spread_slab
-rw-r--r--. 1 root root 0 Dec 11 14:33 cpuset.mems
-rw-r--r--. 1 root root 0 Dec 11 14:33 cpuset.sched_load_balance
-rw-r--r--. 1 root root 0 Dec 11 14:33 cpuset.sched_relax_domain_level
drwxr-xr-x. 2 root root 0 Dec 11 14:33 emulator
-rw-r--r--. 1 root root 0 Dec 11 14:33 notify_on_release
-rw-r--r--. 1 root root 0 Dec 11 14:33 tasks
drwxr-xr-x. 2 root root 0 Dec 11 14:33 vcpu0
drwxr-xr-x. 2 root root 0 Dec 11 14:33 vcpu1
drwxr-xr-x. 2 root root 0 Dec 11 14:33 vcpu2
drwxr-xr-x. 2 root root 0 Dec 11 14:33 vcpu3

# cat cpuset.cpus cpuset.effective_cpus
0-3
0-3

# echo 0 >/sys/devices/system/cpu/cpu3/online
# cat cpuset.cpus cpuset.effective_cpus
0-3
0-2

# virsh vcpupin rhel 0 3
error: cannot set CPU affinity on process 3029: Invalid argument

# echo 1 >/sys/devices/system/cpu/cpu3/online
# cat cpuset.cpus cpuset.effective_cpus
0-3
0-3

# virsh vcpupin rhel 0 3
# virsh vcpupin rhel 0
VCPU: CPU Affinity
----------------------------------
0: 3

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2018:0704
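
For anyone re-checking the behaviour discussed in this report, the per-level cat commands used in the comments can be wrapped in a small loop. This is only a sketch: the function name show_cpuset_chain is hypothetical, and cpuset.effective_cpus is printed only where the kernel exposes it (the directory listing in the final test marks it as new). It reads files and changes nothing.

```bash
#!/usr/bin/env bash
# Hypothetical diagnostic helper for illustration only.

# show_cpuset_chain DIR ROOT: walk from DIR up to ROOT and print, for each level,
# the configured cpuset.cpus and, when the file exists, cpuset.effective_cpus.
show_cpuset_chain() {
    local dir=${1%/} root=${2%/} cpus eff
    while :; do
        cpus=$(cat "$dir/cpuset.cpus" 2>/dev/null)
        if [ -r "$dir/cpuset.effective_cpus" ]; then
            eff=$(cat "$dir/cpuset.effective_cpus")
        else
            eff="n/a"
        fi
        printf '%s cpus=%s effective=%s\n' "$dir" "$cpus" "$eff"
        [ "$dir" = "$root" ] && break
        case $dir in
            "$root"/*) dir=${dir%/*} ;;  # step up one level
            *) break ;;                  # DIR is not below ROOT; stop here
        esac
    done
}

# Example call with the paths from the RHEL 6 comments (run as root on the host):
#   show_cpuset_chain /cgroup/cpuset/libvirt/qemu/r6/vcpu0 /cgroup/cpuset
```

A level whose cpus value still lacks a CPU that is back online is the kind of stale parent shown in the RHEL 6 comments; under cpuset_v2_mode the configured value stays intact and only the effective value shrinks while the CPU is offline.
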