Bug 838070

Summary: host cpu offline then online, vcpupin guest vcpu to the online cpu will fail
Product: Red Hat Enterprise Linux 7
Reporter: EricLee <bili>
Component: libvirt
Assignee: Martin Kletzander <mkletzan>
Status: CLOSED ERRATA
QA Contact: yalzhang <yalzhang>
Severity: medium
Docs Contact:
Priority: medium
Version: 7.0
CC: dyasny, dyuan, gsun, jdee, jdenemar, jsuchane, lsu, mkletzan, mzhan, rbalakri, rwu, whuang, xuzhang, yalzhang, zhpeng
Target Milestone: rc
Keywords: Reopened, TestOnly
Target Release: ---
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2018-04-10 10:33:22 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 947004
Bug Blocks: 1098106

Description EricLee 2012-07-06 11:56:02 UTC
Description of problem:
After a host CPU is taken offline and then brought back online, pinning a guest vCPU to that CPU with vcpupin fails.

Version-Release number of selected component (if applicable):
libvirt-0.9.13-2.el6.x86_64
qemu-kvm-rhev-0.12.1.2-2.295.el6.x86_64

How reproducible:
100%

Steps to Reproduce:
My machine has 4 cores: /sys/devices/system/cpu/cpu[0-3]

# echo 0 >/sys/devices/system/cpu/cpu3/online

# virsh vcpupin aaa 0 3
error: Physical CPU 3 doesn't exist.
error: cpulist: Invalid format.

# echo 1 >/sys/devices/system/cpu/cpu3/online

# virsh vcpupin aaa 0 3
error: cannot set CPU affinity on process 3398: Invalid argument

But this CPU is online:
# cat /sys/devices/system/cpu/online
0-3

Testing with another process:
# mkdir /cpusets
# grep cpu /proc/filesystems
# mount -t cpuset nodev /cpusets
# mkdir        /cpusets/newvim
# vim &
# pidof vim   ---> 3932
# echo 3 > /cpusets/newvim/cpuset.cpus
# echo 0 > /cpusets/newvim/cpuset.mems
# echo 3932 > /cpusets/newvim/tasks
# cat /cpusets/newvim/tasks
3932
# cat /proc/3932/cpuset
/newvim

So, this core is online, but vcpupin to this core failed.
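
For scripting the check above, here is a minimal sketch of a helper that tests whether a CPU number appears in a kernel cpulist string such as the content of /sys/devices/system/cpu/online. The function name cpu_in_list is hypothetical, and the sketch assumes only the plain "N" and "N-M" list forms shown in this report.

```shell
#!/bin/sh
# Hypothetical helper: return 0 if CPU number $1 appears in the
# kernel-style cpulist $2 (for example "0-2,4-7"), 1 otherwise.
# Assumes only "N" and "N-M" entries separated by commas.
cpu_in_list() {
    cpu=$1
    list=$2
    old_ifs=$IFS
    IFS=,
    set -- $list            # split the list on commas into $1, $2, ...
    IFS=$old_ifs
    for range in "$@"; do
        lo=${range%-*}      # "0-2" -> "0", "3" -> "3"
        hi=${range#*-}      # "0-2" -> "2", "3" -> "3"
        if [ "$cpu" -ge "$lo" ] && [ "$cpu" -le "$hi" ]; then
            return 0
        fi
    done
    return 1
}

# Example use on a Linux host:
#   cpu_in_list 3 "$(cat /sys/devices/system/cpu/online)" && echo "cpu3 is online"
```

The same helper could be pointed at a cgroup's cpuset.cpus content to see whether the CPU is also allowed there.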

And the libvirt cgroup has wrong information about the online cores:
# cat /proc/cpuinfo | grep processor
processor	: 0
processor	: 1
processor	: 2
processor	: 3

# cat /cgroup/cpuset/libvirt/cpuset.cpus 
0-2

# virsh dumpxml test | grep vcpu
  <vcpu placement='static'>4</vcpu>

# virsh start test
Domain test started

# ps aux| grep kvm
qemu     11404 77.9  9.9 2538956 379000 ?      Sl   19:22   0:22 /usr/libexec/qemu-kvm -name test -S -M rhel6.3.0 -enable-kvm -m 1024 -smp 4,sockets=4,cores=1,threads=1 -uuid 65c542c7-daa6-56d7-450b-9d5ae55372eb -nodefconfig -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/test.monitor,server,nowait .......

# grep Cpus_allowed_list /proc/11404/task/*/status
/proc/11404/task/11404/status:Cpus_allowed_list:	0-3
/proc/11404/task/11427/status:Cpus_allowed_list:	0-3
/proc/11404/task/11428/status:Cpus_allowed_list:	0-3
/proc/11404/task/11429/status:Cpus_allowed_list:	0-3
/proc/11404/task/11430/status:Cpus_allowed_list:	0-3
/proc/11404/task/11431/status:Cpus_allowed_list:	0-3
/proc/11404/task/11435/status:Cpus_allowed_list:	0-3
/proc/11404/task/11454/status:Cpus_allowed_list:	0-3
.....

# echo 0 > /sys/devices/system/cpu/cpu3/online 

# grep Cpus_allowed_list /proc/11404/task/*/status
/proc/11404/task/11404/status:Cpus_allowed_list:	0-2
/proc/11404/task/11427/status:Cpus_allowed_list:	0-2
/proc/11404/task/11428/status:Cpus_allowed_list:	0-2
/proc/11404/task/11429/status:Cpus_allowed_list:	0-2
/proc/11404/task/11430/status:Cpus_allowed_list:	0-2
/proc/11404/task/11431/status:Cpus_allowed_list:	0-2

# cat /sys/devices/system/cpu/online 
0-3
# grep processor /proc/cpuinfo 
processor	: 0
processor	: 1
processor	: 2
processor	: 3
# service libvirtd restart
Stopping libvirtd daemon:                                  [  OK  ]
Starting libvirtd daemon:                                  [  OK  ]
# cat /cgroup/cpuset/libvirt/cpuset.cpus 
0-2
# virsh destroy test
Domain test destroyed

# virsh start test
Domain test started

# ps aux | grep kvm
qemu     11866 83.0  9.3 2597792 354884 ?      Sl   19:25   0:19 /usr/libexec/qemu-kvm -name test -S -M rhel6.3.0 -enable-kvm -m 1024 -smp 4,sockets=4,cores=1,threads=1 ......

# grep Cpus_allowed_list /proc/11866/task/*/status
/proc/11866/task/11866/status:Cpus_allowed_list:	0-2
/proc/11866/task/11887/status:Cpus_allowed_list:	0-2
/proc/11866/task/11888/status:Cpus_allowed_list:	0-2
/proc/11866/task/11889/status:Cpus_allowed_list:	0-2
/proc/11866/task/11890/status:Cpus_allowed_list:	0-2
/proc/11866/task/11891/status:Cpus_allowed_list:	0-2

# cat /sys/devices/system/cpu/online
0-3

Actual results:
As steps.

Expected results:
1. # virsh vcpupin test 0 3 should succeed;
2. the libvirt cgroup should have the correct information about online cores.

Comment 2 Peter Krempa 2012-08-27 12:15:21 UTC
*** Bug 846894 has been marked as a duplicate of this bug. ***

Comment 4 Martin Kletzander 2013-02-05 16:11:44 UTC
The problem is the same as Bug 748885, Bug 714271, etc.  I'm closing it as a dup, feel free to reopen (and reassign on kernel) if the problem still persists, thanks.

*** This bug has been marked as a duplicate of bug 748885 ***

Comment 5 Jincheng Miao 2013-10-10 12:06:29 UTC
This bug can also be reproduced with the latest libvirt and kernel:

# rpm -q libvirt kernel
libvirt-0.10.2-29.el6.x86_64
kernel-2.6.32-422.el6.x86_64

# virsh start r6

# virsh dumpxml r6
<domain type='kvm' id='2'>
  <name>r6</name>
  <uuid>a8042aca-ab5f-e1d2-cff1-98266854f75b</uuid>
  <memory unit='KiB'>1048576</memory>
  <currentMemory unit='KiB'>1048576</currentMemory>
  <vcpu placement='static'>4</vcpu>
...

# virsh vcpupin r6
VCPU: CPU Affinity
----------------------------------
   0: 0-7
   1: 0-7
   2: 0-7
   3: 0-7

# echo 0 >/sys/devices/system/cpu/cpu7/online

# virsh vcpupin r6 0 7
error: Requested operation is not valid: failed to set cpuset.cpus in cgroup for vcpu 0

# echo 1 >/sys/devices/system/cpu/cpu7/online

# virsh vcpupin r6 0 7
error: Requested operation is not valid: failed to set cpuset.cpus in cgroup for vcpu 0

# virsh vcpupin r6 0 6

# virsh vcpupin r6 
VCPU: CPU Affinity
----------------------------------
   0: 6
   1: 0-7
   2: 0-7
   3: 0-7

Comment 6 Jincheng Miao 2013-10-11 08:59:22 UTC
Hi Martin, could you take a look at this problem?

Comment 7 Martin Kletzander 2013-10-16 16:37:56 UTC
Try reproducing this without libvirt and if it's still a problem, post it to the bug this has been marked duplicate of (bug 748885) and reopen if applicable.

Comment 8 Jincheng Miao 2013-10-24 03:12:12 UTC
1. check cgroup mount points
# cat /proc/mounts  | grep cgroup
cgroup /cgroup/cpuset cgroup rw,relatime,cpuset 0 0
cgroup /cgroup/cpu cgroup rw,relatime,cpu 0 0
cgroup /cgroup/cpuacct cgroup rw,relatime,cpuacct 0 0
cgroup /cgroup/memory cgroup rw,relatime,memory 0 0
cgroup /cgroup/devices cgroup rw,relatime,devices 0 0
cgroup /cgroup/freezer cgroup rw,relatime,freezer 0 0
cgroup /cgroup/net_cls cgroup rw,relatime,net_cls 0 0
cgroup /cgroup/blkio cgroup rw,relatime,blkio 0 0

check qemu-kvm process id
# pidof qemu-kvm
2736

2. create cgroup point for qemu-kvm process
# mkdir /cgroup/cpuset/r6

# cat /cgroup/cpuset/r6/cpuset.cpus 

# echo 0-7 > /cgroup/cpuset/r6/cpuset.cpus 

# echo 0 > /cgroup/cpuset/r6/cpuset.mems 

# echo `pidof qemu-kvm` > /cgroup/cpuset/r6/tasks 

3. stop cpu3
# echo 0 >/sys/devices/system/cpu/cpu3/online

# cat /cgroup/cpuset/r6/cpuset.cpus
0-2,4-7

Since cpu3 is down, the vCPU cannot be pinned to cpu3:
# echo 3 > /cgroup/cpuset/r6/cpuset.cpus
-bash: echo: write error: Invalid argument

4. restart cpu3
# echo 1 >/sys/devices/system/cpu/cpu3/online

Now we can pin the vCPU to cpu3:
# echo 3 > /cgroup/cpuset/r6/cpuset.cpus
# echo $?
0

As shown above, without libvirt, pinning a vCPU to a restored CPU works fine.

Comment 9 Jincheng Miao 2013-10-24 09:02:35 UTC
After some testing with cgroups and libvirt vcpupin, I found that this bug resides in libvirt:
libvirtd does not set cpuset.cpus recursively when performing a vcpupin operation.

The cpuset.cpus setting is hierarchical for a domain vCPU: /cgroup/cpuset/libvirt/qemu/r6/cpuset.cpus

If a CPU is logically hot-unplugged, it is removed from every cpuset.cpus along that path, like:
# echo 0 >/sys/devices/system/cpu/cpu3/online
# cat /cgroup/cpuset/cpuset.cpus 
0-2,4-7
# cat /cgroup/cpuset/libvirt/cpuset.cpus 
0-2,4-7
# cat /cgroup/cpuset/libvirt/qemu/cpuset.cpus 
0-2,4-7
# cat /cgroup/cpuset/libvirt/qemu/r6/cpuset.cpus 
0-2,4-7
# cat /cgroup/cpuset/libvirt/qemu/r6/vcpu0/cpuset.cpus 
0-2,4-7

But when that CPU is brought back online, only the topmost cpuset.cpus is restored:
# echo 1 >/sys/devices/system/cpu/cpu3/online
# cat /cgroup/cpuset/cpuset.cpus 
0-7
# cat /cgroup/cpuset/libvirt/cpuset.cpus 
0-2,4-7
# cat /cgroup/cpuset/libvirt/qemu/cpuset.cpus 
0-2,4-7
# cat /cgroup/cpuset/libvirt/qemu/r6/cpuset.cpus 
0-2,4-7
# cat /cgroup/cpuset/libvirt/qemu/r6/vcpu0/cpuset.cpus 
0-2,4-7

libvirtd only applies vcpupin to /cgroup/cpuset/libvirt/qemu/r6/vcpu0/cpuset.cpus, which fails with the same error as a manual write:
"-bash: echo: write error: Permission denied"

in libvirtd.log:
2013-10-24 08:37:54.213+0000: 2413: debug : virCgroupSetValueStr:331 : Set value '/cgroup/cpuset/libvirt/qemu/r6/vcpu0/cpuset.cpus' to '3'
2013-10-24 08:37:54.213+0000: 2413: debug : virFileClose:72 : Closed fd 22
2013-10-24 08:37:54.213+0000: 2413: debug : virCgroupSetValueStr:335 : Failed to write value '3': Permission denied
2013-10-24 08:37:54.213+0000: 2413: error : qemuSetupCgroupEmulatorPin:519 : Unable to set cpuset.cpus: Permission denied
2013-10-24 08:37:54.214+0000: 2413: debug : qemudDomainPinVcpuFlags:4423 : Requested operation is not valid: failed to set cpuset.cpus in cgroup for vcpu 0

So if I adjust the upper-level cpuset.cpus files, vcpupin succeeds:
# echo 0-7 > /cgroup/cpuset/libvirt/cpuset.cpus 
# echo 0-7 > /cgroup/cpuset/libvirt/qemu/cpuset.cpus 
# echo 0-7 > /cgroup/cpuset/libvirt/qemu/r6/cpuset.cpus
# virsh vcpupin r6 0 3
# echo $?
0

So I think I should re-open this libvirt bug.
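
To spot the mismatch described in this comment without reading each file by hand, here is a minimal sketch that walks a cpuset hierarchy and prints every level whose cpuset.cpus differs from the top-level value. The function name list_cpuset_diffs is hypothetical. It flags any difference, so a vCPU that was pinned on purpose will also be listed.

```shell
#!/bin/sh
# Hypothetical helper: print each cgroup directory under ROOT whose
# cpuset.cpus content differs from ROOT/cpuset.cpus.
# ROOT would be the cpuset mount point, e.g. /cgroup/cpuset.
list_cpuset_diffs() {
    root=$1
    want=$(cat "$root/cpuset.cpus") || return 1
    find "$root" -type f -name cpuset.cpus | sort | while IFS= read -r file; do
        have=$(cat "$file")
        if [ "$have" != "$want" ]; then
            # Strip the file name so only the cgroup directory is shown.
            printf '%s: %s (top level: %s)\n' "${file%/cpuset.cpus}" "$have" "$want"
        fi
    done
}

# Example use on the host from this comment:
#   list_cpuset_diffs /cgroup/cpuset
```

This only reports the state; whether the upper levels should then be rewritten by hand, as done above, is a separate decision.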

Comment 10 Martin Kletzander 2013-10-24 10:30:53 UTC
Unfortunately, this is not the same as what libvirt does.  I'm afraid you'll have to do it a little bit differently.

The problem is that the bug in kernel was related to hierarchical cgroups.  Let me try to get it reproducible and I'll get back to you with more info.

Comment 13 Jiri Denemark 2014-04-04 21:36:39 UTC
This bug was not selected to be addressed in Red Hat Enterprise Linux 6. We will look at it again within the Red Hat Enterprise Linux 7 product.

Comment 14 Jiri Denemark 2014-05-16 08:02:37 UTC
*** Bug 1098101 has been marked as a duplicate of this bug. ***

Comment 15 Jiri Denemark 2014-05-16 08:02:56 UTC
*** Bug 1098106 has been marked as a duplicate of this bug. ***

Comment 17 yanbing du 2014-07-31 08:10:54 UTC
Hi Martin,
Here is another scenario that triggers this failure:
1. Set 'cpuset' in <vcpu> element:
# virsh dumpxml test|grep cpu
  <vcpu placement='static' cpuset='1'>1</vcpu>
# virsh start test
2. Do vcpupin or emulatorpin
# virsh vcpupin test 0 0
error: Requested operation is not valid: failed to set cpuset.cpus in cgroup for vcpu 0

# virsh emulatorpin test 0
error: Requested operation is not valid: failed to set cpuset.cpus in cgroup for emulator threads

Or set 'cpuset' in the <vcpu> element and 'cpuset' in the <cputune> element, then try to start the domain:
# virsh dumpxml test|grep cpu
  <vcpu placement='static' cpuset='1'>1</vcpu>
  <cputune>
    <vcpupin vcpu='0' cpuset='0'/>
  </cputune>
# virsh start test
error: Failed to start domain test
error: Unable to write to '/sys/fs/cgroup/cpuset/machine.slice/machine-qemu\x2dtest.scope/vcpu0/cpuset.cpus': Permission denied


Two problems here (both on RHEL 6 and RHEL 7):
1. libvirt doesn't ignore 'cpuset' inside the <vcpu> element, contrary to what is mentioned on libvirt.org: http://libvirt.org/formatdomain.html#elementsCPUAllocation

2. We encounter the kernel bug (https://bugzilla.redhat.com/show_bug.cgi?id=947004) without any CPU offline/online operation.

What do you think? Should we open a new separate bug?

Comment 18 Martin Kletzander 2014-08-19 10:14:08 UTC
Sorry for the delay, I forgot about this bug.

About (1.), please file a new bug with the info in comment #17.  About (2.), what do you mean by "without cpu off/on operation"?

Anyway, I think this bug should be closed (it's still a duplicate of kernel bugs).

Comment 19 yanbing du 2014-08-19 12:01:18 UTC
(In reply to Martin Kletzander from comment #18)
> Sorry for the delay, I forgot about this bug.
> 
> About (1.), please file a new bug with the info in comment #17.  About (2.),

Thanks, I filed bug https://bugzilla.redhat.com/show_bug.cgi?id=1131486 for issue (1).

> what do you mean "without cpu off/on operation?

The kernel bug (and also this bug) mentions the offline/online operation:
 # echo 0 > /sys/devices/system/cpu/cpu3/online
 # echo 1 > /sys/devices/system/cpu/cpu3/online

> 
> Anyway, I think this bug should be closed (it's still a duplicate of kernel
> bugs).

Comment 20 Martin Kletzander 2014-08-22 10:51:24 UTC
You mean you encountered the kernel bug even without doing the offline/online operation?  That would have nothing to do with that bug then (IIUC).  Anyway, if the kernel bug is the only problem missing to fix this issue right now, I would either put it to ON_QA with TestOnly flag (there is nothing to do in libvirt after that bug is fixed) or close it as a dup of that.  If there's nothing else to discuss, please consider doing one of these two things, so we can move on with this bug, thank you.

Comment 21 yanbing du 2014-08-25 08:54:42 UTC
(In reply to Martin Kletzander from comment #20)
> You mean you encountered the kernel bug even without doing the
> offline/online operation?  That would have nothing to do with that bug then
> (IIUC).  Anyway, if the kernel bug is the only problem missing to fix this
> issue right now, I would either put it to ON_QA with TestOnly flag (there is
> nothing to do in libvirt after that bug is fixed) or close it as a dup of
> that.  If there's nothing else to discuss, please consider doing one of
> these two things, so we can move on with this bug, thank you.

I agree, we can set TestOnly on this bug and re-test it once the kernel bug is fixed.

Comment 27 yalzhang@redhat.com 2017-03-05 06:07:48 UTC
Kernel bug 947004 is still in 'new' status. Will test this once the kernel bug is fixed.

Comment 28 yalzhang@redhat.com 2017-06-08 04:49:14 UTC
Kernel bug 947004 is targeted at RHEL 7.5, so I am moving this bug to RHEL 7.5 and changing the status from ON_QA to ASSIGNED for this TestOnly bug.

Comment 31 yalzhang@redhat.com 2017-12-11 06:51:59 UTC
Tested on kernel-3.10.0-799.el7.x86_64; the result is as expected.

1. remount with option 'cpuset_v2_mode'

# mount  | grep cpuset
cgroup on /sys/fs/cgroup/cpuset type cgroup (rw,nosuid,nodev,noexec,relatime,seclabel,cpuset)

# umount /sys/fs/cgroup/cpuset

# mount -t cgroup -ocpuset,nosuid,nodev,noexec,cpuset_v2_mode cgroup  /sys/fs/cgroup/cpuset

# mount  | grep cpuset
cgroup on /sys/fs/cgroup/cpuset type cgroup (rw,nosuid,nodev,noexec,relatime,seclabel,cpuset,cpuset_v2_mode)


2. start a vm and check the info

# virsh start rhel

# ll /sys/fs/cgroup/cpuset/machine.slice/machine-qemu\x2d1\x2drhel.scope
total 0
-rw-r--r--. 1 root root 0 Dec 11 14:33 cgroup.clone_children
--w--w--w-. 1 root root 0 Dec 11 14:33 cgroup.event_control
-rw-r--r--. 1 root root 0 Dec 11 14:33 cgroup.procs
-rw-r--r--. 1 root root 0 Dec 11 14:33 cpuset.cpu_exclusive
-rw-r--r--. 1 root root 0 Dec 11 14:33 cpuset.cpus
-r--r--r--. 1 root root 0 Dec 11 14:33 cpuset.effective_cpus  --> new
-r--r--r--. 1 root root 0 Dec 11 14:33 cpuset.effective_mems  --> new
-rw-r--r--. 1 root root 0 Dec 11 14:33 cpuset.mem_exclusive
-rw-r--r--. 1 root root 0 Dec 11 14:33 cpuset.mem_hardwall
-rw-r--r--. 1 root root 0 Dec 11 14:33 cpuset.memory_migrate
-r--r--r--. 1 root root 0 Dec 11 14:33 cpuset.memory_pressure
-rw-r--r--. 1 root root 0 Dec 11 14:33 cpuset.memory_spread_page
-rw-r--r--. 1 root root 0 Dec 11 14:33 cpuset.memory_spread_slab
-rw-r--r--. 1 root root 0 Dec 11 14:33 cpuset.mems
-rw-r--r--. 1 root root 0 Dec 11 14:33 cpuset.sched_load_balance
-rw-r--r--. 1 root root 0 Dec 11 14:33 cpuset.sched_relax_domain_level
drwxr-xr-x. 2 root root 0 Dec 11 14:33 emulator
-rw-r--r--. 1 root root 0 Dec 11 14:33 notify_on_release
-rw-r--r--. 1 root root 0 Dec 11 14:33 tasks
drwxr-xr-x. 2 root root 0 Dec 11 14:33 vcpu0
drwxr-xr-x. 2 root root 0 Dec 11 14:33 vcpu1
drwxr-xr-x. 2 root root 0 Dec 11 14:33 vcpu2
drwxr-xr-x. 2 root root 0 Dec 11 14:33 vcpu3

# cat cpuset.cpus cpuset.effective_cpus
0-3
0-3

# echo 0 >/sys/devices/system/cpu/cpu3/online
# cat cpuset.cpus cpuset.effective_cpus
0-3
0-2
# virsh vcpupin rhel 0 3
error: cannot set CPU affinity on process 3029: Invalid argument

# echo 1 >/sys/devices/system/cpu/cpu3/online
# cat cpuset.cpus cpuset.effective_cpus
0-3
0-3

# virsh vcpupin rhel 0 3
# virsh vcpupin rhel 0
VCPU: CPU Affinity
----------------------------------
   0: 3
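
The comparison of cpuset.cpus with cpuset.effective_cpus in step 2 can be scripted. Here is a minimal sketch, assuming a cpuset mounted with cpuset_v2_mode so that both files exist. It prints the CPUs that are configured in a cgroup but currently missing from its effective set. The function names expand_cpulist and masked_cpus are hypothetical, and only the plain "N" and "N-M" list forms are handled.

```shell
#!/bin/sh
# Hypothetical helper: expand a kernel-style cpulist ("0-2,4-7")
# into one CPU number per line.
expand_cpulist() {
    old_ifs=$IFS
    IFS=,
    set -- $1               # split the list on commas
    IFS=$old_ifs
    for range in "$@"; do
        lo=${range%-*}
        hi=${range#*-}
        while [ "$lo" -le "$hi" ]; do
            echo "$lo"
            lo=$((lo + 1))
        done
    done
}

# Hypothetical helper: print the CPUs listed in DIR/cpuset.cpus that
# are absent from DIR/cpuset.effective_cpus, separated by spaces.
masked_cpus() {
    dir=$1
    # Build " 0 1 2 " so that each CPU can be matched as " N ".
    effective=" $(expand_cpulist "$(cat "$dir/cpuset.effective_cpus")" | tr '\n' ' ')"
    masked=
    for cpu in $(expand_cpulist "$(cat "$dir/cpuset.cpus")"); do
        case $effective in
            *" $cpu "*) ;;                              # CPU is usable
            *) masked="$masked${masked:+ }$cpu" ;;      # configured but not effective
        esac
    done
    echo "$masked"
}

# Example use, run from the machine scope directory as in step 2:
#   masked_cpus .
```

With the values shown above, the output would be "3" while cpu3 is offline and an empty line after it is brought back online.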

Comment 35 errata-xmlrpc 2018-04-10 10:33:22 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2018:0704