Bug 1251445

Summary: Fail to start vm with placement auto after offline certain cpu without restart libvirtd
Product: Red Hat Enterprise Linux Advanced Virtualization
Reporter: Wayne Sun <gsun>
Component: libvirt
Assignee: Virtualization Maintenance <virt-maint>
Status: CLOSED DEFERRED
QA Contact: jiyan <jiyan>
Severity: low
Docs Contact:
Priority: medium
Version: 8.0
CC: dyuan, lhuang, mkletzan, mzhan, pkrempa, xuzhang, yalzhang
Target Milestone: rc
Keywords: Triaged
Target Release: 8.1
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2020-02-11 13:05:23 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Wayne Sun 2015-08-07 10:03:37 UTC
Description of problem:
After a CPU in the node returned by numad is taken offline, starting a domain fails unless libvirtd is restarted first.


Version-Release number of selected component (if applicable):
# rpm -q libvirt kernel qemu-kvm-rhev
libvirt-1.2.17-3.el7.x86_64
kernel-3.10.0-302.el7.x86_64
qemu-kvm-rhev-2.3.0-15.el7.x86_64

# numactl --hardware
available: 2 nodes (0-1)
node 0 cpus: 0 1 2 3 4 5 6 7
node 0 size: 65514 MB
node 0 free: 60859 MB
node 1 cpus: 8 9 10 11 12 13 14 15
node 1 size: 65536 MB
node 1 free: 61649 MB
node distances:
node   0   1 
  0:  10  11 
  1:  11  10 

How reproducible:
always

Steps to Reproduce:
1. Replace numad with a wrapper that always returns a fixed node (node 1) for testing
# mv /usr/bin/numad /usr/bin/numad_
# vim /usr/bin/numad
#!/usr/bin/env python

import os
import sys

if len(sys.argv) == 3 and sys.argv[1] == '-w':
    print(1)
    sys.exit(0)

os.execv('/usr/bin/numad_', sys.argv)
sys.exit(1)

# chmod +x /usr/bin/numad

2. Take a CPU in node 1 offline
# cat /sys/devices/system/cpu/cpu8/online 
1

# echo 0 > /sys/devices/system/cpu/cpu8/online 

3. Start a VM with vcpu placement set to auto
# virsh dumpxml virt-tests-vm1
...
  <vcpu placement='auto'>2</vcpu>
...

# virsh start virt-tests-vm1
error: Failed to start domain virt-tests-vm1
error: Invalid value '8-15' for 'cpuset.cpus': Invalid argument
             
The VM fails to start because libvirt still uses '8-15'; after cpu8 is taken offline the value should be '9-15'.
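
The expected value can be worked out from the node's CPU list. The following is a minimal sketch, not libvirt code: parse_cpulist and format_cpulist are hypothetical helper names, and the inputs are taken from the numactl output and step 2 above.

```python
def parse_cpulist(text):
    """Parse a kernel-style CPU list such as '0-7,9-15' into a set of ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            first, last = part.split("-", 1)
            cpus.update(range(int(first), int(last) + 1))
        else:
            cpus.add(int(part))
    return cpus


def format_cpulist(cpus):
    """Format a set of CPU numbers back into a compact range string."""
    ranges = []
    start = prev = None
    for cpu in sorted(cpus):
        if start is None:
            start = prev = cpu
        elif cpu == prev + 1:
            prev = cpu
        else:
            ranges.append((start, prev))
            start = prev = cpu
    if start is not None:
        ranges.append((start, prev))
    return ",".join(str(a) if a == b else f"{a}-{b}" for a, b in ranges)


node1_cpus = parse_cpulist("8-15")  # CPUs of node 1 per numactl --hardware
offline = {8}                       # cpu8 was taken offline in step 2
print(format_cpulist(node1_cpus - offline))  # prints 9-15
```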


Actual results:
The VM fails to start.

Expected results:
The VM starts with the updated CPU list.

Additional info:

If libvirtd is restarted after the CPU is taken offline, starting the domain succeeds.

Comment 1 Martin Kletzander 2015-08-07 10:13:26 UTC
Thank you for filing this BZ.  Could you also check whether running "virsh capabilities" fixes the issue (without restarting libvirtd)?  If it does, the problem is that virCapabilitiesGetCpusForNodemask() does not take the actual state of the host into account, but rather uses cached data.

It's hard to say at this point whether libvirt should use inotify to track the changes or stop relying on the cached capabilities.
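
To illustrate the second option (looking up host state at start time instead of using cached data), here is a rough sketch. It is not how libvirt is implemented: usable_cpus_for_node and read_sysfs are hypothetical names, and the only assumption about the host is that the standard sysfs files node/nodeN/cpulist and cpu/online exist under /sys/devices/system.

```python
import os

SYSFS_ROOT = "/sys/devices/system"


def read_sysfs(relpath):
    """Return the stripped contents of a file below /sys/devices/system."""
    with open(os.path.join(SYSFS_ROOT, relpath)) as handle:
        return handle.read().strip()


def parse_cpulist(text):
    """Parse a kernel-style CPU list such as '0-7,9-15' into a set of ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        first, _, last = part.partition("-")
        cpus.update(range(int(first), int(last or first) + 1))
    return cpus


def usable_cpus_for_node(node, read=read_sysfs):
    """Look up a node's CPUs at call time and drop any that are offline.

    The 'read' parameter is injectable so the logic can be exercised
    without touching the real sysfs tree.
    """
    node_cpus = parse_cpulist(read(f"node/node{node}/cpulist"))
    online = parse_cpulist(read("cpu/online"))
    return node_cpus & online
```

Because nothing is cached, a call made after "echo 0 > /sys/devices/system/cpu/cpu8/online" would no longer include CPU 8 for node 1. The trade-off is a few sysfs reads on every domain start.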

Comment 2 Martin Kletzander 2015-08-07 10:25:35 UTC
A similar issue should also occur if you just remove the CPU from cpuset.cpus of machine.slice (but leave it online).  The only difference will be that instead of 'Invalid argument' you'll get 'Permission denied'.  It's worth noting because two different fixes will be needed for this to work in libvirt, even though the errors look very similar.  Thanks for understanding.

Comment 3 Wayne Sun 2015-08-07 10:31:53 UTC
(In reply to Martin Kletzander from comment #1)
Running virsh capabilities did not fix the issue.

# echo 0 > /sys/devices/system/cpu/cpu8/online

# virsh capabilities
...

# echo $?
0

# virsh start virt-tests-vm1
error: Failed to start domain virt-tests-vm1
error: Invalid value '8-15' for 'cpuset.cpus': Invalid argument

# echo 1 > /sys/devices/system/cpu/cpu8/online 
# systemctl restart libvirtd

# head /sys/fs/cgroup/cpuset/{,machine.slice/}cpuset.{mems,cpus}
==> /sys/fs/cgroup/cpuset/cpuset.mems <==
0-1

==> /sys/fs/cgroup/cpuset/cpuset.cpus <==
0-15

==> /sys/fs/cgroup/cpuset/machine.slice/cpuset.mems <==
0-1

==> /sys/fs/cgroup/cpuset/machine.slice/cpuset.cpus <==
0-7,9-15

^ Here cpuset.cpus of machine.slice is not updated, even after cpu8 is brought back online and libvirtd is restarted.

As comment #2 predicted, the error is now the following:

# virsh start virt-tests-vm1
error: Failed to start domain virt-tests-vm1
error: Unable to write to '/sys/fs/cgroup/cpuset/machine.slice/machine-qemu\x2dvirt\x2dtests\x2dvm1.scope/emulator/cpuset.cpus': Permission denied
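
The 'Permission denied' case can be checked ahead of the write by comparing the requested CPU list with the parent cgroup's cpuset.cpus. This is only an illustrative sketch using the values shown above; cpus_outside_parent is a hypothetical helper, not a libvirt function.

```python
def parse_cpulist(text):
    """Parse a kernel-style CPU list such as '0-7,9-15' into a set of ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        first, _, last = part.partition("-")
        cpus.update(range(int(first), int(last or first) + 1))
    return cpus


def cpus_outside_parent(requested, parent):
    """Return the requested CPUs that the parent cpuset does not allow."""
    return parse_cpulist(requested) - parse_cpulist(parent)


# libvirt tries to write '8-15', but machine.slice only allows '0-7,9-15'.
print(sorted(cpus_outside_parent("8-15", "0-7,9-15")))  # prints [8]
```

An empty result would mean the requested list fits inside the parent cpuset; here CPU 8 is the one that causes the write to be rejected.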

Comment 4 Peter Krempa 2015-08-07 11:42:23 UTC
Looks like this will be closely related to a few fixes I'm planning to do, so I'll assign this to myself.

Comment 6 Jaroslav Suchanek 2019-04-24 12:26:26 UTC
This bug is going to be addressed in the next major release.

Comment 7 Jaroslav Suchanek 2020-02-11 13:05:23 UTC
This bug was closed deferred as a result of bug triage.

Please reopen if you disagree and provide justification why this bug should
get enough priority. Most important would be information about impact on
customer or layered product. Please indicate requested target release.