Bug 1251445 - Fail to start vm with placement auto after offline certain cpu without restart libvirtd
Status: ASSIGNED
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: libvirt
7.2
x86_64 Linux
medium Severity low
: rc
: ---
Assigned To: Peter Krempa
Luyao Huang
:
Reported: 2015-08-07 06:03 EDT by Wayne Sun
Modified: 2017-09-01 09:32 EDT
8 users

Type: Bug


Attachments: None
Description Wayne Sun 2015-08-07 06:03:37 EDT
Description of problem:
If a CPU in the node returned by numad is taken offline, starting a domain with automatic placement fails unless libvirtd is restarted first.


Version-Release number of selected component (if applicable):
# rpm -q libvirt kernel qemu-kvm-rhev
libvirt-1.2.17-3.el7.x86_64
kernel-3.10.0-302.el7.x86_64
qemu-kvm-rhev-2.3.0-15.el7.x86_64

# numactl --hardware
available: 2 nodes (0-1)
node 0 cpus: 0 1 2 3 4 5 6 7
node 0 size: 65514 MB
node 0 free: 60859 MB
node 1 cpus: 8 9 10 11 12 13 14 15
node 1 size: 65536 MB
node 1 free: 61649 MB
node distances:
node   0   1 
  0:  10  11 
  1:  11  10 

How reproducible:
always

Steps to Reproduce:
1. Replace numad with a wrapper that always returns a fixed node (node 1) for testing
# mv /usr/bin/numad /usr/bin/numad_
# vim /usr/bin/numad
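#!/usr/bin/env python

import os
import sys

# libvirt asks numad for placement advice as "numad -w VCPUS:MEMORY";
# always answer with node 1 so the test is deterministic.
if len(sys.argv) == 3 and sys.argv[1] == '-w':
    print(1)
    sys.exit(0)

# Any other invocation is passed through to the real numad binary.
os.execv('/usr/bin/numad_', sys.argv)
sys.exit(1)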

# chmod +x /usr/bin/numad

2. Take a CPU in node 1 (cpu8) offline
# cat /sys/devices/system/cpu/cpu8/online 
1

# echo 0 > /sys/devices/system/cpu/cpu8/online 

3. Start the VM with vCPU placement set to auto
# virsh dumpxml virt-tests-vm1
...
  <vcpu placement='auto'>2</vcpu>
...

# virsh start virt-tests-vm1
error: Failed to start domain virt-tests-vm1
error: Invalid value '8-15' for 'cpuset.cpus': Invalid argument
             
vm failed to start as '8-15' should be updated to '9-15' after offline cpu8
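The expected value can be derived by intersecting the node's CPU list with the host's currently online CPUs. A minimal Python sketch, assuming the standard Linux CPU list format (e.g. "0-7,9-15") and the standard sysfs paths; the helper names parse_cpulist, format_cpulist, usable_node_cpus and usable_node_cpus_from_sysfs are hypothetical and not libvirt functions:

```python
from pathlib import Path


def parse_cpulist(text):
    """Parse a Linux CPU list such as '0-7,9-15' into a set of ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue  # tolerate empty input or a trailing comma
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return cpus


def format_cpulist(cpus):
    """Format a set of ints back into compact range form, e.g. '9-15'."""
    ranges = []
    for cpu in sorted(cpus):
        if ranges and cpu == ranges[-1][1] + 1:
            ranges[-1][1] = cpu  # extend the current run
        else:
            ranges.append([cpu, cpu])  # start a new run
    return ",".join(str(a) if a == b else "%d-%d" % (a, b) for a, b in ranges)


def usable_node_cpus(node_cpulist, online_cpulist):
    """CPUs that belong to the NUMA node AND are currently online."""
    return parse_cpulist(node_cpulist) & parse_cpulist(online_cpulist)


def usable_node_cpus_from_sysfs(node):
    # Assumed standard Linux sysfs paths, read fresh on every call.
    node_list = Path("/sys/devices/system/node/node%d/cpulist" % node).read_text()
    online = Path("/sys/devices/system/cpu/online").read_text()
    return format_cpulist(usable_node_cpus(node_list, online))
```

With the values from this report, format_cpulist(usable_node_cpus("8-15", "0-7,9-15")) yields "9-15". An empty result would mean no CPU of the node is usable, which a caller would have to treat as an error rather than write to cpuset.cpus.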


Actual results:
The VM fails to start.

Expected results:
The VM starts, using the updated set of online CPUs.

Additional info:

If libvirtd is restarted after taking the CPU offline, starting the domain succeeds.
Comment 1 Martin Kletzander 2015-08-07 06:13:26 EDT
Thank you for filing this BZ.  Could you also check whether running "virsh capabilities" fixes the issue (without restarting libvirtd)?  If it does, the problem is that virCapabilitiesGetCpusForNodemask() does not take the actual state of the host into account but uses cached data instead.

It's hard to say at this point whether libvirt should watch for these changes with inotify or stop relying on the capabilities here.
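To illustrate the difference between cached and live data, here is a hypothetical Python model (not libvirt code; HostTopology and its methods are invented for illustration) contrasting a node-to-CPU snapshot taken once at daemon startup with a lookup that re-reads host state each time a domain starts. The live variant corresponds to the second option above (not relying on cached capabilities); it is a sketch, not the actual fix. The readers are injected so the model can be exercised without touching sysfs.

```python
class HostTopology:
    """Hypothetical model of a daemon's view of NUMA node -> CPU mapping."""

    def __init__(self, read_node_cpus, read_online_cpus):
        # Callables returning {node: set(cpus)} and set(cpus); in a real
        # daemon these would read sysfs, here they are injected.
        self._read_node_cpus = read_node_cpus
        self._read_online_cpus = read_online_cpus
        # Snapshot taken once "at startup", like cached capabilities.
        self._cached = {n: set(c) for n, c in read_node_cpus().items()}

    def cpus_for_nodes_cached(self, nodes):
        """Uses the startup snapshot only, so CPUs offlined later are still returned."""
        return set().union(*(self._cached[n] for n in nodes))

    def cpus_for_nodes_live(self, nodes):
        """Re-reads host state on every call and drops offline CPUs."""
        node_cpus = self._read_node_cpus()
        online = self._read_online_cpus()
        return set().union(*(node_cpus[n] for n in nodes)) & online
```

After cpu8 goes offline, the cached lookup for node 1 still returns 8-15 (the value that is rejected here), while the live lookup returns 9-15. Restarting the daemon rebuilds the snapshot, which matches the observation that a libvirtd restart makes the start succeed.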
Comment 2 Martin Kletzander 2015-08-07 06:25:35 EDT
A similar issue should also occur if you just remove the CPU from cpuset.cpus of machine.slice (but leave it online).  The only difference is that instead of 'Invalid argument' you'll get 'Permission denied'.  It's worth noting because two different fixes will be needed in libvirt, even though the errors look very similar.  Thanks for understanding.
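The two failure modes can be modelled separately. A hypothetical Python sketch, assuming cgroup v1 cpuset semantics in which writing an offline CPU to cpuset.cpus is rejected with EINVAL and writing a CPU outside the parent cgroup's cpuset.cpus is rejected with EACCES; this assumption is consistent with the 'Invalid argument' and 'Permission denied' errors in this report. The function names are illustrative only:

```python
import errno


def cpuset_write_error(requested, online, parent_cpus):
    """Predict the errno a cgroup v1 cpuset.cpus write would fail with.

    Models two assumed kernel checks:
      * every requested CPU must be online            -> EINVAL
      * requested must be a subset of parent's cpus   -> EACCES
    Returns 0 if neither check fails.
    """
    if not requested <= online:
        return errno.EINVAL
    if not requested <= parent_cpus:
        return errno.EACCES
    return 0


def safe_cpuset(requested, online, parent_cpus):
    """CPUs passing both checks; an empty set means no valid placement."""
    return requested & online & parent_cpus
```

Each check maps to one of the two fixes: filtering against the online CPUs addresses the offline case, and filtering against the parent cgroup's cpuset.cpus addresses the machine.slice case.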
Comment 3 Wayne Sun 2015-08-07 06:31:53 EDT
(In reply to Martin Kletzander from comment #1)
Running virsh capabilities did not help:

# echo 0 > /sys/devices/system/cpu/cpu8/online

# virsh capabilities
...

# echo $?
0

# virsh start virt-tests-vm1
error: Failed to start domain virt-tests-vm1
error: Invalid value '8-15' for 'cpuset.cpus': Invalid argument

# echo 1 > /sys/devices/system/cpu/cpu8/online 
# systemctl restart libvirtd

# head /sys/fs/cgroup/cpuset/{,machine.slice/}cpuset.{mems,cpus}
==> /sys/fs/cgroup/cpuset/cpuset.mems <==
0-1

==> /sys/fs/cgroup/cpuset/cpuset.cpus <==
0-15

==> /sys/fs/cgroup/cpuset/machine.slice/cpuset.mems <==
0-1

==> /sys/fs/cgroup/cpuset/machine.slice/cpuset.cpus <==
0-7,9-15

^ here cpuset.cpus is not updated even after restarting libvirtd

As comment #2 predicted, the error is now the following:

# virsh start virt-tests-vm1
error: Failed to start domain virt-tests-vm1
error: Unable to write to '/sys/fs/cgroup/cpuset/machine.slice/machine-qemu\x2dvirt\x2dtests\x2dvm1.scope/emulator/cpuset.cpus': Permission denied
Comment 4 Peter Krempa 2015-08-07 07:42:23 EDT
This looks closely related to a few fixes I'm planning to do, so I'll assign it to myself.
