Bug 1436544
| Summary: | [PPC] Failed to migration after CPU hotplug | ||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Product: | Red Hat Enterprise Linux 7 | Reporter: | Israel Pinto <ipinto> | ||||||||||
| Component: | libvirt | Assignee: | Libvirt Maintainers <libvirt-maint> | ||||||||||
| Status: | CLOSED DUPLICATE | QA Contact: | Virtualization Bugs <virt-bugs> | ||||||||||
| Severity: | high | Docs Contact: | |||||||||||
| Priority: | unspecified | ||||||||||||
| Version: | 7.3 | CC: | bugs, ipinto, mzamazal, pkrempa, rbalakri, tjelinek | ||||||||||
| Target Milestone: | pre-dev-freeze | Keywords: | AutomationBlocker, Regression | ||||||||||
| Target Release: | --- | ||||||||||||
| Hardware: | Unspecified | ||||||||||||
| OS: | Unspecified | ||||||||||||
| Whiteboard: | |||||||||||||
| Fixed In Version: | Doc Type: | If docs needed, set a value | |||||||||||
| Doc Text: | Story Points: | --- | |||||||||||
| Clone Of: | Environment: | ||||||||||||
| Last Closed: | 2017-03-30 14:49:50 UTC | Type: | Bug | ||||||||||
| Regression: | --- | Mount Type: | --- | ||||||||||
| Documentation: | --- | CRM: | |||||||||||
| Verified Versions: | Category: | --- | |||||||||||
| oVirt Team: | Virt | RHEL 7.3 requirements from Atomic Host: | |||||||||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||||||||
| Embargoed: | |||||||||||||
| Attachments: |
Created attachment 1266846 [details]
vdsm log
Are you sure this has worked before? I don't remember whether this has ever been tested. In which version did it work? Also, there is a libvirt error; can you please provide the libvirt and qemu logs?

If it is of any relevance, I can see that the NUMA policies on the two hosts differ.

Source:

  # numactl --show
  policy: default
  preferred node: current
  physcpubind: 0 8 16 24 32 40 48 56 64 72 80 88 96 104 112 120 128 136 144 152
  cpubind: 0 1 16 17
  nodebind: 0 1 16 17
  membind: 0 1 16 17

Destination:

  # numactl --show
  policy: default
  preferred node: current
  physcpubind: 0 8 16 24 32 40 48 56 64 72 80 88 96 104 112 120 128 136 144 152
  cpubind: 0 16
  nodebind: 0 16
  membind: 0 16

Compare this with the reported error:

  libvirtError: Invalid value '0-1,16-17' for 'cpuset.mems': Invalid argument

The two may be related. Changed NUMA policies could also explain why the bug was not present before.
Created attachment 1267539 [details]
Source host libvirtd log
Created attachment 1267540 [details]
Destination host libvirtd log
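The NUMA policy comparison in the comment above can be checked mechanically. The following is a minimal sketch, not part of the original report: it parses the text printed by `numactl --show` and lists the nodes that appear in a given field on the source host but not on the destination. The function names `parse_numactl_show` and `node_difference` are hypothetical helpers, and the sample strings are abbreviated copies of the output quoted above.

```python
def parse_numactl_show(text):
    """Parse `numactl --show` output into a {field: [values]} mapping."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # skip lines that are not "field: values"
        fields[key.strip()] = value.split()
    return fields


def node_difference(source_text, destination_text, field="membind"):
    """Return nodes listed for `field` on the source but not on the destination."""
    src = {int(n) for n in parse_numactl_show(source_text).get(field, [])}
    dst = {int(n) for n in parse_numactl_show(destination_text).get(field, [])}
    return sorted(src - dst)


# Abbreviated copies of the output quoted in the comment above.
SOURCE = """\
policy: default
preferred node: current
cpubind: 0 1 16 17
nodebind: 0 1 16 17
membind: 0 1 16 17
"""

DESTINATION = """\
policy: default
preferred node: current
cpubind: 0 16
nodebind: 0 16
membind: 0 16
"""

if __name__ == "__main__":
    # Nodes the source can bind memory to but the destination cannot.
    print(node_difference(SOURCE, DESTINATION))
```

For the outputs quoted above, the difference is nodes 1 and 17, which are exactly the extra nodes in the rejected value '0-1,16-17'.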
The bug seems to be in libvirt or below, reassigning to it.

The two hosts are in the NUMA configuration described in comment #4. The destination host has 4 NUMA nodes (0, 1, 16, 17), but 2 of them (1, 17) have inaccessible memory and the CPUs assigned to them have been offlined. The CPU state is as follows:

  # lscpu
  Architecture:          ppc64le
  Byte Order:            Little Endian
  CPU(s):                160
  On-line CPU(s) list:   0,8,16,24,32,40,48,56,64,72
  Off-line CPU(s) list:  1-7,9-15,17-23,25-31,33-39,41-47,49-55,57-63,65-71,73-159
  Thread(s) per core:    1
  Core(s) per socket:    5
  Socket(s):             2
  NUMA node(s):          4
  Model:                 2.0 (pvr 004b 0200)
  Model name:            POWER8E (raw), altivec supported
  L1d cache:             64K
  L1i cache:             32K
  L2 cache:              512K
  L3 cache:              8192K
  NUMA node0 CPU(s):     0,8,16,24,32
  NUMA node1 CPU(s):
  NUMA node16 CPU(s):    40,48,56,64,72
  NUMA node17 CPU(s):

Here are the details of how to reproduce the bug with virsh on that setup:

- On the source host, create a dummy.xml domain XML file with the following content:

  <domain type='kvm'>
    <name>dummy</name>
    <uuid>4f836db9-e87f-425b-9278-5693be04a978</uuid>
    <vcpu placement='static' current='1'>16</vcpu>
    <memory unit='KiB'>1048576</memory>
    <os>
      <type arch='ppc64le' machine='pseries-rhel7.3.0'>hvm</type>
    </os>
  </domain>

- Start the VM:

  virsh create dummy.xml

- Hot plug 1 CPU to it:

  virsh setvcpus dummy 2

- Try to migrate the VM:

  virsh migrate dummy qemu+tcp://root@DESTINATION-HOST/system --verbose --live --auto-converge --compressed --p2p --abort-on-error

The command fails with the following error:

  error: Invalid value '0-1,16-17' for 'cpuset.mems': Invalid argument

See the virsh-*.log attachments for libvirtd logs from the two hosts.

  libvirt-2.0.0-10.el7_3.5.ppc64le
  qemu-kvm-rhev-2.6.0-28.el7_3.8.ppc64le
  kernel 3.10.0-514.16.1.el7.ppc64le

Since the nodes have no memory, this is an instance of bug https://bugzilla.redhat.com/show_bug.cgi?id=1375268 . vCPU hotplug triggered this because the vCPUs are hotplugged right after the qemu process is started.

*** This bug has been marked as a duplicate of bug 1375268 ***
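To illustrate why the value '0-1,16-17' is rejected on the destination, here is a minimal sketch (an illustration, not libvirt's actual logic) that expands a kernel-style node list and reports which requested nodes have no memory. The helper names are hypothetical. The has-memory list "0,16" is an assumption derived from the description above (nodes 1 and 17 have inaccessible memory); on a live host it could presumably be read from /sys/devices/system/node/has_memory, assuming the running kernel exposes that file.

```python
def expand_list(spec):
    """Expand a kernel-style list such as '0-1,16-17' into a set of ints."""
    ids = set()
    for part in spec.strip().split(","):
        if not part:
            continue
        first, sep, last = part.partition("-")
        if sep:
            ids.update(range(int(first), int(last) + 1))
        else:
            ids.add(int(first))
    return ids


def nodes_without_memory(requested, has_memory):
    """Return requested nodes that are missing from the has-memory list."""
    return sorted(expand_list(requested) - expand_list(has_memory))


def read_has_memory(path="/sys/devices/system/node/has_memory"):
    """Read the list of memory-backed nodes (assumes the sysfs file exists)."""
    with open(path) as f:
        return f.read().strip()


if __name__ == "__main__":
    # "0,16" is assumed from the bug description, not read from a real host.
    print(nodes_without_memory("0-1,16-17", "0,16"))
```

A non-empty result means the requested cpuset.mems value names at least one memoryless node, which matches the "Invalid argument" failure reported here.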
Created attachment 1266845 [details]
engine.log

Description of problem:
In a PPC environment, hotplug a CPU to a VM and then migrate the VM. The migration fails.

Version-Release number of selected component (if applicable):
Engine: 4.1.1.6-0.1.el7
Host:
  OS Version: RHEL - 7.3 - 7.el7
  Kernel Version: 3.10.0 - 514.16.1.el7.ppc64le
  KVM Version: 2.6.0 - 28.el7_3.8
  LIBVIRT Version: libvirt-2.0.0-10.el7_3.5
  VDSM Version: vdsm-4.19.10-1.el7ev

How reproducible:
All the time

Steps to Reproduce:
1. Start a VM and hotplug CPUs until it has 4 CPUs
2. Migrate the VM

Actual results:
Migration failed.

Engine log:
2017-03-28 09:45:17,065+03 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (org.ovirt.thread.pool-6-thread-20) [5bbe997a] EVENT_ID: VM_MIGRATION_FAILED(65), Correlation ID: e0172605-487f-4b69-9e10-af13f2adb8f9, Job ID: c4ac7fe0-42c7-4e5a-b094-da30cb1d9a11, Call Stack: null, Custom Event ID: -1, Message: Migration failed (VM: golden_env_mixed_virtio_1_0, Source: host_mixed_2).

VDSM log:
2017-03-28 01:44:59,702-0500 INFO (migsrc/804785bb) [virt.vm] (vmId='804785bb-e857-4252-9979-60e0913d79c6') starting migration to qemu+tls://ibm-p8-rhevm-hv-01.klab.eng.bos.redhat.com/system with miguri tcp://10.16.160.28 (migration:480)
2017-03-28 01:45:01,149-0500 ERROR (migsrc/804785bb) [virt.vm] (vmId='804785bb-e857-4252-9979-60e0913d79c6') Invalid value '0-1,16-17' for 'cpuset.mems': Invalid argument (migration:287)
2017-03-28 01:45:01,172-0500 ERROR (migsrc/804785bb) [virt.vm] (vmId='804785bb-e857-4252-9979-60e0913d79c6') Failed to migrate (migration:429)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/virt/migration.py", line 411, in run
    self._startUnderlyingMigration(time.time())
  File "/usr/lib/python2.7/site-packages/vdsm/virt/migration.py", line 489, in _startUnderlyingMigration
    self._perform_with_downtime_thread(duri, muri)
  File "/usr/lib/python2.7/site-packages/vdsm/virt/migration.py", line 555, in _perform_with_downtime_thread
    self._perform_migration(duri, muri)
  File "/usr/lib/python2.7/site-packages/vdsm/virt/migration.py", line 528, in _perform_migration
    self._vm._dom.migrateToURI3(duri, params, flags)
  File "/usr/lib/python2.7/site-packages/vdsm/virt/virdomain.py", line 69, in f
    ret = attr(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/libvirtconnection.py", line 123, in wrapper
    ret = f(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/utils.py", line 941, in wrapper
    return func(inst, *args, **kwargs)
  File "/usr/lib64/python2.7/site-packages/libvirt.py", line 1939, in migrateToURI3
    if ret == -1: raise libvirtError ('virDomainMigrateToURI3() failed', dom=self)
libvirtError: Invalid value '0-1,16-17' for 'cpuset.mems': Invalid argument