Bug 1436544 - [PPC] Failed to migrate after CPU hotplug
Summary: [PPC] Failed to migrate after CPU hotplug
Keywords:
Status: CLOSED DUPLICATE of bug 1375268
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: libvirt
Version: 7.3
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: pre-dev-freeze
Target Release: ---
Assignee: Libvirt Maintainers
QA Contact: Virtualization Bugs
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2017-03-28 07:16 UTC by Israel Pinto
Modified: 2017-03-30 14:49 UTC
CC: 6 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-03-30 14:49:50 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)
engine.log (13.86 KB, application/zip), 2017-03-28 07:16 UTC, Israel Pinto
vdsm log (720.87 KB, application/zip), 2017-03-28 07:16 UTC, Israel Pinto
Source host libvirtd log (29.12 KB, application/x-xz), 2017-03-30 13:37 UTC, Milan Zamazal
Destination host libvirtd log (35.99 KB, application/x-xz), 2017-03-30 13:38 UTC, Milan Zamazal

Description Israel Pinto 2017-03-28 07:16:24 UTC
Created attachment 1266845 [details]
engine.log

Description of problem:
On a PPC environment, hot plug CPUs into a VM and then migrate it.
The migration fails.

Version-Release number of selected component (if applicable):
Engine: 4.1.1.6-0.1.el7
Host:
OS Version: RHEL - 7.3 - 7.el7
Kernel Version: 3.10.0-514.16.1.el7.ppc64le
KVM Version: 2.6.0-28.el7_3.8
LIBVIRT Version: libvirt-2.0.0-10.el7_3.5
VDSM Version: vdsm-4.19.10-1.el7ev

How reproducible:
All the time

Steps to Reproduce:
1. Start a VM and hot plug CPUs until it has 4 CPUs
2. Migrate the VM

Actual results:
Migration Failed

Engine log:
2017-03-28 09:45:17,065+03 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (org.ovirt.thread.pool-6-thread-20) [5bbe997a] EVENT_ID: VM_MIGRATION_FAILED(65), Correlation ID: e0172605-487f-4b69-9e10-af13f2adb8f9, Job ID: c4ac7fe0-42c7-4e5a-b094-da30cb1d9a11, Call Stack: null, Custom Event ID: -1, Message: Migration failed  (VM: golden_env_mixed_virtio_1_0, Source: host_mixed_2).

VDSM log: 
2017-03-28 01:44:59,702-0500 INFO  (migsrc/804785bb) [virt.vm] (vmId='804785bb-e857-4252-9979-60e0913d79c6') starting migration to qemu+tls://ibm-p8-rhevm-hv-01.klab.eng.bos.redhat.com/system with miguri tcp://10.16.160.28 (migration:480)
2017-03-28 01:45:01,149-0500 ERROR (migsrc/804785bb) [virt.vm] (vmId='804785bb-e857-4252-9979-60e0913d79c6') Invalid value '0-1,16-17' for 'cpuset.mems': Invalid argument (migration:287)
2017-03-28 01:45:01,172-0500 ERROR (migsrc/804785bb) [virt.vm] (vmId='804785bb-e857-4252-9979-60e0913d79c6') Failed to migrate (migration:429)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/virt/migration.py", line 411, in run
    self._startUnderlyingMigration(time.time())
  File "/usr/lib/python2.7/site-packages/vdsm/virt/migration.py", line 489, in _startUnderlyingMigration
    self._perform_with_downtime_thread(duri, muri)
  File "/usr/lib/python2.7/site-packages/vdsm/virt/migration.py", line 555, in _perform_with_downtime_thread
    self._perform_migration(duri, muri)
  File "/usr/lib/python2.7/site-packages/vdsm/virt/migration.py", line 528, in _perform_migration
    self._vm._dom.migrateToURI3(duri, params, flags)
  File "/usr/lib/python2.7/site-packages/vdsm/virt/virdomain.py", line 69, in f
    ret = attr(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/libvirtconnection.py", line 123, in wrapper
    ret = f(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/utils.py", line 941, in wrapper
    return func(inst, *args, **kwargs)
  File "/usr/lib64/python2.7/site-packages/libvirt.py", line 1939, in migrateToURI3
    if ret == -1: raise libvirtError ('virDomainMigrateToURI3() failed', dom=self)
libvirtError: Invalid value '0-1,16-17' for 'cpuset.mems': Invalid argument
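
As an illustrative sketch (not code from libvirt or VDSM): the failing value uses the cgroup cpuset list syntax, and expanding it against the destination's memory-backed nodes, assumed here to be {0, 16} per the numactl output in comment 4, shows which requested nodes are invalid:

```python
def parse_cpuset_list(s):
    """Expand cpuset list syntax like '0-1,16-17' into a set of node IDs."""
    nodes = set()
    for part in s.split(','):
        if '-' in part:
            lo, hi = part.split('-')
            nodes.update(range(int(lo), int(hi) + 1))
        else:
            nodes.add(int(part))
    return nodes

requested = parse_cpuset_list('0-1,16-17')  # value libvirt tried to write
memory_nodes = {0, 16}                      # assumed memory-backed nodes on the destination
print(sorted(requested - memory_nodes))     # nodes with no memory; writing them to
                                            # cpuset.mems fails with EINVAL
```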

Comment 1 Israel Pinto 2017-03-28 07:16:49 UTC
Created attachment 1266846 [details]
vdsm log

Comment 2 Tomas Jelinek 2017-03-28 08:28:00 UTC
Are you sure this worked before? I don't remember this ever being tested. In which version did it work?

Also, there is a libvirt error; can you please provide the libvirt and QEMU logs?

Comment 4 Milan Zamazal 2017-03-29 13:02:58 UTC
If it is of any relevance, I can see that there may be a difference in NUMA policies between the two hosts.

Source:
  
  # numactl --show
  policy: default
  preferred node: current
  physcpubind: 0 8 16 24 32 40 48 56 64 72 80 88 96 104 112 120 128 136 144 152 
  cpubind: 0 1 16 17 
  nodebind: 0 1 16 17 
  membind: 0 1 16 17 

Destination:

  # numactl --show
  policy: default
  preferred node: current
  physcpubind: 0 8 16 24 32 40 48 56 64 72 80 88 96 104 112 120 128 136 144 152 
  cpubind: 0 16 
  nodebind: 0 16 
  membind: 0 16 

Comparing with the reported error

  libvirtError: Invalid value '0-1,16-17' for 'cpuset.mems': Invalid argument

there may be a relation. Changed NUMA policies could also explain why the bug wasn't present before.
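
To make the suspected relation concrete, here is a tiny sketch (the values are copied from the numactl output above; this is my illustration, not project code): the destination's membind set is a strict subset of the source's, so a memory node set carried over from the source cannot be satisfied on the destination.

```python
# membind sets from `numactl --show` on each host
source_membind = {0, 1, 16, 17}
dest_membind = {0, 16}

# Nodes the source may use that the destination cannot bind memory to:
missing = sorted(source_membind - dest_membind)
print(missing)  # matches the nodes rejected in the cpuset.mems error
```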

Comment 5 Milan Zamazal 2017-03-30 13:37:54 UTC
Created attachment 1267539 [details]
Source host libvirtd log

Comment 6 Milan Zamazal 2017-03-30 13:38:26 UTC
Created attachment 1267540 [details]
Destination host libvirtd log

Comment 7 Milan Zamazal 2017-03-30 13:45:27 UTC
The bug seems to be in libvirt or below, reassigning to it.

The two hosts are in the NUMA configuration described in comment #4.  The destination host has 4 NUMA nodes (0, 1, 16, 17), but 2 of them (1 and 17) have no accessible memory, and the CPUs assigned to them have been taken offline.  The CPU state is as follows:

  # lscpu
  Architecture:          ppc64le
  Byte Order:            Little Endian
  CPU(s):                160
  On-line CPU(s) list:   0,8,16,24,32,40,48,56,64,72
  Off-line CPU(s) list:  1-7,9-15,17-23,25-31,33-39,41-47,49-55,57-63,65-71,73-159
  Thread(s) per core:    1
  Core(s) per socket:    5
  Socket(s):             2
  NUMA node(s):          4
  Model:                 2.0 (pvr 004b 0200)
  Model name:            POWER8E (raw), altivec supported
  L1d cache:             64K
  L1i cache:             32K
  L2 cache:              512K
  L3 cache:              8192K
  NUMA node0 CPU(s):     0,8,16,24,32
  NUMA node1 CPU(s):     
  NUMA node16 CPU(s):    40,48,56,64,72
  NUMA node17 CPU(s):    

Here is how to reproduce the bug with virsh on that setup:

- On the source host, create a domain XML file dummy.xml with the following content:

  <domain type='kvm'>
    <name>dummy</name>
    <uuid>4f836db9-e87f-425b-9278-5693be04a978</uuid>
    <vcpu placement='static' current='1'>16</vcpu>
    <memory unit='KiB'>1048576</memory>
    <os>
      <type arch='ppc64le' machine='pseries-rhel7.3.0'>hvm</type>
    </os>
  </domain>

- Start the VM:

  virsh create dummy.xml

- Hot plug 1 CPU to it:

  virsh setvcpus dummy 2

- Try to migrate the VM:

  virsh migrate dummy qemu+tcp://root@DESTINATION-HOST/system --verbose --live --auto-converge --compressed --p2p --abort-on-error

The command fails with the following error:

  error: Invalid value '0-1,16-17' for 'cpuset.mems': Invalid argument

See virsh-*.log attachments for libvirtd logs from the two hosts.

Comment 8 Milan Zamazal 2017-03-30 13:54:24 UTC
libvirt-2.0.0-10.el7_3.5.ppc64le
qemu-kvm-rhev-2.6.0-28.el7_3.8.ppc64le
kernel 3.10.0-514.16.1.el7.ppc64le

Comment 10 Peter Krempa 2017-03-30 14:49:50 UTC
Since the nodes have no memory in them, this is an instance of bug https://bugzilla.redhat.com/show_bug.cgi?id=1375268. vCPU hotplug triggered this because vCPUs are hot plugged right after the QEMU process is started.

*** This bug has been marked as a duplicate of bug 1375268 ***

