Bug 1285474 - [ppc64le] VM migration fails with qemu-kvm error on 'spapr/htab'
Status: CLOSED DUPLICATE of bug 1282833
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: qemu-kvm-rhev
Version: 7.2
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: rc
Target Release: ---
Assigned To: David Gibson
QA Contact: Virtualization Bugs
Depends On:
Blocks: 1284775 RHV4.1PPC 1305498 RHEV4.0PPC
Reported: 2015-11-25 11:47 EST by Ilanit Stein
Modified: 2016-07-25 10:18 EDT
CC: 13 users

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2015-11-30 22:05:43 EST
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments:
vdsm log (716.71 KB, application/x-gzip), 2015-11-25 11:55 EST, Ilanit Stein
engine log (108.31 KB, application/x-gzip), 2015-11-25 11:56 EST, Ilanit Stein
qemu log (2.51 KB, text/plain), 2015-11-25 11:56 EST, Ilanit Stein

Description Ilanit Stein 2015-11-25 11:47:27 EST
Description of problem:
VM migration fails with:
2015-11-25T15:51:48.366843Z qemu-kvm: error while loading state for instance 0x0 of device 'spapr/htab'
2015-11-25T15:51:48.366995Z qemu-kvm: load of migration failed: Invalid argument

Version-Release number of selected component (if applicable):
engine - rhevm 3.6.0-20

host-
vdsm 4.17.10.1-0.el7ev
libvirt 1.2.17-13.el7_2.2.ppc64le

qemu-img-rhev-2.3.0-31.el7_2.3.ppc64le
qemu-kvm-common-rhev-2.3.0-31.el7_2.3.ppc64le
ipxe-roms-qemu-20130517-7.gitc4bce43.el7.noarch
libvirt-daemon-driver-qemu-1.2.17-13.el7_2.2.ppc64le
qemu-kvm-tools-rhev-2.3.0-31.el7_2.3.ppc64le
qemu-kvm-rhev-2.3.0-31.el7_2.3.ppc64le

kernel -  3.10.0-327.2.1.el7.ppc64le

How reproducible:
Occurred on one setup with two ppc hosts. Did not occur on a second setup with two other ppc hosts running the same versions as above.

vdsm.log error:
Thread-1060::DEBUG::2015-11-25 11:05:33,016::migration::558::virt.vm::(stop) vmId=`3ad53ed2-6bf9-4494-9db5-d7adb7854256`::stopping migration monitor thread
Thread-1060::DEBUG::2015-11-25 11:05:33,016::migration::453::virt.vm::(stop) vmId=`3ad53ed2-6bf9-4494-9db5-d7adb7854256`::stopping migration downtime thread
Thread-1060::ERROR::2015-11-25 11:05:33,017::migration::208::virt.vm::(_recover) vmId=`3ad53ed2-6bf9-4494-9db5-d7adb7854256`::internal error: early end of file from monitor: possible problem:
2015-11-25T16:05:32.585940Z qemu-kvm: error while loading state for instance 0x0 of device 'spapr/htab'
2015-11-25T16:05:32.586070Z qemu-kvm: load of migration failed: Invalid argument
Thread-1061::DEBUG::2015-11-25 11:05:33,016::migration::450::virt.vm::(run) vmId=`3ad53ed2-6bf9-4494-9db5-d7adb7854256`::migration downtime thread exiting
Thread-1060::DEBUG::2015-11-25 11:05:33,017::stompreactor::389::jsonrpc.AsyncoreClient::(send) Sending response
Thread-1060::DEBUG::2015-11-25 11:05:33,064::__init__::206::jsonrpc.Notification::(emit) Sending event {"params": {"notify_time": 42978047030, "3ad53ed2-6bf9-4494-9db5-d7adb7854256": {"status": "Migration Source"}}, "jsonrpc": "2.0", "method": "|virt|VM_status|3ad53ed2-6bf9-4494-9db5-d7adb7854256"}
Thread-1060::ERROR::2015-11-25 11:05:33,065::migration::310::virt.vm::(run) vmId=`3ad53ed2-6bf9-4494-9db5-d7adb7854256`::Failed to migrate
Traceback (most recent call last):
  File "/usr/share/vdsm/virt/migration.py", line 294, in run
    self._startUnderlyingMigration(time.time())
  File "/usr/share/vdsm/virt/migration.py", line 364, in _startUnderlyingMigration
    self._perform_migration(duri, muri)
  File "/usr/share/vdsm/virt/migration.py", line 403, in _perform_migration
    self._vm._dom.migrateToURI3(duri, params, flags)
  File "/usr/share/vdsm/virt/virdomain.py", line 68, in f
    ret = attr(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/libvirtconnection.py", line 124, in wrapper
    ret = f(*args, **kwargs)
  File "/usr/lib64/python2.7/site-packages/libvirt.py", line 1836, in migrateToURI3
    if ret == -1: raise libvirtError ('virDomainMigrateToURI3() failed', dom=self)
libvirtError: internal error: early end of file from monitor: possible problem:
2015-11-25T16:05:32.585940Z qemu-kvm: error while loading state for instance 0x0 of device 'spapr/htab'
2015-11-25T16:05:32.586070Z qemu-kvm: load of migration failed: Invalid argument
Comment 1 Ilanit Stein 2015-11-25 11:55 EST
Created attachment 1098917 [details]
vdsm log
Comment 2 Ilanit Stein 2015-11-25 11:56 EST
Created attachment 1098918 [details]
engine log
Comment 3 Ilanit Stein 2015-11-25 11:56 EST
Created attachment 1098919 [details]
qemu log
Comment 4 Ilanit Stein 2015-11-26 06:29:44 EST
Checked VM migration again after disabling memory hot plug, restarting the engine, and restarting the VM (power off & run again), and the migration was successful.

As in bug 1282833, memory hot plug was the root cause in this bug as well.

Disabled memory hot plug by:
engine=# insert into vdc_options (option_name, option_value, version)  VALUES ('HotPlugMemorySupported', '{"x86_64":"true","ppc64":"false"}' ,'3.6');
INSERT 0 1
engine=# select * from vdc_options where option_name ='HotPlugMemorySupported';
 option_id |      option_name       |            option_value            | version 
-----------+------------------------+------------------------------------+---------
       178 | HotPlugMemorySupported | {"x86_64":"false","ppc64":"false"} | 3.0
       179 | HotPlugMemorySupported | {"x86_64":"false","ppc64":"false"} | 3.1
       180 | HotPlugMemorySupported | {"x86_64":"false","ppc64":"false"} | 3.2
       181 | HotPlugMemorySupported | {"x86_64":"false","ppc64":"false"} | 3.3
       182 | HotPlugMemorySupported | {"x86_64":"false","ppc64":"false"} | 3.4
       183 | HotPlugMemorySupported | {"x86_64":"false","ppc64":"false"} | 3.5
       840 | HotPlugMemorySupported | {"x86_64":"true","ppc64":"false"}  | 3.6
(7 rows)
Comment 5 Michal Skrivanek 2015-11-26 06:31:52 EST
This seems to be related to memory hotplug: the 1 TB maxmem size we use for all VMs.
Comment 6 Qunfang Zhang 2015-11-27 03:07:04 EST
Reproduced the issue on the following versions:

Host A (256G mem):

kernel-3.10.0-327.4.1.el7.ppc64le
qemu-kvm-rhev-2.3.0-31.el7_2.3.ppc64le

Host B (128G mem):

kernel-3.10.0-327.2.1.el7.ppc64le
qemu-kvm-rhev-2.3.0-31.el7_2.3.ppc64le


# /usr/libexec/qemu-kvm -name test -machine pseries,accel=kvm,usb=off -m 4G,slots=4,maxmem=1024G -smp 4,sockets=1,cores=4,threads=1 -uuid 8aeab7e2-f341-4f8c-80e8-59e2968d85c2 -realtime mlock=off -nodefaults -monitor stdio -rtc base=utc -device virtio-scsi-pci,id=scsi0,bus=pci.0 -drive file=RHEL-7.2-20151015.0-Server-ppc64le.qcow2,if=none,id=drive-scsi0-0-0-0,format=qcow2,cache=none -device scsi-hd,bus=scsi0.0,drive=drive-scsi0-0-0-0,bootindex=1,id=scsi0-0-0-0  -drive if=none,id=drive-scsi0-0-1-0,readonly=on,format=raw -device scsi-cd,bus=scsi0.0,drive=drive-scsi0-0-1-0,bootindex=2,id=scsi0-0-1-0 -vnc :10 -msg timestamp=on -usb -device usb-tablet,id=tablet1  -vga std -qmp tcp:0:4666,server,nowait -netdev tap,id=hostnet1,script=/etc/qemu-ifup,vhost=on -device virtio-net-pci,netdev=hostnet1,id=net1,mac=00:54:5a:5f:5b:5c -incoming tcp:0:5800
QEMU 2.3.0 monitor - type 'help' for more information
(qemu) 
(qemu) 2015-11-27T08:05:15.985197Z qemu-kvm: error while loading state for instance 0x0 of device 'spapr/htab'
2015-11-27T08:05:15.985265Z qemu-kvm: load of migration failed: Invalid argument


After changing maxmem from 1024G to 512G, this issue does not happen.
Comment 7 David Gibson 2015-11-30 22:05:43 EST
This is essentially the same problem as bug 1282833 - the destination host cannot allocate a hash page table the same size as the guest had on the source host.  

Although it's allocated outside the guest, the hash page table size is visible to the guest, and so there's no way to migrate if it has a different size on source and destination.
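For context on why maxmem (rather than the currently plugged memory) drives this failure: QEMU sizes the pseries guest's hash page table from the maximum memory, aiming for roughly 1/128 of maxmem rounded up to a power of two. The sketch below illustrates that arithmetic; the 1/128 ratio and minimum shift are assumptions based on QEMU's spapr HPT sizing heuristic, not figures taken from this report:

```python
def hpt_size_bytes(maxmem_bytes, min_shift=18):
    """Approximate the hash page table size requested for a pseries
    guest: ~1/128 of maximum memory, rounded up to a power of two,
    with a floor of 2**min_shift bytes (assumed constants)."""
    target = maxmem_bytes // 128
    shift = min_shift
    while (1 << shift) < target:
        shift += 1
    return 1 << shift

GiB = 1 << 30
# With maxmem=1024G the destination host must find an ~8 GiB
# contiguous allocation for the HPT; with maxmem=512G, ~4 GiB.
print(hpt_size_bytes(1024 * GiB) // GiB)  # 8
print(hpt_size_bytes(512 * GiB) // GiB)   # 4
```

Under this model, if the destination host cannot satisfy that contiguous allocation, the incoming guest ends up with a smaller HPT than the source, and the 'spapr/htab' state load aborts as seen in the logs above.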

*** This bug has been marked as a duplicate of bug 1282833 ***