Bug 1285474

Summary: [ppc64le] VM migration fails with qemu-kvm error on 'spapr/htab'
Product: Red Hat Enterprise Linux 7
Component: qemu-kvm-rhev
Version: 7.2
Status: CLOSED DUPLICATE
Severity: medium
Priority: unspecified
Reporter: Ilanit Stein <istein>
Assignee: David Gibson <dgibson>
QA Contact: Virtualization Bugs <virt-bugs>
CC: gklein, hannsj_uhl, knoel, mazhang, michal.skrivanek, michen, qzhang, rbalakri, shuyu, virt-maint, xuhan, xuma, zhengtli
Target Milestone: rc
Hardware: Unspecified
OS: Unspecified
Doc Type: Bug Fix
Type: Bug
Last Closed: 2015-12-01 03:05:43 UTC
Bug Depends On: 1282833    
Bug Blocks: 1284775, 1305498, 1308609, 1359843    
Attachments:
Description    Flags
vdsm log       none
engine log     none
qemu log       none

Description Ilanit Stein 2015-11-25 16:47:27 UTC
Description of problem:
VM migration fails with:
2015-11-25T15:51:48.366843Z qemu-kvm: error while loading state for instance 0x0 of device 'spapr/htab'
2015-11-25T15:51:48.366995Z qemu-kvm: load of migration failed: Invalid argument

Version-Release number of selected component (if applicable):
engine: rhevm 3.6.0-20

host:
vdsm 4.17.10.1-0.el7ev
libvirt 1.2.17-13.el7_2.2.ppc64le

qemu-img-rhev-2.3.0-31.el7_2.3.ppc64le
qemu-kvm-common-rhev-2.3.0-31.el7_2.3.ppc64le
ipxe-roms-qemu-20130517-7.gitc4bce43.el7.noarch
libvirt-daemon-driver-qemu-1.2.17-13.el7_2.2.ppc64le
qemu-kvm-tools-rhev-2.3.0-31.el7_2.3.ppc64le
qemu-kvm-rhev-2.3.0-31.el7_2.3.ppc64le

kernel: 3.10.0-327.2.1.el7.ppc64le

How reproducible:
Occurred on one setup with two ppc hosts. Did not occur on a second setup with two other ppc hosts running the same versions as above.

vdsm.log error:
Thread-1060::DEBUG::2015-11-25 11:05:33,016::migration::558::virt.vm::(stop) vmId=`3ad53ed2-6bf9-4494-9db5-d7adb7854256`::stopping migration monitor thread
Thread-1060::DEBUG::2015-11-25 11:05:33,016::migration::453::virt.vm::(stop) vmId=`3ad53ed2-6bf9-4494-9db5-d7adb7854256`::stopping migration downtime thread
Thread-1060::ERROR::2015-11-25 11:05:33,017::migration::208::virt.vm::(_recover) vmId=`3ad53ed2-6bf9-4494-9db5-d7adb7854256`::internal error: early end of file from monitor: possible problem:
2015-11-25T16:05:32.585940Z qemu-kvm: error while loading state for instance 0x0 of device 'spapr/htab'
2015-11-25T16:05:32.586070Z qemu-kvm: load of migration failed: Invalid argument
Thread-1061::DEBUG::2015-11-25 11:05:33,016::migration::450::virt.vm::(run) vmId=`3ad53ed2-6bf9-4494-9db5-d7adb7854256`::migration downtime thread exiting
Thread-1060::DEBUG::2015-11-25 11:05:33,017::stompreactor::389::jsonrpc.AsyncoreClient::(send) Sending response
Thread-1060::DEBUG::2015-11-25 11:05:33,064::__init__::206::jsonrpc.Notification::(emit) Sending event {"params": {"notify_time": 42978047030, "3ad53ed2-6bf9-4494-9db5-d7adb7854256": {"status": "Migration Source"}}, "jsonrpc": "2.0", "method": "|virt|VM_status|3ad53ed2-6bf9-4494-9db5-d7adb7854256"}
Thread-1060::ERROR::2015-11-25 11:05:33,065::migration::310::virt.vm::(run) vmId=`3ad53ed2-6bf9-4494-9db5-d7adb7854256`::Failed to migrate
Traceback (most recent call last):
  File "/usr/share/vdsm/virt/migration.py", line 294, in run
    self._startUnderlyingMigration(time.time())
  File "/usr/share/vdsm/virt/migration.py", line 364, in _startUnderlyingMigration
    self._perform_migration(duri, muri)
  File "/usr/share/vdsm/virt/migration.py", line 403, in _perform_migration
    self._vm._dom.migrateToURI3(duri, params, flags)
  File "/usr/share/vdsm/virt/virdomain.py", line 68, in f
    ret = attr(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/libvirtconnection.py", line 124, in wrapper
    ret = f(*args, **kwargs)
  File "/usr/lib64/python2.7/site-packages/libvirt.py", line 1836, in migrateToURI3
    if ret == -1: raise libvirtError ('virDomainMigrateToURI3() failed', dom=self)
libvirtError: internal error: early end of file from monitor: possible problem:
2015-11-25T16:05:32.585940Z qemu-kvm: error while loading state for instance 0x0 of device 'spapr/htab'
2015-11-25T16:05:32.586070Z qemu-kvm: load of migration failed: Invalid argument
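
For triaging similar failures, a minimal Python sketch that scans a libvirtError message or qemu log for the "error while loading state" line above and reports which device section failed to load. The helper names and the regex are hypothetical, written only against the log format quoted in this report; they are not part of vdsm or libvirt.

```python
import re
from typing import List, NamedTuple

# Hypothetical pattern, based on the qemu-kvm lines quoted in this report:
# "<timestamp> qemu-kvm: error while loading state for instance 0x0 of device 'spapr/htab'"
_LOAD_STATE_ERROR = re.compile(
    r"^(?P<timestamp>\S+) qemu-kvm: error while loading state "
    r"for instance (?P<instance>0x[0-9a-fA-F]+) of device '(?P<device>[^']+)'$"
)


class LoadStateError(NamedTuple):
    timestamp: str
    instance: int
    device: str


def find_load_state_errors(text: str) -> List[LoadStateError]:
    """Return every device-state load failure found in a log or error message."""
    errors = []
    for line in text.splitlines():
        match = _LOAD_STATE_ERROR.match(line.strip())
        if match:
            errors.append(LoadStateError(
                timestamp=match.group("timestamp"),
                instance=int(match.group("instance"), 16),  # "0x0" -> 0
                device=match.group("device"),
            ))
    return errors


def is_htab_migration_failure(text: str) -> bool:
    """True if any failed section is the sPAPR hash page table ('spapr/htab')."""
    return any(err.device == "spapr/htab" for err in find_load_state_errors(text))
```

Applied to the libvirtError message in the traceback above, this should find a single failure for device 'spapr/htab', instance 0; the following "load of migration failed" line is a consequence, not a separate device error, so it is not matched.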

Comment 1 Ilanit Stein 2015-11-25 16:55:49 UTC
Created attachment 1098917
vdsm log

Comment 2 Ilanit Stein 2015-11-25 16:56:15 UTC
Created attachment 1098918
engine log

Comment 3 Ilanit Stein 2015-11-25 16:56:50 UTC
Created attachment 1098919
qemu log

Comment 4 Ilanit Stein 2015-11-26 11:29:44 UTC
Checked VM migration again after disabling memory hot plug, restarting the engine, and restarting the VM (power off and run again); migration was successful.

As in bug 1282833, memory hot plug was the root cause here as well.

Memory hot plug was disabled with:
engine=# insert into vdc_options (option_name, option_value, version)  VALUES ('HotPlugMemorySupported', '{"x86_64":"true","ppc64":"false"}' ,'3.6');
INSERT 0 1
engine=# select * from vdc_options where option_name ='HotPlugMemorySupported';
 option_id |      option_name       |            option_value            | version 
-----------+------------------------+------------------------------------+---------
       178 | HotPlugMemorySupported | {"x86_64":"false","ppc64":"false"} | 3.0
       179 | HotPlugMemorySupported | {"x86_64":"false","ppc64":"false"} | 3.1
       180 | HotPlugMemorySupported | {"x86_64":"false","ppc64":"false"} | 3.2
       181 | HotPlugMemorySupported | {"x86_64":"false","ppc64":"false"} | 3.3
       182 | HotPlugMemorySupported | {"x86_64":"false","ppc64":"false"} | 3.4
       183 | HotPlugMemorySupported | {"x86_64":"false","ppc64":"false"} | 3.5
       840 | HotPlugMemorySupported | {"x86_64":"true","ppc64":"false"}  | 3.6
(7 rows)

Comment 5 Michal Skrivanek 2015-11-26 11:31:52 UTC
This seems to be related to memory hot plug, specifically the 1 TB maxmem size we use for all VMs.

Comment 6 Qunfang Zhang 2015-11-27 08:07:04 UTC
Reproduced the issue on the following version:

Host A (256G mem):

kernel-3.10.0-327.4.1.el7.ppc64le
qemu-kvm-rhev-2.3.0-31.el7_2.3.ppc64le

Host B (128G mem):

kernel-3.10.0-327.2.1.el7.ppc64le
qemu-kvm-rhev-2.3.0-31.el7_2.3.ppc64le


# /usr/libexec/qemu-kvm -name test -machine pseries,accel=kvm,usb=off -m 4G,slots=4,maxmem=1024G -smp 4,sockets=1,cores=4,threads=1 -uuid 8aeab7e2-f341-4f8c-80e8-59e2968d85c2 -realtime mlock=off -nodefaults -monitor stdio -rtc base=utc -device virtio-scsi-pci,id=scsi0,bus=pci.0 -drive file=RHEL-7.2-20151015.0-Server-ppc64le.qcow2,if=none,id=drive-scsi0-0-0-0,format=qcow2,cache=none -device scsi-hd,bus=scsi0.0,drive=drive-scsi0-0-0-0,bootindex=1,id=scsi0-0-0-0  -drive if=none,id=drive-scsi0-0-1-0,readonly=on,format=raw -device scsi-cd,bus=scsi0.0,drive=drive-scsi0-0-1-0,bootindex=2,id=scsi0-0-1-0 -vnc :10 -msg timestamp=on -usb -device usb-tablet,id=tablet1  -vga std -qmp tcp:0:4666,server,nowait -netdev tap,id=hostnet1,script=/etc/qemu-ifup,vhost=on -device virtio-net-pci,netdev=hostnet1,id=net1,mac=00:54:5a:5f:5b:5c -incoming tcp:0:5800
QEMU 2.3.0 monitor - type 'help' for more information
(qemu) 
(qemu) 2015-11-27T08:05:15.985197Z qemu-kvm: error while loading state for instance 0x0 of device 'spapr/htab'
2015-11-27T08:05:15.985265Z qemu-kvm: load of migration failed: Invalid argument


After changing maxmem from 1024G to 512G, the issue does not occur.

Comment 7 David Gibson 2015-12-01 03:05:43 UTC
This is essentially the same problem as bug 1282833 - the destination host cannot allocate a hash page table the same size as the guest had on the source host.  

Although it's allocated outside the guest, the hash page table size is visible to the guest, and so there's no way to migrate if it has a different size on source and destination.
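
The observation in Comment 6 (maxmem=1024G fails, 512G works) can be made concrete with a rough sizing calculation. The Python sketch below rests on two assumptions that are not stated in this report: that the pseries machine in this QEMU version sized the HPT at about 1/128 of maxmem, rounded up to a power of two from the 256 KiB architected minimum; and that the host kernel can only hand out HPTs from a contiguous region reserved at boot, taken here as 5% of host RAM (believed to be the ppc64 KVM default for the `kvm_cma_resv_ratio` boot parameter). All function names are hypothetical and the results are illustrative, not measured on these hosts.

```python
# Sketch only: the sizing rule and the 5% reservation are assumptions
# made for illustration, not values taken from this bug report.

GiB = 1 << 30


def hpt_shift_for_maxmem(maxmem_bytes: int) -> int:
    """Assumed pseries rule: smallest power-of-two HPT that is at least
    1/128 of maxmem, starting from the 256 KiB architected minimum."""
    shift = 18                                   # 2**18 bytes = 256 KiB
    while shift <= 46:
        if (1 << (shift + 7)) >= maxmem_bytes:   # 2**(shift+7) == 128 * HPT size
            break
        shift += 1
    return shift


def hpt_bytes_for_maxmem(maxmem_bytes: int) -> int:
    return 1 << hpt_shift_for_maxmem(maxmem_bytes)


def host_can_allocate(hpt_bytes: int, host_ram_bytes: int, reserve_percent: int = 5) -> bool:
    """Hypothetical check: the HPT must fit in a contiguous pool the host
    kernel reserves at boot, assumed to be reserve_percent of host RAM."""
    return hpt_bytes * 100 <= host_ram_bytes * reserve_percent


if __name__ == "__main__":
    hosts = {"Host A": 256 * GiB, "Host B": 128 * GiB}   # RAM sizes from Comment 6
    for maxmem_gib in (1024, 512):
        hpt = hpt_bytes_for_maxmem(maxmem_gib * GiB)
        for name, ram in hosts.items():
            verdict = "fits" if host_can_allocate(hpt, ram) else "does not fit"
            print(f"maxmem={maxmem_gib}G HPT={hpt // GiB}G {name}: {verdict}")
```

Under these assumptions, maxmem=1024G asks for an 8 GiB table, which would fit in Host A's reserved pool (about 12.8 GiB) but not in Host B's (about 6.4 GiB). Halving maxmem halves the table to 4 GiB, which fits on both hosts. This is consistent with Comment 6, and with the point above that the HPT size must match on source and destination.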

*** This bug has been marked as a duplicate of bug 1282833 ***