Bug 1288570 - Live migration of instances with >=64GB RAM is not successful [NEEDINFO]
Status: CLOSED INSUFFICIENT_DATA
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-nova
Version: 5.0 (RHEL 6)
Hardware: x86_64 Linux
Priority: high    Severity: high
Target Milestone: ---
Target Release: 5.0 (RHEL 6)
Assigned To: Eoghan Glynn
QA Contact: nlevinki
Keywords: Reopened, Unconfirmed, ZStream
Depends On:
Blocks:
 
Reported: 2015-12-04 10:48 EST by Matt Flusche
Modified: 2016-01-22 10:25 EST
CC List: 14 users

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2016-01-22 10:25:44 EST
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
sgordon: needinfo? (chhudson)


Attachments

None
Description Matt Flusche 2015-12-04 10:48:32 EST
Description of problem:
Live migration of instances with >=64GB RAM hangs and never completes.

Version-Release number of selected component (if applicable):

python-nova-2014.1.4-4.el6ost.noarch
openstack-nova-common-2014.1.4-4.el6ost.noarch
openstack-nova-compute-2014.1.4-4.el6ost.noarch
libvirt-0.10.2-29.el6_5.8.x86_64


How reproducible:

100%  (for customer)

Steps to Reproduce:
1. Start a nova live migration of an instance with >=64GB RAM (example command below).
2. The migration hangs until it is cancelled.
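
For reference, the migration is kicked off with the standard nova CLI; the instance and host names below are placeholders, not values from the customer environment:

  # Live-migrate the large-memory instance to another compute node.
  nova live-migration <instance-name-or-uuid> <target-compute-host>

  # Check the migration status from the nova side.
  nova show <instance-name-or-uuid> | grep -i status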

Actual results:

live migration hangs


Expected results:

completed live migration


Additional info:

When running virsh domjobinfo, the amount of RAM remaining to transfer gets to near zero, but the migration never finishes.
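
Migration progress was watched on the source compute node with virsh; the domain name below is a placeholder:

  # Show live migration job progress for the guest (run on the source hypervisor).
  virsh domjobinfo <domain>

  # Or poll it continuously:
  watch -n 1 virsh domjobinfo <domain>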

Debug logs for libvirt seem to indicate the RAM transfer completes.  I could not locate the source of the hang in the log.
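
For completeness, the monitor traffic below was captured with libvirtd debug logging roughly along these lines (the exact filter string is an assumption on my part, not taken from the customer environment):

  # /etc/libvirt/libvirtd.conf - enable debug logging for the QEMU driver/monitor
  log_filters="1:qemu 1:libvirt"
  log_outputs="1:file:/var/log/libvirt/libvirtd.log"
  # Restart libvirtd afterwards (RHEL 6): service libvirtd restart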

2015-12-03 21:41:16.867+0000: 21997: debug : qemuMonitorIOProcess:354 : QEMU_MONITOR_IO_PROCESS: mon=0x7fe7f4000b90 buf={"return": {"status": "active", "total-time": 675475, "ram": {"total": 68736647168, "remaining": 54284288, "transferred": 90642667413}}, "id": "libvirt-8302"}
2015-12-03 21:41:16.930+0000: 21997: debug : qemuMonitorIOProcess:354 : QEMU_MONITOR_IO_PROCESS: mon=0x7fe7f4000b90 buf={"return": {"status": "active", "total-time": 675538, "ram": {"total": 68736647168, "remaining": 33681408, "transferred": 90663270293}}, "id": "libvirt-8303"}
2015-12-03 21:41:16.967+0000: 21997: debug : qemuMonitorIOProcess:354 : QEMU_MONITOR_IO_PROCESS: mon=0x7fe7f4000b90 buf={"timestamp": {"seconds": 1449178876, "microseconds": 967271}, "event": "STOP"}
2015-12-03 21:41:17.290+0000: 21997: debug : qemuMonitorIOProcess:354 : QEMU_MONITOR_IO_PROCESS: mon=0x7fe7f4000b90 buf={"return": {"status": "completed", "downtime": 336, "total-time": 675896, "ram": {"total": 68736647168, "remaining": 0, "transferred": 90691304697}}, "id": "libvirt-8304"}
Comment 9 Sahid Ferdjaoui 2015-12-15 09:08:24 EST
We were not able to reproduce the case, and the error disappeared after a software update, so we could not confirm which component the error came from.

Please feel free to reopen it and update component if you have further information.

Thanks,
s.
