Description of problem:
Live migration between hosts of the same cluster fails. The source is a CentOS 7.3 node, the target is a CentOS 7.2 node.
Version-Release number of selected component (if applicable):
centos 7.2 host:
- libvirt 1.2.17-13.el7_2.6
- qemu 2.3.0-31.el7.21.1
centos 7.3 host:
- libvirt 2.0.0-10.el7_3.2
- qemu 2.6.0-27.1.el7
- ovirt 4.0.6
Steps to Reproduce:
1. Start a VM on the CentOS 7.3 node
2. Live migrate the VM to the CentOS 7.2 node
Migration does not finish. Cancelled after 6 hours.
Logs of source vdsm and target vdsm attached. Live migrated VM is colvm60. Processing started at ~ 20:15:22
Created attachment 1241559
Created attachment 1241560
So, what happens is that the downtime thread fails right at the beginning of the migration because of:
So the migration then keeps going with the minimal downtime, which is not enough to finish the migration successfully, and then it is cancelled. The strange thing is why libvirt did not return the memory_bps...
I would guess the issue is that the monitor thread started before the migration actually started, so the data returned by libvirt was not there yet.
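If that guess is right, one way to tolerate the race would be to poll until the field shows up instead of reading it once. This is only a rough sketch, not the actual vdsm code: `wait_for_stat` and the `get_stats` callable are hypothetical names, and the only identifier taken from this report is the `memory_bps` key.

```python
import time


def wait_for_stat(get_stats, key, timeout=5.0, interval=0.1):
    """Poll get_stats() until `key` is present, or give up after `timeout` seconds.

    get_stats is a hypothetical callable returning the current job stats as a dict.
    Returns the value for `key`, or None if it never appeared in time.
    """
    deadline = time.monotonic() + timeout
    while True:
        stats = get_stats()
        if key in stats:
            return stats[key]
        if time.monotonic() >= deadline:
            # The caller decides what to do; nothing is raised here.
            return None
        time.sleep(interval)
```

A caller could then use `wait_for_stat(get_stats, "memory_bps")` and fall back to a default when it gets `None`, rather than letting the thread die.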
@Markus: is this happening all the time or was this a one time issue? Is it happening with all VMs or only with this one?
Failure rate is 100%. 5/5 migrations stalled because of this error.
I'm raising the severity to high as it affects core oVirt features for 7.3 nodes.
The reason for the bug was a faulty network card with high packet drop. After exchanging it, everything works flawlessly.
Nevertheless, oVirt should detect and report the issue in the WebUI.
OK, this is actually a subset of bug 1414626, so marking it as a duplicate.
The root cause of this one is that when the stats do not contain some value, an exception is thrown which turns off the monitor and downtime threads, letting the migration progress wrongly for a couple of hours.
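To illustrate the failure mode described above, here is a minimal sketch of a monitor loop that skips a sample when a value is missing instead of letting the exception end the thread. It is an assumption-laden illustration, not the fix applied for bug 1414626: `MonitorThread`, `get_stats` and `on_sample` are hypothetical names, and `memory_bps` is the only key taken from this report.

```python
import logging
import threading

log = logging.getLogger("migration.monitor")


class MonitorThread(threading.Thread):
    """Hypothetical monitor that survives incomplete stats."""

    def __init__(self, get_stats, on_sample, interval=1.0):
        super().__init__(daemon=True)
        self._get_stats = get_stats    # hypothetical: returns a dict of job stats
        self._on_sample = on_sample    # hypothetical: consumes one memory_bps value
        self._interval = interval
        self._stop_event = threading.Event()
        self.skipped = 0               # number of samples skipped for missing data

    def stop(self):
        self._stop_event.set()

    def run(self):
        while not self._stop_event.is_set():
            self.sample_once()
            self._stop_event.wait(self._interval)

    def sample_once(self):
        """Process one sample; return False if the value was missing."""
        stats = self._get_stats()
        value = stats.get("memory_bps")
        if value is None:
            # Log and carry on, so the thread keeps monitoring.
            self.skipped += 1
            log.warning("memory_bps missing from stats, skipping sample")
            return False
        self._on_sample(value)
        return True
```

Counting the skipped samples also gives something that could be reported to the user if the value never arrives.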
*** This bug has been marked as a duplicate of bug 1414626 ***