Bug 1413847
Summary: | Live migration failure not detected by OVirt | |
---|---|---|---
Product: | [oVirt] vdsm | Reporter: | Markus Stockhausen <mst>
Component: | General | Assignee: | Dan Kenigsberg <danken>
Status: | CLOSED DUPLICATE | QA Contact: | meital avital <mavital>
Severity: | medium | Docs Contact: |
Priority: | unspecified | |
Version: | 4.18.15.2 | CC: | bugs, mst, tjelinek
Target Milestone: | --- | |
Target Release: | --- | |
Hardware: | Unspecified | |
OS: | Unspecified | |
Whiteboard: | | |
Fixed In Version: | | Doc Type: | If docs needed, set a value
Doc Text: | | Story Points: | ---
Clone Of: | | Environment: |
Last Closed: | 2017-01-25 10:21:43 UTC | Type: | Bug
Regression: | --- | Mount Type: | ---
Documentation: | --- | CRM: |
Verified Versions: | | Category: | ---
oVirt Team: | Virt | RHEL 7.3 requirements from Atomic Host: |
Cloudforms Team: | --- | Target Upstream Version: |
Embargoed: | | |
Attachments: | | |
Description
Markus Stockhausen
2017-01-17 06:38:31 UTC
Created attachment 1241559: VDSM target
Created attachment 1241560: VDSM source
So, what happens is that the downtime thread fails right at the beginning of the migration because of: KeyError: 'memory_bps'. The migration then keeps going with the minimal downtime, which is not enough to finish the migration successfully, and then it is cancelled. The strange thing is why libvirt did not return memory_bps... I would guess the issue is that the monitor thread started before the migration actually started, so the data returned by libvirt was not there yet.

@Markus: is this happening all the time or was this a one-time issue? Is it happening with all VMs or only with this one?

Failure rate is 100%. 5/5 migrations stalled because of this error. I'm raising the severity to high as it affects core oVirt features for 7.3 nodes.

Reason for the bug was a faulty network card with high packet drop. After exchanging it, everything works flawlessly. Nevertheless, OVirt should detect and report the issue in the WebUI.

OK, this is actually a subset of 1414626, so marking it as a duplicate. The root cause of this one is that when the stats don't contain some value, they throw an exception, turning the monitor and downtime threads off and letting the migration progress wrongly for a couple of hours.

*** This bug has been marked as a duplicate of bug 1414626 ***
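For illustration only, here is a minimal Python sketch of the failure mode described in this report: a polling step that tolerates a stats sample without 'memory_bps' instead of raising KeyError and killing the thread. It is not vdsm's code and not the fix applied in bug 1414626; the function names, the `state` dictionary, and the plain-dict stats sample are all hypothetical. Only the 'memory_bps' key comes from the traceback quoted above.

```python
import logging

log = logging.getLogger("migration_monitor_sketch")


def read_memory_bps(job_stats, default=None):
    """Return job_stats['memory_bps'], or `default` when the key is absent.

    Hypothetical helper: `job_stats` is assumed to be a plain dict of
    migration statistics, such as a sample taken before the migration
    job has actually started.
    """
    value = job_stats.get("memory_bps")
    if value is None:
        log.warning("stats sample lacks 'memory_bps'; keys present: %s",
                    sorted(job_stats))
        return default
    return value


def monitor_step(job_stats, state):
    """Process one polling sample without letting missing data raise.

    `state` is a hypothetical dict that counts consecutive incomplete
    samples, so a caller could report the problem instead of silently
    losing the monitoring thread.
    """
    bps = read_memory_bps(job_stats)
    if bps is None:
        state["missing_samples"] += 1
        return state
    state["missing_samples"] = 0
    state["last_bps"] = bps
    return state


if __name__ == "__main__":
    state = {"missing_samples": 0, "last_bps": None}
    monitor_step({}, state)                     # incomplete sample, no exception
    monitor_step({"memory_bps": 1000}, state)   # complete sample
    print(state)
```

A caller could treat a growing `missing_samples` count as a signal to surface a warning, which is the kind of reporting requested in this bug.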