Bug 1282744
| Summary: | Actual downtime - Sometimes libvirt doesn't report 'downtime_net' in jobStats while migrating VM/s | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 7 | Reporter: | Michael Burman <mburman> |
| Component: | libvirt | Assignee: | Jiri Denemark <jdenemar> |
| Status: | CLOSED ERRATA | QA Contact: | Virtualization Bugs <virt-bugs> |
| Severity: | medium | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 7.2 | CC: | dyuan, fjin, jdenemar, mburman, rbalakri, zpeng |
| Target Milestone: | rc | | |
| Target Release: | --- | | |
| Hardware: | x86_64 | | |
| OS: | Linux | | |
| Whiteboard: | virt | | |
| Fixed In Version: | libvirt-1.3.3-1.el7 | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2016-11-03 18:30:54 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Attachments: | | | |
Could you attach libvirtd debug logs from both source and destination hosts?

Created attachment 1096025
source log

Created attachment 1096052
destination log

Fixed upstream by v1.3.2-86-gcb483a6:
commit cb483a68fdc3503efc9b0996570e58aaf0c11c17
Author: Jiri Denemark <jdenemar>
AuthorDate: Tue Feb 23 10:47:01 2016 +0100
Commit: Jiri Denemark <jdenemar>
CommitDate: Tue Mar 8 16:26:00 2016 +0100
qemu: Fix a race when computing migration downtime
Computing a total downtime during a migration requires us to store a
time stamp when guest CPUs get stopped. The value (and all other
statistics) is then transferred to the destination to compute the
downtime. However, the stopped time stamp is stored by a STOP event
handler, while the statistics which will be sent over to the destination
are copied synchronously within qemuMigrationWaitForCompletion.
Depending on the timing of STOP and MIGRATION events, we may end up
copying (and transferring) statistics without the stopped time stamp
set. Let's make sure we always use the correct time stamp.
https://bugzilla.redhat.com/show_bug.cgi?id=1282744
Signed-off-by: Jiri Denemark <jdenemar>
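
The ordering problem described in the commit message can be illustrated with a small toy model. This is only a sketch in Python, not the code from the libvirt commit (which is C): the `JobStats` class, the event names as plain strings, and the `refresh_before_send` switch are all hypothetical stand-ins chosen for illustration. It assumes the statistics are copied when the MIGRATION event is handled and shows what happens when the STOP handler runs only afterwards.

```python
from dataclasses import dataclass, replace
from typing import List, Optional, Tuple


@dataclass
class JobStats:
    """Toy stand-in for the migration statistics (hypothetical)."""
    stopped_ms: Optional[int] = None  # time stamp of guest CPUs being stopped


def stats_for_destination(events: List[Tuple[str, int]],
                          refresh_before_send: bool) -> JobStats:
    """Handle events in the given order and return the copy that is sent."""
    live = JobStats()
    snapshot: Optional[JobStats] = None
    for name, timestamp_ms in events:
        if name == "STOP":
            # The STOP handler records when the guest CPUs were stopped.
            live.stopped_ms = timestamp_ms
        elif name == "MIGRATION":
            # The statistics are copied here; a later STOP is not in the copy.
            snapshot = replace(live)
    if snapshot is None:
        raise ValueError("no MIGRATION event in the sequence")
    if refresh_before_send:
        # Assumed remedy for this sketch: take the time stamp from the live
        # statistics right before the copy is transferred.
        snapshot.stopped_ms = live.stopped_ms
    return snapshot


def downtime_ms(snapshot: JobStats, resumed_ms: int) -> Optional[int]:
    """Destination side: downtime is unknown without the stopped time stamp."""
    if snapshot.stopped_ms is None:
        return None
    return resumed_ms - snapshot.stopped_ms


# MIGRATION is handled before STOP, which is the problematic ordering.
racy_order = [("MIGRATION", 990), ("STOP", 1000)]
print(downtime_ms(stats_for_destination(racy_order, False), 1300))  # None
print(downtime_ms(stats_for_destination(racy_order, True), 1300))   # 300
```

With the problematic ordering and no refresh, the destination has no stopped time stamp and cannot compute a downtime, which corresponds to the missing 'downtime_net' value reported in this bug.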
This bug was accidentally moved from POST to MODIFIED via an error in automation; please see mmccune with any questions.

Test with pure libvirt:
libvirt-2.0.0-8.el7.x86_64
qemu-kvm-rhev-2.6.0-24.el7.x86_64

Steps:
1. Prepare two machines.
2. Loop live migration 60 times.
3. Check the downtime every time, on both source and target.

The statistics always include the actual downtime.

Test with RHV, build on both source and target:
libvirt-2.0.0-8.el7.x86_64
vdsm-4.18.13-1.el7ev.x86_64
3.10.0-505.el7.x86_64

Test ping-pong migration between 2 RHEL 7.3 servers and check the event log in the UI. It reports:
Migration completed (VM: n2, Source: A, Destination: B, Duration: 53 seconds, Total: 53 seconds, Actual downtime: 368ms)
Migration completed (VM: n2, Source: B, Destination: A, Duration: 53 seconds, Total: 53 seconds, Actual downtime: 310ms)
Migration completed (VM: n2, Source: A, Destination: B, Duration: 54 seconds, Total: 54 seconds, Actual downtime: 365ms)
Migration completed (VM: n2, Source: B, Destination: A, Duration: 55 seconds, Total: 55 seconds, Actual downtime: 302ms)
Migration completed (VM: n2, Source: A, Destination: B, Duration: 53 seconds, Total: 53 seconds, Actual downtime: 373ms)
Migration completed (VM: n2, Source: B, Destination: A, Duration: 53 seconds, Total: 53 seconds, Actual downtime: 303ms)
Migration completed (VM: n2, Source: A, Destination: B, Duration: 53 seconds, Total: 53 seconds, Actual downtime: 387ms)
Migration completed (VM: n2, Source: B, Destination: A, Duration: 53 seconds, Total: 53 seconds, Actual downtime: 299ms)
Migration completed (VM: n2, Source: A, Destination: B, Duration: 54 seconds, Total: 54 seconds, Actual downtime: 389ms)
Migration completed (VM: n2, Source: B, Destination: A, Duration: 54 seconds, Total: 54 seconds, Actual downtime: 266ms)

All migrations report the actual downtime. Works as expected, moving to verified.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2016-2577.html

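
The event-log check used during verification could be automated along these lines. This is a minimal sketch that assumes the event messages have been collected as plain strings in the "Migration completed (...)" format quoted in this report; the function names and the regular expression are hypothetical helpers, not part of RHV or vdsm.

```python
import re
from typing import List, Optional

# Matches the tail of a "Migration completed (...)" event message.
DOWNTIME_RE = re.compile(r"Actual downtime: (\(N/A\)|(\d+)ms)\)\s*$")


def parse_downtime_ms(event_line: str) -> Optional[int]:
    """Return the downtime in milliseconds, or None if it reads (N/A)."""
    match = DOWNTIME_RE.search(event_line)
    if match is None:
        raise ValueError("not a migration event line: %r" % event_line)
    return int(match.group(2)) if match.group(2) else None


def lines_missing_downtime(event_lines: List[str]) -> List[str]:
    """Collect the event lines whose actual downtime was not reported."""
    return [line for line in event_lines if parse_downtime_ms(line) is None]
```

An empty list from `lines_missing_downtime` would correspond to the "all migrations report the actual downtime" outcome.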
Description of problem:
Actual downtime - Sometimes libvirt doesn't report 'downtime_net' in jobStats while migrating VM/s.

- We are running this code on the destination host:

    stat = self._vm._dom.jobStats(libvirt.VIR_DOMAIN_JOB_STATS_COMPLETED)
    if 'downtime_net' in stat:
        ...

  In some cases the 'downtime_net' key does not exist in stat. As a result, Actual downtime is reported as (N/A) in the events log in the UI. At other times it reports values such as 37ms or 47ms.
- Note that migration is fast: 16, 18, 29, 30, 33 seconds.

Version-Release number of selected component (if applicable):
libvirt-1.2.17-13.el7.x86_64
vdsm-4.17.10.1-0.el7ev.noarch
RHEL - 7.2 - 9.el7
3.10.0 - 327.el7.x86_64

How reproducible:
50-70%

Steps to Reproduce:
1. Run live migration between 2 RHEL 7.2 servers in the latest 3.6 RHEV-M UI.

Actual results:
Sometimes the Event log in the UI reports:
Migration completed (VM: v3, Source: silver-vdsa.qa.lab.tlv.redhat.com, Destination: orchid-vds2.qa.lab.tlv.redhat.com, Duration: 18 seconds, Total: 18 seconds, Actual downtime: (N/A))

Expected results:
Always report Actual downtime.
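
The caller-side behaviour described above can be sketched as follows. This assumes the completed-job statistics arrive as a plain dictionary, as the snippet in the description suggests; `format_actual_downtime` is a hypothetical helper written for illustration, not vdsm code, and the dictionaries below are hand-made stand-ins rather than real `jobStats` output.

```python
from typing import Any, Dict


def format_actual_downtime(stats: Dict[str, Any]) -> str:
    """Render the downtime for the event log, or (N/A) when it is missing."""
    # 'downtime_net' is the key named in this report; it may be absent.
    downtime = stats.get("downtime_net")
    if downtime is None:
        return "(N/A)"
    return "%dms" % downtime


print(format_actual_downtime({"downtime_net": 37}))  # 37ms
print(format_actual_downtime({}))                    # (N/A)
```

Handling the missing key this way only hides the symptom in the UI; the expected result of this report is that the key is always present.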