Bug 1523036
Summary:          "virsh domjobinfo --completed" prints "Expected downtime" instead of "Total downtime"

Product:          Red Hat Enterprise Linux 7
Component:        libvirt
Version:          7.5
Hardware:         x86_64
OS:               Linux
Status:           CLOSED ERRATA
Severity:         medium
Priority:         unspecified
Keywords:         Regression
Target Milestone: rc
Target Release:   ---
Fixed In Version: libvirt-3.9.0-8.el7
Last Closed:      2018-04-10 11:02:08 UTC
Type:             Bug

Reporter:         Fangge Jin <fjin>
Assignee:         Jiri Denemark <jdenemar>
QA Contact:       Yanqiu Zhang <yanqzhan>
CC:               dzheng, lmiksik, rbalakri, xuzhang, yafu, yanqzhan, zpeng
Additional info:

During managedsave, after the "migration" phase is completed but before managedsave itself completes, "virsh domjobinfo" returns "Job type: Completed".

# while true; do date; virsh domjobinfo rhel7; done
......
Fri Dec 8 10:16:18 CST 2017
Job type:         Unbounded
Operation:        Save
Time elapsed:     2109 ms
Data processed:   901.944 MiB
Data remaining:   0.000 B
Data total:       1.126 GiB
Memory processed: 901.944 MiB
Memory remaining: 0.000 B
Memory total:     1.126 GiB
Memory bandwidth: 452.502 MiB/s
Dirty rate:       0 pages/s
Page size:        4096 bytes
Iteration:        3
Constant pages:   64947
Normal pages:     230305
Normal data:      899.629 MiB
Expected downtime: 1031 ms
Setup time:       11 ms

Fri Dec 8 10:16:19 CST 2017
Job type:         Completed

Fri Dec 8 10:16:19 CST 2017
Job type:         Completed

Fri Dec 8 10:16:19 CST 2017
Job type:         Completed
......

libvirtd log:

2017-12-08 02:16:17.276+0000: 16221: info : qemuMonitorSend:1064 : QEMU_MONITOR_SEND_MSG: mon=0x7ff8b0000fb0 msg={"execute":"stop","id":"libvirt-25"}
2017-12-08 02:16:17.338+0000: 16141: info : qemuMonitorJSONIOProcessLine:208 : QEMU_MONITOR_RECV_EVENT: mon=0x7ff8b0000fb0 event={"timestamp": {"seconds": 1512699377, "microseconds": 337802}, "event": "STOP"}
2017-12-08 02:16:17.338+0000: 16141: info : qemuMonitorJSONIOProcessLine:213 : QEMU_MONITOR_RECV_REPLY: mon=0x7ff8b0000fb0 reply={"return": {}, "id": "libvirt-25"}
2017-12-08 02:16:17.384+0000: 16221: info : qemuMonitorSend:1064 : QEMU_MONITOR_SEND_MSG: mon=0x7ff8b0000fb0 msg={"execute":"migrate_set_speed","arguments":{"value":9223372036853727232},"id":"libvirt-26"}
2017-12-08 02:16:17.386+0000: 16141: info : qemuMonitorJSONIOProcessLine:213 : QEMU_MONITOR_RECV_REPLY: mon=0x7ff8b0000fb0 reply={"return": {}, "id": "libvirt-26"}
2017-12-08 02:16:17.386+0000: 16221: info : qemuMonitorSend:1064 : QEMU_MONITOR_SEND_MSG: mon=0x7ff8b0000fb0 msg={"execute":"getfd","arguments":{"fdname":"migrate"},"id":"libvirt-27"}
2017-12-08 02:16:17.387+0000: 16141: info : qemuMonitorJSONIOProcessLine:213 :
QEMU_MONITOR_RECV_REPLY: mon=0x7ff8b0000fb0 reply={"return": {}, "id": "libvirt-27"}
2017-12-08 02:16:17.388+0000: 16221: info : qemuMonitorSend:1064 : QEMU_MONITOR_SEND_MSG: mon=0x7ff8b0000fb0 msg={"execute":"migrate","arguments":{"detach":true,"blk":false,"inc":false,"uri":"fd:migrate"},"id":"libvirt-28"}
2017-12-08 02:16:17.389+0000: 16141: info : qemuMonitorJSONIOProcessLine:208 : QEMU_MONITOR_RECV_EVENT: mon=0x7ff8b0000fb0 event={"timestamp": {"seconds": 1512699377, "microseconds": 389075}, "event": "MIGRATION", "data": {"status": "setup"}}
2017-12-08 02:16:17.389+0000: 16141: info : qemuMonitorJSONIOProcessLine:213 : QEMU_MONITOR_RECV_REPLY: mon=0x7ff8b0000fb0 reply={"return": {}, "id": "libvirt-28"}
2017-12-08 02:16:17.400+0000: 16141: info : qemuMonitorJSONIOProcessLine:208 : QEMU_MONITOR_RECV_EVENT: mon=0x7ff8b0000fb0 event={"timestamp": {"seconds": 1512699377, "microseconds": 400042}, "event": "MIGRATION_PASS", "data": {"pass": 1}}
2017-12-08 02:16:17.400+0000: 16141: info : qemuMonitorJSONIOProcessLine:208 : QEMU_MONITOR_RECV_EVENT: mon=0x7ff8b0000fb0 event={"timestamp": {"seconds": 1512699377, "microseconds": 400086}, "event": "MIGRATION", "data": {"status": "active"}}
2017-12-08 02:16:18.351+0000: 16141: info : qemuMonitorJSONIOProcessLine:208 : QEMU_MONITOR_RECV_EVENT: mon=0x7ff8b0000fb0 event={"timestamp": {"seconds": 1512699378, "microseconds": 351290}, "event": "MIGRATION_PASS", "data": {"pass": 2}}
2017-12-08 02:16:18.352+0000: 16141: info : qemuMonitorJSONIOProcessLine:208 : QEMU_MONITOR_RECV_EVENT: mon=0x7ff8b0000fb0 event={"timestamp": {"seconds": 1512699378, "microseconds": 351962}, "event": "MIGRATION_PASS", "data": {"pass": 3}}
2017-12-08 02:16:19.383+0000: 16141: info : qemuMonitorJSONIOProcessLine:208 : QEMU_MONITOR_RECV_EVENT: mon=0x7ff8b0000fb0 event={"timestamp": {"seconds": 1512699379, "microseconds": 383094}, "event": "MIGRATION", "data": {"status": "completed"}}
2017-12-08 02:16:19.384+0000: 16221: info : qemuMonitorSend:1064 :
QEMU_MONITOR_SEND_MSG: mon=0x7ff8b0000fb0 msg={"execute":"query-migrate","id":"libvirt-59"}
2017-12-08 02:16:19.385+0000: 16141: info : qemuMonitorJSONIOProcessLine:213 : QEMU_MONITOR_RECV_REPLY: mon=0x7ff8b0000fb0 reply={"return": {"status": "completed", "setup-time": 11, "downtime": 1031, "total-time": 1994, "ram": {"total": 1209352192, "postcopy-requests": 0, "dirty-sync-count": 3, "page-size": 4096, "remaining": 0, "mbps": 3795.863029, "transferred": 945756604, "duplicate": 64947, "dirty-pages-rate": 0, "skipped": 0, "normal-bytes": 943329280, "normal": 230305}}, "id": "libvirt-59"}
2017-12-08 02:16:19.386+0000: 16221: info : qemuMonitorSend:1064 : QEMU_MONITOR_SEND_MSG: mon=0x7ff8b0000fb0 msg={"execute":"migrate_set_speed","arguments":{"value":9223372036853727232},"id":"libvirt-60"}
2017-12-08 02:16:19.387+0000: 16141: info : qemuMonitorJSONIOProcessLine:213 : QEMU_MONITOR_RECV_REPLY: mon=0x7ff8b0000fb0 reply={"return": {}, "id": "libvirt-60"}

This is caused by the wrong job type being stored in the completed job info: it is set to UNBOUNDED when it should be COMPLETED.

Broken by:

commit 3f2d6d829eb8de0348fcbd58d654b29d5c5bebc2
Refs: v3.7.0-29-g3f2d6d829e
Author:     Nikolay Shirokovskiy <nshirokovskiy>
AuthorDate: Fri Sep 1 09:49:31 2017 +0300
Commit:     Jiri Denemark <jdenemar>
CommitDate: Thu Sep 7 12:52:36 2017 +0200

    qemu: migration: don't expose incomplete job as complete

    In case of real migration (not migrating to file on save, dump etc)
    migration info is not complete at time qemu finishes migration in
    normal (non postcopy) mode. We need to update disks stats, downtime
    info etc. Thus let's not expose this job status as completed. To
    archive this let's set status to 'qemu completed' after qemu reports
    migration is finished. It is not visible as complete job to clients.
    Cookie code on confirm phase will finally turn job into completed.
    As we don't need more things to do when migrating to file status is
    set to 'completed' as before in this case.
    Signed-off-by: Jiri Denemark <jdenemar>

Patch sent upstream for review:
https://www.redhat.com/archives/libvir-list/2018-January/msg00392.html

Fixed upstream by:

commit e8784e7868d44a2ce796b376cf78f0f6c61c114a
Refs: v3.10.0-183-ge8784e7868
Author:     Jiri Denemark <jdenemar>
AuthorDate: Thu Jan 11 20:47:50 2018 +0100
Commit:     Jiri Denemark <jdenemar>
CommitDate: Fri Jan 12 10:45:31 2018 +0100

    qemu: Fix type of a completed job

    Libvirt 3.7.0 and earlier reported a migration job as completed
    immediately after QEMU finished sending migration data, at which
    point the migration was not really complete yet. Commit
    v3.7.0-29-g3f2d6d829e fixed this, but caused a regression in
    reporting statistics for completed jobs, which started reporting the
    job as still running. This happened because the completed job
    statistics, including the job status, are copied from the running
    job before we finally mark it as completed. Let's make sure
    QEMU_DOMAIN_JOB_STATUS_COMPLETED is always set in the completed job
    info even when the job has not finished yet.

    https://bugzilla.redhat.com/show_bug.cgi?id=1523036

    Signed-off-by: Jiri Denemark <jdenemar>
    Reviewed-by: Pavel Hrdina <phrdina>

Reproduced with libvirt-3.9.0-5.el7.x86_64.
Verified pass with libvirt-3.9.0-9.el7.x86_64.

Test steps:
1. Start a guest.
2. Do managedsave:
   # virsh managedsave iommu1
   Domain iommu1 state saved by libvirt
3. Check the completed job info:
   # virsh domjobinfo iommu1 --completed
   Job type:         ***Completed***
   Operation:        Save
   Time elapsed:     1007 ms
   Data processed:   794.056 MiB
   Data remaining:   0.000 B
   Data total:       4.171 GiB
   Memory processed: 794.056 MiB
   Memory remaining: 0.000 B
   Memory total:     4.171 GiB
   Memory bandwidth: 798.506 MiB/s
   Dirty rate:       0 pages/s
   Page size:        4096 bytes
   Iteration:        3
   Constant pages:   892407
   Normal pages:     200925
   Normal data:      784.863 MiB
   ***Total downtime***: 1002 ms
   Setup time:       19 ms
4. Also tested dump/snapshot/migrate; all work well.
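The semantics of the upstream fix can be illustrated with a short sketch. Note this is Python for brevity, not libvirt's actual C code; the enum and field names below are simplified stand-ins for libvirt's internal job-status machinery described in the commit message. The point is that when the running job's statistics are copied into the completed-job slot, the status is forced to COMPLETED even if the copy was taken while the job was still in the internal intermediate state:

```python
from dataclasses import dataclass, replace
from enum import Enum

class JobStatus(Enum):
    ACTIVE = "active"               # shown by virsh as "Unbounded"
    QEMU_COMPLETED = "qemu-done"    # QEMU finished, libvirt still finishing up
    COMPLETED = "completed"         # shown by virsh as "Completed"

@dataclass
class JobInfo:
    status: JobStatus
    downtime_ms: int

def save_completed(current: JobInfo) -> JobInfo:
    """Copy the running job's stats into the completed-job slot.

    Mirrors the idea of commit e8784e7868: always force COMPLETED so
    clients never see the internal intermediate status in --completed
    output (hypothetical simplified model, not the real libvirt code).
    """
    return replace(current, status=JobStatus.COMPLETED)

# The buggy path copied the info while the job was still QEMU_COMPLETED,
# leaving the copy's status as a "running" value; the fix overrides it.
completed = save_completed(JobInfo(JobStatus.QEMU_COMPLETED, 1031))
print(completed.status.value)  # completed
```

With this, "virsh domjobinfo --completed" reports the job type as Completed (and hence the downtime as total rather than expected), while the live job can still go through the intermediate state internally.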
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2018:0704
Created attachment 1364005 [details]
libvirtd log

Description of problem:
"virsh domjobinfo --completed" prints "Expected downtime" instead of "Total downtime".

Version-Release number of selected component:
libvirt-3.9.0-5.el7.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Start a guest.
2. Do managedsave or another async job (for example: save, dump, snapshot, migration):
   # virsh managedsave rhel7
   Domain rhel7 state saved by libvirt
3. Check the completed job info:
   # virsh domjobinfo rhel7 --completed
   Job type:         Unbounded
   Operation:        Save
   Time elapsed:     559 ms
   Data processed:   707.478 MiB
   Data remaining:   0.000 B
   Data total:       1.126 GiB
   Memory processed: 707.478 MiB
   Memory remaining: 0.000 B
   Memory total:     1.126 GiB
   Memory bandwidth: 1.275 GiB/s
   Dirty rate:       0 pages/s
   Page size:        4096 bytes
   Iteration:        3
   Constant pages:   114614
   Normal pages:     180510
   Normal data:      705.117 MiB
   Expected downtime: 553 ms
   Setup time:       6 ms

Actual results:
In the output of "virsh domjobinfo --completed":
1) The "Job type" is "Unbounded".
2) "Expected downtime" is printed.

Expected results:
1) The "Job type" should be "Completed".
2) "Total downtime" should be printed, not "Expected downtime".
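For automated checking, the regression can be detected by parsing the virsh output and looking at which downtime label appears. A minimal sketch (the sample text is the buggy output quoted above; the parsing helper is mine, not part of libvirt or virsh):

```python
def parse_domjobinfo(text: str) -> dict:
    """Parse `virsh domjobinfo` output into a {field: value} dict.

    Each output line has the form "Field name:  value", so splitting on
    the first colon recovers the field/value pairs.
    """
    info = {}
    for line in text.strip().splitlines():
        field, _, value = line.partition(":")
        info[field.strip()] = value.strip()
    return info

# Abridged buggy output from this report: the completed job is shown as
# Unbounded with "Expected downtime" instead of "Total downtime".
buggy = """\
Job type:         Unbounded
Operation:        Save
Expected downtime: 553 ms
Setup time:       6 ms
"""

info = parse_domjobinfo(buggy)
is_regressed = info.get("Job type") != "Completed" or "Total downtime" not in info
print(is_regressed)  # True for the buggy output
```

On a fixed build the same check should come back False, since the completed job reports "Job type: Completed" and a "Total downtime" field.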