Bug 1523036
| Summary: | "virsh domjobinfo --completed" prints "Expected downtime" instead of "Total downtime" |
|---|---|
| Product: | Red Hat Enterprise Linux 7 |
| Reporter: | Fangge Jin <fjin> |
| Component: | libvirt |
| Assignee: | Jiri Denemark <jdenemar> |
| Status: | CLOSED ERRATA |
| QA Contact: | Yanqiu Zhang <yanqzhan> |
| Severity: | medium |
| Priority: | unspecified |
| Version: | 7.5 |
| CC: | dzheng, lmiksik, rbalakri, xuzhang, yafu, yanqzhan, zpeng |
| Target Milestone: | rc |
| Keywords: | Regression |
| Target Release: | --- |
| Hardware: | x86_64 |
| OS: | Linux |
| Fixed In Version: | libvirt-3.9.0-8.el7 |
| Last Closed: | 2018-04-10 11:02:08 UTC |
| Type: | Bug |
Additional info:
During managedsave, after the "migration" phase completes but before the managedsave itself completes, "virsh domjobinfo" reports "Job type: Completed".
# while true; do date; virsh domjobinfo rhel7; done
......
Fri Dec 8 10:16:18 CST 2017
Job type: Unbounded
Operation: Save
Time elapsed: 2109 ms
Data processed: 901.944 MiB
Data remaining: 0.000 B
Data total: 1.126 GiB
Memory processed: 901.944 MiB
Memory remaining: 0.000 B
Memory total: 1.126 GiB
Memory bandwidth: 452.502 MiB/s
Dirty rate: 0 pages/s
Page size: 4096 bytes
Iteration: 3
Constant pages: 64947
Normal pages: 230305
Normal data: 899.629 MiB
Expected downtime: 1031 ms
Setup time: 11 ms
Fri Dec 8 10:16:19 CST 2017
Job type: Completed
Fri Dec 8 10:16:19 CST 2017
Job type: Completed
Fri Dec 8 10:16:19 CST 2017
Job type: Completed
......
libvirtd log:
2017-12-08 02:16:17.276+0000: 16221: info : qemuMonitorSend:1064 : QEMU_MONITOR_SEND_MSG: mon=0x7ff8b0000fb0 msg={"execute":"stop","id":"libvirt-25"}
2017-12-08 02:16:17.338+0000: 16141: info : qemuMonitorJSONIOProcessLine:208 : QEMU_MONITOR_RECV_EVENT: mon=0x7ff8b0000fb0 event={"timestamp": {"seconds": 1512699377, "microseconds": 337802}, "event": "STOP"}
2017-12-08 02:16:17.338+0000: 16141: info : qemuMonitorJSONIOProcessLine:213 : QEMU_MONITOR_RECV_REPLY: mon=0x7ff8b0000fb0 reply={"return": {}, "id": "libvirt-25"}
2017-12-08 02:16:17.384+0000: 16221: info : qemuMonitorSend:1064 : QEMU_MONITOR_SEND_MSG: mon=0x7ff8b0000fb0 msg={"execute":"migrate_set_speed","arguments":{"value":9223372036853727232},"id":"libvirt-26"}
2017-12-08 02:16:17.386+0000: 16141: info : qemuMonitorJSONIOProcessLine:213 : QEMU_MONITOR_RECV_REPLY: mon=0x7ff8b0000fb0 reply={"return": {}, "id": "libvirt-26"}
2017-12-08 02:16:17.386+0000: 16221: info : qemuMonitorSend:1064 : QEMU_MONITOR_SEND_MSG: mon=0x7ff8b0000fb0 msg={"execute":"getfd","arguments":{"fdname":"migrate"},"id":"libvirt-27"}
2017-12-08 02:16:17.387+0000: 16141: info : qemuMonitorJSONIOProcessLine:213 : QEMU_MONITOR_RECV_REPLY: mon=0x7ff8b0000fb0 reply={"return": {}, "id": "libvirt-27"}
2017-12-08 02:16:17.388+0000: 16221: info : qemuMonitorSend:1064 : QEMU_MONITOR_SEND_MSG: mon=0x7ff8b0000fb0 msg={"execute":"migrate","arguments":{"detach":true,"blk":false,"inc":false,"uri":"fd:migrate"},"id":"libvirt-28"}
2017-12-08 02:16:17.389+0000: 16141: info : qemuMonitorJSONIOProcessLine:208 : QEMU_MONITOR_RECV_EVENT: mon=0x7ff8b0000fb0 event={"timestamp": {"seconds": 1512699377, "microseconds": 389075}, "event": "MIGRATION", "data": {"status": "setup"}}
2017-12-08 02:16:17.389+0000: 16141: info : qemuMonitorJSONIOProcessLine:213 : QEMU_MONITOR_RECV_REPLY: mon=0x7ff8b0000fb0 reply={"return": {}, "id": "libvirt-28"}
2017-12-08 02:16:17.400+0000: 16141: info : qemuMonitorJSONIOProcessLine:208 : QEMU_MONITOR_RECV_EVENT: mon=0x7ff8b0000fb0 event={"timestamp": {"seconds": 1512699377, "microseconds": 400042}, "event": "MIGRATION_PASS", "data": {"pass": 1}}
2017-12-08 02:16:17.400+0000: 16141: info : qemuMonitorJSONIOProcessLine:208 : QEMU_MONITOR_RECV_EVENT: mon=0x7ff8b0000fb0 event={"timestamp": {"seconds": 1512699377, "microseconds": 400086}, "event": "MIGRATION", "data": {"status": "active"}}
2017-12-08 02:16:18.351+0000: 16141: info : qemuMonitorJSONIOProcessLine:208 : QEMU_MONITOR_RECV_EVENT: mon=0x7ff8b0000fb0 event={"timestamp": {"seconds": 1512699378, "microseconds": 351290}, "event": "MIGRATION_PASS", "data": {"pass": 2}}
2017-12-08 02:16:18.352+0000: 16141: info : qemuMonitorJSONIOProcessLine:208 : QEMU_MONITOR_RECV_EVENT: mon=0x7ff8b0000fb0 event={"timestamp": {"seconds": 1512699378, "microseconds": 351962}, "event": "MIGRATION_PASS", "data": {"pass": 3}}
2017-12-08 02:16:19.383+0000: 16141: info : qemuMonitorJSONIOProcessLine:208 : QEMU_MONITOR_RECV_EVENT: mon=0x7ff8b0000fb0 event={"timestamp": {"seconds": 1512699379, "microseconds": 383094}, "event": "MIGRATION", "data": {"status": "completed"}}
2017-12-08 02:16:19.384+0000: 16221: info : qemuMonitorSend:1064 : QEMU_MONITOR_SEND_MSG: mon=0x7ff8b0000fb0 msg={"execute":"query-migrate","id":"libvirt-59"}
2017-12-08 02:16:19.385+0000: 16141: info : qemuMonitorJSONIOProcessLine:213 : QEMU_MONITOR_RECV_REPLY: mon=0x7ff8b0000fb0 reply={"return": {"status": "completed", "setup-time": 11, "downtime": 1031, "total-time": 1994, "ram": {"total": 1209352192, "postcopy-requests": 0, "dirty-sync-count": 3, "page-size": 4096, "remaining": 0, "mbps": 3795.863029, "transferred": 945756604, "duplicate": 64947, "dirty-pages-rate": 0, "skipped": 0, "normal-bytes": 943329280, "normal": 230305}}, "id": "libvirt-59"}
2017-12-08 02:16:19.386+0000: 16221: info : qemuMonitorSend:1064 : QEMU_MONITOR_SEND_MSG: mon=0x7ff8b0000fb0 msg={"execute":"migrate_set_speed","arguments":{"value":9223372036853727232},"id":"libvirt-60"}
2017-12-08 02:16:19.387+0000: 16141: info : qemuMonitorJSONIOProcessLine:213 : QEMU_MONITOR_RECV_REPLY: mon=0x7ff8b0000fb0 reply={"return": {}, "id": "libvirt-60"}
This is caused by the wrong job type being stored in the completed job info: it is set to UNBOUNDED when it should be COMPLETED.
Broken by:
commit 3f2d6d829eb8de0348fcbd58d654b29d5c5bebc2
Refs: v3.7.0-29-g3f2d6d829e
Author: Nikolay Shirokovskiy <nshirokovskiy>
AuthorDate: Fri Sep 1 09:49:31 2017 +0300
Commit: Jiri Denemark <jdenemar>
CommitDate: Thu Sep 7 12:52:36 2017 +0200
qemu: migration: don't expose incomplete job as complete
In case of real migration (not migrating to file on save, dump etc)
migration info is not complete at time qemu finishes migration
in normal (non postcopy) mode. We need to update disks stats,
downtime info etc. Thus let's not expose this job status as
completed.
To achieve this let's set status to 'qemu completed' after
qemu reports migration is finished. It is not visible as complete
job to clients. Cookie code on confirm phase will finally turn
job into completed. As we don't need more things to do when
migrating to file status is set to 'completed' as before
in this case.
Signed-off-by: Jiri Denemark <jdenemar>
Patch sent upstream for review: https://www.redhat.com/archives/libvir-list/2018-January/msg00392.html

Fixed upstream by
commit e8784e7868d44a2ce796b376cf78f0f6c61c114a
Refs: v3.10.0-183-ge8784e7868
Author: Jiri Denemark <jdenemar>
AuthorDate: Thu Jan 11 20:47:50 2018 +0100
Commit: Jiri Denemark <jdenemar>
CommitDate: Fri Jan 12 10:45:31 2018 +0100
qemu: Fix type of a completed job
Libvirt 3.7.0 and earlier reported a migration job as completed
immediately after QEMU finished sending migration data at which point
migration was not really complete yet. Commit v3.7.0-29-g3f2d6d829e
fixed this, but caused a regression in reporting statistics for
completed jobs which started reporting the job as still running. This
happened because the completed job statistics including the job status
are copied from the running job before we finally mark it as completed.
Let's make sure QEMU_DOMAIN_JOB_STATUS_COMPLETED is always set in the
completed job info even when the job has not finished yet.
https://bugzilla.redhat.com/show_bug.cgi?id=1523036
Signed-off-by: Jiri Denemark <jdenemar>
Reviewed-by: Pavel Hrdina <phrdina>
Reproduced with libvirt-3.9.0-5.el7.x86_64.
Verified pass with libvirt-3.9.0-9.el7.x86_64.

Test steps:
1. Start a guest
2. Do managedsave:
# virsh managedsave iommu1
Domain iommu1 state saved by libvirt
3. Check the completed job info:
# virsh domjobinfo iommu1 --completed
Job type: ***Completed***
Operation: Save
Time elapsed: 1007 ms
Data processed: 794.056 MiB
Data remaining: 0.000 B
Data total: 4.171 GiB
Memory processed: 794.056 MiB
Memory remaining: 0.000 B
Memory total: 4.171 GiB
Memory bandwidth: 798.506 MiB/s
Dirty rate: 0 pages/s
Page size: 4096 bytes
Iteration: 3
Constant pages: 892407
Normal pages: 200925
Normal data: 784.863 MiB
***Total downtime***: 1002 ms
Setup time: 19 ms
4. Also test dump/snapshot/migrate, all work well.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2018:0704
Created attachment 1364005 [details]
libvirtd log

Description of problem:
"virsh domjobinfo --completed" prints "Expected downtime" instead of "Total downtime"

Version-Release number of selected component:
libvirt-3.9.0-5.el7.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Start a guest
2. Do managedsave or other async job (for example: save, dump, snapshot, migration)
# virsh managedsave rhel7
Domain rhel7 state saved by libvirt
3. Check the completed job info:
# virsh domjobinfo rhel7 --completed
Job type: Unbounded
Operation: Save
Time elapsed: 559 ms
Data processed: 707.478 MiB
Data remaining: 0.000 B
Data total: 1.126 GiB
Memory processed: 707.478 MiB
Memory remaining: 0.000 B
Memory total: 1.126 GiB
Memory bandwidth: 1.275 GiB/s
Dirty rate: 0 pages/s
Page size: 4096 bytes
Iteration: 3
Constant pages: 114614
Normal pages: 180510
Normal data: 705.117 MiB
Expected downtime: 553 ms
Setup time: 6 ms

Actual results:
In the output of "virsh domjobinfo --completed":
1) The "Job type" is unbounded.
2) "Expected downtime" is printed

Expected results:
1) The "Job type" should be completed.
2) "Total downtime" should be printed, not "Expected downtime"