Bug 1265579
| Summary: | qemu gives "main-loop: WARNING: I/O thread spun for 1000 iterations" when doing live commit | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 7 | Reporter: | Shuang Yu <shuyu> |
| Component: | qemu-kvm-rhev | Assignee: | John Snow <jsnow> |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | Gu Nini <ngu> |
| Severity: | low | Docs Contact: | |
| Priority: | low | | |
| Version: | 7.4 | CC: | famz, hannsj_uhl, jsnow, juzhang, knoel, michen, ngu, pezhang, qizhu, qzhang, stefanha, virt-maint, xfu, xuhan, xuma |
| Target Milestone: | rc | | |
| Target Release: | 7.5 | | |
| Hardware: | All | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2018-08-20 07:26:57 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 1444027 | | |
Description (Shuang Yu, 2015-09-23 09:27:18 UTC)
David Gibson (comment #2):

Ok, there's nothing obvious here that would make this ppc64-specific. Can you please check:

1. Does the bug occur on x86?
2. Does the bug occur with different guests (e.g. RHEL6-LE or RHEL7)?
3. Does the bug occur if you use virtio-blk instead of virtio-scsi?
4. Does the bug occur if you use spapr-vscsi instead of virtio-scsi?

Shuang Yu (in reply to David Gibson from comment #2):

> 1. Does the bug occur on x86?

Retested with RHEL-Server-6.7-64.qcow2 and RHEL-Server-7.2-64.qcow2 guests on an x86 host; doing a live commit after a snapshot hits this bug.

> 2. Does the bug occur with different guests (e.g. RHEL6-LE or RHEL7)?

Retested with a RHEL-7.2-20150924.0-Server-ppc64le-dvd1.iso guest on a PowerPC host; doing a live commit after a snapshot hits this bug.

> 3. Does the bug occur if you use virtio-blk instead of virtio-scsi?

Using virtio-blk instead of virtio-scsi as the boot disk also hits this bug:

# /usr/libexec/qemu-kvm -name live-commit \
    -machine pseries,accel=kvm,usb=off \
    -realtime mlock=off \
    -nodefaults \
    -monitor stdio \
    -rtc base=utc \
    -device virtio-blk-pci,id=virtio0,drive=drive-virtio0 \
    -drive file=RHEL-6.7-20150702.0-Server-ppc64.raw,format=raw,if=none,id=drive-virtio0,werror=stop,cache=none \
    -vnc :10 \
    -msg timestamp=on \
    -usb \
    -device usb-tablet,id=tablet1 \
    -vga std \
    -netdev tap,id=hostnet1,script=/etc/qemu-ifup,vhost=on \
    -device virtio-net-pci,netdev=hostnet1,id=net1,mac=00:54:52:5f:5d:5a \
    -uuid 781c99a0-7bef-47de-9ca7-344b87cef66d \
    -m 8G \
    -smp 4 \
    -qmp tcp:0:6666,server,nowait

> 4. Does the bug occur if you use spapr-vscsi instead of virtio-scsi?

Using spapr-vscsi instead of virtio-scsi as the boot disk also hits this bug:
# /usr/libexec/qemu-kvm -name live-commit \
    -machine pseries,accel=kvm,usb=off \
    -realtime mlock=off \
    -nodefaults \
    -monitor stdio \
    -rtc base=utc \
    -device spapr-vscsi,id=scsi0,reg=0x1000 \
    -drive file=RHEL-6.7-20150702.0-Server-ppc64.raw,format=raw,if=none,id=drive-scsi0,cache=none \
    -device scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0,id=scsi0-0 \
    -vnc :10 \
    -msg timestamp=on \
    -usb \
    -device usb-tablet,id=tablet1 \
    -vga std \
    -netdev tap,id=hostnet1,script=/etc/qemu-ifup,vhost=on \
    -device virtio-net-pci,netdev=hostnet1,id=net1,mac=00:54:52:5f:5d:5a \
    -uuid 781c99a0-7bef-47de-9ca7-344b87cef66d \
    -m 8G \
    -smp 4 \
    -qmp tcp:0:6666,server,nowait

David Gibson:

Ok, it looks like this isn't actually Power-specific. Re-assigning.

Fam Zheng:

John,

The way speed=XXX is implemented is this: in the block job loop, between iterations, we calculate a sleep period from how much I/O we have done and the "speed" parameter, and then sleep for that long, yielding the BQL mutex to let the vcpu thread run.

The problem with that is, if the speed is set very high (or if the sleep period somehow always comes out very short), we don't really yield the mutex, because the sleep timer fires so quickly that the main loop dispatches it even before the first yield of the BQL happens. When that is the case, the warning is printed after 1000 iterations.

It's a known problem, but I don't know the proper fix yet. Maybe we can add a primitive that sleeps no shorter than the next iteration. To confirm this is the cause, we can try a lower speed= value. If I'm right, feel free to re-assign this bug to me.

I'll debug a little further... it happens with speeds down to 1000, unless you want something /quite/ small in that parameter.

Fam Zheng:

I've dug into the issue by tracing the actual events that are dispatched; it boils down to the known "block_job_sleep_ns(..., 0);" problem in block jobs.
Again, this keeps setting up a timer with "now" as the expiration time, which prevents the main loop from releasing the iothread lock; hence the complaint. I prototyped the idea mentioned in comment 5, but that "sleep no shorter than next iteration" primitive slows the block job down more than I would like. We need to think of something better. Anyway, this is definitely upstream-first material; for 7.2 I think it doesn't hurt to ignore this warning, or to disable it temporarily until we have a proper fix.

Fam

Postponing to 7.3: the warning isn't an error and can be ignored. We will investigate a proper fix and consider a backport to 7.2.z once we have discovered the root cause and fixed it upstream.

This may have been fixed (or at least shows different behavior) by recent changes, but it is not a priority, as it is just a warning.

John Snow:

We've been adjusting the way block jobs work upstream quite a bit lately, and I'm not able to reproduce this any longer. That doesn't mean it isn't still an issue, but it doesn't appear to be one as of 2.11-rc3. I re-tested on qemu-kvm-rhev-2.10.0-9.el7 too, and it looks alright there as well. Odds are very good that we have more work to do upstream, as we're running into some block-job scheduling difficulties, but the problem as it existed appears to be handled for now.
Tested as:

Guest: Fedora 26 Server
Host: Fedora 27 Workstation

Command line:

./x86_64-softmmu/qemu-system-x86_64 \
    -machine pc,accel=kvm,usb=off \
    -realtime mlock=off \
    -nodefaults \
    -monitor stdio \
    -rtc base=utc \
    -device virtio-scsi-pci,id=scsi \
    -drive file=/media/ext/img/f26.raw,format=raw,if=none,id=drive-scsi0,cache=none \
    -device scsi-hd,drive=drive-scsi0,id=disk0,bus=scsi.0 \
    -msg timestamp=on \
    -usb \
    -device usb-tablet,id=tablet1 \
    -vga std \
    -drive file=/media/ext/iso/linux/Fedora/Fedora-Server-dvd-x86_64-26-1.5.iso,if=none,media=cdrom,format=raw,rerror=stop,werror=stop,id=scsi-cdrom0 \
    -device virtio-scsi-pci,id=bus2,addr=0x5 \
    -device scsi-cd,bus=bus2.0,drive=scsi-cdrom0,id=cdrom0 \
    -uuid 781c99a0-7bef-47de-9ca7-344b87cef66d \
    -m 8G \
    -smp 4 \
    -qmp tcp:0:4444,server,nowait \
    -display gtk

Process (using qmp-shell):

blockdev-snapshot-sync device=drive-scsi0 snapshot-file=/media/ext/img/f26.sn1.qcow2 format=qcow2
blockdev-snapshot-sync device=drive-scsi0 snapshot-file=/media/ext/img/f26.sn2.qcow2 format=qcow2
blockdev-snapshot-sync device=drive-scsi0 snapshot-file=/media/ext/img/f26.sn3.qcow2 format=qcow2
blockdev-snapshot-sync device=drive-scsi0 snapshot-file=/media/ext/img/f26.sn4.qcow2 format=qcow2
block-commit device=drive-scsi0 base=/media/ext/img/f26.raw top=/media/ext/img/f26.sn4.qcow2 speed=10000

To the QA team: if you were to retest, I suspect you'd find the behavior is now different. (Or I just can't reproduce. Please let me know!)

Qianqian:

Thank you, John. I can still reproduce this issue with the latest qemu version:

Version:
qemu-kvm-rhev-2.10.0-9.el7.x86_64
qemu-img-rhev-2.10.0-9.el7.x86_64
kernel-3.10.0-800.el7.x86_64

Steps:

1. Create the base image:

qemu-img create /home/test.raw 1G

2.
Launch the guest with the image:

/usr/libexec/qemu-kvm \
    -name rhel7_4 \
    -m 1G \
    -smp 2 \
    -drive file=/home/kvm_autotest_root/images/rhel74-64-virtio.qcow2,format=qcow2,id=drive_image1,if=none,cache=none,media=disk \
    -device virtio-blk-pci,drive=drive_image1,id=virtio-blk-0,bootindex=0 \
    -qmp stdio \
    -vnc :3 \
    -netdev tap,id=hostnet0,script=/etc/qemu-ifup \
    -device virtio-net-pci,netdev=hostnet0,id=virtio-net-pci0,mac=70:e2:84:14:0e:15 \
    -drive file=/home/test.raw,id=drive_image2,format=raw,if=none,cache=none,media=disk \
    -device virtio-blk-pci,drive=drive_image2,id=virtio-blk-1

3. Live snapshot and live commit:

{"execute":"qmp_capabilities"}
{"return": {}}
{"execute":"blockdev-snapshot-sync","arguments":{"device":"drive_image2","snapshot-file": "/home/sn1","format":"qcow2"}}
Formatting '/home/sn1', fmt=qcow2 size=1073741824 backing_file=/home/test.raw backing_fmt=raw cluster_size=65536 lazy_refcounts=off refcount_bits=16
{"return": {}}
{"execute":"blockdev-snapshot-sync","arguments":{"device":"drive_image2","snapshot-file": "/home/sn2","format":"qcow2"}}
Formatting '/home/sn2', fmt=qcow2 size=1073741824 backing_file=/home/sn1 backing_fmt=qcow2 cluster_size=65536 lazy_refcounts=off refcount_bits=16
{"return": {}}
{"execute":"blockdev-snapshot-sync","arguments":{"device":"drive_image2","snapshot-file": "/home/sn3","format":"qcow2"}}
Formatting '/home/sn3', fmt=qcow2 size=1073741824 backing_file=/home/sn2 backing_fmt=qcow2 cluster_size=65536 lazy_refcounts=off refcount_bits=16
{"return": {}}
{"execute":"block-commit","arguments":{"device":"drive_image2","base":"/home/test.raw","top":"/home/sn2","speed": 10000}}
{"return": {}}
main-loop: WARNING: I/O thread spun for 1000 iterations
{"timestamp": {"seconds": 1512012910, "microseconds": 534683}, "event": "BLOCK_JOB_COMPLETED", "data": {"device": "drive_image2", "len": 1073741824, "offset": 1073741824, "speed": 10000, "type": "commit"}}

Result: as above, we still get: main-loop: WARNING: I/O thread spun for 1000 iterations
John Snow:

Qianqian, my apologies -- I misread the test case. There is indeed a difference depending on whether you use SN3 or SN4 as the top. I was able to reproduce and identify the issue.

If the speed of the operation is slow enough, the delay computed by the copy loop will effectively always be zero, and we will try to yield aggressively, with 0-nanosecond sleeps, on every single iteration of the loop. After 1000 of these, we get the warning.

I've taken a stab at fixing this upstream in a way that should be a bit nicer on the main loop: "[PATCH 0/7] blockjob: refactor mirror_throttle".

For now, I will be postponing this to the next release, and I don't think we need a z-stream fix, as the bug is mostly a case of inefficiency and not a showstopper.