Bug 1265579
| Summary: | qemu gives "main-loop: WARNING: I/O thread spun for 1000 iterations" when doing live commit | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 7 | Reporter: | Shuang Yu <shuyu> |
| Component: | qemu-kvm-rhev | Assignee: | John Snow <jsnow> |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | Gu Nini <ngu> |
| Severity: | low | Docs Contact: | |
| Priority: | low | | |
| Version: | 7.4 | CC: | famz, hannsj_uhl, jsnow, juzhang, knoel, michen, ngu, pezhang, qizhu, qzhang, stefanha, virt-maint, xfu, xuhan, xuma |
| Target Milestone: | rc | | |
| Target Release: | 7.5 | | |
| Hardware: | All | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2018-08-20 07:26:57 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 1444027 | | |
Description (Shuang Yu, 2015-09-23 09:27:18 UTC)
David Gibson (comment #2):

Ok, there's nothing obvious here that would make this ppc64-specific. Can you please check:

1. Does the bug occur on x86?
2. Does the bug occur with different guests (e.g. RHEL6-LE or RHEL7)?
3. Does the bug occur if you use virtio-blk instead of virtio-scsi?
4. Does the bug occur if you use spapr-vscsi instead of virtio-scsi?

Shuang Yu (in reply to David Gibson from comment #2):

> 1. Does the bug occur on x86?

Retested with RHEL-Server-6.7-64.qcow2 and RHEL-Server-7.2-64.qcow2 guests on an x86 host; doing a live commit after a snapshot hits this bug.

> 2. Does the bug occur with different guests (e.g. RHEL6-LE or RHEL7)?

Retested with a RHEL-7.2-20150924.0-Server-ppc64le-dvd1.iso guest on a PowerPC host; doing a live commit after a snapshot hits this bug.

> 3. Does the bug occur if you use virtio-blk instead of virtio-scsi?

Using virtio-blk instead of virtio-scsi as the boot disk also hits this bug:

# /usr/libexec/qemu-kvm -name live-commit \
    -machine pseries,accel=kvm,usb=off \
    -realtime mlock=off \
    -nodefaults \
    -monitor stdio \
    -rtc base=utc \
    -device virtio-blk-pci,id=virtio0,drive=drive-virtio0 \
    -drive file=RHEL-6.7-20150702.0-Server-ppc64.raw,format=raw,if=none,id=drive-virtio0,werror=stop,cache=none \
    -vnc :10 \
    -msg timestamp=on \
    -usb \
    -device usb-tablet,id=tablet1 \
    -vga std \
    -netdev tap,id=hostnet1,script=/etc/qemu-ifup,vhost=on \
    -device virtio-net-pci,netdev=hostnet1,id=net1,mac=00:54:52:5f:5d:5a \
    -uuid 781c99a0-7bef-47de-9ca7-344b87cef66d \
    -m 8G \
    -smp 4 \
    -qmp tcp:0:6666,server,nowait

> 4. Does the bug occur if you use spapr-vscsi instead of virtio-scsi?

Using spapr-vscsi instead of virtio-scsi as the boot disk also hits this bug:
# /usr/libexec/qemu-kvm -name live-commit \
    -machine pseries,accel=kvm,usb=off \
    -realtime mlock=off \
    -nodefaults \
    -monitor stdio \
    -rtc base=utc \
    -device spapr-vscsi,id=scsi0,reg=0x1000 \
    -drive file=RHEL-6.7-20150702.0-Server-ppc64.raw,format=raw,if=none,id=drive-scsi0,cache=none \
    -device scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0,id=scsi0-0 \
    -vnc :10 \
    -msg timestamp=on \
    -usb \
    -device usb-tablet,id=tablet1 \
    -vga std \
    -netdev tap,id=hostnet1,script=/etc/qemu-ifup,vhost=on \
    -device virtio-net-pci,netdev=hostnet1,id=net1,mac=00:54:52:5f:5d:5a \
    -uuid 781c99a0-7bef-47de-9ca7-344b87cef66d \
    -m 8G \
    -smp 4 \
    -qmp tcp:0:6666,server,nowait

David Gibson:

Ok, it looks like this isn't actually Power-specific. Re-assigning.

Fam Zheng:

John,

The way speed=XXX is implemented is this: in the block job loop, between iterations, we calculate a sleep period from how much I/O we have done and the "speed" parameter, and then sleep for that long, yielding the BQL mutex to let the vcpu thread run.

The problem with that is, if the speed is set very high (or if the sleep period somehow always comes out very short), we don't really yield the mutex, because the sleep timer fires so quickly that the main loop dispatches it even before the first yield of the BQL happens. When that is the case, the warning is printed after 1000 iterations.

It's a known problem, but I don't know the proper fix yet. Maybe we can add a primitive that sleeps no shorter than the next iteration. To confirm this is the cause, we can try a lower speed= value. If I'm right, feel free to re-assign this bug to me.

I'll debug a little further... it happens with speeds down to 1000, unless you want something /quite/ small in that parameter.

Fam Zheng:

I've dug into the issue by tracing the actual events that are dispatched; it boils down to the known "block_job_sleep_ns(..., 0);" problem in block jobs.
Again, this keeps setting up a timer with "now" as the expiration time, which prevents the main loop from releasing the iothread lock; hence the complaint. I prototyped the idea mentioned in comment 5, but that "sleep no shorter than next iteration" primitive slows the block job down more than I would like. We need to think of something better. Anyway, this is definitely upstream-first material; for 7.2 I think it doesn't hurt to ignore this warning, or to disable it temporarily until we have a proper fix.

Fam

Postponing to 7.3: the warning isn't an error and can be ignored. We will investigate a proper fix and consider a backport to 7.2.z once we have discovered the root cause and fixed it upstream.

This may have been fixed (or at least shows different behavior) by recent changes, but it is not a priority, as it is just a warning.

John Snow:

We've been adjusting the way block jobs work upstream quite a bit lately, and I'm not able to reproduce this any longer. That doesn't mean it isn't still an issue, but it doesn't appear to be one as of 2.11-rc3. I re-tested on qemu-kvm-rhev-2.10.0-9.el7 too, and it looks alright there as well. Odds are very good that we have more work to do upstream, as we're running into some block-job scheduling difficulties, but the problem as it existed appears to be handled for now.
Tested as:

Guest: Fedora 26 Server
Host: Fedora 27 Workstation

Command line:

./x86_64-softmmu/qemu-system-x86_64 \
    -machine pc,accel=kvm,usb=off \
    -realtime mlock=off \
    -nodefaults \
    -monitor stdio \
    -rtc base=utc \
    -device virtio-scsi-pci,id=scsi \
    -drive file=/media/ext/img/f26.raw,format=raw,if=none,id=drive-scsi0,cache=none \
    -device scsi-hd,drive=drive-scsi0,id=disk0,bus=scsi.0 \
    -msg timestamp=on \
    -usb \
    -device usb-tablet,id=tablet1 \
    -vga std \
    -drive file=/media/ext/iso/linux/Fedora/Fedora-Server-dvd-x86_64-26-1.5.iso,if=none,media=cdrom,format=raw,rerror=stop,werror=stop,id=scsi-cdrom0 \
    -device virtio-scsi-pci,id=bus2,addr=0x5 \
    -device scsi-cd,bus=bus2.0,drive=scsi-cdrom0,id=cdrom0 \
    -uuid 781c99a0-7bef-47de-9ca7-344b87cef66d \
    -m 8G \
    -smp 4 \
    -qmp tcp:0:4444,server,nowait \
    -display gtk

Process (using qmp-shell):

blockdev-snapshot-sync device=drive-scsi0 snapshot-file=/media/ext/img/f26.sn1.qcow2 format=qcow2
blockdev-snapshot-sync device=drive-scsi0 snapshot-file=/media/ext/img/f26.sn2.qcow2 format=qcow2
blockdev-snapshot-sync device=drive-scsi0 snapshot-file=/media/ext/img/f26.sn3.qcow2 format=qcow2
blockdev-snapshot-sync device=drive-scsi0 snapshot-file=/media/ext/img/f26.sn4.qcow2 format=qcow2
block-commit device=drive-scsi0 base=/media/ext/img/f26.raw top=/media/ext/img/f26.sn4.qcow2 speed=10000

To the QA team: if you were to retest, I suspect you'd find the behavior is now different. (Or I just can't reproduce. Please let me know!)

Qianqian:

Thank you, John. I can still reproduce this issue with the latest qemu version:

Version:
qemu-kvm-rhev-2.10.0-9.el7.x86_64
qemu-img-rhev-2.10.0-9.el7.x86_64
kernel-3.10.0-800.el7.x86_64

Steps:

1. Create the base image:

qemu-img create /home/test.raw 1G

2.
Launch the guest with the image:

/usr/libexec/qemu-kvm \
    -name rhel7_4 \
    -m 1G \
    -smp 2 \
    -drive file=/home/kvm_autotest_root/images/rhel74-64-virtio.qcow2,format=qcow2,id=drive_image1,if=none,cache=none,media=disk \
    -device virtio-blk-pci,drive=drive_image1,id=virtio-blk-0,bootindex=0 \
    -qmp stdio \
    -vnc :3 \
    -netdev tap,id=hostnet0,script=/etc/qemu-ifup \
    -device virtio-net-pci,netdev=hostnet0,id=virtio-net-pci0,mac=70:e2:84:14:0e:15 \
    -drive file=/home/test.raw,id=drive_image2,format=raw,if=none,cache=none,media=disk \
    -device virtio-blk-pci,drive=drive_image2,id=virtio-blk-1

3. Live snapshot and live commit:

{"execute":"qmp_capabilities"}
{"return": {}}
{"execute":"blockdev-snapshot-sync","arguments":{"device":"drive_image2","snapshot-file": "/home/sn1","format":"qcow2"}}
Formatting '/home/sn1', fmt=qcow2 size=1073741824 backing_file=/home/test.raw backing_fmt=raw cluster_size=65536 lazy_refcounts=off refcount_bits=16
{"return": {}}
{"execute":"blockdev-snapshot-sync","arguments":{"device":"drive_image2","snapshot-file": "/home/sn2","format":"qcow2"}}
Formatting '/home/sn2', fmt=qcow2 size=1073741824 backing_file=/home/sn1 backing_fmt=qcow2 cluster_size=65536 lazy_refcounts=off refcount_bits=16
{"return": {}}
{"execute":"blockdev-snapshot-sync","arguments":{"device":"drive_image2","snapshot-file": "/home/sn3","format":"qcow2"}}
Formatting '/home/sn3', fmt=qcow2 size=1073741824 backing_file=/home/sn2 backing_fmt=qcow2 cluster_size=65536 lazy_refcounts=off refcount_bits=16
{"return": {}}
{"execute":"block-commit","arguments":{"device":"drive_image2","base":"/home/test.raw","top":"/home/sn2","speed": 10000}}
{"return": {}}
main-loop: WARNING: I/O thread spun for 1000 iterations
{"timestamp": {"seconds": 1512012910, "microseconds": 534683}, "event": "BLOCK_JOB_COMPLETED", "data": {"device": "drive_image2", "len": 1073741824, "offset": 1073741824, "speed": 10000, "type": "commit"}}

Result: as above, we still get: main-loop: WARNING: I/O thread spun for 1000 iterations
John Snow:

Qianqian, my apologies -- I misread the test case. There is indeed a difference depending on whether you use SN3 or SN4 as the top. I was able to reproduce and identify the issue.

If the speed of the operation is slow enough, the delay computed by the copy loop will effectively always be zero, and we will try to yield aggressively, with 0-nanosecond sleeps, on every single iteration of the loop. After 1000 of these, we get the warning.

I've taken a stab at fixing this upstream in a way that should be a bit nicer on the main loop: "[PATCH 0/7] blockjob: refactor mirror_throttle".

For now, I will be postponing this to the next release, and I don't think we need a z-stream fix, as the bug is mostly a case of inefficiency and not a showstopper.