Bug 1492295 - Guest hits call trace with I/O throttling (iops) after status changes from stop to cont during I/O testing
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: qemu-kvm-rhev
Version: 7.4-Alt
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Target Milestone: rc
Target Release: ---
Assigned To: Stefan Hajnoczi
QA Contact: Gu Nini
Docs Contact:
Depends On:
Blocks:
Reported: 2017-09-16 04:22 EDT by Yongxue Hong
Modified: 2018-04-10 20:37 EDT
CC List: 10 users

See Also:
Fixed In Version: qemu-kvm-rhev-2.10.0-8.el7
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2018-04-10 20:36:03 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---




External Trackers
Tracker ID Priority Status Summary Last Updated
Red Hat Product Errata RHSA-2018:1104 None None None 2018-04-10 20:37 EDT

Description Yongxue Hong 2017-09-16 04:22:01 EDT
Description of problem:
Guest hits a call trace with I/O throttling (iops) after the status changes from stop to cont during I/O testing on (DD2) POWER9.

Version-Release number of selected component (if applicable):
Host: 4.11.0-32.el7a.ppc64le
Guest: 4.11.0-32.el7a.ppc64le
Qemu-kvm: qemu-kvm-2.9.0-22.el7a.ppc64le

How reproducible:
100%

Steps to Reproduce:
1. Boot a guest with I/O throttling enabled.
e.g.:
[root@virt4 command]# cat rhel-alt74-8381-bug.sh
/usr/libexec/qemu-kvm \
-name 'yhong-rhel-alt-74-8381' \
-machine pseries-rhel7.4.0alt \
-m 4G \
-nodefaults \
-smp 4,cores=4,threads=1,sockets=1 \
-boot order=cdn,once=d,menu=on,strict=on  \
-device nec-usb-xhci,id=xhci \
-device usb-tablet,id=usb-tablet0 \
-device usb-kbd,id=usb-kbd0 \
-enable-kvm \
-drive file=/home/yhong/image/RHEL-ALT-7.4-20170830.0-throttling-virtio-blk-20G.raw,format=raw,if=none,cache=none,media=disk,werror=stop,rerror=stop,id=drive-0 \
-device virtio-blk-pci,bus=pci.0,addr=0x04,drive=drive-0,id=virtio-blk-0,bootindex=0 \
-drive file=/home/yhong/image/data-disk1-11G.raw,format=raw,if=none,cache=none,media=disk,werror=stop,rerror=stop,id=drive-1,iops=100 \
-device virtio-blk-pci,bus=pci.0,addr=0x05,drive=drive-1,id=virtio-blk-1 \
-netdev tap,id=hostnet0,script=/etc/qemu-ifup \
-device virtio-net-pci,netdev=hostnet0,id=virtio-net-pci0,mac=40:f2:e9:5d:9c:90 \
-qmp tcp:0:4000,server,nowait \
-chardev socket,id=serial_id_serial,path=/var/tmp/serial-yhong-throttling-virtio-blk-0,server,nowait \
-device spapr-vty,reg=0x30000000,chardev=serial_id_serial \
-monitor stdio
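The iops=100 limit on drive-1 above is set once on the command line. As a side note, qemu also lets you retune the throttle on a running guest through the QMP block_set_io_throttle command. The sketch below is illustrative only (the host/port match the tcp:0:4000 socket in the command line above, but the helper names and values are assumptions, not part of this report):

```python
# Sketch (not from the report): retune the iops limit on one drive at runtime
# through the QMP socket the guest was started with (tcp:0:4000 above).
import json
import socket


def qmp_line(execute, arguments=None):
    """Serialize one QMP command as a newline-terminated JSON line."""
    cmd = {"execute": execute}
    if arguments is not None:
        cmd["arguments"] = arguments
    return json.dumps(cmd) + "\n"


def set_iops(host, port, device, iops):
    """Negotiate QMP capabilities, then change the throttle on one drive."""
    with socket.create_connection((host, port)) as sock:
        f = sock.makefile("rw")
        f.readline()                                    # QMP greeting
        f.write(qmp_line("qmp_capabilities")); f.flush()
        f.readline()                                    # {"return": {}}
        # qemu 2.9 requires all six bps/iops fields; 0 means unlimited.
        f.write(qmp_line("block_set_io_throttle", {
            "device": device,
            "bps": 0, "bps_rd": 0, "bps_wr": 0,
            "iops": iops, "iops_rd": 0, "iops_wr": 0,
        })); f.flush()
        return f.readline()                             # command reply

# Usage against a running guest: set_iops("127.0.0.1", 4000, "drive-1", 200)
```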

2. Run dd testing on the data disk.
e.g.:
[root@localhost home]# cat debug-auto-dd.sh
#!/bin/bash
for BS in 512b 1K 4K 8K 16K
do
    echo "dd if=/dev/zero of=/dev/vdb bs=$BS count=1000 oflag=direct status=progress"
    dd if=/dev/zero of=/dev/vdb bs=$BS count=1000 oflag=direct status=progress
done

3. Stop the guest.
e.g.:
{"execute": "stop"}

4. Resume (cont) the guest after a while (60s–120s).
e.g.:
{"execute": "cont"}
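Steps 3–4 toggle the run state by hand in the monitor. A minimal sketch automating the same stop/wait/cont cycle over the QMP socket from the command line above (the host/port and the run_qmp helper are assumptions, not from the report):

```python
# Sketch (not from the report): send QMP commands in order over the guest's
# QMP socket, optionally sleeping after one of them to mimic step 4's wait.
import json
import socket
import time


def run_qmp(host, port, commands, pause_after=None, pause=90):
    """Negotiate capabilities, send each command, return the raw replies."""
    with socket.create_connection((host, port)) as sock:
        f = sock.makefile("rw")
        f.readline()                                    # QMP greeting
        replies = []
        for name in ["qmp_capabilities"] + list(commands):
            f.write(json.dumps({"execute": name}) + "\n")
            f.flush()
            replies.append(f.readline())
            if name == pause_after:
                time.sleep(pause)       # 60s-120s window from step 4
        return replies

# Usage: run_qmp("127.0.0.1", 4000, ["stop", "cont"], pause_after="stop")
```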

Actual results:
The dd testing hangs and the guest hits a call trace.
[root@localhost ~]# [  243.693446] INFO: task dd:1454 blocked for more than 120 seconds.
[  243.693552]       Not tainted 4.11.0-32.el7a.ppc64le #1
[  243.693693] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  243.693836] dd              D    0  1454   1453 0x00040080
[  243.693888] Call Trace:
[  243.693917] [c0000000f3b2f6c0] [c0000000f3ec84a8] 0xc0000000f3ec84a8 (unreliable)
[  243.693990] [c0000000f3b2f890] [c00000000001cfb0] __switch_to+0x310/0x480
[  243.694052] [c0000000f3b2f8f0] [c000000000beb354] __schedule+0x3b4/0xb90
[  243.694111] [c0000000f3b2f9c0] [c000000000170494] io_schedule+0x84/0x120
[  243.694171] [c0000000f3b2f9f0] [c00000000048da8c] __blkdev_direct_IO_simple+0x19c/0x400
[  243.694243] [c0000000f3b2fb20] [c00000000048e150] blkdev_direct_IO+0x460/0x4a0
[  243.694319] [c0000000f3b2fbd0] [c0000000002feae8] generic_file_direct_write+0x298/0x3a0
[  243.694393] [c0000000f3b2fc60] [c0000000002fed4c] __generic_file_write_iter+0x15c/0x240
[  243.694463] [c0000000f3b2fcc0] [c00000000048f474] blkdev_write_iter+0x94/0x140
[  243.694549] [c0000000f3b2fd30] [c000000000414a28] __vfs_write+0x178/0x1f0
[  243.694611] [c0000000f3b2fdd0] [c00000000041929c] SyS_write+0x16c/0x340
[  243.694675] [c0000000f3b2fe30] [c00000000000b004] system_call+0x38/0xe0

Expected results:
The testing should continue without a call trace.

Additional info:
The bug is also reproducible on x86 and POWER8:

X86:
Host: 3.10.0-693.2.1.el7.x86_64
Guest: 3.10.0-693.2.1.el7.x86_64
Qemu-kvm: qemu-kvm-rhev-2.9.0-16.el7_4.6.x86_64

Sep 16 03:41:26 localhost kernel: INFO: task dd:1406 blocked for more than 120 seconds.
Sep 16 03:41:26 localhost kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Sep 16 03:41:26 localhost kernel: dd              D ffff880136388000     0  1406   1404 0x00000080
Sep 16 03:41:26 localhost kernel: ffff88003663b920 0000000000000082 ffff8800364a4f10 ffff88003663bfd8
Sep 16 03:41:26 localhost kernel: ffff88003663bfd8 ffff88003663bfd8 ffff8800364a4f10 ffff88013fd16cc0
Sep 16 03:41:26 localhost kernel: 0000000000000000 7fffffffffffffff 0000000000000000 ffff880136388000
Sep 16 03:41:26 localhost kernel: Call Trace:
Sep 16 03:41:26 localhost kernel: [<ffffffff816a94c9>] schedule+0x29/0x70
Sep 16 03:41:26 localhost kernel: [<ffffffff816a6fd9>] schedule_timeout+0x239/0x2c0
Sep 16 03:41:26 localhost kernel: [<ffffffff81062efe>] ? kvm_clock_get_cycles+0x1e/0x20
Sep 16 03:41:26 localhost kernel: [<ffffffff816a8b4d>] io_schedule_timeout+0xad/0x130
Sep 16 03:41:26 localhost kernel: [<ffffffff816a8be8>] io_schedule+0x18/0x20
Sep 16 03:41:26 localhost kernel: [<ffffffff812421d5>] do_blockdev_direct_IO+0x1c45/0x2020
Sep 16 03:41:26 localhost kernel: [<ffffffff8123cb30>] ? I_BDEV+0x10/0x10
Sep 16 03:41:26 localhost kernel: [<ffffffff81242605>] __blockdev_direct_IO+0x55/0x60
Sep 16 03:41:26 localhost kernel: [<ffffffff8123cb30>] ? I_BDEV+0x10/0x10
Sep 16 03:41:26 localhost kernel: [<ffffffff8123d420>] blkdev_direct_IO+0x60/0xa0
Sep 16 03:41:26 localhost kernel: [<ffffffff8123cb30>] ? I_BDEV+0x10/0x10
Sep 16 03:41:26 localhost kernel: [<ffffffff8118464d>] generic_file_direct_write+0xcd/0x190
Sep 16 03:41:26 localhost kernel: [<ffffffff81184947>] __generic_file_aio_write+0x237/0x400
Sep 16 03:41:26 localhost kernel: [<ffffffff8123de16>] blkdev_aio_write+0x56/0xb0
Sep 16 03:41:26 localhost kernel: [<ffffffff8120026d>] do_sync_write+0x8d/0xd0
Sep 16 03:41:26 localhost kernel: [<ffffffff81200d2d>] vfs_write+0xbd/0x1e0
Sep 16 03:41:26 localhost kernel: [<ffffffff81201b3f>] SyS_write+0x7f/0xe0
Sep 16 03:41:26 localhost kernel: [<ffffffff816b4fc9>] system_call_fastpath+0x16/0x1b

POWER8:
Host: 3.10.0-693.el7.ppc64le
Guest: 3.10.0-693.el7.ppc64le
Qemu-kvm: qemu-kvm-rhev-2.9.0-16.el7_4.3.ppc64le

[root@dhcp71-168 ~]# [10800.790048] INFO: task fio:23344 blocked for more than 120 seconds.
[10800.790149] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[10800.790197] fio             D 00003fffb78e47bc     0 23344  23342 0x00040080
[10800.790249] Call Trace:
[10800.790372] [c0000000a12d74a0] [c0000000a12d75f0] 0xc0000000a12d75f0 (unreliable)
[10800.790464] [c0000000a12d7670] [c000000000019dbc] __switch_to+0x25c/0x470
[10800.790529] [c0000000a12d76d0] [c0000000009c98e8] __schedule+0x438/0xb00
[10800.790580] [c0000000a12d77b0] [c0000000009c59a8] schedule_timeout+0x398/0x460
[10800.790638] [c0000000a12d78b0] [c0000000009c93fc] io_schedule+0xcc/0x180
[10800.790695] [c0000000a12d78f0] [c000000000399bac] __blockdev_direct_IO+0xf7c/0x2170
[10800.790755] [c0000000a12d7ae0] [c000000000392590] blkdev_direct_IO+0x60/0xc0
[10800.790806] [c0000000a12d7b60] [c00000000025ab38] generic_file_aio_read+0x548/0xd80
[10800.790862] [c0000000a12d7c60] [c0000000003930bc] blkdev_aio_read+0x5c/0xb0
[10800.790914] [c0000000a12d7ca0] [c000000000324fb4] do_sync_read+0x94/0x110
[10800.790965] [c0000000a12d7da0] [c000000000325f50] vfs_read+0x110/0x290
[10800.791015] [c0000000a12d7de0] [c000000000327f70] SyS_pread64+0xb0/0xd0
[10800.791065] [c0000000a12d7e30] [c00000000000a184] system_call+0x38/0xb4
Comment 1 Stefan Hajnoczi 2017-09-19 11:59:29 EDT
Patch posted upstream:
https://patchwork.ozlabs.org/patch/815627/
Comment 2 Gu Nini 2017-11-03 04:42:04 EDT
The bug also occurs on the following qemu-kvm-rhev versions:

qemu-kvm-rhev-2.9.0-16.el7.x86_64 (RHEL7.4 released)
qemu-kvm-rhev-2.10.0-4.el7.x86_64 (Latest RHEL7.5)

It's not a regression.
Comment 4 Miroslav Rezanina 2017-11-27 09:46:52 EST
Fix included in qemu-kvm-rhev-2.10.0-8.el7
Comment 5 Gu Nini 2017-11-30 01:11:36 EST
Verified the bug with the same steps as in the bug report:

host kernel: 3.10.0-799.el7.x86_64
guest kernel: 3.10.0-768.el7.x86_64
qemu: qemu-kvm-rhev-2.10.0-9.el7.x86_64
Comment 7 errata-xmlrpc 2018-04-10 20:36:03 EDT
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2018:1104
