Red Hat Bugzilla – Bug 1492295
Guest hits call trace with I/O throttling (iops) after status change from stop to cont during I/O testing
Last modified: 2018-04-10 20:37:50 EDT
Description of problem:
Guest hits a call trace with I/O throttling (iops) after changing the status from stop to cont while I/O testing is running on (DD2) POWER9.

Version-Release number of selected component (if applicable):
Host: 4.11.0-32.el7a.ppc64le
Guest: 4.11.0-32.el7a.ppc64le
Qemu-kvm: qemu-kvm-2.9.0-22.el7a.ppc64le

How reproducible:
100%

Steps to Reproduce:
1. Boot a guest with I/O throttling, e.g.:

[root@virt4 command]# cat rhel-alt74-8381-bug.sh
/usr/libexec/qemu-kvm \
    -name 'yhong-rhel-alt-74-8381' \
    -machine pseries-rhel7.4.0alt \
    -m 4G \
    -nodefaults \
    -smp 4,cores=4,threads=1,sockets=1 \
    -boot order=cdn,once=d,menu=on,strict=on \
    -device nec-usb-xhci,id=xhci \
    -device usb-tablet,id=usb-tablet0 \
    -device usb-kbd,id=usb-kbd0 \
    -enable-kvm \
    -drive file=/home/yhong/image/RHEL-ALT-7.4-20170830.0-throttling-virtio-blk-20G.raw,format=raw,if=none,cache=none,media=disk,werror=stop,rerror=stop,id=drive-0 \
    -device virtio-blk-pci,bus=pci.0,addr=0x04,drive=drive-0,id=virtio-blk-0,bootindex=0 \
    -drive file=/home/yhong/image/data-disk1-11G.raw,format=raw,if=none,cache=none,media=disk,werror=stop,rerror=stop,id=drive-1,iops=100 \
    -device virtio-blk-pci,bus=pci.0,addr=0x05,drive=drive-1,id=virtio-blk-1 \
    -netdev tap,id=hostnet0,script=/etc/qemu-ifup \
    -device virtio-net-pci,netdev=hostnet0,id=virtio-net-pci0,mac=40:f2:e9:5d:9c:90 \
    -qmp tcp:0:4000,server,nowait \
    -chardev socket,id=serial_id_serial,path=/var/tmp/serial-yhong-throttling-virtio-blk-0,server,nowait \
    -device spapr-vty,reg=0x30000000,chardev=serial_id_serial \
    -monitor stdio

2. Run dd testing against the data disk, e.g.:

[root@localhost home]# cat debug-auto-dd.sh
#!/bin/bash
for BS in 512b 1K 4K 8K 16K
do
    echo "dd if=/dev/zero of=/dev/vdb bs=$BS count=1000 oflag=direct status=progress"
    dd if=/dev/zero of=/dev/vdb bs=$BS count=1000 oflag=direct status=progress
done

3. Stop the guest, e.g.:
{"execute": "stop"}

4. Continue the guest after a while (60s ~ 120s), e.g.:
{"execute": "cont"}
(A sketch that automates steps 3 and 4 over QMP follows the expected results below.)

Actual results:
The testing hangs and the guest hits a call trace:

[root@localhost ~]# [ 243.693446] INFO: task dd:1454 blocked for more than 120 seconds.
[ 243.693552] Not tainted 4.11.0-32.el7a.ppc64le #1
[ 243.693693] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 243.693836] dd D 0 1454 1453 0x00040080
[ 243.693888] Call Trace:
[ 243.693917] [c0000000f3b2f6c0] [c0000000f3ec84a8] 0xc0000000f3ec84a8 (unreliable)
[ 243.693990] [c0000000f3b2f890] [c00000000001cfb0] __switch_to+0x310/0x480
[ 243.694052] [c0000000f3b2f8f0] [c000000000beb354] __schedule+0x3b4/0xb90
[ 243.694111] [c0000000f3b2f9c0] [c000000000170494] io_schedule+0x84/0x120
[ 243.694171] [c0000000f3b2f9f0] [c00000000048da8c] __blkdev_direct_IO_simple+0x19c/0x400
[ 243.694243] [c0000000f3b2fb20] [c00000000048e150] blkdev_direct_IO+0x460/0x4a0
[ 243.694319] [c0000000f3b2fbd0] [c0000000002feae8] generic_file_direct_write+0x298/0x3a0
[ 243.694393] [c0000000f3b2fc60] [c0000000002fed4c] __generic_file_write_iter+0x15c/0x240
[ 243.694463] [c0000000f3b2fcc0] [c00000000048f474] blkdev_write_iter+0x94/0x140
[ 243.694549] [c0000000f3b2fd30] [c000000000414a28] __vfs_write+0x178/0x1f0
[ 243.694611] [c0000000f3b2fdd0] [c00000000041929c] SyS_write+0x16c/0x340
[ 243.694675] [c0000000f3b2fe30] [c00000000000b004] system_call+0x38/0xe0

Expected results:
The testing keeps running and no call trace appears.
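Steps 3 and 4 can be driven over the QMP socket that the command line above exposes on tcp:0:4000. Below is a minimal sketch (the 90-second pause and the use of nc are assumptions, not part of the original report); note that QMP only accepts commands after the qmp_capabilities handshake:

#!/bin/bash
# Sketch of a stop/cont driver for the QMP server above (tcp:0:4000).
# QMP accepts commands only after the qmp_capabilities handshake.
{
    echo '{"execute": "qmp_capabilities"}'
    sleep 1
    echo '{"execute": "stop"}'
    sleep 90    # keep the guest stopped for 60s ~ 120s
    echo '{"execute": "cont"}'
    sleep 1
} | nc 127.0.0.1 4000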
Additional info:
It is also reproducible on x86 and POWER8.

X86:
Host: 3.10.0-693.2.1.el7.x86_64
Guest: 3.10.0-693.2.1.el7.x86_64
Qemu-kvm: qemu-kvm-rhev-2.9.0-16.el7_4.6.x86_64

Sep 16 03:41:26 localhost kernel: INFO: task dd:1406 blocked for more than 120 seconds.
Sep 16 03:41:26 localhost kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Sep 16 03:41:26 localhost kernel: dd D ffff880136388000 0 1406 1404 0x00000080
Sep 16 03:41:26 localhost kernel: ffff88003663b920 0000000000000082 ffff8800364a4f10 ffff88003663bfd8
Sep 16 03:41:26 localhost kernel: ffff88003663bfd8 ffff88003663bfd8 ffff8800364a4f10 ffff88013fd16cc0
Sep 16 03:41:26 localhost kernel: 0000000000000000 7fffffffffffffff 0000000000000000 ffff880136388000
Sep 16 03:41:26 localhost kernel: Call Trace:
Sep 16 03:41:26 localhost kernel: [<ffffffff816a94c9>] schedule+0x29/0x70
Sep 16 03:41:26 localhost kernel: [<ffffffff816a6fd9>] schedule_timeout+0x239/0x2c0
Sep 16 03:41:26 localhost kernel: [<ffffffff81062efe>] ? kvm_clock_get_cycles+0x1e/0x20
Sep 16 03:41:26 localhost kernel: [<ffffffff816a8b4d>] io_schedule_timeout+0xad/0x130
Sep 16 03:41:26 localhost kernel: [<ffffffff816a8be8>] io_schedule+0x18/0x20
Sep 16 03:41:26 localhost kernel: [<ffffffff812421d5>] do_blockdev_direct_IO+0x1c45/0x2020
Sep 16 03:41:26 localhost kernel: [<ffffffff8123cb30>] ? I_BDEV+0x10/0x10
Sep 16 03:41:26 localhost kernel: [<ffffffff81242605>] __blockdev_direct_IO+0x55/0x60
Sep 16 03:41:26 localhost kernel: [<ffffffff8123cb30>] ? I_BDEV+0x10/0x10
Sep 16 03:41:26 localhost kernel: [<ffffffff8123d420>] blkdev_direct_IO+0x60/0xa0
Sep 16 03:41:26 localhost kernel: [<ffffffff8123cb30>] ? I_BDEV+0x10/0x10
Sep 16 03:41:26 localhost kernel: [<ffffffff8118464d>] generic_file_direct_write+0xcd/0x190
Sep 16 03:41:26 localhost kernel: [<ffffffff81184947>] __generic_file_aio_write+0x237/0x400
Sep 16 03:41:26 localhost kernel: [<ffffffff8123de16>] blkdev_aio_write+0x56/0xb0
Sep 16 03:41:26 localhost kernel: [<ffffffff8120026d>] do_sync_write+0x8d/0xd0
Sep 16 03:41:26 localhost kernel: [<ffffffff81200d2d>] vfs_write+0xbd/0x1e0
Sep 16 03:41:26 localhost kernel: [<ffffffff81201b3f>] SyS_write+0x7f/0xe0
Sep 16 03:41:26 localhost kernel: [<ffffffff816b4fc9>] system_call_fastpath+0x16/0x1b

POWER8:
Host: 3.10.0-693.el7.ppc64le
Guest: 3.10.0-693.el7.ppc64le
Qemu-kvm: qemu-kvm-rhev-2.9.0-16.el7_4.3.ppc64le

[root@dhcp71-168 ~]# [10800.790048] INFO: task fio:23344 blocked for more than 120 seconds.
[10800.790149] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[10800.790197] fio D 00003fffb78e47bc 0 23344 23342 0x00040080
[10800.790249] Call Trace:
[10800.790372] [c0000000a12d74a0] [c0000000a12d75f0] 0xc0000000a12d75f0 (unreliable)
[10800.790464] [c0000000a12d7670] [c000000000019dbc] __switch_to+0x25c/0x470
[10800.790529] [c0000000a12d76d0] [c0000000009c98e8] __schedule+0x438/0xb00
[10800.790580] [c0000000a12d77b0] [c0000000009c59a8] schedule_timeout+0x398/0x460
[10800.790638] [c0000000a12d78b0] [c0000000009c93fc] io_schedule+0xcc/0x180
[10800.790695] [c0000000a12d78f0] [c000000000399bac] __blockdev_direct_IO+0xf7c/0x2170
[10800.790755] [c0000000a12d7ae0] [c000000000392590] blkdev_direct_IO+0x60/0xc0
[10800.790806] [c0000000a12d7b60] [c00000000025ab38] generic_file_aio_read+0x548/0xd80
[10800.790862] [c0000000a12d7c60] [c0000000003930bc] blkdev_aio_read+0x5c/0xb0
[10800.790914] [c0000000a12d7ca0] [c000000000324fb4] do_sync_read+0x94/0x110
[10800.790965] [c0000000a12d7da0] [c000000000325f50] vfs_read+0x110/0x290
[10800.791015] [c0000000a12d7de0] [c000000000327f70] SyS_pread64+0xb0/0xd0
[10800.791065] [c0000000a12d7e30] [c00000000000a184] system_call+0x38/0xb4
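Note that the POWER8 case above hit the hang from fio rather than dd. A minimal fio invocation along the following lines (all parameters are illustrative assumptions, not from the original report; /dev/vdb is assumed to be the throttled data disk) exercises the same O_DIRECT path seen in the call traces:

# Illustrative fio run against the throttled disk; parameters are
# assumptions chosen to stay on the direct-I/O path.
fio --name=throttled-disk --filename=/dev/vdb --direct=1 \
    --rw=randrw --bs=4k --iodepth=8 --ioengine=libaio \
    --runtime=300 --time_based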
Patch posted upstream: https://patchwork.ozlabs.org/patch/815627/
The bug also occurred on the following qemu-kvm-rhev versions:
qemu-kvm-rhev-2.9.0-16.el7.x86_64 (RHEL 7.4 released)
qemu-kvm-rhev-2.10.0-4.el7.x86_64 (latest RHEL 7.5)
It is not a regression.
Fix included in qemu-kvm-rhev-2.10.0-8.el7
Verified the bug with the same steps as in the original report:
host kernel: 3.10.0-799.el7.x86_64
guest kernel: 3.10.0-768.el7.x86_64
qemu: qemu-kvm-rhev-2.10.0-9.el7.x86_64
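As an extra sanity check during verification (not part of the original steps), assuming the guest is again started with the QMP server on tcp:0:4000, query-block can confirm that the iops=100 limit is still attached to drive-1 after the stop/cont cycle:

# Hypothetical post-cont check: query-block reports the configured
# I/O limits per drive; the throttled workload should resume rather
# than hang.
echo '{"execute": "qmp_capabilities"}
{"execute": "query-block"}' | nc 127.0.0.1 4000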
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2018:1104