Bug 1420722 - Block mirror does not work with dataplane, the mirrored disk can not be operated.
Summary: Block mirror does not work with dataplane, the mirrored disk can not be opera...
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: qemu-kvm-rhev
Version: 7.4
Hardware: All
OS: Linux
unspecified
unspecified
Target Milestone: rc
: ---
Assignee: Paolo Bonzini
QA Contact: Qianqian Zhu
URL:
Whiteboard:
Keywords: Regression, TestOnly
Depends On: 1415947
Blocks:
TreeView+ depends on / blocked
 
Reported: 2017-02-09 11:56 UTC by Yongxue Hong
Modified: 2017-07-12 06:01 UTC (History)
13 users (show)

(edit)
Clone Of:
(edit)
Last Closed: 2017-06-16 20:53:10 UTC


Attachments (Terms of Use)

Description Yongxue Hong 2017-02-09 11:56:55 UTC
Description of problem:
Block mirror does not work with dataplane, the mirrored disk can not be operated since blocked for more than 120 seconds.
the file /var/log/messages indicates "NFO: task fdisk:2874 blocked for more than 120 seconds....kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message."

Version-Release number of selected component (if applicable):
Host kernel: kernel-3.10.0-558.el7.ppc64le
qemu: qemu-kvm-rhev-2.8.0-3.el7
slof: SLOF-20160223-6.gitdbbfda4.el7
Guest Kernel:kernel-3.10.0-558.el7.ppc64

How reproducible:
100%

Steps to Reproduce:
1.boot a guest with dataplane 
[root@ibm-p8-garrison-06 command]# cat qemu-kvm-rhel7-virtio-blk-9343-3
/usr/libexec/qemu-kvm \
-name RHEL7-virtio-blk-9343-3 \
-M pseries-rhel7.4.0 \
-m 8G \
-nodefaults \
-smp 2,sockets=1,cores=1,threads=2 \
-boot menu=on,order=cd \
-device VGA,id=vga0,addr=0 \
-device nec-usb-xhci,id=xhci \
-device usb-tablet,id=usb-tablet0 \
-device usb-kbd,id=usb-kbd0 \
-device virtio-scsi-pci,id=virtio_scsi_pci0 \
-drive file=/home/hyx/os/RHEL-7.3-20161019.0-Server-ppc64-dvd1.iso,if=none,media=cdrom,id=image0 \
-device scsi-cd,id=scsi-cd0,drive=image0,channel=0,scsi-id=0,lun=0,bootindex=1 \
-object iothread,id=iothread0 \
-object iothread,id=iothread1 \
-drive file=/home/hyx/image/RHEL7-9343-20G.qcow2,format=qcow2,if=none,cache=none,aio=native,id=drive-virtio-blk0,werror=stop,rerror=stop \
-device virtio-blk-pci,drive=drive-virtio-blk0,id=virtio-blk0,iothread=iothread0,bus=pci.0,addr=0x15,bootindex=0 \
-drive file=/home/hyx/image/RHEL7-9343-25G.qcow2,format=qcow2,if=none,cache=none,aio=native,id=drive-virtio-blk1,werror=stop,rerror=stop \
-device virtio-blk-pci,drive=drive-virtio-blk1,id=virtio-blk1,iothread=iothread1,bus=pci.0,addr=0x14,serial="QEMU-DISK2" \
-netdev tap,id=hostnet0,script=/etc/qemu-ifup \
-device virtio-net-pci,netdev=hostnet0,id=virtio-net-pci0,mac=70:e2:84:14:0e:15 \
-monitor stdio \
-serial unix:./sock4,server,nowait \
-qmp tcp:0:3000,server,nowait \
-vnc :1

2.mirror and do block-job-complete with qmp
[root@ibm-p8-garrison-06 command]# telnet 0 3000
Trying 0.0.0.0...
Connected to 0.
Escape character is '^]'.
{"QMP": {"version": {"qemu": {"micro": 0, "minor": 8, "major": 2}, "package": "(qemu-kvm-rhev-2.8.0-3.el7)"}, "capabilities": []}}
{"execute": "qmp_capabilities"}
{"return": {}}
{ "execute": "drive-mirror", "arguments": { "device": "drive-virtio-blk1", "target": "/home/hyx/image/drive-mirror-9343.qcow2", "format": "qcow2", "mode": "absolute-paths", "sync": "full" } }
{"timestamp": {"seconds": 1486633339, "microseconds": 359360}, "event": "BLOCK_JOB_READY", "data": {"device": "drive-virtio-blk1", "len": 0, "offset": 0, "speed": 0, "type": "mirror"}}
{"return": {}}
{ "execute": "block-job-complete", "arguments": { "device": "drive-virtio-blk1"} }
{"return": {}}
{"timestamp": {"seconds": 1486633364, "microseconds": 112021}, "event": "BLOCK_JOB_COMPLETED", "data": {"device": "drive-virtio-blk1", "len": 0, "offset": 0, "speed": 0, "type": "mirror"}}

3.login guest and run fio
[root@dhcp113-212 ~]# fio --filename=/dev/vdb --direct=1 --rw=write --bs=64k --size=200M --name=test --iodepth=1 --ioengine=libaio
test: (g=0): rw=write, bs=64K-64K/64K-64K/64K-64K, ioengine=libaio, iodepth=1

Actual results:
the fio has no any response and cat /var/log/messages show "INFO: task fdisk:2874 blocked for more than 120 seconds..."

Expected results:
the fio should be run successfully.

Additional info:
the part of /var/log/messages:
Feb  9 04:54:53 dhcp113-212 kernel: INFO: task fdisk:2874 blocked for more than 120 seconds.
Feb  9 04:54:53 dhcp113-212 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Feb  9 04:54:53 dhcp113-212 kernel: fdisk           D 00003fff788f673c     0  2874      1 0x00002082
Feb  9 04:54:53 dhcp113-212 kernel: Call Trace:
Feb  9 04:54:53 dhcp113-212 kernel: [c0000001f959ae10] [c0000001f959aeb0] 0xc0000001f959aeb0 (unreliable)
Feb  9 04:54:53 dhcp113-212 kernel: [c0000001f959afe0] [c000000000016194] .__switch_to+0x254/0x460
Feb  9 04:54:53 dhcp113-212 kernel: [c0000001f959b090] [c0000000009845c8] .__schedule+0x418/0xad0
Feb  9 04:54:53 dhcp113-212 kernel: [c0000001f959b1c0] [c0000000009808b8] .schedule_timeout+0x398/0x460
Feb  9 04:54:53 dhcp113-212 kernel: [c0000001f959b2d0] [c000000000984104] .io_schedule+0xc4/0x170
Feb  9 04:54:53 dhcp113-212 kernel: [c0000001f959b360] [c000000000980ad8] .bit_wait_io+0x18/0x70
Feb  9 04:54:53 dhcp113-212 kernel: [c0000001f959b3e0] [c0000000009810e4] .__wait_on_bit_lock+0x124/0x2e0
Feb  9 04:54:53 dhcp113-212 kernel: [c0000001f959b4a0] [c00000000024dba0] .__lock_page+0xc0/0xe0
Feb  9 04:54:53 dhcp113-212 kernel: [c0000001f959b550] [c00000000026b39c] .truncate_inode_pages_range+0x94c/0x960
Feb  9 04:54:53 dhcp113-212 kernel: [c0000001f959b740] [c000000000381368] .__blkdev_put+0xd8/0x260
Feb  9 04:54:53 dhcp113-212 kernel: [c0000001f959b7f0] [c000000000381a44] .blkdev_close+0x74/0x1c0
Feb  9 04:54:53 dhcp113-212 kernel: [c0000001f959b890] [c00000000031c0a0] .____fput+0xd0/0x2e0
Feb  9 04:54:53 dhcp113-212 kernel: [c0000001f959b940] [c000000000110254] .task_work_run+0x114/0x150
Feb  9 04:54:53 dhcp113-212 kernel: [c0000001f959b9e0] [c0000000000de234] .do_exit+0x364/0xba0
Feb  9 04:54:53 dhcp113-212 kernel: [c0000001f959bae0] [c0000000000deb14] .do_group_exit+0x54/0xf0
Feb  9 04:54:53 dhcp113-212 kernel: [c0000001f959bb70] [c0000000000f74e0] .get_signal_to_deliver+0x210/0x9c0
Feb  9 04:54:53 dhcp113-212 kernel: [c0000001f959bc70] [c000000000017fc4] .do_signal+0x54/0x320
Feb  9 04:54:53 dhcp113-212 kernel: [c0000001f959bdb0] [c0000000000183ec] .do_notify_resume+0x8c/0x100
Feb  9 04:54:53 dhcp113-212 kernel: [c0000001f959be30] [c00000000000a730] .ret_from_except_lite+0x5c/0x60

and the problem also exist in x86.

Comment 4 Qianqian Zhu 2017-04-26 08:31:16 UTC
Test on qemu-kvm-rhev-2.9.0-1.el7.x86_64, if mirror system disk with data-plane, guest will hang after reopen with mirror image; if it's second disk, guest looks well, but fio will hang.

So reopen this issue. 

Steps:
1. Launch guest with data-plane:
/usr/libexec/qemu-kvm -name rhel7_4-9343 -m 1G -smp 2 -object iothread,id=iothread0 -drive file=/home/kvm_autotest_root/images/rhel74-64-virtio.qcow2,format=qcow2,if=none,cache=none,media=disk,werror=stop,rerror=stop,id=drive-0 -device virtio-blk-pci,drive=drive-0,id=virtio-blk-0,iothread=iothread0,bootindex=0 -drive file=/home/sn1,format=qcow2,if=none,cache=none,aio=native,id=drive-virtio-blk1,werror=stop,rerror=stop -device virtio-blk-pci,drive=drive-virtio-blk1,id=virtio-blk1,iothread=iothread1,bus=pci.0,addr=0x14,serial="QEMU-DISK2" -object iothread,id=iothread1  -monitor stdio -qmp tcp:0:5555,server,nowait -vnc :3 -netdev tap,id=hostnet0,script=/etc/qemu-ifup -device virtio-net-pci,netdev=hostnet0,id=virtio-net-pci0,mac=70:e2:84:14:0e:15

2. Block mirror:
{ "execute": "drive-mirror", "arguments": { "device": "drive-virtio-blk1", "target": "/home/mirror2.qcow2", "format": "qcow2", "mode": "absolute-paths", "sync": "full" } }
{ "execute": "block-job-complete", "arguments": { "device": "drive-virtio-blk1"} }

3. Run fio inside guest:
fio --filename=/dev/vdb --direct=1 --rw=write --bs=64k --size=200M --name=test --iodepth=1 --ioengine=libaio

Result:
# cat /var/log/messages
Apr 26 16:22:19 dhcp-8-107 kernel: INFO: task kworker/1:1:45 blocked for more than 120 seconds.
Apr 26 16:22:19 dhcp-8-107 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.

Comment 6 Paolo Bonzini 2017-06-14 13:04:04 UTC
Junyi, I'm looking at this.

Comment 10 Paolo Bonzini 2017-06-15 16:53:29 UTC
I cannot reproduce this (Fedora 21 guest, but that shouldn't matter).

$ x86_64-softmmu/qemu-system-x86_64 \
   -drive id=hd,if=none,file=$HOME/jeos-21-64.qcow2 -object iothread,id=io \
   -device virtio-blk-pci,drive=hd,iothread=io -m 512 -qmp stdio \
   -drive id=drive-virtio-blk1,if=none,cache=none,file=source.qcow2,aio=native,werror=stop,rerror=stop \
   -object iothread,id=iothread1 \
   -device virtio-blk-pci,iothread=iothread1,drive=drive-virtio-blk1,id=virtio-blk1,serial=QEMU-DISK2
<< {"QMP": {"version": {"qemu": {"micro": 0, "minor": 9, "major": 2}, "package": " (v2.9.0-140-g8bd5355efe)"}, "capabilities": []}}
{"execute": "qmp_capabilities"}
<< {"return": {}}
{ "execute": "drive-mirror", "arguments": { "device": "drive-virtio-blk1", "target": "target.qcow2", "format": "qcow2", "mode": "absolute-paths", "sync": "full" } }
<< Formatting 'target.qcow2', fmt=qcow2 size=10737418240 encryption=off cluster_size=65536 lazy_refcounts=off refcount_bits=16
<< {"return": {}}
<< {"timestamp": {"seconds": 1497545152, "microseconds": 910084}, "event": "BLOCK_JOB_READY", "data": {"device": "drive-virtio-blk1", "len": 0, "offset": 0, "speed": 0, "type": "mirror"}}
{ "execute": "block-job-complete", "arguments": { "device": "drive-virtio-blk1"} }
<< {"return": {}}
<< {"timestamp": {"seconds": 1497545162, "microseconds": 784525}, "event": "BLOCK_JOB_COMPLETED", "data": {"device": "drive-virtio-blk1", "len": 0, "offset": 0, "speed": 0, "type": "mirror"}}

(fio runs well now)

{"timestamp": {"seconds": 1497545223, "microseconds": 486175}, "event": "SHUTDOWN"}

Comment 11 Paolo Bonzini 2017-06-15 16:53:57 UTC
Please retest with qemu-kvm-rhev-2.9.0-8.el7.

Comment 12 Ademar Reis 2017-06-15 17:28:59 UTC
(In reply to Paolo Bonzini from comment #11)
> Please retest with qemu-kvm-rhev-2.9.0-8.el7.

Setting needinfo.

Comment 13 Qianqian Zhu 2017-06-16 05:30:34 UTC
Retest with qemu-kvm-rhev-2.9.0-10.el7.x86_64, the issue has gone, it should be fixed in latest version.

Comment 14 Chao Yang 2017-06-16 07:19:30 UTC
(In reply to Qianqian Zhu from comment #13)
> Retest with qemu-kvm-rhev-2.9.0-10.el7.x86_64, the issue has gone, it should
> be fixed in latest version.

Hi Paolo, 

Now that it is not reproducible with latest build, I think we can close it now, any other concerns?

Comment 15 Paolo Bonzini 2017-06-16 12:51:43 UTC
Yes, please move it to VERIFIED.

Comment 16 Ademar Reis 2017-06-16 20:53:10 UTC
Given this was found during development, there's no need to add it to add it to the errata. I'm closing it as CURRENTRELEASE.


Note You need to log in before you can comment on or make changes to this bug.