1420722 – Block mirror does not work with dataplane, the mirrored disk can not be operated.

RHEL Engineering is moving the tracking of its product development work on RHEL 6 through RHEL 9 to Red Hat Jira (issues.redhat.com). If you're a Red Hat customer, please continue to file support cases via the Red Hat customer portal. If you're not, please head to the "RHEL project" in Red Hat Jira and file new tickets here. Individual Bugzilla bugs in the statuses "NEW", "ASSIGNED", and "POST" are being migrated throughout September 2023. Bugs of Red Hat partners with an assigned Engineering Partner Manager (EPM) are migrated in late September as per pre-agreed dates. Bugs against components "kernel", "kernel-rt", and "kpatch" are only migrated if still in "NEW" or "ASSIGNED". If you cannot log in to RH Jira, please consult article #7032570. That failing, please send an e-mail to the RH Jira admins at rh-issues@redhat.com to troubleshoot your issue as a user management inquiry. The email creates a ServiceNow ticket with Red Hat. Individual Bugzilla bugs that are migrated will be moved to status "CLOSED", resolution "MIGRATED", and set with "MigratedToJIRA" in "Keywords". The link to the successor Jira issue will be found under "Links", have a little "two-footprint" icon next to it, and direct you to the "RHEL project" in Red Hat Jira (issue links are of type "https://issues.redhat.com/browse/RHEL-XXXX", where "X" is a digit). This same link will be available in a blue banner at the top of the page informing you that that bug has been migrated.

Bug 1420722 - Block mirror does not work with dataplane, the mirrored disk can not be operated.

Summary: Block mirror does not work with dataplane, the mirrored disk can not be opera...

Keywords:
Status:	CLOSED CURRENTRELEASE
Alias:	None
Product:	Red Hat Enterprise Linux 7
Classification:	Red Hat
Component:	qemu-kvm-rhev
Sub Component:
Version:	7.4
Hardware:	All
OS:	Linux
Priority:	unspecified
Severity:	unspecified
Target Milestone:	rc
Target Release:	---
Assignee:	Paolo Bonzini
QA Contact:	Qianqian Zhu
Docs Contact:
URL:
Whiteboard:
Depends On:	1415947
Blocks:
TreeView+	depends on / blocked

Reported:	2017-02-09 11:56 UTC by Yongxue Hong
Modified:	2017-07-12 06:01 UTC (History)
CC List:	13 users (show)
Fixed In Version:	qemu-kvm-rhev-2.9.0-10.el7
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:	2017-06-16 20:53:10 UTC
Target Upstream Version:
Embargoed:

Attachments	(Terms of Use)

Description Yongxue Hong 2017-02-09 11:56:55 UTC

Description of problem:
Block mirror does not work with dataplane, the mirrored disk can not be operated since blocked for more than 120 seconds.
the file /var/log/messages indicates "NFO: task fdisk:2874 blocked for more than 120 seconds....kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message."

Version-Release number of selected component (if applicable):
Host kernel: kernel-3.10.0-558.el7.ppc64le
qemu: qemu-kvm-rhev-2.8.0-3.el7
slof: SLOF-20160223-6.gitdbbfda4.el7
Guest Kernel:kernel-3.10.0-558.el7.ppc64

How reproducible:
100%

Steps to Reproduce:
1.boot a guest with dataplane 
[root@ibm-p8-garrison-06 command]# cat qemu-kvm-rhel7-virtio-blk-9343-3
/usr/libexec/qemu-kvm \
-name RHEL7-virtio-blk-9343-3 \
-M pseries-rhel7.4.0 \
-m 8G \
-nodefaults \
-smp 2,sockets=1,cores=1,threads=2 \
-boot menu=on,order=cd \
-device VGA,id=vga0,addr=0 \
-device nec-usb-xhci,id=xhci \
-device usb-tablet,id=usb-tablet0 \
-device usb-kbd,id=usb-kbd0 \
-device virtio-scsi-pci,id=virtio_scsi_pci0 \
-drive file=/home/hyx/os/RHEL-7.3-20161019.0-Server-ppc64-dvd1.iso,if=none,media=cdrom,id=image0 \
-device scsi-cd,id=scsi-cd0,drive=image0,channel=0,scsi-id=0,lun=0,bootindex=1 \
-object iothread,id=iothread0 \
-object iothread,id=iothread1 \
-drive file=/home/hyx/image/RHEL7-9343-20G.qcow2,format=qcow2,if=none,cache=none,aio=native,id=drive-virtio-blk0,werror=stop,rerror=stop \
-device virtio-blk-pci,drive=drive-virtio-blk0,id=virtio-blk0,iothread=iothread0,bus=pci.0,addr=0x15,bootindex=0 \
-drive file=/home/hyx/image/RHEL7-9343-25G.qcow2,format=qcow2,if=none,cache=none,aio=native,id=drive-virtio-blk1,werror=stop,rerror=stop \
-device virtio-blk-pci,drive=drive-virtio-blk1,id=virtio-blk1,iothread=iothread1,bus=pci.0,addr=0x14,serial="QEMU-DISK2" \
-netdev tap,id=hostnet0,script=/etc/qemu-ifup \
-device virtio-net-pci,netdev=hostnet0,id=virtio-net-pci0,mac=70:e2:84:14:0e:15 \
-monitor stdio \
-serial unix:./sock4,server,nowait \
-qmp tcp:0:3000,server,nowait \
-vnc :1

2.mirror and do block-job-complete with qmp
[root@ibm-p8-garrison-06 command]# telnet 0 3000
Trying 0.0.0.0...
Connected to 0.
Escape character is '^]'.
{"QMP": {"version": {"qemu": {"micro": 0, "minor": 8, "major": 2}, "package": "(qemu-kvm-rhev-2.8.0-3.el7)"}, "capabilities": []}}
{"execute": "qmp_capabilities"}
{"return": {}}
{ "execute": "drive-mirror", "arguments": { "device": "drive-virtio-blk1", "target": "/home/hyx/image/drive-mirror-9343.qcow2", "format": "qcow2", "mode": "absolute-paths", "sync": "full" } }
{"timestamp": {"seconds": 1486633339, "microseconds": 359360}, "event": "BLOCK_JOB_READY", "data": {"device": "drive-virtio-blk1", "len": 0, "offset": 0, "speed": 0, "type": "mirror"}}
{"return": {}}
{ "execute": "block-job-complete", "arguments": { "device": "drive-virtio-blk1"} }
{"return": {}}
{"timestamp": {"seconds": 1486633364, "microseconds": 112021}, "event": "BLOCK_JOB_COMPLETED", "data": {"device": "drive-virtio-blk1", "len": 0, "offset": 0, "speed": 0, "type": "mirror"}}

3.login guest and run fio
[root@dhcp113-212 ~]# fio --filename=/dev/vdb --direct=1 --rw=write --bs=64k --size=200M --name=test --iodepth=1 --ioengine=libaio
test: (g=0): rw=write, bs=64K-64K/64K-64K/64K-64K, ioengine=libaio, iodepth=1

Actual results:
the fio has no any response and cat /var/log/messages show "INFO: task fdisk:2874 blocked for more than 120 seconds..."

Expected results:
the fio should be run successfully.

Additional info:
the part of /var/log/messages:
Feb  9 04:54:53 dhcp113-212 kernel: INFO: task fdisk:2874 blocked for more than 120 seconds.
Feb  9 04:54:53 dhcp113-212 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Feb  9 04:54:53 dhcp113-212 kernel: fdisk           D 00003fff788f673c     0  2874      1 0x00002082
Feb  9 04:54:53 dhcp113-212 kernel: Call Trace:
Feb  9 04:54:53 dhcp113-212 kernel: [c0000001f959ae10] [c0000001f959aeb0] 0xc0000001f959aeb0 (unreliable)
Feb  9 04:54:53 dhcp113-212 kernel: [c0000001f959afe0] [c000000000016194] .__switch_to+0x254/0x460
Feb  9 04:54:53 dhcp113-212 kernel: [c0000001f959b090] [c0000000009845c8] .__schedule+0x418/0xad0
Feb  9 04:54:53 dhcp113-212 kernel: [c0000001f959b1c0] [c0000000009808b8] .schedule_timeout+0x398/0x460
Feb  9 04:54:53 dhcp113-212 kernel: [c0000001f959b2d0] [c000000000984104] .io_schedule+0xc4/0x170
Feb  9 04:54:53 dhcp113-212 kernel: [c0000001f959b360] [c000000000980ad8] .bit_wait_io+0x18/0x70
Feb  9 04:54:53 dhcp113-212 kernel: [c0000001f959b3e0] [c0000000009810e4] .__wait_on_bit_lock+0x124/0x2e0
Feb  9 04:54:53 dhcp113-212 kernel: [c0000001f959b4a0] [c00000000024dba0] .__lock_page+0xc0/0xe0
Feb  9 04:54:53 dhcp113-212 kernel: [c0000001f959b550] [c00000000026b39c] .truncate_inode_pages_range+0x94c/0x960
Feb  9 04:54:53 dhcp113-212 kernel: [c0000001f959b740] [c000000000381368] .__blkdev_put+0xd8/0x260
Feb  9 04:54:53 dhcp113-212 kernel: [c0000001f959b7f0] [c000000000381a44] .blkdev_close+0x74/0x1c0
Feb  9 04:54:53 dhcp113-212 kernel: [c0000001f959b890] [c00000000031c0a0] .____fput+0xd0/0x2e0
Feb  9 04:54:53 dhcp113-212 kernel: [c0000001f959b940] [c000000000110254] .task_work_run+0x114/0x150
Feb  9 04:54:53 dhcp113-212 kernel: [c0000001f959b9e0] [c0000000000de234] .do_exit+0x364/0xba0
Feb  9 04:54:53 dhcp113-212 kernel: [c0000001f959bae0] [c0000000000deb14] .do_group_exit+0x54/0xf0
Feb  9 04:54:53 dhcp113-212 kernel: [c0000001f959bb70] [c0000000000f74e0] .get_signal_to_deliver+0x210/0x9c0
Feb  9 04:54:53 dhcp113-212 kernel: [c0000001f959bc70] [c000000000017fc4] .do_signal+0x54/0x320
Feb  9 04:54:53 dhcp113-212 kernel: [c0000001f959bdb0] [c0000000000183ec] .do_notify_resume+0x8c/0x100
Feb  9 04:54:53 dhcp113-212 kernel: [c0000001f959be30] [c00000000000a730] .ret_from_except_lite+0x5c/0x60

and the problem also exist in x86.

Comment 4 Qianqian Zhu 2017-04-26 08:31:16 UTC

Test on qemu-kvm-rhev-2.9.0-1.el7.x86_64, if mirror system disk with data-plane, guest will hang after reopen with mirror image; if it's second disk, guest looks well, but fio will hang.

So reopen this issue. 

Steps:
1. Launch guest with data-plane:
/usr/libexec/qemu-kvm -name rhel7_4-9343 -m 1G -smp 2 -object iothread,id=iothread0 -drive file=/home/kvm_autotest_root/images/rhel74-64-virtio.qcow2,format=qcow2,if=none,cache=none,media=disk,werror=stop,rerror=stop,id=drive-0 -device virtio-blk-pci,drive=drive-0,id=virtio-blk-0,iothread=iothread0,bootindex=0 -drive file=/home/sn1,format=qcow2,if=none,cache=none,aio=native,id=drive-virtio-blk1,werror=stop,rerror=stop -device virtio-blk-pci,drive=drive-virtio-blk1,id=virtio-blk1,iothread=iothread1,bus=pci.0,addr=0x14,serial="QEMU-DISK2" -object iothread,id=iothread1  -monitor stdio -qmp tcp:0:5555,server,nowait -vnc :3 -netdev tap,id=hostnet0,script=/etc/qemu-ifup -device virtio-net-pci,netdev=hostnet0,id=virtio-net-pci0,mac=70:e2:84:14:0e:15

2. Block mirror:
{ "execute": "drive-mirror", "arguments": { "device": "drive-virtio-blk1", "target": "/home/mirror2.qcow2", "format": "qcow2", "mode": "absolute-paths", "sync": "full" } }
{ "execute": "block-job-complete", "arguments": { "device": "drive-virtio-blk1"} }

3. Run fio inside guest:
fio --filename=/dev/vdb --direct=1 --rw=write --bs=64k --size=200M --name=test --iodepth=1 --ioengine=libaio

Result:
# cat /var/log/messages
Apr 26 16:22:19 dhcp-8-107 kernel: INFO: task kworker/1:1:45 blocked for more than 120 seconds.
Apr 26 16:22:19 dhcp-8-107 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.

Comment 6 Paolo Bonzini 2017-06-14 13:04:04 UTC

Junyi, I'm looking at this.

Comment 10 Paolo Bonzini 2017-06-15 16:53:29 UTC

I cannot reproduce this (Fedora 21 guest, but that shouldn't matter).

$ x86_64-softmmu/qemu-system-x86_64 \
   -drive id=hd,if=none,file=$HOME/jeos-21-64.qcow2 -object iothread,id=io \
   -device virtio-blk-pci,drive=hd,iothread=io -m 512 -qmp stdio \
   -drive id=drive-virtio-blk1,if=none,cache=none,file=source.qcow2,aio=native,werror=stop,rerror=stop \
   -object iothread,id=iothread1 \
   -device virtio-blk-pci,iothread=iothread1,drive=drive-virtio-blk1,id=virtio-blk1,serial=QEMU-DISK2
<< {"QMP": {"version": {"qemu": {"micro": 0, "minor": 9, "major": 2}, "package": " (v2.9.0-140-g8bd5355efe)"}, "capabilities": []}}
{"execute": "qmp_capabilities"}
<< {"return": {}}
{ "execute": "drive-mirror", "arguments": { "device": "drive-virtio-blk1", "target": "target.qcow2", "format": "qcow2", "mode": "absolute-paths", "sync": "full" } }
<< Formatting 'target.qcow2', fmt=qcow2 size=10737418240 encryption=off cluster_size=65536 lazy_refcounts=off refcount_bits=16
<< {"return": {}}
<< {"timestamp": {"seconds": 1497545152, "microseconds": 910084}, "event": "BLOCK_JOB_READY", "data": {"device": "drive-virtio-blk1", "len": 0, "offset": 0, "speed": 0, "type": "mirror"}}
{ "execute": "block-job-complete", "arguments": { "device": "drive-virtio-blk1"} }
<< {"return": {}}
<< {"timestamp": {"seconds": 1497545162, "microseconds": 784525}, "event": "BLOCK_JOB_COMPLETED", "data": {"device": "drive-virtio-blk1", "len": 0, "offset": 0, "speed": 0, "type": "mirror"}}

(fio runs well now)

{"timestamp": {"seconds": 1497545223, "microseconds": 486175}, "event": "SHUTDOWN"}

Comment 11 Paolo Bonzini 2017-06-15 16:53:57 UTC

Please retest with qemu-kvm-rhev-2.9.0-8.el7.

Comment 12 Ademar Reis 2017-06-15 17:28:59 UTC

(In reply to Paolo Bonzini from comment #11)
> Please retest with qemu-kvm-rhev-2.9.0-8.el7.

Setting needinfo.

Comment 13 Qianqian Zhu 2017-06-16 05:30:34 UTC

Retest with qemu-kvm-rhev-2.9.0-10.el7.x86_64, the issue has gone, it should be fixed in latest version.

Comment 14 Chao Yang 2017-06-16 07:19:30 UTC

(In reply to Qianqian Zhu from comment #13)
> Retest with qemu-kvm-rhev-2.9.0-10.el7.x86_64, the issue has gone, it should
> be fixed in latest version.

Hi Paolo, 

Now that it is not reproducible with latest build, I think we can close it now, any other concerns?

Comment 15 Paolo Bonzini 2017-06-16 12:51:43 UTC

Yes, please move it to VERIFIED.

Comment 16 Ademar Reis 2017-06-16 20:53:10 UTC

Given this was found during development, there's no need to add it to add it to the errata. I'm closing it as CURRENTRELEASE.

Note You need to log in before you can comment on or make changes to this bug.