Red Hat Bugzilla – Bug 1492559
virtio-blk multiwrite merge causes too big IO
Last modified: 2018-04-10 10:35:07 EDT
Description of problem:
Running mkfs.ext3/ext4 on an iSCSI disk returns a warning in the VM.

Version-Release number of selected component (if applicable):
qemu-kvm-common-1.5.3-141.el7_4.2.x86_64
qemu-kvm-1.5.3-141.el7_4.2.x86_64
qemu-img-1.5.3-141.el7_4.2.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Prepare an iSCSI LUN (see attachment pre_iscsi.sh).
2. Start a VM with an extra disk configured as follows:

<disk type='network' device='disk'>
  <driver name='qemu' type='raw'/>
  <source protocol='iscsi' name='iqn.2016-03.com.virttest:logical-pool.target/0'>
    <host name='127.0.0.1' port='3260'/>
  </source>
  <target dev='vdb' bus='virtio'/>
  <alias name='virtio-disk1'/>
  <address type='pci' domain='0x0000' bus='0x00' slot='0x09' function='0x0'/>
</disk>

The VM's qemu process will be:

qemu 21943 1 1 15:08 ? 00:00:11 /usr/libexec/qemu-kvm -name avocado-vt-vm1 -S -machine pc-i440fx-rhel7.0.0,accel=kvm,usb=off,dump-guest-core=off -m 1024 -realtime mlock=off -smp 2,sockets=2,cores=1,threads=1 -uuid 19b984c1-07ec-43b6-a253-a2b3e23ad476 -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/domain-6-avocado-vt-vm1/monitor.sock,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc,driftfix=slew -global kvm-pit.lost_tick_policy=delay -no-hpet -no-shutdown -global PIIX4_PM.disable_s3=1 -global PIIX4_PM.disable_s4=1 -boot strict=on -device ich9-usb-ehci1,id=usb,bus=pci.0,addr=0x5.0x7 -device ich9-usb-uhci1,masterbus=usb.0,firstport=0,bus=pci.0,multifunction=on,addr=0x5 -device ich9-usb-uhci2,masterbus=usb.0,firstport=2,bus=pci.0,addr=0x5.0x1 -device ich9-usb-uhci3,masterbus=usb.0,firstport=4,bus=pci.0,addr=0x5.0x2 -device virtio-scsi-pci,id=scsi0,bus=pci.0,addr=0x8 -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x4 -drive file=/var/lib/libvirt/images/RHEL-7.4-x86_64-latest.qcow2,format=qcow2,if=none,id=drive-virtio-disk0 -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x6,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -drive file=iscsi://127.0.0.1:3260/iqn.2016-03.com.virttest%3Alogical-pool.target/0,format=raw,if=none,id=drive-virtio-disk1 -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x9,drive=drive-virtio-disk1,id=virtio-disk1 -netdev tap,fd=29,id=hostnet0 -device rtl8139,netdev=hostnet0,id=net0,mac=52:54:00:ac:95:18,bus=pci.0,addr=0x3 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -chardev socket,id=charchannel0,path=/var/lib/libvirt/qemu/channel/target/domain-6-avocado-vt-vm1/org.qemu.guest_agent.0,server,nowait -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=org.qemu.guest_agent.0 -device usb-tablet,id=input0,bus=usb.0,port=1 -vnc 127.0.0.1:0 -vga cirrus -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x7 -msg timestamp=on

3. Log in to the VM.
run "mkfs.ext4 /dev/vdb" in vm [root@localhost ~]# mkfs.ext4 /dev/vdb mke2fs 1.42.9 (28-Dec-2013) Filesystem label= OS type: Linux Block size=4096 (log=2) Fragment size=4096 (log=2) Stride=0 blocks, Stripe width=0 blocks 64000 inodes, 256000 blocks 12800 blocks (5.00%) reserved for the super user First data block=0 Maximum filesystem blocks=262144000 8 block groups 32768 blocks per group, 32768 fragments per group 8000 inodes per group Superblock backups stored on blocks: 32768, 98304, 163840, 229376 Allocating group tables: done Writing inode tables: done Creating journal (4096 blocks): done Writing superblocks and filesystem accounting information: [ 295.964108] blk_update_request: I/O error, dev vdb, sector 548480 [ 295.966615] Buffer I/O error on dev vdb, logical block 68560, lost async page write [ 295.967283] Buffer I/O error on dev vdb, logical block 68561, lost async page write [ 295.967952] Buffer I/O error on dev vdb, logical block 68562, lost async page write [ 295.968568] Buffer I/O error on dev vdb, logical block 68563, lost async page write [ 295.969222] Buffer I/O error on dev vdb, logical block 68564, lost async page write [ 295.969870] Buffer I/O error on dev vdb, logical block 68565, lost async page write [ 295.970486] Buffer I/O error on dev vdb, logical block 68566, lost async page write [ 295.971176] Buffer I/O error on dev vdb, logical block 68567, lost async page write [ 295.971831] Buffer I/O error on dev vdb, logical block 68568, lost async page write [ 295.972455] Buffer I/O error on dev vdb, logical block 68569, lost async page write [ 295.973151] blk_update_request: I/O error, dev vdb, sector 549488 [ 295.973726] blk_update_request: I/O error, dev vdb, sector 550496 [ 295.974340] blk_update_request: I/O error, dev vdb, sector 551504 [ 295.974955] blk_update_request: I/O error, dev vdb, sector 552512 [ 295.975531] blk_update_request: I/O error, dev vdb, sector 553520 [ 295.976179] blk_update_request: I/O error, dev vdb, sector 554528 [ 295.976855] blk_update_request: I/O error, dev vdb, sector 555536 [ 295.977446] blk_update_request: I/O error, dev vdb, sector 556544 [ 295.978863] blk_update_request: I/O error, dev vdb, sector 786432 Warning, had trouble writing out superblocks. [root@localhost ~]# echo $? 5 ** sometimes there is only "Warning, had trouble writing out superblocks." but no "... lost async page write" or "blk_update_request: I/O error, dev vdb ..." Actual results: mkfs returns error Expected results: mkfs should work Additional info: This is not reproduced if I reinstall qemu with pkgs: qemu-img-rhev-2.9.0-16.el7_4.8.x86_64 qemu-kvm-common-rhev-2.9.0-16.el7_4.8.x86_64 qemu-kvm-rhev-2.9.0-16.el7_4.8.x86_64
By the way, the guest is installed with the latest OS tree: RHEL-7.4-20170711.0-x86_64.
Created attachment 1327658 [details] pre_iscsi.sh
Did this work with previous guest versions? E.g. RHEL 7.3?
With RHEL-7.3-20161019.0-x86_64 there are still some I/O errors, but mkfs's exit code is a normal 0.

[root@localhost ~]# uname -r
3.10.0-514.el7.x86_64
[root@localhost ~]# mkfs.ext4 /dev/vdb
mke2fs 1.42.9 (28-Dec-2013)
Filesystem label=
OS type: Linux
Block size=4096 (log=2)
Fragment size=4096 (log=2)
Stride=0 blocks, Stripe width=0 blocks
64000 inodes, 256000 blocks
12800 blocks (5.00%) reserved for the super user
First data block=0
Maximum filesystem blocks=262144000
8 block groups
32768 blocks per group, 32768 fragments per group
8000 inodes per group
Superblock backups stored on blocks:
	32768, 98304, 163840, 229376

Allocating group tables: done
Writing inode tables: done
Creating journal (4096 blocks): done
Writing superblocks and filesystem accounting information:
[  192.808837] blk_update_request: I/O error, dev vdb, sector 548480
[  192.810756] Buffer I/O error on dev vdb, logical block 68560, lost async page write
[  192.811284] Buffer I/O error on dev vdb, logical block 68561, lost async page write
[  192.811815] Buffer I/O error on dev vdb, logical block 68562, lost async page write
[  192.812599] Buffer I/O error on dev vdb, logical block 68563, lost async page write
[  192.813366] Buffer I/O error on dev vdb, logical block 68564, lost async page write
[  192.814127] Buffer I/O error on dev vdb, logical block 68565, lost async page write
[  192.814906] Buffer I/O error on dev vdb, logical block 68566, lost async page write
[  192.815623] Buffer I/O error on dev vdb, logical block 68567, lost async page write
[  192.816372] Buffer I/O error on dev vdb, logical block 68568, lost async page write
[  192.817083] Buffer I/O error on dev vdb, logical block 68569, lost async page write
[  192.817830] blk_update_request: I/O error, dev vdb, sector 549488
[  192.818453] blk_update_request: I/O error, dev vdb, sector 550496
[  192.819131] blk_update_request: I/O error, dev vdb, sector 551504
[  192.819810] blk_update_request: I/O error, dev vdb, sector 552512
[  192.820474] blk_update_request: I/O error, dev vdb, sector 553520
[  192.821145] blk_update_request: I/O error, dev vdb, sector 554528
[  192.821804] blk_update_request: I/O error, dev vdb, sector 555536
[  192.823135] blk_update_request: I/O error, dev vdb, sector 556544
[  192.823754] blk_update_request: I/O error, dev vdb, sector 786432
done
[root@localhost ~]# echo $?
0
I cannot reproduce that. Can you collect "strace -f -p $PID_OF_QEMU_KVM" on the host (with $PID_OF_QEMU_KVM replaced by the actual pid number) while you run the mkfs command in the guest?
Created attachment 1328383 [details] strace.log
(In reply to Fam Zheng from comment #6)
> I cannot reproduce that. Can you collect "strace -f -p $PID_OF_QEMU_KVM" on
> the host (with $PID_OF_QEMU_KVM replaced by the actual pid number) while
> you run the mkfs command in the guest?

The strace output is attached as https://bugzilla.redhat.com/attachment.cgi?id=1328383. It seems to have a lot of redundant info; please check whether it's useful.
Sorry, I was confused; strace won't help because it doesn't reflect the iscsi driver's activity. You are not using the host kernel iSCSI initiator, even though you log in with it in pre_iscsi.sh:

iscsiadm --mode node --login --targetname iqn.2016-03.com.virttest:logical-pool.target --portal 127.0.0.1

Two more questions:
1. Are there any error messages from the host kernel?
2. Can you rule out disk failure? (Is this reproducible on a different host?)

If yes, it may be easier if you can let me access the test machine.
Thanks, that helps! I now fully understand the problem and will work on a fix.

Summary: in qemu-kvm we do a relatively old version of "write request merging" in virtio-blk, which can generate an erroneously large request to the iSCSI target when the guest writes out the new filesystem metadata, resulting in EIO. Upstream QEMU (2.10) and qemu-kvm-rhev have newer request merging logic that doesn't have this problem, because the merging limit is correctly taken into account.
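For illustration, here is a minimal C sketch of the merge decision. This is a simplified model of the concept only, not QEMU's actual multiwrite code, and the names Req and can_merge are made up for this sketch: the old logic merges any contiguous writes unconditionally, while the fixed logic additionally refuses to grow a merged request past the backend's maximum transfer length.

/* merge_sketch.c - simplified sketch of the virtio-blk write-merge
 * decision; illustration only, not QEMU's actual code. */
#include <stdio.h>

typedef struct Req {
    long sector;       /* first sector of the request      */
    long nb_sectors;   /* length of the request in sectors */
} Req;

/* Return 1 if 'next' may be merged into 'cur'.  max_transfer is the
 * backend's maximum transfer length in sectors (0 = unlimited).  The
 * old code merged whenever the requests were contiguous; the fix adds
 * the size check so the merged request cannot exceed what the iSCSI
 * target accepts. */
static int can_merge(const Req *cur, const Req *next, long max_transfer)
{
    if (next->sector > cur->sector + cur->nb_sectors) {
        return 0;  /* not contiguous: never merged */
    }
    if (max_transfer && cur->nb_sectors + next->nb_sectors > max_transfer) {
        return 0;  /* the limit check the old qemu-kvm code was missing */
    }
    return 1;
}

int main(void)
{
    Req a = { 548480, 512 };   /* hypothetical in-flight merged request */
    Req b = { 548992, 512 };   /* contiguous follow-up write            */

    printf("no limit enforced: merge=%d\n", can_merge(&a, &b, 0));    /* 1 */
    printf("1008-sector limit: merge=%d\n", can_merge(&a, &b, 1008)); /* 0 */
    return 0;
}

With no limit the two contiguous writes are merged without bound; once a cap (here the 1008-sector size observed in the failure log, purely as an example) is enforced, the merge is refused and the writes are submitted separately.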
http://post-office.corp.redhat.com/archives/rhvirt-patches/2017-September/msg00531.html
Fix included in qemu-kvm-1.5.3-143.el7
Verification: Bug is fixed in qemu-kvm-1.5.3-143.el7.

Steps:

1. Verify the qemu-kvm version:

# rpm -qa | grep qemu-kvm
qemu-kvm-common-1.5.3-143.el7.x86_64
qemu-kvm-tools-1.5.3-143.el7.x86_64
qemu-kvm-debuginfo-1.5.3-143.el7.x86_64
qemu-kvm-1.5.3-143.el7.x86_64

2. Boot up a VM with an iSCSI drive as the second disk:

/usr/libexec/qemu-kvm \
 -name test \
 -machine pc-i440fx-rhel7.0.0,accel=kvm,usb=off,dump-guest-core=off \
 -m 4G \
 -smp 2,sockets=2,cores=1,threads=1 \
 -boot strict=on \
 -drive file=/home/test/bugs/1492559/test.qcow2,format=qcow2,if=none,id=img0 \
 -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x6,drive=img0,id=disk0,bootindex=1 \
 -drive file=iscsi://127.0.0.1:3260/iqn.2017-09.com.versace:logical-pool.target/0,format=raw,if=none,id=img1 \
 -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x9,drive=img1,id=disk1 \
 -netdev tap,vhost=on,id=hostnet0 \
 -device rtl8139,netdev=hostnet0,id=net0,mac=52:54:00:ac:95:18,bus=pci.0,addr=0x3 \
 -vnc 127.0.0.1:1 \
 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x7 \
 -monitor stdio

3. Check the disk info in the guest:

# lsblk
NAME                              MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
fd0                                 2:0    1    4K  0 disk
sr0                                11:0    1 1024M  0 rom
vda                               252:0    0   20G  0 disk
├─vda1                            252:1    0    1G  0 part /boot
└─vda2                            252:2    0   19G  0 part
  ├─rhel_bootp--73--199--239-root 253:0    0   17G  0 lvm  /
  └─rhel_bootp--73--199--239-swap 253:1    0    2G  0 lvm  [SWAP]
vdb                               252:16   0  9.8G  0 disk

4. Format vdb:

# mkfs.ext3 /dev/vdb
mke2fs 1.42.9 (28-Dec-2013)
Filesystem label=
OS type: Linux
Block size=4096 (log=2)
Fragment size=4096 (log=2)
Stride=0 blocks, Stripe width=0 blocks
640848 inodes, 2560000 blocks
128000 blocks (5.00%) reserved for the super user
First data block=0
Maximum filesystem blocks=2621440000
79 block groups
32768 blocks per group, 32768 fragments per group
8112 inodes per group
Superblock backups stored on blocks:
	32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632

Allocating group tables: done
Writing inode tables: done
Creating journal (32768 blocks): done
Writing superblocks and filesystem accounting information: done

# echo $?
0

# mkfs.ext4 /dev/vdb
mke2fs 1.42.9 (28-Dec-2013)
Filesystem label=
OS type: Linux
Block size=4096 (log=2)
Fragment size=4096 (log=2)
Stride=0 blocks, Stripe width=0 blocks
640848 inodes, 2560000 blocks
128000 blocks (5.00%) reserved for the super user
First data block=0
Maximum filesystem blocks=2151677952
79 block groups
32768 blocks per group, 32768 fragments per group
8112 inodes per group
Superblock backups stored on blocks:
	32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632

Allocating group tables: done
Writing inode tables: done
Creating journal (32768 blocks): done
Writing superblocks and filesystem accounting information: done

# echo $?
0
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2018:0816