Bug 1492559

Summary: virtio-blk mutiwrite merge causes too big IO
Product: Red Hat Enterprise Linux 7 Reporter: yisun
Component: qemu-kvmAssignee: Fam Zheng <famz>
Status: CLOSED ERRATA QA Contact: Longxiang Lyu <lolyu>
Severity: medium Docs Contact:
Priority: medium    
Version: 7.4CC: aliang, chayang, chhu, coli, hhan, juzhang, knoel, lmen, meili, michen, qzhang, rbalakri, virt-maint, xuzhang, yisun
Target Milestone: rcKeywords: Automation
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: qemu-kvm-1.5.3-143.el7 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2018-04-10 14:35:07 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
pre_iscsi.sh
none
strace.log none

Description yisun 2017-09-18 07:23:05 UTC
Description of problem:
mkfs.ext3/ext4 a iscsi disk will return a warning in vm

Version-Release number of selected component (if applicable):
qemu-kvm-common-1.5.3-141.el7_4.2.x86_64
qemu-kvm-1.5.3-141.el7_4.2.x86_64
qemu-img-1.5.3-141.el7_4.2.x86_64

How reproducible:
100%

Steps to Reproduce:
1.prepare a iscsi lun (can refer to attachment - pre_iscsi.sh
2.start a vm with extra disk as follow:
    <disk type='network' device='disk'>
      <driver name='qemu' type='raw'/>
      <source protocol='iscsi' name='iqn.2016-03.com.virttest:logical-pool.target/0'>
        <host name='127.0.0.1' port='3260'/>
      </source>
      <target dev='vdb' bus='virtio'/>
      <alias name='virtio-disk1'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x09' function='0x0'/>
    </disk>

The vm's qemu process will be as:
qemu     21943     1  1 15:08 ?        00:00:11 /usr/libexec/qemu-kvm -name avocado-vt-vm1 -S -machine pc-i440fx-rhel7.0.0,accel=kvm,usb=off,dump-guest-core=off -m 1024 -realtime mlock=off -smp 2,sockets=2,cores=1,threads=1 -uuid 19b984c1-07ec-43b6-a253-a2b3e23ad476 -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/domain-6-avocado-vt-vm1/monitor.sock,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc,driftfix=slew -global kvm-pit.lost_tick_policy=delay -no-hpet -no-shutdown -global PIIX4_PM.disable_s3=1 -global PIIX4_PM.disable_s4=1 -boot strict=on -device ich9-usb-ehci1,id=usb,bus=pci.0,addr=0x5.0x7 -device ich9-usb-uhci1,masterbus=usb.0,firstport=0,bus=pci.0,multifunction=on,addr=0x5 -device ich9-usb-uhci2,masterbus=usb.0,firstport=2,bus=pci.0,addr=0x5.0x1 -device ich9-usb-uhci3,masterbus=usb.0,firstport=4,bus=pci.0,addr=0x5.0x2 -device virtio-scsi-pci,id=scsi0,bus=pci.0,addr=0x8 -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x4 -drive file=/var/lib/libvirt/images/RHEL-7.4-x86_64-latest.qcow2,format=qcow2,if=none,id=drive-virtio-disk0 -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x6,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -drive file=iscsi://127.0.0.1:3260/iqn.2016-03.com.virttest%3Alogical-pool.target/0,format=raw,if=none,id=drive-virtio-disk1 -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x9,drive=drive-virtio-disk1,id=virtio-disk1 -netdev tap,fd=29,id=hostnet0 -device rtl8139,netdev=hostnet0,id=net0,mac=52:54:00:ac:95:18,bus=pci.0,addr=0x3 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -chardev socket,id=charchannel0,path=/var/lib/libvirt/qemu/channel/target/domain-6-avocado-vt-vm1/org.qemu.guest_agent.0,server,nowait -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=org.qemu.guest_agent.0 -device usb-tablet,id=input0,bus=usb.0,port=1 -vnc 127.0.0.1:0 -vga cirrus -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x7 -msg timestamp=on


3. login vm
4. run "mkfs.ext4 /dev/vdb" in vm
[root@localhost ~]# mkfs.ext4 /dev/vdb
mke2fs 1.42.9 (28-Dec-2013)
Filesystem label=
OS type: Linux
Block size=4096 (log=2)
Fragment size=4096 (log=2)
Stride=0 blocks, Stripe width=0 blocks
64000 inodes, 256000 blocks
12800 blocks (5.00%) reserved for the super user
First data block=0
Maximum filesystem blocks=262144000
8 block groups
32768 blocks per group, 32768 fragments per group
8000 inodes per group
Superblock backups stored on blocks: 
	32768, 98304, 163840, 229376

Allocating group tables: done                            
Writing inode tables: done                            
Creating journal (4096 blocks): done
Writing superblocks and filesystem accounting information: [  295.964108] blk_update_request: I/O error, dev vdb, sector 548480
[  295.966615] Buffer I/O error on dev vdb, logical block 68560, lost async page write
[  295.967283] Buffer I/O error on dev vdb, logical block 68561, lost async page write
[  295.967952] Buffer I/O error on dev vdb, logical block 68562, lost async page write
[  295.968568] Buffer I/O error on dev vdb, logical block 68563, lost async page write
[  295.969222] Buffer I/O error on dev vdb, logical block 68564, lost async page write
[  295.969870] Buffer I/O error on dev vdb, logical block 68565, lost async page write
[  295.970486] Buffer I/O error on dev vdb, logical block 68566, lost async page write
[  295.971176] Buffer I/O error on dev vdb, logical block 68567, lost async page write
[  295.971831] Buffer I/O error on dev vdb, logical block 68568, lost async page write
[  295.972455] Buffer I/O error on dev vdb, logical block 68569, lost async page write
[  295.973151] blk_update_request: I/O error, dev vdb, sector 549488
[  295.973726] blk_update_request: I/O error, dev vdb, sector 550496
[  295.974340] blk_update_request: I/O error, dev vdb, sector 551504
[  295.974955] blk_update_request: I/O error, dev vdb, sector 552512
[  295.975531] blk_update_request: I/O error, dev vdb, sector 553520
[  295.976179] blk_update_request: I/O error, dev vdb, sector 554528
[  295.976855] blk_update_request: I/O error, dev vdb, sector 555536
[  295.977446] blk_update_request: I/O error, dev vdb, sector 556544
[  295.978863] blk_update_request: I/O error, dev vdb, sector 786432

Warning, had trouble writing out superblocks.


[root@localhost ~]# echo $?
5

** sometimes there is only "Warning, had trouble writing out superblocks." but no "... lost async page write" or "blk_update_request: I/O error, dev vdb ..."

Actual results:
mkfs returns error

Expected results:
mkfs should work

Additional info:
This is not reproduced if I reinstall qemu with pkgs:
qemu-img-rhev-2.9.0-16.el7_4.8.x86_64
qemu-kvm-common-rhev-2.9.0-16.el7_4.8.x86_64
qemu-kvm-rhev-2.9.0-16.el7_4.8.x86_64

Comment 2 yisun 2017-09-18 07:25:30 UTC
btw, the guest is installed with latest os tree:
RHEL-7.4-20170711.0-x86_64

Comment 3 yisun 2017-09-19 01:01:36 UTC
Created attachment 1327658 [details]
pre_iscsi.sh

Comment 4 Fam Zheng 2017-09-19 02:01:16 UTC
Did this work with previous guest versions? E.g. RHEL 7.3?

Comment 5 yisun 2017-09-19 03:41:42 UTC
With RHEL-7.3-20161019.0-x86_64, there are still some i/o errors, but mkfs's exit code is normal as 0. 

[root@localhost ~]# uname -r
3.10.0-514.el7.x86_64

[root@localhost ~]# mkfs.ext4 /dev/vdb
mke2fs 1.42.9 (28-Dec-2013)
Filesystem label=
OS type: Linux
Block size=4096 (log=2)
Fragment size=4096 (log=2)
Stride=0 blocks, Stripe width=0 blocks
64000 inodes, 256000 blocks
12800 blocks (5.00%) reserved for the super user
First data block=0
Maximum filesystem blocks=262144000
8 block groups
32768 blocks per group, 32768 fragments per group
8000 inodes per group
Superblock backups stored on blocks: 
	32768, 98304, 163840, 229376

Allocating group tables: done                            
Writing inode tables: done                            
Creating journal (4096 blocks): done
Writing superblocks and filesystem accounting information: [  192.808837] blk_update_request: I/O error, dev vdb, sector 548480
[  192.810756] Buffer I/O error on dev vdb, logical block 68560, lost async page write
[  192.811284] Buffer I/O error on dev vdb, logical block 68561, lost async page write
[  192.811815] Buffer I/O error on dev vdb, logical block 68562, lost async page write
[  192.812599] Buffer I/O error on dev vdb, logical block 68563, lost async page write
[  192.813366] Buffer I/O error on dev vdb, logical block 68564, lost async page write
[  192.814127] Buffer I/O error on dev vdb, logical block 68565, lost async page write
[  192.814906] Buffer I/O error on dev vdb, logical block 68566, lost async page write
[  192.815623] Buffer I/O error on dev vdb, logical block 68567, lost async page write
[  192.816372] Buffer I/O error on dev vdb, logical block 68568, lost async page write
[  192.817083] Buffer I/O error on dev vdb, logical block 68569, lost async page write
[  192.817830] blk_update_request: I/O error, dev vdb, sector 549488
[  192.818453] blk_update_request: I/O error, dev vdb, sector 550496
[  192.819131] blk_update_request: I/O error, dev vdb, sector 551504
[  192.819810] blk_update_request: I/O error, dev vdb, sector 552512
[  192.820474] blk_update_request: I/O error, dev vdb, sector 553520
[  192.821145] blk_update_request: I/O error, dev vdb, sector 554528
[  192.821804] blk_update_request: I/O error, dev vdb, sector 555536
[  192.823135] blk_update_request: I/O error, dev vdb, sector 556544
[  192.823754] blk_update_request: I/O error, dev vdb, sector 786432
done

[root@localhost ~]# echo $?
0

Comment 6 Fam Zheng 2017-09-20 06:23:34 UTC
I cannot reproduce that, can you collect "strace -f -p $PID_OF_QEMU_KVM" at host (with $PID_OF_QEMU_KVM replaced by the actual pid number) when you run the mkfs command in guest?

Comment 7 yisun 2017-09-20 11:08:12 UTC
Created attachment 1328383 [details]
strace.log

Comment 8 yisun 2017-09-20 11:11:12 UTC
(In reply to Fam Zheng from comment #6)
> I cannot reproduce that, can you collect "strace -f -p $PID_OF_QEMU_KVM" at
> host (with $PID_OF_QEMU_KVM replaced by the actual pid number) when you run
> the mkfs command in guest?

the strace output attached as https://bugzilla.redhat.com/attachment.cgi?id=1328383
It seems to has a lot of redundant info, pls check if it's useful

Comment 9 Fam Zheng 2017-09-21 01:17:23 UTC
Sorry, I was confused, strace won't help because it doesn't reflect the iscsi driver activities. You are not using the host kernel iscsi initiator although you login in the pre_iscsi.sh:

iscsiadm --mode node --login --targetname iqn.2016-03.com.virttest:logical-pool.target --portal 127.0.0.1

Two more questions:

Are there any error message from the host kernel?

Can you rule out the disk failure? (Reproducible on a different host?) If yes, maybe it's easier if you can let me access the test machine.

Comment 11 Fam Zheng 2017-09-21 13:19:51 UTC
Thanks, that helps!

I've fully understand the problem and I will work on a fix.

Summary: in qemu-kvm, we do a relatively old version of "write request merging" in virtio-blk, which could generate an errneously large request to be sent to the iscsi target when guest writes out the new filesystem metadata, resulting in EIO. Upstream QEMU (2.10) and qemu-kvm-rhev has new request merging logic that doesn't have this problem because the merging limit has been correctly considered.

Comment 15 Wainer dos Santos Moschetta 2017-09-28 21:58:33 UTC
Fix included in qemu-kvm-1.5.3-143.el7

Comment 17 Longxiang Lyu 2017-09-29 04:49:50 UTC
Verification: Bug is fixed in qemu-kvm-1.5.3-143.el7.

Steps:
1. verify qemu-kvm version
# rpm -qa | grep qemu-kvm
qemu-kvm-common-1.5.3-143.el7.x86_64
qemu-kvm-tools-1.5.3-143.el7.x86_64
qemu-kvm-debuginfo-1.5.3-143.el7.x86_64
qemu-kvm-1.5.3-143.el7.x86_64

2. boot up a vm with an iscsi drive as second disk
/usr/libexec/qemu-kvm \
-name test \
-machine pc-i440fx-rhel7.0.0,accel=kvm,usb=off,dump-guest-core=off \
-m 4G \
-smp 2,sockets=2,cores=1,threads=1 \
-boot strict=on \
-drive file=/home/test/bugs/1492559/test.qcow2,format=qcow2,if=none,id=img0 \
-device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x6,drive=img0,id=disk0,bootindex=1 \
-drive file=iscsi://127.0.0.1:3260/iqn.2017-09.com.versace:logical-pool.target/0,format=raw,if=none,id=img1 \
-device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x9,drive=img1,id=disk1 \
-netdev tap,vhost=on,id=hostnet0 \
-device rtl8139,netdev=hostnet0,id=net0,mac=52:54:00:ac:95:18,bus=pci.0,addr=0x3 \
-vnc 127.0.0.1:1 \
-device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x7 \
-monitor stdio \

3. disk info of guest
# lsblk
NAME                              MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
fd0                                 2:0    1    4K  0 disk 
sr0                                11:0    1 1024M  0 rom  
vda                               252:0    0   20G  0 disk 
├─vda1                            252:1    0    1G  0 part /boot
└─vda2                            252:2    0   19G  0 part 
  ├─rhel_bootp--73--199--239-root 253:0    0   17G  0 lvm  /
  └─rhel_bootp--73--199--239-swap 253:1    0    2G  0 lvm  [SWAP]
vdb                               252:16   0  9.8G  0 disk 

4. format vdb
# mkfs.ext3 /dev/vdb 
mke2fs 1.42.9 (28-Dec-2013)
Filesystem label=
OS type: Linux
Block size=4096 (log=2)
Fragment size=4096 (log=2)
Stride=0 blocks, Stripe width=0 blocks
640848 inodes, 2560000 blocks
128000 blocks (5.00%) reserved for the super user
First data block=0
Maximum filesystem blocks=2621440000
79 block groups
32768 blocks per group, 32768 fragments per group
8112 inodes per group
Superblock backups stored on blocks: 
	32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632

Allocating group tables: done                            
Writing inode tables: done                            
Creating journal (32768 blocks): done
Writing superblocks and filesystem accounting information: done 

# echo $?
0

# mkfs.ext4 /dev/vdb
mke2fs 1.42.9 (28-Dec-2013)
Filesystem label=
OS type: Linux
Block size=4096 (log=2)
Fragment size=4096 (log=2)
Stride=0 blocks, Stripe width=0 blocks
640848 inodes, 2560000 blocks
128000 blocks (5.00%) reserved for the super user
First data block=0
Maximum filesystem blocks=2151677952
79 block groups
32768 blocks per group, 32768 fragments per group
8112 inodes per group
Superblock backups stored on blocks: 
	32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632

Allocating group tables: done                            
Writing inode tables: done                            
Creating journal (32768 blocks): done
Writing superblocks and filesystem accounting information: done 

# echo $?
0

Comment 22 errata-xmlrpc 2018-04-10 14:35:07 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2018:0816