Red Hat Bugzilla – Bug 1492559
virtio-blk multiwrite merge causes too big IO
Last modified: 2018-04-10 10:35:07 EDT
Description of problem:
Running mkfs.ext3/ext4 on an iSCSI disk returns a warning in the VM.

Version-Release number of selected component (if applicable):
qemu-kvm-common-1.5.3-141.el7_4.2.x86_64
qemu-kvm-1.5.3-141.el7_4.2.x86_64
qemu-img-1.5.3-141.el7_4.2.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Prepare an iSCSI LUN (see attachment pre_iscsi.sh).
2. Start a VM with an extra disk configured as follows:

<disk type='network' device='disk'>
  <driver name='qemu' type='raw'/>
  <source protocol='iscsi' name='iqn.2016-03.com.virttest:logical-pool.target/0'>
    <host name='127.0.0.1' port='3260'/>
  </source>
  <target dev='vdb' bus='virtio'/>
  <alias name='virtio-disk1'/>
  <address type='pci' domain='0x0000' bus='0x00' slot='0x09' function='0x0'/>
</disk>

The VM's qemu process will be:

qemu 21943 1 1 15:08 ? 00:00:11 /usr/libexec/qemu-kvm -name avocado-vt-vm1 -S -machine pc-i440fx-rhel7.0.0,accel=kvm,usb=off,dump-guest-core=off -m 1024 -realtime mlock=off -smp 2,sockets=2,cores=1,threads=1 -uuid 19b984c1-07ec-43b6-a253-a2b3e23ad476 -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/domain-6-avocado-vt-vm1/monitor.sock,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc,driftfix=slew -global kvm-pit.lost_tick_policy=delay -no-hpet -no-shutdown -global PIIX4_PM.disable_s3=1 -global PIIX4_PM.disable_s4=1 -boot strict=on -device ich9-usb-ehci1,id=usb,bus=pci.0,addr=0x5.0x7 -device ich9-usb-uhci1,masterbus=usb.0,firstport=0,bus=pci.0,multifunction=on,addr=0x5 -device ich9-usb-uhci2,masterbus=usb.0,firstport=2,bus=pci.0,addr=0x5.0x1 -device ich9-usb-uhci3,masterbus=usb.0,firstport=4,bus=pci.0,addr=0x5.0x2 -device virtio-scsi-pci,id=scsi0,bus=pci.0,addr=0x8 -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x4 -drive file=/var/lib/libvirt/images/RHEL-7.4-x86_64-latest.qcow2,format=qcow2,if=none,id=drive-virtio-disk0 -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x6,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -drive file=iscsi://127.0.0.1:3260/iqn.2016-03.com.virttest%3Alogical-pool.target/0,format=raw,if=none,id=drive-virtio-disk1 -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x9,drive=drive-virtio-disk1,id=virtio-disk1 -netdev tap,fd=29,id=hostnet0 -device rtl8139,netdev=hostnet0,id=net0,mac=52:54:00:ac:95:18,bus=pci.0,addr=0x3 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -chardev socket,id=charchannel0,path=/var/lib/libvirt/qemu/channel/target/domain-6-avocado-vt-vm1/org.qemu.guest_agent.0,server,nowait -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=org.qemu.guest_agent.0 -device usb-tablet,id=input0,bus=usb.0,port=1 -vnc 127.0.0.1:0 -vga cirrus -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x7 -msg timestamp=on

3. Log in to the VM.
run "mkfs.ext4 /dev/vdb" in vm [root@localhost ~]# mkfs.ext4 /dev/vdb mke2fs 1.42.9 (28-Dec-2013) Filesystem label= OS type: Linux Block size=4096 (log=2) Fragment size=4096 (log=2) Stride=0 blocks, Stripe width=0 blocks 64000 inodes, 256000 blocks 12800 blocks (5.00%) reserved for the super user First data block=0 Maximum filesystem blocks=262144000 8 block groups 32768 blocks per group, 32768 fragments per group 8000 inodes per group Superblock backups stored on blocks: 32768, 98304, 163840, 229376 Allocating group tables: done Writing inode tables: done Creating journal (4096 blocks): done Writing superblocks and filesystem accounting information: [ 295.964108] blk_update_request: I/O error, dev vdb, sector 548480 [ 295.966615] Buffer I/O error on dev vdb, logical block 68560, lost async page write [ 295.967283] Buffer I/O error on dev vdb, logical block 68561, lost async page write [ 295.967952] Buffer I/O error on dev vdb, logical block 68562, lost async page write [ 295.968568] Buffer I/O error on dev vdb, logical block 68563, lost async page write [ 295.969222] Buffer I/O error on dev vdb, logical block 68564, lost async page write [ 295.969870] Buffer I/O error on dev vdb, logical block 68565, lost async page write [ 295.970486] Buffer I/O error on dev vdb, logical block 68566, lost async page write [ 295.971176] Buffer I/O error on dev vdb, logical block 68567, lost async page write [ 295.971831] Buffer I/O error on dev vdb, logical block 68568, lost async page write [ 295.972455] Buffer I/O error on dev vdb, logical block 68569, lost async page write [ 295.973151] blk_update_request: I/O error, dev vdb, sector 549488 [ 295.973726] blk_update_request: I/O error, dev vdb, sector 550496 [ 295.974340] blk_update_request: I/O error, dev vdb, sector 551504 [ 295.974955] blk_update_request: I/O error, dev vdb, sector 552512 [ 295.975531] blk_update_request: I/O error, dev vdb, sector 553520 [ 295.976179] blk_update_request: I/O error, dev vdb, sector 554528 [ 295.976855] blk_update_request: I/O error, dev vdb, sector 555536 [ 295.977446] blk_update_request: I/O error, dev vdb, sector 556544 [ 295.978863] blk_update_request: I/O error, dev vdb, sector 786432 Warning, had trouble writing out superblocks. [root@localhost ~]# echo $? 5 ** sometimes there is only "Warning, had trouble writing out superblocks." but no "... lost async page write" or "blk_update_request: I/O error, dev vdb ..." Actual results: mkfs returns error Expected results: mkfs should work Additional info: This is not reproduced if I reinstall qemu with pkgs: qemu-img-rhev-2.9.0-16.el7_4.8.x86_64 qemu-kvm-common-rhev-2.9.0-16.el7_4.8.x86_64 qemu-kvm-rhev-2.9.0-16.el7_4.8.x86_64
By the way, the guest is installed with the latest OS tree: RHEL-7.4-20170711.0-x86_64.
Created attachment 1327658 [details] pre_iscsi.sh
Did this work with previous guest versions? E.g. RHEL 7.3?
With RHEL-7.3-20161019.0-x86_64 there are still some I/O errors, but mkfs's exit code is a normal 0.

[root@localhost ~]# uname -r
3.10.0-514.el7.x86_64
[root@localhost ~]# mkfs.ext4 /dev/vdb
mke2fs 1.42.9 (28-Dec-2013)
Filesystem label=
OS type: Linux
Block size=4096 (log=2)
Fragment size=4096 (log=2)
Stride=0 blocks, Stripe width=0 blocks
64000 inodes, 256000 blocks
12800 blocks (5.00%) reserved for the super user
First data block=0
Maximum filesystem blocks=262144000
8 block groups
32768 blocks per group, 32768 fragments per group
8000 inodes per group
Superblock backups stored on blocks:
	32768, 98304, 163840, 229376

Allocating group tables: done
Writing inode tables: done
Creating journal (4096 blocks): done
Writing superblocks and filesystem accounting information:
[  192.808837] blk_update_request: I/O error, dev vdb, sector 548480
[  192.810756] Buffer I/O error on dev vdb, logical block 68560, lost async page write
[  192.811284] Buffer I/O error on dev vdb, logical block 68561, lost async page write
[  192.811815] Buffer I/O error on dev vdb, logical block 68562, lost async page write
[  192.812599] Buffer I/O error on dev vdb, logical block 68563, lost async page write
[  192.813366] Buffer I/O error on dev vdb, logical block 68564, lost async page write
[  192.814127] Buffer I/O error on dev vdb, logical block 68565, lost async page write
[  192.814906] Buffer I/O error on dev vdb, logical block 68566, lost async page write
[  192.815623] Buffer I/O error on dev vdb, logical block 68567, lost async page write
[  192.816372] Buffer I/O error on dev vdb, logical block 68568, lost async page write
[  192.817083] Buffer I/O error on dev vdb, logical block 68569, lost async page write
[  192.817830] blk_update_request: I/O error, dev vdb, sector 549488
[  192.818453] blk_update_request: I/O error, dev vdb, sector 550496
[  192.819131] blk_update_request: I/O error, dev vdb, sector 551504
[  192.819810] blk_update_request: I/O error, dev vdb, sector 552512
[  192.820474] blk_update_request: I/O error, dev vdb, sector 553520
[  192.821145] blk_update_request: I/O error, dev vdb, sector 554528
[  192.821804] blk_update_request: I/O error, dev vdb, sector 555536
[  192.823135] blk_update_request: I/O error, dev vdb, sector 556544
[  192.823754] blk_update_request: I/O error, dev vdb, sector 786432
done
[root@localhost ~]# echo $?
0
I cannot reproduce that. Can you collect "strace -f -p $PID_OF_QEMU_KVM" on the host (with $PID_OF_QEMU_KVM replaced by the actual pid number) while you run the mkfs command in the guest?
Created attachment 1328383 [details] strace.log
(In reply to Fam Zheng from comment #6)
> I cannot reproduce that. Can you collect "strace -f -p $PID_OF_QEMU_KVM" on
> the host (with $PID_OF_QEMU_KVM replaced by the actual pid number) while
> you run the mkfs command in the guest?

The strace output is attached as https://bugzilla.redhat.com/attachment.cgi?id=1328383. It seems to have a lot of redundant info; please check whether it's useful.
Sorry, I was confused; strace won't help because it doesn't reflect the iscsi driver's activity. You are not using the host kernel iSCSI initiator, even though you log in with it in pre_iscsi.sh:

iscsiadm --mode node --login --targetname iqn.2016-03.com.virttest:logical-pool.target --portal 127.0.0.1

Two more questions:
1. Are there any error messages from the host kernel?
2. Can you rule out disk failure? (Is this reproducible on a different host?)

If yes, it may be easier if you can let me access the test machine.
Thanks, that helps! I now fully understand the problem and will work on a fix.

Summary: in qemu-kvm we do a relatively old version of "write request merging" in virtio-blk, which can generate an erroneously large request to the iSCSI target when the guest writes out the new filesystem metadata, resulting in EIO. Upstream QEMU (2.10) and qemu-kvm-rhev have newer request merging logic that doesn't have this problem, because the merging limit is correctly taken into account.
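For illustration, here is a minimal C sketch of the merge decision. This is a simplified model of the concept only, not QEMU's actual multiwrite code, and the names Req and can_merge are made up for this sketch: the old logic merges any contiguous writes unconditionally, while the fixed logic additionally refuses to grow a merged request past the backend's maximum transfer length.

/* merge_sketch.c - simplified sketch of the virtio-blk write-merge
 * decision; illustration only, not QEMU's actual code. */
#include <stdio.h>

typedef struct Req {
    long sector;       /* first sector of the request      */
    long nb_sectors;   /* length of the request in sectors */
} Req;

/* Return 1 if 'next' may be merged into 'cur'.  max_transfer is the
 * backend's maximum transfer length in sectors (0 = unlimited).  The
 * old code merged whenever the requests were contiguous; the fix adds
 * the size check so the merged request cannot exceed what the iSCSI
 * target accepts. */
static int can_merge(const Req *cur, const Req *next, long max_transfer)
{
    if (next->sector > cur->sector + cur->nb_sectors) {
        return 0;  /* not contiguous: never merged */
    }
    if (max_transfer && cur->nb_sectors + next->nb_sectors > max_transfer) {
        return 0;  /* the limit check the old qemu-kvm code was missing */
    }
    return 1;
}

int main(void)
{
    Req a = { 548480, 512 };   /* hypothetical in-flight merged request */
    Req b = { 548992, 512 };   /* contiguous follow-up write            */

    printf("no limit enforced: merge=%d\n", can_merge(&a, &b, 0));    /* 1 */
    printf("1008-sector limit: merge=%d\n", can_merge(&a, &b, 1008)); /* 0 */
    return 0;
}

With no limit the two contiguous writes are merged without bound; once a cap (here the 1008-sector size observed in the failure log, purely as an example) is enforced, the merge is refused and the writes are submitted separately.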
http://post-office.corp.redhat.com/archives/rhvirt-patches/2017-September/msg00531.html
Fix included in qemu-kvm-1.5.3-143.el7
Verification: Bug is fixed in qemu-kvm-1.5.3-143.el7.

Steps:

1. Verify the qemu-kvm version:

# rpm -qa | grep qemu-kvm
qemu-kvm-common-1.5.3-143.el7.x86_64
qemu-kvm-tools-1.5.3-143.el7.x86_64
qemu-kvm-debuginfo-1.5.3-143.el7.x86_64
qemu-kvm-1.5.3-143.el7.x86_64

2. Boot up a VM with an iSCSI drive as the second disk:

/usr/libexec/qemu-kvm \
 -name test \
 -machine pc-i440fx-rhel7.0.0,accel=kvm,usb=off,dump-guest-core=off \
 -m 4G \
 -smp 2,sockets=2,cores=1,threads=1 \
 -boot strict=on \
 -drive file=/home/test/bugs/1492559/test.qcow2,format=qcow2,if=none,id=img0 \
 -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x6,drive=img0,id=disk0,bootindex=1 \
 -drive file=iscsi://127.0.0.1:3260/iqn.2017-09.com.versace:logical-pool.target/0,format=raw,if=none,id=img1 \
 -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x9,drive=img1,id=disk1 \
 -netdev tap,vhost=on,id=hostnet0 \
 -device rtl8139,netdev=hostnet0,id=net0,mac=52:54:00:ac:95:18,bus=pci.0,addr=0x3 \
 -vnc 127.0.0.1:1 \
 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x7 \
 -monitor stdio

3. Check the disk info in the guest:

# lsblk
NAME                              MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
fd0                                 2:0    1    4K  0 disk
sr0                                11:0    1 1024M  0 rom
vda                               252:0    0   20G  0 disk
├─vda1                            252:1    0    1G  0 part /boot
└─vda2                            252:2    0   19G  0 part
  ├─rhel_bootp--73--199--239-root 253:0    0   17G  0 lvm  /
  └─rhel_bootp--73--199--239-swap 253:1    0    2G  0 lvm  [SWAP]
vdb                               252:16   0  9.8G  0 disk

4. Format vdb:

# mkfs.ext3 /dev/vdb
mke2fs 1.42.9 (28-Dec-2013)
Filesystem label=
OS type: Linux
Block size=4096 (log=2)
Fragment size=4096 (log=2)
Stride=0 blocks, Stripe width=0 blocks
640848 inodes, 2560000 blocks
128000 blocks (5.00%) reserved for the super user
First data block=0
Maximum filesystem blocks=2621440000
79 block groups
32768 blocks per group, 32768 fragments per group
8112 inodes per group
Superblock backups stored on blocks:
	32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632

Allocating group tables: done
Writing inode tables: done
Creating journal (32768 blocks): done
Writing superblocks and filesystem accounting information: done

# echo $?
0

# mkfs.ext4 /dev/vdb
mke2fs 1.42.9 (28-Dec-2013)
Filesystem label=
OS type: Linux
Block size=4096 (log=2)
Fragment size=4096 (log=2)
Stride=0 blocks, Stripe width=0 blocks
640848 inodes, 2560000 blocks
128000 blocks (5.00%) reserved for the super user
First data block=0
Maximum filesystem blocks=2151677952
79 block groups
32768 blocks per group, 32768 fragments per group
8112 inodes per group
Superblock backups stored on blocks:
	32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632

Allocating group tables: done
Writing inode tables: done
Creating journal (32768 blocks): done
Writing superblocks and filesystem accounting information: done

# echo $?
0
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2018:0816