Bug 2141964

Summary: Guest hits EXT4-fs errors on a host 4K disk when repeatedly hot-plugging/unplugging a disk running I/O

Product: Red Hat Enterprise Linux 8
Component: qemu-kvm
qemu-kvm sub component: virtio-blk,scsi
Version: 8.8
Keywords: Triaged
Status: CLOSED ERRATA
Severity: high
Priority: medium
Hardware: x86_64
OS: Linux
Reporter: qing.wang <qinwang>
Assignee: Hanna Czenczek <hreitz>
QA Contact: qing.wang <qinwang>
CC: aliang, coli, esandeen, jinzhao, juzhang, kwolf, lijin, qzhang, siwang, vcojot, vgoyal, virt-maint, xuwei, yfu, zhenyzha
Target Milestone: rc
Target Release: ---
Fixed In Version: qemu-kvm-6.2.0-37.module+el8.9.0+19491+15e62c0a
Clones: 2174676 (view as bug list)
Bug Blocks: 2140910, 2174676
Last Closed: 2023-11-14 15:33:25 UTC
Attachments: Simpler reproducer

Description qing.wang 2022-11-11 09:06:40 UTC
Description of problem:

Boot a guest whose OS image file is located on a 4K-sector host disk.
Hot-plug a disk as a data disk and execute I/O (iozone) on it,
then unplug it. Repeat the hotplug -> iozone -> unplug cycle.
Eventually the guest hits the errors below and iozone fails.


2022-11-08 02:22:20: [   74.849846] EXT4-fs error (device vdb1): ext4_find_entry:1505: inode #2: comm iozone: reading directory lblock 0
2022-11-08 02:22:20: [   74.851790] Buffer I/O error on dev vdb1, logical block 0, lost sync page write
2022-11-08 02:22:20: [   74.852838] EXT4-fs (vdb1): I/O error while writing superblock
2022-11-08 02:22:20: [   74.853692] EXT4-fs error (device vdb1): ext4_find_entry:1505: inode #2: comm iozone: reading directory lblock 0
2022-11-08 02:22:24: [   79.196478] pci 0000:06:00.0: [1af4:1042] type 00 class 0x010000
2022-11-08 02:22:24: [   79.197428] pci 0000:06:00.0: reg 0x14: [mem 0x00000000-0x00000fff]
2022-11-08 02:22:24: [   79.198379] pci 0000:06:00.0: reg 0x20: [mem 0x00000000-0x00003fff 64bit pref]
2022-11-08 02:22:24: [   79.200729] pci 0000:06:00.0: BAR 4: assigned [mem 0xfd000000-0xfd003fff 64bit pref]
2022-11-08 02:22:24: [   79.202148] pci 0000:06:00.0: BAR 1: assigned [mem 0xfe200000-0xfe200fff]
2022-11-08 02:22:24: [   79.203987] virtio-pci 0000:06:00.0: enabling device (0000 -> 0002)
2022-11-08 02:22:24: [   79.215542] virtio_blk virtio2: [vdc] 62914560 512-byte logical blocks (32.2 GB/30.0 GiB)
2022-11-08 02:22:24: [   79.217058] vdc: detected capacity change from 0 to 32212254720
2022-11-08 02:22:24: [   79.219321]  vdc: vdc1
2022-11-08 02:22:26: [   80.863042] Aborting journal on device vdb1-8.
2022-11-08 02:22:26: [   80.863780] Buffer I/O error on dev vdb1, logical block 3702784, lost sync page write
2022-11-08 02:22:26: [   80.865005] JBD2: Error -5 detected when updating journal superblock for vdb1-8.
2022-11-08 02:22:26: [   80.899246]  vdc:
2022-11-08 02:22:26: [   81.315697]  vdc: vdc1
2022-11-08 02:22:28: [   82.462054] EXT4-fs (vdc1): mounted filesystem with ordered data mode. Opts: (null)
2022-11-08 02:23:03: [  117.587267] EXT4-fs error (device vdc1): __ext4_get_inode_loc:4726: inode #12: block 1061: comm iozone: unable to read itable block
2022-11-08 02:23:03: [  117.589912] EXT4-fs error (device vdc1) in ext4_reserve_inode_write:5971: IO failure
2022-11-08 02:23:03: [  117.591492] EXT4-fs error (device vdc1) in ext4_orphan_del:3019: IO failure
2022-11-08 02:23:03: [  117.592916] EXT4-fs error (device vdc1): __ext4_get_inode_loc:4726: inode #12: block 1061: comm iozone: unable to read itable block
2022-11-08 02:23:03: [  117.595299] EXT4-fs error (device vdc1) in ext4_reserve_inode_write:5971: IO failure
2022-11-08 02:23:03: [  117.597031] EXT4-fs error (device vdc1): __ext4_get_inode_loc:4726: inode #12: block 1061: comm iozone: unable to read itable block
2022-11-08 02:23:03: [  117.599706] EXT4-fs error (device vdc1) in ext4_reserve_inode_write:5971: IO failure
2022-11-08 02:23:03: [  117.601459] ------------[ cut here ]------------
2022-11-08 02:23:03: [  117.602524] bdi-(unknown) not registered
2022-11-08 02:23:03: [  117.603448] WARNING: CPU: 9 PID: 3257 at fs/fs-writeback.c:2325 __mark_inode_dirty+0x29a/0x390
2022-11-08 02:23:03: [  117.605391] Modules linked in: ext4 mbcache jbd2 fuse xt_CHECKSUM ipt_MASQUERADE xt_conntrack ipt_REJECT nft_compat nf_nat_tftp nft_objref nf_conntrack_tftp nft_counter bridge stp llc nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nf_tables_set nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set nf_tables nfnetlink intel_rapl_msr intel_rapl_common isst_if_common nfit libnvdimm iTCO_wdt iTCO_vendor_support lpc_ich joydev rapl i2c_i801 pcspkr xfs libcrc32c nvme_tcp(X) nvme_fabrics nvme nvme_core t10_pi bochs drm_vram_helper drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm_ttm_helper ttm ahci drm crct10dif_pclmul libahci crc32_pclmul crc32c_intel libata ghash_clmulni_intel virtio_net serio_raw net_failover virtio_blk failover dm_multipath sunrpc dm_mirror dm_region_hash dm_log dm_mod be2iscsi bnx2i cnic uio cxgb4i cxgb4 tls libcxgbi libcxgb qla4xxx iscsi_boot_sysfs iscsi_tcp libiscsi_tcp
2022-11-08 02:23:03: [  117.605515]  libiscsi scsi_transport_iscsi
2022-11-08 02:23:03: [  117.625692] CPU: 9 PID: 3257 Comm: iozone Kdump: loaded Tainted: G               X --------- -  - 4.18.0-432.el8.x86_64 #1
2022-11-08 02:23:03: [  117.628156] Hardware name: Red Hat KVM/RHEL-AV, BIOS 1.16.0-3.module+el8.8.0+16781+9f4724c2 04/01/2014
2022-11-08 02:23:03: [  117.630227] RIP: 0010:__mark_inode_dirty+0x29a/0x390
2022-11-08 02:23:03: [  117.631350] Code: ff ff 48 8b 38 49 89 c4 f6 47 3c 01 74 1e 48 8b 40 08 a8 01 75 16 e8 a5 af f2 ff 48 c7 c7 e7 e5 50 84 48 89 c6 e8 43 0b d6 ff <0f> 0b 48 8b 05 7d 4f 47 01 48 89 85 d0 00 00 00 45 85 ff 74 0e 48
2022-11-08 02:23:03: [  117.635514] RSP: 0018:ffffaa1702fdbd20 EFLAGS: 00010282
2022-11-08 02:23:03: [  117.636691] RAX: 0000000000000000 RBX: 0000000000000001 RCX: 0000000000000000
2022-11-08 02:23:03: [  117.638289] RDX: ffff9a51b3ca6ec0 RSI: ffff9a51b3c96698 RDI: ffff9a51b3c96698
2022-11-08 02:23:03: [  117.639887] RBP: ffff9a4f47b6d308 R08: 0000000000000000 R09: c0000000ffff7fff
2022-11-08 02:23:03: [  117.641486] R10: 0000000000000001 R11: ffffaa1702fdbb38 R12: ffff9a4f4e803858
2022-11-08 02:23:03: [  117.643073] R13: ffff9a4f47b6d390 R14: 0000000000000000 R15: 0000000000000000
2022-11-08 02:23:03: [  117.644676] FS:  00007fb287b90740(0000) GS:ffff9a51b3c80000(0000) knlGS:0000000000000000
2022-11-08 02:23:03: [  117.646490] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
2022-11-08 02:23:03: [  117.647781] CR2: 0000559e7d4b8050 CR3: 000000010535e001 CR4: 00000000007706e0
2022-11-08 02:23:03: [  117.649395] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
2022-11-08 02:23:03: [  117.650990] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
2022-11-08 02:23:03: [  117.652590] PKRU: 55555554
2022-11-08 02:23:03: [  117.653209] Call Trace:
2022-11-08 02:23:03: [  117.653792]  generic_update_time+0xba/0xd0
2022-11-08 02:23:03: [  117.654725]  file_update_time+0xe5/0x130
2022-11-08 02:23:03: [  117.655624]  __generic_file_write_iter+0x9c/0x1d0
2022-11-08 02:23:03: [  117.656699]  ext4_file_write_iter+0x175/0x3e0 [ext4]
2022-11-08 02:23:03: [  117.657899]  ? __schedule+0x2d9/0x860
2022-11-08 02:23:03: [  117.658748]  new_sync_write+0x112/0x160
2022-11-08 02:23:03: [  117.659625]  vfs_write+0xa5/0x1b0
2022-11-08 02:23:03: [  117.660381]  ksys_write+0x4f/0xb0
2022-11-08 02:23:03: [  117.661131]  do_syscall_64+0x5b/0x1b0
2022-11-08 02:23:03: [  117.661969]  entry_SYSCALL_64_after_hwframe+0x61/0xc6
2022-11-08 02:23:03: [  117.663117] RIP: 0033:0x7fb28755d9e5
2022-11-08 02:23:03: [  117.663927] Code: 00 00 75 05 48 83 c4 58 c3 e8 27 4a ff ff 0f 1f 80 00 00 00 00 f3 0f 1e fa 8b 05 66 da 20 00 85 c0 75 12 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 53 c3 66 90 41 54 49 89 d4 55 48 89 f5 53 89
2022-11-08 02:23:03: [  117.667236] RSP: 002b:00007ffec795f2c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
2022-11-08 02:23:03: [  117.668392] RAX: ffffffffffffffda RBX: 0000000000020000 RCX: 00007fb28755d9e5
2022-11-08 02:23:03: [  117.669474] RDX: 0000000000020000 RSI: 00007fb286000000 RDI: 0000000000000003
2022-11-08 02:23:03: [  117.670562] RBP: 00000000000003ee R08: 00007ffec79ac080 R09: 0000000000000000
2022-11-08 02:23:03: [  117.671648] R10: 0000000000000000 R11: 0000000000000246 R12: 00007fb286000000
2022-11-08 02:23:03: [  117.672735] R13: 0000000000000003 R14: 0000000000020000 R15: 0000000000000000
2022-11-08 02:23:03: [  117.673820] ---[ end trace fd08fdd2baddf162 ]---
2022-11-08 02:23:03: [  117.674577] EXT4-fs error (device vdc1): __ext4_get_inode_loc:4726: inode #12: block 1061: comm iozone: unable to read itable block
2022-11-08 02:23:03: [  117.676612] EXT4-fs error (device vdc1) in ext4_reserve_inode_write:5971: IO failure
2022-11-08 02:23:03: [  117.694719] EXT4-fs error (device vdc1) in ext4_orphan_add:2940: IO failure
2022-11-08 02:23:07: [  121.923307] pci 0000:06:00.0: [1af4:1042] type 00 class 0x010000
2022-11-08 02:23:07: [  121.924251] pci 0000:06:00.0: reg 0x14: [mem 0x00000000-0x00000fff]
2022-11-08 02:23:07: [  121.925207] pci 0000:06:00.0: reg 0x20: [mem 0x00000000-0x00003fff 64bit pref]
2022-11-08 02:23:07: [  121.927507] pci 0000:06:00.0: BAR 4: assigned [mem 0xfd000000-0xfd003fff 64bit pref]
2022-11-08 02:23:07: [  121.928636] pci 0000:06:00.0: BAR 1: assigned [mem 0xfe200000-0xfe200fff]
2022-11-08 02:23:07: [  121.929659] virtio-pci 0000:06:00.0: enabling device (0000 -> 0002)
2022-11-08 02:23:07: [  121.941351] virtio_blk virtio2: [vdd] 62914560 512-byte logical blocks (32.2 GB/30.0 GiB)
2022-11-08 02:23:07: [  121.942539] vdd: detected capacity change from 0 to 32212254720
2022-11-08 02:23:07: [  121.944611]  vdd: vdd1
2022-11-08 02:23:08: [  122.847114] Aborting journal on device vdc1-8.
2022-11-08 02:23:08: [  122.847956] Buffer I/O error on dev vdc1, logical block 3702784, lost sync page write
2022-11-08 02:23:08: [  122.849200] JBD2: Error -5 detected when updating journal superblock for vdc1-8.
2022-11-08 02:23:09: [  123.640519]  vdd:
2022-11-08 02:23:10: [  125.202764] EXT4-fs (vdd1): mounted filesystem with ordered data mode. Opts: (null)
2022-11-08 02:23:45: [  160.320460] EXT4-fs error (device vdd1): __ext4_get_inode_loc:4726: inode #12: block 1061: comm iozone: unable to read itable block
2022-11-08 02:23:45: [  160.322168] EXT4-fs error (device vdd1) in ext4_reserve_inode_write:5971: IO failure
2022-11-08 02:23:45: [  160.323275] EXT4-fs error (device vdd1) in ext4_orphan_del:3019: IO failure
2022-11-08 02:23:45: [  160.324418] EXT4-fs error (device vdd1): __ext4_get_inode_loc:4726: inode #12: block 1061: comm iozone: unable to read itable block
2022-11-08 02:23:45: [  160.326800] EXT4-fs error (device vdd1) in ext4_reserve_inode_write:5971: IO failure
2022-11-08 02:23:45: [  160.328125] EXT4-fs error (device vdd1): __ext4_get_inode_loc:4726: inode #12: block 1061: comm iozone: unable to read itable block
2022-11-08 02:23:45: [  160.329911] EXT4-fs error (device vdd1) in ext4_reserve_inode_write:5971: IO failure
2022-11-08 02:23:45: [  160.331075] ------------[ cut here ]------------

Version-Release number of selected component (if applicable):
Red Hat Enterprise Linux release 8.8 Beta (Ootpa)
4.18.0-432.el8.x86_64
qemu-kvm-6.2.0-23.module+el8.8.0+17074+095c619a.x86_64
seabios-bin-1.16.0-3.module+el8.8.0+16781+9f4724c2.noarch
edk2-ovmf-20220126gitbb1bba3d77-3.el8.noarch
libvirt-8.0.0-10.module+el8.8.0+16781+9f4724c2.x86_64
virtio-win-prewhql-0.1-228.iso


How reproducible:
>50%

Steps to Reproduce:
1. Environment

/dev/mapper/rhel_dell--per440--11-root xfs        70G  6.2G   64G   9% /
/dev/sda2                              xfs      1014M  254M  761M  25% /boot
/dev/mapper/rhel_dell--per440--11-home xfs       456G  149G  307G  33% /home


fdisk -l /dev/sda

Disk /dev/sda: 558.4 GiB, 599550590976 bytes, 146374656 sectors
Units: sectors of 1 * 4096 = 4096 bytes
Sector size (logical/physical): 4096 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disklabel type: gpt

2. Boot the guest
The guest OS image file is located on the 4K disk:

/home/kvm_autotest_root/images/rhel880-64-virtio.qcow2
qemu-img create -f qcow2 /home/kvm_autotest_root/images/storage0.qcow2 30G

/usr/libexec/qemu-kvm \
     -name 'avocado-vt-vm1'  \
     -sandbox on  \
     -machine q35,memory-backend=mem-machine_mem \
     -device pcie-root-port,id=pcie-root-port-0,multifunction=on,bus=pcie.0,addr=0x1,chassis=1 \
     -device pcie-pci-bridge,id=pcie-pci-bridge-0,addr=0x0,bus=pcie-root-port-0  \
     -nodefaults \
     -device VGA,bus=pcie.0,addr=0x2 \
     -m 12288 \
     -object memory-backend-ram,size=12288M,id=mem-machine_mem  \
     -smp 10,maxcpus=10,cores=5,threads=1,dies=1,sockets=2  \
     -cpu 'Cascadelake-Server-noTSX',+kvm_pv_unhalt \
     -chardev socket,id=qmp_id_qmpmonitor1,wait=off,server=on,path=/var/tmp/avocado_qxy90voo/monitor-qmpmonitor1-20221111-031822-9TPxVAnJ  \
     -mon chardev=qmp_id_qmpmonitor1,mode=control \
     -chardev socket,id=qmp_id_catch_monitor,wait=off,server=on,path=/var/tmp/avocado_qxy90voo/monitor-catch_monitor-20221111-031822-9TPxVAnJ  \
     -mon chardev=qmp_id_catch_monitor,mode=control \
     -device pvpanic,ioport=0x505,id=id8YJ0PS \
     -chardev socket,id=chardev_serial0,wait=off,server=on,path=/var/tmp/avocado_qxy90voo/serial-serial0-20221111-031822-9TPxVAnJ \
     -device isa-serial,id=serial0,chardev=chardev_serial0  \
     -chardev socket,id=seabioslog_id_20221111-031822-9TPxVAnJ,path=/var/tmp/avocado_qxy90voo/seabios-20221111-031822-9TPxVAnJ,server=on,wait=off \
     -device isa-debugcon,chardev=seabioslog_id_20221111-031822-9TPxVAnJ,iobase=0x402 \
     -device pcie-root-port,id=pcie-root-port-1,port=0x1,addr=0x1.0x1,bus=pcie.0,chassis=2 \
     -device qemu-xhci,id=usb1,bus=pcie-root-port-1,addr=0x0 \
     -device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1 \
     -object iothread,id=iothread0 \
     -object iothread,id=iothread1 \
     -blockdev node-name=file_image1,driver=file,auto-read-only=on,discard=unmap,aio=threads,filename=/home/kvm_autotest_root/images/rhel880-64-virtio.qcow2,cache.direct=on,cache.no-flush=off \
     -blockdev node-name=drive_image1,driver=qcow2,read-only=off,cache.direct=on,cache.no-flush=off,file=file_image1 \
     -device pcie-root-port,id=pcie-root-port-2,port=0x2,addr=0x1.0x2,bus=pcie.0,chassis=3 \
     -device virtio-blk-pci,id=image1,drive=drive_image1,bootindex=0,write-cache=on,bus=pcie-root-port-2,addr=0x0,iothread=iothread0 \
     -device pcie-root-port,id=pcie-root-port-3,port=0x3,addr=0x1.0x3,bus=pcie.0,chassis=4 \
     -device virtio-net-pci,mac=9a:04:0d:5f:0d:27,id=idmLBjcb,netdev=idDzO7iE,bus=pcie-root-port-3,addr=0x0  \
     -netdev tap,id=idDzO7iE,vhost=on  \
     -vnc :0  \
     -rtc base=utc,clock=host,driftfix=slew  \
     -boot menu=off,order=cdn,once=c,strict=off \
     -enable-kvm \
     -device pcie-root-port,id=pcie_extra_root_port_0,multifunction=on,bus=pcie.0,addr=0x3,chassis=5

3. Hot-plug the disk
{'execute': 'blockdev-add', 'arguments': {'node-name': 'file_stg0', 'driver': 'file', 'auto-read-only': True, 'discard': 'unmap', 'aio': 'threads', 'filename': '/home/kvm_autotest_root/images/storage0.qcow2', 'cache': {'direct': True, 'no-flush': False}}, 'id': 'E2JIZiWf'}
{'execute': 'blockdev-add', 'arguments': {'node-name': 'drive_stg0', 'driver': 'qcow2', 'read-only': False, 'cache': {'direct': True, 'no-flush': False}, 'file': 'file_stg0'}, 'id': 'wByWn732'}
{'execute': 'device_add', 'arguments': OrderedDict([('driver', 'virtio-blk-pci'), ('id', 'stg0'), ('drive', 'drive_stg0'), ('write-cache', 'on'), ('bus', 'pcie_extra_root_port_0'), ('addr', '0x0'), ('iothread', 'iothread1')]), 'id': 'IoBoQcIr'}


4. Partition, format, and mount the data disk

 parted -s "/dev/vdb" mkpart primary 0M 30720.0M
 partprobe /dev/vdb
 yes|mkfs.ext4 -F '/dev/vdb1'
 rm -rf /mnt/vdb1; mkdir /mnt/vdb1
 mount -t ext4 /dev/vdb1 /mnt/vdb1

5. Execute iozone on the disk
iozone -az -g 10G -y 32k -i 0 -i 1 -I -f /mnt/vdb1/iozone_test

6. Unplug the disk
{'execute': 'device_del', 'arguments': {'id': 'stg0'}, 'id': 'OK44JRhf'}
{'execute': 'blockdev-del', 'arguments': {'node-name': 'drive_stg0'}, 'id': 'vqD7Jm2K'}
{'execute': 'blockdev-del', 'arguments': {'node-name': 'file_stg0'}, 'id': '48nbWJ9L'}

7. Repeat steps 3-6 100 times
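For reference, one iteration of steps 3-6 can be sketched as the QMP commands the harness sends. This is a sketch only: the helper names are mine, and actually delivering the commands requires an open QMP monitor socket, which is omitted here.

```python
import json

# QMP commands for one hotplug/unplug cycle (steps 3 and 6 above).
# Node names, device IDs, and options match the reproducer; the request
# 'id' tags seen in the logs are arbitrary and omitted here.
def hotplug_cmds(image="/home/kvm_autotest_root/images/storage0.qcow2"):
    return [
        {"execute": "blockdev-add",
         "arguments": {"node-name": "file_stg0", "driver": "file",
                       "auto-read-only": True, "discard": "unmap",
                       "aio": "threads", "filename": image,
                       "cache": {"direct": True, "no-flush": False}}},
        {"execute": "blockdev-add",
         "arguments": {"node-name": "drive_stg0", "driver": "qcow2",
                       "read-only": False,
                       "cache": {"direct": True, "no-flush": False},
                       "file": "file_stg0"}},
        {"execute": "device_add",
         "arguments": {"driver": "virtio-blk-pci", "id": "stg0",
                       "drive": "drive_stg0", "write-cache": "on",
                       "bus": "pcie_extra_root_port_0", "addr": "0x0",
                       "iothread": "iothread1"}},
    ]

def unplug_cmds():
    return [
        {"execute": "device_del", "arguments": {"id": "stg0"}},
        {"execute": "blockdev-del", "arguments": {"node-name": "drive_stg0"}},
        {"execute": "blockdev-del", "arguments": {"node-name": "file_stg0"}},
    ]

# Each command goes out as one JSON line on the QMP socket:
wire = [json.dumps(c) for c in hotplug_cmds() + unplug_cmds()]
```

Each dict serializes to one JSON line matching the commands quoted verbatim in steps 3 and 6.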




Actual results:
Filesystem error messages indicate the filesystem is corrupted.

Expected results:
iozone can run on the disk after each hotplug without I/O errors.

Additional info:
The same test passed on another host (512-byte logical sectors):
 fdisk -l /dev/sda
Disk /dev/sda: 446.6 GiB, 479559942144 bytes, 936640512 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
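The difference between the failing and passing hosts is the logical sector size (4096 vs. 512 bytes). When triaging, that can be pulled out of `fdisk -l` output mechanically; `sector_sizes` below is a hypothetical helper, not part of the test harness.

```python
import re

# Extract (logical, physical) sector sizes from `fdisk -l` output, to
# tell a 4K-native disk (4096/4096, the failing host) from a 512e disk
# (512/4096, the passing host).
def sector_sizes(fdisk_output):
    m = re.search(
        r"Sector size \(logical/physical\): (\d+) bytes / (\d+) bytes",
        fdisk_output)
    if not m:
        raise ValueError("no sector size line found")
    return int(m.group(1)), int(m.group(2))
```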

Comment 3 Vivek Goyal 2022-11-14 15:44:40 UTC
Qing Wang,

So you hot-unplugged the disk while iozone was running?

Won't ext4 (or any other filesystem) and the application see errors (EIO or equivalent) if the disk is suddenly hot-unplugged?

In virtiofs, IIRC, we do this in a slightly more graceful manner: we disconnect all queues so that any new request submitted on a queue immediately returns -EIO, and then we wait for pending requests to finish (virtio_fs_remove()).

I am assuming virtio-blk does something similar; I have not looked at the code. In other words, if a device is suddenly hot-unplugged, the filesystem and applications are bound to see I/O failures and might choose to print error messages on the console. And there is no guarantee that the filesystem or application will resume automatically when the device is plugged back in.

Please correct me if my understanding is not right.
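The graceful-removal pattern described above can be modeled in a few lines. This is a toy Python model of the idea only, not the actual virtio_fs_remove() kernel code: after disconnect, new submissions fail fast with -EIO while already-queued requests still drain.

```python
import errno

# Toy model of "disconnect queues, fail new requests, drain pending".
class Queue:
    def __init__(self):
        self.pending = []
        self.connected = True

    def submit(self, req):
        if not self.connected:
            return -errno.EIO          # new requests fail immediately
        self.pending.append(req)
        return 0

    def disconnect_and_drain(self):
        self.connected = False         # reject any new submissions
        while self.pending:            # let in-flight requests finish
            self.pending.pop(0)
```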

Comment 4 Eric Sandeen 2022-11-14 16:05:01 UTC
It's not clear to me from the logs when the unplugs happen vs. when the IO errors happen. Logging something to dmesg when you perform the unplug action could make it more clear.

But of course you will see IO errors after you unplug the disk, this is expected.

> Expected results:
> iozone may run on disk after hotplug

As far as I can tell, it /is/ running on the new disk after it is hotplugged (added), and then it fails after it is unplugged. This is expected, of course.

Surely you don't expect iozone to keep running with no errors after the disk has been unplugged, right?

Comment 5 qing.wang 2022-11-15 02:02:31 UTC
I uploaded a passing run of this test:

http://fileshare.englab.nay.redhat.com/pub/section2/images_backup/qbugs/2141964/2022-11-14

It also hits the same filesystem errors in the serial log:
http://fileshare.englab.nay.redhat.com/pub/section2/images_backup/qbugs/2141964/2022-11-14/serial-serial0-avocado-vt-vm1.log
http://fileshare.englab.nay.redhat.com/pub/section2/images_backup/qbugs/2141964/2022-11-14/debug.log

But iozone can run on the newly hotplugged disk; it only hits errors once the disk is unplugged.
The iozone error is "write: Input/output error\n\niozone: interrupted\n\nexiting iozone\n\n".


But on the 4K host disk, it looks like the newly hotplugged disk cannot work at all.
The iozone error is "reading block 140 7f30fde00000\nread: Input/output error\n".

Comment 6 Kevin Wolf 2022-11-16 12:26:23 UTC
So what you're saying is that the last iozone instance is failing even though its disk has _not_ been unplugged yet, right?

The messages in the kernel log do look a bit different from the other instances caused by the unplug:

2022-11-08 02:32:27: [  681.517270] EXT4-fs (vdq1): mounted filesystem with ordered data mode. Opts: (null)
2022-11-08 02:32:40: [  694.648027] blk_update_request: I/O error, dev vdq, sector 614401 op 0x0:(READ) flags 0x4000 phys_seg 254 prio class 0
2022-11-08 02:32:40: [  694.649759] blk_update_request: I/O error, dev vdq, sector 616433 op 0x0:(READ) flags 0x4000 phys_seg 254 prio class 0
2022-11-08 02:32:40: [  694.651326] blk_update_request: I/O error, dev vdq, sector 618465 op 0x0:(READ) flags 0x0 phys_seg 4 prio class 0
2022-11-08 02:32:40: [  694.652792] blk_update_request: I/O error, dev vdq, sector 618497 op 0x0:(READ) flags 0x4000 phys_seg 254 prio class 0
2022-11-08 02:32:40: [  694.654316] blk_update_request: I/O error, dev vdq, sector 620529 op 0x0:(READ) flags 0x4000 phys_seg 254 prio class 0
2022-11-08 02:32:40: [  694.655847] blk_update_request: I/O error, dev vdq, sector 622561 op 0x0:(READ) flags 0x0 phys_seg 4 prio class 0

Indeed, in the QMP logs I can't see any 'device_del' request for this disk. I also don't see any I/O errors from the host (which would be visible as a QMP event). Maybe the unexpected previous unplugs messed up some state?

Can you confirm that the same iozone invocation doesn't fail on this host and with the same configuration when you run it without unplugging busy disks first? You mentioned that it uses a 4k disk. If something was wrong with respect to request alignment and that caused the I/O errors, it would be visible with a normal run (however, as I said above, it would also be expected to result in a QMP event).

(Actually, with respect to the unplug operations that happen before, I'm wondering why the kernel even allows the unplug. My understanding is that PCI unplug only works with guest cooperation, so QEMU wouldn't unplug the device if the guest kernel didn't confirm it. But that's just an aside.)
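On the alignment hypothesis: the guest disks are attached with cache.direct=on, i.e. the host file is opened with O_DIRECT, and O_DIRECT requests generally must be aligned to the host device's logical block size. A quick arithmetic check of the failing guest offsets against an assumed 4096-byte host block (`aligned` is an illustrative helper, not QEMU code):

```python
# A request is directly serviceable with O_DIRECT only if its byte
# offset and length are multiples of the host's logical block size.
def aligned(offset_bytes, length_bytes, host_lbs=4096):
    return offset_bytes % host_lbs == 0 and length_bytes % host_lbs == 0

# Guest sector 614401 (512-byte sectors) from the failing reads above:
offset = 614401 * 512
# offset % 4096 == 512, so this request is not 4K-aligned.
```

If unaligned requests were passed through unhandled on the 4K-sector host, they would fail, which would match the observation that the same run passes on the 512-byte-sector host.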

Comment 7 qing.wang 2022-11-17 10:31:16 UTC
The iozone invocation has no issue on this host, and it is executed after each hotplug finishes.
The process in the guest is:
 hotplug disk -> start iozone -> check iozone pid -> sleep 30s -> check iozone pid again to confirm I/O is in progress.

 

Let's check the guest log:

[stdlog] 2022-11-08 02:31:44,674 avocado.test INFO | Start iozone under background.

sleep 30s

[stdlog] 2022-11-08 02:32:19,567 avocado.test DEBUG| The index of disks before unplug:
[stdlog]  {'/dev/vda', '/dev/vdp'}   <------ indicates the new disk is recognized by the guest OS
...
[stdlog] 2022-11-08 02:32:19,569 avocado.virttest.qemu_monitor DEBUG| Send command: {'execute': 'device_del', 'arguments': {'id': 'stg0'}, 'id': 'AVPnEpRT'}
[stdlog] 2022-11-08 02:32:19,673 aexpect.client DEBUG| Sending command: echo $?
...
[stdlog] 2022-11-08 02:32:23,903 avocado.test DEBUG| The index of disks before hotplug:
[stdlog]  {'/dev/vda'}   <------ indicates the disk has been unplugged from the guest OS

[stdlog] 2022-11-08 02:32:23,904 avocado.test INFO | Start to hotplug devices "stg0" by monitor qmpmonitor1.
[stdlog] 2022-11-08 02:32:23,906 avocado.virttest.qemu_monitor DEBUG| (monitor avocado-vt-vm1.qmpmonitor1) Sending command 'blockdev-add' 
[stdlog] 2022-11-08 02:32:23,906 avocado.virttest.qemu_monitor DEBUG| Send command: {'execute': 'blockdev-add', 'arguments': {'node-name': 'file_stg0', 'driver': 'file', 'auto-read-only': True, 'discard': 'unmap', 'aio': 'threads', 'filename': '/home/kvm_autotest_root/images/storage0.qcow2', 'cache': {'direct': True, 'no-flush': False}}, 'id': 'meE4QckK'}
[stdlog] 2022-11-08 02:32:23,908 avocado.virttest.qemu_monitor DEBUG| (monitor avocado-vt-vm1.qmpmonitor1) Sending command 'blockdev-add' 
[stdlog] 2022-11-08 02:32:23,908 avocado.virttest.qemu_monitor DEBUG| Send command: {'execute': 'blockdev-add', 'arguments': {'node-name': 'drive_stg0', 'driver': 'qcow2', 'read-only': False, 'cache': {'direct': True, 'no-flush': False}, 'file': 'file_stg0'}, 'id': 't7iuOPpX'}
[stdlog] 2022-11-08 02:32:23,910 avocado.virttest.qemu_monitor DEBUG| (monitor avocado-vt-vm1.qmpmonitor1) Sending command 'device_add' 
[stdlog] 2022-11-08 02:32:23,910 avocado.virttest.qemu_monitor DEBUG| Send command: {'execute': 'device_add', 'arguments': OrderedDict([('driver', 'virtio-blk-pci'), ('id', 'stg0'), ('drive', 'drive_stg0'), ('write-cache', 'on'), ('bus', 'pcie_extra_root_port_0'), ('addr', '0x0'), ('iothread', 'iothread1')]), 'id': 'z9OMDLvl'}
.....

[stdlog] 2022-11-08 02:32:23,926 avocado.virttest.virt_vm DEBUG| Attempting to log into 'avocado-vt-vm1' (timeout 360s)
[stdlog] 2022-11-08 02:32:23,927 avocado.virttest.virt_vm DEBUG| Found/Verified IP 192.168.122.85 for VM avocado-vt-vm1 NIC 0
[stdlog] 2022-11-08 02:32:25,330 aexpect.client DEBUG| Sending command: ls /dev/[vhs]d* | grep -v [0-9]$
[stdlog] 2022-11-08 02:32:25,435 aexpect.client DEBUG| Sending command: echo $?
[stdlog] 2022-11-08 02:32:25,596 aexpect.client DEBUG| Sending command: parted -s "/dev/vdq" mklabel msdos  <--- new disk vdq is accepted by OS
[stdlog] 2022-11-08 02:32:25,721 aexpect.client DEBUG| Sending command: echo $?

[stdlog] 2022-11-08 02:32:27,384 avocado.test INFO | Start iozone under background.
.....
[stdlog] 2022-11-08 02:32:28,851 aexpect.client DEBUG| Sending command: pgrep -lx iozone
[stdlog] 2022-11-08 02:32:40,423 aexpect.client DEBUG| Sending command: echo $?
[stderr] Exception in thread Thread-29:
[stderr] Traceback (most recent call last):
[stderr]   File "/usr/lib64/python3.6/threading.py", line 937, in _bootstrap_inner
[stderr]     self.run()
[stderr]   File "/home/kar/workspace/avocado-vt/virttest/utils_misc.py", line 125, in run
[stderr]     self._retval = self._target(*self._args, **self._kwargs)
[stderr]   File "/home/kar/workspace/root/avocado/data/avocado-vt/virttest/test-providers.d/downloads/io-github-autotest-qemu/provider/storage_benchmark.py", line 337, in run
[stderr]     return super(Iozone, self).run(cmd, timeout)
[stderr]   File "/home/kar/workspace/root/avocado/data/avocado-vt/virttest/test-providers.d/downloads/io-github-autotest-qemu/provider/storage_benchmark.py", line 243, in run
[stderr]     return self.session.cmd(cmd, timeout=timeout)
[stderr]   File "/home/kar/workspace/lib/python3.6/site-packages/aexpect-1.6.4-py3.6.egg/aexpect/client.py", line 1356, in cmd
[stderr]     raise ShellCmdError(cmd, status, output)
[stderr] aexpect.exceptions.ShellCmdError: Shell command failed: '/home/iozone_inst/src/current/iozone -az -g 10G -y 32k -i 0 -i 1 -I -f /mnt/vdq1/iozone_test'    (status: 61,    output: '\tIozone: Performance Test of File I/O\n\t        Version $Revision: 3.489 $\n\t\tCompiled for 64 bit mode.\n\t\tBuild: linux-AMD64 \n\n\tContributors:William Norcott, Don Capps, Isom Crawford, Kirby Collins\n\t             Al Slater, Scott Rhine, Mike Wisner, Ken Goss\n\t             Steve Landherr, Brad Smith, Mark Kelly, Dr. Alain CYR,\n\t             Randy Dunlap, Mark Montague, Dan Million, Gavin Brebner,\n\t             Jean-Marc Zucconi, Jeff Blomberg, Benny Halevy, Dave Boone,\n\t             Erik Habbinga, Kris Strecker, Walter Wong, Joshua Root,\n\t             Fabrice Bacchella, Zhenghua Xue, Qin Li, Darren Sawyer,\n\t             Vangel Bojaxhi, Ben England, Vikentsi Lapa,\n\t             Alexey Skidanov, Sudhir Kumar.\n\n\tRun began: Tue Nov  8 15:32:26 2022\n\n\tAuto Mode\n\tCross over of record size disabled.\n\tUsing maximum file size of 10485760 kilobytes.\n\tUsing Minimum Record Size 32 kB\n\tO_DIRECT feature enabled\n\tCommand line used: /home/iozone_inst/src/current/iozone -az -g 10G -y 32k -i 0 -i 1 -I -f /mnt/vdq1/iozone_test\n\tOutput is in kBytes/sec\n\tTime Resolution = 0.000001 seconds.\n\tProcessor cache size set to 1024 kBytes.\n\tProcessor cache line size set to 32 bytes.\n\tFile stride size set to 17 * record size.\n                                                              random    random     

iozone quit with an error.

==================================================
kernel log:

2022-11-08 02:32:23: [  678.259446] virtio-pci 0000:06:00.0: enabling device (0000 -> 0002)
2022-11-08 02:32:23: [  678.270099] virtio_blk virtio2: [vdq] 62914560 512-byte logical blocks (32.2 GB/30.0 GiB)
2022-11-08 02:32:23: [  678.271385] vdq: detected capacity change from 0 to 32212254720   <<-----new disk detected.
2022-11-08 02:32:23: [  678.273388]  vdq: vdq1    
2022-11-08 02:32:25: [  679.957501]  vdq:
2022-11-08 02:32:26: [  680.372993]  vdq: vdq1
2022-11-08 02:32:27: [  681.517270] EXT4-fs (vdq1): mounted filesystem with ordered data mode. Opts: (null)
2022-11-08 02:32:40: [  694.648027] blk_update_request: I/O error, dev vdq, sector 614401 op 0x0:(READ) flags 0x4000 phys_seg 254 prio class 0
2022-11-08 02:32:40: [  694.649759] blk_update_request: I/O error, dev vdq, sector 616433 op 0x0:(READ) flags 0x4000 phys_seg 254 prio class 0
2022-11-08 02:32:40: [  694.651326] blk_update_request: I/O error, dev vdq, sector 618465 op 0x0:(READ) flags 0x0 phys_seg 4 prio class 0
2022-11-08 02:32:40: [  694.652792] blk_update_request: I/O error, dev vdq, sector 618497 op 0x0:(READ) flags 0x4000 phys_seg 254 prio class 0
2022-11-08 02:32:40: [  694.654316] blk_update_request: I/O error, dev vdq, sector 620529 op 0x0:(READ) flags 0x4000 phys_seg 254 prio class 0
2022-11-08 02:32:40: [  694.655847] blk_update_request: I/O error, dev vdq, sector 622561 op 0x0:(READ) flags 0x0 phys_seg 4 prio class 0
2022-11-08 02:36:04: [  898.842243] vda1: Can't mount, would change RO state    < --- vda1: did a broken root partition cause iozone to quit?
2022-11-08 02:36:04: [[0;32m  OK  [0m] Stopped GSSAPI Proxy Daemon.

=========================================
2022-11-08 02:36:04: [  898.842243] vda1: Can't mount, would change RO state    < --- vda1: did a broken root partition cause iozone to quit?
I think the previous unplugs broke the guest OS filesystem (not the hotplugged disk's).

Comment 8 Kevin Wolf 2022-11-17 16:01:20 UTC
(In reply to qing.wang from comment #7)
> The iozone invocation has no issue on the host, and it is executed after
> each hotplug finished.
> The process in guest is:
>  hotplugging disk -> iozone -> check iozone pid -> sleep 30s -> check iozone
> pid to confirm IO is doing.

Can iozone finish successfully on the guest when you never unplug a disk in this guest?

> 2022-11-08 02:32:40: [  694.655847] blk_update_request: I/O error, dev vdq,
> sector 622561 op 0x0:(READ) flags 0x0 phys_seg 4 prio class 0
> 2022-11-08 02:36:04: [  898.842243] vda1: Can't mount, would change RO state
> < --- vda1 (root partition broken result in iozone quit) ?

Note that there is a big time difference between those two messages. I don't think the vda1 error is related to the vdq errors, but it's rather an unrelated quirk of the guest shutdown. The same message exists in the logs of the successful case from comment 5.

Comment 9 qing.wang 2022-11-21 09:50:11 UTC
(In reply to Kevin Wolf from comment #8)
> (In reply to qing.wang from comment #7)
> > The iozone invocation has no issue on the host, and it is executed after
> > each hotplug finished.
> > The process in guest is:
> >  hotplugging disk -> iozone -> check iozone pid -> sleep 30s -> check iozone
> > pid to confirm IO is doing.
> 
> Can iozone finish successfully on the guest when you never unplug a disk in
> this guest?

Yes, it ran the same command on the disk for over 8 hours.

> 
> > 2022-11-08 02:32:40: [  694.655847] blk_update_request: I/O error, dev vdq,
> > sector 622561 op 0x0:(READ) flags 0x0 phys_seg 4 prio class 0
> > 2022-11-08 02:36:04: [  898.842243] vda1: Can't mount, would change RO state
> > < --- vda1 (root partition broken result in iozone quit) ?
> 
> Note that there is a big time difference between those two messages. I don't
> think the vda1 error is related to the vdq errors, but it's rather an
> unrelated quirk of the guest shutdown. The same message exists in the logs
> of the successful case from comment 5.

Comment 10 Vivek Goyal 2022-11-21 19:25:31 UTC
(In reply to Kevin Wolf from comment #6)
> So what you're saying is that the last iozone instance is failing even
> though its disk has _not_ been unplugged yet, right?
> 
> The messages in the kernel log do look a bit different from the other
> instances caused by the unplug:
> 
> 2022-11-08 02:32:27: [  681.517270] EXT4-fs (vdq1): mounted filesystem with
> ordered data mode. Opts: (null)
> 2022-11-08 02:32:40: [  694.648027] blk_update_request: I/O error, dev vdq,
> sector 614401 op 0x0:(READ) flags 0x4000 phys_seg 254 prio class 0
> 2022-11-08 02:32:40: [  694.649759] blk_update_request: I/O error, dev vdq,
> sector 616433 op 0x0:(READ) flags 0x4000 phys_seg 254 prio class 0
> 2022-11-08 02:32:40: [  694.651326] blk_update_request: I/O error, dev vdq,
> sector 618465 op 0x0:(READ) flags 0x0 phys_seg 4 prio class 0
> 2022-11-08 02:32:40: [  694.652792] blk_update_request: I/O error, dev vdq,
> sector 618497 op 0x0:(READ) flags 0x4000 phys_seg 254 prio class 0
> 2022-11-08 02:32:40: [  694.654316] blk_update_request: I/O error, dev vdq,
> sector 620529 op 0x0:(READ) flags 0x4000 phys_seg 254 prio class 0
> 2022-11-08 02:32:40: [  694.655847] blk_update_request: I/O error, dev vdq,
> sector 622561 op 0x0:(READ) flags 0x0 phys_seg 4 prio class 0
> 

These block error messages do indeed seem suspicious. They are not present in the success case, and in the error case I see them only once.

So it looks like disk /dev/vdq was hot-plugged and mounted at 02:32:27, and about 13 seconds later we started getting I/O errors on it.

Are there any qemu logs corresponding to the above errors? Maybe they will help figure out why I/O to /dev/vdq failed.

Comment 11 qing.wang 2022-11-22 08:41:42 UTC
I uploaded a new failed log:
http://fileshare.englab.nay.redhat.com/pub/section2/images_backup/qbugs/2141964/2022-11-22

It has the same blk_update_request errors; for details, check
http://fileshare.englab.nay.redhat.com/pub/section2/images_backup/qbugs/2141964/2022-11-22/test-results/1-Host_RHEL.m8.u8.product_rhel.qcow2.virtio_blk.up.virtio_net.Guest.RHEL.8.8.0.x86_64.io-github-autotest-qemu.hotplug_unplug_during_io_repeat.q35/catch_monitor-avocado-vt-vm1-pid-16413.log

There are error messages:
2022-11-22 03:28:18: {"timestamp": {"seconds": 1669105695, "microseconds": 981048}, "event": "BLOCK_IO_ERROR", "data": {"device": "", "nospace": false, "node-name": "drive_stg0", "reason": "Invalid argument", "operation": "write", "action": "report"}}
2022-11-22 03:28:18: {"timestamp": {"seconds": 1669105695, "microseconds": 981172}, "event": "BLOCK_IO_ERROR", "data": {"device": "", "nospace": false, "node-name": "drive_stg0", "reason": "Invalid argument", "operation": "write", "action": "report"}}
2022-11-22 03:28:18: {"timestamp": {"seconds": 1669105695, "microseconds": 981223}, "event": "BLOCK_IO_ERROR", "data": {"device": "", "nospace": false, "node-name": "drive_stg0", "reason": "Invalid argument", "operation": "write", "action": "report"}}
2022-11-22 03:28:18: {"timestamp": {"seconds": 1669105695, "microseconds": 981271}, "event": "BLOCK_IO_ERROR", "data": {"device": "", "nospace": false, "node-name": "drive_stg0", "reason": "Invalid argument", "operation": "write", "action": "report"}}
2022-11-22 03:28:18: {"timestamp": {"seconds": 1669105695, "microseconds": 981304}, "event": "BLOCK_IO_ERROR", "data": {"device": "", "nospace": false, "node-name": "drive_stg0", "reason": "Invalid argument", "operation": "write", "action": "report"}}
2022-11-22 03:28:18: {"timestamp": {"seconds": 1669105695, "microseconds": 981328}, "event": "BLOCK_IO_ERROR", "data": {"device": "", "nospace": false, "node-name": "drive_stg0", "reason": "Invalid argument", "operation": "write", "action": "report"}}

The same messages are also found in:
http://fileshare.englab.nay.redhat.com/pub/section2/images_backup/qbugs/2141964/2022-11-11/test-results/1-Host_RHEL.m8.u8.product_rhel.qcow2.virtio_blk.up.virtio_net.Guest.RHEL.8.8.0.x86_64.io-github-autotest-qemu.hotplug_unplug_during_io_repeat.q35/catch_monitor-avocado-vt-vm1-pid-794693.log

I checked the image files; they indicate there is enough space.

image: /home/kvm_autotest_root/images/storage0.qcow2
file format: qcow2
virtual size: 30 GiB (32212254720 bytes)
disk size: 2.36 GiB
cluster_size: 65536
Format specific information:
    compat: 1.1
    compression type: zlib
    lazy refcounts: false
    refcount bits: 16
    corrupt: false
    extended l2: false
image: /home/kvm_autotest_root/images/rhel880-64-virtio.qcow2
file format: qcow2
virtual size: 20 GiB (21474836480 bytes)
disk size: 5.79 GiB
cluster_size: 65536
Format specific information:
    compat: 1.1
    compression type: zlib
    lazy refcounts: false
    refcount bits: 16
    corrupt: false
    extended l2: false

Comment 12 Kevin Wolf 2022-11-22 13:36:45 UTC
(In reply to qing.wang from comment #11)
> There is error message .

Oh, thank you, this is important information. It seems that I looked in the wrong file.

Generally speaking, what is the difference between qmpmonitor1-avocado-... and catch_monitor-avocado...? Are there two QMP monitors for the same VM in Avocado test cases?

> 2022-11-22 03:28:18: {"timestamp": {"seconds": 1669105695, "microseconds":
> 981048}, "event": "BLOCK_IO_ERROR", "data": {"device": "", "nospace": false,
> "node-name": "drive_stg0", "reason": "Invalid argument", "operation":
> "write", "action": "report"}}

So there is in fact a failure on the host. EINVAL probably means that something went wrong with the request alignment.

Can we get an strace? It should both show the failing I/O request with EINVAL (with its offset and length, hopefully showing the misalignment) and the probing of the alignment when the image file is opened.

Comment 13 qing.wang 2022-11-23 08:31:37 UTC

I re-uploaded a new failed log:
http://fileshare.englab.nay.redhat.com/pub/section2/images_backup/qbugs/2141964/2022-11-22
(In reply to Kevin Wolf from comment #12)
> (In reply to qing.wang from comment #11)
> > There is error message .
> 
> Oh, thank you, this is important information. It seems that I looked in the
> wrong file.
> 
> What is the generally speaking the difference between
> qmpmonitor1-avocado-... and catch_monitor-avocado...? Are there two QMP
> monitors for the same VM in Avocado test cases?
> 
catch_monitor is used for heartbeat-style checking and recording screenshots;
qmpmonitor is used to record the QMP commands issued by the test cases.

> > 2022-11-22 03:28:18: {"timestamp": {"seconds": 1669105695, "microseconds":
> > 981048}, "event": "BLOCK_IO_ERROR", "data": {"device": "", "nospace": false,
> > "node-name": "drive_stg0", "reason": "Invalid argument", "operation":
> > "write", "action": "report"}}
> 
> So there is in fact a failure on the host. EINVAL probably means that
> something went wrong with the request alignment.
> 
> Can we get an strace? It should both show the failing I/O request with
> EINVAL (with its offset and length, hopefully showing the misalignment) and
> the probing of the alignment when the image file is opened.

I re-uploaded a new failed-case log; the full strace log is available at
http://fileshare.englab.nay.redhat.com/pub/section2/images_backup/qbugs/2141964/2022-11-22/strace.log.lz4

Comment 15 Hanna Czenczek 2022-12-07 09:31:07 UTC
Took a quick look at the strace (sorry a quick look took so long), and the only EINVALs it shows are the ones that occur during probing of the I/O and buffer alignment: They’re always ten pread64()s grouped together, the first five read lengths 1/512/1024/2048/4096 to get the I/O alignment (the 4096 one succeeds), the next five similarly read to buffers that are aligned to 1/512/1024/2048/4096 to get the buffer alignment (again, the 4096 one succeeds).  No other EINVALs occur, which may indicate that it came from within qemu.  I’ll try to look into those EINVAL-generating places (especially in qcow2) to see if anything stands out.
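The probing sequence described above can be sketched as follows. This is a simplified illustration of the behavior visible in the strace, not qemu's actual file-posix code; `try_pread` is a hypothetical stand-in for one pread64() attempt against the O_DIRECT-opened image file:

```python
def probe_alignment(try_pread, candidates=(1, 512, 1024, 2048, 4096)):
    """Return the smallest read size the O_DIRECT file accepts,
    mirroring the five increasing pread64() probes seen in the strace.
    try_pread(size) -> bool stands in for a real pread64() call that
    fails with EINVAL when `size` violates the device's alignment."""
    for size in candidates:
        if try_pread(size):
            return size
    return None  # no candidate worked; a fallback default would be used

# On a 4k-sector device, only the 4096-byte probe succeeds:
fake_4k_device = lambda size: size % 4096 == 0
assert probe_alignment(fake_4k_device) == 4096
```

The same loop is run twice in practice, once varying the read length (I/O alignment) and once varying the buffer's address alignment (memory alignment), which accounts for the ten grouped pread64() calls.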

Comment 17 Kevin Wolf 2023-03-01 10:27:27 UTC
I just noticed that this bug exists only for RHEL 8. Do we know if it can be reproduced on RHEL 9?

Comment 18 Hanna Czenczek 2023-03-02 17:32:43 UTC
Here’s what I’ve been trying so far:
- As written in comment 15, I’ve looked through the strace, but found no occurrences of EINVAL that would lead to an error visible to the guest or on the QMP monitor (the only instances I found occurred during probing of the I/O alignment).
- Thus I thought maybe the error is produced by qemu itself, so I’ve looked through the source code, at places where EINVAL is generated, but did not find anything that looked like it might cause the problem seen here.
- I’ve tried to reproduce the issue myself, but so far without any luck.  (To note, I don’t have a physical device with 4k sectors, so I had to resort to using `losetup --sector-size 4096`.)

Can you perhaps see whether the issue appears also in the following configurations?
A. Using a raw storage image instead of qcow2,
B. Adding {"logical_block_size": 4096, "physical_block_size": 4096} to the device_add QMP command that adds the virtio-blk device.
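Suggestion B would look roughly like the following QMP command (the `driver`, `drive`, and `id` values are illustrative and must match the test's actual setup; the block-size properties are standard virtio-blk device properties):

```json
{"execute": "device_add",
 "arguments": {"driver": "virtio-blk-pci",
               "drive": "drive_stg0",
               "id": "stg0",
               "logical_block_size": 4096,
               "physical_block_size": 4096}}
```

With a 4096-byte logical block size exposed to the guest, the guest kernel should itself only issue 4k-aligned requests, which would sidestep any host-side padding.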

Comment 19 Hanna Czenczek 2023-03-02 19:54:27 UTC
Correction, I was able to reproduce the issue now, and it comes from qemu_iovec_init_extended() (called through bdrv_pad_request()): The number of I/O vector elements required for the operation (1025) exceeds IOV_MAX (1024), a limitation that 4c002cef0e9abe7135d7916c51abce47f7fc1ee2 has introduced.

I’ll try to prepare a test build tomorrow with that commit reverted.

Question is, if that’s indeed the problem, how do we resolve this?  If the internal padding process required for a RMW cycle leads to exceeding the IOV_MAX limit, we (1) can’t really ignore it (reverting the commit seems wrong), and (2) we can’t really make it the guest’s problem, as we do now.  Splitting the write into two seems like an obvious solution, but that’s complicated and may lead to various issues I cannot yet fully foresee.  Joining a couple of adjacent I/O vector elements using a bounce buffer might be simple enough and help alleviate the issue.

Comment 21 qing.wang 2023-03-03 09:55:01 UTC
(In reply to Hanna Czenczek from comment #18)
> Here’s what I’ve been trying so far:
> - As written in comment 15, I’ve looked through the strace, but found no
> occurrences of EINVAL that would lead to an error visible to the guest or on
> the QMP monitor (the only instances I found occurred during probing of the
> I/O alignment).
> - Thus I thought maybe the error is produced by qemu itself, so I’ve looked
> through the source code, at places where EINVAL is generated, but did not
> find anything that looked like it might cause the problem seen here.
> - I’ve tried to reproduce the issue myself, but so far without any luck. 
> (To note, I don’t have a physical device with 4k sectors, so I had to resort
> to using `losetup --sector-size 4096`.)
> 
> Can you perhaps see whether the issue appears also in the following
> configurations?
> A. Using a raw storage image instead of qcow2,

The raw format still hits the same issue:

http://fileshare.hosts.qa.psi.pek2.redhat.com/pub/section2/images_backup/qbugs/2141964/2023-03-03/job-2023-03-03T02.42-745083f/

> B. Adding {"logical_block_size": 4096, "physical_block_size": 4096} to the
> device_add QMP command that adds the virtio-blk device.

Comment 22 Hanna Czenczek 2023-03-03 12:47:55 UTC
Created attachment 1947687 [details]
Simpler reproducer

I think the issue can be reproduced much more simply (assuming the build I posted over on BZ 2174676 does indeed “fix” it): Running a pwritev() in a guest with 1024 I/O vectors to an area that is only aligned to 512 bytes, while the host disk has 4k sectors, yields I/O errors in the guest (reported by the kernel).

I.e., use as follows:
$ qemu-system-x86_64 \
    -enable-kvm -m 1G \
    -blockdev host_device,node-name=stg,filename=/dev/loop0,cache.direct=on \
    -blockdev file,node-name=sys-prot,filename=${guest-os-disk}.qcow2 \
    -blockdev qcow2,node-name=sys-fmt,file=sys-prot \
    -device virtio-blk,drive=sys-fmt \
    -device virtio-blk,drive=stg \
    -qmp stdio
{"QMP": {"version": {"qemu": {"micro": 0, "minor": 2, "major": 7}, "package": "qemu-foo-bar"}, "capabilities": ["oob"]}}
{"execute":"qmp_capabilities"}
{"return": {}}

Then, in the guest:
$ sudo pwritev 512 1024 /dev/vdb
[   19.098402] I/O error, dev vdb, sector 1 op 0x1:(WRITE) flags 0xc800 phys_seg 126 prio class 2
[   19.098500] I/O error, dev vdb, sector 127 op 0x1:(WRITE) flags 0xc800 phys_seg 126 prio class 2
[   19.098530] I/O error, dev vdb, sector 253 op 0x1:(WRITE) flags 0x8800 phys_seg 4 prio class 2
[   19.098580] I/O error, dev vdb, sector 257 op 0x1:(WRITE) flags 0xc800 phys_seg 126 prio class 2
[   19.098620] I/O error, dev vdb, sector 383 op 0x1:(WRITE) flags 0xc800 phys_seg 126 prio class 2
[   19.098650] I/O error, dev vdb, sector 509 op 0x1:(WRITE) flags 0x8800 phys_seg 4 prio class 2
[   19.098684] I/O error, dev vdb, sector 513 op 0x1:(WRITE) flags 0xc800 phys_seg 126 prio class 2
[   19.098732] I/O error, dev vdb, sector 639 op 0x1:(WRITE) flags 0xc800 phys_seg 126 prio class 2
[   19.098762] I/O error, dev vdb, sector 765 op 0x1:(WRITE) flags 0x8800 phys_seg 4 prio class 2
[   19.098797] I/O error, dev vdb, sector 769 op 0x1:(WRITE) flags 0xc800 phys_seg 126 prio class 2
Writing failed: Input/output error

And in the QMP monitor (the "reason" text "Das Argument ist ung\u00FCltig" is the German locale's rendering of "Invalid argument"):

{"timestamp": {"seconds": 1677847471, "microseconds": 716262}, "event": "BLOCK_IO_ERROR", "data": {"device": "", "nospace": false, "node-name": "stg", "reason": "Das Argument ist ung\u00FCltig", "operation": "write", "action": "report"}}
{"timestamp": {"seconds": 1677847471, "microseconds": 716347}, "event": "BLOCK_IO_ERROR", "data": {"device": "", "nospace": false, "node-name": "stg", "reason": "Das Argument ist ung\u00FCltig", "operation": "write", "action": "report"}}
{"timestamp": {"seconds": 1677847471, "microseconds": 716384}, "event": "BLOCK_IO_ERROR", "data": {"device": "", "nospace": false, "node-name": "stg", "reason": "Das Argument ist ung\u00FCltig", "operation": "write", "action": "report"}}
{"timestamp": {"seconds": 1677847471, "microseconds": 716424}, "event": "BLOCK_IO_ERROR", "data": {"device": "", "nospace": false, "node-name": "stg", "reason": "Das Argument ist ung\u00FCltig", "operation": "write", "action": "report"}}
{"timestamp": {"seconds": 1677847471, "microseconds": 716464}, "event": "BLOCK_IO_ERROR", "data": {"device": "", "nospace": false, "node-name": "stg", "reason": "Das Argument ist ung\u00FCltig", "operation": "write", "action": "report"}}
{"timestamp": {"seconds": 1677847471, "microseconds": 716498}, "event": "BLOCK_IO_ERROR", "data": {"device": "", "nospace": false, "node-name": "stg", "reason": "Das Argument ist ung\u00FCltig", "operation": "write", "action": "report"}}
{"timestamp": {"seconds": 1677847471, "microseconds": 716536}, "event": "BLOCK_IO_ERROR", "data": {"device": "", "nospace": false, "node-name": "stg", "reason": "Das Argument ist ung\u00FCltig", "operation": "write", "action": "report"}}
{"timestamp": {"seconds": 1677847471, "microseconds": 716577}, "event": "BLOCK_IO_ERROR", "data": {"device": "", "nospace": false, "node-name": "stg", "reason": "Das Argument ist ung\u00FCltig", "operation": "write", "action": "report"}}
{"timestamp": {"seconds": 1677847471, "microseconds": 716610}, "event": "BLOCK_IO_ERROR", "data": {"device": "", "nospace": false, "node-name": "stg", "reason": "Das Argument ist ung\u00FCltig", "operation": "write", "action": "report"}}
{"timestamp": {"seconds": 1677847471, "microseconds": 716647}, "event": "BLOCK_IO_ERROR", "data": {"device": "", "nospace": false, "node-name": "stg", "reason": "Das Argument ist ung\u00FCltig", "operation": "write", "action": "report"}}
{"timestamp": {"seconds": 1677847471, "microseconds": 716687}, "event": "BLOCK_IO_ERROR", "data": {"device": "", "nospace": false, "node-name": "stg", "reason": "Das Argument ist ung\u00FCltig", "operation": "write", "action": "report"}}
{"timestamp": {"seconds": 1677847471, "microseconds": 716725}, "event": "BLOCK_IO_ERROR", "data": {"device": "", "nospace": false, "node-name": "stg", "reason": "Das Argument ist ung\u00FCltig", "operation": "write", "action": "report"}}

Comment 23 Hanna Czenczek 2023-03-03 12:52:32 UTC
Addendum to comment 22: Forgot to mention the prerequisite steps to get a /dev/loop0 to test:
$ truncate -s 1G ~/tmp/fs.img
$ sudo losetup -f --show --sector-size 4096 --direct-io=on ~/tmp/fs.img
/dev/loop0


Alternatively, the issue can also be seen entirely without a guest:

$ truncate -s 1G ~/tmp/fs.img
$ sudo losetup -f --show --sector-size 4096 --direct-io=on ~/tmp/fs.img
/dev/loop0
$ qemu-io -f raw -t none -c "writev -P 42 512 $(yes 512 | head -n 1024 | tr '\n' ' ')" /dev/loop0
writev failed: Invalid argument
$ qemu-io -f raw -t none -c "writev -P 42 4096 $(yes 512 | head -n 1024 | tr '\n' ' ')" /dev/loop0
wrote 524288/524288 bytes at offset 4096
512 KiB, 1 ops; 00.00 sec (758.435 MiB/sec and 1516.8706 ops/sec)

Comment 25 Yanan Fu 2023-07-26 10:19:48 UTC
QE bot(pre verify): Set 'Verified:Tested,SanityOnly' as gating/tier1 test pass.

Comment 38 qing.wang 2023-07-31 08:14:22 UTC
Passed on 

Red Hat Enterprise Linux release 8.9 Beta (Ootpa)
4.18.0-504.el8.x86_64
qemu-kvm-6.2.0-37.module+el8.9.0+19491+15e62c0a.x86_64
seabios-bin-1.16.0-3.module+el8.9.0+18724+20190c23.noarch
edk2-ovmf-20220126gitbb1bba3d77-5.el8.noarch
libvirt-8.0.0-21.module+el8.9.0+19166+e262ca96.x86_64
virtio-win-prewhql-0.1-239.iso


python ConfigTest.py --testcase=hotplug_unplug_during_io_repeat.default,block_io_with_unaligned_offset --iothread_scheme=roundrobin --nr_iothreads=2 --platform=x86_64 --guestname=RHEL.9.3.0 --driveformat=virtio_blk --imageformat=qcow2 --machines=q35 --customsparams="vm_mem_limit = 12G\nimage_aio=native"  --firmware=default_bios --netdst=virbr0 --nrepeat=20

Comment 41 qing.wang 2023-08-01 08:17:25 UTC
Verified according to the results in comment #38.

Comment 43 errata-xmlrpc 2023-11-14 15:33:25 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: virt:rhel and virt-devel:rhel security, bug fix, and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2023:6980