Bug 1738839
| Summary: | I/O error when virtio-blk disk is backed by a raw image on 4k disk | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 8 | Reporter: | Dan Horák <dhorak> |
| Component: | qemu-kvm | Assignee: | Thomas Huth <thuth> |
| Status: | CLOSED ERRATA | QA Contact: | Xueqiang Wei <xuwei> |
| Severity: | medium | Docs Contact: | |
| Priority: | high | | |
| Version: | 8.0 | CC: | areis, cohuck, coli, dbenoit, dgibson, dzheng, juzhang, lcapitulino, ngu, qzhang, rbalakri, smitterl, thuth, virt-maint, wchadwic, zhenyzha |
| Target Milestone: | rc | Keywords: | Patch |
| Target Release: | 8.1 | | |
| Hardware: | All | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | qemu-kvm-2.12.0-86.module+el8.1.0+4146+4ed2d185 | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | | |
| | 1749134 | Environment: | |
| Last Closed: | 2019-11-05 20:51:11 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 1749134 | | |
Description (Dan Horák, 2019-08-08 09:15:23 UTC)
FWIW this affects the "OCP on z" high-priority project.

We could reproduce the bz with the following steps:
1. Install a guest on a qcow2/virtio-blk-ccw disk image.
2. Boot up a guest with the above image and a raw/virtio-blk-ccw data disk image.
3. Try to mkfs.xfs the data disk; the bug is reproduced. Running mkfs.ext4 afterwards succeeds.

[root@localhost ~]# ll /dev/vd*
brw-rw----. 1 root disk 252,  0 Aug  9 15:20 /dev/vda
brw-rw----. 1 root disk 252,  1 Aug  9 15:20 /dev/vda1
brw-rw----. 1 root disk 252,  2 Aug  9 15:20 /dev/vda2
brw-rw----. 1 root disk 252, 16 Aug  9 15:20 /dev/vdb

[root@localhost ~]# mkfs.xfs /dev/vdb
meta-data=/dev/vdb               isize=512    agcount=4, agsize=131072 blks
         =                       sectsz=512   attr=2, projid32bit=1
         =                       crc=1        finobt=1, sparse=1, rmapbt=0
         =                       reflink=1
data     =                       bsize=4096   blocks=524288, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0, ftype=1
log      =internal log           bsize=4096   blocks=2560, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
[  267.368700] print_req_error: I/O error, dev vdb, sector 4194048 flags 8801
mkfs.xfs: pwrite failed: Input/output error

[root@localhost ~]# mkfs.ext4 /dev/vdb
mke2fs 1.44.3 (10-July-2018)
Creating filesystem with 524288 4k blocks and 131072 inodes
Filesystem UUID: 57d3d5f3-d058-41c2-aeac-ee6d6722d668
Superblock backups stored on blocks:
        32768, 98304, 163840, 229376, 294912

Allocating group tables: done
Writing inode tables: done
Creating journal (16384 blocks): done
Writing superblocks and filesystem accounting information: done

The sw version:
Host kernel: 4.18.0-107.el8.s390x
Guest kernel: 4.18.0-80.el8.s390x
Qemu-kvm: qemu-kvm-2.12.0-81.module+el8.1.0+3619+dfe1ae01.s390x

Will have a try with the latest qemu.

Likely this is a deficiency in qemu: I've retried the same commands (VM installation) on a Fedora 30 host with qemu-system-s390x-3.1.1-1.fc30.s390x and it seems to work OK.

Last week Zhenyu talked about this bug: it is reproduced on s390x only and cannot be reproduced on ppc64le or x86. Tested the same steps on Power9; did not hit this issue.

[root@dhcp19-129-145 ~]# lsblk
NAME                          MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
vda                           252:0    0  200G  0 disk
├─vda1                        252:1    0    4M  0 part
├─vda2                        252:2    0    1G  0 part /boot
└─vda3                        252:3    0  199G  0 part
  ├─rhel_dhcp19--129--61-root 253:0    0   50G  0 lvm  /
  ├─rhel_dhcp19--129--61-swap 253:1    0   10G  0 lvm  [SWAP]
  └─rhel_dhcp19--129--61-home 253:2    0  139G  0 lvm  /home
vdb                           252:16   0   10G  0 disk
[root@dhcp19-129-145 ~]# cd /home/
[root@dhcp19-129-145 home]# mkfs.xfs /dev/vdb
meta-data=/dev/vdb               isize=512    agcount=4, agsize=655360 blks
         =                       sectsz=512   attr=2, projid32bit=1
         =                       crc=1        finobt=1, sparse=1, rmapbt=0
         =                       reflink=1
data     =                       bsize=4096   blocks=2621440, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0, ftype=1
log      =internal log           bsize=4096   blocks=2560, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
[root@dhcp19-129-145 home]# dmesg | grep vdb
[    1.712776] virtio_blk virtio2: [vdb] 20971520 512-byte logical blocks (10.7 GB/10.0 GiB)

The sw version:
Host kernel: 4.18.0-128.el8.ppc64le
Guest kernel: 4.18.0-128.el8.ppc64le
Qemu-kvm: qemu-kvm-2.12.0-83.module+el8.1.0+3852+0ba8aef0

Will have a try with the latest qemu on S390.

Hi! What kind of file system and device is used on the host for /var/lib/libvirt? DASD? FCP SCSI disks? XFS? Ext4?
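A quick way to collect that host-side information (a sketch only; the device name /dev/dasda and the images path below are examples, substitute whatever actually backs /var/lib/libvirt on the affected host):

# findmnt -T /var/lib/libvirt/images      # filesystem type and source device
# lsblk -t                                # LOG-SEC/PHY-SEC topology of all disks
# blockdev --getss --getpbsz /dev/dasda   # logical/physical sector size of one disk

On a DASD-backed host, blockdev should report 4096 for both values; with cache=none QEMU opens the image with O_DIRECT, so that 4k granularity also applies to its accesses to the raw file.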
(In reply to Thomas Huth from comment #8)
> Hi! What kind of file system and device is used on the host for
> /var/lib/libvirt ? DASD? FCP SCSI disks? XFS? Ext4?

It was a default RHEL-8 install done by beaker, but I've already reinstalled the guest. IIRC the host was a z/VM RHEL-8 guest with XFS on LVM on DASDs.

Hi Dan, I think this might be related to the fact that DASDs use a block size of 4096, but mkfs.xfs tries to use "sectsz=512" here. Since you've specified "cache=none", the guest has to use 4096, too. Could you please try whether the problem goes away when you use "cache=writeback" in the command line on your host instead? Also, when this was working for you with Fedora 30, did you maybe use another kind of disk (SCSI instead of DASD) or another caching mode there? Otherwise I really wonder why this worked differently there...

The guest is the same for both rhel8 and f30, but you are right: the rhel8 host used xfs for the image location, while f30 has ext4. I'll recheck ASAP, keeping needinfo till then.

The problem likely only occurs with mkfs.xfs (and not with mkfs.ext4), since this program tries to erase some blocks at the end of the partition while it is preparing the device:

https://git.kernel.org/pub/scm/fs/xfs/xfsprogs-dev.git/tree/mkfs/xfs_mkfs.c?h=libxfs-5.0-sync#n3274

This then causes these "print_req_error: I/O error, dev vdb, sector" failures - apparently this ends up in a block write beyond the end of the device.

Now what's really weird is that the problem only occurs when the raw disk is still blank. For example, when I do this (with a pre-installed RHEL8 guest):

qemu-img create -f raw $HOME/test.raw 10G
/usr/libexec/qemu-kvm -drive file=$HOME/rhel8.qcow2,if=none,id=disk1 \
 -device virtio-blk-ccw,drive=disk1 -nographic -m 1G \
 -drive file=$HOME/test.raw,format=raw,if=none,id=disk2,cache=none \
 -device virtio-blk-ccw,drive=disk2

and then in the guest:

parted /dev/vdb mktable gpt
parted /dev/vdb mkpart primary xfs "0%" "100%"
mkfs.xfs /dev/vdb1

I then get the error "print_req_error: I/O error, dev vdb, sector 20969216 flags 8801". But if I then shut down the guest and start it again (a reboot is not enough), mkfs.xfs works like a charm. It also works fine if I already prepare the partition on the host before starting QEMU:

qemu-img create -f raw $HOME/test.raw 10G
parted $HOME/test.raw mktable gpt
parted $HOME/test.raw mkpart primary xfs "0%" "100%"

So it seems like either the virtio driver of the guest kernel or QEMU gets something wrong here if the disk image is initially blank...

After hacking through the code of mkfs.xfs and the virtio-block code in the guest kernel, I finally came to the conclusion that the problem is indeed likely in QEMU. And after updating my local QEMU git tree, I discovered that the problem goes away with the latest and greatest version from the master branch! I bisected the issue, and the fix for the problem is this commit:

https://github.com/qemu/qemu/commit/a6b257a08e3d72219f03e461a52152672fec0612

*** Bug 1744207 has been marked as a duplicate of this bug. ***
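The alignment constraint behind this can be observed from the host side without QEMU at all: cache=none makes QEMU open the image with O_DIRECT, and Linux typically rejects direct I/O that is not aligned to the logical sector size of the underlying device. A minimal sketch, assuming the raw file sits on a filesystem backed by a 4k-sector disk (the path is an example):

# qemu-img create -f raw /mnt/test/blank.raw 10G
# dd if=/dev/zero of=/mnt/test/blank.raw bs=512 seek=1 count=1 \
     conv=notrunc oflag=direct
  # 512-byte-granular direct write: should fail with "Invalid argument" (EINVAL)
# dd if=/dev/zero of=/mnt/test/blank.raw bs=4096 seek=0 count=1 \
     conv=notrunc oflag=direct
  # the same write at 4k granularity should go through

QEMU has to discover this per-image alignment at open time, which is apparently what goes wrong for blank sparse raw files and what the commit referenced above addresses.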
(In reply to Dan Horák from comment #0)
> Description of problem:
>
> [anaconda root@localhost ~]# fdisk -l /dev/vda
> Disk /dev/vda: 10 GiB, 10737418240 bytes, 20971520 sectors
> Units: sectors of 1 * 512 = 512 bytes
> Sector size (logical/physical): 512 bytes / 512 bytes
> I/O size (minimum/optimal): 512 bytes / 512 bytes
> Disklabel type: dos
> Disk identifier: 0x31a91458
>
> Device     Boot Start     End Sectors Size Id Type
> /dev/vda1       2048 2099199 2097152   1G 83 Linux
>
> [anaconda root@localhost ~]# dmesg | grep vda
> [    9.549329] virtio_blk virtio0: [vda] 20971520 512-byte logical blocks
> (10.7 GB/10.0 GiB)
>
> [  631.327503] vda:
> [  636.477787] vda:
> [  653.054900] print_req_error: I/O error, dev vda, sector 2048 flags 0
> [  653.056996] print_req_error: I/O error, dev vda, sector 2098944 flags 8801
> [ 1618.702294] print_req_error: I/O error, dev vda, sector 2098944 flags 8801

Hi Thomas, I saw that the fix is about 4k sector sizes, but the description in comment 0 says 512 bytes. Could you please help explain this? Thanks.

(In reply to CongLi from comment #26)
[...]
> I saw the fix is about 4k sector sizes, but the description in comment 0
> is 512 bytes.

The guest wants to use 512-byte sectors, but the raw disk image on the host is located on a DASD disk with 4k sectors. So it's about the way QEMU deals with the disk image on the host, which is 4k. Without the fix for this BZ, QEMU was not able to detect this properly with sparse raw files, so it tried to access the disk image in a wrong way, leading to an error which it then passed on to the guest.
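That split can be seen directly with blockdev, run on both sides (a sketch using the device names from the x86 reproduction in the next comment):

host# blockdev --getss --getpbsz /dev/sdc    # should print 4096 twice, per fdisk below
guest# blockdev --getss --getpbsz /dev/vdb   # should print 512 twice

The guest-side 512 is fine by itself; the bug was purely in how QEMU accessed the 4k-backed raw file underneath.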
Reproduced on qemu-kvm-2.12.0-85.module+el8.1.0+4066+0f1aadab. Details as below:

Host:
kernel-4.18.0-137.el8.x86_64
qemu-kvm-2.12.0-85.module+el8.1.0+4066+0f1aadab

Guest:
kernel-4.18.0-135.el8.x86_64

1. Create a raw image on a 4k disk on the host (e.g. sdc):

# fdisk -l /dev/sdc
Disk /dev/sdc: 558.4 GiB, 599550590976 bytes, 146374656 sectors
Units: sectors of 1 * 4096 = 4096 bytes
Sector size (logical/physical): 4096 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disklabel type: dos
Disk identifier: 0xa2466bb0

Device     Boot  Start       End   Sectors   Size Id Type
/dev/sdc1  *       256    262399    262144     1G 83 Linux
/dev/sdc2       262400 146374655 146112256 557.4G 8e Linux LVM

# mkdir /mnt/test
# mount /dev/sdc1 /mnt/test/
# qemu-img create -f raw /mnt/test/test.raw 1G

2. Boot the guest with the command line below:

/usr/libexec/qemu-kvm \
 -S \
 -name 'avocado-vt-vm1' \
 -machine q35 \
 -nodefaults \
 -device VGA,bus=pcie.0,addr=0x1 \
 -chardev socket,id=qmp_id_qmpmonitor1,path=/var/tmp/avocado_vkzzzsjy/monitor-qmpmonitor1-20190827-054125-X8YHvELh,server,nowait \
 -mon chardev=qmp_id_qmpmonitor1,mode=control \
 -chardev socket,id=qmp_id_catch_monitor,path=/var/tmp/avocado_vkzzzsjy/monitor-catch_monitor-20190827-054125-X8YHvELh,server,nowait \
 -mon chardev=qmp_id_catch_monitor,mode=control \
 -device pvpanic,ioport=0x505,id=idnZn1j7 \
 -chardev socket,nowait,server,path=/var/tmp/avocado_vkzzzsjy/serial-serial0-20190827-054125-X8YHvELh,id=chardev_serial0 \
 -device isa-serial,id=serial0,chardev=chardev_serial0 \
 -chardev socket,id=seabioslog_id_20190827-054125-X8YHvELh,path=/var/tmp/avocado_vkzzzsjy/seabios-20190827-054125-X8YHvELh,server,nowait \
 -device isa-debugcon,chardev=seabioslog_id_20190827-054125-X8YHvELh,iobase=0x402 \
 -device pcie-root-port,id=pcie.0-root-port-2,slot=2,chassis=2,addr=0x2,bus=pcie.0 \
 -device qemu-xhci,id=usb1,bus=pcie.0-root-port-2,addr=0x0 \
 -drive id=drive_image1,if=none,snapshot=off,cache=none,format=qcow2,file=/home/kvm_autotest_root/images/rhel810-64-virtio.qcow2 \
 -device pcie-root-port,id=pcie.0-root-port-3,slot=3,chassis=3,addr=0x3,bus=pcie.0 \
 -device virtio-blk-pci,id=image1,drive=drive_image1,bootindex=0,bus=pcie.0-root-port-3,addr=0x0 \
 -device pcie-root-port,id=pcie.0-root-port-4,slot=4,chassis=4,addr=0x4,bus=pcie.0 \
 -device virtio-net-pci,mac=9a:c8:3a:2f:3f:1c,id=idXNk6ZE,netdev=id595yhy,bus=pcie.0-root-port-4,addr=0x0 \
 -netdev tap,id=id595yhy,vhost=on \
 -m 14336 \
 -smp 24,maxcpus=24,cores=12,threads=1,sockets=2 \
 -cpu 'Skylake-Server',+kvm_pv_unhalt \
 -device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1 \
 -vnc :0 \
 -rtc base=localtime,clock=host,driftfix=slew \
 -boot order=cdn,once=c,menu=off,strict=off \
 -enable-kvm \
 -monitor stdio \
 -drive id=drive_data,if=none,snapshot=off,cache=none,format=raw,file=/mnt/test/test.raw \
 -device pcie-root-port,id=pcie.0-root-port-5,slot=5,chassis=5,addr=0x5,bus=pcie.0 \
 -device virtio-blk-pci,id=data1,drive=drive_data,bus=pcie.0-root-port-5,addr=0x0

3. Create a partition in the guest and format it:

# lsblk
NAME                             MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
vda                              252:0    0  20G  0 disk
├─vda1                           252:1    0   1G  0 part /boot
└─vda2                           252:2    0  19G  0 part
  ├─rhel_bootp--73--75--125-root 253:0    0  17G  0 lvm  /
  └─rhel_bootp--73--75--125-swap 253:1    0   2G  0 lvm  [SWAP]
vdb                              252:16   0   1G  0 disk

# fdisk -l /dev/vdb
Disk /dev/vdb: 1 GiB, 1073741824 bytes, 2097152 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes

# parted /dev/vdb mktable gpt
# parted /dev/vdb mkpart primary xfs "0%" "100%"
# mkfs.xfs /dev/vdb1
meta-data=/dev/vdb1              isize=512    agcount=4, agsize=65408 blks
         =                       sectsz=512   attr=2, projid32bit=1
         =                       crc=1        finobt=1, sparse=1, rmapbt=0
         =                       reflink=1
data     =                       bsize=4096   blocks=261632, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0, ftype=1
log      =internal log           bsize=4096   blocks=1566, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
mkfs.xfs: pwrite failed: Input/output error

/var/log/messages:
Sep  4 13:52:33 localhost kernel: vdb: vdb1
Sep  4 13:52:43 localhost kernel: print_req_error: I/O error, dev vdb, sector 2094848 flags 8801

After step 3, hit this issue.

Retested on qemu-kvm-2.12.0-86.module+el8.1.0+4146+4ed2d185, did not hit this issue:
After step 3, the format succeeds without any error:

# parted /dev/vdb mktable gpt
# parted /dev/vdb mkpart primary xfs "0%" "100%"
# mkfs.xfs /dev/vdb1
meta-data=/dev/vdb1              isize=512    agcount=4, agsize=65408 blks
         =                       sectsz=512   attr=2, projid32bit=1
         =                       crc=1        finobt=1, sparse=1, rmapbt=0
         =                       reflink=1
data     =                       bsize=4096   blocks=261632, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0, ftype=1
log      =internal log           bsize=4096   blocks=1566, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0

# dmesg | grep vdb
[    2.602719] virtio_blk virtio2: [vdb] 2097152 512-byte logical blocks (1.07 GB/1.00 GiB)
[  195.485260] vdb:
[  230.093727] vdb: vdb1

# mount /dev/vdb1 /mnt/
# lsblk
NAME                             MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
vda                              252:0    0   20G  0 disk
├─vda1                           252:1    0    1G  0 part /boot
└─vda2                           252:2    0   19G  0 part
  ├─rhel_bootp--73--75--125-root 253:0    0   17G  0 lvm  /
  └─rhel_bootp--73--75--125-swap 253:1    0    2G  0 lvm  [SWAP]
vdb                              252:16   0    1G  0 disk
└─vdb1                           252:17   0 1022M  0 part /mnt

# dmesg | grep vdb
[    2.602719] virtio_blk virtio2: [vdb] 2097152 512-byte logical blocks (1.07 GB/1.00 GiB)
[  195.485260] vdb:
[  230.093727] vdb: vdb1
[  922.793141] XFS (vdb1): Mounting V5 Filesystem
[  922.802219] XFS (vdb1): Ending clean mount

So set status to VERIFIED.

The upstream fix here went in after the v4.1 release, so AFAICT this will also affect the qemu-4.1 based RHEL-AV version. I think we may be seeing this in bug 1747110, although I think the main problems there are probably unrelated to this. Cloning for RHEL-AV.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2019:3345
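For reference, which upstream releases already contain the fix can be checked against a qemu source clone; consistent with the note above that the commit landed after v4.1, only later tags should be listed:

$ git clone https://github.com/qemu/qemu.git && cd qemu
$ git tag --contains a6b257a08e3d72219f03e461a52152672fec0612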