Bug 1738839
| Summary: | I/O error when virtio-blk disk is backed by a raw image on 4k disk | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 8 | Reporter: | Dan Horák <dhorak> |
| Component: | qemu-kvm | Assignee: | Thomas Huth <thuth> |
| Status: | CLOSED ERRATA | QA Contact: | Xueqiang Wei <xuwei> |
| Severity: | medium | Docs Contact: | |
| Priority: | high | | |
| Version: | 8.0 | CC: | areis, cohuck, coli, dbenoit, dgibson, dzheng, juzhang, lcapitulino, ngu, qzhang, rbalakri, smitterl, thuth, virt-maint, wchadwic, zhenyzha |
| Target Milestone: | rc | Keywords: | Patch |
| Target Release: | 8.1 | | |
| Hardware: | All | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | qemu-kvm-2.12.0-86.module+el8.1.0+4146+4ed2d185 | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | | |
| | 1749134 | Environment: | |
| Last Closed: | 2019-11-05 20:51:11 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 1749134 | | |
Description (Dan Horák, 2019-08-08 09:15:23 UTC)
FWIW this affects the "OCP on z" high-priority project.

We could reproduce the bz with the following steps:
1. Install a guest on a qcow2/virtio-blk-ccw disk image.
2. Boot up a guest with the above image and a raw/virtio-blk-ccw data disk image.
3. Try to mkfs.xfs the data disk; the bug is reproduced. Running mkfs.ext4 afterwards succeeds.

[root@localhost ~]# ll /dev/vd*
brw-rw----. 1 root disk 252,  0 Aug  9 15:20 /dev/vda
brw-rw----. 1 root disk 252,  1 Aug  9 15:20 /dev/vda1
brw-rw----. 1 root disk 252,  2 Aug  9 15:20 /dev/vda2
brw-rw----. 1 root disk 252, 16 Aug  9 15:20 /dev/vdb

[root@localhost ~]# mkfs.xfs /dev/vdb
meta-data=/dev/vdb               isize=512    agcount=4, agsize=131072 blks
         =                       sectsz=512   attr=2, projid32bit=1
         =                       crc=1        finobt=1, sparse=1, rmapbt=0
         =                       reflink=1
data     =                       bsize=4096   blocks=524288, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0, ftype=1
log      =internal log           bsize=4096   blocks=2560, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
[  267.368700] print_req_error: I/O error, dev vdb, sector 4194048 flags 8801
mkfs.xfs: pwrite failed: Input/output error

[root@localhost ~]# mkfs.ext4 /dev/vdb
mke2fs 1.44.3 (10-July-2018)
Creating filesystem with 524288 4k blocks and 131072 inodes
Filesystem UUID: 57d3d5f3-d058-41c2-aeac-ee6d6722d668
Superblock backups stored on blocks:
        32768, 98304, 163840, 229376, 294912

Allocating group tables: done
Writing inode tables: done
Creating journal (16384 blocks): done
Writing superblocks and filesystem accounting information: done

The sw version:
Host kernel: 4.18.0-107.el8.s390x
Guest kernel: 4.18.0-80.el8.s390x
Qemu-kvm: qemu-kvm-2.12.0-81.module+el8.1.0+3619+dfe1ae01.s390x

Will have a try with the latest qemu.

Likely this is a deficiency in qemu: I've retried the same commands (VM installation) on a Fedora 30 host with qemu-system-s390x-3.1.1-1.fc30.s390x and it seems to work OK.

Last week Zhenyu talked about this bug: it is reproduced on s390x only and cannot be reproduced on ppc64le or x86. Tested the same steps on Power9; did not hit this issue.

[root@dhcp19-129-145 ~]# lsblk
NAME                          MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
vda                           252:0    0  200G  0 disk
├─vda1                        252:1    0    4M  0 part
├─vda2                        252:2    0    1G  0 part /boot
└─vda3                        252:3    0  199G  0 part
  ├─rhel_dhcp19--129--61-root 253:0    0   50G  0 lvm  /
  ├─rhel_dhcp19--129--61-swap 253:1    0   10G  0 lvm  [SWAP]
  └─rhel_dhcp19--129--61-home 253:2    0  139G  0 lvm  /home
vdb                           252:16   0   10G  0 disk
[root@dhcp19-129-145 ~]# cd /home/
[root@dhcp19-129-145 home]# mkfs.xfs /dev/vdb
meta-data=/dev/vdb               isize=512    agcount=4, agsize=655360 blks
         =                       sectsz=512   attr=2, projid32bit=1
         =                       crc=1        finobt=1, sparse=1, rmapbt=0
         =                       reflink=1
data     =                       bsize=4096   blocks=2621440, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0, ftype=1
log      =internal log           bsize=4096   blocks=2560, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
[root@dhcp19-129-145 home]# dmesg | grep vdb
[    1.712776] virtio_blk virtio2: [vdb] 20971520 512-byte logical blocks (10.7 GB/10.0 GiB)

The sw version:
Host kernel: 4.18.0-128.el8.ppc64le
Guest kernel: 4.18.0-128.el8.ppc64le
Qemu-kvm: qemu-kvm-2.12.0-83.module+el8.1.0+3852+0ba8aef0

Will have a try with the latest qemu on S390.

Hi! What kind of file system and device is used on the host for /var/lib/libvirt? DASD? FCP SCSI disks? XFS? Ext4?
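A quick way to collect that host-side information (a sketch only; the device name /dev/dasda and the images path below are examples, substitute whatever actually backs /var/lib/libvirt on the affected host):

# findmnt -T /var/lib/libvirt/images      # filesystem type and source device
# lsblk -t                                # LOG-SEC/PHY-SEC topology of all disks
# blockdev --getss --getpbsz /dev/dasda   # logical/physical sector size of one disk

On a DASD-backed host, blockdev should report 4096 for both values; with cache=none QEMU opens the image with O_DIRECT, so that 4k granularity also applies to its accesses to the raw file.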
(In reply to Thomas Huth from comment #8)
> Hi! What kind of file system and device is used on the host for
> /var/lib/libvirt ? DASD? FCP SCSI disks? XFS? Ext4?

It was a default RHEL-8 install done by beaker, but I've already reinstalled the guest. IIRC the host was a z/VM RHEL-8 guest with XFS on LVM on DASDs.

Hi Dan, I think this might be related to the fact that DASDs use a block size of 4096, but mkfs.xfs tries to use "sectsz=512" here. Since you've specified "cache=none", the guest has to use 4096, too. Could you please try whether the problem goes away when you use "cache=writeback" in the command line on your host instead? Also, when this was working for you with Fedora 30, did you maybe use another kind of disk (SCSI instead of DASD) or another caching mode there? Otherwise I really wonder why this worked differently there...

The guest is the same for both rhel8 and f30, but you are right: the rhel8 host used xfs for the image location, while f30 has ext4. I'll recheck ASAP, keeping needinfo till then.

The problem likely only occurs with mkfs.xfs (and not with mkfs.ext4), since this program tries to erase some blocks at the end of the partition while it is preparing the device:

https://git.kernel.org/pub/scm/fs/xfs/xfsprogs-dev.git/tree/mkfs/xfs_mkfs.c?h=libxfs-5.0-sync#n3274

This then causes these "print_req_error: I/O error, dev vdb, sector" failures - apparently this ends up in a block write beyond the end of the device.

Now what's really weird is that the problem only occurs when the raw disk is still blank. For example, when I do this (with a pre-installed RHEL8 guest):

qemu-img create -f raw $HOME/test.raw 10G
/usr/libexec/qemu-kvm -drive file=$HOME/rhel8.qcow2,if=none,id=disk1 \
 -device virtio-blk-ccw,drive=disk1 -nographic -m 1G \
 -drive file=$HOME/test.raw,format=raw,if=none,id=disk2,cache=none \
 -device virtio-blk-ccw,drive=disk2

and then in the guest:

parted /dev/vdb mktable gpt
parted /dev/vdb mkpart primary xfs "0%" "100%"
mkfs.xfs /dev/vdb1

I then get the error "print_req_error: I/O error, dev vdb, sector 20969216 flags 8801". But if I then shut down the guest and start it again (a reboot is not enough), mkfs.xfs works like a charm. It also works fine if I already prepare the partition on the host before starting QEMU:

qemu-img create -f raw $HOME/test.raw 10G
parted $HOME/test.raw mktable gpt
parted $HOME/test.raw mkpart primary xfs "0%" "100%"

So it seems like either the virtio driver of the guest kernel or QEMU gets something wrong here if the disk image is initially blank...

After hacking through the code of mkfs.xfs and the virtio-block code in the guest kernel, I finally came to the conclusion that the problem is indeed likely in QEMU. And after updating my local QEMU git tree, I discovered that the problem goes away with the latest and greatest version from the master branch! I bisected the issue, and the fix for the problem is this commit:

https://github.com/qemu/qemu/commit/a6b257a08e3d72219f03e461a52152672fec0612

*** Bug 1744207 has been marked as a duplicate of this bug. ***
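The alignment constraint behind this can be observed from the host side without QEMU at all: cache=none makes QEMU open the image with O_DIRECT, and Linux typically rejects direct I/O that is not aligned to the logical sector size of the underlying device. A minimal sketch, assuming the raw file sits on a filesystem backed by a 4k-sector disk (the path is an example):

# qemu-img create -f raw /mnt/test/blank.raw 10G
# dd if=/dev/zero of=/mnt/test/blank.raw bs=512 seek=1 count=1 \
     conv=notrunc oflag=direct
  # 512-byte-granular direct write: should fail with "Invalid argument" (EINVAL)
# dd if=/dev/zero of=/mnt/test/blank.raw bs=4096 seek=0 count=1 \
     conv=notrunc oflag=direct
  # the same write at 4k granularity should go through

QEMU has to discover this per-image alignment at open time, which is apparently what goes wrong for blank sparse raw files and what the commit referenced above addresses.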
(In reply to Dan Horák from comment #0)
> Description of problem:
>
> [anaconda root@localhost ~]# fdisk -l /dev/vda
> Disk /dev/vda: 10 GiB, 10737418240 bytes, 20971520 sectors
> Units: sectors of 1 * 512 = 512 bytes
> Sector size (logical/physical): 512 bytes / 512 bytes
> I/O size (minimum/optimal): 512 bytes / 512 bytes
> Disklabel type: dos
> Disk identifier: 0x31a91458
>
> Device     Boot Start     End Sectors Size Id Type
> /dev/vda1       2048 2099199 2097152   1G 83 Linux
>
> [anaconda root@localhost ~]# dmesg | grep vda
> [    9.549329] virtio_blk virtio0: [vda] 20971520 512-byte logical blocks
> (10.7 GB/10.0 GiB)
>
> [  631.327503] vda:
> [  636.477787] vda:
> [  653.054900] print_req_error: I/O error, dev vda, sector 2048 flags 0
> [  653.056996] print_req_error: I/O error, dev vda, sector 2098944 flags 8801
> [ 1618.702294] print_req_error: I/O error, dev vda, sector 2098944 flags 8801

Hi Thomas, I saw that the fix is about 4k sector sizes, but the description in comment 0 says 512 bytes. Could you please help explain this? Thanks.

(In reply to CongLi from comment #26)
[...]
> I saw the fix is about 4k sector sizes, but the description in comment 0
> is 512 bytes.

The guest wants to use 512-byte sectors, but the raw disk image on the host is located on a DASD disk with 4k sectors. So it's about the way QEMU deals with the disk image on the host, which is 4k. Without the fix for this BZ, QEMU was not able to detect this properly with sparse raw files, so it tried to access the disk image in a wrong way, leading to an error which it then passed on to the guest.
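That split can be seen directly with blockdev, run on both sides (a sketch using the device names from the x86 reproduction in the next comment):

host# blockdev --getss --getpbsz /dev/sdc    # should print 4096 twice, per fdisk below
guest# blockdev --getss --getpbsz /dev/vdb   # should print 512 twice

The guest-side 512 is fine by itself; the bug was purely in how QEMU accessed the 4k-backed raw file underneath.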
Reproduced on qemu-kvm-2.12.0-85.module+el8.1.0+4066+0f1aadab. Details as below:

Host:
kernel-4.18.0-137.el8.x86_64
qemu-kvm-2.12.0-85.module+el8.1.0+4066+0f1aadab

Guest:
kernel-4.18.0-135.el8.x86_64

1. Create a raw image on a 4k disk on the host (e.g. sdc):

# fdisk -l /dev/sdc
Disk /dev/sdc: 558.4 GiB, 599550590976 bytes, 146374656 sectors
Units: sectors of 1 * 4096 = 4096 bytes
Sector size (logical/physical): 4096 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disklabel type: dos
Disk identifier: 0xa2466bb0

Device     Boot  Start       End   Sectors   Size Id Type
/dev/sdc1  *       256    262399    262144     1G 83 Linux
/dev/sdc2       262400 146374655 146112256 557.4G 8e Linux LVM

# mkdir /mnt/test
# mount /dev/sdc1 /mnt/test/
# qemu-img create -f raw /mnt/test/test.raw 1G

2. Boot the guest with the command line below:

/usr/libexec/qemu-kvm \
 -S \
 -name 'avocado-vt-vm1' \
 -machine q35 \
 -nodefaults \
 -device VGA,bus=pcie.0,addr=0x1 \
 -chardev socket,id=qmp_id_qmpmonitor1,path=/var/tmp/avocado_vkzzzsjy/monitor-qmpmonitor1-20190827-054125-X8YHvELh,server,nowait \
 -mon chardev=qmp_id_qmpmonitor1,mode=control \
 -chardev socket,id=qmp_id_catch_monitor,path=/var/tmp/avocado_vkzzzsjy/monitor-catch_monitor-20190827-054125-X8YHvELh,server,nowait \
 -mon chardev=qmp_id_catch_monitor,mode=control \
 -device pvpanic,ioport=0x505,id=idnZn1j7 \
 -chardev socket,nowait,server,path=/var/tmp/avocado_vkzzzsjy/serial-serial0-20190827-054125-X8YHvELh,id=chardev_serial0 \
 -device isa-serial,id=serial0,chardev=chardev_serial0 \
 -chardev socket,id=seabioslog_id_20190827-054125-X8YHvELh,path=/var/tmp/avocado_vkzzzsjy/seabios-20190827-054125-X8YHvELh,server,nowait \
 -device isa-debugcon,chardev=seabioslog_id_20190827-054125-X8YHvELh,iobase=0x402 \
 -device pcie-root-port,id=pcie.0-root-port-2,slot=2,chassis=2,addr=0x2,bus=pcie.0 \
 -device qemu-xhci,id=usb1,bus=pcie.0-root-port-2,addr=0x0 \
 -drive id=drive_image1,if=none,snapshot=off,cache=none,format=qcow2,file=/home/kvm_autotest_root/images/rhel810-64-virtio.qcow2 \
 -device pcie-root-port,id=pcie.0-root-port-3,slot=3,chassis=3,addr=0x3,bus=pcie.0 \
 -device virtio-blk-pci,id=image1,drive=drive_image1,bootindex=0,bus=pcie.0-root-port-3,addr=0x0 \
 -device pcie-root-port,id=pcie.0-root-port-4,slot=4,chassis=4,addr=0x4,bus=pcie.0 \
 -device virtio-net-pci,mac=9a:c8:3a:2f:3f:1c,id=idXNk6ZE,netdev=id595yhy,bus=pcie.0-root-port-4,addr=0x0 \
 -netdev tap,id=id595yhy,vhost=on \
 -m 14336 \
 -smp 24,maxcpus=24,cores=12,threads=1,sockets=2 \
 -cpu 'Skylake-Server',+kvm_pv_unhalt \
 -device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1 \
 -vnc :0 \
 -rtc base=localtime,clock=host,driftfix=slew \
 -boot order=cdn,once=c,menu=off,strict=off \
 -enable-kvm \
 -monitor stdio \
 -drive id=drive_data,if=none,snapshot=off,cache=none,format=raw,file=/mnt/test/test.raw \
 -device pcie-root-port,id=pcie.0-root-port-5,slot=5,chassis=5,addr=0x5,bus=pcie.0 \
 -device virtio-blk-pci,id=data1,drive=drive_data,bus=pcie.0-root-port-5,addr=0x0

3. Create a partition in the guest and format it:

# lsblk
NAME                             MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
vda                              252:0    0  20G  0 disk
├─vda1                           252:1    0   1G  0 part /boot
└─vda2                           252:2    0  19G  0 part
  ├─rhel_bootp--73--75--125-root 253:0    0  17G  0 lvm  /
  └─rhel_bootp--73--75--125-swap 253:1    0   2G  0 lvm  [SWAP]
vdb                              252:16   0   1G  0 disk

# fdisk -l /dev/vdb
Disk /dev/vdb: 1 GiB, 1073741824 bytes, 2097152 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes

# parted /dev/vdb mktable gpt
# parted /dev/vdb mkpart primary xfs "0%" "100%"
# mkfs.xfs /dev/vdb1
meta-data=/dev/vdb1              isize=512    agcount=4, agsize=65408 blks
         =                       sectsz=512   attr=2, projid32bit=1
         =                       crc=1        finobt=1, sparse=1, rmapbt=0
         =                       reflink=1
data     =                       bsize=4096   blocks=261632, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0, ftype=1
log      =internal log           bsize=4096   blocks=1566, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
mkfs.xfs: pwrite failed: Input/output error

/var/log/messages:
Sep  4 13:52:33 localhost kernel: vdb: vdb1
Sep  4 13:52:43 localhost kernel: print_req_error: I/O error, dev vdb, sector 2094848 flags 8801

After step 3, hit this issue.

Retested on qemu-kvm-2.12.0-86.module+el8.1.0+4146+4ed2d185, did not hit this issue:
After step 3, the format succeeds without any error:

# parted /dev/vdb mktable gpt
# parted /dev/vdb mkpart primary xfs "0%" "100%"
# mkfs.xfs /dev/vdb1
meta-data=/dev/vdb1              isize=512    agcount=4, agsize=65408 blks
         =                       sectsz=512   attr=2, projid32bit=1
         =                       crc=1        finobt=1, sparse=1, rmapbt=0
         =                       reflink=1
data     =                       bsize=4096   blocks=261632, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0, ftype=1
log      =internal log           bsize=4096   blocks=1566, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0

# dmesg | grep vdb
[    2.602719] virtio_blk virtio2: [vdb] 2097152 512-byte logical blocks (1.07 GB/1.00 GiB)
[  195.485260] vdb:
[  230.093727] vdb: vdb1

# mount /dev/vdb1 /mnt/
# lsblk
NAME                             MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
vda                              252:0    0   20G  0 disk
├─vda1                           252:1    0    1G  0 part /boot
└─vda2                           252:2    0   19G  0 part
  ├─rhel_bootp--73--75--125-root 253:0    0   17G  0 lvm  /
  └─rhel_bootp--73--75--125-swap 253:1    0    2G  0 lvm  [SWAP]
vdb                              252:16   0    1G  0 disk
└─vdb1                           252:17   0 1022M  0 part /mnt

# dmesg | grep vdb
[    2.602719] virtio_blk virtio2: [vdb] 2097152 512-byte logical blocks (1.07 GB/1.00 GiB)
[  195.485260] vdb:
[  230.093727] vdb: vdb1
[  922.793141] XFS (vdb1): Mounting V5 Filesystem
[  922.802219] XFS (vdb1): Ending clean mount

So set status to VERIFIED.

The upstream fix here went in after the v4.1 release, so AFAICT this will also affect the qemu-4.1 based RHEL-AV version. I think we may be seeing this in bug 1747110, although I think the main problems there are probably unrelated to this. Cloning for RHEL-AV.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2019:3345
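For reference, which upstream releases already contain the fix can be checked against a qemu source clone; consistent with the note above that the commit landed after v4.1, only later tags should be listed:

$ git clone https://github.com/qemu/qemu.git && cd qemu
$ git tag --contains a6b257a08e3d72219f03e461a52152672fec0612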