Bug 1743365

Summary: qemu, qemu-img fail to detect alignment with XFS and Gluster/XFS on 4k block device
Product: Red Hat Enterprise Linux 7
Reporter: Nir Soffer <nsoffer>
Component: qemu-kvm-rhev
Assignee: Hanna Czenczek <hreitz>
Status: CLOSED ERRATA
QA Contact: qing.wang <qinwang>
Severity: high
Docs Contact:
Priority: high
Version: 7.8
CC: coli, jinzhao, juzhang, mtessun, qinwang, rcyriac, virt-maint, vjuranek
Target Milestone: rc
Keywords: ZStream
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: qemu-kvm-rhev-2.12.0-37.el7
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 1745443
Environment:
Last Closed: 2020-03-31 14:34:56 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1744207, 1745443

Description Nir Soffer 2019-08-19 18:04:03 UTC
Description of problem:

When using storage with a sector size of 4k, qemu and qemu-img fail to probe
the alignment requirements for direct I/O, and then fail with EINVAL when
accessing the storage.

Several flows may fail:
- Provisioning a VM on 4k storage fails when the installer tries to create
  filesystems
- Copying a disk image from 4k storage fails when reading from the source image
- Copying a disk image to 4k storage fails when writing to the target disk

I reproduced the failures with:
- xfs on loop devices using 4k sector size
- gluster backed by xfs, on vdo device (exposing 4k sector size)

The root cause of both issues is qemu's direct I/O alignment probing. The
issue was fixed upstream in this commit:
https://github.com/qemu/qemu/commit/a6b257a08e3d72219f03e461a52152672fec0612
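
For illustration (not from the original report), qemu-io can issue the same
kind of small direct request that trips over the mis-detected alignment; the
image path below is a placeholder for any raw image on the 4k storage. With
-t none the image is opened with O_DIRECT, and a 512-byte read like this is
exactly the kind of request that fails with "Invalid argument" when qemu
wrongly assumes 512-byte alignment is sufficient:

    qemu-io -f raw -t none -c 'read 0 512' /path/on/4k-storage/disk.img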

This is the RHEL version of these Fedora bugs:
- Bug 1737256 - Provisioning VM on 4k gluster storage fails with "Invalid argument" - qemu fail to detect block size
- Bug 1738657 - qemu-img convert fail to read with "Invalid argument" on gluster storage with 4k sector size

I merged both bugs for RHEL since we now understand that both issues are the
same.

Version-Release number of selected component (if applicable):
Tested with qemu/qemu-img 4.1 rc2 on Fedora 29
Tested with qemu-rhev/qemu-img-rhev on CentOS 7.6

How reproducible:
Always
Note: copying a disk depends on the disk content; not all disks fail.


Steps to Reproduce - provisioning - xfs on loop device:

1. Create a loop device with a 4k sector size (a sanity check for the
   resulting sector sizes is shown after these steps):

    losetup -f backing-file --show --sector-size=4096

2. Create xfs file system

    mkfs -t xfs /dev/loop0

3. Mount 

    mkdir /tmp/loop0
    mount /dev/loop0 /tmp/loop0

4. Create a new disk (qemu-img create needs an explicit size for a raw image;
   10g here is an arbitrary but sufficient choice):

   qemu-img create -f raw /tmp/loop0/disk.img 10g

5. Start a VM:

   qemu-system-x86_64 -accel kvm -m 2048 -smp 2 \
   -drive file=/tmp/loop0/disk.img,format=raw,cache=none \
   -cdrom Fedora-Server-dvd-x86_64-29-1.2.iso

6. Try to install with default options.

The installer fails within a few seconds when trying to create a filesystem
on the root logical volume.
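
As a sanity check (not part of the original steps), you can confirm that the
loop device and the filesystem really expose 4k sectors:

    blockdev --getss /dev/loop0        # logical sector size, expected: 4096
    blockdev --getpbsz /dev/loop0      # physical sector size, expected: 4096
    xfs_info /tmp/loop0 | grep sectsz  # expected: sectsz=4096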


Steps to Reproduce - copying disk from 4k storage - xfs on loop device:

1. Create a new image on the 4k storage. One way is to use virt-builder:

    virt-builder fedora-29 -o /tmp/loop0/disk.img

2. Copy the disk to a target image elsewhere:

    qemu-img convert -f raw -O raw -t none -T none /tmp/loop0/disk.img \
        disk-clone.img

This fails with EINVAL when reading from the source image:
qemu-img: error while reading sector XXX: Invalid argument
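
A quick way to confirm that it is the 512-byte request size the storage
rejects (not part of the original steps; dd uses page-aligned buffers, so
this exercises the request alignment rather than the memory buffer alignment):

    dd if=/tmp/loop0/disk.img of=/dev/null bs=512 count=1 iflag=direct   # expected to fail with EINVAL
    dd if=/tmp/loop0/disk.img of=/dev/null bs=4096 count=1 iflag=direct  # expected to succeed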


Steps to Reproduce - copying disk to 4k storage - xfs on loop device:

1. Create a new source image on regular (512b) storage. One way is to use
   virt-builder:

    virt-builder fedora-29 -o disk.img

2. Copy the disk to target image on the 4k storage:

    qemu-img convert -f raw -O raw -t none -T none disk.img \
       /tmp/loop0/disk-clone.img

This fails with EINVAL when writing to the target image:
qemu-img: error while writing sector XXX: Invalid argument


Steps to reproduce - gluster/xfs/vdo storage

Creating this storage is more complex. I reproduced the issue using 3 VMs,
deployed using these scripts:
- https://github.com/oVirt/vdsm/blob/master/contrib/deploy-gluster.sh
- https://github.com/oVirt/vdsm/blob/master/contrib/create-vdo-brick.sh
- https://github.com/oVirt/vdsm/blob/master/contrib/create-gluster-volume.sh

You also need to set this Gluster volume option:

    gluster volume set gv0 performance.strict-o-direct on

Once all gluster nodes are up, mount the storage:

    mkdir /tmp/gv0
    mount -t glusterfs gluster1:/gv0 /tmp/gv0

Now you can reproduce the issue using the same flows explained above for the
loop device, replacing /tmp/loop0 with /tmp/gv0.
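
As an extra sanity check (not in the original steps), you can verify on a
gluster node that the option took effect and that the VDO-backed brick
exposes 4k sectors; the VDO device name below is a guess and depends on how
create-vdo-brick.sh names it:

    gluster volume get gv0 performance.strict-o-direct   # expected: on
    blockdev --getss /dev/mapper/vdo0                    # expected: 4096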

Comment 2 Nir Soffer 2019-08-19 18:09:37 UTC
See also bug 1743360 for RHEL 8.2.

Comment 3 Ademar Reis 2019-08-19 18:11:54 UTC
Take

Comment 11 qing.wang 2019-11-06 06:18:43 UTC
Verified on

Version:
Host:
kernel-3.10.0-1107.el7.x86_64
qemu-kvm-rhev-2.12.0-38.el7.x86_64

Guest:
kernel-3.10.0-1062.el7.x86_64

No issue found.


Scenario 1(Installation)

# fdisk -l /dev/sdc 

Disk /dev/sdc: 599.6 GB, 599550590976 bytes, 146374656 sectors
Units = sectors of 1 * 4096 = 4096 bytes
Sector size (logical/physical): 4096 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes

1. Create a raw image on the XFS filesystem with 4k sectors
# mkfs.xfs /dev/sdc
# mount /dev/sdc /mnt/sdc
# qemu-img create -f raw /mnt/sdc/base.img 20G

2. Install the guest on it
/usr/libexec/qemu-kvm \
        -name 'guest-rhel77' \
        -machine q35 \
        -nodefaults \
        -vga qxl \
        -drive id=drive_cd1,if=none,snapshot=off,aio=threads,cache=unsafe,media=cdrom,file=/home/kvm_autotest_root/iso/linux/RHEL-7.7-20190723.1-Server-x86_64-dvd1.iso \
        -device ide-cd,id=cd1,drive=drive_cd1,bus=ide.0,unit=0 \
        -device pcie-root-port,id=pcie.0-root-port-9,slot=9,chassis=9,addr=0x9,bus=pcie.0 \
        -drive id=drive_image1,if=none,snapshot=off,aio=threads,cache=none,format=raw,file=/mnt/sdc/base.img \
        -device virtio-blk-pci,id=virtio_blk_pci0,drive=drive_image1,bus=pcie.0-root-port-9,addr=0x0,bootindex=0 \
        -vnc :0 \
        -monitor stdio \
        -m 8192 \
        -smp 8 \
        -device pcie-root-port,id=pcie.0-root-port-8,slot=8,chassis=8,addr=0x8,bus=pcie.0 \
        -device virtio-net-pci,mac=9a:b5:b6:b1:b2:b3,id=idMmq1jH,vectors=4,netdev=idxgXAlm,bus=pcie.0-root-port-8,addr=0x0  \
        -netdev tap,id=idxgXAlm \
        -chardev socket,id=qmp_id_qmpmonitor1,path=/var/tmp/hucheng/monitor-qmpmonitor1-20180220-094308-h9I6hRsI,server,nowait \
        -mon chardev=qmp_id_qmpmonitor1,mode=control  \

After step 2, the installation completes successfully.


Scenario 2(Convert)
1. Create a test image on the XFS filesystem with 4k sectors
# dd if=/dev/urandom of=/mnt/sdc/test.img bs=1M count=2048
2048+0 records in
2048+0 records out
2147483648 bytes (2.1 GB) copied, 11.5596 s, 186 MB/s

2. Convert the image to an XFS filesystem with 512-byte sectors
# qemu-img convert -f raw -O raw /mnt/sdc/test.img /home/tgt.img -t none -T none -p
    (100.00/100%)

3. Convert it back to the XFS filesystem with 4k sectors
# qemu-img convert -f raw -O raw /home/tgt.img /mnt/sdc/tgt.img -t none -T none -p
    (100.00/100%)

After step 3, no errors were hit.
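
Optionally (not part of the original verification), the round trip can be
checked for data integrity with qemu-img compare:

# qemu-img compare -f raw -F raw /mnt/sdc/test.img /mnt/sdc/tgt.img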


Scenario 3(dd test)
1. Create a test image on the XFS filesystem with 4k sectors
qemu-img create -f raw /mnt/sdc/test.raw 1G

2. Boot the guest with the command line below:

/usr/libexec/qemu-kvm \
        -name 'guest-rhel77' \
        -machine q35 \
        -nodefaults \
        -vga qxl \
        -drive id=drive_cd1,if=none,snapshot=off,aio=threads,cache=unsafe,media=cdrom,file=/home/kvm_autotest_root/iso/linux/RHEL-7.7-20190723.1-Server-x86_64-dvd1.iso \
        -device ide-cd,id=cd1,drive=drive_cd1,bus=ide.0,unit=0 \
        -device pcie-root-port,id=pcie.0-root-port-5,slot=5,chassis=5,addr=0x5,bus=pcie.0 \
        -drive id=drive_image1,if=none,snapshot=off,aio=threads,cache=none,format=raw,file=/mnt/sdc/base.img \
        -device virtio-blk-pci,id=virtio_blk_pci0,drive=drive_image1,bus=pcie.0-root-port-5,addr=0x0,bootindex=0 \
        -vnc :0 \
        -monitor stdio \
        -m 8192 \
        -smp 8 \
        -device pcie-root-port,id=pcie.0-root-port-8,slot=8,chassis=8,addr=0x8,bus=pcie.0 \
        -device virtio-net-pci,mac=9a:b5:b6:b1:b2:b3,id=idMmq1jH,vectors=4,netdev=idxgXAlm,bus=pcie.0-root-port-8,addr=0x0  \
        -netdev tap,id=idxgXAlm \
        -chardev socket,id=qmp_id_qmpmonitor1,path=/var/tmp/monitor-qmpmonitor1-20180220-094308-h9I6hRsI,server,nowait \
        -mon chardev=qmp_id_qmpmonitor1,mode=control  \
        -drive id=drive_data,if=none,snapshot=off,cache=none,format=raw,file=/mnt/sdc/test.raw \
        -device pcie-root-port,id=pcie.0-root-port-6,slot=6,chassis=6,addr=0x6,bus=pcie.0 \
        -device virtio-blk-pci,id=data1,drive=drive_data,bus=pcie.0-root-port-6,addr=0x0 \


3. Create a partition and format it in the guest
# parted /dev/vdb mktable gpt
# parted /dev/vdb mkpart primary xfs "0%" "100%"
# mkfs.xfs /dev/vdb1
# mount /dev/vdb1 /mnt/
# dmesg | grep vdb
[  296.209926]  vdb:
[  632.812664]  vdb: vdb1
[  651.834427]  vdb: vdb1
[  714.456776] XFS (vdb1): Mounting V5 Filesystem
[  714.464536] XFS (vdb1): Ending clean mount

4. dd test
# dd if=/dev/zero of=/mnt/test.img bs=512k count=100 oflag=direct
100+0 records in
100+0 records out
52428800 bytes (52 MB) copied, 0.254502 s, 206 MB/s

# dd if=/dev/zero of=/mnt/test.img bs=4096k count=100 oflag=direct
100+0 records in
100+0 records out
419430400 bytes (419 MB) copied, 1.56952 s, 267 MB/s

# dd if=/mnt/test.img of=/dev/null bs=512k count=100 iflag=direct
100+0 records in
100+0 records out
52428800 bytes (52 MB) copied, 0.0557941 s, 940 MB/s

# dd if=/mnt/test.img of=/dev/null bs=4096k count=100 iflag=direct
100+0 records in
100+0 records out
419430400 bytes (419 MB) copied, 0.359295 s, 1.2 GB/s

# dmesg |grep error


Scenario 4(Convert: tested with gluster)

gluster volume set gv0 performance.strict-o-direct on
mount.glusterfs gluster-virt-qe-01.lab.eng.pek2.redhat.com:/gv0 /mnt/gluster

1. Create a test image on the Gluster volume (backed by XFS with 4k sectors)
# dd if=/dev/urandom of=/mnt/gluster/test.img bs=1M count=2048
2048+0 records in
2048+0 records out
2147483648 bytes (2.1 GB) copied, 58.3335 s, 36.8 MB/s

2. Convert the image to an XFS filesystem with 512-byte sectors
# qemu-img convert -f raw -O raw /mnt/gluster/test.img /home/tgt.img -t none -T none -p
    (100.00/100%)

3. Convert it back to the Gluster volume with 4k sectors
# qemu-img convert -f raw -O raw /home/tgt.img /mnt/gluster/tgt.img -t none -T none -p
    (100.00/100%)

After step 3, no errors were hit.


Scenario 5(dd test with gluster)

# dd if=/dev/zero of=/mnt/gluster/test.img bs=512k count=100 oflag=direct
100+0 records in
100+0 records out
52428800 bytes (52 MB) copied, 4.00484 s, 13.1 MB/s

# dd if=/dev/zero of=/mnt/gluster/test.img bs=4096k count=100 oflag=direct
100+0 records in
100+0 records out
419430400 bytes (419 MB) copied, 30.3556 s, 13.8 MB/s

# dd if=/mnt/gluster/test.img of=/dev/null bs=512k count=100 iflag=direct
100+0 records in
100+0 records out
52428800 bytes (52 MB) copied, 0.628825 s, 83.4 MB/s

# dd if=/mnt/gluster/test.img of=/dev/null bs=4096k count=100 iflag=direct
100+0 records in
100+0 records out
419430400 bytes (419 MB) copied, 3.79064 s, 111 MB/s

# dmesg |grep error

Comment 13 errata-xmlrpc 2020-03-31 14:34:56 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:1216