Bug 1743365 - qemu, qemu-img fail to detect alignment with XFS and Gluster/XFS on 4k block device
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: qemu-kvm-rhev
Version: 7.8
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: rc
Target Release: ---
Assignee: Hanna Czenczek
QA Contact: qing.wang
URL:
Whiteboard:
Depends On:
Blocks: 1744207 1745443
 
Reported: 2019-08-19 18:04 UTC by Nir Soffer
Modified: 2020-03-31 14:36 UTC (History)
CC List: 8 users

Fixed In Version: qemu-kvm-rhev-2.12.0-37.el7
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Clones: 1745443 (view as bug list)
Environment:
Last Closed: 2020-03-31 14:34:56 UTC
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHSA-2020:1216 0 None None None 2020-03-31 14:36:59 UTC

Description Nir Soffer 2019-08-19 18:04:03 UTC
Description of problem:

When using storage with a 4k sector size, qemu and qemu-img fail to probe
the alignment requirement for direct I/O and then fail with EINVAL when
accessing the storage.

Several flows may fail:
- Provisioning a VM on 4k storage fails when the installer tries to create
  filesystems
- Copying a disk image from 4k storage fails when reading from the source image
- Copying a disk image to 4k storage fails when writing to the target disk

I reproduced the failures with:
- xfs on a loop device using a 4k sector size
- gluster backed by xfs, on a vdo device (exposing a 4k sector size)

The root cause of both issues is alignment probing. The issue was fixed
upstream in this commit:
https://github.com/qemu/qemu/commit/a6b257a08e3d72219f03e461a52152672fec0612
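For reference, the upstream fix falls back to probing: when the filesystem
cannot report the required alignment, qemu issues page-aligned O_DIRECT reads
of increasing size (512, 1024, ...) until one succeeds. A minimal Python
sketch of that idea (illustrative only, not qemu's actual code; the real
logic lives in raw_probe_alignment() in block/file-posix.c):

```python
import mmap
import os

def probe_request_alignment(path, max_align=4096):
    """Find the smallest read size accepted by O_DIRECT on this file.

    Sketch of the approach used by qemu's raw_probe_alignment(): try
    page-aligned O_DIRECT reads of 512, 1024, ... bytes until one works.
    Returns None if O_DIRECT is unsupported (e.g. tmpfs) or nothing worked.
    """
    try:
        fd = os.open(path, os.O_RDONLY | os.O_DIRECT)
    except OSError:
        return None  # filesystem does not support O_DIRECT at all
    try:
        align = 512
        while align <= max_align:
            # mmap buffers are page-aligned, which satisfies O_DIRECT's
            # memory-alignment requirement; only the length varies.
            buf = mmap.mmap(-1, align)
            try:
                os.preadv(fd, [buf], 0)
                return align  # this read size was accepted
            except OSError:
                align *= 2
            finally:
                buf.close()
        return None
    finally:
        os.close(fd)
```

On the 4k loop device used below, probe_request_alignment('/tmp/loop0/disk.img')
should report 4096, while a file on a regular 512b filesystem reports 512.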

This is the RHEL version of these Fedora bugs:
- Bug 1737256 - Provisioning VM on 4k gluster storage fails with "Invalid argument" - qemu fail to detect block size
- Bug 1738657 - qemu-img convert fail to read with "Invalid argument" on gluster storage with 4k sector size

I merged both bugs for RHEL since we now understand that both issues are the
same.

Version-Release number of selected component (if applicable):
Tested with qemu/qemu-img 4.1 rc2 on Fedora 29
Tested with qemu-rhev/qemu-img-rhev on CentOS 7.6

How reproducible:
Always
Note: copying a disk depends on the disk content; not all disks fail.


Steps to Reproduce - provisioning - xfs on loop device:

1. Create a loop device with a 4k sector size:

    losetup -f backing-file --show --sector-size=4096

2. Create an xfs file system:

    mkfs -t xfs /dev/loop0

3. Mount the file system:

    mkdir /tmp/loop0
    mount /dev/loop0 /tmp/loop0
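Before moving on, it is worth confirming that the loop device really exposes
4k logical sectors (losetup --sector-size needs a reasonably recent kernel
and util-linux; if the option was silently ignored, none of the repro steps
will fail). A small helper, equivalent to running `blockdev --getss`:

```python
def logical_block_size(dev):
    """Logical sector size of a block device, by name without /dev/,
    e.g. logical_block_size("loop0").

    Reads the same queue limit that `blockdev --getss /dev/<dev>` reports.
    """
    with open(f"/sys/block/{dev}/queue/logical_block_size") as f:
        return int(f.read())
```

Expect 4096 for the loop device created above and 512 for most regular disks.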

4. Create a new disk image (a size argument is required; 6G here is arbitrary):

   qemu-img create -f raw /tmp/loop0/disk.img 6G

5. Start a VM:

   qemu-system-x86_64 -accel kvm -m 2048 -smp 2 \
   -drive file=/tmp/loop0/disk.img,format=raw,cache=none \
   -cdrom Fedora-Server-dvd-x86_64-29-1.2.iso

6. Try to install with the default options.

The installer fails within a few seconds when trying to create a filesystem
on the root logical volume.


Steps to Reproduce - copying disk from 4k storage - xfs on loop device:

1. Create a new image on the 4k storage. One way is to use virt-builder:

    virt-builder fedora-29 -o /tmp/loop0/disk.img

2. Copy the disk to target image elsewhere:

    qemu-img convert -f raw -O raw -t none -T none /tmp/loop0/disk.img \
        disk-clone.img

This will fail with EINVAL when reading from the source image:
qemu-img: error while reading sector XXX: Invalid argument
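The "Invalid argument" is the kernel rejecting a direct I/O request whose
length (or offset, or buffer address) is not a multiple of the device's
logical sector size; here qemu probed 512 where the device requires 4096.
The same errno is easy to reproduce in isolation (a sketch; alignment
enforcement varies by filesystem, so the helper reports what happened):

```python
import mmap
import os

def misaligned_direct_read_errno(path, length=100):
    """Attempt an O_DIRECT read whose length is not sector-aligned.

    Returns the errno of the failure (expected: EINVAL, the "Invalid
    argument" above), None if the read unexpectedly succeeded (some
    filesystems, e.g. NFS, do not enforce alignment), or -1 if O_DIRECT
    is unsupported here (e.g. tmpfs).
    """
    try:
        fd = os.open(path, os.O_RDONLY | os.O_DIRECT)
    except OSError:
        return -1
    # mmap gives a page-aligned buffer; only the length is misaligned.
    buf = mmap.mmap(-1, length)
    try:
        os.preadv(fd, [buf], 0)
        return None
    except OSError as e:
        return e.errno
    finally:
        buf.close()
        os.close(fd)
```

The bug is the same failure one layer up: qemu issued requests sized for a
512b sector device against a device that only accepts 4096-byte multiples.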


Steps to Reproduce - copying disk to 4k storage - xfs on loop device:

1. Create a new image on regular (512b sector) storage. One way is to use
   virt-builder:

    virt-builder fedora-29 -o disk.img

2. Copy the disk to target image on the 4k storage:

    qemu-img convert -f raw -O raw -t none -T none disk.img \
       /tmp/loop0/disk-clone.img

This will fail with EINVAL when writing to the target image:
qemu-img: error while writing sector XXX: Invalid argument


Steps to reproduce - gluster/xfs/vdo storage

Creating this storage is more complex.
I reproduced it using 3 VMs, deployed with these scripts:
- https://github.com/oVirt/vdsm/blob/master/contrib/deploy-gluster.sh
- https://github.com/oVirt/vdsm/blob/master/contrib/create-vdo-brick.sh
- https://github.com/oVirt/vdsm/blob/master/contrib/create-gluster-volume.sh

You also need to set this gluster volume option:

    gluster volume set gv0 performance.strict-o-direct on

Once all gluster nodes are up, mount the storage:

    mkdir /tmp/gv0
    mount -t glusterfs gluster1:/gv0 /tmp/gv0

Now you can reproduce the issue using the same flows described above for the
loop device, replacing /tmp/loop0 with /tmp/gv0.

Comment 2 Nir Soffer 2019-08-19 18:09:37 UTC
See also bug 1743360 for RHEL 8.2.

Comment 3 Ademar Reis 2019-08-19 18:11:54 UTC
Take

Comment 11 qing.wang 2019-11-06 06:18:43 UTC
Verified on

Version:
Host:
kernel-3.10.0-1107.el7.x86_64
qemu-kvm-rhev-2.12.0-38.el7.x86_64

Guest:
kernel-3.10.0-1062.el7.x86_64

No issues found.


Scenario 1(Installation)

# fdisk -l /dev/sdc 

Disk /dev/sdc: 599.6 GB, 599550590976 bytes, 146374656 sectors
Units = sectors of 1 * 4096 = 4096 bytes
Sector size (logical/physical): 4096 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes

1. Create a raw image on xfs with 4k sectors
# mkfs.xfs /dev/sdc
# mount /dev/sdc /mnt/sdc
# qemu-img create -f raw /mnt/sdc/base.img 20G

2. Install a guest on it
/usr/libexec/qemu-kvm \
        -name 'guest-rhel77' \
        -machine q35 \
        -nodefaults \
        -vga qxl \
        -drive id=drive_cd1,if=none,snapshot=off,aio=threads,cache=unsafe,media=cdrom,file=/home/kvm_autotest_root/iso/linux/RHEL-7.7-20190723.1-Server-x86_64-dvd1.iso \
        -device ide-cd,id=cd1,drive=drive_cd1,bus=ide.0,unit=0 \
        -device pcie-root-port,id=pcie.0-root-port-9,slot=9,chassis=9,addr=0x9,bus=pcie.0 \
        -drive id=drive_image1,if=none,snapshot=off,aio=threads,cache=none,format=raw,file=/mnt/sdc/base.img \
        -device virtio-blk-pci,id=virtio_blk_pci0,drive=drive_image1,bus=pcie.0-root-port-9,addr=0x0,bootindex=0 \
        -vnc :0 \
        -monitor stdio \
        -m 8192 \
        -smp 8 \
        -device pcie-root-port,id=pcie.0-root-port-8,slot=8,chassis=8,addr=0x8,bus=pcie.0 \
        -device virtio-net-pci,mac=9a:b5:b6:b1:b2:b3,id=idMmq1jH,vectors=4,netdev=idxgXAlm,bus=pcie.0-root-port-8,addr=0x0  \
        -netdev tap,id=idxgXAlm \
        -chardev socket,id=qmp_id_qmpmonitor1,path=/var/tmp/hucheng/monitor-qmpmonitor1-20180220-094308-h9I6hRsI,server,nowait \
        -mon chardev=qmp_id_qmpmonitor1,mode=control

After step 2, the installation completes successfully.


Scenario 2(Convert)
1. Create a test image on xfs with 4k sectors
# dd if=/dev/urandom of=/mnt/sdc/test.img bs=1M count=2048
2048+0 records in
2048+0 records out
2147483648 bytes (2.1 GB) copied, 11.5596 s, 186 MB/s

2. Convert the image to xfs with 512-byte sectors
# qemu-img convert -f raw -O raw /mnt/sdc/test.img /home/tgt.img -t none -T none -p
    (100.00/100%)

3. Convert back to xfs with 4k sectors
# qemu-img convert -f raw -O raw /home/tgt.img /mnt/sdc/tgt.img -t none -T none -p
    (100.00/100%)

After step 3, no errors were hit.


Scenario 3(dd test)
1. Create a test image on xfs with 4k sectors
qemu-img create -f raw /mnt/sdc/test.raw 1G

2. Boot the guest with the command line below:

/usr/libexec/qemu-kvm \
        -name 'guest-rhel77' \
        -machine q35 \
        -nodefaults \
        -vga qxl \
        -drive id=drive_cd1,if=none,snapshot=off,aio=threads,cache=unsafe,media=cdrom,file=/home/kvm_autotest_root/iso/linux/RHEL-7.7-20190723.1-Server-x86_64-dvd1.iso \
        -device ide-cd,id=cd1,drive=drive_cd1,bus=ide.0,unit=0 \
        -device pcie-root-port,id=pcie.0-root-port-5,slot=5,chassis=5,addr=0x5,bus=pcie.0 \
        -drive id=drive_image1,if=none,snapshot=off,aio=threads,cache=none,format=raw,file=/mnt/sdc/base.img \
        -device virtio-blk-pci,id=virtio_blk_pci0,drive=drive_image1,bus=pcie.0-root-port-5,addr=0x0,bootindex=0 \
        -vnc :0 \
        -monitor stdio \
        -m 8192 \
        -smp 8 \
        -device pcie-root-port,id=pcie.0-root-port-8,slot=8,chassis=8,addr=0x8,bus=pcie.0 \
        -device virtio-net-pci,mac=9a:b5:b6:b1:b2:b3,id=idMmq1jH,vectors=4,netdev=idxgXAlm,bus=pcie.0-root-port-8,addr=0x0  \
        -netdev tap,id=idxgXAlm \
        -chardev socket,id=qmp_id_qmpmonitor1,path=/var/tmp/monitor-qmpmonitor1-20180220-094308-h9I6hRsI,server,nowait \
        -mon chardev=qmp_id_qmpmonitor1,mode=control  \
        -drive id=drive_data,if=none,snapshot=off,cache=none,format=raw,file=/mnt/sdc/test.raw \
        -device pcie-root-port,id=pcie.0-root-port-6,slot=6,chassis=6,addr=0x6,bus=pcie.0 \
        -device virtio-blk-pci,id=data1,drive=drive_data,bus=pcie.0-root-port-6,addr=0x0


3. Create a partition and format it in the guest
# parted /dev/vdb mktable gpt
# parted /dev/vdb mkpart primary xfs "0%" "100%"
# mkfs.xfs /dev/vdb1
# mount /dev/vdb1 /mnt/
# dmesg | grep vdb
[  296.209926]  vdb:
[  632.812664]  vdb: vdb1
[  651.834427]  vdb: vdb1
[  714.456776] XFS (vdb1): Mounting V5 Filesystem
[  714.464536] XFS (vdb1): Ending clean mount

4. dd test
# dd if=/dev/zero of=/mnt/test.img bs=512k count=100 oflag=direct
100+0 records in
100+0 records out
52428800 bytes (52 MB) copied, 0.254502 s, 206 MB/s

# dd if=/dev/zero of=/mnt/test.img bs=4096k count=100 oflag=direct
100+0 records in
100+0 records out
419430400 bytes (419 MB) copied, 1.56952 s, 267 MB/s

# dd if=/mnt/test.img of=/dev/null bs=512k count=100 iflag=direct
100+0 records in
100+0 records out
52428800 bytes (52 MB) copied, 0.0557941 s, 940 MB/s

# dd if=/mnt/test.img of=/dev/null bs=4096k count=100 iflag=direct
100+0 records in
100+0 records out
419430400 bytes (419 MB) copied, 0.359295 s, 1.2 GB/s

# dmesg |grep error


Scenario 4(Convert: tested with gluster)

gluster volume set gv0 performance.strict-o-direct on
mount.glusterfs gluster-virt-qe-01.lab.eng.pek2.redhat.com:/gv0 /mnt/gluster

1. Create a test image on the gluster volume backed by xfs with 4k sectors
# dd if=/dev/urandom of=/mnt/gluster/test.img bs=1M count=2048
2048+0 records in
2048+0 records out
2147483648 bytes (2.1 GB) copied, 58.3335 s, 36.8 MB/s

2. Convert the image to xfs with 512-byte sectors
# qemu-img convert -f raw -O raw /mnt/gluster/test.img /home/tgt.img -t none -T none -p
    (100.00/100%)

3. Convert back to xfs with 4k sectors
# qemu-img convert -f raw -O raw /home/tgt.img /mnt/gluster/tgt.img -t none -T none -p
    (100.00/100%)

After step 3, no errors were hit.


Scenario 5(dd test with gluster)

# dd if=/dev/zero of=/mnt/gluster/test.img bs=512k count=100 oflag=direct
100+0 records in
100+0 records out
52428800 bytes (52 MB) copied, 4.00484 s, 13.1 MB/s

# dd if=/dev/zero of=/mnt/gluster/test.img bs=4096k count=100 oflag=direct
100+0 records in
100+0 records out
419430400 bytes (419 MB) copied, 30.3556 s, 13.8 MB/s

# dd if=/mnt/gluster/test.img of=/dev/null bs=512k count=100 iflag=direct
100+0 records in
100+0 records out
52428800 bytes (52 MB) copied, 0.628825 s, 83.4 MB/s

# dd if=/mnt/gluster/test.img of=/dev/null bs=4096k count=100 iflag=direct
100+0 records in
100+0 records out
419430400 bytes (419 MB) copied, 3.79064 s, 111 MB/s

# dmesg |grep error

Comment 13 errata-xmlrpc 2020-03-31 14:34:56 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:1216

