Description of problem: Converting a raw image from gluster storage with 4k sector size to same storage fails: $ strace -f -tt -o /tmp/convert.trace qemu-img convert -f raw -O raw -t none -T none fedora29.raw fedora29-clone.raw qemu-img: error while reading sector 4190208: Invalid argument For info on how to create the gluster storage, see bug 1737256. Version-Release number of selected component (if applicable): qemu-img-4.1.0-0.1.rc2.fc29.x86_64 (from virt-preview repo) How reproducible: Always - depending on the image. Steps to Reproduce: 1. Provision a VM on gluster storage with 4k sector size (using https://lists.nongnu.org/archive/html/qemu-block/2019-08/msg00133.html) 2. Try to clone the image using qemu-img Looking at the trace show: 1. Open the source image: 18703 20:14:58.755169 openat(AT_FDCWD, "fedora29.raw", O_RDONLY|O_DIRECT|O_CLOEXEC) = 9 18703 20:14:58.755292 fstat(9, {st_mode=S_IFREG|0644, st_size=6442450944, ...}) = 0 18703 20:14:58.755337 lseek(9, 0, SEEK_END) = 6442450944 2. Try to detect buffer alignment by reading from buf + 512 18703 20:14:58.755503 pread64(9, "\353c\220\20\216\320\274\0\260\270\0\0\216\330\216\300\373\276\0|\277\0\6\271\0\2\363\244\352!\6\0"..., 4096, 0) = 4096 This succeeds, since on Gluster there is no alignment requirement for buffers. The buffer used to do direct I/O is allocated on the Gluster node. But the actual block size on this storage is 4096 bytes. This is basically same issue as in bug 1737256. 3. Trying to detect request alignment by reading 512, 1024, 2048, 4096 bytes into aligned buffer: 18703 20:14:58.756835 pread64(9, 0x5562768ef200, 512, 0) = -1 EINVAL (Invalid argument) 18703 20:14:58.758469 pread64(9, 0x5562768ef200, 1024, 0) = -1 EINVAL (Invalid argument) 18703 20:14:58.760132 pread64(9, 0x5562768ef200, 2048, 0) = -1 EINVAL (Invalid argument) 18703 20:14:58.761631 pread64(9, <unfinished ...> 18704 20:14:58.762213 <... nanosleep resumed>0x7fdde6e20a70) = 0 18704 20:14:58.762249 futex(0x5562752ea508, FUTEX_WAIT, 4294967295, NULL <unfinished ...> 18703 20:14:58.762530 <... pread64 resumed>"\353c\220\20\216\320\274\0\260\270\0\0\216\330\216\300\373\276\0|\277\0\6\271\0\2\363\244\352!\6\0"..., 4096, 0) = 4096 As expected, request alignment of 4096 bytes was detected. 4. Open target image (creating it) 18703 20:14:58.778105 openat(AT_FDCWD, "fedora29-clone.raw", O_RDWR|O_CREAT|O_CLOEXEC, 0644) = 10 5. Truncate target image to 0 and then to virtual size 18706 20:14:58.885829 ftruncate(10, 0) = 0 ... 18706 20:14:58.887700 ftruncate(10, 6442450944) = 0 6. Try to detect block alignment on target image 18703 20:14:58.933049 pread64(10, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096, 0) = 4096 This wrongly detect buffer alignment of 512 bytes like step 2. 7. Try to detect request alignment 18703 20:14:58.936354 pread64(10, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 512, 0) = 512 This wrongly detect 512 because when using xfs (used on gluster backed), reading from unallocated file succceeds even with unaligned request length (xfs probably optimize read when block is not allocated). So now we have: source: buf_align = 512 request_alignment = 4096 min_mem_alignment = 512 <<< uses buf_align target: buf_align = 512 request_alignment = 512 min_mem_alignment = 512 We can argue that buf_align = 512 is correct, since Gluster does not require any alignment - but qemu uses buf_align to checking iov_len 8. qemu fails to read 512 bytes from source image 18709 20:15:09.670293 preadv(9, [{iov_base=0x7fdde5013000, iov_len=2096640}, {iov_base=0x556276902000, iov_len=512}], 2, ) = -1 EINVAL (Invalid argument) This is expected, since the underlying storage has sector size of 4096. I tried the same flow using qemu-img with patch: https://lists.nongnu.org/archive/html/qemu-block/2019-08/msg00133.html $ strace -f -tt -o /tmp/convert-patched.trace /home/nsoffer/src/qemu/build/qemu-img convert -f raw -O raw -t none -T none fedora29.raw fedora29-clone.raw qemu-img: error while writing sector 4194303: Invalid argument This fixes the read failure, but fail to write to the target. Looking at the new trace show: 1. Same flow for opening the source image and detecting block size 22894 20:49:00.487962 openat(AT_FDCWD, "fedora29.raw", O_RDONLY|O_DIRECT|O_CLOEXEC) = 9 ... 22894 20:49:00.488662 pread64(9, "\353c\220\20\216\320\274\0\260\270\0\0\216\330\216\300\373\276\0|\277\0\6\271\0\2\363\244\352!\6\0"..., 4096, 0) = 4096 22894 20:49:00.489776 pread64(9, 0x5615d6044400, 512, 0) = -1 EINVAL (Invalid argument) 22894 20:49:00.491139 pread64(9, 0x5615d6044400, 1024, 0) = -1 EINVAL (Invalid argument) 22894 20:49:00.492392 pread64(9, 0x5615d6044400, 2048, 0) = -1 EINVAL (Invalid argument) 22894 20:49:00.493647 pread64(9, "\353c\220\20\216\320\274\0\260\270\0\0\216\330\216\300\373\276\0|\277\0\6\271\0\2\363\244\352!\6\0"..., 4096, 0) = 4096 2. Same flow for opening the target and detecting block size 22894 20:49:00.506364 openat(AT_FDCWD, "fedora29-clone.raw", O_RDWR|O_CREAT|O_CLOEXEC, 0644) = 10 ... 22898 20:49:00.518228 ftruncate(10, 0) = 0 ... 22898 20:49:00.518545 ftruncate(10, 6442450944) = 0 ... 22894 20:49:00.529136 pread64(10, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096, 0) = 4096 22894 20:49:00.529664 pread64(10, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 512, 0) = 512 source: buf_align = 512 request_alignment = 4096 min_mem_alignment = 4096 <<< uses max(buf_align, request_alignment) target: buf_align = 512 request_alignment = 512 min_mem_alignment = 512 3. The read we failed before succeeds now 22912 20:49:11.086632 pread64(9, "6\317M5i?\332Qm\231\342\253\205+\303@ht\335\341\353\300\331\201\314\nb]3R\262\377"..., 2097152, 2145386496) = 2097152 4. Writing to target fails with EINVAL 22911 20:49:11.327101 pwrite64(10, "\246\243\366\4)U(\360\334^\345\277'\313\261\\\211a\302!P\360\300\263\222\2303\365\201\377\223/"..., 2097152, 2147483136 <unfinished ...> 22911 20:49:11.368238 <... pwrite64 resumed>) = -1 EINVAL (Invalid argument) The write fails because 2147483136 is not aligned the block size: >>> 2147483136 % 4096 3584 As a workaround, I tried to fix target image block size detection by using existing file with the fist block allocated. (management system can easily ensure this) $ rm fedora29-clone.raw $ truncate -s 6g fedora29-clone.raw $ dd if=/dev/zero bs=4096 count=1 of=fedora29-clone.raw conv=notrunc 1+0 records in 1+0 records out 4096 bytes (4.1 kB, 4.0 KiB) copied, 0.016563 s, 247 kB/s $ ls -lhs fedora29-clone.raw 4.0K -rw-rw-r--. 1 nsoffer nsoffer 6.0G Aug 7 21:05 fedora29-clone.raw And now the command succeeded: $ strace -f -tt -o /tmp/convert-patched-nocreate.trace /home/nsoffer/src/qemu/build/qemu-img convert -n -f raw -O raw -t none -T none fedora29.raw fedora29-clone.raw Looking in the trace we can see that now that request alignment is detected correctly: 25185 21:06:42.432081 openat(AT_FDCWD, "fedora29-clone.raw", O_RDWR|O_DIRECT|O_CLOEXEC) = 10 ... 25185 21:06:42.432841 pread64(10, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096, 0) = 4096 25185 21:06:42.434180 pread64(10, 0x555901256a00, 512, 0) = -1 EINVAL (Invalid argument) 25185 21:06:42.435651 pread64(10, 0x555901256a00, 1024, 0) = -1 EINVAL (Invalid argument) 25185 21:06:42.437179 pread64(10, 0x555901256a00, 2048, 0) = -1 EINVAL (Invalid argument) 25185 21:06:42.438611 pread64(10, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096, 0) = 4096 So we have 2 issues: - Incorrect use of buf_align to verify that preadv() arguments are aligned (same as bug 1737256) - Incorrect detection of target request alignment with unallocated file. The second issue can be fixed by using existing file with -n argument. I think this is good enough workaround for management system, but users will fail in this trap. Since qemu control the target file, it can allocate one block of zeros before detecting the block size. I did not try gluster:// since it failed for bug 1737256, and libgfapi is not high priority for oVirt now.
Created attachment 1601638 [details] Traces from commands mentioned in comment 0
This bug appears to have been reported against 'rawhide' during the Fedora 31 development cycle. Changing version to '31'.
This bug appears to have been reported against 'rawhide' during the Fedora 31 development cycle. Changing version to 31.
Patch posted for review: https://lists.nongnu.org/archive/html/qemu-block/2019-08/msg00568.html
Fixed upstream in: https://github.com/qemu/qemu/commit/a6b257a08e3d72219f03e461a52152672fec0612
FEDORA-2019-6a014367d5 has been submitted as an update to Fedora 31. https://bodhi.fedoraproject.org/updates/FEDORA-2019-6a014367d5
qemu-4.1.0-2.fc31 has been pushed to the Fedora 31 testing repository. If problems still persist, please make note of it in this bug report. See https://fedoraproject.org/wiki/QA:Updates_Testing for instructions on how to install test updates. You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2019-6a014367d5
qemu-4.1.0-2.fc31 has been pushed to the Fedora 31 stable repository. If problems still persist, please make note of it in this bug report.