Bug 1738657 - qemu-img convert fail to read with "Invalid argument" on gluster storage with 4k sector size
Summary: qemu-img convert fail to read with "Invalid argument" on gluster storage with...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: qemu
Version: 31
Hardware: Unspecified
OS: Unspecified
unspecified
unspecified
Target Milestone: ---
Assignee: Fedora Virtualization Maintainers
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks: 1592916
TreeView+ depends on / blocked
 
Reported: 2019-08-07 18:35 UTC by Nir Soffer
Modified: 2019-09-28 00:01 UTC (History)
10 users (show)

Fixed In Version: qemu-4.1.0-2.fc31
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-09-28 00:01:39 UTC
Type: Bug
Embargoed:


Attachments (Terms of Use)
Traces from commands mentioned in comment 0 (1.01 MB, application/x-xz)
2019-08-07 23:24 UTC, Nir Soffer
no flags Details

Description Nir Soffer 2019-08-07 18:35:03 UTC
Description of problem:

Converting a raw image from gluster storage with 4k sector size to same storage 
fails:

$ strace -f -tt -o /tmp/convert.trace qemu-img convert -f raw -O raw -t none -T none fedora29.raw fedora29-clone.raw
qemu-img: error while reading sector 4190208: Invalid argument

For info on how to create the gluster storage, see bug 1737256.


Version-Release number of selected component (if applicable):
qemu-img-4.1.0-0.1.rc2.fc29.x86_64 (from virt-preview repo)


How reproducible:
Always - depending on the image.


Steps to Reproduce:

1. Provision a VM on gluster storage with 4k sector size
   (using https://lists.nongnu.org/archive/html/qemu-block/2019-08/msg00133.html)

2. Try to clone the image using qemu-img


Looking at the trace show:

1. Open the source image:

18703 20:14:58.755169 openat(AT_FDCWD, "fedora29.raw", O_RDONLY|O_DIRECT|O_CLOEXEC) = 9
18703 20:14:58.755292 fstat(9, {st_mode=S_IFREG|0644, st_size=6442450944, ...}) = 0
18703 20:14:58.755337 lseek(9, 0, SEEK_END) = 6442450944

2. Try to detect buffer alignment by reading from buf + 512

18703 20:14:58.755503 pread64(9, "\353c\220\20\216\320\274\0\260\270\0\0\216\330\216\300\373\276\0|\277\0\6\271\0\2\363\244\352!\6\0"..., 4096, 0) = 4096

This succeeds, since on Gluster there is no alignment requirement for buffers.
The buffer used to do direct I/O is allocated on the Gluster node.

But the actual block size on this storage is 4096 bytes.
This is basically same issue as in bug 1737256.

3. Trying to detect request alignment by reading 512, 1024, 2048, 4096 bytes
   into aligned buffer:

18703 20:14:58.756835 pread64(9, 0x5562768ef200, 512, 0) = -1 EINVAL (Invalid argument)
18703 20:14:58.758469 pread64(9, 0x5562768ef200, 1024, 0) = -1 EINVAL (Invalid argument)
18703 20:14:58.760132 pread64(9, 0x5562768ef200, 2048, 0) = -1 EINVAL (Invalid argument)
18703 20:14:58.761631 pread64(9,  <unfinished ...>
18704 20:14:58.762213 <... nanosleep resumed>0x7fdde6e20a70) = 0
18704 20:14:58.762249 futex(0x5562752ea508, FUTEX_WAIT, 4294967295, NULL <unfinished ...>
18703 20:14:58.762530 <... pread64 resumed>"\353c\220\20\216\320\274\0\260\270\0\0\216\330\216\300\373\276\0|\277\0\6\271\0\2\363\244\352!\6\0"..., 4096, 0) = 4096

As expected, request alignment of 4096 bytes was detected.

4. Open target image (creating it)

18703 20:14:58.778105 openat(AT_FDCWD, "fedora29-clone.raw", O_RDWR|O_CREAT|O_CLOEXEC, 0644) = 10

5. Truncate target image to 0 and then to virtual size

18706 20:14:58.885829 ftruncate(10, 0)  = 0
...
18706 20:14:58.887700 ftruncate(10, 6442450944) = 0

6. Try to detect block alignment on target image

18703 20:14:58.933049 pread64(10, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096, 0) = 4096

This wrongly detect buffer alignment of 512 bytes like step 2.

7. Try to detect request alignment

18703 20:14:58.936354 pread64(10, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 512, 0) = 512

This wrongly detect 512 because when using xfs (used on gluster backed),
reading from unallocated file succceeds even with unaligned request
length (xfs probably optimize read when block is not allocated).

So now we have:

source:
    buf_align = 512
    request_alignment = 4096
    min_mem_alignment = 512    <<< uses buf_align
target:
    buf_align = 512
    request_alignment = 512
    min_mem_alignment = 512

We can argue that buf_align = 512 is correct, since Gluster does not require
any alignment - but qemu uses buf_align to checking iov_len

8. qemu fails to read 512 bytes from source image

18709 20:15:09.670293 preadv(9, [{iov_base=0x7fdde5013000, iov_len=2096640}, {iov_base=0x556276902000, iov_len=512}], 2, ) = -1 EINVAL (Invalid argument)

This is expected, since the underlying storage has sector size of 4096.


I tried the same flow using qemu-img with patch:
https://lists.nongnu.org/archive/html/qemu-block/2019-08/msg00133.html


$ strace -f -tt -o /tmp/convert-patched.trace /home/nsoffer/src/qemu/build/qemu-img convert -f raw -O raw -t none -T none fedora29.raw fedora29-clone.raw
qemu-img: error while writing sector 4194303: Invalid argument

This fixes the read failure, but fail to write to the target.


Looking at the new trace show:

1. Same flow for opening the source image and detecting block size

22894 20:49:00.487962 openat(AT_FDCWD, "fedora29.raw", O_RDONLY|O_DIRECT|O_CLOEXEC) = 9
...
22894 20:49:00.488662 pread64(9, "\353c\220\20\216\320\274\0\260\270\0\0\216\330\216\300\373\276\0|\277\0\6\271\0\2\363\244\352!\6\0"..., 4096, 0) = 4096
22894 20:49:00.489776 pread64(9, 0x5615d6044400, 512, 0) = -1 EINVAL (Invalid argument)
22894 20:49:00.491139 pread64(9, 0x5615d6044400, 1024, 0) = -1 EINVAL (Invalid argument)
22894 20:49:00.492392 pread64(9, 0x5615d6044400, 2048, 0) = -1 EINVAL (Invalid argument)
22894 20:49:00.493647 pread64(9, "\353c\220\20\216\320\274\0\260\270\0\0\216\330\216\300\373\276\0|\277\0\6\271\0\2\363\244\352!\6\0"..., 4096, 0) = 4096

2. Same flow for opening the target and detecting block size

22894 20:49:00.506364 openat(AT_FDCWD, "fedora29-clone.raw", O_RDWR|O_CREAT|O_CLOEXEC, 0644) = 10
...
22898 20:49:00.518228 ftruncate(10, 0)  = 0
...
22898 20:49:00.518545 ftruncate(10, 6442450944) = 0
...
22894 20:49:00.529136 pread64(10, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096, 0) = 4096
22894 20:49:00.529664 pread64(10, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 512, 0) = 512

source:
    buf_align = 512
    request_alignment = 4096
    min_mem_alignment = 4096    <<< uses max(buf_align, request_alignment)
target:
    buf_align = 512
    request_alignment = 512
    min_mem_alignment = 512


3. The read we failed before succeeds now

22912 20:49:11.086632 pread64(9, "6\317M5i?\332Qm\231\342\253\205+\303@ht\335\341\353\300\331\201\314\nb]3R\262\377"..., 2097152, 2145386496) = 2097152

4. Writing to target fails with EINVAL


22911 20:49:11.327101 pwrite64(10, "\246\243\366\4)U(\360\334^\345\277'\313\261\\\211a\302!P\360\300\263\222\2303\365\201\377\223/"..., 2097152, 2147483136 <unfinished ...>
22911 20:49:11.368238 <... pwrite64 resumed>) = -1 EINVAL (Invalid argument)

The write fails because 2147483136 is not aligned the block size:

>>> 2147483136 % 4096
3584


As a workaround, I tried to fix target image block size detection by using
existing file with the fist block allocated.
(management system can easily ensure this)

$ rm fedora29-clone.raw 

$ truncate -s 6g fedora29-clone.raw

$ dd if=/dev/zero bs=4096 count=1 of=fedora29-clone.raw conv=notrunc
1+0 records in
1+0 records out
4096 bytes (4.1 kB, 4.0 KiB) copied, 0.016563 s, 247 kB/s

$ ls -lhs fedora29-clone.raw 
4.0K -rw-rw-r--. 1 nsoffer nsoffer 6.0G Aug  7 21:05 fedora29-clone.raw

And now the command succeeded:

$ strace -f -tt -o /tmp/convert-patched-nocreate.trace /home/nsoffer/src/qemu/build/qemu-img convert -n -f raw -O raw -t none -T none fedora29.raw fedora29-clone.raw

Looking in the trace we can see that now that request alignment is detected correctly:

25185 21:06:42.432081 openat(AT_FDCWD, "fedora29-clone.raw", O_RDWR|O_DIRECT|O_CLOEXEC) = 10
...
25185 21:06:42.432841 pread64(10, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096, 0) = 4096
25185 21:06:42.434180 pread64(10, 0x555901256a00, 512, 0) = -1 EINVAL (Invalid argument)
25185 21:06:42.435651 pread64(10, 0x555901256a00, 1024, 0) = -1 EINVAL (Invalid argument)
25185 21:06:42.437179 pread64(10, 0x555901256a00, 2048, 0) = -1 EINVAL (Invalid argument)
25185 21:06:42.438611 pread64(10, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096, 0) = 4096


So we have 2 issues:
- Incorrect use of buf_align to verify that preadv() arguments are aligned
  (same as bug 1737256)
- Incorrect detection of target request alignment with unallocated file.

The second issue can be fixed by using existing file with -n argument.
I think this is good enough workaround for management system, but users
will fail in this trap.

Since qemu control the target file, it can allocate one block of zeros
before detecting the block size.

I did not try gluster:// since it failed for bug 1737256, and libgfapi is
not high priority for oVirt now.

Comment 1 Nir Soffer 2019-08-07 23:24:30 UTC
Created attachment 1601638 [details]
Traces from commands mentioned in comment 0

Comment 2 Ben Cotton 2019-08-13 16:53:57 UTC
This bug appears to have been reported against 'rawhide' during the Fedora 31 development cycle.
Changing version to '31'.

Comment 3 Ben Cotton 2019-08-13 17:33:59 UTC
This bug appears to have been reported against 'rawhide' during the Fedora 31 development cycle.
Changing version to 31.

Comment 4 Nir Soffer 2019-08-13 18:34:06 UTC
Patch posted for review:
https://lists.nongnu.org/archive/html/qemu-block/2019-08/msg00568.html

Comment 5 Nir Soffer 2019-08-17 00:18:57 UTC
Fixed upstream in:
https://github.com/qemu/qemu/commit/a6b257a08e3d72219f03e461a52152672fec0612

Comment 6 Fedora Update System 2019-09-04 20:15:50 UTC
FEDORA-2019-6a014367d5 has been submitted as an update to Fedora 31. https://bodhi.fedoraproject.org/updates/FEDORA-2019-6a014367d5

Comment 7 Fedora Update System 2019-09-04 21:35:36 UTC
qemu-4.1.0-2.fc31 has been pushed to the Fedora 31 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2019-6a014367d5

Comment 8 Fedora Update System 2019-09-28 00:01:39 UTC
qemu-4.1.0-2.fc31 has been pushed to the Fedora 31 stable repository. If problems still persist, please make note of it in this bug report.


Note You need to log in before you can comment on or make changes to this bug.