Bug 1891673 - Unable to create qcow2 image via qemu-img command on Windows 2019 based NFS storage pool
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat Enterprise Linux Advanced Virtualization
Classification: Red Hat
Component: qemu-kvm
Version: ---
Hardware: x86_64
OS: Linux
Priority: low
Severity: low
Target Milestone: rc
Target Release: 8.3
Assignee: Hanna Czenczek
QA Contact: Xueqiang Wei
URL:
Whiteboard:
Duplicates: 1892185
Depends On:
Blocks: 1892185
 
Reported: 2020-10-27 03:28 UTC by SUNYONG PARK
Modified: 2021-03-22 15:20 UTC
CC List: 6 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-03-22 15:19:59 UTC
Type: Bug
Target Upstream Version:
Embargoed:



Description SUNYONG PARK 2020-10-27 03:28:54 UTC
Description of problem:

With qemu-img-ev-2.12.0-44.1.el7_8.1.x86_64, the qemu-img command cannot create a qcow2 image on a Windows 2019 based NFS storage pool.

However, qemu-img-ev-2.9.0-16.el7_4.14.1 works normally.

Version-Release number of selected component (if applicable):
2.12.0-44.1

How reproducible:
Always

Steps to Reproduce:
1. On Windows Server 2019, set up NFS storage using the Storage Share service.
2. Mount the NFS storage on the server running qemu-kvm-ev.
3. Create a 1 GB image with the command "qemu-img create -f qcow2 test.qcow2 1G".

Actual results:
Formatting 'test.qcow2', fmt=qcow2 size=1073741824 cluster_size=65536 lazy_refcounts=off refcount_bits=16
qemu-img: test.qcow2: Image is not in qcow2 format


Expected results:
A 1 GB image, "test.qcow2", is created.

Additional info:
Tested with qemu-kvm-ev 2.9.0 on the same NFS storage; it works fine.

Comment 2 Xueqiang Wei 2020-10-27 16:24:29 UTC
Tested with qemu-kvm-rhev-2.9.0-16.el7_4.14; did not hit this issue.

Versions:
kernel-3.10.0-1127.18.2.el7.x86_64
qemu-kvm-rhev-2.9.0-16.el7_4.14


Steps:
# mount -t nfs 10.73.75.104:/E/Shares/win2019_nfs_server /home/win_nfs_test/

# qemu-img create -f qcow2 /home/win_nfs_test/test.qcow2 1G
Formatting '/home/win_nfs_test/test.qcow2', fmt=qcow2 size=1073741824 encryption=off cluster_size=65536 lazy_refcounts=off refcount_bits=16

# qemu-img info /home/win_nfs_test/test.qcow2 
image: /home/win_nfs_test/test.qcow2
file format: qcow2
virtual size: 1.0G (1073741824 bytes)
disk size: 193K
cluster_size: 65536
Format specific information:
    compat: 1.1
    lazy refcounts: false
    refcount bits: 16
    corrupt: false



Reproduced it with qemu-kvm-rhev-2.12.0-44.el7_8.1.

Versions:
kernel-3.10.0-1127.18.2.el7.x86_64
qemu-kvm-rhev-2.12.0-44.el7_8.1


Steps:
# mount -t nfs 10.73.75.104:/E/Shares/win2019_nfs_server /home/win_nfs_test/

# rm -rf /home/win_nfs_test/test.qcow2
# qemu-img create -f qcow2 /home/win_nfs_test/test.qcow2 1G
Formatting '/home/win_nfs_test/test.qcow2', fmt=qcow2 size=1073741824 cluster_size=65536 lazy_refcounts=off refcount_bits=16
qemu-img: /home/win_nfs_test/test.qcow2: Image is not in qcow2 format

# qemu-img info /home/win_nfs_test/test.qcow2 
image: /home/win_nfs_test/test.qcow2
file format: raw
virtual size: 0 (0 bytes)
disk size: 0

# qemu-img create -f qcow2 /home/win_nfs_test/test.qcow2 1G
Formatting '/home/win_nfs_test/test.qcow2', fmt=qcow2 size=1073741824 cluster_size=65536 lazy_refcounts=off refcount_bits=16
qemu-img: /home/win_nfs_test/test.qcow2: Failed to get "resize" lock
Is another process using the image [/home/win_nfs_test/test.qcow2]?



Tested with raw format; also hit this issue.

# qemu-img create -f raw /home/win_nfs_test/test.raw 1G
Formatting '/home/win_nfs_test/test.raw', fmt=raw size=1073741824
qemu-img: /home/win_nfs_test/test.raw: Failed to get "resize" lock
Is another process using the image [/home/win_nfs_test/test.raw]?

# qemu-img info /home/win_nfs_test/test.raw 
image: /home/win_nfs_test/test.raw
file format: raw
virtual size: 0 (0 bytes)
disk size: 0

Comment 4 Ademar Reis 2020-11-05 14:26:57 UTC
This happens because QEMU introduced support for locking in qcow2 (to prevent an image from being opened by multiple apps or users), and this server doesn't seem to support locking, or has it disabled.

Max: the error message is confusing though, see inline comments:


(In reply to Xueqiang Wei from comment #2)
> Reproduced it with qemu-kvm-rhev-2.12.0-44.el7_8.1.
> 
> Versions:
> kernel-3.10.0-1127.18.2.el7.x86_64
> qemu-kvm-rhev-2.12.0-44.el7_8.1
> 
> 
> Steps:
> # mount -t nfs 10.73.75.104:/E/Shares/win2019_nfs_server /home/win_nfs_test/
> 
> # rm -rf /home/win_nfs_test/test.qcow2
> # qemu-img create -f qcow2 /home/win_nfs_test/test.qcow2 1G
> Formatting '/home/win_nfs_test/test.qcow2', fmt=qcow2 size=1073741824
> cluster_size=65536 lazy_refcounts=off refcount_bits=16
> qemu-img: /home/win_nfs_test/test.qcow2: Image is not in qcow2 format

"Image is not in qcow2 format" doesn't seem to be right.

> 
> # qemu-img info /home/win_nfs_test/test.qcow2 
> image: /home/win_nfs_test/test.qcow2
> file format: raw
> virtual size: 0 (0 bytes)
> disk size: 0

Apparently the image was created, but empty.

> 
> # qemu-img create -f qcow2 /home/win_nfs_test/test.qcow2 1G
> Formatting '/home/win_nfs_test/test.qcow2', fmt=qcow2 size=1073741824
> cluster_size=65536 lazy_refcounts=off refcount_bits=16
> qemu-img: /home/win_nfs_test/test.qcow2: Failed to get "resize" lock
> Is another process using the image [/home/win_nfs_test/test.qcow2]?

This gives us a good hint. I wonder if there's a way to distinguish between failing because locking is not supported and failing because the lock is held.

> 
> Tested with raw format, also hit this issue.
> 
> # qemu-img create -f raw /home/win_nfs_test/test.raw 1G
> Formatting '/home/win_nfs_test/test.raw', fmt=raw size=1073741824
> qemu-img: /home/win_nfs_test/test.raw: Failed to get "resize" lock
> Is another process using the image [/home/win_nfs_test/test.raw]?
> 
> # qemu-img info /home/win_nfs_test/test.raw 
> image: /home/win_nfs_test/test.raw
> file format: raw
> virtual size: 0 (0 bytes)
> disk size: 0

For Max:
 * Any additional hints or advice on what to enable in the NFS server to get locking support?
 * This BZ is in RHEL-7, with QEMU-2.12. How is this handled in recent QEMU versions? If things are OK in RHEL-8, we can close this BZ, as support for locking in the NFS server is a requirement to have this working.


For the reporter, SUNYONG PARK:

We use reports like yours to keep improving the quality of our products and releases. That said, we're not able to guarantee the timeliness or suitability of a resolution for issues entered here because this is not a mechanism for requesting support.

If this issue is critical or in any way time sensitive, please raise a ticket through your regular Red Hat support channels to make certain it receives the proper attention and prioritization that will result in a timely resolution.

For information on how to contact the Red Hat production support team, please visit: https://www.redhat.com/support/process/production/#howto

Comment 5 Hanna Czenczek 2020-11-10 16:39:53 UTC
Seeing that this is about a Windows-based NFS server (i.e. not the Linux NFS server), I wonder whether this is a duplicate of BZ 1817640 (where a Dell/EMC NFS server was used).

I don’t think the problem is locking support, because if I deliberately break all file locking calls in qemu, this is the result:

$ ./qemu-img create -f qcow2 /mnt/tmp/foo.qcow2 64M
Formatting '/mnt/tmp/foo.qcow2', fmt=qcow2 size=67108864 cluster_size=65536 lazy_refcounts=off refcount_bits=16
qemu-img: /mnt/tmp/foo.qcow2: Failed to lock byte 101

What would help is an strace of qemu-img create (i.e. "strace -f -o log-file qemu-img create -f qcow2 /home/win_nfs_test/test.qcow2 1G").

For BZ 1817640, what "helped" was to revert commit e56693b6f482bf – but that is by no means anything that we should do.  It just showed that there was most likely a bug in the NFS server.  So to see whether this is the same bug, I’ve again (like in BZ 1817640 comment 20) created a build of 2.12.0-44.el7 with a revert of e56693b6f482bf on top:

http://brew-task-repos.usersys.redhat.com/repos/scratch/mreitz/qemu-kvm-rhev/2.12.0/44.el7.maxx202011101709/

For example, the x86-64 RPM of qemu-img is here: http://brew-task-repos.usersys.redhat.com/repos/scratch/mreitz/qemu-kvm-rhev/2.12.0/44.el7.maxx202011101709/x86_64/qemu-img-rhev-2.12.0-44.el7.maxx202011101709.x86_64.rpm

It would be good if someone who can reproduce the failure could test this qemu-img version.

As for NFS options to work around the issue, the only thing I can imagine would be to provide the "nolock" option to mount (this will make the client kernel keep locks local on the client instead of syncing them over NFS, so NFS support for locks becomes irrelevant).  I don't know whether that will help, though.

Max

Comment 6 Hanna Czenczek 2020-11-10 16:45:02 UTC
(I noticed that I should probably make the possible next steps more visible, so) here is what it would be great for someone who can reproduce the failure to try:

(1) "strace -f -o log-file qemu-img create -f qcow2 /path/to/mount/test.qcow2 1G" and attach the strace log here or upload it somewhere and link it here (using a qemu-img version that fails to create that image);
(2) Install the qemu-img RPM linked in comment 5 and test whether that shows the same error;
(3) Check whether passing "nolock" to the NFS mount allows qemu-img to create the image.
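For step (1), the strace log of a single qemu-img run can be long. As a rough aid, the sketch below keeps only the image header writes/reads and the fcntl() lock calls. The helper name `filter_strace` is hypothetical, and the grep pattern is an assumption about which syscalls qemu-img uses for image I/O and locking.

```sh
#!/bin/sh
# Hypothetical helper: print only the strace lines showing image I/O
# (pwrite64/pread64, including "<... resumed>" lines) and fcntl() byte-range
# lock calls (F_SETLK/F_GETLK and their F_OFD_* variants).
# The pattern is an assumption about which calls matter, not an exhaustive list.
filter_strace() {
    grep -E 'pwrite64|pread64|F_(OFD_)?(SET|GET)LK' "$1"
}
```

Usage: `filter_strace log-file`, with the file name that was passed to `strace -o`.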

Max

Comment 7 Xueqiang Wei 2020-11-12 05:55:09 UTC
(In reply to Max Reitz from comment #6)
> (I noticed that I should probably make the possible next steps more visible,
> so) this is what would be great if someone who can reproduce the failure
> tried:
> 
> (1) "strace -f -o log-file qemu-img create -f qcow2
> /path/to/mount/test.qcow2 1G" and attach the strace log here or upload it
> somewhere and link it here (using a qemu-img version that fails to create
> that image);
> (2) Install the qemu-img RPM linked in comment 5 and test whether that shows
> the same error;
> (3) Try whether passing "nolock" to the NFS mount allows qemu-img to create
> the image.
> 
> Max


(1) Failed to create image with qemu-img
Versions:
kernel-3.10.0-1160.6.1.el7.x86_64
qemu-kvm-rhev-2.12.0-44.el7_8.1

# mount -t nfs 10.73.75.104:/E/Shares/win2019_nfs_server /home/win_nfs_test/
# strace -f -o log-file-0 qemu-img create -f qcow2 /home/win_nfs_test/test.qcow2 1G
# strace -f -o log-file-1 qemu-img create -f qcow2 /home/win_nfs_test/test.qcow2 1G

strace log:
http://fileshare.englab.nay.redhat.com/pub/section2/kvm/xuwei/win2019_nfs_server/log-file-0
http://fileshare.englab.nay.redhat.com/pub/section2/kvm/xuwei/win2019_nfs_server/log-file-1


(2) Tested with qemu-img-rhev-2.12.0-44.el7.maxx202011101709; also hit this issue.
Versions:
kernel-3.10.0-1160.6.1.el7.x86_64
qemu-kvm-rhev-2.12.0-44.el7.maxx202011101709
qemu-img-rhev-2.12.0-44.el7.maxx202011101709

# mount -t nfs 10.73.75.104:/E/Shares/win2019_nfs_server /home/win_nfs_test/

# strace -f -o log-file-3 qemu-img create -f qcow2 /home/win_nfs_test/test.qcow2 1G
Formatting '/home/win_nfs_test/test.qcow2', fmt=qcow2 size=1073741824 cluster_size=65536 lazy_refcounts=off refcount_bits=16
qemu-img: /home/win_nfs_test/test.qcow2: Failed to get "resize" lock
Is another process using the image [/home/win_nfs_test/test.qcow2]?

# qemu-img info /home/win_nfs_test/test.qcow2 
image: /home/win_nfs_test/test.qcow2
file format: raw
virtual size: 0 (0 bytes)
disk size: 0

strace log:
http://fileshare.englab.nay.redhat.com/pub/section2/kvm/xuwei/win2019_nfs_server/log-file-3



Tested with nolock; it works well.
# mount -t nfs -o nolock  10.73.75.104:/E/Shares/win2019_nfs_server /home/win_nfs_test/

# strace -f -o log-file-4 qemu-img create -f qcow2 /home/win_nfs_test/test.qcow2 1G
Formatting '/home/win_nfs_test/test.qcow2', fmt=qcow2 size=1073741824 cluster_size=65536 lazy_refcounts=off refcount_bits=16

# qemu-img info /home/win_nfs_test/test.qcow2 
image: /home/win_nfs_test/test.qcow2
file format: qcow2
virtual size: 1.0G (1073741824 bytes)
disk size: 193K
cluster_size: 65536
Format specific information:
    compat: 1.1
    lazy refcounts: false
    refcount bits: 16
    corrupt: false

strace log:
http://fileshare.englab.nay.redhat.com/pub/section2/kvm/xuwei/win2019_nfs_server/log-file-4



(3) Tried with nolock; qemu-img created the image successfully.
Versions:
kernel-3.10.0-1160.6.1.el7.x86_64
qemu-kvm-rhev-2.12.0-44.el7_8.1

# mount -t nfs -o nolock  10.73.75.104:/E/Shares/win2019_nfs_server /home/win_nfs_test/

# strace -f -o log-file-2 qemu-img create -f qcow2 /home/win_nfs_test/test.qcow2 1G
Formatting '/home/win_nfs_test/test.qcow2', fmt=qcow2 size=1073741824 cluster_size=65536 lazy_refcounts=off refcount_bits=16

# qemu-img info /home/win_nfs_test/test.qcow2 
image: /home/win_nfs_test/test.qcow2
file format: qcow2
virtual size: 1.0G (1073741824 bytes)
disk size: 193K
cluster_size: 65536
Format specific information:
    compat: 1.1
    lazy refcounts: false
    refcount bits: 16
    corrupt: false

strace log:
http://fileshare.englab.nay.redhat.com/pub/section2/kvm/xuwei/win2019_nfs_server/log-file-2

Comment 8 Hanna Czenczek 2020-11-12 13:37:53 UTC
Hi,

Thanks a lot for testing.  log-file-0 shows the same thing we saw in BZ 1817640:

21323 pwrite64(10, "QFI\373\0\0\0\3\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\20\0\0\0\0\0\0\0\0"..., 65536, 0 <unfinished ...>
21323 <... pwrite64 resumed>)           = 65536
21323 pwrite64(10, "\0\0\0\0\0\2\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 131072, 65536 <unfinished ...>
21323 <... pwrite64 resumed>)           = 131072
[Some locking]
21323 pread64(10, "", 104, 0)           = 0

So qemu-img writes the image header into the file, then tries to read it back (after some locking operations), and it is suddenly no longer there.  That doesn’t look like qemu’s fault.
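The vanished header can also be spotted without strace by checking whether the file starts with the qcow2 magic, i.e. the "QFI\373" bytes written by the first pwrite64 above. A minimal sketch, assuming an `od` that supports `-A n -N 4 -t x1` (POSIX od does); the helper name `has_qcow2_magic` is hypothetical:

```sh
#!/bin/sh
# Hypothetical helper: succeed if the file begins with the qcow2 magic
# "QFI\373" (hex 51 46 49 fb), the header qemu-img writes at offset 0.
has_qcow2_magic() {
    # Dump the first 4 bytes as hex and strip all whitespace.
    magic=$(od -A n -N 4 -t x1 "$1" 2>/dev/null | tr -d ' \t\n')
    [ "$magic" = "514649fb" ]
}
```

For example, `has_qcow2_magic /home/win_nfs_test/test.qcow2 || echo "header missing"`. The 0-byte files that qemu-img info reported in the failing runs would fail this check.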

Since it’s the same pattern as in BZ 1817640, I would have expected the revert to have an effect, but it doesn’t.  What seems to be the problem is this:

22835 fcntl(10, F_OFD_GETLK, {l_type=F_RDLCK, l_whence=SEEK_SET, l_start=3548955689268246127, l_len=0, l_pid=18446744073709551615}) = 0

Comparing with another strace where this GETLK succeeded (e.g. log-file-0), we can see that qemu tried to inquire about offset 203, which is the reported “resize” lock.
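To see why offset 203 maps to the "resize" lock (and why the deliberately broken build in comment 5 reports byte 101), here is a sketch of the lock byte layout. It assumes the scheme in QEMU's block/file-posix.c: permission bit i is locked at byte 100 + i (RAW_LOCK_PERM_BASE) and its shared counterpart at byte 200 + i (RAW_LOCK_SHARED_BASE), with bits ordered consistent read, write, write unchanged, resize, change children. The helper name `lock_byte_info` is hypothetical.

```sh
#!/bin/sh
# Hypothetical helper: decode a qemu image lock byte offset.
# Assumption: 100 + i is the "perm" byte and 200 + i the "shared" byte for
# permission bit i, in the bit order listed in the case statement below.
lock_byte_info() {
    off=$1
    if [ "$off" -ge 200 ]; then
        base=200; kind=shared
    elif [ "$off" -ge 100 ]; then
        base=100; kind=perm
    else
        echo unknown; return 1
    fi
    case $((off - base)) in
        0) name="consistent read" ;;
        1) name="write" ;;
        2) name="write unchanged" ;;
        3) name="resize" ;;
        4) name="change children" ;;
        *) echo unknown; return 1 ;;
    esac
    echo "$kind: $name"
}

lock_byte_info 101   # perm: write
lock_byte_info 203   # shared: resize
```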

As seen in the strace, the kernel returns some weird l_start (which is actually a string that reads “1@lenovo”; in log-file-1, it’s “0@lenovo”), an l_pid of -1 (which is OK), and a length of 0.
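The "1@lenovo" reading can be checked by printing the 64-bit l_start value as 8 bytes, most significant byte first (3548955689268246127 is 0x31406c656e6f766f). A minimal sketch; the helper name `lstart_to_text` is hypothetical:

```sh
#!/bin/sh
# Hypothetical helper: print a 64-bit integer as 8 characters,
# most significant byte first.
lstart_to_text() {
    # 16 hex digits, split into one byte (2 digits) per line.
    printf '%016x\n' "$1" | fold -w 2 | while read -r pair; do
        # Convert the hex pair to a 3-digit octal escape and print that byte.
        printf "\\$(printf '%03o' "$((0x$pair))")"
    done
    printf '\n'
}

lstart_to_text 3548955689268246127   # prints 1@lenovo
```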

So apparently the kernel is of the opinion that there indeed is a conflicting lock on the file, although it’s in a very different position and so shouldn’t interfere with qemu’s lock at all.

Since the error disappears with the nolock mount option, my best guess is that some other software (the strings “x@lenovo” may be a clue), either on the NFS server, on some other client connected to the server, or perhaps the NFS server itself, takes a lock on the image file immediately after it is created and so prevents qemu-img from taking its locks.  When that race results in a failure, the error message “Failed to get "resize" lock” appears; when it doesn’t, BZ 1817640 becomes apparent with the error message “Image is not in qcow2 format”.

Why the “x@lenovo” locks interfere at all, I don’t know, because their l_start (probably a symbolic constant) and l_len of 0 mean that they don’t intersect with qemu’s locks.  Perhaps that’s a problem in the NFS server, too.
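For reference, POSIX fcntl() locks cover the range starting at l_start, and an l_len of 0 extends the lock to the largest possible file offset. The sketch below applies that rule to qemu’s 1-byte lock at offset 203 and the reported lock at l_start=3548955689268246127, l_len=0. The helper name `ranges_overlap` is hypothetical, and it does not guard against integer overflow near the 64-bit limit.

```sh
#!/bin/sh
# Hypothetical helper: do two fcntl-style byte ranges overlap?
# Arguments: start1 len1 start2 len2; a length of 0 means "to the largest
# possible offset". Does not guard against overflow of start + len.
ranges_overlap() {
    s1=$1 l1=$2 s2=$3 l2=$4
    # Range 1 ends before range 2 starts (only possible if len1 is finite).
    if [ "$l1" -ne 0 ] && [ $((s1 + l1)) -le "$s2" ]; then return 1; fi
    # Range 2 ends before range 1 starts (only possible if len2 is finite).
    if [ "$l2" -ne 0 ] && [ $((s2 + l2)) -le "$s1" ]; then return 1; fi
    return 0
}

if ranges_overlap 203 1 3548955689268246127 0; then
    echo "overlap"
else
    echo "no overlap"   # taken: byte 203 lies before the other range starts
fi
```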


So, summarizing:

As far as I can tell, there are two problems, and neither seems to be qemu’s fault.

First, there is something that takes a file lock on the newly created image file at some symbolic offset that spells the strings “0@lenovo” or “1@lenovo”.  This races with qemu’s locks (even though it shouldn’t) and so sometimes qemu reports “Failed to get "resize" lock”.  Note that this is not part of the original report in comment 0, only of the reproduction in comment 2.  It seems like there is some concurrently running software that takes locks on the newly created image file and thus interferes with qemu-img.

Second, if that race doesn’t result in an error, we get the same behavior as in BZ 1817640: qemu-img writes the qcow2 image header, performs some file locking operations, and then the image header has suddenly disappeared.  qemu-img then reports “Image is not in qcow2 format”, which is the original report in comment 0.


(As such, I would expect the RPM provided in comment 5 to sometimes succeed in creating an image.  When it fails, the error should always be “Failed to get (some) lock”, but never “Image is not in qcow2 format”.)

Max

Comment 9 Ademar Reis 2020-11-12 14:42:29 UTC
*** Bug 1892185 has been marked as a duplicate of this bug. ***

Comment 10 Ademar Reis 2020-11-12 14:44:59 UTC
Moving to RHEL-8-AV, as we do not intend to fix this in RHEL-7, which is EUS.

SUNYONG PARK: 

If this issue is critical or in any way time sensitive, please raise a ticket through your regular Red Hat support channels to make certain it receives the proper attention and prioritization that will result in a timely resolution.

For information on how to contact the Red Hat production support team, please visit: https://www.redhat.com/support/process/production/#howto

Comment 12 Hanna Czenczek 2021-03-22 15:19:59 UTC
As laid out in comment 8, I can find no bug in qemu, therefore I’m closing this.

Max

