Bug 1519242

Summary: Provide a way to supply file.locking=off to qemu
Product: Red Hat Enterprise Linux 9 Reporter: Richard W.M. Jones <rjones>
Component: libvirtAssignee: Virtualization Maintenance <virt-maint>
Status: CLOSED DEFERRED QA Contact: Han Han <hhan>
Severity: medium Docs Contact:
Priority: medium    
Version: unspecifiedCC: dyuan, imomin, jomurphy, jsuchane, kchamart, kwolf, lmen, mkalinin, pkrempa, ptoscano, rjones, tzheng, virt-maint, xuzhang, yafu
Target Milestone: rcKeywords: Regression, Triaged
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2021-11-05 15:03:10 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 910269, 1809703, 1897025    

Description Richard W.M. Jones 2017-11-30 13:42:51 UTC
Description of problem:

For monitoring existing VMs which may be running, and may be
using qcow2, we need a way to bypass QEMU's locking completely.
(Of course this will only be done read-only, and garbage data
is to be expected and is handled by the client retrying).

NB this is a regression over RHEL 7.4.

Version-Release number of selected component (if applicable):

libvirt in RHEL 7.5

Additional info:

See also:
https://bugzilla.redhat.com/show_bug.cgi?id=1518517#c12

Comment 3 Peter Krempa 2018-01-31 11:57:39 UTC
Since I was asked by multiple parties whether this still is necessary I've investigated the interactions of libguestfs/qemu/libvirt in cases where libguestfs is asked to operate on a image file which is currently in use by a qemu process for read-write.

I've tested this by creating an image file and starting a libvirt VM using it and then attempting experiments with guestfish to access the image.

I've tested the following versions of software:
libguestfs-1.37.36-1.fc27.x86_64
libguestfs-tools-c-1.37.36-1.fc27.x86_64
git version of qemu (qemu-img-2.10.1-2.fc27.x86_64 does not want to create the overlay on an used image ... )

With the version above I tried to use the following commandline "guestfish --ro -v -x -a /var/lib/libvirt/images/upstream.qcow2":

First failure I've got from the usage of qemu-img in libguestfs:

libguestfs: creating COW overlay to protect original drive content
libguestfs: trace: disk_format "/var/lib/libvirt/images/upstream.qcow2"
libguestfs: command: run: qemu-img
libguestfs: command: run: \ info
libguestfs: command: run: \ --output json
libguestfs: command: run: \ /var/lib/libvirt/images/upstream.qcow2
qemu-img: Could not open '/var/lib/libvirt/images/upstream.qcow2': Failed to get shared "write" lock
Is another process using the image?
libguestfs: error: qemu-img info: /var/lib/libvirt/images/upstream.qcow2: qemu-img info exited with error status 1, see debug messages above

It looks like libguestfs is not using the '--force-share' switch for info when doing this which results in the failure. I've installed a shim script which injects --force-share for any qemu-img info call and then I've got the following results:

For the direct libguestfs qemu backend the command now worked as expected, since libguestfs has a hack installed which disables locking for the second image in the backing chain:
     drv->src.protocol == drive_protocol_file) {                                 
       append_list_format ("file.file.filename=%s", drv->overlay);               
       append_list ("file.driver=qcow2");                                        
       append_list ("file.backing.file.locking=off");           

For the libvirt backend I've got the expected locking failure when trying to start qemu process since we did not do anythin with it yet. This means that this BZ is necessary to make the above work with the libvirt backend.

I've also tried using the '-d' switch which uses all disks from a given libvirt domain and since it queries the disk image format from the XML I did not get the qemu-img failure, but then it behaved the same as above.

Possible solutions:

The currently only viable solution for libvirt is to use the 'force-share' attribute similarly as qemu-img requires. 'force-share' is ihnerited accross the backing chain so no -blockdev work is required for this. The only drawback of using 'force-share' is the necessity to declare the disk as '<readonly/>' as qemu does not accept the 'force-share' option otherwise.

The use of the approach similar to libguestfs (use file.locking='off') attribute is currently not possible as locking needs to be disabled for the backing file of the top level image and this requires -blockdev. (Or very ugly hacks that I'm not willing to do)

Once -blockdev work is complete it will be also possible to disable locking for individual members of the backing chain.

Comment 4 Peter Krempa 2018-01-31 12:01:55 UTC
Rich,

would the usage of <readonly/> be a problem for libguestfs? This would make it possible to use 'force-share' and thus at least make the libvirt driver work in a short term prior to implementing -blockdev fully.

Comment 5 Richard W.M. Jones 2018-01-31 18:45:53 UTC
Is --force-share the same as the -U option?
There was a proposal to add the -U option, but it didn't go upstream
(I can't find the link now).


(In reply to Peter Krempa from comment #4)
> Rich,
> 
> would the usage of <readonly/> be a problem for libguestfs? This would make
> it possible to use 'force-share' and thus at least make the libvirt driver
> work in a short term prior to implementing -blockdev fully.

I'm unclear where <readonly/> would be used.  Can you give some
example XML?

Comment 6 Peter Krempa 2018-02-01 09:05:40 UTC
(In reply to Richard W.M. Jones from comment #5)
> Is --force-share the same as the -U option?
> There was a proposal to add the -U option, but it didn't go upstream
> (I can't find the link now).

Yes -U is the short option for --force-share.

> (In reply to Peter Krempa from comment #4)
> > Rich,
> > 
> > would the usage of <readonly/> be a problem for libguestfs? This would make
> > it possible to use 'force-share' and thus at least make the libvirt driver
> > work in a short term prior to implementing -blockdev fully.
> 
> I'm unclear where <readonly/> would be used.  Can you give some
> example XML?

It would be part of the disk XML along with one extra attribute to enable the "force-share" option for the disk:

    <disk type='file' device='disk'>
      <driver name='qemu' type='qcow2'/>
      <source file='/var/lib/libvirt/images/upstream.qcow2'/>
      <readonly/>
      <target dev='vda' bus='virtio'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
    </disk>

I'm still thinking of how to present this feature in the XML, so I'll let you know once I have a design.

I was thinking of expressing it via a combination of the <shared/> and <readonly/> flags, but that looks too brittle to me. Other option is to add an atribute or element, but I need to think about implications and future use of that.

At any rate <readonly/> is required to be used with "force-share" which implies that qemu will deny all writes to the drive regardless of the overlay. Disks connected to SATA or IDE don't support <readonly/> for some reason.

Comment 7 Richard W.M. Jones 2018-02-01 09:11:31 UTC
It's still very unclear to me.  When a disk is added to libguestfs
with the readonly flag, we create a writable overlay on the disk.
That must be writable.  Are you saying the <readonly/> flag would
be added to the disk, or to the writable overlay?  In this scenario
the writable overlay "catches" all writes to the disk so nothing should
be written to the original disk.

When a disk is added to libguestfs without the readonly flag we don't
create any overlay, and the disk is used directly, and written to.

Comment 8 Peter Krempa 2018-02-01 09:54:35 UTC
(In reply to Richard W.M. Jones from comment #7)
> It's still very unclear to me.  When a disk is added to libguestfs
> with the readonly flag, we create a writable overlay on the disk.
> That must be writable.  Are you saying the <readonly/> flag would

The proposed solution by using "force-share=on" as an argument for the disk requires the disk to be fully read-only, so the <readonly/> flag would make that "writable" overlay non-writable.

If your use case requires it to be writable, the "force-share=on" approach is unusable in this case since qemu will not allow it without making the whole disk read-only and we need to force locking off via blockdev options.

My experiments with the 'force-share' options originated from the fact that it's a drive property and thus it's shared accross the backing chain automatically. Unfortunately the 'locking' blockdev option is not inherited through the backing chain, so it needs to be specified for the correct backing chain element.

I can currently suggest only one workaround. By specifying the backing file string as JSON when creating the overlay image you can inject arbitrary blockdev options for the backing file. Creating the overlay as:

qemu-img create -f qcow2 -b 'json:{"file":{"driver":"file","filename":"/var/lib/libvirt/images/upstream.qcow2","locking":"off"}}' -F qcow2 /tmp/test

Injects the locking option for the backing store and qemu happily ignores locking after being specified this way. As a added benefit I've observed that qemu-img create works properly even with the unfixed qemu-img binary.

I have no idea whether qemu block guys will be happy about such hack, but I don't see any other short term solution for this since using the 'locking' attribute will require wiring up -blockdev, which has some roadblocks still.

Comment 10 Richard W.M. Jones 2018-02-01 10:18:29 UTC
Yes the overlay must be writable.  Even operations like
‘mount -o ro’ write to the disk.

Comment 12 Peter Krempa 2018-02-01 18:36:17 UTC
(In reply to Richard W.M. Jones from comment #10)
> Yes the overlay must be writable.  Even operations like
> ‘mount -o ro’ write to the disk.

Indeed it does not even allow to mount the filesystem in such case:
[   69.511033] EXT4-fs (sdb1): INFO: recovery required on readonly filesystem
[   69.511650] EXT4-fs (sdb1): write access unavailable, cannot proceed

Well this rules out usage of "force-share" in this case. Is there any other option we can use in qemu (besides using -blockdev's file.locking=off) to allow libguestfs working as it was before?

Comment 13 Kevin Wolf 2018-02-01 20:58:59 UTC
The solution that you want to have in the end is backing.force-shared=on. What libguestfs is doing when directly talking to qemu is actually a bit over the top, there is no reason to fully disable locking.

Unfortunately, I expect that this won't be much easier to set for you than backing.file.locking=off. Without being able to set options for non-toplevel nodes, I'm afraid I don't have anything to offer.

I mean, you can always add some <writeable-backing-chain /> tag and translate that into a literal "backing.force-shared=on" even without actually manging the backing node explicitly, but I can see why you're reluctant to do this.

Comment 14 Peter Krempa 2018-02-02 12:14:14 UTC
(In reply to Kevin Wolf from comment #13)
> The solution that you want to have in the end is backing.force-shared=on.
> What libguestfs is doing when directly talking to qemu is actually a bit
> over the top, there is no reason to fully disable locking.
> 
> Unfortunately, I expect that this won't be much easier to set for you than
> backing.file.locking=off. Without being able to set options for non-toplevel
> nodes, I'm afraid I don't have anything to offer.
> 
> I mean, you can always add some <writeable-backing-chain /> tag and
> translate that into a literal "backing.force-shared=on" even without
> actually manging the backing node explicitly, but I can see why you're
> reluctant to do this.

Well, actually if qemu then propagates the 'force-shared' attribute to all backing files it would be possible.

The problem here is that we can't express a property of the backing image in the XML, but as long as we can sell it as a property of the disk and it get's automatically propagated to every backing store member it might be possible to use this approach.

I'll explore this and report my findings.

Comment 15 Peter Krempa 2018-02-06 12:58:47 UTC
Given the current situation in libvirt it's really awkward to express configuration regarding the backing volumes in the disk XML part, especially since when 'backing.force-shared=on' is added to the disk command line which does not have a backing image, qemu will fail to start.

The workaround using the 'json:{}' pseudo-protocol specification that I've mentioned in Comment 8 would help libguestfs both in the libvirt and in the direct backend case.

Comment 17 Pino Toscano 2018-08-10 12:46:07 UTC
(In reply to Peter Krempa from comment #8)
> I can currently suggest only one workaround. By specifying the backing file
> string as JSON when creating the overlay image you can inject arbitrary
> blockdev options for the backing file. Creating the overlay as:
> 
> qemu-img create -f qcow2 -b
> 'json:{"file":{"driver":"file","filename":"/var/lib/libvirt/images/upstream.
> qcow2","locking":"off"}}' -F qcow2 /tmp/test

For sake of completion (and for self-documentation, to make sure I do not forget about it), discussing it Peter (thanks!) resulted in the following version/feature constraints for libguestfs' case:
- qemu rejects unknown properties in the JSON, so unconditionally passing locking:off is not an option
- qemu >= 2.8 (not totally sure though, but surely not very old versions) supports the pseudo json: URLs
- qemu >= 2.10 supports the locking for file; luckly in libguestfs we already do a (too simplistic though) QMP-based detection of locking (see lib/qemu.c, guestfs_int_qemu_mandatory_locking)
- libvirt >= 2.1.0 supports the pseudo json: URLs for backing chains; older version cannot parse than, causing libvirt to error out
- creating the overlay this way will be fine with both our qemu, and libvirt backends; libvirt (2.1.0+, as per points above) will do the right thing

Comment 18 Kevin Wolf 2018-08-10 12:52:33 UTC
(In reply to Pino Toscano from comment #17)
> - qemu >= 2.8 (not totally sure though, but surely not very old versions)
> supports the pseudo json: URLs

It's qemu >= 2.1, in fact.

Comment 20 Jaroslav Suchanek 2019-04-24 12:26:46 UTC
This bug is going to be addressed in next major release.

Comment 24 John Ferlan 2021-09-09 16:03:53 UTC
Bulk update: Move RHEL-AV bugs to RHEL9. If necessary to resolve in RHEL8, then clone to the current RHEL8 release.

Comment 25 Jaroslav Suchanek 2021-11-05 15:03:10 UTC
This bug was closed deferred as a result of bug triage.

Please reopen if you disagree and provide justification why this bug should
get enough priority. Most important would be information about impact on
customer or layered product. Please indicate requested target release.