Bug 1432523

Summary: Enable QEMU image file locking by default
Product: Red Hat Enterprise Linux 7
Reporter: Ademar Reis <areis>
Component: qemu-kvm-rhev
Assignee: Fam Zheng <famz>
Status: CLOSED NOTABUG
QA Contact: Virtualization Bugs <virt-bugs>
Severity: unspecified
Docs Contact:
Priority: high
Version: 7.0
CC: aliang, berrange, chayang, coli, famz, hachen, hhan, jsuchane, juzhang, knoel, meyang, michen, mtessun, ngu, pingl, qzhang, redhat-bugzilla, rjones, virt-maint, xuwei
Target Milestone: rc
Keywords: FutureFeature
Target Release: 7.4
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: Enhancement
Doc Text:
Story Points: ---
Clone Of: 1378241
Environment:
Last Closed: 2017-04-24 14:47:17 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1378241
Bug Blocks: 1378242, 1415250, 1415252, 1417306

Description Ademar Reis 2017-03-15 14:52:23 UTC
QEMU Image locking is being introduced but disabled by default (Bug 1378241). The plan is to enable it by default in a future release, hence this BZ.

+++ This bug was initially created as a clone of Bug #1378241 +++

QEMU image locking should prevent concurrent use of an image by multiple QEMU processes, or by qemu-img while a VM is running.

Upstream series (v7): https://lists.gnu.org/archive/html/qemu-devel/2016-08/msg01306.html

--- Additional comment from Martin Tessun on 2017-02-13 08:13:48 BRST ---

As I understand it, the solution only works on the local machine, so it will not help if the block device is also consumed by another host making changes to it.

That said, the RHV team solves a similar problem, preventing the same image from being accessed twice, by using SANlock.

So maybe we can just build on this work and get the SANlock-based solution implemented in libvirt (libvirt would then require SANlock for the locking).

Perhaps we should discuss this with the RHV storage team to see whether this is the more generic and easier-to-implement approach.

Thoughts?
Martin

--- Additional comment from Fam Zheng on 2017-02-13 10:58:14 BRST ---

We deliberately want QEMU to lock the images itself instead of relying on libvirt or RHV, to protect against manual invocation of QEMU commands, such as running 'qemu-img snapshot' against an in-use qcow2 image, which is a very common but dangerous misuse.

IMO this, SANlock, and libvirt's virtlockd (which has a SANlock plugin too) can go together.

--- Additional comment from Martin Tessun on 2017-02-13 11:42:47 BRST ---

(In reply to Fam Zheng from comment #8)
> We deliberately want QEMU to lock the images by itself instead of relying on
> libvirt or RHV, to protect manual invocation of QEMU commands, such as
> 'qemu-img snapshot' against an in-use qcow2 image, which is a very common
> but dangerous misuse.
> 

Ack. So does this also prevent use of this volume from another host that has access to the same block device the image lives on?
If not, I believe we are missing the most important use case for RHV and OSP (let's say enterprise) installations, as these typically have multiple hosts that might access the image concurrently.

> IMO this, SANlock and libvirt virtlockd (which has a SANlock plugin too) can
> go together.

Ack. Maybe, to resolve the above concerns, a "joint" approach between QEMU and libvirt is needed, to ensure that "remote" access to the image is denied as well.

--- Additional comment from Fam Zheng on 2017-02-13 12:07:45 BRST ---

(In reply to Martin Tessun from comment #9)
> So does this also prevent to use this volume from another host that has
> access to the same block device where the image does live on?

The current work in progress focuses on file-system-based locks (OFD locks), which protect local images and NFS, but has not yet considered block-based solutions like SANlock or SCSI reservations. Gluster and Ceph are developing their own locking implementations, which could be integrated with (or even be transparently beneficial to) QEMU in the future.

--- Additional comment from Ademar Reis on 2017-02-13 12:41:15 BRST ---

(In reply to Martin Tessun from comment #9)
> (In reply to Fam Zheng from comment #8)
> > We deliberately want QEMU to lock the images by itself instead of relying on
> > libvirt or RHV, to protect manual invocation of QEMU commands, such as
> > 'qemu-img snapshot' against an in-use qcow2 image, which is a very common
> > but dangerous misuse.
> > 
> 
> Ack. So does this also prevent use of this volume from another host that has
> access to the same block device where the image does live on?
> If not, I believe we are missing the most important usecase for RHV and OSP
> (let's say enterprise) installations, as these typically have more hosts
> that might access the image concurrently.
>

I fully agree and we have discussed this in the recent past, concluding that the only viable solution appears to be some sort of monitoring done by libvirt on all block devices being used by a VM:

    Bug 1337005 - Log event when a block device in use by a guest is open read-write by external applications

The bug description contains a good explanation for the motivation and the use-cases that it covers.

But we (libvirt) need to investigate the idea and potential solutions further.

--- Additional comment from Daniel Berrange on 2017-03-08 12:33:49 BRT ---

(In reply to Ademar Reis from comment #11)
> (In reply to Martin Tessun from comment #9)
> > (In reply to Fam Zheng from comment #8)
> > > We deliberately want QEMU to lock the images by itself instead of relying on
> > > libvirt or RHV, to protect manual invocation of QEMU commands, such as
> > > 'qemu-img snapshot' against an in-use qcow2 image, which is a very common
> > > but dangerous misuse.
> > > 
> > 
> > Ack. So does this also prevent use of this volume from another host that has
> > access to the same block device where the image does live on?
> > If not, I believe we are missing the most important usecase for RHV and OSP
> > (let's say enterprise) installations, as these typically have more hosts
> > that might access the image concurrently.
> >
> 
> I fully agree and we have discussed this in the recent past, concluding that
> the only viable solution appears to be some sort of monitoring done by
> libvirt on all block devices being used by a VM:
> 
>     Bug 1337005 - Log event when a block device in use by a guest is open
> read-write by external applications
> 
> The bug description contains a good explanation for the motivation and the
> use-cases that it covers.
> 
> But we (libvirt) need to investigate and better research the idea and
> potential solutions.

Libvirt already has a framework for disk locking that can protect against concurrent access from multiple hosts. It has two plugins: one using SANlock and one using virtlockd. The virtlockd daemon uses fcntl() locks, but does not have to apply them directly to the disk file; it can use a lock-aside lock file named after some unique identifier.

For example, you can mount an NFS volume at /var/lib/libvirt/lockspace and use the SCSI WWID or LVM UUID as the name of a lock file to create in that directory. This of course requires that all nodes in the cluster have that NFS volume mounted, but it does not require high-bandwidth data transfer for this volume. There is also scope for creating new libvirt plugins using other mechanisms if someone comes up with other ideas.

Comment 1 Ademar Reis 2017-04-24 14:47:17 UTC
(In reply to Ademar Reis from comment #0)
> QEMU Image locking is being introduced but disabled by default (Bug
> 1378241). The plan is to enable it by default in a future release, hence
> this BZ.
> 

It turns out the QEMU locking mechanism requires F_OFD_* locks, which are not present in the RHEL 7.4 Linux kernel.

So I'm closing this BZ and leaving only the original one, for a future release of RHEL.