Bug 1113327
| Summary: | Disk image ownership changed after failed VM start due to existing lock | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 7 | Reporter: | Need Real Name <karlcz> |
| Component: | libvirt | Assignee: | Michal Privoznik <mprivozn> |
| Status: | CLOSED DUPLICATE | QA Contact: | Virtualization Bugs <virt-bugs> |
| Severity: | unspecified | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 7.0 | CC: | acathrow, dyuan, jdenemar, mkletzan, mzhan, shyu, ydu, zhwang |
| Target Milestone: | pre-dev-freeze | | |
| Target Release: | 7.1 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2014-08-05 14:06:33 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Description (Need Real Name, 2014-06-26 00:52:19 UTC)
I just did another test and determined that this can be reproduced on a single host using a regular local filesystem for image storage. Follow the steps above on a single host, using a local filesystem, and create two VMs that use the same image file as their virtual disk. When the second one fails to start due to the lock, the first loses contact with its disk. However, the lock remains held, and multiple attempts to start the second VM will continue to fail until the first VM is powered off and the lock released.

---

The problem is that when starting a new domain, libvirt executes the following operations (P = parent (libvirtd), C = child):

1P) prepare cmd line
2P) fork()
3C) lock_domain_disks()
4P) do the labeling
5P) notify child
6C) exec(qemu, ...)

Disk locking and labeling are done in parallel, so if one of them fails, we may unfortunately have already done the other. Looking deeper into the code, this diff should fix the bug:

```diff
diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c
index 5b598be..7cb0ae9 100644
--- a/src/qemu/qemu_process.c
+++ b/src/qemu/qemu_process.c
@@ -2736,6 +2736,12 @@ static int qemuProcessHook(void *data)
     if (virSecurityManagerClearSocketLabel(h->driver->securityManager, h->vm->def) < 0)
         goto cleanup;
 
+    VIR_DEBUG("Setting domain security labels");
+    if (virSecurityManagerSetAllLabel(driver->securityManager,
+                                      vm->def, stdin_path) < 0)
+        goto cleanup;
+    VIR_DEBUG("Labelling done");
+
     if (virNumaSetupMemoryPolicy(h->vm->def->numatune, h->nodemask) < 0)
         goto cleanup;
 
@@ -4092,17 +4098,6 @@ int qemuProcessStart(virConnectPtr conn,
         qemuProcessInitCpuAffinity(driver, vm, nodemask) < 0)
         goto cleanup;
 
-    VIR_DEBUG("Setting domain security labels");
-    if (virSecurityManagerSetAllLabel(driver->securityManager,
-                                      vm->def, stdin_path) < 0)
-        goto cleanup;
-
-    /* Security manager labeled all devices, therefore
-     * if any operation from now on fails and we goto cleanup,
-     * where virSecurityManagerRestoreAllLabel() is called
-     * (hidden under qemuProcessStop) we need to restore labels. */
-    stop_flags &= ~VIR_QEMU_PROCESS_STOP_NO_RELABEL;
-
     if (stdin_fd != -1) {
         /* if there's an fd to migrate from, and it's a pipe, put the
          * proper security label on it
@@ -4121,12 +4116,17 @@ int qemuProcessStart(virConnectPtr conn,
             goto cleanup;
     }
 
-    VIR_DEBUG("Labelling done, completing handshake to child");
     if (virCommandHandshakeNotify(cmd) < 0) {
         goto cleanup;
     }
     VIR_DEBUG("Handshake complete, child running");
 
+    /* Security manager labeled all devices, therefore
+     * if any operation from now on fails and we goto cleanup,
+     * where virSecurityManagerRestoreAllLabel() is called
+     * (hidden under qemuProcessStop) we need to restore labels. */
+    stop_flags &= ~VIR_QEMU_PROCESS_STOP_NO_RELABEL;
+
     if (migrateFrom)
         flags |= VIR_QEMU_PROCESS_START_PAUSED;
```

---

I've missed your comment, and I had a patch prepared already; the proposal is here:
https://www.redhat.com/archives/libvir-list/2014-June/msg01352.html

*** This bug has been marked as a duplicate of bug 547546 ***

---

How is this a duplicate of 547546?

The bug isn't that permissions are not restored, as described in 547546, but that they should never have been adjusted in the first place when locks had not yet been acquired. Anything other than strict ordering of lock acquisition prior to any modification to disk image state or permissions will inherently have race conditions and endanger the existing domain that already holds valid locks...

---

(In reply to Need Real Name from comment #7)
> How is this a duplicate of 547546?
>
> The bug isn't that permissions are not restored, as described in 547546, but
> that they should never have been adjusted in the first place when locks had
> not yet been acquired. Anything other than strict ordering of lock
> acquisition prior to any modification to disk image state or permissions
> will inherently have race conditions and endanger the existing domain that
> already holds valid locks...

The patchset proposed in the referenced bug should fix this bug too.
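The strict ordering argued for in the comments above (acquire the disk lock before touching image ownership or labels, and on failure undo only the steps that actually ran) can be sketched in a few lines of plain C. This is an illustrative sketch only, not libvirt code: every helper name (acquire_disk_lock, set_disk_ownership, restore_disk_ownership, release_disk_lock, launch_guest) and the example image path are hypothetical placeholders.

```c
/*
 * Minimal sketch of the lock-then-label ordering discussed in this bug.
 * Not libvirt code; all helpers below are hypothetical placeholders.
 */
#include <stdbool.h>
#include <stdio.h>

static bool acquire_disk_lock(const char *image)
{
    printf("acquiring exclusive lease on %s\n", image);
    return true;                 /* pretend the lock manager said yes */
}

static void release_disk_lock(const char *image)
{
    printf("releasing lease on %s\n", image);
}

static bool set_disk_ownership(const char *image)
{
    printf("chown/relabel %s for the new guest\n", image);
    return true;
}

static void restore_disk_ownership(const char *image)
{
    printf("restoring original ownership of %s\n", image);
}

static bool launch_guest(const char *image)
{
    printf("starting guest with %s\n", image);
    return true;
}

/* Start a guest; returns 0 on success, -1 on failure. */
static int start_domain(const char *image)
{
    bool locked = false;
    bool relabeled = false;

    /* 1. Lock first: if another domain already owns the image, fail here,
     *    before anything about the image has been modified. */
    if (!acquire_disk_lock(image))
        goto error;
    locked = true;

    /* 2. Only with the lock held is it safe to change ownership/labels. */
    if (!set_disk_ownership(image))
        goto error;
    relabeled = true;

    /* 3. Finally hand the image to the hypervisor process. */
    if (!launch_guest(image))
        goto error;

    return 0;

 error:
    /* Roll back in reverse order, and only what was actually done. */
    if (relabeled)
        restore_disk_ownership(image);
    if (locked)
        release_disk_lock(image);
    return -1;
}

int main(void)
{
    return start_domain("/var/lib/libvirt/images/shared.img") == 0 ? 0 : 1;
}
```

The design point is that lock acquisition is the first externally visible action, so a start attempt that loses the race fails before it has modified the ownership or labels that the already running domain depends on.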