Description of problem:
oVirt supports the NFS 4.2 protocol for storage domains. If one of the disks of a running VM resides on an NFS 4.2 attached storage domain and the VM is live migrated (NOT storage migrated), the VM is stopped due to a storage error.

Version-Release number of selected component (if applicable):
oVirt 4.1.2
qemu 2.6.0-28.el7_3.9.1

How reproducible:
100%

Steps to Reproduce:
1. Create an NFS storage domain. Enforce protocol 4.2.
2. Create a VM with a disk on the NFS 4.2 storage.
3. Start the VM.
4. Generate disk I/O in the VM (disk writes work best).
5. Migrate the VM to another oVirt node.

Actual results:
Migration might succeed 1-2 times. Afterwards migration fails and the VM is set to stopped.

Expected results:
The VM should migrate smoothly.

Additional info:
After moving the VM disk to NFS protocol auto negotiation (version 4.0) we can no longer reproduce the error.
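(For reference, a hedged sketch: "enforce protocol 4.2" means the storage domain is configured with an explicit NFS version, which on the host corresponds roughly to a manual client mount like

  mount -t nfs -o vers=4.2 nfs-server:/export /mnt/test

where nfs-server:/export stands in for the actual storage domain export.)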
Created attachment 1291748 [details] vdsm first node
Created attachment 1291749 [details] vdsm second node
BZ1406398 tracked the implementation of NFS 4.2. Live migration might not have been tested.
Using the newest NFS protocol version is required to make use of discard support. Going back to 4.1 for proper operation is only the second best option. See BZ1462504.
Attached is a qemu strace from the source system during the live migration and the corresponding vdsm.log. Note the failed pwritev to the virtual disk image during the live VM migration; immediately afterwards the migration fails.

22:43:21.203706 pwritev(21, [{"\300;9\230\0\0\0\1\0\0\2060\0\30\1\275\0\0\0\10\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096}, {"\244\201\0\0-\0\0\0\353G,Y\353G,Y\353G,Y\0\0\0\0\0\0\1\0\10\0\0\0"..., 4096}], 2, 2283819008) = -1 EACCES (Permission denied)
Created attachment 1292469 [details] 2nd try source qemu
Created attachment 1292471 [details] 2nd try source vdsm
Attached an NFS client trace together with the qemu error from /var/log/messages.
Created attachment 1292475 [details] nfs client
Michal - someone from your team should probably take a look here too. Nir - is there anything to do from our side, or should we just pass this BZ on to qemu's devs?
Hi, I did some cross checks with an upstream 4.4.72 longterm kernel. Same behaviour. Either the qemu/libvirt side is doing something wrong (dynamic permission changes) or the NFS server/client does not like multiple file accesses during migration.

https://bugzilla.kernel.org/show_bug.cgi?id=196215
(In reply to Allon Mureinik from comment #10)
> Nir - is there anything to do from our side, or should we just pass this BZ
> on to qemu's devs?

This smells like a libvirt issue, since libvirt is the one labeling the VM disks on NFS 4.2. We need more info from the reporter.
Markus, can you try this?

1. Change selinux to permissive on the source and destination hosts:

   setenforce 0

2. Perform the live migration - does it work now?

3. If not, share the selinux denials:

   ausearch -m avc -i -ts recent

If this is indeed a selinux issue, libvirt developers would like to see a libvirt debug log of this flow, from starting the VM (VM disks are labeled at this point) until the migration fails.
Setting selinux to permissive fixes the error. Changing back to enforcing and the error immediately reappears.

ausearch gives:

type=PROCTITLE msg=audit(06/29/2017 13:58:53.066:5698) : proctitle=/usr/libexec/qemu-kvm -name guest=sol_cvmsaprouter01,debug-threads=on -S -object secret,id=masterKey0,format=raw,file=/var/lib/l

type=SYSCALL msg=audit(06/29/2017 13:58:53.066:5698) : arch=x86_64 syscall=pwrite success=no exit=EACCES(Permission denied) a0=0x15 a1=0x7f36bb86e000 a2=0x1000 a3=0x5261a000 items=0 ppid=1 pid=21099 auid=unset uid=qemu gid=qemu euid=qemu suid=qemu fsuid=qemu egid=qemu sgid=qemu fsgid=qemu tty=(none) ses=unset comm=worker exe=/usr/libexec/qemu-kvm subj=system_u:system_r:svirt_t:s0:c522,c608 key=(null)

type=AVC msg=audit(06/29/2017 13:58:53.066:5698) : avc: denied { write } for pid=21099 comm=worker path=/rhev/data-center/mnt/100.64.253.1:_var_data_nas3_OVirtIB/1ad1daac-474e-4f62-bb07-321c4b3f001e/images/507d208d-9738-4a87-b44c-453421a1ca05/755d86a3-a16b-41e9-8fbf-de01b8484f5a dev="0:46" ino=19327352961 scontext=system_u:system_r:svirt_t:s0:c522,c608 tcontext=system_u:object_r:svirt_image_t:s0:c357,c642 tclass=file permissive=0
(In reply to Markus Stockhausen from comment #14)
> Setting selinux to permissive fixes the error. Changing back to enforcing
> and the error immediately reappears.

Thanks for confirming this!

Any chance to get libvirt debug logs for this flow - from the time the VM was started until the migration failed?

To get libvirt debug logs:

1. Edit /etc/libvirt/libvirtd.conf:

   log_level = 1
   log_outputs="1:file:/var/log/libvirt/libvirtd.log"

2. Restart libvirtd.

Note that libvirt debug logs are huge, and you will want to disable them after you collect the logs for the migration.
Daniel, can you take a look? It looks like a libvirt labeling issue during migration.
This AVC

type=AVC msg=audit(06/29/2017 13:58:53.066:5698) : avc: denied { write } for pid=21099 comm=worker path=/rhev/data-center/mnt/100.64.253.1:_var_data_nas3_OVirtIB/1ad1daac-474e-4f62-bb07-321c4b3f001e/images/507d208d-9738-4a87-b44c-453421a1ca05/755d86a3-a16b-41e9-8fbf-de01b8484f5a dev="0:46" ino=19327352961 scontext=system_u:system_r:svirt_t:s0:c522,c608 tcontext=system_u:object_r:svirt_image_t:s0:c357,c642 tclass=file permissive=0

is referring to a process / executable called "worker". AFAIK, that's not something belonging to libvirt, so it is presumably part of oVirt.
(In reply to Daniel Berrange from comment #17)
> This AVC
> ...
> is referring to a process / executable called "worker". AFAIK, that's not
> something belonging to libvirt, so it is presumably part of oVirt.

oVirt does not have such a process, and we never write to images used by libvirt/qemu.

Kevin, is it possible that "worker" is the name of a qemu thread writing to a VM image?
Yes, that is it:

$ cd src/virt/qemu
$ git grep '"worker"'
util/thread-pool.c:        qemu_thread_create(&t, "worker", worker_thread, pool, QEMU_THREAD_DETACHED);

This is an example of the problem with sVirt + shared filesystems when libvirtd is allocating labels. Libvirt allocates an sVirt label each time a guest is started, unique to the node the guest is running on. So what we have here is QEMU running on one host with label c522,c608, and QEMU running on the other host with label c357,c642. Hence SELinux is denying access to the file from one of the QEMU instances.

For sVirt to work correctly and securely with shared filesystems supporting labelling, oVirt needs to take responsibility for allocating SELinux labels. It has to ensure the VM uses the same label on both sides of the migration, and must make sure that label is unique across every guest in the cluster. You can still let libvirt do automatic labelling of files - you just have to give libvirt the pre-determined label, instead of having libvirt pick one itself.
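To illustrate that last point, a hedged sketch of the relevant domain XML: libvirt accepts a pre-determined label via a static seclabel while still relabelling the disk files itself. The categories below (c522,c608) are just the example values from the AVC above, not a recommendation:

  <seclabel type='static' model='selinux' relabel='yes'>
    <label>system_u:system_r:svirt_t:s0:c522,c608</label>
  </seclabel>

If the management layer passes the same fragment to libvirt on the source and destination hosts, both QEMU instances end up with matching image labels (svirt_image_t with the same category pair).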
Based on comment 19, we cannot support live migration with NFS 4.2 in 4.1.

We can work on managing selinux labels at the cluster level for 4.2.

On the Vdsm side, this should be as simple as passing the label from engine to the libvirt XML.

On the engine side, this requires managing the labels at the DC level. Ensuring that a label is unique on all hosts is an interesting question.

Allon, how do you want to proceed?
This is the libvirt implementation for selecting a label on a single host:
http://libvirt.org/git/?p=libvirt.git;a=blob;f=src/security/security_selinux.c;h=2e3082b7a8d77a241f0aa88d08e4b0e46a611a17;hb=HEAD#l260
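For the cluster-unique allocation question above, a minimal sketch of what an engine-side allocator could look like (hypothetical names: allocate_mcs_label and used_labels are not oVirt code, and persistence/locking against the engine database is omitted). It mirrors the random category-pair selection libvirt does per host in the code linked above:

import random

def allocate_mcs_label(used_labels):
    """Return an MCS label such as 's0:c522,c608' not present in used_labels."""
    for _ in range(10000):  # bounded retry, like libvirt's own loop
        c1, c2 = random.sample(range(1024), 2)  # two distinct categories in c0..c1023
        label = "s0:c%d,c%d" % (min(c1, c2), max(c1, c2))
        if label not in used_labels:
            used_labels.add(label)
            return label
    raise RuntimeError("no free MCS label found")

# The same label would then be handed to libvirt on both the source and
# destination host via a static <seclabel model='selinux' relabel='yes'>.
used = set()
print(allocate_mcs_label(used))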
(In reply to Nir Soffer from comment #20)
> Based on comment 19, we cannot support live migration with NFS 4.2 in 4.1.
>
> We can work on managing selinux labels at the cluster level for 4.2.
>
> On the Vdsm side, this should be as simple as passing the label from engine
> to the libvirt XML.
>
> On the engine side, this requires managing the labels at the DC level.
> Ensuring that a label is unique on all hosts is an interesting question.
>
> Allon, how do you want to proceed?

If we can disable selinux for these mounts somehow, we can have partial support for NFS 4.2 (which isn't worse than our support for previous NFS versions), and provide a proper solution in 4.2. If we can't, we'd need to retract the statement in BZ1406398.
(In reply to Allon Mureinik from comment #22)
> If we can disable selinux for these mounts somehow, we can have partial
> support for NFS 4.2 (which isn't worse than our support for previous NFS
> versions), and provide a proper solution in 4.2.
> If we can't, we'd need to retract the statement in BZ1406398.

Note that as of upstream kernel commit 32ddd944a056 "nfsd: opt in to labeled nfs per export", labeled NFS is off by default and needs to be turned on per-export with a new "security_label" export flag (recent nfs-utils is also needed for that). So on sufficiently recent Fedora, for example, you can keep security labels off and still use 4.2 for the new sparse file support.
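A hedged example of what that opt-in looks like with a new enough kernel and nfs-utils (path and other options are illustrative only); without the flag, the export serves no security labels:

  # /etc/exports - labeled NFS explicitly enabled for one export:
  /srv/vmstore  *(rw,sync,no_subtree_check,security_label)

  # omit security_label to keep labels off:
  /srv/vmstore  *(rw,sync,no_subtree_check)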
Markus, can you give more details on your NFS server and its configuration?

Maybe you can use the solution suggested in comment 23?
(In reply to Nir Soffer from comment #24)
> Markus, can you give more details on your NFS server and its configuration?
>
> Maybe you can use the solution suggested in comment 23?

NFS server and clients are CentOS 7 with an upstream 4.4.73 kernel (elrepo), due to my suspicion that the issue might be kernel related.

Regarding the upstream commit: I'm willing to patch, compile and test. I just need to know if this is the only patch that I have to include, or whether there are other ones that need to be taken into account.

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?h=v4.12-rc7&id=32ddd944a056c786f6acdd95ed29e994adc613a2
And to clarify: the patch only needs to be applied on the NFS server side? The mentioned nfs-utils bugfixes are not needed, I guess; they are only required if I want to reactivate the security_label flag for an export.
(In reply to Markus Stockhausen from comment #26)
> And to clarify: the patch only needs to be applied on the NFS server side?
> The mentioned nfs-utils bugfixes are not needed, I guess; they are only
> required if I want to reactivate the security_label flag for an export.

Yes, since you don't actually want labeled NFS turned on, it should be sufficient just to patch the server kernel. That one patch should do it.

Though actually, as long as you're recompiling the kernel, you could skip the patch and just turn off NFSD_V4_SECURITY_LABEL in the kernel build configuration.
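For completeness, a hedged sketch of that config change, using the standard kernel build tooling from the source tree of the kernel being rebuilt:

  # disable labeled NFS support in nfsd before building:
  scripts/config --disable NFSD_V4_SECURITY_LABEL

  # equivalently, make sure .config ends up containing:
  # CONFIG_NFSD_V4_SECURITY_LABEL is not set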
Even with the relatively new longterm kernel 4.4 the above patch does not apply cleanly. So I chose to recompile with NFSD_V4_SECURITY_LABEL disabled. Now I can use NFS 4.2 as expected.

To sum it up: from the current point of view we need one of the following setups for a working NFS 4.2 oVirt storage domain (that is hosted on CentOS or similar distros):

1) Either set selinux to permissive on the oVirt nodes,
2) or disable security labels on the NFS server by
   variant a: applying the above patch (exports will automatically be without security labels), or
   variant b: compiling the kernel with the NFSD_V4_SECURITY_LABEL option disabled.

Thanks for the help, and thumbs up for an oVirt-managed long-term solution.
Live migration works with an NFS 4.2 server running Fedora 24 (kernel 4.11.6) without the security_label export flag.

So far I did not find any solution on the client side for disabling labeling, so the only solution is using an NFS server that does not do labeling.
(In reply to Nir Soffer from comment #29)
> Live migration works with an NFS 4.2 server running Fedora 24 (kernel
> 4.11.6) without the security_label export flag.
>
> So far I did not find any solution on the client side for disabling
> labeling, so the only solution is using an NFS server that does not do
> labeling.

IIUC, you should always be able to use the 'context=XXX' mount option on the client, which instructs the kernel to force a particular label for all files.
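A hedged example of that (server, export and context values are illustrative): forcing one fixed context client-side makes the 4.2 mount behave like an unlabeled one, e.g.

  mount -t nfs -o vers=4.2,context="system_u:object_r:nfs_t:s0" nfs-server:/export /mnt/test

With nfs_t forced, access is governed by the virt_use_nfs boolean, as with older NFS versions; note this gives up per-VM sVirt isolation on that mount.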
(In reply to Nir Soffer from comment #29)
> Live migration works with an NFS 4.2 server running Fedora 24 (kernel
> 4.11.6) without the security_label export flag.
>
> So far I did not find any solution on the client side for disabling
> labeling, so the only solution is using an NFS server that does not do
> labeling.

<seclabel type='none'/>

in the libvirt XML?
(In reply to Yaniv Kaul from comment #31)
> <seclabel type='none'/>
>
> in the libvirt XML?

This will disable sVirt for this VM - we don't want to do that.

I think we need to check the mount context option; if it allows ignoring selinux per mount, it can be useful for keeping sVirt enabled where we can.
This bug has not been marked as blocker for oVirt 4.3.0. Since we are releasing it tomorrow, January 29th, this bug has been re-targeted to 4.3.1.
We didn't get to this bug for more than 2 years, and it's not being considered for the upcoming 4.4. It's unlikely that it will ever be addressed, so I'm suggesting to close it. If you feel this needs to be addressed and want to work on it, please remove the cond nack and re-target accordingly.
OK, closing. Please reopen if this is still relevant or you want to work on it.