Bug 1464787 - NFS 4.2 with SELinux on the server side breaks live migration
Status: NEW
Product: ovirt-engine
Classification: oVirt
Component: Backend.Core
Version: 4.1.2
Hardware: Unspecified   OS: Unspecified
Priority: medium   Severity: high
Target Milestone: ovirt-4.2.0
Assigned To: Nir Soffer
QA Contact: meital avital
Reported: 2017-06-25 15:13 EDT by Markus Stockhausen
Modified: 2017-09-28 05:02 EDT
CC: 9 users
Type: Bug
oVirt Team: Storage
Flags: ylavi: ovirt-4.2+


Attachments
  vdsm first node (38.81 KB, text/plain), 2017-06-25 15:15 EDT, Markus Stockhausen
  vdsm second node (24.29 KB, text/plain), 2017-06-25 15:15 EDT, Markus Stockhausen
  2nd try source qemu (4.75 MB, application/zip), 2017-06-27 17:05 EDT, Markus Stockhausen
  2nd try source vdsm (17.72 KB, application/zip), 2017-06-27 17:05 EDT, Markus Stockhausen
  nfs client (361.58 KB, text/plain), 2017-06-27 18:01 EDT, Markus Stockhausen

Description Markus Stockhausen 2017-06-25 15:13:17 EDT
Description of problem:

oVirt supports the NFS 4.2 protocol for storage domains. If one of the disks of a running VM resides on NFS 4.2-attached storage and the VM is live migrated (NOT storage migrated), the VM is stopped due to a storage error.

Version-Release number of selected component (if applicable):

oVirt 4.1.2
qemu 2.6.0-28.el7_3.9.1

How reproducible:

100%

Steps to Reproduce:
1. Create an NFS storage domain, enforcing protocol 4.2
2. Create a VM with a disk on the NFS 4.2 storage
3. Start the VM
4. Generate disk I/O in the VM (disk writes work best)
5. Migrate the VM to another oVirt node

Actual results:

Migration might succeed 1-2 times. Afterwards migration fails and the VM is set to stopped.

Expected results:

VM should migrate smoothly

Additional info:

After moving the VM disk to NFS protocol auto-negotiation (which settles on version 4.0), we can no longer reproduce the error.
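
For reference, a hedged way to confirm the negotiated NFS version on a host and to generate the disk I/O from step 4 (the grep pattern is the export path from this report; adjust for your environment):

    # on the oVirt host: show mount options, including vers=, for each NFS mount
    nfsstat -m
    mount | grep _var_data_nas3_OVirtIB
    # inside the guest: sustained direct writes are one way to trigger the failure
    dd if=/dev/zero of=/var/tmp/io.bin bs=1M count=1024 oflag=direct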
Comment 1 Markus Stockhausen 2017-06-25 15:15 EDT
Created attachment 1291748 [details]
vdsm first node
Comment 2 Markus Stockhausen 2017-06-25 15:15 EDT
Created attachment 1291749 [details]
vdsm second node
Comment 3 Markus Stockhausen 2017-06-25 15:20:07 EDT
BZ1406398 tracked the implementation of NFS 4.2. Live migration might not have been tested.
Comment 4 Markus Stockhausen 2017-06-25 15:38:29 EDT
Using the newest NFS protocol version is required to make use of discard support. Going back to 4.1 for proper operation is only the second-best option. See BZ1462504.
Comment 5 Markus Stockhausen 2017-06-27 17:04:35 EDT
Attached are a qemu strace from the source system during live migration and the corresponding vdsm.log.

Note the failed pwritev() to the virtual disk image during the live migration. Immediately afterwards the migration fails.

22:43:21.203706 pwritev(21, [{"\300;9\230\0\0\0\1\0\0\2060\0\30\1\275\0\0\0\10\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096}, {"\244\201\0\0-\0\0\0\353G,Y\353G,Y\353G,Y\0\0\0\0\0\0\1\0\10\0\0\0"..., 4096}], 2, 2283819008) = -1 EACCES (Permission denied)
Comment 6 Markus Stockhausen 2017-06-27 17:05 EDT
Created attachment 1292469 [details]
2nd try source qemu
Comment 7 Markus Stockhausen 2017-06-27 17:05 EDT
Created attachment 1292471 [details]
2nd try source vdsm
Comment 8 Markus Stockhausen 2017-06-27 18:01:09 EDT
Attached is an NFS client trace, including the qemu error from /var/log/messages.
Comment 9 Markus Stockhausen 2017-06-27 18:01 EDT
Created attachment 1292475 [details]
nfs client
Comment 10 Allon Mureinik 2017-06-29 03:27:48 EDT
Michal - someone from your team should probably take a look here too.

Nir - is there anything to do from our side, or should we just pass this BZ on to qemu's devs?
Comment 11 Markus Stockhausen 2017-06-29 06:36:59 EDT
Hi,

I did some cross-checks with the upstream 4.4.72 longterm kernel. Same behaviour. Either the qemu/libvirt side is doing something wrong (dynamic permission changes) or the NFS server/client does not like multiple file accesses during migration.

https://bugzilla.kernel.org/show_bug.cgi?id=196215
Comment 12 Nir Soffer 2017-06-29 06:51:30 EDT
(In reply to Allon Mureinik from comment #10)
> Nir - is there anything to do from our side, or should we just pass this BZ
> on to qemu's devs?

This smells like a libvirt issue, since libvirt is labeling the VM disks with NFS 4.2.

We need more info from the reporter.
Comment 13 Nir Soffer 2017-06-29 06:53:46 EDT
Markus, can you try this?

1. Change selinux to permissive on the source and destination hosts

    setenforce 0

2. Perform live migration - does it work now?
3. If not, share the selinux denials

    ausearch -m avc -i -ts recent

If this is indeed an SELinux issue, libvirt developers would like to see a libvirt
debug log of this flow, from starting the VM (VM disks are labeled at this point) until the migration fails.
Comment 14 Markus Stockhausen 2017-06-29 08:03:04 EDT
Setting SELinux to permissive fixes the error. Changing back to enforcing makes the error immediately reappear.

ausearch gives:

type=PROCTITLE msg=audit(06/29/2017 13:58:53.066:5698) : proctitle=/usr/libexec/qemu-kvm -name guest=sol_cvmsaprouter01,debug-threads=on -S -object secret,id=masterKey0,format=raw,file=/var/lib/l
type=SYSCALL msg=audit(06/29/2017 13:58:53.066:5698) : arch=x86_64 syscall=pwrite success=no exit=EACCES(Permission denied) a0=0x15 a1=0x7f36bb86e000 a2=0x1000 a3=0x5261a000 items=0 ppid=1 pid=21099 auid=unset uid=qemu gid=qemu euid=qemu suid=qemu fsuid=qemu egid=qemu sgid=qemu fsgid=qemu tty=(none) ses=unset comm=worker exe=/usr/libexec/qemu-kvm subj=system_u:system_r:svirt_t:s0:c522,c608 key=(null)
type=AVC msg=audit(06/29/2017 13:58:53.066:5698) : avc:  denied  { write } for  pid=21099 comm=worker path=/rhev/data-center/mnt/100.64.253.1:_var_data_nas3_OVirtIB/1ad1daac-474e-4f62-bb07-321c4b3f001e/images/507d208d-9738-4a87-b44c-453421a1ca05/755d86a3-a16b-41e9-8fbf-de01b8484f5a dev="0:46" ino=19327352961 scontext=system_u:system_r:svirt_t:s0:c522,c608 tcontext=system_u:object_r:svirt_image_t:s0:c357,c642 tclass=file permissive=0
Comment 15 Nir Soffer 2017-06-29 08:27:24 EDT
(In reply to Markus Stockhausen from comment #14)
> Setting SELinux to permissive fixes the error. Changing back to enforcing
> makes the error immediately reappear.
Thanks for confirming this!

Any chance to get libvirt debug logs for this flow - from the time the VM was
started until the migration failed or completed?

To get libvirt debug logs:

1. Edit /etc/libvirt/libvirtd.conf:

log_level = 1
log_outputs="1:file:/var/log/libvirt/libvirtd.log"

2. Restart libvirtd

Note that libvirt debug logs are huge, and you want to disable them after you
collect the logs for the migration.
Comment 16 Nir Soffer 2017-06-29 08:30:11 EDT
Daniel, can you take a look? It looks like a libvirt labeling issue during
migration.
Comment 17 Daniel Berrange 2017-06-29 08:34:15 EDT
This AVC

type=AVC msg=audit(06/29/2017 13:58:53.066:5698) : avc:  denied  { write } for  pid=21099 comm=worker path=/rhev/data-center/mnt/100.64.253.1:_var_data_nas3_OVirtIB/1ad1daac-474e-4f62-bb07-321c4b3f001e/images/507d208d-9738-4a87-b44c-453421a1ca05/755d86a3-a16b-41e9-8fbf-de01b8484f5a dev="0:46" ino=19327352961 scontext=system_u:system_r:svirt_t:s0:c522,c608 tcontext=system_u:object_r:svirt_image_t:s0:c357,c642 tclass=file permissive=0

is referring to a process / executable called "worker".  AFAIK, that's not something belonging to libvirt, so presumably part of oVirt.
Comment 18 Nir Soffer 2017-06-29 09:08:08 EDT
(In reply to Daniel Berrange from comment #17)
> This AVC
> ...
> is referring to a process / executable called "worker".  AFAIK, that's not
> something belonging to libvirt, so presumably part of oVirt.

oVirt does not have such process, and we never write to images used by libvirt/qemu.

Kevin, is it possible that "worker" is the name of the qemu thread writing to 
a vm image?
Comment 19 Daniel Berrange 2017-06-29 09:13:28 EDT
Yes, that is it:

$ cd src/virt/qemu
$ git grep '"worker"'
util/thread-pool.c:    qemu_thread_create(&t, "worker", worker_thread, pool, QEMU_THREAD_DETACHED);

This is an example of the problem with sVirt + shared filesystems, when libvirtd is allocating labels. Libvirt allocates a sVirt label each time a guest is started, unique to the node the guest is running on. 

So what we have here is QEMU running on one host with label c522,c608, and QEMU running on the other host with label c357,c642. Hence SELinux is denying access to the file from one of the QEMU instances.

For sVirt to work correctly + securely with shared filesystems supporting labelling, oVirt needs to take responsibility for allocating SELinux labels. It has to ensure the VM uses the same label on both sides of migration, and must make sure that label is unique across every guest in the cluster.  You can still let libvirt do automatic labelling of files - you just have to give libvirt the pre-determined label, instead of having libvirt pick one itself.
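
For illustration, a minimal sketch of such a pre-determined label in the domain XML, assuming the management layer allocated the category pair itself; the value shown is simply the source-host label from the AVC above:

    <seclabel type='static' model='selinux' relabel='yes'>
      <label>system_u:system_r:svirt_t:s0:c522,c608</label>
    </seclabel>

With type='static' libvirt no longer picks the label, while relabel='yes' still lets it relabel the disk files automatically; the management layer must then guarantee the pair is identical on both sides of the migration and unique across the cluster.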
Comment 20 Nir Soffer 2017-06-29 09:26:23 EDT
Based on comment 19, we cannot support live migration with NFS 4.2 in 4.1.

We can work on managing selinux labels in the cluster level for 4.2.

On the Vdsm side, this should be as simple as passing the label from engine to the
libvirt XML.

On the engine side, this requires managing the labels at the DC level. Ensuring that
the label is unique on all hosts is an interesting question.

Allon, how do you want to proceed?
Comment 21 Nir Soffer 2017-06-29 09:45:51 EDT
This is the libvirt implementation for selecting a label on a single host:
http://libvirt.org/git/?p=libvirt.git;a=blob;f=src/security/security_selinux.c;h=2e3082b7a8d77a241f0aa88d08e4b0e46a611a17;hb=HEAD#l260
Comment 22 Allon Mureinik 2017-06-29 10:19:14 EDT
(In reply to Nir Soffer from comment #20)
> Based on comment 19, we cannot support live migration with NFS 4.2 in 4.1.
> 
> We can work on managing selinux labels in the cluster level for 4.2.
> 
> On the Vdsm side, this should be as simple as passing the label from engine to
> the libvirt XML.
> 
> On the engine side, this requires managing the labels at the DC level. Ensuring
> that the label is unique on all hosts is an interesting question.
> 
> Allon, how do you want to proceed?

If we can disable SELinux for these mounts somehow, we can have partial support for NFS 4.2 (which isn't worse than our support for previous NFS versions), and provide a proper solution in 4.2.
If we can't, we'd need to retract the statement in BZ1406398.
Comment 23 J. Bruce Fields 2017-06-29 11:15:00 EDT
(In reply to Allon Mureinik from comment #22)
> If we can disable SELinux for these mounts somehow, we can have partial
> support for NFS 4.2 (which isn't worse than our support for previous NFS
> versions), and provide a proper solution in 4.2.
> If we can't, we'd need to retract the statement in BZ1406398.

Note that as of upstream kernel commit 32ddd944a056 "nfsd: opt in to labeled nfs per export", labeled nfs is off by default and needs to be turned on per-export with a new "security_label" export flag (recent nfs-utils also needed for that).

So on sufficiently recent Fedora, for example, you can keep security labels off and still use 4.2 for the new sparse file support.
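
A hedged example of what the corresponding /etc/exports entry would look like; the path and the other options are illustrative, and omitting security_label keeps labeled NFS off:

    # opting a single export in to labeled NFS (off by default after 32ddd944a056)
    /var/data/nas3/OVirtIB  *(rw,sync,no_root_squash,security_label)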
Comment 24 Nir Soffer 2017-06-29 11:44:38 EDT
Markus, can you give more details on your NFS server and it's configuration?

Maybe you can use the solution suggested in comment 23?
Comment 25 Markus Stockhausen 2017-06-29 15:41:02 EDT
(In reply to Nir Soffer from comment #24)
> Markus, can you give more details on your NFS server and it's configuration?
> 
> Maybe you can use the solution suggested in comment 23?

NFS server and clients are CentOS 7 with an upstream 4.4.73 kernel (elrepo), because I suspected the problem might be kernel-related.

Regarding the upstream commit: I'm willing to patch, compile and test. I just need to know whether this is the only patch I have to include, or whether there are others that need to be taken into account.

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?h=v4.12-rc7&id=32ddd944a056c786f6acdd95ed29e994adc613a2
Comment 26 Markus Stockhausen 2017-06-29 15:44:32 EDT
And to clarify: the patch only needs to be applied on the NFS server side? The mentioned nfs-utils bugfixes are not needed, I guess; they are only required if I want to reactivate the security_label flag for an export.
Comment 27 J. Bruce Fields 2017-06-29 16:46:16 EDT
(In reply to Markus Stockhausen from comment #26)
> And to clarify: the patch only needs to be applied on the NFS server side?
> The mentioned nfs-utils bugfixes are not needed, I guess; they are only
> required if I want to reactivate the security_label flag for an export.

Yes, since you don't actually want labeled nfs turned on, it should be sufficient just to patch the server kernel.  That one patch should do it.

Though actually as long as you're recompiling the kernel you could skip the patch and just turn off NFSD_V4_SECURITY_LABEL in the kernel build configuration.
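
For example, a hedged sketch of that second route, using the kernel tree's own config helper before rebuilding:

    # disable labeled-NFS support in the NFS server build entirely
    scripts/config --disable NFSD_V4_SECURITY_LABEL
    # the resulting .config should then contain:
    # CONFIG_NFSD_V4_SECURITY_LABEL is not set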
Comment 28 Markus Stockhausen 2017-06-30 12:40:51 EDT
Even with the relatively new longterm kernel 4.4, the above patch does not integrate nicely, so I chose to recompile with NFSD_V4_SECURITY_LABEL disabled. Now I can use NFS 4.2 as expected. To sum it up: from the current point of view we need the following setup for a working NFS 4.2 oVirt storage domain hosted on CentOS or a similar distro:

1) Either setting SELinux to permissive on the oVirt nodes

2) Or disabling security labels on the NFS server by
  variant a: applying the above patch (exports will automatically be without security labels)
  variant b: compiling the kernel with the NFSD_V4_SECURITY_LABEL option disabled

Thanks for the help, and thumbs up for an oVirt-managed long-term solution.
Comment 29 Nir Soffer 2017-06-30 15:23:59 EDT
Live migration works with NFS 4.2 server running Fedora 24 (kernel 4.11.6)
without the security_label export flag.

So far I did not find any solution on the client side for disabling labeling, so
the only solution is using an NFS server that does not do labeling.
Comment 30 Daniel Berrange 2017-07-03 04:51:31 EDT
(In reply to Nir Soffer from comment #29)
> Live migration works with NFS 4.2 server running Fedora 24 (kernel 4.11.6)
> without the security_label export flag.
> 
> So far I did not find any solution on the client side for disabling
> labeling, so the only solution is using an NFS server that does not do labeling.

IIUC, you should always be able to use the 'context=XXX' option to mount on the client, which instructs the kernel to force a particular label for all files.
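
A hedged sketch of such a client-side mount, reusing the server and paths from this report (the context value is illustrative; it must be a type qemu's svirt_t domain is allowed to write):

    mount -t nfs -o vers=4.2,context="system_u:object_r:svirt_image_t:s0" \
        100.64.253.1:/var/data/nas3/OVirtIB \
        /rhev/data-center/mnt/100.64.253.1:_var_data_nas3_OVirtIB

Since every file on the mount then shares one fixed label, this trades away per-VM file isolation on that mount, which is the tension discussed in the next two comments.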
Comment 31 Yaniv Kaul 2017-07-10 05:50:13 EDT
(In reply to Nir Soffer from comment #29)
> Live migration works with NFS 4.2 server running Fedora 24 (kernel 4.11.6)
> without the security_label export flag.
> 
> So far I did not find any solution on the client side for disabling
> labeling, so the only solution is using an NFS server that does not do labeling.

<seclabel type='none'/>

in libvirt XML?
Comment 32 Nir Soffer 2017-07-10 06:10:44 EDT
(In reply to Yaniv Kaul from comment #31)
> <seclabel type='none'/>
> 
> in libvirt XML?

This will disable sVirt for this VM - we don't want to do that. I think we need
to check the mount context option: if it allows overriding SELinux labeling per mount,
we can use it where needed and keep sVirt everywhere we can.
