Description of problem: Currently the ownership of the directory /var/lib/vhost_sockets/ is qemu:qemu, and ovs has permission to read via the qemu group in OSP12. With OSP13, a new group "hugetlbfs" has been added to ovs and qemu for DPDK deployments, and this vhost_sockets directory should be accessible to the qemu and hugetlbfs groups. We need an SELinux policy change to allow this access.

[root@overcloud-computeovsdpdk-0 ~]# ll /var/lib/vhost_sockets/ -Z
srwxrwxr-x. qemu hugetlbfs system_u:object_r:var_lib_t:s0 vhu2515c21d-a2
srwxrwxr-x. qemu hugetlbfs system_u:object_r:var_lib_t:s0 vhudd20b9be-e8
Any additional information? Lon?
Groups are not handled by SELinux; that's handled by group configuration. SELinux restrictions are based on process contexts, which are a different thing. What process labels are going to be writing to these new vhost_sockets?
UID/GID is distinct from SELinux label, that is.
Are there logs (audit.log) of failures here? A permission-denied error may simply be ordinary OS permissions, but the permissions look right (g+rwx).
Unable to create the vhost socket when SELinux is enforcing, but it works after disabling it. So either there is a policy mismatch, or the directory does not have the right context applied.

audit.log: http://chunk.io/krsacme/4163bbdfc28a4446966cf639bdd8014f
sealert: http://chunk.io/krsacme/c1acf523796a416cb090748bedabd5e6

[root@overcloud-computeovsdpdk-0 ~]# ll -Z /var/lib/vhost_sockets/ -d
drwxr-xr-x. qemu qemu system_u:object_r:var_lib_t:s0 /var/lib/vhost_sockets/
[root@overcloud-computeovsdpdk-0 ~]# ll -Z /var/lib/vhost_sockets/
srwxrwxr-x. qemu hugetlbfs system_u:object_r:var_lib_t:s0 vhu6bfcd765-04
[root@overcloud-computeovsdpdk-0 ~]# restorecon -R /var/lib/vhost_sockets/
[root@overcloud-computeovsdpdk-0 ~]# ll -Z /var/lib/vhost_sockets/ -d
drwxr-xr-x. qemu qemu system_u:object_r:virt_cache_t:s0 /var/lib/vhost_sockets/
[root@overcloud-computeovsdpdk-0 ~]# ll -Z /var/lib/vhost_sockets/
srwxrwxr-x. qemu hugetlbfs system_u:object_r:virt_cache_t:s0 vhu6bfcd765-04
These are the denials.

type=AVC msg=audit(1520583704.004:744): avc: denied { write } for pid=72686 comm="qemu-kvm" path="pipe:[900115]" dev="pipefs" ino=900115 scontext=system_u:system_r:svirt_t:s0:c57,c132 tcontext=system_u:system_r:spc_t:s0 tclass=fifo_file
type=AVC msg=audit(1520583704.004:744): avc: denied { write } for pid=72686 comm="qemu-kvm" path="pipe:[900115]" dev="pipefs" ino=900115 scontext=system_u:system_r:svirt_t:s0:c57,c132 tcontext=system_u:system_r:spc_t:s0 tclass=fifo_file
type=AVC msg=audit(1520583704.004:744): avc: denied { write } for pid=72686 comm="qemu-kvm" path="pipe:[894143]" dev="pipefs" ino=894143 scontext=system_u:system_r:svirt_t:s0:c57,c132 tcontext=system_u:system_r:spc_t:s0 tclass=fifo_file
The spc_t denial is not related to the uid/gid changes. What version of container-selinux is installed?

This is the problem; the directory is somehow mislabeled:

type=AVC msg=audit(1520575038.275:536): avc: denied { write } for pid=29345 comm="qemu-kvm" name="vhost_sockets" dev="sdb2" ino=671088704 scontext=system_u:system_r:svirt_t:s0:c154,c907 tcontext=system_u:object_r:var_lib_t:s0 tclass=dir

This should have been resolved by commit c6158ceb upstream (switching from 'restorecon' to setfiles). What openstack-selinux policy is installed?
FYI, /var/lib/vhost_sockets should be set to virt_cache_t. I will double-check that the openstack-selinux installation is doing the right thing here.
I don't remember the puddle version, but I am trying to validate with the latest puddle, 2018-03-15.1. Will update the results. Can you confirm the exact SELinux labels that we need to apply to this directory, /var/lib/vhost_sockets? I could do a restorecon after creating the directory.
It should be virt_cache_t; e.g.:

[root@localhost lib]# mkdir vhost_sockets
[root@localhost lib]# ls -ldZ !$
ls -ldZ vhost_sockets
drwxr-xr-x. root root unconfined_u:object_r:var_lib_t:s0 vhost_sockets
[root@localhost lib]# restorecon !$
restorecon vhost_sockets
[root@localhost lib]# ls -ldZ !$
ls -ldZ vhost_sockets
drwxr-xr-x. root root unconfined_u:object_r:virt_cache_t:s0 vhost_sockets

If you're creating the directory, restorecon is a must. SELinux labels are inherited at creation time from the parent directory.
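For restorecon (or setfiles) to apply virt_cache_t here, a matching file-context rule must exist for the path; the openstack-selinux package is expected to ship one. As an illustration only (not a recommendation to add it manually on an overcloud node), the equivalent hand-written rule would look roughly like:

```
# Sketch: register a persistent file-context mapping for the directory,
# then relabel it. On a deployed node this mapping should already come
# from the openstack-selinux policy package.
semanage fcontext -a -t virt_cache_t '/var/lib/vhost_sockets(/.*)?'
restorecon -Rv /var/lib/vhost_sockets
```

Without such a mapping, restorecon would simply reset the directory back to the parent-inherited default (var_lib_t).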
If we know the context of the process creating the file, we can certainly ensure things are right using a file transition.
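To illustrate what a file transition means here (a hypothetical sketch in refpolicy macro style, not the actual shipped rule): if ovs-vswitchd runs as openvswitch_t and the directory is virt_cache_t, a named transition can label newly created sockets correctly without any restorecon.

```
# Hypothetical .te fragment: sockets created by openvswitch_t inside a
# virt_cache_t directory are automatically labeled virt_cache_t.
filetrans_pattern(openvswitch_t, virt_cache_t, virt_cache_t, sock_file)
```

The benefit is that the label is right at creation time, rather than depending on a relabel step after the fact.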
(In reply to Lon Hohberger from comment #16)
> If we know the context of the process creating the file, we can certainly
> ensure things are right using a file transition.

For the docker deployment, the directory is created by ansible tasks [1] in the tripleo docker service template. For the baremetal deployment, puppet-tripleo takes care of it (and puppet's directory creation does a restorecon internally to ensure the right context).

[1] https://github.com/openstack/tripleo-heat-templates/blob/master/docker/services/nova-libvirt.yaml#L394

- name: create directory for vhost-user sockets with qemu ownership
  file:
    path: /var/lib/vhost_sockets
    state: directory
    owner: qemu
    group: qemu

The group name has to be changed here to the new group name. I am working on a patch against the latest OSP13 puddle to add the SELinux context to it as well.
OK. You should only need to do (in the Ansible task):

restorecon -Rv /var/lib/vhost_sockets

... after creating it.
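One way to express the whole change in the Ansible task itself (a sketch only, assuming the hugetlbfs group and the virt_cache_t type discussed above; the `file` module's `setype` parameter sets the SELinux type directly, which avoids a separate restorecon step):

```yaml
# Sketch: create the socket directory with the new group and an explicit
# SELinux type, instead of relying on the inherited var_lib_t label.
- name: create directory for vhost-user sockets with qemu ownership
  file:
    path: /var/lib/vhost_sockets
    state: directory
    owner: qemu
    group: hugetlbfs
    setype: virt_cache_t
```

Note that `setype` labels the directory once at task time; a file-context rule in the policy is still what keeps the label correct across future relabels.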
I have added a patch to set the correct context on the directory being created: https://review.openstack.org/#/c/555661/

After the deployment, the directory has the expected context and permissions.

[root@overcloud-computeovsdpdk-0 ~]# ll /var/lib/vhost_sockets/ -dZ
drwxr-xr-x. qemu hugetlbfs system_u:object_r:virt_cache_t:s0 /var/lib/vhost_sockets/
[root@overcloud-computeovsdpdk-0 ~]# ll /var/lib/vhost_sockets/ -Z
srwxrwxr-x. qemu hugetlbfs system_u:object_r:virt_cache_t:s0 vhu747cb0c7-0e
srwxrwxr-x. qemu hugetlbfs system_u:object_r:virt_cache_t:s0 vhudf07069b-bd
[root@overcloud-computeovsdpdk-0 ~]#

But even with this, the guest VM is not created successfully: the VM is in the paused state, which means qemu has created the VM but is waiting for an ack from ovs/neutron on the port accessibility, which has not been received. During the paused state, below is the ovs log in the ovs-vswitchd.log file:

Mar 23 04:47:28 overcloud-computeovsdpdk-0 ovs-vswitchd[4620]: ovs|31032623|dpdk|ERR|VHOST_CONFIG: truncted msg
Mar 23 04:47:28 overcloud-computeovsdpdk-0 ovs-vswitchd[4620]: ovs|31032624|dpdk|ERR|VHOST_CONFIG: vhost read message failed

When SELinux is disabled, the VM moves from paused to running immediately. It looks like ovs is continuously trying to read the socket and failing, but succeeds once SELinux is disabled. Strangely, there are no denial logs in audit.log. Below is the context of the ovs processes:

[root@overcloud-computeovsdpdk-0 ~]# ps auxZ | grep ovs-vswitchd | grep -v grep
system_u:system_r:openvswitch_t:s0 openvsw+ 4620 205 0.3 26203984 928264 ? S<Lsl 06:16 380:17 ovs-vswitchd unix:/var/run/openvswitch/db.sock -vconsole:emer -vsyslog:err -vfile:info --mlockall --user openvswitch:hugetlbfs --no-chdir --log-file=/var/log/openvswitch/ovs-vswitchd.log --pidfile=/var/run/openvswitch/ovs-vswitchd.pid --detach

With all this info, I could find a relevant RHEL BZ for a similar SELinux issue.
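A possible explanation for a failure with an empty audit.log is a dontaudit rule silencing the denial. A quick way to check (standard SELinux tooling, shown as a sketch for this debugging step):

```
# Rebuild the policy with dontaudit rules disabled, reproduce the
# failure, then check audit.log for the previously hidden denials.
semodule -DB
# ... reproduce the VM creation failure, inspect /var/log/audit/audit.log ...
# Re-enable dontaudit rules afterwards:
semodule -B
```

If a denial shows up only with dontaudit disabled, that points at the exact rule the policy needs to allow.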
https://bugzilla.redhat.com/show_bug.cgi?id=1547250

I will move this BZ to me for the tripleo-heat-templates patch mentioned earlier and will rely on the RHEL BZ for the actual fix.
Created attachment 1412031 [details] sosreport with the failure, and with SELinux disabled to create the VM successfully
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2018:2086