Bug 1417164 - OVS agent crashes when using OVS 2.6 and OSP 10 due to SELinux violations
Summary: OVS agent crashes when using OVS 2.6 and OSP 10 due to SELinux violations
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-selinux
Version: 10.0 (Newton)
Hardware: Unspecified
OS: Linux
Priority: urgent
Severity: urgent
Target Milestone: ---
Target Release: 10.0 (Newton)
Assignee: Daniel Alvarez Sanchez
QA Contact: GenadiC
URL:
Whiteboard:
Duplicates: 1433432
Depends On:
Blocks: 1408224
 
Reported: 2017-01-27 11:37 UTC by Karthik Sundaravel
Modified: 2017-03-31 07:03 UTC
CC List: 15 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-03-30 10:56:30 UTC
Target Upstream Version:


Attachments
sosreport Compute (9.16 MB, application/x-xz)
2017-01-27 11:45 UTC, Karthik Sundaravel
sosreport controller (13.17 MB, application/x-xz)
2017-01-27 11:47 UTC, Karthik Sundaravel
Traceback from OVS agent (3.92 KB, text/plain)
2017-01-27 12:35 UTC, Assaf Muller
SELINUX logs (/var/log/messages) (23.33 KB, application/x-xz)
2017-01-27 13:44 UTC, Karthik Sundaravel

Description Karthik Sundaravel 2017-01-27 11:37:46 UTC
Description of problem:

I am trying to use OVS 2.6.1 (without DPDK) in RHOSP 10.
The OVS 2.6 RPM is taken from [1].
It is installed into overcloud-full.qcow2 with:
virt-customize -a overcloud-full.qcow2 --run-command 'yum install openvswitch-2.6.1-3.git20161206.el7fdb.x86_64.rpm -y'

The deployment succeeds, but the 'br-int' bridge is not created on the compute and controller nodes.

Please find attached the sosreport for these nodes.


[1] http://download-node-02.eng.bos.redhat.com/brewroot/packages/openvswitch/2.6.1/3.git20161206.el7fdb/x86_64/

Comment 1 Karthik Sundaravel 2017-01-27 11:45:56 UTC
Created attachment 1245101
sosreport Compute

Comment 2 Karthik Sundaravel 2017-01-27 11:47:24 UTC
Created attachment 1245102
sosreport controller

Comment 3 Assaf Muller 2017-01-27 12:35:04 UTC
Created attachment 1245122
Traceback from OVS agent

Comment 4 Karthik Sundaravel 2017-01-27 13:12:05 UTC
I found that after setting SELinux to permissive and rebooting, it works.

Comment 5 Karthik Sundaravel 2017-01-27 13:44:46 UTC
Created attachment 1245154
SELINUX logs (/var/log/messages)

Comment 6 Assaf Muller 2017-01-27 13:53:15 UTC
Switching component to SELinux. Since we want to update OVS to 2.6 in more than one OSP release stream, we'll need to duplicate this bug once it's fixed and make sure it's fixed in, most likely, at least OSP 8, 9, 10 and 11.

Comment 7 Ryan Hallisey 2017-02-10 21:03:24 UTC
type=AVC msg=audit(1485511068.007:113): avc: denied { unlink } for pid=2319 comm="NetworkManager" name="dhclient-ens802f0.pid" dev="tmpfs" ino=16544 scontext=system_u:system_r:NetworkManager_t:s0 tcontext=system_u:object_r:var_run_t:s0 tclass=file

/var/run/dhclient-eno1.pid should be labeled dhcpc_var_run_t, not var_run_t.
Try `restorecon -Rv /var/run/dhclient-eno1.pid`.
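One way to read an AVC record like the one above is to pull out the denied permission, the process, the target name and the source and target types. When the target type is a generic one (var_run_t here, where policy expects dhcpc_var_run_t), that points at a labeling problem rather than a missing allow rule. The following is a minimal POSIX shell/awk sketch, assuming one complete `type=AVC` record per input line; `summarize_avc` is a hypothetical helper, not part of the SELinux or audit tooling (ausearch and audit2why are the standard tools).

```shell
#!/bin/sh
# Hypothetical helper: summarize each type=AVC record read on stdin.
# Assumes one complete record per line, as in /var/log/audit/audit.log.
summarize_avc() {
  awk '
  /type=AVC/ {
    split("", f)                      # portable way to clear the array
    perm = ""; sep = ""
    for (i = 1; i <= NF; i++) {
      if ($i == "{") {
        # Collect every permission listed between the braces.
        for (j = i + 1; j <= NF && $j != "}"; j++) {
          perm = perm sep $j; sep = ","
        }
      }
      n = index($i, "=")
      if (n > 0) {                    # key=value field
        val = substr($i, n + 1)
        gsub(/"/, "", val)            # drop quotes around comm, name, dev
        f[substr($i, 1, n - 1)] = val
      }
    }
    # A context is user:role:type:level, so the type is the third part.
    split(f["scontext"], s, ":")
    split(f["tcontext"], t, ":")
    printf "perm=%s comm=%s name=%s stype=%s ttype=%s tclass=%s\n",
      perm, f["comm"], f["name"], s[3], t[3], f["tclass"]
  }'
}
```

Fed from `ausearch -m avc` or `grep type=AVC /var/log/audit/audit.log`, the record above summarizes as `perm=unlink comm=NetworkManager name=dhclient-ens802f0.pid stype=NetworkManager_t ttype=var_run_t tclass=file`. Splitting on whitespace keeps the sketch short but assumes no field value contains spaces.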

Comment 8 Red Hat Bugzilla Rules Engine 2017-02-10 21:03:33 UTC
This bugzilla has been removed from the release and needs to be reviewed and Triaged for another Target Release.

Comment 9 Lon Hohberger 2017-03-08 15:12:50 UTC
Hmm, Ryan, there's something else going on here.

A similar issue happened recently with /var/run/haproxy.sock: the context is correct after you run restorecon, but touching the file doesn't set it properly.

Comment 11 Lon Hohberger 2017-03-10 21:18:14 UTC
[root@localhost run]# touch /var/run/dhclient-eno1.pid
[root@localhost run]# ls -lZ !$
ls -lZ /var/run/dhclient-eno1.pid
-rw-r--r--. root root unconfined_u:object_r:var_run_t:s0 /var/run/dhclient-eno1.pid
[root@localhost run]# restorecon !$
restorecon /var/run/dhclient-eno1.pid
[root@localhost run]# ls -lZ !$
ls -lZ /var/run/dhclient-eno1.pid
-rw-r--r--. root root unconfined_u:object_r:dhcpc_var_run_t:s0 /var/run/dhclient-eno1.pid


So the context is there; it's just not getting set correctly when the file is created. Is something besides NetworkManager creating it?

Comment 12 Lon Hohberger 2017-03-10 21:21:18 UTC
I ask because other such files created by NetworkManager do have the right label on my box.
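The check done by hand above (compare the label a file has with the label policy assigns to its path) can be scripted. The sketch below is hypothetical; the helper names are made up. It compares only the type field because restorecon without -F resets just the type, which is why the user stays unconfined_u in the output above. It assumes an SELinux-enabled host for `check_label`: GNU `stat -c %C` prints a file's current context and `matchpathcon -n` prints the default context for a path without echoing the path.

```shell
#!/bin/sh
# Hypothetical helpers for spotting mislabeled files.
# An SELinux context has the form user:role:type:level; print the type.
context_type() {
  printf '%s\n' "$1" | cut -d: -f3
}

# Succeed when two contexts carry the same type. The user, role and level
# may legitimately differ (e.g. unconfined_u vs system_u), so they are ignored.
same_type() {
  [ "$(context_type "$1")" = "$(context_type "$2")" ]
}

# Compare a file's on-disk label with the policy default for its path.
# Returns 0 if the types match, 1 if mislabeled, 2 if a lookup failed.
check_label() {
  actual=$(stat -c %C "$1") || return 2
  expected=$(matchpathcon -n "$1") || return 2
  if same_type "$actual" "$expected"; then
    echo "ok: $1 ($actual)"
  else
    echo "mislabeled: $1 is $actual, policy expects $expected"
    return 1
  fi
}
```

On the box above, `check_label /var/run/dhclient-eno1.pid` right after the touch would be expected to report var_run_t against the policy default of dhcpc_var_run_t, and to pass after restorecon.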

Comment 13 Jakub Libosvar 2017-03-16 17:34:22 UTC
Hi Karthik, is this issue reproducible? We have bug 1425507 with the same traceback, so I would like to ask whether you can reproduce it and try the following:

1) Kill the rootwrap daemon before starting the OVS agent.

2) If that doesn't help, confirm that setting SELinux to permissive solves it *without* rebooting the node: once you have confirmed that the agent can't start, run setenforce 0 and start the agent again.
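The two steps above can be sketched as a small script. This is an illustration under assumptions, not a tested procedure: it assumes the OSP 10 names `neutron-openvswitch-agent` (systemd unit) and `neutron-rootwrap-daemon` (process), and the `DRY_RUN` switch and function names are hypothetical, added so the commands can be previewed before touching a node.

```shell
#!/bin/sh
# Sketch of the two checks requested above. Assumes the OSP 10 names
# neutron-openvswitch-agent (systemd unit) and neutron-rootwrap-daemon
# (process). Set DRY_RUN=1 to print the commands instead of running them.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Step 1: start the agent with no stale rootwrap daemon around.
step1_kill_rootwrap() {
  run systemctl stop neutron-openvswitch-agent
  run pkill -f neutron-rootwrap-daemon   # exits 1 if no daemon was running
  run systemctl start neutron-openvswitch-agent
}

# Step 2: once the agent is confirmed to still fail, go permissive without
# a reboot and retry. setenforce 0 lasts until "setenforce 1" or a reboot;
# use it for diagnosis only.
step2_permissive() {
  run getenforce
  run setenforce 0
  run systemctl restart neutron-openvswitch-agent
}

# After either step, give the agent a few seconds, then check for br-int.
# ovs-vsctl br-exists exits 0 if the bridge exists and 2 if it does not.
check_br_int() {
  run ovs-vsctl br-exists br-int
}
```

Running `DRY_RUN=1 step1_kill_rootwrap` first shows exactly what will be executed; the real run needs root.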

Comment 14 Saravanan KR 2017-03-17 16:56:01 UTC
We tried your steps; the logs are here: http://paste.openstack.org/show/603165/

After disabling SELinux with setenforce 0 and restarting the agent, the br-int interface comes up.

Comment 15 Assaf Muller 2017-03-23 17:27:14 UTC
Daniel will try to reproduce the bug tomorrow.

Comment 16 Assaf Muller 2017-03-23 17:39:00 UTC
*** Bug 1433432 has been marked as a duplicate of this bug. ***

Comment 17 Daniel Alvarez Sanchez 2017-03-23 17:57:07 UTC
I'll update this tomorrow, but I've seen another bug [0] with the *exact* same denial message. Apparently they installed OVS manually, something went wrong during the installation, and when they repeated the process it worked.
I'll try to install the linked package in this RHBZ on an OSP10 all-in-one and see if it reproduces the issue.

[0] https://bugzilla.redhat.com/show_bug.cgi?id=1402032

Comment 18 Daniel Alvarez Sanchez 2017-03-24 09:05:13 UTC
Can I have access to this environment? It looks like the system is mislabeled, so I'd like to restore the file contexts (sudo restorecon -Rv /) and/or upgrade the selinux-policy package to check whether we still see OVS denials after that.

Comment 19 Daniel Alvarez Sanchez 2017-03-24 14:29:50 UTC
Thanks guys for providing access to the environment. This is what I've seen so far:

- Both compute nodes had denials for ovs-vsctl:

time->Fri Mar 24 13:54:33 2017
type=SYSCALL msg=audit(1490363673.009:278): arch=c000003e syscall=21 success=no exit=-13 a0=7fc4a8000ce0 a1=1 a2=1 a3=fffff000 items=0 ppid=86487 pid=86493 auid=4294967295 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=(none) ses=4294967295 comm="neutron-rootwra" exe="/usr/bin/python2.7" subj=system_u:system_r:neutron_t:s0 key=(null)
type=AVC msg=audit(1490363673.009:278): avc:  denied  { execute } for  pid=86493 comm="neutron-rootwra" name="ovs-vsctl" dev="sdb2" ino=2139730 scontext=system_u:system_r:neutron_t:s0 tcontext=system_u:object_r:unlabeled_t:s0 tclass=file


- Apparently, the OVS binaries are unlabeled:

[root@overcloud-compute-0 audit]# ls -Z /usr/bin/ovs-*
-rwxr-xr-x. root root system_u:object_r:unlabeled_t:s0 /usr/bin/ovs-appctl
-rwxr-xr-x. root root system_u:object_r:unlabeled_t:s0 /usr/bin/ovs-dpctl
-rwxr-xr-x. root root system_u:object_r:unlabeled_t:s0 /usr/bin/ovs-dpctl-top
-rwxr-xr-x. root root system_u:object_r:unlabeled_t:s0 /usr/bin/ovs-ofctl
-rwxr-xr-x. root root system_u:object_r:unlabeled_t:s0 /usr/bin/ovs-pki
-rwxr-xr-x. root root system_u:object_r:unlabeled_t:s0 /usr/bin/ovs-vsctl

- Running "restorecon -Rv /" on one of the nodes solved the issue: after restarting the OVS agent, I don't see any more denials, and the files are correctly labeled:

[root@overcloud-compute-0 audit]# ls -Z /usr/bin/ovs-*
-rwxr-xr-x. root root system_u:object_r:openvswitch_exec_t:s0 /usr/bin/ovs-appctl
-rwxr-xr-x. root root system_u:object_r:bin_t:s0       /usr/bin/ovs-dpctl
-rwxr-xr-x. root root system_u:object_r:bin_t:s0       /usr/bin/ovs-dpctl-top
-rwxr-xr-x. root root system_u:object_r:bin_t:s0       /usr/bin/ovs-ofctl
-rwxr-xr-x. root root system_u:object_r:bin_t:s0       /usr/bin/ovs-pki
-rwxr-xr-x. root root system_u:object_r:openvswitch_exec_t:s0 /usr/bin/ovs-vsctl


- On the other node, I didn't run restorecon; instead, I reinstalled the openvswitch package to see whether the files get correctly labeled:

[heat-admin@overcloud-compute-1 ~]$ ls -aZ /usr/bin/ovs-vsctl 
-rwxr-xr-x. root root system_u:object_r:unlabeled_t:s0 /usr/bin/ovs-vsctl

[heat-admin@overcloud-compute-1 ~]$ sudo rpm -Uvh openvswitch-2.6.1-10.git20161206.el7fdp.x86_64.rpm  --replacepkgs
Preparing...                          ################################# [100%]
Updating / installing...
   1:openvswitch-2.6.1-10.git20161206.################################# [100%]


[heat-admin@overcloud-compute-1 ~]$ ls -aZ /usr/bin/ovs-vsctl 
-rwxr-xr-x. root root system_u:object_r:openvswitch_exec_t:s0 /usr/bin/ovs-vsctl
[heat-admin@overcloud-compute-1 ~]$ 


- The files are now correctly labeled, and there are no more denials in the audit logs.
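A quick way to spot the situation shown in the listings above is to filter `ls -Z` output for files whose type is unlabeled_t. The sketch below is a hypothetical helper that assumes the RHEL 7 `ls -Z` layout (mode, owner, group, context, path) and paths without spaces.

```shell
#!/bin/sh
# Hypothetical helper: read `ls -Z` output (RHEL 7 layout:
# mode owner group context path) on stdin and print each path whose
# SELinux type, the third part of the context, is unlabeled_t.
find_unlabeled() {
  awk '{ split($4, c, ":"); if (c[3] == "unlabeled_t") print $NF }'
}
```

For example, `ls -Z /usr/bin/ovs-* | find_unlabeled | xargs -r restorecon -v` would list the affected binaries and reset only those, a narrower alternative to relabeling the whole filesystem with `restorecon -Rv /`.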


So, apparently, installing openvswitch in the overcloud image left the system mislabeled, and we have to fix that. I'll reach out to someone else and update this BZ accordingly.

Comment 20 Daniel Alvarez Sanchez 2017-03-24 16:28:34 UTC
As suggested in [0], could you please try generating the overcloud image with the --selinux-relabel option so that it gets the correct SELinux labels?

[0] https://bugzilla.redhat.com/show_bug.cgi?id=1253623#c21
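Applied to the command from the description, the suggestion amounts to adding `--selinux-relabel` to the virt-customize call, which relabels files in the guest (or schedules a relabel on next boot if that fails). The sketch below is hypothetical in its wrapper: `customize_image` and the `VIRT_CUSTOMIZE` override are made up so the call can be previewed with `echo`. Like the original command, it assumes the RPM is reachable from inside the image.

```shell
#!/bin/sh
# Sketch: the image customization from the description, with
# --selinux-relabel added so files installed by yum inside the image get
# correct SELinux labels. Set VIRT_CUSTOMIZE=echo to preview the call.
customize_image() {
  image=$1   # e.g. overcloud-full.qcow2
  rpm=$2     # RPM path as seen from inside the image
  "${VIRT_CUSTOMIZE:-virt-customize}" -a "$image" \
    --run-command "yum install $rpm -y" \
    --selinux-relabel
}
```

Usage: `customize_image overcloud-full.qcow2 openvswitch-2.6.1-3.git20161206.el7fdb.x86_64.rpm`.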

Comment 22 Daniel Alvarez Sanchez 2017-03-30 10:56:30 UTC
I'm closing this BZ since it looks like the image was generated only for testing, without the right parameters as in [0]. Therefore, the SELinux labels were not applied correctly and OVS failed.

If you think this needs to be revisited, please reopen it and I'll look into it.

Thanks,
Daniel

[0] https://github.com/redhat-openstack/infrared/blob/d38d396d20af238b1994e95ad5c48c8c80008b04/plugins/tripleo-undercloud/tasks/images/repos.yml#L77

