Bug 1630318

Summary: neutron-openvswitch-agent crashes on RHEL 7.6 Beta with SELinux enabled
Product: Red Hat Enterprise Linux 7
Component: selinux-policy
Version: 7.6
Status: CLOSED ERRATA
Severity: urgent
Priority: high
Reporter: Lon Hohberger <lhh>
Assignee: Lukas Vrabec <lvrabec>
QA Contact: Milos Malik <mmalik>
CC: aaustin, goneri, jpichon, jschluet, lhh, lmiksik, lvrabec, mmalik, mthacker, omoris, plautrba, psedlak, rsroka, salmy, ssekidde, toneata, vmojzis, zcaplovi
Target Milestone: rc
Keywords: Regression, ZStream
Hardware: Unspecified
OS: Linux
Type: Bug
Clone Of: 1628679
: 1635704
Last Closed: 2019-08-06 12:52:32 UTC
Bug Blocks: 1628679, 1633527, 1635704, 1653106
Attachments: audit.log gathered with permissive mode set (flags: none)

Description Lon Hohberger 2018-09-18 11:47:41 UTC
+++ This bug was initially created as a clone of Bug #1628679 +++

Description of problem:

When testing OSP13 with the latest RHEL 7.6 partner snapshot, neutron-openvswitch-agent was found to be constantly crashing and restarting on the undercloud with the following traceback:
2018-09-13 13:41:05.030 30641 ERROR neutron Traceback (most recent call last):
2018-09-13 13:41:05.030 30641 ERROR neutron   File "/usr/bin/neutron-openvswitch-agent", line 10, in <module>
2018-09-13 13:41:05.030 30641 ERROR neutron     sys.exit(main())
2018-09-13 13:41:05.030 30641 ERROR neutron   File "/usr/lib/python2.7/site-packages/neutron/cmd/eventlet/plugins/ovs_neutron_agent.py", line 20, in main
2018-09-13 13:41:05.030 30641 ERROR neutron     agent_main.main()
2018-09-13 13:41:05.030 30641 ERROR neutron   File "/usr/lib/python2.7/site-packages/neutron/plugins/ml2/drivers/openvswitch/agent/main.py", line 47, in main
2018-09-13 13:41:05.030 30641 ERROR neutron     mod.main()
2018-09-13 13:41:05.030 30641 ERROR neutron   File "/usr/lib/python2.7/site-packages/neutron/plugins/ml2/drivers/openvswitch/agent/openflow/native/main.py", line 35, in main
2018-09-13 13:41:05.030 30641 ERROR neutron     'neutron.plugins.ml2.drivers.openvswitch.agent.'
2018-09-13 13:41:05.030 30641 ERROR neutron   File "/usr/lib/python2.7/site-packages/ryu/base/app_manager.py", line 375, in run_apps
2018-09-13 13:41:05.030 30641 ERROR neutron     hub.joinall(services)
2018-09-13 13:41:05.030 30641 ERROR neutron   File "/usr/lib/python2.7/site-packages/ryu/lib/hub.py", line 103, in joinall
2018-09-13 13:41:05.030 30641 ERROR neutron     t.wait()
2018-09-13 13:41:05.030 30641 ERROR neutron   File "/usr/lib/python2.7/site-packages/eventlet/greenthread.py", line 175, in wait
2018-09-13 13:41:05.030 30641 ERROR neutron     return self._exit_event.wait()
2018-09-13 13:41:05.030 30641 ERROR neutron   File "/usr/lib/python2.7/site-packages/eventlet/event.py", line 125, in wait
2018-09-13 13:41:05.030 30641 ERROR neutron     current.throw(*self._exc)
2018-09-13 13:41:05.030 30641 ERROR neutron   File "/usr/lib/python2.7/site-packages/eventlet/greenthread.py", line 214, in main
2018-09-13 13:41:05.030 30641 ERROR neutron     result = function(*args, **kwargs)
2018-09-13 13:41:05.030 30641 ERROR neutron   File "/usr/lib/python2.7/site-packages/ryu/lib/hub.py", line 65, in _launch
2018-09-13 13:41:05.030 30641 ERROR neutron     raise e
2018-09-13 13:41:05.030 30641 ERROR neutron Exception: Failed to spawn rootwrap process.
2018-09-13 13:41:05.030 30641 ERROR neutron stderr:
2018-09-13 13:41:05.030 30641 ERROR neutron sudo: PAM account management error: Authentication service cannot retrieve authentication info


Version-Release number of selected component (if applicable):

OSP13 Puddle 2018-09-11.1 with RHEL 7.6 Partner Snapshot 2

How reproducible:

Deploy an OSP undercloud with the versions above and observe tracebacks in /var/log/neutron/openvswitch-agent.log

Steps to Reproduce:
1. Deploy an undercloud based on a RHEL 7.5 image
2. Enable OSP puddle and RHEL 7.6 snapshot repositories
3. yum update -y
4. Install undercloud normally
5. Observe tracebacks in /var/log/neutron/openvswitch-agent.log
6. Observe avc deny messages in /var/log/audit/audit.log
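
Steps 5 and 6 amount to scanning two logs for known markers. A minimal sketch, assuming the error string from the traceback above and the usual `scontext=user:role:type:level` layout of AVC records; the function names are hypothetical, and the match is a simple pattern test rather than a full audit-record parser:

```python
import re

# Error string taken from the traceback in this report.
ROOTWRAP_ERROR = "Failed to spawn rootwrap process"

# Matches records whose *source* context has the neutron_t type.
NEUTRON_SOURCE = re.compile(r"scontext=\S*:neutron_t:")


def count_rootwrap_failures(lines):
    """Count agent log lines that report the rootwrap spawn failure."""
    return sum(1 for line in lines if ROOTWRAP_ERROR in line)


def neutron_denials(lines):
    """Return audit log lines that record a denial with a neutron_t source context."""
    return [
        line.rstrip("\n")
        for line in lines
        if "denied" in line and NEUTRON_SOURCE.search(line)
    ]
```

Both functions take any iterable of lines, so they can be fed an open file handle for /var/log/neutron/openvswitch-agent.log or /var/log/audit/audit.log respectively.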

Actual results:

neutron-openvswitch-agent is in a crash loop and there are many SELinux denials logged

Expected results:

neutron-openvswitch-agent should run normally

Additional info:

[root@undercloud selinux]# cat /var/log/audit/audit.log | audit2allow
#============= neutron_t ==============
allow neutron_t chkpwd_exec_t:file { execute execute_no_trans open read };
allow neutron_t pam_var_run_t:file { read write };
allow neutron_t sendmail_exec_t:file execute;
allow neutron_t shadow_t:file { getattr open read };
allow neutron_t sudo_db_t:dir search;
allow neutron_t var_log_t:file { create open };

[root@undercloud selinux]# rpm -qa | grep openstack-selinux
openstack-selinux-0.8.14-14.el7ost.noarch

[root@undercloud selinux]# rpm -qa | grep selinux
openvswitch-selinux-extra-policy-1.0-5.el7fdp.noarch
libselinux-utils-2.5-14.1.el7.x86_64
openstack-selinux-0.8.14-14.el7ost.noarch
libselinux-python-2.5-14.1.el7.x86_64
selinux-policy-3.13.1-223.el7.noarch
libselinux-2.5-14.1.el7.x86_64
container-selinux-2.68-1.el7.noarch
selinux-policy-targeted-3.13.1-223.el7.noarch
libselinux-ruby-2.5-14.1.el7.x86_64
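
The audit2allow output above is easier to review once each rule is split into its parts. A minimal sketch, assuming only the `allow source target:class perms;` syntax shown in that output; `parse_allow_rules` is a hypothetical helper, not part of any SELinux tooling:

```python
import re

# One rule per line: either a braced permission set or a single permission.
ALLOW_RULE = re.compile(
    r"^allow\s+(?P<source>\w+)\s+(?P<target>\w+):(?P<tclass>\w+)\s+"
    r"(?:\{\s*(?P<perms>[^}]+)\}|(?P<perm>\w+))\s*;$"
)


def parse_allow_rules(text):
    """Parse audit2allow-style rules into (source, target, class, perms) tuples."""
    rules = []
    for line in text.splitlines():
        match = ALLOW_RULE.match(line.strip())
        if not match:
            continue  # skip comment headers, blanks, and anything unrecognized
        perms = match.group("perms") or match.group("perm")
        rules.append(
            (match.group("source"), match.group("target"),
             match.group("tclass"), tuple(perms.split()))
        )
    return rules


# Excerpt of the output quoted in this report.
SAMPLE = """\
#============= neutron_t ==============
allow neutron_t chkpwd_exec_t:file { execute execute_no_trans open read };
allow neutron_t sendmail_exec_t:file execute;
allow neutron_t shadow_t:file { getattr open read };
"""

for source, target, tclass, perms in parse_allow_rules(SAMPLE):
    print(f"{source} -> {target} [{tclass}]: {', '.join(perms)}")
```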

--- Additional comment from Lon Hohberger on 2018-09-17 16:41:37 EDT ---

allow neutron_t chkpwd_exec_t:file { execute execute_no_trans open read };

^ This has been seen before. It seems that the way unix_chkpwd is executed has changed, or else something in the sudo stack is different and breaks existing policies.

--- Additional comment from Lon Hohberger on 2018-09-17 16:42:08 EDT ---

This behavior does not occur on 7.5 and prior.

Comment 3 Lon Hohberger 2018-09-20 13:05:09 UTC
I have not done a complete RCA here. A change in the selinux-policy package may also have caused this. I'll add Lukas for his input.

Comment 9 Daniel Kopeček 2018-09-21 09:26:38 UTC
(In reply to Lon Hohberger from comment #0)
> --- Additional comment from Lon Hohberger on 2018-09-17 16:41:37 EDT ---
> 
> allow neutron_t chkpwd_exec_t:file { execute execute_no_trans open read };
> 
> ^ This has been seen before. It seems that the way unix_chkpwd is executed
> has changed, or else something in the sudo stack is different and breaks
> existing policies.

So is this caused solely by an SELinux-imposed restriction? Or is it reproducible even with SELinux in permissive mode?

unix_chkpwd is AFAIK a pam_unix suid helper binary.

I wonder whether we could be hitting this regression which was fixed in a later version of sudo (we are on 1.8.23 in 7.6):

```
On systems using PAM, sudo now ignores the PAM_NEW_AUTHTOK_REQD and PAM_AUTHTOK_EXPIRED errors from PAM account management if authentication is disabled for the user. This fixes a regression introduced in sudo 1.8.23. Bug #843. 
```
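
The behavior described in the release note above can be modeled as a small decision function. This is an illustrative sketch only, not sudo's code: the enum mirrors PAM result names rather than their C values, `OTHER_ERROR` stands in for every other failure, and `account_check_passes` and its `auth_disabled` parameter are hypothetical names. Other handling, such as prompting for a password change, is deliberately left out.

```python
from enum import Enum, auto


class PamResult(Enum):
    SUCCESS = auto()
    NEW_AUTHTOK_REQD = auto()
    AUTHTOK_EXPIRED = auto()
    OTHER_ERROR = auto()  # placeholder for any other account-management failure


# Results the release note says are ignored when authentication is disabled.
IGNORABLE_WHEN_AUTH_DISABLED = {
    PamResult.NEW_AUTHTOK_REQD,
    PamResult.AUTHTOK_EXPIRED,
}


def account_check_passes(result, auth_disabled):
    """Model whether an account-management result lets the command proceed."""
    if result is PamResult.SUCCESS:
        return True
    return auth_disabled and result in IGNORABLE_WHEN_AUTH_DISABLED
```

In this model, any failure outside the two named results still blocks the command even when authentication is disabled for the user.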

Comment 10 Daniel Kopeček 2018-09-21 10:14:00 UTC
(In reply to Daniel Kopeček from comment #9)
> I wonder whether we could be hitting this regression which was fixed in a
> later version of sudo (we are on 1.8.23 in 7.6):
> 
> ```
> On systems using PAM, sudo now ignores the PAM_NEW_AUTHTOK_REQD and
> PAM_AUTHTOK_EXPIRED errors from PAM account management if authentication is
> disabled for the user. This fixes a regression introduced in sudo 1.8.23.
> Bug #843. 
> ```

Upstream patch for this regression is: https://github.com/millert/sudo/commit/394524fd5d9ee493ce34894579a8c937fa3b9090

Comment 11 Lon Hohberger 2018-09-21 11:05:36 UTC
AVCs appear in permissive mode, but I believe things work. I will check.

Comment 13 Lukas Vrabec 2018-09-24 16:03:19 UTC
Lon, 

Could you please attach the raw audit messages from the audit log?

Comment 14 Lukas Vrabec 2018-09-24 16:04:39 UTC
I am reassigning this to the selinux-policy component. It is certainly an issue in the SELinux security policy, but I would like to be sure what is going on in the raw AVC messages.

Thanks,
Lukas.

Comment 16 Lon Hohberger 2018-09-25 16:53:31 UTC
I've asked for audit.log from the original bug. Current CI runs still exhibit the problem; however, I have not checked which selinux-policy was in use.

Comment 25 Zoli Caplovic 2018-09-27 11:15:36 UTC
Created attachment 1487732 [details]
audit.log gathered with permissive mode set

audit.log gathered by our QA colleagues with SELinux in permissive mode

Comment 27 Milos Malik 2018-09-27 12:09:48 UTC
Investigation of the attached audit.log file revealed that some SELinux denials are already fixed (in selinux-policy-3.13.1-229.el7) by the following rule:

allow neutron_t initrc_var_run_t:file { lock open read };

The only SELinux denial that is NOT yet fixed is:
----
type=USER_AVC msg=audit(09/27/2018 12:34:05.682:15393) : pid=2252 uid=dbus auid=unset ses=unset subj=system_u:system_r:system_dbusd_t:s0-s0:c0.c1023 msg='avc:  denied  { send_msg } for msgtype=method_return dest=:1.1103 spid=2405 tpid=1096 scontext=system_u:system_r:systemd_logind_t:s0 tcontext=system_u:system_r:cinder_volume_t:s0 tclass=dbus  exe=/usr/bin/dbus-daemon sauid=dbus hostname=? addr=? terminal=?' 
----

To fix it, the following rule needs to be added:

allow systemd_logind_t cinder_volume_t:dbus { send_msg };
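
The step from the denial to the proposed rule can be sketched mechanically. A minimal sketch, assuming the `denied { ... }`, `scontext=`, `tcontext=` and `tclass=` fields seen in the record above; `avc_to_allow_rule` is a hypothetical helper that does roughly what audit2allow does for a single record, and `RECORD` is a shortened excerpt of that record:

```python
import re


def avc_to_allow_rule(record):
    """Build an allow rule from the denied permissions and contexts in an AVC record."""
    perms = re.search(r"denied\s+\{\s*([^}]+?)\s*\}", record)
    source = re.search(r"scontext=([^\s']+)", record)
    target = re.search(r"tcontext=([^\s']+)", record)
    tclass = re.search(r"tclass=(\w+)", record)
    if not (perms and source and target and tclass):
        raise ValueError("not a recognizable AVC denial")
    # A context is user:role:type[:level]; the type is the third field.
    source_type = source.group(1).split(":")[2]
    target_type = target.group(1).split(":")[2]
    perm_list = " ".join(perms.group(1).split())
    return f"allow {source_type} {target_type}:{tclass.group(1)} {{ {perm_list} }};"


RECORD = (
    "msg='avc:  denied  { send_msg } for msgtype=method_return dest=:1.1103 "
    "spid=2405 tpid=1096 scontext=system_u:system_r:systemd_logind_t:s0 "
    "tcontext=system_u:system_r:cinder_volume_t:s0 tclass=dbus'"
)

# prints: allow systemd_logind_t cinder_volume_t:dbus { send_msg };
print(avc_to_allow_rule(RECORD))
```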

Comment 28 Milos Malik 2018-09-27 12:22:39 UTC
From my point of view, the fact that the following rule is NOT present in the latest selinux-policy build is a minor issue:

allow systemd_logind_t cinder_volume_t:dbus { send_msg };

The result of the missing rule is that D-Bus communication between systemd-logind and cinder-volume does not work both ways.

If you are familiar with cinder-volume, please tell us how important this communication is and whether an additional fix is needed ASAP.

Comment 29 Lon Hohberger 2018-09-27 14:09:09 UTC
I'll check on this. However, this latter part doesn't seem like a regression.

Comment 30 Lon Hohberger 2018-09-27 14:11:00 UTC
Here's a previous example of it:

https://bugzilla.redhat.com/show_bug.cgi?id=1207098

Comment 31 Lon Hohberger 2018-09-27 18:55:05 UTC
Also confirmed: audit2why reports the following for all 3 utmp AVCs:

	Was caused by:
		Unknown - would be allowed by active policy
		Possible mismatch between this policy and the one under which the audit message was generated.

		Possible mismatch between current in-memory boolean settings vs. permanent ones.

... signalling that these AVCs are fixed (even though our test runs were pulling -228 for some reason)
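
Whether a host already carries the policy fix can be checked roughly by comparing the installed selinux-policy release against the build named in comment 27 (selinux-policy-3.13.1-229.el7). A minimal sketch, assuming package strings shaped like the `rpm -qa` output in the description; `release_number` and `has_fixed_policy` are hypothetical helpers, and the comparison looks only at the leading release integer of a 3.13.1 build, so it is not a full RPM version comparison:

```python
import re

FIXED_RELEASE = 229  # from comment 27: selinux-policy-3.13.1-229.el7


def release_number(package):
    """Extract the release integer from e.g. 'selinux-policy-3.13.1-223.el7.noarch'."""
    match = re.search(r"^selinux-policy-3\.13\.1-(\d+)\.", package)
    if match is None:
        raise ValueError(f"unexpected package string: {package!r}")
    return int(match.group(1))


def has_fixed_policy(package):
    """Return True if the build is at or past the release named in comment 27."""
    return release_number(package) >= FIXED_RELEASE


for pkg in (
    "selinux-policy-3.13.1-223.el7.noarch",  # version listed in the description
    "selinux-policy-3.13.1-229.el7.noarch",  # build named in comment 27
):
    print(pkg, has_fixed_policy(pkg))
```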

Comment 57 errata-xmlrpc 2019-08-06 12:52:32 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:2127