Bug 1630318
| Summary: | neutron-openvswitch-agent crashes on RHEL 7.6 Beta with SELinux enabled | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 7 | Reporter: | Lon Hohberger <lhh> |
| Component: | selinux-policy | Assignee: | Lukas Vrabec <lvrabec> |
| Status: | CLOSED ERRATA | QA Contact: | Milos Malik <mmalik> |
| Severity: | urgent | Docs Contact: | |
| Priority: | high | | |
| Version: | 7.6 | CC: | aaustin, goneri, jpichon, jschluet, lhh, lmiksik, lvrabec, mmalik, mthacker, omoris, plautrba, psedlak, rsroka, salmy, ssekidde, toneata, vmojzis, zcaplovi |
| Target Milestone: | rc | Keywords: | Regression, ZStream |
| Target Release: | --- | | |
| Hardware: | Unspecified | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | 1628679 | | |
| : | 1635704 | Environment: | |
| Last Closed: | 2019-08-06 12:52:32 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 1628679, 1633527, 1635704, 1653106 | | |
| Attachments: | | | |
Description
Lon Hohberger
2018-09-18 11:47:41 UTC
I have not done a complete RCA here. It is also possible that a change in the selinux-policy package may have caused this. I'll add Lukas for his input.

(In reply to Lon Hohberger from comment #0)
> +++ This bug was initially created as a clone of Bug #1628679 +++
>
> Description of problem:
>
> When testing OSP13 with the latest RHEL 7.6 partner snapshot,
> neutron-openvswitch-agent was found to be constantly crashing and restarting
> on the undercloud with the following traceback:
> 2018-09-13 13:41:05.030 30641 ERROR neutron Traceback (most recent call last):
> 2018-09-13 13:41:05.030 30641 ERROR neutron   File "/usr/bin/neutron-openvswitch-agent", line 10, in <module>
> 2018-09-13 13:41:05.030 30641 ERROR neutron     sys.exit(main())
> 2018-09-13 13:41:05.030 30641 ERROR neutron   File "/usr/lib/python2.7/site-packages/neutron/cmd/eventlet/plugins/ovs_neutron_agent.py", line 20, in main
> 2018-09-13 13:41:05.030 30641 ERROR neutron     agent_main.main()
> 2018-09-13 13:41:05.030 30641 ERROR neutron   File "/usr/lib/python2.7/site-packages/neutron/plugins/ml2/drivers/openvswitch/agent/main.py", line 47, in main
> 2018-09-13 13:41:05.030 30641 ERROR neutron     mod.main()
> 2018-09-13 13:41:05.030 30641 ERROR neutron   File "/usr/lib/python2.7/site-packages/neutron/plugins/ml2/drivers/openvswitch/agent/openflow/native/main.py", line 35, in main
> 2018-09-13 13:41:05.030 30641 ERROR neutron     'neutron.plugins.ml2.drivers.openvswitch.agent.'
> 2018-09-13 13:41:05.030 30641 ERROR neutron   File "/usr/lib/python2.7/site-packages/ryu/base/app_manager.py", line 375, in run_apps
> 2018-09-13 13:41:05.030 30641 ERROR neutron     hub.joinall(services)
> 2018-09-13 13:41:05.030 30641 ERROR neutron   File "/usr/lib/python2.7/site-packages/ryu/lib/hub.py", line 103, in joinall
> 2018-09-13 13:41:05.030 30641 ERROR neutron     t.wait()
> 2018-09-13 13:41:05.030 30641 ERROR neutron   File "/usr/lib/python2.7/site-packages/eventlet/greenthread.py", line 175, in wait
> 2018-09-13 13:41:05.030 30641 ERROR neutron     return self._exit_event.wait()
> 2018-09-13 13:41:05.030 30641 ERROR neutron   File "/usr/lib/python2.7/site-packages/eventlet/event.py", line 125, in wait
> 2018-09-13 13:41:05.030 30641 ERROR neutron     current.throw(*self._exc)
> 2018-09-13 13:41:05.030 30641 ERROR neutron   File "/usr/lib/python2.7/site-packages/eventlet/greenthread.py", line 214, in main
> 2018-09-13 13:41:05.030 30641 ERROR neutron     result = function(*args, **kwargs)
> 2018-09-13 13:41:05.030 30641 ERROR neutron   File "/usr/lib/python2.7/site-packages/ryu/lib/hub.py", line 65, in _launch
> 2018-09-13 13:41:05.030 30641 ERROR neutron     raise e
> 2018-09-13 13:41:05.030 30641 ERROR neutron Exception: Failed to spawn rootwrap process.
> 2018-09-13 13:41:05.030 30641 ERROR neutron stderr:
> 2018-09-13 13:41:05.030 30641 ERROR neutron sudo: PAM account management error: Authentication service cannot retrieve authentication info
>
>
> Version-Release number of selected component (if applicable):
>
> OSP13 Puddle 2018-09-11.1 with RHEL 7.6 Partner Snapshot 2
>
> How reproducible:
>
> Deploy an OSP undercloud with the versions above and observe tracebacks in
> /var/log/neutron/openvswitch-agent.log
>
> Steps to Reproduce:
> 1. Deploy an undercloud based on a RHEL 7.5 image
> 2. Enable OSP puddle and RHEL 7.6 snapshot repositories
> 3. yum update -y
> 4. Install undercloud normally
> 5. Observe tracebacks in /var/log/neutron/openvswitch-agent.log
> 6. Observe avc deny messages in /var/log/audit/audit.log
>
> Actual results:
>
> neutron-openvswitch-agent is in a crash loop and there are many SELinux
> denials logged
>
> Expected results:
>
> neutron-openvswitch-agent should run normally
>
> Additional info:
>
> [root@undercloud selinux]# cat /var/log/audit/audit.log | audit2allow
> #============= neutron_t ==============
> allow neutron_t chkpwd_exec_t:file { execute execute_no_trans open read };
> allow neutron_t pam_var_run_t:file { read write };
> allow neutron_t sendmail_exec_t:file execute;
> allow neutron_t shadow_t:file { getattr open read };
> allow neutron_t sudo_db_t:dir search;
> allow neutron_t var_log_t:file { create open };
>
> [root@undercloud selinux]# rpm -qa | grep openstack-selinux
> openstack-selinux-0.8.14-14.el7ost.noarch
>
> [root@undercloud selinux]# rpm -qa | grep selinux
> openvswitch-selinux-extra-policy-1.0-5.el7fdp.noarch
> libselinux-utils-2.5-14.1.el7.x86_64
> openstack-selinux-0.8.14-14.el7ost.noarch
> libselinux-python-2.5-14.1.el7.x86_64
> selinux-policy-3.13.1-223.el7.noarch
> libselinux-2.5-14.1.el7.x86_64
> container-selinux-2.68-1.el7.noarch
> selinux-policy-targeted-3.13.1-223.el7.noarch
> libselinux-ruby-2.5-14.1.el7.x86_64
>
> --- Additional comment from Lon Hohberger on 2018-09-17 16:41:37 EDT ---
>
> allow neutron_t chkpwd_exec_t:file { execute execute_no_trans open read };
>
> ^ This has been seen before. It seems how chkpwd_unix is executed changed,
> or otherwise, there is something different in the sudo stack that breaks
> existing policies here.

So this is caused solely by an error due to some SELinux imposed restriction? Or is this reproducible even with SELinux in permissive mode?

unix_chkpwd is AFAIK a pam_unix suid helper binary.

I wonder whether we could be hitting this regression which was fixed in a later version of sudo (we are on 1.8.23 in 7.6):

```
On systems using PAM, sudo now ignores the PAM_NEW_AUTHTOK_REQD and PAM_AUTHTOK_EXPIRED errors from PAM account management if authentication is disabled for the user. This fixes a regression introduced in sudo 1.8.23. Bug #843.
```

(In reply to Daniel Kopeček from comment #9)
> (In reply to Lon Hohberger from comment #0)
> > --- Additional comment from Lon Hohberger on 2018-09-17 16:41:37 EDT ---
> >
> > allow neutron_t chkpwd_exec_t:file { execute execute_no_trans open read };
> >
> > ^ This has been seen before. It seems how chkpwd_unix is executed changed,
> > or otherwise, there is something different in the sudo stack that breaks
> > existing policies here.
>
> So this is caused solely by an error due to some SELinux imposed
> restriction? Or is this reproducible even with SELinux in permissive mode?
>
> unix_chkpwd is AFAIK a pam_unix suid helper binary.
>
> I wonder whether we could be hitting this regression which was fixed in a
> later version of sudo (we are on 1.8.23 in 7.6):
>
> ```
> On systems using PAM, sudo now ignores the PAM_NEW_AUTHTOK_REQD and
> PAM_AUTHTOK_EXPIRED errors from PAM account management if authentication is
> disabled for the user. This fixes a regression introduced in sudo 1.8.23.
> Bug #843.
> ```

Upstream patch for this regression is:
https://github.com/millert/sudo/commit/394524fd5d9ee493ce34894579a8c937fa3b9090

AVCs appear in permissive mode, but I believe things work. I will check.

Lon,

Could you please attach the raw audit messages from the audit log? I am re-assigning this to the selinux-policy component. It is certainly an issue in the SELinux security policy, but I would like to see the raw AVC messages to be sure what is going on.

Thanks,
Lukas.

I've asked for audit.log from the original bug. Current CI runs still exhibit the problem; however, I have not checked which selinux-policy was in use.

Created attachment 1487732
audit.log gathered with permissive mode set
audit.log gathered by our QA colleagues with SELinux in permissive mode
Investigation of the attached audit.log file revealed that some SELinux denials are already fixed (in selinux-policy-3.13.1-229.el7) by the following rule:
allow neutron_t initrc_var_run_t:file { lock open read };
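As a rough illustration of what "already fixed by a rule" means, the Python sketch below checks whether a denial is covered by a list of allow rules. It is a simplified stand-in for the kind of check audit2why performs against the loaded policy: it only understands plain single-line `allow` rules and ignores attributes, booleans, and constraints. The function names and the sample denials are hypothetical; only the rule text is taken from this report.

```python
import re

# Matches a single-line allow rule, with either "{ a b c }" or one bare permission.
ALLOW_RE = re.compile(
    r"allow\s+(?P<source>\w+)\s+(?P<target>\w+):(?P<tclass>\w+)\s+"
    r"(?:\{(?P<perms>[^}]*)\}|(?P<perm>\w+))\s*;"
)


def parse_allow(rule):
    """Return ((source, target, class), frozenset(permissions)) for one rule."""
    match = ALLOW_RE.fullmatch(rule.strip())
    if match is None:
        raise ValueError("not a simple allow rule: %r" % rule)
    raw = match.group("perms")
    if raw is None:
        raw = match.group("perm")
    key = (match.group("source"), match.group("target"), match.group("tclass"))
    return key, frozenset(raw.split())


def is_covered(denial_key, denied_perms, rules):
    """True if every denied permission is granted by some rule with the same key."""
    allowed = set()
    for rule in rules:
        key, perms = parse_allow(rule)
        if key == denial_key:
            allowed |= perms
    return set(denied_perms) <= allowed


# Rule quoted in this report as present in selinux-policy-3.13.1-229.el7.
FIXED_IN_229 = ["allow neutron_t initrc_var_run_t:file { lock open read };"]

# Hypothetical denials, used only to exercise the check.
key = ("neutron_t", "initrc_var_run_t", "file")
print(is_covered(key, {"open", "read"}, FIXED_IN_229))  # True
print(is_covered(key, {"write"}, FIXED_IN_229))         # False
```

On a real system the authoritative answer comes from the installed policy, not from a hand-written rule list like this one.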
The only SELinux denial that is NOT yet fixed is:
----
type=USER_AVC msg=audit(09/27/2018 12:34:05.682:15393) : pid=2252 uid=dbus auid=unset ses=unset subj=system_u:system_r:system_dbusd_t:s0-s0:c0.c1023 msg='avc: denied { send_msg } for msgtype=method_return dest=:1.1103 spid=2405 tpid=1096 scontext=system_u:system_r:systemd_logind_t:s0 tcontext=system_u:system_r:cinder_volume_t:s0 tclass=dbus exe=/usr/bin/dbus-daemon sauid=dbus hostname=? addr=? terminal=?'
----
To fix it, the following rule needs to be added:
allow systemd_logind_t cinder_volume_t:dbus { send_msg };
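The step from the denial to the proposed rule can be sketched in Python. This is a minimal illustration of the idea behind audit2allow, not its actual implementation: it assumes a single audit record that contains `denied { ... }`, `scontext=`, `tcontext=`, and `tclass=` fields, and the function names are made up for this example.

```python
import re

DENIED_RE = re.compile(r"avc:\s+denied\s+\{(?P<perms>[^}]*)\}")
FIELD_RE = re.compile(r"\b(scontext|tcontext|tclass)=(\S+)")
REQUIRED = ("scontext", "tcontext", "tclass")


def selinux_type(context):
    """Extract the type from a context of the form user:role:type:level."""
    return context.split(":")[2]


def avc_to_allow(record):
    """Build the allow rule that would permit the access denied in `record`."""
    denied = DENIED_RE.search(record)
    fields = dict(FIELD_RE.findall(record))
    if denied is None or not all(name in fields for name in REQUIRED):
        raise ValueError("record does not look like an AVC denial")
    perms = " ".join(sorted(denied.group("perms").split()))
    return "allow {} {}:{} {{ {} }};".format(
        selinux_type(fields["scontext"]),
        selinux_type(fields["tcontext"]),
        fields["tclass"],
        perms,
    )


# The USER_AVC record quoted above.
RECORD = (
    "type=USER_AVC msg=audit(09/27/2018 12:34:05.682:15393) : pid=2252 uid=dbus "
    "auid=unset ses=unset subj=system_u:system_r:system_dbusd_t:s0-s0:c0.c1023 "
    "msg='avc: denied { send_msg } for msgtype=method_return dest=:1.1103 "
    "spid=2405 tpid=1096 scontext=system_u:system_r:systemd_logind_t:s0 "
    "tcontext=system_u:system_r:cinder_volume_t:s0 tclass=dbus "
    "exe=/usr/bin/dbus-daemon sauid=dbus hostname=? addr=? terminal=?'"
)

print(avc_to_allow(RECORD))
# allow systemd_logind_t cinder_volume_t:dbus { send_msg };
```

Note that the `subj=` field (the D-Bus daemon that reported the denial) is deliberately ignored; the rule is built from `scontext` and `tcontext` only.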
From my point of view, the fact that the following rule is NOT present in the latest selinux-policy build is a minor issue:
allow systemd_logind_t cinder_volume_t:dbus { send_msg };
The result of the missing rule is that D-Bus communication between systemd-logind and cinder-volume does not work in both directions: replies (method_return messages) from systemd-logind to cinder-volume are denied.
If you are familiar with cinder-volume, please tell us how important this communication is and whether an additional fix is needed ASAP.
I'll check on this. However, this latter part doesn't seem like a regression. Here's a previous example of it:
https://bugzilla.redhat.com/show_bug.cgi?id=1207098

Also confirmed - audit2why on the 3 utmp AVCs all net:

    Was caused by:
        Unknown - would be allowed by active policy
        Possible mismatch between this policy and the one under which the audit message was generated.

        Possible mismatch between current in-memory boolean settings vs. permanent ones.

... signalling that these AVCs are fixed (even though our test runs were pulling -228 for some reason).

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:2127