Bug 1261711
Summary: Problem with fence_virsh in RHEL 6 - selinux denial

| Field | Value | Field | Value |
|---|---|---|---|
| Product | Red Hat Enterprise Linux 6 | Reporter | Madison Kelly <mkelly> |
| Component | fence-agents | Assignee | Marek Grac <mgrac> |
| Status | CLOSED WONTFIX | QA Contact | cluster-qe <cluster-qe> |
| Severity | unspecified | Priority | unspecified |
| Version | 6.7 | CC | cluster-maint, dlavu, jpokorny, rbalakri, tojeline |
| Target Milestone | rc | Target Release | --- |
| Hardware | Unspecified | OS | Unspecified |
| Doc Type | If docs needed, set a value | Story Points | --- |
| Type | Bug | Regression | --- |
| Last Closed | 2017-12-06 10:38:08 UTC | Bug Blocks | 1427986 (view as bug list) |
Description (Madison Kelly, 2015-09-10 03:14:14 UTC)
Noticed that there is a boolean to permit ssh_t access to fenced_t; can you please try enabling it?

    setsebool -P fenced_can_ssh on

What are the results? Thanks.

---

That appears to have fixed it.

====
[root@node1 ~]# sestatus
SELinux status:                 enabled
SELinuxfs mount:                /selinux
Current mode:                   enforcing
Mode from config file:          permissive
Policy version:                 24
Policy from config file:        targeted
====

====
[root@node1 ~]# setsebool -P fenced_can_ssh on
[root@node1 ~]# ls -Z `which fence_virsh` `which fence_ipmilan` `which ssh`
-rwxr-xr-x. root root system_u:object_r:ssh_exec_t:s0 /usr/bin/ssh
-rwxr-xr-x. root root system_u:object_r:bin_t:s0      /usr/sbin/fence_ipmilan
-rwxr-xr-x. root root system_u:object_r:bin_t:s0      /usr/sbin/fence_virsh
====

====
[root@node1 ~]# clustat
Cluster Status for ccrs @ Thu Sep 10 04:19:19 2015
Member Status: Quorate

 Member Name        ID   Status
 ------ ----        ---- ------
 node1.ccrs.bcn     1    Online, Local
 node2.ccrs.bcn     2    Online
====

====
[root@node1 ~]# fence_node node2
fence node2 success
====

syslog:

====
Sep 10 04:18:44 node1 dbus: avc: received policyload notice (seqno=2)
Sep 10 04:18:44 node1 dbus: [system] Reloaded configuration
Sep 10 04:18:44 node1 setsebool: The fenced_can_ssh policy boolean was changed to on by root
====

Manual fence call:

====
Sep 10 04:19:28 node1 fence_node[27458]: fence node2 success
====

Corosync-initiated fence:

====
Sep 10 04:19:35 node1 corosync[2792]: [TOTEM ] A processor failed, forming new configuration.
Sep 10 04:19:37 node1 corosync[2792]: [QUORUM] Members[1]: 1
Sep 10 04:19:37 node1 corosync[2792]: [TOTEM ] A processor joined or left the membership and a new membership was formed.
Sep 10 04:19:37 node1 corosync[2792]: [CPG   ] chosen downlist: sender r(0) ip(10.20.10.1) r(1) ip(10.10.10.1) ; members(old:2 left:1)
Sep 10 04:19:37 node1 corosync[2792]: [MAIN  ] Completed service synchronization, ready to provide service.
Sep 10 04:19:37 node1 kernel: dlm: closing connection to node 2
Sep 10 04:19:37 node1 fenced[2879]: node_history_fence_external no nodeid -1
Sep 10 04:19:37 node1 fenced[2879]: fencing node node2.ccrs.bcn
Sep 10 04:19:40 node1 fenced[2879]: fence node2.ccrs.bcn success
====

Excellent! Now, is this a bug to be fixed in the package, or something I need to do manually when I set up clusters in the future?

---

This will be something you'd want to add to your configuration steps. It is a boolean to be enabled; not all clusters use this agent, so I don't think we can convince the SELinux folks to enable it by default.

---

Could the fence-agents package set it in the post section of the RPM? I can imagine many users hitting this and being stumped...

---

Digimer, I agree it is something that should be documented better; the official documentation makes a note only about the fenced_can_network_connect boolean and fence_xvm. Maybe file an RFE to have it added to the documentation and to the fence agent directly.

---

@Digimer: I'm not sure about adding it to the post section, because fence agents are also integrated with other packages (e.g. RHEV, OpenStack), where this setup would open security up more than required. The best place to do it is probably in pcs, which can allow it when the fence agent is really used in a cluster. I will investigate that a bit more.

---

@Digimer: I have tested 6.7, 6.6, 6.5 and 6.4 (I don't have earlier versions installed) and fenced_can_ssh is there in every version (and is off by default). So this does not look like a regression to me.

---

@Marek: I totally believe you, but somehow it only started to be a problem for me very recently. I've used this setup to test HA on VMs for quite some time and never hit this selinux issue. I am curious/worried about what might have changed.

Re: "The best place to do it is probably in pcs": please don't forget rgmanager. :)

For me personally, I now have our installer enabling fenced_can_ssh, so for me the issue is fixed. Can I make a suggestion, though? Can you have fence_virsh check and, if the boolean isn't set, log a more useful error message telling the user what is wrong?

---

Marek, it's a fence agents vs. SELinux integration issue; please do not try to push changes elsewhere. It's simply not manageable/scalable, in addition to Digimer's point wrt. rgmanager:

- do you really want pcs to play SELinux magic across all nodes at arbitrary configuration, over and over, just to be sure? pcs is a management tool, not a tool for working around parts of the distribution not playing well together out of the box
- this applies to RHEL distributions only; everything besides the SELinux policy tends to be suitable for general consumption by arbitrary distros (and distro-specific patches are something one wants to avoid as much as possible)

---

@Jan: pcs & pacemaker are the tools that induce usage of fenced_can_ssh (and others), so they should be the ones that allow them in SELinux. So I don't think that putting it there is a workaround; it is more standard integration stuff. From the fence agents perspective, we can include all required info in metadata, configurable at build time, so it won't be distro-specific at all.

The proposition made by Tomas Jelinek to do it in the post-install script of pcs is an acceptable solution for both pcs and users.

@Digimer: IMHO it is not possible for applications to find out that SELinux blocked something. I can improve the error message to contain info about SELinux booleans; does that sound good?

---

I was saying to do it in the post-install script of the particular fence agents, not pcs. It does not make any sense to do this in the post-install of pcs. For example, one can run pcs on a host which is not part of a cluster at all (and use the host to manage other clusters via the pcsd web UI).

---

@Marek
I'm all for anything that helps a user realize the source of a failure more easily. Adding something like "Hint: check 'getsebool fenced_can_ssh', it needs to be 'on'." if SELinux is enforcing? Whatever makes sense to you.
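The hint suggested above can be sketched in Python using the stat()-based detection that comment 16 later describes (stat(3) failing with EACCES on an existing, executable file is a strong sign of SELinux denial). This is an illustrative sketch only, not the shipped fence-agents code; the function names and the `emit` log callback are hypothetical:

```python
import errno
import os


def selinux_may_have_denied(path):
    """Return True when stat() on path fails with EACCES.

    A confined (fenced_t) process blocked by SELinux typically gets
    EACCES from stat(3) on /usr/bin/ssh even though the file exists
    and is executable for unconfined users. A missing file raises
    ENOENT instead, so it is not flagged.
    """
    try:
        os.stat(path)
    except OSError as exc:
        return exc.errno == errno.EACCES
    return False


def log_selinux_hint(path, emit):
    """Hypothetical helper: emit the hint Digimer asked for.

    A real agent would route this through its own logging so the
    message lands in /var/log/messages.
    """
    if selinux_may_have_denied(path):
        emit("Hint: check 'getsebool fenced_can_ssh', it needs to be "
             "'on' (stat on %s returned EACCES, likely SELinux)" % path)
```

On an unconfined shell, `log_selinux_hint("/usr/bin/ssh", print)` prints nothing, since stat() succeeds there; only a confined, denied process would trigger the hint.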
---

@Tomas

> I was saying to do it in the post-install script of the particular fence agents

Makes logical sense to me.
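A post-install scriptlet (or an installer, as Digimer mentions) that sets the boolean should avoid rerunning the slow `setsebool -P` when the boolean is already on. A hedged Python sketch of that check-then-set pattern follows; the function names are hypothetical, and the parsing assumes getsebool's usual `name --> value` output and the standard tool paths quoted in this bug:

```python
import subprocess


def boolean_is(getsebool_output, value):
    """Parse getsebool output of the form 'fenced_can_ssh --> on'."""
    return getsebool_output.strip().endswith("--> " + value)


def ensure_boolean(name, value="on"):
    """Idempotently set an SELinux boolean.

    'setsebool -P' rebuilds the policy store, which is slow, so the
    current value is checked first and the rebuild is skipped when
    the boolean is already in the desired state.
    """
    out = subprocess.check_output(["/usr/sbin/getsebool", name],
                                  env={"LANG": "C"})
    if boolean_is(out.decode(), value):
        return False  # already set; skip the expensive -P rebuild
    subprocess.check_call(["/usr/sbin/setsebool", "-P", name, value])
    return True


# Example (requires root and SELinux tooling installed):
#   ensure_boolean("fenced_can_ssh", "on")
```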
---

Created attachment 1073829 [details]: Script validating some points in the connected comment

A couple of notes here:

1. Setting a SELinux boolean is an expensive operation; "/usr/sbin/setsebool -P fenced_can_ssh on" takes ca. 35 seconds in my VM.
2. setsebool doesn't take the current state into account; it will do this expensive operation even if it's not necessary (already enabled).

1 + 2 --> you always want to run:

    LANG=C /usr/sbin/getsebool fenced_can_ssh | grep -qE 'on$' \
      || /usr/sbin/setsebool -P fenced_can_ssh on

3. The fencing library (the core of the fence agents) does *NOT* do a proper job of detecting whether the commands to be used are actually regular existing files, and if so, whether they are executable.
4. The python-pexpect package will then not save the situation (either the errors/exceptions are not propagated correctly, or the fencing library doesn't handle them well).
5. From the SELinux vs. Python perspective, one can check without risk of an exception:
   - os.access(path, os.F_OK) <-- whether the file exists at all; should report true for /usr/bin/ssh even in a fenced_t process
   - os.access(path, os.X_OK) <-- whether an executable file exists; should report false for /usr/bin/ssh in a fenced_t process, even though it exists and is executable
   and, while being prepared for an OSError exception:
   - os.stat(path) <-- wrapper around stat(3); for /usr/bin/ssh in a fenced_t process it will raise that exception, and if its errno == 13 (errno.EACCES), the cause is allegedly (not 100%) SELinux (similarly with subprocess.Popen, but that one has the undesired side effect of actually running the executable if possible)

FWIW, this partially refutes the claim from [comment 13]:

> IMHO it is not possible for applications to find out that SELinux blocked something. I can improve the error message to contain info about SELinux booleans; does that sound good?

as os.stat + OSError + errno == 13 is quite a reliable combination for detecting SELinux silently ruling.

3 + 4 + 5 --> there is an apparent technical solution for when to proceed with [comment 15]:

> anything that helps a user realize the source of a failure more easily.

Attached is a script used to validate some points. To use it, the environment has to be prepared in a special way: SELinux has to be enabled, but permissive only (setenforce 0), as the artificial transition of the context (as with the runcon utility) is apparently also a protected action. Hence the sequence to play with the script, with the setsebool line commented out or not for good measure, is:

    # wget https://bugzilla.redhat.com/<ATTACHMENT> -O test.py
    # setenforce 0
    # runcon system_u:object_r:fenced_t:s0 /usr/bin/python test.py /usr/bin/ssh -V

Results with the setsebool line commented out:

> Exist and Executable and Executable: /usr/bin/ssh
> Running: /usr/sbin/setenforce 1
> Process returned: 0
> SELinux switch to enforcing
> System refused access to `/usr/bin/ssh', perhaps due to SELinux
> Exist and Not executable and Not executable: /usr/bin/ssh
> Running: /usr/bin/ssh -V
> System refused to execute `/usr/bin/ssh -V', perhaps due to SELinux

and when not:

> Exist and Executable and Executable: /usr/bin/ssh
> Running: /usr/sbin/setsebool -P fenced_can_ssh on
> Process returned: 0
> Running: /usr/sbin/setenforce 1
> Process returned: 0
> SELinux switch to enforcing
> Exist and Executable and Executable: /usr/bin/ssh
> Running: /usr/bin/ssh -V
> OpenSSH_5.3p1, OpenSSL 1.0.1e-fips 11 Feb 2013
> Process returned: 0

---

Re the debate as to which package should run the setsebool command in its post-install scriptlet, let me suggest an alternative to sticking with pcs/pacemaker:

- fence-agents.srpm will spawn another, dedicated subpackage, say fence-agents-cluster, that will be empty and only exist for that post-install command, i.e., /usr/sbin/setsebool -P fenced_can_ssh on; its postun scriptlet will contain a command that reverts that
- we can then decide whether pacemaker should require fence-agents-cluster, or "please install fence-agents-cluster" will be a documented solution for the title issue

Either way, everybody could be happy again, without tainting the specfiles of distinct components.

---

Re [comment 17]: the reverting command is apparently:

    /usr/sbin/setsebool -P fenced_can_ssh off

---

This has slid above my coding skills, so let me take a step back and say that my only real concern is this: if/when fence_virsh fails, something should hint the user in /var/log/messages to look at fenced_can_ssh. As a user, the time and effort needed to diagnose this and find the right magical incantation was the hard part, not the ~30 seconds it took to actually run setsebool. Anything done in addition to avoid it failing in the first place is icing on the cake.

---

Red Hat Enterprise Linux 6 is in the Production 3 Phase. During the Production 3 Phase, Critical impact Security Advisories (RHSAs) and selected Urgent Priority Bug Fix Advisories (RHBAs) may be released as they become available. The official life cycle policy can be reviewed here: http://redhat.com/rhel/lifecycle

This issue does not meet the inclusion criteria for the Production 3 Phase and will be marked as CLOSED/WONTFIX. If this remains a critical requirement, please contact Red Hat Customer Support to request a re-evaluation of the issue, citing a clear business justification; note that a strong business justification will be required. Red Hat Customer Support can be contacted via the Red Hat Customer Portal at the following URL: https://access.redhat.com/