Bug 1335986
Summary: | BUG: memory allocation failure in inode_doinit_with_dentry()/context_to_sid() | |||
---|---|---|---|---|
Product: | Red Hat Enterprise Linux 7 | Reporter: | Orion Poplawski <orion> | |
Component: | kernel | Assignee: | Ondrej Mosnacek <omosnace> | |
kernel sub component: | SELinux | QA Contact: | Milos Malik <mmalik> | |
Status: | CLOSED ERRATA | Docs Contact: | Lenka Špačková <lkuprova> | |
Severity: | high | |||
Priority: | high | CC: | abetkike, dhoward, ksrot, lmanasko, mmalik, mthacker, omosnace, orion, plautrba, sbarcomb | |
Version: | 7.2 | Keywords: | Patch, Reproducer, ZStream | |
Target Milestone: | rc | |||
Target Release: | --- | |||
Hardware: | x86_64 | |||
OS: | Linux | |||
Whiteboard: | ||||
Fixed In Version: | kernel-3.10.0-1029.el7 | Doc Type: | Bug Fix | |
Doc Text: |
.An SELinux policy reload no longer causes false ENOMEM
Reloading SELinux policy previously caused the internal security context lookup table to become unresponsive. Consequently, when the kernel encountered a new security context during a policy reload, the operation failed with a false "Out of memory" (ENOMEM) error. With this update, the internal Security Identifier (SID) lookup table has been redesigned and no longer freezes. As a result, the kernel no longer returns misleading ENOMEM errors during an SELinux policy reload.
|
Story Points: | --- | |
Clone Of: | ||||
: | 1656787 1710468 1760823 (view as bug list) | Environment: | ||
Last Closed: | 2019-08-06 12:05:16 UTC | Type: | Bug | |
Regression: | --- | Mount Type: | --- | |
Documentation: | --- | CRM: | ||
Verified Versions: | Category: | --- | ||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
Cloudforms Team: | --- | Target Upstream Version: | ||
Embargoed: | ||||
Bug Depends On: | ||||
Bug Blocks: | 1710468, 1760823 |
Description
Orion Poplawski
2016-05-13 18:16:47 UTC
In this case the error code you are seeing (12/ENOMEM) indicates that the kernel was unable to allocate enough memory to properly initialize the file's security label. Was the system under heavy memory pressure when this occurred? Have you noticed this on other systems, or was it a one time occurrence? It was during boot, and I would not expect much memory pressure. Currently the machine reports (which should be typical): total used free shared buff/cache available Mem: 7983480 1278868 142128 401236 6562484 5979592 Swap: 16547832 184464 16363368 Looking at logs again I found: Jun 9 05:44:25 mail python: ansible-<stdin> Invoked with warn=True executable=None _uses_shell=F alse _raw_params=/usr/sbin/semanage fcontext -a -t httpd_var_lib_t "/var/lib/vacation_sieve(/.*)?" removes=None creates=None chdir=None Jun 9 05:44:26 mail clamav-milter: ERROR: Failed to initiate streaming/fdpassing Jun 9 05:44:26 mail clamav-milter: LibClamAV Error: cli_gentempfd: Can't create temporary file /tmp/clamav-3b8174a1a592d17afc580d6ad832cd08.tmp: Cannot allocate memory Jun 9 05:44:26 mail clamav-milter: ERROR: Failed to create temporary file Jun 9 05:44:26 mail clamav-milter: ERROR: Failed to initiate streaming/fdpassing Jun 9 05:44:26 mail kernel: SELinux: inode_doinit_with_dentry: context_to_sid(system_u:object_r:selinux_login_config_t:s0) returned 12 for dev=md3 ino=1064 Jun 9 05:44:26 mail kernel: SELinux: inode_doinit_with_dentry: context_to_sid(system_u:object_r:lastlog_t:s0) returned 12 for dev=sda1 ino=4063265 Jun 9 05:44:27 mail dbus-daemon: dbus[789]: avc: received policyload notice (seqno=2) Jun 9 05:44:27 mail dbus[789]: avc: received policyload notice (seqno=2) Jun 9 05:44:27 mail dbus-daemon: dbus[789]: [system] Reloaded configuration Jun 9 05:44:27 mail dbus[789]: [system] Reloaded configuration So perhaps something about ansible or semanage triggers memory pressure. # ls -ldZ /etc/selinux/targeted/logins drwxr-xr-x. root root system_u:object_r:unlabeled_t:s0 /etc/selinux/targeted/logins I remember now fixing the label on lastlog recently, but couldn't figure out what had changed it then. This must have been it. # find /var -xdev -inum 4063265 /var/log/lastlog Can you reliably trigger the problem on boot? I doubt it. The ansible commands are run at least once a day, and I've only seen it triggered twice so far. Oh goodie, I do enjoy these sporadic bugs. I'll keep this BZ open in case we get more reports, or you're able to reproduce it on a more reliable basis, but as it currently stands I'm not sure there is much we can do. Please keep us current if things change, or get worse. Hmm, I take that back a bit. The system has been rebooted twice, and each time this error occurred. So maybe it is reproducible to an extent. It's our primary (only) company email server so it's not rebooted that often. I haven't seen it on any other machine. Any kind of diagnostic information that would be useful to collect the next time I reboot? Nothing specific to this failure immediately springs to mind, other than what you provided in comment #3. Although problems like this tend to be very transient and hard to capture/snapshot. Let me think on this further, but we may be able to do some work to reduce/eliminate the need for the string copies in security_context_to_sid_core() which would help resolve the problem you are seeing. (NOTE TO FUTURE SELF: this involves changes to string_to_context_struct() and the hash functions) Hi Orion, It has been a few months and I just wanted to check on this and see if you are continuing to see this issue? Well, there has only been one reboot since last time (yesterday to get 3.10.0-327.36.1) and that didn't trigger it. Haven't seen it anywhere else either. Not much data I'm afraid. Orion, have you been able to reproduce this at all since last September? If you are not seeing this problem anymore I'm tempted to close this BZ as there is nothing immediately obvious in the code. Looks like I have one occurrence in the logs I have: May 18 10:15:54 saga python: ansible-command Invoked with warn=True executable=None _uses_shell=False _raw_params=/usr/sbin/semodule -u /root/ansible-selinux/zabbix_lvs.pp removes=None creates=Nonechdir=None May 18 10:16:14 saga kernel: SELinux: inode_doinit_with_dentry: context_to_sid(system_u:object_r:virt_etc_t:s0) returned 12 for dev=dm-3 ino=6377871 Different server but during boot again. This was with 3.10.0-514.16.1.el7. Thanks Orion; let's keep this open. Not 100% sure, but this looks like an instance of this bug tracked upstream: https://github.com/SELinuxProject/selinux-kernel/issues/38 Hi Orion, FYI, I inherited this bug from Paul Moore, who no longer works at RH. I see you Cc'd my private e-mail address (apparently following Paul's advice from the GH discussion), but as you can see I am already subscribed via my work address :) Anyway, it seems this bug has a simple one-line workaround (see the linked GH issue) so I'll try to come up with a reproducer and see if the workaround works. If we can show that it resolves a real problem then I think it has a good chance to be accepted upstream and then I could backport it. I would like to eventually overhaul the whole policy reload to fix this properly (along with a bunch of other issues) but that will take some time... I found a simple reproducer via runcon (which calls setexeccon(3)): while true; do load_policy; echo -n .; sleep 0.1; done & for (( i = 0; i < 1024; i++ )); do runcon -l s0:c$i echo -n x || break done (It takes some time to trigger, but if you are patient enough it happens.) Unfortunately, the EINTR workaround doesn't work out-of-the box for runcon, because it also calls in security_check_context(3), which does not retry on EINTR. This could be fixed in libselinux, but... Instead of runcon -l <...> one can reproduce this also with chcon -l s0:c$i /some/file, which triggers the bug via setxattr(2). However, setxattr(2) isn't expected to return EINTR (see man pages), so this pretty much foils the whole just-return-EINTR plan :( Hi Orion, would you mind if I made this BZ publicly accessible? I'm asking because there are some log outputs in description/comments that you might consider sensitive. (There is probably nothing important, but to be on the safe side I rather ask :) Yes, that would be fine. (In reply to Orion Poplawski from comment #19) > Yes, that would be fine. Thanks! Let's see if it works... Patches to fix this problem have now been staged upstream for inclusion in the 4.21 kernel release: https://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux.git/commit/?h=next&id=24ed7fdae669feda4c5e0dadba2467c4c0d297d3 https://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux.git/commit/?h=next&id=ee1a84fdfeedfd7362e9a8a8f15fedc3482ade2d Patch(es) committed on kernel-3.10.0-1029.el7 Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2019:2029 |