Bug 801567

Summary: yum update of SELinux policy (load_policy) hangs when run in cgroup
Product: Red Hat Enterprise Linux 6
Reporter: Ian Pilcher <ipilcher>
Component: policycoreutils
Assignee: Petr Lautrbach <plautrba>
Status: CLOSED WONTFIX
QA Contact: BaseOS QE Security Team <qe-baseos-security>
Severity: unspecified
Priority: unspecified
Version: 6.2
CC: dwalsh, eparis, lvrabec, mgrepl, mmalik, plautrba, pvrabec, sdsmall, ssekidde
Target Milestone: rc
Hardware: i686
OS: Linux
Doc Type: Bug Fix
Last Closed: 2016-11-02 17:08:09 UTC

Description Ian Pilcher 2012-03-08 21:13:56 UTC
Description of problem:
I have a fanless system that runs Asterisk to manage my home phones, along with a number of other workloads.  Because of the importance of the Asterisk workload, and the fact that the system is fanless, I don't want a CPU-intensive job (such as compiling a new SELinux policy) to starve Asterisk of CPU or memory or risk overheating the system.  Thus, I have configured cgroups to restrict the amount of CPU and memory that certain processes can use.

I have added the following to /etc/cgconfig.conf:

# Limit restricted processes to 50% of the CPU and 512 MB

  group restricted {
        cpu {
                cpu.cfs_quota_us = 50000;
                cpu.cfs_period_us = 100000;
        }
        memory {
                memory.limit_in_bytes = 536870912;
                memory.memsw.limit_in_bytes = 536870912;
        }
  }
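A quick check of what the cgconfig.conf numbers above mean. cpu.cfs_quota_us divided by cpu.cfs_period_us gives the share of one CPU's time the group may use per period (on a multi-core machine, 50000/100000 is half of one CPU, not half of the whole machine), and 536870912 bytes is exactly 512 MiB. Because memory.memsw.limit_in_bytes is set equal to memory.limit_in_bytes, memory plus swap is also capped at 512 MiB, so swapping gives the group no extra room. This is only a sketch of the arithmetic; the constants are copied from the config above.

```python
# Convert the "restricted" group's cgroup v1 settings into readable limits.
# Values are copied from the /etc/cgconfig.conf snippet in this report.

CFS_QUOTA_US = 50000         # cpu.cfs_quota_us: CPU time allowed per period
CFS_PERIOD_US = 100000       # cpu.cfs_period_us: length of one accounting period
MEM_LIMIT_BYTES = 536870912  # memory.limit_in_bytes (memsw uses the same value)


def cpu_share(quota_us: int, period_us: int) -> float:
    """Fraction of one CPU the group may consume; a quota of -1 means unlimited."""
    if quota_us < 0:
        return float("inf")
    return quota_us / period_us


def to_mib(n_bytes: int) -> float:
    """Convert bytes to mebibytes (1 MiB = 1024 * 1024 bytes)."""
    return n_bytes / (1024 * 1024)


if __name__ == "__main__":
    print(f"CPU: {cpu_share(CFS_QUOTA_US, CFS_PERIOD_US):.0%} of one CPU per period")
    print(f"Memory: {to_mib(MEM_LIMIT_BYTES):.0f} MiB")
```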

I have added the following to /etc/cgrules.conf:

  root:sshd    cpu,memory    restricted/

Since I almost always interact with the system remotely, this means that any interactive work is run in the "restricted" cgroups.
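A simplified sketch of how a cgrules.conf line of the form "<user>:<process> <controllers> <destination>" is read. It handles only the plain case used here (it ignores @group users, '%' continuation lines and full-path process names), so it is an illustration, not cgred's actual matching code. Note that the rule itself only names sshd; bash and yum end up in "restricted" because a child process starts out in its parent's cgroup.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CgRule:
    user: str                # user name or "*" (groups are not handled here)
    process: Optional[str]   # process name after ':', if the rule has one
    controllers: List[str]   # e.g. ["cpu", "memory"]
    destination: str         # cgroup path relative to each hierarchy's root


def parse_rule(line: str) -> Optional[CgRule]:
    """Parse one cgrules.conf line; return None for blank lines and comments."""
    line = line.split("#", 1)[0].strip()
    if not line:
        return None
    who, controllers, destination = line.split()
    user, _, process = who.partition(":")
    return CgRule(user, process or None, controllers.split(","), destination)


def matches(rule: CgRule, user: str, process_name: str) -> bool:
    """Simplified match: user must be equal (or '*'), and so must the process name if given."""
    if rule.user not in ("*", user):
        return False
    return rule.process is None or rule.process == process_name


if __name__ == "__main__":
    rule = parse_rule("root:sshd    cpu,memory    restricted/")
    print(rule)
    print(matches(rule, "root", "sshd"))
```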

With this setup, running "yum update selinux-policy-{,targeted}" (or downgrade) hangs every time.  "ps fax" shows:

Ss  /usr/sbin/sshd
Ss   \_ sshd: root@pts/0
Ss       \_ -bash
S+           \_ /usr/bin/python /usr/bin/yum upgrade selinux-policy selinux-policy-targeted
S+               \_ /bin/sh /var/tmp/rpm-tmp.VgCDu5 2
S+                   \_ /bin/sh /var/tmp/rpm-tmp.VgCDu5 2
R+                       \_ load_policy

None of the processes can be killed.

Rebooting the system, turning off cgred, and restarting sshd will allow you to ssh in and complete the update/downgrade successfully (possibly after a bit of RPM database cleanup).

Version-Release number of selected component (if applicable):
policycoreutils-2.0.83-19.21.el6_2.i686

How reproducible:
100%

Steps to Reproduce:
I have reproduced this in a 32-bit RHEL 6.2 KVM guest:

 * Minimal install
 * Register with RHN
 * yum update --exclude=selinux-policy*
 * yum install libcgroup
 * Make the changes to /etc/cgconfig.conf and /etc/cgrules.conf described above
 * chkconfig cgconfig on && chkconfig cgred on
 * Reboot
 * Use "ps ax -O cgroup | grep sshd" to verify that sshd is running in the
   restricted groups
 * yum update
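The "ps ax -O cgroup" check in the steps above can also be done by reading /proc/<pid>/cgroup directly. A minimal sketch, assuming cgroup v1, where each line is "hierarchy-ID:controller-list:path" and controllers mounted together share one line. The expected path "/restricted" and the helper names are assumptions for illustration; compare against what ps actually shows on your system.

```python
from typing import Dict, Iterable


def parse_proc_cgroup(text: str) -> Dict[str, str]:
    """Map each controller to its cgroup path, given /proc/<pid>/cgroup (v1) text.

    Lines look like "4:memory:/restricted" or "3:cpu,cpuacct:/restricted".
    """
    mapping: Dict[str, str] = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        _hierarchy_id, controllers, path = line.split(":", 2)
        for controller in controllers.split(","):
            mapping[controller] = path
    return mapping


def cgroups_of(pid: int) -> Dict[str, str]:
    """Read and parse the cgroup membership of a running process (Linux only)."""
    with open(f"/proc/{pid}/cgroup") as f:
        return parse_proc_cgroup(f.read())


def in_group(mapping: Dict[str, str], group: str,
             controllers: Iterable[str] = ("cpu", "memory")) -> bool:
    """True if the process sits in `group` for every listed controller."""
    return all(mapping.get(c) == group for c in controllers)
```

For sshd's PID (found with pgrep or ps), in_group(cgroups_of(pid), "/restricted") should be true once cgred has classified it.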
  
Actual results:
yum update will hang.  ps fax will show that it's waiting on load_policy.

Expected results:
yum update should complete.

Additional info:
This is extra-double fun, because it normally happens as part of a larger yum update.

Comment 2 Daniel Walsh 2012-03-09 15:17:16 UTC
I think this is mainly a memory problem, although I am not sure what we can do about it in RHEL6.  In RHEL7 we have shrunk the policy and changed the way policy installs, so load_policy is separate from the semanage command, which would probably use less memory.

Comment 3 Ian Pilcher 2012-03-09 15:29:00 UTC
To be clear, are you thinking that the problem is (a) the *amount* of memory available to the load_policy process (512MB less whatever is consumed by other processes in the cgroup) or (b) the fact that there is a memory governor at all?

I believe that you mean (a), which should be easy enough to test by increasing the memory limitation of the cgroup to match the amount of physical memory in the box/VM (1 GB).

Make sense?

Comment 4 Daniel Walsh 2012-03-09 15:50:11 UTC
Yes I think the system is running out of memory.

Comment 5 Ian Pilcher 2012-03-09 15:54:52 UTC
After changing the memory limit (memory.limit_in_bytes and memory.memsw.limit_in_bytes) to 1055727616, the yum update completes successfully, so it looks like Dan's hypothesis is correct.
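For scale, the working limit of 1055727616 bytes is about 1006.8 MiB, just under twice the original 512 MiB and close to the VM's full 1 GB. Because other processes in the group also count toward the limit, this only loosely brackets load_policy's peak memory use; it does not measure it. A small sketch of the arithmetic:

```python
MIB = 1024 * 1024

OLD_LIMIT_BYTES = 536870912   # original memory.limit_in_bytes: load_policy hung
NEW_LIMIT_BYTES = 1055727616  # raised limit from this comment: the update completed


def mib(n_bytes: int) -> float:
    """Convert bytes to mebibytes."""
    return n_bytes / MIB


if __name__ == "__main__":
    print(f"old limit: {mib(OLD_LIMIT_BYTES):.1f} MiB")
    print(f"new limit: {mib(NEW_LIMIT_BYTES):.1f} MiB")
    print(f"new/old:   {NEW_LIMIT_BYTES / OLD_LIMIT_BYTES:.2f}x")
```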

It would be really nice if this could fail more gracefully -- i.e. if load_policy actually failed instead of just hanging, that would avoid potentially leaving a larger yum transaction partially completed.  It would also be nice if something were logged somewhere.
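Neither yum nor policycoreutils does this; the sketch below only illustrates the kind of fail-fast check being asked for. It assumes the cgroup v1 memory controller, with the group's directory at a path such as /cgroup/memory/restricted (the usual RHEL 6 mount point, but check the mount section of your cgconfig.conf). The required_bytes value is a hypothetical figure you would have to measure, since this report does not establish load_policy's peak usage, and the check is inherently racy because usage can grow between the check and the run.

```python
from pathlib import Path


def headroom_bytes(limit_text: str, usage_text: str) -> int:
    """Bytes left before usage reaches the limit, from raw cgroup v1 file contents."""
    return max(int(limit_text) - int(usage_text), 0)


def memory_headroom(cgroup_dir: Path) -> int:
    """Read memory.limit_in_bytes and memory.usage_in_bytes from a v1 memory cgroup."""
    limit = (cgroup_dir / "memory.limit_in_bytes").read_text()
    usage = (cgroup_dir / "memory.usage_in_bytes").read_text()
    return headroom_bytes(limit, usage)


def ensure_headroom(free_bytes: int, required_bytes: int) -> None:
    """Fail fast with a clear message instead of starting a job likely to stall."""
    if free_bytes < required_bytes:
        raise RuntimeError(
            f"only {free_bytes} bytes free in the cgroup, need {required_bytes}; "
            "not starting load_policy"
        )
```

A wrapper would call ensure_headroom(memory_headroom(path), required_bytes) before running load_policy and log the error if it raises, which addresses both the hang and the lack of any log message.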

Comment 6 RHEL Program Management 2012-05-03 05:37:22 UTC
Since RHEL 6.3 External Beta has begun, and this bug remains
unresolved, it has been rejected as it is not proposed as
exception or blocker.

Red Hat invites you to ask your support representative to
propose this request, if appropriate and relevant, in the
next release of Red Hat Enterprise Linux.

Comment 7 Miroslav Grepl 2012-10-10 16:53:12 UTC
I believe we could start shrinking the policy in RHEL6.5. I am adding some fixes to RHEL6.4 as well, but more complex changes are needed. Moving this to RHEL6.5.

Comment 8 Ian Pilcher 2012-10-10 17:01:59 UTC
I still think that the biggest problem is the failure mode.  If load_policy could fail, even if it left the system in a state that required a reboot/relabel, that would be far better than hanging the whole yum transaction.

BTW, I saw this same problem when I updated from 6.2 to 6.3 -- even without the cgroups memory limit (1GB of memory, no swap).

Comment 10 Miroslav Grepl 2015-12-09 07:58:13 UTC
Is this a corner case? We don't plan complex changes in the RHEL-6 policy, and the same applies to policycoreutils.

Comment 11 Petr Lautrbach 2016-01-19 15:51:49 UTC
We don't have a fix yet and we're limited in capacity. I'm moving this to rhel-6.9 in case we have something in the future.

Comment 12 Petr Lautrbach 2016-11-02 17:08:09 UTC
Red Hat Enterprise Linux version 6 is entering the Production 2 phase of its lifetime and this bug doesn't meet the criteria for it, i.e. only high severity issues will be fixed. Please see https://access.redhat.com/support/policy/updates/errata/ for further information.

Feel free to clone this bug to RHEL-7 if it is still a problem for you.