Bug 1060227

Summary: libcgroup-pam-0.40.rc1-5.el6_5.1 is completely broken
Product: Red Hat Enterprise Linux 6 Reporter: Pär Lindfors <paran+rhbugzilla>
Component: libcgroupAssignee: Jan Chaloupka <jchaloup>
Status: CLOSED ERRATA QA Contact: Cui Chun <ccui>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: 6.5CC: agrimm, caguado, ccui, jsafrane, lampe, ovasik, pasteur, riehecky, toracat, twiest, varekova
Target Milestone: rc   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2014-10-14 06:45:04 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Pär Lindfors 2014-01-31 14:00:32 UTC
pam_cgroup.so is currently broken and does absolutely
nothing. This is due to a bad fix for bug 972893 that was
included in errata RHBA-2013:1685-1.

The patch from bug 972893 is included in the source RPM as
libcgroup-0.37-pam_cgroup.patch. The patch adds CGFLAG_USECACHE
so the PAM module now calls:

    cgroup_change_cgroup_uid_gid_flags(pwd->pw_uid, pwd->pw_gid, pid, CGFLAG_USECACHE);

However there is no code added to make the pam module to actually
initialize the cache in libcgroup, so the cache will be
empty. Unfortunately, cgroup_change_cgroup_uid_gid_flags() does
not return an error in this situation. It simply checks for a
match in its empty rule list, obviously doesn't find any, and
return success.

With PAM debugging enabled for pam_cgroup.so you get a log entry
like:

    Jan 31 14:18:22 n1 sshd[27712]: pam_cgroup(sshd:session):
    Changed cgroup for process 27712 with username paran.

but in really it have not done anything.

I think a good fix would be to revert the changes so the pam
module does not use the cache. Even if the necessary cache
initialization code would be added to the pam module, I think it
would have to be initialized for every login as pam is not a
daemon, so the slowness in bug 972893 would happen anyway.


Steps to Reproduce:

Short version:

Follow the instructions here and notice it does not work https://access.redhat.com/site/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Resource_Management_Guide/sec-cpu_and_memory-use_case.html

Longer version:

1. Install:
[root@n1 ~]# libcgroup-pam

2. Put some trivial rules in cgrules.conf, cgconfig.conf:
[root@n1 ~]# cat /etc/cgrules.conf 
* cpu test/%u

[root@n1 ~]# cat /etc/cgconfig.conf
mount {
  cpu = /cgroup/cpu;
}
group test {
  cpu {}
}
template test/%u {
  cpu {}
}

3. init cgconfig:
[root@n1 ~]# service cgconfig start

4. enable PAM module.
For example, add "session optional pam_cgroup.so debug" to the
end of /etc/pam.d/sshd

5. Log in using SSH.
Observe that /var/log/secure reports success, but that the user
session have _not_ been added to any cgroup under /cgroup.

Expected results:
SSH session of user foo should be added to /cgroup/cpu/test/foo

Comment 2 Jan Chaloupka 2014-06-09 15:22:02 UTC
As you proposed to change:

cgroup_change_cgroup_uid_gid_flags(pwd->pw_uid, pwd->pw_gid, pid, CGFLAG_USECACHE);

to
cgroup_change_cgroup_uid_gid_flags(pwd->pw_uid, pwd->pw_gid, pid, 0);

But I am afraid this returns the performance back to the slow one. Even if right now PAM module is not working.

Thomas, have you noticed this behaviour (no process moved to its right group based on cgrules)?

Comment 3 Jan Chaloupka 2014-06-09 15:27:09 UTC
Thomas, what rules does the OpenShift use? As Pär Lindfors stated, it is not deamon so the cache is useless. The only possible solution is to replace CGFLAG_USECACHE with 0 and change rules to templates. Or somehow run the pam_module as a daemon. Then the cache is usefull.

Comment 4 Thomas Wiest 2014-06-09 15:33:17 UTC
Andy Grimm knows this situation better than I do.

Andy, can you answer Jan's questions in comments 2 and 3 please?

Comment 5 Andy Grimm 2014-06-10 15:19:08 UTC
The short answer is:  correct functionality is 99% more important than performance here.  The patch was bad and should be reverted.

Longer answer:  My thinking was that since sshd is a daemon, it would load the module (and the cache) once, and then logins after the first one would be able to use the cache and thus perform better.  There are two problems:  1) the cache is not being initialized, and 2) I think the pam modules are actually loaded separately in each forked sshd[priv] process, in which case there would be no performance benefit, anyway.

Also related: our primary concern was on systems with thousands of users/cgroups, and we no longer have systems with this density.  So while I'm not happy about the poor performance, it's less of an issue than it had been.

If you have ideas on how to optimize the parsing of the cgconfig file (e.g., cache the password entries instead of re-reading /etc/passwd for every row), I would welcome that, but right now the important thing is to make the module functional.

Comment 6 Jan Chaloupka 2014-06-12 08:19:01 UTC
>If you have ideas on how to optimize the parsing of the cgconfig file (e.g., >cache the password entries instead of re-reading /etc/passwd for every row), I >would welcome that, but right now the important thing is to make the module >functional.

Decompose cgrules.conf into more files and read only the needed file. Something like cgrules<hashid>.conf. Hash username and use it to find the correct cgrules file. But this is rather pam/sshd specific implementation, not for upstream, only RHEL.

Looking at libcgroup sources there is no template reading. So this won't work:

template test/%u {
  cpu {}
}

For the moment, I will turn off the cache as it is not working at the moment.

Comment 12 errata-xmlrpc 2014-10-14 06:45:04 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2014-1480.html