From Bugzilla Helper:
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.6) Gecko/20070725 Firefox/2.0.0.6

Description of problem:
After configuring RHEL5 server to use Windows 2003 Active Directory servers for user authentication, certain processes usually hang when I try to get user info. I am just using pam-ldap and nss-ldap so my nsswitch.conf has "files ldap". I will attach my configuration notes, they are fairly exhaustive.

Version-Release number of selected component (if applicable):
nss_ldap-253-3

How reproducible:
Always

Steps to Reproduce:
(see attached)

Actual Results:
id <username> hangs, 100% cpu utilization, no output
nscd also goes up to 100% cpu utilization
sshd process for user log hangs when user tried to login (after they provide uid)
getent passwd and getent passwd <username> both work

Expected Results:
id <username> should immediately return user info (homedir, shell, etc)

Additional info:
selinux now disabled, nscd off, nothing comes up in syslog. Some users I can get "id" info back right away, some take time, but most just don't return.
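For anyone trying to reproduce this, the symptoms above (getent passwd returns, id <username> spins) could be checked per user with a small timeout wrapper so the hung lookup does not have to be killed by hand. This is only a rough sketch: the helper names probe and check_user and the 10 second timeout are assumptions for illustration, not part of the attached notes.

```python
import subprocess
import sys


def probe(cmd, timeout=10.0):
    """Run cmd and classify the outcome.

    Returns 'ok' (exit status 0), 'failed' (non-zero exit status),
    'hang' (still running after `timeout` seconds, then killed),
    or 'missing' (the executable was not found).
    """
    try:
        result = subprocess.run(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            timeout=timeout,  # the child is killed when this expires
        )
    except subprocess.TimeoutExpired:
        return "hang"
    except FileNotFoundError:
        return "missing"
    return "ok" if result.returncode == 0 else "failed"


def check_user(user, timeout=10.0):
    """Probe the two lookups compared in the report for one user name."""
    return {
        "getent passwd": probe(["getent", "passwd", user], timeout),
        "id": probe(["id", user], timeout),
    }


# Example use (hypothetical user name):
#   print(check_user("someuser"))
```

A result where "getent passwd" is ok and "id" is hang would match the behaviour described in this report.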
Created attachment 196061 [details] Installation notes
System is RHEL 5, all updates
Linux 2.6.18-8.1.10.el5 #1 SMP Thu Aug 30 20:43:28 EDT 2007 x86_64 x86_64 x86_64 GNU/Linux
2 x Xeon dual-core (4 cores total)
4 GB ram
I've seen mention of bindpolicy soft vs. hard, a nonexistent "nvram" group, and comments about using a non-SMP kernel, but on various random platforms... I just don't see anything in syslog, and I am getting as far as having getent passwd work, so this really seems like a bug of some sort.
I was able to reproduce this problem on RHEL5.2 and RHEL5.3 with the same configuration as used on RHEL4. On RHEL4 it works correctly, while on RHEL5 CPU utilization is sky-high. I ran tcpdump on the session and found that the client queries the DC and does not get an answer. The problem disappeared when I added "ldap_version 2" to /etc/ldap.conf and waited for a short while (killing all processes using 100% CPU).
That this doesn't trigger a problem with "ldap_version 2" suggests that this might be related to referral processing, which isn't something the client even tries to do unless ldap_version is set to (or left at the default value of) 3. The 5.5 build links with a newer version of openldap that fixed a few things in this area (notably #472920). Does it make a difference here? As an aside, you may want to try setting "nss_initgroups backlink" in ldap.conf to make use of the group membership information AD prefers clients use -- there's a good chance it'll be faster.
It's been a while and I don't remember the work-around, but I've been using "referrals no". I tried removing "referrals no" which definitely used to give me problems, and it works fine now (nss_ldap-253-22.el5_4). So maybe this is fixed?

I have also been using different attribute mappings than what's been provided with the stock config in the past.

nss_map_objectclass posixAccount user
nss_map_objectclass shadowAccount user
nss_map_attribute uid sAMAccountName
nss_map_attribute homeDirectory unixHomeDirectory
nss_map_attribute shadowLastChange pwdLastSet
nss_map_attribute gecos name
nss_map_attribute cn sAMAccountName
nss_map_objectclass posixGroup group
nss_map_attribute uniqueMember member
pam_login_attribute sAMAccountName
pam_filter objectclass=User

Everything else is fairly stock / generic. System config otherwise works with authconfig.

I just tried adding the nss_initgroups backlink as well. This does not appear to have significant impact for our site.

Thanks
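Since the two work-arounds mentioned in this bug ("ldap_version 2" and "referrals no") both amount to keeping the client away from LDAPv3 referral processing, a quick check of an ldap.conf for that condition might look like the sketch below. The function names parse_ldap_conf and referral_risk are hypothetical. The parsing rules are assumptions too (keyword, whitespace, value; "#" comment lines; keywords compared case-insensitively; last occurrence wins), as is the default of "referrals yes"; the ldap_version default of 3 is taken from the comment above.

```python
def parse_ldap_conf(text):
    """Parse ldap.conf-style text into {keyword: [values...]}.

    Assumed format: one directive per line, keyword then value,
    blank lines and lines starting with '#' ignored.
    """
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)
        key = parts[0].lower()  # assumption: keywords are case-insensitive
        value = parts[1].strip() if len(parts) > 1 else ""
        settings.setdefault(key, []).append(value)
    return settings


def referral_risk(settings):
    """Return True if LDAPv3 referral chasing appears to be left enabled.

    Assumed defaults: ldap_version 3, referrals yes.
    Assumption: when a keyword is repeated, the last value is used.
    """
    version = settings.get("ldap_version", ["3"])[-1]
    referrals = settings.get("referrals", ["yes"])[-1].lower()
    return version == "3" and referrals != "no"


# Example use against the real file:
#   with open("/etc/ldap.conf") as fh:
#       print(referral_risk(parse_ldap_conf(fh.read())))
```

Repeated directives such as nss_map_attribute are kept as a list, so the mappings quoted above survive parsing intact.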
It seems that the problem has been addressed in later versions of nss_ldap. Closing the bug.