From Bugzilla Helper:
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:22.214.171.124) Gecko/20070725 Firefox/126.96.36.199
Description of problem:
After configuring a RHEL5 server to use Windows 2003 Active Directory servers for user authentication, certain processes usually hang when I try to get user info. I am just using pam-ldap and nss-ldap, so my nsswitch.conf has "files ldap". I will attach my configuration notes; they are fairly exhaustive.
Version-Release number of selected component (if applicable):
Steps to Reproduce:
id <username> hangs, 100% cpu utilization, no output
nscd also goes up to 100% cpu utilization
the sshd process for a user login hangs when the user tries to log in (after they provide their uid)
getent passwd and getent passwd <username> both work
Expected Results:
id <username> should immediately return user info (homedir, shell, etc)
Additional info:
selinux now disabled, nscd off, nothing comes up in syslog. For some users I get "id" info back right away, some take time, but most just don't return.
Created attachment 196061
System is RHEL 5, all updates
Linux 2.6.18-8.1.10.el5 #1 SMP Thu Aug 30 20:43:28 EDT 2007 x86_64 x86_64 x86_64
2 x Xeon dual-core (4 cores total)
4 GB ram
I've seen mention of bindpolicy soft vs. hard, a nonexistent "nvram" group, and
comments about using a non-SMP kernel but with various random platforms... I
just don't see anything in syslog, and I am getting as far as having getent
passwd work, so this really seems like a bug of some sort.
I was able to reproduce this problem on RHEL5.2 and RHEL5.3 with the same configuration as used on RHEL4.
On RHEL4 it is working correctly, while on RHEL5 CPU utilization is sky-high.
I ran tcpdump on the session and found that it queries the DC and does not get an answer.
The problem disappeared when I added "ldap_version 2" to /etc/ldap.conf and waited for a short while (after killing all processes using 100% CPU).
That this doesn't trigger a problem with "ldap_version 2" suggests that this might be related to referral processing, which isn't something the client even tries to do unless ldap_version is set to (or left at the default value of) 3.
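For anyone following along, a minimal sketch of the two /etc/ldap.conf settings this points at, assuming the stock nss_ldap/pam_ldap configuration file. Only one of the two is needed; the second option is the "referrals no" setting that a later comment reports using, and whether either one fully avoids the hang at a given site has not been confirmed here.

```
# /etc/ldap.conf (sketch; pick ONE of the two options below)

# Option 1: speak LDAPv2, so the client does not attempt referral processing
ldap_version 2

# Option 2: stay on LDAPv3 but do not follow referrals returned by AD
#ldap_version 3
#referrals no
```

If nscd is running, restarting it after the change should keep cached results from masking the effect.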
The 5.5 build links with a newer version of openldap that fixed a few things in this area (notably #472920). Does it make a difference here?
As an aside, you may want to try setting "nss_initgroups backlink" in ldap.conf to make use of the group membership information AD prefers clients use -- there's a good chance it'll be faster.
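As a rough illustration of that suggestion, and only a sketch, the change is a single line in /etc/ldap.conf. The comment describes the intended behaviour as I understand it from the nss_ldap documentation; the actual speedup depends on the directory layout.

```
# /etc/ldap.conf (sketch)
# Determine a user's supplementary groups from the group membership
# recorded on the user's own entry (memberOf in AD), rather than
# searching every group for the user.
nss_initgroups backlink
```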
It's been a while and I don't remember the work-around, but I've been using "referrals no". I tried removing "referrals no" which definitely used to give me problems, and it works fine now (nss_ldap-253-22.el5_4). So maybe this is fixed? I have also been using different attribute mappings than what's been provided with the stock config in the past.
nss_map_objectclass posixAccount user
nss_map_objectclass shadowAccount user
nss_map_attribute uid sAMAccountName
nss_map_attribute homeDirectory unixHomeDirectory
nss_map_attribute shadowLastChange pwdLastSet
nss_map_attribute gecos name
nss_map_attribute cn sAMAccountName
nss_map_objectclass posixGroup group
nss_map_attribute uniqueMember member
Everything else is fairly stock / generic. System config otherwise works with authconfig.
I just tried adding "nss_initgroups backlink" as well. It does not appear to have a significant impact at our site.
It seems that the problem is addressed in later versions of nss_ldap.
Closing the bug.