Description of problem:
Changes in bash-3.2 cause commands to fail when wrapped with $() or `` if nscd is not running.

How to test:
Set up a system with a local user (local passwd) and a remote user (remote passwd in LDAP). Log in as the local user: the default bash profile works. Log in as the remote user: the default bash profile fails.

  Last login: Thu May 22 14:27:40 2008 from canopus.unm.edu
  -bash: [: =: unary operator expected
  -bash: [: -le: unary operator expected
  -bash: [: ==: unary operator expected

Using bash -x we can see it more clearly:

  ++ . /etc/profile.d/vim.sh
  +++ '[' -n '3.2.25(1)-release' -o -n '' -o -n '' ']'
  +++ '[' -x /usr/bin/id ']'
  +++ '[' -le 100 ']'
  bash: [: -le: unary operator expected

Backing off to bash-3.1 (from 5.1) removes the problem. Turning on nscd also removes the problem.

An easier way to see the problem:

Without nscd:
  [smooge@kore ~]$ x=$(/bin/ls -l)
  [smooge@kore ~]$ echo $x
  [smooge@kore ~]$

With nscd:
  [smooge@kore ~]$ x=`/bin/ls -l`
  [smooge@kore ~]$ echo $x
  total 81896 -rw-rw-r-- 1 smooge dsys 1925627 Jan 18 2007 bash-3.1-16.1.i386.rpm -rw-r--r-- 1 smooge dsys 1946698 May 22 12:54 bash-3.2-21.el5.i386.rpm drwx------ 3 smooge dsys 4096 Apr 30 15:29 bin -rw-r--r-- 1 smooge dsys 17585 May 6 14:16 iptables.dns-master -rw-r--r-- 1 smooge dsys 18215 May 15 14:11 iptables.m_mice -rw-r--r-- 1 smooge dsys 14455556 May 6 15:32 mm-20080506.tgz drwxr-xr-x 4 smooge dsys 4096 Nov 30 2005 mmsuite-5.1.3 -rw-r--r-- 1 smooge dsys 7051456 Jan 26 2007 mmsuite-5.1.linux.tgz drwxr-xr-x 5 smooge dsys 4096 Feb 25 08:32 mmsuite-5.7.0 -rw-r--r-- 1 smooge dsys 12054744 Mar 5 13:10 mmsuite-5.7.0.linux.tgz -rw-r--r-- 1 smooge dsys 2689481 Mar 5 13:11 mmsuite-cmd-5.7.0.linux.tgz -rw-r--r-- 1 smooge dsys 41220068 Jan 26 2007 mmsuite-web-5.1.linux.tgz drwxr-xr-x 5 smooge dsys 4096 Feb 25 08:34 mmsuite-web-5.7.0 -rw-r--r-- 1 smooge dsys 2206876 Mar 5 13:11 mmsuite-web-5.7.0.linux.tgz drwxr-xr-x 4 smooge dsys 4096 Apr 29 2005 mmwebint -rw-r--r-- 1 smooge dsys 2362 May 6 14:21 named-template.conf -rw-r--r-- 1 smooge dsys 2879 May 6 14:25 root.hint drwxr-xr-x 3 smooge dsys 4096 May 6 15:45 var -rw-rw-r-- 1 smooge dsys 26426 May 21 15:56 x
  [smooge@kore ~]$

I am rating this as medium impact for us because we have a workaround (nscd), but it is affecting a lot of scripts and our LDAP/DNS servers do not usually run nscd.
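The "[: -le: unary operator expected" error in the bash -x trace comes from an unquoted test: when the lookup behind $(/usr/bin/id -u) returns nothing, the empty expansion disappears and [ is left with only "-le 100". Below is a minimal bash sketch of that failure mode and of a defensive variant. The function names are hypothetical and are not taken from vim.sh or any other shipped profile script.

```bash
#!/bin/bash
# Hypothetical helpers illustrating the profile.d failure; not vim.sh itself.

# Naive test in the style of the failing line in the trace: if $1 is empty,
# the unquoted expansion vanishes and [ sees "-le 100", which it rejects
# with "unary operator expected" (exit status 2).
naive_is_low_uid() {
    [ $1 -le 100 ]
}

# Defensive test: quote the expansion and reject an empty value first, so a
# failed NSS lookup is reported instead of producing a syntax error.
safe_is_low_uid() {
    local uid="$1"
    if [ -z "$uid" ]; then
        echo "warning: could not determine uid (NSS lookup failed?)" >&2
        return 2
    fi
    [ "$uid" -le 100 ]
}

# Checks that command substitution returns any output at all; on an affected
# host without nscd, this kind of substitution came back empty for LDAP users.
cmd_subst_works() {
    local out
    out=$(/usr/bin/id -u)
    [ -n "$out" ]
}
```

Quoting alone would turn the syntax error into a different error ("integer expression expected"), which is why the sketch checks for an empty value before comparing.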
Moving to nss_ldap. This seems to be the root problem.
This bugzilla has Keywords: Regression. Since no regressions are allowed between releases, it is also being proposed as a blocker for this release. Please resolve ASAP.
The bigger issue here is that su does not work for LDAP users when nscd is not running. That means we can get root access only by logging in from the console. This can be reproduced simply by enabling user information and authentication from LDAP, without caching and with TLS on. I haven't had a chance to test whether the behavior is similar without TLS (or ldaps).
We've also had to revert to the earlier nss_ldap (253-5) to resolve problems on systems configured for SSL lookups of group information from LDAP. In our case the problem causes cfengine (a third-party configuration management app) to hang. (Bugs 448016 and 447881 look like likely duplicates?)
Adding "service named start" into /etc/rc.local worked around the problem because by default named starts before nscd in the boot sequence. Not sure why downgrading nss_ldap only fixed dhcpd and not named.
Created attachment 306988: proposed fix

It looks like the child portion of the atfork handler is pretty consistently hitting SIGPIPE. Most LDAP operations in nss_ldap are performed with SIGPIPE blocked, but this one wasn't.
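As an analogy for the SIGPIPE failure described here (not the nss_ldap patch itself, which is C code that blocks the signal), the bash sketch below shows what happens to a process that writes into a pipe whose reader has gone away. With the default disposition the writer is killed by SIGPIPE, which bash reports as status 141 (128 + 13); with SIGPIPE ignored, the write fails with EPIPE and the writer exits on its own. Bash can ignore a signal but cannot block it, so this only illustrates the failure mode. The function names are hypothetical, and the exact non-141 status in the second case depends on how the writer (here yes) handles the write error.

```bash
#!/bin/bash
# Analogy only: shows default vs. ignored SIGPIPE for a writer whose reader exits.

# Default handling: head exits after one line, the next write by yes raises
# SIGPIPE and kills it, so bash records 128 + 13 = 141 for yes.
sigpipe_default_status() {
    yes | head -n 1 > /dev/null
    echo "${PIPESTATUS[0]}"
}

# SIGPIPE ignored: a trap set to '' is inherited by commands the shell runs,
# so the write fails with EPIPE and yes exits with its own error status.
sigpipe_ignored_status() {
    (trap '' PIPE; yes 2> /dev/null) | head -n 1 > /dev/null
    echo "${PIPESTATUS[0]}"
}
```

In the nss_ldap case, the process that died was the one doing the lookup, which is consistent with the empty $(...) results and "Broken Pipe" failures reported elsewhere in this bug.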
Created attachment 306995: the upstream fix

Fix used upstream.
*** Bug 447881 has been marked as a duplicate of this bug. ***
Thanks Nalin. I think my other bug is also a duplicate of this (break with su). I can't mark it as a duplicate though.
We are seeing the same issue in CentOS-5.2 testing, the attachment in #9 seems to fix the issue in our nss_ldap as well.
Another related problem is with crontabs for LDAP/remote users: after upgrading to 5.2, these crontabs don't run!! :-( Downgrading nss_ldap to nss_ldap-253-5.el5 works fine. Tested on two systems (64-bit). Bye
Any word on a fast-track fix? We had 20 boxes 'break' this morning because they were rebooted and pipes in scripts didn't work for some users. Trying to see whether it was the nscd problem or something else.
We have the same problem, which appeared after the upgrade to RHEL 5.2 (in particular nss_ldap-253-5 -> nss_ldap-253-12). I also noticed that if I deactivate SSL in /etc/ldap.conf, then "su - <username>" and login from the console work again. As a workaround I added "auth sufficient pam_ldap.so" and/or "account sufficient pam_ldap.so" to the pam.d files "su" and "login", which solves the problem mentioned above, and also to "gdm", "gnome-screensaver" and "kscreensaver". Tested on i386 and x86_64.
We have created testing RPMs/SRPMs for CentOS based on comment #9 above. Here is a link for anyone it might help: http://people.centos.org/hughesjr/nss_ldap/5/
(In reply to comment #23)
> we have created testing RPMS/SRPMS that based on comment #9 above for CentOS.
> Here is a link for those it might help:
>
> http://people.centos.org/hughesjr/nss_ldap/5/

Thanks, WORKSFORME (x86_64).
*** Bug 452550 has been marked as a duplicate of this bug. ***
I have this problem with:

1. nscd running:

  [flengyel@nept ~]$ ps aux | grep nscd
  nscd 3592 0.0 0.0 279092 3808 ? Ssl 00:33 0:00 /usr/sbin/nscd
  flengyel 18143 0.0 0.0 61168 732 pts/1 S+ 09:28 0:00 grep nscd
  [flengyel@nept ~]$

2. SELinux in enforcing mode, and
3. SSL START_TLS enabled in /etc/ldap.conf.

Disabling SELinux and commenting out the SSL line in /etc/ldap.conf resolves the problem, but is a completely unacceptable solution. This occurred after a YUM UPDATE on two RHEL 5 systems. I'm shocked.
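Several reports here hinge on two conditions: whether nscd is running and whether SSL/TLS is enabled in /etc/ldap.conf. A small bash sketch for checking both on a host follows. The helper names are hypothetical, and the pattern only recognises uncommented "ssl start_tls" and "ssl on" directives, so an ldaps:// URI or other ways of enabling TLS would not be detected.

```bash
#!/bin/bash
# Hypothetical diagnostic helpers for the nscd and SSL/TLS conditions.

# Succeeds if the given ldap.conf (default /etc/ldap.conf) enables SSL/TLS
# through an uncommented "ssl start_tls" or "ssl on" line (case-insensitive).
# A missing file makes grep fail, which is treated as "not enabled".
ldap_ssl_enabled() {
    local conf="${1:-/etc/ldap.conf}"
    grep -Eiq '^[[:space:]]*ssl[[:space:]]+(start_tls|on)[[:space:]]*$' "$conf"
}

# Succeeds if a process named exactly "nscd" is running (pgrep is from procps).
nscd_running() {
    pgrep -x nscd > /dev/null
}

# Prints a short summary of both conditions for the given ldap.conf.
report_nss_ldap_conditions() {
    local conf="${1:-/etc/ldap.conf}"
    if ldap_ssl_enabled "$conf"; then echo "ssl/tls: enabled"; else echo "ssl/tls: not enabled"; fi
    if nscd_running; then echo "nscd: running"; else echo "nscd: not running"; fi
}
```

This does not check SELinux; on the hosts described above, the AVC denials in the audit log are the more direct evidence for that part.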
Following up on my previous comment, these are the related SELinux errors:

host=nept.gc.cuny.edu type=AVC msg=audit(1214454807.368:42): avc: denied { connectto } for pid=3818 comm="sh" path="/var/run/nscd/socket" scontext=system_u:system_r:setroubleshootd_t:s0 tcontext=system_u:system_r:initrc_t:s0 tclass=unix_stream_socket

host=nept.gc.cuny.edu type=SYSCALL msg=audit(1214454807.368:42): arch=c000003e syscall=42 success=no exit=-13 a0=9 a1=7fffec28f870 a2=6e a3=3 items=0 ppid=3377 pid=3818 auid=4294967295 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=(none) ses=4294967295 comm="sh" exe="/bin/bash" subj=system_u:system_r:setroubleshootd_t:s0 key=(null)
I filed bug #453372 yesterday. It is caused by the same issue, but we have had nscd running all along and it does not help with that one. The patch in comment #9 fixes both for us (i386). Now that I have worked without nscd for a while, I've encountered erratic and puzzling behaviour, or downright crashes, in many more places. The magnitude of the regression introduced by nss_ldap-253-12 is huge. Please fix as soon as possible.
I ran into this problem as well. While running nscd allows my LDAP users to log in over SSH, it totally broke OpenPBS (Torque) interactive logins. The session hangs or terminates (it seems to alternate randomly) as soon as the remote shell is started. I would like to echo the other comments in requesting a fix for this ASAP.
Is this delayed because the posted patch may be insufficient or is it due to the Red Hat release process?
(In reply to comment #31)
> Is this delayed because the posted patch may be insufficient or is it due to the
> Red Hat release process?

I think the patch in question will fix this.
I thought I should mention that while this patch does fix the immediate issue listed in this thread, we are now experiencing even more problems related to the same 253-12 nss_ldap upgrade. I've created bug 454675 for the new problems.
We're hitting this too. For added fun, switch your shell to tcsh -- any command you try to run will fail with Broken Pipe.
Add another one to the list: we are having this issue across ALL our servers. Temporarily we have enabled nscd on all of them, but this is not really what we wanted to do. Red Hat has been aware of this issue for nearly 2 months and still no fix? What the hell do we pay for support for? Might as well just use CentOS.
"What the hell do we pay for support for?" For marketing and the privilege of being consumers. Red Hat Enterprise Linux has a history of trouble with LDAP. We have a computational cluster on which numerous jobs were failing with errors like the following:

  Job 27814 caused action: Job 27814 set to ERROR
  User = vnamenskiy
  Queue = p3.q.cuny.edu
  Host = m01.gc.cuny.edu
  Start Time = <unknown>
  End Time = <unknown>
  failed assumedly before job:can't get password entry for user "vnamenskiy". Either the user does not exist or NIS error!

Here's another one:

  Sent: Wed 5/21/2008 8:21 PM
  > To: Lengyel, Florian
  > Subject: GE 6.0u10: Job 28424 failed
  >
  > Job 28424 caused action: Job 28424 set to ERROR
  > User = gqian
  > Queue = p3.q.cuny.edu
  > Host = m05.gc.cuny.edu
  > Start Time = <unknown>
  > End Time = <unknown>
  > failed assumedly before job:can't get password entry
  > for user "gqian". Either the user does not exist
  > or NIS error!

This was attributable to Red Hat's LDAP handling. An upgrade of RHEL 5 fixed this, but only very recently. By then the damage was done: the Red Hat portion of the cluster was unusable, and some users gave up in frustration. I also gave up and switched the newer nodes to openSUSE.
I have recently encountered this exact same issue while configuring RHEL 5 boxes to authenticate against a W2K3 AD server. As others have mentioned, disabling SSL/TLS in ldap.conf "works around the issue" but is not ideal. Upgrading to a newer nss_ldap, such as the version shipped with Fedora, also fixes the problems (I just rebuilt the SRPM on my RHEL 5 box). It seems this issue is fixed upstream. Is there any chance of an "official" update getting pushed out? This bug report is almost 2 months old now...
Seriously now, it appears that upstream has fixed the issue; how much longer must we wait for an official RPM from Red Hat? Can someone from Red Hat please respond? I will be taking this up with my Red Hat account rep tomorrow and pointing them to this bug thread as an indication of the lack of support and pathetic response times from Red Hat! How many people have to tell you that this is affecting them before you will take action? Especially when the action is just to add the upstream fix? If there is more to this than that, why not let us know? I am reassigning to the QA owner, as the current assignee doesn't appear to be taking this seriously at all (hopefully it is not the same person).
I think this is definitely one of those "how can we trust our supplier in the future" bugs that comes up with every OS. It will probably generate a lot of questions for account reps. If the bug has been put to rest upstream and a hotfix is available, why is it taking so long to get out of Product Management? Why wasn't a status report from PM about the delay added to either this bug or the tickets? Is it due to the 4.7 General Release, or "waiting for PM to finish level 40 of nethack"?
I've tried to stay away from general comments, as they aren't helpful to the bug process, but it has been 2 months now and this is a CRITICAL bug. It breaks SO much on our systems that we've had to downgrade to version 253-5. The patch on this list fixed some of our problems, but not all. Seriously, out of 20 machines running CentOS/Red Hat, this is the only one we PAY for support for, and still nothing!!! Even CentOS popped the patch into an RPM and released it. Come on Red Hat, give us what we pay for, please. And that ain't broken systems.
While defects can be reported using Red Hat's Bugzilla, please be advised that Bugzilla is not a support tool and thus does not offer any Service Level Agreement (SLA) for issues reported through that channel. If you require technical support assistance, please direct those requests as per your support contract with us. Our technical support staff would be glad to assist you with issues relating to this bug.
While I completely agree that it's been a ridiculous amount of time for this bug to stay open, the RPMs in comment 24 have been working well for us. I do hope this gets fixed by RH soon though.
What RH says in comment 56 is interesting, as the machines we are seeing this on have full RH support. But (as in this case) I've usually never bothered to open support tickets where there is an existing Bugzilla. It just seemed like overkill and duplication; I just assumed RH would be aware the issue was affecting supported customers (after all, they are in the majority on here?). Is it OK just to open a support ticket and say, more or less, "I need a fix for this Bugzilla number"? (I don't mean in this case specifically; it seems pretty high priority now.) I always thought support tickets were more for individual problems (perhaps configuration problems), not for a known bug.
Chris Evich (comment #56) -- you're exactly right. This is not a support tool for our benefit. It is a tool for YOUR benefit to which we are contributing. I know you're coming at this from a damage-control point of view, but you've got it all wrong. As you can see, few of the contributors here need technical support beyond what's already offered. What we need is *an official errata package*. And it's a good idea on your part to at least keep us updated on the status of that package, because *we're on your side*.
(In reply to comment #56)
Chris, the problem with your argument is that for the last two months each and every customer who uses nss_ldap has run into this problem and then had to contact technical support to identify the cause. Do you know how much customer time is wasted by this? Customers don't want to run into the problem to begin with, or at least want a prompt fix (automatically installed from RHN). This is a Priority==urgent bug.
(In reply to comment #58)
Yes, you absolutely should open a support case and reference this Bugzilla number. Especially for critical bugs like this, Global Support Services has more options to assist you than are offered through Bugzilla.
Comment 56 is a particularly stupid, inflammatory thing to say; it made my blood boil even though _I_ am not affected. Just a few minutes ago I was praising RH support. How stupid I feel!
Guys, please don't take the comments in #56 out of context. I work in GSS, and our mode of operation does not include monitoring bug reports via Bugzilla. We don't use bugzilla.redhat.com as a primary escalation point in GSS at all. Bugzilla is an engineering tool and we use it as such. If you have problems and have support, really the BEST way to get that support is via the support channels. In particular, if you need direct assistance to resolve the problems reported in this Bugzilla, call us at GSS or raise a support request, because we _can_ actually help you. I have helped a couple of customers with this very issue recently.

Cheers
If a workaround is what you want, here is what I have already suggested:

######
- Check what version you have installed:

  # yum list nss_ldap
  Loading "rhnplugin" plugin
  Loading "security" plugin
  rhel-x86_64-server-5 100% |=========================| 1.4 kB 00:00
  ...
  Installed Packages
  nss_ldap.x86_64 253-12.el5 installed
  nss_ldap.i386 253-12.el5 installed

- Install yum-utils (to get the yumdownloader script):

  # yum -y install yum-utils
  ...
  Running Transaction
  Installing: yum-utils ######################### [1/1]

- Download the previous version of nss_ldap:

  # yumdownloader nss_ldap-253-5.el5
  Loading "rhnplugin" plugin
  rhel-x86_64-server-5 100% |=========================| 1.4 kB 00:00
  ...
  nss_ldap-253-5.el5.x86_64 100% |=========================| 1.4 MB 00:00
  nss_ldap-253-5.el5.i386.r 100% |=========================| 1.4 MB 00:00
  # ls nss_ldap*
  nss_ldap-253-5.el5.i386.rpm nss_ldap-253-5.el5.x86_64.rpm

- Downgrade the package using rpm:

  # rpm -Uvh --oldpackage nss_ldap-253-5.el5.*.rpm
  Preparing... ########################################### [100%]
  1:nss_ldap ########################################### [ 50%]
  2:nss_ldap ########################################### [100%]

- Let RHN know that you have downgraded the packages:

  # rhn-profile-sync
  Updating package profile...
  Updating hardware profile...

Reboot and you will avoid the problem.

If this workaround is not suitable for you, _call GSS_ and we will help further.

Cheers
You may have merely "suggested" it previously, but the helpful detailed information below was not forthcoming until now. What made you decide to provide detailed helpful information at this late stage--albeit in a snottily grudging tone, as if you were mightily put upon to have to provide anything beyond a vague suggestion? Cheers yourself.
Florian, I do apologize if you interpreted my comments as rude - that was not my intention. Note that my suggested workaround was sent to other customers that communicated via our support channels, not in Bugzilla. I added the suggested workaround here because it became obvious to me that people were not getting the message to contact support!! The reason that my information was not provided here initially is because I do not use bugzilla.redhat.com to communicate to customers. My primary channel is via our support phone line or via the web support systems. Bugzilla is an engineering tool. The message to take away is - if you want support contact GSS. This message is not intended as a rebuff, a challenge or meant to insult. I aim to help. Finally, I will end my communication here in Bugzilla on this subject now. Since Bugzilla is an engineering tool, discussions like this do not belong here. Kind Regards, Michael
The only thing I have against this is that I am an Academic Subscriber. We HAVE NO support other than Bugzilla. I could understand if someone had a problem, say a configuration issue, or a problem with their installation, and needed help diagnosing whether it is a Red Hat bug or a local configuration issue. That would clearly be a "support" issue. But to say that we somehow don't have the right, or rather, that any urgency expressed in Bugzilla is somehow misplaced, is, I believe, just wrong. To be even more to the point, we didn't pay for Red Hat, albeit an academic license, to get broken packages. You are somehow implying that you MUST pay for support to get a Red Hat system that works. I don't expect you to diagnose my problem personally for me. But when something like this sits as Urgent for weeks, it'd be nice to know Red Hat was even planning to release this to 5.2 customers. And that's right, a CUSTOMER. Not just some people spouting off lines at you, BUT CUSTOMERS!! If Bugzilla isn't monitored by you guys as a support device, then close it off and keep it internal. What do you need us for anyway?
FWIW, there is an official erratum rpm now: nss_ldap-253-13.el5_2.1 See bug #455271 for details.
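To check whether a host still carries the regressed package, something like the bash sketch below could be used. It is hypothetical and deliberately narrow: it only recognises the builds named in this bug (253-5.el5, 253-12.el5 and 253-13.el5_2.1) and reports anything else as unknown rather than attempting an RPM version comparison.

```bash
#!/bin/bash
# Hypothetical helper: classify an nss_ldap VERSION-RELEASE string using only
# the builds named in this bug; anything else is reported as unknown.
classify_nss_ldap() {
    case "$1" in
        253-5.el5)      echo "pre-regression build (downgrade workaround)" ;;
        253-12.el5)     echo "affected (regressed build)" ;;
        253-13.el5_2.1) echo "fixed (erratum build)" ;;
        *)              echo "unknown: check the errata for $1" ;;
    esac
}

# Queries every installed nss_ldap package (i386 and x86_64 can both be
# present on a multilib host) and classifies each one.
check_installed_nss_ldap() {
    if ! rpm -q nss_ldap > /dev/null 2>&1; then
        echo "nss_ldap is not installed"
        return 1
    fi
    rpm -q --qf '%{VERSION}-%{RELEASE} %{ARCH}\n' nss_ldap |
        while read -r vr arch; do
            echo "nss_ldap-$vr.$arch: $(classify_nss_ldap "$vr")"
        done
}
```

Keeping the classifier separate from the rpm query means it can be reused on output collected from many hosts (for example, from an inventory report) without running rpm locally.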
This does indeed resolve my problems with the previous version. Thanks for the updated package. Really appreciated.
Since similar misbehavior can occur when nss_ldap is in use but nscd has been improperly disabled on a system, it may be worthwhile to have an nss.sh in /etc/profile.d with something to the effect of:

  (
    uid=$(/usr/bin/id -u)
    [ -z "$uid" ] && echo "NSS appears to be misconfigured. Please have your admin perform NSS problem determination before reporting a bug."
  )

I mention this because krb5-devel, krb5-workstation and vim-enhanced all ship incorrect .sh files that give the error message that started this bug report.
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHBA-2009-0241.html
*** Bug 448016 has been marked as a duplicate of this bug. ***
*** Bug 454292 has been marked as a duplicate of this bug. ***
*** Bug 564108 has been marked as a duplicate of this bug. ***